performing clustering analysis: Topics by Science.gov

Sample records for performing clustering analysis

A comparison of hierarchical cluster analysis and league table rankings as methods for analysis and presentation of district health system performance data in Uganda.

PubMed

Tashobya, Christine K; Dubourg, Dominique; Ssengooba, Freddie; Speybroeck, Niko; Macq, Jean; Criel, Bart

2016-03-01

In 2003, the Uganda Ministry of Health introduced the district league table for district health system performance assessment. The league table presents district performance against a number of input, process and output indicators and a composite index to rank districts. This study explores the use of hierarchical cluster analysis for analysing and presenting district health systems performance data and compares this approach with the use of the league table in Uganda. Ministry of Health and district plans and reports, and published documents were used to provide information on the development and utilization of the Uganda district league table. Quantitative data were accessed from the Ministry of Health databases. Statistical analysis was carried out in SPSS version 20, with hierarchical cluster analysis using Ward's method. The hierarchical cluster analysis was conducted on the basis of seven clusters determined for each year from 2003 to 2010, ranging from a cluster of good through moderate-to-poor performers. The characteristics and membership of clusters varied from year to year and were determined by the identity and magnitude of performance of the individual variables. Criticisms of the league table include: perceived unfairness, as it did not take into consideration district peculiarities; and being oversummarized and not adequately informative. Clustering organizes the many data points into clusters of similar entities according to an agreed set of indicators and can provide the beginning point for identifying factors behind the observed performance of districts. Although league table rankings emphasize summation and external control, clustering has the potential to encourage a formative, learning approach. More research is required to shed more light on factors behind observed performance of the different clusters. Other countries, especially low-income countries that share many similarities with Uganda, can learn from these experiences. © The Author 2015. Published by Oxford University Press in association with The London School of Hygiene and Tropical Medicine.
A comparison of hierarchical cluster analysis and league table rankings as methods for analysis and presentation of district health system performance data in Uganda†

PubMed Central

Tashobya, Christine K; Dubourg, Dominique; Ssengooba, Freddie; Speybroeck, Niko; Macq, Jean; Criel, Bart

2016-01-01

In 2003, the Uganda Ministry of Health introduced the district league table for district health system performance assessment. The league table presents district performance against a number of input, process and output indicators and a composite index to rank districts. This study explores the use of hierarchical cluster analysis for analysing and presenting district health systems performance data and compares this approach with the use of the league table in Uganda. Ministry of Health and district plans and reports, and published documents were used to provide information on the development and utilization of the Uganda district league table. Quantitative data were accessed from the Ministry of Health databases. Statistical analysis was carried out in SPSS version 20, with hierarchical cluster analysis using Ward's method. The hierarchical cluster analysis was conducted on the basis of seven clusters determined for each year from 2003 to 2010, ranging from a cluster of good through moderate-to-poor performers. The characteristics and membership of clusters varied from year to year and were determined by the identity and magnitude of performance of the individual variables. Criticisms of the league table include: perceived unfairness, as it did not take into consideration district peculiarities; and being oversummarized and not adequately informative. Clustering organizes the many data points into clusters of similar entities according to an agreed set of indicators and can provide the beginning point for identifying factors behind the observed performance of districts. Although league table rankings emphasize summation and external control, clustering has the potential to encourage a formative, learning approach. More research is required to shed more light on factors behind observed performance of the different clusters. Other countries, especially low-income countries that share many similarities with Uganda, can learn from these experiences. PMID:26024882
Challenges in microarray class discovery: a comprehensive examination of normalization, gene selection and clustering

PubMed Central

2010-01-01

Background: Cluster analysis, and in particular hierarchical clustering, is widely used to extract information from gene expression data. The aim is to discover new classes, or sub-classes, of either individuals or genes. Performing a cluster analysis commonly involves decisions on how to handle missing values, standardize the data and select genes. In addition, pre-processing, involving various types of filtration and normalization procedures, can have an effect on the ability to discover biologically relevant classes. Here we consider cluster analysis in a broad sense and perform a comprehensive evaluation that covers several aspects of cluster analyses, including normalization. Results: We evaluated 2780 cluster analysis methods on seven publicly available 2-channel microarray data sets with common reference designs. Each cluster analysis method differed in data normalization (5 normalizations were considered), missing value imputation (2), standardization of data (2), gene selection (19) or clustering method (11). The cluster analyses are evaluated using known classes, such as cancer types, and the adjusted Rand index. The performances of the different analyses vary between the data sets and it is difficult to give general recommendations. However, normalization, gene selection and clustering method are all variables that have a significant impact on the performance. In particular, gene selection is important and it is generally necessary to include a relatively large number of genes in order to get good performance. Selecting genes with high standard deviation or using principal component analysis are shown to be the preferred gene selection methods. Hierarchical clustering using Ward's method, k-means clustering and Mclust are the clustering methods considered in this paper that achieve the highest adjusted Rand index. Normalization can have a significant positive impact on the ability to cluster individuals, and there are indications that background correction is preferable, in particular if the gene selection is successful. However, this is an area that needs to be studied further in order to draw any general conclusions. Conclusions: The choice of cluster analysis, and in particular gene selection, has a large impact on the ability to cluster individuals correctly based on expression profiles. Normalization has a positive effect, but the relative performance of different normalizations is an area that needs more research. In summary, although clustering, gene selection and normalization are considered standard methods in bioinformatics, our comprehensive analysis shows that selecting the right methods, and the right combinations of methods, is far from trivial and that much is still unexplored in what is considered to be the most basic analysis of genomic data. PMID:20937082
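The record above scores each clustering against known classes with the adjusted Rand index. As a rough illustration of that measure (not the authors' code), the sketch below computes it from pair counts using only the Python standard library. The function name and the six-sample labelings are hypothetical and chosen only to show the calculation.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same samples."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to disagree on

    # Contingency counts: samples falling in each (class, cluster) pair.
    pair_counts = Counter(zip(labels_true, labels_pred))
    class_counts = Counter(labels_true)
    cluster_counts = Counter(labels_pred)

    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_classes = sum(comb(c, 2) for c in class_counts.values())
    sum_clusters = sum(comb(c, 2) for c in cluster_counts.values())

    expected = sum_classes * sum_clusters / comb(n, 2)
    maximum = 0.5 * (sum_classes + sum_clusters)
    if maximum == expected:
        return 1.0  # degenerate case, e.g. a single group in both labelings
    return (sum_pairs - expected) / (maximum - expected)


# Hypothetical example: six samples from two known classes.
known = ["A", "A", "A", "B", "B", "B"]
perfect = [1, 1, 1, 0, 0, 0]    # same partition, different label names
imperfect = [0, 0, 1, 1, 1, 1]  # one sample placed in the wrong group

print(adjusted_rand_index(known, perfect))
print(adjusted_rand_index(known, imperfect))
```

The index depends only on which samples are grouped together, so renaming the cluster labels does not change the score. It reaches 1 for identical partitions and is close to 0 for random ones.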
CLUSFAVOR 5.0: hierarchical cluster and principal-component analysis of microarray-based transcriptional profiles

PubMed Central

Peterson, Leif E

2002-01-01

CLUSFAVOR (CLUSter and Factor Analysis with Varimax Orthogonal Rotation) 5.0 is a Windows-based computer program for hierarchical cluster and principal-component analysis of microarray-based transcriptional profiles. CLUSFAVOR 5.0 standardizes input data; sorts data according to gene-specific coefficient of variation, standard deviation, average and total expression, and Shannon entropy; performs hierarchical cluster analysis using nearest-neighbor, unweighted pair-group method using arithmetic averages (UPGMA), or furthest-neighbor joining methods, and Euclidean, correlation, or jack-knife distances; and performs principal-component analysis. PMID:12184816
Cluster Correspondence Analysis.

PubMed

van de Velden, M; D'Enza, A Iodice; Palumbo, F

2017-03-01

A method is proposed that combines dimension reduction and cluster analysis for categorical data by simultaneously assigning individuals to clusters and optimal scaling values to categories in such a way that a single between variance maximization objective is achieved. In a unified framework, a brief review of alternative methods is provided and we show that the proposed method is equivalent to GROUPALS applied to categorical data. Performance of the methods is appraised by means of a simulation study. The results of the joint dimension reduction and clustering methods are compared with the so-called tandem approach, a sequential analysis of dimension reduction followed by cluster analysis. The tandem approach is conjectured to perform worse when variables are added that are unrelated to the cluster structure. Our simulation study confirms this conjecture. Moreover, the results of the simulation study indicate that the proposed method also consistently outperforms alternative joint dimension reduction and clustering methods.
Sulfur in Cometary Dust

NASA Technical Reports Server (NTRS)

Fomenkova, M. N.

1997-01-01

The computer-intensive project consisted of the analysis and synthesis of existing data on composition of comet Halley dust particles. The main objective was to obtain a complete inventory of sulfur containing compounds in the comet Halley dust by building upon the existing classification of organic and inorganic compounds and applying a variety of statistical techniques for cluster and cross-correlational analyses. A student hired for this project wrote and tested the software to perform cluster analysis. The following tasks were carried out: (1) selecting the data from existing database for the proposed project; (2) finding access to a standard library of statistical routines for cluster analysis; (3) reformatting the data as necessary for input into the library routines; (4) performing cluster analysis and constructing hierarchical cluster trees using three methods to define the proximity of clusters; (5) presenting the output results in different formats to facilitate the interpretation of the obtained cluster trees; (6) selecting groups of data points common for all three trees as stable clusters. We have also considered the chemistry of sulfur in inorganic compounds.
cluster trials v. 1.0

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mitchell, John; Castillo, Andrew

2016-09-21

This software contains a set of python modules – input, search, cluster, analysis; these modules read input files containing spatial coordinates and associated attributes which can be used to perform nearest neighbor search (spatial indexing via kdtree), cluster analysis/identification, and calculation of spatial statistics for analysis.
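The record above mentions nearest neighbor search through k-d tree spatial indexing. The sketch below shows the general idea in plain Python. It is not taken from the cluster trials package; the function names `build_kdtree` and `nearest` and the sample coordinates are assumptions made for illustration.

```python
import math


def build_kdtree(points, depth=0):
    """Recursively build a k-d tree as nested dicts; None marks an empty subtree."""
    if not points:
        return None
    axis = depth % len(points[0])          # cycle through the coordinate axes
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                 # median point becomes the node
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }


def nearest(node, query, best=None):
    """Return (distance, point) for the stored point closest to query."""
    if node is None:
        return best
    dist = math.dist(node["point"], query)
    if best is None or dist < best[0]:
        best = (dist, node["point"])
    diff = query[node["axis"]] - node["point"][node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the best hit.
    if abs(diff) < best[0]:
        best = nearest(far, query, best)
    return best


# Hypothetical 2-D coordinates.
sample_points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(sample_points)
print(nearest(tree, (9, 2)))
```

The pruning test on the splitting plane lets the search skip whole subtrees, which is why a k-d tree is usually much faster than checking every point.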
A novel symptom cluster analysis among ambulatory HIV/AIDS patients in Uganda.

PubMed

Namisango, Eve; Harding, Richard; Katabira, Elly T; Siegert, Richard J; Powell, Richard A; Atuhaire, Leonard; Moens, Katrien; Taylor, Steve

2015-01-01

Symptom clusters are gaining importance given that HIV/AIDS patients experience multiple, concurrent symptoms. This study aimed to: determine clusters of patients with similar symptom combinations; describe symptom combinations distinguishing the clusters; and evaluate the clusters regarding patient socio-demographic, disease and treatment characteristics, quality of life (QOL) and functional performance. This was a cross-sectional study of 302 adult HIV/AIDS outpatients consecutively recruited at two teaching and referral hospitals in Uganda. Socio-demographic and seven-day period symptom prevalence and distress data were self-reported using the Memorial Symptom Assessment Schedule. QOL was assessed using the Medical Outcome Scale and functional performance using the Karnofsky Performance Scale. Symptom clusters were established using hierarchical cluster analysis with squared Euclidean distances using Ward's clustering methods based on symptom occurrence. Analysis of variance compared clusters on mean QOL and functional performance scores. Patient subgroups were categorised based on symptom occurrence rates. Five symptom occurrence clusters were identified: Cluster 1 (n=107), high-low for sensory discomfort and eating difficulties symptoms; Cluster 2 (n=47), high-low for psycho-gastrointestinal symptoms; Cluster 3 (n=71), high for pain and sensory disturbance symptoms; Cluster 4 (n=35), all high for general HIV/AIDS symptoms; and Cluster 5 (n=48), all low for mood-cognitive symptoms. The all high occurrence cluster was associated with worst functional status, poorest QOL scores and highest symptom-associated distress. Use of antiretroviral therapy was associated with all high symptom occurrence rate (Fisher's exact=4, P<0.001). CD4 count group below 200 was associated with the all high occurrence rate symptom cluster (Fisher's exact=41, P<0.001). Symptom clusters have a differential effect on HIV/AIDS patients' self-reported outcomes, with the subgroup experiencing high-symptom occurrence rates having a higher risk of poorer outcomes. Identification of symptom clusters could provide insights into commonly co-occurring symptoms that should be jointly targeted for management in patients with multiple complaints.
Orbit Clustering Based on Transfer Cost

NASA Technical Reports Server (NTRS)

Gustafson, Eric D.; Arrieta-Camacho, Juan J.; Petropoulos, Anastassios E.

2013-01-01

We propose using cluster analysis to perform quick screening for combinatorial global optimization problems. The key missing component currently preventing cluster analysis from use in this context is the lack of a useable metric function that defines the cost to transfer between two orbits. We study several proposed metrics and clustering algorithms, including k-means and the expectation maximization algorithm. We also show that proven heuristic methods such as the Q-law can be modified to work with cluster analysis.
Ten-year performance of ponderosa pine provenances in the Great Plains of North America

Treesearch

Ralph A. Read

1983-01-01

A cluster and discriminant analysis based on nine of the best plantations partitioned the seed provenance populations into six geographic clusters according to their consistency of performance in the plantations. The Northcentral Nebraska cluster of three provenances performed consistently well above average in all plantations. These easternmost...
Development of small scale cluster computer for numerical analysis

NASA Astrophysics Data System (ADS)

Zulkifli, N. H. N.; Sapit, A.; Mohammed, A. N.

2017-09-01

In this study, two personal computers were networked together to form a small-scale cluster. Each machine has a quad-core processor, giving the cluster eight processor cores in total. The cluster runs Ubuntu 14.04 Linux with an MPI implementation (MPICH2). Two main tests were conducted on the cluster: a communication test and a performance test. The communication test, which used a simple MPI Hello program written in C, was done to make sure that the computers could pass the required information without any problem. In addition, a performance test was done to show that the cluster's calculation performance is much better than that of a single-CPU computer. In this performance test, the same code was run in four configurations: a single node, 2 processors, 4 processors, and 8 processors. The results show that the time required to solve the problem decreases as processors are added: the calculation time is halved when the number of processors is doubled. To conclude, we successfully developed a small-scale cluster computer from common hardware that offers higher computing power than a single-CPU machine, which can be beneficial for research that requires high computing power, especially numerical analysis such as finite element analysis, computational fluid dynamics, and computational physics analysis.
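The scaling claim in the record above is usually expressed as speedup (serial time divided by parallel time) and parallel efficiency (speedup divided by processor count). The sketch below computes both. The timings are hypothetical placeholders, not measurements from the study, and are chosen only so that the time halves with each doubling of processors.

```python
def speedup_and_efficiency(serial_time, parallel_time, processors):
    """Return (speedup, parallel efficiency) for one timing measurement."""
    speedup = serial_time / parallel_time
    return speedup, speedup / processors


# Hypothetical wall-clock times in seconds, NOT data from the study.
timings = {1: 80.0, 2: 40.0, 4: 20.0, 8: 10.0}

for processors, seconds in timings.items():
    speedup, efficiency = speedup_and_efficiency(timings[1], seconds, processors)
    print(f"{processors} processors: speedup {speedup:.1f}x, efficiency {efficiency:.0%}")
```

An efficiency of 100% at every processor count corresponds to the ideal halving behaviour. Measured efficiencies are normally lower because of communication overhead between nodes.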
Cluster analysis of novel isometric strength measures produces a valid and evidence-based classification structure for wheelchair track racing.

PubMed

Connick, Mark J; Beckman, Emma; Vanlandewijck, Yves; Malone, Laurie A; Blomqvist, Sven; Tweedy, Sean M

2017-11-25

The Para athletics wheelchair-racing classification system employs best practice to ensure that classes comprise athletes whose impairments cause a comparable degree of activity limitation. However, decision-making is largely subjective and scientific evidence which reduces this subjectivity is required. The aim was to evaluate whether isometric strength tests were valid for the purposes of classifying wheelchair racers and whether cluster analysis of the strength measures produced a valid classification structure. Thirty-two international level, male wheelchair racers from classes T51-54 completed six isometric strength tests evaluating elbow extensors, shoulder flexors, trunk flexors and forearm pronators and two wheelchair performance tests: Top-Speed (0-15 m) and Top-Speed (absolute). Strength tests significantly correlated with wheelchair performance were included in a cluster analysis and the validity of the resulting clusters was assessed. All six strength tests correlated with performance (r=0.54-0.88). Cluster analysis yielded four clusters with reasonable overall structure (mean silhouette coefficient=0.58) and large intercluster strength differences. Six athletes (19%) were allocated to clusters that did not align with their current class. While the mean wheelchair racing performance of the resulting clusters was unequivocally hierarchical, the mean performance of current classes was not, with no difference between current classes T53 and T54. Cluster analysis of isometric strength tests produced classes comprising athletes who experienced a similar degree of activity limitation. The strength tests reported can provide the basis for a new, more transparent, less subjective wheelchair racing classification system, pending replication of these findings in a larger, representative sample. This paper also provides guidance for development of evidence-based systems in other Para sports.
© Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
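The record above summarises cluster quality with a mean silhouette coefficient. As a minimal sketch of how that statistic is defined (assuming Euclidean distance, which the abstract does not state), the code below computes it in plain Python. The four points and their labels are hypothetical.

```python
import math


def mean_silhouette(points, labels):
    """Mean silhouette coefficient of a labelled point set (Euclidean distance)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        widest = max(a, b)
        scores.append((b - a) / widest if widest > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical data: two tight pairs of points, ten units apart.
demo_points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = mean_silhouette(demo_points, [0, 0, 1, 1])  # pairs kept together
bad = mean_silhouette(demo_points, [0, 1, 0, 1])   # pairs split apart
print(good, bad)
```

Values near 1 indicate compact, well-separated clusters, while values near 0 or below indicate overlapping or misassigned points.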
Subtypes of female juvenile offenders: a cluster analysis of the Millon Adolescent Clinical Inventory.

PubMed

Stefurak, Tres; Calhoun, Georgia B

2007-01-01

The current study sought to explore subtypes of adolescents within a sample of female juvenile offenders. Using the Millon Adolescent Clinical Inventory with 101 female juvenile offenders, a two-step cluster analysis was performed beginning with a Ward's method hierarchical cluster analysis followed by a K-Means iterative partitioning cluster analysis. The results suggest an optimal three-cluster solution, with cluster profiles leading to the following group labels: Externalizing Problems, Depressed/Interpersonally Ambivalent, and Anxious Prosocial. Analyses along the factors of age, race, offense typology and offense chronicity were conducted to further understand the nature of the clusters found. Only the effect for race was significant, with the Anxious Prosocial and Depressed/Interpersonally Ambivalent clusters appearing disproportionately comprised of African American girls. To establish external validity, clusters were compared across scales of the Behavioral Assessment System for Children - Self Report of Personality, and corroborative distinctions between clusters were found here.
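The two-step procedure in the record above (Ward's hierarchical clustering, then K-Means) can be sketched as follows. This is a naive illustration in plain Python, not the study's analysis. It assumes that the Ward cluster centroids are used as starting centers for K-Means, which is one common way to link the two steps, and the nine 2-D points are hypothetical.

```python
import math


def centroid(cluster):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def ward_clusters(points, k):
    """Merge clusters until k remain, always taking the smallest Ward cost."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                # Increase in within-cluster sum of squares if a and b merge.
                cost = (len(a) * len(b) / (len(a) + len(b))
                        * math.dist(centroid(a), centroid(b)) ** 2)
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def kmeans(points, centers, max_iter=100):
    """Lloyd's k-means iterations starting from the given centers."""
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            groups[idx].append(p)
        new_centers = [centroid(g) if g else centers[i] for i, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, groups


# Hypothetical data: three loose groups of three points each.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
        (8.0, 8.0), (8.5, 9.0), (9.0, 8.0),
        (1.0, 9.0), (1.5, 8.5), (0.5, 9.5)]

ward_groups = ward_clusters(data, 3)                  # step 1: hierarchical
seeds = [centroid(group) for group in ward_groups]
centers, groups = kmeans(data, seeds)                 # step 2: iterative refinement
print(centers)
```

Seeding K-Means from the hierarchical solution gives it a deterministic starting point, and the K-Means pass can then reassign points that the irreversible merges of the first step placed poorly.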
Neuro- and social-cognitive clustering highlights distinct profiles in adults with anorexia nervosa.

PubMed

Renwick, Beth; Musiat, Peter; Lose, Anna; DeJong, Hannah; Broadbent, Hannah; Kenyon, Martha; Loomes, Rachel; Watson, Charlotte; Ghelani, Shreena; Serpell, Lucy; Richards, Lorna; Johnson-Sabine, Eric; Boughton, Nicky; Treasure, Janet; Schmidt, Ulrike

2015-01-01

This study aimed to explore the neuro- and social-cognitive profile of a consecutive series of adult outpatients with anorexia nervosa (AN) when compared with widely available age and gender matched historical control data. The relationship between performance profiles, clinical characteristics, service utilization, and treatment adherence was also investigated. Consecutively recruited outpatients with a broad diagnosis of AN (restricting subtype AN-R: n = 44, binge-purge subtype AN-BP: n = 33 or Eating Disorder Not Otherwise Specified-AN subtype EDNOS-AN: n = 23) completed a comprehensive set of neurocognitive (set-shifting, central coherence) and social-cognitive measures (Emotional Theory of Mind). Data were subjected to hierarchical cluster analysis and a discriminant function analysis. Three separate, meaningful clusters emerged. Cluster 1 (n = 45) showed overall average to high average neuro- and social-cognitive performance, Cluster 2 (n = 38) showed mixed performance characterized by distinct strengths and weaknesses, and Cluster 3 (n = 17) showed poor overall performance (an autism spectrum disorder (ASD)-like cluster). The three clusters did not differ in terms of eating disorder symptoms, comorbid features or service utilization and treatment adherence. A discriminant function analysis confirmed that the clusters were best characterized by performance in perseveration and set-shifting measures. The findings suggest that considerable neuro- and social-cognitive heterogeneity exists in patients with AN, with a subset showing ASD-like features. The value of this method of profiling in predicting longer term patient outcomes and in guiding development of etiologically targeted treatments remains to be seen. © 2014 Wiley Periodicals, Inc.
Cost/Performance Ratio Achieved by Using a Commodity-Based Cluster

NASA Technical Reports Server (NTRS)

Lopez, Isaac

2001-01-01

Researchers at the NASA Glenn Research Center acquired a commodity cluster based on Intel Corporation processors to compare its performance with a traditional UNIX cluster in the execution of aeropropulsion applications. Since the cost differential of the clusters was significant, a cost/performance ratio was calculated. After executing a propulsion application on both clusters, the researchers demonstrated a 9.4 cost/performance ratio in favor of the Intel-based cluster. These researchers utilize the Aeroshark cluster as one of the primary testbeds for developing NPSS parallel application codes and system software. The Aeroshark cluster provides 64 Intel Pentium II 400-MHz processors, housed in 32 nodes. Recently, APNASA, a code developed by a Government/industry team for the design and analysis of turbomachinery systems, was used for a simulation on Glenn's Aeroshark cluster.
Cluster signal-to-noise analysis for evaluation of the information content in an image.

PubMed

Weerawanich, Warangkana; Shimizu, Mayumi; Takeshita, Yohei; Okamura, Kazutoshi; Yoshida, Shoko; Yoshiura, Kazunori

2018-01-01

(1) To develop an observer-free method of analysing image quality related to the observer performance in the detection task and (2) to analyse observer behaviour patterns in the detection of small mass changes in cone-beam CT images. 13 observers detected holes in a Teflon phantom in cone-beam CT images. Using the same images, we developed a new method, cluster signal-to-noise analysis, to detect the holes by applying various cut-off values using ImageJ and reconstructing cluster signal-to-noise curves. We then evaluated the correlation between cluster signal-to-noise analysis and the observer performance test. We measured the background noise in each image to evaluate the relationship with false positive rates (FPRs) of the observers. Correlations between mean FPRs and intra- and interobserver variations were also evaluated. Moreover, we calculated true positive rates (TPRs) and accuracies from background noise and evaluated their correlations with TPRs from observers. Cluster signal-to-noise curves were derived in cluster signal-to-noise analysis. They yield the detection of signals (true holes) related to noise (false holes). This method correlated highly with the observer performance test (R² = 0.9296). In noisy images, increasing background noise resulted in higher FPRs and larger intra- and interobserver variations. TPRs and accuracies calculated from background noise had high correlation with actual TPRs from observers; R² was 0.9244 and 0.9338, respectively. Cluster signal-to-noise analysis can simulate the detection performance of observers and thus replace the observer performance test in the evaluation of image quality. Erroneous decision-making increased with increasing background noise.
Comparing the performance of biomedical clustering methods.

PubMed

Wiwie, Christian; Baumbach, Jan; Röttger, Richard

2015-11-01

Identifying groups of similar objects is a popular first step in biomedical data analysis, but it is error-prone and impossible to perform manually. Many computational methods have been developed to tackle this problem. Here we assessed 13 well-known methods using 24 data sets ranging from gene expression to protein domains. Performance was judged on the basis of 13 common cluster validity indices. We developed a clustering analysis platform, ClustEval (http://clusteval.mpi-inf.mpg.de), to promote streamlined evaluation, comparison and reproducibility of clustering results in the future. This allowed us to objectively evaluate the performance of all tools on all data sets with up to 1,000 different parameter sets each, resulting in a total of more than 4 million calculated cluster validity indices. We observed that there was no universal best performer, but on the basis of this wide-ranging comparison we were able to develop a short guideline for biomedical clustering tasks. ClustEval allows biomedical researchers to pick the appropriate tool for their data type and allows method developers to compare their tool to the state of the art.
Deconstructing Bipolar Disorder and Schizophrenia: A cross-diagnostic cluster analysis of cognitive phenotypes.

PubMed

Lee, Junghee; Rizzo, Shemra; Altshuler, Lori; Glahn, David C; Miklowitz, David J; Sugar, Catherine A; Wynn, Jonathan K; Green, Michael F

2017-02-01

Bipolar disorder (BD) and schizophrenia (SZ) show substantial overlap. It has been suggested that a subgroup of patients might contribute to these overlapping features. This study employed a cross-diagnostic cluster analysis to identify subgroups of individuals with shared cognitive phenotypes. 143 participants (68 BD patients, 39 SZ patients and 36 healthy controls) completed a battery of EEG and performance assessments on perception, nonsocial cognition and social cognition. A K-means cluster analysis was conducted with all participants across diagnostic groups. Clinical symptoms, functional capacity, and functional outcome were assessed in patients. A two-cluster solution across 3 groups was the most stable. One cluster including 44 BD patients, 31 controls and 5 SZ patients showed better cognition (High cluster) than the other cluster with 24 BD patients, 35 SZ patients and 5 controls (Low cluster). BD patients in the High cluster performed better than BD patients in the Low cluster across cognitive domains. Within each cluster, participants with different clinical diagnoses showed different profiles across cognitive domains. All patients were in the chronic phase and out of mood episode at the time of assessment, and most of the assessments were behavioral measures. This study identified two clusters with shared cognitive phenotype profiles that were not proxies for clinical diagnoses. The finding of better social cognitive performance of BD patients than SZ patients in the Low cluster suggests that relatively preserved social cognition may be important to identify disease process distinct to each disorder. Copyright © 2016 Elsevier B.V. All rights reserved.
Associations Among Health Care Workplace Safety, Resident Satisfaction, and Quality of Care in Long-Term Care Facilities.

PubMed

Boakye-Dankwa, Ernest; Teeple, Erin; Gore, Rebecca; Punnett, Laura

2017-11-01

We performed an integrated cross-sectional analysis of relationships between long-term care work environments, employee and resident satisfaction, and quality of patient care. Facility-level data came from a network of 203 skilled nursing facilities in 13 states in the eastern United States owned or managed by one company. K-means cluster analysis was applied to investigate clustered associations between safe resident handling program (SRHP) performance, resident care outcomes, employee satisfaction, rates of workers' compensation claims, and resident satisfaction. Facilities in the better-performing cluster were found to have better patient care outcomes and resident satisfaction; lower rates of workers' compensation claims; better SRHP performance; higher employee retention; and greater worker job satisfaction and engagement. The observed clustered relationships support the utility of integrated performance assessment in long-term care facilities.
Unsupervised feature relevance analysis applied to improve ECG heartbeat clustering.

PubMed

Rodríguez-Sotelo, J L; Peluffo-Ordoñez, D; Cuesta-Frau, D; Castellanos-Domínguez, G

2012-10-01

The computer-assisted analysis of biomedical records has become an essential tool in clinical settings. However, current devices provide a growing amount of data that often exceeds the processing capacity of normal computers. As this amount of information rises, new demands for more efficient data extracting methods appear. This paper addresses the task of data mining in physiological records using a feature selection scheme. An unsupervised method based on relevance analysis is described. This scheme uses a least-squares optimization of the input feature matrix in a single iteration. The output of the algorithm is a feature weighting vector. The performance of the method was assessed using a heartbeat clustering test on real ECG records. The quantitative cluster validity measures yielded a correctly classified heartbeat rate of 98.69% (specificity), 85.88% (sensitivity) and 95.04% (general clustering performance), which is even higher than the performance achieved by other similar ECG clustering studies. The number of features was reduced on average from 100 to 18, and the temporal cost was 43% lower than in previous ECG clustering schemes. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.

A hybrid monkey search algorithm for clustering analysis.

PubMed

Chen, Xin; Zhou, Yongquan; Luo, Qifang

2014-01-01

Clustering is a popular data analysis and data mining technique. The k-means clustering algorithm is one of the most commonly used methods. However, it depends heavily on the initial solution and easily falls into a local optimum. In view of these disadvantages of the k-means method, this paper proposes a hybrid monkey algorithm based on the search operator of the artificial bee colony algorithm for clustering analysis, and experiments on synthetic and real-life datasets show that the algorithm performs better than the basic monkey algorithm for clustering analysis.
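The dependence of k-means on its starting solution, noted in the record above, can be seen by running it from several random initialisations and comparing the final sums of squared errors. The sketch below does this with random restarts, a standard and much simpler mitigation than the hybrid monkey algorithm the paper proposes, which is not implemented here. All names and data points are hypothetical.

```python
import math
import random


def kmeans_once(points, k, rng, max_iter=100):
    """One k-means run from k randomly chosen data points; returns (sse, centers)."""
    centers = rng.sample(points, k)
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[idx].append(p)
        # Recompute each center as the mean of its group; keep empty groups in place.
        new_centers = [
            tuple(sum(x) / len(g) for x in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    sse = sum(min(math.dist(p, c) for c in centers) ** 2 for p in points)
    return sse, centers


def best_and_worst_restart(points, k, restarts=20, seed=0):
    """Run k-means from several random starts; return the best and worst runs."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    runs = [kmeans_once(points, k, rng) for _ in range(restarts)]
    return min(runs, key=lambda run: run[0]), max(runs, key=lambda run: run[0])


# Hypothetical data: three loose groups of three points each.
cloud = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
         (8.0, 8.0), (8.5, 9.0), (9.0, 8.0),
         (1.0, 9.0), (1.5, 8.5), (0.5, 9.5)]

best_run, worst_run = best_and_worst_restart(cloud, k=3)
print("best SSE:", best_run[0], "worst SSE:", worst_run[0])
```

A gap between the best and worst sums of squared errors indicates that some starts converged to poorer local optima, which is the weakness that motivates global search heuristics.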
Conveyor Performance based on Motor DC 12 Volt Eg-530ad-2f using K-Means Clustering

NASA Astrophysics Data System (ADS)

Arifin, Zaenal; Artini, Sri DP; Much Ibnu Subroto, Imam

2017-04-01

To produce goods in industry, a controlled tool to improve production is required. The separation process has become a part of the production process. Separation is carried out based on certain criteria to get an optimum result. By knowing the performance characteristics of a controlled tool in the separation process, it is also possible to obtain optimum results. Clustering analysis is a popular method for clustering data into smaller segments. It is useful for dividing a group of objects into k groups in which the member values of each group are homogeneous or similar. Similarity within a group is set based on certain criteria. The work in this paper uses the K-Means method to cluster the loading in the performance of a conveyor driven by a 12 volt DC motor eg-530-2f. This technique gives complete clustering data for a prototype conveyor driven by a DC motor to separate goods in terms of height. The parameters involved are voltage, current, and travelling time. These parameters give two clusters, namely an optimal cluster with cluster center 10.50 volt, 0.3 Ampere, 10.58 second, and a non-optimal cluster with cluster center 10.88 volt, 0.28 Ampere and 40.43 second.
A comparison of visual search strategies of elite and non-elite tennis players through cluster analysis.

PubMed

Murray, Nicholas P; Hunfalvay, Melissa

2017-02-01

Considerable research has documented that successful performance in interceptive tasks (such as return of serve in tennis) is based on the performers' capability to capture appropriate anticipatory information prior to the flight path of the approaching object. Athletes of higher skill tend to fixate on different locations in the playing environment prior to initiation of a skill than their lesser skilled counterparts. The purpose of this study was to examine visual search behaviour strategies of elite (world ranked) tennis players and non-ranked competitive tennis players (n = 43) utilising cluster analysis. The results of hierarchical (Ward's method) and nonhierarchical (k means) cluster analyses revealed three different clusters. The clustering method distinguished visual behaviour of high-, middle- and low-ranked players. Specifically, high-ranked players demonstrated longer mean fixation duration and lower variation of visual search than middle- and low-ranked players. In conclusion, the results demonstrated that cluster analysis is a useful tool for detecting and analysing the areas of interest for use in experimental analysis of expertise and to distinguish visual search variables among participants.
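For readers unfamiliar with Ward's method, the following is a minimal sketch of the agglomerative procedure in plain Python. It is not the software or data used in the study: the function names and the one-dimensional toy values are assumptions. At each step it merges the pair of clusters whose fusion increases the within-cluster sum of squares the least, using the standard merge cost n_a n_b / (n_a + n_b) times the squared distance between centroids.

```python
from itertools import combinations


def centroid(cluster):
    """Coordinate-wise mean of a non-empty list of points (tuples)."""
    n = len(cluster)
    return tuple(sum(p[d] for p in cluster) / n for d in range(len(cluster[0])))


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * d2


def ward_clustering(points, n_clusters):
    """Naive agglomerative clustering with Ward's criterion.

    Recomputes every pairwise cost at each step, so it is only
    suitable for small illustrative data sets.
    """
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


# Hypothetical one-dimensional values, cut into three clusters.
data = [(0.0,), (1.0,), (5.0,), (6.0,), (20.0,)]
result = sorted(sorted(c) for c in ward_clustering(data, 3))
print(result)
```

In practice a hierarchical solution like this is often used to choose the number of clusters, and k-means is then run with that number, which matches the two-stage design described in the abstract.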
Cluster Cooperation in Wireless-Powered Sensor Networks: Modeling and Performance Analysis.

PubMed

Zhang, Chao; Zhang, Pengcheng; Zhang, Weizhan

2017-09-27

A wireless-powered sensor network (WPSN) consisting of one hybrid access point (HAP), a near cluster and the corresponding far cluster is investigated in this paper. These sensors are wireless-powered and they transmit information by consuming the energy harvested from the signal emitted by the HAP. Sensors are able to harvest energy as well as store the harvested energy. We propose that if sensors in the near cluster do not have their own information to transmit, they can act as relays and help the sensors in the far cluster forward information to the HAP in an amplify-and-forward (AF) manner. We use a finite Markov chain to model the dynamic variation of the relay battery, and give a general analytical model for WPSN with cluster cooperation. Through the model, we deduce the closed-form expression for the outage probability as the metric of this network. Finally, simulation results validate the starting point of this design and the correctness of the theoretical analysis, and show how parameters affect system performance. Moreover, the outage probability of sensors in the far cluster can be drastically reduced without sacrificing the performance of sensors in the near cluster if the transmit power of the HAP is fairly high. Furthermore, in terms of the outage performance of the far cluster, the proposed scheme significantly outperforms the direct transmission scheme without cooperation.
Cluster Cooperation in Wireless-Powered Sensor Networks: Modeling and Performance Analysis

PubMed Central

Zhang, Chao; Zhang, Pengcheng; Zhang, Weizhan

2017-01-01

A wireless-powered sensor network (WPSN) consisting of one hybrid access point (HAP), a near cluster and the corresponding far cluster is investigated in this paper. These sensors are wireless-powered and they transmit information by consuming the energy harvested from the signal emitted by the HAP. Sensors are able to harvest energy as well as store the harvested energy. We propose that if sensors in the near cluster do not have their own information to transmit, they can act as relays and help the sensors in the far cluster forward information to the HAP in an amplify-and-forward (AF) manner. We use a finite Markov chain to model the dynamic variation of the relay battery, and give a general analytical model for WPSN with cluster cooperation. Through the model, we deduce the closed-form expression for the outage probability as the metric of this network. Finally, simulation results validate the starting point of this design and the correctness of the theoretical analysis, and show how parameters affect system performance. Moreover, the outage probability of sensors in the far cluster can be drastically reduced without sacrificing the performance of sensors in the near cluster if the transmit power of the HAP is fairly high. Furthermore, in terms of the outage performance of the far cluster, the proposed scheme significantly outperforms the direct transmission scheme without cooperation. PMID:28953231
Goal Profiles, Mental Toughness and its Influence on Performance Outcomes among Wushu Athletes

PubMed Central

Roy, Jolly

2007-01-01

This study examined the association between goal orientations and mental toughness and its influence on performance outcomes in competition. Wushu athletes (n = 40) competing in Intervarsity championships in Malaysia completed the Task and Ego Orientations in Sport Questionnaire (TEOSQ) and the Psychological Performance Inventory (PPI). Using cluster analysis techniques including hierarchical methods and the non-hierarchical method (k-means cluster) to examine goal profiles, a three-cluster solution emerged, viz. cluster 1 - high task and moderate ego (HT/ME), cluster 2 - moderate task and low ego (MT/LE), and cluster 3 - moderate task and moderate ego (MT/ME). Analysis of the fundamental areas of mental toughness based on goal profiles revealed that athletes in cluster 1 scored significantly higher on negative energy control than athletes in cluster 2. Further, athletes in cluster 1 also scored significantly higher on positive energy control than athletes in cluster 3. A chi-square (χ2) test revealed no significant differences among athletes with different goal profiles on performance outcomes in the competition. However, significant differences were observed between athletes (medallists and non-medallists) in self-confidence (p = 0.001) and negative energy control (p = 0.042). Medallists scored significantly higher on self-confidence (mean = 21.82 ± 2.72) and negative energy control (mean = 19.59 ± 2.32) than the non-medallists (self-confidence mean = 18.76 ± 2.49; negative energy control mean = 18.14 ± 1.91). Key points: Mental toughness can be influenced by certain goal profile combinations. Athletes with successful outcomes in performance (medallists) displayed greater mental toughness. PMID:24198700
Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale

PubMed Central

Kobourov, Stephen; Gallant, Mike; Börner, Katy

2016-01-01

Overview: Notions of community quality underlie the clustering of networks. While studies surrounding network clustering are increasingly common, a precise understanding of the relationship between different cluster quality metrics is unknown. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely-used network clustering algorithms: Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. Cluster Quality Metrics: We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on the information recovery metrics. Additionally, our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. Network Clustering Algorithms: Smart local moving is the overall best performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it an absolutely superior algorithm. Interestingly, Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain.
We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large graphs with well-defined clusters. PMID:27391786
Clusters of Insomnia Disorder: An Exploratory Cluster Analysis of Objective Sleep Parameters Reveals Differences in Neurocognitive Functioning, Quantitative EEG, and Heart Rate Variability.

PubMed

Miller, Christopher B; Bartlett, Delwyn J; Mullins, Anna E; Dodds, Kirsty L; Gordon, Christopher J; Kyle, Simon D; Kim, Jong Won; D'Rozario, Angela L; Lee, Rico S C; Comas, Maria; Marshall, Nathaniel S; Yee, Brendon J; Espie, Colin A; Grunstein, Ronald R

2016-11-01

To empirically derive and evaluate potential clusters of Insomnia Disorder through cluster analysis from polysomnography (PSG). We hypothesized that clusters would differ on neurocognitive performance, sleep-onset measures of quantitative (q)-EEG and heart rate variability (HRV). Research volunteers with Insomnia Disorder (DSM-5) completed a neurocognitive assessment and overnight PSG measures of total sleep time (TST), wake time after sleep onset (WASO), and sleep onset latency (SOL) were used to determine clusters. From 96 volunteers with Insomnia Disorder, cluster analysis derived at least two clusters from objective sleep parameters: Insomnia with normal objective sleep duration (I-NSD: n = 53) and Insomnia with short sleep duration (I-SSD: n = 43). At sleep onset, differences in HRV between I-NSD and I-SSD clusters suggest attenuated parasympathetic activity in I-SSD (P < 0.05). Preliminary work suggested three clusters by retaining the I-NSD and splitting the I-SSD cluster into two: I-SSD A (n = 29): defined by high WASO and I-SSD B (n = 14): a second I-SSD cluster with high SOL and medium WASO. The I-SSD B cluster performed worse than I-SSD A and I-NSD for sustained attention (P ≤ 0.05). In an exploratory analysis, q-EEG revealed reduced spectral power also in I-SSD B before (Delta, Alpha, Beta-1) and after sleep-onset (Beta-2) compared to I-SSD A and I-NSD (P ≤ 0.05). Two insomnia clusters derived from cluster analysis differ in sleep onset HRV. Preliminary data suggest evidence for three clusters in insomnia with differences for sustained attention and sleep-onset q-EEG. Insomnia 100 sleep study: Australia New Zealand Clinical Trials Registry (ANZCTR) identification number 12612000049875. URL: https://www.anzctr.org.au/Trial/Registration/TrialReview.aspx?id=347742. © 2016 Associated Professional Sleep Societies, LLC.
Validity analysis on merged and averaged data using within and between analysis: focus on effect of qualitative social capital on self-rated health.

PubMed

Shin, Sang Soo; Shin, Young-Jeon

2016-01-01

With an increasing number of studies highlighting regional social capital (SC) as a determinant of health, many studies are using multi-level analysis with merged and averaged scores of community residents' survey responses calculated from community SC data. Sufficient examination is required to validate if the merged and averaged data can represent the community. Therefore, this study analyzes the validity of the selected indicators and their applicability in multi-level analysis. Within and between analysis (WABA) was performed after creating community variables using merged and averaged data of community residents' responses from the 2013 Community Health Survey in Korea, using subjective self-rated health assessment as a dependent variable. Further analysis was performed following the model suggested by WABA result. Both E-test results (1) and WABA results (2) revealed that single-level analysis needs to be performed using qualitative SC variable with cluster mean centering. Through single-level multivariate regression analysis, qualitative SC with cluster mean centering showed positive effect on self-rated health (0.054, p<0.001), although there was no substantial difference in comparison to analysis using SC variables without cluster mean centering or multi-level analysis. As modification in qualitative SC was larger within the community than between communities, we validate that relational analysis of individual self-rated health can be performed within the group, using cluster mean centering. Other tests besides the WABA can be performed in the future to confirm the validity of using community variables and their applicability in multi-level analysis.
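Cluster mean centering, as mentioned above, subtracts each community's own mean from the responses of its residents, so that only within-community variation remains. The sketch below illustrates the operation in plain Python; the function name, community labels and scores are hypothetical and are not taken from the survey.

```python
def cluster_mean_center(groups):
    """Subtract each group's mean from its members' values.

    `groups` maps a group label to a list of individual scores.
    """
    centered = {}
    for name, values in groups.items():
        mean = sum(values) / len(values)
        centered[name] = [v - mean for v in values]
    return centered


# Hypothetical social capital scores for residents of two communities.
scores = {"community_a": [1, 3], "community_b": [10, 20]}
centered = cluster_mean_center(scores)
print(centered)
```

After centering, every community has mean zero, so a regression on the centered variable estimates the within-community association and is unaffected by differences between community averages.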
Clustering stocks using partial correlation coefficients

NASA Astrophysics Data System (ADS)

Jung, Sean S.; Chang, Woojin

2016-11-01

A partial correlation analysis is performed on the Korean stock market (KOSPI). The difference between Pearson correlation and the partial correlation is analyzed and it is found that when conditioned on the market return, Pearson correlation coefficients are generally greater than those of the partial correlation, which implies that the market return tends to drive up the correlation between stock returns. A clustering analysis is then performed to study the market structure given by the partial correlation analysis and the members of the clusters are compared with the Global Industry Classification Standard (GICS). The initial hypothesis is that the firms in the same GICS sector are clustered together since they are in a similar business and environment. However, the result is inconsistent with the hypothesis and most clusters are a mix of multiple sectors, suggesting that the traditional approach of using sectors to determine the proximity between stocks may not be sufficient to diversify a portfolio.
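The effect of conditioning on the market return can be sketched with the standard first-order partial correlation formula, r_xy.z = (r_xy - r_xz r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)). The code below is an illustration in plain Python, not the estimator or data used in the paper: the function names and the three short series are assumptions, with two hypothetical stocks constructed as a common "market" series plus unrelated noise.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y given z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Hypothetical returns: each stock is the market plus its own noise,
# and the two noise series are uncorrelated with the market and each other.
market = [1, 1, -1, -1]
stock_a = [2, 0, 0, -2]   # market + [1, -1, 1, -1]
stock_b = [2, 0, -2, 0]   # market + [1, -1, -1, 1]

raw = pearson(stock_a, stock_b)
conditioned = partial_corr(stock_a, stock_b, market)
print(raw, conditioned)
```

In this constructed case the two stocks look moderately correlated, but the correlation vanishes (up to rounding) once the shared market component is removed, which is the pattern the abstract describes as the market return driving up correlations.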
A comparison of heuristic and model-based clustering methods for dietary pattern analysis.

PubMed

Greve, Benjamin; Pigeot, Iris; Huybrechts, Inge; Pala, Valeria; Börnhorst, Claudia

2016-02-01

Cluster analysis is widely applied to identify dietary patterns. A new method based on Gaussian mixture models (GMM) seems to be more flexible compared with the commonly applied k-means and Ward's method. In the present paper, these clustering approaches are compared to find the most appropriate one for clustering dietary data. The clustering methods were applied to simulated data sets with different cluster structures to compare their performance knowing the true cluster membership of observations. Furthermore, the three methods were applied to FFQ data assessed in 1791 children participating in the IDEFICS (Identification and Prevention of Dietary- and Lifestyle-Induced Health Effects in Children and Infants) Study to explore their performance in practice. The GMM outperformed the other methods in the simulation study in 72 % up to 100 % of cases, depending on the simulated cluster structure. Comparing the computationally less complex k-means and Ward's methods, the performance of k-means was better in 64-100 % of cases. Applied to real data, all methods identified three similar dietary patterns which may be roughly characterized as a 'non-processed' cluster with a high consumption of fruits, vegetables and wholemeal bread, a 'balanced' cluster with only slight preferences of single foods and a 'junk food' cluster. The simulation study suggests that clustering via GMM should be preferred due to its higher flexibility regarding cluster volume, shape and orientation. The k-means seems to be a good alternative, being easier to use while giving similar results when applied to real data.
An effective fuzzy kernel clustering analysis approach for gene expression data.

PubMed

Sun, Lin; Xu, Jiucheng; Yin, Jiaojiao

2015-01-01

Fuzzy clustering is an important tool for analyzing microarray data. A major problem in applying fuzzy clustering method to microarray gene expression data is the choice of parameters with cluster number and centers. This paper proposes a new approach to fuzzy kernel clustering analysis (FKCA) that identifies desired cluster number and obtains more steady results for gene expression data. First of all, to optimize characteristic differences and estimate optimal cluster number, Gaussian kernel function is introduced to improve spectrum analysis method (SAM). By combining subtractive clustering with max-min distance mean, maximum distance method (MDM) is proposed to determine cluster centers. Then, the corresponding steps of improved SAM (ISAM) and MDM are given respectively, whose superiority and stability are illustrated through performing experimental comparisons on gene expression data. Finally, by introducing ISAM and MDM into FKCA, an effective improved FKCA algorithm is proposed. Experimental results from public gene expression data and UCI database show that the proposed algorithms are feasible for cluster analysis, and the clustering accuracy is higher than the other related clustering algorithms.
On the Partitioning of Squared Euclidean Distance and Its Applications in Cluster Analysis.

ERIC Educational Resources Information Center

Carter, Randy L.; And Others

1989-01-01

The partitioning of squared Euclidean--E(sup 2)--distance between two vectors in M-dimensional space into the sum of squared lengths of vectors in mutually orthogonal subspaces is discussed. Applications to specific cluster analysis problems are provided (i.e., to design Monte Carlo studies for performance comparisons of several clustering methods…
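One familiar instance of such a partition, given here only as an illustration since the abstract does not specify which subspaces the authors use, splits the squared distance into a part along the constant vector (difference in mean level) and a part in its orthogonal complement (difference between the mean-centered profiles). The function name and the example vectors are hypothetical.

```python
def partition_sq_distance(x, y):
    """Split squared Euclidean distance into two orthogonal parts.

    Returns (level, residual), where `level` is the squared length of the
    difference projected onto the constant vector and `residual` is the
    squared length of the remainder, so that level + residual equals the
    full squared distance.
    """
    m = len(x)
    mx, my = sum(x) / m, sum(y) / m
    level = m * (mx - my) ** 2
    residual = sum(((a - mx) - (b - my)) ** 2 for a, b in zip(x, y))
    return level, residual


# Hypothetical profiles in 3-dimensional space.
x = (1, 2, 3)
y = (2, 4, 9)
level, residual = partition_sq_distance(x, y)
total = sum((a - b) ** 2 for a, b in zip(x, y))
print(level, residual, total)
```

Because the two components lie in orthogonal subspaces, their squared lengths add up exactly to the squared distance, which makes it possible to control each component separately when generating test data.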
Connecting Different Data Sources to Assess the Interconnections between Biosecurity, Health, Welfare, and Performance in Commercial Pig Farms in Great Britain.

PubMed

Pandolfi, Fanny; Edwards, Sandra A; Maes, Dominiek; Kyriazakis, Ilias

2018-01-01

This study aimed to provide an overview of the interconnections between biosecurity, health, welfare, and performance in commercial pig farms in Great Britain. We collected on-farm data about the level of biosecurity and animal performance in 40 fattening pig farms and 28 breeding pig farms between 2015 and 2016. We identified interconnections between these data, slaughterhouse health indicators, and welfare indicator records in fattening pig farms. After achieving the connections between databases, a secondary data analysis was performed to assess the interconnections between biosecurity, health, welfare, and performance using correlation analysis, principal component analysis, and hierarchical clustering. Although we could connect the different data sources the final sample size was limited, suggesting room for improvement in database connection to conduct secondary data analyses. The farm biosecurity scores ranged from 40 to 90 out of 100, with internal biosecurity scores being lower than external biosecurity scores. Our analysis suggested several interconnections between health, welfare, and performance. The initial correlation analysis showed that the prevalence of lameness and severe tail lesions was associated with the prevalence of enzootic pneumonia-like lesions and pyaemia, and the prevalence of severe body marks was associated with several disease indicators, including peritonitis and milk spots ( r > 0.3; P < 0.05). Higher average daily weight gain (ADG) was associated with lower prevalence of pleurisy ( r > 0.3; P < 0.05), but no connection was identified between mortality and health indicators. A subsequent cluster analysis enabled identification of patterns which considered concurrently indicators of health, welfare, and performance. Farms from cluster 1 had lower biosecurity scores, lower ADG, and higher prevalence of several disease and welfare indicators. 
Farms from cluster 2 had higher biosecurity scores than cluster 1, but a higher prevalence of pigs requiring hospitalization and lameness which confirmed the correlation between biosecurity and the prevalence of pigs requiring hospitalization ( r > 0.3; P < 0.05). Farms from cluster 3 had higher biosecurity, higher ADG, and lower prevalence for some disease and welfare indicators. The study suggests a smaller impact of biosecurity on issues such as mortality, prevalence of lameness, and pigs requiring hospitalization. The correlations and the identified clusters suggested the importance of animal welfare for the pig industry.
Clinical Implications of Cluster Analysis-Based Classification of Acute Decompensated Heart Failure and Correlation with Bedside Hemodynamic Profiles.

PubMed

Ahmad, Tariq; Desai, Nihar; Wilson, Francis; Schulte, Phillip; Dunning, Allison; Jacoby, Daniel; Allen, Larry; Fiuzat, Mona; Rogers, Joseph; Felker, G Michael; O'Connor, Christopher; Patel, Chetan B

2016-01-01

Classification of acute decompensated heart failure (ADHF) is based on subjective criteria that crudely capture disease heterogeneity. Improved phenotyping of the syndrome may help improve therapeutic strategies. To derive cluster analysis-based groupings for patients hospitalized with ADHF, and compare their prognostic performance to hemodynamic classifications derived at the bedside. We performed a cluster analysis on baseline clinical variables and PAC measurements of 172 ADHF patients from the ESCAPE trial. Employing regression techniques, we examined associations between clusters and clinically determined hemodynamic profiles (warm/cold/wet/dry). We assessed association with clinical outcomes using Cox proportional hazards models. Likelihood ratio tests were used to compare the prognostic value of cluster data to that of hemodynamic data. We identified four advanced HF clusters: 1) male Caucasians with ischemic cardiomyopathy, multiple comorbidities, lowest B-type natriuretic peptide (BNP) levels; 2) females with non-ischemic cardiomyopathy, few comorbidities, most favorable hemodynamics; 3) young African American males with non-ischemic cardiomyopathy, most adverse hemodynamics, advanced disease; and 4) older Caucasians with ischemic cardiomyopathy, concomitant renal insufficiency, highest BNP levels. There was no association between clusters and bedside-derived hemodynamic profiles (p = 0.70). For all adverse clinical outcomes, Cluster 4 had the highest risk, and Cluster 2, the lowest. Compared to Cluster 4, Clusters 1-3 had 45-70% lower risk of all-cause mortality. Clusters were significantly associated with clinical outcomes, whereas hemodynamic profiles were not. By clustering patients with similar objective variables, we identified four clinically relevant phenotypes of ADHF patients, with no discernable relationship to hemodynamic profiles, but distinct associations with adverse outcomes. 
Our analysis suggests that ADHF classification using simultaneous considerations of etiology, comorbid conditions, and biomarker levels, may be superior to bedside classifications.
Gathering Real World Evidence with Cluster Analysis for Clinical Decision Support.

PubMed

Xia, Eryu; Liu, Haifeng; Li, Jing; Mei, Jing; Li, Xuejun; Xu, Enliang; Li, Xiang; Hu, Gang; Xie, Guotong; Xu, Meilin

2017-01-01

Clinical decision support systems are information technology systems that assist clinical decision-making tasks, which have been shown to enhance clinical performance. Cluster analysis, which groups similar patients together, aims to separate patient cases into phenotypically heterogeneous groups and define therapeutically homogeneous patient subclasses. Useful as it is, the application of cluster analysis in clinical decision support systems is rarely reported. Here, we describe the usage of cluster analysis in clinical decision support systems, by first dividing patient cases into similar groups and then providing diagnosis or treatment suggestions based on the group profiles. This integration provides data for clinical decisions and compiles a wide range of clinical practices to inform the performance of individual clinicians. We also include an example usage of the system under the scenario of blood lipid management in type 2 diabetes. These efforts represent a step toward promoting patient-centered care and enabling precision medicine.
The Social Life of Learning Analytics: Cluster Analysis and the 'Performance' of Algorithmic Education

ERIC Educational Resources Information Center

Perrotta, Carlo; Williamson, Ben

2018-01-01

This paper argues that methods used for the classification and measurement of online education are not neutral and objective, but involved in the creation of the educational realities they claim to measure. In particular, the paper draws on material semiotics to examine cluster analysis as a 'performative device' that, to a significant extent,…
Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale.

PubMed

Emmons, Scott; Kobourov, Stephen; Gallant, Mike; Börner, Katy

2016-01-01

Notions of community quality underlie the clustering of networks. While studies surrounding network clustering are increasingly common, a precise understanding of the relationship between different cluster quality metrics is unknown. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely-used network clustering algorithms: Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on the information recovery metrics. Additionally, our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. Smart local moving is the overall best performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it an absolutely superior algorithm. Interestingly, Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain. We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large graphs with well-defined clusters.
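Two of the stand-alone metrics named above can be computed directly from an edge list. The following sketch uses the common textbook definitions for an undirected, unweighted graph: modularity as the sum over communities of (internal edges / m) minus (total degree / 2m) squared, and conductance of a node set as cut edges divided by the smaller of the two volumes. The function names and the six-node example graph are assumptions, and the exact variants used in the study (for example, how conductance is aggregated over clusters) may differ.

```python
def modularity(edges, community):
    """Newman modularity of a partition of an undirected, unweighted graph.

    `edges` is a list of (u, v) pairs; `community` maps node -> label.
    """
    m = len(edges)
    internal = {}  # edges with both endpoints in the community
    degree = {}    # sum of node degrees per community
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())


def conductance(edges, nodes):
    """Cut edges divided by the smaller volume of the set and its complement."""
    s = set(nodes)
    cut = sum((u in s) != (v in s) for u, v in edges)
    vol_in = sum((u in s) + (v in s) for u, v in edges)
    vol_out = 2 * len(edges) - vol_in
    return cut / min(vol_in, vol_out)


# Hypothetical graph: two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
labels = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}

q = modularity(edges, labels)
phi = conductance(edges, [0, 1, 2])
print(q, phi)
```

Higher modularity and lower conductance both indicate a better separated partition, but they weigh internal density and boundary size differently, which is one reason the metrics can disagree on the same clustering.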
Research on retailer data clustering algorithm based on Spark

NASA Astrophysics Data System (ADS)

Huang, Qiuman; Zhou, Feng

2017-03-01

Big data analysis is a hot topic in the IT field now. Spark is a high-reliability and high-performance distributed parallel computing framework for big data sets. K-means algorithm is one of the classical partition methods in clustering algorithm. In this paper, we study the k-means clustering algorithm on Spark. Firstly, the principle of the algorithm is analyzed, and then the clustering analysis is carried out on the supermarket customers through the experiment to find out the different shopping patterns. At the same time, this paper proposes the parallelization of k-means algorithm and the distributed computing framework of Spark, and gives the concrete design scheme and implementation scheme. This paper uses the two-year sales data of a supermarket to validate the proposed clustering algorithm and achieve the goal of subdividing customers, and then analyze the clustering results to help enterprises to take different marketing strategies for different customer groups to improve sales performance.
Using Cluster Bootstrapping to Analyze Nested Data With a Few Clusters.

PubMed

Huang, Francis L

2018-04-01

Cluster randomized trials involving participants nested within intact treatment and control groups are commonly performed in various educational, psychological, and biomedical studies. However, recruiting and retaining intact groups present various practical, financial, and logistical challenges to evaluators, and cluster randomized trials are often performed with a low number of clusters (~20 groups). Although multilevel models are often used to analyze nested data, researchers may be concerned about potentially biased results due to having only a few groups under study. Cluster bootstrapping has been suggested as an alternative procedure when analyzing clustered data, though it has seen very little use in educational and psychological studies. Using a Monte Carlo simulation that varied the number of clusters, average cluster size, and intraclass correlations, we compared standard errors using cluster bootstrapping with those derived using ordinary least squares regression and multilevel models. Results indicate that cluster bootstrapping, though more computationally demanding, can be used as an alternative procedure for the analysis of clustered data when treatment effects at the group level are of primary interest. Supplementary material showing how to perform cluster bootstrapped regressions using R is also provided.
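The core idea of cluster bootstrapping is to resample whole clusters with replacement, keeping every observation of a drawn cluster together, and to recompute the statistic on each resample. The sketch below shows this for a simple mean in plain Python; it is not the R supplement mentioned in the abstract, which bootstraps regressions, and the function name, parameters and toy data are assumptions.

```python
import random
import statistics


def cluster_bootstrap_se(clusters, statistic, n_boot=1000, seed=0):
    """Bootstrap standard error of `statistic`, resampling intact clusters.

    `clusters` maps a cluster id to the list of observations in it.
    """
    rng = random.Random(seed)
    ids = list(clusters)
    estimates = []
    for _ in range(n_boot):
        # Draw as many clusters as in the original data, with replacement.
        drawn = [rng.choice(ids) for _ in ids]
        sample = [v for cid in drawn for v in clusters[cid]]
        estimates.append(statistic(sample))
    return statistics.stdev(estimates)


# Hypothetical outcome scores for students nested in three classrooms.
classrooms = {"c1": [1.0, 2.0], "c2": [5.0, 6.0], "c3": [9.0, 10.0]}
se_mean = cluster_bootstrap_se(classrooms, statistics.mean, n_boot=200)

# Sanity check: identical clusters give no between-cluster variability.
se_zero = cluster_bootstrap_se({"a": [1, 2, 3], "b": [1, 2, 3]}, statistics.mean, n_boot=50)
print(se_mean, se_zero)
```

Resampling at the cluster level preserves the dependence among observations within a group, which is why the resulting standard errors reflect the number of clusters rather than the number of individuals.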

Investigating the usefulness of a cluster-based trend analysis to detect visual field progression in patients with open-angle glaucoma.

PubMed

Aoki, Shuichiro; Murata, Hiroshi; Fujino, Yuri; Matsuura, Masato; Miki, Atsuya; Tanito, Masaki; Mizoue, Shiro; Mori, Kazuhiko; Suzuki, Katsuyoshi; Yamashita, Takehiro; Kashiwagi, Kenji; Hirasawa, Kazunori; Shoji, Nobuyuki; Asaoka, Ryo

2017-12-01

To investigate the usefulness of the Octopus (Haag-Streit) EyeSuite's cluster trend analysis in glaucoma. Ten visual fields (VFs) with the Humphrey Field Analyzer (Carl Zeiss Meditec), spanning 7.7 years on average, were obtained from 728 eyes of 475 primary open angle glaucoma patients. Mean total deviation (mTD) trend analysis and EyeSuite's cluster trend analysis were performed on various series of VFs (from 1st to 10th: VF1-10 to 6th to 10th: VF6-10). The results of the cluster-based trend analysis, based on different lengths of VF series, were compared against mTD trend analysis. Cluster-based trend analysis and mTD trend analysis results were significantly associated in all clusters and with all lengths of VF series. Between 21.2% and 45.9% (depending on VF series length and location) of clusters were deemed to progress when the mTD trend analysis suggested no progression. On the other hand, 4.8% of eyes were observed to progress using the mTD trend analysis when cluster trend analysis suggested no progression in any two (or more) clusters. Whole field trend analysis can miss local VF progression. Cluster trend analysis appears as robust as mTD trend analysis and useful to assess both sectorial and whole field progression. Cluster-based trend analyses, in particular the definition of two or more progressing clusters, may help clinicians to detect glaucomatous progression in a timelier manner than using a whole field trend analysis, without significantly compromising specificity. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Analysis of candidates for interacting galaxy clusters. I. A1204 and A2029/A2033

NASA Astrophysics Data System (ADS)

Gonzalez, Elizabeth Johana; de los Rios, Martín; Oio, Gabriel A.; Lang, Daniel Hernández; Tagliaferro, Tania Aguirre; Domínguez R., Mariano J.; Castellón, José Luis Nilo; Cuevas L., Héctor; Valotto, Carlos A.

2018-04-01

Context. Merging galaxy clusters allow for the study of different mass components, dark and baryonic, separately. Also, their occurrence makes it possible to test the ΛCDM scenario, which can be used to put constraints on the self-interacting cross-section of the dark-matter particle. Aim. It is necessary to perform a homogeneous analysis of these systems. Hence, based on a recently presented sample of candidates for interacting galaxy clusters, we present the analysis of two of these cataloged systems. Methods: In this work, the first of a series devoted to characterizing galaxy clusters in merger processes, we perform a weak lensing analysis of clusters A1204 and A2029/A2033 to derive the total masses of each identified interacting structure together with a dynamical study based on a two-body model. We also describe the gas and the mass distributions in the field through a lensing and an X-ray analysis. This is the first of a series of works which will analyze these types of systems in order to characterize them. Results: Neither merging cluster candidate shows evidence of having had a recent merger event. Nevertheless, there is dynamical evidence that these systems could be interacting or could interact in the future. Conclusions: It is necessary to include more constraints in order to improve the methodology of classifying merging galaxy clusters. Characterization of these clusters is important in order to properly understand the nature of these systems and their connection with dynamical studies.
Transcriptional and Chromatin Dynamics of Muscle Regeneration After Severe Trauma

DTIC Science & Technology

2016-10-12

performed pathway analysis of the time-clustered RNA-Seq data16 and showed an initial burst of pro-inflammatory and immune-response transcripts in the...143 showed dynamic behavior (See Methods) and analysis of the dynamic miRNAs reinforced many of the results observed from the RNA-Seq datasets...excellent agreement was viewed. Hierarchical clustering of the datasets through time revealed 5 clusters, and gene ontology (GO) analysis of the
Cluster randomised crossover trials with binary data and unbalanced cluster sizes: application to studies of near-universal interventions in intensive care.

PubMed

Forbes, Andrew B; Akram, Muhammad; Pilcher, David; Cooper, Jamie; Bellomo, Rinaldo

2015-02-01

Cluster randomised crossover trials have been utilised in recent years in the health and social sciences. Methods for analysis have been proposed; however, for binary outcomes, these have received little assessment of their appropriateness. In addition, methods for determination of sample size are currently limited to balanced cluster sizes both between clusters and between periods within clusters. This article aims to extend this work to unbalanced situations and to evaluate the properties of a variety of methods for analysis of binary data, with a particular focus on the setting of potential trials of near-universal interventions in intensive care to reduce in-hospital mortality. We derive a formula for sample size estimation for unbalanced cluster sizes, and apply it to the intensive care setting to demonstrate the utility of the cluster crossover design. We conduct a numerical simulation of the design in the intensive care setting and for more general configurations, and we assess the performance of three cluster summary estimators and an individual-data estimator based on binomial-identity-link regression. For settings similar to the intensive care scenario involving large cluster sizes and small intra-cluster correlations, the sample size formulae developed and analysis methods investigated are found to be appropriate, with the unweighted cluster summary method performing well relative to the more optimal but more complex inverse-variance weighted method. More generally, we find that the unweighted and cluster-size-weighted summary methods perform well, with the relative efficiency of each largely determined systematically from the study design parameters. Performance of individual-data regression is adequate with small cluster sizes but becomes inefficient for large, unbalanced cluster sizes. 
When outcome prevalences are 6% or less and the within-cluster-within-period correlation is 0.05 or larger, all methods display sub-nominal confidence interval coverage, with the less prevalent the outcome the worse the coverage. As with all simulation studies, conclusions are limited to the configurations studied. We confined attention to detecting intervention effects on an absolute risk scale using marginal models and did not explore properties of binary random effects models. Cluster crossover designs with binary outcomes can be analysed using simple cluster summary methods, and sample size in unbalanced cluster size settings can be determined using relatively straightforward formulae. However, caution needs to be applied in situations with low prevalence outcomes and moderate to high intra-cluster correlations. © The Author(s) 2014.
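The unweighted cluster summary method mentioned in the abstract above can be illustrated with a minimal sketch. It assumes each cluster contributes one intervention period and one control period, and it ignores period effects; the function name and the ICU counts are hypothetical and are not taken from the paper.

```python
from math import sqrt
from statistics import mean, stdev

def unweighted_cluster_summary(clusters):
    """Each cluster is (events_intervention, n_intervention,
    events_control, n_control).

    Returns the mean within-cluster risk difference and its standard
    error, giving every cluster equal weight regardless of its size.
    """
    diffs = [ei / ni - ec / nc for ei, ni, ec, nc in clusters]
    estimate = mean(diffs)
    std_error = stdev(diffs) / sqrt(len(diffs))
    return estimate, std_error

# Hypothetical data: four ICUs, each observed in one intervention
# period and one control period (in-hospital deaths, admissions).
icus = [
    (18, 200, 24, 200),
    (45, 500, 55, 500),
    (9, 150, 12, 150),
    (70, 1000, 90, 1000),
]
estimate, std_error = unweighted_cluster_summary(icus)
print(f"risk difference = {estimate:.4f}, SE = {std_error:.4f}")
```

Because every cluster counts equally, very unequal cluster sizes cost some efficiency compared with inverse-variance weighting, which is the trade-off the abstract describes.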
Joint Analysis of X-Ray and Sunyaev-Zel'Dovich Observations of Galaxy Clusters Using an Analytic Model of the Intracluster Medium

NASA Technical Reports Server (NTRS)

Hasler, Nicole; Bulbul, Esra; Bonamente, Massimiliano; Carlstrom, John E.; Culverhouse, Thomas L.; Gralla, Megan; Greer, Christopher; Lamb, James W.; Hawkins, David; Hennessy, Ryan;

2012-01-01

We perform a joint analysis of X-ray and Sunyaev-Zel'dovich effect data using an analytic model that describes the gas properties of galaxy clusters. The joint analysis allows the measurement of the cluster gas mass fraction profile and Hubble constant independent of cosmological parameters. Weak cosmological priors are used to calculate the overdensity radius within which the gas mass fractions are reported. Such an analysis can provide direct constraints on the evolution of the cluster gas mass fraction with redshift. We validate the model and the joint analysis on high signal-to-noise data from the Chandra X-ray Observatory and the Sunyaev-Zel'dovich Array for two clusters, A2631 and A2204.

Clusters of Insomnia Disorder: An Exploratory Cluster Analysis of Objective Sleep Parameters Reveals Differences in Neurocognitive Functioning, Quantitative EEG, and Heart Rate Variability

PubMed Central

Miller, Christopher B.; Bartlett, Delwyn J.; Mullins, Anna E.; Dodds, Kirsty L.; Gordon, Christopher J.; Kyle, Simon D.; Kim, Jong Won; D'Rozario, Angela L.; Lee, Rico S.C.; Comas, Maria; Marshall, Nathaniel S.; Yee, Brendon J.; Espie, Colin A.; Grunstein, Ronald R.

2016-01-01

Study Objectives: To empirically derive and evaluate potential clusters of Insomnia Disorder through cluster analysis from polysomnography (PSG). We hypothesized that clusters would differ on neurocognitive performance, sleep-onset measures of quantitative (q)-EEG and heart rate variability (HRV). Methods: Research volunteers with Insomnia Disorder (DSM-5) completed a neurocognitive assessment and overnight PSG measures of total sleep time (TST), wake time after sleep onset (WASO), and sleep onset latency (SOL) were used to determine clusters. Results: From 96 volunteers with Insomnia Disorder, cluster analysis derived at least two clusters from objective sleep parameters: Insomnia with normal objective sleep duration (I-NSD: n = 53) and Insomnia with short sleep duration (I-SSD: n = 43). At sleep onset, differences in HRV between I-NSD and I-SSD clusters suggest attenuated parasympathetic activity in I-SSD (P < 0.05). Preliminary work suggested three clusters by retaining the I-NSD and splitting the I-SSD cluster into two: I-SSD A (n = 29): defined by high WASO and I-SSD B (n = 14): a second I-SSD cluster with high SOL and medium WASO. The I-SSD B cluster performed worse than I-SSD A and I-NSD for sustained attention (P ≤ 0.05). In an exploratory analysis, q-EEG revealed reduced spectral power also in I-SSD B before (Delta, Alpha, Beta-1) and after sleep-onset (Beta-2) compared to I-SSD A and I-NSD (P ≤ 0.05). Conclusions: Two insomnia clusters derived from cluster analysis differ in sleep onset HRV. Preliminary data suggest evidence for three clusters in insomnia with differences for sustained attention and sleep-onset q-EEG. Clinical Trial Registration: Insomnia 100 sleep study: Australia New Zealand Clinical Trials Registry (ANZCTR) identification number 12612000049875. URL: https://www.anzctr.org.au/Trial/Registration/TrialReview.aspx?id=347742. 
Citation: Miller CB, Bartlett DJ, Mullins AE, Dodds KL, Gordon CJ, Kyle SD, Kim JW, D'Rozario AL, Lee RS, Comas M, Marshall NS, Yee BJ, Espie CA, Grunstein RR. Clusters of Insomnia Disorder: an exploratory cluster analysis of objective sleep parameters reveals differences in neurocognitive functioning, quantitative EEG, and heart rate variability. SLEEP 2016;39(11):1993–2004. PMID:27568796
Changing the paradigm: messages for hand hygiene education and audit from cluster analysis.

PubMed

Gould, D J; Navaie, D; Purssell, E; Drey, N S; Creedon, S

2018-04-01

Hand hygiene is considered to be the foremost infection prevention measure. How healthcare workers accept and make sense of the hand hygiene message is likely to contribute to the success and sustainability of initiatives to improve performance, which is often poor. A survey of nurses in critical care units in three National Health Service trusts in England was undertaken to explore opinions about hand hygiene, use of alcohol hand rubs, audit with performance feedback, and other key hand-hygiene-related issues. Data were analysed descriptively and subjected to cluster analysis. Three main clusters of opinion were visualized, each forming a significant group: positive attitudes, pragmatism and scepticism. A smaller cluster suggested possible guilt about ability to perform hand hygiene. Cluster analysis identified previously unsuspected constellations of beliefs about hand hygiene that offer a plausible explanation for behaviour. Healthcare workers might respond to education and audit differently according to these beliefs. Those holding predominantly positive opinions might comply with hand hygiene policy and perform well as infection prevention link nurses and champions. Those holding pragmatic attitudes are likely to respond favourably to the need for professional behaviour and need to protect themselves from infection. Greater persuasion may be needed to encourage those who are sceptical about the importance of hand hygiene to comply with guidelines. Interventions to increase compliance should be sufficiently broad in scope to tackle different beliefs. Alternatively, cluster analysis of hand hygiene beliefs could be used to identify the most effective educational and monitoring strategies for a particular clinical setting. Copyright © 2017 The Healthcare Infection Society. Published by Elsevier Ltd. All rights reserved.
[Raman spectroscopy fluorescence background correction and its application in clustering analysis of medicines].

PubMed

Chen, Shan; Li, Xiao-ning; Liang, Yi-zeng; Zhang, Zhi-min; Liu, Zhao-xia; Zhang, Qi-ming; Ding, Li-xia; Ye, Fei

2010-08-01

During Raman spectroscopy analysis, organic molecules and contaminants can obscure or swamp Raman signals. The present study starts from Raman spectra of prednisone acetate tablets and glibenclamide tablets, which were acquired with the BWTek i-Raman spectrometer. The background is corrected with the R package baselineWavelet. Then principal component analysis and random forests are used to perform clustering analysis. By analyzing the Raman spectra of the two medicines, the accuracy and validity of this background-correction algorithm are checked and the influence of fluorescence background on Raman spectra clustering analysis is discussed. It is concluded that correcting the fluorescence background is important for further analysis, and an effective background correction solution is provided for clustering or other analysis.
A ground truth based comparative study on clustering of gene expression data.

PubMed

Zhu, Yitan; Wang, Zuyi; Miller, David J; Clarke, Robert; Xuan, Jianhua; Hoffman, Eric P; Wang, Yue

2008-05-01

Given the variety of available clustering methods for gene expression data analysis, it is important to develop an appropriate and rigorous validation scheme to assess the performance and limitations of the most widely used clustering algorithms. In this paper, we present a ground truth based comparative study on the functionality, accuracy, and stability of five data clustering methods, namely hierarchical clustering, K-means clustering, self-organizing maps, standard finite normal mixture fitting, and a caBIG toolkit (VIsual Statistical Data Analyzer--VISDA), tested on sample clustering of seven published microarray gene expression datasets and one synthetic dataset. We examined the performance of these algorithms in both data-sufficient and data-insufficient cases using quantitative performance measures, including cluster number detection accuracy and mean and standard deviation of partition accuracy. The experimental results showed that VISDA, an interactive coarse-to-fine maximum likelihood fitting algorithm, is a solid performer on most of the datasets, while K-means clustering and self-organizing maps optimized by the mean squared compactness criterion generally produce more stable solutions than the other methods.
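A ground-truth comparison of this kind needs a score that does not depend on how cluster ids happen to be numbered. The sketch below shows one common way to compute a partition accuracy, assuming it is defined as the fraction of samples correctly grouped under the best one-to-one mapping of clusters to known classes. The abstract does not give the paper's exact definition, so this definition, the function name, and the toy labels are all assumptions.

```python
from itertools import permutations

def partition_accuracy(true_labels, cluster_labels):
    """Fraction of samples correctly grouped under the best one-to-one
    mapping from cluster ids to true class ids.

    Brute force over mappings, so only suitable for a handful of classes.
    """
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_labels))
    if len(clusters) > len(classes):
        raise ValueError("this sketch assumes no more clusters than classes")
    best = 0
    for assigned in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, assigned))
        hits = sum(mapping[c] == t
                   for t, c in zip(true_labels, cluster_labels))
        best = max(best, hits)
    return best / len(true_labels)

# Hypothetical ground truth and clustering output for eight samples.
truth = ["A", "A", "A", "B", "B", "C", "C", "C"]
found = [2, 2, 1, 1, 1, 0, 0, 0]
accuracy = partition_accuracy(truth, found)
print(accuracy)
```

Repeating the calculation over many runs of a clustering method and taking the mean and standard deviation would give stability summaries of the kind the abstract mentions.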
Comparisons of non-Gaussian statistical models in DNA methylation analysis.

PubMed

Ma, Zhanyu; Teschendorff, Andrew E; Yu, Hong; Taghia, Jalil; Guo, Jun

2014-06-16

As a key regulatory mechanism of gene expression, DNA methylation patterns are widely altered in many complex genetic diseases, including cancer. DNA methylation is naturally quantified by bounded support data; therefore, it is non-Gaussian distributed. In order to capture such properties, we introduce some non-Gaussian statistical models to perform dimension reduction on DNA methylation data. Afterwards, non-Gaussian statistical model-based unsupervised clustering strategies are applied to cluster the data. Comparisons and analysis of different dimension reduction strategies and unsupervised clustering methods are presented. Experimental results show that the non-Gaussian statistical model-based methods are superior to the conventional Gaussian distribution-based method. They are meaningful tools for DNA methylation analysis. Moreover, among several non-Gaussian methods, the one that captures the bounded nature of DNA methylation data reveals the best clustering performance.
Comparisons of Non-Gaussian Statistical Models in DNA Methylation Analysis

PubMed Central

Ma, Zhanyu; Teschendorff, Andrew E.; Yu, Hong; Taghia, Jalil; Guo, Jun

2014-01-01

As a key regulatory mechanism of gene expression, DNA methylation patterns are widely altered in many complex genetic diseases, including cancer. DNA methylation is naturally quantified by bounded support data; therefore, it is non-Gaussian distributed. In order to capture such properties, we introduce some non-Gaussian statistical models to perform dimension reduction on DNA methylation data. Afterwards, non-Gaussian statistical model-based unsupervised clustering strategies are applied to cluster the data. Comparisons and analysis of different dimension reduction strategies and unsupervised clustering methods are presented. Experimental results show that the non-Gaussian statistical model-based methods are superior to the conventional Gaussian distribution-based method. They are meaningful tools for DNA methylation analysis. Moreover, among several non-Gaussian methods, the one that captures the bounded nature of DNA methylation data reveals the best clustering performance. PMID:24937687
Allergen Sensitization Pattern by Sex: A Cluster Analysis in Korea.

PubMed

Ohn, Jungyoon; Paik, Seung Hwan; Doh, Eun Jin; Park, Hyun-Sun; Yoon, Hyun-Sun; Cho, Soyun

2017-12-01

Allergens tend to sensitize simultaneously. Etiology of this phenomenon has been suggested to be allergen cross-reactivity or concurrent exposure. However, little is known about specific allergen sensitization patterns. This study aimed to investigate allergen sensitization characteristics according to gender. Multiple allergen simultaneous test (MAST) is widely used as a screening tool for detecting allergen sensitization in dermatologic clinics. We retrospectively reviewed the medical records of patients with MAST results between 2008 and 2014 in our Department of Dermatology. A cluster analysis was performed to elucidate the allergen-specific immunoglobulin (Ig)E cluster pattern. The results of MAST (39 allergen-specific IgEs) from 4,360 cases were analyzed. By cluster analysis, the 39 items were grouped into 8 clusters. Each cluster had characteristic features. Compared with the female group, the male group tended to be sensitized more frequently to all tested allergens, except for the fungus allergen cluster. The cluster and comparative analysis results demonstrate that allergen sensitization is clustered, manifesting allergen similarity or co-exposure. Only the fungus cluster allergens tended to sensitize the female group more frequently than the male group.
Performance analysis of clustering techniques over microarray data: A case study

NASA Astrophysics Data System (ADS)

Dash, Rasmita; Misra, Bijan Bihari

2018-03-01

Handling big data is one of the major issues in the field of statistical data analysis. In such investigations, cluster analysis plays a vital role in dealing with large-scale data. There are many clustering techniques, each with a different approach to cluster analysis, but it is difficult to predict which approach suits a particular dataset. To deal with this problem, a grading approach is introduced over many clustering techniques to identify a stable technique. Because the grading depends on the characteristics of the dataset as well as on the validity indices, a two-stage grading approach is implemented. In this study, the grading approach is applied to five clustering techniques: hybrid swarm based clustering (HSC), k-means, partitioning around medoids (PAM), vector quantization (VQ) and agglomerative nesting (AGNES). The experiments are conducted on five microarray datasets with seven validity indices. The finding of the grading approach that a clustering technique is significant is also confirmed by the Nemenyi post-hoc hypothesis test.
Variable selection based on clustering analysis for improvement of polyphenols prediction in green tea using synchronous fluorescence spectra

NASA Astrophysics Data System (ADS)

Shan, Jiajia; Wang, Xue; Zhou, Hao; Han, Shuqing; Riza, Dimas Firmanda Al; Kondo, Naoshi

2018-04-01

Synchronous fluorescence spectra, combined with multivariate analysis, were used to predict flavonoids content in green tea rapidly and nondestructively. This paper presented a new and efficient spectral interval selection method called clustering based partial least squares (CL-PLS), which selected informative wavelengths by combining the clustering concept and partial least squares (PLS) methods to improve models’ performance with synchronous fluorescence spectra. The fluorescence spectra of tea samples were obtained, k-means and Kohonen self-organizing map clustering algorithms were carried out to cluster the full spectra into several clusters, and a sub-PLS regression model was developed on each cluster. Finally, CL-PLS models consisting of gradually selected clusters were built. Correlation coefficient (R) was used to evaluate the effect on prediction performance of PLS models. In addition, variable influence on projection partial least squares (VIP-PLS), selectivity ratio partial least squares (SR-PLS), interval partial least squares (iPLS) models and the full spectra PLS model were investigated and the results were compared. The results showed that CL-PLS presented the best result for flavonoids prediction using synchronous fluorescence spectra.
Variable selection based on clustering analysis for improvement of polyphenols prediction in green tea using synchronous fluorescence spectra.

PubMed

Shan, Jiajia; Wang, Xue; Zhou, Hao; Han, Shuqing; Riza, Dimas Firmanda Al; Kondo, Naoshi

2018-03-13

Synchronous fluorescence spectra, combined with multivariate analysis, were used to predict flavonoids content in green tea rapidly and nondestructively. This paper presented a new and efficient spectral interval selection method called clustering based partial least squares (CL-PLS), which selected informative wavelengths by combining the clustering concept and partial least squares (PLS) methods to improve models' performance with synchronous fluorescence spectra. The fluorescence spectra of tea samples were obtained, k-means and Kohonen self-organizing map clustering algorithms were carried out to cluster the full spectra into several clusters, and a sub-PLS regression model was developed on each cluster. Finally, CL-PLS models consisting of gradually selected clusters were built. Correlation coefficient (R) was used to evaluate the effect on prediction performance of PLS models. In addition, variable influence on projection partial least squares (VIP-PLS), selectivity ratio partial least squares (SR-PLS), interval partial least squares (iPLS) models and the full spectra PLS model were investigated and the results were compared. The results showed that CL-PLS presented the best result for flavonoids prediction using synchronous fluorescence spectra.
Evaluation of hierarchical agglomerative cluster analysis methods for discrimination of primary biological aerosol

NASA Astrophysics Data System (ADS)

Crawford, I.; Ruske, S.; Topping, D. O.; Gallagher, M. W.

2015-07-01

In this paper we present improved methods for discriminating and quantifying Primary Biological Aerosol Particles (PBAP) by applying hierarchical agglomerative cluster analysis to multi-parameter ultra violet-light induced fluorescence (UV-LIF) spectrometer data. The methods employed in this study can be applied to data sets in excess of 1×10^6 points on a desktop computer, allowing for each fluorescent particle in a dataset to be explicitly clustered. This reduces the potential for misattribution found in subsampling and comparative attribution methods used in previous approaches, improving our capacity to discriminate and quantify PBAP meta-classes. We evaluate the performance of several hierarchical agglomerative cluster analysis linkages and data normalisation methods using laboratory samples of known particle types and an ambient dataset. Fluorescent and non-fluorescent polystyrene latex spheres were sampled with a Wideband Integrated Bioaerosol Spectrometer (WIBS-4) where the optical size, asymmetry factor and fluorescent measurements were used as inputs to the analysis package. It was found that the Ward linkage with z-score or range normalisation performed best, correctly attributing 98 and 98.1 % of the data points respectively. The best performing methods were applied to the BEACHON-RoMBAS ambient dataset where it was found that the z-score and range normalisation methods yield similar results with each method producing clusters representative of fungal spores and bacterial aerosol, consistent with previous results. The z-score result was compared to clusters generated with previous approaches (WIBS AnalysiS Program, WASP) where we observe that the subsampling and comparative attribution method employed by WASP results in the overestimation of the fungal spore concentration by a factor of 1.5 and the underestimation of bacterial aerosol concentration by a factor of 5.
We suggest that this is likely due to errors arising from misattribution due to poor centroid definition and failure to assign particles to a cluster as a result of the subsampling and comparative attribution method employed by WASP. The methods used here allow for the entire fluorescent population of particles to be analysed, yielding an explicit cluster attribution for each particle, improving cluster centroid definition and our capacity to discriminate and quantify PBAP meta-classes compared to previous approaches.
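The combination of z-score normalisation and Ward linkage named in this abstract can be sketched in a few lines. This is a naive illustration only, not the authors' analysis package: it rescans all cluster pairs at every merge, so it would not scale to the data volumes the abstract describes. The function names and the six particle measurements are hypothetical.

```python
from statistics import mean, pstdev

def zscore(rows):
    """Standardise each column to zero mean and unit population variance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]

def ward_clusters(rows, k):
    """Agglomerative clustering with Ward's criterion: repeatedly merge the
    pair of clusters whose union adds least to the within-cluster sum of
    squares, until k clusters remain."""
    clusters = [([i], list(r)) for i, r in enumerate(rows)]  # (members, centroid)
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ma, ca), (mb, cb) = clusters[a], clusters[b]
                d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
                cost = len(ma) * len(mb) / (len(ma) + len(mb)) * d2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        merged = ma + mb
        centroid = [(len(ma) * x + len(mb) * y) / len(merged)
                    for x, y in zip(ca, cb)]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append((merged, centroid))
    return sorted(sorted(members) for members, _ in clusters)

# Hypothetical particles: (optical size, asymmetry factor, fluorescence).
particles = [
    (1.0, 5.0, 10.0), (1.1, 5.5, 12.0), (0.9, 4.8, 11.0),
    (3.0, 20.0, 300.0), (3.2, 22.0, 320.0), (2.9, 19.0, 310.0),
]
groups = ward_clusters(zscore(particles), k=2)
print(groups)
```

Normalising first matters here because the raw fluorescence values are far larger than the size values and would otherwise dominate the squared distances.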
Behavioral Health Risk Profiles of Undergraduate University Students in England, Wales, and Northern Ireland: A Cluster Analysis.

PubMed

El Ansari, Walid; Ssewanyana, Derrick; Stock, Christiane

2018-01-01

Limited research has explored clustering of lifestyle behavioral risk factors (BRFs) among university students. This study aimed to explore clustering of BRFs, composition of clusters, and the association of the clusters with self-rated health and perceived academic performance. We assessed BRFs, namely tobacco smoking, physical inactivity, alcohol consumption, illicit drug use, unhealthy nutrition, and inadequate sleep, using a self-administered general Student Health Survey among 3,706 undergraduates at seven UK universities. A two-step cluster analysis generated: Cluster 1 (the high physically active and health conscious) with very high health awareness/consciousness, good nutrition, and physical activity (PA), and relatively low alcohol, tobacco, and other drug (ATOD) use. Cluster 2 (the abstinent) had very low ATOD use, high health awareness, good nutrition, and medium high PA. Cluster 3 (the moderately health conscious) included the highest regard for healthy eating, second highest fruit/vegetable consumption, and moderately high ATOD use. Cluster 4 (the risk taking) showed the highest ATOD use, were the least health conscious, least fruit consuming, and attached the least importance to eating healthily. Compared to the healthy cluster (Cluster 1), students in other clusters had lower self-rated health, and particularly, students in the risk taking cluster (Cluster 4) reported lower academic performance. These associations were stronger for men than for women. Of the four clusters, Cluster 4 had the youngest students. Our results suggested that prevention among university students should address multiple BRFs simultaneously, with particular focus on the younger students.
Gas and galaxies in filaments between clusters of galaxies. The study of A399-A401

NASA Astrophysics Data System (ADS)

Bonjean, V.; Aghanim, N.; Salomé, P.; Douspis, M.; Beelen, A.

2018-01-01

We have performed a multi-wavelength analysis of two galaxy cluster systems selected with the thermal Sunyaev-Zel'dovich (tSZ) effect and composed of cluster pairs and an inter-cluster filament. We have focused on one pair of particular interest: A399-A401 at redshift z ~ 0.073 separated by 3 Mpc. We have also performed the first analysis of one lower-significance newly associated pair: A21-PSZ2 G114.09-34.34 at z ~ 0.094, separated by 4.2 Mpc. We have characterised the intra-cluster gas using the tSZ signal from Planck and, when possible, the galaxy optical and infrared (IR) properties based on two photometric redshift catalogues: 2MPZ and WISExSCOS. From the tSZ data, we measured the gas pressure in the clusters and in the inter-cluster filaments. In the case of A399-A401, the results are in perfect agreement with previous studies and, using the temperature measured from the X-rays, we further estimate the gas density in the filament and find n0 = (4.3 ± 0.7) × 10^-4 cm^-3. The optical and IR colour-colour and colour-magnitude analyses of the galaxies selected in the cluster system, together with their star formation rate, show no segregation between galaxy populations, both in the clusters and in the filament of A399-A401. Galaxies are all passive, early type, and red and dead. The gas and galaxy properties of this system suggest that the whole system formed at the same time and corresponds to a pre-merger, with a cosmic filament gas heated by the collapse. For the other cluster system, the tSZ analysis was performed and the pressure in the clusters and in the inter-cluster filament was constrained. However, the limited or nonexistent optical and IR data prevent us from concluding on the presence of an actual cosmic filament or from proposing a scenario.
Validation of hierarchical cluster analysis for identification of bacterial species using 42 bacterial isolates

NASA Astrophysics Data System (ADS)

Ghebremedhin, Meron; Yesupriya, Shubha; Luka, Janos; Crane, Nicole J.

2015-03-01

Recent studies have demonstrated the potential advantages of the use of Raman spectroscopy in the biomedical field due to its rapidity and noninvasive nature. In this study, Raman spectroscopy is applied as a method for differentiating between bacteria isolates for Gram status and Genus species. We created models for identifying 28 bacterial isolates using spectra collected with a 785 nm laser excitation Raman spectroscopic system. In order to investigate the groupings of these samples, partial least squares discriminant analysis (PLSDA) and hierarchical cluster analysis (HCA) were implemented. In addition, cluster analyses of the isolates were performed using various data types consisting of biochemical tests, gene sequence alignment, high resolution melt (HRM) analysis and antimicrobial susceptibility tests of minimum inhibitory concentration (MIC) and degree of antimicrobial resistance (SIR). In order to evaluate the ability of these models to correctly classify bacterial isolates using solely Raman spectroscopic data, a set of 14 validation samples was tested using the PLSDA models and consequently the HCA models. External cluster evaluation criteria of purity and Rand index were calculated at different taxonomic levels to compare the performance of clustering using Raman spectra as well as the other datasets. Results showed that Raman spectra performed comparably to, and in some cases better than, the other data types, with Rand index and purity values up to 0.933 and 0.947, respectively. This study clearly demonstrates that the discrimination of bacterial species using Raman spectroscopic data and hierarchical cluster analysis is possible and has the potential to be a powerful point-of-care tool in clinical settings.
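The two external evaluation criteria named in this abstract, purity and the Rand index, have standard definitions that are short enough to write out. The sketch below uses those textbook definitions; the genus labels and cluster assignments are hypothetical and are not data from the study.

```python
from collections import Counter
from itertools import combinations

def purity(true_labels, cluster_labels):
    """Share of samples that belong to the majority true class
    of the cluster they were assigned to."""
    by_cluster = {}
    for t, c in zip(true_labels, cluster_labels):
        by_cluster.setdefault(c, Counter())[t] += 1
    majority = sum(max(counts.values()) for counts in by_cluster.values())
    return majority / len(true_labels)

def rand_index(true_labels, cluster_labels):
    """Share of sample pairs on which the two partitions agree:
    either both put the pair together or both keep it apart."""
    pairs = list(combinations(range(len(true_labels)), 2))
    agree = sum(
        (true_labels[i] == true_labels[j])
        == (cluster_labels[i] == cluster_labels[j])
        for i, j in pairs
    )
    return agree / len(pairs)

# Hypothetical isolates labelled by genus, with a hypothetical clustering.
genus = ["Staphylococcus", "Staphylococcus", "Staphylococcus",
         "Escherichia", "Escherichia", "Pseudomonas"]
cluster = [0, 0, 1, 1, 1, 2]
p = purity(genus, cluster)
r = rand_index(genus, cluster)
print(f"purity = {p:.3f}, Rand index = {r:.3f}")
```

Purity rewards many small clusters, since a singleton cluster is always pure, which is one reason to report it alongside the Rand index rather than alone.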
Missing continuous outcomes under covariate dependent missingness in cluster randomised trials

PubMed Central

Diaz-Ordaz, Karla; Bartlett, Jonathan W

2016-01-01

Attrition is a common occurrence in cluster randomised trials which leads to missing outcome data. Two approaches for analysing such trials are cluster-level analysis and individual-level analysis. This paper compares the performance of unadjusted cluster-level analysis, baseline covariate adjusted cluster-level analysis and linear mixed model analysis, under baseline covariate dependent missingness in continuous outcomes, in terms of bias, average estimated standard error and coverage probability. The methods of complete records analysis and multiple imputation are used to handle the missing outcome data. We considered four scenarios, with the missingness mechanism and baseline covariate effect on outcome either the same or different between intervention groups. We show that both unadjusted cluster-level analysis and baseline covariate adjusted cluster-level analysis give unbiased estimates of the intervention effect only if both intervention groups have the same missingness mechanisms and there is no interaction between baseline covariate and intervention group. Linear mixed model and multiple imputation give unbiased estimates under all four considered scenarios, provided that an interaction of intervention and baseline covariate is included in the model when appropriate. Cluster mean imputation has been proposed as a valid approach for handling missing outcomes in cluster randomised trials. We show that cluster mean imputation only gives unbiased estimates when the missingness mechanism is the same between the intervention groups and there is no interaction between baseline covariate and intervention group. Multiple imputation shows overcoverage for a small number of clusters in each intervention group. PMID:27177885

Missing continuous outcomes under covariate dependent missingness in cluster randomised trials.

PubMed

Hossain, Anower; Diaz-Ordaz, Karla; Bartlett, Jonathan W

2017-06-01

Attrition is a common occurrence in cluster randomised trials which leads to missing outcome data. Two approaches for analysing such trials are cluster-level analysis and individual-level analysis. This paper compares the performance of unadjusted cluster-level analysis, baseline covariate adjusted cluster-level analysis and linear mixed model analysis, under baseline covariate dependent missingness in continuous outcomes, in terms of bias, average estimated standard error and coverage probability. The methods of complete records analysis and multiple imputation are used to handle the missing outcome data. We considered four scenarios, with the missingness mechanism and baseline covariate effect on outcome either the same or different between intervention groups. We show that both unadjusted cluster-level analysis and baseline covariate adjusted cluster-level analysis give unbiased estimates of the intervention effect only if both intervention groups have the same missingness mechanisms and there is no interaction between baseline covariate and intervention group. Linear mixed model and multiple imputation give unbiased estimates under all four considered scenarios, provided that an interaction of intervention and baseline covariate is included in the model when appropriate. Cluster mean imputation has been proposed as a valid approach for handling missing outcomes in cluster randomised trials. We show that cluster mean imputation only gives unbiased estimates when missingness mechanism is the same between the intervention groups and there is no interaction between baseline covariate and intervention group. Multiple imputation shows overcoverage for small number of clusters in each intervention group.
Interactive K-Means Clustering Method Based on User Behavior for Different Analysis Target in Medicine.

PubMed

Lei, Yang; Yu, Dai; Bin, Zhang; Yang, Yang

2017-01-01

Clustering algorithms are widely used in analysis systems as a basis of data analysis. However, with high-dimensional data, a clustering algorithm may overlook the business relations between dimensions, especially in the medical field. As a result, the clustering result often fails to meet the business goals of the users. If the clustering process can incorporate the knowledge of the users, that is, the doctor's knowledge or the analysis intent, the clustering result can be more satisfactory. In this paper, we propose an interactive K-means clustering method to improve the user's satisfaction with the result. The core of this method is to obtain the user's feedback on the clustering result and use it to optimize that result. A particle swarm optimization algorithm is then used to optimize the parameters, especially the weight settings in the clustering algorithm, so that they reflect the user's business preference as closely as possible. After this parameter optimization and adjustment, the clustering result can be closer to the user's requirement. Finally, we use a breast cancer example to test our method. The experiments show the better performance of our algorithm.
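The role of feature weights in such a method can be sketched with the assignment step of a weighted K-means. This is an illustrative assumption about how weights might enter the distance, not the authors' implementation, and the particle swarm search over the weights is omitted.

```python
def weighted_sq_distance(point, center, weights):
    """Squared Euclidean distance with one non-negative weight per feature."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, point, center))


def assign(points, centers, weights):
    """Return, for each point, the index of the nearest center under the weights."""
    return [
        min(range(len(centers)), key=lambda k: weighted_sq_distance(p, centers[k], weights))
        for p in points
    ]


# Hypothetical two-feature example: down-weighting the second feature
# moves the point from the second center to the first.
centers = [(0.0, 0.0), (3.0, 3.0)]
point = (1.0, 3.0)
print(assign([point], centers, (1.0, 1.0)))  # equal weights
print(assign([point], centers, (1.0, 0.1)))  # second feature matters less
```

Changing the weights changes which features dominate the distance, so an optimizer driven by user feedback can steer the partition without altering the data.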
Improved Ant Colony Clustering Algorithm and Its Performance Study

PubMed Central

Gao, Wei

2016-01-01

Clustering analysis is used in many disciplines and applications; it is an important tool that descriptively identifies homogeneous groups of objects based on attribute values. The ant colony clustering algorithm is a swarm-intelligent method used for clustering problems that is inspired by the behavior of ant colonies that cluster their corpses and sort their larvae. A new abstraction ant colony clustering algorithm using a data combination mechanism is proposed to improve the computational efficiency and accuracy of the ant colony clustering algorithm. The abstraction ant colony clustering algorithm is used to cluster benchmark problems, and its performance is compared with the ant colony clustering algorithm and other methods used in existing literature. Based on similar computational difficulties and complexities, the results show that the abstraction ant colony clustering algorithm produces results that are not only more accurate but also more efficiently determined than the ant colony clustering algorithm and the other methods. Thus, the abstraction ant colony clustering algorithm can be used for efficient multivariate data clustering. PMID:26839533
Atlas-guided cluster analysis of large tractography datasets.

PubMed

Ros, Christian; Güllmar, Daniel; Stenzel, Martin; Mentzel, Hans-Joachim; Reichenbach, Jürgen Rainer

2013-01-01

Diffusion Tensor Imaging (DTI) and fiber tractography are important tools to map the cerebral white matter microstructure in vivo and to model the underlying axonal pathways in the brain with three-dimensional fiber tracts. As the fast and consistent extraction of anatomically correct fiber bundles for multiple datasets is still challenging, we present a novel atlas-guided clustering framework for exploratory data analysis of large tractography datasets. The framework uses an hierarchical cluster analysis approach that exploits the inherent redundancy in large datasets to time-efficiently group fiber tracts. Structural information of a white matter atlas can be incorporated into the clustering to achieve an anatomically correct and reproducible grouping of fiber tracts. This approach facilitates not only the identification of the bundles corresponding to the classes of the atlas; it also enables the extraction of bundles that are not present in the atlas. The new technique was applied to cluster datasets of 46 healthy subjects. Prospects of automatic and anatomically correct as well as reproducible clustering are explored. Reconstructed clusters were well separated and showed good correspondence to anatomical bundles. Using the atlas-guided cluster approach, we observed consistent results across subjects with high reproducibility. In order to investigate the outlier elimination performance of the clustering algorithm, scenarios with varying amounts of noise were simulated and clustered with three different outlier elimination strategies. By exploiting the multithreading capabilities of modern multiprocessor systems in combination with novel algorithms, our toolkit clusters large datasets in a couple of minutes. Experiments were conducted to investigate the achievable speedup and to demonstrate the high performance of the clustering framework in a multiprocessing environment.
Clustering of Health Behaviors and Cardiorespiratory Fitness Among U.S. Adolescents.

PubMed

Hartz, Jacob; Yingling, Leah; Ayers, Colby; Adu-Brimpong, Joel; Rivers, Joshua; Ahuja, Chaarushi; Powell-Wiley, Tiffany M

2018-05-01

Decreased cardiorespiratory fitness (CRF) is associated with an increased risk of cardiovascular disease. However, little is known about how the interaction of diet, physical activity (PA), and sedentary time (ST) affects CRF among adolescents. By using a nationally representative sample of U.S. adolescents, we used cluster analysis to investigate the interactions of these behaviors with CRF. We hypothesized that distinct clustering patterns exist and that less healthy clusters are associated with lower CRF. We used 2003-2004 National Health and Nutrition Examination Survey data for persons aged 12-19 years (N = 1,225). PA and ST were measured objectively by an accelerometer, and the American Heart Association Healthy Diet Score quantified diet quality. Maximal oxygen consumption (V̇O₂ max) was measured by submaximal treadmill exercise test. We performed cluster analysis to identify sex-specific clustering of diet, PA, and ST. Adjusting for accelerometer wear time, age, body mass index, race/ethnicity, and the poverty-to-income ratio, we performed sex-stratified linear regression analysis to evaluate the association of cluster with V̇O₂ max. Three clusters were identified for girls and boys. For girls, there was no difference across clusters for age (p = .1), weight (p = .3), and BMI (p = .5), and no relationship between clusters and V̇O₂ max. For boys, the youngest cluster (p < .01) had three healthy behaviors, weighed less, and was associated with a higher V̇O₂ max compared with the two older clusters. We observed clustering of diet, PA, and ST in U.S. adolescents. Specific patterns were associated with lower V̇O₂ max for boys, suggesting that our clusters may help identify adolescent boys most in need of interventions. Published by Elsevier Inc.
Assessment of repeatability of composition of perfumed waters by high-performance liquid chromatography combined with numerical data analysis based on cluster analysis (HPLC UV/VIS - CA).

PubMed

Ruzik, L; Obarski, N; Papierz, A; Mojski, M

2015-06-01

High-performance liquid chromatography (HPLC) with UV/VIS spectrophotometric detection combined with the chemometric method of cluster analysis (CA) was used for the assessment of repeatability of composition of nine types of perfumed waters. In addition, the chromatographic method of separating components of the perfume waters under analysis was subjected to an optimization procedure. The chromatograms thus obtained were used as sources of data for the chemometric method of cluster analysis (CA). The result was a classification of a set comprising 39 perfumed water samples with a similar composition at a specified level of probability (level of agglomeration). A comparison of the classification with the manufacturer's declarations reveals a good degree of consistency and demonstrates similarity between samples in different classes. A combination of the chromatographic method with cluster analysis (HPLC UV/VIS - CA) makes it possible to quickly assess the repeatability of composition of perfumed waters at selected levels of probability. © 2014 Society of Cosmetic Scientists and the Société Française de Cosmétologie.
Ecological tolerances of Miocene larger benthic foraminifera from Indonesia

NASA Astrophysics Data System (ADS)

Novak, Vibor; Renema, Willem

2018-01-01

To provide a comprehensive palaeoenvironmental reconstruction based on larger benthic foraminifera (LBF), a quantitative analysis of their assemblage composition is needed. Besides microfacies analysis which includes environmental preferences of foraminiferal taxa, statistical analyses should also be employed. Therefore, detrended correspondence analysis and cluster analysis were performed on relative abundance data of identified LBF assemblages deposited in mixed carbonate-siliciclastic (MCS) systems and blue-water (BW) settings. Studied MCS system localities include ten sections from the central part of the Kutai Basin in East Kalimantan, ranging from late Burdigalian to Serravallian age. The BW samples were collected from eleven sections of the Bulu Formation on Central Java, dated as Serravallian. Results from detrended correspondence analysis reveal significant differences between these two environmental settings. Cluster analysis produced five clusters of samples; clusters 1 and 2 comprise dominantly MCS samples, clusters 3 and 4 with dominance of BW samples, and cluster 5 showing a mixed composition with both MCS and BW samples. The results of cluster analysis were afterwards subjected to indicator species analysis resulting in the interpretation that generated three groups among LBF taxa: typical assemblage indicators, regularly occurring taxa and rare taxa. By interpreting the results of detrended correspondence analysis, cluster analysis and indicator species analysis, along with environmental preferences of identified LBF taxa, a palaeoenvironmental model is proposed for the distribution of LBF in Miocene MCS systems and adjacent BW settings of Indonesia.
Descriptive Statistics and Cluster Analysis for Extreme Rainfall in Java Island

NASA Astrophysics Data System (ADS)

E Komalasari, K.; Pawitan, H.; Faqih, A.

2017-03-01

This study aims to describe the regional pattern of extreme rainfall on Java Island based on maximum daily rainfall for the period 1983 to 2012. Descriptive statistical analysis was performed to characterize the central tendency, variation and distribution of the maximum precipitation data. The mean and median are used to measure central tendency, while the interquartile range (IQR) and standard deviation are used to measure variation. In addition, skewness and kurtosis are used to describe the shape of the distribution of the rainfall data. Cluster analysis using squared Euclidean distance and Ward's method is applied to perform regional grouping. The results show that the mean of maximum daily rainfall in the Java region during 1983-2012 is around 80-181 mm, with a median of 75-160 mm and a standard deviation of 17 to 82. Cluster analysis produces four clusters and shows that the western area of Java tends to have higher annual maxima of daily rainfall than the northern area, as well as more variation in the annual maximum value.
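The descriptive measures listed in the abstract above can be computed for a single station with the Python standard library, as in the following sketch. The rainfall values are invented for illustration, and the moment-based definitions of skewness and excess kurtosis are one common convention that the study may not have used.

```python
import statistics


def describe(values):
    """Central tendency, spread and shape of a sample of annual maxima."""
    n = len(values)
    mean = statistics.fmean(values)
    # Quartile cut points (statistics.quantiles uses the "exclusive" method by default).
    q1, _q2, q3 = statistics.quantiles(values, n=4)
    # Central moments about the mean, divided by n (population convention).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "median": statistics.median(values),
        "iqr": q3 - q1,
        "std": statistics.stdev(values),  # sample standard deviation
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
    }


# Hypothetical annual maximum daily rainfall (mm) at one station.
annual_max_mm = [82, 95, 110, 76, 160, 124, 90, 181]
for name, value in describe(annual_max_mm).items():
    print(f"{name}: {value:.2f}")
```

Per-station summaries of this kind could then serve as the feature vectors that a hierarchical method such as Ward's groups into regions.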
Optimal wavelength band clustering for multispectral iris recognition.

PubMed

Gong, Yazhuo; Zhang, David; Shi, Pengfei; Yan, Jingqi

2012-07-01

This work explores the possibility of clustering spectral wavelengths based on the maximum dissimilarity of iris textures. The eventual goal is to determine how many bands of spectral wavelengths will be enough for iris multispectral fusion and to find these bands that will provide higher performance of iris multispectral recognition. A multispectral acquisition system was first designed for imaging the iris at narrow spectral bands in the range of 420 to 940 nm. Next, a set of 60 human iris images that correspond to the right and left eyes of 30 different subjects were acquired for an analysis. Finally, we determined that 3 clusters were enough to represent the 10 feature bands of spectral wavelengths using the agglomerative clustering based on two-dimensional principal component analysis. The experimental results suggest (1) the number, center, and composition of clusters of spectral wavelengths and (2) the higher performance of iris multispectral recognition based on a three wavelengths-bands fusion.
Cluster analysis in phenotyping a Portuguese population.

PubMed

Loureiro, C C; Sa-Couto, P; Todo-Bom, A; Bousquet, J

2015-09-03

Unbiased cluster analysis using clinical parameters has identified asthma phenotypes. Adding inflammatory biomarkers to this analysis provided a better insight into the disease mechanisms. This approach has not yet been applied to asthmatic Portuguese patients. To identify phenotypes of asthma using cluster analysis in a Portuguese asthmatic population treated in secondary medical care. Consecutive patients with asthma were recruited from the outpatient clinic. Patients were optimally treated according to GINA guidelines and enrolled in the study. Procedures were performed according to a standard evaluation of asthma. Phenotypes were identified by cluster analysis using Ward's clustering method. Of the 72 patients enrolled, 57 had full data and were included for cluster analysis. Distribution was set in 5 clusters described as follows: cluster (C) 1, early onset mild allergic asthma; C2, moderate allergic asthma, with long evolution, female prevalence and mixed inflammation; C3, allergic brittle asthma in young females with early disease onset and no evidence of inflammation; C4, severe asthma in obese females with late disease onset, highly symptomatic despite low Th2 inflammation; C5, severe asthma with chronic airflow obstruction, late disease onset and eosinophilic inflammation. In our study population, the identified clusters were mainly coincident with other larger-scale cluster analysis. Variables such as age at disease onset, obesity, lung function, FeNO (Th2 biomarker) and disease severity were important for cluster distinction. Copyright © 2015. Published by Elsevier España, S.L.U.
Presentation on systems cluster research

NASA Technical Reports Server (NTRS)

Morgenthaler, George W.

1989-01-01

This viewgraph presentation presents an overview of systems cluster research performed by the Center for Space Construction. The goals of the research are to develop concepts, insights, and models for space construction and to develop systems engineering/analysis curricula for training future aerospace engineers. The following topics are covered: CSC systems analysis/systems engineering (SIMCON) model, CSC systems cluster schedule, system life-cycle, model optimization techniques, publications, cooperative efforts, and sponsored research.
Clinical Phenotype of Diabetic Peripheral Neuropathy and Relation to Symptom Patterns: Cluster and Factor Analysis in Patients with Type 2 Diabetes in Korea.

PubMed

Won, Jong Chul; Im, Yong-Jin; Lee, Ji-Hyun; Kim, Chong Hwa; Kwon, Hyuk Sang; Cha, Bong-Yun; Park, Tae Sun

2017-01-01

Diabetic peripheral neuropathy (DPN) is the most common complication in patients with diabetes. However, patients usually suffer not only from diverse sensory deficits but also from neuropathy-related discomforts. The aim of this study is to identify distinct groups of patients with DPN with respect to its clinical impacts on symptom patterns and comorbidities. A hierarchical cluster analysis and factor analysis were performed to identify relevant subgroups of patients with DPN (n = 1338) and symptom patterns. Patients with DPN were divided into three clusters: asymptomatic (cluster 1, n = 448, 33.5%), moderate symptoms with disturbed sleep (cluster 2, n = 562, 42.0%), and severe symptoms with decreased quality of life (cluster 3, n = 328, 24.5%). Patients in cluster 3, compared with clusters 1 and 2, were characterized by higher levels of HbA1c and more severe pain and physical impairments. Patients in cluster 2 had moderate pain levels but disturbed sleep patterns comparable to those in cluster 3. The frequency of symptoms on each item of MNSI by "painful" symptom pattern showed a similar distribution pattern with increasing intensities along the three clusters. Cluster and factor analysis endorsed the use of comprehensive and symptomatic subgrouping to individualize the evaluation of patients with DPN.
The `TTIME' Package: Performance Evaluation in a Cluster Computing Environment

NASA Astrophysics Data System (ADS)

Howe, Marico; Berleant, Daniel; Everett, Albert

2011-06-01

The objective of translating developmental event time across mammalian species is to gain an understanding of the timing of human developmental events based on known time of those events in animals. The potential benefits include improvements to diagnostic and intervention capabilities. The CRAN `ttime' package provides the functionality to infer unknown event timings and investigate phylogenetic proximity utilizing hierarchical clustering of both known and predicted event timings. The original generic mammalian model included nine eutherian mammals: Felis domestica (cat), Mustela putorius furo (ferret), Mesocricetus auratus (hamster), Macaca mulatta (monkey), Homo sapiens (humans), Mus musculus (mouse), Oryctolagus cuniculus (rabbit), Rattus norvegicus (rat), and Acomys cahirinus (spiny mouse). However, the data for this model is expected to grow as more data about developmental events is identified and incorporated into the analysis. Performance evaluation of the `ttime' package across a cluster computing environment versus a comparative analysis in a serial computing environment provides an important computational performance assessment. A theoretical analysis is the first stage of a process in which the second stage, if justified by the theoretical analysis, is to investigate an actual implementation of the `ttime' package in a cluster computing environment and to understand the parallelization process that underlies implementation.
A comparison of IQ and memory cluster solutions in moderate and severe pediatric traumatic brain injury.

PubMed

Thaler, Nicholas S; Terranova, Jennifer; Turner, Alisa; Mayfield, Joan; Allen, Daniel N

2015-01-01

Recent studies have examined heterogeneous neuropsychological outcomes in childhood traumatic brain injury (TBI) using cluster analysis. These studies have identified homogeneous subgroups based on tests of IQ, memory, and other cognitive abilities that show some degree of association with specific cognitive, emotional, and behavioral outcomes, and have demonstrated that the clusters derived for children with TBI are different from those observed in normal populations. However, the extent to which these subgroups are stable across abilities has not been examined, and this has significant implications for the generalizability and clinical utility of TBI clusters. The current study addressed this by comparing IQ and memory profiles of 137 children who sustained moderate-to-severe TBI. Cluster analysis of IQ and memory scores indicated that a four-cluster solution was optimal for the IQ scores and a five-cluster solution was optimal for the memory scores. Three clusters on each battery differed primarily by level of performance, while the others had pattern variations. Cross-plotting the clusters across respective IQ and memory test scores indicated that clusters defined by level were generally stable, while clusters defined by pattern differed. Notably, children with slower processing speed exhibited low-average to below-average performance on memory indexes. These results provide some support for the stability of previously identified memory and IQ clusters and provide information about the relationship between IQ and memory in children with TBI.
SEARCHING FOR BULK MOTIONS IN THE INTRACLUSTER MEDIUM OF MASSIVE, MERGING CLUSTERS WITH CHANDRA CCD DATA

DOE Office of Scientific and Technical Information (OSTI.GOV)

Liu, Ang; Yu, Heng; Tozzi, Paolo

2016-04-10

We search for bulk motions in the intracluster medium (ICM) of massive clusters showing evidence of an ongoing or recent major merger with spatially resolved spectroscopy in Chandra CCD data. We identify a sample of six merging clusters with >150 ks Chandra exposure in the redshift range 0.1 < z < 0.3. By performing X-ray spectral analysis of projected ICM regions selected according to their surface brightness, we obtain the projected redshift maps for all of these clusters. After performing a robust analysis of the statistical and systematic uncertainties in the measured X-ray redshift z_X, we check whether or not the global z_X distribution differs from that expected when the ICM is at rest. We find evidence of significant bulk motions at more than 3σ in A2142 and A115, and less than 2σ in A2034 and A520. Focusing on single regions, we identify significant localized velocity differences in all of the merger clusters. We also perform the same analysis on two relaxed clusters with no signatures of recent mergers, finding no signs of bulk motions, as expected. Our results indicate that deep Chandra CCD data enable us to identify the presence of bulk motions at the level of v_BM > 1000 km s⁻¹ in the ICM of massive merging clusters at 0.1 < z < 0.3. Although the CCD spectral resolution is not sufficient for a detailed analysis of the ICM dynamics, Chandra CCD data constitute a key diagnostic tool complementing X-ray bolometers on board future X-ray missions.
Upgrading of the LGD cluster at JINR to support DLNP experiments

NASA Astrophysics Data System (ADS)

Bednyakov, I. V.; Dolbilov, A. G.; Ivanov, Yu. P.

2017-01-01

Since its construction in 2005, the Computing Cluster of the Dzhelepov Laboratory of Nuclear Problems has been mainly used to perform calculations (data analysis, simulation, etc.) for various scientific collaborations in which DLNP scientists take an active part. The Cluster also serves to train specialists. Much has changed in the past decades, and the necessity has arisen to upgrade the cluster, increasing its power and replacing the outdated equipment to maintain its reliability and modernity. In this work we describe the experience of performing this upgrading, which can be helpful for system administrators to put new equipment for clusters of this type into operation quickly and efficiently.
Characterizing Heterogeneity within Head and Neck Lesions Using Cluster Analysis of Multi-Parametric MRI Data.

PubMed

Borri, Marco; Schmidt, Maria A; Powell, Ceri; Koh, Dow-Mu; Riddell, Angela M; Partridge, Mike; Bhide, Shreerang A; Nutting, Christopher M; Harrington, Kevin J; Newbold, Katie L; Leach, Martin O

2015-01-01

To describe a methodology, based on cluster analysis, to partition multi-parametric functional imaging data into groups (or clusters) of similar functional characteristics, with the aim of characterizing functional heterogeneity within head and neck tumour volumes. To evaluate the performance of the proposed approach on a set of longitudinal MRI data, analysing the evolution of the obtained sub-sets with treatment. The cluster analysis workflow was applied to a combination of dynamic contrast-enhanced and diffusion-weighted imaging MRI data from a cohort of squamous cell carcinoma of the head and neck patients. Cumulative distributions of voxels, containing pre and post-treatment data and including both primary tumours and lymph nodes, were partitioned into k clusters (k = 2, 3 or 4). Principal component analysis and cluster validation were employed to investigate data composition and to independently determine the optimal number of clusters. The evolution of the resulting sub-regions with induction chemotherapy treatment was assessed relative to the number of clusters. The clustering algorithm was able to separate clusters which significantly reduced in voxel number following induction chemotherapy from clusters with a non-significant reduction. Partitioning with the optimal number of clusters (k = 4), determined with cluster validation, produced the best separation between reducing and non-reducing clusters. The proposed methodology was able to identify tumour sub-regions with distinct functional properties, independently separating clusters which were affected differently by treatment. This work demonstrates that unsupervised cluster analysis, with no prior knowledge of the data, can be employed to provide a multi-parametric characterization of functional heterogeneity within tumour volumes.
Elements concentration analysis in groundwater from the North Serra Geral aquifer in Santa Helena-Brazil using SR-TXRF spectrometer.

PubMed

Justen, Gisele C; Espinoza-Quiñones, Fernando R; Módenes, Aparecido Nivaldo; Bergamasco, Rosangela

2012-01-01

In this work, element concentrations in groundwater were analysed using the synchrotron radiation total-reflection X-ray fluorescence (SR-TXRF) technique. A set of nine tube-wells with serious risk of contamination was chosen to monitor the mean concentration of elements in groundwater from the North Serra Geral aquifer in Santa Helena, Brazil, during 1 year. Element concentrations were determined by applying an SR-TXRF methodology. The accuracy of the SR-TXRF technique was validated by analysis of a certified reference material. As the groundwater composition in the North Serra Geral aquifer showed heterogeneity in the spatial distribution of eight major elements, hierarchical clustering was applied to the data. Based on the similarity of their compositions, two of the nine wells were grouped in a first cluster, while the other seven were grouped in a second cluster. Calcium was the major element in all wells, with higher Ca concentration in the second cluster than in the first cluster. However, concentrations of Ti, V and Cr in the first cluster are slightly higher than those in the second cluster. The findings of this study within a monitoring program of tube-wells could provide a useful assessment of controls over groundwater composition and support management at regional level.
Task Analysis for Health Occupations. Cluster: Nursing. Occupation: Professional Nurse (Associate Degree). Education for Employment Task Lists.

ERIC Educational Resources Information Center

Lake County Area Vocational Center, Grayslake, IL.

This document contains a task analysis for health occupations (professional nurse) in the nursing cluster. For each task listed, occupation, duty area, performance standard, steps, knowledge, attitudes, safety, equipment/supplies, source of analysis, and Illinois state goals for learning are listed. For the duty area of "providing therapeutic…
Task Analysis for Health Occupations. Cluster: Nursing. Occupation: Home Health Aide. Education for Employment Task Lists.

ERIC Educational Resources Information Center

Lake County Area Vocational Center, Grayslake, IL.

This document contains a task analysis for health occupations (home health aide) in the nursing cluster. For each task listed, occupation, duty area, performance standard, steps, knowledge, attitudes, safety, equipment/supplies, source of analysis, and Illinois state goals for learning are listed. For the duty area of "providing therapeutic…

HRLSim: a high performance spiking neural network simulator for GPGPU clusters.

PubMed

Minkovich, Kirill; Thibeault, Corey M; O'Brien, Michael John; Nogin, Aleksey; Cho, Youngkwan; Srinivasa, Narayan

2014-02-01

Modeling of large-scale spiking neural models is an important tool in the quest to understand brain function and subsequently create real-world applications. This paper describes a spiking neural network simulator environment called HRL Spiking Simulator (HRLSim). This simulator is suitable for implementation on a cluster of general purpose graphical processing units (GPGPUs). Novel aspects of HRLSim are described and an analysis of its performance is provided for various configurations of the cluster. With the advent of inexpensive GPGPU cards and compute power, HRLSim offers an affordable and scalable tool for design, real-time simulation, and analysis of large-scale spiking neural networks.
Clustering performance comparison using K-means and expectation maximization algorithms.

PubMed

Jung, Yong Gyu; Kang, Min Soo; Heo, Jun

2014-11-14

Clustering is an important means of data mining based on separating data categories by similar features. Unlike the classification algorithm, clustering belongs to the unsupervised type of algorithms. Two representatives of the clustering algorithms are the K-means and the expectation maximization (EM) algorithm. Linear regression analysis was extended to the category-type dependent variable, while logistic regression was achieved using a linear combination of independent variables. To predict the possibility of occurrence of an event, a statistical approach is used. However, the classification of all data by means of logistic regression analysis cannot guarantee the accuracy of the results. In this paper, the logistic regression analysis is applied to EM clusters and the K-means clustering method for quality assessment of red wine, and a method is proposed for ensuring the accuracy of the classification results.
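For readers unfamiliar with the first of the two algorithms compared above, the following is a minimal one-dimensional K-means sketch (Lloyd's iteration). The input values and starting centers are hypothetical, and the EM algorithm and the logistic regression step of the paper are not shown.

```python
def kmeans_1d(values, initial_centers, max_iter=100):
    """Alternate between assigning values to the nearest center and
    moving each center to the mean of its members, until centers stop changing."""
    centers = list(initial_centers)
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest center for every value.
        labels = [
            min(range(len(centers)), key=lambda k: (v - centers[k]) ** 2)
            for v in values
        ]
        # Update step: mean of each cluster (an empty cluster keeps its center).
        new_centers = []
        for k in range(len(centers)):
            members = [v for v, label in zip(values, labels) if label == k]
            new_centers.append(sum(members) / len(members) if members else centers[k])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Hypothetical measurements of a single attribute, with two starting centers.
labels, centers = kmeans_1d([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], [0.0, 5.0])
print(labels, centers)
```

K-means makes hard assignments of each point to exactly one cluster, whereas EM for a mixture model assigns each point a probability of membership in every cluster.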
Academic Performance and Lifestyle Behaviors in Australian School Children: A Cluster Analysis.

PubMed

Dumuid, Dorothea; Olds, Timothy; Martín-Fernández, Josep-Antoni; Lewis, Lucy K; Cassidy, Leah; Maher, Carol

2017-12-01

Poor academic performance has been linked with particular lifestyle behaviors, such as unhealthy diet, short sleep duration, high screen time, and low physical activity. However, little is known about how lifestyle behavior patterns (or combinations of behaviors) contribute to children's academic performance. We aimed to compare academic performance across clusters of children with common lifestyle behavior patterns. We clustered participants (Australian children aged 9-11 years, n = 284) into four mutually exclusive groups of distinct lifestyle behavior patterns, using the following lifestyle behaviors as cluster inputs: light, moderate, and vigorous physical activity; sedentary behavior and sleep, derived from 24-hour accelerometry; self-reported screen time and diet. Differences in academic performance (measured by a nationally administered standardized test) were detected across the clusters, with scores being lowest in the Junk Food Screenies cluster (unhealthy diet/high screen time) and highest in the Sitters cluster (high nonscreen sedentary behavior/low physical activity). These findings suggest that reduction in screen time and an improved diet may contribute positively to academic performance. While children with high nonscreen sedentary time performed better academically in this study, they also accumulated low levels of physical activity. This warrants further investigation, given the known physical and mental benefits of physical activity.
Alteration mapping at Goldfield, Nevada, by cluster and discriminant analysis of LANDSAT digital data

NASA Technical Reports Server (NTRS)

Ballew, G.

1977-01-01

The ability of Landsat multispectral digital data to differentiate among 62 combinations of rock and alteration types at the Goldfield mining district of Western Nevada was investigated by using statistical techniques of cluster and discriminant analysis. Multivariate discriminant analysis was not effective in classifying each of the 62 groups, with classification results essentially the same whether data of four channels alone or combined with six ratios of channels were used. Bivariate plots of group means revealed a cluster of three groups including mill tailings, basalt and all other rock and alteration types. Automatic hierarchical clustering based on the four-dimensional Mahalanobis distance between group means of 30 groups having five or more samples was performed. The results of the cluster analysis revealed hierarchies of mill tailings vs. natural materials, basalt vs. non-basalt, highly reflectant rocks vs. other rocks and exclusively unaltered rocks vs. predominantly altered rocks. The hierarchies were used to determine the order in which sets of multiple discriminant analyses were to be performed, and the resulting discriminant functions were used to produce a map of geology and alteration which has an overall accuracy of 70 percent for discriminating exclusively unaltered rocks from predominantly altered rocks.
Atlas-Guided Cluster Analysis of Large Tractography Datasets

PubMed Central

Ros, Christian; Güllmar, Daniel; Stenzel, Martin; Mentzel, Hans-Joachim; Reichenbach, Jürgen Rainer

2013-01-01

Diffusion Tensor Imaging (DTI) and fiber tractography are important tools to map the cerebral white matter microstructure in vivo and to model the underlying axonal pathways in the brain with three-dimensional fiber tracts. As the fast and consistent extraction of anatomically correct fiber bundles for multiple datasets is still challenging, we present a novel atlas-guided clustering framework for exploratory data analysis of large tractography datasets. The framework uses a hierarchical cluster analysis approach that exploits the inherent redundancy in large datasets to time-efficiently group fiber tracts. Structural information of a white matter atlas can be incorporated into the clustering to achieve an anatomically correct and reproducible grouping of fiber tracts. This approach facilitates not only the identification of the bundles corresponding to the classes of the atlas; it also enables the extraction of bundles that are not present in the atlas. The new technique was applied to cluster datasets of 46 healthy subjects. Prospects of automatic and anatomically correct as well as reproducible clustering are explored. Reconstructed clusters were well separated and showed good correspondence to anatomical bundles. Using the atlas-guided cluster approach, we observed consistent results across subjects with high reproducibility. In order to investigate the outlier elimination performance of the clustering algorithm, scenarios with varying amounts of noise were simulated and clustered with three different outlier elimination strategies. By exploiting the multithreading capabilities of modern multiprocessor systems in combination with novel algorithms, our toolkit clusters large datasets in a couple of minutes. Experiments were conducted to investigate the achievable speedup and to demonstrate the high performance of the clustering framework in a multiprocessing environment. PMID:24386292
Manipulating measurement scales in medical statistical analysis and data mining: A review of methodologies

PubMed Central

Marateb, Hamid Reza; Mansourian, Marjan; Adibi, Peyman; Farina, Dario

2014-01-01

Background: Selecting the correct statistical test and data mining method depends highly on the measurement scale of the data, the type of variables, and the purpose of the analysis. Different measurement scales are studied in detail, and statistical comparison, modeling, and data mining methods are reviewed using several medical examples. We present two ordinal-variable clustering examples, ordinal variables being more challenging to analyze, using the Wisconsin Breast Cancer Data (WBCD). Ordinal-to-interval scale conversion example: a breast cancer database of nine 10-level ordinal variables for 683 patients was analyzed by two ordinal-scale clustering methods. The performance of the clustering methods was assessed by comparison with the gold standard groups of malignant and benign cases that had been identified by clinical tests. Results: The sensitivity and accuracy of the two clustering methods were 98% and 96%, respectively. Their specificity was comparable. Conclusion: By using an appropriate clustering algorithm based on the measurement scale of the variables in the study, high performance can be achieved. Moreover, descriptive and inferential statistics, in addition to the modeling approach, must be selected based on the scale of the variables. PMID:24672565
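As an illustration of how a two-cluster solution can be scored against gold-standard labels, the following minimal Python sketch computes sensitivity, specificity, and accuracy from confusion counts. The function name `clustering_scores` and the counts in the usage lines are hypothetical and are not taken from the study.

```python
def clustering_scores(tp, fn, tn, fp):
    """Score cluster labels against a gold standard.

    tp / fn: malignant cases placed in the 'malignant' / 'benign' cluster.
    tn / fp: benign cases placed in the 'benign' / 'malignant' cluster.
    """
    sensitivity = tp / (tp + fn)                 # share of malignant cases recovered
    specificity = tn / (tn + fp)                 # share of benign cases recovered
    accuracy = (tp + tn) / (tp + fn + tn + fp)   # share of all cases placed correctly
    return sensitivity, specificity, accuracy


# Hypothetical counts, for illustration only (not WBCD results).
sens, spec, acc = clustering_scores(tp=8, fn=2, tn=9, fp=1)
```

Because cluster labels are arbitrary, such a comparison presumes that each cluster has first been matched to the gold-standard class it mostly contains.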
The Equivalence of Three Statistical Packages for Performing Hierarchical Cluster Analysis

ERIC Educational Resources Information Center

Blashfield, Roger

1977-01-01

Three different software programs which contain hierarchical agglomerative cluster analysis procedures were shown to generate different solutions on the same data set using apparently the same options. The basis for the differences in the solutions was the formulae used to calculate Euclidean distance. (Author/JKS)
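One way such discrepancies can arise is sketched below in Python: under average linkage, feeding squared rather than unsquared Euclidean distances into the same procedure can change the merge order. This is an assumed illustration only; the abstract does not say which formulae or linkage options the three packages used, and the function name and the four data points are hypothetical.

```python
from itertools import combinations


def average_linkage_merges(points, dist):
    """Agglomerate 1-D points with average linkage; return clusters in merge order."""
    clusters = [(p,) for p in points]
    merges = []

    def avg(a, b):
        # Mean pairwise dissimilarity between the members of two clusters.
        return sum(dist(x, y) for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: avg(*pair))
        clusters.remove(a)
        clusters.remove(b)
        merged = tuple(sorted(a + b))
        clusters.append(merged)
        merges.append(merged)
    return merges


data = [-2.1, 5.0, 10.0, 14.0]  # hypothetical observations
plain = average_linkage_merges(data, lambda x, y: abs(x - y))
squared = average_linkage_merges(data, lambda x, y: (x - y) ** 2)
# The second merge differs: `plain` joins 5.0 to (10.0, 14.0),
# while `squared` joins -2.1 and 5.0.
```

Single and complete linkage depend only on the rank order of the distances, so they would not show this effect; average linkage does because it averages the distance values themselves.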
Identifying Peer Institutions Using Cluster Analysis

ERIC Educational Resources Information Center

Boronico, Jess; Choksi, Shail S.

2012-01-01

The New York Institute of Technology's (NYIT) School of Management (SOM) wishes to develop a list of peer institutions for the purpose of benchmarking and monitoring/improving performance against other business schools. The procedure utilizes relevant criteria for the purpose of establishing this peer group by way of a cluster analysis. The…
ADHD and Reading Disabilities: A Cluster Analytic Approach for Distinguishing Subgroups.

ERIC Educational Resources Information Center

Bonafina, Marcela A.; Newcorn, Jeffrey H.; McKay, Kathleen E.; Koda, Vivian H.; Halperin, Jeffrey M.

2000-01-01

Using cluster analysis, a study empirically divided 54 children with attention-deficit/hyperactivity disorder (ADHD) based on their Full Scale IQ and reading ability. Clusters had different patterns of cognitive, behavioral, and neurochemical functions, as determined by discrepancies in Verbal-Performance IQ, academic achievement, parent…
Application of clustering methods: Regularized Markov clustering (R-MCL) for analyzing dengue virus similarity

NASA Astrophysics Data System (ADS)

Lestari, D.; Raharjo, D.; Bustamam, A.; Abdillah, B.; Widhianto, W.

2017-07-01

Dengue virus consists of 10 different constituent proteins and is classified into 4 major serotypes (DEN 1 - DEN 4). This study was designed to cluster 30 protein sequences of dengue virus taken from the Virus Pathogen Database and Analysis Resource (VIPR) using the Regularized Markov Clustering (R-MCL) algorithm and then to analyze the result. Implemented in Python 3.4, the R-MCL algorithm produces 8 clusters, with more than one centroid in several clusters. The number of centroids shows the density level of interaction. Protein interactions that are connected in a tissue form a protein complex that serves as a specific biological process unit. The analysis of the result shows that R-MCL clustering produces clusters of the dengue virus family based on the similar roles of their constituent proteins, regardless of serotype.
The applicability and effectiveness of cluster analysis

NASA Technical Reports Server (NTRS)

Ingram, D. S.; Actkinson, A. L.

1973-01-01

An insight into the characteristics which determine the performance of a clustering algorithm is presented. In order for the techniques which are examined to accurately cluster data, two conditions must be simultaneously satisfied. First, the data must have a particular structure, and second, the parameters chosen for the clustering algorithm must be correct. By examining the structure of the data from the Cl flight line, it is clear that no single set of parameters can be used to accurately cluster all the different crops. The effectiveness of either a noniterative or iterative clustering algorithm to accurately cluster data representative of the Cl flight line is questionable. Thus extensive a priori knowledge is required in order to use cluster analysis in its present form for applications like assisting in the definition of field boundaries and evaluating the homogeneity of a field. New or modified techniques are necessary for clustering to be a reliable tool.
Evaluation of hierarchical agglomerative cluster analysis methods for discrimination of primary biological aerosol

NASA Astrophysics Data System (ADS)

Crawford, I.; Ruske, S.; Topping, D. O.; Gallagher, M. W.

2015-11-01

In this paper we present improved methods for discriminating and quantifying primary biological aerosol particles (PBAPs) by applying hierarchical agglomerative cluster analysis to multi-parameter ultraviolet-light-induced fluorescence (UV-LIF) spectrometer data. The methods employed in this study can be applied to data sets in excess of 1 × 10^6 points on a desktop computer, allowing for each fluorescent particle in a data set to be explicitly clustered. This reduces the potential for misattribution found in subsampling and comparative attribution methods used in previous approaches, improving our capacity to discriminate and quantify PBAP meta-classes. We evaluate the performance of several hierarchical agglomerative cluster analysis linkages and data normalisation methods using laboratory samples of known particle types and an ambient data set. Fluorescent and non-fluorescent polystyrene latex spheres were sampled with a Wideband Integrated Bioaerosol Spectrometer (WIBS-4) where the optical size, asymmetry factor and fluorescent measurements were used as inputs to the analysis package. It was found that the Ward linkage with z-score or range normalisation performed best, correctly attributing 98 % and 98.1 % of the data points respectively. The best-performing methods were applied to the BEACHON-RoMBAS (Bio-hydro-atmosphere interactions of Energy, Aerosols, Carbon, H2O, Organics and Nitrogen-Rocky Mountain Biogenic Aerosol Study) ambient data set, where it was found that the z-score and range normalisation methods yield similar results, with each method producing clusters representative of fungal spores and bacterial aerosol, consistent with previous results.
The z-score result was compared to clusters generated with previous approaches (WIBS AnalysiS Program, WASP) where we observe that the subsampling and comparative attribution method employed by WASP results in the overestimation of the fungal spore concentration by a factor of 1.5 and the underestimation of bacterial aerosol concentration by a factor of 5. We suggest that this is likely due to errors arising from misattribution due to poor centroid definition and failure to assign particles to a cluster as a result of the subsampling and comparative attribution method employed by WASP. The methods used here allow for the entire fluorescent population of particles to be analysed, yielding an explicit cluster attribution for each particle and improving cluster centroid definition and our capacity to discriminate and quantify PBAP meta-classes compared to previous approaches.
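The two normalisation schemes named in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the abstract does not give the exact formulae, so the use of the population standard deviation, the function names, and the sample values are all assumptions rather than the authors' implementation.

```python
from statistics import mean, pstdev


def z_score(values):
    """Centre on the mean and scale by the (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def range_normalise(values):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


sizes = [2.0, 4.0, 6.0]      # hypothetical measurements of one input variable
z = z_score(sizes)           # zero mean, unit spread
r = range_normalise(sizes)   # values between 0 and 1
```

Both functions assume the input is not constant (otherwise the divisor is zero). Applying one of them to each input variable before clustering keeps variables with large numeric ranges from dominating the distance calculation.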
Screen Media Usage, Sleep Time and Academic Performance in Adolescents: Clustering a Self-Organizing Maps Analysis

PubMed Central

Peiró-Velert, Carmen; Valencia-Peris, Alexandra; González, Luis M.; García-Massó, Xavier; Serra-Añó, Pilar; Devís-Devís, José

2014-01-01

Screen media usage, sleep time and socio-demographic features are related to adolescents' academic performance, but interrelations are little explored. This paper describes these interrelations and behavioral profiles clustered in low and high academic performance. A nationally representative sample of 3,095 Spanish adolescents, aged 12 to 18, was surveyed on 15 variables linked to the purpose of the study. A Self-Organizing Maps analysis established non-linear interrelationships among these variables and identified behavior patterns in subsequent cluster analyses. Topological interrelationships established from the 15 emerging maps indicated that boys used more passive videogames and computers for playing than girls, who tended to use mobile phones to communicate with others. Adolescents with the highest academic performance were the youngest. They slept more and spent less time using sedentary screen media when compared to those with the lowest performance, and they also showed topological relationships with higher socioeconomic status adolescents. Cluster 1 grouped boys who spent more than 5.5 hours daily using sedentary screen media. Their academic performance was low and they slept an average of 8 hours daily. Cluster 2 gathered girls with an excellent academic performance, who slept nearly 9 hours per day, and devoted less time daily to sedentary screen media. Academic performance was directly related to sleep time and socioeconomic status, but inversely related to overall sedentary screen media usage. Profiles from the two clusters were strongly differentiated by gender, age, sedentary screen media usage, sleep time and academic achievement. Girls with the highest academic results had a medium socioeconomic status in Cluster 2. 
Findings may contribute to establishing recommendations about the timing and duration of screen media usage in adolescents and appropriate sleep time needed to successfully meet the demands of school academics and to improve interventions targeting to affect behavioral change. PMID:24941009
Characterizing Heterogeneity within Head and Neck Lesions Using Cluster Analysis of Multi-Parametric MRI Data

PubMed Central

Borri, Marco; Schmidt, Maria A.; Powell, Ceri; Koh, Dow-Mu; Riddell, Angela M.; Partridge, Mike; Bhide, Shreerang A.; Nutting, Christopher M.; Harrington, Kevin J.; Newbold, Katie L.; Leach, Martin O.

2015-01-01

Purpose: To describe a methodology, based on cluster analysis, to partition multi-parametric functional imaging data into groups (or clusters) of similar functional characteristics, with the aim of characterizing functional heterogeneity within head and neck tumour volumes. To evaluate the performance of the proposed approach on a set of longitudinal MRI data, analysing the evolution of the obtained sub-sets with treatment. Material and Methods: The cluster analysis workflow was applied to a combination of dynamic contrast-enhanced and diffusion-weighted imaging MRI data from a cohort of squamous cell carcinoma of the head and neck patients. Cumulative distributions of voxels, containing pre and post-treatment data and including both primary tumours and lymph nodes, were partitioned into k clusters (k = 2, 3 or 4). Principal component analysis and cluster validation were employed to investigate data composition and to independently determine the optimal number of clusters. The evolution of the resulting sub-regions with induction chemotherapy treatment was assessed relative to the number of clusters. Results: The clustering algorithm was able to separate clusters which significantly reduced in voxel number following induction chemotherapy from clusters with a non-significant reduction. Partitioning with the optimal number of clusters (k = 4), determined with cluster validation, produced the best separation between reducing and non-reducing clusters. Conclusion: The proposed methodology was able to identify tumour sub-regions with distinct functional properties, independently separating clusters which were affected differently by treatment. This work demonstrates that unsupervised cluster analysis, with no prior knowledge of the data, can be employed to provide a multi-parametric characterization of functional heterogeneity within tumour volumes. PMID:26398888
The Awareness and Educational Status on Oral Health of Elite Athletes: A Cross-Sectional Study with Cluster Analysis

ERIC Educational Resources Information Center

Ozgur, Bahar Odabas

2016-01-01

In this cross-sectional survey, this study aimed to determine the factors associated with oral health of elite athletes and to determine the clustering tendency of the variables by dendrogram, and to determine the relationship between predefined clusters and see how these clusters can converge. A total of 97 elite (that is, top-level performing)…
Periorbital melasma: Hierarchical cluster analysis of clinical features in Asian patients.

PubMed

Jung, Y S; Bae, J M; Kim, B J; Kang, J-S; Cho, S B

2017-11-01

Studies have shown melasma lesions to be distributed across the face in centrofacial, malar, and mandibular patterns. Meanwhile, however, melasma lesions of the periorbital area have yet to be thoroughly described. We analyzed normal and ultraviolet light-exposed photographs of patients with melasma. The periorbital melasma lesions were measured according to anatomical reference points and a hierarchical cluster analysis was performed. The periorbital melasma lesions showed clinical features of fine and homogenous melasma pigmentation, involving both the upper and lower eyelids that extended to other anatomical sites with a darker and coarser appearance. The hierarchical cluster analysis indicated that patients with periorbital melasma can be categorized into two clusters according to the surface anatomy of the face. Significant differences between cluster 1 and cluster 2 were found in lateral distance and inferolateral distance, but not in medial distance and superior distance. Comparing the two clusters, patients in cluster 2 were found to be significantly older and more commonly accompanied by melasma lesions of the temple and medial cheek. Our hierarchical cluster analysis of periorbital melasma lesions demonstrated that Asian patients with periorbital melasma can be categorized into two clusters according to the surface anatomy of the face. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
An application of cluster analysis for determining homogeneous subregions: The agroclimatological point of view. [Rio Grande do Sul, Brazil]

NASA Technical Reports Server (NTRS)

Parada, N. D. J. (Principal Investigator); Cappelletti, C. A.

1982-01-01

A stratification oriented to crop area and yield estimation problems was performed using an algorithm of clustering. The variables used were a set of agroclimatological characteristics measured in each one of the 232 municipalities of the State of Rio Grande do Sul, Brazil. A nonhierarchical cluster analysis was used and the pseudo F-statistics criterion was implemented for determining the "cut point" in the number of strata.
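A pseudo F-statistic for choosing the number of strata can be sketched as follows, assuming the common Calinski-Harabasz form (between-group sum of squares over k - 1, divided by within-group sum of squares over n - k). The abstract does not state the exact formula used, and the function name and the one-dimensional data are hypothetical.

```python
def pseudo_f(groups):
    """Pseudo-F (Calinski-Harabasz form) for a partition of 1-D values."""
    n = sum(len(g) for g in groups)   # number of observations
    k = len(groups)                   # number of strata
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (between / (k - 1)) / (within / (n - k))


# Two candidate stratifications of the same six hypothetical values.
two_strata = pseudo_f([[1.0, 2.0, 3.0], [11.0, 12.0, 13.0]])
three_strata = pseudo_f([[1.0, 2.0], [3.0, 11.0], [12.0, 13.0]])
```

Under this criterion the partition with the larger value is preferred; here the two-stratum split scores far higher because the three-stratum split forces the distant values 3.0 and 11.0 into one group.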
Statistical analysis of activation and reaction energies with quasi-variational coupled-cluster theory

NASA Astrophysics Data System (ADS)

Black, Joshua A.; Knowles, Peter J.

2018-06-01

The performance of quasi-variational coupled-cluster (QV) theory applied to the calculation of activation and reaction energies has been investigated. A statistical analysis of results obtained for six different sets of reactions has been carried out, and the results have been compared to those from standard single-reference methods. In general, the QV methods lead to increased activation energies and larger absolute reaction energies compared to those obtained with traditional coupled-cluster theory.
Performance analysis of unsupervised optimal fuzzy clustering algorithm for MRI brain tumor segmentation.

PubMed

Blessy, S A Praylin Selva; Sulochana, C Helen

2015-01-01

Segmentation of brain tumor from Magnetic Resonance Imaging (MRI) becomes very complicated due to the structural complexities of the human brain and the presence of intensity inhomogeneities. The objectives are to propose a method that effectively segments brain tumor from MR images and to evaluate the performance of the unsupervised optimal fuzzy clustering (UOFC) algorithm for segmentation of brain tumor from MR images. Segmentation is done by preprocessing the MR image to standardize intensity inhomogeneities, followed by feature extraction, feature fusion and clustering. Different validation measures are used to evaluate the performance of the proposed method using different clustering algorithms. The proposed method using the UOFC algorithm produces high sensitivity (96%) and low specificity (4%) compared to other clustering methods. Validation results clearly show that the proposed method with the UOFC algorithm effectively segments brain tumor from MR images.

Partially supervised speaker clustering.

PubMed

Tang, Hao; Chu, Stephen Mingyu; Hasegawa-Johnson, Mark; Huang, Thomas S

2012-05-01

Content-based multimedia indexing, retrieval, and processing as well as multimedia databases demand the structuring of the media content (image, audio, video, text, etc.), one significant goal being to associate the identity of the content to the individual segments of the signals. In this paper, we specifically address the problem of speaker clustering, the task of assigning every speech utterance in an audio stream to its speaker. We offer a complete treatment to the idea of partially supervised speaker clustering, which refers to the use of our prior knowledge of speakers in general to assist the unsupervised speaker clustering process. By means of an independent training data set, we encode the prior knowledge at the various stages of the speaker clustering pipeline via 1) learning a speaker-discriminative acoustic feature transformation, 2) learning a universal speaker prior model, and 3) learning a discriminative speaker subspace, or equivalently, a speaker-discriminative distance metric. We study the directional scattering property of the Gaussian mixture model (GMM) mean supervector representation of utterances in the high-dimensional space, and advocate exploiting this property by using the cosine distance metric instead of the euclidean distance metric for speaker clustering in the GMM mean supervector space. We propose to perform discriminant analysis based on the cosine distance metric, which leads to a novel distance metric learning algorithm—linear spherical discriminant analysis (LSDA). We show that the proposed LSDA formulation can be systematically solved within the elegant graph embedding general dimensionality reduction framework. 
Our speaker clustering experiments on the GALE database clearly indicate that 1) our speaker clustering methods based on the GMM mean supervector representation and vector-based distance metrics outperform traditional speaker clustering methods based on the “bag of acoustic features” representation and statistical model-based distance metrics, 2) our advocated use of the cosine distance metric yields consistent increases in the speaker clustering performance as compared to the commonly used euclidean distance metric, 3) our partially supervised speaker clustering concept and strategies significantly improve the speaker clustering performance over the baselines, and 4) our proposed LSDA algorithm further leads to state-of-the-art speaker clustering performance.
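The difference between the two distance metrics discussed in this abstract can be illustrated with a small Python sketch. The vectors are hypothetical two-dimensional stand-ins, not GMM mean supervectors, and the helper names are assumptions made for the example.

```python
import math


def euclidean_distance(u, v):
    """Straight-line distance; sensitive to vector length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u, v):
    """One minus the cosine of the angle; depends on direction only."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


u = (3.0, 4.0)
v = (6.0, 8.0)   # same direction as u, twice the length
w = (4.0, 3.0)   # similar length to u, different direction

nearest_euclidean = min((v, w), key=lambda x: euclidean_distance(u, x))
nearest_cosine = min((v, w), key=lambda x: cosine_distance(u, x))
# Euclidean distance picks w as the neighbour of u; cosine distance picks v.
```

When the information that separates groups lies mainly in the direction of the vectors, as the abstract argues for utterance supervectors, a metric that ignores length can group items differently from one that does not.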
Symptom clustering and quality of life in patients with ovarian cancer undergoing chemotherapy.

PubMed

Nho, Ju-Hee; Reul Kim, Sung; Nam, Joo-Hyun

2017-10-01

The symptom clusters in patients with ovarian cancer undergoing chemotherapy have not been well evaluated. We investigated the symptom clusters and effects of symptom clusters on the quality of life of patients with ovarian cancer. We recruited 210 ovarian cancer patients being treated with chemotherapy and used a descriptive cross-sectional study design to collect information on their symptoms. To determine inter-relationships among symptoms, a principal component analysis with varimax rotation was performed based on the patient's symptoms (fatigue, pain, sleep disturbance, chemotherapy-induced peripheral neuropathy, anxiety, depression, and sexual dysfunction). All patients had experienced at least two domains of concurrent symptoms, and there were two types of symptom clusters. The first symptom cluster consisted of anxiety, depression, fatigue, and sleep disturbance symptoms, while the second symptom cluster consisted of pain and chemotherapy-induced peripheral neuropathy symptoms. Our subgroup cluster analysis showed that ovarian cancer patients with higher-scoring symptoms had significantly poorer quality of life in both symptom cluster 1 and 2 subgroups, with subgroup-specific patterns. The symptom clusters were different depending on age, age at disease onset, disease duration, recurrence, and performance status of patients with ovarian cancer. In addition, ovarian cancer patients experienced different symptom clusters according to cancer stage. The current study demonstrated that there is a specific pattern of symptom clusters, and symptom clusters negatively influence the quality of life in patients with ovarian cancer. Identifying symptom clusters of ovarian cancer patients may have clinical implications in improving symptom management. Copyright © 2017 Elsevier Ltd. All rights reserved.
Using conjoint and cluster analysis in developing new product for micro, small and medium enterprises (SMEs) based on customer preferences (Case study: Lampung province's banana chips)

NASA Astrophysics Data System (ADS)

Kosasih, Wilson; Salomon, Lithrone Laricha; Hutomo, Reynaldo

2017-08-01

This paper discusses the development of new products of Micro, Small and Medium Enterprises (SMEs) to identify what attributes are considered by consumers, as well as combinations of attributes that need to be analyzed into the main preferences of consumers. The purpose of this research is to increase the added value and competitiveness of SMEs through product innovation. The object of this study is banana chips produced by SMEs from the province of Lampung, which are considered to be unique souvenirs of the province. The research data were collected by distributing questionnaires in Jakarta, which has a heterogeneous population, in order to develop the marketing of banana chips and increase their market share in Indonesia. Data processing was performed using conjoint analysis and cluster analysis. Segmentation was performed using conjoint analysis based on the importance level of attributes and part-worth of level attributes of each cluster. Finally, the characteristics and consumer preferences of each cluster will be a consideration in determining the product development and marketing strategies.
Application of Multiple Imputation for Missing Values in Three-Way Three-Mode Multi-Environment Trial Data

PubMed Central

Tian, Ting; McLachlan, Geoffrey J.; Dieters, Mark J.; Basford, Kaye E.

2015-01-01

It is a common occurrence in plant breeding programs to observe missing values in three-way three-mode multi-environment trial (MET) data. We proposed modifications of models for estimating missing observations for these data arrays, and developed a novel approach in terms of hierarchical clustering. Multiple imputation (MI) was used in four ways: multiple agglomerative hierarchical clustering, normal distribution model, normal regression model, and predictive mean matching. The latter three models used both Bayesian analysis and non-Bayesian analysis, while the first approach used a clustering procedure with randomly selected attributes and assigned real values from the nearest neighbour to the one with missing observations. Different proportions of data entries in six complete datasets were randomly selected to be missing and the MI methods were compared based on the efficiency and accuracy of estimating those values. The results indicated that the models using Bayesian analysis had slightly higher accuracy of estimation performance than those using non-Bayesian analysis but they were more time-consuming. However, the novel approach of multiple agglomerative hierarchical clustering demonstrated the overall best performances. PMID:26689369
Performance Analysis Tool for HPC and Big Data Applications on Scientific Clusters

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yoo, Wucherl; Koo, Michelle; Cao, Yu

Big data is prevalent in HPC computing. Many HPC projects rely on complex workflows to analyze terabytes or petabytes of data. These workflows often require running over thousands of CPU cores and performing simultaneous data accesses, data movements, and computation. It is challenging to analyze the performance involving terabytes or petabytes of workflow data or measurement data of the executions, from complex workflows over a large number of nodes and multiple parallel task executions. To help identify performance bottlenecks or debug the performance issues in large-scale scientific applications and scientific clusters, we have developed a performance analysis framework, using state-of-the-art open-source big data processing tools. Our tool can ingest system logs and application performance measurements to extract key performance features, and apply the most sophisticated statistical tools and data mining methods on the performance data. It utilizes an efficient data processing engine to allow users to interactively analyze a large amount of different types of logs and measurements. To illustrate the functionality of the big data analysis framework, we conduct case studies on the workflows from an astronomy project known as the Palomar Transient Factory (PTF) and the job logs from the genome analysis scientific cluster. Our study processed many terabytes of system logs and application performance measurements collected on the HPC systems at NERSC. The implementation of our tool is generic enough to be used for analyzing the performance of other HPC systems and Big Data workflows.
Modeling Uncertainties in EEG Microstates: Analysis of Real and Imagined Motor Movements Using Probabilistic Clustering-Driven Training of Probabilistic Neural Networks.

PubMed

Dinov, Martin; Leech, Robert

2017-01-01

Part of the process of EEG microstate estimation involves clustering EEG channel data at the global field power (GFP) maxima, very commonly using a modified K-means approach. Clustering has also been done deterministically, despite there being uncertainties in multiple stages of the microstate analysis, including the GFP peak definition, the clustering itself and in the post-clustering assignment of microstates back onto the EEG timecourse of interest. We perform a fully probabilistic microstate clustering and labeling, to account for these sources of uncertainty using the closest probabilistic analog to KM called Fuzzy C-means (FCM). We train softmax multi-layer perceptrons (MLPs) using the KM and FCM-inferred cluster assignments as target labels, to then allow for probabilistic labeling of the full EEG data instead of the usual correlation-based deterministic microstate label assignment typically used. We assess the merits of the probabilistic analysis vs. the deterministic approaches in EEG data recorded while participants perform real or imagined motor movements from a publicly available data set of 109 subjects. Though FCM group template maps that are almost topographically identical to KM were found, there is considerable uncertainty in the subsequent assignment of microstate labels. In general, imagined motor movements are less predictable on a time point-by-time point basis, possibly reflecting the more exploratory nature of the brain state during imagined, compared to during real motor movements. We find that some relationships may be more evident using FCM than using KM and propose that future microstate analysis should preferably be performed probabilistically rather than deterministically, especially in situations such as with brain computer interfaces, where both training and applying models of microstates need to account for uncertainty. 
Probabilistic neural network-driven microstate assignment has a number of advantages that we have discussed, which are likely to be further developed and exploited in future studies. In conclusion, probabilistic clustering and a probabilistic neural network-driven approach to microstate analysis are likely to better model and reveal details and the variability hidden in current deterministic and binarized microstate assignment and analyses.
Modeling Uncertainties in EEG Microstates: Analysis of Real and Imagined Motor Movements Using Probabilistic Clustering-Driven Training of Probabilistic Neural Networks

PubMed Central

Dinov, Martin; Leech, Robert

2017-01-01

Part of the process of EEG microstate estimation involves clustering EEG channel data at the global field power (GFP) maxima, very commonly using a modified K-means approach. Clustering has also been done deterministically, despite there being uncertainties in multiple stages of the microstate analysis, including the GFP peak definition, the clustering itself and in the post-clustering assignment of microstates back onto the EEG timecourse of interest. We perform a fully probabilistic microstate clustering and labeling, to account for these sources of uncertainty using the closest probabilistic analog to KM called Fuzzy C-means (FCM). We train softmax multi-layer perceptrons (MLPs) using the KM and FCM-inferred cluster assignments as target labels, to then allow for probabilistic labeling of the full EEG data instead of the usual correlation-based deterministic microstate label assignment typically used. We assess the merits of the probabilistic analysis vs. the deterministic approaches in EEG data recorded while participants perform real or imagined motor movements from a publicly available data set of 109 subjects. Though FCM group template maps that are almost topographically identical to KM were found, there is considerable uncertainty in the subsequent assignment of microstate labels. In general, imagined motor movements are less predictable on a time point-by-time point basis, possibly reflecting the more exploratory nature of the brain state during imagined, compared to during real motor movements. We find that some relationships may be more evident using FCM than using KM and propose that future microstate analysis should preferably be performed probabilistically rather than deterministically, especially in situations such as with brain computer interfaces, where both training and applying models of microstates need to account for uncertainty. 
Probabilistic neural network-driven microstate assignment has a number of advantages that we have discussed, which are likely to be further developed and exploited in future studies. In conclusion, probabilistic clustering and a probabilistic neural network-driven approach to microstate analysis are likely to better model and reveal details and the variability hidden in current deterministic and binarized microstate assignment and analyses. PMID:29163110
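The contrast between deterministic K-means labels and Fuzzy C-means memberships can be made concrete with a small Python sketch. It uses the standard FCM membership formula for fixed centroids. The function names, the fuzzifier default `m=2.0`, and the toy centroids are assumptions for illustration and are not taken from the study, which works on EEG topographies rather than 2-D points.

```python
import math

def fcm_memberships(point, centroids, m=2.0):
    """Return the fuzzy c-means membership of `point` in each cluster."""
    dists = [math.dist(point, c) for c in centroids]
    # A point sitting exactly on a centroid belongs fully to that cluster.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    # u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1)); memberships sum to 1.
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]

def hard_label(point, centroids):
    """K-means style assignment: index of the single nearest centroid."""
    return min(range(len(centroids)),
               key=lambda k: math.dist(point, centroids[k]))

# Hypothetical 2-D centroids and one sample point.
centroids = [(0.0, 0.0), (4.0, 0.0)]
sample = (1.0, 0.0)
print(hard_label(sample, centroids))
print(fcm_memberships(sample, centroids))
```

Here the hard label is simply cluster 0, while the fuzzy memberships come out at roughly 0.9 and 0.1, which keeps the information that the sample is not entirely unrelated to the second cluster. Soft targets of this kind are what the abstract describes feeding to the softmax MLPs.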
Cluster Analysis of Clinical Data Identifies Fibromyalgia Subgroups

PubMed Central

Docampo, Elisa; Collado, Antonio; Escaramís, Geòrgia; Carbonell, Jordi; Rivera, Javier; Vidal, Javier; Alegre, José

2013-01-01

Introduction: Fibromyalgia (FM) is mainly characterized by widespread pain and multiple accompanying symptoms, which hinder FM assessment and management. In order to reduce FM heterogeneity we classified clinical data into simplified dimensions that were used to define FM subgroups. Material and Methods: 48 variables were evaluated in 1,446 Spanish FM cases fulfilling 1990 ACR FM criteria. A partitioning analysis was performed to find groups of variables similar to each other. Similarities between variables were identified and the variables were grouped into dimensions. This was performed in a subset of 559 patients, and cross-validated in the remaining 887 patients. For each sample and dimension, a composite index was obtained based on the weights of the variables included in the dimension. Finally, a clustering procedure was applied to the indexes, resulting in FM subgroups. Results: Variables clustered into three independent dimensions: “symptomatology”, “comorbidities” and “clinical scales”. Only the first two dimensions were considered for the construction of FM subgroups. Resulting scores classified FM samples into three subgroups: low symptomatology and comorbidities (Cluster 1), high symptomatology and comorbidities (Cluster 2), and high symptomatology but low comorbidities (Cluster 3), showing differences in measures of disease severity. Conclusions: We have identified three subgroups of FM samples in a large cohort of FM by clustering clinical data. Our analysis stresses the importance of family and personal history of FM comorbidities. Also, the resulting patient clusters could indicate different forms of the disease, relevant to future research, and might have an impact on clinical assessment. PMID:24098674
Dynamic competitive probabilistic principal components analysis.

PubMed

López-Rubio, Ezequiel; Ortiz-DE-Lazcano-Lobato, Juan Miguel

2009-04-01

We present a new neural model which extends the classical competitive learning (CL) by performing a Probabilistic Principal Components Analysis (PPCA) at each neuron. The model also has the ability to learn the number of basis vectors required to represent the principal directions of each cluster, so it overcomes a drawback of most local PCA models, where the dimensionality of a cluster must be fixed a priori. Experimental results are presented to show the performance of the network with multispectral image data.
A Cluster Analysis of Bronchial Asthma Patients with Depressive Symptoms.

PubMed

Seino, Yo; Hasegawa, Takashi; Koya, Toshiyuki; Sakagami, Takuro; Mashima, Ichiro; Shimizu, Natsue; Muramatsu, Yoshiyuki; Muramatsu, Kumiko; Suzuki, Eiichi; Kikuchi, Toshiaki

2018-03-09

Objective: Whether or not depression affects the control or severity of asthma is unclear. We performed a cluster analysis of asthma patients with depressive symptoms to clarify their characteristics. Methods and subjects: Multiple medical institutions in Niigata Prefecture, Japan, were surveyed in 2014. We recorded age, disease duration, body mass index (BMI), and medications, and surveyed asthma control status and severity, as well as depressive symptoms and adherence to treatment, using questionnaires. A hierarchical cluster analysis was performed on the group of patients assessed as having depression. Results: Of 2,273 patients, 128 were assessed as being positive for depressive symptoms (DS[+]). Thirty-three were excluded because of missing data, and the remaining 95 DS[+] patients were classified into 3 clusters (A, B, and C). The patients in cluster A (n=19) were elderly, had severe, poorly controlled asthma, and demonstrated possible adherence barriers; those in cluster B (n=26) were elderly with a low BMI and had no significant adherence barriers but had severe, poorly controlled asthma; and those in cluster C (n=50) were younger, with a high BMI, no significant adherence barriers, well-controlled asthma, and few were severely affected. The scores for depressive symptoms were not significantly different between clusters. Conclusion: About half of the patients in the DS[+] group had severe, poorly controlled asthma, and these clusters could be distinguished by their ASK-12 score, which reflects adherence barriers. The control status and severity of asthma may also be related to the age, disease duration, and BMI in the DS[+] group.
Clustering cancer gene expression data by projective clustering ensemble

PubMed Central

Yu, Xianxue; Yu, Guoxian

2017-01-01

Gene expression data analysis has paramount implications for gene treatments, cancer diagnosis and other domains. Clustering is an important and promising tool to analyze gene expression data. Gene expression data is often characterized by a large number of genes but limited samples, thus various projective clustering techniques and ensemble techniques have been suggested to combat these challenges. However, it is rather challenging to synergize these two kinds of techniques to avoid the curse of dimensionality problem and to boost the performance of gene expression data clustering. In this paper, we employ a projective clustering ensemble (PCE) to integrate the advantages of projective clustering and ensemble clustering, and to avoid the dilemma of combining multiple projective clusterings. Our experimental results on publicly available cancer gene expression data show PCE can improve the quality of clustering gene expression data by at least 4.5% (on average) over other related techniques, including dimensionality reduction based single clustering and ensemble approaches. The empirical study demonstrates that, to further boost the performance of clustering cancer gene expression data, it is necessary and promising to synergize projective clustering with ensemble clustering. PCE can serve as an effective alternative technique for clustering gene expression data. PMID:28234920
Identification and characterization of near-fatal asthma phenotypes by cluster analysis.

PubMed

Serrano-Pariente, J; Rodrigo, G; Fiz, J A; Crespo, A; Plaza, V

2015-09-01

Near-fatal asthma (NFA) is a heterogeneous clinical entity and several profiles of patients have been described according to different clinical, pathophysiological and histological features. However, there are no previous studies that identify different phenotypes of NFA in an unbiased way, using statistical methods such as cluster analysis. Therefore, the aim of the present study was to identify and to characterize phenotypes of near-fatal asthma using a cluster analysis. Over a period of 2 years, 33 Spanish hospitals enrolled 179 asthmatics admitted for an episode of NFA. A cluster analysis using a two-step algorithm was performed on data from 84 of these cases. The analysis defined three clusters of patients with NFA: cluster 1, the largest, including older patients with clinical and therapeutic criteria of severe asthma; cluster 2, with a high proportion of respiratory arrest (68%), impaired consciousness level (82%) and mechanical ventilation (93%); and cluster 3, which included younger patients, characterized by an insufficient anti-inflammatory treatment and frequent sensitization to Alternaria alternata and soybean. These results identify specific asthma phenotypes involved in NFA, confirming in part previous findings observed in studies with a clinical approach. The identification of patients with a specific NFA phenotype could suggest interventions to prevent future severe asthma exacerbations. © 2015 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Subspace K-means clustering.

PubMed

Timmerman, Marieke E; Ceulemans, Eva; De Roover, Kim; Van Leeuwen, Karla

2013-12-01

To achieve an insightful clustering of multivariate data, we propose subspace K-means. Its central idea is to model the centroids and cluster residuals in reduced spaces, which allows for dealing with a wide range of cluster types and yields rich interpretations of the clusters. We review the existing related clustering methods, including deterministic, stochastic, and unsupervised learning approaches. To evaluate subspace K-means, we performed a comparative simulation study, in which we manipulated the overlap of subspaces, the between-cluster variance, and the error variance. The study shows that the subspace K-means algorithm is sensitive to local minima but that the problem can be reasonably dealt with by using partitions of various cluster procedures as a starting point for the algorithm. Subspace K-means performs very well in recovering the true clustering across all conditions considered and appears to be superior to its competitor methods: K-means, reduced K-means, factorial K-means, mixtures of factor analyzers (MFA), and MCLUST. The best competitor method, MFA, showed a performance similar to that of subspace K-means in easy conditions but deteriorated in more difficult ones. Using data from a study on parental behavior, we show that subspace K-means analysis provides a rich insight into the cluster characteristics, in terms of both the relative positions of the clusters (via the centroids) and the shape of the clusters (via the within-cluster residuals).
Identifying novel phenotypes of acute heart failure using cluster analysis of clinical variables.

PubMed

Horiuchi, Yu; Tanimoto, Shuzou; Latif, A H M Mahbub; Urayama, Kevin Y; Aoki, Jiro; Yahagi, Kazuyuki; Okuno, Taishi; Sato, Yu; Tanaka, Tetsu; Koseki, Keita; Komiyama, Kota; Nakajima, Hiroyoshi; Hara, Kazuhiro; Tanabe, Kengo

2018-07-01

Acute heart failure (AHF) is a heterogeneous disease caused by various cardiovascular (CV) pathophysiologies and multiple non-CV comorbidities. We aimed to identify clinically important subgroups to improve our understanding of the pathophysiology of AHF and inform clinical decision-making. We evaluated detailed clinical data of 345 consecutive AHF patients using non-hierarchical cluster analysis of 77 variables, including age, sex, HF etiology, comorbidities, physical findings, laboratory data, electrocardiogram, echocardiogram and treatment during hospitalization. Cox proportional hazards regression analysis was performed to estimate the association between the clusters and clinical outcomes. Three clusters were identified. Cluster 1 (n=108) represented "vascular failure". This cluster had the highest average systolic blood pressure at admission and lung congestion with type 2 respiratory failure. Cluster 2 (n=89) represented "cardiac and renal failure". They had the lowest ejection fraction (EF) and worst renal function. Cluster 3 (n=148) comprised mostly older patients and had the highest prevalence of atrial fibrillation and preserved EF. Death or HF hospitalization within 12 months occurred in 23% of Cluster 1, 36% of Cluster 2 and 36% of Cluster 3 (p=0.034). Compared with Cluster 1, risk of death or HF hospitalization was 1.74 (95% CI, 1.03-2.95, p=0.037) for Cluster 2 and 1.82 (95% CI, 1.13-2.93, p=0.014) for Cluster 3. Cluster analysis may be effective in producing clinically relevant categories of AHF, and may suggest underlying pathophysiology and potential utility in predicting clinical outcomes. Copyright © 2018 Elsevier B.V. All rights reserved.
Unsupervised analysis of small animal dynamic Cerenkov luminescence imaging

NASA Astrophysics Data System (ADS)

Spinelli, Antonello E.; Boschi, Federico

2011-12-01

Clustering analysis (CA) and principal component analysis (PCA) were applied to dynamic Cerenkov luminescence images (dCLI). In order to investigate the performance of the proposed approaches, two distinct dynamic data sets obtained by injecting mice with 32P-ATP and 18F-FDG were acquired using the IVIS 200 optical imager. The k-means clustering algorithm was applied to dCLI and was implemented using Interactive Data Language 8.1. We show that cluster analysis allows us to obtain good agreement between the clustered and the corresponding emission regions like the bladder, the liver, and the tumor. We also show a good correspondence between the time activity curves of the different regions obtained by using CA and manual region of interest analysis on dCLI and PCA images. We conclude that CA provides an automatic unsupervised method for the analysis of preclinical dynamic Cerenkov luminescence image data.
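As a rough illustration of how k-means can group pixels by their time-activity curves, the following Python sketch runs plain Lloyd iterations on a handful of made-up curves. It is only a sketch under stated assumptions: the study used Interactive Data Language rather than Python, and the `kmeans` function, its first-k initialisation, and the `pixels` data are hypothetical.

```python
import math

def kmeans(curves, k, iterations=20):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    # Deterministic initialisation: take the first k curves as centroids.
    centroids = [list(c) for c in curves[:k]]
    labels = [0] * len(curves)
    for _ in range(iterations):
        # Assignment step: each curve joins its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(c, centroids[j]))
                  for c in curves]
        # Update step: each centroid becomes the mean curve of its members.
        for j in range(k):
            members = [c for c, lab in zip(curves, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels, centroids

# Hypothetical time-activity curves (one per pixel, four time frames each).
pixels = [
    [1.0, 2.0, 3.0, 4.0],   # signal accumulating over time
    [4.0, 3.0, 2.0, 1.0],   # signal washing out
    [1.1, 2.1, 2.9, 4.2],
    [3.9, 3.1, 2.0, 0.9],
]
labels, centroids = kmeans(pixels, k=2)
print(labels)  # [0, 1, 0, 1]
```

Pixels with similar kinetics end up in the same cluster, and each centroid is itself a mean time-activity curve that can be compared with a curve drawn from a manual region of interest.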
Identification of chronic rhinosinusitis phenotypes using cluster analysis.

PubMed

Soler, Zachary M; Hyer, J Madison; Ramakrishnan, Viswanathan; Smith, Timothy L; Mace, Jess; Rudmik, Luke; Schlosser, Rodney J

2015-05-01

Current clinical classifications of chronic rhinosinusitis (CRS) have been largely defined based upon preconceived notions of factors thought to be important, such as polyp or eosinophil status. Unfortunately, these classification systems have little correlation with symptom severity or treatment outcomes. Unsupervised clustering can be used to identify phenotypic subgroups of CRS patients, describe clinical differences in these clusters and define simple algorithms for classification. A multi-institutional, prospective study of 382 patients with CRS who had failed initial medical therapy completed the Sino-Nasal Outcome Test (SNOT-22), Rhinosinusitis Disability Index (RSDI), Medical Outcomes Study Short Form-12 (SF-12), Pittsburgh Sleep Quality Index (PSQI), and Patient Health Questionnaire (PHQ-2). Objective measures of CRS severity included Brief Smell Identification Test (B-SIT), CT, and endoscopy scoring. All variables were reduced and unsupervised hierarchical clustering was performed. After clusters were defined, variations in medication usage were analyzed. Discriminant analysis was performed to develop a simplified, clinically useful algorithm for clustering. Clustering was largely determined by age, severity of patient reported outcome measures, depression, and fibromyalgia. CT and endoscopy varied somewhat among clusters. Traditional clinical measures, including polyp/atopic status, prior surgery, B-SIT and asthma, did not vary among clusters. A simplified algorithm based upon productivity loss, SNOT-22 score, and age predicted clustering with 89% accuracy. Medication usage among clusters did vary significantly. A simplified algorithm based upon hierarchical clustering is able to classify CRS patients and predict medication usage. Further studies are warranted to determine if such clustering predicts treatment outcomes. © 2015 ARS-AAOA, LLC.
Reproducibility of Cognitive Profiles in Psychosis Using Cluster Analysis.

PubMed

Lewandowski, Kathryn E; Baker, Justin T; McCarthy, Julie M; Norris, Lesley A; Öngür, Dost

2018-04-01

Cognitive dysfunction is a core symptom dimension that cuts across the psychoses. Recent findings support classification of patients along the cognitive dimension using cluster analysis; however, data-derived groupings may be highly determined by sampling characteristics and the measures used to derive the clusters, and so their interpretability must be established. We examined cognitive clusters in a cross-diagnostic sample of patients with psychosis and associations with clinical and functional outcomes. We then compared our findings to a previous report of cognitive clusters in a separate sample using a different cognitive battery. Participants with affective or non-affective psychosis (n=120) and healthy controls (n=31) were administered the MATRICS Consensus Cognitive Battery, and clinical and community functioning assessments. Cluster analyses were performed on cognitive variables, and clusters were compared on demographic, cognitive, and clinical measures. Results were compared to findings from our previous report. A four-cluster solution provided a good fit to the data; profiles included a neuropsychologically normal cluster, a globally impaired cluster, and two clusters of mixed profiles. Cognitive burden was associated with symptom severity and poorer community functioning. The patterns of cognitive performance by cluster were highly consistent with our previous findings. We found evidence of four cognitive subgroups of patients with psychosis, with cognitive profiles that map closely to those produced in our previous work. Clusters were associated with clinical and community variables and a measure of premorbid functioning, suggesting that they reflect meaningful groupings: replicable, and related to clinical presentation and functional outcomes. (JINS, 2018, 24, 382-390).
Alteration mapping at Goldfield, Nevada, by cluster and discriminant analysis of Landsat digital data. [mapping of hydrothermally altered volcanic rocks]

NASA Technical Reports Server (NTRS)

Ballew, G.

1977-01-01

The ability of Landsat multispectral digital data to differentiate among 62 combinations of rock and alteration types at the Goldfield mining district of Western Nevada was investigated by using statistical techniques of cluster and discriminant analysis. Multivariate discriminant analysis was not effective in classifying each of the 62 groups, with classification results essentially the same whether data of four channels alone or combined with six ratios of channels were used. Bivariate plots of group means revealed a cluster of three groups including mill tailings, basalt and all other rock and alteration types. Automatic hierarchical clustering based on the four-dimensional Mahalanobis distance between group means of 30 groups having five or more samples was performed using Johnson's HICLUS program. The results of the cluster analysis revealed hierarchies of mill tailings vs. natural materials, basalt vs. non-basalt, highly reflectant rocks vs. other rocks and exclusively unaltered rocks vs. predominantly altered rocks. The hierarchies were used to determine the order in which sets of multiple discriminant analyses were to be performed and the resulting discriminant functions were used to produce a map of geology and alteration which has an overall accuracy of 70 percent for discriminating exclusively unaltered rocks from predominantly altered rocks.
College Students' Perceptions of Job Demands, Recommended Retirement Ages, and Age of Optimal Performance in Selected Occupations

ERIC Educational Resources Information Center

Panek, Paul E.; Staats, Sara; Hiles, Amanda

2006-01-01

Two studies were conducted. In study one, 100 participants rated 60 occupations on the amount of cognitive/intellectual, physical, sensory-perceptual, and perceptual-motor demands they perceived as required for successful performance in that particular occupation. Results of a cluster analysis determined four clusters of occupations on the basis of…

Elucidation of the Pattern of the Onset of Male Lower Urinary Tract Symptoms Using Cluster Analysis: Efficacy of Tamsulosin in Each Symptom Group.

PubMed

Aikawa, Ken; Kataoka, Masao; Ogawa, Soichiro; Akaihata, Hidenori; Sato, Yuichi; Yabe, Michihiro; Hata, Junya; Koguchi, Tomoyuki; Kojima, Yoshiyuki; Shiragasawa, Chihaya; Kobayashi, Toshimitsu; Yamaguchi, Osamu

2015-08-01

To present a new grouping of male patients with lower urinary tract symptoms (LUTS) based on symptom patterns and clarify whether the therapeutic effect of α1-blocker differs among the groups. We performed secondary analysis of anonymous data from 4815 patients enrolled in a postmarketing surveillance study of tamsulosin in Japan. Data on 7 International Prostate Symptom Score (IPSS) items at the initial visit were used in the cluster analysis. IPSS and quality of life (QOL) scores before and after tamsulosin treatment for 12 weeks were assessed in each cluster. Partial correlation coefficients were also obtained for IPSS and QOL scores based on changes before and after treatment. Five symptom groups were identified by cluster analysis of IPSS. Based on its symptom profile, each cluster was labeled as minimal type (cluster 1), multiple severe type (cluster 2), weak stream type (cluster 3), storage type (cluster 4), or voiding type (cluster 5). Prevalence and the mean symptom score were significantly improved in almost all symptoms in all clusters by tamsulosin treatment. Nocturia and weak stream had the strongest effect on QOL in clusters 1, 2, and 4 and clusters 3 and 5, respectively. The study clarified, by cluster analysis of IPSS, that 5 characteristic symptom patterns exist in male patients with LUTS. Tamsulosin improved various symptoms and QOL in each symptom group. The study reports many male patients with LUTS being satisfied with monotherapy using tamsulosin and suggests the usefulness of α1-blockers as a drug of first choice. Copyright © 2015 Elsevier Inc. All rights reserved.
Obstructive Sleep Apnea: A Cluster Analysis at Time of Diagnosis

PubMed Central

Grillet, Yves; Richard, Philippe; Stach, Bruno; Vivodtzev, Isabelle; Timsit, Jean-Francois; Lévy, Patrick; Tamisier, Renaud; Pépin, Jean-Louis

2016-01-01

Background: The classification of obstructive sleep apnea is based on sleep study criteria that may not adequately capture disease heterogeneity. Improved phenotyping may improve prognosis prediction and help select therapeutic strategies. Objectives: This study used cluster analysis to investigate the clinical clusters of obstructive sleep apnea. Methods: An ascending hierarchical cluster analysis was performed on baseline symptoms, physical examination, risk factor exposure and co-morbidities from 18,263 participants in the OSFP (French national registry of sleep apnea). The probability for criteria to be associated with a given cluster was assessed using odds ratios, determined by univariate logistic regression. Results: Six clusters were identified, in which patients varied considerably in age, sex, symptoms, obesity, co-morbidities and environmental risk factors. The main significant differences between clusters were minimally symptomatic versus sleepy obstructive sleep apnea patients, lean versus obese, and among obese patients different combinations of co-morbidities and environmental risk factors. Conclusions: Our cluster analysis identified six distinct clusters of obstructive sleep apnea. Our findings underscore the high degree of heterogeneity that exists within obstructive sleep apnea patients regarding clinical presentation, risk factors and consequences. This may help in both research and clinical practice for validating new prevention programs, in diagnosis and in decisions regarding therapeutic strategies. PMID:27314230
Cognitive profiles in euthymic patients with bipolar disorders: results from the FACE-BD cohort.

PubMed

Roux, Paul; Raust, Aurélie; Cannavo, Anne Sophie; Aubin, Valérie; Aouizerate, Bruno; Azorin, Jean-Michel; Bellivier, Frank; Belzeaux, Raoul; Bougerol, Thierry; Cussac, Iréna; Courtet, Philippe; Etain, Bruno; Gard, Sébastien; Job, Sophie; Kahn, Jean-Pierre; Leboyer, Marion; Olié, Emilie; Henry, Chantal; Passerieux, Christine

2017-03-01

Although cognitive deficits are a well-established feature of bipolar disorders (BD), even during periods of euthymia, little is known about cognitive phenotype heterogeneity among patients with BD. We investigated neuropsychological performance in 258 euthymic patients with BD recruited via the French network of expert centers for BD. We used a test battery assessing six domains of cognition. Hierarchical cluster analysis of the cross-sectional data was used to determine the optimal number of subgroups and to assign each patient to a specific cognitive cluster. Subsequently, subjects from each cluster were compared on demographic, clinical functioning, and pharmacological variables. A four-cluster solution was identified. The global cognitive performance was above normal in one cluster and below normal in another. The other two clusters had a near-normal cognitive performance, with above and below average verbal memory, respectively. Among the four clusters, significant differences were observed in estimated intelligence quotient and social functioning, which were lower for the low cognitive performers compared to the high cognitive performers. These results confirm the existence of several distinct cognitive profiles in BD. Identification of these profiles may help to develop profile-specific cognitive remediation programs, which might improve functioning in BD. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Optimizing R with SparkR on a commodity cluster for biomedical research.

PubMed

Sedlmayr, Martin; Würfl, Tobias; Maier, Christian; Häberle, Lothar; Fasching, Peter; Prokosch, Hans-Ulrich; Christoph, Jan

2016-12-01

Medical researchers are challenged today by the enormous amount of data collected in healthcare. Analysis methods such as genome-wide association studies (GWAS) are often computationally intensive and thus require enormous resources to be performed in a reasonable amount of time. While dedicated clusters and public clouds may deliver the desired performance, their use requires upfront financial efforts or anonymous data, which is often not possible for preliminary or occasional tasks. We explored the possibilities to build a private, flexible cluster for processing scripts in R based on commodity, non-dedicated hardware of our department. For this, a GWAS-calculation in R on a single desktop computer, a Message Passing Interface (MPI)-cluster, and a SparkR-cluster were compared with regard to performance, scalability, quality, and simplicity. The original script had a projected runtime of three years on a single desktop computer. Optimizing the script in R already yielded a significant reduction in computing time (2 weeks). By using R-MPI and SparkR, we were able to parallelize the computation and reduce the time to less than three hours (2.6 h) on already available, standard office computers. While MPI is a proven approach in high-performance clusters, it requires rather static, dedicated nodes. SparkR and its Hadoop siblings allow for a dynamic, elastic environment with automated failure handling. SparkR also scales better with the number of nodes in the cluster than MPI due to optimized data communication. R is a popular environment for clinical data analysis. The new SparkR solution offers elastic resources and allows supporting big data analysis using R even on non-dedicated resources with minimal change to the original code. To unleash the full potential, additional efforts should be invested to customize and improve the algorithms, especially with regard to data distribution. Copyright © 2016 The Authors. Published by Elsevier Ireland Ltd. All rights reserved.
Are clusters of dietary patterns and cluster membership stable over time? Results of a longitudinal cluster analysis study.

PubMed

Walthouwer, Michel Jean Louis; Oenema, Anke; Soetens, Katja; Lechner, Lilian; de Vries, Hein

2014-11-01

Developing nutrition education interventions based on clusters of dietary patterns can only be done adequately when it is clear if distinctive clusters of dietary patterns can be derived and reproduced over time, if cluster membership is stable, and if it is predictable which type of people belong to a certain cluster. Hence, this study aimed to: (1) identify clusters of dietary patterns among Dutch adults, (2) test the reproducibility of these clusters and stability of cluster membership over time, and (3) identify sociodemographic predictors of cluster membership and cluster transition. This study had a longitudinal design with online measurements at baseline (N=483) and 6 months follow-up (N=379). Dietary intake was assessed with a validated food frequency questionnaire. A hierarchical cluster analysis was performed, followed by a K-means cluster analysis. Multinomial logistic regression analyses were conducted to identify the sociodemographic predictors of cluster membership and cluster transition. At baseline and follow-up, a comparable three-cluster solution was derived, distinguishing a healthy, moderately healthy, and unhealthy dietary pattern. Male and lower educated participants were significantly more likely to have a less healthy dietary pattern. Further, 251 (66.2%) participants remained in the same cluster, 45 (11.9%) participants changed to an unhealthier cluster, and 83 (21.9%) participants shifted to a healthier cluster. Men and people living alone were significantly more likely to shift toward a less healthy dietary pattern. Distinctive clusters of dietary patterns can be derived. Yet, cluster membership is unstable and only few sociodemographic factors were associated with cluster membership and cluster transition. These findings imply that clusters based on dietary intake may not be suitable as a basis for nutrition education interventions. Copyright © 2014 Elsevier Ltd. All rights reserved.
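The stability figures reported above can be reproduced from two vectors of cluster labels. The following is a minimal sketch, not the authors' analysis: the function name `transition_summary`, the label names, and the synthetic label vectors are all hypothetical, and the vectors are arranged only so that they match the reported totals (251 unchanged, 45 less healthy, 83 healthier, N=379).

```python
from collections import Counter

# Hypothetical ordering of the three dietary-pattern clusters.
HEALTH_RANK = {"unhealthy": 0, "moderate": 1, "healthy": 2}

def transition_summary(baseline, follow_up):
    """Count participants who stayed, moved to a healthier cluster, or moved to an unhealthier one."""
    counts = Counter()
    for before, after in zip(baseline, follow_up):
        delta = HEALTH_RANK[after] - HEALTH_RANK[before]
        if delta == 0:
            counts["same"] += 1
        elif delta > 0:
            counts["healthier"] += 1
        else:
            counts["unhealthier"] += 1
    total = sum(counts.values())
    # Map each category to (count, percentage rounded to one decimal).
    return {key: (n, round(100 * n / total, 1)) for key, n in counts.items()}

# Synthetic labels that only reproduce the reported totals (assumption).
baseline = ["moderate"] * 379
follow_up = ["moderate"] * 251 + ["unhealthy"] * 45 + ["healthy"] * 83
summary = transition_summary(baseline, follow_up)
print(summary)
```

With these inputs the percentages come out as 66.2%, 11.9%, and 21.9%, matching the abstract.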
Accelerating epistasis analysis in human genetics with consumer graphics hardware.

PubMed

Sinnott-Armstrong, Nicholas A; Greene, Casey S; Cancare, Fabio; Moore, Jason H

2009-07-24

Human geneticists are now capable of measuring more than one million DNA sequence variations from across the human genome. The new challenge is to develop computationally feasible methods capable of analyzing these data for associations with common human disease, particularly in the context of epistasis. Epistasis describes the situation where multiple genes interact in a complex non-linear manner to determine an individual's disease risk and is thought to be ubiquitous for common diseases. Multifactor Dimensionality Reduction (MDR) is an algorithm capable of detecting epistasis. An exhaustive analysis with MDR is often computationally expensive, particularly for high order interactions. This challenge has previously been met with parallel computation and expensive hardware. The option we examine here exploits commodity hardware designed for computer graphics. In modern computers, Graphics Processing Units (GPUs) have more memory bandwidth and computational capability than Central Processing Units (CPUs) and are well suited to this problem. Advances in the video game industry have led to an economy of scale creating a situation where these powerful components are readily available at very low cost. Here we implement and evaluate the performance of the MDR algorithm on GPUs. Of primary interest are the time required for an epistasis analysis and the price to performance ratio of available solutions. We found that using MDR on GPUs consistently increased performance per machine over both a feature-rich Java software package and a C++ cluster implementation. A GPU workstation running a GPU implementation reduces computation time by a factor of 160 compared to an 8-core workstation running the Java implementation on CPUs. This GPU workstation performs similarly to 150 cores running an optimized C++ implementation on a Beowulf cluster. Furthermore, this GPU system provides extremely cost-effective performance while leaving the CPU available for other tasks. The GPU workstation containing three GPUs costs $2000, while obtaining similar performance on a Beowulf cluster requires 150 CPU cores which, including the added infrastructure and support cost of the cluster system, cost approximately $82,500. Graphics hardware based computing provides a cost-effective means to perform genetic analysis of epistasis using MDR on large datasets without the infrastructure of a computing cluster.
Profiles of More and Less Successful L2 Learners: A Cluster Analysis Study

ERIC Educational Resources Information Center

Sparks, Richard L.; Patton, Jon; Ganschow, Leonore

2012-01-01

This retrospective study examined L1 achievement, intelligence, L2 aptitude, and L2 proficiency profiles of 208 students completing two years of high school L2 courses. A cluster analysis was performed to determine whether distinct cognitive and achievement profiles of more and less successful L2 learners would emerge. The results of…
Student Motivational Profiles in an Introductory MIS Course: An Exploratory Cluster Analysis

ERIC Educational Resources Information Center

Nelson, Klara

2014-01-01

This study profiles students in an introductory MIS course according to a variety of variables associated with choice of academic major. The data were collected through a survey administered to 12 sections of the course. A two-step cluster analysis was performed with gender as a categorical variable and students' perceptions of task value…
The clustering-based case-based reasoning for imbalanced business failure prediction: a hybrid approach through integrating unsupervised process with supervised process

NASA Astrophysics Data System (ADS)

Li, Hui; Yu, Jun-Ling; Yu, Le-An; Sun, Jie

2014-05-01

Case-based reasoning (CBR) is one of the main forecasting methods in business forecasting; it performs well in prediction and can give explanations for its results. In business failure prediction (BFP), the number of failed enterprises is relatively small compared with the number of non-failed ones. However, the loss is huge when an enterprise fails. Therefore, it is necessary to develop methods (trained on imbalanced samples) that forecast well for this small proportion of failed enterprises while maintaining high total accuracy. Commonly used methods constructed on the assumption of balanced samples do not perform well in predicting minority samples on imbalanced samples consisting of the minority/failed enterprises and the majority/non-failed ones. This article develops a new method called clustering-based CBR (CBCBR), which integrates clustering analysis, an unsupervised process, with CBR, a supervised process, to enhance the efficiency of retrieving information from both minority and majority in CBR. In CBCBR, various case classes are first generated through hierarchical clustering inside stored experienced cases, and class centres are calculated by integrating case information in the same clustered class. When predicting the label of a target case, its nearest clustered case class is first retrieved by ranking similarities between the target case and each clustered case class centre. Then, nearest neighbours of the target case in the determined clustered case class are retrieved. Finally, labels of the nearest experienced cases are used in prediction. In the empirical experiment with two imbalanced samples from China, the performance of CBCBR was compared with the classical CBR, a support vector machine, a logistic regression and a multivariate discriminant analysis. The results show that, compared with the other four methods, CBCBR performed significantly better in terms of sensitivity for identifying the minority samples while maintaining high total accuracy. The proposed approach makes CBR useful in imbalanced forecasting.
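The retrieval steps described above (cluster the stored cases, compute class centres, find the nearest class, then the nearest neighbours within it) can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the centroid-linkage clustering, Euclidean distance as the similarity measure, majority voting, the function names, and the toy case base are all assumptions.

```python
import math
from collections import Counter

def mean(vectors):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(c) / len(vectors) for c in zip(*vectors))

def agglomerate(cases, n_classes):
    """Naive centroid-linkage hierarchical clustering; returns lists of case indices."""
    clusters = [[i] for i in range(len(cases))]
    while len(clusters) > n_classes:
        centres = [mean([cases[i][0] for i in c]) for c in clusters]
        # Find the pair of clusters whose centres are closest and merge them.
        a, b = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: math.dist(centres[ij[0]], centres[ij[1]]),
        )
        merged = clusters.pop(b)  # b > a, so index a is unaffected
        clusters[a].extend(merged)
    return clusters

def predict(cases, clusters, target, k=3):
    """Retrieve the nearest case class, then vote among the k nearest cases inside it."""
    centres = [mean([cases[i][0] for i in c]) for c in clusters]
    nearest = min(range(len(clusters)), key=lambda j: math.dist(target, centres[j]))
    neighbours = sorted(clusters[nearest], key=lambda i: math.dist(target, cases[i][0]))[:k]
    votes = Counter(cases[i][1] for i in neighbours)
    return votes.most_common(1)[0][0]

# Toy imbalanced case base: (feature vector, label). Values are made up.
cases = [
    ((0.10, 0.20), "non-failed"), ((0.20, 0.10), "non-failed"),
    ((0.15, 0.25), "non-failed"), ((0.30, 0.20), "non-failed"),
    ((0.25, 0.15), "non-failed"), ((0.90, 0.80), "failed"),
    ((0.85, 0.90), "failed"), ((0.80, 0.85), "failed"),
]
clusters = agglomerate(cases, n_classes=2)
print(predict(cases, clusters, (0.88, 0.82)))
print(predict(cases, clusters, (0.12, 0.18)))
```

Restricting the neighbour search to one clustered class is what keeps the few minority cases from being outvoted by distant majority cases.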
Cluster analysis of multiple planetary flow regimes

NASA Technical Reports Server (NTRS)

Mo, Kingtse; Ghil, Michael

1987-01-01

A modified cluster analysis method was developed to identify spatial patterns of planetary flow regimes, and to study transitions between them. This method was applied first to a simple deterministic model and second to Northern Hemisphere (NH) 500 mb data. The dynamical model is governed by the fully-nonlinear, equivalent-barotropic vorticity equation on the sphere. Clusters of points in the model's phase space are associated either with a few persistent events or with many transient ones. Two stationary clusters have patterns similar to unstable stationary model solutions, zonal, or blocked. Transient clusters of wave trains serve as way stations between the stationary ones. For the NH data, cluster analysis was performed in the subspace of the first seven empirical orthogonal functions (EOFs). Stationary clusters are found in the low-frequency band of more than 10 days, and transient clusters in the bandpass frequency window between 2.5 and 6 days. In the low-frequency band three pairs of clusters determine, respectively, EOFs 1, 2, and 3. They exhibit well-known regional features, such as blocking, the Pacific/North American (PNA) pattern and wave trains. Both model and low-pass data show strong bimodality. Clusters in the bandpass window show wave-train patterns in the two jet exit regions. They are related, as in the model, to transitions between stationary clusters.
Two-Way Regularized Fuzzy Clustering of Multiple Correspondence Analysis.

PubMed

Kim, Sunmee; Choi, Ji Yeh; Hwang, Heungsun

2017-01-01

Multiple correspondence analysis (MCA) is a useful tool for investigating the interrelationships among dummy-coded categorical variables. MCA has been combined with clustering methods to examine whether there exist heterogeneous subclusters of a population, which exhibit cluster-level heterogeneity. These combined approaches aim to classify either observations only (one-way clustering of MCA) or both observations and variable categories (two-way clustering of MCA). The latter approach is favored because its solutions are easier to interpret by providing explicitly which subgroup of observations is associated with which subset of variable categories. Nonetheless, the two-way approach has been built on hard classification that assumes observations and/or variable categories to belong to only one cluster. To relax this assumption, we propose two-way fuzzy clustering of MCA. Specifically, we combine MCA with fuzzy k-means simultaneously to classify a subgroup of observations and a subset of variable categories into a common cluster, while allowing both observations and variable categories to belong partially to multiple clusters. Importantly, we adopt regularized fuzzy k-means, thereby enabling us to decide the degree of fuzziness in cluster memberships automatically. We evaluate the performance of the proposed approach through the analysis of simulated and real data, in comparison with existing two-way clustering approaches.
Cluster analysis of spontaneous preterm birth phenotypes identifies potential associations among preterm birth mechanisms.

PubMed

Esplin, M Sean; Manuck, Tracy A; Varner, Michael W; Christensen, Bryce; Biggio, Joseph; Bukowski, Radek; Parry, Samuel; Zhang, Heping; Huang, Hao; Andrews, William; Saade, George; Sadovsky, Yoel; Reddy, Uma M; Ilekis, John

2015-09-01

We sought to use an innovative tool that is based on common biologic pathways to identify specific phenotypes among women with spontaneous preterm birth (SPTB) to enhance investigators' ability to identify and to highlight common mechanisms and underlying genetic factors that are responsible for SPTB. We performed a secondary analysis of a prospective case-control multicenter study of SPTB. All cases delivered a preterm singleton at SPTB ≤34.0 weeks' gestation. Each woman was assessed for the presence of underlying SPTB causes. A hierarchic cluster analysis was used to identify groups of women with homogeneous phenotypic profiles. One of the phenotypic clusters was selected for candidate gene association analysis with the use of VEGAS software. One thousand twenty-eight women with SPTB were assigned phenotypes. Hierarchic clustering of the phenotypes revealed 5 major clusters. Cluster 1 (n = 445) was characterized by maternal stress; cluster 2 (n = 294) was characterized by premature membrane rupture; cluster 3 (n = 120) was characterized by familial factors, and cluster 4 (n = 63) was characterized by maternal comorbidities. Cluster 5 (n = 106) was multifactorial and characterized by infection (INF), decidual hemorrhage (DH), and placental dysfunction (PD). These 3 phenotypes were correlated highly by χ² analysis (PD and DH, P < 2.2e-6; PD and INF, P = 6.2e-10; INF and DH, P = .0036). Gene-based testing identified the INS (insulin) gene as significantly associated with cluster 3 of SPTB. We identified 5 major clusters of SPTB based on a phenotype tool and hierarchic clustering. There was significant correlation between several of the phenotypes. The INS gene was associated with familial factors that were underlying SPTB. Copyright © 2015 Elsevier Inc. All rights reserved.
Clustering of health-related behaviors among early and mid-adolescents in Tuscany: results from a representative cross-sectional study

PubMed Central

Lazzeri, Giacomo; Panatto, Donatella; Domnich, Alexander; Arata, Lucia; Pammolli, Andrea; Simi, Rita; Giacchi, Mariano Vincenzo; Amicizia, Daniela; Gasparini, Roberto

2018-01-01

Background: A huge amount of literature suggests that adolescents’ health-related behaviors tend to occur in clusters, and the understanding of such behavioral clustering may have direct implications for the effective tailoring of health-promotion interventions. Despite the usefulness of analyzing clustering, Italian data on this topic are scant. This study aimed to evaluate the clustering patterns of health-related behaviors. Methods: The present study is based on data from the Health Behaviors in School-aged Children (HBSC) study conducted in Tuscany in 2010, which involved 3291 11-, 13- and 15-year olds. To aggregate students’ data on 22 health-related behaviors, factor analysis and subsequent cluster analysis were performed. Results: Factor analysis revealed eight factors, which were dubbed in accordance with their main traits: ‘Alcohol drinking’, ‘Smoking’, ‘Physical activity’, ‘Screen time’, ‘Signs & symptoms’, ‘Healthy eating’, ‘Violence’ and ‘Sweet tooth’. These factors explained 67% of variance and underwent cluster analysis. A six-cluster k-means solution was established with a 93.8% level of classification validity. The between-cluster differences in both mean age and gender distribution were highly statistically significant. Conclusions: Health-compromising behaviors are common among Tuscan teens and occur in distinct clusters. These results may be used by schools, health-promotion authorities and other stakeholders to design and implement tailored preventive interventions in Tuscany. PMID:27908972
Clustering of health-related behaviors among early and mid-adolescents in Tuscany: results from a representative cross-sectional study.

PubMed

Lazzeri, Giacomo; Panatto, Donatella; Domnich, Alexander; Arata, Lucia; Pammolli, Andrea; Simi, Rita; Giacchi, Mariano Vincenzo; Amicizia, Daniela; Gasparini, Roberto

2018-03-01

A huge amount of literature suggests that adolescents' health-related behaviors tend to occur in clusters, and the understanding of such behavioral clustering may have direct implications for the effective tailoring of health-promotion interventions. Despite the usefulness of analyzing clustering, Italian data on this topic are scant. This study aimed to evaluate the clustering patterns of health-related behaviors. The present study is based on data from the Health Behaviors in School-aged Children (HBSC) study conducted in Tuscany in 2010, which involved 3291 11-, 13- and 15-year olds. To aggregate students' data on 22 health-related behaviors, factor analysis and subsequent cluster analysis were performed. Factor analysis revealed eight factors, which were dubbed in accordance with their main traits: 'Alcohol drinking', 'Smoking', 'Physical activity', 'Screen time', 'Signs & symptoms', 'Healthy eating', 'Violence' and 'Sweet tooth'. These factors explained 67% of variance and underwent cluster analysis. A six-cluster k-means solution was established with a 93.8% level of classification validity. The between-cluster differences in both mean age and gender distribution were highly statistically significant. Health-compromising behaviors are common among Tuscan teens and occur in distinct clusters. These results may be used by schools, health-promotion authorities and other stakeholders to design and implement tailored preventive interventions in Tuscany.
Reducing Earth Topography Resolution for SMAP Mission Ground Tracks Using K-Means Clustering

NASA Technical Reports Server (NTRS)

Rizvi, Farheen

2013-01-01

The K-means clustering algorithm is used to reduce Earth topography resolution for the SMAP mission ground tracks. As SMAP propagates in orbit, knowledge of the radar antenna footprints on Earth is required for the antenna misalignment calibration. Each antenna footprint contains a latitude and longitude location pair on the Earth surface. There are 400 pairs in one data set for the calibration model. It is computationally expensive to calculate corresponding Earth elevation for these data pairs. Thus, the antenna footprint resolution is reduced. Similar topographical data pairs are grouped together with the K-means clustering algorithm. The resolution is reduced to the mean of each topographical cluster called the cluster centroid. The corresponding Earth elevation for each cluster centroid is assigned to the entire group. Results show that 400 data points are reduced to 60 while still maintaining algorithm performance and computational efficiency. In this work, sensitivity analysis is also performed to show a trade-off between algorithm performance versus computational efficiency as the number of cluster centroids and algorithm iterations are increased.
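The reduction described above (400 footprint pairs grouped into 60 clusters, with one elevation lookup per centroid shared by the whole group) can be sketched with a plain Lloyd-style K-means. This is a minimal sketch under stated assumptions, not the mission code: the `elevation` function is a hypothetical stand-in for the expensive lookup, the footprints are random synthetic data, and treating latitude/longitude as Euclidean coordinates is a simplification.

```python
import math
import random

def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))

def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k input points
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster goes empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    labels = [nearest(p, centroids) for p in points]  # final assignment
    return centroids, labels

def elevation(lat, lon):
    """Hypothetical stand-in for an expensive Earth-elevation lookup."""
    return 1000.0 * math.sin(math.radians(lat)) * math.cos(math.radians(lon))

# Synthetic (latitude, longitude) footprint pairs; real data would come from the orbit model.
rng = random.Random(1)
footprints = [(rng.uniform(-60, 60), rng.uniform(-180, 180)) for _ in range(400)]

centroids, labels = kmeans(footprints, k=60)
centroid_elev = [elevation(lat, lon) for lat, lon in centroids]  # 60 lookups instead of 400
footprint_elev = [centroid_elev[lab] for lab in labels]          # shared by each group
```

Raising `k` or `iterations` trades computation for fidelity, which is the trade-off the sensitivity analysis in the abstract examines.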
Chaotic map clustering algorithm for EEG analysis

NASA Astrophysics Data System (ADS)

Bellotti, R.; De Carlo, F.; Stramaglia, S.

2004-03-01

The non-parametric chaotic map clustering algorithm has been applied to the analysis of electroencephalographic signals, in order to recognize Huntington's disease, one of the most dangerous pathologies of the central nervous system. The performance of the method has been compared with that obtained through parametric algorithms, such as K-means and deterministic annealing, and a supervised multi-layer perceptron. While supervised neural networks need a training phase, performed by means of data tagged by the genetic test, and the parametric methods require a prior choice of the number of classes to find, the chaotic map clustering gives natural evidence of the pathological class, without any training or supervision, thus providing a new efficient methodology for the recognition of patterns affected by Huntington's disease.
Systematic detection and classification of earthquake clusters in Italy

NASA Astrophysics Data System (ADS)

Poli, P.; Ben-Zion, Y.; Zaliapin, I. V.

2017-12-01

We perform a systematic analysis of spatio-temporal clustering of 2007-2017 earthquakes in Italy with magnitudes m>3. The study employs the nearest-neighbor approach of Zaliapin and Ben-Zion [2013a, 2013b] with basic data-driven parameters. The results indicate that seismicity in Italy (an extensional tectonic regime) is dominated by clustered events, with a smaller proportion of background events than in California. Evaluation of internal cluster properties allows separation of swarm-like from burst-like seismicity. This classification highlights a strong geographical coherence of cluster properties. Swarm-like seismicity is dominant in regions characterized by relatively slow deformation with possible elevated temperature and/or fluids (e.g. Alto Tiberina, Pollino), while burst-like seismicity is observed in crystalline tectonic regions (Alps and Calabrian Arc) and in Central Italy where moderate to large earthquakes are frequent (e.g. L'Aquila, Amatrice). To better assess the variation of seismicity style across Italy, we also perform a clustering analysis with region-specific parameters. This analysis highlights clear spatial changes of the threshold separating background and clustered seismicity, and permits better resolution of different clusters in specific geological regions. For example, a large proportion of repeaters is found in the Etna region as expected for volcanic-induced seismicity. A similar behavior is observed in the northern Apennines with high pore pressure associated with mantle degassing. The observed variations of earthquake properties highlight shortcomings of practices using large-scale average seismic properties, and point to connections between seismicity and local properties of the lithosphere. The observations help to improve the understanding of the physics governing the occurrence of earthquakes in different regions.
Improving estimation of kinetic parameters in dynamic force spectroscopy using cluster analysis

NASA Astrophysics Data System (ADS)

Yen, Chi-Fu; Sivasankar, Sanjeevi

2018-03-01

Dynamic Force Spectroscopy (DFS) is a widely used technique to characterize the dissociation kinetics and interaction energy landscape of receptor-ligand complexes with single-molecule resolution. In an Atomic Force Microscope (AFM)-based DFS experiment, receptor-ligand complexes, sandwiched between an AFM tip and substrate, are ruptured at different stress rates by varying the speed at which the AFM-tip and substrate are pulled away from each other. The rupture events are grouped according to their pulling speeds, and the mean force and loading rate of each group are calculated. These data are subsequently fit to established models, and energy landscape parameters such as the intrinsic off-rate (koff) and the width of the potential energy barrier (xβ) are extracted. However, due to large uncertainties in determining mean forces and loading rates of the groups, errors in the estimated koff and xβ can be substantial. Here, we demonstrate that the accuracy of fitted parameters in a DFS experiment can be dramatically improved by sorting rupture events into groups using cluster analysis instead of sorting them according to their pulling speeds. We test different clustering algorithms including Gaussian mixture, logistic regression, and K-means clustering, under conditions that closely mimic DFS experiments. Using Monte Carlo simulations, we benchmark the performance of these clustering algorithms over a wide range of koff and xβ, under different levels of thermal noise, and as a function of both the number of unbinding events and the number of pulling speeds. Our results demonstrate that cluster analysis, particularly K-means clustering, is very effective in improving the accuracy of parameter estimation, particularly when the number of unbinding events are limited and not well separated into distinct groups. Cluster analysis is easy to implement, and our performance benchmarks serve as a guide in choosing an appropriate method for DFS data analysis.
Open-Source Sequence Clustering Methods Improve the State Of the Art.

PubMed

Kopylova, Evguenia; Navas-Molina, Jose A; Mercier, Céline; Xu, Zhenjiang Zech; Mahé, Frédéric; He, Yan; Zhou, Hong-Wei; Rognes, Torbjørn; Caporaso, J Gregory; Knight, Rob

2016-01-01

Sequence clustering is a common early step in amplicon-based microbial community analysis, when raw sequencing reads are clustered into operational taxonomic units (OTUs) to reduce the run time of subsequent analysis steps. Here, we evaluated the performance of recently released state-of-the-art open-source clustering software products, namely, OTUCLUST, Swarm, SUMACLUST, and SortMeRNA, against current principal options (UCLUST and USEARCH) in QIIME, hierarchical clustering methods in mothur, and USEARCH's most recent clustering algorithm, UPARSE. All the latest open-source tools showed promising results, reporting up to 60% fewer spurious OTUs than UCLUST, indicating that the underlying clustering algorithm can vastly reduce the number of these derived OTUs. Furthermore, we observed that stringent quality filtering, such as is done in UPARSE, can cause a significant underestimation of species abundance and diversity, leading to incorrect biological results. Swarm, SUMACLUST, and SortMeRNA have been included in the QIIME 1.9.0 release. IMPORTANCE Massive collections of next-generation sequencing data call for fast, accurate, and easily accessible bioinformatics algorithms to perform sequence clustering. A comprehensive benchmark is presented, including open-source tools and the popular USEARCH suite. Simulated, mock, and environmental communities were used to analyze sensitivity, selectivity, species diversity (alpha and beta), and taxonomic composition. The results demonstrate that recent clustering algorithms can significantly improve accuracy and preserve estimated diversity without the application of aggressive filtering. Moreover, these tools are all open source, apply multiple levels of multithreading, and scale to the demands of modern next-generation sequencing data, which is essential for the analysis of massive multidisciplinary studies such as the Earth Microbiome Project (EMP) (J. A. Gilbert, J. K. Jansson, and R. Knight, BMC Biol 12:69, 2014, http://dx.doi.org/10.1186/s12915-014-0069-1).
Cluster analysis of Southeastern U.S. climate stations

NASA Astrophysics Data System (ADS)

Stooksbury, D. E.; Michaels, P. J.

1991-09-01

A two-step cluster analysis of 449 Southeastern climate stations is used to objectively determine general climate clusters (groups of climate stations) for eight southeastern states. The purpose is objectively to define regions of climatic homogeneity that should perform more robustly in subsequent climatic impact models. This type of analysis has been successfully used in many related climate research problems including the determination of corn/climate districts in Iowa (Ortiz-Valdez, 1985) and the classification of synoptic climate types (Davis, 1988). These general climate clusters may be more appropriate for climate research than the standard climate divisions (CD) groupings of climate stations, which are modifications of the agro-economic United States Department of Agriculture crop reporting districts. Unlike the CD's, these objectively determined climate clusters are not restricted by state borders and thus have reduced multicollinearity which makes them more appropriate for the study of the impact of climate and climatic change.

Choosing appropriate analysis methods for cluster randomised cross-over trials with a binary outcome.

PubMed

Morgan, Katy E; Forbes, Andrew B; Keogh, Ruth H; Jairath, Vipul; Kahan, Brennan C

2017-01-30

In cluster randomised cross-over (CRXO) trials, clusters receive multiple treatments in a randomised sequence over time. In such trials, there is usually correlation between patients in the same cluster. In addition, within a cluster, patients in the same period may be more similar to each other than to patients in other periods. We demonstrate that it is necessary to account for these correlations in the analysis to obtain correct Type I error rates. We then use simulation to compare different methods of analysing a binary outcome from a two-period CRXO design. Our simulations demonstrated that hierarchical models without random effects for period-within-cluster, which do not account for any extra within-period correlation, performed poorly with greatly inflated Type I errors in many scenarios. In scenarios where extra within-period correlation was present, a hierarchical model with random effects for cluster and period-within-cluster only had correct Type I errors when there were large numbers of clusters; with small numbers of clusters, the error rate was inflated. We also found that generalised estimating equations did not give correct error rates in any scenarios considered. An unweighted cluster-level summary regression performed best overall, maintaining an error rate close to 5% for all scenarios, although it lost power when extra within-period correlation was present, especially for small numbers of clusters. Results from our simulation study show that it is important to model both levels of clustering in CRXO trials, and that any extra within-period correlation should be accounted for. Copyright © 2016 John Wiley & Sons, Ltd.
Visualizing nD Point Clouds as Topological Landscape Profiles to Guide Local Data Analysis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Oesterling, Patrick; Heine, Christian; Weber, Gunther H.

2012-05-04

Analyzing high-dimensional point clouds is a classical challenge in visual analytics. Traditional techniques, such as projections or axis-based techniques, suffer from projection artifacts, occlusion, and visual complexity. We propose to split data analysis into two parts to address these shortcomings. First, a structural overview phase abstracts data by its density distribution. This phase performs topological analysis to support accurate and non-overlapping presentation of the high-dimensional cluster structure as a topological landscape profile. Utilizing a landscape metaphor, it presents clusters and their nesting as hills whose height, width, and shape reflect cluster coherence, size, and stability, respectively. A second local analysis phase utilizes this global structural knowledge to select individual clusters or point sets for further, localized data analysis. Focusing on structural entities significantly reduces visual clutter in established geometric visualizations and permits a clearer, more thorough data analysis. In conclusion, this analysis complements the global topological perspective and enables the user to study subspaces or geometric properties, such as shape.
Automatic Clustering Using FSDE-Forced Strategy Differential Evolution

NASA Astrophysics Data System (ADS)

Yasid, A.

2018-01-01

Clustering analysis is important in data mining for unsupervised data, because no adequate prior knowledge is available. One of the important tasks is defining the number of clusters without user involvement, which is known as automatic clustering. This study aims to determine the number of clusters automatically using forced strategy differential evolution (AC-FSDE). Two mutation parameters, namely a constant parameter and a variable parameter, are employed to boost differential evolution performance. Four well-known benchmark datasets were used to evaluate the algorithm. Moreover, the results are compared with other state-of-the-art automatic clustering methods. The experimental results show that AC-FSDE is better than or competitive with other existing automatic clustering algorithms.
Clusters of Word Properties as Predictors of Elementary School Children's Performance on Two Word Tasks

ERIC Educational Resources Information Center

Tellings, Agnes; Coppens, Karien; Gelissen, John; Schreuder, Rob

2013-01-01

Often, the classification of words does not go beyond "difficult" (i.e., infrequent, late-learned, nonimageable, etc.) or "easy" (i.e., frequent, early-learned, imageable, etc.) words. In the present study, we used a latent cluster analysis to divide 703 Dutch words with scores for eight word properties into seven clusters of words. Each cluster…
Supervised group Lasso with applications to microarray data analysis

PubMed Central

Ma, Shuangge; Song, Xiao; Huang, Jian

2007-01-01

Background A tremendous amount of effort has been devoted to identifying genes for diagnosis and prognosis of diseases using microarray gene expression data. It has been demonstrated that gene expression data have cluster structure, where the clusters consist of co-regulated genes which tend to have coordinated functions. However, most available statistical methods for gene selection do not take into consideration the cluster structure. Results We propose a supervised group Lasso approach that takes into account the cluster structure in gene expression data for gene selection and predictive model building. For gene expression data without biological cluster information, we first divide genes into clusters using the K-means approach and determine the optimal number of clusters using the Gap method. The supervised group Lasso consists of two steps. In the first step, we identify important genes within each cluster using the Lasso method. In the second step, we select important clusters using the group Lasso. Tuning parameters are determined using V-fold cross validation at both steps to allow for further flexibility. Prediction performance is evaluated using leave-one-out cross validation. We apply the proposed method to disease classification and survival analysis with microarray data. Conclusion We analyze four microarray data sets using the proposed approach: two cancer data sets with binary cancer occurrence as outcomes and two lymphoma data sets with survival outcomes. The results show that the proposed approach is capable of identifying a small number of influential gene clusters and important genes within those clusters, and has better prediction performance than existing methods. PMID:17316436
The X-ray cluster survey with eRosita: forecasts for cosmology, cluster physics and primordial non-Gaussianity

NASA Astrophysics Data System (ADS)

Pillepich, Annalisa; Porciani, Cristiano; Reiprich, Thomas H.

2012-05-01

Starting in late 2013, the eRosita telescope will survey the X-ray sky with unprecedented sensitivity. Assuming a detection limit of 50 photons in the (0.5-2.0) keV energy band with a typical exposure time of 1.6 ks, we predict that eRosita will detect ~9.3 × 10⁴ clusters of galaxies more massive than 5 × 10¹³ h⁻¹ M⊙ with the currently planned all-sky survey. Their median redshift will be z ≃ 0.35. We perform a Fisher-matrix analysis to forecast the constraining power of eRosita on the Λ cold dark matter (ΛCDM) cosmology and, simultaneously, on the X-ray scaling relations for galaxy clusters. Special attention is devoted to the possibility of detecting primordial non-Gaussianity. We consider two experimental probes: the number counts and the angular clustering of a photon-count limited sample of clusters. We discuss how the cluster sample should be split to optimize the analysis and we show that redshift information of the individual clusters is vital to break the strong degeneracies among the model parameters. For example, performing a 'tomographic' analysis based on photometric-redshift estimates and combining one- and two-point statistics will give marginal 1σ errors of Δσ8 ≃ 0.036 and ΔΩm ≃ 0.012 without priors, and improve the current estimates on the slope of the luminosity-mass relation by a factor of 3. Regarding primordial non-Gaussianity, eRosita clusters alone will give ΔfNL ≃ 9, 36 and 144 for the local, orthogonal and equilateral model, respectively. Measuring redshifts with spectroscopic accuracy would further tighten the constraints by nearly 40 per cent (barring fNL which displays smaller improvements). Finally, combining eRosita data with the analysis of temperature anisotropies in the cosmic microwave background by the Planck satellite should give sensational constraints on both the cosmology and the properties of the intracluster medium.
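The "marginal 1σ errors" quoted in a Fisher-matrix forecast are the square roots of the diagonal of the inverse Fisher matrix. The sketch below shows that step for two parameters and how adding a second probe can break a degeneracy. The matrix entries are invented for illustration, and simply adding the two matrices assumes the probes are statistically independent; neither is taken from the paper.

```python
import math


def invert_2x2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("Fisher matrix is singular (perfect degeneracy)")
    return [[d / det, -b / det], [-c / det, a / det]]


def marginal_errors(fisher):
    """1-sigma errors after marginalizing over the other parameter."""
    cov = invert_2x2(fisher)
    return [math.sqrt(cov[i][i]) for i in range(2)]


def conditional_errors(fisher):
    """1-sigma errors if the other parameter were known exactly."""
    return [1.0 / math.sqrt(fisher[i][i]) for i in range(2)]


# Hypothetical Fisher matrices for two parameters from two probes.
counts = [[4.0, 2.0], [2.0, 3.0]]
clustering = [[1.0, -1.0], [-1.0, 5.0]]

# Assumption: independent probes, so their Fisher matrices add.
combined = [
    [counts[i][j] + clustering[i][j] for j in range(2)] for i in range(2)
]

errors_counts = marginal_errors(counts)
errors_combined = marginal_errors(combined)
```

The marginal errors are never smaller than the conditional ones, and the gap between the two is a quick measure of how degenerate the parameters are.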
Exploring relationships between Dairy Herd Improvement monitors of performance and the Transition Cow Index in Wisconsin dairy herds.

PubMed

Schultz, K K; Bennett, T B; Nordlund, K V; Döpfer, D; Cook, N B

2016-09-01

Transition cow management has been tracked via the Transition Cow Index (TCI; AgSource Cooperative Services, Verona, WI) since 2006. Transition Cow Index was developed to measure the difference between actual and predicted milk yield at first test day to evaluate the relative success of the transition period program. This project aimed to assess TCI in relation to all commonly used Dairy Herd Improvement (DHI) metrics available through AgSource Cooperative Services. Regression analysis was used to isolate variables that were relevant to TCI, and then principal components analysis and network analysis were used to determine the relative strength and relatedness among variables. Finally, cluster analysis was used to segregate herds based on similarity of relevant variables. The DHI data were obtained from 2,131 Wisconsin dairy herds with test-day mean ≥30 cows, which were tested ≥10 times throughout the 2014 calendar year. The original list of 940 DHI variables was reduced through expert-driven selection and regression analysis to 23 variables. The K-means cluster analysis produced 5 distinct clusters. Descriptive statistics were calculated for the 23 variables per cluster grouping. Using principal components analysis, cluster analysis, and network analysis, 4 parameters were isolated as most relevant to TCI; these were energy-corrected milk, 3 measures of intramammary infection (dry cow cure rate, linear somatic cell count score in primiparous cows, and new infection rate), peak ratio, and days in milk at peak milk production. These variables together with cow and newborn calf survival measures form a group of metrics that can be used to assist in the evaluation of overall transition period performance. Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Astrophysical properties of star clusters in the Magellanic Clouds homogeneously estimated by ASteCA

NASA Astrophysics Data System (ADS)

Perren, G. I.; Piatti, A. E.; Vázquez, R. A.

2017-06-01

Aims: We seek to produce a homogeneous catalog of astrophysical parameters of 239 resolved star clusters, located in the Small and Large Magellanic Clouds, observed in the Washington photometric system. Methods: The cluster sample was processed with the recently introduced Automated Stellar Cluster Analysis (ASteCA) package, which ensures both an automatized and a fully reproducible treatment, together with a statistically based analysis of their fundamental parameters and associated uncertainties. The fundamental parameters determined for each cluster with this tool, via a color-magnitude diagram (CMD) analysis, are metallicity, age, reddening, distance modulus, and total mass. Results: We generated a homogeneous catalog of structural and fundamental parameters for the studied cluster sample and performed a detailed internal error analysis along with a thorough comparison with values taken from 26 published articles. We studied the distribution of cluster fundamental parameters in both Clouds and obtained their age-metallicity relationships. Conclusions: The ASteCA package can be applied to an unsupervised determination of fundamental cluster parameters, which is a task of increasing relevance as more data becomes available through upcoming surveys. A table with the estimated fundamental parameters for the 239 clusters analyzed is only available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (http://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/602/A89
Performance Analysis of Cluster Formation in Wireless Sensor Networks.

PubMed

Montiel, Edgar Romo; Rivero-Angeles, Mario E; Rubino, Gerardo; Molina-Lozano, Heron; Menchaca-Mendez, Rolando; Menchaca-Mendez, Ricardo

2017-12-13

Cluster-based wireless sensor networks have been extensively used in the literature in order to achieve considerable energy consumption reductions. However, two aspects of such systems have been largely overlooked, namely the transmission probability used during the cluster formation phase and the way in which cluster heads are selected. Both of these issues have an important impact on the performance of the system. For the former, it is common to consider that sensor nodes in a cluster-based Wireless Sensor Network (WSN) use a fixed transmission probability to send control data in order to build the clusters. However, due to the highly variable conditions experienced by these networks, a fixed transmission probability may lead to extra energy consumption. In view of this, three different transmission probability strategies are studied: optimal, fixed and adaptive. In this context, we also investigate cluster head selection schemes. Specifically, we consider two intelligent schemes based on the fuzzy C-means and k-medoids algorithms and a random selection with no intelligence. We show that the use of intelligent schemes greatly improves the performance of the system, but their use entails higher complexity and selection delay. The main performance metrics considered in this work are energy consumption, successful transmission probability and cluster formation latency. As an additional feature of this work, we study the effect of errors in the wireless channel and the impact on the performance of the system under the different transmission probability schemes.
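One reason k-medoids suits cluster head selection is that a medoid is always an existing node, whereas a centroid is generally an empty point in space. The sketch below picks the medoid of a single cluster. The node coordinates and the use of Euclidean distance as the cost are assumptions for illustration; the paper's actual selection criteria are not given in the abstract.

```python
import math


def medoid(nodes):
    """Return the node whose summed distance to all nodes is smallest."""
    def total_distance(candidate):
        return sum(math.dist(candidate, other) for other in nodes)

    return min(nodes, key=total_distance)


# Hypothetical sensor positions in one cluster (metres).
cluster = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
head = medoid(cluster)

# The mean position, by contrast, need not coincide with any sensor.
mean_x = sum(x for x, _ in cluster) / len(cluster)
```

The exhaustive search here costs a number of distance evaluations that grows quadratically with cluster size, which is consistent with the abstract's remark that intelligent schemes add complexity and selection delay.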
Performance Analysis of Cluster Formation in Wireless Sensor Networks

PubMed Central

Montiel, Edgar Romo; Rivero-Angeles, Mario E.; Rubino, Gerardo; Molina-Lozano, Heron; Menchaca-Mendez, Rolando; Menchaca-Mendez, Ricardo

2017-01-01

Cluster-based wireless sensor networks have been extensively used in the literature in order to achieve considerable energy consumption reductions. However, two aspects of such systems have been largely overlooked, namely the transmission probability used during the cluster formation phase and the way in which cluster heads are selected. Both of these issues have an important impact on the performance of the system. For the former, it is common to consider that sensor nodes in a cluster-based Wireless Sensor Network (WSN) use a fixed transmission probability to send control data in order to build the clusters. However, due to the highly variable conditions experienced by these networks, a fixed transmission probability may lead to extra energy consumption. In view of this, three different transmission probability strategies are studied: optimal, fixed and adaptive. In this context, we also investigate cluster head selection schemes. Specifically, we consider two intelligent schemes based on the fuzzy C-means and k-medoids algorithms and a random selection with no intelligence. We show that the use of intelligent schemes greatly improves the performance of the system, but their use entails higher complexity and selection delay. The main performance metrics considered in this work are energy consumption, successful transmission probability and cluster formation latency. As an additional feature of this work, we study the effect of errors in the wireless channel and the impact on the performance of the system under the different transmission probability schemes. PMID:29236065
Characteristics of airflow and particle deposition in COPD current smokers

NASA Astrophysics Data System (ADS)

Zou, Chunrui; Choi, Jiwoong; Haghighi, Babak; Choi, Sanghun; Hoffman, Eric A.; Lin, Ching-Long

2017-11-01

A recent imaging-based cluster analysis of computed tomography (CT) lung images in a chronic obstructive pulmonary disease (COPD) cohort identified four clusters, viz. disease sub-populations. Cluster 1 had relatively normal airway structures; Cluster 2 had wall thickening; Cluster 3 exhibited decreased wall thickness and luminal narrowing; Cluster 4 had a significant decrease of luminal diameter and a significant reduction of lung deformation, thus having relatively low pulmonary functions. To better understand the characteristics of airflow and particle deposition in these clusters, we performed computational fluid and particle dynamics analyses on representative cluster patients and healthy controls using CT-based airway models and subject-specific 3D-1D coupled boundary conditions. The results show that particle deposition in central airways of cluster 4 patients was noticeably increased especially with increasing particle size despite reduced vital capacity as compared to other clusters and healthy controls. This may be attributable in part to significant airway constriction in cluster 4. This study demonstrates the potential application of cluster-guided CFD analysis in disease populations. NIH Grants U01HL114494 and S10-RR022421, and FDA Grant U01FD005837.
Paternal age related schizophrenia (PARS): Latent subgroups detected by k-means clustering analysis.

PubMed

Lee, Hyejoo; Malaspina, Dolores; Ahn, Hongshik; Perrin, Mary; Opler, Mark G; Kleinhaus, Karine; Harlap, Susan; Goetz, Raymond; Antonius, Daniel

2011-05-01

Paternal age related schizophrenia (PARS) has been proposed as a subgroup of schizophrenia with distinct etiology, pathophysiology and symptoms. This study uses a k-means clustering analysis approach to generate hypotheses about differences between PARS and other cases of schizophrenia. We studied PARS (operationally defined as not having any family history of schizophrenia among first and second-degree relatives and fathers' age at birth ≥ 35 years) in a series of schizophrenia cases recruited from a research unit. Data were available on demographic variables, symptoms (Positive and Negative Syndrome Scale; PANSS), cognitive tests (Wechsler Adult Intelligence Scale-Revised; WAIS-R) and olfaction (University of Pennsylvania Smell Identification Test; UPSIT). We conducted a series of k-means clustering analyses to identify clusters of cases containing high concentrations of PARS. Two analyses generated clusters with high concentrations of PARS cases. The first analysis (N=136; PARS=34) revealed a cluster containing 83% PARS cases, in which the patients showed a significant discrepancy between verbal and performance intelligence. The mean paternal and maternal ages were 41 and 33, respectively. The second analysis (N=123; PARS=30) revealed a cluster containing 71% PARS cases, of which 93% were females; the mean age of onset of psychosis, at 17.2, was significantly early. These results strengthen the evidence that PARS cases differ from other patients with schizophrenia. Hypothesis-generating findings suggest that features of PARS may include a discrepancy between verbal and performance intelligence, and in females, an early age of onset. These findings provide a rationale for separating these phenotypes from others in future clinical, genetic and pathophysiologic studies of schizophrenia and in considering responses to treatment. Copyright © 2011 Elsevier B.V. All rights reserved.
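The idea of clustering cases and then checking how concentrated a flagged subgroup is within each cluster can be sketched with a plain implementation of Lloyd's k-means algorithm. Everything below is illustrative: the two-feature toy data, the initial centroids, and the `flags` list are assumptions and have no connection to the study's PANSS, WAIS-R or UPSIT variables.

```python
import math


def kmeans(points, centroids, iterations=20):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if not members:
                new_centroids.append(old)  # leave an empty cluster in place
                continue
            new_centroids.append(tuple(
                sum(m[d] for m in members) / len(members)
                for d in range(len(old))
            ))
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


def flagged_share(labels, flags, cluster):
    """Fraction of a cluster's members that carry the flag."""
    members = [f for lab, f in zip(labels, flags) if lab == cluster]
    return sum(members) / len(members)


# Hypothetical standardized scores for six cases on two measures.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
flags = [True, False, False, True, True, True]  # hypothetical subgroup flag

labels, centroids = kmeans(points, [points[0], points[3]])
share_cluster_1 = flagged_share(labels, flags, 1)
```

Because k-means depends on the starting centroids and on the chosen number of clusters, results of this kind are usually treated as hypothesis-generating, as the abstract itself does.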
A clustering approach to segmenting users of internet-based risk calculators.

PubMed

Harle, C A; Downs, J S; Padman, R

2011-01-01

Risk calculators are widely available Internet applications that deliver quantitative health risk estimates to consumers. Although these tools are known to have varying effects on risk perceptions, little is known about who will be more likely to accept objective risk estimates. To identify clusters of online health consumers that help explain variation in individual improvement in risk perceptions from web-based quantitative disease risk information. A secondary analysis was performed on data collected in a field experiment that measured people's pre-diabetes risk perceptions before and after visiting a realistic health promotion website that provided quantitative risk information. K-means clustering was performed on numerous candidate variable sets, and the different segmentations were evaluated based on between-cluster variation in risk perception improvement. Variation in responses to risk information was best explained by clustering on pre-intervention absolute pre-diabetes risk perceptions and an objective estimate of personal risk. Members of a high-risk overestimater cluster showed large improvements in their risk perceptions, but clusters of both moderate-risk and high-risk underestimaters were much more muted in improving their optimistically biased perceptions. Cluster analysis provided a unique approach for segmenting health consumers and predicting their acceptance of quantitative disease risk information. These clusters suggest that health consumers were very responsive to good news, but tended not to incorporate bad news into their self-perceptions much. These findings help to quantify variation among online health consumers and may inform the targeted marketing of and improvements to risk communication tools on the Internet.
Recognizing different tissues in human fetal femur cartilage by label-free Raman microspectroscopy

NASA Astrophysics Data System (ADS)

Kunstar, Aliz; Leijten, Jeroen; van Leuveren, Stefan; Hilderink, Janneke; Otto, Cees; van Blitterswijk, Clemens A.; Karperien, Marcel; van Apeldoorn, Aart A.

2012-11-01

Traditionally, the composition of bone and cartilage is determined by standard histological methods. We used Raman microscopy, which provides a molecular "fingerprint" of the investigated sample, to detect differences between the zones in human fetal femur cartilage without the need for additional staining or labeling. Raman area scans were made from the (pre)articular cartilage, resting, proliferative, and hypertrophic zones of growth plate and endochondral bone within human fetal femora. Multivariate data analysis was performed on Raman spectral datasets to construct cluster images with corresponding cluster averages. Cluster analysis resulted in detection of individual chondrocyte spectra that could be separated from cartilage extracellular matrix (ECM) spectra and was verified by comparing cluster images with intensity-based Raman images for the deoxyribonucleic acid/ribonucleic acid (DNA/RNA) band. Specific dendrograms were created using Ward's clustering method, and principal component analysis (PCA) was performed with the separated and averaged Raman spectra of cells and ECM of all measured zones. Overall (dis)similarities between measured zones were effectively visualized on the dendrograms and main spectral differences were revealed by PCA allowing for label-free detection of individual cartilaginous zones and for label-free evaluation of proper cartilaginous matrix formation for future tissue engineering and clinical purposes.
[Prognostic differences of phenotypes in pT1-2N0 invasive breast cancer: a large cohort study with cluster analysis].

PubMed

Wang, Z; Wang, W H; Wang, S L; Jin, J; Song, Y W; Liu, Y P; Ren, H; Fang, H; Tang, Y; Chen, B; Qi, S N; Lu, N N; Li, N; Tang, Y; Liu, X F; Yu, Z H; Li, Y X

2016-06-23

To find phenotypic subgroups of patients with pT1-2N0 invasive breast cancer by means of cluster analysis and estimate the prognosis and clinicopathological features of these subgroups. From 1999 to 2013, 4979 patients with pT1-2N0 invasive breast cancer were recruited for hierarchical clustering analysis. Age (≤40, 41-70, 70+ years), size of primary tumor, pathological type, grade of differentiation, microvascular invasion, estrogen receptor (ER), progesterone receptor (PR) and human epidermal growth factor receptor 2 (HER-2) were used as the variables for computing the distance between patients. Hierarchical cluster analysis was performed using Ward's method. Cophenetic correlation coefficient (CPCC) and Spearman correlation coefficient were used to validate clustering structures. The CPCC was 0.603. The Spearman correlation coefficient was 0.617 (P<0.001), which indicated a good fit of hierarchy to the data. A twelve-cluster model seemed to best illustrate our patient cohort. Patients in clusters 5, 9 and 12 had the best prognosis and were characterized by age >40 years, smaller primary tumor, lower histologic grade, positive ER and PR status, and mainly negative HER-2. Patients in clusters 1 and 11 had the worst prognosis. Cluster 1 was characterized by a larger tumor, higher grade and negative ER and PR status, while cluster 11 was characterized by positive microvascular invasion. Patients in the other 7 clusters had a moderate prognosis, and patients in each cluster had distinctive clinicopathological features and recurrence patterns. This study identified distinctive clinicopathologic phenotypes in a large cohort of patients with pT1-2N0 breast cancer through hierarchical clustering and revealed different prognoses. This integrative model may help physicians to make more personalized decisions regarding adjuvant therapy.
Autoantibodies in pediatric systemic lupus erythematosus: ethnic grouping, cluster analysis, and clinical correlations.

PubMed

Jurencák, Roman; Fritzler, Marvin; Tyrrell, Pascal; Hiraki, Linda; Benseler, Susanne; Silverman, Earl

2009-02-01

(1) To evaluate the spectrum of serum autoantibodies in pediatric-onset systemic lupus erythematosus (pSLE) with a focus on ethnic differences; (2) using cluster analysis, to identify patients with similar autoantibody patterns and to determine their clinical associations. A single-center cohort study of all patients with newly diagnosed pSLE seen over an 8-year period was performed. Ethnicity, clinical, and serological data were prospectively collected from 156/169 patients (92%). The frequencies of 10 selected autoantibodies among ethnic groups were compared. Cluster analysis identified groups of patients with similar autoantibody profiles. Associations of these groups with clinical and laboratory features of pSLE were examined. Among our 5 ethnic groups, there were differences only in the prevalence of anti-U1RNP and anti-Sm antibodies, which occurred more frequently in non-Caucasian patients (p < 0.0001, p < 0.01, respectively). Cluster analysis revealed 3 autoantibody clusters. Cluster 1 consisted of anti-dsDNA antibodies. Cluster 2 consisted of anti-dsDNA, antichromatin, antiribosomal P, anti-U1RNP, anti-Sm, anti-Ro and anti-La autoantibody. Cluster 3 consisted of anti-dsDNA, anti-RNP, and anti-Sm autoantibody. The highest proportion of Caucasians was in cluster 1 (p < 0.05), which was characterized by a mild disease with infrequent major organ involvement compared to cluster 2, which had the highest frequency of nephritis, renal failure, serositis, and hemolytic anemia, or cluster 3, which was characterized by frequent neuropsychiatric disease and nephritis. We observed ethnic differences in autoantibody profiles in pSLE. Autoantibodies tended to cluster together and these clusters were associated with different clinical courses.
Relationship between Procedural Tactical Knowledge and Specific Motor Skills in Young Soccer Players

PubMed Central

Aquino, Rodrigo; Marques, Renato Francisco R.; Petiot, Grégory Hallé; Gonçalves, Luiz Guilherme C.; Moraes, Camila; Santiago, Paulo Roberto P.; Puggina, Enrico Fuini

2016-01-01

The purpose of this study was to investigate the association between offensive tactical knowledge and soccer-specific motor skills performance. Fifteen participants underwent two evaluations, covering their technical and tactical performance. Motor skills performance was measured through four tests of technical soccer skills: ball control, shooting, passing and dribbling. Tactical performance was assessed with a tactical assessment system called FUT-SAT (Analyses of Procedural Tactical Knowledge in Soccer). Afterwards, technical and tactical evaluation scores were ranked with and without the use of the cluster method. A positive, weak and non-significant correlation was found in both analyses (rho = 0.39, p = 0.14 with cluster analysis; rho = 0.35, p = 0.20 without cluster analysis). We conclude that there was a weak association between technical skills and offensive tactical knowledge. This shows the need to reflect on the use of such tests to assess technical skills in team sports, since they do not take into account the variability and unpredictability of game actions and disregard the inherent need to assess such skill performance in the game. PMID:29910300
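The rho values reported above are Spearman rank correlations, which can be computed as the Pearson correlation of the ranked scores. The sketch below uses average ranks for ties; the two score lists are made up for illustration and are not the study's data.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value is tied with this one.
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2 + 1  # ranks are 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical technical and tactical scores for five players.
technical = [1, 2, 3, 4, 5]
tactical = [5, 6, 7, 8, 7]
rho = spearman_rho(technical, tactical)
```

With only fifteen participants, as in the study, a rho near 0.4 can easily fail to reach significance, which matches the reported p-values.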
Assessment of cluster yield components by image analysis.

PubMed

Diago, Maria P; Tardaguila, Javier; Aleixos, Nuria; Millan, Borja; Prats-Montalban, Jose M; Cubero, Sergio; Blasco, Jose

2015-04-01

Berry weight, berry number and cluster weight are key parameters for yield estimation in the wine and table grape industries. Current yield prediction methods are destructive, labour-demanding and time-consuming. In this work, a new methodology based on image analysis was developed to determine cluster yield components in a fast and inexpensive way. Clusters of seven different red varieties of grapevine (Vitis vinifera L.) were photographed under laboratory conditions and their cluster yield components manually determined after image acquisition. Two algorithms based on the Canny and the logarithmic image processing approaches were tested to find the contours of the berries in the images prior to berry detection performed by means of the Hough Transform. Results were obtained in two ways: by analysing either a single image of the cluster or using four images per cluster from different orientations. The best results (R² between 69% and 95% in berry detection and between 65% and 97% in cluster weight estimation) were achieved using four images and the Canny algorithm. The capability of the image-analysis-based model to predict berry weight was 84%. The new and low-cost methodology presented here enabled the assessment of cluster yield components, saving time and providing inexpensive information in comparison with current manual methods. © 2014 Society of Chemical Industry.
Effect of Policy Analysis on Indonesia’s Maritime Cluster Development Using System Dynamics Modeling

NASA Astrophysics Data System (ADS)

Nursyamsi, A.; Moeis, A. O.; Komarudin

2018-03-01

As an archipelago with two thirds of its territory consisting of water, Indonesia should pay more attention to developing its maritime industry. One catalyst for accelerating maritime industry growth is the development of a maritime cluster. The purpose of this research is to understand the effect that implementing a maritime cluster policy would have on the growth of Indonesia's maritime economy, and the policy's role in enhancing maritime cluster performance and hence Indonesia’s maritime industry as well. The results of the system dynamics simulation show that, with a maritime cluster in place, the growth of the employment rate and of the maritime economy is exponentially larger than in the business-as-usual case. The results imply that the government should act quickly to establish a legitimate maritime cluster organizing institution, so that a synergistic, sustainable and positive maritime cluster environment can benefit the performance of Indonesia’s maritime industry.
Iterative Stable Alignment and Clustering of 2D Transmission Electron Microscope Images

PubMed Central

Yang, Zhengfan; Fang, Jia; Chittuluru, Johnathan; Asturias, Francisco J.; Penczek, Pawel A.

2012-01-01

SUMMARY Identification of homogeneous subsets of images in a macromolecular electron microscopy (EM) image data set is a critical step in single-particle analysis. The task is handled by iterative algorithms, whose performance is compromised by the compounded limitations of image alignment and K-means clustering. Here we describe an approach, iterative stable alignment and clustering (ISAC) that, relying on a new clustering method and on the concepts of stability and reproducibility, can extract validated, homogeneous subsets of images. ISAC requires only a small number of simple parameters and, with minimal human intervention, can eliminate bias from two-dimensional image clustering and maximize the quality of group averages that can be used for ab initio three-dimensional structural determination and analysis of macromolecular conformational variability. Repeated testing of the stability and reproducibility of a solution within ISAC eliminates heterogeneous or incorrect classes and introduces critical validation to the process of EM image clustering. PMID:22325773

Functional clustering of time series gene expression data by Granger causality

PubMed Central

2012-01-01

Background A common approach for time series gene expression data analysis includes the clustering of genes with similar expression patterns throughout time. Clustered gene expression profiles point to the joint contribution of groups of genes to a particular cellular process. However, since genes belong to intricate networks, other features, besides comparable expression patterns, should provide additional information for the identification of functionally similar genes. Results In this study we perform gene clustering through the identification of Granger causality between and within sets of time series gene expression data. Granger causality is based on the idea that the cause of an event cannot come after its consequence. Conclusions This kind of analysis can be used as a complementary approach for functional clustering, wherein genes would be clustered not solely based on their expression similarity but on their topological proximity built according to the intensity of Granger causality among them. PMID:23107425
Substructures in DAFT/FADA survey clusters based on XMM and optical data

NASA Astrophysics Data System (ADS)

Durret, F.; DAFT/FADA Team

2014-07-01

The DAFT/FADA survey was initiated to perform weak lensing tomography on a sample of 90 massive clusters in the redshift range [0.4,0.9] with HST imaging available. The complementary deep multiband imaging constitutes a high quality imaging data base for these clusters. In X-rays, we have analysed the XMM-Newton and/or Chandra data available for 32 clusters, and for 23 clusters we fit the X-ray emissivity with a beta-model and subtract it to search for substructures in the X-ray gas. This study was coupled with a dynamical analysis for the 18 clusters with at least 15 spectroscopic galaxy redshifts in the cluster range, based on a Serna & Gerbal (SG) analysis. We detected ten substructures in eight clusters by both methods (X-rays and SG). The percentage of mass included in substructures is found to be roughly constant with redshift, with values of 5-15%. Most of the substructures detected both in X-rays and with the SG method are found to be relatively recent infalls, probably at their first cluster pericenter approach.
ADPROCLUS: a graphical user interface for fitting additive profile clustering models to object by variable data matrices.

PubMed

Wilderjans, Tom F; Ceulemans, Eva; Van Mechelen, Iven; Depril, Dirk

2011-03-01

In many areas of psychology, one is interested in disclosing the underlying structural mechanisms that generated an object by variable data set. Often, based on theoretical or empirical arguments, it may be expected that these underlying mechanisms imply that the objects are grouped into clusters that are allowed to overlap (i.e., an object may belong to more than one cluster). In such cases, analyzing the data with Mirkin's additive profile clustering model may be appropriate. In this model: (1) each object may belong to no, one or several clusters, (2) there is a specific variable profile associated with each cluster, and (3) the scores of the objects on the variables can be reconstructed by adding the cluster-specific variable profiles of the clusters the object in question belongs to. Until now, however, no software program has been publicly available to perform an additive profile clustering analysis. For this purpose, in this article, the ADPROCLUS program, steered by a graphical user interface, is presented. We further illustrate its use by means of the analysis of a patient by symptom data matrix.
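Property (3) of the model, reconstructing each object's scores by adding the profiles of the clusters it belongs to, amounts to multiplying a binary membership matrix by a profile matrix. The sketch below shows only that reconstruction and a least-squares misfit. It is not the ADPROCLUS fitting algorithm, the squared-error loss is an assumption, and the membership and profile values are invented.

```python
def reconstruct(membership, profiles):
    """Model scores: for each object, sum the profiles of its clusters."""
    n_vars = len(profiles[0])
    return [
        [sum(a * p[j] for a, p in zip(row, profiles)) for j in range(n_vars)]
        for row in membership
    ]


def squared_error(data, model):
    """Sum of squared differences between observed and model scores."""
    return sum(
        (x - m) ** 2
        for data_row, model_row in zip(data, model)
        for x, m in zip(data_row, model_row)
    )


# Rows are objects, columns are clusters; 1 means the object is a member.
# The four objects belong to one, one, both, and no clusters respectively.
membership = [[1, 0], [0, 1], [1, 1], [0, 0]]

# One variable profile per cluster (three variables, hypothetical values).
profiles = [[2.0, 0.0, 1.0], [0.0, 3.0, 1.0]]

model = reconstruct(membership, profiles)

# Hypothetical observed data that the model fits except for one entry.
data = [[2.0, 0.0, 1.0], [0.0, 3.0, 1.0], [2.0, 3.0, 2.5], [0.0, 0.0, 0.0]]
misfit = squared_error(data, model)
```

The third object shows the overlap: its model scores are the element-wise sum of both cluster profiles, while the fourth object, belonging to no cluster, is modelled as all zeros.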
Open star clusters and Galactic structure

NASA Astrophysics Data System (ADS)

Joshi, Yogesh C.

2018-04-01

In order to understand the Galactic structure, we perform a statistical analysis of the distribution of various cluster parameters based on an almost complete sample of Galactic open clusters yet available. The geometrical and physical characteristics of a large number of open clusters given in the MWSC catalogue are used to study the spatial distribution of clusters in the Galaxy and determine the scale height, solar offset, local mass density and distribution of reddening material in the solar neighbourhood. We also explored the mass-radius and mass-age relations in the Galactic open star clusters. We find that the estimated parameters of the Galactic disk are largely influenced by the choice of cluster sample.
The Open Connectome Project Data Cluster: Scalable Analysis and Vision for High-Throughput Neuroscience.

PubMed

Burns, Randal; Roncal, William Gray; Kleissas, Dean; Lillaney, Kunal; Manavalan, Priya; Perlman, Eric; Berger, Daniel R; Bock, Davi D; Chung, Kwanghun; Grosenick, Logan; Kasthuri, Narayanan; Weiler, Nicholas C; Deisseroth, Karl; Kazhdan, Michael; Lichtman, Jeff; Reid, R Clay; Smith, Stephen J; Szalay, Alexander S; Vogelstein, Joshua T; Vogelstein, R Jacob

2013-01-01

We describe a scalable database cluster for the spatial analysis and annotation of high-throughput brain imaging data, initially for 3-d electron microscopy image stacks, but for time-series and multi-channel data as well. The system was designed primarily for workloads that build connectomes (neural connectivity maps of the brain) using the parallel execution of computer vision algorithms on high-performance compute clusters. These services and open-science data sets are publicly available at openconnecto.me. The system design inherits much from NoSQL scale-out and data-intensive computing architectures. We distribute data to cluster nodes by partitioning a spatial index. We direct I/O to different systems (reads to parallel disk arrays and writes to solid-state storage) to avoid I/O interference and maximize throughput. All programming interfaces are RESTful Web services, which are simple and stateless, improving scalability and usability. We include a performance evaluation of the production system, highlighting the effectiveness of spatial data organization.
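One common way to distribute data "by partitioning a spatial index" is to order 3-d blocks along a space-filling curve and give each node a contiguous range of the curve. The sketch below assumes a Morton (Z-order) index and an equal-range split; the abstract names neither, and `morton3d` and `node_for_block` are hypothetical helpers, not part of the project's API.

```python
def morton3d(x: int, y: int, z: int, bits: int = 10) -> int:
    """Interleave the low `bits` bits of x, y and z into one Morton code.

    Blocks that are close in 3-d space tend to get nearby codes.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code


def node_for_block(x: int, y: int, z: int, n_nodes: int, bits: int = 10) -> int:
    """Assign a block to a node by splitting the Morton range into
    n_nodes contiguous, equally sized partitions (assumed policy)."""
    total_codes = 1 << (3 * bits)
    return morton3d(x, y, z, bits) * n_nodes // total_codes


# A 4 x 4 x 4 grid of blocks (bits=2) spread over 4 nodes.
assignments = {
    (x, y, z): node_for_block(x, y, z, n_nodes=4, bits=2)
    for x in range(4) for y in range(4) for z in range(4)
}
```

With this split each node receives 16 of the 64 blocks, and each partition covers a compact region of the volume rather than a scattered set of blocks.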
The Open Connectome Project Data Cluster: Scalable Analysis and Vision for High-Throughput Neuroscience

PubMed Central

Burns, Randal; Roncal, William Gray; Kleissas, Dean; Lillaney, Kunal; Manavalan, Priya; Perlman, Eric; Berger, Daniel R.; Bock, Davi D.; Chung, Kwanghun; Grosenick, Logan; Kasthuri, Narayanan; Weiler, Nicholas C.; Deisseroth, Karl; Kazhdan, Michael; Lichtman, Jeff; Reid, R. Clay; Smith, Stephen J.; Szalay, Alexander S.; Vogelstein, Joshua T.; Vogelstein, R. Jacob

2013-01-01

We describe a scalable database cluster for the spatial analysis and annotation of high-throughput brain imaging data, initially for 3-d electron microscopy image stacks, but for time-series and multi-channel data as well. The system was designed primarily for workloads that build connectomes (neural connectivity maps of the brain) using the parallel execution of computer vision algorithms on high-performance compute clusters. These services and open-science data sets are publicly available at openconnecto.me. The system design inherits much from NoSQL scale-out and data-intensive computing architectures. We distribute data to cluster nodes by partitioning a spatial index. We direct I/O to different systems (reads to parallel disk arrays and writes to solid-state storage) to avoid I/O interference and maximize throughput. All programming interfaces are RESTful Web services, which are simple and stateless, improving scalability and usability. We include a performance evaluation of the production system, highlighting the effectiveness of spatial data organization. PMID:24401992
Unsupervised spike sorting based on discriminative subspace learning.

PubMed

Keshtkaran, Mohammad Reza; Yang, Zhi

2014-01-01

Spike sorting is a fundamental preprocessing step for many neuroscience studies which rely on the analysis of spike trains. In this paper, we present two unsupervised spike sorting algorithms based on discriminative subspace learning. The first algorithm simultaneously learns the discriminative feature subspace and performs clustering. It uses histogram of features in the most discriminative projection to detect the number of neurons. The second algorithm performs hierarchical divisive clustering that learns a discriminative 1-dimensional subspace for clustering in each level of the hierarchy until achieving almost unimodal distribution in the subspace. The algorithms are tested on synthetic and in-vivo data, and are compared against two widely used spike sorting methods. The comparative results demonstrate that our spike sorting methods can achieve substantially higher accuracy in lower dimensional feature space, and they are highly robust to noise. Moreover, they provide significantly better cluster separability in the learned subspace than in the subspace obtained by principal component analysis or wavelet transform.
Phylogenetic relationship of Ornithobacterium rhinotracheale strains.

PubMed

DE Oca-Jimenez, Roberto Montes; Vega-Sanchez, Vicente; Morales-Erasto, Vladimir; Salgado-Miranda, Celene; Blackall, Patrick J; Soriano-Vargas, Edgardo

2018-04-10

The bacterium Ornithobacterium rhinotracheale is associated with respiratory disease in wild birds and poultry. In this study, the phylogenetic analysis of nine reference strains of O. rhinotracheale belonging to serovars A to I, and eight Mexican isolates belonging to serovar A, was performed. The analysis was extended to include available sequences from another 23 strains available in the public domain. The analysis showed that the 40 sequences formed six clusters, I to VI. All eight Mexican field isolates were placed in cluster I. One of the reference strains appears to present genetic diversity not previously recognized and was placed in a new genetic cluster. In conclusion, the phylogenetic analysis of O. rhinotracheale strains, based on the 16S rRNA gene, is a suitable tool for epidemiologic studies.
Nearest clusters based partial least squares discriminant analysis for the classification of spectral data.

PubMed

Song, Weiran; Wang, Hui; Maguire, Paul; Nibouche, Omar

2018-06-07

Partial Least Squares Discriminant Analysis (PLS-DA) is one of the most effective multivariate analysis methods for spectral data analysis, which extracts latent variables and uses them to predict responses. In particular, it is an effective method for handling high-dimensional and collinear spectral data. However, PLS-DA does not explicitly address data multimodality, i.e., within-class multimodal distribution of data. In this paper, we present a novel method termed nearest clusters based PLS-DA (NCPLS-DA) for addressing the multimodality and nonlinearity issues explicitly and improving the performance of PLS-DA on spectral data classification. The new method applies hierarchical clustering to divide samples into clusters and calculates the corresponding centre of every cluster. For a given query point, only clusters whose centres are nearest to such a query point are used for PLS-DA. Such a method can provide a simple and effective tool for separating multimodal and nonlinear classes into clusters which are locally linear and unimodal. Experimental results on 17 datasets, including 12 UCI and 5 spectral datasets, show that NCPLS-DA can outperform 4 baseline methods, namely, PLS-DA, kernel PLS-DA, local PLS-DA and k-NN, achieving the highest classification accuracy most of the time. Copyright © 2018 Elsevier B.V. All rights reserved.
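The cluster-selection step described above (compute a centre per cluster, then keep only the clusters whose centres are nearest to the query) can be sketched as follows. The sketch covers only that step: the hierarchical clustering and the PLS-DA fit are not shown, Euclidean distance is an assumption, and all names and data are hypothetical.

```python
import math


def centre(points):
    """Mean vector of a cluster of points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def nearest_clusters(clusters, query, n_nearest):
    """Indices of the n_nearest clusters, ranked by the Euclidean
    distance between each cluster centre and the query point."""
    centres = [centre(c) for c in clusters]
    order = sorted(range(len(clusters)),
                   key=lambda i: math.dist(centres[i], query))
    return order[:n_nearest]


# Hypothetical clusters, e.g. produced beforehand by hierarchical clustering.
clusters = [
    [[0.0, 0.0], [1.0, 0.0]],  # centre (0.5, 0.0)
    [[5.0, 5.0], [6.0, 5.0]],  # centre (5.5, 5.0)
    [[0.0, 4.0], [0.0, 6.0]],  # centre (0.0, 5.0)
]

selected = nearest_clusters(clusters, query=[1.0, 4.0], n_nearest=2)

# Only samples from the selected clusters would be passed on to the
# local discriminant model for this query.
local_training_set = [p for i in selected for p in clusters[i]]
```

Restricting the training set to nearby clusters is what lets a linear model work on data that are only locally linear and unimodal.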
Comparative study of two protocols for quantitative image-analysis of serotonin transporter clustering in lymphocytes, a putative biomarker of therapeutic efficacy in major depression.

PubMed

Romay-Tallon, Raquel; Rivera-Baltanas, Tania; Allen, Josh; Olivares, Jose M; Kalynchuk, Lisa E; Caruncho, Hector J

2017-01-01

The pattern of serotonin transporter clustering on the plasma membrane of lymphocytes extracted from human whole blood samples has been identified as a putative biomarker of therapeutic efficacy in major depression. Here we evaluated the possibility of performing a similar analysis using blood smears obtained from rats, and from control human subjects and depression patients. We hypothesized that we could optimize a protocol to make the analysis of serotonin protein clustering in blood smears comparable to the analysis of serotonin protein clustering using isolated lymphocytes. Our data indicate that blood smears require a longer fixation time and longer times of incubation with primary and secondary antibodies. In addition, one needs to optimize the image analysis settings for the analysis of smears. When these steps are followed, the quantitative analysis of both the number and size of serotonin transporter clusters on the plasma membrane of lymphocytes is similar using both blood smears and isolated lymphocytes. The development of this novel protocol will greatly facilitate the collection of appropriate samples by eliminating the necessity and cost of specialized personnel for drawing blood samples, and by being a less invasive procedure. Therefore, this protocol will help us advance the validation of membrane protein clustering in lymphocytes as a biomarker of therapeutic efficacy in major depression, and bring it closer to its clinical application.
Determining the Optimal Number of Clusters with the Clustergram

NASA Technical Reports Server (NTRS)

Fluegemann, Joseph K.; Davies, Misty D.; Aguirre, Nathan D.

2011-01-01

Cluster analysis aids research in many different fields, from business to biology to aerospace. It consists of using statistical techniques to group objects in large sets of data into meaningful classes. However, this process of ordering data points presents much uncertainty because it involves several steps, many of which are subject to researcher judgment as well as inconsistencies depending on the specific data type and research goals. These steps include the method used to cluster the data, the variables on which the cluster analysis will be operating, the number of resulting clusters, and parts of the interpretation process. In most cases, the number of clusters must be guessed or estimated before employing the clustering method. Many remedies have been proposed, but none is unassailable and certainly not for all data types. Thus, the aim of current research for better techniques of determining the number of clusters is generally confined to demonstrating that the new technique excels other methods in performance for several disparate data types. Our research makes use of a new cluster-number-determination technique based on the clustergram: a graph that shows how the number of objects in the cluster and the cluster mean (the ordinate) change with the number of clusters (the abscissa). We use the features of the clustergram to make the best determination of the cluster-number.
Binomial outcomes in dataset with some clusters of size two: can the dependence of twins be accounted for? A simulation study comparing the reliability of statistical methods based on a dataset of preterm infants.

PubMed

Sauzet, Odile; Peacock, Janet L

2017-07-20

The analysis of perinatal outcomes often involves datasets with some multiple births. These are datasets mostly formed of independent observations and a limited number of clusters of size two (twins) and maybe of size three or more. This non-independence needs to be accounted for in the statistical analysis. Using simulated data based on a dataset of preterm infants we have previously investigated the performance of several approaches to the analysis of continuous outcomes in the presence of some clusters of size two. Mixed models have been developed for binomial outcomes but very little is known about their reliability when only a limited number of small clusters are present. Using simulated data based on a dataset of preterm infants we investigated the performance of several approaches to the analysis of binomial outcomes in the presence of some clusters of size two. Logistic models, several methods of estimation for the logistic random intercept models and generalised estimating equations were compared. The presence of even a small percentage of twins means that a logistic regression model will underestimate all parameters but a logistic random intercept model fails to estimate the correlation between siblings if the percentage of twins is too small and will provide similar estimates to logistic regression. The method which seems to provide the best balance between estimation of the standard error and the parameter for any percentage of twins is the generalised estimating equations. This study has shown that the number of covariates or the level two variance do not necessarily affect the performance of the various methods used to analyse datasets containing twins but when the percentage of small clusters is too small, mixed models cannot capture the dependence between siblings.
Combining self-organizing mapping and supervised affinity propagation clustering approach to investigate functional brain networks involved in motor imagery and execution with fMRI measurements.

PubMed

Zhang, Jiang; Liu, Qi; Chen, Huafu; Yuan, Zhen; Huang, Jin; Deng, Lihua; Lu, Fengmei; Zhang, Junpeng; Wang, Yuqing; Wang, Mingwen; Chen, Liangyin

2015-01-01

Clustering analysis methods have been widely applied to identifying the functional brain networks of a multitask paradigm. However, the previously used clustering analysis techniques are computationally expensive and thus impractical for clinical applications. In this study a novel method, called SOM-SAPC that combines self-organizing mapping (SOM) and supervised affinity propagation clustering (SAPC), is proposed and implemented to identify the motor execution (ME) and motor imagery (MI) networks. In SOM-SAPC, SOM was first performed to process fMRI data and SAPC is further utilized for clustering the patterns of functional networks. As a result, SOM-SAPC is able to significantly reduce the computational cost for brain network analysis. Simulation and clinical tests involving ME and MI were conducted based on SOM-SAPC, and the analysis results indicated that functional brain networks were clearly identified with different response patterns and reduced computational cost. In particular, three activation clusters were clearly revealed, which include parts of the visual, ME and MI functional networks. These findings validated that SOM-SAPC is an effective and robust method to analyze the fMRI data with multitasks.
Identification and validation of asthma phenotypes in Chinese population using cluster analysis.

PubMed

Wang, Lei; Liang, Rui; Zhou, Ting; Zheng, Jing; Liang, Bing Miao; Zhang, Hong Ping; Luo, Feng Ming; Gibson, Peter G; Wang, Gang

2017-10-01

Asthma is a heterogeneous airway disease, so it is crucial to clearly identify clinical phenotypes to achieve better asthma management. To identify and prospectively validate asthma clusters in a Chinese population. Two hundred eighty-four patients were consecutively recruited and 18 sociodemographic and clinical variables were collected. Hierarchical cluster analysis was performed by the Ward method followed by k-means cluster analysis. Then, a prospective 12-month cohort study was used to validate the identified clusters. Five clusters were successfully identified. Clusters 1 (n = 71) and 3 (n = 81) were mild asthma phenotypes with slight airway obstruction and low exacerbation risk, but with a sex differential. Cluster 2 (n = 65) described an "allergic" phenotype, cluster 4 (n = 33) featured a "fixed airflow limitation" phenotype with smoking, and cluster 5 (n = 34) was a "low socioeconomic status" phenotype. Patients in clusters 2, 4, and 5 had distinctly lower socioeconomic status and more psychological symptoms. Cluster 2 had a significantly increased risk of exacerbations (risk ratio [RR] 1.13, 95% confidence interval [CI] 1.03-1.25), unplanned visits for asthma (RR 1.98, 95% CI 1.07-3.66), and emergency visits for asthma (RR 7.17, 95% CI 1.26-40.80). Cluster 4 had an increased risk of unplanned visits (RR 2.22, 95% CI 1.02-4.81), and cluster 5 had increased emergency visits (RR 12.72, 95% CI 1.95-69.78). Kaplan-Meier analysis confirmed that cluster grouping was predictive of time to the first asthma exacerbation, unplanned visit, emergency visit, and hospital admission (P < .0001 for all comparisons). We identified 3 clinical clusters as "allergic asthma," "fixed airflow limitation," and "low socioeconomic status" phenotypes that are at high risk of severe asthma exacerbations and that have management implications for clinical practice in developing countries. Copyright © 2017 American College of Allergy, Asthma & Immunology. Published by Elsevier Inc. All rights reserved.
DSRC standards testing : 5MHz band-plan analysis, clustered system architecture and communication in emergency scenarios.

DOT National Transportation Integrated Search

2011-12-01

Researchers performed a system level technical study of physical layer and network layer performance of vehicular communication in a specially licensed Dedicated Short Range Communication (DSRC) 5.9 GHz frequency band. Physical layer analysis provide...
Subphenotypes of mild-to-moderate COPD by factor and cluster analysis of pulmonary function, CT imaging and breathomics in a population-based survey.

PubMed

Fens, Niki; van Rossum, Annelot G J; Zanen, Pieter; van Ginneken, Bram; van Klaveren, Rob J; Zwinderman, Aeilko H; Sterk, Peter J

2013-06-01

Classification of COPD is currently based on the presence and severity of airways obstruction. However, this may not fully reflect the phenotypic heterogeneity of COPD in the (ex-) smoking community. We hypothesized that factor analysis followed by cluster analysis of functional, clinical, radiological and exhaled breath metabolomic features identifies subphenotypes of COPD in a community-based population of heavy (ex-) smokers. Adults between 50-75 years with a smoking history of at least 15 pack-years derived from a random population-based survey as part of the NELSON study underwent detailed assessment of pulmonary function, chest CT scanning, questionnaires and exhaled breath molecular profiling using an electronic nose. Factor and cluster analyses were performed on the subgroup of subjects fulfilling the GOLD criteria for COPD (post-BD FEV1/FVC < 0.70). Three hundred subjects were recruited, of which 157 fulfilled the criteria for COPD and were included in the factor and cluster analysis. Four clusters were identified: cluster 1 (n = 35; 22%): mild COPD, limited symptoms and good quality of life. Cluster 2 (n = 48; 31%): low lung function, combined emphysema and chronic bronchitis and a distinct breath molecular profile. Cluster 3 (n = 60; 38%): emphysema predominant COPD with preserved lung function. Cluster 4 (n = 14; 9%): highly symptomatic COPD with mildly impaired lung function. In a leave-one-out validation analysis an accuracy of 97.4% was reached. This unbiased taxonomy for mild to moderate COPD reinforces clusters found in previous studies and thereby allows better phenotyping of COPD in the general (ex-) smoking population.
Plug cluster module demonstration

NASA Technical Reports Server (NTRS)

Rousar, D. C.

1978-01-01

The low pressure, film cooled rocket engine design concept developed during two previous ALRC programs was re-evaluated for application as a module for a plug cluster engine capable of performing space shuttle OTV missions. The nominal engine mixture ratio was 5.5 and the engine life requirements were 1200 thermal cycles and 10 hours total operating life. The program consisted of pretest analysis; engine tests, performed using residual components; and posttest analysis. The pretest analysis indicated that operation of the film cooled engine at O/F = 5.5 was feasible. During the engine tests, steady state wall temperature and performance measurements were obtained over a range of film cooling flow rates, and the durability of the engine was demonstrated by firing the test engine 1220 times at a nominal performance ranging from 430 to 432 seconds. The performance of the test engine was limited by film coolant sleeve damage which had occurred during previous testing. The posttest analyses indicated that the nominal performance level can be increased to 436 seconds.
Clustering of Dietary Patterns, Lifestyles, and Overweight among Spanish Children and Adolescents in the ANIBES Study

PubMed Central

Pérez-Rodrigo, Carmen; Gil, Ángel; González-Gross, Marcela; Ortega, Rosa M.; Serra-Majem, Lluis; Varela-Moreiras, Gregorio; Aranceta-Bartrina, Javier

2015-01-01

Weight gain has been associated with behaviors related to diet, sedentary lifestyle, and physical activity. We investigated dietary patterns and possible meaningful clustering of physical activity, sedentary behavior, and sleep time in Spanish children and adolescents and whether the identified clusters could be associated with overweight. Analysis was based on a subsample (n = 415) of the cross-sectional ANIBES study in Spain. We performed exploratory factor analysis and subsequent cluster analysis of dietary patterns, physical activity, sedentary behaviors, and sleep time. Logistic regression analysis was used to explore the association between the cluster solutions and overweight. Factor analysis identified four dietary patterns, one reflecting a profile closer to the traditional Mediterranean diet. Dietary patterns, physical activity behaviors, sedentary behaviors and sleep time on weekdays in Spanish children and adolescents clustered into two different groups: a low physical activity-poorer diet lifestyle pattern, which included a higher proportion of girls, and a high physical activity, low sedentary behavior, longer sleep duration, healthier diet lifestyle pattern. Although the increased risk of being overweight was not significant, the Prevalence Ratios (PRs) for the low physical activity-poorer diet lifestyle pattern were >1 in children and in adolescents. The healthier lifestyle pattern included lower proportions of children and adolescents from low socioeconomic status backgrounds. PMID:26729155
Cluster and constraint analysis in tetrahedron packings

NASA Astrophysics Data System (ADS)

Jin, Weiwei; Lu, Peng; Liu, Lufeng; Li, Shuixiang

2015-04-01

The disordered packings of tetrahedra often show no obvious macroscopic orientational or positional order for a wide range of packing densities, and it has been found that the local order in particle clusters is the main order form of tetrahedron packings. Therefore, a cluster analysis is carried out to investigate the local structures and properties of tetrahedron packings in this work. We obtain a cluster distribution of differently sized clusters, and peaks are observed at two special clusters, i.e., dimer and wagon wheel. We then calculate the amounts of dimers and wagon wheels, which are observed to have linear or approximate linear correlations with packing density. Following our previous work, the amount of particles participating in dimers is used as an order metric to evaluate the order degree of the hierarchical packing structure of tetrahedra, and an order map is consequently depicted. Furthermore, a constraint analysis is performed to determine the isostatic or hyperstatic region in the order map. We employ a Monte Carlo algorithm to test jamming and then suggest a new maximally random jammed packing of hard tetrahedra from the order map with a packing density of 0.6337.
Study of the stellar population of several clusters in Carina

NASA Astrophysics Data System (ADS)

Molina-Lera, J. A.; Baume, G. L.; Carraro, G.; Costa, E.

2015-08-01

Based on deep photometric data in the bands, complemented with infrared 2MASS data, we conducted an analysis of the fundamental parameters of six open clusters located in the Carina region. To perform a systematic study we developed a specialized code. In particular, we investigated the behavior of the respective lower main sequences. Our analysis indicated the presence of a significant population of pre-sequence stars in several of the clusters. We therefore obtained estimated values of contraction ages. Furthermore, we have determined the slopes of the initial mass functions of the studied clusters.

Prediction of chemotherapeutic response in bladder cancer using K-means clustering of dynamic contrast-enhanced (DCE)-MRI pharmacokinetic parameters.

PubMed

Nguyen, Huyen T; Jia, Guang; Shah, Zarine K; Pohar, Kamal; Mortazavi, Amir; Zynger, Debra L; Wei, Lai; Yang, Xiangyu; Clark, Daniel; Knopp, Michael V

2015-05-01

To apply k-means clustering of two pharmacokinetic parameters derived from 3T dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) to predict the chemotherapeutic response in bladder cancer at the mid-cycle timepoint. With the predetermined number of three clusters, k-means clustering was performed on nondimensionalized Amp and kep estimates of each bladder tumor. Three cluster volume fractions (VFs) were calculated for each tumor at baseline and mid-cycle. The changes of three cluster VFs from baseline to mid-cycle were correlated with the tumor's chemotherapeutic response. Receiver-operating-characteristics curve analysis was used to evaluate the performance of each cluster VF change as a biomarker of chemotherapeutic response in bladder cancer. The k-means clustering partitioned each bladder tumor into cluster 1 (low kep and low Amp), cluster 2 (low kep and high Amp), cluster 3 (high kep and low Amp). The changes of all three cluster VFs were found to be associated with bladder tumor response to chemotherapy. The VF change of cluster 2 presented with the highest area-under-the-curve value (0.96) and the highest sensitivity/specificity/accuracy (96%/100%/97%) with a selected cutoff value. The k-means clustering of the two DCE-MRI pharmacokinetic parameters can characterize the complex microcirculatory changes within a bladder tumor to enable early prediction of the tumor's chemotherapeutic response. © 2014 Wiley Periodicals, Inc.
Prediction of chemotherapeutic response in bladder cancer using k-means clustering of DCE-MRI pharmacokinetic parameters

PubMed Central

Nguyen, Huyen T.; Jia, Guang; Shah, Zarine K.; Pohar, Kamal; Mortazavi, Amir; Zynger, Debra L.; Wei, Lai; Yang, Xiangyu; Clark, Daniel; Knopp, Michael V.

2015-01-01

Purpose: To apply k-means clustering of two pharmacokinetic parameters derived from 3T DCE-MRI to predict chemotherapeutic response in bladder cancer at the mid-cycle time-point. Materials and Methods: With the pre-determined number of 3 clusters, k-means clustering was performed on non-dimensionalized Amp and kep estimates of each bladder tumor. Three cluster volume fractions (VFs) were calculated for each tumor at baseline and mid-cycle. The changes of three cluster VFs from baseline to mid-cycle were correlated with the tumor’s chemotherapeutic response. Receiver-operating-characteristics curve analysis was used to evaluate the performance of each cluster VF change as a biomarker of chemotherapeutic response in bladder cancer. Results: k-means clustering partitioned each bladder tumor into cluster 1 (low kep and low Amp), cluster 2 (low kep and high Amp), cluster 3 (high kep and low Amp). The changes of all three cluster VFs were found to be associated with bladder tumor response to chemotherapy. The VF change of cluster 2 presented with the highest area-under-the-curve value (0.96) and the highest sensitivity/specificity/accuracy (96%/100%/97%) with a selected cutoff value. Conclusion: k-means clustering of the two DCE-MRI pharmacokinetic parameters can characterize the complex microcirculatory changes within a bladder tumor to enable early prediction of the tumor’s chemotherapeutic response. PMID:24943272
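The clustering and volume-fraction steps in this record can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the voxel values are invented, the starting centroids are fixed by hand so the run is deterministic, and the volume fraction is taken as the voxel fraction, which assumes equal voxel volumes.

```python
import math


def kmeans(points, centroids, n_iter=50):
    """Plain Lloyd's algorithm with caller-supplied starting centroids.

    Returns (labels, centroids).
    """
    centroids = [list(c) for c in centroids]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(v) / len(members) for v in zip(*members)]
    return labels, centroids


def volume_fractions(labels, n_clusters):
    """Fraction of voxels in each cluster (equal voxel volumes assumed)."""
    return [labels.count(k) / len(labels) for k in range(n_clusters)]


# Hypothetical non-dimensionalized (Amp, kep) estimates, one pair per voxel.
voxels = [
    (0.1, 0.1), (0.2, 0.1), (0.1, 0.2), (0.2, 0.2),  # low Amp, low kep
    (0.9, 0.1), (0.8, 0.2),                          # high Amp, low kep
    (0.1, 0.9), (0.2, 0.8),                          # low Amp, high kep
]
start = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]  # hand-picked initial centroids

labels, centroids = kmeans(voxels, start)
vf = volume_fractions(labels, n_clusters=3)
```

Running the same steps on the baseline and mid-cycle scans of one tumor and subtracting the two `vf` lists would give the per-cluster VF changes that the record correlates with response. How cluster identities are matched between the two time points is not stated in the abstract and is left open here.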
Cluster headache and the hypocretin receptor 2 reconsidered: a genetic association study and meta-analysis.

PubMed

Weller, Claudia M; Wilbrink, Leopoldine A; Houwing-Duistermaat, Jeanine J; Koelewijn, Stephany C; Vijfhuizen, Lisanne S; Haan, Joost; Ferrari, Michel D; Terwindt, Gisela M; van den Maagdenberg, Arn M J M; de Vries, Boukje

2015-08-01

Cluster headache is a severe neurological disorder with a complex genetic background. A missense single nucleotide polymorphism (rs2653349; p.Ile308Val) in the HCRTR2 gene that encodes the hypocretin receptor 2 is the only genetic factor that is reported to be associated with cluster headache in different studies. However, as there are conflicting results between studies, we re-evaluated its role in cluster headache. We performed a genetic association analysis for rs2653349 in our large Leiden University Cluster headache Analysis (LUCA) program study population. Systematic selection of the literature yielded three additional studies comprising five study populations, which were included in our meta-analysis. Data were extracted according to predefined criteria. A total of 575 cluster headache patients from our LUCA study and 874 controls were genotyped for HCRTR2 SNP rs2653349 but no significant association with cluster headache was found (odds ratio 0.91 (95% confidence intervals 0.75-1.10), p = 0.319). In contrast, the meta-analysis that included in total 1167 cluster headache cases and 1618 controls from the six study populations, which were part of four different studies, showed association of the single nucleotide polymorphism with cluster headache (random effect odds ratio 0.69 (95% confidence intervals 0.53-0.90), p = 0.006). The association became weaker, as the odds ratio increased to 0.80, when the meta-analysis was repeated without the initial single South European study with the largest effect size. Although we did not find evidence for association of rs2653349 in our LUCA study, which is the largest investigated study population thus far, our meta-analysis provides genetic evidence for a role of HCRTR2 in cluster headache. Regardless, we feel that the association should be interpreted with caution as meta-analyses with individual populations that have limited power have diminished validity. © International Headache Society 2014.
Personalized Medicine in Veterans with Traumatic Brain Injuries

DTIC Science & Technology

2013-05-01

Pair-Group Method using Arithmetic averages (UPGMA) based on cosine correlation of row mean centered log2 signal values; this was the top 50%-tile...clustering was performed by the UPGMA method using Cosine correlation as the similarity metric. For comparative purposes, clustered heat maps included...non-mTBI cases were subjected to unsupervised hierarchical clustering analysis using the UPGMA algorithm with cosine correlation as the similarity
Personalized Medicine in Veterans with Traumatic Brain Injuries

DTIC Science & Technology

2014-07-01

9 control cases are subjected to unsupervised hierarchical clustering analysis using the UPGMA algorithm with cosine correlation as the similarity...in unsupervised hierarchical clustering by the Unweighted Pair-Group Method using Arithmetic averages (UPGMA) based on cosine correlation of row...of log2 transformed MAS5.0 signal values; probe set clustering was performed by the UPGMA method using Cosine correlation as the similarity
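The procedure named in these excerpts (row mean centering of log2 signals, cosine correlation as the similarity, UPGMA linkage) can be sketched on a toy matrix. The data are invented, the conversion of similarity to distance as 1 minus the cosine is an assumption, and the function names are hypothetical; a production analysis would more likely use an established library routine.

```python
import math


def row_mean_center(rows):
    """Subtract each row's (probe set's) mean from its values."""
    return [[v - sum(r) / len(r) for v in r] for r in rows]


def cosine_distance(a, b):
    """1 - cosine similarity (assumed conversion; undefined for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def upgma(vectors):
    """Average-linkage agglomerative clustering.

    Returns the merges in order as (members_a, members_b, distance).
    """
    n = len(vectors)
    dist = {(i, j): cosine_distance(vectors[i], vectors[j])
            for i in range(n) for j in range(n) if i != j}

    def linkage(a, b):
        # UPGMA: unweighted mean of all pairwise distances between clusters.
        return sum(dist[i, j] for i in a for j in b) / (len(a) * len(b))

    clusters = [(i,) for i in range(n)]
    merges = []
    while len(clusters) > 1:
        pairs = [(linkage(a, b), a, b)
                 for idx, a in enumerate(clusters)
                 for b in clusters[idx + 1:]]
        d, a, b = min(pairs)
        merges.append((a, b, d))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical log2 signals: rows are probe sets, columns are 4 samples.
log2_signal = [
    [8.0, 8.2, 6.0, 6.1],
    [5.0, 5.1, 7.0, 7.2],
    [9.0, 9.1, 9.0, 9.3],
]
centered = row_mean_center(log2_signal)
samples = [list(col) for col in zip(*centered)]  # one vector per sample
merges = upgma(samples)
```

In this toy matrix samples 0 and 1 share one expression pattern and samples 2 and 3 the opposite one, so the two pairs merge first and the final merge joins them at a cosine distance close to 2.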
A new physical performance classification system for elite handball players: cluster analysis

PubMed Central

Chirosa, Ignacio J.; Robinson, Joseph E.; van der Tillaar, Roland; Chirosa, Luis J.; Martín, Isidoro Martínez

2016-01-01

The aim of the present study was to identify different cluster groups of handball players according to their physical performance level assessed in a series of physical assessments, which could then be used to design a training program based on individual strengths and weaknesses, and to determine which of these variables best identified elite performance in a group of under-19 [U19] national level handball players. Players of the U19 National Handball team (n=16) performed a set of tests to determine: 10 m (ST10) and 20 m (ST20) sprint time, ball release velocity (BRv), countermovement jump (CMJ) height and squat jump (SJ) height. All players also performed an incremental-load bench press test to determine the 1 repetition maximum (1RMest), the load corresponding to maximum mean power (LoadMP), the mean propulsive phase power at LoadMP (PMPPMP) and the peak power at LoadMP (PPEAKMP). Cluster analyses of the test results generated four groupings of players. The variables best able to discriminate physical performance were BRv, ST20, 1RMest, PPEAKMP and PMPPMP. These variables could help coaches identify talent or monitor the physical performance of athletes in their team. Each cluster of players has a particular weakness related to physical performance and therefore, the cluster results can be applied to a specific training program based on individual needs. PMID:28149376
Generalized Self-Organizing Maps for Automatic Determination of the Number of Clusters and Their Multiprototypes in Cluster Analysis.

PubMed

Gorzalczany, Marian B; Rudzinski, Filip

2017-06-07

This paper presents a generalization of self-organizing maps with 1-D neighborhoods (neuron chains) that can be effectively applied to complex cluster analysis problems. The essence of the generalization consists in introducing mechanisms that allow the neuron chain--during learning--to disconnect into subchains, to reconnect some of the subchains again, and to dynamically regulate the overall number of neurons in the system. These features enable the network--working in a fully unsupervised way (i.e., using unlabeled data without a predefined number of clusters)--to automatically generate collections of multiprototypes that are able to represent a broad range of clusters in data sets. First, the operation of the proposed approach is illustrated on some synthetic data sets. Then, this technique is tested using several real-life, complex, and multidimensional benchmark data sets available from the University of California at Irvine (UCI) Machine Learning repository and the Knowledge Extraction based on Evolutionary Learning data set repository. A sensitivity analysis of our approach to changes in control parameters and a comparative analysis with an alternative approach are also performed.
An Assessment of the Condition of Coral Reefs off the Former Navy Bombing Ranges at Isla De Culebra and Isla De Vieques, Puerto Rico

DTIC Science & Technology

2005-04-01

Bray-Curtis distance measure with an Unweighted Pair Group Method with Arithmetic Averages ( UPGMA ) linkage method to perform a cluster analysis of the...59 35 Comparison of reef condition indicators clustering by UPGMA analysis...Polyvinyl Chloride RBD Red-band Disease SACEX Supporting Arms Coordination Exercise SAV Submerged Aquatic Vegetation SD Standard Deviation UPGMA
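As a rough illustration of the technique named in this record, the sketch below computes Bray-Curtis dissimilarities and merges samples with average linkage (UPGMA). The site counts and function names are hypothetical and are not taken from the report; this is a minimal pure-Python sketch, not the authors' procedure.

```python
from itertools import combinations


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


def upgma(samples):
    """Agglomerate samples with average linkage (UPGMA).

    Returns the merge history as (members_a, members_b, height) tuples.
    """
    # Pairwise dissimilarities between the original samples, keyed by index pair.
    base = {
        frozenset((i, j)): bray_curtis(samples[i], samples[j])
        for i, j in combinations(range(len(samples)), 2)
    }
    clusters = [(i,) for i in range(len(samples))]
    merges = []

    def average_distance(a, b):
        # UPGMA distance: unweighted mean over all cross-cluster sample pairs.
        total = sum(base[frozenset((i, j))] for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Merge the two clusters with the smallest average dissimilarity.
        a, b = min(combinations(clusters, 2), key=lambda p: average_distance(*p))
        merges.append((a, b, average_distance(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [tuple(sorted(a + b))]
    return merges


# Hypothetical counts for four sites (rows) and three taxa (columns).
sites = [
    [10, 0, 5],
    [9, 1, 5],
    [0, 8, 2],
    [1, 9, 2],
]
history = upgma(sites)
for left, right, height in history:
    print(left, right, round(height, 3))
```

Recomputing the average from the original pairwise values at every step keeps the code short; a production implementation would normally update a distance matrix instead.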
Low physical activity as a key differentiating factor in the potential high-risk profile for depressive symptoms in older adults.

PubMed

Holmquist, Sofie; Mattsson, Sabina; Schele, Ingrid; Nordström, Peter; Nordström, Anna

2017-09-01

The identification of potential high-risk groups for depression is of importance. The purpose of the present study was to identify high-risk profiles for depressive symptoms in older individuals, with a focus on functional performance. The population-based Healthy Ageing Initiative included 2,084 community-dwelling individuals (49% women) aged 70. Explorative cluster analysis was used to group participants according to functional performance level, using measures of basic mobility skills, gait variability, and grip strength. Intercluster differences in depressive symptoms (measured by the Geriatric Depression Scale [GDS]-15), physical activity (PA; measured objectively with the ActiGraph GT3X+), and a rich set of covariates were examined. The cluster analysis yielded a seven-cluster solution. One potential high-risk cluster was identified, with overrepresentation of individuals with GDS scores >5 (15.1 vs. 2.7% expected; relative risk = 6.99, P < .001); the prevalence of depressive symptoms was significantly lower in the other clusters (all P < .01). The potential high-risk cluster had significant overrepresentations of obese individuals (39.7 vs. 17.4% expected) and those with type 2 diabetes (24.7 vs. 8.5% expected), and underrepresentation of individuals who fulfilled the World Health Organization's PA recommendations (15.6 vs. 59.1% expected; all P < .01), as well as low levels of functional performance. The present study provided a potential high-risk profile for depressive symptoms among elderly community-dwelling individuals, which included low levels of functional performance combined with low levels of PA. Including PA in medical screening of the elderly may aid in identification of potential high-risk individuals for depressive symptoms. © 2017 Wiley Periodicals, Inc.
Functional grouping of similar genes using eigenanalysis on minimum spanning tree based neighborhood graph.

PubMed

Jothi, R; Mohanty, Sraban Kumar; Ojha, Aparajita

2016-04-01

Gene expression data clustering is an important biological process in DNA microarray analysis. Although there have been many clustering algorithms for gene expression analysis, finding a suitable and effective clustering algorithm is always a challenging problem due to the heterogeneous nature of gene profiles. Minimum Spanning Tree (MST) based clustering algorithms have been successfully employed to detect clusters of varying shapes and sizes. This paper proposes a novel clustering algorithm using Eigenanalysis on Minimum Spanning Tree based neighborhood graph (E-MST). As MST of a set of points reflects the similarity of the points with their neighborhood, the proposed algorithm employs a similarity graph obtained from k′ rounds of MST (k′-MST neighborhood graph). By studying the spectral properties of the similarity matrix obtained from k′-MST graph, the proposed algorithm achieves improved clustering results. We demonstrate the efficacy of the proposed algorithm on 12 gene expression datasets. Experimental results show that the proposed algorithm performs better than the standard clustering algorithms. Copyright © 2016 Elsevier Ltd. All rights reserved.
Motivational and emotional profiles in university undergraduates: a self-determination theory perspective.

PubMed

González, Antonio; Paoloni, Verónica; Donolo, Danilo; Rinaudo, Cristina

2012-11-01

Previous research has focused on specific forms of self-determined motivation or discrete class-related emotions, but few studies have simultaneously examined both constructs. The aim of this study on 472 undergraduates was twofold: to perform cluster analysis to identify homogeneous groups of motivation in the sample; and to determine the profile of each cluster for emotions and academic achievement. Cluster analysis configured four groups in terms of motivation: controlled, autonomous, both high, and both low. Each cluster revealed a distinct emotional profile, autonomous motivation being the most adaptable with high scores for academic achievement and pleasant emotions and low values for unpleasant emotions. The results are discussed in the light of their implications for academic adjustment.
Comparative Investigation of Shared Filesystems for the LHCb Online Cluster

NASA Astrophysics Data System (ADS)

Vijay Kartik, S.; Neufeld, Niko

2012-12-01

This paper describes the investigative study undertaken to evaluate shared filesystem performance and suitability in the LHCb Online environment. Particular focus is given to the measurements and field tests designed and performed on an in-house OpenAFS setup; related comparisons with NFSv4 and GPFS (a clustered filesystem from IBM) are presented. The motivation for the investigation and the test setup arises from the need to serve common user-space like home directories, experiment software and control areas, and clustered log areas. Since the operational requirements on such user-space are stringent in terms of read-write operations (in frequency and access speed) and unobtrusive data relocation, test results are presented with emphasis on file-level performance, stability and “high-availability” of the shared filesystems. Use cases specific to the experiment operation in LHCb, including the specific handling of shared filesystems served to a cluster of 1500 diskless nodes, are described. Issues of prematurely expiring authenticated sessions are explicitly addressed, keeping in mind long-running analysis jobs on the Online cluster. In addition, quantitative test results are also presented with alternatives including NFSv4. Comparative measurements of filesystem performance benchmarks are presented, which are seen to be used as reference for decisions on potential migration of the current storage solution deployed in the LHCb online cluster.
Atomistic cluster alignment method for local order mining in liquids and glasses

NASA Astrophysics Data System (ADS)

Fang, X. W.; Wang, C. Z.; Yao, Y. X.; Ding, Z. J.; Ho, K. M.

2010-11-01

An atomistic cluster alignment method is developed to identify and characterize the local atomic structural order in liquids and glasses. With the “order mining” idea for structurally disordered systems, the method can detect the presence of any type of local order in the system and can quantify the structural similarity between a given set of templates and the aligned clusters in a systematic and unbiased manner. Moreover, population analysis can also be carried out for various types of clusters in the system. The advantages of the method in comparison with other previously developed analysis methods are illustrated by performing the structural analysis for four prototype systems (i.e., pure Al, pure Zr, Zr35Cu65, and Zr36Ni64). The results show that the cluster alignment method can identify various types of short-range orders (SROs) in these systems correctly while some of these SROs are difficult to capture by most of the currently available analysis methods (e.g., Voronoi tessellation method). Such a full three-dimensional atomistic analysis method is generic and can be applied to describe the magnitude and nature of noncrystalline ordering in many disordered systems.
The Cluster Sensitivity Index: A Basic Measure of Classification Robustness

ERIC Educational Resources Information Center

Hom, Willard C.

2010-01-01

Analysts of institutional performance have occasionally used a peer grouping approach in which they compared institutions only to other institutions with similar characteristics. Because analysts historically have used cluster analysis to define peer groups (i.e., the group of comparable institutions), the author proposes and demonstrates with…
Pilot-in-the-Loop CFD Method Development

DTIC Science & Technology

2014-06-16

CFD analysis. Coupled simulations will be run at PSU on the COCOA -4 cluster, a high performance computing cluster. The CRUNCH CFD software has...been installed on the COCOA -4 servers and initial software tests are being conducted. Initial efforts will use the Generic Frigate Shape SFS-2 to
Individual Differences in Achievement Goals: A Longitudinal Study of Cognitive, Emotional, and Achievement Outcomes

ERIC Educational Resources Information Center

Daniels, Lia M.; Haynes, Tara L.; Stupnisky, Robert H.; Perry, Raymond P.; Newall, Nancy E.; Pekrun, Reinhard

2008-01-01

Within achievement goal theory debate remains regarding the adaptiveness of certain combinations of goals. Assuming a multiple-goals perspective, we used cluster analysis to classify 1002 undergraduate students according to their mastery and performance-approach goals. Four clusters emerged, representing different goal combinations: high…
Developing appropriate methods for cost-effectiveness analysis of cluster randomized trials.

PubMed

Gomes, Manuel; Ng, Edmond S-W; Grieve, Richard; Nixon, Richard; Carpenter, James; Thompson, Simon G

2012-01-01

Cost-effectiveness analyses (CEAs) may use data from cluster randomized trials (CRTs), where the unit of randomization is the cluster, not the individual. However, most studies use analytical methods that ignore clustering. This article compares alternative statistical methods for accommodating clustering in CEAs of CRTs. Our simulation study compared the performance of statistical methods for CEAs of CRTs with 2 treatment arms. The study considered a method that ignored clustering--seemingly unrelated regression (SUR) without a robust standard error (SE)--and 4 methods that recognized clustering--SUR and generalized estimating equations (GEEs), both with robust SE, a "2-stage" nonparametric bootstrap (TSB) with shrinkage correction, and a multilevel model (MLM). The base case assumed CRTs with moderate numbers of balanced clusters (20 per arm) and normally distributed costs. Other scenarios included CRTs with few clusters, imbalanced cluster sizes, and skewed costs. Performance was reported as bias, root mean squared error (rMSE), and confidence interval (CI) coverage for estimating incremental net benefits (INBs). We also compared the methods in a case study. Each method reported low levels of bias. Without the robust SE, SUR gave poor CI coverage (base case: 0.89 v. nominal level: 0.95). The MLM and TSB performed well in each scenario (CI coverage, 0.92-0.95). With few clusters, the GEE and SUR (with robust SE) had coverage below 0.90. In the case study, the mean INBs were similar across all methods, but ignoring clustering underestimated statistical uncertainty and the value of further research. MLMs and the TSB are appropriate analytical methods for CEAs of CRTs with the characteristics described. SUR and GEE are not recommended for studies with few clusters.
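To make the reported performance measures concrete, the sketch below computes an incremental net benefit with the standard formula INB = λ·ΔE − ΔC and then summarises a set of simulated estimates by bias, rMSE, and CI coverage. All numbers, the willingness-to-pay threshold, and the function names are hypothetical; none come from the study.

```python
from math import sqrt


def incremental_net_benefit(delta_effect, delta_cost, threshold):
    """INB = threshold * incremental effect - incremental cost."""
    return threshold * delta_effect - delta_cost


def summarise(estimates, intervals, true_value):
    """Return (bias, rMSE, CI coverage) of estimates against a known true value."""
    n = len(estimates)
    bias = sum(estimates) / n - true_value
    rmse = sqrt(sum((e - true_value) ** 2 for e in estimates) / n)
    # Coverage: share of intervals that contain the true value.
    coverage = sum(lo <= true_value <= hi for lo, hi in intervals) / n
    return bias, rmse, coverage


# Hypothetical truth: 0.05 QALYs gained at an extra cost of 500,
# valued at a threshold of 20,000 per QALY.
true_inb = incremental_net_benefit(0.05, 500, 20_000)

# Hypothetical INB estimates and 95% intervals from four simulated trials.
estimates = [480.0, 520.0, 510.0, 490.0]
intervals = [(450.0, 510.0), (490.0, 550.0), (505.0, 515.0), (460.0, 520.0)]
bias, rmse, coverage = summarise(estimates, intervals, true_inb)
print(true_inb, bias, round(rmse, 2), coverage)
```

An interval that is too narrow, as ignoring clustering tends to produce, shows up here as coverage below the nominal level even when bias is zero.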
Prediction models for clustered data: comparison of a random intercept and standard regression model

PubMed Central

2013-01-01

Background When study data are clustered, standard regression analysis is considered inappropriate and analytical techniques for clustered data need to be used. For prediction research in which the interest of predictor effects is on the patient level, random effect regression models are probably preferred over standard regression analysis. It is well known that the random effect parameter estimates and the standard logistic regression parameter estimates are different. Here, we compared random effect and standard logistic regression models for their ability to provide accurate predictions. Methods Using an empirical study on 1642 surgical patients at risk of postoperative nausea and vomiting, who were treated by one of 19 anesthesiologists (clusters), we developed prognostic models either with standard or random intercept logistic regression. External validity of these models was assessed in new patients from other anesthesiologists. We supported our results with simulation studies using intra-class correlation coefficients (ICC) of 5%, 15%, or 30%. Standard performance measures and measures adapted for the clustered data structure were estimated. Results The model developed with random effect analysis showed better discrimination than the standard approach, if the cluster effects were used for risk prediction (standard c-index of 0.69 versus 0.66). In the external validation set, both models showed similar discrimination (standard c-index 0.68 versus 0.67). The simulation study confirmed these results. For datasets with a high ICC (≥15%), model calibration was only adequate in external subjects, if the used performance measure assumed the same data structure as the model development method: standard calibration measures showed good calibration for the standard developed model, calibration measures adapting the clustered data structure showed good calibration for the prediction model with random intercept. 
Conclusion The models with random intercept discriminate better than the standard model only if the cluster effect is used for predictions. The prediction model with random intercept had good calibration within clusters. PMID:23414436
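The discrimination results above are reported as c-indices. As a minimal sketch of the standard (not cluster-adapted) c-index for a binary outcome, the function below counts, over all event/non-event pairs, how often the event received the higher predicted risk, with ties counted as one half. The risks and outcomes are hypothetical.

```python
def c_index(risks, outcomes):
    """Concordance index for binary outcomes (1 = event, 0 = non-event)."""
    event_risks = [r for r, y in zip(risks, outcomes) if y == 1]
    non_event_risks = [r for r, y in zip(risks, outcomes) if y == 0]
    if not event_risks or not non_event_risks:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    for r_event in event_risks:
        for r_non in non_event_risks:
            if r_event > r_non:
                concordant += 1.0   # model ranks the event higher
            elif r_event == r_non:
                concordant += 0.5   # tie counts as half
    return concordant / (len(event_risks) * len(non_event_risks))


# Hypothetical predicted risks and observed outcomes for five patients.
risks = [0.9, 0.4, 0.6, 0.2, 0.4]
outcomes = [1, 0, 1, 0, 1]
print(c_index(risks, outcomes))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect discrimination, which puts the reported values of 0.66 to 0.69 in context.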
Prediction models for clustered data: comparison of a random intercept and standard regression model.

PubMed

Bouwmeester, Walter; Twisk, Jos W R; Kappen, Teus H; van Klei, Wilton A; Moons, Karel G M; Vergouwe, Yvonne

2013-02-15

When study data are clustered, standard regression analysis is considered inappropriate and analytical techniques for clustered data need to be used. For prediction research in which the interest of predictor effects is on the patient level, random effect regression models are probably preferred over standard regression analysis. It is well known that the random effect parameter estimates and the standard logistic regression parameter estimates are different. Here, we compared random effect and standard logistic regression models for their ability to provide accurate predictions. Using an empirical study on 1642 surgical patients at risk of postoperative nausea and vomiting, who were treated by one of 19 anesthesiologists (clusters), we developed prognostic models either with standard or random intercept logistic regression. External validity of these models was assessed in new patients from other anesthesiologists. We supported our results with simulation studies using intra-class correlation coefficients (ICC) of 5%, 15%, or 30%. Standard performance measures and measures adapted for the clustered data structure were estimated. The model developed with random effect analysis showed better discrimination than the standard approach, if the cluster effects were used for risk prediction (standard c-index of 0.69 versus 0.66). In the external validation set, both models showed similar discrimination (standard c-index 0.68 versus 0.67). The simulation study confirmed these results. For datasets with a high ICC (≥15%), model calibration was only adequate in external subjects, if the used performance measure assumed the same data structure as the model development method: standard calibration measures showed good calibration for the standard developed model, calibration measures adapting the clustered data structure showed good calibration for the prediction model with random intercept. 
The models with random intercept discriminate better than the standard model only if the cluster effect is used for predictions. The prediction model with random intercept had good calibration within clusters.
Users matter : multi-agent systems model of high performance computing cluster users.

DOE Office of Scientific and Technical Information (OSTI.GOV)

North, M. J.; Hood, C. S.; Decision and Information Sciences

2005-01-01

High performance computing clusters have been a critical resource for computational science for over a decade and have more recently become integral to large-scale industrial analysis. Despite their well-specified components, the aggregate behavior of clusters is poorly understood. The difficulties arise from complicated interactions between cluster components during operation. These interactions have been studied by many researchers, some of whom have identified the need for holistic multi-scale modeling that simultaneously includes network level, operating system level, process level, and user level behaviors. Each of these levels presents its own modeling challenges, but the user level is the most complex due to the adaptability of human beings. In this vein, there are several major user modeling goals, namely descriptive modeling, predictive modeling and automated weakness discovery. This study shows how multi-agent techniques were used to simulate a large-scale computing cluster at each of these levels.

Coordinate based random effect size meta-analysis of neuroimaging studies.

PubMed

Tench, C R; Tanasescu, Radu; Constantinescu, C S; Auer, D P; Cottam, W J

2017-06-01

Low power in neuroimaging studies can make them difficult to interpret, and Coordinate based meta-analysis (CBMA) may go some way to mitigating this issue. CBMA has been used in many analyses to detect where published functional MRI or voxel-based morphometry studies testing similar hypotheses report significant summary results (coordinates) consistently. Only the reported coordinates and possibly t statistics are analysed, and statistical significance of clusters is determined by coordinate density. Here a method of performing coordinate based random effect size meta-analysis and meta-regression is introduced. The algorithm (ClusterZ) analyses both coordinates and reported t statistic or Z score, standardised by the number of subjects. Statistical significance is determined not by coordinate density, but by a random effects meta-analyses of reported effects performed cluster-wise using standard statistical methods and taking account of censoring inherent in the published summary results. Type 1 error control is achieved using the false cluster discovery rate (FCDR), which is based on the false discovery rate. This controls both the family wise error rate under the null hypothesis that coordinates are randomly drawn from a standard stereotaxic space, and the proportion of significant clusters that are expected under the null. Such control is necessary to avoid propagating and even amplifying the very issues motivating the meta-analysis in the first place. ClusterZ is demonstrated on both numerically simulated data and on real data from reports of grey matter loss in multiple sclerosis (MS) and syndromes suggestive of MS, and of painful stimulus in healthy controls. The software implementation is available to download and use freely. Copyright © 2017 Elsevier Inc. All rights reserved.
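The abstract states that the false cluster discovery rate is based on the false discovery rate but does not define it. As background only, the sketch below implements the ordinary Benjamini-Hochberg step-up procedure for controlling the false discovery rate; it is not the ClusterZ algorithm or FCDR itself, and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking hypotheses rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k with p_(k) <= q * k / m (step-up rule).
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            cutoff_rank = rank

    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for five candidate clusters.
p_values = [0.041, 0.001, 0.205, 0.008, 0.039]
print(benjamini_hochberg(p_values, q=0.05))
```

Note that 0.039 and 0.041 fall below 0.05 yet are not rejected, because each is compared with a rank-scaled threshold rather than with q directly.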
Analysis of genetic association using hierarchical clustering and cluster validation indices.

PubMed

Pagnuco, Inti A; Pastore, Juan I; Abras, Guillermo; Brun, Marcel; Ballarin, Virginia L

2017-10-01

It is usually assumed that co-expressed genes suggest co-regulation in the underlying regulatory network. Determining sets of co-expressed genes is an important task, based on some criteria of similarity. This task is usually performed by clustering algorithms, where the genes are clustered into meaningful groups based on their expression values in a set of experiments. In this work, we propose a method to find sets of co-expressed genes, based on cluster validation indices as a measure of similarity for individual gene groups, and a combination of variants of hierarchical clustering to generate the candidate groups. We evaluated its ability to retrieve significant sets on simulated correlated and real genomics data, where the performance is measured based on its detection ability of co-regulated sets against a full search. Additionally, we analyzed the quality of the best ranked groups using an online bioinformatics tool that provides network information for the selected genes. Copyright © 2017 Elsevier Inc. All rights reserved.
Novel approach to classifying patients with pulmonary arterial hypertension using cluster analysis.

PubMed

Parikh, Kishan S; Rao, Youlan; Ahmad, Tariq; Shen, Kai; Felker, G Michael; Rajagopal, Sudarshan

2017-01-01

Pulmonary arterial hypertension (PAH) patients have distinct disease courses and responses to treatment, but current diagnostic and treatment schemes provide limited insight. We aimed to see if cluster analysis could distinguish clinical phenotypes in PAH. An unbiased cluster analysis was performed on 17 baseline clinical variables of PAH patients from the FREEDOM-M, FREEDOM-C, and FREEDOM-C2 randomized trials of oral treprostinil versus placebo. Participants were either treatment-naïve (FREEDOM-M) or on background therapy (FREEDOM-C, FREEDOM-C2). We tested for association of clusters with outcomes and interaction with respect to treatment. Primary outcome was 6-minute walking distance (6MWD) change. We included 966 participants with 12-week (FREEDOM-M) or 16-week (FREEDOM-C and FREEDOM-C2) follow-up. Four patient clusters were identified. Compared with Clusters 1 (n = 131) and 2 (n = 496), Clusters 3 (n = 246) and 4 (n = 93) patients were older, heavier, had worse baseline functional class, 6MWD, Borg Dyspnea Index, and fewer years since PAH diagnosis. Clusters also differed by PAH etiology and background therapies, but not gender or race. Mean treatment effect of oral treprostinil differed across Clusters 1-4, increasing in a monotonic fashion (Cluster 1: 10.9 m; Cluster 2: 13.0 m; Cluster 3: 25.0 m; Cluster 4: 50.9 m; interaction P value = 0.048). We identified four distinct clusters of PAH patients based on common patient characteristics. Patients who were older, diagnosed with PAH for a shorter period, and had worse baseline symptoms and exercise capacity had the greatest response to oral treprostinil treatment.
Differences in Pedaling Technique in Cycling: A Cluster Analysis.

PubMed

Lanferdini, Fábio J; Bini, Rodrigo R; Figueiredo, Pedro; Diefenthaeler, Fernando; Mota, Carlos B; Arndt, Anton; Vaz, Marco A

2016-10-01

To employ cluster analysis to assess if cyclists would opt for different strategies in terms of neuromuscular patterns when pedaling at the power output of their second ventilatory threshold (PO VT2 ) compared with cycling at their maximal power output (PO MAX ). Twenty athletes performed an incremental cycling test to determine their power output (PO MAX and PO VT2 ; first session), and pedal forces, muscle activation, muscle-tendon unit length, and vastus lateralis architecture (fascicle length, pennation angle, and muscle thickness) were recorded (second session) in PO MAX and PO VT2 . Athletes were assigned to 2 clusters based on the behavior of outcome variables at PO VT2 and PO MAX using cluster analysis. Clusters 1 (n = 14) and 2 (n = 6) showed similar power output and oxygen uptake. Cluster 1 presented larger increases in pedal force and knee power than cluster 2, without differences for the index of effectiveness. Cluster 1 presented less variation in knee angle, muscle-tendon unit length, pennation angle, and tendon length than cluster 2. However, clusters 1 and 2 showed similar muscle thickness, fascicle length, and muscle activation. When cycling at PO VT2 vs PO MAX , cyclists could opt for keeping a constant knee power and pedal-force production, associated with an increase in tendon excursion and a constant fascicle length. Increases in power output lead to greater variations in knee angle, muscle-tendon unit length, tendon length, and pennation angle of vastus lateralis for a similar knee-extensor activation and smaller pedal-force changes in cyclists from cluster 2 than in cluster 1.
A stellar census in globular clusters with MUSE: The contribution of rotation to cluster dynamics studied with 200 000 stars

NASA Astrophysics Data System (ADS)

Kamann, S.; Husser, T.-O.; Dreizler, S.; Emsellem, E.; Weilbacher, P. M.; Martens, S.; Bacon, R.; den Brok, M.; Giesers, B.; Krajnović, D.; Roth, M. M.; Wendt, M.; Wisotzki, L.

2018-02-01

This is the first of a series of papers presenting the results from our survey of 25 Galactic globular clusters with the MUSE integral-field spectrograph. In combination with our dedicated algorithm for source deblending, MUSE provides unique multiplex capabilities in crowded stellar fields and allows us to acquire samples of up to 20 000 stars within the half-light radius of each cluster. The present paper focuses on the analysis of the internal dynamics of 22 out of the 25 clusters, using about 500 000 spectra of 200 000 individual stars. Thanks to the large stellar samples per cluster, we are able to perform a detailed analysis of the central rotation and dispersion fields using both radial profiles and two-dimensional maps. The velocity dispersion profiles we derive show a good general agreement with existing radial velocity studies but typically reach closer to the cluster centres. By comparison with proper motion data, we derive or update the dynamical distance estimates to 14 clusters. Compared to previous dynamical distance estimates for 47 Tuc, our value is in much better agreement with other methods. We further find significant (>3σ) rotation in the majority (13/22) of our clusters. Our analysis seems to confirm earlier findings of a link between rotation and the ellipticities of globular clusters. In addition, we find a correlation between the strengths of internal rotation and the relaxation times of the clusters, suggesting that the central rotation fields are relics of the cluster formation that are gradually dissipated via two-body relaxation.
The smart cluster method. Adaptive earthquake cluster identification and analysis in strong seismic regions

NASA Astrophysics Data System (ADS)

Schaefer, Andreas M.; Daniell, James E.; Wenzel, Friedemann

2017-07-01

Earthquake clustering is an essential part of almost any statistical analysis of spatial and temporal properties of seismic activity. The nature of earthquake clusters and subsequent declustering of earthquake catalogues plays a crucial role in determining the magnitude-dependent earthquake return period and its respective spatial variation for probabilistic seismic hazard assessment. This study introduces the Smart Cluster Method (SCM), a new methodology to identify earthquake clusters, which uses an adaptive point process for spatio-temporal cluster identification. It utilises the magnitude-dependent spatio-temporal earthquake density to adjust the search properties, subsequently analyses the identified clusters to determine directional variation and adjusts its search space with respect to directional properties. In the case of rapid subsequent ruptures like the 1992 Landers sequence or the 2010-2011 Darfield-Christchurch sequence, a reclassification procedure is applied to disassemble subsequent ruptures using near-field searches, nearest neighbour classification and temporal splitting. The method is capable of identifying and classifying earthquake clusters in space and time. It has been tested and validated using earthquake data from California and New Zealand. A total of more than 1500 clusters have been found in both regions since 1980 with M_min = 2.0. Utilising the knowledge of cluster classification, the method has been adjusted to provide an earthquake declustering algorithm, which has been compared to existing methods. Its performance is comparable to established methodologies. The analysis of earthquake clustering statistics led to various new and updated correlation functions, e.g. for ratios between mainshock and strongest aftershock and general aftershock activity metrics.
Who are the healthy active seniors? A cluster analysis.

PubMed

Lai, Claudia K Y; Chan, Engle Angela; Chin, Kenny C W

2014-12-01

This paper reports a cluster analysis of a sample recruited from a randomized controlled trial that explored the effect of using a life story work approach to improve the psychological outcomes of older people in the community. 238 subjects from community centers were included in this analysis. After statistical testing, 169 seniors were assigned to the active ageing (AG) cluster and 69 to the inactive ageing (IG) cluster. Those in the AG were younger and healthier, with fewer chronic diseases and fewer depressive symptoms than those in the IG. They were more satisfied with their lives, and had higher self-esteem. They met with their family members more frequently, they engaged in more leisure activities and were more likely to have the ability to move freely. In summary, active ageing was observed in people with better health and functional performance. Our results echoed the limited findings reported in the literature.
Down-Regulation of Olfactory Receptors in Response to Traumatic Brain Injury Promotes Risk for Alzheimers Disease

DTIC Science & Technology

2015-12-01

group assignment of samples in unsupervised hierarchical clustering by the Unweighted Pair-Group Method using Arithmetic averages (UPGMA) based on...log2 transformed MAS5.0 signal values; probe set clustering was performed by the UPGMA method using Cosine correlation as the similarity metric. For...differentially-regulated genes identified were subjected to unsupervised hierarchical clustering analysis using the UPGMA algorithm with cosine correlation as
Cluster analysis of phytoplankton data collected from the National Stream Quality Accounting Network in the Tennessee River basin, 1974-81

USGS Publications Warehouse

Stephens, D.W.; Wangsgard, J.B.

1988-01-01

A computer program, Numerical Taxonomy System of Multivariate Statistical Programs (NTSYS), was used with interfacing software to perform cluster analyses of phytoplankton data stored in the biological files of the U.S. Geological Survey. The NTSYS software performs various types of statistical analyses and is capable of handling a large matrix of data. Cluster analyses were done on phytoplankton data collected from 1974 to 1981 at four national Stream Quality Accounting Network stations in the Tennessee River basin. Analysis of the changes in clusters of phytoplankton genera indicated possible changes in the water quality of the French Broad River near Knoxville, Tennessee. At this station, the most common diatom groups indicated a shift in dominant forms with some of the less common diatoms being replaced by green and blue-green algae. There was a reduction in genera variability between 1974-77 and 1979-81 sampling periods. Statistical analysis of chloride and dissolved solids confirmed that concentrations of these substances were smaller in 1974-77 than in 1979-81. At Pickwick Landing Dam, the furthest downstream station used in the study, there was an increase in the number of genera of 'rare' organisms with time. The appearance of two groups of green and blue-green algae indicated that an increase in temperature or nutrient concentrations occurred from 1974 to 1981, but this could not be confirmed using available water quality data. Associations of genera forming the phytoplankton communities at three stations on the Tennessee River were found to be seasonal. Nodal analysis of combined data from all four stations used in the study did not identify any seasonal or temporal patterns during 1974-81. Cluster analysis using the NTSYS programs was effective in reducing the large phytoplankton data set to a manageable size and provided considerable insight into the structure of phytoplankton communities in the Tennessee River basin.
Problems encountered using cluster analysis were the subjectivity introduced in the definition of meaningful clusters, and the lack of taxonomic identification to the species level. (Author's abstract)
Fingerprint analysis of Hibiscus mutabilis L. leaves based on ultra performance liquid chromatography with photodiode array detector combined with similarity analysis and hierarchical clustering analysis methods

PubMed Central

Liang, Xianrui; Ma, Meiling; Su, Weike

2013-01-01

Background: A method for chemical fingerprint analysis of Hibiscus mutabilis L. leaves was developed based on ultra performance liquid chromatography with photodiode array detector (UPLC-PAD) combined with similarity analysis (SA) and hierarchical clustering analysis (HCA). Materials and Methods: 10 batches of Hibiscus mutabilis L. leaf samples were collected from different regions of China. UPLC-PAD was employed to collect chemical fingerprints of Hibiscus mutabilis L. leaves. Results: The relative standard deviations (RSDs) of the relative retention times (RRT) and relative peak areas (RPA) of 10 characteristic peaks (one of them was identified as rutin) in precision, repeatability and stability tests were less than 3%, and the method of fingerprint analysis was validated to be suitable for the Hibiscus mutabilis L. leaves. Conclusions: Similarity analysis, based on the correlation coefficient between each pair of fingerprints, showed an abundant qualitative diversity of chemical constituents in the 10 batches of Hibiscus mutabilis L. leaf samples from different locations. Moreover, the HCA method clustered the samples into four classes, and the HCA dendrogram showed the close or distant relations among the 10 samples, which was consistent with the SA result to some extent. PMID:23930008
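The pairing of similarity analysis and hierarchical clustering described in this record can be illustrated with a minimal sketch. It is not the authors' procedure: the peak-area values in `batches` are made up, and both the distance (1 minus the Pearson correlation) and the average-linkage rule are assumptions, since the abstract does not state which were used.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_linkage(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Distance between profiles is 1 - r (an assumption); distance between
    clusters is the mean of all pairwise profile distances.
    """
    dist = [[1.0 - pearson(p, q) for q in profiles] for p in profiles]
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical relative peak areas for four batches (five peaks each).
batches = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.1, 2.1, 2.9, 4.2, 5.1],
    [5.0, 4.0, 3.0, 2.0, 1.0],
    [5.2, 3.9, 3.1, 1.8, 1.1],
]
groups = average_linkage(batches, n_clusters=2)
```

With these made-up values, batches 0 and 1 fall into one group and batches 2 and 3 into the other, because their fingerprints are almost perfectly correlated within each pair and anti-correlated across pairs.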
Task Analysis for Health Occupations. Cluster: Nursing. Occupation: Geriatric Aide. Education for Employment Task Lists.

ERIC Educational Resources Information Center

Lake County Area Vocational Center, Grayslake, IL.

This task analysis for nursing education provides performance standards, steps to be followed, knowledge required, attitudes to be developed, safety procedures, and equipment and supplies needed for 13 tasks performed by geriatric aides in the duty area of performing diagnostic measures and for 30 tasks in the duty area of providing therapeutic…
A dynamical study of Galactic globular clusters under different relaxation conditions

NASA Astrophysics Data System (ADS)

Zocchi, A.; Bertin, G.; Varri, A. L.

2012-03-01

Aims: We perform a systematic combined photometric and kinematic analysis of a sample of globular clusters under different relaxation conditions, based on their core relaxation time (as listed in available catalogs), by means of two well-known families of spherical stellar dynamical models. Systems characterized by shorter relaxation time scales are expected to be better described by isotropic King models, while less relaxed systems might be interpreted by means of non-truncated, radially-biased anisotropic f(ν) models, originally designed to represent stellar systems produced by a violent relaxation formation process and applied here for the first time to the study of globular clusters. Methods: The comparison between dynamical models and observations is performed by fitting simultaneously surface brightness and velocity dispersion profiles. For each globular cluster, the best-fit model in each family is identified, along with a full error analysis on the relevant parameters. Detailed structural properties and mass-to-light ratios are also explicitly derived. Results: We find that King models usually offer a good representation of the observed photometric profiles, but often lead to less satisfactory fits to the kinematic profiles, independently of the relaxation condition of the systems. For some less relaxed clusters, f(ν) models provide a good description of both observed profiles. Some derived structural characteristics, such as the total mass or the half-mass radius, turn out to be significantly model-dependent. The analysis confirms that, to answer some important dynamical questions that bear on the formation and evolution of globular clusters, it would be highly desirable to acquire larger numbers of accurate kinematic data-points, well distributed over the cluster field. Appendices are available in electronic form at http://www.aanda.org
Accelerating DNA analysis applications on GPU clusters

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tumeo, Antonino; Villa, Oreste

DNA analysis is an emerging application of high performance bioinformatics. Modern sequencing machinery is able to provide, in a few hours, large input streams of data that need to be matched against exponentially growing databases of known fragments. The ability to recognize these patterns effectively and quickly may allow extending the scale and the reach of the investigations performed by biology scientists. Aho-Corasick is an exact, multiple pattern matching algorithm often at the base of this application. High performance systems are a promising platform to accelerate this algorithm, which is computationally intensive but also inherently parallel. Nowadays, high performance systems also include heterogeneous processing elements, such as Graphic Processing Units (GPUs), to further accelerate parallel algorithms. Unfortunately, the Aho-Corasick algorithm exhibits large performance variabilities, depending on the size of the input streams, on the number of patterns to search and on the number of matches, and poses significant challenges on current high performance software and hardware implementations. An adequate mapping of the algorithm on the target architecture, coping with the limit of the underlying hardware, is required to reach the desired high throughputs. Load balancing also plays a crucial role when considering the limited bandwidth among the nodes of these systems. In this paper we present an efficient implementation of the Aho-Corasick algorithm for high performance clusters accelerated with GPUs. We discuss how we partitioned and adapted the algorithm to fit the Tesla C1060 GPU and then present an MPI-based implementation for a heterogeneous high performance cluster. We compare this implementation to MPI and MPI with pthreads based implementations for a homogeneous cluster of x86 processors, discussing the stability vs. the performance and the scaling of the solutions, taking into consideration aspects such as the bandwidth among the different nodes.
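For readers unfamiliar with the Aho-Corasick algorithm named in this record, the following is a minimal single-threaded sketch of the textbook algorithm. It is not the GPU or MPI implementation the paper describes; the function names `build_automaton` and `search` and the sample patterns are illustrative only.

```python
from collections import deque

def build_automaton(patterns):
    """Build trie transitions, failure links, and per-state output lists."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link of each state
    out = [[]]    # patterns that end at each state
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)
    # Breadth-first pass: failure links of shallower states are final
    # before deeper states are processed.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out

def search(text, patterns):
    """Return (start_index, pattern) for every match, in one pass over text."""
    goto, fail, out = build_automaton(patterns)
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return hits

hits = search("ushers", ["he", "she", "his", "hers"])
```

The text is scanned once regardless of the number of patterns, and the amount of work per character depends on how often failure links are followed and how many matches are reported. This is consistent with the input-dependent performance variability noted in the abstract.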
Nursing home care quality: a cluster analysis.

PubMed

Grøndahl, Vigdis Abrahamsen; Fagerli, Liv Berit

2017-02-13

Purpose: The purpose of this paper is to explore potential differences in how nursing home residents rate care quality and to explore cluster characteristics. Design/methodology/approach: A cross-sectional design was used, with one questionnaire including questions from quality from patients' perspective and Big Five personality traits, together with questions related to socio-demographic aspects and health condition. Residents (n=103) from four Norwegian nursing homes participated (74.1 per cent response rate). Hierarchical cluster analysis identified clusters with respect to care quality perceptions. χ² tests and one-way between-groups ANOVA were performed to characterise the clusters (p<0.05). Findings: Two clusters were identified; Cluster 1 residents (28.2 per cent) had the best care quality perceptions and Cluster 2 (67.0 per cent) had the worst perceptions. The clusters were statistically significant and characterised by personal-related conditions: gender, psychological well-being, preferences, admission, satisfaction with staying in the nursing home, emotional stability and agreeableness, and by external objective care conditions: healthcare personnel and registered nurses. Research limitations/implications: Residents assessed as having no cognitive impairments were included, thus excluding the largest group. By choosing questionnaire design and structured interviews, the number able to participate may increase. Practical implications: Findings may provide healthcare personnel and managers with increased knowledge on which to develop strategies to improve specific care quality perceptions. Originality/value: Cluster analysis can be an effective tool for differentiating between nursing home residents' care quality perceptions.
Clinical phenotypes and survival of pre-capillary pulmonary hypertension in systemic sclerosis.

PubMed

Launay, David; Montani, David; Hassoun, Paul M; Cottin, Vincent; Le Pavec, Jérôme; Clerson, Pierre; Sitbon, Olivier; Jaïs, Xavier; Savale, Laurent; Weatherald, Jason; Sobanski, Vincent; Mathai, Stephen C; Shafiq, Majid; Cordier, Jean-François; Hachulla, Eric; Simonneau, Gérald; Humbert, Marc

2018-01-01

Pre-capillary pulmonary hypertension (PH) in systemic sclerosis (SSc) is a heterogeneous condition with an overall bad prognosis. The objective of this study was to identify and characterize homogeneous phenotypes by a cluster analysis in SSc patients with PH. Patients were identified from two prospective cohorts from the US and France. Clinical, pulmonary function, high-resolution chest tomography, hemodynamic and survival data were extracted. We performed cluster analysis using the k-means method and compared survival between clusters using Cox regression analysis. Cluster analysis of 200 patients identified four homogenous phenotypes. Cluster C1 included patients with mild to moderate risk pulmonary arterial hypertension (PAH) with limited or no interstitial lung disease (ILD) and low DLCO with a 3-year survival of 81.5% (95% CI: 71.4-88.2). C2 had pre-capillary PH due to extensive ILD and worse 3-year survival compared to C1 (adjusted hazard ratio [HR] 3.14; 95% CI 1.66-5.94; p = 0.0004). C3 had severe PAH and a trend towards worse survival (HR 2.53; 95% CI 0.99-6.49; p = 0.052). Cluster C4 and C1 were similar with no difference in survival (HR 0.65; 95% CI 0.19-2.27, p = 0.507) but with a higher DLCO in C4. PH in SSc can be characterized into distinct clusters that differ in prognosis.
Parallel File System I/O Performance Testing On LANL Clusters

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wiens, Isaac Christian; Green, Jennifer Kathleen

2016-08-18

These are slides from a presentation on parallel file system I/O performance testing on LANL clusters. I/O is a known bottleneck for HPC applications. Performance optimization of I/O is often required. This summer project entailed integrating IOR under Pavilion and automating the results analysis. The slides cover the following topics: scope of the work, tools utilized, IOR-Pavilion test workflow, build script, IOR parameters, how parameters are passed to IOR, *run_ior: functionality, Python IOR-Output Parser, Splunk data format, Splunk dashboard and features, and future work.
Variability in body size and shape of UK offshore workers: A cluster analysis approach.

PubMed

Stewart, Arthur; Ledingham, Robert; Williams, Hector

2017-01-01

Male UK offshore workers have enlarged dimensions compared with UK norms and knowledge of specific sizes and shapes typifying their physiques will assist a range of functions related to health and ergonomics. A representative sample of the UK offshore workforce (n = 588) underwent 3D photonic scanning, from which 19 extracted dimensional measures were used in k-means cluster analysis to characterise physique groups. Of the 11 resulting clusters four somatotype groups were expressed: one cluster was muscular and lean, four had greater muscularity than adiposity, three had equal adiposity and muscularity and three had greater adiposity than muscularity. Some clusters appeared constitutionally similar to others, differing only in absolute size. These cluster centroids represent an evidence-base for future designs in apparel and other applications where body size and proportions affect functional performance. They also constitute phenotypic evidence providing insight into the 'offshore culture' which may underpin the enlarged dimensions of offshore workers. Copyright © 2016 Elsevier Ltd. All rights reserved.
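The k-means step described in this record can be sketched as follows. This is only an illustration under stated assumptions: the two-column measurements in `workers` are invented (the study used 19 scan-derived dimensions and obtained 11 clusters), and the deterministic farthest-point seeding is a choice made here for reproducibility, not something the abstract specifies.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm with deterministic farthest-point seeding."""
    # Seeding (assumption): start from the first point, then repeatedly add
    # the point farthest from the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical (stature cm, waist girth cm) pairs, for illustration only.
workers = [[170, 80], [172, 82], [171, 81], [185, 105], [187, 108], [186, 104]]
labels, centroids = kmeans(workers, k=2)
```

The returned centroids play the role of the "cluster centroids" mentioned in the abstract: each is the mean measurement vector of one physique group. In practice, variables on different scales are usually standardised before clustering; that step is omitted here for brevity.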
Detecting synchronization clusters in multivariate time series via coarse-graining of Markov chains.

PubMed

Allefeld, Carsten; Bialonski, Stephan

2007-12-01

Synchronization cluster analysis is an approach to the detection of underlying structures in data sets of multivariate time series, starting from a matrix R of bivariate synchronization indices. A previous method utilized the eigenvectors of R for cluster identification, analogous to several recent attempts at group identification using eigenvectors of the correlation matrix. All of these approaches assumed a one-to-one correspondence of dominant eigenvectors and clusters, which has however been shown to be wrong in important cases. We clarify the usefulness of eigenvalue decomposition for synchronization cluster analysis by translating the problem into the language of stochastic processes, and derive an enhanced clustering method harnessing recent insights from the coarse-graining of finite-state Markov processes. We illustrate the operation of our method using a simulated system of coupled Lorenz oscillators, and we demonstrate its superior performance over the previous approach. Finally we investigate the question of robustness of the algorithm against small sample size, which is important with regard to field applications.
Comparative analysis on the selection of number of clusters in community detection

NASA Astrophysics Data System (ADS)

Kawamoto, Tatsuro; Kabashima, Yoshiyuki

2018-02-01

We conduct a comparative analysis on various estimates of the number of clusters in community detection. An exhaustive comparison requires testing of all possible combinations of frameworks, algorithms, and assessment criteria. In this paper we focus on the framework based on a stochastic block model, and investigate the performance of greedy algorithms, statistical inference, and spectral methods. For the assessment criteria, we consider modularity, map equation, Bethe free energy, prediction errors, and isolated eigenvalues. From the analysis, the tendency of overfit and underfit that the assessment criteria and algorithms have becomes apparent. In addition, we propose that the alluvial diagram is a suitable tool to visualize statistical inference results and can be useful to determine the number of clusters.
Cluster ensemble based on Random Forests for genetic data.

PubMed

Alhusain, Luluah; Hafez, Alaaeldin M

2017-01-01

Clustering plays a crucial role in several application domains, such as bioinformatics. In bioinformatics, clustering has been extensively used as an approach for detecting interesting patterns in genetic data. One application is population structure analysis, which aims to group individuals into subpopulations based on shared genetic variations, such as single nucleotide polymorphisms. Advances in DNA sequencing technology have facilitated the obtainment of genetic datasets with exceptional sizes. Genetic data usually contain hundreds of thousands of genetic markers genotyped for thousands of individuals, making an efficient means for handling such data desirable. Random Forests (RFs) has emerged as an efficient algorithm capable of handling high-dimensional data. RFs provides a proximity measure that can capture different levels of co-occurring relationships between variables. RFs has been widely considered a supervised learning method, although it can be converted into an unsupervised learning method. Therefore, RF-derived proximity measure combined with a clustering technique may be well suited for determining the underlying structure of unlabeled data. This paper proposes, RFcluE, a cluster ensemble approach for determining the underlying structure of genetic data based on RFs. The approach comprises a cluster ensemble framework to combine multiple runs of RF clustering. Experiments were conducted on high-dimensional, real genetic dataset to evaluate the proposed approach. The experiments included an examination of the impact of parameter changes, comparing RFcluE performance against other clustering methods, and an assessment of the relationship between the diversity and quality of the ensemble and its effect on RFcluE performance. This paper proposes, RFcluE, a cluster ensemble approach based on RF clustering to address the problem of population structure analysis and demonstrate the effectiveness of the approach. 
The paper also illustrates that applying a cluster ensemble approach, combining multiple RF clusterings, produces more robust and higher-quality results as a consequence of feeding the ensemble with diverse views of high-dimensional genetic data obtained through bagging and random subspace, the two key features of the RF algorithm.

Improvement of hospital performance through innovation: toward the value of hospital care.

PubMed

Dias, Casimiro; Escoval, Ana

2013-01-01

The perspective of innovation as the strategic lever of organizational performance has been widespread in the hospital sector. While public value of innovation can be significant, it is not evident that innovation always ends up in higher levels of performance. Within this context, the purpose of the article was to critically analyze the relationship between innovation and performance, taking into account the specificities of the hospital sector. This article pulls together primary data on organizational flexibility, innovation, and performance from 95 hospitals in Portugal, collected through a survey, data from interviews with hospital administration boards, and a panel of 15 experts. The diversity of data sources allowed for triangulation. The article uses mixed methods to explore the relationship between innovation and performance in the hospital sector in Portugal. The relationship between innovation and performance is analyzed through cluster analysis, supplemented with content analysis of interviews and the technical nominal group. The main findings reveal that the cluster of efficient innovators has twice the performance level of the other clusters. Organizational flexibility and external cooperation are the 2 major factors explaining these differences. The article identifies various organizational strategies to use innovation in order to enhance hospital performance. Overall, it proposes the alignment of perspectives of different stakeholders on the value proposition of hospital services, the embeddedness of information loops, and continuous adjustments toward high-value services.
Dietary patterns by cluster analysis in pregnant women: relationship with nutrient intakes and dietary patterns in 7-year-old offspring.

PubMed

Freitas-Vilela, Ana Amélia; Smith, Andrew D A C; Kac, Gilberto; Pearson, Rebecca M; Heron, Jon; Emond, Alan; Hibbeln, Joseph R; Castro, Maria Beatriz Trindade; Emmett, Pauline M

2017-04-01

Little is known about how dietary patterns of mothers and their children track over time. The objectives of this study are to obtain dietary patterns in pregnancy using cluster analysis, to examine women's mean nutrient intakes in each cluster and to compare the dietary patterns of mothers to those of their children. Pregnant women (n = 12 195) from the Avon Longitudinal Study of Parents and Children reported their frequency of consumption of 47 foods and food groups. These data were used to obtain dietary patterns during pregnancy by cluster analysis. The absolute and energy-adjusted nutrient intakes were compared between clusters. Women's dietary patterns were compared with previously derived clusters of their children at 7 years of age. Multinomial logistic regression was performed to evaluate relationships comparing maternal and offspring clusters. Three maternal clusters were identified: 'fruit and vegetables', 'meat and potatoes' and 'white bread and coffee'. After energy adjustment women in the 'fruit and vegetables' cluster had the highest mean nutrient intakes. Mothers in the 'fruit and vegetables' cluster were more likely than mothers in 'meat and potatoes' (adjusted odds ratio [OR]: 2.00; 95% Confidence Interval [CI]: 1.69-2.36) or 'white bread and coffee' (OR: 2.18; 95% CI: 1.87-2.53) clusters to have children in a 'plant-based' cluster. However the majority of children were in clusters unrelated to their mother dietary pattern. Three distinct dietary patterns were obtained in pregnancy; the 'fruit and vegetables' pattern being the most nutrient dense. Mothers' dietary patterns were associated with but did not dominate offspring dietary patterns. © 2016 The Authors. Maternal & Child Nutrition published by John Wiley & Sons Ltd.
Path Analysis Tests of Theoretical Models of Children's Memory Performance

ERIC Educational Resources Information Center

DeMarie, Darlene; Miller, Patricia H.; Ferron, John; Cunningham, Walter R.

2004-01-01

Path analysis was used to test theoretical models of relations among variables known to predict differences in children's memory--strategies, capacity, and metamemory. Children in kindergarten to fourth grade (chronological ages 5 to 11) performed different memory tasks. Several strategies (i.e., sorting, clustering, rehearsal, and self-testing)…
Joint fMRI analysis and subject clustering using sparse dictionary learning

NASA Astrophysics Data System (ADS)

Kim, Seung-Jun; Dontaraju, Krishna K.

2017-08-01

Multi-subject fMRI data analysis methods based on sparse dictionary learning are proposed. In addition to identifying the component spatial maps by exploiting the sparsity of the maps, clusters of the subjects are learned by postulating that the fMRI volumes admit a subspace clustering structure. Furthermore, in order to tune the associated hyper-parameters systematically, a cross-validation strategy is developed based on entry-wise sampling of the fMRI dataset. Efficient algorithms for solving the proposed constrained dictionary learning formulations are developed. Numerical tests performed on synthetic fMRI data show promising results and provide insights into the proposed technique.
TECHNOLOGICAL INNOVATION IN NEUROSURGERY: A QUANTITATIVE STUDY

PubMed Central

Marcus, Hani J; Hughes-Hallett, Archie; Kwasnicki, Richard M; Darzi, Ara; Yang, Guang-Zhong; Nandi, Dipankar

2015-01-01

Object: Technological innovation within healthcare may be defined as the introduction of a new technology that initiates a change in clinical practice. Neurosurgery is a particularly technologically intensive surgical discipline, and new technologies have preceded many of the major advances in operative neurosurgical technique. The aim of the present study was to quantitatively evaluate technological innovation in neurosurgery using patents and peer-reviewed publications as metrics of technology development and clinical translation respectively. Methods: A patent database was searched between 1960 and 2010 using the search terms “neurosurgeon” OR “neurosurgical” OR “neurosurgery”. The top 50 performing patent codes were then grouped into technology clusters. Patent and publication growth curves were then generated for these technology clusters. A top performing technology cluster was then selected as an exemplar for more detailed analysis of individual patents. Results: In all, 11,672 patents and 208,203 publications relating to neurosurgery were identified. The top performing technology clusters over the 50 years were: image guidance devices, clinical neurophysiology devices, neuromodulation devices, operating microscopes and endoscopes. Image guidance and neuromodulation devices demonstrated a highly correlated rapid rise in patents and publications, suggesting they are areas of technology expansion. In-depth analysis of neuromodulation patents revealed that the majority of high performing patents were related to Deep Brain Stimulation (DBS). Conclusions: Patent and publication data may be used to quantitatively evaluate technological innovation in neurosurgery. PMID:25699414
Fitness as a determinant of arterial stiffness in healthy adult men: a cross-sectional study.

PubMed

Chung, Jinwook; Kim, Milyang; Jin, Youngsoo; Kim, Yonghwan; Hong, Jeeyoung

2018-01-01

Fitness is known to influence arterial stiffness. This study aimed to assess differences in cardiorespiratory endurance, muscular strength, and flexibility according to arterial stiffness, based on sex and age. We enrolled 1590 healthy adults (men: 1242, women: 348) who were free of metabolic syndrome. We measured cardiorespiratory endurance in an exercise stress test on a treadmill, muscular strength by a grip test, and flexibility by upper body forward-bends from a standing position. The brachial-ankle pulse wave velocity test was performed to measure arterial stiffness before the fitness test. Cluster analysis was performed to divide the patients into groups with low (Cluster 1) and high (Cluster 2) arterial stiffness. According to the k-cluster analysis results, Cluster 1 included 624 men and 180 women, and Cluster 2 included 618 men and 168 women. Men in the middle-aged group with low arterial stiffness demonstrated higher cardiorespiratory endurance, muscular strength, and flexibility than those with high arterial stiffness. Similarly, among men in the old-aged group, the cardiorespiratory endurance and muscular strength, but not flexibility, differed significantly according to arterial stiffness. Women in both clusters showed similar cardiorespiratory endurance, muscular strength, and flexibility regardless of their arterial stiffness. Among healthy adults, arterial stiffness was inversely associated with fitness in men but not in women. Therefore, fitness seems to be a determinant for arterial stiffness in men. Additionally, regular exercise should be recommended for middle-aged men to prevent arterial stiffness.
A graph-Laplacian-based feature extraction algorithm for neural spike sorting.

PubMed

Ghanbari, Yasser; Spence, Larry; Papamichalis, Panos

2009-01-01

Analysis of extracellular neural spike recordings is highly dependent upon the accuracy of neural waveform classification, commonly referred to as spike sorting. Feature extraction is an important stage of this process because it can limit the quality of clustering which is performed in the feature space. This paper proposes a new feature extraction method (which we call Graph Laplacian Features, GLF) based on minimizing the graph Laplacian and maximizing the weighted variance. The algorithm is compared with Principal Components Analysis (PCA, the most commonly-used feature extraction method) using simulated neural data. The results show that the proposed algorithm produces more compact and well-separated clusters compared to PCA. As an added benefit, tentative cluster centers are output which can be used to initialize a subsequent clustering stage.
Principal component and clustering analysis on molecular dynamics data of the ribosomal L11·23S subdomain.

PubMed

Wolf, Antje; Kirschner, Karl N

2013-02-01

With improvements in computer speed and algorithm efficiency, MD simulations are sampling larger amounts of molecular and biomolecular conformations. Being able to qualitatively and quantitatively sift these conformations into meaningful groups is a difficult and important task, especially when considering the structure-activity paradigm. Here we present a study that combines two popular techniques, principal component (PC) analysis and clustering, for revealing major conformational changes that occur in molecular dynamics (MD) simulations. Specifically, we explored how clustering different PC subspaces affects the resulting clusters versus clustering the complete trajectory data. As a case example, we used the trajectory data from an explicitly solvated simulation of a bacterial L11·23S ribosomal subdomain, which is a target of thiopeptide antibiotics. Clustering was performed, using K-means and average-linkage algorithms, on data involving the first two to the first five PC subspace dimensions. For the average-linkage algorithm we found that data-point membership, cluster shape, and cluster size depended on the selected PC subspace data. In contrast, K-means provided very consistent results regardless of the selected subspace. Since we present results on a single model system, generalization concerning the clustering of different PC subspaces of other molecular systems is currently premature. However, our hope is that this study illustrates a) the complexities in selecting the appropriate clustering algorithm, b) the complexities in interpreting and validating their results, and c) that by combining PC analysis with subsequent clustering, valuable dynamic and conformational information can be obtained.
A New Variable Weighting and Selection Procedure for K-Means Cluster Analysis

ERIC Educational Resources Information Center

Steinley, Douglas; Brusco, Michael J.

2008-01-01

A variance-to-range ratio variable weighting procedure is proposed. We show how this weighting method is theoretically grounded in the inherent variability found in data exhibiting cluster structure. In addition, a variable selection procedure is proposed to operate in conjunction with the variable weighting technique. The performances of these…
Cluster-specific small airway modeling for imaging-based CFD analysis of pulmonary air flow and particle deposition in COPD smokers

NASA Astrophysics Data System (ADS)

Haghighi, Babak; Choi, Jiwoong; Choi, Sanghun; Hoffman, Eric A.; Lin, Ching-Long

2017-11-01

Accurate modeling of small airway diameters in patients with chronic obstructive pulmonary disease (COPD) is a crucial step toward patient-specific CFD simulations of regional airflow and particle transport. We proposed to use computed tomography (CT) imaging-based cluster membership to identify structural characteristics of airways in each cluster and use them to develop cluster-specific airway diameter models. We analyzed 284 COPD smokers with airflow limitation, and 69 healthy controls. We used multiscale imaging-based cluster analysis (MICA) to classify smokers into 4 clusters. With representative cluster patients and healthy controls, we performed multiple regressions to quantify variation of airway diameters by generation as well as by cluster. Clusters 2 and 4 showed a greater decrease in diameter with increasing generation than the other clusters. Cluster 4 had more rapid decreases of airway diameters in the upper lobes, and cluster 2 in the lower lobes. We then used these regression models to estimate airway diameters in CT unresolved regions to obtain pressure-volume hysteresis curves using a 1D resistance model. These 1D flow solutions can be used to provide the patient-specific boundary conditions for 3D CFD simulations in COPD patients. Support for this study was provided, in part, by NIH Grants U01-HL114494, R01-HL112986 and S10-RR022421.
Microforms in gravel bed rivers: Formation, disintegration, and effects on bedload transport

USGS Publications Warehouse

Strom, K.; Papanicolaou, A.N.; Evangelopoulos, N.; Odeh, M.

2004-01-01

This research aims to advance current knowledge on cluster formation and evolution by tackling some of the aspects associated with cluster microtopography and the effects of clusters on bedload transport. The specific objectives of the study are (1) to identify the bed shear stress range in which clusters form and disintegrate, (2) to quantitatively describe the spacing characteristics and orientation of clusters with respect to flow characteristics, (3) to quantify the effects clusters have on the mean bedload rate, and (4) to assess the effects of clusters on the pulsating nature of bedload. In order to meet the objectives of this study, two main experimental scenarios, namely, Test Series A and B (20 experiments overall) are considered in a laboratory flume under well-controlled conditions. Series A tests are performed to address objectives (1) and (2) while Series B is designed to meet objectives (3) and (4). Results show that cluster microforms develop in uniform sediment at 1.25 to 2 times the Shields parameter of an individual particle and start disintegrating at about 2.25 times the Shields parameter. It is found that during an unsteady flow event, effects of clusters on bedload transport rate can be classified in three different phases: a sink phase where clusters absorb incoming sediment, a neutral phase where clusters do not affect bedload, and a source phase where clusters release particles. Clusters also increase the magnitude of the fluctuations in bedload transport rate, showing that clusters amplify the unsteady nature of bedload transport. A fourth-order autoregressive, autoregressive integrated moving average model is employed to describe the time series of bedload and provide a predictive formula for predicting bedload at different periods. 
Finally, a change-point analysis enhanced with a binary segmentation procedure is performed to identify the abrupt changes in the bedload statistic characteristics due to the effects of clusters and detect the different phases in bedload time series using probability theory. The analysis verifies the experimental findings that three phases are detected in the bedload rate time series structure, namely, sink, neutral, and source. © ASCE / JUNE 2004.
Classifying Higher Education Institutions in Korea: A Performance-Based Approach

ERIC Educational Resources Information Center

Shin, Jung Cheol

2009-01-01

The purpose of this study was to classify higher education institutions according to institutional performance rather than predetermined benchmarks. Institutional performance was defined as research performance and classified using Hierarchical Cluster Analysis, a statistical method that classifies objects according to specified classification…
Space-time analysis of pneumonia hospitalisations in the Netherlands.

PubMed

Benincà, Elisa; van Boven, Michiel; Hagenaars, Thomas; van der Hoek, Wim

2017-01-01

Community acquired pneumonia is a major global public health problem. In the Netherlands there are 40,000-50,000 hospital admissions for pneumonia per year. In the large majority of these hospital admissions the etiologic agent is not determined and a real-time surveillance system is lacking. Localised and temporal increases in hospital admissions for pneumonia are therefore only detected retrospectively and the etiologic agents remain unknown. Here, we perform spatio-temporal analyses of pneumonia hospital admission data in the Netherlands. To this end, we scanned for spatial clusters on yearly and seasonal basis, and applied wavelet cluster analysis on the time series of five main regions. The pneumonia hospital admissions show strong clustering in space and time superimposed on a regular yearly cycle with high incidence in winter and low incidence in summer. Cluster analysis reveals a heterogeneous pattern, with most significant clusters occurring in the western, highly urbanised, and in the eastern, intensively farmed, part of the Netherlands. Quantitatively, the relative risk (RR) of the significant clusters for the age-standardised incidence varies from a minimum of 1.2 to a maximum of 2.2. We discuss possible underlying causes for the patterns observed, such as variations in air pollution.
Down-Regulation of Olfactory Receptors in Response to Traumatic Brain Injury Promotes Risk for Alzheimer’s Disease

DTIC Science & Technology

2013-10-01

correct group assignment of samples in unsupervised hierarchical clustering by the Unweighted Pair-Group Method using Arithmetic averages ( UPGMA ) based on...centering of log2 transformed MAS5.0 signal values; probe set clustering was performed by the UPGMA method using Cosine correlation as the similarity met...A) The 108 differentially-regulated genes identified were subjected to unsupervised hierarchical clustering analysis using the UPGMA algorithm with
Hemodynamic Response to Interictal Epileptiform Discharges Addressed by Personalized EEG-fNIRS Recordings

PubMed Central

Pellegrino, Giovanni; Machado, Alexis; von Ellenrieder, Nicolas; Watanabe, Satsuki; Hall, Jeffery A.; Lina, Jean-Marc; Kobayashi, Eliane; Grova, Christophe

2016-01-01

Objective: We aimed at studying the hemodynamic response (HR) to Interictal Epileptic Discharges (IEDs) using patient-specific and prolonged simultaneous ElectroEncephaloGraphy (EEG) and functional Near InfraRed Spectroscopy (fNIRS) recordings. Methods: The epileptic generator was localized using Magnetoencephalography source imaging. fNIRS montage was tailored for each patient, using an algorithm to optimize the sensitivity to the epileptic generator. Optodes were glued using collodion to achieve prolonged acquisition with high quality signal. fNIRS data analysis was handled with no a priori constraint on HR time course, averaging fNIRS signals to similar IEDs. Cluster-permutation analysis was performed on 3D reconstructed fNIRS data to identify significant spatio-temporal HR clusters. Standard (GLM with fixed HRF) and cluster-permutation EEG-fMRI analyses were performed for comparison purposes. Results: fNIRS detected HR to IEDs for 8/9 patients. It mainly consisted of oxy-hemoglobin increases (seven patients), followed by oxy-hemoglobin decreases (six patients). HR was lateralized in six patients and lasted from 8.5 to 30 s. Standard EEG-fMRI analysis detected an HR in 4/9 patients (4/9 without enough IEDs, 1/9 unreliable result). The cluster-permutation EEG-fMRI analysis restricted to the region investigated by fNIRS showed additional strong and non-canonical BOLD responses starting earlier than the IEDs and lasting up to 30 s. Conclusions: (i) EEG-fNIRS is suitable to detect the HR to IEDs and can outperform EEG-fMRI because of prolonged recordings and greater chance to detect IEDs; (ii) cluster-permutation analysis unveils additional HR features underestimated when imposing a canonical HR function; (iii) the HR is often bilateral and lasts up to 30 s. PMID:27047325
A formal concept analysis approach to consensus clustering of multi-experiment expression data

PubMed Central

2014-01-01

Background Presently, with the increasing number and complexity of available gene expression datasets, the combination of data from multiple microarray studies addressing a similar biological question is gaining importance. The analysis and integration of multiple datasets are expected to yield more reliable and robust results since they are based on a larger number of samples and the effects of the individual study-specific biases are diminished. This is supported by recent studies suggesting that important biological signals are often preserved or enhanced by multiple experiments. An approach to combining data from different experiments is the aggregation of their clusterings into a consensus or representative clustering solution which increases the confidence in the common features of all the datasets and reveals the important differences among them. Results We propose a novel generic consensus clustering technique that applies Formal Concept Analysis (FCA) approach for the consolidation and analysis of clustering solutions derived from several microarray datasets. These datasets are initially divided into groups of related experiments with respect to a predefined criterion. Subsequently, a consensus clustering algorithm is applied to each group resulting in a clustering solution per group. These solutions are pooled together and further analysed by employing FCA which allows extracting valuable insights from the data and generating a gene partition over all the experiments. In order to validate the FCA-enhanced approach two consensus clustering algorithms are adapted to incorporate the FCA analysis. Their performance is evaluated on gene expression data from multi-experiment study examining the global cell-cycle control of fission yeast. 
The FCA results derived from both methods demonstrate that, although both algorithms optimize different clustering characteristics, FCA is able to overcome and diminish these differences and preserve some relevant biological signals. Conclusions The proposed FCA-enhanced consensus clustering technique is a general approach to the combination of clustering algorithms with FCA for deriving clustering solutions from multiple gene expression matrices. The experimental results presented herein demonstrate that it is a robust data integration technique able to produce good quality clustering solution that is representative for the whole set of expression matrices. PMID:24885407
Assessment and application of clustering techniques to atmospheric particle number size distribution for the purpose of source apportionment

NASA Astrophysics Data System (ADS)

Salimi, F.; Ristovski, Z.; Mazaheri, M.; Laiman, R.; Crilley, L. R.; He, C.; Clifford, S.; Morawska, L.

2014-06-01

Long-term measurements of particle number size distribution (PNSD) produce a very large number of observations and their analysis requires an efficient approach in order to produce results in the least possible time and with maximum accuracy. Clustering techniques are a family of sophisticated methods which have been recently employed to analyse PNSD data, however, very little information is available comparing the performance of different clustering techniques on PNSD data. This study aims to apply several clustering techniques (i.e. K-means, PAM, CLARA and SOM) to PNSD data, in order to identify and apply the optimum technique to PNSD data measured at 25 sites across Brisbane, Australia. A new method, based on the Generalised Additive Model (GAM) with a basis of penalised B-splines, was proposed to parameterise the PNSD data and the temporal weight of each cluster was also estimated using the GAM. In addition, each cluster was associated with its possible source based on the results of this parameterisation, together with the characteristics of each cluster. The performances of four clustering techniques were compared using the Dunn index and silhouette width validation values and the K-means technique was found to have the highest performance, with five clusters being the optimum. Therefore, five clusters were found within the data using the K-means technique. The diurnal occurrence of each cluster was used together with other air quality parameters, temporal trends and the physical properties of each cluster, in order to attribute each cluster to its source and origin. The five clusters were attributed to three major sources and origins, including regional background particles, photochemically induced nucleated particles and vehicle generated particles. Overall, clustering was found to be an effective technique for attributing each particle size spectra to its source and the GAM was suitable to parameterise the PNSD data. 
These two techniques can help researchers immensely in analysing PNSD data for characterisation and source apportionment purposes.
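The abstract above compares clustering techniques using, among other measures, the silhouette width. As a rough illustration only, the following pure-Python sketch computes the mean silhouette width of a labelled point set. The function name, the toy points, and the use of Euclidean distance are assumptions made for this example and are not taken from the study.

```python
import math

def silhouette_width(points, labels):
    """Mean silhouette width (Euclidean distance); needs at least two clusters."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        # Mean distance from p to the members of each cluster, excluding p itself.
        mean_dist = {}
        for c in clusters:
            members = [q for j, q in enumerate(points) if labels[j] == c and j != i]
            if members:
                mean_dist[c] = sum(math.dist(p, q) for q in members) / len(members)
        own = labels[i]
        if own not in mean_dist:
            # p is alone in its cluster: its silhouette is conventionally 0.
            scores.append(0.0)
            continue
        a = mean_dist[own]                                     # cohesion
        b = min(d for c, d in mean_dist.items() if c != own)   # separation
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical toy data: two well-separated pairs of points.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(silhouette_width(points, [0, 0, 1, 1]))  # sensible partition, close to 1
print(silhouette_width(points, [0, 1, 0, 1]))  # poor partition, negative
```

A higher mean silhouette width indicates more compact, better separated clusters, which is why it can be used to compare partitions produced by different techniques or different numbers of clusters.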
Assessment and application of clustering techniques to atmospheric particle number size distribution for the purpose of source apportionment

NASA Astrophysics Data System (ADS)

Salimi, F.; Ristovski, Z.; Mazaheri, M.; Laiman, R.; Crilley, L. R.; He, C.; Clifford, S.; Morawska, L.

2014-11-01

Long-term measurements of particle number size distribution (PNSD) produce a very large number of observations and their analysis requires an efficient approach in order to produce results in the least possible time and with maximum accuracy. Clustering techniques are a family of sophisticated methods that have been recently employed to analyse PNSD data; however, very little information is available comparing the performance of different clustering techniques on PNSD data. This study aims to apply several clustering techniques (i.e. K means, PAM, CLARA and SOM) to PNSD data, in order to identify and apply the optimum technique to PNSD data measured at 25 sites across Brisbane, Australia. A new method, based on the Generalised Additive Model (GAM) with a basis of penalised B-splines, was proposed to parameterise the PNSD data and the temporal weight of each cluster was also estimated using the GAM. In addition, each cluster was associated with its possible source based on the results of this parameterisation, together with the characteristics of each cluster. The performances of four clustering techniques were compared using the Dunn index and Silhouette width validation values and the K means technique was found to have the highest performance, with five clusters being the optimum. Therefore, five clusters were found within the data using the K means technique. The diurnal occurrence of each cluster was used together with other air quality parameters, temporal trends and the physical properties of each cluster, in order to attribute each cluster to its source and origin. The five clusters were attributed to three major sources and origins, including regional background particles, photochemically induced nucleated particles and vehicle generated particles. Overall, clustering was found to be an effective technique for attributing each particle size spectrum to its source and the GAM was suitable to parameterise the PNSD data. 
These two techniques can help researchers immensely in analysing PNSD data for characterisation and source apportionment purposes.
Performance comparison analysis library communication cluster system using merge sort

NASA Astrophysics Data System (ADS)

Wulandari, D. A. R.; Ramadhan, M. E.

2018-04-01

Computing began with single processors; multiprocessor systems were later introduced to reduce computing time. This second paradigm is known as parallel computing, of which the cluster is an example. A cluster needs a communication protocol for processing, one of which is the Message Passing Interface (MPI). MPI has many implementations (libraries), two of which are OpenMPI and MPICH2. The performance of a cluster machine depends on the match between the performance characteristics of the communication library and the characteristics of the problem, so this study compares the performance of the two libraries in handling parallel computing processes. The libraries studied are MPICH2 and OpenMPI. The case study runs a sorting problem, using the merge sort method, to measure the performance of the cluster system. OpenMPI and MPICH2 were implemented on a Linux-based cluster of five virtual computers, and system performance was then analyzed under different test scenarios using three parameters: execution time, speedup, and efficiency. The results showed that, as data size increases, the average speedup and efficiency of both OpenMPI and MPICH2 tend to increase, but they decrease at large data sizes. Increasing the data size does not necessarily increase speedup and efficiency, only execution time, for example at a data size of 100000. OpenMPI has a greater execution time than MPICH2; for example, at a data size of 1000 the average execution time is 0.009721 with MPICH2 and 0.003895 with OpenMPI. OpenMPI can customize communication needs.
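The three performance parameters named in this abstract have standard definitions: speedup is the serial execution time divided by the parallel execution time, and efficiency is speedup divided by the number of processors. The sketch below pairs a plain (serial) merge sort with these two formulas. It does not use MPI, and the timing values in the usage lines are hypothetical, not results from the study.

```python
def merge_sort(values):
    """Return a sorted copy of values using top-down merge sort."""
    if len(values) <= 1:
        return list(values)
    mid = len(values) // 2
    left = merge_sort(values[:mid])
    right = merge_sort(values[mid:])
    merged, i, j = [], 0, 0
    # Merge the two sorted halves.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

def speedup(t_serial, t_parallel):
    """Speedup = serial execution time / parallel execution time."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, n_procs):
    """Efficiency = speedup / number of processors."""
    return speedup(t_serial, t_parallel) / n_procs

print(merge_sort([5, 2, 9, 1]))
# Hypothetical timings in seconds on a five-node cluster.
print(speedup(8.0, 2.0), efficiency(8.0, 2.0, 5))
```

An efficiency close to 1 means the processors are well used; values well below 1 typically indicate that communication overhead dominates, which is more likely with small data sizes.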

Optimization of self-interstitial clusters in 3C-SiC with genetic algorithm

NASA Astrophysics Data System (ADS)

Ko, Hyunseok; Kaczmarowski, Amy; Szlufarska, Izabela; Morgan, Dane

2017-08-01

Under irradiation, SiC develops damage commonly referred to as black spot defects, which are speculated to be self-interstitial atom clusters. To understand the evolution of these defect clusters and their impacts (e.g., through radiation induced swelling) on the performance of SiC in nuclear applications, it is important to identify the cluster composition, structure, and shape. In this work the genetic algorithm code StructOpt was utilized to identify ground-state cluster structures in 3C-SiC. The genetic algorithm was used to explore clusters of up to ∼30 interstitials of C-only, Si-only, and Si-C mixtures embedded in the SiC lattice. We performed the structure search using Hamiltonians from both density functional theory and empirical potentials. The thermodynamic stability of clusters was investigated in terms of their composition (with a focus on Si-only, C-only, and stoichiometric) and shape (spherical vs. planar), as a function of the cluster size (n). Our results suggest that large Si-only clusters are likely unstable, and clusters are predominantly C-only for n ≤ 10 and stoichiometric for n > 10. The results imply that there is an evolution of the shape of the most stable clusters, where small clusters are stable in more spherical geometries while larger clusters are stable in more planar configurations. We also provide an estimated energy vs. size relationship, E(n), for use in future analysis.
A spatial cluster analysis of tractor overturns in Kentucky from 1960 to 2002

USGS Publications Warehouse

Saman, D.M.; Cole, H.P.; Odoi, A.; Myers, M.L.; Carey, D.I.; Westneat, S.C.

2012-01-01

Background: Agricultural tractor overturns without rollover protective structures are the leading cause of farm fatalities in the United States. To our knowledge, no studies have incorporated the spatial scan statistic in identifying high-risk areas for tractor overturns. The aim of this study was to determine whether tractor overturns cluster in certain parts of Kentucky and identify factors associated with tractor overturns. Methods: A spatial statistical analysis using Kulldorff's spatial scan statistic was performed to identify county clusters at greatest risk for tractor overturns. A regression analysis was then performed to identify factors associated with tractor overturns. Results: The spatial analysis revealed a cluster of higher than expected tractor overturns in four counties in northern Kentucky (RR = 2.55) and 10 counties in eastern Kentucky (RR = 1.97). Higher rates of tractor overturns were associated with steeper average percent slope of pasture land by county (p = 0.0002) and a greater percent of total tractors with less than 40 horsepower by county (p<0.0001). Conclusions: This study reveals that geographic hotspots of tractor overturns exist in Kentucky and identifies factors associated with overturns. This study provides policymakers a guide to targeted county-level interventions (e.g., roll-over protective structures promotion interventions) with the intention of reducing tractor overturns in the highest risk counties in Kentucky. © 2012 Saman et al.
Developing Appropriate Methods for Cost-Effectiveness Analysis of Cluster Randomized Trials

PubMed Central

Gomes, Manuel; Ng, Edmond S.-W.; Nixon, Richard; Carpenter, James; Thompson, Simon G.

2012-01-01

Aim. Cost-effectiveness analyses (CEAs) may use data from cluster randomized trials (CRTs), where the unit of randomization is the cluster, not the individual. However, most studies use analytical methods that ignore clustering. This article compares alternative statistical methods for accommodating clustering in CEAs of CRTs. Methods. Our simulation study compared the performance of statistical methods for CEAs of CRTs with 2 treatment arms. The study considered a method that ignored clustering—seemingly unrelated regression (SUR) without a robust standard error (SE)—and 4 methods that recognized clustering—SUR and generalized estimating equations (GEEs), both with robust SE, a “2-stage” nonparametric bootstrap (TSB) with shrinkage correction, and a multilevel model (MLM). The base case assumed CRTs with moderate numbers of balanced clusters (20 per arm) and normally distributed costs. Other scenarios included CRTs with few clusters, imbalanced cluster sizes, and skewed costs. Performance was reported as bias, root mean squared error (rMSE), and confidence interval (CI) coverage for estimating incremental net benefits (INBs). We also compared the methods in a case study. Results. Each method reported low levels of bias. Without the robust SE, SUR gave poor CI coverage (base case: 0.89 v. nominal level: 0.95). The MLM and TSB performed well in each scenario (CI coverage, 0.92–0.95). With few clusters, the GEE and SUR (with robust SE) had coverage below 0.90. In the case study, the mean INBs were similar across all methods, but ignoring clustering underestimated statistical uncertainty and the value of further research. Conclusions. MLMs and the TSB are appropriate analytical methods for CEAs of CRTs with the characteristics described. SUR and GEE are not recommended for studies with few clusters. PMID:22016450
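The outcome estimated in this abstract, the incremental net benefit (INB), is conventionally defined as the willingness-to-pay threshold times the incremental effect, minus the incremental cost. The sketch below computes that point estimate from per-arm means. The threshold value and the arm-level figures are hypothetical, and the sketch deliberately ignores clustering, so it says nothing about the standard errors that the compared methods address.

```python
def incremental_net_benefit(mean_cost_trt, mean_cost_ctl,
                            mean_effect_trt, mean_effect_ctl,
                            willingness_to_pay):
    """INB = threshold * (difference in effects) - (difference in costs)."""
    delta_effect = mean_effect_trt - mean_effect_ctl
    delta_cost = mean_cost_trt - mean_cost_ctl
    return willingness_to_pay * delta_effect - delta_cost

# Hypothetical example: treatment costs 500 more and yields 0.05 more QALYs,
# valued at an assumed threshold of 20000 per QALY.
inb = incremental_net_benefit(1500.0, 1000.0, 0.75, 0.70, 20000.0)
print(round(inb, 2))
```

A positive INB indicates that, at the chosen threshold, the value of the additional effect exceeds the additional cost. Uncertainty around this estimate is where the choice of analytical method for clustered data matters.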
A Dimensionally Reduced Clustering Methodology for Heterogeneous Occupational Medicine Data Mining.

PubMed

Saâdaoui, Foued; Bertrand, Pierre R; Boudet, Gil; Rouffiac, Karine; Dutheil, Frédéric; Chamoux, Alain

2015-10-01

Clustering is a set of statistical learning techniques aimed at finding structure in heterogeneous data by partitioning them into homogeneous groups called clusters. There are several fields in which clustering has been successfully applied, such as medicine, biology, finance, and economics. In this paper, we introduce the notion of clustering in multifactorial data analysis problems. A case study is conducted for an occupational medicine problem with the purpose of analyzing patterns in a population of 813 individuals. To reduce the data set dimensionality, we base our approach on Principal Component Analysis (PCA), which is the statistical tool most commonly used in factorial analysis. However, real-world problems, especially in medicine, are often based on heterogeneous qualitative-quantitative measurements, whereas PCA only processes quantitative ones. Besides, qualitative data are originally unobservable quantitative responses that are usually binary-coded. Hence, we propose a new set of strategies that allow quantitative and qualitative data to be handled simultaneously. The principle of this approach is to perform a projection of the qualitative variables on the subspaces spanned by quantitative ones. Subsequently, an optimal model is allocated to the resulting PCA-regressed subspaces.
Transcriptional analysis of exopolysaccharides biosynthesis gene clusters in Lactobacillus plantarum.

PubMed

Vastano, Valeria; Perrone, Filomena; Marasco, Rosangela; Sacco, Margherita; Muscariello, Lidia

2016-04-01

Exopolysaccharides (EPS) from lactic acid bacteria contribute to specific rheology and texture of fermented milk products and find applications also in non-dairy foods and in therapeutics. Recently, four clusters of genes (cps) associated with surface polysaccharide production have been identified in Lactobacillus plantarum WCFS1, a probiotic and food-associated lactobacillus. These clusters are involved in cell surface architecture and probably in release and/or exposure of immunomodulating bacterial molecules. Here we show a transcriptional analysis of these clusters. Indeed, RT-PCR experiments revealed that the cps loci are organized in five operons. Moreover, by reverse transcription-qPCR analysis performed on L. plantarum WCFS1 (wild type) and WCFS1-2 (ΔccpA), we demonstrated that expression of three cps clusters is under the control of the global regulator CcpA. These results, together with the identification of putative CcpA target sequences (catabolite responsive element CRE) in the regulatory region of four out of five transcriptional units, strongly suggest for the first time a role of the master regulator CcpA in EPS gene transcription among lactobacilli.
A comparison of latent class, K-means, and K-median methods for clustering dichotomous data.

PubMed

Brusco, Michael J; Shireman, Emilie; Steinley, Douglas

2017-09-01

The problem of partitioning a collection of objects based on their measurements on a set of dichotomous variables is a well-established problem in psychological research, with applications including clinical diagnosis, educational testing, cognitive categorization, and choice analysis. Latent class analysis and K-means clustering are popular methods for partitioning objects based on dichotomous measures in the psychological literature. The K-median clustering method has recently been touted as a potentially useful tool for psychological data and might be preferable to its close neighbor, K-means, when the variable measures are dichotomous. We conducted simulation-based comparisons of the latent class, K-means, and K-median approaches for partitioning dichotomous data. Although all 3 methods proved capable of recovering cluster structure, K-median clustering yielded the best average performance, followed closely by latent class analysis. We also report results for the 3 methods within the context of an application to transitive reasoning data, in which it was found that the 3 approaches can exhibit profound differences when applied to real data. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
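One reason K-median may suit dichotomous data is that its cluster center is a component-wise median, which stays binary, whereas the K-means center is a component-wise mean, which is generally fractional. The sketch below illustrates only this difference in centers. The toy rows, the use of `median_low` to break ties toward 0, and the Manhattan distance helper are assumptions for illustration, not the authors' implementation.

```python
from statistics import mean, median_low

def kmeans_center(rows):
    """Component-wise mean: usually fractional for 0/1 data."""
    return [mean(col) for col in zip(*rows)]

def kmedian_center(rows):
    """Component-wise median: stays 0/1 for 0/1 data (ties resolved toward 0)."""
    return [median_low(col) for col in zip(*rows)]

def manhattan(a, b):
    """L1 distance, the distance usually paired with median centers."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Hypothetical cluster of three objects measured on three dichotomous items.
rows = [[1, 0, 1], [1, 1, 0], [1, 0, 0]]
print(kmeans_center(rows))   # first item is 1, the other two are 1/3
print(kmedian_center(rows))  # [1, 0, 0]
print(manhattan(kmedian_center(rows), rows[1]))  # 1
```

Because the median center is itself a valid response pattern, it can be read directly as a prototype for the cluster.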
Combinations of elevated tissue miRNA-17-92 cluster expression and serum prostate-specific antigen as potential diagnostic biomarkers for prostate cancer.

PubMed

Feng, Sujuan; Qian, Xiaosong; Li, Han; Zhang, Xiaodong

2017-12-01

The aim of the present study was to investigate the effectiveness of the miR-17-92 cluster as a disease progression marker in prostate cancer (PCa). Reverse transcription-quantitative polymerase chain reaction analysis was used to detect the microRNA (miR)-17-92 cluster expression levels in tissues from patients with PCa or benign prostatic hyperplasia (BPH), in addition to in PCa and BPH cell lines. Spearman correlation was used for comparison and estimation of correlations between miRNA expression levels and clinicopathological characteristics such as the Gleason score and prostate-specific antigen (PSA). Receiver operating characteristic (ROC) curve analysis was performed for evaluation of specificity and sensitivity of miR-17-92 cluster expression levels for discriminating patients with PCa from patients with BPH. Kaplan-Meier analysis was performed to investigate the predictive potential of the miR-17-92 cluster for PCa biochemical recurrence. Expression of the majority of miRNAs in the miR-17-92 cluster was identified to be significantly increased in PCa tissues and cell lines. Bivariate correlation analysis indicated that the high expression of upregulated miRNAs was positively correlated with Gleason grade, but had no significant association with PSA. ROC curves demonstrated that high expression of the miR-17-92 cluster predicted a higher diagnostic accuracy compared with PSA. Improved discriminating quotients were observed when combinations of upregulated miRNAs with PSA were used. Survival analysis confirmed that a high combined miRNA score of the miR-17-92 cluster was associated with a shorter biochemical recurrence interval. The miR-17-92 cluster could be a potential diagnostic and prognostic biomarker for PCa, and the combination of the miR-17-92 cluster and serum PSA may enhance the accuracy for diagnosis of PCa.
Cluster analysis of particulate matter (PM10) and black carbon (BC) concentrations

NASA Astrophysics Data System (ADS)

Žibert, Janez; Pražnikar, Jure

2012-09-01

The monitoring of air-pollution constituents like particulate matter (PM10) and black carbon (BC) can provide information about air quality and the dynamics of emissions. Air quality depends on natural and anthropogenic sources of emissions as well as the weather conditions. For a one-year period the diurnal concentrations of PM10 and BC in the Port of Koper were analysed by clustering days into similar groups according to the similarity of the BC and PM10 hourly derived day-profiles without any prior assumptions about working and non-working days, weather conditions or hot and cold seasons. The analysis was performed by using k-means clustering with the squared Euclidean distance as the similarity measure. The analysis showed that 10 clusters in the BC case produced 3 clusters with just one member day and 7 clusters that encompass more than one day with similar BC profiles. Similar results were found in the PM10 case, where one cluster has a single-member day, while 7 clusters contain several member days. The clustering analysis revealed that the clusters with less pronounced bimodal patterns and low hourly and average daily concentrations for both types of measurements include the most days in the one-year analysis. A typical day profile of the BC measurements includes a bimodal pattern with morning and evening peaks, while the PM10 measurements reveal a less pronounced bimodality. There are also clusters with single-peak day-profiles. The BC data in such cases exhibit morning peaks, while the PM10 data consist of noon or afternoon single peaks. Single pronounced peaks can be explained by appropriate cluster wind speed profiles. The analysis also revealed some special day-profiles. The BC cluster with a high midnight peak on 30/04/2010 and the PM10 cluster with the highest observed concentration of PM10 on 01/05/2010 (208.0 μg m-3) coincide with 1 May, which is a national holiday in Slovenia and has a very strong tradition of bonfire parties. 
The clustering of the diurnal concentrations showed that various different day-profiles are present in the cold period, while this is not the case for the hot season. Additional analysis of ship traffic and rainfall data showed that there is no statistically significant difference between the ship gross (bruto) registered tonnage (BRT) values in the case of BC and PM10 clusters, but that there are statistically significant differences between the rainfall in the BC and PM10 clusters. The wind-rose for the clusters which included most days in the sampling period indicated that PM10 and BC emitted from the Port of Koper were mainly transported in the west direction over the sea and in the east direction, where there is no populated area. The presented analysis showed that both BC and PM10 concentrations were driven by rain intensity and wind speed.
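The analysis described above groups daily concentration profiles with k-means using the squared Euclidean distance. A minimal sketch of that idea (Lloyd's algorithm) follows. The four-value toy profiles, the choice of initial centers, and the function names are hypothetical; the study itself used hourly day-profiles and 10 clusters.

```python
def sq_euclid(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(profiles, initial_centers, n_iter=100):
    """Lloyd's algorithm: assign each profile to its nearest center, then update centers."""
    centers = [list(c) for c in initial_centers]
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(len(centers)), key=lambda k: sq_euclid(p, centers[k]))
            for p in profiles
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        for k in range(len(centers)):
            members = [p for p, lab in zip(profiles, labels) if lab == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Hypothetical day-profiles: two days with alternating peaks and two flat days.
days = [[1, 5, 1, 5], [1, 6, 1, 6], [2, 2, 2, 2], [2, 3, 2, 2]]
labels, centers = kmeans(days, initial_centers=[days[0], days[2]])
print(labels)   # [0, 0, 1, 1]
print(centers)  # [[1.0, 5.5, 1.0, 5.5], [2.0, 2.5, 2.0, 2.0]]
```

Each resulting center is itself an average day-profile, which is what allows clusters to be described as bimodal, single-peak, or flat.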
Non-specific filtering of beta-distributed data.

PubMed

Wang, Xinhui; Laird, Peter W; Hinoue, Toshinori; Groshen, Susan; Siegmund, Kimberly D

2014-06-19

Non-specific feature selection is a dimension reduction procedure performed prior to cluster analysis of high dimensional molecular data. Not all measured features are expected to show biological variation, so only the most varying are selected for analysis. In DNA methylation studies, DNA methylation is measured as a proportion, bounded between 0 and 1, with variance a function of the mean. Filtering on standard deviation biases the selection of probes to those with mean values near 0.5. We explore the effect this has on clustering, and develop alternate filter methods that utilize a variance stabilizing transformation for Beta distributed data and do not share this bias. We compared results for 11 different non-specific filters on eight Infinium HumanMethylation data sets, selected to span a variety of biological conditions. We found that for data sets having a small fraction of samples showing abnormal methylation of a subset of normally unmethylated CpGs, a characteristic of the CpG island methylator phenotype in cancer, a novel filter statistic that utilized a variance-stabilizing transformation for Beta distributed data outperformed the common filter of using standard deviation of the DNA methylation proportion, or its log-transformed M-value, in its ability to detect the cancer subtype in a cluster analysis. However, the standard deviation filter always performed among the best for distinguishing subgroups of normal tissue. The novel filter and standard deviation filter tended to favour features in different genome contexts; for the same data set, the novel filter always selected more features from CpG island promoters and the standard deviation filter always selected more features from non-CpG island intergenic regions. Interestingly, despite selecting largely non-overlapping sets of features, the two filters did find sample subsets that overlapped for some real data sets. 
We found two different filter statistics that tended to prioritize features with different characteristics, each performed well for identifying clusters of cancer and non-cancer tissue, and identifying a cancer CpG island hypermethylation phenotype. Since cluster analysis is for discovery, we would suggest trying both filters on any new data sets, evaluating the overlap of features selected and clusters discovered.
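The bias described in this abstract can be illustrated with a minimal sketch. It assumes the arcsine square-root transform as the variance-stabilizing step, which is a common choice for proportions; the abstract does not state the paper's exact filter statistic, so `vst_filter` is a stand-in and both probes are hypothetical.

```python
import math
import statistics

def sd_filter(betas):
    """Standard deviation of the raw methylation proportions of one probe."""
    return statistics.stdev(betas)

def vst_filter(betas):
    """SD after an arcsine square-root transform (assumed VST, not the paper's statistic)."""
    return statistics.stdev([math.asin(math.sqrt(b)) for b in betas])

# Hypothetical probes (six samples each).
mid_probe = [0.45, 0.50, 0.55, 0.50, 0.48, 0.52]   # mean near 0.5, no outlier
low_probe = [0.01, 0.02, 0.01, 0.02, 0.01, 0.08]   # normally unmethylated, one abnormal sample

for name, f in (("sd", sd_filter), ("vst", vst_filter)):
    print(name, round(f(mid_probe), 4), round(f(low_probe), 4))
```

On these made-up values the raw SD ranks the mid-range probe higher, while the transformed statistic ranks the near-zero probe with one abnormal sample higher, which is the kind of feature the abstract associates with the CpG island methylator phenotype.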
The Technical and Biological Reproducibility of Matrix-Assisted Laser Desorption Ionization-Time of Flight Mass Spectrometry (MALDI-TOF MS) Based Typing: Employment of Bioinformatics in a Multicenter Study.

PubMed

Oberle, Michael; Wohlwend, Nadia; Jonas, Daniel; Maurer, Florian P; Jost, Geraldine; Tschudin-Sutter, Sarah; Vranckx, Katleen; Egli, Adrian

2016-01-01

The technical, biological, and inter-center reproducibility of matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF MS) typing data has not yet been explored. The aim of this study is to compare typing data from multiple centers employing bioinformatics using bacterial strains from two past outbreaks and non-related strains. Participants received twelve extended spectrum betalactamase-producing E. coli isolates and followed the same standard operating procedure (SOP) including a full-protein extraction protocol. All laboratories provided visually read spectra via flexAnalysis (Bruker, Germany). Raw data from each laboratory allowed calculating the technical and biological reproducibility between centers using BioNumerics (Applied Maths NV, Belgium). Technical and biological reproducibility ranged between 96.8-99.4% and 47.6-94.4%, respectively. The inter-center reproducibility showed a comparable clustering among identical isolates. Principal component analysis indicated a higher tendency to cluster within the same center. Therefore, we used a discriminant analysis, which completely separated the clusters. Next, we defined a reference center and performed a statistical analysis to identify specific peaks to identify the outbreak clusters. Finally, we used a classifier algorithm and a linear support vector machine on the determined peaks as classifier. A validation showed that within the set of the reference center, the identification of the cluster was 100% correct with a large contrast between the score with the correct cluster and the next best scoring cluster. Based on the sufficient technical and biological reproducibility of MALDI-TOF MS based spectra, detection of specific clusters is possible from spectra obtained from different centers. However, we believe that a shared SOP and a bioinformatics approach are required to make the analysis robust and reliable.
Factors associated with the growing-finishing performances of swine herds: an exploratory study on serological and herd level indicators.

PubMed

Fablet, C; Rose, N; Grasland, B; Robert, N; Lewandowski, E; Gosselin, M

2018-01-01

Growing and finishing performances of pigs strongly influence farm efficiency and profitability. The performances of the pigs rely on the herd health status and also on several non-infectious factors. Many recommendations for the improvement of the technical performances of a herd are based on the results of studies assessing the effect of one or a limited number of infections or environmental factors. Few studies investigated jointly the influence of both types of factors on swine herd performances. This work aimed at identifying infectious and non-infectious factors associated with the growing and finishing performances of 41 French swine herds. Two groups of herds were identified using a clustering analysis: a cluster of 24 herds with the highest technical performance values (mean average daily gain = 781.1 g/day +/- 26.3; mean feed conversion ratio = 2.5 kg/kg +/- 0.1; mean mortality rate = 4.1% +/- 0.9; and mean carcass slaughter weight = 121.2 kg +/- 5.2) and a cluster of 17 herds with the lowest performance values (mean average daily gain = 715.8 g/day +/- 26.5; mean feed conversion ratio = 2.6 kg/kg +/- 0.1; mean mortality rate = 6.8% +/- 2.0; and mean carcass slaughter weight = 117.7 kg +/- 3.6). Multiple correspondence analysis was used to identify factors associated with the level of technical performance. Infection with the porcine reproductive and respiratory syndrome virus and the porcine circovirus type 2 were infectious factors associated with the cluster having the lowest performance values. This cluster also featured farrow-to-finish type herds, a short interval between successive batches of pigs (≤3 weeks) and mixing of pigs from different batches in the growing and/or finishing steps. Inconsistency between nursery and fattening building management was another factor associated with the low-performance cluster.
The odds of a herd showing low growing-finishing performance were significantly increased when the herd was infected by PRRS virus in the growing-finishing steps (OR = 8.8, 95% confidence interval [95% CI]: 1.8-41.7) and when it was of the farrow-to-finish type (OR = 5.1, 95% CI = 1.1-23.8). Herd management and viral infections significantly influenced the performance levels of the swine herds included in this study.
Gene features selection for three-class disease classification via multiple orthogonal partial least square discriminant analysis and S-plot using microarray data.

PubMed

Yang, Mingxing; Li, Xiumin; Li, Zhibin; Ou, Zhimin; Liu, Ming; Liu, Suhuan; Li, Xuejun; Yang, Shuyu

2013-01-01

DNA microarray analysis is characterized by obtaining a large number of gene variables from a small number of observations. Cluster analysis is widely used to analyze DNA microarray data to make classification and diagnosis of disease. Because there are so many irrelevant and insignificant genes in a dataset, a feature selection approach must be employed in data analysis. The performance of cluster analysis of this high-throughput data depends on whether the feature selection approach chooses the most relevant genes associated with disease classes. Here we proposed a new method using multiple Orthogonal Partial Least Squares-Discriminant Analysis (mOPLS-DA) models and S-plots to select the most relevant genes to conduct three-class disease classification and prediction. We tested our method using Golub's leukemia microarray data. For three classes with subtypes, we proposed hierarchical orthogonal partial least squares-discriminant analysis (OPLS-DA) models and S-plots to select features for two main classes and their subtypes. For three classes in parallel, we employed three OPLS-DA models and S-plots to choose marker genes for each class. The power of feature selection to classify and predict three-class disease was evaluated using cluster analysis. Further, the general performance of our method was tested using four public datasets and compared with those of four other feature selection methods. The results revealed that our method effectively selected the most relevant features for disease classification and prediction, and its performance was better than that of the other methods.
Computational cluster validation for microarray data analysis: experimental assessment of Clest, Consensus Clustering, Figure of Merit, Gap Statistics and Model Explorer.

PubMed

Giancarlo, Raffaele; Scaturro, Davide; Utro, Filippo

2008-10-29

Inferring cluster structure in microarray datasets is a fundamental task for the so-called -omic sciences. It is also a fundamental question in Statistics, Data Analysis and Classification, in particular with regard to the prediction of the number of clusters in a dataset, usually established via internal validation measures. Despite the wealth of internal measures available in the literature, new ones have been recently proposed, some of them specifically for microarray data. We consider five such measures: Clest, Consensus (Consensus Clustering), FOM (Figure of Merit), Gap (Gap Statistics) and ME (Model Explorer), in addition to the classic WCSS (Within Cluster Sum-of-Squares) and KL (Krzanowski and Lai index). We perform extensive experiments on six benchmark microarray datasets, using both Hierarchical and K-means clustering algorithms, and we provide an analysis assessing both the intrinsic ability of a measure to predict the correct number of clusters in a dataset and its merit relative to the other measures. We pay particular attention both to precision and speed. Moreover, we also provide various fast approximation algorithms for the computation of Gap, FOM and WCSS. The main result is a hierarchy of those measures in terms of precision and speed, highlighting some of their merits and limitations not reported before in the literature. Based on our analysis, we draw several conclusions for the use of those internal measures on microarray data. We report the main ones. Consensus is by far the best performer in terms of predictive power and remarkably algorithm-independent. Unfortunately, on large datasets, it may be of no use because of its non-trivial computer time demand (weeks on a state of the art PC). FOM is the second best performer although, quite surprisingly, it may not be competitive in this scenario: it has essentially the same predictive power of WCSS but it is from 6 to 100 times slower in time, depending on the dataset. 
The approximation algorithms for the computation of FOM, Gap and WCSS perform very well, i.e., they are faster while still granting a very close approximation of FOM and WCSS. The approximation algorithm for the computation of Gap deserves to be singled out since it has a predictive power far better than Gap, it is competitive with the other measures, but it is at least two orders of magnitude faster in time with respect to Gap. Another important novel conclusion that can be drawn from our analysis is that all the measures we have considered show severe limitations on large datasets, either due to computational demand (Consensus, as already mentioned, Clest and Gap) or to lack of precision (all of the other measures, including their approximations). The software and datasets are available under the GNU GPL on the supplementary material web page.
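WCSS, the classic baseline named in this abstract, is simple to compute for a given partition. The following is a minimal sketch on hypothetical 2-D points; it is not the authors' software, and it evaluates fixed label vectors rather than running K-means or hierarchical clustering.

```python
def wcss(points, labels):
    """Within-cluster sum of squares: squared distance of each point to its cluster centroid."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for members in clusters.values():
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += sum(
            sum((x - c) ** 2 for x, c in zip(m, centroid)) for m in members
        )
    return total

# Hypothetical data: two tight pairs far apart.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
one_cluster = wcss(points, [0, 0, 0, 0])   # everything in one cluster
two_clusters = wcss(points, [0, 0, 1, 1])  # the two natural groups
print(one_cluster, two_clusters)
```

Because WCSS cannot increase when the best partition is allowed more clusters, it is usually read by looking for the number of clusters after which the drop levels off, rather than by taking its minimum.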
Optimized data fusion for K-means Laplacian clustering

PubMed Central

Yu, Shi; Liu, Xinhai; Tranchevent, Léon-Charles; Glänzel, Wolfgang; Suykens, Johan A. K.; De Moor, Bart; Moreau, Yves

2011-01-01

Motivation: We propose a novel algorithm to combine multiple kernels and Laplacians for clustering analysis. The new algorithm is formulated on a Rayleigh quotient objective function and is solved as a bi-level alternating minimization procedure. Using the proposed algorithm, the coefficients of kernels and Laplacians can be optimized automatically. Results: Three variants of the algorithm are proposed. The performance is systematically validated on two real-life data fusion applications. The proposed Optimized Kernel Laplacian Clustering (OKLC) algorithms perform significantly better than other methods. Moreover, the coefficients of kernels and Laplacians optimized by OKLC show some correlation with the rank of performance of individual data source. Though in our evaluation the K values are predefined, in practical studies, the optimal cluster number can be consistently estimated from the eigenspectrum of the combined kernel Laplacian matrix. Availability: The MATLAB code of algorithms implemented in this paper is downloadable from http://homes.esat.kuleuven.be/~sistawww/bioi/syu/oklc.html. Contact: shiyu@uchicago.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20980271
The adiposity of children is associated with their lifestyle behaviours: a cluster analysis of school-aged children from 12 nations.

PubMed

Dumuid, Dorothea; Olds, T; Lewis, L K; Martin-Fernández, J A; Barreira, T; Broyles, S; Chaput, J-P; Fogelholm, M; Hu, G; Kuriyan, R; Kurpad, A; Lambert, E V; Maia, J; Matsudo, V; Onywera, V O; Sarmiento, O L; Standage, M; Tremblay, M S; Tudor-Locke, C; Zhao, P; Katzmarzyk, P; Gillison, F; Maher, C

2018-02-01

The relationship between children's adiposity and lifestyle behaviour patterns is an area of growing interest. The objectives of this study are to identify clusters of children based on lifestyle behaviours and compare children's adiposity among clusters. Cross-sectional data from the International Study of Childhood Obesity, Lifestyle and the Environment were used. The participants were children (9-11 years) from 12 nations (n = 5710). 24-h accelerometry and self-reported diet and screen time were clustering input variables. Objectively measured adiposity indicators were waist-to-height ratio, percent body fat and body mass index z-scores. Sex-stratified analyses were performed on the global sample and repeated on a site-wise basis. Cluster analysis (using isometric log ratios for compositional data) was used to identify common lifestyle behaviour patterns. Site representation and adiposity were compared across clusters using linear models. Four clusters emerged: (1) Junk Food Screenies, (2) Actives, (3) Sitters and (4) All-Rounders. Countries were represented differently among clusters. Chinese children were over-represented in Sitters and Colombian children in Actives. Adiposity varied across clusters, being highest in Sitters and lowest in Actives. Children from different sites clustered into groups of similar lifestyle behaviours. Cluster membership was linked with differing adiposity. Findings support the implementation of activity interventions in all countries, targeting both physical activity and sedentary time. © 2016 World Obesity Federation.
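The isometric log-ratio step mentioned in this abstract maps a composition with D parts (such as the minutes of a 24-h day) to D-1 unconstrained coordinates that a standard clustering algorithm can use. The sketch below uses the common pivot-coordinate form of the transform; the four-part day and its values are hypothetical, and the study's actual basis and variables may differ.

```python
import math

def ilr(composition):
    """Pivot-coordinate isometric log-ratio transform of strictly positive parts (D parts -> D-1 values)."""
    total = sum(composition)
    x = [part / total for part in composition]  # closure: rescale to unit sum
    coords = []
    for i in range(len(x) - 1):
        rest = x[i + 1:]
        # Geometric mean of the parts that follow part i.
        geo_mean_rest = math.exp(sum(math.log(v) for v in rest) / len(rest))
        scale = math.sqrt(len(rest) / (len(rest) + 1))
        coords.append(scale * math.log(x[i] / geo_mean_rest))
    return coords

# Hypothetical day in minutes: sleep, sedentary, light activity, moderate-to-vigorous activity.
day = [540, 480, 360, 60]
coords = ilr(day)
print([round(c, 3) for c in coords])
```

The transform depends only on the ratios between parts, so expressing the same day in hours or as fractions gives identical coordinates.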
Subgroups of advanced cancer patients clustered by their symptom profiles: quality-of-life outcomes.

PubMed

Husain, Amna; Myers, Jeff; Selby, Debbie; Thomson, Barbara; Chow, Edward

2011-11-01

Symptom cluster analysis is a new frontier of research in symptom management. This study clustered patients by their symptom profiles to identify subgroups that may be at higher risk for poor quality of life (QOL) and that may, therefore, benefit most from targeted interventions. Longitudinal study of metastatic cancer patients using the Edmonton Symptom Assessment Scale (ESAS). We generated two-, three-, and four-cluster subgroups and examined the relationship of cluster membership with patient outcomes. To address the problem of missing longitudinal data, we developed a novel outcome variable (QualTime) that measures both QOL and time in study. Two hundred and twenty-one patients with a mean Palliative Performance Scale (PPS) of 59.1 were enrolled. The three-cluster model was chosen for further analysis. The low-burden subgroup had all low severity symptom scores. The intermediate subgroup separates from the low-burden group on the "debility" profile of fatigue, drowsiness, appetite, and well-being. The high-burden group separates from the intermediate-burden group on pain, depression, and anxiety. At baseline, PPS (p=0.0003) and cluster membership (p<0.0001) contributed significantly to global QOL. In univariate analysis, cluster membership was related to the longitudinal outcome, QualTime. In a multivariate model, the relationship of PPS to QualTime was still significant (p=0.0002), but subgroup membership was no longer significant (p=0.1009). PPS is a stronger predictor of the longitudinal variable than cluster subgroups; however, cluster subgroups provide a target for clinical interventions that may improve QOL.
A framework for graph-based synthesis, analysis, and visualization of HPC cluster job data.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mayo, Jackson R.; Kegelmeyer, W. Philip, Jr.; Wong, Matthew H.

The monitoring and system analysis of high performance computing (HPC) clusters is of increasing importance to the HPC community. Analysis of HPC job data can be used to characterize system usage and diagnose and examine failure modes and their effects. This analysis is not straightforward, however, due to the complex relationships that exist between jobs. These relationships are based on a number of factors, including shared compute nodes between jobs, proximity of jobs in time, etc. Graph-based techniques represent an approach that is particularly well suited to this problem, and provide an effective technique for discovering important relationships in job queuing and execution data. The efficacy of these techniques is rooted in the use of a semantic graph as a knowledge representation tool. In a semantic graph job data, represented in a combination of numerical and textual forms, can be flexibly processed into edges, with corresponding weights, expressing relationships between jobs, nodes, users, and other relevant entities. This graph-based representation permits formal manipulation by a number of analysis algorithms. This report presents a methodology and software implementation that leverages semantic graph-based techniques for the system-level monitoring and analysis of HPC clusters based on job queuing and execution data. Ontology development and graph synthesis is discussed with respect to the domain of HPC job data. The framework developed automates the synthesis of graphs from a database of job information. It also provides a front end, enabling visualization of the synthesized graphs. Additionally, an analysis engine is incorporated that provides performance analysis, graph-based clustering, and failure prediction capabilities for HPC systems.
Clustering analysis of proteins from microbial genomes at multiple levels of resolution.

PubMed

Zaslavsky, Leonid; Ciufo, Stacy; Fedorov, Boris; Tatusova, Tatiana

2016-08-31

Microbial genomes at the National Center for Biotechnology Information (NCBI) represent a large collection of more than 35,000 assemblies. There are several complexities associated with the data: a great variation in sampling density since human pathogens are densely sampled while other bacteria are less represented; different protein families occur in annotations with different frequencies; and the quality of genome annotation varies greatly. In order to extract useful information from these sophisticated data, the analysis needs to be performed at multiple levels of phylogenomic resolution and protein similarity, with an adequate sampling strategy. Protein clustering is used to construct meaningful and stable groups of similar proteins to be used for analysis and functional annotation. Our approach is to create protein clusters at three levels. First, tight clusters in groups of closely-related genomes (species-level clades) are constructed using a combined approach that takes into account both sequence similarity and genome context. Second, clustroids of conservative in-clade clusters are organized into seed global clusters. Finally, global protein clusters are built around the seed clusters. We propose filtering strategies that allow limiting the protein set included in global clustering. The in-clade clustering procedure, subsequent selection of clustroids and organization into seed global clusters provides a robust representation and high rate of compression. Seed protein clusters are further extended by adding related proteins. Extended seed clusters include a significant part of the data and represent all major known cell machinery. The remaining part, coming from either non-conservative (unique) or rapidly evolving proteins, from rare genomes, or resulting from low-quality annotation, does not group together well. Processing these proteins requires significant computational resources and results in a large number of questionable clusters.
The developed filtering strategies make it possible to identify and exclude such peripheral proteins, limiting the protein dataset in global clustering. Overall, the proposed methodology allows the relevant data to be obtained at different levels of detail and data redundancy to be eliminated while keeping biologically interesting variations.
The application of cluster analysis in the intercomparison of loop structures in RNA.

PubMed

Huang, Hung-Chung; Nagaswamy, Uma; Fox, George E

2005-04-01

We have developed a computational approach for the comparison and classification of RNA loop structures. Hairpin or interior loops identified in atomic resolution RNA structures were intercompared by conformational matching. The root-mean-square deviation (RMSD) values between all pairs of RNA fragments of interest, even if from different molecules, are calculated. Subsequently, cluster analysis is performed on the resulting matrix of RMSD distances using the unweighted pair group method with arithmetic mean (UPGMA). The cluster analysis objectively reveals groups of folds that resemble one another. To demonstrate the utility of the approach, a comprehensive analysis of all the terminal hairpin tetraloops that have been observed in 15 RNA structures that have been determined by X-ray crystallography was undertaken. The method found major clusters corresponding to the well-known GNRA and UNCG types. In addition, two tetraloops with the unusual primary sequence UMAC (M is A or C) were successfully assigned to the GNRA cluster. Larger loop structures were also examined and the clustering results confirmed the occurrence of variations of the GNRA and UNCG tetraloops in these loops and provided a systematic means for locating them. Nineteen examples of larger loops that closely resemble either the GNRA or UNCG tetraloop were found in the large ribosomal RNAs. When the clustering approach was extended to include all structures in the SCOR database, novel relationships were detected including one between the ANYA motif and a less common folding of the GAAA tetraloop sequence.
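The UPGMA step in this abstract works directly on a matrix of pairwise RMSD values. The sketch below is a small, unoptimized average-linkage implementation with a distance cutoff; the five-loop RMSD matrix and the 1.0 Å cutoff are hypothetical and are not taken from the paper.

```python
def upgma_clusters(dist, cutoff):
    """Merge clusters by average linkage (UPGMA) until the closest pair is farther apart than cutoff."""
    clusters = [[i] for i in range(len(dist))]

    def avg_dist(a, b):
        # UPGMA distance: unweighted mean of all pairwise distances between two clusters.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        d, p, q = min(
            (avg_dist(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        if d > cutoff:
            break
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters

# Hypothetical pairwise RMSD values (in angstroms) between five loop fragments.
rmsd = [
    [0.0, 0.4, 0.5, 2.1, 2.3],
    [0.4, 0.0, 0.6, 2.0, 2.2],
    [0.5, 0.6, 0.0, 2.4, 2.5],
    [2.1, 2.0, 2.4, 0.0, 0.3],
    [2.3, 2.2, 2.5, 0.3, 0.0],
]
groups = upgma_clusters(rmsd, cutoff=1.0)
print(groups)
```

With this cutoff the fragments fall into two groups, analogous to how conformationally similar loops would end up in the same cluster.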
The application of cluster analysis in the intercomparison of loop structures in RNA

PubMed Central

HUANG, HUNG-CHUNG; NAGASWAMY, UMA; FOX, GEORGE E.

2005-01-01

We have developed a computational approach for the comparison and classification of RNA loop structures. Hairpin or interior loops identified in atomic resolution RNA structures were intercompared by conformational matching. The root-mean-square deviation (RMSD) values between all pairs of RNA fragments of interest, even if from different molecules, are calculated. Subsequently, cluster analysis is performed on the resulting matrix of RMSD distances using the unweighted pair group method with arithmetic mean (UPGMA). The cluster analysis objectively reveals groups of folds that resemble one another. To demonstrate the utility of the approach, a comprehensive analysis of all the terminal hairpin tetraloops that have been observed in 15 RNA structures that have been determined by X-ray crystallography was undertaken. The method found major clusters corresponding to the well-known GNRA and UNCG types. In addition, two tetraloops with the unusual primary sequence UMAC (M is A or C) were successfully assigned to the GNRA cluster. Larger loop structures were also examined and the clustering results confirmed the occurrence of variations of the GNRA and UNCG tetraloops in these loops and provided a systematic means for locating them. Nineteen examples of larger loops that closely resemble either the GNRA or UNCG tetraloop were found in the large ribosomal RNAs. When the clustering approach was extended to include all structures in the SCOR database, novel relationships were detected including one between the ANYA motif and a less common folding of the GAAA tetraloop sequence. PMID:15769871

Response to "Comparison and Evaluation of Clustering Algorithms for Tandem Mass Spectra".

PubMed

Griss, Johannes; Perez-Riverol, Yasset; The, Matthew; Käll, Lukas; Vizcaíno, Juan Antonio

2018-05-04

In the recent benchmarking article entitled "Comparison and Evaluation of Clustering Algorithms for Tandem Mass Spectra", Rieder et al. compared several different approaches to cluster MS/MS spectra. While we certainly recognize the value of the manuscript, here, we report some shortcomings detected in the original analyses. For most analyses, the authors clustered only single MS/MS runs. In one of the reported analyses, three MS/MS runs were processed together, which already led to computational performance issues in many of the tested approaches. This fact highlights the difficulties of using many of the tested algorithms on the average proteomics data sets produced today. Second, the authors only processed identified spectra when merging MS runs. Thereby, all unidentified spectra that are of lower quality were already removed from the data set and could not influence the clustering results. Next, we found that the authors did not analyze the effect of chimeric spectra on the clustering results. In our analysis, we found that 3% of the spectra in the used data sets were chimeric, and this had marked effects on the behavior of the different clustering algorithms tested. Finally, the authors' choice to evaluate the MS-Cluster and spectra-cluster algorithms using a precursor tolerance of 5 Da for high-resolution Orbitrap data only was, in our opinion, not adequate to assess the performance of MS/MS clustering approaches.
Implementation of hybrid clustering based on partitioning around medoids algorithm and divisive analysis on human Papillomavirus DNA

NASA Astrophysics Data System (ADS)

Arimbi, Mentari Dian; Bustamam, Alhadi; Lestari, Dian

2017-03-01

Data clustering can be executed through partition or hierarchical methods for many types of data, including DNA sequences. Both clustering methods can be combined by running a partition algorithm in the first level and a hierarchical algorithm in the second level, which is called hybrid clustering. In the partition phase, popular methods such as PAM, K-means, or fuzzy c-means could be applied. In this study we selected partitioning around medoids (PAM) for our partition stage. Following the partition algorithm, in the hierarchical stage we applied the divisive analysis algorithm (DIANA) in order to obtain more specific cluster and sub-cluster structures. The number of main clusters is determined using the Davies-Bouldin Index (DBI): we choose the number of clusters that minimizes the DBI value. In this work, we conduct the clustering on 1252 HPV DNA sequences from GenBank. Characteristic extraction is performed first, followed by normalization and genetic distance calculation using Euclidean distance. We implemented the hybrid of PAM and DIANA using the R open source programming tool. Using PAM in the first stage, we obtained 3 main clusters with an average DBI value of 0.979. After executing DIANA in the second stage, we obtained 4 sub-clusters for Cluster-1, 9 sub-clusters for Cluster-2 and 2 sub-clusters for Cluster-3, with DBI values of 0.972, 0.771, and 0.768 for the respective main clusters. Since the second stage produces lower DBI values than the first stage, we conclude that this hybrid approach can improve the accuracy of our clustering results.
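The Davies-Bouldin index used in this abstract to choose the number of clusters can be sketched in a few lines. The study used R; the Python version below is only an illustration on hypothetical 2-D points with Euclidean distance, and it uses cluster centroids rather than PAM medoids.

```python
import math

def davies_bouldin(points, labels):
    """Davies-Bouldin index for a partition with at least two clusters (lower is better)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    centroids, scatters = [], []
    for members in clusters.values():
        c = [sum(col) / len(members) for col in zip(*members)]
        centroids.append(c)
        # Scatter: mean distance of the cluster's members to its centroid.
        scatters.append(sum(math.dist(m, c) for m in members) / len(members))
    k = len(centroids)
    # For each cluster, take the worst (largest) scatter-to-separation ratio.
    worst = [
        max(
            (scatters[i] + scatters[j]) / math.dist(centroids[i], centroids[j])
            for j in range(k) if j != i
        )
        for i in range(k)
    ]
    return sum(worst) / k

# Hypothetical data: two tight pairs far apart.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = davies_bouldin(points, [0, 0, 1, 1])  # natural grouping
bad = davies_bouldin(points, [0, 1, 0, 1])   # grouping that cuts across the pairs
print(good, bad)
```

Compact, well-separated clusters give a small index, which is why the candidate partition with the lowest value is the one selected.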
Profile Analysis of the Woodcock-Johnson III Tests of Cognitive Abilities with Gifted Students.

ERIC Educational Resources Information Center

Rizza, Mary G.; McIntosh, David E.; McCunn, Alice

2001-01-01

The Cattell-Horn-Carroll (CHC) factor clusters of the Woodcock-Johnson III Tests of Cognitive Abilities (WJ III COG) were studied with a group of gifted and nongifted individuals. Results found both groups display similar patterns of performance across the CHC factor clusters. Discusses clinical and educational considerations when using the WJ III…
The X-CLASS-redMaPPer galaxy cluster comparison. I. Identification procedures

NASA Astrophysics Data System (ADS)

Sadibekova, T.; Pierre, M.; Clerc, N.; Faccioli, L.; Gastaud, R.; Le Fevre, J.-P.; Rozo, E.; Rykoff, E.

2014-11-01

Context. This paper is the first in a series undertaking a comprehensive correlation analysis between optically selected and X-ray-selected cluster catalogues. The rationale of the project is to develop a holistic picture of galaxy clusters utilising optical and X-ray-cluster-selected catalogues with well-understood selection functions. Aims: Unlike most of the X-ray/optical cluster correlations to date, the present paper focuses on the non-matching objects in either waveband. We investigate how the differences observed between the optical and X-ray catalogues may stem from (1) a shortcoming of the detection algorithms; (2) dispersion in the X-ray/optical scaling relations; or (3) substantial intrinsic differences between the cluster populations probed in the X-ray and optical bands. The aim is to inventory and elucidate these effects in order to account for selection biases in the further determination of X-ray/optical cluster scaling relations. Methods: We correlated the X-CLASS serendipitous cluster catalogue extracted from the XMM archive with the redMaPPer optical cluster catalogue derived from the Sloan Digital Sky Survey (DR8). We performed a detailed and, in large part, interactive analysis of the matching output from the correlation. The overlap between the two catalogues has been accurately determined and possible cluster positional errors were manually recovered. The final samples comprise 270 and 355 redMaPPer and X-CLASS clusters, respectively. X-ray cluster matching rates were analysed as a function of optical richness. In the second step, the redMaPPer clusters were correlated with the entire X-ray catalogue, containing point and uncharacterised sources (down to a few 10^-15 erg s^-1 cm^-2 in the [0.5-2] keV band). A stacking analysis was performed for the remaining undetected optical clusters. Results: We find that all rich (λ ≥ 80) clusters are detected in X-rays out to z = 0.6.
Below this redshift, the richness threshold for X-ray detection steadily decreases with redshift. Likewise, all X-ray bright clusters are detected by redMaPPer. After correcting for obvious pipeline shortcomings (about 10% of the cases both in optical and X-ray), ~50% of the redMaPPer clusters (down to a richness of 20) are found to coincide with an X-CLASS cluster; when considering X-ray sources of any type, this fraction increases to ~80%; for the remaining objects, the stacking analysis finds a weak signal within 0.5 Mpc around the cluster optical centres. The fraction of clusters totally dominated by AGN-type emission appears to be a few percent. Conversely, ~40% of the X-CLASS clusters are identified with a redMaPPer cluster (down to a richness of 20), with part of the non-matches being due to the X-CLASS sample extending further out than redMaPPer (z< 1.5 vs. z< 0.6), but extending the correlation down to a richness of 5 raises the matching rate to ~65%. Conclusions: This state-of-the-art study involving two well-validated cluster catalogues has shown itself to be complex, and it points to a number of issues inherent to blind cross-matching, owing both to pipeline shortcomings and cluster peculiar properties. These can only be accounted for after a manual check. The combined X-ray and optical scaling relations will be presented in a subsequent article.
Clustering Genes of Common Evolutionary History

PubMed Central

Gori, Kevin; Suchan, Tomasz; Alvarez, Nadir; Goldman, Nick; Dessimoz, Christophe

2016-01-01

Phylogenetic inference can potentially result in a more accurate tree using data from multiple loci. However, if the loci are incongruent—due to events such as incomplete lineage sorting or horizontal gene transfer—it can be misleading to infer a single tree. To address this, many previous contributions have taken a mechanistic approach, by modeling specific processes. Alternatively, one can cluster loci without assuming how these incongruencies might arise. Such “process-agnostic” approaches typically infer a tree for each locus and cluster these. There are, however, many possible combinations of tree distance and clustering methods; their comparative performance in the context of tree incongruence is largely unknown. Furthermore, because standard model selection criteria such as AIC cannot be applied to problems with a variable number of topologies, the issue of inferring the optimal number of clusters is poorly understood. Here, we perform a large-scale simulation study of phylogenetic distances and clustering methods to infer loci of common evolutionary history. We observe that the best-performing combinations are distances accounting for branch lengths followed by spectral clustering or Ward’s method. We also introduce two statistical tests to infer the optimal number of clusters and show that they strongly outperform the silhouette criterion, a general-purpose heuristic. We illustrate the usefulness of the approach by 1) identifying errors in a previous phylogenetic analysis of yeast species and 2) identifying topological incongruence among newly sequenced loci of the globeflower fly genus Chiastocheta. We release treeCl, a new program to cluster genes of common evolutionary history (http://git.io/treeCl). PMID:26893301
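The silhouette criterion, the general-purpose heuristic this abstract uses as a baseline, needs only a pairwise distance matrix and a candidate partition. The sketch below is not part of treeCl; the matrix of distances between four per-locus trees is hypothetical.

```python
def mean_silhouette(dist, labels):
    """Mean silhouette width of a partition with at least two clusters, from a full distance matrix."""
    n = len(labels)
    scores = []
    for i in range(n):
        own = [dist[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not own:
            scores.append(0.0)  # singleton cluster: silhouette is defined as 0
            continue
        a = sum(own) / len(own)  # mean distance to the item's own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(dist[i][j] for j in range(n) if labels[j] == other)
            / sum(1 for j in range(n) if labels[j] == other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / n

# Hypothetical distances between the trees inferred for four loci.
tree_dist = [
    [0, 1, 8, 9],
    [1, 0, 9, 8],
    [8, 9, 0, 1],
    [9, 8, 1, 0],
]
congruent = mean_silhouette(tree_dist, [0, 0, 1, 1])
shuffled = mean_silhouette(tree_dist, [0, 1, 0, 1])
print(round(congruent, 3), round(shuffled, 3))
```

In practice the partition is computed for each candidate number of clusters and the one with the highest mean silhouette is kept; the abstract reports that the authors' two statistical tests strongly outperform this rule.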
Efficient generation of low-energy folded states of a model protein

NASA Astrophysics Data System (ADS)

Gordon, Heather L.; Kwan, Wai Kei; Gong, Chunhang; Larrass, Stefan; Rothstein, Stuart M.

2003-01-01

A number of short simulated annealing runs are performed on a highly-frustrated 46-"residue" off-lattice model protein. We perform, in an iterative fashion, a principal component analysis of the 946 nonbonded interbead distances, followed by two varieties of cluster analyses: hierarchical and k-means clustering. We identify several distinct sets of conformations with reasonably consistent cluster membership. Nonbonded distance constraints are derived for each cluster and are employed within a distance geometry approach to generate many new conformations, previously unidentified by the simulated annealing experiments. Subsequent analyses suggest that these new conformations are members of the parent clusters from which they were generated. Furthermore, several novel, previously unobserved structures with low energy were uncovered, augmenting the ensemble of simulated annealing results, and providing a complete distribution of low-energy states. The computational cost of this approach to generating low-energy conformations is small when compared to the expense of further Monte Carlo simulated annealing runs.
Local resonances in STM manipulation of chlorobenzene on Si(111)-7×7: performance of different cluster models and density functionals

NASA Astrophysics Data System (ADS)

Utecht, Manuel; Klamroth, Tillmann

2018-07-01

Hot localised charge carriers on the Si(111)-7×7 surface are modelled by small charged clusters. Such resonances induce non-local desorption, i.e. more than 10 nm away from the injection site, of chlorobenzene in scanning tunnelling microscope experiments. We recently used such a cluster model to characterise resonance localisation and vibrational activation for positive and negative resonances. In this work, we investigate to what extent the model depends on details of the cluster or quantum chemistry methods used and try to identify the smallest possible cluster suitable for a description of the neutral surface and the ion resonances. Furthermore, a detailed analysis for different chemisorption orientations is performed. While some properties, such as estimates of the resonance energy or absolute values for atomic changes, show such a dependency, the main findings are very robust with respect to changes in the model and/or the chemisorption geometry.
Equivalent damage validation by variable cluster analysis

NASA Astrophysics Data System (ADS)

Drago, Carlo; Ferlito, Rachele; Zucconi, Maria

2016-06-01

The main aim of this work is to perform a clustering analysis on the damage surveyed in the old center of L'Aquila after the earthquake that occurred on April 6, 2009, and to validate an Indicator of Equivalent Damage (ED) that summarizes the information reported on the AeDES card regarding the level of damage and its extent over the surface of the buildings. In particular, we used a sample of 13442 masonry buildings located in an area characterized by a Macroseismic Intensity equal to 8 [1]. The aim is to ensure coherence between the clusters, and their hierarchy, identified in the surveyed damage data and those identified in the computed ED data.
Percolation analyses of observed and simulated galaxy clustering

NASA Astrophysics Data System (ADS)

Bhavsar, S. P.; Barrow, J. D.

1983-11-01

A percolation cluster analysis is performed on equivalent regions of the CFA redshift survey of galaxies and the 4000 body simulations of gravitational clustering made by Aarseth, Gott and Turner (1979). The observed and simulated percolation properties are compared and, unlike correlation and multiplicity function analyses, favour high density (Omega = 1) models with n = - 1 initial data. The present results show that the three-dimensional data are consistent with the degree of filamentary structure present in isothermal models of galaxy formation at the level of percolation analysis. It is also found that the percolation structure of the CFA data is a function of depth. Percolation structure does not appear to be a sensitive probe of intrinsic filamentary structure.
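A percolation analysis of a point distribution can be sketched as follows: link every pair of points closer than a chosen linking length, take the connected components as clusters, and track how the largest cluster grows as the linking length increases. The code below is an illustrative sketch under those assumptions, not the procedure used by the authors; the function name and the toy coordinates are hypothetical.

```python
import math


def percolation_clusters(points, linking_length):
    """Group points into clusters by linking pairs closer than linking_length.

    Returns a list of clusters (lists of point indices), largest first.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= linking_length:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values(), key=len, reverse=True)


# Toy 3-D "galaxy" positions: a chain of three points and a separate pair.
galaxies = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (10, 0, 0), (11, 0, 0)]
for r in (0.5, 1.5, 9.0):
    largest = percolation_clusters(galaxies, r)[0]
    print(r, len(largest) / len(galaxies))
```

The fraction of points in the largest cluster as a function of linking length is one simple summary that can be compared between an observed catalogue and a simulation.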
Clustering Scientific Publications Based on Citation Relations: A Systematic Comparison of Different Methods.

PubMed

Šubelj, Lovro; van Eck, Nees Jan; Waltman, Ludo

2016-01-01

Clustering methods are applied regularly in the bibliometric literature to identify research areas or scientific fields. These methods are for instance used to group publications into clusters based on their relations in a citation network. In the network science literature, many clustering methods, often referred to as graph partitioning or community detection techniques, have been developed. Focusing on the problem of clustering the publications in a citation network, we present a systematic comparison of the performance of a large number of these clustering methods. Using a number of different citation networks, some of them relatively small and others very large, we extensively study the statistical properties of the results provided by different methods. In addition, we also carry out an expert-based assessment of the results produced by different methods. The expert-based assessment focuses on publications in the field of scientometrics. Our findings seem to indicate that there is a trade-off between different properties that may be considered desirable for a good clustering of publications. Overall, map equation methods appear to perform best in our analysis, suggesting that these methods deserve more attention from the bibliometric community.
Clustering Scientific Publications Based on Citation Relations: A Systematic Comparison of Different Methods

PubMed Central

Šubelj, Lovro; van Eck, Nees Jan; Waltman, Ludo

2016-01-01

Clustering methods are applied regularly in the bibliometric literature to identify research areas or scientific fields. These methods are for instance used to group publications into clusters based on their relations in a citation network. In the network science literature, many clustering methods, often referred to as graph partitioning or community detection techniques, have been developed. Focusing on the problem of clustering the publications in a citation network, we present a systematic comparison of the performance of a large number of these clustering methods. Using a number of different citation networks, some of them relatively small and others very large, we extensively study the statistical properties of the results provided by different methods. In addition, we also carry out an expert-based assessment of the results produced by different methods. The expert-based assessment focuses on publications in the field of scientometrics. Our findings seem to indicate that there is a trade-off between different properties that may be considered desirable for a good clustering of publications. Overall, map equation methods appear to perform best in our analysis, suggesting that these methods deserve more attention from the bibliometric community. PMID:27124610
Universal patterns of equilibrium cluster growth in aqueous sugars observed by dynamic light scattering.

PubMed

Sidebottom, D L; Tran, Tri D

2010-11-01

Dynamic light scattering performed on aqueous solutions of three sugars (glucose, maltose and sucrose) reveals a common pattern of sugar cluster formation with a narrow cluster size distribution. In each case, equilibrium clusters form whose size increases with increasing sugar content in an identical power law manner in advance of a common, critical-like, percolation threshold near 83 wt % sugar. The critical exponent of the power law divergence of the cluster size varies with temperature, increasing with decreasing temperature, due to changes in the strength of the intermolecular hydrogen bond and appears to vanish for temperatures in excess of 90 °C. Detailed analysis of the cluster growth process suggests a two-stage process: an initial cluster phase formed at low volume fractions, ϕ, consisting of noninteracting, monodisperse sugar clusters whose size increases as ϕ^(1/3), followed by an aggregation stage, active at concentrations above about ϕ=40%, where cluster-cluster contact first occurs.
A cluster analytic study of the Wechsler Intelligence Test for Children-IV in children referred for psychoeducational assessment due to persistent academic difficulties.

PubMed

Hale, Corinne R; Casey, Joseph E; Ricciardi, Philip W R

2014-02-01

Wechsler Intelligence Test for Children-IV core subtest scores of 472 children were cluster analyzed to determine if reliable and valid subgroups would emerge. Three subgroups were identified. Clusters were reliable across different stages of the analysis as well as across algorithms and samples. With respect to external validity, the Globally Low cluster differed from the other two clusters on Wechsler Individual Achievement Test-II Word Reading, Numerical Operations, and Spelling subtests, whereas the latter two clusters did not differ from one another. The clusters derived have been identified in studies using previous WISC editions. Clusters characterized by poor performance on subtests historically associated with the VIQ (i.e., VCI + WMI) and PIQ (i.e., POI + PSI) did not emerge, nor did a cluster characterized by low scores on PRI subtests. Picture Concepts represented the highest subtest score in every cluster, failing to vary in a predictable manner with the other PRI subtests.
Maritime Continent seasonal climate biases in AMIP experiments of the CMIP5 multimodel ensemble

NASA Astrophysics Data System (ADS)

Toh, Ying Ying; Turner, Andrew G.; Johnson, Stephanie J.; Holloway, Christopher E.

2018-02-01

The fidelity of 28 Coupled Model Intercomparison Project phase 5 (CMIP5) models in simulating mean climate over the Maritime Continent in the Atmospheric Model Intercomparison Project (AMIP) experiment is evaluated in this study. The performance of AMIP models varies greatly in reproducing seasonal mean climate and the seasonal cycle. The multi-model mean has better skill at reproducing the observed mean climate than the individual models. The spatial pattern of 850 hPa wind is better simulated than the precipitation in all four seasons. We found that model horizontal resolution is not a good indicator of model performance. Instead, a model's local Maritime Continent biases are somewhat related to its biases in the local Hadley circulation and global monsoon. The comparison with coupled models in CMIP5 shows that AMIP models generally performed better than coupled models in the simulation of the global monsoon and local Hadley circulation but less well at simulating the Maritime Continent annual cycle of precipitation. To characterize model systematic biases in the AMIP runs, we performed cluster analysis on Maritime Continent annual cycle precipitation. Our analysis resulted in two distinct clusters. Cluster I models are able to capture both the winter monsoon and summer monsoon shift, but they overestimate the precipitation, especially during the JJA and SON seasons. Cluster II models simulate weaker seasonal migration than observed, and the maximum rainfall position stays closer to the equator throughout the year. The tropics-wide properties of these clusters suggest a connection between the skill of simulating global properties of the monsoon circulation and the skill of simulating the regional scale of Maritime Continent precipitation.
The spatio-temporal mapping of epileptic networks: Combination of EEG–fMRI and EEG source imaging

PubMed Central

Vulliemoz, S.; Thornton, R.; Rodionov, R.; Carmichael, D.W.; Guye, M.; Lhatoo, S.; McEvoy, A.W.; Spinelli, L.; Michel, C.M.; Duncan, J.S.; Lemieux, L.

2009-01-01

Simultaneous EEG–fMRI acquisitions in patients with epilepsy often reveal distributed patterns of Blood Oxygen Level Dependent (BOLD) change correlated with epileptiform discharges. We investigated if electrical source imaging (ESI) performed on the interictal epileptiform discharges (IED) acquired during fMRI acquisition could be used to study the dynamics of the networks identified by the BOLD effect, thereby avoiding the limitations of combining results from separate recordings. Nine selected patients (13 IED types identified) with focal epilepsy underwent EEG–fMRI. Statistical analysis was performed using SPM5 to create BOLD maps. ESI was performed on the IED recorded during fMRI acquisition using a realistic head model (SMAC) and a distributed linear inverse solution (LAURA). ESI could not be performed in one case. In 10/12 remaining studies, ESI at IED onset (ESIo) was anatomically close to one BOLD cluster. Interestingly, ESIo was closest to the positive BOLD cluster with maximal statistical significance in only 4/12 cases and closest to negative BOLD responses in 4/12 cases. Very small BOLD clusters could also have clinical relevance in some cases. ESI at later time frame (ESIp) showed propagation to remote sources co-localised with other BOLD clusters in half of cases. In concordant cases, the distance between maxima of ESI and the closest EEG–fMRI cluster was less than 33 mm, in agreement with previous studies. We conclude that simultaneous ESI and EEG–fMRI analysis may be able to distinguish areas of BOLD response related to initiation of IED from propagation areas. This combination provides new opportunities for investigating epileptic networks. PMID:19408351
Integrating dynamic fuzzy C-means, data envelopment analysis and artificial neural network to online prediction performance of companies in stock exchange

NASA Astrophysics Data System (ADS)

Jahangoshai Rezaee, Mustafa; Jozmaleki, Mehrdad; Valipour, Mahsa

2018-01-01

Financial performance is one of the main criteria for investing in companies listed on a stock exchange. However, conventional evaluation methods such as data envelopment analysis are retrospective, and are therefore incomplete and ineffective for evaluating how companies will perform in the future. To address this problem, an expert system is needed that evaluates organizations as online data are received from the stock exchange market. This paper presents an approach for predicting the online financial performance of companies when data are received at different time intervals. The proposed approach integrates fuzzy C-means (FCM), data envelopment analysis (DEA) and an artificial neural network (ANN). The classical FCM method is unable to update the number of clusters and their members when the data change or new data are received. Hence, the method is extended so that the number of clusters and the cluster members can change dynamically. Then, DEA is used to evaluate decision-making units (DMUs) using financial ratios, which provides the targets for the neural network. Finally, the designed network is trained and prepared for predicting companies' future performance. Data on Tehran Stock Market companies for six consecutive years (2007-2012) are used to demonstrate the capabilities of the proposed approach.
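For readers unfamiliar with fuzzy C-means, the sketch below shows the membership update of the classical algorithm, in which each point receives a degree of membership in every cluster rather than a single label. It illustrates only this standard step, with a fuzzifier m assumed to be 2; the dynamic extension described in the abstract is not reproduced, and the function name and toy values are hypothetical.

```python
import math


def fcm_memberships(point, centers, m=2.0):
    """Classical fuzzy C-means membership of one point in each cluster.

    u_i = 1 / sum_j (d_i / d_j) ** (2 / (m - 1)), where d_i is the distance
    from the point to center i and m > 1 is the fuzzifier.
    """
    dists = [math.dist(point, c) for c in centers]
    if any(d == 0.0 for d in dists):
        # The point coincides with one or more centers: share full membership.
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** power for d_j in dists) for d_i in dists]


# A point at distance 1 from the first center and 2 from the second.
memberships = fcm_memberships((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0)])
```

The memberships always sum to 1, and the nearer center receives the larger share. In the full algorithm this step alternates with recomputing each center as a membership-weighted mean.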
Strategic groups, performance, and strategic response in the nursing home industry.

PubMed

Zinn, J S; Aaronson, W E; Rosko, M D

1994-06-01

This study examines the effect of strategic group membership on nursing home performance and strategic behavior. Data from the 1987 Medicare and Medicaid Automated Certification Survey were combined with data from the 1987 and 1989 Pennsylvania Long Term Care Facility Questionnaire. The sample consisted of 383 Pennsylvania nursing homes. Cluster analysis was used to place the 383 nursing homes into strategic groups on the basis of variables measuring scope and resource deployment. Performance was measured by indicators of the quality of nursing home care (rates of pressure ulcers, catheterization, and restraint usage) and efficiency in services provision. Changes in Medicare participation after passage of the 1988 Medicare Catastrophic Coverage Act (MCCA) measured strategic behavior. MANOVA and Tukey HSD post hoc means tests determined if significant differences were associated with strategic group membership. Cluster analysis produced an optimal seven-group solution. Differences in group means were significant for the clustering, performance, and conduct variables (p < .0001). Strategic groups characterized by facilities providing a continuum of care services had the best patient care outcomes. The most efficient groups were characterized by facilities with high Medicare census. While all strategic groups increased Medicare census following passage of the MCCA, those dominated by for-profits had the greatest increases. Our analysis demonstrates that strategic orientation influences nursing home response to regulatory initiatives, a factor that should be recognized in policy formation directed at nursing home reform.
Analysis of ground-motion simulation big data

NASA Astrophysics Data System (ADS)

Maeda, T.; Fujiwara, H.

2016-12-01

We developed a parallel distributed processing system which applies a big data analysis to large-scale ground motion simulation data. The system uses ground-motion index values and earthquake scenario parameters as input. We used peak ground velocity and velocity response spectra as the ground-motion index. The ground-motion index values are calculated from our simulation data. We used simulated long-period ground motion waveforms at about 80,000 meshes calculated by a three-dimensional finite difference method based on 369 earthquake scenarios of a great earthquake in the Nankai Trough. These scenarios were constructed by considering the uncertainty of source model parameters such as source area, rupture starting point, asperity location, rupture velocity, fmax and slip function. We used these parameters as the earthquake scenario parameters. The system first clusters the earthquake scenarios in each mesh by the k-means method. The number of clusters is determined in advance using hierarchical clustering by Ward's method. The scenario clustering results are converted to a 1-D feature vector. The dimension of the feature vector is the number of scenario combinations. If two scenarios belong to the same cluster the component of the feature vector is 1, and otherwise the component is 0. The feature vector shows a 'response' of the mesh to the assumed earthquake scenario group. Next, the system clusters the meshes by the k-means method using the feature vector previously obtained for each mesh. Here the number of clusters is arbitrarily given. The clustering of scenarios and the clustering of meshes are performed by parallel distributed processing with Hadoop and Spark, respectively. In this study, we divided the meshes into 20 clusters. The meshes in each cluster are geometrically concentrated. Thus this system can extract regions in which the meshes have a similar 'response' as clusters. For each cluster, it is possible to determine particular scenario parameters which characterize the cluster. In other words, by utilizing this system, we can objectively obtain critical scenario parameters of the ground-motion simulation for each evaluation point. This research was supported by CREST, JST.
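The conversion from a scenario clustering to a binary feature vector can be sketched as below. The sketch assumes that "scenario combination" means an unordered pair of scenarios, which is one reading of the abstract rather than a confirmed detail; the function name is hypothetical.

```python
from itertools import combinations


def comembership_vector(labels):
    """Binary feature vector with one component per unordered scenario pair.

    labels[s] is the cluster assigned to scenario s at one mesh. A component
    is 1 when the two scenarios of the pair share a cluster, otherwise 0.
    """
    return [
        1 if labels[a] == labels[b] else 0
        for a, b in combinations(range(len(labels)), 2)
    ]


# Four scenarios at one mesh, clustered into two groups.
vector = comembership_vector([0, 0, 1, 1])
print(vector)  # [1, 0, 0, 0, 0, 1]
```

Under this pairwise reading, 369 scenarios would give a vector of 369 × 368 / 2 = 67,896 components per mesh, and these vectors are what a second k-means pass over the meshes would take as input.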
Using Interactive Graphics to Teach Multivariate Data Analysis to Psychology Students

ERIC Educational Resources Information Center

Valero-Mora, Pedro M.; Ledesma, Ruben D.

2011-01-01

This paper discusses the use of interactive graphics to teach multivariate data analysis to Psychology students. Three techniques are explored through separate activities: parallel coordinates/boxplots; principal components/exploratory factor analysis; and cluster analysis. With interactive graphics, students may perform important parts of the…
An information theory analysis of spatial decisions in cognitive development

PubMed Central

Scott, Nicole M.; Sera, Maria D.; Georgopoulos, Apostolos P.

2015-01-01

Performance in a cognitive task can be considered as the outcome of a decision-making process operating across various knowledge domains or aspects of a single domain. Therefore, an analysis of these decisions in various tasks can shed light on the interplay and integration of these domains (or elements within a single domain) as they are associated with specific task characteristics. In this study, we applied an information theoretic approach to assess quantitatively the gain of knowledge across various elements of the cognitive domain of spatial, relational knowledge, as a function of development. Specifically, we examined changing spatial relational knowledge from ages 5 to 10 years. Our analyses consisted of a two-step process. First, we performed a hierarchical clustering analysis on the decisions made in 16 different tasks of spatial relational knowledge to determine which tasks were performed similarly at each age group as well as to discover how the tasks clustered together. We next used two measures of entropy to capture the gradual emergence of order in the development of relational knowledge. These measures of “cognitive entropy” were defined based on two independent aspects of chunking, namely (1) the number of clusters formed at each age group, and (2) the distribution of tasks across the clusters. We found that both measures of entropy decreased with age in a quadratic fashion and were positively and linearly correlated. The decrease in entropy and, therefore, gain of information during development was accompanied by improved performance. These results document, for the first time, the orderly and progressively structured “chunking” of decisions across the development of spatial relational reasoning and quantify this gain within a formal information-theoretic framework. PMID:25698915
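The abstract does not give the formulas for its two entropy measures, so the following is only a sketch of one plausible ingredient: the Shannon entropy of how tasks are distributed across clusters. The function name and the toy labelings of 16 tasks are assumptions made for illustration.

```python
import math
from collections import Counter


def cluster_entropy(labels):
    """Shannon entropy (in bits) of the distribution of items across clusters."""
    n = len(labels)
    counts = Counter(labels)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return entropy + 0.0  # normalizes -0.0 to 0.0 for the single-cluster case


# 16 tasks spread evenly over four clusters versus over two clusters.
four_clusters = cluster_entropy([0] * 4 + [1] * 4 + [2] * 4 + [3] * 4)
two_clusters = cluster_entropy([0] * 8 + [1] * 8)
print(four_clusters, two_clusters)  # 2.0 1.0
```

With k equally sized clusters the entropy equals log2(k), its maximum for k clusters, so both fewer clusters and a more uneven distribution of tasks lower the value.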

Fast EEG spike detection via eigenvalue analysis and clustering of spatial amplitude distribution

NASA Astrophysics Data System (ADS)

Fukami, Tadanori; Shimada, Takamasa; Ishikawa, Bunnoshin

2018-06-01

Objective. In the current study, we tested a proposed method for fast spike detection in electroencephalography (EEG). Approach. We performed eigenvalue analysis in two-dimensional space spanned by gradients calculated from two neighboring samples to detect high-amplitude negative peaks. We extracted the spike candidates by imposing restrictions on parameters regarding spike shape and eigenvalues reflecting detection characteristics of individual medical doctors. We subsequently performed clustering, classifying detected peaks by considering the amplitude distribution at 19 scalp electrodes. Clusters with a small number of candidates were excluded. We then defined a score for eliminating spike candidates for which the pattern of detected electrodes differed from the overall pattern in a cluster. Spikes were detected by setting the score threshold. Main results. Based on visual inspection by a psychiatrist experienced in EEG, we evaluated the proposed method using two statistical measures of precision and recall with respect to detection performance. We found that precision and recall exhibited a trade-off relationship. The average recall value was 0.708 in eight subjects with the score threshold that maximized the F-measure, with 58.6 ± 36.2 spikes per subject. Under this condition, the average precision was 0.390, corresponding to a false positive rate 2.09 times higher than the true positive rate. Analysis of the required processing time revealed that, using a general-purpose computer, our method could be used to perform spike detection in 12.1% of the recording time. The process of narrowing down spike candidates based on shape occupied most of the processing time. Significance. Although the average recall value was comparable with that of other studies, the proposed method significantly shortened the processing time.
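The evaluation measures named in the abstract follow the standard definitions, sketched below from counts of true positives, false positives and false negatives. The counts in the example are invented for illustration and are not the study's data.

```python
def detection_scores(true_pos, false_pos, false_neg):
    """Return (precision, recall, F-measure) for a spike detector.

    Assumes at least one true positive, so that no denominator is zero.
    """
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical counts: 40 spikes found correctly, 60 false alarms, 10 missed.
precision, recall, f_measure = detection_scores(40, 60, 10)
```

Raising the score threshold typically removes false alarms (higher precision) at the cost of missed spikes (lower recall), which is the trade-off the authors report; the F-measure is the harmonic mean used to choose a single operating point.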
Characterization of HIV Transmission in South-East Austria

PubMed Central

Hoenigl, Martin; Chaillon, Antoine; Kessler, Harald H.; Haas, Bernhard; Stelzl, Evelyn; Weninger, Karin; Little, Susan J.; Mehta, Sanjay R.

2016-01-01

To gain deeper insight into the epidemiology of HIV-1 transmission in South-East Austria we performed a retrospective analysis of 259 HIV-1 partial pol sequences obtained from unique individuals newly diagnosed with HIV infection in South-East Austria from 2008 through 2014. After quality filtering, putative transmission linkages were inferred when two sequences were ≤1.5% genetically different. Multiple linkages were resolved into putative transmission clusters. Further phylogenetic analyses were performed using BEAST v1.8.1. Finally, we investigated putative links between the 259 sequences from South-East Austria and all publicly available HIV polymerase sequences in the Los Alamos National Laboratory HIV sequence database. We found that 45.6% (118/259) of the sampled sequences were genetically linked with at least one other sequence from South-East Austria forming putative transmission clusters. Clustering individuals were more likely to be men who have sex with men (MSM; p<0.001), infected with subtype B (p<0.001) or subtype F (p = 0.02). Among clustered males who reported only heterosexual (HSX) sex as an HIV risk, 47% clustered closely with MSM (either as pairs or within larger MSM clusters). One hundred and seven of the 259 sequences (41.3%) from South-East Austria had at least one putative inferred linkage with sequences from a total of 69 other countries. In conclusion, analysis of HIV-1 sequences from newly diagnosed individuals residing in South-East Austria revealed a high degree of national and international clustering mainly within MSM. Interestingly, we found that a high number of heterosexual males clustered within MSM networks, suggesting either linkage between risk groups or misrepresentation of sexual risk behaviors by subjects. PMID:26967154
Characterization of HIV Transmission in South-East Austria.

PubMed

Hoenigl, Martin; Chaillon, Antoine; Kessler, Harald H; Haas, Bernhard; Stelzl, Evelyn; Weninger, Karin; Little, Susan J; Mehta, Sanjay R

2016-01-01

To gain deeper insight into the epidemiology of HIV-1 transmission in South-East Austria we performed a retrospective analysis of 259 HIV-1 partial pol sequences obtained from unique individuals newly diagnosed with HIV infection in South-East Austria from 2008 through 2014. After quality filtering, putative transmission linkages were inferred when two sequences were ≤1.5% genetically different. Multiple linkages were resolved into putative transmission clusters. Further phylogenetic analyses were performed using BEAST v1.8.1. Finally, we investigated putative links between the 259 sequences from South-East Austria and all publicly available HIV polymerase sequences in the Los Alamos National Laboratory HIV sequence database. We found that 45.6% (118/259) of the sampled sequences were genetically linked with at least one other sequence from South-East Austria forming putative transmission clusters. Clustering individuals were more likely to be men who have sex with men (MSM; p<0.001), infected with subtype B (p<0.001) or subtype F (p = 0.02). Among clustered males who reported only heterosexual (HSX) sex as an HIV risk, 47% clustered closely with MSM (either as pairs or within larger MSM clusters). One hundred and seven of the 259 sequences (41.3%) from South-East Austria had at least one putative inferred linkage with sequences from a total of 69 other countries. In conclusion, analysis of HIV-1 sequences from newly diagnosed individuals residing in South-East Austria revealed a high degree of national and international clustering mainly within MSM. Interestingly, we found that a high number of heterosexual males clustered within MSM networks, suggesting either linkage between risk groups or misrepresentation of sexual risk behaviors by subjects.
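The linkage rule described above (two sequences linked when they differ by at most 1.5%, with multiple linkages resolved into clusters) can be sketched as a threshold graph whose connected components are the putative clusters. The sketch assumes pre-aligned sequences and uses a plain proportion of differing sites; the abstract does not state which distance measure the authors used, and the function names are hypothetical.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Fraction of differing sites among positions where both bases are A/C/G/T."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            mismatches += a != b
    if compared == 0:
        raise ValueError("no comparable positions")
    return mismatches / compared


def transmission_clusters(sequences, threshold=0.015):
    """Connected components of the graph linking sequences within threshold.

    Returns clusters of two or more sequence indices; unlinked sequences
    are omitted.
    """
    parent = list(range(len(sequences)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(sequences)), 2):
        if p_distance(sequences[i], sequences[j]) <= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(sequences)):
        groups.setdefault(find(i), []).append(i)
    return sorted(members for members in groups.values() if len(members) > 1)
```

Because clusters are connected components, two sequences can share a cluster without being within 1.5% of each other, as long as a chain of linked sequences joins them.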
Degree-based statistic and center persistency for brain connectivity analysis.

PubMed

Yoo, Kwangsun; Lee, Peter; Chung, Moo K; Sohn, William S; Chung, Sun Ju; Na, Duk L; Ju, Daheen; Jeong, Yong

2017-01-01

Brain connectivity analyses have been widely performed to investigate the organization and functioning of the brain, or to observe changes in neurological or psychiatric conditions. However, connectivity analysis inevitably introduces the problem of mass-univariate hypothesis testing. Although several cluster-wise correction methods have been suggested to address this problem and shown to provide high sensitivity, these approaches fundamentally have two drawbacks: the lack of spatial specificity (localization power) and the arbitrariness of an initial cluster-forming threshold. In this study, we propose a novel method, degree-based statistic (DBS), performing cluster-wise inference. DBS is designed to overcome the above-mentioned two shortcomings. From a network perspective, a few brain regions are of critical importance and considered to play pivotal roles in network integration. Regarding this notion, DBS defines a cluster as a set of edges of which one ending node is shared. This definition enables the efficient detection of clusters and their center nodes. Furthermore, a new measure of a cluster, center persistency (CP), was introduced. The efficiency of DBS was demonstrated with a known "ground truth" simulation. We then applied DBS to two experimental datasets and showed that DBS successfully detects the persistent clusters. In conclusion, by adopting a graph theoretical concept of degrees and borrowing the concept of persistence from algebraic topology, DBS could sensitively identify clusters with centric nodes that would play pivotal roles in an effect of interest. DBS is potentially widely applicable to variable cognitive or clinical situations and allows us to obtain statistically reliable and easily interpretable results. Hum Brain Mapp 38:165-181, 2017. © 2016 Wiley Periodicals, Inc.
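The cluster definition quoted above (a set of edges sharing one ending node) can be illustrated with a short sketch that groups suprathreshold edges by the node they touch. This covers only that definition; the inference procedure and the center persistency measure are not specified in the abstract and are not reproduced. The function name, edge statistics and threshold are hypothetical.

```python
from collections import defaultdict


def degree_based_clusters(edge_stats, threshold):
    """Group suprathreshold edges by each node they are attached to.

    edge_stats maps an edge (u, v) to its test statistic. The result maps
    every node with at least one suprathreshold edge to that node's cluster:
    the list of suprathreshold edges that share the node.
    """
    clusters = defaultdict(list)
    for (u, v), stat in edge_stats.items():
        if stat >= threshold:
            clusters[u].append((u, v))
            clusters[v].append((u, v))
    return dict(clusters)


# Toy network: node 0 carries three strong edges, so it is the largest center.
stats = {(0, 1): 3.0, (0, 2): 2.5, (1, 2): 0.5, (0, 3): 4.0, (2, 3): 1.0}
clusters = degree_based_clusters(stats, threshold=2.0)
center = max(clusters, key=lambda node: len(clusters[node]))
```

The size of each cluster here equals the degree of its center node in the thresholded graph, which is the sense in which the statistic is degree-based.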
Clustering eating habits: frequent consumption of different dietary patterns among the Italian general population in the association with obesity, physical activity, sociocultural characteristics and psychological factors.

PubMed

Denoth, Francesca; Scalese, Marco; Siciliano, Valeria; Di Renzo, Laura; De Lorenzo, Antonino; Molinaro, Sabrina

2016-06-01

(a) To identify clusters of eating patterns among the Italian population aged 15-64 years, focusing on typical Mediterranean diet (Med-diet) items consumption; (b) to examine the distribution of eating habits, as identified by the clusters, among age classes and genders; (c) to evaluate the impact of belonging to a specific eating cluster, level of physical activity (PA), and sociocultural and psychological factors as elements determining weight abnormalities. Data for this cross-sectional study were collected using self-reporting questionnaires administered to a sample of 33,127 subjects participating in the Italian population survey on alcohol and other drugs (IPSAD®2011). The cluster analysis was performed on a subsample (n = 5278 subjects) which provided information on eating habits, and adapted to identify categories of eating patterns. Stepwise multinomial regression analysis was performed to evaluate the associations between weight categories and eating clusters, adjusted for the following background variables: PA levels, sociocultural and psychological factors. Three clusters were identified: "Mediterranean-like", "Western-like" and "low fruit/vegetables". Frequent consumption of Med-diet patterns was more common among females and the elderly. The relationship between overweight/obesity and male gender, educational level, PA, depression and eating disorders (p < 0.05) was confirmed. Belonging to a cluster other than "Mediterranean-like" was significantly associated with obesity. The low consumption of Med-diet patterns among youth, and the frequent association of sociocultural, psychological issues and inappropriate lifestyle with overweight/obesity, highlight the need for an interdisciplinary approach including market policies, to promote a wider awareness of the Mediterranean eating habit benefits in combination with an appropriate lifestyle.
A Web-Based Multidrug-Resistant Organisms Surveillance and Outbreak Detection System with Rule-Based Classification and Clustering

PubMed Central

Tseng, Yi-Ju; Wu, Jung-Hsuan; Ping, Xiao-Ou; Lin, Hui-Chi; Chen, Ying-Yu; Shang, Rung-Ji; Chen, Ming-Yuan; Lai, Feipei

2012-01-01

Background The emergence and spread of multidrug-resistant organisms (MDROs) are causing a global crisis. Combating antimicrobial resistance requires prevention of transmission of resistant organisms and improved use of antimicrobials. Objectives To develop a Web-based information system for automatic integration, analysis, and interpretation of the antimicrobial susceptibility of all clinical isolates that incorporates rule-based classification and cluster analysis of MDROs and implements control chart analysis to facilitate outbreak detection. Methods Electronic microbiological data from a 2200-bed teaching hospital in Taiwan were classified according to predefined criteria of MDROs. The numbers of organisms, patients, and incident patients in each MDRO pattern were presented graphically to describe spatial and time information in a Web-based user interface. Hierarchical clustering with 7 upper control limits (UCL) was used to detect suspicious outbreaks. The system’s performance in outbreak detection was evaluated based on vancomycin-resistant enterococcal outbreaks determined by a hospital-wide prospective active surveillance database compiled by infection control personnel. Results The optimal UCL for MDRO outbreak detection was the upper 90% confidence interval (CI) using germ criterion with clustering (area under ROC curve (AUC) 0.93, 95% CI 0.91 to 0.95), upper 85% CI using patient criterion (AUC 0.87, 95% CI 0.80 to 0.93), and one standard deviation using incident patient criterion (AUC 0.84, 95% CI 0.75 to 0.92). The performance indicators of each UCL were statistically significantly higher with clustering than those without clustering in germ criterion (P < .001), patient criterion (P = .04), and incident patient criterion (P < .001). Conclusion This system automatically identifies MDROs and accurately detects suspicious outbreaks of MDROs based on the antimicrobial susceptibility of all clinical isolates. PMID:23195868
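The control-chart idea in the abstract above can be illustrated with a minimal sketch. It covers only a standard-deviation-based upper control limit (one of the UCL types the abstract mentions) and does not reproduce the system's clustering step. The function names, the baseline window, and the use of the sample standard deviation are assumptions made for illustration, not details taken from the paper.

```python
from statistics import mean, stdev

def upper_control_limit(baseline_counts, n_sd=1.0):
    """UCL = baseline mean + n_sd * sample standard deviation (assumed form)."""
    return mean(baseline_counts) + n_sd * stdev(baseline_counts)

def flag_outbreaks(baseline_counts, new_counts, n_sd=1.0):
    """Return True for each new count that exceeds the UCL of the baseline."""
    ucl = upper_control_limit(baseline_counts, n_sd)
    return [count > ucl for count in new_counts]

# Hypothetical weekly counts of one MDRO pattern (illustrative numbers only).
baseline = [2, 4, 4, 4, 5, 5, 7, 9]
flags = flag_outbreaks(baseline, [6, 8], n_sd=1.0)
print(flags)
```

Raising `n_sd` makes the chart less sensitive and more specific, which is the trade-off the ROC analysis in the abstract evaluates across its candidate limits.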
Application of the Linux cluster for exhaustive window haplotype analysis using the FBAT and Unphased programs.

PubMed

Mishima, Hiroyuki; Lidral, Andrew C; Ni, Jun

2008-05-28

Genetic association studies have been used to map disease-causing genes. A newly introduced statistical method, called exhaustive haplotype association study, analyzes genetic information consisting of different numbers and combinations of DNA sequence variations along a chromosome. Such studies involve a large number of statistical calculations and subsequently high computing power. It is possible to develop parallel algorithms and codes to perform the calculations on a high performance computing (HPC) system. However, most existing commonly-used statistic packages for genetic studies are non-parallel versions. Alternatively, one may use the cutting-edge technology of grid computing and its packages to conduct non-parallel genetic statistical packages on a centralized HPC system or distributed computing systems. In this paper, we report the utilization of a queuing scheduler built on the Grid Engine and run on a Rocks Linux cluster for our genetic statistical studies. Analysis of both consecutive and combinational window haplotypes was conducted by the FBAT (Laird et al., 2000) and Unphased (Dudbridge, 2003) programs. The dataset consisted of 26 loci from 277 extended families (1484 persons). Using the Rocks Linux cluster with 22 compute-nodes, FBAT jobs performed about 14.4-15.9 times faster, while Unphased jobs performed 1.1-18.6 times faster compared to the accumulated computation duration. Execution of exhaustive haplotype analysis using non-parallel software packages on a Linux-based system is an effective and efficient approach in terms of cost and performance.
Application of the Linux cluster for exhaustive window haplotype analysis using the FBAT and Unphased programs

PubMed Central

Mishima, Hiroyuki; Lidral, Andrew C; Ni, Jun

2008-01-01

Background Genetic association studies have been used to map disease-causing genes. A newly introduced statistical method, called exhaustive haplotype association study, analyzes genetic information consisting of different numbers and combinations of DNA sequence variations along a chromosome. Such studies involve a large number of statistical calculations and subsequently high computing power. It is possible to develop parallel algorithms and codes to perform the calculations on a high performance computing (HPC) system. However, most existing commonly-used statistic packages for genetic studies are non-parallel versions. Alternatively, one may use the cutting-edge technology of grid computing and its packages to conduct non-parallel genetic statistical packages on a centralized HPC system or distributed computing systems. In this paper, we report the utilization of a queuing scheduler built on the Grid Engine and run on a Rocks Linux cluster for our genetic statistical studies. Results Analysis of both consecutive and combinational window haplotypes was conducted by the FBAT (Laird et al., 2000) and Unphased (Dudbridge, 2003) programs. The dataset consisted of 26 loci from 277 extended families (1484 persons). Using the Rocks Linux cluster with 22 compute-nodes, FBAT jobs performed about 14.4–15.9 times faster, while Unphased jobs performed 1.1–18.6 times faster compared to the accumulated computation duration. Conclusion Execution of exhaustive haplotype analysis using non-parallel software packages on a Linux-based system is an effective and efficient approach in terms of cost and performance. PMID:18541045
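To give a sense of why exhaustive window analysis is computationally heavy, the sketch below counts the windows for the 26 loci mentioned in the abstract. It assumes that "consecutive" windows are contiguous runs of loci and that "combinational" windows are arbitrary subsets of a fixed size. The function names are hypothetical and are not part of FBAT or Unphased.

```python
from itertools import combinations

def consecutive_windows(n_loci):
    """Yield (start, end) index pairs for every contiguous run of loci."""
    for size in range(1, n_loci + 1):
        for start in range(n_loci - size + 1):
            yield (start, start + size - 1)

def combinational_windows(n_loci, size):
    """Yield every subset of `size` loci, not necessarily adjacent."""
    return combinations(range(n_loci), size)

n_loci = 26
n_consecutive = sum(1 for _ in consecutive_windows(n_loci))
n_triples = sum(1 for _ in combinational_windows(n_loci, 3))
print(n_consecutive)  # n * (n + 1) / 2 contiguous windows
print(n_triples)      # "26 choose 3" three-locus combinations
```

Each window is an independent job, which is why a queuing scheduler can distribute non-parallel programs across compute nodes without modifying them.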
Hydrodynamic fractionation of finite size gold nanoparticle clusters.

PubMed

Tsai, De-Hao; Cho, Tae Joon; DelRio, Frank W; Taurozzi, Julian; Zachariah, Michael R; Hackley, Vincent A

2011-06-15

We demonstrate a high-resolution in situ experimental method for performing simultaneous size classification and characterization of functional gold nanoparticle clusters (GNCs) based on asymmetric-flow field flow fractionation (AFFF). Field emission scanning electron microscopy, atomic force microscopy, multi-angle light scattering (MALS), and in situ ultraviolet-visible optical spectroscopy provide complementary data and imagery confirming the cluster state (e.g., dimer, trimer, tetramer), packing structure, and purity of fractionated populations. An orthogonal analysis of GNC size distributions is obtained using electrospray-differential mobility analysis (ES-DMA). We find a linear correlation between the normalized MALS intensity (measured during AFFF elution) and the corresponding number concentration (measured by ES-DMA), establishing the capacity for AFFF to quantify the absolute number concentration of GNCs. The results and corresponding methodology summarized here provide the proof of concept for general applications involving the formation, isolation, and in situ analysis of both functional and adventitious nanoparticle clusters of finite size. © 2011 American Chemical Society
Structural analysis of the PSD-95 cluster by electron tomography and CEMOVIS: a proposal for the application of the genetically encoded metallothionein tag.

PubMed

Hirabayashi, Ai; Fukunaga, Yuko; Miyazawa, Atsuo

2014-06-01

Postsynaptic density-95 (PSD-95) accumulates at excitatory postsynapses and plays important roles in the clustering and anchoring of numerous proteins at the PSD. However, a detailed ultrastructural analysis of clusters exclusively consisting of PSD-95 has never been performed. Here, we employed a genetically encoded tag, three tandem repeats of metallothionein (3MT), to study the structure of PSD-95 clusters in cells by electron tomography and cryo-electron microscopy of vitreous sections. We also performed conventional transmission electron microscopy (TEM). Cultured hippocampal neurons expressing a fusion protein of PSD-95 coupled to 3MT (PSD-95-3MT) were incubated with CdCl2 to result in the formation of Cd-bound PSD-95-3MT. Two types of electron-dense deposits composed of Cd-bound PSD-95-3MT were observed in these cells by TEM, as reported previously. Electron tomography revealed the presence of membrane-shaped structures representing PSD-95 clusters at the PSD and an ellipsoidal structure located in the non-synaptic cytoplasm. By TEM, the PSD-95 clusters appeared to be composed of a number of dense cores. In frozen hydrated sections, these dense cores were also found beneath the postsynaptic membrane. Taken together, our findings suggest that dense cores of PSD-95 aggregate to form the larger clusters present in the PSD and the non-synaptic cytoplasm. © The Author 2014. Published by Oxford University Press on behalf of The Japanese Society of Microscopy. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Spike sorting using locality preserving projection with gap statistics and landmark-based spectral clustering.

PubMed

Nguyen, Thanh; Khosravi, Abbas; Creighton, Douglas; Nahavandi, Saeid

2014-12-30

Understanding neural functions requires knowledge from analysing electrophysiological data. The process of assigning spikes of a multichannel signal into clusters, called spike sorting, is one of the important problems in such analysis. There have been various automated spike sorting techniques with both advantages and disadvantages regarding accuracy and computational costs. Therefore, developing spike sorting methods that are highly accurate and computationally inexpensive is always a challenge in biomedical engineering practice. An automatic unsupervised spike sorting method is proposed in this paper. The method uses features extracted by the locality preserving projection (LPP) algorithm. These features afterwards serve as inputs for the landmark-based spectral clustering (LSC) method. Gap statistics (GS) is employed to evaluate the number of clusters before the LSC can be performed. The proposed LPP-LSC is a highly accurate and computationally inexpensive spike sorting approach. LPP spike features are very discriminative, thereby boosting the performance of clustering methods. Furthermore, the LSC method exhibits its efficiency when integrated with the cluster evaluator GS. The proposed method's accuracy is approximately 13% superior to that of the benchmark combination of wavelet transformation and superparamagnetic clustering (WT-SPC). Additionally, LPP-LSC computing time is six times less than that of the WT-SPC. LPP-LSC demonstrates a win-win spike sorting solution meeting both accuracy and computational cost criteria. LPP and LSC are linear algorithms that help reduce computational burden and thus their combination can be applied to real-time spike analysis. Copyright © 2014 Elsevier B.V. All rights reserved.
Replicating cluster subtypes for the prevention of adolescent smoking and alcohol use.

PubMed

Babbin, Steven F; Velicer, Wayne F; Paiva, Andrea L; Brick, Leslie Ann D; Redding, Colleen A

2015-01-01

Substance abuse interventions tailored to the individual level have produced effective outcomes for a wide variety of behaviors. One approach to enhancing tailoring involves using cluster analysis to identify prevention subtypes that represent different attitudes about substance use. This study applied this approach to better understand tailored interventions for smoking and alcohol prevention. Analyses were performed on a sample of sixth graders from 20 New England middle schools involved in a 36-month tailored intervention study. Most adolescents reported being in the Acquisition Precontemplation (aPC) stage at baseline: not smoking or not drinking and not planning to start in the next six months. For smoking (N=4059) and alcohol (N=3973), each sample was randomly split into five subsamples. Cluster analysis was performed within each subsample based on three variables: Pros and Cons (from Decisional Balance Scales), and Situational Temptations. Across all subsamples for both smoking and alcohol, the following four clusters were identified: (1) Most Protected (MP; low Pros, high Cons, low Temptations); (2) Ambivalent (AM; high Pros, average Cons and Temptations); (3) Risk Denial (RD; average Pros, low Cons, average Temptations); and (4) High Risk (HR; high Pros, low Cons, and very high Temptations). Finding the same four clusters within aPC for both smoking and alcohol, replicating the results across the five subsamples, and demonstrating hypothesized relations among the clusters with additional external validity analyses provide strong evidence of the robustness of these results. These clusters demonstrate evidence of validity and can provide a basis for tailoring interventions. Copyright © 2014. Published by Elsevier Ltd.
Replicating cluster subtypes for the prevention of adolescent smoking and alcohol use

PubMed Central

Babbin, Steven F.; Velicer, Wayne F.; Paiva, Andrea L.; Brick, Leslie Ann D.; Redding, Colleen A.

2015-01-01

Introduction Substance abuse interventions tailored to the individual level have produced effective outcomes for a wide variety of behaviors. One approach to enhancing tailoring involves using cluster analysis to identify prevention subtypes that represent different attitudes about substance use. This study applied this approach to better understand tailored interventions for smoking and alcohol prevention. Methods Analyses were performed on a sample of sixth graders from 20 New England middle schools involved in a 36-month tailored intervention study. Most adolescents reported being in the Acquisition Precontemplation (aPC) stage at baseline: not smoking or not drinking and not planning to start in the next six months. For smoking (N= 4059) and alcohol (N= 3973), each sample was randomly split into five subsamples. Cluster analysis was performed within each subsample based on three variables: Pros and Cons (from Decisional Balance Scales), and Situational Temptations. Results Across all subsamples for both smoking and alcohol, the following four clusters were identified: (1) Most Protected (MP; low Pros, high Cons, low Temptations); (2) Ambivalent (AM; high Pros, average Cons and Temptations); (3) Risk Denial (RD; average Pros, low Cons, average Temptations); and (4) High Risk (HR; high Pros, low Cons, and very high Temptations). Conclusions Finding the same four clusters within aPC for both smoking and alcohol, replicating the results across the five subsamples, and demonstrating hypothesized relations among the clusters with additional external validity analyses provide strong evidence of the robustness of these results. These clusters demonstrate evidence of validity and can provide a basis for tailoring interventions. PMID:25222849
Cluster Analysis of Acute Care Use Yields Insights for Tailored Pediatric Asthma Interventions.

PubMed

Abir, Mahshid; Truchil, Aaron; Wiest, Dawn; Nelson, Daniel B; Goldstick, Jason E; Koegel, Paul; Lozon, Marie M; Choi, Hwajung; Brenner, Jeffrey

2017-09-01

We undertake this study to understand patterns of pediatric asthma-related acute care use to inform interventions aimed at reducing potentially avoidable hospitalizations. Hospital claims data from 3 Camden city facilities for 2010 to 2014 were used to perform cluster analysis classifying patients aged 0 to 17 years according to their asthma-related hospital use. Clusters were based on 2 variables: asthma-related ED visits and hospitalizations. Demographics and a number of sociobehavioral and use characteristics were compared across clusters. Children who met the criteria (3,170) were included in the analysis. An examination of a scree plot showing the decline in within-cluster heterogeneity as the number of clusters increased confirmed that clusters of pediatric asthma patients according to hospital use exist in the data. Five clusters of patients with distinct asthma-related acute care use patterns were observed. Cluster 1 (62% of patients) showed the lowest rates of acute care use. These patients were least likely to have a mental health-related diagnosis, were less likely to have visited multiple facilities, and had no hospitalizations for asthma. Cluster 2 (19% of patients) had a low number of asthma ED visits and onetime hospitalization. Cluster 3 (11% of patients) had a high number of ED visits and low hospitalization rates, and the highest rates of multiple facility use. Cluster 4 (7% of patients) had moderate ED use for both asthma and other illnesses, and high rates of asthma hospitalizations; nearly one quarter received care at all facilities, and 1 in 10 had a mental health diagnosis. Cluster 5 (1% of patients) had extreme rates of acute care use. Differences observed between groups across multiple sociobehavioral factors suggest these clusters may represent children who differ along multiple dimensions, in addition to patterns of service use, with implications for tailored interventions. Copyright © 2017 American College of Emergency Physicians. Published by Elsevier Inc. All rights reserved.
Emergy-based comparative analysis on industrial clusters: economic and technological development zone of Shenyang area, China.

PubMed

Liu, Zhe; Geng, Yong; Zhang, Pan; Dong, Huijuan; Liu, Zuoxi

2014-09-01

In China, local governments of many areas prefer to give priority to the development of heavy industrial clusters in pursuit of high gross domestic product (GDP) growth to get political achievements, which usually results in higher costs from ecological degradation and environmental pollution. Therefore, effective methods and a reasonable evaluation system are urgently needed to evaluate the overall efficiency of industrial clusters. The emergy method links economic and ecological systems together and can evaluate the contribution of ecological products and services as well as the load placed on environmental systems. This method has been successfully applied in many case studies of ecosystems but seldom to industrial clusters. This study applied the methodology of emergy analysis to assess the efficiency of industrial clusters through a series of emergy-based indices as well as the proposed indicators. A case study of Shenyang Economic Technological Development Area (SETDA) was investigated to show the emergy method's practical potential to evaluate industrial clusters to inform environmental policy making. The results of our study showed that the industrial cluster of electric equipment and electronic manufacturing produced the most economic value and had the highest efficiency of energy utilization among the four industrial clusters. However, the sustainability index of the industrial cluster of food and beverage processing was better than the other industrial clusters.
Efficacy of GPS cluster analysis for predicting carnivory sites of a wide-ranging omnivore: the American black bear

USGS Publications Warehouse

Kindschuh, Sarah R.; Cain, James W.; Daniel, David; Peyton, Mark A.

2016-01-01

The capacity to describe and quantify predation by large carnivores expanded considerably with the advent of GPS technology. Analyzing clusters of GPS locations formed by carnivores facilitates the detection of predation events by identifying characteristics which distinguish predation sites. We present a performance assessment of GPS cluster analysis as applied to the predation and scavenging of an omnivore, the American black bear (Ursus americanus), on ungulate prey and carrion. Through field investigations of 6854 GPS locations from 24 individual bears, we identified 54 sites where black bears formed a cluster of locations while predating or scavenging elk (Cervus elaphus), mule deer (Odocoileus hemionus), or cattle (Bos spp.). We developed models for three data sets to predict whether a GPS cluster was formed at a carnivory site vs. a non-carnivory site (e.g., bed sites or non-ungulate foraging sites). Two full-season data sets contained GPS locations logged at either 3-h or 30-min intervals from April to November, and a third data set contained 30-min interval data from April through July corresponding to the calving period for elk. Longer fix intervals resulted in the detection of fewer carnivory sites. Clusters were more likely to be carnivory sites if they occurred in open or edge habitats, if they occurred in the early season, if the mean distance between all pairs of GPS locations within the cluster was less, and if the cluster endured for a longer period of time. Clusters were less likely to be carnivory sites if they were initiated in the morning or night compared to the day. The top models for each data set performed well and successfully predicted 71–96% of field-verified carnivory events, 55–75% of non–carnivory events, and 58–76% of clusters overall. Refinement of this method will benefit from further application across species and ecological systems.
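Two of the cluster attributes named in the abstract above, the mean distance between all pairs of GPS locations and how long the cluster endured, are simple to compute. The sketch below is an illustration only: it assumes locations are already projected to planar coordinates in metres (for example UTM) and that fix times are given in hours, and the function names are hypothetical rather than taken from the study.

```python
from itertools import combinations
from math import hypot

def mean_pairwise_distance(points):
    """Mean Euclidean distance over all pairs of (x, y) locations, in metres."""
    pairs = list(combinations(points, 2))
    if not pairs:
        return 0.0  # a single location has no pairs
    total = sum(hypot(x1 - x2, y1 - y2) for (x1, y1), (x2, y2) in pairs)
    return total / len(pairs)

def cluster_duration_hours(fix_times_h):
    """Elapsed time between the first and last fix in the cluster."""
    return max(fix_times_h) - min(fix_times_h)

# Hypothetical cluster of three fixes logged at 30-min intervals.
locations = [(0.0, 0.0), (3.0, 4.0), (0.0, 8.0)]
fix_times = [10.0, 10.5, 11.0]
print(mean_pairwise_distance(locations))
print(cluster_duration_hours(fix_times))
```

Features like these would then feed a classifier that separates carnivory from non-carnivory clusters; the abstract reports that tighter and longer-lasting clusters were more likely to be carnivory sites.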
FLOCK cluster analysis of mast cell event clustering by high-sensitivity flow cytometry predicts systemic mastocytosis.

PubMed

Dorfman, David M; LaPlante, Charlotte D; Pozdnyakova, Olga; Li, Betty

2015-11-01

In our high-sensitivity flow cytometric approach for systemic mastocytosis (SM), we identified mast cell event clustering as a new diagnostic criterion for the disease. To objectively characterize mast cell gated event distributions, we performed cluster analysis using FLOCK, a computational approach to identify cell subsets in multidimensional flow cytometry data in an unbiased, automated fashion. FLOCK identified discrete mast cell populations in most cases of SM (56/75 [75%]) but only a minority of non-SM cases (17/124 [14%]). FLOCK-identified mast cell populations accounted for 2.46% of total cells on average in SM cases and 0.09% of total cells on average in non-SM cases (P < .0001) and were predictive of SM, with a sensitivity of 75%, a specificity of 86%, a positive predictive value of 76%, and a negative predictive value of 85%. FLOCK analysis provides useful diagnostic information for evaluating patients with suspected SM, and may be useful for the analysis of other hematopoietic neoplasms. Copyright© by the American Society for Clinical Pathology.
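The diagnostic performance figures in the abstract above follow from the two proportions it reports. As a worked check, the sketch below builds the 2x2 table from those counts (56 of 75 SM cases and 17 of 124 non-SM cases with FLOCK-identified populations) and applies the standard definitions. The function name is a hypothetical helper, not part of FLOCK.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Standard 2x2 diagnostic measures from true/false positives and negatives."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Counts derived from the abstract: 56/75 SM cases positive, 17/124 non-SM cases positive.
metrics = diagnostic_metrics(tp=56, fn=75 - 56, fp=17, tn=124 - 17)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

The computed values land within about a percentage point of the reported 75%, 86%, 76%, and 85%; small differences of this kind are what rounding conventions would produce.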
Multiple goals, motivation and academic learning.

PubMed

Valle, Antonio; Cabanach, Ramón G; Núnez, José C; González-Pienda, Julio; Rodríguez, Susana; Piñeiro, Isabel

2003-03-01

The type of academic goals pursued by students is one of the most important variables in motivational research in educational contexts. Although motivational theory and research have emphasised the somewhat exclusive nature of two types of goal orientation (learning goals versus performance goals), some studies (Meece, 1994; Seifert, 1995, 1996) have shown that the two kinds of goals are relatively complementary and that it is possible for students to have multiple goals simultaneously, which guarantees some flexibility to adapt more efficaciously to various contexts and learning situations. The principal aim of this study is to determine the academic goals pursued by university students and to analyse the differences in several very significant variables related to motivation and academic learning. Participants were 609 university students (74% women and 26% men) who filled in several questionnaires about the variables under study. We used cluster analysis ('quick cluster analysis' method) to establish the different groups or clusters of individuals as a function of the three types of goals (learning goals, performance goals, and social reinforcement goals). By means of MANOVA, we determined whether the groups or clusters identified were significantly different in the variables that are relevant to motivation and academic learning. Lastly, we performed ANOVA on the variables that revealed significant effects in the previous analysis. Using cluster analysis, three groups of students with different motivational orientations were identified: a group with predominance of performance goals (Group PG: n = 230), a group with predominance of multiple goals (Group MG: n = 238), and a group with predominance of learning goals (Group LG: n = 141). 
Groups MG and LG attributed their success more to ability, they had higher perceived ability, they took task characteristics into account when planning which strategies to use in the learning process, they showed higher persistence, and used more deep learning strategies than did the students with predominance of performance goals (Group PG). On the other hand, Groups MG and PG took the evaluation criteria more into account when deciding which strategies to use in order to learn, and they attributed their failures more to luck than did Group LG. Students from Group MG attributed their success more to effort than did the other two groups and they attained higher achievement than Group PG. Group LG tended to attribute their failures more to lack of effort than did the other two groups.
Simulations of the Formation and Evolution of X-ray Clusters

NASA Astrophysics Data System (ADS)

Bryan, G. L.; Klypin, A.; Norman, M. L.

1994-05-01

We describe results from a set of Omega = 1 Cold plus Hot Dark Matter (CHDM) and Cold Dark Matter (CDM) simulations. We examine the formation and evolution of X-ray clusters in a cosmological setting with sufficient numbers to perform statistical analysis. We find that CDM, normalized to COBE, seems to produce too many large clusters, both in terms of the luminosity (dn/dL) and temperature (dn/dT) functions. The CHDM simulation produces fewer clusters and the temperature distribution (our numerically most secure result) matches observations where they overlap. The computed cluster luminosity function drops below observations, but we are almost surely underestimating the X-ray luminosity. Because of the lower fluctuations in CHDM, there are only a small number of bright clusters in our simulation volume; however we can use the simulated clusters to fix the relation between temperature and velocity dispersion, allowing us to use collisionless N-body codes to probe larger length scales with correspondingly brighter clusters. The hydrodynamic simulations have been performed with a hybrid particle-mesh scheme for the dark matter and a high resolution grid-based piecewise parabolic method for the adiabatic gas dynamics. This combination has been implemented for massively parallel computers, allowing us to achieve grids as large as 512^3.
Spatiotemporal Analysis of Corn Phenoregions in the Continental United States

NASA Astrophysics Data System (ADS)

Konduri, V. S.; Kumar, J.; Hoffman, F. M.; Ganguly, A. R.; Hargrove, W. W.

2017-12-01

The delineation of regions exhibiting similar crop performance has potential benefits for agricultural planning and management, policymaking and natural resource conservation. Studies of natural ecosystems have used multivariate clustering algorithms based on environmental characteristics to identify ecoregions for species range prediction and habitat conservation. However, few studies have used clustering to delineate regions based on crop phenology. The aim of this study was to perform a spatiotemporal analysis of phenologically self-similar clusters, or phenoregions, for the major corn growing areas in the Continental United States (CONUS) for the period 2008-2016. Annual trajectories of remotely sensed normalized difference vegetation index (NDVI), a useful proxy for land surface phenology, derived from Moderate Resolution Imaging Spectroradiometer (MODIS) instruments at 8-day intervals and 250 m resolution were used as the phenological metric. Because of the large data volumes involved, the phenoregion delineation was performed using a highly scalable, unsupervised clustering technique with the help of high performance computing. These phenoregions capture the spatial variability in the timing of important crop phenological stages (like emergence and maturity dates) and thus could be used to develop more accurate parameterizations for crop models applied at regional to global scales. Moreover, historical crop performance from phenoregions, in combination with climate and soils data, could be used to improve production forecasts. The temporal variability in NDVI at each location could also be used to develop an early warning system to identify locations where the crop deviates from its expected phenological behavior. Such deviations may indicate a need for irrigation or fertilization or suggest where pest outbreaks or other disturbances have occurred.

Electrical Load Profile Analysis Using Clustering Techniques

NASA Astrophysics Data System (ADS)

Damayanti, R.; Abdullah, A. G.; Purnama, W.; Nandiyanto, A. B. D.

2017-03-01

Data mining is one of the data processing techniques used to extract information from a set of stored data. Every day, electricity load consumption is recorded by the electric company, usually at intervals of 15 or 30 minutes. This paper uses clustering, a data mining technique, to analyse the electrical load profiles during 2014. Three clustering methods were compared, namely K-Means (KM), Fuzzy C-Means (FCM), and K-Harmonic Means (KHM). The results show that KHM is the most appropriate method to classify the electrical load profile. The optimum number of clusters is determined using the Davies-Bouldin Index. By grouping the load profiles, demand variation analysis and estimation of energy loss can be done for each group of load profiles with a similar pattern. From each group of load profiles, the cluster load factor and a range of cluster loss factors can be obtained, which can help to find the range of coefficient values for estimating energy loss without performing load flow studies.
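The Davies-Bouldin Index mentioned in the abstract above scores a clustering by comparing within-cluster scatter with the separation between cluster centroids; lower values indicate more compact, better separated clusters. The following minimal sketch implements the standard definition in plain Python. It is not the authors' code, the function names are hypothetical, and it assumes at least two clusters with distinct centroids.

```python
from math import dist

def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def davies_bouldin(points, labels):
    """Davies-Bouldin index for a labelled set of points (lower is better)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    centers = {lab: centroid(pts) for lab, pts in clusters.items()}
    # Scatter: mean distance of each cluster's members to its own centroid.
    scatter = {
        lab: sum(dist(p, centers[lab]) for p in pts) / len(pts)
        for lab, pts in clusters.items()
    }
    total = 0.0
    for i in clusters:
        # Worst-case similarity of cluster i to any other cluster j.
        total += max(
            (scatter[i] + scatter[j]) / dist(centers[i], centers[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)

# Toy data: two tight groups far apart, scored under a good and a poor labelling.
pts = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
db_good = davies_bouldin(pts, [0, 0, 1, 1])
db_poor = davies_bouldin(pts, [0, 1, 0, 1])
print(db_good, db_poor)
```

To choose a number of clusters, one would run the clustering method for several candidate values of k and keep the k with the lowest index.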
Technological innovation in neurosurgery: a quantitative study.

PubMed

Marcus, Hani J; Hughes-Hallett, Archie; Kwasnicki, Richard M; Darzi, Ara; Yang, Guang-Zhong; Nandi, Dipankar

2015-07-01

Technological innovation within health care may be defined as the introduction of a new technology that initiates a change in clinical practice. Neurosurgery is a particularly technology-intensive surgical discipline, and new technologies have preceded many of the major advances in operative neurosurgical techniques. The aim of the present study was to quantitatively evaluate technological innovation in neurosurgery using patents and peer-reviewed publications as metrics of technology development and clinical translation, respectively. The authors searched a patent database for articles published between 1960 and 2010 using the Boolean search term "neurosurgeon OR neurosurgical OR neurosurgery." The top 50 performing patent codes were then grouped into technology clusters. Patent and publication growth curves were then generated for these technology clusters. A top-performing technology cluster was then selected as an exemplar for a more detailed analysis of individual patents. In all, 11,672 patents and 208,203 publications related to neurosurgery were identified. The top-performing technology clusters during these 50 years were image-guidance devices, clinical neurophysiology devices, neuromodulation devices, operating microscopes, and endoscopes. In relation to image-guidance and neuromodulation devices, the authors found a highly correlated rapid rise in the numbers of patents and publications, which suggests that these are areas of technology expansion. An in-depth analysis of neuromodulation-device patents revealed that the majority of well-performing patents were related to deep brain stimulation. Patent and publication data may be used to quantitatively evaluate technological innovation in neurosurgery.
A Highly Efficient Design Strategy for Regression with Outcome Pooling

PubMed Central

Mitchell, Emily M.; Lyles, Robert H.; Manatunga, Amita K.; Perkins, Neil J.; Schisterman, Enrique F.

2014-01-01

The potential for research involving biospecimens can be hindered by the prohibitive cost of performing laboratory assays on individual samples. To mitigate this cost, strategies such as randomly selecting a portion of specimens for analysis or randomly pooling specimens prior to performing laboratory assays may be employed. These techniques, while effective in reducing cost, are often accompanied by a considerable loss of statistical efficiency. We propose a novel pooling strategy based on the k-means clustering algorithm to reduce laboratory costs while maintaining a high level of statistical efficiency when predictor variables are measured on all subjects, but the outcome of interest is assessed in pools. We perform simulations motivated by the BioCycle study to compare this k-means pooling strategy with current pooling and selection techniques under simple and multiple linear regression models. While all of the methods considered produce unbiased estimates and confidence intervals with appropriate coverage, pooling under k-means clustering provides the most precise estimates, closely approximating results from the full data and losing minimal precision as the total number of pools decreases. The benefits of k-means clustering evident in the simulation study are then applied to an analysis of the BioCycle dataset. In conclusion, when the number of lab tests is limited by budget, pooling specimens based on k-means clustering prior to performing lab assays can be an effective way to save money with minimal information loss in a regression setting. PMID:25220822
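The pooling idea in the abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a single predictor, treats each pool's assay as the average of its members' outcomes, and fits the pooled regression by weighted least squares with pool sizes as weights. The names `kmeans_1d` and `pooled_wls` are hypothetical.

```python
import random

def kmeans_1d(xs, k, iters=50, seed=0):
    """Plain Lloyd's algorithm on one predictor; returns a pool label per subject."""
    rng = random.Random(seed)
    centers = rng.sample(xs, k)  # initial centers drawn from the data
    labels = [0] * len(xs)
    for _ in range(iters):
        # Assign every subject to its nearest center.
        labels = [min(range(k), key=lambda j: (x - centers[j]) ** 2) for x in xs]
        # Move each center to the mean of its members (empty pools keep their center).
        for j in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels

def pooled_wls(xs, ys, labels, k):
    """Regress pool-mean outcome on pool-mean predictor, weighting by pool size.

    Assumption: a pooled assay returns the average outcome of the pool members.
    Returns (intercept, slope).
    """
    pools = []
    for j in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == j]
        if idx:
            n = len(idx)
            pools.append((n, sum(xs[i] for i in idx) / n, sum(ys[i] for i in idx) / n))
    w = sum(n for n, _, _ in pools)
    xbar = sum(n * x for n, x, _ in pools) / w
    ybar = sum(n * y for n, _, y in pools) / w
    sxx = sum(n * (x - xbar) ** 2 for n, x, _ in pools)
    sxy = sum(n * (x - xbar) * (y - ybar) for n, x, y in pools)
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Simulated example (made-up numbers): 200 subjects, 10 pools, so 10 assays instead of 200.
rng = random.Random(1)
x_all = [rng.uniform(0.0, 10.0) for _ in range(200)]
y_all = [2.0 + 3.0 * x + rng.gauss(0.0, 1.0) for x in x_all]
pool_labels = kmeans_1d(x_all, 10)
print(pooled_wls(x_all, y_all, pool_labels, 10))
```

Grouping subjects with similar predictor values keeps the pool means spread out along the predictor axis, which is why this kind of pooling can retain more precision than random pooling.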
A highly efficient design strategy for regression with outcome pooling.

PubMed

Mitchell, Emily M; Lyles, Robert H; Manatunga, Amita K; Perkins, Neil J; Schisterman, Enrique F

2014-12-10

The potential for research involving biospecimens can be hindered by the prohibitive cost of performing laboratory assays on individual samples. To mitigate this cost, strategies such as randomly selecting a portion of specimens for analysis or randomly pooling specimens prior to performing laboratory assays may be employed. These techniques, while effective in reducing cost, are often accompanied by a considerable loss of statistical efficiency. We propose a novel pooling strategy based on the k-means clustering algorithm to reduce laboratory costs while maintaining a high level of statistical efficiency when predictor variables are measured on all subjects, but the outcome of interest is assessed in pools. We perform simulations motivated by the BioCycle study to compare this k-means pooling strategy with current pooling and selection techniques under simple and multiple linear regression models. While all of the methods considered produce unbiased estimates and confidence intervals with appropriate coverage, pooling under k-means clustering provides the most precise estimates, closely approximating results from the full data and losing minimal precision as the total number of pools decreases. The benefits of k-means clustering evident in the simulation study are then applied to an analysis of the BioCycle dataset. In conclusion, when the number of lab tests is limited by budget, pooling specimens based on k-means clustering prior to performing lab assays can be an effective way to save money with minimal information loss in a regression setting. Copyright © 2014 John Wiley & Sons, Ltd.
Response of Human Skin to Aesthetic Scarification

PubMed Central

Gabriel, Vincent A.; McClellan, Elizabeth A.; Scheuermann, Richard H.

2014-01-01

This study was undertaken to investigate changes in RNA expression in previously healthy adult human skin following thermal injury induced by contact with hot metal that was undertaken as part of aesthetic scarification, a body modification practice. Subjects were recruited to have pre-injury skin and serial wound biopsies performed. 4 mm punch biopsies were taken prior to branding and 1 hour, 1 week, and 1, 2 and 3 months post injury. RNA was extracted and quality assured prior to the use of a whole-genome based bead array platform to describe expression changes in the samples using the pre-injury skin as a comparator. Analysis of the array data was performed using k-means clustering and a hypergeometric probability distribution without replacement, and corrections for multiple comparisons were done. Confirmatory q-PCR was performed. Using a k of 10, several clusters of genes were shown to co-cluster based on Gene Ontology classification with probabilities unlikely to occur by chance alone. Of particular interest were clusters relating to cell cycle, proteinaceous extracellular matrix and keratinization. Given the consistent expression changes at one week following injury in the cell cycle cluster, there is an opportunity to intervene early following burn injury to influence scar development. PMID:24582755
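The hypergeometric test mentioned in the abstract above asks how likely it is that a gene cluster contains at least the observed number of genes with a given Gene Ontology annotation when genes are drawn without replacement. The sketch below is illustrative only: the function names are hypothetical, the counts are made up, and the Bonferroni correction is an assumption, since the abstract does not say which multiple-comparison correction was used.

```python
from math import comb

def hypergeom_upper_tail(N, K, n, k):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which carry the annotation of interest."""
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

def bonferroni(p_values):
    """Bonferroni adjustment (assumed here; the source does not name its correction)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Made-up example: 20000 genes on the array, 400 annotated with a term,
# a cluster of 150 genes containing 12 annotated genes.
p = hypergeom_upper_tail(20000, 400, 150, 12)
print(p, bonferroni([p, 0.2, 0.04]))
```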
Hierarchical Aligned Cluster Analysis for Temporal Clustering of Human Motion.

PubMed

Zhou, Feng; De la Torre, Fernando; Hodgins, Jessica K

2013-03-01

Temporal segmentation of human motion into plausible motion primitives is central to understanding and building computational models of human motion. Several issues contribute to the challenge of discovering motion primitives: the exponential nature of all possible movement combinations, the variability in the temporal scale of human actions, and the complexity of representing articulated motion. We pose the problem of learning motion primitives as one of temporal clustering, and derive an unsupervised hierarchical bottom-up framework called hierarchical aligned cluster analysis (HACA). HACA finds a partition of a given multidimensional time series into m disjoint segments such that each segment belongs to one of k clusters. HACA combines kernel k-means with the generalized dynamic time alignment kernel to cluster time series data. Moreover, it provides a natural framework to find a low-dimensional embedding for time series. HACA is efficiently optimized with a coordinate descent strategy and dynamic programming. Experimental results on motion capture and video data demonstrate the effectiveness of HACA for segmenting complex motions and as a visualization tool. We also compare the performance of HACA to state-of-the-art algorithms for temporal clustering on data of a honey bee dance. The HACA code is available online.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Hadgu, Teklu; Appel, Gordon John

Sandia National Laboratories (SNL) continued evaluation of total system performance assessment (TSPA) computing systems for the previously considered Yucca Mountain Project (YMP). This was done to maintain the operational readiness of the computing infrastructure (computer hardware and software) and knowledge capability for total system performance assessment (TSPA) type analysis, as directed by the National Nuclear Security Administration (NNSA), DOE 2010. This work is a continuation of the ongoing readiness evaluation reported in Lee and Hadgu (2014) and Hadgu et al. (2015). The TSPA computing hardware (CL2014) and storage system described in Hadgu et al. (2015) were used for the current analysis. One floating license of GoldSim with Versions 9.60.300, 10.5 and 11.1.6 was installed on the cluster head node, and its distributed processing capability was mapped on the cluster processors. Other supporting software were tested and installed to support the TSPA-type analysis on the server cluster. The current tasks included verification of the TSPA-LA uncertainty and sensitivity analyses, and preliminary upgrade of the TSPA-LA from Version 9.60.300 to the latest version 11.1. All the TSPA-LA uncertainty and sensitivity analyses modeling cases were successfully tested and verified for the model reproducibility on the upgraded 2014 server cluster (CL2014). The uncertainty and sensitivity analyses used TSPA-LA modeling cases output generated in FY15 based on GoldSim Version 9.60.300 documented in Hadgu et al. (2015). The model upgrade task successfully converted the Nominal Modeling case to GoldSim Version 11.1. Upgrade of the remaining modeling cases and distributed processing tasks will continue. The 2014 server cluster and supporting software systems are fully operational to support TSPA-LA type analysis.
Chemical Fingerprint and Quantitative Analysis for the Quality Evaluation of Platycladi cacumen by Ultra-performance Liquid Chromatography Coupled with Hierarchical Cluster Analysis.

PubMed

Shan, Mingqiu; Li, Sam Fong Yau; Yu, Sheng; Qian, Yan; Guo, Shuchen; Zhang, Li; Ding, Anwei

2018-01-01

Platycladi cacumen (dried twigs and leaves of Platycladus orientalis (L.) Franco) is a frequently utilized Chinese medicinal herb. To evaluate the quality of the phytomedicine, an ultra-performance liquid chromatographic method with diode array detection was established for chemical fingerprinting and quantitative analysis. In this study, 27 batches of P. cacumen from different regions were collected for analysis. A chemical fingerprint with 20 common peaks was obtained using Similarity Evaluation System for Chromatographic Fingerprint of Traditional Chinese Medicine (Version 2004A). Among these 20 components, seven flavonoids (myricitrin, isoquercitrin, quercitrin, afzelin, cupressuflavone, amentoflavone and hinokiflavone) were identified and determined simultaneously. In the method validation, the seven analytes showed good regressions (R ≥ 0.9995) within linear ranges and good recoveries from 96.4% to 103.3%. Furthermore, with the contents of these seven flavonoids, hierarchical clustering analysis was applied to distinguish the 27 batches into five groups. The chemometric results showed that these groups were almost consistent with geographical positions and climatic conditions of the production regions. Integrating fingerprint analysis, simultaneous determination and hierarchical clustering analysis, the established method is rapid, sensitive, accurate and readily applicable, and also provides a significant foundation for efficient quality control of P. cacumen. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Neuropsychological phenotypes among men with and without HIV disease in the multicenter AIDS cohort study.

PubMed

Molsberry, Samantha A; Cheng, Yu; Kingsley, Lawrence; Jacobson, Lisa; Levine, Andrew J; Martin, Eileen; Miller, Eric N; Munro, Cynthia A; Ragin, Ann; Sacktor, Ned; Becker, James T

2018-05-11

Mild forms of HIV-associated neurocognitive disorder (HAND) remain prevalent in the combination anti-retroviral therapy (cART) era. This study's objective was to identify neuropsychological subgroups within the Multicenter AIDS Cohort Study (MACS) based on the participant-based latent structure of cognitive function and to identify factors associated with subgroups. The MACS is a four-site longitudinal study of the natural and treated history of HIV disease among gay and bisexual men. Using neuropsychological domain scores we used a cluster variable selection algorithm to identify the optimal subset of domains with cluster information. Latent profile analysis was applied using scores from identified domains. Exploratory and post-hoc analyses were conducted to identify factors associated with cluster membership and the drivers of the observed associations. Cluster variable selection identified all domains as containing cluster information except for Working Memory. A three-profile solution produced the best fit for the data. Profile 1 performed below average on all domains, Profile 2 performed average on executive functioning, motor, and speed and below average on learning and memory, Profile 3 performed at or above average across all domains. Several demographic, cognitive, and social factors were associated with profile membership; these associations were driven by differences between Profile 1 and the other profiles. There is an identifiable pattern of neuropsychological performance among MACS members determined by all domains except Working Memory. Neither HIV nor HIV-related biomarkers were related with cluster membership, consistent with other findings that cognitive performance patterns do not map directly onto HIV serostatus.
Metrics and methods for characterizing dairy farm intensification using farm survey data.

PubMed

Gonzalez-Mejia, Alejandra; Styles, David; Wilson, Paul; Gibbons, James

2018-01-01

Evaluation of agricultural intensification requires comprehensive analysis of trends in farm performance across physical and socio-economic aspects, which may diverge across farm types. Typical reporting of economic indicators at sectorial or the "average farm" level does not represent farm diversity and provides limited insight into the sustainability of specific intensification pathways. Using farm business data from a total of 7281 farm survey observations of English and Welsh dairy farms over a 14-year period we calculate a time series of 16 key performance indicators (KPIs) pertinent to farm structure, environmental and socio-economic aspects of sustainability. We then apply principal component analysis and model-based clustering analysis to identify statistically the number of distinct dairy farm typologies for each year of study, and link these clusters through time using multidimensional scaling. Between 2001 and 2014, dairy farms have largely consolidated and specialized into two distinct clusters: more extensive farms relying predominantly on grass, with lower milk yields but higher labour intensity, and more intensive farms producing more milk per cow with more concentrate and more maize, but lower labour intensity. There is some indication that these clusters are converging as the extensive cluster is intensifying slightly faster than the intensive cluster, in terms of milk yield per cow and use of concentrate feed. In 2014, annual milk yields were 6,835 and 7,500 l/cow for extensive and intensive farm types, respectively, whilst annual concentrate feed use was 1.3 and 1.5 tonnes per cow. For several KPIs such as milk yield the mean trend across all farms differed substantially from the extensive and intensive typologies mean. The indicators and analysis methodology developed allow identification of distinct farm types and industry trends using readily available survey data. The identified groups allow the accurate evaluation of the consequences of the reduction in dairy farm numbers and intensification at national and international scales.
Metrics and methods for characterizing dairy farm intensification using farm survey data

PubMed Central

Gonzalez-Mejia, Alejandra; Styles, David; Wilson, Paul

2018-01-01

Evaluation of agricultural intensification requires comprehensive analysis of trends in farm performance across physical and socio-economic aspects, which may diverge across farm types. Typical reporting of economic indicators at sectorial or the “average farm” level does not represent farm diversity and provides limited insight into the sustainability of specific intensification pathways. Using farm business data from a total of 7281 farm survey observations of English and Welsh dairy farms over a 14-year period we calculate a time series of 16 key performance indicators (KPIs) pertinent to farm structure, environmental and socio-economic aspects of sustainability. We then apply principal component analysis and model-based clustering analysis to identify statistically the number of distinct dairy farm typologies for each year of study, and link these clusters through time using multidimensional scaling. Between 2001 and 2014, dairy farms have largely consolidated and specialized into two distinct clusters: more extensive farms relying predominantly on grass, with lower milk yields but higher labour intensity, and more intensive farms producing more milk per cow with more concentrate and more maize, but lower labour intensity. There is some indication that these clusters are converging as the extensive cluster is intensifying slightly faster than the intensive cluster, in terms of milk yield per cow and use of concentrate feed. In 2014, annual milk yields were 6,835 and 7,500 l/cow for extensive and intensive farm types, respectively, whilst annual concentrate feed use was 1.3 and 1.5 tonnes per cow. For several KPIs such as milk yield the mean trend across all farms differed substantially from the extensive and intensive typologies mean. The indicators and analysis methodology developed allow identification of distinct farm types and industry trends using readily available survey data. The identified groups allow the accurate evaluation of the consequences of the reduction in dairy farm numbers and intensification at national and international scales. PMID:29742166
Enabling Diverse Software Stacks on Supercomputers using High Performance Virtual Clusters.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Younge, Andrew J.; Pedretti, Kevin; Grant, Ryan

While large-scale simulations have been the hallmark of the High Performance Computing (HPC) community for decades, Large Scale Data Analytics (LSDA) workloads are gaining attention within the scientific community not only as a processing component to large HPC simulations, but also as standalone scientific tools for knowledge discovery. With the path towards Exascale, new HPC runtime systems are also emerging in a way that differs from classical distributed computing models. However, system software for such capabilities on the latest extreme-scale DOE supercomputing needs to be enhanced to more appropriately support these types of emerging software ecosystems. In this paper, we propose the use of Virtual Clusters on advanced supercomputing resources to enable systems to support not only HPC workloads, but also emerging big data stacks. Specifically, we have deployed the KVM hypervisor within Cray's Compute Node Linux on a XC-series supercomputer testbed. We also use libvirt and QEMU to manage and provision VMs directly on compute nodes, leveraging Ethernet-over-Aries network emulation. To our knowledge, this is the first known use of KVM on a true MPP supercomputer. We investigate the overhead of our solution using HPC benchmarks, both evaluating single-node performance as well as weak scaling of a 32-node virtual cluster. Overall, we find single node performance of our solution using KVM on a Cray is very efficient with near-native performance. However, overhead increases by up to 20% as virtual cluster size increases, due to limitations of the Ethernet-over-Aries bridged network. Furthermore, we deploy Apache Spark with large data analysis workloads in a Virtual Cluster, effectively demonstrating how diverse software ecosystems can be supported by High Performance Virtual Clusters.
Spatial cluster for clustering the influence factor of birth and death child in Bogor Regency, West Java

NASA Astrophysics Data System (ADS)

Bekti, Rokhana Dwi; Rachmawati, Ro'fah

2014-03-01

The numbers of child births and deaths are benchmarks for determining and monitoring health and welfare in Indonesia. They can be used to identify groups of people who have a high mortality risk. Identifying such groups is important for comparing the characteristics of people at high and low risk. These characteristics can be seen from the factors that influence them. Factors that influence child births and deaths include economic conditions, health facilities, education, and others. The influencing factors differ for every individual, but individuals who live close together or in nearby locations share similarities, which indicates a spatial effect. To identify groups in this research, clustering was done with a spatial cluster method, which takes into account the influence of location or the relationship between locations. One spatial cluster method is Spatial 'K'luster Analysis by Tree Edge Removal (SKATER). The research was conducted in Bogor Regency, West Java. The goal was to obtain clusters of districts based on the factors that influence child births and deaths. SKATER built four clusters consisting of 26, 7, 2, and 5 districts, respectively. SKATER performs well for clustering that includes a spatial effect. Compared with other clustering methods, k-means also performs well according to a MANOVA test.
Comparison of five cluster validity indices performance in brain [18F]FET-PET image segmentation using k-means.

PubMed

Abualhaj, Bedor; Weng, Guoyang; Ong, Melissa; Attarwala, Ali Asgar; Molina, Flavia; Büsing, Karen; Glatting, Gerhard

2017-01-01

Dynamic [18F]fluoro-ethyl-L-tyrosine positron emission tomography ([18F]FET-PET) is used to identify tumor lesions for radiotherapy treatment planning, to differentiate glioma recurrence from radiation necrosis and to classify glioma grade. To segment different regions in the brain, k-means cluster analysis can be used. The main disadvantage of k-means is that the number of clusters must be pre-defined. In this study, we therefore compared different cluster validity indices for automated and reproducible determination of the optimal number of clusters based on the dynamic PET data. The k-means algorithm was applied to dynamic [18F]FET-PET images of 8 patients. Akaike information criterion (AIC), WB, I, modified Dunn's and Silhouette indices were compared on their ability to determine the optimal number of clusters based on requirements for an adequate cluster validity index. To check the reproducibility of k-means, the coefficients of variation (CVs) of the objective function values (OFVs; sum of squared Euclidean distances within each cluster) were calculated using 100 random centroid initialization replications (RCI100) for 2 to 50 clusters. k-means was performed independently on three neighboring slices containing tumor for each patient to investigate the stability of the optimal number of clusters within them. To check the independence of the validity indices on the number of voxels, cluster analysis was applied after duplication of a slice selected from each patient. CVs of index values were calculated at the optimal number of clusters using RCI100 to investigate the reproducibility of the validity indices. To check if the indices have a single extremum, visual inspection was performed on the replication with minimum OFV from RCI100. The maximum CV of OFVs was 2.7 × 10^-2 from all patients. The optimal number of clusters given by modified Dunn's and Silhouette indices was 2 or 3, leading to a very poor segmentation. WB and I indices suggested a median of 5 (range 4-6) and 4 (range 3-6) clusters, respectively. For WB, I, modified Dunn's and Silhouette validity indices the suggested optimal number of clusters was not affected by the number of the voxels. The maximum coefficients of variation of the WB, I, modified Dunn's, and Silhouette validity indices were 3 × 10^-2, 1, 2 × 10^-1 and 3 × 10^-3, respectively. WB-index showed a single global maximum, whereas the other indices showed also local extrema. From the investigated cluster validity indices, the WB-index is best suited for automated determination of the optimal number of clusters for [18F]FET-PET brain images for the investigated image reconstruction algorithm and the used scanner: it yields meaningful results allowing better differentiation of tissues with higher number of clusters, it is simple, reproducible and has a unique global minimum. © 2016 American Association of Physicists in Medicine.
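Of the validity indices compared in the abstract above, the Silhouette index has a widely used textbook definition that is easy to show. The sketch below follows that standard definition (mean of (b - a) / max(a, b) over all points) and is not taken from the study; the function name and the toy data are hypothetical. It assumes at least two clusters and points that are not all identical.

```python
def silhouette(points, labels):
    """Mean silhouette width for points (tuples of floats) with cluster labels.

    a = mean distance from a point to the other members of its own cluster,
    b = smallest mean distance from the point to the members of another cluster.
    """
    def dist(p, q):
        return sum((u - v) ** 2 for u, v in zip(p, q)) ** 0.5

    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Toy data: two well-separated groups on a line.
toy = [(0.0,), (1.0,), (10.0,), (11.0,)]
print(silhouette(toy, [0, 0, 1, 1]), silhouette(toy, [0, 1, 0, 1]))
```

To pick a number of clusters with such an index, one would run k-means for each candidate k and keep the k with the best index value; the abstract reports that for this PET data the Silhouette index favored only 2 or 3 clusters.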
Weighted similarity-based clustering of chemical structures and bioactivity data in early drug discovery.

PubMed

Perualila-Tan, Nolen Joy; Shkedy, Ziv; Talloen, Willem; Göhlmann, Hinrich W H; Moerbeke, Marijke Van; Kasim, Adetayo

2016-08-01

The modern process of discovering candidate molecules in early drug discovery phase includes a wide range of approaches to extract vital information from the intersection of biology and chemistry. A typical strategy in compound selection involves compound clustering based on chemical similarity to obtain representative chemically diverse compounds (not incorporating potency information). In this paper, we propose an integrative clustering approach that makes use of both biological (compound efficacy) and chemical (structural features) data sources for the purpose of discovering a subset of compounds with aligned structural and biological properties. The datasets are integrated at the similarity level by assigning complementary weights to produce a weighted similarity matrix, serving as a generic input in any clustering algorithm. This new analysis workflow is a semi-supervised method since, after the determination of clusters, a secondary analysis is performed wherein it finds differentially expressed genes associated with the derived integrated cluster(s) to further explain the compound-induced biological effects inside the cell. In this paper, datasets from two drug development oncology projects are used to illustrate the usefulness of the weighted similarity-based clustering approach to integrate multi-source high-dimensional information to aid drug discovery. Compounds that are structurally and biologically similar to the reference compounds are discovered using this proposed integrative approach.
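The integration step described above, combining two similarity matrices with complementary weights, can be written as S = w * S_chem + (1 - w) * S_bio. The sketch below illustrates only that formula; the function name, the weight value and the example matrices are hypothetical, and the abstract does not say how the weight was chosen.

```python
def weighted_similarity(s_chem, s_bio, w):
    """Combine two square similarity matrices with complementary weights w and 1 - w."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    if len(s_chem) != len(s_bio):
        raise ValueError("matrices must have the same shape")
    return [
        [w * c + (1.0 - w) * b for c, b in zip(row_c, row_b)]
        for row_c, row_b in zip(s_chem, s_bio)
    ]

# Made-up similarities for three compounds.
chem = [[1.0, 0.2, 0.9], [0.2, 1.0, 0.3], [0.9, 0.3, 1.0]]
bio = [[1.0, 0.8, 0.7], [0.8, 1.0, 0.1], [0.7, 0.1, 1.0]]
combined = weighted_similarity(chem, bio, 0.5)
print(combined)
```

The combined matrix (or 1 minus it, as a dissimilarity) can then be passed to any clustering routine that accepts precomputed similarities.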
Mapping Informative Clusters in a Hierarchial Framework of fMRI Multivariate Analysis

PubMed Central

Xu, Rui; Zhen, Zonglei; Liu, Jia

2010-01-01

Pattern recognition methods have become increasingly popular in fMRI data analysis, which are powerful in discriminating between multi-voxel patterns of brain activities associated with different mental states. However, when they are used in functional brain mapping, the location of discriminative voxels varies significantly, raising difficulties in interpreting the locus of the effect. Here we proposed a hierarchical framework of multivariate approach that maps informative clusters rather than voxels to achieve reliable functional brain mapping without compromising the discriminative power. In particular, we first searched for local homogeneous clusters that consisted of voxels with similar response profiles. Then, a multi-voxel classifier was built for each cluster to extract discriminative information from the multi-voxel patterns. Finally, through multivariate ranking, outputs from the classifiers were served as a multi-cluster pattern to identify informative clusters by examining interactions among clusters. Results from both simulated and real fMRI data demonstrated that this hierarchical approach showed better performance in the robustness of functional brain mapping than traditional voxel-based multivariate methods. In addition, the mapped clusters were highly overlapped for two perceptually equivalent object categories, further confirming the validity of our approach. In short, the hierarchical framework of multivariate approach is suitable for both pattern classification and brain mapping in fMRI studies. PMID:21152081
Cluster-guided imaging-based CFD analysis of airflow and particle deposition in asthmatic human lungs

NASA Astrophysics Data System (ADS)

Choi, Jiwoong; Leblanc, Lawrence; Choi, Sanghun; Haghighi, Babak; Hoffman, Eric; Lin, Ching-Long

2017-11-01

The goal of this study is to assess inter-subject variability in delivery of orally inhaled drug products to small airways in asthmatic lungs. A recent multiscale imaging-based cluster analysis (MICA) of computed tomography (CT) lung images in an asthmatic cohort identified four clusters with statistically distinct structural and functional phenotypes associating with unique clinical biomarkers. Thus, we aimed to address inter-subject variability via inter-cluster variability. We selected a representative subject from each of the 4 asthma clusters as well as 1 male and 1 female healthy controls, and performed computational fluid and particle simulations on CT-based airway models of these subjects. The results from one severe and one non-severe asthmatic cluster subjects characterized by segmental airway constriction had increased particle deposition efficiency, as compared with the other two cluster subjects (one non-severe and one severe asthmatics) without airway constriction. Constriction-induced jets impinging on distal bifurcations led to excessive particle deposition. The results emphasize the impact of airway constriction on regional particle deposition rather than disease severity, demonstrating the potential of using cluster membership to tailor drug delivery. NIH Grants U01HL114494 and S10-RR022421, and FDA Grant U01FD005837. XSEDE.
A Dimensionality Reduction-Based Multi-Step Clustering Method for Robust Vessel Trajectory Analysis

PubMed Central

Liu, Jingxian; Wu, Kefeng

2017-01-01

The Shipboard Automatic Identification System (AIS) is crucial for navigation safety and maritime surveillance, and data mining and pattern analysis of AIS information have attracted considerable attention in terms of both basic research and practical applications. Clustering of spatio-temporal AIS trajectories can be used to identify abnormal patterns and mine customary route data for transportation safety. Thus, the capacities of navigation safety and maritime traffic monitoring could be enhanced correspondingly. However, trajectory clustering is often sensitive to undesirable outliers and is essentially more complex compared with traditional point clustering. To overcome this limitation, a multi-step trajectory clustering method is proposed in this paper for robust AIS trajectory clustering. In particular, Dynamic Time Warping (DTW), a similarity measurement method, is introduced in the first step to measure the distances between different trajectories. The calculated distances, inversely proportional to the similarities, constitute a distance matrix in the second step. Furthermore, as a widely-used dimensional reduction method, Principal Component Analysis (PCA) is exploited to decompose the obtained distance matrix. In particular, the top k principal components with above 95% accumulative contribution rate are extracted by PCA, and the number of the centers k is chosen. The k centers are found by the improved automatic center selection algorithm. In the last step, the improved center clustering algorithm with k clusters is implemented on the distance matrix to achieve the final AIS trajectory clustering results. In order to improve the accuracy of the proposed multi-step clustering algorithm, an automatic algorithm for choosing the k clusters is developed according to the similarity distance. Numerous experiments on realistic AIS trajectory datasets in the bridge area waterway and Mississippi River have been implemented to compare our proposed method with traditional spectral clustering and fast affinity propagation clustering. Experimental results have illustrated its superior performance in terms of quantitative and qualitative evaluations. PMID:28777353
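The first two steps described above (DTW distances between trajectories, collected into a distance matrix) can be sketched with the classic dynamic-programming form of DTW. This is a generic textbook version under stated assumptions, not the paper's implementation: trajectories are lists of (x, y) points, the local cost is the Euclidean distance, no warping window is applied, and the function names and sample tracks are hypothetical.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)

def dtw_distance(a, b):
    """Dynamic Time Warping cost between two trajectories given as lists of (x, y) points."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def distance_matrix(trajectories):
    """Symmetric matrix of pairwise DTW distances."""
    size = len(trajectories)
    matrix = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1, size):
            matrix[i][j] = matrix[j][i] = dtw_distance(trajectories[i], trajectories[j])
    return matrix

# Made-up tracks: the second repeats a point of the first, the third is offset.
tracks = [
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
    [(0.0, 0.0), (0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
    [(0.0, 3.0), (1.0, 3.0), (2.0, 3.0)],
]
print(distance_matrix(tracks))
```

Because DTW allows points to be matched many-to-one, tracks sampled at different rates along the same route get a small distance, which is the property that makes it useful for trajectories of unequal length.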
Implementation of the force decomposition machine for molecular dynamics simulations.

PubMed

Borštnik, Urban; Miller, Benjamin T; Brooks, Bernard R; Janežič, Dušanka

2012-09-01

We present the design and implementation of the force decomposition machine (FDM), a cluster of personal computers (PCs) that is tailored to running molecular dynamics (MD) simulations using the distributed diagonal force decomposition (DDFD) parallelization method. The cluster interconnect architecture is optimized for the communication pattern of the DDFD method. Our implementation of the FDM relies on standard commodity components even for networking. Although the cluster is meant for DDFD MD simulations, it remains general enough for other parallel computations. An analysis of several MD simulation runs on both the FDM and a standard PC cluster demonstrates that the FDM's interconnect architecture provides a greater performance compared to a more general cluster interconnect. Copyright © 2012 Elsevier Inc. All rights reserved.
A comparison of methods for the analysis of binomial clustered outcomes in behavioral research.

PubMed

Ferrari, Alberto; Comelli, Mario

2016-12-01

In behavioral research, data consisting of a per-subject proportion of "successes" and "failures" over a finite number of trials often arise. These clustered binary data are usually non-normally distributed, which can distort inference if the usual general linear model is applied and sample size is small. A number of more advanced methods are available, but they are often technically challenging, and a comparative assessment of their performance in behavioral setups has not been performed. We studied the performance of some methods applicable to the analysis of proportions, namely linear regression, Poisson regression, beta-binomial regression and Generalized Linear Mixed Models (GLMMs). We report on a simulation study evaluating power and Type I error rate of these models in hypothetical scenarios met by behavioral researchers; in addition, we describe results from the application of these methods to data from real experiments. Our results show that, while GLMMs are powerful instruments for the analysis of clustered binary outcomes, beta-binomial regression can outperform them in a range of scenarios. Linear regression gave results consistent with the nominal level of significance, but was overall less powerful. Poisson regression, instead, mostly led to anticonservative inference. GLMMs and beta-binomial regression are generally more powerful than linear regression; yet linear regression is robust to model misspecification in some conditions, whereas Poisson regression suffers heavily from violations of the assumptions when used to model proportion data. We conclude by providing directions to behavioral scientists dealing with clustered binary data and small sample sizes. Copyright © 2016 Elsevier B.V. All rights reserved.

Genotypic diversity of oscillatoriacean strains belonging to the genera Geitlerinema and Spirulina determined by 16S rDNA restriction analysis.

PubMed

Margheri, Maria C; Piccardi, Raffaella; Ventura, Stefano; Viti, Carlo; Giovannetti, Luciana

2003-05-01

Genotypic diversity of several cyanobacterial strains mostly isolated from marine or brackish waters, belonging to the genera Geitlerinema and Spirulina, was investigated by amplified 16S ribosomal DNA restriction analysis and compared with morphological features and response to salinity. Cluster analysis was performed on amplified 16S rDNA restriction profiles of these strains along with profiles obtained from sequence data of five Spirulina-like strains, including three representatives of the new genus Halospirulina. Our strains with tightly coiled trichomes from hypersaline waters could be assigned to the Halospirulina genus. Among the uncoiled strains, the two strains of hypersaline origin clustered together and were found to be distant from their counterparts of marine and freshwater habitat. Moreover, another cluster, formed by alkali-tolerant strains with tightly coiled trichomes, was well delineated.
Hybrid cloud and cluster computing paradigms for life science applications

PubMed Central

2010-01-01

Background. Clouds and MapReduce have shown themselves to be a broadly useful approach to scientific computing especially for parallel data intensive applications. However they have limited applicability to some areas such as data mining because MapReduce has poor performance on problems with an iterative structure present in the linear algebra that underlies much data analysis. Such problems can be run efficiently on clusters using MPI leading to a hybrid cloud and cluster environment. This motivates the design and implementation of an open source Iterative MapReduce system Twister. Results. Comparisons of Amazon, Azure, and traditional Linux and Windows environments on common applications have shown encouraging performance and usability comparisons in several important non iterative cases. These are linked to MPI applications for final stages of the data analysis. Further we have released the open source Twister Iterative MapReduce and benchmarked it against basic MapReduce (Hadoop) and MPI in information retrieval and life sciences applications. Conclusions. The hybrid cloud (MapReduce) and cluster (MPI) approach offers an attractive production environment while Twister promises a uniform programming environment for many Life Sciences applications. Methods. We used commercial clouds Amazon and Azure and the NSF resource FutureGrid to perform detailed comparisons and evaluations of different approaches to data intensive computing. Several applications were developed in MPI, MapReduce and Twister in these different environments. PMID:21210982
Hybrid cloud and cluster computing paradigms for life science applications.

PubMed

Qiu, Judy; Ekanayake, Jaliya; Gunarathne, Thilina; Choi, Jong Youl; Bae, Seung-Hee; Li, Hui; Zhang, Bingjing; Wu, Tak-Lon; Ruan, Yang; Ekanayake, Saliya; Hughes, Adam; Fox, Geoffrey

2010-12-21

Clouds and MapReduce have shown themselves to be a broadly useful approach to scientific computing especially for parallel data intensive applications. However they have limited applicability to some areas such as data mining because MapReduce has poor performance on problems with an iterative structure present in the linear algebra that underlies much data analysis. Such problems can be run efficiently on clusters using MPI leading to a hybrid cloud and cluster environment. This motivates the design and implementation of an open source Iterative MapReduce system Twister. Comparisons of Amazon, Azure, and traditional Linux and Windows environments on common applications have shown encouraging performance and usability comparisons in several important non iterative cases. These are linked to MPI applications for final stages of the data analysis. Further we have released the open source Twister Iterative MapReduce and benchmarked it against basic MapReduce (Hadoop) and MPI in information retrieval and life sciences applications. The hybrid cloud (MapReduce) and cluster (MPI) approach offers an attractive production environment while Twister promises a uniform programming environment for many Life Sciences applications. We used commercial clouds Amazon and Azure and the NSF resource FutureGrid to perform detailed comparisons and evaluations of different approaches to data intensive computing. Several applications were developed in MPI, MapReduce and Twister in these different environments.
Image Registration Algorithm Based on Parallax Constraint and Clustering Analysis

NASA Astrophysics Data System (ADS)

Wang, Zhe; Dong, Min; Mu, Xiaomin; Wang, Song

2018-01-01

To resolve the problems of slow computation speed and low matching accuracy in image registration, a new image registration algorithm based on parallax constraint and clustering analysis is proposed. Firstly, the Harris corner detection algorithm is used to extract the feature points of two images. Secondly, the Normalized Cross Correlation (NCC) function is used to perform the approximate matching of feature points, and the initial feature pairs are obtained. Then, according to the parallax constraint condition, the initial feature pairs are preprocessed by the K-means clustering algorithm, which is used to remove the feature point pairs with obvious errors in the approximate matching process. Finally, the Random Sample Consensus (RANSAC) algorithm is adopted to optimize the feature points and obtain the final feature point matching result, so that fast and accurate image registration is realized. The experimental results show that the image registration algorithm proposed in this paper can improve the accuracy of the image matching while ensuring the real-time performance of the algorithm.
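The approximate-matching step in this abstract relies on Normalized Cross Correlation. A minimal sketch of the standard zero-mean NCC score between two equally sized image patches follows; the function name `ncc`, the list-of-rows patch layout, and the toy patches are assumptions for illustration, and the Harris, K-means and RANSAC stages are not shown.

```python
import math


def ncc(patch_a, patch_b):
    """Zero-mean normalized cross correlation of two equally sized patches.

    Patches are lists of rows of numeric intensities. The score lies in
    [-1, 1]; a flat patch (zero variance) is given a score of 0.0 here,
    which is a convention chosen for this sketch.
    """
    a = [v for row in patch_a for v in row]
    b = [v for row in patch_b for v in row]
    if len(a) != len(b):
        raise ValueError("patches must contain the same number of pixels")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denom


# Hypothetical 2x2 patches.
patch = [[1, 2], [3, 4]]
brighter = [[12, 14], [16, 18]]   # same pattern, scaled by 2 and offset by 10
inverted = [[4, 3], [2, 1]]       # reversed intensity pattern
flat = [[5, 5], [5, 5]]           # no texture
```

Because the score is invariant to gain and offset, `patch` and `brighter` correlate perfectly, which is why NCC is a common choice for matching corners between images taken under different exposure.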
WAIS-III index score profiles in the Canadian standardization sample.

PubMed

Lange, Rael T

2007-01-01

Representative index score profiles were examined in the Canadian standardization sample of the Wechsler Adult Intelligence Scale-Third Edition (WAIS-III). The identification of profile patterns was based on the methodology proposed by Lange, Iverson, Senior, and Chelune (2002) that aims to maximize the influence of profile shape and minimize the influence of profile magnitude on the cluster solution. A two-step cluster analysis procedure was used (i.e., hierarchical and k-means analyses). Cluster analysis of the four index scores (i.e., Verbal Comprehension [VCI], Perceptual Organization [POI], Working Memory [WMI], Processing Speed [PSI]) identified six profiles in this sample. Profiles were differentiated by pattern of performance and were primarily characterized as (a) high VCI/POI, low WMI/PSI, (b) low VCI/POI, high WMI/PSI, (c) high PSI, (d) low PSI, (e) high VCI/WMI, low POI/PSI, and (f) low VCI, high POI. These profiles are potentially useful for determining whether a patient's WAIS-III performance is unusual in a normal population.
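The idea of emphasizing profile shape over profile magnitude can be illustrated with one common device: subtracting each person's own mean from their index scores before clustering. Whether this matches the exact procedure of Lange, Iverson, Senior, and Chelune (2002) is not stated in the abstract, so the sketch below is an assumption, as are the function names and the example scores.

```python
def center_profile(scores):
    """Remove overall elevation by subtracting the profile's own mean."""
    mean = sum(scores) / len(scores)
    return [s - mean for s in scores]


def squared_distance(profile_a, profile_b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((a - b) ** 2 for a, b in zip(profile_a, profile_b))


# Hypothetical index scores in the order VCI, POI, WMI, PSI.
high_scorer = [110, 110, 90, 90]
low_scorer = [90, 90, 70, 70]   # same shape, 20 points lower throughout

raw_gap = squared_distance(high_scorer, low_scorer)
shape_gap = squared_distance(center_profile(high_scorer), center_profile(low_scorer))
```

On the raw scores the two people are far apart, but after centering their profiles coincide, so a distance-based cluster analysis would group them by pattern (high VCI/POI, low WMI/PSI) rather than by overall level.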
Generalization of Clustering Coefficients to Signed Correlation Networks

PubMed Central

Costantini, Giulio; Perugini, Marco

2014-01-01

The recent interest in network analysis applications in personality psychology and psychopathology has put forward new methodological challenges. Personality and psychopathology networks are typically based on correlation matrices and therefore include both positive and negative edge signs. However, some applications of network analysis disregard negative edges, such as computing clustering coefficients. In this contribution, we illustrate the importance of the distinction between positive and negative edges in networks based on correlation matrices. The clustering coefficient is generalized to signed correlation networks: three new indices are introduced that take edge signs into account, each derived from an existing and widely used formula. The performances of the new indices are illustrated and compared with the performances of the unsigned indices, both on a signed simulated network and on a signed network based on actual personality psychology data. The results show that the new indices are more resistant to sample variations in correlation networks and therefore have higher convergence compared with the unsigned indices both in simulated networks and with real data. PMID:24586367
Variable number of tandem repeats and pulsed-field gel electrophoresis cluster analysis of enterohemorrhagic Escherichia coli serovar O157 strains.

PubMed

Yokoyama, Eiji; Uchimura, Masako

2007-11-01

Ninety-five enterohemorrhagic Escherichia coli serovar O157 strains, including 30 strains isolated from 13 intrafamily outbreaks and 14 strains isolated from 3 mass outbreaks, were studied by pulsed-field gel electrophoresis (PFGE) and variable number of tandem repeats (VNTR) typing, and the resulting data were subjected to cluster analysis. Cluster analysis of the VNTR typing data revealed that 57 (60.0%) of 95 strains, including all epidemiologically linked strains, formed clusters with at least 95% similarity. Cluster analysis of the PFGE patterns revealed that 67 (70.5%) of 95 strains, including all but 1 of the epidemiologically linked strains, formed clusters with 90% similarity. The number of epidemiologically unlinked strains forming clusters was significantly less by VNTR cluster analysis than by PFGE cluster analysis. The congruence value between PFGE and VNTR cluster analysis was low and did not show an obvious correlation. With two-step cluster analysis, the number of clustered epidemiologically unlinked strains by PFGE cluster analysis that were divided by subsequent VNTR cluster analysis was significantly higher than the number by VNTR cluster analysis that were divided by subsequent PFGE cluster analysis. These results indicate that VNTR cluster analysis is more efficient than PFGE cluster analysis as an epidemiological tool to trace the transmission of enterohemorrhagic E. coli O157.
Noise-robust unsupervised spike sorting based on discriminative subspace learning with outlier handling.

PubMed

Keshtkaran, Mohammad Reza; Yang, Zhi

2017-06-01

Spike sorting is a fundamental preprocessing step for many neuroscience studies which rely on the analysis of spike trains. Most of the feature extraction and dimensionality reduction techniques that have been used for spike sorting give a projection subspace which is not necessarily the most discriminative one. Therefore, the clusters which appear inherently separable in some discriminative subspace may overlap if projected using conventional feature extraction approaches, leading to poor sorting accuracy, especially when the noise level is high. In this paper, we propose a noise-robust and unsupervised spike sorting algorithm based on learning discriminative spike features for clustering. The proposed algorithm uses discriminative subspace learning to extract low-dimensional and most discriminative features from the spike waveforms and performs clustering with automatic detection of the number of clusters. The core part of the algorithm involves iterative subspace selection using linear discriminant analysis and clustering using a Gaussian mixture model with outlier detection. A statistical test in the discriminative subspace is proposed to automatically detect the number of clusters. Comparative results on publicly available simulated and real in vivo datasets demonstrate that our algorithm achieves substantially improved cluster distinction, leading to higher sorting accuracy and more reliable detection of clusters which are highly overlapping and not detectable using conventional feature extraction techniques such as principal component analysis or wavelets. By providing more accurate information about the activity of a larger number of individual neurons with high robustness to neural noise and outliers, the proposed unsupervised spike sorting algorithm facilitates more detailed and accurate analysis of single- and multi-unit activities in neuroscience and brain machine interface studies.
Noise-robust unsupervised spike sorting based on discriminative subspace learning with outlier handling

NASA Astrophysics Data System (ADS)

Keshtkaran, Mohammad Reza; Yang, Zhi

2017-06-01

Objective. Spike sorting is a fundamental preprocessing step for many neuroscience studies which rely on the analysis of spike trains. Most of the feature extraction and dimensionality reduction techniques that have been used for spike sorting give a projection subspace which is not necessarily the most discriminative one. Therefore, the clusters which appear inherently separable in some discriminative subspace may overlap if projected using conventional feature extraction approaches, leading to poor sorting accuracy, especially when the noise level is high. In this paper, we propose a noise-robust and unsupervised spike sorting algorithm based on learning discriminative spike features for clustering. Approach. The proposed algorithm uses discriminative subspace learning to extract low-dimensional and most discriminative features from the spike waveforms and performs clustering with automatic detection of the number of clusters. The core part of the algorithm involves iterative subspace selection using linear discriminant analysis and clustering using a Gaussian mixture model with outlier detection. A statistical test in the discriminative subspace is proposed to automatically detect the number of clusters. Main results. Comparative results on publicly available simulated and real in vivo datasets demonstrate that our algorithm achieves substantially improved cluster distinction, leading to higher sorting accuracy and more reliable detection of clusters which are highly overlapping and not detectable using conventional feature extraction techniques such as principal component analysis or wavelets. Significance. By providing more accurate information about the activity of a larger number of individual neurons with high robustness to neural noise and outliers, the proposed unsupervised spike sorting algorithm facilitates more detailed and accurate analysis of single- and multi-unit activities in neuroscience and brain machine interface studies.
Comparison of Salmonella enteritidis phage types isolated from layers and humans in Belgium in 2005.

PubMed

Welby, Sarah; Imberechts, Hein; Riocreux, Flavien; Bertrand, Sophie; Dierick, Katelijne; Wildemauwe, Christa; Hooyberghs, Jozef; Van der Stede, Yves

2011-08-01

The aim of this study was to investigate the available results for Belgium of the European Union coordinated monitoring program (2004/665 EC) on Salmonella in layers in 2005, as well as the results of the monthly outbreak reports of Salmonella Enteritidis in humans in 2005, to identify a possible statistically significant trend in both populations. Separate descriptive statistics and univariate analysis were carried out, and parametric and/or non-parametric hypothesis tests were conducted. A time cluster analysis was performed for all Salmonella Enteritidis phage types (PTs) isolated. The proportions of each Salmonella Enteritidis PT in layers and in humans were compared, and the monthly distribution of the most common PT isolated in both populations was evaluated. The time cluster analysis revealed significant clusters during the months May and June for layers and May, July, August, and September for humans. PT21, the most frequently isolated PT in both populations in 2005, seemed to be responsible for these significant clusters. PT4 was the second most frequently isolated PT. No significant difference was found for the monthly trend evolution of both PTs in both populations based on parametric and non-parametric methods. A similar monthly trend of PT distribution in humans and layers during the year 2005 was observed. The time cluster analysis and the statistical significance testing confirmed these results. Moreover, the time cluster analysis showed significant clusters during the summer that were slightly delayed in time (humans after layers). These results suggest a common link between the prevalence of Salmonella Enteritidis in layers and the occurrence of the pathogen in humans. Phage typing was confirmed to be a useful tool for identifying temporal trends.
Genome-scale cluster analysis of replicated microarrays using shrinkage correlation coefficient.

PubMed

Yao, Jianchao; Chang, Chunqi; Salmi, Mari L; Hung, Yeung Sam; Loraine, Ann; Roux, Stanley J

2008-06-18

Currently, clustering with some form of correlation coefficient as the gene similarity metric has become a popular method for profiling genomic data. The Pearson correlation coefficient and the standard deviation (SD)-weighted correlation coefficient are the two most widely-used correlations as the similarity metrics in clustering microarray data. However, these two correlations are not optimal for analyzing replicated microarray data generated by most laboratories. An effective correlation coefficient is needed to provide statistically sufficient analysis of replicated microarray data. In this study, we describe a novel correlation coefficient, shrinkage correlation coefficient (SCC), that fully exploits the similarity between the replicated microarray experimental samples. The methodology considers both the number of replicates and the variance within each experimental group in clustering expression data, and provides a robust statistical estimation of the error of replicated microarray data. The value of SCC is revealed by its comparison with two other correlation coefficients that are currently the most widely-used (Pearson correlation coefficient and SD-weighted correlation coefficient) using statistical measures on both synthetic expression data as well as real gene expression data from Saccharomyces cerevisiae. Two leading clustering methods, hierarchical and k-means clustering were applied for the comparison. The comparison indicated that using SCC achieves better clustering performance. Applying SCC-based hierarchical clustering to the replicated microarray data obtained from germinating spores of the fern Ceratopteris richardii, we discovered two clusters of genes with shared expression patterns during spore germination. Functional analysis suggested that some of the genetic mechanisms that control germination in such diverse plant lineages as mosses and angiosperms are also conserved among ferns. 
This study shows that SCC is an alternative to the Pearson correlation coefficient and the SD-weighted correlation coefficient, and is particularly useful for clustering replicated microarray data. This computational approach should be generally useful for proteomic data or other high-throughput analysis methodology.
Distinct phenotype clusters in childhood inflammatory brain diseases: implications for diagnostic evaluation.

PubMed

Cellucci, Tania; Tyrrell, Pascal N; Twilt, Marinka; Sheikh, Shehla; Benseler, Susanne M

2014-03-01

To identify distinct clusters of children with inflammatory brain diseases based on clinical, laboratory, and imaging features at presentation, to assess which features contribute strongly to the development of clusters, and to compare additional features between the identified clusters. A single-center cohort study was performed with children who had been diagnosed as having an inflammatory brain disease between June 1, 1989 and December 31, 2010. Demographic, clinical, laboratory, neuroimaging, and histologic data at diagnosis were collected. K-means cluster analysis was performed to identify clusters of patients based on their presenting features. Associations between the clusters and patient variables, such as diagnoses, were determined. A total of 147 children (50% female; median age 8.8 years) were identified: 105 with primary central nervous system (CNS) vasculitis, 11 with secondary CNS vasculitis, 8 with neuronal antibody syndromes, 6 with postinfectious syndromes, and 17 with other inflammatory brain diseases. Three distinct clusters were identified. Paresis and speech deficits were the most common presenting features in cluster 1. Children in cluster 2 were likely to present with behavior changes, cognitive dysfunction, and seizures, while those in cluster 3 experienced ataxia, vision abnormalities, and seizures. Lesions seen on T2/fluid-attenuated inversion recovery sequences of magnetic resonance imaging were common in all clusters, but unilateral ischemic lesions were more prominent in cluster 1. The clusters were associated with specific diagnoses and diagnostic test results. Children with inflammatory brain diseases presented with distinct phenotypical patterns that are associated with specific diagnoses. This information may inform the development of a diagnostic classification of childhood inflammatory brain diseases and suggest that specific pathways of diagnostic evaluation are warranted. Copyright © 2014 by the American College of Rheumatology.
MC2: galaxy imaging and redshift analysis of the merging cluster CIZA J2242.8+5301

DOE PAGES

Dawson, William A.; Jee, M. James; Stroe, Andra; ...

2015-05-28

X-ray and radio observations of CIZA J2242.8+5301 suggest that it is a major cluster merger. Despite being well studied in the X-ray and radio, little has been presented on the cluster structure and dynamics inferred from its galaxy population. We carried out a deep (i < 25) broad band imaging survey of the system with Subaru SuprimeCam (g & i bands) and the Canada France Hawaii Telescope (r band) as well as a comprehensive spectroscopic survey of the cluster area (505 redshifts) using Keck DEIMOS. We use these data to perform a comprehensive galaxy/redshift analysis of the system, which is the first step to a proper understanding of the geometry and dynamics of the merger, as well as to using the merger to constrain self-interacting dark matter.
High-dimensional cluster analysis with the Masked EM Algorithm

PubMed Central

Kadir, Shabnam N.; Goodman, Dan F. M.; Harris, Kenneth D.

2014-01-01

Cluster analysis faces two problems in high dimensions: first, the “curse of dimensionality” that can lead to overfitting and poor generalization performance; and second, the sheer time taken for conventional algorithms to process large amounts of high-dimensional data. We describe a solution to these problems, designed for the application of “spike sorting” for next-generation high channel-count neural probes. In this problem, only a small subset of features provide information about the cluster membership of any one data vector, but this informative feature subset is not the same for all data points, rendering classical feature selection ineffective. We introduce a “Masked EM” algorithm that allows accurate and time-efficient clustering of up to millions of points in thousands of dimensions. We demonstrate its applicability to synthetic data, and to real-world high-channel-count spike sorting data. PMID:25149694
Health and disease phenotyping in old age using a cluster network analysis.

PubMed

Valenzuela, Jesus Felix; Monterola, Christopher; Tong, Victor Joo Chuan; Ng, Tze Pin; Larbi, Anis

2017-11-15

Human ageing is a complex trait that involves the synergistic action of numerous biological processes that interact to form a complex network. Here we performed a network analysis to examine the interrelationships between physiological and psychological functions, disease, disability, quality of life, lifestyle and behavioural risk factors for ageing in a cohort of 3,270 subjects aged ≥55 years. We considered associations between numerical and categorical descriptors using effect-size measures for each variable pair and identified clusters of variables from the resulting pairwise effect-size network and minimum spanning tree. We show, by way of a correspondence analysis between the two sets of clusters, that they correspond to coarse-grained and fine-grained structure of the network relationships. The clusters obtained from the minimum spanning tree mapped to various conceptual domains and corresponded to physiological and syndromic states. Hierarchical ordering of these clusters identified six common themes based on interactions with physiological systems and common underlying substrates of age-associated morbidity and disease chronicity, functional disability, and quality of life. These findings provide a starting point for in-depth analyses of ageing that incorporate immunologic, metabolomic and proteomic biomarkers, and ultimately offer low-level-based typologies of healthy and unhealthy ageing.
Classification of attempted suicide by cluster analysis: A study of 888 suicide attempters presenting to the emergency department.

PubMed

Kim, Hyeyoung; Kim, Bora; Kim, Se Hyun; Park, C Hyung Keun; Kim, Eun Young; Ahn, Yong Min

2018-08-01

It is essential to understand the latent structure of the population of suicide attempters for effective suicide prevention. The aim of this study was to identify subgroups among Korean suicide attempters in terms of the details of the suicide attempt. A total of 888 people who attempted suicide and were subsequently treated in the emergency rooms of 17 medical centers between May and November of 2013 were included in the analysis. The variables assessed included demographic characteristics, clinical information, and details of the suicide attempt assessed by the Suicide Intent Scale (SIS) and Columbia-Suicide Severity Rating Scale (C-SSRS). Cluster analysis was performed using the Ward method. Of the participants, 85.4% (n = 758) fell into a cluster characterized by less planning, low lethality methods, and ambivalence towards death ("impulsive"). The other cluster (n = 130) involved a more severe and well-planned attempt, used highly lethal methods, and took more precautions to avoid being interrupted ("planned"). The first cluster was dominated by women, while the second cluster was associated more with men, older age, and physical illness. We only included participants who visited the emergency department after their suicide attempt and had no missing values for SIS or C-SSRS. Cluster analysis extracted two distinct subgroups of Korean suicide attempters showing different patterns of suicidal behaviors. Understanding that a significant portion of suicide attempts occur impulsively calls for new prevention strategies tailored to differing subgroup profiles. Copyright © 2018 Elsevier B.V. All rights reserved.
Wavelet-based clustering of resting state MRI data in the rat.

PubMed

Medda, Alessio; Hoffmann, Lukas; Magnuson, Matthew; Thompson, Garth; Pan, Wen-Ju; Keilholz, Shella

2016-01-01

While functional connectivity has typically been calculated over the entire length of the scan (5-10 min), interest has been growing in dynamic analysis methods that can detect changes in connectivity on the order of cognitive processes (seconds). Previous work with sliding window correlation has shown that changes in functional connectivity can be observed on these time scales in the awake human and in anesthetized animals. This exciting advance creates a need for improved approaches to characterize dynamic functional networks in the brain. Previous studies were performed using sliding window analysis on regions of interest defined based on anatomy or obtained from traditional steady-state analysis methods. The parcellation of the brain may therefore be suboptimal, and the characteristics of the time-varying connectivity between regions are dependent upon the length of the sliding window chosen. This manuscript describes an algorithm based on wavelet decomposition that allows data-driven clustering of voxels into functional regions based on temporal and spectral properties. Previous work has shown that different networks have characteristic frequency fingerprints, and the use of wavelets ensures that both the frequency and the timing of the BOLD fluctuations are considered during the clustering process. The method was applied to resting state data acquired from anesthetized rats, and the resulting clusters agreed well with known anatomical areas. Clusters were highly reproducible across subjects. Wavelet cross-correlation values between clusters from a single scan were significantly higher than the values from randomly matched clusters that shared no temporal information, indicating that wavelet-based analysis is sensitive to the relationship between areas. Copyright © 2015 Elsevier Inc. All rights reserved.
On the Analysis of Case-Control Studies in Cluster-correlated Data Settings.

PubMed

Haneuse, Sebastien; Rivera-Rodriguez, Claudia

2018-01-01

In resource-limited settings, long-term evaluation of national antiretroviral treatment (ART) programs often relies on aggregated data, the analysis of which may be subject to ecological bias. As researchers and policy makers consider evaluating individual-level outcomes such as treatment adherence or mortality, the well-known case-control design is appealing in that it provides efficiency gains over random sampling. In the context that motivates this article, valid estimation and inference require acknowledging any clustering, although, to our knowledge, no statistical methods have been published for the analysis of case-control data for which the underlying population exhibits clustering. Furthermore, in the specific context of an ongoing collaboration in Malawi, rather than performing case-control sampling across all clinics, case-control sampling within clinics has been suggested as a more practical strategy. To our knowledge, although similar outcome-dependent sampling schemes have been described in the literature, a case-control design specific to correlated data settings is new. In this article, we describe this design, discuss balanced versus unbalanced sampling techniques, and provide a general approach to analyzing case-control studies in cluster-correlated settings based on inverse probability-weighted generalized estimating equations. Inference is based on a robust sandwich estimator with correlation parameters estimated to ensure appropriate accounting of the outcome-dependent sampling scheme. We conduct comprehensive simulations, based in part on real data on a sample of N = 78,155 program registrants in Malawi between 2005 and 2007, to evaluate small-sample operating characteristics and potential trade-offs associated with standard case-control sampling or when case-control sampling is performed within clusters.
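The role of inverse probability weighting under outcome-dependent sampling can be shown with a deliberately small sketch. It covers only the weighting idea (each sampled subject counts for 1 / sampling probability) applied to a simple proportion; it is not the generalized estimating equations or the sandwich estimator described in the abstract. The function names, the single hypothetical clinic, and all numbers are assumptions for illustration.

```python
def naive_proportion(sample):
    """Unweighted share of cases in the sample, ignoring the sampling design."""
    return sum(outcome for outcome, _ in sample) / len(sample)


def ipw_proportion(sample):
    """Weighted share of cases, each subject weighted by 1 / sampling probability.

    `sample` is a list of (outcome, sampling_probability) pairs with
    outcome in {0, 1} and sampling_probability in (0, 1].
    """
    weighted_cases = sum(outcome / prob for outcome, prob in sample)
    total_weight = sum(1.0 / prob for _, prob in sample)
    return weighted_cases / total_weight


# Hypothetical clinic with 20 cases and 80 controls (true case share 0.2).
# Case-control sampling keeps every case but only a quarter of the controls.
sample = [(1, 1.0)] * 20 + [(0, 0.25)] * 20

naive = naive_proportion(sample)
weighted = ipw_proportion(sample)
```

The unweighted share overstates the outcome because cases were oversampled, while the weighted share recovers the clinic-level proportion. With sampling carried out within clinics, the probabilities would differ by clinic, and a full analysis would also need to account for the within-clinic correlation.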
Parallel Multivariate Spatio-Temporal Clustering of Large Ecological Datasets on Hybrid Supercomputers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sreepathi, Sarat; Kumar, Jitendra; Mills, Richard T.

A proliferation of data from vast networks of remote sensing platforms (satellites, unmanned aircraft systems (UAS), airborne etc.), observational facilities (meteorological, eddy covariance etc.), state-of-the-art sensors, and simulation models offers unprecedented opportunities for scientific discovery. Unsupervised classification is a widely applied data mining approach to derive insights from such data. However, classification of very large data sets is a complex computational problem that requires efficient numerical algorithms and implementations on high performance computing (HPC) platforms. Additionally, increasing power, space, cooling and efficiency requirements have led to the deployment of hybrid supercomputing platforms with complex architectures and memory hierarchies like the Titan system at Oak Ridge National Laboratory. The advent of such accelerated computing architectures offers new challenges and opportunities for big data analytics in general and specifically, large scale cluster analysis in our case. Although there is an existing body of work on parallel cluster analysis, those approaches do not fully meet the needs imposed by the nature and size of our large data sets. Moreover, they had scaling limitations and were mostly limited to traditional distributed memory computing platforms. We present a parallel Multivariate Spatio-Temporal Clustering (MSTC) technique based on k-means cluster analysis that can target hybrid supercomputers like Titan. We developed a hybrid MPI, CUDA and OpenACC implementation that can utilize both CPU and GPU resources on computational nodes. We describe performance results on Titan that demonstrate the scalability and efficacy of our approach in processing large ecological data sets.
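The record above names k-means as the basis of MSTC but does not show the algorithm. The following is a minimal serial sketch of plain Lloyd-style k-means in Python, not the authors' MPI/CUDA/OpenACC implementation. The function names (`assign_labels`, `update_centroids`, `kmeans`) and the choice to pass initial centroids explicitly are assumptions made for illustration.

```python
import math


def assign_labels(points, centroids):
    """Assign each point to the index of its nearest centroid (Euclidean)."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        for p in points
    ]


def update_centroids(points, labels, centroids):
    """Recompute each centroid as the mean of its members.

    A centroid with no members keeps its previous position (an
    illustrative choice; other strategies exist).
    """
    updated = []
    for c, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == c]
        if not members:
            updated.append(old)
            continue
        updated.append(
            tuple(sum(p[d] for p in members) / len(members) for d in range(len(old)))
        )
    return updated


def kmeans(points, initial_centroids, max_iter=100):
    """Alternate assignment and update steps until labels stop changing."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = assign_labels(points, centroids)
    for _ in range(max_iter):
        centroids = update_centroids(points, labels, centroids)
        new_labels = assign_labels(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids
```

In this sketch the assignment step treats every point independently, which is generally the part of k-means that lends itself to data-parallel execution; how MSTC actually partitions that work across CPUs and GPUs is not described in the abstract.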
HPLC-DAD-ESI-MS Analysis of Flavonoids from Leaves of Different Cultivars of Sweet Osmanthus.

PubMed

Wang, Yiguang; Fu, Jianxin; Zhang, Chao; Zhao, Hongbo

2016-09-14

Osmanthus fragrans Lour. has traditionally been a popular ornamental plant in China. In this study, ethanol extracts of the leaves of four cultivar groups of O. fragrans were analyzed by high-performance liquid chromatography coupled with diode array detection (HPLC-DAD) and high-performance liquid chromatography with electrospray ionization and mass spectrometry (HPLC-ESI-MS). The results suggest that variation in flavonoids among O. fragrans cultivars is quantitative, rather than qualitative. Fifteen components were detected and separated, among which, the structures of 11 flavonoids and two coumarins were identified or tentatively identified. According to principal component analysis (PCA) and hierarchical cluster analysis (HCA) based on the abundance of these components (expressed as rutin equivalents), 22 selected cultivars were classified into four clusters. The seven cultivars from Cluster III ('Xiaoye Sugui', 'Boye Jingui', 'Wuyi Dangui', 'Yingye Dangui', 'Danzhuang', 'Foding Zhu', and 'Tianxiang Taige'), which are enriched in rutin and total flavonoids, and 'Sijigui' from Cluster II which contained the highest amounts of kaempferol glycosides and apigenin 7-O-glucoside, could be selected as potential pharmaceutical resources. However, the chemotaxonomy in this paper does not correlate with the distribution of the existing cultivar groups, demonstrating that the distribution of flavonoids in O. fragrans leaves does not provide an effective means of classification for O. fragrans cultivars based on flower color.

Academic Performance and Lifestyle Behaviors in Australian School Children: A Cluster Analysis

ERIC Educational Resources Information Center

Dumuid, Dorothea; Olds, Timothy; Martín-Fernández, Josep-Antoni; Lewis, Lucy K.; Cassidy, Leah; Maher, Carol

2017-01-01

Poor academic performance has been linked with particular lifestyle behaviors, such as unhealthy diet, short sleep duration, high screen time, and low physical activity. However, little is known about how lifestyle behavior patterns (or combinations of behaviors) contribute to children's academic performance. We aimed to compare academic…
Machine-learned cluster identification in high-dimensional data.

PubMed

Ultsch, Alfred; Lötsch, Jörn

2017-02-01

High-dimensional biomedical data are frequently clustered to identify subgroup structures pointing at distinct disease subtypes. It is crucial that the used cluster algorithm works correctly. However, by imposing a predefined shape on the clusters, classical algorithms occasionally suggest a cluster structure in homogenously distributed data or assign data points to incorrect clusters. We analyzed whether this can be avoided by using emergent self-organizing feature maps (ESOM). Data sets with different degrees of complexity were submitted to ESOM analysis with large numbers of neurons, using an interactive R-based bioinformatics tool. On top of the trained ESOM the distance structure in the high dimensional feature space was visualized in the form of a so-called U-matrix. Clustering results were compared with those provided by classical common cluster algorithms including single linkage, Ward and k-means. Ward clustering imposed cluster structures on cluster-less "golf ball", "cuboid" and "S-shaped" data sets that contained no structure at all (random data). Ward clustering also imposed structures on permuted real world data sets. By contrast, the ESOM/U-matrix approach correctly found that these data contain no cluster structure. However, ESOM/U-matrix was correct in identifying clusters in biomedical data truly containing subgroups. It was always correct in cluster structure identification in further canonical artificial data. Using intentionally simple data sets, it is shown that popular clustering algorithms typically used for biomedical data sets may fail to cluster data correctly, suggesting that they are also likely to perform erroneously on high dimensional biomedical data. The present analyses emphasized that generally established classical hierarchical clustering algorithms carry a considerable tendency to produce erroneous results. 
By contrast, unsupervised machine-learned analysis of cluster structures, applied using the ESOM/U-matrix method, is a viable, unbiased method to identify true clusters in the high-dimensional space of complex data. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
Large Data at Small Universities: Astronomical processing using a computer classroom

NASA Astrophysics Data System (ADS)

Fuller, Nathaniel James; Clarkson, William I.; Fluharty, Bill; Belanger, Zach; Dage, Kristen

2016-06-01

The use of large computing clusters for astronomy research is becoming more commonplace as datasets expand, but access to these required resources is sometimes difficult for research groups working at smaller universities. As an alternative to purchasing processing time on an off-site computing cluster, or purchasing dedicated hardware, we show how one can easily build a crude on-site cluster by utilizing idle cycles on instructional computers in computer-lab classrooms. Since these computers are maintained as part of the educational mission of the university, the resource impact on the investigator is generally low. By using open source Python routines, it is possible to have a large number of desktop computers working together via a local network to sort through large data sets. By running traditional analysis routines in an “embarrassingly parallel” manner, gains in speed are accomplished without requiring the investigator to learn how to write routines using highly specialized methodology. We demonstrate this concept here applied to (1) photometry of large-format images and (2) statistical significance tests for X-ray lightcurve analysis. In these scenarios, we see a speed-up factor which scales almost linearly with the number of cores in the cluster. Additionally, we show that the usage of the cluster does not severely limit performance for a local user, and indeed the processing can be performed while the computers are in use for classroom purposes.
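The "embarrassingly parallel" pattern mentioned above can be sketched on a single machine with Python's standard library. This is only a stand-in for the networked classroom cluster described in the record: the toy analysis function `measure_total_flux` and the helper `run_independent_tasks` are hypothetical names, and the abstract does not say which Python routines the authors used.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def measure_total_flux(image):
    """Toy per-image analysis: sum all positive pixel values."""
    return sum(value for row in image for value in row if value > 0)


def run_independent_tasks(func, tasks, executor_cls=ProcessPoolExecutor,
                          max_workers=None):
    """Apply func to every task with no communication between tasks.

    Each task is handed to a worker independently, so results depend only
    on the task itself. Results come back in the same order as the tasks.
    """
    with executor_cls(max_workers=max_workers) as pool:
        return list(pool.map(func, tasks))
```

A call such as `run_independent_tasks(measure_total_flux, images)` would normally sit under an `if __name__ == "__main__":` guard when process-based workers are used. Passing `executor_cls=ThreadPoolExecutor` keeps the same structure with threads, which tends to suit tasks dominated by I/O rather than computation.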
Clustering performances in the NBA according to players' anthropometric attributes and playing experience.

PubMed

Zhang, Shaoliang; Lorenzo, Alberto; Gómez, Miguel-Angel; Mateus, Nuno; Gonçalves, Bruno; Sampaio, Jaime

2018-04-20

The aim of this study was: (i) to group basketball players into similar clusters based on a combination of anthropometric characteristics and playing experience; and (ii) to explore the distribution of players (including starters and non-starters) from different levels of teams within the obtained clusters. The game-related statistics from 699 regular season balanced games were analyzed using a two-step cluster model and a discriminant analysis. The clustering process identified five different player profiles: Top height and weight (HW) with low experience, TopHW-LowE; Middle HW with middle experience, MiddleHW-MiddleE; Middle HW with top experience, MiddleHW-TopE; Low HW with low experience, LowHW-LowE; Low HW with middle experience, LowHW-MiddleE. Discriminant analysis showed that the TopHW-LowE group was highlighted by two-point field goals made and missed, offensive and defensive rebounds, blocks, and personal fouls, whereas the LowHW-LowE group made the fewest passes and touches. The players from weaker teams were mostly distributed in the LowHW-LowE group, whereas players from stronger teams were mainly grouped in the LowHW-MiddleE group, and players that participated in the finals were allocated to the MiddleHW-MiddleE group. These results provide alternative references for basketball staff concerning the process of evaluating performance.
Defining syndromes using cattle meat inspection data for syndromic surveillance purposes: a statistical approach with the 2005-2010 data from ten French slaughterhouses.

PubMed

Dupuy, Céline; Morignat, Eric; Maugey, Xavier; Vinard, Jean-Luc; Hendrikx, Pascal; Ducrot, Christian; Calavas, Didier; Gay, Emilie

2013-04-30

The slaughterhouse is a central processing point for food animals and thus a source of both demographic data (age, breed, sex) and health-related data (reason for condemnation and condemned portions) that are not available through other sources. Using these data for syndromic surveillance is therefore tempting. However, many possible reasons for condemnation and condemned portions exist, making the definition of relevant syndromes challenging. The objective of this study was to determine a typology of cattle with at least one portion of the carcass condemned in order to define syndromes. Multiple factor analysis (MFA) in combination with clustering methods was performed using both health-related data and demographic data. Analyses were performed on 381,186 cattle with at least one portion of the carcass condemned among the 1,937,917 cattle slaughtered in ten French abattoirs. Results of the MFA and clustering methods led to 12 clusters considered as stable according to year of slaughter and slaughterhouse. One cluster was specific to a disease of public health importance (cysticercosis). Two clusters were linked to the slaughtering process (fecal contamination of heart or lungs and deterioration lesions). Two clusters respectively characterized by chronic liver lesions and chronic peritonitis could be linked to diseases of economic importance to farmers. Three clusters could be linked respectively to reticulo-pericarditis, fatty liver syndrome and farmer's lung syndrome, which are related to both diseases of economic importance to farmers and herd management issues. Three clusters respectively characterized by arthritis, myopathy and Dark Firm Dry (DFD) meat could notably be linked to animal welfare issues. Finally, one cluster, characterized by bronchopneumonia, could be linked to both animal health and herd management issues. 
The statistical approach of combining multiple factor analysis with cluster analysis showed its relevance for the detection of syndromes using available large and complex slaughterhouse data. The advantages of this statistical approach are to i) define groups of reasons for condemnation based on meat inspection data, ii) help grouping reasons for condemnation among a list of various possible reasons for condemnation for which a consensus among experts could be difficult to reach, iii) assign each animal to a single syndrome which allows the detection of changes in trends of syndromes to detect unusual patterns in known diseases and emergence of new diseases.
Isochrone Fittings for the Open Star Clusters NGC 3680 and Melotte 66

NASA Astrophysics Data System (ADS)

Guillemaud, Nikolas; Frinchaboy, P. M.; Thompson, B. A.

2013-01-01

I will be displaying the results from isochrone fittings on two open star clusters. The stellar evolution models used to generate the isochrones are from Dartmouth (Dotter et al. 2007) and Padova (Marigo et al. 2008). Both of the models were applied to two star clusters: NGC 3680 and Melotte 66. The analysis is performed by utilizing infrared observations from the CPAPIR instrument, which is operated in conjunction with CTIO’s 1.5m telescope. This research was made possible by the NSF’s REU grant, award number 0851558.
Self-aggregation in scaled principal component space

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ding, Chris H.Q.; He, Xiaofeng; Zha, Hongyuan

2001-10-05

Automatic grouping of voluminous data into meaningful structures is a challenging task frequently encountered in broad areas of science, engineering and information processing. These data clustering tasks are frequently performed in Euclidean space or a subspace chosen from principal component analysis (PCA). Here we describe a space obtained by a nonlinear scaling of PCA in which data objects self-aggregate automatically into clusters. Projection into this space gives sharp distinctions among clusters. Gene expression profiles of cancer tissue subtypes, Web hyperlink structure and Internet newsgroups are analyzed to illustrate interesting properties of the space.
Robust statistical methods for hit selection in RNA interference high-throughput screening experiments.

PubMed

Zhang, Xiaohua Douglas; Yang, Xiting Cindy; Chung, Namjin; Gates, Adam; Stec, Erica; Kunapuli, Priya; Holder, Dan J; Ferrer, Marc; Espeseth, Amy S

2006-04-01

RNA interference (RNAi) high-throughput screening (HTS) experiments carried out using large (>5000 short interfering [si]RNA) libraries generate a huge amount of data. In order to use these data to identify the most effective siRNAs tested, it is critical to adopt and develop appropriate statistical methods. To address the questions in hit selection of RNAi HTS, we proposed a quartile-based method which is robust to outliers, true hits and nonsymmetrical data. We compared it with the more traditional tests, mean +/- k standard deviation (SD) and median +/- 3 median of absolute deviation (MAD). The results suggested that the quartile-based method selected more hits than mean +/- k SD under the same preset error rate. The number of hits selected by median +/- k MAD was close to that by the quartile-based method. Further analysis suggested that the quartile-based method had the greatest power in detecting true hits, especially weak or moderate true hits. Our investigation also suggested that platewise analysis (determining effective siRNAs on a plate-by-plate basis) can adjust for systematic errors in different plates, while an experimentwise analysis, in which effective siRNAs are identified in an analysis of the entire experiment, cannot. However, experimentwise analysis may detect a cluster of true positive hits placed together in one or several plates, while platewise analysis may not. To display hit selection results, we designed a specific figure called a plate-well series plot. We thus suggest the following strategy for hit selection in RNAi HTS experiments. First, choose the quartile-based method, or median +/- k MAD, for identifying effective siRNAs. Second, perform the chosen method experimentwise on transformed/normalized data, such as percentage inhibition, to check the possibility of hit clusters. If a cluster of selected hits are observed, repeat the analysis based on untransformed data to determine whether the cluster is due to an artifact in the data. 
If no clusters of hits are observed, select hits by performing platewise analysis on transformed data. Third, adopt the plate-well series plot to visualize both the data and the hit selection results, as well as to check for artifacts.
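The three hit-selection rules compared in this record can be illustrated with a short Python sketch. The abstract does not give the exact form of the quartile-based thresholds, so the version below (first and third quartile extended by a multiple `c` of the interquartile range) is an assumption, as are the function names and the default constants. The MAD here is unscaled.

```python
import statistics


def mean_sd_bounds(values, k=3.0):
    """Thresholds at mean +/- k sample standard deviations."""
    center = statistics.mean(values)
    spread = statistics.stdev(values)
    return center - k * spread, center + k * spread


def median_mad_bounds(values, k=3.0):
    """Thresholds at median +/- k median absolute deviations (unscaled)."""
    center = statistics.median(values)
    mad = statistics.median([abs(v - center) for v in values])
    return center - k * mad, center + k * mad


def quartile_bounds(values, c=1.5):
    """Assumed quartile rule: Q1 - c*IQR and Q3 + c*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - c * iqr, q3 + c * iqr


def select_hits(values, bounds):
    """Return indices of wells whose value falls outside the bounds."""
    low, high = bounds
    return [i for i, v in enumerate(values) if v < low or v > high]


# Toy plate: nine unremarkable wells and one strong outlier at index 9.
plate = [0.0, 1.0, -1.0, 2.0, -2.0, 0.5, -0.5, 1.5, -1.5, 40.0]
```

On this toy plate, `select_hits(plate, median_mad_bounds(plate))` and `select_hits(plate, quartile_bounds(plate))` both return `[9]`, while `select_hits(plate, mean_sd_bounds(plate))` returns `[]` because the single extreme value inflates the standard deviation. That is one way to see why the record describes the quartile- and median-based rules as robust to outliers; it says nothing about the error-rate calibration used in the study.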
Extracting Galaxy Cluster Gas Inhomogeneity from X-Ray Surface Brightness: A Statistical Approach and Application to Abell 3667

NASA Astrophysics Data System (ADS)

Kawahara, Hajime; Reese, Erik D.; Kitayama, Tetsu; Sasaki, Shin; Suto, Yasushi

2008-11-01

Our previous analysis indicates that small-scale fluctuations in the intracluster medium (ICM) from cosmological hydrodynamic simulations follow the lognormal probability density function. In order to test the lognormal nature of the ICM directly against X-ray observations of galaxy clusters, we develop a method of extracting statistical information about the three-dimensional properties of the fluctuations from the two-dimensional X-ray surface brightness. We first create a set of synthetic clusters with lognormal fluctuations around their mean profile given by spherical isothermal β-models, later considering polytropic temperature profiles as well. Performing mock observations of these synthetic clusters, we find that the resulting X-ray surface brightness fluctuations also follow the lognormal distribution fairly well. Systematic analysis of the synthetic clusters provides an empirical relation between the three-dimensional density fluctuations and the two-dimensional X-ray surface brightness. We analyze Chandra observations of the galaxy cluster Abell 3667, and find that its X-ray surface brightness fluctuations follow the lognormal distribution. While the lognormal model was originally motivated by cosmological hydrodynamic simulations, this is the first observational confirmation of the lognormal signature in a real cluster. Finally we check the synthetic cluster results against clusters from cosmological hydrodynamic simulations. As a result of the complex structure exhibited by simulated clusters, the empirical relation between the two- and three-dimensional fluctuation properties calibrated with synthetic clusters when applied to simulated clusters shows large scatter. Nevertheless we are able to reproduce the true value of the fluctuation amplitude of simulated clusters within a factor of 2 from their two-dimensional X-ray surface brightness alone. 
Our current methodology combined with existing observational data is useful in describing and inferring the statistical properties of the three-dimensional inhomogeneity in galaxy clusters.
High- and low-level hierarchical classification algorithm based on source separation process

NASA Astrophysics Data System (ADS)

Loghmari, Mohamed Anis; Karray, Emna; Naceur, Mohamed Saber

2016-10-01

High-dimensional data applications have earned great attention in recent years. We focus on remote sensing data analysis on high-dimensional space like hyperspectral data. From a methodological viewpoint, remote sensing data analysis is not a trivial task. Its complexity is caused by many factors, such as large spectral or spatial variability as well as the curse of dimensionality. The latter describes the problem of data sparseness. In this particular ill-posed problem, a reliable classification approach requires appropriate modeling of the classification process. The proposed approach is based on a hierarchical clustering algorithm in order to deal with remote sensing data in high-dimensional space. Indeed, one obvious method to perform dimensionality reduction is to use the independent component analysis process as a preprocessing step. The first particularity of our method is the special structure of its cluster tree. Most of the hierarchical algorithms associate leaves to individual clusters, and start from a large number of individual classes equal to the number of pixels; however, in our approach, leaves are associated with the most relevant sources which are represented according to mutually independent axes to specifically represent some land covers associated with a limited number of clusters. These sources contribute to the refinement of the clustering by providing complementary rather than redundant information. The second particularity of our approach is that at each level of the cluster tree, we combine both a high-level divisive clustering and a low-level agglomerative clustering. This approach reduces the computational cost since the high-level divisive clustering is controlled by a simple Boolean operator, and optimizes the clustering results since the low-level agglomerative clustering is guided by the most relevant independent sources. 
Then at each new step we obtain a new finer partition that will participate in the clustering process to enhance semantic capabilities and give good identification rates.
An unsupervised hierarchical dynamic self-organizing approach to cancer class discovery and marker gene identification in microarray data.

PubMed

Hsu, Arthur L; Tang, Sen-Lin; Halgamuge, Saman K

2003-11-01

Current Self-Organizing Maps (SOMs) approaches to gene expression pattern clustering require the user to predefine the number of clusters likely to be expected. Hierarchical clustering methods used in this area do not provide unique partitioning of data. We describe an unsupervised dynamic hierarchical self-organizing approach, which suggests an appropriate number of clusters, to perform class discovery and marker gene identification in microarray data. In the process of class discovery, the proposed algorithm identifies corresponding sets of predictor genes that best distinguish one class from other classes. The approach integrates merits of hierarchical clustering with robustness against noise known from self-organizing approaches. The proposed algorithm applied to DNA microarray data sets of two types of cancers has demonstrated its ability to produce the most suitable number of clusters. Further, the corresponding marker genes identified through the unsupervised algorithm also have a strong biological relationship to the specific cancer class. The algorithm tested on leukemia microarray data, which contains three leukemia types, was able to determine three major and one minor cluster. Prediction models built for the four clusters indicate that the prediction strength for the smaller cluster is generally low, therefore labelled as uncertain cluster. Further analysis shows that the uncertain cluster can be subdivided further, and the subdivisions are related to two of the original clusters. Another test performed using colon cancer microarray data has automatically derived two clusters, which is consistent with the number of classes in data (cancerous and normal). JAVA software of dynamic SOM tree algorithm is available upon request for academic use. A comparison of rectangular and hexagonal topologies for GSOM is available from http://www.mame.mu.oz.au/mechatronics/journalinfo/Hsu2003supp.pdf
Measurement of entanglement entropy in the two-dimensional Potts model using wavelet analysis.

PubMed

Tomita, Yusuke

2018-05-01

A method is introduced to measure the entanglement entropy using a wavelet analysis. Using this method, the two-dimensional Haar wavelet transform of a configuration of Fortuin-Kasteleyn (FK) clusters is performed. The configuration represents a direct snapshot of spin-spin correlations since spin degrees of freedom are traced out in FK representation. A snapshot of FK clusters loses image information at each coarse-graining process by the wavelet transform. It is shown that the loss of image information measures the entanglement entropy in the Potts model.
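As a rough illustration of the coarse-graining step mentioned above, the sketch below applies one level of an orthonormal two-dimensional Haar transform to a small grid in plain Python. It is not the author's code. The function name `haar2d_level`, the sub-band labels and the normalization are illustrative choices, and the abstract does not state how the loss of image information is quantified.

```python
def haar2d_level(image):
    """One level of the orthonormal 2D Haar transform on 2x2 blocks.

    Returns four half-size grids: ll (coarse-grained approximation) and
    three detail sub-bands (lh, hl, hh). Sub-band naming conventions
    vary between references.
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("image dimensions must be even")
    ll, lh, hl, hh = [], [], [], []
    for r in range(0, rows, 2):
        ll_row, lh_row, hl_row, hh_row = [], [], [], []
        for c in range(0, cols, 2):
            a, b = image[r][c], image[r][c + 1]
            e, d = image[r + 1][c], image[r + 1][c + 1]
            ll_row.append((a + b + e + d) / 2)  # block average (scaled)
            lh_row.append((a - b + e - d) / 2)  # left-right difference
            hl_row.append((a + b - e - d) / 2)  # top-bottom difference
            hh_row.append((a - b - e + d) / 2)  # diagonal difference
        ll.append(ll_row)
        lh.append(lh_row)
        hl.append(hl_row)
        hh.append(hh_row)
    return ll, lh, hl, hh


def energy(grid):
    """Sum of squared entries of a 2D grid."""
    return sum(v * v for row in grid for v in row)
```

Keeping only `ll` and repeating the transform on it gives successively coarser snapshots, and the three detail grids hold what each step discards. With this normalization the total squared magnitude of the four outputs equals that of the input.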
A comparison of three clustering methods for finding subgroups in MRI, SMS or clinical data: SPSS TwoStep Cluster analysis, Latent Gold and SNOB.

PubMed

Kent, Peter; Jensen, Rikke K; Kongsted, Alice

2014-10-02

There are various methodological approaches to identifying clinically important subgroups and one method is to identify clusters of characteristics that differentiate people in cross-sectional and/or longitudinal data using Cluster Analysis (CA) or Latent Class Analysis (LCA). There is a scarcity of head-to-head comparisons that can inform the choice of which clustering method might be suitable for particular clinical datasets and research questions. Therefore, the aim of this study was to perform a head-to-head comparison of three commonly available methods (SPSS TwoStep CA, Latent Gold LCA and SNOB LCA). The performance of these three methods was compared: (i) quantitatively using the number of subgroups detected, the classification probability of individuals into subgroups, the reproducibility of results, and (ii) qualitatively using subjective judgments about each program's ease of use and interpretability of the presentation of results. We analysed five real datasets of varying complexity in a secondary analysis of data from other research projects. Three datasets contained only MRI findings (n = 2,060 to 20,810 vertebral disc levels), one dataset contained only pain intensity data collected for 52 weeks by text (SMS) messaging (n = 1,121 people), and the last dataset contained a range of clinical variables measured in low back pain patients (n = 543 people). Four artificial datasets (n = 1,000 each) containing subgroups of varying complexity were also analysed testing the ability of these clustering methods to detect subgroups and correctly classify individuals when subgroup membership was known. The results from the real clinical datasets indicated that the number of subgroups detected varied, the certainty of classifying individuals into those subgroups varied, the findings had perfect reproducibility, some programs were easier to use and the interpretability of the presentation of their findings also varied. 
The results from the artificial datasets indicated that all three clustering methods showed a near-perfect ability to detect known subgroups and correctly classify individuals into those subgroups. Our subjective judgement was that Latent Gold offered the best balance of sensitivity to subgroups, ease of use and presentation of results with these datasets but we recognise that different clustering methods may suit other types of data and clinical research questions.
Principal component analysis vs. self-organizing maps combined with hierarchical clustering for pattern recognition in volcano seismic spectra

NASA Astrophysics Data System (ADS)

Unglert, K.; Radić, V.; Jellinek, A. M.

2016-06-01

Variations in the spectral content of volcano seismicity related to changes in volcanic activity are commonly identified manually in spectrograms. However, long time series of monitoring data at volcano observatories require tools to facilitate automated and rapid processing. Techniques such as self-organizing maps (SOM) and principal component analysis (PCA) can help to quickly and automatically identify important patterns related to impending eruptions. For the first time, we evaluate the performance of SOM and PCA on synthetic volcano seismic spectra constructed from observations during two well-studied eruptions at Kīlauea Volcano, Hawai'i, that include features observed in many volcanic settings. In particular, our objective is to test which of the techniques can best retrieve a set of three spectral patterns that we used to compose a synthetic spectrogram. We find that, without a priori knowledge of the given set of patterns, neither SOM nor PCA can directly recover the spectra. We thus test hierarchical clustering, a commonly used method, to investigate whether clustering in the space of the principal components and on the SOM, respectively, can retrieve the known patterns. Our clustering method applied to the SOM fails to detect the correct number and shape of the known input spectra. In contrast, clustering of the data reconstructed by the first three PCA modes reproduces these patterns and their occurrence in time more consistently. This result suggests that PCA in combination with hierarchical clustering is a powerful practical tool for automated identification of characteristic patterns in volcano seismic spectra. Our results indicate that, in contrast to PCA, common clustering algorithms may not be ideal to group patterns on the SOM and that it is crucial to evaluate the performance of these tools on a control dataset prior to their application to real data.
From virtual clustering analysis to self-consistent clustering analysis: a mathematical study

NASA Astrophysics Data System (ADS)

Tang, Shaoqiang; Zhang, Lei; Liu, Wing Kam

2018-03-01

In this paper, we propose a new homogenization algorithm, virtual clustering analysis (VCA), and provide a mathematical framework for the recently proposed self-consistent clustering analysis (SCA) (Liu et al. in Comput Methods Appl Mech Eng 306:319-341, 2016). In the mathematical theory, we clarify the key assumptions and ideas of VCA and SCA, and derive the continuous and discrete Lippmann-Schwinger equations. Based on a key postulation of "once response similarly, always response similarly", clustering is performed in an offline stage by machine learning techniques (k-means and SOM), and facilitates substantial reduction of computational complexity in an online predictive stage. The clear mathematical setup allows for the first time a convergence study of clustering refinement in one space dimension. Convergence is proved rigorously, and found to be of second order from numerical investigations. Furthermore, we propose to suitably enlarge the domain in VCA, such that the boundary terms may be neglected in the Lippmann-Schwinger equation, by virtue of Saint-Venant's principle. In contrast, they were not obtained in the original SCA paper, and we find that these terms may well be responsible for the numerical dependency on the choice of reference material property. Since VCA enhances the accuracy by overcoming the modeling error, and reduces the numerical cost by avoiding an outer loop iteration for attaining the material property consistency in SCA, its efficiency is expected to be even higher than that of the recently proposed SCA algorithm.
Retrospective space-time cluster analysis of whooping cough, re-emergence in Barcelona, Spain, 2000-2011.

PubMed

Solano, Rubén; Gómez-Barroso, Diana; Simón, Fernando; Lafuente, Sarah; Simón, Pere; Rius, Cristina; Gorrindo, Pilar; Toledo, Diana; Caylà, Joan A

2014-05-01

A retrospective, space-time study of whooping cough cases reported to the Public Health Agency of Barcelona, Spain between the years 2000 and 2011 is presented. It is based on 633 individual whooping cough cases and the 2006 population census from the Spanish National Statistics Institute, stratified by age and sex at the census tract level. Cluster identification was attempted using the space-time scan statistic, assuming a Poisson distribution and restricting temporal extent to 7 days and spatial distance to 500 m. Statistical calculations were performed with Stata 11 and SaTScan, and mapping was performed with ArcGIS 10.0. Only clusters showing statistical significance (P < 0.05) were mapped. The most likely cluster identified included five census tracts located in three neighbourhoods in central Barcelona during the week from 17 to 23 August 2011. This cluster included five cases compared with the expected level of 0.0021 (relative risk = 2436, P < 0.001). In addition, 11 secondary significant space-time clusters were detected, occurring at different times and locations. Spatial statistics are considered useful for complementing epidemiological surveillance systems by visualizing excess numbers of cases in space and time, thereby increasing the possibility of identifying outbreaks not reported by the surveillance system.
Detection of protein complex from protein-protein interaction network using Markov clustering

NASA Astrophysics Data System (ADS)

Ochieng, P. J.; Kusuma, W. A.; Haryanto, T.

2017-05-01

Detection of complexes, or groups of functionally related proteins, is an important challenge when analysing biological networks. However, existing algorithms for identifying protein complexes are insufficient when applied to dense networks of experimentally derived interaction data. Therefore, we introduced a graph clustering method based on the Markov clustering algorithm to identify protein complexes within highly interconnected protein-protein interaction networks. A protein-protein interaction network was first constructed to develop a geometrical network; the network was then partitioned using Markov clustering to detect protein complexes. The value of the proposed method was illustrated by its application to human proteins associated with type II diabetes mellitus. Flow simulation of the MCL algorithm was initially performed, and topological properties of the resultant network were analysed for detection of protein complexes. The results indicated that the proposed method successfully detected a total of 34 complexes, with 11 complexes consisting of overlapping modules and 20 non-overlapping modules. The major complex consisted of 102 proteins and 521 interactions, with cluster modularity and density of 0.745 and 0.101, respectively. The comparison analysis revealed that MCL outperformed the AP, MCODE and SCPS algorithms, with a high clustering coefficient (0.751), network density and modularity index (0.630). This suggested that MCL was the most reliable and efficient of the graph clustering algorithms compared for detection of protein complexes from PPI networks.
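The flow simulation at the heart of Markov clustering alternates an expansion step (matrix squaring) with an inflation step (element-wise powering followed by column normalization). The following is a minimal pure-Python sketch on a toy graph; the function names, the inflation value of 2.0, the added self-loops and the toy adjacency matrix are illustrative assumptions, not settings reported by the authors.

```python
def normalize_columns(m):
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]

def mcl(adjacency, inflation=2.0, max_iters=100, tol=1e-9):
    """Markov clustering sketch; returns clusters as sorted lists of node indices."""
    n = len(adjacency)
    # Add self-loops, then turn the adjacency matrix into a flow matrix.
    m = normalize_columns([[float(adjacency[i][j]) + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(max_iters):
        # Expansion: flow spreads along two-step walks.
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
                    for i in range(n)]
        # Inflation: strong flows are boosted, weak flows are damped.
        inflated = normalize_columns([[x ** inflation for x in row]
                                      for row in expanded])
        change = max(abs(inflated[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = inflated
        if change < tol:
            break
    # Each row that still carries flow is an attractor; its nonzero columns
    # are the nodes of one cluster.
    clusters = {frozenset(j for j in range(n) if m[i][j] > 1e-6) for i in range(n)}
    return sorted(sorted(c) for c in clusters if c)

# Toy network: two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3.
toy = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(mcl(toy))
```

A higher inflation value generally yields finer clusters, which is the main tuning knob when applying the method to dense interaction networks.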
Dark matter phenomenology of high-speed galaxy cluster collisions

DOE PAGES

Mishchenko, Yuriy; Ji, Chueng-Ryong

2017-07-29

Here, we perform a general computational analysis of possible post-collision mass distributions in high-speed galaxy cluster collisions in the presence of self-interacting dark matter. Using this analysis, we show that astrophysically weakly self-interacting dark matter can impart subtle yet measurable features in the mass distributions of colliding galaxy clusters even without significant disruptions to the dark matter halos of the colliding galaxy clusters themselves. The most profound such evidence is found to reside in the tails of dark matter halos’ distributions, in the space between the colliding galaxy clusters. Such features appear in our simulations as shells of scattered dark matter expanding in alignment with the outgoing original galaxy clusters, contributing significant densities to projected mass distributions at large distances from collision centers and large scattering angles of up to 90°. Our simulations indicate that as much as 20% of the total collision’s mass may be deposited into such structures without noticeable disruptions to the main galaxy clusters. Such structures at large scattering angles are forbidden in purely gravitational high-speed galaxy cluster collisions. Convincing identification of such structures in real colliding galaxy clusters would be a clear indication of the self-interacting nature of dark matter. Our findings may offer an explanation for the ring-like dark matter feature recently identified in the long-range reconstructions of the mass distribution of the colliding galaxy cluster CL0024+017.
[Visual field progression in glaucoma: cluster analysis].

PubMed

Bresson-Dumont, H; Hatton, J; Foucher, J; Fonteneau, M

2012-11-01

Visual field progression analysis is one of the key points in glaucoma monitoring, but the distinction between true progression and random fluctuation is sometimes difficult. There are several different algorithms but no real consensus for detecting visual field progression. The trend analysis of global indices (MD, sLV) may miss localized deficits or be affected by media opacities. Conversely, point-by-point analysis makes progression difficult to differentiate from physiological variability, particularly when the sensitivity of a point is already low. The goal of our study was to analyse visual field progression with the EyeSuite™ Octopus Perimetry Clusters algorithm in patients with no significant changes in global indices or worsening on pointwise linear regression analysis. We analyzed the visual fields of 162 eyes (100 patients: 58 women, 42 men; average age 66.8 ± 10.91 years) with ocular hypertension or glaucoma. For inclusion, at least six reliable visual fields per eye were required, and the trend analysis (EyeSuite™ Perimetry) of visual field global indices (MD and sLV) had to show no significant progression. The analysis of changes in cluster mode was then performed. In a second step, eyes with statistically significant worsening of at least one of their clusters were analyzed point-by-point with the Octopus Field Analysis (OFA). Fifty-four eyes (33.33%) had a significant worsening in some clusters, while their global indices remained stable over time. In this group of patients, more advanced glaucoma was present than in the stable group (MD 6.41 dB vs. 2.87 dB); however, 64.82% (35/54) of the eyes in which clusters progressed had no statistically significant change in the trend analysis by pointwise linear regression. Most software algorithms for analyzing visual field progression are essentially trend analyses of global indices, or point-by-point linear regression. This study shows the potential role of cluster trend analysis. However, for best results, it is preferable to compare the analyses of several tests in combination with a morphologic exam. Copyright © 2012 Elsevier Masson SAS. All rights reserved.
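The idea of a cluster trend analysis can be sketched as follows: average the pointwise defect values within each predefined cluster of test locations at every visit, then fit an ordinary least-squares slope against time. This is only an illustration of the general approach and not the EyeSuite implementation; the cluster definitions, point indices and defect values below are hypothetical, and no significance test is included.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def cluster_slopes(times, visits, clusters):
    """Slope of the mean defect (dB per unit time) for each cluster.

    times: visit times, e.g. in years
    visits: one list of pointwise defect values (dB) per visit
    clusters: mapping of cluster name to the point indices it contains
    """
    slopes = {}
    for name, indices in clusters.items():
        means = [sum(visit[i] for i in indices) / len(indices) for visit in visits]
        slopes[name] = ols_slope(times, means)
    return slopes

# Hypothetical eye with four test points and six yearly visits:
# points 0-1 worsen by 1 dB per year, points 2-3 stay stable.
times = [0, 1, 2, 3, 4, 5]
visits = [[2.0 + t, 3.0 + t, 1.0, 1.5] for t in times]
clusters = {"superior": [0, 1], "inferior": [2, 3]}
slopes = cluster_slopes(times, visits, clusters)
```

Averaging within a cluster damps the point-to-point variability that hampers pointwise regression while still keeping the analysis localized, which is the trade-off the abstract describes.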

Gene expression profiles reveal key genes for early diagnosis and treatment of adamantinomatous craniopharyngioma.

PubMed

Yang, Jun; Hou, Ziming; Wang, Changjiang; Wang, Hao; Zhang, Hongbing

2018-04-23

Adamantinomatous craniopharyngioma (ACP) is an aggressive brain tumor that occurs predominantly in the pediatric population. Conventional diagnostic methods and standard therapy cannot treat ACPs effectively. In this paper, we aimed to identify key genes for ACP early diagnosis and treatment. Datasets GSE94349 and GSE68015 were obtained from the Gene Expression Omnibus database. Consensus clustering was applied to discover the gene clusters in the expression data of GSE94349, and functional enrichment analysis was performed on the gene set in each cluster. The protein-protein interaction (PPI) network was built with the Search Tool for the Retrieval of Interacting Genes, and hubs were selected. A support vector machine (SVM) model was built based on the signature genes identified from enrichment analysis and the PPI network. Dataset GSE94349 was used for training and testing, and GSE68015 was used for validation. In addition, RT-qPCR analysis was performed to analyze the expression of signature genes in ACP samples compared with normal controls. Seven gene clusters were discovered in the differentially expressed genes identified from the GSE94349 dataset. Enrichment analysis of each cluster identified 25 pathways that were highly associated with ACP. The PPI network was built and 46 hubs were determined. Twenty-five pathway-related genes that overlapped with the hubs in the PPI network were used as signatures to establish the SVM diagnosis model for ACP. The prediction accuracy of the SVM model for training, testing, and validation data was 94, 85, and 74%, respectively. CDH1, CCL2, ITGA2, COL8A1, COL6A2, and COL6A3 were significantly upregulated in ACP tumor samples, while CAMK2A, RIMS1, NEFL, SYT1, and STX1A were significantly downregulated, which was consistent with the differentially expressed gene analysis. The SVM model is a promising classification tool for screening and early diagnosis of ACP. The ACP-related pathways and signature genes will advance our knowledge of ACP pathogenesis and help improve therapy.
Selection of the Maximum Spatial Cluster Size of the Spatial Scan Statistic by Using the Maximum Clustering Set-Proportion Statistic.

PubMed

Ma, Yue; Yin, Fei; Zhang, Tao; Zhou, Xiaohua Andrew; Li, Xiaosong

2016-01-01

Spatial scan statistics are widely used in various fields. The performance of these statistics is influenced by parameters, such as maximum spatial cluster size, and can be improved by parameter selection using performance measures. Current performance measures are based on the presence of clusters and are thus inapplicable to data sets without known clusters. In this work, we propose a novel overall performance measure called maximum clustering set-proportion (MCS-P), which is based on the likelihood of the union of detected clusters and the applied dataset. MCS-P was compared with existing performance measures in a simulation study to select the maximum spatial cluster size. Results of other performance measures, such as sensitivity and misclassification, suggest that the spatial scan statistic achieves accurate results in most scenarios with the maximum spatial cluster sizes selected using MCS-P. Given that previously known clusters are not required in the proposed strategy, selection of the optimal maximum cluster size with MCS-P can improve the performance of the scan statistic in applications without identified clusters.
Selection of the Maximum Spatial Cluster Size of the Spatial Scan Statistic by Using the Maximum Clustering Set-Proportion Statistic

PubMed Central

Ma, Yue; Yin, Fei; Zhang, Tao; Zhou, Xiaohua Andrew; Li, Xiaosong

2016-01-01

Spatial scan statistics are widely used in various fields. The performance of these statistics is influenced by parameters, such as maximum spatial cluster size, and can be improved by parameter selection using performance measures. Current performance measures are based on the presence of clusters and are thus inapplicable to data sets without known clusters. In this work, we propose a novel overall performance measure called maximum clustering set–proportion (MCS-P), which is based on the likelihood of the union of detected clusters and the applied dataset. MCS-P was compared with existing performance measures in a simulation study to select the maximum spatial cluster size. Results of other performance measures, such as sensitivity and misclassification, suggest that the spatial scan statistic achieves accurate results in most scenarios with the maximum spatial cluster sizes selected using MCS-P. Given that previously known clusters are not required in the proposed strategy, selection of the optimal maximum cluster size with MCS-P can improve the performance of the scan statistic in applications without identified clusters. PMID:26820646
Simultaneous determination of 19 flavonoids in commercial trollflowers by using high-performance liquid chromatography and classification of samples by hierarchical clustering analysis.

PubMed

Song, Zhiling; Hashi, Yuki; Sun, Hongyang; Liang, Yi; Lan, Yuexiang; Wang, Hong; Chen, Shizhong

2013-12-01

The flowers of Trollius species, named Jin Lianhua in Chinese, are widely used traditional Chinese herbs with vital biological activity that have been used for several decades in China to treat upper respiratory infections, pharyngitis, tonsillitis, and bronchitis. We developed a rapid and reliable method for simultaneous quantitative analysis of 19 flavonoids in trollflowers by using high-performance liquid chromatography (HPLC). Chromatography was performed on an Inertsil ODS-3 C18 column, with gradient elution using methanol-acetonitrile-water with 0.02% (v/v) formic acid. Content determination was used to evaluate the quality of commercial trollflowers from different regions in China, while three Trollius species (Trollius chinensis Bunge, Trollius ledebouri Reichb, Trollius buddae Schipcz) were explicitly distinguished by using hierarchical clustering analysis. The linearity, precision, accuracy, limit of detection, and limit of quantification were validated for the quantification method, which proved sensitive, accurate and reproducible, indicating that the proposed approach is applicable to the routine analysis and quality control of trollflowers. © 2013.
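Hierarchical clustering of samples by their measured contents can be sketched with a small agglomerative routine. The abstract does not state the distance measure or linkage used, so Euclidean distance and average linkage are assumptions here, and the two-component sample profiles are invented for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length content profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerative(points, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of sample indices."""
    clusters = [[i] for i in range(len(points))]

    def average_linkage(c1, c2):
        total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return sorted(sorted(c) for c in clusters)

# Hypothetical contents (mg/g) of two flavonoids in six samples.
samples = [(1.0, 0.20), (1.1, 0.25), (5.0, 3.0), (5.2, 3.1), (9.0, 0.1), (9.1, 0.0)]
print(agglomerative(samples, n_clusters=3))
```

With real data, each profile would hold all quantified flavonoids, and the contents are often standardized first so that abundant compounds do not dominate the distances.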
Looking Wider and Further: The Evolution of Galaxies Inside Galaxy Clusters

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhang, Yuanyuan

2016-01-01

Galaxy clusters are rare objects in the universe, but ongoing wide-field optical surveys are identifying many thousands of them to redshift 1.0 and beyond. Using early data from the Dark Energy Survey (DES) and publicly released data from the Sloan Digital Sky Survey (SDSS), this dissertation explores the evolution of cluster galaxies in the redshift range from 0 to 1.0. As it is common for deep wide-field sky surveys like DES to struggle with galaxy detection efficiency at cluster cores, the first component of this dissertation describes an efficient package that helps resolve the issue. The second part focuses on the formation of cluster galaxies. The study quantifies the growth of cluster bright central galaxies (BCGs), and argues for the importance of merging and intra-cluster light production during BCG evolution. An analysis of the cluster red sequence galaxy luminosity function is also performed, demonstrating that the abundance of these galaxies is mildly dependent on cluster mass and redshift. The last component of the dissertation characterizes the properties of galaxy filaments to help understand cluster environments.
Cognitive Model Exploration and Optimization: A New Challenge for Computational Science

DTIC Science & Technology

2010-03-01

the generation and analysis of computational cognitive models to explain various aspects of cognition. Typically the behavior of these models...computational scale of a workstation, so we have turned to high performance computing (HPC) clusters and volunteer computing for large-scale...computational resources. The majority of applications on the Department of Defense HPC clusters focus on solving partial differential equations (Post
Strategic groups, performance, and strategic response in the nursing home industry.

PubMed Central

Zinn, J S; Aaronson, W E; Rosko, M D

1994-01-01

OBJECTIVE. This study examines the effect of strategic group membership on nursing home performance and strategic behavior. DATA SOURCES AND STUDY SETTING. Data from the 1987 Medicare and Medicaid Automated Certification Survey were combined with data from the 1987 and 1989 Pennsylvania Long Term Care Facility Questionnaire. The sample consisted of 383 Pennsylvania nursing homes. STUDY DESIGN. Cluster analysis was used to place the 383 nursing homes into strategic groups on the basis of variables measuring scope and resource deployment. Performance was measured by indicators of the quality of nursing home care (rates of pressure ulcers, catheterization, and restraint usage) and efficiency in services provision. Changes in Medicare participation after passage of the 1988 Medicare Catastrophic Coverage Act (MCCA) measured strategic behavior. MANOVA and Tukey HSD post hoc means tests determined if significant differences were associated with strategic group membership. FINDINGS. Cluster analysis produced an optimal seven-group solution. Differences in group means were significant for the clustering, performance, and conduct variables (p < .0001). Strategic groups characterized by facilities providing a continuum of care services had the best patient care outcomes. The most efficient groups were characterized by facilities with high Medicare census. While all strategic groups increased Medicare census following passage of the MCCA, those dominated by for-profits had the greatest increases. CONCLUSIONS. Our analysis demonstrates that strategic orientation influences nursing home response to regulatory initiatives, a factor that should be recognized in policy formation directed at nursing home reform. PMID:8005789
Multi-scale visual analysis of time-varying electrocorticography data via clustering of brain regions

DOE PAGES

Murugesan, Sugeerth; Bouchard, Kristofer; Chang, Edward; ...

2017-06-06

There exists a need for effective and easy-to-use software tools supporting the analysis of complex electrocorticography (ECoG) data. Understanding how epileptic seizures develop or identifying diagnostic indicators for neurological diseases requires the in-depth analysis of neural activity data from ECoG. Such data is multi-scale and is of high spatio-temporal resolution. Comprehensive analysis of this data should be supported by interactive visual analysis methods that allow a scientist to understand functional patterns at varying levels of granularity and comprehend their time-varying behavior. We introduce a novel multi-scale visual analysis system, ECoG ClusterFlow, for the detailed exploration of ECoG data. Our system detects and visualizes dynamic high-level structures, such as communities, derived from the time-varying connectivity network. The system supports two major views: 1) an overview summarizing the evolution of clusters over time and 2) an electrode view using a hierarchical glyph-based design to visualize the propagation of clusters in their spatial, anatomical context. We present case studies that were performed in collaboration with neuroscientists and neurosurgeons using simulated and recorded epileptic seizure data to demonstrate our system's effectiveness. ECoG ClusterFlow supports the comparison of spatio-temporal patterns for specific time intervals and allows a user to utilize various clustering algorithms. Neuroscientists can identify the site of seizure genesis and its spatial progression during the various stages of a seizure. Our system serves as a fast and powerful means for the generation of preliminary hypotheses that can be used as a basis for subsequent application of rigorous statistical methods, with the ultimate goal being the clinical treatment of epileptogenic zones.
The Technical and Biological Reproducibility of Matrix-Assisted Laser Desorption Ionization-Time of Flight Mass Spectrometry (MALDI-TOF MS) Based Typing: Employment of Bioinformatics in a Multicenter Study

PubMed Central

Oberle, Michael; Wohlwend, Nadia; Jonas, Daniel; Maurer, Florian P.; Jost, Geraldine; Tschudin-Sutter, Sarah; Vranckx, Katleen; Egli, Adrian

2016-01-01

Background The technical, biological, and inter-center reproducibility of matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF MS) typing data has not yet been explored. The aim of this study is to compare typing data from multiple centers employing bioinformatics using bacterial strains from two past outbreaks and non-related strains. Material/Methods Participants received twelve extended-spectrum beta-lactamase-producing E. coli isolates and followed the same standard operating procedure (SOP) including a full-protein extraction protocol. All laboratories provided visually read spectra via flexAnalysis (Bruker, Germany). Raw data from each laboratory allowed calculating the technical and biological reproducibility between centers using BioNumerics (Applied Maths NV, Belgium). Results Technical and biological reproducibility ranged between 96.8–99.4% and 47.6–94.4%, respectively. The inter-center reproducibility showed a comparable clustering among identical isolates. Principal component analysis indicated a higher tendency to cluster within the same center. Therefore, we used a discriminant analysis, which completely separated the clusters. Next, we defined a reference center and performed a statistical analysis to identify specific peaks that identify the outbreak clusters. Finally, we applied a classifier algorithm, a linear support vector machine, to the determined peaks. A validation showed that within the set of the reference center, the identification of the cluster was 100% correct, with a large contrast between the score for the correct cluster and that of the next best scoring cluster. Conclusions Based on the sufficient technical and biological reproducibility of MALDI-TOF MS based spectra, detection of specific clusters is possible from spectra obtained from different centers. However, we believe that a shared SOP and a bioinformatics approach are required to make the analysis robust and reliable. PMID:27798637
The Gap Procedure: for the identification of phylogenetic clusters in HIV-1 sequence data.

PubMed

Vrbik, Irene; Stephens, David A; Roger, Michel; Brenner, Bluma G

2015-11-04

In the context of infectious disease, sequence clustering can be used to provide important insights into the dynamics of transmission. Cluster analysis is usually performed using a phylogenetic approach whereby clusters are assigned on the basis of sufficiently small genetic distances and high bootstrap support (or posterior probabilities). The computational burden involved in this phylogenetic threshold approach is a major drawback, especially when a large number of sequences are being considered. In addition, this method requires a skilled user to specify the appropriate threshold values, which may vary widely depending on the application. This paper presents the Gap Procedure, a distance-based clustering algorithm for the classification of DNA sequences sampled from individuals infected with the human immunodeficiency virus type 1 (HIV-1). Our heuristic algorithm bypasses the need for phylogenetic reconstruction, thereby supporting the quick analysis of large genetic data sets. Moreover, this fully automated procedure relies on data-driven gaps in sorted pairwise distances to infer clusters, so no user-specified threshold values are required. The clustering results obtained by the Gap Procedure on both real and simulated data closely agree with those found using the threshold approach, while requiring only a fraction of the time to complete the analysis. Apart from the dramatic gains in computational time, the Gap Procedure is highly effective in finding distinct groups of genetically similar sequences and obviates the need for subjective user-specified values. The clusters of genetically similar sequences returned by this procedure can be used to detect patterns in HIV-1 transmission and thereby aid in the prevention, treatment and containment of the disease.
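The notion of letting a gap in sorted pairwise distances set the clustering threshold can be illustrated with a deliberately simplified sketch. This is not the published Gap Procedure: it assumes one global threshold placed at the midpoint of the largest gap, then links sequences whose distance falls below it. The function name and the distance matrix are hypothetical, and at least three sequences are required.

```python
def gap_clusters(dist):
    """Cluster items using the largest gap in sorted pairwise distances.

    dist: symmetric matrix of pairwise genetic distances (at least 3 items).
    Returns clusters as sorted lists of item indices.
    """
    n = len(dist)
    pairs = sorted((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    values = [d for d, _, _ in pairs]
    # Locate the largest jump between consecutive sorted distances.
    gaps = [values[k + 1] - values[k] for k in range(len(values) - 1)]
    k = max(range(len(gaps)), key=gaps.__getitem__)
    threshold = (values[k] + values[k + 1]) / 2

    # Union-find: link every pair that is closer than the threshold.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for d, i, j in pairs:
        if d < threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical distances: sequences 0-2 are close, 3-4 are close,
# and the two groups are far apart.
distances = [
    [0.00, 0.01, 0.02, 0.10, 0.11],
    [0.01, 0.00, 0.015, 0.12, 0.10],
    [0.02, 0.015, 0.00, 0.11, 0.12],
    [0.10, 0.12, 0.11, 0.00, 0.015],
    [0.11, 0.10, 0.12, 0.015, 0.00],
]
print(gap_clusters(distances))
```

Because the threshold comes from the data, no user-chosen cutoff is needed, which is the property the abstract emphasizes.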
Objective and Perceived Weight: Associations with Risky Adolescent Sexual Behavior

PubMed Central

Akers, Aletha Y.; Cohen, Elan D.; Marshal, Michael P.; Roebuck, Geoff; Yu, Lan; Hipwell, Alison E.

2016-01-01

CONTEXT Studies have shown that obesity is associated with increased sexual risk-taking, particularly among adolescent females, but the relationships between obesity, perceived weight and sexual risk behaviors are poorly understood. METHODS Integrative data analysis was performed that combined baseline data from the 1994–1995 National Longitudinal Study of Adolescent Health (from 17,606 respondents in grades 7–12) and the 1997 National Longitudinal Survey of Youth (from 7,752 respondents aged 12–16). Using six sexual behaviors measured in both data sets (age at first intercourse, various measures of contraceptive use and number of partners), cluster analysis was conducted that identified five distinct behavior clusters. Multivariate ordinal logistic regression analysis examined associations between adolescents’ weight status (categorized as underweight, normal-weight, overweight or obese) and weight perception and their cluster membership. RESULTS Among males, being underweight, rather than normal-weight, was negatively associated with membership in increasingly risky clusters (odds ratio, 0.5), as was the perception of being overweight, as opposed to about the right weight (0.8). However, being overweight was positively associated with males’ membership in increasingly risky clusters (1.3). Among females, being obese, rather than normal-weight, was negatively correlated with membership in increasingly risky clusters (0.8), while the perception of being overweight was positively correlated with such membership (1.1). CONCLUSIONS Both objective and subjective assessments of weight are associated with the clustering of risky sexual behaviors among adolescents, and these behavioral patterns differ by gender. PMID:27608419
Objective and Perceived Weight: Associations with Risky Adolescent Sexual Behavior.

PubMed

Akers, Aletha Y; Cohen, Elan D; Marshal, Michael P; Roebuck, Geoff; Yu, Lan; Hipwell, Alison E

2016-09-01

Studies have shown that obesity is associated with increased sexual risk-taking, particularly among adolescent females, but the relationships between obesity, perceived weight and sexual risk behaviors are poorly understood. Integrative data analysis was performed that combined baseline data from the 1994-1995 National Longitudinal Study of Adolescent Health (from 17,606 respondents in grades 7-12) and the 1997 National Longitudinal Survey of Youth (from 7,752 respondents aged 12-16). Using six sexual behaviors measured in both data sets (age at first intercourse, various measures of contraceptive use and number of partners), cluster analysis was conducted that identified five distinct behavior clusters. Multivariate ordinal logistic regression analysis examined associations between adolescents' weight status (categorized as underweight, normal-weight, overweight or obese) and weight perception and their cluster membership. Among males, being underweight, rather than normal-weight, was negatively associated with membership in increasingly risky clusters (odds ratio, 0.5), as was the perception of being overweight, as opposed to about the right weight (0.8). However, being overweight was positively associated with males' membership in increasingly risky clusters (1.3). Among females, being obese, rather than normal-weight, was negatively correlated with membership in increasingly risky clusters (0.8), while the perception of being overweight was positively correlated with such membership (1.1). Both objective and subjective assessments of weight are associated with the clustering of risky sexual behaviors among adolescents, and these behavioral patterns differ by gender. Copyright © 2016 by the Guttmacher Institute.
Farm, household, and farmer characteristics associated with changes in management practices and technology adoption among dairy smallholders.

PubMed

Martínez-García, Carlos Galdino; Ugoretz, Sarah Janes; Arriaga-Jordán, Carlos Manuel; Wattiaux, Michel André

2015-02-01

This study explored whether technology adoption and changes in management practices were associated with farm structure, household, and farmer characteristics, and sought to identify processes that may foster productivity and sustainability of small-scale dairy farming in the central highlands of Mexico. Factor analysis of survey data from 44 smallholders identified three factors (related to farm size, farmer's engagement, and household structure) that explained 70% of cumulative variance. The subsequent hierarchical cluster analysis yielded three clusters. Cluster 1 included the most senior farmers, with the fewest years of education but the most years of experience. Cluster 2 included farmers who reported access to extension, cooperative services, and more management changes. Farmers in cluster 2 obtained 25% and 35% more milk than farmers in clusters 1 and 3, respectively. Cluster 3 included the youngest farmers, with the most years of education and the greatest availability of family labor. Access to a network and membership in a community of peers appeared as important contributors to success. Smallholders gravitated towards easy-to-implement technologies that have immediate benefits. Nonusers of high-investment technologies found them unaffordable because of cost, insufficient farm size, and lack of knowledge or reliable electricity. Multivariate analysis may be a useful tool in planning extension activities and organizing channels of communication to effectively target farmers with varying needs, constraints, and motivations for change, and in identifying farmers who may exemplify models of change for others who manage farms that are structurally similar but performing at a lower level.
An approach to functionally relevant clustering of the protein universe: Active site profile‐based clustering of protein structures and sequences

PubMed Central

Knutson, Stacy T.; Westwood, Brian M.; Leuthaeuser, Janelle B.; Turner, Brandon E.; Nguyendac, Don; Shea, Gabrielle; Kumar, Kiran; Hayden, Julia D.; Harper, Angela F.; Brown, Shoshana D.; Morris, John H.; Ferrin, Thomas E.; Babbitt, Patricia C.

2017-01-01

Abstract Protein function identification remains a significant problem. Solving this problem at the molecular functional level would allow mechanistic determinant identification—amino acids that distinguish details between functional families within a superfamily. Active site profiling was developed to identify mechanistic determinants. DASP and DASP2 were developed as tools to search sequence databases using active site profiling. Here, TuLIP (Two‐Level Iterative clustering Process) is introduced as an iterative, divisive clustering process that utilizes active site profiling to separate structurally characterized superfamily members into functionally relevant clusters. Underlying TuLIP is the observation that functionally relevant families (curated by Structure‐Function Linkage Database, SFLD) self‐identify in DASP2 searches; clusters containing multiple functional families do not. Each TuLIP iteration produces candidate clusters, each evaluated to determine if it self‐identifies using DASP2. If so, it is deemed a functionally relevant group. Divisive clustering continues until each structure is either a functionally relevant group member or a singlet. TuLIP is validated on enolase and glutathione transferase structures, superfamilies well‐curated by SFLD. Correlation is strong; small numbers of structures prevent statistically significant analysis. TuLIP‐identified enolase clusters are used in DASP2 GenBank searches to identify sequences sharing functional site features. Analysis shows a true positive rate of 96%, false negative rate of 4%, and maximum false positive rate of 4%. F‐measure and performance analysis on the enolase search results and comparison to GEMMA and SCI‐PHY demonstrate that TuLIP avoids the over‐division problem of these methods. Mechanistic determinants for enolase families are evaluated and shown to correlate well with literature results. PMID:28054422
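For readers unfamiliar with the F-measure mentioned above, the following sketch shows how precision, recall and their harmonic mean follow from counts of true positives, false positives and false negatives. The counts are invented for illustration and are not the enolase search results; the function name is hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall (true positive rate) and F1 from raw counts."""
    precision = tp / (tp + fp)  # fraction of reported hits that are correct
    recall = tp / (tp + fn)     # fraction of true members that were found
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical search result: 90 correct hits, 10 wrong hits, 30 misses.
precision, recall, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
```

Because the F-measure penalizes both missed members and spurious hits, it is a convenient single number for comparing how different clustering methods divide a superfamily.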
An approach to functionally relevant clustering of the protein universe: Active site profile-based clustering of protein structures and sequences.

PubMed

Knutson, Stacy T; Westwood, Brian M; Leuthaeuser, Janelle B; Turner, Brandon E; Nguyendac, Don; Shea, Gabrielle; Kumar, Kiran; Hayden, Julia D; Harper, Angela F; Brown, Shoshana D; Morris, John H; Ferrin, Thomas E; Babbitt, Patricia C; Fetrow, Jacquelyn S

2017-04-01

Protein function identification remains a significant problem. Solving this problem at the molecular functional level would allow mechanistic determinant identification-amino acids that distinguish details between functional families within a superfamily. Active site profiling was developed to identify mechanistic determinants. DASP and DASP2 were developed as tools to search sequence databases using active site profiling. Here, TuLIP (Two-Level Iterative clustering Process) is introduced as an iterative, divisive clustering process that utilizes active site profiling to separate structurally characterized superfamily members into functionally relevant clusters. Underlying TuLIP is the observation that functionally relevant families (curated by Structure-Function Linkage Database, SFLD) self-identify in DASP2 searches; clusters containing multiple functional families do not. Each TuLIP iteration produces candidate clusters, each evaluated to determine if it self-identifies using DASP2. If so, it is deemed a functionally relevant group. Divisive clustering continues until each structure is either a functionally relevant group member or a singlet. TuLIP is validated on enolase and glutathione transferase structures, superfamilies well-curated by SFLD. Correlation is strong; small numbers of structures prevent statistically significant analysis. TuLIP-identified enolase clusters are used in DASP2 GenBank searches to identify sequences sharing functional site features. Analysis shows a true positive rate of 96%, false negative rate of 4%, and maximum false positive rate of 4%. F-measure and performance analysis on the enolase search results and comparison to GEMMA and SCI-PHY demonstrate that TuLIP avoids the over-division problem of these methods. Mechanistic determinants for enolase families are evaluated and shown to correlate well with literature results. © 2017 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society.
Clustering P-Wave Receiver Functions To Constrain Subsurface Seismic Structure

NASA Astrophysics Data System (ADS)

Chai, C.; Larmat, C. S.; Maceira, M.; Ammon, C. J.; He, R.; Zhang, H.

2017-12-01

The acquisition of high-quality data from permanent and temporary dense seismic networks provides the opportunity to apply statistical and machine learning techniques to a broad range of geophysical observations. Lekic and Romanowicz (2011) used clustering analysis on tomographic velocity models of the western United States to perform tectonic regionalization, and the velocity-profile clusters agree well with known geomorphic provinces. A complementary and somewhat less restrictive approach is to apply cluster analysis directly to geophysical observations. In this presentation, we apply clustering analysis to teleseismic P-wave receiver functions (RFs), continuing efforts of Larmat et al. (2015) and Maceira et al. (2015). These earlier studies validated the approach with surface waves and stacked EARS RFs from the USArray stations. In this study, we experiment with both the K-means and hierarchical clustering algorithms. We also test different distance metrics defined in the vector space of RFs following Lekic and Romanowicz (2011). We cluster data from two distinct data sets. The first, corresponding to the western US, was produced by smoothing/interpolation of the receiver-function wavefield (Chai et al. 2015). Spatial coherence and agreement with geologic region increase with this simpler, spatially smoothed set of observations. The second data set is composed of RFs for more than 800 stations of the China Digital Seismic Network (CSN). Preliminary results show a first-order agreement between clusters and tectonic region, and each region cluster includes a distinct Ps arrival, which probably reflects differences in crustal thickness. Regionalization remains an important step to characterize a model prior to application of full waveform and/or stochastic imaging techniques because of the computational expense of these types of studies. Machine learning techniques can provide valuable information that can be used to design and characterize formal geophysical inversion, providing information on spatial variability in the subsurface geology.
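
The K-means step mentioned in this abstract can be illustrated with a short sketch. It is not the authors' code: the farthest-point initialisation, the toy four-sample traces, and the names `kmeans` and `euclidean` are assumptions made for illustration, and the `dist` argument only stands in for the different distance metrics the abstract refers to.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length traces."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(traces, k, dist=euclidean, max_iter=100):
    """Lloyd-style k-means on equal-length traces (lists of floats)."""
    # Deterministic farthest-point initialisation (an assumption of this sketch).
    centroids = [list(traces[0])]
    while len(centroids) < k:
        far = max(traces, key=lambda t: min(dist(t, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each trace goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: dist(t, centroids[j]))
                      for t in traces]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the sample-wise mean of its members.
        for j in range(k):
            members = [t for t, lab in zip(traces, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical toy "receiver functions": two pairs with different late arrivals.
rfs = [
    [1.0, 0.5, 0.0, 0.0],
    [0.9, 0.6, 0.1, 0.0],
    [1.0, 0.0, 0.0, 0.5],
    [0.9, 0.1, 0.0, 0.6],
]
labels, centroids = kmeans(rfs, k=2)
```

The mean update is only the optimal centroid for squared Euclidean distance, so passing a different `dist` turns this into a heuristic rather than true K-means.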
The cosmological analysis of X-ray cluster surveys - I. A new method for interpreting number counts

NASA Astrophysics Data System (ADS)

Clerc, N.; Pierre, M.; Pacaud, F.; Sadibekova, T.

2012-07-01

We present a new method aimed at simplifying the cosmological analysis of X-ray cluster surveys. It is based on purely instrumental observable quantities considered in a two-dimensional X-ray colour-magnitude diagram (hardness ratio versus count rate). The basic principle is that even in rather shallow surveys, substantial information on cluster redshift and temperature is present in the raw X-ray data and can be statistically extracted; in parallel, such diagrams can be readily predicted from an ab initio cosmological modelling. We illustrate the methodology for the case of a 100-deg² XMM survey having a sensitivity of ~10⁻¹⁴ erg s⁻¹ cm⁻² and fit, at the same time, the survey selection function, the cluster evolutionary scaling relations and the cosmology; our sole assumption - driven by the limited size of the sample considered in the case study - is that the local cluster scaling relations are known. We devote special attention to the realistic modelling of the count-rate measurement uncertainties and evaluate the potential of the method via a Fisher analysis. In the absence of individual cluster redshifts, the count rate and hardness ratio (CR-HR) method appears to be much more efficient than the traditional approach based on cluster counts (i.e. dn/dz, requiring redshifts). In the case where redshifts are available, our method performs similarly to the traditional mass function (dn/dM/dz) for the purely cosmological parameters, but better constrains the parameters defining the cluster scaling relations and their evolution. A further practical advantage of the CR-HR method is its simplicity: this fully top-down approach totally bypasses the tedious steps consisting in deriving cluster masses from X-ray temperature measurements.
Analysis of genetic association in Listeria and Diabetes using Hierarchical Clustering and Silhouette Index

NASA Astrophysics Data System (ADS)

Pagnuco, Inti A.; Pastore, Juan I.; Abras, Guillermo; Brun, Marcel; Ballarin, Virginia L.

2016-04-01

It is usually assumed that co-expressed genes suggest co-regulation in the underlying regulatory network. Determining sets of co-expressed genes is an important task, where significant groups of genes are defined based on some criteria. This task is usually performed by clustering algorithms, where the whole family of genes, or a subset of them, is clustered into meaningful groups based on their expression values in a set of experiments. In this work we used a methodology based on the Silhouette index as a measure of cluster quality for individual gene groups, and a combination of several variants of hierarchical clustering to generate the candidate groups, to obtain sets of co-expressed genes for two real data examples. We analyzed the quality of the best ranked groups, obtained by the algorithm, using an online bioinformatics tool that provides network information for the selected genes. Moreover, to verify the performance of the algorithm, considering the fact that it doesn’t find all possible subsets, we compared its results against a full search, to determine the amount of good co-regulated sets not detected.
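
As an illustration of using the Silhouette index to score individual groups, the sketch below computes the mean silhouette width of each cluster. It is a generic implementation under stated assumptions (Euclidean distance, at least two clusters, hypothetical two-dimensional "expression profiles"), not the authors' pipeline; the function name `silhouette_per_cluster` is made up for this example.

```python
import math

def silhouette_per_cluster(points, labels):
    """Mean silhouette width of each cluster (Euclidean distance).

    Assumes at least two distinct labels are present.
    """
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    scores = {}
    for lab, members in clusters.items():
        widths = []
        for i in members:
            if len(members) == 1:
                widths.append(0.0)  # usual convention for singleton clusters
                continue
            # a: mean distance to the other members of the same cluster.
            a = sum(math.dist(points[i], points[j])
                    for j in members if j != i) / (len(members) - 1)
            # b: smallest mean distance to the members of any other cluster.
            b = min(
                sum(math.dist(points[i], points[j]) for j in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab
            )
            denom = max(a, b)
            widths.append((b - a) / denom if denom > 0 else 0.0)
        scores[lab] = sum(widths) / len(widths)
    return scores

# Hypothetical profiles: genes 0-1 behave alike, genes 2-3 behave alike.
profiles = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
good = silhouette_per_cluster(profiles, [0, 0, 1, 1])
bad = silhouette_per_cluster(profiles, [0, 1, 0, 1])
```

Ranking candidate groups by these per-cluster scores is one way to pick the "best ranked groups" the abstract describes; tight, well-separated groups score near 1 and poorly formed ones score near or below 0.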
Cancer Transcriptome Dataset Analysis: Comparing Methods of Pathway and Gene Regulatory Network-Based Cluster Identification.

PubMed

Nam, Seungyoon

2017-04-01

Cancer transcriptome analysis is one of the leading areas of Big Data science, biomarker, and pharmaceutical discovery, not to forget personalized medicine. Yet, cancer transcriptomics and postgenomic medicine require innovation in bioinformatics as well as comparison of the performance of available algorithms. In this data analytics context, the value of network generation and algorithms has been widely underscored for addressing the salient questions in cancer pathogenesis. Analysis of cancer transcriptome often results in complicated networks where identification of network modularity remains critical, for example, in delineating the "druggable" molecular targets. Network clustering is useful, but depends on the network topology in and of itself. Notably, the performance of different network-generating tools for network cluster (NC) identification has been little investigated to date. Hence, using gastric cancer (GC) transcriptomic datasets, we compared two algorithms for generating pathway versus gene regulatory network-based NCs, showing that the pathway-based approach better agrees with a reference set of cancer-functional contexts. Finally, by applying pathway-based NC identification to GC transcriptome datasets, we describe cancer NCs that associate with candidate therapeutic targets and biomarkers in GC. These observations collectively inform future research on cancer transcriptomics, drug discovery, and rational development of new analysis tools for optimal harnessing of omics data.
Sensitivity and specificity of univariate MRI analysis of experimentally degraded cartilage

PubMed Central

Lin, Ping-Chang; Reiter, David A.; Spencer, Richard G.

2010-01-01

MRI is increasingly used to evaluate cartilage in tissue constructs, explants, and animal and patient studies. However, while mean values of MR parameters, including T1, T2, magnetization transfer rate km, apparent diffusion coefficient ADC, and the dGEMRIC-derived fixed charge density, correlate with tissue status, the ability to classify tissue according to these parameters has not been explored. Therefore, the sensitivity and specificity with which each of these parameters was able to distinguish between normal and trypsin-degraded, and between normal and collagenase-degraded, cartilage explants were determined. Initial analysis was performed using a training set to determine simple group means to which parameters obtained from a validation set were compared. T1 and ADC showed the greatest ability to discriminate between normal and degraded cartilage. Further analysis with k-means clustering, which eliminates the need for a priori identification of sample status, generally performed comparably. Use of fuzzy c-means (FCM) clustering to define centroids likewise did not result in improvement in discrimination. Finally, a FCM clustering approach in which validation samples were assigned in a probabilistic fashion to control and degraded groups was implemented, reflecting the range of tissue characteristics seen with cartilage degradation. PMID:19705467
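
The training-set/validation-set procedure described here can be sketched as nearest-group-mean classification followed by a sensitivity and specificity count. The numbers below are invented for illustration (they are not the study's T1 or ADC data), and the function names are hypothetical.

```python
def classify_by_group_means(train_normal, train_degraded, values):
    """Return True ("degraded") when a value lies closer to the degraded mean."""
    mean_n = sum(train_normal) / len(train_normal)
    mean_d = sum(train_degraded) / len(train_degraded)
    return [abs(v - mean_d) < abs(v - mean_n) for v in values]

def sensitivity_specificity(predicted, truth):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(p and t for p, t in zip(predicted, truth))
    fn = sum((not p) and t for p, t in zip(predicted, truth))
    tn = sum((not p) and (not t) for p, t in zip(predicted, truth))
    fp = sum(p and (not t) for p, t in zip(predicted, truth))
    return tp / (tp + fn), tn / (tn + fp)

# Invented single-parameter measurements (arbitrary units).
train_normal = [1.0, 1.1, 0.9]
train_degraded = [1.5, 1.6, 1.4]
validation = [0.95, 1.05, 1.30, 1.45, 1.55, 1.20]
is_degraded = [False, False, False, True, True, True]

predicted = classify_by_group_means(train_normal, train_degraded, validation)
sens, spec = sensitivity_specificity(predicted, is_degraded)
```

In this toy set one normal sample is called degraded and one degraded sample is called normal, so both rates come out at 2/3. Replacing the training means with k-means or FCM centroids changes only how `mean_n` and `mean_d` are obtained.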

Integrating Multiple Data Views for Improved Malware Analysis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Anderson, Blake H.

2014-01-31

Exploiting multiple views of a program makes obfuscating the intended behavior of a program more difficult, allowing for better performance in classification, clustering, and phylogenetic reconstruction.
Identification of five clusters of comorbidities in a longitudinal Japanese chronic obstructive pulmonary disease cohort.

PubMed

Chubachi, Shotaro; Sato, Minako; Kameyama, Naofumi; Tsutsumi, Akihiro; Sasaki, Mamoru; Tateno, Hiroki; Nakamura, Hidetoshi; Asano, Koichiro; Betsuyaku, Tomoko

2016-08-01

Patients with chronic obstructive pulmonary disease (COPD) frequently suffer from various comorbidities. Recently, cluster analysis has been proposed to examine the phenotypic heterogeneity in COPD. In order to comprehensively understand the comorbidities of COPD in Japan, we conducted a multicenter, longitudinal cohort study, called the Keio COPD Comorbidity Research (K-CCR). In this cohort, comorbid diagnoses were established by both objective examination and review of clinical records, in addition to self-report. We aimed to investigate the clustering of nineteen clinically relevant comorbidities and the meaningful outcomes of the clusters over a two-year follow-up period. The present study analyzed data from COPD patients whose comorbidity data were complete (n = 311). Cluster analysis was performed using Ward's minimum-variance method. Five comorbidity clusters were identified: less comorbidity; malignancy; metabolic and cardiovascular; gastroesophageal reflux disease (GERD) and psychological; and underweight and anemic. FEV1 did not differ among the clusters. The GERD and psychological cluster had worse COPD assessment test (CAT) and Saint George's respiratory questionnaire (SGRQ) scores at baseline compared to the other clusters (CAT: p = 0.0003 and SGRQ: p = 0.00046). The rate of change in these scores did not differ within 2 years. The underweight and anemic cluster included subjects with lower baseline ratio of predicted diffusing capacity (DLco/VA) compared to the malignancy cluster (p = 0.036). Five clusters of comorbidities were identified in Japanese COPD patients. The clinical characteristics and health-related quality of life were different among these clusters during a follow-up of two years. Copyright © 2016 Elsevier Ltd. All rights reserved.
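
Ward's minimum-variance method, named in this abstract, merges at each step the two clusters whose union causes the smallest increase in within-cluster sum of squares. The naive sketch below shows that criterion on an invented binary comorbidity table; the data, the four-column layout, and the function name `ward_clusters` are assumptions, and a statistics package would normally be used instead.

```python
def ward_clusters(rows, n_clusters):
    """Agglomerative clustering with Ward's minimum-variance criterion."""
    # Each cluster is stored as (member indices, centroid).
    clusters = [([i], [float(x) for x in r]) for i, r in enumerate(rows)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ma, ca), (mb, cb) = clusters[a], clusters[b]
                na, nb = len(ma), len(mb)
                sq = sum((x - y) ** 2 for x, y in zip(ca, cb))
                # Increase in within-cluster sum of squares if a and b merge.
                cost = na * nb / (na + nb) * sq
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        na, nb = len(ma), len(mb)
        # Size-weighted mean of the two centroids.
        merged = (ma + mb,
                  [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)])
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append(merged)
    return sorted(sorted(m) for m, _ in clusters)

# Invented table: rows are patients, columns are four yes/no comorbidities.
patients = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]
groups = ward_clusters(patients, n_clusters=2)
```

The pairwise search makes this cubic in the number of patients, which is acceptable for a demonstration but not for large cohorts.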
Analysis of the convective evaporation of nondilute clusters of drops

NASA Technical Reports Server (NTRS)

Bellan, J.; Harstad, K.

1987-01-01

The penetration distance of an outer flow into a drop cluster volume is the critical, evaporation mode-controlling parameter in the present model for nondilute drop clusters' convective evaporation. The model is found to perform well for such low penetration distances as those obtained for dense clusters in hot environments and low relative velocities between the outer gases and the cluster. For large penetration distances, however, the predictive power of the model deteriorates; in addition, the evaporation time is found to be a weak function of the initial relative velocity and a strong function of the initial drop temperature. The results generally show that the interior drop temperature was transient throughout the drop lifetime, although temperature nonuniformities persisted up to the first third of the total evaporation time at most.
Star clusters: age, metallicity and extinction from integrated spectra

NASA Astrophysics Data System (ADS)

González Delgado, Rosa M.; Cid Fernandes, Roberto

2010-01-01

Integrated optical spectra of star clusters in the Magellanic Clouds and a few Galactic globular clusters are fitted using high-resolution spectral models for single stellar populations. The goal is to estimate the age, metallicity and extinction of the clusters, and evaluate the degeneracies among these parameters. Several sets of evolutionary models that were computed with recent high-spectral-resolution stellar libraries (MILES, GRANADA, STELIB) are used as inputs to the starlight code to perform the fits. The comparison of the results derived from this method and previous estimates available in the literature allows us to evaluate the pros and cons of each set of models to determine star cluster properties. In addition, we quantify the uncertainties associated with the age, metallicity and extinction determinations resulting from variance in the ingredients for the analysis.
Wechsler Adult Intelligence Scale-Third Edition profiles and their relationship to self-reported outcome following traumatic brain injury.

PubMed

Harman-Smith, Yasmin E; Mathias, Jane L; Bowden, Stephen C; Rosenfeld, Jeffrey V; Bigler, Erin D

2013-01-01

Neuropsychological assessments of outcome after traumatic brain injury (TBI) are often unrelated to self-reported problems after TBI. The current study cluster-analyzed the Wechsler Adult Intelligence Scale-Third Edition (WAIS-III) subtest scores from mild, moderate, and severe TBI (n=220) and orthopedic injury control (n=95) groups, to determine whether specific cognitive profiles are related to people's perceived outcomes after TBI. A two-stage cluster analysis produced 4- and 6-cluster solutions, with the 6-cluster solution better capturing subtle variations in cognitive functioning. The 6 clusters differed in the levels and profiles of cognitive performance, self-reported recovery, and education and injury severity. The findings suggest that subtle cognitive impairments after TBI should be interpreted in conjunction with patients' self-reported problems.
Ion induced electron emission statistics under Agₘ⁻ cluster bombardment of Ag

NASA Astrophysics Data System (ADS)

Breuers, A.; Penning, R.; Wucher, A.

2018-05-01

The electron emission from a polycrystalline silver surface under bombardment with Agₘ⁻ cluster ions (m = 1, 2, 3) is investigated in terms of ion induced kinetic excitation. The electron yield γ is determined directly by a current measurement method on the one hand and implicitly by the analysis of the electron emission statistics on the other hand. Successful measurements of the electron emission spectra ensure a deeper understanding of the ion induced kinetic electron emission process, with particular emphasis on the effect of the projectile cluster size on the yield as well as on the emission statistics. The results allow a quantitative comparison to computer simulations performed for silver atoms and clusters impinging onto a silver surface.
A Survey of Variable Extragalactic Sources with XTE's All Sky Monitor (ASM)

NASA Technical Reports Server (NTRS)

Jernigan, Garrett

1998-01-01

The original goal of the project was the near real-time detection of AGN utilizing the SSC 3 of the ASM on XTE which does a deep integration on one 100 square degree region of the sky. While the SSC never performed sufficiently well to allow the success of this goal, the work on the project has led to the development of a new analysis method for coded aperture systems which has now been applied to ASM data for mapping regions near clusters of galaxies such as the Perseus Cluster and the Coma Cluster. Publications are in preparation that describe both the new method and the results from mapping clusters of galaxies.
Deep Brain Stimulation of the Subthalamic Nucleus Improves Lexical Switching in Parkinsons Disease Patients.

PubMed

Vonberg, Isabelle; Ehlen, Felicitas; Fromm, Ortwin; Kühn, Andrea A; Klostermann, Fabian

2016-01-01

Reduced verbal fluency (VF) has been reported in patients with Parkinson's disease (PD), especially those treated by Deep Brain Stimulation of the subthalamic nucleus (STN DBS). To delineate the nature of this dysfunction we aimed at identifying the particular VF-related operations modified by STN DBS. Eleven PD patients performed VF tasks in their STN DBS ON and OFF condition. To differentiate VF-components modulated by the stimulation, a temporal cluster analysis was performed, separating production spurts (i.e., 'clusters' as correlates of automatic activation spread within lexical fields) from slower cluster transitions (i.e., 'switches' reflecting set-shifting towards new lexical fields). The results were compared to those of eleven healthy control subjects. PD patients produced significantly more switches accompanied by shorter switch times in the STN DBS ON compared to the STN DBS OFF condition. The number of clusters and time intervals between words within clusters were not affected by the treatment state. Although switch behavior in patients with DBS ON improved, their task performance was still lower compared to that of healthy controls. Beyond impacting on motor symptoms, STN DBS seems to influence the dynamics of cognitive procedures. Specifically, the results are in line with basal ganglia roles for cognitive switching, in the particular case of VF, from prevailing lexical concepts to new ones.
Comparative analysis of prophages in Streptococcus mutans genomes

PubMed Central

Fu, Tiwei; Fan, Xiangyu; Long, Quanxin; Deng, Wanyan; Song, Jinlin

2017-01-01

Prophages have been considered genetic units that have an intimate association with novel phenotypic properties of bacterial hosts, such as pathogenicity and genomic variation. Little is known about the genetic information of prophages in the genome of Streptococcus mutans, a major pathogen of human dental caries. In this study, we identified 35 prophage-like elements in S. mutans genomes and performed a comparative genomic analysis. Comparative genomic and phylogenetic analyses of prophage sequences revealed that the prophages could be classified into three main large clusters: Cluster A, Cluster B, and Cluster C. The S. mutans prophages in each cluster were compared. The genomic sequences of phismuN66-1, phismuNLML9-1, and phismu24-1 all shared similarities with the previously reported S. mutans phages M102, M102AD, and ϕAPCM01. The genomes were organized into seven major gene clusters according to the putative functions of the predicted open reading frames: packaging and structural modules, integrase, host lysis modules, DNA replication/recombination modules, transcriptional regulatory modules, other protein modules, and hypothetical protein modules. Moreover, an integrase gene was only identified in phismuNLML9-1 prophages. PMID:29158986
Fast clustering using adaptive density peak detection.

PubMed

Wang, Xiao-Feng; Xu, Yifan

2017-12-01

Common limitations of clustering methods include the slow algorithm convergence, the instability of the pre-specification on a number of intrinsic parameters, and the lack of robustness to outliers. A recent clustering approach proposed a fast search algorithm of cluster centers based on their local densities. However, the selection of the key intrinsic parameters in the algorithm was not systematically investigated. It is relatively difficult to estimate the "optimal" parameters since the original definition of the local density in the algorithm is based on a truncated counting measure. In this paper, we propose a clustering procedure with adaptive density peak detection, where the local density is estimated through the nonparametric multivariate kernel estimation. The model parameter is then able to be calculated from the equations with statistical theoretical justification. We also develop an automatic cluster centroid selection method through maximizing an average silhouette index. The advantage and flexibility of the proposed method are demonstrated through simulation studies and the analysis of a few benchmark gene expression data sets. The method runs in a single step without any iteration; it is therefore fast and has great potential for big data analysis. A user-friendly R package ADPclust is developed for public use.
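
The density-peak idea summarised in this abstract can be sketched as follows: estimate a kernel density at every point, measure each point's distance to the nearest point of higher density, take the points where both are large as centres, and let every other point inherit the label of its nearest denser neighbour. This is a simplified illustration and not the ADPclust package: the bandwidth is chosen by hand and the number of clusters `k` is fixed, whereas the paper derives the bandwidth and selects centres by maximising an average silhouette index.

```python
import math

def density_peak_clusters(points, k, bandwidth):
    """Single-pass density-peak clustering with a Gaussian kernel density."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Gaussian kernel density estimate at each point (up to a constant factor).
    rho = [sum(math.exp(-((dist[i][j] / bandwidth) ** 2) / 2) for j in range(n))
           for i in range(n)]
    order = sorted(range(n), key=lambda i: -rho[i])  # densest first
    delta = [0.0] * n
    parent = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # convention for the densest point
            continue
        # Nearest point among those ranked denser than i.
        j = min(order[:rank], key=lambda h: dist[i][h])
        delta[i], parent[i] = dist[i][j], j
    # Centres: largest density * separation; the stable sort keeps the
    # densest point first on ties, so it is always selected.
    centers = sorted(order, key=lambda i: -rho[i] * delta[i])[:k]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    for i in order:  # parents are always labelled before their children
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers

# Two invented, well-separated blobs of five points each.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
       (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
labels, centers = density_peak_clusters(pts, k=2, bandwidth=1.0)
```

No iteration over assignments is needed, which is the property the abstract highlights; the cost is dominated by the pairwise distance matrix.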
The effect of billboard design specifications on driving: A pilot study.

PubMed

Marciano, Hadas; Setter, Pe'erly

2017-07-01

Decades of research on the effects of advertising billboards on road accident rates, driver performance, and driver visual scanning behavior have produced no conclusive findings. We suggest that road safety researchers should shift their focus and attempt to identify the billboard characteristics that are most distracting to drivers. This line of research may produce concrete guidelines for permissible billboards that would be likely to reduce the influence of the billboards on road safety. The current study is a first step towards this end. A pool of 161 photos of real advertising billboards was used as stimuli within a triple task paradigm designed to simulate certain components of driving. Each trial consisted of one ongoing tracking task accompanied by two additional concurrent tasks: (1) billboard observation task; and (2) circle color change identification task. Five clusters of billboards, identified by conducting a cluster analysis of their graphic content, were used as a within variable in one-way ANOVAs conducted on performance level data collected from the multiple tasks. Cluster 5, labeled Loaded Billboards, yielded significantly deteriorated performance on the tracking task. Cluster 4, labeled Graphical Billboards, yielded deteriorated performance primarily on the color change identification task. Cluster 3, labeled Minimal Billboards, had no effect on any of these tasks. We strongly recommend that these clusters be systematically explored in experiments involving additional real driving settings, such as driving simulators and field studies. This will enable validation of the current results and help incorporate them into real driving situations. Copyright © 2017. Published by Elsevier Ltd.
Prevalence and risk factors for scrub typhus in South India.

PubMed

Trowbridge, Paul; P, Divya; Premkumar, Prasanna S; Varghese, George M

2017-05-01

To determine the prevalence and risk factors of scrub typhus in Tamil Nadu, South India. We performed a clustered seroprevalence study of the areas around Vellore. All participants completed a risk factor survey, with seropositive and seronegative participants acting as cases and controls, respectively, in a risk factor analysis. After univariate analysis, variables found to be significant underwent multivariate analysis. Of 721 people participating in this study, 31.8% tested seropositive. By univariate analysis, after accounting for clustering, having a house that was clustered with other houses, having fewer rooms in a house, having fewer people living in a household, defecating outside, female sex, age >60 years, shorter height, lower weight, smaller body mass index and smaller mid-upper arm circumference were found to be significantly associated with seropositivity. After multivariate regression modelling, living in a house clustered with other houses, female sex and age >60 years were significantly associated with scrub typhus exposure. Overall, scrub typhus is much more common than previously thought. Previously described individual environmental and habitual risk factors seem to have less importance in South India, perhaps because of the overall scrub typhus-conducive nature of the environment in this region. © 2017 John Wiley & Sons Ltd.
Transmission clustering among newly diagnosed HIV patients in Chicago, 2008 to 2011: using phylogenetics to expand knowledge of regional HIV transmission patterns

PubMed Central

Lubelchek, Ronald J.; Hoehnen, Sarah C.; Hotton, Anna L.; Kincaid, Stacey L.; Barker, David E.; French, Audrey L.

2014-01-01

Introduction HIV transmission cluster analyses can inform HIV prevention efforts. We describe the first such assessment for transmission clustering among HIV patients in Chicago. Methods We performed transmission cluster analyses using HIV pol sequences from newly diagnosed patients presenting to Chicago’s largest HIV clinic between 2008 and 2011. We compared sequences via progressive pairwise alignment, using neighbor joining to construct an un-rooted phylogenetic tree. We defined clusters as >2 sequences among which each sequence had at least one partner within a genetic distance of ≤ 1.5%. We used multivariable regression to examine factors associated with clustering and used geospatial analysis to assess geographic proximity of phylogenetically clustered patients. Results We compared sequences from 920 patients; median age 35 years; 75% male; 67% Black, 23% Hispanic; 8% had a Rapid Plasma Reagin (RPR) titer ≥ 1:16 concurrent with their HIV diagnosis. We had HIV transmission risk data for 54%; 43% identified as men who have sex with men (MSM). Phylogenetic analysis demonstrated 123 patients (13%) grouped into 26 clusters, the largest having 20 members. In multivariable regression, age < 25, Black race, MSM status, male gender, higher HIV viral load, and RPR ≥ 1:16 associated with clustering. We did not observe geographic grouping of genetically clustered patients. Discussion Our results demonstrate high rates of HIV transmission clustering, without local geographic foci, among young Black MSM in Chicago. Applied prospectively, phylogenetic analyses could guide prevention efforts and help break the cycle of transmission. PMID:25321182
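
The cluster definition given in the Methods (more than two sequences, each with at least one partner within a genetic distance of 1.5%) can be read as finding connected components in a graph whose edges join sequences at or below the threshold. The sketch below implements that reading; it is one plausible interpretation rather than the authors' exact procedure, and the sequence identifiers and distances are invented.

```python
def transmission_clusters(pairwise, threshold=0.015, min_size=3):
    """Group sequences linked by pairwise distances <= threshold.

    pairwise: iterable of (id_a, id_b, genetic_distance) tuples.
    Returns the connected components with at least min_size members.
    """
    # Adjacency sets of the graph linking sequences within the threshold.
    neighbours = {}
    for a, b, d in pairwise:
        if d <= threshold:
            neighbours.setdefault(a, set()).add(b)
            neighbours.setdefault(b, set()).add(a)
    seen, clusters = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first search collects everything reachable from start.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        if len(component) >= min_size:  # ">2 sequences" means at least 3
            clusters.append(sorted(component))
    return clusters

# Invented distances (fraction of differing sites) between six sequences.
pairs = [
    ("A", "B", 0.010), ("B", "C", 0.014), ("A", "C", 0.020),
    ("C", "D", 0.030), ("D", "E", 0.005), ("E", "F", 0.050),
]
found = transmission_clusters(pairs)
```

Here A, B and C form a cluster even though A and C differ by 2.0%, because each has a partner within 1.5%; the D-E pair is linked but too small to count.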
Distribution and Genetic Diversity of Bacteriocin Gene Clusters in Rumen Microbial Genomes.

PubMed

Azevedo, Analice C; Bento, Cláudia B P; Ruiz, Jeronimo C; Queiroz, Marisa V; Mantovani, Hilário C

2015-10-01

Some species of ruminal bacteria are known to produce antimicrobial peptides, but the screening procedures have mostly been based on in vitro assays using standardized methods. Recent sequencing efforts have made available the genome sequences of hundreds of ruminal microorganisms. In this work, we performed genome mining of the complete and partial genome sequences of 224 ruminal bacteria and 5 ruminal archaea to determine the distribution and diversity of bacteriocin gene clusters. A total of 46 bacteriocin gene clusters were identified in 33 strains of ruminal bacteria. Twenty gene clusters were related to lanthipeptide biosynthesis, while 11 gene clusters were associated with sactipeptide production, 7 gene clusters were associated with class II bacteriocin production, and 8 gene clusters were associated with class III bacteriocin production. The frequency of strains whose genomes encode putative antimicrobial peptide precursors was 14.4%. Clusters related to the production of sactipeptides were identified for the first time among ruminal bacteria. BLAST analysis indicated that the majority of the gene clusters (88%) encoding putative lanthipeptides contained all the essential genes required for lanthipeptide biosynthesis. Most strains of Streptococcus (66.6%) harbored complete lanthipeptide gene clusters, in addition to an open reading frame encoding a putative class II bacteriocin. Albusin B-like proteins were found in 100% of the Ruminococcus albus strains screened in this study. The in silico analysis provided evidence of novel biosynthetic gene clusters in bacterial species not previously related to bacteriocin production, suggesting that the rumen microbiota represents an underexplored source of antimicrobial peptides. Copyright © 2015, American Society for Microbiology. All Rights Reserved.
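
The counts reported in this abstract can be checked against each other with simple arithmetic. The only assumption in the sketch below is that the 14.4% frequency uses all 229 screened genomes (224 bacteria plus 5 archaea) as its denominator, which the abstract does not state explicitly.

```python
# Counts taken from the abstract.
bacteria, archaea = 224, 5
strains_with_clusters = 33
clusters_by_type = {
    "lanthipeptide": 20,
    "sactipeptide": 11,
    "class II bacteriocin": 7,
    "class III bacteriocin": 8,
}

# The four categories should add up to the reported total of 46 clusters.
total_clusters = sum(clusters_by_type.values())

# Assumed denominator: every genome screened, bacterial and archaeal.
frequency_pct = 100 * strains_with_clusters / (bacteria + archaea)
```

The category counts sum to 46, and 33 of 229 genomes rounds to 14.4%, so the figures are mutually consistent under that assumption.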
Characterization of Erwinia chrysanthemi by pectinolytic isozyme polymorphism and restriction fragment length polymorphism analysis of PCR-amplified fragments of pel genes.

PubMed Central

Nassar, A; Darrasse, A; Lemattre, M; Kotoujansky, A; Dervin, C; Vedel, R; Bertheau, Y

1996-01-01

Conserved regions about 420 bp long of the pelADE cluster specific to Erwinia chrysanthemi were amplified by PCR and used to differentiate 78 strains of E. chrysanthemi that were obtained from different hosts and geographical areas. No PCR products were obtained from DNA samples extracted from other pectinolytic and nonpectinolytic species and genera. The pel fragments amplified from the E. chrysanthemi strains studied were compared by performing a restriction fragment length polymorphism (RFLP) analysis. On the basis of similarity coefficients derived from the RFLP analysis, the strains were separated into 16 PCR RFLP patterns grouped in six clusters. These clusters appeared to be correlated with other infraspecific levels of E. chrysanthemi classification, such as pathovar and biovar, and occasionally with geographical origin. Moreover, the clusters correlated well with the polymorphism of pectate lyase and pectin methylesterase isoenzymes. While the pectin methylesterase profiles correlated with host monocot-dicot classification, the pectate lyase polymorphism might reflect the cell wall microdomains of the plants belonging to these classes. PMID:8779560
Genetic diversity and population structure analysis between Indian red jungle fowl and domestic chicken using microsatellite markers.

PubMed

Kumar, Vinay; Shukla, Sanjeev K; Mathew, Jose; Sharma, Deepak

2015-01-01

The present study was conducted to assess the genetic diversity, population structure, and relatedness in Indian red jungle fowl (RJF, Gallus gallus murgi) from northern India and three domestic chicken populations (Gallus gallus domesticus), maintained at the institute farms, namely White Leghorn (WL), Aseel (AS) and Red Cornish (RC), using 25 microsatellite markers. All the markers were polymorphic; the number of alleles at each locus ranged from five (MCW0111) to forty-three (LEI0212), with an average of 19 alleles per locus. Across all loci, the mean expected heterozygosity and polymorphic information content were 0.883 and 0.872, respectively. Population-specific alleles were found in each population. A UPGMA dendrogram based on shared allele distances clearly revealed two major clusters among the four populations; cluster I had genotypes from RJF and WL whereas cluster II had AS and RC genotypes. Furthermore, the estimation of population structure was performed to understand how genetic variation is partitioned within and among populations. The maximum ΔK value was observed for K = 4, with four identified clusters. Factorial analysis also clearly showed four clusters, each representing one of the four populations used in the study. These results clearly demonstrate the potential of microsatellite markers in elucidating the genetic diversity, relationships, and population structure in RJF and domestic chicken populations.
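For readers unfamiliar with UPGMA, the following is a minimal sketch of the average-linkage merging rule behind such a dendrogram. It is not the authors' software or data: the labels A to D and the distance values are hypothetical, and the function name `upgma` is an illustrative choice.

```python
def upgma(labels, dist):
    """Average-linkage (UPGMA) clustering on a small distance table.

    labels: list of item names.
    dist:   dict mapping a pair (a, b) to the distance between a and b.
    Returns the merges in order as (left, right, merge_distance).
    Note: tree heights in a UPGMA dendrogram are conventionally half
    the merge distance; the raw distance is reported here.
    """
    sizes = {label: 1 for label in labels}            # cluster name -> member count
    d = {frozenset(pair): v for pair, v in dist.items()}
    merges = []
    while len(sizes) > 1:
        pair = min(d, key=d.get)                      # closest pair of clusters
        a, b = sorted(pair)
        height = d[pair]
        new = f"({a},{b})"
        # Distance from the merged cluster to every other cluster is the
        # size-weighted average of the two old distances.
        for other in sizes:
            if other in (a, b):
                continue
            d_a = d.pop(frozenset((a, other)))
            d_b = d.pop(frozenset((b, other)))
            d[frozenset((new, other))] = (
                sizes[a] * d_a + sizes[b] * d_b
            ) / (sizes[a] + sizes[b])
        del d[pair]
        sizes[new] = sizes.pop(a) + sizes.pop(b)
        merges.append((a, b, height))
    return merges


# Hypothetical pairwise distances (not taken from the study).
example_dist = {
    ("A", "B"): 2, ("C", "D"): 4,
    ("A", "C"): 10, ("A", "D"): 10,
    ("B", "C"): 12, ("B", "D"): 12,
}
merges = upgma(["A", "B", "C", "D"], example_dist)
for left, right, height in merges:
    print(left, right, height)
```

With these toy distances the two tight pairs merge first and are joined last, which is the same two-major-cluster shape the abstract describes.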
Factors influencing students' performance in a Brazilian dental school.

PubMed

Silva, Erica Tatiane da; Nunes, Maria de Fátima; Queiroz, Maria Goretti; Leles, Cláudio R

2010-01-01

Comprehensive assessment of students' academic performance plays an important role in educational planning. The aim of this study was to investigate variables that influence students' performance in a retrospective sample including all undergraduate students who entered a Brazilian dental school in the 20-year period between 1984 and 2003 (n=1182). Demographic and educational variables were used to predict performance in the overall curriculum and course groups. Cluster analysis (K-means algorithm) categorized students into groups of higher, moderate or lower performance. Clusters of overall performance showed external validity, demonstrated by Chi-square test and ANOVA. Lower performance groups had the smallest number of students in overall performance and course groups clusters, ranging from 11.8% (clinical courses) to 19.2% (basic courses). Students' performance was more satisfactory in dental and clinical courses than in basic and non-clinical courses (p<0.001). Better student performance was predicted by lower time elapsed between completion of high school and dental school admission, female gender, better rank in admission test, class attendance rate and student workload hours in teaching, research and extension (R² = 0.491). Findings give evidence about predictors of undergraduate students' performance and reinforce the need for curricular reformulation focused on improving integration among courses.
Effects of cluster location and cluster distribution on performance on the traveling salesman problem.

PubMed

MacGregor, James N

2015-10-01

Research on human performance in solving traveling salesman problems typically uses point sets as stimuli, and most models have proposed a processing stage at which stimulus dots are clustered. However, few empirical studies have investigated the effects of clustering on performance. In one recent study, researchers compared the effects of clustered, random, and regular stimuli, and concluded that clustering facilitates performance (Dry, Preiss, & Wagemans, 2012). Another study suggested that these results may have been influenced by the location rather than the degree of clustering (MacGregor, 2013). Two experiments are reported that mark an attempt to disentangle these factors. The first experiment tested several combinations of degree of clustering and cluster location, and revealed mixed evidence that clustering influences performance. In a second experiment, both factors were varied independently, showing that they interact. The results are discussed in terms of the importance of clustering effects, in particular, and perceptual factors, in general, during performance of the traveling salesman problem.
Pattern Activity Clustering and Evaluation (PACE)

NASA Astrophysics Data System (ADS)

Blasch, Erik; Banas, Christopher; Paul, Michael; Bussjager, Becky; Seetharaman, Guna

2012-06-01

With the vast amount of network information available on activities of people (i.e. motions, transportation routes, and site visits) there is a need to explore the salient properties of data that detect and discriminate the behavior of individuals. Recent machine learning approaches include methods of data mining, statistical analysis, clustering, and estimation that support activity-based intelligence. We seek to explore contemporary methods in activity analysis using machine learning techniques that discover and characterize behaviors that enable grouping, anomaly detection, and adversarial intent prediction. To evaluate these methods, we describe the mathematics and potential information theory metrics to characterize behavior. A scenario is presented to demonstrate the concept and metrics that could be useful for layered sensing behavior pattern learning and analysis. We leverage work on group tracking, learning and clustering approaches; as well as utilize information theoretical metrics for classification, behavioral and event pattern recognition, and activity and entity analysis. The performance evaluation of activity analysis supports high-level information fusion of user alerts, data queries and sensor management for data extraction, relations discovery, and situation analysis of existing data.
Gastrointestinal Fibroblasts Have Specialized, Diverse Transcriptional Phenotypes: A Comprehensive Gene Expression Analysis of Human Fibroblasts

PubMed Central

Ishii, Genichiro; Aoyagi, Kazuhiko; Sasaki, Hiroki; Ochiai, Atsushi

2015-01-01

Background Fibroblasts are the principal stromal cells that exist in whole organs and play vital roles in many biological processes. Although the functional diversity of fibroblasts has been estimated, a comprehensive analysis of fibroblasts from the whole body has not been performed and their transcriptional diversity has not been sufficiently explored. The aim of this study was to elucidate the transcriptional diversity of human fibroblasts within the whole body. Methods Global gene expression analysis was performed on 63 human primary fibroblasts from 13 organs. Of these, 32 fibroblasts from gastrointestinal organs (gastrointestinal fibroblasts: GIFs) were obtained from a pair of 2 anatomical sites: the submucosal layer (submucosal fibroblasts: SMFs) and the subperitoneal layer (subperitoneal fibroblasts: SPFs). Using hierarchical clustering analysis, we elucidated identifiable subgroups of fibroblasts and analyzed the transcriptional character of each subgroup. Results In unsupervised clustering, 2 major clusters that separate GIFs and non-GIFs were observed. Organ- and anatomical site-dependent clusters within GIFs were also observed. The signature genes that discriminated GIFs from non-GIFs, SMFs from SPFs, and the fibroblasts of one organ from another organ consisted of genes associated with transcriptional regulation, signaling ligands, and extracellular matrix remodeling. Conclusions GIFs are characteristic fibroblasts with specific gene expressions from transcriptional regulation, signaling ligands, and extracellular matrix remodeling related genes. In addition, the anatomical site- and organ-dependent diversity of GIFs was also discovered. These features of GIFs contribute to their specific physiological function and homeostatic maintenance, and create a functional diversity of the gastrointestinal tract. PMID:26046848

Construction and Utilization of a Beowulf Computing Cluster: A User's Perspective

NASA Technical Reports Server (NTRS)

Woods, Judy L.; West, Jeff S.; Sulyma, Peter R.

2000-01-01

Lockheed Martin Space Operations - Stennis Programs (LMSO) at the John C Stennis Space Center (NASA/SSC) has designed and built a Beowulf computer cluster which is owned by NASA/SSC and operated by LMSO. The design and construction of the cluster are detailed in this paper. The cluster is currently used for Computational Fluid Dynamics (CFD) simulations. The CFD codes in use and their applications are discussed. Examples of some of the work are also presented. Performance benchmark studies have been conducted for the CFD codes being run on the cluster. The results of two of the studies are presented and discussed. The cluster is not currently being utilized to its full potential; therefore, plans are underway to add more capabilities. These include the addition of structural, thermal, fluid, and acoustic Finite Element Analysis codes as well as real-time data acquisition and processing during test operations at NASA/SSC. These plans are discussed as well.
Association of Inflammatory Cytokines With the Symptom Cluster of Pain, Fatigue, Depression, and Sleep Disturbance in Chinese Patients With Cancer.

PubMed

Ji, Yan-Bo; Bo, Chun-Lu; Xue, Xiu-Juan; Weng, En-Ming; Gao, Guang-Chao; Dai, Bei-Bei; Ding, Kai-Wen; Xu, Cui-Ping

2017-12-01

Pain, fatigue, depression, and sleep disturbance are common in patients with cancer and usually co-occur as a symptom cluster. However, the mechanism underlying this symptom cluster is unclear. This study aimed to identify subgroups of cluster symptoms, compare demographic and clinical characteristics between subgroups, and examine the associations between inflammatory cytokines and cluster symptoms. Participants were 170 Chinese inpatients with cancer from two tertiary hospitals. Inflammatory markers including interleukin-6 (IL-6), interleukin-1 receptor antagonist, and tumor necrosis factor alpha were measured. Intergroup differences and associations of inflammatory cytokines with the cluster symptoms were examined with one-way analyses of variance and logistic regression. Based on cluster analysis, participants were categorized into Subgroup 1 (all low symptoms), Subgroup 2 (low pain and moderate fatigue), or Subgroup 3 (moderate-to-high on all symptoms). The three subgroups differed significantly in Eastern Cooperative Oncology Group (ECOG) performance status, sex, residence, current treatment, education, economic status, and inflammatory cytokine levels (all P < 0.05). Compared with Subgroup 1, Subgroup 3 had a significantly poorer ECOG physical performance status and higher IL-6 levels, were more often treated with combined chemoradiotherapy, and were more likely to be rural residents. IL-6 and ECOG physical performance status were significantly associated with 1.246-fold (95% CI 1.114-1.396) and 31.831-fold (95% CI 6.017-168.385) increased risk of Subgroup 3. Our findings suggest that IL-6 levels are associated with cluster symptoms in cancer patients. Clinicians should identify patients at risk for more severe symptoms and formulate novel target interventions to improve symptom management. Copyright © 2017. Published by Elsevier Inc.
Estimating global distribution of boreal, temperate, and tropical tree plant functional types using clustering techniques

NASA Astrophysics Data System (ADS)

Wang, Audrey; Price, David T.

2007-03-01

A simple integrated algorithm was developed to relate global climatology to distributions of tree plant functional types (PFT). Multivariate cluster analysis was performed to analyze the statistical homogeneity of the climate space occupied by individual tree PFTs. Forested regions identified from the satellite-based GLC2000 classification were separated into tropical, temperate, and boreal sub-PFTs for use in the Canadian Terrestrial Ecosystem Model (CTEM). Global data sets of monthly minimum temperature, growing degree days, an index of climatic moisture, and estimated PFT cover fractions were then used as variables in the cluster analysis. The statistical results for individual PFT clusters were found consistent with other global-scale classifications of dominant vegetation. As an improvement of the quantification of the climatic limitations on PFT distributions, the results also demonstrated overlapping of PFT cluster boundaries that reflected vegetation transitions, for example, between tropical and temperate biomes. The resulting global database should provide a better basis for simulating the interaction of climate change and terrestrial ecosystem dynamics using global vegetation models.
A pattern-mixture model approach for handling missing continuous outcome data in longitudinal cluster randomized trials.

PubMed

Fiero, Mallorie H; Hsu, Chiu-Hsieh; Bell, Melanie L

2017-11-20

We extend the pattern-mixture approach to handle missing continuous outcome data in longitudinal cluster randomized trials, which randomize groups of individuals to treatment arms, rather than the individuals themselves. Individuals who drop out at the same time point are grouped into the same dropout pattern. We approach extrapolation of the pattern-mixture model by applying multilevel multiple imputation, which imputes missing values while appropriately accounting for the hierarchical data structure found in cluster randomized trials. To assess parameters of interest under various missing data assumptions, imputed values are multiplied by a sensitivity parameter, k, which increases or decreases imputed values. Using simulated data, we show that estimates of parameters of interest can vary widely under differing missing data assumptions. We conduct a sensitivity analysis using real data from a cluster randomized trial by increasing k until the treatment effect inference changes. By performing a sensitivity analysis for missing data, researchers can assess whether certain missing data assumptions are reasonable for their cluster randomized trial. Copyright © 2017 John Wiley & Sons, Ltd.
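The scaling step of such a sensitivity analysis can be sketched as follows. This is only an illustration of multiplying imputed values by k and re-estimating a difference in means; it does not implement multilevel multiple imputation, and the toy outcomes, the imputation flags, and the choice to apply k in both arms are assumptions rather than details from the paper.

```python
from statistics import mean


def apply_sensitivity(values, imputed_flags, k):
    """Multiply only the imputed values by k; observed values are unchanged."""
    return [v * k if flag else v for v, flag in zip(values, imputed_flags)]


def treatment_effect(treated, control):
    """Difference in mean outcome between the two arms."""
    return mean(treated) - mean(control)


def sensitivity_sweep(treated, treated_flags, control, control_flags, ks):
    """Re-estimate the treatment effect for each sensitivity parameter k."""
    effects = {}
    for k in ks:
        t = apply_sensitivity(treated, treated_flags, k)
        c = apply_sensitivity(control, control_flags, k)
        effects[k] = treatment_effect(t, c)
    return effects


# Hypothetical outcomes; True marks a value that was imputed, not observed.
treated = [5.0, 6.0, 4.0, 5.0]
treated_flags = [False, False, True, True]
control = [4.0, 4.0, 3.0, 5.0]
control_flags = [False, True, False, False]

effects = sensitivity_sweep(
    treated, treated_flags, control, control_flags, ks=[0.5, 1.0, 1.5]
)
for k, effect in effects.items():
    print(f"k={k}: estimated effect {effect:.3f}")
```

Here k = 1 leaves the imputed values as they are, while values below or above 1 shrink or inflate them; a real analysis would also recompute the inference (standard errors, confidence intervals) at each k.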
An Empirical Taxonomy of Hospital Governing Board Roles

PubMed Central

Lee, Shoou-Yih D; Alexander, Jeffrey A; Wang, Virginia; Margolin, Frances S; Combes, John R

2008-01-01

Objective To develop a taxonomy of governing board roles in U.S. hospitals. Data Sources 2005 AHA Hospital Governance Survey, 2004 AHA Annual Survey of Hospitals, and Area Resource File. Study Design A governing board taxonomy was developed using cluster analysis. Results were validated and reviewed by industry experts. Differences in hospital and environmental characteristics across clusters were examined. Data Extraction Methods One-thousand three-hundred thirty-four hospitals with complete information on the study variables were included in the analysis. Principal Findings Five distinct clusters of hospital governing boards were identified. Statistical tests showed that the five clusters had high internal reliability and high internal validity. Statistically significant differences in hospital and environmental conditions were found among clusters. Conclusions The developed taxonomy provides policy makers, health care executives, and researchers a useful way to describe and understand hospital governing board roles. The taxonomy may also facilitate valid and systematic assessment of governance performance. Further, the taxonomy could be used as a framework for governing boards themselves to identify areas for improvement and direction for change. PMID:18355260
Performance Analysis of the ARL Linux Networx Cluster

DTIC Science & Technology

2004-06-01

OVERFLOW, used processors selected by SGE. All benchmarks on the GAMESS, COBALT, LSDYNA and FLUENT. Each code Origin 3800 were executed using IRIX cpusets...scheduler. for these benchmarks defines a missile with grid fins consisting of seventeen million cells [31. 4. Application Performance Results and
Investigating the long-term course of schizophrenia by sequence analysis.

PubMed

An der Heiden, Wolfram; Häfner, Heinz

2015-08-30

In the present study we set out to explore the long-term clinical course of schizophrenia in a holistic manner by adopting sequence analysis. Our aim was to identify course types of illness by means of cluster analysis. The study was based on course and outcome data for 107 patients followed up over 134 months after first admission in the ABC Schizophrenia Study. Focusing on the main syndromes (positive, negative, depressive and unspecific symptoms) and their combinations we looked for similarities in individual illness courses using the 'optimal matching' method. A cluster analysis performed on the resulting similarity matrix yielded two main groups (an 'improving' and a 'chronic' group), which comprised a total of six different types of illness course. The course types differed in both quantitative (frequency of syndromes and syndrome combinations) and qualitative terms (clinical presentation, sequence of syndromes). Cluster membership was only rarely, but clearly associated with sociodemographic characteristics, treatment data and other illness variables. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
NeatMap--non-clustering heat map alternatives in R.

PubMed

Rajaram, Satwik; Oono, Yoshi

2010-01-22

The clustered heat map is the most popular means of visualizing genomic data. It compactly displays a large amount of data in an intuitive format that facilitates the detection of hidden structures and relations in the data. However, it is hampered by its use of cluster analysis which does not always respect the intrinsic relations in the data, often requiring non-standardized reordering of rows/columns to be performed post-clustering. This sometimes leads to uninformative and/or misleading conclusions. Often it is more informative to use dimension-reduction algorithms (such as Principal Component Analysis and Multi-Dimensional Scaling) which respect the topology inherent in the data. Yet, despite their proven utility in the analysis of biological data, they are not as widely used. This is at least partially due to the lack of user-friendly visualization methods with the visceral impact of the heat map. NeatMap is an R package designed to meet this need. NeatMap offers a variety of novel plots (in 2 and 3 dimensions) to be used in conjunction with these dimension-reduction techniques. Like the heat map, but unlike traditional displays of such results, it allows the entire dataset to be displayed while visualizing relations between elements. It also allows superimposition of cluster analysis results for mutual validation. NeatMap is shown to be more informative than the traditional heat map with the help of two well-known microarray datasets. NeatMap thus preserves many of the strengths of the clustered heat map while addressing some of its deficiencies. It is hoped that NeatMap will spur the adoption of non-clustering dimension-reduction algorithms.
A genetic graph-based approach for partitional clustering.

PubMed

Menéndez, Héctor D; Barrero, David F; Camacho, David

2014-05-01

Clustering is one of the most versatile tools for data analysis. In recent years, clustering that seeks the continuity of data (in opposition to classical centroid-based approaches) has attracted increasing research interest. It is a challenging problem of remarkable practical interest. The most popular continuity clustering method is the spectral clustering (SC) algorithm, which is based on graph cut: it initially generates a similarity graph using a distance measure and then studies its graph spectrum to find the best cut. This approach is sensitive to the parameters of the metric, and a correct parameter choice is critical to the quality of the clusters. This work proposes a new algorithm, inspired by SC, that reduces the parameter dependency while maintaining the quality of the solution. The new algorithm, named genetic graph-based clustering (GGC), takes an evolutionary approach, introducing a genetic algorithm (GA) to cluster the similarity graph. The experimental validation shows that GGC increases the robustness of SC and has competitive performance in comparison with classical clustering methods, at least on the synthetic and real datasets used in the experiments.
A Systems Biology Approach for Identifying Hepatotoxicant Groups Based on Similarity in Mechanisms of Action and Chemical Structure.

PubMed

Hebels, Dennie G A J; Rasche, Axel; Herwig, Ralf; van Westen, Gerard J P; Jennen, Danyel G J; Kleinjans, Jos C S

2016-01-01

When evaluating compound similarity, addressing multiple sources of information to reach conclusions about common pharmaceutical and/or toxicological mechanisms of action is a crucial strategy. In this chapter, we describe a systems biology approach that incorporates analyses of hepatotoxicant data for 33 compounds from three different sources: a chemical structure similarity analysis based on the 3D Tanimoto coefficient, a chemical structure-based protein target prediction analysis, and a cross-study/cross-platform meta-analysis of in vitro and in vivo human and rat transcriptomics data derived from public resources (i.e., the diXa data warehouse). Hierarchical clustering of the outcome scores of the separate analyses did not result in a satisfactory grouping of compounds considering their known toxic mechanism as described in literature. However, a combined analysis of multiple data types may hypothetically compensate for missing or unreliable information in any of the single data types. We therefore performed an integrated clustering analysis of all three data sets using the R-based tool iClusterPlus. This indeed improved the grouping results. The compound clusters that were formed by means of iClusterPlus represent groups that show similar gene expression while simultaneously integrating a similarity in structure and protein targets, which corresponds much better with the known mechanism of action of these toxicants. Using an integrative systems biology approach may thus overcome the limitations of the separate analyses when grouping liver toxicants sharing a similar mechanism of toxicity.
Cluster: Mission Overview and End-of-Life Analysis

NASA Technical Reports Server (NTRS)

Pallaschke, S.; Munoz, I.; Rodriquez-Canabal, J.; Sieg, D.; Yde, J. J.

2007-01-01

The Cluster mission is part of the scientific programme of the European Space Agency (ESA) and its purpose is the analysis of the Earth's magnetosphere. The Cluster project consists of four satellites. The selected polar orbit has a shape of 4.0 and 19.2 Re which is required for performing measurements near the cusp and the tail of the magnetosphere. When crossing these regions the satellites form a constellation which in most of the cases so far has been a regular tetrahedron. The satellite operations are carried out by the European Space Operations Centre (ESOC) at Darmstadt, Germany. The paper outlines the future orbit evolution and the envisaged operations from a Flight Dynamics point of view. In addition a brief summary of the LEOP and routine operations is included beforehand.
Exploring the effects of climatic variables on monthly precipitation variation using a continuous wavelet-based multiscale entropy approach.

PubMed

Roushangar, Kiyoumars; Alizadeh, Farhad; Adamowski, Jan

2018-08-01

Understanding precipitation on a regional basis is an important component of water resources planning and management. The present study outlines a methodology based on continuous wavelet transform (CWT) and multiscale entropy (CWME), combined with self-organizing map (SOM) and k-means clustering techniques, to measure and analyze the complexity of precipitation. Historical monthly precipitation data from 1960 to 2010 at 31 rain gauges across Iran were preprocessed by CWT. The multi-resolution CWT approach segregated the major features of the original precipitation series by unfolding the structure of the time series which was often ambiguous. The entropy concept was then applied to components obtained from CWT to measure dispersion, uncertainty, disorder, and diversification of subcomponents. Based on different validity indices, k-means clustering captured homogenous areas more accurately, and additional analysis was performed based on the outcome of this approach. The 31 rain gauges in this study were clustered into 6 groups, each one having a unique CWME pattern across different time scales. The results of clustering showed that hydrologic similarity (multiscale variation of precipitation) was not based on geographic contiguity. According to the pattern of entropy across the scales, each cluster was assigned an entropy signature that provided an estimation of the entropy pattern of precipitation data in each cluster. Based on the pattern of mean CWME for each cluster, a characteristic signature was assigned, which provided an estimation of the CWME of a cluster across scales of 1-2, 3-8, and 9-13 months relative to other stations. The validity of the homogeneous clusters demonstrated the usefulness of the proposed approach to regionalize precipitation. 
Further analysis based on wavelet coherence (WTC) was performed by selecting central rain gauges in each cluster and analyzing them against temperature, wind, the Multivariate ENSO index (MEI), and the East Atlantic (EA) and North Atlantic Oscillation (NAO) indices. The results revealed that all climatic features except NAO influenced precipitation in Iran during the 1960-2010 period. Copyright © 2018 Elsevier Inc. All rights reserved.
Waiting-time distributions of magnetic discontinuities: clustering or Poisson process?

PubMed

Greco, A; Matthaeus, W H; Servidio, S; Dmitruk, P

2009-10-01

Using solar wind data from the Advanced Composition Explorer spacecraft, with the support of Hall magnetohydrodynamic simulations, the waiting-time distributions of magnetic discontinuities have been analyzed. A possible phenomenon of clusterization of these discontinuities is studied in detail. We perform a local Poisson's analysis in order to establish if these intermittent events are randomly distributed or not. Possible implications about the nature of solar wind discontinuities are discussed.
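As background for the clustering-versus-Poisson question: in a Poisson process the waiting times between events are exponentially distributed, so their coefficient of variation (standard deviation divided by mean) is 1. The sketch below computes that quantity for two hypothetical event sequences. It is a generic check, not the local Poisson analysis used by the authors, and the event times are made up for illustration.

```python
from statistics import mean, pstdev


def waiting_times(event_times):
    """Gaps between consecutive events, after sorting the event times."""
    ts = sorted(event_times)
    return [later - earlier for earlier, later in zip(ts, ts[1:])]


def coefficient_of_variation(xs):
    """Standard deviation over mean; about 1 for exponential waiting times."""
    return pstdev(xs) / mean(xs)


# Hypothetical event times (arbitrary units).
regular = [0, 1, 2, 3, 4, 5]              # evenly spaced events
bursty = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]   # two tight bursts with a long gap

cv_regular = coefficient_of_variation(waiting_times(regular))
cv_bursty = coefficient_of_variation(waiting_times(bursty))
print(cv_regular, cv_bursty)
```

A value well above 1 points toward clustered (bursty) events and a value well below 1 toward more regular spacing; with real data the comparison would need a proper statistical test rather than this single summary number.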
Waiting-time distributions of magnetic discontinuities: Clustering or Poisson process?

DOE Office of Scientific and Technical Information (OSTI.GOV)

Greco, A.; Matthaeus, W. H.; Servidio, S.

2009-10-15

Using solar wind data from the Advanced Composition Explorer spacecraft, with the support of Hall magnetohydrodynamic simulations, the waiting-time distributions of magnetic discontinuities have been analyzed. A possible phenomenon of clusterization of these discontinuities is studied in detail. We perform a local Poisson's analysis in order to establish if these intermittent events are randomly distributed or not. Possible implications about the nature of solar wind discontinuities are discussed.
Quasichemical analysis of the cluster-pair approximation for the thermodynamics of proton hydration

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pollard, Travis; Beck, Thomas L.; Department of Physics, University of Cincinnati, Cincinnati, Ohio 45221

2014-06-14

A theoretical analysis of the cluster-pair approximation (CPA) is presented based on the quasichemical theory of solutions. The sought single-ion hydration free energy of the proton includes an interfacial potential contribution by definition. It is shown, however, that the CPA involves an extra-thermodynamic assumption that does not guarantee uniform convergence to a bulk free energy value with increasing cluster size. A numerical test of the CPA is performed using the classical polarizable AMOEBA force field and supporting quantum chemical calculations. The enthalpy and free energy differences are computed for the kosmotropic Na⁺/F⁻ ion pair in water clusters of size n = 5, 25, 105. Additional calculations are performed for the chaotropic Rb⁺/I⁻ ion pair. A small shift in the proton hydration free energy and a larger shift in the hydration enthalpy, relative to the CPA values, are predicted based on the n = 105 simulations. The shifts arise from a combination of sequential hydration and interfacial potential effects. The AMOEBA and quantum chemical results suggest an electrochemical surface potential of water in the range −0.4 to −0.5 V. The physical content of single-ion free energies and implications for ion-water force field development are also discussed.
Global detection approach for clustered microcalcifications in mammograms using a deep learning network.

PubMed

Wang, Juan; Nishikawa, Robert M; Yang, Yongyi

2017-04-01

In computerized detection of clustered microcalcifications (MCs) from mammograms, the traditional approach is to apply a pattern detector to locate the presence of individual MCs, which are subsequently grouped into clusters. Such an approach is often susceptible to the occurrence of false positives (FPs) caused by local image patterns that resemble MCs. We investigate the feasibility of a direct detection approach to determining whether an image region contains clustered MCs or not. Toward this goal, we develop a deep convolutional neural network (CNN) as the classifier model to which the input consists of a large image window ([Formula: see text] in size). The multiple layers in the CNN classifier are trained to automatically extract image features relevant to MCs at different spatial scales. In the experiments, we demonstrated this approach on a dataset consisting of both screen-film mammograms and full-field digital mammograms. We evaluated the detection performance both on classifying image regions of clustered MCs using a receiver operating characteristic (ROC) analysis and on detecting clustered MCs from full mammograms by a free-response receiver operating characteristic analysis. For comparison, we also considered a recently developed MC detector with FP suppression. In classifying image regions of clustered MCs, the CNN classifier achieved 0.971 in the area under the ROC curve, compared to 0.944 for the MC detector. In detecting clustered MCs from full mammograms, at 90% sensitivity, the CNN classifier obtained an FP rate of 0.69 clusters/image, compared to 1.17 clusters/image by the MC detector. These results indicate that using global image features can be more effective in discriminating clustered MCs from FPs caused by various sources, such as linear structures, thereby providing a more accurate detection of clustered MCs on mammograms.
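The area under the ROC curve quoted above has a simple interpretation: it is the probability that a randomly chosen positive region receives a higher classifier score than a randomly chosen negative one, with ties counted as one half. A minimal sketch of that pairwise computation follows; the scores are hypothetical and unrelated to the study's data, and the function name `roc_auc` is an illustrative choice.

```python
def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Counts the fraction of (positive, negative) pairs in which the
    positive example scores higher; ties contribute one half.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical classifier scores for regions with and without clustered MCs.
pos = [0.9, 0.8, 0.4]
neg = [0.7, 0.3, 0.4]
auc = roc_auc(pos, neg)
print(round(auc, 3))
```

The quadratic loop is fine for a handful of scores; for large datasets a rank-based formulation gives the same value more efficiently.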
Validating clustering of molecular dynamics simulations using polymer models.

PubMed

Phillips, Joshua L; Colvin, Michael E; Newsam, Shawn

2011-11-14

Molecular dynamics (MD) simulation is a powerful technique for sampling the meta-stable and transitional conformations of proteins and other biomolecules. Computational data clustering has emerged as a useful, automated technique for extracting conformational states from MD simulation data. Despite extensive application, relatively little work has been done to determine if the clustering algorithms are actually extracting useful information. A primary goal of this paper therefore is to provide such an understanding through a detailed analysis of data clustering applied to a series of increasingly complex biopolymer models. We develop a novel series of models using basic polymer theory that have intuitive, clearly-defined dynamics and exhibit the essential properties that we are seeking to identify in MD simulations of real biomolecules. We then apply spectral clustering, an algorithm particularly well-suited for clustering polymer structures, to our models and MD simulations of several intrinsically disordered proteins. Clustering results for the polymer models provide clear evidence that the meta-stable and transitional conformations are detected by the algorithm. The results for the polymer models also help guide the analysis of the disordered protein simulations by comparing and contrasting the statistical properties of the extracted clusters. We have developed a framework for validating the performance and utility of clustering algorithms for studying molecular biopolymer simulations that utilizes several analytic and dynamic polymer models which exhibit well-behaved dynamics including: meta-stable states, transition states, helical structures, and stochastic dynamics. We show that spectral clustering is robust to anomalies introduced by structural alignment and that different structural classes of intrinsically disordered proteins can be reliably discriminated from the clustering results. 
To our knowledge, our framework is the first to utilize model polymers to rigorously test the utility of clustering algorithms for studying biopolymers.
Validating clustering of molecular dynamics simulations using polymer models

PubMed Central

2011-01-01

Background: Molecular dynamics (MD) simulation is a powerful technique for sampling the meta-stable and transitional conformations of proteins and other biomolecules. Computational data clustering has emerged as a useful, automated technique for extracting conformational states from MD simulation data. Despite extensive application, relatively little work has been done to determine if the clustering algorithms are actually extracting useful information. A primary goal of this paper therefore is to provide such an understanding through a detailed analysis of data clustering applied to a series of increasingly complex biopolymer models. Results: We develop a novel series of models using basic polymer theory that have intuitive, clearly-defined dynamics and exhibit the essential properties that we are seeking to identify in MD simulations of real biomolecules. We then apply spectral clustering, an algorithm particularly well-suited for clustering polymer structures, to our models and MD simulations of several intrinsically disordered proteins. Clustering results for the polymer models provide clear evidence that the meta-stable and transitional conformations are detected by the algorithm. The results for the polymer models also help guide the analysis of the disordered protein simulations by comparing and contrasting the statistical properties of the extracted clusters. Conclusions: We have developed a framework for validating the performance and utility of clustering algorithms for studying molecular biopolymer simulations that utilizes several analytic and dynamic polymer models which exhibit well-behaved dynamics including: meta-stable states, transition states, helical structures, and stochastic dynamics. We show that spectral clustering is robust to anomalies introduced by structural alignment and that different structural classes of intrinsically disordered proteins can be reliably discriminated from the clustering results. 
To our knowledge, our framework is the first to utilize model polymers to rigorously test the utility of clustering algorithms for studying biopolymers. PMID:22082218
Sensory analysis of characterising flavours: evaluating tobacco product odours using an expert panel.

PubMed

Krüsemann, Erna J Z; Lasschuijt, Marlou P; de Graaf, C; de Wijk, René A; Punter, Pieter H; van Tiel, Loes; Cremers, Johannes W J M; van de Nobelen, Suzanne; Boesveldt, Sanne; Talhout, Reinskje

2018-05-23

Tobacco flavours are an important regulatory concept in several jurisdictions, for example in the USA, Canada and Europe. The European Tobacco Products Directive 2014/40/EU prohibits cigarettes and roll-your-own tobacco having a characterising flavour. This directive defines characterising flavour as 'a clearly noticeable smell or taste other than one of tobacco […]'. To distinguish between products with and without a characterising flavour, we trained an expert panel to identify characterising flavours by smelling. An expert panel (n=18) evaluated the smell of 20 tobacco products using self-defined odour attributes, following Quantitative Descriptive Analysis. The panel was trained during 14 attribute training, consensus training and performance monitoring sessions. Products were assessed during six test sessions. Principal component analysis, hierarchical clustering (four and six clusters) and Hotelling's T-tests (95% and 99% CIs) were used to determine differences and similarities between tobacco products based on odour attributes. The final attribute list contained 13 odour descriptors. Panel performance was sufficient after 14 training sessions. Products marketed as unflavoured that formed a cluster were considered reference products. A four-cluster method distinguished cherry-flavoured, vanilla-flavoured and menthol-flavoured products from reference products. Six clusters subdivided reference products into tobacco leaves, roll-your-own and commercial products. An expert panel was successfully trained to assess characterising odours in cigarettes and roll-your-own tobacco. This method could be applied to other product types such as e-cigarettes. Regulatory decisions are needed on the choice of reference products and significance level, as these directly influence which products are assessed as having a characterising odour. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. 
No commercial use is permitted unless otherwise expressly granted.
Spatio-Temporal Analysis of Smear-Positive Tuberculosis in the Sidama Zone, Southern Ethiopia

PubMed Central

Dangisso, Mesay Hailu; Datiko, Daniel Gemechu; Lindtjørn, Bernt

2015-01-01

Background: Tuberculosis (TB) is a disease of public health concern, with a varying distribution across settings depending on socio-economic status, HIV burden, availability and performance of the health system. Ethiopia is a country with a high burden of TB, with regional variations in TB case notification rates (CNRs). However, TB program reports are often compiled and reported at higher administrative units that do not show the burden at lower units, so there is limited information about the spatial distribution of the disease. We therefore aim to assess the spatial distribution and presence of the spatio-temporal clustering of the disease in different geographic settings over 10 years in the Sidama Zone in southern Ethiopia. Methods: A retrospective space–time and spatial analysis was carried out at the kebele level (the lowest administrative unit within a district) to identify spatial and space-time clusters of smear-positive pulmonary TB (PTB). Scan statistics, Global Moran’s I, and Getis-Ord (Gi*) statistics were used to analyze the spatial distribution and clusters of the disease across settings. Results: A total of 22,545 smear-positive PTB cases notified over 10 years were used for spatial analysis. In a purely spatial analysis, we identified the most likely cluster of smear-positive PTB in 192 kebeles in eight districts (RR = 2, p<0.001), with 12,155 observed and 8,668 expected cases. The Gi* statistic also identified the clusters in the same areas, and the spatial clusters showed stability in most areas in each year during the study period. The space-time analysis also detected the most likely cluster in 193 kebeles in the same eight districts (RR = 1.92, p<0.001), with 7,584 observed and 4,738 expected cases in 2003-2012. Conclusion: The study found variations in CNRs and significant spatio-temporal clusters of smear-positive PTB in the Sidama Zone. 
The findings can be used to guide TB control programs to devise effective TB control strategies for the geographic areas characterized by the highest CNRs. Further studies are required to understand the factors associated with clustering based on individual level locations and investigation of cases. PMID:26030162
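As a rough illustration of one of the statistics named in the abstract above, the following sketch computes Global Moran's I for a handful of areal units. It is not the authors' code: the function name `morans_i`, the toy rates, and the simple binary adjacency matrix are assumptions made for this example, and a real analysis would use weights derived from the actual kebele boundaries.

```python
def morans_i(values, weights):
    """Global Moran's I for n areal units.

    values  -- list of n rates (for example, case notification rates)
    weights -- n x n spatial weight matrix; weights[i][j] > 0 when
               units i and j are treated as neighbours (assumed binary here)
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    # Cross-products of deviations, summed over neighbouring pairs only.
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    var = sum(d * d for d in dev)
    return (n / w_sum) * (cross / var)


# Hypothetical example: four units in a row (1-2-3-4), adjacent units are neighbours.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]

clustered = morans_i([10.0, 10.0, 1.0, 1.0], W)    # similar rates sit side by side
alternating = morans_i([10.0, 1.0, 10.0, 1.0], W)  # high and low rates alternate
```

Positive values indicate that neighbouring units have similar rates, and negative values indicate that neighbours tend to differ. Significance testing, which the study relies on, is not covered by this sketch.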

Athletic groin pain (part 2): a prospective cohort study on the biomechanical evaluation of change of direction identifies three clusters of movement patterns

PubMed Central

Franklyn-Miller, A; Richter, C; King, E; Gore, S; Moran, K; Strike, S; Falvey, E C

2017-01-01

Background: Athletic groin pain (AGP) is prevalent in sports involving repeated accelerations, decelerations, kicking and change-of-direction movements. Clinical and radiological examinations lack the ability to assess pathomechanics of AGP, but three-dimensional biomechanical movement analysis may be an important innovation. Aim: The primary aim was to describe and analyse movements used by patients with AGP during a maximum effort change-of-direction task. The secondary aim was to determine if specific anatomical diagnoses were related to a distinct movement strategy. Methods: 322 athletes with a current symptom of chronic AGP participated. Structured and standardised clinical assessments and radiological examinations were performed on all participants. Additionally, each participant performed multiple repetitions of a planned maximum effort change-of-direction task during which whole body kinematics were recorded. Kinematic and kinetic data were examined using continuous waveform analysis techniques in combination with a subgroup design that used gap statistic and hierarchical clustering. Results: Three subgroups (clusters) were identified. Kinematic and kinetic measures of the clusters differed strongly in patterns observed in thorax, pelvis, hip, knee and ankle. Cluster 1 (40%) was characterised by increased ankle eversion, external rotation and knee internal rotation and greater knee work. Cluster 2 (15%) was characterised by increased hip flexion, pelvis contralateral drop, thorax tilt and increased hip work. Cluster 3 (45%) was characterised by high ankle dorsiflexion, thorax contralateral drop, ankle work and prolonged ground contact time. No correlation was observed between movement clusters and clinically palpated location of the participant's pain. 
Conclusions: We identified three distinct movement strategies among athletes with long-standing groin pain during a maximum effort change-of-direction task. These movement strategies were not related to clinical assessment findings but highlighted targets for rehabilitation in response to possible propagative mechanisms. Trial registration number NCT02437942, pre results. PMID:28209597
Study of parameters of the nearest neighbour shared algorithm on clustering documents

NASA Astrophysics Data System (ADS)

Mustika Rukmi, Alvida; Budi Utomo, Daryono; Imro’atus Sholikhah, Neni

2018-03-01

Document clustering is one way of automatically managing documents, extracting document topics and quickly filtering information. Preprocessing of the documents by text mining consists of keyword extraction using Rapid Automatic Keyphrase Extraction (RAKE) and representation of each document as a concept vector using Latent Semantic Analysis (LSA). Clustering is then performed on the preprocessed documents so that documents with similar topics fall in the same cluster. The Shared Nearest Neighbour (SNN) algorithm is a clustering method based on the number of nearest neighbours that documents share. Its parameters are: k, the number of nearest-neighbour documents; ε, the number of shared nearest-neighbour documents; and MinT, the minimum number of similar documents that can form a cluster. Each cluster is formed by keywords that are shared by its documents. The SNN algorithm allows a cluster to be built on more than one keyword if the frequency of those keywords in the documents is also high. The choice of parameter values affects the document clustering results. A higher value of k increases the number of neighbour documents of each document, which lowers the similarity among neighbouring documents and the accuracy of each cluster. A higher value of ε causes each document to retain only neighbour documents with high similarity when building a cluster, and also leaves more unclassified documents (noise). A higher value of MinT decreases the number of clusters, since groups of similar documents smaller than MinT cannot form a cluster. The parameters of the SNN algorithm thus determine the performance of the clustering result and the amount of noise (unclustered documents). 
The Silhouette coefficient shows almost the same result in many experiments, above 0.9, which means that the SNN algorithm works well with different parameter values.
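The shared-neighbour idea behind the algorithm described above can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names `knn_lists` and `snn_similarity` are hypothetical, plain Euclidean distance on toy 2-D points stands in for similarity between LSA concept vectors, and the rule that two items must appear in each other's neighbour lists follows the common Jarvis-Patrick style definition, which the abstract does not spell out.

```python
import math


def knn_lists(points, k):
    """Return, for each point, the set of indices of its k nearest neighbours."""
    neighbours = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j)
                       for j, q in enumerate(points) if j != i)
        neighbours.append({j for _, j in dists[:k]})
    return neighbours


def snn_similarity(neighbours, i, j):
    """Number of neighbours shared by i and j.

    Assumption: the overlap only counts when i and j are in each
    other's k-nearest-neighbour lists; otherwise similarity is 0.
    """
    if i in neighbours[j] and j in neighbours[i]:
        return len(neighbours[i] & neighbours[j])
    return 0


# Hypothetical toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
nb = knn_lists(points, k=2)

same_group = snn_similarity(nb, 0, 1)   # points 0 and 1 share neighbour 2
cross_group = snn_similarity(nb, 0, 3)  # not in each other's lists
```

In a full SNN clustering, pairs whose shared-neighbour count falls below a threshold such as ε would be disconnected, and groups smaller than a minimum size such as MinT would be treated as noise; those steps are omitted here.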
Using exploratory data analysis to identify and predict patterns of human Lyme disease case clustering within a multistate region, 2010-2014.

PubMed

Hendricks, Brian; Mark-Carew, Miguella

2017-02-01

Lyme disease is the most commonly reported vectorborne disease in the United States. The objective of our study was to identify patterns of Lyme disease reporting after multistate inclusion to mitigate potential border effects. County-level human Lyme disease surveillance data were obtained from Kentucky, Maryland, Ohio, Pennsylvania, Virginia, and West Virginia state health departments. Rate smoothing and Local Moran's I were performed to identify clusters of reporting activity and identify spatial outliers. A logistic generalized estimating equation was performed to identify significant associations in disease clustering over time. Resulting analyses identified statistically significant (P=0.05) clusters of high reporting activity and trends over time. High reporting activity aggregated near border counties in high incidence states, while low reporting aggregated near shared county borders in non-high incidence states. Findings highlight the need for exploratory surveillance approaches to describe the extent to which state level reporting affects accurate estimation of Lyme disease progression. Copyright © 2017 Elsevier Ltd. All rights reserved.
IoT Big-Data Centred Knowledge Granule Analytic and Cluster Framework for BI Applications: A Case Base Analysis.

PubMed

Chang, Hsien-Tsung; Mishra, Nilamadhab; Lin, Chung-Chih

2015-01-01

The current rapid growth of Internet of Things (IoT) in various commercial and non-commercial sectors has led to the deposition of large-scale IoT data, of which the time-critical analytic and clustering of knowledge granules represent highly thought-provoking application possibilities. The objective of the present work is to inspect the structural analysis and clustering of complex knowledge granules in an IoT big-data environment. In this work, we propose a knowledge granule analytic and clustering (KGAC) framework that explores and assembles knowledge granules from IoT big-data arrays for a business intelligence (BI) application. Our work implements neuro-fuzzy analytic architecture rather than a standard fuzzified approach to discover the complex knowledge granules. Furthermore, we implement an enhanced knowledge granule clustering (e-KGC) mechanism that is more elastic than previous techniques when assembling the tactical and explicit complex knowledge granules from IoT big-data arrays. The analysis and discussion presented here show that the proposed framework and mechanism can be implemented to extract knowledge granules from an IoT big-data array in such a way as to present knowledge of strategic value to executives and enable knowledge users to perform further BI actions.
IoT Big-Data Centred Knowledge Granule Analytic and Cluster Framework for BI Applications: A Case Base Analysis

PubMed Central

Chang, Hsien-Tsung; Mishra, Nilamadhab; Lin, Chung-Chih

2015-01-01

The current rapid growth of Internet of Things (IoT) in various commercial and non-commercial sectors has led to the deposition of large-scale IoT data, of which the time-critical analytic and clustering of knowledge granules represent highly thought-provoking application possibilities. The objective of the present work is to inspect the structural analysis and clustering of complex knowledge granules in an IoT big-data environment. In this work, we propose a knowledge granule analytic and clustering (KGAC) framework that explores and assembles knowledge granules from IoT big-data arrays for a business intelligence (BI) application. Our work implements neuro-fuzzy analytic architecture rather than a standard fuzzified approach to discover the complex knowledge granules. Furthermore, we implement an enhanced knowledge granule clustering (e-KGC) mechanism that is more elastic than previous techniques when assembling the tactical and explicit complex knowledge granules from IoT big-data arrays. The analysis and discussion presented here show that the proposed framework and mechanism can be implemented to extract knowledge granules from an IoT big-data array in such a way as to present knowledge of strategic value to executives and enable knowledge users to perform further BI actions. PMID:26600156
DOE Office of Scientific and Technical Information (OSTI.GOV)

Graf, Norman A.; /SLAC

Maximizing the physics performance of detectors being designed for the International Linear Collider, while remaining sensitive to cost constraints, requires a powerful, efficient, and flexible simulation, reconstruction and analysis environment to study the capabilities of a large number of different detector designs. The preparation of Letters Of Intent for the International Linear Collider involved the detailed study of dozens of detector options, layouts and readout technologies; the final physics benchmarking studies required the reconstruction and analysis of hundreds of millions of events. We describe the Java-based software toolkit (org.lcsim) which was used for full event reconstruction and analysis. The components are fully modular and are available for tasks from digitization of tracking detector signals through to cluster finding, pattern recognition, track-fitting, calorimeter clustering, individual particle reconstruction, jet-finding, and analysis. The detector is defined by the same xml input files used for the detector response simulation, ensuring the simulation and reconstruction geometries are always commensurate by construction. We discuss the architecture as well as the performance.
A web portal for hydrodynamical, cosmological simulations

NASA Astrophysics Data System (ADS)

Ragagnin, A.; Dolag, K.; Biffi, V.; Cadolle Bel, M.; Hammer, N. J.; Krukau, A.; Petkova, M.; Steinborn, D.

2017-07-01

This article describes a data centre hosting a web portal for accessing and sharing the output of large, cosmological, hydro-dynamical simulations with a broad scientific community. It also allows users to receive related scientific data products by directly processing the raw simulation data on a remote computing cluster. The data centre has a multi-layer structure: a web portal, a job control layer, a computing cluster and a HPC storage system. The outer layer enables users to choose an object from the simulations. Objects can be selected by visually inspecting 2D maps of the simulation data, by performing highly compounded and elaborated queries or graphically by plotting arbitrary combinations of properties. The user can run analysis tools on a chosen object. These services allow users to run analysis tools on the raw simulation data. The job control layer is responsible for handling and performing the analysis jobs, which are executed on a computing cluster. The innermost layer is formed by a HPC storage system which hosts the large, raw simulation data. The following services are available for the users: (I) CLUSTERINSPECT visualizes properties of member galaxies of a selected galaxy cluster; (II) SIMCUT returns the raw data of a sub-volume around a selected object from a simulation, containing all the original, hydro-dynamical quantities; (III) SMAC creates idealized 2D maps of various, physical quantities and observables of a selected object; (IV) PHOX generates virtual X-ray observations with specifications of various current and upcoming instruments.
Dynamic clustering detection through multi-valued descriptors of dermoscopic images.

PubMed

Cozza, Valentina; Guarracino, Maria Rosario; Maddalena, Lucia; Baroni, Adone

2011-09-10

This paper introduces a dynamic clustering methodology based on multi-valued descriptors of dermoscopic images. The main idea is to support medical diagnosis in deciding whether pigmented skin lesions belonging to an uncertain set are nearer to malignant melanoma or to benign nevi. Melanoma is the most deadly skin cancer, and early diagnosis is a current challenge for clinicians. Most data analysis algorithms for skin lesion discrimination focus on segmentation and extraction of features of categorical or numerical type. As an alternative approach, this paper introduces two new concepts: first, it considers multi-valued data, described not only by scalar variables but also by interval or histogram variables; second, it introduces a dynamic clustering method based on Wasserstein distance to compare multi-valued data. The overall strategy of analysis can be summarized in the following steps: first, a segmentation of dermoscopic images identifies a set of multi-valued descriptors; second, a discriminant analysis is performed on a set of images with an a priori classification, to detect which features discriminate between benign and malignant lesions; and third, the proposed dynamic clustering method is applied to the uncertain cases, which need to be associated with one of the two previously mentioned groups. Results based on clinical data show that the grading of specific descriptors associated with dermoscopic characteristics provides a novel way to characterize uncertain lesions that can help the dermatologist's diagnosis. Copyright © 2011 John Wiley & Sons, Ltd.
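To make the idea of comparing histogram-valued descriptors concrete, here is a minimal sketch of a one-dimensional Wasserstein distance between two histograms. Several things are assumptions of this example rather than details from the paper: the function name `wasserstein_1d`, the use of the order-1 distance (the abstract does not say which order is used), and the requirement that both histograms share the same equally spaced bins.

```python
def wasserstein_1d(hist_p, hist_q, bin_width=1.0):
    """Order-1 Wasserstein distance between two histograms.

    Both histograms must be defined on the same equally spaced bins.
    Counts are normalised to probabilities, and the distance is the
    area between the two cumulative distribution functions.
    """
    total_p, total_q = sum(hist_p), sum(hist_q)
    cdf_p = cdf_q = 0.0
    distance = 0.0
    for p, q in zip(hist_p, hist_q):
        cdf_p += p / total_p
        cdf_q += q / total_q
        distance += abs(cdf_p - cdf_q) * bin_width
    return distance


# Hypothetical pixel-intensity histograms over three bins.
far = wasserstein_1d([1, 0, 0], [0, 0, 1])    # all mass moves two bins
near = wasserstein_1d([1, 0, 0], [0, 1, 0])   # all mass moves one bin
same = wasserstein_1d([2, 3, 5], [2, 3, 5])   # identical histograms
```

Unlike a bin-by-bin difference, this distance grows with how far mass has to move, which is why it suits descriptors whose shape and location both matter.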
An exploratory analysis for Lean and Six Sigma implementation in hospitals: Together is better?

PubMed

Lee, Jung Young; McFadden, Kathleen L; Gowen, Charles R

Despite the increasing interest for Lean and Six Sigma implementations in hospitals, there has been little empirical evidence that goes beyond descriptive case studies to address the current status and the effectiveness of the implementations. The aim of this study was to explore existing patterns of Lean and Six Sigma implementation in U.S. hospitals and compare the performance of the different patterns. We collected data from 215 U.S. hospitals via a survey that includes measurement items developed from related literature. Using the cross-sectional data, we conducted a cluster analysis, followed by t tests, chi-square tests, and regression analyses for cluster verification. The cluster analysis identifies two clusters, a Moderate Six Sigma group and a Lean Six Sigma group. Results show that the Lean Six Sigma group outperforms the Moderate Six Sigma group across many performance dimensions: responsiveness capability, patient safety, and possibly cost saving. In addition, the Lean Six Sigma group tends to be composed of larger, private teaching hospitals located in more urban areas, and they employ more resources for quality improvement. Our research contributes to the quality management literature by supporting the possible complementary relationship between Lean and Six Sigma in hospitals. Our study encourages practitioners and managers to pay more attention to Lean implementation. Although Lean seems to be conducted in a limited fashion in many hospitals, it should be expanded and combined with Six Sigma for better results.
Cortical atrophy patterns in early Parkinson's disease patients using hierarchical cluster analysis.

PubMed

Uribe, Carme; Segura, Barbara; Baggio, Hugo Cesar; Abos, Alexandra; Garcia-Diaz, Anna Isabel; Campabadal, Anna; Marti, Maria Jose; Valldeoriola, Francesc; Compta, Yaroslau; Tolosa, Eduard; Junque, Carme

2018-05-01

Cortical brain atrophy detectable with MRI in non-demented advanced Parkinson's disease (PD) is well characterized, but its presence in early disease stages is still under debate. We aimed to investigate cortical atrophy patterns in a large sample of early untreated PD patients using a hypothesis-free data-driven approach. Seventy-seven de novo PD patients and 50 controls from the Parkinson's Progression Marker Initiative database with T1-weighted images in a 3-tesla Siemens scanner were included in this study. Mean cortical thickness was extracted from 360 cortical areas defined by the Human Connectome Project Multi-Modal Parcellation version 1.0, and a hierarchical cluster analysis was performed using Ward's linkage method. A general linear model with cortical thickness data was then used to compare clustering groups using FreeSurfer software. We identified two patterns of cortical atrophy. Compared with controls, patients grouped in pattern 1 (n = 33) were characterized by cortical thinning in bilateral orbitofrontal, anterior cingulate, and lateral and medial anterior temporal gyri. Patients in pattern 2 (n = 44) showed cortical thinning in bilateral occipital gyrus, cuneus, superior parietal gyrus, and left postcentral gyrus, and they showed neuropsychological impairment in memory and other cognitive domains. Even in the early stages of PD, there is evidence of cortical brain atrophy. Neuroimaging clustering analysis is able to detect two subgroups of cortical thinning, one with mainly anterior atrophy, and the other with posterior predominance and worse cognitive performance. Copyright © 2018 Elsevier Ltd. All rights reserved.
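The Ward's linkage step mentioned in the abstract above can be illustrated with a naive agglomerative loop. This is a sketch, not the study's pipeline: the function name `ward_clusters` and the toy one-dimensional "thickness" values are hypothetical, and a real analysis would normally use a library implementation on the full 360-region feature vectors.

```python
def ward_clusters(points, n_clusters):
    """Naive agglomerative clustering with Ward's merging criterion.

    At each step, merge the pair of clusters whose union gives the
    smallest increase in within-cluster sum of squares:
        |A| * |B| / (|A| + |B|) * squared distance between centroids
    """
    clusters = [[tuple(p)] for p in points]

    def centroid(cluster):
        dim = len(cluster[0])
        return [sum(p[d] for p in cluster) / len(cluster) for d in range(dim)]

    def merge_cost(a, b):
        ca, cb = centroid(a), centroid(b)
        sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * sq_dist

    while len(clusters) > n_clusters:
        pairs = [(merge_cost(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        _, i, j = min(pairs)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical mean cortical thickness values (mm) for five subjects.
subjects = [(2.0,), (2.1,), (2.2,), (3.0,), (3.1,)]
groups = ward_clusters(subjects, n_clusters=2)
sorted_groups = sorted(sorted(g) for g in groups)
```

The loop recomputes every pairwise cost at each step, so it is only suitable for small examples; library implementations update the costs incrementally.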
Assessment of the climatic potential for tourism in Iran through biometeorology clustering.

PubMed

Roshan, Gholamreza; Yousefi, Robabe; Błażejczyk, Krzysztof

2018-04-01

This study presents a spatiotemporal analysis of bioclimatic comfort conditions for Iran using mean daily meteorological data from 1995 to 2014, analyzed through the Physiological Equivalent Temperature (PET) and Universal Thermal Climate Index (UTCI) indices and bioclimatic clustering. The results of this study demonstrate that, due to the climate variability across Iran during the year, there is at any point in time a location with climatic conditions suitable for tourism. Mean values demonstrate maxima in bioclimatic comfort indices for the country in late winter and spring and minima for summer. Seven statistically significant clusters in bioclimatic indices were identified. Comparing these with the clustering performed on PET and UTCI shows maximum overlap between the two indices. The results further showed that the most appropriate bioclimatic clustering for Iran comprises seven clusters. Clustering locations according to climatic suitability for tourism provides a valuable contribution to tourism management in the country, particularly through marketing destinations to maximize tourist flow.
Robust continuous clustering

PubMed Central

Shah, Sohil Atul

2017-01-01

Clustering is a fundamental procedure in the analysis of scientific data. It is used ubiquitously across the sciences. Despite decades of research, existing clustering algorithms have limited effectiveness in high dimensions and often require tuning parameters for different domains and datasets. We present a clustering algorithm that achieves high accuracy across multiple domains and scales efficiently to high dimensions and large datasets. The presented algorithm optimizes a smooth continuous objective, which is based on robust statistics and allows heavily mixed clusters to be untangled. The continuous nature of the objective also allows clustering to be integrated as a module in end-to-end feature learning pipelines. We demonstrate this by extending the algorithm to perform joint clustering and dimensionality reduction by efficiently optimizing a continuous global objective. The presented approach is evaluated on large datasets of faces, hand-written digits, objects, newswire articles, sensor readings from the Space Shuttle, and protein expression levels. Our method achieves high accuracy across all datasets, outperforming the best prior algorithm by a factor of 3 in average rank. PMID:28851838
K-Means Algorithm Performance Analysis With Determining The Value Of Starting Centroid With Random And KD-Tree Method

NASA Astrophysics Data System (ADS)

Sirait, Kamson; Tulus; Budhiarti Nababan, Erna

2017-12-01

Clustering methods that have high accuracy and time efficiency are necessary for the filtering process. One method that has been known and applied in clustering is K-Means Clustering. In its application, the choice of the initial cluster centers greatly affects the results of the K-Means algorithm. This research discusses the results of K-Means Clustering with starting centroids determined by a random method and by a KD-Tree method. On a data set of 1000 student academic records, used to classify students at risk of dropping out, random initial centroid selection gives a sum of squared errors (SSE) of 952972 for the quality variable and 232.48 for the GPA variable, whereas initial centroid selection by KD-Tree gives an SSE of 504302 for the quality variable and 214.37 for the GPA variable. The smaller SSE values indicate that K-Means Clustering with KD-Tree initial centroid selection has better accuracy than K-Means Clustering with random initial centroid selection.
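The sensitivity of K-Means to its starting centroids, and the use of SSE to compare runs, can be shown with a small sketch. This is an illustration only: the function name `kmeans_sse` and the toy one-dimensional data are hypothetical, and the "well-spread" starting centroids below are chosen by hand as a stand-in for what a KD-Tree based initialisation might produce. No KD-Tree is implemented here.

```python
def kmeans_sse(data, centroids, iterations=20):
    """Run Lloyd's algorithm on 1-D data from the given starting centroids.

    Returns the final centroids and the sum of squared errors (SSE),
    i.e. the total squared distance of each point to its nearest centroid.
    """
    centroids = list(centroids)
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for x in data:
            nearest = min(range(len(centroids)),
                          key=lambda c: (x - centroids[c]) ** 2)
            groups[nearest].append(x)
        # An empty group keeps its previous centroid.
        centroids = [sum(g) / len(g) if g else centroids[i]
                     for i, g in enumerate(groups)]
    sse = sum(min((x - c) ** 2 for c in centroids) for x in data)
    return centroids, sse


# Hypothetical data with three obvious groups.
data = [0, 1, 10, 11, 20, 21]

# Poor start: two centroids land in the same group.
_, sse_poor = kmeans_sse(data, [0, 1, 15])
# Well-spread start: one centroid per group (stand-in for KD-Tree seeding).
_, sse_spread = kmeans_sse(data, [0, 10, 20])
```

With the poor start the algorithm settles in a local optimum where one centroid has to cover two groups, so its SSE stays far higher than with the well-spread start. This is the same kind of comparison the abstract reports between random and KD-Tree initialisation.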
Computational intelligence for the Balanced Scorecard: studying performance trends of hemodialysis clinics.

PubMed

Cattinelli, Isabella; Bolzoni, Elena; Chermisi, Milena; Bellocchio, Francesco; Barbieri, Carlo; Mari, Flavio; Amato, Claudia; Menzer, Marcus; Stopper, Andrea; Gatti, Emanuele

2013-07-01

The Balanced Scorecard (BSC) is a general, widely employed instrument for enterprise performance monitoring based on the periodic assessment of strategic Key Performance Indicators that are scored against preset targets. The BSC is currently employed as an effective management support tool within Fresenius Medical Care (FME) and is routinely analyzed via standard statistical methods. More recently, the application of computational intelligence techniques (namely, self-organizing maps) to BSC data has been proposed as a way to enhance the quantity and quality of information that can be extracted from it. In this work, additional methods are presented to analyze the evolution of clinic performance over time. Performance evolution is studied at the single-clinic level by computing two complementary indexes that measure the proportion of time spent within performance clusters and improving/worsening trends. Self-organizing maps are used in conjunction with these indexes to identify the specific drivers of the observed performance. The performance evolution for groups of clinics is modeled under a probabilistic framework by resorting to Markov chain properties. These allow a study of the probability of transitioning between performance clusters as time progresses for the identification of the performance level that is expected to become dominant over time. We show the potential of the proposed methods through illustrative results derived from the analysis of BSC data of 109 FME clinics in three countries. We were able to identify the performance drivers for specific groups of clinics and to distinguish between countries whose performances are likely to improve from those where a decline in performance might be expected. 
According to the stationary distribution of the Markov chain, the expected trend is best in Turkey (where the highest performance cluster has the highest probability, P=0.46), followed by Portugal (where the second best performance cluster dominates, with P=0.50), and finally Italy (where the second best performance cluster has P=0.34). These results highlight the ability of the proposed methods to extract insights about performance trends that cannot be easily extrapolated using standard analyses and that are valuable in directing management strategies within a continuous quality improvement policy. Copyright © 2013 Elsevier B.V. All rights reserved.
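The stationary distribution referred to above is the long-run share of time a Markov chain spends in each state. The sketch below estimates it by repeatedly applying the transition matrix. The function name `stationary_distribution` and the three-state transition matrix are hypothetical and are not taken from the study's clinic data.

```python
def stationary_distribution(P, iterations=1000):
    """Approximate the stationary distribution of a Markov chain.

    P[i][j] is the probability of moving from state i to state j
    (each row sums to 1). Starting from a uniform distribution, the
    update pi <- pi * P is repeated until it has effectively converged.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


# Hypothetical transitions between three performance clusters
# (state 0 = best, state 2 = worst) from one period to the next.
P = [[0.8, 0.2, 0.0],
     [0.1, 0.8, 0.1],
     [0.0, 0.2, 0.8]]

pi = stationary_distribution(P)
dominant = max(range(len(pi)), key=lambda s: pi[s])
```

In this toy chain the middle cluster is expected to dominate in the long run. The study reads its country-level results in the same way, by asking which performance cluster has the highest stationary probability.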
Suzaku observations of low surface brightness cluster Abell 1631

NASA Astrophysics Data System (ADS)

Babazaki, Yasunori; Mitsuishi, Ikuyuki; Ota, Naomi; Sasaki, Shin; Böhringer, Hans; Chon, Gayoung; Pratt, Gabriel W.; Matsumoto, Hironori

2018-04-01

We present analysis results for a nearby galaxy cluster Abell 1631 at z = 0.046 using the X-ray observatory Suzaku. This cluster is categorized as a low X-ray surface brightness cluster. To study the dynamical state of the cluster, we conduct four-pointed Suzaku observations and investigate physical properties of the Mpc-scale hot gas associated with the A 1631 cluster for the first time. Unlike relaxed clusters, the X-ray image shows no strong peak at the center and an irregular morphology. We perform spectral analysis and investigate the radial profiles of the gas temperature, density, and entropy out to approximately 1.5 Mpc in the east, north, west, and south directions by combining with the XMM-Newton data archive. The measured gas density in the central region is relatively low (a few × 10⁻⁴ cm⁻³) at the given temperature (∼2.9 keV) compared with X-ray-selected clusters. The entropy profile and value within the central region (r < 0.1 r₂₀₀) are found to be flatter and higher (≳400 keV cm²). The observed bolometric luminosity is approximately three times lower than that expected from the luminosity-temperature relation in previous studies of relaxed clusters. These features are also observed in another low surface brightness cluster, Abell 76. The spatial distributions of galaxies and the hot gas appear to be different. The X-ray luminosity is relatively lower than that expected from the velocity dispersion. A post-merger scenario may explain the observed results.
Characterizing Suicide in Toronto: An Observational Study and Cluster Analysis

PubMed Central

Sinyor, Mark; Schaffer, Ayal; Streiner, David L

2014-01-01

Objective: To determine whether people who have died from suicide in a large epidemiologic sample form clusters based on demographic, clinical, and psychosocial factors. Method: We conducted a coroner’s chart review for 2886 people who died in Toronto, Ontario, from 1998 to 2010, and whose death was ruled as suicide by the Office of the Chief Coroner of Ontario. A cluster analysis using known suicide risk factors was performed to determine whether suicide deaths separate into distinct groups. Clusters were compared according to person- and suicide-specific factors. Results: Five clusters emerged. Cluster 1 had the highest proportion of females and nonviolent methods, and all had depression and a past suicide attempt. Cluster 2 had the highest proportion of people with a recent stressor and violent suicide methods, and all were married. Cluster 3 had mostly males between the ages of 20 and 64, and all had either experienced recent stressors, suffered from mental illness, or had a history of substance abuse. Cluster 4 had the youngest people and the highest proportion of deaths by jumping from height, few were married, and nearly one-half had bipolar disorder or schizophrenia. Cluster 5 had all unmarried people with no prior suicide attempts, and were the least likely to have an identified mental illness and most likely to leave a suicide note. Conclusions: People who die from suicide assort into different patterns of demographic, clinical, and death-specific characteristics. Identifying and studying subgroups of suicides may advance our understanding of the heterogeneous nature of suicide and help to inform development of more targeted suicide prevention strategies. PMID:24444321
Suzaku observations of low surface brightness cluster Abell 1631

NASA Astrophysics Data System (ADS)

Babazaki, Yasunori; Mitsuishi, Ikuyuki; Ota, Naomi; Sasaki, Shin; Böhringer, Hans; Chon, Gayoung; Pratt, Gabriel W.; Matsumoto, Hironori

2018-06-01

We present analysis results for a nearby galaxy cluster Abell 1631 at z = 0.046 using the X-ray observatory Suzaku. This cluster is categorized as a low X-ray surface brightness cluster. To study the dynamical state of the cluster, we conduct four-pointed Suzaku observations and investigate physical properties of the Mpc-scale hot gas associated with the A 1631 cluster for the first time. Unlike relaxed clusters, the X-ray image shows no strong peak at the center and an irregular morphology. We perform spectral analysis and investigate the radial profiles of the gas temperature, density, and entropy out to approximately 1.5 Mpc in the east, north, west, and south directions by combining with the XMM-Newton data archive. The measured gas density in the central region is relatively low (a few × 10⁻⁴ cm⁻³) at the given temperature (∼2.9 keV) compared with X-ray-selected clusters. The entropy profile and value within the central region (r < 0.1 r₂₀₀) are found to be flatter and higher (≳400 keV cm²). The observed bolometric luminosity is approximately three times lower than that expected from the luminosity-temperature relation in previous studies of relaxed clusters. These features are also observed in another low surface brightness cluster, Abell 76. The spatial distributions of galaxies and the hot gas appear to be different. The X-ray luminosity is relatively lower than that expected from the velocity dispersion. A post-merger scenario may explain the observed results.
Multiscale Embedded Gene Co-expression Network Analysis

PubMed Central

Song, Won-Min; Zhang, Bin

2015-01-01

Gene co-expression network analysis has been shown to be effective in identifying functional co-expressed gene modules associated with complex human diseases. However, existing techniques to construct co-expression networks require some critical prior information such as predefined number of clusters, numerical thresholds for defining co-expression/interaction, or do not naturally reproduce the hallmarks of complex systems such as the scale-free degree distribution of small-worldness. Previously, a graph filtering technique called Planar Maximally Filtered Graph (PMFG) has been applied to many real-world data sets such as financial stock prices and gene expression to extract meaningful and relevant interactions. However, PMFG is not suitable for large-scale genomic data due to several drawbacks, such as the high computational complexity O(|V|³), the presence of false-positives due to the maximal planarity constraint, and the inadequacy of the clustering framework. Here, we developed a new co-expression network analysis framework called Multiscale Embedded Gene Co-expression Network Analysis (MEGENA) by: i) introducing quality control of co-expression similarities, ii) parallelizing embedded network construction, and iii) developing a novel clustering technique to identify multi-scale clustering structures in Planar Filtered Networks (PFNs). We applied MEGENA to a series of simulated data and the gene expression data in breast carcinoma and lung adenocarcinoma from The Cancer Genome Atlas (TCGA). MEGENA showed improved performance over well-established clustering methods and co-expression network construction approaches. MEGENA revealed not only meaningful multi-scale organizations of co-expressed gene clusters but also novel targets in breast carcinoma and lung adenocarcinoma. PMID:26618778
Multiscale Embedded Gene Co-expression Network Analysis.

PubMed

Song, Won-Min; Zhang, Bin

2015-11-01

Gene co-expression network analysis has been shown to be effective in identifying functional co-expressed gene modules associated with complex human diseases. However, existing techniques to construct co-expression networks require some critical prior information such as predefined number of clusters, numerical thresholds for defining co-expression/interaction, or do not naturally reproduce the hallmarks of complex systems such as the scale-free degree distribution of small-worldness. Previously, a graph filtering technique called Planar Maximally Filtered Graph (PMFG) has been applied to many real-world data sets such as financial stock prices and gene expression to extract meaningful and relevant interactions. However, PMFG is not suitable for large-scale genomic data due to several drawbacks, such as the high computational complexity O(|V|³), the presence of false-positives due to the maximal planarity constraint, and the inadequacy of the clustering framework. Here, we developed a new co-expression network analysis framework called Multiscale Embedded Gene Co-expression Network Analysis (MEGENA) by: i) introducing quality control of co-expression similarities, ii) parallelizing embedded network construction, and iii) developing a novel clustering technique to identify multi-scale clustering structures in Planar Filtered Networks (PFNs). We applied MEGENA to a series of simulated data and the gene expression data in breast carcinoma and lung adenocarcinoma from The Cancer Genome Atlas (TCGA). MEGENA showed improved performance over well-established clustering methods and co-expression network construction approaches. MEGENA revealed not only meaningful multi-scale organizations of co-expressed gene clusters but also novel targets in breast carcinoma and lung adenocarcinoma.
Integrated Copy Number and Expression Analysis Identifies Profiles of Whole-Arm Chromosomal Alterations and Subgroups with Favorable Outcome in Ovarian Clear Cell Carcinomas

PubMed Central

Uehara, Yuriko; Oda, Katsutoshi; Ikeda, Yuji; Koso, Takahiro; Tsuji, Shingo; Yamamoto, Shogo; Asada, Kayo; Sone, Kenbun; Kurikawa, Reiko; Makii, Chinami; Hagiwara, Otoe; Tanikawa, Michihiro; Maeda, Daichi; Hasegawa, Kosei; Nakagawa, Shunsuke; Wada-Hiraike, Osamu; Kawana, Kei; Fukayama, Masashi; Fujiwara, Keiichi; Yano, Tetsu; Osuga, Yutaka; Fujii, Tomoyuki; Aburatani, Hiroyuki

2015-01-01

Ovarian clear cell carcinoma (CCC) is generally associated with chemoresistance and poor clinical outcome, even with early diagnosis; whereas high-grade serous carcinomas (SCs) and endometrioid carcinomas (ECs) are commonly chemosensitive at advanced stages. Although an integrated genomic analysis of SC has been performed, conclusive views on copy number and expression profiles for CCC are still limited. In this study, we performed single nucleotide polymorphism analysis with 57 epithelial ovarian cancers (31 CCCs, 14 SCs, and 12 ECs) and microarray expression analysis with 55 cancers (25 CCCs, 16 SCs, and 14 ECs). We then evaluated PIK3CA mutations and ARID1A expression in CCCs. SNP array analysis classified 13% of CCCs into a cluster with high frequency and focal range of copy number alterations (CNAs), significantly lower than for SCs (93%, P < 0.01) and ECs (50%, P = 0.017). The ratio of whole-arm to all CNAs was higher in CCCs (46.9%) than SCs (21.7%; P < 0.0001). SCs with loss of heterozygosity (LOH) of BRCA1 (85%) also had LOH of NF1 and TP53, and LOH of BRCA2 (62%) coexisted with LOH of RB1 and TP53. Microarray analysis classified CCCs into three clusters. One cluster (CCC-2, n = 10) showed more favorable prognosis than the CCC-1 and CCC-3 clusters (P = 0.041). Coexistent alterations of PIK3CA and ARID1A were more common in CCC-1 and CCC-3 (7/11, 64%) than in CCC-2 (0/10, 0%; P < 0.01). Being in cluster CCC-2 was an independent favorable prognostic factor in CCC. In conclusion, CCC was characterized by a high ratio of whole-arm CNAs; whereas CNAs in SC were mainly focal, but preferentially caused LOH of well-known tumor suppressor genes. As such, expression profiles might be useful for sub-classification of CCC, and might provide useful information on prognosis. PMID:26043110

DOE Office of Scientific and Technical Information (OSTI.GOV)

Biewer, Theodore M.; Marcus, Chris; Klepper, C Christopher

The divertor-specific ITER Diagnostic Residual Gas Analyzer (DRGA) will provide essential information relating to DT fusion plasma performance. This includes pulse-resolving measurements of the fuel isotopic mix reaching the pumping ducts, as well as the concentration of the helium generated as the ash of the fusion reaction. In the present baseline design, the cluster of sensors attached to this diagnostic's differentially pumped analysis chamber assembly includes a radiation compatible version of a commercial quadrupole mass spectrometer, as well as an optical gas analyzer using a plasma-based light excitation source. This paper reports on a laboratory study intended to validate the performance of this sensor cluster, with emphasis on the detection limit of the isotopic measurement. This validation study was carried out in a laboratory set-up that closely prototyped the analysis chamber assembly configuration of the baseline design. This includes an ITER-specific placement of the optical gas measurement downstream from the first turbine of the chamber's turbo-molecular pump to provide sufficient light emission while preserving the gas dynamics conditions that allow for ~1 s response time from the sensor cluster [1].
Greater-than-bulk melting temperatures explained: Gallium melts Gangnam style

NASA Astrophysics Data System (ADS)

Gaston, Nicola; Steenbergen, Krista

2014-03-01

The experimental discovery of superheating in gallium clusters contradicted the clear and well-demonstrated paradigm that the melting temperature of a particle should decrease with its size. However, the extremely sensitive dependence of melting temperature on size also goes to the heart of cluster science, and the interplay between the effects of electronic and geometric structure. We have performed extensive first-principles molecular dynamics calculations, incorporating parallel tempering for an efficient exploration of configurational phase space. This is necessary due to the complicated energy landscape of gallium. In the nanoparticles, melting is preceded by transitions between phases. A structural feature, referred to here as the Gangnam motif, is found to increase with the latent heat and appears throughout the observed phase changes of this curious metal. We will present our detailed analysis of the solid-state isomers, performed using extensive statistical sampling of the trajectory data for the assignment of cluster structures to known phases of gallium. Finally, we explain the greater-than-bulk melting through analysis of the factors that stabilise the liquid structures.
Impact of the Choice of Normalization Method on Molecular Cancer Class Discovery Using Nonnegative Matrix Factorization.

PubMed

Yang, Haixuan; Seoighe, Cathal

2016-01-01

Nonnegative Matrix Factorization (NMF) has proved to be an effective method for unsupervised clustering analysis of gene expression data. By the nonnegativity constraint, NMF provides a decomposition of the data matrix into two matrices that have been used for clustering analysis. However, the decomposition is not unique. This allows different clustering results to be obtained, resulting in different interpretations of the decomposition. To alleviate this problem, some existing methods directly enforce uniqueness to some extent by adding regularization terms in the NMF objective function. Alternatively, various normalization methods have been applied to the factor matrices; however, the effects of the choice of normalization have not been carefully investigated. Here we investigate the performance of NMF for the task of cancer class discovery, under a wide range of normalization choices. After extensive evaluations, we observe that the maximum norm showed the best performance, although the maximum norm has not previously been used for NMF. Matlab codes are freely available from: http://maths.nuigalway.ie/~haixuanyang/pNMF/pNMF.htm.
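To make the non-uniqueness point above concrete, here is a minimal sketch of how rescaling the factor matrices can change the resulting clusters. It assumes a factorization X ≈ WH in which each sample (a column of H) is assigned to the factor with the largest coefficient, and it assumes that the columns of W are the ones scaled to unit maximum norm. Both choices, the helper names, and the toy matrices are illustrative assumptions and are not taken from the paper or its Matlab code.

```python
def normalize_max_norm(W, H):
    """Scale each column of W to maximum norm 1 and rescale the matching
    row of H by the same factor, so the product W*H is unchanged."""
    W2 = [row[:] for row in W]
    H2 = [row[:] for row in H]
    for c in range(len(H)):
        d = max(abs(W[r][c]) for r in range(len(W)))
        if d == 0:
            continue  # an all-zero factor cannot be rescaled
        for r in range(len(W)):
            W2[r][c] = W[r][c] / d
        H2[c] = [v * d for v in H[c]]
    return W2, H2

def assign_clusters(H):
    """Assign each sample (column of H) to the factor with the largest weight."""
    return [max(range(len(H)), key=lambda c: H[c][s]) for s in range(len(H[0]))]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# Hypothetical nonnegative factors: 3 genes x 2 factors, 2 factors x 3 samples.
W = [[2.0, 0.1], [4.0, 0.2], [1.0, 0.5]]
H = [[0.5, 0.4, 0.1], [1.0, 3.0, 4.0]]

W2, H2 = normalize_max_norm(W, H)
labels_before = assign_clusters(H)
labels_after = assign_clusters(H2)
print(labels_before, labels_after)
```

Both factor pairs reproduce the same data matrix, yet the cluster labels differ, which is why the choice of normalization can matter for class discovery.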
Complex time series analysis of PM10 and PM2.5 for a coastal site using artificial neural network modelling and k-means clustering

NASA Astrophysics Data System (ADS)

Elangasinghe, M. A.; Singhal, N.; Dirks, K. N.; Salmond, J. A.; Samarasinghe, S.

2014-09-01

This paper uses artificial neural networks (ANN), combined with k-means clustering, to understand the complex time series of PM10 and PM2.5 concentrations at a coastal location of New Zealand based on data from a single site. Out of available meteorological parameters from the network (wind speed, wind direction, solar radiation, temperature, relative humidity), key factors governing the pattern of the time series concentrations were identified through input sensitivity analysis performed on the trained neural network model. The transport pathways of particulate matter under these key meteorological parameters were further analysed through bivariate concentration polar plots and k-means clustering techniques. The analysis shows that the external sources such as marine aerosols and local sources such as traffic and biomass burning contribute equally to the particulate matter concentrations at the study site. These results are in agreement with the results of receptor modelling by the Auckland Council based on Positive Matrix Factorization (PMF). Our findings also show that contrasting concentration-wind speed relationships exist between marine aerosols and local traffic sources resulting in very noisy and seemingly large random PM10 concentrations. The inclusion of cluster rankings as an input parameter to the ANN model showed a statistically significant (p < 0.005) improvement in the performance of the ANN time series model and also showed better performance in picking up high concentrations. For the presented case study, the correlation coefficient between observed and predicted concentrations improved from 0.77 to 0.79 for PM2.5 and from 0.63 to 0.69 for PM10 and reduced the root mean squared error (RMSE) from 5.00 to 4.74 for PM2.5 and from 6.77 to 6.34 for PM10. 
The techniques presented here enable the user to obtain an understanding of potential sources and their transport characteristics prior to the implementation of costly chemical analysis techniques or advanced air dispersion models.
Cluster Analysis of Velocity Field Derived from Dense GNSS Network of Japan

NASA Astrophysics Data System (ADS)

Takahashi, A.; Hashimoto, M.

2015-12-01

Dense GNSS networks have been widely used to observe crustal deformation. Simpson et al. (2012) and Savage and Simpson (2013) conducted cluster analyses of the GNSS velocity field in the San Francisco Bay Area and the Mojave Desert, respectively. They successfully found velocity discontinuities and showed the advantage of cluster analysis for classifying GNSS velocity fields. Because strike-slip events are dominant in the western United States, the geometry there is simple. The Japanese Islands, however, are tectonically complicated due to the subduction of oceanic plates, and there are many types of crustal deformation, such as slow slip events and large postseismic deformation. We propose a modified clustering method for the GNSS velocity field in Japan that separates time-variant and static crustal deformation. Our modification consists of performing cluster analysis every several months or years and then qualifying the similarity of cluster members. If a GNSS station moves differently from its neighboring GNSS stations, it will not belong to the cluster that includes its surrounding stations. With this method, time-variant phenomena were distinguished. We applied our method to GNSS data from Japan for 1996 to 2015. The analyses led to the following conclusions. First, the cluster boundaries are consistent with known active faults. For example, the Arima-Takatsuki-Hanaore fault system and the Shimane-Tottori segment proposed by Nishimura (2015) are recognized without using prior information. Second, the detectability of time-variable phenomena is improved, as for a slow slip event in the northern part of the Hokkaido region detected by Ohzono et al. (2015). Third, postseismic deformation caused by large earthquakes can be classified. The result suggested velocity discontinuities in the postseismic deformation of the Tohoku-oki earthquake. This implies that postseismic deformation does not decay continuously in proportion to distance from the epicenter.
Chemometrics-based Approach in Analysis of Arnicae flos

PubMed Central

Zheleva-Dimitrova, Dimitrina Zh.; Balabanova, Vessela; Gevrenova, Reneta; Doichinova, Irini; Vitkova, Antonina

2015-01-01

Introduction: Arnica montana flowers have a long history as herbal medicines for external use on injuries and rheumatic complaints. Objective: To investigate Arnicae flos of cultivated accessions from Bulgaria, Poland, Germany, Finland, and a pharmacy store for phenolic derivatives and sesquiterpene lactones (STLs). Materials and Methods: Samples of Arnica from nine origins were prepared by ultrasound-assisted extraction with 80% methanol for phenolic compounds analysis. Subsequent reverse-phase high-performance liquid chromatography (HPLC) separation of the analytes was performed using gradient elution and ultraviolet detection at 280 and 310 nm (phenolic acids), and 360 nm (flavonoids). Total STLs were determined in chloroform extracts by solid-phase extraction-HPLC at 225 nm. The HPLC-generated chromatographic data were analyzed using principal component analysis (PCA) and hierarchical clustering (HC). Results: The highest total amount of phenolic acids was found in the sample from the Botanical Garden at Joensuu University, Finland (2.36 mg/g dw). Astragalin, isoquercitrin, and isorhamnetin 3-glucoside were the main flavonol glycosides, being present at up to 3.37 mg/g (astragalin). Three well-defined clusters were distinguished by PCA and HC. Cluster C1 comprised the German and Finnish accessions, characterized by the highest content of flavonols. Cluster C2 included the Bulgarian and Polish samples, which presented a low content of flavonoids. Cluster C3 consisted of only one sample, from a pharmacy store. Conclusion: A validated HPLC method for simultaneous determination of phenolic acids, flavonoid glycosides, and aglycones in A. montana flowers was developed. The PCA loading plot showed that quercetin, kaempferol, and isorhamnetin can be used to distinguish different Arnica accessions.
SUMMARY: A principal component analysis (PCA) on 13 phenolic compounds and the total amount of sesquiterpene lactones in the Arnicae flos collection tended to cluster the 9 studied accessions into three main groups. The profiles obtained demonstrated that the samples from Germany and Finland are characterized by greater amounts of phenolic derivatives than the Bulgarian and Polish ones. The PCA loading plot showed that quercetin, kaempferol, and isorhamnetin can be used to distinguish different Arnica accessions. PMID:27013791
Nuclear Potential Clustering As a New Tool to Detect Patterns in High Dimensional Datasets

NASA Astrophysics Data System (ADS)

Tonkova, V.; Paulus, D.; Neeb, H.

2013-02-01

We present a new approach for the clustering of high dimensional data without prior assumptions about the structure of the underlying distribution. The proposed algorithm is based on a concept adapted from nuclear physics. To partition the data, we model the dynamic behaviour of nucleons interacting in an N-dimensional space. An adaptive nuclear potential, comprised of a short-range attractive (strong interaction) and a long-range repulsive term (Coulomb force) is assigned to each data point. By modelling the dynamics, nucleons that are densely distributed in space fuse to build nuclei (clusters) whereas single point clusters repel each other. The formation of clusters is completed when the system reaches the state of minimal potential energy. The data are then grouped according to the particles' final effective potential energy level. The performance of the algorithm is tested with several synthetic datasets showing that the proposed method can robustly identify clusters even when complex configurations are present. Furthermore, quantitative MRI data from 43 multiple sclerosis patients were analyzed, showing a reasonable splitting into subgroups according to the individual patients' disease grade. The good performance of the algorithm on such highly correlated non-spherical datasets, which are typical for MRI derived image features, shows that Nuclear Potential Clustering is a valuable tool for automated data analysis, not only in the MRI domain.
Severe or life-threatening asthma exacerbation: patient heterogeneity identified by cluster analysis.

PubMed

Sekiya, K; Nakatani, E; Fukutomi, Y; Kaneda, H; Iikura, M; Yoshida, M; Takahashi, K; Tomii, K; Nishikawa, M; Kaneko, N; Sugino, Y; Shinkai, M; Ueda, T; Tanikawa, Y; Shirai, T; Hirabayashi, M; Aoki, T; Kato, T; Iizuka, K; Homma, S; Taniguchi, M; Tanaka, H

2016-08-01

Severe or life-threatening asthma exacerbation is one of the worst outcomes of asthma because of the risk of death. To date, few studies have explored the potential heterogeneity of this condition. To examine the clinical characteristics and heterogeneity of patients with severe or life-threatening asthma exacerbation. This was a multicentre, prospective study of patients with severe or life-threatening asthma exacerbation and pulse oxygen saturation < 90% who were admitted to 17 institutions across Japan. Cluster analysis was performed using variables from patient- and physician-orientated structured questionnaires. Analysis of data from 175 patients with severe or life-threatening asthma exacerbation revealed five distinct clusters. Cluster 1 (n = 27) was younger-onset asthma with severe symptoms at baseline, including limitation of activities, a higher frequency of treatment with oral corticosteroids and short-acting beta-agonists, and a higher frequency of asthma hospitalizations in the past year. Cluster 2 (n = 35) was predominantly composed of elderly females, with the highest frequency of comorbid, chronic hyperplastic rhinosinusitis/nasal polyposis, and a long disease duration. Cluster 3 (n = 40) was allergic asthma without inhaled corticosteroid use at baseline. Patients in this cluster had a higher frequency of atopy, including allergic rhinitis and furred pet hypersensitivity, and a better prognosis during hospitalization compared with the other clusters. Cluster 4 (n = 34) was characterized by elderly males with concomitant chronic obstructive pulmonary disease (COPD). Although cluster 5 (n = 39) had very mild symptoms at baseline according to the patient questionnaires, 41% had previously been hospitalized for asthma. This study demonstrated that significant heterogeneity exists among patients with severe or life-threatening asthma exacerbation. 
Differences were observed in the severity of asthma symptoms and use of inhaled corticosteroids at baseline, and the presence of comorbid COPD. These findings may contribute to a deeper understanding and better management of this patient population. © 2016 The Authors. Clinical & Experimental Allergy Published by John Wiley & Sons Ltd.
Variation of heavy metals in recent sediments from Piratininga Lagoon (Brazil): interpretation of geochemical data with the aid of multivariate analysis

NASA Astrophysics Data System (ADS)

Huang, W.; Campredon, R.; Abrao, J. J.; Bernat, M.; Latouche, C.

1994-06-01

In the last decade, the Atlantic coast of south-eastern Brazil has been affected by increasing deforestation and anthropogenic effluents. Sediments in the coastal lagoons have recorded the process of such environmental change. Thirty-seven sediment samples from three cores in Piratininga Lagoon, Rio de Janeiro, were analyzed for their major components and minor element concentrations in order to examine geochemical characteristics and the depositional environment and to investigate the variation of heavy metals of environmental concern. Two multivariate analysis methods, principal component analysis and cluster analysis, were performed on the analytical data set to help visualize the sample clusters and the element associations. On the whole, the sediment samples from each core are similar and the sample clusters corresponding to the three cores are clearly separated, as a result of the different conditions of sedimentation. Some changes in the depositional environment are recognized using the results of multivariate analysis. The enrichment of Pb, Cu, and Zn in the upper parts of cores is in agreement with increasing anthropogenic influx (pollution).
The Performance of Methods to Test Upper-Level Mediation in the Presence of Nonnormal Data

ERIC Educational Resources Information Center

Pituch, Keenan A.; Stapleton, Laura M.

2008-01-01

A Monte Carlo study compared the statistical performance of standard and robust multilevel mediation analysis methods to test indirect effects for a cluster randomized experimental design under various departures from normality. The performance of these methods was examined for an upper-level mediation process, where the indirect effect is a fixed…
Influence of diet, menstruation and genetic factors on iron status: a cross-sectional study in Spanish women of childbearing age.

PubMed

Blanco-Rojo, Ruth; Toxqui, Laura; López-Parra, Ana M; Baeza-Richer, Carlos; Pérez-Granados, Ana M; Arroyo-Pardo, Eduardo; Vaquero, M Pilar

2014-03-06

The aim of this study was to investigate the combined influence of diet, menstruation and genetic factors on iron status in Spanish menstruating women (n = 142). Dietary intake was assessed by a 72-h detailed dietary report and menstrual blood loss by a questionnaire, to determine a Menstrual Blood Loss Coefficient (MBLC). Five selected SNPs were genotyped: rs3811647, rs1799852 (Tf gene); rs1375515 (CACNA2D3 gene); and rs1800562 and rs1799945 (HFE gene, mutations C282Y and H63D, respectively). Iron biomarkers were determined and cluster analysis was performed. Differences among clusters in dietary intake, menstrual blood loss parameters and genotype frequencies distribution were studied. A categorical regression was performed to identify factors associated with cluster belonging. Three clusters were identified: women with poor iron status close to developing iron deficiency anemia (Cluster 1, n = 26); women with mild iron deficiency (Cluster 2, n = 59) and women with normal iron status (Cluster 3, n = 57). Three independent factors, red meat consumption, MBLC and mutation C282Y, were included in the model that better explained cluster belonging (R² = 0.142, p < 0.001). In conclusion, the combination of high red meat consumption, low menstrual blood loss and the HFE C282Y mutation may protect from iron deficiency in women of childbearing age. These findings could be useful to implement adequate strategies to prevent iron deficiency anemia.
[Analysis of Time-to-onset of Interstitial Lung Disease after the Administration of Small Molecule Molecularly-targeted Drugs].

PubMed

Komada, Fusao

2018-01-01

The aim of this study was to investigate the time-to-onset of drug-induced interstitial lung disease (DILD) following the administration of small molecule molecularly-targeted drugs, using the spontaneous adverse reaction reporting system of the Japanese Adverse Drug Event Report database. DILD datasets for afatinib, alectinib, bortezomib, crizotinib, dasatinib, erlotinib, everolimus, gefitinib, imatinib, lapatinib, nilotinib, osimertinib, sorafenib, sunitinib, temsirolimus, and tofacitinib were used to calculate the median onset times of DILD and the Weibull distribution parameters, and to perform the hierarchical cluster analysis. The median onset times of DILD for afatinib, bortezomib, crizotinib, erlotinib, gefitinib, and nilotinib were within one month. The median onset times of DILD for dasatinib, everolimus, lapatinib, osimertinib, and temsirolimus ranged from 1 to 2 months. The median onset times of DILD for alectinib, imatinib, and tofacitinib ranged from 2 to 3 months. The median onset times of DILD for sunitinib and sorafenib ranged from 8 to 9 months. Cluster analysis of the Weibull distributions for these drugs identified 4 clusters. Cluster 1 described a subgroup with early to later onset DILD and early failure type profiles or a random failure type profile. Cluster 2 exhibited early failure type profiles or a random failure type profile with early onset DILD. Cluster 3 exhibited a random failure type profile or wear out failure type profiles with later onset DILD. Cluster 4 exhibited an early failure type profile or a random failure type profile with the latest onset DILD.
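The failure-type labels in the abstract above follow from the Weibull shape parameter: a shape below 1 gives a hazard that falls over time (early failure), a shape of 1 gives a constant hazard (random failure), and a shape above 1 gives a rising hazard (wear-out failure). The sketch below illustrates that rule. Classifying by whether a confidence interval for the shape includes 1 is a common convention that is assumed here rather than stated in the abstract, and all numeric values are hypothetical.

```python
def weibull_hazard(t, scale, shape):
    """Hazard rate of a Weibull distribution at time t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def failure_type(shape_ci_low, shape_ci_high):
    """Classify the onset profile from a confidence interval for the shape
    parameter (assumed convention: compare the interval with 1)."""
    if shape_ci_high < 1.0:
        return "early failure"      # hazard decreases over time
    if shape_ci_low > 1.0:
        return "wear-out failure"   # hazard increases over time
    return "random failure"         # hazard roughly constant

# Hypothetical shape-parameter intervals for three unnamed drugs.
examples = {
    "drug A": (0.60, 0.90),
    "drug B": (0.85, 1.20),
    "drug C": (1.10, 1.60),
}
profiles = {name: failure_type(lo, hi) for name, (lo, hi) in examples.items()}
print(profiles)
```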
Sense of coherence, self-regulated learning and academic performance in first year nursing students: A cluster analysis approach.

PubMed

Salamonson, Yenna; Ramjan, Lucie M; van den Nieuwenhuizen, Simon; Metcalfe, Lauren; Chang, Sungwon; Everett, Bronwyn

2016-03-01

This paper examines the relationship between nursing students' sense of coherence, self-regulated learning and academic performance in bioscience. While there is increasing recognition of a need to foster students' self-regulated learning, little is known about the relationship of psychological strengths, particularly sense of coherence and academic performance. Using a prospective, correlational design, 563 first year nursing students completed the three dimensions of sense of coherence scale - comprehensibility, manageability and meaningfulness, and five components of self-regulated learning strategy - elaboration, organisation, rehearsal, self-efficacy and task value. Cluster analysis was used to group respondents into three clusters, based on their sense of coherence subscale scores. Although there were no sociodemographic differences in sense of coherence subscale scores, those with higher sense of coherence were more likely to adopt self-regulated learning strategies. Furthermore, academic grades collected at the end of semester revealed that higher sense of coherence was consistently related to achieving higher academic grades across all four units of study. Students with higher sense of coherence were more self-regulated in their learning approach. More importantly, the study suggests that sense of coherence may be an explanatory factor for students' successful adaptation and transition in higher education, as indicated by the positive relationship of sense of coherence to academic performance. Copyright © 2016 Elsevier Ltd. All rights reserved.
Performance Analysis of Combined Methods of Genetic Algorithm and K-Means Clustering in Determining the Value of Centroid

NASA Astrophysics Data System (ADS)

Adya Zizwan, Putra; Zarlis, Muhammad; Budhiarti Nababan, Erna

2017-12-01

The determination of centroids in the K-Means algorithm directly affects the quality of the clustering results. Determining centroids by using random numbers has many weaknesses. The GenClust algorithm, which combines Genetic Algorithms and K-Means, uses a genetic algorithm to determine the centroid of each cluster. In GenClust, 50% of the chromosomes are obtained through deterministic calculations and 50% from the generation of random numbers. This study modifies the GenClust algorithm so that 100% of the chromosomes are obtained through deterministic calculations. The study reports performance comparisons, expressed as Mean Square Error, showing how centroid determination influences the K-Means method under GenClust, the modified GenClust, and classic K-Means.
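The dependence of K-Means quality on the starting centroids can be illustrated with a minimal sketch. This is not the GenClust method from the record above; it is plain Lloyd's K-Means run from two hand-picked starting positions on four made-up points, and the function names (`kmeans`, `sq_dist`, `mean_point`) are assumptions introduced only for illustration.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(group):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(axis) / len(group) for axis in zip(*group))


def kmeans(points, centroids, iterations=100):
    """Lloyd's algorithm from the given starting centroids.

    Returns the final centroids and the mean squared error (MSE),
    i.e. the average squared distance of each point to its nearest centroid.
    """
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: sq_dist(p, centroids[i]))
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its group
        # (an empty group keeps its previous centroid).
        updated = [mean_point(g) if g else c for g, c in zip(groups, centroids)]
        if updated == centroids:
            break
        centroids = updated
    mse = sum(min(sq_dist(p, c) for c in centroids) for p in points) / len(points)
    return centroids, mse


# Hypothetical data: corners of a 4 x 1 rectangle, to be split into k = 2 clusters.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

_, good_mse = kmeans(points, [(0, 0.5), (4, 0.5)])  # starts near the natural split
_, bad_mse = kmeans(points, [(2, 0), (2, 1)])       # starts on the wrong axis

print(good_mse, bad_mse)
```

The second start converges immediately to a poor local optimum, which is the kind of weakness that motivates choosing centroids by something other than chance.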
Identifying the ideal profile of French yogurts for different clusters of consumers.

PubMed

Masson, M; Saint-Eve, A; Delarue, J; Blumenthal, D

2016-05-01

Identifying the sensory properties that affect consumer preferences for food products is an important feature of product development. Different methods, such as external preference mapping or partial least squares regression, are used to establish relationships between sensory data and consumer preferences and to identify sensory attributes that drive consumer preferences, by highlighting optimum products. Plain French yogurts were evaluated by a sensory profiling method performed by 12 trained judges. In parallel, 180 consumers were asked to score their overall liking and complete a cognitive restraint questionnaire. After hierarchical cluster analysis on the liking scores, preference mapping using a quadratic regression model was performed. Five clusters of consumers were identified as a function of different preference patterns. Contrary to our expectations, fat levels were not discriminating. For each cluster, the results of preference mapping enabled the identification of optimum products. A comparison of the 5 sensory profiles revealed numerous differences between key sensory attributes. For example, one consumer cluster had a strong preference for products perceived as very thick, grainy, but with a less flowing texture, less sticky, whey presence and color, in contrast to other clusters. In addition, each segment of consumers was characterized according to the results of the cognitive restraint questionnaire. Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Defining syndromes using cattle meat inspection data for syndromic surveillance purposes: a statistical approach with the 2005–2010 data from ten French slaughterhouses

PubMed Central

2013-01-01

Background The slaughterhouse is a central processing point for food animals and thus a source of both demographic data (age, breed, sex) and health-related data (reason for condemnation and condemned portions) that are not available through other sources. Using these data for syndromic surveillance is therefore tempting. However many possible reasons for condemnation and condemned portions exist, making the definition of relevant syndromes challenging. The objective of this study was to determine a typology of cattle with at least one portion of the carcass condemned in order to define syndromes. Multiple factor analysis (MFA) in combination with clustering methods was performed using both health-related data and demographic data. Results Analyses were performed on 381,186 cattle with at least one portion of the carcass condemned among the 1,937,917 cattle slaughtered in ten French abattoirs. Results of the MFA and clustering methods led to 12 clusters considered as stable according to year of slaughter and slaughterhouse. One cluster was specific to a disease of public health importance (cysticercosis). Two clusters were linked to the slaughtering process (fecal contamination of heart or lungs and deterioration lesions). Two clusters respectively characterized by chronic liver lesions and chronic peritonitis could be linked to diseases of economic importance to farmers. Three clusters could be linked respectively to reticulo-pericarditis, fatty liver syndrome and farmer’s lung syndrome, which are related to both diseases of economic importance to farmers and herd management issues. Three clusters respectively characterized by arthritis, myopathy and Dark Firm Dry (DFD) meat could notably be linked to animal welfare issues. Finally, one cluster, characterized by bronchopneumonia, could be linked to both animal health and herd management issues. 
Conclusion The statistical approach of combining multiple factor analysis with cluster analysis showed its relevance for the detection of syndromes using available large and complex slaughterhouse data. The advantages of this statistical approach are to i) define groups of reasons for condemnation based on meat inspection data, ii) help grouping reasons for condemnation among a list of various possible reasons for condemnation for which a consensus among experts could be difficult to reach, iii) assign each animal to a single syndrome which allows the detection of changes in trends of syndromes to detect unusual patterns in known diseases and emergence of new diseases. PMID:23628140
Development of a Computing Cluster At the University of Richmond

NASA Astrophysics Data System (ADS)

Carbonneau, J.; Gilfoyle, G. P.; Bunn, E. F.

2010-11-01

The University of Richmond has developed a computing cluster to support the massive simulation and data analysis requirements for programs in intermediate-energy nuclear physics and cosmology. It is a 20-node, 240-core system running Red Hat Enterprise Linux 5. We have built and installed the physics software packages (Geant4, gemc, MADmap...) and developed shell and Perl scripts for running those programs on the remote nodes. The system has a theoretical processing peak of about 2500 GFLOPS. Testing with the High Performance Linpack (HPL) benchmarking program (one of the standard benchmarks used by the TOP500 list of fastest supercomputers) resulted in speeds of over 900 GFLOPS. The difference between the maximum and measured speeds is due to limitations in the communication speed among the nodes, which creates a bottleneck for large memory problems. As HPL sends data between nodes, the gigabit Ethernet connection cannot keep up with the processing power. We will show how both the theoretical and actual performance of the cluster compares with other current and past clusters, as well as the cost per GFLOP. We will also examine the scaling of the performance when distributed to increasing numbers of nodes.
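The figures quoted in the record above imply a rough parallel efficiency, worked out in the short sketch below. It assumes the rounded values from the abstract ("about 2500" and "over 900" GFLOPS), so the results are approximate bounds rather than reported measurements.

```python
# Values taken from the abstract; both are rounded there.
peak_gflops = 2500.0      # theoretical peak ("about 2500 GFLOPS")
measured_gflops = 900.0   # HPL result ("over 900 GFLOPS"), so a lower bound
nodes = 20
cores = 240

# Fraction of the theoretical peak actually reached under HPL.
efficiency = measured_gflops / peak_gflops

# Per-core and per-node breakdown of the theoretical peak.
cores_per_node = cores // nodes
peak_per_core = peak_gflops / cores

print(f"HPL efficiency: at least {efficiency:.0%}")
print(f"{cores_per_node} cores per node, about {peak_per_core:.1f} GFLOPS peak per core")
```

An efficiency of roughly a third is consistent with the abstract's explanation that the gigabit Ethernet interconnect, not the processors, limits HPL throughput.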
Different disease subtypes with distinct clinical expression in familial Mediterranean fever: results of a cluster analysis.

PubMed

Akar, Servet; Solmaz, Dilek; Kasifoglu, Timucin; Bilge, Sule Yasar; Sari, Ismail; Gumus, Zeynep Zehra; Tunca, Mehmet

2016-02-01

The aim of this study was to evaluate whether there are clinical subgroups that may have different prognoses among FMF patients. The cumulative clinical features of a large group of FMF patients [1168 patients, 593 (50.8%) male, mean age 35.3 years (s.d. 12.4)] were studied. To analyse our data and identify groups of FMF patients with similar clinical characteristics, a two-step cluster analysis using log-likelihood distance measures was performed. For clustering the FMF patients, we evaluated the following variables: gender, current age, age at symptom onset, age at diagnosis, presence of major clinical features, variables related with therapy and family history for FMF, renal failure and carriage of M694V. Three distinct groups of FMF patients were identified. Cluster 1 was characterized by a high prevalence of arthritis, pleuritis, erysipelas-like erythema (ELE) and febrile myalgia. The dosage of colchicine and the frequency of amyloidosis were lower in cluster 1. Patients in cluster 2 had an earlier age of disease onset and diagnosis. M694V carriage and amyloidosis prevalence were the highest in cluster 2. This group of patients was using the highest dose of colchicine. Patients in cluster 3 had the lowest prevalence of arthritis, ELE and febrile myalgia. The frequencies of M694V carriage and amyloidosis were lower in cluster 3 than the overall FMF patients. Non-response to colchicine was also slightly lower in cluster 3. Patients with FMF can be clustered into distinct patterns of clinical and genetic manifestations and these patterns may have different prognostic significance. © The Author 2015. Published by Oxford University Press on behalf of the British Society for Rheumatology. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Delineating Scholarly Types of College and University Faculty Members

ERIC Educational Resources Information Center

Park, Toby J.; Braxton, John M.

2013-01-01

This study was conducted using cluster analysis as well as discriminant analysis to empirically identify types of faculty based on their patterns of performance of scholarship reflective of one or more of Boyer's four domains of scholarship. (Contains 5 tables and 1 figure.)
Patterns of Self-care in Adults With Heart Failure and Their Associations With Sociodemographic and Clinical Characteristics, Quality of Life, and Hospitalizations: A Cluster Analysis.

PubMed

Vellone, Ercole; Fida, Roberta; Ghezzi, Valerio; D'Agostino, Fabio; Biagioli, Valentina; Paturzo, Marco; Strömberg, Anna; Alvaro, Rosaria; Jaarsma, Tiny

Self-care is important in heart failure (HF) treatment, but patients may have difficulties and be inconsistent in its performance. Inconsistencies in self-care behaviors may mirror patterns of self-care in HF patients that are worth identifying to provide interventions tailored to patients. The aims of this study are to identify clusters of HF patients in relation to self-care behaviors and to examine and compare the profile of each HF patient cluster considering the patient's sociodemographics, clinical variables, quality of life, and hospitalizations. This was a secondary analysis of data from a cross-sectional study in which we enrolled 1192 HF patients across Italy. A cluster analysis was used to identify clusters of patients based on the European Heart Failure Self-care Behaviour Scale factor scores. Analysis of variance and χ² test were used to examine the characteristics of each cluster. Patients were 72.4 years old on average, and 58% were men. Four clusters of patients were identified: (1) high consistent adherence with high consulting behaviors, characterized by younger patients, with higher formal education and higher income, less clinically compromised, with the best physical and mental quality of life (QOL) and lowest hospitalization rates; (2) low consistent adherence with low consulting behaviors, characterized mainly by male patients, with lower formal education and lowest income, more clinically compromised, and worse mental QOL; (3) inconsistent adherence with low consulting behaviors, characterized by patients who were less likely to have a caregiver, with the longest illness duration, the highest number of prescribed medications, and the best mental QOL; and (4) inconsistent adherence with high consulting behaviors, characterized by patients who were mostly female, with lower formal education, worst cognitive impairment, worst physical and mental QOL, and higher hospitalization rates.
The 4 clusters identified in this study and their associated characteristics could be used to tailor interventions aimed at improving self-care behaviors in HF patients.

Semantic distance-based creation of clusters of pharmacovigilance terms and their evaluation.

PubMed

Dupuch, Marie; Grabar, Natalia

2015-04-01

Pharmacovigilance is the activity related to the collection, analysis and prevention of adverse drug reactions (ADRs) induced by drugs or biologics. The detection of adverse drug reactions is performed using statistical algorithms and groupings of ADR terms from the MedDRA (Medical Dictionary for Drug Regulatory Activities) terminology. Standardized MedDRA Queries (SMQs) are the groupings which have become a standard for assisting the retrieval and evaluation of MedDRA-coded ADR reports worldwide. Currently 84 SMQs have been created, while several important safety topics are not yet covered. Creation of SMQs is a long and tedious process performed by experts. It relies on manual analysis of MedDRA in order to find all the relevant terms to be included in an SMQ. Our objective is to propose an automatic method for assisting the creation of SMQs using the clustering of terms which are semantically similar. The experimental method relies on a specific semantic resource, and also on the semantic distance algorithms and clustering approaches. We perform several experiments in order to define the optimal parameters. Our results show that the proposed method can assist the creation of SMQs and make this process faster and systematic. The average performance of the method is precision 59% and recall 26%. The correlation of the results obtained is 0.72 against the medical doctors' judgments and 0.78 against the medical coders' judgments. These results and additional evaluation indicate that the generated clusters can be efficiently used for the detection of pharmacovigilance signals, as they provide better signal detection than the existing SMQs. Copyright © 2014. Published by Elsevier Inc.
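The precision and recall figures quoted above compare a generated cluster of terms with a reference grouping. A minimal sketch of that comparison follows, assuming the usual set-based definitions; the term labels are placeholders, not MedDRA content, and the function name `precision_recall` is hypothetical.

```python
def precision_recall(generated, reference):
    """Set-based precision and recall of a generated cluster against a reference grouping.

    precision = shared terms / generated terms
    recall    = shared terms / reference terms
    """
    generated, reference = set(generated), set(reference)
    shared = generated & reference
    precision = len(shared) / len(generated) if generated else 0.0
    recall = len(shared) / len(reference) if reference else 0.0
    return precision, recall


# Placeholder labels standing in for ADR terms (purely illustrative).
generated_cluster = {"term_a", "term_b", "term_c", "term_x"}
reference_grouping = {"term_a", "term_b", "term_c", "term_d", "term_e", "term_f"}

precision, recall = precision_recall(generated_cluster, reference_grouping)
print(precision, recall)
```

In this toy case three of the four generated terms are correct, while only half of the reference grouping is recovered, which mirrors the pattern of higher precision than recall reported in the abstract.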
Using preoperative unsupervised cluster analysis of chronic rhinosinusitis to inform patient decision and endoscopic sinus surgery outcome.

PubMed

Adnane, Choaib; Adouly, Taoufik; Khallouk, Amine; Rouadi, Sami; Abada, Redallah; Roubal, Mohamed; Mahtar, Mohamed

2017-02-01

The purpose of this study is to use unsupervised cluster methodology to identify phenotype and mucosal eosinophilia endotype subgroups of patients with medical refractory chronic rhinosinusitis (CRS), and evaluate the difference in quality of life (QOL) outcomes after endoscopic sinus surgery (ESS) between these clusters for better surgical case selection. A prospective cohort study included 131 patients with medical refractory CRS who elected ESS. The Sino-Nasal Outcome Test (SNOT-22) was used to evaluate QOL before and 12 months after surgery. An unsupervised two-step clustering method was performed. One hundred and thirteen subjects were retained in this study: 46 patients with CRS without nasal polyps and 67 patients with nasal polyps. Nasal polyps, gender, mucosal eosinophilia profile, and prior sinus surgery were the most discriminating factors in the generated clusters. Three clusters were identified. A significant clinical improvement was observed in all clusters 12 months after surgery with a reduction of SNOT-22 scores. There was a significant difference in QOL outcomes between clusters; cluster 1 had the worst QOL improvement after ESS in comparison with clusters 2 and 3. All patients in cluster 1 presented with CRS with nasal polyps (CRSwNP) and the highest mucosal eosinophilia endotype. The clustering method is able to classify CRS phenotypes and endotypes with different associated surgical outcomes.
Acute Physiological and Thermoregulatory Responses to Extended Interval Training in Endurance Runners: Influence of Athletic Performance and Age

PubMed Central

García-Pinillos, Felipe; Soto-Hermoso, Víctor Manuel; Latorre-Román, Pedro Ángel

2015-01-01

This study aimed to describe the acute impact of extended interval training (EIT) on physiological and thermoregulatory levels, as well as to determine the influence of athletic performance and age effect on the aforementioned response in endurance runners. Thirty-one experienced recreational male endurance runners voluntarily participated in this study. Subjects performed EIT on an outdoor running track, which consisted of 12 runs of 400 m. The rate of perceived exertion, physiological response through the peak and recovery heart rate, blood lactate, and thermoregulatory response through tympanic temperature, were controlled. A repeated measures analysis revealed significant differences throughout EIT in examined variables. Cluster analysis based on the average performance in 400 m runs distinguished between athletes with a higher and a lower sports level. Cluster analysis was also performed according to age, obtaining an older group and a younger group. The one-way analysis of variance between groups revealed no significant differences (p≥0.05) in the response to EIT. The results provide a detailed description of physiological and thermoregulatory responses to EIT in experienced endurance runners. This allows a better understanding of the impact of a common training stimulus on the physiological level inducing greater accuracy in the training prescription. Moreover, despite the differences in athletic performance or age, the acute physiological and thermoregulatory responses in endurance runners were similar, as long as EIT was performed at similar relative intensity. PMID:26839621
Recommendations for choosing an analysis method that controls Type I error for unbalanced cluster sample designs with Gaussian outcomes.

PubMed

Johnson, Jacqueline L; Kreidler, Sarah M; Catellier, Diane J; Murray, David M; Muller, Keith E; Glueck, Deborah H

2015-11-30

We used theoretical and simulation-based approaches to study Type I error rates for one-stage and two-stage analytic methods for cluster-randomized designs. The one-stage approach uses the observed data as outcomes and accounts for within-cluster correlation using a general linear mixed model. The two-stage model uses the cluster specific means as the outcomes in a general linear univariate model. We demonstrate analytically that both one-stage and two-stage models achieve exact Type I error rates when cluster sizes are equal. With unbalanced data, an exact size α test does not exist, and Type I error inflation may occur. Via simulation, we compare the Type I error rates for four one-stage and six two-stage hypothesis testing approaches for unbalanced data. With unbalanced data, the two-stage model, weighted by the inverse of the estimated theoretical variance of the cluster means, and with variance constrained to be positive, provided the best Type I error control for studies having at least six clusters per arm. The one-stage model with Kenward-Roger degrees of freedom and unconstrained variance performed well for studies having at least 14 clusters per arm. The popular analytic method of using a one-stage model with denominator degrees of freedom appropriate for balanced data performed poorly for small sample sizes and low intracluster correlation. Because small sample sizes and low intracluster correlation are common features of cluster-randomized trials, the Kenward-Roger method is the preferred one-stage approach. Copyright © 2015 John Wiley & Sons, Ltd.
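The two-stage idea described above (reduce each cluster to its mean, then analyse the means) can be sketched as follows. This is the simplest unweighted variant with a pooled-variance t statistic, not the inverse-variance-weighted version the abstract recommends for unbalanced data; the data and the names `cluster_means` and `two_stage_t` are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, variance


def cluster_means(clusters):
    """Stage 1: collapse each cluster's observations to a single mean."""
    return [mean(observations) for observations in clusters]


def two_stage_t(arm_a, arm_b):
    """Stage 2: pooled-variance two-sample t statistic computed on cluster means.

    Returns the t statistic and its degrees of freedom (clusters minus 2).
    Unweighted: every cluster counts equally regardless of its size.
    """
    a, b = cluster_means(arm_a), cluster_means(arm_b)
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical unbalanced trial: three clusters per arm with unequal sizes.
control = [[1, 2, 3], [2, 4], [3, 3, 3, 3]]
treated = [[5, 7], [6, 6, 6], [8]]

t_stat, df = two_stage_t(control, treated)
print(round(t_stat, 3), df)
```

Because the unit of analysis is the cluster, the degrees of freedom depend on the number of clusters per arm rather than the number of individuals, which is why the abstract frames its recommendations in terms of clusters per arm.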
Phrase Mining of Textual Data to Analyze Extracellular Matrix Protein Patterns Across Cardiovascular Disease.

PubMed

Liem, David Alexandre; Murali, Sanjana; Sigdel, Dibakar; Shi, Yu; Wang, Xuan; Shen, Jiaming; Choi, Howard; Caufield, J Harry; Wang, Wei; Ping, Peipei; Han, Jiawei

2018-05-18

Extracellular matrix (ECM) proteins have been shown to play important roles regulating multiple biological processes in an array of organ systems, including the cardiovascular system. By using a novel bioinformatics text-mining tool, we studied six categories of cardiovascular disease (CVD), namely ischemic heart disease (IHD), cardiomyopathies (CM), cerebrovascular accident (CVA), congenital heart disease (CHD), arrhythmias (ARR), and valve disease (VD), anticipating novel ECM protein-disease and protein-protein relationships hidden within vast quantities of textual data. We conducted a phrase-mining analysis, delineating the relationships of 709 ECM proteins with the six groups of CVDs reported in 1,099,254 abstracts. The technology pipeline known as Context-aware Semantic Online Analytical Processing (CaseOLAP) was applied to semantically rank the association of proteins to each and all six CVDs, performing analyses to quantify each protein-disease relationship. We performed principal component analysis and hierarchical clustering of the data, where each protein is visualized as a six dimensional vector. We found that ECM proteins display variable degrees of association with the six CVDs; certain CVDs share groups of associated proteins whereas others have divergent protein associations. We identified 82 ECM proteins sharing associations with all six CVDs. Our bioinformatics analysis ascribed distinct ECM pathways (via Reactome) from this subset of proteins, namely insulin-like growth factor regulation and interleukin-4 and interleukin-13 signaling, suggesting their contribution to the pathogenesis of all six CVDs. Finally, we performed hierarchical clustering analysis and identified protein clusters associated with a targeted CVD; analyses revealed unexpected insights underlying ECM-pathogenesis of CVDs.
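Hierarchical clustering of six-dimensional vectors, as described in the record above, can be sketched with a small agglomerative routine. The scores below are invented, average linkage with Euclidean distance is an assumed choice (the abstract does not state the linkage or metric), and `average_linkage` is a hypothetical helper rather than part of CaseOLAP.

```python
from math import dist


def average_linkage(points, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters with the
    smallest average pairwise Euclidean distance until n_clusters remain.

    Returns clusters as sorted lists of indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]

    def gap(a, b):
        # Average distance over all cross-cluster pairs.
        return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        x, y = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: gap(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[x] += clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Hypothetical association scores: one row per protein, one column per CVD category.
scores = [
    (0.9, 0.8, 0.1, 0.0, 0.1, 0.0),
    (0.8, 0.9, 0.0, 0.1, 0.0, 0.1),
    (0.1, 0.0, 0.9, 0.8, 0.1, 0.0),
    (0.0, 0.1, 0.8, 0.9, 0.0, 0.1),
    (0.1, 0.1, 0.1, 0.0, 0.9, 0.9),
]

groups = average_linkage(scores, n_clusters=3)
print(groups)
```

Proteins with similar association profiles across the six categories end up in the same group, while the protein with a distinct profile stays on its own.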
Optical spectroscopy and velocity dispersions of galaxy clusters from the SPT-SZ survey

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ruel, J.; Bayliss, M.; Bazin, G.

2014-09-01

We present optical spectroscopy of galaxies in clusters detected through the Sunyaev-Zel'dovich (SZ) effect with the South Pole Telescope (SPT). We report our own measurements of 61 spectroscopic cluster redshifts, and 48 velocity dispersions each calculated with more than 15 member galaxies. This catalog also includes 19 dispersions of SPT-observed clusters previously reported in the literature. The majority of the clusters in this paper are SPT-discovered; of these, most have been previously reported in other SPT cluster catalogs, and five are reported here as SPT discoveries for the first time. By performing a resampling analysis of galaxy velocities, we find that unbiased velocity dispersions can be obtained from a relatively small number of member galaxies (≲ 30), but with increased systematic scatter. We use this analysis to determine statistical confidence intervals that include the effect of membership selection. We fit scaling relations between the observed cluster velocity dispersions and mass estimates from SZ and X-ray observables. In both cases, the results are consistent with the scaling relation between velocity dispersion and mass expected from dark-matter simulations. We measure a ∼30% log-normal scatter in dispersion at fixed mass, and a ∼10% offset in the normalization of the dispersion-mass relation when compared to the expectation from simulations, which is within the expected level of systematic uncertainty.
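A resampling analysis of the kind mentioned above can be sketched with a simple bootstrap. The abstract does not specify the resampling scheme or the dispersion estimator, so the sketch assumes resampling with replacement and the sample standard deviation; the velocities are invented and `bootstrap_dispersion` is a hypothetical name.

```python
import random
from statistics import stdev


def bootstrap_dispersion(velocities, n_resamples=2000, level=0.68, seed=0):
    """Bootstrap interval for a velocity dispersion.

    Assumption: the dispersion is estimated by the sample standard deviation.
    Returns (point estimate, lower bound, upper bound) at the given level.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(velocities)
    # Resample the member velocities with replacement and re-estimate each time.
    estimates = sorted(
        stdev(rng.choices(velocities, k=n)) for _ in range(n_resamples)
    )
    lower = estimates[int((1 - level) / 2 * n_resamples)]
    upper = estimates[int((1 + level) / 2 * n_resamples) - 1]
    return stdev(velocities), lower, upper


# Hypothetical line-of-sight velocities (km/s) relative to the cluster mean.
velocities = [-1450, -980, -720, -510, -300, -120, -40, 60,
              210, 380, 540, 690, 870, 1100, 1520]

sigma, low, high = bootstrap_dispersion(velocities)
print(round(sigma), round(low), round(high))
```

With only 15 members the interval is wide, which illustrates the abstract's point that small samples can give usable dispersions at the cost of increased scatter.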
Cluster Analysis Identifies 3 Phenotypes within Allergic Asthma.

PubMed

Sendín-Hernández, María Paz; Ávila-Zarza, Carmelo; Sanz, Catalina; García-Sánchez, Asunción; Marcos-Vadillo, Elena; Muñoz-Bellido, Francisco J; Laffond, Elena; Domingo, Christian; Isidoro-García, María; Dávila, Ignacio

Asthma is a heterogeneous chronic disease with different clinical expressions and responses to treatment. In recent years, several unbiased approaches based on clinical, physiological, and molecular features have described several phenotypes of asthma. Some phenotypes are allergic, but little is known about whether these phenotypes can be further subdivided. We aimed to phenotype patients with allergic asthma using an unbiased approach based on multivariate classification techniques (unsupervised hierarchical cluster analysis). From a total of 54 variables of 225 patients with well-characterized allergic asthma diagnosed following American Thoracic Society (ATS) recommendation, positive skin prick test to aeroallergens, and concordant symptoms, we finally selected 19 variables by multiple correspondence analyses. Then a cluster analysis was performed. Three groups were identified. Cluster 1 was constituted by patients with intermittent or mild persistent asthma, without family antecedents of atopy, asthma, or rhinitis. This group showed the lowest total IgE levels. Cluster 2 was constituted by patients with mild asthma with a family history of atopy, asthma, or rhinitis. Total IgE levels were intermediate. Cluster 3 included patients with moderate or severe persistent asthma that needed treatment with corticosteroids and long-acting β-agonists. This group showed the highest total IgE levels. We identified 3 phenotypes of allergic asthma in our population. Furthermore, we described 2 phenotypes of mild atopic asthma mainly differentiated by a family history of allergy. Copyright © 2017 American Academy of Allergy, Asthma & Immunology. Published by Elsevier Inc. All rights reserved.
Molecular reclassification of Crohn's disease: a cautionary note on population stratification.

PubMed

Maus, Bärbel; Jung, Camille; Mahachie John, Jestinah M; Hugot, Jean-Pierre; Génin, Emmanuelle; Van Steen, Kristel

2013-01-01

Complex human diseases commonly differ in their phenotypic characteristics, e.g., Crohn's disease (CD) patients are heterogeneous with regard to disease location and disease extent. The genetic susceptibility to Crohn's disease is widely acknowledged and has been demonstrated by identification of over 100 CD associated genetic loci. However, relating CD subphenotypes to disease susceptible loci has proven to be a difficult task. In this paper we discuss the use of cluster analysis on genetic markers to identify genetic-based subgroups while taking into account possible confounding by population stratification. We show that it is highly relevant to consider the confounding nature of population stratification in order to avoid that detected clusters are strongly related to population groups instead of disease-specific groups. Therefore, we explain the use of principal components to correct for population stratification while clustering affected individuals into genetic-based subgroups. The principal components are obtained using 30 ancestry informative markers (AIM), and the first two PCs are determined to discriminate between continental origins of the affected individuals. Genotypes on 51 CD associated single nucleotide polymorphisms (SNPs) are used to perform latent class analysis, hierarchical and Partitioning Around Medoids (PAM) cluster analysis within a sample of affected individuals with and without the use of principal components to adjust for population stratification. It is seen that without correction for population stratification clusters seem to be influenced by population stratification while with correction clusters are unrelated to continental origin of individuals.
Molecular Reclassification of Crohn’s Disease: A Cautionary Note on Population Stratification

PubMed Central

Maus, Bärbel; Jung, Camille; Mahachie John, Jestinah M.; Hugot, Jean-Pierre; Génin, Emmanuelle; Van Steen, Kristel

2013-01-01

Complex human diseases commonly differ in their phenotypic characteristics, e.g., Crohn’s disease (CD) patients are heterogeneous with regard to disease location and disease extent. The genetic susceptibility to Crohn’s disease is widely acknowledged and has been demonstrated by identification of over 100 CD associated genetic loci. However, relating CD subphenotypes to disease susceptible loci has proven to be a difficult task. In this paper we discuss the use of cluster analysis on genetic markers to identify genetic-based subgroups while taking into account possible confounding by population stratification. We show that it is highly relevant to consider the confounding nature of population stratification in order to avoid that detected clusters are strongly related to population groups instead of disease-specific groups. Therefore, we explain the use of principal components to correct for population stratification while clustering affected individuals into genetic-based subgroups. The principal components are obtained using 30 ancestry informative markers (AIM), and the first two PCs are determined to discriminate between continental origins of the affected individuals. Genotypes on 51 CD associated single nucleotide polymorphisms (SNPs) are used to perform latent class analysis, hierarchical and Partitioning Around Medoids (PAM) cluster analysis within a sample of affected individuals with and without the use of principal components to adjust for population stratification. It is seen that without correction for population stratification clusters seem to be influenced by population stratification while with correction clusters are unrelated to continental origin of individuals. PMID:24147066
Quantum chemical calculations in the structural analysis of phloretin

NASA Astrophysics Data System (ADS)

Gómez-Zavaglia, Andrea

2009-07-01

In this work, a conformational search on the molecule of phloretin [2',4',6'-Trihydroxy-3-(4-hydroxyphenyl)-propiophenone] has been performed. The molecule of phloretin has eight dihedral angles, four of them taking part in the carbon backbone and the other four related to the orientation of the hydroxyl groups. A systematic search involving a random variation of the dihedral angles has been used to generate input structures for the quantum chemical calculations. Calculations at the DFT(B3LYP)/6-311++G(d,p) level of theory permitted the identification of 58 local minima belonging to the C1 symmetry point group. The molecular structures of the conformers have been analyzed using hierarchical cluster analysis. This method allowed us to group conformers according to their similarities, and thus, to correlate the conformers' stability with structural parameters. The dendrogram obtained from the hierarchical cluster analysis depicted two main clusters. Cluster I included all the conformers with relative energies lower than 25 kJ mol⁻¹ and cluster II, the remaining conformers. The possibility of forming intramolecular hydrogen bonds proved to be the main factor contributing to stability. Accordingly, all conformers depicting intramolecular H-bonds belong to cluster I. These conformations are clearly favored when the carbon backbone is as planar as possible. The values of the νC=O and νOH vibrational modes were compared among all the conformers of phloretin. The redshifts associated with intramolecular H-bonds were correlated with the H-bond distances and energies.
Multiple imputation methods for bivariate outcomes in cluster randomised trials.

PubMed

DiazOrdaz, K; Kenward, M G; Gomes, M; Grieve, R

2016-09-10

Missing observations are common in cluster randomised trials. The problem is exacerbated when modelling bivariate outcomes jointly, as the proportion of complete cases is often considerably smaller than the proportion having either of the outcomes fully observed. Approaches taken to handling such missing data include the following: complete case analysis, single-level multiple imputation that ignores the clustering, multiple imputation with a fixed effect for each cluster and multilevel multiple imputation. We contrasted the alternative approaches to handling missing data in a cost-effectiveness analysis that uses data from a cluster randomised trial to evaluate an exercise intervention for care home residents. We then conducted a simulation study to assess the performance of these approaches on bivariate continuous outcomes, in terms of confidence interval coverage and empirical bias in the estimated treatment effects. Missing-at-random clustered data scenarios were simulated following a full-factorial design. Across all the missing data mechanisms considered, the multiple imputation methods provided estimators with negligible bias, while complete case analysis resulted in biased treatment effect estimates in scenarios where the randomised treatment arm was associated with missingness. Confidence interval coverage was generally in excess of nominal levels (up to 99.8%) following fixed-effects multiple imputation and too low following single-level multiple imputation. Multilevel multiple imputation led to coverage levels of approximately 95% throughout. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
Strong-lensing analysis of A2744 with MUSE and Hubble Frontier Fields images

NASA Astrophysics Data System (ADS)

Mahler, G.; Richard, J.; Clément, B.; Lagattuta, D.; Schmidt, K.; Patrício, V.; Soucail, G.; Bacon, R.; Pello, R.; Bouwens, R.; Maseda, M.; Martinez, J.; Carollo, M.; Inami, H.; Leclercq, F.; Wisotzki, L.

2018-01-01

We present an analysis of Multi Unit Spectroscopic Explorer (MUSE) observations obtained on the massive Frontier Fields (FFs) cluster A2744. This new data set covers the entire multiply imaged region around the cluster core. The combined catalogue consists of 514 spectroscopic redshifts (with 414 new identifications). We use this redshift information to perform a strong-lensing analysis revising multiple images previously found in the deep FF images, and add three new MUSE-detected multiply imaged systems with no obvious Hubble Space Telescope counterpart. The combined strong-lensing constraints include a total of 60 systems producing 188 images altogether, out of which 29 systems and 83 images are spectroscopically confirmed, making A2744 one of the most well-constrained clusters to date. Thanks to the large number of spectroscopic redshifts, we model the influence of substructures at larger radii, using a parametrization including two cluster-scale components in the cluster core and several group-scale components in the outskirts. The resulting model accurately reproduces all the spectroscopic multiple systems, reaching an rms of 0.67 arcsec in the image plane. The large number of MUSE spectroscopic redshifts gives us a robust model, which we estimate reduces the systematic uncertainty on the 2D mass distribution by up to ∼2.5 times the statistical uncertainty in the cluster core. In addition, from a combination of the parametrization and the set of constraints, we estimate the relative systematic uncertainty to be up to 9 per cent at 200 kpc.
Analysis of multiplex gene expression maps obtained by voxelation.

PubMed

An, Li; Xie, Hongbo; Chin, Mark H; Obradovic, Zoran; Smith, Desmond J; Megalooikonomou, Vasileios

2009-04-29

Gene expression signatures in the mammalian brain hold the key to understanding neural development and neurological disease. Researchers have previously used voxelation in combination with microarrays for acquisition of genome-wide atlases of expression patterns in the mouse brain. On the other hand, some work has been performed on studying gene functions, without taking into account the location information of a gene's expression in a mouse brain. In this paper, we present an approach for identifying the relation between gene expression maps obtained by voxelation and gene functions. To analyze the dataset, we chose typical genes as queries and aimed at discovering similar gene groups. Gene similarity was determined by using the wavelet features extracted from the left and right hemispheres averaged gene expression maps, and by the Euclidean distance between each pair of feature vectors. We also performed a multiple clustering approach on the gene expression maps, combined with hierarchical clustering. Among each group of similar genes and clusters, the gene function similarity was measured by calculating the average gene function distances in the gene ontology structure. By applying our methodology to find similar genes to certain target genes we were able to improve our understanding of gene expression patterns and gene functions. By applying the clustering analysis method, we obtained significant clusters, which have both very similar gene expression maps and very similar gene functions respectively to their corresponding gene ontologies. The cellular component ontology resulted in prominent clusters expressed in cortex and corpus callosum. The molecular function ontology gave prominent clusters in cortex, corpus callosum and hypothalamus. The biological process ontology resulted in clusters in cortex, hypothalamus and choroid plexus. Clusters from all three ontologies combined were most prominently expressed in cortex and corpus callosum. 
The experimental results confirm the hypothesis that genes with similar gene expression maps might have similar gene functions. The voxelation data takes into account the location information of gene expression level in mouse brain, which is novel in related research. The proposed approach can potentially be used to predict gene functions and provide helpful suggestions to biologists.
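The similarity query described in this abstract (wavelet features compared by Euclidean distance) can be illustrated with a minimal sketch. It assumes a single-level Haar transform as the wavelet, since the abstract does not name one, and uses tiny one-dimensional stand-ins for the hemisphere-averaged expression maps. The gene names and values are hypothetical.

```python
import math

def haar_level(signal):
    """One level of the Haar transform: pairwise averages, then pairwise half-differences."""
    avgs = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    diffs = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return avgs + diffs

def euclidean(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def rank_similar(query, maps):
    """Return the other genes ordered from most to least similar to the query gene."""
    q = haar_level(maps[query])
    scored = [(euclidean(q, haar_level(m)), name)
              for name, m in maps.items() if name != query]
    return [name for _, name in sorted(scored)]

# Hypothetical expression maps (an even number of voxels per gene).
maps = {
    "geneA": [4, 2, 6, 8],
    "geneB": [4, 2, 6, 6],
    "geneC": [0, 8, 1, 1],
}

print(rank_similar("geneA", maps))
```

A real analysis would use a multi-level two-dimensional transform and many more voxels; the sketch only shows how a query gene is ranked against the rest.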
Impact of missing data imputation methods on gene expression clustering and classification.

PubMed

de Souto, Marcilio C P; Jaskowiak, Pablo A; Costa, Ivan G

2015-02-26

Several missing value imputation methods for gene expression data have been proposed in the literature. In the past few years, researchers have been putting a great deal of effort into presenting systematic evaluations of the different imputation algorithms. Initially, most algorithms were assessed with an emphasis on the accuracy of the imputation, using metrics such as the root mean squared error. However, it has become clear that the success of the estimation of the expression value should be evaluated in more practical terms as well. One can consider, for example, the ability of the method to preserve the significant genes in the dataset, or its discriminative/predictive power for classification/clustering purposes. We performed a broad analysis of the impact of five well-known missing value imputation methods on three clustering and four classification methods, in the context of 12 cancer gene expression datasets. We employed a statistical framework, for the first time in this field, to assess whether different imputation methods improve the performance of the clustering/classification methods. Our results suggest that the imputation methods evaluated have a minor impact on the classification and downstream clustering analyses. Simple methods such as replacing the missing values by mean or the median values performed as well as more complex strategies. The datasets analyzed in this study are available at http://costalab.org/Imputation/ .
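The simple strategies mentioned in this abstract (replacing missing values by the mean or the median) can be sketched as follows. This is an illustration only: it assumes imputation is done per gene (per row) and that missing entries are marked with `None`, neither of which the abstract specifies. The expression values are hypothetical.

```python
from statistics import mean, median

def impute(matrix, strategy=mean):
    """Fill None entries in each row with that row's mean (or median) of observed values."""
    filled = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        # Assumption: a row with no observed values is filled with 0.0.
        fill = strategy(observed) if observed else 0.0
        filled.append([fill if v is None else v for v in row])
    return filled

# Hypothetical gene-by-sample expression matrix with missing entries.
expr = [
    [2.0, None, 4.0, 9.0],
    [1.0, 1.5, None, 2.0],
]

print(impute(expr))            # mean imputation
print(impute(expr, median))    # median imputation
```

The completed matrix can then be passed to any clustering or classification method, which is the downstream comparison the study reports.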
VizieR Online Data Catalog: Hogg 16 peculiar stars (Cariddi+, 2018)

NASA Astrophysics Data System (ADS)

Cariddi, S.; Azatyan, N. M.; Kurfurst, P.; Stofanova, L.; Netopil, M.; Paunzen, E.; Pintado, O. I.; Aidelman, Y. J.

2017-07-01

The photometric observations of Hogg 16 were performed on 2004 June 15, with the EFOSC2 instrument, installed on the 3.6m telescope at ESO - La Silla within the program 073.C-0144(A), and the target field was centred on the main concentration of stars in the cluster area (J2000 RA=13:29:18, DE=-61:12:00). The field-of-view is about 5.2'x5.2', and the 2x2 binning mode results in a resolution of 0.31"/pixel. Thus, we cover almost the complete cluster area if adopting a diameter of 6' as listed in the updated open cluster catalogue by Dias et al. (2002, version 3.5, Cat. B/ocl). We used a Δa filter set with the following characteristics: g1 (λc=5007Å, FWHM=126Å, TP=78%), g2 (5199, 95, 68), and y (5466, 108, 70). We have investigated 150 stars in the area of the young open cluster Hogg 16 using the Delta-a photometric system. We have performed a membership analysis and identified several chemically peculiar cluster stars. (1 data file).
Spectral reflectance of surface soils - A statistical analysis

NASA Technical Reports Server (NTRS)

Crouse, K. R.; Henninger, D. L.; Thompson, D. R.

1983-01-01

The relationship of the physical and chemical properties of soils to their spectral reflectance as measured at six wavebands of Thematic Mapper (TM) aboard NASA's Landsat-4 satellite was examined. The results of performing regressions of over 20 soil properties on the six TM bands indicated that organic matter, water, clay, cation exchange capacity, and calcium were the properties most readily predicted from TM data. The middle infrared bands, bands 5 and 7, were the best bands for predicting soil properties, and the near infrared band, band 4, was nearly as good. Clustering 234 soil samples on the TM bands and characterizing the clusters on the basis of soil properties revealed several clear relationships between properties and reflectance. Discriminant analysis found organic matter, fine sand, base saturation, sand, extractable acidity, and water to be significant in discriminating among clusters.
Statistical analysis and handling of missing data in cluster randomized trials: a systematic review.

PubMed

Fiero, Mallorie H; Huang, Shuang; Oren, Eyal; Bell, Melanie L

2016-02-09

Cluster randomized trials (CRTs) randomize participants in groups, rather than as individuals and are key tools used to assess interventions in health research where treatment contamination is likely or if individual randomization is not feasible. Two potential major pitfalls exist regarding CRTs, namely handling missing data and not accounting for clustering in the primary analysis. The aim of this review was to evaluate approaches for handling missing data and statistical analysis with respect to the primary outcome in CRTs. We systematically searched for CRTs published between August 2013 and July 2014 using PubMed, Web of Science, and PsycINFO. For each trial, two independent reviewers assessed the extent of the missing data and method(s) used for handling missing data in the primary and sensitivity analyses. We evaluated the primary analysis and determined whether it was at the cluster or individual level. Of the 86 included CRTs, 80 (93%) trials reported some missing outcome data. Of those reporting missing data, the median percent of individuals with a missing outcome was 19% (range 0.5 to 90%). The most common way to handle missing data in the primary analysis was complete case analysis (44, 55%), whereas 18 (22%) used mixed models, six (8%) used single imputation, four (5%) used unweighted generalized estimating equations, and two (2%) used multiple imputation. Fourteen (16%) trials reported a sensitivity analysis for missing data, but most assumed the same missing data mechanism as in the primary analysis. Overall, 67 (78%) trials accounted for clustering in the primary analysis. High rates of missing outcome data are present in the majority of CRTs, yet handling missing data in practice remains suboptimal. Researchers and applied statisticians should carry out appropriate missing data methods, which are valid under plausible assumptions in order to increase statistical power in trials and reduce the possibility of bias. 
Sensitivity analysis should be performed, with weakened assumptions regarding the missing data mechanism to explore the robustness of results reported in the primary analysis.
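The percentages in this abstract use two different denominators, which is easy to miss. The check below is a reader's sketch, not part of the review. It assumes that the handling-method percentages are taken out of the 80 trials reporting missing data and the remaining ones out of all 86 trials, and it tests whether each stated whole-number percentage is within rounding of the corresponding count.

```python
TOTAL_TRIALS = 86    # included CRTs
WITH_MISSING = 80    # trials reporting some missing outcome data

# (label, count, assumed denominator, percentage stated in the abstract)
reported = [
    ("reported missing outcome data", 80, TOTAL_TRIALS, 93),
    ("complete case analysis", 44, WITH_MISSING, 55),
    ("mixed models", 18, WITH_MISSING, 22),
    ("single imputation", 6, WITH_MISSING, 8),
    ("unweighted GEE", 4, WITH_MISSING, 5),
    ("multiple imputation", 2, WITH_MISSING, 2),
    ("sensitivity analysis reported", 14, TOTAL_TRIALS, 16),
    ("accounted for clustering", 67, TOTAL_TRIALS, 78),
]

def percent(count, denominator):
    return 100 * count / denominator

def consistent(count, denominator, stated, tol=0.5):
    """True when the stated percentage is within half a point of count/denominator."""
    return abs(percent(count, denominator) - stated) <= tol

for label, count, denom, stated in reported:
    print(f"{label}: {count}/{denom} = {percent(count, denom):.1f}% "
          f"(stated {stated}%) consistent={consistent(count, denom, stated)}")
```

Under these assumed denominators every stated figure is consistent, whereas 44 out of 86 would give about 51%, not 55%.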
The use of cluster analysis for plant grouping by their tolerance to soil contamination with hydrocarbons at the germination stage.

PubMed

Potashev, Konstantin; Sharonova, Natalia; Breus, Irina

2014-07-01

Clustering was employed for the analysis of an experimental data set (42 plants in total) on seed germination in leached chernozem contaminated with kerosene. Among the investigated plants were 31 cultivated plants from 11 families (27 species and 20 varieties) and 11 wild plant species from 7 families, 23 annual and 19 perennial/biannual plant species, 11 monocotyledonous and 31 dicotyledonous plants. A two-dimensional (two-parameter) clustering approach, allowing the estimation of tolerance of germinating seeds using a pair of independent parameters (C75%, V7%), was found to be most effective. These parameters characterized the ability of seeds both to withstand high concentrations of contaminants without a significant reduction in germination, and to maintain a high germination rate within certain contaminant concentrations. The performed clustering revealed a number of plant features, which define the relation of a particular plant to a particular tolerance cluster; it has also demonstrated the possibility of generalizing the kerosene results for n-tridecane, which is one of the typical kerosene components. In contrast to the "manual" plant ranking based on the assessment of germination at discrete concentrations of the contaminant, the proposed clustering approach allowed a generalized characterization of the seed tolerance/sensitivity to hydrocarbon contaminants. Copyright © 2014 Elsevier B.V. All rights reserved.
Cluster formation and drag reduction-proposed mechanism of particle recirculation within the partition column of the bottom spray fluid-bed coater.

PubMed

Wang, Li Kun; Heng, Paul Wan Sia; Liew, Celine Valeria

2015-04-01

Bottom spray fluid-bed coating is a common technique for coating multiparticulates. Under the quality-by-design framework, particle recirculation within the partition column is one of the main variability sources affecting particle coating and coat uniformity. However, the occurrence and mechanism of particle recirculation within the partition column of the coater are not well understood. The purpose of this study was to visualize and define particle recirculation within the partition column. Based on different combinations of partition gap setting, air accelerator insert diameter, and particle size fraction, particle movements within the partition column were captured using a high-speed video camera. The particle recirculation probability and voidage information were mapped using a visiometric process analyzer. High-speed images showed that particles contributing to the recirculation phenomenon were behaving as clustered colonies. Fluid dynamics analysis indicated that particle recirculation within the partition column may be attributed to the combined effect of cluster formation and drag reduction. Both visiometric process analysis and particle coating experiments showed that smaller particles had greater propensity toward cluster formation than larger particles. The influence of cluster formation on coating performance and possible solutions to cluster formation were further discussed. © 2014 Wiley Periodicals, Inc. and the American Pharmacists Association.
Typology of schizotypy in non-clinical young adults: Psychopathological and personality disorder traits correlates.

PubMed

Raynal, Patrick; Goutaudier, Nelly; Nidetch, Victoria; Chabrol, Henri

2016-12-30

Few typological studies address schizotypy in young adults. Schizotypal traits were assessed on 466 college students using the Schizotypal Personality Questionnaire-Brief (SPQ-B). Other measures evaluated personality traits previously associated with schizotypy (borderline, obsessional, and autistic traits), psychopathological symptoms (suicidal ideations, depressive and obsessive-compulsive symptoms) and psychosocial functioning. A factor analysis was first performed on SPQ-B results, leading to four factors: negative schizotypy, positive schizotypy, social anxiety, and reference ideas. Based on these factors, a cluster analysis was conducted, which yielded four clearly distinct groups characterized by "Low" (non schizotypy), "High schizotypy" (mixed positive and negative), "Positive schizotypy", and "Social impairment". Regarding personality disorder traits and psychopathological symptoms, the "High schizotypy" cluster scored higher than the "Positive" and the "Social impairment" groups, which scored higher than the "Low" cluster. The "Positive" group had higher levels of interpersonal relationships than the "High" and the "Social impairment" clusters, suggesting that positive schizotypy was associated with benefits such as perceived social relationships. Nevertheless the "Positive" cluster was also linked to high levels of personality disorder traits and psychopathological symptoms, and to low academic achievement, at levels similar to those observed in the "Social impairment" cluster, confirming an unhealthy side to positive schizotypy. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

High Performance Computer Cluster for Theoretical Studies of Roaming in Chemical Reactions

DTIC Science & Technology

2016-08-30

High-performance Computer Cluster for Theoretical Studies of Roaming in Chemical Reactions A dedicated high-performance computer cluster was...SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS (ES) U.S. Army Research Office P.O. Box 12211 Research Triangle Park, NC 27709-2211 Computer cluster ...peer-reviewed journals: Final Report: High-performance Computer Cluster for Theoretical Studies of Roaming in Chemical Reactions Report Title A dedicated
Rapid Disaster Damage Estimation

NASA Astrophysics Data System (ADS)

Vu, T. T.

2012-07-01

The experiences from recent disaster events showed that detailed information derived from high-resolution satellite images could accommodate the requirements from damage analysts and disaster management practitioners. Richer information contained in such high-resolution images, however, increases the complexity of image analysis. As a result, few image analysis solutions can be practically used under time pressure in the context of post-disaster and emergency responses. To fill the gap in employment of remote sensing in disaster response, this research develops a rapid high-resolution satellite mapping solution built upon a dual-scale contextual framework to support damage estimation after a catastrophe. The target objects are buildings (or building blocks) and their condition. On the coarse processing level, statistical region merging is deployed to group pixels into a number of coarse clusters. Based on a majority rule applied to the vegetation, water and shadow indices, it is possible to eliminate the irrelevant clusters. The remaining clusters likely consist of building structures and others. On the fine processing level, within each cluster under consideration, smaller objects are formed using morphological analysis. Numerous indicators including spectral, textural and shape indices are computed to be used in a rule-based object classification. The computation time of raster-based analysis depends heavily on the image size or, in other words, the number of processed pixels. Breaking the processing into two levels helps to reduce the number of processed pixels and the redundancy of processing irrelevant information. In addition, it allows a data- and task-based parallel implementation. The performance is demonstrated with QuickBird images of an area of Phanga, Thailand, affected by the 2004 Indian Ocean tsunami. 
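The coarse-level elimination step can be illustrated with a minimal sketch. It assumes NDVI as the vegetation index, a hypothetical threshold of 0.3, and clusters given as lists of (near-infrared, red) reflectance pairs; the abstract specifies none of these, and the water and shadow indices are omitted here.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel."""
    total = nir + red
    return (nir - red) / total if total else 0.0

def keep_cluster(pixels, ndvi_threshold=0.3):
    """Majority rule: discard a cluster when more than half its pixels look like vegetation."""
    vegetated = sum(1 for nir, red in pixels if ndvi(nir, red) > ndvi_threshold)
    return vegetated <= len(pixels) / 2

# Hypothetical coarse clusters, each a list of (NIR, red) reflectances.
clusters = {
    "roofs": [(0.30, 0.28), (0.32, 0.30), (0.50, 0.20)],
    "trees": [(0.60, 0.10), (0.55, 0.12), (0.30, 0.25)],
}

# Only clusters that survive the rule go on to fine-level processing.
remaining = [name for name, pixels in clusters.items() if keep_cluster(pixels)]
print(remaining)
```

Dropping whole clusters before object extraction is what reduces the number of pixels handled at the fine level.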
The developed solution will be implemented in different platforms as well as a web processing service for operational uses.
ANALYSIS AND CHARACTERIZATION OF OZONE-RICH EPISODES IN NORTHEAST PORTUGAL

NASA Astrophysics Data System (ADS)

Carvalho, A.; Monteiro, A.; Ribeiro, I.; Tchepel, O.; Miranda, A.; Borrego, C.; Saavedra, S.; Souto, J. A.; Casares, J. J.

2009-12-01

Each summer, extremely high ozone levels are registered at the rural background station of Lamas d’Olo, located in the Northeast of Portugal. On average, 30% of the total alert threshold exceedances registered in Portugal are detected at this site. The main purpose of this study is to characterize the atmospheric conditions that lead to the ozone-rich episodes. Synoptic pattern anomaly and back-trajectory cluster analyses were performed for a period of 76 days on which ozone maximum concentrations were above 200 µg m⁻³. This analysis was performed for the period between 2004 and 2007. The obtained anomaly fields suggested that a positive temperature anomaly is visible above the Iberian Peninsula. In addition, a strong wind flow pattern from NE is visible in the North of Portugal and Galicia, in Spain. These two features may lead to an enhancement of the photochemical production and to the transport of pollutants from Spain to Portugal. In addition, the 3D mean back trajectories associated with the ozone episode days were analysed. A clustering method has been applied to the obtained back trajectories. Four main clusters of ozone-rich episodes were identified, with different frequencies of occurrence: north-westerly flows (11%), north-easterly flows (45%), southern flow (4%) and westerly flows (40%). Both analyses highlight the NE flow as a dominant pattern over the North of Portugal. The analysis of the ozone concentrations for each selected cluster indicates that this northeast circulation pattern, together with the southern flow, is responsible for the highest ozone peak episodes. This also suggests that long-range transport of atmospheric pollutants may be the main contributor to the ozone levels registered at Lamas d’Olo. This is also highlighted by the correlation of the ozone time series with the meteorological parameters analysed in the frequency domain.
FY17 Status Report on the Computing Systems for the Yucca Mountain Project TSPA-LA Models.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Appel, Gordon John; Hadgu, Teklu

Sandia National Laboratories (SNL) continued evaluation of total system performance assessment (TSPA) computing systems for the previously considered Yucca Mountain Project (YMP). This was done to maintain the operational readiness of the computing infrastructure (computer hardware and software) and knowledge capability for total system performance assessment (TSPA) type analysis, as directed by the National Nuclear Security Administration (NNSA), DOE 2010. This work is a continuation of the ongoing readiness evaluation reported in Lee and Hadgu (2014), Hadgu et al. (2015) and Hadgu and Appel (2016). The TSPA computing hardware (CL2014) and storage system described in Hadgu et al. (2015) were used for the current analysis. One floating license of GoldSim with Versions 9.60.300, 10.5, 11.1 and 12.0 was installed on the cluster head node, and its distributed processing capability was mapped on the cluster processors. Other supporting software was tested and installed to support the TSPA-type analysis on the server cluster. The current tasks included a preliminary upgrade of the TSPA-LA from Version 9.60.300 to the latest version 12.0 and addressing DLL-related issues observed in the FY16 work. The model upgrade task successfully converted the Nominal Modeling case to GoldSim Versions 11.1/12. Conversions of the rest of the TSPA models were also attempted but program and operational difficulties precluded this. Upgrade of the remaining modeling cases and distributed processing tasks is expected to continue. The 2014 server cluster and supporting software systems are fully operational to support TSPA-LA type analysis.
Comparison of organs' shapes with geometric and Zernike 3D moments.

PubMed

Broggio, D; Moignier, A; Ben Brahim, K; Gardumi, A; Grandgirard, N; Pierrat, N; Chea, M; Derreumaux, S; Desbrée, A; Boisserie, G; Aubert, B; Mazeron, J-J; Franck, D

2013-09-01

The morphological similarity of organs is studied with feature vectors based on geometric and Zernike 3D moments. In particular, it is investigated whether outliers and average models can be identified. For this purpose, the relative proximity to the mean feature vector is defined, and principal coordinate and clustering analyses are also performed. To study the consistency and usefulness of this approach, 17 liver and 76 heart voxel models from several sources are considered. In the liver case, models with similar morphological features are identified. For the limited number of cases studied, the liver of the ICRP male voxel model is identified as a better surrogate than the female one. For hearts, the clustering analysis shows that three heart shapes represent about 80% of the morphological variations. The relative proximity and clustering analysis rather consistently identify outliers and average models. For the two cases, identification of outliers and surrogates of average models is rather robust. However, deeper classification of morphological features is subject to caution and can only be performed after cross analysis of at least two kinds of feature vectors. Finally, the Zernike moments contain all the information needed to reconstruct the studied objects and thus appear as a promising tool to derive statistical organ shapes. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Performance map of a cluster detection test using extended power

PubMed Central

2013-01-01

Background Conventional power studies possess limited ability to assess the performance of cluster detection tests. In particular, they cannot evaluate the accuracy of the cluster location, which is essential in such assessments. Furthermore, they usually estimate power for one or a few particular alternative hypotheses and thus cannot assess performance over an entire region. Takahashi and Tango developed the concept of extended power that indicates both the rate of null hypothesis rejection and the accuracy of the cluster location. We propose a systematic assessment method, using here extended power, to produce a map showing the performance of cluster detection tests over an entire region. Methods To explore the behavior of a cluster detection test on identical cluster types at any possible location, we successively applied four different spatial and epidemiological parameters. These parameters determined four cluster collections, each covering the entire study region. We simulated 1,000 datasets for each cluster and analyzed them with Kulldorff’s spatial scan statistic. From the area under the extended power curve, we constructed a map for each parameter set showing the performance of the test across the entire region. Results Consistent with previous studies, the performance of the spatial scan statistic increased with the baseline incidence of disease, the size of the at-risk population and the strength of the cluster (i.e., the relative risk). Performance was heterogeneous, however, even for very similar clusters (i.e., similar with respect to the aforementioned factors), suggesting the influence of other factors. Conclusions The area under the extended power curve is a single measure of performance and, although needing further exploration, it is suitable to conduct a systematic spatial evaluation of performance. The performance map we propose enables epidemiologists to assess cluster detection tests across an entire study region. PMID:24156765
Cosmology from galaxy clusters as observed by Planck

NASA Astrophysics Data System (ADS)

Pierpaoli, Elena

We propose to use current all-sky data on galaxy clusters in the radio/infrared bands in order to constrain cosmology. This will be achieved by performing parameter estimation with number counts and power spectra for galaxy clusters detected by Planck through their Sunyaev-Zeldovich signature. The ultimate goal of this proposal is to use clusters as tracers of matter density in order to provide information about fundamental properties of our Universe, such as the law of gravity on large scales, early Universe phenomena, structure formation and the nature of dark matter and dark energy. We will leverage the availability of a larger and deeper cluster catalog from the latest Planck data release in order to include, for the first time, the cluster power spectrum in the cosmological parameter determination analysis. Furthermore, we will extend the cluster analysis to cosmological models not yet investigated by the Planck collaboration. These aims require a diverse set of activities, ranging from the characterization of the clusters' selection function, the choice of the cosmological cluster sample to be used for parameter estimation, the construction of mock samples in the various cosmological models with correct correlation properties in order to produce reliable selection functions and noise covariance matrices, and finally the construction of the appropriate likelihood for number counts and power spectra. We plan to make the final code available to the community and compatible with the most widely used cosmological parameter estimation code. This research makes use of data from the NASA satellites Planck and, less directly, Chandra, in order to constrain cosmology, and therefore perfectly fits the NASA objectives and the specifications of this solicitation.
Morphological and Inter Simple Sequence Repeat (ISSR) markers analyses of Corynespora cassiicola isolates from rubber plantations in Malaysia.

PubMed

Nghia, Nguyen Anh; Kadir, Jugah; Sunderasan, E; Puad Abdullah, Mohd; Malik, Adam; Napis, Suhaimi

2008-10-01

Morphological features and Inter Simple Sequence Repeat (ISSR) polymorphism were employed to analyse 21 Corynespora cassiicola isolates obtained from a number of Hevea clones grown in rubber plantations in Malaysia. The C. cassiicola isolates used in this study were collected from several states in Malaysia from 1998 to 2005. The morphology of the isolates was characteristic of that previously described for C. cassiicola. Variations in colony and conidial morphology were observed not only among isolates but also within a single isolate with no inclination to either clonal or geographical origin of the isolates. ISSR analysis delineated the isolates into two distinct clusters. The dendrogram created from UPGMA analysis based on Nei and Li's coefficient (calculated from the binary matrix data of 106 amplified DNA bands generated from 8 ISSR primers) showed that cluster 1 encompasses 12 isolates from the states of Johor and Selangor (this cluster was further split into 2 sub clusters, 1A and 1B; sub cluster 1B consists of a unique isolate, CKT05D), while cluster 2 comprises 9 isolates that were obtained from the other states. Detached leaf assay performed on selected Hevea clones showed that the pathogenicity of representative isolates from cluster 1 (with the exception of CKT05D) resembled that of race 1, and isolates in cluster 2 showed pathogenicity similar to race 2 of the fungus that was previously identified in Malaysia. The isolate CKT05D from sub cluster 1B showed pathogenicity dissimilar to either race 1 or race 2.
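The dendrogram construction named in this abstract can be sketched in a few lines. This is an illustrative sketch rather than the authors' pipeline: it computes Nei and Li's coefficient (twice the shared bands divided by the total bands in both profiles) on presence/absence band profiles, converts it to a distance, and agglomerates by UPGMA (average linkage). The four isolates and their six-band profiles are hypothetical.

```python
from itertools import combinations

def nei_li_similarity(a, b):
    """Nei and Li coefficient: 2 * shared bands / (bands in a + bands in b)."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    return 2 * shared / total if total else 1.0

def upgma(profiles):
    """Average-linkage agglomeration; returns merges as (cluster1, cluster2, height)."""
    names = list(profiles)
    # Distance between two isolates is 1 minus their similarity.
    dist = {frozenset((p, q)): 1 - nei_li_similarity(profiles[p], profiles[q])
            for p, q in combinations(names, 2)}

    def avg(c1, c2):
        # UPGMA distance: mean over all isolate pairs drawn from the two clusters.
        return sum(dist[frozenset((p, q))] for p in c1 for q in c2) / (len(c1) * len(c2))

    clusters = [(n,) for n in names]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: avg(*pair))
        merges.append((c1, c2, avg(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges

# Hypothetical binary band matrix (1 = band present, 0 = absent).
isolates = {
    "A": [1, 1, 1, 0, 0, 0],
    "B": [1, 1, 0, 0, 0, 0],
    "C": [0, 0, 0, 1, 1, 1],
    "D": [0, 0, 1, 1, 1, 0],
}

merges = upgma(isolates)
for left, right, height in merges:
    print(left, right, round(height, 3))
```

With these toy profiles the isolates pair up as A with B and C with D before the two pairs join, which mirrors the two-cluster structure the abstract reports at a much larger scale.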
Spatial Hotspot Analysis of Acute Myocardial Infarction Events in an Urban Population: A Correlation Study of Health Problems and Industrial Installation

PubMed Central

NAMAYANDE, Motahareh Sadat; NEJADKOORKI, Farhad; NAMAYANDE, Seyedeh Mahdieh; DEHGHAN, Hamidreza

2016-01-01

Background: The objectives of the current study were to identify spatial patterns and hotspots of cardiovascular events (CVE) and to test for a correlation between CVE and the location of industrial installations. Methods: We used the acute myocardial infarction (AMI) hospital admission records for CVDs from three main hospitals in Yazd, Yazd Province, Iran, during 2013, and searched for a possible correlation between industries as point-source pollutants and the non-random distribution of AMI events. Results: The MI incidence rate in Yazd was 531 per 100,000 person-years among men, 458 per 100,000 person-years among women and 783 per 100,000 person-years in total. We applied a GIS Hotspot analysis to determine feasible clusters, and two sets of clusters were observed. The mean age for the 56 AMI events that occurred in the cluster cells was 62.21±14.75 yr. Age and sex, as the main confounders of AMI, were evaluated in the cluster areas in comparison to other areas. We observed no significant difference regarding sex (59% in cluster cells versus 55% in total for men) and age (62.21±14.7 in cluster cells versus 63.28±13.98 in total for men). Conclusion: We found that the AMI event clusters lay close to industrial installations, specifically a steel industry. There could be an association between road-related pollutants and the observed sets of clusters, given the proximity of rather crowded highways to the event clusters. PMID:27057527
Microarray characterization of gene expression changes in blood during acute ethanol exposure

PubMed Central

2013-01-01

Background As part of the civil aviation safety program to define the adverse effects of ethanol on flying performance, we performed a DNA microarray analysis of human whole blood samples from a five-time point study of subjects administered ethanol orally, followed by breathalyzer analysis, to monitor blood alcohol concentration (BAC) to discover significant gene expression changes in response to the ethanol exposure. Methods Subjects were administered either orange juice or orange juice with ethanol. Blood samples were taken based on BAC and total RNA was isolated from PaxGene™ blood tubes. The amplified cDNA was used in microarray and quantitative real-time polymerase chain reaction (RT-qPCR) analyses to evaluate differential gene expression. Microarray data was analyzed in a pipeline fashion to summarize and normalize and the results evaluated for relative expression across time points with multiple methods. Candidate genes showing distinctive expression patterns in response to ethanol were clustered by pattern and further analyzed for related function, pathway membership and common transcription factor binding within and across clusters. RT-qPCR was used with representative genes to confirm relative transcript levels across time to those detected in microarrays. Results Microarray analysis of samples representing 0%, 0.04%, 0.08%, return to 0.04%, and 0.02% wt/vol BAC showed that changes in gene expression could be detected across the time course. The expression changes were verified by qRT-PCR. The candidate genes of interest (GOI) identified from the microarray analysis and clustered by expression pattern across the five BAC points showed seven coordinately expressed groups. Analysis showed function-based networks, shared transcription factor binding sites and signaling pathways for members of the clusters. 
These include hematological functions, innate immunity and inflammation functions, metabolic functions expected of ethanol metabolism, and pancreatic and hepatic function. Five of the seven clusters showed links to the p38 MAPK pathway. Conclusions The results of this study provide a first look at changing gene expression patterns in human blood during an acute rise in blood ethanol concentration and its depletion because of metabolism and excretion, and demonstrate that it is possible to detect changes in gene expression using total RNA isolated from whole blood. The analysis approach for this study serves as a workflow to investigate the biology linked to expression changes across a time course and from these changes, to identify target genes that could serve as biomarkers linked to pilot performance. PMID:23883607
Grouping of Bulgarian wines according to grape variety by using statistical methods

NASA Astrophysics Data System (ADS)

Milev, M.; Nikolova, Kr.; Ivanova, Ir.; Minkova, St.; Evtimov, T.; Krustev, St.

2017-12-01

Sixty-eight different types of Bulgarian wines were studied with respect to 9 optical parameters: color parameters in the XYZ and CIE Lab color systems, lightness, hue angle, chroma, fluorescence intensity and emission wavelength. The main objective of this research is to use hierarchical cluster analysis to evaluate the similarity and the distance between the examined types of Bulgarian wines and to group them based on physical parameters. We found that the wines are grouped in clusters on the basis of their degree of similarity. There are two main clusters, each with two subclusters. The first contains white wines and Sira; the second contains red wines and rosé. The results of the cluster analysis are presented graphically in a dendrogram. The other statistical technique used is factor analysis performed by the method of Principal Components (PCA). The aim is to reduce the large number of variables to a few factors by grouping the correlated variables into one factor and assigning the uncorrelated variables to different factors. Moreover, the factor analysis made it possible to determine the parameters with the greatest influence on the distribution of samples among the clusters. In our study, after rotation of the factors with the Varimax method, the parameters were combined into two factors, which together explain about 80% of the total variation. The first explains 61.49% of the variation and correlates with the color characteristics; the second explains 18.34% of the variation and correlates with the parameters connected with fluorescence spectroscopy.
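The grouping step described in this abstract can be illustrated with a small agglomerative clustering sketch. The abstract does not state the distance metric or linkage rule, so the version below assumes Euclidean distance on standardised parameters and average linkage. The sample names and values are hypothetical and are not the study's data.

```python
import math
from itertools import combinations

def zscore_columns(rows):
    """Standardise each column to mean 0 and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for a constant column to avoid dividing by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]

def average_linkage(a, b, points):
    """Mean Euclidean distance over all cross-cluster pairs (assumed linkage)."""
    total = sum(math.dist(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))

def agglomerate(points, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    merges = []  # (cluster_a, cluster_b, distance): the data behind a dendrogram
    while len(clusters) > n_clusters:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(pair[0], pair[1], points))
        merges.append((sorted(a), sorted(b), average_linkage(a, b, points)))
        clusters = [c for c in clusters if c is not a and c is not b] + [a + b]
    return clusters, merges

# Hypothetical samples: (lightness, chroma, fluorescence intensity)
names = ["white_1", "white_2", "white_3", "red_1", "red_2", "red_3"]
samples = [
    [92.0, 8.0, 310.0],
    [90.5, 9.5, 295.0],
    [91.0, 7.0, 320.0],
    [35.0, 48.0, 60.0],
    [38.0, 45.0, 75.0],
    [33.0, 50.0, 55.0],
]

clusters, merges = agglomerate(zscore_columns(samples), n_clusters=2)
groups = sorted(sorted(names[i] for i in c) for c in clusters)
print(groups)
```

Standardising first keeps a parameter with large raw values (here the fluorescence intensity) from dominating the distance. The `merges` list records the order and height of each merge, which is the information a dendrogram displays.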
DOE Office of Scientific and Technical Information (OSTI.GOV)

Murugesan, Sugeerth; Bouchard, Kristofer; Chang, Edward

There exists a need for effective and easy-to-use software tools supporting the analysis of complex Electrocorticography (ECoG) data. Understanding how epileptic seizures develop or identifying diagnostic indicators for neurological diseases requires the in-depth analysis of neural activity data from ECoG. Such data is multi-scale and is of high spatio-temporal resolution. Comprehensive analysis of this data should be supported by interactive visual analysis methods that allow a scientist to understand functional patterns at varying levels of granularity and comprehend its time-varying behavior. We introduce a novel multi-scale visual analysis system, ECoG ClusterFlow, for the detailed exploration of ECoG data. Our system detects and visualizes dynamic high-level structures, such as communities, derived from the time-varying connectivity network. The system supports two major views: 1) an overview summarizing the evolution of clusters over time and 2) an electrode view using hierarchical glyph-based design to visualize the propagation of clusters in their spatial, anatomical context. We present case studies that were performed in collaboration with neuroscientists and neurosurgeons using simulated and recorded epileptic seizure data to demonstrate our system's effectiveness. ECoG ClusterFlow supports the comparison of spatio-temporal patterns for specific time intervals and allows a user to utilize various clustering algorithms. Neuroscientists can identify the site of seizure genesis and its spatial progression during the various stages of a seizure. Our system serves as a fast and powerful means for the generation of preliminary hypotheses that can be used as a basis for subsequent application of rigorous statistical methods, with the ultimate goal being the clinical treatment of epileptogenic zones.
Re-analysis of health and educational impacts of a school-based deworming programme in western Kenya: a statistical replication of a cluster quasi-randomized stepped-wedge trial

PubMed Central

Davey, Calum; Aiken, Alexander M; Hayes, Richard J; Hargreaves, James R

2015-01-01

Introduction: Helminth (worm) infections cause morbidity among poor communities worldwide. An influential study conducted in Kenya in 1998–99 reported that a school-based drug-and-educational intervention had benefits for worm infections and school attendance. Methods: In this statistical replication, we re-analysed data from this cluster quasi-randomized stepped-wedge trial, specifying two co-primary outcomes: school attendance and examination performance. We estimated intention-to-treat effects using year-stratified cluster-summary analysis and observation-level random-effects regression, and combined both years with a random-effects model accounting for year. The participants were not blinded to allocation status, and other interventions were concurrently conducted in a sub-set of schools. A protocol guiding outcome data collection was not available. Results: Quasi-randomization resulted in three similar groups of 25 schools. There was a substantial amount of missing data. In year-stratified cluster-summary analysis, there was no clear evidence for improvement in either school attendance or examination performance. In year-stratified regression models, there was some evidence of improvement in school attendance [adjusted odds ratios (aOR): year 1: 1.48, 95% confidence interval (CI) 0.88–2.52, P = 0.147; year 2: 1.23, 95% CI 1.01–1.51, P = 0.044], but not examination performance (adjusted differences: year 1: −0.135, 95% CI −0.323–0.054, P = 0.161; year 2: −0.017, 95% CI −0.201–0.166, P = 0.854). When both years were combined, there was strong evidence of an effect on attendance (aOR 1.82, 95% CI 1.74–1.91, P < 0.001), but not examination performance (adjusted difference −0.121, 95% CI −0.293–0.052, P = 0.169). Conclusions: The evidence supporting an improvement in school attendance differed by analysis method. This, and various other important limitations of the data, caution against over-interpretation of the results. 
We find that the study provides some evidence, but with high risk of bias, that a school-based drug-treatment and health-education intervention improved school attendance and no evidence of effect on examination performance. PMID:26203171
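The year-stratified odds ratios quoted above can be sanity-checked by recovering an approximate p-value from each reported confidence interval. The sketch below assumes the intervals are symmetric Wald intervals on the log-odds scale, which the abstract does not state, so small discrepancies from the reported values are expected. The function name is illustrative.

```python
import math

def p_from_ratio_ci(estimate, lower, upper, z_crit=1.959964):
    """Two-sided p-value implied by a ratio estimate and its 95% CI.

    Assumption: the CI is a symmetric Wald interval on the log scale,
    so its width equals 2 * z_crit * SE.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(estimate) / se
    # erfc(|z| / sqrt(2)) is the two-sided tail area of a standard normal.
    return math.erfc(abs(z) / math.sqrt(2))

# (aOR, CI lower, CI upper, reported P) for school attendance, from the abstract
reported = {
    "year 1": (1.48, 0.88, 2.52, 0.147),
    "year 2": (1.23, 1.01, 1.51, 0.044),
}

implied = {year: p_from_ratio_ci(est, lo, hi)
           for year, (est, lo, hi, _) in reported.items()}

for year, p in implied.items():
    print(f"{year}: implied p = {p:.3f}, reported p = {reported[year][3]}")
```

Under that assumption the implied values land close to the reported ones, which suggests the intervals and p-values in the abstract are mutually consistent.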
Method and apparatus for offloading compute resources to a flash co-processing appliance

DOEpatents

Tzelnic, Percy; Faibish, Sorin; Gupta, Uday K.; Bent, John; Grider, Gary Alan; Chen, Hsing -bung

2015-10-13

Solid-State Drive (SSD) burst buffer nodes are interposed into a parallel supercomputing cluster to enable fast burst checkpoint of cluster memory to or from nearby interconnected solid-state storage, with asynchronous migration between the burst buffer nodes and slower, more distant disk storage. The SSD nodes also perform tasks offloaded from the compute nodes or associated with the checkpoint data. For example, the data for the next job is preloaded in the SSD node and uploaded very quickly to the respective compute node just before the next job starts. During a job, the SSD nodes perform fast visualization and statistical analysis upon the checkpoint data. The SSD nodes can also perform data reduction and encryption of the checkpoint data.
Sorting Five Human Tumor Types Reveals Specific Biomarkers and Background Classification Genes.

PubMed

Roche, Kimberly E; Weinstein, Marvin; Dunwoodie, Leland J; Poehlman, William L; Feltus, Frank A

2018-05-25

We applied two state-of-the-art, knowledge independent data-mining methods - Dynamic Quantum Clustering (DQC) and t-Distributed Stochastic Neighbor Embedding (t-SNE) - to data from The Cancer Genome Atlas (TCGA). We showed that the RNA expression patterns for a mixture of 2,016 samples from five tumor types can sort the tumors into groups enriched for relevant annotations including tumor type, gender, tumor stage, and ethnicity. DQC feature selection analysis discovered 48 core biomarker transcripts that clustered tumors by tumor type. When these transcripts were removed, the geometry of tumor relationships changed, but it was still possible to classify the tumors using the RNA expression profiles of the remaining transcripts. We continued to remove the top biomarkers for several iterations and performed cluster analysis. Even though the most informative transcripts were removed from the cluster analysis, the sorting ability of remaining transcripts remained strong after each iteration. Further, in some iterations we detected a repeating pattern of biological function that wasn't detectable with the core biomarker transcripts present. This suggests the existence of a "background classification" potential in which the pattern of gene expression after continued removal of "biomarker" transcripts could still classify tumors in agreement with the tumor type.
Jade: using on-demand cloud analysis to give scientists back their flow

NASA Astrophysics Data System (ADS)

Robinson, N.; Tomlinson, J.; Hilson, A. J.; Arribas, A.; Powell, T.

2017-12-01

The UK's Met Office generates 400 TB of weather and climate data every day by running physical models on its Top 20 supercomputer. As data volumes explode, there is a danger that analysis workflows become dominated by watching progress bars rather than thinking about science. We have been researching how we can use distributed computing to allow analysts to process these large volumes of high-velocity data in a way that's easy, effective and cheap. Our prototype analysis stack, Jade, tries to encapsulate this. Functionality includes: 1) an under-the-hood Dask engine which parallelises and distributes computations, without the need to retrain analysts; 2) hybrid compute clusters (AWS, Alibaba, and local compute) comprising many thousands of cores; 3) clusters which autoscale up/down in response to calculation load using Kubernetes, and which are balanced across providers based on the current price of compute; and 4) lazy data access from cloud storage via containerised OpenDAP. This technology stack allows us to perform calculations many orders of magnitude faster than is possible on local workstations. It is also possible to outperform dedicated local compute clusters, as cloud compute can, in principle, scale to much larger sizes. The use of ephemeral compute resources also makes this implementation cost efficient.
Improving local clustering based top-L link prediction methods via asymmetric link clustering information

NASA Astrophysics Data System (ADS)

Wu, Zhihao; Lin, Youfang; Zhao, Yiji; Yan, Hongyan

2018-02-01

Networks can represent a wide range of complex systems, such as social, biological and technological systems. Link prediction is one of the most important problems in network analysis, and has attracted much research interest recently. Many link prediction methods have been proposed to solve this problem with various techniques. Clustering information plays an important role in solving the link prediction problem. In the previous literature, the node clustering coefficient appears frequently in link prediction methods. However, the node clustering coefficient is limited in describing the role of a common neighbor in different local networks, because it cannot distinguish the different clustering abilities of a node with respect to different node pairs. In this paper, we shift our focus from nodes to links, and propose the concept of the asymmetric link clustering (ALC) coefficient. Further, we improve three node-clustering-based link prediction methods via the concept of ALC. The experimental results demonstrate that ALC-based methods outperform node-clustering-based methods, achieving especially remarkable improvements on food web, hamster friendship and Internet networks. Moreover, compared with other methods, the performance of ALC-based methods is very stable in both globalized and personalized top-L link prediction tasks.
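To make the node-based baseline concrete, the sketch below computes the standard node clustering coefficient and uses it in one plausible node-clustering-based score: the sum of the clustering coefficients of the common neighbors of a candidate pair. The abstract neither names the three methods it improves nor defines the ALC coefficient, so this only illustrates the kind of baseline being discussed. The function names and the toy graph are hypothetical.

```python
from itertools import combinations

def build_graph(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def clustering_coefficient(adj, node):
    """Fraction of a node's neighbor pairs that are themselves linked."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    closed = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return closed / (k * (k - 1) / 2)

def pair_score(adj, x, y):
    """Assumed baseline: sum of clustering coefficients of common neighbors."""
    return sum(clustering_coefficient(adj, z) for z in adj[x] & adj[y])

def top_l(adj, L):
    """Rank all currently unlinked pairs by score and return the best L."""
    candidates = [(x, y) for x, y in combinations(sorted(adj), 2)
                  if y not in adj[x]]
    ranked = sorted(candidates, key=lambda p: (-pair_score(adj, *p), p))
    return ranked[:L]

# Toy graph: a triangle a-b-c with a tail c-d-e
adj = build_graph([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")])
predictions = top_l(adj, L=2)
print(predictions)
```

Note the limitation the abstract points to: node `c` contributes the same coefficient to every pair it connects, regardless of which pair is being scored. A link-level coefficient is meant to remove exactly that uniformity.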
Construction of the energy matrix for complex atoms. Part VIII: Hyperfine structure HPC calculations for terbium atom

NASA Astrophysics Data System (ADS)

Elantkowska, Magdalena; Ruczkowski, Jarosław; Sikorski, Andrzej; Dembczyński, Jerzy

2017-11-01

A parametric analysis of the hyperfine structure (hfs) for the even-parity configurations of atomic terbium (Tb I) is presented in this work. We introduce the complete set of 4f^N-core states in our high-performance computing (HPC) calculations. For calculations of the huge hyperfine structure matrix, which require approximately 5000 hours when run on a single CPU, we propose methods utilizing a personal computer cluster or, alternatively, a cluster of Microsoft Azure virtual machines (VM). These methods give a 12-fold performance boost, enabling the calculations to complete in an acceptable time.
Altitude as a risk factor for the development of hypospadias. Geographical cluster distribution analysis in South America.

PubMed

Fernández, Nicolas; Lorenzo, Armando; Bägli, Darius; Zarante, Ignacio

2016-10-01

Hypospadias is the most common congenital anomaly affecting the genitals. It has been established as a multifactorial disease with increasing prevalence. Many risk factors have been identified, such as prematurity, birth weight, mother's age, and exposure to endocrine disruptors. In recent decades multiple authors using surveillance systems have described an increase in the prevalence of hypospadias, but most of the published literature comes from developed countries in Europe and North America, and few of the published studies have involved cluster analysis. Few large-scale studies have addressed the effect of altitude and other geographical aspects on the development of hypospadias. Acknowledging this limitation, we present novel results of a multinational spatial scan statistical analysis over a 30-year period in South America and an altitude analysis of hypospadias distribution at the continental level. A retrospective review was performed of the Latin American collaborative study of congenital malformations (ECLAMC). A total of 4,020,384 newborns were surveyed between 1982 and December 2011 in all participating centers. We selected all patients with hypospadias. All degrees of clinical severity were included in the analysis. Each participating center was geographically identified with its coordinates and altitude above sea level. A spatial scan statistical analysis was performed using Kulldorff's methodology, together with a prevalence trend analysis over time in centers below and above 2000 m. During the study period we found 159 hospitals in six different countries (Colombia, Bolivia, Brazil, Argentina, Chile, and Uruguay) with 4,537 cases of hypospadias and a global prevalence rate of 11.3/10,000 newborns. Trend analysis showed that centers below 2000 m had an increasing trend with an average of 10/10,000 newborns, as opposed to centers above 2000 m, which showed a decreasing trend with an average prevalence of 7.8 (p = 0.1246). 
We identified clusters with significant increases of prevalence in five centers along the coast at an average altitude of 219.8 m above sea level (p > 0.0000). Reduction in prevalence was found in clusters located in two centers on the Andes mountains. Altitude of 2,000 m was associated with hypospadias (Figure), with an OR 0.59 (0.5-0.69). There are ethnic arguments to support our results supported by protective polymorphism distribution in high lands. Altitude above 2,000 m is suggested to have a protective effect for hypospadias. Specific clusters have been identified with increased risk for hypospadias. Environmental risk factors in these areas need to be further studied given the association seen between altitude and the distribution of more severe cases. Copyright © 2016 Journal of Pediatric Urology Company. Published by Elsevier Ltd. All rights reserved.
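The spatial scan step can be sketched with the Poisson log-likelihood ratio commonly used in Kulldorff's method, applied here to single centers as candidate zones. The totals come from the abstract, but the per-center births and cases are hypothetical, and the abstract does not say which probability model was used, so the Poisson form is an assumption. The Monte Carlo replications that a full scan uses to obtain p-values are omitted.

```python
import math

TOTAL_BIRTHS = 4_020_384  # newborns surveyed, from the abstract
TOTAL_CASES = 4_537       # hypospadias cases, from the abstract

def poisson_llr(cases_in, expected_in, cases_total):
    """Poisson log-likelihood ratio for one candidate zone.

    Returns 0 unless the zone holds more cases than expected, i.e. this
    sketch scans for high-rate clusters only.
    """
    c, e, total = cases_in, expected_in, cases_total
    if c <= e:
        return 0.0
    inside = c * math.log(c / e)
    outside = (total - c) * math.log((total - c) / (total - e)) if total > c else 0.0
    return inside + outside

# Hypothetical candidate zones: name -> (births, cases)
centres = {
    "coastal_A": (120_000, 210),
    "andean_B": (90_000, 60),
    "inland_C": (150_000, 170),
}

# Overall prevalence per 10,000 newborns, which also fixes the expected counts.
prevalence = TOTAL_CASES / TOTAL_BIRTHS * 10_000

llr = {}
for name, (births, cases) in centres.items():
    expected = TOTAL_CASES * births / TOTAL_BIRTHS  # null: uniform risk everywhere
    llr[name] = poisson_llr(cases, expected, TOTAL_CASES)

most_likely = max(llr, key=llr.get)
print(f"prevalence = {prevalence:.1f} per 10,000; most likely cluster: {most_likely}")
```

The computed prevalence rounds to the 11.3 per 10,000 reported in the abstract. Detecting low-rate zones, such as the Andean centers described above, would need the mirrored condition (`c < e`) rather than the high-rate test used here.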
Molecular dynamics simulations of the structure evolutions of Cu-Zr metallic glasses under irradiation

NASA Astrophysics Data System (ADS)

Lang, Lin; Tian, Zean; Xiao, Shifang; Deng, Huiqiu; Ao, Bingyun; Chen, Piheng; Hu, Wangyu

2017-02-01

Molecular dynamics simulations have been performed to investigate the structural evolution of Cu64.5Zr35.5 metallic glasses under irradiation. The largest standard cluster analysis (LSCA) method was used to quantify the microstructure within the collision cascade regions. It is found that the majority of clusters within the collision cascade regions are full and defective icosahedrons. Both the smaller structures (common neighbor subclusters) and the primary clusters changed greatly during the collision cascades, but most of this radiation damage self-recovered quickly in the following quench states. These findings indicate that Cu-Zr metallic glasses have excellent irradiation-resistance properties.
