Environmental Gradient Analysis, Ordination, and Classification in Environmental Impact Assessments.
1987-09-01
agglomerative clustering algorithms for mainframe computers: (1) the unweighted pair-group method that uses arithmetic averages (UPGMA), (2) the ... hierarchical agglomerative unweighted pair-group method using arithmetic averages (UPGMA), which is also called average linkage clustering. This method was ... dendrograms produced by weighted clustering (93). Sneath and Sokal (94), Romesburg (84), and Seber (90) also strongly recommend the UPGMA. A dendrogram
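The snippet above names UPGMA (average linkage) without defining it. As a minimal sketch, assuming a precomputed symmetric distance matrix, UPGMA repeatedly merges the two clusters whose distance (the unweighted arithmetic mean of all pairwise item distances between them) is smallest. The function name `upgma` is illustrative; the sketch recomputes every cluster distance at each step, so it is only suitable for small examples.

```python
def upgma(dist):
    """Naive UPGMA (average linkage) on a symmetric distance matrix.

    dist: list of lists, dist[i][j] = distance between items i and j.
    Returns the merges in order as (members_a, members_b, height).
    """
    # Each active cluster is keyed by an id and holds its member items.
    clusters = {i: [i] for i in range(len(dist))}
    merges = []
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                # Unweighted mean over all item pairs across the two clusters.
                total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:  # first minimum wins on ties
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        # The merged cluster keeps id a; cluster b is removed.
        clusters[a] = clusters[a] + clusters.pop(b)
    return merges
```

The merge heights returned here are the levels at which a dendrogram would join the corresponding branches.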
NASA Astrophysics Data System (ADS)
Chuan, Zun Liang; Ismail, Noriszura; Shinyie, Wendy Ling; Lit Ken, Tan; Fam, Soo-Fen; Senawi, Azlyna; Yusoff, Wan Nur Syahidah Wan
2018-04-01
Due to limited historical precipitation records, agglomerative hierarchical clustering algorithms are widely used to extrapolate information from gauged to ungauged precipitation catchments, yielding more reliable projections of extreme hydro-meteorological events such as extreme precipitation events. However, identifying the optimum number of homogeneous precipitation catchments from the dendrogram produced by agglomerative hierarchical algorithms is highly subjective. The main objective of this study is to propose an efficient regionalized algorithm to identify homogeneous precipitation catchments for non-stationary precipitation time series. The homogeneous precipitation catchments are identified using an average linkage hierarchical clustering algorithm combined with multi-scale bootstrap resampling, with the uncentered correlation coefficient as the similarity measure. The regionalized homogeneous precipitation catchments are consolidated using the K-sample Anderson-Darling non-parametric test. The results show that the proposed regionalized algorithm performed better than the agglomerative hierarchical clustering algorithms proposed in previous studies.
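The uncentered correlation coefficient mentioned above is Pearson's formula without subtracting the means, which makes it equivalent to the cosine of the angle between two series. A minimal sketch follows; converting it to a dissimilarity as 1 - r for clustering is an assumption for illustration, not necessarily the authors' choice, and the function names are hypothetical.

```python
import math


def uncentered_correlation(x, y):
    """Uncentered correlation: sum(x*y) / sqrt(sum(x^2) * sum(y^2))."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den


def uncentered_distance(x, y):
    # One common way (assumed here) to turn the similarity into a dissimilarity.
    return 1.0 - uncentered_correlation(x, y)
```

Because the means are not removed, the two series [1, 2, 3] and [3, 2, 1] give 10/14 (about 0.71) here, whereas Pearson's centered r for the same pair is -1.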
Mining a Web Citation Database for Author Co-Citation Analysis.
ERIC Educational Resources Information Center
He, Yulan; Hui, Siu Cheung
2002-01-01
Proposes a mining process to automate author co-citation analysis based on the Web Citation Database, a data warehouse for storing citation indices of Web publications. Describes the use of agglomerative hierarchical clustering for author clustering and multidimensional scaling for displaying author cluster maps, and explains PubSearch, a…
ERIC Educational Resources Information Center
van der Kloot, Willem A.; Spaans, Alexander M. J.; Heiser, Willem J.
2005-01-01
Hierarchical agglomerative cluster analysis (HACA) may yield different solutions under permutations of the input order of the data. This instability is caused by ties, either in the initial proximity matrix or arising during agglomeration. The authors recommend repeating the analysis on a large number of random permutations of the rows and columns…
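The tie problem described above can be reproduced with three items. This sketch uses complete linkage purely for illustration (the abstract does not name a linkage), and assumes a program that breaks ties by taking the first minimal pair in the order the data are presented. Running it over every input order and collecting the distinct two-cluster partitions shows that the result depends on the permutation; the function name is hypothetical.

```python
from itertools import permutations


def complete_linkage_partition(dist, order, k):
    """Complete linkage on items presented in `order`, stopped at k clusters.

    Ties are broken by the first minimal pair encountered in the presented
    order. Returns a set of frozensets of ORIGINAL item indices.
    """
    clusters = [[i] for i in order]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:  # strict <: first minimum wins
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return frozenset(frozenset(c) for c in clusters)


# Pairs 0-1 and 1-2 are tied at distance 1; pair 0-2 is at distance 2.
dist = [[0, 1, 2],
        [1, 0, 1],
        [2, 1, 0]]
solutions = {complete_linkage_partition(dist, p, 2) for p in permutations(range(3))}
print(len(solutions))  # prints 2: {0,1}|{2} or {0}|{1,2}, depending on order
```

Counting how often each distinct solution appears over many random permutations is one way to follow the authors' recommendation on larger data.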
2015-01-01
Background: Cellular processes are known to be modular and are realized by groups of proteins implicated in common biological functions. Such groups of proteins are called functional modules, and many community detection methods have been devised for their discovery from protein interaction network (PIN) data. In current agglomerative clustering approaches, vertices with only a few neighbors are often classified as separate clusters, which does not make sense biologically. Also, a major limitation of agglomerative techniques is that their computational efficiency does not scale well to large PINs. Finally, PIN data obtained from large-scale experiments generally contain many false positives, which makes it hard for agglomerative clustering methods to find the correct clusters, since they are known to be sensitive to noisy data. Results: We propose a local similarity premetric, the relative vertex clustering value, as a new criterion for deciding when a node can be added to a given node's cluster, which addresses the above three issues. Based on this criterion, we introduce a novel and very fast agglomerative clustering technique, FAC-PIN, for discovering functional modules and protein complexes from PIN data. Conclusions: Our proposed FAC-PIN algorithm is applied to nine PIN datasets from eight different species, including the yeast PIN, and the identified functional modules are validated using Gene Ontology (GO) annotations from DAVID Bioinformatics Resources. Identified protein complexes are also validated using experimentally verified complexes. Computational results show that FAC-PIN can discover functional modules or protein complexes from PINs more accurately and more efficiently than HC-PIN and CNM, the current state-of-the-art approaches for clustering PINs in an agglomerative manner. PMID:25734691
The Equivalence of Three Statistical Packages for Performing Hierarchical Cluster Analysis
ERIC Educational Resources Information Center
Blashfield, Roger
1977-01-01
Three different software programs which contain hierarchical agglomerative cluster analysis procedures were shown to generate different solutions on the same data set using apparently the same options. The basis for the differences in the solutions was the formulae used to calculate Euclidean distance. (Author/JKS)
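The abstract does not say which Euclidean formulae the packages used. As an assumption for illustration only, one plausible source of such a discrepancy is whether a program works with Euclidean or squared Euclidean distance. Both rank individual pairs identically, but their averages can rank differently, so average-linkage merges can diverge. The toy configuration and function names below are hypothetical.

```python
import math


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def squared_euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def average_link(metric, group_a, group_b):
    """Mean pairwise distance between two groups of points."""
    total = sum(metric(x, y) for x in group_a for y in group_b)
    return total / (len(group_a) * len(group_b))


def first_merge(metric):
    """Which of two candidate merges average linkage performs first."""
    cluster_x = [(0.0,), (4.0,)]          # an existing two-member cluster
    point_p = [(1.0,)]                    # candidate partner for cluster_x
    point_r, point_s = [(10.0,)], [(12.1,)]  # an unrelated close pair
    d_xp = average_link(metric, cluster_x, point_p)
    d_rs = average_link(metric, point_r, point_s)
    return "X+p" if d_xp < d_rs else "r+s"
```

With plain Euclidean distance the cluster-to-point average is 2.0 against 2.1, so X+p merges first; with squared distances it is 5.0 against about 4.41, so r+s merges first.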
Tian, Ting; McLachlan, Geoffrey J.; Dieters, Mark J.; Basford, Kaye E.
2015-01-01
It is a common occurrence in plant breeding programs to observe missing values in three-way three-mode multi-environment trial (MET) data. We proposed modifications of models for estimating missing observations for these data arrays, and developed a novel approach in terms of hierarchical clustering. Multiple imputation (MI) was used in four ways: multiple agglomerative hierarchical clustering, a normal distribution model, a normal regression model, and predictive mean matching. The latter three models used both Bayesian and non-Bayesian analysis, while the first approach used a clustering procedure with randomly selected attributes and assigned real values from the nearest neighbour to the entry with missing observations. Different proportions of data entries in six complete datasets were randomly selected to be missing, and the MI methods were compared based on the efficiency and accuracy of estimating those values. The results indicated that the models using Bayesian analysis had slightly higher estimation accuracy than those using non-Bayesian analysis, but they were more time-consuming. However, the novel approach of multiple agglomerative hierarchical clustering demonstrated the best overall performance. PMID:26689369
NASA Astrophysics Data System (ADS)
Crawford, I.; Ruske, S.; Topping, D. O.; Gallagher, M. W.
2015-07-01
In this paper we present improved methods for discriminating and quantifying Primary Biological Aerosol Particles (PBAP) by applying hierarchical agglomerative cluster analysis to multi-parameter ultraviolet-light-induced fluorescence (UV-LIF) spectrometer data. The methods employed in this study can be applied to data sets in excess of 1×10⁶ points on a desktop computer, allowing for each fluorescent particle in a dataset to be explicitly clustered. This reduces the potential for misattribution found in subsampling and comparative attribution methods used in previous approaches, improving our capacity to discriminate and quantify PBAP meta-classes. We evaluate the performance of several hierarchical agglomerative cluster analysis linkages and data normalisation methods using laboratory samples of known particle types and an ambient dataset. Fluorescent and non-fluorescent polystyrene latex spheres were sampled with a Wideband Integrated Bioaerosol Spectrometer (WIBS-4) where the optical size, asymmetry factor and fluorescent measurements were used as inputs to the analysis package. It was found that the Ward linkage with z-score or range normalisation performed best, correctly attributing 98 and 98.1 % of the data points respectively. The best performing methods were applied to the BEACHON-RoMBAS ambient dataset, where it was found that the z-score and range normalisation methods yield similar results, with each method producing clusters representative of fungal spores and bacterial aerosol, consistent with previous results. The z-score result was compared to clusters generated with previous approaches (WIBS AnalysiS Program, WASP) where we observe that the subsampling and comparative attribution method employed by WASP results in the overestimation of the fungal spore concentration by a factor of 1.5 and the underestimation of bacterial aerosol concentration by a factor of 5.
We suggest that this is likely due to errors arising from misattribution due to poor centroid definition and failure to assign particles to a cluster as a result of the subsampling and comparative attribution method employed by WASP. The methods used here allow for the entire fluorescent population of particles to be analysed, yielding an explicit cluster attribution for each particle, improving cluster centroid definition and our capacity to discriminate and quantify PBAP meta-classes compared to previous approaches.
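The two normalisations compared above put inputs with very different units (optical size, asymmetry factor, fluorescence) on a common scale before Ward linkage. A minimal sketch, applied to one input variable (column) at a time, assuming z-score uses the sample standard deviation and range normalisation means linear rescaling to [0, 1]; the paper's exact definitions may differ, and the function names are hypothetical.

```python
import statistics


def z_score(column):
    """Centre on the mean and divide by the sample standard deviation."""
    mu = statistics.mean(column)
    sd = statistics.stdev(column)
    return [(v - mu) / sd for v in column]


def range_normalise(column):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]
```

For example, the column [2, 4, 6] becomes [-1.0, 0.0, 1.0] under z-score and [0.0, 0.5, 1.0] under range normalisation.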
Application of agglomerative clustering for analyzing phylogenetically on bacterium of saliva
NASA Astrophysics Data System (ADS)
Bustamam, A.; Fitria, I.; Umam, K.
2017-07-01
Analyzing populations of Streptococcus bacteria is important because these species can cause dental caries, periodontal disease, halitosis (bad breath), and other problems. This paper discusses the phylogenetic relationships among Streptococcus bacteria in saliva using a phylogenetic tree built with agglomerative clustering methods. Starting from Streptococcus DNA sequences obtained from GenBank, characteristic features were extracted from the DNA sequences. The extracted features form a matrix, which was normalized using min-max normalization, and genetic distances were then calculated using the Manhattan distance. The agglomerative clustering techniques used were single linkage, complete linkage, and average linkage. In this agglomerative algorithm, the number of groups initially equals the number of individual species; the most similar species are grouped successively, with similarity decreasing at each step, until a single group is formed. The result of the grouping is a phylogenetic tree whose branches join at established distance levels: the smaller the distance, the greater the similarity between species. The implementation uses R, an open-source program.
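The three linkage rules named above differ only in how they summarise the item-to-item distances between two clusters. A minimal sketch with the Manhattan (city-block) distance, assuming the feature vectors have already been min-max normalized; the function names are hypothetical.

```python
def manhattan(x, y):
    """City-block distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def linkage_distance(cluster_a, cluster_b, method):
    """Distance between two clusters of feature vectors under a linkage rule."""
    pair_d = [manhattan(x, y) for x in cluster_a for y in cluster_b]
    if method == "single":    # nearest pair
        return min(pair_d)
    if method == "complete":  # farthest pair
        return max(pair_d)
    if method == "average":   # mean over all pairs
        return sum(pair_d) / len(pair_d)
    raise ValueError(f"unknown linkage: {method}")
```

For clusters [(0, 0), (1, 0)] and [(3, 0), (3, 2)], the pairwise distances are 3, 5, 2 and 4, so single, complete and average linkage give 2, 5 and 3.5 respectively.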
Segmenting Student Markets with a Student Satisfaction and Priorities Survey.
ERIC Educational Resources Information Center
Borden, Victor M. H.
1995-01-01
A market segmentation analysis of 872 university students compared two hierarchical clustering procedures for deriving market segments: one using matching-type measures and an agglomerative clustering algorithm, and one using chi-square-based automatic interaction detection. Results and implications for planning, evaluating, and improving academic…
NASA Astrophysics Data System (ADS)
Crawford, I.; Ruske, S.; Topping, D. O.; Gallagher, M. W.
2015-11-01
In this paper we present improved methods for discriminating and quantifying primary biological aerosol particles (PBAPs) by applying hierarchical agglomerative cluster analysis to multi-parameter ultraviolet-light-induced fluorescence (UV-LIF) spectrometer data. The methods employed in this study can be applied to data sets in excess of 1 × 10⁶ points on a desktop computer, allowing for each fluorescent particle in a data set to be explicitly clustered. This reduces the potential for misattribution found in subsampling and comparative attribution methods used in previous approaches, improving our capacity to discriminate and quantify PBAP meta-classes. We evaluate the performance of several hierarchical agglomerative cluster analysis linkages and data normalisation methods using laboratory samples of known particle types and an ambient data set. Fluorescent and non-fluorescent polystyrene latex spheres were sampled with a Wideband Integrated Bioaerosol Spectrometer (WIBS-4) where the optical size, asymmetry factor and fluorescent measurements were used as inputs to the analysis package. It was found that the Ward linkage with z-score or range normalisation performed best, correctly attributing 98 and 98.1 % of the data points respectively. The best-performing methods were applied to the BEACHON-RoMBAS (Bio-hydro-atmosphere interactions of Energy, Aerosols, Carbon, H2O, Organics and Nitrogen-Rocky Mountain Biogenic Aerosol Study) ambient data set, where it was found that the z-score and range normalisation methods yield similar results, with each method producing clusters representative of fungal spores and bacterial aerosol, consistent with previous results.
The z-score result was compared to clusters generated with previous approaches (WIBS AnalysiS Program, WASP) where we observe that the subsampling and comparative attribution method employed by WASP results in the overestimation of the fungal spore concentration by a factor of 1.5 and the underestimation of bacterial aerosol concentration by a factor of 5. We suggest that this is likely due to errors arising from misattribution due to poor centroid definition and failure to assign particles to a cluster as a result of the subsampling and comparative attribution method employed by WASP. The methods used here allow for the entire fluorescent population of particles to be analysed, yielding an explicit cluster attribution for each particle and improving cluster centroid definition and our capacity to discriminate and quantify PBAP meta-classes compared to previous approaches.
A meadow site classification for the Sierra Nevada, California
Raymond D. Ratliff
1982-01-01
This report describes 14 meadow site classes derived through techniques of agglomerative cluster analysis. The class names are: Carex rostrata (beaked sedge), Poa (Kentucky bluegrass), Heleocharis/Heleocharis (ephemeral-lake), Hypericum/Polygonum/Viola (hillside bog), Trifolium/...
Strong influence of variable treatment on the performance of numerically defined ecological regions.
Snelder, Ton; Lehmann, Anthony; Lamouroux, Nicolas; Leathwick, John; Allenbach, Karin
2009-10-01
Numerical clustering has frequently been used to define hierarchically organized ecological regionalizations, but there has been little robust evaluation of their performance (i.e., the degree to which regions discriminate areas with similar ecological character). In this study we investigated the effect of the weighting and treatment of input variables on the performance of regionalizations defined by agglomerative clustering across a range of hierarchical levels. For this purpose, we developed three ecological regionalizations of Switzerland of increasing complexity using agglomerative clustering. Environmental data for our analysis were drawn from a 400 m grid and consisted of estimates of 11 environmental variables for each grid cell describing climate, topography and lithology. Regionalization 1 was defined from the environmental variables which were given equal weights. We used the same variables in Regionalization 2 but weighted and transformed them on the basis of a dissimilarity model that was fitted to land cover composition data derived for a random sample of cells from interpretation of aerial photographs. Regionalization 3 was a further two-stage development of Regionalization 2 where specific classifications, also weighted and transformed using dissimilarity models, were applied to 25 small scale "sub-domains" defined by Regionalization 2. Performance was assessed in terms of the discrimination of land cover composition for an independent set of sites using classification strength (CS), which measured the similarity of land cover composition within classes and the dissimilarity between classes. Regionalization 2 performed significantly better than Regionalization 1, but the largest gains in performance, compared to Regionalization 1, occurred at coarse hierarchical levels (i.e., CS did not increase significantly beyond the 25-region level). 
Regionalization 3 performed better than Regionalization 2 beyond the 25-region level, and CS values continued to increase up to the 95-region level. The results show that the performance of regionalizations defined by agglomerative clustering is sensitive to variable weighting and transformation. We conclude that large gains in performance can be achieved by training classifications using dissimilarity models. However, these gains are restricted to a narrow range of hierarchical levels because agglomerative clustering is unable to represent the variation in importance of variables at different spatial scales. We suggest that further advances in the numerical definition of hierarchically organized ecological regionalizations will be possible with techniques developed in the field of statistical modeling of the distribution of community composition.
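Classification strength (CS), as described above, contrasts similarity within classes with similarity between classes. The abstract does not give the formula, so the sketch below assumes a simplified pooled-pair form: the mean similarity over all within-class site pairs minus the mean over all between-class pairs. Published definitions may weight classes differently (for example by class size), and the function name is hypothetical.

```python
def classification_strength(sim, labels):
    """Simplified CS: mean within-class minus mean between-class similarity.

    sim: symmetric site-by-site similarity matrix (list of lists).
    labels: class assigned to each site by the regionalization.
    """
    within, between = [], []
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            (within if labels[i] == labels[j] else between).append(sim[i][j])
    if not within or not between:
        raise ValueError("need at least one within-class and one between-class pair")
    return sum(within) / len(within) - sum(between) / len(between)
```

Higher values mean the classes group sites that are more alike and separate sites that differ; a value near zero means the classes discriminate little.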
Wilderness ecology: virgin plant communities of the Boundary Waters Canoe Area.
Lewis F. Ohmann; Robert R. Ream
1971-01-01
Describes virgin plant communities in the Boundary Waters Canoe Area. Data from all vegetative components of 106 virgin upland stands were used to construct a community classification through a combination of agglomerative clustering and principal components analysis. Discusses the relation of communities to their environment and to past wildfires.
High- and low-level hierarchical classification algorithm based on source separation process
NASA Astrophysics Data System (ADS)
Loghmari, Mohamed Anis; Karray, Emna; Naceur, Mohamed Saber
2016-10-01
High-dimensional data applications have attracted great attention in recent years. We focus on the analysis of remote sensing data in high-dimensional spaces, such as hyperspectral data. From a methodological viewpoint, remote sensing data analysis is not a trivial task. Its complexity is caused by many factors, such as large spectral or spatial variability as well as the curse of dimensionality, which describes the problem of data sparseness. In this ill-posed setting, a reliable classification approach requires appropriate modeling of the classification process. The proposed approach is based on a hierarchical clustering algorithm designed to handle remote sensing data in high-dimensional space. One obvious way to perform dimensionality reduction is to use independent component analysis as a preprocessing step. The first particularity of our method is the special structure of its cluster tree. Most hierarchical algorithms associate leaves with individual clusters and start from a large number of individual classes equal to the number of pixels; in our approach, however, leaves are associated with the most relevant sources, which are represented along mutually independent axes to specifically represent land covers associated with a limited number of clusters. These sources contribute to the refinement of the clustering by providing complementary rather than redundant information. The second particularity of our approach is that at each level of the cluster tree, we combine a high-level divisive clustering with a low-level agglomerative clustering. This reduces the computational cost, since the high-level divisive clustering is controlled by a simple Boolean operator, and optimizes the clustering results, since the low-level agglomerative clustering is guided by the most relevant independent sources.
Then, at each new step, we obtain a finer partition that participates in the clustering process to enhance semantic capabilities and give good identification rates.
Strand, Edythe A; McCauley, Rebecca J; Weigand, Stephen D; Stoeckel, Ruth E; Baas, Becky S
2013-04-01
In this article, the authors report reliability and validity evidence for the Dynamic Evaluation of Motor Speech Skill (DEMSS), a new test that uses dynamic assessment to aid in the differential diagnosis of childhood apraxia of speech (CAS). Participants were 81 children between 36 and 79 months of age who were referred to the Mayo Clinic for diagnosis of speech sound disorders. Children were given the DEMSS and a standard speech and language test battery as part of routine evaluations. Subsequently, intrajudge, interjudge, and test-retest reliability were evaluated for a subset of participants. Construct validity was explored for all 81 participants through the use of agglomerative cluster analysis, sensitivity measures, and likelihood ratios. The mean percentage of agreement for 171 judgments was 89% for test-retest reliability, 89% for intrajudge reliability, and 91% for interjudge reliability. Agglomerative hierarchical cluster analysis showed that total DEMSS scores largely differentiated clusters of children with CAS vs. mild CAS vs. other speech disorders. Positive and negative likelihood ratios and measures of sensitivity and specificity suggested that the DEMSS does not overdiagnose CAS but sometimes fails to identify children with CAS. The value of the DEMSS in differential diagnosis of severe speech impairments was supported on the basis of evidence of reliability and validity.
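Sensitivity, specificity and likelihood ratios follow directly from a 2 × 2 table of test result against diagnosis. In these terms, a test that "does not overdiagnose" but "sometimes fails to identify" cases has high specificity and imperfect sensitivity. A minimal sketch; the function name and the counts in the usage note are hypothetical, not the study's data.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Return (sensitivity, specificity, LR+, LR-) from a 2x2 table."""
    sensitivity = tp / (tp + fn)   # share of true cases the test flags
    specificity = tn / (tn + fp)   # share of non-cases the test clears
    # LR+ = sens / (1 - spec); infinite when there are no false positives.
    lr_pos = sensitivity / (1 - specificity) if specificity < 1 else float("inf")
    lr_neg = (1 - sensitivity) / specificity
    return sensitivity, specificity, lr_pos, lr_neg
```

With hypothetical counts tp=8, fn=2, fp=1, tn=9, this gives sensitivity 0.8, specificity 0.9, LR+ of about 8 and LR- of about 0.22.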
OMERACT-based fibromyalgia symptom subgroups: an exploratory cluster analysis.
Vincent, Ann; Hoskin, Tanya L; Whipple, Mary O; Clauw, Daniel J; Barton, Debra L; Benzo, Roberto P; Williams, David A
2014-10-16
The aim of this study was to identify subsets of patients with fibromyalgia with similar symptom profiles using the Outcome Measures in Rheumatology (OMERACT) core symptom domains. Female patients with a diagnosis of fibromyalgia and currently meeting fibromyalgia research survey criteria completed the Brief Pain Inventory, the 30-item Profile of Mood States, the Medical Outcomes Sleep Scale, the Multidimensional Fatigue Inventory, the Multiple Ability Self-Report Questionnaire, the Fibromyalgia Impact Questionnaire-Revised (FIQ-R) and the Short Form-36 between 1 June 2011 and 31 October 2011. Hierarchical agglomerative clustering was used to identify subgroups of patients with similar symptom profiles. To validate the results from this sample, hierarchical agglomerative clustering was repeated in an external sample of female patients with fibromyalgia with similar inclusion criteria. A total of 581 females with a mean age of 55.1 (range, 20.1 to 90.2) years were included. A four-cluster solution best fit the data, and each clustering variable differed significantly (P < 0.0001) among the four clusters. The four clusters divided the sample into severity levels: Cluster 1 reflects the lowest average levels across all symptoms, and Cluster 4 reflects the highest average levels. Clusters 2 and 3 capture moderate symptom levels. Clusters 2 and 3 differed mainly in profiles of anxiety and depression, with Cluster 2 having lower levels of depression and anxiety than Cluster 3, despite higher levels of pain. The results of the cluster analysis of the external sample (n = 478) looked very similar to those found in the original cluster analysis, except for a slight difference in sleep problems. This was despite having patients in the validation sample who were significantly younger (P < 0.0001) and had more severe symptoms (higher FIQ-R total scores (P = 0.0004)).
In our study, we incorporated core OMERACT symptom domains, which allowed for clustering based on a comprehensive symptom profile. Although our exploratory cluster solution needs confirmation in a longitudinal study, this approach could provide a rationale to support the study of individualized clinical evaluation and intervention.
A Comparison of Two Approaches to Beta-Flexible Clustering.
ERIC Educational Resources Information Center
Belbin, Lee; And Others
1992-01-01
A method for hierarchical agglomerative polythetic (multivariate) clustering, based on the unweighted pair-group method using arithmetic averages (UPGMA), is compared with the original beta-flexible technique, a weighted average method. Reasons the flexible UPGMA strategy is recommended are discussed, focusing on the ability to recover cluster structure over…
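Both strategies compared above can be written as Lance-Williams update rules giving the distance from a cluster k to the cluster formed by merging i and j. The sketch below reflects our reading of the usual forms: the original beta-flexible rule weights i and j equally, while flexible UPGMA weights them by their sizes n_i and n_j. The default beta = -0.25 is a commonly used value, not one taken from this abstract, and the function names are hypothetical.

```python
def beta_flexible(d_ki, d_kj, d_ij, beta=-0.25):
    """Original (weighted) beta-flexible update: i and j weighted equally."""
    return (1 - beta) / 2 * (d_ki + d_kj) + beta * d_ij


def flexible_upgma(d_ki, d_kj, d_ij, n_i, n_j, beta=-0.25):
    """Flexible UPGMA update: i and j weighted by their cluster sizes."""
    return (1 - beta) * (n_i * d_ki + n_j * d_kj) / (n_i + n_j) + beta * d_ij
```

With beta = 0 the two rules reduce to weighted (WPGMA) and unweighted (UPGMA) averaging. For example, with d_ki = 2, d_kj = 6, n_i = 3 and n_j = 1, they give 4.0 and 3.0 respectively.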
Performance analysis of clustering techniques over microarray data: A case study
NASA Astrophysics Data System (ADS)
Dash, Rasmita; Misra, Bijan Bihari
2018-03-01
Handling big data is one of the major issues in the field of statistical data analysis. In such investigations, cluster analysis plays a vital role in dealing with large-scale data. There are many clustering techniques, each taking a different approach to cluster analysis, but which approach suits a particular dataset is difficult to predict. To deal with this problem, a grading approach is applied over many clustering techniques to identify a stable technique. However, the grading approach depends on the characteristics of the dataset as well as on the validity indices, so a two-stage grading approach is implemented. In this study, the grading approach is implemented over five clustering techniques: hybrid swarm based clustering (HSC), k-means, partitioning around medoids (PAM), vector quantization (VQ), and agglomerative nesting (AGNES). The experiments are conducted on five microarray datasets with seven validity indices. The grading approach's finding that a clustering technique is significant is also confirmed by the Nemenyi post-hoc test.
Empirical Identification of Hierarchies.
ERIC Educational Resources Information Center
McCormick, Douglas; And Others
Outlining a cluster procedure which maximizes specific criteria while building scales from binary measures using a sequential, agglomerative, overlapping, non-hierarchic method results in indices giving truer results than exploratory factor analyses or multidimensional scaling. In a series of eleven figures, patterns within cluster histories…
Abramyan, Tigran M; Snyder, James A; Thyparambil, Aby A; Stuart, Steven J; Latour, Robert A
2016-08-05
Clustering methods have been widely used to group together similar conformational states from molecular simulations of biomolecules in solution. For applications such as the interaction of a protein with a surface, the orientation of the protein relative to the surface is also an important clustering parameter because of its potential effect on adsorbed-state bioactivity. This study presents cluster analysis methods that are specifically designed for systems where both molecular orientation and conformation are important, and the methods are demonstrated using test cases of adsorbed proteins for validation. Additionally, because cluster analysis can be a very subjective process, an objective procedure for identifying both the optimal number of clusters and the best clustering algorithm to be applied to analyze a given dataset is presented. The method is demonstrated for several agglomerative hierarchical clustering algorithms used in conjunction with three cluster validation techniques. © 2016 Wiley Periodicals, Inc.
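The abstract does not name its three cluster validation techniques. As an assumed example of how such an index supports an objective choice of cluster number, the mean silhouette width compares each item's average distance to its own cluster (a) with its smallest average distance to another cluster (b); candidate solutions with a higher mean score are preferred. A minimal sketch from a full distance matrix, with a hypothetical function name.

```python
def silhouette(dist, labels):
    """Mean silhouette width for a labelling, given a full distance matrix.

    For each item: a = mean distance to the rest of its own cluster,
    b = smallest mean distance to any other cluster,
    s = (b - a) / max(a, b). Items alone in their cluster score 0.
    """
    n = len(labels)
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i in range(n):
        own = [dist[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(
            sum(dist[i][j] for j in range(n) if labels[j] == c)
            / sum(1 for j in range(n) if labels[j] == c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / n
```

For points 0, 1, 10 and 11, grouping {0, 1} and {10, 11} scores about 0.90, while the interleaved grouping {0, 10} and {1, 11} scores below zero.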
Hierarchic Agglomerative Clustering Methods for Automatic Document Classification.
ERIC Educational Resources Information Center
Griffiths, Alan; And Others
1984-01-01
Considers classifications produced by application of single linkage, complete linkage, group average, and word clustering methods to Keen and Cranfield document test collections, and studies structure of hierarchies produced, extent to which methods distort input similarity matrices during classification generation, and retrieval effectiveness…
A similarity based agglomerative clustering algorithm in networks
NASA Astrophysics Data System (ADS)
Liu, Zhiyuan; Wang, Xiujuan; Ma, Yinghong
2018-04-01
The detection of clusters is beneficial for understanding the organization and functions of networks. Clusters, or communities, are usually groups of nodes that are densely interconnected but sparsely linked with other clusters. To identify communities, an efficient and effective agglomerative community detection algorithm based on node similarity is proposed. The method first calculates the similarity between each pair of nodes and forms pre-partitions according to the principle that each node belongs to the same community as its most similar neighbor. It then checks whether each pre-partition satisfies a community criterion. Pre-partitions that do not satisfy it are merged with the partitions that have the greatest attraction for them until there are no further changes. To measure the attraction of a partition, we propose an attraction index based on the importance of the linked nodes in the network. In this way, the proposed method better exploits node properties and network structure. To test the performance of the algorithm, both synthetic and empirical networks of different scales are used. Simulation results show that the proposed algorithm obtains better clustering results than six other widely used community detection algorithms.
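The pre-partition step described above (each node joins the community of its most similar neighbor) can be sketched with a union-find structure. The abstract does not specify the node similarity, so Jaccard similarity of closed neighborhoods is assumed here; the later community-criterion check and attraction-index merging are not shown. Function names are hypothetical.

```python
def jaccard(adj, u, v):
    """Jaccard similarity of closed neighbourhoods (assumed similarity)."""
    nu, nv = adj[u] | {u}, adj[v] | {v}
    return len(nu & nv) / len(nu | nv)


def pre_partition(adj):
    """Join every node to the group of its most similar neighbour."""
    parent = {v: v for v in adj}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u in adj:
        if adj[u]:
            # Scan neighbours in sorted order so ties resolve deterministically.
            best = max(sorted(adj[u]), key=lambda v: jaccard(adj, u, v))
            parent[find(u)] = find(best)
    groups = {}
    for v in adj:
        groups.setdefault(find(v), set()).add(v)
    return sorted(sorted(g) for g in groups.values())
```

On two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge 2-3, the bridge pair has similarity 1/3 while within-triangle pairs score 3/4 or 1, so the sketch returns [[0, 1, 2], [3, 4, 5]].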
Optimal wavelength band clustering for multispectral iris recognition.
Gong, Yazhuo; Zhang, David; Shi, Pengfei; Yan, Jingqi
2012-07-01
This work explores the possibility of clustering spectral wavelengths based on the maximum dissimilarity of iris textures. The eventual goal is to determine how many bands of spectral wavelengths will be enough for iris multispectral fusion and to find the bands that will provide higher performance in iris multispectral recognition. A multispectral acquisition system was first designed for imaging the iris at narrow spectral bands in the range of 420 to 940 nm. Next, a set of 60 human iris images corresponding to the right and left eyes of 30 different subjects was acquired for analysis. Finally, we determined that 3 clusters were enough to represent the 10 feature bands of spectral wavelengths using agglomerative clustering based on two-dimensional principal component analysis. The experimental results suggest (1) the number, center, and composition of clusters of spectral wavelengths and (2) the higher performance of iris multispectral recognition based on a three-wavelength-band fusion.
Shapira, Aviad; Shoshany, Maxim; Nir-Goldenberg, Sigal
2013-07-01
Environmental management and planning are instrumental in resolving conflicts arising between societal needs for economic development on the one hand and for open green landscapes on the other hand. Allocating green corridors between fragmented core green areas may provide a partial solution to these conflicts. Decisions regarding green corridor development require the assessment of alternative allocations based on multiple criteria evaluations. Analytical Hierarchy Process provides a methodology both for a structured and consistent extraction of such evaluations and for the search for consensus among experts regarding the weights assigned to the different criteria. Implementing this methodology with 15 Israeli experts (landscape architects, regional planners, and geographers) revealed inherent differences in expert opinion in this field that go beyond professional divisions. The use of Agglomerative Hierarchical Clustering made it possible to identify clusters representing common decisions regarding criterion weights. Aggregating the evaluations of these clusters revealed an important dichotomy between a pragmatist approach that emphasizes the weight of statutory criteria and an ecological approach that emphasizes the role of the natural conditions in allocating green landscape corridors.
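In the Analytical Hierarchy Process, each expert fills a reciprocal pairwise-comparison matrix, and that expert's criterion weights are derived from it. Saaty's standard method uses the principal eigenvector; the sketch below swaps in the common row-geometric-mean approximation, which gives the same weights for a perfectly consistent matrix. The resulting weight vectors, one per expert, are what an agglomerative clustering would then group. The function name is hypothetical.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights by normalised row geometric means.

    pairwise[i][j] states how much more important criterion i is than j
    (Saaty's 1-9 scale), with pairwise[j][i] == 1 / pairwise[i][j].
    """
    n = len(pairwise)
    geo = [math.prod(row) ** (1.0 / n) for row in pairwise]  # Python 3.8+
    total = sum(geo)
    return [g / total for g in geo]
```

For a consistent matrix in which criterion 1 is twice as important as criteria 2 and 3, [[1, 2, 2], [0.5, 1, 1], [0.5, 1, 1]], the weights are 0.5, 0.25 and 0.25.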
Alexander, Nathan; Woetzel, Nils; Meiler, Jens
2011-02-01
Clustering algorithms are used as data analysis tools in a wide variety of applications in biology. Clustering has become especially important in protein structure prediction and virtual high-throughput screening methods. In protein structure prediction, clustering is used to structure the conformational space of thousands of protein models. In virtual high-throughput screening, databases with millions of drug-like molecules are organized by structural similarity, e.g. common scaffolds. The tree-like dendrogram obtained from hierarchical clustering can provide a qualitative overview of the results, which is important for focusing detailed analysis. However, in practice it is difficult to relate specific components of the dendrogram directly back to the objects it comprises and to display all desired information within the two dimensions of the dendrogram. The current work presents a hierarchical agglomerative clustering method termed bcl::Cluster. bcl::Cluster uses the PyMOL Molecular Graphics System to depict dendrograms in three dimensions. This allows simultaneous display of the relevant biological molecules as well as additional information about the clusters and their members.
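The merge history that a dendrogram draws can be illustrated with a naive average-linkage (UPGMA) sketch. This is not bcl::Cluster's implementation; the function name `average_linkage`, the toy points, and the O(n³) search are illustrative assumptions suited only to small distance matrices.

```python
from itertools import combinations


def average_linkage(dist):
    """Naive average-linkage (UPGMA) clustering of a symmetric distance matrix.

    Returns the merge history as (members_a, members_b, height) tuples,
    which is the information a dendrogram displays.
    """
    clusters = {i: (i,) for i in range(len(dist))}
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(sorted(clusters), 2):
            ma, mb = clusters[a], clusters[b]
            # Mean of all pairwise distances between the two clusters.
            d = sum(dist[i][j] for i in ma for j in mb) / (len(ma) * len(mb))
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        clusters[a] = clusters[a] + clusters[b]  # merged cluster keeps key a
        del clusters[b]
    return merges


# Five hypothetical points on a line; distances are absolute differences.
xs = [0, 1, 5, 6, 20]
dist = [[abs(p - q) for q in xs] for p in xs]
for left, right, height in average_linkage(dist):
    print(left, right, height)
```

Each printed row is one internal node of the dendrogram: the two merged member sets and the height at which they join.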
Orsi, Rebecca
2017-02-01
Concept mapping is now a commonly used technique for articulating and evaluating programmatic outcomes. However, research on the validity of the knowledge and outcomes produced with concept mapping is sparse. The current study describes quantitative validity analyses using a concept mapping dataset. We sought to increase the validity of concept mapping evaluation results by running multiple cluster analysis methods and then using several metrics to choose among the solutions. We present four clustering methods, based on analyses using the R statistical software package: partitioning around medoids (PAM), fuzzy analysis (FANNY), agglomerative nesting (AGNES) and divisive analysis (DIANA). We then used the Dunn and Davies-Bouldin indices to help choose a valid cluster solution for a concept mapping outcomes evaluation. Based on these analyses, we conclude that the validity of the outcomes map is high. Finally, we discuss areas for further research on concept mapping methods. Copyright © 2016 Elsevier Ltd. All rights reserved.
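The two validity indices can be sketched from their usual definitions: the Dunn index divides the smallest distance between points in different clusters by the largest within-cluster diameter (higher is better), while the Davies-Bouldin index averages, over clusters, the worst ratio of summed scatter to centroid separation (lower is better). This is a from-scratch Python sketch with Euclidean distance, not the R functions used in the study, and several variants of both indices exist; the points and labels are hypothetical.

```python
import math
from itertools import combinations


def _groups(points, labels):
    """Map each cluster label to the list of its points."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return groups


def dunn_index(points, labels):
    """Smallest between-cluster distance / largest cluster diameter (higher is better)."""
    groups = list(_groups(points, labels).values())
    diameter = max(
        (math.dist(p, q) for g in groups for p, q in combinations(g, 2)), default=0.0
    )
    separation = min(
        math.dist(p, q) for g, h in combinations(groups, 2) for p in g for q in h
    )
    return math.inf if diameter == 0 else separation / diameter


def davies_bouldin(points, labels):
    """Mean over clusters of the worst (s_i + s_j) / d(c_i, c_j) ratio (lower is better)."""
    groups = _groups(points, labels)
    cent = {k: tuple(sum(c) / len(g) for c in zip(*g)) for k, g in groups.items()}
    # s_i: mean distance from a cluster's points to its centroid.
    scat = {k: sum(math.dist(p, cent[k]) for p in g) / len(g) for k, g in groups.items()}
    worst = [
        max((scat[i] + scat[j]) / math.dist(cent[i], cent[j]) for j in groups if j != i)
        for i in groups
    ]
    return sum(worst) / len(worst)


pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
labs = [0, 0, 1, 1]
print(dunn_index(pts, labs), davies_bouldin(pts, labs))  # 10.0 0.1
```

Comparing these values across candidate solutions (for example, PAM versus AGNES at several cluster counts) is the kind of selection step the abstract describes.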
Verma, Priyanka; Kumar, Manoj; Mishra, Girish; Sahoo, Dinabandhu
2017-02-01
In the present study, thirty seaweeds from Indian coasts were bioprospected by analyzing their biochemical components, including pigments, fatty acids and ash content. Multivariate analysis of the biochemical components and fatty acids was performed using Principal Component Analysis (PCA) and Agglomerative hierarchical clustering (AHC) to reveal chemotaxonomic relationships among the seaweeds. The overall analysis suggests that these seaweeds have multi-functional properties and can be utilized as a promising bioresource of proteins, lipids, pigments and carbohydrates for the food/feed and biofuel industries. Copyright © 2016. Published by Elsevier Ltd.
An agglomerative hierarchical clustering approach to visualisation in Bayesian clustering problems
Dawson, Kevin J.; Belkhir, Khalid
2009-01-01
Clustering problems (including the clustering of individuals into outcrossing populations, hybrid generations, full-sib families and selfing lines) have recently received much attention in population genetics. In these clustering problems, the parameter of interest is a partition of the set of sampled individuals: the sample partition. In a fully Bayesian approach to clustering problems of this type, our knowledge about the sample partition is represented by a probability distribution on the space of possible sample partitions. Since the number of possible partitions grows very rapidly with the sample size, we cannot visualise this probability distribution in its entirety unless the sample is very small. As a solution to this visualisation problem, we recommend using an agglomerative hierarchical clustering algorithm, which we call the exact linkage algorithm. This algorithm is a special case of the maximin clustering algorithm that we introduced previously. The exact linkage algorithm is now implemented in our software package Partition View. The exact linkage algorithm takes the posterior co-assignment probabilities as input and yields as output a rooted binary tree or, more generally, a forest of such trees. Each node of this forest defines a set of individuals, and the node height is the posterior co-assignment probability of this set. This provides a useful visual representation of the uncertainty associated with the assignment of individuals to categories. It is also a useful starting point for a more detailed exploration of the posterior distribution in terms of the co-assignment probabilities. PMID:19337306
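The input to the exact linkage algorithm, the posterior co-assignment probabilities, can be estimated from posterior (e.g. MCMC) draws of the sample partition. The sketch below assumes, hypothetically, that each draw is stored as one label vector; it computes pairwise co-assignment probabilities and the probability that a whole set of individuals is co-assigned, which is the quantity used as a node height. It does not implement the exact linkage algorithm itself or Partition View.

```python
def coassignment_matrix(samples):
    """Pairwise posterior co-assignment probabilities.

    samples: list of label vectors, one per posterior draw of the partition.
    Labels are only compared within a draw, so label switching is harmless.
    """
    n = len(samples[0])
    counts = [[0] * n for _ in range(n)]
    for labels in samples:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(samples) for c in row] for row in counts]


def set_coassignment(samples, members):
    """Fraction of draws in which all individuals in `members` share one label."""
    hits = sum(1 for labels in samples if len({labels[i] for i in members}) == 1)
    return hits / len(samples)


# Four hypothetical posterior draws for three individuals.
samples = [[0, 0, 1], [0, 0, 0], [1, 0, 0], [0, 0, 1]]
P = coassignment_matrix(samples)
print(P[0][1], P[1][2], set_coassignment(samples, [0, 1, 2]))  # 0.75 0.5 0.25
```

Note that the set probability (0.25) can be lower than every pairwise probability in that set, which is why a tree built from set-level co-assignment conveys more than the pairwise matrix alone.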
Interactive visual exploration and analysis of origin-destination data
NASA Astrophysics Data System (ADS)
Ding, Linfang; Meng, Liqiu; Yang, Jian; Krisp, Jukka M.
2018-05-01
In this paper, we propose a visual analytics approach for exploring spatiotemporal interaction patterns in massive origin-destination data. First, we visually query the movement database for data within certain time windows. Second, we conduct interactive clustering that allows users to select input variables/features (e.g., origins, destinations, distance, and duration) and to adjust clustering parameters (e.g., distance threshold). The agglomerative hierarchical clustering method is applied for the multivariate clustering of the origin-destination data. Third, we design a parallel coordinates plot for visualizing the precomputed clusters and for further exploration of interesting clusters. Finally, we propose a gradient line rendering technique to show the spatial and directional distribution of origin-destination clusters on a map view. We implement the visual analytics approach in a web-based interactive environment and apply it to real-world floating car data from Shanghai. The experimental results show the origin/destination hotspots and their spatial interaction patterns. They also demonstrate the effectiveness of our proposed approach.
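To illustrate how a distance threshold controls an agglomerative clustering, the sketch below assumes single linkage (the abstract does not state the linkage) and represents each trip by hypothetical origin and destination coordinates. For single linkage, cutting the dendrogram at a threshold yields exactly the connected components of the graph that links trips no farther apart than that threshold, so a union-find structure is enough.

```python
import math


def threshold_clusters(trips, threshold):
    """Single-linkage clusters of trips cut at `threshold` (via union-find).

    trips: hypothetical (origin_x, origin_y, dest_x, dest_y) tuples.
    Returns one integer label per trip, numbered in order of first appearance.
    """
    parent = list(range(len(trips)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(trips)):
        for j in range(i + 1, len(trips)):
            if math.dist(trips[i], trips[j]) <= threshold:
                parent[find(i)] = find(j)

    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(trips))]


trips = [(0, 0, 5, 5), (0.1, 0, 5, 5.1), (10, 10, 0, 0), (10, 10.2, 0, 0.1)]
print(threshold_clusters(trips, 1.0))   # [0, 0, 1, 1]
print(threshold_clusters(trips, 0.15))  # [0, 0, 1, 2]
```

Lowering the threshold splits the second pair of trips, which is the kind of interactive adjustment the approach exposes to users.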
Iron status as a covariate in methylmercury-associated neurotoxicity risk.
Fonseca, Márlon de Freitas; De Souza Hacon, Sandra; Grandjean, Philippe; Choi, Anna Lai; Bastos, Wanderley Rodrigues
2014-04-01
Intrauterine methylmercury exposure and prenatal iron deficiency negatively affect offspring's brain development. Since fish is a major source of both methylmercury and iron, negative confounding may affect the interpretation of studies concerning cognition. We assessed relationships between methylmercury exposure and iron status in childbearing females from a population naturally exposed to methylmercury through fish intake (Amazon). We conducted a census (refusal rate <20%), collecting samples from 274 healthy females (12-49 years) for hair-mercury determination, and assessed iron status through red cell tests and determination of serum ferritin and iron. C-reactive protein and thyroid hormones were used to exclude inflammation and severe thyroid dysfunction that could affect the results. We assessed the association between iron status and hair-mercury by bivariate correlation analysis and by different multivariate models: linear regression (to check trends); a hierarchical agglomerative clustering method (groups of variables correlated with each other); and factor analysis (to examine redundancy or duplication in a set of correlated variables). Hair-mercury correlated weakly with mean corpuscular volume (r=.141; P=.020) and corpuscular hemoglobin (r=.132; P=.029), but not with the best biomarker of iron status, ferritin (r=.037; P=.545). In the linear regression analysis, methylmercury exposure showed a weak association with age-adjusted ferritin; age had a significant coefficient (Beta=.015; 95% CI: .003-.027; P=.016) but ferritin did not (Beta=.034; 95% CI: -.147 to .216; P=.711). In the hierarchical agglomerative clustering method, hair-mercury and iron status showed the smallest similarities. In the factor analysis, iron status and hair-mercury loaded on different uncorrelated components. We concluded that iron status and methylmercury exposure probably vary independently. Copyright © 2013 Elsevier Ltd. All rights reserved.
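When variables rather than subjects are clustered, one common choice of dissimilarity (an assumption here, since the abstract does not name one) is 1 - |r|: strongly correlated variables, whether positively or negatively, are close, and uncorrelated variables are far apart. The sketch computes that matrix for made-up variables; any linkage, such as average linkage, could then be applied to it.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_distances(variables):
    """1 - |r| for every ordered pair of named variables."""
    return {
        (a, b): 1 - abs(pearson(variables[a], variables[b]))
        for a in variables
        for b in variables
    }


# Hypothetical measurements of four variables on the same four subjects.
variables = {
    "x": [1, 2, 3, 4],
    "y": [2, 4, 6, 8],    # perfectly correlated with x
    "z": [4, 3, 2, 1],    # perfectly anti-correlated with x
    "w": [1, -1, 1, -1],  # only weakly related to x
}
d = correlation_distances(variables)
print(round(d[("x", "y")], 3), round(d[("x", "z")], 3), round(d[("x", "w")], 3))
```

Under this measure, two variables that "show the smallest similarities", as hair-mercury and iron status did, would sit at the top of the variable dendrogram, joining only at the final merges.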
Huh, Yong; Yu, Kiyun; Park, Woojin
2016-01-01
This paper proposes a method to detect corresponding vertex pairs between planar tessellation datasets. Applying agglomerative hierarchical co-clustering, the method finds geometrically corresponding cell-set pairs, from which corresponding vertex pairs are detected. The map transformation is then performed with these vertex pairs. Since the pairs are detected independently for each corresponding cell-set pair, the method achieves improved matching performance regardless of locally uneven positional discrepancies between datasets. The proposed method was applied to complicated synthetic cell datasets treated as a cadastral map and a topographical map, and achieved an F-measure of 0.84, compared with 0.48 for a previous matching method.
Hawkins, Misty A.W.; Schaefer, Julie T.; Gunstad, John; Dolansky, Mary A.; Redle, Joseph D.; Josephson, Richard; Moore, Shirley M.; Hughes, Joel W.
2014-01-01
Purpose: To determine whether patients with heart failure (HF) have distinct profiles of cognitive impairment. Background: Cognitive impairment is common in HF. Recent work found three cognitive profiles in HF patients: (1) intact, (2) impaired, and (3) memory-impaired. We examined the reproducibility of these profiles and clarified mechanisms. Methods: HF patients (68.6±9.7 years; N=329) completed neuropsychological testing. Composite scores were created for cognitive domains and used to identify clusters via agglomerative hierarchical cluster analysis. Results: A 3-cluster solution emerged. Cluster 1 (n=109) had intact cognition. Cluster 2 (n=123) was impaired across all domains. Cluster 3 (n=97) had impaired memory only. Clusters differed in age, race, education, SES, IQ, BMI, and diabetes (ps ≤.026) but not in mood, anxiety, cardiovascular, or pulmonary disease (ps ≥.118). Conclusions: We replicated three distinct patterns of cognitive function in persons with HF. These profiles may help providers offer tailored care to patients with different cognitive and clinical needs. PMID:25510559
NASA Astrophysics Data System (ADS)
Farsadnia, F.; Rostami Kamrood, M.; Moghaddam Nia, A.; Modarres, R.; Bray, M. T.; Han, D.; Sadatinejad, J.
2014-02-01
Regional frequency analysis is one of several methods for estimating flood quantiles in ungauged or data-scarce watersheds. Among the approaches to regional frequency analysis, various clustering techniques have been proposed in the literature to determine hydrologically homogeneous regions. Recently, the Self-Organizing feature Map (SOM), a modern hydroinformatic tool, has been applied in several studies for clustering watersheds. However, further studies are still needed on interpreting the SOM output map to identify hydrologically homogeneous regions. In this study, a two-level SOM and three clustering methods (fuzzy c-means, K-means, and Ward's agglomerative hierarchical clustering) are applied in an effort to identify hydrologically homogeneous regions in the watersheds of Mazandaran province in northern Iran, and their results are compared with each other. First, the SOM is used to form a two-dimensional feature map. Next, the output nodes of the SOM are clustered using the unified distance matrix algorithm and the three clustering methods to form regions for flood frequency analysis. The heterogeneity test indicates that the four regions obtained by the two-level SOM and Ward approach are, after adjustments, sufficiently homogeneous. The results suggest that the combination of SOM and Ward performs much better than the combination of SOM and FCM or of SOM and K-means.
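Ward's criterion merges, at each step, the pair of clusters whose union causes the smallest increase in the within-cluster sum of squares, delta = n_a * n_b / (n_a + n_b) * ||c_a - c_b||^2, where c_a and c_b are the cluster centroids. The naive sketch below applies that rule to hypothetical one-dimensional feature vectors until k clusters remain; it is O(n³) and meant only to show the criterion, not to reproduce the study's SOM-plus-Ward procedure.

```python
def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b are merged."""
    na, nb = len(a), len(b)
    ca = [sum(x) / na for x in zip(*a)]
    cb = [sum(x) / nb for x in zip(*b)]
    return na * nb / (na + nb) * sum((u - v) ** 2 for u, v in zip(ca, cb))


def ward_clustering(points, k):
    """Merge clusters by Ward's criterion until k remain; return member indices."""
    clusters = [[tuple(p)] for p in points]
    members = [[i] for i in range(len(points))]
    while len(clusters) > k:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]))
        # j > i, so popping index j leaves index i valid.
        clusters[i].extend(clusters.pop(j))
        members[i].extend(members.pop(j))
    return sorted(sorted(m) for m in members)


# Hypothetical watershed descriptors (one feature each).
print(ward_clustering([(0,), (1,), (10,), (11,)], 2))  # [[0, 1], [2, 3]]
```

In the two-level scheme described above, the "points" would be the SOM output nodes (prototype vectors) rather than the individual watersheds.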
Mishra, K K; Pal, R S; Arunkumar, R; Chandrashekara, C; Jain, S K; Bhatt, J C
2013-06-01
Total phenolics, radical scavenging activity (RSA) on DPPH, ascorbic acid content and chelating activity on Fe²⁺ of Pleurotus citrinopileatus, Pleurotus djamor, Pleurotus eryngii, Pleurotus flabellatus, Pleurotus florida, Pleurotus ostreatus, Pleurotus sajor-caju and Hypsizygus ulmarius were evaluated. The assayed mushrooms contained 3.94-21.67 mg TAE of phenolics, 13.63-69.67% DPPH scavenging activity, 3.76-6.76 mg ascorbic acid and 60.25-82.7% chelating activity. Principal Component Analysis (PCA) revealed significantly higher total phenolics, RSA on DPPH and growth/day in P. eryngii, whereas P. citrinopileatus showed higher ascorbic acid content and chelating activity. Agglomerative hierarchical clustering analysis revealed that the studied mushroom species fall into two clusters: Cluster I included P. djamor, P. eryngii and P. flabellatus, while Cluster II included H. ulmarius, P. sajor-caju, P. citrinopileatus, P. ostreatus and P. florida. Enhanced yield of P. eryngii was achieved on spent compost casing material. Use of casing materials enhanced yield by 21-107% over non-cased substrate. Copyright © 2012 Elsevier Ltd. All rights reserved.
Predicting healthcare outcomes in prematurely born infants using cluster analysis.
MacBean, Victoria; Lunt, Alan; Drysdale, Simon B; Yarzi, Muska N; Rafferty, Gerrard F; Greenough, Anne
2018-05-23
Prematurely born infants are at high risk of respiratory morbidity following neonatal unit discharge, though prediction of outcomes is challenging. We have tested the hypothesis that cluster analysis would identify discrete groups of prematurely born infants with differing respiratory outcomes during infancy. A total of 168 infants (median (IQR) gestational age 33 (31-34) weeks) were recruited in the neonatal period from consecutive births in a tertiary neonatal unit. The baseline characteristics of the infants were used to classify them into hierarchical agglomerative clusters. Rates of viral lower respiratory tract infections (LRTIs) were recorded for 151 infants in the first year after birth. Infants could be classified into three clusters according to birth weight (BW) and duration of neonatal invasive mechanical ventilation (MV). Cluster one (MV ≤5 days) had few LRTIs. Clusters two and three (both MV ≥6 days, with BW ≥882 g and <882 g, respectively) had significantly higher LRTI rates. Cluster two had a higher proportion of infants experiencing respiratory syncytial virus LRTIs (P = 0.01) and cluster three a higher proportion of rhinovirus LRTIs (P < 0.001). CONCLUSIONS: Readily available clinical data allowed classification of prematurely born infants into one of three distinct groups with differing subsequent respiratory morbidity in infancy. © 2018 Wiley Periodicals, Inc.
Yang, Guang; Raschke, Felix; Barrick, Thomas R; Howe, Franklyn A
2015-09-01
To investigate whether nonlinear dimensionality reduction improves unsupervised classification of ¹H MRS brain tumor data compared with a linear method. In vivo single-voxel ¹H magnetic resonance spectroscopy (55 patients) and ¹H magnetic resonance spectroscopy imaging (MRSI) (29 patients) data were acquired from histopathologically diagnosed gliomas. Data reduction using Laplacian eigenmaps (LE) or independent component analysis (ICA) was followed by k-means clustering or agglomerative hierarchical clustering (AHC) for unsupervised learning to assess tumor grade and for tissue type segmentation of MRSI data. An accuracy of 93% in classification of glioma grade II and grade IV, with 100% accuracy in distinguishing tumor and normal spectra, was obtained by LE with unsupervised clustering, but not with the combination of k-means and ICA. With ¹H MRSI data, LE provided a more linear distribution of data for cluster analysis and better cluster stability than ICA. LE combined with k-means or AHC provided 91% accuracy for classifying tumor grade and 100% accuracy for identifying normal tissue voxels. Color-coded visualization of normal brain, tumor core, and infiltration regions was achieved with LE combined with AHC. The LE method is promising for unsupervised clustering to separate brain and tumor tissue with automated color-coding for visualization of ¹H MRSI data after cluster analysis. © 2014 Wiley Periodicals, Inc.
Benson, Nsikak U.; Asuquo, Francis E.; Williams, Akan B.; Essien, Joseph P.; Ekong, Cyril I.; Akpabio, Otobong; Olajire, Abaas A.
2016-01-01
Concentrations of trace metals (Cd, Cr, Cu, Ni and Pb) in benthic sediments were analyzed through a multi-step fractionation scheme to assess the levels and sources of contamination in estuarine, riverine and freshwater ecosystems in the Niger Delta (Nigeria). The degree of contamination was assessed using the individual contamination factors (ICF) and the global contamination factor (GCF). Multivariate statistical approaches, including principal component analysis (PCA), cluster analysis and correlation tests, were employed to evaluate the interrelationships and associated sources of contamination. The spatial distribution of metal concentrations followed the pattern Pb>Cu>Cr>Cd>Ni. The ecological risk index by ICF showed significant potential mobility and bioavailability for Cu, Cu and Ni. The ICF contamination trend in the benthic sediments at all studied sites was Cu>Cr>Ni>Cd>Pb. The principal component and agglomerative clustering analyses indicate that trace metal contamination in the ecosystems was influenced by multiple pollution sources. PMID:27257934
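The abstract does not give the formulas, but in sequential-extraction studies the two indices are commonly defined as follows (stated here as an assumption), where C_{M,k} is the concentration of metal M in the k-th non-residual fraction and C_{M,res} its concentration in the residual fraction:

$$
\mathrm{ICF}_M = \frac{\sum_{k} C_{M,k}}{C_{M,\mathrm{res}}}, \qquad
\mathrm{GCF} = \sum_{M} \mathrm{ICF}_M
$$

A high ICF means that a large share of the metal is held in the potentially mobile fractions rather than the residual fraction, which is why the index is read as an indicator of mobility and bioavailability; the GCF sums these ratios over all metals at a site.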
Dimitrijevic, Marija V; Mitic, Violeta D; Jovanovic, Olga P; Stankov Jovanovic, Vesna P; Nikolic, Jelena S; Petrovic, Goran M; Stojanovic, Gordana S
2018-01-01
Eleven species of wild mushrooms belonging to the Boletaceae and Russulaceae families were examined for the presence of fatty acids by gas chromatography (GC) and gas chromatography-mass spectrometry (GC/MS). To our knowledge, the fatty acid profiles of B. purpureus and B. rhodoxanthus are described here for the first time. Twenty-six fatty acids were determined. Linoleic (19.5 - 72%), oleic (0.11 - 64%), palmitic (5.9 - 22%) and stearic acids (0.81 - 57%) were present in the highest amounts. Unsaturated fatty acids dominated in all samples. Agglomerative hierarchical clustering was used to display the correlation between the fatty acids and their relationships with the mushroom species. Based on their fatty acid profiles, cluster analysis divided the mushrooms into their two families, Boletaceae and Russulaceae. © 2018 Wiley-VHCA AG, Zurich, Switzerland.
Suicide in the oldest old: an observational study and cluster analysis.
Sinyor, Mark; Tan, Lynnette Pei Lin; Schaffer, Ayal; Gallagher, Damien; Shulman, Kenneth
2016-01-01
The older population is at high risk for suicide. This study sought to learn more about the characteristics of suicide in the oldest-old and to use a cluster analysis to determine if oldest-old suicide victims assort into clinically meaningful subgroups. Data were collected from a coroner's chart review of suicide victims in Toronto from 1998 to 2011. We compared two age groups (65-79 year olds, n = 335, and 80+ year olds, n = 191) and then conducted a hierarchical agglomerative cluster analysis using Ward's method to identify distinct clusters in the 80+ group. The younger and older age groups differed according to marital status, living circumstances and pattern of stressors. The cluster analysis identified three distinct clusters in the 80+ group. Cluster 1 was the largest (n = 124) and included people who were either married or widowed and who had significantly more depression and somewhat more medical health stressors. In contrast, cluster 2 (n = 50) comprised people who were almost all single and living alone, with significantly less identified depression and slightly fewer medical health stressors. All members of cluster 3 (n = 17) lived in a retirement residence or nursing home, and this group had the highest rates of depression, dementia, other mental illness and past suicide attempts. This is the first study to use the cluster analysis technique to identify meaningful subgroups among suicide victims in the oldest-old. The results reveal different patterns of suicide in the older population that may be relevant for clinical care. Copyright © 2015 John Wiley & Sons, Ltd.
Coronal Mass Ejection Data Clustering and Visualization of Decision Trees
NASA Astrophysics Data System (ADS)
Ma, Ruizhe; Angryk, Rafal A.; Riley, Pete; Filali Boubrahimi, Soukaina
2018-05-01
Coronal mass ejections (CMEs) can be categorized as either “magnetic clouds” (MCs) or non-MCs. Features such as a large magnetic field, low plasma-beta, and low proton temperature suggest that a CME event is also an MC event; however, so far there is neither a definitive method nor an automatic process to distinguish the two. Human labeling is time-consuming, and results can fluctuate owing to the imprecise definition of such events. In this study, we approach the problem of distinguishing MCs from non-MCs from a time series data analysis perspective and show how clustering can shed some light on this problem. Although many algorithms exist for traditional data clustering in Euclidean space, they are not well suited for time series data. Problems such as inadequate distance measures, inaccurate cluster center descriptions, and a lack of intuitive cluster representations need to be addressed for effective time series clustering. Our data analysis in this work is twofold: clustering and visualization. For clustering, we compared the results of the popular hierarchical agglomerative clustering technique with those of a distance density clustering heuristic we developed previously for time series data clustering. In both cases, dynamic time warping was used as the similarity measure. For classification as well as visualization, we use decision trees to aggregate single-dimensional clustering results into a multidimensional time series decision tree, with averaged time series representing each decision. In this study, we achieved modest accuracy and, more importantly, an intuitive interpretation of how different parameters contribute to an MC event.
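Dynamic time warping, the similarity measure used with both clustering methods, can be sketched with the classic dynamic-programming recurrence below. The absolute-difference local cost and the absence of a warping window are assumptions; the abstract does not give the study's exact DTW configuration.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance with |a_i - b_j| as the local cost."""
    n, m = len(a), len(b)
    # D[i][j]: cheapest cost of aligning a[:i] with b[:j].
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: diagonal match, repeat in b, repeat in a.
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]


print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0: warping absorbs the repeated 2
print(dtw_distance([0, 0], [1, 1]))           # 2.0
```

Unlike a point-by-point Euclidean distance, DTW can compare series of different lengths and tolerate shifts in timing, which is the mismatch with Euclidean-space clustering that the abstract points out.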
Efficient clustering aggregation based on data fragments.
Wu, Ou; Hu, Weiming; Maybank, Stephen J; Zhu, Mingliang; Li, Bing
2012-06-01
Clustering aggregation, also known as clustering ensembles, has emerged as a powerful technique for combining different clustering results to obtain a single better clustering. Existing clustering aggregation algorithms are applied directly to data points, in what is referred to as the point-based approach. These algorithms are inefficient when the number of data points is large. We define an efficient approach for clustering aggregation based on data fragments. In this fragment-based approach, a data fragment is any subset of the data that is not split by any of the clustering results. To establish the theoretical basis of the proposed approach, we prove that clustering aggregation can be performed directly on data fragments under two widely used goodness measures for clustering aggregation taken from the literature. Three new clustering aggregation algorithms are described. The experimental results obtained using several public data sets show that the new algorithms have lower computational complexity than three well-known existing point-based clustering aggregation algorithms (Agglomerative, Furthest, and LocalSearch), without sacrificing accuracy.
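Because a data fragment is a set of points that no input clustering separates, the maximal fragments can be found by grouping points on the tuple of labels they receive across all clusterings. A minimal sketch, assuming each clustering is given as a list of labels over the same points (the example labels are hypothetical):

```python
def data_fragments(clusterings):
    """Group points that no input clustering separates.

    clusterings: list of label lists, one per clustering, all of equal length.
    Returns the fragments as lists of point indices, sorted.
    """
    fragments = {}
    # zip(*clusterings) yields, for each point, its labels in every clustering.
    for idx, key in enumerate(zip(*clusterings)):
        fragments.setdefault(key, []).append(idx)
    return sorted(fragments.values())


c1 = [0, 0, 0, 1, 1, 1]
c2 = [0, 0, 1, 1, 2, 2]
c3 = [5, 5, 5, 7, 7, 7]
print(data_fragments([c1, c2, c3]))  # [[0, 1], [2], [3], [4, 5]]
```

Here six points collapse into four fragments, so an aggregation algorithm could operate on four items (weighted by fragment size) instead of six points; the saving grows when many points share identical label tuples.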
Analysis of the nutritional status of algae by Fourier transform infrared chemical imaging
NASA Astrophysics Data System (ADS)
Hirschmugl, Carol J.; Bayarri, Zuheir-El; Bunta, Maria; Holt, Justin B.; Giordano, Mario
2006-09-01
A new non-destructive method to study the nutritional status of algal cells and their environments is demonstrated. This approach allows rapid examination of whole cells with little or no pre-treatment, providing a large amount of information on the biochemical composition of cells and growth medium. The method is based on the analysis of a collection of infrared (IR) spectra for individual cells; each spectrum describes the biochemical composition of a portion of a cell, and a complete set of spectra is used to reconstruct an image of the entire cell. To obtain spatially resolved information, synchrotron radiation was used as a bright IR source. We tested this method on the green flagellate Euglena gracilis, comparing cells grown under nutrient-replete conditions (Type 1) with cells allowed to deplete their medium (Type 2). Complete sets of spectra for individual cells of both types were analyzed with agglomerative hierarchical clustering, leading to distinct clusters representative of the two types of cells. The average spectra for the clusters confirmed the correspondence between the clusters and the cell types. The clustering analysis therefore allows cells of the same species with different nutritional histories to be distinguished. To facilitate the application of the method and reduce manipulation (washing), we analyzed the cells in the presence of residual medium. The results showed that the outcome of the clustering analysis is reliable even with residual medium. Our results demonstrate the applicability of FTIR microspectroscopy to ecological and ecophysiological studies.
Cleaning by clustering: methodology for addressing data quality issues in biomedical metadata.
Hu, Wei; Zaveri, Amrapali; Qiu, Honglei; Dumontier, Michel
2017-09-18
The ability to efficiently search and filter datasets depends on access to high-quality metadata. While most biomedical repositories require data submitters to provide a minimal set of metadata, some, such as the Gene Expression Omnibus (GEO), allow users to specify additional metadata in the form of textual key-value pairs (e.g. sex: female). However, since there is no structured vocabulary to guide submitters in choosing metadata terms, the 44,000,000+ key-value pairs in GEO suffer from numerous quality issues, including redundancy, heterogeneity, inconsistency, and incompleteness. Such issues hinder the ability of scientists to home in on datasets that meet their requirements and point to a need for accurate, structured and complete descriptions of the data. In this study, we propose a clustering-based approach to address data quality issues in biomedical, specifically gene expression, metadata. First, we present three different kinds of similarity measures to compare metadata keys. Second, we design a scalable agglomerative clustering algorithm to cluster similar keys together. Our agglomerative clustering algorithm identified metadata keys that were similar to each other based on (i) name, (ii) core concept and (iii) value similarities, and grouped them together. We evaluated our method using a manually created gold standard in which 359 keys were grouped into 27 clusters based on six types of characteristics: (i) age, (ii) cell line, (iii) disease, (iv) strain, (v) tissue and (vi) treatment. As a result, the algorithm generated 18 clusters containing 355 keys (four clusters with only one key were excluded). Within these 18 clusters, most keys were correctly identified as related to their cluster, but 13 keys were not related to the cluster in which they were placed. We compared our approach with four other published methods. Our approach significantly outperformed them for most metadata keys and achieved the best average F-Score (0.63).
Our algorithm identified keys that were similar to each other and grouped them together. The intuition underpinning cleaning by clustering is that dividing keys into different clusters resolves the scalability issues for data observation and cleaning, and that duplicates and errors among keys in the same cluster can easily be found. Our algorithm can also be applied to other biomedical data types.
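Of the three similarity measures, name similarity is the easiest to illustrate without the study's data. The sketch below is a simplified stand-in, not the authors' algorithm: it normalizes key names, scores pairs with Python's `difflib.SequenceMatcher.ratio()`, and groups keys by single linkage at an assumed similarity threshold of 0.7. The example keys are hypothetical.

```python
import re
from difflib import SequenceMatcher


def normalize_key(key):
    """Lower-case a metadata key and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", key.lower())


def cluster_keys(keys, threshold=0.7):
    """Single-linkage grouping of keys whose normalized names are similar enough."""
    norm = [normalize_key(k) for k in keys]
    parent = list(range(len(keys)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(keys)):
        for j in range(i + 1, len(keys)):
            if SequenceMatcher(None, norm[i], norm[j]).ratio() >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i, key in enumerate(keys):
        groups.setdefault(find(i), []).append(key)
    return sorted(groups.values())


keys = ["Cell line", "cell_line", "cellline", "tissue", "Tissue type"]
print(cluster_keys(keys))
```

Name similarity alone cannot merge synonyms such as "sex" and "gender", which is where the core-concept and value similarities described above would be needed.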
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mackey, Lester; Nachman, Benjamin; Schwartzman, Ariel
2016-06-01
Collimated streams of particles produced in high energy physics experiments are organized using clustering algorithms to form jets. To construct jets, the experimental collaborations based at the Large Hadron Collider (LHC) primarily use agglomerative hierarchical clustering schemes known as sequential recombination. We propose a new class of algorithms for clustering jets that use infrared and collinear safe mixture models. These new algorithms, known as fuzzy jets, are clustered using maximum likelihood techniques and can dynamically determine various properties of jets, such as their size. We show that the fuzzy jet size provides information beyond conventional jet tagging variables in boosted topologies. Furthermore, we study the impact of pileup and show that, with some slight modifications to the algorithm, fuzzy jets can be stable up to high pileup interaction multiplicities.
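For comparison with fuzzy jets, the sequential-recombination schemes mentioned above can be sketched with the generalized-kt distance measures d_ij = min(pt_i^(2p), pt_j^(2p)) * ΔR_ij^2 / R^2 and d_iB = pt_i^(2p), where p = 1, 0 and -1 give the kt, Cambridge/Aachen and anti-kt algorithms. This is a simplified O(n³) illustration: particles are hypothetical (pt, rapidity, phi) tuples, and merged objects are combined by a pt-weighted average of rapidity and phi rather than four-momentum addition, so it is not a substitute for a production jet library.

```python
import math


def delta_r2(a, b):
    """Squared rapidity-azimuth distance, with phi difference wrapped to [-pi, pi)."""
    dy = a[1] - b[1]
    dphi = (a[2] - b[2] + math.pi) % (2 * math.pi) - math.pi
    return dy * dy + dphi * dphi


def cluster_jets(particles, R=0.4, p=-1):
    """Generalized-kt sequential recombination; returns jets sorted by pt."""
    active = [tuple(map(float, x)) for x in particles]
    jets = []
    while active:
        best_d, best_i, best_j = math.inf, -1, -1
        for i, a in enumerate(active):
            d_ib = a[0] ** (2 * p)  # distance to the beam
            if d_ib < best_d:
                best_d, best_i, best_j = d_ib, i, -1
            for j in range(i + 1, len(active)):
                b = active[j]
                d_ij = min(a[0] ** (2 * p), b[0] ** (2 * p)) * delta_r2(a, b) / R ** 2
                if d_ij < best_d:
                    best_d, best_i, best_j = d_ij, i, j
        if best_j < 0:
            jets.append(active.pop(best_i))  # closest to the beam: a final jet
        else:
            a, b = active[best_i], active[best_j]
            pt = a[0] + b[0]
            y = (a[0] * a[1] + b[0] * b[1]) / pt
            phi = (a[0] * a[2] + b[0] * b[2]) / pt  # naive: ignores phi wrap-around
            del active[best_j]  # best_j > best_i, so best_i stays valid
            active[best_i] = (pt, y, phi)
    return sorted(jets, reverse=True)


event = [(100, 0.0, 0.0), (50, 0.1, 0.0), (30, 2.0, 1.0)]
print([round(j[0]) for j in cluster_jets(event, R=0.4)])   # [150, 30]
print([round(j[0]) for j in cluster_jets(event, R=0.05)])  # [100, 50, 30]
```

The jet radius R is fixed in advance here, whereas the abstract emphasizes that fuzzy jets determine jet size dynamically.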
Recuerda, Maximilien; Périé, Delphine; Gilbert, Guillaume; Beaudoin, Gilles
2012-10-12
The treatment planning of spine pathologies requires information on the rigidity and permeability of the intervertebral discs (IVDs). Magnetic resonance imaging (MRI) offers great potential as a sensitive and non-invasive technique for describing the mechanical properties of IVDs. However, the literature has reported small correlation coefficients between mechanical properties and MRI parameters. Our hypothesis is that the compressive modulus and the permeability of the IVD can be predicted by a linear combination of MRI parameters. Sixty IVDs were harvested from bovine tails and randomly assigned to four groups (in-situ, digested-6h, digested-18h, digested-24h). Multi-parametric MRI acquisitions were used to quantify the relaxation times T1 and T2, the magnetization transfer ratio MTR, the apparent diffusion coefficient ADC and the fractional anisotropy FA. Unconfined compression, confined compression and direct permeability measurements were performed to quantify the compressive moduli and the hydraulic permeabilities. Differences between groups were evaluated with a one-way ANOVA. Multilinear regressions were performed between the dependent mechanical properties and the independent MRI parameters to verify our hypothesis. A principal component analysis was used to convert the set of possibly correlated variables into a set of linearly uncorrelated variables. Agglomerative Hierarchical Clustering was performed on the 3 principal components. Multilinear regressions showed that 45 to 80% of the Young's modulus E, the aggregate modulus in absence of deformation HA0, the radial permeability kr and the axial permeability in absence of deformation k0 can be explained by the MRI parameters within both the nucleus pulposus and the annulus fibrosus. The principal component analysis reduced our variables to two principal components with a cumulative variability of 52-65%, which increased to 70-82% when considering the third principal component.
The dendrograms showed a natural division into four clusters for the nucleus pulposus and into three or four clusters for the annulus fibrosus. The compressive moduli and the permeabilities of isolated IVDs can be assessed mostly by magnetization transfer (MT) and diffusion sequences. However, the relationships have to be improved by including MRI parameters that are more sensitive to IVD degeneration. Before this technique is used to quantify the mechanical properties of IVDs in vivo in patients suffering from various diseases, the relationships have to be defined for each degeneration state of the tissue that mimics the pathology. Our MRI protocol, combined with principal component analysis and agglomerative hierarchical clustering, is a promising tool to classify degenerated intervertebral discs and, further, to identify biomarkers and predictive factors of the evolution of the pathologies.
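Recuerda et al. clustered the discs on their principal-component scores, but the abstract does not say which linkage or distance was used. The sketch below assumes average linkage (UPGMA) with Euclidean distance, implemented naively in pure Python for readability; `points` stands in for per-disc principal-component scores, and the function name is hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage_clusters(points, k):
    """Merge clusters bottom-up (average linkage) until k remain.

    Returns the clusters as sorted lists of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average linkage: mean distance over all cross-cluster pairs.
            d = sum(euclidean(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return sorted(sorted(c) for c in clusters)
```

For example, `average_linkage_clusters([(0, 0), (0, 1), (10, 10), (10, 11), (20, 0)], 3)` returns `[[0, 1], [2, 3], [4]]`: the two tight pairs merge and the outlier stays on its own. The naive loop is roughly cubic in the number of samples, which is acceptable for sixty discs but not for large data sets.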
Quantitative comparison of alternative methods for coarse-graining biological networks
Bowman, Gregory R.; Meng, Luming; Huang, Xuhui
2013-01-01
Markov models and master equations are a powerful means of modeling dynamic processes like protein conformational changes. However, these models are often difficult to understand because of the enormous number of components and connections between them. Therefore, a variety of methods have been developed to facilitate understanding by coarse-graining these complex models. Here, we employ Bayesian model comparison to determine which of these coarse-graining methods provides the models that are most faithful to the original set of states. We find that the Bayesian agglomerative clustering engine and the hierarchical Nyström expansion graph (HNEG) typically provide the best performance. Surprisingly, the original Perron cluster cluster analysis (PCCA) method often provides the next best results, outperforming the newer PCCA+ method and the most probable paths algorithm. We also show that the differences between the models are qualitatively significant, rather than being minor shifts in the boundaries between states. The performance of the methods correlates well with the entropy of the resulting coarse-grainings, suggesting that finding states with more similar populations (i.e., avoiding low population states that may just be noise) gives better results. PMID:24089717
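The abstract compares methods for deciding which microstates to group, but not how a coarse model is then built from a given grouping. One common construction, assumed here and not attributed to any of the compared methods, lumps the microstate transition matrix into macrostates by weighting each microstate's outgoing transitions by its stationary population. The function and argument names are hypothetical.

```python
def lump_transition_matrix(T, pi, assignment, n_macro):
    """Coarse-grain a microstate transition matrix for a given partition.

    T[i][j]:    probability of jumping from microstate i to j in one lag time
    pi[i]:      stationary population of microstate i
    assignment: assignment[i] is the macrostate index of microstate i
    Returns the macrostate transition matrix and the macrostate populations.
    """
    flux = [[0.0] * n_macro for _ in range(n_macro)]
    pop = [0.0] * n_macro
    for i, row in enumerate(T):
        src = assignment[i]
        pop[src] += pi[i]
        for j, t_ij in enumerate(row):
            # Stationary probability flux from microstate i into microstate j.
            flux[src][assignment[j]] += pi[i] * t_ij
    # Normalise each macrostate row by that macrostate's total population.
    T_macro = [[f / pop[a] for f in flux[a]] for a in range(n_macro)]
    return T_macro, pop
```

With `T = [[0.8, 0.2, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]]`, whose stationary populations are `[0.25, 0.5, 0.25]`, and the partition `[0, 0, 1]`, the lumped model has populations 0.75 and 0.25, and each of its rows still sums to one, because a macrostate's total outgoing flux equals its population.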
Calderón, Félix; Barros, David; Bueno, José María; Coterón, José Miguel; Fernández, Esther; Gamo, Francisco Javier; Lavandera, José Luís; León, María Luisa; Macdonald, Simon J F; Mallo, Araceli; Manzano, Pilar; Porras, Esther; Fiandor, José María; Castro, Julia
2011-10-13
In 2010, GlaxoSmithKline published the structures of 13533 chemical starting points for antimalarial lead identification. By using an agglomerative structural clustering technique followed by computational filters such as antimalarial activity, physicochemical properties, and dissimilarity to known antimalarial structures, we have identified 47 starting points for lead optimization. Their structures are provided. We invite potential collaborators to work with us to discover new clinical candidates.
Nilsson, Daniel; Lindman, Magdalena; Victor, Trent; Dozza, Marco
2018-04-01
Single-vehicle run-off-road crashes are a major traffic safety concern, as they are associated with a high proportion of fatal outcomes. In addressing run-off-road crashes, the development and evaluation of advanced driver assistance systems requires test scenarios that are representative of the variability found in real-world crashes. We apply hierarchical agglomerative cluster analysis to define similarities in a set of crash data variables; the resulting clusters can then be used as the basis for test scenario development. From 13 clusters, nine test scenarios are derived, corresponding to crashes characterised by: drivers drifting off the road in daytime and night-time, high-speed departures, high-angle departures on narrow roads, highways, snowy roads, loss of control on wet roadways, sharp curves, and high speeds on roads with severe road surface conditions. In addition, each cluster was analysed with respect to crash variables related to the crash cause and the reason for the unintended lane departure. The study shows that cluster analysis of representative data provides a statistically based method to identify relevant properties for run-off-road test scenarios. This was done to support the development of vehicle-based run-off-road countermeasures and driver behaviour models used in virtual testing. Future studies should use driver behaviour from naturalistic driving data to further define how test scenarios and behavioural causation mechanisms should be included. Copyright © 2018 Elsevier Ltd. All rights reserved.
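The abstract does not name the similarity measure used on the crash variables. Crash records typically mix numeric variables (such as travel speed) with categorical ones (such as road surface), and Gower's coefficient is one common way to compare such records before agglomerative clustering. Its use here, the function name, and the example variables are all assumptions for illustration.

```python
def gower_distance(a, b, kinds, ranges):
    """Gower dissimilarity (0..1) between two records of mixed type.

    kinds[k]:  "num" for numeric variables, "cat" for categorical ones
    ranges[k]: range (max - min) of numeric variable k over the whole data
               set; ignored for categorical variables
    """
    total = 0.0
    for x, y, kind, r in zip(a, b, kinds, ranges):
        if kind == "num":
            # Range-normalised absolute difference; a constant column adds 0.
            total += abs(x - y) / r if r else 0.0
        else:
            # Simple matching: 0 if the categories agree, 1 otherwise.
            total += 0.0 if x == y else 1.0
    return total / len(kinds)


# Hypothetical records: (speed in km/h, road surface, light condition).
kinds = ["num", "cat", "cat"]
ranges = [100.0, None, None]
crash_a = (90.0, "wet", "night")
crash_b = (50.0, "dry", "night")
print(gower_distance(crash_a, crash_b, kinds, ranges))
```

Here the speed difference contributes 0.4, the differing surface 1, and the shared light condition 0, so the printed distance is about 0.467. A matrix of such distances can be passed to any agglomerative linkage.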
ClusCo: clustering and comparison of protein models.
Jamroz, Michal; Kolinski, Andrzej
2013-02-22
The development, optimization and validation of protein modeling methods require efficient tools for structural comparison. Frequently, a large number of models need to be compared with the target native structure. The main reason for developing ClusCo was to create a high-throughput tool for all-versus-all comparison, because calculating the similarity matrix is one of the bottlenecks in the protein modeling pipeline. ClusCo is fast, easy-to-use software for high-throughput comparison of protein models with different similarity measures (cRMSD, dRMSD, GDT_TS, TM-Score, MaxSub, Contact Map Overlap) and for clustering the comparison results with standard methods: K-means clustering or hierarchical agglomerative clustering. The application is highly optimized and written in C/C++, including code for parallel execution on CPU and GPU, which results in a significant speedup over similar clustering and scoring programs.
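dRMSD is one of the measures ClusCo offers. The sketch below uses its usual definition, the root-mean-square difference between corresponding intra-model distances over all atom pairs, and shows why the all-versus-all step is costly: m models need m(m-1)/2 comparisons, each over n(n-1)/2 atom pairs. It is an illustrative pure-Python sketch (Python 3.8+ for `math.dist`), not ClusCo's C/C++ code, and the function names are hypothetical.

```python
import math
from itertools import combinations


def drmsd(model_a, model_b):
    """Distance RMSD between two models of the same protein.

    Each model is a list of (x, y, z) coordinates for the same atoms in the
    same order. Because only intra-model distances are compared, no
    superposition is required.
    """
    n = len(model_a)
    sq_sum = 0.0
    for i, j in combinations(range(n), 2):
        diff = math.dist(model_a[i], model_a[j]) - math.dist(model_b[i], model_b[j])
        sq_sum += diff * diff
    n_pairs = n * (n - 1) / 2
    return math.sqrt(sq_sum / n_pairs)


def all_vs_all(models):
    """Symmetric dRMSD matrix; each unordered pair is computed once."""
    m = len(models)
    mat = [[0.0] * m for _ in range(m)]
    for i, j in combinations(range(m), 2):
        mat[i][j] = mat[j][i] = drmsd(models[i], models[j])
    return mat
```

A rigidly rotated and translated copy of a model has a dRMSD of zero to the original, which is why this measure can skip the superposition step that cRMSD needs. The resulting matrix is the input to the clustering stage.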
Cognitive Clusters in Specific Learning Disorder.
Poletti, Michele; Carretta, Elisa; Bonvicini, Laura; Giorgi-Rossi, Paolo
The heterogeneity among children with learning disabilities still represents a barrier and a challenge to their conceptualization. Although a dimensional approach has been gaining support, the categorical approach is still the most widely adopted, as in the recent fifth edition of the Diagnostic and Statistical Manual of Mental Disorders. The introduction of the single overarching diagnostic category of specific learning disorder (SLD) could underemphasize interindividual clinical differences in intracategory cognitive functioning and learning proficiency, in line with current models of multiple cognitive deficits at the basis of neurodevelopmental disorders. Characterizing the specific cognitive profiles associated with an already manifest SLD could help identify possible early cognitive markers of SLD risk and distinct trajectories of atypical cognitive development leading to SLD. From this perspective, we applied a cluster analysis to identify groups of children with a Diagnostic and Statistical Manual-based diagnosis of SLD who had similar cognitive profiles, and to describe the association between clusters and SLD subtypes. A sample of 205 children with a diagnosis of SLD was enrolled. Cluster analyses (an agglomerative hierarchical and a nonhierarchical iterative clustering technique) were applied successively to 10 core subtests of the Wechsler Intelligence Scale for Children-Fourth Edition. A four-cluster solution was adopted, and external validation found differences in SLD subtype frequencies and learning proficiency among clusters. Clinical implications of these findings are discussed, and directions for further studies are outlined.
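Poletti et al. applied an agglomerative hierarchical technique and a nonhierarchical iterative technique in succession, but the abstract gives no details. A common two-stage recipe, assumed here, cuts the dendrogram at the chosen number of clusters and uses that partition as the starting point for Lloyd's k-means iterations. Rows of `data` would be children's scores on the ten WISC-IV core subtests; the example data and function names are hypothetical.

```python
def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def refine_with_kmeans(data, labels, max_iter=100):
    """Refine a starting partition with Lloyd's k-means iterations.

    labels: initial cluster labels 0..k-1 (e.g. from cutting a dendrogram);
            every label must occur at least once.
    """
    labels = list(labels)
    k = max(labels) + 1
    centers = [None] * k
    for _ in range(max_iter):
        # Update step: recompute each centre from its current members.
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:  # keep the previous centre if a cluster empties
                centers[c] = centroid(members)
        # Assignment step: move every row to its nearest centre.
        new_labels = [min(range(k), key=lambda c: sq_dist(x, centers[c]))
                      for x in data]
        if new_labels == labels:  # converged: no row changed cluster
            break
        labels = new_labels
    return labels
```

Starting from `[0, 0, 0, 1, 1]` on the points `(0, 0), (0, 1), (5, 5), (5, 6), (6, 5)`, the third point is reassigned in the first pass and the labels settle at `[0, 0, 1, 1, 1]`. Seeding k-means from the hierarchical solution avoids its sensitivity to random starting centres, while k-means lets points move between branches that the dendrogram fixed early.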
Stojanovic, Gordana S; Jovanović, Snežana C; Zlatković, Bojan K
2015-06-01
The present study examines the chemical composition of methanol extracts of Sedum taxa from the central part of the Balkan Peninsula, with representatives of other genera of Crassulaceae (Crassula, Echeveria and Kalanchoe) considered as out-groups. The chemical composition of the extracts was determined by HPLC analysis, based on the retention times of standards and the characteristic absorption spectra of the components. The identified components were treated as original variables of possible chemotaxonomic significance. Relationships among the examined plant samples were investigated by agglomerative hierarchical cluster analysis (AHC). The results showed how the distribution of methanol extract components (mostly phenolics) affected the grouping of the examined samples. The clustering grouped the examined samples satisfactorily, with some representatives of the Sedum series Rupestria and Magellensia being the most remote. Contrary to expectation, the out-group samples were not clearly separated from the Sedum samples; this applies especially to Crassula ovata and Echeveria lilacina, while Kalanchoe daigremontiana was more clearly separated from most of the Sedum samples.
Spatial assessment of air quality patterns in Malaysia using multivariate analysis
NASA Astrophysics Data System (ADS)
Dominick, Doreena; Juahir, Hafizan; Latif, Mohd Talib; Zain, Sharifuddin M.; Aris, Ahmad Zaharin
2012-12-01
This study aims to investigate possible sources of air pollutants and the spatial patterns across eight selected Malaysian air monitoring stations based on a two-year database (2008-2009). Multivariate analysis was applied to the dataset. It incorporated Hierarchical Agglomerative Cluster Analysis (HACA) to assess the spatial patterns, Principal Component Analysis (PCA) to determine the major sources of air pollution and Multiple Linear Regression (MLR) to assess the percentage contribution of each air pollutant. HACA grouped the eight monitoring stations into three clusters based on the characteristics of the air pollutants and meteorological parameters. The PCA showed that the major sources of air pollution were emissions from motor vehicles, aircraft, industries and areas of high population density. The MLR demonstrated that the main pollutant contributing to variability in the Air Pollutant Index (API) at all stations was particulate matter with a diameter of less than 10 μm (PM10). Further MLR analysis showed that the main air pollutant influencing high PM10 concentrations was carbon monoxide (CO). This was due to combustion processes, particularly those originating from motor vehicles. Meteorological factors such as ambient temperature, wind speed and humidity were also noted to influence PM10 concentrations.
Deckersbach, Thilo; Peters, Amy T.; Sylvia, Louisa G.; Gold, Alexandra K.; da Silva Magalhaes, Pedro Vieira; Henry, David B.; Frank, Ellen; Otto, Michael W.; Berk, Michael; Dougherty, Darin D.; Nierenberg, Andrew A.; Miklowitz, David J.
2016-01-01
Background: We sought to address how predictors and moderators of psychotherapy for bipolar depression, identified individually in prior analyses, can inform the development of a metric for prospectively classifying treatment outcome in intensive psychotherapy (IP) versus collaborative care (CC) adjunctive to pharmacotherapy in the Systematic Treatment Enhancement Program (STEP-BD) study. Methods: We conducted post-hoc analyses on 135 STEP-BD participants, using cluster analysis to identify subsets of participants with similar clinical profiles, and investigated this combined metric as a moderator and predictor of response to IP. We used agglomerative hierarchical cluster analyses and k-means clustering to determine the content of the clinical profiles. Logistic regression and Cox proportional hazards models were used to evaluate whether the resulting clusters predicted or moderated the likelihood of recovery or the time until recovery. Results: The cluster analysis yielded a two-cluster solution: 1) “less-recurrent/severe” and 2) “chronic/recurrent.” Rates of recovery in IP were similar for less-recurrent/severe and chronic/recurrent participants. Less-recurrent/severe patients were more likely than chronic/recurrent patients to achieve recovery in CC (p = .040, OR = 4.56). IP yielded a faster recovery for chronic/recurrent participants, whereas CC led to recovery sooner in the less-recurrent/severe cluster (p = .034, OR = 2.62). Limitations: Cluster analyses require list-wise deletion of cases with missing data, so we were unable to conduct analyses on all STEP-BD participants. Conclusions: A well-powered, parametric approach can distinguish patients based on illness history and provide clinicians with symptom profiles of patients that confer differential prognosis in CC vs. IP. PMID:27289316
Multi-documents summarization based on clustering of learning object using hierarchical clustering
NASA Astrophysics Data System (ADS)
Mustamiin, M.; Budi, I.; Santoso, H. B.
2018-03-01
Open Educational Resources (OER) is a portal of teaching, learning and research resources that are in the public domain and freely accessible. Learning contents, or Learning Objects (LO), are granular and can be reused to construct new learning materials. LO ontology-based search techniques can be used to search for LO in the Indonesian OER. In this research, LO from search results are used as ingredients to create new learning materials according to the topic searched by users. Grouping LO for summarization with Hierarchical Agglomerative Clustering (HAC), using dependency context relative to the user’s query, achieved an average F-measure of 0.487, whereas summarization based on K-Means achieved an average F-measure of only 0.336.
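The abstract reports average F-measure values without defining them. For clustering, a widely used definition matches each reference class with the cluster that maximizes the harmonic mean of precision and recall, then weights the per-class scores by class size. The sketch assumes that definition; whether the authors used exactly this variant is not stated, and the function name and label inputs are hypothetical.

```python
from collections import Counter


def clustering_f_measure(true_labels, cluster_labels):
    """Class-size-weighted F-measure of a clustering against reference classes.

    Each class is matched to the cluster that gives it the best F score.
    """
    n = len(true_labels)
    pairs = Counter(zip(true_labels, cluster_labels))
    class_size = Counter(true_labels)
    cluster_size = Counter(cluster_labels)
    total = 0.0
    for c, n_c in class_size.items():
        best = 0.0
        for k, n_k in cluster_size.items():
            n_ck = pairs[(c, k)]  # items of class c placed in cluster k
            if n_ck:
                precision, recall = n_ck / n_k, n_ck / n_c
                best = max(best, 2 * precision * recall / (precision + recall))
        total += n_c / n * best
    return total
```

A perfect clustering scores 1.0, while lumping two equal-sized classes into a single cluster gives each class precision 0.5 and recall 1, so the overall score drops to 2/3.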
NASA Astrophysics Data System (ADS)
Guo, Lei; Safi, Zaki S.; Kaya, Savas; Shi, Wei; Tüzün, Burak; Altunay, Nail; Kaya, Cemal
2018-05-01
Iron is one of the most widely used metals in industrial production. In this work, the inhibition performance of three thiophene derivatives on the corrosion of iron was investigated using several theoretical approaches. In the DFT calculations, global reactivity descriptors such as EHOMO, ELUMO, ionization energy (I), electron affinity (A), the HOMO-LUMO energy gap (ΔE), chemical hardness (η) and softness (σ), as well as local reactivity descriptors such as Fukui indices, local softness and local electrophilicity, were considered and discussed. The adsorption behavior of the thiophene derivatives on the Fe(110) surface was investigated using molecular dynamics simulations. To determine the most active corrosion inhibitor among the studied thiophene derivatives, we used principal component analysis (PCA) and agglomerative hierarchical cluster analysis (AHCA). All data obtained with the various theoretical calculation techniques are consistent with experimental results.
Mechanical properties of experimental composites with different calcium phosphates fillers.
Okulus, Zuzanna; Voelkel, Adam
2017-09-01
Composites containing calcium phosphates (CaPs) have already shown good properties as dental restorative materials. The purpose of this study was to examine the key mechanical properties of twelve hydroxyapatite- or tricalcium phosphate-filled composites. Both raw and surface-treated forms of each CaP filler were used. Two experimental glass-containing composites and one commercial dental restorative composite served as reference materials. Nano-hardness, elastic modulus, and compressive, flexural and diametral tensile strength were determined for all studied materials. Statistical methods (one-way analysis of variance and agglomerative cluster analysis) made it possible to assess the similarities between the examined materials according to the values of the studied parameters. The results show that in almost all cases the mechanical properties of the experimental CaP composites are comparable to, or even better than, those of the reference materials. Copyright © 2017 Elsevier B.V. All rights reserved.
Syazwan, AI; Rafee, B Mohd; Juahir, Hafizan; Azman, AZF; Nizar, AM; Izwyn, Z; Syahidatussyakirah, K; Muhaimin, AA; Yunos, MA Syafiq; Anita, AR; Hanafiah, J Muhamad; Shaharuddin, MS; Ibthisham, A Mohd; Hasmadi, I Mohd; Azhar, MN Mohamad; Azizan, HS; Zulfadhli, I; Othman, J; Rozalini, M; Kamarul, FT
2012-01-01
Purpose: To analyze and characterize a multidisciplinary, integrated indoor air quality checklist for evaluating the health risk of building occupants in a nonindustrial workplace setting. Design: A cross-sectional study based on a participatory occupational health program conducted by the National Institute of Occupational Safety and Health (Malaysia) and Universiti Putra Malaysia. Method: A modified version of the indoor environmental checklist published by the Department of Occupational Health and Safety, based on the literature and on discussion with occupational health and safety professionals, was used in the evaluation process. Summated scores were assigned according to the cluster analysis and principal component analysis used to characterize risk. Environmetric techniques were used to classify the risk of variables in the checklist. Possible sources of item pollutants were also identified using a semiquantitative approach. Result: Hierarchical agglomerative cluster analysis grouped the factorial components into three clusters (high complaint, moderate-high complaint, moderate complaint), which were further analyzed by discriminant analysis. From this, 15 major variables that influence indoor air quality were determined. Principal component analysis of each cluster revealed that the main factors influencing the high complaint group were fungal-related problems, chemical indoor dispersion, detergent, renovation, thermal comfort, and location of the fresh air intake. The moderate-high complaint group showed significantly high loadings on ventilation, air filters, and smoking-related activities. The moderate complaint group showed high loadings on dampness, odor, and thermal comfort. Conclusion: This semiquantitative assessment, which graded risk from low to high based on the intensity of the problem, shows promising and reliable results.
It should be used as an important tool in the preliminary assessment of indoor air quality and as a categorizing method for further IAQ investigations and complaints procedures. PMID:23055779
Monitoring the sensory quality of canned white asparagus through cluster analysis.
Arana, Inés; Ibañez, Francisco C; Torre, Paloma
2016-05-01
White asparagus is one of the 30 most consumed vegetables in the world. This paper unifies the stages of its sensory quality control. The aims of this work were to describe the sensory properties of canned white asparagus and their quality control, and to evaluate the applicability of agglomerative hierarchical clustering (AHC) for classifying manufacturers and monitoring their sensory quality. Sixteen sensory descriptors and their evaluation technique were defined. The sensory profile of canned white asparagus was characterized by a high characteristic flavor, low acidity and bitterness, medium firmness and very slight fibrosity, among other characteristics. The dendrogram established groups of manufacturers that had similar scores on the same set of descriptors, and each cluster grouped the manufacturers that had a similar quality profile. The sensory profile of canned white asparagus was clearly defined through the intensity evaluation of 16 descriptors, and the sensory quality report provided to the manufacturers is detailed and easy to interpret. AHC grouped the manufacturers according to their highest quality scores on certain descriptors and is a useful, highly visual tool. © 2015 Society of Chemical Industry.
Low Back Pain Subgroups using Fear-Avoidance Model Measures: Results of a Cluster Analysis
Beneciuk, Jason M.; Robinson, Michael E.; George, Steven Z.
2012-01-01
Objectives: The purpose of this secondary analysis was to test the hypothesis that an empirically derived psychological subgrouping scheme based on multiple Fear-Avoidance Model (FAM) constructs would provide additional capabilities for clinical outcomes in comparison to a single FAM construct. Methods: Patients (n = 108) with acute or sub-acute low back pain (LBP) enrolled in a clinical trial comparing behavioral physical therapy interventions to classification-based physical therapy completed baseline questionnaires for pain catastrophizing (PCS), fear-avoidance beliefs (FABQ-PA, FABQ-W), and patient-specific fear (FDAQ). Clinical outcomes were pain intensity and disability measured at baseline, 4 weeks, and 6 months. A hierarchical agglomerative cluster analysis was used to create distinct cluster profiles among FAM measures, and discriminant analysis was used to interpret the clusters. Changes in clinical outcomes were investigated with repeated measures ANOVA, and differences in results based on cluster membership were compared to the FABQ-PA subgrouping used in the original trial. Results: Three distinct FAM subgroups (Low Risk, High Specific Fear, and High Fear & Catastrophizing) emerged from the cluster analysis. Subgroups differed on baseline pain and disability (p’s<.01), with the High Fear & Catastrophizing subgroup associated with greater pain than the Low Risk subgroup (p<.01) and the greatest disability (p’s<.05). Subgroup × time interactions were detected for both pain and disability (p’s<.05), with the High Fear & Catastrophizing subgroup reporting greater changes in pain and disability than other subgroups (p’s<.05). In contrast, the FABQ-PA subgroups used in the original trial were not associated with interactions for clinical outcomes. Discussion: These data suggest that subgrouping based on multiple FAM measures may provide additional information on clinical outcomes in comparison to determining subgroup status by FABQ-PA alone.
Subgrouping methods for patients with LBP should include multiple psychological factors to further explore if patients can be matched with appropriate interventions. PMID:22510537
Value-based customer grouping from large retail data sets
NASA Astrophysics Data System (ADS)
Strehl, Alexander; Ghosh, Joydeep
2000-04-01
In this paper, we propose OPOSSUM, a novel similarity-based clustering algorithm using constrained, weighted graph partitioning. Instead of the binary presence or absence of products in a market basket, we use an extended 'revenue per product' measure to better account for management objectives. Typically, the number of clusters desired in a database marketing application is only in the teens or fewer. OPOSSUM proceeds top-down, which is more efficient than bottom-up agglomerative clustering approaches and takes only a small number of steps to reach the desired number of clusters. OPOSSUM delivers clusters that are balanced in terms of either customers (samples) or revenue (value). To facilitate data exploration and validation of results, we introduce CLUSION, a visualization toolkit for high-dimensional clustering problems. To enable closed-loop deployment of the algorithm, OPOSSUM has no user-specified parameters. Thresholding heuristics are avoided, and the optimal number of clusters is automatically determined by a search for maximum performance. Results are presented on a real retail industry data set of several thousand customers and products to demonstrate the power of the proposed technique.
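OPOSSUM replaces binary market-basket presence with revenue per product, but the abstract does not name the similarity it computes on these vectors. The extended Jaccard coefficient is one measure suited to non-negative real-valued data and is assumed here purely for illustration: it reduces to the ordinary Jaccard index on 0/1 baskets and, unlike cosine similarity, also reflects differences in spending magnitude. The graph partitioning itself is not shown, and the function name is hypothetical.

```python
def extended_jaccard(x, y):
    """Extended Jaccard similarity x.y / (|x|^2 + |y|^2 - x.y).

    x, y: non-negative vectors, e.g. revenue per product for two customers.
    """
    dot = sum(a * b for a, b in zip(x, y))
    denom = sum(a * a for a in x) + sum(b * b for b in y) - dot
    # Convention chosen here: two all-zero baskets are treated as dissimilar.
    return dot / denom if denom else 0.0
```

On binary baskets `[1, 1, 0]` and `[1, 0, 1]` it gives 1/3, the plain Jaccard index. Two customers who buy the same products in the same proportions, one spending twice as much (`[10.0, 0.0, 5.0]` versus `[20.0, 0.0, 10.0]`), score 2/3 rather than the cosine value of 1, so revenue differences still separate them.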
Chemical Polymorphism of Essential Oils of Artemisia vulgaris Growing Wild in Lithuania.
Judzentiene, Asta; Budiene, Jurga
2018-02-01
This study investigated the compositional variability of mugwort (Artemisia vulgaris L.) essential oils. Plant material (aerial parts at the full flowering stage) was collected from forty-four wild populations in Lithuania. The oils were obtained by hydrodistillation and analyzed by GC(FID) and GC/MS. In total, up to 111 components were determined in the oils. The major constituents were sabinene, 1,8-cineole, artemisia ketone, both thujone isomers, camphor, cis-chrysanthenyl acetate, davanone and davanone B. The compositional data were subjected to statistical analysis. The application of PCA (Principal Component Analysis) and AHC (Agglomerative Hierarchical Clustering) allowed the oils to be grouped into six clusters. AHC made it possible to distinguish an artemisia ketone chemotype, which, to the best of our knowledge, is very rare. Two rare oil types, cis-chrysanthenyl acetate and sabinene, were also identified in plants growing in Lithuania. In addition, davanone was found for the first time as a principal component of mugwort oils. The study revealed significant chemical polymorphism in the essential oils of mugwort plants native to Lithuania and has expanded chemotaxonomic knowledge of both the species A. vulgaris and the genus Artemisia. © 2018 Wiley-VHCA AG, Zurich, Switzerland.
Random whole metagenomic sequencing for forensic discrimination of soils.
Khodakova, Anastasia S; Smith, Renee J; Burgoyne, Leigh; Abarno, Damien; Linacre, Adrian
2014-01-01
Here we assess the ability of random whole metagenomic sequencing approaches to discriminate between similar soils from two geographically distinct urban sites for application in forensic science. Repeat samples were collected from two parklands in residential areas approximately 3 km apart, and the DNA was extracted. Shotgun, whole genome amplification (WGA) and single arbitrarily primed DNA amplification (AP-PCR) based sequencing techniques were then used to generate soil metagenomic profiles. Full and subsampled metagenomic datasets were then annotated against the M5NR/M5RNA (taxonomic classification) and SEED Subsystems (metabolic classification) databases. Further comparative analyses were performed using a number of statistical tools, including hierarchical agglomerative clustering (CLUSTER), similarity profile analysis (SIMPROF), non-metric multidimensional scaling (NMDS), and canonical analysis of principal coordinates (CAP), at all major levels of taxonomic and metabolic classification. Our data showed that the shotgun and WGA-based approaches generated such similar metagenomic profiles for the soil samples that the samples from the two sites could not be distinguished accurately. An AP-PCR-based approach was successful at obtaining reproducible site-specific metagenomic DNA profiles, which in turn were used to discriminate successfully between visually similar soil samples collected from two different locations.
Ecological characteristics of Simulium breeding sites in West Africa.
Cheke, Robert A; Young, Stephen; Garms, Rolf
2017-03-01
Twenty-nine taxa of Simulium were identified amongst 527 collections of larvae and pupae from untreated rivers and streams in Liberia (362 collections in 1967-71 & 1989), Togo (125 in 1979-81), Benin (35 in 1979-81) and Ghana (5 in 1980-81). The presence or absence of associations between different taxa was used to group them into six clusters using Ward's agglomerative hierarchical cluster analysis. Environmental data associated with the pre-imaginal habitats were then analysed in relation to the six clusters by one-way ANOVA. The results revealed that the following variables had significant effects in determining the clusters (all P<0.001 unless stated otherwise): maximum river width, water temperature, dry bulb air temperature, relative humidity, altitude, type of water (on a range from trickle to large river), water level, slope, current, vegetation, light conditions, discharge, length of breeding area, environs, terrain, river bed type (P<0.01), and the supports to which the insects were attached (P<0.01). When four non-significant contributors (wet bulb temperature, river features, height of waterfall and depth) were excluded and the reduced data set was analysed by principal components analysis (PCA), the first two principal components (PCs) accounted for 87% of the variance, with geographical features dominant in PC1 and hydrological characteristics in PC2. The analyses also revealed the ecological characteristics of each taxon's pre-imaginal habitats, which are discussed with particular reference to members of the Simulium damnosum species complex, whose breeding site distributions were further analysed by canonical correspondence analysis (CCA), a method also applied to the data on non-vector species. Copyright © 2016 Elsevier B.V. All rights reserved.
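Cheke et al. grouped taxa with Ward's agglomerative method on presence/absence data. A compact way to implement Ward's criterion is the Lance-Williams recurrence, which updates the distance from every remaining cluster to a newly merged one without revisiting the raw data; on squared Euclidean distances, the merge cost it tracks is proportional to the increase in within-cluster sum of squares. The sketch assumes each taxon is a 0/1 vector over collections compared with squared Euclidean distance. The authors may have used a different association coefficient, and the data and function name are hypothetical.

```python
from itertools import combinations


def ward_clusters(vectors, k):
    """Ward agglomeration via the Lance-Williams update on squared Euclidean
    distances; returns k clusters as sorted lists of row indices."""
    def sqeuclid(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    clusters = {i: [i] for i in range(len(vectors))}
    # Distances keyed by (smaller id, larger id).
    d = {(i, j): sqeuclid(vectors[i], vectors[j])
         for i, j in combinations(range(len(vectors)), 2)}
    next_id = len(vectors)
    while len(clusters) > k:
        (i, j), dij = min(d.items(), key=lambda item: item[1])
        ni, nj = len(clusters[i]), len(clusters[j])
        merged = clusters.pop(i) + clusters.pop(j)
        new_d = {}
        for m, members in clusters.items():
            nm = len(members)
            dim = d[(min(i, m), max(i, m))]
            djm = d[(min(j, m), max(j, m))]
            # Lance-Williams coefficients for Ward's method.
            new_d[(m, next_id)] = ((ni + nm) * dim + (nj + nm) * djm
                                   - nm * dij) / (ni + nj + nm)
        # Drop every distance that involves the two merged clusters.
        d = {p: v for p, v in d.items() if i not in p and j not in p}
        d.update(new_d)
        clusters[next_id] = merged
        next_id += 1
    return sorted(sorted(c) for c in clusters.values())
```

With four hypothetical taxa whose presence vectors over five collections are `[1, 1, 0, 0, 0]`, `[1, 1, 1, 0, 0]`, `[0, 0, 0, 1, 1]` and `[0, 0, 1, 1, 1]`, asking for two clusters pairs the first two taxa and the last two, since each pair differs in only one collection.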
NASA Astrophysics Data System (ADS)
Kamer, Yavor; Ouillon, Guy; Sornette, Didier; Wössner, Jochen
2014-05-01
We present applications of a new clustering method for fault network reconstruction based on the spatial distribution of seismicity. Unlike common approaches that start from the simplest large-scale description and gradually increase the complexity to explain the small scales, our method is bottom-up: it first samples the small scales and then reduces the complexity. The new approach also exploits the location uncertainty associated with each event in order to obtain a more accurate representation of the spatial probability distribution of the seismicity. For a given dataset, we first construct an agglomerative hierarchical cluster (AHC) tree based on Ward's minimum variance linkage. Read from the root, such a tree starts out with one cluster and progressively branches out into an increasing number of clusters. To atomize the structure into its constitutive protoclusters, we initialize a Gaussian Mixture Model (GMM) at a given level of the hierarchical clustering tree. We then let the GMM converge using an Expectation Maximization (EM) algorithm. Kernels that become ill-defined (fewer than 4 points) at the end of the EM are discarded. By incrementing the number of initialization clusters (atomizing at increasingly populated levels of the AHC tree) and repeating the above procedure, we determine the maximum number of Gaussian kernels the structure can hold. The kernels in this configuration constitute our protoclusters. In this setting, merging any pair will lower the likelihood (calculated over the pdf of the kernels) but will in turn reduce the model's complexity. The information loss or gain of any possible merge can thus be quantified using the Minimum Description Length (MDL) principle. Similar to an inter-distance matrix, where the element d_ij gives the distance between points i and j, we can construct an MDL gain/loss matrix where m_ij gives the information gain or loss resulting from merging kernels i and j.
Based on this matrix, merging events resulting in MDL gain are performed in descending order until no gainful merging is possible anymore. We envision that the results of this study could lead to a better understanding of the complex interactions within the Californian fault system and hopefully use the acquired insights for earthquake forecasting.
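The final merging stage described above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: points are 1-D, each point is hard-assigned to one protocluster, and the MDL criterion is replaced by a BIC-style two-part code (negative log-likelihood plus 0.5 × number of parameters × log N). Gains are recomputed after every merge, and all function names are hypothetical.

```python
import math
from itertools import combinations


def kernel_cost(points, n_total):
    """Data cost (nats) of one 1-D Gaussian kernel under hard assignment."""
    n = len(points)
    mu = sum(points) / n
    var = max(sum((x - mu) ** 2 for x in points) / n, 1e-6)  # floor avoids log(0)
    nll = 0.5 * n * (math.log(2 * math.pi * var) + 1)  # -log L at the MLE
    return nll - n * math.log(n / n_total)  # plus the cost of the mixing weight


def description_length(clusters, n_total):
    """BIC-style stand-in for MDL: data cost + 0.5 * (#parameters) * log N."""
    n_params = 3 * len(clusters) - 1  # mean + variance per kernel, free weights
    data_cost = sum(kernel_cost(c, n_total) for c in clusters)
    return data_cost + 0.5 * n_params * math.log(n_total)


def merge_while_gainful(clusters):
    """Repeatedly merge the pair with the largest description-length gain."""
    clusters = [list(c) for c in clusters]
    n_total = sum(len(c) for c in clusters)
    while len(clusters) > 1:
        current = description_length(clusters, n_total)
        best_gain, best_pair = 0.0, None
        for i, j in combinations(range(len(clusters)), 2):
            rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
            merged = rest + [clusters[i] + clusters[j]]
            gain = current - description_length(merged, n_total)
            if gain > best_gain:
                best_gain, best_pair = gain, (i, j)
        if best_pair is None:  # no merge shortens the description any further
            break
        i, j = best_pair
        rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters = rest + [clusters[i] + clusters[j]]
    return clusters
```

With two overlapping protoclusters near 0 and one near 10, the two overlapping kernels merge and the distant one is kept separate.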
NASA Astrophysics Data System (ADS)
Munandar, T. A.; Azhari; Mushdholifah, A.; Arsyad, L.
2017-03-01
Disparities in regional development are commonly identified using the Klassen Typology and the Location Quotient. Both methods typically use data on the gross regional domestic product (GRDP) sectors of a particular region. The Klassen approach identifies regional disparities by classifying the GRDP sector data into four classes, namely Quadrants I, II, III, and IV. Each quadrant indicates a certain level of regional disparity based on the region's GRDP sector values. The Location Quotient (LQ), meanwhile, is usually used to identify potential sectors in a region, that is, to determine which sectors are potential and which are not. LQ classifies each sector into three classes, namely the basic sector, the non-basic sector with a competitive advantage, and the non-basic sector that can only meet its own needs. Neither the Klassen Typology nor LQ can clearly visualize how the development achievements of each region and sector relate to one another. This research aimed to develop a new approach to identifying disparities in regional development in the form of hierarchical clustering. Hierarchical Agglomerative Clustering (HAC) was employed as the basis of the hierarchical clustering model for identifying disparities in regional development, and was modified using the Klassen Typology and LQ. HAC modified using the Klassen Typology was called MHACK, while HAC modified using LQ was called MACLoQ. The two algorithms can be used to identify regional disparities (MHACK) and potential sectors (MACLoQ), respectively, in the form of hierarchical clusters. Applying MHACK to 31 regencies in Central Java Province identified 3 regencies (Demak, Jepara, and Magelang City) as developed and rapidly growing regions, while the other 28 regencies fall into the category of developed but depressed regions.
Results of the MACLoQ implementation suggest that there is only 1 regency which falls into the basic-sector category (Banyumas), while the other regencies fall into the non-basic non-competitive sector category.
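The two inputs that MHACK and MACLoQ build on can be sketched as follows. This assumes the common textbook forms: LQ as the ratio of a sector's share in the region to its share in the reference area (LQ > 1 is conventionally read as a basic sector), and Klassen quadrants obtained by comparing a region's growth rate and GRDP level with reference values, with I = developed and rapidly growing, II = developed but depressed, III = fast-developing, IV = relatively underdeveloped. The quadrant numbering and function names are assumptions; how the paper feeds these values into HAC is not described here.

```python
def location_quotient(region_sector, region_total, ref_sector, ref_total):
    """LQ = (sector share in the region) / (sector share in the reference area)."""
    return (region_sector / region_total) / (ref_sector / ref_total)


def klassen_quadrant(growth, level, ref_growth, ref_level):
    """Assign a region (or sector) to one of four Klassen quadrants.

    growth/level: the region's growth rate and GRDP level (e.g. per capita);
    ref_growth/ref_level: the reference (e.g. provincial) values.
    """
    if level >= ref_level:
        return "I" if growth >= ref_growth else "II"  # developed: rapid / depressed
    return "III" if growth >= ref_growth else "IV"    # fast-developing / underdeveloped
```

For example, a sector holding 50% of a regency's GRDP but only 25% of the provincial GRDP has an LQ of 2.0.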
NASA Astrophysics Data System (ADS)
Su, Lihong
In remote sensing communities, support vector machine (SVM) learning has recently received increasing attention. SVM learning usually requires large memory and enormous computation time on large training sets. In SVM algorithms, the classification decision function is fully determined by the support vectors, which form a subset of the training set. One way to make SVM learning more efficient is therefore to reduce the training set. In this paper, a data reduction method based on agglomerative hierarchical clustering is proposed to obtain smaller training sets for SVM learning. Using a multi-angle remote sensing dataset of a semi-arid region, the effectiveness of the proposed method is evaluated by classification experiments with a series of reduced training sets. The experiments show no loss of SVM accuracy when the original training set is reduced to 34% of its size using the proposed approach. Maximum likelihood classification (MLC) is also applied to the reduced training sets, and the results show that MLC also maintains its classification accuracy. This implies that the approach retains the most informative data instances.
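One simple way to realize such a reduction is sketched below. It is an assumption, since the abstract does not say how reduced samples are formed: each class is clustered separately with naive centroid-linkage agglomeration, and only the cluster centroids are kept. The function names and the `ratio` parameter are hypothetical.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def agglomerate(points, n_clusters):
    """Naive centroid-linkage agglomerative clustering down to n_clusters groups."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        centroids = [centroid(c) for c in clusters]
        _, i, j = min(
            (math.dist(centroids[i], centroids[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i].extend(clusters[j])  # merge the closest pair;
        del clusters[j]                  # j > i, so index i stays valid
    return clusters


def reduce_training_set(X, y, ratio):
    """Replace each class's samples by the centroids of ratio * n_class clusters."""
    reduced_X, reduced_y = [], []
    for label in sorted(set(y)):
        members = [x for x, lab in zip(X, y) if lab == label]
        k = max(1, round(ratio * len(members)))
        for cluster in agglomerate(members, k):
            reduced_X.append(centroid(cluster))
            reduced_y.append(label)
    return reduced_X, reduced_y
```

The reduced set can then be passed to any SVM or MLC trainer in place of the full set.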
Harper, Angela F; Leuthaeuser, Janelle B; Babbitt, Patricia C; Morris, John H; Ferrin, Thomas E; Poole, Leslie B; Fetrow, Jacquelyn S
2017-02-01
Peroxiredoxins (Prxs or Prdxs) are a large protein superfamily of antioxidant enzymes that rapidly detoxify damaging peroxides and/or affect signal transduction and, thus, have roles in proliferation, differentiation, and apoptosis. Prx superfamily members are widespread across phylogeny and multiple methods have been developed to classify them. Here we present an updated atlas of the Prx superfamily identified using a novel method called MISST (Multi-level Iterative Sequence Searching Technique). MISST is an iterative search process developed to be both agglomerative, to add sequences containing similar functional site features, and divisive, to split groups when functional site features suggest distinct functionally-relevant clusters. Superfamily members need not be identified initially; MISST begins with a minimal representative set of known structures and searches GenBank iteratively. Further, the method's novelty lies in the manner in which isofunctional groups are selected; rather than use a single or shifting threshold to identify clusters, the groups are deemed isofunctional when they pass a self-identification criterion, such that the group identifies itself and nothing else in a search of GenBank. The method was preliminarily validated on the Prxs, as the Prxs presented challenges of both agglomeration and division. For example, previous sequence analysis clustered the Prx functional families Prx1 and Prx6 into one group. Subsequent expert analysis clearly identified Prx6 as a distinct functionally relevant group. The MISST process distinguishes these two closely related, though functionally distinct, families. Through MISST search iterations, over 38,000 Prx sequences were identified, which the method divided into six isofunctional clusters, consistent with previous expert analysis. The results represent the most complete computational functional analysis of proteins comprising the Prx superfamily. 
The feasibility of this novel method is demonstrated by the Prx superfamily results, laying the foundation for potential functionally relevant clustering of the universe of protein sequences.
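The self-identification criterion can be stated as a one-line predicate. The sketch below is a toy: the real MISST search runs against GenBank, whereas here a hypothetical `search` function returns every sequence whose functional-site signature matches one present in the query group. Sequence identifiers and signatures are invented for illustration.

```python
from typing import Callable, Dict, Iterable, Set


def is_isofunctional(group: Set[str],
                     search: Callable[[Set[str]], Iterable[str]]) -> bool:
    """Self-identification: searching with the group returns the group and nothing else."""
    return set(search(group)) == group


def make_signature_search(signatures: Dict[str, str]) -> Callable[[Set[str]], Set[str]]:
    """Toy search (hypothetical): hit every sequence sharing a signature with the query."""
    def search(group: Set[str]) -> Set[str]:
        wanted = {signatures[s] for s in group}
        return {s for s, sig in signatures.items() if sig in wanted}
    return search
```

In this toy, a group that omits a sequence sharing its signature fails the test, because the search pulls that sequence in.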
Babbitt, Patricia C.; Ferrin, Thomas E.
2017-01-01
Peroxiredoxins (Prxs or Prdxs) are a large protein superfamily of antioxidant enzymes that rapidly detoxify damaging peroxides and/or affect signal transduction and, thus, have roles in proliferation, differentiation, and apoptosis. Prx superfamily members are widespread across phylogeny and multiple methods have been developed to classify them. Here we present an updated atlas of the Prx superfamily identified using a novel method called MISST (Multi-level Iterative Sequence Searching Technique). MISST is an iterative search process developed to be both agglomerative, to add sequences containing similar functional site features, and divisive, to split groups when functional site features suggest distinct functionally-relevant clusters. Superfamily members need not be identified initially—MISST begins with a minimal representative set of known structures and searches GenBank iteratively. Further, the method’s novelty lies in the manner in which isofunctional groups are selected; rather than use a single or shifting threshold to identify clusters, the groups are deemed isofunctional when they pass a self-identification criterion, such that the group identifies itself and nothing else in a search of GenBank. The method was preliminarily validated on the Prxs, as the Prxs presented challenges of both agglomeration and division. For example, previous sequence analysis clustered the Prx functional families Prx1 and Prx6 into one group. Subsequent expert analysis clearly identified Prx6 as a distinct functionally relevant group. The MISST process distinguishes these two closely related, though functionally distinct, families. Through MISST search iterations, over 38,000 Prx sequences were identified, which the method divided into six isofunctional clusters, consistent with previous expert analysis. The results represent the most complete computational functional analysis of proteins comprising the Prx superfamily. 
The feasibility of this novel method is demonstrated by the Prx superfamily results, laying the foundation for potential functionally relevant clustering of the universe of protein sequences. PMID:28187133
Yang, Albert C.; Tsai, Shih-Jen; Hong, Chen-Jee; Wang, Cynthia; Chen, Tai-Jui; Liou, Ying-Jay; Peng, Chung-Kang
2011-01-01
Background Genetic polymorphisms in the gene encoding the β-adrenergic receptors (β-AR) have a pivotal role in the functions of the autonomic nervous system. Using heart rate variability (HRV) as an indicator of autonomic function, we present a bottom-up genotype–phenotype analysis to investigate the association between β-AR gene polymorphisms and heart rate dynamics. Methods A total of 221 healthy Han Chinese adults (59 males and 162 females, aged 33.6±10.8 years, range 19 to 63 years) were recruited and genotyped for three common β-AR polymorphisms: β1-AR Ser49Gly, β2-AR Arg16Gly and β2-AR Gln27Glu. Each subject underwent two hours of electrocardiogram monitoring at rest. We applied an information-based similarity (IBS) index to measure the pairwise dissimilarity of heart rate dynamics among study subjects. Results With the aid of agglomerative hierarchical cluster analysis, we categorized subjects into major clusters, which were found to have significantly different distributions of β2-AR Arg16Gly genotype. Furthermore, the non-randomness index, a nonlinear HRV measure derived from the IBS method, was significantly lower in Arg16 homozygotes than in Gly16 carriers. The non-randomness index was negatively correlated with parasympathetic-related HRV variables and positively correlated with those HRV indices reflecting a sympathovagal shift toward sympathetic activity. Conclusions We demonstrate a bottom-up categorization approach combining the IBS method and hierarchical cluster analysis to detect subgroups of subjects with HRV phenotypes associated with β-AR polymorphisms. Our results provide evidence that β2-AR polymorphisms are significantly associated with the acceleration/deceleration pattern of heart rate oscillation, reflecting the underlying mode of autonomic nervous system control. PMID:21573230
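The pairwise dissimilarity used above can be sketched as an IBS-style distance, assuming the commonly described form: RR intervals are coded as 1 (increase) or 0 (otherwise), m-bit words are ranked by frequency in each series, and rank differences are weighted by the normalized Shannon entropy of each word in both series. The symbol coding, the word length m = 3 (chosen only to keep the toy small), and the tie-breaking of equal ranks are assumptions, not details taken from this study.

```python
import math
from collections import Counter
from itertools import product


def binary_symbols(rr):
    """1 if an RR interval is longer than the previous one, else 0 (assumed coding)."""
    return [1 if b > a else 0 for a, b in zip(rr, rr[1:])]


def word_statistics(symbols, m):
    """Probability and frequency rank of every m-bit word (rank 1 = most frequent)."""
    words = [tuple(symbols[i:i + m]) for i in range(len(symbols) - m + 1)]
    counts = Counter(words)
    probs = {w: counts[w] / len(words) for w in product((0, 1), repeat=m)}
    ordered = sorted(probs, key=lambda w: (-probs[w], w))  # ties broken by word order
    ranks = {w: r for r, w in enumerate(ordered, start=1)}
    return probs, ranks


def ibs_distance(rr1, rr2, m=3):
    """Entropy-weighted rank-order distance between two RR-interval series."""
    p1, r1 = word_statistics(binary_symbols(rr1), m)
    p2, r2 = word_statistics(binary_symbols(rr2), m)

    def h(p):
        return -p * math.log(p) if p > 0 else 0.0

    weights = {w: h(p1[w]) + h(p2[w]) for w in p1}
    z = sum(weights.values())
    if z == 0:
        return 0.0
    return sum(abs(r1[w] - r2[w]) * weights[w] / z for w in p1) / (2 ** m - 1)
```

A matrix of such pairwise distances between subjects is what an agglomerative clustering routine would then consume.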
Jonsson, Anders; Bonander, Carl; Nilson, Finn; Huss, Fredrik
2017-09-01
Residential fires represent the largest category of fatal fires in Sweden. The purpose of this study was to describe the epidemiology of fatal residential fires in Sweden and to identify clusters of events. Data were collected from a database that combines information on fatal fires with data from forensic examinations and the Swedish Cause of Death Register. Mortality rates were calculated for different strata using population statistics and rescue service turnout reports. Cluster analysis was performed using multiple correspondence analysis with agglomerative hierarchical clustering. Male sex, old age, smoking, and alcohol were identified as risk factors, and the most common primary injury diagnosis was exposure to toxic gases. Compared to non-fatal fires, fatal residential fires more often originated in the bedroom, were more often caused by smoking, and were more likely to occur at night. Six clusters were identified. The first two clusters were both smoking-related, but were separated into (1) fatalities that often involved elderly people, usually female, whose clothes were ignited (17% of the sample), and (2) middle-aged (45-64 years old), often intoxicated men, where the fire usually originated in furniture (30%). The other clusters identified in the analysis were related to (3) fires caused by technical faults, started in electrical installations in single houses (13%), (4) cooking appliances left on (8%), (5) events with unknown cause, room, and object of origin (25%), and (6) deliberately set fires (7%). Fatal residential fires were unevenly distributed in the Swedish population. To further reduce the incidence of fire mortality, specialized prevention efforts that focus on the different needs of each cluster are required. Cooperation between various societal functions, e.g. rescue services, elderly care, psychiatric clinics, and other social services, with an application of both human and technological interventions, should reduce residential fire mortality in Sweden. Copyright © 2017 The Authors. Published by Elsevier Ltd. All rights reserved.
MSClique: Multiple Structure Discovery through the Maximum Weighted Clique Problem.
Sanroma, Gerard; Penate-Sanchez, Adrian; Alquézar, René; Serratosa, Francesc; Moreno-Noguer, Francesc; Andrade-Cetto, Juan; González Ballester, Miguel Ángel
2016-01-01
We present a novel approach for feature correspondence and multiple structure discovery in computer vision. In contrast to existing methods, we exploit the fact that point sets on the same structure usually lie close to each other, thus forming clusters in the image. Given a pair of input images, we first extract points of interest and then build hierarchical representations of them by agglomerative clustering. We use the maximum weighted clique problem to find the set of corresponding clusters with the maximum number of inliers, representing the multiple structures at the correct scales. Our method is parameter-free and needs only two sets of points along with their tentative correspondences, which makes it very easy to use. We demonstrate the effectiveness of our method in multiple-structure fitting experiments on both publicly available and in-house datasets. As shown in the experiments, our approach finds more structures, containing fewer outliers, than state-of-the-art methods.
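The combinatorial core, the maximum weighted clique problem, can be solved exactly for small graphs with a simple branch and bound. In this sketch, nodes stand for candidate cluster correspondences weighted by an inlier count and edges join mutually compatible candidates; how compatibility and weights are defined in the paper is not given here, so both are assumptions. The search is exponential in the worst case and is meant only to make the objective concrete.

```python
def max_weight_clique(weights, edges):
    """Exact maximum-weight clique by branch and bound (weights must be >= 0).

    weights: {node: weight}, e.g. the inlier count of a candidate correspondence.
    edges:   pairs of mutually compatible candidates.
    """
    adjacency = {v: set() for v in weights}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    best_weight, best_clique = 0, frozenset()

    def expand(clique, candidates, weight):
        nonlocal best_weight, best_clique
        if weight > best_weight:
            best_weight, best_clique = weight, frozenset(clique)
        # Bound: even adding every remaining candidate cannot beat the incumbent.
        if weight + sum(weights[c] for c in candidates) <= best_weight:
            return
        for v in list(candidates):
            expand(clique | {v}, candidates & adjacency[v], weight + weights[v])
            candidates = candidates - {v}  # cliques containing v are now explored

    expand(frozenset(), set(weights), 0)
    return best_weight, best_clique
```

For a triangle a-b-c with weights 3, 2, 4 plus a pendant node d (weight 1) attached to c, the function returns weight 9 and the clique {a, b, c}.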
2011-01-01
Background Community-dwelling older people aged 65+ years sustain falls frequently; these can result in physical injuries necessitating medical attention including emergency department care and hospitalisation. Certain health conditions and impairments have been shown to contribute independently to the risk of falling or experiencing a fall injury, suggesting that individuals with these conditions or impairments should be the focus of falls prevention. Since older people commonly have multiple conditions/impairments, knowledge about which conditions/impairments coexist in at-risk individuals would be valuable in the implementation of a targeted prevention approach. The objective of this study was therefore to examine the prevalence and patterns of comorbidity in this population group. Methods We analysed hospitalisation data from Victoria, Australia's second most populous state, to estimate the prevalence of comorbidity in patients hospitalised at least once between 2005-6 and 2007-8 for treatment of acute fall-related injuries. In patients with two or more comorbid conditions (multicomorbidity) we used an agglomerative hierarchical clustering method to cluster comorbidity variables and identify constellations of conditions. Results More than one in four patients had at least one comorbid condition and among patients with comorbidity one in three had multicomorbidity (range 2-7). The prevalence of comorbidity varied by gender, age group, ethnicity and injury type; it was also associated with a significant increase in the average cumulative length of stay per patient. The cluster analysis identified five distinct, biologically plausible clusters of comorbidity: cardiopulmonary/metabolic, neurological, sensory, stroke and cancer. The cardiopulmonary/metabolic cluster was the largest cluster among the clusters identified. 
Conclusions The consequences of comorbidity clustering in terms of falls and/or injury outcomes of hospitalised patients should be investigated by future studies. Our findings have particular relevance for falls prevention strategies, clinical practice and planning of follow-up services for these patients. PMID:21851627
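Clustering comorbidity variables (rather than patients) can be sketched by representing each condition as the set of patients who have it. The abstract does not name the similarity measure or linkage, so Jaccard distance with average linkage is an assumption here; the condition names and patient IDs are toy data.

```python
def jaccard_distance(a, b):
    """1 - |A & B| / |A | B| for two sets of patient IDs."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def average_linkage(items, dist, n_clusters):
    """Naive average-linkage (UPGMA-style) agglomerative clustering."""
    clusters = [[x] for x in items]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pair_dists = [dist(a, b) for a in clusters[i] for b in clusters[j]]
                d = sum(pair_dists) / len(pair_dists)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters


def cluster_conditions(patients_by_condition, n_clusters):
    """Group conditions whose patient sets overlap strongly."""
    def dist(a, b):
        return jaccard_distance(patients_by_condition[a], patients_by_condition[b])

    return average_linkage(sorted(patients_by_condition), dist, n_clusters)
```

Conditions that tend to co-occur in the same patients end up in the same constellation.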
Vu, Trang; Finch, Caroline F; Day, Lesley
2011-08-18
Community-dwelling older people aged 65+ years sustain falls frequently; these can result in physical injuries necessitating medical attention including emergency department care and hospitalisation. Certain health conditions and impairments have been shown to contribute independently to the risk of falling or experiencing a fall injury, suggesting that individuals with these conditions or impairments should be the focus of falls prevention. Since older people commonly have multiple conditions/impairments, knowledge about which conditions/impairments coexist in at-risk individuals would be valuable in the implementation of a targeted prevention approach. The objective of this study was therefore to examine the prevalence and patterns of comorbidity in this population group. We analysed hospitalisation data from Victoria, Australia's second most populous state, to estimate the prevalence of comorbidity in patients hospitalised at least once between 2005-6 and 2007-8 for treatment of acute fall-related injuries. In patients with two or more comorbid conditions (multicomorbidity) we used an agglomerative hierarchical clustering method to cluster comorbidity variables and identify constellations of conditions. More than one in four patients had at least one comorbid condition and among patients with comorbidity one in three had multicomorbidity (range 2-7). The prevalence of comorbidity varied by gender, age group, ethnicity and injury type; it was also associated with a significant increase in the average cumulative length of stay per patient. The cluster analysis identified five distinct, biologically plausible clusters of comorbidity: cardiopulmonary/metabolic, neurological, sensory, stroke and cancer. The cardiopulmonary/metabolic cluster was the largest cluster among the clusters identified. The consequences of comorbidity clustering in terms of falls and/or injury outcomes of hospitalised patients should be investigated by future studies. 
Our findings have particular relevance for falls prevention strategies, clinical practice and planning of follow-up services for these patients.
Pipelining Architecture of Indexing Using Agglomerative Clustering
NASA Astrophysics Data System (ADS)
Goyal, Deepika; Goyal, Deepti; Gupta, Parul
2010-11-01
The World Wide Web is an interlinked collection of billions of documents. Ironically, the huge size of this collection has become an obstacle to information retrieval. Search engines are used to access information on the Internet, and they retrieve pages through an indexer. This paper introduces a novel pipelining technique for structuring the core index-building system that substantially reduces index construction time, together with a clustering algorithm that partitions the set of documents into ordered clusters so that documents within the same cluster are similar and are assigned nearby document identifiers. After the documents are assigned to clusters, the method builds a hierarchical index so that searching is efficient: clusters are merged into super clusters, which are in turn merged into mega clusters. The pipeline architecture builds the index so that it is efficient in both space and time. Searches are directed from the higher levels of the index to the lower levels, that is, from higher-level clusters to lower-level clusters, so that the user obtains matching results quickly. Because each cluster is formed from only two lower-level clusters, a search examines only two clusters at each lower level of the index, which saves time.
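The two-children-per-level search can be sketched with a binary cluster tree. This is a minimal sketch under assumptions: documents are plain feature vectors, clusters are merged pairwise by centroid distance, and search greedily descends to the closer child, which is approximate and may miss the true nearest document. The pipelining itself is not modeled, and the class and function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Node:
    centroid: Tuple[float, ...]
    docs: List[int]                  # document IDs, in cluster order
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_binary_index(vectors: Sequence[Sequence[float]]) -> Node:
    """Merge the two closest clusters at a time, so every internal node has two children."""
    nodes = [Node(tuple(v), [i]) for i, v in enumerate(vectors)]
    while len(nodes) > 1:
        _, i, j = min(
            (math.dist(a.centroid, b.centroid), i, j)
            for i, a in enumerate(nodes)
            for j, b in enumerate(nodes)
            if i < j
        )
        a, b = nodes[i], nodes[j]
        n = len(a.docs) + len(b.docs)
        centroid = tuple((len(a.docs) * x + len(b.docs) * y) / n
                         for x, y in zip(a.centroid, b.centroid))
        merged = Node(centroid, a.docs + b.docs, a, b)
        nodes = [m for k, m in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0]


def search(root: Node, query: Sequence[float]) -> int:
    """Descend from the top level, comparing the query with only two children per level."""
    node = root
    while node.left is not None and node.right is not None:
        node = min((node.left, node.right), key=lambda c: math.dist(c.centroid, query))
    return node.docs[0]
```

Reading `root.docs` left to right gives one way to assign consecutive identifiers to documents in the same cluster.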
Lopez-Meyer, Paulo; Schuckers, Stephanie; Makeyev, Oleksandr; Fontana, Juan M; Sazonov, Edward
2012-09-01
The number of distinct foods consumed in a meal is of significant clinical concern in the study of obesity and other eating disorders. This paper proposes using the information contained in chewing and swallowing sequences to segment meals by food type. Data collected in experiments with 17 volunteers were analyzed using two different clustering techniques. First, an unsupervised clustering technique, Affinity Propagation (AP), was used to automatically identify the number of segments within a meal. Second, the performance of the unsupervised AP method was compared to that of a supervised learning approach based on Agglomerative Hierarchical Clustering (AHC). While the AP method obtained 90% accuracy in predicting the number of food items, AHC achieved an accuracy >95%. Experimental results suggest that the proposed models of automatic meal segmentation may be used as part of an integrated application for objective Monitoring of Ingestive Behavior in free-living conditions.
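Affinity Propagation finds the number of clusters itself by exchanging "responsibility" and "availability" messages until a set of exemplars emerges. The sketch below implements the standard message updates with damping; the feature vectors, the negative squared Euclidean similarity, and the median preference are assumptions (common defaults), not the features or settings used in this study.

```python
import statistics


def affinity_propagation(points, damping=0.5, iterations=300):
    """Return indices of exemplars; their count is the number of segments found."""
    n = len(points)
    # Similarity: negative squared Euclidean distance.
    s = [[-sum((a - b) ** 2 for a, b in zip(points[i], points[k])) for k in range(n)]
         for i in range(n)]
    # Preference (self-similarity) set to the median similarity, a common default.
    preference = statistics.median(s[i][k] for i in range(n) for k in range(n) if i != k)
    for k in range(n):
        s[k][k] = preference
    r = [[0.0] * n for _ in range(n)]  # responsibilities
    a = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iterations):
        for i in range(n):
            scores = [a[i][k] + s[i][k] for k in range(n)]
            for k in range(n):
                competitor = max(scores[kk] for kk in range(n) if kk != k)
                r[i][k] = damping * r[i][k] + (1 - damping) * (s[i][k] - competitor)
        for k in range(n):
            positive = [max(0.0, r[i][k]) for i in range(n)]
            support = sum(positive) - positive[k]  # sum over i' != k
            for i in range(n):
                new = support if i == k else min(0.0, r[k][k] + support - positive[i])
                a[i][k] = damping * a[i][k] + (1 - damping) * new
    return [k for k in range(n) if r[k][k] + a[k][k] > 0]
```

On two well-separated groups of three points, the routine should report one exemplar per group without being told that there are two.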
Jeong, Jeong-Won; Shin, Dae C; Do, Synho; Marmarelis, Vasilis Z
2006-08-01
This paper presents a novel segmentation methodology for the automated classification and differentiation of soft tissues using multiband data obtained with the newly developed high-resolution ultrasonic transmission tomography (HUTT) system for imaging biological organs. The methodology extends and combines two existing approaches: the L-level set active contour (AC) segmentation approach and the agglomerative hierarchical k-means approach for unsupervised clustering (UC). To prevent the iterative AC minimization algorithm from being trapped in a local minimum, we introduce a multiresolution approach that applies the level set functions at successively increasing resolutions of the image data. The resulting AC clusters are subsequently rearranged by the UC algorithm, which seeks the optimal set of clusters yielding the minimum within-cluster distances in the feature space. Results from Monte Carlo simulations and experimental animal-tissue data demonstrate that the proposed methodology outperforms other existing methods without depending on heuristic parameters and provides a reliable means of soft tissue differentiation in HUTT images.
Bialosky, Joel E.; Robinson, Michael E.
2014-01-01
Background Cluster analysis can be used to identify individuals with similar profiles based on their responses to multiple pain sensitivity measures. There are limited investigations into how empirically derived pain sensitivity subgroups influence clinical outcomes for individuals with spine pain. Objective The purposes of this study were: (1) to investigate empirically derived subgroups based on pressure and thermal pain sensitivity in individuals with spine pain and (2) to examine subgroup influence on 2-week clinical pain intensity and disability outcomes. Design A secondary analysis of data from 2 randomized trials was conducted. Methods Baseline and 2-week outcome data from 157 participants with low back pain (n=110) and neck pain (n=47) were examined. Participants provided demographic, psychological, and clinical information and were assessed using pain sensitivity protocols, including pressure (suprathreshold pressure pain) and thermal pain sensitivity (thermal heat threshold and tolerance, suprathreshold heat pain, temporal summation). A hierarchical agglomerative cluster analysis was used to create subgroups based on pain sensitivity responses. Differences in baseline variables, clinical pain intensity, and disability were examined. Results Three pain sensitivity cluster groups were derived: low pain sensitivity, high thermal static sensitivity, and high pressure and thermal dynamic sensitivity. The proportion of individuals achieving a 30% change in pain intensity differed between groups: fewer individuals in the high pressure and thermal dynamic sensitivity group (adjusted odds ratio=0.3; 95% confidence interval=0.1, 0.8) achieved successful outcomes. Limitations Only 2-week outcomes are reported. Conclusions Distinct pain sensitivity cluster groups were identified for individuals with spine pain, with the high pressure and thermal dynamic sensitivity group showing worse clinical outcomes for pain intensity. 
Future studies should aim to confirm these findings. PMID:24764070
Needle Terpenes as Chemotaxonomic Markers in Pinus: Subsections Pinus and Pinaster.
Mitić, Zorica S; Jovanović, Snežana Č; Zlatković, Bojan K; Nikolić, Biljana M; Stojanović, Gordana S; Marin, Petar D
2017-05-01
Chemical compositions of the needle essential oils of 27 taxa from the section Pinus, including 20 and 7 taxa of the subsections Pinus and Pinaster, respectively, were compared in order to determine the chemotaxonomic significance of terpenes at the infrageneric level. According to analysis of variance, six of the 31 terpene characters studied differed between the examined subsections at a high level of statistical significance. Agglomerative hierarchical cluster analysis separated eight groups, in which representatives of subsect. Pinaster were distributed within the first seven groups on the dendrogram together with P. nigra subsp. laricio and P. merkusii from subsect. Pinus. The eighth group, on the other hand, included the majority of the members of subsect. Pinus. Our findings, based on terpene characters, complement those obtained from morphological, biochemical, and molecular parameters studied over the past two decades. In addition, the results presented in this article confirm that terpenes are good markers at the infrageneric level. © 2017 Wiley-VHCA AG, Zurich, Switzerland.
Correlation between the pattern volatiles and the overall aroma of wild edible mushrooms.
de Pinho, P Guedes; Ribeiro, Bárbara; Gonçalves, Rui F; Baptista, Paula; Valentão, Patrícia; Seabra, Rosa M; Andrade, Paula B
2008-03-12
Volatile and semivolatile components of 11 wild edible mushrooms, Suillus bellini, Suillus luteus, Suillus granulatus, Tricholomopsis rutilans, Hygrophorus agathosmus, Amanita rubescens, Russula cyanoxantha, Boletus edulis, Tricholoma equestre, Fistulina hepatica, and Cantharellus cibarius, were determined by headspace solid-phase microextraction (HS-SPME) and by liquid extraction combined with gas chromatography-mass spectrometry (GC-MS). Fifty volatile and nonvolatile components were formally identified and 13 others were tentatively identified. Sensory analysis yielded the descriptors "mushroomlike", "farm-feed", "floral", "honeylike", "hay-herb", and "nutty". A correlation between sensory descriptors and volatiles was observed by applying multivariate analysis (principal component analysis and agglomerative hierarchical cluster analysis) to the sensory and chemical data. The studied edible mushrooms can be divided into three groups: one rich in C8 derivatives, such as 3-octanol, 1-octen-3-ol, trans-2-octen-1-ol, 3-octanone, and 1-octen-3-one; another rich in terpenic volatile compounds; and a third rich in methional. The presence and content of these compounds contribute considerably to the sensory characteristics of the analyzed species.
Classifying the Basic Parameters of Ultraviolet Copper Bromide Laser
NASA Astrophysics Data System (ADS)
Gocheva-Ilieva, S. G.; Iliev, I. P.; Temelkov, K. A.; Vuchkov, N. K.; Sabotinov, N. V.
2009-10-01
The performance of deep ultraviolet copper bromide lasers is of great importance because of their applications in medicine, microbiology, high-precision processing of new materials, high-resolution laser lithography in microelectronics, high-density optical recording of information, laser-induced fluorescence in plasma and wide-gap semiconductors, and more. In this paper we present a statistical study on the classification of 12 basic lasing parameters using different agglomerative methods of cluster analysis. The results are based on a large amount of experimental data for a UV Cu+ Ne-CuBr laser with wavelengths of 248.6 nm, 252.9 nm, 260.0 nm, and 270.3 nm, obtained at the Georgi Nadjakov Institute of Solid State Physics, Bulgarian Academy of Sciences. The influence of the relevant parameters on laser generation is also evaluated. The results are applicable to computer modeling, to experiment planning, and to further development of lasers with improved output characteristics.
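Clustering parameters (variables) with different agglomerative methods amounts to fixing a distance between variables and then varying the linkage rule. The sketch assumes 1 - |r| (absolute Pearson correlation) as the distance, a common choice for variables, and compares single, complete, and average linkage. The parameter names and measurements are hypothetical toy values, not the paper's data.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# How a cluster-to-cluster distance is derived from member-to-member distances.
LINKAGES = {
    "single": min,                            # nearest members
    "complete": max,                          # farthest members
    "average": lambda ds: sum(ds) / len(ds),  # UPGMA
}


def cluster_parameters(data, n_clusters, method="average"):
    """Agglomerative clustering of variables, using 1 - |r| as the distance."""
    names = list(data)
    dist = {(a, b): 1 - abs(pearson(data[a], data[b])) for a in names for b in names}
    link = LINKAGES[method]
    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        _, i, j = min(
            (link([dist[a, b] for a in ci for b in cj]), i, j)
            for i, ci in enumerate(clusters)
            for j, cj in enumerate(clusters)
            if i < j
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

Strongly correlated or anti-correlated parameters fall into one group under every linkage here; on real data the linkages can disagree, which is why comparing them is informative.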
NASA Astrophysics Data System (ADS)
Witek, M.; van der Lee, S.; Kang, T. S.; Chang, S. J.; Ning, J.; Ning, S.
2017-12-01
We have measured Rayleigh wave group velocity dispersion curves from one year of station-pair cross-correlations of continuous vertical-component broadband data from 1082 seismic stations in regional networks across China, Korea, Taiwan, and Japan for the year 2011. We use the measurements to map local dispersion anomalies for periods in the range 6-40 s. We combined our ambient noise data set with the earthquake group velocity data set of Ma et al. (2014), and then applied agglomerative hierarchical clustering to the localized dispersion curves. We find that the dispersion curves naturally organize themselves into distinct tectonic regions. For our distribution of interstation distances, only 8 distinct regions need to be defined. Additional clusters reduce the overall data misfit by increasingly smaller amounts. The size and number of clusters needed to suitably predict the data may give an indication of the resolving power of the data set. The regions that emerge from the cluster analysis include Tibet, the Sea of Japan, the South China Block and the Korean peninsula, the Ordos and Yangtze cratons, and Mesozoic rift basins such as the Songliao, Bohai Bay and Ulleung basins. We also performed a traditional inversion for 3D S-velocity structure, and the resulting model fits the data as well as the 8-cluster model, while both models fit the earthquake data and ambient noise data better than the LITHO1.0 model of Pasyanos et al. (2014). Our 3D model of the crust and upper mantle confirms many of the features seen in previous studies of the region, most notably the lithospheric thinning going from west to east and low velocity zones in the crust on the Tibetan periphery. We conclude that cluster analysis is able to greatly reduce the dimensionality of surface wave dispersion data, in the sense that a data set of over half a million dispersion curves is sufficiently predicted by appropriately averaging over a relatively small set of distinct tectonic regions. 
The resulting clustered model objectively quantifies the more intuitive ways in which we usually tend to interpret tomographic models.
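The misfit-versus-number-of-clusters reasoning can be illustrated with Ward agglomeration, whose merge cost equals the exact increase in within-cluster sum of squares. This sketch assumes each dispersion curve is a vector of group velocities at fixed periods and that each curve is "predicted" by its cluster's mean curve; the abstract does not state which linkage or misfit measure was used, so both are assumptions.

```python
def ward_misfit_curve(curves):
    """Ward agglomeration of equal-length dispersion curves.

    Returns {number_of_clusters: total within-cluster sum of squares}, i.e. the
    misfit left when every curve is predicted by its cluster's mean curve.
    """
    clusters = [(1, list(c)) for c in curves]  # (size, mean curve)
    misfit = {len(clusters): 0.0}
    total = 0.0
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                (na, ma), (nb, mb) = clusters[i], clusters[j]
                d2 = sum((x - y) ** 2 for x, y in zip(ma, mb))
                cost = na * nb / (na + nb) * d2  # Ward: exact increase in SSE
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        cost, i, j = best
        (na, ma), (nb, mb) = clusters[i], clusters[j]
        merged = (na + nb, [(na * x + nb * y) / (na + nb) for x, y in zip(ma, mb)])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        total += cost
        misfit[len(clusters)] = total
    return misfit
```

Reading the returned dictionary from one cluster upward shows the diminishing reduction in misfit that the abstract uses to choose a small number of regions.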
The Application of Data Mining Techniques to Create Promotion Strategy for Mobile Phone Shop
NASA Astrophysics Data System (ADS)
Khasanah, A. U.; Wibowo, K. S.; Dewantoro, H. F.
2017-12-01
The number of mobile phone shops is growing very fast in various regions of Indonesia, including Yogyakarta, due to the increasing demand for mobile phones. This leads to high competition among mobile phone shops. Under these conditions a mobile phone shop, especially a small one, needs a good promotion strategy in order to survive the competition. To create an attractive promotion strategy, companies and shops should know their customer segments and the buying patterns of their target market. This kind of analysis can be done using data mining techniques. This study aims to segment customers using Agglomerative Hierarchical Clustering and to identify customer buying patterns using Association Rule Mining. The study was conducted in a mobile phone shop in Sleman, Yogyakarta. The clustering result shows that the shop's biggest customer segment was male university students who visit on weekends, and association rule mining shows strong relationships between tempered glass and smartphone “x”, as well as among action cameras, waterproof monopods, and power banks. These results were used to create the promotion strategies presented at the end of the study.
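The "strong relationship" found by association rule mining is usually judged by support, confidence, and lift. The sketch below computes these standard measures for one-item rules by brute force; the baskets and item names are toy data loosely modeled on the abstract, and the thresholds are arbitrary. A full Apriori-style search over larger itemsets is not shown.

```python
from itertools import combinations


def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    tx = [frozenset(t) for t in transactions]
    a, c = frozenset(antecedent), frozenset(consequent)
    n = len(tx)
    n_a = sum(1 for t in tx if a <= t)
    n_c = sum(1 for t in tx if c <= t)
    n_ac = sum(1 for t in tx if (a | c) <= t)
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


def pairwise_rules(transactions, min_support, min_confidence):
    """Enumerate one-item -> one-item rules that pass both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for x, y in combinations(items, 2):
        for lhs, rhs in ((x, y), (y, x)):
            s, conf, lift = rule_metrics(transactions, {lhs}, {rhs})
            if s >= min_support and conf >= min_confidence:
                rules.append((lhs, rhs, s, conf, lift))
    return rules
```

A lift above 1 means the two items are bought together more often than independent purchases would predict, which is what makes a bundle promotion plausible.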
Nonredundant sparse feature extraction using autoencoders with receptive fields clustering.
Ayinde, Babajide O; Zurada, Jacek M
2017-09-01
This paper proposes new techniques for data representation in the context of deep learning using agglomerative clustering. Existing autoencoder-based data representation techniques tend to produce duplicative encoding and decoding receptive fields in layered autoencoders, which leads to the extraction of similar features and thus to filtering redundancy. We propose a way to address this problem and show that such redundancy can be eliminated. This yields smaller networks and produces unique receptive fields that extract distinct features. It is also shown that autoencoders with nonnegativity constraints on their weights extract fewer redundant features than conventional sparse autoencoders. The concept is illustrated using a conventional sparse autoencoder and nonnegativity-constrained autoencoders on MNIST digit recognition, the NORB normalized-uniform object data, and the Yale face dataset. Copyright © 2017 Elsevier Ltd. All rights reserved.
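One way to remove duplicative receptive fields with agglomerative clustering is sketched below: filters (weight vectors) are merged under complete linkage while their cosine distance stays below a threshold, and each group is replaced by its mean filter. The cosine distance, the stopping threshold, and the averaging step are assumptions for illustration; the abstract does not specify the paper's merging rule. Filters are assumed to be nonzero vectors.

```python
import math


def cosine_distance(u, v):
    """1 - cos(angle) between two nonzero weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return 1 - dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def prune_redundant_filters(filters, threshold):
    """Merge near-duplicate filters (complete linkage) and return one mean filter per group."""
    clusters = [[f] for f in filters]
    while len(clusters) > 1:
        d, i, j = min(
            (max(cosine_distance(a, b) for a in ci for b in cj), i, j)
            for i, ci in enumerate(clusters)
            for j, cj in enumerate(clusters)
            if i < j
        )
        if d >= threshold:  # the closest groups are already distinct enough
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [[sum(col) / len(c) for col in zip(*c)] for c in clusters]
```

The number of filters returned is the size of a smaller layer in which each unit extracts a distinct feature.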
Crossmaps: Visualization of overlapping relationships in collections of journal papers
Morris, Steven A.; Yen, Gary G.
2004-01-01
A crossmapping technique is introduced for visualizing multiple and overlapping relations among entity types in collections of journal articles. Groups of entities from two entity types are crossplotted to show the correspondence of relations. For example, author collaboration groups are plotted on the x axis against groups of papers (research fronts) on the y axis. At the intersection of each author group/research front pair, a circular symbol is plotted whose size is proportional to the number of times that authors in the group appear as authors of papers in the research front. Entity groups are found by agglomerative hierarchical clustering using conventional similarity measures. Crossmaps are a simple technique that is particularly suited to showing overlap in relations among entity groups. Particularly useful crossmaps plot research fronts against base reference clusters, research fronts against author collaboration groups, and research fronts against term co-occurrence clusters. When exploring the knowledge domain of a collection of journal papers, it is useful to have several crossmaps of different entity pairs, complemented by research front timelines and base reference cluster timelines. PMID:14762168
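The symbol size in the crossmap described above is a simple count. The following minimal Python sketch computes that count for every author group/research front pair. It assumes hypothetical input structures: a dictionary mapping each group name to a set of author names, and a dictionary mapping each front name to a list of papers, where each paper is a list of its authors. Plotting is left out.

```python
def crossmap_counts(author_groups, research_fronts):
    """Return {(group, front): count}, where count is the number of times
    authors in the group appear as authors of papers in the front."""
    counts = {}
    for g, authors in author_groups.items():
        for f, papers in research_fronts.items():
            counts[(g, f)] = sum(
                1 for paper in papers for a in paper if a in authors
            )
    return counts

# Hypothetical toy input.
groups = {"G1": {"Alice", "Bob"}, "G2": {"Carol"}}
fronts = {
    "F1": [["Alice", "Bob"], ["Alice", "Dan"]],
    "F2": [["Carol"], ["Bob", "Carol"]],
}
counts = crossmap_counts(groups, fronts)
```

Each count would then set the radius or area of the circle drawn at the (group, front) grid position.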
Lawrence, K E; Forsyth, S F; Vaatstra, B L; McFadden, Amj; Pulford, D J; Govindaraju, K; Pomroy, W E
2017-11-01
AIM To determine the most commonly used words in the clinical histories of animals naturally infected with Theileria orientalis Ikeda type; whether these words differed between cases categorised by age, farm type or haematocrit (HCT); and whether there was any clustering of the common words in relation to these categories. METHODS Clinical histories were transcribed for 605 cases of bovine anaemia associated with T. orientalis (TABA) that were submitted to laboratories with blood samples which tested positive for T. orientalis Ikeda type infection by PCR analysis, between October 2012 and November 2014. χ² tests were used to determine whether the proportion of submissions for each word was similar across the categories of HCT (normal, moderate anaemia or severe anaemia), farm type (dairy or beef) and age (young or old). Correspondence analysis (CA) was carried out on a contingency table of the frequency of the 28 most commonly used history words, cross-tabulated by age categories (young, old or unknown). Agglomerative hierarchical clustering, using Ward's method, was then performed on the coordinates from the correspondence analysis. RESULTS The six most commonly used history words were jaundice (204/605), lethargic (162/605), pale mucous membranes (161/605), cow (151/605), anaemia (147/605), and off milk (115/605). The proportion of cases with some history words differed between categories of age, farm type and HCT. The cluster analysis indicated that the recorded history words were grouped in two main clusters. The first included the words weight loss, tachycardia, pale mucous membranes, anaemia, lethargic and thin, and was associated with adult animals (p<0.001), severe anaemia (p<0.001) and dairy (p<0.001). The second cluster included the words deaths, ill-thrift, calves, calf and diarrhoea, and was associated with young animals (p<0.001), normal HCT (p<0.001), beef (p<0.001) and moderate anaemia (p<0.001).
CONCLUSIONS AND CLINICAL RELEVANCE Cluster analysis of words recorded in clinical histories submitted with blood samples from cases of TABA indicates that two potentially different disease syndromes were associated with T. orientalis Ikeda type infection. In the first, the affected cattle were consistent with suffering from a severe regenerative extravascular haemolytic anaemia. The second presented as ill thrift and diarrhoea, particularly in young beef cattle.
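The METHODS pipeline described above runs correspondence analysis on a word-by-category contingency table and then applies Ward clustering to the CA coordinates. It can be sketched with NumPy and SciPy. The SVD-based CA below follows the textbook formulation (row principal coordinates from the standardized residuals). Several details are assumptions: the table is a made-up toy example, and keeping two dimensions and cutting into two clusters are choices made here, since the abstract does not say how many dimensions were retained.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def ca_row_coordinates(N, n_dims=2):
    """Row principal coordinates of simple correspondence analysis."""
    P = N / N.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    # standardized residuals: (p_ij - r_i c_j) / sqrt(r_i c_j)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    F = (U / np.sqrt(r)[:, None]) * sv   # D_r^(-1/2) U Sigma
    return F[:, :n_dims]

# Toy counts: rows are history words, columns are age (young, old, unknown).
words = ["calves", "diarrhoea", "weight loss", "anaemia"]
N = np.array([
    [30.0, 2.0, 3.0],
    [25.0, 3.0, 2.0],
    [2.0, 20.0, 3.0],
    [3.0, 28.0, 4.0],
])
coords = ca_row_coordinates(N)
Z = linkage(coords, method="ward")       # Ward's method on the CA coordinates
labels = fcluster(Z, t=2, criterion="maxclust")
```

Because CA row coordinates preserve chi-squared distances between row profiles, words with similar age profiles end up close together and are merged early by Ward's method.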
Saxena, Raghvendra; Chandra, Amaresh
2011-11-01
Transferability of sequence-tagged-site (STS) markers was assessed for studying genetic relationships among accessions of marvel grass (Dichanthium annulatum Forsk.). In total, 17 STS primers of Stylosanthes origin were tested for their reactivity with thirty accessions of Dichanthium annulatum. Of these, 14 (82.4%) reacted, and a total of 106 bands (84 polymorphic) were scored. The number of bands generated by individual primer pairs ranged from 4 to 11, with an average of 7.57 bands. Polymorphic bands ranged from 4 to 9, with an average of 6.0 bands, corresponding to an average polymorphism of 80.1%. Polymorphic information content (PIC) ranged from 0.222 to 0.499 and marker index (MI) from 1.33 to 4.49. A dendrogram was generated from the Dice coefficient of genetic similarity using the unweighted pair-group method with arithmetic mean (UPGMA) algorithm. Further, clustering through the sequential agglomerative hierarchical and nested (SAHN) method resulted in three main clusters comprising all accessions except IGBANG-D-2. Although a few accessions from one agro-climatic region were intermixed with those of another, accessions largely grouped according to their regions of collection. Bootstrap analysis with 1000 replicates also showed a large number of nodes (11 to 17) with strong clustering support (> 50). Thus, the results demonstrate the utility of Stylosanthes STS markers in studying the genetic relationships among accessions of Dichanthium.
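The Dice-plus-UPGMA step described above can be sketched with SciPy. For binary band-presence data, the Dice similarity between two accessions is 2a/(2a + b + c), where a is the number of shared bands and b and c are the bands present in only one accession. SciPy's `"dice"` metric returns the dissimilarity 1 minus that value, and `"average"` linkage on such a distance matrix is UPGMA. The band matrix is a made-up toy example, and the bootstrap step is not shown.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Rows: accessions; columns: scored bands (True = present). Toy data.
bands = np.array([
    [1, 1, 0, 1, 0, 1],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 1, 1, 1, 1],
], dtype=bool)

d = pdist(bands, metric="dice")        # condensed 1 - Dice similarity
Z = linkage(d, method="average")       # UPGMA
groups = fcluster(Z, t=2, criterion="maxclust")
```

Accessions 0 and 1 share three bands and differ in one, giving a Dice dissimilarity of 1/7. They therefore join first, as do accessions 2 and 3.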
NASA Astrophysics Data System (ADS)
Crawford, I.; Lloyd, G.; Bower, K. N.; Connolly, P. J.; Flynn, M. J.; Kaye, P. H.; Choularton, T. W.; Gallagher, M. W.
2015-09-01
The fluorescent nature of aerosol at a high Alpine site was studied using a wide-band integrated bioaerosol (WIBS-4) single-particle multi-channel ultraviolet-light-induced fluorescence (UV-LIF) spectrometer. This was supported by comprehensive cloud microphysics and meteorological measurements. The aims were to catalogue concentrations of bio-fluorescent aerosols at this high-altitude site and to investigate possible influences of UV-fluorescent particle types on cloud-aerosol processes. Analysis of background free tropospheric air masses, using a total aerosol inlet, showed a minor but statistically insignificant increase in the fluorescent aerosol fraction during in-cloud cases compared with out-of-cloud cases. The size dependence of the fluorescent aerosol fraction showed that larger aerosol particles were more likely to be fluorescent, with 80 % of 10 μm particles being fluorescent. Whilst the fluorescent particles were in the minority (NFl/NAll = 0.27±0.19), a new hierarchical agglomerative cluster analysis approach (Crawford et al., 2015) revealed that the majority of the fluorescent aerosol was likely to be representative of fluorescent mineral dust. A minor episodic contribution from a cluster likely to be representative of primary biological aerosol particles (PBAP) was also observed, with a wintertime baseline concentration of 0.1±0.4 L⁻¹. Given the low concentration of this cluster and the typically low ice-active fraction of studied PBAP (e.g. Pseudomonas syringae), we suggest that its contribution to the observed ice crystal concentration at this location is not significant during the wintertime.
Ryberg, Karen R.
2006-01-01
As a result of the Dakota Water Resources Act of 2000, the Bureau of Reclamation, U.S. Department of the Interior, identified eight water-supply alternatives (including a no-action alternative) to meet future water needs in portions of the Red River of the North (Red River) Basin. Of those alternatives, four include the interbasin transfer of water from the Missouri River Basin to the Red River Basin. Three of the interbasin transfer alternatives would use the McClusky Canal, located in central North Dakota, to transport the water. Therefore, the water quality of the McClusky Canal and of its sources, Lake Sakakawea and Audubon Lake, is of interest to water-quality stakeholders. The Bureau of Reclamation collected water-quality samples at 23 sites on Lake Sakakawea, Audubon Lake, and the McClusky Canal system from 1990 through 2003. Physical properties and water-quality constituents from these samples were summarized and analyzed by the U.S. Geological Survey using hierarchical agglomerative cluster analysis (HACA). HACA separated the samples into related clusters, or groups. These groups were examined for statistical significance and in relation to the structure of the McClusky Canal system. Statistically, the sample groupings found using HACA were significantly different from each other. They appear to result from spatial and temporal water-quality differences corresponding to different sections of the canal and different operational conditions. Future operational changes of the canal system may justify additional water-quality sampling to characterize possible water-quality changes.
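HACA on water-quality data usually starts by standardizing each constituent, so that variables recorded in large units (such as specific conductance) do not dominate the distance. A test then checks whether the resulting groups differ. The following is a minimal SciPy sketch under several assumptions: z-scores, Ward linkage and a Kruskal-Wallis test per variable. The abstract does not state which scaling, linkage or significance test the USGS analysis used. The function name `haca_with_tests` and the values (conductance, dissolved oxygen, temperature) are hypothetical toy data.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.stats import kruskal, zscore

def haca_with_tests(X, n_groups=2):
    """Cluster standardized samples, then test each variable across groups."""
    Xz = zscore(X, axis=0)                         # standardize each column
    Z = linkage(Xz, method="ward")
    groups = fcluster(Z, t=n_groups, criterion="maxclust")
    pvalues = []
    for j in range(X.shape[1]):
        samples = [X[groups == g, j] for g in np.unique(groups)]
        pvalues.append(kruskal(*samples).pvalue)   # nonparametric group test
    return groups, pvalues

# Toy samples: [specific conductance, dissolved oxygen, temperature]
X = np.array([
    [610.0, 9.1, 12.0],
    [620.0, 8.9, 12.5],
    [605.0, 9.3, 11.8],
    [850.0, 6.2, 21.0],
    [870.0, 6.0, 22.5],
    [860.0, 6.4, 21.7],
])
groups, pvalues = haca_with_tests(X)
```

With only three samples per group, even perfectly separated groups give a Kruskal-Wallis p-value just under 0.05. Small groups therefore limit what such a test can show.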
Demersal fish assemblages off southern New Zealand in relation to depth and temperature
NASA Astrophysics Data System (ADS)
Jacob, W.; McClatchie, S.; Probert, P. K.; Hurst, R. J.
1998-12-01
We examined the relationship between demersal fish assemblages and depth, temperature, latitude and longitude off southern New Zealand (46-54°S and 165-180°E) in water depths of 80-787 m. Catch weight data were analysed by two-way indicator analysis (TWIA), group-average agglomerative clustering (UPGMA) and Detrended Correspondence Analysis (DCA). The spatial pattern of demersal fish off southern New Zealand conforms to the concept of species groups or fish assemblages related to environmental gradients. Shallow-water assemblages were dominated by species from the families Gempylidae, Squalidae, Triakidae and Moridae, mainly represented by Thyrsites atun, Squalus acanthias, Galeorhinus australis, and Pseudophycis bachus. Deep-water assemblages were dominated by Chimaeridae, Argentinidae, Merlucciidae and Macrouridae, mainly represented by Hydrolagus novaezelandiae, Argentina elongata, Macruronus novaezelandiae, and Lepidorhynchus denticulatus. Total catch weight was often dominated by Merlucciidae, Macrouridae and Gempylidae. Fish assemblages were related to discrete ranges of depth (< and >300 m) and temperature (< and >9.5°C), but the range of sediment types was too narrow to show any correlation.
Mitic, Violeta; Stankov Jovanovic, Vesna; Ilic, Marija; Jovanovic, Olga; Djordjevic, Aleksandra; Stojanovic, Gordana
2016-01-01
The chemical composition and in vitro antimicrobial activities of Dittrichia graveolens (L.) Greuter essential oil were studied. Moreover, using agglomerative hierarchical clustering (AHC) and principal component analysis (PCA), the interrelationships of the D. graveolens essential-oil profiles characterized so far (including the sample from this study) were investigated. To evaluate the chemical composition of the essential oil, GC-FID and GC/MS analyses were performed. Altogether, 54 compounds were identified, accounting for 92.9% of the total oil composition. The D. graveolens oil belongs to the monoterpenoid chemotype, with monoterpenoids comprising 87.4% of the total identified compounds. The major components were borneol (43.6%) and bornyl acetate (38.3%). Multivariate analysis showed that borneol and bornyl acetate exerted the greatest influence on the spatial differences in the composition of the reported oils. The antimicrobial activity against five bacterial strains and one fungal strain was determined using a disk-diffusion assay. The studied essential oil was active only against Gram-positive bacteria. Copyright © 2016 Verlag Helvetica Chimica Acta AG, Zürich.
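In a PCA of essential-oil profiles, the compounds with the largest absolute loadings on the leading component are the ones that drive the separation between samples. This is the kind of result reported above for borneol and bornyl acetate. The following is a minimal NumPy sketch that runs PCA as an SVD of the centred composition matrix. The percentages are invented toy values, not the published profiles, and the compound list is truncated to three.

```python
import numpy as np

compounds = ["borneol", "bornyl acetate", "caryophyllene oxide"]
# Rows: oil samples; columns: percentage of each compound (toy data).
X = np.array([
    [40.0, 40.0, 1.0],
    [10.0, 60.0, 2.0],
    [55.0, 15.0, 1.5],
    [20.0, 45.0, 3.0],
])

def pca(X):
    """PCA via SVD of the column-centred data matrix."""
    Xc = X - X.mean(axis=0)              # centre each compound
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U * s                       # sample coordinates on each component
    loadings = Vt.T                      # one column per component
    explained = s**2 / np.sum(s**2)      # explained-variance ratio
    return scores, loadings, explained

scores, loadings, explained = pca(X)
order = np.argsort(-np.abs(loadings[:, 0]))
top = [compounds[i] for i in order[:2]]  # most influential compounds on PC1
```

An AHC step, as in the abstract, could then be run on `scores` or on the raw profiles. Whether to standardize the compounds first is a separate choice that changes which compounds dominate.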
Interactive Machine Learning at Scale with CHISSL
DOE Office of Scientific and Technical Information (OSTI.GOV)
Arendt, Dustin L.; Grace, Emily A.; Volkova, Svitlana
We demonstrate CHISSL, a scalable client-server system for real-time interactive machine learning. Our system is capable of incorporating user feedback incrementally and immediately without a structured or pre-defined prediction task. Computation is partitioned between a lightweight web-client and a heavyweight server. The server relies on representation learning and agglomerative clustering to learn a dendrogram, a hierarchical approximation of a representation space. The client uses only this dendrogram to incorporate user feedback into the model via transduction. Distances and predictions for each unlabeled instance are updated incrementally and deterministically, with O(n) space and time complexity. Our algorithm is implemented in a functional prototype, designed to be easy to use by non-experts. The prototype organizes the large amounts of data into recommendations. This allows the user to interact with actual instances by dragging and dropping to provide feedback in an intuitive manner. We applied CHISSL to several domains including cyber, social media, and geo-temporal analysis.
NASA Astrophysics Data System (ADS)
Sarparandeh, Mohammadali; Hezarkhani, Ardeshir
2017-12-01
The use of efficient methods for data processing has always been of interest to researchers in the earth sciences. Pattern recognition techniques are appropriate for high-dimensional data such as geochemical data. Evaluating the geochemical distribution of rare earth elements (REEs) requires such methods; in particular, the multivariate nature of REE data makes them a good target for numerical analysis. The main subject of this paper is the application of unsupervised pattern recognition approaches to evaluating the geochemical distribution of REEs in the Kiruna-type magnetite-apatite deposit of Se-Chahun. For this purpose, 42 bulk lithology samples were collected from the Se-Chahun iron ore deposit. In this study, 14 rare earth elements were measured by inductively coupled plasma mass spectrometry (ICP-MS). Pattern recognition makes it possible to evaluate the relations between the samples based on all 14 features simultaneously. Besides offering straightforward solutions, these methods have the advantage of revealing hidden information and relations among data samples. Therefore, four clustering methods (unsupervised pattern recognition) were applied: a modified basic sequential algorithmic scheme (MBSAS), hierarchical (agglomerative) clustering, k-means clustering and self-organizing map (SOM). The results were evaluated using the silhouette criterion. The samples were clustered into four types. Finally, the results of this study were validated against geological facts and analytical results from, for example, scanning electron microscopy (SEM), X-ray diffraction (XRD), ICP-MS and optical mineralogy. The results of the k-means clustering and SOM methods had the best match with reality, with experimental studies of the samples and with field surveys. Since only the rare earth elements were used in this division, the good agreement of the results with lithology is notable.
It is concluded that combining the proposed methods with geological studies reveals hidden information, and that this combined approach gives better results than using either one alone.
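The silhouette criterion mentioned above scores each sample by how much closer it is to its own cluster than to the nearest other cluster. Averaged over all samples, it gives a single number in [-1, 1] that can be compared across algorithms run on the same data. The following minimal sketch assumes scikit-learn and compares k-means with Ward agglomerative clustering, each with four clusters. The input is synthetic blobs standing in for 42 samples by 14 REE features. MBSAS and SOM are not part of scikit-learn and are omitted here.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for 42 samples x 14 REE features (not the study's data).
X, _ = make_blobs(n_samples=42, n_features=14, centers=4, random_state=0)
X = StandardScaler().fit_transform(X)   # put all features on a common scale

models = {
    "k-means": KMeans(n_clusters=4, n_init=10, random_state=0),
    "agglomerative (Ward)": AgglomerativeClustering(n_clusters=4, linkage="ward"),
}
scores = {name: silhouette_score(X, m.fit_predict(X)) for name, m in models.items()}
```

A higher mean silhouette indicates more compact, better-separated clusters. As the abstract stresses, though, agreement with geological observations remains the real test.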
Transport in the Subtropical Lowermost Stratosphere during CRYSTAL-FACE
NASA Technical Reports Server (NTRS)
Pittman, Jasna V.; Weinstock, Elliot M.; Oglesby, Robert J.; Sayres, David S.; Smith, Jessica B.; Anderson, James G.; Cooper, Owen R.; Wofsy, Steven C.; Xueref, Irene; Gerbig, Cristoph;
2007-01-01
We use in situ measurements of water vapor (H2O), ozone (O3), carbon dioxide (CO2), carbon monoxide (CO), nitric oxide (NO), and total reactive nitrogen (NOy) obtained during the CRYSTAL-FACE campaign in July 2002 to study summertime transport in the subtropical lowermost stratosphere. We use an objective methodology to distinguish the latitudinal origin of the sampled air masses despite the influence of convection, and we calculate backward trajectories to elucidate their recent geographical history. The methodology consists of exploring the statistical behavior of the data by performing multivariate clustering and agglomerative hierarchical clustering calculations, and projecting cluster groups onto principal component space to identify air masses of like composition and hence presumed origin. The statistically derived cluster groups are then examined in physical space using tracer-tracer correlation plots. Interpretation of the principal component analysis suggests that the variability in the data is accounted for primarily by the mean age of air in the stratosphere, followed by the age of the convective influence, and lastly by the extent of convective influence, potentially related to the latitude of convective injection (Dessler and Sherwood, 2004). We find that high-latitude stratospheric air is the dominant source region during the beginning of the campaign, while tropical air is the dominant source region during the rest of the campaign. Influence of convection from both local and non-local events is frequently observed. The identification of air mass origin is confirmed with backward trajectories, and the behavior of the trajectories is associated with the North American monsoon circulation.
Tošić, Snežana B; Mitić, Snežana S; Velimirović, Dragan S; Stojanović, Gordana S; Pavlović, Aleksandra N; Pecev-Marinković, Emilija T
2015-08-30
An inductively coupled plasma optical emission spectrometry method for the rapid simultaneous determination of 19 elements in edible nuts available on the Serbian market was optimized and validated. The nuts were walnuts (Juglans nigra), almonds (Prunus dulcis), hazelnuts (Corylus avellana), Brazil nuts (Bertholletia excelsa), cashews (Anacardium occidentale), pistachios (Pistacia vera) and peanuts (Arachis hypogaea). Validation involved selecting instrumental parameters and analytical lines free from spectral interference and with the lowest matrix effects. The analysed macro-elements were present in the following descending order: Na > Mg > Ca > K. Of all the trace elements, the tested samples showed the highest content of Fe. The micro-element Se was detected in all the samples of nuts. The toxic elements As, Cd and Pb were either not detected or present at contents below the limit of detection. One-way analysis of variance, Student's t-test, Tukey's HSD post hoc test and hierarchical agglomerative cluster analysis were applied in the statistical analysis of the results. Based on the detected content of the analysed elements, it can be concluded that nuts may be a good additional source of minerals as micronutrients. © 2014 Society of Chemical Industry.
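For a single element, the one-way ANOVA and Tukey HSD steps listed above can be sketched with SciPy using `scipy.stats.f_oneway` and `scipy.stats.tukey_hsd`. The latter is available from SciPy 1.8 onward. The Fe contents below are invented toy values in mg/kg for three nut types, not the study's measurements.

```python
import numpy as np
from scipy.stats import f_oneway, tukey_hsd

# Toy Fe contents (mg/kg), three replicates per nut type.
fe = {
    "walnut": np.array([28.1, 29.5, 27.8]),
    "almond": np.array([30.2, 31.0, 29.7]),
    "cashew": np.array([55.4, 57.1, 56.0]),
}
groups = list(fe.values())
anova = f_oneway(*groups)    # does any nut type differ in mean Fe?
tukey = tukey_hsd(*groups)   # which pairs differ? tukey.pvalue[i, j]
```

ANOVA only says that at least one mean differs. The Tukey matrix then localizes the difference while controlling the family-wise error rate across all pairwise comparisons.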
Compulsive buying disorder clustering based on sex, age, onset and personality traits.
Granero, Roser; Fernández-Aranda, Fernando; Baño, Marta; Steward, Trevor; Mestre-Bach, Gemma; Del Pino-Gutiérrez, Amparo; Moragas, Laura; Mallorquí-Bagué, Núria; Aymamí, Neus; Goméz-Peña, Mónica; Tárrega, Salomé; Menchón, José M; Jiménez-Murcia, Susana
2016-07-01
In spite of the revived interest in compulsive buying disorder (CBD), its classification in contemporary nosologic systems continues to be debated, and few studies have addressed heterogeneity in the clinical phenotype through methodologies based on a person-centered approach. The aim was to identify empirical clusters of CBD using personality traits, as well as patients' sex, age and age of CBD onset, as indicators. An agglomerative hierarchical clustering method based on a combination of the Schwarz Bayesian Information Criterion and log-likelihood was used. Three clusters were identified in a sample of n=110 patients attending a specialized CBD unit. a) "Male compulsive buyers" reported the highest prevalence of comorbid gambling disorder and the lowest levels of reward dependence. b) "Female low-dysfunctional" mainly included employed women, with the highest level of education, the oldest age of onset, the lowest scores in harm avoidance and the highest levels of persistence, self-directedness and cooperativeness. c) "Female highly-dysfunctional" had the youngest age of onset, the highest levels of comorbid psychopathology and harm avoidance, and the lowest score in self-directedness. Sociodemographic characteristics and personality traits can be used to identify CBD clusters that represent different clinical subtypes. These subtypes should be considered when developing assessment instruments, preventive programs and treatment interventions. Copyright © 2016 Elsevier Inc. All rights reserved.
Exploring Dance Movement Data Using Sequence Alignment Methods
Chavoshi, Seyed Hossein; De Baets, Bernard; Neutens, Tijs; De Tré, Guy; Van de Weghe, Nico
2015-01-01
Despite the abundance of research on knowledge discovery from moving object databases, only a limited number of studies have examined the interaction between moving point objects in space over time. This paper describes a novel approach for measuring similarity in the interaction between moving objects. The proposed approach consists of three steps. First, we transform movement data into sequences of successive qualitative relations based on the Qualitative Trajectory Calculus (QTC). Second, sequence alignment methods are applied to measure the similarity between movement sequences. Finally, movement sequences are grouped based on similarity by means of an agglomerative hierarchical clustering method. The applicability of this approach is tested using movement data from samba and tango dancers. PMID:26181435
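The three-step approach above can be sketched in Python under simplifying assumptions. Each movement is taken to be already encoded as a string of hypothetical QTC state symbols (step one is not shown). Plain Levenshtein edit distance stands in as the alignment score, although the paper's sequence alignment method may use different costs. The sequences are then clustered with SciPy's average linkage. The function names `levenshtein` and `cluster_sequences` are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def levenshtein(a, b):
    """Edit distance with unit insertion, deletion and substitution costs."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (x != y)))    # substitution or match
        prev = cur
    return prev[-1]

def cluster_sequences(seqs, n_clusters):
    """Pairwise alignment distances, then agglomerative clustering."""
    n = len(seqs)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = levenshtein(seqs[i], seqs[j])
    Z = linkage(squareform(D), method="average")
    return fcluster(Z, t=n_clusters, criterion="maxclust")

# Hypothetical QTC-like symbol strings for four dance sequences.
seqs = ["AABBA", "AABBB", "CCDCD", "CCDDD"]
labels = cluster_sequences(seqs, n_clusters=2)
```

The alignment step is what lets sequences of unequal length, or sequences that are slightly shifted in time, still score as similar.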
A clustering approach applied to time-lapse ERT interpretation - Case study of Lascaux cave
NASA Astrophysics Data System (ADS)
Xu, Shan; Sirieix, Colette; Riss, Joëlle; Malaurent, Philippe
2017-09-01
The Lascaux cave, located in southwest France, is one of the most important prehistoric caves in the world with Paleolithic paintings. This study aims to characterize the structure of the weathered epikarst setting located above the cave using time-lapse electrical resistivity tomography (ERT) combined with local hydrogeological and climatic environmental data. Twenty ERT profiles were acquired over two years. They allowed us to record the seasonal and spatial variations of the electrical resistivity of the hydraulic upstream area of the Lascaux cave. The 20 interpreted resistivity models were merged into a single synthetic model using a multidimensional statistical method (hierarchical agglomerative clustering). The individual blocks of the synthetic model with similar resistivity variability were grouped into 7 clusters. We combined the resistivity temporal variations with climatic and hydrogeological data to propose a geo-electrical model that relates to a conceptual geological model. We provide a geological interpretation of each cluster in terms of epikarst features. The superficial clusters (nos. 1 and 2) are linked to effective rainfall and trees and are probably fractured limestone. Two other clusters (nos. 6 and 7) are linked to detrital formations (sand and clay, respectively). Cluster 3 may correspond to a marly limestone that forms a non-permeable horizon. Finally, the electrical behavior of the last two clusters (nos. 4 and 5) is correlated with the variation of flow rate; they may be a privileged feed zone of the flow in the cave.
Clustering recommendations to compute agent reputation
NASA Astrophysics Data System (ADS)
Bedi, Punam; Kaur, Harmeet
2005-03-01
Traditional centralized approaches to security are difficult to apply to the multi-agent systems now used in e-commerce applications. Developing a notion of trust based on the reputation of an agent can provide a softer notion of security that is sufficient for many multi-agent applications. Our paper proposes a mechanism for computing the reputation of the trustee agent for use by the trustier agent. The trustier agent computes the reputation based on its own experience as well as the experience that peer agents have with the trustee agent. The trustier agents intentionally interact with the peer agents to obtain their experience information in the form of recommendations. We have also considered the case of unintentional encounters between the referee agents and the trustee agent, which can occur directly between them or indirectly through a set of interacting agents. Clustering is used to filter out noise in the recommendations in the form of outliers. The trustier agent clusters the recommendations received from referee agents on the basis of the distances between recommendations, using the hierarchical agglomerative method. The resulting dendrogram is cut at the required similarity level, which restricts the maximum distance between any two recommendations within a cluster. The cluster with the largest number of elements represents the views of the majority of recommenders. The center of this cluster represents the reputation of the trustee agent and can be computed using the c-means algorithm.
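The recommendation-filtering step described above can be sketched with SciPy under a few assumptions. Recommendations are taken to be scalar ratings in [0, 1], and the function name `reputation` is hypothetical. Complete linkage is chosen because cutting a complete-linkage dendrogram at height h guarantees that no two recommendations in a cluster are more than h apart, which matches the stated goal. The centre of the largest cluster is taken as its arithmetic mean. For a single cluster, the fuzzy c-means centre reduces to this mean, since every member has full membership.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def reputation(recommendations, max_distance=0.2):
    """Centre of the largest cluster of recommendations (outliers dropped)."""
    r = np.asarray(recommendations, dtype=float).reshape(-1, 1)
    if len(r) == 1:
        return float(r[0, 0])
    # complete linkage: within-cluster spread never exceeds the cut height
    Z = linkage(r, method="complete")
    labels = fcluster(Z, t=max_distance, criterion="distance")
    largest = np.bincount(labels).argmax()   # label with the most members
    return float(r[labels == largest].mean())

# Four consistent recommendations plus two outliers (toy values).
rep = reputation([0.80, 0.82, 0.78, 0.81, 0.10, 0.40])
```

Here the outlying ratings 0.10 and 0.40 end up in their own small clusters. The reputation is therefore the mean of the four consistent ratings.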
Advanced Treatment Monitoring for Olympic-Level Athletes Using Unsupervised Modeling Techniques
Siedlik, Jacob A.; Bergeron, Charles; Cooper, Michael; Emmons, Russell; Moreau, William; Nabhan, Dustin; Gallagher, Philip; Vardiman, John P.
2016-01-01
Context: Analysis of injury and illness data collected at large international competitions provides the US Olympic Committee and the national governing bodies for each sport with information to best prepare for future competitions. Research in which authors have evaluated medical contacts to provide the expected level of medical care and sports medicine services at international competitions is limited. Objective: To analyze the medical-contact data for athletes, staff, and coaches who participated in the 2011 Pan American Games in Guadalajara, Mexico, using unsupervised modeling techniques to identify underlying treatment patterns. Design: Descriptive epidemiology study. Setting: Pan American Games. Patients or Other Participants: A total of 618 US athletes (337 males, 281 females) participated in the 2011 Pan American Games. Main Outcome Measure(s): Medical data were recorded from the injury-evaluation and injury-treatment forms used by clinicians assigned to the central US Olympic Committee Sport Medicine Clinic and satellite locations during the operational 17-day period of the 2011 Pan American Games. We used principal components analysis and agglomerative clustering algorithms to identify and define grouped modalities. Lift statistics were calculated for within-cluster subgroups. Results: Principal component analyses identified 3 components, accounting for 72.3% of the variability in datasets. Plots of the principal components showed that individual contacts focused on 4 treatment clusters: massage, paired manipulation and mobilization, soft tissue therapy, and general medical. Conclusions: Unsupervised modeling techniques were useful for visualizing complex treatment data and provided insights for improved treatment modeling in athletes. Given its ability to detect clinically relevant treatment pairings in large datasets, unsupervised modeling should be considered a feasible option for future analyses of medical-contact data from international competitions.
PMID:26794628
NASA Astrophysics Data System (ADS)
Wagstaff, Kiri L.
2012-03-01
On obtaining a new data set, the researcher is immediately faced with the challenge of obtaining a high-level understanding from the observations. What does a typical item look like? What are the dominant trends? How many distinct groups are included in the data set, and how is each one characterized? Which observable values are common, and which rarely occur? Which items stand out as anomalies or outliers from the rest of the data? This challenge is exacerbated by the steady growth in data set size [11] as new instruments push into new frontiers of parameter space, via improvements in temporal, spatial, and spectral resolution, or by the desire to "fuse" observations from different modalities and instruments into a larger-picture understanding of the same underlying phenomenon. Data clustering algorithms provide a variety of solutions for this task. They can generate summaries, locate outliers, compress data, identify dense or sparse regions of feature space, and build data models. It is useful to note up front that "clusters" in this context refer to groups of items within some descriptive feature space, not (necessarily) to "galaxy clusters" which are dense regions in physical space. The goal of this chapter is to survey a variety of data clustering methods, with an eye toward their applicability to astronomical data analysis. In addition to improving the individual researcher’s understanding of a given data set, clustering has led directly to scientific advances, such as the discovery of new subclasses of stars [14] and gamma-ray bursts (GRBs) [38]. All clustering algorithms seek to identify groups within a data set that reflect some observed, quantifiable structure. Clustering is traditionally an unsupervised approach to data analysis, in the sense that it operates without any direct guidance about which items should be assigned to which clusters. 
There has been a recent trend in the clustering literature toward supporting semisupervised or constrained clustering, in which some partial information about item assignments or other components of the resulting output are already known and must be accommodated by the solution. Some algorithms seek a partition of the data set into distinct clusters, while others build a hierarchy of nested clusters that can capture taxonomic relationships. Some produce a single optimal solution, while others construct a probabilistic model of cluster membership. More formally, clustering algorithms operate on a data set $X$ composed of items represented by one or more features (dimensions). These could include physical location, such as right ascension and declination, as well as other properties such as brightness, color, temporal change, size, texture, and so on. Let $D$ be the number of dimensions used to represent each item, $x_i \in \mathbb{R}^D$. The clustering goal is to produce an organization $P$ of the items in $X$ that optimizes an objective function $f: P \to \mathbb{R}$, which quantifies the quality of solution $P$. Often $f$ is defined so as to maximize similarity within a cluster and minimize similarity between clusters. To that end, many algorithms make use of a measure $d: X \times X \to \mathbb{R}$ of the distance between two items. A partitioning algorithm produces a set of clusters $P = \{c_1, \ldots, c_k\}$ such that the clusters are nonoverlapping ($c_i \cap c_j = \emptyset$ for $i \neq j$) subsets of the data set ($\bigcup_i c_i = X$). Hierarchical algorithms produce a series of partitions $P = \{p_1, \ldots, p_{n'}\}$. For a complete hierarchy, the number of partitions is $n' = n$, the number of items in the data set; the top partition is a single cluster containing all items, and the bottom partition contains $n$ clusters, each containing a single item. For model-based clustering, each cluster $c_j$ is represented by a model $m_j$, such as the cluster center or a Gaussian distribution.
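The definition of a complete hierarchy above can be made concrete with a deliberately naive agglomerative procedure. The hierarchy is a series of n partitions running from n singletons to one all-inclusive cluster. The following plain-Python sketch makes two assumptions for the sake of the example: it uses single linkage (the smallest item-to-item distance between two clusters) as the cluster distance, and Euclidean distance as d. It runs in roughly O(n³) time and is meant to illustrate the definition only. Practical work would use an optimized library routine. The list it returns runs from the bottom partition to the top one.

```python
import math

def agglomerate(X):
    """Return all partitions, from n singletons up to one cluster.
    Each partition is a list of frozensets of item indices."""
    clusters = [frozenset([i]) for i in range(len(X))]
    hierarchy = [list(clusters)]            # bottom partition: n singletons

    def d(i, j):                            # item-to-item distance (Euclidean)
        return math.dist(X[i], X[j])

    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single linkage: closest pair of items across the two clusters
                dist = min(d(i, j) for i in clusters[a] for j in clusters[b])
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
        hierarchy.append(list(clusters))
    return hierarchy

h = agglomerate([(0, 0), (0, 1), (5, 5), (5, 6)])
```

Every level is a valid partition in the sense defined above: its clusters are disjoint and their union is the whole data set. The number of levels equals the number of items, so $n' = n$.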
The wide array of available clustering algorithms may seem bewildering, and covering all of them is beyond the scope of this chapter. Choosing among them for a particular application involves considering the kind of data being analyzed, algorithm runtime efficiency, and how much prior knowledge is available about the problem domain, which can dictate the nature of the clusters sought. Fundamentally, the clustering method and its representations of clusters carry with them a definition of what a cluster is, and it is important that this definition be aligned with the analysis goals for the problem at hand. In this chapter, I emphasize this point by identifying for each algorithm the cluster representation as a model, m_j, even for algorithms that are not typically thought of as creating a “model.” This chapter surveys a basic collection of clustering methods useful to any practitioner who is interested in applying clustering to a new data set. The algorithms include k-means (Section 25.2), EM (Section 25.3), agglomerative (Section 25.4), and spectral (Section 25.5) clustering, with side mentions of variants such as kernel k-means and divisive clustering. The chapter also discusses each algorithm’s strengths and limitations and provides pointers to additional in-depth reading for each subject. Section 25.6 discusses methods for incorporating domain knowledge into the clustering process. This chapter concludes with a brief survey of interesting applications of clustering methods to astronomy data (Section 25.7). The chapter begins with k-means because it is both generally accessible and so widely used that understanding it can be considered a necessary prerequisite for further work in the field. EM can be viewed as a more sophisticated version of k-means that uses a generative model for each cluster and probabilistic item assignments.
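Since the chapter treats k-means as a prerequisite, a minimal Python sketch of the standard Lloyd iteration may help. It assigns each item to its nearest centre, moves each centre to the mean of its items, and repeats until assignments stop changing. The initialization (k items drawn at random without replacement), the squared Euclidean distance, and the function names are illustrative choices, not details taken from the chapter.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means on a list of D-dimensional points; returns (labels, centres)."""
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(X, k)]   # k items drawn without replacement
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centre under squared Euclidean distance.
        new_labels = [min(range(k), key=lambda j: sq_dist(x, centres[j])) for x in X]
        if new_labels == labels:                    # assignments stable: converged
            break
        labels = new_labels
        # Update step: move each centre to the mean of the items assigned to it.
        for j in range(k):
            members = [x for x, lab in zip(X, labels) if lab == j]
            if members:                             # an empty cluster keeps its old centre
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres

X = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centres = kmeans(X, k=2)
```

The two well-separated groups end up with different labels; which group gets label 0 depends on the random initialization.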
Agglomerative clustering is the most basic form of hierarchical clustering and provides a basis for further exploration of algorithms in that vein. Spectral clustering permits a departure from feature-vector-based clustering and can operate on data sets represented instead as affinity, or similarity, matrices, that is, cases in which only pairwise information is known. The list of algorithms covered in this chapter is representative of those most commonly in use, but it is by no means comprehensive. There is an extensive collection of existing books on clustering that provide additional background and depth. Three early books that remain useful today are Anderberg’s Cluster Analysis for Applications [3], Hartigan’s Clustering Algorithms [25], and Gordon’s Classification [22]. The latter covers basics on similarity measures, partitioning and hierarchical algorithms, fuzzy clustering, overlapping clustering, conceptual clustering, validation methods, and visualization or data reduction techniques such as principal components analysis (PCA), multidimensional scaling, and self-organizing maps. More recently, Jain et al. provided a useful and informative survey [27] of a variety of clustering algorithms, including those mentioned here as well as fuzzy, graph-theoretic, and evolutionary clustering. Everitt’s Cluster Analysis [19] provides a modern overview of algorithms, similarity measures, and evaluation methods.
NASA Astrophysics Data System (ADS)
Rahman, Inayat Ur; Khan, Nasrullah; Ali, Kishwar
2017-04-01
An understory vegetation survey of the Pinus wallichiana-dominated temperate forests of Swat District was carried out to examine the structure, composition and ecological associations of the forest vegetation. A quadrat sampling method was used to record the floristic and phytosociological data needed for the analysis, using 300 quadrats of 10 × 10 m each. Vegetation parameters, namely frequency and density, were recorded for trees (overstory vegetation) as well as for the understory vegetation. The results revealed that in total, 92 species belonging to 77 genera and 45 families occurred in the area. The largest families were Asteraceae, Rosaceae and Lamiaceae with 12, ten and nine species, respectively. Ward's agglomerative cluster analysis for tree species resulted in three floristically and ecologically distinct community types along different topographic and soil variables. Importance value indices (IVI) were also calculated for the understory vegetation and subjected to ordination techniques, i.e. canonical correspondence analysis (CCA) and detrended correspondence analysis (DCA). The DCA bi-plot for stands shows that most stands were scattered around the centre of the plot, forming two loosely scattered clusters. The DCA species bi-plot clearly identified three clusters of species, revealing three types of understory communities in the study area. The CCA results differed somewhat from the DCA, showing the impact of environmental variables on the understory species. The CCA results reveal that three environmental variables, i.e. altitude, slope and P (mg/kg), have a strong influence on the distribution of stands and species. The impact of tree species on the understory vegetation was also tested by CCA, which showed that four tree species, i.e. P. wallichiana A.B. Jackson, Juglans regia Linn., Quercus dilatata Lindl. ex Royle and Cedrus deodara (Roxb. ex Lamb.) G. Don, strongly influence the associated understory vegetation.
It is therefore concluded that Swat District has various microclimatic zones with suitable environmental variables to support distinct flora.
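Ward's method merges, at each step, the pair of clusters whose union gives the smallest increase in the total within-cluster sum of squares. The naive Python sketch below, intended only for small data sets, makes that criterion explicit. The abstract does not state the variables, distance measure, or software used, so the Euclidean feature space, the function names, and the toy stand coordinates are hypothetical.

```python
from itertools import combinations

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return [sum(col) / len(points) for col in zip(*points)]

def ward_cost(A, B, X):
    """Increase in total within-cluster sum of squares if clusters A and B merge:
    |A||B| / (|A| + |B|) * ||mean(A) - mean(B)||^2."""
    ma = centroid([X[i] for i in A])
    mb = centroid([X[i] for i in B])
    sq = sum((a - b) ** 2 for a, b in zip(ma, mb))
    return len(A) * len(B) / (len(A) + len(B)) * sq

def ward(X, n_clusters):
    """Agglomerate singletons with Ward's criterion until n_clusters remain.

    Returns (clusters, merges), where merges records (cluster_a, cluster_b, cost).
    """
    clusters = [[i] for i in range(len(X))]
    merges = []
    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: ward_cost(clusters[p[0]], clusters[p[1]], X))
        merges.append((sorted(clusters[i]), sorted(clusters[j]),
                       ward_cost(clusters[i], clusters[j], X)))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]          # combinations() yields i < j, so index i stays valid
    return [sorted(c) for c in clusters], merges

# Hypothetical stands described by two variables.
stands = [(0, 0), (0, 1), (10, 0), (10, 1), (5, 20), (5, 21)]
groups, history = ward(stands, n_clusters=3)
```

Each pass rescans every pair, so this is far slower than library implementations; it is meant only to show the merge criterion.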
NASA Astrophysics Data System (ADS)
Massiot, Cécile; Townend, John; Nicol, Andrew; McNamara, David D.
2017-08-01
Acoustic borehole televiewer (BHTV) logs provide measurements of fracture attributes (orientations, thickness, and spacing) at depth. Orientation, censoring, and truncation sampling biases similar to those described for one-dimensional outcrop scanlines, together with other logging or drilling artifacts specific to BHTV logs, can affect the interpretation of fracture attributes from BHTV logs. K-means, fuzzy K-means, and agglomerative clustering methods provide transparent means of separating fracture groups on the basis of their orientation. Fracture spacing is calculated for each of these fracture sets. Maximum likelihood estimation using truncated distributions permits the fitting of several probability distributions to the fracture attribute data sets within truncation limits, which can then be extrapolated over the entire range where they naturally occur. The Akaike Information Criterion (AIC) and Schwarz Bayesian Criterion (SBC) rank the distributions by how well they fit the data. We demonstrate these attribute analysis methods with a data set derived from three BHTV logs acquired from the high-temperature Rotokawa geothermal field, New Zealand. Varying BHTV log quality reduces the number of input data points, but careful selection of the quality levels at which fractures are deemed fully sampled increases the reliability of the analysis. Spacing data sets comprising up to 300 data points and spanning three orders of magnitude can be fitted similarly well (similar AIC rankings) by several distributions. Several clustering configurations and probability distributions can often characterize the data at similar levels of statistical criteria. Thus, several scenarios should be considered when using BHTV log data to constrain numerical fracture models.
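The ranking step can be sketched as follows. Fit each candidate distribution by maximum likelihood, then compute AIC = 2k - 2 ln L and SBC = k ln n - 2 ln L, where k is the number of fitted parameters and n the number of observations; lower values indicate a better balance of fit and complexity. This Python sketch ignores the truncation limits that the study accounts for. The two candidate distributions (exponential and lognormal), the function names, and the spacing values are illustrative assumptions.

```python
from math import log, pi

def exponential_fit(x):
    """MLE fit of an exponential distribution; returns (log-likelihood, n_params)."""
    lam = len(x) / sum(x)                       # MLE of the rate parameter
    return sum(log(lam) - lam * v for v in x), 1

def lognormal_fit(x):
    """MLE fit of a lognormal distribution; returns (log-likelihood, n_params)."""
    ys = [log(v) for v in x]
    n = len(ys)
    mu = sum(ys) / n
    var = sum((y - mu) ** 2 for y in ys) / n    # MLE variance of log(x)
    ll = sum(-y - 0.5 * log(2 * pi * var) - (y - mu) ** 2 / (2 * var) for y in ys)
    return ll, 2

def rank_by_aic(x, fitters):
    """Return (name, AIC, SBC) tuples sorted from best (lowest AIC) to worst."""
    n = len(x)
    rows = []
    for name, fit in fitters.items():
        ll, k = fit(x)
        rows.append((name, 2 * k - 2 * ll, k * log(n) - 2 * ll))
    return sorted(rows, key=lambda row: row[1])

# Hypothetical fracture spacings (m), spanning about two orders of magnitude.
spacings = [0.1, 0.2, 0.4, 0.8, 1.6, 3.2, 6.4, 12.8]
ranking = rank_by_aic(spacings, {"exponential": exponential_fit, "lognormal": lognormal_fit})
for name, aic, sbc in ranking:
    print(f"{name:12s} AIC={aic:.2f} SBC={sbc:.2f}")
```

A proper treatment of sampling bias would replace each density with its truncated form, renormalized over the observable range, before maximizing the likelihood.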
DOE Office of Scientific and Technical Information (OSTI.GOV)
N Liu; P Yu
2011-12-31
The objective of this study was to use molecular spectral analyses with the diffuse reflectance Fourier transform infrared spectroscopy (DRIFT) bioanalytical technique to study carbohydrate conformation features, molecular clustering and interrelationships in hull and seed among six barley cultivars (AC Metcalfe, CDC Dolly, McLeod, CDC Helgason, CDC Trey, CDC Cowboy) that had different rumen degradation kinetics. The molecular structure spectral analyses in both hull and seed involved the fingerprint regions of ca. 1536-1484 cm⁻¹ (attributed mainly to aromatic lignin semicircle ring stretch), ca. 1293-1212 cm⁻¹ (attributed mainly to cellulosic compounds in the hull), ca. 1269-1217 cm⁻¹ (attributed mainly to cellulosic compounds in the seeds), and ca. 1180-800 cm⁻¹ (attributed mainly to total CHO C-O stretching vibrations), together with agglomerative hierarchical cluster analysis (AHCA) and principal component spectral analysis (PCA). The results showed that the DRIFT technique combined with AHCA and PCA molecular analyses was able to reveal carbohydrate conformation features and identify carbohydrate molecular structure differences in both hull and seeds among the barley varieties. The carbohydrate molecular spectral analyses in the region of ca. 1185-800 cm⁻¹, together with AHCA and PCA, showed that the inherent structures of barley seed exhibited distinguishable differences among the varieties. CDC Helgason differed from AC Metcalfe, McLeod, CDC Cowboy and CDC Dolly in seed carbohydrate conformation. Clear molecular cluster classes could be distinguished and identified in the AHCA, and separate ellipses could be grouped in the PCA. However, CDC Helgason showed no distinguishable differences from CDC Trey in carbohydrate conformation. These differences in carbohydrate conformation and structure could partially explain why the varieties differed in digestive behaviors in animals.
The molecular spectroscopy technique used in this study could also be used for other plant-based feed and food structure studies.
Object-Oriented Image Clustering Method Using UAS Photogrammetric Imagery
NASA Astrophysics Data System (ADS)
Lin, Y.; Larson, A.; Schultz-Fellenz, E. S.; Sussman, A. J.; Swanson, E.; Coppersmith, R.
2016-12-01
Unmanned Aerial Systems (UAS) have been widely used as an imaging modality to obtain remotely sensed multi-band surface imagery, and are growing in popularity due to their efficiency, ease of use, and affordability. Los Alamos National Laboratory (LANL) has used UAS for geologic site characterization and change detection studies at a variety of field sites. The deployed UAS was equipped with a standard visible-band camera to collect imagery datasets. Based on the imagery collected, we use deep sparse algorithmic processing to detect and discriminate subtle topographic features created or impacted by subsurface activities. In this work, we develop an object-oriented remote sensing imagery clustering method for land cover classification. To improve clustering and segmentation accuracy, instead of using conventional pixel-based clustering methods, we integrate the spatial information from neighboring regions to create super-pixels, avoiding salt-and-pepper noise and subsequent over-segmentation. To further improve the robustness of our clustering method, we also incorporate a custom digital elevation model (DEM) dataset generated using a structure-from-motion (SfM) algorithm together with the red, green, and blue (RGB) band data for clustering. In particular, we first employ agglomerative clustering to create an initial segmentation map, in which every object is treated as a single (new) pixel. Based on the new pixels obtained, we generate new features to implement another level of clustering. We apply our clustering method to the RGB+DEM datasets collected at the field site. Through binary clustering and multi-object clustering tests, we verify that our method can accurately separate vegetation from non-vegetation regions and can also differentiate object features on the surface.
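The second clustering level, in which each segment acts as a single new pixel, needs one feature vector per object. Below is a minimal Python sketch, assuming (the abstract does not say so) that the object features are simply the per-band means of the member pixels' RGB and DEM values. The function name and the tiny tile are hypothetical.

```python
from collections import defaultdict

def object_features(features, labels):
    """Average the per-pixel feature vectors within each segment.

    features: rows of per-pixel tuples such as (R, G, B, elevation)
    labels:   same shape, integer segment id per pixel from the first-level clustering
    Returns {segment_id: mean feature tuple}; each segment then acts as one 'new pixel'.
    """
    sums = {}
    counts = defaultdict(int)
    for feat_row, lab_row in zip(features, labels):
        for f, s in zip(feat_row, lab_row):
            if s not in sums:
                sums[s] = [0.0] * len(f)
            for d, v in enumerate(f):
                sums[s][d] += v
            counts[s] += 1
    return {s: tuple(v / counts[s] for v in sums[s]) for s in sums}

# Hypothetical 2 x 2 tile: each pixel is (R, G, B, elevation in m).
tile = [[(10, 20, 30, 100.0), (12, 22, 32, 102.0)],
        [(200, 180, 160, 5.0), (202, 182, 162, 7.0)]]
segments = [[0, 0],
            [1, 1]]
objects = object_features(tile, segments)
```

The resulting per-object vectors can then be passed to any second-level clustering step in place of the raw pixels.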
Seekamp, Erin; Cerveny, Lee K; McCreary, Allie
2011-09-01
Federal land management agencies, such as the USDA Forest Service, have expanded the role of recreation partners, reflecting constrained growth in appropriations and broader societal trends toward civic environmental governance. Partnerships with individual volunteers, service groups, commercial outfitters, and other government agencies provide the USDA Forest Service with the resources necessary to complete projects and meet goals under fiscal constraints. Existing partnership typologies typically focus on collaborative or strategic alliances and highlight organizational dimensions (e.g., structure and process) defined by researchers. This paper presents a partner typology constructed from USDA Forest Service partnership practitioners' conceptualizations of 35 common partner types. Multidimensional scaling of data from unconstrained pile sorts identified three distinct cultural dimensions of recreation partners (partnership character, partner impact, and partner motivations) that represent institutional, individual, and socio-cultural cognitive domains. A hierarchical agglomerative cluster analysis provides further insight into the various domains of agency personnel's conceptualizations. While three dimensions with high reliability (RSQ = 0.83) and corresponding hierarchical clusters illustrate commonality between agency personnel's partnership suppositions, this study also reveals variance in personnel's familiarity and affinity for specific partnership types. This real-world perspective on partner types highlights that agency practitioners not only make strategic choices when selecting and cultivating partnerships to accomplish critical tasks, but also elect to work with partners for the primary purpose of providing public service and fostering land stewardship.
NASA Astrophysics Data System (ADS)
Zhou, Y.; Fang, Z.
2017-09-01
There is significant social and spatial differentiation among residential communities in cities. People living in different places have different socioeconomic backgrounds, resulting in different geographical activity patterns. This paper aims to label the characteristics of residential communities in a city using collective activity patterns derived from taxi trip data. Specifically, we first present a method to allocate the O/D (Origin/Destination) points of taxi trips to the land use parcels where the activities take place. Then several indices are employed to describe the collective activity patterns, including activity intensity, travel distance, travel time, and activity space of residents, taking into account the geographical distribution of all O/Ds of the taxi trips related to each residential community. Next, an agglomerative hierarchical clustering algorithm is applied to cluster residential communities with similar activity patterns. In the case study of Wuhan, the residential communities are clearly divided into eight clusters, which could be labelled as ordinary communities, privileged communities, old isolated communities, suburban communities, and so on. This paper provides a new perspective for labelling land use of the same type from people's mobility patterns with the support of big trajectory data.
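Activity indices such as trip counts, travel distance and travel time are measured in different units, so they are commonly standardized before distance-based clustering, otherwise the index with the largest numeric range dominates the distances. The abstract does not say whether this was done, so the following Python sketch of z-score standardization is an assumption, with hypothetical index values and function name.

```python
from statistics import mean, pstdev

def zscore_columns(rows):
    """Standardize each column (index) to zero mean and unit population standard deviation.

    Columns with zero spread are mapped to 0.0 to avoid division by zero.
    """
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    return [[(v - m) / s if s > 0 else 0.0 for v, (m, s) in zip(row, stats)]
            for row in rows]

# Hypothetical communities: (trips per day, mean travel distance in km, mean travel time in min)
communities = [(120, 4.2, 15.0), (80, 9.8, 27.0), (200, 3.1, 12.0)]
standardized = zscore_columns(communities)
```

The standardized rows can then be clustered with any agglomerative linkage without one index swamping the others.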
Loewenstein, Yaniv; Portugaly, Elon; Fromer, Menachem; Linial, Michal
2008-07-01
UPGMA (average linkage) is probably the most popular algorithm for hierarchical data clustering, especially in computational biology. However, UPGMA requires the entire dissimilarity matrix in memory. Because of this prohibitive requirement, UPGMA does not scale to very large datasets. We present a novel class of memory-constrained UPGMA (MC-UPGMA) algorithms. Given any practical memory size constraint, this framework guarantees the correct clustering solution without explicitly requiring all dissimilarities in memory. The algorithms are general and are applicable to any dataset. We present a data-dependent characterization of hardness and clustering efficiency. The presented concepts are applicable to any agglomerative clustering formulation. We apply our algorithm to the entire collection of protein sequences to automatically build a comprehensive, evolution-driven hierarchy of proteins from sequence alone. The newly created tree captures protein families better than state-of-the-art large-scale methods such as CluSTr, ProtoNet4 or single-linkage clustering. We demonstrate that leveraging the entire mass embodied in all sequence similarities significantly improves on current protein family clusterings, which are unable to directly tackle the sheer mass of this data. Furthermore, we argue that non-metric constraints are an inherent complexity of the sequence space and should not be overlooked. The robustness of UPGMA allows significant improvement, especially for multidomain proteins and for large or divergent families. A comprehensive tree built from all UniProt sequence similarities, together with navigation and classification tools, will be made available as part of the ProtoNet service. A C++ implementation of the algorithm is available on request.
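For reference, the standard in-memory form of UPGMA, the one that needs the full dissimilarity matrix, can be sketched in a few lines of Python. It repeatedly merges the closest pair of clusters and sets the distance from the new cluster to every other cluster to the size-weighted average of the two old distances, which equals the mean of all item-to-item dissimilarities between the clusters. This is not the MC-UPGMA algorithm described above; the function name and toy matrix are hypothetical.

```python
def upgma(D):
    """Average-linkage (UPGMA) clustering of a full, symmetric dissimilarity matrix D.

    Returns the merges in order as (members_a, members_b, height) tuples.
    """
    n = len(D)
    clusters = {i: [i] for i in range(n)}            # cluster id -> member items
    # Pairwise cluster distances, keyed by (smaller id, larger id).
    dist = {(i, j): D[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    merges = []
    while len(clusters) > 1:
        (a, b), height = min(dist.items(), key=lambda kv: kv[1])   # closest pair
        ma, mb = clusters.pop(a), clusters.pop(b)
        # Size-weighted average of old distances = mean item-to-item dissimilarity.
        new = {}
        for c in clusters:
            d_ac = dist[(min(a, c), max(a, c))]
            d_bc = dist[(min(b, c), max(b, c))]
            new[c] = (len(ma) * d_ac + len(mb) * d_bc) / (len(ma) + len(mb))
        dist = {key: v for key, v in dist.items() if a not in key and b not in key}
        for c, v in new.items():
            dist[(c, next_id)] = v                    # next_id exceeds every existing id
        clusters[next_id] = ma + mb
        merges.append((sorted(ma), sorted(mb), height))
        next_id += 1
    return merges

D = [[0, 1, 4, 5],
     [1, 0, 6, 7],
     [4, 6, 0, 3],
     [5, 7, 3, 0]]
merges = upgma(D)
```

Here the final merge height is 5.5, the mean of the four cross dissimilarities (4, 5, 6, 7) between {0, 1} and {2, 3}. Storing `dist` for all pairs is exactly the quadratic memory cost that motivates a memory-constrained variant.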
NASA Astrophysics Data System (ADS)
Singh, Awnesh; Delcroix, Thierry
2013-12-01
One of the leading theories explaining the oscillatory nature of the El Niño Southern Oscillation is the recharge-discharge oscillator paradigm, which rests on the exchange of warm water between the equatorial and off-equatorial regions. This study tests the relevance of this theory for explaining both the Eastern Pacific and the recently much-discussed Central Pacific El Niño events. The recharge-discharge of the equatorial Pacific, measured here as changes in Warm (>20 °C) Water Volume (WWV), is analysed using monthly 1993-2010 sea level anomaly (a proxy for WWV) obtained from altimetry, and a validated 1958-2007 DRAKKAR simulation. An Agglomerative Hierarchical Clustering (AHC) technique performed on the observed and modelled WWV tendency shows the existence of five distinct clusters, which characterise Eastern Pacific (EP) El Niño, Central Pacific (CP) El Niño, La Niña, post-EP El Niño, and neutral conditions. The AHC results, complemented by lagged-regression analysis and 3-month averages of typical EP and CP El Niño events, indicate that the equatorial band WWV discharge during CP El Niño is not as pronounced as during EP El Niño. To understand the differences, we analysed the balance of horizontal mass transports accounting for changes in WWV tendency. The analysis indicates an overall poleward transport during EP El Niño, which is not the case during CP El Niño. Instead, a compensating effect with a poleward (equatorward) transport occurring in the western (eastern) Pacific is evident, in line with changes in the zonal thermocline slopes occurring in the western (eastern) half of the basin. The WWV changes are discussed with respect to the conceptual phases of the recharge-discharge oscillator paradigm.
NASA Astrophysics Data System (ADS)
Takahashi, A.; Hashimoto, M.; Hu, J. C.; Fukahata, Y.
2017-12-01
Taiwan Island is composed of many geological structures. The main tectonic feature is the collision of the Luzon volcanic arc with the Eurasian continent, which propagates westward and generates complicated crustal deformation. One way to model crustal deformation is to divide Taiwan Island into many rigid blocks that move relative to each other along the block boundaries (deformation zones). Since earthquakes tend to occur in the deformation zones, identifying such tectonic boundaries is important. Many tectonic boundaries have been proposed on the basis of geology, geomorphology, seismology and geodesy. However, which boundary is most significant depends on the discipline, and there has been no objective way to classify them. Here, we introduce an objective method to identify significant tectonic boundaries with a hierarchical representation proposed by Simpson et al. [2012]. We apply a hierarchical agglomerative clustering algorithm to dense GNSS horizontal velocity data in Taiwan. One significant merit of the hierarchical representation of the clustering results is that we can consistently explore crustal structures from larger to smaller scales, because a higher level of the hierarchy corresponds to a larger crustal structure and a lower level to a smaller one. Relative motion between clusters can also be obtained from this analysis. The first major boundary is identified along the eastern margin of the Longitudinal Valley, which corresponds to the separation of the Philippine Sea plate and the Eurasian continental margin. The second major boundary appears along the Chaochou fault and the Chishan fault in southwestern Taiwan. The third major boundary appears along the eastern margin of the coastal plain. The identified major clusters can be divided into several smaller blocks without losing consistency with geological boundaries. For example, the Fengshun fault, concealed beneath thick sediment layers, is identified.
Furthermore, the relative motion obtained between clusters requires a reverse fault or a left-lateral fault offshore of the Coastal Range. Our clustering-based block modeling is consistent with the tectonics of Taiwan, implying that the observed crustal deformation in Taiwan can be attributed to the motion or deformation of shallow structures.
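The property exploited here, that cutting the hierarchy higher up yields fewer, larger clusters that contain the smaller clusters found lower down, can be illustrated with a short Python sketch. Given the merge history from any agglomerative algorithm over n items, applying only the first n - k merges gives the k-cluster partition. The merge history below is hypothetical and is not derived from the Taiwan GNSS data; the function names are illustrative.

```python
def cut_hierarchy(n, merges, k):
    """Partition of items 0..n-1 into k clusters after applying the first n - k merges.

    merges: (i, j) item pairs in merge order; each joins the clusters containing i and j.
    """
    parent = list(range(n))                  # union-find forest over the items

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]    # path halving
            x = parent[x]
        return x

    for i, j in merges[: n - k]:
        parent[find(i)] = find(j)
    groups = {}
    for x in range(n):
        groups.setdefault(find(x), []).append(x)
    return sorted(groups.values())

def is_nested(fine, coarse):
    """True if every cluster of the finer partition lies inside one coarse cluster."""
    return all(any(set(f) <= set(c) for c in coarse) for f in fine)

# Hypothetical merge history for six GNSS stations.
merges = [(0, 1), (2, 3), (4, 5), (0, 2), (0, 4)]
blocks = cut_hierarchy(6, merges, 3)     # smaller-scale blocks
major = cut_hierarchy(6, merges, 2)      # larger-scale structures
print(blocks, major, is_nested(blocks, major))
# prints: [[0, 1], [2, 3], [4, 5]] [[0, 1, 2, 3], [4, 5]] True
```

Because each cut only adds merges to the previous one, partitions at different levels are always nested, which is what allows structures to be explored consistently from large to small scales.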
NASA Astrophysics Data System (ADS)
Flynn, Clare Marie; Pickering, Kenneth E.; Crawford, James H.; Weinheimer, Andrew J.; Diskin, Glenn; Thornhill, K. Lee; Loughner, Christopher; Lee, Pius; Strode, Sarah A.
2016-12-01
To investigate the variability of in situ profile shapes under a variety of meteorological and pollution conditions, results of an agglomerative hierarchical cluster analysis of the in situ O3 and NO2 profiles for each of the four campaigns of the NASA DISCOVER-AQ mission are presented. Understanding the observed profile variability for these trace gases is useful for assessing the accuracy of the assumed profile shapes used in satellite retrieval algorithms, as well as for understanding the correlation between satellite column observations and surface concentrations. The four campaigns of the DISCOVER-AQ mission took place in Maryland during July 2011, the San Joaquin Valley of California during January-February 2013, the Houston, Texas, metropolitan region during September 2013, and the Denver-Front Range region of Colorado during July-August 2014. Several distinct O3 profile clusters emerged for the California, Texas, and Colorado campaigns, indicating significant variability of O3 profile shapes, while the Maryland campaign presented only one distinct O3 cluster. In contrast, very few distinct profile clusters emerged for NO2 during any campaign for this particular clustering technique, indicating that NO2 profile behavior was relatively uniform throughout each campaign. Changes in NO2 profile shape were evident as the boundary layer evolved through the day, but they were apparently not significant enough to yield more clusters. The degree of vertical mixing (as indicated by temperature lapse rate) associated with each cluster exerted an important influence on the shapes of the median cluster profiles for O3, and also affected the correlations between the associated column and surface O3 data for each cluster.
The correlation analyses suggest that satellites may have the best chance of relating to surface O3 under the conditions encountered during the Maryland campaign's Clusters 1 and 2, which include deep, convective boundary layers and few interruptions to this connection from complex meteorology, chemical environments, or orography. The regional CMAQ model captured the shape factors for O3, and moderately well captured the NO2 shape factors, for the conditions associated with the Maryland campaign, suggesting that a regional air quality model may adequately specify a priori profile shapes for remote sensing retrievals. CMAQ shape factor profiles were not as well represented for the other regions.
MINE: Module Identification in Networks
2011-01-01
Background: Graphical models of network associations are useful for both visualizing and integrating multiple types of association data. Identifying modules, or groups of functionally related gene products, is an important challenge in analyzing biological networks. However, existing tools to identify modules are insufficient when applied to dense networks of experimentally derived interaction data. To address this problem, we have developed an agglomerative clustering method that is able to identify highly modular sets of gene products within highly interconnected molecular interaction networks.
Results: MINE outperforms MCODE, CFinder, NEMO, SPICi, and MCL in identifying non-exclusive, high modularity clusters when applied to the C. elegans protein-protein interaction network. The algorithm generally achieves superior geometric accuracy and modularity for annotated functional categories. In comparison with the most closely related algorithm, MCODE, the top clusters identified by MINE are consistently of higher density and MINE is less likely to designate overlapping modules as a single unit. MINE offers a high level of granularity with a small number of adjustable parameters, enabling users to fine-tune cluster results for input networks with differing topological properties.
Conclusions: MINE was created in response to the challenge of discovering high quality modules of gene products within highly interconnected biological networks. The algorithm allows a high degree of flexibility and user-customisation of results with few adjustable parameters. MINE outperforms several popular clustering algorithms in identifying modules with high modularity and obtains good overall recall and precision of functional annotations in protein-protein interaction networks from both S. cerevisiae and C. elegans. PMID:21605434
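Modularity scores how much denser the connections inside modules are than would be expected at random. The abstract does not give MINE's exact scoring function, so the following Python sketch uses the standard Newman-Girvan modularity of a partition of an unweighted, undirected graph, Q = sum over modules c of [L_c / m - (d_c / 2m)^2], where m is the number of edges, L_c the number of edges inside module c, and d_c the total degree of its nodes. MINE reports non-exclusive (overlapping) modules, which this formula does not handle; the function name and toy graph are illustrative assumptions.

```python
def modularity(edges, communities):
    """Newman-Girvan modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs; communities: disjoint node lists covering all nodes.
    """
    m = len(edges)
    comm_of = {v: ci for ci, c in enumerate(communities) for v in c}
    inside = [0] * len(communities)   # L_c: edges with both ends in community c
    degree = [0] * len(communities)   # d_c: summed degree of the nodes in c
    for u, v in edges:
        degree[comm_of[u]] += 1
        degree[comm_of[v]] += 1
        if comm_of[u] == comm_of[v]:
            inside[comm_of[u]] += 1
    return sum(L / m - (d / (2 * m)) ** 2 for L, d in zip(inside, degree))

# Two triangles joined by a single bridging edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(modularity(edges, [[0, 1, 2], [3, 4, 5]]))   # 5/14, about 0.357
print(modularity(edges, [[0, 1, 2, 3, 4, 5]]))     # a single module scores 0.0
```

Splitting at the bridge scores higher than lumping everything together, which is the behaviour a module-finding method aims to reward.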
The Future of Wind Energy in California: Future Projections in Variable-Resolution CESM
NASA Astrophysics Data System (ADS)
Wang, M.; Ullrich, P. A.; Millstein, D.; Collier, C.
2017-12-01
This study focuses on the wind energy characterization and future projection at five primary wind turbine sites in California. Historical (1980-2000) and mid-century (2030-2050) simulations were produced using the Variable-Resolution Community Earth System Model (VR-CESM) to analyze the trends and variations in wind energy under climate change. Datasets from Det Norske Veritas Germanischer Lloyd (DNV GL), MERRA-2, CFSR, NARR, as well as surface observational data were used for model validation and comparison. Significant seasonal wind speed changes under RCP8.5 were detected at several wind farm sites. Large-scale patterns were then investigated to analyze the synoptic-scale impact on localized wind change. The agglomerative clustering method was applied to analyze and group different wind patterns. The associated meteorological background of each cluster was investigated to analyze the drivers of different wind patterns. This study improves the characterization of uncertainty around the magnitude and variability in space and time of California's wind resources in the near future, and also enhances understanding of the physical mechanisms related to the trends in wind resource variability.
Đorđević, Aleksandra S; Jovanović, Olga P; Zlatković, Bojan K; Stojanović, Gordana S
2016-06-01
The essential oils isolated from fresh aerial parts of Ballota macedonica (two populations) and Ballota nigra ssp. foetida were analyzed by GC and GC/MS. Eighty-five components were identified in total: 60 components in B. macedonica oil (population from the Former Yugoslav Republic of Macedonia), 34 components in B. macedonica oil (population from the Republic of Serbia), and 33 components in the oil of B. nigra ssp. foetida, accounting for 93.9%, 98.4%, and 95.8% of the total oils, respectively. The most abundant components in B. macedonica oils were carotol (13.7 - 52.1%), germacrene D (8.6 - 24.6%), and (E)-caryophyllene (6.5 - 16.5%), while B. nigra ssp. foetida oil was dominated by (E)-phytol (56.9%), germacrene D (10.0%), and (E)-caryophyllene (4.7%). Multivariate statistical analyses (agglomerative hierarchical cluster analysis and principal component analysis) were used to compare and discuss relationships among Ballota species examined so far based on their volatile profiles. The chemical compositions of B. macedonica essential oils are reported for the first time. © 2016 Verlag Helvetica Chimica Acta AG, Zürich.
Whitwell, Jennifer L; Przybelski, Scott A; Weigand, Stephen D; Ivnik, Robert J; Vemuri, Prashanthi; Gunter, Jeffrey L; Senjem, Matthew L; Shiung, Maria M; Boeve, Bradley F; Knopman, David S; Parisi, Joseph E; Dickson, Dennis W; Petersen, Ronald C; Jack, Clifford R; Josephs, Keith A
2009-11-01
The behavioural variant of frontotemporal dementia is a progressive neurodegenerative syndrome characterized by changes in personality and behaviour. It is typically associated with frontal lobe atrophy, although patterns of atrophy are heterogeneous. The objective of this study was to examine case-by-case variability in patterns of grey matter atrophy in subjects with the behavioural variant of frontotemporal dementia and to investigate whether the behavioural variant of frontotemporal dementia can be divided into distinct anatomical subtypes. Sixty-six subjects who fulfilled clinical criteria for a diagnosis of the behavioural variant of frontotemporal dementia and had a volumetric magnetic resonance imaging scan were identified. Grey matter volumes were obtained for 26 regions of interest, covering frontal, temporal and parietal lobes, striatum, insula and supplementary motor area, using the automated anatomical labelling atlas. Regional volumes were divided by total grey matter volume. A hierarchical agglomerative cluster analysis using Ward's linkage method was performed to group the subjects with the behavioural variant of frontotemporal dementia into different anatomical clusters. Voxel-based morphometry was used to assess patterns of grey matter loss in each identified cluster of subjects compared with an age- and gender-matched control group at P < 0.05 (family-wise error corrected). We identified four potentially useful clusters with distinct patterns of grey matter loss, which we posit represent anatomical subtypes of the behavioural variant of frontotemporal dementia. Two of these subtypes were associated with temporal lobe volume loss, with one subtype showing loss restricted to temporal lobe regions (temporal-dominant subtype) and the other showing grey matter loss in the temporal lobes as well as the frontal and parietal lobes (temporofrontoparietal subtype).
Another two subtypes were characterized by a large amount of frontal lobe volume loss: one showed grey matter loss in the frontal lobes as well as the temporal lobes (frontotemporal subtype), and the other showed loss relatively restricted to the frontal lobes (frontal-dominant subtype). These four subtypes differed on clinical measures of executive function, episodic memory and confrontation naming. There were also associations between the four subtypes and genetic or pathological diagnoses, which were obtained in 48% of the cohort. The clusters did not differ in behavioural severity as measured by the Neuropsychiatric Inventory, supporting the original classification of these subjects as having the behavioural variant of frontotemporal dementia. Our findings suggest that the behavioural variant of frontotemporal dementia can be subdivided into four anatomical subtypes.
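The clustering step described above (regional volumes divided by total grey matter volume, then Ward's linkage) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the toy volumes are hypothetical, only three regions are used instead of 26, distances are squared Euclidean with the Lance-Williams update for Ward's method, and the naive O(n³) loop is only suitable for small cohorts.

```python
from itertools import combinations


def normalise_by_total(regional_volumes, total_volume):
    """Divide each regional grey-matter volume by the subject's total volume."""
    return [v / total_volume for v in regional_volumes]


def ward_linkage(points):
    """Naive Ward agglomeration.

    Returns merges (cluster_a, cluster_b, height, new_size); the cluster
    created by merge number s gets id len(points) + s.
    """
    n = len(points)
    sizes = {i: 1 for i in range(n)}
    # Squared Euclidean distances between singleton clusters
    d2 = {
        frozenset((i, j)): sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
        for i, j in combinations(range(n), 2)
    }
    merges = []
    next_id = n
    while len(sizes) > 1:
        pair = min(d2, key=d2.get)  # closest pair under Ward's criterion
        i, j = sorted(pair)
        dij = d2.pop(pair)
        ni, nj = sizes.pop(i), sizes.pop(j)
        for k, nk in sizes.items():
            dik = d2.pop(frozenset((i, k)))
            djk = d2.pop(frozenset((j, k)))
            # Lance-Williams update for Ward's method on squared distances
            d2[frozenset((next_id, k))] = (
                (ni + nk) * dik + (nj + nk) * djk - nk * dij
            ) / (ni + nj + nk)
        sizes[next_id] = ni + nj
        merges.append((i, j, dij ** 0.5, ni + nj))
        next_id += 1
    return merges


def cut_into_k(merges, n, k):
    """Replay the first n - k merges and label each subject 0..k-1."""
    members = {i: [i] for i in range(n)}
    for step, (a, b, _, _) in enumerate(merges[: n - k]):
        members[n + step] = members.pop(a) + members.pop(b)
    labels = [0] * n
    for label, group in enumerate(sorted(members.values(), key=min)):
        for subject in group:
            labels[subject] = label
    return labels


# Hypothetical toy data: three regional volumes per subject plus total volume
subjects = [
    ([30.0, 10.0, 10.0], 50.0),
    ([29.0, 11.0, 10.0], 50.0),
    ([12.0, 28.0, 10.0], 50.0),
    ([11.0, 29.0, 10.0], 50.0),
]
features = [normalise_by_total(v, t) for v, t in subjects]
merges = ward_linkage(features)
labels = cut_into_k(merges, len(features), 2)
print(labels)  # [0, 0, 1, 1]
```

The toy example cuts the tree into two clusters for readability; the study reports four clusters from its 66 subjects.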
Taha, Zahari; Musa, Rabiu Muazu; P P Abdul Majeed, Anwar; Alim, Muhammad Muaz; Abdullah, Mohamad Razali
2018-02-01
Support Vector Machines (SVMs) have been shown to be effective learning algorithms for classification and prediction. However, SVMs have rarely been applied in specific sports to discriminate between low- and high-performance athletes. The present study classified and predicted high- and low-potential archers from a set of fitness and motor ability variables using SVMs trained with different kernel functions. Fifty youth archers (mean age 17.0 ± 0.6 years, mean ± SD) drawn from various archery programmes completed a six-arrow shooting score test. Standard fitness and ability measurements, namely hand grip, vertical jump, standing broad jump, static balance, upper muscle strength and core muscle strength, were also recorded. Hierarchical agglomerative cluster analysis (HACA) was used to cluster the archers based on the performance variables tested. SVM models with linear, quadratic, cubic, fine RBF, medium RBF and coarse RBF kernel functions were trained on the measured performance variables. HACA clustered the archers into high-potential archers (HPA) and low-potential archers (LPA). The linear, quadratic, cubic and medium RBF kernel models achieved an excellent classification accuracy of 97.5% (a 2.5% error rate) in predicting HPA and LPA. These findings may help coaches and sports managers identify high-potential athletes from a small set of measured fitness and motor ability variables, saving cost, time and effort in talent identification programmes. Copyright © 2017 Elsevier B.V. All rights reserved.
Anomaly detection of flight routes through optimal waypoint
NASA Astrophysics Data System (ADS)
Pusadan, M. Y.; Buliali, J. L.; Ginardi, R. V. H.
2017-01-01
One of the deciding factors of a flight is its route. A flight route is determined by its coordinates (latitude and longitude), which define its waypoints. An anomaly occurs if the aircraft flies outside the specified waypoint area. In flight data, anomalies are detected by identifying problems with the flight route based on ADS-B data. This study aims to determine the optimal waypoints of a flight route. The proposed method comprises: (i) Agglomerative Hierarchical Clustering (AHC) applied to several segments based on the range area of the coordinates (latitude and longitude) of every waypoint; (ii) the cophenetic correlation coefficient (c) to determine the correlation between the members of each cluster; (iii) cubic spline interpolation as a graphical representation of the connections between the coordinates of every waypoint; and (iv) Euclidean distance to measure the distances between waypoints and the two centroids resulting from AHC. The experimental results give cophenetic correlation coefficients of 0.691 ≤ c ≤ 0.974, five segments generated from the range area of the waypoint coordinates, and shortest and longest centroid-to-waypoint distances of 0.46 and 2.18. We conclude that the shortest distance can be used as the reference coordinate of the optimal waypoint, while the farthest distance can indicate a potential anomaly.
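Steps (ii) and (iv) can be illustrated with a short pure-Python sketch: a naive average-linkage AHC that records cophenetic distances, the cophenetic correlation coefficient c computed as the Pearson correlation between original and cophenetic distances, and Euclidean distances from waypoints to a cluster centroid. The linkage method is an assumption (the abstract does not name one), coordinates are treated as planar with no great-circle correction, and the waypoints are hypothetical.

```python
import math
from itertools import combinations


def average_linkage_cophenetic(points):
    """Naive average-linkage AHC that records cophenetic distances.

    Returns {(i, j): merge height at which i and j first share a cluster}.
    """
    clusters = {i: [i] for i in range(len(points))}

    def avg_dist(a, b):
        pairs = [(x, y) for x in clusters[a] for y in clusters[b]]
        return sum(math.dist(points[x], points[y]) for x, y in pairs) / len(pairs)

    coph = {}
    while len(clusters) > 1:
        a, b = min(combinations(sorted(clusters), 2), key=lambda ab: avg_dist(*ab))
        height = avg_dist(a, b)
        for x in clusters[a]:
            for y in clusters[b]:
                coph[(min(x, y), max(x, y))] = height
        clusters[a] = clusters[a] + clusters.pop(b)
    return coph


def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def cophenetic_correlation(points):
    """Pearson correlation between original and cophenetic distances."""
    coph = average_linkage_cophenetic(points)
    pairs = list(combinations(range(len(points)), 2))
    original = [math.dist(points[i], points[j]) for i, j in pairs]
    return pearson(original, [coph[p] for p in pairs])


def distances_to_centroid(points):
    """Euclidean distance from each point to the centroid of the group."""
    centre = tuple(sum(coord) / len(points) for coord in zip(*points))
    return [math.dist(p, centre) for p in points]


# Hypothetical (latitude, longitude) waypoints of one route segment
waypoints = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
c = cophenetic_correlation(waypoints)
d = distances_to_centroid(waypoints[:3])
print(round(c, 3), [round(x, 3) for x in d])
```

A value of c close to 1 means the dendrogram preserves the original pairwise distances well; the smallest centroid distance picks the candidate reference waypoint.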
Spatial Analysis of Case-Mix and Dialysis Modality Associations.
Phirtskhalaishvili, Tamar; Bayer, Florian; Edet, Stephane; Bongiovanni, Isabelle; Hogan, Julien; Couchoud, Cécile
2016-01-01
♦ Background: Health-care systems must attempt to provide appropriate, high-quality, and economically sustainable care that meets the needs and choices of patients with end-stage renal disease (ESRD). France offers 9 different modalities of dialysis, each characterized by dialysis technique, the extent of professional assistance, and the treatment site. The aim of this study was 1) to describe the various dialysis modalities in France and the patient characteristics associated with each of them, and 2) to analyze their regional patterns to identify possible unexpected associations between case-mixes and dialysis modalities. ♦ Methods: The clinical characteristics of the 37,421 adult patients treated by dialysis were described according to their treatment modality. Agglomerative hierarchical cluster analysis was used to aggregate the regions into clusters according to their use of these modalities and the characteristics of their patients. ♦ Results: The gradient of patient characteristics was similar from home hemodialysis (HD) to in-center HD and from non-assisted automated peritoneal dialysis (APD) to assisted continuous ambulatory peritoneal dialysis (CAPD). Analyzing their spatial distribution, we found differences not only in the patient case-mix on dialysis across regions but also in the health care provided to them. The classification of the regions into 6 different clusters allowed us to detect some unexpected associations between case-mixes and treatment modalities. ♦ Conclusions: The 9 available treatment modalities make it theoretically possible to adapt treatment to patients' clinical characteristics and abilities. However, although we found an overall appropriate association of dialysis modalities with the case-mix, major inter-region heterogeneity and the low rates of peritoneal dialysis (PD) and home HD suggest that factors besides patients' clinical conditions influence the choice of dialysis modality. 
The French organization should now be evaluated in terms of patients' quality of life, satisfaction, survival, and global efficiency. Copyright © 2016 International Society for Peritoneal Dialysis. PMID:26475843
Method for preventing plugging in the pyrolysis of agglomerative coals
Green, Norman W.
1979-01-23
To prevent plugging in a pyrolysis operation in which an agglomerative coal, carried in a nondeleteriously reactive carrier gas, is injected as a turbulent jet from an opening into an elongate pyrolysis reactor, the coal is comminuted to a particle size at which the particles detackify under operating conditions before contacting internal reactor surfaces, while a secondary flow of fluid is introduced along the peripheral inner surface of the reactor to prevent backflow of the coal particles. The pyrolysis operation is described by two equations that allow operating conditions ensuring prevention of reactor plugging to be preselected.
Loewenstein, Yaniv; Portugaly, Elon; Fromer, Menachem; Linial, Michal
2008-01-01
Motivation: UPGMA (average linkage) is probably the most popular algorithm for hierarchical data clustering, especially in computational biology. However, UPGMA requires the entire dissimilarity matrix in memory. Because of this prohibitive requirement, UPGMA does not scale to very large datasets. Application: We present a novel class of memory-constrained UPGMA (MC-UPGMA) algorithms. Given any practical memory size constraint, this framework guarantees the correct clustering solution without explicitly requiring all dissimilarities in memory. The algorithms are general and applicable to any dataset. We present a data-dependent characterization of hardness and clustering efficiency. The presented concepts are applicable to any agglomerative clustering formulation. Results: We apply our algorithm to the entire collection of protein sequences to automatically build a comprehensive evolution-driven hierarchy of proteins from sequence alone. The newly created tree captures protein families better than state-of-the-art large-scale methods such as CluSTr, ProtoNet4 or single-linkage clustering. We demonstrate that leveraging the entire mass embodied in all sequence similarities allows us to significantly improve on current protein family clusterings, which are unable to directly tackle the sheer mass of this data. Furthermore, we argue that non-metric constraints are an inherent complexity of the sequence space and should not be overlooked. The robustness of UPGMA allows significant improvement, especially for multidomain proteins and for large or divergent families. Availability: A comprehensive tree built from all UniProt sequence similarities, together with navigation and classification tools, will be made available as part of the ProtoNet service. A C++ implementation of the algorithm is available on request. Contact: lonshy@cs.huji.ac.il PMID:18586742
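For reference, the standard in-memory UPGMA that MC-UPGMA is designed to reproduce can be sketched as follows. It stores every pairwise dissimilarity in a dictionary, which is exactly the memory requirement the abstract describes; the memory-constrained algorithm itself is not reproduced here. Leaf names and dissimilarities are hypothetical, and merged clusters are represented as nested tuples.

```python
def upgma(leaves, dist):
    """Standard in-memory UPGMA (average linkage) clustering.

    leaves: list of leaf names.
    dist: dict mapping frozenset({a, b}) -> dissimilarity for every leaf pair.
    Returns (tree, merges): tree is a nested tuple; merges lists
    (left, right, dissimilarity) in agglomeration order.
    """
    size = {leaf: 1 for leaf in leaves}
    d = dict(dist)
    merges = []
    while len(size) > 1:
        pair = min(d, key=d.get)  # globally closest pair of clusters
        a, b = sorted(pair, key=str)
        d_ab = d.pop(pair)
        new = (a, b)
        na, nb = size.pop(a), size.pop(b)
        for k in size:
            # Size-weighted mean of the dissimilarities to the two merged clusters
            d[frozenset((new, k))] = (
                na * d.pop(frozenset((a, k))) + nb * d.pop(frozenset((b, k)))
            ) / (na + nb)
        size[new] = na + nb
        merges.append((a, b, d_ab))
    (tree,) = size
    return tree, merges


pairs = {
    ("A", "B"): 2, ("A", "C"): 4, ("A", "D"): 6,
    ("B", "C"): 4, ("B", "D"): 6, ("C", "D"): 6,
}
tree, merges = upgma(["A", "B", "C", "D"], {frozenset(p): v for p, v in pairs.items()})
print(tree)  # ((('A', 'B'), 'C'), 'D')
```

Each merge replaces two rows of the matrix with their size-weighted average, which is why UPGMA is called average linkage; the dictionary `d` is the structure whose size grows quadratically with the number of sequences.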
Bizhani, Golnoosh; Grassberger, Peter; Paczuski, Maya
2011-12-01
We study the statistical behavior under random sequential renormalization (RSR) of several network models, including Erdős-Rényi (ER) graphs, scale-free networks, and an annealed model related to ER graphs. In RSR the network is locally coarse grained by choosing, at each renormalization step, a node at random and joining it to all its neighbors. Compared with previous (quasi-)parallel renormalization methods [Song et al., Nature (London) 433, 392 (2005)], RSR allows a more fine-grained analysis of the renormalization group (RG) flow and reveals new features that were not discussed in the previous analyses. In particular, we find that all networks exhibit a second-order transition in their RG flow. This phase transition is associated with the emergence of a giant hub and can be viewed as a new variant of percolation, called agglomerative percolation. We claim that this transition also exists in previous graph renormalization schemes and explains some of the scaling behavior seen there. For critical trees it happens as N/N₀ → 0 in the limit of large systems (where N₀ is the initial size of the graph and N its size at a given RSR step). In contrast, it happens at finite N/N₀ in sparse ER graphs and in the annealed model, while it happens as N/N₀ → 1 on scale-free networks. Critical exponents seem to depend on the type of graph but not on the average degree, and obey the usual scaling relations for percolation phenomena. For the annealed model they agree with the exponents obtained from a mean-field theory. At late times, the networks exhibit a starlike structure in agreement with the results of Radicchi et al. [Phys. Rev. Lett. 101, 148701 (2008)]. While degree distributions are of main interest when regarding the scheme as network renormalization, mass distributions (which are more relevant when considering "supernodes" as clusters) are much easier to study using the fast Newman-Ziff algorithm for percolation, allowing us to obtain very high statistics.
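A single RSR step as described above (pick a node at random and join it with all its neighbors) can be sketched on an adjacency-set graph. This is a minimal illustration assuming an undirected simple graph: the supernode keeps the chosen node's label, `mass` counts the original nodes absorbed into each supernode, and the trace records N/N₀ alongside the largest mass (a rough proxy for the emerging hub). The G(n, p) generator, graph size and seed are illustrative choices, and none of the paper's statistics are reproduced.

```python
import random


def rsr_step(adj, mass, rng):
    """One random sequential renormalization step, modifying adj and mass in place.

    A node is chosen uniformly at random and joined with all its neighbors
    into a single supernode that keeps the chosen node's label.
    """
    centre = rng.choice(sorted(adj))
    absorbed = set(adj[centre])
    second_neighbours = set()
    for nb in absorbed:
        second_neighbours |= adj.pop(nb)
        mass[centre] += mass.pop(nb)
    second_neighbours -= absorbed | {centre}
    for other in second_neighbours:
        # Links that pointed at an absorbed node now point at the supernode
        adj[other] -= absorbed
        adj[other].add(centre)
    adj[centre] = second_neighbours


def erdos_renyi(n, p, rng):
    """G(n, p) random graph as an adjacency-set dict."""
    adj = {u: set() for u in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def renormalise(adj, steps, rng):
    """Apply RSR steps in place; record (N/N0, largest mass) after each step."""
    n0 = len(adj)
    mass = {u: 1 for u in adj}
    trace = []
    for _ in range(steps):
        if len(adj) == 1:
            break
        rsr_step(adj, mass, rng)
        trace.append((len(adj) / n0, max(mass.values())))
    return mass, trace


rng = random.Random(42)
graph = erdos_renyi(200, 0.01, rng)
mass, trace = renormalise(graph, 100, rng)
print(trace[-1])
```

Because each step only touches the chosen node's neighborhood, the RG flow can be followed one step at a time, which is the finer resolution the abstract contrasts with parallel schemes.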
Gu, Jianwei; Pitz, Mike; Breitner, Susanne; Birmili, Wolfram; von Klot, Stephanie; Schneider, Alexandra; Soentgen, Jens; Reller, Armin; Peters, Annette; Cyrys, Josef
2012-10-01
The success of epidemiological studies depends on the use of appropriate exposure variables. The purpose of this study is to extract a relatively small selection of variables characterizing ambient particulate matter from a large measurement data set. The original data set comprised a total of 96 particulate matter variables that have been measured continuously since 2004 at an urban background aerosol monitoring site in the city of Augsburg, Germany. Many of the original variables were derived from the measured particle size distribution (PSD) across the particle diameter range 3 nm to 10 μm, including size-segregated particle number, length, surface and mass concentrations. The data set was complemented by integral aerosol variables measured by independent instruments, including black carbon, sulfate, particle active surface concentration and particle length concentration. Such a large number of measured variables clearly cannot be used simultaneously in health-effect analyses. The aim of this study is to pre-screen and select the key variables that will be used as input in forthcoming epidemiological studies. We present two methods of parameter selection and apply them to data from the two-year period 2007 to 2008. We used the agglomerative hierarchical cluster method to find groups of similar variables; in total, we selected 15 key variables from 9 clusters, which are recommended for epidemiological analyses. We also applied a two-dimensional visualization technique called "heatmap" analysis to the Spearman correlation matrix, and selected 12 key variables with this method. Moreover, the positive matrix factorization (PMF) method was applied to the PSD data to characterize the possible particle sources. Correlations between the variables and PMF factors were used to interpret the meaning of the cluster and heatmap analyses. 
Copyright © 2012 Elsevier B.V. All rights reserved.
Pinto, U; Maheshwari, B L; Ollerton, R L
2013-06-01
The Hawkesbury-Nepean River (HNR) system in South-Eastern Australia is the main source of water supply for the Sydney Metropolitan area and is one of the more complex river systems due to the influence of urbanisation and other activities in the peri-urban landscape through which it flows. Long-term monitoring of river water quality is likely to suffer from data gaps due to funding cuts, changes in priority and related reasons; nevertheless, river health needs to be assessed from the available information. In this study, we demonstrate how Factor Analysis (FA), Hierarchical Agglomerative Cluster Analysis (HACA) and Trend Analysis (TA) can be applied to evaluate long-term historical data sets. Six water quality parameters, viz., temperature, chlorophyll-a, dissolved oxygen, oxides of nitrogen, suspended solids and reactive silicates, measured at weekly intervals between 1985 and 2008 at 12 monitoring stations along the 300 km length of the HNR system, were evaluated to understand the human and natural influences on the river system in a peri-urban landscape. FA extracted three latent factors, which explained more than 70 % of the total variance of the data and related to the 'bio-geographical', 'natural' and 'nutrient pollutant' dimensions of the HNR system. The bio-geographical and nutrient pollutant factors were most likely related to the direct influence of peri-urban changes and activities, and accounted for approximately 50 % of the variability in water quality. HACA indicated two major clusters representing clean and polluted zones of the river. On the spatial scale, one cluster was represented by the upper and lower sections of the river (clean zone) and accounted for approximately 158 km of the river. The other cluster was represented by the middle section (polluted zone), with a length of approximately 98 km. 
Trend Analysis indicated how the point sources influence river water quality on spatio-temporal scales, taking into account the various effects of nutrient and other pollutant loads from sewerage effluents, agriculture and other point and non-point sources along the river and major tributaries of the HNR. Over the past 26 years, water temperature has significantly increased while suspended solids have significantly decreased (p < 0.05). The analysis of water quality data through FA, HACA and TA helped to characterise the key sections and cluster the key water quality variables of the HNR system. The insights gained from this study have the potential to improve the effectiveness of river health-monitoring programs in terms of cost, time and effort, particularly in a peri-urban context.
NASA Astrophysics Data System (ADS)
Ali, Mian Ahsan; Bashir, Shazia; Akram, Mahreen; Mahmood, Khaliq; Faizan-ul-Haq; Hayat, Asma; Mutaza, G.; Chishti, Naveed Ahmed; Khan, M. Asad; Ahmad, Shahbaz
2018-05-01
Ion-induced modifications of brass in terms of surface morphology, elemental composition, phase changes, field emission properties and electrical conductivity have been investigated. Brass targets were irradiated with a proton beam at a constant energy of 3 MeV for doses ranging from 1 × 10¹² ions/cm² to 1.5 × 10¹⁴ ions/cm² using a Pelletron linear accelerator. Field Emission Scanning Electron Microscope (FESEM) analysis reveals the formation of randomly distributed clusters, particulates, droplets and agglomerates at lower ion doses, which can be explained on the basis of the collisional cascade process and the thermal spike model. At moderate ion doses, fiber-like structures are formed due to incomplete melting. Cellular structures are observed at the maximum ion dose and are attributed to intense heating, melting and re-solidification. SRIM analysis shows that the penetration depth of 3 MeV protons in brass is 38 μm, and that the electronic and nuclear energy losses are 5 × 10⁻¹ and 3.1 × 10⁻⁴ eV/Å, respectively. The evaluated energy deposited per atom varies from 0.01 to 1.5 eV as the ion dose varies from 1 × 10¹² ions/cm² to 1.5 × 10¹⁴ ions/cm². Energy Dispersive X-ray spectroscopy (EDX) and X-ray Diffraction (XRD) results are consistent with each other, and no new element or phase is identified; however, slight changes in peak intensity and peak angle shifts are observed. The field emission properties of ion-structured brass are explored by measuring the I-V characteristics of the targets under UHV conditions in a diode configuration using a self-designed and fabricated setup. The field enhancement factor (β), estimated from the slope of Fowler-Nordheim (F-N) plots, increases significantly from 5 to 1911, while the turn-on field (Eo) decreases from 65 V/μm to 30 V/μm and the maximum current density (Jmax) increases from 12 μA/cm² to 3821 μA/cm². 
These enhancements in field emission characteristics are correlated with the growth of surface structures, specifically agglomerates, which are responsible for electric field convergence. Electrical conductivity, measured by the four-probe method, has been correlated with maximum current density and shows a decreasing trend with increasing ion dose.
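The field enhancement factor is obtained from the slope of the F-N plot of ln(J/E²) against 1/E, which equals −Bφ^(3/2)/β with B ≈ 6.83 × 10⁹ V eV^(−3/2) m⁻¹. A minimal sketch follows; the work function φ = 4.5 eV is an assumption (the abstract does not give one), and the J-E data are synthetic values generated with β = 1000, not measurements from the study. Because only the slope matters, current I can be used in place of current density J.

```python
import math

# Second Fowler-Nordheim constant in V eV^(-3/2) um^(-1) (= 6.83e9 V eV^(-3/2) m^(-1))
B_FN = 6.83e3


def beta_from_fn(fields_v_per_um, currents, work_function_ev):
    """Field enhancement factor from the slope of a Fowler-Nordheim plot.

    F-N plot: ln(J / E^2) against 1 / E, whose slope is -B * phi^1.5 / beta.
    """
    xs = [1.0 / e for e in fields_v_per_um]
    ys = [math.log(j / e ** 2) for j, e in zip(currents, fields_v_per_um)]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    # Ordinary least-squares slope of ys against xs
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return -B_FN * work_function_ev ** 1.5 / slope


# Synthetic J-E data generated with beta = 1000 and an assumed phi = 4.5 eV
phi, beta_true = 4.5, 1000.0
fields = [30.0, 35.0, 40.0, 45.0, 50.0]  # applied field, V/um
currents = [
    1e-6 * (beta_true * e) ** 2 * math.exp(-B_FN * phi ** 1.5 / (beta_true * e))
    for e in fields
]
print(round(beta_from_fn(fields, currents, phi), 3))  # 1000.0
```

A shallower F-N slope therefore translates directly into a larger β, consistent with the sharper surface structures the abstract associates with higher ion doses.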
Syed Abdul Mutalib, Sharifah Norsukhairin; Juahir, Hafizan; Azid, Azman; Mohd Sharif, Sharifah; Latif, Mohd Talib; Aris, Ahmad Zaharin; Zain, Sharifuddin M; Dominick, Doreena
2013-09-01
The objective of this study is to identify spatial and temporal patterns in air quality at three selected Malaysian air monitoring stations based on an eleven-year database (January 2000 to December 2010). Four statistical methods, Discriminant Analysis (DA), Hierarchical Agglomerative Cluster Analysis (HACA), Principal Component Analysis (PCA) and Artificial Neural Networks (ANNs), were selected to analyze datasets of five air quality parameters, namely SO2, NO2, O3, CO and particulate matter with a diameter below 10 μm (PM10). The three selected stations are all located in highly urbanized areas surrounded by a number of industries. The DA results show that spatial characterization allows successful discrimination between the three stations, while HACA reveals temporal patterns in the monthly and yearly factor analysis that correlate with severe haze episodes that occurred in the country during certain periods. The PCA results show that air pollution is mostly due to the combustion of fossil fuel in motor vehicles and industrial activities. The spatial pattern recognition (S-ANN) results show better prediction performance in discriminating between the regions, with an excellent percentage of correct classification compared with DA. This study demonstrates the necessity and usefulness of environmetric techniques for interpreting large datasets to obtain better information about air quality patterns based on spatial and temporal characterizations at the selected air monitoring stations.
NASA Astrophysics Data System (ADS)
D'Alessandro, A.; Mangano, G.; D'Anna, G.; Luzio, D.; Selvaggi, G.
2011-12-01
On 6 September 2002, northern Sicily was hit by a strong earthquake (Mw 5.9). In the following six months, over a thousand aftershocks were located in the same area. On 7 December 2009, the INGV OBSLab deployed an OBS/H near the epicentral area of the main shock at a depth of 1500 m. The submarine station was recovered after 233 days. During the eight months of the experiment, the OBS/H recorded about 250 small-magnitude events of clear local origin. To identify seismic events generated by the same tectonic structure, we applied a clustering technique based on waveform similarity. The similarity matrix was constructed using the maximum of the normalized cross-covariance function. To identify the multiplets, we used an agglomerative hierarchical clustering algorithm based on the nearest-neighbor strategy. The results are summarized in the dendrogram of Fig. 1. The partitions were obtained by "cutting" the dendrogram at a distance level of 0.3, which identified 9 multiplets as well as some doublets and triplets. Fig. 2 shows multiplet 1 as an example. The events of this cluster have a high level of similarity: 25 of the 31 micro-events have a similarity greater than 0.9. To locate the micro-earthquakes recorded by the OBS/H, only a single-station location technique was implemented and applied. Some multiplets have clouds of hypocenters that overlap each other. These clusters, which would be indistinguishable without waveform clustering, show waveform differences that must be attributed to differences in the focal mechanisms that generated them.
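The similarity measure and the dendrogram cut can be sketched as follows, assuming distance = 1 - similarity (the abstract does not state the conversion) and that cutting at 0.3 keeps merges at distances of at most 0.3. Cutting a nearest-neighbor (single-linkage) dendrogram at a threshold is equivalent to taking the connected components of the graph whose edges join events closer than the threshold, which is what the union-find below computes. The traces are hypothetical toy signals, not OBS/H recordings, and each trace is assumed to be non-constant.

```python
import math


def max_norm_xcov(x, y):
    """Maximum over all lags of the normalised cross-covariance of two traces."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    xd = [v - mx for v in x]
    yd = [v - my for v in y]
    norm = math.sqrt(sum(v * v for v in xd) * sum(v * v for v in yd))
    best = -1.0
    for lag in range(-(n - 1), n):
        # Overlapping samples: x[i] paired with y[i + lag]
        s = sum(xd[i] * yd[i + lag] for i in range(max(0, -lag), min(n, n - lag)))
        best = max(best, s / norm)
    return best


def cut_single_linkage(traces, cut=0.3):
    """Clusters obtained by cutting a single-linkage dendrogram at `cut`."""
    parent = list(range(len(traces)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(traces)):
        for j in range(i + 1, len(traces)):
            if 1.0 - max_norm_xcov(traces[i], traces[j]) <= cut:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(traces)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical traces: 0-2 share one shape (shifted or scaled), 3 differs
traces = [
    [0, 1, 0, -1, 0, 0, 0, 0],
    [0, 0, 1, 0, -1, 0, 0, 0],
    [0, 2, 0, -2, 0, 0, 0, 0],
    [1, 1, 1, 1, -1, -1, -1, -1],
]
print(cut_single_linkage(traces))  # [[0, 1, 2], [3]]
```

Taking the maximum over lags makes the similarity insensitive to arrival-time offsets, and normalisation makes it insensitive to amplitude, so events from the same source with different magnitudes still group together.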
A medical ontology for intelligent web-based skin lesions image retrieval.
Maragoudakis, Manolis; Maglogiannis, Ilias
2011-06-01
Researchers have devoted increasing effort to providing formal computational frameworks that consolidate the plethora of concepts and relations used in the medical domain. In the domain of skin-related diseases, the variability of semantic features contained within digital skin images is a major barrier to the medical understanding of the symptoms and development of early skin cancers. The desideratum of making these standards machine-readable has led to their formalization in ontologies. In this work, in an attempt to enhance an existing Core Ontology for skin lesion images, hand-coded from image features, high-quality images were analyzed by an autonomous ontology creation engine. We show that by exploiting agglomerative clustering methods with distance criteria on the existing ontological structure, the original domain model could be enhanced with new instances, attributes and even relations, allowing better classification and retrieval of skin lesion categories from the web.
Yang, Kai-Min; Chiang, Po-Yuan
2017-01-01
Different biological sources of n-3 polyunsaturated fatty acids (n-3 PUFA) in mainstream commercial products include algae and fish. Lipid oxidation is the most important cause of deterioration in n-3 PUFA-rich oil. We investigated the kinetic parameters of n-3 PUFA-rich oil during oxidation via Rancimat over a temperature range of 70–100 °C. Based on the Arrhenius equation, the activation energies (Ea) for oxidative stability are 82.84–96.98 kJ/mol. The chemical substrates at different oxidative levels resulting from Rancimat oxidation at 80 °C were evaluated. At the initiation of oxidation, the tocopherols in the oil degraded very quickly, diminishing protection against further oxidation. Subsequently, degradation of the fatty acids in n-3 PUFA-rich oil was evident from decreased levels of PUFA along with increased levels of saturated fatty acids (SFA). Quality deterioration of n-3 PUFA-rich oil at the various oxidative levels was analyzed chemometrically. The anisidine value (p-AV, r: 0.92) and total oxidation value (TOTOX, r: 0.91) exhibited a good linear relationship in a principal component analysis (PCA), while oxidative change and a significant quality change relative to the induction period (IP) were detected through agglomerative hierarchical cluster (AHC) analysis. PMID:28350348
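The Arrhenius step can be illustrated by regressing ln(1/IP) on 1/T. This sketch assumes, as is common for Rancimat data but not stated in the abstract, that the oxidation rate constant is proportional to the reciprocal of the induction period; the slope is then −Ea/R. The induction periods below are synthetic values generated from Ea = 90 kJ/mol, so the fit should recover that value.

```python
import math

R = 8.314  # gas constant, J/(mol*K)


def activation_energy_kj(temps_c, induction_periods_h):
    """Arrhenius estimate of Ea (kJ/mol) from Rancimat induction periods.

    Assumes the rate constant k is proportional to 1/IP, so
    ln(1/IP) = const - Ea / (R * T) and Ea = -slope * R.
    """
    xs = [1.0 / (t + 273.15) for t in temps_c]  # 1/T in 1/K
    ys = [math.log(1.0 / ip) for ip in induction_periods_h]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    # Ordinary least-squares slope of ln(1/IP) against 1/T
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return -slope * R / 1000.0


# Synthetic induction periods (hours) generated from Ea = 90 kJ/mol, not measured data
temps = [70.0, 80.0, 90.0, 100.0]
ips = [1e-12 * math.exp(90_000 / (R * (t + 273.15))) for t in temps]
print(round(activation_energy_kj(temps, ips), 3))  # 90.0
```

With real Rancimat data the points scatter around the line, and the residuals give a rough check of whether a single activation energy describes the whole temperature range.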
Composition and source apportionment of dust fall around a natural lake.
Latif, Mohd Talib; Ngah, Sofia Aida; Dominick, Doreena; Razak, Intan Suraya; Guo, Xinxin; Srithawirat, Thunwadee; Mushrifah, Idris
2015-07-01
The aim of this study was to determine the source apportionment of dust fall around Lake Chini, Malaysia. Samples were collected monthly between December 2012 and March 2013 at seven sampling stations located around Lake Chini. The samples were filtered to separate the dissolved and undissolved solids. The ionic compositions (NO₃⁻, SO₄²⁻, Cl⁻ and NH₄⁺) were determined using ion chromatography (IC), while major elements (K, Na, Ca and Mg) and trace metals (Zn, Fe, Al, Ni, Mn, Cr, Pb and Cd) were determined using inductively coupled plasma mass spectrometry (ICP-MS). The results showed that the average concentration of total solids around Lake Chini was 93.49 ± 16.16 mg/(m²·day). SO₄²⁻, Na and Zn dominated the dissolved portion of the dust fall. The enrichment factors (EF) revealed that the trace metals and major elements in the rain water were of anthropogenic origin, except for Fe. Hierarchical agglomerative cluster analysis (HACA) classified the seven monitoring stations into five groups and the 16 variables into three groups. A coupled receptor model, principal component analysis multiple linear regression (PCA-MLR), revealed that the sources of dust fall in Lake Chini were dominated by agricultural and biomass burning (42%), followed by the earth's crust (28%), sea spray (16%) and a mixture of soil dust and vehicle emissions (14%). Copyright © 2015. Published by Elsevier B.V.
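The enrichment factor is conventionally computed by normalising each element to a crustal reference element, in both the sample and an average crustal composition. The sketch below assumes Al as the reference and an EF > 10 cut-off for "anthropogenic"; both are common conventions that the abstract does not specify, and all concentrations (including the crustal ones) are rough placeholders, not data from the study.

```python
def enrichment_factor(sample, crust, element, reference="Al"):
    """EF = (C_element / C_reference)_sample / (C_element / C_reference)_crust."""
    return (sample[element] / sample[reference]) / (crust[element] / crust[reference])


def likely_source(ef, threshold=10.0):
    """Rule-of-thumb label; the threshold is an assumption, not from the study."""
    return "anthropogenic" if ef > threshold else "crustal"


# Hypothetical concentrations; units cancel as long as each dict is internally consistent
sample = {"Al": 2.0, "Zn": 0.5, "Fe": 1.0}
crust = {"Al": 80_000.0, "Zn": 70.0, "Fe": 50_000.0}  # rough placeholder crustal values
for el in ("Zn", "Fe"):
    ef = enrichment_factor(sample, crust, el)
    print(el, round(ef, 2), likely_source(ef))
# Zn 285.71 anthropogenic
# Fe 0.8 crustal
```

Normalising to a reference element cancels dilution effects, so EF values near 1 indicate a crustal-like ratio regardless of how much total dust was collected.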
Instrumental and statistical methods for the comparison of class evidence
NASA Astrophysics Data System (ADS)
Liszewski, Elisa Anne
Trace evidence is a major field within forensic science. Association of trace evidence samples can be problematic due to sample heterogeneity and a lack of quantitative criteria for comparing spectra or chromatograms. The aim of this study is to evaluate different types of instrumentation for their ability to discriminate among samples of various types of trace evidence. Chemometric analysis, including techniques such as Agglomerative Hierarchical Clustering, Principal Components Analysis and Discriminant Analysis, was employed to evaluate the instrumental data. First, automotive clear coats were analyzed using microspectrophotometry to collect UV absorption data. In total, 71 samples were analyzed, with a classification accuracy of 91.61%; an external validation gave a prediction accuracy of 81.11%. Next, fiber dyes were analyzed using UV-Visible microspectrophotometry. While several physical characteristics of cotton fiber can be identified and compared, fiber color is considered an excellent source of variation and was therefore examined in this study. Twelve dyes were employed, some of them visually indistinguishable. Several analyses and comparisons were performed, including an inter-laboratory comparison and external validations. Lastly, common plastic samples and other polymers were analyzed using pyrolysis-gas chromatography/mass spectrometry, and their pyrolysis products were then analyzed using multivariate statistics. The classification accuracy varied depending on the number of classes chosen, but the plastics were grouped based on composition. The polymers were used as an external validation, and misclassifications occurred because all chlorinated samples were placed into the category containing PVC.
Phenotypes of asthma in low-income children and adolescents: cluster analysis.
Cabral, Anna Lucia Barros; Sousa, Andrey Wirgues; Mendes, Felipe Augusto Rodrigues; Carvalho, Celso Ricardo Fernandes de
2017-01-01
Studies characterizing asthma phenotypes have predominantly included adults or have involved children and adolescents in developed countries. Therefore, their applicability to other populations, such as those of developing countries, remains undetermined. Our objective was to determine how low-income children and adolescents with asthma in Brazil are distributed among the clusters identified by a cluster analysis. We included 306 children and adolescents (6-18 years of age) with a clinical diagnosis of asthma who had been under medical treatment for at least one year of follow-up. At enrollment, all the patients were clinically stable. For the cluster analysis, we selected 20 variables commonly measured in clinical practice and considered important in defining asthma phenotypes. Variables with high multicollinearity were excluded. A cluster analysis was applied using a two-step agglomerative procedure and a log-likelihood distance measure. Three clusters were defined for our population. Cluster 1 (n = 94) included subjects with normal pulmonary function, mild eosinophil inflammation, few exacerbations, later age at asthma onset, and mild atopy. Cluster 2 (n = 87) included those with normal pulmonary function, a moderate number of exacerbations, early age at asthma onset, more severe eosinophil inflammation, and moderate atopy. Cluster 3 (n = 108) included those with poor pulmonary function, frequent exacerbations, severe eosinophil inflammation, and severe atopy. Asthma in low-income children and adolescents in Brazil was characterized by the presence of atopy, the number of exacerbations, and lung function. The many similarities with previous cluster analyses of phenotypes indicate that this approach shows good generalizability.
Unsupervised classification of multivariate geostatistical data: Two algorithms
NASA Astrophysics Data System (ADS)
Romary, Thomas; Ors, Fabien; Rivoirard, Jacques; Deraisme, Jacques
2015-12-01
With the increasing development of remote sensing platforms and the evolution of sampling facilities in the mining and oil industries, spatial datasets are becoming increasingly large, describe a growing number of variables and cover ever wider areas. It is therefore often necessary to split the study domain to account for radically different behaviors of the natural phenomenon across it and to simplify the subsequent modeling step. The definition of these areas can be seen as a problem of unsupervised classification, or clustering, in which we try to divide the domain into subdomains that are homogeneous with respect to the values taken by the variables at hand. Classical clustering methods, designed for independent observations, do not ensure the spatial coherence of the resulting classes. Image segmentation methods, based on e.g. Markov random fields, are not adapted to irregularly sampled data. Other existing approaches, based on mixtures of Gaussian random functions estimated via the expectation-maximization algorithm, are limited to moderate sample sizes and a small number of variables. In this work, we propose two algorithms based on adaptations of classical algorithms to multivariate geostatistical data. Both algorithms are model-free and can handle large volumes of multivariate, irregularly spaced data. The first one proceeds by agglomerative hierarchical clustering. Spatial coherence is ensured by a proximity condition that two clusters must satisfy in order to merge. This proximity condition relies on a graph organizing the data in the coordinate space. The hierarchical algorithm can then be seen as a graph-partitioning algorithm. Following this interpretation, a spatial version of the spectral clustering algorithm is also proposed. The performance of both algorithms is assessed on toy examples and a mining dataset.
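The proximity-constrained merging described above can be sketched in a few lines of Python. This is not the authors' implementation: the k-nearest-neighbour graph, the centroid linkage on variable values and the function names are assumptions made for illustration. The only property carried over from the abstract is that two clusters may merge only if an edge of a graph built in coordinate space links them.

```python
import math


def knn_graph(coords, k):
    """Undirected k-nearest-neighbour graph in coordinate space (an assumed graph)."""
    edges = set()
    for i, p in enumerate(coords):
        nearest = sorted((math.dist(p, q), j) for j, q in enumerate(coords) if j != i)[:k]
        for _, j in nearest:
            edges.add((min(i, j), max(i, j)))
    return edges


def constrained_agglomerative(coords, values, k=3, n_clusters=2):
    """Agglomerative clustering in which two clusters may merge only if at least
    one edge of the spatial graph links them.

    Cluster dissimilarity is the Euclidean distance between the mean variable
    vectors of the two clusters (a centroid linkage chosen for brevity).
    Returns one cluster label per sample.
    """
    labels = list(range(len(coords)))
    members = {i: [i] for i in range(len(coords))}
    edges = knn_graph(coords, k)

    def centroid(ids):
        dim = len(values[0])
        return [sum(values[i][d] for i in ids) / len(ids) for d in range(dim)]

    while len(members) > n_clusters:
        # Admissible merges: distinct clusters connected by a graph edge.
        candidates = {(min(labels[a], labels[b]), max(labels[a], labels[b]))
                      for a, b in edges if labels[a] != labels[b]}
        if not candidates:
            break  # disconnected graph: no admissible merge is left
        a, b = min(candidates, key=lambda pair: math.dist(centroid(members[pair[0]]),
                                                         centroid(members[pair[1]])))
        for i in members[b]:
            labels[i] = a
        members[a].extend(members.pop(b))
    return labels


# Two spatially separated pairs; values alone would pair 0 with 2 and 1 with 3.
coords = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
values = [[0.0], [5.0], [0.1], [5.1]]
print(constrained_agglomerative(coords, values, k=1, n_clusters=2))
```

In the example, an unconstrained clustering on values alone would group samples 0 and 2, but with k=1 the graph only links spatial neighbours, so the result keeps the two spatial pairs together.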
Loprinzi, Paul D; Walker, Jerome F
2016-01-01
To our knowledge, no longitudinal epidemiological study among daily smokers has examined the effects of physical activity change/trajectory on smoking cessation. The purpose of this study was to examine the longitudinal effects of changes in physical activity on smoking cessation among a national sample of young (16-24 y) daily smokers. Data from the 2003-2005 National Youth Smoking Cessation Survey were used (N = 1178). Using hierarchical agglomerative cluster analysis, 5 distinct self-reported physical activity trajectories over 3 time periods (baseline, 12-month, and 24-month follow-up) were observed: stable low physical activity, decreasing physical activity, curvilinear physical activity, stable high physical activity, and increasing physical activity. Nicotine dependence (Heaviness of Smoking Index) and demographic parameters were assessed via survey. With stable low physical activity (16.2% quit smoking) serving as the referent group, those in the stable high physical activity group (24.8% quit smoking) had 1.8 times greater odds of not smoking at the 24-month follow-up (odds ratio = 1.81; 95% confidence interval, 1.12-2.91) after adjusting for nicotine dependence, age, gender, race-ethnicity, and education. Maintenance of regular physical activity among young daily smokers may help to facilitate smoking cessation.
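For readers unfamiliar with the odds-ratio figures above, a minimal sketch of an unadjusted odds ratio with a Woolf (log-scale) 95% confidence interval from a 2x2 table follows. The counts are hypothetical. The reported OR of 1.81 is adjusted for covariates, which requires a multivariable model (typically logistic regression); this sketch deliberately does not attempt that.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio from a 2x2 table with a Woolf (log-scale) CI.

    a: comparison group, outcome present   b: comparison group, outcome absent
    c: referent group, outcome present     d: referent group, outcome absent
    z = 1.96 gives an approximate 95% interval. All cells must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts (not the survey's data): 25/100 quit vs 16/100 quit.
or_, lo, hi = odds_ratio_ci(25, 75, 16, 84)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

With only 100 people per group, this hypothetical interval spans 1, which illustrates why sample size matters for the significance of an odds ratio of similar magnitude.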
Gomes, Liliane R.; Gomes, Marcelo; Jung, Bryan; Paniagua, Beatriz; Ruellas, Antonio C.; Gonçalves, João Roberto; Styner, Martin A.; Wolford, Larry; Cevidanes, Lucia
2015-01-01
This study aimed to investigate imaging statistical approaches for classifying three-dimensional (3-D) osteoarthritic morphological variations among 169 temporomandibular joint (TMJ) condyles. Cone-beam computed tomography scans were acquired from 69 subjects with long-term TMJ osteoarthritis (OA), 15 subjects at initial diagnosis of OA, and 7 healthy controls. Three-dimensional surface models of the condyles were constructed, and SPHARM-PDM established corresponding points on each model. Multivariate analysis of covariance and direction-projection-permutation (DiProPerm) were used to test the statistical significance of the differences between the groups determined by clinical and radiographic diagnoses. Unsupervised classification using hierarchical agglomerative clustering was then conducted. Compared with healthy controls, the average OA condyle was significantly smaller in all dimensions except its anterior surface. Significant flattening of the lateral pole was noticed at initial diagnosis. We observed areas of 3.88-mm bone resorption at the superior surface and 3.10-mm bone apposition at the anterior aspect of the long-term OA average model. DiProPerm supported a significant difference between the healthy control and OA groups (p-value=0.001). Clinically meaningful unsupervised classification of TMJ condylar morphology determined a preliminary diagnostic index of 3-D osteoarthritic changes, which may be the first step towards a more targeted diagnosis of this condition. PMID:26158119
Mastrorilli, C; Tripodi, S; Caffarelli, C; Perna, S; Di Rienzo-Businco, A; Sfika, I; Asero, R; Dondi, A; Bianchi, A; Povesi Dascola, C; Ricci, G; Cipriani, F; Maiello, N; Miraglia Del Giudice, M; Frediani, T; Frediani, S; Macrì, F; Pistoletti, C; Dello Iacono, I; Patria, M F; Varin, E; Peroni, D; Comberiati, P; Chini, L; Moschese, V; Lucarelli, S; Bernardini, R; Pingitore, G; Pelosi, U; Olcese, R; Moretti, M; Cirisano, A; Faggian, D; Travaglini, A; Plebani, M; Verga, M C; Calvani, M; Giordani, P; Matricardi, P M
2016-08-01
Pollen-food syndrome (PFS) is heterogeneous with regard to triggers, severity, natural history, comorbidities, and response to treatment. Our study aimed to classify different endotypes of PFS based on IgE sensitization to panallergens. We examined 1271 Italian children (age 4-18 years) with seasonal allergic rhinoconjunctivitis (SAR). Foods triggering PFS were identified by questionnaire. Skin prick tests were performed with commercial pollen extracts. IgE to the panallergens Phl p 12 (profilin), Bet v 1 (PR-10), and Pru p 3 (nsLTP) was tested by ImmunoCAP FEIA. An unsupervised hierarchical agglomerative clustering method was applied within the PFS population. PFS was observed in 300/1271 children (24%). Cluster analysis identified five PFS endotypes linked to panallergen IgE sensitization: (i) cosensitization to ≥2 panallergens ('multi-panallergen PFS'); (ii-iv) sensitization to either profilin, or nsLTP, or PR-10 ('mono-panallergen PFS'); (v) no sensitization to panallergens ('no-panallergen PFS'). These endotypes showed distinctive characteristics: (i) 'multi-panallergen PFS': severe disease with frequent allergic comorbidities and multiple offending foods; (ii) 'profilin PFS': oral allergy syndrome (OAS) triggered by Cucurbitaceae; (iii) 'LTP PFS': living in Southern Italy, OAS triggered by hazelnut and peanut; (iv) 'PR-10 PFS': OAS triggered by Rosaceae; and (v) 'no-panallergen PFS': mild disease and OAS triggered by kiwifruit. In a Mediterranean country characterized by multiple pollen exposures, PFS is a complex and frequent complication of childhood SAR, with five distinct endotypes marked by distinctive profiles of IgE sensitization to panallergens. Prospective studies in cohorts of patients with PFS are now required to test whether this novel classification is useful for diagnostic and therapeutic purposes in clinical practice. © 2016 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
NASA Astrophysics Data System (ADS)
Ruske, S. T.; Topping, D. O.; Foot, V. E.; Kaye, P. H.; Stanley, W. R.; Morse, A. P.; Crawford, I.; Gallagher, M. W.
2016-12-01
Characterisation of bio-aerosols has important implications within the Environment and Public Health sectors. Recent developments in Ultra-Violet Light Induced Fluorescence (UV-LIF) detectors, such as the Wideband Integrated bio-aerosol Spectrometer (WIBS) and the newly introduced Multiparameter bio-aerosol Spectrometer (MBS), have allowed the real-time collection of fluorescence, size and morphology measurements for the purpose of discriminating between bacteria, fungal spores and pollen. This new generation of instruments has enabled ever-larger data sets to be compiled with the aim of studying more complex environments, yet the algorithms used for species classification remain largely unvalidated. It is therefore imperative to validate the performance of the different algorithms that can be used for classification, which is the focus of this study. For unsupervised learning, we tested Hierarchical Agglomerative Clustering with various linkages. For supervised learning, ten methods were tested: decision trees; the ensemble methods Random Forests, Gradient Boosting and AdaBoost; two support vector machine implementations, libsvm and liblinear; the Gaussian methods Gaussian naïve Bayes, quadratic discriminant analysis and linear discriminant analysis; and finally the k-nearest neighbours algorithm. The methods were applied to two different data sets measured using a new Multiparameter bio-aerosol Spectrometer. We find that clustering in general performs slightly worse than the supervised learning methods, correctly classifying at best only 72.7 and 91.1 percent of the data in the two data sets, respectively. For supervised learning, the gradient boosting algorithm was found to be the most effective, on average correctly classifying 88.1 and 97.8 percent of the testing data in the two data sets, respectively. We discuss the wider relevance of these results with regard to the challenges of classification in real-world environments.
Huber, Maxime; Gilbert, Guillaume; Roy, Julien; Parent, Stefan; Labelle, Hubert; Périé, Delphine
2016-11-01
To measure magnetic resonance imaging (MRI) parameters including relaxation times (T1ρ, T2), magnetization transfer (MT) and diffusion parameters (mean diffusivity [MD], fractional anisotropy [FA]) of intervertebral discs in adolescents with idiopathic scoliosis, and to investigate the sensitivity of these MR parameters to the severity of the spine deformities. Thirteen patients with adolescent idiopathic scoliosis and three control volunteers with no history of spine disease underwent an MRI acquisition at 3T including the mapping of T1ρ, T2, MT, MD, and FA. The apical zone included all discs within the scoliotic curve, while the control zone was composed of the other discs. The severity was analyzed through low (<32°) versus high (>40°) Cobb angles. One-way analysis of variance (ANOVA) and agglomerative hierarchical clustering (AHC) were performed. Significant differences were found between the apical zone and the control zone for T2 (P = 0.047), and between low and high Cobb angles for T2 (P = 0.014) and MT (P = 0.002). AHC showed two distinct clusters, one with mainly low Cobb angles and one with mainly high Cobb angles, for the MRI parameters measured within the apical zone, with an accuracy of 0.9 and a Matthews correlation coefficient (MCC) of 0.8. Within the control zone, the AHC showed no clear classification (accuracy of 0.6 and MCC of 0.2). We successfully performed an in vivo multiparametric MRI investigation of young patients with adolescent idiopathic scoliosis. The MRI parameters measured within the intervertebral discs were found to be sensitive to intervertebral disc degeneration occurring with scoliosis and to the severity of scoliosis. J. Magn. Reson. Imaging 2016;44:1123-1131. © 2016 International Society for Magnetic Resonance in Medicine.
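The accuracy and Matthews correlation coefficient (MCC) reported above can be computed from a binary confusion matrix. Because AHC returns unlabeled clusters, each cluster must first be mapped to a class (for example its majority Cobb-angle group); that mapping step is assumed to have been done already. The labels below are hypothetical, not the study's data.

```python
import math


def accuracy_and_mcc(y_true, y_pred):
    """Accuracy and Matthews correlation coefficient for binary labels (0/1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    accuracy = (tp + tn) / len(y_true)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal total is zero; 0.0 is a common convention.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


# Hypothetical: 1 = high Cobb angle, 0 = low; clusters already mapped to classes.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
acc, mcc = accuracy_and_mcc(y_true, y_pred)
print(f"accuracy = {acc:.2f}, MCC = {mcc:.2f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the two classes are unbalanced.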
Nazeer, Summya; Ali, Zeshan; Malik, Riffat Naseem
2016-07-01
The present study was designed to determine the spatiotemporal patterns in the water quality of the River Soan using multivariate statistics. A total of 26 sites were surveyed along the River Soan and its associated tributaries during the pre- and post-monsoon seasons in 2008. Hierarchical agglomerative cluster analysis (HACA) classified the sampling sites into three groups according to their degree of pollution, ranging from least to highly degraded water quality. Discriminant function analysis (DFA) revealed that alkalinity, orthophosphates, nitrates, ammonia, salinity, and Cd were the variables that significantly discriminated among the three groups identified by HACA. Temporal trends identified through DFA revealed that COD, DO, pH, Cu, Cd, and Cr accounted for the major seasonal variations in water quality. PCA/FA identified six factors as potential sources of pollution of the River Soan. Absolute principal component scores with multiple linear regression (APCS-MLR) further quantified the percent contribution of each source. Heavy metals were largely added through industrial activities (28 %) and sewage waste (28 %), nutrients through agricultural runoff (35 %) and sewage waste (28 %), organic pollution through sewage waste (27 %) and urban runoff (17 %), and macroelements through urban runoff (39 %) and mineralization and sewage waste (30 %). The present study showed that anthropogenic activities are the major source of variation in the River Soan. To address the water quality issues, effective waste management measures need to be implemented.
Contrasting ENSO types with novel satellite derived ocean phytoplankton biomass
NASA Astrophysics Data System (ADS)
Sharma, P.; Singh, A. M.; Marinov, I.; Kostadinov, T. S.
2016-12-01
Observed variations in community structure and biogeochemical processes in the tropics and the North Atlantic have been linked, to first order, to the El Niño Southern Oscillation phenomenon (e.g., Bates, 2001; Karl et al., 2001; Di Lorenzo et al., 2010; Di Lorenzo et al., 2013). Recent significant technical advances have allowed the retrieval of biological data from the optical properties of the water via satellite ocean color remote sensing, providing an opportunity to quantify the relationships between biological and climate indices. Previous studies have contrasted the ENSO flavors in depth using various physical (e.g., Singh et al. 2011; Turk et al. 2011) and biological (e.g., Radenac et al. 2012) indices. Here, we analyze the impact of different ENSO types on biology via analysis of recently derived backscattering-based biomass separated into size groups (Kostadinov et al. 2010, 2016) over the 17-year period 1997-2013. We further contrast the responses of biomass with those of chlorophyll (Chl) and particulate inorganic carbon (PIC). We analyze the complex spatial differences in both physical (SST, mixed layer depth, winds) and biological (Chl, total and size-partitioned biomass) variability across the Pacific warm pool and equatorial tongue via simple EOF, combined regression-EOF and Agglomerative Hierarchical Clustering (AHC) analysis. The interannual variability in the physical and biological fields shows clear signatures of the Niño cold tongue (NCT) and Niño warm pool (NWP). Possible mechanisms responsible for these signatures are discussed.
NASA Astrophysics Data System (ADS)
Kumari, Babita; Paul, Pranesh Kumar; Singh, Rajendra; Mishra, Ashok; Gupta, Praveen Kumar; Singh, Raghvendra P.
2017-04-01
A new semi-distributed conceptual hydrological model, the Satellite based Hydrological Model (SHM), has been developed under the 'PRACRITI-2' program of the Space Applications Centre (SAC), Ahmedabad, for sustainable water resources management in India using data from Indian Remote Sensing satellites. The whole of India is divided into 5 km × 5 km grid cells, and the properties at the center of each cell are assumed to represent the whole cell. SHM contains five modules, namely surface water, forest, snow, groundwater and routing. Two empirical equations (SCS-CN and Hargreaves) and a water balance method are used in the surface water module; the forest module is based on water-balance calculations and subsurface dynamics. The 2-D Boussinesq equation is used for groundwater modelling and is solved using an implicit finite-difference scheme. The routing module follows a distributed routing approach, which requires the flow path and network, with travel time estimation as the key step. The aim of this study is to evaluate the performance of SHM using a regionalization technique, which also tests the usefulness of the model under data-scarce conditions or for ungauged basins. Homogeneity analysis is a prerequisite for regionalization. A similarity index (Φ) and hierarchical agglomerative cluster analysis are adopted to test the homogeneity, in terms of physical attributes, of three basins, namely Brahmani (39,033 km²), Baitarani (10,982 km²) and Kangsabati (9,660 km²), with respect to the Subarnarekha basin (29,196 km²). The results of both homogeneity analyses show that the Brahmani basin is the most homogeneous with respect to the Subarnarekha river basin in terms of physical characteristics (land use/land cover classes, soil type and elevation).
Calibration and validation of the model parameters of the Brahmani basin are in progress; these parameters are to be transferred to the SHM setup of the Subarnarekha basin, and the results will be compared with those of the calibrated and validated SHM setup of the Subarnarekha basin to test the applicability of SHM in hydrologically homogeneous regions of India. Keywords: SHM, regionalization, homogeneity, donor catchment, similarity index, cluster analysis
Shi, Yan; Xiong, Jing; Sun, Dongmei; Liu, Wei; Wei, Feng; Ma, Shuangcheng; Lin, Ruichao
2015-08-01
An accurate and sensitive high-performance liquid chromatography method coupled with ultraviolet detection and precolumn derivatization was developed for the simultaneous quantification of the major bile acids in Artificial Calculus bovis, including cholic acid, hyodeoxycholic acid, chenodeoxycholic acid, and deoxycholic acid. The extraction, derivatization, chromatographic separation, and detection parameters were fully optimized. The samples were extracted with methanol by ultrasonic extraction. Then, 2-bromo-4'-nitroacetophenone and 18-crown-6 were used for derivatization. The chromatographic separation was performed on an Agilent SB-C18 column (250 × 4.6 mm id, 5 μm) at a column temperature of 30°C and a flow rate of 1.0 mL/min, using water and methanol as the mobile phase with gradient elution. The detection wavelength was 263 nm. The method was extensively validated by evaluating the linearity (r² ≥ 0.9980), recovery (94.24-98.91%), limits of detection (0.25-0.31 ng) and limits of quantification (0.83-1.02 ng). Seventeen samples were analyzed using the developed and validated method. The amounts of bile acids were then analyzed by hierarchical agglomerative clustering analysis and principal component analysis. The results of the chemometric analysis showed that the contents of these compounds reflect the intrinsic quality of artificial Calculus bovis, and that two compounds (hyodeoxycholic acid and chenodeoxycholic acid) were the most important markers for quality evaluation. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
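The linearity and detection-limit figures above come from calibration curves. The sketch below fits a least-squares calibration line and derives r² together with LOD and LOQ using the ICH-style convention LOD = 3.3σ/S and LOQ = 10σ/S, where σ is the residual standard deviation and S the slope. The abstract does not say which convention the authors used (signal-to-noise ratios are another common choice), and the calibration data are hypothetical.

```python
import math


def calibration_stats(x, y):
    """Least-squares line y = slope*x + intercept, with r² and ICH-style LOD/LOQ.

    LOD = 3.3*sigma/slope and LOQ = 10*sigma/slope, where sigma is the residual
    standard deviation of the regression (one of several accepted conventions).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((yi - my) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    sigma = math.sqrt(ss_res / (n - 2))  # residual standard deviation, n - 2 dof
    return {"slope": slope, "intercept": intercept, "r2": r2,
            "lod": 3.3 * sigma / slope, "loq": 10 * sigma / slope}


# Hypothetical calibration: injected amount (ng) vs. peak area (arbitrary units).
amount_ng = [1, 2, 4, 8, 16]
peak_area = [10.2, 19.8, 40.5, 79.6, 160.3]
stats = calibration_stats(amount_ng, peak_area)
print({key: round(val, 4) for key, val in stats.items()})
```

Under this convention the LOQ/LOD ratio is fixed at 10/3.3, roughly 3; the reported ranges (0.83-1.02 ng vs. 0.25-0.31 ng) happen to have a similar ratio, but that alone does not establish which method the authors applied.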
Forbes, Miriam K; Kotov, Roman; Ruggero, Camilo J; Watson, David; Zimmerman, Mark; Krueger, Robert F
2017-11-01
A large body of research has focused on identifying the optimal number of dimensions (or spectra) to model individual differences in psychopathology. Recently, it has become increasingly clear that ostensibly competing models with varying numbers of spectra can be synthesized in empirically derived hierarchical structures. We examined the convergence between top-down (bass-ackwards or sequential principal components analysis) and bottom-up (hierarchical agglomerative cluster analysis) statistical methods for elucidating hierarchies, in order to explicate the joint hierarchical structure of clinical and personality disorders. Analyses examined 24 clinical and personality disorders based on semi-structured clinical interviews in an outpatient psychiatric sample (n=2900). The two methods of hierarchical analysis converged on a three-tier joint hierarchy of psychopathology. At the lowest tier, there were seven spectra (disinhibition, antagonism, core thought disorder, detachment, core internalizing, somatoform, and compulsivity) that emerged in both methods. These spectra were nested under the same three higher-order superspectra in both methods: externalizing, broad thought dysfunction, and broad internalizing. In turn, these three superspectra were nested under a single general psychopathology spectrum, which represented the top tier of the hierarchical structure. The hierarchical structure mirrors and extends past research, with the inclusion of a novel compulsivity spectrum and the finding that psychopathology is organized in three superordinate domains. This hierarchy can thus be used as a flexible and integrative framework to facilitate psychopathology research with varying levels of specificity (i.e., focusing on the optimal level of detailed information, rather than the optimal number of factors). Copyright © 2017 Elsevier Inc. All rights reserved.
Gamage, I H; Jonker, A; Zhang, X; Yu, P
2014-01-24
The objective of this study was to determine the feasibility of using molecular spectroscopy with multivariate techniques as a fast method to detect source effects among original wheat feedstock sources and their corresponding co-products, wheat DDGS, from bioethanol production. Different sources of the bioethanol feedstock and their corresponding bioethanol co-products, three samples per source, were collected from the same newly built bioethanol plant using current bioethanol processing technology. Multivariate molecular spectral analyses were carried out using agglomerative hierarchical cluster analysis (AHCA) and principal component analysis (PCA). The molecular spectral data of the different feedstock sources and their corresponding co-products were compared in four regions: ca. 1800-1725 cm⁻¹ (carbonyl C=O ester, mainly related to lipid structure conformation), ca. 1725-1482 cm⁻¹ (amide I and amide II region, mainly related to protein structure conformation), ca. 1482-1180 cm⁻¹ (mainly associated with structural carbohydrate) and ca. 1180-800 cm⁻¹ (mainly related to carbohydrates) in a complex plant-based system. The results showed that molecular spectroscopy with multivariate techniques could reveal the structural differences among the bioethanol feedstock sources and among their corresponding co-products. The AHCA and PCA analyses were able to distinguish the molecular structure differences associated with chemical functional groups among the different feedstock sources and their corresponding co-products. The molecular spectral differences indicated differences in functional, biomolecular and biopolymer groups, which were confirmed by wet chemical analysis. These biomolecular and biopolymer structural differences were associated with chemical and nutrient profiles and with nutrient utilization and availability.
Molecular spectral analyses showed the potential to identify molecular structure differences among bioethanol feedstock sources and their corresponding co-products. Copyright © 2013 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Gamage, I. H.; Jonker, A.; Zhang, X.; Yu, P.
2014-01-01
The objective of this study was to determine the feasibility of using molecular spectroscopy with multivariate techniques as a fast method to detect source effects among original wheat feedstock sources and their corresponding co-products, wheat DDGS, from bioethanol production. Different sources of the bioethanol feedstock and their corresponding bioethanol co-products, three samples per source, were collected from the same newly built bioethanol plant using current bioethanol processing technology. Multivariate molecular spectral analyses were carried out using agglomerative hierarchical cluster analysis (AHCA) and principal component analysis (PCA). The molecular spectral data of the different feedstock sources and their corresponding co-products were compared in four regions: ca. 1800-1725 cm⁻¹ (carbonyl C=O ester, mainly related to lipid structure conformation), ca. 1725-1482 cm⁻¹ (amide I and amide II region, mainly related to protein structure conformation), ca. 1482-1180 cm⁻¹ (mainly associated with structural carbohydrate) and ca. 1180-800 cm⁻¹ (mainly related to carbohydrates) in a complex plant-based system. The results showed that molecular spectroscopy with multivariate techniques could reveal the structural differences among the bioethanol feedstock sources and among their corresponding co-products. The AHCA and PCA analyses were able to distinguish the molecular structure differences associated with chemical functional groups among the different feedstock sources and their corresponding co-products. The molecular spectral differences indicated differences in functional, biomolecular and biopolymer groups, which were confirmed by wet chemical analysis. These biomolecular and biopolymer structural differences were associated with chemical and nutrient profiles and with nutrient utilization and availability.
Molecular spectral analyses showed the potential to identify molecular structure differences among bioethanol feedstock sources and their corresponding co-products.
Hyperspectral remote sensing of paddy crop using insitu measurement and clustering technique
NASA Astrophysics Data System (ADS)
Moharana, S.; Dutta, S.
2014-11-01
Rice agriculture, practiced mainly in South Asia, is being monitored with both optical and microwave remote sensing to extract crop parameters, crop area, crop growth profiles and crop yield. Hyperspectral data provide more detailed information on rice agriculture. The present study was carried out at the experimental station of the Regional Rainfed Lowland Rice Research Station, Assam, India (26.1400° N, 91.7700° E), and the climate of the study area falls within the Lower Brahmaputra Valley (LBV) Agro-Climatic Zone. The hyperspectral measurements were made in 2009 on 72 plots that included eight rice varieties under three different levels of nitrogen treatment (50, 100, 150 kg/ha), covering the period from rice transplanting to crop harvesting. With an emphasis on varieties, hyperspectral measurements were taken in 2014 on 24 plots containing 24 rice genotypes at different crop developmental ages. All measurements were performed using a spectroradiometer with a spectral range of 350-1050 nm under direct sunlight, a cloud-free sky and stable atmospheric conditions, covering more than 95% canopy. In this study, reflectance collected from the rice canopy was expressed in terms of waveforms. The generated waveforms were then analysed for all combinations of nitrogen application and variety. A hierarchical clustering technique was employed to classify these waveforms into different groups, and an agglomerative clustering algorithm was used to finalize a small number of clusters for the different rice varieties and nitrogen treatments. This clustering approach also nullified observational error in the spectroradiometer reflectance. From this hierarchical clustering, a spectral signature library for the rice canopy was identified, which will help to create accurate rice crop classification maps. Critical wavebands, namely green (519 and 559 nm), red (649 nm), red edge (729 nm) and NIR (779 and 819 nm), were found to be sensitive to nitrogen, which will further help in nitrogen mapping of paddy agriculture and therefore offers the prospect of better-informed decisions on rice agriculture at both local and regional scales.
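The abstract does not state which linkage or distance was used to cluster the reflectance waveforms. As an assumed illustration, the sketch below applies average linkage (UPGMA) with Euclidean distance to reflectance vectors and stops at a chosen number of clusters. It is deliberately naive and slow, for readability; in practice a library routine such as `scipy.cluster.hierarchy.linkage` with `method="average"` would normally be used. The spectra are hypothetical.

```python
import math


def average_linkage(spectra, n_clusters):
    """Naive UPGMA: repeatedly merge the two clusters whose member spectra have
    the smallest mean pairwise Euclidean distance, until n_clusters remain.
    Returns a list of clusters, each a list of spectrum indices."""
    clusters = [[i] for i in range(len(spectra))]

    def avg_dist(a, b):
        total = sum(math.dist(spectra[i], spectra[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: avg_dist(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters.pop(j)  # j > i, so index i is still valid after the pop
        clusters[i].extend(merged)
    return clusters


# Hypothetical canopy reflectance sampled at four bands (one row per plot).
spectra = [
    [0.05, 0.08, 0.04, 0.45],
    [0.05, 0.09, 0.04, 0.47],
    [0.06, 0.10, 0.07, 0.30],
    [0.06, 0.11, 0.07, 0.31],
]
print(average_linkage(spectra, n_clusters=2))
```

Average linkage was chosen here because it is the UPGMA method widely recommended in the ecological clustering literature; single or complete linkage would only change the `avg_dist` criterion.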
Jacob, Benjamin G; Novak, Robert J; Toe, Laurent; Sanfo, Moussa S; Afriyie, Abena N; Ibrahim, Mohammed A; Griffith, Daniel A; Unnasch, Thomas R
2012-01-01
The standard methods for regression analyses of clustered riverine larval habitat data of Simulium damnosum s.l. a major black-fly vector of Onchoceriasis, postulate models relating observational ecological-sampled parameter estimators to prolific habitats without accounting for residual intra-cluster error correlation effects. Generally, this correlation comes from two sources: (1) the design of the random effects and their assumed covariance from the multiple levels within the regression model; and, (2) the correlation structure of the residuals. Unfortunately, inconspicuous errors in residual intra-cluster correlation estimates can overstate precision in forecasted S.damnosum s.l. riverine larval habitat explanatory attributes regardless how they are treated (e.g., independent, autoregressive, Toeplitz, etc). In this research, the geographical locations for multiple riverine-based S. damnosum s.l. larval ecosystem habitats sampled from 2 pre-established epidemiological sites in Togo were identified and recorded from July 2009 to June 2010. Initially the data was aggregated into proc genmod. An agglomerative hierarchical residual cluster-based analysis was then performed. The sampled clustered study site data was then analyzed for statistical correlations using Monthly Biting Rates (MBR). Euclidean distance measurements and terrain-related geomorphological statistics were then generated in ArcGIS. A digital overlay was then performed also in ArcGIS using the georeferenced ground coordinates of high and low density clusters stratified by Annual Biting Rates (ABR). This data was overlain onto multitemporal sub-meter pixel resolution satellite data (i.e., QuickBird 0.61m wavbands ). Orthogonal spatial filter eigenvectors were then generated in SAS/GIS. 
Univariate and non-linear regression-based models (i.e., logistic, Poisson and negative binomial) were also employed to determine probability distributions and to identify statistically significant parameter estimators from the sampled data. Thereafter, Durbin-Watson test statistics were used in PROC AUTOREG to test the null hypothesis that the regression residuals were not autocorrelated against the alternative that the residuals followed an autoregressive process. Bayesian uncertainty matrices were also constructed in PROC MCMC, employing normal priors for each of the sampled estimators. The residuals revealed both spatially structured and unstructured error effects in the high and low ABR-stratified clusters. The analyses also revealed that turbidity levels and the presence of rocks were statistically significant estimators for the high-ABR-stratified clusters, while distance between habitats and floating vegetation were important estimators for the low-ABR-stratified cluster. Varying and constant coefficient regression models, ABR-stratified GIS-generated clusters, sub-meter resolution satellite imagery, a robust residual intra-cluster diagnostic test, MBR-based histograms, eigendecomposition spatial filter algorithms and Bayesian matrices can enable accurate autoregressive estimation of latent uncertainty effects and other residual error probabilities (e.g., heteroskedasticity) for testing correlations between georeferenced S. damnosum s.l. riverine larval habitat estimators. The asymptotic distribution of the resulting residual-adjusted intra-cluster predictor error autocovariate coefficients can thereafter be established, while estimates of the asymptotic variance can lead to the construction of approximate confidence intervals for accurately targeting productive S. damnosum s.l. habitats based on spatiotemporal field-sampled count data.
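The residual check above rests on the Durbin-Watson statistic, DW = Σ(e_t − e_{t−1})² / Σ e_t², which is roughly 2(1 − r1) for lag-1 residual autocorrelation r1. The study used SAS PROC AUTOREG; the following is only a hypothetical NumPy illustration, not the authors' code, in which the AR(1) error process (φ = 0.8) and the regression are simulated assumptions.

```python
import numpy as np

def durbin_watson(residuals):
    """Sum of squared successive residual differences over the residual sum of squares."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

def ols_residuals(x, y):
    """Residuals of an ordinary least-squares fit of y on x with an intercept."""
    design = np.column_stack([np.ones(len(y)), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ beta

# Hypothetical regression whose errors follow an AR(1) process.
rng = np.random.default_rng(42)
n, phi = 500, 0.8
x = rng.normal(size=n)
errors = np.zeros(n)
for t in range(1, n):
    errors[t] = phi * errors[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + errors

dw = durbin_watson(ols_residuals(x, y))
# Values well below 2 point to positive residual autocorrelation.
print(f"DW = {dw:.2f}")
```

DW near 2 is consistent with uncorrelated residuals; formal decisions still need the tabulated bounds or the p-values that PROC AUTOREG reports.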
A novel community detection method in bipartite networks
NASA Astrophysics Data System (ADS)
Zhou, Cangqi; Feng, Liang; Zhao, Qianchuan
2018-02-01
Community structure is a common and important feature in many complex networks, including bipartite networks, which are used as a standard model for many empirical networks composed of two types of nodes. In this paper, we propose a two-stage method for detecting community structure in bipartite networks. First, we extend the widely used Louvain algorithm to bipartite networks. The effectiveness and efficiency of the Louvain algorithm have been demonstrated in many applications; however, no Louvain-like algorithm has been specifically adapted to bipartite networks. Based on bipartite modularity, a measure that extends unipartite modularity and quantifies the strength of partitions in bipartite networks, we fill this gap by developing the Bi-Louvain algorithm, which iteratively groups the nodes in each part in turn. In bipartite networks, this algorithm often produces a balanced network structure with equal numbers of the two types of nodes. Second, for the balanced network yielded by the first algorithm, we use an agglomerative clustering method to further cluster the network. We demonstrate that the modularity gain of each aggregation, and the operation of joining two communities, can be computed compactly by matrix operations for all pairs of communities simultaneously. Finally, a complete hierarchical community structure is unfolded. We apply our method to two benchmark data sets and a large-scale data set from an e-commerce company, showing that it effectively identifies community structure in bipartite networks.
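To make the matrix-form idea concrete, here is a sketch assuming Barber's bipartite modularity, Q = (1/m) Σ_ij (A_ij − k_i d_j / m) δ(g_i, h_j), where A is the row-by-column biadjacency matrix and k, d are the degrees of the two node types. With one-hot membership matrices R and C, M = Rᵀ B C collects the contributions per community pair, Q = tr(M)/m, and merging communities a and b changes Q by (M_ab + M_ba)/m, so every pairwise gain comes from one matrix expression. This is an illustration under those assumptions, not necessarily the authors' exact formulation; the toy graph is hypothetical.

```python
import numpy as np

def community_matrix(A, red_labels, blue_labels):
    """M[a, b] = sum of B_ij over row nodes i in community a and column nodes j in b."""
    A = np.asarray(A, dtype=float)
    m = A.sum()                              # number of edges (total weight)
    k = A.sum(axis=1)                        # degrees of row ("red") nodes
    d = A.sum(axis=0)                        # degrees of column ("blue") nodes
    B = A - np.outer(k, d) / m               # bipartite modularity matrix
    n_comm = max(max(red_labels), max(blue_labels)) + 1
    R = np.eye(n_comm)[red_labels]           # (p, c) one-hot membership
    C = np.eye(n_comm)[blue_labels]          # (q, c) one-hot membership
    return R.T @ B @ C, m

def bipartite_modularity(A, red_labels, blue_labels):
    M, m = community_matrix(A, red_labels, blue_labels)
    return float(np.trace(M) / m)

def merge_gains(A, red_labels, blue_labels):
    """Modularity change for merging every pair of communities (a, b) at once."""
    M, m = community_matrix(A, red_labels, blue_labels)
    G = (M + M.T) / m
    np.fill_diagonal(G, 0.0)                 # merging a community with itself is a no-op
    return G

# Hypothetical graph: 2 row nodes, 4 column nodes, two obvious communities.
A = [[1, 1, 0, 0],
     [0, 0, 1, 1]]
print(bipartite_modularity(A, [0, 1], [0, 0, 1, 1]))   # 0.5
print(merge_gains(A, [0, 1], [0, 0, 1, 1]))
```

An agglomerative pass would repeatedly merge the pair with the largest positive entry of the gain matrix and stop when none is positive.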
Krami, Loghman Khoda; Amiri, Fazel; Sefiyanian, Alireza; Shariff, Abdul Rashid B Mohamed; Tabatabaie, Tayebeh; Pradhan, Biswajeet
2013-12-01
One hundred and thirty composite soil samples were collected from Hamedan county, Iran, to characterize the spatial distribution and trace the sources of heavy metals, including As, Cd, Co, Cr, Cu, Ni, Pb, V, Zn, and Fe. Multivariate gap statistical analysis was used, and the disjunctive kriging and geoenrichment factor (EF(G)) techniques were applied to interrelate the spatial patterns of pollution. Heavy metals and soil properties were grouped using agglomerative hierarchical clustering and the gap statistic. Principal component analysis was used to identify the sources of the metals. Geostatistics was used for geospatial data processing. Based on a comparison between the original data and the background values of the ten metals, the disjunctive kriging and EF(G) techniques were used to quantify their geospatial patterns and assess the contamination levels of the heavy metals. The spatial distribution map, combined with the statistical analysis, showed that the main source of Cr, Co, Ni, Zn, Pb, and V in group A land use (agriculture, rocky, and urban) was geogenic, whereas As, Cd, and Cu originated from industrial and agricultural activities (anthropogenic sources). In group B land use (rangeland and orchards), Cr, Co, Ni, Zn, and V were mainly controlled by natural factors, whereas As, Cd, Cu, and Pb had been added by organic factors. In group C land use (water), most heavy metals were of natural origin, without anthropogenic sources. Cd and As pollution was relatively more serious across the different land uses. The EF(G) technique confirmed the anthropogenic influence on heavy metal pollution. All metals showed concentrations substantially higher than their background values, suggesting anthropogenic pollution.
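The gap statistic (Tibshirani, Walther and Hastie) compares the log within-cluster dispersion log W_k with its expectation under a null reference distribution, and picks the smallest k with Gap(k) ≥ Gap(k+1) − s_{k+1}. A minimal sketch pairing it with agglomerative clustering in SciPy follows. The Ward linkage, the uniform bounding-box reference, the 20 reference sets and the synthetic three-group data are all assumptions, since the abstract does not state them. To group variables (metals and soil properties) rather than samples, one would pass the standardized data matrix transposed.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def within_dispersion(X, labels):
    """W_k: pooled within-cluster sum of squared distances to the cluster means."""
    return sum(((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum()
               for c in np.unique(labels))

def ahc_labels(X, k):
    """Agglomerative (Ward) clustering cut into k clusters."""
    return fcluster(linkage(X, method="ward"), t=k, criterion="maxclust")

def gap_statistic(X, k_max=6, n_refs=20, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    gaps, sks = [], []
    for k in range(1, k_max + 1):
        log_w = np.log(within_dispersion(X, ahc_labels(X, k)))
        # Reference data: uniform over the bounding box of the observations
        ref = []
        for _ in range(n_refs):
            R = rng.uniform(lo, hi, size=X.shape)
            ref.append(np.log(within_dispersion(R, ahc_labels(R, k))))
        ref = np.array(ref)
        gaps.append(ref.mean() - log_w)
        sks.append(ref.std() * np.sqrt(1 + 1 / n_refs))
    # Smallest k with Gap(k) >= Gap(k+1) - s_{k+1}
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - sks[k]:
            return k
    return k_max

# Hypothetical data: three well-separated groups of 30 samples.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])
print(gap_statistic(X))
```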
Caillé, Soline; Samson, Alain; Wirth, Jérémie; Diéval, Jean-Baptiste; Vidal, Stéphane; Cheynier, Véronique
2010-02-15
It is widely accepted that oxygen contributes to wine development by impacting its colour, aromatic bouquet, and mouth-feel properties. The wine industry can now also take advantage of engineered solutions to deliver known amounts of oxygen into bottles through the closures. This study aimed to monitor the influence of oxygen pick-up, before (micro-oxygenation, Mox) and after (nano-oxygenation) bottling, on wine sensory evolution. Red Grenache wines were prepared either by flash release (FR) or traditional soaking (Trad) and with or without Mox during élevage (FR+noMox, FR+Mox, Trad+noMox, Trad+Mox). The rate of nano-oxygenation was controlled by combining closures with consistent oxygen transfer rates (OTR) and different oxygen-controlled storage conditions. Wine sensory characteristics were analyzed by a panel of trained judges using sensory profiling at bottling (T0) and after 5 and 10 months of ageing. Effects of winemaking techniques and OTR were analyzed by multivariate analysis (principal component analysis and agglomerative hierarchical clustering) and analysis of variance. Results showed that, at bottling, Trad wines were perceived as more animal and FR wines as more bitter and astringent. Mox wines showed a more orange shade. At 5 and 10 months, visual and olfactory differences were observed according to the OTR levels: modalities with higher oxygen ingress were darker and fruitier but also perceived as significantly less animal than modalities with lower oxygen. Over the 10 months of ageing, the influence of OTR became more important, as shown by increased significance levels of the observed differences. As the mouth-feel properties of the wines were mainly dictated by winemaking techniques, OTR had little impact on "in mouth" attributes. Copyright 2009 Elsevier B.V. All rights reserved.
Adaptation of Chain Event Graphs for use with Case-Control Studies in Epidemiology.
Keeble, Claire; Thwaites, Peter Adam; Barber, Stuart; Law, Graham Richard; Baxter, Paul David
2017-09-26
Case-control studies are used in epidemiology to try to uncover the causes of diseases, but they are a retrospective study design known to suffer from non-participation and recall bias, which may explain their decreased popularity in recent years. Traditional analyses usually report only the odds ratio for given exposures and the binary disease status. Chain event graphs are a graphical representation of a statistical model derived from event trees; they have been developed in artificial intelligence and statistics and only recently introduced to the epidemiology literature. They are a modern Bayesian technique that enables prior knowledge to be incorporated into the data analysis, with the agglomerative hierarchical clustering algorithm used to form a suitable chain event graph. Additionally, they can account for missing data and be used to explore missingness mechanisms. Here we adapt the chain event graph framework to suit scenarios often encountered in case-control studies, to strengthen this study design, which is efficient in both time and cost. We demonstrate eight adaptations to the graphs: two suitable for full case-control study analysis, four that can be used in interim analyses to explore biases, and two that aim to improve the ease and accuracy of analyses. The adaptations are illustrated with complete, reproducible, fully interpreted examples, including the event tree and chain event graph. Chain event graphs are used here for the first time to summarise non-participation, data collection techniques, data reliability, and disease severity in case-control studies. We demonstrate how these features of a case-control study can be incorporated into the analysis to provide further insight, which can help to identify potential biases and lead to more accurate study results.
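The agglomerative step that forms a chain event graph can be sketched as a Bayesian search over stages: situations of the event tree with the same number of outgoing edges start as singleton stages, and the pair whose merge most increases the Dirichlet-multinomial log marginal likelihood is merged until no merge helps, in the style of the Bayesian MAP search described in the CEG literature. This is a minimal sketch under stated assumptions: the uniform Dirichlet(1, ..., 1) priors and the edge counts are hypothetical, whereas real analyses usually set priors consistently across the tree from an equivalent sample size.

```python
from itertools import combinations
from math import lgamma

def log_marginal(alpha, counts):
    """Dirichlet-multinomial log marginal likelihood of one stage's edge counts."""
    a0, n0 = sum(alpha), sum(counts)
    score = lgamma(a0) - lgamma(a0 + n0)
    for a, n in zip(alpha, counts):
        score += lgamma(a + n) - lgamma(a)
    return score

def add(u, v):
    return [x + y for x, y in zip(u, v)]

def ahc_stages(situations):
    """situations: list of (alpha, counts) pairs, all with the same number of edges."""
    stages = [([i], list(a), list(n)) for i, (a, n) in enumerate(situations)]
    while len(stages) > 1:
        best_gain, best_pair = 0.0, None
        for i, j in combinations(range(len(stages)), 2):
            _, a_i, n_i = stages[i]
            _, a_j, n_j = stages[j]
            # Merged stage: priors and counts are pooled
            gain = (log_marginal(add(a_i, a_j), add(n_i, n_j))
                    - log_marginal(a_i, n_i) - log_marginal(a_j, n_j))
            if gain > best_gain:
                best_gain, best_pair = gain, (i, j)
        if best_pair is None:          # no merge increases the score
            break
        i, j = best_pair
        m_i, a_i, n_i = stages[i]
        m_j, a_j, n_j = stages[j]
        merged = (m_i + m_j, add(a_i, a_j), add(n_i, n_j))
        stages = [s for idx, s in enumerate(stages) if idx not in (i, j)] + [merged]
    return sorted(sorted(members) for members, _, _ in stages)

# Hypothetical binary situations, e.g. "participated / did not" in three strata.
situations = [([1, 1], [50, 50]), ([1, 1], [48, 52]), ([1, 1], [90, 10])]
print(ahc_stages(situations))
```

Here the two situations with similar proportions end up in one stage, while the third stays separate; the resulting stages are what the chain event graph then displays.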
Big data driven cycle time parallel prediction for production planning in wafer manufacturing
NASA Astrophysics Data System (ADS)
Wang, Junliang; Yang, Jungang; Zhang, Jie; Wang, Xiaoxi; Zhang, Wenjun Chris
2018-07-01
Cycle time forecasting (CTF) is one of the most crucial issues in production planning for maintaining high delivery reliability in semiconductor wafer fabrication systems (SWFS). This paper proposes a novel data-intensive cycle time (CT) prediction system with parallel computing to rapidly forecast the CT of wafer lots with large datasets. First, a density-peak-based radial basis function network (DP-RBFN) is designed to forecast the CT with the diverse and agglomerative CT data. Second, a network learning method based on a clustering technique is proposed to determine the density peaks. Third, a parallel computing approach to network training is proposed in order to speed up training with large-scale CT data. Finally, an experiment on an SWFS is presented, which demonstrates that the proposed CTF system not only speeds up model training but also outperforms CTF methods based on the radial basis function network, the back-propagation network and multivariate regression in terms of mean absolute deviation and standard deviation.
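The abstract does not say how the density peaks are found. The sketch below assumes the density-peak clustering of Rodriguez and Laio (local density ρ, distance δ to the nearest denser point, centres ranked by ρ·δ) to choose RBF centres, followed by Gaussian hidden units and least-squares output weights. The cutoff `dc`, the `width` and the synthetic three-regime lot data are hypothetical, and the parallel training stage is not shown.

```python
import numpy as np

def density_peak_centers(X, n_centers, dc):
    """Pick points with high local density (rho) that are far (delta) from denser points."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    rho = np.exp(-(D / dc) ** 2).sum(axis=1) - 1.0   # Gaussian-kernel density, self term removed
    delta = np.empty(len(X))
    for i in range(len(X)):
        higher = rho > rho[i]
        # Distance to the nearest denser point; the global peak gets its largest distance
        delta[i] = D[i, higher].min() if higher.any() else D[i].max()
    return X[np.argsort(rho * delta)[::-1][:n_centers]]

def rbf_features(X, centers, width):
    """Gaussian hidden-layer activations plus a bias column."""
    sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.hstack([np.exp(-sq / (2 * width ** 2)), np.ones((len(X), 1))])

def fit_rbfn(X, y, centers, width):
    """Output-layer weights by linear least squares."""
    w, *_ = np.linalg.lstsq(rbf_features(X, centers, width), y, rcond=None)
    return w

def predict_rbfn(X, centers, width, w):
    return rbf_features(X, centers, width) @ w

# Hypothetical lot features drawn from three operating regimes.
rng = np.random.default_rng(0)
means = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([m + rng.normal(scale=0.5, size=(30, 2)) for m in means])
centers = density_peak_centers(X, n_centers=3, dc=1.0)
y = np.repeat([1.0, 2.0, 3.0], 30) + 0.1 * X[:, 0]     # synthetic cycle times
w = fit_rbfn(X, y, centers, width=2.0)
print(np.abs(predict_rbfn(X, centers, 2.0, w) - y).mean())
```

Choosing centres from density peaks keeps the hidden layer small, which is one plausible reason such a design scales to large CT datasets.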
Fleming, Erin E.; Ziegler, Gregory R.; Hayes, John E.
2015-01-01
Multiple rapid sensory profiling techniques have been developed as more efficient alternatives to traditional sensory descriptive analysis. Here, we compare the results of three rapid sensory profiling techniques – check-all-that-apply (CATA), sorting, and polarized sensory positioning (PSP) – using a diverse range of astringent stimuli. These rapid methods differ in their theoretical basis, implementation, and data analyses, and their relative advantages and limitations are largely unexplored. Additionally, we were interested in using these methods to compare varied astringent stimuli, as these compounds are difficult to characterize using traditional descriptive analysis due to high fatigue and potential carry-over. In the CATA experiment, subjects (n=41) were asked to rate the overall intensity of each stimulus as well as to endorse any relevant terms (from a list of 13) that characterized the sample. In the sorting experiment, subjects (n=30) sorted intensity-matched stimuli into groups in one-on-one sessions with the experimenter. In the PSP experiment, subjects (n=41) first sampled and took notes on three blind references (‘poles’) before rating each stimulus for its similarity to each of the 3 poles. Two-dimensional perceptual maps from correspondence analysis (CATA), multidimensional scaling (sorting), and multiple factor analysis (PSP) were remarkably similar, with normalized RV coefficients indicating significantly similar plots, regardless of method. Agglomerative hierarchical clustering of all data sets using Ward’s minimum variance as the linkage criterion showed that the clusters of astringent stimuli corresponded approximately to the respective class of astringent agent. Based on the descriptive CATA data, it appears these differences may be due to the presence of side tastes such as bitterness and sourness, rather than astringent sub-qualities per se.
Although all three methods are considered ‘rapid,’ our prior experience with sorting suggests it is best performed 1:1 with the experimenter, which makes sorting relatively less efficient than CATA or PSP. Based on the evaluation criteria used here, the choice of method depends on the time constraints of the experimenter and the need for descriptive terms to understand the sensory space of the samples. Accordingly, we recommend a mixed approach that combines CATA with a subsequent PSP task so that the product space can be well characterized before choosing poles for PSP. PMID:26113771
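The RV coefficient used above to compare perceptual maps measures how similar two configurations of the same objects are, RV = tr(S_X S_Y) / √(tr(S_X²) tr(S_Y²)) with S = XXᵀ on column-centred data, and it is unchanged by rotating or uniformly rescaling either map. A minimal NumPy sketch of the plain RV coefficient follows; the normalized RV reported in the study additionally standardizes it against its permutation distribution, which is not shown, and the maps are hypothetical.

```python
import numpy as np

def rv_coefficient(X, Y):
    """RV coefficient between two configurations of the same n objects (rows)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    X = X - X.mean(axis=0)                    # column-centre each map
    Y = Y - Y.mean(axis=0)
    Sx, Sy = X @ X.T, Y @ Y.T                 # object-by-object cross-product matrices
    return float(np.trace(Sx @ Sy) / np.sqrt(np.trace(Sx @ Sx) * np.trace(Sy @ Sy)))

rng = np.random.default_rng(0)
cata_map = rng.normal(size=(12, 2))           # hypothetical 2-D map of 12 stimuli
theta = np.pi / 5
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
psp_map = 3.0 * cata_map @ rotation           # same configuration, rotated and rescaled
print(rv_coefficient(cata_map, psp_map))      # ~1.0
print(rv_coefficient(cata_map, rng.normal(size=(12, 2))))
```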
Saigal, Christopher S; Lambrechts, Sylvia I; Seenu Srinivasan, V; Dahan, Ely
2017-06-01
Many guidelines advocate the use of shared decision making for men with newly diagnosed prostate cancer. Decision aids can facilitate the process of shared decision making. Implicit in this approach is the idea that physicians understand which elements of treatment matter to patients. Little formal work exists to guide physicians or developers of decision aids in identifying these attributes. We use a mixed-methods technique adapted from marketing science, the 'Voice of the Patient', to describe and identify treatment elements of value for men with localized prostate cancer. We conducted semi-structured interviews with 30 men treated for prostate cancer in the urology clinic of the West Los Angeles Veterans Affairs Medical Center. We used qualitative analysis to generate themes in patient narratives, and a quantitative approach, agglomerative hierarchical clustering, to identify the treatment attributes most relevant to patients making decisions about prostate cancer. We identified five 'traditional' prostate cancer treatment attributes: sexual dysfunction, bowel problems, urinary problems, lifespan, and others' opinions. We further identified two novel treatment attributes: a treatment's ability to validate a sense of proactivity and the need for an incision (separate from risks of surgery). Application of a successful marketing technique, the 'Voice of the Customer', in a clinical setting elicits non-obvious attributes that highlight unique patient decision-making concerns. Use of this method in the development of decision aids may result in more effective decision support.
De Paris, Renata; Quevedo, Christian V; Ruiz, Duncan D A; Norberto de Souza, Osmar
2015-01-01
Protein receptor conformations obtained from molecular dynamics (MD) simulations have become a promising way to treat receptor flexibility explicitly in molecular docking experiments applied to drug discovery and development. However, incorporating the entire ensemble of MD conformations in docking experiments to screen large candidate compound libraries is currently infeasible. Clustering algorithms have been widely used as a means to reduce such ensembles to a manageable size. Most studies investigate different algorithms using pairwise Root-Mean Square Deviation (RMSD) values for all, or part, of the MD conformations. Nevertheless, RMSD alone may not be the most appropriate measure for clustering conformations when the target receptor has a plastic active site, since RMSD values are influenced by changes that occur in other parts of the structure. Hence, we applied two partitioning methods (k-means and k-medoids) and four agglomerative hierarchical methods (Complete linkage, Ward's, Unweighted Pair Group Method and Weighted Pair Group Method) to analyze and compare the quality of partitions between a data set composed of properties from an enzyme receptor substrate-binding cavity and two data sets created using different RMSD approaches. Ensembles of representative MD conformations were generated by selecting the medoid of each group from all partitions analyzed. We evaluated the performance of our new method on the binding conformations of drug candidates to the InhA enzyme, using cross-docking experiments between a 20 ns MD trajectory and 20 different ligands. Statistical analyses showed that the novel ensemble, which is represented by only 0.48% of the MD conformations, was able to reproduce 75% of all dynamic behaviors within the binding cavity for the docking experiments performed.
Moreover, this new approach not only outperforms the other two RMSD-clustering solutions but also appears to be a promising strategy for distilling biologically relevant information from MD trajectories, especially for docking purposes. PMID:26218832
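The representative-ensemble step (cluster the conformations, then keep one medoid per cluster) can be sketched with SciPy, where `method="average"` corresponds to UPGMA, `"weighted"` to WPGMA, and `"complete"` and `"ward"` to the other two hierarchical methods named above. Euclidean distances on a generic feature matrix are an assumption here; the study used binding-cavity properties or RMSD-based data, and the one-dimensional features are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist, squareform

def representative_conformations(features, n_clusters, method="average"):
    """Cluster conformations hierarchically and return one medoid index per cluster."""
    condensed = pdist(features)                     # pairwise Euclidean distances
    labels = fcluster(linkage(condensed, method=method),
                      t=n_clusters, criterion="maxclust")
    D = squareform(condensed)
    medoids = []
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        # Medoid: the member with the smallest total distance to the rest of its cluster
        within = D[np.ix_(members, members)].sum(axis=1)
        medoids.append(int(members[np.argmin(within)]))
    return medoids, labels

# Hypothetical 1-D cavity descriptor for six MD frames.
features = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
medoids, labels = representative_conformations(features, n_clusters=2)
print(sorted(medoids))   # [1, 4]
```

Using medoids rather than centroids keeps every representative an actual simulated conformation, which is what a docking run needs.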
NASA Astrophysics Data System (ADS)
Girardi, P.; Pastres, R.; Gaetan, C.; Mangin, A.; Taji, M. A.
2015-12-01
In this paper, we present the results of a classification of Adriatic waters based on spatial time series of remotely sensed chlorophyll-a. The study was carried out using a clustering procedure combining quantile smoothing and an agglomerative clustering algorithm. The smoothing function includes a seasonal term, allowing areas to be classified according to “similar” seasonal evolution as well as “similar” trends. This methodology, applied here for the first time to Ocean Colour data, is more robust than other classical methods, as it does not require any assumption about the probability distribution of the data. The approach was applied to the classification of an eleven-year time series, from January 2002 to December 2012, of monthly chlorophyll-a concentrations covering the whole Adriatic Sea. The data set was made available by ACRI (http://hermes.acri.fr) in the framework of the Glob-Colour Project (http://www.globcolour.info). Data were obtained by calibrating Ocean Colour data provided by different satellite missions, such as MERIS, SeaWiFS and MODIS. The results clearly show North-South and West-East gradients in chlorophyll levels, which is consistent with literature findings. This analysis could provide a sound basis for identifying “water bodies” and the chlorophyll-a thresholds that define their Good Ecological Status, in terms of trophic level, as required by the implementation of the Marine Strategy Framework Directive. The forthcoming availability of Sentinel-3 OLCI data, in continuity with the previous missions and with the prospect of a monitoring system spanning more than 15 years, offers a real opportunity to expand our study as strong support for the implementation of both the EU Marine Strategy Framework Directive and the UNEP-MAP Ecosystem Approach in the Mediterranean.
Oligosaccharides in feces of breast- and formula-fed babies.
Albrecht, Simone; Schols, Henk A; van Zoeren, Diny; van Lingen, Richard A; Groot Jebbink, Liesbeth J M; van den Heuvel, Ellen G H M; Voragen, Alphons G J; Gruppen, Harry
2011-10-18
So far, little is known about the fate of oligosaccharides in the colon of breast- and formula-fed babies. Using capillary electrophoresis with laser-induced fluorescence detection coupled to mass spectrometry (CE-LIF-MS(n)), we studied the fecal oligosaccharide profiles of 27 two-month-old breast-, formula- and mixed-fed preterm babies. Interpretation of the complex oligosaccharide profiles was facilitated by first clustering the CE-LIF data points by agglomerative hierarchical clustering (AHC). In the feces of breast-fed babies, characteristic human milk oligosaccharide (HMO) profiles, showing genetic fingerprints known for human milk of secretors and non-secretors, were recognized. Alternatively, advanced degradation and bioconversion of HMOs, resulting in an accumulation of acidic HMOs or HMO bioconversion products, was observed. Independent of the prebiotic supplementation of the formula with galactooligosaccharides (GOS) at the level used, similar oligosaccharide profiles of low peak abundance were obtained for formula-fed babies. Feeding influences the presence of diet-related oligosaccharides in baby feces, and gastrointestinal adaptation plays an important role herein. Four fecal oligosaccharides, characterized as HexNAc-Hex-Hex, Hex-[Fuc]-HexNAc-Hex, HexNAc-[Fuc]-Hex-Hex and HexNAc-[Fuc]-Hex-HexNAc-Hex-Hex, highlighted an active gastrointestinal metabolization of the feeding-related oligosaccharides. Their presence was linked to the gastrointestinal mucus layer and the blood-group determinant oligosaccharides therein, which are characteristic of the host's genotype. Copyright © 2011 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Mehmood, S.; Ashfaq, M.; Evans, K. J.; Black, R. X.; Hsu, H. H.
2017-12-01
Extreme precipitation during the summer season has shown an increasing trend across South Asia in recent decades, causing an exponential increase in weather-related losses. Here we combine a cluster analysis technique (Agglomerative Hierarchical Clustering) with a Lagrangian-based moisture analysis technique to investigate potential commonalities in the characteristics of the large-scale meteorological patterns (LSMP) and moisture anomalies associated with the observed extreme precipitation events, and their representation in the Department of Energy model ACME. Using precipitation observations from the Indian Meteorological Department (IMD) and Asian Precipitation Highly Resolved Observational Data Integration Towards Evaluation (APHRODITE), and atmospheric variables from ERA-Interim Reanalysis, we first identify LSMPs in both the upper and lower troposphere that are responsible for widespread extreme precipitation events during the 1980-2015 period. For each selected extreme event, we perform moisture source analyses to identify the major evaporative sources that sustain the anomalous moisture supply during the event, with a particular focus on local terrestrial moisture recycling. Further, we perform similar analyses on two five-member ensembles of the ACME model (1-degree and ¼-degree) to investigate the ability of the ACME model to simulate precipitation extremes associated with each of the LSMP patterns and the associated anomalous moisture sourcing from each of the terrestrial and oceanic evaporative regions. Comparison of low- and high-resolution model configurations provides insight into the influence of horizontal grid spacing on the simulation of extreme precipitation and its governing mechanisms.
A preliminary classification system for vegetation of Alaska.
Leslie A. Viereck; C.T. Dyrness
1980-01-01
A hierarchical system, with five levels of resolution, is proposed for classifying Alaska vegetation. The system, which is agglomerative, starts with 415 known Alaska plant communities, which are listed and referenced. At the broadest level of resolution, the system contains five formations: forest, tundra, shrubland, herbaceous vegetation, and aquatic vegetation.
Diagnostic index of 3D osteoarthritic changes in TMJ condylar morphology
NASA Astrophysics Data System (ADS)
Gomes, Liliane R.; Gomes, Marcelo; Jung, Bryan; Paniagua, Beatriz; Ruellas, Antonio C.; Gonçalves, João Roberto; Styner, Martin A.; Wolford, Larry; Cevidanes, Lucia
2015-03-01
The aim of this study was to investigate imaging statistical approaches for classifying 3D osteoarthritic morphological variations among 169 Temporomandibular Joint (TMJ) condyles. Cone beam Computed Tomography (CBCT) scans were acquired from 69 patients with long-term TMJ Osteoarthritis (OA) (39.1 ± 15.7 years), 15 patients at initial diagnosis of OA (44.9 ± 14.8 years) and 7 healthy controls (43 ± 12.4 years). 3D surface models of the condyles were constructed, and Shape Correspondence was used to establish corresponding points on each model. The statistical framework included a multivariate analysis of covariance (MANCOVA) and Direction-Projection-Permutation (DiProPerm) for testing the statistical significance of the differences between the healthy control and OA groups, as determined by clinical and radiographic diagnoses. Unsupervised classification using hierarchical agglomerative clustering (HAC) was then conducted. Condylar morphology in OA and healthy subjects varied widely. Compared with healthy controls, the average OA condyle was statistically significantly smaller in all dimensions except its anterior surface. Significant flattening of the lateral pole was noticed at initial diagnosis (p < 0.05). Areas of 3.88 mm bone resorption at the superior surface and 3.10 mm bone apposition at the anterior aspect were observed in the long-term OA average model. DiProPerm with 1000 permutations supported a significant difference between the healthy control group and the OA group (t = 6.7, empirical p-value = 0.001). Clinically meaningful unsupervised classification of TMJ condylar morphology determined a preliminary diagnostic index of 3D osteoarthritic changes, which may be the first step towards a more targeted diagnosis of this condition.
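DiProPerm finds a direction separating the two groups, projects the data onto it, computes a univariate statistic, and then recomputes the direction and the statistic under permuted labels. This sketch substitutes the mean-difference direction for the distance-weighted discrimination (DWD) direction usually paired with DiProPerm and uses a Welch t statistic; the two-group data are synthetic stand-ins for condyle shape features. With 1000 permutations and the (1 + count)/(1 + B) convention used here, the smallest attainable empirical p-value is about 0.001.

```python
import numpy as np

def diproperm(X, y, n_perm=1000, seed=0):
    """DiProPerm sketch: mean-difference direction, projection, permutation of labels."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    rng = np.random.default_rng(seed)

    def statistic(labels):
        a, b = X[labels == 1], X[labels == 0]
        direction = a.mean(axis=0) - b.mean(axis=0)   # Direction (stand-in for DWD)
        direction /= np.linalg.norm(direction)
        pa, pb = a @ direction, b @ direction         # Projection onto the direction
        se = np.sqrt(pa.var(ddof=1) / len(pa) + pb.var(ddof=1) / len(pb))
        return (pa.mean() - pb.mean()) / se           # Welch two-sample t statistic

    observed = statistic(y)
    # Permutation: relabel, then recompute both the direction and the statistic
    null = np.array([statistic(rng.permutation(y)) for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return float(observed), float(p_value)

# Hypothetical shape features for 30 controls and 30 OA condyles.
rng = np.random.default_rng(7)
healthy = rng.normal(size=(30, 5))
oa = rng.normal(loc=3.0, size=(30, 5))
X = np.vstack([healthy, oa])
y = np.repeat([0, 1], 30)
t_obs, p = diproperm(X, y)
print(f"t = {t_obs:.1f}, empirical p = {p:.3f}")
```

Recomputing the direction inside every permutation matters: projecting permuted data onto the original direction would make the null distribution too narrow and the test anti-conservative.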
A synoptic and dynamical characterization of wave-train and blocking cold surge over East Asia
NASA Astrophysics Data System (ADS)
Park, Tae-Won; Ho, Chang-Hoi; Deng, Yi
2014-08-01
Through an agglomerative hierarchical clustering method, cold surges over East Asia are classified into two distinct types based on the spatial pattern of the geopotential height anomalies at 300 hPa. One is the wave-train type, which is associated with developing large-scale waves across the Eurasian continent. The other is the blocking type, whose occurrence accompanies subarctic blocking. During the wave-train cold surge, growing baroclinic waves induce a southeastward expansion of the Siberian High and strong northerly winds over East Asia. Blocking cold surge, on the other hand, is associated with a southward expansion of the Siberian High and northeasterly winds inherent to a height dipole consisting of the subarctic blocking and the East Asian coastal trough. The blocking cold surge tends to be more intense and to last longer than the wave-train type. The wave-train cold surge is associated with the formation of a negative upper-tropospheric height anomaly southeast of Greenland approximately 12 days before the surge occurrence. Further analysis of isentropic potential vorticity reveals that this height anomaly could originate from the lower stratosphere over the North Atlantic. Cold surge of the blocking type occurs with an amplifying positive geopotential height anomaly and a negative potential vorticity anomaly over the Arctic and northern Eurasia in the stratosphere. These anomalies resemble the stratospheric signature of a negative phase of the Arctic Oscillation. This stratospheric feature is further demonstrated by the observation that the blocking-type cold surge occurs more often when the Arctic Oscillation is in its negative phase.
NASA Astrophysics Data System (ADS)
Kujanová, Kateřina; Matoušková, Milada; Kliment, Zdeněk
2016-04-01
A fundamental prerequisite for assessing the current ecological status of streams is the establishment of reference conditions for each stream type that serve as a benchmark. The hydromorphological reference conditions reflect natural channel behavior, which is extremely variable. Significant parameters of natural channel behavior were determined using a combination of four selected statistical methods: Principal Component Analysis, Agglomerative Hierarchical Clustering, correlation, and regression. Macroscale analyses of data on altitude, stream order, channel slope, valley floor slope, sinuosity, and characteristics of the hydrological regime were conducted for 3197 reaches of major rivers in the Czech Republic with a total length of 15,636 km. On the basis of selected significant parameters and their threshold values, channels were classified into groups of river characteristics based on shared behaviors. The channel behavior within these groups was validated using hydromorphological characteristics of natural channels determined during field research at reference sites. The classification of channels into groups confirmed the fundamental differences between channel behavior under the conditions of the Hercynian System and the flysch belt of the Western Carpathians in the Czech Republic, and identified a specific group in the flattened high areas of mountains in the Bohemian Massif. Validation confirmed the distinctions between groups of river characteristics and the uniqueness of each one; it also emphasized the benefits of using qualitative data and riparian zone characteristics for describing channel behavior. Channel slope, entrenchment ratio, bed structure, and d50 were identified as quantitative characteristics of natural channel behavior.
Model Independence in Downscaled Climate Projections: a Case Study in the Southeast United States
NASA Astrophysics Data System (ADS)
Gray, G. M. E.; Boyles, R.
2016-12-01
Downscaled climate projections are used to deduce how the climate will change in future decades at local and regional scales. It is important to use multiple models to characterize part of the future uncertainty, given the impact of that uncertainty on adaptation decision making. This is traditionally done through an equally weighted ensemble of multiple GCMs downscaled using one technique. Newer practices include several downscaling techniques in an effort to increase the ensemble's representation of future uncertainty. However, this practice may add statistically dependent models to the ensemble. Previous research has shown a dependence problem in the GCM ensemble across multiple generations, but such dependence has not been shown in the downscaled ensemble. In this case study, seven downscaled climate projections on the daily time scale are considered: CLAREnCE10, SERAP, BCCA (CMIP5 and CMIP3 versions), Hostetler, CCR, and MACA-LIVNEH. These data represent 83 ensemble members, 44 GCMs, and two generations of GCMs. Baseline periods are compared against the University of Idaho's METDATA gridded observation dataset. Hierarchical agglomerative clustering is applied to the correlated errors to determine dependent clusters. Redundant GCMs across different downscaling techniques show the most dependence, while smaller dependence signals are detected within downscaling datasets and across generations of GCMs. These results indicate that using additional downscaled projections to increase the ensemble size must be done with care to avoid redundant GCMs, and that the process of downscaling may increase the dependence of those downscaled GCMs. Climate model generations do not appear dissimilar enough to be treated as two separate statistical populations for ensemble building at the local and regional scales.
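Clustering ensemble members by the correlation of their errors against observations can be sketched as follows. The distance 1 − r, average linkage and the cut at distance 0.5 are assumptions (the abstract does not state them), and the daily series are synthetic: two members share one parent GCM's error pattern, mimicking the cross-technique redundancy described above.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def dependent_model_groups(simulations, observations, threshold=0.5):
    """Group models whose baseline errors are strongly correlated."""
    errors = simulations - observations          # (n_models, n_days) error series
    r = np.corrcoef(errors)                      # model-by-model error correlation
    dist = np.clip(1.0 - r, 0.0, None)           # correlated errors -> small distance
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")
    return fcluster(Z, t=threshold, criterion="distance")

rng = np.random.default_rng(3)
days = np.arange(365)
obs = 10 + 8 * np.sin(2 * np.pi * days / 365)   # hypothetical observed daily series
shared_bias = rng.normal(size=365)              # error pattern inherited from one GCM
sims = np.vstack([
    obs + shared_bias + 0.1 * rng.normal(size=365),   # GCM X, downscaling method 1
    obs + shared_bias + 0.1 * rng.normal(size=365),   # GCM X, downscaling method 2
    obs + rng.normal(size=365),                       # an unrelated GCM
])
print(dependent_model_groups(sims, obs))
```

Members that land in the same cluster carry largely the same information, so counting them as independent would overstate the ensemble's effective size.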
Diversity and biogeographical patterns of legumes (Leguminosae) indigenous to southern Africa
Trytsman, Marike; Westfall, Robert H.; Breytenbach, Philippus J. J.; Calitz, Frikkie J.; van Wyk, Abraham E.
2016-01-01
The principal aim of this study was to establish biogeographical patterns in the legume flora of southern Africa so as to facilitate the selection of species with agricultural potential. Plant collection data from the National Herbarium, South Africa, were analysed to establish the diversity and areas covered by legumes (Leguminosae/Fabaceae) indigenous to South Africa, Lesotho and Swaziland. A total of 27,322 records from 1,619 quarter degree grid cells, representing 1,580 species, 122 genera and 24 tribes, were included in the analyses. Agglomerative hierarchical clustering was applied to the presence or absence of legume species in quarter degree grid cells, and the resulting natural biogeographical regions (choria) are referred to as leguminochoria. The description of the 16 uniquely formed leguminochoria focuses on defining the associated bioregions and biomes, as well as the key climate and soil properties. Legume species with a high occurrence in a leguminochorion are listed as key species. The dominant growth form of key species, and the species richness and range within each leguminochorion, are discussed. Floristic links between the leguminochoria are established by examining and comparing key species common to clusters, using a vegetation classification program. Soil pH and mean annual minimum temperature were found to be the main drivers distinguishing among legume assemblages. This is the first time that distribution data for legumes have been used to identify biogeographical areas covered by leguminochoria on the subcontinent. One potential application of the results of this study is to assist in the selection of legumes for pasture breeding and soil conservation programs, especially in arid and semi-arid environments. PMID:27829799
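Clustering grid cells on species presence or absence needs a dissimilarity between cells. The abstract does not name the coefficient used, so the sketch below assumes the Jaccard distance, a common choice for presence/absence data; the cell ids and species labels are hypothetical. The resulting table could then be fed to any agglomerative linkage.

```python
from itertools import combinations


def jaccard_distance(cell_a, cell_b):
    """Jaccard distance between two grid cells, each given as a set of present species."""
    union = cell_a | cell_b
    if not union:
        return 0.0  # two empty cells are treated as identical
    return 1 - len(cell_a & cell_b) / len(union)


def distance_table(cells):
    """Pairwise Jaccard distances for a dict of cell id -> set of present species."""
    return {(a, b): jaccard_distance(cells[a], cells[b])
            for a, b in combinations(sorted(cells), 2)}
```

For example, cells sharing two of four recorded species (`{"sp1", "sp2", "sp3"}` and `{"sp2", "sp3", "sp4"}`) are at distance 0.5, while cells with no species in common are at distance 1.0.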
Lanting, Rosanne; Nooraee, Nazanin; Werker, Paul M N; van den Heuvel, Edwin R
2014-09-01
Dupuytren disease affects fingers in a variable fashion. Knowledge about specific disease patterns (phenotype) based on location and severity of the disease is lacking. In this cross-sectional study, 344 primary affected hands with Dupuytren disease were physically examined. The Pearson correlation coefficient between the coexistence of Dupuytren disease in pairs of fingers was calculated, and agglomerative hierarchical clustering was applied to identify possible clusters of affected fingers. With a multivariate ordinal logit model, the authors studied the correlation in severity, taking into account age and sex, and tested hypotheses on independence between groups of fingers. The ring finger was most frequently affected by Dupuytren disease, and contractures were seen in 15.1 percent of affected rays. Severity was significantly correlated between the thumb and index finger, the middle and ring fingers, and the middle and little fingers. Co-occurrence in pairs of fingers was highest for the middle and ring fingers and lowest for the thumb and index finger. No correlation could be demonstrated between the ring and little fingers, or between fingers on the ulnar and radial sides. Rays on the ulnar side of the hand are predominantly affected. The middle finger is substantially correlated with other fingers on the ulnar side, and the thumb and index finger are correlated; however, there was no evidence that the ulnar side and the radial side were correlated in any way, which suggests that occurrence on one side of the hand does not predict Dupuytren disease on the other side. Risk, III.
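When each hand is coded 1 (affected) or 0 (not affected) per finger, the Pearson correlation between the coexistence of disease in two fingers reduces to the phi coefficient of their 2×2 table. The sketch below assumes that coding; the example hand data are invented, and using 1 - phi as a clustering distance would be an additional assumption.

```python
def phi_coefficient(x, y):
    """Pearson correlation of two 0/1 vectors, i.e. the phi coefficient.

    x[i] and y[i] are 1 if the first / second finger of hand i is affected.
    """
    n11 = sum(1 for a, b in zip(x, y) if a and b)
    n10 = sum(1 for a, b in zip(x, y) if a and not b)
    n01 = sum(1 for a, b in zip(x, y) if not a and b)
    n00 = sum(1 for a, b in zip(x, y) if not a and not b)
    denom = ((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)) ** 0.5
    if denom == 0:
        raise ValueError("a finger is affected in all or none of the hands")
    return (n11 * n00 - n10 * n01) / denom
```

Two fingers that are always affected together give 1.0, fingers that are never affected together give -1.0, and independent patterns give values near 0.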
Bradshaw, Corey J. A.; Brook, Barry W.
2016-01-01
There are now many methods available to assess the relative citation performance of peer-reviewed journals. Regardless of their individual faults and advantages, citation-based metrics are used by researchers to maximize the citation potential of their articles, and by employers to rank academic track records. The absolute value of any particular index is arguably meaningless unless compared to other journals, and different metrics result in divergent rankings. To provide a simple yet more objective way to rank journals within and among disciplines, we developed a κ-resampled composite journal rank incorporating five popular citation indices: Impact Factor, Immediacy Index, Source-Normalized Impact Per Paper, SCImago Journal Rank and Google 5-year h-index; this approach provides an index of relative rank uncertainty. We applied the approach to six sample sets of scientific journals from Ecology (n = 100 journals), Medicine (n = 100), Multidisciplinary (n = 50); Ecology + Multidisciplinary (n = 25), Obstetrics & Gynaecology (n = 25) and Marine Biology & Fisheries (n = 25). We then cross-compared the κ-resampled ranking for the Ecology + Multidisciplinary journal set to the results of a survey of 188 publishing ecologists who were asked to rank the same journals, and found a 0.68–0.84 Spearman’s ρ correlation between the two rankings datasets. Our composite index approach therefore approximates relative journal reputation, at least for that discipline. Agglomerative and divisive clustering and multi-dimensional scaling techniques applied to the Ecology + Multidisciplinary journal set identified specific clusters of similarly ranked journals, with only Nature & Science separating out from the others. When comparing a selection of journals within or among disciplines, we recommend collecting multiple citation-based metrics for a sample of relevant and realistic journals to calculate the composite rankings and their relative uncertainty windows. PMID:26930052
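The agreement between the composite rank and the survey ranking is reported as Spearman's ρ. For two rankings without ties, ρ = 1 - 6Σd²/(n(n² - 1)), where d is the rank difference for each journal. The sketch below computes only that comparison; the κ-resampling procedure itself is not reproduced, and the journal labels are hypothetical.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman's rho for two tie-free rankings of the same items.

    rank_a, rank_b: dicts mapping item -> rank (1 = best).
    Uses rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)).
    """
    if rank_a.keys() != rank_b.keys():
        raise ValueError("rankings must cover the same items")
    n = len(rank_a)
    d2 = sum((rank_a[k] - rank_b[k]) ** 2 for k in rank_a)
    return 1 - 6 * d2 / (n * (n * n - 1))
```

Swapping two adjacent journals in a ranking of four gives ρ = 1 - 12/60 = 0.8.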
van Krugten, F C W; Goorden, M; van Balkom, A J L M; Spijker, J; Brouwer, W B F; Hakkaart-van Roijen, L
2018-04-01
Early identification of the subgroup of patients with major depressive disorder (MDD) in need of highly specialized care could enhance personalized intervention. This, in turn, may reduce the number of treatment steps needed to achieve and sustain an adequate treatment response. The aim of this study was to identify patient-related indicators that could facilitate the early identification of the subgroup of patients with MDD in need of highly specialized care. Initial patient indicators were derived from a systematic review. Subsequently, a structured conceptualization methodology known as concept mapping was employed to complement the initial list of indicators by clinical expertise and develop a consensus-based conceptual framework. Subject-matter experts were invited to participate in the subsequent steps (brainstorming, sorting, and rating) of the concept mapping process. A final concept map solution was generated using nonmetric multidimensional scaling and agglomerative hierarchical cluster analyses. In total, 67 subject-matter experts participated in the concept mapping process. The final concept map revealed the following 10 major clusters of indicators: 1-depression severity, 2-onset and (treatment) course, 3-comorbid personality disorder, 4-comorbid substance use disorder, 5-other psychiatric comorbidity, 6-somatic comorbidity, 7-maladaptive coping, 8-childhood trauma, 9-social factors, and 10-psychosocial dysfunction. The study findings highlight the need for a comprehensive assessment of patient indicators in determining the need for highly specialized care, and suggest that the treatment allocation of patients with MDD to highly specialized mental healthcare settings should be guided by the assessment of clinical and nonclinical patient factors. © 2018 Wiley Periodicals, Inc.
Abreu, Mauro Henrique Nogueira Guimarães; Sanglard-Oliveira, Carla Aparecida; Jaruche, Abdul Rahman Mustafá; Mambrini, Juliana Vaz de Melo; Werneck, Marcos Azeredo Furquim; Lucas, Simone Dutra
2013-12-23
To describe some sociodemographic and educational characteristics of oral health technicians (OHTs) in public primary health care teams in the state of Minas Gerais, Brazil. A cross-sectional descriptive study was performed based on a telephone survey of a representative sample of 231 individuals. A pre-tested instrument was used for data collection, including questions on gender, age in years, years of work as an OHT, years since graduation as an OHT, formal schooling, monthly individual income, and participation in continuing educational programmes. Descriptive statistics were computed, and clusters were formed by the agglomerative hierarchical technique based on the furthest neighbour, using age, years of work as an OHT, time since graduation as an OHT, formal schooling, monthly individual income, and participation in continuing educational programmes. Most interviewees (97.1%) were female. A monthly income of USD 300.00 to 600.00 was reported by 77.5% of the sample. Approximately 20% of the participants reported educational qualifications in excess of their role. The median time since graduation was six years, and half of the sample had worked for four years as an OHT. Most interviewees (67.6%) reported having participated in professional continuing educational programmes. Two different clusters were identified based on the sociodemographic and educational characteristics of the sample. The Brazilian OHTs in public primary health care teams in the state of Minas Gerais are mostly women with little time since graduation and little working experience, whose formal schooling is sufficient for professional practice.
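The furthest-neighbour technique is complete linkage: the distance between two clusters is the largest distance between any of their members. A minimal sketch, assuming each respondent is a numeric tuple whose variables have already been standardized (age, years of work and income have different units) and that merging stops at a chosen number of clusters k; the example points are invented.

```python
from itertools import combinations
from math import dist


def complete_linkage(points, k):
    """Agglomerative clustering with furthest-neighbour (complete) linkage.

    points: list of equal-length numeric tuples (standardized variables).
    Merges the closest pair of clusters until k clusters remain and
    returns them as sorted lists of point indices.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(c1, c2):
        # Furthest neighbour: the largest pairwise distance between members.
        return max(dist(points[a], points[b]) for a in c1 for b in c2)

    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] += clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)
```

Because complete linkage judges clusters by their most distant members, it tends to produce compact groups of similar diameter.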
Eggimann, Sven; Truffer, Bernhard; Maurer, Max
2015-11-01
The strong reliance of most utility services on centralised network infrastructures is becoming increasingly challenged by new technological advances in decentralised alternatives. However, not enough effort has been made to develop planning tools designed to address the implications of these new opportunities and to determine the optimal degree of centralisation of these infrastructures. We introduce a planning tool for sustainable network infrastructure planning (SNIP), a two-step techno-economic heuristic modelling approach based on shortest path-finding and hierarchical-agglomerative clustering algorithms to determine the optimal degree of centralisation in the field of wastewater management. This SNIP model optimises the distribution of wastewater treatment plants and the sewer network outlay relative to several cost and sewer-design parameters. Moreover, it allows us to construct alternative optimal wastewater system designs taking into account topography, economies of scale as well as the full size range of wastewater treatment plants. We quantify and confirm that the optimal degree of centralisation decreases with increasing terrain complexity and settlement dispersion while showing that the effect of the latter exceeds that of topography. Case study results for a Swiss community indicate that the calculated optimal degree of centralisation is substantially lower than the current level. Copyright © 2015 Elsevier Ltd. All rights reserved.
Anomaly detection driven active learning for identifying suspicious tracks and events in WAMI video
NASA Astrophysics Data System (ADS)
Miller, David J.; Natraj, Aditya; Hockenbury, Ryler; Dunn, Katherine; Sheffler, Michael; Sullivan, Kevin
2012-06-01
We describe a comprehensive system for learning to identify suspicious vehicle tracks from wide-area motion imagery (WAMI) video. First, since the road network for the scene of interest is assumed to be unknown, agglomerative hierarchical clustering is applied to all spatial vehicle measurements, resulting in spatial cells that largely capture individual road segments. Next, for each track, extreme value feature statistics are computed and aggregated at both the cell level (speed, acceleration, azimuth) and the track level (range, total distance, duration) to form summary (p-value based) anomaly statistics for each track. Here, to fairly evaluate tracks that travel across different numbers of spatial cells, a single (most extreme) statistic is chosen for each cell-level feature type over all cells traveled. Finally, a novel active learning paradigm, applied to a (logistic regression) track classifier, is invoked to learn to distinguish suspicious from merely anomalous tracks, starting from anomaly-ranked track prioritization, with ground-truth labeling by a human operator. This system has been applied to WAMI video data (ARGUS), with the tracks automatically extracted by a system developed in-house at Toyon Research Corporation. Our system gives promising preliminary results in ranking aerial vehicles, dismounts, and traffic violators highly as suspicious, and in learning which features are most indicative of suspicious tracks.
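The per-track aggregation keeps, for each cell-level feature, the single most extreme statistic over all cells a track traveled. The exact extreme-value statistics are not given in the abstract, so the sketch below substitutes a simplified stand-in: right-tail empirical p-values against the values seen from all tracks in the same cell, taking the smallest p-value over the track's cells. All names and data are hypothetical.

```python
def empirical_p_value(value, reference):
    """Right-tail empirical p-value of `value` against reference samples.

    Uses (1 + #{r >= value}) / (1 + len(reference)) so it is never exactly 0.
    """
    exceed = sum(1 for r in reference if r >= value)
    return (1 + exceed) / (1 + len(reference))


def track_feature_p_value(track_cells, track_values, cell_reference):
    """Most extreme (smallest) p-value of one feature over the cells a track visits.

    track_cells: ids of the spatial cells visited by the track.
    track_values: the track's feature value (e.g. speed) in each of those cells.
    cell_reference: dict of cell id -> feature values of all tracks seen in that cell.
    """
    return min(empirical_p_value(v, cell_reference[c])
               for c, v in zip(track_cells, track_values))
```

A track driving at 20 in a cell where other tracks drove 10 to 13 gets p = 1/5 = 0.2 there, which dominates its unremarkable speed in a second cell.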
On the problem of earthquake correlation in space and time over large distances
NASA Astrophysics Data System (ADS)
Georgoulas, G.; Konstantaras, A.; Maravelakis, E.; Katsifarakis, E.; Stylios, C. D.
2012-04-01
A quick examination of geographical maps with the epicenters of earthquakes marked on them reveals a strong tendency of these points to form compact clusters of irregular shapes and various sizes, often intersecting with other clusters. According to [Saleur et al. 1996], "earthquakes are correlated in space and time over large distances". This implies that seismic sequences are not formed randomly but follow a spatial pattern with consequent triggering of events. Seismic cluster formation is believed to be due to underlying geological natural hazards, which: a) act as the energy storage elements of the phenomenon, and b) tend to form a complex network of numerous interacting faults [Vallianatos and Tzanis, 1998]. Therefore it is imperative to "isolate" meaningful structures (clusters) in order to mine information regarding the underlying mechanism and, at a second stage, to test the causality effect implied by what is known as the Domino theory [Burgman, 2009]. Ongoing work by Konstantaras et al. 2011 and Katsifarakis et al. 2011 on clustering seismic sequences in the area of the Southern Hellenic Arc, and progressively throughout the Greek vicinity and the entire Mediterranean region, based on an explicit segmentation of the data by both their temporal and spatial stamps and following modelling assumptions proposed by Dobrovolsky et al. 1989 and Drakatos et al. 2001, has identified geologically validated seismic clusters. These results suggest that the time component should be included as a dimension during the clustering process, as seismic cluster formation is dynamic and the emerging clusters propagate in time. Another issue that has not yet been investigated explicitly is the role of the magnitude of each seismic event. In other words, the major seismic event should be treated differently from the pre- and post-seismic sequences.
Moreover, the sometimes irregular and elongated shapes that appear on geophysical maps mean that clustering algorithms such as the well-known k-means, which tend to form "well-shaped" clusters, may not suffice for the problem at hand, and other families of unsupervised pattern recognition methods might be a better choice. One such algorithm is DBSCAN, which is based on the notion of density. In the proposed version, density is estimated not solely from the number of seismic events occurring in a specific spatio-temporal area but also from the size of each seismic event. A second method uses a modified proximity measure that also accounts for the size of the earthquake, combined with traditional clustering schemes such as k-means and agglomerative clustering: k-means is seeded with a fairly large k, and its results are fed to the hierarchical algorithm, which both reduces memory requirements and allows for irregular cluster shapes. Preliminary results of seismic cluster formation using these algorithms appear promising, as they agree with geophysical observations of distinct seismic regions, such as the neighbouring regions in the Ionian Sea and the southern Hellenic seismic arc, as well as with the location and orientation of the mapped network of underlying natural hazards beneath each cluster's vicinity.
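The magnitude-aware DBSCAN variant is described only in outline. A minimal sketch, assuming each event carries a non-negative weight (for example one derived from its magnitude; the weighting itself is hypothetical) and that a point is a core point when the summed weight of its eps-neighbourhood reaches `min_weight`; coordinates are assumed to be pre-scaled so space and time are comparable.

```python
from math import dist


def weighted_dbscan(points, weights, eps, min_weight):
    """DBSCAN variant whose density is the summed weight of eps-neighbours.

    points: coordinate tuples (e.g. scaled longitude, latitude, time).
    weights: one non-negative weight per point (e.g. derived from magnitude).
    A point is a core point if the total weight of its eps-neighbourhood
    (itself included) reaches min_weight. Returns one label per point;
    -1 marks noise.
    """
    n = len(points)
    neighbours = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                  for i in range(n)]
    is_core = [sum(weights[j] for j in nb) >= min_weight for nb in neighbours]
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue
        # Grow a new cluster from an unlabelled core point.
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            for q in neighbours[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    if is_core[q]:
                        stack.append(q)  # only core points expand the cluster
        cluster += 1
    return labels
```

With unit weights, an isolated event is noise; giving it a large weight lets a single strong event seed its own cluster, which is one way the size of an event could enter the density estimate. The neighbour search is O(n²), so it only suits small catalogues.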
Salinity-driven decadal changes in phytoplankton community in the NW Arabian Gulf of Kuwait.
Al-Said, Turki; Al-Ghunaim, Aws; Subba Rao, D V; Al-Yamani, Faiza; Al-Rifaie, Kholood; Al-Baz, Ali
2017-06-01
Evaluation of hydrological data obtained between 2000 and 2013 from a time series station in Kuwait Bay (station K6) and an offshore southern location (station 18) off Kuwait showed a drastic increase in salinity of 6 units. We tested the hypothesis that increased salinity impacted phytoplankton community characteristics in these semiarid waters. The Arabian Gulf receives seasonal freshwater discharge in the north via the Shatt Al-Arab estuary, with a peak during March-July. A north-to-south gradient in the proportion of freshwater exists between station A, in the vicinity of the Shatt Al-Arab estuary, and station 18 in the southern offshore area. At station A, the proportion of freshwater was highest (25.6-42.5%) in 1997 but decreased to 0.8-4.6% by 2012-2013. The prevailing hyperhaline conditions off Kuwait are attributed to a decrease in river flow. Phytoplankton data showed a decrease in the number of constituent taxa over the last decade, from 353 to 159 in Kuwait Bay and from 164 to 156 in the offshore area. A shift in their biomass was caused by a decrease in diatom species, from 243 to 92 in the coastal waters and from 108 to 83 in the offshore areas, with a concomitant increase in smaller algae. Multivariate agglomerative hierarchical cluster analysis, non-metric multidimensional scaling, and one-way analysis of similarities (ANOSIM) on phytoplankton data at different taxonomic levels confirmed significant changes in community organization on a decadal scale. This evidence supports our hypothesis that salinity-related environmental changes have resulted in a coincidental decrease in species diversity and significant changes in the phytoplankton community between 2000-2002 and 2012-2013 off Kuwait. This in turn would affect pelagic trophodynamics, as evident from a drastic decrease in the catch landings of Tenualosa ilisha (Suboor), Carangoides sp. (Hamam), Otolithes ruber (Nowaiby), Parastromateus niger (Halwaya), and Epinephelus coioides (Hamoor) in Kuwait.
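The one-way analysis of similarities tests whether samples from different periods are more dissimilar than samples within a period. Its statistic is R = (r̄_B - r̄_W)/(M/2), where r̄_B and r̄_W are the mean ranks of between-group and within-group dissimilarities and M = n(n - 1)/2. The sketch below computes R from a precomputed dissimilarity matrix (for example Bray-Curtis, an assumption here); the permutation test for significance is not shown, and the example values are invented.

```python
from itertools import combinations


def anosim_r(dissim, groups):
    """ANOSIM R statistic from a dissimilarity matrix and group labels.

    dissim: square symmetric list of lists of dissimilarities.
    groups: group label per sample (e.g. "2000-2002" vs "2012-2013").
    R = (mean between-group rank - mean within-group rank) / (M / 2),
    with M = n(n-1)/2 pairs and tied dissimilarities given average ranks.
    """
    pairs = list(combinations(range(len(groups)), 2))
    values = sorted(dissim[i][j] for i, j in pairs)
    # Average 1-based rank for each distinct dissimilarity value.
    rank_of = {}
    k = 0
    while k < len(values):
        m = k
        while m + 1 < len(values) and values[m + 1] == values[k]:
            m += 1
        rank_of[values[k]] = (k + m) / 2 + 1
        k = m + 1
    within = [rank_of[dissim[i][j]] for i, j in pairs if groups[i] == groups[j]]
    between = [rank_of[dissim[i][j]] for i, j in pairs if groups[i] != groups[j]]
    big_m = len(pairs)
    return (sum(between) / len(between) - sum(within) / len(within)) / (big_m / 2)
```

R is close to 1 when every between-period dissimilarity exceeds every within-period one, and close to 0 when the grouping explains nothing.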
Zema, Demetrio Antonio; Bombino, Giuseppe; Denisi, Pietro; Lucas-Borja, Manuel Esteban; Zimbone, Santo Marcello
2018-06-12
In mountain streams, check dam installation can have negative impacts on soil, water and riparian vegetation. In spite of the ample literature on the qualitative effects of engineering works on channel hydrology, morphology, sedimentary effects and riparian vegetation characteristics, quantitative evaluations of the changes induced by check dams on headwater characteristics are rare. To fill this gap, this study evaluated the effects of check dams located in headwaters of Calabria (Southern Italy) on hydrological and geomorphological processes and the response of riparian vegetation to these actions. The analysis compared physical and vegetation indicators in transects identified around check dams (upstream and downstream) and far from their direct influence (control transects). Check dams were found to significantly influence unit discharge, surface and subsurface sediments (both upstream and downstream), channel shape and transverse distribution of riparian vegetation (upstream), as well as the cover and structure of riparian complexes (downstream). The effects of the structures on torrent longitudinal slope and vegetation biodiversity were less significant. Differences in bed profile slope were significant only between upstream and downstream transects. The results of the Agglomerative Hierarchical Cluster analysis confirmed the substantial similarity between upstream and control transects, highlighting that the construction of check dams, needed to mitigate hydro-geological risks, has not strongly altered torrent functioning and ecology relative to conditions before check dam construction.
Moreover, simple quantitative linkages between torrent hydraulics, geomorphology and vegetation characteristics exist in the analysed headwaters; these relationships between the physical adjustments of channels and most of the resulting characteristics of the riparian vegetation are specific to the transect locations with respect to the check dams. Conversely, the biodiversity of the riparian vegetation basically eludes any quantitative relation with the physical and other vegetation characteristics of the torrent transects. Copyright © 2018 Elsevier B.V. All rights reserved.
Jadejaroen, Janya; Hamada, Yuzuru; Kawamoto, Yoshi; Malaivijitnond, Suchinda
2015-01-01
Rhesus (Macaca mulatta) and long-tailed (M. fascicularis) macaques are the most commonly used non-human primate models for biomedical research, but it is difficult to identify these two species in the hybrid zone (15-20°N). In this work, we used morphological values obtained via photogrammetry to assess hybrids of rhesus and long-tailed macaques at Khao Khieow Open Zoo (KKZ; 13°21'N, 101°06'E), eastern Thailand. Long-tailed and rhesus macaques have species-specific tail lengths and contrasts of their yellowish pelages. The accuracy and precision of the relative tail length (%RTL) and the contrast of the yellow hue (Cb*) of the pelage, as obtained from photographs, were compared with the corresponding direct measurements (morphometrics). The photogrammetric and morphometric measurements of %RTL and Cb* were highly significantly correlated (r = 0.989 and 0.980, p < 0.001), and there were no significant differences between the two datasets (t test, p = 0.13 and 0.41; n = 42 and 17 for %RTL and Cb*, respectively). The reproducibilities of the %RTL and Cb* measurements (calculated in the photogrammetric case by taking photographs of the same macaques in two different environments) were significantly correlated between the datasets (r = 0.983 and 0.914, p < 0.001 and 0.005), and there were no significant differences between the datasets (t test, p = 0.539 and 0.344; n = 30 each for %RTL and Cb*, respectively). The %RTL and Cb* data were combined with data on the crown and cheek hair patterns and sex skin reddening of the macaques, and this combined data set was then analyzed by multiple correspondence analysis and agglomerative hierarchical cluster analysis, leading to the categorization of the rhesus macaques, long-tailed macaques, and hybrids at KKZ into five groups. Thus, photogrammetry can be utilized to identify macaque species or hybrids when species identification relies mainly on tail length and pelage color.
NASA Astrophysics Data System (ADS)
Ruske, Simon; Topping, David O.; Foot, Virginia E.; Kaye, Paul H.; Stanley, Warren R.; Crawford, Ian; Morse, Andrew P.; Gallagher, Martin W.
2017-03-01
Characterisation of bioaerosols has important implications within environment and public health sectors. Recent developments in ultraviolet light-induced fluorescence (UV-LIF) detectors such as the Wideband Integrated Bioaerosol Spectrometer (WIBS) and the newly introduced Multiparameter Bioaerosol Spectrometer (MBS) have allowed for the real-time collection of fluorescence, size and morphology measurements for the purpose of discriminating between bacteria, fungal spores and pollen. This new generation of instruments has enabled ever larger data sets to be compiled with the aim of studying more complex environments. In real-world data sets, particularly those from an urban environment, the population may be dominated by non-biological fluorescent interferents, bringing into question the accuracy of measurements of quantities such as concentrations. It is therefore imperative that we validate the performance of different algorithms which can be used for the task of classification. For unsupervised learning, we tested hierarchical agglomerative clustering with several different linkages. For supervised learning, 11 methods were tested, including decision trees, ensemble methods (random forests, gradient boosting and AdaBoost), two implementations of support vector machines (libsvm and liblinear), Gaussian methods (Gaussian naïve Bayesian, quadratic and linear discriminant analysis), the k-nearest neighbours algorithm and artificial neural networks. The methods were applied to two different data sets produced using the new MBS, which provides multichannel UV-LIF fluorescence signatures for single airborne biological particles. The first data set contained mixed PSLs and the second contained a variety of laboratory-generated aerosol. Clustering in general performs slightly worse than the supervised learning methods, correctly classifying, at best, only 67.6 and 91.1% for the two data sets, respectively.
For supervised learning, the gradient boosting algorithm was found to be the most effective, on average correctly classifying 82.8 and 98.27% of the testing data, respectively, across the two data sets. A possible alternative to gradient boosting is neural networks. We do, however, note that this method requires much more user input than the other methods, and we suggest that further research should be conducted using this method, especially using parallelised hardware such as the GPU, which would allow larger networks to be trained and could possibly yield better results. We also saw that some methods, such as clustering, failed to utilise the additional shape information provided by the instrument, whilst for others, such as the decision trees, ensemble methods and neural networks, improved performance could be attained with the inclusion of such information.
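The abstract reports the percentage of particles that clustering classified correctly but does not say how clusters were matched to particle classes. One common approach, assumed here rather than taken from the study, assigns each cluster the most frequent true class among its members; the labels below are hypothetical.

```python
from collections import Counter


def majority_label_accuracy(cluster_ids, true_labels):
    """Fraction of particles correctly classified when each cluster is
    assigned the most common true label among its members."""
    members = {}
    for c, y in zip(cluster_ids, true_labels):
        members.setdefault(c, []).append(y)
    correct = sum(Counter(ys).most_common(1)[0][1] for ys in members.values())
    return correct / len(true_labels)
```

This score rises trivially as the number of clusters grows (one cluster per particle scores 100%), so it is only comparable across runs with the same number of clusters.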
Apparatus for entrained coal pyrolysis
Durai-Swamy, Kandaswamy
1982-11-16
This invention discloses a process and apparatus for pyrolyzing particulate coal by heating with a particulate solid heating media in a transport reactor. The invention tends to dampen fluctuations in the flow of heating media upstream of the pyrolysis zone, and by so doing forms a substantially continuous and substantially uniform annular column of heating media flowing downwardly along the inside diameter of the reactor. The invention is particularly useful for bituminous or agglomerative type coals.
Pyrolysis process and apparatus
Lee, Chang-Kuei
1983-01-01
This invention discloses a process and apparatus for pyrolyzing particulate coal by heating with a particulate solid heating media in a transport reactor. The invention tends to dampen fluctuations in the flow of heating media upstream of the pyrolysis zone, and by so doing forms a substantially continuous and substantially uniform annular column of heating media flowing downwardly along the inside diameter of the reactor. The invention is particularly useful for bituminous or agglomerative type coals.
NASA Astrophysics Data System (ADS)
Kang, Ziho
This dissertation is divided into four parts: 1) development of effective methods for comparing visual scanning paths (or scanpaths) for a dynamic task of multiple moving targets, 2) application of the methods to compare the scanpaths of experts and novices for a conflict detection task involving multiple aircraft on a radar screen, 3) a post-hoc analysis of other eye movement characteristics of experts and novices, and 4) determining whether the scanpaths of experts can be used to teach novices. In order to compare experts' and novices' scanpaths, two methods are developed. The first proposed method is matrix comparison using the Mantel test. The second proposed method is maximum transition-based agglomerative hierarchical clustering (MTAHC), in which comparisons of multi-level visual groupings are carried out. The matrix comparison method was useful for a small number of targets during the preliminary experiment, but turned out to be inapplicable to a realistic case in which tens of aircraft were presented on screen; MTAHC, however, was effective with a large number of aircraft on screen. The experiments with experts and novices on the aircraft conflict detection task showed that their scanpaths are different. The MTAHC results explicitly showed how experts visually grouped multiple aircraft based on similar altitudes, while novices tended to group them based on convergence. The MTAHC results also showed that novices paid considerable attention to converging aircraft groups even when they were safely separated by altitude; as a result, less attention was given to the actual conflicting pairs, leading to low correct conflict detection rates. Since the analysis revealed scanpath differences, experts' scanpaths were shown to novices to assess the effectiveness of this intervention. The scanpath treatment group showed indications that they changed their visual movements from trajectory-based to altitude-based movements.
Between the treatment and the non-treatment group, there were no significant differences in terms of number of correct detections; however, the treatment group made significantly fewer false alarms.
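The Mantel test behind the first method compares two distance (or similarity) matrices defined over the same set of objects. The sketch below is the standard permutation form, assuming symmetric matrices; the abstract does not describe how matrices were derived from scanpaths, so the inputs here are placeholders and the names `mantel`, `pearson` and `upper_triangle` are hypothetical.

```python
import random


def upper_triangle(m):
    """Off-diagonal upper-triangle entries of a square matrix, row by row."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (neither constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel(a, b, permutations=999, seed=0):
    """Permutation Mantel test for two symmetric n x n matrices.

    Returns (r, p): the Pearson correlation between the matrices' off-diagonal
    entries and a one-sided p-value obtained by permuting the rows and columns
    of `b` together (which keeps it a valid symmetric matrix).
    """
    rng = random.Random(seed)
    n = len(a)
    x = upper_triangle(a)
    r_obs = pearson(x, upper_triangle(b))
    at_least_as_large = 0
    for _ in range(permutations):
        perm = list(range(n))
        rng.shuffle(perm)
        b_perm = [[b[perm[i]][perm[j]] for j in range(n)] for i in range(n)]
        if pearson(x, upper_triangle(b_perm)) >= r_obs:
            at_least_as_large += 1
    # Count the observed arrangement itself, as is conventional.
    return r_obs, (at_least_as_large + 1) / (permutations + 1)
```

Permuting rows and columns jointly, rather than shuffling individual entries, preserves the dependence structure among entries that share an object, which is why the Mantel test is used instead of an ordinary correlation test.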
Diako, Charles; McMahon, Kenneth; Mattinson, Scott; Evans, Marc; Ross, Carolyn
2016-08-01
The objective of this study was to assess the influence of the interaction among alcohol, tannins, and mannoproteins on the aroma, flavor, taste, and mouthfeel characteristics of selected commercial Merlot wines. Merlot wines (n = 61) were characterized for wine chemistry parameters, including pH, titratable acidity, alcohol, glucose, fructose, tannin profile, total proteins, and mannoprotein content. Agglomerative clustering of these physicochemical characteristics revealed 6 groups of wines. Two wines were selected from each group (n = 12) and profiled by a trained sensory evaluation panel. One wine from each group was evaluated using the electronic tongue (e-tongue). Sensory evaluation results showed complex effects among tannins, alcohol, and mannoproteins on the perception of most aromas, flavors, tastes, and mouthfeel attributes (P < 0.05). The e-tongue showed distinct differences among the taste attributes of the 6 groups of wines as indicated by a high discrimination index (DI = 95). Strong correlations (r² > 0.930) were reported between the e-tongue and sensory perception of sweet, sour, bitter, burning, astringent, and metallic. This study showed that interactions among wine matrix components influence the resulting sensory perceptions. The strong correlation between the e-tongue and trained panel evaluations indicated the e-tongue can complement sensory evaluations to improve wine quality assessment. © 2016 Institute of Food Technologists®
Sampling Modification Effects in the Subgingival Microbiome Profile of Healthy Children.
Santigli, Elisabeth; Trajanoski, Slave; Eberhard, Katharina; Klug, Barbara
2016-01-01
Background: Oral microbiota are considered major players in the development of periodontal diseases. Thorough knowledge of intact subgingival microbiomes is required to elucidate microbial shifts from health to disease. Aims: This comparative study investigated the subgingival microbiome of healthy children, possible inter- and intra-individual effects of modified sampling, and basic comparability of subgingival microprints. Methods: In five 10-year-old children, biofilm was collected from the upper first premolars and first molars using sterilized, UV-treated paper-points inserted into the subgingival sulcus at eight sites. After supragingival cleaning using an electric toothbrush and water, sampling was performed, firstly, excluding (Mode A) and, secondly, including (Mode B) cleansing with sterile cotton pellets. DNA was extracted from the pooled samples, and primers targeting 16S rRNA hypervariable regions V5 and V6 were used for 454-pyrosequencing. Wilcoxon signed rank test and t-test were applied to compare sampling modes. Principal coordinate analysis (PCoA) and average agglomerative hierarchical clustering were calculated with unweighted UniFrac distance matrices. Sample grouping was tested with permutational MANOVA (Adonis). Results: Data filtering and quality control yielded 67,218 sequences with an average sequence length of 243 bp (SD 6.52; range 231–255). Actinobacteria (2.8–24.6%), Bacteroidetes (9.2–25.1%), Proteobacteria (4.9–50.6%), Firmicutes (16.5–57.4%), and Fusobacteria (2.2–17.1%) were the five major phyla found in all samples. Differences in microbial abundances between sampling modes were not evident. High sampling numbers are needed to achieve significance for rare bacterial phyla. Samples taken from one individual using different sampling modes were more similar to each other than to other individuals' samples. PCoA and hierarchical clustering showed a grouping of the paired samples. 
Permutational MANOVA did not reveal sample grouping by sampling modes (p = 0.914; R² = 0.09). Conclusion: A slight modification of sampling mode has minor effects corresponding to a natural variability in the microbiome profiles of healthy children. The inter-individual variability in subgingival microprints is greater than intra-individual differences. Statistical analyses of microbial populations should consider this baseline variability and move beyond mere quantification with input from visual analytics. Comparative results are difficult to summarize as methods for studying huge datasets are still evolving. Advanced approaches are needed for sample size calculations in clinical settings.
Geographic distribution of physicians in Portugal.
Isabel, Correia; Paula, Veiga
2010-08-01
The main goals of this paper are to (1) analyse the inequality in geographic distribution of physicians and its evolution, (2) estimate the determinants of physician density, and (3) assess the importance of competitive and agglomerative forces in location decisions. The analysis of the geographic distribution of physicians is based on the ratio of general practitioners (GPs) and specialists to 1,000 inhabitants. The inequality is measured using Gini indices, coefficients of variation, and physician-to-population ratios. The econometric models were estimated by ordinary least squares. The data used refer to 1996 and 2007. The impact of the growing number of physicians, and therefore potential increased competition, on geographic distribution during the period studied was small. Nonetheless, there is evidence of competitive forces acting on the dynamics of doctor localisation. Geographic disparities in physician density are still high, and appear to be due mainly to geographic income inequality.
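The inequality measures named above (Gini index and coefficient of variation) can be computed directly from regional physician-to-population ratios. This is a minimal sketch assuming each region counts equally; the paper may weight regions by population, which this code does not do, and the function names are illustrative.

```python
def gini(values):
    """Gini index of non-negative values: 0 = perfectly equal distribution.

    Mean-absolute-difference form: sum_i sum_j |x_i - x_j| / (2 n^2 mean).
    """
    n = len(values)
    mean = sum(values) / n
    total = sum(abs(a - b) for a in values for b in values)
    return total / (2 * n * n * mean)


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return variance ** 0.5 / mean
```

For example, `gini([0, 0, 0, 4])` returns 0.75, the maximum possible for four equally weighted regions, i.e. (n - 1)/n, when all physicians are concentrated in one region.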
A Simple Hierarchical Pooling Data Structure for Loop Closure
2016-10-16
ticated agglomerative schemes at a fraction of the effort. 1.1 Related work Loop closure is a key component in robotic mapping (SLAM) [37], autonomous...appearance-only slam-fab-map 2.0. In: Robotics : Science and Systems. vol. 5. Seattle, USA (2009) 7. Dong, J., Soatto, S.: Domain size pooling in local...detection with bags of binary words. In: Intelligent Robots and Systems (IROS), 2011 IEEE/RSJ Intl. Conf. on. pp. 51–58. IEEE (2011) 9. Geiger, A
Video shot boundary detection using region-growing-based watershed method
NASA Astrophysics Data System (ADS)
Wang, Jinsong; Patel, Nilesh; Grosky, William
2004-10-01
In this paper, a novel shot boundary detection approach is presented, based on a popular region-growing segmentation method: watershed segmentation. In image processing, gray-scale pictures can be considered as topographic reliefs, in which the numerical value of each pixel of a given image represents the elevation at that point. The watershed method segments images by filling basins with water starting at local minima; at points where water coming from different basins meets, dams are built. In our method, each frame in the video sequence is first transformed from the feature space into the topographic space based on a density function. Low-level features are extracted from frame to frame, and each frame is then treated as a point in the feature space. The density of each point is defined as the sum of the influence functions of all neighboring data points. The height function originally used in watershed segmentation is then replaced by the inverted density at each point, so that all the highest density values are transformed into local minima. Subsequently, watershed segmentation is performed in the topographic space. The intuition behind our method is that frames within a shot are highly agglomerative (tightly clustered) in the feature space and are more likely to be merged together, whereas frames between shots, which represent the shot changes, are not; these frames therefore have lower density values and, provided the markers are extracted and the stopping criterion is chosen carefully, are less likely to be clustered.
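The density-to-height transformation described above can be sketched as follows. The Gaussian influence function, the bandwidth `sigma`, and treating every other frame as a neighbour are assumptions made for illustration; the function names are hypothetical, and the marker extraction and watershed flooding themselves are not implemented here.

```python
import math


def frame_densities(frames, sigma=1.0):
    """Density of each frame in feature space: the sum of Gaussian influence
    functions exp(-d^2 / (2 sigma^2)) contributed by every other frame."""
    densities = []
    for i, f in enumerate(frames):
        total = 0.0
        for j, g in enumerate(frames):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(f, g))
                total += math.exp(-d2 / (2 * sigma ** 2))
        densities.append(total)
    return densities


def heights(densities):
    """Invert densities so the densest frames become the lowest points
    (local minima) of the relief that watershed segmentation floods."""
    peak = max(densities)
    return [peak - d for d in densities]
```

Frames inside a shot sit in dense groups and end up at the bottom of basins, while an isolated transition frame between two groups gets the greatest height, which is where a dam (shot boundary) would form.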
Révész, Ágnes; Rokob, Tibor András; Jeanne Dit Fouque, Dany; Turiák, Lilla; Memboeuf, Antony; Vékey, Károly; Drahos, László
2018-05-04
Collision energy is a key parameter determining the information content of beam-type collision induced dissociation tandem mass spectrometry (MS/MS) spectra, and its optimal choice largely affects successful peptide and protein identification in MS-based proteomics. For an MS/MS spectrum, quality of peptide match based on sequence database search, often characterized in terms of a single score, is a complex function of spectrum characteristics, and its collision energy dependence has remained largely unexplored. We carried out electrospray ionization-quadrupole-time of flight (ESI-Q-TOF)-MS/MS measurements on 2807 peptides from tryptic digests of HeLa and E. coli at 21 different collision energies. Agglomerative clustering of the resulting Mascot score versus energy curves revealed that only a few of them display a single, well-defined maximum; rather, they feature either a broad plateau or two clear peaks. Nonlinear least-squares fitting of one or two Gaussian functions allowed the characteristic energies to be determined. We found that the double peaks and the plateaus in Mascot score can be associated with the different energy dependence of b- and y-type fragment ion intensities. We determined that the energies for optimum Mascot scores follow separate linear trends for the unimodal and bimodal cases with rather large residual variance even after differences in proton mobility are taken into account. This leaves room for experiment optimization and points to the possible influence of further factors beyond m/z.
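The paper fitted one or two Gaussian functions by nonlinear least squares. As a simpler stand-in for the single-peak case only, the sketch below uses the log-parabola trick: fit a quadratic to ln y by linear least squares and read off the Gaussian parameters. This is exact for noise-free Gaussian data and is often used to seed a nonlinear fit, but it is not the authors' procedure and it weights noisy points differently. The names `fit_gaussian` and `_solve3` are hypothetical.

```python
import math


def _solve3(m, v):
    """Solve the 3x3 system m x = v (Gaussian elimination, partial pivoting)."""
    a = [list(row) + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (a[r][3] - sum(a[r][c] * x[c] for c in range(r + 1, 3))) / a[r][r]
    return x


def fit_gaussian(xs, ys):
    """Return (amplitude, centre, width) of y = A exp(-(x - mu)^2 / (2 s^2)).

    ln y is a parabola c0 + c1 u + c2 u^2 in u = x - x0 (x0 = mean of xs, for
    numerical stability), so a linear least-squares quadratic fit gives
    mu = x0 - c1 / (2 c2), s = sqrt(-1 / (2 c2)), A = exp(c0 - c1^2 / (4 c2)).
    All ys must be > 0 and the fitted parabola must open downward (c2 < 0).
    """
    x0 = sum(xs) / len(xs)
    us = [x - x0 for x in xs]
    ls = [math.log(y) for y in ys]
    # Normal equations: M[i][j] = sum u^(i+j), rhs[i] = sum u^i * ln y.
    su = [sum(u ** k for u in us) for k in range(5)]
    rhs = [sum(u ** k * l for u, l in zip(us, ls)) for k in range(3)]
    m = [[su[i + j] for j in range(3)] for i in range(3)]
    c0, c1, c2 = _solve3(m, rhs)
    if c2 >= 0:
        raise ValueError("log-data is not concave; no single Gaussian peak")
    mu = x0 - c1 / (2 * c2)
    s = math.sqrt(-1 / (2 * c2))
    amp = math.exp(c0 - c1 ** 2 / (4 * c2))
    return amp, mu, s
```

Fitting two overlapping Gaussians, as needed for the bimodal score curves, has no such linearisation and requires a genuine nonlinear least-squares solver.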
An efficient repeating signal detector to investigate earthquake swarms
NASA Astrophysics Data System (ADS)
Skoumal, Robert J.; Brudzinski, Michael R.; Currie, Brian S.
2016-08-01
Repetitive earthquake swarms have been recognized as key signatures in fluid injection induced seismicity, precursors to volcanic eruptions, and slow slip events preceding megathrust earthquakes. We investigate earthquake swarms by developing a Repeating Signal Detector (RSD), a computationally efficient algorithm utilizing agglomerative clustering to identify similar waveforms buried in years of seismic recordings using a single seismometer. Instead of relying on existing earthquake catalogs of larger earthquakes, RSD identifies characteristic repetitive waveforms by rapidly identifying signals of interest above a low signal-to-noise ratio and then grouping based on spectral and time domain characteristics, resulting in dramatically shorter processing time than more exhaustive autocorrelation approaches. We investigate seismicity in four regions using RSD: (1) volcanic seismicity at Mammoth Mountain, California, (2) subduction-related seismicity in Oaxaca, Mexico, (3) induced seismicity in Central Alberta, Canada, and (4) induced seismicity in Harrison County, Ohio. In each case, RSD detects a similar or larger number of earthquakes than existing catalogs created using more time intensive methods. In Harrison County, RSD identifies 18 seismic sequences that correlate temporally and spatially to separate hydraulic fracturing operations, 15 of which were previously unreported. RSD utilizes a single seismometer for earthquake detection which enables seismicity to be quickly identified in poorly instrumented regions at the expense of relying on another method to locate the new detections. Due to the smaller computation overhead and success at distances up to ~50 km, RSD is well suited for real-time detection of low-magnitude earthquake swarms with permanent regional networks.
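RSD groups detections by spectral and time-domain characteristics, and the abstract does not give its exact features. One common time-domain similarity for waveforms is the maximum normalised cross-correlation over small time lags, sketched below under that assumption; a value of 1 - cc could then serve as a distance for agglomerative clustering. The function name and lag window are illustrative, not part of RSD.

```python
def max_normalized_xcorr(a, b, max_lag):
    """Maximum zero-normalised cross-correlation of two equal-length waveforms
    over integer lags in [-max_lag, max_lag] (max_lag < len(a)).

    1.0 means the overlapping segments have identical shape up to scale and
    offset; values near 0 mean no linear similarity at any tested lag.
    """
    n = len(a)
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        # Overlapping segments of a and b when b is shifted by `lag` samples.
        if lag >= 0:
            x, y = a[lag:], b[:n - lag]
        else:
            x, y = a[:n + lag], b[-lag:]
        mx, my = sum(x) / len(x), sum(y) / len(y)
        num = sum((p - mx) * (q - my) for p, q in zip(x, y))
        den = (sum((p - mx) ** 2 for p in x) * sum((q - my) ** 2 for q in y)) ** 0.5
        if den > 0:  # skip flat segments, where correlation is undefined
            best = max(best, num / den)
    return best
```

Because the measure is insensitive to amplitude, repeating events of different magnitude from the same source can still be grouped together.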
Principles for scaling of distributed direct potable water reuse systems: a modeling study.
Guo, Tianjiao; Englehardt, James D
2015-05-15
Scaling of direct potable water reuse (DPR) systems involves tradeoffs of treatment facility economy-of-scale, versus cost and energy of conveyance including energy for upgradient distribution of treated water, and retention of wastewater thermal energy. In this study, a generalized model of the cost of DPR as a function of treatment plant scale, assuming futuristic, optimized conveyance networks, was constructed for purposes of developing design principles. Fractal landscapes representing flat, hilly, and mountainous topographies were simulated, with urban, suburban, and rural housing distributions placed by a modified preferential growth algorithm. Treatment plants were allocated by agglomerative hierarchical clustering and networked to buildings by minimum spanning tree. Simulations assume advanced oxidation-based DPR system design, with 20-year design life and capability to mineralize chemical oxygen demand below normal detection limits, allowing implementation in regions where disposal of concentrate containing hormones and antiscalants is not practical. Results indicate that total DPR capital and O&M costs in rural areas, where systems that return nutrients to the land may be more appropriate, are high. However, costs in urban/suburban areas are competitive with current water/wastewater service costs at scales of ca. one plant per 10,000 residences. This size is relatively small, and costs do not increase significantly until plant service areas fall below 100 to 1000 homes. Based on these results, distributed DPR systems are recommended for consideration for urban/suburban water and wastewater system capacity expansion projects. Copyright © 2015 Elsevier Ltd. All rights reserved.
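The minimum-spanning-tree step can be illustrated with Prim's algorithm on a complete graph. This sketch assumes straight-line (Euclidean) pipe lengths between hypothetical coordinates, whereas the study's conveyance networks accounted for simulated topography; node 0 could stand for a treatment plant and the remaining nodes for the buildings it serves. The function name is illustrative.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm (O(n^2)) on the complete graph whose edge weights are
    Euclidean distances between `points`. Returns (edges, total_length),
    each edge being (tree_node, new_node)."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known link from each node to the tree
    parent = [-1] * n
    best[0] = 0.0           # grow the tree from node 0
    edges, total = [], 0.0
    for _ in range(n):
        # Add the outside node that is cheapest to connect.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
            total += best[u]
        # Relax the links from the new tree node to every outside node.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges, total
```

The dense O(n^2) form suits complete graphs such as this one, where every building could in principle connect to every other; sparse road or pipe graphs would favour a heap-based variant.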
In Vitro Toxicity Evaluation of Nanomaterials: Importance of Materials Characterization
2011-03-28
Energy • Automotive • Catalysis • Textiles • Medical • Food • Water Treatment • Coatings DoD Applications • Biosensors • Antimicrobial Agents...need to be man_ metal and metal oxide nanomaterials tend to agglomerate in solution. Moreover, other variables, such as the addition of...Structure in TiO2 Nanotoxicity Study Design The bioeffects of TiO2 were studied in mouse keratinocytes using the following Size Dependent Study with
Observations of fluorescent and biological aerosol at a high-altitude site in Central France
NASA Astrophysics Data System (ADS)
Gabey, A. M.; Vaitilingom, M.; Freney, E.; Boulon, J.; Sellegri, K.; Gallagher, M. W.; Crawford, I. P.; Robinson, N. H.; Stanley, W. R.; Kaye, P. H.
2013-01-01
Total bacteria, fungal spore and yeast counts were compared with UV Light-Induced Fluorescence (UV-LIF) measurements of ambient aerosol at the summit of the Puy de Dôme (pdD) mountain in Central France (1465 m a.s.l.), which represents a background elevated site. Bacteria, fungal spores and yeast were enumerated by epifluorescence microscopy (EFM) and found to number 2.2 to 23 L⁻¹ and 0.8 to 2 L⁻¹, respectively. Bacteria counts on two successive nights were an order of magnitude larger than in the intervening day. A Wide Issue Bioaerosol Spectrometer, version 3 (WIBS-3) was used to perform UV-LIF measurements on ambient aerosol sized 0.8 to 20 μm. Mean total number concentration was 270 L⁻¹ (σ = 66 L⁻¹), found predominantly in a size mode at 2 μm for most of the campaign. Total concentration (fluorescent + non-fluorescent aerosol) peaked at 500 L⁻¹ with a size mode at 1 μm because of a change in air mass origin lasting around 48 h. The WIBS-3 features two excitation and fluorescence detection wavelengths corresponding to different biological molecules. The mean fluorescent particle concentration after short-wave (280 nm; Tryptophan) excitation was 12 L⁻¹ (σ = 6 L⁻¹), and did not vary much through the campaign. In contrast, the mean concentration of particles fluorescent after long-wave (370 nm; NADH) excitation was 95 L⁻¹ (σ = 25 L⁻¹), and a nightly rise and subsequent fall of up to 100 L⁻¹ formed a strong diurnal cycle in the latter. The fluorescent populations exhibited size modes at 3 μm and 2 to 3 μm, respectively. A hierarchical agglomerative cluster analysis algorithm was applied to the data and used to extract different particle factors. A cluster concentration time series representative of bacteria was identified. This was found to exhibit a diurnal cycle with a maximum peak appearing during the day. Analysis of organic mass spectra recorded using an Aerosol Mass Spectrometer (AMS; Aerodyne Inc.) 
suggests that aerosol reaching the site at night was more aged than that during the day, indicative of sampling the residual layer at night. Supplementary meteorological data and previous work also show that pdD lies in the residual layer/free troposphere at night, and this is thought to cause the observed diurnal cycles in organic-type and fluorescent aerosol particles. Based on the observed disparity between bacteria and fluorescent particle concentrations, fluorescent non-PBA is likely to be important in the WIBS-3 data and the surprisingly high fluorescent concentration in the residual layer/free troposphere raises questions about a ubiquitous background in continental air during the summer.
Abroms, Lorien; Bontemps-Jones, Jeuneviette; Bauer, Joseph E; Bade, Jeanine
2011-01-01
Background Most smokers attempt to quit on their own even though cessation aids can substantially increase their chances of success. Millions of smokers seek cessation advice on the Internet, so using it to promote cessation products and services is one strategy for increasing demand for treatments. Little is known, however, about what cessation aids these smokers would find most appealing or what predicts their preferences (eg, age, level of dependence, or timing of quit date). Objective The objective of our study was to gain insight into how Internet seekers of cessation information make judgments about their preferences for treatments, and to identify sociodemographic and other predictors of preferences. Methods An online survey assessing interest in 9 evidence-based cessation products and services was voluntarily completed by 1196 smokers who visited the American Cancer Society’s Great American Smokeout (GASO) webpage. Cluster analysis was conducted on ratings of interest. Results In total, 48% (572/1196) of respondents were “quite a bit” or “very much” interested in nicotine replacement therapy (NRT), 45% (534/1196) in a website that provides customized quitting advice, and 37% (447/1196) in prescription medications. Only 11.5% (138/1196) indicated similar interest in quitlines, and 17% (208/1196) in receiving customized text messages. Hierarchical agglomerative cluster analysis revealed that interest in treatments formed 3 clusters: interpersonal–supportive methods (eg, telephone counseling, Web-based peer support, and in-person group programs), nonsocial–informational methods (eg, Internet programs, tailored emails, and informational booklets), and pharmacotherapy (NRT, bupropion, and varenicline). Only 5% (60/1196) of smokers were “quite a bit” or “very much” interested in interpersonal–supportive methods compared with 25% (298/1196) for nonsocial–informational methods and 33% (399/1196) for pharmacotherapy. 
Multivariate analyses and follow-up comparisons indicated that level of interest in pharmacotherapy (“quite a bit” or “very much” vs. “not at all”) varied as a function of education (n = 575, χ²(3) = 16.6, P = .001), age (n = 528, χ²(3) = 8.2, P = .04), smoking level (n = 514, χ²(3) = 9.5, P = .02), and when smokers were planning to quit (n = 607, χ²(4) = 34.0, P < .001). Surprisingly, greater age was associated with stronger interest in nonsocial–informational methods (n = 367, χ²(3) = 10.8, P = .01). Interest in interpersonal–supportive methods was greater if smokers had used a quitline before (n = 259, χ²(1) = 18.3, P < .001), or were planning to quit earlier rather than later (n = 148, χ²(1) = 4.9, P = .03). Conclusions Smokers accessing the Internet for information on quitting appear to differentiate cessation treatments by how much interpersonal interaction or support the treatment entails. Quitting date, smoking level, and sociodemographic variables can identify smokers with varying levels of interest in the 3 classes of cessation methods identified. These results can potentially be used to more effectively target and increase demand for these treatments among smokers searching the Internet for cessation information. PMID:21873150
SCOWLP classification: Structural comparison and analysis of protein binding regions
Teyra, Joan; Paszkowski-Rogacz, Maciej; Anders, Gerd; Pisabarro, M Teresa
2008-01-01
Background Detailed information about protein interactions is critical for our understanding of the principles governing protein recognition mechanisms. The structures of many proteins have been experimentally determined in complex with different ligands bound either in the same or different binding regions. Thus, the structural interactome requires the development of tools to classify protein binding regions. A proper classification may provide a general view of the regions that a protein uses to bind others and also facilitate a detailed comparative analysis of the interacting information for specific protein binding regions at atomic level. Such a classification could be useful for deciphering protein interaction networks, understanding protein function, and rational engineering and design. Description Protein binding regions (PBRs) might ideally be described as well-defined, separate regions that share no interacting residues with one another. However, PBRs are often irregular and discontinuous, and can share a wide range of interacting residues among them. The criteria used to define an individual binding region can often be arbitrary and may differ from other binding regions within a protein family. Therefore, the rationale behind protein interface classification should aim to fulfil the requirements of the analysis to be performed. We extract detailed interaction information of protein domains, peptides and interfacial solvent from the SCOWLP database and we classify the PBRs of each domain family. For this purpose, we define a similarity index based on the overlap of interacting residues mapped in pair-wise structural alignments. We perform our classification with agglomerative hierarchical clustering using the complete-linkage method. Our classification is calculated at different similarity cut-offs to allow flexibility in the analysis of PBRs, a feature that is especially useful for protein families with conflicting binding regions. 
The hierarchical classification of PBRs is implemented in the SCOWLP database and extends the SCOP classification with three additional family sub-levels: Binding Region, Interface and Contacting Domains. SCOWLP contains 9,334 binding regions distributed within 2,561 families. In 65% of the cases we observe families containing more than one binding region. In addition, 22% of the regions form complexes with more than one protein family. Conclusion The current SCOWLP classification and its web application represent a framework for the study of protein interfaces and comparative analysis of protein family binding regions. This comparison can be performed at atomic level and allows the user to study interactome conservation and variability. The new SCOWLP classification may be of great utility for reconstruction of protein complexes, understanding protein networks and ligand design. SCOWLP will be updated with every SCOP release. The web application is available at . PMID:18182098
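The abstract does not define the similarity index beyond saying it is based on the overlap of interacting residues, so the sketch below assumes a Jaccard index over sets of aligned residue positions. It then applies complete-linkage agglomeration that stops at a similarity cut-off, mirroring the idea of computing the classification at different cut-offs. All names are hypothetical.

```python
from itertools import combinations


def overlap_index(a, b):
    """Assumed similarity of two binding regions, each a set of aligned
    residue positions: the Jaccard index |A & B| / |A | B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def complete_linkage_groups(regions, cutoff):
    """Complete-linkage agglomeration on similarities, stopped at `cutoff`.

    Cluster similarity is the *minimum* pairwise similarity between members,
    so two groups merge only if every cross pair is at least `cutoff` similar.
    Returns sorted lists of region indices.
    """
    clusters = [[i] for i in range(len(regions))]

    def link(c1, c2):
        return min(overlap_index(regions[i], regions[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        pairs = combinations(range(len(clusters)), 2)
        (i, j), sim = max(
            ((pair, link(clusters[pair[0]], clusters[pair[1]])) for pair in pairs),
            key=lambda item: item[1],
        )
        if sim < cutoff:
            break  # no remaining pair is similar enough to merge
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]  # i < j, so index i stays valid
    return sorted(clusters)
```

Lowering the cut-off merges partially overlapping regions into one binding region, while a high cut-off keeps them apart; for the regions in the test below, a cut-off of 0.5 yields three groups and 0.3 yields two.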
Sampling Modification Effects in the Subgingival Microbiome Profile of Healthy Children
Santigli, Elisabeth; Trajanoski, Slave; Eberhard, Katharina; Klug, Barbara
2017-01-01
Background: Oral microbiota are considered major players in the development of periodontal diseases. Thorough knowledge of intact subgingival microbiomes is required to elucidate microbial shifts from health to disease. Aims: This comparative study investigated the subgingival microbiome of healthy children, possible inter- and intra-individual effects of modified sampling, and basic comparability of subgingival microprints. Methods: In five 10-year-old children, biofilm was collected from the upper first premolars and first molars using sterilized, UV-treated paper-points inserted into the subgingival sulcus at eight sites. After supragingival cleaning using an electric toothbrush and water, sampling was performed, firstly, excluding (Mode A) and, secondly, including (Mode B) cleansing with sterile cotton pellets. DNA was extracted from the pooled samples, and primers targeting 16S rRNA hypervariable regions V5 and V6 were used for 454-pyrosequencing. Wilcoxon signed rank test and t-test were applied to compare sampling modes. Principal coordinate analysis (PCoA) and average agglomerative hierarchical clustering were calculated with unweighted UniFrac distance matrices. Sample grouping was tested with permutational MANOVA (Adonis). Results: Data filtering and quality control yielded 67,218 sequences with an average sequence length of 243bp (SD 6.52; range 231–255). Actinobacteria (2.8–24.6%), Bacteroidetes (9.2–25.1%), Proteobacteria (4.9–50.6%), Firmicutes (16.5–57.4%), and Fusobacteria (2.2–17.1%) were the five major phyla found in all samples. Differences in microbial abundances between sampling modes were not evident. High sampling numbers are needed to achieve significance for rare bacterial phyla. Samples taken from one individual using different sampling modes were more similar to each other than to other individuals' samples. PCoA and hierarchical clustering showed a grouping of the paired samples. 
Permutational MANOVA did not reveal sample grouping by sampling modes (p = 0.914; R² = 0.09). Conclusion: A slight modification of sampling mode has minor effects corresponding to a natural variability in the microbiome profiles of healthy children. The inter-individual variability in subgingival microprints is greater than intra-individual differences. Statistical analyses of microbial populations should consider this baseline variability and move beyond mere quantification with input from visual analytics. Comparative results are difficult to summarize as methods for studying huge datasets are still evolving. Advanced approaches are needed for sample size calculations in clinical settings. PMID:28149291
Karaca, Sefayet; Erge, Sema; Cesuroglu, Tomris; Polimanti, Renato
2016-06-01
Cardiovascular and metabolic traits (CMT) are influenced by complex interactive processes including diet, lifestyle, and genetic predisposition. The present study investigated the interactions of these risk factors in relation to CMTs in the Turkish population. We applied bootstrap agglomerative hierarchical clustering and Bayesian network learning algorithms to identify the causative relationships among genes involved in different biological mechanisms (i.e., lipid metabolism, hormone metabolism, cellular detoxification, aging, and energy metabolism), lifestyle (i.e., physical activity, smoking behavior, and metropolitan residency), anthropometric traits (i.e., body mass index, body fat ratio, and waist-to-hip ratio), and dietary habits (i.e., daily intakes of macro- and micronutrients) in relation to CMTs (i.e., health conditions and blood parameters). We identified significant correlations between dietary habits (soybean and vitamin B12 intakes) and different cardiometabolic diseases that were confirmed by the Bayesian network-learning algorithm. Genetic factors also contributed to these disease risks through the pleiotropy of some genetic variants (i.e., F5 rs6025 and MTR rs180508). However, we also observed that certain genetic associations are indirect since they are due to the causative relationships among the CMTs (e.g., APOC3 rs5128 is associated with low-density lipoprotein cholesterol and, by extension, total cholesterol). Our study applied a novel approach to integrate various sources of information and dissect the complex interactive processes related to CMTs. Our data indicated that complex causative networks are present: causative relationships exist among CMTs and are affected by genetic factors (with pleiotropic and non-pleiotropic effects) and dietary habits. Copyright © 2016 Elsevier Inc. All rights reserved.
The Role of Anger in Psychosocial Subgrouping for Patients with Low Back Pain
Nisenzon, Anne N.; George, Steven Z.; Beneciuk, Jason M.; Wandner, Laura D.; Torres, Calia; Robinson, Michael E.
2014-01-01
Low back pain (LBP) is a common and costly condition that often becomes chronic if not properly addressed. Recent research has shown that psychosocial symptoms can complicate LBP, necessitating more comprehensive screening measures. The present study investigated the role of psychosocial factors, including anger regulation, in pain and disability using a screening measure designed for LBP treated with physical therapy. One-hundred and three LBP patients initiating physical therapy completed an established screening measure to assess risk for developing chronic pain, as well as psychosocial measures assessing anger, depression, anxiety, fear-avoidance, and pain-catastrophizing before and after four weeks of treatment. Dependent variables were pain intensity, physical impairment, and patient-reported disability. Risk subgrouping based on anger and other psychosocial measures was examined using established screening methods and through employing an empirical statistical approach. Analyses revealed that risk subgroups differed according to corresponding levels of negative affect, as opposed to anger alone. General psychosocial distress also predicted disability post-treatment, but, interestingly, did not have a strong relationship to pain. Subsequent hierarchical agglomerative clustering procedures divided patients into overall High and Low Distress groups, with follow-up analyses revealing that the High Distress group had higher baseline measures of pain, disability, and impairment. Findings suggest that anger may be part of generalized negative affect rather than a unique predictor when assessing risk for pain and disability in LBP treatment. Continued research in the area of screening for psychosocial prognostic indicators in LBP may ultimately guide treatment protocols in physical therapy for more comprehensive patient care. PMID:24281272
Software framework for automatic learning of telescope operation
NASA Astrophysics Data System (ADS)
Rodríguez, Jose A.; Molgó, Jordi; Guerra, Dailos
2016-07-01
The "Gran Telescopio de Canarias" (GTC) is an optical-infrared 10-meter segmented-mirror telescope at the ORM observatory in the Canary Islands (Spain). The GTC Control System (GCS) is a distributed, object- and component-oriented system based on RT-CORBA, and it is responsible for the operation of the telescope, including its instrumentation. The GCS is now mature and fully operational. On the one hand, telescope users such as PIs implement the sequences of observing modes of future scientific instruments that will be installed in the telescope, and operators, in turn, design their own sequences for maintenance. On the other hand, engineers develop new components that provide new functionality required by the system. This considerable effort could be minimized to reduce costs, especially considering that software maintenance is the most expensive phase of the software life cycle. Could we design a system that allows the progressive assimilation of the telescope's operation and maintenance sequences through automatic self-programming, so that it can evolve from a component-oriented organization to a service-oriented organization? One possible way to achieve this is to use mechanisms of learning and knowledge consolidation to minimize the effort needed to transform the specifications of the different telescope users into operational deployments. This article proposes a framework for solving this problem based on a combination of the following tools: data mining, self-adaptive software, code generation, metrics-based refactoring, hierarchical agglomerative clustering, and service-oriented architectures.
Luyssaert, Sebastiaan; Sulkava, Mika; Raitio, Hannu; Hollmén, Jaakko
2004-02-01
This paper introduces the use of nutrition profiles as a first step in the development of a concept that is suitable for evaluating forest nutrition on the basis of large-scale foliar surveys. The nutrition profile of a tree or stand was defined as its nutrient status, accounting for all element concentrations, contents, and interactions between two or more elements. A nutrition profile therefore overcomes the shortcomings associated with the commonly used concepts for evaluating forest nutrition. Nutrition profiles can be calculated by means of a neural network, i.e. a self-organizing map, and an agglomerative clustering algorithm with pruning. As an example, nutrition profiles were calculated to describe the temporal variation in the mineral composition of Scots pine and Norway spruce needles in Finland between 1987 and 2000. The temporal trends in the frequency distribution of the nutrition profiles of Scots pine indicated that, between 1987 and 2000, N, S, P, K, Ca, Mg and Al decreased, whereas the needle mass (NM) increased or remained unchanged. As there were no temporal trends in the frequency distribution of the nutrition profiles of Norway spruce, the mineral composition of Norway spruce needles consequently did not change. Interpretation of the (lack of) temporal trends was outside the scope of this example. However, nutrition profiles will prove to be a new and better concept for evaluating the mineral composition in large-scale surveys only when a biological interpretation of the nutrition profiles can be provided.
Yokoyama, Eiji; Uchimura, Masako
2007-11-01
Ninety-five enterohemorrhagic Escherichia coli serovar O157 strains, including 30 strains isolated from 13 intrafamily outbreaks and 14 strains isolated from 3 mass outbreaks, were studied by pulsed-field gel electrophoresis (PFGE) and variable number of tandem repeats (VNTR) typing, and the resulting data were subjected to cluster analysis. Cluster analysis of the VNTR typing data revealed that 57 (60.0%) of 95 strains, including all epidemiologically linked strains, formed clusters with at least 95% similarity. Cluster analysis of the PFGE patterns revealed that 67 (70.5%) of 95 strains, including all but 1 of the epidemiologically linked strains, formed clusters with 90% similarity. The number of epidemiologically unlinked strains forming clusters was significantly smaller with VNTR cluster analysis than with PFGE cluster analysis. The congruence value between PFGE and VNTR cluster analysis was low and did not show an obvious correlation. In two-step cluster analysis, the number of epidemiologically unlinked strains clustered by PFGE that were separated by subsequent VNTR cluster analysis was significantly higher than the number clustered by VNTR that were separated by subsequent PFGE cluster analysis. These results indicate that VNTR cluster analysis is more efficient than PFGE cluster analysis as an epidemiological tool to trace the transmission of enterohemorrhagic E. coli O157.
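Cutting a dendrogram at a fixed similarity (95% for VNTR, 90% for PFGE) can be sketched as follows. This assumes a precomputed strain-by-strain similarity matrix and UPGMA linkage, which is a common choice for typing dendrograms but is not named in the abstract; the five-strain matrix is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def clusters_at_similarity(sim, threshold):
    """UPGMA clustering of a strain-by-strain similarity matrix (values in 0..1).

    Strains share a cluster when they join the dendrogram at a similarity of at
    least `threshold` (e.g. 0.95 for the VNTR cut-off, 0.90 for PFGE).
    """
    dist = 1.0 - np.asarray(sim, dtype=float)
    np.fill_diagonal(dist, 0.0)
    # method="average" is UPGMA; linkage expects a condensed distance vector
    tree = linkage(squareform(dist, checks=False), method="average")
    # similarity >= threshold  <=>  merge distance <= 1 - threshold
    return fcluster(tree, t=1.0 - threshold, criterion="distance")

def n_clustered(labels):
    """Number of strains that belong to a cluster of two or more strains."""
    sizes = np.bincount(labels)
    return int((sizes[labels] >= 2).sum())

# five hypothetical strains: A-B and C-D are near-identical, E is unrelated
sim = np.array([[1.00, 0.97, 0.60, 0.60, 0.50],
                [0.97, 1.00, 0.60, 0.60, 0.50],
                [0.60, 0.60, 1.00, 0.96, 0.50],
                [0.60, 0.60, 0.96, 1.00, 0.50],
                [0.50, 0.50, 0.50, 0.50, 1.00]])
labels = clusters_at_similarity(sim, 0.95)
print(labels, n_clustered(labels))
```

`n_clustered` mirrors the way the abstract reports results ("57 (60.0%) of 95 strains ... formed clusters"): strains left as singletons at the cut-off are not counted.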
Observations of fluorescent and biological aerosol at a high-altitude site in central France
NASA Astrophysics Data System (ADS)
Gabey, A. M.; Vaitilingom, M.; Freney, E.; Boulon, J.; Sellegri, K.; Gallagher, M. W.; Crawford, I. P.; Robinson, N. H.; Stanley, W. R.; Kaye, P. H.
2013-08-01
Total bacteria, fungal spore and yeast counts were compared with ultraviolet-light-induced fluorescence (UV-LIF) measurements of ambient aerosol at the summit of the Puy de Dôme (PdD) mountain in central France (1465 m a.s.l.), which represents a background elevated site. Bacteria, fungal spores and yeast were enumerated by epifluorescence microscopy (EFM) and found to number 2.2 to 23 L⁻¹ and 0.8 to 2 L⁻¹, respectively. Bacteria counts on two successive nights were an order of magnitude larger than in the intervening day. A Wide Issue Bioaerosol Spectrometer, version 3 (WIBS-3), was used to perform UV-LIF measurements on ambient aerosol sized 0.8 to 20 μm. Mean total number concentration was 270 L⁻¹ (σ = 66 L⁻¹), found predominantly in a size mode at 2 μm for most of the campaign. Total concentration (fluorescent + non-fluorescent aerosol) peaked at 500 L⁻¹, with a size mode at 1 μm, because of a change in air mass origin lasting around 48 h. The WIBS-3 features two excitation and fluorescence detection wavelengths corresponding to different biological molecules, although non-biological interferents also contribute. The mean fluorescent particle concentration after short-wave (280 nm; associated with tryptophan) excitation was 12 L⁻¹ (σ = 6 L⁻¹) and did not vary much throughout the campaign. In contrast, the mean concentration of particles fluorescent after long-wave (370 nm; associated with NADH) excitation was 95 L⁻¹ (σ = 25 L⁻¹), and a nightly rise and subsequent fall of up to 100 L⁻¹ formed a strong diurnal cycle in the latter. The two fluorescent populations exhibited size modes at 3 μm and 2 to 3 μm, respectively. A hierarchical agglomerative cluster analysis algorithm was applied to the data and used to extract different particle factors. A cluster concentration time series representative of bacteria was identified. This was found to exhibit a diurnal cycle with a maximum during the day.
Analysis of organic mass spectra recorded using an aerosol mass spectrometer (AMS; Aerodyne Inc.) suggests that aerosol reaching the site at night was more aged than that during the day, indicative of sampling the residual layer at night. Supplementary meteorological data and previous work also show that PdD lies in the residual layer/free troposphere at night, and this is thought to cause the observed diurnal cycles in organic-type and fluorescent aerosol particles. Based on the observed disparity between bacteria and fluorescent particle concentrations, fluorescent non-PBA (non-primary biological aerosol) particles are likely to be important in the WIBS-3 data, and the surprisingly high fluorescent concentration in the residual layer/free troposphere raises questions about a ubiquitous background in continental air during the summer.
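The step from "cluster analysis of single particles" to "a cluster concentration time series" can be sketched as below. This assumes standardized per-particle features, Ward linkage, hourly bins and a constant sampled air volume per hour; all of these, and the feature set, are illustrative assumptions rather than the WIBS-3 processing chain used in the study.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_time_series(features, times_h, n_clusters, volume_per_hour_l):
    """Cluster single-particle data and return hourly number concentrations per cluster.

    features: (n_particles, n_features), e.g. fluorescence channels and optical size
    times_h: (n_particles,) detection time in hours since the start of sampling
    volume_per_hour_l: air volume sampled per hour in litres (hypothetical constant)
    Returns (labels, conc), where conc[hour, cluster] is in particles per litre.
    """
    # standardize so each channel contributes comparably to the Euclidean distance
    z = (features - features.mean(axis=0)) / features.std(axis=0)
    # Ward linkage needs all pairwise distances (O(n^2) memory): subsample large data sets
    labels = fcluster(linkage(z, method="ward"), t=n_clusters, criterion="maxclust")
    hours = np.floor(times_h).astype(int)
    counts = np.zeros((hours.max() + 1, n_clusters))
    np.add.at(counts, (hours, labels - 1), 1.0)   # tally particles per (hour, cluster)
    return labels, counts / volume_per_hour_l

rng = np.random.default_rng(0)
features = np.vstack([rng.normal(0.0, 1.0, (200, 3)), rng.normal(6.0, 1.0, (100, 3))])
times_h = rng.uniform(0.0, 24.0, 300)
labels, conc = cluster_time_series(features, times_h, n_clusters=2, volume_per_hour_l=60.0)
```

A diurnal cycle in one cluster would then show up as a day/night pattern in the corresponding column of `conc`.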
2013-01-01
Background The objective of this study was to examine the potential environmental risk of tailings resulting from the processing of precious and base metal ores, stored in seven impoundments located in the Aries river basin, Romania. The tailings were characterized by mineralogical and elemental composition, contamination indices, acid rock drainage generation potential, and water leachability of hazardous/priority hazardous metals and ions. Multivariate statistical methods were used for data interpretation. Results Tailings were found to be highly contaminated with several hazardous/priority hazardous metals (As, Cu, Cd, Pb) and pose a potential contamination risk for soil, sediments, surface water and groundwater. Two of the seven studied impoundments do not satisfy the criteria required for inert wastes, show acid rock drainage potential, and can thus contaminate surface water and groundwater. Three impoundments were found to be highly contaminated with As, Pb and Cd, two with As, and the other two with Cu. The tailings impoundments were grouped based on the enrichment factor, geoaccumulation index, contamination factor and contamination degree of 7 hazardous/priority hazardous metals (As, Cd, Cr, Cu, Ni, Pb, Zn) considered typical for the studied tailings. Principal component analysis showed that 47% of the elemental variability was attributable to alkaline silicate rocks, 31% to acidic S-containing minerals, 12% to carbonate minerals and 5% to biogenic elements. The leachability of metals and ions was attributed 61% to silicates, 11% to acidic minerals and 6% to organic matter. A further 18% of the variability was attributed to the leachability of biogenic elements (Na, K, Cl⁻, NO₃⁻), with no potential environmental risk. Pattern recognition by agglomerative hierarchical clustering grouped the impoundments in agreement with their contamination degree and acid rock drainage generation potential.
Conclusions Tailings stored in the studied impoundments were found to be contaminated with some hazardous/priority hazardous metals, fluoride and sulphate, and thus present differing contamination risks for the environment. A long-term monitoring program for these tailings impoundments and the expansion of ecologization measures in the area are required. PMID:23311708
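The contamination indices used to group the impoundments have widely used textbook definitions: contamination factor CF = C/B, contamination degree = ΣCF, geoaccumulation index Igeo = log₂(C / 1.5B), and enrichment factor EF = (C/C_ref)_sample / (B/B_ref)_background, where B is a background concentration. The sketch below computes them and clusters impoundments on their Igeo profiles. The background values, reference element (Fe), linkage method and concentrations are all hypothetical; the study may have used different backgrounds or reference elements.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def contamination_indices(conc, bg, ref_conc, ref_bg):
    """Contamination indices for metals (columns) in samples (rows).

    conc: (n_samples, n_metals); bg: (n_metals,) background concentrations
    ref_conc: (n_samples,) and ref_bg: scalar, for a conservative reference element
    """
    cf = conc / bg                                   # contamination factor
    degree = cf.sum(axis=1)                          # contamination degree = sum of CF
    igeo = np.log2(conc / (1.5 * bg))                # geoaccumulation index
    ef = (conc / ref_conc[:, None]) / (bg / ref_bg)  # enrichment factor
    return cf, degree, igeo, ef

# hypothetical values (mg/kg) for As, Cu, Pb in four impoundments, Fe as reference
bg = np.array([10.0, 30.0, 20.0])
conc = np.array([[200.0, 60.0, 400.0],
                 [180.0, 50.0, 380.0],
                 [15.0, 900.0, 25.0],
                 [12.0, 800.0, 30.0]])
ref_conc = np.array([40000.0, 42000.0, 41000.0, 39000.0])
cf, degree, igeo, ef = contamination_indices(conc, bg, ref_conc, ref_bg=40000.0)
# group impoundments on their geoaccumulation profiles (average linkage, 2 groups)
labels = fcluster(linkage(igeo, method="average"), t=2, criterion="maxclust")
print(degree, labels)
```

In this toy example the As/Pb-dominated and Cu-dominated impoundments fall into separate groups, which is the kind of pattern the clustering in the abstract is used to reveal.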
Dynamics of brain activity underlying working memory for music in a naturalistic condition.
Burunat, Iballa; Alluri, Vinoo; Toiviainen, Petri; Numminen, Jussi; Brattico, Elvira
2014-08-01
We aimed to determine the functional neuroanatomy of working memory (WM) recognition of musical motifs during music listening by adopting a non-standard procedure. Western tonal music provides naturally occurring repetition and variation of motifs. These serve as WM triggers, allowing us to study the phenomenon of motif tracking within real music. Using a modern tango as the stimulus, a behavioural test helped to identify the stimulus motifs and build a time-course regressor of WM neural responses. This regressor was then correlated with the participants' (musicians') functional magnetic resonance imaging (fMRI) signal obtained during a continuous listening condition. To fine-tune the identification of WM processes in the brain, the variance accounted for by the sensory processing of a set of the stimulus' acoustic features was pruned from participants' neurovascular responses to music. Motivic repetitions activated prefrontal and motor cortical areas, basal ganglia, medial temporal lobe (MTL) structures, and cerebellum. The findings suggest that WM processing of motifs while listening to music emerges from the integration of neural activity distributed over cognitive, motor and limbic subsystems. The recruitment of the hippocampus is a novel finding in auditory WM. Effective connectivity and agglomerative hierarchical clustering analyses indicate that hippocampal connectivity is modulated by motif repetitions, showing strong connections with WM-relevant areas (dorsolateral prefrontal cortex - dlPFC, supplementary motor area - SMA, and cerebellum). This supports the role of the hippocampus in the encoding of musical motifs in WM and may reflect long-term memory (LTM) formation, enabled by the use of a realistic listening condition. Copyright © 2014 Elsevier Ltd. All rights reserved.
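Pruning acoustic-feature variance before correlating with the WM regressor amounts to regressing the acoustic time courses out of each voxel and correlating the residuals. The sketch below assumes ordinary least squares with an intercept, regressors already on the fMRI time grid (no haemodynamic convolution shown), and synthetic data; it is an illustration, not the authors' pipeline.

```python
import numpy as np

def wm_correlation(bold, wm_regressor, acoustic):
    """Correlate voxel time series with a WM regressor after removing acoustic variance.

    bold: (n_timepoints, n_voxels); wm_regressor: (n_timepoints,)
    acoustic: (n_timepoints, n_features) acoustic-feature time courses
    """
    X = np.column_stack([np.ones(len(acoustic)), acoustic])   # intercept + acoustic features
    beta, *_ = np.linalg.lstsq(X, bold, rcond=None)            # fit every voxel at once
    resid = bold - X @ beta                                     # acoustic variance pruned
    resid_c = resid - resid.mean(axis=0)
    reg_c = wm_regressor - wm_regressor.mean()
    # Pearson correlation of each voxel's residual with the WM regressor
    return resid_c.T @ reg_c / (np.linalg.norm(resid_c, axis=0) * np.linalg.norm(reg_c))

rng = np.random.default_rng(3)
n = 200
acoustic = rng.normal(size=(n, 3))
wm = rng.normal(size=n)
bold = np.column_stack([
    wm + acoustic @ np.array([1.0, -0.5, 0.8]) + 0.1 * rng.normal(size=n),   # WM-driven voxel
    acoustic @ np.array([0.7, 0.2, -1.0]) + 0.3 * rng.normal(size=n),       # purely sensory voxel
])
r = wm_correlation(bold, wm, acoustic)
print(r)
```

The purely sensory voxel loses almost all of its signal after pruning, so only the WM-driven voxel remains correlated with the regressor.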
2011-01-01
Background Identifying the functional importance of the millions of single nucleotide polymorphisms (SNPs) in the human genome is a difficult challenge. A reverse strategy, which identifies functionally important SNPs by virtue of the bimodal abundance of SNP-related mRNAs across the human population, would therefore be useful. Those mRNA transcripts that are expressed at two distinct abundances in proportion to SNP allele frequency may warrant further study. Matrix metalloproteinase 1 (MMP1) is important both in normal development and in numerous pathologies. Although much research has investigated the expression of MMP1 in many different cell types and conditions, the regulation of its expression is still not fully understood. Results In this study, we used a novel but straightforward method based on agglomerative hierarchical clustering to identify bimodally expressed transcripts in human umbilical vein endothelial cell (HUVEC) microarray data from 15 individuals. We found that MMP1 mRNA abundance was bimodally distributed in untreated HUVECs and showed a bimodal response to inflammatory mediator treatment. RT-PCR and MMP1 activity assays confirmed the bimodal regulation, and DNA sequencing of 69 individuals identified an MMP1 gene promoter polymorphism that segregated precisely with MMP1 bimodal expression. Chromatin immunoprecipitation (ChIP) experiments indicated that the transcription factors (TFs) ETS1, ETS2 and GATA3 bind to the MMP1 promoter in the region of this polymorphism and may contribute to the bimodal expression. Conclusions We describe a simple method to identify putative bimodally expressed RNAs from transcriptome data that is effective yet easy for non-statisticians to understand and use. This method identified bimodal endothelial cell expression of MMP1, which appears to be biologically significant, with implications for inflammatory disease. PMID:21244711
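One way a clustering-based bimodality screen can work is sketched below: cut an average-linkage dendrogram of the individuals into two groups and ask whether the groups are both reasonably large and well separated relative to their spread. The thresholds `min_size` and `min_separation` are hypothetical choices, not the criteria published with the method.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def is_bimodal(values, min_size=3, min_separation=4.0):
    """Flag a transcript as putatively bimodal across individuals.

    values: 1-D log-expression values, one per individual.
    min_size and min_separation are hypothetical thresholds, not the paper's.
    """
    x = np.asarray(values, dtype=float).reshape(-1, 1)
    # average-linkage dendrogram of the individuals, cut into two groups
    labels = fcluster(linkage(x, method="average"), t=2, criterion="maxclust")
    a, b = x[labels == 1, 0], x[labels == 2, 0]
    if min(len(a), len(b)) < min_size:   # a lone outlier is not a second mode
        return False
    # gap between group means relative to the pooled within-group SD
    pooled_var = ((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)) / (len(a) + len(b) - 2)
    return bool(abs(a.mean() - b.mean()) >= min_separation * np.sqrt(pooled_var))

bimodal = np.r_[np.linspace(4.8, 5.2, 8), np.linspace(8.8, 9.2, 7)]   # 15 individuals
unimodal = np.r_[np.linspace(6.8, 7.2, 14), 12.0]                     # one outlier
print(is_bimodal(bimodal), is_bimodal(unimodal))   # True False
```

The minimum group size guards against calling a single outlying individual a second expression mode.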
Ali, Zeshan; Mujeeb-Kazi, Abdul; Quraishi, Umar Masood; Malik, Riffat Naseem
2018-04-25
The current study is one of the first attempts to identify tolerant, moderately sensitive, and highly sensitive wheat genotypes on the basis of heavy metal accumulation, biochemical attributes, and human health risk assessments under urban wastewater (UW) irrigation. Mean levels of heavy metals (Fe, Co, Ni, Cu, Zn, Pb, Cd, Cr, Mn) and macronutrients (Na, K, Ca, Mg) increased in the roots, stems, and grains of the studied genotypes. With the exception of K (stem > root > grain), all metals accumulated at the highest concentrations in roots, followed by stems and grains. Principal component analysis (PCA) identified three groups of UW-irrigated genotypes, which were confirmed by hierarchical agglomerative cluster analysis (HACA). Wheat genotypes with the lowest metal accumulation were regarded as tolerant, whereas those with the maximum accumulation were considered highly sensitive. Tolerant genotypes showed lower hazard quotients for heavy metals (i.e., Co, Mn, Cd, Cu, Fe, Pb, and Cr) and lower hazard index (HI) values (adults, 2.04; children, 2.27) than moderately and highly sensitive genotypes. The higher health risks (HI) associated with moderately sensitive (adults, 2.26; children, 2.53) and highly sensitive (adults, 2.52; children, 2.82) genotypes reflected their greater uptake of heavy metals. The heatmap showed higher mean levels of chlorophyll, carotenoids, membrane stability index (MSI%), sugars, proteins, proline, superoxide dismutase (SOD), peroxidase (POD), and catalase (CAT) in tolerant genotypes than in the remaining genotypes. With the lowest metal accumulation and advanced biochemical mechanisms to cope with the adverse effects of heavy metals, tolerant genotypes are a better option for cultivation in areas receiving UW or similar wastewater.
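The PCA-then-HACA workflow and the tolerant/sensitive labelling can be sketched as follows. The sketch assumes standardized metal concentrations, PCA via SVD with two retained components, Ward linkage on the component scores, and ranking groups by mean standardized accumulation; the nine "genotypes" are synthetic and the exact settings used in the study are not given in the abstract.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

NAMES = ["tolerant", "moderately sensitive", "highly sensitive"]

def classify_genotypes(metal_conc):
    """PCA scores -> Ward clustering into 3 groups -> groups ranked by accumulation."""
    z = (metal_conc - metal_conc.mean(axis=0)) / metal_conc.std(axis=0, ddof=1)
    # PCA via SVD of the standardized matrix; keep scores on the first two components
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    scores = u[:, :2] * s[:2]
    labels = fcluster(linkage(scores, method="ward"), t=3, criterion="maxclust")
    # lowest mean standardized accumulation -> "tolerant", highest -> "highly sensitive"
    order = sorted(np.unique(labels), key=lambda k: z[labels == k].mean())
    rank = {k: NAMES[i] for i, k in enumerate(order)}
    return [rank[k] for k in labels]

rng = np.random.default_rng(1)
levels = np.repeat([1.0, 3.0, 5.0], 3)[:, None]           # 9 hypothetical genotypes
metal_conc = levels + rng.normal(0.0, 0.1, size=(9, 4))   # 4 metals, e.g. root concentrations
result = classify_genotypes(metal_conc)
print(result)
```

Clustering on the leading PCA scores rather than on all raw variables is one way to make the HACA "confirm" the PCA grouping; clustering the standardized variables directly is an equally common alternative.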
Onboard Robust Visual Tracking for UAVs Using a Reliable Global-Local Object Model
Fu, Changhong; Duan, Ran; Kircali, Dogan; Kayacan, Erdal
2016-01-01
In this paper, we present a novel onboard robust visual algorithm for long-term arbitrary 2D and 3D object tracking using a reliable global-local object model for unmanned aerial vehicle (UAV) applications, e.g., autonomously tracking and chasing a moving target. The first main component of this algorithm is a global matching and local tracking approach: the algorithm first finds feature correspondences using an improved binary descriptor for global feature matching and an iterative Lucas–Kanade optical flow algorithm for local feature tracking. The second main module is an efficient local geometric filter (LGF), which handles outlier feature correspondences based on a new forward-backward pairwise dissimilarity measure, thereby maintaining pairwise geometric consistency. In the proposed LGF module, hierarchical agglomerative clustering, i.e., bottom-up aggregation, is applied using an effective single-link method. The third proposed module is a heuristic local outlier factor (to the best of our knowledge, used for the first time to deal with outlier features in a visual tracking application), which further maximizes the representation of the target object; here, outlier feature detection is formulated as a binary classification problem on the output features of the LGF module. Extensive UAV flight experiments show that the proposed visual tracker achieves real-time frame rates of more than thirty-five frames per second on an i7 processor at 640 × 512 image resolution and compares favorably with the most popular state-of-the-art trackers in terms of robustness, efficiency and accuracy. PMID:27589769
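A single-link filter of the LGF kind can be sketched as: compute a pairwise dissimilarity between correspondences, build a single-link dendrogram, cut it at a threshold, and keep the largest cluster as inliers. The dissimilarity used here (change in the distance between two features from one frame to the next) is a simple stand-in, not the paper's forward-backward measure, and the threshold is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def geometric_inliers(prev_pts, curr_pts, threshold=2.0):
    """Keep the largest group of mutually consistent feature correspondences.

    prev_pts, curr_pts: (n, 2) matched keypoint positions in two frames.
    Pair dissimilarity is the change in the pair's separation,
    | ||p_i - p_j|| - ||q_i - q_j|| |, a stand-in for the paper's
    forward-backward measure; threshold (pixels) is hypothetical.
    """
    dp = np.linalg.norm(prev_pts[:, None, :] - prev_pts[None, :, :], axis=-1)
    dq = np.linalg.norm(curr_pts[:, None, :] - curr_pts[None, :, :], axis=-1)
    dissim = np.abs(dp - dq)   # zero for any rigid translation or rotation
    tree = linkage(squareform(dissim, checks=False), method="single")   # single link
    labels = fcluster(tree, t=threshold, criterion="distance")
    largest = np.bincount(labels).argmax()
    return labels == largest

rng = np.random.default_rng(3)
prev_in = rng.uniform(0, 10, size=(10, 2))
curr_in = prev_in + np.array([5.0, 3.0])                  # target translated by (5, 3)
prev_out = np.array([[5.0, 5.0], [2.0, 8.0]])
curr_out = prev_out + np.array([5.0, 3.0]) + np.array([[100.0, 0.0], [0.0, -100.0]])  # bad matches
mask = geometric_inliers(np.vstack([prev_in, prev_out]), np.vstack([curr_in, curr_out]))
print(mask)
```

Single link is cheap and keeps chains of mutually consistent features together, but the same chaining means a threshold that is too loose can pull outliers into the inlier cluster.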
Biogeographic classification of the Caspian Sea
NASA Astrophysics Data System (ADS)
Fendereski, F.; Vogt, M.; Payne, M. R.; Lachkar, Z.; Gruber, N.; Salmanmahiny, A.; Hosseini, S. A.
2014-11-01
Like other inland seas, the Caspian Sea (CS) has been influenced by climate change and anthropogenic disturbance during recent decades, yet the scientific understanding of this water body remains poor. In this study, an eco-geographical classification of the CS based on physical information derived from space and in situ data is developed and tested against a set of biological observations. We used a two-step classification procedure, consisting of (i) a data reduction with self-organizing maps (SOMs) and (ii) a synthesis of the most relevant features into a reduced number of marine ecoregions using the hierarchical agglomerative clustering (HAC) method. From an initial set of 12 potential physical variables, 6 independent variables were selected for the classification algorithm, i.e., sea surface temperature (SST), bathymetry, sea ice, seasonal variation of sea surface salinity (DSSS), total suspended matter (TSM) and its seasonal variation (DTSM). The classification results reveal a robust separation between the northern and the middle/southern basins as well as a separation of the shallow nearshore waters from those offshore. The observed patterns in ecoregions can be attributed to differences in climate and geochemical factors such as distance from rivers, water depth and currents. A comparison of the annual and monthly mean Chl a concentrations between the different ecoregions shows significant differences (one-way ANOVA, P < 0.05). In particular, we found differences in phytoplankton phenology, with differences in the date of bloom initiation, its duration and amplitude between ecoregions. A first qualitative evaluation of differences in community composition based on recorded presence-absence patterns of 25 different species of plankton, fish and benthic invertebrates also confirms the relevance of the ecoregions as proxies for habitats with common biological characteristics.
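The two-step SOM + HAC procedure can be sketched with a minimal online self-organizing map followed by Ward clustering of the SOM codebook vectors; each observation then inherits the ecoregion of its best-matching unit. The map size, learning schedule, Ward linkage and the synthetic six-variable data are assumptions for illustration, not the study's configuration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def train_som(data, rows=4, cols=4, n_iter=3000, seed=0):
    """Minimal online self-organizing map; returns a (rows*cols, n_features) codebook."""
    rng = np.random.default_rng(seed)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    codebook = data[rng.integers(0, len(data), rows * cols)].astype(float)
    sigma0, lr0 = max(rows, cols) / 2.0, 0.5
    for t in range(n_iter):
        frac = t / n_iter
        sigma = sigma0 * (1 - frac) + 0.5 * frac       # shrinking neighbourhood radius
        lr = lr0 * (1 - frac) + 0.01 * frac            # decaying learning rate
        x = data[rng.integers(len(data))]
        bmu = np.argmin(((codebook - x) ** 2).sum(axis=1))          # best-matching unit
        h = np.exp(-((grid - grid[bmu]) ** 2).sum(axis=1) / (2 * sigma ** 2))
        codebook += lr * h[:, None] * (x - codebook)                 # pull neighbours toward x
    return codebook

def som_hac_regions(data, n_regions, **som_kwargs):
    """Step (i): reduce data to SOM units; step (ii): HAC (Ward) merges units into regions."""
    z = (data - data.mean(axis=0)) / data.std(axis=0)
    codebook = train_som(z, **som_kwargs)
    unit_region = fcluster(linkage(codebook, method="ward"), t=n_regions, criterion="maxclust")
    # every observation (e.g. a pixel) inherits the region of its best-matching unit
    bmu = np.argmin(((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2), axis=1)
    return unit_region[bmu]

rng = np.random.default_rng(1)
# synthetic stand-ins for 6 physical variables (SST, depth, ice, DSSS, TSM, DTSM)
north = rng.normal(0.0, 1.0, size=(100, 6))
south = rng.normal(6.0, 1.0, size=(100, 6))
regions = som_hac_regions(np.vstack([north, south]), n_regions=2)
```

The SOM step keeps the hierarchical clustering small: HAC runs on a handful of codebook vectors instead of on every pixel, whose pairwise distance matrix would grow quadratically.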
Bio-geographic classification of the Caspian Sea
NASA Astrophysics Data System (ADS)
Fendereski, F.; Vogt, M.; Payne, M. R.; Lachkar, Z.; Gruber, N.; Salmanmahiny, A.; Hosseini, S. A.
2014-03-01
Like other inland seas, the Caspian Sea (CS) has been influenced by climate change and anthropogenic disturbance during recent decades, yet the scientific understanding of this water body remains poor. In this study, an eco-geographical classification of the CS based on physical information derived from space and in-situ data is developed and tested against a set of biological observations. We used a two-step classification procedure, consisting of (i) a data reduction with self-organizing maps (SOMs) and (ii) a synthesis of the most relevant features into a reduced number of marine ecoregions using the Hierarchical Agglomerative Clustering (HAC) method. From an initial set of 12 potential physical variables, 6 independent variables were selected for the classification algorithm, i.e., sea surface temperature (SST), bathymetry, sea ice, seasonal variation of sea surface salinity (DSSS), total suspended matter (TSM) and its seasonal variation (DTSM). The classification results reveal a robust separation between the northern and the middle/southern basins as well as a separation of the shallow near-shore waters from those off-shore. The observed patterns in ecoregions can be attributed to differences in climate and geochemical factors such as distance from rivers, water depth and currents. A comparison of the annual and monthly mean Chl a concentrations between the different ecoregions shows significant differences (Kruskal-Wallis rank test, P < 0.05). In particular, we found differences in phytoplankton phenology, with differences in the date of bloom initiation, its duration and amplitude between ecoregions. A first qualitative evaluation of differences in community composition based on recorded presence-absence patterns of 27 different species of plankton, fish and benthic invertebrates also confirms the relevance of the ecoregions as proxies for habitats with common biological characteristics.
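The two versions of this abstract report the Chl a comparison with different tests (one-way ANOVA versus the Kruskal-Wallis rank test). A minimal sketch of running both with SciPy on per-ecoregion samples is shown below; the lognormal "Chl a" values are synthetic and only illustrate that the rank-based test does not rely on normality, which matters for right-skewed data.

```python
import numpy as np
from scipy import stats

def compare_regions(chl_by_region, alpha=0.05):
    """Compare Chl a among ecoregions with a parametric and a rank-based test."""
    _, p_anova = stats.f_oneway(*chl_by_region)   # one-way ANOVA (assumes normal residuals)
    _, p_kw = stats.kruskal(*chl_by_region)       # Kruskal-Wallis H test on ranks
    return {"anova_p": p_anova, "kruskal_p": p_kw,
            "anova_significant": bool(p_anova < alpha),
            "kruskal_significant": bool(p_kw < alpha)}

rng = np.random.default_rng(2)
# hypothetical, right-skewed Chl a samples (mg m^-3) for three ecoregions
regions = [rng.lognormal(mean=m, sigma=0.5, size=30) for m in (0.0, 1.0, 2.0)]
result = compare_regions(regions)
print(result)
```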
Using Cluster Analysis to Examine Husband-Wife Decision Making
ERIC Educational Resources Information Center
Bonds-Raacke, Jennifer M.
2006-01-01
Cluster analysis has a rich history in many disciplines, and although it has been used in clinical psychology to identify types of disorders, its use in other areas of psychology has been less common. The purpose of the current experiments was to use cluster analysis to investigate husband-wife decision making. Cluster analysis was…
Environmental assessment of Al-Hammar Marsh, Southern Iraq.
Al-Gburi, Hind Fadhil Abdullah; Al-Tawash, Balsam Salim; Al-Lafta, Hadi Salim
2017-02-01
(a) To determine the spatial distributions and levels of major and minor elements, as well as heavy metals, in water, sediment, and biota (plants and fish) in Al-Hammar Marsh, southern Iraq, and ultimately to supply more comprehensive information for policy-makers to manage contaminant inputs into the marsh so that their concentrations do not reach toxic levels. (b) To characterize seasonal changes in the marsh surface water quality. (c) To address the potential environmental risk of these elements by comparison with historical levels and global quality guidelines (i.e., World Health Organization (WHO) standard limits). (d) To identify the sources of these elements (i.e., natural and/or anthropogenic) using combined multivariate statistical techniques such as Principal Component Analysis (PCA) and Agglomerative Hierarchical Cluster Analysis (AHCA), along with pollution analysis (i.e., enrichment factor analysis). Water, sediment, plant, and fish samples were collected from the marsh, analyzed for major and minor ions as well as heavy metals, and then compared with historical levels and global quality guidelines (WHO guidelines). Multivariate statistical techniques, such as PCA and AHCA, were then used to determine element sources. Water analyses revealed unacceptable values for almost all physicochemical and biological properties, according to WHO standard limits for drinking water. Almost all major ion and heavy metal concentrations in water showed a distinct decreasing trend at the marsh outlet station compared with other stations. In general, major and minor ions, as well as heavy metals, exhibited higher concentrations in winter than in summer. Sediment analyses using multivariate statistical techniques revealed that Mg, Fe, S, P, V, Zn, As, Se, Mo, Co, Ni, Cu, Sr, Br, Cd, Ca, N, Mn, Cr, and Pb were derived from anthropogenic sources, while Al, Si, Ti, K, and Zr were primarily derived from natural sources.
Enrichment factor analysis gave results consistent with the multivariate statistical findings. Analysis of heavy metals in plant samples revealed no pollution in plants in Al-Hammar Marsh. However, the concentrations of heavy metals in fish samples showed that all samples were contaminated by Pb, Mn, and Ni, while some samples were contaminated by Pb, Mn, and Ni. Decreasing Tigris and Euphrates discharges over the past decades, due to drought conditions and upstream damming, together with the increasing stress of wastewater effluents from anthropogenic activities, led to degradation of the water quality of the downstream Al-Hammar Marsh in terms of physical, chemical, and biological properties; these properties were found to consistently exceed historical levels and global quality objectives. However, the decreasing trend in element concentrations at the marsh outlet station compared with other stations indicates that the marsh plays an important role as a natural filtration and bioremediation system. Higher element concentrations in winter were due to runoff from the washing of the surrounding Sabkha during flooding by winter rainstorms. Finally, the high concentrations of heavy metals in fish samples can be attributed to bioaccumulation and biomagnification processes.
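Using AHCA to separate natural from anthropogenic elements usually means clustering the variables (elements) rather than the samples, so that elements whose concentrations co-vary end up together. The sketch below assumes a 1 − Pearson r distance and average linkage, both common choices that the abstract does not specify, applied to synthetic standardized signals driven by two hypothetical sources.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def cluster_elements(conc, names, n_groups=2):
    """Group elements (columns) whose concentrations co-vary across samples.

    The distance between two elements is 1 - Pearson r, so strongly correlated
    elements (plausibly sharing a source) merge first.
    """
    r = np.corrcoef(conc, rowvar=False)
    dist = np.clip(1.0 - r, 0.0, None)
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(tree, t=n_groups, criterion="maxclust")
    groups = {}
    for name, lab in zip(names, labels):
        groups.setdefault(int(lab), []).append(name)
    return list(groups.values())

rng = np.random.default_rng(4)
natural, anthropogenic = rng.normal(size=40), rng.normal(size=40)   # two synthetic source signals
names = ["Al", "Si", "Ti", "Pb", "Zn", "Cd"]
conc = np.column_stack([natural + 0.1 * rng.normal(size=40) for _ in range(3)]
                       + [anthropogenic + 0.1 * rng.normal(size=40) for _ in range(3)])
groups = cluster_elements(conc, names)
print(groups)   # [['Al', 'Si', 'Ti'], ['Pb', 'Zn', 'Cd']]
```

Assigning a group to "natural" or "anthropogenic" still requires external evidence, such as the enrichment factors and PCA loadings mentioned above.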
Aoki, Shuichiro; Murata, Hiroshi; Fujino, Yuri; Matsuura, Masato; Miki, Atsuya; Tanito, Masaki; Mizoue, Shiro; Mori, Kazuhiko; Suzuki, Katsuyoshi; Yamashita, Takehiro; Kashiwagi, Kenji; Hirasawa, Kazunori; Shoji, Nobuyuki; Asaoka, Ryo
2017-12-01
To investigate the usefulness of the Octopus (Haag-Streit) EyeSuite's cluster trend analysis in glaucoma. Ten visual fields (VFs) measured with the Humphrey Field Analyzer (Carl Zeiss Meditec) and spanning 7.7 years on average were obtained from 728 eyes of 475 primary open-angle glaucoma patients. Mean total deviation (mTD) trend analysis and EyeSuite's cluster trend analysis were performed on various series of VFs (from the 1st to 10th, VF1-10, to the 6th to 10th, VF6-10). The results of the cluster-based trend analysis, based on different lengths of VF series, were compared against mTD trend analysis. Cluster-based trend analysis and mTD trend analysis results were significantly associated in all clusters and with all lengths of VF series. Between 21.2% and 45.9% (depending on VF series length and location) of clusters were deemed to progress when the mTD trend analysis suggested no progression. On the other hand, 4.8% of eyes were observed to progress using the mTD trend analysis when cluster trend analysis suggested no progression in any two (or more) clusters. Whole-field trend analysis can miss local VF progression. Cluster trend analysis appears as robust as mTD trend analysis and is useful for assessing both sectorial and whole-field progression. Cluster-based trend analyses, in particular defining progression as two or more progressing clusters, may help clinicians detect glaucomatous progression in a more timely manner than whole-field trend analysis, without significantly compromising specificity. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
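The comparison of whole-field and cluster-based trend analysis can be sketched with ordinary least-squares regression: a series is called progressing when its slope over time is negative and significant, and an eye is flagged when at least two clusters progress. The p < 0.05 criterion, the use of per-cluster mean deviation and the synthetic series are assumptions; EyeSuite's internal computation is not described in the abstract. The same `progressing` function applied to an mTD series gives the whole-field analysis.

```python
import numpy as np
from scipy import stats

def progressing(years, values, alpha=0.05):
    """Ordinary least-squares trend; progression = significantly negative slope."""
    res = stats.linregress(years, values)
    return bool(res.slope < 0 and res.pvalue < alpha)

def cluster_progression(years, cluster_md, min_clusters=2, alpha=0.05):
    """Flag an eye when at least `min_clusters` VF clusters show a significant decline.

    cluster_md: (n_visits, n_clusters) mean deviation (dB) of each VF cluster.
    """
    n_prog = sum(progressing(years, cluster_md[:, c], alpha) for c in range(cluster_md.shape[1]))
    return n_prog >= min_clusters

years = np.linspace(0.0, 7.7, 10)             # ten VFs over 7.7 years
wobble = 0.3 * (-1.0) ** np.arange(10)        # deterministic test-retest noise (dB)
declining = np.tile((-1.5 * years + wobble)[:, None], (1, 2))  # two locally progressing clusters
stable = np.tile(wobble[:, None], (1, 8))                      # eight stable clusters
cluster_md = np.hstack([declining, stable])
print(cluster_progression(years, cluster_md))
```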
Stefurak, Tres; Calhoun, Georgia B
2007-01-01
The current study sought to explore subtypes of adolescents within a sample of female juvenile offenders. Using the Millon Adolescent Clinical Inventory with 101 female juvenile offenders, a two-step cluster analysis was performed, beginning with a Ward's method hierarchical cluster analysis followed by a K-Means iterative partitioning cluster analysis. The results suggest an optimal three-cluster solution, with cluster profiles leading to the following group labels: Externalizing Problems, Depressed/Interpersonally Ambivalent, and Anxious Prosocial. Analyses by age, race, offense typology and offense chronicity were conducted to further understand the nature of the clusters found. Only the effect for race was significant, with the Anxious Prosocial and Depressed/Interpersonally Ambivalent clusters containing disproportionately many African American girls. To establish external validity, clusters were compared across scales of the Behavioral Assessment System for Children - Self Report of Personality, and corroborating distinctions between clusters were found.
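The two-step procedure (Ward's method followed by K-Means) is often implemented by using the Ward cluster centroids as the starting centroids for K-Means. A minimal sketch with SciPy is shown below; the seeding strategy, z-scoring and the eight synthetic "scale" profiles are assumptions, since the abstract does not detail how the two steps were linked.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.cluster.vq import kmeans2

def ward_then_kmeans(profiles, k=3):
    """Two-step clustering: Ward's method supplies seeds, k-means refines the partition."""
    z = (profiles - profiles.mean(axis=0)) / profiles.std(axis=0, ddof=1)
    ward = fcluster(linkage(z, method="ward"), t=k, criterion="maxclust")
    # centroids of the Ward clusters become the initial k-means centroids
    seeds = np.array([z[ward == c].mean(axis=0) for c in np.unique(ward)])
    centroids, labels = kmeans2(z, seeds, minit="matrix")
    return labels, centroids

rng = np.random.default_rng(5)
centers = np.array([[2.0] * 8, [-2.0] * 8, [2.0, -2.0] * 4])   # hypothetical scale profiles
sizes = (34, 34, 33)                                            # 101 respondents
profiles = np.vstack([rng.normal(c, 0.5, size=(n, 8)) for c, n in zip(centers, sizes)])
labels, centroids = ward_then_kmeans(profiles)
```

Ward's method supplies a sensible number of clusters and starting points, while the K-Means pass lets individual cases move between clusters, which a purely hierarchical solution cannot do once merges are made.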
[Cluster analysis in biomedical research].
Akopov, A S; Moskovtsev, A A; Dolenko, S A; Savina, G D
2013-01-01
Cluster analysis is one of the most popular methods for the analysis of multi-parameter data. Cluster analysis reveals the internal structure of the data by grouping individual observations according to their degree of similarity. The review defines the basic concepts of cluster analysis and discusses the most popular clustering algorithms: k-means, hierarchical algorithms, and Kohonen network algorithms. Examples of the use of these algorithms in biomedical research are given.
Exploring Functional Connectivity in fMRI via Clustering.
Venkataraman, Archana; Van Dijk, Koene R A; Buckner, Randy L; Golland, Polina
2009-04-01
In this paper we investigate the use of data driven clustering methods for functional connectivity analysis in fMRI. In particular, we consider the K-Means and Spectral Clustering algorithms as alternatives to the commonly used Seed-Based Analysis. To enable clustering of the entire brain volume, we use the Nyström Method to approximate the necessary spectral decompositions. We apply K-Means, Spectral Clustering and Seed-Based Analysis to resting-state fMRI data collected from 45 healthy young adults. Without placing any a priori constraints, both clustering methods yield partitions that are associated with brain systems previously identified via Seed-Based Analysis. Our empirical results suggest that clustering provides a valuable tool for functional connectivity analysis.
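The Nyström method approximates the leading eigenvectors of a large n × n affinity matrix from an n × m block of affinities to m landmark points: the landmark eigenvectors U_m with eigenvalues Λ_m are extended to all points as C U_m Λ_m⁻¹. The sketch below applies this to a Gaussian affinity and then runs k-means on the row-normalized embedding. It is a simplified variant (it uses the raw affinity rather than a normalized graph Laplacian), and the landmark count, `gamma` and synthetic data are hypothetical.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def farthest_first(points, k):
    """Deterministic k-means seeds: start at row 0, then repeatedly add the farthest row."""
    chosen = [0]
    d = np.linalg.norm(points - points[0], axis=1)
    for _ in range(1, k):
        nxt = int(np.argmax(d))
        chosen.append(nxt)
        d = np.minimum(d, np.linalg.norm(points - points[nxt], axis=1))
    return points[chosen]

def nystrom_spectral(X, k, n_landmarks=30, gamma=0.1, seed=0):
    """Spectral-style clustering with eigenvectors approximated by the Nystrom method."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=n_landmarks, replace=False)
    d2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2)
    C = np.exp(-gamma * d2)                 # n x m Gaussian affinities to the landmarks
    W = C[idx]                              # m x m landmark-landmark block
    evals, evecs = np.linalg.eigh(W)        # ascending eigenvalues
    evals, evecs = evals[::-1][:k], evecs[:, ::-1][:, :k]   # keep the k largest
    U = C @ evecs / evals                   # Nystrom extension to all n points
    U /= np.linalg.norm(U, axis=1, keepdims=True)           # row-normalize the embedding
    _, labels = kmeans2(U, farthest_first(U, k), minit="matrix")
    return labels

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(0.0, 0.5, (100, 2)), rng.normal(10.0, 0.5, (100, 2))])
labels = nystrom_spectral(X, k=2)
```

Only the n × m block is ever formed, which is what makes whole-brain clustering feasible when n (voxels) is very large; in fMRI the rows of X would be voxel time series and the affinity would typically be correlation-based.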
ClusterViz: A Cytoscape APP for Cluster Analysis of Biological Network.
Wang, Jianxin; Zhong, Jiancheng; Chen, Gang; Li, Min; Wu, Fang-xiang; Pan, Yi
2015-01-01
Cluster analysis of biological networks is one of the most important approaches for identifying functional modules and predicting protein functions. Furthermore, visualization of clustering results is crucial for uncovering the structure of biological networks. In this paper, we present ClusterViz, an APP for Cytoscape 3 for cluster analysis and visualization. To reduce complexity and enable extensibility, we designed the architecture of ClusterViz based on the Open Services Gateway Initiative (OSGi) framework. Following this architecture, the implementation of ClusterViz is partitioned into three modules: the ClusterViz interface, the clustering algorithms, and visualization and export. ClusterViz facilitates comparison of the results of different algorithms for further analysis. Three commonly used clustering algorithms, FAG-EC, EAGLE and MCODE, are included in the current version. Because the clustering-algorithm module is built on an abstract algorithm interface, more clustering algorithms can be added in the future. To illustrate the usability of ClusterViz, we provide three examples, with detailed steps, drawn from important scientific articles, which show that our tool has helped several research teams study the mechanisms of biological networks.
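The extensibility argument (new algorithms plug in behind an abstract interface) can be illustrated with a small Python sketch. ClusterViz itself is a Java/OSGi Cytoscape app, so every name below (`ClusteringAlgorithm`, `register`, `REGISTRY`, `ConnectedComponents`) is hypothetical, and connected components is only a placeholder algorithm, not FAG-EC, EAGLE or MCODE.

```python
from abc import ABC, abstractmethod

REGISTRY = {}  # algorithm name -> class; a user interface could list these names

def register(cls):
    """Class decorator that makes an algorithm available by name."""
    REGISTRY[cls.name] = cls
    return cls

class ClusteringAlgorithm(ABC):
    """Hypothetical plug-in interface: every algorithm maps an edge list to node clusters."""
    name = "abstract"

    @abstractmethod
    def run(self, edges):
        """edges: iterable of (u, v) pairs; returns a list of sets of nodes."""

@register
class ConnectedComponents(ClusteringAlgorithm):
    """Placeholder algorithm: each connected component becomes one cluster."""
    name = "components"

    def run(self, edges):
        adj = {}
        for u, v in edges:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
        seen, clusters = set(), []
        for start in adj:
            if start in seen:
                continue
            comp, stack = set(), [start]
            while stack:                      # iterative depth-first search
                node = stack.pop()
                if node not in comp:
                    comp.add(node)
                    stack.extend(adj[node] - comp)
            seen |= comp
            clusters.append(comp)
        return clusters

algorithm = REGISTRY["components"]()
print(algorithm.run([("a", "b"), ("b", "c"), ("d", "e")]))
```

Because the visualization and export code only ever calls `run` on whatever `REGISTRY` holds, adding an algorithm means adding one decorated class, without touching the other modules.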
Witter, Amy E; Nguyen, Minh H
2016-02-01
Recent studies indicate that PAH transformation products such as ketone- or quinone-substituted PAHs (OPAHs) are potent aryl hydrocarbon receptor (AhR) activators that elicit toxicological effects independent of those observed for PAHs. Here, we measured eight OPAHs, two sulfur-containing (SPAH), one oxygen-containing (DBF), and one nitrogen-containing (CARB) heterocyclic PAHs (i.e. ΣONS-PAHs = OPAH8 + SPAH + DBF + CARB) in 35 stream sediments collected from a small (∼1303 km²) urban watershed in south-central Pennsylvania, USA. Combined ΣONS-PAH concentrations ranged from 59 to 1897 μg kg⁻¹ (mean = 568 μg kg⁻¹; median = 425 μg kg⁻¹) and were 2.4 times higher in urban than in rural areas, suggesting that activities on urban land are a source of ΣONS-PAHs to sediments. To identify urban land-use metrics that might explain these data, Spearman rank correlation analyses were used to evaluate the degree of association between ΣONS-PAH concentrations and urban land-use/land-cover metrics along an urban-rural transect at two spatial scales (500 m and 1000 m upstream). Combined ΣONS-PAH concentrations showed highly significant (p < 0.0001) correlations with ΣPAH19, residential and commercial/industrial land use (RESCI), and combined state and local road miles (MILES), suggesting that ΣONS-PAHs originate from sources similar to those of PAHs. To evaluate the sources of OPAHs (the subset of ΣONS-PAHs for which reference assemblages exist), an average OPAH fractional assemblage for urban sediments was derived using agglomerative hierarchical cluster (AHC) analysis and compared with published OPAH source profiles. Urban sediments from Conodoguinet Creek (n = 21) showed highly significant correlations with urban particulate matter (χ² = 0.05, r = 0.91, p = 0.0047), suggesting that urban particulate matter is an important OPAH source to sediments in this watershed. 
Results suggest the inclusion of ΣONS-PAH measurements adds value to traditional PAH analyses, and may help elucidate and refine pollutant source identification in urban watersheds. Copyright © 2015 Elsevier Ltd. All rights reserved.
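Spearman's rank correlation, used above to relate ΣONS-PAH concentrations to land-use metrics, is the Pearson correlation of the ranks. Tied values receive their average rank. A minimal NumPy sketch follows; the function names are illustrative, and no p-value is computed.

```python
import numpy as np

def average_ranks(x):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1                              # extend the run of tied values
        ranks[order[i:j + 1]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks of x and y."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])
```

Because only ranks enter the statistic, any monotonic relationship gives a coefficient of exactly +1 or -1, even if it is strongly nonlinear.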
Dunn, Heather; Quinn, Laurie; Corbridge, Susan J; Eldeirawi, Kamal; Kapella, Mary; Collins, Eileen G
2017-05-01
The use of cluster analysis in the nursing literature is limited to creating classifications of homogeneous groups and discovering new relationships. It is therefore important to clarify its use and potential. The purpose of this article is to introduce the distance-based, partitioning-based, and model-based cluster analysis methods commonly used in the nursing literature, to give a brief historical overview of the use of cluster analysis in nursing, and to suggest directions for future research. The electronic search covered three bibliographic databases: PubMed, CINAHL, and Web of Science. The search terms were cluster analysis and nursing. The use of cluster analysis in the nursing literature is increasing and expanding, positioning this statistical method to yield insights that could change clinical practice.
ICAP - An Interactive Cluster Analysis Procedure for analyzing remotely sensed data
NASA Technical Reports Server (NTRS)
Wharton, S. W.; Turner, B. J.
1981-01-01
An Interactive Cluster Analysis Procedure (ICAP) was developed to derive classifier training statistics from remotely sensed data. ICAP differs from conventional clustering algorithms by allowing the analyst to optimize the cluster configuration by inspection, rather than by manipulating process parameters. Control of the clustering process alternates between the algorithm, which creates new centroids and forms clusters, and the analyst, who can evaluate and elect to modify the cluster structure. Clusters can be deleted, or lumped together pairwise, or new centroids can be added. A summary of the cluster statistics can be requested to facilitate cluster manipulation. The principal advantage of this approach is that it allows prior information (when available) to be used directly in the analysis, since the analyst interacts with ICAP in a straightforward manner, using basic terms with which he is more likely to be familiar. Results from testing ICAP showed that an informed use of ICAP can improve classification, as compared to an existing cluster analysis procedure.
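The alternation between algorithm and analyst described above can be sketched as two parts:

- a nearest-centroid loop that the algorithm runs;
- three editing operations for the analyst: delete a cluster, lump a pair, add a centroid.

This is a hypothetical reconstruction for illustration only. The abstract does not give ICAP's distance measure, update rule, or interface; the sketch assumes squared Euclidean distance with mean centroids.

```python
import numpy as np

def assign(X, centroids):
    """Algorithm step: give each sample the index of its nearest centroid."""
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def update(X, labels, centroids):
    """Algorithm step: move each centroid to the mean of its members (unchanged if empty)."""
    return np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                     for j in range(len(centroids))])

def refine(X, centroids, n_iter=10):
    """Hand control to the algorithm for a few assign/update passes."""
    for _ in range(n_iter):
        centroids = update(X, assign(X, centroids), centroids)
    return centroids, assign(X, centroids)

# Analyst operations (hypothetical names), applied between calls to refine().
def delete_cluster(centroids, j):
    return np.delete(centroids, j, axis=0)

def lump_pair(X, centroids, i, j):
    """Lump clusters i and j into one centroid at the mean of their pooled members."""
    labels = assign(X, centroids)
    pooled = X[(labels == i) | (labels == j)]
    merged = pooled.mean(axis=0) if len(pooled) else (centroids[i] + centroids[j]) / 2
    return np.vstack([np.delete(centroids, [i, j], axis=0), merged])

def add_centroid(centroids, point):
    return np.vstack([centroids, point])
```

In a session, the analyst would inspect the cluster summary after each `refine` call and then apply one of the editing operations before handing control back.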
Missing continuous outcomes under covariate dependent missingness in cluster randomised trials
Diaz-Ordaz, Karla; Bartlett, Jonathan W
2016-01-01
Attrition is common in cluster randomised trials and leads to missing outcome data. Two approaches for analysing such trials are cluster-level analysis and individual-level analysis. This paper compares the performance of unadjusted cluster-level analysis, baseline covariate adjusted cluster-level analysis and linear mixed model analysis, under baseline covariate dependent missingness in continuous outcomes, in terms of bias, average estimated standard error and coverage probability. Complete records analysis and multiple imputation are used to handle the missing outcome data. We considered four scenarios, with the missingness mechanism and the baseline covariate effect on outcome either the same or different between intervention groups. We show that both unadjusted and baseline covariate adjusted cluster-level analysis give unbiased estimates of the intervention effect only if both intervention groups have the same missingness mechanism and there is no interaction between baseline covariate and intervention group. Linear mixed model analysis and multiple imputation give unbiased estimates under all four scenarios, provided that an interaction between intervention and baseline covariate is included in the model when appropriate. Cluster mean imputation has been proposed as a valid approach for handling missing outcomes in cluster randomised trials. We show that cluster mean imputation gives unbiased estimates only when the missingness mechanism is the same between the intervention groups and there is no interaction between baseline covariate and intervention group. Multiple imputation shows overcoverage when there are few clusters in each intervention group. PMID:27177885
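For concreteness, the unadjusted cluster-level analysis on complete records works in two steps. First, average the observed outcomes within each cluster. Second, compare the arms with a two-sample t statistic on those cluster means. This is an illustrative sketch, not the authors' code. It assumes every cluster has at least one observed outcome. It uses the equal-variance t statistic, with degrees of freedom equal to the number of clusters minus 2.

```python
import numpy as np

def cluster_level_analysis(y, cluster, arm):
    """Unadjusted cluster-level analysis on complete records.
    y: outcomes with np.nan where missing; cluster: cluster id per individual;
    arm: 0/1 intervention indicator per individual (constant within a cluster).
    Returns (difference in arm means of cluster means, t statistic, df)."""
    y = np.asarray(y, dtype=float)
    cluster, arm = np.asarray(cluster), np.asarray(arm)
    observed = ~np.isnan(y)
    ids = np.unique(cluster)
    means = np.array([y[(cluster == c) & observed].mean() for c in ids])
    arms = np.array([arm[cluster == c][0] for c in ids])
    m1, m0 = means[arms == 1], means[arms == 0]
    diff = m1.mean() - m0.mean()
    n1, n0 = len(m1), len(m0)
    pooled_var = ((n1 - 1) * m1.var(ddof=1) + (n0 - 1) * m0.var(ddof=1)) / (n1 + n0 - 2)
    se = np.sqrt(pooled_var * (1 / n1 + 1 / n0))
    return diff, diff / se, n1 + n0 - 2
```

The sketch implements only the simplest of the compared methods. The covariate-adjusted, mixed-model and multiple-imputation analyses would each need their own model.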
Missing continuous outcomes under covariate dependent missingness in cluster randomised trials.
Hossain, Anower; Diaz-Ordaz, Karla; Bartlett, Jonathan W
2017-06-01
Attrition is common in cluster randomised trials and leads to missing outcome data. Two approaches for analysing such trials are cluster-level analysis and individual-level analysis. This paper compares the performance of unadjusted cluster-level analysis, baseline covariate adjusted cluster-level analysis and linear mixed model analysis, under baseline covariate dependent missingness in continuous outcomes, in terms of bias, average estimated standard error and coverage probability. Complete records analysis and multiple imputation are used to handle the missing outcome data. We considered four scenarios, with the missingness mechanism and the baseline covariate effect on outcome either the same or different between intervention groups. We show that both unadjusted and baseline covariate adjusted cluster-level analysis give unbiased estimates of the intervention effect only if both intervention groups have the same missingness mechanism and there is no interaction between baseline covariate and intervention group. Linear mixed model analysis and multiple imputation give unbiased estimates under all four scenarios, provided that an interaction between intervention and baseline covariate is included in the model when appropriate. Cluster mean imputation has been proposed as a valid approach for handling missing outcomes in cluster randomised trials. We show that cluster mean imputation gives unbiased estimates only when the missingness mechanism is the same between the intervention groups and there is no interaction between baseline covariate and intervention group. Multiple imputation shows overcoverage when there are few clusters in each intervention group.
NASA Astrophysics Data System (ADS)
Ma, Mengli; Lei, En; Meng, Hengling; Wang, Tiantao; Xie, Linyan; Shen, Dong; Xianwang, Zhou; Lu, Bingyue
2017-08-01
Amomum tsao-ko is a commercial plant used for various purposes in the medicinal and food industries. In the present investigation, 44 germplasm samples were collected from Jinping County, Yunnan Province. Cluster analysis and two-dimensional principal component analysis (PCA) based on simple sequence repeat (SSR) markers were used to represent the genetic relationships among Amomum tsao-ko samples. Cluster analysis clearly distinguished the sample groups. Two major clusters were formed: the first (Cluster I) consisted of 34 individuals and the second (Cluster II) of 10 individuals; Cluster I, the main group, contained multiple sub-clusters. PCA also showed two groups: PCA Group 1 included 29 individuals and PCA Group 2 included 12 individuals, consistent with the results of the cluster analysis. The purpose of the present investigation was to provide information on the genetic relationships of Amomum tsao-ko germplasm resources in the main producing areas, and to provide a theoretical basis for the protection and utilization of Amomum tsao-ko resources.
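Two-dimensional PCA of a samples × markers matrix can be sketched with a singular value decomposition of the column-centred data. The sketch assumes 0/1 presence/absence coding of SSR bands; the abstract does not say how the markers were scored.

```python
import numpy as np

def pca_2d(X):
    """Project samples (rows) onto the first two principal components.
    X: samples x markers matrix, e.g. 0/1 scores for SSR alleles (assumed coding).
    Returns the n x 2 score matrix and the variance fraction of each component."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                   # centre each marker column
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:2].T                    # coordinates on PC1 and PC2
    explained = S ** 2 / np.sum(S ** 2)       # proportion of total variance per component
    return scores, explained[:2]
```

Plotting `scores[:, 0]` against `scores[:, 1]` gives the kind of 2-D map in which the PCA groups above were read off. The signs of the components are arbitrary, so only relative positions matter.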
Lalonde, Michel; Wells, R Glenn; Birnie, David; Ruddy, Terrence D; Wassenaar, Richard
2014-07-01
Phase analysis of single photon emission computed tomography (SPECT) radionuclide angiography (RNA) has been investigated for its potential to predict the outcome of cardiac resynchronization therapy (CRT). However, phase analysis may be limited in its ability to predict CRT outcome, as valuable information may be lost by assuming that time-activity curves (TACs) follow a simple sinusoidal shape. A new method, cluster analysis, is proposed which directly evaluates the TACs and may lead to a better understanding of dyssynchrony patterns and CRT outcome. Cluster analysis algorithms were developed and optimized to maximize their ability to predict CRT response. About 49 patients (N = 27 ischemic etiology) received a SPECT RNA scan as well as positron emission tomography (PET) perfusion and viability scans prior to undergoing CRT. A semiautomated algorithm sampled the left ventricular wall to produce 568 TACs from SPECT RNA data. The TACs were then subjected to two different cluster analysis techniques, K-means and normal average, and several input metrics were varied to determine the optimal settings for predicting CRT outcome. Each TAC was assigned to a cluster group based on the comparison criteria, and global and segmental cluster sizes and scores were used as measures of dyssynchrony to predict response to CRT. A repeated random twofold cross-validation technique was used to train and validate the cluster algorithm. Receiver operating characteristic (ROC) analysis was used to calculate the area under the curve (AUC) and to compare results with those obtained from SPECT RNA phase analysis and PET scar size analysis. Using the normal average cluster analysis approach, the septal wall produced statistically significant results for predicting CRT response in the ischemic population (ROC AUC = 0.73; p < 0.05 vs. equal chance ROC AUC = 0.50), with an optimal operating point of 71% sensitivity and 60% specificity. 
Cluster analysis results were similar to SPECT RNA phase analysis (ROC AUC = 0.78, p = 0.73 vs cluster AUC; sensitivity/specificity = 59%/89%) and PET scar size analysis (ROC AUC = 0.73, p = 1.0 vs cluster AUC; sensitivity/specificity = 76%/67%). A SPECT RNA cluster analysis algorithm was developed for the prediction of CRT outcome. Cluster analysis produced results equivalent to those obtained from Fourier and scar analysis.
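The ROC AUC values reported above can be computed without tracing the curve. The AUC equals the probability that a randomly chosen responder scores higher than a randomly chosen non-responder, with ties counted as one half; this is the Mann-Whitney form. A minimal sketch follows, assuming higher scores are meant to indicate response. An AUC of 0.5 corresponds to the equal-chance reference.

```python
import numpy as np

def roc_auc(scores, responders):
    """AUC as the Mann-Whitney probability P(score_responder > score_non_responder),
    counting tied pairs as 1/2."""
    scores = np.asarray(scores, dtype=float)
    responders = np.asarray(responders, dtype=bool)
    pos, neg = scores[responders], scores[~responders]
    greater = (pos[:, None] > neg[None, :]).sum()   # pairs ranked correctly
    ties = (pos[:, None] == neg[None, :]).sum()     # pairs with equal scores
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

The pairwise comparison is quadratic in the number of patients. That cost is negligible at the cohort sizes described here.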
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lalonde, Michel, E-mail: mlalonde15@rogers.com; Wassenaar, Richard; Wells, R. Glenn
2014-07-15
Purpose: Phase analysis of single photon emission computed tomography (SPECT) radionuclide angiography (RNA) has been investigated for its potential to predict the outcome of cardiac resynchronization therapy (CRT). However, phase analysis may be limited in its ability to predict CRT outcome, as valuable information may be lost by assuming that time-activity curves (TACs) follow a simple sinusoidal shape. A new method, cluster analysis, is proposed which directly evaluates the TACs and may lead to a better understanding of dyssynchrony patterns and CRT outcome. Cluster analysis algorithms were developed and optimized to maximize their ability to predict CRT response. Methods: About 49 patients (N = 27 ischemic etiology) received a SPECT RNA scan as well as positron emission tomography (PET) perfusion and viability scans prior to undergoing CRT. A semiautomated algorithm sampled the left ventricular wall to produce 568 TACs from SPECT RNA data. The TACs were then subjected to two different cluster analysis techniques, K-means and normal average, and several input metrics were varied to determine the optimal settings for predicting CRT outcome. Each TAC was assigned to a cluster group based on the comparison criteria, and global and segmental cluster sizes and scores were used as measures of dyssynchrony to predict response to CRT. A repeated random twofold cross-validation technique was used to train and validate the cluster algorithm. Receiver operating characteristic (ROC) analysis was used to calculate the area under the curve (AUC) and to compare results with those obtained from SPECT RNA phase analysis and PET scar size analysis. Results: Using the normal average cluster analysis approach, the septal wall produced statistically significant results for predicting CRT response in the ischemic population (ROC AUC = 0.73; p < 0.05 vs. equal chance ROC AUC = 0.50), with an optimal operating point of 71% sensitivity and 60% specificity. 
Cluster analysis results were similar to SPECT RNA phase analysis (ROC AUC = 0.78, p = 0.73 vs cluster AUC; sensitivity/specificity = 59%/89%) and PET scar size analysis (ROC AUC = 0.73, p = 1.0 vs cluster AUC; sensitivity/specificity = 76%/67%). Conclusions: A SPECT RNA cluster analysis algorithm was developed for the prediction of CRT outcome. Cluster analysis produced results equivalent to those obtained from Fourier and scar analysis.
Glatman-Freedman, Aharona; Kaufman, Zalman; Kopel, Eran; Bassal, Ravit; Taran, Diana; Valinsky, Lea; Agmon, Vered; Shpriz, Manor; Cohen, Daniel; Anis, Emilia; Shohat, Tamy
2016-08-01
To enhance timely surveillance of bacterial enteric pathogens, space-time cluster analysis was introduced in Israel in May 2013. Stool isolation data for Salmonella, Shigella, and Campylobacter from patients of a large Health Maintenance Organization were analyzed weekly with ArcGIS and SaTScan, and cluster results were sent promptly to local departments of health (LDOHs). Over eighteen months, we identified 52 Shigella sonnei clusters, two Salmonella clusters, and no Campylobacter clusters. S. sonnei clusters lasted from one to 33 days and included three to 30 individuals. Thirty-one (60%) of the S. sonnei clusters were known to LDOHs prior to cluster analysis. Clusters not previously known to the LDOHs prompted epidemiologic investigations. In 31 of the 37 (84%) confirmed clusters, educational institutions (nursery schools, kindergartens, and a primary school) were involved. Cluster analysis demonstrated the capability to complement enteric disease surveillance. Scaling up the system can further enhance timely detection and control of outbreaks. Copyright © 2016 The British Infection Association. Published by Elsevier Ltd. All rights reserved.
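SaTScan offers several scan models, and the abstract does not say which one was used. The sketch below assumes the space-time permutation model, which needs only case counts. A candidate cylinder is a set of zones combined with a time window. The sketch computes the cylinder's expected count and its log likelihood ratio. The helper names are hypothetical. In practice, significance comes from Monte Carlo replication over many candidate cylinders, which is not shown.

```python
import numpy as np

def observed_count(counts, zones, days):
    """Cases inside the cylinder; counts is a zones x days matrix of case counts."""
    return counts[np.ix_(zones, days)].sum()

def expected_count(counts, zones, days):
    """Space-time permutation expectation: (cases in zones) * (cases in window) / total."""
    C = counts.sum()
    return counts[zones, :].sum() * counts[:, days].sum() / C

def poisson_llr(c, mu, C):
    """Log likelihood ratio for c observed vs mu expected cases out of C in total.
    Only excesses (c > mu) are treated as candidate clusters."""
    if c <= mu:
        return 0.0
    llr = c * np.log(c / mu)
    if C > c:                                  # the outside term vanishes when c == C
        llr += (C - c) * np.log((C - c) / (C - mu))
    return float(llr)
```

A scan would evaluate `poisson_llr` for every candidate combination of zones and days and keep the maximum. That maximum is then compared with the maxima from permuted data.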
An effective fuzzy kernel clustering analysis approach for gene expression data.
Sun, Lin; Xu, Jiucheng; Yin, Jiaojiao
2015-01-01
Fuzzy clustering is an important tool for analyzing microarray data. A major problem in applying fuzzy clustering to microarray gene expression data is the choice of parameters such as the number of clusters and the cluster centers. This paper proposes a new approach to fuzzy kernel clustering analysis (FKCA) that identifies the desired number of clusters and obtains more stable results for gene expression data. First, to optimize characteristic differences and estimate the optimal number of clusters, a Gaussian kernel function is introduced to improve the spectrum analysis method (SAM). By combining subtractive clustering with the max-min distance mean, a maximum distance method (MDM) is proposed to determine cluster centers. The steps of the improved SAM (ISAM) and of MDM are then given, and their superiority and stability are illustrated through experimental comparisons on gene expression data. Finally, by introducing ISAM and MDM into FKCA, an effective improved FKCA algorithm is proposed. Experimental results on public gene expression data and the UCI database show that the proposed algorithms are feasible for cluster analysis and that their clustering accuracy is higher than that of other related clustering algorithms.
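The core of a Gaussian-kernel fuzzy clustering step can be sketched with the common kernel fuzzy c-means updates. In this form, prototypes stay in input space, and the kernel-induced distance is proportional to 1 - K(x, v). This is a generic kernel fuzzy c-means, not the paper's FKCA. It does not reproduce the ISAM estimate of the number of clusters or the MDM choice of initial centers. The initial centers, `sigma`, and the fuzzifier `m = 2` are therefore assumptions supplied by the caller.

```python
import numpy as np

def kernel_fcm(X, centers, m=2.0, sigma=1.0, n_iter=100, tol=1e-6):
    """Gaussian-kernel fuzzy c-means with prototypes kept in input space.
    X: n x d data; centers: c x d initial prototypes; m > 1: fuzzifier.
    Returns memberships U (c x n, columns sum to 1) and prototypes V (c x d)."""
    X = np.asarray(X, dtype=float)
    V = np.array(centers, dtype=float)
    for _ in range(n_iter):
        sq = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)   # c x n squared distances
        K = np.exp(-sq / sigma ** 2)                              # Gaussian kernel K(x_k, v_i)
        dist = np.maximum(1.0 - K, 1e-12)                         # kernel distance, up to a factor 2
        inv = dist ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=0, keepdims=True)                  # fuzzy memberships
        W = (U ** m) * K                                          # weights in the prototype update
        V_new = (W @ X) / W.sum(axis=1, keepdims=True)
        converged = np.abs(V_new - V).max() < tol
        V = V_new
        if converged:
            break
    return U, V
```

The kernel term in `W` down-weights points far from a prototype, which tends to make the prototypes less sensitive to outliers than in plain fuzzy c-means.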
Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor; Essex, M
2015-05-01
To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty was compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice.
Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor
2015-01-01
To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty was compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice. PMID:25560745
Effects of Group Size and Lack of Sphericity on the Recovery of Clusters in K-Means Cluster Analysis
ERIC Educational Resources Information Center
de Craen, Saskia; Commandeur, Jacques J. F.; Frank, Laurence E.; Heiser, Willem J.
2006-01-01
K-means cluster analysis is known for its tendency to produce spherical and equally sized clusters. To assess the magnitude of these effects, a simulation study was conducted, in which populations were created with varying departures from sphericity and group sizes. An analysis of the recovery of clusters in the samples taken from these…
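The behaviour studied above follows from how K-means works. Lloyd's algorithm alternately assigns each point to its nearest centre and then moves each centre to the mean of its points. This minimizes the within-cluster sum of squared Euclidean distances, and so it tends to favour compact, roughly spherical groups. A minimal sketch follows; the initial centres are drawn at random from the data, and the seed and iteration cap are arbitrary.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm. Returns (labels, centres)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assignment step: nearest centre by squared Euclidean distance
        d = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # update step: each centre moves to the mean of its assigned points
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                        for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return labels, centres
```

Because every point goes to its nearest centre, the boundary between two clusters is the perpendicular bisector of their centres. This is one reason elongated or very unequal groups can be split or absorbed incorrectly.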
2014-01-01
Background There are many methodological challenges in the conduct and analysis of cluster randomised controlled trials, but one that has received little attention is that of post-randomisation changes to cluster composition. To illustrate this, we focus on the issue of cluster merging, considering the impact on the design, analysis and interpretation of trial outcomes. Methods We explored the effects of merging clusters on study power using standard methods of power calculation. We assessed the potential impacts on study findings of both homogeneous cluster merges (involving clusters randomised to the same arm of a trial) and heterogeneous merges (involving clusters randomised to different arms of a trial) by simulation. To determine the impact on bias and precision of treatment effect estimates, we applied standard methods of analysis to different populations under analysis. Results Cluster merging produced a systematic reduction in study power. This effect depended on the number of merges and was most pronounced when variability in cluster size was at its greatest. Simulations demonstrate that the impact on analysis was minimal when cluster merges were homogeneous, with impact on study power being balanced by a change in observed intracluster correlation coefficient (ICC). We found a decrease in study power when cluster merges were heterogeneous, and the estimate of treatment effect was attenuated. Conclusions Examples of cluster merges found in previously published reports of cluster randomised trials were typically homogeneous rather than heterogeneous. Simulations demonstrated that trial findings in such cases would be unbiased. However, simulations also showed that any heterogeneous cluster merges would introduce bias that would be hard to quantify, as well as having negative impacts on the precision of estimates obtained. Further methodological development is warranted to better determine how to analyse such trials appropriately. 
Interim recommendations include avoidance of cluster merges where possible, discontinuation of clusters following heterogeneous merges, allowance for potential loss of clusters and additional variability in cluster size in the original sample size calculation, and use of appropriate ICC estimates that reflect cluster size. PMID:24884591
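To see how merges can cost power, consider a widely used design-effect approximation for unequal cluster sizes: DE = 1 + ((CV² + 1) m̄ - 1) ρ. Here m̄ is the mean cluster size, CV is the coefficient of variation of cluster sizes, and ρ is the ICC. The sketch assumes this formula, because the abstract only mentions "standard methods of power calculation". It uses the population standard deviation for CV. Take four clusters of 10 at ρ = 0.05, and merge two of them. The design effect rises from 1.45 to 1.7, and the effective sample size falls from about 27.6 to about 23.5.

```python
import numpy as np

def design_effect(cluster_sizes, icc):
    """Approximate design effect for unequal cluster sizes:
    1 + ((cv^2 + 1) * mean_size - 1) * icc, with cv = population SD / mean."""
    sizes = np.asarray(cluster_sizes, dtype=float)
    mean = sizes.mean()
    cv = sizes.std() / mean            # np.std defaults to ddof=0 (population SD)
    return 1 + ((cv ** 2 + 1) * mean - 1) * icc

def effective_sample_size(cluster_sizes, icc):
    """Total number of individuals divided by the design effect."""
    return float(np.sum(cluster_sizes)) / design_effect(cluster_sizes, icc)
```

This is a design-stage illustration only. As the abstract notes, the realized effect of a homogeneous merge also depends on how the observed ICC changes.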
A generalized analysis of hydrophobic and loop clusters within globular protein sequences
Eudes, Richard; Le Tuan, Khanh; Delettré, Jean; Mornon, Jean-Paul; Callebaut, Isabelle
2007-01-01
Background Hydrophobic Cluster Analysis (HCA) is an efficient way to compare highly divergent sequences through the implicit secondary structure information directly derived from hydrophobic clusters. However, its efficiency and application are currently limited by the need for user expertise. To help the analysis of HCA plots, we report here the structural preferences of hydrophobic cluster species, which are frequently encountered in globular domains of proteins. These species are characterized only by their hydrophobic/non-hydrophobic dichotomy. This analysis has been extended to loop-forming clusters, using an appropriate loop alphabet. Results The structural behavior of hydrophobic cluster species, which are typical of protein globular domains, was investigated within banks of experimental structures, considered at different levels of sequence redundancy. The 294 most frequent hydrophobic cluster species were analyzed with regard to their association with the different secondary structures (frequencies of association with secondary structures and secondary structure propensities). Hydrophobic cluster species are predominantly associated with regular secondary structures, and a large proportion (60%) show preferences for α-helices or β-strands. Moreover, analysis of the amino acid composition of a hydrophobic cluster generally allows finer prediction of the regular secondary structure associated with that cluster within a cluster species. We also investigated the behavior of loop-forming clusters, using a "PGDNS" alphabet. These loop clusters do not overlap with hydrophobic clusters and are highly associated with coils. Finally, the structural information contained in the hydrophobic structural words, as deduced from experimental structures, was compared with PSI-PRED predictions, revealing that β-strands and especially α-helices are generally over-predicted within the limits of typical β and α hydrophobic clusters. 
Conclusion The dictionary of hydrophobic clusters described here can help the HCA user to interpret and compare the HCA plots of globular protein sequences, and provides original, fundamental insight into the structural building blocks of protein folds. Moreover, the novel loop cluster analysis brings additional information for secondary structure prediction over the whole sequence, not only on regular secondary structures, through a generalized cluster analysis (GCA). Such information lays the foundations for developing a new and original tool for secondary structure prediction. PMID:17210072
Borri, Marco; Schmidt, Maria A; Powell, Ceri; Koh, Dow-Mu; Riddell, Angela M; Partridge, Mike; Bhide, Shreerang A; Nutting, Christopher M; Harrington, Kevin J; Newbold, Katie L; Leach, Martin O
2015-01-01
To describe a methodology, based on cluster analysis, to partition multi-parametric functional imaging data into groups (or clusters) of similar functional characteristics, with the aim of characterizing functional heterogeneity within head and neck tumour volumes. To evaluate the performance of the proposed approach on a set of longitudinal MRI data, analysing the evolution of the obtained sub-sets with treatment. The cluster analysis workflow was applied to a combination of dynamic contrast-enhanced and diffusion-weighted imaging MRI data from a cohort of patients with squamous cell carcinoma of the head and neck. Cumulative distributions of voxels, containing pre- and post-treatment data and including both primary tumours and lymph nodes, were partitioned into k clusters (k = 2, 3 or 4). Principal component analysis and cluster validation were employed to investigate data composition and to independently determine the optimal number of clusters. The evolution of the resulting sub-regions with induction chemotherapy treatment was assessed relative to the number of clusters. The clustering algorithm was able to separate clusters which significantly reduced in voxel number following induction chemotherapy from clusters with a non-significant reduction. Partitioning with the optimal number of clusters (k = 4), determined with cluster validation, produced the best separation between reducing and non-reducing clusters. The proposed methodology was able to identify tumour sub-regions with distinct functional properties, independently separating clusters which were affected differently by treatment. This work demonstrates that unsupervised cluster analysis, with no prior knowledge of the data, can be employed to provide a multi-parametric characterization of functional heterogeneity within tumour volumes.
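Choosing k among 2, 3 and 4 by cluster validation can be illustrated with the mean silhouette width, one common validity index. The abstract does not name the index the authors used. A NumPy-only sketch follows; under this criterion, the k whose partition gives the largest mean silhouette would be preferred.

```python
import numpy as np

def silhouette(X, labels):
    """Mean silhouette width with Euclidean distance; needs at least two clusters.
    For each point: a = mean distance to its own cluster, b = smallest mean distance
    to another cluster, s = (b - a) / max(a, b); singletons get s = 0."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False
        if not same.any():
            continue
        a = D[i, same].mean()
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return float(s.mean())
```

Values near 1 indicate tight, well-separated clusters. Negative values indicate that many points sit closer to another cluster than to their own.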
Evaluating Mixture Modeling for Clustering: Recommendations and Cautions
ERIC Educational Resources Information Center
Steinley, Douglas; Brusco, Michael J.
2011-01-01
This article provides a large-scale investigation into several of the properties of mixture-model clustering techniques (also referred to as latent class cluster analysis, latent profile analysis, model-based clustering, probabilistic clustering, Bayesian classification, unsupervised learning, and finite mixture models; see Vermunt & Magdison,…
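Mixture-model clustering fits a finite mixture by maximum likelihood, usually with the EM algorithm, and reads cluster membership from the posterior probabilities. A minimal sketch follows for the simplest case, a one-dimensional mixture of Gaussians. The quantile-based starting means and the convergence tolerance are arbitrary choices. Real applications use multivariate models and several random starts.

```python
import numpy as np

def gmm_1d(x, k=2, n_iter=500, tol=1e-8):
    """EM for a one-dimensional mixture of k Gaussians.
    Returns mixing weights, means, variances and the n x k responsibilities."""
    x = np.asarray(x, dtype=float)
    mu = np.quantile(x, np.linspace(0.1, 0.9, k))   # spread starting means over the data
    var = np.full(k, x.var())
    w = np.full(k, 1.0 / k)
    prev = -np.inf
    for _ in range(n_iter):
        # E-step: weighted component densities, then posterior membership probabilities
        dens = w * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        total = dens.sum(axis=1, keepdims=True)
        resp = dens / total
        # M-step: update proportions, means and variances from the responsibilities
        nk = resp.sum(axis=0)
        w = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        loglik = np.log(total).sum()   # log-likelihood under the parameters used in this E-step
        if loglik - prev < tol:
            break
        prev = loglik
    return w, mu, var, resp
```

A hard clustering is obtained with `resp.argmax(axis=1)`. Keeping the full `resp` matrix retains the classification uncertainty that distinguishes mixture models from K-means.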
ERIC Educational Resources Information Center
DiStefano, Christine; Kamphaus, R. W.
2006-01-01
Two classification methods, latent class cluster analysis and cluster analysis, are used to identify groups of child behavioral adjustment underlying a sample of elementary school children aged 6 to 11 years. Behavioral rating information across 14 subscales was obtained from classroom teachers and used as input for analyses. Both the procedures…
Cluster analysis in phenotyping a Portuguese population.
Loureiro, C C; Sa-Couto, P; Todo-Bom, A; Bousquet, J
2015-09-03
Unbiased cluster analysis using clinical parameters has identified asthma phenotypes. Adding inflammatory biomarkers to this analysis provided better insight into the disease mechanisms. This approach has not yet been applied to Portuguese asthmatic patients. The aim was to identify asthma phenotypes using cluster analysis in a Portuguese asthmatic population treated in secondary medical care. Consecutive patients with asthma were recruited from the outpatient clinic. Patients were optimally treated according to GINA guidelines and enrolled in the study. Procedures were performed according to a standard evaluation of asthma. Phenotypes were identified by cluster analysis using Ward's clustering method. Of the 72 patients enrolled, 57 had full data and were included in the cluster analysis. Patients were distributed into five clusters, described as follows: cluster (C) 1, early-onset mild allergic asthma; C2, moderate allergic asthma with long evolution, female prevalence and mixed inflammation; C3, allergic brittle asthma in young females with early disease onset and no evidence of inflammation; C4, severe asthma in obese females with late disease onset, highly symptomatic despite low Th2 inflammation; C5, severe asthma with chronic airflow obstruction, late disease onset and eosinophilic inflammation. In our study population, the identified clusters largely coincided with those of other, larger-scale cluster analyses. Variables such as age at disease onset, obesity, lung function, FeNO (a Th2 biomarker) and disease severity were important for cluster distinction. Copyright © 2015. Published by Elsevier España, S.L.U.
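Ward's method, used above to form the clusters, merges at each step the pair of clusters whose union least increases the total within-cluster sum of squares. For clusters A and B, that increase is |A||B|/(|A|+|B|) times the squared distance between their centroids. The naive sketch below assumes a matrix of patients × standardized numeric variables as input; the study's variable set and scaling are not reproduced. It stops when k clusters remain. Its brute-force cost is acceptable for a few dozen patients.

```python
import numpy as np

def ward_clusters(X, k):
    """Agglomerative clustering with Ward's criterion; returns one label per row of X."""
    X = np.asarray(X, dtype=float)
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > k:
        best, pair = np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                A, B = X[clusters[a]], X[clusters[b]]
                na, nb = len(A), len(B)
                # increase in within-cluster sum of squares if A and B were merged
                cost = na * nb / (na + nb) * ((A.mean(axis=0) - B.mean(axis=0)) ** 2).sum()
                if cost < best:
                    best, pair = cost, (a, b)
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]   # b > a, so deleting b keeps index a valid
        del clusters[b]
    labels = np.empty(len(X), dtype=int)
    for c, members in enumerate(clusters):
        labels[members] = c
    return labels
```

Recording the merge costs instead of stopping at k would give the dendrogram heights used to judge how many clusters to keep.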
Phenotypes Determined by Cluster Analysis in Moderate to Severe Bronchial Asthma.
Youroukova, Vania M; Dimitrova, Denitsa G; Valerieva, Anna D; Lesichkova, Spaska S; Velikova, Tsvetelina V; Ivanova-Todorova, Ekaterina I; Tumangelova-Yuzeir, Kalina D
2017-06-01
Bronchial asthma is a heterogeneous disease that includes various subtypes. They may share similar clinical characteristics but probably have different pathological mechanisms. The aim was to identify phenotypes using cluster analysis in moderate to severe bronchial asthma and to compare clinical, physiological, immunological and inflammatory data between the clusters. Forty adult patients with moderate to severe bronchial asthma, outside of exacerbation, were included. All underwent clinical assessment, anthropometric measurements, skin prick testing, standard spirometry and measurement of the fraction of exhaled nitric oxide. Blood eosinophil count, serum total IgE and periostin levels were determined. A two-step cluster approach, hierarchical clustering and k-means analysis were used to identify the clusters. We identified four clusters: Cluster 1 (n=14), late-onset, non-atopic asthma with impaired lung function; Cluster 2 (n=13), late-onset, atopic asthma; Cluster 3 (n=6), late-onset, aspirin-sensitive, eosinophilic asthma; and Cluster 4 (n=7), early-onset, atopic asthma. Ours is the first study in Bulgaria to apply cluster analysis to asthmatic patients. The variables with the greatest discriminating power in our study were age of asthma onset, disease duration, atopy, smoking, blood eosinophils, hypersensitivity to nonsteroidal anti-inflammatory drugs, baseline FEV1/FVC and symptom severity. Our results support the concept of heterogeneity of bronchial asthma and demonstrate that cluster analysis can be a useful tool for phenotyping the disease and for a personalized approach to treatment.
Cross-scale analysis of cluster correspondence using different operational neighborhoods
NASA Astrophysics Data System (ADS)
Lu, Yongmei; Thill, Jean-Claude
2008-09-01
Cluster correspondence analysis examines the spatial autocorrelation of multi-location events at the local scale. This paper argues that patterns of cluster correspondence are highly sensitive to the definition of operational neighborhoods that form the spatial units of analysis. A subset of multi-location events is examined for cluster correspondence if its events are associated with the same operational neighborhood. This paper discusses the construction of operational neighborhoods for cluster correspondence analysis based on the spatial properties of the underlying zoning system and the scales at which the zones are aggregated into neighborhoods. The impacts of this construction on the degree of cluster correspondence are also analyzed. Empirical analyses of cluster correspondence between paired vehicle theft and recovery locations are conducted using different zoning methods and across a series of geographic scales, and the dynamics of cluster correspondence patterns are discussed.
2010-01-01
Background Cluster analysis, and in particular hierarchical clustering, is widely used to extract information from gene expression data. The aim is to discover new classes, or sub-classes, of either individuals or genes. Performing a cluster analysis commonly involves decisions on how to handle missing values, standardize the data and select genes. In addition, pre-processing, involving various types of filtration and normalization procedures, can affect the ability to discover biologically relevant classes. Here we consider cluster analysis in a broad sense and perform a comprehensive evaluation that covers several aspects of cluster analyses, including normalization. Results We evaluated 2780 cluster analysis methods on seven publicly available 2-channel microarray data sets with common reference designs. Each cluster analysis method differed in data normalization (5 normalizations were considered), missing value imputation (2), standardization of data (2), gene selection (19) or clustering method (11). The cluster analyses were evaluated using known classes, such as cancer types, and the adjusted Rand index. The performance of the different analyses varies between the data sets, and it is difficult to give general recommendations. However, normalization, gene selection and clustering method all have a significant impact on performance. In particular, gene selection is important, and it is generally necessary to include a relatively large number of genes to obtain good performance. Selecting genes with high standard deviation or using principal component analysis were shown to be the preferred gene selection methods. Hierarchical clustering using Ward's method, k-means clustering and Mclust are the clustering methods considered in this paper that achieve the highest adjusted Rand index. 
Normalization can have a significant positive impact on the ability to cluster individuals, and there are indications that background correction is preferable, in particular if the gene selection is successful. However, this is an area that needs to be studied further in order to draw any general conclusions. Conclusions The choice of cluster analysis, and in particular gene selection, has a large impact on the ability to cluster individuals correctly based on expression profiles. Normalization has a positive effect, but the relative performance of different normalizations is an area that needs more research. In summary, although clustering, gene selection and normalization are considered standard methods in bioinformatics, our comprehensive analysis shows that selecting the right methods, and the right combinations of methods, is far from trivial and that much is still unexplored in what is considered to be the most basic analysis of genomic data. PMID:20937082
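The study above scores clusterings against known classes with the adjusted Rand index. This index compares all pairs of samples and corrects the raw pair-agreement count for chance. As a result, random labelings score close to 0 on average and identical partitions (up to relabelling) score 1. Below is a minimal Python sketch of the usual contingency-table form, assuming two equal-length label sequences. `adjusted_rand_index` is a hypothetical helper, not code from the study.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same samples."""
    n = len(labels_true)
    contingency = Counter(zip(labels_true, labels_pred))
    # Pairs placed together in both labelings (sum over contingency cells).
    sum_cells = sum(comb(v, 2) for v in contingency.values())
    # Pairs placed together within each labeling (row and column marginals).
    sum_rows = sum(comb(v, 2) for v in Counter(labels_true).values())
    sum_cols = sum(comb(v, 2) for v in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)  # chance-expected index
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings trivial
        return 1.0
    return (sum_cells - expected) / (max_index - expected)
```

Because the index depends only on which samples are grouped together, `adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])` is 1.0 even though the label names are swapped.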
Modest validity and fair reproducibility of dietary patterns derived by cluster analysis.
Funtikova, Anna N; Benítez-Arciniega, Alejandra A; Fitó, Montserrat; Schröder, Helmut
2015-03-01
Cluster analysis is widely used to analyze dietary patterns. We aimed to analyze the validity and reproducibility of dietary patterns defined by cluster analysis derived from a food frequency questionnaire (FFQ). We hypothesized that the dietary patterns derived by cluster analysis have fair to modest reproducibility and validity. Dietary data were collected from 107 individuals from a population-based survey, by an FFQ at baseline (FFQ1) and after 1 year (FFQ2), and by twelve 24-hour dietary recalls (24-HDR). Repeatability and validity were measured by comparing the clusters obtained from FFQ1 and FFQ2, and from FFQ2 and the 24-HDR (reference method), respectively. Cluster analysis identified a "fruits & vegetables" and a "meat" pattern in each dietary data source. Cluster membership was concordant for 66.7% of participants between FFQ1 and FFQ2 (reproducibility), and for 67.0% between FFQ2 and the 24-HDR (validity). Spearman correlation analysis showed reasonable reproducibility, especially for the "fruits & vegetables" pattern, and lower validity, again especially for the "fruits & vegetables" pattern. The κ statistic indicated fair validity and reproducibility of the clusters. Our findings indicate reasonable reproducibility and fair to modest validity of dietary patterns derived by cluster analysis. Copyright © 2015 Elsevier Inc. All rights reserved.
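The concordance percentages and κ statistic above compare two cluster assignments of the same participants. Below is a minimal sketch of both measures. It assumes the clusters from the two data sources have already been matched to common labels (for example "fruits & vegetables" and "meat"). `concordance_and_kappa` is a hypothetical name, and this is plain Cohen's kappa, not necessarily the exact variant the authors computed. The example labels are toy data, not study data.

```python
from collections import Counter


def concordance_and_kappa(a, b):
    """Return (proportion of agreement, Cohen's kappa) for two label lists.

    Assumes a[i] and b[i] are the cluster labels of the same participant
    under two methods, with labels already matched across methods.
    """
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    freq_a, freq_b = Counter(a), Counter(b)
    # Chance agreement from the marginal frequency of each label.
    expected = sum(freq_a[c] * freq_b[c] for c in set(a) | set(b)) / n ** 2
    # Undefined (division by zero) if both methods put everyone in one label.
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


ffq1 = ["veg", "veg", "meat", "meat"]
ffq2 = ["veg", "meat", "meat", "meat"]
print(concordance_and_kappa(ffq1, ffq2))  # (0.75, 0.5)
```

Kappa is lower than raw concordance because part of the observed agreement would be expected by chance from the cluster sizes alone.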
Cluster Analysis to Identify Possible Subgroups in Tinnitus Patients.
van den Berge, Minke J C; Free, Rolien H; Arnold, Rosemarie; de Kleine, Emile; Hofman, Rutger; van Dijk, J Marc C; van Dijk, Pim
2017-01-01
In tinnitus treatment, there is a tendency to shift from a "one size fits all" approach to a more individual, patient-tailored approach. Insight into the heterogeneity of the tinnitus spectrum might improve the management of tinnitus patients in terms of choice of treatment and identification of patients with severe mental distress. The goal of this study was to identify subgroups in a large group of tinnitus patients. Data were collected from patients with severe tinnitus complaints visiting our tertiary referral tinnitus care group at the University Medical Center Groningen. Patient-reported and physician-reported variables were collected during their visit to our clinic. Cluster analyses were used to characterize subgroups. To select the variables entered into the cluster analysis, two approaches were used: (1) variable reduction with principal component analysis and (2) variable selection based on expert opinion. Various variables of 1,783 tinnitus patients were included in the analyses. Cluster analysis (1) included 976 patients and resulted in a four-cluster solution. The effect of external influences was the most discriminative variable between the groups, or clusters, of patients. The "silhouette measure" of the cluster outcome was low (0.2), indicating "no substantial" cluster structure. Cluster analysis (2) included 761 patients and resulted in a three-cluster solution, comparable to the first analysis. Again, "no substantial" cluster structure was found (0.2). Two cluster analyses on a large database of tinnitus patients revealed that clusters of patients are mostly formed by differences in the response to external influences on their disease. However, both cluster outcomes based on this dataset showed poor stability, suggesting that our tinnitus population comprises a continuum rather than a number of clearly defined subgroups.
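The silhouette measure reported above works in three steps:
- For each patient, it takes the mean distance a to members of the patient's own cluster.
- It takes the mean distance b to the nearest other cluster, giving s = (b - a) / max(a, b).
- The overall value is the mean of s across patients.

A commonly cited rule of thumb (Kaufman and Rousseeuw) treats mean values of 0.25 or below as indicating no substantial structure, consistent with the 0.2 reported here. The pure-Python sketch below assumes Euclidean distance on numeric feature vectors and at least two clusters. `mean_silhouette` is a hypothetical helper; the authors' software and distance measure may differ.

```python
from math import dist


def mean_silhouette(points, labels):
    """Mean silhouette width over all points (Euclidean distance).

    points: list of equal-length numeric tuples; labels: cluster ids.
    Requires at least two distinct clusters.
    """
    labels = list(labels)
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # convention: s(i) = 0 for singleton clusters
            continue
        a = sum(own) / len(own)  # mean intra-cluster distance
        # b: smallest mean distance from p to the members of any other cluster
        b = min(
            sum(dist(p, q) for j, q in enumerate(points) if labels[j] == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)
```

Well-separated groups give values near 1. Values near or below 0 mean points sit about as close to another cluster as to their own.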
Ecological tolerances of Miocene larger benthic foraminifera from Indonesia
NASA Astrophysics Data System (ADS)
Novak, Vibor; Renema, Willem
2018-01-01
To provide a comprehensive palaeoenvironmental reconstruction based on larger benthic foraminifera (LBF), a quantitative analysis of their assemblage composition is needed. Besides microfacies analysis, which includes the environmental preferences of foraminiferal taxa, statistical analyses should also be employed. Therefore, detrended correspondence analysis and cluster analysis were performed on relative abundance data of identified LBF assemblages deposited in mixed carbonate-siliciclastic (MCS) systems and blue-water (BW) settings. The studied MCS system localities include ten sections from the central part of the Kutai Basin in East Kalimantan, ranging from late Burdigalian to Serravallian in age. The BW samples were collected from eleven sections of the Bulu Formation in Central Java, dated as Serravallian. Results from detrended correspondence analysis reveal significant differences between these two environmental settings. Cluster analysis produced five clusters of samples: clusters 1 and 2 comprise dominantly MCS samples, clusters 3 and 4 are dominated by BW samples, and cluster 5 shows a mixed composition with both MCS and BW samples. The results of the cluster analysis were subsequently subjected to indicator species analysis, which yielded three groups among LBF taxa: typical assemblage indicators, regularly occurring taxa and rare taxa. By interpreting the results of detrended correspondence analysis, cluster analysis and indicator species analysis, along with the environmental preferences of identified LBF taxa, a palaeoenvironmental model is proposed for the distribution of LBF in Miocene MCS systems and adjacent BW settings of Indonesia.
Rennard, Stephen I; Locantore, Nicholas; Delafont, Bruno; Tal-Singer, Ruth; Silverman, Edwin K; Vestbo, Jørgen; Miller, Bruce E; Bakke, Per; Celli, Bartolomé; Calverley, Peter M A; Coxson, Harvey; Crim, Courtney; Edwards, Lisa D; Lomas, David A; MacNee, William; Wouters, Emiel F M; Yates, Julie C; Coca, Ignacio; Agustí, Alvar
2015-03-01
Chronic obstructive pulmonary disease (COPD) is a heterogeneous disease that likely includes clinically relevant subgroups. The objective was to identify subgroups of COPD in ECLIPSE (Evaluation of COPD Longitudinally to Identify Predictive Surrogate Endpoints) subjects using cluster analysis and to assess clinically meaningful outcomes of the clusters during 3 years of longitudinal follow-up. Factor analysis was used to reduce 41 variables determined at recruitment in 2,164 patients with COPD to 13 main factors, and the variables with the highest loading were used for cluster analysis. Clusters were evaluated for their relationship with clinically meaningful outcomes during 3 years of follow-up. The relationships among clinical parameters were evaluated within clusters. Five subgroups were distinguished using cross-sectional clinical features. These groups differed regarding outcomes. Cluster A included patients with milder disease and had fewer deaths and hospitalizations. Cluster B had less systemic inflammation at baseline but had notable changes in health status and emphysema extent. Cluster C had many comorbidities, evidence of systemic inflammation, and the highest mortality. Cluster D had low FEV1, severe emphysema, and the highest exacerbation and COPD hospitalization rates. Cluster E was intermediate for most variables and may represent a mixed group that includes further clusters. The relationships among clinical variables within clusters differed from those in the entire COPD population. Cluster analysis using baseline data in ECLIPSE identified five COPD subgroups that differ in outcomes and inflammatory biomarkers and show different relationships between clinical parameters, suggesting that the clusters represent clinically and biologically different subtypes of COPD.
Interactive visual exploration and refinement of cluster assignments.
Kern, Michael; Lex, Alexander; Gehlenborg, Nils; Johnson, Chris R
2017-09-12
With ever-increasing amounts of data produced in biology research, scientists need efficient data analysis methods. Cluster analysis, combined with visualization of the results, is one such method that can be used to make sense of large data volumes. At the same time, cluster analysis is known to be imperfect and depends on the choice of algorithms, parameters, and distance measures. Most clustering algorithms do not properly account for ambiguity in the source data, as records are often assigned to discrete clusters even if an assignment is unclear. While there are metrics and visualization techniques that allow analysts to compare clusterings or to judge cluster quality, there is no comprehensive method that allows analysts to evaluate, compare, and refine cluster assignments based on the source data, derived scores, and contextual data. In this paper, we introduce a method that explicitly visualizes the quality of cluster assignments, allows comparisons of clustering results and enables analysts to manually curate and refine cluster assignments. Our methods are applicable to matrix data clustered with partitional, hierarchical, and fuzzy clustering algorithms. Furthermore, we enable analysts to explore clustering results in the context of other data, for example, to observe whether a clustering of genomic data results in a meaningful differentiation in phenotypes. Our methods are integrated into Caleydo StratomeX, a popular, web-based disease subtype analysis tool. We show in a usage scenario that our approach can reveal ambiguities in cluster assignments and produce improved clusterings that better differentiate genotypes and phenotypes.
Somatotyping using 3D anthropometry: a cluster analysis.
Olds, Tim; Daniell, Nathan; Petkov, John; David Stewart, Arthur
2013-01-01
Somatotyping is the quantification of human body shape, independent of body size. Hitherto, somatotyping (including the most popular method, the Heath-Carter system) has been based on subjective visual ratings, sometimes supported by surface anthropometry. This study used data derived from three-dimensional (3D) whole-body scans as inputs for cluster analysis to objectively derive clusters of similar body shapes. Twenty-nine dimensions normalised for body size were measured on a purposive sample of 301 adults aged 17-56 years who had been scanned using a Vitus Smart laser scanner. K-means cluster analysis with v-fold cross-validation was used to determine shape clusters. Three male and three female clusters emerged, and were visualised using the scans closest to each cluster centroid and a caricature defined by doubling the difference between the average scan and the cluster centroid. The male clusters were decidedly endomorphic (high fatness), ectomorphic (high linearity), and endo-mesomorphic (a mixture of fatness and muscularity). The female clusters were clearly endomorphic, ectomorphic, and ecto-mesomorphic (a mixture of linearity and muscularity). An objective shape quantification procedure combining 3D scanning and cluster analysis yielded shape clusters strikingly similar to traditional somatotyping.
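K-means, as used above, alternates between two steps, a scheme known as Lloyd's algorithm. It assigns each observation to its nearest centroid, then recomputes each centroid as the mean of its members. The pure-Python sketch below assumes the inputs are already size-normalised numeric vectors, such as the 29 scan dimensions. It does not show the v-fold cross-validation the authors used to choose the number of clusters. `assign` and `kmeans` are hypothetical helpers, with random initialisation from the data; they are not the study's software.

```python
import random
from math import dist


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            for p in points]


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's k-means; points is a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]  # k input points as seeds
    for _ in range(n_iter):
        labels = assign(points, centroids)  # assignment step
        new_centroids = []
        for c in range(k):  # update step: mean of each cluster's members
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return assign(points, centroids), centroids
```

Results depend on the initial centroids. Implementations commonly run several random starts and keep the solution with the lowest within-cluster sum of squares.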
Clusters of Occupations Based on Systematically Derived Work Dimensions: An Exploratory Study.
ERIC Educational Resources Information Center
Cunningham, J. W.; And Others
The study explored the feasibility of deriving an educationally relevant occupational cluster structure based on Occupational Analysis Inventory (OAI) work dimensions. A hierarchical cluster analysis was applied to the factor score profiles of 814 occupations on 22 higher-order OAI work dimensions. From that analysis, 73 occupational clusters were…
Using cluster analysis to identify phenotypes and validation of mortality in men with COPD.
Chen, Chiung-Zuei; Wang, Liang-Yi; Ou, Chih-Ying; Lee, Cheng-Hung; Lin, Chien-Chung; Hsiue, Tzuen-Ren
2014-12-01
Cluster analysis has been proposed to examine phenotypic heterogeneity in chronic obstructive pulmonary disease (COPD). The aim of this study was to use cluster analysis to define COPD phenotypes and validate them by assessing their relationship with mortality. Male subjects with COPD were recruited to identify and validate COPD phenotypes. Seven variables were assessed for their relevance to COPD: age, FEV1 % predicted, BMI, history of severe exacerbations, mMRC, SpO2, and Charlson index. COPD groups were identified by cluster analysis and validated prospectively against mortality during a 4-year follow-up. Analysis of 332 COPD subjects identified five clusters, labeled A to E. Assessment of the predictive validity of these clusters showed that cluster E patients had higher all-cause mortality (HR 18.3, p < 0.0001) and respiratory-cause mortality (HR 21.5, p < 0.0001) than those in the other four groups. Cluster E patients also had higher all-cause mortality (HR 14.3, p = 0.0002) and respiratory-cause mortality (HR 10.1, p = 0.0013) than patients in cluster D alone. COPD with severe airflow limitation, many symptoms, and a history of frequent severe exacerbations was a novel and distinct clinical phenotype predicting mortality in men with COPD.
Borri, Marco; Schmidt, Maria A.; Powell, Ceri; Koh, Dow-Mu; Riddell, Angela M.; Partridge, Mike; Bhide, Shreerang A.; Nutting, Christopher M.; Harrington, Kevin J.; Newbold, Katie L.; Leach, Martin O.
2015-01-01
Purpose To describe a methodology, based on cluster analysis, to partition multi-parametric functional imaging data into groups (or clusters) of similar functional characteristics, with the aim of characterizing functional heterogeneity within head and neck tumour volumes. To evaluate the performance of the proposed approach on a set of longitudinal MRI data, analysing the evolution of the obtained sub-sets with treatment. Material and Methods The cluster analysis workflow was applied to a combination of dynamic contrast-enhanced and diffusion-weighted imaging MRI data from a cohort of squamous cell carcinoma of the head and neck patients. Cumulative distributions of voxels, containing pre and post-treatment data and including both primary tumours and lymph nodes, were partitioned into k clusters (k = 2, 3 or 4). Principal component analysis and cluster validation were employed to investigate data composition and to independently determine the optimal number of clusters. The evolution of the resulting sub-regions with induction chemotherapy treatment was assessed relative to the number of clusters. Results The clustering algorithm was able to separate clusters which significantly reduced in voxel number following induction chemotherapy from clusters with a non-significant reduction. Partitioning with the optimal number of clusters (k = 4), determined with cluster validation, produced the best separation between reducing and non-reducing clusters. Conclusion The proposed methodology was able to identify tumour sub-regions with distinct functional properties, independently separating clusters which were affected differently by treatment. This work demonstrates that unsupervised cluster analysis, with no prior knowledge of the data, can be employed to provide a multi-parametric characterization of functional heterogeneity within tumour volumes. PMID:26398888
clusterProfiler: an R package for comparing biological themes among gene clusters.
Yu, Guangchuang; Wang, Li-Gen; Han, Yanyan; He, Qing-Yu
2012-05-01
The increasing volume of quantitative data generated by transcriptomics and proteomics requires integrative strategies for analysis. Here, we present an R package, clusterProfiler, that automates the process of biological-term classification and the enrichment analysis of gene clusters. The analysis module and visualization module were combined into a reusable workflow. Currently, clusterProfiler supports three species: humans, mice, and yeast. Methods provided in this package can be easily extended to other species and ontologies. The clusterProfiler package is released under the Artistic-2.0 License within the Bioconductor project. The source code and vignette are freely available at http://bioconductor.org/packages/release/bioc/html/clusterProfiler.html.
Clinical Characteristics of Exacerbation-Prone Adult Asthmatics Identified by Cluster Analysis.
Kim, Mi Ae; Shin, Seung Woo; Park, Jong Sook; Uh, Soo Taek; Chang, Hun Soo; Bae, Da Jeong; Cho, You Sook; Park, Hae Sim; Yoon, Ho Joo; Choi, Byoung Whui; Kim, Yong Hoon; Park, Choon Sik
2017-11-01
Asthma is a heterogeneous disease characterized by various types of airway inflammation and obstruction. Therefore, it has been classified by cluster analysis into several subphenotypes, such as early-onset atopic, obese non-eosinophilic, benign, and eosinophilic asthma. Some asthmatics frequently experience exacerbations over a long-term follow-up period, but the exacerbation-prone subphenotype has rarely been evaluated by cluster analysis. This prompted us to identify clusters reflecting asthma exacerbation. A uniform cluster analysis method was applied to 259 adult asthmatics who were regularly followed up for over 1 year, using 12 variables selected on the basis of their contribution to asthma phenotypes. After clustering, clinical profiles and exacerbation rates during follow-up were compared among the clusters. Four subphenotypes were identified: cluster 1 comprised patients with early-onset atopic asthma with preserved lung function; cluster 2, late-onset non-atopic asthma with impaired lung function; cluster 3, early-onset atopic asthma with severely impaired lung function; and cluster 4, late-onset non-atopic asthma with well-preserved lung function. The patients in clusters 2 and 3 were identified as exacerbation-prone asthmatics, showing a higher risk of asthma exacerbation. Two different phenotypes of exacerbation-prone asthma were identified among Korean asthmatics using cluster analysis; both were characterized by impaired lung function, but the age at asthma onset and atopic status differed between the two. Copyright © 2017 The Korean Academy of Asthma, Allergy and Clinical Immunology · The Korean Academy of Pediatric Allergy and Respiratory Disease
Artim-Esen, Bahar; Çene, Erhan; Şahinkaya, Yasemin; Ertan, Semra; Pehlivan, Özlem; Kamali, Sevil; Gül, Ahmet; Öcal, Lale; Aral, Orhan; Inanç, Murat
2014-07-01
Associations between autoantibodies and clinical features have been described in systemic lupus erythematosus (SLE). Herein, we aimed to define autoantibody clusters and their clinical correlations in a large cohort of patients with SLE. We analyzed 852 patients with SLE who attended our clinic. Seven autoantibodies were selected for cluster analysis: anti-DNA, anti-Sm, anti-RNP, anticardiolipin (aCL) immunoglobulin (Ig)G or IgM, lupus anticoagulant (LAC), anti-Ro, and anti-La. Two-step clustering and Kaplan-Meier survival analyses were used. Five clusters were identified. One cluster consisted of patients with only anti-dsDNA antibodies, one of anti-Sm and anti-RNP, one of aCL IgG/M and LAC, and one of anti-Ro and anti-La antibodies. The analysis revealed one more cluster, consisting of patients who did not belong to any of the clusters formed by the antibodies chosen for cluster analysis. The Sm/RNP cluster had a significantly higher incidence of pulmonary hypertension and Raynaud phenomenon. The dsDNA cluster had the highest incidence of renal involvement. In the aCL/LAC cluster, there were significantly more patients with neuropsychiatric involvement, antiphospholipid syndrome, autoimmune hemolytic anemia, and thrombocytopenia. According to the Systemic Lupus International Collaborating Clinics damage index, the highest frequency of damage was in the aCL/LAC cluster. Comparison of 10- and 20-year survival showed reduced survival in the aCL/LAC cluster. This study supports the existence of autoantibody clusters with distinct clinical features in SLE and shows that forming clinical subsets according to autoantibody clusters may be useful in predicting the outcome of the disease. Autoantibody clusters in SLE may differ according to the clinical setting or population.
Is It Feasible to Identify Natural Clusters of TSC-Associated Neuropsychiatric Disorders (TAND)?
Leclezio, Loren; Gardner-Lubbe, Sugnet; de Vries, Petrus J
2018-04-01
Tuberous sclerosis complex (TSC) is a genetic disorder with multisystem involvement. The lifetime prevalence of TSC-Associated Neuropsychiatric Disorders (TAND) is in the region of 90%, in an apparently unique, individual pattern. This "uniqueness" poses significant challenges for diagnosis, psycho-education, and intervention planning. To date, no studies have explored whether there may be natural clusters of TAND. The purpose of this feasibility study was (1) to investigate the practicability of identifying natural TAND clusters, and (2) to identify appropriate multivariate data analysis techniques for larger-scale studies. TAND Checklist data were collected from 56 individuals with a clinical diagnosis of TSC (n = 20 from South Africa; n = 36 from Australia). Using R, the open-source statistical platform, mean squared contingency coefficients were calculated to produce a correlation matrix, and various cluster analyses and an exploratory factor analysis were examined. Ward's method rendered six TAND clusters with good face validity and significant convergence with a six-factor exploratory factor analysis solution. The "bottom-up" data-driven strategies identified a "scholastic" cluster of TAND manifestations, an "autism spectrum disorder-like" cluster, a "dysregulated behavior" cluster, a "neuropsychological" cluster, a "hyperactive/impulsive" cluster, and a "mixed/mood" cluster. These feasibility results suggest that a combination of cluster analysis and exploratory factor analysis methods may be able to identify clinically meaningful natural TAND clusters. The findings require replication and expansion in larger datasets, and could include quantification of cluster or factor scores at an individual level. Copyright © 2018 Elsevier Inc. All rights reserved.
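For two binary (present/absent) items, the mean square contingency of their 2x2 table equals φ², where φ is the phi coefficient. A signed phi matrix can therefore serve as the item correlation matrix described above. The sketch below assumes TAND Checklist items coded 0/1, which may not match how every item was scored. `phi_coefficient` and `phi_matrix` are hypothetical Python helpers, not the R code the authors used. Converting the matrix to a dissimilarity (for example 1 - φ) before Ward clustering would likewise be an illustrative choice.

```python
from math import sqrt


def phi_coefficient(x, y):
    """Phi coefficient for two binary 0/1 sequences (phi^2 = chi^2 / n)."""
    # Cells of the 2x2 contingency table.
    a = sum(1 for u, v in zip(x, y) if u == 1 and v == 1)
    b = sum(1 for u, v in zip(x, y) if u == 1 and v == 0)
    c = sum(1 for u, v in zip(x, y) if u == 0 and v == 1)
    d = sum(1 for u, v in zip(x, y) if u == 0 and v == 0)
    denom = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    # An item endorsed by everyone (or no one) has no variance; report 0.
    return (a * d - b * c) / denom if denom else 0.0


def phi_matrix(items):
    """Pairwise phi values for a dict {item_name: list_of_0_1_responses}."""
    names = list(items)
    return {(p, q): phi_coefficient(items[p], items[q]) for p in names for q in names}
```

Items that always co-occur score 1, items that never co-occur score -1, and unrelated items score near 0.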
Psychosocial Costs of Racism to Whites: Exploring Patterns through Cluster Analysis
ERIC Educational Resources Information Center
Spanierman, Lisa B.; Poteat, V. Paul; Beer, Amanda M.; Armstrong, Patrick Ian
2006-01-01
Participants (230 White college students) completed the Psychosocial Costs of Racism to Whites (PCRW) Scale. Using cluster analysis, we identified 5 distinct cluster groups on the basis of PCRW subscale scores: the unempathic and unaware cluster contained the lowest empathy scores; the insensitive and afraid cluster consisted of low empathy and…
Allergen Sensitization Pattern by Sex: A Cluster Analysis in Korea.
Ohn, Jungyoon; Paik, Seung Hwan; Doh, Eun Jin; Park, Hyun-Sun; Yoon, Hyun-Sun; Cho, Soyun
2017-12-01
Allergens tend to sensitize simultaneously. The etiology of this phenomenon has been suggested to be allergen cross-reactivity or concurrent exposure. However, little is known about specific allergen sensitization patterns. The objective was to investigate allergen sensitization characteristics according to gender. The multiple allergen simultaneous test (MAST) is widely used as a screening tool for detecting allergen sensitization in dermatologic clinics. We retrospectively reviewed the medical records of patients with MAST results between 2008 and 2014 in our Department of Dermatology. A cluster analysis was performed to elucidate the allergen-specific immunoglobulin (Ig)E cluster pattern. The MAST results (39 allergen-specific IgEs) from 4,360 cases were analyzed. By cluster analysis, the 39 items were grouped into 8 clusters. Each cluster had characteristic features. Compared with the female group, the male group tended to be sensitized more frequently to all tested allergens, except for the fungus allergen cluster. The cluster and comparative analysis results demonstrate that allergen sensitization is clustered, reflecting allergen similarity or co-exposure. Only the fungus cluster allergens tended to sensitize the female group more frequently than the male group.
Hanson, E; Ingold, S; Haas, C; Ballantyne, J
2018-05-01
The recovery of a DNA profile from the perpetrator or victim in criminal investigations can provide valuable 'source level' information for investigators. However, a DNA profile does not reveal the circumstances by which biological material was transferred. Some contextual information can be obtained by determining the tissue or fluid source of origin of the biological material, as it is potentially indicative of some behavioral activity on the part of the individual that resulted in its transfer from the body. Here, we sought to improve upon established RNA-based methods for body fluid identification by developing a targeted multiplexed next generation mRNA sequencing assay comprising a panel of approximately equal-sized gene amplicons. The multiplexed biomarker panel includes several highly specific gene targets with the necessary specificity to definitively identify most forensically relevant biological fluids and tissues (blood, semen, saliva, vaginal secretions, menstrual blood and skin). In developing the biomarker panel, we evaluated 66 gene targets, iteratively testing target combinations that exhibited optimal sensitivity and specificity using a training set of forensically relevant body fluid samples. The current assay comprises 33 targets: 6 blood, 6 semen, 6 saliva, 4 vaginal secretions, 5 menstrual blood and 6 skin markers. We demonstrate the sensitivity and specificity of the assay and the ability to identify body fluids in single-source and admixed stains. A 16-sample blind test was carried out by one lab with samples provided by the other participating lab. The blinded lab correctly identified the body fluids present in 15 of the samples, with the major component identified in the 16th. Various classification methods are being investigated to permit inference of the body fluid/tissue in dried physiological stains. 
These include the percentage of reads in a sample that are due to each of the 6 tissues/body fluids tested and inter-sample differential gene expression revealed by agglomerative hierarchical clustering. Copyright © 2018 Elsevier B.V. All rights reserved.
Orbit Clustering Based on Transfer Cost
NASA Technical Reports Server (NTRS)
Gustafson, Eric D.; Arrieta-Camacho, Juan J.; Petropoulos, Anastassios E.
2013-01-01
We propose using cluster analysis to perform quick screening for combinatorial global optimization problems. The key missing component currently preventing cluster analysis from being used in this context is the lack of a usable metric function that defines the cost to transfer between two orbits. We study several proposed metrics and clustering algorithms, including k-means and the expectation maximization algorithm. We also show that proven heuristic methods such as the Q-law can be modified to work with cluster analysis.
Lei, Yang; Yu, Dai; Bin, Zhang; Yang, Yang
2017-01-01
Clustering algorithms, as a basis of data analysis, are widely used in analysis systems. However, for high-dimensional data, a clustering algorithm may overlook the business relations between the dimensions, especially in the medical field. As a result, the clustering result may not meet the users' business goals. If the clustering process can incorporate the users' knowledge, that is, the doctor's knowledge or analysis intent, the clustering result can be more satisfactory. In this paper, we propose an interactive K-means clustering method to improve user satisfaction with the result. The core of this method is to obtain the user's feedback on the clustering result and use it to optimize the clustering result. A particle swarm optimization algorithm is then used to optimize the parameters, especially the weight settings in the clustering algorithm, so that it reflects the user's business preferences as closely as possible. Based on this parameter optimization and adjustment, the clustering result can be brought closer to the user's requirements. Finally, we use breast cancer data as an example to test our method. The experiments show that our algorithm performs better.
[Typologies of Madrid's citizens (Spain) at the end-of-life: cluster analysis].
Ortiz-Gonçalves, Belén; Perea-Pérez, Bernardo; Labajo González, Elena; Albarrán Juan, Elena; Santiago-Sáez, Andrés
2018-03-06
To establish typologies of Madrid's citizens (Spain) with regard to the end of life by means of cluster analysis. The SPAD 8 programme was applied to a sample from a health care centre in the autonomous region of Madrid (Spain). A multiple correspondence analysis was performed, followed by a cluster analysis to create a dendrogram. A cross-sectional study using the questionnaire results had been carried out beforehand. Five clusters stand out. Cluster 1: a group who preferred not to answer numerous questions (5%). Cluster 2: in favour of receiving palliative care and euthanasia (40%). Cluster 3: would oppose assisted suicide and would not ask for spiritual assistance (15%). Cluster 4: would like to receive palliative care and assisted suicide (16%). Cluster 5: would oppose assisted suicide and would ask for spiritual assistance (24%). The following four clusters stood out. Clusters 2 and 4 would like to receive palliative care, euthanasia (2) and assisted suicide (4). Clusters 4 and 5 regularly practiced their faith, and their family members did not receive palliative care. Clusters 3 and 5 would be opposed in particular to euthanasia and assisted suicide. Clusters 2, 4 and 5 had not completed an advance directive document. Clusters 2 and 3 seldom practiced their faith. This study could be taken into consideration to improve the quality of end-of-life care choices. Copyright © 2017 SESPAS. Published by Elsevier España, S.L.U. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Steenbergen, K. G., E-mail: kgsteen@gmail.com; Gaston, N.
2014-02-14
Inspired by methods of remote sensing image analysis, we analyze structural variation in cluster molecular dynamics (MD) simulations through a unique application of principal component analysis (PCA) and the Pearson correlation coefficient (PCC). The PCA analysis characterizes the geometric shape of the cluster structure at each time step, yielding a detailed and quantitative measure of structural stability and variation at finite temperature. Our PCC analysis captures bond structure variation in MD, which can be used both to supplement the PCA analysis and to compare bond patterns between different cluster sizes. Relying only on atomic position data, without requiring a priori structural input, PCA and PCC can be used to analyze both classical and ab initio MD simulations for any cluster composition or electronic configuration. Taken together, these statistical tools represent powerful new techniques for quantitative structural characterization and isomer identification in cluster MD.
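As a rough illustration of the PCA idea (not the authors' code), the principal moments of a cluster's atomic coordinates at one time step are the eigenvalues of the 3×3 covariance matrix of the centered positions; their ratios indicate whether a snapshot is near-spherical, oblate, or prolate. The PCC part below correlates two "bond-pattern" vectors, which this sketch assumes to be sorted interatomic distances; that representation, and the helper names, are our assumptions. In this simplified form the two snapshots must contain the same number of atoms.

```python
import numpy as np

def principal_moments(positions):
    """Eigenvalues (descending) of the covariance of centered atomic positions."""
    P = np.asarray(positions, dtype=float)
    centered = P - P.mean(axis=0)
    cov = centered.T @ centered / len(P)
    return np.sort(np.linalg.eigvalsh(cov))[::-1]

def sorted_distances(positions):
    """Sorted pairwise interatomic distances (assumed bond-pattern vector)."""
    P = np.asarray(positions, dtype=float)
    d = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=2)
    i, j = np.triu_indices(len(P), k=1)
    return np.sort(d[i, j])

def bond_pattern_pcc(pos_a, pos_b):
    """Pearson correlation between the bond-pattern vectors of two snapshots
    with the same number of atoms (a simplification of the paper's PCC use)."""
    return np.corrcoef(sorted_distances(pos_a), sorted_distances(pos_b))[0, 1]
```

For the three-atom linear chain `[[-1, 0, 0], [0, 0, 0], [1, 0, 0]]`, `principal_moments` returns `[2/3, 0, 0]`: a single nonzero moment marks a perfectly linear shape. Because distances are invariant to rotation and translation, `bond_pattern_pcc` needs no prior alignment of the snapshots.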
Won, Jong Chul; Im, Yong-Jin; Lee, Ji-Hyun; Kim, Chong Hwa; Kwon, Hyuk Sang; Cha, Bong-Yun; Park, Tae Sun
2017-01-01
Diabetic peripheral neuropathy (DPN) is the most common diabetic complication. However, patients usually suffer not only from diverse sensory deficits but also from neuropathy-related discomfort. The aim of this study is to identify distinct groups of patients with DPN with respect to its clinical impact on symptom patterns and comorbidities. Hierarchical cluster analysis and factor analysis were performed to identify relevant subgroups of patients with DPN (n = 1338) and symptom patterns. Patients with DPN were divided into three clusters: asymptomatic (cluster 1, n = 448, 33.5%), moderate symptoms with disturbed sleep (cluster 2, n = 562, 42.0%), and severe symptoms with decreased quality of life (cluster 3, n = 328, 24.5%). Patients in cluster 3, compared with clusters 1 and 2, were characterized by higher levels of HbA1c and more severe pain and physical impairments. Patients in cluster 2 had moderate pain levels but disturbed sleep patterns comparable to those in cluster 3. The frequency of symptoms for each item of the MNSI, by "painful" symptom pattern, showed a similar distribution, with intensity increasing across the three clusters. Cluster and factor analysis endorsed the use of comprehensive and symptomatic subgrouping to individualize the evaluation of patients with DPN.
A hybrid monkey search algorithm for clustering analysis.
Chen, Xin; Zhou, Yongquan; Luo, Qifang
2014-01-01
Clustering is a popular data analysis and data mining technique, and the k-means clustering algorithm is one of the most commonly used methods. However, k-means depends heavily on the initial solution and easily falls into a local optimum. To address these disadvantages of the k-means method, this paper proposes a hybrid monkey algorithm, based on the search operator of the artificial bee colony algorithm, for clustering analysis. Experiments on synthetic and real-life datasets show that the algorithm performs better than the basic monkey algorithm for clustering analysis.
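The sensitivity to initial centroids described above is often mitigated by k-means++ seeding, which picks each new starting centroid with probability proportional to its squared distance from the nearest centroid already chosen. This is a different, widely used technique, not the hybrid monkey algorithm proposed in the paper; the function name is illustrative, and a seeded NumPy generator is passed in for reproducibility.

```python
import numpy as np

def kmeans_pp_init(X, k, rng):
    """Choose k initial centroids from the rows of X by k-means++ seeding.

    Assumes X has at least k distinct rows (otherwise d2 sums to zero).
    """
    X = np.asarray(X, dtype=float)
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        C = np.array(centers)
        # Squared distance from each point to its nearest chosen centroid
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        # Already-chosen points have d2 == 0, so they cannot be picked again
        probs = d2 / d2.sum()
        centers.append(X[rng.choice(len(X), p=probs)])
    return np.array(centers)
```

The returned array can be used as the starting centroids of an ordinary k-means run; far-apart groups are very likely to each receive a seed, which reduces (but does not eliminate) the risk of a poor local optimum.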
Esplin, M Sean; Manuck, Tracy A.; Varner, Michael W.; Christensen, Bryce; Biggio, Joseph; Bukowski, Radek; Parry, Samuel; Zhang, Heping; Huang, Hao; Andrews, William; Saade, George; Sadovsky, Yoel; Reddy, Uma M.; Ilekis, John
2015-01-01
Objective: We sought to employ an innovative tool based on common biological pathways to identify specific phenotypes among women with spontaneous preterm birth (SPTB), in order to enhance investigators' ability to identify common mechanisms and underlying genetic factors responsible for SPTB. Study Design: A secondary analysis of a prospective case-control multicenter study of SPTB. All cases delivered a preterm singleton at ≤34.0 weeks' gestation. Each woman was assessed for the presence of underlying SPTB etiologies. A hierarchical cluster analysis was used to identify groups of women with homogeneous phenotypic profiles. One of the phenotypic clusters was selected for candidate gene association analysis using VEGAS software. Results: 1028 women with SPTB were assigned phenotypes. Hierarchical clustering of the phenotypes revealed five major clusters. Cluster 1 (N=445) was characterized by maternal stress, cluster 2 (N=294) by premature membrane rupture, cluster 3 (N=120) by familial factors, and cluster 4 (N=63) by maternal comorbidities. Cluster 5 (N=106) was multifactorial, characterized by infection (INF), decidual hemorrhage (DH) and placental dysfunction (PD). Chi-square analysis showed these three phenotypes to be highly correlated [PD and DH (p<2.2e-6); PD and INF (p=6.2e-10); INF and DH (p=0.0036)]. Gene-based testing identified the INS (insulin) gene as significantly associated with cluster 3 of SPTB. Conclusion: We identified 5 major clusters of SPTB based on a phenotype tool and hierarchical clustering. There was significant correlation between several of the phenotypes. The INS gene was associated with familial factors underlying SPTB. PMID:26070700
Method for exploratory cluster analysis and visualisation of single-trial ERP ensembles.
Williams, N J; Nasuto, S J; Saddy, J D
2015-07-30
The validity of ensemble averaging on event-related potential (ERP) data has been questioned, due to its assumption that the ERP is identical across trials. Thus, there is a need for preliminary testing for cluster structure in the data. We propose a complete pipeline for the cluster analysis of ERP data. To increase the signal-to-noise ratio (SNR) of the raw single trials, we used a denoising method based on Empirical Mode Decomposition (EMD). Next, we used a bootstrap-based method to determine the number of clusters, through a measure called the Stability Index (SI). We then used a clustering algorithm based on a Genetic Algorithm (GA) to define initial cluster centroids for subsequent k-means clustering. Finally, we visualised the clustering results through a scheme based on Principal Component Analysis (PCA). After validating the pipeline on simulated data, we tested it on data from two experiments: a P300 speller paradigm on a single subject and a language processing study on 25 subjects. Results revealed evidence for the existence of 6 clusters in one experimental condition from the language processing study. Further, a two-way chi-square test revealed an influence of subject on cluster membership. Our analysis operates on denoised single trials, determines the number of clusters in a principled manner, and presents the results through an intuitive visualisation. Given the cluster structure in some experimental conditions, we suggest applying cluster analysis as a preliminary step before ensemble averaging. Copyright © 2015 Elsevier B.V. All rights reserved.
Aikawa, Ken; Kataoka, Masao; Ogawa, Soichiro; Akaihata, Hidenori; Sato, Yuichi; Yabe, Michihiro; Hata, Junya; Koguchi, Tomoyuki; Kojima, Yoshiyuki; Shiragasawa, Chihaya; Kobayashi, Toshimitsu; Yamaguchi, Osamu
2015-08-01
To present a new grouping of male patients with lower urinary tract symptoms (LUTS) based on symptom patterns and clarify whether the therapeutic effect of α1-blocker differs among the groups. We performed secondary analysis of anonymous data from 4815 patients enrolled in a postmarketing surveillance study of tamsulosin in Japan. Data on 7 International Prostate Symptom Score (IPSS) items at the initial visit were used in the cluster analysis. IPSS and quality of life (QOL) scores before and after tamsulosin treatment for 12 weeks were assessed in each cluster. Partial correlation coefficients were also obtained for IPSS and QOL scores based on changes before and after treatment. Five symptom groups were identified by cluster analysis of IPSS. Based on their symptom profiles, the clusters were labeled minimal type (cluster 1), multiple severe type (cluster 2), weak stream type (cluster 3), storage type (cluster 4), and voiding type (cluster 5). Tamsulosin treatment significantly improved the prevalence and mean score of almost all symptoms in all clusters. Nocturia and weak stream had the strongest effect on QOL in clusters 1, 2, and 4 and in clusters 3 and 5, respectively. Cluster analysis of IPSS revealed 5 characteristic symptom patterns in male patients with LUTS. Tamsulosin improved various symptoms and QOL in each symptom group. The study found that many male patients with LUTS were satisfied with tamsulosin monotherapy and suggests the usefulness of α1-blockers as drugs of first choice. Copyright © 2015 Elsevier Inc. All rights reserved.
Multiscale visual quality assessment for cluster analysis with self-organizing maps
NASA Astrophysics Data System (ADS)
Bernard, Jürgen; von Landesberger, Tatiana; Bremm, Sebastian; Schreck, Tobias
2011-01-01
Cluster analysis is an important data mining technique for analyzing large amounts of data, reducing many objects to a limited number of clusters. Cluster visualization techniques aim at supporting the user in better understanding the characteristics and relationships among the found clusters. While promising approaches to visual cluster analysis already exist, these usually fall short of incorporating the quality of the obtained clustering results. However, due to the nature of the clustering process, quality is an important consideration, as for most practical data sets many different clusterings are typically possible. Being aware of clustering quality is important, among other things, to judge the expressiveness of a given cluster visualization or to adjust the clustering process with refined parameters. In this work, we present an encompassing suite of visual tools for quality assessment of an important visual cluster algorithm, namely, the Self-Organizing Map (SOM) technique. We define, measure, and visualize the notion of SOM cluster quality along a hierarchy of cluster abstractions. The quality abstractions range from simple scalar-valued quality scores up to the structural comparison of a given SOM clustering with the output of additional supportive clustering methods. The suite of methods allows the user to assess SOM quality at the appropriate abstraction level and arrive at improved clustering results. We implement our tools in an integrated system, apply it to experimental data sets, and show its applicability.
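Two scalar scores commonly used for SOMs, quantization error (mean distance from each sample to its best-matching unit) and topographic error (fraction of samples whose two best-matching units are not grid neighbours), give a concrete sense of the "simple scalar-valued quality scores" at the bottom of the hierarchy above. The abstract does not say which scalar scores the authors use; this sketch assumes an already trained codebook on a rectangular grid with 4-neighbour adjacency, and `som_quality` is a hypothetical name.

```python
import numpy as np

def som_quality(X, codebook, grid_shape):
    """Return (quantization_error, topographic_error) for a rectangular SOM.

    codebook: array of shape (rows * cols, dim); unit u is assumed to sit at
    grid position (u // cols, u % cols) with 4-neighbour adjacency.
    """
    X = np.asarray(X, dtype=float)
    W = np.asarray(codebook, dtype=float)
    rows, cols = grid_shape
    assert W.shape[0] == rows * cols, "codebook size must match the grid"
    d = np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2)  # (n, units)
    order = np.argsort(d, axis=1)
    bmu, second = order[:, 0], order[:, 1]
    # Quantization error: mean distance from each sample to its best-matching unit
    qe = d[np.arange(len(X)), bmu].mean()
    # Topographic error: share of samples whose two best units are not adjacent
    r1, c1 = bmu // cols, bmu % cols
    r2, c2 = second // cols, second % cols
    adjacent = (np.abs(r1 - r2) + np.abs(c1 - c2)) == 1
    te = 1.0 - adjacent.mean()
    return qe, te
```

A low quantization error with a high topographic error suggests that the map fits the data but is folded, which is one reason to look at more than one scalar score.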
Redman, Regina S.; Ranson, Judith; Rodriguez, Rusty J.
2006-01-01
Cantharellus formosus growing on the Olympic Peninsula of the Pacific Northwest was sampled from September to November 1995 for genetic analysis. A total of ninety-six basidiomes from five clusters, separated from one another by 3 to 25 meters, were genetically characterized by PCR analysis of 13 arbitrary loci and rDNA sequences. The number of basidiomes in each cluster varied from 15 to 25, and genetic analysis delineated 15 genets among the clusters. Analysis of variance using thirteen apPCR-generated molecular markers and PCR amplification of the ribosomal ITS regions indicated that 81.41% of the genetic variation occurred between clusters and 18.59% within clusters. Proximity of the basidiomes within a cluster was not an indicator of genotypic similarity. The molecular profile of each cluster was distinct, and each cluster was defined as a unique population containing 2 to 6 genets. The monitoring and analysis of this species through non-lethal sampling, and future applications, are discussed.
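The between-cluster versus within-cluster split reported above rests on partitioning the total sum of squares of the marker data. A minimal sketch of that partition for binary band profiles, using squared Euclidean distances, is shown below; `ss_partition` is a hypothetical helper. A full analysis of molecular variance (AMOVA) converts these sums of squares into variance components with degrees-of-freedom corrections, so the raw shares computed here are not expected to reproduce figures such as 81.41%, and the abstract does not state exactly which variant the authors used.

```python
import numpy as np

def ss_partition(profiles, groups):
    """Split the total sum of squares of marker profiles into between- and
    within-group parts (squared Euclidean distance, as for binary band data).

    Returns (ss_between, ss_within). These are raw sums of squares, not the
    degrees-of-freedom-corrected variance components an AMOVA would report.
    """
    X = np.asarray(profiles, dtype=float)
    g = np.asarray(groups)
    ss_total = ((X - X.mean(axis=0)) ** 2).sum()
    ss_within = sum(((X[g == k] - X[g == k].mean(axis=0)) ** 2).sum()
                    for k in np.unique(g))
    return ss_total - ss_within, ss_within
```

If every individual in a cluster shares the same band profile, `ss_within` is zero and all variation is between clusters; if each cluster mixes the same profiles in equal proportions, `ss_between` is zero.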
Li, Hai-juan; Zhao, Xin; Jia, Qing-fei; Li, Tian-lai; Ning, Wei
2012-08-01
The morphological and micro-morphological characteristics of the achenes of six species of the genus Taraxacum from northeastern China were observed, together with SRAP cluster analysis, as evidence for their classification. The achenes were observed by microscope and EPMA. Cluster analysis was performed on the basis of the size, shape, cone proportion, color and surface sculpture of the achenes. Achene shape differs markedly among Taraxacum species, particularly in the distribution and size of spinules, achene color and achene size. According to the cluster analysis of achene shape, T. antungense Kitag. and T. urbanum Kitag. should be combined into a single species. The results of the achene morphology cluster analysis and of the SRAP molecular-marker cluster analysis are consistent with the classification in "the Chinese flora". Achene shape characteristics in Taraxacum are stable and conservative, so differences in combinations of achene shape characteristics can be used for interspecific delimitation and analysis of interspecific relationships. Both the achene morphology cluster analysis and the SRAP molecular-marker analysis support the dandelion classification in "the Chinese flora".
Exploratory Item Classification Via Spectral Graph Clustering
Chen, Yunxiao; Li, Xiaoou; Liu, Jingchen; Xu, Gongjun; Ying, Zhiliang
2017-01-01
Large-scale assessments are supported by a large item pool. An important task in test development is to assign items to scales that measure different characteristics of individuals, and a popular approach is cluster analysis of items. Classical methods in cluster analysis, such as hierarchical clustering, the K-means method, and latent-class analysis, often induce a high computational overhead and have difficulty handling missing data, especially in the presence of high-dimensional responses. In this article, the authors propose a spectral clustering algorithm for exploratory item cluster analysis. The method is computationally efficient, effective for data with missing or incomplete responses, easy to implement, and often outperforms traditional clustering algorithms in the context of high dimensionality. The spectral clustering algorithm is based on graph theory, a branch of mathematics that studies the properties of graphs. The algorithm first constructs a graph of items, characterizing the similarity structure among items. It then extracts item clusters based on the graphical structure, grouping similar items together. The proposed method is evaluated through simulations and an application to the revised Eysenck Personality Questionnaire. PMID:29033476
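The graph-then-partition idea can be sketched in its simplest two-cluster form: build an item similarity graph, form the symmetric normalized Laplacian, and split items by the sign of the eigenvector for the second-smallest eigenvalue. This is a simplification, not the authors' algorithm: using absolute Pearson correlations between item columns as edge weights is our assumption, missing responses are not handled, and `spectral_bipartition` is a hypothetical name.

```python
import numpy as np

def spectral_bipartition(responses):
    """Split items (columns of `responses`) into two groups using the sign of
    the second eigenvector of the symmetric normalized graph Laplacian."""
    R = np.asarray(responses, dtype=float)
    # Assumed similarity graph: |Pearson r| between items, no self-loops
    A = np.abs(np.corrcoef(R, rowvar=False))
    np.fill_diagonal(A, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues come back in ascending order
    # Map back to the random-walk Laplacian eigenvector (signs are unchanged)
    fiedler = d_inv_sqrt * vecs[:, 1]
    return (fiedler > 0).astype(int)
```

With more than two scales, the usual extension keeps several leading eigenvectors and runs a standard clustering step (for example k-means) on that embedding.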
NASA Technical Reports Server (NTRS)
Fomenkova, M. N.
1997-01-01
The computer-intensive project consisted of the analysis and synthesis of existing data on the composition of comet Halley dust particles. The main objective was to obtain a complete inventory of sulfur-containing compounds in the comet Halley dust by building upon the existing classification of organic and inorganic compounds and applying a variety of statistical techniques for cluster and cross-correlational analyses. A student hired for this project wrote and tested the software to perform cluster analysis. The following tasks were carried out: (1) selecting the data from the existing database for the proposed project; (2) obtaining access to a standard library of statistical routines for cluster analysis; (3) reformatting the data as necessary for input into the library routines; (4) performing cluster analysis and constructing hierarchical cluster trees using three methods to define the proximity of clusters; (5) presenting the output results in different formats to facilitate the interpretation of the obtained cluster trees; and (6) selecting groups of data points common to all three trees as stable clusters. We have also considered the chemistry of sulfur in inorganic compounds.
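Steps (4) and (6) above can be sketched with SciPy: build one tree per proximity definition, cut each into the same number of clusters, and keep the groups of points that fall together in every tree. The abstract does not name the three methods; single, complete and average linkage are assumed here, as is cutting every tree at the same number of clusters. `stable_groups` is a hypothetical helper.

```python
from collections import defaultdict

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def stable_groups(X, n_clusters, methods=("single", "complete", "average")):
    """Cut one tree per (assumed) linkage method into n_clusters and return the
    groups of point indices that end up together in every tree."""
    X = np.asarray(X, dtype=float)
    labelings = [fcluster(linkage(X, method=m), t=n_clusters, criterion="maxclust")
                 for m in methods]
    # Points sharing the same label in every tree form one stable group
    groups = defaultdict(list)
    for i, key in enumerate(zip(*labelings)):
        groups[key].append(i)
    return sorted(groups.values())
```

For the points `[[0, 0], [0, 1], [10, 10], [10, 11], [50, 50]]` cut into three clusters, all three trees agree and the result is `[[0, 1], [2, 3], [4]]`; where the trees disagree, the intersection splits the disputed points into smaller groups.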
Miller, Christopher B; Bartlett, Delwyn J; Mullins, Anna E; Dodds, Kirsty L; Gordon, Christopher J; Kyle, Simon D; Kim, Jong Won; D'Rozario, Angela L; Lee, Rico S C; Comas, Maria; Marshall, Nathaniel S; Yee, Brendon J; Espie, Colin A; Grunstein, Ronald R
2016-11-01
To empirically derive and evaluate potential clusters of Insomnia Disorder through cluster analysis of polysomnography (PSG). We hypothesized that clusters would differ on neurocognitive performance, sleep-onset measures of quantitative (q)-EEG and heart rate variability (HRV). Research volunteers with Insomnia Disorder (DSM-5) completed a neurocognitive assessment, and overnight PSG measures of total sleep time (TST), wake time after sleep onset (WASO), and sleep onset latency (SOL) were used to determine clusters. From 96 volunteers with Insomnia Disorder, cluster analysis derived at least two clusters from objective sleep parameters: Insomnia with normal objective sleep duration (I-NSD: n = 53) and Insomnia with short sleep duration (I-SSD: n = 43). At sleep onset, differences in HRV between I-NSD and I-SSD clusters suggest attenuated parasympathetic activity in I-SSD (P < 0.05). Preliminary work suggested three clusters by retaining the I-NSD and splitting the I-SSD cluster into two: I-SSD A (n = 29), defined by high WASO, and I-SSD B (n = 14), a second I-SSD cluster with high SOL and medium WASO. The I-SSD B cluster performed worse than I-SSD A and I-NSD for sustained attention (P ≤ 0.05). In an exploratory analysis, q-EEG also revealed reduced spectral power in I-SSD B before (Delta, Alpha, Beta-1) and after sleep onset (Beta-2) compared to I-SSD A and I-NSD (P ≤ 0.05). Two insomnia clusters derived from cluster analysis differ in sleep-onset HRV. Preliminary data suggest evidence for three clusters in insomnia with differences for sustained attention and sleep-onset q-EEG. Insomnia 100 sleep study: Australia New Zealand Clinical Trials Registry (ANZCTR) identification number 12612000049875. URL: https://www.anzctr.org.au/Trial/Registration/TrialReview.aspx?id=347742. © 2016 Associated Professional Sleep Societies, LLC.
Peterson, Leif E
2002-01-01
CLUSFAVOR (CLUSter and Factor Analysis with Varimax Orthogonal Rotation) 5.0 is a Windows-based computer program for hierarchical cluster and principal-component analysis of microarray-based transcriptional profiles. CLUSFAVOR 5.0 standardizes input data; sorts data according to gene-specific coefficient of variation, standard deviation, average and total expression, and Shannon entropy; performs hierarchical cluster analysis using nearest-neighbor, unweighted pair-group method using arithmetic averages (UPGMA), or furthest-neighbor joining methods, and Euclidean, correlation, or jack-knife distances; and performs principal-component analysis. PMID:12184816
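The preprocessing and clustering steps listed for CLUSFAVOR can be approximated with NumPy and SciPy. This is a sketch only, not CLUSFAVOR's code: it sorts genes by coefficient of variation, standardizes each gene, and builds a UPGMA (average-linkage) tree on correlation distance, which is just one of the method and distance combinations the abstract lists. The function name is hypothetical, and constant genes (zero standard deviation) are assumed absent.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

def sort_and_cluster_genes(expr):
    """expr: genes x arrays matrix of raw expression values (hypothetical helper).

    Returns (order, tree): gene indices sorted by descending coefficient of
    variation, and a UPGMA tree whose leaves follow that sorted order.
    """
    E = np.asarray(expr, dtype=float)
    cv = E.std(axis=1) / E.mean(axis=1)  # gene-specific coefficient of variation
    order = np.argsort(cv)[::-1]
    # Standardize each gene to mean 0 and SD 1 across arrays
    std_expr = (E - E.mean(axis=1, keepdims=True)) / E.std(axis=1, keepdims=True)
    dist = pdist(std_expr[order], metric="correlation")  # 1 - Pearson r
    tree = linkage(dist, method="average")  # average linkage = UPGMA
    return order, tree
```

In SciPy, `method="single"` and `method="complete"` correspond to the nearest-neighbor and furthest-neighbor joining methods. Correlation distance is unaffected by per-gene standardization; the standardization step matters when Euclidean distance is chosen instead.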
DICON: interactive visual analysis of multidimensional clusters.
Cao, Nan; Gotz, David; Sun, Jimeng; Qu, Huamin
2011-12-01
Clustering, as a fundamental data analysis technique, has been widely used in many analytic applications. However, it is often difficult for users to understand and evaluate multidimensional clustering results, especially the quality of clusters and their semantics. For large and complex data, high-level statistical information about the clusters is often needed for users to evaluate cluster quality, while a detailed display of multidimensional attributes of the data is necessary to understand the meaning of clusters. In this paper, we introduce DICON, an icon-based cluster visualization that embeds statistical information into a multi-attribute display to facilitate cluster interpretation, evaluation, and comparison. We design a treemap-like icon to represent a multidimensional cluster, and the quality of the cluster can be conveniently evaluated with the embedded statistical information. We further develop a novel layout algorithm that generates similar icons for similar clusters, making comparisons of clusters easier. User interaction and clutter reduction are integrated into the system to help users more effectively analyze and refine clustering results for large datasets. We demonstrate the power of DICON through a user study and a case study in the healthcare domain. Our evaluation shows the benefits of the technique, especially in support of complex multidimensional cluster analysis. © 2011 IEEE
Cluster Correspondence Analysis.
van de Velden, M; D'Enza, A Iodice; Palumbo, F
2017-03-01
A method is proposed that combines dimension reduction and cluster analysis for categorical data by simultaneously assigning individuals to clusters and optimal scaling values to categories in such a way that a single between-cluster variance maximization objective is achieved. In a unified framework, a brief review of alternative methods is provided, and we show that the proposed method is equivalent to GROUPALS applied to categorical data. Performance of the methods is appraised by means of a simulation study. The results of the joint dimension reduction and clustering methods are compared with the so-called tandem approach, a sequential analysis of dimension reduction followed by cluster analysis. The tandem approach is conjectured to perform worse when variables are added that are unrelated to the cluster structure. Our simulation study confirms this conjecture. Moreover, the results of the simulation study indicate that the proposed method also consistently outperforms alternative joint dimension reduction and clustering methods.
Towards Effective Clustering Techniques for the Analysis of Electric Power Grids
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hogan, Emilie A.; Cotilla Sanchez, Jose E.; Halappanavar, Mahantesh
2013-11-30
Clustering is an important data analysis technique with numerous applications in the analysis of electric power grids. Standard clustering techniques are oblivious to the rich structural and dynamic information available for power grids. Therefore, by exploiting the inherent topological and electrical structure in the power grid data, we propose new methods for clustering with applications to model reduction, locational marginal pricing, phasor measurement unit (PMU or synchrophasor) placement, and power system protection. We focus our attention on model reduction for analysis based on time-series information from synchrophasor measurement devices, and on spectral techniques for clustering. By comparing different clustering techniques on two instances of realistic power grids, we show that the solutions are related, and therefore one could leverage that relationship for a computational advantage. Thus, by contrasting different clustering techniques, we make a case for exploiting structure inherent in the data, with implications for several domains including power systems.
Walthouwer, Michel Jean Louis; Oenema, Anke; Soetens, Katja; Lechner, Lilian; de Vries, Hein
2014-11-01
Developing nutrition education interventions based on clusters of dietary patterns can only be done adequately when it is clear whether distinctive clusters of dietary patterns can be derived and reproduced over time, whether cluster membership is stable, and whether it is predictable which type of people belong to a certain cluster. Hence, this study aimed to: (1) identify clusters of dietary patterns among Dutch adults, (2) test the reproducibility of these clusters and stability of cluster membership over time, and (3) identify sociodemographic predictors of cluster membership and cluster transition. This study had a longitudinal design with online measurements at baseline (N=483) and 6 months follow-up (N=379). Dietary intake was assessed with a validated food frequency questionnaire. A hierarchical cluster analysis was performed, followed by a K-means cluster analysis. Multinomial logistic regression analyses were conducted to identify the sociodemographic predictors of cluster membership and cluster transition. At baseline and follow-up, a comparable three-cluster solution was derived, distinguishing a healthy, moderately healthy, and unhealthy dietary pattern. Male and lower educated participants were significantly more likely to have a less healthy dietary pattern. Further, 251 (66.2%) participants remained in the same cluster, 45 (11.9%) participants changed to an unhealthier cluster, and 83 (21.9%) participants shifted to a healthier cluster. Men and people living alone were significantly more likely to shift toward a less healthy dietary pattern. Distinctive clusters of dietary patterns can be derived. Yet, cluster membership is unstable, and only a few sociodemographic factors were associated with cluster membership and cluster transition. These findings imply that clusters based on dietary intake may not be suitable as a basis for nutrition education interventions. Copyright © 2014 Elsevier Ltd. All rights reserved.
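The two-step procedure above (hierarchical clustering followed by K-means) is commonly implemented by letting the hierarchical solution supply the starting centroids for K-means. A minimal SciPy sketch under that assumption follows; the abstract does not name the linkage, so Ward's method is an assumed choice, and `ward_then_kmeans` is a hypothetical helper.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.cluster.vq import kmeans2

def ward_then_kmeans(X, k):
    """Two-step clustering: Ward hierarchical clustering (an assumed linkage)
    supplies k starting centroids, which k-means then refines."""
    X = np.asarray(X, dtype=float)
    hier = fcluster(linkage(X, method="ward"), t=k, criterion="maxclust")
    # Centroid of each hierarchical cluster becomes a k-means starting point
    seeds = np.array([X[hier == c].mean(axis=0) for c in np.unique(hier)])
    centroids, labels = kmeans2(X, seeds, minit="matrix")
    return centroids, labels
```

Seeding from the hierarchical solution makes the K-means step deterministic for a given dataset, which helps when the same procedure is rerun on follow-up data and the solutions are compared.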
X-ray and optical substructures of the DAFT/FADA survey clusters
NASA Astrophysics Data System (ADS)
Guennou, L.; Durret, F.; Adami, C.; Lima Neto, G. B.
2013-04-01
We have undertaken the DAFT/FADA survey with the double aim of setting constraints on dark energy based on weak lensing tomography and of obtaining homogeneous and high quality data for a sample of 91 massive clusters in the redshift range 0.4-0.9 for which there were HST archive data. We have analysed the XMM-Newton data available for 42 of these clusters to derive their X-ray temperatures and luminosities and search for substructures. Out of these, a spatial analysis was possible for 30 clusters, but only 23 had deep enough X-ray data for a robust analysis. This study was coupled with a dynamical analysis of the 26 clusters having at least 30 spectroscopic galaxy redshifts in the cluster range. Altogether, the X-ray sample of 23 clusters and the optical sample of 26 clusters have 14 clusters in common. We present preliminary results on the coupled X-ray and dynamical analyses of these 14 clusters.
Identifying novel phenotypes of acute heart failure using cluster analysis of clinical variables.
Horiuchi, Yu; Tanimoto, Shuzou; Latif, A H M Mahbub; Urayama, Kevin Y; Aoki, Jiro; Yahagi, Kazuyuki; Okuno, Taishi; Sato, Yu; Tanaka, Tetsu; Koseki, Keita; Komiyama, Kota; Nakajima, Hiroyoshi; Hara, Kazuhiro; Tanabe, Kengo
2018-07-01
Acute heart failure (AHF) is a heterogeneous disease caused by various cardiovascular (CV) pathophysiologies and multiple non-CV comorbidities. We aimed to identify clinically important subgroups to improve our understanding of the pathophysiology of AHF and inform clinical decision-making. We evaluated detailed clinical data of 345 consecutive AHF patients using non-hierarchical cluster analysis of 77 variables, including age, sex, HF etiology, comorbidities, physical findings, laboratory data, electrocardiogram, echocardiogram and treatment during hospitalization. Cox proportional hazards regression analysis was performed to estimate the association between the clusters and clinical outcomes. Three clusters were identified. Cluster 1 (n=108) represented "vascular failure". This cluster had the highest average systolic blood pressure at admission and lung congestion with type 2 respiratory failure. Cluster 2 (n=89) represented "cardiac and renal failure". They had the lowest ejection fraction (EF) and worst renal function. Cluster 3 (n=148) comprised mostly older patients and had the highest prevalence of atrial fibrillation and preserved EF. Death or HF hospitalization within 12 months occurred in 23% of Cluster 1, 36% of Cluster 2 and 36% of Cluster 3 (p=0.034). Compared with Cluster 1, the hazard ratio for death or HF hospitalization was 1.74 (95% CI, 1.03-2.95, p=0.037) for Cluster 2 and 1.82 (95% CI, 1.13-2.93, p=0.014) for Cluster 3. Cluster analysis may be effective in producing clinically relevant categories of AHF, and may suggest underlying pathophysiology and potential utility in predicting clinical outcomes. Copyright © 2018 Elsevier B.V. All rights reserved.
Mixture modelling for cluster analysis.
McLachlan, G J; Chang, S U
2004-10-01
Cluster analysis via a finite mixture model approach is considered. With this approach to clustering, the data can be partitioned into a specified number of clusters g by first fitting a mixture model with g components. An outright clustering of the data is then obtained by assigning an observation to the component to which it has the highest estimated posterior probability of belonging; that is, the ith cluster consists of those observations assigned to the ith component (i = 1,..., g). The focus is on the use of mixtures of normal components for the cluster analysis of data that can be regarded as continuous. Attention is also given to the case of mixed data, where the observations consist of both continuous and discrete variables.
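The fit-then-assign procedure described above can be illustrated for univariate data: fit a mixture of g normal components by the EM algorithm, then assign each observation to the component with the highest estimated posterior probability. This is a minimal one-dimensional sketch under our own choices (starting means supplied by the caller, a small variance floor, a fixed iteration count); `gmm_1d_cluster` is a hypothetical name, and the multivariate and mixed-data cases discussed in the abstract are not covered.

```python
import numpy as np

def gmm_1d_cluster(x, init_means, n_iter=200, var_floor=1e-6):
    """Fit a 1-D normal mixture with g = len(init_means) components by EM, then
    give an outright clustering by the highest posterior probability.

    Returns (labels, tau), where tau[i, j] are the posteriors from the final E-step.
    """
    x = np.asarray(x, dtype=float)
    mu = np.array(init_means, dtype=float)
    g = len(mu)
    var = np.full(g, x.var())
    pi = np.full(g, 1.0 / g)
    for _ in range(n_iter):
        # E-step (log space for stability): tau[i, j] = P(component j | x_i)
        log_w = np.log(pi) - 0.5 * ((x[:, None] - mu) ** 2 / var
                                    + np.log(2 * np.pi * var))
        log_w -= log_w.max(axis=1, keepdims=True)
        tau = np.exp(log_w)
        tau /= tau.sum(axis=1, keepdims=True)
        # M-step: mixing proportions, means and variances weighted by tau
        nj = tau.sum(axis=0)
        pi = nj / len(x)
        mu = (tau * x[:, None]).sum(axis=0) / nj
        var = np.maximum((tau * (x[:, None] - mu) ** 2).sum(axis=0) / nj, var_floor)
    # Outright clustering: each observation goes to its highest-posterior component
    return tau.argmax(axis=1), tau
```

For the observations `[-1, 0, 1, 9, 10, 11]` with starting means `[0, 10]`, the labels are `[0, 0, 0, 1, 1, 1]`. Keeping `tau` alongside the labels preserves the soft membership information that the outright clustering discards.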
Miller, Christopher B.; Bartlett, Delwyn J.; Mullins, Anna E.; Dodds, Kirsty L.; Gordon, Christopher J.; Kyle, Simon D.; Kim, Jong Won; D'Rozario, Angela L.; Lee, Rico S.C.; Comas, Maria; Marshall, Nathaniel S.; Yee, Brendon J.; Espie, Colin A.; Grunstein, Ronald R.
2016-01-01
Study Objectives: To empirically derive and evaluate potential clusters of Insomnia Disorder through cluster analysis of polysomnography (PSG). We hypothesized that clusters would differ on neurocognitive performance, sleep-onset measures of quantitative (q)-EEG and heart rate variability (HRV). Methods: Research volunteers with Insomnia Disorder (DSM-5) completed a neurocognitive assessment, and overnight PSG measures of total sleep time (TST), wake time after sleep onset (WASO), and sleep onset latency (SOL) were used to determine clusters. Results: From 96 volunteers with Insomnia Disorder, cluster analysis derived at least two clusters from objective sleep parameters: Insomnia with normal objective sleep duration (I-NSD: n = 53) and Insomnia with short sleep duration (I-SSD: n = 43). At sleep onset, differences in HRV between I-NSD and I-SSD clusters suggest attenuated parasympathetic activity in I-SSD (P < 0.05). Preliminary work suggested three clusters by retaining the I-NSD and splitting the I-SSD cluster into two: I-SSD A (n = 29), defined by high WASO, and I-SSD B (n = 14), a second I-SSD cluster with high SOL and medium WASO. The I-SSD B cluster performed worse than I-SSD A and I-NSD for sustained attention (P ≤ 0.05). In an exploratory analysis, q-EEG also revealed reduced spectral power in I-SSD B before (Delta, Alpha, Beta-1) and after sleep onset (Beta-2) compared to I-SSD A and I-NSD (P ≤ 0.05). Conclusions: Two insomnia clusters derived from cluster analysis differ in sleep-onset HRV. Preliminary data suggest evidence for three clusters in insomnia with differences for sustained attention and sleep-onset q-EEG. Clinical Trial Registration: Insomnia 100 sleep study: Australia New Zealand Clinical Trials Registry (ANZCTR) identification number 12612000049875. URL: https://www.anzctr.org.au/Trial/Registration/TrialReview.aspx?id=347742.
Citation: Miller CB, Bartlett DJ, Mullins AE, Dodds KL, Gordon CJ, Kyle SD, Kim JW, D'Rozario AL, Lee RS, Comas M, Marshall NS, Yee BJ, Espie CA, Grunstein RR. Clusters of Insomnia Disorder: an exploratory cluster analysis of objective sleep parameters reveals differences in neurocognitive functioning, quantitative EEG, and heart rate variability. SLEEP 2016;39(11):1993–2004. PMID:27568796
Lin, Nan; Jiang, Junhai; Guo, Shicheng; Xiong, Momiao
2015-01-01
Due to advances in sensor technology, growing volumes of medical image data make it possible to visualize anatomical changes in biological tissues. As a consequence, medical images have the potential to enhance the diagnosis of disease, the prediction of clinical outcomes, and the characterization of disease progression. At the same time, however, the growing data dimensions pose great methodological and computational challenges for the representation and selection of features in image cluster analysis. To address these challenges, we first extend functional principal component analysis (FPCA) from one dimension to two dimensions to fully capture the spatial variation of the image signals. The image signals contain a large number of redundant features that provide no additional information for cluster analysis. The widely used methods for removing irrelevant features are sparse clustering algorithms that use a lasso-type penalty to select features. However, the accuracy of clustering with a lasso-type penalty depends on the choice of the penalty parameters and the threshold value, which are difficult to determine in practice. Recently, randomized algorithms have received a great deal of attention in big data analysis. This paper presents a randomized algorithm for accurate feature selection in image cluster analysis. The proposed method is applied to both liver and kidney cancer histology image data from the TCGA database. The results demonstrate that the randomized feature selection method coupled with functional principal component analysis substantially outperforms current sparse clustering algorithms in image cluster analysis. PMID:26196383
Bae, Hyoung Won; Ji, Yongwoo; Lee, Hye Sun; Lee, Naeun; Hong, Samin; Seong, Gong Je; Sung, Kyung Rim; Kim, Chan Yun
2015-01-01
Normal-tension glaucoma (NTG) is a heterogeneous disease, and its subclassification remains controversial. On the basis of spectral-domain optical coherence tomography (SD-OCT), we subdivided NTG with hierarchical cluster analysis using optic nerve head (ONH) parameters and retinal nerve fiber layer (RNFL) thicknesses. A total of 200 eyes of 200 NTG patients underwent SD-OCT scans between March 2011 and June 2012 to measure ONH parameters and RNFL thicknesses. We classified NTG into homogeneous subgroups based on these variables using hierarchical cluster analysis, and compared the clusters to evaluate diverse NTG characteristics. Three clusters were found. Cluster 1 (62 eyes) had the thickest RNFL and widest rim area, and showed early glaucoma features. Cluster 2 (60 eyes) was characterized by the largest cup/disc ratio and cup volume, and showed advanced glaucomatous damage. Cluster 3 (78 eyes) had small disc areas on SD-OCT and comprised patients who were significantly younger, with longer axial length and greater myopia, than the other 2 groups. A hierarchical cluster analysis of SD-OCT scans divided NTG patients into 3 groups based on ONH parameters and RNFL thicknesses. It is anticipated that the small disc area group, comprising younger and more myopic patients, may show features distinct from those of the other 2 groups.
Esplin, M Sean; Manuck, Tracy A; Varner, Michael W; Christensen, Bryce; Biggio, Joseph; Bukowski, Radek; Parry, Samuel; Zhang, Heping; Huang, Hao; Andrews, William; Saade, George; Sadovsky, Yoel; Reddy, Uma M; Ilekis, John
2015-09-01
We sought to use an innovative tool based on common biologic pathways to identify specific phenotypes among women with spontaneous preterm birth (SPTB), in order to enhance investigators' ability to identify and highlight common mechanisms and underlying genetic factors responsible for SPTB. We performed a secondary analysis of a prospective case-control multicenter study of SPTB. All cases delivered a preterm singleton at ≤34.0 weeks' gestation. Each woman was assessed for the presence of underlying SPTB causes. A hierarchic cluster analysis was used to identify groups of women with homogeneous phenotypic profiles. One of the phenotypic clusters was selected for candidate gene association analysis with the use of VEGAS software. One thousand twenty-eight women with SPTB were assigned phenotypes. Hierarchic clustering of the phenotypes revealed 5 major clusters. Cluster 1 (n = 445) was characterized by maternal stress; cluster 2 (n = 294) by premature membrane rupture; cluster 3 (n = 120) by familial factors; and cluster 4 (n = 63) by maternal comorbidities. Cluster 5 (n = 106) was multifactorial and characterized by infection (INF), decidual hemorrhage (DH), and placental dysfunction (PD). These 3 phenotypes were highly correlated by χ(2) analysis (PD and DH, P < 2.2e-6; PD and INF, P = 6.2e-10; INF and DH, P = .0036). Gene-based testing identified the INS (insulin) gene as significantly associated with cluster 3 of SPTB. We identified 5 major clusters of SPTB based on a phenotype tool and hierarchic clustering. There was significant correlation between several of the phenotypes. The INS gene was associated with familial factors underlying SPTB. Copyright © 2015 Elsevier Inc. All rights reserved.
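The abstract reports pairwise χ² associations between the infection, decidual hemorrhage and placental dysfunction phenotypes. As a minimal sketch, assuming each pair of phenotypes is cross-tabulated as present/absent in a 2x2 table and tested with Pearson's χ² without Yates' continuity correction (the abstract does not say which variant was used), the statistic and its 1-degree-of-freedom p-value need only the standard library. The function name and counts are hypothetical.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-squared test of independence for the 2x2 table [[a, b], [c, d]].

    Rows: phenotype X present / absent; columns: phenotype Y present / absent.
    No continuity correction. All row and column totals must be non-zero.
    Returns (statistic, p_value).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    # Expected counts under independence: row total * column total / n
    expected = [row1 * col1 / n, row1 * col2 / n, row2 * col1 / n, row2 * col2 / n]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts: 40 women with both phenotypes, 60 with X only,
# 50 with Y only, 850 with neither
stat, p = chi_square_2x2(40, 60, 50, 850)
print(f"chi2 = {stat:.2f}, p = {p:.3g}")
```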
Cluster analysis of the hot subdwarfs in the PG survey
NASA Technical Reports Server (NTRS)
Thejll, Peter; Charache, Darryl; Shipman, Harry L.
1989-01-01
Application of cluster analysis to the hot subdwarfs in the Palomar Green (PG) survey of faint blue high-Galactic-latitude objects is assessed, with emphasis on data noise and the number of clusters to subdivide the data into. The data used in the study are presented, and cluster analysis, using the CLUSTAN program, is applied to it. Distances are calculated using the Euclidean formula, and clustering is done by Ward's method. The results are discussed, and five groups representing natural divisions of the subdwarfs in the PG survey are presented.
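The study ran CLUSTAN with Euclidean distances and Ward's method. CLUSTAN itself is not reproduced here; the following is a minimal from-scratch sketch of Ward's criterion, assuming each object is a tuple of numeric features: at every step it merges the pair of clusters whose union least increases the total within-cluster sum of squares, stopping when k clusters remain. Function names and data are hypothetical.

```python
def centroid(cluster):
    """Coordinate-wise mean of a list of points."""
    return [sum(coords) / len(cluster) for coords in zip(*cluster)]

def ward_cost(ca, cb):
    """Increase in total within-cluster sum of squares if clusters ca and cb merge."""
    na, nb = len(ca), len(cb)
    sq = sum((x - y) ** 2 for x, y in zip(centroid(ca), centroid(cb)))
    return na * nb / (na + nb) * sq

def ward_clusters(points, k):
    """Agglomerate singletons with Ward's criterion until k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # Find the pair whose merger costs least (ties: first pair found)
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                cost = ward_cost(clusters[a], clusters[b])
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters

# Hypothetical 2-D feature vectors for six objects (not PG survey data)
objects = [(0.0, 0.0), (0.1, 0.2), (2.0, 2.1), (2.2, 1.9), (5.0, 0.1), (5.1, 0.0)]
for group in ward_clusters(objects, 3):
    print(group)
```

This naive version recomputes every pairwise cost at each step, which is fine for illustration but scales poorly; library routines use the Lance-Williams update instead.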
Using Machine Learning Techniques in the Analysis of Oceanographic Data
NASA Astrophysics Data System (ADS)
Falcinelli, K. E.; Abuomar, S.
2017-12-01
Acoustic Doppler Current Profilers (ADCPs) are oceanographic tools capable of collecting large amounts of current profile data. Using unsupervised machine learning techniques such as principal component analysis, fuzzy c-means clustering, and self-organizing maps, patterns and trends in an ADCP dataset are found. Cluster validity algorithms such as visual assessment of cluster tendency and clustering index are used to determine the optimal number of clusters in the ADCP dataset. These techniques prove to be useful in analysis of ADCP data and demonstrate potential for future use in other oceanographic applications.
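Fuzzy c-means, one of the techniques named above, gives each observation a graded membership in every cluster rather than a single label. The sketch below illustrates the standard alternating updates (membership-weighted centers, then memberships from relative distances) in plain Python, assuming a fuzzifier m = 2 and random initial memberships; it is not the authors' code, and the function name and toy points are hypothetical. Applied to ADCP data, each point might be one current profile flattened into a vector.

```python
import math
import random

def fuzzy_c_means(points, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Return (centers, u) where u[i][j] is the membership of point i in cluster j."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; each row sums to 1
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])
    centers = []
    for _ in range(max_iter):
        # Centers: means weighted by membership ** m (m > 1 controls fuzziness)
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            wsum = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / wsum
                            for d in range(dim)])
        # Memberships: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1))
        new_u = []
        for i in range(n):
            dists = [math.dist(points[i], centers[j]) for j in range(c)]
            if 0.0 in dists:
                # Point coincides with a center: full membership there
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                new_u.append([h / sum(hits) for h in hits])
            else:
                new_u.append([1.0 / sum((dists[j] / dists[k]) ** (2 / (m - 1))
                                        for k in range(c))
                              for j in range(c)])
        shift = max(abs(new_u[i][j] - u[i][j]) for i in range(n) for j in range(c))
        u = new_u
        if shift < tol:
            break
    return centers, u

# Hypothetical 2-D observations forming two groups
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, u = fuzzy_c_means(pts, 2)
print([max(range(2), key=lambda j: row[j]) for row in u])  # hardened labels
```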
Impact of Sampling Density on the Extent of HIV Clustering
Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor
2014-01-01
Identifying and monitoring HIV clusters could be useful in tracking the leading edge of HIV transmission in epidemics. Currently, greater specificity in the definition of HIV clusters is needed to reduce confusion in the interpretation of HIV clustering results. We address sampling density as one of the key aspects of HIV cluster analysis. The proportion of viral sequences in clusters was estimated at sampling densities from 1.0% to 70%. A set of 1,248 HIV-1C env gp120 V1C5 sequences from a single community in Botswana was used in simulation studies. Matching numbers of HIV-1C V1C5 sequences from the LANL HIV Database were used as comparators. HIV clusters were identified by phylogenetic inference under bootstrapped maximum likelihood and pairwise distance cut-offs. Sampling density below 10% was associated with stochastic HIV clustering with broad confidence intervals. HIV clustering increased linearly at sampling density >10%, and was accompanied by narrowing confidence intervals. Patterns of HIV clustering were similar at bootstrap thresholds 0.7 to 1.0, but the extent of HIV clustering decreased with higher bootstrap thresholds. The origin of sampling (locally concentrated vs. globally scattered) had a substantial impact on HIV clustering at sampling densities ≥10%. Pairwise distances at 10% were estimated as a threshold for cluster analysis of HIV-1 V1C5 sequences. The node bootstrap support distribution provided additional evidence for 10% sampling density as the threshold for HIV cluster analysis. The detectability of HIV clusters is substantially affected by sampling density. A minimal genotyping density of 10% and sampling density of 50–70% are suggested for HIV-1 V1C5 cluster analysis. PMID:25275430
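The pairwise-distance part of the cluster definition can be illustrated with a small sketch, assuming aligned sequences of equal length, uncorrected p-distances, and single-linkage grouping (two sequences share a cluster when a chain of pairs lies within the cut-off). It omits the bootstrapped maximum-likelihood phylogeny the study also used; the cut-off value, function names and sequences are hypothetical. Re-running `proportion_clustered` on random subsamples at several densities mimics the kind of sampling-density experiment described.

```python
import random
from itertools import combinations

def p_distance(s1, s2):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    return sum(1 for a, b in zip(s1, s2) if a != b) / len(s1)

def distance_clusters(seqs, cutoff):
    """Group sequences linked by p-distance <= cutoff (single linkage via union-find)."""
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if p_distance(seqs[i], seqs[j]) <= cutoff:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

def proportion_clustered(seqs, cutoff):
    """Fraction of sequences that fall in a cluster of size >= 2."""
    groups = distance_clusters(seqs, cutoff)
    return sum(len(g) for g in groups if len(g) >= 2) / len(seqs)

def subsample(seqs, density, seed=0):
    """Random subsample containing about density * len(seqs) sequences."""
    k = max(1, round(density * len(seqs)))
    return random.Random(seed).sample(seqs, k)

# Hypothetical aligned fragments; real V1C5 alignments are far longer
seqs = ["ACGTACGTAC", "ACGTACGTAT", "ACGTTCGTAC", "TTGTACCTAG", "GGCAACGTTC"]
for density in (0.4, 0.6, 1.0):
    print(density, proportion_clustered(subsample(seqs, density, seed=1), cutoff=0.1))
```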
Bae, Hyoung Won; Rho, Seungsoo; Lee, Hye Sun; Lee, Naeun; Hong, Samin; Seong, Gong Je; Sung, Kyung Rim; Kim, Chan Yun
2014-04-29
To classify medically treated open-angle glaucoma (OAG) by the pattern of progression using hierarchical cluster analysis, and to determine OAG progression characteristics by comparing clusters. The study included 95 eyes of 95 OAG patients who received medical treatment and had undergone visual field (VF) testing at least once per year for 5 or more years. OAG was classified into subgroups using hierarchical cluster analysis based on the following five variables: baseline mean deviation (MD), baseline visual field index (VFI), MD slope, VFI slope, and Glaucoma Progression Analysis (GPA) printout. Other parameters were then compared between clusters. Hierarchical cluster analysis yielded two clusters. Cluster 1 showed -4.06 ± 2.43 dB baseline MD, 92.58% ± 6.27% baseline VFI, -0.28 ± 0.38 dB per year MD slope, -0.52% ± 0.81% per year VFI slope, and only "no progression" cases in the GPA printout, whereas cluster 2 showed -8.68 ± 3.81 dB baseline MD, 77.54% ± 12.98% baseline VFI, -0.72 ± 0.55 dB per year MD slope, -2.22% ± 1.89% per year VFI slope, and seven "possible" and four "likely" progression cases in the GPA printout. There were no significant differences in age, sex, mean IOP, central corneal thickness, or axial length between clusters. However, compared with cluster 1, cluster 2 included significantly more high-tension glaucoma patients and used significantly more antiglaucoma eye drops. Hierarchical cluster analysis of progression patterns divided OAG into slow and fast progression groups, as evidenced by the parameters of glaucomatous progression in VF testing. In the fast progression group, the prevalence of high-tension glaucoma was greater and more antiglaucoma medications were administered than in the slow progression group. Copyright 2014 The Association for Research in Vision and Ophthalmology, Inc.
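The abstract does not state how the MD and VFI slopes were obtained. A common approach, assumed here, is ordinary least-squares regression of each eye's yearly values on follow-up time, which gives a slope in dB per year for MD (or % per year for VFI). The function name and MD series are hypothetical.

```python
def ols_slope(times, values):
    """Least-squares slope of values regressed on times (e.g. MD in dB vs. years)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    sxx = sum((t - mean_t) ** 2 for t in times)
    return sxy / sxx

# Hypothetical yearly MD values (dB) for one eye over 5 years of follow-up
years = [0, 1, 2, 3, 4, 5]
md = [-4.1, -4.3, -4.2, -4.6, -4.7, -4.9]
print(f"MD slope: {ols_slope(years, md):.2f} dB/year")
```

Per-eye slopes computed this way, together with the baseline values, would form the feature vectors passed to the hierarchical cluster analysis.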
Elemental Abundances in the Intracluster Gas and the Hot Galactic Coronae in Cluster A194
NASA Technical Reports Server (NTRS)
Forman, William R.
1997-01-01
We have completed the analysis of observations of the Coma cluster and are continuing the analysis of A1367, both of which are shown to be merging clusters. We are also analyzing observations of the Centaurus cluster, which we see as a merger based on both its temperature and surface brightness distributions. Attachment: Another collision for the Coma cluster.
A Cluster of Legionella-Associated Pneumonia Cases in a Population of Military Recruits
2007-06-01
this cluster may suggest a previously unrecognized suscep- FIG. 1. Phylogenic analysis of the training center strain (represented by the MCRD consensus...military recruits during population-based surveillance for pneumonia pathogens. Results were confirmed by sequence analysis. Cases cluster tightly...17 April 2007 A Legionella cluster was identified through retrospective PCR analysis of 240 throat swab samples from X-ray-confirmed pneumonia cases
A scoping review of spatial cluster analysis techniques for point-event data.
Fritz, Charles E; Schuurman, Nadine; Robertson, Colin; Lear, Scott
2013-05-01
Spatial cluster analysis is a uniquely interdisciplinary endeavour, so it is important to communicate and disseminate ideas, innovations, best practices and challenges across practitioners, applied epidemiology researchers and spatial statisticians. In this research we conducted a scoping review to systematically search peer-reviewed journal databases for research that has employed spatial cluster analysis methods on individual-level, address location, or x and y coordinate derived data. To illustrate the thematic issues raised by our results, methods were tested using a dataset where known clusters existed. Point pattern methods, spatial clustering and cluster detection tests, and a locally weighted spatial regression model were most commonly used for individual-level, address location data (n = 29). The spatial scan statistic was the most popular method for address location data (n = 19). Six themes were identified relating to the application of spatial cluster analysis methods and subsequent analyses, which we recommend that researchers consider: exploratory analysis, visualization, spatial resolution, aetiology, scale and spatial weights. It is our intention that researchers seeking direction on using spatial cluster analysis methods consider the caveats and strengths of each approach, but also explore the numerous other methods available for this type of analysis. Applied spatial epidemiology researchers and practitioners should give special consideration to applying multiple tests to a dataset. Future research should focus on developing frameworks for selecting appropriate methods and the corresponding spatial weighting schemes.
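Because the spatial scan statistic was the most frequently used method for address location data, a compact sketch of its Poisson version (Kulldorff's log-likelihood ratio over circular windows) may help show what it computes. It assumes case counts and positive populations aggregated at point locations, grows circles around each location up to 50% of the total population, and skips the Monte Carlo replication needed for a p-value. It is not SaTScan or any specific package; the names and data are hypothetical.

```python
import math

def poisson_llr(c, expected, total):
    """Kulldorff's Poisson log-likelihood ratio for one window.

    c: observed cases inside, expected: expected cases inside under the null,
    total: all observed cases. Windows without excess risk score 0.
    """
    if c <= expected:
        return 0.0
    llr = c * math.log(c / expected)
    outside_c, outside_e = total - c, total - expected
    if outside_c > 0:  # 0 * log(0) is taken as 0
        llr += outside_c * math.log(outside_c / outside_e)
    return llr

def circular_scan(locations, cases, population, max_pop_frac=0.5):
    """Return (best_llr, centre_index, member_indices) over circular windows."""
    total_cases, total_pop = sum(cases), sum(population)
    best = (0.0, None, [])
    for ci, centre in enumerate(locations):
        # Grow the circle by adding locations in order of distance from the centre
        order = sorted(range(len(locations)), key=lambda j: math.dist(centre, locations[j]))
        c = pop = 0
        members = []
        for j in order:
            if pop + population[j] > max_pop_frac * total_pop:
                break
            c += cases[j]
            pop += population[j]
            members.append(j)
            expected = total_cases * pop / total_pop
            llr = poisson_llr(c, expected, total_cases)
            if llr > best[0]:
                best = (llr, ci, list(members))
    return best

# Hypothetical data: four equally populated sites, one with excess cases
locs = [(0, 0), (10, 0), (0, 10), (10, 10)]
print(circular_scan(locs, [30, 5, 5, 5], [100, 100, 100, 100]))
```

Significance would then be judged by comparing the best LLR with the maxima from many datasets simulated under the null.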
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mitchell, John; Castillo, Andrew
2016-09-21
This software contains a set of Python modules (input, search, cluster, analysis); these modules read input files containing spatial coordinates and associated attributes, which can be used to perform nearest-neighbor search (spatial indexing via a k-d tree), cluster analysis/identification, and calculation of spatial statistics.
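The nearest-neighbor search described above relies on a k-d tree. The record does not show the modules' interfaces, so the following is an independent minimal sketch (hypothetical function names) of building a k-d tree by median splits and querying the nearest stored point, pruning any subtree whose splitting plane lies farther away than the best match found so far. For production use, established implementations such as SciPy's `scipy.spatial.KDTree` are the usual choice.

```python
import math
from collections import namedtuple

KDNode = namedtuple("KDNode", "point index axis left right")

def build_kdtree(items, depth=0):
    """Build a k-d tree from (point, index) pairs, splitting at the median of axis depth % k."""
    if not items:
        return None
    axis = depth % len(items[0][0])
    items = sorted(items, key=lambda item: item[0][axis])
    mid = len(items) // 2
    point, index = items[mid]
    return KDNode(point, index, axis,
                  build_kdtree(items[:mid], depth + 1),
                  build_kdtree(items[mid + 1:], depth + 1))

def nearest(node, target, best=None):
    """Return (distance, index) of the stored point closest to target."""
    if node is None:
        return best
    d = math.dist(node.point, target)
    if best is None or d < best[0]:
        best = (d, node.index)
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)
    # The far side can only hold a closer point if the splitting plane is nearer than the best so far
    if abs(diff) < best[0]:
        best = nearest(far, target, best)
    return best

coords = [(2.0, 3.0), (5.0, 4.0), (9.0, 6.0), (4.0, 7.0), (8.0, 1.0), (7.0, 2.0)]
tree = build_kdtree([(p, i) for i, p in enumerate(coords)])
dist, idx = nearest(tree, (9.0, 2.0))  # idx 4, the point (8.0, 1.0)
```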
NASA Astrophysics Data System (ADS)
Xin, Hangshu; Yu, Peiqiang
2013-10-01
No information is available on the molecular structural profiles, mainly those related to carbohydrate biopolymers, of co-products from carinata bio-fuel and bio-oil processing (carinata meal) in relation to ruminant nutrition. Molecular analyses using the Fourier transform infrared spectroscopy (FT/IR) technique with attenuated total reflectance (ATR), combined with chemometrics, enable the detection of structural features on a molecular basis. The objectives of this study were to: (1) determine carbohydrate conformation spectral features in original carinata meal, co-products from bio-fuel/bio-oil processing; and (2) investigate differences in carbohydrate molecular composition and functional group spectral intensities after in situ ruminal fermentation at 0, 12, 24 and 48 h compared to canola meal as a reference. The molecular spectroscopic parameters of carbohydrate profiles detected were structural carbohydrates (STCHO, mainly associated with hemi-cellulosic and cellulosic compounds; region and baseline ca. 1483-1184 cm⁻¹), cellulosic compounds (CELC, region and baseline ca. 1304-1184 cm⁻¹), total carbohydrates (CHO, region and baseline ca. 1193-889 cm⁻¹) as well as the spectral ratios calculated based on respective spectral intensity data. The results showed that the spectral profiles of carinata meal were significantly different from those of canola meal in CHO 2nd peak area (center at ca. 1091 cm⁻¹, region: 1102-1083 cm⁻¹) and functional group peak intensity ratios such as STCHO 1st peak (ca. 1415 cm⁻¹) to 2nd peak (ca. 1374 cm⁻¹) height ratio, CHO 1st peak (ca. 1149 cm⁻¹) to 3rd peak (ca. 1032 cm⁻¹) height ratio, CELC to total CHO area ratio and STCHO to CELC area ratio, indicating that carinata meal may not be in full accord with canola meal in carbohydrate utilization and availability in ruminants.
Carbohydrate conformation and spectral features were changed by a significant interaction of meal type and incubation time, and almost all the spectral parameters decreased significantly (P < 0.05) during 48 h of ruminal degradation in both carinata meal and canola meal. Although carinata meal differed from canola meal in some carbohydrate spectral parameters, multivariate results from agglomerative hierarchical cluster analysis and principal component analysis showed that the original and in situ residues of the two meals were not fully distinguished from each other within the carbohydrate spectral regions. It was concluded that carbohydrate structural conformation could be detected in carinata meal by using ATR-FT/IR techniques, and that further study is needed to explore the molecular spectral features of other functional groups, such as the protein structure profile, and their association with potential nutrient supply and availability of carinata meal in animals.
Xin, Hangshu; Yu, Peiqiang
2013-10-01
No information is available on the molecular structural profiles, mainly those related to carbohydrate biopolymers, of co-products from carinata bio-fuel and bio-oil processing (carinata meal) in relation to ruminant nutrition. Molecular analyses using the Fourier transform infrared spectroscopy (FT/IR) technique with attenuated total reflectance (ATR), combined with chemometrics, enable the detection of structural features on a molecular basis. The objectives of this study were to: (1) determine carbohydrate conformation spectral features in original carinata meal, co-products from bio-fuel/bio-oil processing; and (2) investigate differences in carbohydrate molecular composition and functional group spectral intensities after in situ ruminal fermentation at 0, 12, 24 and 48 h compared to canola meal as a reference. The molecular spectroscopic parameters of carbohydrate profiles detected were structural carbohydrates (STCHO, mainly associated with hemi-cellulosic and cellulosic compounds; region and baseline ca. 1483-1184 cm⁻¹), cellulosic compounds (CELC, region and baseline ca. 1304-1184 cm⁻¹), total carbohydrates (CHO, region and baseline ca. 1193-889 cm⁻¹) as well as the spectral ratios calculated based on respective spectral intensity data. The results showed that the spectral profiles of carinata meal were significantly different from those of canola meal in CHO 2nd peak area (center at ca. 1091 cm⁻¹, region: 1102-1083 cm⁻¹) and functional group peak intensity ratios such as STCHO 1st peak (ca. 1415 cm⁻¹) to 2nd peak (ca. 1374 cm⁻¹) height ratio, CHO 1st peak (ca. 1149 cm⁻¹) to 3rd peak (ca. 1032 cm⁻¹) height ratio, CELC to total CHO area ratio and STCHO to CELC area ratio, indicating that carinata meal may not be in full accord with canola meal in carbohydrate utilization and availability in ruminants.
Carbohydrate conformation and spectral features were changed by a significant interaction of meal type and incubation time, and almost all the spectral parameters decreased significantly (P<0.05) during 48 h of ruminal degradation in both carinata meal and canola meal. Although carinata meal differed from canola meal in some carbohydrate spectral parameters, multivariate results from agglomerative hierarchical cluster analysis and principal component analysis showed that the original and in situ residues of the two meals were not fully distinguished from each other within the carbohydrate spectral regions. It was concluded that carbohydrate structural conformation could be detected in carinata meal by using ATR-FT/IR techniques, and that further study is needed to explore the molecular spectral features of other functional groups, such as the protein structure profile, and their association with potential nutrient supply and availability of carinata meal in animals. Copyright © 2013 Elsevier B.V. All rights reserved.
Ning, P; Guo, Y F; Sun, T Y; Zhang, H S; Chai, D; Li, X M
2016-09-01
To study the distinct clinical phenotypes of chronic airway diseases by hierarchical cluster analysis and two-step cluster analysis. A population sample of adult patients in Donghuamen community, Dongcheng district, and Qinghe community, Haidian district, Beijing, from April 2012 to January 2015, who had experienced wheeze within the last 12 months, underwent detailed investigation, including a clinical questionnaire, pulmonary function tests, total serum IgE levels, blood eosinophil level and a peak flow diary. Nine variables were chosen as evaluating parameters: pre-salbutamol forced expiratory volume in one second (FEV1)/forced vital capacity (FVC) ratio, pre-salbutamol FEV1, percentage of post-salbutamol change in FEV1, residual capacity, diffusing capacity of the lung for carbon monoxide/alveolar volume adjusted for haemoglobin level, peak expiratory flow (PEF) variability, serum IgE level, cumulative tobacco cigarette consumption (pack-years) and respiratory symptoms (cough and expectoration). Subjects' clinical phenotypes were identified by hierarchical cluster analysis and two-step cluster analysis. (1) Four clusters were identified by hierarchical cluster analysis. Cluster 1 was chronic bronchitis in smokers with normal pulmonary function. Cluster 2 was chronic bronchitis or mild chronic obstructive pulmonary disease (COPD) patients with mild airflow limitation. Cluster 3 included COPD patients with heavy smoking, poor quality of life and severe airflow limitation. Cluster 4 comprised atopic patients with mild airflow limitation, elevated serum IgE and clinical features of asthma.
Significant differences were found in pre-salbutamol FEV1/FVC%, pre-salbutamol FEV1% pred, post-salbutamol change in FEV1%, maximal mid-expiratory flow (MMEF)% pred, carbon monoxide diffusing capacity per liter of alveolar volume (DLCO/VA)% pred, residual volume (RV)% pred, total serum IgE level, smoking history (pack-years), St. George's Respiratory Questionnaire (SGRQ) score, acute exacerbation in the past year, PEF variability and allergic dermatitis (P<0.05). (2) Four clusters were also identified by two-step cluster analysis, as follows: cluster 1, COPD patients with moderate to severe airflow limitation; cluster 2, asthma and COPD patients with heavy smoking, airflow limitation and increased airway reversibility; cluster 3, patients with less smoking and normal pulmonary function who had wheezing but no chronic cough; cluster 4, chronic bronchitis patients with normal pulmonary function and chronic cough. Significant differences were found in gender distribution, respiratory symptoms, pre-salbutamol FEV1/FVC%, pre-salbutamol FEV1% pred, post-salbutamol change in FEV1%, MMEF% pred, DLCO/VA% pred, RV% pred, PEF variability, total serum IgE level, cumulative tobacco cigarette consumption (pack-years), and SGRQ score (P<0.05). Different cluster analyses identified distinct clinical phenotypes of chronic airway diseases. These phenotypes may therefore help guide doctors in providing individualized treatment.
CytoCluster: A Cytoscape Plugin for Cluster Analysis and Visualization of Biological Networks.
Li, Min; Li, Dongyan; Tang, Yu; Wu, Fangxiang; Wang, Jianxin
2017-08-31
Nowadays, cluster analysis of biological networks has become one of the most important approaches to identifying functional modules as well as predicting protein complexes and network biomarkers. Furthermore, the visualization of clustering results is crucial for displaying the structure of biological networks. Here we present CytoCluster, a Cytoscape plugin integrating six clustering algorithms, HC-PIN (Hierarchical Clustering algorithm in Protein Interaction Networks), OH-PIN (identifying Overlapping and Hierarchical modules in Protein Interaction Networks), IPCA (Identifying Protein Complex Algorithm), ClusterONE (Clustering with Overlapping Neighborhood Expansion), DCU (Detecting Complexes based on Uncertain graph model), and IPC-MCE (Identifying Protein Complexes based on Maximal Complex Extension), together with the BinGO (Biological networks Gene Ontology) function. Users can select different clustering algorithms according to their requirements. The main function of these six clustering algorithms is to detect protein complexes or functional modules. In addition, BinGO is used to determine which Gene Ontology (GO) categories are statistically overrepresented in a set of genes or a subgraph of a biological network. CytoCluster can be easily extended, so that more clustering algorithms and functions can be added to this plugin. Since it was created in July 2013, CytoCluster has been downloaded more than 9700 times from the Cytoscape App Store and has already been applied to the analysis of different biological networks. CytoCluster is available from http://apps.cytoscape.org/apps/cytocluster.
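BinGO's overrepresentation analysis asks whether a GO category appears in a gene set (for instance, one detected module) more often than expected from a reference set. A minimal sketch, assuming a hypergeometric test with Benjamini-Hochberg false-discovery-rate correction across categories (common choices for this kind of analysis), is shown below; it is not BinGO's code, and the function names and counts are hypothetical.

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """Upper-tail P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: annotated genes in the reference set, K: of those, genes in the GO category,
    n: genes in the study set (e.g. one network module), k: study genes in the category.
    """
    top = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1))
    return hits / comb(N, n)

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical: 12 of 40 module genes carry a GO term annotating 300 of 6000 reference genes
p = hypergeom_pvalue(12, 40, 300, 6000)
print(p, benjamini_hochberg([p, 0.2, 0.03]))
```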
CytoCluster: A Cytoscape Plugin for Cluster Analysis and Visualization of Biological Networks
Li, Min; Li, Dongyan; Tang, Yu; Wang, Jianxin
2017-01-01
Nowadays, cluster analysis of biological networks has become one of the most important approaches to identifying functional modules as well as predicting protein complexes and network biomarkers. Furthermore, the visualization of clustering results is crucial for displaying the structure of biological networks. Here we present CytoCluster, a Cytoscape plugin integrating six clustering algorithms, HC-PIN (Hierarchical Clustering algorithm in Protein Interaction Networks), OH-PIN (identifying Overlapping and Hierarchical modules in Protein Interaction Networks), IPCA (Identifying Protein Complex Algorithm), ClusterONE (Clustering with Overlapping Neighborhood Expansion), DCU (Detecting Complexes based on Uncertain graph model), and IPC-MCE (Identifying Protein Complexes based on Maximal Complex Extension), together with the BinGO (Biological networks Gene Ontology) function. Users can select different clustering algorithms according to their requirements. The main function of these six clustering algorithms is to detect protein complexes or functional modules. In addition, BinGO is used to determine which Gene Ontology (GO) categories are statistically overrepresented in a set of genes or a subgraph of a biological network. CytoCluster can be easily extended, so that more clustering algorithms and functions can be added to this plugin. Since it was created in July 2013, CytoCluster has been downloaded more than 9700 times from the Cytoscape App Store and has already been applied to the analysis of different biological networks. CytoCluster is available from http://apps.cytoscape.org/apps/cytocluster. PMID:28858211
Periorbital melasma: Hierarchical cluster analysis of clinical features in Asian patients.
Jung, Y S; Bae, J M; Kim, B J; Kang, J-S; Cho, S B
2017-11-01
Studies have shown melasma lesions to be distributed across the face in centrofacial, malar, and mandibular patterns. However, melasma lesions of the periorbital area have yet to be thoroughly described. We analyzed normal and ultraviolet light-exposed photographs of patients with melasma. The periorbital melasma lesions were measured according to anatomical reference points, and a hierarchical cluster analysis was performed. The periorbital melasma lesions showed clinical features of fine and homogeneous pigmentation involving both the upper and lower eyelids, extending to other anatomical sites with a darker and coarser appearance. The hierarchical cluster analysis indicated that patients with periorbital melasma can be categorized into two clusters according to the surface anatomy of the face. Significant differences between cluster 1 and cluster 2 were found in lateral distance and inferolateral distance, but not in medial distance and superior distance. Compared with patients in cluster 1, patients in cluster 2 were significantly older and more commonly had accompanying melasma lesions of the temple and medial cheek. Our hierarchical cluster analysis of periorbital melasma lesions demonstrated that Asian patients with periorbital melasma can be categorized into two clusters according to the surface anatomy of the face. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
The Use of Cluster Analysis in Typological Research on Community College Students
ERIC Educational Resources Information Center
Bahr, Peter Riley; Bielby, Rob; House, Emily
2011-01-01
One useful and increasingly popular method of classifying students is known commonly as cluster analysis. The variety of techniques that comprise the cluster analytic family are intended to sort observations (for example, students) within a data set into subsets (clusters) that share similar characteristics and differ in meaningful ways from other…
Integrating Data Clustering and Visualization for the Analysis of 3D Gene Expression Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Data Analysis and Visualization; International Research Training Group "Visualization of Large and Unstructured Data Sets," University of Kaiserslautern, Germany; Computational Research Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720, USA
2008-05-12
The recent development of methods for extracting precise measurements of spatial gene expression patterns from three-dimensional (3D) image data opens the way for new analyses of the complex gene regulatory networks controlling animal development. We present an integrated visualization and analysis framework that supports user-guided data clustering to aid exploration of these new complex datasets. The interplay of data visualization and clustering-based data classification leads to improved visualization and enables a more detailed analysis than previously possible. We discuss (i) integration of data clustering and visualization into one framework; (ii) application of data clustering to 3D gene expression data; (iii) evaluation of the number of clusters k in the context of 3D gene expression clustering; and (iv) improvement of overall analysis quality via dedicated post-processing of clustering results based on visualization. We discuss the use of this framework to objectively define spatial pattern boundaries and temporal profiles of genes and to analyze how mRNA patterns are controlled by their regulatory transcription factors.
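The framework evaluates the number of clusters k, but the record does not say by which criterion. As an illustration only, the sketch below computes the mean silhouette width, a widely used criterion swapped in here as an assumption: each point compares its mean distance to its own cluster (a) with its mean distance to the nearest other cluster (b), and the labeling with the higher average (b - a)/max(a, b) is preferred. Function names and points are hypothetical.

```python
import math

def mean_distance(points, i, members):
    """Mean Euclidean distance from points[i] to the points indexed by members."""
    return sum(math.dist(points[i], points[j]) for j in members) / len(members)

def silhouette_score(points, labels):
    """Average silhouette width over all points; needs at least two clusters."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # convention: a point in a singleton cluster scores 0
            continue
        a = mean_distance(points, i, own)  # cohesion
        b = min(mean_distance(points, i, members)  # separation: nearest other cluster
                for other, members in clusters.items() if other != lab)
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical expression profiles reduced to 2-D; compare k = 2 and k = 3 labelings
pts = [(0, 0), (0, 1), (1, 0), (8, 8), (8, 9), (9, 8), (16, 0), (16, 1)]
for labels in ([0, 0, 0, 1, 1, 1, 1, 1], [0, 0, 0, 1, 1, 1, 2, 2]):
    print(len(set(labels)), round(silhouette_score(pts, labels), 3))
```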
Simultaneous Two-Way Clustering of Multiple Correspondence Analysis
ERIC Educational Resources Information Center
Hwang, Heungsun; Dillon, William R.
2010-01-01
A 2-way clustering approach to multiple correspondence analysis is proposed to account for cluster-level heterogeneity of both respondents and variable categories in multivariate categorical data. Specifically, in the proposed method, multiple correspondence analysis is combined with k-means in a unified framework in which k-means is…
Cluster Analysis of Minnesota School Districts. A Research Report.
ERIC Educational Resources Information Center
Cleary, James
The term "cluster analysis" refers to a set of statistical methods that classify entities with similar profiles of scores on a number of measured dimensions, in order to create empirically based typologies. A 1980 Minnesota House Research Report employed cluster analysis to categorize school districts according to their relative mixtures…
NASA Technical Reports Server (NTRS)
Hasler, Nicole; Bulbul, Esra; Bonamente, Massimiliano; Carlstrom, John E.; Culverhouse, Thomas L.; Gralla, Megan; Greer, Christopher; Lamb, James W.; Hawkins, David; Hennessy, Ryan;
2012-01-01
We perform a joint analysis of X-ray and Sunyaev-Zel'dovich effect data using an analytic model that describes the gas properties of galaxy clusters. The joint analysis allows the measurement of the cluster gas mass fraction profile and Hubble constant independent of cosmological parameters. Weak cosmological priors are used to calculate the overdensity radius within which the gas mass fractions are reported. Such an analysis can provide direct constraints on the evolution of the cluster gas mass fraction with redshift. We validate the model and the joint analysis on high signal-to-noise data from the Chandra X-ray Observatory and the Sunyaev-Zel'dovich Array for two clusters, A2631 and A2204.
Description and typology of intensive Chios dairy sheep farms in Greece.
Gelasakis, A I; Valergakis, G E; Arsenos, G; Banos, G
2012-06-01
The aim was to assess the intensified dairy sheep farming systems of the Chios breed in Greece and to establish a typology that properly describes and characterizes them. The study included all 66 farms of the Chios sheep breeders' cooperative "Macedonia". Data were collected through in-depth interviews using a structured direct questionnaire, with questions selected to obtain a general description of farm characteristics and overall management practices. A multivariate statistical analysis was applied to the data to obtain the most appropriate typology. First, principal component analysis was used to produce uncorrelated variables (principal components) for the subsequent cluster analysis. The number of clusters was determined using hierarchical cluster analysis, whereas the farms were allocated to 4 clusters using k-means cluster analysis. The identified clusters were described and then compared using one-way ANOVA or a chi-squared test. The main differences were in land availability and use, facility and equipment availability and type, expansion rates, and application of preventive flock health programs. In general, cluster 1 included newly established, intensive, well-equipped, specialized farms, and cluster 2 included well-established farms with balanced sheep and feed/crop production. Cluster 3 comprised small-flock farms focusing more on arable crops than on sheep farming, with a tendency to evolve toward cluster 2, whereas cluster 4 included farms representing a rather conservative form of Chios sheep breeding with low/intermediate inputs that chose not to focus on feed/crop production. 
In the studied set of farms, 4 different farmer attitudes were evident: 1) farming disrupts sheep breeding; feed should be purchased and economies of scale will decrease costs (mainly cluster 1), 2) only exercise/pasture land is necessary; at least part of the feed (pasture) must be home-grown to decrease costs (clusters 1 and 4), 3) providing pasture to sheep is essential; on-farm feed production decreases costs (mainly cluster 3), and 4) large-scale farming (feed production and cash crops) does not disrupt sheep breeding; all feed must be produced on-farm to decrease costs (mainly cluster 3). A profitability analysis across clusters, to identify the most beneficial levels of intensified management and capital investment, should now be considered. Copyright © 2012 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Vigre, Håkan; Domingues, Ana Rita Coutinho Calado; Pedersen, Ulrik Bo; Hald, Tine
2016-03-01
The cluster analysis was part of a project aiming to develop a generic structured quantitative microbiological risk assessment (QMRA) model of human salmonellosis due to pork consumption in EU member states (MSs); the objective of the cluster analysis was to group the EU MSs according to the relative contribution of different pathways of Salmonella in the farm-to-consumption chain of pork products. By selecting a case-study MS from each cluster, the model was developed to represent different aspects of pig production, pork production, and consumption of pork products across EU states. The cluster analysis aggregated MSs into groups of countries with similar importance of different pathways of Salmonella in the farm-to-consumption chain, using available and, where possible, universal register data related to pork production and consumption in each country. Based on MS-specific information about the distribution of (i) small and large farms, (ii) small and large slaughterhouses, (iii) the amount of pork meat consumed, and (iv) the amount of sausages consumed, we used nonhierarchical and hierarchical cluster analysis to group the MSs. The cluster solutions were validated internally using statistical measures and externally by comparing the clustered MSs with an estimated human incidence of salmonellosis due to pork products in the MSs. Finally, each cluster was characterized qualitatively using the cluster centroids. © 2016 Society for Risk Analysis.
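The abstract does not name the statistical measures used for internal validation. One common choice is the mean silhouette coefficient, which compares each observation's average distance to its own cluster (a) with its average distance to the nearest other cluster (b). The pure-Python sketch below assumes Euclidean distance and at least two clusters; the function name is hypothetical and this is not the authors' validation procedure.

```python
import math
from typing import List, Sequence


def mean_silhouette(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Average silhouette s(i) = (b - a) / max(a, b) over all points (Euclidean).

    Assumes at least two distinct cluster labels.
    """
    clusters = sorted(set(labels))
    scores: List[float] = []
    for i, p in enumerate(points):
        # Mean distance from point i to the members of each cluster, excluding i itself.
        mean_to = {}
        for c in clusters:
            dists = [math.dist(p, q) for j, q in enumerate(points)
                     if labels[j] == c and j != i]
            mean_to[c] = sum(dists) / len(dists) if dists else 0.0
        own = labels[i]
        if sum(1 for lab in labels if lab == own) == 1:
            scores.append(0.0)  # convention: a singleton cluster scores 0
            continue
        a = mean_to[own]
        b = min(mean_to[c] for c in clusters if c != own)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: two well-separated groups of two points each.
pts = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
good = mean_silhouette(pts, [0, 0, 1, 1])  # compact, separated grouping
bad = mean_silhouette(pts, [0, 1, 0, 1])   # grouping that mixes the two groups
```

Values near 1 indicate compact, well-separated clusters, while values near or below 0 suggest the grouping is not supported by the data; comparing this score across candidate solutions is one way to choose between them.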
Statistical Significance for Hierarchical Clustering
Kimes, Patrick K.; Liu, Yufeng; Hayes, D. Neil; Marron, J. S.
2017-01-01
Summary Cluster analysis has proved to be an invaluable tool for the exploratory and unsupervised analysis of high dimensional datasets. Among methods for clustering, hierarchical approaches have enjoyed substantial popularity in genomics and other fields for their ability to simultaneously uncover multiple layers of clustering structure. A critical and challenging question in cluster analysis is whether the identified clusters represent important underlying structure or are artifacts of natural sampling variation. Few approaches have been proposed for addressing this problem in the context of hierarchical clustering, for which the problem is further complicated by the natural tree structure of the partition, and the multiplicity of tests required to parse the layers of nested clusters. In this paper, we propose a Monte Carlo based approach for testing statistical significance in hierarchical clustering which addresses these issues. The approach is implemented as a sequential testing procedure guaranteeing control of the family-wise error rate. Theoretical justification is provided for our approach, and its power to detect true clustering structure is illustrated through several simulation studies and applications to two cancer gene expression datasets. PMID:28099990
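The basic idea of Monte Carlo significance testing for a single split can be illustrated in one dimension. The sketch below is a deliberate simplification and not the authors' procedure, which works on high-dimensional data, estimates the null covariance, and tests sequentially down the tree while controlling the family-wise error rate. Here the assumed test statistic is the 2-means cluster index (the within-cluster sum of squares of the best two-group split divided by the total sum of squares), and the assumed null is a single Gaussian fitted to the data; the function names and simulation count are hypothetical.

```python
import random
import statistics
from typing import Sequence


def cluster_index(x: Sequence[float]) -> float:
    """Best 2-means within-cluster SS divided by total SS (exact in 1-D).

    In one dimension the optimal 2-means split is contiguous in sorted order,
    so scanning every split point finds it. Requires len(x) >= 2.
    """
    xs = sorted(x)
    n = len(xs)
    pre, pre2 = [0.0], [0.0]  # prefix sums of x and x^2
    for v in xs:
        pre.append(pre[-1] + v)
        pre2.append(pre2[-1] + v * v)

    def ss(lo: int, hi: int) -> float:
        """Sum of squared deviations about the mean for xs[lo:hi]."""
        m = hi - lo
        return (pre2[hi] - pre2[lo]) - (pre[hi] - pre[lo]) ** 2 / m

    total = ss(0, n)
    best = min(ss(0, s) + ss(s, n) for s in range(1, n))
    return best / total


def monte_carlo_pvalue(x: Sequence[float], n_sim: int = 99, seed: int = 0) -> float:
    """Share of Gaussian-null samples whose cluster index is at least as small."""
    rng = random.Random(seed)
    observed = cluster_index(x)
    mu, sd = statistics.fmean(x), statistics.stdev(x)
    hits = 0
    for _ in range(n_sim):
        simulated = [rng.gauss(mu, sd) for _ in range(len(x))]
        if cluster_index(simulated) <= observed:
            hits += 1
    # The +1 terms count the observed data as one draw from the null.
    return (hits + 1) / (n_sim + 1)


# Hypothetical toy data with two clear modes.
rng = random.Random(1)
two_modes = [rng.gauss(-5, 1) for _ in range(50)] + [rng.gauss(5, 1) for _ in range(50)]
p_two_modes = monte_carlo_pvalue(two_modes)
```

A small cluster index means the two-group split explains most of the variance; the p-value asks how often a unimodal Gaussian sample of the same size and spread would produce a split at least that clean.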
The detection methods of dynamic objects
NASA Astrophysics Data System (ADS)
Knyazev, N. L.; Denisova, L. A.
2018-01-01
The article deals with the application of cluster analysis methods to the task of aircraft detection, based on distributing a selection of navigation parameters into groups (clusters). A modified cluster analysis method is proposed that searches for and detects objects, iteratively combines them into clusters, and then counts the clusters in order to increase the accuracy of aircraft detection. The operation of the method and the features of its implementation are considered. The conclusion shows the efficiency of the proposed method for exact cluster analysis in finding targets.
Cluster analysis of particulate matter (PM10) and black carbon (BC) concentrations
NASA Astrophysics Data System (ADS)
Žibert, Janez; Pražnikar, Jure
2012-09-01
The monitoring of air-pollution constituents such as particulate matter (PM10) and black carbon (BC) can provide information about air quality and the dynamics of emissions. Air quality depends on natural and anthropogenic sources of emissions as well as on weather conditions. For a one-year period, the diurnal concentrations of PM10 and BC in the Port of Koper were analysed by clustering days into groups according to the similarity of their hourly BC and PM10 day-profiles, without any prior assumptions about working and non-working days, weather conditions, or hot and cold seasons. The analysis was performed using k-means clustering with the squared Euclidean distance as the similarity measure. In the BC case, the 10 clusters comprised 3 clusters with just one member day and 7 clusters that encompass more than one day with similar BC profiles. Similar results were found in the PM10 case, where one cluster has a single member day, while 7 clusters contain several member days. The clustering revealed that, for both types of measurements, the clusters with less pronounced bimodal patterns and low hourly and average daily concentrations include the most days in the one-year analysis. A typical BC day-profile shows a bimodal pattern with morning and evening peaks, while the PM10 measurements reveal a less pronounced bimodality. There are also clusters with single-peak day-profiles: in such cases the BC data exhibit morning peaks, while the PM10 data show noon or afternoon peaks. Single pronounced peaks can be explained by the corresponding cluster wind-speed profiles. The analysis also revealed some special day-profiles. The BC cluster with a high midnight peak on 30/04/2010 and the PM10 cluster with the highest observed PM10 concentration on 01/05/2010 (208.0 μg m-3) coincide with 1 May, which is a national holiday in Slovenia with a very strong tradition of bonfire parties. 
The clustering of the diurnal concentrations showed that a variety of day-profiles occur in the cold period, whereas this is not the case for the hot season. Additional analysis of ship traffic and rainfall data showed no statistically significant differences in ship gross (bruto) registered tonnage (BRT) among either the BC or the PM10 clusters, but statistically significant differences in rainfall among them. The wind roses for the clusters that include most days of the sampling period indicate that PM10 and BC emitted from the Port of Koper were mainly transported west over the sea and east, where there is no populated area. The analysis showed that both BC and PM10 concentrations were driven by rain intensity and wind speed.
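The day-profile clustering described above can be sketched as Lloyd's k-means with the squared Euclidean distance. This minimal pure-Python sketch assumes each day is represented as a vector of hourly concentrations and initializes the centroids with the first k profiles; the function names, the initialization, and the shortened four-value toy profiles are illustrative assumptions, not the authors' code or data.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def sq_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length day-profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(rows: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of profiles."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def kmeans(profiles: List[Vector], k: int, max_iter: int = 100) -> Tuple[List[int], List[Vector]]:
    """Lloyd's k-means; initial centroids are the first k profiles (an assumption)."""
    centroids = [list(p) for p in profiles[:k]]
    labels = [-1] * len(profiles)
    for _ in range(max_iter):
        # Assignment step: each day joins the centroid with the smallest squared distance.
        new_labels = [
            min(range(k), key=lambda j: sq_euclidean(p, centroids[j]))
            for p in profiles
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: each centroid becomes the mean profile of its member days.
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = mean_vector(members)
    return labels, centroids


# Hypothetical shortened day-profiles: two bimodal days, two single-peak days.
days = [
    [1.0, 5.0, 1.0, 5.0],
    [1.2, 4.8, 0.9, 5.1],
    [1.0, 1.0, 6.0, 1.0],
    [0.8, 1.1, 5.9, 1.2],
]
labels, centroids = kmeans(days, k=2)
```

Because k-means only finds a local optimum that depends on the starting centroids, practical analyses usually repeat the fit from several random initializations and keep the solution with the lowest total within-cluster squared distance.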
Using Cluster Analysis and ICP-MS to Identify Groups of Ecstasy Tablets in Sao Paulo State, Brazil.
Maione, Camila; de Oliveira Souza, Vanessa Cristina; Togni, Loraine Rezende; da Costa, José Luiz; Campiglia, Andres Dobal; Barbosa, Fernando; Barbosa, Rommel Melgaço
2017-11-01
The variations in the elemental composition of ecstasy samples result in spectral profiles with useful information for data analysis, and cluster analysis of these profiles can help uncover different categories of the drug. We provide a cluster analysis of ecstasy tablets based on their elemental composition. Twenty-five elements were determined by ICP-MS in tablets seized by the Sao Paulo State Police, Brazil. We employ the K-means clustering algorithm along with a C4.5 decision tree to help interpret the clustering results. We found that two clusters best described the data, which may correspond to the approximate number of sources supplying the drug to the cities where it was seized. The C4.5 model was able to differentiate the ecstasy samples from the two clusters with high prediction accuracy under leave-one-out cross-validation. The model used only Nd, Ni, and Pb concentration values to classify the samples. © 2017 American Academy of Forensic Sciences.
Batch Computed Tomography Analysis of Projectiles
2016-05-01
error calculation. Projectiles are then grouped together according to the similarity of their components. Also discussed is graphical cluster analysis...ballistic, armor, grouping, clustering...Fig. 10 Graphical structure of 15 clusters of the jacket/core radii profiles with plots of the profiles contained within each cluster. The size of
ERIC Educational Resources Information Center
Raker, Jeffrey R.; Holme, Thomas A.
2014-01-01
A cluster analysis was conducted with a set of survey data on chemistry faculty familiarity with 13 assessment terms. Cluster groupings suggest a high, middle, and low overall familiarity with the terminology and an independent high and low familiarity with terms related to fundamental statistics. The six resultant clusters were found to be…
NeAT: a toolbox for the analysis of biological networks, clusters, classes and pathways.
Brohée, Sylvain; Faust, Karoline; Lima-Mendez, Gipsi; Sand, Olivier; Janky, Rekin's; Vanderstocken, Gilles; Deville, Yves; van Helden, Jacques
2008-07-01
The network analysis tools (NeAT) (http://rsat.ulb.ac.be/neat/) provide user-friendly web access to a collection of modular tools for the analysis of networks (graphs) and clusters (e.g. microarray clusters, functional classes, etc.). A first set of tools supports basic operations on graphs (comparison between two graphs, neighborhood of a set of input nodes, path finding and graph randomization). Another set of programs makes the connection between networks and clusters (graph-based clustering, clique discovery and mapping of clusters onto a network). The toolbox also includes programs for detecting significant intersections between clusters/classes (e.g. clusters of co-expression versus functional classes of genes). The NeAT tools are designed to cope with large datasets and provide a flexible toolbox for analyzing biological networks stored in various databases (protein interactions, regulation and metabolism) or obtained from high-throughput experiments (two-hybrid, mass-spectrometry and microarrays). The web interface interconnects the programs in predefined analysis flows, enabling users to address a series of questions about networks of interest. Each tool can also be used separately by entering custom data for a specific analysis. NeAT can also be used as web services (SOAP/WSDL interface) to design programmatic workflows and integrate them with other available resources.
Identification and characterization of near-fatal asthma phenotypes by cluster analysis.
Serrano-Pariente, J; Rodrigo, G; Fiz, J A; Crespo, A; Plaza, V
2015-09-01
Near-fatal asthma (NFA) is a heterogeneous clinical entity, and several patient profiles have been described according to different clinical, pathophysiological and histological features. However, no previous studies have identified different phenotypes of NFA in an unbiased way, using statistical methods such as cluster analysis. Therefore, the aim of the present study was to identify and characterize phenotypes of near-fatal asthma using cluster analysis. Over a period of 2 years, 33 Spanish hospitals enrolled 179 asthmatics admitted for an episode of NFA. A cluster analysis using a two-step algorithm was performed on data from 84 of these cases. The analysis defined three clusters of patients with NFA: cluster 1, the largest, including older patients with clinical and therapeutic criteria of severe asthma; cluster 2, with a high proportion of respiratory arrest (68%), impaired consciousness level (82%) and mechanical ventilation (93%); and cluster 3, which included younger patients, characterized by insufficient anti-inflammatory treatment and frequent sensitization to Alternaria alternata and soybean. These results identify specific asthma phenotypes involved in NFA, partly confirming previous findings from studies with a clinical approach. Identifying patients with a specific NFA phenotype could suggest interventions to prevent future severe asthma exacerbations. © 2015 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Chen, Shan; Li, Xiao-ning; Liang, Yi-zeng; Zhang, Zhi-min; Liu, Zhao-xia; Zhang, Qi-ming; Ding, Li-xia; Ye, Fei
2010-08-01
During Raman spectroscopy analysis, organic molecules and contaminants can obscure or swamp Raman signals. The present study starts from Raman spectra of prednisone acetate tablets and glibenclamide tablets, acquired with a BWTek i-Raman spectrometer. The background is corrected with the R package baselineWavelet. Principal component analysis and random forests are then used to perform clustering analysis. By analyzing the Raman spectra of the two medicines, the accuracy and validity of this background-correction algorithm are checked, and the influence of the fluorescence background on Raman spectra clustering analysis is discussed. It is concluded that correcting the fluorescence background is important for further analysis, and an effective background-correction solution is provided for clustering or other analyses.
A Survey of Popular R Packages for Cluster Analysis
ERIC Educational Resources Information Center
Flynt, Abby; Dean, Nema
2016-01-01
Cluster analysis is a set of statistical methods for discovering new group/class structure when exploring data sets. This article reviews the following popular libraries/commands in the R software language for applying different types of cluster analysis: from the stats library, the kmeans and hclust functions; the mclust library; the poLCA…
Using Cluster Analysis for Data Mining in Educational Technology Research
ERIC Educational Resources Information Center
Antonenko, Pavlo D.; Toy, Serkan; Niederhauser, Dale S.
2012-01-01
Cluster analysis is a group of statistical methods that has great potential for analyzing the vast amounts of web server-log data to understand student learning from hyperlinked information resources. In this methodological paper we provide an introduction to cluster analysis for educational technology researchers and illustrate its use through…
Cluster Analysis of the Luria-Nebraska Neuropsychological Battery with Learning Disabled Adults.
ERIC Educational Resources Information Center
McCue, Michael; And Others
The study reports a cluster analysis of Luria-Nebraska Neuropsychological Battery scores of 25 learning disabled adults. The cluster analysis suggested the presence of three subgroups within this sample, one having high elevations on the Rhythm, Writing, Reading, and Arithmetic Rhythm scales, the second having an extremely high elevation on the…
A nonparametric clustering technique which estimates the number of clusters
NASA Technical Reports Server (NTRS)
Ramey, D. B.
1983-01-01
In applications of cluster analysis, one usually needs to determine the number of clusters, K, and the assignment of observations to each cluster. A clustering technique based on recursive application of a multivariate test of bimodality which automatically estimates both K and the cluster assignments is presented.
Subgroups of physically abusive parents based on cluster analysis of parenting behavior and affect.
Haskett, Mary E; Smith Scott, Susan; Sabourin Ward, Caryn
2004-10-01
Cluster analysis of observed parenting and self-reported discipline was used to categorize 83 abusive parents into subgroups. A 2-cluster solution received support for validity. Cluster 1 parents were relatively warm, positive, sensitive, and engaged during interactions with their children, whereas Cluster 2 parents were relatively negative, disengaged or intrusive, and insensitive. Further, clusters differed in emotional health, parenting stress, perceptions of children, and problem solving. Children of parents in the 2 clusters differed on several indexes of social adjustment. Cluster 1 parents were similar to nonabusive parents (n = 66) on parenting and related constructs, but Cluster 2 parents differed from nonabusive parents on all clustering variables and many validation variables. Results highlight clinically relevant diversity in parenting practices and functioning among abusive parents. ((c) 2004 APA, all rights reserved).
Sputum neutrophil counts are associated with more severe asthma phenotypes using cluster analysis.
Moore, Wendy C; Hastie, Annette T; Li, Xingnan; Li, Huashi; Busse, William W; Jarjour, Nizar N; Wenzel, Sally E; Peters, Stephen P; Meyers, Deborah A; Bleecker, Eugene R
2014-06-01
Clinical cluster analysis from the Severe Asthma Research Program (SARP) identified 5 asthma subphenotypes that represent the severity spectrum of early-onset allergic asthma, late-onset severe asthma, and severe asthma with chronic obstructive pulmonary disease characteristics. Analysis of induced sputum from a subset of SARP subjects showed 4 sputum inflammatory cellular patterns. Subjects with concurrent increases in eosinophil (≥2%) and neutrophil (≥40%) percentages had characteristics of very severe asthma. To better understand interactions between inflammation and clinical subphenotypes, we integrated inflammatory cellular measures and clinical variables in a new cluster analysis. Participants in SARP who underwent sputum induction at 3 clinical sites were included in this analysis (n = 423). Fifteen variables, including clinical characteristics and blood and sputum inflammatory cell assessments, were selected using factor analysis for unsupervised cluster analysis. Four phenotypic clusters were identified. Cluster A (n = 132) and B (n = 127) subjects had mild-to-moderate early-onset allergic asthma with paucigranulocytic or eosinophilic sputum inflammatory cell patterns. In contrast, these inflammatory patterns were present in only 7% of cluster C (n = 117) and D (n = 47) subjects who had moderate-to-severe asthma with frequent health care use despite treatment with high doses of inhaled or oral corticosteroids and, in cluster D, reduced lung function. The majority of these subjects (>83%) had sputum neutrophilia either alone or with concurrent sputum eosinophilia. Baseline lung function and sputum neutrophil percentages were the most important variables determining cluster assignment. 
This multivariate approach identified 4 asthma subphenotypes representing the severity spectrum from mild-to-moderate allergic asthma with minimal or eosinophil-predominant sputum inflammation to moderate-to-severe asthma with neutrophil-predominant or mixed granulocytic inflammation. Published by Mosby, Inc.
Sputum neutrophils are associated with more severe asthma phenotypes using cluster analysis
Moore, Wendy C.; Hastie, Annette T.; Li, Xingnan; Li, Huashi; Busse, William W.; Jarjour, Nizar N.; Wenzel, Sally E.; Peters, Stephen P.; Meyers, Deborah A.; Bleecker, Eugene R.
2013-01-01
Background Clinical cluster analysis from the Severe Asthma Research Program (SARP) identified five asthma subphenotypes that represent the severity spectrum of early onset allergic asthma, late onset severe asthma and severe asthma with COPD characteristics. Analysis of induced sputum from a subset of SARP subjects showed four sputum inflammatory cellular patterns. Subjects with concurrent increases in eosinophils (≥2%) and neutrophils (≥40%) had characteristics of very severe asthma. Objective To better understand interactions between inflammation and clinical subphenotypes we integrated inflammatory cellular measures and clinical variables in a new cluster analysis. Methods Participants in SARP at three clinical sites who underwent sputum induction were included in this analysis (n=423). Fifteen variables including clinical characteristics and blood and sputum inflammatory cell assessments were selected by factor analysis for unsupervised cluster analysis. Results Four phenotypic clusters were identified. Cluster A (n=132) and B (n=127) subjects had mild-moderate early onset allergic asthma with paucigranulocytic or eosinophilic sputum inflammatory cell patterns. In contrast, these inflammatory patterns were present in only 7% of Cluster C (n=117) and D (n=47) subjects who had moderate-severe asthma with frequent health care utilization despite treatment with high doses of inhaled or oral corticosteroids, and in Cluster D, reduced lung function. The majority of these subjects (>83%) had sputum neutrophilia either alone or with concurrent sputum eosinophilia. Baseline lung function and sputum neutrophils were the most important variables determining cluster assignment. Conclusion This multivariate approach identified four asthma subphenotypes representing the severity spectrum from mild-moderate allergic asthma with minimal or eosinophilic predominant sputum inflammation to moderate-severe asthma with neutrophilic predominant or mixed granulocytic inflammation. 
PMID:24332216
Identification and validation of asthma phenotypes in Chinese population using cluster analysis.
Wang, Lei; Liang, Rui; Zhou, Ting; Zheng, Jing; Liang, Bing Miao; Zhang, Hong Ping; Luo, Feng Ming; Gibson, Peter G; Wang, Gang
2017-10-01
Asthma is a heterogeneous airway disease, so it is crucial to clearly identify clinical phenotypes to achieve better asthma management. This study aimed to identify and prospectively validate asthma clusters in a Chinese population. Two hundred eighty-four patients were consecutively recruited, and 18 sociodemographic and clinical variables were collected. Hierarchical cluster analysis was performed by the Ward method followed by k-means cluster analysis. Then, a prospective 12-month cohort study was used to validate the identified clusters. Five clusters were identified. Clusters 1 (n = 71) and 3 (n = 81) were mild asthma phenotypes with slight airway obstruction and low exacerbation risk, but with a sex differential. Cluster 2 (n = 65) described an "allergic" phenotype, cluster 4 (n = 33) featured a "fixed airflow limitation" phenotype with smoking, and cluster 5 (n = 34) was a "low socioeconomic status" phenotype. Patients in clusters 2, 4, and 5 had distinctly lower socioeconomic status and more psychological symptoms. Cluster 2 had a significantly increased risk of exacerbations (risk ratio [RR] 1.13, 95% confidence interval [CI] 1.03-1.25), unplanned visits for asthma (RR 1.98, 95% CI 1.07-3.66), and emergency visits for asthma (RR 7.17, 95% CI 1.26-40.80). Cluster 4 had an increased risk of unplanned visits (RR 2.22, 95% CI 1.02-4.81), and cluster 5 had increased emergency visits (RR 12.72, 95% CI 1.95-69.78). Kaplan-Meier analysis confirmed that cluster grouping was predictive of time to the first asthma exacerbation, unplanned visit, emergency visit, and hospital admission (P < .0001 for all comparisons). We identified 3 clinical clusters as "allergic asthma," "fixed airflow limitation," and "low socioeconomic status" phenotypes that are at high risk of severe asthma exacerbations and that have management implications for clinical practice in developing countries. Copyright © 2017 American College of Allergy, Asthma & Immunology. Published by Elsevier Inc. 
All rights reserved.
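The two-stage procedure above (Ward hierarchical clustering followed by k-means) is often linked by using the centroids of the Ward partition as the k-means starting points. The abstract does not state how the two stages were connected, so the pure-Python sketch below assumes that seeding strategy, Euclidean distance, and already standardized variables; the function names and toy data are hypothetical. The naive Ward loop is cubic in the number of observations, which is acceptable only for small illustrative samples.

```python
from typing import List

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(rows: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of observations."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def ward(points: List[Vector], k: int) -> List[List[int]]:
    """Naive agglomerative Ward clustering down to k clusters of point indices."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca = centroid([points[i] for i in clusters[a]])
                cb = centroid([points[i] for i in clusters[b]])
                na, nb = len(clusters[a]), len(clusters[b])
                # Ward criterion: increase in total within-cluster sum of squares.
                cost = na * nb / (na + nb) * sq_dist(ca, cb)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


def kmeans_from(points: List[Vector], centroids: List[Vector], max_iter: int = 100) -> List[int]:
    """Lloyd's k-means refinement starting from the given centroids."""
    labels = None
    for _ in range(max_iter):
        new = [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
               for p in points]
        if new == labels:
            break
        labels = new
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = centroid(members)
    return labels


def ward_then_kmeans(points: List[Vector], k: int) -> List[int]:
    """Seed k-means with the centroids of the k-cluster Ward partition."""
    groups = ward(points, k)
    seeds = [centroid([points[i] for i in g]) for g in groups]
    return kmeans_from(points, seeds)


# Hypothetical standardized data with two obvious groups.
pts = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
labels = ward_then_kmeans(pts, k=2)
```

The hierarchical stage supplies a stable starting partition (and, through the dendrogram, guidance on the number of clusters), while the k-means stage lets individual observations move to a closer centroid, which the greedy Ward merges cannot undo.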
Topic modeling for cluster analysis of large biological and medical datasets
2014-01-01
Background The big data moniker is nowhere better deserved than to describe the ever-increasing prodigiousness and complexity of biological and medical datasets. New methods are needed to generate and test hypotheses, foster biological interpretation, and build validated predictors. Although multivariate techniques such as cluster analysis may allow researchers to identify groups, or clusters, of related variables, the accuracies and effectiveness of traditional clustering methods diminish for large and high-dimensional datasets. Topic modeling is an active research field in machine learning and has been mainly used as an analytical tool to structure large textual corpora for data mining. Its ability to reduce high dimensionality to a small number of latent variables makes it suitable as a means for clustering or overcoming clustering difficulties in large biological and medical datasets. Results In this study, three topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, are proposed and tested on the cluster analysis of three large datasets: Salmonella pulsed-field gel electrophoresis (PFGE) dataset, lung cancer dataset, and breast cancer dataset, which represent various types of large biological or medical datasets. All three methods are shown to improve the efficacy/effectiveness of clustering results on the three datasets in comparison to traditional methods. A preferable cluster analysis method emerged for each of the three datasets on the basis of replicating known biological truths. Conclusion Topic modeling could be advantageously applied to the large datasets of biological or medical research. The three proposed topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, yield clustering improvements for the three different data types. 
Clusters more efficaciously represent truthful groupings and subgroupings in the data than traditional methods, suggesting that topic model-based methods could provide an analytic advancement in the analysis of large biological or medical datasets. PMID:25350106
Topic modeling for cluster analysis of large biological and medical datasets.
Zhao, Weizhong; Zou, Wen; Chen, James J
2014-01-01
The big data moniker is nowhere better deserved than to describe the ever-increasing prodigiousness and complexity of biological and medical datasets. New methods are needed to generate and test hypotheses, foster biological interpretation, and build validated predictors. Although multivariate techniques such as cluster analysis may allow researchers to identify groups, or clusters, of related variables, the accuracies and effectiveness of traditional clustering methods diminish for large and high-dimensional datasets. Topic modeling is an active research field in machine learning and has been mainly used as an analytical tool to structure large textual corpora for data mining. Its ability to reduce high dimensionality to a small number of latent variables makes it suitable as a means for clustering or overcoming clustering difficulties in large biological and medical datasets. In this study, three topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, are proposed and tested on the cluster analysis of three large datasets: Salmonella pulsed-field gel electrophoresis (PFGE) dataset, lung cancer dataset, and breast cancer dataset, which represent various types of large biological or medical datasets. All three methods are shown to improve the efficacy/effectiveness of clustering results on the three datasets in comparison to traditional methods. A preferable cluster analysis method emerged for each of the three datasets on the basis of replicating known biological truths. Topic modeling could be advantageously applied to the large datasets of biological or medical research. The three proposed topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, yield clustering improvements for the three different data types. 
Clusters more efficaciously represent truthful groupings and subgroupings in the data than traditional methods, suggesting that topic model-based methods could provide an analytic advancement in the analysis of large biological or medical datasets.
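Of the three proposed methods, highest probable topic assignment is the simplest to state: once a topic model has produced a topic distribution for each sample, each sample is placed in the cluster of its most probable topic. The sketch below assumes such a sample-by-topic probability matrix is already available (fitting the topic model itself is out of scope); the function names and toy matrix are hypothetical and not taken from the authors' implementation.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def highest_probable_topic(theta: Sequence[Sequence[float]]) -> List[int]:
    """Return, for each sample (row), the index of its most probable topic.

    Ties go to the lowest topic index, because max() returns the first
    maximal element it encounters.
    """
    return [max(range(len(row)), key=lambda t: row[t]) for row in theta]


def group_by_topic(labels: Sequence[int]) -> Dict[int, List[int]]:
    """Collect sample indices into clusters keyed by topic index."""
    clusters: Dict[int, List[int]] = defaultdict(list)
    for sample, topic in enumerate(labels):
        clusters[topic].append(sample)
    return dict(clusters)


# Hypothetical sample-by-topic matrix: 4 samples, 3 topics (rows sum to 1).
theta = [
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.3, 0.4, 0.3],
    [0.6, 0.3, 0.1],
]
labels = highest_probable_topic(theta)
clusters = group_by_topic(labels)
```

Under this scheme the number of clusters is bounded by the number of topics, and a topic that is never the most probable for any sample simply yields no cluster.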
Network Analysis Tools: from biological networks to clusters and pathways.
Brohée, Sylvain; Faust, Karoline; Lima-Mendez, Gipsi; Vanderstocken, Gilles; van Helden, Jacques
2008-01-01
Network Analysis Tools (NeAT) is a suite of computer tools that integrate various algorithms for the analysis of biological networks: comparison between graphs, between clusters, or between graphs and clusters; network randomization; analysis of degree distribution; network-based clustering and path finding. The tools are interconnected to enable a stepwise analysis of the network through a complete analytical workflow. In this protocol, we present a typical case of utilization, where the tasks above are combined to decipher a protein-protein interaction network retrieved from the STRING database. The results returned by NeAT are typically subnetworks, networks enriched with additional information (i.e., clusters or paths) or tables displaying statistics. Typical networks comprising several thousands of nodes and arcs can be analyzed within a few minutes. The complete protocol can be read and executed in approximately 1 h.
ERIC Educational Resources Information Center
Viriyangkura, Yuwadee
2014-01-01
Through a secondary analysis of statewide data from Colorado, people with intellectual and related developmental disabilities (ID/DD) were classified into five clusters based on their support needs characteristics using cluster analysis techniques. Prior latent factor models of support needs in the field of ID/DD were examined to investigate the…
Lazzeri, Giacomo; Panatto, Donatella; Domnich, Alexander; Arata, Lucia; Pammolli, Andrea; Simi, Rita; Giacchi, Mariano Vincenzo; Amicizia, Daniela; Gasparini, Roberto
2018-01-01
Background: A large body of literature suggests that adolescents’ health-related behaviors tend to occur in clusters, and understanding such behavioral clustering may have direct implications for the effective tailoring of health-promotion interventions. Despite the usefulness of analyzing clustering, Italian data on this topic are scant. This study aimed to evaluate the clustering patterns of health-related behaviors. Methods: The present study is based on data from the Health Behaviors in School-aged Children (HBSC) study conducted in Tuscany in 2010, which involved 3291 11-, 13- and 15-year-olds. To aggregate students’ data on 22 health-related behaviors, factor analysis followed by cluster analysis was performed. Results: Factor analysis revealed eight factors, named after their main traits: ‘Alcohol drinking’, ‘Smoking’, ‘Physical activity’, ‘Screen time’, ‘Signs & symptoms’, ‘Healthy eating’, ‘Violence’ and ‘Sweet tooth’. These factors explained 67% of the variance and were then entered into the cluster analysis. A six-cluster k-means solution was established with a 93.8% level of classification validity. The between-cluster differences in both mean age and gender distribution were highly statistically significant. Conclusions: Health-compromising behaviors are common among Tuscan teens and occur in distinct clusters. These results may be used by schools, health-promotion authorities and other stakeholders to design and implement tailored preventive interventions in Tuscany. PMID:27908972
Lazzeri, Giacomo; Panatto, Donatella; Domnich, Alexander; Arata, Lucia; Pammolli, Andrea; Simi, Rita; Giacchi, Mariano Vincenzo; Amicizia, Daniela; Gasparini, Roberto
2018-03-01
A large body of literature suggests that adolescents' health-related behaviors tend to occur in clusters, and understanding such behavioral clustering may have direct implications for the effective tailoring of health-promotion interventions. Despite the usefulness of analyzing clustering, Italian data on this topic are scant. This study aimed to evaluate the clustering patterns of health-related behaviors. The present study is based on data from the Health Behaviors in School-aged Children (HBSC) study conducted in Tuscany in 2010, which involved 3291 11-, 13- and 15-year-olds. To aggregate students' data on 22 health-related behaviors, factor analysis followed by cluster analysis was performed. Factor analysis revealed eight factors, named after their main traits: 'Alcohol drinking', 'Smoking', 'Physical activity', 'Screen time', 'Signs & symptoms', 'Healthy eating', 'Violence' and 'Sweet tooth'. These factors explained 67% of the variance and were then entered into the cluster analysis. A six-cluster k-means solution was established with a 93.8% level of classification validity. The between-cluster differences in both mean age and gender distribution were highly statistically significant. Health-compromising behaviors are common among Tuscan teens and occur in distinct clusters. These results may be used by schools, health-promotion authorities and other stakeholders to design and implement tailored preventive interventions in Tuscany.
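The six-cluster solution above comes from k-means applied to factor scores. A minimal sketch of Lloyd's k-means algorithm in plain Python follows; it is illustrative only, with fixed starting centroids for reproducibility (the study's initialization, software and validity measure are not stated here), and `kmeans` and `squared_distance` are hypothetical names.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length numeric sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's k-means: alternate nearest-centroid assignment and centroid update.

    points: list of equal-length numeric tuples (e.g. factor scores per student).
    initial_centroids: k starting centroids, fixed here so results are reproducible.
    Returns (labels, centroids).
    """
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids will not move either
        labels = new_labels
        # Update step: move each centroid to the mean of its current members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an empty cluster's centroid where it was
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Because k-means only finds a local optimum, practical analyses usually rerun it from several starting configurations and keep the solution with the lowest within-cluster sum of squares.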
Henry, David; Dymnicki, Allison B.; Mohatt, Nathaniel; Allen, James; Kelly, James G.
2016-01-01
Qualitative methods potentially add depth to prevention research, but can produce large amounts of complex data even with small samples. Studies conducted with culturally distinct samples often produce voluminous qualitative data, but may lack sufficient sample sizes for sophisticated quantitative analysis. Currently lacking in mixed methods research are methods allowing for more fully integrating qualitative and quantitative analysis techniques. Cluster analysis can be applied to coded qualitative data to clarify the findings of prevention studies by aiding efforts to reveal such things as the motives of participants for their actions and the reasons behind counterintuitive findings. By clustering groups of participants with similar profiles of codes in a quantitative analysis, cluster analysis can serve as a key component in mixed methods research. This article reports two studies. In the first study, we conduct simulations to test the accuracy of cluster assignment using three different clustering methods with binary data as produced when coding qualitative interviews. Results indicated that hierarchical clustering, k-means clustering, and latent class analysis produced similar levels of accuracy with binary data, and that the accuracy of these methods did not decrease with samples as small as 50. Whereas the first study explores the feasibility of using common clustering methods with binary data, the second study provides a “real-world” example using data from a qualitative study of community leadership connected with a drug abuse prevention project. We discuss the implications of this approach for conducting prevention research, especially with small samples and culturally distinct communities. PMID:25946969
Gonzalez, Robert; Suppes, Trisha; Zeitzer, Jamie; McClung, Colleen; Tamminga, Carol; Tohen, Mauricio; Forero, Angelica; Dwivedi, Alok; Alvarado, Andres
2018-02-19
Multiple types of chronobiological disturbances have been reported in bipolar disorder, including characteristics associated with general activity levels, sleep, and rhythmicity. Previous studies have focused on examining the individual relationships between affective state and chronobiological characteristics. The aim of this study was to conduct a variable cluster analysis in order to ascertain how mood states are associated with chronobiological traits in bipolar I disorder (BDI). We hypothesized that manic symptomatology would be associated with disturbances of rhythm. Variable cluster analysis identified five chronobiological clusters in 105 BDI subjects. Cluster 1, comprising subjective sleep quality, was associated with both mania and depression. Cluster 2, which comprised variables describing the degree of rhythmicity, was associated with mania. Significant associations between mood state and cluster analysis-identified chronobiological variables were noted. Disturbances of mood were associated with subjectively assessed sleep disturbances rather than with objectively determined, actigraphy-based sleep variables. No associations with general activity variables were noted. Relationships of gender and of the medication classes in use with the cluster analysis-identified chronobiological characteristics were noted. Exploratory analyses showed that medication class had a larger impact on these relationships than the number of psychiatric medications in use. In a BDI sample, variable cluster analysis was able to group related chronobiological variables. The results support our primary hypothesis that mood state, particularly mania, is associated with chronobiological disturbances. Further research is required to define these relationships and to determine the direction of the associations between mood state and chronobiological characteristics.
Henry, David; Dymnicki, Allison B; Mohatt, Nathaniel; Allen, James; Kelly, James G
2015-10-01
Qualitative methods potentially add depth to prevention research but can produce large amounts of complex data even with small samples. Studies conducted with culturally distinct samples often produce voluminous qualitative data but may lack sufficient sample sizes for sophisticated quantitative analysis. Currently lacking in mixed-methods research are methods allowing for more fully integrating qualitative and quantitative analysis techniques. Cluster analysis can be applied to coded qualitative data to clarify the findings of prevention studies by aiding efforts to reveal such things as the motives of participants for their actions and the reasons behind counterintuitive findings. By clustering groups of participants with similar profiles of codes in a quantitative analysis, cluster analysis can serve as a key component in mixed-methods research. This article reports two studies. In the first study, we conduct simulations to test the accuracy of cluster assignment using three different clustering methods with binary data as produced when coding qualitative interviews. Results indicated that hierarchical clustering, K-means clustering, and latent class analysis produced similar levels of accuracy with binary data and that the accuracy of these methods did not decrease with samples as small as 50. Whereas the first study explores the feasibility of using common clustering methods with binary data, the second study provides a "real-world" example using data from a qualitative study of community leadership connected with a drug abuse prevention project. We discuss the implications of this approach for conducting prevention research, especially with small samples and culturally distinct communities.
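Clustering coded interviews starts from a dissimilarity between participants' binary code profiles (1 = code present, 0 = absent). The abstract does not say which measure was used, so the Python sketch below shows two common choices as an assumption for illustration: the Jaccard distance, which ignores codes absent for both participants, and the simple matching distance, which counts joint absences as agreement. All function names are hypothetical.

```python
def jaccard_distance(a, b):
    """1 - |A and B| / |A or B| for two 0/1 profiles; joint absences are ignored."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


def simple_matching_distance(a, b):
    """Share of codes on which two participants disagree; joint absences count as agreement."""
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


def distance_matrix(profiles, dist):
    """Full pairwise dissimilarity matrix, usable as input to hierarchical clustering."""
    n = len(profiles)
    return [[dist(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]
```

With sparse codebooks the two measures can diverge sharply: profiles `[1, 1, 0, 0, 0, 0]` and `[1, 0, 0, 0, 0, 0]` are 0.5 apart under Jaccard but only 1/6 apart under simple matching, because the four shared absences count as agreement in the latter.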
Obstructive Sleep Apnea: A Cluster Analysis at Time of Diagnosis
Grillet, Yves; Richard, Philippe; Stach, Bruno; Vivodtzev, Isabelle; Timsit, Jean-Francois; Lévy, Patrick; Tamisier, Renaud; Pépin, Jean-Louis
2016-01-01
Background: The classification of obstructive sleep apnea is based on sleep study criteria that may not adequately capture disease heterogeneity. Improved phenotyping may improve prognosis prediction and help select therapeutic strategies. Objectives: This study used cluster analysis to investigate the clinical clusters of obstructive sleep apnea. Methods: An ascending hierarchical cluster analysis was performed on baseline symptoms, physical examination findings, risk factor exposure and co-morbidities from 18,263 participants in the OSFP (French national registry of sleep apnea). The probability of a criterion being associated with a given cluster was assessed using odds ratios determined by univariate logistic regression. Results: Six clusters were identified, in which patients varied considerably in age, sex, symptoms, obesity, co-morbidities and environmental risk factors. The main significant differences between clusters were minimally symptomatic versus sleepy obstructive sleep apnea patients, lean versus obese, and, among obese patients, different combinations of co-morbidities and environmental risk factors. Conclusions: Our cluster analysis identified six distinct clusters of obstructive sleep apnea. Our findings underscore the high degree of heterogeneity among obstructive sleep apnea patients regarding clinical presentation, risk factors and consequences. This may help in both research and clinical practice for validating new prevention programs, in diagnosis and in decisions regarding therapeutic strategies. PMID:27314230
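When a univariate logistic regression has a single binary predictor (here, a criterion that is present or absent, against membership of one cluster versus all others), the model is saturated. Its estimated odds ratio exp(β) then equals the 2×2 cross-product ratio, and its Wald standard error on the log scale is the square root of the summed reciprocal cell counts. The Python sketch below uses that identity; the function name and the example counts are hypothetical and not taken from the OSFP data.

```python
import math


def odds_ratio_2x2(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for one binary criterion vs. one cluster.

    a: in cluster, criterion present      b: in cluster, criterion absent
    c: other clusters, criterion present  d: other clusters, criterion absent
    Equals exp(beta) from a logistic regression with this single binary predictor.
    All four cells must be non-zero for the log-scale standard error to exist.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi
```

For example, if 30 of 100 patients in a cluster and 100 of 900 patients elsewhere had a given criterion, `odds_ratio_2x2(30, 70, 100, 800)` gives an odds ratio of about 3.43.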
Calibrating the Planck cluster mass scale with CLASH
NASA Astrophysics Data System (ADS)
Penna-Lima, M.; Bartlett, J. G.; Rozo, E.; Melin, J.-B.; Merten, J.; Evrard, A. E.; Postman, M.; Rykoff, E.
2017-08-01
We determine the mass scale of Planck galaxy clusters using gravitational lensing mass measurements from the Cluster Lensing And Supernova survey with Hubble (CLASH). We have compared the lensing masses to the Planck Sunyaev-Zeldovich (SZ) mass proxy for 21 clusters in common, employing a Bayesian analysis to simultaneously fit an idealized CLASH selection function and the distribution between the measured observables and true cluster mass. We used a tiered analysis strategy to explicitly demonstrate the importance of priors on weak lensing mass accuracy. In the case of an assumed constant bias, $b_{\mathrm{SZ}}$, between true cluster mass, $M_{500}$, and the Planck mass proxy, $M_{\mathrm{PL}}$, our analysis constrains $1 - b_{\mathrm{SZ}} = 0.73 \pm 0.10$ when moderate priors on weak lensing accuracy are used, including a zero-mean Gaussian with standard deviation of 8% to account for possible bias in lensing mass estimations. Our analysis explicitly accounts for possible selection bias effects in this calibration sourced by the CLASH selection function. Our constraint on the cluster mass scale is consistent with recent results from the Weighing the Giants program and the Canadian Cluster Comparison Project. It is also consistent, at 1.34σ, with the value needed to reconcile the Planck SZ cluster counts with Planck's base ΛCDM model fit to the primary cosmic microwave background anisotropies.
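As a worked reading of the constraint, assume the usual convention in which the Planck proxy underestimates the true mass by the factor $(1 - b_{\mathrm{SZ}})$ (the abstract itself only describes bSZ as a constant bias between M500 and MPL). The central value then implies true masses about 37% above the SZ proxy, i.e. $b_{\mathrm{SZ}} \approx 0.27$:

```latex
M_{\mathrm{PL}} = (1 - b_{\mathrm{SZ}})\, M_{500}
\quad\Longrightarrow\quad
M_{500} = \frac{M_{\mathrm{PL}}}{1 - b_{\mathrm{SZ}}}
\approx \frac{M_{\mathrm{PL}}}{0.73}
\approx 1.37\, M_{\mathrm{PL}}
```

The factor 1.37 uses the central value only; the quoted ±0.10 uncertainty on $1 - b_{\mathrm{SZ}}$ translates into a correspondingly wide range for this mass correction.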
Another collision for the Coma cluster
NASA Technical Reports Server (NTRS)
Vikhlinin, A.; Forman, W.; Jones, C.
1996-01-01
A wavelet transform analysis of the ROSAT position sensitive proportional counter (PSPC) images of the Coma cluster is presented. On small scales, the analysis shows substructure dominated by two extended sources surrounding the two bright galaxies NGC 4874 and NGC 4889. On scales of about 2 to 3 arcmin, the analysis reveals a tail of X-ray emission originating near the cluster center, curving to the south and east for approximately 25 arcmin and ending near the galaxy NGC 4911. The results are interpreted in terms of a merger of a group, having a core mass of approximately $10^{13}$ solar masses, with the main body of the Coma cluster.
Leung, S C; Fung, W K; Wong, K H
1999-01-01
The relative bit density variation graphs of 207 specimen credit cards processed by 12 encoding machines were examined first visually and then classified by means of hierarchical cluster analysis. Twenty-nine credit cards treated as 'questioned' samples were tested by cluster analysis against 'controls' derived from known encoders. Hierarchical cluster analysis provided a high accuracy of identification, with all 29 'questioned' samples classified correctly. Visual comparison of the jitter graphs, although less discriminating, was nevertheless capable of giving a reasonably accurate result.
Dong, Skye T; Costa, Daniel S J; Butow, Phyllis N; Lovell, Melanie R; Agar, Meera; Velikova, Galina; Teckle, Paulos; Tong, Allison; Tebbutt, Niall C; Clarke, Stephen J; van der Hoek, Kim; King, Madeleine T; Fayers, Peter M
2016-01-01
Symptom clusters in advanced cancer can influence patient outcomes. There is large heterogeneity in the methods used to identify symptom clusters. To investigate the consistency of symptom cluster composition in advanced cancer patients using different statistical methodologies for all patients across five primary cancer sites, and to examine which clusters predict functional status, a global assessment of health and global quality of life. Principal component analysis and exploratory factor analysis (with different rotation and factor selection methods) and hierarchical cluster analysis (with different linkage and similarity measures) were used on a data set of 1562 advanced cancer patients who completed the European Organization for the Research and Treatment of Cancer Quality of Life Questionnaire-Core 30. Four clusters consistently formed for many of the methods and cancer sites: tense-worry-irritable-depressed (emotional cluster), fatigue-pain, nausea-vomiting, and concentration-memory (cognitive cluster). The emotional cluster was a stronger predictor of overall quality of life than the other clusters. Fatigue-pain was a stronger predictor of overall health than the other clusters. The cognitive cluster and fatigue-pain predicted physical functioning, role functioning, and social functioning. The four identified symptom clusters were consistent across statistical methods and cancer types, although there were some noteworthy differences. Statistical derivation of symptom clusters is in need of greater methodological guidance. A psychosocial pathway in the management of symptom clusters may improve quality of life. Biological mechanisms underpinning symptom clusters need to be delineated by future research. A framework for evidence-based screening, assessment, treatment, and follow-up of symptom clusters in advanced cancer is essential. Copyright © 2016 American Academy of Hospice and Palliative Medicine. Published by Elsevier Inc. All rights reserved.
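Hierarchical cluster analysis results depend on the linkage rule, which is one reason cluster composition can vary with the method used. The Python sketch below is a naive agglomerative procedure on a precomputed distance matrix with three standard linkages (single, complete and average, the last also known as UPGMA). It is illustrative only; the abstract does not list the exact linkages and similarity measures the study used, and `agglomerate` is a hypothetical name.

```python
def agglomerate(dist, k, linkage="average"):
    """Naive agglomerative clustering on a symmetric distance matrix.

    Repeatedly merges the two closest clusters until k clusters remain.
    linkage: "single" (minimum), "complete" (maximum) or "average" (UPGMA mean)
    of the pairwise distances between members of the two clusters.
    Returns the clusters as sorted lists of item indices.
    """
    combine = {
        "single": min,
        "complete": max,
        "average": lambda ds: sum(ds) / len(ds),
    }[linkage]
    clusters = [[i] for i in range(len(dist))]

    def cluster_distance(a, b):
        return combine([dist[i][j] for i in a for j in b])

    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = cluster_distance(clusters[x], clusters[y])
                if best is None or d < best[0]:  # first pair wins ties
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]  # y > x, so index x is unaffected
    return sorted(clusters)
```

On six points at positions 0, 1, 2, 3, 4 and 5.5 on a line, single linkage chains the first five points together and leaves the last one alone, whereas complete linkage splits the data into {0, 1, 2, 3} and {4, 5.5}. The same data thus yield different two-cluster solutions.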
Konno, Satoshi; Taniguchi, Natsuko; Makita, Hironi; Nakamaru, Yuji; Shimizu, Kaoruko; Shijubo, Noriharu; Fuke, Satoshi; Takeyabu, Kimihiro; Oguri, Mitsuru; Kimura, Hirokazu; Maeda, Yukiko; Suzuki, Masaru; Nagai, Katsura; Ito, Yoichi M; Wenzel, Sally E; Nishimura, Masaharu
2015-12-01
Smoking may have multifactorial effects on asthma phenotypes, particularly in severe asthma. Cluster analysis has been applied to explore novel phenotypes, which are not based on any a priori hypotheses. To explore novel severe asthma phenotypes by cluster analysis when including cigarette smokers. We recruited a total of 127 subjects with severe asthma, including 59 current or ex-smokers, from our university hospital and its 29 affiliated hospitals/pulmonary clinics. Twelve clinical variables obtained during a 2-day hospital stay were used for cluster analysis. After clustering using clinical variables, the sputum levels of 14 molecules were measured to biologically characterize the clinical clusters. Five clinical clusters were identified, including two characterized by high pack-year exposure to cigarette smoking and low FEV1/FVC. There were marked differences between the two clusters of cigarette smokers. One had high levels of circulating eosinophils, high IgE levels, and a high sinus disease score. The other was characterized by low levels of the same parameters. Sputum analysis revealed increased levels of IL-5 in the former cluster and increased levels of IL-6 and osteopontin in the latter. The other three clusters were similar to those previously reported: young onset/atopic, nonsmoker/less eosinophilic, and female/obese. Key clinical variables were confirmed to be stable and consistent 1 year later. This study reveals two distinct phenotypes of severe asthma in current and former cigarette smokers with potentially different biological pathways contributing to fixed airflow limitation. Clinical trial registered with www.umin.ac.jp (000003254).
The dynamics of cyclone clustering in re-analysis and a high-resolution climate model
NASA Astrophysics Data System (ADS)
Priestley, Matthew; Pinto, Joaquim; Dacre, Helen; Shaffrey, Len
2017-04-01
Extratropical cyclones have a tendency to occur in groups (clusters) in the exit of the North Atlantic storm track during wintertime, potentially leading to widespread socioeconomic impacts. The Winter of 2013/14 was the stormiest on record for the UK and was characterised by the recurrent clustering of intense extratropical cyclones. This clustering was associated with a strong, straight and persistent North Atlantic 250 hPa jet with Rossby wave-breaking (RWB) on both flanks, pinning the jet in place. Here, we provide for the first time an analysis of all clustered events in 36 years of the ERA-Interim Re-analysis at three latitudes (45° N, 55° N, 65° N) encompassing various regions of Western Europe. The relationship between the occurrence of RWB and cyclone clustering is studied in detail. Clustering at 55° N is associated with an extended and anomalously strong jet flanked on both sides by RWB. However, clustering at 65° N (45° N) is associated with RWB to the south (north) of the jet, deflecting the jet northwards (southwards). A positive correlation was found between the intensity of the clustering and RWB occurrence to the north and south of the jet. However, there is considerable spread in these relationships. Finally, analysis has shown that the relationships identified in the re-analysis are also present in a high-resolution coupled global climate model (HiGEM). In particular, clustering is associated with the same dynamical conditions at each of our three latitudes in spite of the identified biases in frequency and intensity of RWB.
Wang, Z; Wang, W H; Wang, S L; Jin, J; Song, Y W; Liu, Y P; Ren, H; Fang, H; Tang, Y; Chen, B; Qi, S N; Lu, N N; Li, N; Tang, Y; Liu, X F; Yu, Z H; Li, Y X
2016-06-23
To identify phenotypic subgroups of patients with pT1-2N0 invasive breast cancer by means of cluster analysis and to estimate the prognosis and clinicopathological features of these subgroups. From 1999 to 2013, 4979 patients with pT1-2N0 invasive breast cancer were recruited for hierarchical clustering analysis. Age (≤40, 41-70, 70+ years), size of primary tumor, pathological type, grade of differentiation, microvascular invasion, and estrogen receptor (ER), progesterone receptor (PR) and human epidermal growth factor receptor 2 (HER-2) status were used as the variables for computing distances between patients. Hierarchical cluster analysis was performed using Ward's method. The cophenetic correlation coefficient (CPCC) and the Spearman correlation coefficient were used to validate the clustering structure. The CPCC was 0.603 and the Spearman correlation coefficient was 0.617 (P<0.001), which indicated a good fit of the hierarchy to the data. A twelve-cluster model seemed to best describe the patient cohort. Patients in clusters 5, 9 and 12 had the best prognosis and were characterized by age >40 years, smaller primary tumor, lower histologic grade, positive ER and PR status, and mainly negative HER-2 status. Patients in clusters 1 and 11 had the worst prognosis: cluster 1 was characterized by larger tumors, higher grade and negative ER and PR status, while cluster 11 was characterized by positive microvascular invasion. Patients in the other seven clusters had a moderate prognosis, and patients in each cluster had distinctive clinicopathological features and recurrence patterns. This study identified distinctive clinicopathologic phenotypes in a large cohort of patients with pT1-2N0 breast cancer through hierarchical clustering and revealed their different prognoses. This integrative model may help physicians make more personalized decisions regarding adjuvant therapy.
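The cophenetic correlation coefficient (CPCC) checks how well a dendrogram preserves the original distances. It is the Pearson correlation, over all pairs of items, between each pair's original distance and its cophenetic distance (the height at which the two items first join). The Python sketch below pairs Ward's method with this check for numeric points under Euclidean distance. That is an assumption for illustration: the study's mixed categorical and ordinal variables would need a different distance. The function names are hypothetical.

```python
import math


def ward_cophenetic(points):
    """Agglomerate points with Ward's criterion and return the cophenetic matrix.

    Uses the Lance-Williams recurrence for Ward linkage on Euclidean distances
    (the form used by SciPy's 'ward' method). coph[i][j] is the merge height
    at which points i and j first fall into the same cluster.
    """
    n = len(points)
    members = {i: [i] for i in range(n)}  # active cluster id -> point indices
    d = {(i, j): math.dist(points[i], points[j])  # distances between active clusters
         for i in range(n) for j in range(i + 1, n)}
    coph = [[0.0] * n for _ in range(n)]
    next_id = n
    while len(members) > 1:
        (a, b), h = min(d.items(), key=lambda kv: kv[1])  # closest pair merges
        for p in members[a]:
            for q in members[b]:
                coph[p][q] = coph[q][p] = h
        na, nb = len(members[a]), len(members[b])
        merged = members.pop(a) + members.pop(b)
        del d[(a, b)]
        for k, mk in members.items():
            nk = len(mk)
            dka = d.pop((min(k, a), max(k, a)))
            dkb = d.pop((min(k, b), max(k, b)))
            sq = ((nk + na) * dka ** 2 + (nk + nb) * dkb ** 2 - nk * h ** 2) / (na + nb + nk)
            d[(k, next_id)] = math.sqrt(max(sq, 0.0))  # clamp tiny rounding noise
        members[next_id] = merged
        next_id += 1
    return coph


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)


def cophenetic_correlation(points):
    """CPCC: correlation between original and cophenetic distances over all pairs."""
    coph = ward_cophenetic(points)
    pairs = [(i, j) for i in range(len(points)) for j in range(i + 1, len(points))]
    original = [math.dist(points[i], points[j]) for i, j in pairs]
    cophenetic = [coph[i][j] for i, j in pairs]
    return pearson(original, cophenetic)
```

Two tight, well-separated pairs such as `[(0, 0), (0, 1), (10, 0), (10, 1)]` give a CPCC close to 1. A value like the reported 0.603 indicates that the dendrogram distorts the original distances considerably more.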
Ahmad, Tariq; Desai, Nihar; Wilson, Francis; Schulte, Phillip; Dunning, Allison; Jacoby, Daniel; Allen, Larry; Fiuzat, Mona; Rogers, Joseph; Felker, G Michael; O'Connor, Christopher; Patel, Chetan B
2016-01-01
Classification of acute decompensated heart failure (ADHF) is based on subjective criteria that crudely capture disease heterogeneity. Improved phenotyping of the syndrome may help improve therapeutic strategies. To derive cluster analysis-based groupings for patients hospitalized with ADHF, and compare their prognostic performance to hemodynamic classifications derived at the bedside. We performed a cluster analysis on baseline clinical variables and pulmonary artery catheter (PAC) measurements of 172 ADHF patients from the ESCAPE trial. Employing regression techniques, we examined associations between clusters and clinically determined hemodynamic profiles (warm/cold/wet/dry). We assessed association with clinical outcomes using Cox proportional hazards models. Likelihood ratio tests were used to compare the prognostic value of cluster data to that of hemodynamic data. We identified four advanced HF clusters: 1) male Caucasians with ischemic cardiomyopathy, multiple comorbidities, lowest B-type natriuretic peptide (BNP) levels; 2) females with non-ischemic cardiomyopathy, few comorbidities, most favorable hemodynamics; 3) young African American males with non-ischemic cardiomyopathy, most adverse hemodynamics, advanced disease; and 4) older Caucasians with ischemic cardiomyopathy, concomitant renal insufficiency, highest BNP levels. There was no association between clusters and bedside-derived hemodynamic profiles (p = 0.70). For all adverse clinical outcomes, Cluster 4 had the highest risk, and Cluster 2, the lowest. Compared to Cluster 4, Clusters 1-3 had 45-70% lower risk of all-cause mortality. Clusters were significantly associated with clinical outcomes, whereas hemodynamic profiles were not. By clustering patients with similar objective variables, we identified four clinically relevant phenotypes of ADHF patients, with no discernible relationship to hemodynamic profiles, but distinct associations with adverse outcomes.
Our analysis suggests that ADHF classification using simultaneous considerations of etiology, comorbid conditions, and biomarker levels, may be superior to bedside classifications.
Almeida, Suzana C; George, Steven Z; Leite, Raquel D V; Oliveira, Anamaria S; Chaves, Thais C
2018-05-17
We aimed to empirically derive psychosocial and pain sensitivity subgroups using cluster analysis within a sample of individuals with chronic musculoskeletal pain (CMP) and to investigate derived subgroups for differences in pain and disability outcomes. Eighty female participants with CMP answered psychosocial and disability scales and were assessed for pressure pain sensitivity. A cluster analysis was used to derive subgroups, and analysis of variance (ANOVA) was used to investigate differences between subgroups. Psychosocial factors (kinesiophobia, pain catastrophizing, anxiety, and depression) and overall pressure pain threshold (PPT) were entered into the cluster analysis. Three subgroups were empirically derived: cluster 1 (high pain sensitivity and high psychosocial distress; n = 12) characterized by low overall PPT and high psychosocial scores; cluster 2 (high pain sensitivity and intermediate psychosocial distress; n = 39) characterized by low overall PPT and intermediate psychosocial scores; and cluster 3 (low pain sensitivity and low psychosocial distress; n = 29) characterized by high overall PPT and low psychosocial scores compared to the other subgroups. Cluster 1 showed higher values for mean pain intensity (F (2,77) = 10.58, p < 0.001) compared with cluster 3, and cluster 1 showed higher values for disability (F (2,77) = 3.81, p = 0.03) compared with both clusters 2 and 3. Only cluster 1 was distinct from cluster 3 according to both pain and disability outcomes. Pain catastrophizing, depression, and anxiety were the psychosocial variables that best differentiated the subgroups. Overall, these results call attention to the importance of considering pain sensitivity and psychosocial variables to obtain a more comprehensive characterization of CMP patients' subtypes.
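The subgroup comparisons above are one-way ANOVAs. With 80 participants in 3 clusters, the degrees of freedom are 3 − 1 = 2 between groups and 80 − 3 = 77 within groups, matching the reported F(2,77). The minimal Python sketch below computes the F statistic from raw group data; it is illustrative, with a hypothetical function name, and does not compute the p-value.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA across groups."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within
```

For example, `one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]])` returns `(27.0, 2, 6)`. Group sizes of 12, 39 and 29 give degrees of freedom (2, 77) whatever the values.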
Jurencák, Roman; Fritzler, Marvin; Tyrrell, Pascal; Hiraki, Linda; Benseler, Susanne; Silverman, Earl
2009-02-01
(1) To evaluate the spectrum of serum autoantibodies in pediatric-onset systemic lupus erythematosus (pSLE) with a focus on ethnic differences; (2) using cluster analysis, to identify patients with similar autoantibody patterns and to determine their clinical associations. A single-center cohort study of all patients with newly diagnosed pSLE seen over an 8-year period was performed. Ethnicity, clinical, and serological data were prospectively collected from 156/169 patients (92%). The frequencies of 10 selected autoantibodies among ethnic groups were compared. Cluster analysis identified groups of patients with similar autoantibody profiles. Associations of these groups with clinical and laboratory features of pSLE were examined. Among our 5 ethnic groups, there were differences only in the prevalence of anti-U1RNP and anti-Sm antibodies, which occurred more frequently in non-Caucasian patients (p < 0.0001, p < 0.01, respectively). Cluster analysis revealed 3 autoantibody clusters. Cluster 1 consisted of anti-dsDNA antibodies. Cluster 2 consisted of anti-dsDNA, antichromatin, antiribosomal P, anti-U1RNP, anti-Sm, anti-Ro and anti-La autoantibodies. Cluster 3 consisted of anti-dsDNA, anti-RNP, and anti-Sm autoantibodies. The highest proportion of Caucasians was in cluster 1 (p < 0.05), which was characterized by mild disease with infrequent major organ involvement, compared with cluster 2, which had the highest frequency of nephritis, renal failure, serositis, and hemolytic anemia, and with cluster 3, which was characterized by frequent neuropsychiatric disease and nephritis. We observed ethnic differences in autoantibody profiles in pSLE. Autoantibodies tended to cluster together, and these clusters were associated with different clinical courses.
The quantitative analysis of silicon carbide surface smoothing by Ar and Xe cluster ions
NASA Astrophysics Data System (ADS)
Ieshkin, A. E.; Kireev, D. S.; Ermakov, Yu. A.; Trifonov, A. S.; Presnov, D. E.; Garshev, A. V.; Anufriev, Yu. V.; Prokhorova, I. G.; Krupenin, V. A.; Chernysh, V. S.
2018-04-01
The gas cluster ion beam (GCIB) technique was used to smooth silicon carbide crystal surfaces. The effects of processing with two inert cluster ions, argon and xenon, were quantitatively compared. While argon is the standard working gas for GCIB, results for xenon clusters have not previously been reported. Scanning probe microscopy and high resolution transmission electron microscopy were used to analyze the surface roughness and the quality of the surface crystal layer. Gas cluster ion beam processing smooths the surface relief down to an average roughness of about 1 nm for both elements. Xenon was shown to be the more effective working gas: the sputtering rate for xenon clusters is 2.5 times higher than for argon at the same beam energy. High resolution transmission electron microscopy analysis of the surface defect layer gives thicknesses of 7 ± 2 nm and 8 ± 2 nm for treatment with argon and xenon clusters, respectively.
Clustering analysis strategies for electron energy loss spectroscopy (EELS).
Torruella, Pau; Estrader, Marta; López-Ortega, Alberto; Baró, Maria Dolors; Varela, Maria; Peiró, Francesca; Estradé, Sònia
2018-02-01
In this work, the use of cluster analysis algorithms, widely applied in the field of big data, is proposed to explore and analyze electron energy loss spectroscopy (EELS) data sets. Three different data clustering approaches have been tested with both simulated and experimental data from Fe3O4/Mn3O4 core/shell nanoparticles. The first method consists of applying data clustering directly to the acquired spectra. A second approach is to analyze spectral variance with principal component analysis (PCA) within a given data cluster. Lastly, data clustering on PCA score maps is discussed. The advantages and requirements of each approach are studied. Results demonstrate how clustering is able to recover compositional and oxidation state information from EELS data with minimal user input, giving great prospects for its usage in EEL spectroscopy. Copyright © 2017 Elsevier B.V. All rights reserved.
Tashobya, Christine K; Dubourg, Dominique; Ssengooba, Freddie; Speybroeck, Niko; Macq, Jean; Criel, Bart
2016-03-01
In 2003, the Uganda Ministry of Health introduced the district league table for district health system performance assessment. The league table presents district performance against a number of input, process and output indicators and a composite index to rank districts. This study explores the use of hierarchical cluster analysis for analysing and presenting district health systems performance data and compares this approach with the use of the league table in Uganda. Ministry of Health and district plans and reports, and published documents were used to provide information on the development and utilization of the Uganda district league table. Quantitative data were accessed from the Ministry of Health databases. Statistical analysis was performed in SPSS version 20, using hierarchical cluster analysis with Ward's method. The hierarchical cluster analysis was conducted on the basis of seven clusters determined for each year from 2003 to 2010, ranging from a cluster of good performers through moderate to poor performers. The characteristics and membership of clusters varied from year to year and were determined by the identity and magnitude of performance of the individual variables. Criticisms of the league table include perceived unfairness, as it did not take district peculiarities into consideration, and being oversummarized and not adequately informative. Clustering organizes the many data points into clusters of similar entities according to an agreed set of indicators and can provide the starting point for identifying factors behind the observed performance of districts. Although league table ranking emphasizes summation and external control, clustering has the potential to encourage a formative, learning approach. More research is required to shed more light on factors behind the observed performance of the different clusters. Other countries, especially low-income countries that share many similarities with Uganda, can learn from these experiences. © The Author 2015.
Published by Oxford University Press in association with The London School of Hygiene and Tropical Medicine.
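As a rough illustration of what Ward's method does (not the SPSS routine the authors ran), the sketch below merges, at each step, the two clusters whose union raises the total within-cluster sum of squares the least, and stops at a requested number of clusters. It assumes each district is represented by a vector of already standardized indicator values; the function name `ward_clusters` and the toy data are hypothetical.

```python
def ward_clusters(points, k):
    """Agglomerate points (lists of floats) with Ward's criterion until k clusters remain.

    Returns a list of clusters, each a sorted list of point indices.
    """
    # Each cluster is stored as (member indices, centroid)
    clusters = [([i], list(p)) for i, p in enumerate(points)]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ma, ca = clusters[a]
                mb, cb = clusters[b]
                sq = sum((x - y) ** 2 for x, y in zip(ca, cb))
                # Increase in within-cluster sum of squares if a and b are merged
                cost = len(ma) * len(mb) / (len(ma) + len(mb)) * sq
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        ma, ca = clusters[a]
        mb, cb = clusters[b]
        n = len(ma) + len(mb)
        # Size-weighted centroid of the merged cluster
        centroid = [(len(ma) * x + len(mb) * y) / n for x, y in zip(ca, cb)]
        del clusters[b]  # b > a, so index a is still valid afterwards
        clusters[a] = (sorted(ma + mb), centroid)
    return [members for members, _ in clusters]
```

Cutting at seven clusters, as in the study, would correspond to `ward_clusters(district_vectors, 7)` with a hypothetical `district_vectors` list. The exhaustive pair search is cubic in the number of items, which is fine for a few hundred items but not for large data sets.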
Tashobya, Christine K; Dubourg, Dominique; Ssengooba, Freddie; Speybroeck, Niko; Macq, Jean; Criel, Bart
2016-01-01
PMID:26024882
Weller, Claudia M; Wilbrink, Leopoldine A; Houwing-Duistermaat, Jeanine J; Koelewijn, Stephany C; Vijfhuizen, Lisanne S; Haan, Joost; Ferrari, Michel D; Terwindt, Gisela M; van den Maagdenberg, Arn M J M; de Vries, Boukje
2015-08-01
Cluster headache is a severe neurological disorder with a complex genetic background. A missense single nucleotide polymorphism (rs2653349; p.Ile308Val) in the HCRTR2 gene, which encodes hypocretin receptor 2, is the only genetic factor that has been reported to be associated with cluster headache in different studies. However, as results conflict between studies, we re-evaluated its role in cluster headache. We performed a genetic association analysis for rs2653349 in the large study population of our Leiden University Cluster headache Analysis (LUCA) program. A systematic literature search yielded three additional studies comprising five study populations, which were included in our meta-analysis. Data were extracted according to predefined criteria. A total of 575 cluster headache patients from our LUCA study and 874 controls were genotyped for HCRTR2 SNP rs2653349, but no significant association with cluster headache was found (odds ratio 0.91 (95% confidence interval 0.75-1.10), p = 0.319). In contrast, the meta-analysis, which included a total of 1167 cluster headache cases and 1618 controls from the six study populations belonging to four different studies, showed an association of the single nucleotide polymorphism with cluster headache (random-effects odds ratio 0.69 (95% confidence interval 0.53-0.90), p = 0.006). The association became weaker, with the odds ratio increasing to 0.80, when the meta-analysis was repeated without the initial single South European study, which had the largest effect size. Although we did not find evidence for association of rs2653349 in our LUCA study, which is the largest study population investigated thus far, our meta-analysis provides genetic evidence for a role of HCRTR2 in cluster headache. Nevertheless, we feel that the association should be interpreted with caution, as meta-analyses of individual populations with limited power have diminished validity. © International Headache Society 2014.
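The abstract reports a random-effects odds ratio without naming the estimator. A common choice is the DerSimonian-Laird method, sketched below under that assumption: each study's log odds ratio is weighted by the inverse of its variance plus an estimated between-study variance tau^2, with standard errors recovered from the reported 95% confidence intervals. The function name and input format are hypothetical, and the inputs used to exercise it are made up rather than taken from the studies in this meta-analysis.

```python
import math

def dersimonian_laird(studies, z=1.96):
    """Pool odds ratios with a DerSimonian-Laird random-effects model.

    studies: list of (odds_ratio, ci_lower, ci_upper) tuples from 95% CIs.
    Returns (pooled_or, pooled_ci_lower, pooled_ci_upper, tau2).
    """
    # Work on the log scale; recover each standard error from the CI width
    y = [math.log(or_) for or_, _, _ in studies]
    se = [(math.log(hi) - math.log(lo)) / (2 * z) for _, lo, hi in studies]
    w = [1 / s ** 2 for s in se]
    # Fixed-effect (inverse-variance) estimate and Cochran's Q
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    k = len(studies)
    if k > 1:
        c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance, truncated at 0
    else:
        tau2 = 0.0
    # Random-effects weights add tau2 to every study's sampling variance
    w_star = [1 / (s ** 2 + tau2) for s in se]
    pooled = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    se_pooled = math.sqrt(1 / sum(w_star))
    return (math.exp(pooled), math.exp(pooled - z * se_pooled),
            math.exp(pooled + z * se_pooled), tau2)
```

When the studies agree, tau^2 is truncated to zero and the result equals the fixed-effect estimate; heterogeneous studies inflate tau^2, which widens the pooled interval and pulls the weights toward equality.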
Applications of cluster analysis to satellite soundings
NASA Technical Reports Server (NTRS)
Munteanu, M. J.; Jakubowicz, O.; Kalnay, E.; Piraino, P.
1984-01-01
The advantages of using cluster analysis to improve satellite temperature retrievals were evaluated. Natural clusters, which are associated with the atmospheric temperature soundings characteristic of different types of air masses, have the potential to improve stratified regression schemes compared with currently used methods, which stratify soundings by latitude, season, and land/ocean. The method of discriminatory analysis was used. The correct cluster of temperature profiles was identified from satellite measurements in 85% of the cases. Considerable improvement was observed at all mandatory levels using regression retrievals derived in the clusters of temperature (weighted and nonweighted), in comparison both with the control experiment and with regression retrievals derived in the clusters of brightness temperatures of 3 MSU and 5 IR channels.
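A minimal sketch of cluster-stratified retrieval, assuming a minimum-distance classifier as a stand-in for the discriminatory analysis (a full discriminant analysis would also use the cluster covariances) and a separate linear regression per cluster. The function names, centroids, and coefficients are hypothetical placeholders, not values from the study.

```python
def nearest_cluster(x, centroids):
    """Index of the centroid closest to x in squared Euclidean distance."""
    return min(range(len(centroids)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(x, centroids[j])))

def stratified_retrieval(radiances, centroids, coefficients):
    """Assign a sounding to a cluster, then apply that cluster's regression.

    coefficients[j] = (intercept, [slope per channel]) fitted for cluster j.
    Returns (cluster index, retrieved temperature).
    """
    j = nearest_cluster(radiances, centroids)
    intercept, slopes = coefficients[j]
    return j, intercept + sum(s * r for s, r in zip(slopes, radiances))
```

The design point is that each cluster gets its own regression, so soundings from different air masses are not forced through a single latitude/season stratum.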
Craen, Saskia de; Commandeur, Jacques J F; Frank, Laurence E; Heiser, Willem J
2006-06-01
K-means cluster analysis is known for its tendency to produce spherical, equally sized clusters. To assess the magnitude of these effects, a simulation study was conducted in which populations were created with varying departures from sphericity and varying group sizes. An analysis of cluster recovery in samples taken from these populations showed significant effects of both lack of sphericity and group size. These effects were, however, not as large as expected: even in the "worst case scenario," the recovery index remained above 0.5. An interaction effect between the two data aspects was also found: the decreasing trend in cluster recovery with increasing departure from sphericity differs between equal and unequal group sizes.
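The abstract does not say which recovery index was used. Assuming the adjusted Rand index, a common choice for comparing a recovered partition with the true one (1 for perfect recovery, about 0 for chance-level agreement), a self-contained version looks like this:

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two partitions of the same n >= 2 items."""
    n = len(labels_true)
    if n < 2:
        raise ValueError("need at least two items")
    # Pair counts within contingency cells and within each partition's clusters
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single cluster
    return (sum_ij - expected) / (max_index - expected)
```

The index is invariant to relabeling, so a K-means solution that finds the right groups under different cluster numbers still scores 1.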
Kanno, Chihiro; Sakamoto, Kentaro Q; Yanagawa, Yojiro; Takahashi, Yoshiyuki; Katagiri, Seiji; Nagano, Masashi
2017-08-04
In the present study, bull sperm in the first and second ejaculates were divided into subpopulations based on their motility characteristics using cluster analysis of data from computer-assisted sperm motility analysis (CASA). Semen samples were collected from 4 Japanese black bulls. Data from 9,228 motile sperm were classified into 4 clusters: 1) very rapid, progressively motile sperm; 2) rapid, circularly motile sperm with widely moving heads; 3) moderately motile sperm with heads moving frequently over a short distance; and 4) poorly motile sperm. The percentage of cluster 1 varied between bulls. The first ejaculates had a higher proportion of cluster 2 and a lower proportion of cluster 3 than the second ejaculates.
cluML: A markup language for clustering and cluster validity assessment of microarray data.
Bolshakova, Nadia; Cunningham, Pádraig
2005-01-01
cluML is a new markup language for microarray data clustering and cluster validity assessment. The XML-based format was designed to address some of the limitations observed in traditional formats, such as the inability to store multiple clusterings (including biclusterings) and validation results within a dataset. cluML is an effective tool for supporting biomedical knowledge representation in gene expression data analysis. Although cluML was developed for DNA microarray analysis applications, it is not limited to them and can be used effectively to represent clustering and validation results for other biomedical and physical data.
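To illustrate the idea of keeping several clusterings and their validity scores in one XML document, here is a sketch using Python's standard `xml.etree.ElementTree`. The element and attribute names (`dataset`, `clustering`, `member`, `validity`) are invented for this example and are not the actual cluML schema.

```python
import xml.etree.ElementTree as ET

def clustering_document(dataset_id, results):
    """Build an XML tree holding several clusterings of one dataset.

    results: list of (method_name, {item_id: cluster_label}, {index_name: value}).
    Element and attribute names are illustrative only, not the cluML schema.
    """
    root = ET.Element("dataset", id=dataset_id)
    for method, assignment, validity in results:
        run = ET.SubElement(root, "clustering", method=method)
        for item, label in assignment.items():
            ET.SubElement(run, "member", item=item, cluster=str(label))
        for name, value in validity.items():
            ET.SubElement(run, "validity", index=name, value=str(value))
    return root
```

`ET.tostring(root)` serializes the tree; storing each clustering as a sibling element is what lets one file carry, for example, a K-means run and a hierarchical run of the same genes side by side.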
Cluster analysis of multiple planetary flow regimes
NASA Technical Reports Server (NTRS)
Mo, Kingtse; Ghil, Michael
1987-01-01
A modified cluster analysis method was developed to identify spatial patterns of planetary flow regimes and to study transitions between them. The method was applied first to a simple deterministic model and then to Northern Hemisphere (NH) 500-mb data. The dynamical model is governed by the fully nonlinear, equivalent-barotropic vorticity equation on the sphere. Clusters of points in the model's phase space are associated either with a few persistent events or with many transient events. Two stationary clusters have patterns similar to unstable stationary model solutions, one zonal and one blocked. Transient clusters of wave trains serve as way stations between the stationary ones. For the NH data, cluster analysis was performed in the subspace of the first seven empirical orthogonal functions (EOFs). Stationary clusters are found in the low-frequency band (periods longer than 10 days), and transient clusters in the bandpass window between 2.5 and 6 days. In the low-frequency band, three pairs of clusters determine EOFs 1, 2, and 3, respectively. They exhibit well-known regional features, such as blocking, the Pacific/North American (PNA) pattern, and wave trains. Both the model and the low-pass data show strong bimodality. Clusters in the bandpass window show wave-train patterns in the two jet exit regions. As in the model, they are related to transitions between stationary clusters.
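Mo and Ghil cluster in the subspace spanned by the leading EOFs. As a minimal sketch of that first step, assuming each map is a flat list of grid-point values, the leading EOF can be found by power iteration on the anomaly covariance matrix, and each map projected onto it. Obtaining the first seven EOFs would require deflation or a full symmetric eigensolver; the function name is hypothetical.

```python
import math
import random

def leading_eof(fields, iterations=200, seed=0):
    """Leading EOF of a set of maps via power iteration on the anomaly covariance.

    fields: list of equal-length lists, one map (e.g. gridded 500-mb heights) per time step.
    Returns (eof, pc): eof has unit length; pc[t] is map t's projection onto it.
    """
    n_t, n_x = len(fields), len(fields[0])
    mean = [sum(f[i] for f in fields) / n_t for i in range(n_x)]
    anomalies = [[f[i] - mean[i] for i in range(n_x)] for f in fields]
    # Sample covariance between grid points (n_x by n_x)
    cov = [[sum(a[i] * a[j] for a in anomalies) / (n_t - 1) for j in range(n_x)]
           for i in range(n_x)]
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n_x)]  # random start vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(n_x)) for i in range(n_x)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("anomalies have no variance along the start vector")
        v = [x / norm for x in w]
    pc = [sum(a[i] * v[i] for i in range(n_x)) for a in anomalies]
    return v, pc
```

The sign of an EOF is arbitrary, so clusters should be compared in terms of the projections (principal components), not the raw sign of the pattern.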
Two-Way Regularized Fuzzy Clustering of Multiple Correspondence Analysis.
Kim, Sunmee; Choi, Ji Yeh; Hwang, Heungsun
2017-01-01
Multiple correspondence analysis (MCA) is a useful tool for investigating the interrelationships among dummy-coded categorical variables. MCA has been combined with clustering methods to examine whether a population contains heterogeneous subclusters. These combined approaches classify either observations only (one-way clustering of MCA) or both observations and variable categories (two-way clustering of MCA). The latter approach is favored because its solutions are easier to interpret: they show explicitly which subgroup of observations is associated with which subset of variable categories. Nonetheless, the two-way approach has been built on hard classification, which assumes that each observation and/or variable category belongs to only one cluster. To relax this assumption, we propose two-way fuzzy clustering of MCA. Specifically, we combine MCA with fuzzy k-means to classify a subgroup of observations and a subset of variable categories simultaneously into a common cluster, while allowing both observations and variable categories to belong partially to multiple clusters. Importantly, we adopt regularized fuzzy k-means, which enables us to determine the degree of fuzziness in cluster memberships automatically. We evaluate the performance of the proposed approach through the analysis of simulated and real data, in comparison with existing two-way clustering approaches.
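For intuition about partial membership, the sketch below shows the standard (unregularized) fuzzy c-means membership update, u_ik = 1 / Σ_j (d_ik / d_ij)^(2/(m-1)), where d_ik is the distance from point i to centroid k, j runs over all clusters, and m > 1 is the fuzzifier. This is a swapped-in textbook rule, not the authors' regularized two-way MCA procedure, which chooses the degree of fuzziness automatically instead of fixing m by hand.

```python
def fuzzy_memberships(point, centroids, m=2.0):
    """Standard fuzzy c-means memberships of one point in each cluster.

    m > 1 is the fuzzifier: values near 1 give nearly hard assignments,
    larger values spread membership more evenly across clusters.
    """
    d = [sum((a - b) ** 2 for a, b in zip(point, c)) ** 0.5 for c in centroids]
    if 0.0 in d:
        # Point sits exactly on one or more centroids: share full membership among them
        hits = [1.0 if x == 0.0 else 0.0 for x in d]
        total = sum(hits)
        return [h / total for h in hits]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((d[k] / d[j]) ** p for j in range(len(d)))
            for k in range(len(d))]
```

The memberships of each point sum to 1, which is the property that lets an observation or category belong partially to several clusters.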
Water quality assessment with hierarchical cluster analysis based on Mahalanobis distance.
Du, Xiangjun; Shao, Fengjing; Wu, Shunyao; Zhang, Hanlin; Xu, Si
2017-07-01
Water quality assessment is crucial for assessing marine eutrophication, predicting harmful algal blooms, and protecting the environment. Previous studies have developed many numerical modeling methods and data-driven approaches for water quality assessment. Cluster analysis, an approach widely used for grouping data, has also been employed. However, water quality variables are intricately correlated; these correlations play important roles in water quality assessment but have been overlooked. In this paper, we analyze the correlations between water quality variables and propose an alternative method for water quality assessment: hierarchical cluster analysis based on Mahalanobis distance. We then cluster water quality data collected from the coastal waters of the Bohai Sea and the North Yellow Sea of China and use the clustering results to evaluate their water quality. To evaluate validity, we also cluster the water quality data with cluster analysis based on Euclidean distance, which has been widely adopted in previous studies. The results show that our method is more suitable for water quality assessment when many water quality variables are correlated. To our knowledge, this is the first attempt to apply Mahalanobis distance to coastal water quality assessment.
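The Mahalanobis distance between samples x and y is sqrt((x - y)^T S^{-1} (x - y)), where S is the covariance matrix of the water quality variables; with S equal to the identity it reduces to Euclidean distance. A dependency-free sketch follows, assuming S is estimated from the samples themselves and is non-singular. Which variables enter S is not stated in the abstract, and the helper names are hypothetical.

```python
def mean_and_covariance(samples):
    """Column means and sample covariance matrix of a list of equal-length rows."""
    n, p = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(p)]
    cov = [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / (n - 1)
            for j in range(p)] for i in range(p)]
    return mean, cov

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (a must be non-singular)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def mahalanobis(x, y, cov):
    """Mahalanobis distance between vectors x and y under covariance matrix cov."""
    diff = [a - b for a, b in zip(x, y)]
    z = solve_linear(cov, diff)  # z = cov^{-1} (x - y), without forming the inverse
    return sum(d * zi for d, zi in zip(diff, z)) ** 0.5
```

With strongly correlated variables, a difference that follows the correlation (both variables high) counts as a smaller distance than one that cuts across it, which is the property the authors rely on; the resulting distance matrix can be fed to any hierarchical clustering routine.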
MatSeis and the GNEM R&E regional seismic analysis tools.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chael, Eric Paul; Hart, Darren M.; Young, Christopher John
2003-08-01
To improve the nuclear event monitoring capability of the U.S., the NNSA Ground-based Nuclear Explosion Monitoring Research & Engineering (GNEM R&E) program has been developing a collection of products known as the Knowledge Base (KB). Although much of the focus for the KB has been on the development of calibration data, we have also developed numerous software tools for various purposes. The Matlab-based MatSeis package and the associated suite of regional seismic analysis tools were developed to aid in the testing and evaluation of some Knowledge Base products for which existing applications were either not available or ill-suited. This presentation will provide brief overviews of MatSeis and each of the tools, emphasizing features added in the last year. MatSeis was begun in 1996 and is now a fairly mature product. It is a highly flexible seismic analysis package that provides interfaces to read data from either flatfiles or an Oracle database. All of the standard seismic analysis tasks are supported (e.g., filtering, 3-component rotation, phase picking, event location, magnitude calculation), as well as a variety of array processing algorithms (beaming, FK, coherency analysis, vespagrams). The simplicity of Matlab coding and the tremendous number of available functions make MatSeis/Matlab an ideal environment for developing new monitoring research tools (see the regional seismic analysis tools below). New MatSeis features include: addition of evid information to events in MatSeis, options to screen picks by author, input and output of origerr information, improved performance in reading flatfiles, improved speed in FK calculations, and significant improvements to Measure Tool (filtering, multiple phase display), Free Plot (filtering, phase display and alignment), Mag Tool (maximum likelihood options), and Infra Tool (improved calculation speed, display of an F-statistic stream). 
Work on the regional seismic analysis tools (CodaMag, EventID, PhaseMatch, and Dendro) began in 1999, and the tools vary in their level of maturity. All rely on MatSeis to provide necessary data (waveforms, arrivals, origins, and travel time curves). CodaMag Tool implements magnitude calculation by scaling to fit the envelope shape of the coda for a selected phase type (Mayeda, 1993; Mayeda and Walter, 1996). New tool features include: calculation of a yield estimate based on the source spectrum, display of a filtered version of the seismogram based on the selected band, and the output of codamag data records for processed events. EventID Tool implements event discrimination using phase ratios of regional arrivals (Hartse et al., 1997; Walter et al., 1999). New features include: bandpass filtering of displayed waveforms, screening of reference events based on SNR, multivariate discriminants, use of libcgi to access correction surfaces, and the output of discrim_data records for processed events. PhaseMatch Tool implements match filtering to isolate surface waves (Herrin and Goforth, 1977). New features include: display of the signal's observed dispersion and an option to use a station-based dispersion surface. Dendro Tool implements agglomerative hierarchical clustering using dendrograms to identify similar events based on waveform correlation (Everitt, 1993). New features include: modifications to include arrival information within the tool, and the capability to automatically add/re-pick arrivals based on the picked arrivals for similar events.
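A rough sketch of the kind of computation the Dendro Tool description implies: pairwise distances of 1 minus the peak normalized cross-correlation over a small lag window, followed by average-linkage (UPGMA) agglomeration cut at a distance threshold. The distance definition, linkage rule, threshold, and function names are assumptions for illustration; this is not the MatSeis (Matlab) code.

```python
import math

def max_normalized_xcorr(a, b, max_lag):
    """Peak normalized cross-correlation of equal-length traces over lags -max_lag..max_lag."""
    def norm(x):
        return math.sqrt(sum(v * v for v in x))
    n = len(a)
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        # Overlapping segments of the two traces at this lag
        x, y = (a[lag:], b[:n - lag]) if lag >= 0 else (a[:n + lag], b[-lag:])
        denom = norm(x) * norm(y)
        if denom > 0:
            best = max(best, sum(p * q for p, q in zip(x, y)) / denom)
    return best

def correlation_distances(traces, max_lag):
    """Symmetric matrix of 1 - peak correlation between every pair of traces."""
    n = len(traces)
    return [[0.0 if i == j else 1.0 - max_normalized_xcorr(traces[i], traces[j], max_lag)
             for j in range(n)] for i in range(n)]

def upgma(dist, threshold):
    """Average-linkage clustering; stop once the closest pair of clusters exceeds threshold."""
    clusters = [[i] for i in range(len(dist))]
    def avg(c1, c2):
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))
    while len(clusters) > 1:
        d, a, b = min((avg(clusters[i], clusters[j]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        if d > threshold:
            break
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters
```

Recording the merge distances instead of stopping at a threshold would give the full dendrogram; the threshold version simply returns the groups of events that would be treated as similar.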
Spatiotemporal Analysis of the Ebola Hemorrhagic Fever in West Africa in 2014
NASA Astrophysics Data System (ADS)
Xu, M.; Cao, C. X.; Guo, H. F.
2017-09-01
Ebola hemorrhagic fever (EHF) is an acute, highly contagious hemorrhagic disease caused by the Ebola virus. This paper aimed to explore the likely areas of aggregation of EHF cases in West Africa in 2014 and to identify endemic areas and their trends by means of space-time analysis. We mapped the distribution of EHF incidence and explored statistically significant spatial, temporal, and space-time disease clusters. We used hotspot analysis to find the spatial clustering pattern on the basis of the actual outbreak cases. Spatiotemporal cluster analysis was used to analyze the spatial and temporal distribution of disease aggregation and to examine whether this distribution is statistically significant. Local clusters were investigated using Kulldorff's scan statistic. The results reveal that the epidemic was mainly concentrated in the western part of Africa near the North Atlantic, with an obvious regional distribution. For the current epidemic, we identified areas of high incidence of Ebola virus disease (EVD) by means of spatial cluster analysis.
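Kulldorff's scan statistic scores each candidate zone by a likelihood ratio. Under the Poisson model, with c observed and E expected cases inside the zone and C cases in total, log LR = c ln(c/E) + (C - c) ln((C - c)/(C - E)) when c > E, and 0 otherwise. The sketch below evaluates a hypothetical set of predefined zones; real analyses (for example in SaTScan) scan circular or cylindrical windows and assess significance by Monte Carlo replication, which is omitted here.

```python
import math

def poisson_scan_llr(cases_in, expected_in, total_cases):
    """Kulldorff's Poisson log-likelihood ratio for one candidate zone (0 if no excess)."""
    c, e, big_c = cases_in, expected_in, total_cases
    if c <= e:
        return 0.0
    llr = c * math.log(c / e)
    if big_c > c:  # the outside term vanishes when every case lies inside the zone
        llr += (big_c - c) * math.log((big_c - c) / (big_c - e))
    return llr

def most_likely_cluster(zones, cases, population):
    """Return (zone name, log LR) of the highest-scoring zone.

    zones: dict mapping a zone name to the list of region indices it contains.
    Expected cases are allocated in proportion to population (null hypothesis).
    """
    total_c, total_p = sum(cases), sum(population)
    best = (None, 0.0)
    for name, regions in zones.items():
        c = sum(cases[r] for r in regions)
        e = total_c * sum(population[r] for r in regions) / total_p
        llr = poisson_scan_llr(c, e, total_c)
        if llr > best[1]:
            best = (name, llr)
    return best
```

The maximum over all zones is the test statistic; its p-value comes from repeating the same maximization on data simulated under the null hypothesis.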
The composite sequential clustering technique for analysis of multispectral scanner data
NASA Technical Reports Server (NTRS)
Su, M. Y.
1972-01-01
The clustering technique consists of two parts: (1) a sequential statistical clustering, which is essentially a sequential variance analysis, and (2) a generalized K-means clustering. In this composite technique, the output of (1) is a set of initial clusters that are input to (2) for further improvement by an iterative scheme. This unsupervised composite technique was employed for automatic classification of two sets of remotely sensed multispectral earth resource observations. The classification accuracy of the unsupervised technique was found to be comparable to that of traditional supervised maximum likelihood classification techniques. The mathematical algorithms for the composite sequential clustering program and a detailed computer program description with job setup are given.
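A minimal sketch of the two-stage idea: a single sequential pass produces initial clusters, which then seed an iterative K-means refinement. The first stage here is a simple distance-threshold ("leader") rule standing in for Su's sequential variance analysis, and the second stage is plain Lloyd's K-means rather than the generalized variant; both substitutions, the `radius` parameter, and the function names are assumptions.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sequential_seeds(points, radius):
    """One pass: start a new cluster whenever a point is farther than radius from all seeds."""
    seeds = []
    for p in points:
        if all(sq_dist(p, s) > radius ** 2 for s in seeds):
            seeds.append(list(p))
    return seeds

def kmeans(points, centroids, iterations=100):
    """Lloyd's K-means starting from the given centroids; returns (centroids, labels)."""
    centroids = [list(c) for c in centroids]
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return centroids, labels
```

Seeding K-means from a data-driven first pass fixes both the number of clusters and their starting positions, which avoids the sensitivity of K-means to arbitrary initial centroids.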
Comprehensive cluster analysis with Transitivity Clustering.
Wittkop, Tobias; Emig, Dorothea; Truss, Anke; Albrecht, Mario; Böcker, Sebastian; Baumbach, Jan
2011-03-01
Transitivity Clustering is a method for partitioning biological data, such as genes, into groups of similar objects. It provides integrated access to various functions addressing each step of a typical cluster analysis. To facilitate this, Transitivity Clustering is accessible online and offers three user-friendly interfaces: a powerful stand-alone version, a web interface, and a collection of Cytoscape plug-ins. In this paper, we describe three major workflows: (i) protein (super)family detection with Cytoscape, (ii) protein homology detection with incomplete gold standards and (iii) clustering of gene expression data. This protocol guides the user through the most important features of Transitivity Clustering and takes ∼1 h to complete.
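As we understand the model behind Transitivity Clustering, pairwise similarities are compared with a threshold t, and the goal is the partition (a transitive graph, i.e. disjoint cliques) that requires the cheapest edits: joining a pair with similarity s < t costs t - s, and separating a pair with s > t costs s - t. The sketch below computes that cost and finds the exact optimum by brute force over all partitions, which is only feasible for a handful of objects; the tool itself uses more scalable algorithms, and all function names here are hypothetical.

```python
from itertools import combinations

def editing_cost(sim, labels, threshold):
    """Cost of turning the threshold graph into the disjoint cliques given by labels."""
    cost = 0.0
    for i, j in combinations(range(len(labels)), 2):
        s = sim[i][j]
        if labels[i] == labels[j] and s < threshold:
            cost += threshold - s  # pair placed together despite low similarity
        elif labels[i] != labels[j] and s > threshold:
            cost += s - threshold  # pair split apart despite high similarity
    return cost

def set_partitions(n):
    """Yield every partition of range(n) as a label list (restricted growth strings)."""
    def extend(labels, k):
        if len(labels) == n:
            yield list(labels)
            return
        for lab in range(k + 1):
            labels.append(lab)
            yield from extend(labels, max(k, lab + 1))
            labels.pop()
    yield from extend([], 0)

def exact_transitive_clustering(sim, threshold):
    """Minimum-cost partition by exhaustive search (Bell-number many candidates)."""
    return min(set_partitions(len(sim)), key=lambda lab: editing_cost(sim, lab, threshold))
```

The threshold is the single user-facing parameter in this formulation: raising it splits groups, lowering it merges them.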
Analysis of 3D vortex motion in a dusty plasma
NASA Astrophysics Data System (ADS)
Mulsow, M.; Himpel, M.; Melzer, A.
2017-12-01
Dust clusters of about 50-1000 particles have been confined near the sheath region of a gaseous radio-frequency plasma discharge. These compact clusters exhibit a vortex motion which has been reconstructed in full three dimensions from stereoscopy. Smaller clusters are found to show a competition between solid-like cluster structure and vortex motion, whereas larger clusters feature very pronounced vortices. From the three-dimensional analysis, the dust flow field has been found to be nearly incompressible. The vortices in all observed clusters are essentially poloidal. The dependence of the vorticity on the cluster size is discussed. Finally, the vortex motion has been quantitatively attributed to radial gradients of the ion drag force.
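Near-incompressibility and vortex strength are typically judged from the divergence and vorticity of the measured velocity field. A minimal sketch, assuming the velocities have already been interpolated onto a regular 2D Cartesian grid (for example, a vertical slice through a poloidal vortex) and using central differences at interior points; the function name and grid layout are hypothetical.

```python
def divergence_and_vorticity(u, v, dx, dy):
    """Central-difference divergence and vorticity of a gridded 2D velocity field.

    u[i][j], v[i][j]: x- and y-velocity at x = j*dx, y = i*dy.
    Returns two grids; edge points, where central differences are undefined, hold None.
    """
    ny, nx = len(u), len(u[0])
    div = [[None] * nx for _ in range(ny)]
    vort = [[None] * nx for _ in range(ny)]
    for i in range(1, ny - 1):
        for j in range(1, nx - 1):
            du_dx = (u[i][j + 1] - u[i][j - 1]) / (2 * dx)
            dv_dy = (v[i + 1][j] - v[i - 1][j]) / (2 * dy)
            dv_dx = (v[i][j + 1] - v[i][j - 1]) / (2 * dx)
            du_dy = (u[i + 1][j] - u[i - 1][j]) / (2 * dy)
            div[i][j] = du_dx + dv_dy   # near zero for an incompressible flow
            vort[i][j] = dv_dx - du_dy  # out-of-plane vorticity component
    return div, vort
```

For solid-body rotation (u = -y, v = x) this gives zero divergence and a uniform vorticity of 2. In cylindrical coordinates the divergence gains an extra u_r / r term, so this Cartesian form applies only to Cartesian grids.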
Cluster analysis and prediction of treatment outcomes for chronic rhinosinusitis.
Soler, Zachary M; Hyer, J Madison; Rudmik, Luke; Ramakrishnan, Viswanathan; Smith, Timothy L; Schlosser, Rodney J
2016-04-01
Current clinical classifications of chronic rhinosinusitis (CRS) have weak prognostic utility regarding treatment outcomes. Simplified discriminant analysis based on unsupervised clustering has identified novel phenotypic subgroups of CRS, but its prognostic utility is unknown. We sought to determine whether discriminant analysis allows prognostication in patients choosing surgery versus continued medical management. A multi-institutional prospective study of patients with CRS in whom initial medical therapy had failed and who then self-selected continued medical management or surgical treatment was used to separate patients into 5 clusters based on a previously described discriminant analysis using total Sino-Nasal Outcome Test-22 (SNOT-22) score, age, and missed productivity. Patients completed the SNOT-22 at baseline and throughout 18 months of follow-up. Baseline demographic and objective measures included olfactory testing, computed tomography, and endoscopy scoring. SNOT-22 outcomes for surgical versus continued medical treatment were compared across clusters. Data were available on 690 patients. Baseline differences in demographics, comorbidities, objective disease measures, and patient-reported outcomes were similar to those in previous clustering reports. Three of 5 clusters identified by means of discriminant analysis had improved SNOT-22 outcomes with surgical intervention when compared with continued medical management (surgery was a mean of 21.2 points better across these 3 clusters at 6 months, P < .05). These differences were sustained at 18 months of follow-up. Two of 5 clusters had similar outcomes with surgery and with continued medical management. A simplified discriminant analysis based on 3 common clinical variables is able to cluster patients and provide prognostic information regarding surgical treatment versus continued medical management in patients with CRS. Copyright © 2015 American Academy of Allergy, Asthma & Immunology. Published by Elsevier Inc. 
All rights reserved.
Analysis of Tropical Cyclone Tracks in the North Indian Ocean
NASA Astrophysics Data System (ADS)
Patwardhan, A.; Paliwal, M.; Mohapatra, M.
2011-12-01
Cyclones are regarded as one of the most dangerous meteorological phenomena of the tropics. The probability of landfall of a tropical cyclone depends on its movement (trajectory). Analysis of tropical cyclone trajectories could be useful for identifying potentially predictable characteristics. There is a long history of analyzing tropical cyclone tracks. A common approach is to use clustering techniques to group cyclone tracks on the basis of certain characteristics. Various clustering methods have been used to study tropical cyclones in different ocean basins, such as the western North Pacific Ocean (Elsner and Liu, 2003; Camargo et al., 2007) and the North Atlantic Ocean (Elsner, 2003; Gaffney et al., 2007; Nakamura et al., 2009). In this study, tropical cyclone tracks in the North Indian Ocean basin for the period 1961-2010 were analyzed and grouped into clusters based on their spatial characteristics. Each tropical cyclone trajectory is approximated as an open curve and described by its first two moments. The resulting clusters have different centroid locations and differently shaped variance ellipses. These track characteristics are then used in standard clustering algorithms, which allows the whole track shape, length, and location to be incorporated into the clustering methodology. The resulting clusters have different genesis locations and trajectory shapes. We also examined characteristics such as life span, maximum sustained wind speed, landfall, and seasonality, many of which differ significantly across the identified clusters. The clustering approach groups cyclones with higher maximum wind speeds and the longest life spans into one cluster. Another cluster includes short-duration cyclonic events that are mostly deep depressions and are significant for rainfall over eastern and central India. The clustering approach is likely to prove useful for analyzing events that are significant in terms of their impacts.
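A minimal sketch of describing a track by its first two moments, assuming equally weighted track points and latitude/longitude treated as planar coordinates: the mean position gives the centroid, and the eigenvalues of the 2x2 covariance matrix are proportional to the squared semi-axes of the variance ellipse. The function names are hypothetical.

```python
import math

def track_moments(lats, lons):
    """Centroid and 2x2 covariance of a track's points (equal weights, planar coordinates)."""
    n = len(lats)
    mlat, mlon = sum(lats) / n, sum(lons) / n
    var_lat = sum((a - mlat) ** 2 for a in lats) / n
    var_lon = sum((b - mlon) ** 2 for b in lons) / n
    cov = sum((a - mlat) * (b - mlon) for a, b in zip(lats, lons)) / n
    return (mlat, mlon), [[var_lat, cov], [cov, var_lon]]

def ellipse_axes(cov):
    """Eigenvalues (major, minor) of a symmetric 2x2 covariance matrix."""
    a, b, d = cov[0][0], cov[0][1], cov[1][1]
    mid = (a + d) / 2
    rad = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    return mid + rad, mid - rad
```

The five numbers per track (two means, two variances, one covariance) form a fixed-length feature vector, so tracks of different lengths can be fed to a standard algorithm such as K-means; a straight track yields a degenerate ellipse with a minor axis of zero.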
Cohen, Mitchell J; Grossman, Adam D; Morabito, Diane; Knudson, M Margaret; Butte, Atul J; Manley, Geoffrey T
2010-01-01
Advances in technology have made extensive monitoring of patient physiology the standard of care in intensive care units (ICUs). While many systems exist to compile these data, there has been no systematic multivariate analysis and categorization across patient physiological data. The sheer volume and complexity of these data make pattern recognition or identification of patient state difficult. Hierarchical cluster analysis allows visualization of high dimensional data and enables pattern recognition and identification of physiologic patient states. We hypothesized that processing of multivariate data using hierarchical clustering techniques would allow identification of otherwise hidden patient physiologic patterns that would be predictive of outcome. Multivariate physiologic and ventilator data were collected continuously using a multimodal bioinformatics system in the surgical ICU at San Francisco General Hospital. These data were incorporated with non-continuous data and stored on a server in the ICU. A hierarchical clustering algorithm grouped each minute of data into 1 of 10 clusters. Clusters were correlated with outcome measures including incidence of infection, multiple organ failure (MOF), and mortality. We identified 10 clusters, which we defined as distinct patient states. While patients transitioned between states, they spent significant amounts of time in each. Clusters were enriched for our outcome measures: 2 of the 10 states were enriched for infection, 6 of 10 were enriched for MOF, and 3 of 10 were enriched for death. Further analysis of correlations between pairs of variables within each cluster reveals significant differences in physiology between clusters. Here we show for the first time the feasibility of clustering physiological measurements to identify clinically relevant patient states after trauma. 
These results demonstrate that hierarchical clustering techniques can be useful for visualizing complex multivariate data and may provide new insights for the care of critically injured patients.
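The abstract says clusters were "enriched" for outcomes but does not name the test. Assuming a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test), the enrichment of a cluster can be scored as below; the function name is hypothetical. Because consecutive minutes from the same patient are not independent, p-values computed this way at the minute level would be optimistic.

```python
from math import comb

def enrichment_p_value(total, total_events, in_cluster, events_in_cluster):
    """One-sided hypergeometric P(X >= events_in_cluster).

    total: all observations (e.g. patient-minutes); total_events: observations
    with the outcome; in_cluster: observations assigned to the cluster.
    """
    upper = min(in_cluster, total_events)
    hits = sum(comb(total_events, k) * comb(total - total_events, in_cluster - k)
               for k in range(events_in_cluster, upper + 1))
    return hits / comb(total, in_cluster)
```

Testing each of the 10 states against each outcome is a multiple-comparison setting, so some correction (for example Bonferroni) would normally be applied to these p-values.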
A Bayesian cluster analysis method for single-molecule localization microscopy data.
Griffié, Juliette; Shannon, Michael; Bromley, Claire L; Boelen, Lies; Burn, Garth L; Williamson, David J; Heard, Nicholas A; Cope, Andrew P; Owen, Dylan M; Rubin-Delanchy, Patrick
2016-12-01
Cell function is regulated by the spatiotemporal organization of the signaling machinery, and a key facet of this is molecular clustering. Here, we present a protocol for the analysis of clustering in data generated by 2D single-molecule localization microscopy (SMLM), for example, photoactivated localization microscopy (PALM) or stochastic optical reconstruction microscopy (STORM). Three features of such data can cause standard cluster analysis approaches to be ineffective: (i) the data take the form of a list of points rather than a pixel array; (ii) there is a non-negligible unclustered background density of points that must be accounted for; and (iii) each localization has an associated uncertainty in regard to its position. These issues are overcome using a Bayesian, model-based approach. Many possible cluster configurations are proposed and scored against a generative model, which assumes Gaussian clusters overlaid on a completely spatially random (CSR) background, before every point is scrambled by its localization precision. We present the process of generating simulated and experimental data that are suitable for our algorithm, the analysis itself, and the extraction and interpretation of key cluster descriptors such as the number of clusters, cluster radii and the number of localizations per cluster. Variations in these descriptors can be interpreted as arising from changes in the organization of the cellular nanoarchitecture. The protocol requires no specific programming ability, and the processing time for one data set, typically containing 30 regions of interest, is ∼18 h; user input takes ∼1 h.
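The generative model described above (Gaussian clusters on a completely spatially random background, with each point then displaced by its localization error) can be simulated in a few lines. This sketch only generates data; it does not implement the Bayesian scoring of cluster proposals. It assumes a square region of interest, a single isotropic localization precision for all points, and parameters in arbitrary length units (e.g., nm); the function name and the final cropping step are hypothetical choices.

```python
import random

def simulate_smlm(n_clusters, points_per_cluster, cluster_sd, n_background,
                  roi_size, precision_sd, seed=None):
    """Draw points from a Gaussian-clusters-on-CSR model, then blur by localization error.

    Returns a list of (x, y, is_background) tuples inside a square ROI of side roi_size.
    """
    rng = random.Random(seed)
    true_points = []
    for _ in range(n_clusters):
        cx, cy = rng.uniform(0, roi_size), rng.uniform(0, roi_size)
        for _ in range(points_per_cluster):
            true_points.append((rng.gauss(cx, cluster_sd), rng.gauss(cy, cluster_sd), False))
    # Complete spatial randomness: background points uniform over the ROI
    for _ in range(n_background):
        true_points.append((rng.uniform(0, roi_size), rng.uniform(0, roi_size), True))
    # Each localization is displaced by an independent Gaussian localization error
    observed = [(x + rng.gauss(0, precision_sd), y + rng.gauss(0, precision_sd), bg)
                for x, y, bg in true_points]
    # Crop to the ROI, since points near the edges may have been pushed outside it
    return [p for p in observed if 0 <= p[0] <= roi_size and 0 <= p[1] <= roi_size]
```

Simulated data of this kind, with known cluster counts and radii, is what makes it possible to check whether the extracted descriptors are recovered correctly.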
NASA Astrophysics Data System (ADS)
Hidayat, Y.; Purwandari, T.; Sukono; Ariska, Y. D.
2017-01-01
This study aimed to identify the population of countries that are similar to Indonesia with respect to three characteristics: democratic atmosphere, rice consumption, and purchasing power for rice. This population is useful as a reference for research testing the strength and predictive power of the rice crisis indicator Unprecedented Restlessness (UR). Similarity to Indonesia was assessed using a multivariate method, non-hierarchical k-means cluster analysis, with 38 countries as the data population. The analysis was repeated until a number of clusters was obtained that showed the discriminating power of the three characteristics and high within-cluster similarity. The results show that 6 clusters describe the discriminating power of the characteristics of the formed clusters. To answer the purpose of the study, however, only one cluster was selected, according to these criteria: the cluster contains Indonesia; it includes both countries that experienced a rice crisis in 2008 and countries that did not; and it has the largest membership. These criteria are met by cluster 2, which consists of 22 countries: Indonesia, Brazil, Costa Rica, Djibouti, Dominican Republic, Ecuador, Fiji, Guinea-Bissau, Haiti, India, Jamaica, Japan, South Korea, Madagascar, Malaysia, Mali, Nicaragua, Panama, Peru, Senegal, Sierra Leone and Suriname.
ERIC Educational Resources Information Center
Brown, S. J.; White, S.; Power, N.
2015-01-01
A cluster analysis data classification technique was used on assessment scores from 157 undergraduate nursing students who passed 2 successive compulsory courses in human anatomy and physiology. Student scores in five summative assessment tasks, taken in each of the courses, were used as inputs for a cluster analysis procedure. We aimed to group…
Unequal cluster sizes in stepped-wedge cluster randomised trials: a systematic review
Morris, Tom; Gray, Laura
2017-01-01
Objectives: To investigate the extent to which cluster sizes vary in stepped-wedge cluster randomised trials (SW-CRT) and whether any variability is accounted for during the sample size calculation and analysis of these trials. Setting: Any, not limited to healthcare settings. Participants: Any taking part in an SW-CRT published up to March 2016. Primary and secondary outcome measures: The primary outcome is the variability in cluster sizes, measured by the coefficient of variation (CV) in cluster size. Secondary outcomes include the difference between the cluster sizes assumed during the sample size calculation and those observed during the trial, any reported variability in cluster sizes, and whether the methods of sample size calculation and methods of analysis accounted for any variability in cluster sizes. Results: Of the 101 included SW-CRTs, 48% mentioned that the included clusters were known to vary in size, yet only 13% of these accounted for this during the calculation of the sample size. However, 69% of the trials did use a method of analysis appropriate for when clusters vary in size. Full trial reports were available for 53 trials. The CV was calculated for 23 of these: the median CV was 0.41 (IQR: 0.22–0.52). Actual cluster sizes could be compared with those assumed during the sample size calculation for 14 (26%) of the trial reports; the cluster sizes were between 29% and 480% of those assumed. Conclusions: Cluster sizes often vary in SW-CRTs. Reporting of SW-CRTs also remains suboptimal. The effect of unequal cluster sizes on the statistical power of SW-CRTs needs further exploration, and methods appropriate to studies with unequal cluster sizes need to be employed. PMID:29146637
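The review's primary outcome, the coefficient of variation of cluster size, is the standard deviation of the cluster sizes divided by their mean. The abstract does not say whether the sample or the population standard deviation was used; the sample version is assumed in this sketch, and the function name is hypothetical.

```python
import statistics

def cluster_size_cv(sizes):
    """Coefficient of variation of cluster sizes: sample standard deviation / mean."""
    return statistics.stdev(sizes) / statistics.mean(sizes)
```

Equal cluster sizes give a CV of 0; sizes of 10, 20 and 30 give 0.5, somewhat above the median of 0.41 reported in the review.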
Toyoda, Hiromitsu; Takahashi, Shinji; Hoshino, Masatoshi; Takayama, Kazushi; Iseki, Kazumichi; Sasaoka, Ryuichi; Tsujio, Tadao; Yasuda, Hiroyuki; Sasaki, Takeharu; Kanematsu, Fumiaki; Kono, Hiroshi; Nakamura, Hiroaki
2017-09-23
This study demonstrated four distinct patterns in the course of back pain after osteoporotic vertebral fracture (OVF). Greater angular instability in the first 6 months after the baseline was one factor affecting back pain after OVF. Understanding the natural course of symptomatic acute OVF is important in deciding the optimal treatment strategy. We used latent class analysis to classify the course of back pain after OVF and identify the risk factors associated with persistent pain. This multicenter cohort study included 218 consecutive patients with ≤ 2-week-old OVFs who were enrolled at 11 institutions. Dynamic x-rays and back pain assessment with a visual analog scale (VAS) were obtained at enrollment and at 1-, 3-, and 6-month follow-ups. Hierarchical cluster analysis of the VAS scores was used to characterize patient groups, and VAS data from 128 patients were included in this analysis. The analysis yielded four clusters representing different patterns of back pain progression. Cluster 1 patients (50.8%) had stable, mild pain. Cluster 2 patients (21.1%) started with moderate pain and progressed quickly to very low pain. Patients in cluster 3 (10.9%) had moderate pain that initially improved but worsened after 3 months. Cluster 4 patients (17.2%) had persistent severe pain. Patients in cluster 4 showed significantly higher baseline pain intensity, a greater degree of angular instability, and a higher number of previous OVFs, and tended to lack regular exercise. In contrast, patients in cluster 2 had significantly lower baseline VAS and less angular instability. We identified four distinct groups of OVF patients with different patterns of back pain progression. Understanding the course of back pain after OVF may help in its management and contribute to future treatment trials.
Fens, Niki; van Rossum, Annelot G J; Zanen, Pieter; van Ginneken, Bram; van Klaveren, Rob J; Zwinderman, Aeilko H; Sterk, Peter J
2013-06-01
Classification of COPD is currently based on the presence and severity of airways obstruction. However, this may not fully reflect the phenotypic heterogeneity of COPD in the (ex-) smoking community. We hypothesized that factor analysis followed by cluster analysis of functional, clinical, radiological and exhaled breath metabolomic features identifies subphenotypes of COPD in a community-based population of heavy (ex-) smokers. Adults aged 50-75 years with a smoking history of at least 15 pack-years, drawn from a random population-based survey as part of the NELSON study, underwent detailed assessment of pulmonary function, chest CT scanning, questionnaires and exhaled breath molecular profiling using an electronic nose. Factor and cluster analyses were performed on the subgroup of subjects fulfilling the GOLD criteria for COPD (post-BD FEV1/FVC < 0.70). Three hundred subjects were recruited, of whom 157 fulfilled the criteria for COPD and were included in the factor and cluster analysis. Four clusters were identified: cluster 1 (n = 35; 22%): mild COPD, limited symptoms and good quality of life. Cluster 2 (n = 48; 31%): low lung function, combined emphysema and chronic bronchitis and a distinct breath molecular profile. Cluster 3 (n = 60; 38%): emphysema predominant COPD with preserved lung function. Cluster 4 (n = 14; 9%): highly symptomatic COPD with mildly impaired lung function. In a leave-one-out validation analysis, an accuracy of 97.4% was reached. This unbiased taxonomy for mild to moderate COPD reinforces clusters found in previous studies and thereby allows better phenotyping of COPD in the general (ex-) smoking population.
Astrophysical properties of star clusters in the Magellanic Clouds homogeneously estimated by ASteCA
NASA Astrophysics Data System (ADS)
Perren, G. I.; Piatti, A. E.; Vázquez, R. A.
2017-06-01
Aims: We seek to produce a homogeneous catalog of astrophysical parameters of 239 resolved star clusters, located in the Small and Large Magellanic Clouds, observed in the Washington photometric system. Methods: The cluster sample was processed with the recently introduced Automated Stellar Cluster Analysis (ASteCA) package, which ensures both an automated and a fully reproducible treatment, together with a statistically based analysis of the clusters' fundamental parameters and associated uncertainties. The fundamental parameters determined for each cluster with this tool, via a color-magnitude diagram (CMD) analysis, are metallicity, age, reddening, distance modulus, and total mass. Results: We generated a homogeneous catalog of structural and fundamental parameters for the studied cluster sample and performed a detailed internal error analysis along with a thorough comparison with values taken from 26 published articles. We studied the distribution of cluster fundamental parameters in both Clouds and obtained their age-metallicity relationships. Conclusions: The ASteCA package can be applied to an unsupervised determination of fundamental cluster parameters, which is a task of increasing relevance as more data become available through upcoming surveys. A table with the estimated fundamental parameters for the 239 clusters analyzed is only available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (http://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/602/A89
Variable Screening for Cluster Analysis.
ERIC Educational Resources Information Center
Donoghue, John R.
Inclusion of irrelevant variables in a cluster analysis adversely affects subgroup recovery. This paper examines using moment-based statistics to screen variables; only variables that pass the screening are then used in clustering. Normal mixtures are shown analytically to often possess negative kurtosis. Two related measures, "m" and…
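The abstract's central observation is that mixtures of well-separated normal subgroups tend to have negative excess kurtosis. Kurtosis can therefore flag variables that carry cluster structure. The minimal sketch below screens on plain moment-based excess kurtosis. It is not Donoghue's "m" statistic, whose definition is cut off here, and the -0.5 threshold is a hypothetical choice.

```python
import numpy as np

def excess_kurtosis(x):
    """Moment-based excess kurtosis m4 / m2**2 - 3 (0 for a normal distribution)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2 = np.mean(d ** 2)
    m4 = np.mean(d ** 4)
    return m4 / m2 ** 2 - 3.0

def screen_variables(X, threshold=-0.5):
    """Keep column indices whose excess kurtosis is below a hypothetical threshold."""
    X = np.asarray(X, dtype=float)
    return [j for j in range(X.shape[1]) if excess_kurtosis(X[:, j]) < threshold]

# Column 0: two well-separated normal subgroups; column 1: a single normal (noise).
rng = np.random.default_rng(0)
bimodal = np.concatenate([rng.normal(-3, 1, 1000), rng.normal(3, 1, 1000)])
noise = rng.normal(0, 1, 2000)
X = np.column_stack([bimodal, noise])
print(screen_variables(X))
```

An equal mixture of N(-3, 1) and N(3, 1) has a theoretical excess kurtosis of about -1.62. A single normal variable sits near 0, so only the first column should pass the screen.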
Spatial pattern recognition of seismic events in South West Colombia
NASA Astrophysics Data System (ADS)
Benítez, Hernán D.; Flórez, Juan F.; Duque, Diana P.; Benavides, Alberto; Lucía Baquero, Olga; Quintero, Jiber
2013-09-01
Recognition of seismogenic zones in geographical regions supports seismic hazard studies. This recognition is usually based on visual, qualitative and subjective analysis of data. Spatial pattern recognition provides a well-founded means to obtain relevant information from large amounts of data. The purpose of this work is to identify and classify spatial patterns in instrumental data from the South West Colombian seismic database. In this research, a clustering tendency analysis assesses whether the seismic database possesses a clustering structure. An unsupervised fuzzy clustering algorithm then creates groups of seismic events. Because fuzzy clustering algorithms are sensitive to the initial positions of the centroids, we propose a methodology to initialize centroids that generates partitions that are stable with respect to centroid initialization. As a result of this work, a public software tool provides users with the routines developed for the clustering methodology. The analysis of the resulting seismogenic zones reveals meaningful spatial patterns in South West Colombia. The clustering analysis provides a quantitative location and dispersion of seismogenic zones that facilitates seismological interpretation of seismic activity in South West Colombia.
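Fuzzy clustering gives each event a degree of membership in every group instead of a single hard label. The sketch below shows standard fuzzy c-means, which is an assumption, since the abstract does not name the specific algorithm. It uses random initial memberships, and that random start is exactly the source of the initialization sensitivity the authors address. Their stabilizing initialization is not reproduced. Treating rows of `X` as event features such as epicentre coordinates is also an assumption.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=300, tol=1e-6, seed=0):
    """Standard fuzzy c-means with random initial memberships (fuzzifier m > 1)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)          # memberships sum to 1 per event
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]   # membership-weighted means
        D = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        D = np.fmax(D, 1e-12)                  # avoid division by zero
        # u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1))
        ratio = (D[:, :, None] / D[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        converged = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if converged:
            break
    return centers, U
```

Rerunning this sketch with different `seed` values and comparing the partitions is a simple way to see the sensitivity to initialization described above.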
Cluster analysis of multiple planetary flow regimes
NASA Technical Reports Server (NTRS)
Mo, Kingtse; Ghil, Michael
1988-01-01
A modified cluster analysis method developed for the classification of quasi-stationary events into a few planetary flow regimes and for the examination of transitions between these regimes is described. The method was applied first to a simple deterministic model and then to a 500-mbar data set for the Northern Hemisphere (NH), for which cluster analysis was carried out in the subspace of the first seven empirical orthogonal functions (EOFs). Stationary clusters were found in the low-frequency band of periods longer than 10 days, while transient clusters were found in the band-pass frequency window between 2.5 and 6 days. In the low-frequency band, three pairs of clusters were determined by EOFs 1, 2, and 3, respectively; they exhibited well-known regional features, such as blocking, the Pacific/North American pattern, and wave trains. Both the model and the low-pass data exhibited strong bimodality.
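Clustering "in the subspace of the first seven EOFs" means projecting each map onto the leading spatial patterns of the anomaly field and clustering the resulting low-dimensional coordinates. The minimal sketch below computes EOFs by singular value decomposition. It assumes a (time x grid point) array and ignores the area (cos-latitude) weighting often applied to gridded fields. The projected principal components can then be passed to any clustering routine. The modified method of the paper is not reproduced.

```python
import numpy as np

def eof_projection(field, n_eofs):
    """EOFs of a (n_times, n_gridpoints) field via SVD of its time anomalies.

    Returns the leading spatial patterns, the time series of projections
    (principal components) and the fraction of variance each EOF explains.
    No area weighting is applied (assumption).
    """
    anomalies = field - field.mean(axis=0)          # remove the time mean at each point
    _, s, Vt = np.linalg.svd(anomalies, full_matrices=False)
    eofs = Vt[:n_eofs]                              # (n_eofs, n_gridpoints), orthonormal rows
    pcs = anomalies @ eofs.T                        # (n_times, n_eofs) subspace coordinates
    var_frac = s ** 2 / np.sum(s ** 2)
    return eofs, pcs, var_frac[:n_eofs]
```

For example, `eofs, pcs, frac = eof_projection(z500, 7)` (with `z500` a hypothetical height array) gives a seven-dimensional point per map for the cluster search.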
Clustering performance comparison using K-means and expectation maximization algorithms.
Jung, Yong Gyu; Kang, Min Soo; Heo, Jun
2014-11-14
Clustering is an important data-mining technique that separates data into categories according to similar features. Unlike classification algorithms, clustering algorithms are unsupervised. Two representative clustering algorithms are K-means and the expectation maximization (EM) algorithm. Logistic regression extends linear regression analysis to a categorical dependent variable, modeling the probability that an event occurs through a linear combination of independent variables. However, classifying all of the data by logistic regression analysis alone cannot guarantee accurate results. In this paper, logistic regression analysis is applied to the clusters produced by EM and by K-means for the quality assessment of red wine, and a method is proposed for ensuring the accuracy of the classification results.
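The practical difference between the two algorithms is that K-means assigns each point to exactly one cluster, whereas EM for a Gaussian mixture assigns soft "responsibilities" and also estimates each cluster's variance and weight. The minimal sketch below covers a one-dimensional Gaussian mixture only. It starts from quantiles of the data, which is an assumption chosen for reproducibility. The logistic-regression step of the paper is not shown.

```python
import numpy as np

def em_gmm_1d(x, k, n_iter=200):
    """EM for a 1-D Gaussian mixture with k components (quantile initialisation)."""
    x = np.asarray(x, dtype=float)
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)   # spread initial means over the data
    var = np.full(k, x.var())
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities r[i, j] = P(component j | x_i)
        dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted updates of mixing weights, means and variances
        nk = r.sum(axis=0)
        pi = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = np.maximum((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk, 1e-6)
    return pi, mu, var, r
```

Taking `r.argmax(axis=1)` turns the soft EM result into hard labels that can be compared directly with a K-means partition.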
A novel symptom cluster analysis among ambulatory HIV/AIDS patients in Uganda.
Namisango, Eve; Harding, Richard; Katabira, Elly T; Siegert, Richard J; Powell, Richard A; Atuhaire, Leonard; Moens, Katrien; Taylor, Steve
2015-01-01
Symptom clusters are gaining importance given that HIV/AIDS patients experience multiple, concurrent symptoms. This study aimed to: determine clusters of patients with similar symptom combinations; describe symptom combinations distinguishing the clusters; and evaluate the clusters regarding patient socio-demographic, disease and treatment characteristics, quality of life (QOL) and functional performance. This was a cross-sectional study of 302 adult HIV/AIDS outpatients consecutively recruited at two teaching and referral hospitals in Uganda. Socio-demographic data and seven-day symptom prevalence and distress data were self-reported using the Memorial Symptom Assessment Schedule. QOL was assessed using the Medical Outcome Scale and functional performance using the Karnofsky Performance Scale. Symptom clusters were established by hierarchical cluster analysis of symptom occurrence, using squared Euclidean distances and Ward's clustering method. Analysis of variance compared clusters on mean QOL and functional performance scores. Patient subgroups were categorised based on symptom occurrence rates. Five symptom occurrence clusters were identified: Cluster 1 (n=107), high-low for sensory discomfort and eating difficulties symptoms; Cluster 2 (n=47), high-low for psycho-gastrointestinal symptoms; Cluster 3 (n=71), high for pain and sensory disturbance symptoms; Cluster 4 (n=35), all high for general HIV/AIDS symptoms; and Cluster 5 (n=48), all low for mood-cognitive symptoms. The all-high occurrence cluster was associated with the worst functional status, the poorest QOL scores and the highest symptom-associated distress. Use of antiretroviral therapy was associated with the all-high symptom occurrence rate (Fisher's exact=4, P<0.001). The CD4 count group below 200 was associated with the all-high occurrence rate symptom cluster (Fisher's exact=41, P<0.001). 
Symptom clusters have a differential effect on HIV/AIDS patients' self-reported outcomes, with the subgroup experiencing high symptom occurrence rates at higher risk of poorer outcomes. Identification of symptom clusters could provide insights into commonly co-occurring symptoms that should be jointly targeted for management in patients with multiple complaints.
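Ward's method merges, at each step, the pair of clusters whose union gives the smallest increase in the within-cluster sum of squared Euclidean distances. SciPy's `linkage(..., method="ward")` implements this criterion. The minimal sketch below applies it to a binary patient-by-symptom occurrence matrix. The matrix layout and the helper name are assumptions for illustration, not the study's data or code.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def ward_clusters(occurrence, n_clusters):
    """Ward hierarchical clustering of rows (patients) of a 0/1 symptom matrix.

    Returns cluster labels 1..n_clusters obtained by cutting the dendrogram.
    """
    Z = linkage(np.asarray(occurrence, dtype=float), method="ward", metric="euclidean")
    return fcluster(Z, t=n_clusters, criterion="maxclust")

# Hypothetical data: two symptom profiles, 20 patients each, 5% random flips.
rng = np.random.default_rng(2)
profile_a = np.array([1] * 5 + [0] * 5)
X = np.vstack([np.tile(profile_a, (20, 1)), np.tile(1 - profile_a, (20, 1))])
X = np.where(rng.random(X.shape) < 0.05, 1 - X, X)
labels = ward_clusters(X, 2)
```

In practice the number of clusters would be chosen by inspecting the dendrogram rather than fixed in advance.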
Moens, Katrien; Siegert, Richard J; Taylor, Steve; Namisango, Eve; Harding, Richard
2015-01-01
Symptom research across conditions has historically focused on single symptoms, and the burden of multiple symptoms and their interactions has been relatively neglected, especially in people living with HIV. Symptom cluster studies are required to set priorities in treatment planning and to lessen the total symptom burden. This study aimed to identify and compare symptom clusters among people living with HIV attending five palliative care facilities in two sub-Saharan African countries. Data from cross-sectional self-report of seven-day symptom prevalence on the 32-item Memorial Symptom Assessment Scale-Short Form were used. A hierarchical cluster analysis was conducted using Ward's method, with squared Euclidean distance as the distance measure, to determine the clusters. Contingency tables, χ2 tests and ANOVA were used to compare the clusters by patient-specific characteristics and distress scores. Among the sample (N=217) the mean age was 36.5 (SD 9.0), 73.2% were female, and 49.1% were on antiretroviral therapy (ART). The cluster analysis produced five symptom clusters identified as: 1) dermatological; 2) generalised anxiety and elimination; 3) social and image; 4) persistently present; and 5) a gastrointestinal-related symptom cluster. The patients in the first three symptom clusters reported the highest physical and psychological distress scores. Patient characteristics varied significantly across the five clusters by functional status (worst functional physical status in cluster one, p<0.001); being on ART (highest proportions for clusters two and three, p=0.012); global distress (F=26.8, p<0.001), physical distress (F=36.3, p<0.001) and psychological distress subscale (F=21.8, p<0.001) (all subscales worst for cluster one, best for cluster four). The greatest burden is associated with cluster one, which should be prioritised in clinical management. 
Further symptom cluster research in people living with HIV with longitudinally collected symptom data to test cluster stability and identify common symptom trajectories is recommended.
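Comparing clusters on a categorical characteristic (such as being on ART) uses a clusters-by-category contingency table and a χ2 test. Comparing them on a continuous score (such as distress) uses one-way ANOVA. The minimal sketch below uses SciPy. The function and variable names are hypothetical, and the data in the usage lines are synthetic, not the study's.

```python
import numpy as np
from scipy.stats import chi2_contingency, f_oneway

def compare_clusters(labels, on_art, distress):
    """Chi-square test of ART status by cluster and one-way ANOVA of distress by cluster."""
    labels = np.asarray(labels)
    on_art = np.asarray(on_art, dtype=bool)
    distress = np.asarray(distress, dtype=float)
    clusters = np.unique(labels)
    # Rows: clusters; columns: on ART / not on ART.
    table = np.array([[np.sum((labels == c) & on_art), np.sum((labels == c) & ~on_art)]
                      for c in clusters])
    chi2, p_chi2, dof, _ = chi2_contingency(table)
    F, p_anova = f_oneway(*[distress[labels == c] for c in clusters])
    return {"chi2": chi2, "p_chi2": p_chi2, "dof": dof, "F": F, "p_anova": p_anova}

# Synthetic example: three clusters of 30 patients with clearly different distress means.
rng = np.random.default_rng(3)
labels = np.repeat([1, 2, 3], 30)
on_art = np.tile([True, False], 45)
distress = labels * 10 + rng.normal(0, 1, 90)
result = compare_clusters(labels, on_art, distress)
```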
Pellegrini, Michael; Zoghi, Maryam; Jaberzadeh, Shapour
2018-01-12
Cluster analysis and other subgrouping techniques have risen in popularity in recent years in non-invasive brain stimulation research, in an attempt to investigate the issue of inter-individual variability: why some individuals respond to non-invasive brain stimulation protocols as traditionally expected and others do not. Cluster analysis and subgrouping techniques have been used to categorise individuals, based on their response patterns, as responders or non-responders. There is, however, a lack of consensus and consistency on the most appropriate technique to use. This systematic review aimed to provide a systematic summary of the cluster analysis and subgrouping techniques used to date and to suggest recommendations moving forward. Twenty studies that utilised subgrouping techniques were included, seven of which additionally utilised cluster analysis techniques. The results of this systematic review appear to indicate that statistical cluster analysis techniques are effective in identifying subgroups of individuals based on response patterns to non-invasive brain stimulation. This systematic review also reports a lack of consensus amongst researchers on the most effective subgrouping technique and on the criteria used to determine whether an individual is categorised as a responder or a non-responder. This systematic review provides a step-by-step guide to carrying out statistical cluster analyses and subgrouping techniques, to provide a framework for analysis when developing further insights into the contributing factors of inter-individual variability in response to non-invasive brain stimulation.
Grimsley, Jasmine M S; Gadziola, Marie A; Wenstrup, Jeffrey J
2012-01-01
Mouse pups vocalize at high rates when they are cold or isolated from the nest. The proportions of each syllable type produced carry information about disease state and are being used as behavioral markers for the internal state of animals. Manual classifications of these vocalizations identified 10 syllable types based on their spectro-temporal features. However, manual classification of mouse syllables is time consuming and vulnerable to experimenter bias. This study uses an automated cluster analysis to identify acoustically distinct syllable types produced by CBA/CaJ mouse pups, and then compares the results to prior manual classification methods. The cluster analysis identified two syllable types, distinguished by their frequency bands, that have continuous frequency-time structure, and two syllable types featuring abrupt frequency transitions. Although the cluster analysis produced fewer syllable types than manual classification, the clusters represented the probability distributions of the acoustic features within syllables well. These probability distributions indicate that some of the manually classified syllable types are not statistically distinct. The characteristics of the four clusters were used to generate a Microsoft Excel-based mouse syllable classifier that rapidly categorizes syllables, with a match rate of over 90%, into the syllable types determined by cluster analysis.
Micro-heterogeneity versus clustering in binary mixtures of ethanol with water or alkanes.
Požar, Martina; Lovrinčević, Bernarda; Zoranić, Larisa; Primorać, Tomislav; Sokolić, Franjo; Perera, Aurélien
2016-08-24
Ethanol is a hydrogen-bonding liquid. When mixed in small concentrations with water or alkanes, it forms aggregate structures reminiscent of the direct (in water) and inverse (in alkanes) micellar aggregates found in emulsions, albeit at much smaller sizes. At higher concentrations, micro-heterogeneous mixing with segregated domains is found. We examine how efficiently different statistical methods, namely correlation function analysis, structure factor analysis and cluster distribution analysis, can describe these morphological changes in the mixtures. In particular, we explain how the pre-peak in the structure factor of the neat alcohol evolves into the domain pre-peak under mixing conditions, and how this evolution differs depending on whether the co-solvent is water or an alkane. This study clearly establishes the heuristic superiority of correlation function/structure factor analysis for studying micro-heterogeneity, since cluster distribution analysis is insensitive to domain segregation. Correlation functions detect the domains, with a clear structure factor pre-peak signature, while cluster techniques detect the cluster hierarchy within domains. The main conclusion is that, in micro-segregated mixtures, the domain structure is a more fundamental statistical entity than the underlying cluster structures. These findings could help in comparing radiation scattering experiments, which are sensitive to domains, with spectroscopy-NMR experiments, which are sensitive to clusters.
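A cluster distribution analysis typically links two molecules when they satisfy a bonding criterion and then counts the connected groups by size. The minimal sketch below makes several assumptions. It uses a plain distance cutoff between site positions as the bonding criterion (real hydrogen-bond definitions often add angular conditions), a cubic periodic box with the minimum-image convention, and a hypothetical function name.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components

def cluster_size_distribution(positions, box, cutoff):
    """Histogram of cluster sizes for sites linked when closer than `cutoff`.

    Assumes a cubic periodic box of side `box` (minimum-image convention).
    Returns hist with hist[s] = number of clusters containing s sites (hist[0] == 0).
    """
    pos = np.asarray(positions, dtype=float)
    delta = pos[:, None, :] - pos[None, :, :]
    delta -= box * np.round(delta / box)              # minimum-image displacement
    dist = np.sqrt((delta ** 2).sum(axis=-1))
    adjacency = (dist < cutoff) & ~np.eye(len(pos), dtype=bool)
    _, labels = connected_components(adjacency.astype(int), directed=False)
    sizes = np.bincount(labels)                       # number of sites in each cluster
    return np.bincount(sizes)

# Sites 0, 1 and 3 are linked (3 is close to 0 across the periodic boundary); 2 is isolated.
hist = cluster_size_distribution([[0, 0, 0], [0.5, 0, 0], [5, 5, 5], [9.8, 0, 0]], box=10.0, cutoff=1.0)
```

Because such a histogram only records connectivity, two configurations with the same cluster statistics can differ in how the clusters are arranged into larger segregated domains. This is consistent with the abstract's point that cluster analysis is insensitive to domain segregation.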
The applicability and effectiveness of cluster analysis
NASA Technical Reports Server (NTRS)
Ingram, D. S.; Actkinson, A. L.
1973-01-01
An insight into the characteristics that determine the performance of a clustering algorithm is presented. For the techniques examined to cluster data accurately, two conditions must be satisfied simultaneously: first, the data must have a particular structure, and second, the parameters chosen for the clustering algorithm must be correct. Examination of the structure of the data from the C1 flight line makes it clear that no single set of parameters can be used to accurately cluster all the different crops. The ability of either a noniterative or an iterative clustering algorithm to accurately cluster data representative of the C1 flight line is questionable. Thus, extensive a priori knowledge is required to use cluster analysis in its present form for applications such as assisting in the definition of field boundaries and evaluating the homogeneity of a field. New or modified techniques are necessary for clustering to be a reliable tool.
Dimensional assessment of personality pathology in patients with eating disorders.
Goldner, E M; Srikameswaran, S; Schroeder, M L; Livesley, W J; Birmingham, C L
1999-02-22
This study examined patients with eating disorders on personality pathology using a dimensional method. Female subjects who met DSM-IV diagnostic criteria for eating disorder (n = 136) were evaluated and compared to an age-controlled general population sample (n = 68). We assessed 18 features of personality disorder with the Dimensional Assessment of Personality Pathology - Basic Questionnaire (DAPP-BQ). Factor analysis and cluster analysis were used to derive three clusters of patients. A five-factor solution was obtained with limited intercorrelation between factors. Cluster analysis produced three clusters with the following characteristics: Cluster 1 members (constituting 49.3% of the sample and labelled 'rigid') had higher mean scores on factors denoting compulsivity and interpersonal difficulties; Cluster 2 (18.4% of the sample) showed highest scores in factors denoting psychopathy, neuroticism and impulsive features, and appeared to constitute a borderline psychopathology group; Cluster 3 (32.4% of the sample) was characterized by few differences in personality pathology in comparison to the normal population sample. Cluster membership was associated with DSM-IV diagnosis -- a large proportion of patients with anorexia nervosa were members of Cluster 1. An empirical classification of eating-disordered patients derived from dimensional assessment of personality pathology identified three groups with clinical relevance.
NASA Astrophysics Data System (ADS)
Iswandhani, N.; Muhajir, M.
2018-03-01
This research was conducted at the Department of Statistics, Islamic University of Indonesia. The data are primary data obtained from posts on the @explorejogja Instagram account from January to December 2016. The @explorejogja Instagram account features many tourist destinations that can be visited by both domestic and foreign tourists; it is therefore necessary to cluster these destinations based on the number of likes from Instagram users, which is taken as a measure of popularity. The purpose of this research is to determine the distribution of the most popular tourist spots, the cluster formation of tourist destinations, and the central popularity of tourist destinations based on the @explorejogja Instagram account in 2016. The statistical analyses used are descriptive statistics, k-means clustering, and social network analysis. The results comprise the top 10 most popular destinations in Yogyakarta; an HTML-based map of the distribution of 121 tourist destination points; 3 clusters, with 52 destinations in cluster 1, 9 in cluster 2 and 60 in cluster 3; and the central popularity of tourist destinations in the Special Region of Yogyakarta by district.
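K-means on a single popularity variable such as like counts alternates between two steps. First, each destination is assigned to its nearest cluster centre. Then each centre is moved to the mean of its assigned destinations. The minimal sketch below uses a deterministic farthest-first initialisation. That choice is an assumption made so the sketch is reproducible. It is sensitive to outliers, and the study's actual settings are not stated. The like counts in the usage lines are made up.

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Lloyd's k-means with farthest-first initialisation (assumption)."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]                                  # treat a 1-D input as one feature
    centers = [X[0]]
    for _ in range(1, k):                               # next centre = farthest remaining point
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)                       # assignment step
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])  # update step
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Hypothetical like counts for seven destinations.
likes = [100, 120, 110, 1000, 1050, 5000, 5200]
labels, centers = kmeans(likes, 3)
```

With heavily skewed counts such as likes, clustering on a log scale is a common alternative. Whether the study did so is not stated.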
Nowrousian, Minou
2009-04-01
During fungal fruiting body development, hyphae aggregate to form multicellular structures that protect and disperse the sexual spores. Analysis of microarray data revealed a gene cluster strongly upregulated during fruiting body development in the ascomycete Sordaria macrospora. Real time PCR analysis showed that the genes from the orthologous cluster in Neurospora crassa are also upregulated during development. The cluster encodes putative polyketide biosynthesis enzymes, including a reducing polyketide synthase. Analysis of knockout strains of a predicted dehydrogenase gene from the cluster showed that mutants in N. crassa and S. macrospora are delayed in fruiting body formation. In addition to the upregulated cluster, the N. crassa genome contains another cluster containing a polyketide synthase gene, and five additional reducing polyketide synthase (rpks) genes that are not part of clusters. To study the role of these genes in sexual development, expression of the predicted rpks genes in S. macrospora (five genes) and N. crassa (six genes) was analyzed; all but one are upregulated during sexual development. Analysis of knockout strains for the N. crassa rpks genes showed that one of them is essential for fruiting body formation. These data indicate that polyketides produced by RPKSs are involved in sexual development in filamentous ascomycetes.
Cluster: A New Application for Spatial Analysis of Pixelated Data for Epiphytotics.
Nelson, Scot C; Corcoja, Iulian; Pethybridge, Sarah J
2017-12-01
Spatial analysis of epiphytotics is essential to develop and test hypotheses about pathogen ecology and disease dynamics, and to optimize plant disease management strategies. Data collection for spatial analysis requires substantial investment in time to depict patterns in various frames and hierarchies. We developed a new approach for spatial analysis of pixelated data in digital imagery and incorporated the method in a stand-alone desktop application called Cluster. The user isolates target entities (clusters) by designating up to 24 pixel colors as nontargets and moves a threshold slider to visualize the targets. The app calculates the percent area occupied by targeted pixels, identifies the centroids of targeted clusters, and computes the relative compass angle of orientation for each cluster. Users can deselect anomalous clusters manually and/or automatically by specifying a size threshold value to exclude smaller targets from the analysis. Up to 1,000 stochastic simulations randomly place the centroids of each cluster in ranked order of size (largest to smallest) within each matrix while preserving their calculated angles of orientation for the long axes. A two-tailed t test compares the mean inter-cluster distances of the observed map with those derived from the randomly simulated maps. This is the basis for statistical testing of the null hypothesis that the clusters are randomly distributed within the frame of interest. These frames can assume any shape, from natural (e.g., leaf) to arbitrary (e.g., a rectangular or polygonal field). Cluster summarizes normalized attributes of clusters, including pixel number, axis length, axis width, compass orientation, and the length/width ratio, available to the user as a downloadable spreadsheet. Each simulated map may be saved as an image and inspected. 
Provided examples demonstrate the utility of Cluster to analyze patterns at various spatial scales in plant pathology and ecology and highlight the limitations, trade-offs, and considerations for the sensitivities of variables and the biological interpretations of results. The Cluster app is available as a free download for Apple computers at iTunes, with a link to a user guide website.
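The randomization logic can be sketched as follows. This is a simplified sketch, not the app's code. It assumes a rectangular frame and places centroids uniformly at random, ignoring the size ranking and orientation constraints described above. It uses the mean pairwise centroid distance as the statistic and reads the two-tailed t test as a one-sample comparison of the simulated means against the observed value. All function names are hypothetical.

```python
import numpy as np
from scipy.stats import ttest_1samp

def mean_pairwise_distance(points):
    """Mean Euclidean distance over all distinct pairs of centroids."""
    diff = points[:, None, :] - points[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    iu = np.triu_indices(len(points), k=1)
    return d[iu].mean()

def randomization_test(centroids, width, height, n_sim=1000, seed=0):
    """Compare observed mean inter-cluster distance with uniformly random placements."""
    rng = np.random.default_rng(seed)
    centroids = np.asarray(centroids, dtype=float)
    observed = mean_pairwise_distance(centroids)
    simulated = np.array([
        mean_pairwise_distance(rng.uniform([0, 0], [width, height], size=centroids.shape))
        for _ in range(n_sim)
    ])
    _, p_value = ttest_1samp(simulated, observed)   # two-sided by default
    return observed, simulated.mean(), p_value

# Five tightly grouped centroids in one corner of a 100 x 100 frame (made-up coordinates).
obs, sim_mean, p = randomization_test([[1, 1], [2, 2], [1, 3], [3, 1], [2, 3]], 100, 100, n_sim=200)
```

An observed distance well below the simulated mean, with a small p-value, suggests aggregation. A distance well above the simulated mean suggests a regular pattern.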
A formal concept analysis approach to consensus clustering of multi-experiment expression data
2014-01-01
Background Presently, with the increasing number and complexity of available gene expression datasets, combining data from multiple microarray studies addressing a similar biological question is gaining importance. The analysis and integration of multiple datasets are expected to yield more reliable and robust results, since they are based on a larger number of samples and the effects of individual study-specific biases are diminished. This is supported by recent studies suggesting that important biological signals are often preserved or enhanced by multiple experiments. One approach to combining data from different experiments is to aggregate their clusterings into a consensus or representative clustering solution, which increases confidence in the features common to all the datasets and reveals the important differences among them. Results We propose a novel generic consensus clustering technique that applies Formal Concept Analysis (FCA) to consolidate and analyse clustering solutions derived from several microarray datasets. These datasets are initially divided into groups of related experiments with respect to a predefined criterion. Subsequently, a consensus clustering algorithm is applied to each group, resulting in one clustering solution per group. These solutions are pooled and further analysed using FCA, which allows valuable insights to be extracted from the data and a gene partition to be generated over all the experiments. To validate the FCA-enhanced approach, two consensus clustering algorithms are adapted to incorporate the FCA analysis. Their performance is evaluated on gene expression data from a multi-experiment study examining the global cell-cycle control of fission yeast. 
The FCA results derived from both methods demonstrate that, although the two algorithms optimize different clustering characteristics, FCA is able to overcome and diminish these differences and preserve some relevant biological signals. Conclusions The proposed FCA-enhanced consensus clustering technique is a general approach to combining clustering algorithms with FCA for deriving clustering solutions from multiple gene expression matrices. The experimental results presented herein demonstrate that it is a robust data integration technique able to produce a good-quality clustering solution that is representative of the whole set of expression matrices. PMID:24885407
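In FCA, a formal context relates objects to attributes. A formal concept is a pair (extent, intent) in which the extent is exactly the set of objects that have every attribute in the intent, and the intent is exactly the set of attributes shared by all objects in the extent. The minimal sketch below makes assumptions about the setup. Genes are the objects, and each "solution:cluster" membership from the pooled clustering solutions is an attribute. Concepts are enumerated naively by closing object intents under intersection. This is generic FCA, not the paper's algorithm, and the gene and cluster names are hypothetical.

```python
def formal_concepts(context):
    """All formal concepts of a context given as {object: set_of_attributes}.

    Intents are the full attribute set plus every intersection of object
    intents; each extent is the set of objects having all intent attributes.
    """
    all_attrs = frozenset().union(*context.values())
    intents = {all_attrs}
    for attrs in context.values():
        attrs = frozenset(attrs)
        intents |= {attrs & existing for existing in intents}
    concepts = []
    for intent in intents:
        extent = frozenset(obj for obj, a in context.items() if intent <= a)
        concepts.append((extent, intent))
    return concepts

# Hypothetical context: four genes, cluster memberships from two solutions A and B.
context = {
    "g1": {"A:1", "B:1"},
    "g2": {"A:1", "B:1"},
    "g3": {"A:2", "B:1"},
    "g4": {"A:2", "B:2"},
}
concepts = formal_concepts(context)
```

A concept such as ({g1, g2}, {A:1, B:1}) groups genes that were co-clustered in both solutions. Concepts of this kind are a natural starting point for a consensus partition.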
Clustering analysis for muon tomography data elaboration in the Muon Portal project
NASA Astrophysics Data System (ADS)
Bandieramonte, M.; Antonuccio-Delogu, V.; Becciani, U.; Costa, A.; La Rocca, P.; Massimino, P.; Petta, C.; Pistagna, C.; Riggi, F.; Riggi, S.; Sciacca, E.; Vitello, F.
2015-05-01
Cluster analysis is a multivariate data analysis technique that gathers statistical data units into groups so as to minimize the logical distance within each group and maximize the distance between different groups. In these proceedings, the authors present a novel approach to muon tomography data analysis based on clustering algorithms. As a case study, we present the Muon Portal project, which aims to build and operate a dedicated particle detector for the inspection of harbor containers to hinder the smuggling of nuclear materials. Clustering techniques, working directly on scattering points, help to detect the presence of suspicious items inside the container, acting, as will be shown, as a filter for a preliminary analysis of the data.
Data depth based clustering analysis
Jeong, Myeong -Hun; Cai, Yaping; Sullivan, Clair J.; ...
2016-01-01
This paper proposes a new algorithm for identifying patterns within data, based on data depth. Such a clustering analysis has enormous potential to discover previously unknown insights from existing data sets. Many clustering algorithms already exist for this purpose. However, most algorithms are not affine invariant; therefore, they must operate with different parameters after the data sets are rotated, scaled, or translated. Further, most clustering algorithms based on Euclidean distance can be sensitive to noise because they have no global perspective. Parameter selection also significantly affects the clustering results of each algorithm. Unlike many existing clustering algorithms, the proposed algorithm, called data depth based clustering analysis (DBCA), is able to detect coherent clusters after the data sets are affine transformed without changing a parameter. It is also robust to noise because data depth can measure the centrality and outlyingness of the underlying data. Further, it generates relatively stable clusters as the parameter is varied. An experimental comparison with leading state-of-the-art alternatives demonstrates that the proposed algorithm outperforms DBSCAN and HDBSCAN in terms of affine invariance, and exceeds or matches the robustness to noise of DBSCAN or HDBSCAN. The robustness to parameter selection is also demonstrated through a case study of clustering Twitter data.
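Affine invariance is the property that makes depth attractive here. If every point and the data cloud undergo the same rotation, scaling, or translation, each point's depth is unchanged. The abstract does not say which depth function DBCA uses. The minimal sketch below uses Mahalanobis depth, one common affine-invariant choice, purely to illustrate the property. It is not DBCA itself.

```python
import numpy as np

def mahalanobis_depth(points, data):
    """Mahalanobis depth 1 / (1 + d^2) of `points` with respect to the cloud `data`.

    Values near 1 are central; values near 0 are outlying. Invariant under
    any non-singular affine map applied to both `points` and `data`.
    """
    data = np.asarray(data, dtype=float)
    mu = data.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(data, rowvar=False))
    diff = np.asarray(points, dtype=float) - mu
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)   # squared Mahalanobis distances
    return 1.0 / (1.0 + d2)

rng = np.random.default_rng(4)
data = rng.normal(size=(200, 2))
A = np.array([[2.0, 1.0], [0.0, 3.0]])   # arbitrary non-singular linear map
b = np.array([5.0, -1.0])                # arbitrary translation
moved = data @ A.T + b
same = np.allclose(mahalanobis_depth(data, data), mahalanobis_depth(moved, moved))
```

Because the depth values do not change, any depth threshold chosen before the transformation still applies afterwards. A Euclidean radius parameter, such as DBSCAN's eps, would not carry over in the same way.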
Symptom Cluster Research With Biomarkers and Genetics Using Latent Class Analysis.
Conley, Samantha
2017-12-01
The purpose of this article is to provide an overview of latent class analysis (LCA) and examples from symptom cluster research that include biomarkers and genetics. A review of LCA with genetics and biomarkers was conducted using Medline, Embase, PubMed, and Google Scholar. LCA is a robust latent variable model used to cluster categorical data, and it allows symptom clusters to be determined empirically. Researchers should consider using LCA to link empirically determined symptom clusters to biomarkers and genetics to better understand the underlying etiology of symptom clusters. The full potential of LCA in symptom cluster research has not yet been realized because it has been used only in limited populations and researchers have explored only a limited number of biologic pathways.
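LCA as described above clusters categorical data into empirically derived classes. It is usually fitted by maximum likelihood with the EM algorithm under the local-independence assumption. The following minimal NumPy sketch handles binary indicators only. The function name `fit_lca`, the random initialization, and the simulated two-class symptom data are illustrative assumptions, not taken from the review:

```python
import numpy as np

def fit_lca(Y, n_classes, n_iter=500, seed=0):
    """EM for a latent class model with binary indicators (local independence).

    Returns class weights pi (K,), item probabilities theta (K, J) with
    theta[k, j] = P(item j = 1 | class k), and posterior memberships (n, K).
    """
    Y = np.asarray(Y, dtype=float)
    n, J = Y.shape
    rng = np.random.default_rng(seed)
    pi = np.full(n_classes, 1.0 / n_classes)
    theta = rng.uniform(0.25, 0.75, size=(n_classes, J))
    for _ in range(n_iter):
        # E-step: log P(y_i | class k) is a sum over independent items
        log_lik = Y @ np.log(theta).T + (1 - Y) @ np.log(1 - theta).T  # (n, K)
        log_post = log_lik + np.log(pi)
        log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: update class weights and item-response probabilities
        nk = post.sum(axis=0)
        pi = nk / n
        theta = (post.T @ Y) / nk[:, None]
        theta = np.clip(theta, 1e-6, 1 - 1e-6)  # keep the logarithms finite
    return pi, theta, post

# Simulated data: two hypothetical symptom profiles over 6 binary symptoms
rng = np.random.default_rng(1)
truth = np.repeat([0, 1], 300)
p_true = np.where(truth[:, None] == 0, 0.9, 0.1)   # P(symptom present | true class)
Y = (rng.random((600, 6)) < p_true).astype(int)
pi, theta, post = fit_lca(Y, n_classes=2)
labels = post.argmax(axis=1)
# latent class labels are arbitrary, so score agreement up to label switching
accuracy = max((labels == truth).mean(), (labels != truth).mean())
print(pi.round(2), accuracy)
```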
Visualizing Confidence in Cluster-Based Ensemble Weather Forecast Analyses.
Kumpf, Alexander; Tost, Bianca; Baumgart, Marlene; Riemer, Michael; Westermann, Rudiger; Rautenhaus, Marc
2018-01-01
In meteorology, cluster analysis is frequently used to determine representative trends in ensemble weather predictions in a selected spatio-temporal region, e.g., to reduce a set of ensemble members to simplify and improve their analysis. Identified clusters (i.e., groups of similar members), however, can be very sensitive to small changes in the selected region, so that clustering results can be misleading and bias subsequent analyses. In this article, we, a team of visualization scientists and meteorologists, deliver visual analytics solutions to analyze the sensitivity of clustering results with respect to changes in a selected region. We propose an interactive visual interface that enables simultaneous visualization of a) the variation in composition of identified clusters (i.e., their robustness), b) the variability in cluster membership for individual ensemble members, and c) the uncertainty in the spatial locations of identified trends. We demonstrate that our solution shows meteorologists how representative a clustering result is, and with respect to which changes in the selected region it becomes unstable. Furthermore, our solution helps to identify those ensemble members that stably belong to a given cluster and can thus be considered similar. In a real-world application case, we show how our approach is used to analyze the clustering behavior of different regions in a forecast of "Tropical Cyclone Karl", guiding the user towards the cluster robustness information required for subsequent ensemble analysis.
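One quantity behind "the variability in cluster membership for individual ensemble members" can be sketched as a co-assignment matrix. Re-cluster for several slightly different regions and record how often each pair of members lands in the same cluster. This is a hypothetical Python sketch, not the authors' interface; the function name and the label vectors are illustrative:

```python
import numpy as np

def coassignment_frequency(label_runs):
    """Fraction of clusterings in which each pair of members shares a cluster.

    label_runs: list of 1-D label arrays, one per clustering (e.g. one per
    slightly shifted region), all over the same ordered set of ensemble members.
    """
    runs = [np.asarray(r) for r in label_runs]
    n = len(runs[0])
    co = np.zeros((n, n))
    for labels in runs:
        # True where members i and j received the same label in this run
        co += labels[:, None] == labels[None, :]
    return co / len(runs)

# Hypothetical labels for 4 ensemble members from 3 region choices
runs = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 1]]
co = coassignment_frequency(runs)
print(co[0, 1], co[2, 3])  # members 0 and 1 always together; 2 and 3 in 2 of 3 runs
```

Comparing pairs rather than raw labels sidesteps label switching. The second run swaps labels 0 and 1 but contributes the same pairwise information.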
Scoring clustering solutions by their biological relevance.
Gat-Viks, I; Sharan, R; Shamir, R
2003-12-12
A central step in the analysis of gene expression data is the identification of groups of genes that exhibit similar expression patterns. Clustering gene expression data into homogeneous groups has been shown to be instrumental in functional annotation, tissue classification, regulatory motif identification, and other applications. Although there is a rich literature on clustering algorithms for gene expression analysis, very few works have addressed the systematic comparison and evaluation of clustering results. Typically, different clustering algorithms yield different clustering solutions on the same data, and there is no agreed-upon guideline for choosing among them. We developed a novel statistically based method for assessing a clustering solution according to prior biological knowledge. Our method can be used to compare different clustering solutions or to optimize the parameters of a clustering algorithm. The method is based on projecting vectors of biological attributes of the clustered elements onto the real line such that the ratio of between-group and within-group variance estimators is maximized. The projected data are then scored using a non-parametric analysis of variance test, and the score's confidence is evaluated. We validate our approach using simulated data and show that our scoring method outperforms several extant methods, including the separation-to-homogeneity ratio and the silhouette measure. We apply our method to evaluate the results of several clustering methods on yeast cell-cycle gene expression data. The software is available from the authors upon request.
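The projection step described above maximizes the ratio of between-group to within-group variance, which is Fisher's discriminant criterion. Its solution is the leading eigenvector of the generalized eigenproblem S_b w = λ S_w w. The minimal Python sketch below assumes real-valued attribute vectors. It uses the Kruskal-Wallis test as the non-parametric analysis of variance, which is an assumption because the abstract does not name the test. The paper's evaluation of the score's confidence is not reproduced, and the data are simulated:

```python
import numpy as np
from scipy.linalg import eigh
from scipy.stats import kruskal

def score_clustering(attributes, labels):
    """Project attributes onto the line maximizing between/within variance,
    then test the projected values across clusters with Kruskal-Wallis."""
    X = np.asarray(attributes, dtype=float)
    labels = np.asarray(labels)
    mean = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))   # within-group scatter
    Sb = np.zeros((d, d))   # between-group scatter
    groups = np.unique(labels)
    for g in groups:
        Xg = X[labels == g]
        mg = Xg.mean(axis=0)
        Sw += (Xg - mg).T @ (Xg - mg)
        diff = (mg - mean)[:, None]
        Sb += len(Xg) * diff @ diff.T
    Sw += 1e-6 * np.eye(d)                  # small ridge in case Sw is singular
    _, eigvecs = eigh(Sb, Sw)               # generalized problem, ascending eigenvalues
    w = eigvecs[:, -1]                      # direction with the largest variance ratio
    projected = X @ w
    stat, p_value = kruskal(*(projected[labels == g] for g in groups))
    return stat, p_value

rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 30)
attributes = rng.normal(size=(90, 4))
attributes[:, 0] += 3.0 * labels            # attribute 0 tracks the clustering
stat_good, p_good = score_clustering(attributes, labels)
stat_rand, p_rand = score_clustering(attributes, rng.permutation(labels))
print(p_good, p_rand)
```

The direction w is fitted to the same labels it is used to test, so the raw p-value is optimistic. This is why a separate confidence evaluation, as the abstract describes, is needed.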
NASA Astrophysics Data System (ADS)
Zhou, Shuguang; Zhou, Kefa; Wang, Jinlin; Yang, Genfang; Wang, Shanshan
2017-12-01
Cluster analysis is a well-known technique that is used to analyze various types of data. In this study, cluster analysis is applied to geochemical data that describe 1444 stream sediment samples collected in northwestern Xinjiang with a sample spacing of approximately 2 km. Three algorithms (the hierarchical, k-means, and fuzzy c-means algorithms) and six data transformation methods (the z-score standardization, ZST; the logarithmic transformation, LT; the additive log-ratio transformation, ALT; the centered log-ratio transformation, CLT; the isometric log-ratio transformation, ILT; and no transformation, NT) are compared in terms of their effects on the cluster analysis of the geochemical compositional data. The study shows that, on the one hand, the ZST does not affect the results of column- or variable-based (R-type) cluster analysis, whereas the other methods, including the LT, the ALT, and the CLT, have substantial effects on the results. On the other hand, the results of the row- or observation-based (Q-type) cluster analysis obtained from the geochemical data after applying NT and the ZST are relatively poor. However, we derive some improved results from the geochemical data after applying the CLT, the ILT, the LT, and the ALT. Moreover, the k-means and fuzzy c-means clustering algorithms are more reliable than the hierarchical algorithm when they are used to cluster the geochemical data. We apply cluster analysis to the geochemical data to explore for Au deposits within the study area, and we obtain a good correlation between the results retrieved by combining the CLT or the ILT with the k-means or fuzzy c-means algorithms and the potential zones of Au mineralization. Therefore, we suggest that the combination of the CLT or the ILT with the k-means or fuzzy c-means algorithms is an effective tool to identify potential zones of mineralization from geochemical data.
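The CLR transform used above maps each composition to the logarithms of its parts minus their mean. This removes the constant-sum constraint and makes results invariant to rescaling a sample. The minimal Python sketch below applies CLR followed by k-means with scikit-learn, on made-up three-element concentrations. Real data need zeros replaced before taking logarithms, and the ILR and ALT variants are not shown:

```python
import numpy as np
from sklearn.cluster import KMeans

def clr(compositions):
    """Centered log-ratio transform: log of each part minus the row mean of the logs."""
    logx = np.log(np.asarray(compositions, dtype=float))
    return logx - logx.mean(axis=1, keepdims=True)

# Hypothetical, strictly positive concentrations of 3 elements in 6 samples
X = np.array([
    [10.0, 20.0, 70.0], [11.0, 19.0, 70.0], [9.0, 21.0, 70.0],
    [60.0, 30.0, 10.0], [58.0, 32.0, 10.0], [62.0, 28.0, 10.0],
])
Z = clr(X)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Z)
print(labels)
```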
Jadhav, Rohit R; Ye, Zhenqing; Huang, Rui-Lan; Liu, Joseph; Hsu, Pei-Yin; Huang, Yi-Wen; Rangel, Leticia B; Lai, Hung-Cheng; Roa, Juan Carlos; Kirma, Nameer B; Huang, Tim Hui-Ming; Jin, Victor X
2015-01-01
Recent genome-wide analysis has shown that DNA methylation spans long stretches of chromosome regions consisting of clusters of contiguous CpG islands or gene families. Hypermethylation of various gene clusters has been reported in many types of cancer. In this study, we conducted methyl-binding domain capture (MBDCap) sequencing (MBD-seq) analysis on a breast cancer cohort consisting of 77 patients and 10 normal controls, as well as a panel of 38 breast cancer cell lines. Bioinformatics analysis identified seven gene clusters with a significant difference in overall survival (OS) and further revealed a distinct feature: the conservation of a large (approximately 70 kb) gene cluster, metallothionein-1 (MT1), among 45 species is much lower than the average of all RefSeq genes. Furthermore, we found that DNA methylation is an important epigenetic regulator contributing to repression of the MT1 gene cluster in both ERα-positive (ERα+) and ERα-negative (ERα−) breast tumors. In silico analysis revealed much lower expression of this cluster in ERα+ tumors in The Cancer Genome Atlas (TCGA) cohort. To further investigate the role of estrogen, we performed 17β-estradiol (E2) and demethylating agent 5-aza-2'-deoxycytidine (DAC) treatments in various breast cancer cell types. Cell proliferation and invasion assays suggested that MT1F and MT1M may play an anti-oncogenic role in breast cancer. Our data suggest that DNA methylation in large contiguous gene clusters can serve as a potential prognostic marker of breast cancer. Further investigation of these clusters revealed that estrogen mediates epigenetic repression of the MT1 cluster in ERα+ breast cancer cell lines. In all, our studies identify thousands of breast tumor hypermethylated regions for the first time, in particular discovering seven large contiguous hypermethylated gene clusters.
NASA Astrophysics Data System (ADS)
Lestari, D.; Raharjo, D.; Bustamam, A.; Abdillah, B.; Widhianto, W.
2017-07-01
Dengue virus consists of 10 different constituent proteins and is classified into 4 major serotypes (DEN-1 to DEN-4). This study was designed to cluster 30 protein sequences of dengue virus taken from the Virus Pathogen Database and Analysis Resource (VIPR) using the Regularized Markov Clustering (R-MCL) algorithm and to analyze the result. Implemented in Python 3.4, the R-MCL algorithm produces 8 clusters, several of which have more than one centroid. The number of centroids reflects the density of interaction. Interacting proteins that are connected in a network form a protein complex that serves as a specific unit of biological process. The analysis shows that R-MCL clustering produces clusters of the dengue virus family based on the similarity of the roles of their constituent proteins, regardless of serotype.
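The abstract does not describe R-MCL itself. In the R-MCL formulation as we understand it, ordinary Markov clustering's expansion step (squaring the flow matrix) is replaced by a regularization step. That step multiplies the current flow matrix by the canonical flow matrix of the original graph and is followed by the usual inflation and pruning. Nodes that keep flow to themselves are attractors, presumably what the abstract calls centroids. The minimal NumPy sketch below works under those assumptions, uses illustrative default parameters, and should be checked against the original R-MCL description before use:

```python
import numpy as np

def rmcl(adjacency, inflation=2.0, max_iter=100, tol=1e-8, prune=1e-6):
    """Regularized Markov clustering on an undirected graph (sketch).

    Returns clusters (sets of node indices, sorted by smallest member)
    and the list of attractor nodes.
    """
    A = np.asarray(adjacency, dtype=float) + np.eye(len(adjacency))  # add self-loops
    MG = A / A.sum(axis=0, keepdims=True)   # canonical column-stochastic flow matrix
    M = MG.copy()
    for _ in range(max_iter):
        prev = M
        M = M @ MG                              # regularization (MCL would use M @ M)
        M = M ** inflation                      # inflation strengthens strong flows
        M = M / M.sum(axis=0, keepdims=True)
        M[M < prune] = 0.0                      # prune negligible flows
        M = M / M.sum(axis=0, keepdims=True)    # re-normalize columns
        if np.abs(M - prev).max() < tol:
            break
    attractors = np.flatnonzero(np.diag(M) > 0)
    # each attractor's row lists the nodes whose flow ends at that attractor
    clusters = {frozenset(np.flatnonzero(M[a] > 0).tolist()) for a in attractors}
    return sorted((set(c) for c in clusters), key=min), attractors.tolist()

# Toy graph: two disjoint triangles
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    A[i, j] = A[j, i] = 1
clusters, attractors = rmcl(A)
print(clusters, attractors)  # [{0, 1, 2}, {3, 4, 5}] [0, 1, 2, 3, 4, 5]
```

Both toy clusters end with three attractors each. This matches what the abstract describes as a cluster with more than one centroid.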
Graph-Theoretic Analysis of Monomethyl Phosphate Clustering in Ionic Solutions.
Han, Kyungreem; Venable, Richard M; Bryant, Anne-Marie; Legacy, Christopher J; Shen, Rong; Li, Hui; Roux, Benoît; Gericke, Arne; Pastor, Richard W
2018-02-01
All-atom molecular dynamics simulations combined with graph-theoretic analysis reveal that clustering of the monomethyl phosphate dianion (MMP²⁻) is strongly influenced by the types and combinations of cations in the aqueous solution. Although Ca²⁺ promotes the formation of stable and large MMP²⁻ clusters, K⁺ alone does not. Nonetheless, clusters are larger and their link lifetimes are longer in mixtures of K⁺ and Ca²⁺. This "synergistic" effect depends sensitively on the Lennard-Jones interaction parameters between Ca²⁺ and the phosphorus oxygen and correlates with the hydration of the clusters. The pronounced MMP²⁻ clustering effect of Ca²⁺ in the presence of K⁺ is confirmed by Fourier transform infrared spectroscopy. The characterization of the cation-dependent clustering of MMP²⁻ provides a starting point for understanding cation-dependent clustering of phosphoinositides in cell membranes.
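A common way to turn simulation snapshots into graph-theoretic clusters is to link two molecules when they are within a contact cutoff. The connected components of that graph are then the clusters. The minimal Python/SciPy sketch below makes three assumptions, since the abstract does not state the paper's link definition: the cutoff value, the use of plain coordinates without periodic boundaries, and the toy positions:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components
from scipy.spatial.distance import pdist, squareform

def contact_clusters(positions, cutoff):
    """Label molecules by connected component of the graph linking pairs closer than cutoff."""
    dist = squareform(pdist(np.asarray(positions, dtype=float)))
    contacts = dist < cutoff
    np.fill_diagonal(contacts, False)      # no self-links
    n_clusters, labels = connected_components(csr_matrix(contacts), directed=False)
    return n_clusters, labels

# Hypothetical positions: a chain of three molecules plus one isolated molecule
positions = [[0.0, 0.0, 0.0], [0.3, 0.0, 0.0], [0.6, 0.0, 0.0], [5.0, 5.0, 5.0]]
n, labels = contact_clusters(positions, cutoff=0.35)
sizes = np.bincount(labels)                # cluster size distribution
print(n, sorted(sizes.tolist()))  # 2 [1, 3]
```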
Patterns of victimization between and within peer clusters in a high school social network.
Swartz, Kristin; Reyns, Bradford W; Wilcox, Pamela; Dunham, Jessica R
2012-01-01
This study presents a descriptive analysis of patterns of violent victimization between and within the various cohesive clusters of peers comprising a sample of more than 500 9th-12th grade students from one high school. Social network analysis techniques provide a visualization of the overall friendship network structure and allow for the examination of variation in victimization across the various peer clusters within the larger network. Social relationships among clusters with varying levels of victimization are also illustrated so as to provide a sense of possible spatial clustering or diffusion of victimization across proximal peer clusters. Additionally, to provide a sense of the sorts of peer clusters that support (or do not support) victimization, characteristics of clusters at both the high and low ends of the victimization scale are discussed. Finally, several of the peer clusters at both the high and low ends of the victimization continuum are "unpacked", allowing examination of within-network individual-level differences in victimization for these select clusters.
Analysis of candidates for interacting galaxy clusters. I. A1204 and A2029/A2033
NASA Astrophysics Data System (ADS)
Gonzalez, Elizabeth Johana; de los Rios, Martín; Oio, Gabriel A.; Lang, Daniel Hernández; Tagliaferro, Tania Aguirre; Domínguez R., Mariano J.; Castellón, José Luis Nilo; Cuevas L., Héctor; Valotto, Carlos A.
2018-04-01
Context. Merging galaxy clusters allow the different mass components, dark and baryonic, to be studied separately. Their occurrence also enables tests of the ΛCDM scenario and can be used to constrain the self-interaction cross-section of the dark-matter particle. Aim. A homogeneous analysis of these systems is necessary. Hence, based on a recently presented sample of candidates for interacting galaxy clusters, we present the analysis of two of these cataloged systems. Methods. In this work, the first of a series devoted to characterizing galaxy clusters in merger processes, we perform a weak lensing analysis of clusters A1204 and A2029/A2033 to derive the total masses of each identified interacting structure, together with a dynamical study based on a two-body model. We also describe the gas and mass distributions in the field through lensing and X-ray analyses. Results. Neither merging cluster candidate shows evidence of a recent merger event. Nevertheless, there is dynamical evidence that these systems could be interacting or could interact in the future. Conclusions. More constraints need to be included to improve the methodology for classifying merging galaxy clusters. Characterization of these clusters is important to properly understand the nature of these systems and their connection with dynamical studies.
A Note on Cluster Effects in Latent Class Analysis
ERIC Educational Resources Information Center
Kaplan, David; Keller, Bryan
2011-01-01
This article examines the effects of clustering in latent class analysis. A comprehensive simulation study is conducted, which begins by specifying a true multilevel latent class model with varying within- and between-cluster sample sizes, varying latent class proportions, and varying intraclass correlations. These models are then estimated under…
Phenotypes determined by cluster analysis in severe or difficult-to-treat asthma.
Schatz, Michael; Hsu, Jin-Wen Y; Zeiger, Robert S; Chen, Wansu; Dorenbaum, Alejandro; Chipps, Bradley E; Haselkorn, Tmirah
2014-06-01
Asthma phenotyping can facilitate understanding of disease pathogenesis and potential targeted therapies. This study aimed to further characterize the distinguishing features of phenotypic groups in difficult-to-treat asthma. Children aged 6-11 years (n = 518) and adolescents and adults aged ≥12 years (n = 3612) with severe or difficult-to-treat asthma from The Epidemiology and Natural History of Asthma: Outcomes and Treatment Regimens (TENOR) study were evaluated in this post hoc cluster analysis. Analyzed variables included sex, race, atopy, age of asthma onset, smoking (adolescents and adults), passive smoke exposure (children), obesity, and aspirin sensitivity. Cluster analysis used the hierarchical clustering algorithm with the Ward minimum variance method. The results were compared among clusters by χ² analysis; variables with significant (P < .05) differences among clusters were considered candidate distinguishing features. Associations between clusters and asthma-related health outcomes were assessed in multivariable analyses adjusting for socioeconomic status, environmental exposures, and intensity of therapy. Five clusters were identified in each age stratum. Sex, atopic status, and nonwhite race were distinguishing variables in both strata; passive smoke exposure was distinguishing in children and aspirin sensitivity in adolescents and adults. Clusters were not related to outcomes in children, but 2 adult and adolescent clusters distinguished by nonwhite race and aspirin sensitivity manifested poorer quality of life (P < .0001), and the aspirin-sensitive cluster experienced more frequent asthma exacerbations (P < .0001). Distinct phenotypes appear to exist in patients with severe or difficult-to-treat asthma; these phenotypes are related to outcomes in adolescents and adults but not in children. The study of the therapeutic implications of these phenotypes is warranted. Copyright © 2013 American Academy of Allergy, Asthma & Immunology. Published by Mosby, Inc. All rights reserved.
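The pipeline above (Ward hierarchical clustering, then χ² tests of each variable across clusters) can be sketched with SciPy. This minimal illustration runs on random 0/1 variables. It assumes categorical variables are coded as 0/1 numbers so that Ward's Euclidean criterion applies. The helper name is hypothetical, and the study's multivariable outcome models are not reproduced:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.stats import chi2_contingency

def ward_clusters_with_tests(X, n_clusters):
    """Ward clustering of patients, then a chi-square test of each variable vs. cluster."""
    X = np.asarray(X, dtype=float)
    # Ward minimum-variance agglomeration, tree cut into at most n_clusters groups
    clusters = fcluster(linkage(X, method="ward"), t=n_clusters, criterion="maxclust")
    p_values = []
    for j in range(X.shape[1]):
        levels = np.unique(X[:, j])
        # contingency table: rows = clusters, columns = levels of variable j
        table = np.array([[np.sum((clusters == c) & (X[:, j] == v)) for v in levels]
                          for c in np.unique(clusters)])
        _, p, _, _ = chi2_contingency(table)
        p_values.append(p)
    return clusters, p_values

rng = np.random.default_rng(0)
# hypothetical 0/1 patient variables, e.g. sex, atopy, aspirin sensitivity
X = rng.integers(0, 2, size=(200, 3))
clusters, p_values = ward_clusters_with_tests(X, n_clusters=5)
print(np.bincount(clusters), [round(p, 4) for p in p_values])
```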
Friederichs, Stijn Ah; Bolman, Catherine; Oenema, Anke; Lechner, Lilian
2015-01-01
In order to promote physical activity uptake and maintenance in individuals who do not comply with physical activity guidelines, it is important to increase our understanding of physical activity motivation among this group. The present study aimed to examine motivational profiles in a large sample of adults who do not comply with physical activity guidelines. The sample for this study consisted of 2473 individuals (31.4% male; age 44.6 ± 12.9). In order to generate motivational profiles based on motivational regulation, a cluster analysis was conducted. One-way analyses of variance were then used to compare the clusters in terms of demographics, physical activity level, motivation to be active and subjective experience while being active. Three motivational clusters were derived based on motivational regulation scores: a low motivation cluster, a controlled motivation cluster and an autonomous motivation cluster. These clusters differed significantly from each other with respect to physical activity behavior, motivation to be active and subjective experience while being active. Overall, the autonomous motivation cluster displayed more favorable characteristics compared to the other two clusters. The results of this study provide additional support for the importance of autonomous motivation in the context of physical activity behavior. The three derived clusters may be relevant in the context of physical activity interventions as individuals within the different clusters might benefit most from different intervention approaches. In addition, this study shows that cluster analysis is a useful method for differentiating between motivational profiles in large groups of individuals who do not comply with physical activity guidelines.
Lee, Junghee; Rizzo, Shemra; Altshuler, Lori; Glahn, David C; Miklowitz, David J; Sugar, Catherine A; Wynn, Jonathan K; Green, Michael F
2017-02-01
Bipolar disorder (BD) and schizophrenia (SZ) show substantial overlap. It has been suggested that a subgroup of patients might contribute to these overlapping features. This study employed a cross-diagnostic cluster analysis to identify subgroups of individuals with shared cognitive phenotypes. A total of 143 participants (68 BD patients, 39 SZ patients, and 36 healthy controls) completed a battery of EEG and performance assessments of perception, nonsocial cognition, and social cognition. A K-means cluster analysis was conducted with all participants across diagnostic groups. Clinical symptoms, functional capacity, and functional outcome were assessed in patients. A two-cluster solution across the 3 groups was the most stable. One cluster, including 44 BD patients, 31 controls, and 5 SZ patients, showed better cognition (High cluster) than the other cluster, with 24 BD patients, 35 SZ patients, and 5 controls (Low cluster). BD patients in the High cluster performed better than BD patients in the Low cluster across cognitive domains. Within each cluster, participants with different clinical diagnoses showed different profiles across cognitive domains. All patients were in the chronic phase and not in a mood episode at the time of assessment, and most of the assessments were behavioral measures. This study identified two clusters with shared cognitive phenotype profiles that were not proxies for clinical diagnoses. The finding of better social cognitive performance in BD patients than in SZ patients in the Low cluster suggests that relatively preserved social cognition may be important for identifying disease processes distinct to each disorder. Copyright © 2016 Elsevier B.V. All rights reserved.
Logistics Enterprise Evaluation Model Based On Fuzzy Clustering Analysis
NASA Astrophysics Data System (ADS)
Fu, Pei-hua; Yin, Hong-bo
In this thesis, we introduce an evaluation model for logistics enterprises based on a fuzzy clustering algorithm. First, we present the evaluation index system, which covers basic information, management level, technical strength, transport capacity, informatization level, market competition, and customer service. We determined the index weights according to the grades and evaluated the integrated capability of the logistics enterprises using fuzzy cluster analysis. We describe the system's evaluation module and cluster analysis module in detail and explain how these two modules were implemented. Finally, we present the results of the system.
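The abstract does not specify which fuzzy clustering algorithm was used. A common choice is fuzzy c-means, sketched below in NumPy with fuzzifier m = 2. The enterprise scores are invented, and the grade-based index weights are omitted. One way to include them would be to scale each column by the square root of its weight before clustering:

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=300, tol=1e-6, seed=0):
    """Standard fuzzy c-means: returns cluster centers (c, d) and memberships U (n, c)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)             # memberships of each point sum to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)               # avoid division by zero at a center
        # u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1))
        ratio = (dist[:, :, None] / dist[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

# Hypothetical normalized index scores (rows: enterprises; columns: e.g. management
# level, transport capacity, customer service), forming two loose groups
X = np.array([[0.90, 0.80, 0.85], [0.85, 0.90, 0.80], [0.88, 0.82, 0.90],
              [0.20, 0.30, 0.25], [0.25, 0.20, 0.30], [0.30, 0.25, 0.20]])
centers, U = fuzzy_c_means(X, c=2)
print(U.round(2))
```

Unlike hard k-means, each enterprise receives a graded membership in every cluster, which suits an evaluation setting where an enterprise may sit between performance tiers.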
Using cluster analysis to organize and explore regional GPS velocities
Simpson, Robert W.; Thatcher, Wayne; Savage, James C.
2012-01-01
Cluster analysis offers a simple visual exploratory tool for the initial investigation of regional Global Positioning System (GPS) velocity observations, which are providing increasingly precise mappings of actively deforming continental lithosphere. The deformation fields from dense regional GPS networks can often be concisely described in terms of relatively coherent blocks bounded by active faults, although the choice of blocks (their number and size) can be subjective and is often guided by the distribution of known faults. To illustrate our method, we apply cluster analysis to GPS velocities from the San Francisco Bay Region, California, to search for spatially coherent patterns of deformation, including evidence of block-like behavior. The clustering process identifies four robust groupings of velocities that we associate with four crustal blocks. Although the analysis uses no prior geologic information other than the GPS velocities, the cluster/block boundaries track three major faults, both locked and creeping.
InCHlib - interactive cluster heatmap for web applications.
Skuta, Ctibor; Bartůněk, Petr; Svozil, Daniel
2014-12-01
Hierarchical clustering is an exploratory data analysis method that reveals the groups (clusters) of similar objects. The result of hierarchical clustering is a tree structure called a dendrogram that shows the arrangement of individual clusters. To investigate the row/column hierarchical cluster structure of a data matrix, a visualization tool called a 'cluster heatmap' is commonly employed. In the cluster heatmap, the data matrix is displayed as a heatmap, a 2-dimensional array in which the colour of each element corresponds to its value. The rows/columns of the matrix are ordered such that similar rows/columns are near each other. The ordering is given by the dendrogram, which is displayed on the side of the heatmap. We developed InCHlib (Interactive Cluster Heatmap Library), a highly interactive and lightweight JavaScript library for cluster heatmap visualization and exploration. InCHlib enables the user to select individual or clustered heatmap rows, to zoom in and out of clusters, or to flexibly modify heatmap appearance. The cluster heatmap can be augmented with additional metadata displayed in a different colour scale. In addition, to further enhance the visualization, the cluster heatmap can be interconnected with external data sources or analysis tools. Data clustering and the preparation of the input file for InCHlib are facilitated by the Python utility script inchlib_clust. The cluster heatmap is one of the most popular visualizations of large chemical and biomedical data sets originating, e.g., in high-throughput screening, genomics or transcriptomics experiments. The presented JavaScript library InCHlib is a client-side solution for cluster heatmap exploration. InCHlib can be easily deployed into any modern web application and configured to cooperate with external tools and data sources. Though InCHlib is primarily intended for the analysis of chemical or biological data, it is a versatile tool whose application domain is not limited to the life sciences.
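The row/column ordering described above can be reproduced outside InCHlib with SciPy. Cluster the rows and the columns separately, then reorder the matrix by dendrogram leaf order. This minimal sketch does not use InCHlib or inchlib_clust, and the average-linkage and Euclidean-distance defaults are assumptions:

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage

def cluster_heatmap_order(data, method="average", metric="euclidean"):
    """Return the data matrix with rows and columns reordered by dendrogram leaf order."""
    data = np.asarray(data, dtype=float)
    row_order = leaves_list(linkage(data, method=method, metric=metric))
    col_order = leaves_list(linkage(data.T, method=method, metric=metric))
    return data[np.ix_(row_order, col_order)], row_order, col_order

# Toy matrix: rows 0 and 2 are similar, as are rows 1 and 3
data = np.array([[0.0, 0.0, 1.0],
                 [10.0, 10.0, 9.0],
                 [0.1, 0.0, 1.0],
                 [10.0, 10.1, 9.0]])
ordered, rows, cols = cluster_heatmap_order(data)
print(rows, cols)
print(ordered)
```

The reordered matrix is what a cluster heatmap colours, with the two dendrograms drawn alongside its rows and columns.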
Novel approach to classifying patients with pulmonary arterial hypertension using cluster analysis.
Parikh, Kishan S; Rao, Youlan; Ahmad, Tariq; Shen, Kai; Felker, G Michael; Rajagopal, Sudarshan
2017-01-01
Pulmonary arterial hypertension (PAH) patients have distinct disease courses and responses to treatment, but current diagnostic and treatment schemes provide limited insight. We aimed to determine whether cluster analysis could distinguish clinical phenotypes in PAH. An unbiased cluster analysis was performed on 17 baseline clinical variables of PAH patients from the FREEDOM-M, FREEDOM-C, and FREEDOM-C2 randomized trials of oral treprostinil versus placebo. Participants were either treatment-naïve (FREEDOM-M) or on background therapy (FREEDOM-C, FREEDOM-C2). We tested for associations of clusters with outcomes and for interaction with treatment. The primary outcome was change in 6-minute walking distance (6MWD). We included 966 participants with 12-week (FREEDOM-M) or 16-week (FREEDOM-C and FREEDOM-C2) follow-up. Four patient clusters were identified. Compared with patients in Clusters 1 (n = 131) and 2 (n = 496), patients in Clusters 3 (n = 246) and 4 (n = 93) were older and heavier, had worse baseline functional class, 6MWD, and Borg Dyspnea Index, and had fewer years since PAH diagnosis. Clusters also differed by PAH etiology and background therapies, but not by gender or race. The mean treatment effect of oral treprostinil increased monotonically across Clusters 1-4 (Cluster 1: 10.9 m; Cluster 2: 13.0 m; Cluster 3: 25.0 m; Cluster 4: 50.9 m; interaction P value = 0.048). We identified four distinct clusters of PAH patients based on common patient characteristics. Patients who were older, had been diagnosed with PAH for a shorter period, and had worse baseline symptoms and exercise capacity had the greatest response to oral treprostinil treatment.
Clustering of Health Behaviors and Cardiorespiratory Fitness Among U.S. Adolescents.
Hartz, Jacob; Yingling, Leah; Ayers, Colby; Adu-Brimpong, Joel; Rivers, Joshua; Ahuja, Chaarushi; Powell-Wiley, Tiffany M
2018-05-01
Decreased cardiorespiratory fitness (CRF) is associated with an increased risk of cardiovascular disease. However, little is known about how the interaction of diet, physical activity (PA), and sedentary time (ST) affects CRF among adolescents. Using a nationally representative sample of U.S. adolescents, we applied cluster analysis to investigate the interactions of these behaviors with CRF. We hypothesized that distinct clustering patterns exist and that less healthy clusters are associated with lower CRF. We used 2003-2004 National Health and Nutrition Examination Survey data for persons aged 12-19 years (N = 1,225). PA and ST were measured objectively by accelerometer, and the American Heart Association Healthy Diet Score quantified diet quality. Maximal oxygen consumption (V̇O2max) was measured by a submaximal treadmill exercise test. We performed cluster analysis to identify sex-specific clustering of diet, PA, and ST. Adjusting for accelerometer wear time, age, body mass index, race/ethnicity, and the poverty-to-income ratio, we performed sex-stratified linear regression analysis to evaluate the association of cluster with V̇O2max. Three clusters were identified for both girls and boys. For girls, there was no difference across clusters in age (p = .1), weight (p = .3), or BMI (p = .5), and no relationship between clusters and V̇O2max. For boys, the youngest cluster (p < .01) had three healthy behaviors, weighed less, and was associated with a higher V̇O2max compared with the two older clusters. We observed clustering of diet, PA, and ST in U.S. adolescents. Specific patterns were associated with lower V̇O2max for boys, suggesting that our clusters may help identify adolescent boys most in need of interventions. Published by Elsevier Inc.
Patterns of Dysmorphic Features in Schizophrenia
Scutt, L.E.; Chow, E.W.C.; Weksberg, R.; Honer, W.G.; Bassett, Anne S.
2011-01-01
Congenital dysmorphic features are prevalent in schizophrenia and may reflect underlying neurodevelopmental abnormalities. A cluster analysis approach delineating patterns of dysmorphic features has been used in genetics to classify individuals into more etiologically homogeneous subgroups. In the present study, this approach was applied to schizophrenia, using a sample with a suspected genetic syndrome as a testable model. Subjects (n = 159) with schizophrenia or schizoaffective disorder were ascertained from chronic patient populations (random, n = 123) or referred with possible 22q11 deletion syndrome (referred, n = 36). All subjects were evaluated for presence or absence of 70 reliably assessed dysmorphic features, which were used in a three-step cluster analysis. The analysis produced four major clusters with different patterns of dysmorphic features. Significant between-cluster differences were found for rates of 37 dysmorphic features (P < 0.05), median number of dysmorphic features (P = 0.0001), and validating features not used in the cluster analysis: mild mental retardation (P = 0.001) and congenital heart defects (P = 0.002). Two clusters (1 and 4) appeared to represent more developmental subgroups of schizophrenia with elevated rates of dysmorphic features and validating features. Cluster 1 (n = 27) comprised mostly referred subjects. Cluster 4 (n = 18) had a different pattern of dysmorphic features; one subject had a mosaic Turner syndrome variant. Two other clusters had lower rates and patterns of features consistent with those found in previous studies of schizophrenia. Delineating patterns of dysmorphic features may help identify subgroups that could represent neurodevelopmental forms of schizophrenia with more homogeneous origins. PMID:11803519
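The abstract does not detail the three-step cluster analysis. As a generic illustration of clustering subjects on the presence or absence of features, the sketch below uses Jaccard distance with average linkage in SciPy. Both choices and the toy feature matrix are assumptions:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

def cluster_binary_features(present, n_clusters):
    """Average-linkage clustering of subjects on presence/absence data using Jaccard distance."""
    present = np.asarray(present, dtype=bool)
    d = pdist(present, metric="jaccard")   # shared absences do not count as similarity
    return fcluster(linkage(d, method="average"), t=n_clusters, criterion="maxclust")

# Hypothetical presence (1) / absence (0) of 5 features in 4 subjects
present = [[1, 1, 0, 0, 0],
           [1, 1, 1, 0, 0],
           [0, 0, 0, 1, 1],
           [0, 0, 1, 1, 1]]
labels = cluster_binary_features(present, n_clusters=2)
print(labels)
```

Jaccard distance ignores features absent in both subjects. That suits data where most of the 70 features are absent in most subjects.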
COVARIATE-ADAPTIVE CLUSTERING OF EXPOSURES FOR AIR POLLUTION EPIDEMIOLOGY COHORTS*
Keller, Joshua P.; Drton, Mathias; Larson, Timothy; Kaufman, Joel D.; Sandler, Dale P.; Szpiro, Adam A.
2017-01-01
Cohort studies in air pollution epidemiology aim to establish associations between health outcomes and air pollution exposures. Statistical analysis of such associations is complicated by the multivariate nature of the pollutant exposure data as well as the spatial misalignment that arises from the fact that exposure data are collected at regulatory monitoring network locations distinct from cohort locations. We present a novel clustering approach for addressing this challenge. Specifically, we present a method that uses geographic covariate information to cluster multi-pollutant observations and predict cluster membership at cohort locations. Our predictive k-means procedure identifies centers using a mixture model and is followed by multi-class spatial prediction. In simulations, we demonstrate that predictive k-means can reduce misclassification error by over 50% compared to ordinary k-means, with minimal loss in cluster representativeness. The improved prediction accuracy results in large gains of 30% or more in power for detecting effect modification by cluster in a simulated health analysis. In an analysis of the NIEHS Sister Study cohort using predictive k-means, we find that the association between systolic blood pressure (SBP) and long-term fine particulate matter (PM2.5) exposure varies significantly between different clusters of PM2.5 component profiles. Our cluster-based analysis shows that for subjects assigned to a cluster located in the Midwestern U.S., a 10 μg/m³ difference in exposure is associated with 4.37 mmHg (95% CI, 2.38, 6.35) higher SBP. PMID:28572869
ERIC Educational Resources Information Center
Kerr, Deirdre; Chung, Gregory K. W. K.; Iseli, Markus R.
2011-01-01
Analyzing log data from educational video games has proven to be a challenging endeavor. In this paper, we examine the feasibility of using cluster analysis to extract information from the log files that is interpretable in both the context of the game and the context of the subject area. If cluster analysis can be used to identify patterns of…
Dias, Claudia; Mendes, Luís
2018-01-01
Despite the importance of the literature on European Union food quality labels (PDO, PGI and TSG), our search found no review integrating the various research topics on this subject. This study therefore aims to consolidate the state of academic research in this field through a bibliometric analysis based on the term co-occurrence technique. The analysis covered 501 articles indexed in the ISI Web of Science database, published up to 2016. The bibliometric analysis identified four clusters: "Protected Geographical Indication", "Certification of Olive Oil and Cultivars", "Certification of Cheese and Milk" and "Certification and Chemical Composition". Unlike the other clusters, in which the PDO label predominates, the "Protected Geographical Indication" cluster covers the study of PGI products, with an emphasis on consumer behaviour towards this type of product. Studies in the "Certification of Olive Oil and Cultivars" and "Certification of Cheese and Milk" clusters focus on developing authentication methods for certified traditional products. In the "Certification and Chemical Composition" cluster, the analysis of fatty acid profiles in these products stands out. Copyright © 2017 Elsevier Ltd. All rights reserved.
An improved K-means clustering algorithm in agricultural image segmentation
NASA Astrophysics Data System (ADS)
Cheng, Huifeng; Peng, Hui; Liu, Shanmei
Image segmentation is an important first step in image analysis and image processing. In this paper, based on the color characteristics of crop images, we first transform the image from the RGB color space to HSI (hue, saturation, intensity), and then select suitable initial cluster centers and the number of clusters using a mean-variance approach and rough set theory, followed by clustering, so as to segment color components automatically and rapidly and to extract target objects from the background accurately. This provides a reliable basis for the identification, analysis, subsequent calculation and processing of crop images. Experimental results demonstrate that the improved k-means clustering algorithm reduces the amount of computation and improves the precision and accuracy of clustering.
A Web-Based Interactive Platform for Co-Clustering Spatio-Temporal Data
NASA Astrophysics Data System (ADS)
Wu, X.; Poorthuis, A.; Zurita-Milla, R.; Kraak, M.-J.
2017-09-01
Since current studies on clustering analysis mainly focus on exploring spatial or temporal patterns separately, a co-clustering algorithm is used in this study to enable the concurrent analysis of spatio-temporal patterns. To allow users to adopt and adapt the algorithm for their own analysis, it is integrated within the server side of an interactive web-based platform. The client side of the platform, running within any modern browser, is a graphical user interface (GUI) with multiple linked visualizations that facilitates the understanding, exploration and interpretation of the raw dataset and co-clustering results. Users can also upload their own datasets and adjust clustering parameters within the platform. To illustrate the use of this platform, an annual temperature dataset from 28 weather stations over 20 years in the Netherlands is used. After the dataset is loaded, it is visualized in a set of linked visualizations: a geographical map, a timeline and a heatmap. This helps users understand the nature of their dataset and select appropriate co-clustering parameters. Once the dataset is processed by the co-clustering algorithm, the results are visualized as small multiples, a heatmap and a timeline, providing complementary views for better understanding and further interpretation. Since visualization and analysis are integrated in a seamless platform, the user can explore different sets of co-clustering parameters and instantly view the results, enabling iterative, exploratory data analysis. As such, this interactive web-based platform allows users to analyze spatio-temporal data using the co-clustering method and helps them understand the results through multiple linked visualizations.
Romay-Tallon, Raquel; Rivera-Baltanas, Tania; Allen, Josh; Olivares, Jose M; Kalynchuk, Lisa E; Caruncho, Hector J
2017-01-01
The pattern of serotonin transporter clustering on the plasma membrane of lymphocytes extracted from human whole blood samples has been identified as a putative biomarker of therapeutic efficacy in major depression. Here we evaluated the possibility of performing a similar analysis using blood smears obtained from rats, and from control human subjects and depression patients. We hypothesized that we could optimize a protocol to make the analysis of serotonin transporter clustering in blood smears comparable to the analysis of serotonin transporter clustering using isolated lymphocytes. Our data indicate that blood smears require a longer fixation time and longer incubation times with primary and secondary antibodies. In addition, the image analysis settings need to be optimized for smears. When these steps are followed, the quantitative analysis of both the number and size of serotonin transporter clusters on the plasma membrane of lymphocytes gives similar results for blood smears and isolated lymphocytes. This novel protocol will greatly facilitate the collection of appropriate samples, because it removes the need (and cost) of specialized personnel to draw blood and is less invasive. Therefore, this protocol will help us advance the validation of membrane protein clustering in lymphocytes as a biomarker of therapeutic efficacy in major depression, and bring it closer to clinical application.
Chen, Ling; Feng, Yanqin; Sun, Jianguo
2017-10-01
This paper discusses regression analysis of clustered failure time data, which occur when the failure times of interest are collected from clusters. In particular, we consider the situation where the correlated failure times of interest may be related to cluster sizes. For inference, we present two estimation procedures, the weighted estimating equation-based method and the within-cluster resampling-based method, for the case where the correlated failure times arise from a class of additive transformation models. The former uses the inverse of cluster sizes as weights in the estimating equations, while the latter can be easily implemented using existing software packages for right-censored failure time data. An extensive simulation study indicates that the proposed approaches work well in situations both with and without informative cluster size. The approaches are applied to the dental study that motivated this work.
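The effect of weighting by the inverse of cluster size can be seen in the simplest possible estimator, a marginal mean. The sketch below is illustrative only: it ignores censoring and the additive transformation model, and the data and function names are hypothetical. Weighting each observation by 1/n_i makes every cluster contribute equally, so large clusters (for example, participants with many teeth) no longer dominate the estimate when cluster size is informative.

```python
from typing import Dict, List

def naive_mean(clusters: Dict[str, List[float]]) -> float:
    """Pool all observations; large clusters dominate the estimate."""
    values = [y for ys in clusters.values() for y in ys]
    return sum(values) / len(values)

def inverse_size_weighted_mean(clusters: Dict[str, List[float]]) -> float:
    """Weight each observation by 1/n_i, where n_i is the size of its cluster.

    The weights within cluster i sum to 1, so the total weight equals the
    number of clusters and each cluster contributes equally.
    """
    weighted_sum = sum(sum(ys) / len(ys) for ys in clusters.values())
    return weighted_sum / len(clusters)

# Hypothetical data: the large cluster has systematically lower outcomes.
data = {"A": [1.0, 1.0, 1.0, 1.0], "B": [5.0]}
print(naive_mean(data))                  # 1.8 (pooled: 9 / 5)
print(inverse_size_weighted_mean(data))  # 3.0 (cluster-balanced: (1 + 5) / 2)
```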
Detection of Functional Change Using Cluster Trend Analysis in Glaucoma.
Gardiner, Stuart K; Mansberger, Steven L; Demirel, Shaban
2017-05-01
Global analyses using mean deviation (MD) assess visual field progression but can miss localized changes. Pointwise analyses are more sensitive to localized progression but are more variable, so they require confirmation. This study assessed whether cluster trend analysis, which averages information across subsets of locations, could improve progression detection. A total of 133 test-retest eyes were tested 7 to 10 times. Rates of change and P values were calculated for possible re-orderings of these series to generate global analysis ("MD worsening faster than x dB/y with P < y"), pointwise and cluster analyses ("n locations [or clusters] worsening faster than x dB/y with P < y") with specificity of exactly 95%. These criteria were applied to 505 eyes tested over a mean of 10.5 years to determine how soon each criterion detected "deterioration", and the criteria were compared using survival models. This was repeated including two subsequent visual fields to determine whether "deterioration" was confirmed. The best global criterion detected deterioration in 25% of eyes in 5.0 years (95% confidence interval [CI], 4.7-5.3 years), compared with 4.8 years (95% CI, 4.2-5.1) for the best cluster analysis criterion, and 4.1 years (95% CI, 4.0-4.5) for the best pointwise criterion. However, for pointwise analysis, only 38% of these changes were confirmed, compared with 61% for clusters and 76% for MD. The time until 25% of eyes showed subsequently confirmed deterioration was 6.3 years (95% CI, 6.0-7.2) for global, 6.3 years (95% CI, 6.0-7.0) for pointwise, and 6.0 years (95% CI, 5.3-6.6) for cluster analyses. Although the specificity is still suboptimal, cluster trend analysis detects subsequently confirmed deterioration sooner than either global or pointwise analyses.
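The core computation of a cluster trend analysis, averaging sensitivities over the locations in one cluster at each visit and then fitting an ordinary least-squares slope against time, can be sketched as follows. This is a minimal sketch under assumptions: the cluster definition, data and function names are hypothetical, and the permutation-based P values used to fix specificity at 95% are not shown.

```python
from typing import List, Sequence

def ols_slope(t: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope of y on t (e.g., dB per year)."""
    n = len(t)
    t_bar, y_bar = sum(t) / n, sum(y) / n
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    return sxy / sxx

def cluster_rate(times: Sequence[float], fields: List[Sequence[float]],
                 cluster: Sequence[int]) -> float:
    """Rate of change of the mean sensitivity over one cluster of locations.

    fields[v][loc] is the sensitivity (dB) at test location `loc` on visit v;
    `cluster` lists the location indices that form the cluster.
    """
    cluster_means = [sum(f[i] for i in cluster) / len(cluster) for f in fields]
    return ols_slope(times, cluster_means)

# Hypothetical series: locations 0 and 1 lose 1 dB/year, location 2 is stable.
times = [0.0, 1.0, 2.0, 3.0]
fields = [[30, 30, 30], [29, 29, 30], [28, 28, 30], [27, 27, 30]]
print(cluster_rate(times, fields, [0, 1]))  # -1.0
print(cluster_rate(times, fields, [2]))
```

Averaging within a cluster reduces the noise of individual locations, which is the trade-off between global (MD) and pointwise analyses that the abstract describes.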
NASA Astrophysics Data System (ADS)
Colucci, Janet E.; Bernstein, Rebecca A.; McWilliam, Andrew
2017-01-01
We present abundances of globular clusters (GCs) in the Milky Way and Fornax from integrated-light (IL) spectra. Our goal is to evaluate the consistency of the IL analysis relative to standard abundance analysis for individual stars in those same clusters. This sample includes an updated analysis of seven clusters from our previous publications and results for five new clusters that expand the metallicity range over which our technique has been tested. We find that the [Fe/H] measured from IL spectra agrees to ~0.1 dex for GCs with metallicities as high as [Fe/H] = -0.3, but the abundances measured for more metal-rich clusters may be underestimated. In addition we systematically evaluate the accuracy of abundance ratios, [X/Fe], for Na I, Mg I, Al I, Si I, Ca I, Ti I, Ti II, Sc II, V I, Cr I, Mn I, Co I, Ni I, Cu I, Y II, Zr I, Ba II, La II, Nd II, and Eu II. The elements for which the IL analysis gives results that are most similar to analysis of individual stellar spectra are Fe I, Ca I, Si I, Ni I, and Ba II. The elements that show the greatest differences include Mg I and Zr I. Some elements show good agreement only over a limited range in metallicity. More stellar abundance data in these clusters would enable more complete evaluation of the IL results for other important elements. This paper includes data gathered with the 6.5 m Magellan Telescopes located at Las Campanas Observatory, Chile.
Sample size calculation for stepped wedge and other longitudinal cluster randomised trials.
Hooper, Richard; Teerenstra, Steven; de Hoop, Esther; Eldridge, Sandra
2016-11-20
The sample size required for a cluster randomised trial is inflated compared with an individually randomised trial because outcomes of participants from the same cluster are correlated. Sample size calculations for longitudinal cluster randomised trials (including stepped wedge trials) need to take account of at least two levels of clustering: the clusters themselves and times within clusters. We derive formulae for sample size for repeated cross-section and closed cohort cluster randomised trials with normally distributed outcome measures, under a multilevel model allowing for variation between clusters and between times within clusters. Our formulae agree with those previously described for special cases such as crossover and analysis of covariance designs, although simulation suggests that the formulae could underestimate required sample size when the number of clusters is small. Whether using a formula or simulation, a sample size calculation requires estimates of nuisance parameters, which in our model include the intracluster correlation, cluster autocorrelation, and individual autocorrelation. A cluster autocorrelation less than 1 reflects a situation where individuals sampled from the same cluster at different times have less correlated outcomes than individuals sampled from the same cluster at the same time. Nuisance parameters could be estimated from time series obtained in similarly clustered settings with the same outcome measure, using analysis of variance to estimate variance components. Copyright © 2016 John Wiley & Sons, Ltd.
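The inflation described in the first sentence is, in the simplest case of a parallel cluster randomised trial with equal cluster sizes m and intracluster correlation ρ, the classic design effect 1 + (m − 1)ρ applied to the individually randomised sample size. The sketch below covers only that single-period case, assuming a normal outcome and a two-sided test comparing two means; it does not implement the repeated cross-section or closed-cohort formulae derived in the paper, which additionally involve the cluster and individual autocorrelations. Function names and inputs are hypothetical.

```python
import math
from statistics import NormalDist

def n_per_arm_individual(delta: float, sigma: float,
                         alpha: float = 0.05, power: float = 0.8) -> float:
    """Participants per arm for an individually randomised two-arm trial (normal outcome)."""
    z = NormalDist().inv_cdf
    return 2 * (z(1 - alpha / 2) + z(power)) ** 2 * sigma ** 2 / delta ** 2

def design_effect(m: int, icc: float) -> float:
    """Variance inflation for equal clusters of size m with intracluster correlation icc."""
    return 1 + (m - 1) * icc

def clusters_per_arm(delta: float, sigma: float, m: int, icc: float,
                     alpha: float = 0.05, power: float = 0.8) -> int:
    """Inflate the individual sample size, then round up to whole clusters of size m."""
    n = n_per_arm_individual(delta, sigma, alpha, power) * design_effect(m, icc)
    return math.ceil(n / m)

# Hypothetical inputs: standardised effect 0.5, clusters of 20, ICC 0.05.
print(round(n_per_arm_individual(0.5, 1.0), 1))
print(design_effect(20, 0.05))
print(clusters_per_arm(0.5, 1.0, m=20, icc=0.05))
```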
ERIC Educational Resources Information Center
Mun, Eun Young; von Eye, Alexander; Bates, Marsha E.; Vaschillo, Evgeny G.
2008-01-01
Model-based cluster analysis is a new clustering procedure for investigating population heterogeneity using finite mixtures of multivariate normal densities. It is an inferentially based, statistically principled procedure that uses the Bayesian information criterion to compare multiple models, including nonnested ones, and identify the…
Identifying At-Risk Students in General Chemistry via Cluster Analysis of Affective Characteristics
ERIC Educational Resources Information Center
Chan, Julia Y. K.; Bauer, Christopher F.
2014-01-01
The purpose of this study is to identify academically at-risk students in first-semester general chemistry using affective characteristics via cluster analysis. Through the clustering of six preselected affective variables, three distinct affective groups were identified: low (at-risk), medium, and high. Students in the low affective group…
Analysis of large-scale gene expression data.
Sherlock, G
2000-04-01
The advent of cDNA and oligonucleotide microarray technologies has led to a paradigm shift in biological investigation, such that the bottleneck in research is shifting from data generation to data analysis. Hierarchical clustering, divisive clustering, self-organizing maps and k-means clustering have all been recently used to make sense of this mass of data.
ERIC Educational Resources Information Center
Hale, Robert L.; Dougherty, Donna
1988-01-01
Compared the efficacy of two methods of cluster analysis, the unweighted pair-group method using arithmetic averages (UPGMA) and Ward's method, for students grouped on intelligence, achievement, and social adjustment by both clustering methods. Found UPGMA more efficacious based on output, on cophenetic correlation coefficients generated by each…
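UPGMA repeatedly merges the two closest clusters and sets the distance from the merged cluster to every other cluster to the size-weighted average of the two old distances; the cophenetic correlation then measures how well the merge heights in the resulting dendrogram reproduce the original distances. The sketch below is a textbook O(n³) version under stated assumptions (a symmetric distance matrix, ties broken by dictionary order, hypothetical function names); it is not the software used in the study, and Ward's method is not shown.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

def _key(i: int, j: int) -> Tuple[int, int]:
    """Order a pair of cluster ids so each distance is stored once."""
    return (i, j) if i < j else (j, i)

def upgma(d: Sequence[Sequence[float]]):
    """Average-linkage (UPGMA) clustering of an n x n symmetric distance matrix.

    Returns (merges, coph): merges lists (members_a, members_b, height) in merge
    order; coph[i][j] is the height at which items i and j first join.
    """
    n = len(d)
    members: Dict[int, List[int]] = {i: [i] for i in range(n)}
    dist = {(i, j): float(d[i][j]) for i, j in combinations(range(n), 2)}
    coph = [[0.0] * n for _ in range(n)]
    merges = []
    next_id = n
    while len(members) > 1:
        # Merge the closest pair of current clusters.
        (a, b), h = min(dist.items(), key=lambda kv: kv[1])
        ma, mb = members.pop(a), members.pop(b)
        for i in ma:
            for j in mb:
                coph[i][j] = coph[j][i] = h
        merges.append((sorted(ma), sorted(mb), h))
        del dist[(a, b)]
        # UPGMA update: distance to the new cluster is the size-weighted
        # average of the distances to its two parts.
        for c in members:
            dac = dist.pop(_key(a, c))
            dbc = dist.pop(_key(b, c))
            dist[_key(c, next_id)] = (len(ma) * dac + len(mb) * dbc) / (len(ma) + len(mb))
        members[next_id] = ma + mb
        next_id += 1
    return merges, coph

def cophenetic_correlation(d: Sequence[Sequence[float]], coph: List[List[float]]) -> float:
    """Pearson correlation between original and cophenetic distances over all pairs."""
    pairs = list(combinations(range(len(d)), 2))
    x = [d[i][j] for i, j in pairs]
    y = [coph[i][j] for i, j in pairs]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical 4-item distance matrix.
d = [[0, 2, 6, 6],
     [2, 0, 6, 6],
     [6, 6, 0, 4],
     [6, 6, 4, 0]]
merges, coph = upgma(d)
for a, b, h in merges:
    print(a, b, h)
print(cophenetic_correlation(d, coph))
```

A cophenetic correlation near 1 means the dendrogram preserves the input distances well, which is why it is a natural yardstick for comparing methods such as UPGMA and Ward's.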
Application of a Self-Similar Pressure Profile to Sunyaev-Zeldovich Effect Data from Galaxy Clusters
NASA Technical Reports Server (NTRS)
Mroczkowski, Tony; Bonamente, Max; Carlstrom, John E.; Culverhouse, Thomas L.; Greer, Christopher; Hawkins, David; Hennessy, Ryan; Joy, Marshall; Lamb, James W.; Leitch, Erik M.;
2009-01-01
We investigate the utility of a new, self-similar pressure profile for fitting Sunyaev-Zel'dovich (SZ) effect observations of galaxy clusters. Current SZ imaging instruments, such as the Sunyaev-Zel'dovich Array (SZA), are capable of probing clusters over a large range in physical scale. A model is therefore required that can accurately describe a cluster's pressure profile over a broad range of radii, from the core of the cluster out to a significant fraction of the virial radius. In the analysis presented here, we fit a radial pressure profile derived from simulations and detailed X-ray analysis of relaxed clusters to SZA observations of three clusters with exceptionally high-quality X-ray data: A1835, A1914, and CL J1226.9+3332. From the joint analysis of the SZ and X-ray data, we derive physical properties such as gas mass, total mass, gas fraction and the intrinsic, integrated Compton y-parameter. We find that parameters derived from the joint fit to the SZ and X-ray data agree well with a detailed, independent X-ray-only analysis of the same clusters. In particular, we find that, when combined with X-ray imaging data, this new pressure profile yields an independent electron radial temperature profile that is in good agreement with spectroscopic X-ray measurements.
Murray, Nicholas P; Hunfalvay, Melissa
2017-02-01
Considerable research has documented that successful performance in interceptive tasks (such as return of serve in tennis) is based on the performers' capability to capture appropriate anticipatory information prior to the flight path of the approaching object. Athletes of higher skill tend to fixate on different locations in the playing environment prior to initiation of a skill than their lesser-skilled counterparts. The purpose of this study was to examine the visual search strategies of elite (world-ranked) tennis players and non-ranked competitive tennis players (n = 43) using cluster analysis. Hierarchical (Ward's method) and nonhierarchical (k-means) cluster analyses revealed three different clusters. The clustering method distinguished the visual behaviour of high-, middle- and low-ranked players. Specifically, high-ranked players demonstrated longer mean fixation duration and lower variation of visual search than middle- and low-ranked players. In conclusion, the results demonstrated that cluster analysis is a useful tool for detecting and analysing the areas of interest for use in experimental analysis of expertise and for distinguishing visual search variables among participants.
Bible, Joe; Beck, James D.; Datta, Somnath
2016-01-01
Ignorance of the mechanisms responsible for the availability of information presents an unusual problem for analysts. It is often the case that the availability of information depends on the outcome. In the analysis of clustered data, we say that informative cluster size (ICS) exists when the inference drawn from analysis of hypothetical balanced data differs from the inference drawn from the observed data. Much work has been done to address the analysis of clustered data with informative cluster size; examples include Inverse Probability Weighting (IPW), Cluster Weighted Generalized Estimating Equations (CWGEE), and Doubly Weighted Generalized Estimating Equations (DWGEE). When cluster size changes with time, i.e., the data set possesses temporally varying cluster sizes (TVCS), these methods may produce biased inference for the underlying marginal distribution of interest. We propose a new marginalization that may be appropriate for clustered longitudinal data with TVCS. The principal motivation for the present work is to analyze the periodontal data collected by Beck et al. (1997, Journal of Periodontal Research 6, 497–505). Longitudinal periodontal data often exhibit both ICS and TVCS, as the number of teeth possessed by participants at the onset of study is not constant and teeth as well as individuals may be displaced throughout the study. PMID:26682911
Model-based clustering for RNA-seq data.
Si, Yaqing; Liu, Peng; Li, Pinghua; Brutnell, Thomas P
2014-01-15
RNA-seq technology has been widely adopted as an attractive alternative to microarray-based methods to study global gene expression. However, robust statistical tools to analyze these complex datasets are still lacking. By grouping genes with similar expression profiles across treatments, cluster analysis provides insight into gene functions and networks, and hence is an important technique for RNA-seq data analysis. In this manuscript, we derive clustering algorithms based on appropriate probability models for RNA-seq data. An expectation-maximization (EM) algorithm and two stochastic versions of the EM algorithm are described. In addition, a likelihood-based initialization strategy is proposed to improve the clustering algorithms. Moreover, we present a model-based hybrid-hierarchical clustering method to generate a tree structure that allows visualization of relationships among clusters as well as flexibility in choosing the number of clusters. Results from both simulation studies and analysis of a maize RNA-seq dataset show that our proposed methods provide better clustering results than alternative methods, such as the K-means algorithm and hierarchical clustering methods, that are not based on probability models. An R package, MBCluster.Seq, has been developed to implement our proposed algorithms. This R package provides fast computation and is publicly available at http://www.r-project.org
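To make the EM idea concrete, the sketch below fits a mixture of Poisson distributions to scalar counts: the E-step computes each observation's posterior membership probabilities, and the M-step re-estimates the mixing weights and component means from those probabilities. It is a deliberately simplified illustration under stated assumptions (one count per observation, no normalisation for sequencing depth, no overdispersion, hand-picked starting values, hypothetical function names); it is not the MBCluster.Seq model or its API.

```python
import math
from typing import List, Sequence

def poisson_logpmf(k: int, lam: float) -> float:
    """log P(X = k) for X ~ Poisson(lam), lam > 0."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def em_poisson_mixture(x: Sequence[int], lams: Sequence[float], weights: Sequence[float],
                       n_iter: int = 500, tol: float = 1e-10):
    """Fit a K-component Poisson mixture by EM; returns (lams, weights, responsibilities)."""
    lams, weights = list(lams), list(weights)
    K = len(lams)
    prev_ll = -math.inf
    resp: List[List[float]] = []
    for _ in range(n_iter):
        # E-step: posterior membership probabilities, computed in log space for stability.
        resp, ll = [], 0.0
        for xi in x:
            logs = [math.log(weights[k]) + poisson_logpmf(xi, lams[k]) for k in range(K)]
            top = max(logs)
            total = sum(math.exp(v - top) for v in logs)
            ll += top + math.log(total)  # log-likelihood contribution of xi
            resp.append([math.exp(v - top) / total for v in logs])
        # M-step: weight = mean responsibility; rate = responsibility-weighted mean count.
        for k in range(K):
            nk = sum(r[k] for r in resp)  # assumes no component becomes empty
            weights[k] = nk / len(x)
            lams[k] = max(sum(r[k] * xi for r, xi in zip(resp, x)) / nk, 1e-12)
        if ll - prev_ll < tol:  # EM never decreases the likelihood, so stop on a tiny gain
            break
        prev_ll = ll
    return lams, weights, resp

counts = [2, 3, 1, 2, 3, 2, 20, 22, 19, 21, 23, 20]  # hypothetical counts
lams, weights, resp = em_poisson_mixture(counts, lams=[1.0, 10.0], weights=[0.5, 0.5])
print([round(v, 2) for v in lams], [round(w, 2) for w in weights])
```

The stochastic EM variants mentioned in the abstract replace the soft responsibilities with randomly sampled memberships; that variation is not shown here.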
Assessment of cluster yield components by image analysis.
Diago, Maria P; Tardaguila, Javier; Aleixos, Nuria; Millan, Borja; Prats-Montalban, Jose M; Cubero, Sergio; Blasco, Jose
2015-04-01
Berry weight, berry number and cluster weight are key parameters for yield estimation in the wine and table-grape industries. Current yield prediction methods are destructive, labour-demanding and time-consuming. In this work, a new methodology based on image analysis was developed to determine cluster yield components quickly and inexpensively. Clusters of seven different red varieties of grapevine (Vitis vinifera L.) were photographed under laboratory conditions, and their cluster yield components were determined manually after image acquisition. Two algorithms, based on the Canny and the logarithmic image processing approaches, were tested to find the contours of the berries in the images prior to berry detection by means of the Hough transform. Results were obtained in two ways: by analysing either a single image of the cluster or four images per cluster taken from different orientations. The best results (R² between 69% and 95% in berry detection and between 65% and 97% in cluster weight estimation) were achieved using four images and the Canny algorithm. The capability of the image-analysis-based model to predict berry weight was 84%. The new, low-cost methodology presented here enabled the assessment of cluster yield components, saving time and providing inexpensive information in comparison with current manual methods. © 2014 Society of Chemical Industry.
Clustering in analytical chemistry.
Drab, Klaudia; Daszykowski, Michal
2014-01-01
Data clustering plays an important role in the exploratory analysis of analytical data, and the use of clustering methods has been acknowledged in different fields of science. In this paper, the principles of data clustering are presented with a direct focus on clustering of analytical data. The role of clustering in the analytical workflow, and its potential impact on that workflow, are emphasized.
ERIC Educational Resources Information Center
Steinley, Douglas; Brusco, Michael J.; Henson, Robert
2012-01-01
A measure of "clusterability" serves as the basis of a new methodology designed to preserve cluster structure in a reduced dimensional space. Similar to principal component analysis, which finds the direction of maximal variance in multivariate space, principal cluster axes find the direction of maximum clusterability in multivariate space.…
Characteristics of airflow and particle deposition in COPD current smokers
NASA Astrophysics Data System (ADS)
Zou, Chunrui; Choi, Jiwoong; Haghighi, Babak; Choi, Sanghun; Hoffman, Eric A.; Lin, Ching-Long
2017-11-01
A recent imaging-based cluster analysis of computed tomography (CT) lung images in a chronic obstructive pulmonary disease (COPD) cohort identified four clusters, viz. disease sub-populations. Cluster 1 had relatively normal airway structures; Cluster 2 had wall thickening; Cluster 3 exhibited decreased wall thickness and luminal narrowing; Cluster 4 had a significant decrease of luminal diameter and a significant reduction of lung deformation, thus having relatively low pulmonary functions. To better understand the characteristics of airflow and particle deposition in these clusters, we performed computational fluid and particle dynamics analyses on representative cluster patients and healthy controls using CT-based airway models and subject-specific 3D-1D coupled boundary conditions. The results show that particle deposition in central airways of cluster 4 patients was noticeably increased especially with increasing particle size despite reduced vital capacity as compared to other clusters and healthy controls. This may be attributable in part to significant airway constriction in cluster 4. This study demonstrates the potential application of cluster-guided CFD analysis in disease populations. NIH Grants U01HL114494 and S10-RR022421, and FDA Grant U01FD005837.
A latent profile analysis of Asian American men's and women's adherence to cultural values.
Wong, Y Joel; Nguyen, Chi P; Wang, Shu-Yi; Chen, Weilin; Steinfeldt, Jesse A; Kim, Bryan S K
2012-07-01
The goal of this study was to identify diverse profiles of Asian American women's and men's adherence to values that are salient in Asian cultures (i.e., conformity to norms, family recognition through achievement, emotional self-control, collectivism, and humility). To this end, the authors conducted a latent profile analysis using the 5 subscales of the Asian American Values Scale-Multidimensional in a sample of 214 Asian Americans. The analysis uncovered a four-cluster solution. In general, Clusters 1 and 2 were characterized by relatively low and moderate levels of adherence to the 5 dimensions of cultural values, respectively. Cluster 3 was characterized by the highest level of adherence to the cultural value of family recognition through achievement, whereas Cluster 4 was typified by the highest levels of adherence to collectivism, emotional self-control, and humility. Clusters 3 and 4 were associated with higher levels of depressive symptoms than Cluster 1. Furthermore, Asian American women and Asian American men had lower odds of being in Cluster 4 and Cluster 3, respectively. These findings attest to the importance of identifying specific patterns of adherence to cultural values when examining the relationship between Asian Americans' cultural orientation and mental health status.
Unequal cluster sizes in stepped-wedge cluster randomised trials: a systematic review.
Kristunas, Caroline; Morris, Tom; Gray, Laura
2017-11-15
This review aimed to investigate the extent to which cluster sizes vary in stepped-wedge cluster randomised trials (SW-CRT) and whether any variability is accounted for during the sample size calculation and analysis of these trials. Trials in any setting (not limited to healthcare) and with any participants were eligible, provided the SW-CRT was published up to March 2016. The primary outcome is the variability in cluster sizes, measured by the coefficient of variation (CV) in cluster size. Secondary outcomes include the difference between the cluster sizes assumed during the sample size calculation and those observed during the trial, any reported variability in cluster sizes, and whether the methods of sample size calculation and analysis accounted for any variability in cluster sizes. Of the 101 included SW-CRTs, 48% mentioned that the included clusters were known to vary in size, yet only 13% of these accounted for this during the calculation of the sample size. However, 69% of the trials did use a method of analysis appropriate for when clusters vary in size. Full trial reports were available for 53 trials. The CV was calculated for 23 of these: the median CV was 0.41 (IQR: 0.22-0.52). Actual cluster sizes could be compared with those assumed during the sample size calculation for 14 (26%) of the trial reports; the cluster sizes were between 29% and 480% of those assumed. Cluster sizes often vary in SW-CRTs. Reporting of SW-CRTs also remains suboptimal. The effect of unequal cluster sizes on the statistical power of SW-CRTs needs further exploration, and methods appropriate to studies with unequal cluster sizes need to be employed. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
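The coefficient of variation used as the primary outcome is the standard deviation of cluster sizes divided by their mean. One widely used way to carry that variability into a sample size calculation is the parallel-group design effect 1 + ((CV² + 1)m̄ − 1)ρ, where m̄ is the mean cluster size and ρ the intracluster correlation; with CV = 0 it reduces to the equal-size design effect 1 + (m − 1)ρ. The sketch assumes a sample (n − 1) standard deviation, hypothetical cluster sizes and hypothetical function names. It describes parallel designs only: as the review notes, how unequal sizes affect the power of stepped-wedge trials still needs further exploration, so this formula should not be read as an SW-CRT method.

```python
from statistics import mean, stdev
from typing import Sequence

def cluster_size_cv(sizes: Sequence[int]) -> float:
    """Coefficient of variation of cluster sizes (sample SD / mean)."""
    return stdev(sizes) / mean(sizes)

def design_effect_unequal(sizes: Sequence[int], icc: float) -> float:
    """Parallel-group design effect allowing for variable cluster sizes."""
    m_bar = mean(sizes)
    cv = cluster_size_cv(sizes)
    return 1 + ((cv ** 2 + 1) * m_bar - 1) * icc

equal = [10, 10, 10]
unequal = [5, 10, 15]  # hypothetical sizes with the same mean, CV = 0.5
print(cluster_size_cv(unequal))
print(design_effect_unequal(equal, 0.05))
print(design_effect_unequal(unequal, 0.05))
```

With the same mean cluster size, the unequal design needs a larger design effect, which is why ignoring size variability at the planning stage tends to leave trials underpowered.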
NASA Astrophysics Data System (ADS)
Fučkar, Neven-Stjepan; Guemas, Virginie; Massonnet, François; Doblas-Reyes, Francisco
2015-04-01
Over the modern observational era, northern hemisphere sea ice concentration, age and thickness have experienced a sharp long-term decline superimposed on strong internal variability. Hence, there is a crucial need to identify robust patterns of Arctic sea ice variability on interannual timescales and disentangle them from the long-term trend in noisy datasets. Principal component analysis (PCA) is a versatile and broadly used method for the study of climate variability. However, PCA has several limitations: it assumes that all modes of variability have symmetry between positive and negative phases, and it suppresses nonlinearities by using a linear covariance matrix. Clustering methods offer an alternative set of dimension reduction tools that are more robust and capable of taking into account possible nonlinear characteristics of a climate field. Cluster analysis aggregates data into groups, or clusters, based on their distances, so as to simultaneously minimize the distance between data points within a cluster and maximize the distance between cluster centers. We extract modes of Arctic interannual sea-ice variability with nonhierarchical K-means cluster analysis and investigate the mechanisms leading to these modes. We focus on sea ice thickness (SIT) as the base variable for clustering because SIT holds most of the climate memory for variability and predictability on interannual timescales. We primarily use global reconstructions of sea ice fields from a state-of-the-art ocean-sea-ice model, but we also verify the robustness of the determined clusters in other Arctic sea ice datasets. Cluster analysis over the 1958-2013 period shows that the optimal number of detrended SIT clusters is K=3. The determined SIT cluster patterns and their time series of occurrence are rather similar across seasons and months.
Two opposite thermodynamic modes are characterized by prevailing negative or positive SIT anomalies over the Arctic basin. The intermediate mode, with negative anomalies centered on the East Siberian shelf and positive anomalies along the North American side of the basin, has predominantly dynamic characteristics. The associated sea ice concentration (SIC) clusters vary more between seasons and months, but the SIC patterns are physically framed by the SIT cluster patterns.
Differences in Pedaling Technique in Cycling: A Cluster Analysis.
Lanferdini, Fábio J; Bini, Rodrigo R; Figueiredo, Pedro; Diefenthaeler, Fernando; Mota, Carlos B; Arndt, Anton; Vaz, Marco A
2016-10-01
This study used cluster analysis to assess whether cyclists would opt for different strategies in terms of neuromuscular patterns when pedaling at the power output of their second ventilatory threshold (PO_VT2) compared with cycling at their maximal power output (PO_MAX). Twenty athletes performed an incremental cycling test to determine their power output (PO_MAX and PO_VT2; first session), and pedal forces, muscle activation, muscle-tendon unit length, and vastus lateralis architecture (fascicle length, pennation angle, and muscle thickness) were recorded (second session) at PO_MAX and PO_VT2. Athletes were assigned to 2 clusters based on the behavior of outcome variables at PO_VT2 and PO_MAX using cluster analysis. Clusters 1 (n = 14) and 2 (n = 6) showed similar power output and oxygen uptake. Cluster 1 presented larger increases in pedal force and knee power than cluster 2, without differences for the index of effectiveness. Cluster 1 presented less variation in knee angle, muscle-tendon unit length, pennation angle, and tendon length than cluster 2. However, clusters 1 and 2 showed similar muscle thickness, fascicle length, and muscle activation. When cycling at PO_VT2 vs PO_MAX, cyclists could opt for keeping a constant knee power and pedal-force production, associated with an increase in tendon excursion and a constant fascicle length. Increases in power output lead to greater variations in knee angle, muscle-tendon unit length, tendon length, and pennation angle of vastus lateralis for a similar knee-extensor activation and smaller pedal-force changes in cyclists from cluster 2 than in cluster 1.
Noninvasive analysis of the sputum transcriptome discriminates clinical phenotypes of asthma.
Yan, Xiting; Chu, Jen-Hwa; Gomez, Jose; Koenigs, Maria; Holm, Carole; He, Xiaoxuan; Perez, Mario F; Zhao, Hongyu; Mane, Shrikant; Martinez, Fernando D; Ober, Carole; Nicolae, Dan L; Barnes, Kathleen C; London, Stephanie J; Gilliland, Frank; Weiss, Scott T; Raby, Benjamin A; Cohn, Lauren; Chupp, Geoffrey L
2015-05-15
The airway transcriptome includes genes that contribute to the pathophysiologic heterogeneity seen in individuals with asthma. We analyzed sputum gene expression for transcriptomic endotypes of asthma (TEA), gene signatures that discriminate phenotypes of disease. Gene expression in the sputum and blood of patients with asthma was measured using Affymetrix microarrays. Unsupervised clustering analysis based on pathways from the Kyoto Encyclopedia of Genes and Genomes was used to identify TEA clusters. Logistic regression analysis of matched blood samples defined an expression profile in the circulation to determine the TEA cluster assignment in a cohort of children with asthma to replicate clinical phenotypes. Three TEA clusters were identified. TEA cluster 1 had the most subjects with a history of intubation (P = 0.05), a lower prebronchodilator FEV1 (P = 0.006), a higher bronchodilator response (P = 0.03), and higher exhaled nitric oxide levels (P = 0.04) compared with the other TEA clusters. TEA cluster 2, the smallest cluster, had the most subjects that were hospitalized for asthma (P = 0.04). TEA cluster 3, the largest cluster, had normal lung function, low exhaled nitric oxide levels, and lower inhaled steroid requirements. Evaluation of TEA clusters in children confirmed that TEA clusters 1 and 2 are associated with a history of intubation (P = 5.58 × 10^-6) and hospitalization (P = 0.01), respectively. There are common patterns of gene expression in the sputum and blood of children and adults that are associated with near-fatal, severe, and milder asthma.
Atlas-guided cluster analysis of large tractography datasets.
Ros, Christian; Güllmar, Daniel; Stenzel, Martin; Mentzel, Hans-Joachim; Reichenbach, Jürgen Rainer
2013-01-01
Diffusion Tensor Imaging (DTI) and fiber tractography are important tools to map the cerebral white matter microstructure in vivo and to model the underlying axonal pathways in the brain with three-dimensional fiber tracts. As the fast and consistent extraction of anatomically correct fiber bundles for multiple datasets is still challenging, we present a novel atlas-guided clustering framework for exploratory data analysis of large tractography datasets. The framework uses a hierarchical cluster analysis approach that exploits the inherent redundancy in large datasets to group fiber tracts time-efficiently. Structural information of a white matter atlas can be incorporated into the clustering to achieve an anatomically correct and reproducible grouping of fiber tracts. This approach not only facilitates the identification of the bundles corresponding to the classes of the atlas; it also enables the extraction of bundles that are not present in the atlas. The new technique was applied to cluster datasets of 46 healthy subjects. Prospects of automatic, anatomically correct, and reproducible clustering are explored. Reconstructed clusters were well separated and showed good correspondence to anatomical bundles. Using the atlas-guided cluster approach, we observed consistent results across subjects with high reproducibility. To investigate the outlier elimination performance of the clustering algorithm, scenarios with varying amounts of noise were simulated and clustered with three different outlier elimination strategies. By exploiting the multithreading capabilities of modern multiprocessor systems in combination with novel algorithms, our toolkit clusters large datasets in a couple of minutes. Experiments were conducted to investigate the achievable speedup and to demonstrate the high performance of the clustering framework in a multiprocessing environment.
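The abstract does not say which fiber-to-fiber distance drives the hierarchical clustering. A common choice in the tractography literature is the symmetric mean closest-point distance, and the sketch below assumes that measure, representing each tract as a list of 3-D points. It is an illustration only, not the authors' toolkit, and the coordinates are hypothetical.

```python
import math

def directed_mean_closest(p, q):
    """Mean, over the points of tract p, of the distance to the closest point of tract q."""
    return sum(min(math.dist(x, y) for y in q) for x in p) / len(p)

def mean_closest_point(a, b):
    """Symmetric mean closest-point distance between two tracts (lists of (x, y, z))."""
    return (directed_mean_closest(a, b) + directed_mean_closest(b, a)) / 2

# Two parallel toy tracts 1 mm apart (hypothetical coordinates).
tract_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
tract_b = [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0), (2.0, 1.0, 0.0)]
print(mean_closest_point(tract_a, tract_b))  # 1.0
```

Averaging the two directed terms makes the measure independent of argument order, so a full pairwise matrix of such distances can be handed to any agglomerative linkage.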
NASA Astrophysics Data System (ADS)
Kamann, S.; Husser, T.-O.; Dreizler, S.; Emsellem, E.; Weilbacher, P. M.; Martens, S.; Bacon, R.; den Brok, M.; Giesers, B.; Krajnović, D.; Roth, M. M.; Wendt, M.; Wisotzki, L.
2018-02-01
This is the first of a series of papers presenting the results from our survey of 25 Galactic globular clusters with the MUSE integral-field spectrograph. In combination with our dedicated algorithm for source deblending, MUSE provides unique multiplex capabilities in crowded stellar fields and allows us to acquire samples of up to 20 000 stars within the half-light radius of each cluster. The present paper focuses on the analysis of the internal dynamics of 22 out of the 25 clusters, using about 500 000 spectra of 200 000 individual stars. Thanks to the large stellar samples per cluster, we are able to perform a detailed analysis of the central rotation and dispersion fields using both radial profiles and two-dimensional maps. The velocity dispersion profiles we derive show a good general agreement with existing radial velocity studies but typically reach closer to the cluster centres. By comparison with proper motion data, we derive or update the dynamical distance estimates to 14 clusters. Compared to previous dynamical distance estimates for 47 Tuc, our value is in much better agreement with other methods. We further find significant (>3σ) rotation in the majority (13/22) of our clusters. Our analysis seems to confirm earlier findings of a link between rotation and the ellipticities of globular clusters. In addition, we find a correlation between the strengths of internal rotation and the relaxation times of the clusters, suggesting that the central rotation fields are relics of the cluster formation that are gradually dissipated via two-body relaxation.
NASA Astrophysics Data System (ADS)
Schaefer, Andreas M.; Daniell, James E.; Wenzel, Friedemann
2017-07-01
Earthquake clustering is an essential part of almost any statistical analysis of spatial and temporal properties of seismic activity. The nature of earthquake clusters and subsequent declustering of earthquake catalogues plays a crucial role in determining the magnitude-dependent earthquake return period and its respective spatial variation for probabilistic seismic hazard assessment. This study introduces the Smart Cluster Method (SCM), a new methodology to identify earthquake clusters, which uses an adaptive point process for spatio-temporal cluster identification. It utilises the magnitude-dependent spatio-temporal earthquake density to adjust the search properties, subsequently analyses the identified clusters to determine directional variation, and adjusts its search space with respect to directional properties. In the case of rapid subsequent ruptures like the 1992 Landers sequence or the 2010-2011 Darfield-Christchurch sequence, a reclassification procedure is applied to disassemble subsequent ruptures using near-field searches, nearest neighbour classification, and temporal splitting. The method is capable of identifying and classifying earthquake clusters in space and time. It has been tested and validated using earthquake data from California and New Zealand. In total, more than 1500 clusters with M_min = 2.0 have been found in the two regions since 1980. Utilising the knowledge of cluster classification, the method has been adjusted to provide an earthquake declustering algorithm, which has been compared to existing methods. Its performance is comparable to established methodologies. The analysis of earthquake clustering statistics led to various new and updated correlation functions, e.g. for ratios between mainshock and strongest aftershock and general aftershock activity metrics.
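SCM's adaptive, density- and direction-dependent search is not specified here in enough detail to reproduce. For orientation only, the sketch below shows the simpler idea of declustering with fixed magnitude-dependent space-time windows (not SCM). Both window functions are hypothetical placeholders, not a published calibration, and epicentres are assumed to be in planar km coordinates.

```python
import math
from dataclasses import dataclass

@dataclass
class Event:
    t_days: float  # origin time in days
    x_km: float    # planar easting in km (projection assumed)
    y_km: float    # planar northing in km
    mag: float

def space_window_km(mag):
    # Hypothetical radius that grows with magnitude; NOT a published calibration.
    return 10 ** (0.1 * mag + 1.0)

def time_window_days(mag):
    # Hypothetical duration that grows with magnitude; NOT a published calibration.
    return 10 ** (0.5 * mag - 0.5)

def window_decluster(events):
    """Return (cluster label per event, sorted indices of mainshocks)."""
    labels = [None] * len(events)
    mainshocks = []
    # Largest events are processed first and act as cluster mainshocks.
    for i in sorted(range(len(events)), key=lambda k: -events[k].mag):
        if labels[i] is not None:
            continue  # already captured as a dependent event of a larger shock
        cid = len(mainshocks)
        labels[i] = cid
        mainshocks.append(i)
        ev = events[i]
        r, dt = space_window_km(ev.mag), time_window_days(ev.mag)
        for j, other in enumerate(events):
            # Only later events inside both windows join (foreshocks are ignored here).
            if labels[j] is None and 0 <= other.t_days - ev.t_days <= dt:
                if math.hypot(other.x_km - ev.x_km, other.y_km - ev.y_km) <= r:
                    labels[j] = cid
    return labels, sorted(mainshocks)

catalogue = [
    Event(0.0, 0.0, 0.0, 6.0),      # hypothetical M6 mainshock
    Event(1.0, 5.0, 0.0, 3.0),      # nearby event one day later
    Event(500.0, 1000.0, 0.0, 3.0), # distant, much later event
]
labels, mainshocks = window_decluster(catalogue)
print(labels, mainshocks)  # [0, 0, 1] [0, 2]
```

The declustered catalogue would then be `[catalogue[i] for i in mainshocks]`; SCM's contribution is to replace these fixed windows with adaptive search regions.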
NASA Astrophysics Data System (ADS)
Hynds, Paul; Misstear, Bruce D.; Gill, Laurence W.; Murphy, Heather M.
2014-04-01
An integrated domestic well sampling and "susceptibility assessment" programme was undertaken in the Republic of Ireland from April 2008 to November 2010. Overall, 211 domestic wells were sampled, assessed and collated with local climate data. Based upon groundwater physicochemical profile, three clusters have been identified and characterised by source type (borehole or hand-dug well) and local geological setting. Statistical analysis indicates that cluster membership is significantly associated with the prevalence of bacteria (p = 0.001), with mean Escherichia coli presence within clusters ranging from 15.4% (Cluster-1) to 47.6% (Cluster-3). Bivariate risk factor analysis shows that on-site septic tank presence was the only risk factor significantly associated (p < 0.05) with bacterial presence within all clusters. Point agriculture adjacency was significantly associated with both borehole-related clusters. Well design criteria were associated with hand-dug wells and boreholes in areas characterised by high permeability subsoils, while local geological setting was significant for hand-dug wells and boreholes in areas dominated by low/moderate permeability subsoils. Multivariate susceptibility models were developed for all clusters, with predictive accuracies of 84% (Cluster-1) to 91% (Cluster-2) achieved. Septic tank setback was a common variable within all multivariate models, while agricultural sources were also significant, albeit to a lesser degree. Furthermore, well liner clearance was a significant factor in all models, indicating that direct surface ingress is a significant well contamination mechanism. Identification and elucidation of cluster-specific contamination mechanisms may be used to develop improved overall risk management and wellhead protection strategies, while also informing future remediation and maintenance efforts.
Noninvasive Analysis of the Sputum Transcriptome Discriminates Clinical Phenotypes of Asthma
Yan, Xiting; Chu, Jen-Hwa; Gomez, Jose; Koenigs, Maria; Holm, Carole; He, Xiaoxuan; Perez, Mario F.; Zhao, Hongyu; Mane, Shrikant; Martinez, Fernando D.; Ober, Carole; Nicolae, Dan L.; Barnes, Kathleen C.; London, Stephanie J.; Gilliland, Frank; Weiss, Scott T.; Raby, Benjamin A.; Cohn, Lauren
2015-01-01
Rationale: The airway transcriptome includes genes that contribute to the pathophysiologic heterogeneity seen in individuals with asthma. Objectives: We analyzed sputum gene expression for transcriptomic endotypes of asthma (TEA), gene signatures that discriminate phenotypes of disease. Methods: Gene expression in the sputum and blood of patients with asthma was measured using Affymetrix microarrays. Unsupervised clustering analysis based on pathways from the Kyoto Encyclopedia of Genes and Genomes was used to identify TEA clusters. Logistic regression analysis of matched blood samples defined an expression profile in the circulation to determine the TEA cluster assignment in a cohort of children with asthma to replicate clinical phenotypes. Measurements and Main Results: Three TEA clusters were identified. TEA cluster 1 had the most subjects with a history of intubation (P = 0.05), a lower prebronchodilator FEV1 (P = 0.006), a higher bronchodilator response (P = 0.03), and higher exhaled nitric oxide levels (P = 0.04) compared with the other TEA clusters. TEA cluster 2, the smallest cluster, had the most subjects that were hospitalized for asthma (P = 0.04). TEA cluster 3, the largest cluster, had normal lung function, low exhaled nitric oxide levels, and lower inhaled steroid requirements. Evaluation of TEA clusters in children confirmed that TEA clusters 1 and 2 are associated with a history of intubation (P = 5.58 × 10−6) and hospitalization (P = 0.01), respectively. Conclusions: There are common patterns of gene expression in the sputum and blood of children and adults that are associated with near-fatal, severe, and milder asthma. PMID:25763605
Transcriptional and Chromatin Dynamics of Muscle Regeneration After Severe Trauma
2016-10-12
performed pathway analysis of the time-clustered RNA-Seq data (16) and showed an initial burst of pro-inflammatory and immune-response transcripts in the...143 showed dynamic behavior (See Methods) and analysis of the dynamic miRNAs reinforced many of the results observed from the RNA-Seq datasets...excellent agreement was viewed. Hierarchical clustering of the datasets through time revealed 5 clusters, and gene ontology (GO) analysis of the
On-Line Pattern Analysis and Recognition System. OLPARS VI. Software Reference Manual,
1982-06-18
Discriminant Analysis, Data Transformation, Feature Extraction, Feature Evaluation, Cluster Analysis, Classification, Computer Software. ABSTRACT... cluster/scatter cut-off value, (2) change the one-space bin factor, (3) change from long prompts to short prompts or vice versa, (4) change the...value, a cluster plot is displayed, otherwise a scatter plot is shown. If option 1 is selected, the program requests that a new value be input
Feng, Sujuan; Qian, Xiaosong; Li, Han; Zhang, Xiaodong
2017-12-01
The aim of the present study was to investigate the effectiveness of the miR-17-92 cluster as a disease progression marker in prostate cancer (PCa). Reverse transcription-quantitative polymerase chain reaction analysis was used to detect microRNA (miR)-17-92 cluster expression levels in tissues from patients with PCa or benign prostatic hyperplasia (BPH), as well as in PCa and BPH cell lines. Spearman correlation was used to estimate correlations between miRNA expression levels and clinicopathological characteristics such as the Gleason score and prostate-specific antigen (PSA). Receiver operating characteristic (ROC) curve analysis was performed to evaluate the specificity and sensitivity of miR-17-92 cluster expression levels for discriminating patients with PCa from patients with BPH. Kaplan-Meier curves were plotted to investigate the predictive potential of the miR-17-92 cluster for PCa biochemical recurrence. Expression of the majority of miRNAs in the miR-17-92 cluster was significantly increased in PCa tissues and cell lines. Bivariate correlation analysis indicated that high expression of the upregulated miRNAs was positively correlated with Gleason grade but had no significant association with PSA. ROC curves demonstrated that high expression of the miR-17-92 cluster yielded higher diagnostic accuracy than PSA. Improved discriminating quotients were observed when combinations of upregulated miRNAs with PSA were used. Survival analysis confirmed that a high combined miRNA score for the miR-17-92 cluster was associated with a shorter biochemical recurrence interval. The miR-17-92 cluster could be a potential diagnostic and prognostic biomarker for PCa, and combining the miR-17-92 cluster with serum PSA may enhance diagnostic accuracy for PCa.
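ROC analysis of this kind is usually summarised by the area under the curve (AUC). The AUC equals the probability that a randomly chosen case scores higher than a randomly chosen control (ties counted as one half), which is the normalised Mann-Whitney U statistic. A minimal sketch, using hypothetical expression scores rather than the study's data:

```python
def roc_auc(case_scores, control_scores):
    """AUC as the normalised Mann-Whitney U: P(case > control) + 0.5 * P(tie)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5  # ties count as half a win
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical relative expression values (PCa cases vs BPH controls).
pca = [3.0, 4.0, 5.0]
bph = [1.0, 2.0, 3.0]
print(roc_auc(pca, bph))  # 8.5 of 9 pairs, about 0.944
```

The pairwise loop is quadratic; for large cohorts a rank-based computation gives the same value faster.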
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hu, Lin; Maroudas, Dimitrios, E-mail: maroudas@ecs.umass.edu; Hammond, Karl D.
We report the results of a systematic atomic-scale analysis of the reactions of small mobile helium clusters (He_n, 4 ≤ n ≤ 7) near low-Miller-index tungsten (W) surfaces, aiming at a fundamental understanding of the near-surface dynamics of helium-carrying species in plasma-exposed tungsten. These small mobile helium clusters are attracted to the surface and migrate to the surface by Fickian diffusion and drift due to the thermodynamic driving force for surface segregation. As the clusters migrate toward the surface, trap mutation (TM) and cluster dissociation reactions are activated at rates higher than in the bulk. TM produces W adatoms and immobile complexes of helium clusters surrounding W vacancies located within the lattice planes at a short distance from the surface. These reactions are identified and characterized in detail based on the analysis of a large number of molecular-dynamics trajectories for each such mobile cluster near W(100), W(110), and W(111) surfaces. TM is found to be the dominant cluster reaction for all cluster and surface combinations, except for the He_4 and He_5 clusters near W(100), where cluster partial dissociation following TM dominates. We find that there exists a critical cluster size, n = 4 near W(100) and W(111) and n = 5 near W(110), beyond which the formation of multiple W adatoms and vacancies in the TM reactions is observed. The identified cluster reactions are responsible for important structural, morphological, and compositional features in the plasma-exposed tungsten, including surface adatom populations, near-surface immobile helium-vacancy complexes, and retained helium content, which are expected to influence the amount of hydrogen re-cycling and tritium retention in fusion tokamaks.
Freitas-Vilela, Ana Amélia; Smith, Andrew D A C; Kac, Gilberto; Pearson, Rebecca M; Heron, Jon; Emond, Alan; Hibbeln, Joseph R; Castro, Maria Beatriz Trindade; Emmett, Pauline M
2017-04-01
Little is known about how dietary patterns of mothers and their children track over time. The objectives of this study are to obtain dietary patterns in pregnancy using cluster analysis, to examine women's mean nutrient intakes in each cluster and to compare the dietary patterns of mothers to those of their children. Pregnant women (n = 12 195) from the Avon Longitudinal Study of Parents and Children reported their frequency of consumption of 47 foods and food groups. These data were used to obtain dietary patterns during pregnancy by cluster analysis. The absolute and energy-adjusted nutrient intakes were compared between clusters. Women's dietary patterns were compared with previously derived clusters of their children at 7 years of age. Multinomial logistic regression was performed to evaluate relationships comparing maternal and offspring clusters. Three maternal clusters were identified: 'fruit and vegetables', 'meat and potatoes' and 'white bread and coffee'. After energy adjustment, women in the 'fruit and vegetables' cluster had the highest mean nutrient intakes. Mothers in the 'fruit and vegetables' cluster were more likely than mothers in the 'meat and potatoes' (adjusted odds ratio [OR]: 2.00; 95% Confidence Interval [CI]: 1.69-2.36) or 'white bread and coffee' (OR: 2.18; 95% CI: 1.87-2.53) clusters to have children in a 'plant-based' cluster. However, the majority of children were in clusters unrelated to their mothers' dietary patterns. Three distinct dietary patterns were obtained in pregnancy, the 'fruit and vegetables' pattern being the most nutrient dense. Mothers' dietary patterns were associated with, but did not dominate, offspring dietary patterns. © 2016 The Authors. Maternal & Child Nutrition published by John Wiley & Sons Ltd.
Cluster analysis of sputum cytokine-high profiles reveals diversity in T(h)2-high asthma patients.
Seys, Sven F; Scheers, Hans; Van den Brande, Paul; Marijsse, Gudrun; Dilissen, Ellen; Van Den Bergh, Annelies; Goeminne, Pieter C; Hellings, Peter W; Ceuppens, Jan L; Dupont, Lieven J; Bullens, Dominique M A
2017-02-23
Asthma is characterized by a heterogeneous inflammatory profile and can be subdivided into T(h)2-high and T(h)2-low airway inflammation. Profiling of a broader panel of airway cytokines in large unselected patient cohorts is lacking. Patients (n = 205) were defined as being "cytokine-low/high" if sputum mRNA expression of a particular cytokine was outside the respective 10th/90th percentile range of the control group (n = 80). Unsupervised hierarchical clustering was used to determine clusters based on sputum cytokine profiles. Half of the patients (n = 108; 52.6%) had a classical T(h)2-high ("IL-4-, IL-5- and/or IL-13-high") sputum cytokine profile. Unsupervised cluster analysis revealed 5 clusters. Patients with an "IL-4- and/or IL-13-high" pattern surprisingly did not cluster but were equally distributed among the 5 clusters. Patients with an "IL-5-, IL-17A-/F- and IL-25-high" profile were restricted to cluster 1 (n = 24), with increased sputum eosinophil and neutrophil counts and poor lung function parameters at baseline and 2 years later. Four other clusters were identified: "IL-5-high or IL-10-high" (n = 16), "IL-6-high" (n = 8), "IL-22-high" (n = 25), and cluster 5 (n = 132), which consists of patients without a "cytokine-high" pattern or with only high IL-4 and/or IL-13. We identified 5 unique asthma molecular phenotypes by biological clustering. Type 2 cytokines cluster with non-type 2 cytokines in 4 out of 5 clusters. Unsupervised analysis thus does not support a priori type 2 versus non-type 2 molecular phenotypes. www.clinicaltrials.gov NCT01224938. Registered 18 October 2010.
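The "cytokine-low/high" definition above is a percentile rule against the control group. A minimal sketch of that rule for one cytokine, assuming Python's `statistics.quantiles` with the `inclusive` interpolation method (the abstract does not say how percentiles were interpolated) and hypothetical control values:

```python
import statistics

def percentile_bounds(control_values, low=10, high=90):
    """Return the low/high percentile cut-offs of the control distribution."""
    # quantiles(n=100) returns the 1st..99th percentiles; the 'inclusive'
    # interpolation method is an assumption, not taken from the study.
    cuts = statistics.quantiles(control_values, n=100, method="inclusive")
    return cuts[low - 1], cuts[high - 1]

def classify(value, bounds):
    """Label one patient's expression of one cytokine relative to the control bounds."""
    lo, hi = bounds
    if value > hi:
        return "high"
    if value < lo:
        return "low"
    return "normal"

controls = [float(v) for v in range(101)]  # hypothetical control expression values
bounds = percentile_bounds(controls)
print(bounds, classify(95.0, bounds))  # (10.0, 90.0) high
```

Applying the rule per cytokine yields a patient-by-cytokine profile that can then be passed to hierarchical clustering.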
Unsupervised analysis of small animal dynamic Cerenkov luminescence imaging
NASA Astrophysics Data System (ADS)
Spinelli, Antonello E.; Boschi, Federico
2011-12-01
Clustering analysis (CA) and principal component analysis (PCA) were applied to dynamic Cerenkov luminescence images (dCLI). To investigate the performance of the proposed approaches, two distinct dynamic data sets obtained by injecting mice with 32P-ATP and 18F-FDG were acquired using the IVIS 200 optical imager. The k-means clustering algorithm was applied to dCLI and implemented using Interactive Data Language (IDL) 8.1. We show that cluster analysis yields good agreement between the clustered regions and the corresponding emission regions, such as the bladder, the liver, and the tumor. We also show a good correspondence between the time activity curves of the different regions obtained by using CA and manual region of interest analysis on dCLIT and PCA images. We conclude that CA provides an automatic unsupervised method for the analysis of preclinical dynamic Cerenkov luminescence image data.
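The study implemented k-means in IDL. As a language-neutral illustration, here is a minimal Python sketch of standard (Lloyd's) k-means in which each pixel's time-activity curve is treated as one feature vector. The toy curves and the fixed initial centroids are assumptions for reproducibility, not the authors' settings.

```python
import math

def kmeans(points, init_centroids, max_iter=100):
    """Lloyd's k-means: alternate nearest-centroid assignment and centroid update."""
    centroids = [list(c) for c in init_centroids]
    k = len(centroids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(v) / len(members) for v in zip(*members)]
    return labels, centroids

# Hypothetical pixel time-activity curves (intensity in 3 frames).
curves = [[1, 2, 3], [1, 2, 4], [5, 5, 5], [5, 6, 5]]
labels, centres = kmeans(curves, init_centroids=[curves[0], curves[2]])
print(labels)  # [0, 0, 1, 1]
```

Mapping each label back to its pixel position gives the cluster image that is compared with anatomical regions such as the bladder or liver.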
El Ansari, Walid; Ssewanyana, Derrick; Stock, Christiane
2018-01-01
Limited research has explored clustering of lifestyle behavioral risk factors (BRFs) among university students. This study aimed to explore clustering of BRFs, composition of clusters, and the association of the clusters with self-rated health and perceived academic performance. We assessed BRFs, namely tobacco smoking, physical inactivity, alcohol consumption, illicit drug use, unhealthy nutrition, and inadequate sleep, using a self-administered general Student Health Survey among 3,706 undergraduates at seven UK universities. A two-step cluster analysis generated four clusters. Cluster 1 (the high physically active and health conscious) had very high health awareness/consciousness, good nutrition, and physical activity (PA), and relatively low alcohol, tobacco, and other drug (ATOD) use. Cluster 2 (the abstinent) had very low ATOD use, high health awareness, good nutrition, and medium high PA. Cluster 3 (the moderately health conscious) had the highest regard for healthy eating, the second highest fruit/vegetable consumption, and moderately high ATOD use. Cluster 4 (the risk taking) had the highest ATOD use, was the least health conscious, consumed the least fruit, and attached the least importance to healthy eating. Compared to the healthy cluster (Cluster 1), students in other clusters had lower self-rated health, and in particular, students in the risk taking cluster (Cluster 4) reported lower academic performance. These associations were stronger for men than for women. Of the four clusters, Cluster 4 had the youngest students. Our results suggest that prevention among university students should address multiple BRFs simultaneously, with particular focus on younger students.
An improved clustering algorithm based on reverse learning in intelligent transportation
NASA Astrophysics Data System (ADS)
Qiu, Guoqing; Kou, Qianqian; Niu, Ting
2017-05-01
With the development of artificial intelligence and data mining technology, big data has gradually entered the mainstream. Clustering is an important method for processing large data sets. In this work, a reverse learning method is introduced into the clustering process of the PAM (Partitioning Around Medoids) clustering algorithm to overcome the limitations of one-time clustering in unsupervised clustering learning and to increase the diversity of the clusters, thereby improving clustering quality. Algorithm analysis and experimental results show that the algorithm is feasible.
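For readers unfamiliar with PAM, the sketch below shows only the baseline swap phase that the proposed method builds on. The reverse-learning extension is not reproduced. Two simplifications are assumptions: initial medoids are supplied instead of being chosen by PAM's BUILD phase, and any cost-lowering swap is accepted immediately (first improvement) rather than searching for the single best swap per pass.

```python
def total_cost(dist, medoids):
    """Sum over all points of the distance to the nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))

def pam_swap(dist, init_medoids):
    """PAM-style swap phase: accept medoid/non-medoid swaps that lower the total cost."""
    medoids = list(init_medoids)
    best = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for slot in range(len(medoids)):
            for h in range(len(dist)):
                if h in medoids:
                    continue
                candidate = medoids[:slot] + [h] + medoids[slot + 1:]
                cost = total_cost(dist, candidate)
                if cost < best:  # strict decrease guarantees termination
                    medoids, best, improved = candidate, cost, True
    # Each point is labelled with the index of its nearest medoid.
    labels = [min(medoids, key=lambda m: dist[i][m]) for i in range(len(dist))]
    return sorted(medoids), labels, best

values = [1, 2, 3, 10, 11, 12]  # hypothetical 1-D observations
dist = [[abs(a - b) for b in values] for a in values]
medoids, labels, cost = pam_swap(dist, init_medoids=[0, 1])
print(medoids, cost)  # [1, 4] 4
```

Because PAM only needs a distance matrix, it works with any dissimilarity, which is one reason it is preferred over k-means for mixed or non-Euclidean data.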
Multivariate Statistical Analysis of MSL APXS Bulk Geochemical Data
NASA Astrophysics Data System (ADS)
Hamilton, V. E.; Edwards, C. S.; Thompson, L. M.; Schmidt, M. E.
2014-12-01
We apply cluster and factor analyses to bulk chemical data of 130 soil and rock samples measured by the Alpha Particle X-ray Spectrometer (APXS) on the Mars Science Laboratory (MSL) rover Curiosity through sol 650. Multivariate approaches such as principal components analysis (PCA), cluster analysis, and factor analysis complement more traditional approaches (e.g., Harker diagrams), with the advantage of simultaneously examining the relationships between multiple variables for large numbers of samples. Principal components analysis has been applied with success to APXS, Pancam, and Mössbauer data from the Mars Exploration Rovers. Factor analysis and cluster analysis have been applied with success to thermal infrared (TIR) spectral data of Mars. Cluster analyses group the input data by similarity, and there are a number of different approaches to defining clusters (hierarchical, density-based, distribution-based, etc.). For example, without any assumptions about the chemical contributions of surface dust, preliminary hierarchical and k-means cluster analyses clearly distinguish the physically adjacent rock targets Windjana and Stephen as being distinctly different from lithologies observed prior to Curiosity's arrival at The Kimberley. In addition, they are separated from each other, consistent with chemical trends observed in variation diagrams but without requiring assumptions about chemical relationships. We will discuss the variation in cluster analysis results as a function of clustering method and pre-processing (e.g., log transformation, correction for dust cover) and implications for interpreting chemical data. Factor analysis shares some similarities with PCA, and examines the variability among observed components of a dataset so as to reveal variations attributable to unobserved components.
Factor analysis has been used to extract the TIR spectra of components that are typically observed in mixtures and only rarely in isolation; there is the potential for similar results with data from APXS. These techniques offer new ways to understand the chemical relationships between the materials interrogated by Curiosity, and potentially their relation to materials observed by APXS instruments on other landed missions.
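The abstract mentions hierarchical clustering and log-transformation pre-processing without naming a linkage. The sketch below assumes average linkage (UPGMA), one common choice, applied to log-transformed abundances. The oxide values are hypothetical, not APXS measurements, and the naive implementation is meant for small sample counts only.

```python
import math

def log_transform(rows):
    """Natural-log transform of strictly positive abundances (e.g. wt%)."""
    return [[math.log(v) for v in row] for row in rows]

def average_linkage(points, n_clusters):
    """Naive agglomerative clustering with average (UPGMA) linkage."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Mean of all pairwise Euclidean distances between the two clusters.
        return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        _, x, y = min((linkage(clusters[x], clusters[y]), x, y)
                      for x in range(len(clusters))
                      for y in range(x + 1, len(clusters)))
        clusters[x] += clusters[y]  # merge y into x; x < y, so deleting y is safe
        del clusters[y]
    return sorted(sorted(c) for c in clusters)

# Hypothetical (SiO2, FeO) abundances in wt% for five targets.
samples = [[45.0, 20.0], [46.0, 19.0], [44.0, 21.0], [60.0, 5.0], [61.0, 6.0]]
print(average_linkage(log_transform(samples), 2))  # [[0, 1, 2], [3, 4]]
```

Running the same data with and without `log_transform` is a quick way to see how pre-processing alone can change cluster membership.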
Cluster randomised trials in the medical literature: two bibliometric surveys
Bland, J Martin
2004-01-01
Background Several reviews of published cluster randomised trials have reported that about half did not take clustering into account in the analysis, which was thus incorrect and potentially misleading. In this paper I ask whether cluster randomised trials are increasing in both number and quality of reporting. Methods Computer search for papers on cluster randomised trials since 1980, hand search of trial reports published in selected volumes of the British Medical Journal over 20 years. Results There has been a large increase in the numbers of methodological papers and of trial reports using the term 'cluster random' in recent years, with about equal numbers of each type of paper. The British Medical Journal contained more such reports than any other journal. In this journal there was a corresponding increase over time in the number of trials where subjects were randomised in clusters. In 2003 all reports showed awareness of the need to allow for clustering in the analysis. In 1993 and before, clustering was ignored in most such trials. Conclusion Cluster trials are becoming more frequent and reporting is of higher quality. Perhaps statistician pressure works. PMID:15310402
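Why ignoring clustering misleads can be made concrete with the standard design effect for equal-sized clusters, DE = 1 + (m - 1)ρ, where m is the cluster size and ρ the intracluster correlation coefficient. A cluster trial of n individuals carries roughly the information of n/DE independently randomised individuals, so an analysis that ignores clustering overstates precision. The trial numbers below are hypothetical.

```python
def design_effect(cluster_size, icc):
    """Variance inflation for equal-sized clusters: 1 + (m - 1) * rho."""
    return 1 + (cluster_size - 1) * icc

def effective_sample_size(n_individuals, cluster_size, icc):
    """Number of independently randomised individuals carrying the same information."""
    return n_individuals / design_effect(cluster_size, icc)

# Hypothetical trial: 20 practices x 50 patients, ICC = 0.05.
print(design_effect(50, 0.05))                # about 3.45
print(effective_sample_size(1000, 50, 0.05))  # about 290
```

Even a small ICC inflates the variance several-fold when clusters are large, which is why standard errors from an analysis that ignores clustering are too small.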
Applied anatomic site study of palatal anchorage implants using cone beam computed tomography.
Lai, Ren-fa; Zou, Hui; Kong, Wei-dong; Lin, Wei
2010-06-01
The purpose of this study was to conduct quantitative research on bone height and bone mineral density of palatal implant sites, and to provide reference sites for safe and stable palatal implants. Three-dimensional reformatted images were reconstructed from cone beam computed tomography (CBCT) scans of 34 patients, aged 18 to 35 years, using EZ Implant software. Bone height was measured at 20 sites of interest on the palate. Bone mineral density was measured at the 10 sites with the highest implantation rate, and these sites were classified using k-means cluster analysis based on bone height and bone mineral density. According to the cluster analysis, the 10 sites were classified into three clusters. Significant differences in bone height and bone mineral density were detected between these three clusters (P<0.05). The greatest bone height was obtained in cluster 2, followed by cluster 1 and cluster 3. The highest bone mineral density was found in cluster 3, followed by cluster 1 and cluster 2. CBCT plays an important role in pre-surgical treatment planning. CBCT is helpful in identifying safe and stable implantation sites for palatal anchorage.
Song, Weiran; Wang, Hui; Maguire, Paul; Nibouche, Omar
2018-06-07
Partial Least Squares Discriminant Analysis (PLS-DA) is one of the most effective multivariate analysis methods for spectral data analysis, which extracts latent variables and uses them to predict responses. In particular, it is an effective method for handling high-dimensional and collinear spectral data. However, PLS-DA does not explicitly address data multimodality, i.e., within-class multimodal distribution of data. In this paper, we present a novel method termed nearest clusters based PLS-DA (NCPLS-DA) for addressing the multimodality and nonlinearity issues explicitly and improving the performance of PLS-DA on spectral data classification. The new method applies hierarchical clustering to divide samples into clusters and calculates the corresponding centre of every cluster. For a given query point, only clusters whose centres are nearest to such a query point are used for PLS-DA. Such a method can provide a simple and effective tool for separating multimodal and nonlinear classes into clusters which are locally linear and unimodal. Experimental results on 17 datasets, including 12 UCI and 5 spectral datasets, show that NCPLS-DA can outperform 4 baseline methods, namely, PLS-DA, kernel PLS-DA, local PLS-DA and k-NN, achieving the highest classification accuracy most of the time. Copyright © 2018 Elsevier B.V. All rights reserved.
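The distinctive step of NCPLS-DA is selecting, for each query, the training clusters whose centres lie nearest to it. The sketch below covers only that selection step: cluster labels are taken as given (standing in for the hierarchical clustering output), Euclidean distance and the `n_nearest` parameter are assumptions, and the subsequent local PLS-DA fit is not shown.

```python
import math

def cluster_centres(points, labels):
    """Mean vector of each cluster (labels come from a prior hierarchical clustering)."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return {lab: [sum(v) / len(ps) for v in zip(*ps)] for lab, ps in groups.items()}

def local_training_indices(points, labels, query, n_nearest):
    """Indices of samples in the n_nearest clusters whose centres are closest to the query."""
    centres = cluster_centres(points, labels)
    nearest = sorted(centres, key=lambda lab: math.dist(query, centres[lab]))[:n_nearest]
    return [i for i, lab in enumerate(labels) if lab in nearest]

X = [[0, 0], [0, 1], [10, 10], [10, 11], [20, 0], [21, 0]]  # hypothetical spectra (2 features)
cluster_labels = [0, 0, 1, 1, 2, 2]                          # hypothetical clustering output
local = local_training_indices(X, cluster_labels, query=[1, 1], n_nearest=2)
print(local)  # [0, 1, 2, 3]
```

A local PLS-DA model would then be trained on the selected rows only, which is how the method turns a multimodal class into locally unimodal pieces.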
SOMFlow: Guided Exploratory Cluster Analysis with Self-Organizing Maps and Analytic Provenance.
Sacha, Dominik; Kraus, Matthias; Bernard, Jurgen; Behrisch, Michael; Schreck, Tobias; Asano, Yuki; Keim, Daniel A
2018-01-01
Clustering is a core building block for data analysis, aiming to extract otherwise hidden structures and relations from raw datasets, such as particular groups that can be effectively related, compared, and interpreted. A plethora of visual-interactive cluster analysis techniques has been proposed to date; however, arriving at useful clusterings often requires several rounds of user interaction to fine-tune the data preprocessing and algorithms. We present a multi-stage Visual Analytics (VA) approach for iterative cluster refinement together with an implementation (SOMFlow) that uses Self-Organizing Maps (SOM) to analyze time series data. It supports exploration by offering the analyst a visual platform to analyze intermediate results, adapt the underlying computations, iteratively partition the data, and reflect on previous analytical activities. The history of previous decisions is explicitly visualized within a flow graph, allowing the analyst to compare earlier cluster refinements and to explore relations. We further leverage quality and interestingness measures to guide the analyst in the discovery of useful patterns, relations, and data partitions. We conducted two pair analytics experiments together with a subject matter expert in speech intonation research to demonstrate that the approach is effective for interactive data analysis, supporting enhanced understanding of clustering results as well as the interactive process itself.
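SOMFlow's own SOM configuration is not described in the abstract. As background, this is a minimal one-dimensional Kohonen SOM sketch in pure Python: each sample (for time series, a fixed-length vector) pulls its best-matching unit and, through a Gaussian neighbourhood, that unit's neighbours. The linear decay schedules, the deterministic initialisation, and the toy data are all assumptions.

```python
import math

def best_matching_unit(weights, x):
    """Index of the node whose weight vector is closest to sample x."""
    return min(range(len(weights)), key=lambda j: math.dist(weights[j], x))

def train_som(data, n_nodes, epochs=100, lr0=0.5):
    """Train a 1-D SOM with linearly decaying learning rate and neighbourhood width."""
    dim = len(data[0])
    lo = [min(x[d] for x in data) for d in range(dim)]
    hi = [max(x[d] for x in data) for d in range(dim)]
    # Deterministic initialisation: nodes spread evenly between data minima and maxima.
    weights = [[lo[d] + (hi[d] - lo[d]) * j / max(n_nodes - 1, 1) for d in range(dim)]
               for j in range(n_nodes)]
    sigma0 = max(n_nodes / 2, 1.0)
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1 - frac)
        sigma = sigma0 * (1 - frac) + 1e-3  # small floor avoids division by zero
        for x in data:
            bmu = best_matching_unit(weights, x)
            for j, w in enumerate(weights):
                h = math.exp(-((j - bmu) ** 2) / (2 * sigma ** 2))  # neighbourhood weight
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
    return weights

data = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]  # hypothetical 1-feature samples
weights = train_som(data, n_nodes=2)
print([best_matching_unit(weights, x) for x in data])
```

In an interactive tool, the node each sample maps to becomes its provisional cluster, and the analyst can re-run the map on any subset to refine the partition.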
The Psychology of Yoga Practitioners: A Cluster Analysis.
Genovese, Jeremy E C; Fondran, Kristine M
2017-11-01
Yoga practitioners (N = 261) completed the revised Expression of Spirituality Inventory (ESI) and the Multidimensional Body-Self Relations Questionnaire. Cluster analysis revealed three clusters: Cluster A scored high on all four spiritual constructs. They had high positive evaluations of their appearance, but a lower orientation towards their appearance. They tended to have a high evaluation of their fitness and health, and higher body satisfaction. Cluster B showed lower scores on the spiritual constructs. Like Cluster A, members of Cluster B tended to show high positive evaluations of appearance and fitness. They also had higher body satisfaction. Members of Cluster B had a higher fitness orientation and a higher appearance orientation than members of Cluster A. Members of Cluster C had low scores for all spiritual constructs. They had a low evaluation of, and unhappiness with, their appearance. They were unhappy with the size and appearance of their bodies. They tended to see themselves as overweight. There was a significant difference in years of practice between the three groups (Kruskal-Wallis, p = .0041). Members of Cluster A had the most years of yoga experience, and members of Cluster B had more yoga experience than members of Cluster C. These results suggest the possible existence of a developmental trajectory for yoga practitioners. Such a developmental sequence may have important implications for yoga practice and instruction.
The Psychology of Yoga Practitioners: A Cluster Analysis.
Genovese, Jeremy E C; Fondran, Kristine M
2017-03-30
Determining the Optimal Number of Clusters with the Clustergram
NASA Technical Reports Server (NTRS)
Fluegemann, Joseph K.; Davies, Misty D.; Aguirre, Nathan D.
2011-01-01
Cluster analysis aids research in many different fields, from business to biology to aerospace. It consists of using statistical techniques to group objects in large sets of data into meaningful classes. However, this process of ordering data points involves considerable uncertainty because it comprises several steps, many of which are subject to researcher judgment and to inconsistencies that depend on the specific data type and research goals. These steps include the choice of clustering method, the variables on which the cluster analysis operates, the number of resulting clusters, and parts of the interpretation process. In most cases, the number of clusters must be guessed or estimated before the clustering method is applied. Many remedies have been proposed, but none is unassailable, and certainly none works for all data types. Thus, current research on better techniques for determining the number of clusters is generally confined to demonstrating that a new technique outperforms other methods on several disparate data types. Our research makes use of a new cluster-number-determination technique based on the clustergram: a graph that shows how the number of objects in each cluster and the cluster mean (the ordinate) change with the number of clusters (the abscissa). We use the features of the clustergram to make the best determination of the number of clusters.
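The data behind a clustergram can be generated as sketched below. This is an assumption-laden illustration, not the NTRS authors' procedure. It runs k-means for k = 1..K and records, for every observation, a scalar summary of its cluster's mean at each k. That summary is taken here as the average of the centroid's coordinates; a projection onto the first principal component is another common choice. Deterministic farthest-first initialisation is used so the output is reproducible. All function names are hypothetical.

```python
import numpy as np

def _assign(X, centres):
    # label of the nearest centre (Euclidean) for every observation
    return np.argmin(np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2), axis=1)

def _farthest_first(X, k):
    # deterministic initialisation: start at X[0], then repeatedly take the farthest point
    centres = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centres], axis=0)
        centres.append(X[np.argmax(d)])
    return np.array(centres, dtype=float)

def kmeans(X, k, n_iter=100):
    centres = _farthest_first(X, k)
    for _ in range(n_iter):
        labels = _assign(X, centres)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                        for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return _assign(X, centres), centres

def clustergram(X, max_k):
    """Row k-1 holds, for each observation, the scalar mean of its cluster when k clusters are used."""
    X = np.asarray(X, dtype=float)
    rows = []
    for k in range(1, max_k + 1):
        labels, centres = kmeans(X, k)
        rows.append(centres.mean(axis=1)[labels])
    return np.array(rows)
```

Plotting each column of the result against k, with line thickness proportional to how many observations share a path, gives the clustergram. The number of clusters is then judged from where the paths stop splitting substantially.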
First CCD UBVI photometric analysis of six open cluster candidates
NASA Astrophysics Data System (ADS)
Piatti, A. E.; Clariá, J. J.; Ahumada, A. V.
2011-04-01
We have obtained CCD UBVI_KC photometry down to V ~ 22 for the open cluster candidates Haffner 3, Haffner 5, NGC 2368, Haffner 25, Hogg 3 and Hogg 4 and their surrounding fields. None of these objects has been photometrically studied so far. Our analysis shows that these stellar groups are not genuine open clusters, since no clear main sequences or other meaningful features can be seen in their colour-magnitude and colour-colour diagrams. We checked for possible differential reddening across the studied fields that could be hiding the characteristics of real open clusters. However, the dust in the directions of these objects appears to be uniformly distributed. Moreover, star counts carried out within and outside the open cluster candidate fields do not support the hypothesis that these objects are real open clusters or even open cluster remnants.
Ringdal, Kjetil G; Skaga, Nils Oddvar; Hestnes, Morten; Steen, Petter Andreas; Røislien, Jo; Rehn, Marius; Røise, Olav; Krüger, Andreas J; Lossius, Hans Morten
2013-05-01
Injury severity is most frequently classified using the Abbreviated Injury Scale (AIS) as a basis for the Injury Severity Score (ISS) and the New Injury Severity Score (NISS), which are used for assessment of overall injury severity in the multiply injured patient and in outcome prediction. European trauma registries recommended the AIS 2008 edition, but the levels of inter-rater agreement and reliability of ISS and NISS, associated with its use, have not been reported. Nineteen Norwegian AIS-certified trauma registry coders were invited to score 50 real, anonymised patient medical records using AIS 2008. Rater agreements for ISS and NISS were analysed using Bland-Altman plots with 95% limits of agreement (LoA). A clinically acceptable LoA range was set at ± 9 units. Reliability was analysed using a two-way mixed model intraclass correlation coefficient (ICC) statistics with corresponding 95% confidence intervals (CI) and hierarchical agglomerative clustering. Ten coders submitted their coding results. Of their AIS codes, 2189 (61.5%) agreed with a reference standard, 1187 (31.1%) real injuries were missed, and 392 non-existing injuries were recorded. All LoAs were wider than the predefined, clinically acceptable limit of ± 9, for both ISS and NISS. The joint ICC (range) between each rater and the reference standard was 0.51 (0.29,0.86) for ISS and 0.51 (0.27,0.78) for NISS. The joint ICC (range) for inter-rater reliability was 0.49 (0.19,0.85) for ISS and 0.49 (0.16,0.82) for NISS. Univariate linear regression analyses indicated a significant relationship between the number of correctly AIS-coded injuries and total number of cases coded during the rater's career, but no significant relationship between the rater-against-reference ISS and NISS ICC values and total number of cases coded during the rater's career. Based on AIS 2008, ISS and NISS were not reliable for summarising anatomic injury severity in this study. 
This result indicates a limitation in their use as benchmarking tools for trauma system performance. Copyright © 2012 Elsevier Ltd. All rights reserved.
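The agreement criterion described above can be sketched as follows. This assumes paired ISS (or NISS) values for the same patients, one set from a rater and one from the reference standard. The 95% limits of agreement are the mean difference ± 1.96 standard deviations of the differences, which presumes roughly normal differences. The ±9-unit threshold is the one stated in the abstract. The function names are hypothetical, and the ICC part of the analysis is not shown.

```python
import statistics

def limits_of_agreement(scores_a, scores_b, z=1.96):
    """Bland-Altman bias and 95% limits of agreement for paired scores."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    bias = statistics.mean(diffs)   # mean difference between the two sets of scores
    sd = statistics.stdev(diffs)    # sample standard deviation of the differences
    return bias, bias - z * sd, bias + z * sd

def clinically_acceptable(scores_a, scores_b, limit=9.0):
    """True only if both limits of agreement fall inside +/- limit units."""
    _, lower, upper = limits_of_agreement(scores_a, scores_b)
    return -limit <= lower and upper <= limit
```

For example, with rater scores `[10, 20, 30, 40]` and reference scores `[8, 22, 27, 41]`, the differences are `[2, -2, 3, -1]`. The bias is 0.5 and both limits lie within ±9.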
NASA Astrophysics Data System (ADS)
Weil, Gilad; Lensky, Itamar M.; Levin, Noam
2017-10-01
The spectral reflectance of most plant species is quite similar, so the feasibility of identifying most plant species from single-date multispectral data is very low. Seasonal phenological patterns of plant species may help address the challenge of using remote sensing to map plant species at the individual level. We used a consumer-grade digital camera with near-infrared capabilities to extract and quantify vegetation phenological information at four East Mediterranean sites. After illumination corrections and other noise reduction steps, the phenological patterns of 1839 individuals representing 12 common species were analyzed, including evergreen trees, winter deciduous trees, semi-deciduous summer shrubs and annual herbaceous patches. Five vegetation indices were used to describe the phenology: relative green and red (green/red chromatic coordinates), excess green (ExG), normalized difference vegetation index (NDVI) and green-red vegetation index (GRVI). We found significant differences between the phenology of the various species, and defined the main phenological groups using agglomerative hierarchical clustering. Differences between species and sites regarding the start of season (SOS), maximum of season (MOS) and end of season (EOS) were displayed in detail using ExG values, as this index was found to have the lowest percentage of outliers. An additional visible-band spectral index (relative red) was found to be useful for characterizing seasonal phenology, and had the lowest correlation with the other four vegetation indices, which are more sensitive to greenness. We used a linear mixed model to evaluate the influence of various factors on the phenology, and found that, unlike the significant effects of species and individuals on SOS, MOS and EOS, site location did not have a direct significant effect on the timing of phenological events.
In conclusion, the relative advantage of the proposed methodology is the exploitation of representative temporal information that is collected with accessible and simple devices, for the subsequent determination of optimal temporal acquisition of images by overhead sensors, for vegetation mapping over larger areas.
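The indices named above can be computed per plant crown or image region roughly as follows. The formulas are common definitions assumed here: chromatic coordinates as band/(R+G+B), ExG = 2g - r - b on chromatic coordinates, NDVI = (NIR - R)/(NIR + R), and GRVI = (G - R)/(G + R). The study's exact normalisation may differ. The `start_of_season` helper uses a simple amplitude-threshold rule (first date exceeding a fixed fraction of the seasonal range), chosen only for illustration and not necessarily the authors' SOS definition.

```python
def vegetation_indices(r, g, b, nir):
    """Indices from mean band values (digital numbers) of one region."""
    total = r + g + b
    gcc, rcc, bcc = g / total, r / total, b / total   # chromatic coordinates
    return {
        "gcc": gcc,                       # relative green
        "rcc": rcc,                       # relative red
        "exg": 2 * gcc - rcc - bcc,       # excess green on chromatic coordinates
        "ndvi": (nir - r) / (nir + r),
        "grvi": (g - r) / (g + r),
    }

def start_of_season(days, values, fraction=0.5):
    """First day on which the index reaches min + fraction * (max - min)."""
    lo, hi = min(values), max(values)
    threshold = lo + fraction * (hi - lo)
    for day, value in zip(days, values):
        if value >= threshold:
            return day
    return None
```

Running `vegetation_indices` on every image date yields one time series per index and individual. These series are what the SOS, MOS, and EOS metrics and the hierarchical clustering operate on.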
NASA Astrophysics Data System (ADS)
Hozé, Nathanaël; Holcman, David
2012-01-01
We develop a coagulation-fragmentation model to study a system composed of a small number of stochastic objects that move in a confined domain and can aggregate upon binding to form local clusters of arbitrary size. A cluster can also dissociate into two subclusters with uniform probability. To study the statistics of clusters, we combine a Markov chain analysis with a partition number approach. We obtain explicit formulas for the size and the number of clusters in terms of hypergeometric functions. Finally, we apply our analysis to study the statistical physics of telomere (chromosome end) clustering in the yeast nucleus and show that the diffusion-coagulation-fragmentation process can predict the organization of telomeres.
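A crude Monte Carlo counterpart to the Markov chain on cluster partitions can be sketched as below. It is not the authors' model. The abstract does not give the rates, so this sketch makes several assumptions. Each step is a coagulation with fixed probability `p_coag` (two distinct clusters chosen uniformly) or otherwise a fragmentation. A fragmentation picks a cluster of size n ≥ 2 uniformly and splits it at a point chosen uniformly from 1..n-1. Diffusion in the confined domain is ignored entirely, and all names are hypothetical.

```python
import random

def step(sizes, p_coag, rng):
    """One transition of the (assumed) coagulation-fragmentation chain on cluster sizes."""
    sizes = list(sizes)
    splittable = [i for i, s in enumerate(sizes) if s >= 2]
    do_coag = len(sizes) >= 2 and (rng.random() < p_coag or not splittable)
    if do_coag:
        i, j = rng.sample(range(len(sizes)), 2)          # two distinct clusters bind
        merged = sizes[i] + sizes[j]
        sizes = [s for k, s in enumerate(sizes) if k not in (i, j)] + [merged]
    elif splittable:
        n = sizes.pop(rng.choice(splittable))
        part = rng.randint(1, n - 1)                     # uniform split point
        sizes += [part, n - part]
    return sorted(sizes, reverse=True)                   # partition of the object count

def simulate(n_objects=5, n_steps=10000, p_coag=0.5, seed=1):
    """Empirical distribution of the number of clusters, plus the final partition."""
    rng = random.Random(seed)
    sizes = [1] * n_objects
    counts = {}
    for _ in range(n_steps):
        sizes = step(sizes, p_coag, rng)
        counts[len(sizes)] = counts.get(len(sizes), 0) + 1
    return {k: v / n_steps for k, v in sorted(counts.items())}, sizes
```

Because the state is always a partition of `n_objects`, the chain lives on the same partition-number state space the abstract refers to. The empirical frequencies could be compared against closed-form results for a matching choice of rates.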
On the Partitioning of Squared Euclidean Distance and Its Applications in Cluster Analysis.
ERIC Educational Resources Information Center
Carter, Randy L.; And Others
1989-01-01
The partitioning of the squared Euclidean distance (E²) between two vectors in M-dimensional space into the sum of squared lengths of vectors in mutually orthogonal subspaces is discussed. Applications to specific cluster analysis problems are provided (i.e., to design Monte Carlo studies for performance comparisons of several clustering methods…
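The partition can be checked numerically. Split the columns of an orthogonal matrix into groups: each group spans one of a set of mutually orthogonal subspaces that together span R^M. The squared lengths of the projections of x - y onto those subspaces then sum to the squared Euclidean distance. This is a minimal NumPy sketch with hypothetical function and argument names. The abstract does not say how the authors choose the subspaces.

```python
import numpy as np

def partition_sq_distance(x, y, basis, blocks):
    """Squared length of (x - y) projected onto each subspace.

    basis:  M x M matrix with orthonormal columns.
    blocks: list of column-index lists; together they must cover all M columns
            so that the subspaces are mutually orthogonal and span R^M.
    """
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    parts = []
    for cols in blocks:
        Q = basis[:, cols]
        proj = Q @ (Q.T @ d)          # orthogonal projection of d onto span(Q)
        parts.append(float(proj @ proj))
    return parts
```

With the identity as the basis, the pieces are simply the per-variable-group contributions to E². This is one way to control how much of the between-cluster separation lies in chosen dimensions when designing simulated data.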
ERIC Educational Resources Information Center
Hofmann, Richard J.
A very general model for the computation of independent cluster solutions in factor analysis is presented. The model is discussed as being either orthogonal or oblique. Furthermore, it is demonstrated that for every orthogonal independent cluster solution there is an oblique analog. Using three illustrative examples, certain generalizations are made…
A Constraint-Based Approach to Acquisition of Word-Final Consonant Clusters in Turkish Children
ERIC Educational Resources Information Center
Gokgoz-Kurt, Burcu
2017-01-01
The current study provides a constraint-based analysis of L1 word-final consonant cluster acquisition in Turkish child language, based on the data originally presented by Topbas and Kopkalli-Yavuz (2008). The present analysis was done using [?]+obstruent consonant cluster acquisition. A comparison of Gradual Learning Algorithm (GLA) under…
Ogden, Lorraine G; Stroebele, Nanette; Wyatt, Holly R; Catenacci, Victoria A; Peters, John C; Stuht, Jennifer; Wing, Rena R; Hill, James O
2012-10-01
The National Weight Control Registry (NWCR) is the largest ongoing study of individuals successful at maintaining weight loss; the registry enrolls individuals maintaining a weight loss of at least 13.6 kg (30 lb) for a minimum of 1 year. The current report uses multivariate latent class cluster analysis to identify unique clusters of individuals within the NWCR that have distinct experiences, strategies, and attitudes with respect to weight loss and weight loss maintenance. The cluster analysis considers weight and health history, weight control behaviors and strategies, effort and satisfaction with maintaining weight, and psychological and demographic characteristics. The analysis includes 2,228 participants enrolled between 1998 and 2002. Cluster 1 (50.5%) represents a weight-stable, healthy, exercise-conscious group who are very satisfied with their current weight. Cluster 2 (26.9%) has continuously struggled with weight since childhood; they rely on the greatest number of resources and strategies to lose and maintain weight, and report higher levels of stress and depression. Cluster 3 (12.7%) represents a group successful at weight reduction on the first attempt; they were least likely to be overweight as children, are maintaining the longest duration of weight loss, and report the least difficulty maintaining weight. Cluster 4 (9.9%) represents a group less likely to use exercise to control weight; they tend to be older, eat fewer meals, and report more health problems. Further exploration of the unique characteristics of these clusters could be useful for tailoring future weight loss and weight maintenance programs to the specific characteristics of an individual.
Clustering analysis of proteins from microbial genomes at multiple levels of resolution.
Zaslavsky, Leonid; Ciufo, Stacy; Fedorov, Boris; Tatusova, Tatiana
2016-08-31
Microbial genomes at the National Center for Biotechnology Information (NCBI) represent a large collection of more than 35,000 assemblies. There are several complexities associated with the data: a great variation in sampling density, since human pathogens are densely sampled while other bacteria are less represented; different protein families occur in annotations with different frequencies; and the quality of genome annotation varies greatly. To extract useful information from these complex data, the analysis needs to be performed at multiple levels of phylogenomic resolution and protein similarity, with an adequate sampling strategy. Protein clustering is used to construct meaningful and stable groups of similar proteins to be used for analysis and functional annotation. Our approach is to create protein clusters at three levels. First, tight clusters in groups of closely related genomes (species-level clades) are constructed using a combined approach that takes into account both sequence similarity and genome context. Second, clustroids of conservative in-clade clusters are organized into seed global clusters. Finally, global protein clusters are built around the seed clusters. We propose filtering strategies that make it possible to limit the protein set included in global clustering. The in-clade clustering procedure, subsequent selection of clustroids and organization into seed global clusters provide a robust representation and a high rate of compression. Seed protein clusters are further extended by adding related proteins. Extended seed clusters include a significant part of the data and represent all major known cell machinery. The remaining part, coming from either non-conservative (unique) or rapidly evolving proteins, from rare genomes, or resulting from low-quality annotation, does not group together well. Processing these proteins requires significant computational resources and results in a large number of questionable clusters.
The developed filtering strategies make it possible to identify and exclude such peripheral proteins, limiting the protein dataset used in global clustering. Overall, the proposed methodology makes it possible to obtain the relevant data at different levels of detail and to eliminate data redundancy while keeping biologically interesting variations.
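A clustroid is the member of a cluster that is most central under the chosen dissimilarity, so it can stand in for the whole cluster at the next level. The abstract does not define the centrality criterion or the protein distance. This sketch assumes the member with the smallest sum of distances to all other members. It uses a toy Hamming distance on equal-length sequences as a hypothetical stand-in for an alignment-based dissimilarity.

```python
def clustroid(members, distance):
    """Member minimising the sum of distances to all members (ties keep the first)."""
    best, best_cost = None, float("inf")
    for m in members:
        cost = sum(distance(m, other) for other in members)  # distance(m, m) is 0
        if cost < best_cost:
            best, best_cost = m, cost
    return best

def hamming(a, b):
    """Toy distance for equal-length sequences (hypothetical stand-in)."""
    return sum(ca != cb for ca, cb in zip(a, b))
```

For example, among `["MKT", "MKS", "MRT", "AAA"]` the clustroid under `hamming` is `"MKT"`, whose total distance to the others is 5. Using an actual member rather than a computed mean is what makes the idea applicable to sequences, where averaging is undefined.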
Potential of SNP markers for the characterization of Brazilian cassava germplasm.
de Oliveira, Eder Jorge; Ferreira, Cláudia Fortes; da Silva Santos, Vanderlei; de Jesus, Onildo Nunes; Oliveira, Gilmara Alvarenga Fachardo; da Silva, Maiane Suzarte
2014-06-01
High-throughput markers, such as SNPs, along with different methodologies were used to evaluate the applicability of the Bayesian approach and of multivariate analysis in structuring the genetic diversity of cassava. The objective of the present work was to evaluate the diversity and genetic structure of the largest cassava germplasm bank in Brazil. Complementary methodological approaches such as discriminant analysis of principal components (DAPC), Bayesian analysis and analysis of molecular variance (AMOVA) were used to understand the structure and diversity of 1,280 accessions genotyped using 402 single nucleotide polymorphism markers. The genetic diversity (0.327) and the average observed heterozygosity (0.322) were high considering that the markers are bi-allelic. At the population level, a complex genetic structure was observed, with 30 clusters formed by DAPC and 34 clusters by Bayesian analysis. Both methodologies presented difficulties and controversies in the allocation of some accessions to specific clusters. However, the clusters suggested by the DAPC analysis seemed more consistent, because they presented a higher probability of allocation of the accessions within the clusters. Prior information related to breeding patterns and geographic origins of the accessions was not sufficient to provide clear differentiation between the clusters according to the AMOVA analysis. In contrast, FST was maximized when considering the clusters suggested by the Bayesian and DAPC analyses. The high frequency of germplasm exchange between producers and the subsequent renaming of the same material may be one of the causes of the low association between genetic diversity and geographic origin. The results of this study may benefit cassava germplasm conservation programs and contribute to the maximization of genetic gains in breeding programs.
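For a bi-allelic SNP, expected heterozygosity (gene diversity) at a locus is 2p(1 - p), which cannot exceed 0.5. That ceiling is one reason values near 0.33 can be considered high for such markers. Below is a minimal sketch of the per-locus and averaged statistics. It assumes genotypes coded as 0/1/2 copies of one allele, with `None` for missing calls. Whether the study's "genetic diversity" is exactly this estimator (without sample-size correction) is an assumption.

```python
def locus_stats(genotypes):
    """Expected (He) and observed (Ho) heterozygosity at one bi-allelic locus."""
    calls = [g for g in genotypes if g is not None]   # drop missing calls
    n = len(calls)
    p = sum(calls) / (2 * n)                          # frequency of the counted allele
    he = 2 * p * (1 - p)                              # = 1 - p^2 - q^2
    ho = sum(1 for g in calls if g == 1) / n          # share of heterozygous individuals
    return he, ho

def mean_diversity(loci):
    """Average He and Ho over loci; loci is a list of per-locus genotype lists."""
    stats = [locus_stats(locus) for locus in loci]
    mean_he = sum(s[0] for s in stats) / len(stats)
    mean_ho = sum(s[1] for s in stats) / len(stats)
    return mean_he, mean_ho
```

For two loci coded `[0, 1, 2, 1]` and `[0, 0, 0, 1]`, this gives a mean He of 0.359375 and a mean Ho of 0.375.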
Groundwater quality assessment of urban Bengaluru using multivariate statistical techniques
NASA Astrophysics Data System (ADS)
Gulgundi, Mohammad Shahid; Shetty, Amba
2018-03-01
Groundwater quality deterioration due to anthropogenic activities has become a subject of prime concern. The objective of the study was to assess the spatial and temporal variations in groundwater quality in the western half of Bengaluru city, and to identify their sources, using multivariate statistical techniques. A water quality index rating was calculated for the pre- and post-monsoon seasons to quantify overall water quality for human consumption. The post-monsoon samples showed poorer quality for drinking purposes than the pre-monsoon samples. Cluster analysis (CA), principal component analysis (PCA) and discriminant analysis (DA) were applied to groundwater quality data measured on 14 parameters at 67 sites distributed across the city. Hierarchical cluster analysis grouped the 67 sampling stations into two groups, cluster 1 with high pollution and cluster 2 with less pollution. Discriminant analysis was applied to identify the most meaningful parameters accounting for temporal and spatial variations in groundwater quality in the study area. Temporal DA identified pH as the most important parameter discriminating between water quality in the pre-monsoon and post-monsoon seasons, accounting for 72% of seasonal case assignments. Spatial DA identified Mg, Cl and NO3 as the three most important parameters discriminating between the two clusters, accounting for 89% of spatial case assignments. Principal component analysis was applied to the datasets from the two clusters and yielded three factors in each cluster, explaining 85.4% and 84% of the total variance, respectively. Varifactors obtained from principal component analysis showed that groundwater quality variation is mainly explained by dissolution of minerals from rock-water interactions in the aquifer, the effect of anthropogenic activities, and ion exchange processes in water.
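The abstract does not state which water quality index formulation was used. One widely used option is the weighted arithmetic index, sketched below as an assumption. Each parameter gets a quality rating q = 100 (V - V_ideal)/(S - V_ideal), where S is the permissible standard and V_ideal is the ideal value (0 for most parameters, 7 for pH). Each rating is weighted by w proportional to 1/S. The parameter names and standards passed in are placeholders, not the study's inputs.

```python
def wqi(measured, standard, ideal=None):
    """Weighted arithmetic water quality index (one common formulation, assumed here)."""
    ideal = ideal or {}
    k = 1.0 / sum(1.0 / s for s in standard.values())   # proportionality constant
    num = den = 0.0
    for param, s in standard.items():
        v_ideal = ideal.get(param, 0.0)                  # 0 unless given (e.g. 7 for pH)
        q = 100.0 * (measured[param] - v_ideal) / (s - v_ideal)  # quality rating
        w = k / s                                        # unit weight, inversely prop. to standard
        num += q * w
        den += w
    return num / den
```

Because the weights are inversely proportional to the permissible limits, parameters with strict limits dominate the index. Computing it separately for pre- and post-monsoon samples gives the seasonal comparison described above.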
DOE Office of Scientific and Technical Information (OSTI.GOV)
Colucci, Janet E.; Bernstein, Rebecca A.; McWilliam, Andrew
2017-01-10
We present abundances of globular clusters (GCs) in the Milky Way and Fornax from integrated-light (IL) spectra. Our goal is to evaluate the consistency of the IL analysis relative to standard abundance analysis for individual stars in those same clusters. This sample includes an updated analysis of seven clusters from our previous publications and results for five new clusters that expand the metallicity range over which our technique has been tested. We find that the [Fe/H] measured from IL spectra agrees to ∼0.1 dex for GCs with metallicities as high as [Fe/H] = −0.3, but the abundances measured for more metal-rich clusters may be underestimated. In addition, we systematically evaluate the accuracy of abundance ratios, [X/Fe], for Na I, Mg I, Al I, Si I, Ca I, Ti I, Ti II, Sc II, V I, Cr I, Mn I, Co I, Ni I, Cu I, Y II, Zr I, Ba II, La II, Nd II, and Eu II. The elements for which the IL analysis gives results most similar to the analysis of individual stellar spectra are Fe I, Ca I, Si I, Ni I, and Ba II. The elements that show the greatest differences include Mg I and Zr I. Some elements show good agreement only over a limited range in metallicity. More stellar abundance data in these clusters would enable a more complete evaluation of the IL results for other important elements.
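The bracket notation used above follows a standard convention: [X/H] is the logarithmic abundance of X relative to the Sun, log ε(X) = log10(N_X/N_H) + 12, and [X/Fe] = [X/H] - [Fe/H]. The sketch below applies that definition and a simple agreement check between IL and individual-star values. The 0.1 dex tolerance echoes the abstract. The numbers in the example are illustrative only and are not the paper's measurements or its adopted solar abundances.

```python
def bracket(log_eps_star, log_eps_sun):
    """[X/H] from absolute abundances log eps(X) = log10(N_X / N_H) + 12."""
    return log_eps_star - log_eps_sun

def x_over_fe(x_star, x_sun, fe_star, fe_sun):
    """[X/Fe] = [X/H] - [Fe/H]."""
    return bracket(x_star, x_sun) - bracket(fe_star, fe_sun)

def agrees(il_value, star_value, tol=0.1):
    """True if an integrated-light value matches the individual-star value within tol dex."""
    return abs(il_value - star_value) <= tol
```

With illustrative values log ε(Mg) = 6.90 (star) and 7.60 (Sun), and log ε(Fe) = 6.50 (star) and 7.50 (Sun), this gives [Fe/H] = -1.0 and [Mg/Fe] = +0.3.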
Identifying Subgroups of Tinnitus Using Novel Resting State fMRI Biomarkers and Cluster Analysis
2016-10-01
AWARD NUMBER: W81XWH-15-2-0032. Public Release; Distribution Unlimited. ABSTRACT: The subject of the project is FY14 PRMRP Topic Area – Tinnitus. The broad…
2017-01-30
...research (Ebeling et al. 2012) has developed simplified analysis procedures for flexible approach wall systems founded on clustered groups of vertical... ...dynamic structural time-history response analysis of flexible approach walls founded on clustered pile groups using Impact_Deck. In Preparation, ERDC/ITL TR-16-X. Vicksburg, MS
NASA Technical Reports Server (NTRS)
Wharton, S. W.
1980-01-01
An Interactive Cluster Analysis Procedure (ICAP) was developed to derive classifier training statistics from remotely sensed data. The algorithm combines the rapid numerical processing capacity of a computer with the human ability to integrate qualitative information. Control of the clustering process alternates between the algorithm, which creates new centroids and forms clusters, and the analyst, who evaluates the cluster structure and may elect to modify it. Clusters can be deleted or lumped pairwise, or new centroids can be added. A summary of the cluster statistics can be requested to facilitate cluster manipulation. ICAP was implemented in APL (A Programming Language), an interactive computer language. The flexibility of the algorithm was evaluated using data from different LANDSAT scenes to simulate two situations: one in which the analyst is assumed to have no prior knowledge about the data and wishes the clusters to be formed more or less automatically, and the other in which the analyst is assumed to have some knowledge about the data structure and wishes to use that information to closely supervise the clustering process. For comparison, an existing clustering method was also applied to the two data sets.
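The alternation described above can be sketched as one automatic assign-and-update pass plus the three analyst operations (delete, lump pairwise, add centroid). This is a hypothetical Python rendering, not the original APL program. It assumes Euclidean nearest-centroid assignment and that lumping replaces two centroids with their count-weighted mean. The abstract does not say how ICAP computes the merged centroid.

```python
import numpy as np

def assign_and_update(X, centroids):
    """Algorithm's turn: nearest-centroid assignment, then recompute non-empty centroids."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = np.argmin(d, axis=1)
    counts = np.bincount(labels, minlength=len(centroids))
    new = np.array([X[labels == j].mean(axis=0) if counts[j] else centroids[j]
                    for j in range(len(centroids))])
    return labels, new, counts

def lump(centroids, counts, i, j):
    """Analyst's turn: merge clusters i and j into their count-weighted mean (assumes counts > 0)."""
    merged = (counts[i] * centroids[i] + counts[j] * centroids[j]) / (counts[i] + counts[j])
    keep = [k for k in range(len(centroids)) if k not in (i, j)]
    return np.vstack([centroids[keep], merged])

def delete(centroids, i):
    """Analyst's turn: drop centroid i; its members are reassigned on the next pass."""
    return np.delete(centroids, i, axis=0)

def add(centroids, new_centroid):
    """Analyst's turn: seed a new centroid, e.g. at a known ground-cover signature."""
    return np.vstack([centroids, new_centroid])
```

A session would alternate `assign_and_update` with whichever manual operations the analyst chooses after inspecting `counts` and the cluster statistics. In the "no prior knowledge" scenario, the manual steps would simply be skipped.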
Tobacco, Marijuana, and Alcohol Use in University Students: A Cluster Analysis
Primack, Brian A.; Kim, Kevin H.; Shensa, Ariel; Sidani, Jaime E.; Barnett, Tracey E.; Switzer, Galen E.
2012-01-01
Objective: Segmentation of populations may facilitate development of targeted substance abuse prevention programs. We aimed to partition a national sample of university students according to profiles based on substance use. Participants: We used 2008–2009 data from the National College Health Assessment from the American College Health Association. Our sample consisted of 111,245 individuals from 158 institutions. Method: We partitioned the sample using cluster analysis according to current substance use behaviors. We examined the association of cluster membership with individual and institutional characteristics. Results: Cluster analysis yielded six distinct clusters. Three individual factors (gender, year in school, and fraternity/sorority membership) were the most strongly associated with cluster membership. Conclusions: In a large sample of university students, we were able to identify six distinct patterns of substance abuse. It may be valuable to target specific populations of college-aged substance users based on individual factors. However, comprehensive intervention will require a multifaceted approach. PMID:22686360
Using cluster analysis for medical resource decision making.
Dilts, D; Khamalah, J; Plotkin, A
1995-01-01
Escalating costs of health care delivery have in the recent past often led the health care industry to investigate, adapt, and apply management techniques for budgeting, resource control, and forecasting that have long been used in the manufacturing sector. A strategy that has contributed much in this direction is the definition and classification of a hospital's output into "products", or groups of patients that impose similar resource or cost demands on the hospital. Existing classification schemes have frequently employed cluster analysis to generate these groupings. Unfortunately, the myriad articles and books on clustering and classification contain few formalized methodologies for selecting a technique to solve a particular problem, and hence often leave the novice investigator at a loss. This paper reviews the literature on clustering, particularly as it has been applied in the medical resource-utilization domain, addresses the critical choices facing an investigator in the medical field using cluster analysis, and offers suggestions (using the example of clustering low-vision patients) for how such choices can be made.
Huang, Rao; Lo, Li-Ta; Wen, Yuhua; Voter, Arthur F; Perez, Danny
2017-10-21
Modern molecular-dynamics-based techniques are extremely powerful tools for investigating the dynamical evolution of materials. With the increase in sophistication of the simulation techniques and the ubiquity of massively parallel computing platforms, atomistic simulations now generate very large amounts of data, which have to be carefully analyzed in order to reveal key features of the underlying trajectories, including the nature and characteristics of the relevant reaction pathways. We show that clustering algorithms, such as the Perron Cluster Cluster Analysis, can provide reduced representations that greatly facilitate the interpretation of complex trajectories. To illustrate this point, clustering tools are used to identify the key kinetic steps in complex accelerated molecular dynamics trajectories exhibiting shape fluctuations in Pt nanoclusters. This analysis provides an easily interpretable coarse representation of the reaction pathways in terms of a handful of clusters, in contrast to the raw trajectory, which contains thousands of unique states and tens of thousands of transitions.
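The idea behind Perron cluster analysis can be illustrated with its simplest two-set form. Estimate a transition matrix from the sequence of visited states. Eigenvalues close to 1 then signal metastable sets, and the sign of the second eigenvector assigns each state to one of two sets. This is a minimal sketch of that sign-structure variant only. It is not the more robust PCCA+ used in practice, nor the authors' pipeline, and the function names are hypothetical.

```python
import numpy as np

def transition_matrix(trajectory, n_states):
    """Row-normalised counts of consecutive state pairs (unvisited rows give NaN)."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(trajectory[:-1], trajectory[1:]):
        counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def two_metastable_sets(P):
    """Sorted eigenvalues of P and a 0/1 label per state from the 2nd eigenvector's sign."""
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)                  # largest eigenvalue (= 1) first
    vals, vecs = vals.real[order], vecs.real[:, order]
    labels = (vecs[:, 1] > 0).astype(int)
    return vals, labels
```

For a four-state chain made of two weakly coupled pairs, the eigenvalues are 1, 0.98, 0.82 and 0.8. The gap after the second eigenvalue indicates two metastable sets, and the sign split recovers the pairs. This is the sense in which thousands of states collapse into a handful of clusters.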
NASA Astrophysics Data System (ADS)
Huang, Rao; Lo, Li-Ta; Wen, Yuhua; Voter, Arthur F.; Perez, Danny
2017-10-01
Pérez-Rodrigo, Carmen; Gil, Ángel; González-Gross, Marcela; Ortega, Rosa M.; Serra-Majem, Lluis; Varela-Moreiras, Gregorio; Aranceta-Bartrina, Javier
2015-01-01
Weight gain has been associated with behaviors related to diet, sedentary lifestyle, and physical activity. We investigated dietary patterns and possible meaningful clustering of physical activity, sedentary behavior, and sleep time in Spanish children and adolescents, and whether the identified clusters could be associated with overweight. Analysis was based on a subsample (n = 415) of the cross-sectional ANIBES study in Spain. We performed exploratory factor analysis and subsequent cluster analysis of dietary patterns, physical activity, sedentary behaviors, and sleep time. Logistic regression analysis was used to explore the association between the cluster solutions and overweight. Factor analysis identified four dietary patterns, one reflecting a profile closer to the traditional Mediterranean diet. Dietary patterns, physical activity behaviors, sedentary behaviors and sleep time on weekdays in Spanish children and adolescents clustered into two different groups: a low physical activity, poorer-diet lifestyle pattern, which included a higher proportion of girls, and a high physical activity, low sedentary behavior, longer sleep duration, healthier-diet lifestyle pattern. Although the increased risk of being overweight was not statistically significant, the prevalence ratios (PRs) for the low physical activity, poorer-diet lifestyle pattern were >1 in both children and adolescents. The healthier lifestyle pattern included lower proportions of children and adolescents from low socioeconomic status backgrounds. PMID:26729155
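A prevalence ratio compares the prevalence of an outcome (here, overweight) between two groups (here, the two lifestyle clusters). A PR above 1 whose confidence interval still includes 1 matches the "elevated but not significant" pattern reported above. This minimal sketch computes a crude PR from 2x2 counts with a log-scale Wald interval. The counts are hypothetical. The study's PRs may have been adjusted or derived from a regression model, which this sketch does not reproduce.

```python
import math

def prevalence_ratio(cases_1, n_1, cases_0, n_0, z=1.96):
    """Crude prevalence ratio (group 1 vs group 0) with a log-scale Wald 95% CI."""
    pr = (cases_1 / n_1) / (cases_0 / n_0)
    se = math.sqrt(1 / cases_1 - 1 / n_1 + 1 / cases_0 - 1 / n_0)  # SE of ln(PR)
    lower = math.exp(math.log(pr) - z * se)
    upper = math.exp(math.log(pr) + z * se)
    return pr, lower, upper
```

With hypothetical counts of 30/100 overweight in one cluster and 20/100 in the other, PR = 1.5. The interval (roughly 0.92 to 2.46) includes 1, so the excess would not be statistically significant.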
NASA Astrophysics Data System (ADS)
Yu, P.; Block, H. C.; Doiron, K.
2009-01-01
Conventional "wet" chemical analyses rely heavily on harsh chemicals and derivatization, which alter native seed structures, so these analyses cannot detect the original inherent structures within an intact tissue sample. A synchrotron is a giant particle accelerator that turns electrons into light (a million times brighter than sunlight), which can be used to study the structure of materials at the molecular level. Synchrotron radiation-based Fourier transform IR microspectroscopy (SR-FTIRM) has been developed as a rapid, direct, non-destructive bioanalytical technique. Taking advantage of the brightness of synchrotron light and its small effective source size, this technique can explore the molecular chemistry within the microstructures of a biological tissue at ultraspatial resolution within cellular dimensions, without destroying the inherent structures. This is in contrast to traditional "wet" chemical methods, which, during processing for analysis, often destroy the intrinsic structures of feeds. To date there has been very little application of this technique to the study of plant seed tissue in relation to nutrient utilization. The objective of this study was to use novel synchrotron radiation-based technology (SR-FTIRM) to identify differences in the molecular chemistry and conformation of carbohydrate and protein in various plant seed endosperms within intact tissues, at the cellular and subcellular level, from grains with different biodegradation kinetics. Barley grain (cv. Harrington) with a high rate (31.3%/h) and extent (78%), corn grain (cv. Pioneer) with a low rate (9.6%/h) and extent (57%), and wheat grain (cv. AC Barrie) with an intermediate rate (23%/h) and extent (72%) of ruminal DM degradation were selected for evaluation. SR-FTIRM evaluations were performed at the National Synchrotron Light Source at Brookhaven National Laboratory (Brookhaven, NY).
The molecular structure spectral analysis involved the fingerprint regions of ca. 1720-1485 cm⁻¹ (attributed to protein amide I C=O and C–N stretching; amide II N–H bending and C–N stretching), ca. 1650-950 cm⁻¹ (non-structural CHO starch in endosperms), and ca. 1185-800 cm⁻¹ (attributed to total CHO C–O stretching vibrations), together with agglomerative hierarchical cluster and principal component analyses. Analyses involving the protein amide I features consistently identified differences between all three grains. Other analyses involving carbohydrate features were able to differentiate between wheat and barley but failed to differentiate between wheat and corn. These results suggest that SR-FTIRM combined with multivariate analyses can be used to identify spectral features associated with the molecular structure of endosperm from grains with different biodegradation kinetics, especially in relation to protein structure. This novel synchrotron radiation-based bioanalytical technique provides a new approach for structural molecular studies of plant seeds at ultraspatial resolution, within intact tissue, in relation to nutrient availability.
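The workflow above (select a spectral window, then run PCA and agglomerative hierarchical cluster analysis on it) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: Ward linkage, two retained components, and the synthetic Gaussian "amide I" bands are all choices made for the demo; only the 1720-1485 cm⁻¹ window comes from the abstract.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def select_region(wavenumbers, spectra, lo, hi):
    """Keep only the spectral columns whose wavenumber lies in [lo, hi] cm^-1."""
    mask = (wavenumbers >= lo) & (wavenumbers <= hi)
    return spectra[:, mask]

def pca_scores(X, n_components=2):
    """PC scores and explained-variance ratios from an SVD of the mean-centred data."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    explained = S ** 2 / np.sum(S ** 2)
    return U[:, :n_components] * S[:n_components], explained[:n_components]

def ahca_labels(X, n_groups):
    """Agglomerative hierarchical clustering (Ward linkage, an assumption), cut into n_groups."""
    return fcluster(linkage(X, method="ward"), t=n_groups, criterion="maxclust")

# Hypothetical demo: 10 synthetic spectra per "grain", differing only in the
# position of one Gaussian band inside the amide I window, plus small noise.
rng = np.random.default_rng(0)
wn = np.linspace(1800, 800, 501)
centres = [1655, 1645, 1635]
spectra = np.vstack([np.exp(-((wn - c) / 20.0) ** 2) + 0.01 * rng.standard_normal(wn.size)
                     for c in centres for _ in range(10)])
amide = select_region(wn, spectra, 1485, 1720)
scores, explained = pca_scores(amide)
labels = ahca_labels(amide, 3)
```

Real spectra would normally be baseline-corrected and normalized before this step; the sketch skips that to keep the clustering logic visible.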
Distant Massive Clusters and Cosmology
NASA Technical Reports Server (NTRS)
Donahue, Megan
1999-01-01
We present a status report of our X-ray study and analysis of a complete sample of distant (z = 0.5-0.8), X-ray luminous clusters of galaxies. We have obtained ASCA and ROSAT observations of the five brightest Extended Medium Sensitivity Survey (EMSS) clusters with z > 0.5. We have constructed an observed temperature function for these clusters and measured iron abundances for all of them. We have developed an analytic expression for the behavior of the mass-temperature relation in a low-density universe. We use this mass-temperature relation together with a Press-Schechter-based model to derive the expected temperature function for different values of Omega-M. We combine this analysis with the observed temperature functions at redshifts from 0 to 0.8 to derive maximum likelihood estimates of Omega-M. We report preliminary results of this analysis.
McKenna, J.E.
2003-01-01
The biosphere is filled with complex living patterns, and important questions about biodiversity and about community and ecosystem ecology concern the structure and function of the multispecies systems responsible for those patterns. Cluster analysis identifies discrete groups within multivariate data and is an effective method of coping with these complexities, but it often suffers from subjective identification of groups. The bootstrap testing method greatly improves the objective determination of significance in cluster analysis. The BOOTCLUS program makes cluster analysis that reliably identifies real patterns within a data set more accessible and easier to use than previously available programs. A variety of analysis options and rapid re-analysis provide a means to quickly evaluate several aspects of a data set. Interpretation is influenced by sampling design and a priori designation of samples into replicate groups, and ultimately relies on the researcher's knowledge of the organisms and their environment. Nevertheless, the BOOTCLUS program provides reliable, objectively determined groupings of multivariate data.
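BOOTCLUS's own resampling scheme is not described in this abstract. The sketch below shows one generic way to bootstrap a cluster analysis, as an illustration only: observations are resampled with replacement, the distinct drawn observations are reclustered (average linkage, cut into k groups, both assumptions), and for every pair we record how often the two land in the same group when both were drawn. Pairs with values near 1 or 0 are stable; intermediate values flag groupings that depend on the particular sample.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def bootstrap_coassignment(X, k, n_boot=200, method="average", seed=0):
    """Fraction of bootstrap reclusterings in which each pair of samples lands in
    the same group (computed only over draws that contain both samples)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    together = np.zeros((n, n))
    sampled = np.zeros((n, n))
    for _ in range(n_boot):
        # resample rows with replacement; duplicates are dropped before reclustering
        idx = np.unique(rng.integers(0, n, size=n))
        labels = fcluster(linkage(X[idx], method=method), t=k, criterion="maxclust")
        block = np.ix_(idx, idx)
        together[block] += labels[:, None] == labels[None, :]
        sampled[block] += 1
    with np.errstate(invalid="ignore", divide="ignore"):
        return together / sampled   # NaN where a pair was never drawn together

# Hypothetical demo: two well-separated groups of 8 samples each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (8, 2)), rng.normal(5.0, 0.3, (8, 2))])
P = bootstrap_coassignment(X, k=2)
```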
Visualizing nD Point Clouds as Topological Landscape Profiles to Guide Local Data Analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Oesterling, Patrick; Heine, Christian; Weber, Gunther H.
2012-05-04
Analyzing high-dimensional point clouds is a classical challenge in visual analytics. Traditional techniques, such as projections or axis-based techniques, suffer from projection artifacts, occlusion, and visual complexity. We propose to split data analysis into two parts to address these shortcomings. First, a structural overview phase abstracts the data by its density distribution. This phase performs topological analysis to support accurate and non-overlapping presentation of the high-dimensional cluster structure as a topological landscape profile. Using a landscape metaphor, it presents clusters and their nesting as hills whose height, width, and shape reflect cluster coherence, size, and stability, respectively. A second, local analysis phase uses this global structural knowledge to select individual clusters or point sets for further, localized data analysis. Focusing on structural entities significantly reduces visual clutter in established geometric visualizations and permits a clearer, more thorough data analysis. In conclusion, this analysis complements the global topological perspective and enables the user to study subspaces or geometric properties, such as shape.
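The authors' topological landscape construction is not reproduced here. For intuition only, the sketch below computes the "hills" a density-based overview starts from: the 0-dimensional persistence of superlevel sets of a density estimate on a k-nearest-neighbour graph. The kNN density (inverse mean neighbour distance) and the value of k are assumptions. Points are swept from high to low density and merged with union-find; when two components meet, the one with the lower peak "dies", yielding a (peak density, merge density) pair whose difference measures how prominent that hill is.

```python
import numpy as np

def density_hills(X, k=5):
    """(peak_density, merge_density) pairs, most persistent first; components that
    never merge with a higher one are reported with merge_density 0.0."""
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    nbrs = np.argsort(D, axis=1)[:, 1:k + 1]                  # k nearest neighbours
    density = 1.0 / D[np.arange(n)[:, None], nbrs].mean(axis=1)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]                     # path halving
            i = parent[i]
        return i

    peak, seen, hills = {}, np.zeros(n, dtype=bool), []
    for v in np.argsort(-density):                            # high density first
        v = int(v)
        seen[v] = True
        peak[v] = density[v]
        # symmetric kNN graph: v's neighbours plus points that list v as a neighbour
        neighbours = set(nbrs[v].tolist()) | set(np.where(nbrs == v)[0].tolist())
        for u in neighbours:
            if not seen[u]:
                continue
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            young, old = (ru, rv) if peak[ru] < peak[rv] else (rv, ru)
            if peak[young] > density[v]:                      # skip zero-persistence merges
                hills.append((peak[young], density[v]))
            parent[young] = old
    for r in {find(i) for i in range(n)}:
        hills.append((peak[r], 0.0))
    return sorted(hills, key=lambda h: h[0] - h[1], reverse=True)

# Hypothetical demo: two far-apart blobs, which should appear as two surviving hills.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (40, 2)), rng.normal(20.0, 0.5, (40, 2))])
hills = density_hills(X, k=5)
```

A landscape profile would then draw each hill with height from its peak, width from the number of points in its component, and nesting from the merge order.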
Onda, Kyle; Crocker, Jonny; Kayser, Georgia Lyn; Bartram, Jamie
2013-01-01
The fields of global health and international development commonly cluster countries by geography and income to target resources and describe progress. For any given sector of interest, a range of relevant indicators can serve as a more appropriate basis for classification. We create a new typology of country clusters specific to the water and sanitation (WatSan) sector, based on similarities across multiple WatSan-related indicators. After a literature review and consultation with experts in the WatSan sector, nine indicators were selected. Indicators were chosen for their relevance to, and suggested influence on, national water and sanitation service delivery, and to maximize data availability across as many countries as possible. A hierarchical clustering method and a gap statistic analysis were used to group countries into a natural number of relevant clusters. Two stages of clustering resulted in five clusters, representing 156 countries, or 6.75 billion people. The five clusters were not well explained by income or geography and were distinct from existing country clusters used in international development. Analysis of these five clusters revealed that they were more compact and better separated than United Nations and World Bank country clusters. This analysis and the resulting country typology suggest that previous geography- or income-based country groupings can be improved upon for applications in the WatSan sector by using globally available WatSan-related indicators. Potential applications include guiding and discussing research, informing policy, improving resource targeting, describing sector progress, and identifying critical knowledge gaps in the WatSan sector. PMID:24054545
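Combining hierarchical clustering with the gap statistic, as above, can be sketched as follows. This follows the standard gap-statistic recipe (uniform reference data over the data's bounding box; choose the smallest k with Gap(k) ≥ Gap(k+1) − s(k+1)). The Ward linkage, the number of reference sets, and the synthetic data are assumptions, since the abstract does not specify them.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def within_dispersion(X, labels):
    """W_k: pooled within-cluster sum of squared distances to the cluster centroids."""
    return sum(np.sum((X[labels == c] - X[labels == c].mean(axis=0)) ** 2)
               for c in np.unique(labels))

def hierarchical_labels(X, k):
    """Ward-linkage hierarchical clustering cut into at most k clusters (assumed linkage)."""
    return fcluster(linkage(X, method="ward"), t=k, criterion="maxclust")

def gap_statistic(X, k_max=6, n_ref=20, seed=0):
    """Gap(k) = mean_b log W*_kb - log W_k, with uniform reference data drawn over the
    bounding box of X; picks the smallest k with Gap(k) >= Gap(k+1) - s_(k+1)."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    refs = [rng.uniform(lo, hi, size=X.shape) for _ in range(n_ref)]
    gaps, s = [], []
    for k in range(1, k_max + 1):
        log_w = np.log(within_dispersion(X, hierarchical_labels(X, k)))
        log_w_ref = [np.log(within_dispersion(R, hierarchical_labels(R, k))) for R in refs]
        gaps.append(np.mean(log_w_ref) - log_w)
        s.append(np.std(log_w_ref) * np.sqrt(1.0 + 1.0 / n_ref))
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - s[k]:
            return k, gaps
    return k_max, gaps

# Hypothetical demo: 60 "countries" scored on two standardized indicators, three true groups.
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [8.0, 0.0], [0.0, 8.0]])
X = np.vstack([c + rng.normal(0.0, 0.3, (20, 2)) for c in centres])
k_hat, gaps = gap_statistic(X, k_max=6)
```

With nine indicators on different scales, the columns would be standardized before clustering so that no single indicator dominates the distances.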
Wolf, Antje; Kirschner, Karl N
2013-02-01
With improvements in computer speed and algorithm efficiency, molecular dynamics (MD) simulations are sampling ever larger numbers of molecular and biomolecular conformations. Sorting these conformations qualitatively and quantitatively into meaningful groups is a difficult and important task, especially when considering the structure-activity paradigm. Here we present a study that combines two popular techniques, principal component (PC) analysis and clustering, for revealing major conformational changes that occur in MD simulations. Specifically, we explored how clustering different PC subspaces affects the resulting clusters compared with clustering the complete trajectory data. As a case example, we used the trajectory data from an explicitly solvated simulation of a bacterial L11·23S ribosomal subdomain, which is a target of thiopeptide antibiotics. Clustering was performed with the K-means and average-linkage algorithms on subspaces spanning the first two through the first five PC dimensions. For the average-linkage algorithm, we found that data-point membership, cluster shape, and cluster size depended on the selected PC subspace. In contrast, K-means provided very consistent results regardless of the selected subspace. Since we present results on a single model system, generalizing about the clustering of different PC subspaces of other molecular systems is currently premature. However, our hope is that this study illustrates a) the complexities in selecting the appropriate clustering algorithm, b) the complexities in interpreting and validating their results, and c) that combining PC analysis with subsequent clustering can yield valuable dynamic and conformational information.
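The comparison described above (cluster the first 2, 3, 4 and 5 PC dimensions with K-means and with average linkage, then check how much the partitions change) can be sketched as below. The synthetic "trajectory", the adjusted Rand index as the agreement measure, and the use of scikit-learn's `KMeans` are assumptions for illustration; the study's actual system and metrics are not reproduced.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def pc_subspace(frames, n_pc):
    """Project mean-centred frames (n_frames x n_coords) onto the first n_pc PCs."""
    Xc = frames - frames.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_pc].T

def cluster_pc_subspaces(frames, k, dims=(2, 3, 4, 5)):
    """Labels from K-means and from average linkage for each PC subspace size."""
    labels = {"kmeans": {}, "average": {}}
    for d in dims:
        Y = pc_subspace(frames, d)
        labels["kmeans"][d] = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(Y)
        labels["average"][d] = fcluster(linkage(Y, method="average"), t=k, criterion="maxclust")
    return labels

def agreement_with(labels_by_dim, ref_dim):
    """Adjusted Rand index of each subspace's partition against the ref_dim partition."""
    ref = labels_by_dim[ref_dim]
    return {d: adjusted_rand_score(ref, lab) for d, lab in labels_by_dim.items()}

# Hypothetical demo: 150 frames of 30 coordinates from three well-separated "states".
rng = np.random.default_rng(0)
means = np.zeros((3, 30))
means[1, 0], means[2, 1] = 12.0, 12.0
frames = np.vstack([m + rng.standard_normal((50, 30)) for m in means])
truth = np.repeat([0, 1, 2], 50)
labels = cluster_pc_subspaces(frames, k=3)
```

On cleanly separated states both methods agree across subspaces; the subspace dependence reported for average linkage shows up only on real, less separable conformational data.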
Nursing home care quality: a cluster analysis.
Grøndahl, Vigdis Abrahamsen; Fagerli, Liv Berit
2017-02-13
Purpose The purpose of this paper is to explore potential differences in how nursing home residents rate care quality and to explore cluster characteristics. Design/methodology/approach A cross-sectional design was used, with one questionnaire including questions from Quality from Patients' Perspective and the Big Five personality traits, together with questions on socio-demographic aspects and health condition. Residents (n = 103) from four Norwegian nursing homes participated (74.1 per cent response rate). Hierarchical cluster analysis identified clusters with respect to care quality perceptions. χ² tests and one-way between-groups ANOVA were performed to characterise the clusters (p < 0.05). Findings Two clusters were identified: Cluster 1 residents (28.2 per cent) had the best care quality perceptions and Cluster 2 residents (67.0 per cent) had the worst. The clusters were statistically significant and were characterised by personal-related conditions (gender, psychological well-being, preferences, admission, satisfaction with staying in the nursing home, emotional stability and agreeableness) and by external objective care conditions (healthcare personnel and registered nurses). Research limitations/implications Only residents assessed as having no cognitive impairments were included, thus excluding the largest group. By choosing a questionnaire design and structured interviews, the number of residents able to participate may increase. Practical implications The findings may provide healthcare personnel and managers with increased knowledge on which to develop strategies to improve specific care quality perceptions. Originality/value Cluster analysis can be an effective tool for differentiating between nursing home residents' care quality perceptions.
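Characterising clusters with χ² tests (categorical variables) and one-way ANOVA (continuous variables), as described above, can be sketched with SciPy. The variable names (`gender`, `wellbeing`, `age`) and all values below are hypothetical and serve only to show the mechanics.

```python
import numpy as np
from scipy.stats import chi2_contingency, f_oneway

def characterise_clusters(labels, categorical, continuous, alpha=0.05):
    """Chi-square tests for categorical variables and one-way ANOVA for continuous ones,
    each comparing the clusters in `labels`. Returns {variable: (test, statistic, p, significant)}."""
    labels = np.asarray(labels)
    groups = np.unique(labels)
    out = {}
    for name, values in categorical.items():
        values = np.asarray(values)
        # contingency table: rows = clusters, columns = categories
        table = np.array([[np.sum((labels == g) & (values == v)) for v in np.unique(values)]
                          for g in groups])
        chi2, p, _, _ = chi2_contingency(table)
        out[name] = ("chi2", chi2, p, p < alpha)
    for name, values in continuous.items():
        values = np.asarray(values, dtype=float)
        stat, p = f_oneway(*(values[labels == g] for g in groups))
        out[name] = ("anova", stat, p, p < alpha)
    return out

# Hypothetical demo data for two clusters of 20 residents each.
labels = np.array([1] * 20 + [2] * 20)
gender = np.array(["F"] * 18 + ["M"] * 2 + ["F"] * 4 + ["M"] * 16)
wellbeing = np.concatenate([np.linspace(7, 9, 20), np.linspace(3, 5, 20)])
age = np.concatenate([np.linspace(70, 90, 20), np.linspace(70, 90, 20)])
res = characterise_clusters(labels, {"gender": gender}, {"wellbeing": wellbeing, "age": age})
```

For 2×2 tables, `chi2_contingency` applies Yates' continuity correction by default; with small expected counts, Fisher's exact test may be the better choice.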
Structural evolution in the crystallization of rapid cooling silver melt
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tian, Z.A., E-mail: ze.tian@gmail.com; Laboratory for Simulation and Modelling of Particulate Systems School of Materials Science and Engineering, University of New South Wales, Sydney, NSW 2052; Dong, K.J.
2015-03-15
The structural evolution during rapid cooling of a silver melt has been investigated at different scales using several analysis methods. The results confirm Ostwald's rule of stages and Frank's conjecture about the icosahedron, with many specific details. In particular, cluster-scale analysis by a recently developed method called LSCA (the Largest Standard Cluster Analysis) clarified the complex structural evolution that occurs during crystallization: different kinds of local clusters (ico-like (ico is the abbreviation of icosahedron), ico-bcc-like (bcc, body-centred cubic), bcc, and bcc-like structures) in turn reach their maximum numbers as the temperature decreases. Over a rather wide temperature range, the icosahedral short-range order (ISRO) shows a saturated stage (in which the number of ico-like structures remains stable) that breeds metastable bcc clusters. As the precursors of crystallization, the bcc clusters finally decrease after reaching their maximum number, so that the final solid is a mixture composed mainly of fcc/hcp (face-centred cubic and hexagonal close-packed) clusters and, to a lesser degree, bcc clusters. This detailed geometric picture of liquid metal crystallization is believed to be useful for improving the fundamental understanding of the liquid–solid phase transition. - Highlights: • A comprehensive structural analysis is conducted focusing on crystallization. • More than 90% of the atoms in all samples concerned are involved in our analysis. • A series of distinct intermediate states is found in the crystallization of the silver melt. • A novel icosahedron-saturated state breeds the metastable bcc state.
Analysis of correlated mutations in HIV-1 protease using spectral clustering.
Liu, Ying; Eyal, Eran; Bahar, Ivet
2008-05-15
The ability of human immunodeficiency virus-1 (HIV-1) protease to develop mutations that confer multi-drug resistance (MDR) has been a major obstacle in designing rational therapies against HIV. Resistance is usually imparted by a cooperative mechanism that can be elucidated by a covariance analysis of sequence data. Identification of such correlated amino acid substitutions may be obscured by evolutionary noise. HIV-1 protease sequences from patients subjected to different specific treatments (set 1) and from untreated patients (set 2) were subjected to sequence covariance analysis by evaluating the mutual information (MI) between all residue pairs. Spectral clustering of the resulting covariance matrices disclosed two distinctive clusters of correlated residues: the first, observed in set 1 but absent in set 2, contained residues involved in MDR acquisition; the second included residues that differ among the various HIV-1 protease subtypes and is referred to for short as the phylogenetic cluster. The MDR cluster occupies sites close to the central symmetry axis of the enzyme, which overlap with the global hinge region identified from coarse-grained normal-mode analysis of the enzyme structure. The phylogenetic cluster, on the other hand, occupies solvent-exposed and highly mobile regions. This study demonstrates (i) the possibility of distinguishing between correlated substitutions resulting from neutral mutations and those induced by MDR, given appropriate clustering analysis of sequence covariance data, and (ii) a connection between global dynamics and functional substitution of amino acids.
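The two steps named above, pairwise mutual information between alignment columns followed by spectral clustering of the resulting matrix, can be sketched as follows. This is an illustrative sketch only: the plug-in MI estimator (no correction for finite sampling or phylogeny), the use of scikit-learn's `SpectralClustering` with a precomputed affinity, and the toy six-column alignment are assumptions, not the authors' procedure.

```python
import numpy as np
from collections import Counter
from sklearn.cluster import SpectralClustering

def mutual_information(col_a, col_b):
    """Plug-in MI (nats) between two alignment columns from empirical frequencies."""
    n = len(col_a)
    pa, pb, pab = Counter(col_a), Counter(col_b), Counter(zip(col_a, col_b))
    # p_ab * log(p_ab / (p_a p_b)) with counts: c/n * log(c n / (n_a n_b))
    return sum((c / n) * np.log(c * n / (pa[a] * pb[b])) for (a, b), c in pab.items())

def mi_matrix(alignment):
    """alignment: equal-length sequences; returns an L x L MI matrix with zero diagonal."""
    cols = list(zip(*alignment))
    L = len(cols)
    M = np.zeros((L, L))
    for i in range(L):
        for j in range(i + 1, L):
            M[i, j] = M[j, i] = max(mutual_information(cols[i], cols[j]), 0.0)
    return M

def covariance_clusters(alignment, n_clusters=2):
    """Spectral clustering of residues using the MI matrix as the affinity."""
    model = SpectralClustering(n_clusters=n_clusters, affinity="precomputed", random_state=0)
    return model.fit_predict(mi_matrix(alignment))

# Hypothetical toy alignment: residues 0-2 co-vary with one site, residues 3-5 with another.
rng = np.random.default_rng(0)
x = rng.choice(list("AV"), 60)
y = rng.choice(list("LM"), 60)
alignment = [a * 3 + b * 3 for a, b in zip(x, y)]
labels = covariance_clusters(alignment)
```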
NASA Astrophysics Data System (ADS)
Zhang, Rui; Jiang, Shuai; Liu, Yi-Rong; Wen, Hui; Feng, Ya-Juan; Huang, Teng; Huang, Wei
2018-05-01
Despite the very important role of atmospheric aerosol nucleation in climate change and air quality, the detailed aerosol nucleation mechanism is still unclear. Here we investigated formic acid (FA)-involved multicomponent nucleation molecular clusters, including sulfuric acid (SA), dimethylamine (DMA), and water (W), using a quantum chemical method. The thermodynamic and kinetic analyses were based on the global minima found by the Basin-Hopping (BH) algorithm coupled with Density Functional Theory (DFT) and subsequent benchmark calculations. An interaction analysis based on ElectroStatic Potential (ESP), topological, and atomic charge analyses was then performed to characterize the binding features of the clusters. The results show that FA binds weakly with the other molecules in the cluster, while W binds even more weakly. Further kinetic analysis of the time evolution of the clusters shows that, despite formic acid's weak interaction with other nucleation precursors, its effect on the steady-state concentration of the sulfuric acid dimer cannot be neglected because of its high concentration in the atmosphere.
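The Basin-Hopping global search mentioned above can be illustrated with SciPy's `basinhopping`. In the sketch below, a Lennard-Jones potential stands in for the DFT energy (an assumption made purely to keep the example cheap and self-contained). For five atoms, the known LJ global minimum (a trigonal bipyramid) has energy −9.103852 in reduced units.

```python
import numpy as np
from scipy.optimize import basinhopping

def lj_energy_and_grad(flat):
    """Lennard-Jones energy (epsilon = sigma = 1) and gradient for an N-atom cluster.
    Used here only as a cheap stand-in for a DFT energy."""
    pos = flat.reshape(-1, 3)
    diff = pos[:, None, :] - pos[None, :, :]         # r_i - r_j for every pair
    r2 = np.sum(diff ** 2, axis=-1)
    np.fill_diagonal(r2, np.inf)                      # drop self-interaction terms
    inv6 = r2 ** -3                                   # r^-6
    energy = 0.5 * np.sum(4.0 * (inv6 ** 2 - inv6))   # each pair appears twice
    coeff = (-48.0 * inv6 ** 2 + 24.0 * inv6) / r2    # (dE/dr) / r
    grad = np.sum(coeff[:, :, None] * diff, axis=1)
    return energy, grad.ravel()

def lowest_minimum(start, niter=100):
    """Basin hopping: random perturbation + local L-BFGS-B minimisation + Metropolis test."""
    result = basinhopping(lj_energy_and_grad, start.ravel(), niter=niter, stepsize=0.4,
                          minimizer_kwargs={"method": "L-BFGS-B", "jac": True})
    return result.fun, result.x.reshape(-1, 3)

# Hypothetical starting geometry for a 5-atom cluster.
start = np.array([[0.0, 0.0, 0.0], [1.1, 0.0, 0.0], [0.0, 1.1, 0.0],
                  [0.0, 0.0, 1.1], [1.1, 1.1, 0.0]])
energy, coords = lowest_minimum(start)
```

In a BH-DFT workflow, the local minimiser would call an electronic-structure code instead, and the lowest structures would then be re-optimised at a higher level of theory.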
Atlas-Guided Cluster Analysis of Large Tractography Datasets
Ros, Christian; Güllmar, Daniel; Stenzel, Martin; Mentzel, Hans-Joachim; Reichenbach, Jürgen Rainer
2013-01-01
Diffusion Tensor Imaging (DTI) and fiber tractography are important tools to map the cerebral white matter microstructure in vivo and to model the underlying axonal pathways in the brain with three-dimensional fiber tracts. As the fast and consistent extraction of anatomically correct fiber bundles for multiple datasets is still challenging, we present a novel atlas-guided clustering framework for exploratory data analysis of large tractography datasets. The framework uses a hierarchical cluster analysis approach that exploits the inherent redundancy in large datasets to group fiber tracts efficiently. Structural information from a white matter atlas can be incorporated into the clustering to achieve an anatomically correct and reproducible grouping of fiber tracts. This approach not only facilitates the identification of the bundles corresponding to the classes of the atlas but also enables the extraction of bundles that are not present in the atlas. The new technique was applied to cluster datasets of 46 healthy subjects. The prospects for automatic, anatomically correct, and reproducible clustering are explored. Reconstructed clusters were well separated and showed good correspondence to anatomical bundles. Using the atlas-guided cluster approach, we observed consistent results across subjects with high reproducibility. To investigate the outlier elimination performance of the clustering algorithm, scenarios with varying amounts of noise were simulated and clustered with three different outlier elimination strategies. By exploiting the multithreading capabilities of modern multiprocessor systems in combination with novel algorithms, our toolkit clusters large datasets in a couple of minutes. Experiments were conducted to investigate the achievable speedup and to demonstrate the high performance of the clustering framework in a multiprocessing environment. PMID:24386292
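The core of any hierarchical fiber-tract clustering is a distance between tracts that ignores the arbitrary direction in which each tract was traced. The sketch below uses the minimum average direct-flip (MDF) distance popularised by QuickBundles, with average linkage cut at a distance threshold. Both the distance and the linkage are assumptions for illustration, not the framework's documented choices, and the atlas guidance is not modelled.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def resample(streamline, n_points=12):
    """Resample a polyline (m x 3) to n_points equally spaced along its arc length."""
    seg = np.linalg.norm(np.diff(streamline, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])
    t = np.linspace(0.0, s[-1], n_points)
    return np.column_stack([np.interp(t, s, streamline[:, k]) for k in range(3)])

def mdf(a, b):
    """Minimum average direct-flip distance between two equally resampled tracts."""
    direct = np.linalg.norm(a - b, axis=1).mean()
    flipped = np.linalg.norm(a - b[::-1], axis=1).mean()
    return min(direct, flipped)

def cluster_tracts(tracts, threshold, n_points=12):
    """Average-linkage clustering of tracts, cut where the MDF distance exceeds threshold."""
    R = [resample(np.asarray(t, dtype=float), n_points) for t in tracts]
    n = len(R)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = mdf(R[i], R[j])
    Z = linkage(squareform(D), method="average")
    return fcluster(Z, t=threshold, criterion="distance")

# Hypothetical demo: two straight bundles; one tract is stored in reverse order.
xs = np.linspace(0, 10, 30)
bundle_a = [np.column_stack([xs, np.full(30, dy), np.zeros(30)]) for dy in (0.0, 0.3, 0.6)]
bundle_b = [np.column_stack([np.full(30, 20.0 + dx), xs, np.zeros(30)]) for dx in (0.0, 0.3)]
bundle_b.append(bundle_b[0][::-1])
labels = cluster_tracts(bundle_a + bundle_b, threshold=2.0)
```

The full pairwise matrix is quadratic in the number of tracts, which is why large-dataset frameworks exploit redundancy and parallelism rather than clustering every tract directly.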
Stynes, Siobhán; Konstantinou, Kika; Ogollah, Reuben; Hay, Elaine M; Dunn, Kate M
2018-04-01
Traditionally, low back-related leg pain (LBLP) is diagnosed clinically as referred leg pain or sciatica (nerve root involvement). However, we hypothesised that within the spectrum of LBLP there may be other, unrecognised patient subgroups. This study aimed to identify clusters of patients with LBLP using latent class analysis and to describe their clinical course. The study population was 609 LBLP primary care consulters. Variables from clinical assessment were included in the latent class analysis. Characteristics of the statistically identified clusters were compared, and their clinical course over 1 year was described. A 5-cluster solution was optimal. Cluster 1 (n = 104) had mild leg pain severity and was considered to represent a referred leg pain group, with no clinical signs suggesting nerve root involvement (sciatica). Cluster 2 (n = 122), cluster 3 (n = 188), and cluster 4 (n = 69) had mild, moderate, and severe pain and disability, respectively, and their responses to clinical assessment items suggested categories of mild, moderate, and severe sciatica. Cluster 5 (n = 126) had high pain and disability, longer pain duration, and more comorbidities, and was difficult to map to a clinical diagnosis. Most improvement in pain and disability was seen in the first 4 months for all clusters. At 12 months, the proportion of patients reporting recovery ranged from 27% for cluster 5 to 45% for cluster 2 (mild sciatica). This is the first study to show empirically the variability in profile and clinical course of patients with LBLP, including sciatica. More homogeneous groups were identified, which could be considered in future clinical and research settings.
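Latent class analysis, as used above, fits a finite mixture in which items are assumed independent within each class. A minimal EM sketch for binary items is shown below. The binary coding of clinical items, the use of BIC to pick the number of classes, and the simulated data are all assumptions, since the abstract does not state how the 5-class solution was judged optimal.

```python
import numpy as np

def latent_class_em(Y, n_classes, n_iter=200, seed=0, eps=1e-9):
    """EM for a latent class model with binary items (local independence within classes).
    Returns class weights, item-endorsement probabilities, posteriors, log-likelihood."""
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    weights = np.full(n_classes, 1.0 / n_classes)
    theta = rng.uniform(0.25, 0.75, size=(n_classes, m))   # P(item = 1 | class)
    for _ in range(n_iter):
        # E-step: log P(y_i, class c), then normalise to posterior memberships
        log_joint = (np.log(weights + eps)
                     + Y @ np.log(theta + eps).T
                     + (1 - Y) @ np.log(1 - theta + eps).T)
        log_lik_i = np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
        post = np.exp(log_joint - log_lik_i)
        # M-step: class sizes and item probabilities from expected memberships
        nk = np.maximum(post.sum(axis=0), eps)
        weights = nk / n
        theta = (post.T @ Y) / nk[:, None]
    return weights, theta, post, float(log_lik_i.sum())

def bic(log_lik, n_classes, n_items, n_obs):
    """BIC = -2 logL + p log n, with p = (C - 1) class weights + C * items probabilities."""
    n_params = (n_classes - 1) + n_classes * n_items
    return -2.0 * log_lik + n_params * np.log(n_obs)

# Hypothetical demo: 300 patients, 6 binary assessment items, two true classes.
rng = np.random.default_rng(42)
in_first = rng.random(300) < 0.6
p_item = np.where(in_first[:, None], [0.9, 0.9, 0.9, 0.1, 0.1, 0.1], [0.1, 0.1, 0.1, 0.9, 0.9, 0.9])
Y = (rng.random((300, 6)) < p_item).astype(float)
bics = {k: bic(latent_class_em(Y, k)[3], k, 6, 300) for k in range(1, 5)}
best_k = min(bics, key=bics.get)
```

In practice EM is restarted from several seeds per class count, because it can converge to local optima.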
NASA Astrophysics Data System (ADS)
Hasan, Noor Haliza; Abdullah, M. T.
2008-01-01
The aim of the study is to use cluster analysis of morphometric parameters within the genus Kerivoula to produce a dendrogram and to determine the suitability of this method for describing the relationships among species within this genus. A total of 15 adult male individuals of the genus Kerivoula, from sampling trips around Borneo and from specimens kept at the zoological museum of Universiti Malaysia Sarawak, were examined. A total of 27 characters based on dental, skull, and external body measurements were recorded. Cluster analysis illustrated the grouping and morphometric relationships between the species of this genus. It clearly separated each species from the others despite overlapping measurements in some species within the genus. Cluster analysis provides an alternative approach for making a preliminary identification of a species.
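A morphometric dendrogram of this kind is commonly built by standardizing each character, computing Euclidean distances, and applying UPGMA (average linkage). The sketch below follows that common recipe. The abstract does not state the authors' distance or linkage, so these are assumptions, and the specimen IDs and measurements are invented. The cophenetic correlation reports how faithfully the tree preserves the original distances.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet, dendrogram
from scipy.spatial.distance import pdist

def upgma_dendrogram(measurements, specimen_ids):
    """Standardise each character, compute Euclidean distances, build a UPGMA tree.
    Returns the linkage matrix, cophenetic correlation, and dendrogram leaf order."""
    X = np.asarray(measurements, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # characters on a common scale
    d = pdist(Z, metric="euclidean")
    tree = linkage(d, method="average")                # UPGMA = average linkage
    coph_r, _ = cophenet(tree, d)                      # fit of tree to the distances
    leaves = dendrogram(tree, labels=list(specimen_ids), no_plot=True)["ivl"]
    return tree, coph_r, leaves

# Hypothetical specimens (two per species) with four invented measurements (mm).
ids = ["sp1_a", "sp1_b", "sp2_a", "sp2_b", "sp3_a", "sp3_b"]
measurements = [[30.0, 12.0, 5.0, 40.0], [30.5, 12.2, 5.1, 40.5],
                [35.0, 14.0, 6.0, 45.0], [35.4, 14.1, 6.1, 45.3],
                [40.0, 13.0, 7.0, 50.0], [40.2, 13.3, 7.0, 50.4]]
tree, coph_r, leaves = upgma_dendrogram(measurements, ids)
```

A character with zero variance across specimens would cause a division by zero during standardization, so such characters should be dropped first.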
Fong, Allan; Clark, Lindsey; Cheng, Tianyi; Franklin, Ella; Fernandez, Nicole; Ratwani, Raj; Parker, Sarah Henrickson
2017-07-01
The objective of this paper is to identify attribute patterns of influential individuals in intensive care units using unsupervised cluster analysis. Although the culture of an organisation is acknowledged to be critical to improving patient safety, specific methods to shift culture have not been explicitly identified. A social network analysis survey was conducted, and an unsupervised cluster analysis was used. A total of 100 surveys were gathered. Unsupervised cluster analysis was used to group individuals with similar dimensions, highlighting three general types of influencers: well-rounded, knowledge and relational. Culture is created locally by individual influencers. Cluster analysis is an effective way to identify common characteristics among members of an intensive care unit team whom their peers note as highly influential. To change culture, identifying and then integrating the influencers in intervention development and dissemination may create more sustainable and effective culture change. Additional studies are ongoing to test the effectiveness of utilising these influencers to disseminate patient safety interventions. This study offers an approach that can be helpful in both identifying and understanding influential team members and may be an important aspect of developing methods to change organisational culture. © 2017 John Wiley & Sons Ltd.
X-ray morphological study of galaxy cluster catalogues
NASA Astrophysics Data System (ADS)
Democles, Jessica; Pierre, Marguerite; Arnaud, Monique
2016-07-01
Context: The intra-cluster medium distribution, as probed by X-ray morphology-based analysis, gives a good indication of the system's dynamical state. In the race to determine precise scaling relations and understand their scatter, the dynamical state offers valuable information. Method: We develop the centroid-shift analysis so that it can be applied to characterize galaxy cluster surveys such as the XXL survey or high-redshift cluster samples. We use it together with the surface brightness concentration parameter and the offset between the X-ray peak and the brightest cluster galaxy, in the context of the XXL bright cluster sample (Pacaud et al. 2015) and a set of high-redshift massive clusters detected by Planck and SPT and observed by both the XMM-Newton and Chandra observatories. Results: Using the wide redshift coverage of the XXL sample, we see no trend in the dynamical state of the systems with redshift.
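A common form of the centroid-shift parameter is the standard deviation of the offsets between the X-ray peak and the emission-weighted centroid measured in a series of circular apertures, normalized by the largest aperture radius. The sketch below implements that form on a pixel image. The ten apertures from 0.1 to 1.0 of the radius, the absence of core masking, and the synthetic images are assumptions; the variant developed for XXL is not reproduced.

```python
import numpy as np

def centroid_shift(image, peak, r_ap, fractions=np.linspace(0.1, 1.0, 10)):
    """w = std(peak-to-centroid offsets over apertures of radius f * r_ap) / r_ap.
    `peak` is (x, y) in pixel coordinates."""
    yy, xx = np.indices(image.shape)
    r = np.hypot(xx - peak[0], yy - peak[1])
    offsets = []
    for f in fractions:
        inside = r <= f * r_ap
        w = image[inside]
        cx = np.sum(w * xx[inside]) / np.sum(w)      # brightness-weighted centroid
        cy = np.sum(w * yy[inside]) / np.sum(w)
        offsets.append(np.hypot(cx - peak[0], cy - peak[1]))
    return np.std(offsets, ddof=1) / r_ap

# Hypothetical images: a symmetric beta-model-like profile, then the same plus a subclump.
yy, xx = np.indices((201, 201))
r = np.hypot(xx - 100, yy - 100)
relaxed = (1 + (r / 10.0) ** 2) ** -1.5
clump = 0.5 * np.exp(-((xx - 130) ** 2 + (yy - 100) ** 2) / (2 * 5.0 ** 2))
w_relaxed = centroid_shift(relaxed, (100, 100), 80)
w_disturbed = centroid_shift(relaxed + clump, (100, 100), 80)
```

On real survey data, point sources are masked and the photon noise is propagated (for example, by resampling the image) before comparing w across clusters.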
ERIC Educational Resources Information Center
Miyamoto, S.; Nakayama, K.
1983-01-01
A method of two-stage clustering of literature based on citation frequency is applied to 5,065 articles from 57 journals in environmental and civil engineering. Results of related methods of citation analysis (hierarchical graph, clustering of journals, multidimensional scaling) applied to the same set of articles are compared. Ten references are…
ERIC Educational Resources Information Center
Xu, Beijie; Recker, Mimi; Qi, Xiaojun; Flann, Nicholas; Ye, Lei
2013-01-01
This article examines clustering as an educational data mining method. In particular, two clustering algorithms, the widely used K-means and the model-based Latent Class Analysis, are compared, using usage data from an educational digital library service, the Instructional Architect (IA.usu.edu). Using a multi-faceted approach and multiple data…
ERIC Educational Resources Information Center
Firdausiah Mansur, Andi Besse; Yusof, Norazah
2013-01-01
Clustering of social learning networks has not yet been widely explored, especially when the network focuses on an e-learning system. Conventional methods are not well suited to e-learning data. SNA requires content analysis, which involves human intervention and needs to be carried out manually. Some of the previous clustering techniques need…
Bayesian network meta-analysis for cluster randomized trials with binary outcomes.
Uhlmann, Lorenz; Jensen, Katrin; Kieser, Meinhard
2017-06-01
Network meta-analysis is becoming a common approach to combine direct and indirect comparisons of several treatment arms. In recent research, there have been various developments and extensions of the standard methodology. Simultaneously, cluster randomized trials are experiencing an increased popularity, especially in the field of health services research, where, for example, medical practices are the units of randomization but the outcome is measured at the patient level. Combination of the results of cluster randomized trials is challenging. In this tutorial, we examine and compare different approaches for the incorporation of cluster randomized trials in a (network) meta-analysis. Furthermore, we provide practical insight on the implementation of the models. In simulation studies, it is shown that some of the examined approaches lead to unsatisfying results. However, there are alternatives which are suitable to combine cluster randomized trials in a network meta-analysis as they are unbiased and reach accurate coverage rates. In conclusion, the methodology can be extended in such a way that an adequate inclusion of the results obtained in cluster randomized trials becomes feasible. Copyright © 2016 John Wiley & Sons, Ltd.
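The abstract does not list the approaches compared. One widely used approximate adjustment for including a cluster randomized trial in a meta-analysis divides both the event counts and the sample sizes by the design effect DE = 1 + (m − 1)·ICC, where m is the mean cluster size and ICC is the intracluster correlation. The sketch below shows that adjustment for a log odds ratio, as an illustration only; the trial numbers are hypothetical. The point estimate is unchanged, while the Woolf variance is inflated by exactly DE.

```python
import math

def design_effect(mean_cluster_size, icc):
    """DE = 1 + (m - 1) * ICC."""
    return 1.0 + (mean_cluster_size - 1.0) * icc

def log_odds_ratio(e1, n1, e0, n0):
    """Log odds ratio and its Woolf variance from (possibly non-integer) counts."""
    a, b, c, d = e1, n1 - e1, e0, n0 - e0
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d

def cluster_adjusted_log_or(e1, n1, e0, n0, mean_cluster_size, icc):
    """Shrink every cell by the design effect, i.e. use the 'effective' sample size."""
    de = design_effect(mean_cluster_size, icc)
    return log_odds_ratio(e1 / de, n1 / de, e0 / de, n0 / de)

# Hypothetical trial: 30/200 events vs 45/200 events, clusters of ~20 patients, ICC 0.05.
raw = log_odds_ratio(30, 200, 45, 200)
adj = cluster_adjusted_log_or(30, 200, 45, 200, mean_cluster_size=20, icc=0.05)
```

The adjusted (estimate, variance) pairs can then enter a standard pairwise or network meta-analysis model alongside individually randomized trials.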
Zhang, Jiang; Liu, Qi; Chen, Huafu; Yuan, Zhen; Huang, Jin; Deng, Lihua; Lu, Fengmei; Zhang, Junpeng; Wang, Yuqing; Wang, Mingwen; Chen, Liangyin
2015-01-01
Clustering analysis methods have been widely applied to identify the functional brain networks of a multitask paradigm. However, previously used clustering analysis techniques are computationally expensive and thus impractical for clinical applications. In this study, a novel method called SOM-SAPC, which combines self-organizing mapping (SOM) and supervised affinity propagation clustering (SAPC), is proposed and implemented to identify the motor execution (ME) and motor imagery (MI) networks. In SOM-SAPC, SOM is first performed to process the fMRI data, and SAPC is then used to cluster the patterns of functional networks. As a result, SOM-SAPC significantly reduces the computational cost of brain network analysis. Simulation and clinical tests involving ME and MI were conducted with SOM-SAPC, and the results indicated that functional brain networks were clearly identified, with different response patterns and reduced computational cost. In particular, three activation clusters were clearly revealed, which include parts of the visual, ME, and MI functional networks. These findings validate SOM-SAPC as an effective and robust method for analyzing multitask fMRI data.
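The two-stage idea above, compressing many voxel time courses into a small set of SOM prototypes and then clustering only the prototypes, is where the computational saving comes from. The sketch below shows that structure with a minimal online SOM and scikit-learn's standard (unsupervised) `AffinityPropagation` standing in for the supervised SAPC variant. The grid size, learning schedule, and synthetic "voxels" are assumptions.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

def train_som(X, grid=(6, 6), n_iter=2000, lr0=0.5, sigma0=2.0, seed=0):
    """Minimal online self-organizing map; returns one prototype per grid node."""
    rng = np.random.default_rng(seed)
    gy, gx = np.indices(grid)
    nodes = np.column_stack([gy.ravel(), gx.ravel()]).astype(float)
    W = X[rng.integers(0, len(X), size=len(nodes))].astype(float)   # init from data
    for t in range(n_iter):
        x = X[rng.integers(0, len(X))]
        bmu = np.argmin(np.sum((W - x) ** 2, axis=1))               # best-matching unit
        frac = t / n_iter
        lr, sigma = lr0 * (1.0 - frac), sigma0 * (1.0 - frac) + 0.5  # linear decay
        h = np.exp(-np.sum((nodes - nodes[bmu]) ** 2, axis=1) / (2.0 * sigma ** 2))
        W += lr * h[:, None] * (x - W)
    return W

def som_then_ap(X, **som_kwargs):
    """Cluster SOM prototypes with affinity propagation, then label each sample by its BMU."""
    W = train_som(X, **som_kwargs)
    ap = AffinityPropagation(damping=0.7, max_iter=1000, random_state=0).fit(W)
    bmu = np.argmin(np.sum((X[:, None, :] - W[None, :, :]) ** 2, axis=2), axis=1)
    return ap.labels_[bmu], W

# Hypothetical demo: 300 "voxel" time courses (40 scans) following three response patterns.
rng = np.random.default_rng(0)
t = np.arange(40)
patterns = np.array([3.0 * ((t // 5) % 2), 3.0 * ((t // 10) % 2), 3.0 * np.sin(2 * np.pi * t / 20)])
truth = np.repeat(np.arange(3), 100)
X = patterns[truth] + 0.3 * rng.standard_normal((300, 40))
labels, prototypes = som_then_ap(X)
```

Affinity propagation chooses the number of clusters itself (through its `preference` parameter), so a pattern may be split across more than one cluster. The sketch only guarantees that clusters do not mix clearly different patterns.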
Water quality analysis of the Rapur area, Andhra Pradesh, South India using multivariate techniques
NASA Astrophysics Data System (ADS)
Nagaraju, A.; Sreedhar, Y.; Thejaswi, A.; Sayadi, Mohammad Hossein
2017-10-01
Groundwater samples were collected from different sites in the Rapur area to evaluate the major ion chemistry. The large amount of data can make the integration, interpretation, and representation of the results difficult. Two multivariate statistical methods, hierarchical cluster analysis (HCA) and factor analysis (FA), were applied to evaluate their usefulness in classifying and identifying the geochemical processes controlling groundwater geochemistry. Four statistically significant clusters were obtained from 30 sampling stations. The analysis also resulted in two important clusters, viz., cluster 1 (pH, Si, CO3, Mg, SO4, Ca, K, HCO3, alkalinity, Na, Na + K, Cl, and hardness) and cluster 2 (EC and TDS), which are released into the study area from different sources. Multivariate statistical techniques such as principal component analysis (PCA) assist in interpreting complex data matrices for a better understanding of the water quality of a study area. The PCA shows that the first factor (factor 1), which accounted for 36.2% of the total variance, had high positive loadings on EC, Mg, Cl, TDS, and hardness. Based on the PCA scores, four significant cluster groups of sampling locations were detected on the basis of the similarity of their water quality.
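Statements such as "factor 1 accounted for 36.2% of the total variance and had high positive loadings on EC, Mg, Cl, TDS, and hardness" come from a PCA on standardized variables: the eigenvalues of the correlation matrix give the explained-variance shares, and eigenvector × √eigenvalue gives each variable's loading (its correlation with the component). The sketch below shows those steps. The four variables and the simulated data are hypothetical, and no factor rotation is applied.

```python
import numpy as np

def standardised_pca(X, names):
    """PCA of the correlation matrix: explained-variance ratios, factor-1 loadings, scores."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1]                  # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # eigenvector signs are arbitrary: make each component's largest-|weight| entry positive
    cols = np.arange(eigvecs.shape[1])
    eigvecs = eigvecs * np.sign(eigvecs[np.abs(eigvecs).argmax(axis=0), cols])
    explained = eigvals / eigvals.sum()
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    return explained, dict(zip(names, loadings[:, 0])), Z @ eigvecs

# Hypothetical data: a shared "salinity" driver behind EC, TDS and Cl; pH unrelated.
rng = np.random.default_rng(0)
n = 300
salinity = rng.standard_normal(n)
X = np.column_stack([salinity + 0.1 * rng.standard_normal(n),   # EC
                     salinity + 0.1 * rng.standard_normal(n),   # TDS
                     salinity + 0.2 * rng.standard_normal(n),   # Cl
                     rng.standard_normal(n)])                   # pH
explained, factor1, scores = standardised_pca(X, ["EC", "TDS", "Cl", "pH"])
```

The component scores returned here are what a subsequent clustering of sampling locations, as in the final sentence above, would operate on.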
Oberle, Michael; Wohlwend, Nadia; Jonas, Daniel; Maurer, Florian P; Jost, Geraldine; Tschudin-Sutter, Sarah; Vranckx, Katleen; Egli, Adrian
2016-01-01
The technical, biological, and inter-center reproducibility of matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF MS) typing data has not yet been explored. The aim of this study is to compare typing data from multiple centers, using bioinformatics, for bacterial strains from two past outbreaks and for non-related strains. Participants received twelve extended-spectrum beta-lactamase-producing E. coli isolates and followed the same standard operating procedure (SOP), including a full-protein extraction protocol. All laboratories provided visually read spectra via flexAnalysis (Bruker, Germany). The raw data from each laboratory allowed the technical and biological reproducibility between centers to be calculated using BioNumerics (Applied Maths NV, Belgium). Technical and biological reproducibility ranged from 96.8 to 99.4% and from 47.6 to 94.4%, respectively. The inter-center reproducibility showed comparable clustering among identical isolates. Principal component analysis indicated a greater tendency for spectra to cluster within the same center. Therefore, we used a discriminant analysis, which completely separated the clusters. Next, we defined a reference center and performed a statistical analysis to find specific peaks that identify the outbreak clusters. Finally, we used a classifier algorithm, a linear support vector machine, on the selected peaks. A validation showed that within the set of the reference center, the identification of the cluster was 100% correct, with a large contrast between the score for the correct cluster and that of the next-best-scoring cluster. Given the sufficient technical and biological reproducibility of MALDI-TOF MS spectra, detection of specific clusters is possible from spectra obtained at different centers. However, we believe that a shared SOP and a bioinformatics approach are required to make the analysis robust and reliable.
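The final classification step above (encode each spectrum by the presence of selected peaks, train a linear support vector machine, and compare the best score with the runner-up) can be sketched with scikit-learn's `LinearSVC`. The m/z values, the ±5 Da matching tolerance, the binary peak encoding, and the cluster profiles are all hypothetical; the BioNumerics workflow used in the study is not reproduced.

```python
import numpy as np
from sklearn.svm import LinearSVC

def peak_matrix(peak_lists, reference_mz, tol=5.0):
    """Binary matrix: 1 if a spectrum has a peak within tol (Da) of each reference m/z."""
    ref = np.asarray(reference_mz, dtype=float)
    X = np.zeros((len(peak_lists), len(ref)))
    for i, peaks in enumerate(peak_lists):
        peaks = np.asarray(peaks, dtype=float)
        if peaks.size:
            X[i] = (np.abs(peaks[:, None] - ref[None, :]) <= tol).any(axis=0)
    return X

def train_and_score(X_train, y_train, X_test):
    """Fit a linear SVM; return predictions and the gap between best and second-best score."""
    clf = LinearSVC(C=1.0, max_iter=10000).fit(X_train, y_train)
    scores = clf.decision_function(X_test)        # shape (n_test, n_classes) for >2 classes
    ranked = np.sort(scores, axis=1)[:, ::-1]
    return clf.classes_[np.argmax(scores, axis=1)], ranked[:, 0] - ranked[:, 1]

# Hypothetical marker peaks and three outbreak-cluster profiles.
markers = [4365.0, 5381.0, 6255.0, 7158.0, 9535.0]
profiles = {"A": [4365.0, 7158.0], "B": [5381.0, 7158.0], "C": [6255.0, 7158.0, 9535.0]}
train_peaks, train_y = [], []
for name, peaks in profiles.items():
    for shift in (-1.0, 0.0, 1.0):                # small calibration jitter between runs
        train_peaks.append([p + shift for p in peaks])
        train_y.append(name)
X_train = peak_matrix(train_peaks, markers)
test_peaks = [[p + 2.0 for p in profiles[name]] for name in ("C", "A", "B")]
pred, margin = train_and_score(X_train, np.array(train_y), peak_matrix(test_peaks, markers))
```

The margin between the top two decision scores plays the role of the "contrast" described in the abstract: a small margin flags spectra that the classifier cannot assign confidently.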