Science.gov

Sample records for agglomerative cluster analysis

  1. Evaluation of hierarchical agglomerative cluster analysis methods for discrimination of primary biological aerosol

    NASA Astrophysics Data System (ADS)

    Crawford, I.; Ruske, S.; Topping, D. O.; Gallagher, M. W.

    2015-11-01

    In this paper we present improved methods for discriminating and quantifying primary biological aerosol particles (PBAPs) by applying hierarchical agglomerative cluster analysis to multi-parameter ultraviolet-light-induced fluorescence (UV-LIF) spectrometer data. The methods employed in this study can be applied to data sets in excess of 1 × 10^6 points on a desktop computer, allowing for each fluorescent particle in a data set to be explicitly clustered. This reduces the potential for misattribution found in subsampling and comparative attribution methods used in previous approaches, improving our capacity to discriminate and quantify PBAP meta-classes. We evaluate the performance of several hierarchical agglomerative cluster analysis linkages and data normalisation methods using laboratory samples of known particle types and an ambient data set. Fluorescent and non-fluorescent polystyrene latex spheres were sampled with a Wideband Integrated Bioaerosol Spectrometer (WIBS-4) where the optical size, asymmetry factor and fluorescent measurements were used as inputs to the analysis package. It was found that the Ward linkage with z-score or range normalisation performed best, correctly attributing 98 and 98.1 % of the data points respectively. The best-performing methods were applied to the BEACHON-RoMBAS (Bio-hydro-atmosphere interactions of Energy, Aerosols, Carbon, H2O, Organics and Nitrogen-Rocky Mountain Biogenic Aerosol Study) ambient data set, where it was found that the z-score and range normalisation methods yield similar results, with each method producing clusters representative of fungal spores and bacterial aerosol, consistent with previous results. The z-score result was compared to clusters generated with previous approaches (WIBS AnalysiS Program, WASP) where we observe that the subsampling and comparative attribution method employed by WASP results in the overestimation of the fungal spore concentration by a factor of 1.5 and the underestimation of
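
    The two normalisation schemes named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' analysis package: the function names are hypothetical, z-score is taken in its usual sense (subtract the mean, divide by the standard deviation), and "range normalisation" is assumed here to mean min-max scaling onto [0, 1], which the paper may define differently.

```python
from statistics import mean, pstdev


def z_score(values):
    """Centre on the mean and scale by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def range_normalise(values):
    """Assumed definition: min-max scaling, (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical optical sizes for four particles (illustrative values only).
sizes = [1.0, 2.0, 3.0, 4.0]
z = z_score(sizes)
r = range_normalise(sizes)
```

    Normalising each input (size, asymmetry factor, fluorescence channels) before clustering keeps a single wide-ranging variable from dominating the distance calculation.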

  2. Evaluation of hierarchical agglomerative cluster analysis methods for discrimination of primary biological aerosol

    NASA Astrophysics Data System (ADS)

    Crawford, I.; Ruske, S.; Topping, D. O.; Gallagher, M. W.

    2015-07-01

    In this paper we present improved methods for discriminating and quantifying Primary Biological Aerosol Particles (PBAP) by applying hierarchical agglomerative cluster analysis to multi-parameter ultraviolet-light-induced fluorescence (UV-LIF) spectrometer data. The methods employed in this study can be applied to data sets in excess of 1 × 10^6 points on a desktop computer, allowing for each fluorescent particle in a dataset to be explicitly clustered. This reduces the potential for misattribution found in subsampling and comparative attribution methods used in previous approaches, improving our capacity to discriminate and quantify PBAP meta-classes. We evaluate the performance of several hierarchical agglomerative cluster analysis linkages and data normalisation methods using laboratory samples of known particle types and an ambient dataset. Fluorescent and non-fluorescent polystyrene latex spheres were sampled with a Wideband Integrated Bioaerosol Spectrometer (WIBS-4) where the optical size, asymmetry factor and fluorescent measurements were used as inputs to the analysis package. It was found that the Ward linkage with z-score or range normalisation performed best, correctly attributing 98 and 98.1 % of the data points respectively. The best performing methods were applied to the BEACHON-RoMBAS ambient dataset where it was found that the z-score and range normalisation methods yield similar results with each method producing clusters representative of fungal spores and bacterial aerosol, consistent with previous results. The z-score result was compared to clusters generated with previous approaches (WIBS AnalysiS Program, WASP) where we observe that the subsampling and comparative attribution method employed by WASP results in the overestimation of the fungal spore concentration by a factor of 1.5 and the underestimation of bacterial aerosol concentration by a factor of 5. We suggest that this is likely due to errors arising from misattribution due to poor

  3. Hesitant fuzzy agglomerative hierarchical clustering algorithms

    NASA Astrophysics Data System (ADS)

    Zhang, Xiaolu; Xu, Zeshui

    2015-02-01

    Recently, hesitant fuzzy sets (HFSs) have been studied by many researchers as a powerful tool to describe and deal with uncertain data, but relatively few studies focus on the clustering analysis of HFSs. In this paper, we propose a novel hesitant fuzzy agglomerative hierarchical clustering algorithm for HFSs. The algorithm considers each of the given HFSs as a unique cluster in the first stage, and then compares each pair of the HFSs by utilising the weighted Hamming distance or the weighted Euclidean distance. The two clusters with the smallest distance are joined. The procedure is then repeated until the desired number of clusters is achieved. Moreover, we extend the algorithm to cluster the interval-valued hesitant fuzzy sets, and finally illustrate the effectiveness of our clustering algorithms by experimental results.
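
    The merge loop described in this abstract can be sketched in a few lines. This is a simplified illustration and not the authors' algorithm: each item is an ordinary numeric vector rather than a hesitant fuzzy set, clusters are compared through their centroids (an assumption, since the abstract does not say how merged clusters are compared), and the names `weighted_euclidean`, `centroid` and `agglomerate` are hypothetical.

```python
import math


def weighted_euclidean(a, b, weights):
    """Weighted Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def centroid(cluster):
    """Component-wise mean of the vectors in a cluster."""
    n = len(cluster)
    return [sum(column) / n for column in zip(*cluster)]


def agglomerate(points, weights, n_clusters):
    """Start with one cluster per point; merge the closest pair until n_clusters remain."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = weighted_euclidean(
                    centroid(clusters[i]), centroid(clusters[j]), weights
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # join the two closest clusters
        del clusters[j]
    return clusters


# Illustrative data: two tight groups plus one point in between.
points = [[0.1, 0.2], [0.15, 0.25], [0.8, 0.9], [0.85, 0.95], [0.5, 0.5]]
result = agglomerate(points, weights=[0.5, 0.5], n_clusters=2)
```

    The exhaustive pair search recomputes every centroid distance on each pass, which is fine for a handful of items but slow for large data sets; production implementations cache or update the distance matrix instead.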

  4. Biodiversity Assessment Using Hierarchical Agglomerative Clustering and Spectral Unmixing over Hyperspectral Images

    PubMed Central

    Medina, Ollantay; Manian, Vidya; Chinea, J. Danilo

    2013-01-01

    Hyperspectral images represent an important source of information to assess ecosystem biodiversity. In particular, plant species richness is a primary indicator of biodiversity. This paper uses spectral variance to predict vegetation richness, known as Spectral Variation Hypothesis. Hierarchical agglomerative clustering is our primary tool to retrieve clusters whose Shannon entropy should reflect species richness on a given zone. However, in a high spectral mixing scenario, an additional unmixing step, just before entropy computation, is required; cluster centroids are enough for the unmixing process. Entropies computed using the proposed method correlate well with the ones calculated directly from synthetic and field data. PMID:24132230
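
    As a rough illustration of the entropy step mentioned in this abstract, the Shannon entropy of a set of cluster assignments can be computed as below. This is a sketch under stated assumptions: the function name and the per-pixel label lists are hypothetical, the natural logarithm is used, and the unmixing step the paper applies before computing entropy is not reproduced.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """H = -sum(p_i * ln(p_i)) over the proportion p_i of items in each cluster."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Hypothetical per-pixel cluster labels for three zones.
uniform = shannon_entropy(["a", "b", "c", "d"])   # four equally common clusters
skewed = shannon_entropy(["a", "a", "a", "b"])    # one dominant cluster
single = shannon_entropy(["a", "a", "a"])         # only one cluster present
```

    A zone whose pixels spread evenly over many clusters scores higher than a zone dominated by one cluster, which is the property that lets entropy stand in for richness.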

  5. Agglomerative clustering-based approach for two-dimensional phase unwrapping.

    PubMed

    Herráez, Miguel Arevalillo; Boticario, Jesús G; Lalor, Michael J; Burton, David R

    2005-03-01

    We describe a novel algorithm for two-dimensional phase unwrapping. The technique combines the principles of agglomerative clustering and use of heuristics to construct a discontinuous quality-guided path. Unlike other quality-guided algorithms, which establish the path at the start of the unwrapping process, our technique constructs the path as the unwrapping process evolves. This makes the technique less prone to error propagation, although it presents higher execution times than other existing algorithms. The algorithm reacts satisfactorily to random noise and breaks in the phase distribution. A variation of the algorithm is also presented that considerably reduces the execution time without affecting the results significantly. PMID:15765690

  6. Agglomerative clustering-based approach for two-dimensional phase unwrapping.

    PubMed

    Herráez, Miguel Arevalillo; Boticario, Jesús G; Lalor, Michael J; Burton, David R

    2005-03-01

    We describe a novel algorithm for two-dimensional phase unwrapping. The technique combines the principles of agglomerative clustering and use of heuristics to construct a discontinuous quality-guided path. Unlike other quality-guided algorithms, which establish the path at the start of the unwrapping process, our technique constructs the path as the unwrapping process evolves. This makes the technique less prone to error propagation, although it presents higher execution times than other existing algorithms. The algorithm reacts satisfactorily to random noise and breaks in the phase distribution. A variation of the algorithm is also presented that considerably reduces the execution time without affecting the results significantly.

  7. Classifying airborne radiometry data with Agglomerative Hierarchical Clustering: A tool for geological mapping in context of rainforest (French Guiana)

    NASA Astrophysics Data System (ADS)

    Martelet, G.; Truffert, C.; Tourlière, B.; Ledru, P.; Perrin, J.

    2006-09-01

    In highly weathered environments, it is crucial that geological maps provide information concerning both the regolith and the bedrock, for societal needs, such as land-use, mineral or water resources management. Often, geologists face the challenge of upgrading existing maps, as relevant information concerning weathering processes and pedogenesis is currently missing. In rugged areas in particular, where access to the field is difficult, ground observations are sparsely available, and therefore need to be complemented using methods based on remotely sensed data. For this purpose, we discuss the use of Agglomerative Hierarchical Clustering (AHC) on eU, K and eTh airborne gamma-ray spectrometry grids. The AHC process primarily allows the geophysical maps to be segmented into zones having coherent U, K and Th contents. The analysis of these contents is discussed in terms of geochemical signature for lithological attribution of classes, as well as the use of a dendrogram, which gives indications on the hierarchical relations between classes. Unsupervised classification maps resulting from AHC can be considered as spatial models of the distribution of the radioelement content in surface and sub-surface formations. The source of gamma rays emanating from the ground is primarily related to the geochemistry of the bedrock and secondarily to modifications of the radioelement distribution by weathering and other secondary mechanisms, such as mobilisation by wind or water. The interpretation of the obtained predictive classified maps, their U, K, Th contents, and the dendrogram, in light of available geological knowledge, makes it possible to separate signatures related to regolith and solid geology. Consequently, classification maps can be integrated within a GIS environment and used by the geologist as a support for mapping bedrock lithologies and their alteration. We illustrate the AHC classification method in the region of Cayenne using high-resolution airborne radiometric data

  8. Combining analytical hierarchy process and agglomerative hierarchical clustering in search of expert consensus in green corridors development management.

    PubMed

    Shapira, Aviad; Shoshany, Maxim; Nir-Goldenberg, Sigal

    2013-07-01

    Environmental management and planning are instrumental in resolving conflicts arising between societal needs for economic development on the one hand and for open green landscapes on the other hand. Allocating green corridors between fragmented core green areas may provide a partial solution to these conflicts. Decisions regarding green corridor development require the assessment of alternative allocations based on multiple criteria evaluations. Analytical Hierarchy Process provides a methodology for both a structured and consistent extraction of such evaluations and for the search for consensus among experts regarding weights assigned to the different criteria. Implementing this methodology using 15 Israeli experts (landscape architects, regional planners, and geographers) revealed inherent differences in expert opinions in this field beyond professional divisions. The use of Agglomerative Hierarchical Clustering made it possible to identify clusters representing common decisions regarding criterion weights. Aggregating the evaluations of these clusters revealed an important dichotomy between a pragmatist approach that emphasizes the weight of statutory criteria and an ecological approach that emphasizes the role of the natural conditions in allocating green landscape corridors. PMID:23674241

  9. Combining Analytical Hierarchy Process and Agglomerative Hierarchical Clustering in Search of Expert Consensus in Green Corridors Development Management

    NASA Astrophysics Data System (ADS)

    Shapira, Aviad; Shoshany, Maxim; Nir-Goldenberg, Sigal

    2013-07-01

    Environmental management and planning are instrumental in resolving conflicts arising between societal needs for economic development on the one hand and for open green landscapes on the other hand. Allocating green corridors between fragmented core green areas may provide a partial solution to these conflicts. Decisions regarding green corridor development require the assessment of alternative allocations based on multiple criteria evaluations. Analytical Hierarchy Process provides a methodology for both a structured and consistent extraction of such evaluations and for the search for consensus among experts regarding weights assigned to the different criteria. Implementing this methodology using 15 Israeli experts (landscape architects, regional planners, and geographers) revealed inherent differences in expert opinions in this field beyond professional divisions. The use of Agglomerative Hierarchical Clustering made it possible to identify clusters representing common decisions regarding criterion weights. Aggregating the evaluations of these clusters revealed an important dichotomy between a pragmatist approach that emphasizes the weight of statutory criteria and an ecological approach that emphasizes the role of the natural conditions in allocating green landscape corridors.

  10. Agglomerative percolation on the Bethe lattice and the triangular cactus

    NASA Astrophysics Data System (ADS)

    Chae, Huiseung; Yook, Soon-Hyung; Kim, Yup

    2013-08-01

    Agglomerative percolation (AP) on the Bethe lattice and the triangular cactus is studied to establish the exact mean-field theory for AP. Using the self-consistent simulation method based on the exact self-consistent equations, the order parameter P_∞ and the average cluster size S are measured. From the measured P_∞ and S, the critical exponents β_k and γ_k for k = 2 and 3 are evaluated. Here, β_k and γ_k are the critical exponents for P_∞ and S when the growth of clusters spontaneously breaks the Z_k symmetry of the k-partite graph. The obtained values are β_2 = 1.79(3), γ_2 = 0.88(1), β_3 = 1.35(5) and γ_3 = 0.94(2). By comparing these exponents with those for ordinary percolation (β_∞ = 1 and γ_∞ = 1), we also find β_∞ < β_3 < β_2 and γ_∞ > γ_3 > γ_2. These results quantitatively verify the conjecture that the AP model belongs to a new universality class if the Z_k symmetry is broken spontaneously, and the new universality class depends on k.

  11. [Cluster analysis in biomedical researches].

    PubMed

    Akopov, A S; Moskovtsev, A A; Dolenko, S A; Savina, G D

    2013-01-01

    Cluster analysis is one of the most popular methods for the analysis of multi-parameter data. Cluster analysis reveals the internal structure of the data by grouping separate observations according to their degree of similarity. The review provides a definition of the basic concepts of cluster analysis, and discusses the most popular clustering algorithms: k-means, hierarchical algorithms, and Kohonen network algorithms. Examples of the use of these algorithms in biomedical research are given. PMID:24640781

  12. Mining a Web Citation Database for Author Co-Citation Analysis.

    ERIC Educational Resources Information Center

    He, Yulan; Hui, Siu Cheung

    2002-01-01

    Proposes a mining process to automate author co-citation analysis based on the Web Citation Database, a data warehouse for storing citation indices of Web publications. Describes the use of agglomerative hierarchical clustering for author clustering and multidimensional scaling for displaying author cluster maps, and explains PubSearch, a…

  13. Cluster Analysis and Web-Based 3-D Visualization of Large-scale Geophysical Data

    NASA Astrophysics Data System (ADS)

    Kadlec, B. J.; Yuen, D. A.; Bollig, E. F.; Dzwinel, W.; da Silva, C. R.

    2004-05-01

    We present a problem-solving environment WEB-IS (Web-based Data Interrogative System), which we have developed for remote analysis and visualization of geophysical data [Garbow et al., 2003]. WEB-IS employs agglomerative clustering methods intended for feature extraction and studying the predictions of large magnitude earthquake events. Data-mining is accomplished using a mutual nearest neighbor (MNN) algorithm for extracting event clusters of different density and shapes based on a hierarchical proximity measure. Clustering schemes used in molecular dynamics [Da Silva et al., 2002] are also considered for increasing computational efficiency using a linked cell algorithm for creating a Verlet neighbor list (VNL) and extracting different cluster structures by applying a canonical backtracking search on the VNL. Space and time correlations between the events are visualized dynamically in 3-D through a filter by showing clusters at different timescales according to defined units of time ranging from days to years. This WEB-IS functionality was tested both on synthetic [Eneva and Ben-Zion, 1997] and actual earthquake catalogs of Japanese earthquakes and can be applied to the soft-computing data mining methods used in hydrology and geoinformatics. Da Silva, C.R.S., Justo, J.F., Fazzio, A., Phys. Rev. B, 65, 2002. Eneva, M., Ben-Zion, Y., J. Geophys. Res., 102, 17785-17795, 1997. Garbow, Z.A., Yuen, D.A., Erlebacher, G., Bollig, E.F., Kadlec, B.J., Vis. Geosci., 2003.

  14. Detecting Corresponding Vertex Pairs between Planar Tessellation Datasets with Agglomerative Hierarchical Cell-Set Matching

    PubMed Central

    Huh, Yong; Yu, Kiyun; Park, Woojin

    2016-01-01

    This paper proposes a method to detect corresponding vertex pairs between planar tessellation datasets. Applying an agglomerative hierarchical co-clustering, the method finds geometrically corresponding cell-set pairs from which corresponding vertex pairs are detected. Then, the map transformation is performed with the vertex pairs. Since these pairs are independently detected for each corresponding cell-set pair, the method presents improved matching performance regardless of locally uneven positional discrepancies between datasets. The proposed method was applied to complicated synthetic cell datasets assumed as a cadastral map and a topographical map, and showed an improved result with an F-measure of 0.84 compared to a previous matching method with an F-measure of 0.48. PMID:27348229

  15. Detecting Corresponding Vertex Pairs between Planar Tessellation Datasets with Agglomerative Hierarchical Cell-Set Matching.

    PubMed

    Huh, Yong; Yu, Kiyun; Park, Woojin

    2016-01-01

    This paper proposes a method to detect corresponding vertex pairs between planar tessellation datasets. Applying an agglomerative hierarchical co-clustering, the method finds geometrically corresponding cell-set pairs from which corresponding vertex pairs are detected. Then, the map transformation is performed with the vertex pairs. Since these pairs are independently detected for each corresponding cell-set pair, the method presents improved matching performance regardless of locally uneven positional discrepancies between datasets. The proposed method was applied to complicated synthetic cell datasets assumed as a cadastral map and a topographical map, and showed an improved result with an F-measure of 0.84 compared to a previous matching method with an F-measure of 0.48. PMID:27348229

  16. Spatiotemporal antibiotic resistance pattern monitoring using geographical information system based hierarchical cluster analysis.

    PubMed

    Hewapathirana, Roshan; Wijayarathna, Gamini

    2010-01-01

    Bacterial antimicrobial resistance in both the medical and agricultural fields has become a serious problem worldwide. Antibiotic resistant strains of bacteria are an increasing threat to human health, with resistance mechanisms having been described to all known antimicrobials currently available for clinical use. Monitoring the geotemporal variations of antibiotic resistance patterns is a crucial factor in planning successful therapeutic guidelines that prevent further emergence of antibiotic resistance. This study is based on the retrospective spatiotemporal analysis of laboratory results of Antibiotic Sensitivity Tests, time stamped with the date and time at which the microbiological specimen was dispatched to the laboratory. The geographic location of the isolated bacterial colony is specified with the latitude and the longitude of the patient's location. Agglomerative Hierarchical Clustering was performed on antimicrobial resistance findings based on the geographic locations, generating a series of Heatmaps to visualize the extent of the resistance pattern. Sequential Hierarchical cluster analysis proved to be effective in visualizing antibiotic resistance using Heatmaps that demonstrate the temporal variations of the antibiotic resistance patterns.

  17. Method for preventing plugging in the pyrolysis of agglomerative coals

    DOEpatents

    Green, Norman W.

    1979-01-23

    To prevent plugging in a pyrolysis operation where an agglomerative coal in a nondeleteriously reactive carrier gas is injected as a turbulent jet from an opening into an elongate pyrolysis reactor, the coal is comminuted to a size where the particles under operating conditions will detackify prior to contact with internal reactor surfaces while a secondary flow of fluid is introduced along the peripheral inner surface of the reactor to prevent backflow of the coal particles. The pyrolysis operation is depicted by two equations which enable preselection of conditions which insure prevention of reactor plugging.

  18. SUPERMODEL ANALYSIS OF GALAXY CLUSTERS

    SciTech Connect

    Fusco-Femiano, R.; Cavaliere, A.; Lapi, A.

    2009-11-01

    We present the analysis of the X-ray brightness and temperature profiles for six clusters belonging to both the Cool Core (CC) and Non Cool Core (NCC) classes, in terms of the Supermodel (SM) developed by Cavaliere et al. Based on the gravitational wells set by the dark matter (DM) halos, the SM straightforwardly expresses the equilibrium of the intracluster plasma (ICP) modulated by the entropy deposited at the boundary by standing shocks from gravitational accretion, and injected at the center by outgoing blast waves from mergers or from outbursts of active galactic nuclei. The cluster set analyzed here highlights not only how simply the SM represents the main dichotomy CC versus NCC clusters in terms of a few ICP parameters governing the radial entropy run, but also how accurately it fits even complex brightness and temperature profiles. For CC clusters like A2199 and A2597, the SM with a low level of central entropy straightforwardly yields the characteristic peaked profile of the temperature marked by a decline toward the center, without requiring currently strong radiative cooling and high mass deposition rates. NCC clusters like A1656 require instead a central entropy floor of a substantial level, and some like A2256 and even more A644 feature structured temperature profiles that also call for a definite floor extension; in such conditions the SM accurately fits the observations, and suggests that in these clusters the ICP has been just remolded by a merger event, in the way of a remnant cool core. The SM also predicts that DM halos with high concentration should correlate with flatter entropy profiles and steeper brightness in the outskirts; this is indeed the case with A1689, for which from X-rays we find concentration values c ≈ 10, the hallmark of an early halo formation. Thus, we show the SM to constitute a fast tool not only to provide wide libraries of accurate fits to X-ray temperature and density profiles, but also to retrieve from the ICP

  19. The SMART CLUSTER METHOD - adaptive earthquake cluster analysis and declustering

    NASA Astrophysics Data System (ADS)

    Schaefer, Andreas; Daniell, James; Wenzel, Friedemann

    2016-04-01

    Earthquake declustering is an essential part of almost any statistical analysis of spatial and temporal properties of seismic activity, with typical applications including probabilistic seismic hazard assessments (PSHAs) and earthquake prediction methods. The nature of earthquake clusters and the subsequent declustering of earthquake catalogues play a crucial role in determining the magnitude-dependent earthquake return period and its respective spatial variation. Other researchers have developed various methods to address this issue. These vary in complexity, from rather simple statistical window methods to complex epidemic models. This study introduces the smart cluster method (SCM), a new methodology to identify earthquake clusters, which uses an adaptive point process for spatio-temporal identification. Here, an adaptive search algorithm for data point clusters is adopted. It uses the earthquake density in the spatio-temporal neighbourhood of each event to adjust the search properties. The identified clusters are subsequently analysed to determine directional anisotropy, focussing on a strong correlation along the rupture plane, and the search space is adjusted with respect to directional properties. In the case of rapid subsequent ruptures like the 1992 Landers sequence or the 2010/2011 Darfield-Christchurch events, an adaptive classification procedure is applied to disassemble subsequent ruptures which may have been grouped into an individual cluster using near-field searches, support vector machines and temporal splitting. The steering parameters of the search behaviour are linked to local earthquake properties like magnitude of completeness, earthquake density and Gutenberg-Richter parameters. The method is capable of identifying and classifying earthquake clusters in space and time. It is tested and validated using earthquake data from California and New Zealand. As a result of the cluster identification process, each event in

  20. Multi-viewpoint clustering analysis

    NASA Technical Reports Server (NTRS)

    Mehrotra, Mala; Wild, Chris

    1993-01-01

    In this paper, we address the feasibility of partitioning rule-based systems into a number of meaningful units to enhance the comprehensibility, maintainability and reliability of expert systems software. Preliminary results have shown that no single structuring principle or abstraction hierarchy is sufficient to understand complex knowledge bases. We therefore propose the Multi View Point - Clustering Analysis (MVP-CA) methodology to provide multiple views of the same expert system. We present the results of using this approach to partition a deployed knowledge-based system that navigates the Space Shuttle's entry. We also discuss the impact of this approach on verification and validation of knowledge-based systems.

  1. Patterns of comorbidity in community-dwelling older people hospitalised for fall-related injury: A cluster analysis

    PubMed Central

    2011-01-01

    Background: Community-dwelling older people aged 65+ years sustain falls frequently; these can result in physical injuries necessitating medical attention including emergency department care and hospitalisation. Certain health conditions and impairments have been shown to contribute independently to the risk of falling or experiencing a fall injury, suggesting that individuals with these conditions or impairments should be the focus of falls prevention. Since older people commonly have multiple conditions/impairments, knowledge about which conditions/impairments coexist in at-risk individuals would be valuable in the implementation of a targeted prevention approach. The objective of this study was therefore to examine the prevalence and patterns of comorbidity in this population group. Methods: We analysed hospitalisation data from Victoria, Australia's second most populous state, to estimate the prevalence of comorbidity in patients hospitalised at least once between 2005-6 and 2007-8 for treatment of acute fall-related injuries. In patients with two or more comorbid conditions (multicomorbidity) we used an agglomerative hierarchical clustering method to cluster comorbidity variables and identify constellations of conditions. Results: More than one in four patients had at least one comorbid condition and among patients with comorbidity one in three had multicomorbidity (range 2-7). The prevalence of comorbidity varied by gender, age group, ethnicity and injury type; it was also associated with a significant increase in the average cumulative length of stay per patient. The cluster analysis identified five distinct, biologically plausible clusters of comorbidity: cardiopulmonary/metabolic, neurological, sensory, stroke and cancer. The cardiopulmonary/metabolic cluster was the largest cluster among the clusters identified. Conclusions: The consequences of comorbidity clustering in terms of falls and/or injury outcomes of hospitalised patients should be investigated by

  2. Cluster analysis of multiple planetary flow regimes

    NASA Technical Reports Server (NTRS)

    Mo, Kingtse; Ghil, Michael

    1987-01-01

    A modified cluster analysis method was developed to identify spatial patterns of planetary flow regimes, and to study transitions between them. This method was applied first to a simple deterministic model and second to Northern Hemisphere (NH) 500 mb data. The dynamical model is governed by the fully-nonlinear, equivalent-barotropic vorticity equation on the sphere. Clusters of points in the model's phase space are associated with either a few persistent or with many transient events. Two stationary clusters have patterns similar to unstable stationary model solutions, zonal, or blocked. Transient clusters of wave trains serve as way stations between the stationary ones. For the NH data, cluster analysis was performed in the subspace of the first seven empirical orthogonal functions (EOFs). Stationary clusters are found in the low-frequency band of more than 10 days, and transient clusters in the bandpass frequency window between 2.5 and 6 days. In the low-frequency band three pairs of clusters determine, respectively, EOFs 1, 2, and 3. They exhibit well-known regional features, such as blocking, the Pacific/North American (PNA) pattern and wave trains. Both model and low-pass data show strong bimodality. Clusters in the bandpass window show wave-train patterns in the two jet exit regions. They are related, as in the model, to transitions between stationary clusters.

  3. Data Clustering

    NASA Astrophysics Data System (ADS)

    Wagstaff, Kiri L.

    2012-03-01

    particular application involves considerations of the kind of data being analyzed, algorithm runtime efficiency, and how much prior knowledge is available about the problem domain, which can dictate the nature of clusters sought. Fundamentally, the clustering method and its representations of clusters carry with them a definition of what a cluster is, and it is important that this be aligned with the analysis goals for the problem at hand. In this chapter, I emphasize this point by identifying for each algorithm the cluster representation as a model, m_j, even for algorithms that are not typically thought of as creating a "model." This chapter surveys a basic collection of clustering methods useful to any practitioner who is interested in applying clustering to a new data set. The algorithms include k-means (Section 25.2), EM (Section 25.3), agglomerative (Section 25.4), and spectral (Section 25.5) clustering, with side mentions of variants such as kernel k-means and divisive clustering. The chapter also discusses each algorithm's strengths and limitations and provides pointers to additional in-depth reading for each subject. Section 25.6 discusses methods for incorporating domain knowledge into the clustering process. This chapter concludes with a brief survey of interesting applications of clustering methods to astronomy data (Section 25.7). The chapter begins with k-means because it is both generally accessible and so widely used that understanding it can be considered a necessary prerequisite for further work in the field. EM can be viewed as a more sophisticated version of k-means that uses a generative model for each cluster and probabilistic item assignments. Agglomerative clustering is the most basic form of hierarchical clustering and provides a basis for further exploration of algorithms in that vein. Spectral clustering permits a departure from feature-vector-based clustering and can operate on data sets instead represented as affinity, or similarity

  4. Cluster Analysis of Adolescent Blogs

    ERIC Educational Resources Information Center

    Liu, Eric Zhi-Feng; Lin, Chun-Hung; Chen, Feng-Yi; Peng, Ping-Chuan

    2012-01-01

    Emerging web applications and networking systems such as blogs have become popular, and they offer unique opportunities and environments for learners, especially for adolescent learners. This study attempts to explore the writing styles and genres used by adolescents in their blogs by employing content, factor, and cluster analyses. Factor…

  5. Cluster Analysis of the Malaysian Hipposideros

    NASA Astrophysics Data System (ADS)

    Sazali, Siti Nurlydia; Laman, Charlie J.; Abdullah, M. T.

    2008-01-01

    A preliminary study on the morphometric variations among species in the genus Hipposideros was conducted using voucher specimens from the Universiti Malaysia Sarawak (UNIMAS) Zoological Museum and the Department of Wildlife and National Park (DWNP) Kuala Lumpur. A total of 24 individuals from six species of this genus were studied morphologically, with body, skull and dental measurements taken and recorded. Cluster analysis of these data shows that the genus Hipposideros is divided into two major clusters in which each species is clearly separated. Cluster analysis among Hipposideros species is therefore useful as an aid to species identification.

  6. Correcting an analysis of variance for clustering.

    PubMed

    Hedges, Larry V; Rhoads, Christopher H

    2011-02-01

    A great deal of educational and social data arises from cluster sampling designs where clusters involve schools, classrooms, or communities. A mistake that is sometimes encountered in the analysis of such data is to ignore the effect of clustering and analyse the data as if it were based on a simple random sample. This typically leads to an overstatement of the precision of results and too liberal conclusions about precision and statistical significance of mean differences. This paper gives simple corrections to the test statistics that would be computed in an analysis of variance if clustering were (incorrectly) ignored. The corrections are multiplicative factors depending on the total sample size, the cluster size, and the intraclass correlation structure. For example, the corrected F statistic has Fisher's F distribution with reduced degrees of freedom. The corrected statistic reduces to the F statistic computed by ignoring clustering when the intraclass correlations are zero. It reduces to the F statistic computed using cluster means when the intraclass correlations are unity, and it is in between otherwise. A similar adjustment to the usual statistic for testing a linear contrast among group means is described.
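
    As a rough illustration of why ignoring clustering overstates precision, the sketch below computes the textbook design effect 1 + (n - 1) * rho for equal-sized clusters and the resulting effective sample size. This is the classic approximation for the variance of a mean, not the corrected F statistic derived in the paper; the function names and sample values are hypothetical.

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Variance inflation of a sample mean under cluster sampling.

    Assumes equal cluster sizes and a single intraclass correlation (icc).
    """
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie in [0, 1]")
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(total_n: int, cluster_size: float, icc: float) -> float:
    """Number of independent observations the clustered sample is worth."""
    return total_n / design_effect(cluster_size, icc)


if __name__ == "__main__":
    # Hypothetical study: 1000 pupils sampled in classrooms of 25.
    for icc in (0.0, 0.2, 1.0):
        print(icc, design_effect(25, icc), effective_sample_size(1000, 25, icc))
```

    The limiting cases mirror those in the abstract: with an intraclass correlation of zero nothing changes, and with a correlation of one the 1000 observations are worth only as many as there are clusters (40 here), which corresponds to analysing cluster means.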

  7. ASteCA: Automated Stellar Cluster Analysis

    NASA Astrophysics Data System (ADS)

    Perren, G. I.; Vázquez, R. A.; Piatti, A. E.

    2015-04-01

    We present the Automated Stellar Cluster Analysis package (ASteCA), a suite of tools designed to fully automate the standard tests applied on stellar clusters to determine their basic parameters. The set of functions included in the code makes use of positional and photometric data to obtain precise and objective values for a given cluster's center coordinates, radius, luminosity function and integrated color magnitude, as well as characterizing through a statistical estimator its probability of being a true physical cluster rather than a random overdensity of field stars. ASteCA incorporates a Bayesian field star decontamination algorithm capable of assigning membership probabilities using photometric data alone. An isochrone fitting process based on the generation of synthetic clusters from theoretical isochrones and selection of the best fit through a genetic algorithm is also present, which allows ASteCA to provide accurate estimates for a cluster's metallicity, age, extinction and distance values along with their uncertainties. To validate the code we applied it to a large set of over 400 synthetic MASSCLEAN clusters with varying degrees of field star contamination as well as a smaller set of 20 observed Milky Way open clusters (Berkeley 7, Bochum 11, Czernik 26, Czernik 30, Haffner 11, Haffner 19, NGC 133, NGC 2236, NGC 2264, NGC 2324, NGC 2421, NGC 2627, NGC 6231, NGC 6383, NGC 6705, Ruprecht 1, Tombaugh 1, Trumpler 1, Trumpler 5 and Trumpler 14) studied in the literature. The results show that ASteCA is able to recover cluster parameters with an acceptable precision even for those clusters affected by substantial field star contamination. ASteCA is written in Python and is made available as an open source code which can be downloaded ready to be used from its official site.

  8. Using Cluster Analysis to Examine Husband-Wife Decision Making

    ERIC Educational Resources Information Center

    Bonds-Raacke, Jennifer M.

    2006-01-01

    Cluster analysis has a rich history in many disciplines and although cluster analysis has been used in clinical psychology to identify types of disorders, its use in other areas of psychology has been less popular. The purpose of the current experiments was to use cluster analysis to investigate husband-wife decision making. Cluster analysis was…

  9. Clustering analysis of seismicity and aftershock identification.

    PubMed

    Zaliapin, Ilya; Gabrielov, Andrei; Keilis-Borok, Vladimir; Wong, Henry

    2008-07-01

    We introduce a statistical methodology for clustering analysis of seismicity in the time-space-energy domain and use it to establish the existence of two statistically distinct populations of earthquakes: clustered and nonclustered. This result can be used, in particular, for nonparametric aftershock identification. The proposed approach expands the analysis of Baiesi and Paczuski [Phys. Rev. E 69, 066106 (2004), doi:10.1103/PhysRevE.69.066106] based on the space-time-magnitude nearest-neighbor distance eta between earthquakes. We show that for a homogeneous Poisson marked point field with exponential marks, the distance eta has the Weibull distribution, which bridges our results with classical correlation analysis for point fields. The joint 2D distribution of spatial and temporal components of eta is used to identify the clustered part of a point field. The proposed technique is applied to several seismicity models and to the observed seismicity of southern California.

  10. Cluster and constraint analysis in tetrahedron packings.

    PubMed

    Jin, Weiwei; Lu, Peng; Liu, Lufeng; Li, Shuixiang

    2015-04-01

    The disordered packings of tetrahedra often show no obvious macroscopic orientational or positional order for a wide range of packing densities, and it has been found that the local order in particle clusters is the main order form of tetrahedron packings. Therefore, a cluster analysis is carried out to investigate the local structures and properties of tetrahedron packings in this work. We obtain a cluster distribution of differently sized clusters, and peaks are observed at two special clusters, i.e., dimer and wagon wheel. We then calculate the amounts of dimers and wagon wheels, which are observed to have linear or approximate linear correlations with packing density. Following our previous work, the amount of particles participating in dimers is used as an order metric to evaluate the order degree of the hierarchical packing structure of tetrahedra, and an order map is consequently depicted. Furthermore, a constraint analysis is performed to determine the isostatic or hyperstatic region in the order map. We employ a Monte Carlo algorithm to test jamming and then suggest a new maximally random jammed packing of hard tetrahedra from the order map with a packing density of 0.6337.

  11. Deterministic algorithm with agglomerative heuristic for location problems

    NASA Astrophysics Data System (ADS)

    Kazakovtsev, L.; Stupina, A.

    2015-10-01

    The authors consider the clustering problem solved with the k-means method and the p-median problem with various distance metrics. The p-median problem and the k-means problem as its special case are the most popular models of location theory. They are applied to clustering and to many practically important logistic problems such as the optimal location of factories, warehouses, oil or gas wells, offshore oil drilling, and steam generators in heavy oil fields. The authors propose a new deterministic heuristic algorithm based on ideas from Information Bottleneck Clustering and genetic algorithms with a greedy heuristic. Results of running the new algorithm on various data sets are compared with known deterministic and stochastic methods. The new algorithm is shown to be significantly faster than the Information Bottleneck Clustering method at comparable precision.

  12. Identifying Peer Institutions Using Cluster Analysis

    ERIC Educational Resources Information Center

    Boronico, Jess; Choksi, Shail S.

    2012-01-01

    The New York Institute of Technology's (NYIT) School of Management (SOM) wishes to develop a list of peer institutions for the purpose of benchmarking and monitoring/improving performance against other business schools. The procedure utilizes relevant criteria for the purpose of establishing this peer group by way of a cluster analysis. The…

  13. Systematization of actinides using cluster analysis

    SciTech Connect

    Kopyrin, A.A.; Terent'eva, T.N.; Khramov, N.N.

    1994-11-01

    A representation of the actinides in multidimensional property space is proposed for systematization of these elements using cluster analysis. Literature data for their atomic properties are used. Owing to the wide variation of published ionization potentials, medians are used to estimate them. Vertical dendrograms are used for classification on the basis of distances between the actinides in atomic-property space. The properties of actinium and lawrencium are furthest removed from the main group. Thorium and mendelevium exhibit individualized properties. A cluster based on the einsteinium-fermium pair is joined by californium.

  14. A Multivariate Analysis of Galaxy Cluster Properties

    NASA Astrophysics Data System (ADS)

    Ogle, P. M.; Djorgovski, S.

    1993-05-01

    We have assembled from the literature a data base on 394 clusters of galaxies, with up to 16 parameters per cluster. They include optical and x-ray luminosities, x-ray temperatures, galaxy velocity dispersions, central galaxy and particle densities, optical and x-ray core radii and ellipticities, etc. In addition, derived quantities, such as the mass-to-light ratios and x-ray gas masses are included. Doubtful measurements have been identified, and deleted from the data base. Our goal is to explore the correlations between these parameters, and interpret them in the framework of our understanding of evolution of clusters and large-scale structure, such as the Gott-Rees scaling hierarchy. Among the simple, monovariate correlations we found, the most significant include those between the optical and x-ray luminosities, x-ray temperatures, cluster velocity dispersions, and central galaxy densities, in various mutual combinations. While some of these correlations have been discussed previously in the literature, generally smaller samples of objects have been used. We will also present the results of a multivariate statistical analysis of the data, including a principal component analysis (PCA). Such an approach has not been used previously for studies of cluster properties, even though it is much more powerful and complete than the simple monovariate techniques which are commonly employed. The observed correlations may lead to powerful constraints for theoretical models of formation and evolution of galaxy clusters. P.M.O. was supported by a Caltech graduate fellowship. S.D. acknowledges a partial support from the NASA contract NAS5-31348 and the NSF PYI award AST-9157412.

  15. Hierarchical cluster analysis as an approach for systematic grouping of diet constituents on basis of fatty acid, energy and cholesterol content: application on consumable lamb products.

    PubMed

    Akbay, A; Elhan, A; Ozcan, C; Demirtaş, S

    2000-08-01

    The role of dietary fat in the etiology of chronic diseases is both a qualitative and a quantitative issue. The dietary fat intake is largely influenced by behavioral and social influences on food choice. Ongoing scientific research has led to dietary recommendations with main concerns being the percentage of saturated, essential fatty acids and cholesterol with respect to total energy intake. However, the compositional complexity of food choice constituting the diet is a critical concept complicating the interpretation of epidemiologic, clinical and laboratory evidence to define the role of dietary fat in the etiology of diseases. This study was motivated by the need for a more systematic classification of consumable food based on complex composition; lamb meat was selected as a non-specific subset, and the hierarchical cluster analysis method was applied to obtain the dendrogram using average linkage. Data on fat composition of consumable lamb prepared by different methods was obtained from the USDA Nutrient Database for Standard Reference. Using agglomerative hierarchical cluster analysis, lamb meat was grouped into two main clusters, one of which divided into two families, each of which was subdivided into two subfamilies based on fatty acids, cholesterol and energy composition. The present work may be considered a leading study toward systematically classifying larger food sets. As high fat foods are rich in flavor and overall palatability, the outcome of this study may lead to behaviorally more acceptable but healthier dietary replacements. In addition, future use of the results obtained may reveal the effect of complex compositional dietary influences on health and disease and may be superior to studies questioning individual dietary items. Furthermore, hierarchical cluster analysis may be used to cluster food using other compositional data such as amino acids, vitamins and carbohydrates as well.
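
    The average-linkage procedure described above can be sketched with SciPy as follows. The four food items and their composition values are invented for illustration and are not taken from the USDA database; standardising the columns before computing Euclidean distances is an assumption of this sketch, not a step reported in the abstract.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical composition table: one row per food item, columns are
# saturated fat (g), cholesterol (mg) and energy (kcal) per 100 g.
items = ["lean_roasted", "lean_braised", "fatty_roasted", "fatty_fried"]
composition = np.array([
    [3.0, 90.0, 190.0],
    [3.5, 95.0, 200.0],
    [9.0, 95.0, 290.0],
    [10.0, 100.0, 310.0],
])

# Standardise each column so that kcal does not dominate the distances.
z = (composition - composition.mean(axis=0)) / composition.std(axis=0)

# Agglomerative clustering with average linkage on Euclidean distances.
# Each row of `tree` records one merge: the two merged cluster ids,
# the merge height and the size of the new cluster.
tree = linkage(z, method="average", metric="euclidean")

# Cut the tree into two main clusters.
labels = fcluster(tree, t=2, criterion="maxclust")
groups = dict(zip(items, labels.tolist()))
print(groups)
```

    Passing `tree` to `scipy.cluster.hierarchy.dendrogram` would draw the corresponding dendrogram; cutting at a larger `t` yields the finer families and subfamilies.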

  16. Cluster analysis of respiratory time series.

    PubMed

    Adams, J M; Attinger, E O; Attinger, F M

    1978-03-01

    We have investigated the respiratory control system with the hypothesis that, although many variables such as minute ventilation (VI), tidal volume (VT), breathing period (TT), inspiratory duration (TI), and expiratory duration (TE) may be observed, the controller functions more simply by manipulating only 2 or 3 of these. Thus, if tidal volume is the only independent variable, TI being determined by the "off-switch" threshold, these variables should have very similar time courses. Anesthetized dogs were subjected to CO2 breathing and carotid sinus perfusion to stimulate both chemoreceptors. The time series of the variables VI, VT, TT, TE, and TI as well as PACO2 were determined on a breath by breath basis. Derived characteristics of these time series were compared using Cluster Analysis and the latent dimensionality of respiratory control determined by Factor Analysis. The characteristics of the time series clustered into 4 groups: magnitude (of the response), speed, variability and relative change. One respiratory factor accounted for 86% of the variance for the variability characteristics, 2 factors for magnitude (91%) and relative change (85%) and 3 factors for speed (89%). The respiratory variables were analysed for each of the 4 groups of characteristics with the following results: VT and TI clustered together only for the magnitude and relative change characteristics, whereas TT and TE clustered closely for all four characteristics. One latent factor was associated with the [TT-TE] group and the other usually with PACO2.

  17. ClusterViz: A Cytoscape APP for Cluster Analysis of Biological Network.

    PubMed

    Wang, Jianxin; Zhong, Jiancheng; Chen, Gang; Li, Min; Wu, Fang-xiang; Pan, Yi

    2015-01-01

    Cluster analysis of biological networks is one of the most important approaches for identifying functional modules and predicting protein functions. Furthermore, visualization of clustering results is crucial to uncover the structure of biological networks. In this paper, ClusterViz, an APP of Cytoscape 3 for cluster analysis and visualization, has been developed. In order to reduce complexity and enable extendibility for ClusterViz, we designed the architecture of ClusterViz based on the framework of Open Services Gateway Initiative. According to the architecture, the implementation of ClusterViz is partitioned into three modules: the interface of ClusterViz, clustering algorithms, and visualization and export. ClusterViz facilitates the comparison of the results of different algorithms for further related analysis. Three commonly used clustering algorithms, FAG-EC, EAGLE and MCODE, are included in the current version. Because the clustering-algorithms module adopts an abstract algorithm interface, more clustering algorithms can be included in the future. To illustrate usability of ClusterViz, we provided three examples with detailed steps from the important scientific articles, which show that our tool has helped several research teams do their research work on the mechanism of the biological networks. PMID:26357321

  18. AMOEBA clustering revisited. [cluster analysis, classification, and image display program

    NASA Technical Reports Server (NTRS)

    Bryant, Jack

    1990-01-01

    A description of the clustering, classification, and image display program AMOEBA is presented. Using a difficult high resolution aircraft-acquired MSS image, the steps the program takes in forming clusters are traced. A number of new features are described here for the first time. Usage of the program is discussed. The theoretical foundation (the underlying mathematical model) is briefly presented. The program can handle images of any size and dimensionality.

  19. Cluster analysis of contaminated sediment data: nodal analysis.

    PubMed

    Hartwell, S Ian; Claflin, Larry W

    2005-07-01

    The objective of the present study was to explore the use of multivariate statistical methods as a means to discern relationships between contaminants and biological and/or toxicological effects in a representative data set from the National Status and Trends (NS&T) Program. Data from the National Oceanic and Atmospheric Administration, NS&T Program's Bioeffects Survey of Delaware Bay, USA, were examined using various univariate and multivariate statistical techniques, including cluster analysis. Each approach identified consistent patterns and relationships between the three types of triad data. The analyses also identified factors that bias the interpretation of the data, primarily the presence of rare and unique species and the dependence of species distributions on physical parameters. Sites and species were clustered with the unweighted pair-group method using arithmetic averages and the Jaccard coefficient, which grouped species and sites into mutually consistent groupings. Pearson product moment correlation coefficients, normalized for salinity, also were clustered. The most informative analysis, termed nodal analysis, was the intersection of species cluster analysis with site cluster analysis. This technique produced a visual representation of species association patterns among site clusters. Site characteristics, such as salinity and grain size, not contaminant concentrations, appeared to be the primary factors determining species distributions. This suggests the sediment-quality triad needs to use physical parameters as a distinct leg from chemical concentrations to improve sediment-quality assessments in large bodies of water. Because the Delaware Bay system has confounded gradients of contaminants and physical parameters, analyses were repeated with data from northern Chesapeake Bay, USA, with similar results. PMID:16050601
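
    A minimal sketch of the nodal-analysis idea, assuming a small invented presence/absence matrix: sites and species are each clustered by average linkage (the unweighted pair-group method) on Jaccard distances, and the two partitions are then crossed to show which species groups occupy which site groups. The helper name `upgma_jaccard`, the choice of two groups per axis, and the occupancy summary are assumptions of this sketch rather than details from the study.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Hypothetical presence/absence data: rows are sites, columns are species.
presence = np.array([
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
], dtype=bool)


def upgma_jaccard(matrix: np.ndarray, n_clusters: int) -> np.ndarray:
    """Cluster the rows of a boolean matrix by average linkage on Jaccard distance."""
    distances = pdist(matrix, metric="jaccard")
    tree = linkage(distances, method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


site_groups = upgma_jaccard(presence, 2)       # cluster sites by species content
species_groups = upgma_jaccard(presence.T, 2)  # cluster species by site occurrence

# Nodal table: fraction of occupied cells in each site-group x species-group block.
nodal = np.zeros((2, 2))
for i, sg in enumerate(np.unique(site_groups)):
    for j, pg in enumerate(np.unique(species_groups)):
        block = presence[site_groups == sg][:, species_groups == pg]
        nodal[i, j] = block.mean()
print(nodal)
```

    Blocks with high occupancy indicate a species group that is characteristic of a site group; empty blocks indicate groups that do not co-occur.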

  20. Equivalent damage validation by variable cluster analysis

    NASA Astrophysics Data System (ADS)

    Drago, Carlo; Ferlito, Rachele; Zucconi, Maria

    2016-06-01

    The main aim of this work is to perform a clustering analysis on the damage surveyed in the old center of L'Aquila after the earthquake that occurred on April 6, 2009 and to validate an Indicator of Equivalent Damage ED that summarizes the information reported on the AeDES card regarding the level of damage and its extension over the surface of the buildings. In particular we used a sample of 13442 masonry buildings located in an area characterized by a Macroseismic Intensity equal to 8 [1]. The aim is to ensure coherence between the clusters, and their hierarchy, identified in the surveyed damage data and those identified in the computed ED data.

  1. Chaotic map clustering algorithm for EEG analysis

    NASA Astrophysics Data System (ADS)

    Bellotti, R.; De Carlo, F.; Stramaglia, S.

    2004-03-01

    The non-parametric chaotic map clustering algorithm has been applied to the analysis of electroencephalographic signals in order to recognize Huntington's disease, one of the most dangerous pathologies of the central nervous system. The performance of the method has been compared with that obtained through parametric algorithms, such as K-means and deterministic annealing, and a supervised multi-layer perceptron. While supervised neural networks need a training phase, performed by means of data tagged by the genetic test, and the parametric methods require a prior choice of the number of classes to find, chaotic map clustering gives natural evidence of the pathological class without any training or supervision, thus providing a new efficient methodology for the recognition of patterns affected by Huntington's disease.

  2. Constructing storyboards based on hierarchical clustering analysis

    NASA Astrophysics Data System (ADS)

    Hasebe, Satoshi; Sami, Mustafa M.; Muramatsu, Shogo; Kikuchi, Hisakazu

    2005-07-01

    There is a growing need for quick previews of video content, both to improve the accessibility of video archives and to reduce network traffic. In this paper, a storyboard that contains a user-specified number of keyframes is produced from a given video sequence. It is based on hierarchical cluster analysis of feature vectors that are derived from wavelet coefficients of video frames. Consistent use of extracted feature vectors is the key to avoiding repeated, computationally intensive parsing of the same video sequence. Experimental results suggest that a significant reduction in computational time is gained by this strategy.
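
    One way to picture keyframe selection by hierarchical clustering is sketched below. It assumes each frame has already been reduced to a feature vector (random two-dimensional stand-ins here, not wavelet coefficients), uses Ward linkage, and picks the frame nearest each cluster centroid as the keyframe; all three choices, and the function name `select_keyframes`, are assumptions of this sketch and not the authors' method.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def select_keyframes(features: np.ndarray, n_keyframes: int) -> list[int]:
    """Return one representative frame index per cluster, in temporal order."""
    tree = linkage(features, method="ward")
    labels = fcluster(tree, t=n_keyframes, criterion="maxclust")
    keyframes = []
    for label in np.unique(labels):
        members = np.flatnonzero(labels == label)
        centroid = features[members].mean(axis=0)
        offsets = np.linalg.norm(features[members] - centroid, axis=1)
        keyframes.append(int(members[np.argmin(offsets)]))
    return sorted(keyframes)


# Toy "video": three shots of 20 frames whose feature vectors scatter
# around three well-separated points.
rng = np.random.default_rng(0)
shot_centres = [np.array([0.0, 0.0]), np.array([5.0, 5.0]), np.array([10.0, 0.0])]
features = np.vstack([c + 0.1 * rng.standard_normal((20, 2)) for c in shot_centres])

storyboard = select_keyframes(features, 3)
print(storyboard)
```

    Because the feature vectors are extracted once, the same array can be re-clustered for a different storyboard length without parsing the video again.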

  3. Estimating the number of clusters via system evolution for cluster analysis of gene expression data.

    PubMed

    Wang, Kaijun; Zheng, Jie; Zhang, Junying; Dong, Jiyang

    2009-09-01

    The estimation of the number of clusters (NC) is one of the crucial problems in the cluster analysis of gene expression data. Most available approaches give their answers without intuitive information about the degree of separation between clusters. However, this information is useful for understanding cluster structures. To provide this information, we propose the system evolution (SE) method to estimate NC based on the partitioning around medoids (PAM) clustering algorithm. SE analyzes cluster structures of a dataset from the viewpoint of a pseudothermodynamics system. The system will go to its stable equilibrium state, at which the optimal NC is found, via its partitioning process and merging process. The experimental results on simulated and real gene expression data demonstrate that SE works well on data with well-separated clusters and on data with slightly overlapping clusters. PMID:19527960

  4. Recent Trends in Hierarchic Document Clustering: A Critical Review.

    ERIC Educational Resources Information Center

    Willett, Peter

    1988-01-01

    Reviews recent research into the use of hierarchic agglomerative clustering methods for document retrieval. The topics discussed include the calculation of interdocument similarities, algorithms used to implement clustering methods on large databases, validity testing of document hierarchies, appropriate search strategies, and other applications…

  5. Random sequential renormalization and agglomerative percolation in networks: Application to Erdös-Rényi and scale-free graphs

    NASA Astrophysics Data System (ADS)

    Bizhani, Golnoosh; Grassberger, Peter; Paczuski, Maya

    2011-12-01

    We study the statistical behavior under random sequential renormalization (RSR) of several network models including Erdös-Rényi (ER) graphs, scale-free networks, and an annealed model related to ER graphs. In RSR the network is locally coarse grained by choosing at each renormalization step a node at random and joining it to all its neighbors. Compared to previous (quasi-)parallel renormalization methods [Song et al., Nature (London) 433, 392 (2005), doi:10.1038/nature03248], RSR allows a more fine-grained analysis of the renormalization group (RG) flow and unravels new features that were not discussed in the previous analyses. In particular, we find that all networks exhibit a second-order transition in their RG flow. This phase transition is associated with the emergence of a giant hub and can be viewed as a new variant of percolation, called agglomerative percolation. We claim that this transition exists also in previous graph renormalization schemes and explains some of the scaling behavior seen there. For critical trees it happens as N/N0→0 in the limit of large systems (where N0 is the initial size of the graph and N its size at a given RSR step). In contrast, it happens at finite N/N0 in sparse ER graphs and in the annealed model, while it happens for N/N0→1 on scale-free networks. Critical exponents seem to depend on the type of the graph but not on the average degree and obey usual scaling relations for percolation phenomena. For the annealed model they agree with the exponents obtained from a mean-field theory. At late times, the networks exhibit a starlike structure in agreement with the results of Radicchi et al. [Phys. Rev. Lett. 101, 148701 (2008), doi:10.1103/PhysRevLett.101.148701]. While degree distributions are of main interest when regarding the scheme as network renormalization, mass distributions (which are more relevant when considering “supernodes” as clusters) are much easier to study using the fast Newman-Ziff algorithm for
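
    The single renormalization step described in this abstract (pick a node at random and join it to all its neighbors) can be sketched on a plain adjacency-set graph as below. The function name `rsr_step`, the dict-of-sets representation and the optional `target` argument are assumptions made for illustration; the sketch does not track supernode masses and does not use the Newman-Ziff algorithm.

```python
import random


def rsr_step(adj: dict, target=None, rng=random):
    """Merge one node with all of its neighbours into a single supernode.

    `adj` maps each node to the set of its neighbours (undirected, no
    self-loops) and is modified in place. The supernode keeps the label
    of the chosen node, which is returned.
    """
    if target is None:
        target = rng.choice(sorted(adj))
    absorbed = set(adj[target])

    # The supernode inherits every neighbour of the absorbed nodes,
    # except the absorbed nodes themselves and the target.
    merged = set()
    for v in absorbed:
        merged |= adj[v]
    merged -= absorbed
    merged.discard(target)

    # Delete absorbed nodes and rewire their former neighbours to the target.
    for v in absorbed:
        del adj[v]
    for w in merged:
        adj[w] -= absorbed
        adj[w].add(target)
    adj[target] = merged
    return target


# Path graph 0-1-2-3-4: renormalizing around node 2 absorbs nodes 1 and 3.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
rsr_step(path, target=2)
print(path)
```

    Repeating the step with `target=None` until one node remains traces out the RG flow in N/N0; each step removes as many nodes as the chosen node has neighbours.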

  6. Cluster analysis of word frequency dynamics

    NASA Astrophysics Data System (ADS)

    Maslennikova, Yu S.; Bochkarev, V. V.; Belashova, I. A.

    2015-01-01

    This paper describes the analysis and modelling of word usage frequency time series. In a previous study, it was proposed that all word usage frequencies have uniform dynamics approaching the shape of a Gaussian function. This assumption can be checked using the frequency dictionaries of the Google Books Ngram database. This database includes 5.2 million books published between 1500 and 2008. The corpus contains over 500 billion words in American English, British English, French, German, Spanish, Russian, Hebrew, and Chinese. We clustered time series of word usage frequencies using a Kohonen neural network. The similarity between input vectors was estimated using several algorithms. As a result of the neural network training procedure, more than ten different forms of time series were found. They describe the dynamics of word usage frequencies from birth to death of individual words. Different groups of word forms were found to have different dynamics of word usage frequency variations.

  7. bcl::Cluster : A method for clustering biological molecules coupled with visualization in the Pymol Molecular Graphics System

    PubMed Central

    Alexander, Nathan; Woetzel, Nils; Meiler, Jens

    2016-01-01

    Clustering algorithms are used as data analysis tools in a wide variety of applications in Biology. Clustering has become especially important in protein structure prediction and virtual high throughput screening methods. In protein structure prediction, clustering is used to structure the conformational space of thousands of protein models. In virtual high throughput screening, databases with millions of drug-like molecules are organized by structural similarity, e.g. common scaffolds. The tree-like dendrogram structure obtained from hierarchical clustering can provide a qualitative overview of the results, which is important for focusing detailed analysis. However, in practice it is difficult to relate specific components of the dendrogram directly back to the objects of which it is comprised and to display all desired information within the two dimensions of the dendrogram. The current work presents a hierarchical agglomerative clustering method termed bcl::Cluster. bcl::Cluster utilizes the Pymol Molecular Graphics System to graphically depict dendrograms in three dimensions. This allows simultaneous display of relevant biological molecules as well as additional information about the clusters and the members comprising them.

  8. Failure Mode Identification Through Clustering Analysis

    NASA Technical Reports Server (NTRS)

    Arunajadai, Srikesh G.; Stone, Robert B.; Tumer, Irem Y.; Clancy, Daniel (Technical Monitor)

    2002-01-01

    Research has shown that nearly 80% of the costs and problems are created in product development and that cost and quality are essentially designed into products in the conceptual stage. Currently, failure identification procedures (such as FMEA (Failure Modes and Effects Analysis), FMECA (Failure Modes, Effects and Criticality Analysis) and FTA (Fault Tree Analysis)) and design of experiments are being used for quality control and for the detection of potential failure modes during the detail design stage or post-product launch. Though all of these methods have their own advantages, they do not give information as to what are the predominant failures that a designer should focus on while designing a product. This work uses a functional approach to identify failure modes, which hypothesizes that similarities exist between different failure modes based on the functionality of the product/component. In this paper, a statistical clustering procedure is proposed to retrieve information on the set of predominant failures that a function experiences. The various stages of the methodology are illustrated using a hypothetical design example.

  9. A hybrid monkey search algorithm for clustering analysis.

    PubMed

    Chen, Xin; Zhou, Yongquan; Luo, Qifang

    2014-01-01

    Clustering is a popular data analysis and data mining technique. The k-means clustering algorithm is one of the most commonly used methods. However, it depends strongly on the initial solution and easily falls into a local optimum. In view of these disadvantages of the k-means method, this paper proposes a hybrid monkey algorithm based on the search operator of the artificial bee colony algorithm for clustering analysis. Experiments on synthetic and real-life datasets show that the algorithm performs better than the basic monkey algorithm for clustering analysis.
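
    The dependence of k-means on its starting point is easy to reproduce. The sketch below is a plain NumPy implementation of Lloyd's iteration (not the hybrid monkey algorithm of the paper) run on four invented points at the corners of a 4 x 1 rectangle with two different initial solutions; all names are hypothetical.

```python
import numpy as np


def kmeans(points: np.ndarray, init_centers, n_iter: int = 100):
    """Plain Lloyd iteration; returns (centers, labels, sum of squared errors)."""
    centers = np.array(init_centers, dtype=float)
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its points (keep it if the cluster is empty).
        new_centers = np.array([
            points[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(len(centers))
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    sse = float((dists.min(axis=1) ** 2).sum())
    return centers, labels, sse


# Corners of a wide rectangle: the natural split is left pair vs right pair.
points = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 1.0]])

_, _, sse_good = kmeans(points, [[0.0, 0.5], [4.0, 0.5]])  # left/right start
_, _, sse_bad = kmeans(points, [[2.0, 0.0], [2.0, 1.0]])   # bottom/top start
print(sse_good, sse_bad)
```

    The second start is already a fixed point of the iteration (a bottom/top split), so the algorithm stops there with a much larger squared error than the left/right split. Metaheuristic searches such as the one proposed in the paper aim to escape this kind of local optimum.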

  10. Simultaneous Two-Way Clustering of Multiple Correspondence Analysis

    ERIC Educational Resources Information Center

    Hwang, Heungsun; Dillon, William R.

    2010-01-01

    A 2-way clustering approach to multiple correspondence analysis is proposed to account for cluster-level heterogeneity of both respondents and variable categories in multivariate categorical data. Specifically, in the proposed method, multiple correspondence analysis is combined with k-means in a unified framework in which "k"-means is applied…

  11. A Survey of Popular R Packages for Cluster Analysis

    ERIC Educational Resources Information Center

    Flynt, Abby; Dean, Nema

    2016-01-01

    Cluster analysis is a set of statistical methods for discovering new group/class structure when exploring data sets. This article reviews the following popular libraries/commands in the R software language for applying different types of cluster analysis: from the stats library, the kmeans, and hclust functions; the mclust library; the poLCA…

  12. Using Cluster Analysis for Data Mining in Educational Technology Research

    ERIC Educational Resources Information Center

    Antonenko, Pavlo D.; Toy, Serkan; Niederhauser, Dale S.

    2012-01-01

    Cluster analysis is a group of statistical methods that has great potential for analyzing the vast amounts of web server-log data to understand student learning from hyperlinked information resources. In this methodological paper we provide an introduction to cluster analysis for educational technology researchers and illustrate its use through…

  13. Early Hemostatic Responses to Trauma Identified Using Hierarchical Clustering Analysis

    PubMed Central

    White, N.J.; Contaifer, D.; Martin, E.J.; Newton, J.C.; Mohammed, B.M.; Bostic, J.L.; Brophy, G.M.; Spiess, B.D.; Pusateri, A.E.; Ward, K.R.; Brophy, D.F.

    2015-01-01

    Background Trauma-induced coagulopathy is a complex multifactorial hemostatic response that is poorly understood. Objectives Identify distinct hemostatic responses to trauma and identify key components of the hemostatic system that vary between responses. Patients/Methods Cross-sectional observational study of adult trauma patients at an urban Level I trauma center Emergency Department. Hierarchical clustering analysis was used to identify distinct clusters of similar subjects using vital signs, injury/shock severity, and by comprehensive assessment of coagulation, clot formation, platelet function, and thrombin generation. Results Of 84 total trauma patients included in the model, three distinct trauma clusters were identified. Cluster 1 (N=57) displayed platelet activation, preserved peak thrombin generation, plasma coagulation dysfunction, moderately decreased fibrinogen concentration, and normal clot formation relative to healthy controls. Cluster 2 (N=18) displayed platelet activation, preserved peak thrombin generation, and preserved fibrinogen concentration with normal clot formation. Cluster 3 (N=9) was the most severely injured and shocked and displayed a strong inflammatory and bleeding phenotype. Platelet dysfunction, thrombin inhibition, plasma coagulation dysfunction, and decreased fibrinogen concentration were present in this cluster. Fibrinolytic activation was present in all clusters, but increased more so in Cluster 3. Trauma clusters were different most noticeably in their relative fibrinogen concentration, peak thrombin generation, and platelet-induced clot contraction. Conclusions Hierarchical clustering analysis identified 3 distinct hemostatic responses to trauma. Further insight into the underlying hemostatic mechanisms responsible for these responses is needed. PMID:25816845
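
    The hierarchical clustering step named in this abstract can be sketched as a naive agglomerative procedure. The abstract does not state the linkage or distance measure used, so average linkage with Euclidean distance is an assumption here, and the function name agglomerate and the toy points are hypothetical.

```python
import math
from itertools import combinations


def agglomerate(points, n_clusters):
    """Naive agglomerative clustering (assumed: average linkage, Euclidean).

    Starts with every point in its own cluster and repeatedly merges the
    two closest clusters until n_clusters remain. Returns lists of point
    indices. Cost grows quickly with the number of points; this is only
    meant for small illustrations.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Average pairwise distance between the members of two clusters.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Pick the pair of clusters with the smallest linkage distance.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # a < b, so index a is unaffected by the deletion
    return clusters


# Toy data: two tight pairs and one isolated point.
subjects = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
print(agglomerate(subjects, 3))
```

    In practice the merge history is usually kept as a dendrogram and the number of clusters is chosen afterwards; the sketch stops at a fixed count only to stay short.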

  14. Photometric analysis of Collinder Cluster 223

    NASA Astrophysics Data System (ADS)

    Duplancic Videla, M. F.; Molina, S.; González, J. F.

    We present photometric observations of the open cluster Collinder 223 (RA = 10h 30m 38s, dec = -60° 06' 39''), obtained with the HSH telescope at CASLEO. This cluster has not been studied extensively: there is only one photoelectric UBV photometric study, by Clariá and Lapasset (1991). A later study by Tadross (2004) reanalyzed the data, but no other photometric measurements have been carried out until now. We observed seven fields in the cluster, chosen to prioritize the zones of highest stellar concentration. We obtained color-magnitude diagrams of the cluster, reaching stars two magnitudes fainter than those previously observed by Clariá and Lapasset. The cluster sequence agrees well with the isochrone corresponding to an age of 3.5 10E7 yr.

  15. Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale

    PubMed Central

    Kobourov, Stephen; Gallant, Mike; Börner, Katy

    2016-01-01

    Overview Notions of community quality underlie the clustering of networks. While studies surrounding network clustering are increasingly common, a precise understanding of the relationship between different cluster quality metrics is unknown. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely-used network clustering algorithms: Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. Cluster Quality Metrics We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on the information recovery metrics. Additionally, our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. Network Clustering Algorithms Smart local moving is the overall best performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it an absolutely superior algorithm. Interestingly, Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain. We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large
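
    Two of the stand-alone quality metrics named in this record, modularity and conductance, can be sketched for a small undirected, unweighted graph. The definitions used are the common textbook ones (Newman-Girvan modularity; conductance as cut edges divided by the smaller of the two volumes) and may differ in detail from the variants used in the paper. The function names and the toy graph are assumptions.

```python
def modularity(edges, communities):
    """Newman-Girvan modularity of a partition (undirected, unweighted)."""
    m = len(edges)
    label = {v: k for k, members in enumerate(communities) for v in members}
    intra = [0] * len(communities)   # edges with both ends in community k
    degree = [0] * len(communities)  # summed node degree of community k
    for u, v in edges:
        degree[label[u]] += 1
        degree[label[v]] += 1
        if label[u] == label[v]:
            intra[label[u]] += 1
    return sum(e / m - (d / (2 * m)) ** 2 for e, d in zip(intra, degree))


def conductance(edges, community):
    """Cut edges divided by the smaller volume of the two sides."""
    inside = set(community)
    cut = sum((u in inside) != (v in inside) for u, v in edges)
    vol_in = sum((u in inside) + (v in inside) for u, v in edges)
    vol_out = 2 * len(edges) - vol_in
    return cut / min(vol_in, vol_out)


# Toy graph: two triangles joined by a single bridge edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
toy_partition = [[0, 1, 2], [3, 4, 5]]

print(modularity(toy_edges, toy_partition))      # 5/14, about 0.357
print(conductance(toy_edges, toy_partition[0]))  # 1/7, about 0.143
```

    Higher modularity and lower conductance both indicate a cleaner separation, which is why the two metrics can still disagree about which of two partitions is better.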

  16. Spectral Analysis of Cluster Induced Turbulence

    NASA Astrophysics Data System (ADS)

    Patel, Ravi; Ireland, Peter; Capecelatro, Jesse; Fox, Rodney; Desjardins, Olivier

    2015-11-01

    Particle laden turbulent flows are an important feature of many industrial processes such as fluidized bed reactors. The study of cluster-induced turbulence (CIT), wherein particles falling under gravity generate turbulence in the carrier gas via fluctuations in particle concentration, may lead to better models for these processes. We present a spectral analysis of a database of statistically stationary CIT simulations. These simulations were previously performed using a two way coupled Eulerian-Lagrangian approach for various mass loadings and particle-scale Reynolds numbers. The Lagrangian particle data is carefully filtered to obtain Eulerian fields for particle phase volume fraction, velocity, and granular temperature. We perform a spectral decomposition of the particle and fluid turbulent kinetic energy budget. We investigate the contributions to the particle and fluid turbulent kinetic energy by pressure strain, viscous dissipation, drag exchange, viscous exchange, and pressure exchange over the range of wavenumbers. Results from this study may help develop closure models for large eddy simulation of particle laden turbulent flows.

  17. MASSCLEAN: MASSive CLuster Evolution and ANalysis package -- A new tool for stellar clusters

    NASA Astrophysics Data System (ADS)

    Popescu, Bogdan

    2010-11-01

    Stellar clusters are laboratories for stellar evolution. Their stellar content has a uniform age and chemical composition, but spans a large mass interval. The majority of stars are born in clusters and end up in the general field population. An accurate characterization of stellar clusters could be used to build better models, from stellar evolution to the evolution of an entire galaxy. Even though they are so close, many Milky Way clusters are difficult to observe because they are obscured by the dust in the disk of our Galaxy. The clusters from the Local Group and beyond are too distant, so only their integrated properties can be used most of the time. There is one way to analyze the observational data, to search for clusters, and to describe them: simulations. The MASSCLEAN (MASSive CLuster Evolution and ANalysis) package was developed to provide a better characterization of Galactic clusters, to derive selection effects of current surveys, and to provide information about extra-galactic clusters. Simulations of known Galactic clusters are used to get better constraints on their parameters, like mass, age, extinction, chemical composition and distance. This is the traditional way to describe the Galactic clusters, fitting the data using the available models. The difference is that MASSCLEAN simulations provide a consistent set of parameters. The majority of extra-galactic clusters are known only from their integrated properties, integrated magnitudes and colors. The current models for stellar populations are available only in the infinite mass limit. But real clusters have a finite mass, and their integrated colors show a large dispersion (stochastic fluctuations). The description of the variation of integrated colors as a function of mass and age led to the creation of the MASSCLEANcolors database, based on 70 million Monte Carlo simulations. Since the entries in the database form a consistent set of integrated colors, integrated

  18. The REFLEX II Galaxy Cluster sample: mock catalogues and clustering analysis

    NASA Astrophysics Data System (ADS)

    Balaguera-Antolinez, Andres; Sanchez, Ariel G.; Bohringer, Hans

    2012-09-01

    We present results of the analysis of abundance and clustering from the new ROSAT-ESO Flux-Limited X-Ray (REFLEX) II galaxy cluster catalogue. To model the covariance matrix of the different statistics, we have created a set of 100 mock galaxy cluster catalogues built from a suite of large-volume LambdaCDM N-body simulations (L-BASICC) and calibrated with the X-ray luminosity function. We discuss the calibration scheme and some implications regarding the cluster scaling relations, particularly the link between mass and luminosity. Similarly, we show the behavior of the clustering signal as a function of the X-ray luminosity and some cosmological implications.

  19. Visual cluster analysis and pattern recognition methods

    DOEpatents

    Osbourn, Gordon Cecil; Martinez, Rubel Francisco

    2001-01-01

    A method of clustering using a novel template to define a region of influence. Using neighboring approximation methods, computation times can be significantly reduced. The template and method are applicable to, and improve, pattern recognition techniques.

  20. Using Cluster Analysis To Facilitate the Standard Setting Process.

    ERIC Educational Resources Information Center

    Sireci, Stephen G.; Robin, Frederic; Patelis, Thanos

    The most popular methods for setting passing scores and other standards on educational tests rely heavily on subjective judgment. This paper presents and evaluates a new procedure for setting and evaluating standards on tests based on cluster analysis of test data. The clustering procedure was applied to a statewide mathematics proficiency test…

  1. A Note on Cluster Effects in Latent Class Analysis

    ERIC Educational Resources Information Center

    Kaplan, David; Keller, Bryan

    2011-01-01

    This article examines the effects of clustering in latent class analysis. A comprehensive simulation study is conducted, which begins by specifying a true multilevel latent class model with varying within- and between-cluster sample sizes, varying latent class proportions, and varying intraclass correlations. These models are then estimated under…

  2. Hierarchical spike clustering analysis for investigation of interneuron heterogeneity.

    PubMed

    Boehlen, Anne; Heinemann, Uwe; Henneberger, Christian

    2016-04-21

    Action potentials represent the output of a neuron. Interneurons in particular display a variety of discharge patterns ranging from regular action potential firing to prominent spike clustering or stuttering. The mechanisms underlying this heterogeneity remain incompletely understood. We established hierarchical cluster analysis of spike trains as a measure of spike clustering. A clustering index was calculated from action potential trains recorded in the whole-cell patch clamp configuration from hippocampal (CA1, stratum radiatum) and entorhinal (medial entorhinal cortex, layer 2) interneurons in acute slices and simulated data. Prominent, region-dependent, but also variable spike clustering was detected using this measure. Further analysis revealed a strong positive correlation between spike clustering and membrane potential oscillations but an inverse correlation with neuronal resonance. Furthermore, clustering was more pronounced when the balance between fast-activating K(+) currents, assessed by the spike repolarisation time, and hyperpolarization-activated currents, gauged by the size of the sag potential, was shifted in favour of fast K(+) currents. Simulations of spike clustering confirmed that variable ratios of fast K(+) and hyperpolarization-activated currents could underlie different degrees of spike clustering and could thus be crucial for temporally structuring interneuron spike output. PMID:26987719

  3. Obstructive Sleep Apnea: A Cluster Analysis at Time of Diagnosis

    PubMed Central

    Grillet, Yves; Richard, Philippe; Stach, Bruno; Vivodtzev, Isabelle; Timsit, Jean-Francois; Lévy, Patrick; Tamisier, Renaud; Pépin, Jean-Louis

    2016-01-01

    Background The classification of obstructive sleep apnea is on the basis of sleep study criteria that may not adequately capture disease heterogeneity. Improved phenotyping may improve prognosis prediction and help select therapeutic strategies. Objectives: This study used cluster analysis to investigate the clinical clusters of obstructive sleep apnea. Methods An ascending hierarchical cluster analysis was performed on baseline symptoms, physical examination, risk factor exposure and co-morbidities from 18,263 participants in the OSFP (French national registry of sleep apnea). The probability for criteria to be associated with a given cluster was assessed using odds ratios, determined by univariate logistic regression. Results: Six clusters were identified, in which patients varied considerably in age, sex, symptoms, obesity, co-morbidities and environmental risk factors. The main significant differences between clusters were minimally symptomatic versus sleepy obstructive sleep apnea patients, lean versus obese, and among obese patients different combinations of co-morbidities and environmental risk factors. Conclusions Our cluster analysis identified six distinct clusters of obstructive sleep apnea. Our findings underscore the high degree of heterogeneity that exists within obstructive sleep apnea patients regarding clinical presentation, risk factors and consequences. This may help in both research and clinical practice for validating new prevention programs, in diagnosis and in decisions regarding therapeutic strategies. PMID:27314230

  4. Network Analysis Tools: from biological networks to clusters and pathways.

    PubMed

    Brohée, Sylvain; Faust, Karoline; Lima-Mendez, Gipsi; Vanderstocken, Gilles; van Helden, Jacques

    2008-01-01

    Network Analysis Tools (NeAT) is a suite of computer tools that integrate various algorithms for the analysis of biological networks: comparison between graphs, between clusters, or between graphs and clusters; network randomization; analysis of degree distribution; network-based clustering and path finding. The tools are interconnected to enable a stepwise analysis of the network through a complete analytical workflow. In this protocol, we present a typical case of utilization, where the tasks above are combined to decipher a protein-protein interaction network retrieved from the STRING database. The results returned by NeAT are typically subnetworks, networks enriched with additional information (i.e., clusters or paths) or tables displaying statistics. Typical networks comprising several thousands of nodes and arcs can be analyzed within a few minutes. The complete protocol can be read and executed in approximately 1 h.

  5. Visual verification and analysis of cluster detection for molecular dynamics.

    PubMed

    Grottel, Sebastian; Reina, Guido; Vrabec, Jadran; Ertl, Thomas

    2007-01-01

    A current research topic in molecular thermodynamics is the condensation of vapor to liquid and the investigation of this process at the molecular level. Condensation is found in many physical phenomena, e.g. the formation of atmospheric clouds or the processes inside steam turbines, where a detailed knowledge of the dynamics of condensation processes will help to optimize energy efficiency and avoid problems with droplets of macroscopic size. The key properties of these processes are the nucleation rate and the critical cluster size. For the calculation of these properties it is essential to make use of a meaningful definition of molecular clusters, which is currently not a completely resolved issue. In this paper a framework capable of interactively visualizing molecular datasets of such nucleation simulations is presented, with an emphasis on the detected molecular clusters. To check the quality of the results of the cluster detection, our framework introduces the concept of flow groups to highlight potential cluster evolution over time which is not detected by the employed algorithm. To confirm the findings of the visual analysis, we coupled the rendering view with a schematic view of the clusters' evolution. This allows rapid assessment of the quality of the molecular cluster detection algorithm and identification of locations in the simulation data, in space as well as in time, where the cluster detection fails. Thus, thermodynamics researchers can eliminate weaknesses in their cluster detection algorithms. Several examples of the effective and efficient usage of our tool are presented. PMID:17968118

  7. A Flocking Based algorithm for Document Clustering Analysis

    SciTech Connect

    Cui, Xiaohui; Gao, Jinzhu; Potok, Thomas E

    2006-01-01

    Social animals or insects in nature often exhibit a form of emergent collective behavior known as flocking. In this paper, we present a novel Flocking based approach for document clustering analysis. Our Flocking clustering algorithm uses stochastic and heuristic principles discovered from observing bird flocks or fish schools. Unlike other partition clustering algorithms such as K-means, the Flocking based algorithm does not require initial partitional seeds. The algorithm generates a clustering of a given set of data through the embedding of the high-dimensional data items on a two-dimensional grid for easy clustering result retrieval and visualization. Inspired by the self-organized behavior of bird flocks, we represent each document object with a flock boid. The simple local rules followed by each flock boid result in the entire document flock generating complex global behaviors, which eventually result in a clustering of the documents. We evaluate the efficiency of our algorithm with both a synthetic dataset and a real document collection that includes 100 news articles collected from the Internet. Our results show that the Flocking clustering algorithm achieves better performance compared to the K-means and the Ant clustering algorithms for real document clustering.

  8. Automated analysis of organic particles using cluster SIMS

    NASA Astrophysics Data System (ADS)

    Gillen, Greg; Zeissler, Cindy; Mahoney, Christine; Lindstrom, Abigail; Fletcher, Robert; Chi, Peter; Verkouteren, Jennifer; Bright, David; Lareau, Richard T.; Boldman, Mike

    2004-06-01

    Cluster primary ion bombardment combined with secondary ion imaging is used on an ion microscope secondary ion mass spectrometer for the spatially resolved analysis of organic particles on various surfaces. Compared to the use of monoatomic primary ion beam bombardment, the use of a cluster primary ion beam (SF5+ or C8-) provides significant improvement in molecular ion yields and a reduction in beam-induced degradation of the analyte molecules. These characteristics of cluster bombardment, along with automated sample stage control and custom image analysis software, are utilized to rapidly characterize the spatial distribution of trace explosive particles, narcotics and inkjet-printed microarrays on a variety of surfaces.

  9. Logistics Enterprise Evaluation Model Based On Fuzzy Clustering Analysis

    NASA Astrophysics Data System (ADS)

    Fu, Pei-hua; Yin, Hong-bo

    In this thesis, we introduce an evaluation model for logistics enterprises based on a fuzzy clustering algorithm. First, we present the evaluation index system, which contains basic information, management level, technical strength, transport capacity, informatization level, market competition and customer service. We determine the index weights according to the grades and evaluate the integrated ability of the logistics enterprises using the fuzzy cluster analysis method. We describe the system evaluation module and the cluster analysis module in detail and explain how the two modules were implemented. Finally, we give the result of the system.

  10. Identification of chronic rhinosinusitis phenotypes using cluster analysis

    PubMed Central

    Soler, Zachary M.; Hyer, J. Madison; Ramakrishnan, Viswanathan; Smith, Timothy L.; Mace, Jess; Rudmik, Luke; Schlosser, Rodney J.

    2015-01-01

    Introduction Current clinical classifications of chronic rhinosinusitis (CRS) have been largely defined based upon preconceived notions of factors thought to be important, such as polyp or eosinophil status. Unfortunately, these classification systems have little correlation with symptom severity or treatment outcomes. Unsupervised clustering can be used to identify phenotypic subgroups of CRS patients, describe clinical differences in these clusters and define simple algorithms for classification. Methods A multi-institutional, prospective study of 382 patients with CRS who had failed initial medical therapy completed the SinoNasal Outcome Test (SNOT-22), Rhinosinusitis Disability Index (RSDI), Short Form-12 (SF-12), Pittsburgh Sleep Quality Index (PSQI), and Patient Health Questionnaire (PHQ-2). Objective measures of CRS severity included Brief Smell Identification Test (B-SIT), CT and endoscopy scoring. All variables were reduced and unsupervised hierarchical clustering was performed. After clusters were defined, variations in medication usage were analyzed. Discriminant analysis was performed to develop a simplified, clinically useful algorithm for clustering. Results Clustering was largely determined by age, severity of patient reported outcome measures, depression and fibromyalgia. CT and endoscopy varied somewhat among clusters. Traditional clinical measures including polyp/atopic status, prior surgery, B-SIT and asthma did not vary among clusters. A simplified algorithm based upon productivity loss, SNOT-22 score and age predicted clustering with 89% accuracy. Medication usage among clusters did vary significantly. Discussion A simplified algorithm based upon hierarchical clustering is able to classify CRS patients and predict medication usage. Further studies are warranted to determine if such clustering predicts treatment outcomes. PMID:25694390

  11. Effects of Group Size and Lack of Sphericity on the Recovery of Clusters in K-Means Cluster Analysis

    ERIC Educational Resources Information Center

    de Craen, Saskia; Commandeur, Jacques J. F.; Frank, Laurence E.; Heiser, Willem J.

    2006-01-01

    K-means cluster analysis is known for its tendency to produce spherical and equally sized clusters. To assess the magnitude of these effects, a simulation study was conducted, in which populations were created with varying departures from sphericity and group sizes. An analysis of the recovery of clusters in the samples taken from these…

  12. Using cluster analysis to organize and explore regional GPS velocities

    USGS Publications Warehouse

    Simpson, Robert W.; Thatcher, Wayne; Savage, James C.

    2012-01-01

    Cluster analysis offers a simple visual exploratory tool for the initial investigation of regional Global Positioning System (GPS) velocity observations, which are providing increasingly precise mappings of actively deforming continental lithosphere. The deformation fields from dense regional GPS networks can often be concisely described in terms of relatively coherent blocks bounded by active faults, although the choice of blocks, their number and size, can be subjective and is often guided by the distribution of known faults. To illustrate our method, we apply cluster analysis to GPS velocities from the San Francisco Bay Region, California, to search for spatially coherent patterns of deformation, including evidence of block-like behavior. The clustering process identifies four robust groupings of velocities that we identify with four crustal blocks. Although the analysis uses no prior geologic information other than the GPS velocities, the cluster/block boundaries track three major faults, both locked and creeping.

  13. Comparative analysis of genomic signal processing for microarray data clustering.

    PubMed

    Istepanian, Robert S H; Sungoor, Ala; Nebel, Jean-Christophe

    2011-12-01

    Genomic signal processing is a new area of research that combines advanced digital signal processing methodologies for enhanced genetic data analysis. It has many promising applications in bioinformatics and next generation of healthcare systems, in particular, in the field of microarray data clustering. In this paper we present a comparative performance analysis of enhanced digital spectral analysis methods for robust clustering of gene expression across multiple microarray data samples. Three digital signal processing methods: linear predictive coding, wavelet decomposition, and fractal dimension are studied to provide a comparative evaluation of the clustering performance of these methods on several microarray datasets. The results of this study show that the fractal approach provides the best clustering accuracy compared to other digital signal processing and well known statistical methods.

  14. Automated classification of visible and infrared spectra using cluster analysis

    NASA Astrophysics Data System (ADS)

    Marzo, G. A.; Roush, T. L.; Hogan, R. C.

    2009-08-01

    Planetary space experiments collect large volumes of data whose scientific content requires understanding. Marzo et al. (2006) presented an unsupervised cluster analysis scheme that is able to reduce a spectral data set to a few clusters, allowing for more focused and rapid evaluation of their scientific meaning. Here, we extend the original approach to account for the measurement uncertainty and build a classification scheme. We apply the clustering technique to the ASTER and RELAB libraries of visible and infrared spectral reflectance. These spectral libraries are documented, allowing assignment of a label to each spectrum reflecting its physical and chemical properties. We assess the ability of the original and extended approaches to identify natural clusters of the library spectra and estimate associated uncertainties of the results. We evaluate the scientific meaning of the derived clusters based on the labels contained within each cluster. Once the cluster meanings are defined, we test our classification scheme using a training-testing approach and evaluate the accuracy of assigning the unknown spectra to the correct cluster.

  15. A Distributed Flocking Approach for Information Stream Clustering Analysis

    SciTech Connect

    Cui, Xiaohui; Potok, Thomas E

    2006-01-01

    Intelligence analysts are currently overwhelmed by the amount of information streams generated every day. There is a lack of comprehensive tools that can analyze the information streams in real time. Document clustering analysis plays an important role in improving the accuracy of information retrieval. However, most clustering technologies can only be applied to static document collections because they normally require a large amount of computational resources and a long time to get an accurate result. It is very difficult to cluster dynamically changing text information streams on an individual computer. Our early research has resulted in a dynamic reactive flock clustering algorithm which can continually refine the clustering result and quickly react to changes in document contents. This characteristic makes the algorithm suitable for cluster analysis of dynamically changing document information, such as text information streams. Because of the decentralized character of this algorithm, a distributed approach is a very natural way to increase the clustering speed of the algorithm. In this paper, we present a distributed multi-agent flocking approach for text information stream clustering and discuss the decentralized architectures and communication schemes for load balance and status information synchronization in this approach.

  16. Cluster analysis of Southeastern U.S. climate stations

    NASA Astrophysics Data System (ADS)

    Stooksbury, D. E.; Michaels, P. J.

    1991-09-01

    A two-step cluster analysis of 449 Southeastern climate stations is used to objectively determine general climate clusters (groups of climate stations) for eight southeastern states. The purpose is objectively to define regions of climatic homogeneity that should perform more robustly in subsequent climatic impact models. This type of analysis has been successfully used in many related climate research problems including the determination of corn/climate districts in Iowa (Ortiz-Valdez, 1985) and the classification of synoptic climate types (Davis, 1988). These general climate clusters may be more appropriate for climate research than the standard climate divisions (CD) groupings of climate stations, which are modifications of the agro-economic United States Department of Agriculture crop reporting districts. Unlike the CD's, these objectively determined climate clusters are not restricted by state borders and thus have reduced multicollinearity which makes them more appropriate for the study of the impact of climate and climatic change.

  17. GE-Miner: integration of cluster ensemble and text mining for comprehensive gene expression analysis.

    PubMed

    Hu, Xiaohua

    2006-01-01

    Generating high-quality gene clusters and identifying the underlying biological mechanisms of those clusters are important goals of gene expression clustering analysis. Based on this consideration, we design and develop a unified system, Gene Expression Miner (GE-Miner), which integrates cluster ensemble, text clustering and multidocument summarisation and provides an environment for comprehensive gene expression data analysis. Experimental results demonstrate that our system can obtain high-quality clusters and provide concise and informative textual summaries for the gene clusters.

  18. The Enhanced Hoshen-Kopelman Algorithm for Cluster Analysis

    NASA Astrophysics Data System (ADS)

    Hoshen, Joseph

    1997-08-01

    In 1976 Hoshen and Kopelman (J. Hoshen and R. Kopelman, Phys. Rev. B 14, 3438 (1976)) introduced a breakthrough algorithm, known today as the Hoshen-Kopelman (HK) algorithm, for cluster analysis. This algorithm revolutionized Monte Carlo cluster calculations in percolation theory as it enables analysis of very large lattices containing 10^11 or more sites. Initially the HK algorithm's primary use was in the domain of pure and basic sciences. Later it began finding applications in diverse fields of technology and applied sciences. Examples of such applications are two- and three-dimensional image analysis, composite material modeling, polymers, remote sensing, brain modeling and food processing. While the original HK algorithm provides only cluster size data for only one class of sites, the Enhanced HK (EHK) algorithm, presented in this paper, enables calculations of cluster spatial moments -- characteristics of cluster shapes -- for multiple classes of sites. These enhancements preserve the time and space complexities of the original HK algorithm, such that very large lattices can still be analyzed simultaneously in a single pass through the lattice for cluster sizes, classes and shapes.
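
    As a rough illustration of the cluster-labelling idea behind the HK algorithm, the sketch below labels 4-connected clusters of occupied sites on a small 2D lattice with a raster scan and a union-find table. It is a simplified teaching version, not the EHK algorithm of the paper: the function name `hoshen_kopelman`, the list-of-lists grid, and the separate relabelling pass (used here to collect cluster sizes) are assumptions made for readability, and spatial moments and multiple site classes are not computed.

```python
def hoshen_kopelman(grid):
    """Label 4-connected clusters of occupied (truthy) sites in a 2D grid.

    Returns (labels, sizes): labels holds 0 for empty sites and a cluster id
    otherwise; sizes maps each cluster id to its number of sites.
    """
    parent = [0]  # parent[i] is the union-find parent of label i; index 0 is unused

    def find(x):
        # Walk up to the root label, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]

    # Raster scan: only the already-visited neighbours (up, left) are inspected.
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c]:
                continue
            up = labels[r - 1][c] if r > 0 else 0
            left = labels[r][c - 1] if c > 0 else 0
            if not up and not left:
                parent.append(len(parent))  # start a new provisional cluster
                labels[r][c] = len(parent) - 1
            elif up and left:
                root_up, root_left = find(up), find(left)
                root = min(root_up, root_left)
                parent[max(root_up, root_left)] = root  # merge the two clusters
                labels[r][c] = root
            else:
                labels[r][c] = find(up or left)

    # Second pass (a simplification): resolve provisional labels, count sizes.
    sizes = {}
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                root = find(labels[r][c])
                labels[r][c] = root
                sizes[root] = sizes.get(root, 0) + 1
    return labels, sizes
```

    Calling `hoshen_kopelman([[1, 0, 1], [1, 1, 1]])` treats the five occupied sites as one cluster, because the two provisional labels created in the first row are merged when the scan reaches the last site of the second row.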

  19. Multivariate Analysis of the Globular Clusters in M87

    NASA Astrophysics Data System (ADS)

    Das, Sukanta; Chattopadhayay, Tanuka; Davoust, Emmanuel

    2015-11-01

    An objective classification of 147 globular clusters (GCs) in the inner region of the giant elliptical galaxy M87 is carried out with the help of two methods of multivariate analysis. First, independent component analysis (ICA) is used to determine a set of independent variables that are linear combinations of various observed parameters (mostly Lick indices) of the GCs. Next, K-means cluster analysis (CA) is applied to the independent components (ICs), to find the optimum number of homogeneous groups having an underlying structure. The properties of the four groups of GCs thus uncovered are used to explain the formation mechanism of the host galaxy. It is suggested that M87 formed in two successive phases. The first was a monolithic collapse, which gave rise to an inner group of metal-rich clusters with little systematic rotation and an outer group of metal-poor clusters in eccentric orbits. In a second phase, the galaxy accreted low-mass satellites in a dissipationless fashion, from the gas of which the two other groups of GCs formed. Evidence is given for a blue stellar population in the more metal-rich clusters, which we interpret as helium enrichment. Finally, it is found that the clusters of M87 differ in some of their chemical properties (NaD, TiO1, light-element abundances) from GCs in our Galaxy and M31.
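
    The K-means step mentioned above can be sketched in a few lines. This is a generic Lloyd-style K-means in plain Python, offered only as an illustration: the function name `kmeans`, the random initialisation from the data points, and the fixed `seed` are assumptions, the ICA step is omitted, and the choice of the optimum number of groups is not addressed.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd-style K-means on a list of equal-length numeric tuples.

    Returns (assignment, centroids), where assignment[i] is the index of the
    centroid closest to points[i].
    """
    rng = random.Random(seed)
    # Assumed initialisation: k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = None
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        new_assignment = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the procedure has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids
```

    In a pipeline like the one described, each input tuple would hold the independent-component values of one globular cluster rather than raw observed parameters.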

  20. Application of Subspace Clustering in DNA Sequence Analysis.

    PubMed

    Wallace, Tim; Sekmen, Ali; Wang, Xiaofei

    2015-10-01

    Identification and clustering of orthologous genes plays an important role in developing evolutionary models such as validating convergent and divergent phylogeny and predicting functional proteins in newly sequenced species of unverified nucleotide protein mappings. Here, we introduce an application of subspace clustering as applied to orthologous gene sequences and discuss the initial results. The working hypothesis is based upon the concept that genetic changes between nucleotide sequences coding for proteins among selected species and groups may lie within a union of subspaces for clusters of the orthologous groups. Estimates for the subspace dimensions were computed for a small population sample. A series of experiments was performed to cluster randomly selected sequences. The experimental design allows for both false positives and false negatives, and estimates for the statistical significance are provided. The clustering results are consistent with the main hypothesis. A simple random mutation binary tree model is used to simulate speciation events that show the interdependence of the subspace rank versus time and mutation rates. The simple mutation model is found to be largely consistent with the observed subspace clustering singular value results. Our study indicates that the subspace clustering method may be applied in orthology analysis. PMID:26162018

  1. Open-box spectral clustering: applications to medical image analysis.

    PubMed

    Schultz, Thomas; Kindlmann, Gordon L

    2013-12-01

    Spectral clustering is a powerful and versatile technique, whose broad range of applications includes 3D image analysis. However, its practical use often involves a tedious and time-consuming process of tuning parameters and making application-specific choices. In the absence of training data with labeled clusters, help from a human analyst is required to decide the number of clusters, to determine whether hierarchical clustering is needed, and to define the appropriate distance measures, parameters of the underlying graph, and type of graph Laplacian. We propose to simplify this process via an open-box approach, in which an interactive system visualizes the involved mathematical quantities, suggests parameter values, and provides immediate feedback to support the required decisions. Our framework focuses on applications in 3D image analysis, and links the abstract high-dimensional feature space used in spectral clustering to the three-dimensional data space. This provides a better understanding of the technique, and helps the analyst predict how well specific parameter settings will generalize to similar tasks. In addition, our system supports filtering outliers and labeling the final clusters in such a way that user actions can be recorded and transferred to different data in which the same structures are to be found. Our system supports a wide range of inputs, including triangular meshes, regular grids, and point clouds. We use our system to develop segmentation protocols in chest CT and brain MRI that are then successfully applied to other datasets in an automated manner.

  2. Bayesian Analysis of Two Stellar Populations in Galactic Globular Clusters III: Analysis of 30 Clusters

    NASA Astrophysics Data System (ADS)

    Wagner-Kaiser, R.; Stenning, D. C.; Sarajedini, A.; von Hippel, T.; van Dyk, D. A.; Robinson, E.; Stein, N.; Jefferys, W. H.

    2016-09-01

    We use Cycle 21 Hubble Space Telescope (HST) observations and HST archival ACS Treasury observations of 30 Galactic Globular Clusters to characterize two distinct stellar populations. A sophisticated Bayesian technique is employed to simultaneously sample the joint posterior distribution of age, distance, and extinction for each cluster, as well as unique helium values for two populations within each cluster and the relative proportion of those populations. We find the helium differences among the two populations in the clusters fall in the range of ~0.04 to 0.11. Because adequate models varying in CNO are not presently available, we view these spreads as upper limits and present them with statistical rather than observational uncertainties. Evidence supports previous studies suggesting an increase in helium content concurrent with increasing mass of the cluster, and we also find that the proportion of the first population of stars increases with mass as well. Our results are examined in the context of proposed globular cluster formation scenarios. Additionally, we leverage our Bayesian technique to shed light on inconsistencies between the theoretical models and the observed data.

  3. Cluster Analysis of Clinical Data Identifies Fibromyalgia Subgroups

    PubMed Central

    Docampo, Elisa; Collado, Antonio; Escaramís, Geòrgia; Carbonell, Jordi; Rivera, Javier; Vidal, Javier; Alegre, José

    2013-01-01

    Introduction Fibromyalgia (FM) is mainly characterized by widespread pain and multiple accompanying symptoms, which hinder FM assessment and management. In order to reduce FM heterogeneity we classified clinical data into simplified dimensions that were used to define FM subgroups. Material and Methods 48 variables were evaluated in 1,446 Spanish FM cases fulfilling 1990 ACR FM criteria. A partitioning analysis was performed to find groups of variables similar to each other. Similarities between variables were identified and the variables were grouped into dimensions. This was performed in a subset of 559 patients, and cross-validated in the remaining 887 patients. For each sample and dimension, a composite index was obtained based on the weights of the variables included in the dimension. Finally, a clustering procedure was applied to the indexes, resulting in FM subgroups. Results Variables clustered into three independent dimensions: “symptomatology”, “comorbidities” and “clinical scales”. Only the two first dimensions were considered for the construction of FM subgroups. Resulting scores classified FM samples into three subgroups: low symptomatology and comorbidities (Cluster 1), high symptomatology and comorbidities (Cluster 2), and high symptomatology but low comorbidities (Cluster 3), showing differences in measures of disease severity. Conclusions We have identified three subgroups of FM samples in a large cohort of FM by clustering clinical data. Our analysis stresses the importance of family and personal history of FM comorbidities. Also, the resulting patient clusters could indicate different forms of the disease, relevant to future research, and might have an impact on clinical assessment. PMID:24098674

  4. Mokken Scale Analysis Using Hierarchical Clustering Procedures

    ERIC Educational Resources Information Center

    van Abswoude, Alexandra A. H.; Vermunt, Jeroen K.; Hemker, Bas T.; van der Ark, L. Andries

    2004-01-01

    Mokken scale analysis (MSA) can be used to assess and build unidimensional scales from an item pool that is sensitive to multiple dimensions. These scales satisfy a set of scaling conditions, one of which follows from the model of monotone homogeneity. An important drawback of the MSA program is that the sequential item selection and scale…

  5. Unsupervised feature relevance analysis applied to improve ECG heartbeat clustering.

    PubMed

    Rodríguez-Sotelo, J L; Peluffo-Ordoñez, D; Cuesta-Frau, D; Castellanos-Domínguez, G

    2012-10-01

    The computer-assisted analysis of biomedical records has become an essential tool in clinical settings. However, current devices provide a growing amount of data that often exceeds the processing capacity of normal computers. As this amount of information rises, new demands for more efficient data extracting methods appear. This paper addresses the task of data mining in physiological records using a feature selection scheme. An unsupervised method based on relevance analysis is described. This scheme uses a least-squares optimization of the input feature matrix in a single iteration. The output of the algorithm is a feature weighting vector. The performance of the method was assessed using a heartbeat clustering test on real ECG records. The quantitative cluster validity measures yielded a correctly classified heartbeat rate of 98.69% (specificity), 85.88% (sensitivity) and 95.04% (general clustering performance), which is even higher than the performance achieved by other similar ECG clustering studies. The number of features was reduced on average from 100 to 18, and the temporal cost was 43% lower than in previous ECG clustering schemes. PMID:22672933

  6. Kinematic gait patterns in healthy runners: A hierarchical cluster analysis.

    PubMed

    Phinyomark, Angkoon; Osis, Sean; Hettinga, Blayne A; Ferber, Reed

    2015-11-01

    Previous studies have demonstrated distinct clusters of gait patterns in both healthy and pathological groups, suggesting that different movement strategies may be represented. However, these studies have used discrete time point variables and usually focused on only one specific joint and plane of motion. Therefore, the first purpose of this study was to determine if running gait patterns for healthy subjects could be classified into homogeneous subgroups using three-dimensional kinematic data from the ankle, knee, and hip joints. The second purpose was to identify differences in joint kinematics between these groups. The third purpose was to investigate the practical implications of clustering healthy subjects by comparing these kinematics with runners experiencing patellofemoral pain (PFP). A principal component analysis (PCA) was used to reduce the dimensionality of the entire gait waveform data and then a hierarchical cluster analysis (HCA) determined group sets of similar gait patterns and homogeneous clusters. The results show two distinct running gait patterns were found with the main between-group differences occurring in frontal and sagittal plane knee angles (P<0.001), independent of age, height, weight, and running speed. When these two groups were compared to PFP runners, one cluster exhibited greater while the other exhibited reduced peak knee abduction angles (P<0.05). The variability observed in running patterns across this sample could be the result of different gait strategies. These results suggest care must be taken when selecting samples of subjects in order to investigate the pathomechanics of injured runners.
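
    The hierarchical clustering step can be illustrated with a naive agglomerative procedure. The sketch below is an assumption-laden example rather than the study's method: the abstract does not state the linkage used, so average linkage with Euclidean distance is chosen here for simplicity, the name `agglomerative` is hypothetical, and the PCA step that would produce the input scores is omitted.

```python
import math


def agglomerative(points, n_clusters):
    """Naive agglomerative clustering with average linkage (assumed choice).

    Starts with one cluster per point and repeatedly merges the closest pair
    until n_clusters remain. Returns a list of sorted index lists.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Average Euclidean distance over all cross-cluster pairs of points.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > max(n_clusters, 1):
        # Exhaustive search for the closest pair of clusters.
        _, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]
    return [sorted(c) for c in clusters]
```

    Here each point would be one runner's principal-component scores. The exhaustive pair search is only practical for small samples, which is acceptable for an illustration but not for large data sets.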

  8. Phage cluster relationships identified through single gene analysis

    PubMed Central

    2013-01-01

    Background Phylogenetic comparison of bacteriophages requires whole genome approaches such as dotplot analysis, genome pairwise maps, and gene content analysis. Currently mycobacteriophages, a highly studied phage group, are categorized into related clusters based on the comparative analysis of whole genome sequences. With the recent explosion of phage isolation, a simple method for phage cluster prediction would facilitate analysis of crude or complex samples without whole genome isolation and sequencing. The hypothesis of this study was that mycobacteriophage-cluster prediction is possible using comparison of a single, ubiquitous, semi-conserved gene. Tape Measure Protein (TMP) was selected to test the hypothesis because it is typically the longest gene in mycobacteriophage genomes and because regions within the TMP gene are conserved. Results A single gene, TMP, identified the known Mycobacteriophage clusters and subclusters using a Gepard dotplot comparison or a phylogenetic tree constructed from global alignment and maximum likelihood comparisons. Gepard analysis of 247 mycobacteriophage TMP sequences appropriately recovered 98.8% of the subcluster assignments that were made by whole-genome comparison. Subcluster-specific primers within TMP allow for PCR determination of the mycobacteriophage subcluster from DNA samples. Using the single-gene comparison approach for siphovirus coliphages, phage groupings by TMP comparison reflected relationships observed in a whole genome dotplot comparison and confirm the potential utility of this approach to another widely studied group of phages. Conclusions TMP sequence comparison and PCR results support the hypothesis that a single gene can be used for distinguishing phage cluster and subcluster assignments. TMP single-gene analysis can quickly and accurately aid in mycobacteriophage classification. PMID:23777341

  9. A Cluster Analysis of Personality Style in Adults with ADHD

    ERIC Educational Resources Information Center

    Robin, Arthur L.; Tzelepis, Angela; Bedway, Marquita

    2008-01-01

    Objective: The purpose of this study was to use hierarchical linear cluster analysis to examine the normative personality styles of adults with ADHD. Method: A total of 311 adults with ADHD completed the Millon Index of Personality Styles, which consists of 24 scales assessing motivating aims, cognitive modes, and interpersonal behaviors. Results:…

  10. Influence of Scholarships on STEM Teachers: Cluster Analysis and Characteristics

    ERIC Educational Resources Information Center

    Liou, Pey-Yan; Desjardins, Christopher David; Lawrenz, Frances

    2010-01-01

    Science, technology, engineering, and mathematics (STEM) teachers' perceptions about the influence of scholarship on their decision to teach and to teach in a high-needs school were examined using cluster analysis. Three hundred and four STEM scholars, who were currently teaching, and who received funding from 45 institutions located throughout…

  11. Language Learner Motivational Types: A Cluster Analysis Study

    ERIC Educational Resources Information Center

    Papi, Mostafa; Teimouri, Yasser

    2014-01-01

    The study aimed to identify different second language (L2) learner motivational types drawing on the framework of the L2 motivational self system. A total of 1,278 secondary school students learning English in Iran completed a questionnaire survey. Cluster analysis yielded five different groups based on the strength of different variables within…

  12. K-means cluster analysis and seismicity partitioning for Pakistan

    NASA Astrophysics Data System (ADS)

    Rehman, Khaista; Burton, Paul W.; Weatherill, Graeme A.

    2014-07-01

    Pakistan and the western Himalaya is a region of high seismic activity located at the triple junction between the Arabian, Eurasian and Indian plates. Four devastating earthquakes have resulted in significant numbers of fatalities in Pakistan and the surrounding region in the past century (Quetta, 1935; Makran, 1945; Pattan, 1974 and the recent 2005 Kashmir earthquake). It is therefore necessary to develop an understanding of the spatial distribution of seismicity and the potential seismogenic sources across the region. This forms an important basis for the calculation of seismic hazard; a crucial input in seismic design codes needed to begin to effectively mitigate the high earthquake risk in Pakistan. The development of seismogenic source zones for seismic hazard analysis is driven by both geological and seismotectonic inputs. Despite the many developments in seismic hazard in recent decades, the manner in which seismotectonic information feeds the definition of the seismic source can, in many parts of the world including Pakistan and the surrounding regions, remain a subjective process driven primarily by expert judgment. Whilst much research is ongoing to map and characterise active faults in Pakistan, knowledge of the seismogenic properties of the active faults is still incomplete in much of the region. Consequently, seismicity, both historical and instrumental, remains a primary guide to the seismogenic sources of Pakistan. This study utilises a cluster analysis approach for the purposes of identifying spatial differences in seismicity, which can be utilised to form a basis for delineating seismogenic source regions. An effort is made to examine seismicity partitioning for Pakistan with respect to earthquake database, seismic cluster analysis and seismic partitions in a seismic hazard context. A magnitude-homogeneous earthquake catalogue has been compiled using various available earthquake data. The earthquake catalogue covers a time span from 1930 to 2007 and

  13. Structural cluster analysis of chemical reactions in solution

    NASA Astrophysics Data System (ADS)

    Gallet, Grégoire A.; Pietrucci, Fabio

    2013-08-01

    We introduce a simple and general approach to the problem of clustering structures from atomic trajectories of chemical reactions in solution. By considering distance metrics which are invariant under permutation of identical atoms or molecules, we demonstrate that it is possible to automatically resolve as distinct structural clusters the configurations corresponding to reactants, products, and transition states, even in the presence of atom exchanges and of hundreds of solvent molecules. Our approach strongly simplifies the analysis of large trajectories and it opens the way to the construction of kinetic network models of activated processes in solution employing the available efficient schemes developed for protein conformational ensembles.
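
    To make the notion of a permutation-invariant metric concrete, the sketch below compares two configurations through their sorted lists of interatomic distances, so that relabelling identical atoms leaves the result unchanged. This construction is an illustrative assumption, not the metric defined in the paper: the helper names are hypothetical, all atoms are treated as a single species, and periodic boundary conditions are ignored.

```python
import math
from itertools import combinations


def sorted_distance_vector(coords):
    """Sorted list of all pairwise distances in one configuration."""
    return sorted(math.dist(a, b) for a, b in combinations(coords, 2))


def permutation_invariant_distance(frame_a, frame_b):
    """Euclidean distance between the sorted distance vectors of two frames.

    Sorting removes any dependence on atom ordering, so swapping identical
    atoms within a frame does not change the value.
    """
    va = sorted_distance_vector(frame_a)
    vb = sorted_distance_vector(frame_b)
    if len(va) != len(vb):
        raise ValueError("frames must contain the same number of atoms")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(va, vb)))
```

    A matrix of such distances between trajectory frames could then be passed to any standard clustering routine.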

  14. Cluster analysis of movement patterns in multiarticular actions: a tutorial.

    PubMed

    Rein, Robert; Button, Chris; Davids, Keith; Summers, Jeffery

    2010-04-01

    The present paper proposes a technical analysis method for extracting information about movement patterning in studies of motor control, based on a cluster analysis of movement kinematics. In a tutorial fashion, data from three different experiments are presented to exemplify and validate the technical method. When applied to three different basketball-shooting techniques, the method clearly distinguished between the different patterns. When applied to a cyclical wrist supination-pronation task, the cluster analysis provided the same results as an analysis using the conventional discrete relative phase measure. Finally, when analyzing throwing performance constrained by distance to target, the method grouped movement patterns together according to throwing distance. In conclusion, the proposed technical method provides a valuable tool to improve understanding of coordination and control in different movement models, including multiarticular actions.

  15. Outcome-Driven Cluster Analysis with Application to Microarray Data.

    PubMed

    Hsu, Jessie J; Finkelstein, Dianne M; Schoenfeld, David A

    2015-01-01

    One goal of cluster analysis is to sort characteristics into groups (clusters) so that those in the same group are more highly correlated to each other than they are to those in other groups. An example is the search for groups of genes whose expression of RNA is correlated in a population of patients. These genes would be of greater interest if their common level of RNA expression were additionally predictive of the clinical outcome. This issue arose in the context of a study of trauma patients on whom RNA samples were available. The question of interest was whether there were groups of genes that were behaving similarly, and whether each gene in the cluster would have a similar effect on who would recover. For this, we develop an algorithm to simultaneously assign characteristics (genes) into groups of highly correlated genes that have the same effect on the outcome (recovery). We propose a random effects model where the genes within each group (cluster) equal the sum of a random effect, specific to the observation and cluster, and an independent error term. The outcome variable is a linear combination of the random effects of each cluster. To fit the model, we implement a Markov chain Monte Carlo algorithm based on the likelihood of the observed data. We evaluate the effect of including outcome in the model through simulation studies and describe a strategy for prediction. These methods are applied to trauma data from the Inflammation and Host Response to Injury research program, revealing a clustering of the genes that are informed by the recovery outcome. PMID:26562156
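
    The random effects model described in words above can be written compactly. The notation below is an assumption introduced for illustration: $x_{ij}$ is the expression of gene $j$ in observation $i$, $c(j)$ is the cluster containing gene $j$, $u_{ik}$ is the random effect for observation $i$ and cluster $k$, $\varepsilon_{ij}$ is the independent error term, $y_i$ is the outcome, and $\beta_k$ are the coefficients of the linear combination over $K$ clusters. Any intercept or outcome noise term in the full model is not stated in the abstract and is therefore left out.

```latex
x_{ij} = u_{i,\,c(j)} + \varepsilon_{ij},
\qquad
y_i = \sum_{k=1}^{K} \beta_k \, u_{ik}
```

    Under this reading, genes in the same cluster share the random effect $u_{ik}$, which is what makes them correlated with one another and gives them a common influence on the outcome.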

  16. Outcome-Driven Cluster Analysis with Application to Microarray Data

    PubMed Central

    Hsu, Jessie J.; Finkelstein, Dianne M.; Schoenfeld, David A.

    2015-01-01

    One goal of cluster analysis is to sort characteristics into groups (clusters) so that those in the same group are more highly correlated to each other than they are to those in other groups. An example is the search for groups of genes whose expression of RNA is correlated in a population of patients. These genes would be of greater interest if their common level of RNA expression were additionally predictive of the clinical outcome. This issue arose in the context of a study of trauma patients on whom RNA samples were available. The question of interest was whether there were groups of genes that were behaving similarly, and whether each gene in the cluster would have a similar effect on who would recover. For this, we develop an algorithm to simultaneously assign characteristics (genes) into groups of highly correlated genes that have the same effect on the outcome (recovery). We propose a random effects model where the genes within each group (cluster) equal the sum of a random effect, specific to the observation and cluster, and an independent error term. The outcome variable is a linear combination of the random effects of each cluster. To fit the model, we implement a Markov chain Monte Carlo algorithm based on the likelihood of the observed data. We evaluate the effect of including outcome in the model through simulation studies and describe a strategy for prediction. These methods are applied to trauma data from the Inflammation and Host Response to Injury research program, revealing a clustering of the genes that are informed by the recovery outcome. PMID:26562156

  17. Cluster coarsening during polymer collapse: Finite-size scaling analysis

    NASA Astrophysics Data System (ADS)

    Majumder, Suman; Janke, Wolfhard

    2015-06-01

    We study the kinetics of the collapse of a single flexible polymer when it is quenched from a good solvent to a poor solvent. Results obtained from Monte Carlo simulations show that the collapse occurs through a sequence of events with the formation, growth and subsequent coalescence of clusters of monomers to a single compact globule. Particular emphasis is given in this work to the cluster growth during the collapse, analyzed via the application of finite-size scaling techniques. The growth exponent obtained in our analysis is suggestive of the universal Lifshitz-Slyozov mechanism of cluster growth. The methods used in this work could be of more general validity and applicable to other phenomena such as protein folding.

  18. REGIONAL-SCALE WIND FIELD CLASSIFICATION EMPLOYING CLUSTER ANALYSIS

    SciTech Connect

    Glascoe, L G; Glaser, R E; Chin, H S; Loosmore, G A

    2004-06-17

    The classification of time-varying multivariate regional-scale wind fields at a specific location can assist event planning as well as consequence and risk analysis. Further, wind field classification involves data transformation and inference techniques that effectively characterize stochastic wind field variation. Such a classification scheme is potentially useful for addressing overall atmospheric transport uncertainty and meteorological parameter sensitivity issues. Different methods to classify wind fields over a location include the principal component analysis of wind data (e.g., Hardy and Walton, 1978) and the use of cluster analysis for wind data (e.g., Green et al., 1992; Kaufmann and Weber, 1996). The goal of this study is to use a clustering method to classify the winds of a gridded data set, i.e., from meteorological simulations generated by a forecast model.

  20. The Productivity Analysis of Chennai Automotive Industry Cluster

    NASA Astrophysics Data System (ADS)

    Bhaskaran, E.

    2014-07-01

    Chennai, also called the Detroit of India, is India's second fastest growing auto market and exports auto components and vehicles to US, Germany, Japan and Brazil. For inclusive growth and sustainable development, 250 auto component industries in Ambattur, Thirumalisai and Thirumudivakkam Industrial Estates located in Chennai have adopted the Cluster Development Approach called Automotive Component Cluster. The objective is to study the Value Chain, Correlation and Data Envelopment Analysis by determining technical efficiency, peer weights, input and output slacks of 100 auto component industries in three estates. The methodology adopted is using Data Envelopment Analysis of Output Oriented Banker Charnes Cooper model by taking net worth, fixed assets, employment as inputs and gross output as outputs. The non-zero represents the weights for efficient clusters. The higher slack obtained reveals the excess net worth, fixed assets, employment and shortage in gross output. To conclude, the variables are highly correlated and the inefficient industries should increase their gross output or decrease the fixed assets or employment. Moreover for sustainable development, the cluster should strengthen infrastructure, technology, procurement, production and marketing interrelationships to decrease costs and to increase productivity and efficiency to compete in the indigenous and export market.

  1. Full text clustering and relationship network analysis of biomedical publications.

    PubMed

    Guan, Renchu; Yang, Chen; Marchese, Maurizio; Liang, Yanchun; Shi, Xiaohu

    2014-01-01

    Rapid developments in the biomedical sciences have increased the demand for automatic clustering of biomedical publications. In contrast to current approaches to text clustering, which focus exclusively on the contents of abstracts, a novel method is proposed for clustering and analysis of complete biomedical article texts. To reduce dimensionality, Cosine Coefficient is used on a sub-space of only two vectors, instead of computing the Euclidean distance within the space of all vectors. Then a strategy and algorithm is introduced for Semi-supervised Affinity Propagation (SSAP) to improve analysis efficiency, using biomedical journal names as an evaluation background. Experimental results show that by avoiding high-dimensional sparse matrix computations, SSAP outperforms conventional k-means methods and improves upon the standard Affinity Propagation algorithm. In constructing a directed relationship network and distribution matrix for the clustering results, it can be noted that overlaps in scope and interests among BioMed publications can be easily identified, providing a valuable analytical tool for editors, authors and readers. PMID:25250864

  2. Transcriptional analysis of exopolysaccharides biosynthesis gene clusters in Lactobacillus plantarum.

    PubMed

    Vastano, Valeria; Perrone, Filomena; Marasco, Rosangela; Sacco, Margherita; Muscariello, Lidia

    2016-04-01

    Exopolysaccharides (EPS) from lactic acid bacteria contribute to specific rheology and texture of fermented milk products and find applications also in non-dairy foods and in therapeutics. Recently, four clusters of genes (cps) associated with surface polysaccharide production have been identified in Lactobacillus plantarum WCFS1, a probiotic and food-associated lactobacillus. These clusters are involved in cell surface architecture and probably in release and/or exposure of immunomodulating bacterial molecules. Here we show a transcriptional analysis of these clusters. Indeed, RT-PCR experiments revealed that the cps loci are organized in five operons. Moreover, by reverse transcription-qPCR analysis performed on L. plantarum WCFS1 (wild type) and WCFS1-2 (ΔccpA), we demonstrated that expression of three cps clusters is under the control of the global regulator CcpA. These results, together with the identification of putative CcpA target sequences (catabolite responsive element CRE) in the regulatory region of four out of five transcriptional units, strongly suggest for the first time a role of the master regulator CcpA in EPS gene transcription among lactobacilli.

  3. The Quantitative Analysis of Chennai Automotive Industry Cluster

    NASA Astrophysics Data System (ADS)

    Bhaskaran, Ethirajan

    2016-07-01

    Chennai is also called the Detroit of India because of the presence of an automotive industry producing over 40 % of India's vehicles and components. During 2001-2002, the Automotive Component Industries (ACI) in the Ambattur, Thirumalizai and Thirumudivakkam Industrial Estates, Chennai, faced problems with infrastructure, technology, procurement, production and marketing. The objective is to study the Quantitative Performance of the Chennai Automotive Industry Cluster before (2001-2002) and after the CDA (2008-2009). The methodology adopted is collection of primary data from 100 ACI using a quantitative questionnaire and analysis using Correlation Analysis (CA), Regression Analysis (RA), the Friedman Test (FMT), and the Kruskal-Wallis Test (KWT). The CA computed for the different sets of variables reveals that there is a high degree of relationship between the variables studied. The RA models constructed establish the strong relationship between the dependent variable and a host of independent variables. The models proposed here reveal the approximate relationship in a closer form. The KWT shows that there is no significant difference between the three location clusters with respect to Net Profit, Production Cost, Marketing Costs, Procurement Costs and Gross Output. This supports the view that each location has contributed uniformly to the development of the automobile component cluster. The FMT shows that there is no significant difference between industrial units with respect to costs such as Production, Infrastructure, Technology, Marketing and Net Profit. To conclude, the Automotive Industries have fully utilized the Physical Infrastructure and Centralised Facilities by adopting the CDA and are now exporting their products to North America, South America, Europe, Australia, Africa and Asia. The value chain analysis models have been implemented in all the cluster units. This Cluster Development Approach (CDA) model can be implemented in industries of under developed and developing countries for cost reduction and productivity
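
    As a rough illustration of the Kruskal-Wallis comparison across three locations mentioned in this abstract, the sketch below computes the H statistic from pooled ranks. The function names and the sample values are hypothetical and are not taken from the study; ties receive average ranks, and no tie correction is applied.

```python
from itertools import chain


def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1), without tie correction."""
    pooled = list(chain.from_iterable(groups))
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    weighted = 0.0
    offset = 0
    for group in groups:
        rank_sum = sum(ranks[offset:offset + len(group)])
        weighted += rank_sum ** 2 / len(group)
        offset += len(group)
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3.0 * (n_total + 1)


# Hypothetical gross-output figures for three estates (illustrative only).
estates = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
h_statistic = kruskal_wallis_h(estates)
print(round(h_statistic, 3))
```

    With three groups, H is compared against a chi-square distribution with 2 degrees of freedom (critical value about 5.99 at the 5 % level), so a small H such as the one here would be read as no significant difference between locations.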

  4. Bayesian Analysis of Multiple Populations in Galactic Globular Clusters

    NASA Astrophysics Data System (ADS)

    Wagner-Kaiser, Rachel A.; Sarajedini, Ata; von Hippel, Ted; Stenning, David; Piotto, Giampaolo; Milone, Antonino; van Dyk, David A.; Robinson, Elliot; Stein, Nathan

    2016-01-01

    We use GO 13297 Cycle 21 Hubble Space Telescope (HST) observations and archival GO 10775 Cycle 14 HST ACS Treasury observations of Galactic Globular Clusters to find and characterize multiple stellar populations. Determining how globular clusters are able to create and retain enriched material to produce several generations of stars is key to understanding how these objects formed and how they have affected the structural, kinematic, and chemical evolution of the Milky Way. We employ a sophisticated Bayesian technique with an adaptive MCMC algorithm to simultaneously fit the age, distance, absorption, and metallicity for each cluster. At the same time, we also fit unique helium values to two distinct populations of the cluster and determine the relative proportions of those populations. Our unique numerical approach allows objective and precise analysis of these complicated clusters, providing posterior distribution functions for each parameter of interest. We use these results to gain a better understanding of multiple populations in these clusters and their role in the history of the Milky Way. Support for this work was provided by NASA through grant numbers HST-GO-10775 and HST-GO-13297 from the Space Telescope Science Institute, which is operated by AURA, Inc., under NASA contract NAS5-26555. This material is based upon work supported by the National Aeronautics and Space Administration under Grant NNX11AF34G issued through the Office of Space Science. This project was supported by the National Aeronautics & Space Administration through the University of Central Florida's NASA Florida Space Grant Consortium.

  5. Applying cluster analysis to physics education research data

    NASA Astrophysics Data System (ADS)

    Springuel, R. Padraic

    One major thrust of Physics Education Research (PER) is the identification of student ideas about specific physics concepts, both correct ideas and those that differ from the expert consensus. Typically the research process of eliciting the spectrum of student ideas involves the administration of specially designed questions to students. One major analysis task in PER is the sorting of these student responses into thematically coherent groups. This process has previously been done by eye in PER. This thesis explores the possibility of using cluster analysis to perform the task in a more rigorous and less time-intensive fashion while making fewer assumptions about what the students are doing. Since this technique has not previously been used in PER, a summary of the various kinds of cluster analysis is included as well as a discussion of which might be appropriate for the task of sorting student responses into groups. Two example data sets (one based on the Force and Motion Conceptual Evaluation (FMCE), the other looking at acceleration in two dimensions (A2D)) are examined in depth to demonstrate how cluster analysis can be applied to PER data and the various considerations which must be taken into account when doing so. In both cases, the techniques described in this thesis found 5 groups which contained about 90% of the students in the data set. The results of this application are compared to previous research on the topics covered by the two examples to demonstrate that cluster analysis can effectively uncover the same patterns in student responses that have already been identified.

  6. Applications of cluster analysis to the creation of perfectionism profiles: a comparison of two clustering approaches

    PubMed Central

    Bolin, Jocelyn H.; Edwards, Julianne M.; Finch, W. Holmes; Cassady, Jerrell C.

    2014-01-01

    Although traditional clustering methods (e.g., K-means) have been shown to be useful in the social sciences it is often difficult for such methods to handle situations where clusters in the population overlap or are ambiguous. Fuzzy clustering, a method already recognized in many disciplines, provides a more flexible alternative to these traditional clustering methods. Fuzzy clustering differs from other traditional clustering methods in that it allows for a case to belong to multiple clusters simultaneously. Unfortunately, fuzzy clustering techniques remain relatively unused in the social and behavioral sciences. The purpose of this paper is to introduce fuzzy clustering to these audiences who are currently relatively unfamiliar with the technique. In order to demonstrate the advantages associated with this method, cluster solutions of a common perfectionism measure were created using both fuzzy clustering and K-means clustering, and the results compared. Results of these analyses reveal that different cluster solutions are found by the two methods, and the similarity between the different clustering solutions depends on the amount of cluster overlap allowed for in fuzzy clustering. PMID:24795683
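
    The contrast this abstract draws between hard and fuzzy assignment can be sketched as follows. This is a minimal illustration using the standard fuzzy c-means membership formula for fixed cluster centres; the function names, the centres, the scores and the fuzzifier value m = 2 are assumptions for the example and do not come from the paper.

```python
import math


def fuzzy_memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of one point in each cluster (centres held fixed).

    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)), where d_i is the distance
    from the point to centre i and m > 1 controls the amount of overlap.
    """
    dists = [math.dist(point, c) for c in centers]
    # A point sitting exactly on one or more centres belongs fully to them.
    zero = [i for i, d in enumerate(dists) if d == 0.0]
    if zero:
        return [1.0 / len(zero) if i in zero else 0.0 for i in range(len(centers))]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]


def hard_assignment(point, centers):
    """K-means style assignment: index of the single nearest centre."""
    dists = [math.dist(point, c) for c in centers]
    return dists.index(min(dists))


# Two hypothetical profile centres and two hypothetical cases.
centers = [(0.0, 0.0), (4.0, 0.0)]
print(fuzzy_memberships((1.0, 0.0), centers))  # mostly cluster 0, partly cluster 1
print(fuzzy_memberships((2.0, 0.0), centers))  # ambiguous case, split evenly
print(hard_assignment((1.0, 0.0), centers))    # forced into a single cluster
```

    Raising m towards larger values flattens the memberships (more overlap allowed), while m close to 1 approaches the hard assignment, which is one way to read the abstract's remark that similarity between the two solutions depends on the overlap permitted.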

  7. Volatile trace elements in and cluster analysis of Martian meteorites

    NASA Astrophysics Data System (ADS)

    Wang, Ming-Sheng; Mokos, Jennifer A.; Lipschutz, Michael E.

    1998-07-01

    We report data for 15 mainly volatile trace elements (Ag, Au, Bi, Cd, Co, Cs, Ga, In, Rb, Sb, Se, Te, Tl, U, Zn) by radiochemical neutron activation analysis (RNAA) in whole-rock samples of 5 martian meteorites which, with 7 others studied earlier, completes the 12-member martian meteorite suite. Nearly all of these elements exhibit highly variable compositional continua and are richer in the martian suite compared with other basaltic meteorites. From cluster analysis, we find that the clustering of subtypes based on these elements is virtually identical to that based on contents of major refractory elements and mineralogic/petrographic characteristics, implying that each source region on Mars was closed to volatile transport. Martian meteorite data can be used to infer volatile element contents in that planet.

  8. Segment clustering methodology for unsupervised Holter recordings analysis

    NASA Astrophysics Data System (ADS)

    Rodríguez-Sotelo, Jose Luis; Peluffo-Ordoñez, Diego; Castellanos Dominguez, German

    2015-01-01

    Cardiac arrhythmia analysis on Holter recordings is an important issue in clinical settings; however, it implicitly involves other problems related to the large amount of unlabelled data, which means a high computational cost. In this work an unsupervised methodology based on a segment framework is presented, which consists of dividing the raw data into a balanced number of segments in order to identify fiducial points, and to characterize and cluster the heartbeats in each segment separately. The resulting clusters are merged or split according to an assumed criterion of homogeneity. This framework compensates for the high computational cost of Holter analysis, making its implementation possible for further real-time applications. The performance of the method is measured over the records from the MIT/BIH arrhythmia database and achieves high values of sensitivity and specificity, taking advantage of database labels, for a broad range of heartbeat types recommended by the AAMI.

  9. Finite element adaptive mesh analysis using a cluster of workstations

    NASA Astrophysics Data System (ADS)

    Wang, K. P.; Bruch, J. C., Jr.

    1998-01-01

    Parallel computation on clusters of workstations is becoming one of the major trends in the study of parallel computations, because of their high computing speed, cost effectiveness and scalability. This paper presents studies of using a cluster of workstations for the finite element adaptive mesh analysis of a free surface seepage problem. A parallel algorithm proven to be simple to implement and efficient is used to perform the analysis. A network of workstations is used as the hardware of a parallel system. Two parallel software packages, P4 and PVM (parallel virtual machine), are used to handle communications among networked workstations. Computational issues to be discussed are domain decomposition, load balancing, and communication time.

  10. Functional Principal Component Analysis and Randomized Sparse Clustering Algorithm for Medical Image Analysis.

    PubMed

    Lin, Nan; Jiang, Junhai; Guo, Shicheng; Xiong, Momiao

    2015-01-01

    Due to the advancement in sensor technology, the growing large medical image data have the ability to visualize the anatomical changes in biological tissues. As a consequence, the medical images have the potential to enhance the diagnosis of disease, the prediction of clinical outcomes and the characterization of disease progression. But in the meantime, the growing data dimensions pose great methodological and computational challenges for the representation and selection of features in image cluster analysis. To address these challenges, we first extend the functional principal component analysis (FPCA) from one dimension to two dimensions to fully capture the space variation of the image signals. The image signals contain a large number of redundant features which provide no additional information for clustering analysis. The widely used methods for removing the irrelevant features are sparse clustering algorithms using a lasso-type penalty to select the features. However, the accuracy of clustering using a lasso-type penalty depends on the selection of the penalty parameters and the threshold value. In practice, they are difficult to determine. Recently, randomized algorithms have received a great deal of attention in big data analysis. This paper presents a randomized algorithm for accurate feature selection in image clustering analysis. The proposed method is applied to both the liver and kidney cancer histology image data from the TCGA database. The results demonstrate that the randomized feature selection method coupled with functional principal component analysis substantially outperforms the current sparse clustering algorithms in image cluster analysis. PMID:26196383

  11. Functional Principal Component Analysis and Randomized Sparse Clustering Algorithm for Medical Image Analysis

    PubMed Central

    Lin, Nan; Jiang, Junhai; Guo, Shicheng; Xiong, Momiao

    2015-01-01

    Due to the advancement in sensor technology, the growing large medical image data have the ability to visualize the anatomical changes in biological tissues. As a consequence, the medical images have the potential to enhance the diagnosis of disease, the prediction of clinical outcomes and the characterization of disease progression. But in the meantime, the growing data dimensions pose great methodological and computational challenges for the representation and selection of features in image cluster analysis. To address these challenges, we first extend the functional principal component analysis (FPCA) from one dimension to two dimensions to fully capture the space variation of the image signals. The image signals contain a large number of redundant features which provide no additional information for clustering analysis. The widely used methods for removing the irrelevant features are sparse clustering algorithms using a lasso-type penalty to select the features. However, the accuracy of clustering using a lasso-type penalty depends on the selection of the penalty parameters and the threshold value. In practice, they are difficult to determine. Recently, randomized algorithms have received a great deal of attention in big data analysis. This paper presents a randomized algorithm for accurate feature selection in image clustering analysis. The proposed method is applied to both the liver and kidney cancer histology image data from the TCGA database. The results demonstrate that the randomized feature selection method coupled with functional principal component analysis substantially outperforms the current sparse clustering algorithms in image cluster analysis. PMID:26196383

  12. Coupled two-way clustering analysis of gene microarray data

    NASA Astrophysics Data System (ADS)

    Getz, Gad; Levine, Erel; Domany, Eytan

    2000-10-01

    We present a coupled two-way clustering approach to gene microarray data analysis. The main idea is to identify subsets of the genes and samples, such that when one of these is used to cluster the other, stable and significant partitions emerge. The search for such subsets is a computationally complex task. We present an algorithm, based on iterative clustering, that performs such a search. This analysis is especially suitable for gene microarray data, where the contributions of a variety of biological mechanisms to the gene expression levels are entangled in a large body of experimental data. The method was applied to two gene microarray data sets, on colon cancer and leukemia. By identifying relevant subsets of the data and focusing on them we were able to discover partitions and correlations that were masked and hidden when the full dataset was used in the analysis. Some of these partitions have clear biological interpretation; others can serve to identify possible directions for future research.

  13. Psychosocial Costs of Racism to Whites: Exploring Patterns through Cluster Analysis

    ERIC Educational Resources Information Center

    Spanierman, Lisa B.; Poteat, V. Paul; Beer, Amanda M.; Armstrong, Patrick Ian

    2006-01-01

    Participants (230 White college students) completed the Psychosocial Costs of Racism to Whites (PCRW) Scale. Using cluster analysis, we identified 5 distinct cluster groups on the basis of PCRW subscale scores: the unempathic and unaware cluster contained the lowest empathy scores; the insensitive and afraid cluster consisted of low empathy and…

  14. Medical record linkage in health information systems by approximate string matching and clustering

    PubMed Central

    Sauleau, Erik A; Paumier, Jean-Philippe; Buemi, Antoine

    2005-01-01

    Background: Multiplication of data sources within heterogeneous healthcare information systems always results in redundant information, split among multiple databases. Our objective is to detect exact and approximate duplicates within identity records, in order to attain a better quality of information and to permit cross-linkage among stand-alone and clustered databases. Furthermore, we need to assist human decision making, by computing a value reflecting identity proximity. Methods: The proposed method is in three steps. The first step is to standardise and to index elementary identity fields, using blocking variables, in order to speed up information analysis. The second is to match similar pair records, relying on a global similarity value taken from the Porter-Jaro-Winkler algorithm. And the third is to create clusters of coherent related records, using graph drawing, agglomerative clustering methods and partitioning methods. Results: The batch analysis of 300,000 "supposedly" distinct identities isolates 240,000 true unique records, 24,000 duplicates (clusters composed of 2 records) and 3,000 clusters whose size is greater than or equal to 3 records. Conclusion: Duplicate-free databases, used in conjunction with relevant indexes and similarity values, allow immediate (i.e., real-time) proximity detection when inserting a new identity. PMID:16219102
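
    The second and third steps of the method (pairwise similarity, then grouping of related records) can be sketched roughly as below. This is not the authors' Porter-Jaro-Winkler implementation: it uses the plain Jaro-Winkler similarity with the conventional prefix weight of 0.1, and groups records by linking every pair above a threshold (connected components via union-find) as a simple stand-in for their graph and agglomerative steps. The function names, the threshold of 0.9 and the sample surnames are assumptions for illustration.

```python
def jaro(s1, s2):
    """Jaro similarity between two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    for i in range(len1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s1[i] == s2[j]:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3.0


def jaro_winkler(s1, s2, prefix_weight=0.1):
    """Jaro similarity boosted by a shared prefix of up to four characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_weight * (1.0 - base)


def cluster_records(records, threshold=0.9):
    """Group records whose pairwise similarity reaches the threshold."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if jaro_winkler(records[i], records[j]) >= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i, rec in enumerate(records):
        groups.setdefault(find(i), []).append(rec)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical identity strings (already standardised to upper case).
print(cluster_records(["MARTHA", "MARHTA", "JONES", "DWAYNE"]))
```

    In a real system the all-pairs loop would be restricted to records sharing a blocking key, which is the purpose of the indexing step described in the abstract.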

  15. Cluster Analysis and Fuzzy Query in Ship Maintenance and Design

    NASA Astrophysics Data System (ADS)

    Che, Jianhua; He, Qinming; Zhao, Yinggang; Qian, Feng; Chen, Qi

    Cluster analysis and fuzzy query have found widespread application in modern intelligent information processing. In view of the features of ship maintenance data, a variant of a hypergraph-based clustering algorithm, i.e., Correlation Coefficient-based Minimal Spanning Tree (CC-MST), is proposed to analyze the bulky data arising from the ship maintenance process, discover unknown rules and help ship maintainers decide on the causes of various device faults. At the same time, revising or renewing an existing design of a ship or device may be necessary to eliminate those device faults. To offer ship designers some valuable hints, a fuzzy query mechanism is designed to retrieve useful information from large-scale, complicated and redundant ship technical and testing data. Finally, two experiments based on a real ship device fault statistical dataset validate the flexibility and efficiency of the CC-MST algorithm. A fuzzy query prototype demonstrates the usability of our fuzzy query mechanism.

  16. Accident patterns for construction-related workers: a cluster analysis

    NASA Astrophysics Data System (ADS)

    Liao, Chia-Wen; Tyan, Yaw-Yauan

    2012-01-01

    The construction industry has been identified as one of the most hazardous industries. The risk of construction-related workers is far greater than that in a manufacturing based industry. However, some steps can be taken to reduce worker risk through effective injury prevention strategies. In this article, k-means clustering methodology is employed in specifying the factors related to different worker types and in identifying the patterns of industrial occupational accidents. Accident reports during the period 1998 to 2008 are extracted from case reports of the Northern Region Inspection Office of the Council of Labor Affairs of Taiwan. The results show that the cluster analysis can indicate some patterns of occupational injuries in the construction industry. Inspection plans should be proposed according to the type of construction-related workers. The findings provide a direction for more effective inspection strategies and injury prevention programs.

  17. Accident patterns for construction-related workers: a cluster analysis

    NASA Astrophysics Data System (ADS)

    Liao, Chia-Wen; Tyan, Yaw-Yauan

    2011-12-01

    The construction industry has been identified as one of the most hazardous industries. The risk of construction-related workers is far greater than that in a manufacturing based industry. However, some steps can be taken to reduce worker risk through effective injury prevention strategies. In this article, k-means clustering methodology is employed in specifying the factors related to different worker types and in identifying the patterns of industrial occupational accidents. Accident reports during the period 1998 to 2008 are extracted from case reports of the Northern Region Inspection Office of the Council of Labor Affairs of Taiwan. The results show that the cluster analysis can indicate some patterns of occupational injuries in the construction industry. Inspection plans should be proposed according to the type of construction-related workers. The findings provide a direction for more effective inspection strategies and injury prevention programs.

  18. Cluster: Mission Overview and End-of-Life Analysis

    NASA Technical Reports Server (NTRS)

    Pallaschke, S.; Munoz, I.; Rodriquez-Canabal, J.; Sieg, D.; Yde, J. J.

    2007-01-01

    The Cluster mission is part of the scientific programme of the European Space Agency (ESA) and its purpose is the analysis of the Earth's magnetosphere. The Cluster project consists of four satellites. The selected polar orbit has a shape of 4.0 and 19.2 Re which is required for performing measurements near the cusp and the tail of the magnetosphere. When crossing these regions the satellites form a constellation which in most of the cases so far has been a regular tetrahedron. The satellite operations are carried out by the European Space Operations Centre (ESOC) at Darmstadt, Germany. The paper outlines the future orbit evolution and the envisaged operations from a Flight Dynamics point of view. In addition a brief summary of the LEOP and routine operations is included beforehand.

  19. Clustered Numerical Data Analysis Using Markov Lie Monoid Based Networks

    NASA Astrophysics Data System (ADS)

    Johnson, Joseph

    2016-03-01

    We have designed and built an optimal numerical standardization algorithm that links numerical values with their associated units, error level, and defining metadata, thus supporting automated data exchange and new levels of artificial intelligence (AI). The software manages all dimensional and error analysis and computational tracing. Tables of entities versus properties of these generalized numbers (called "metanumbers") support a transformation of each table into a network among the entities and another network among their properties, where the network connection matrix is based upon a proximity metric between the two items. We previously proved that every network is isomorphic to the Lie algebra that generates continuous Markov transformations. We have also shown that the eigenvectors of these Markov matrices provide an agnostic clustering of the underlying patterns. We will present this methodology and show how our new work on conversion of scientific numerical data through this process can reveal underlying information clusters ordered by the eigenvalues. We will also show how the linking of clusters from different tables can be used to form a "supernet" of all numerical information supporting new initiatives in AI.

  20. Covariance analysis of differential drag-based satellite cluster flight

    NASA Astrophysics Data System (ADS)

    Ben-Yaacov, Ohad; Ivantsov, Anatoly; Gurfil, Pini

    2016-06-01

    One possibility for satellite cluster flight is to control relative distances using differential drag. The idea is to increase or decrease the drag acceleration on each satellite by changing its attitude, and use the resulting small differential acceleration as a controller. The most significant advantage of the differential drag concept is that it enables cluster flight without consuming fuel. However, any drag-based control algorithm must cope with significant aerodynamical and mechanical uncertainties. The goal of the current paper is to develop a method for examination of the differential drag-based cluster flight performance in the presence of noise and uncertainties. In particular, the differential drag control law is examined under measurement noise, drag uncertainties, and initial condition-related uncertainties. The method used for uncertainty quantification is the Linear Covariance Analysis, which enables us to propagate the augmented state and filter covariance without propagating the state itself. Validation using a Monte-Carlo simulation is provided. The results show that all uncertainties have relatively small effect on the inter-satellite distance, even in the long term, which validates the robustness of the used differential drag controller.
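
    The idea of propagating a covariance without propagating the state itself can be illustrated with the generic discrete-time update P_next = Phi P Phi^T + Q. The sketch below is not the paper's augmented state-and-filter formulation: it assumes a hypothetical two-state model (relative position and relative velocity along one axis) with a unit time step, and the helper names are made up for this example.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    """Transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*a)]


def propagate_covariance(p, phi, q):
    """One linear covariance step: P_next = Phi * P * Phi^T + Q."""
    spread = matmul(matmul(phi, p), transpose(phi))
    return [[spread[i][j] + q[i][j] for j in range(len(q[0]))]
            for i in range(len(q))]


# Hypothetical relative-motion model: position integrates velocity over dt.
dt = 1.0
phi = [[1.0, dt],
       [0.0, 1.0]]
q = [[0.0, 0.0],   # process noise set to zero to keep the example exact
     [0.0, 0.0]]
p = [[1.0, 0.0],   # initial uncertainty: unit variance, uncorrelated
     [0.0, 1.0]]

for step in range(2):
    p = propagate_covariance(p, phi, q)
    print(step + 1, p)
```

    The growth of the position variance from step to step shows how velocity uncertainty feeds into inter-satellite distance uncertainty; a Monte-Carlo run, as used for validation in the abstract, would estimate the same matrix by sampling.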

  1. IGSA: Individual Gene Sets Analysis, including Enrichment and Clustering

    PubMed Central

    Liu, Lei; Ma, Hongzhe; Yang, Jingbo; Xie, Hongbo; Liu, Bo; Jin, Qing

    2016-01-01

    Analysis of gene sets has been widely applied in various high-throughput biological studies. One weakness of the traditional methods is that they neglect the heterogeneity of gene expression across samples, which may lead to the omission of some specific and important gene sets. It is also difficult for them to reflect the severity of disease and to provide expression profiles of gene sets for individuals. We developed application software called IGSA that offers a powerful analytical capacity for gene set enrichment and sample clustering. IGSA calculates gene set expression scores for each sample and uses an accumulating clustering strategy to let the samples gather into sets according to the progress of disease from mild to severe. We focus on gastric, pancreatic and ovarian cancer data sets to assess the performance of IGSA. We also compared the results of IGSA in KEGG pathway enrichment with David, GSEA, SPIA and ssGSEA, and analyzed the results of IGSA clustering and different similarity measurement methods. Notably, IGSA proved to be more sensitive and specific in finding significant pathways, and can indicate related changes in pathways with the severity of disease. In addition, IGSA provides a significant gene set profile for each sample. PMID:27764138

  2. The REFLEX II galaxy cluster survey: power spectrum analysis

    NASA Astrophysics Data System (ADS)

    Balaguera-Antolínez, A.; Sánchez, Ariel G.; Böhringer, H.; Collins, C.; Guzzo, L.; Phleps, S.

    2011-05-01

    We present the power spectrum of galaxy clusters measured from the new ROSAT-ESO Flux-Limited X-Ray (REFLEX II) galaxy cluster catalogue. This new sample extends the flux limit of the original REFLEX catalogue to 1.8 × 10⁻¹² erg s⁻¹ cm⁻², yielding a total of 911 clusters with ≥94 per cent completeness in redshift follow-up. The analysis of the data is improved by creating a set of 100 REFLEX II-catalogue-like mock galaxy cluster catalogues built from a suite of large-volume Λ cold dark matter (ΛCDM) N-body simulations (L-BASICC II). The measured power spectrum is in agreement with the predictions from a ΛCDM cosmological model. The measurements show the expected increase in the amplitude of the power spectrum with increasing X-ray luminosity. On large scales, we show that the shape of the measured power spectrum is compatible with a scale-independent bias and provide a model for the amplitude that allows us to connect our measurements with a cosmological model. By implementing a luminosity-dependent power-spectrum estimator, we observe that the power spectrum measured from the REFLEX II sample is weakly affected by flux-selection effects. The shape of the measured power spectrum is compatible with a featureless power spectrum on scales k > 0.01 h Mpc⁻¹ and hence no statistically significant signal of baryonic acoustic oscillations can be detected. We show that the measured REFLEX II power spectrum displays signatures of non-linear evolution.

  3. Proteomics Analysis Reveals Overlapping Functions of Clustered Protocadherins

    PubMed Central

    Han, Meng-Hsuan; Lin, Chengyi; Meng, Shuxia; Wang, Xiaozhong

    2010-01-01

    The three tandem-arrayed protocadherin (Pcdh) gene clusters, namely Pcdh-α, Pcdh-β, and Pcdh-γ, play important roles in the development of the vertebrate central nervous system. To gain insight into the molecular action of PCDHs, we performed a systematic proteomics analysis of PCDH-γ-associated protein complexes. We identified a list of 154 non-redundant proteins in the PCDH-γ complexes. This list includes nearly 30 members of clustered Pcdh-α, -β, and -γ families as core components of the complexes and additionally over 120 putative PCDH-associated proteins. We validated a selected subset of PCDH-γ-associated proteins using specific antibodies. Analysis of the identities of PCDH-associated proteins showed that the majority of them overlap with the proteomic profile of postsynaptic density preparations. Further analysis of membrane protein complexes revealed that several validated PCDH-γ-associated proteins exhibit reduced levels in Pcdh-γ-deficient brain tissues. Therefore, PCDH-γs are required for the integrity of the complexes. However, the size of the overall complexes and the abundance of many other proteins remained unchanged, raising a possibility that PCDH-αs and PCDH-βs might compensate for PCDH-γ function in complex formation. As a test of this idea, RNA interference knockdown of both PCDH-αs and PCDH-γs showed that PCDHs have redundant functions in regulating neuronal survival in the chicken spinal cord. Taken together, our data provide evidence that clustered PCDHs coexist in large protein complexes and have overlapping functions during vertebrate neural development. PMID:19843561

  4. [Dermatoglyphics parameters and cluster analysis of seven minority nationalities].

    PubMed

    Zhang, H G; Shen, R C; Su, Y B; Chen, R B; Feng, B; Ding, M; Huang, M L; Wang, Y P; Jiao, Y P; Peng, L

    1989-01-01

    This paper reports the normal values of dermatoglyphic parameters of seven minority nationalities in Yunnan Province: Bai, Blang, Yi, Hui, Lisu, Nu and Jinuo. Significance tests of differences and cluster analysis show that several parameters differ among the nationalities, with the most remarkable difference between Jinuo and the other nationalities. Han is very different from several nationalities. In each nationality, the patterns on the same-named finger or area of the two hands are highly symmetric; the combination of left and right patterns is not random.

  5. Nonuniqueness in traveltime tomography: Ensemble inference and cluster analysis

    SciTech Connect

    Vasco, D.W.; Peterson, J.E. Jr.; Majer, E.L.

    1996-07-01

    The authors examine the nonlinear aspects of seismic traveltime tomography. This is accomplished by completing an extensive set of conjugate gradient inversions on a parallel virtual machine, with each initiated by a different starting model. The goal is an exploratory analysis of a set of conjugate gradient solutions to the traveltime tomography problem. The authors find that distinct local minima are generated when prior constraints are imposed on traveltime tomographic inverse problems. Methods from cluster analysis determine the number and location of the isolated solutions to the traveltime tomography problem. They apply the cluster analysis techniques to a cross-borehole traveltime data set gathered at the Gypsy Pilot Site in Pawnee County, Oklahoma. They find that the 1075 final models, satisfying the traveltime data and a model norm penalty, form up to 61 separate solutions. All solutions appear to contain a central low velocity zone bounded above and below by higher velocity layers. Such a structure agrees with well-logs, hydrological well tests, and a previous seismic inversion.

  6. [Clustering analysis applied to near-infrared spectroscopy analysis of Chinese traditional medicine].

    PubMed

    Liu, Mu-qing; Zhou, De-cheng; Xu, Xin-yuan; Sun, Yao-jie; Zhou, Xiao-li; Han, Lei

    2007-10-01

    The present article discusses the clustering analysis used in the near-infrared (NIR) spectroscopy analysis of Chinese traditional medicines, which provides a new method for their classification. The samples purposely selected in the authors' research were safrole, eucalypt oil, laurel oil, turpentine, clove oil and three samples of costmary oil from different suppliers; their absorption spectra were measured in seconds by a multi-channel NIR spectrometer developed in the authors' lab. The spectra in the range of 0.70-1.7 μm were measured with air as background, and the results indicated that they are quite distinct. A qualitative mathematical model was set up, and cluster analysis based on the spectra was carried out with different clustering methods for optimization, yielding a cluster correlation coefficient of 0.9742. This indicated that cluster analysis of the group of samples is practicable. The calculated classification of the 8 samples also accorded well with their characteristics; in particular, the three samples of costmary oil were grouped most closely in the clustering analysis. PMID:18306778

  7. Multivariate cluster analysis of forest fire events in Portugal

    NASA Astrophysics Data System (ADS)

    Tonini, Marj; Pereira, Mario; Vega Orozco, Carmen; Parente, Joana

    2015-04-01

    Portugal is one of the major fire-prone European countries, mainly due to its favourable climatic, topographic and vegetation conditions. Compared to the other Mediterranean countries, the number of events registered here from 1980 to the present is the highest; likewise, with respect to the burnt area, Portugal is the third most affected country. Portuguese mapped burnt areas are available from the website of the Institute for the Conservation of Nature and Forests (ICNF). This official geodatabase is the result of satellite measurements starting from the year 1990. The spatial information, delivered in shapefile format, provides a detailed description of the shape and the size of the area burnt by each fire, while the date/time information related to the fire ignition is restricted to the year of occurrence. In terms of a statistical formalism, wildfires can be associated with a stochastic point process, where events are analysed as a set of geographical coordinates corresponding, for example, to the centroid of each burnt area. The analysis of the spatio-temporal pattern of stochastic point processes, including cluster analysis, is a basic procedure for discovering predisposing factors as well as for prevention and forecasting purposes. These kinds of studies are primarily focused on investigating the spatial cluster behaviour of environmental data sequences and/or mapping their distribution at different times. To include both dimensions (space and time), a comprehensive spatio-temporal analysis is needed. In the present study the authors attempt to verify whether, in the case of wildfires in Portugal, space and time act independently or whether, conversely, neighbouring events are also closer in time. We present an application of the spatio-temporal K-function to a long dataset (1990-2012) of mapped burnt areas. Moreover, the multivariate K-function allowed checking for a possible difference in distribution between small and large fires. The final objective is to elaborate a 3D

  8. An enhanced cluster analysis program with bootstrap significance testing for ecological community analysis

    USGS Publications Warehouse

    McKenna, J.E.

    2003-01-01

    The biosphere is filled with complex living patterns and important questions about biodiversity and community and ecosystem ecology are concerned with structure and function of multispecies systems that are responsible for those patterns. Cluster analysis identifies discrete groups within multivariate data and is an effective method of coping with these complexities, but often suffers from subjective identification of groups. The bootstrap testing method greatly improves objective significance determination for cluster analysis. The BOOTCLUS program makes cluster analysis that reliably identifies real patterns within a data set more accessible and easier to use than previously available programs. A variety of analysis options and rapid re-analysis provide a means to quickly evaluate several aspects of a data set. Interpretation is influenced by sampling design and a priori designation of samples into replicate groups, and ultimately relies on the researcher's knowledge of the organisms and their environment. However, the BOOTCLUS program provides reliable, objectively determined groupings of multivariate data.

  9. Time series clustering analysis of health-promoting behavior

    NASA Astrophysics Data System (ADS)

    Yang, Chi-Ta; Hung, Yu-Shiang; Deng, Guang-Feng

    2013-10-01

    Health promotion must be emphasized to achieve the World Health Organization goal of health for all. Since the global population is aging rapidly, the ComCare elder health-promoting service was developed by the Taiwan Institute for Information Industry in 2011. Based on the Pender health promotion model, the ComCare service offers five categories of health-promoting functions to address the everyday needs of seniors: nutrition management, social support, exercise management, health responsibility, and stress management. To assess the overall ComCare service and to improve understanding of the health-promoting behavior of elders, this study analyzed health-promoting behavioral data automatically collected by the ComCare monitoring system. In the 30638 session records collected for 249 elders from January 2012 to March 2013, behavior patterns were identified by a fuzzy c-means time series clustering algorithm combined with autocorrelation-based representation schemes. The analysis showed that time series data for elder health-promoting behavior can be classified into four different clusters. Each type reveals different health-promoting needs, frequencies, function numbers and behaviors. The data analysis results can assist policymakers, health-care providers, and experts in medicine, public health, nursing and psychology, and have been provided to the Taiwan National Health Insurance Administration to assess elder health-promoting behavior.
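
    The autocorrelation-based representation mentioned in this abstract can be illustrated with a minimal sketch. It assumes the usual sample autocorrelation coefficient and a made-up usage series; the function name, the lag count and the data are hypothetical, and the fuzzy c-means step that would consume these feature vectors is not shown.

```python
def autocorrelation_features(series, max_lag):
    """Return the sample autocorrelation coefficients r_1 .. r_max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)
    if denom == 0:
        return [0.0] * max_lag  # a constant series carries no lag structure
    return [
        sum(centered[t] * centered[t + k] for t in range(n - k)) / denom
        for k in range(1, max_lag + 1)
    ]

# Toy series (hypothetical): usage alternating between 3 and 1 sessions.
sessions = [3, 1, 3, 1, 3, 1, 3, 1]
features = autocorrelation_features(sessions, max_lag=2)
```

    Each elder's series would be mapped to such a fixed-length vector, so that series of different lengths can be compared by an ordinary clustering algorithm.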

  10. Clustering Financial Time Series by Network Community Analysis

    NASA Astrophysics Data System (ADS)

    Piccardi, Carlo; Calatroni, Lisa; Bertoni, Fabio

    In this paper, we describe a method for clustering financial time series which is based on community analysis, a recently developed approach for partitioning the nodes of a network (graph). A network with N nodes is associated to the set of N time series. The weight of the link (i, j), which quantifies the similarity between the two corresponding time series, is defined according to a metric based on symbolic time series analysis, which has recently proved effective in the context of financial time series. Then, searching for network communities allows one to identify groups of nodes (and then time series) with strong similarity. A quantitative assessment of the significance of the obtained partition is also provided. The method is applied to two distinct case-studies concerning the US and Italy Stock Exchange, respectively. In the US case, the stability of the partitions over time is also thoroughly investigated. The results favorably compare with those obtained with the standard tools typically used for clustering financial time series, such as the minimal spanning tree and the hierarchical tree.

  11. Analysis of clustered data in community psychology: with an example from a worksite smoking cessation project.

    PubMed

    Hedeker, D; McMahon, S D; Jason, L A; Salina, D

    1994-10-01

    Although it is common in community psychology research to have data at both the community, or cluster, and individual level, the analysis of such clustered data often presents difficulties for many researchers. Since the individuals within the cluster cannot be assumed to be independent, the use of many traditional statistical techniques that assume independence of observations is problematic. Further, there is often interest in assessing the degree of dependence in the data resulting from the clustering of individuals within communities. In this paper, a random-effects regression model is described for analysis of clustered data. Unlike ordinary regression analysis of clustered data, random-effects regression models do not assume that each observation is independent, but do assume data within clusters are dependent to some degree. The degree of this dependency is estimated along with estimates of the usual model parameters, thus adjusting these effects for the dependency resulting from the clustering of the data. Models are described for both continuous and dichotomous outcome variables, and available statistical software for these models is discussed. An analysis of a data set where individuals are clustered within firms is used to illustrate features of random-effects regression analysis, relative to both individual-level analysis which ignores the clustering of the data, and cluster-level analysis which aggregates the individual data. PMID:7755003
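
    For orientation, the simplest model of this kind for a continuous outcome is a random-intercept regression. The notation below is an assumption made for illustration (i indexes clusters such as firms, j indexes individuals within a cluster) and is not taken from the paper; the second expression is the standard intraclass correlation, one common way to quantify the degree of within-cluster dependence.

```latex
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2)

\rho = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_\varepsilon^2}
```

    When the cluster-level variance is zero, the intraclass correlation is zero and the model reduces to ordinary regression with independent observations.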

  12. The composite sequential clustering technique for analysis of multispectral scanner data

    NASA Technical Reports Server (NTRS)

    Su, M. Y.

    1972-01-01

    The clustering technique consists of two parts: (1) a sequential statistical clustering which is essentially a sequential variance analysis, and (2) a generalized K-means clustering. In this composite clustering technique, the output of (1) is a set of initial clusters which are input to (2) for further improvement by an iterative scheme. This unsupervised composite technique was employed for automatic classification of two sets of remote multispectral earth resource observations. The classification accuracy by the unsupervised technique is found to be comparable to that by traditional supervised maximum likelihood classification techniques. The mathematical algorithms for the composite sequential clustering program and a detailed computer program description with job setup are given.
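
    The two-stage idea can be sketched as follows. The first stage here is a simple one-pass "leader" rule with a distance threshold, used only as a stand-in because the abstract does not detail the sequential variance analysis; the second stage is ordinary Lloyd-style K-means started from the stage-one centers. All names, the threshold and the toy data are hypothetical.

```python
import math

def sequential_seed(points, threshold):
    """One pass over the data: join the nearest running cluster if it lies
    within `threshold`, otherwise open a new cluster. This is a stand-in for
    the sequential variance analysis, not the original procedure."""
    sums, counts = [], []
    for p in points:
        best, best_d = None, float("inf")
        for i, (s, n) in enumerate(zip(sums, counts)):
            d = math.dist(p, [v / n for v in s])
            if d < best_d:
                best, best_d = i, d
        if best is not None and best_d <= threshold:
            sums[best] = [a + b for a, b in zip(sums[best], p)]
            counts[best] += 1
        else:
            sums.append(list(p))
            counts.append(1)
    return [[v / n for v in s] for s, n in zip(sums, counts)]

def kmeans_refine(points, centers, max_iter=100):
    """K-means iterations started from the seed centers."""
    labels = []
    for _ in range(max_iter):
        labels = [min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
                  for p in points]
        new_centers = []
        for i, c in enumerate(centers):
            members = [p for p, l in zip(points, labels) if l == i]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(c)  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels

# Toy two-band "pixels" forming two obvious groups.
pixels = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
seeds = sequential_seed(pixels, threshold=1.0)
centers, labels = kmeans_refine(pixels, seeds)
```

    Seeding K-means from a first pass means the number of clusters does not have to be fixed in advance; it falls out of the threshold chosen for stage one.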

  13. Subgroups of physically abusive parents based on cluster analysis of parenting behavior and affect.

    PubMed

    Haskett, Mary E; Scott, Susan Smith; Ward, Caryn Sabourin

    2004-10-01

    Cluster analysis of observed parenting and self-reported discipline was used to categorize 83 abusive parents into subgroups. A 2-cluster solution received support for validity. Cluster 1 parents were relatively warm, positive, sensitive, and engaged during interactions with their children, whereas Cluster 2 parents were relatively negative, disengaged or intrusive, and insensitive. Further, clusters differed in emotional health, parenting stress, perceptions of children, and problem solving. Children of parents in the 2 clusters differed on several indexes of social adjustment. Cluster 1 parents were similar to nonabusive parents (n = 66) on parenting and related constructs, but Cluster 2 parents differed from nonabusive parents on all clustering variables and many validation variables. Results highlight clinically relevant diversity in parenting practices and functioning among abusive parents.

  14. Highlights of the Merging Cluster Collaboration's Analysis of 26 Radio Relic Galaxy Cluster Mergers

    NASA Astrophysics Data System (ADS)

    Dawson, William; Golovich, Nathan; Wittman, David M.; Bradac, Marusa; Brüggen, Marcus; Bullock, James; Elbert, Oliver; Jee, James; Kaplinghat, Manoj; Kim, Stacy; Mahdavi, Andisheh; Merten, Julian; Ng, Karen; Annika, Peter; Rocha, Miguel E.; Sobral, David; Stroe, Andra; Van Weeren, Reinout J.; Merging Cluster Collaboration

    2016-01-01

    Merging galaxy clusters are now recognized as multifaceted probes providing unique insight into the properties of dark matter, the environmental impact of plasma shocks on galaxy evolution, and the physics of high energy particle acceleration. The Merging Cluster Collaboration has used the diffuse radio emission associated with the synchrotron radiation of relativistic particles accelerated by shocks generated during major cluster mergers (i.e. radio relics) to identify a homogeneous sample of 26 galaxy cluster mergers. We have confirmed theoretical expectations that radio relics are predominantly associated with mergers occurring near the plane of the sky and at a relatively common merger phase, making them ideal probes of self-interacting dark matter, and eliminating much of the dominant uncertainty when relating the observed star formation rates to the event of the major cluster merger. We will highlight a number of the discovered common traits of this sample as well as detailed measurements of individual mergers.

  15. Symbolic clustering

    SciTech Connect

    Reinke, R.E.

    1991-01-01

    Clustering is the problem of finding a good organization for data. Because there are many kinds of clustering problems, and because there are many possible clusterings for any data set, clustering programs use knowledge and assumptions about individual problems to make clustering tractable. Cluster-analysis techniques allow knowledge to be expressed in the choice of a pairwise distance measure and in the choice of clustering algorithm. Conceptual clustering adds knowledge and preferences about cluster descriptions. In this study the author describes symbolic clustering, which adds representation choice to the set of ways a data analyst can use problem-specific knowledge. He develops an informal model for symbolic clustering, and uses it to suggest where and how knowledge can be expressed in clustering. A language for creating symbolic clusters, based on the model, was developed and tested on three real clustering problems. The study concludes with a discussion of the implications of the model and the results for clustering in general.

  16. Investigating Faculty Familiarity with Assessment Terminology by Applying Cluster Analysis to Interpret Survey Data

    ERIC Educational Resources Information Center

    Raker, Jeffrey R.; Holme, Thomas A.

    2014-01-01

    A cluster analysis was conducted with a set of survey data on chemistry faculty familiarity with 13 assessment terms. Cluster groupings suggest a high, middle, and low overall familiarity with the terminology and an independent high and low familiarity with terms related to fundamental statistics. The six resultant clusters were found to be…

  17. Cluster analysis of rural, urban, and curbside atmospheric particle size data.

    PubMed

    Beddows, David C S; Dall'Osto, Manuel; Harrison, Roy M

    2009-07-01

    Particle size is a key determinant of the hazard posed by airborne particles. Continuous multivariate particle size data have been collected using aerosol particle size spectrometers sited at four locations within the UK: Harwell (Oxfordshire); Regents Park (London); British Telecom Tower (London); and Marylebone Road (London). These data have been analyzed using k-means cluster analysis, deduced to be the preferred cluster analysis technique, selected from four partitional cluster packages, namely Fuzzy, k-means, k-median, and Model-Based clustering. Using cluster validation indices, k-means clustering was shown to produce clusters with the smallest size, furthest separation, and importantly the highest degree of similarity between the elements within each partition. Using k-means clustering, the complexity of the data set is reduced, allowing characterization of the data according to the temporal and spatial trends of the clusters. At Harwell, the rural background measurement site, the cluster analysis showed that the spectra may be differentiated by their modal diameters and average temporal trends showing either high counts during the day-time or night-time hours. Likewise for the urban sites, the cluster analysis differentiated the spectra into a small number of size distributions according to their modal diameter, the location of the measurement site, and time of day. The responsible aerosol emission, formation, and dynamic processes can be inferred according to the cluster characteristics and correlation to concurrently measured meteorological, gas phase, and particle phase measurements.

  18. The XMM Cluster Survey: optical analysis methodology and the first data release

    NASA Astrophysics Data System (ADS)

    Mehrtens, Nicola; Romer, A. Kathy; Hilton, Matt; Lloyd-Davies, E. J.; Miller, Christopher J.; Stanford, S. A.; Hosmer, Mark; Hoyle, Ben; Collins, Chris A.; Liddle, Andrew R.; Viana, Pedro T. P.; Nichol, Robert C.; Stott, John P.; Dubois, E. Naomi; Kay, Scott T.; Sahlén, Martin; Young, Owain; Short, C. J.; Christodoulou, L.; Watson, William A.; Davidson, Michael; Harrison, Craig D.; Baruah, Leon; Smith, Mathew; Burke, Claire; Mayers, Julian A.; Deadman, Paul-James; Rooney, Philip J.; Edmondson, Edward M.; West, Michael; Campbell, Heather C.; Edge, Alastair C.; Mann, Robert G.; Sabirli, Kivanc; Wake, David; Benoist, Christophe; da Costa, Luiz; Maia, Marcio A. G.; Ogando, Ricardo

    2012-06-01

    The XMM Cluster Survey (XCS) is a serendipitous search for galaxy clusters using all publicly available data in the XMM-Newton Science Archive. Its main aims are to measure cosmological parameters and trace the evolution of X-ray scaling relations. In this paper we present the first data release from the XMM Cluster Survey (XCS-DR1). This consists of 503 optically confirmed, serendipitously detected, X-ray clusters. Of these clusters, 256 are new to the literature and 357 are new X-ray discoveries. We present 463 clusters with a redshift estimate (0.06 < z < 1.46), including 261 clusters with spectroscopic redshifts. The remainder have photometric redshifts. In addition, we have measured X-ray temperatures (TX) for 401 clusters (0.4 < TX < 14.7 keV). We highlight seven interesting subsamples of XCS-DR1 clusters: (i) 10 clusters at high redshift (z > 1.0, including a new spectroscopically confirmed cluster at z = 1.01); (ii) 66 clusters with high TX (>5 keV); (iii) 130 clusters/groups with low TX (<2 keV); (iv) 27 clusters with measured TX values in the Sloan Digital Sky Survey (SDSS) ‘Stripe 82’ co-add region; (v) 77 clusters with measured TX values in the Dark Energy Survey region; (vi) 40 clusters detected with sufficient counts to permit mass measurements (under the assumption of hydrostatic equilibrium); (vii) 104 clusters that can be used for applications such as the derivation of cosmological parameters and the measurement of cluster scaling relations. The X-ray analysis methodology used to construct and analyse the XCS-DR1 cluster sample has been presented in a companion paper, Lloyd-Davies et al.

  19. Outlier Identification in Model-Based Cluster Analysis

    PubMed Central

    Evans, Katie; Love, Tanzy; Thurston, Sally W.

    2015-01-01

    In model-based clustering based on normal-mixture models, a few outlying observations can influence the cluster structure and number. This paper develops a method to identify these; however, it does not attempt to identify clusters amidst a large field of noisy observations. We identify outliers as those observations in a cluster with minimal membership proportion or for which the cluster-specific variance with and without the observation is very different. Results from a simulation study demonstrate the ability of our method to detect true outliers without falsely identifying many non-outliers and improved performance over other approaches, under most scenarios. We use the contributed R package MCLUST for model-based clustering, but propose a modified prior for the cluster-specific variance which avoids degeneracies in estimation procedures. We also compare results from our outlier method to published results on National Hockey League data. PMID:26806993

  20. Phenotypes of comorbidity in OSAS patients: combining categorical principal component analysis with cluster analysis.

    PubMed

    Vavougios, George D; George D, George; Pastaka, Chaido; Zarogiannis, Sotirios G; Gourgoulianis, Konstantinos I

    2016-02-01

    Phenotyping obstructive sleep apnea syndrome's comorbidity has been attempted for the first time only recently. The aim of our study was to determine phenotypes of comorbidity in obstructive sleep apnea syndrome patients employing a data-driven approach. Data from 1472 consecutive patient records were recovered from our hospital's database. Categorical principal component analysis and two-step clustering were employed to detect distinct clusters in the data. Univariate comparisons between clusters included one-way analysis of variance with Bonferroni correction and chi-square tests. Predictors of pairwise cluster membership were determined via a binary logistic regression model. The analyses revealed six distinct clusters: A, 'healthy, reporting sleeping related symptoms'; B, 'mild obstructive sleep apnea syndrome without significant comorbidities'; C1, 'moderate obstructive sleep apnea syndrome, obesity, without significant comorbidities'; C2, 'moderate obstructive sleep apnea syndrome with severe comorbidity, obesity and the exclusive inclusion of stroke'; D1, 'severe obstructive sleep apnea syndrome and obesity without comorbidity and a 33.8% prevalence of hypertension'; and D2, 'severe obstructive sleep apnea syndrome with severe comorbidities, along with the highest Epworth Sleepiness Scale score and highest body mass index'. Clusters differed significantly in apnea-hypopnea index, oxygen desaturation index, arousal index, age, body mass index, minimum oxygen saturation and daytime oxygen saturation (one-way analysis of variance P < 0.0001). Binary logistic regression indicated that older age, greater body mass index, lower daytime oxygen saturation and hypertension were associated independently with an increased risk of belonging in a comorbid cluster. Six distinct phenotypes of obstructive sleep apnea syndrome and its comorbidities were identified. Mapping the heterogeneity of the obstructive sleep apnea syndrome may help the early identification of at

  1. Cluster analysis application in research on pork quality determinants

    NASA Astrophysics Data System (ADS)

    Przybylski, W.; Wasiewicz, P.; Zieliński, P.; Gromadzka-Ostrowska, J.; Olczak, E.; Jaworska, D.; Niemyjski, S.; Santé-Lhoutellier, V.

    2010-09-01

    In this paper data mining methods were applied to investigate features determining high quality pork meat. The aim of the study was to analyse the conditioning of pork meat quality in relation to HDL and LDL cholesterol concentration, plasma leptin, triglycerides, plasma glucose and serum. The research was carried out on 54 pigs originating from the crossbreeding of Naima sows with boars of the P76-PenArLan hybrid line. Meat quality parameters were evaluated in samples of the Longissimus (LD) muscle taken behind the last rib, on the basis of the pH value, meat colour, drip loss, the RTN, intramuscular fat and glycolytic potential. The results of this study were elaborated using the R environment and show that cluster and regression analysis can be a useful tool for in-depth analysis of the determinants of the quality of pig meat in homogeneous populations of pigs. However, the question of determinants of the level of glycogen and fat in meat requires further research.

  2. AVES: A Computer Cluster System approach for INTEGRAL Scientific Analysis

    NASA Astrophysics Data System (ADS)

    Federici, M.; Martino, B. L.; Natalucci, L.; Umbertini, P.

    The AVES computing system, based on a "cluster" architecture, is a fully integrated, low-cost computing facility dedicated to the archiving and analysis of INTEGRAL data. AVES is a modular system that uses the SLURM software resource manager and allows almost unlimited expandability (65,536 nodes and hundreds of thousands of processors); it is currently composed of 30 personal computers with quad-core CPUs, able to reach a computing power of 300 gigaflops (300 × 10⁹ floating-point operations per second), with 120 GB of RAM and 7.5 terabytes (TB) of storage memory in UFS configuration plus 6 TB for the users' area. AVES was designed and built to solve the growing problems raised by the analysis of the large amount of data accumulated by the INTEGRAL mission (currently about 9 TB), which is due to increase every year. The analysis software used is the OSA package, distributed by the ISDC in Geneva. This is a very complex package consisting of dozens of programs that cannot be converted to parallel computing. To overcome this limitation we developed a series of programs to distribute the analysis workload over the various nodes, making AVES automatically divide the analysis into N jobs sent to N cores. This solution thus produces a result similar to that obtained by a parallel computing configuration. In support of this we have developed tools that allow flexible use of the scientific software and quality control of on-line data storage. The AVES software package consists of about 50 specific programs. As a result, the overall computing time, compared to that of a personal computer with a single processor, has been improved by up to a factor of 70.
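
    The "divide the analysis into N jobs" idea can be illustrated with a small sketch. It is not the AVES software: a local thread pool stands in for the SLURM-managed nodes, and `analyse`, the observation names and the job count are hypothetical placeholders for one unmodified serial analysis run.

```python
from concurrent.futures import ThreadPoolExecutor

def split_jobs(items, n_jobs):
    """Deal the items round-robin into n_jobs lists of near-equal size."""
    return [items[i::n_jobs] for i in range(n_jobs)]

def analyse(chunk):
    """Hypothetical placeholder for one serial analysis run over a chunk."""
    return [f"processed {name}" for name in chunk]

def run_distributed(items, n_jobs):
    """Run the serial analysis independently on each chunk, one per worker."""
    chunks = split_jobs(items, n_jobs)
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        results = list(pool.map(analyse, chunks))
    return [r for chunk_result in results for r in chunk_result]

observations = [f"obs_{i:03d}" for i in range(10)]
outputs = run_distributed(observations, n_jobs=4)
```

    The serial program itself is untouched; the speed-up comes only from running independent copies on disjoint parts of the input, which is why it works for software that cannot be parallelised internally.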

  3. Combined clustering models for the analysis of gene expression

    SciTech Connect

    Angelova, M.; Ellman, J.

    2010-02-15

    Clustering has become one of the fundamental tools for analyzing gene expression and producing gene classifications. Clustering models enable finding patterns of similarity in order to understand gene function, gene regulation, cellular processes and sub-types of cells. The clustering results however have to be combined with sequence data or knowledge about gene functionality in order to make biologically meaningful conclusions. In this work, we explore a new model that integrates gene expression with sequence or text information.

  4. Fuzzy and hard clustering analysis for thyroid disease.

    PubMed

    Azar, Ahmad Taher; El-Said, Shaimaa Ahmed; Hassanien, Aboul Ella

    2013-07-01

    Thyroid hormones produced by the thyroid gland help regulate the body's metabolism. A variety of methods have been proposed in the literature for thyroid disease classification. As far as we know, clustering techniques have not been applied to thyroid disease data sets so far. This paper proposes a comparison between hard and fuzzy clustering algorithms on a thyroid disease data set in order to find the optimal number of clusters. Different scalar validity measures are used in comparing the performances of the proposed clustering systems. To demonstrate the performance of each algorithm, the feature values that represent thyroid disease are used as input for the system. Several runs are carried out and recorded with a different number of clusters being specified for each run (between 2 and 11), so as to establish the optimum number of clusters. To find the optimal number of clusters, the so-called elbow criterion is applied. The experimental results revealed that for all algorithms, the elbow was located at c=3. The clustering results for all algorithms are then visualized by the Sammon mapping method to find a low-dimensional (normally 2D or 3D) representation of a set of points distributed in a high dimensional pattern space. At the end of this study, some recommendations are formulated to improve the determination of the actual number of clusters present in the data set. PMID:23357404
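
    A minimal sketch of the elbow criterion follows. It uses plain (hard) k-means on toy two-dimensional data rather than the thyroid data set, and it locates the elbow as the cluster count with the largest second difference of the within-cluster sum of squares; that rule, the deterministic seeding and all names are assumptions, since the abstract does not say how the elbow was located.

```python
def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_point_init(points, c):
    """Deterministic seeding: start from the first point, then repeatedly
    add the point farthest from the centers chosen so far."""
    centers = [points[0]]
    while len(centers) < c:
        centers.append(max(points, key=lambda p: min(sq_dist(p, m) for m in centers)))
    return centers

def kmeans_wcss(points, c, max_iter=100):
    """Run k-means with c clusters; return the within-cluster sum of squares."""
    centers = farthest_point_init(points, c)
    for _ in range(max_iter):
        labels = [min(range(c), key=lambda i: sq_dist(p, centers[i])) for p in points]
        new_centers = []
        for i in range(c):
            members = [p for p, l in zip(points, labels) if l == i]
            new_centers.append(
                tuple(sum(col) / len(members) for col in zip(*members))
                if members else centers[i]
            )
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(p, m) for m in centers) for p in points)

def elbow(wcss):
    """wcss[k-1] is the cost for k clusters; return the k with the largest
    second difference, i.e. the sharpest bend in the cost curve."""
    bends = {k: wcss[k - 2] - 2 * wcss[k - 1] + wcss[k] for k in range(2, len(wcss))}
    return max(bends, key=bends.get)

# Toy data: three compact groups of four points each.
points = [(x + dx, y + dy)
          for x, y in [(0, 0), (10, 0), (0, 10)]
          for dx, dy in [(0, 0), (1, 0), (0, 1), (1, 1)]]
wcss = [kmeans_wcss(points, c) for c in range(1, 7)]
best_c = elbow(wcss)
```

    On this toy data the cost drops steeply up to three clusters and flattens afterwards, so `best_c` comes out as 3; on real data the bend is usually less sharp and the choice deserves a visual check.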

  5. Study on Cluster Analysis Used with Laser-Induced Breakdown Spectroscopy

    NASA Astrophysics Data System (ADS)

    He, Li'ao; Wang, Qianqian; Zhao, Yu; Liu, Li; Peng, Zhong

    2016-06-01

    Supervised learning methods (e.g. PLS-DA, SVM) have been widely used with laser-induced breakdown spectroscopy (LIBS) to classify materials; however, they may yield a low correct classification rate if a test sample type is not included in the training dataset. In this paper, unsupervised cluster analysis methods (hierarchical clustering analysis, K-means clustering analysis, and the iterative self-organizing data analysis technique, ISODATA) are investigated for plastics classification based on the line intensities of LIBS emission. The results of hierarchical clustering analysis using four different similarity measuring methods (single linkage, complete linkage, unweighted pair-group average, and weighted pair-group average) are compared. In K-means clustering analysis, four methods of choosing initial centers are applied and their results are compared. The classification results of hierarchical clustering analysis, K-means clustering analysis, and ISODATA are analyzed. The experimental results demonstrate that cluster analysis methods can be applied to plastics discrimination with LIBS. Supported by the Beijing Natural Science Foundation of China (No. 4132063).
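
    How the linkage choice enters hierarchical clustering can be shown with a naive sketch. It covers three of the four linkages named above (single, complete and unweighted pair-group average); the weighted variant is omitted. The function name and the toy "line intensity" vectors are hypothetical, and the quadratic pair search is meant for illustration only.

```python
import math
from itertools import combinations

# Each linkage reduces the list of pairwise distances between two clusters
# to a single between-cluster distance.
LINKAGES = {
    "single": min,                            # closest pair of members
    "complete": max,                          # farthest pair of members
    "average": lambda ds: sum(ds) / len(ds),  # unweighted pair-group average
}

def agglomerate(points, n_clusters, linkage="average"):
    """Merge the two closest clusters until n_clusters remain; returns
    clusters as sorted lists of point indices."""
    combine = LINKAGES[linkage]
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        def gap(pair):
            a, b = pair
            return combine([math.dist(points[i], points[j])
                            for i in clusters[a] for j in clusters[b]])
        a, b = min(combinations(range(len(clusters)), 2), key=gap)
        clusters[a] = clusters[a] + clusters[b]  # a < b, so index a stays valid
        del clusters[b]
    return [sorted(c) for c in clusters]

# Toy two-line intensity vectors for three well-separated sample types.
spectra = [(1.0, 0.1), (1.1, 0.2), (0.2, 1.0), (0.1, 1.1), (2.0, 2.0), (2.1, 1.9)]
groups = {name: agglomerate(spectra, 3, name) for name in LINKAGES}
```

    With groups this well separated all three linkages agree; differences between linkages show up when clusters are elongated or overlap.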

  6. Towards Effective Clustering Techniques for the Analysis of Electric Power Grids

    SciTech Connect

    Hogan, Emilie A.; Cotilla Sanchez, Jose E.; Halappanavar, Mahantesh; Wang, Shaobu; Mackey, Patrick S.; Hines, Paul; Huang, Zhenyu

    2013-11-30

    Clustering is an important data analysis technique with numerous applications in the analysis of electric power grids. Standard clustering techniques are oblivious to the rich structural and dynamic information available for power grids. Therefore, by exploiting the inherent topological and electrical structure in the power grid data, we propose new methods for clustering with applications to model reduction, locational marginal pricing, phasor measurement unit (PMU or synchrophasor) placement, and power system protection. We focus our attention on model reduction for analysis based on time-series information from synchrophasor measurement devices, and spectral techniques for clustering. By comparing different clustering techniques on two instances of realistic power grids we show that the solutions are related and therefore one could leverage that relationship for a computational advantage. Thus, by contrasting different clustering techniques we make a case for exploiting structure inherent in the data with implications for several domains including power systems.

  7. Multidimensional cluster stability analysis from a Brazilian Bradyrhizobium sp. RFLP/PCR data set

    NASA Astrophysics Data System (ADS)

    Milagre, S. T.; Maciel, C. D.; Shinoda, A. A.; Hungria, M.; Almeida, J. R. B.

    2009-05-01

    The taxonomy of the N2-fixing bacteria belonging to the genus Bradyrhizobium is still poorly refined, mainly due to conflicting results obtained by the analysis of the phenotypic and genotypic properties. This paper presents an application of a method aiming at the identification of possible new clusters within a Brazilian collection of 119 Bradyrhizobium strains showing phenotypic characteristics of B. japonicum and B. elkanii. The stability was studied as a function of the number of restriction enzymes used in the RFLP-PCR analysis of three ribosomal regions with three restriction enzymes per region. The method proposed here uses clustering algorithms with distances calculated by average-linkage clustering. The stability analysis is performed by introducing perturbations using sub-sampling techniques. The method showed efficacy in the grouping of the species B. japonicum and B. elkanii. Furthermore, two new clusters were clearly defined, indicating possible new species, and sub-clusters within each detected cluster.

  8. Investigating Regional Disparities of China's Human Development with Cluster Analysis: A Historical Perspective

    ERIC Educational Resources Information Center

    Yang, Yongheng; Hu, Angang

    2008-01-01

    This paper adopts both one-dimensional and multi-dimensional cluster analysis to analyze China's HDI data for 1982, 1995, 1999, and 2003, and to classify China's provinces into four tiers based on the three basic developmental aspects embedded in HDI. The classifications by cluster analysis depend on the observations' similarities with respect to…

  9. Tracking Undergraduate Student Achievement in a First-Year Physiology Course Using a Cluster Analysis Approach

    ERIC Educational Resources Information Center

    Brown, S. J.; White, S.; Power, N.

    2015-01-01

    A cluster analysis data classification technique was used on assessment scores from 157 undergraduate nursing students who passed 2 successive compulsory courses in human anatomy and physiology. Student scores in five summative assessment tasks, taken in each of the courses, were used as inputs for a cluster analysis procedure. We aimed to group…

  10. Multilevel Analysis Methods for Partially Nested Cluster Randomized Trials

    ERIC Educational Resources Information Center

    Sanders, Elizabeth A.

    2011-01-01

    This paper explores multilevel modeling approaches for 2-group randomized experiments in which a treatment condition involving clusters of individuals is compared to a control condition involving only ungrouped individuals, otherwise known as partially nested cluster randomized designs (PNCRTs). Strategies for comparing groups from a PNCRT in the…

  11. Alternatives to Multilevel Modeling for the Analysis of Clustered Data

    ERIC Educational Resources Information Center

    Huang, Francis L.

    2016-01-01

    Multilevel modeling has grown in use over the years as a way to deal with the nonindependent nature of observations found in clustered data. However, other alternatives to multilevel modeling are available that can account for observations nested within clusters, including the use of Taylor series linearization for variance estimation, the design…

  12. Detecting Hotspots from Taxi Trajectory Data Using Spatial Cluster Analysis

    NASA Astrophysics Data System (ADS)

    Zhao, P. X.; Qin, K.; Zhou, Q.; Liu, C. K.; Chen, Y. X.

    2015-07-01

    A method of trajectory clustering based on a decision graph and a data field is proposed in this paper. The method uses the data field to describe the spatial distribution of trajectory points, and uses the decision graph to discover cluster centres. It can automatically determine cluster parameters and is suitable for trajectory clustering. The method is applied to taxi trajectory data from Wuhan City, China, recorded on a holiday (May 1st, 2014), a weekday (Wednesday, May 7th, 2014) and a weekend day (Saturday, May 10th, 2014). The hotspots in four one-hour periods (8:00-9:00, 12:00-13:00, 18:00-19:00 and 23:00-24:00) on the three days are discovered and visualized in heat maps. In the future, we will further study the spatiotemporal distribution and patterns of these hotspots, and use more data to carry out the experiments.
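
    A minimal sketch of how a decision graph can expose cluster centres, assuming the term refers to the density-versus-separation plot used in density-peak clustering. The abstract describes density through a data field; the simple neighbour count below is a stand-in, and the function name, cutoff value and toy points are all hypothetical.

```python
from math import dist


def decision_graph(points, cutoff):
    """Return (rho, delta) for each point.

    rho[i]   : number of other points closer than `cutoff` (local density).
    delta[i] : distance to the nearest point of higher density, or the
               largest distance to any point if no denser point exists.
    """
    n = len(points)
    rho = [
        sum(1 for j in range(n) if j != i and dist(points[i], points[j]) < cutoff)
        for i in range(n)
    ]
    delta = []
    for i in range(n):
        to_denser = [dist(points[i], points[j]) for j in range(n) if rho[j] > rho[i]]
        if to_denser:
            delta.append(min(to_denser))
        else:
            delta.append(max(dist(points[i], points[j]) for j in range(n)))
    return rho, delta


# Two toy clumps of pick-up locations (illustrative coordinates only).
pickups = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05),
           (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
rho, delta = decision_graph(pickups, cutoff=0.12)

# Points with both high density and high separation are centre candidates.
ranked = sorted(range(len(pickups)), key=lambda i: rho[i] * delta[i], reverse=True)
print(ranked[:2])
```

    Ranking by the product of density and separation is one common heuristic for reading such a graph; choosing how many candidates to keep remains a judgment call in this sketch.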

  13. Two worlds collide: Image analysis methods for quantifying structural variation in cluster molecular dynamics

    SciTech Connect

    Steenbergen, K. G.; Gaston, N.

    2014-02-14

    Inspired by methods of remote sensing image analysis, we analyze structural variation in cluster molecular dynamics (MD) simulations through a unique application of the principal component analysis (PCA) and Pearson Correlation Coefficient (PCC). The PCA analysis characterizes the geometric shape of the cluster structure at each time step, yielding a detailed and quantitative measure of structural stability and variation at finite temperature. Our PCC analysis captures bond structure variation in MD, which can be used to both supplement the PCA analysis as well as compare bond patterns between different cluster sizes. Relying only on atomic position data, without requirement for a priori structural input, PCA and PCC can be used to analyze both classical and ab initio MD simulations for any cluster composition or electronic configuration. Taken together, these statistical tools represent powerful new techniques for quantitative structural characterization and isomer identification in cluster MD.
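
    As a hedged illustration of the Pearson Correlation Coefficient idea, the sketch below compares the interatomic-distance patterns of two frames using only atomic positions. How the authors construct their PCC inputs is not stated in the abstract, so the use of pair distances, the function names and the toy coordinates are assumptions.

```python
from itertools import combinations
from math import dist, sqrt


def pair_distances(positions):
    """All interatomic distances of one frame, in a fixed pair order."""
    return [dist(p, q) for p, q in combinations(positions, 2)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def bond_pattern_similarity(frame_a, frame_b):
    """Correlation between the distance patterns of two frames."""
    return pearson(pair_distances(frame_a), pair_distances(frame_b))


# A toy four-atom cluster and a uniformly expanded copy (illustrative only).
frame = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
expanded = [tuple(1.1 * c for c in atom) for atom in frame]
print(bond_pattern_similarity(frame, expanded))
```

    A uniform expansion leaves the correlation at essentially 1, which reflects that this measure responds to changes in the pattern of distances rather than to overall scale.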

  14. Visual Cluster Analysis in Support of Clinical Decision Intelligence

    PubMed Central

    Gotz, David; Sun, Jimeng; Cao, Nan; Ebadollahi, Shahram

    2011-01-01

    Electronic health records (EHRs) contain a wealth of information about patients. In addition to providing efficient and accurate records for individual patients, large databases of EHRs contain valuable information about overall patient populations. While statistical insights describing an overall population are beneficial, they are often not specific enough to use as the basis for individualized patient-centric decisions. To address this challenge, we describe an approach based on patient similarity which analyzes an EHR database to extract a cohort of patient records most similar to a specific target patient. Clusters of similar patients are then visualized to allow interactive visual refinement by human experts. Statistics are then extracted from the refined patient clusters and displayed to users. The statistical insights taken from these refined clusters provide personalized guidance for complex decisions. This paper focuses on the cluster refinement stage where an expert user must interactively (a) judge the quality and contents of automatically generated similar patient clusters, and (b) refine the clusters based on his/her expertise. We describe the DICON visualization tool which allows users to interactively view and refine multidimensional similar patient clusters. We also present results from a preliminary evaluation where two medical doctors provided feedback on our approach. PMID:22195102

  15. Topological Analysis of Emerging Bipole Clusters Producing Violent Solar Events

    NASA Astrophysics Data System (ADS)

    Mandrini, C. H.; Schmieder, B.; Démoulin, P.; Guo, Y.; Cristiani, G. D.

    2014-06-01

    During the rising phase of Solar Cycle 24 tremendous activity occurred on the Sun with rapid and compact emergence of magnetic flux leading to bursts of flares (C to M and even X-class). We investigate the violent events occurring in the cluster of two active regions (ARs), NOAA numbers 11121 and 11123, observed in November 2010 with instruments onboard the Solar Dynamics Observatory and from Earth. Within one day the total magnetic flux increased by 70 % with the emergence of new groups of bipoles in AR 11123. From all the events on 11 November, we study, in particular, the ones starting at around 07:16 UT in GOES soft X-ray data and the brightenings preceding them. A magnetic-field topological analysis indicates the presence of null points, associated separatrices, and quasi-separatrix layers (QSLs) where magnetic reconnection is prone to occur. The presence of null points is confirmed by a linear and a non-linear force-free magnetic-field model. Their locations and general characteristics are similar in both modelling approaches, which supports their robustness. However, in order to explain the full extension of the analysed event brightenings, which are not restricted to the photospheric traces of the null separatrices, we compute the locations of QSLs. Based on this more complete topological analysis, we propose a scenario to explain the origin of a low-energy event preceding a filament eruption, which is accompanied by a two-ribbon flare, and a consecutive confined flare in AR 11123. The results of our topology computation can also explain the locations of flare ribbons in two other events, one preceding and one following the ones at 07:16 UT. Finally, this study provides further examples where flare-ribbon locations can be explained when compared to QSLs and only, partially, when using separatrices.

  16. Selecting representative climate simulations for impact studies using cluster analysis

    NASA Astrophysics Data System (ADS)

    Mendlik, Thomas; Gobiet, Andreas

    2013-04-01

    In climate change impact research it is crucial to carefully select the climatic input in order to realistically represent the uncertainty in climate scenarios. Usually, the selection of a few simulations as input for the impact investigation is based on subjective expert judgment. However, a more sophisticated objective approach should consider the fact that these climate simulations stem from an ensemble of opportunity, which might inherit model inter-dependencies and biases. Such objective methods for sub-sampling climate simulations from a larger ensemble have received relatively little attention in the scientific literature. This study presents one possible framework to aid in selecting representative climate simulations for specific climate impact studies. By doing so, model interdependence is taken into account, leading to a more reliable ensemble. Multivariate statistical methods are used to describe model dependence based on the spatial patterns of their climate change signals. Several meteorological parameters important for impact models are therefore considered simultaneously. After using dimension reduction techniques, like principal component analysis, similar behavior of climate simulations is detected using cluster analysis. From each grouping found, one representative simulation will be selected, leading to a more independent sub-sample while conserving the main climate change characteristics of the original ensemble. This method can be applied using standard statistical software and is easily adaptable to various sets of meteorological variables and regions. We present an application of this method to select representative simulations from the ENSEMBLES regional multi-model ensemble for a variety of climate impact studies spread over the whole European continent in the EU-FP7 project IMPACT2C.
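
    The final selection step can be sketched as follows. The abstract does not say how the representative of each group is chosen, so picking the member closest to the group centroid is an assumption, as are the function name, the simulation names and the two-component scores standing in for the output of a dimension reduction.

```python
from math import dist


def representatives(features, labels):
    """Pick, for each cluster label, the member nearest the cluster centroid.

    features : dict mapping simulation name -> tuple of reduced-space scores
    labels   : dict mapping simulation name -> cluster label
    """
    groups = {}
    for name, label in labels.items():
        groups.setdefault(label, []).append(name)

    chosen = {}
    for label, names in groups.items():
        dim = len(features[names[0]])
        centroid = [
            sum(features[name][k] for name in names) / len(names)
            for k in range(dim)
        ]
        chosen[label] = min(names, key=lambda name: dist(features[name], centroid))
    return chosen


# Hypothetical scores for six simulations already assigned to two clusters.
scores = {
    "sim_a": (0.0, 0.0), "sim_b": (1.0, 0.0), "sim_c": (5.0, 0.0),
    "sim_d": (4.0, 4.0), "sim_e": (4.2, 4.1), "sim_f": (4.1, 3.0),
}
cluster_of = {"sim_a": 0, "sim_b": 0, "sim_c": 0, "sim_d": 1, "sim_e": 1, "sim_f": 1}
print(representatives(scores, cluster_of))
```

    Other selection rules, such as choosing the medoid or favouring simulations with better validation scores, would fit the same structure by changing the key function.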

  17. Analysis of radial velocities in the Antlia cluster

    NASA Astrophysics Data System (ADS)

    Faifer, F. R.; Smith Castelli, A. V.; Calderón, J. P.; Caso, J. P.; Bassino, L. P.; Cellone, S. A.; Richtler, T.

    We present preliminary results of a radial velocity survey in the central region of the Antlia cluster. These velocities have been measured on spectra obtained, in the 2008A and 2009A semesters, with GMOS (GEMINI South). In this way, several dwarf galaxies that had no previous radial velocities have been confirmed as cluster members. Our work is based on the Ferguson & Sandage (1990) catalogue, in which originally only 6% of the catalogued galaxies (375) had radial velocities. Thanks to the newly determined radial velocities we are able to begin to disentangle the cluster internal structure. FULL TEXT IN SPANISH

  18. Cluster Analysis in Patients with GOLD 1 Chronic Obstructive Pulmonary Disease

    PubMed Central

    Gagnon, Philippe; Casaburi, Richard; Saey, Didier; Porszasz, Janos; Provencher, Steeve; Milot, Julie; Bourbeau, Jean; O’Donnell, Denis E.; Maltais, François

    2015-01-01

    Background We hypothesized that heterogeneity exists within the Global Initiative for Chronic Obstructive Lung Disease (GOLD) 1 spirometric category and that different subgroups could be identified within this GOLD category. Methods Pre-randomization study participants from two clinical trials were symptomatic/asymptomatic GOLD 1 chronic obstructive pulmonary disease (COPD) patients and healthy controls. A hierarchical cluster analysis used pre-randomization demographics, symptom scores, lung function, peak exercise response and daily physical activity levels to derive population subgroups. Results Considerable heterogeneity existed for clinical variables among patients with GOLD 1 COPD. All parameters, except forced expiratory volume in 1 second (FEV1)/forced vital capacity (FVC), had considerable overlap between GOLD 1 COPD and controls. Three clusters were identified: cluster I (18 [15%] COPD patients; 105 [85%] controls); cluster II (45 [80%] COPD patients; 11 [20%] controls); and cluster III (22 [92%] COPD patients; 2 [8%] controls). Apart from reduced diffusion capacity and lower baseline dyspnea index versus controls, cluster I COPD patients had otherwise preserved lung volumes, exercise capacity and physical activity levels. Cluster II COPD patients had a higher smoking history and greater hyperinflation versus cluster I COPD patients. Cluster III COPD patients had reduced physical activity versus controls and clusters I and II COPD patients, and lower FEV1/FVC versus clusters I and II COPD patients. Conclusions The results emphasize heterogeneity within GOLD 1 COPD, supporting an individualized therapeutic approach to patients. Trial registration www.clinicaltrials.gov. NCT01360788 and NCT01072396. PMID:25906326

  19. Delineation of river bed-surface patches by clustering high-resolution spatial grain size data

    NASA Astrophysics Data System (ADS)

    Nelson, Peter A.; Bellugi, Dino; Dietrich, William E.

    2014-01-01

    The beds of gravel-bed rivers commonly display distinct sorting patterns, which at length scales of ~ 0.1 - 1 channel widths appear to form an organization of patches or facies. This paper explores alternatives to traditional visual facies mapping by investigating methods of patch delineation in which clustering analysis is applied to a high-resolution grid of spatial grain-size distributions (GSDs) collected during a flume experiment. Specifically, we examine four clustering techniques: 1) partitional clustering of grain-size distributions with the k-means algorithm (assigning each GSD to a type of patch based solely on its distribution characteristics), 2) spatially-constrained agglomerative clustering ("growing" patches by merging adjacent GSDs, thus generating a hierarchical structure of patchiness), 3) spectral clustering using Normalized Cuts (using the spatial distance between GSDs and the distribution characteristics to generate a matrix describing the similarity between all GSDs, and using the eigenvalues of this matrix to divide the bed into patches), and 4) fuzzy clustering with the fuzzy c-means algorithm (assigning each GSD a membership probability to every patch type). For each clustering method, we calculate metrics describing how well-separated cluster-average GSDs are and how patches are arranged in space. We use these metrics to compute optimal clustering parameters, to compare the clustering methods against each other, and to compare clustering results with patches mapped visually during the flume experiment. All clustering methods produced better-separated patch GSDs than the visually-delineated patches. Although they do not produce crisp cluster assignment, fuzzy algorithms provide useful information that can characterize the uncertainty of a location on the bed belonging to any particular type of patch, and they can be used to characterize zones of transition from one patch to another. The extent to which spatial information influences

  20. Cluster analysis of spontaneous preterm birth phenotypes identifies potential associations among preterm birth mechanisms

    PubMed Central

    Esplin, M Sean; Manuck, Tracy A.; Varner, Michael W.; Christensen, Bryce; Biggio, Joseph; Bukowski, Radek; Parry, Samuel; Zhang, Heping; Huang, Hao; Andrews, William; Saade, George; Sadovsky, Yoel; Reddy, Uma M.; Ilekis, John

    2015-01-01

    Objective We sought to employ an innovative tool based on common biological pathways to identify specific phenotypes among women with spontaneous preterm birth (SPTB), in order to enhance investigators' ability to identify and highlight common mechanisms and underlying genetic factors responsible for SPTB. Study Design A secondary analysis of a prospective case-control multicenter study of SPTB. All cases delivered a preterm singleton at SPTB ≤34.0 weeks gestation. Each woman was assessed for the presence of underlying SPTB etiologies. A hierarchical cluster analysis was used to identify groups of women with homogeneous phenotypic profiles. One of the phenotypic clusters was selected for candidate gene association analysis using VEGAS software. Results 1028 women with SPTB were assigned phenotypes. Hierarchical clustering of the phenotypes revealed five major clusters. Cluster 1 (N=445) was characterized by maternal stress, cluster 2 (N=294) by premature membrane rupture, cluster 3 (N=120) by familial factors, and cluster 4 (N=63) by maternal comorbidities. Cluster 5 (N=106) was multifactorial, characterized by infection (INF), decidual hemorrhage (DH) and placental dysfunction (PD). These three phenotypes were highly correlated by Chi-square analysis [PD and DH (p<2.2e-6); PD and INF (p=6.2e-10); INF and DH (p=0.0036)]. Gene-based testing identified the INS (insulin) gene as significantly associated with cluster 3 of SPTB. Conclusion We identified 5 major clusters of SPTB based on a phenotype tool and hierarchical clustering. There was significant correlation between several of the phenotypes. The INS gene was associated with familial factors underlying SPTB. PMID:26070700

  1. Topic modeling for cluster analysis of large biological and medical datasets

    PubMed Central

    2014-01-01

    Background The big data moniker is nowhere better deserved than to describe the ever-increasing prodigiousness and complexity of biological and medical datasets. New methods are needed to generate and test hypotheses, foster biological interpretation, and build validated predictors. Although multivariate techniques such as cluster analysis may allow researchers to identify groups, or clusters, of related variables, the accuracy and effectiveness of traditional clustering methods diminish for large and hyper dimensional datasets. Topic modeling is an active research field in machine learning and has been mainly used as an analytical tool to structure large textual corpora for data mining. Its ability to reduce high dimensionality to a small number of latent variables makes it suitable as a means for clustering or overcoming clustering difficulties in large biological and medical datasets. Results In this study, three topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, are proposed and tested on the cluster analysis of three large datasets: Salmonella pulsed-field gel electrophoresis (PFGE) dataset, lung cancer dataset, and breast cancer dataset, which represent various types of large biological or medical datasets. All three methods are shown to improve the efficacy/effectiveness of clustering results on the three datasets in comparison to traditional methods. A preferable cluster analysis method emerged for each of the three datasets on the basis of replicating known biological truths. Conclusion Topic modeling could be advantageously applied to the large datasets of biological or medical research. The three proposed topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, yield clustering improvements for the three different data types. Clusters more efficaciously represent truthful groupings and subgroupings in the data than

  2. Visual cluster analysis and pattern recognition template and methods

    SciTech Connect

    Osbourn, G.C.; Martinez, R.F.

    1993-12-31

    This invention is comprised of a method of clustering using a novel template to define a region of influence. Using neighboring approximation methods, computation times can be significantly reduced. The template and method are applicable and improve pattern recognition techniques.

  3. Visual cluster analysis and pattern recognition template and methods

    DOEpatents

    Osbourn, Gordon Cecil; Martinez, Rubel Francisco

    1999-01-01

    A method of clustering using a novel template to define a region of influence. Using neighboring approximation methods, computation times can be significantly reduced. The template and method are applicable and improve pattern recognition techniques.

  4. Visual cluster analysis and pattern recognition template and methods

    DOEpatents

    Osbourn, G.C.; Martinez, R.F.

    1999-05-04

    A method of clustering using a novel template to define a region of influence is disclosed. Using neighboring approximation methods, computation times can be significantly reduced. The template and method are applicable and improve pattern recognition techniques. 30 figs.

  5. Development and optimization of SPECT gated blood pool cluster analysis for the prediction of CRT outcome

    SciTech Connect

    Lalonde, Michel; Wassenaar, Richard; Wells, R. Glenn; Birnie, David; Ruddy, Terrence D.

    2014-07-15

    Purpose: Phase analysis of single photon emission computed tomography (SPECT) radionuclide angiography (RNA) has been investigated for its potential to predict the outcome of cardiac resynchronization therapy (CRT). However, phase analysis may be limited in its potential at predicting CRT outcome as valuable information may be lost by assuming that time-activity curves (TAC) follow a simple sinusoidal shape. A new method, cluster analysis, is proposed which directly evaluates the TACs and may lead to a better understanding of dyssynchrony patterns and CRT outcome. Cluster analysis algorithms were developed and optimized to maximize their ability to predict CRT response. Methods: About 49 patients (N = 27 ischemic etiology) received a SPECT RNA scan as well as positron emission tomography (PET) perfusion and viability scans prior to undergoing CRT. A semiautomated algorithm sampled the left ventricle wall to produce 568 TACs from SPECT RNA data. The TACs were then subjected to two different cluster analysis techniques, K-means, and normal average, where several input metrics were also varied to determine the optimal settings for the prediction of CRT outcome. Each TAC was assigned to a cluster group based on the comparison criteria and global and segmental cluster size and scores were used as measures of dyssynchrony and used to predict response to CRT. A repeated random twofold cross-validation technique was used to train and validate the cluster algorithm. Receiver operating characteristic (ROC) analysis was used to calculate the area under the curve (AUC) and compare results to those obtained for SPECT RNA phase analysis and PET scar size analysis methods. Results: Using the normal average cluster analysis approach, the septal wall produced statistically significant results for predicting CRT results in the ischemic population (ROC AUC = 0.73;p < 0.05 vs. equal chance ROC AUC = 0.50) with an optimal operating point of 71% sensitivity and 60% specificity. 
Cluster

  6. Marketing Mix Formulation for Higher Education: An Integrated Analysis Employing Analytic Hierarchy Process, Cluster Analysis and Correspondence Analysis

    ERIC Educational Resources Information Center

    Ho, Hsuan-Fu; Hung, Chia-Chi

    2008-01-01

    Purpose: The purpose of this paper is to examine how a graduate institute at National Chiayi University (NCYU), by using a model that integrates analytic hierarchy process, cluster analysis and correspondence analysis, can develop effective marketing strategies. Design/methodology/approach: This is primarily a quantitative study aimed at…

  7. Evidence-Based Clustering of Reads and Taxonomic Analysis of Metagenomic Data

    NASA Astrophysics Data System (ADS)

    Folino, Gianluigi; Gori, Fabio; Jetten, Mike S. M.; Marchiori, Elena

    The rapidly emerging field of metagenomics seeks to examine the genomic content of communities of organisms to understand their roles and interactions in an ecosystem. In this paper we focus on clustering methods and their application to taxonomic analysis of metagenomic data. Clustering analysis for metagenomics amounts to group similar partial sequences, such as raw sequence reads, into clusters in order to discover information about the internal structure of the considered dataset, or the relative abundance of protein families. Different methods for clustering analysis of metagenomic datasets have been proposed. Here we focus on evidence-based methods for clustering that employ knowledge extracted from proteins identified by a BLASTx search (proxygenes). We consider two clustering algorithms introduced in previous works and a new one. We discuss advantages and drawbacks of the algorithms, and use them to perform taxonomic analysis of metagenomic data. To this aim, three real-life benchmark datasets used in previous work on metagenomic data analysis are used. Comparison of the results indicates satisfactory coherence of the taxonomies output by the three algorithms, with respect to phylogenetic content at the class level and taxonomic distribution at phylum level. In general, the experimental comparative analysis substantiates the effectiveness of evidence-based clustering methods for taxonomic analysis of metagenomic data.

  8. Mass spectrometric analysis with cluster projectiles and coincidence counting

    SciTech Connect

    Cox, B.D.

    1992-01-01

    Methods for maximizing the amount of secondary ion information, per primary projectile, are described. The method is based on time-of-flight mass spectrometry and event-by-event coincidence counting. The information obtained from coincidence counting time-of-flight mass spectrometry includes: (a) surface composition, (b) relative concentrations, and (c) degree of intermolecular mixing. The technique was applied to the study of an important new class of polymers: polymer blends. Secondary ion mass spectrometry, when applied to the analysis of synthetic polymers, induces backbone fragmentation which is characteristic of the homopolymer. The characteristic fingerprint peaks from polystyrene and poly(vinyl methyl ether) were used to identify the presence of these two polymers in a polymer blend. The percent coincidence between the characteristic secondary ions from each component of the blend was used to determine both the relative concentration and the degree of molecular mixing. Results indicate molecular segregation of the two polymers on the film surface. The largest degree of segregation was determined for the phase separated blends. The performance of this technique depends on the desorption efficiency of the primary projectiles. In practice one seeks primary ions which are surface sensitive, have controllable parameters such as size, velocity, and charge state, and generate high secondary ion yields. Focus was placed on the use of keV organic cluster projectiles to meet these criteria. Of interest to this study were C18 (chrysene), C24 (coronene), and C60 (buckminsterfullerene). Results indicate enhanced secondary ion yields for C60. For example, when CsI is bombarded with 30 keV C60, the yields for I⁻

  9. Strategies for Distributing Time When Studying Text: An Exploratory Cluster-Analysis Approach.

    ERIC Educational Resources Information Center

    Freebody, Peter; And Others

    1986-01-01

    Indicates that membership in pausing and skimming clusters appears to relate to text comprehension, grade level, and rated academic ability, but that these relationships are not all simple or direct. Finds that the cluster-analytic approach provides a useful empirical adjunct to current theoretical perspectives on text analysis and reading…

  10. Identifying At-Risk Students in General Chemistry via Cluster Analysis of Affective Characteristics

    ERIC Educational Resources Information Center

    Chan, Julia Y. K.; Bauer, Christopher F.

    2014-01-01

    The purpose of this study is to identify academically at-risk students in first-semester general chemistry using affective characteristics via cluster analysis. Through the clustering of six preselected affective variables, three distinct affective groups were identified: low (at-risk), medium, and high. Students in the low affective group…

  11. Social Learning Network Analysis Model to Identify Learning Patterns Using Ontology Clustering Techniques and Meaningful Learning

    ERIC Educational Resources Information Center

    Firdausiah Mansur, Andi Besse; Yusof, Norazah

    2013-01-01

    Clustering on social learning networks is still not widely explored, especially when the network focuses on an e-learning system. Conventional methods are not really suitable for e-learning data. SNA requires content analysis, which involves human intervention and needs to be carried out manually. Some of the previous clustering techniques need…

  12. On the Partitioning of Squared Euclidean Distance and Its Applications in Cluster Analysis.

    ERIC Educational Resources Information Center

    Carter, Randy L.; And Others

    1989-01-01

    The partitioning of squared Euclidean (E²) distance between two vectors in M-dimensional space into the sum of squared lengths of vectors in mutually orthogonal subspaces is discussed. Applications to specific cluster analysis problems are provided (i.e., to design Monte Carlo studies for performance comparisons of several clustering methods…
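
    One familiar instance of such a partition can be checked numerically. The abstract does not say which subspaces the authors use, so the choice here (the span of the constant vector and its orthogonal complement, which separates a "level" difference from a "shape" difference) is an assumption, as are the function name and the example vectors.

```python
def partition_sq_distance(x, y):
    """Split the squared Euclidean distance between x and y into two parts.

    level : squared length of the projection of (x - y) onto the constant
            vector, equal to M * (mean difference)**2
    shape : squared length of the remainder, orthogonal to the constant vector
    """
    diff = [a - b for a, b in zip(x, y)]
    m = len(diff)
    mean_diff = sum(diff) / m
    level = m * mean_diff ** 2
    shape = sum((d - mean_diff) ** 2 for d in diff)
    return level, shape


x = (1.0, 2.0, 3.0, 4.0)
y = (2.0, 2.0, 5.0, 3.0)
level, shape = partition_sq_distance(x, y)
total = sum((a - b) ** 2 for a, b in zip(x, y))
print(level, shape, total)  # level + shape equals total
```

    Because the two components are orthogonal, their squared lengths add up to the full squared distance, so each part can be varied independently when constructing test data.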

  13. Applying Clustering to Statistical Analysis of Student Reasoning about Two-Dimensional Kinematics

    ERIC Educational Resources Information Center

    Springuel, R. Padraic; Wittman, Michael C.; Thompson, John R.

    2007-01-01

    We use clustering, an analysis method not presently common to the physics education research community, to group and characterize student responses to written questions about two-dimensional kinematics. Previously, clustering has been used to analyze multiple-choice data; we analyze free-response data that includes both sketches of vectors and…

  14. Quantitative Methylation Analysis of the PCDHB Gene Cluster.

    PubMed

    Banelli, Barbara; Romani, Massimo

    2015-01-01

    Long Range Epigenetic Silencing (LRES) is a repressed chromatin state of large chromosomal regions caused by DNA hypermethylation and histone modifications and is commonly observed in cancer. At 5q31 a LRES region of 800 kb includes three multi-gene clusters (PCDHA@, PCDHB@, and PCDHG@). Multiple lines of experimental evidence have led the PCDHB cluster to be considered a DNA methylation marker of aggressiveness in neuroblastoma, the second most common solid tumor in childhood. Because of its potential involvement not only in neuroblastoma but also in other malignancies, an easy and fast assay to screen the DNA methylation content of the PCDHB cluster might be useful for the precise stratification of the patients into risk groups and hence for choosing the most appropriate therapeutic protocol. Accordingly, we have developed a simple and cost-effective Pyrosequencing® assay to evaluate the methylation level of 17 genes in the protocadherin B cluster (PCDHB@). The rationale behind this Pyrosequencing assay can in principle be applied to analyze the DNA methylation level of any gene cluster with high homology for screening purposes. PMID:26103900

  15. A Bayesian Analysis of the Ages of Four Open Clusters

    NASA Astrophysics Data System (ADS)

    Jeffery, Elizabeth J.; von Hippel, Ted; van Dyk, David A.; Stenning, David C.; Robinson, Elliot; Stein, Nathan; Jefferys, William H.

    2016-09-01

    In this paper we apply a Bayesian technique to determine the best fit of stellar evolution models to find the main sequence turn-off age and other cluster parameters of four intermediate-age open clusters: NGC 2360, NGC 2477, NGC 2660, and NGC 3960. Our algorithm utilizes a Markov chain Monte Carlo technique to fit these various parameters, objectively finding the best-fit isochrone for each cluster. The result is a high-precision isochrone fit. We compare these results with those of traditional “by-eye” isochrone fitting methods. By applying this Bayesian technique to NGC 2360, NGC 2477, NGC 2660, and NGC 3960, we determine the ages of these clusters to be 1.35 ± 0.05, 1.02 ± 0.02, 1.64 ± 0.04, and 0.860 ± 0.04 Gyr, respectively. The results of this paper continue our effort to determine cluster ages to a higher precision than that offered by these traditional methods of isochrone fitting.

  16. Robust growing neural gas algorithm with application in cluster analysis.

    PubMed

    Qin, A K; Suganthan, P N

    2004-01-01

    We propose a novel robust clustering algorithm within the Growing Neural Gas (GNG) framework, called the Robust Growing Neural Gas (RGNG) network. The Matlab codes are available from . By incorporating several robust strategies, such as an outlier-resistant scheme, adaptive modulation of learning rates and a cluster repulsion method, into the traditional GNG framework, the proposed RGNG network possesses better robustness properties. The RGNG is insensitive to initialization, input sequence ordering and the presence of outliers. Furthermore, the RGNG network can automatically determine the optimal number of clusters by seeking the extreme value of the Minimum Description Length (MDL) measure during the network growing process. The resulting center positions of the optimal number of clusters represented by prototype vectors are close to the actual ones irrespective of the existence of outliers. Topology relationships among these prototypes can also be established. Experimental results have shown the superior performance of our proposed method over the original GNG incorporating the MDL method, called GNG-M, in static data clustering tasks on both artificial and UCI data sets. PMID:15555857

  17. Deconstruction and analysis of multiphonic clusters in the modern flute

    NASA Astrophysics Data System (ADS)

    Barravecchio, Shauna

    The modern flute has been acoustically analyzed in great detail by many, but only from the point of view of traditional playing techniques. Very little research exists to date on more modern, "extended" technique performance. This paper explores the production of multiphonic note clusters as played on the modern flute. Several clusters as notated in James Pellerite's book on flute fingerings are recorded and analyzed for frequency content. Each one is then compared to the expected frequency content based on John Backus' 1978 paper on woodwind multiphonics. Using this information, the fingering configuration of each cluster can be deconstructed and each component pitch explained in terms of the root frequencies, overtone series, and sideband frequencies.

  18. Functional clustering algorithm for the analysis of dynamic network data

    NASA Astrophysics Data System (ADS)

    Feldt, S.; Waddell, J.; Hetrick, V. L.; Berke, J. D.; Żochowski, M.

    2009-05-01

    We formulate a technique for the detection of functional clusters in discrete event data. The advantage of this algorithm is that no prior knowledge of the number of functional groups is needed, as our procedure progressively combines data traces and derives the optimal clustering cutoff in a simple and intuitive manner through the use of surrogate data sets. In order to demonstrate the power of this algorithm to detect changes in network dynamics and connectivity, we apply it to both simulated neural spike train data and real neural data obtained from the mouse hippocampus during exploration and slow-wave sleep. Using the simulated data, we show that our algorithm performs better than existing methods. In the experimental data, we observe state-dependent clustering patterns consistent with known neurophysiological processes involved in memory consolidation.

  19. Connectionist approach for clustering with applications in image analysis

    SciTech Connect

    Vinod, V.V.; Chaudhury, S.; Mukherjee, J.; Ghose, S.

    1994-03-01

    A new neural network strategy for clustering is presented. The network works on the histogram and the process is similar to mode separation. The number of clusters is autonomously detected by the network, which overcomes some major difficulties encountered by mode separation techniques. Clustering is done by first selecting the prototypes and then assigning patterns to one of the prototypes based on its distance from the prototype and the distribution of data. The network does not employ weight learning and is therefore faster than existing unsupervised learning networks. The network was applied to a wide class of problems including gray level image reduction, color segmentation and remotely sensed image segmentation. The experimental results obtained are promising. 26 refs.

  20. Molecular-dynamics analysis of mobile helium cluster reactions near surfaces of plasma-exposed tungsten

    SciTech Connect

    Hu, Lin; Maroudas, Dimitrios; Hammond, Karl D.; Wirth, Brian D.

    2015-10-28

    We report the results of a systematic atomic-scale analysis of the reactions of small mobile helium clusters (He$_n$, 4 ≤ n ≤ 7) near low-Miller-index tungsten (W) surfaces, aiming at a fundamental understanding of the near-surface dynamics of helium-carrying species in plasma-exposed tungsten. These small mobile helium clusters are attracted to the surface and migrate to the surface by Fickian diffusion and drift due to the thermodynamic driving force for surface segregation. As the clusters migrate toward the surface, trap mutation (TM) and cluster dissociation reactions are activated at rates higher than in the bulk. TM produces W adatoms and immobile complexes of helium clusters surrounding W vacancies located within the lattice planes at a short distance from the surface. These reactions are identified and characterized in detail based on the analysis of a large number of molecular-dynamics trajectories for each such mobile cluster near W(100), W(110), and W(111) surfaces. TM is found to be the dominant cluster reaction for all cluster and surface combinations, except for the He$_4$ and He$_5$ clusters near W(100) where cluster partial dissociation following TM dominates. We find that there exists a critical cluster size, n = 4 near W(100) and W(111) and n = 5 near W(110), beyond which the formation of multiple W adatoms and vacancies in the TM reactions is observed. The identified cluster reactions are responsible for important structural, morphological, and compositional features in the plasma-exposed tungsten, including surface adatom populations, near-surface immobile helium-vacancy complexes, and retained helium content, which are expected to influence the amount of hydrogen recycling and tritium retention in fusion tokamaks.

  1. X-ray analysis of filaments in galaxy clusters

    NASA Astrophysics Data System (ADS)

    Walker, S. A.; Kosec, P.; Fabian, A. C.; Sanders, J. S.

    2015-11-01

    We perform a detailed X-ray study of the filaments surrounding the brightest cluster galaxies in a sample of nearby galaxy clusters using deep Chandra observations, namely the Perseus, Centaurus and Virgo clusters, and Abell 1795. We compare the X-ray properties and spectra of the filaments in all of these systems, and find that their Chandra X-ray spectra are all broadly consistent with an absorbed two-temperature thermal model, with temperature components at 0.75 and 1.7 keV. We find that it is also possible to model the Chandra ACIS filament spectra with a charge exchange model provided a thermal component is also present, and the abundance of oxygen is suppressed relative to the abundance of Fe. In this model, charge exchange provides the dominant contribution to the spectrum in the 0.5-1.0 keV band. However, when we study the high spectral resolution RGS spectrum of the filamentary plume seen in X-rays in Centaurus, the opposite appears to be the case. The properties of the filaments in our sample of clusters are also compared to the X-ray tails of galaxies in the Coma cluster and Abell 3627. In the Perseus cluster, we search for signs of absorption by a prominent region of molecular gas in the filamentary structure around NGC 1275. We do find a decrement in the X-ray spectrum below 2 keV, indicative of absorption. However, the spectral shape is inconsistent with this decrement being caused by simply adding an additional absorbing component. We find that the spectrum can be well fit (with physically sensible parameters) with a model that includes both absorption by molecular gas and X-ray emission from the filament, which partially counteracts the absorption.

  2. MASSCLEAN - MASSive CLuster Evolution and ANalysis Package - Description and Tests

    NASA Astrophysics Data System (ADS)

    Hanson, Margaret M.; Popescu, B.

    2009-05-01

    We present MASSCLEAN, a new, sophisticated and robust stellar cluster image and photometry simulation package. This package is able to create color-magnitude diagrams and standard FITS images in any of the traditional optical and near-infrared bands based on cluster characteristics input by the user, including but not limited to distance, age, mass, radius and extinction. At the limit of very distant, unresolved clusters, we have checked the integrated colors created in MASSCLEAN against those from other simple stellar population models with consistent results. We have also tested models which provide a reasonable estimate of the field star contamination in images and color-magnitude diagrams. We demonstrate the package by simulating images and color-magnitude diagrams of well known massive Milky Way clusters and compare their appearance to real data. Because the algorithm populates the cluster with a discrete number of tenable stars, it can be used as part of a Monte Carlo Method to derive the probabilistic range of characteristics (integrated colors, for example) consistent with a given cluster mass and age. The discrete nature of our code is demonstrated in the realistic stochastic variation seen in the predicted V-K integrated colors as compared to the unrealistically smooth color from other SSP codes. Our simulation package is available to download and will run on any standard desktop running UNIX/Linux. Full documentation on installation and its use is also available. Finally, a web-based version of MASSCLEAN which can be immediately used and is sufficiently adaptable for most applications is available through a web interface.

  3. An analysis of spatial clustering and implications for wildlife management: a burrowing owl example.

    PubMed

    Fisher, Joshua B; Trulio, Lynne A; Biging, Gregory S; Chromczak, Debra

    2007-03-01

    Analysis tools that combine large spatial and temporal scales are necessary for efficient management of wildlife species, such as the burrowing owl (Athene cunicularia). We assessed the ability of Ripley's K-function analysis integrated into a geographic information system (GIS) to determine changes in burrowing owl nest clustering over two years at NASA Ames Research Center. Specifically, we used these tools to detect changes in spatial and temporal nest clustering before, during, and after conducting management by mowing to maintain low vegetation height at nest burrows. We found that the scale and timing of owl nest clustering matched the scale and timing of our conservation management actions over a short time frame. While this study could not determine a causal link between mowing and nest clustering, we did find that Ripley's K and GIS were effective in detecting owl nest clustering and show promise for future conservation uses. PMID:17253092
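
    The clustering test mentioned above can be sketched in a few lines of Python. This is a minimal, assumption-laden illustration and not the GIS workflow used in the study: it uses one common naive estimator of Ripley's K, K(r) = A / (n(n-1)) × (number of ordered pairs closer than r), with no edge correction. The coordinates, study area, and the name `ripley_k` are all hypothetical.

```python
import math

def ripley_k(points, radius, area):
    """Naive Ripley's K estimate at one radius (no edge correction)."""
    n = len(points)
    close_pairs = sum(
        1
        for i, p in enumerate(points)
        for j, q in enumerate(points)
        if i != j and math.dist(p, q) <= radius
    )
    return area * close_pairs / (n * (n - 1))

# Four hypothetical nests packed into one corner of a 10 x 10 study area.
nests = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
k_obs = ripley_k(nests, radius=1.0, area=100.0)
k_csr = math.pi * 1.0 ** 2  # expected K under complete spatial randomness
print(k_obs > k_csr)  # True
```

    An observed K well above πr² at a given radius suggests clustering at that scale; repeating the calculation over a range of radii and over successive surveys is one way to track how the scale of clustering changes through time.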

  4. An Analysis of Spatial Clustering and Implications for Wildlife Management: A Burrowing Owl Example

    NASA Astrophysics Data System (ADS)

    Fisher, Joshua B.; Trulio, Lynne A.; Biging, Gregory S.; Chromczak, Debra

    2007-03-01

    Analysis tools that combine large spatial and temporal scales are necessary for efficient management of wildlife species, such as the burrowing owl (Athene cunicularia). We assessed the ability of Ripley’s K-function analysis integrated into a geographic information system (GIS) to determine changes in burrowing owl nest clustering over two years at NASA Ames Research Center. Specifically, we used these tools to detect changes in spatial and temporal nest clustering before, during, and after conducting management by mowing to maintain low vegetation height at nest burrows. We found that the scale and timing of owl nest clustering matched the scale and timing of our conservation management actions over a short time frame. While this study could not determine a causal link between mowing and nest clustering, we did find that Ripley’s K and GIS were effective in detecting owl nest clustering and show promise for future conservation uses.

  5. An analysis of spatial clustering and implications for wildlife management: a burrowing owl example.

    PubMed

    Fisher, Joshua B; Trulio, Lynne A; Biging, Gregory S; Chromczak, Debra

    2007-03-01

    Analysis tools that combine large spatial and temporal scales are necessary for efficient management of wildlife species, such as the burrowing owl (Athene cunicularia). We assessed the ability of Ripley's K-function analysis integrated into a geographic information system (GIS) to determine changes in burrowing owl nest clustering over two years at NASA Ames Research Center. Specifically, we used these tools to detect changes in spatial and temporal nest clustering before, during, and after conducting management by mowing to maintain low vegetation height at nest burrows. We found that the scale and timing of owl nest clustering matched the scale and timing of our conservation management actions over a short time frame. While this study could not determine a causal link between mowing and nest clustering, we did find that Ripley's K and GIS were effective in detecting owl nest clustering and show promise for future conservation uses.

  6. An Empirical Comparison of Variable Standardization Methods in Cluster Analysis.

    ERIC Educational Resources Information Center

    Schaffer, Catherine M.; Green, Paul E.

    1996-01-01

    The common marketing research practice of standardizing the columns of a persons-by-variables data matrix prior to clustering the entities corresponding to the rows was evaluated with 10 large-scale data sets. Results indicate that the column standardization practice may be problematic for some kinds of data that marketing researchers used for…
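
    For readers unfamiliar with the practice being evaluated, the sketch below shows one common form of column standardization (z-scores) in Python. It is an illustration only: the abstract does not say which standardization variants were compared, and the function name `zscore_columns` and the toy matrix are hypothetical.

```python
from statistics import mean, pstdev

def zscore_columns(rows):
    """Rescale each column of a rows-by-variables matrix to mean 0, sd 1."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # A constant column has zero spread; divide by 1.0 so it maps to 0.0.
    sds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, sds)]
        for row in rows
    ]

# Three hypothetical respondents measured on two very different scales.
data = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]
standardized = zscore_columns(data)
print(standardized)
```

    After standardization both columns contribute equally to a Euclidean distance, which is the intended effect but also the source of the concern raised in the abstract: a variable whose large spread carries real cluster structure is scaled down to the same range as every other variable.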

  7. Functional Analysis of a Mosquito Short Chain Dehydrogenase Cluster

    PubMed Central

    Mayoral, Jaime G.; Leonard, Kate T.; Defelipe, Lucas A.; Turjansksi, Adrian G.; Nouzova, Marcela; Noriegal, Fernando G.

    2013-01-01

    The short chain dehydrogenases (SDR) constitute one of the oldest and largest families of enzymes with over 46,000 members in sequence databases. About 25% of all known dehydrogenases belong to the SDR family. SDR enzymes have critical roles in lipid, amino acid, carbohydrate, hormone and xenobiotic metabolism as well as in redox sensor mechanisms. This family is present in archaea, bacteria, and eukaryota, emphasizing their versatility and fundamental importance for metabolic processes. We identified a cluster of eight SDRs in the mosquito Aedes aegypti (AaSDRs). Members of the cluster differ in tissue specificity and developmental expression. Heterologous expression produced recombinant proteins that had diverse substrate specificities, but distinct from the conventional insect alcohol (ethanol) dehydrogenases. They are all NADP+-dependent and they have S-enantioselectivity and preference for secondary alcohols with 8–15 carbons. Homology modeling was used to build the structure of AaSDR1 and two additional cluster members. The computational study helped explain the selectivity towards the (10S)-isomers as well as the reduced activity of AaSDR4 and AaSDR9 for longer isoprenoid substrates. Similar clusters of SDRs are present in other species of insects, suggesting similar selection mechanisms causing duplication and diversification of this family of enzymes. PMID:23238893

  8. Representation in GIS of the Results Obtained by Cluster Analysis in Territorial Profile

    NASA Astrophysics Data System (ADS)

    Dârdală, Marian; Furtună, Titus Felix; Reveiu, Adriana

    2010-05-01

    Cluster analysis groups the entities under study according to the values of a set of grouping parameters. To obtain the information needed to group administrative units, statistical cluster analysis uses the minimum-dispersion hierarchical tree method. Economic analyses in territorial profile can use cluster analysis to build hierarchical classifications according to performance or strategy. Hierarchical tree methods identify the hierarchies by which the units are considered. According to their mode of organization, clusters can be vertically integrated, horizontally integrated, or emerging. With GIS, clustering can be applied to spatial data to represent the territorial analysis performed. A common way to view the results of cluster analysis in GIS is to generate cartograms. In this case, a cartogram requires defining a color ramp with as many colors as there are groups dividing the collectivity. The parameters used as the basis of the clustering process may exist as independent data or can be stored in the database of an information system. As a case study we implemented an ArcMap extension to analyze the clusters by selecting the grouping parameters and by setting the number of groups that will divide the collectivity. Cartograms can be defined taking into account the multi-level administrative division of the territory. For example, Romania is divided into villages, counties, regions and macro-regions. Analysis can be applied at different levels of administrative organization by aggregating the values of parameters. For example, the value of a parameter for a county can be obtained by aggregating the parameter values of all villages belonging to the county.

  9. Dynamical analysis of the cluster pair: A3407 + A3408

    NASA Astrophysics Data System (ADS)

    Nascimento, R. S.; Ribeiro, A. L. B.; Trevisan, M.; Carrasco, E. R.; Plana, H.; Dupke, R.

    2016-08-01

    We carried out a dynamical study of the galaxy cluster pair A3407 and A3408 based on a spectroscopic survey obtained with the 4 metre Blanco telescope at the Cerro Tololo Interamerican Observatory, plus 6dF data, and ROSAT All-Sky Survey. The sample consists of 122 member galaxies brighter than $m_R = 20$. Our main goal is to probe the galaxy dynamics in this field and verify if the sample constitutes a single galaxy system or corresponds to an ongoing merging process. Statistical tests were applied to cluster members showing that both the composite system A3407 + A3408 as well as each individual cluster have Gaussian velocity distributions. A velocity gradient of ~847 ± 114 km s$^{-1}$ was identified around the principal axis of the projected distribution of galaxies, indicating that the global field may be rotating. Applying the KMM algorithm to the distribution of galaxies, we found that the solution with two clusters is better than the single unit solution at the 99 per cent confidence level. This is consistent with the X-ray distribution around this field, which shows no common X-ray halo involving A3407 and A3408. We also estimated virial masses and applied a two-body model to probe the dynamics of the pair. The more likely scenario is that in which the pair is gravitationally bound and probably experiences a collapse phase, with the cluster cores crossing in less than ~1 $h^{-1}$ Gyr, a pre-merger scenario. The complex X-ray morphology, the gas temperature, and some signs of galaxy evolution in A3408 suggest a post-merger scenario, with cores having crossed each other ~1.65 $h^{-1}$ Gyr ago, as an alternative solution.

  10. Galaxy cluster mass estimation from stacked spectroscopic analysis

    NASA Astrophysics Data System (ADS)

    Farahi, Arya; Evrard, August E.; Rozo, Eduardo; Rykoff, Eli S.; Wechsler, Risa H.

    2016-08-01

    We use simulated galaxy surveys to study: (i) how galaxy membership in redMaPPer clusters maps to the underlying halo population, and (ii) the accuracy of a mean dynamical cluster mass, $M_\sigma(\lambda)$, derived from stacked pairwise spectroscopy of clusters with richness λ. Using ~130 000 galaxy pairs patterned after the Sloan Digital Sky Survey (SDSS) redMaPPer cluster sample study of Rozo et al., we show that the pairwise velocity probability density function of central-satellite pairs with $m_i < 19$ in the simulation matches the form seen in Rozo et al. Through joint membership matching, we deconstruct the main Gaussian velocity component into its halo contributions, finding that the top-ranked halo contributes ~60 per cent of the stacked signal. The halo mass scale inferred by applying the virial scaling of Evrard et al. to the velocity normalization matches, to within a few per cent, the log-mean halo mass derived through galaxy membership matching. We apply this approach, along with miscentring and galaxy velocity bias corrections, to estimate the log-mean matched halo mass at z = 0.2 of SDSS redMaPPer clusters. Employing the velocity bias constraints of Guo et al., we find $\langle \ln M \mid \lambda \rangle = \ln(M_{30}) + \alpha_m \ln(\lambda/30)$ with $M_{30} = (1.56 \pm 0.35) \times 10^{14}\,M_\odot$ and $\alpha_m = 1.31 \pm 0.06_{\rm stat} \pm 0.13_{\rm sys}$. Systematic uncertainty in the velocity bias of satellite galaxies overwhelmingly dominates the error budget.
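
    As a worked illustration of the mass-richness scaling quoted above, the Python sketch below evaluates the relation at two richness values using only the central values of the normalization and slope given in the abstract. It ignores the quoted statistical and systematic uncertainties, and the function name `log_mean_halo_mass` is hypothetical.

```python
import math

M30 = 1.56e14   # solar masses; central value quoted in the abstract
ALPHA_M = 1.31  # richness slope; central value quoted in the abstract

def log_mean_halo_mass(richness):
    """Return exp of the log-mean halo mass at the given richness."""
    ln_mass = math.log(M30) + ALPHA_M * math.log(richness / 30.0)
    return math.exp(ln_mass)

for richness in (30, 60):
    print(richness, f"{log_mean_halo_mass(richness):.3e}")
```

    Because the relation is linear in log space, doubling the richness multiplies the mass by 2 raised to the power of the slope, regardless of the starting richness.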

  11. NeAT: a toolbox for the analysis of biological networks, clusters, classes and pathways.

    PubMed

    Brohée, Sylvain; Faust, Karoline; Lima-Mendez, Gipsi; Sand, Olivier; Janky, Rekin's; Vanderstocken, Gilles; Deville, Yves; van Helden, Jacques

    2008-07-01

    The network analysis tools (NeAT) (http://rsat.ulb.ac.be/neat/) provide user-friendly web access to a collection of modular tools for the analysis of networks (graphs) and clusters (e.g. microarray clusters, functional classes, etc.). A first set of tools supports basic operations on graphs (comparison between two graphs, neighborhood of a set of input nodes, path finding and graph randomization). Another set of programs makes the connection between networks and clusters (graph-based clustering, clique discovery and mapping of clusters onto a network). The toolbox also includes programs for detecting significant intersections between clusters/classes (e.g. clusters of co-expression versus functional classes of genes). NeAT are designed to cope with large datasets and provide a flexible toolbox for analyzing biological networks stored in various databases (protein interactions, regulation and metabolism) or obtained from high-throughput experiments (two-hybrid, mass-spectrometry and microarrays). The web interface interconnects the programs in predefined analysis flows, making it possible to address a series of questions about networks of interest. Each tool can also be used separately by entering custom data for a specific analysis. NeAT can also be used as web services (SOAP/WSDL interface), in order to design programmatic workflows and integrate them with other available resources.

  12. Efficient clustering aggregation based on data fragments.

    PubMed

    Wu, Ou; Hu, Weiming; Maybank, Stephen J; Zhu, Mingliang; Li, Bing

    2012-06-01

    Clustering aggregation, known as clustering ensembles, has emerged as a powerful technique for combining different clustering results to obtain a single better clustering. Existing clustering aggregation algorithms are applied directly to data points, in what is referred to as the point-based approach. The algorithms are inefficient if the number of data points is large. We define an efficient approach for clustering aggregation based on data fragments. In this fragment-based approach, a data fragment is any subset of the data that is not split by any of the clustering results. To establish the theoretical bases of the proposed approach, we prove that clustering aggregation can be performed directly on data fragments under two widely used goodness measures for clustering aggregation taken from the literature. Three new clustering aggregation algorithms are described. The experimental results obtained using several public data sets show that the new algorithms have lower computational complexity than three well-known existing point-based clustering aggregation algorithms (Agglomerative, Furthest, and LocalSearch); nevertheless, the new algorithms do not sacrifice the accuracy. PMID:22334025
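
    The fragment definition above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it computes the maximal fragments by grouping points that carry the same label in every input clustering. The function name `data_fragments` and the toy label vectors are hypothetical.

```python
from collections import defaultdict

def data_fragments(clusterings):
    """Group point indices that no input clustering separates."""
    n_points = len(clusterings[0])
    groups = defaultdict(list)
    for i in range(n_points):
        # Points sharing this tuple of labels are never split apart.
        signature = tuple(labels[i] for labels in clusterings)
        groups[signature].append(i)
    return sorted(groups.values())

# Two hypothetical clusterings of the same six points.
clusterings = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
]
fragments = data_fragments(clusterings)
print(fragments)  # [[0, 1], [2], [3], [4, 5]]
```

    Here six points collapse to four fragments, so an aggregation algorithm operating on fragments handles four items instead of six; the saving grows when the input clusterings largely agree.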

  13. Groundwater source contamination mechanisms: physicochemical profile clustering, risk factor analysis and multivariate modelling.

    PubMed

    Hynds, Paul; Misstear, Bruce D; Gill, Laurence W; Murphy, Heather M

    2014-04-01

    An integrated domestic well sampling and "susceptibility assessment" programme was undertaken in the Republic of Ireland from April 2008 to November 2010. Overall, 211 domestic wells were sampled, assessed and collated with local climate data. Based upon groundwater physicochemical profile, three clusters have been identified and characterised by source type (borehole or hand-dug well) and local geological setting. Statistical analysis indicates that cluster membership is significantly associated with the prevalence of bacteria (p=0.001), with mean Escherichia coli presence within clusters ranging from 15.4% (Cluster-1) to 47.6% (Cluster-3). Bivariate risk factor analysis shows that on-site septic tank presence was the only risk factor significantly associated (p<0.05) with bacterial presence within all clusters. Point agriculture adjacency was significantly associated with both borehole-related clusters. Well design criteria were associated with hand-dug wells and boreholes in areas characterised by high permeability subsoils, while local geological setting was significant for hand-dug wells and boreholes in areas dominated by low/moderate permeability subsoils. Multivariate susceptibility models were developed for all clusters, with predictive accuracies of 84% (Cluster-1) to 91% (Cluster-2) achieved. Septic tank setback was a common variable within all multivariate models, while agricultural sources were also significant, albeit to a lesser degree. Furthermore, well liner clearance was a significant factor in all models, indicating that direct surface ingress is a significant well contamination mechanism. Identification and elucidation of cluster-specific contamination mechanisms may be used to develop improved overall risk management and wellhead protection strategies, while also informing future remediation and maintenance efforts.

  14. A Unified Framework for Clustering and Quantitative Analysis of White Matter Fiber Tracts

    PubMed Central

    Maddah, Mahnaz; Grimson, W. Eric L.; Warfield, Simon K.; Wells, William M.

    2008-01-01

    We present a novel approach for joint clustering and point-by-point mapping of white matter fiber pathways. Knowledge of the point correspondence along the fiber pathways is not only necessary for accurate clustering of the trajectories into fiber bundles, but also crucial for any tract-oriented quantitative analysis. We employ an expectation-maximization (EM) algorithm to cluster the trajectories in a Gamma mixture model context. The result of clustering is the probabilistic assignment of the fiber trajectories to each cluster, an estimate of the cluster parameters, i.e. spatial mean and variance, and point correspondences. The fiber bundles are modeled by the mean trajectory and its spatial variation. Point-by-point correspondence of the trajectories within a bundle is obtained by constructing a distance map and a label map from each cluster center at every iteration of the EM algorithm. This offers a time-efficient alternative to pairwise curve matching of all trajectories with respect to each cluster center. The proposed method has the potential to benefit from an anatomical atlas of fiber tracts by incorporating it as prior information in the EM algorithm. The algorithm is also capable of handling outliers in a principled way. The presented results confirm the efficiency and effectiveness of the proposed framework for quantitative analysis of diffusion tensor MRI. PMID:18180197

  15. Analysis of Cluster spacecraft potential during active control

    NASA Astrophysics Data System (ADS)

    Torkar, K.; Fehringer, M.; Escoubet, C. P.; André, M.; Pedersen, A.; Svenes, K. R.; Décréau, P. M. E.

    The floating potential of a spacecraft is determined by an equilibrium between photo-electron emission from the sunlit spacecraft surfaces and the plasma electron current, while other currents play a secondary role. On the Cluster spacecraft, the presence of the experiment ASPOC to control the potential by an ion beam with currents up to several tens of microamperes and energies of several keV provides an opportunity to study the interaction between the spacecraft and the ambient plasma with the current of the artificial ion beam as an additional parameter. The effect of active control on the Cluster spacecraft potential in the various plasma environments is presented as overall statistics. Changes of the potential resulting from switching the ion beam current to different levels serve to calibrate the density-potential relationship.

  16. SU-E-J-98: Radiogenomics: Correspondence Between Imaging and Genetic Features Based On Clustering Analysis

    SciTech Connect

    Harmon, S; Wendelberger, B; Jeraj, R

    2014-06-01

    Purpose: Radiogenomics aims to establish relationships between patient genotypes and imaging phenotypes. An open question remains on how best to integrate information from these distinct datasets. This work investigates if similarities in genetic features across patients correspond to similarities in PET-imaging features, assessed with various clustering algorithms. Methods: [$^{18}$F]FDG PET data was obtained for 26 NSCLC patients from a public database (TCIA). Tumors were contoured using an in-house segmentation algorithm combining gradient and region-growing techniques; resulting ROIs were used to extract 54 PET-based features. Corresponding genetic microarray data containing 48,778 elements were also obtained for each tumor. Given the mismatch in feature sizes, two dimension reduction techniques were also applied to the genetic data: principal component analysis (PCA) and selective filtering of 25 NSCLC-associated genes-of-interest (GOI). Gene datasets (full, PCA, and GOI) and PET feature datasets were independently clustered using K-means and hierarchical clustering with a variable number of clusters (K). The Jaccard Index (JI) was used to score similarity of cluster assignments across different datasets. Results: Patient clusters from imaging data showed poor similarity to clusters from gene datasets, regardless of clustering algorithm or number of clusters (JI$_{mean}$ = 0.3429 ± 0.1623). Notably, we found clustering algorithms had different sensitivities to data reduction techniques. Using hierarchical clustering, the PCA dataset showed perfect cluster agreement with the full-gene set (JI = 1) for all values of K, and the agreement between the GOI set and the full-gene set decreased as the number of clusters increased (JI = 0.9231 and 0.5769 for K = 2 and 5, respectively). K-means clustering assignments were highly sensitive to data reduction and showed poor stability for different values of K (JI$_{range}$: 0.2301–1). Conclusion: Using commonly-used clustering algorithms
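
    One way to score agreement between two sets of cluster assignments with a Jaccard Index is the pair-counting form sketched below in Python. The abstract does not state which variant was used, so this is an assumption made for illustration; the function name `pair_jaccard` and the toy label vectors are hypothetical.

```python
from itertools import combinations

def pair_jaccard(labels_a, labels_b):
    """Pair-counting Jaccard index between two clusterings of the same items."""
    both = either = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        both += same_a and same_b   # pair grouped together in both clusterings
        either += same_a or same_b  # pair grouped together in at least one
    return both / either if either else 1.0

print(pair_jaccard([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
print(pair_jaccard([0, 0, 1, 1], [0, 0, 1, 2]))  # 0.5
```

    Counting co-clustered pairs rather than comparing labels directly makes the score independent of how each algorithm happens to number its clusters, as the first call shows.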

  17. Insights into quasar UV spectra using unsupervised clustering analysis

    NASA Astrophysics Data System (ADS)

    Tammour, A.; Gallagher, S. C.; Daley, M.; Richards, G. T.

    2016-06-01

    Machine learning techniques can provide powerful tools to detect patterns in multidimensional parameter space. We use K-means - a simple yet powerful unsupervised clustering algorithm which picks out structure in unlabelled data - to study a sample of quasar UV spectra from the Quasar Catalog of the 10th Data Release of the Sloan Digital Sky Survey (SDSS-DR10) of Paris et al. Detecting patterns in large data sets helps us gain insights into the physical conditions and processes giving rise to the observed properties of quasars. We use K-means to find clusters in the parameter space of the equivalent width (EW), the blue- and red-half-width at half-maximum (HWHM) of the Mg II 2800 Å line, the C IV 1549 Å line, and the C III] 1908 Å blend in samples of broad absorption line (BAL) and non-BAL quasars at redshift 1.6-2.1. Using this method, we successfully recover correlations well-known in the UV regime such as the anti-correlation between the EW and blueshift of the C IV emission line and the shape of the ionizing spectral energy distribution (SED) probed by the strength of He II and the Si III]/C III] ratio. We find this to be particularly evident when the properties of C III] are used to find the clusters, while those of Mg II proved to be less strongly correlated with the properties of the other lines in the spectra such as the width of C IV or the Si III]/C III] ratio. We conclude that unsupervised clustering methods (such as K-means) are powerful methods for finding 'natural' binning boundaries in multidimensional data sets and discuss caveats and future work.

  18. Photometric analysis of Galactic Stellar Clusters in VVV Survey

    NASA Astrophysics Data System (ADS)

    Mauro, F.; Moni Bidin, C.; Cohen, R. E.; Geisler, D.; Villanova, S.; Chené, A. N.

    2014-10-01

    We show the preliminary results of the study of the structure of the Horizontal Branch of Liller 1 and some results from the Calcium Triplet method using Ks magnitude applied to several Galactic Globular clusters using data from the VISTA Variables in the Via Lactea Survey (Minniti et al. 2010) and obtained with GeMS/GSAOI. The data are extracted with the new automatic VVV-SkZ_pipeline photometric pipeline (Mauro et al. 2013).

  19. Integrating Data Clustering and Visualization for the Analysis of 3D Gene Expression Data

    SciTech Connect

    Data Analysis and Visualization and the Department of Computer Science, University of California, Davis, One Shields Avenue, Davis CA 95616, USA; International Research Training Group "Visualization of Large and Unstructured Data Sets," University of Kaiserslautern, Germany; Computational Research Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720, USA; Genomics Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley CA 94720, USA; Life Sciences Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley CA 94720, USA; Computer Science Division, University of California, Berkeley, CA, USA; Computer Science Department, University of California, Irvine, CA, USA; All authors are with the Berkeley Drosophila Transcription Network Project, Lawrence Berkeley National Laboratory; Rubel, Oliver; Weber, Gunther H.; Huang, Min-Yu; Bethel, E. Wes; Biggin, Mark D.; Fowlkes, Charless C.; Hendriks, Cris L. Luengo; Keranen, Soile V. E.; Eisen, Michael B.; Knowles, David W.; Malik, Jitendra; Hagen, Hans; Hamann, Bernd

    2008-05-12

    The recent development of methods for extracting precise measurements of spatial gene expression patterns from three-dimensional (3D) image data opens the way for new analyses of the complex gene regulatory networks controlling animal development. We present an integrated visualization and analysis framework that supports user-guided data clustering to aid exploration of these new complex datasets. The interplay of data visualization and clustering-based data classification leads to improved visualization and enables a more detailed analysis than previously possible. We discuss (i) integration of data clustering and visualization into one framework; (ii) application of data clustering to 3D gene expression data; (iii) evaluation of the number of clusters k in the context of 3D gene expression clustering; and (iv) improvement of overall analysis quality via dedicated post-processing of clustering results based on visualization. We discuss the use of this framework to objectively define spatial pattern boundaries and temporal profiles of genes and to analyze how mRNA patterns are controlled by their regulatory transcription factors.

  20. Analysis of Cluster spacecraft potential during active control

    NASA Astrophysics Data System (ADS)

    Torkar, K.; Fehringer, M.; Escoubet, C.; Andre, M.; Pedersen, A.; Svenes, K.; Décréau, P.

    The floating potential of a spacecraft is determined by an equilibrium between photo-electron emission from the sunlit spacecraft surfaces, plasma electron current, and secondary effects. Without spacecraft potential control, the result largely reflects the density and temperature of the ambient plasma. On the Cluster spacecraft, the presence of the experiment ASPOC to control the potential by an ion beam with currents up to several tens of microamperes and energies of several keV provides an opportunity to study the interaction between the spacecraft and the ambient plasma with the current of the artificial ion beam as an additional parameter. Changes of the potential resulting from switching the ion beam current to different levels serve to calibrate the density-potential relationship. Wave data are used to obtain independent information on plasma density. The measurements onboard Cluster are compared with models and data from other spacecraft. After describing the principle of the interaction and showing some events out of the first 1.5 years of operation, an overall statistic is presented, describing the effect of active control on the Cluster spacecraft potential in the various plasma environments.

  1. Variability in body size and shape of UK offshore workers: A cluster analysis approach.

    PubMed

    Stewart, Arthur; Ledingham, Robert; Williams, Hector

    2017-01-01

    Male UK offshore workers have enlarged dimensions compared with UK norms and knowledge of specific sizes and shapes typifying their physiques will assist a range of functions related to health and ergonomics. A representative sample of the UK offshore workforce (n = 588) underwent 3D photonic scanning, from which 19 extracted dimensional measures were used in k-means cluster analysis to characterise physique groups. Of the 11 resulting clusters four somatotype groups were expressed: one cluster was muscular and lean, four had greater muscularity than adiposity, three had equal adiposity and muscularity and three had greater adiposity than muscularity. Some clusters appeared constitutionally similar to others, differing only in absolute size. These cluster centroids represent an evidence-base for future designs in apparel and other applications where body size and proportions affect functional performance. They also constitute phenotypic evidence providing insight into the 'offshore culture' which may underpin the enlarged dimensions of offshore workers. PMID:27633221

  2. How Teachers Use and Manage Their Blogs? A Cluster Analysis of Teachers' Blogs in Taiwan

    ERIC Educational Resources Information Center

    Liu, Eric Zhi-Feng; Hou, Huei-Tse

    2013-01-01

    The development of Web 2.0 has ushered in a new set of web-based tools, including blogs. This study focused on how teachers use and manage their blogs. A sample of 165 teachers' blogs in Taiwan was analyzed by factor analysis, cluster analysis and qualitative content analysis. First, the teachers' blogs were analyzed according to six criteria…

  3. Abundance analysis of an extended sample of open clusters: A search for chemical inhomogeneities

    NASA Astrophysics Data System (ADS)

    Reddy, Arumalla B. S.; Giridhar, Sunetra; Lambert, David L.

    We have initiated a program to explore the presence of chemical inhomogeneities in the Galactic disk using the open clusters as ideal probes. We have analyzed high-dispersion echelle spectra (R ≥ 55,000) of red giant members for eleven open clusters to derive abundances for many elements. Cluster membership has been confirmed through radial velocities and proper motions. The spread in temperatures and gravities being very small among the red giants, nearly the same stellar lines were employed, thereby reducing the random errors. The errors of average abundance for the cluster were generally in the 0.02 to 0.07 dex range. Our present sample covers galactocentric distances of 8.3 to 11.3 kpc and an age range of 0.2 to 4.3 Gyr. Our earlier analysis of four open clusters (Reddy A.B.S. et al., 2012, MNRAS, 419, 1350) indicates that abundances relative to Fe for elements from Na to Eu are equal within measurement uncertainties to published abundances for thin disk giants in the field. This supports the view that field stars come from disrupted open clusters. In the enlarged sample of eleven open clusters we find cluster to cluster abundance variations for some s- and r-process elements, with certain elements such as Zr and Ba showing large variation. These differences mark the signatures that these clusters had formed under different environmental conditions (Type II SN, Type Ia SN, AGB stars or a mixture of any of these) unique to the time and site of formation. These eleven clusters support the widely held impression that there is an abundance gradient such that the metallicity [Fe/H] at the solar galactocentric distance decreases outwards at about -0.1 dex per kpc.

  4. The CERN analysis facility—a PROOF cluster for day-one physics analysis

    NASA Astrophysics Data System (ADS)

    G-Oetringhaus, J. F.

    2008-07-01

    ALICE (A Large Ion Collider Experiment) at the LHC plans to use a PROOF cluster at CERN (CAF - CERN Analysis Facility) for analysis. The system is especially aimed at the prototyping phase of analyses that need a high number of development iterations and thus require a short response time. Typical examples are the tuning of cuts during the development of an analysis as well as calibration and alignment. Furthermore, the use of an interactive system with very fast response will allow ALICE to extract physics observables out of first data quickly. An additional use case is fast event simulation and reconstruction. A test setup consisting of 40 machines has been used for evaluation since May 2006. The PROOF system enables parallel processing, and xrootd provides access to files distributed on the test cluster. An automatic staging system for files either catalogued in the ALICE file catalog or stored in the CASTOR mass storage system has been developed. The current setup and ongoing development towards disk quotas and CPU fairshare are described. Furthermore, the integration of PROOF into ALICE's software framework (AliRoot) is discussed.

  5. Cluster analysis of particulate matter (PM10) and black carbon (BC) concentrations

    NASA Astrophysics Data System (ADS)

    Žibert, Janez; Pražnikar, Jure

    2012-09-01

    The monitoring of air-pollution constituents like particulate matter (PM10) and black carbon (BC) can provide information about air quality and the dynamics of emissions. Air quality depends on natural and anthropogenic sources of emissions as well as the weather conditions. For a one-year period the diurnal concentrations of PM10 and BC in the Port of Koper were analysed by clustering days into similar groups according to the similarity of the BC and PM10 hourly derived day-profiles without any prior assumptions about working and non-working days, weather conditions or hot and cold seasons. The analysis was performed by using k-means clustering with the squared Euclidean distance as the similarity measure. The analysis showed that 10 clusters in the BC case produced 3 clusters with just one member day and 7 clusters that encompass more than one day with similar BC profiles. Similar results were found in the PM10 case, where one cluster has a single-member day, while 7 clusters contain several member days. The clustering analysis revealed that the clusters with less pronounced bimodal patterns and low hourly and average daily concentrations for both types of measurements include the most days in the one-year analysis. A typical day profile of the BC measurements includes a bimodal pattern with morning and evening peaks, while the PM10 measurements reveal a less pronounced bimodality. There are also clusters with single-peak day-profiles. The BC data in such cases exhibit morning peaks, while the PM10 data consist of noon or afternoon single peaks. Single pronounced peaks can be explained by appropriate cluster wind speed profiles. The analysis also revealed some special day-profiles. The BC cluster with a high midnight peak on 30/04/2010 and the PM10 cluster with the highest observed concentration of PM10 on 01/05/2010 (208.0 μg m⁻³) coincide with 1 May, which is a national holiday in Slovenia and has a very strong tradition of bonfire parties. The clustering of
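
    The abstract above describes grouping daily concentration profiles with k-means under the squared Euclidean distance. As a rough illustration of that procedure (not the authors' code), the sketch below runs plain Lloyd iterations on short made-up profiles. The four-value profiles, the explicit initial centroids, and all function names are assumptions; the study used hourly profiles.

```python
def squared_euclidean(p, q):
    """Squared Euclidean distance between two equally long profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_profile(profiles):
    """Element-wise mean of a non-empty list of profiles."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]


def k_means(profiles, initial_centroids, max_iter=100):
    """Lloyd's algorithm; returns (labels, centroids)."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each day goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)),
                key=lambda k: squared_euclidean(p, centroids[k]))
            for p in profiles
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its member days.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(profiles, labels) if lab == k]
            if members:  # keep the old centroid if a cluster empties
                centroids[k] = mean_profile(members)
    return labels, centroids


# Hypothetical day profiles: two bimodal days and two flat days.
days = [[1, 5, 1, 5], [1, 6, 1, 6], [1, 1, 1, 1], [2, 1, 1, 2]]
labels, centroids = k_means(days, initial_centroids=[days[0], days[2]])
```

    A cluster that ends up with a single member day, as reported in the abstract, simply means that one day's profile is far from every other centroid.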

  6. MMPI-2: Cluster Analysis of Personality Profiles in Perinatal Depression—Preliminary Evidence

    PubMed Central

    Grillo, Alessandra; Lauriola, Marco; Giacchetti, Nicoletta

    2014-01-01

    Background. To assess personality characteristics of women who develop perinatal depression. Methods. The study started with a screening of a sample of 453 women in their third trimester of pregnancy, who were administered a survey data form, the Edinburgh Postnatal Depression Scale (EPDS) and the Minnesota Multiphasic Personality Inventory 2 (MMPI-2). A clinical group of subjects with perinatal depression (PND, 55 subjects) was selected; clinical and validity scales of MMPI-2 were used as predictors in a hierarchical cluster analysis. Results. The analysis identified three clusters of personality profile: two “clinical” clusters (1 and 3) and an “apparently common” one (cluster 2). The first cluster (39.5%) collects structures of personality with prevalent obsessive or dependent functioning tending to develop a “psychasthenic” depression; the third cluster (13.95%) includes women with prevalent borderline functioning tending to develop “dysphoric” depression; the second cluster (46.5%) shows a normal profile with a “defensive” attitude, probably due to the presence of defense mechanisms or to the fear of stigma. Conclusion. Characteristics of personality have a key role in clinical manifestations of perinatal depression; it is important to detect them to identify mothers at risk and to plan targeted therapeutic interventions. PMID:25574499

  7. Weighing the Giants - I. Weak-lensing masses for 51 massive galaxy clusters: project overview, data analysis methods and cluster images

    NASA Astrophysics Data System (ADS)

    von der Linden, Anja; Allen, Mark T.; Applegate, Douglas E.; Kelly, Patrick L.; Allen, Steven W.; Ebeling, Harald; Burchat, Patricia R.; Burke, David L.; Donovan, David; Morris, R. Glenn; Blandford, Roger; Erben, Thomas; Mantz, Adam

    2014-03-01

    This is the first in a series of papers in which we measure accurate weak-lensing masses for 51 of the most X-ray luminous galaxy clusters known at redshifts 0.15 ≲ zCl ≲ 0.7, in order to calibrate X-ray and other mass proxies for cosmological cluster experiments. The primary aim is to improve the absolute mass calibration of cluster observables, currently the dominant systematic uncertainty for cluster count experiments. Key elements of this work are the rigorous quantification of systematic uncertainties, high-quality data reduction and photometric calibration, and the 'blind' nature of the analysis to avoid confirmation bias. Our target clusters are drawn from X-ray catalogues based on the ROSAT All-Sky Survey, and provide a versatile calibration sample for many aspects of cluster cosmology. We have acquired wide-field, high-quality imaging using the Subaru Telescope and Canada-France-Hawaii Telescope for all 51 clusters, in at least three bands per cluster. For a subset of 27 clusters, we have data in at least five bands, allowing accurate photometric redshift estimates of lensed galaxies. In this paper, we describe the cluster sample and observations, and detail the processing of the SuprimeCam data to yield high-quality images suitable for robust weak-lensing shape measurements and precision photometry. For each cluster, we present wide-field three-colour optical images and maps of the weak-lensing mass distribution, the optical light distribution and the X-ray emission. These provide insights into the large-scale structure in which the clusters are embedded. We measure the offsets between X-ray flux centroids and the brightest cluster galaxies in the clusters, finding these to be small in general, with a median of 20 kpc. For offsets ≲100 kpc, weak-lensing mass measurements centred on the brightest cluster galaxies agree well with values determined relative to the X-ray centroids; miscentring is therefore not a significant source of systematic

  8. Transcriptome Analysis of Aspergillus flavus Reveals veA-Dependent Regulation of Secondary Metabolite Gene Clusters, Including the Novel Aflavarin Cluster

    PubMed Central

    Cary, J. W.; Han, Z.; Yin, Y.; Lohmar, J. M.; Shantappa, S.; Harris-Coward, P. Y.; Mack, B.; Ehrlich, K. C.; Wei, Q.; Arroyo-Manzanares, N.; Uka, V.; Vanhaecke, L.; Bhatnagar, D.; Yu, J.; Nierman, W. C.; Johns, M. A.; Sorensen, D.; Shen, H.; De Saeger, S.; Diana Di Mavungu, J.

    2015-01-01

    The global regulatory veA gene governs development and secondary metabolism in numerous fungal species, including Aspergillus flavus. This is especially relevant since A. flavus infects crops of agricultural importance worldwide, contaminating them with potent mycotoxins. The most well-known are aflatoxins, which are cytotoxic and carcinogenic polyketide compounds. The production of aflatoxins and the expression of genes implicated in the production of these mycotoxins are veA dependent. The genes responsible for the synthesis of aflatoxins are clustered, a signature common for genes involved in fungal secondary metabolism. Studies of the A. flavus genome revealed many gene clusters possibly connected to the synthesis of secondary metabolites. Many of these metabolites are still unknown, or the association between a known metabolite and a particular gene cluster has not yet been established. In the present transcriptome study, we show that veA is necessary for the expression of a large number of genes. Twenty-eight out of the predicted 56 secondary metabolite gene clusters include at least one gene that is differentially expressed depending on presence or absence of veA. One of the clusters under the influence of veA is cluster 39. The absence of veA results in a downregulation of the five genes found within this cluster. Interestingly, our results indicate that the cluster is expressed mainly in sclerotia. Chemical analysis of sclerotial extracts revealed that cluster 39 is responsible for the production of aflavarin. PMID:26209694

  9. Somatosensory nociceptive characteristics differentiate subgroups in people with chronic low back pain: a cluster analysis.

    PubMed

    Rabey, Martin; Slater, Helen; OʼSullivan, Peter; Beales, Darren; Smith, Anne

    2015-10-01

    The objectives of this study were to explore the existence of subgroups in a cohort with chronic low back pain (n = 294) based on the results of multimodal sensory testing and profile subgroups on demographic, psychological, lifestyle, and general health factors. Bedside (2-point discrimination, brush, vibration and pinprick perception, temporal summation on repeated monofilament stimulation) and laboratory (mechanical detection threshold, pressure, heat and cold pain thresholds, conditioned pain modulation) sensory testing were examined at wrist and lumbar sites. Data were entered into principal component analysis, and 5 component scores were entered into latent class analysis. Three clusters, with different sensory characteristics, were derived. Cluster 1 (31.9%) was characterised by average to high temperature and pressure pain sensitivity. Cluster 2 (52.0%) was characterised by average to high pressure pain sensitivity. Cluster 3 (16.0%) was characterised by low temperature and pressure pain sensitivity. Temporal summation occurred significantly more frequently in cluster 1. Subgroups were profiled on pain intensity, disability, depression, anxiety, stress, life events, fear avoidance, catastrophizing, perception of the low back region, comorbidities, body mass index, multiple pain sites, sleep, and activity levels. Clusters 1 and 2 had a significantly greater proportion of female participants and higher depression and sleep disturbance scores than cluster 3. The proportion of participants undertaking <300 minutes per week of moderate activity was significantly greater in cluster 1 than in clusters 2 and 3. Low back pain, therefore, does not appear to be homogeneous. Pain mechanisms relating to presentations of each subgroup were postulated. Future research may investigate prognoses and interventions tailored towards these subgroups. PMID:26020225

  11. The application of cluster analysis in the intercomparison of loop structures in RNA.

    PubMed

    Huang, Hung-Chung; Nagaswamy, Uma; Fox, George E

    2005-04-01

    We have developed a computational approach for the comparison and classification of RNA loop structures. Hairpin or interior loops identified in atomic resolution RNA structures were intercompared by conformational matching. The root-mean-square deviation (RMSD) values between all pairs of RNA fragments of interest, even if from different molecules, are calculated. Subsequently, cluster analysis is performed on the resulting matrix of RMSD distances using the unweighted pair group method with arithmetic mean (UPGMA). The cluster analysis objectively reveals groups of folds that resemble one another. To demonstrate the utility of the approach, a comprehensive analysis of all the terminal hairpin tetraloops that have been observed in 15 RNA structures that have been determined by X-ray crystallography was undertaken. The method found major clusters corresponding to the well-known GNRA and UNCG types. In addition, two tetraloops with the unusual primary sequence UMAC (M is A or C) were successfully assigned to the GNRA cluster. Larger loop structures were also examined and the clustering results confirmed the occurrence of variations of the GNRA and UNCG tetraloops in these loops and provided a systematic means for locating them. Nineteen examples of larger loops that closely resemble either the GNRA or UNCG tetraloop were found in the large ribosomal RNAs. When the clustering approach was extended to include all structures in the SCOR database, novel relationships were detected including one between the ANYA motif and a less common folding of the GAAA tetraloop sequence.
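
    The abstract above clusters a matrix of pairwise RMSD values with UPGMA (average linkage). A minimal sketch of that merging rule follows: at each step the two groups with the smallest mean pairwise distance are joined. The four-fragment RMSD matrix, the stopping rule (a target number of clusters), and the function name are hypothetical and chosen only for illustration.

```python
def upgma(dist, n_clusters):
    """Average-linkage agglomeration on a symmetric distance matrix.

    Returns (clusters, merges), where each merge is recorded as
    (members_a, members_b, average_distance).
    """
    clusters = [[i] for i in range(len(dist))]
    merges = []

    def average_distance(a, b):
        # UPGMA: unweighted mean over all cross-cluster pairs.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of current clusters.
        d, x, y = min(
            (average_distance(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        merges.append((sorted(clusters[x]), sorted(clusters[y]), d))
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return clusters, merges


# Hypothetical RMSD values (in angstroms) between four loop fragments.
rmsd = [
    [0.0, 0.4, 2.0, 2.2],
    [0.4, 0.0, 2.1, 2.3],
    [2.0, 2.1, 0.0, 0.5],
    [2.2, 2.3, 0.5, 0.0],
]
clusters, merges = upgma(rmsd, n_clusters=2)
```

    In this toy matrix, fragments 0 and 1 merge first, then 2 and 3, leaving two groups that would correspond to two fold families. The recorded merge distances are the heights one would plot in a dendrogram.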

  12. Analysis of cluster explosive synchronization in complex networks

    NASA Astrophysics Data System (ADS)

    Ji, Peng; Peron, Thomas K. DM.; Rodrigues, Francisco A.; Kurths, Jürgen

    2014-12-01

    Correlations between intrinsic dynamics and local topology have become a new trend in the study of synchronization in complex networks. In this paper, we investigate the influence of topology on the dynamics of networks made up of second-order Kuramoto oscillators. In particular, based on mean-field calculations, we provide a detailed investigation of cluster explosive synchronization (CES) [Phys. Rev. Lett. 110, 218701 (2013), 10.1103/PhysRevLett.110.218701] in scale-free networks as a function of several topological properties. Moreover, we investigate the robustness of discontinuous transitions by including an additional quenched disorder, and we show that the phase coherence decreases with increasing strength of the quenched disorder. These results complement the previous findings regarding CES and also fundamentally deepen the understanding of the interplay between topology and dynamics under the constraint of correlating natural frequencies and local structure.

  14. Cluster analysis for the probability of DSB site induced by electron tracks

    NASA Astrophysics Data System (ADS)

    Yoshii, Y.; Sasaki, K.; Matsuya, Y.; Date, H.

    2015-05-01

    To clarify the effects of ionizing radiation on biological cells, the densely populated pattern of ionization in the cell nucleus is of importance because it governs the extent of DNA damage which may lead to cell lethality. In this study, we have conducted a cluster analysis of ionization and excitation events to estimate the number of double-strand breaks (DSBs) induced by electron tracks. A Monte Carlo simulation for electrons in liquid water was performed to determine the spatial location of the ionization and excitation events. The events were divided into clusters by using the density-based spatial clustering of applications with noise (DBSCAN) algorithm. The algorithm enables us to sort the events into groups (clusters) in which a minimum number of neighboring events are contained within a given radius. For evaluating the number of DSBs in the extracted clusters, we have introduced an aggregation index (AI). The computational results show that a sub-keV electron produces DSBs in a dense formation more effectively than higher energy electrons. The root-mean-square radius (RMSR) of the cluster size is below 5 nm, which is smaller than the chromatin fiber thickness. It was found that this size of clustering events has a high possibility to cause lesions in DNA within the chromatin fiber site.
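
    The abstract above groups simulated event positions with DBSCAN, which requires a minimum number of neighbouring events within a given radius. The sketch below is a small, brute-force version of that algorithm for 3D points. It is not the authors' implementation; the coordinates, the radius `eps`, and the threshold `min_pts` are made-up values for illustration.

```python
import math

NOISE = -1


def region_query(points, idx, eps):
    """Indices of all points within eps of points[idx] (itself included)."""
    return [j for j, q in enumerate(points)
            if math.dist(points[idx], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned or marked as noise
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later become a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point joins the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is a core point: expand
    return labels


# Hypothetical event positions in nm: two dense groups and one stray event.
events = [
    (0, 0, 0), (1, 0, 0), (0, 1, 0),
    (10, 10, 10), (11, 10, 10), (10, 11, 10),
    (50, 50, 50),
]
event_labels = dbscan(events, eps=2.0, min_pts=3)
```

    Unlike k-means, the number of clusters is not chosen in advance; it follows from the radius and the minimum neighbour count, and isolated events are left unassigned as noise.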

  15. A landscape-based cluster analysis using recursive search instead of a threshold parameter.

    PubMed

    Gladwin, Thomas E; Vink, Matthijs; Mars, Roger B

    2016-01-01

    Cluster-based analysis methods in neuroimaging provide control of whole-brain false positive rates without the need to conservatively correct for the number of voxels and the associated false negative results. The current method defines clusters based purely on shapes in the landscape of activation, instead of requiring the choice of a statistical threshold that may strongly affect results. Statistical significance is determined using permutation testing, combining both size and height of activation. A method is proposed for dealing with relatively small local peaks. Simulations confirm the method controls the false positive rate and correctly identifies regions of activation. The method is also illustrated using real data.
    • A landscape-based method to define clusters in neuroimaging data avoids the need to pre-specify a threshold to define clusters.
    • The implementation of the method works as expected, based on simulated and real data.
    • The recursive method used for defining clusters, the method used for combining clusters, and the definition of the "value" of a cluster may be of interest for future variations.
    PMID:27489780

  16. Task Analysis for Health Occupations. Cluster: Nursing. Occupation: Home Health Aide. Education for Employment Task Lists.

    ERIC Educational Resources Information Center

    Lake County Area Vocational Center, Grayslake, IL.

    This document contains a task analysis for health occupations (home health aide) in the nursing cluster. For each task listed, occupation, duty area, performance standard, steps, knowledge, attitudes, safety, equipment/supplies, source of analysis, and Illinois state goals for learning are listed. For the duty area of "providing therapeutic…

  17. Task Analysis for Health Occupations. Cluster: Dental Assisting. Occupation: Dental Assistant. Education for Employment Task Lists.

    ERIC Educational Resources Information Center

    Lathrop, Janice

    This document contains a task analysis for health occupations (dental assistant) in the dental assisting cluster. For each task listed, occupation, duty area, performance standard, steps, knowledge, attitudes, safety, equipment/supplies, source of analysis, and Illinois state goals for learning are listed. For the duty area of "providing…

  18. Standardized Effect Size Measures for Mediation Analysis in Cluster-Randomized Trials

    ERIC Educational Resources Information Center

    Stapleton, Laura M.; Pituch, Keenan A.; Dion, Eric

    2015-01-01

    This article presents 3 standardized effect size measures to use when sharing results of an analysis of mediation of treatment effects for cluster-randomized trials. The authors discuss 3 examples of mediation analysis (upper-level mediation, cross-level mediation, and cross-level mediation with a contextual effect) with demonstration of the…

  19. Prediction of lithology types at the Hanford 300 Area using a clustering analysis

    NASA Astrophysics Data System (ADS)

    Thai, J.; Rockhold, M. L.; Vermeul, V.; Johnson, T. E.; Zachara, J. M.; Rubin, Y.

    2011-12-01

    The purpose of this study is to find an optimal method for mapping the three-dimensional distribution of lithology at the Hanford IFRC site 300 Area based on surrogate measurements. We considered 6 types of measurements for this analysis: gamma ray, concentration of U-238 (609), K-40, U-238 (1764), Th-232, and the hydraulic conductivity. To decide which combinations of variables are best suited for determining lithology type, we trained our classification method using training sets that included several wells with lithological information. A clustering analysis was applied to each training set and the lithology types for each cluster of the training set were fitted with a probability distribution function. The lithology type at each point in the testing set was selected to be the one linked with the mode of the distribution at the corresponding cluster. The predictions were then checked against the data of the testing set. This process was applied repeatedly using different numbers of clusters. In addition, many different configurations of training sets and testing sets were used to establish confidence in the predictive ability of the clustering and classification methods. Our best success rates as measured by matching predictions with observations were obtained for 2 or 3 clusters, and the following measurements: concentration of U-238 (609), K-40, U-238 (1764), and Th-232, and were consistently around 80%.
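
    The cluster-then-classify procedure described above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the clustering algorithm, so plain k-means is swapped in, the empirical mode of the training labels stands in for the fitted probability distribution, and the function names and lithology labels ("sand", "gravel", "silt") are hypothetical.

```python
from collections import Counter
import math


def nearest(p, centroids):
    """Index of the centroid closest to point p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means. Assumption: the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Recompute each centroid as the mean of its members; keep it if empty.
        new = [
            [sum(col) / len(g) for col in zip(*g)] if g else centroids[i]
            for i, g in enumerate(groups)
        ]
        if new == centroids:
            break
        centroids = new
    return centroids


def fit_cluster_modes(points, labels, k):
    """Cluster the training points, then record the most frequent label per cluster."""
    centroids = kmeans(points, k)
    counts = [Counter() for _ in range(k)]
    for p, lab in zip(points, labels):
        counts[nearest(p, centroids)][lab] += 1
    modes = [c.most_common(1)[0][0] if c else None for c in counts]
    return centroids, modes


def predict(p, centroids, modes):
    """Predicted lithology: the mode label of the cluster that p falls into."""
    return modes[nearest(p, centroids)]


# Hypothetical two-variable measurements with made-up lithology labels.
train = [(1.0, 1.1), (9.0, 9.2), (1.2, 0.9), (8.8, 9.1), (0.9, 1.0), (9.1, 8.9)]
labels = ["sand", "gravel", "sand", "gravel", "silt", "gravel"]
centroids, modes = fit_cluster_modes(train, labels, k=2)
```

    Repeating the fit for different values of k and different train/test splits, and scoring predictions against held-out labels, mirrors the evaluation loop the abstract describes.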

  20. Insights into secondary metabolism from a global analysis of prokaryotic biosynthetic gene clusters

    PubMed Central

    Cimermancic, Peter; Medema, Marnix H.; Claesen, Jan; Kurita, Kenji; Wieland Brown, Laura C.; Mavrommatis, Konstantinos; Pati, Amrita; Godfrey, Paul A.; Koehrsen, Michael; Clardy, Jon; Birren, Bruce W.; Takano, Eriko; Sali, Andrej; Linington, Roger G.; Fischbach, Michael A.

    2014-01-01

    Summary Although biosynthetic gene clusters (BGCs) have been discovered for hundreds of bacterial metabolites, our knowledge of their diversity remains limited. Here, we used a novel algorithm to systematically identify BGCs in the extensive extant microbial sequencing data. Network analysis of the predicted BGCs revealed large gene cluster families, the vast majority uncharacterized. We experimentally characterized the most prominent family, consisting of two subfamilies of hundreds of BGCs distributed throughout the Proteobacteria; their products are aryl polyenes, lipids with an aryl head group conjugated to a polyene tail. We identified a distant relationship to a third subfamily of aryl polyene BGCs, and together the three subfamilies represent the largest known family of biosynthetic gene clusters, with more than 1,000 members. Although these clusters are widely divergent in sequence, their small molecule products are remarkably conserved, indicating for the first time the important roles these compounds play in Gram-negative cell biology. PMID:25036635

  1. 3D Plasma Clusters: Analysis of dynamical evolution and individual particle interaction

    SciTech Connect

    Antonova, T.; Thomas, H. M.; Morfill, G. E.; Annaratone, B. M.

    2008-09-07

    3D plasma clusters (up to 100 particles) have been built inside a small (32 mm³) plasma volume under gravity. It has been estimated that the external confinement has a negligible influence on the processes inside the clusters. Under such conditions, the analysis of dynamical evolution and individual particle interactions has shown that the binary interaction among particles, in addition to the repelling Coulomb force, also exhibits an attractive part. The tendency of the systems to approach the state with minimum energy by rearranging particles inside has been detected. The measured vibrations of a 63-particle cluster are in close agreement with vibrations of a drop with surface tension. This indicates that even a 63-particle cluster already exhibits properties normally associated with the cooperative regime.

  2. 3D Plasma Clusters: Analysis of dynamical evolution and individual particle interaction

    NASA Astrophysics Data System (ADS)

    Antonova, T.; Annaratone, B. M.; Thomas, H. M.; Morfill, G. E.

    2008-09-01

    3D plasma clusters (up to 100 particles) have been built inside a small (32 mm³) plasma volume under gravity. It has been estimated that the external confinement has a negligible influence on the processes inside the clusters. Under such conditions, the analysis of dynamical evolution and individual particle interactions has shown that the binary interaction among particles, in addition to the repelling Coulomb force, also exhibits an attractive part. The tendency of the systems to approach the state with minimum energy by rearranging particles inside has been detected. The measured vibrations of a 63-particle cluster are in close agreement with vibrations of a drop with surface tension. This indicates that even a 63-particle cluster already exhibits properties normally associated with the cooperative regime.

  3. Functional Interference Clusters in Cancer Patients With Bone Metastases: A Secondary Analysis of RTOG 9714

    SciTech Connect

    Chow, Edward; James, Jennifer; Barsevick, Andrea; Hartsell, William; Ratcliffe, Sarah; Scarantino, Charles; Ivker, Robert; Roach, Mack; Suh, John; Petersen, Ivy; Konski, Andre; Demas, William; Bruner, Deborah

    2010-04-15

    Purpose: To explore the relationships (clusters) among the functional interference items in the Brief Pain Inventory (BPI) in patients with bone metastases. Methods: Patients enrolled in the Radiation Therapy Oncology Group (RTOG) 9714 bone metastases study were eligible. Patients were assessed at baseline and 4, 8, and 12 weeks after randomization for the palliative radiotherapy with the BPI, which consists of seven functional items: general activity, mood, walking ability, normal work, relations with others, sleep, and enjoyment of life. Principal component analysis with varimax rotation was used to determine the clusters between the functional items at baseline and the follow-up. Cronbach's alpha was used to determine the consistency and reliability of each cluster at baseline and follow-up. Results: There were 448 male and 461 female patients, with a median age of 67 years. There were two functional interference clusters at baseline, which accounted for 71% of the total variance. The first cluster (physical interference) included normal work and walking ability, which accounted for 58% of the total variance. The second cluster (psychosocial interference) included relations with others and sleep, which accounted for 13% of the total variance. The Cronbach's alpha statistics were 0.83 and 0.80, respectively. The functional clusters changed at week 12 in responders but persisted through week 12 in nonresponders. Conclusion: Palliative radiotherapy is effective in reducing bone pain. Functional interference component clusters exist in patients treated for bone metastases. These clusters changed over time in this study, possibly attributable to treatment. Further research is needed to examine these effects.
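
    The Cronbach's alpha statistic used above to check the internal consistency of each cluster follows the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). A minimal sketch, assuming complete item scores aligned by patient (the function name and the toy scores are illustrative, not data from the study):

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items.

    items: one list of scores per item, all aligned by respondent.
    Uses sample variances; raises ValueError for fewer than two items.
    """
    k = len(items)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Total score per respondent across all items in the cluster.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Toy example: two perfectly consistent items give alpha = 1.
alpha = cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4]])
```

    Values near 1 indicate that the items in a cluster move together; the reported 0.83 and 0.80 would conventionally be read as good consistency.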

  4. Cluster Analysis of Longidorus Species (Nematoda: Longidoridae), a New Approach in Species Identification

    PubMed Central

    Ye, Weimin; Robbins, R. T.

    2004-01-01

    Hierarchical cluster analysis based on female morphometric character means, including body length, distance from vulva opening to anterior end, head width, odontostyle length, esophagus length, body width, tail length, and tail width, was used to examine the morphometric relationships and create dendrograms for (i) 62 populations belonging to 9 Longidorus species from Arkansas, (ii) 137 published Longidorus species, and (iii) 137 published Longidorus species plus 86 populations of 16 Longidorus species from Arkansas and various other locations by using JMP 4.02 software (SAS Institute, Cary, NC). Cluster analysis dendrograms visually illustrated the grouping and morphometric relationships of the species and populations. The analysis provided a computerized statistical approach that helps to identify and distinguish species, indicates morphometric relationships among species, and assists with new species diagnosis. Preliminary species identification can be accomplished by running cluster analysis for unknown species together with the data matrix of known published Longidorus species. PMID:19262809
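
    To make the dendrogram construction concrete, here is a naive agglomerative clustering sketch. It is not the JMP procedure: the abstract does not state the linkage or distance used, so average linkage on Euclidean distance is assumed, and the function names and the sample vectors are hypothetical.

```python
import math


def cluster_distance(points, members_a, members_b):
    """Average-linkage distance: mean pairwise distance between two clusters."""
    total = sum(math.dist(points[i], points[j]) for i in members_a for j in members_b)
    return total / (len(members_a) * len(members_b))


def agglomerate(points):
    """Merge the two closest clusters until one remains.

    Returns the merge history as (id_a, id_b, distance, new_size) tuples.
    Leaves are numbered 0..n-1 and each merge creates the next id, which is
    enough information to draw a dendrogram.
    """
    clusters = {i: [i] for i in range(len(points))}
    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        ids = sorted(clusters)
        best = None
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                d = cluster_distance(points, clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d, len(clusters[next_id])))
        next_id += 1
    return merges


# Hypothetical character-mean vectors for four populations.
history = agglomerate([(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 2.0)])
```

    Because the characters listed in the abstract have very different scales (body length versus tail width), standardizing each character before computing distances is usually advisable; an unknown population is then placed by adding its vector to the matrix and reading off which known species it merges with first.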

  5. Fuzzy C-means clustering for chromatographic fingerprints analysis: A gas chromatography-mass spectrometry case study.

    PubMed

    Parastar, Hadi; Bazrafshan, Alisina

    2016-03-18

    Fuzzy C-means clustering (FCM) is proposed as a promising method for the clustering of chromatographic fingerprints of complex samples, such as essential oils. As an example, secondary metabolites of 14 citrus leaf samples are extracted and analyzed by gas chromatography-mass spectrometry (GC-MS). The obtained chromatographic fingerprints are divided into the desired number of chromatographic regions. Because chromatographic problems such as elution time shift and peak overlap can significantly affect the clustering results, each chromatographic region is analyzed using multivariate curve resolution-alternating least squares (MCR-ALS) to address these problems. Then, the resolved elution profiles are used to build a new data matrix, based on peak areas of pure components, that is clustered by FCM. The FCM clustering parameters (i.e., fuzziness coefficient and number of clusters) are optimized by two different methods: partial least squares (PLS) as a conventional method and minimization of the FCM objective function as our new idea. The results showed that minimization of the FCM objective function is an easier and better way to optimize FCM clustering parameters. Then, the optimized FCM clustering algorithm is used to cluster samples and variables to figure out the similarities and dissimilarities among samples and to find discriminant secondary metabolites in each cluster (chemotype). Finally, the FCM clustering results are compared with those of principal component analysis (PCA), hierarchical cluster analysis (HCA) and Kohonen maps. The results confirmed that FCM outperformed these frequently used clustering algorithms.
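
    For readers unfamiliar with FCM, the textbook alternating updates (memberships from distances, then centers from membership-weighted means) can be sketched as below. This is a generic illustration, not the authors' implementation; the parameter m is the fuzziness coefficient mentioned above, and the function names and one-dimensional sample data are assumptions.

```python
import math


def memberships(p, centers, m):
    """Membership of point p in each cluster: u_i = 1 / sum_k (d_i / d_k)^(2/(m-1))."""
    d = [math.dist(p, c) for c in centers]
    if 0.0 in d:
        # The point coincides with a center: give it full membership there.
        hit = d.index(0.0)
        return [1.0 if i == hit else 0.0 for i in range(len(d))]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** power for dk in d) for di in d]


def fcm(points, centers, m=2.0, iters=100, tol=1e-9):
    """Alternate membership and center updates until the centers stop moving."""
    dims = len(points[0])
    for _ in range(iters):
        u = [memberships(p, centers, m) for p in points]
        new_centers = []
        for i in range(len(centers)):
            weights = [row[i] ** m for row in u]
            total = sum(weights)
            new_centers.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dims)
            ))
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, [memberships(p, centers, m) for p in points]


# Hypothetical one-dimensional "peak area" values forming two groups.
samples = [(0.0,), (0.2,), (0.1,), (5.0,), (5.2,), (5.1,)]
centers, u = fcm(samples, [(0.0,), (5.0,)], m=2.0)
```

    Each row of u sums to 1, so a sample can belong partly to several chemotypes; larger m makes the memberships softer, which is why m and the number of clusters both need tuning.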

  6. Analysis of cardiac tissue by gold cluster ion bombardment

    NASA Astrophysics Data System (ADS)

    Aranyosiova, M.; Chorvatova, A.; Chorvat, D.; Biro, Cs.; Velic, D.

    2006-07-01

    Specific molecules in cardiac tissue of spontaneously hypertensive rats are studied by using time-of-flight secondary ion mass spectrometry (TOF-SIMS). The investigation determines phospholipids, cholesterol, fatty acids and their fragments in the cardiac tissue, with special focus on cardiolipin. Cardiolipin is a unique phospholipid typical of the cardiomyocyte mitochondrial membrane, and its decrease is involved in pathologic conditions. In the positive polarity, the fragments of phosphatidylcholine are observed in the mass region of 700-850 u. Peaks over mass 1400 u correspond to intact and cationized molecules of cardiolipin. In animal tissue, cardiolipin contains almost exclusively 18-carbon fatty acids, mostly linoleic acid. Linoleic acid at 279 u, other fatty acids, and phosphatidylglycerol fragments, as precursors of cardiolipin synthesis, are identified in the negative polarity. These data demonstrate that the SIMS technique with an Au3+ cluster primary ion beam is a good tool for detection of higher-mass biomolecules, providing approximately 10 times higher yield in comparison with Au+.

  7. Differentiating Procrastinators from Each Other: A Cluster Analysis.

    PubMed

    Rozental, Alexander; Forsell, Erik; Svensson, Andreas; Forsström, David; Andersson, Gerhard; Carlbring, Per

    2015-01-01

    Procrastination refers to the tendency to postpone the initiation and completion of a given course of action. Approximately one-fifth of the adult population and half of the student population perceive themselves as being severe and chronic procrastinators. Albeit not a psychiatric diagnosis, procrastination has been shown to be associated with increased stress and anxiety, exacerbation of illness, and poorer performance in school and work. However, despite being severely debilitating, little is known about the population of procrastinators in terms of possible subgroups, and previous research has mainly investigated procrastination among university students. The current study examined data from a screening process recruiting participants to a randomized controlled trial of Internet-based cognitive behavior therapy for procrastination (Rozental et al., in press). In total, 710 treatment-seeking individuals completed self-report measures of procrastination, depression, anxiety, and quality of life. The results suggest that there might exist five separate subgroups, or clusters, of procrastinators: "Mild procrastinators" (24.93%), "Average procrastinators" (27.89%), "Well-adjusted procrastinators" (13.94%), "Severe procrastinators" (21.69%), and "Primarily depressed" (11.55%). Hence, there seem to be marked differences among procrastinators in terms of levels of severity, as well as a possible subgroup for which procrastinatory problems are primarily related to depression. Tailoring the treatment interventions to the specific procrastination profile of the individual could thus become important, as well as screening for comorbid psychiatric diagnoses in order to target difficulties associated with, for instance, depression. PMID:26178164

  9. Weak lensing analysis of the galaxy cluster RXJ1117.4+0743 ([VMF98]097)

    NASA Astrophysics Data System (ADS)

    Gonzalez, E. J.; Domínguez, M.; García Lambas, D.; Moreschi, O.; Foex, G.; Nilo Castellon, J. L.; Alonso, M. V.

    We present a weak lensing analysis of the galaxy cluster RXJ1117.4+0743 ([VMF98]097), based on data collected with the Gemini South Telescope. The cluster was formerly analyzed by Carrasco et al. (2007, ApJ, 664, 777), who found a large discrepancy between the mass estimated from X-ray observations and lensing estimates, exceeding the lensing mass by more than a factor of three. Our result for the mass from the weak lensing analysis is lower than the mass obtained by Carrasco et al. and closer to the X-ray mass.

  10. Cluster analysis and relative relocation of mining-induced seismicity using HAMNET data

    NASA Astrophysics Data System (ADS)

    Wehling-Benatelli, S.; Becker, D.; Bischoff, M.; Friederich, W.; Meier, T.

    2012-04-01

    Longwall mining activity in the Ruhr coal mining district leads to mining-induced seismicity. For detailed studies, the seismicity of the single longwall panel S 109 beneath Hamm-Herringen in the eastern Ruhr area was monitored between June 2006 and July 2007. More than 7000 seismic events with magnitudes -1.7 ≤ ML ≤ 2.0 were located in this period. 70% of the events occur in the vicinity of the moving longwall face. Moreover, the seismicity pattern shows spatial clustering of events at distances up to 500 m from the panel, which is related to remnant pillars of old workings and tectonic features. Two sources with common location and rock failure mechanism are expected to show identical waveforms. Hence, similar waveforms suggest similarity of source properties. Waveform similarity can be quantified by cross-correlation. Similarity matrices have been established and form the basis of the cluster analysis presented here. We compare two approaches for cluster definition: a single-linkage approach and excerpting clusters by visual inspection of the sorted similarity matrices. In the visual approach, clusters are found as areas of high inter-event similarity in the depicted matrix. In contrast, the single-linkage approach assigns an event to a cluster if the similarity threshold v_sl = 0.9 is exceeded with respect to at least one other member. This method is more restrictive and, in general, leads to clusters with fewer members than visual inspection. Both methods yield clusters that show the same properties. The largest clusters are formed by low-magnitude events (around ML ≈ -0.6) directly at the longwall face at the mining level. Other clusters include events with magnitudes as large as ML,max = 1.8. Their locations tend to lie above or below the mining level in load-bearing sandstone layers. Events accompanying mining show face-parallel, near-vertical fault planes, whereas more distant clusters have typical solutions of remnant pillar failure with a medium dip angle. Relative relocation of the events
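
    The single-linkage rule described in this abstract (an event joins a cluster when its similarity to at least one member exceeds the threshold) amounts to finding connected components of the thresholded similarity matrix. A minimal sketch using union-find, assuming a symmetric matrix of cross-correlation coefficients; the function name and the example matrix are illustrative only:

```python
def single_linkage_clusters(sim, threshold=0.9):
    """Group events whose similarity exceeds the threshold to at least one member.

    sim: symmetric matrix (list of lists) of waveform similarities.
    Returns a sorted list of clusters, each a sorted list of event indices.
    """
    n = len(sim)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] > threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Events 0 and 2 are dissimilar to each other but both resemble event 1.
example = [
    [1.00, 0.95, 0.50, 0.10],
    [0.95, 1.00, 0.92, 0.10],
    [0.50, 0.92, 1.00, 0.10],
    [0.10, 0.10, 0.10, 1.00],
]
clusters = single_linkage_clusters(example, threshold=0.9)
```

    In the example, events 0 and 2 end up in the same cluster through event 1 even though their direct similarity is only 0.5; this chaining is characteristic of single linkage.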

  11. Earthquake Cluster Analysis for Turkey and its Application for Seismic Hazard Assessment

    NASA Astrophysics Data System (ADS)

    Schaefer, Andreas; Daniell, James; Wenzel, Friedemann

    2015-04-01

    Earthquake clusters are an important element in general seismology and also for the application in seismic hazard assessment. In probabilistic seismic hazard assessment, the occurrence of earthquakes is often linked to an independent Monte Carlo process, following a stationary Poisson model. But earthquakes are dependent and constrained, especially in terms of earthquake swarms, fore- and aftershocks or even larger sequences as observed for the Landers sequence in California or the Darfield-Christchurch sequence in New Zealand. For earthquake catalogues, the element of declustering is an important step to capture earthquake frequencies by avoiding a bias towards small magnitudes due to aftershocks. On the other hand, declustered catalogues for independent probabilistic seismic activity will underestimate the total number of earthquakes by neglecting dependent seismicity. In this study, the effect of clusters on probabilistic seismic hazard assessment is investigated in detail. To capture the features of earthquake clusters, a uniform framework for earthquake cluster analysis is introduced using methodologies of geostatistics and machine learning. These features represent important cluster characteristics like cluster b-values, temporal decay, rupture orientations and many more. Cluster parameters are mapped in space using kriging. Furthermore, a detailed data analysis is undertaken to provide magnitude-dependent relations for various cluster parameters. The acquired features are used to introduce dependent seismicity within stochastic earthquake catalogues. In addition, the development of smooth seismicity maps based on historic databases is in general biased to the more complete recent decades. A filling methodology is introduced which will add dependent seismicity in catalogues where none has been recorded to avoid the above mentioned bias. As a case study, Turkey has been chosen due to its inherent seismic activity and well-recorded data coverage. Clustering

  12. Comprehensive Behavioral Analysis of Cluster of Differentiation 47 Knockout Mice

    PubMed Central

    Koshimizu, Hisatsugu; Takao, Keizo; Matozaki, Takashi; Ohnishi, Hiroshi; Miyakawa, Tsuyoshi

    2014-01-01

    Cluster of differentiation 47 (CD47) is a member of the immunoglobulin superfamily which functions as a ligand for the extracellular region of signal regulatory protein α (SIRPα), a protein which is abundantly expressed in the brain. Previous studies, including ours, have demonstrated that both CD47 and SIRPα fulfill various functions in the central nervous system (CNS), such as the modulation of synaptic transmission and neuronal cell survival. We previously reported that CD47 is involved in the regulation of depression-like behavior of mice in the forced swim test through its modulation of tyrosine phosphorylation of SIRPα. However, other potential behavioral functions of CD47 remain largely unknown. In this study, in an effort to further investigate functional roles of CD47 in the CNS, CD47 knockout (KO) mice and their wild-type littermates were subjected to a battery of behavioral tests. CD47 KO mice displayed decreased prepulse inhibition, while the startle response did not differ between genotypes. The mutants exhibited slightly but significantly decreased sociability and social novelty preference in Crawley’s three-chamber social approach test, whereas in social interaction tests in which experimental and stimulus mice have direct contact with each other in a freely moving setting in a novel environment or home cage, there were no significant differences between the genotypes. While previous studies suggested that CD47 regulates fear memory in the inhibitory avoidance test in rodents, our CD47 KO mice exhibited normal fear and spatial memory in the fear conditioning and the Barnes maze tests, respectively. These findings suggest that CD47 is potentially involved in the regulation of sensorimotor gating and social behavior in mice. PMID:24586890

  13. Applying Robust Directional Similarity based Clustering approach RDSC to classification of gene expression data.

    PubMed

    Li, H X; Wang, Shitong; Xiu, Yu

    2006-06-01

    Although the classification of gene expression data from cDNA microarrays has been extensively studied, a robust clustering method that can estimate an appropriate number of clusters and is insensitive to its initialization has not yet been developed. In this work, a novel robust clustering approach, RDSC, based on a new directional similarity measure, is presented. RDSC, which integrates the Directional Similarity based Clustering algorithm (DSC) with the Agglomerative Hierarchical Clustering algorithm (AHC), is robust to initialization and can determine a reasonable number of clusters. RDSC has been successfully applied to both artificial and benchmark gene expression datasets. Our experimental results demonstrate its superiority over the conventional method Kmeans and two typical directional clustering algorithms, SPKmeans and moVMF.

  14. Functional analysis of the upstream regulatory region of chicken miR-17-92 cluster.

    PubMed

    Min, Cheng; Wenjian, Zhang; Tianyu, Xing; Xiaohong, Yan; Yumao, Li; Hui, Li; Ning, Wang

    2016-08-01

    The miR-17-92 cluster plays important roles in cell proliferation, differentiation, apoptosis, animal development and tumorigenesis. The transcriptional regulation of the miR-17-92 cluster has been extensively studied in mammals, but not in birds. To date, the genomic structure of the avian miR-17-92 cluster has not been fully determined. The promoter location and sequence of the miR-17-92 cluster have not been determined, due to the existence of a genomic gap sequence upstream of the miR-17-92 cluster in all the birds whose genomes have been sequenced. In this study, genome walking was used to close the genomic gap upstream of the chicken miR-17-92 cluster. In addition, bioinformatics analysis, reporter gene assay and truncation mutagenesis were used to investigate the functional role of the genomic gap sequence. Genome walking analysis showed that the gap region was 1704 bp long, and its GC content was 80.11%. Bioinformatics analysis showed that in the gap region, there was a 200 bp conserved sequence among the tested 10 species (Gallus gallus, Homo sapiens, Pan troglodytes, Bos taurus, Sus scrofa, Rattus norvegicus, Mus musculus, Possum, Danio rerio, Rana nigromaculata), which is the core promoter region of the mammalian miR-17-92 host gene (MIR17HG). A promoter luciferase reporter gene vector of the gap region was constructed and a reporter assay was performed. The result showed that the promoter activity of pGL3-cMIR17HG (-4228/-2506) was 417 times that of the negative control (empty pGL3 basic vector), suggesting that the chicken miR-17-92 cluster promoter exists in the gap region. To gain further insight into the promoter structure, two different truncations of the cloned gap sequence were generated by PCR. One had a truncation of 448 bp at the 5'-end and the other had a truncation of 894 bp at the 3'-end. Further reporter analysis showed that compared with the promoter activity of pGL3-cMIR17HG (-4228/-2506), the reporter activities of the 5'-end truncation and the 3'-end truncation were reduced by 19

  15. Cluster Analysis of Velocity Field Derived from Dense GNSS Network of Japan

    NASA Astrophysics Data System (ADS)

    Takahashi, A.; Hashimoto, M.

    2015-12-01

    Dense GNSS networks have been widely used to observe crustal deformation. Simpson et al. (2012) and Savage and Simpson (2013) conducted cluster analyses of the GNSS velocity field in the San Francisco Bay Area and the Mojave Desert, respectively. They successfully found velocity discontinuities and showed the advantage of cluster analysis for classifying GNSS velocity fields. Because strike-slip events are dominant in the western United States, the geometry there is simple. The Japanese Islands, however, are tectonically complicated due to subduction of oceanic plates, and there are many types of crustal deformation, such as slow slip events and large postseismic deformation. We propose a modified clustering method for the GNSS velocity field in Japan to separate time-variant and static crustal deformation. Our modification is performing cluster analysis every several months or years, then qualifying cluster member similarity. If a GNSS station moves differently from its neighboring GNSS stations, it will not belong to the cluster that includes its surrounding stations. With this method, time-variant phenomena were distinguished. We applied our method to GNSS data of Japan from 1996 to 2015. The analyses led to the following conclusions. First, the cluster boundaries are consistent with known active faults. For example, the Arima-Takatsuki-Hanaore fault system and the Shimane-Tottori segment proposed by Nishimura (2015) are recognized without using prior information. Second, the detectability of time-variable phenomena is improved, such as a slow slip event in the northern part of the Hokkaido region detected by Ohzono et al. (2015). Last, postseismic deformation caused by large earthquakes can be classified. The result suggested velocity discontinuities in the postseismic deformation of the Tohoku-oki earthquake, which implies that postseismic deformation does not decay continuously in proportion to distance from the epicenter.

  16. A population-based analysis of clustering identifies a strong genetic contribution to lethal prostate cancer

    PubMed Central

    Nelson, Quentin; Agarwal, Neeraj; Stephenson, Robert; Cannon-Albright, Lisa A.

    2013-01-01

    Background: Prostate cancer is a common and often deadly cancer. Decades of study have yet to identify genes that explain much familial prostate cancer. Traditional linkage analysis of pedigrees has yielded results that are rarely validated. We hypothesize that there are rare segregating variants responsible for high-risk prostate cancer pedigrees, but recognize that within-pedigree heterogeneity is responsible for significant noise that overwhelms signal. Here we introduce a method to identify homogeneous subsets of prostate cancer, based on cancer characteristics, which show the best evidence for an inherited contribution. Methods: We have modified an existing method, the Genealogical Index of Familiality (GIF) used to show evidence for significant familial clustering. The modification allows a test for excess familial clustering of a subset of prostate cancer cases when compared to all prostate cancer cases. Results: Consideration of the familial clustering of eight clinical subsets of prostate cancer cases compared to the expected familial clustering of all prostate cancer cases identified three subsets of prostate cancer cases with evidence for familial clustering significantly in excess of expected. These subsets include prostate cancer cases diagnosed before age 50 years, prostate cancer cases with body mass index (BMI) greater than or equal to 30, and prostate cancer cases for whom prostate cancer contributed to death. Conclusions: This analysis identified several subsets of prostate cancer cases that cluster significantly more than expected when compared to all prostate cancer familial clustering. A focus on high-risk prostate cancer cases or pedigrees with these characteristics will reduce noise and could allow identification of the rare predisposition genes or variants responsible. PMID:23970893

  17. Identifying the major air pollutants base on factor and cluster analysis, a case study in 74 Chinese cities

    NASA Astrophysics Data System (ADS)

    Zhang, Jing; Zhang, Lan-yue; Du, Ming; Zhang, Wei; Huang, Xin; Zhang, Ya-qi; Yang, Yue-yi; Zhang, Jian-min; Deng, Shi-huai; Shen, Fei; Li, Yuan-wei; Xiao, Hong

    2016-11-01

    This article investigated the major air pollutants and their spatial and seasonal distribution in 74 Chinese cities. Factor analysis and cluster analysis are employed to identify the major factors of air pollutants. The following results are obtained. (1) Major factors are obtained for spring, summer, autumn, and winter. The first factor in spring includes NO2, PM10, CO, and PM2.5; the first factor in summer and autumn includes PM10, PM2.5, CO and SO2; in winter, the first factor includes NO2, PM10, PM2.5, and SO2. (2) In spring, cities of cluster 5 are the most severely polluted by emission sources of SO2, CO, PM10, and PM2.5; the emission sources of O3 would significantly influence the air quality in cities of cluster 2; the emission sources of NO2 could significantly influence the air quality in cities of cluster 3 and cluster 5. (3) In summer, cities of cluster 5 are the most severely polluted by automotive emissions and coal flue gas. Cities of cluster 1 are the least polluted. Cities of cluster 3 and cluster 2 are polluted by emission sources of SO2 and O3. (4) In autumn, cities of cluster 3 and 4 are the most severely polluted by the emission sources of SO2, CO, PM10, and PM2.5; the emission sources of NO2 would significantly influence the air quality in cities of cluster 5; the emission sources of O3 could significantly influence the air quality in cities of cluster 1 and cluster 4. (5) In winter, cities of cluster 5 are the most severely polluted by the emission sources of SO2, CO, PM10, PM2.5, and CO; the emission sources of O3 could significantly influence the air quality in cities of cluster 1 and cluster 5.

  18. Cluster analysis of European surface ozone observations and MACC reanalysis data

    NASA Astrophysics Data System (ADS)

    Lyapina, Olga; Schultz, Martin; Hense, Andreas; Waychal, Snehal; Schröder, Sabine

    2013-04-01

    Europe has a high density of surface ozone monitoring sites, thus the comparison of measured ozone data with coarse-scale models requires special techniques. We have used Cluster Analysis (CA) to divide stations from the European air quality database (Airbase) into several groups and compare these groups with the results from a similar analysis performed on the output from the MOZART model in the Monitoring Atmospheric Composition and Climate (MACC) project. As the initial set of variables, the monthly averaged diurnal variations of the individual ozone time series were calculated. CA is an appropriate method for classifying a large number of monitoring sites in order to find similar ozone behavior and a representative station inside each group. Therefore CA opens new possibilities for the comparison between measured and modeled data. Airbase provides ozone data for all countries of the European Union. After applying the filter criterion that 2/3 of the data must be present in each month during the period 2007-2010, around 1500 stations were chosen from Airbase. The modeled data were interpolated to the geographical site locations. Clusters from the measurements were compared with corresponding clusters obtained from the MACC model data. CA results are shown, characteristics of separate clusters are described, and seasonal-diurnal variations of clusters from monitored and modeled data are compared and discussed.
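
    The "monthly averaged diurnal variation" used as the clustering input can be pictured as a 12 x 24 table of mean ozone values per station (month by hour of day). The sketch below is only an illustration of that idea: the abstract does not describe the exact computation, and the function name and the (month, hour, value) record layout are assumptions made here.

```python
from collections import defaultdict


def monthly_diurnal_profile(records):
    """Return a 12 x 24 list of mean values (rows: months, columns: hours).

    `records` is an iterable of (month, hour, value) tuples with month in
    1..12 and hour in 0..23 (a hypothetical layout). Cells without any
    observation are returned as None.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for month, hour, value in records:
        sums[(month, hour)] += value
        counts[(month, hour)] += 1

    profile = []
    for month in range(1, 13):
        row = []
        for hour in range(24):
            n = counts[(month, hour)]
            row.append(sums[(month, hour)] / n if n else None)
        profile.append(row)
    return profile
```

    Flattening the 12 x 24 table gives one feature vector per station, which is the kind of input a cluster analysis of stations would consume.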

  19. Exploring the Relationship between Autism Spectrum Disorder and Epilepsy Using Latent Class Cluster Analysis

    ERIC Educational Resources Information Center

    Cuccaro, Michael L.; Tuchman, Roberto F.; Hamilton, Kara L.; Wright, Harry H.; Abramson, Ruth K.; Haines, Jonathan L.; Gilbert, John R.; Pericak-Vance, Margaret

    2012-01-01

    Epilepsy co-occurs frequently in autism spectrum disorders (ASD). Understanding this co-occurrence requires a better understanding of the ASD-epilepsy phenotype (or phenotypes). To address this, we conducted latent class cluster analysis (LCCA) on an ASD dataset (N = 577) which included 64 individuals with epilepsy. We identified a 5-cluster…

  20. [Current service invention patents and growth pathways on basis of cluster analysis].

    PubMed

    Yang, Xu-jie; Xiao, Shi-ying

    2012-09-01

    This study aims to enhance the quantity and quality of patents on traditional Chinese medicine compounds held by traditional Chinese medicine enterprises, traditional Chinese medicine colleges and relevant institutions, while building an efficient pathway for patent protection. It uses simple statistics and cluster analysis, with service invention patent holders of traditional Chinese medicine compounds as the study object.

  1. Clustered Stomates in "Begonia": An Exercise in Data Collection & Statistical Analysis of Biological Space

    ERIC Educational Resources Information Center

    Lau, Joann M.; Korn, Robert W.

    2007-01-01

    In this article, the authors present a laboratory exercise in data collection and statistical analysis in biological space using clustered stomates on leaves of "Begonia" plants. The exercise can be done in middle school classes by students making their own slides and seeing imprints of cells, or at the high school level through collecting data of…

  2. Student Motivational Profiles in an Introductory MIS Course: An Exploratory Cluster Analysis

    ERIC Educational Resources Information Center

    Nelson, Klara

    2014-01-01

    This study profiles students in an introductory MIS course according to a variety of variables associated with choice of academic major. The data were collected through a survey administered to 12 sections of the course. A two-step cluster analysis was performed with gender as a categorical variable and students' perceptions of task value…

  3. Profiles of More and Less Successful L2 Learners: A Cluster Analysis Study

    ERIC Educational Resources Information Center

    Sparks, Richard L.; Patton, Jon; Ganschow, Leonore

    2012-01-01

    This retrospective study examined L1 achievement, intelligence, L2 aptitude, and L2 proficiency profiles of 208 students completing two years of high school L2 courses. A cluster analysis was performed to determine whether distinct cognitive and achievement profiles of more and less successful L2 learners would emerge. The results of…

  4. A Cluster Analysis of the Circumstances of Death in Suicides in Hong Kong

    ERIC Educational Resources Information Center

    Chen, Eric Y. H.; Chan, Wincy S. C.; Chan, Sandra S. M.; Liu, Ka Y.; Chan, Cecilia L. W.; Wong, Paul W. C.; Law, Y. W.; Yip, Paul S. F.

    2007-01-01

    Classification of suicides is essential for clinicians to better identify self-harm patients with future suicidal risks. This study examined potential subtypes of suicide in a psychological autopsy sample (N = 148) in Hong Kong. Hierarchical cluster analysis extracted two subgroups of subjects in terms of expressed deliberation assessed by the…

  5. Cluster Analysis of Assessment in Anatomy and Physiology for Health Science Undergraduates

    ERIC Educational Resources Information Center

    Brown, Stephen; White, Sue; Power, Nicola

    2016-01-01

    Academic content common to health science programs is often taught to a mixed group of students; however, content assessment may be consistent for each discipline. This study used a retrospective cluster analysis on such a group, first to identify high and low achieving students, and second, to determine the distribution of students within…

  6. Student Motivation and Learning in Mathematics and Science: A Cluster Analysis

    ERIC Educational Resources Information Center

    Ng, Betsy L. L.; Liu, W. C.; Wang, John C. K.

    2016-01-01

    The present study focused on an in-depth understanding of student motivation and self-regulated learning in mathematics and science through cluster analysis. It examined the different learning profiles of motivational beliefs and self-regulatory strategies in relation to perceived teacher autonomy support, basic psychological needs (i.e. autonomy,…

  7. An algol program for dissimilarity analysis: a divisive-omnithetic clustering technique

    USGS Publications Warehouse

    Tipper, J.C.

    1979-01-01

    Clustering techniques are used properly to generate hypotheses about patterns in data. Of the hierarchical techniques, those which are divisive and omnithetic possess many theoretically optimal properties. One such method, dissimilarity analysis, is implemented here in ALGOL 60, and determined to be competitive computationally with most other methods.

  8. Fuzzy Clustering Analysis in Environmental Impact Assessment--A Complement Tool to Environmental Quality Index.

    ERIC Educational Resources Information Center

    Kung, Hsiang-Te; And Others

    1993-01-01

    In spite of rapid progress achieved in the methodological research underlying environmental impact assessment (EIA), the problem of weighting various parameters has not yet been solved. This paper presents a new approach, fuzzy clustering analysis, which is illustrated with an EIA case study on Baoshan-Wusong District in Shanghai, China. (Author)

  9. Multiscale deep drawing analysis of dual-phase steels using grain cluster-based RGC scheme

    NASA Astrophysics Data System (ADS)

    Tjahjanto, D. D.; Eisenlohr, P.; Roters, F.

    2015-06-01

    Multiscale modelling and simulation play an important role in sheet metal forming analysis, since the overall material responses at macroscopic engineering scales, e.g. formability and anisotropy, are strongly influenced by microstructural properties, such as grain size and crystal orientations (texture). In the present report, multiscale analysis of deep drawing of dual-phase steels is performed using an efficient grain cluster-based homogenization scheme. The homogenization scheme, called relaxed grain cluster (RGC), is based on a generalization of the grain cluster concept, where a (representative) volume element consists of p × q × r (hexahedral) grains. In this scheme, variation of the strain or deformation of individual grains is taken into account through the so-called interface relaxation, which is formulated within an energy minimization framework. An interfacial penalty term is introduced into the energy minimization framework in order to account for the effects of grain boundaries. The grain cluster-based homogenization scheme has been implemented and incorporated into the advanced material simulation platform DAMASK, which aims to bridge the macroscale boundary value problems associated with deep drawing analysis and the micromechanical constitutive law, e.g. a crystal plasticity model. Standard Lankford anisotropy tests are performed to validate the model parameters prior to the deep drawing analysis. Model predictions for the deep drawing simulations are analyzed and compared to the corresponding experimental data. The results show that the predictions of the model are in very good agreement with the experimental measurements.

  10. Molecular Clustering Interrelationships and Carbohydrate Conformation in Hull and Seeds Among Barley Cultivars

    SciTech Connect

    N Liu; P Yu

    2011-12-31

    The objective of this study was to use molecular spectral analyses with the diffuse reflectance Fourier transform infrared spectroscopy (DRIFT) bioanalytical technique to study carbohydrate conformation features, molecular clustering and interrelationships in hull and seed among six barley cultivars (AC Metcalfe, CDC Dolly, McLeod, CDC Helgason, CDC Trey, CDC Cowboy), which had different degradation kinetics in the rumen. The molecular structure spectral analyses in both hull and seed involved the fingerprint regions of ca. 1536-1484 cm⁻¹ (attributed mainly to aromatic lignin semicircle ring stretch), ca. 1293-1212 cm⁻¹ (attributed mainly to cellulosic compounds in the hull), ca. 1269-1217 cm⁻¹ (attributed mainly to cellulosic compounds in the seeds), and ca. 1180-800 cm⁻¹ (attributed mainly to total CHO C-O stretching vibrations), together with agglomerative hierarchical cluster analysis (AHCA) and principal component analysis (PCA). The results showed that the DRIFT technique plus AHCA and PCA molecular analyses were able to reveal carbohydrate conformation features and identify carbohydrate molecular structure differences in both hull and seeds among the barley varieties. The carbohydrate molecular spectral analyses in the region of ca. 1185-800 cm⁻¹ together with the AHCA and PCA were able to show that the barley seed inherent structures exhibited distinguishable differences among the barley varieties. CDC Helgason differed from AC Metcalfe, McLeod, CDC Cowboy and CDC Dolly in carbohydrate conformation in the seed. Clear molecular cluster classes could be distinguished and identified in the AHCA analysis and the separate ellipses could be grouped in the PCA analysis. But CDC Helgason showed no distinguishable differences from CDC Trey in carbohydrate conformation. These carbohydrate conformation/structure differences could partially explain why the varieties were different in digestive behaviors in animals. The molecular spectroscopy

  11. Unraveling the dha cluster in Citrobacter werkmanii: comparative genomic analysis of bacterial 1,3-propanediol biosynthesis clusters.

    PubMed

    Maervoet, Veerle E T; De Maeseneire, Sofie L; Soetaert, Wim K; De Mey, Marjan

    2014-04-01

    In natural 1,3-propanediol (PDO) producing microorganisms such as Klebsiella pneumoniae, Citrobacter freundii and Clostridium sp., the genes coding for PDO producing enzymes are grouped in a dha cluster. This article describes the dha cluster of a novel candidate for PDO production, Citrobacter werkmanii DSM17579 and compares the cluster to the currently known PDO clusters of Enterobacteriaceae and Clostridiaceae. Moreover, we attribute a putative function to two previously unannotated ORFs, OrfW and OrfY, both in C. freundii and in C. werkmanii: both proteins might form a complex and support the glycerol dehydratase by converting cob(I)alamin to the glycerol dehydratase cofactor coenzyme B12. Unraveling this biosynthesis cluster revealed high homology between the deduced amino acid sequence of the open reading frames of C. werkmanii DSM17579 and those of C. freundii DSM30040 and K. pneumoniae MGH78578, i.e., 96 and 87.5 % identity, respectively. On the other hand, major differences between the clusters have also been discovered. For example, only one dihydroxyacetone kinase (DHAK) is present in the dha cluster of C. werkmanii DSM17579, while two DHAK enzymes are present in the cluster of K. pneumoniae MGH78578 and Clostridium butyricum VPI1718.

  12. A New Classification of Diabetic Gait Pattern Based on Cluster Analysis of Biomechanical Data

    PubMed Central

    Sawacha, Zimi; Guarneri, Gabriella; Avogaro, Angelo; Cobelli, Claudio

    2010-01-01

    Background The diabetic foot, one of the most serious complications of diabetes mellitus and a major risk factor for plantar ulceration, is determined mainly by peripheral neuropathy. Neuropathic patients exhibit decreased stability while standing as well as during dynamic conditions. A new methodology for diabetic gait pattern classification based on cluster analysis has been proposed that aims to identify groups of subjects with similar patterns of gait and verify whether three-dimensional gait data are able to distinguish diabetic gait patterns from those of the control subjects. Method The gait of 20 nondiabetic individuals and 46 diabetes patients with and without peripheral neuropathy was analyzed [mean age 59.0 (2.9) and 61.1 (4.4) years, mean body mass index (BMI) 24.0 (2.8) and 26.3 (2.0)]. K-means cluster analysis was applied to classify the subjects' gait patterns through the analysis of their ground reaction forces, joint and segment (trunk, hip, knee, ankle) angles, and moments. Results Cluster analysis classification led to the definition of four well-separated clusters: one aggregating just neuropathic subjects, one aggregating both neuropathics and non-neuropathics, one including only diabetes patients, and one including either controls or diabetic and neuropathic subjects. Conclusions Cluster analysis was useful in grouping subjects with similar gait patterns and provided evidence that there were subgroups that might otherwise not be observed if a group ensemble was presented for any specific variable. In particular, we observed the presence of neuropathic subjects with a gait similar to the controls and diabetes patients with a long disease duration with a gait as altered as the neuropathic one. PMID:20920432

  13. The Feasibility of Using Cluster Analysis to Examine Log Data from Educational Video Games. CRESST Report 790

    ERIC Educational Resources Information Center

    Kerr, Deirdre; Chung, Gregory K. W. K.; Iseli, Markus R.

    2011-01-01

    Analyzing log data from educational video games has proven to be a challenging endeavor. In this paper, we examine the feasibility of using cluster analysis to extract information from the log files that is interpretable in both the context of the game and the context of the subject area. If cluster analysis can be used to identify patterns of…

  14. Market segmentation for multiple option healthcare delivery systems--an application of cluster analysis.

    PubMed

    Jarboe, G R; Gates, R H; McDaniel, C D

    1990-01-01

    Healthcare providers of multiple option plans may be confronted with special market segmentation problems. This study demonstrates how cluster analysis may be used for discovering distinct patterns of preference for multiple option plans. The availability of metric, as opposed to categorical or ordinal, data provides the ability to use sophisticated analysis techniques which may be superior to frequency distributions and cross-tabulations in revealing preference patterns.

  15. Revealing gene clusters associated with the development of cholangiocarcinoma, based on a time series analysis.

    PubMed

    Wu, Jianyu; Xiao, Zhifu; Zhao, Xiulei; Wu, Xiangsong

    2015-05-01

    Cholangiocarcinoma (CC) is a rapidly lethal malignancy and currently is considered to be incurable. Biomarkers related to the development of CC remain unclear. The present study aimed to identify differentially expressed genes (DEGs) between normal tissue and intrahepatic CC, as well as specific gene expression patterns that changed together with the development of CC. By using a two‑way analysis of variance test, the biomarkers that could distinguish between normal tissue and intrahepatic CC dissected from different days were identified. A k‑means cluster method was used to identify gene clusters associated with the development of CC according to their changing expression pattern. Functional enrichment analysis was used to infer the function of each of the gene sets. A time series analysis was constructed to reveal gene signatures that were associated with the development of CC based on gene expression profile changes. Genes related to CC were shown to be involved in 'mitochondrion' and 'focal adhesion'. Three interesting gene groups were identified by the k‑means cluster method. Gene clusters with a unique expression pattern are related with the development of CC. The data of this study will facilitate novel discoveries regarding the genetic study of CC by further work.

  16. Alteration mapping at Goldfield, Nevada, by cluster and discriminant analysis of LANDSAT digital data

    NASA Technical Reports Server (NTRS)

    Ballew, G.

    1977-01-01

    The ability of Landsat multispectral digital data to differentiate among 62 combinations of rock and alteration types at the Goldfield mining district of Western Nevada was investigated by using statistical techniques of cluster and discriminant analysis. Multivariate discriminant analysis was not effective in classifying each of the 62 groups, with classification results essentially the same whether data of four channels alone or combined with six ratios of channels were used. Bivariate plots of group means revealed a cluster of three groups including mill tailings, basalt and all other rock and alteration types. Automatic hierarchical clustering based on the fourth dimensional Mahalanobis distance between group means of 30 groups having five or more samples was performed. The results of the cluster analysis revealed hierarchies of mill tailings vs. natural materials, basalt vs. non-basalt, highly reflectant rocks vs. other rocks and exclusively unaltered rocks vs. predominantly altered rocks. The hierarchies were used to determine the order in which sets of multiple discriminant analyses were to be performed and the resulting discriminant functions were used to produce a map of geology and alteration which has an overall accuracy of 70 percent for discriminating exclusively altered rocks from predominantly altered rocks.
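
    The clustering step above relies on the Mahalanobis distance between group means. A minimal sketch of that distance is given below, assuming a pooled covariance matrix is supplied by the caller; the abstract does not say how the covariance was estimated, and the helper names are hypothetical.

```python
import math


def solve(matrix, vector):
    """Solve matrix @ x = vector by Gauss-Jordan elimination with partial pivoting."""
    n = len(vector)
    aug = [list(row) + [v] for row, v in zip(matrix, vector)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0:
            raise ValueError("singular covariance matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def mahalanobis(mean_a, mean_b, cov):
    """Mahalanobis distance between two group means for covariance `cov`."""
    diff = [a - b for a, b in zip(mean_a, mean_b)]
    # Solve cov @ z = diff rather than forming the inverse explicitly.
    z = solve(cov, diff)
    return math.sqrt(sum(d * x for d, x in zip(diff, z)))
```

    With an identity covariance the result reduces to the ordinary Euclidean distance; a matrix of such pairwise distances between group means is what a hierarchical clustering routine would then take as input.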

  17. Fault Reactivation Analysis Using Microearthquake Clustering Based on Signal-to-Noise Weighted Waveform Similarity

    NASA Astrophysics Data System (ADS)

    Grund, Michael; Groos, Jörn C.; Ritter, Joachim R. R.

    2016-07-01

    The cluster formation of about 2000 induced microearthquakes (mostly ML < 2) is studied using a waveform similarity technique based on cross-correlation and a subsequent equivalence class approach. All events were detected within two separated but neighbouring seismic volumes close to the geothermal power plants near Landau and Insheim in the Upper Rhine Graben, SW Germany, between 2006 and 2013. Besides different sensors, sampling rates and individual data gaps, mainly low signal-to-noise ratios (SNR) of the recordings at most station sites complicate a precise waveform similarity analysis of the microseismic events in this area. To include a large number of events in such an analysis, a newly developed weighting approach was implemented in the waveform similarity analysis which directly considers the individual SNRs across the whole seismic network. The application to both seismic volumes leads to event clusters with high waveform similarities within short (seconds to hours) and long (months to years) time periods covering two magnitude ranges. The estimated relative hypocenter locations are spatially concentrated for each single cluster and mirror the orientations of mapped faults as well as interpreted rupture planes determined from fault plane solutions. Depending on the waveform cross-correlation coefficient threshold, clusters can be resolved in space to as little as one dominant wavelength. The interpretation of these observations implies recurring fault reactivations by fluid injection with very similar faulting mechanisms during different time periods between 2006 and 2013.
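
    The two ingredients named above, cross-correlation similarity and an equivalence class grouping, can be sketched as follows. This is a simplified single-trace illustration only: the SNR weighting across the network that the abstract describes is not reproduced, and the function names, the lag window and the threshold are assumptions.

```python
import math


def max_normalized_xcorr(a, b, max_lag):
    """Largest normalized cross-correlation of a and b over lags in [-max_lag, max_lag]."""
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    if norm == 0.0:
        return 0.0
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for i, x in enumerate(a):
            j = i + lag
            if 0 <= j < len(b):
                total += x * b[j]
        best = max(best, total / norm)
    return best


def equivalence_clusters(waveforms, threshold, max_lag=5):
    """Group events so that any pair with similarity >= threshold shares a cluster."""
    parent = list(range(len(waveforms)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(waveforms)):
        for j in range(i + 1, len(waveforms)):
            if max_normalized_xcorr(waveforms[i], waveforms[j], max_lag) >= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(waveforms)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

    Because membership is transitive, two events can share a cluster without exceeding the threshold directly, as long as a chain of similar events links them.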

  18. Clustering analysis of western North Pacific Tropical Cyclone tracks using the Self Organizing Map

    NASA Astrophysics Data System (ADS)

    Kim, H.; Seo, K.

    2013-12-01

    A cluster analysis using the Self Organizing Map (SOM) is used to characterize tropical cyclone (TC) tracks over the western North Pacific (WNP). A False Discovery Rate (FDR) method is used to objectively determine an optimum cluster number. For 620 TC tracks over the WNP from June-October during 1979-2010, five clusters of TC tracks are selected. These can further be categorized into three major patterns: straight-moving track, recurving track, and quasi-random pattern. Each pattern is characterized by landfalling regions: near South and East China, East Asia, and offshore of Japan. In addition, each pattern shows distinctive properties in its traveling distance, lifetime, intensity (mean minimum sea level pressure), and genesis location. It is revealed that these three patterns are associated with large-scale dynamics such as the variability of the western Pacific subtropical high and the Madden-Julian Oscillation. The impacts of El Nino and the NAO will be discussed.

  19. [Investigation of fuzzy-clustering in octane number prediction model based on detailed hydrocarbon analysis data].

    PubMed

    Liu, Yingrong; Xu, Yupeng; Yang, Haiying

    2004-09-01

    A method to establish an octane number prediction model based on detailed hydrocarbon analysis (DHA) data is presented. Fuzzy clustering and the Euclidean distance are employed to select the samples needed to establish the model. One hundred and fifty gasoline samples and 140 characteristic components in the DHA chromatogram of each sample are used for the fuzzy-clustering research. It is found that the 3-10 samples in the same cluster that have the nearest Euclidean distance (< 1.5) to the prediction sample are enough to build the octane number prediction model. The experimental results show that the model obtained by this method has higher predictive accuracy, a wider application range and higher data resource utilization than the current prediction method.
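
    The sample selection rule described above (nearest samples within a Euclidean distance of 1.5, at most 10 of them) can be sketched as below. The function name and defaults are assumptions, the candidates are assumed to have already been restricted to the prediction sample's fuzzy cluster, and the fuzzy clustering itself is not shown.

```python
import math


def select_reference_samples(target, candidates, max_distance=1.5, max_count=10):
    """Return indices of up to max_count candidates nearest to target.

    Only candidates closer than max_distance (Euclidean) are eligible;
    the result is ordered from nearest to farthest.
    """
    scored = []
    for index, sample in enumerate(candidates):
        distance = math.dist(target, sample)
        if distance < max_distance:
            scored.append((distance, index))
    scored.sort()
    return [index for _, index in scored[:max_count]]
```

    A caller would still need to check that at least three samples were returned before fitting a model, since the abstract reports 3-10 samples as sufficient.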

  20. Assessing antibiotic resistance in fecal Escherichia coli in young calves using cluster analysis techniques.

    PubMed

    Berge, A C B; Atwill, E R; Sischo, W M

    2003-10-15

    This study uses cluster analysis techniques to describe the antibiotic susceptibility patterns seen in calf fecal Escherichia coli (E. coli). Cohorts of 30 dairy calves at six farms were sampled at 2-week intervals during the pre-weaning period. At each sampling occasion five fecal E. coli isolates per calf were analyzed for antibiotic susceptibility to 12 antibiotics using the disk diffusion method. All isolates had a profile consisting of the aggregate measured inhibition zone size for each of the evaluated antibiotics. Several cluster analytic algorithms were assessed to partition the E. coli isolates. For our data, Ward's minimum variance method met the objectives of the study. Relative to the number of possible combinations of resistance clusters, a parsimonious set of 14 patterns was developed. This set of E. coli isolates exhibited a limited set of resistance patterns to the different antibiotics indicating that certain resistance genes may be linked.
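
    Ward's minimum variance method, which the study above selected, merges at each step the pair of clusters whose union gives the smallest increase in within-cluster sum of squares. The sketch below is a small, unoptimized illustration of that rule on toy inhibition-zone profiles; it is not the software used in the study, and the function name is an assumption.

```python
def ward_clusters(points, n_clusters):
    """Agglomerate points until n_clusters remain, using Ward's merge cost."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    # Each cluster is tracked as (member indices, centroid).
    clusters = [([i], list(p)) for i, p in enumerate(points)]

    def merge_cost(a, b):
        # Increase in within-cluster sum of squares if a and b are merged.
        na, nb = len(a[0]), len(b[0])
        sq = sum((x - y) ** 2 for x, y in zip(a[1], b[1]))
        return na * nb / (na + nb) * sq

    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = merge_cost(clusters[i], clusters[j])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        a, b = clusters[i], clusters[j]
        na, nb = len(a[0]), len(b[0])
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(a[1], b[1])]
        merged = (a[0] + b[0], centroid)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(members) for members, _ in clusters)
```

    In the study each isolate would contribute a 12-value profile (one inhibition zone size per antibiotic); the same function applies to vectors of any length.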

  1. Cluster Method Analysis of K. S. C. Image

    NASA Technical Reports Server (NTRS)

    Rodriguez, Joe, Jr.; Desai, M.

    1997-01-01

    Information obtained from satellite-based systems has moved to the forefront as a method for identifying many land cover types. Identification of different land features through remote sensing is an effective tool for regional and global assessment of geometric characteristics. Classification data acquired from remote sensing images have a wide variety of applications. In particular, analysis of remote sensing images has special applications in the classification of various types of vegetation. Results obtained from classification studies of a particular area or region contribute to a greater understanding of which parameters (ecological, temporal, etc.) affect the region being analyzed. In this paper, we make a distinction between both types of classification approaches, although focus is given to the unsupervised classification method using 1987 Thematic Mapper (TM) images of Kennedy Space Center.

  2. Automation of Large-scale Computer Cluster Monitoring Information Analysis

    NASA Astrophysics Data System (ADS)

    Magradze, Erekle; Nadal, Jordi; Quadt, Arnulf; Kawamura, Gen; Musheghyan, Haykuhi

    2015-12-01

    High-throughput computing platforms consist of a complex infrastructure and provide a number of services that are prone to failure. To mitigate the impact of failures on the quality of the provided services, constant monitoring and timely reaction are required, which is impossible without automation of the system administration processes. This paper introduces a way of automating the analysis of monitoring information to provide long- and short-term predictions of the service response time (SRT) for mass storage and batch systems and to identify the status of a service at a given time. The approach for the SRT predictions is based on the Adaptive Neuro Fuzzy Inference System (ANFIS). An evaluation of the approaches is performed on real monitoring data from the WLCG Tier 2 center GoeGrid. Ten-fold cross-validation results demonstrate high efficiency of both approaches in comparison to known methods.

  3. Symmetry analysis in the investigation of clusters in complex metallic alloys

    NASA Astrophysics Data System (ADS)

    Sikora, W.; Malinowski, J.; Kuna, A.; Pytlik, L.

    2008-03-01

    In complex metallic alloys (CMAs) it is often found that some parts of the unit cell form well-defined nanoscale building blocks, called clusters, which are characterized by a specific local symmetry and separated from the 'matrix' crystal lattice by a partially disordered interface zone. The interior of the cluster is usually a close-packed structure, which is not always exactly known because of the partial disorder in the outer coordination shells. In many CMAs the clusters form a high-symmetry superlattice structure, which usually leads to a giant cubic or pseudo-cubic unit cell. The present paper shows a possibility to analyze the changes in local symmetry of the clusters (objects decorating the superlattice nodes) during transformations of the global crystal symmetry. The symmetry analysis method applied to tensor objects, attributed to the clusters, provides information about the symmetry relations between the objects located in different nodes as well as the local symmetry of individual objects (local principal axes, local anisotropy etc.).

  5. A clustering analysis of eddies' spatial distribution in the South China Sea

    NASA Astrophysics Data System (ADS)

    Yi, J.; Du, Y.; Wang, X.; He, Z.; Zhou, C.

    2013-02-01

    Spatial variation is important for studying the mesoscale eddies in the South China Sea (SCS). To investigate such spatial variations, this study performed a clustering analysis of the eddies' distribution using the K-means approach. Results showed that the clustering tendencies of anticyclonic eddies (AEs) and cyclonic eddies (CEs) were weak but not random, and the number of clusters was shown to be greater than four. Finer clustering results showed 10 regions where AEs were densely populated and 6 regions for CEs in the SCS. Previous studies confirmed these partitions and possible generation mechanisms were related. Comparisons between AEs and CEs revealed that patterns of AEs are relatively more aggregated than those of CEs, and specific distinctions were summarized: (1) to the southwest of Luzon Island, AEs and CEs are generated spatially apart; AEs are likely located north of 14° N and closer to shore, while CEs are to the south and further offshore. (2) The central SCS and Nansha Trough are mostly dominated by AEs. (3) Along 112° E, clusters of AEs and CEs are located sequentially apart, and the pairs off Vietnam represent the dipole structures. (4) To the southwest of the Dongsha Islands, AEs are concentrated to the east of CEs. Overlaps of AEs and CEs in the northeastern and southern SCS were further examined considering seasonal variations. The northeastern overlap represented near-concentric distributions while the southern one was a mixed effect of seasonal variations, complex circulations and topographic influences.
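
    The K-means approach named above can be illustrated with a short version of Lloyd's algorithm applied to eddy positions. The sketch treats longitude and latitude as planar coordinates and takes the starting centers from the caller; both are simplifying assumptions, and the study's own initialization and choice of cluster number are not reproduced.

```python
import math


def kmeans(points, centers, max_iter=100):
    """Lloyd's algorithm; returns (labels, centers) starting from the given centers."""
    centers = [list(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest center for each point.
        new_labels = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers will not move
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centers
```

    Because the outcome depends on the starting centers, a practical run would repeat the procedure from several initializations and compare the results.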

  6. Links between patterns of racial socialization and discrimination experiences and psychological adjustment: a cluster analysis.

    PubMed

    Ajayi, Alex A; Syed, Moin

    2014-10-01

    This study used a person-oriented analytic approach to identify meaningful patterns of barriers-focused racial socialization and perceived racial discrimination experiences in a sample of 295 late adolescents. Using cluster analysis, three distinct groups were identified: Low Barrier Socialization-Low Discrimination, High Barrier Socialization-Low Discrimination, and High Barrier Socialization-High Discrimination clusters. These groups were substantively unique in terms of the frequency of racial socialization messages about bias preparation and out-group mistrust its members received and their actual perceived discrimination experiences. Further, individuals in the High Barrier Socialization-High Discrimination cluster reported significantly higher depressive symptoms than those in the Low Barrier Socialization-Low Discrimination and High Barrier Socialization-Low Discrimination clusters. However, no differences in adjustment were observed between the Low Barrier Socialization-Low Discrimination and High Barrier Socialization-Low Discrimination clusters. Overall, the findings highlight important individual differences in how young people of color experience their race and how these differences have significant implications on psychological adjustment. PMID:25124381

  7. A cluster analysis of the neurons of the rat interpeduncular nucleus.

    PubMed Central

    Gioia, M; Vizzotto, L; Bianchi, R

    1994-01-01

    The morphometric characteristics of the neurons of the interpeduncular nucleus (IPN) in the rat were investigated by cluster analysis in order to identify neuronal groups which are morphometrically homogeneous, and to define their position and density in the IPN subnuclei. Two clusters of cells were detected. Cluster 1 neurons had a larger perikaryal size with a mean cross-sectional area of 170 μm² and a high nuclear/cytoplasmic ratio. They were located mainly in the pars dorsalis (37%) and pars medialis (34%) rather than in the pars lateralis (29%). Cluster 1 neurons were also more frequent at the rostral (31%) and caudal (57%) poles than in the central part of the IPN. Cluster 2 cells showed a smaller mean perikaryal area (110 μm²), a small nucleus and abundant cytoplasm. They were equally distributed throughout the whole IPN. These findings suggest the existence of a magnocellular region at the rostral pole of the IPN which has not been described previously. The presence of IPN regions endowed with specific cytoarchitectural characteristics is discussed with respect to the complex neurochemical organisation of the nucleus. PMID:7649781

  8. Descriptive characteristics and cluster analysis of male veteran hazardous drinkers in an alcohol moderation intervention.

    PubMed

    Walker, Robrina; Hunt, Yvonne M; Olivier, Jake; Grothe, Karen B; Dubbert, Patricia M; Burke, Randy S; Cushman, William C

    2012-01-01

    Current efforts underway to develop the fifth edition of the Diagnostic and Statistical Manual (DSM-5) have reignited discussions for classifying the substance use disorders. This study's aim was to contribute to the understanding of abusive alcohol use and its validity as a diagnosis. Cluster analysis was used to identify relatively homogeneous groups of hazardous, nondependent drinkers by using data collected from the Prevention and Treatment of Hypertension Study (PATHS), a multisite trial that examined the ability of a cognitive-behavioral-based alcohol reduction intervention, compared to a control condition, to reduce alcohol use. Participants for this study (N = 511) were male military veterans. Variables theoretically associated with alcohol use (eg, demographic, tobacco use, and mental health) were used to create the clusters and a priori, empirically based external criteria were used to assess discriminant validity. Bivariate correlations among cluster variables were generally consistent with previous findings in the literature. Analyses of internal and discriminant validity of the identified clusters were largely nonsignificant, suggesting meaningful differences between clusters could not be identified. Although the typology literature has contributed supportive validity for the alcohol dependence diagnosis, this study's results do not lend supportive validity for the construct of alcohol abuse. PMID:22691012

  9. Exploring the application of latent class cluster analysis for investigating pedestrian crash injury severities in Switzerland.

    PubMed

    Sasidharan, Lekshmi; Wu, Kun-Feng; Menendez, Monica

    2015-12-01

    One of the major challenges in traffic safety analyses is the heterogeneous nature of safety data, due to the sundry factors involved in it. This heterogeneity often leads to difficulties in interpreting results and conclusions due to unrevealed relationships. Understanding the underlying relationship between injury severities and influential factors is critical for the selection of appropriate safety countermeasures. A method commonly employed to address systematic heterogeneity is to focus on a subgroup of the data chosen according to the research purpose. However, this does not necessarily ensure homogeneity in the data. In this paper, latent class cluster analysis is applied to identify homogeneous subgroups for a specific crash type: pedestrian crashes. The manuscript employs data from police-reported pedestrian crashes (2009-2012) in Switzerland. The analyses demonstrate that dividing pedestrian severity data into seven clusters helps to reduce the systematic heterogeneity of the data and to understand the hidden relationships between crash severity levels and socio-demographic, environmental, vehicle, temporal, and traffic factors, and the main reason for the crash. The pedestrian crash injury severity models were developed for the whole data and individual clusters, and were compared using receiver operating characteristic curves, for which results favored clustering. Overall, the study suggests that the latent class clustered regression approach is suitable for reducing heterogeneity and revealing important hidden relationships in traffic safety analyses.

  10. Cluster analysis of the DrugBank chemical space using molecular quantum numbers.

    PubMed

    Awale, Mahendra; Reymond, Jean-Louis

    2012-09-15

    DrugBank (>6000 approved and experimental drugs) was analyzed using molecular quantum numbers (MQNs), which are 42 integer value descriptors of molecular structure counting atoms, bonds, polar groups and topological features. Principal component analysis of MQN-space showed that drugs differ mostly by size (PC1, 67% variance) and structural rigidity and polarity (PC2, 18% variance). Twenty-eight groups of target specific drugs were recovered by proximity sorting in MQN-space as efficiently as by substructure fingerprint (SF) similarity, but in different order allowing for lead-hopping relationships not seen in SF similarity. Clustering by MQN- or SF-similarity produced very different types of clusters. Each of the 28 drug groups spread over different clusters in both MQN- and SF-clustering, and most clusters contained drugs from different target specific groups, showing that structure-based classifications only partially overlap with bioactivity. An MQN-browsable version of DrugBank is available at www.gdb.unibe.ch. PMID:22465859

  11. Cluster-based analysis for personalized stress evaluation using physiological signals.

    PubMed

    Xu, Qianli; Nwe, Tin Lay; Guan, Cuntai

    2015-01-01

    Technology development in wearable sensors and biosignal processing has made it possible to detect human stress from the physiological features. However, the intersubject difference in stress responses presents a major challenge for reliable and accurate stress estimation. This research proposes a novel cluster-based analysis method to measure perceived stress using physiological signals, which accounts for the intersubject differences. The physiological data are collected when human subjects undergo a series of task-rest cycles, incurring varying levels of stress that is indicated by an index of the State Trait Anxiety Inventory. Next, a quantitative measurement of stress is developed by analyzing the physiological features in two steps: 1) a k-means clustering process to divide subjects into different categories (clusters), and 2) cluster-wise stress evaluation using the general regression neural network. Experimental results show a significant improvement in evaluation accuracy as compared to traditional methods without clustering. The proposed method is useful in developing intelligent, personalized products for human stress management. PMID:25561450

  12. Spatiotemporal Clustering Analysis and Risk Assessments of Human Cutaneous Anthrax in China, 2005–2012

    PubMed Central

    Qian, Quan; Haque, Ubydul; Soares Magalhaes, Ricardo J.; Li, Shen-Long; Tong, Shi-Lu; Li, Cheng-Yi; Sun, Hai-Long; Sun, Yan-Song

    2015-01-01

    Objective: To investigate the epidemic characteristics of human cutaneous anthrax (CA) in China, detect the spatiotemporal clusters at the county level for preemptive public health interventions, and evaluate the differences in the epidemiological characteristics within and outside clusters. Methods: CA cases reported during 2005–2012 from the national surveillance system were evaluated at the county level using the space-time scan statistic. Comparative analysis of the epidemic characteristics within and outside identified clusters was performed using the χ² test or Kruskal-Wallis test. Results: The 30–39-year age group had the highest incidence of CA, and the fatality rate increased with age, with persons ≥70 years showing a fatality rate of 4.04%. Seasonality analysis showed that most CA cases occurred between May/June and September/October of each year. The primary spatiotemporal cluster contained 19 counties from June 2006 to May 2010, and it was mainly located straddling the borders of Sichuan, Gansu, and Qinghai provinces. In these high-risk areas, CA cases were predominantly found among younger, local, male shepherds who were living on agriculture and stockbreeding, and were characterized by high morbidity, low mortality and a shorter period from illness onset to diagnosis. Conclusion: CA was geographically and persistently clustered in southwestern China during 2005–2012, with notable differences in the epidemic characteristics within and outside spatiotemporal clusters; this demonstrates the necessity for CA interventions such as enhanced surveillance, health education, and mandatory and standard decontamination or disinfection procedures to be geographically targeted to the areas identified in this study. PMID:26208355

  13. Chemical analysis of giant stars in the young open cluster NGC 3114

    NASA Astrophysics Data System (ADS)

    Santrich, O. J. Katime; Pereira, C. B.; Drake, N. A.

    2013-06-01

    Context. Open clusters are very useful targets for examining possible trends in galactocentric distance and age, especially when young and old open clusters are compared. Aims: We carried out a detailed spectroscopic analysis to derive the chemical composition of seven red giants in the young open cluster NGC 3114. Abundances of C, N, O, Li, Na, Mg, Al, Ca, Si, Ti, Ni, Cr, Y, Zr, La, Ce, and Nd were obtained, as well as the carbon isotopic ratio. Methods: The atmospheric parameters of the studied stars and their chemical abundances were determined using high-resolution optical spectroscopy. We employed the local-thermodynamic-equilibrium model atmospheres of Kurucz and the spectral analysis code MOOG. The abundances of the light elements were derived using the spectral synthesis technique. Results: We found that NGC 3114 has a mean metallicity of [Fe/H] = -0.01 ± 0.03. The isochrone fit yielded a turn-off mass of 4.2 M⊙. The [N/C] ratio is in good agreement with the models predicted by first dredge-up. We found that two stars, HD 87479 and HD 304864, have high rotational velocities of 15.0 km s⁻¹ and 11.0 km s⁻¹; HD 87526 is a halo star and is not a member of NGC 3114. Conclusions: The carbon and nitrogen abundance in NGC 3114 agree with the field and cluster giants. The oxygen abundance in NGC 3114 is lower compared to the field giants. The [O/Fe] ratio is similar to the giants in young clusters. We detected sodium enrichment in the analyzed cluster giants. As far as the other elements are concerned, their [X/Fe] ratios follow the same trend seen in giants with the same metallicity. Based on observations made with the 2.2 m telescope at the European Southern Observatory (La Silla, Chile). Tables 2 and 5 are available in electronic form at http://www.aanda.org

  14. Phenotype Clustering of Breast Epithelial Cells in Confocal Images based on Nuclear Protein Distribution Analysis

    SciTech Connect

    Long, Fuhui; Peng, Hanchuan; Sudar, Damir; Levievre, Sophie A.; Knowles, David W.

    2006-09-05

    Background: The distribution of the chromatin-associated proteins plays a key role in directing nuclear function. Previously, we developed an image-based method to quantify the nuclear distributions of proteins and showed that these distributions depended on the phenotype of human mammary epithelial cells. Here we describe a method that creates a hierarchical tree of the given cell phenotypes and calculates the statistical significance between them, based on the clustering analysis of nuclear protein distributions. Results: Nuclear distributions of nuclear mitotic apparatus protein were previously obtained for non-neoplastic S1 and malignant T4-2 human mammary epithelial cells cultured for up to 12 days. Cell phenotype was defined as S1 or T4-2 and the number of days in culture. A probabilistic ensemble approach was used to define a set of consensus clusters from the results of multiple traditional cluster analysis techniques applied to the nuclear distribution data. Cluster histograms were constructed to show how cells in any one phenotype were distributed across the consensus clusters. Grouping various phenotypes allowed us to build phenotype trees and calculate the statistical difference between each group. The results showed that non-neoplastic S1 cells could be distinguished from malignant T4-2 cells with 94.19 percent accuracy; that proliferating S1 cells could be distinguished from differentiated S1 cells with 92.86 percent accuracy; and showed no significant difference between the various phenotypes of T4-2 cells corresponding to increasing tumor sizes. Conclusion: This work presents a cluster analysis method that can identify significant cell phenotypes, based on the nuclear distribution of specific proteins, with high accuracy.

  15. Validation of hierarchical cluster analysis for identification of bacterial species using 42 bacterial isolates

    NASA Astrophysics Data System (ADS)

    Ghebremedhin, Meron; Yesupriya, Shubha; Luka, Janos; Crane, Nicole J.

    2015-03-01

    Recent studies have demonstrated the potential advantages of the use of Raman spectroscopy in the biomedical field due to its rapidity and noninvasive nature. In this study, Raman spectroscopy is applied as a method for differentiating between bacterial isolates for Gram status and Genus species. We created models for identifying 28 bacterial isolates using spectra collected with a 785 nm laser excitation Raman spectroscopic system. In order to investigate the groupings of these samples, partial least squares discriminant analysis (PLSDA) and hierarchical cluster analysis (HCA) were implemented. In addition, cluster analyses of the isolates were performed using various data types consisting of biochemical tests, gene sequence alignment, high resolution melt (HRM) analysis, and antimicrobial susceptibility tests of minimum inhibitory concentration (MIC) and degree of antimicrobial resistance (SIR). In order to evaluate the ability of these models to correctly classify bacterial isolates using solely Raman spectroscopic data, a set of 14 validation samples was tested using the PLSDA models and subsequently the HCA models. External cluster evaluation criteria of purity and Rand index were calculated at different taxonomic levels to compare the performance of clustering using Raman spectra as well as the other datasets. Results showed that Raman spectra performed comparably to, and in some cases better than, the other data types, with Rand index and purity values up to 0.933 and 0.947, respectively. This study clearly demonstrates that the discrimination of bacterial species using Raman spectroscopic data and hierarchical cluster analysis is possible and has the potential to be a powerful point-of-care tool in clinical settings.
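
    The two external evaluation criteria named in this abstract, purity and the Rand index, have standard textbook definitions that can be sketched in a few lines. The sketch below is illustrative only: the function names and the six-item example are assumptions made here, not the authors' code or data.

```python
from collections import Counter
from itertools import combinations


def purity(true_labels, cluster_labels):
    """Fraction of items that belong to the majority true class of their cluster."""
    clusters = {}
    for truth, cluster in zip(true_labels, cluster_labels):
        clusters.setdefault(cluster, []).append(truth)
    majority_total = sum(Counter(members).most_common(1)[0][1]
                         for members in clusters.values())
    return majority_total / len(true_labels)


def rand_index(true_labels, cluster_labels):
    """Fraction of item pairs on which the two labelings agree
    (both place the pair together, or both place it apart)."""
    pairs = list(combinations(range(len(true_labels)), 2))
    agreements = 0
    for i, j in pairs:
        same_truth = true_labels[i] == true_labels[j]
        same_cluster = cluster_labels[i] == cluster_labels[j]
        agreements += same_truth == same_cluster
    return agreements / len(pairs)


# Hypothetical example: six isolates from two genera, split into two clusters.
genus = ["A", "A", "A", "B", "B", "B"]
cluster = [0, 0, 1, 1, 1, 1]
print("purity:", purity(genus, cluster))
print("Rand index:", rand_index(genus, cluster))
```

    Both scores lie between 0 and 1, with 1 meaning the clustering reproduces the reference taxonomy exactly; purity rewards many small clusters, which is one reason it is commonly reported alongside a pair-counting measure such as the Rand index.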

  16. An assessment of climatological synoptic typing by principal component analysis and kmeans clustering

    NASA Astrophysics Data System (ADS)

    Cuell, Charles; Bonsal, Barrie

    2009-10-01

    A common method of automated synoptic typing for climatological investigations involves data reduction by principal component analysis followed by the application of a clustering method. The number of eigenvectors kept in the principal component analysis is usually determined by a threshold value of relative variance retained, typically 85% to 95%, under the implicit assumption that varying this relative variance will not affect the resultant synoptic catalogue. This assumption is tested using daily 500-mb geopotential heights over northwest Canada during the winter period (December to February) from 1948 to 2006. Results show that the synoptic catalogue and associated surface climatological characteristics undergo changes for values of relative variance retained over 99%, indicating the typical thresholds are too low and calling into question the validity of performing principal component analysis prior to objective clustering.
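
    The truncation rule described here, keeping the leading eigenvectors until a chosen share of the variance is retained, can be sketched as follows. The function name and the eigenvalue spectrum are hypothetical and serve only to show how the retained dimension changes with the threshold; they are not taken from the study.

```python
def components_for_variance(eigenvalues, threshold):
    """Smallest number of leading components whose cumulative share of the
    total variance reaches `threshold` (a fraction between 0 and 1)."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for count, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return count
    return len(ordered)


# Hypothetical eigenvalue spectrum (illustrative only).
spectrum = [50, 25, 12, 6, 4, 2, 1]
for threshold in (0.85, 0.95, 0.995):
    print(threshold, components_for_variance(spectrum, threshold))
```

    With a steeply decaying spectrum like this one, moving the threshold from 85% to above 99% more than doubles the number of retained components, so the clustering step that follows operates in a noticeably different space.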

  17. Clustering Analysis of Officer's Behaviours in London Police Foot Patrol Activities

    NASA Astrophysics Data System (ADS)

    Shen, J.; Cheng, T.

    2015-07-01

    In this short paper we present a framework for the conceptual representation and clustering analysis of police officers' patrol patterns obtained by mining their raw movement trajectory data. This has been achieved with a model developed to account for the spatio-temporal dynamics of human movements by incorporating both the behavioural features of the travellers and the semantic meaning of the environment they are moving in. Hence, the similarity metric of traveller behaviours is jointly defined according to the stay time allocation in each spatio-temporal region of interest (ST-ROI) to support clustering analysis of patrol behaviours. The proposed framework enables the analysis of behaviour and preferences at a higher level based on raw movement trajectories. The model is first applied to police patrol data provided by the Metropolitan Police and will be tested on other types of datasets afterwards.

  18. Cluster analysis of European surface ozone observations for evaluation of MACC reanalysis data

    NASA Astrophysics Data System (ADS)

    Lyapina, Olga; Schultz, Martin G.; Hense, Andreas

    2016-06-01

    The high density of European surface ozone monitoring sites provides unique opportunities for the investigation of regional ozone representativeness and for the evaluation of chemistry climate models. The regional representativeness of European ozone measurements is examined through a cluster analysis (CA) of 4 years of 3-hourly ozone data from 1492 European surface monitoring stations in the Airbase database; the time resolution corresponds to the output frequency of the model that is compared to the data in this study. K-means clustering is implemented for seasonal-diurnal variations (i) in absolute mixing ratio units and (ii) normalized by the overall mean ozone mixing ratio at each site. Statistical tests suggest that each CA can distinguish between four and five different ozone pollution regimes. The individual clusters reveal differences in seasonal-diurnal cycles, showing typical patterns of the ozone behavior for more polluted stations or more rural background. The robustness of the clustering was tested with a series of k-means runs decreasing randomly the size of the initial data set or lengths of the time series. Except for the Po Valley, the clustering does not provide a regional differentiation, as the member stations within each cluster are generally distributed all over Europe. The typical seasonal, diurnal, and weekly cycles of each cluster are compared to the output of the multi-year global reanalysis produced within the Monitoring of Atmospheric Composition and Climate (MACC) project. While the MACC reanalysis generally captures the shape of the diurnal cycles and the diurnal amplitudes, it is not able to reproduce the seasonal cycles very well and it exhibits a high bias up to 12 nmol mol⁻¹. The bias decreases from more polluted clusters to cleaner ones. Also, the seasonal and weekly cycles and frequency distributions of ozone mixing ratios are better described for clusters with relatively clean signatures. Due to relative sparsity of CO and NOx

  19. Arthropod monitoring for fine-scale habitat analysis: A case study of the El Segundo sand dunes

    SciTech Connect

    Mattoni, R.; Longcore, T.; Novotny, V.

    2000-04-01

    Arthropod communities from several habitats on and adjacent to the El Segundo dunes (Los Angeles County, CA) were sampled using pitfall and yellow pan traps to evaluate their possible use as indicators of restoration success. Communities were ordinated and clustered using correspondence analysis, detrended correspondence analysis, two-way indicator species analysis, and Ward's method of agglomerative clustering. The results showed high repeatability among replicates within any sampling arena that permits discrimination of (1) degraded and relatively undisturbed habitat, (2) different dune habitat types, and (3) annual change. Canonical correspondence analysis showed a significant effect of disturbance history on community composition that explained 5–20% of the variation. Replicates of pitfall and yellow pan traps on single sites clustered together reliably when species abundance was considered, whereas clusters using only species incidence did not group replicates as consistently. The broad taxonomic approach seems appropriate for habitat evaluation and monitoring of restoration projects as an alternative to assessments geared to single species or even single families.
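
    The contrast between abundance-based and incidence-based grouping can be made concrete with a small sketch. The abstract does not say which dissimilarity measures were used, so the Bray-Curtis (abundance) and Jaccard (presence/absence) measures below are assumptions chosen for illustration, and the trap counts are invented.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / sum(x + y for x, y in zip(a, b))


def jaccard_distance(a, b):
    """Jaccard distance computed on presence/absence (incidence) only."""
    present_a = {i for i, x in enumerate(a) if x > 0}
    present_b = {i for i, x in enumerate(b) if x > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1 - len(present_a & present_b) / len(union)


# Hypothetical trap counts for four species (not data from the study).
dune_rep1 = [40, 30, 2, 1]   # replicate 1, undisturbed dune
dune_rep2 = [38, 33, 1, 2]   # replicate 2, same site
degraded = [2, 1, 45, 30]    # a degraded site with the same species list

print("Bray-Curtis, replicates:", bray_curtis(dune_rep1, dune_rep2))
print("Bray-Curtis, dune vs degraded:", bray_curtis(dune_rep1, degraded))
print("Jaccard, replicates:", jaccard_distance(dune_rep1, dune_rep2))
print("Jaccard, dune vs degraded:", jaccard_distance(dune_rep1, degraded))
```

    In this invented case all three samples contain the same species, so the incidence-based distance cannot separate the degraded site from the replicates, while the abundance-based distance can. This is one plausible mechanism for the pattern the abstract reports, not a reconstruction of the authors' analysis.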

  20. Expanded Natural Product Diversity Revealed by Analysis of Lanthipeptide-Like Gene Clusters in Actinobacteria

    PubMed Central

    Zhang, Qi; Doroghazi, James R.; Zhao, Xiling; Walker, Mark C.

    2015-01-01

    Lanthionine-containing peptides (lanthipeptides) are a rapidly growing family of polycyclic peptide natural products belonging to the large class of ribosomally synthesized and posttranslationally modified peptides (RiPPs). Lanthipeptides are widely distributed in taxonomically distant species, and their currently known biosynthetic systems and biological activities are diverse. Building on the recent natural product gene cluster family (GCF) project, we report here large-scale analysis of lanthipeptide-like biosynthetic gene clusters from Actinobacteria. Our analysis suggests that lanthipeptide biosynthetic pathways, and by extrapolation the natural products themselves, are much more diverse than currently appreciated and contain many different posttranslational modifications. Furthermore, lanthionine synthetases are much more diverse in sequence and domain topology than currently characterized systems, and they are used by the biosynthetic machineries for natural products other than lanthipeptides. The gene cluster families described here significantly expand the chemical diversity and biosynthetic repertoire of lanthionine-related natural products. Biosynthesis of these novel natural products likely involves unusual and unprecedented biochemistries, as illustrated by several examples discussed in this study. In addition, class IV lanthipeptide gene clusters are shown not to be silent, setting the stage to investigate their biological activities. PMID:25888176

  1. Joint Analysis of Cluster Observations. II. Chandra/XMM-Newton X-Ray and Weak Lensing Scaling Relations for a Sample of 50 Rich Clusters of Galaxies

    NASA Astrophysics Data System (ADS)

    Mahdavi, Andisheh; Hoekstra, Henk; Babul, Arif; Bildfell, Chris; Jeltema, Tesla; Henry, J. Patrick

    2013-04-01

    We present a study of multiwavelength X-ray and weak lensing scaling relations for a sample of 50 clusters of galaxies. Our analysis combines Chandra and XMM-Newton data using an energy-dependent cross-calibration. After considering a number of scaling relations, we find that gas mass is the most robust estimator of weak lensing mass, yielding 15% ± 6% intrinsic scatter at r500WL (the pseudo-pressure YX yields a consistent scatter of 22% ± 5%). The scatter does not change when measured within a fixed physical radius of 1 Mpc. Clusters with small brightest cluster galaxy (BCG) to X-ray peak offsets constitute a very regular population whose members have the same gas mass fractions and whose even smaller (<10%) deviations from regularity can be ascribed to line of sight geometrical effects alone. Cool-core clusters, while a somewhat different population, also show the same (<10%) scatter in the gas mass-lensing mass relation. There is a good correlation and a hint of bimodality in the plane defined by BCG offset and central entropy (or central cooling time). The pseudo-pressure YX does not discriminate between the more relaxed and less relaxed populations, making it perhaps the more even-handed mass proxy for surveys. Overall, hydrostatic masses underestimate weak lensing masses by 10% on the average at r500WL; but cool-core clusters are consistent with no bias, while non-cool-core clusters have a large and constant 15%-20% bias between r2500WL and r500WL, in agreement with N-body simulations incorporating unthermalized gas. For non-cool-core clusters, the bias correlates well with BCG ellipticity. We also examine centroid shift variance and power ratios to quantify substructure; these quantities do not correlate with residuals in the scaling relations. Individual clusters have for the most part forgotten the source of their departures from self-similarity.

  2. Bayesian Analysis of Two Stellar Populations in Galactic Globular Clusters. II. NGC 5024, NGC 5272, and NGC 6352

    NASA Astrophysics Data System (ADS)

    Wagner-Kaiser, R.; Stenning, D. C.; Robinson, E.; von Hippel, T.; Sarajedini, A.; van Dyk, D. A.; Stein, N.; Jefferys, W. H.

    2016-07-01

    We use Cycle 21 Hubble Space Telescope (HST) observations and HST archival Advanced Camera for Surveys Treasury observations of Galactic Globular Clusters to find and characterize two stellar populations in NGC 5024 (M53), NGC 5272 (M3), and NGC 6352. For these three clusters, both single and double-population analyses are used to determine a best fit isochrone(s). We employ a sophisticated Bayesian analysis technique to simultaneously fit the cluster parameters (age, distance, absorption, and metallicity) that characterize each cluster. For the two-population analysis, unique population level helium values are also fit to each distinct population of the cluster and the relative proportions of the populations are determined. We find differences in helium ranging from ~0.05 to 0.11 for these three clusters. Model grids with solar α-element abundances ([α/Fe] = 0.0) and enhanced α-elements ([α/Fe] = 0.4) are adopted.

  3. Spatial cluster analysis of nanoscopically mapped serotonin receptors for classification of fixed brain tissue

    NASA Astrophysics Data System (ADS)

    Sams, Michael; Silye, Rene; Göhring, Janett; Muresan, Leila; Schilcher, Kurt; Jacak, Jaroslaw

    2014-01-01

    We present a cluster spatial analysis method using nanoscopic dSTORM images to determine changes in protein cluster distributions within brain tissue. Such methods are suitable to investigate human brain tissue and will help to achieve a deeper understanding of brain disease along with aiding drug development. Human brain tissue samples are usually treated postmortem via standard fixation protocols, which are established in clinical laboratories. Therefore, our localization microscopy-based method was adapted to characterize protein density and protein cluster localization in samples fixed using different protocols followed by common fluorescent immunohistochemistry techniques. The localization microscopy allows nanoscopic mapping of serotonin 5-HT1A receptor groups within a two-dimensional image of a brain tissue slice. These nanoscopically mapped proteins can be confined to clusters by applying the proposed statistical spatial analysis. Selected features of such clusters were subsequently used to characterize and classify the tissue. Samples were obtained from different types of patients, fixed with different preparation methods, and finally stored in a human tissue bank. To verify the proposed method, samples of a cryopreserved healthy brain have been compared with epitope-retrieved and paraffin-fixed tissues. Furthermore, samples of healthy brain tissues were compared with data obtained from patients suffering from mental illnesses (e.g., major depressive disorder). Our work demonstrates the applicability of localization microscopy and image analysis methods for comparison and classification of human brain tissues at a nanoscopic level. Furthermore, the presented workflow marks a unique technological advance in the characterization of protein distributions in brain tissue sections.

  4. Detection of Significant Groups in Hierarchical Clustering by Resampling

    PubMed Central

    Sebastiani, Paola; Perls, Thomas T.

    2016-01-01

    Hierarchical clustering is a simple and reproducible technique to rearrange data of multiple variables and sample units and visualize possible groups in the data. Despite the name, hierarchical clustering does not provide clusters automatically, and “tree-cutting” procedures are often used to identify subgroups in the data by cutting the dendrogram that represents the similarities among groups used in the agglomerative procedure. We introduce a resampling-based technique that can be used to identify cut-points of a dendrogram with a significance level based on a reference distribution for the heights of the branch points. The evaluation on synthetic data shows that the technique is robust in a variety of situations. An example with real biomarker data from the Long Life Family Study shows the usefulness of the method. PMID:27551289

  5. Detection of Significant Groups in Hierarchical Clustering by Resampling.

    PubMed

    Sebastiani, Paola; Perls, Thomas T

    2016-01-01

    Hierarchical clustering is a simple and reproducible technique to rearrange data of multiple variables and sample units and visualize possible groups in the data. Despite the name, hierarchical clustering does not provide clusters automatically, and "tree-cutting" procedures are often used to identify subgroups in the data by cutting the dendrogram that represents the similarities among groups used in the agglomerative procedure. We introduce a resampling-based technique that can be used to identify cut-points of a dendrogram with a significance level based on a reference distribution for the heights of the branch points. The evaluation on synthetic data shows that the technique is robust in a variety of situations. An example with real biomarker data from the Long Life Family Study shows the usefulness of the method. PMID:27551289

  7. Analysis of local bond-orientational order for liquid gallium at ambient pressure: Two types of cluster structures.

    PubMed

    Chen, Lin-Yuan; Tang, Ping-Han; Wu, Ten-Ming

    2016-07-14

    In terms of the local bond-orientational order (LBOO) parameters, a cluster approach to analyze local structures of simple liquids was developed. In this approach, a cluster is defined as a combination of neighboring seeds having at least n_b local-orientational bonds and their nearest neighbors, and a cluster ensemble is a collection of clusters with a specified n_b and number of seeds n_s. This cluster analysis was applied to investigate the microscopic structures of liquid Ga at ambient pressure (AP). The liquid structures studied were generated through ab initio molecular dynamics simulations. By scrutinizing the static structure factors (SSFs) of cluster ensembles with different combinations of n_b and n_s, we found that liquid Ga at AP contained two types of cluster structures, one characterized by sixfold orientational symmetry and the other showing fourfold orientational symmetry. The SSFs of cluster structures with sixfold orientational symmetry were akin to the SSF of a hard-sphere fluid. In contrast, the SSFs of cluster structures showing fourfold orientational symmetry behaved similarly to the anomalous SSF of liquid Ga at AP, which is well known for exhibiting a high-q shoulder. The local structures of a highly LBOO cluster whose SSF displayed a high-q shoulder were found to be more similar to the structure of β-Ga than to those of other solid phases of Ga. More generally, the cluster structures showing fourfold orientational symmetry tend to resemble β-Ga more closely. PMID:27421419

  8. Analysis of local bond-orientational order for liquid gallium at ambient pressure: Two types of cluster structures

    NASA Astrophysics Data System (ADS)

    Chen, Lin-Yuan; Tang, Ping-Han; Wu, Ten-Ming

    2016-07-01

    In terms of the local bond-orientational order (LBOO) parameters, a cluster approach to analyze local structures of simple liquids was developed. In this approach, a cluster is defined as a combination of neighboring seeds having at least n_b local-orientational bonds and their nearest neighbors, and a cluster ensemble is a collection of clusters with a specified n_b and number of seeds n_s. This cluster analysis was applied to investigate the microscopic structures of liquid Ga at ambient pressure (AP). The liquid structures studied were generated through ab initio molecular dynamics simulations. By scrutinizing the static structure factors (SSFs) of cluster ensembles with different combinations of n_b and n_s, we found that liquid Ga at AP contained two types of cluster structures, one characterized by sixfold orientational symmetry and the other showing fourfold orientational symmetry. The SSFs of cluster structures with sixfold orientational symmetry were akin to the SSF of a hard-sphere fluid. In contrast, the SSFs of cluster structures showing fourfold orientational symmetry behaved similarly to the anomalous SSF of liquid Ga at AP, which is well known for exhibiting a high-q shoulder. The local structures of a highly LBOO cluster whose SSF displayed a high-q shoulder were found to be more similar to the structure of β-Ga than to those of other solid phases of Ga. More generally, the cluster structures showing fourfold orientational symmetry tend to resemble β-Ga more closely.

  9. Validation of disease states in schizophrenia: comparison of cluster analysis between US and European populations

    PubMed Central

    Thokagevistk, Katia; Millier, Aurélie; Lenert, Leslie; Sadikhov, Shamil; Moreno, Santiago; Toumi, Mondher

    2016-01-01

    Background: There is controversy as to whether use of statistical clustering methods to identify common disease patterns in schizophrenia identifies patterns generalizable across countries. Objective: The goal of this study was to compare disease states identified in a published study (Mohr/Lenert, 2004) considering US patients to disease states in a European cohort (EuroSC) considering English, French, and German patients. Methods: Using methods paralleling those in Mohr/Lenert, we conducted a principal component analysis (PCA) on Positive and Negative Syndrome Scale items in the EuroSC data set (n=1,208), followed by k-means cluster analyses and a search for an optimal k. The optimal model structure was compared to Mohr/Lenert by assigning discrete severity levels to each cluster in each factor based on the cluster center. A harmonized model was created and patients were assigned to health states using both approaches; agreement rates in state assignment were then calculated. Results: Five factors accounting for 56% of total variance were obtained from PCA. These factors corresponded to positive symptoms (Factor 1), negative symptoms (Factor 2), cognitive impairment (Factor 3), hostility/aggression (Factor 4), and mood disorder (Factor 5) (as in Mohr/Lenert). The optimal number of cluster states was six. The kappa statistic (95% confidence interval) for agreement in state assignment was 0.686 (0.670–0.703). Conclusion: The patterns of schizophrenia effects identified using clustering in two different data sets were reasonably similar. Results suggest the Mohr/Lenert health state model is potentially generalizable to other populations. PMID:27386054
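
    For readers unfamiliar with the agreement measure reported above, the following sketch computes a kappa statistic for two state assignments. It assumes the unweighted Cohen's kappa (the abstract does not say which variant was used), the labels are made-up toy data rather than study data, and the function name `cohens_kappa` is hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two equally long label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same state by both models.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the marginal frequencies, summed over states.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Toy state assignments from two hypothetical models (not study data).
model_1 = [1, 1, 2, 2, 3, 3, 1, 2]
model_2 = [1, 1, 2, 3, 3, 3, 1, 1]
kappa = cohens_kappa(model_1, model_2)
```

    Here the two assignments agree on 6 of 8 items (0.75), chance agreement is 21/64, and kappa is therefore 27/43, about 0.63. Kappa discounts the agreement expected by chance, so it is lower than the raw agreement rate.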

  10. The rich cluster of galaxies ABCG 85. I. X-ray analysis.

    NASA Astrophysics Data System (ADS)

    Pislar, V.; Durret, F.; Gerbal, D.; Lima Neto, G. B.; Slezak, E.

    1997-06-01

    We present an X-ray analysis of the rich cluster ABCG 85 based on ROSAT PSPC data. By applying an improved wavelet analysis, we show that our view of this cluster is notably changed from what was previously believed (a main region and a south blob). The main emission comes from the central part of the main body of the cluster on which is superimposed that of a foreground group of galaxies. The foreground group and the main cluster are separated (if redshifts are cosmological) by $46\,h_{50}^{-1}$ Mpc. The southern blob is clearly not a group: it is resolved into X-ray emitting galaxies (in particular the second most luminous galaxy of the main cluster). Several X-ray features are identified with bright galaxies. We performed a spectral analysis and derived the temperature ($T$), metallicity ($Z$) and hydrogen column density ($N_H$). The global quantities are: $T = 4$ keV (in agreement with the velocity dispersion of 760 km/s) and $Z = 0.2\,Z_\odot$. We cannot derive accurate gradients for these quantities with our data, but there is strong evidence that the temperature is lower (~2.8 keV) and the metallicity much higher ($Z \sim 0.8\,Z_\odot$) in the very centre (within about $50\,h_{50}^{-1}$ kpc). We present a pixel by pixel method to model the physical properties of the X-ray gas and derive its density distribution. We apply classical methods to estimate the dynamical, gas and stellar masses, as well as the cooling time and cooling flow characteristics. At the limiting radius of the image ($1.4\,h_{50}^{-1}$ Mpc), we find $M_{dyn} \sim (2.1\text{--}2.9) \times 10^{14}\,h_{50}^{-1}\,M_\odot$, $M_{gas}/M_{dyn} \sim 0.18\,h_{50}^{-1.5}$. The stellar mass is $6.7 \times 10^{12}\,M_\odot$, giving a mass to light ratio of $M/L_V \sim 300\,h_{50}$. The cooling time is estimated for different models, leading to a cooling radius of 30-80 kpc depending on the adopted cluster age; the mass deposit rate is 20-70 $M_\odot$/yr, lower than previous determinations. These results are discussed (cooling flow paradigm in relation with high Z, "baryonic crisis" etc.) in connection with current ideas on

  11. Interactive Parallel Data Analysis within Data-Centric Cluster Facilities using the IPython Notebook

    NASA Astrophysics Data System (ADS)

    Pascoe, S.; Lansdowne, J.; Iwi, A.; Stephens, A.; Kershaw, P.

    2012-12-01

    The data deluge is making traditional analysis workflows obsolete for many researchers. Support for parallelism within popular tools such as MATLAB, IDL and NCO is not well developed and rarely used. However, parallelism is necessary for processing modern data volumes on a timescale conducive to curiosity-driven analysis. Furthermore, for peta-scale datasets such as the CMIP5 archive, it is no longer practical to bring an entire dataset to a researcher's workstation for analysis, or even to their institutional cluster. Therefore, there is an increasing need to develop new analysis platforms which both enable processing at the point of data storage and provide parallelism. Such an environment should, where possible, maintain the convenience and familiarity of our current analysis environments to encourage curiosity-driven research. We describe how we are combining the interactive Python shell (IPython) with our JASMIN data-cluster infrastructure. IPython has been specifically designed to bridge the gap between HPC-style parallel workflows and the opportunistic curiosity-driven analysis usually carried out using domain-specific languages and scriptable tools. IPython offers a web-based interactive environment, the IPython notebook, and a cluster engine for parallelism, all underpinned by the well-respected Python/SciPy scientific programming stack. JASMIN is designed to support the data analysis requirements of the UK and European climate and earth system modeling community. JASMIN, with its sister facility CEMS focusing on the earth observation community, has 4.5 PB of fast parallel disk storage alongside over 370 computing cores that provide local computation. Through the IPython interface to JASMIN, users can make efficient use of JASMIN's multi-core virtual machines to perform interactive analysis on all cores simultaneously or can configure IPython clusters across multiple VMs. Larger-scale clusters can be provisioned through JASMIN's batch scheduling system

  12. Global Considerations in Hierarchical Clustering Reveal Meaningful Patterns in Data

    PubMed Central

    Varshavsky, Roy; Horn, David; Linial, Michal

    2008-01-01

    Background: A hierarchy, characterized by tree-like relationships, is a natural method of organizing data in various domains. When considering an unsupervised machine learning routine, such as clustering, a bottom-up hierarchical (BU, agglomerative) algorithm is used as a default and is often the only method applied. Methodology/Principal Findings: We show that hierarchical clustering algorithms that involve global considerations, such as top-down (TD, divisive) or glocal (global-local) algorithms, are better suited to reveal meaningful patterns in the data. This is demonstrated by testing the correspondence between the results of several algorithms (TD, glocal and BU) and the correct annotations provided by experts. The correspondence was tested in multiple domains including gene expression experiments, stock trade records and functional protein families. The performance of each of the algorithms is evaluated by statistical criteria that are assigned to clusters (nodes of the hierarchy tree) based on expert-labeled data. Whereas TD algorithms perform better on global patterns, BU algorithms perform well and are advantageous when finer granularity of the data is sought. In addition, a novel TD algorithm that is based on genuine density of the data points is presented and is shown to outperform other divisive and agglomerative methods. Application of the algorithm to more than 500 protein sequences belonging to ion-channels illustrates the potential of the method for inferring overlooked functional annotations. ClustTree, a graphical Matlab toolbox for applying various hierarchical clustering algorithms and testing their quality, is made available. Conclusions: Although currently rarely used, global approaches, in particular TD or glocal algorithms, should be considered in the exploratory process of clustering. In general, applying unsupervised clustering methods can leverage the quality of manually-created mapping of protein families. As demonstrated, it can also provide

  13. Network analysis identifies protein clusters of functional importance in juvenile idiopathic arthritis

    PubMed Central

    2014-01-01

    Introduction: Our objective was to utilise network analysis to identify protein clusters of greatest potential functional relevance in the pathogenesis of oligoarticular and rheumatoid factor negative (RF-ve) polyarticular juvenile idiopathic arthritis (JIA). Methods: JIA genetic association data were used to build an interactome network model in BioGRID 3.2.99. The top 10% of this protein:protein JIA Interactome was used to generate a minimal essential network (MEN). Reactome FI Cytoscape 2.83 Plugin and the Disease Association Protein-Protein Link Evaluator (Dapple) algorithm were used to assess the functionality of the biological pathways within the MEN and to statistically rank the proteins. JIA gene expression data were integrated with the MEN and clusters of functionally important proteins derived using MCODE. Results: A JIA interactome of 2,479 proteins was built from 348 JIA associated genes. The MEN, representing the most functionally related components of the network, comprised seven clusters with distinct functional characteristics. Four gene expression datasets from peripheral blood mononuclear cells (PBMC), neutrophils and synovial fluid monocytes were mapped onto the MEN and a list of genes enriched for functional significance identified. This analysis revealed the genes of greatest potential functional importance to be PTPN2 and STAT1 for oligoarticular JIA and KSR1 for RF-ve polyarticular JIA. Clusters of 23 and 14 related proteins were derived for oligoarticular and RF-ve polyarticular JIA respectively. Conclusions: This first report of the application of network biology to JIA, integrating genetic association findings and gene expression data, has prioritised protein clusters for functional validation and identified new pathways for targeted pharmacological intervention. PMID:24886659

  14. Accounting for Limited Detection Efficiency and Localization Precision in Cluster Analysis in Single Molecule Localization Microscopy

    PubMed Central

    Shivanandan, Arun; Unnikrishnan, Jayakrishnan; Radenovic, Aleksandra

    2015-01-01

    Single Molecule Localization Microscopy techniques like PhotoActivated Localization Microscopy, with their sub-diffraction limit spatial resolution, have been widely used to characterize the spatial organization of membrane proteins by means of quantitative cluster analysis. However, such quantitative studies remain challenged by the techniques’ inherent sources of error, such as a limited detection efficiency of less than 60%, due to incomplete photo-conversion, and a limited localization precision in the range of 10–30 nm, which varies across the detected molecules depending mainly on the number of photons collected from each. We provide analytical methods to estimate the effect of these errors in cluster analysis and to correct for them. These methods, based on Ripley’s L(r) – r function or the pair correlation function, both popularly used by the community, can facilitate potentially breakthrough results in quantitative biology by providing a more accurate and precise quantification of protein spatial organization. PMID:25794150
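
    As background for the statistic named above, the sketch below evaluates Ripley's L(r) – r for a 2-D point set. It is a naive estimator without edge correction and does not implement the error corrections the abstract describes; the function name `ripley_l_minus_r`, the toy coordinates and the window area are assumptions made for illustration.

```python
import math


def ripley_l_minus_r(points, r, area):
    """Naive Ripley's L(r) - r for 2-D points in a window of the given area.

    No edge correction is applied. Values above 0 suggest clustering at
    scale r; values below 0 suggest dispersion.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    # Count ordered pairs (i, j), i != j, separated by at most r.
    close_pairs = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= r:
                close_pairs += 1
    # K(r) = area / (n (n - 1)) * number of close ordered pairs.
    k = area * close_pairs / (n * (n - 1))
    # L(r) = sqrt(K(r) / pi); complete spatial randomness gives L(r) = r.
    return math.sqrt(k / math.pi) - r


# Three tightly grouped localizations and one distant one (toy data).
toy_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
value = ripley_l_minus_r(toy_points, r=1.5, area=100.0)
```

    For these toy points the result is positive at r = 1.5, reflecting the tight group of three. Missed detections and localization noise change the pair counts that enter K(r), which is why uncorrected values can misrepresent the underlying organization.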

  15. Clinical Implications of Cluster Analysis-Based Classification of Acute Decompensated Heart Failure and Correlation with Bedside Hemodynamic Profiles

    PubMed Central

    Ahmad, Tariq; Desai, Nihar; Wilson, Francis; Schulte, Phillip; Dunning, Allison; Jacoby, Daniel; Allen, Larry; Fiuzat, Mona; Rogers, Joseph; Felker, G. Michael; O’Connor, Christopher; Patel, Chetan B.

    2016-01-01

    Background: Classification of acute decompensated heart failure (ADHF) is based on subjective criteria that crudely capture disease heterogeneity. Improved phenotyping of the syndrome may help improve therapeutic strategies. Objective: To derive cluster analysis-based groupings for patients hospitalized with ADHF, and compare their prognostic performance to hemodynamic classifications derived at the bedside. Methods: We performed a cluster analysis on baseline clinical variables and PAC measurements of 172 ADHF patients from the ESCAPE trial. Employing regression techniques, we examined associations between clusters and clinically determined hemodynamic profiles (warm/cold/wet/dry). We assessed association with clinical outcomes using Cox proportional hazards models. Likelihood ratio tests were used to compare the prognostic value of cluster data to that of hemodynamic data. Results: We identified four advanced HF clusters: 1) male Caucasians with ischemic cardiomyopathy, multiple comorbidities, lowest B-type natriuretic peptide (BNP) levels; 2) females with non-ischemic cardiomyopathy, few comorbidities, most favorable hemodynamics; 3) young African American males with non-ischemic cardiomyopathy, most adverse hemodynamics, advanced disease; and 4) older Caucasians with ischemic cardiomyopathy, concomitant renal insufficiency, highest BNP levels. There was no association between clusters and bedside-derived hemodynamic profiles (p = 0.70). For all adverse clinical outcomes, Cluster 4 had the highest risk, and Cluster 2, the lowest. Compared to Cluster 4, Clusters 1–3 had 45–70% lower risk of all-cause mortality. Clusters were significantly associated with clinical outcomes, whereas hemodynamic profiles were not. Conclusions: By clustering patients with similar objective variables, we identified four clinically relevant phenotypes of ADHF patients, with no discernable relationship to hemodynamic profiles, but distinct associations with adverse outcomes. Our analysis

  16. Seismic clusters analysis in North-Eastern Italy by the nearest-neighbor approach

    NASA Astrophysics Data System (ADS)

    Peresan, Antonella; Gentili, Stefania

    2016-04-01

    The main features of earthquake clusters in the Friuli Venezia Giulia Region (North Eastern Italy) are explored, with the aim to get some new insights on local scale patterns of seismicity in the area. The study is based on a systematic analysis of robustly and uniformly detected seismic clusters of small-to-medium magnitude events, as opposed to selected clusters analyzed in earlier studies. To characterize the features of seismicity for FVG, we take advantage of updated information from local OGS bulletins, compiled at the National Institute of Oceanography and Experimental Geophysics, Centre of Seismological Research, since 1977. A preliminary reappraisal of the earthquake bulletins is carried out, in order to identify possible missing events and to remove spurious records (e.g. duplicates and explosions). The area of sufficient completeness is outlined; for this purpose, different techniques are applied, including a comparative analysis with global ISC data, which are available in the region for large and moderate size earthquakes. Various techniques are considered to estimate the average parameters that characterize the earthquake occurrence in the region, including the b-value and the fractal dimension of epicenters distribution. Specifically, besides the classical Gutenberg-Richter Law, the Unified Scaling Law for Earthquakes, USLE, is applied. Using the updated and revised OGS data, a new formal method for detection of earthquake clusters, based on nearest-neighbor distances of events in space-time-energy domain, is applied. The bimodality of the distribution, which characterizes the earthquake nearest-neighbor distances, is used to decompose the seismic catalog into sequences of individual clusters and background seismicity. Accordingly, the method allows for a data-driven identification of main shocks (first event with the largest magnitude in the cluster), foreshocks and aftershocks. Average robust estimates of the USLE parameters (particularly, b

  17. A Three-Step Spatial-Temporal Clustering Method for Human Activity Pattern Analysis

    NASA Astrophysics Data System (ADS)

    Huang, W.; Li, S.; Xu, S.

    2016-06-01

    How people move in cities and what they do in various locations at different times form human activity patterns. Human activity patterns play a key role in urban planning, traffic forecasting, public health and safety, emergency response, friend recommendation, and so on. Therefore, scholars from different fields, such as social science, geography, transportation, physics and computer science, have made great efforts in modelling and analysing human activity patterns or human mobility patterns. One of the essential tasks in such studies is to find the locations or places where individuals stay to perform some kind of activity before further activity pattern analysis. In the era of Big Data, the emergence of social media along with wearable devices enables human activity data to be collected more easily and efficiently. Furthermore, the dimension of the accessible human activity data has been extended from two or three dimensions (space or space-time) to four (space, time and semantics). More specifically, not only the location and time at which people stay can be collected, but also what people "say" at a location at a given time. The characteristics of these datasets shed new light on the analysis of human mobility, and new methodologies should accordingly be developed to handle them. Traditional methods such as neural networks, statistics and clustering have been applied to study human activity patterns using geosocial media data. Among them, clustering methods have been widely used to analyse spatiotemporal patterns. However, to the best of our knowledge, few clustering algorithms have been specifically developed for handling datasets that contain spatial, temporal and semantic aspects all together. In this work, we propose a three-step human activity clustering method based on space, time and semantics to fill this gap. One year of Twitter data, posted in Toronto, Canada, is used to test the clustering-based method. The results show that the

  18. A Detailed Statistical Analysis of the Mass Profiles of Galaxy Clusters

    NASA Astrophysics Data System (ADS)

    Host, Ole; Hansen, Steen H.

    2011-07-01

    The distribution of mass in the halos of galaxies and galaxy clusters has been probed observationally, theoretically, and in numerical simulations, yet there is still confusion about which of several suggested parameterized models is the better representation, and whether these models are universal. We use the temperature and density profiles of the intracluster medium as measured by X-ray observations of 11 relaxed galaxy clusters to investigate mass models for the halo using a thorough Bayesian statistical analysis. We make careful comparisons between two- and three-parameter models, including the issue of a universal third parameter. We find that, of the two-parameter models, the Navarro-Frenk-White (NFW) is the best representation, but we also find moderate statistical evidence that a generalized three-parameter NFW model with a freely varying inner slope is preferred, despite penalizing against the extra degree of freedom. There is a strong indication that this inner slope needs to be determined for each cluster individually, i.e., some clusters have central cores and others have steep cusps. The mass-concentration relation of our sample is in reasonable agreement with predictions based on numerical simulations.
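
    For reference, the two density profiles compared above can be written out explicitly. The abstract does not give the formulas, so the forms below are the ones commonly used in the literature and are an assumption about the exact parameterization of the paper; $\rho_s$ is a characteristic density, $r_s$ a scale radius and $\gamma$ the inner slope.

```latex
% Two-parameter NFW profile (parameters \rho_s, r_s)
\rho_{\mathrm{NFW}}(r) = \frac{\rho_s}{(r/r_s)\,(1 + r/r_s)^{2}}

% Generalized three-parameter NFW profile with free inner slope \gamma
\rho_{\mathrm{gNFW}}(r) = \frac{\rho_s}{(r/r_s)^{\gamma}\,(1 + r/r_s)^{3-\gamma}}
```

    Setting $\gamma = 1$ recovers the NFW profile. Values of $\gamma$ near 0 describe a central core and larger values a steeper cusp, which is the cluster-to-cluster variation the abstract refers to.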

  19. A DETAILED STATISTICAL ANALYSIS OF THE MASS PROFILES OF GALAXY CLUSTERS

    SciTech Connect

    Host, Ole; Hansen, Steen H.

    2011-07-20

    The distribution of mass in the halos of galaxies and galaxy clusters has been probed observationally, theoretically, and in numerical simulations, yet there is still confusion about which of several suggested parameterized models is the better representation, and whether these models are universal. We use the temperature and density profiles of the intracluster medium as measured by X-ray observations of 11 relaxed galaxy clusters to investigate mass models for the halo using a thorough Bayesian statistical analysis. We make careful comparisons between two- and three-parameter models, including the issue of a universal third parameter. We find that, of the two-parameter models, the Navarro-Frenk-White (NFW) is the best representation, but we also find moderate statistical evidence that a generalized three-parameter NFW model with a freely varying inner slope is preferred, despite penalizing against the extra degree of freedom. There is a strong indication that this inner slope needs to be determined for each cluster individually, i.e., some clusters have central cores and others have steep cusps. The mass-concentration relation of our sample is in reasonable agreement with predictions based on numerical simulations.

  20. Towards combined analysis of the most distant massive galaxy clusters with XMM and Chandra

    NASA Astrophysics Data System (ADS)

    Bartalucci, I.

    2016-06-01

    We present a detailed study of the gas and dark matter properties of the 5 most massive and distant, z ~ 1, clusters detected via the Sunyaev-Zel'dovich effect. These massive objects represent an ideal laboratory to test our models of structure evolution in a mass regime driven mainly by gravity. This work presents a new method to study these objects, where information from the XMM-Newton and Chandra instruments is efficiently combined. The combination of Chandra's fine spatial resolution and XMM-Newton's effective area allows us to efficiently investigate the properties of the intracluster medium in the core and to probe cluster outskirts. The resulting combined density profiles are used to fully characterize the thermodynamic and physical properties of the gas. Evolution properties are investigated by comparison with the REXCESS local galaxy cluster sample. In the context of the joint analysis of future Chandra and XMM large programs, we discuss the current limitations of this method and future prospects.

  1. Sequencing and transcriptional analysis of the biosynthesis gene cluster of putrescine-producing Lactococcus lactis.

    PubMed

    Ladero, Victor; Rattray, Fergal P; Mayo, Baltasar; Martín, María Cruz; Fernández, María; Alvarez, Miguel A

    2011-09-01

    Lactococcus lactis is a prokaryotic microorganism with great importance as a culture starter and has become the model species among the lactic acid bacteria. The long and safe history of use of L. lactis in dairy fermentations has resulted in the classification of this species as GRAS (Generally Recognized As Safe) or QPS (Qualified Presumption of Safety). However, our group has identified several strains of L. lactis subsp. lactis and L. lactis subsp. cremoris that are able to produce putrescine from agmatine via the agmatine deiminase (AGDI) pathway. Putrescine is a biogenic amine that confers undesirable flavor characteristics and may even have toxic effects. The AGDI cluster of L. lactis is composed of a putative regulatory gene, aguR, followed by the genes (aguB, aguD, aguA, and aguC) encoding the catabolic enzymes. These genes are transcribed as an operon that is induced in the presence of agmatine. In some strains, an insertion (IS) element interrupts the transcription of the cluster, which results in a non-putrescine-producing phenotype. Based on this knowledge, a PCR-based test was developed in order to differentiate nonproducing L. lactis strains from those with a functional AGDI cluster. The analysis of the AGDI cluster and its flanking regions revealed that the capacity to produce putrescine via the AGDI pathway could be a specific characteristic that was lost during the adaptation to the milk environment by a process of reductive genome evolution.

  2. A spatial cluster analysis of tractor overturns in Kentucky from 1960 to 2002

    USGS Publications Warehouse

    Saman, D.M.; Cole, H.P.; Odoi, A.; Myers, M.L.; Carey, D.I.; Westneat, S.C.

    2012-01-01

    Background: Agricultural tractor overturns without rollover protective structures are the leading cause of farm fatalities in the United States. To our knowledge, no studies have incorporated the spatial scan statistic in identifying high-risk areas for tractor overturns. The aim of this study was to determine whether tractor overturns cluster in certain parts of Kentucky and identify factors associated with tractor overturns. Methods: A spatial statistical analysis using Kulldorff's spatial scan statistic was performed to identify county clusters at greatest risk for tractor overturns. A regression analysis was then performed to identify factors associated with tractor overturns. Results: The spatial analysis revealed a cluster of higher than expected tractor overturns in four counties in northern Kentucky (RR = 2.55) and 10 counties in eastern Kentucky (RR = 1.97). Higher rates of tractor overturns were associated with steeper average percent slope of pasture land by county (p = 0.0002) and a greater percent of total tractors with less than 40 horsepower by county (p<0.0001). Conclusions: This study reveals that geographic hotspots of tractor overturns exist in Kentucky and identifies factors associated with overturns. This study provides policymakers a guide to targeted county-level interventions (e.g., roll-over protective structures promotion interventions) with the intention of reducing tractor overturns in the highest risk counties in Kentucky. © 2012 Saman et al.

  3. Combination of meta-analysis and graph clustering to identify prognostic markers of ESCC.

    PubMed

    Gao, Hongyun; Wang, Lishan; Cui, Shitao; Wang, Mingsong

    2012-04-01

    Esophageal squamous cell carcinoma (ESCC) is one of the most malignant gastrointestinal cancers and occurs at a high frequency in China and other Asian countries. Recently, several molecular markers were identified for predicting ESCC. Nevertheless, additional prognostic markers, with a clear understanding of their underlying roles, are still required. A graph-clustering method, DPClus, was used to detect co-expressed modules. The aim was to identify a set of discriminating genes that could be used for predicting ESCC through graph-clustering and GO-term analysis. The results showed that CXCL12, CYP2C9, TGM3, MAL, S100A9, EMP-1 and SPRR3 were highly associated with ESCC development. In our study, all their predicted roles were in line with previous reports, which supports the assumption that a combination of meta-analysis, graph-clustering and GO-term analysis is effective both for identifying differentially expressed genes and for reflecting their functions in ESCC.

  4. A Cluster Analysis of Constant Ambient Air Monitoring Data from the Kanto Region of Japan

    PubMed Central

    Iizuka, Atsushi; Shirato, Shintaro; Mizukoshi, Atsushi; Noguchi, Miyuki; Yamasaki, Akihiro; Yanagisawa, Yukio

    2014-01-01

    This study demonstrates an application of cluster analysis to constant ambient air monitoring data of four pollutants in the Kanto region: NOx, photochemical oxidant (Ox), suspended particulate matter, and non-methane hydrocarbons. Constant ambient air monitoring can provide important information about the surrounding atmospheric pollution. However, at the same time, ambient air monitoring can place a significant financial burden on some autonomous communities. Thus, it has been necessary to reduce both the number of monitoring stations and the number of chemicals monitored. To achieve this, it is necessary to identify those monitoring stations and pollutants that are least significant, while minimizing the loss of data quality and mitigating the effects on the determination of any spatial and temporal trends of the pollutants. Through employing cluster analysis, it was established that the ambient monitoring stations in the Kanto region could be clustered topologically for NOx and Ox into eight groups. From the results of this analysis, it was possible to identify the similarities in site characteristics and pollutant behaviors. PMID:24995597

  5. Space-time FSI modeling and dynamical analysis of spacecraft parachutes and parachute clusters

    NASA Astrophysics Data System (ADS)

    Takizawa, Kenji; Spielman, Timothy; Tezduyar, Tayfun E.

    2011-09-01

    Computer modeling of spacecraft parachutes, which are quite often used in clusters of two or three large parachutes, involves fluid-structure interaction (FSI) between the parachute canopy and the air, geometric complexities created by the construction of the parachute from "rings" and "sails" with hundreds of gaps and slits, and the contact between the parachutes. The Team for Advanced Flow Simulation and Modeling (T★AFSM) has successfully addressed the computational challenges related to the FSI and geometric complexities, and recently started addressing the challenges related to the contact between the parachutes of a cluster. The core numerical technology is the stabilized space-time FSI technique developed and improved over the years by the T★AFSM. The special technique used in dealing with the geometric complexities is the Homogenized Modeling of Geometric Porosity, which was also developed and improved in recent years by the T★AFSM. In this paper we describe the technique developed by the T★AFSM for modeling, in the context of an FSI problem, the contact between two structural surfaces. We show how we use this technique in dealing with the contact between parachutes. We present the results obtained with the FSI computation of parachute clusters, the related dynamical analysis, and a special decomposition technique for parachute descent speed to make that analysis more informative. We also present a special technique for extracting from a parachute FSI computation model parameters, such as added mass, that can be used in fast, approximate engineering analysis models for parachute dynamics.

  6. Clustering Educational Digital Library Usage Data: A Comparison of Latent Class Analysis and K-Means Algorithms

    ERIC Educational Resources Information Center

    Xu, Beijie; Recker, Mimi; Qi, Xiaojun; Flann, Nicholas; Ye, Lei

    2013-01-01

    This article examines clustering as an educational data mining method. In particular, two clustering algorithms, the widely used K-means and the model-based Latent Class Analysis, are compared, using usage data from an educational digital library service, the Instructional Architect (IA.usu.edu). Using a multi-faceted approach and multiple data…

  7. Finding Groups Using Model-Based Cluster Analysis: Heterogeneous Emotional Self-Regulatory Processes and Heavy Alcohol Use Risk

    ERIC Educational Resources Information Center

    Mun, Eun Young; von Eye, Alexander; Bates, Marsha E.; Vaschillo, Evgeny G.

    2008-01-01

    Model-based cluster analysis is a new clustering procedure to investigate population heterogeneity utilizing finite mixture multivariate normal densities. It is an inferentially based, statistically principled procedure that allows comparison of nonnested models using the Bayesian information criterion to compare multiple models and identify the…

  8. Analysis of dynamic cerebral contrast-enhanced perfusion MRI time-series based on unsupervised clustering methods

    NASA Astrophysics Data System (ADS)

    Lange, Oliver; Meyer-Baese, Anke; Wismuller, Axel; Hurdal, Monica

    2005-03-01

    We employ unsupervised clustering techniques for the analysis of dynamic contrast-enhanced perfusion MRI time-series in patients with and without stroke. "Neural gas" network, fuzzy clustering based on deterministic annealing, self-organizing maps, and fuzzy c-means clustering enable self-organized data-driven segmentation with respect to fine-grained differences of signal amplitude and dynamics, thus identifying asymmetries and local abnormalities of brain perfusion. We conclude that clustering is a useful extension to conventional perfusion parameter maps.

  9. A framework for graph-based synthesis, analysis, and visualization of HPC cluster job data.

    SciTech Connect

    Mayo, Jackson R.; Kegelmeyer, W. Philip, Jr.; Wong, Matthew H.; Pebay, Philippe Pierre; Gentile, Ann C.; Thompson, David C.; Roe, Diana C.; De Sapio, Vincent; Brandt, James M.

    2010-08-01

    The monitoring and system analysis of high performance computing (HPC) clusters is of increasing importance to the HPC community. Analysis of HPC job data can be used to characterize system usage and diagnose and examine failure modes and their effects. This analysis is not straightforward, however, due to the complex relationships that exist between jobs. These relationships are based on a number of factors, including shared compute nodes between jobs, proximity of jobs in time, etc. Graph-based techniques represent an approach that is particularly well suited to this problem, and provide an effective technique for discovering important relationships in job queuing and execution data. The efficacy of these techniques is rooted in the use of a semantic graph as a knowledge representation tool. In a semantic graph, job data, represented in a combination of numerical and textual forms, can be flexibly processed into edges, with corresponding weights, expressing relationships between jobs, nodes, users, and other relevant entities. This graph-based representation permits formal manipulation by a number of analysis algorithms. This report presents a methodology and software implementation that leverages semantic graph-based techniques for the system-level monitoring and analysis of HPC clusters based on job queuing and execution data. Ontology development and graph synthesis are discussed with respect to the domain of HPC job data. The framework developed automates the synthesis of graphs from a database of job information. It also provides a front end, enabling visualization of the synthesized graphs. Additionally, an analysis engine is incorporated that provides performance analysis, graph-based clustering, and failure prediction capabilities for HPC systems.

  10. Analysis of the project synthesis goal cluster orientation and inquiry emphasis of elementary science textbooks

    NASA Astrophysics Data System (ADS)

    Staver, John R.; Bay, Mary

    The purpose of this descriptive study was to examine selected units of commonly used elementary science texts, using the Project Synthesis goal clusters as a framework for part of the examination. An inquiry classification scheme was used for the remaining segment. Four questions were answered: (1) To what extent do elementary science textbooks focus on each Project Synthesis goal cluster? (2) In which part of the text is such information found? (3) To what extent are the activities and experiments merely verifications of information already introduced in the text? (4) If inquiry is present in an activity, then what is the level of such inquiry? Eleven science textbook series, which comprise approximately 90 percent of the national market, were selected for analysis. Two units, one primary (K-3) and one intermediate (4-6), were selected for analysis by first identifying units common to most series, then randomly selecting one primary and one intermediate unit for analysis. Each randomly selected unit was carefully read, using the sentence as the unit of analysis. Each declarative and interrogative sentence in the body of the text was classified as: (1) academic; (2) personal; (3) career; or (4) societal in its focus. Each illustration, except those used in evaluation items, was similarly classified. Each activity/experiment and each miscellaneous sentence in end-of-chapter segments labelled review, summary, evaluation, etc., were similarly classified. Finally, each activity/experiment, as a whole, was categorized according to a four-category inquiry scheme (confirmation, structured inquiry, guided inquiry, open inquiry). In general, results of the analysis are: (1) most text prose focuses on academic science; (2) most remaining text prose focuses on the personal goal cluster; (3) the career and societal goal clusters receive only minor attention; (4) text illustrations exhibit a pattern similar to text prose; (5) text activities/experiments are academic in orientation

  11. Clustering-initiated factor analysis application for tissue classification in dynamic brain positron emission tomography

    PubMed Central

    Boutchko, Rostyslav; Mitra, Debasis; Baker, Suzanne L; Jagust, William J; Gullberg, Grant T

    2015-01-01

    The goal is to quantify the fraction of tissues that exhibit specific tracer binding in dynamic brain positron emission tomography (PET). It is achieved using a new method of dynamic image processing: clustering-initiated factor analysis (CIFA). Standard processing of such data relies on region of interest analysis and approximate models of the tracer kinetics and of tissue properties, which can degrade accuracy and reproducibility of the analysis. Clustering-initiated factor analysis allows accurate determination of the time–activity curves and spatial distributions for tissues that exhibit significant radiotracer concentration at any stage of the emission scan, including the arterial input function. We used this approach in the analysis of PET images obtained using 11C-Pittsburgh Compound B in which specific binding reflects the presence of β-amyloid. The fraction of the specific binding tissues determined using our approach correlated with that computed using the Logan graphical analysis. We believe that CIFA can be an accurate and convenient tool for measuring specific binding tissue concentration and for analyzing tracer kinetics from dynamic images for a variety of PET tracers. As an illustration, we show that four-factor CIFA allows extraction of two blood curves and the corresponding distributions of arterial and venous blood from PET images even with a coarse temporal resolution. PMID:25899294

  12. Pareto-optimal clustering scheme using data aggregation for wireless sensor networks

    NASA Astrophysics Data System (ADS)

    Azad, Puneet; Sharma, Vidushi

    2015-07-01

    The presence of cluster heads (CHs) in a clustered wireless sensor network (WSN) leads to improved data aggregation and enhanced network lifetime. Thus, the selection of appropriate CHs in WSNs is a challenging task, which needs to be addressed. A multicriterion decision-making approach for the selection of CHs is presented using Pareto-optimal theory and the technique for order preference by similarity to ideal solution (TOPSIS). CHs are selected using three criteria: energy, cluster density and distance from the sink. In simulations, the overall network lifetime of this method with 50% data aggregation is 81% higher than that of distributed hierarchical agglomerative clustering in a similar environment and with the same set of parameters. The optimum number of clusters is estimated using the TOPSIS technique and found to be 9-11 for effective energy usage in WSNs.
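
    The TOPSIS ranking step named in this abstract can be sketched as follows. This is only an illustration of the generic TOPSIS procedure, not the authors' implementation: the candidate values, the weights, and the choice to treat energy and cluster density as benefits and distance to the sink as a cost are all assumptions made for the example.

```python
import math

def topsis(matrix, weights, benefit):
    """Return TOPSIS closeness scores (0 = worst, 1 = best) for each row."""
    n_crit = len(weights)
    # Vector-normalise each criterion column, then apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_crit)]
                for row in matrix]
    # The ideal point takes the best value per criterion, the anti-ideal the worst.
    cols = list(zip(*weighted))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in weighted:
        d_pos = math.dist(row, ideal)   # distance to the ideal solution
        d_neg = math.dist(row, anti)    # distance to the anti-ideal solution
        scores.append(d_neg / (d_pos + d_neg))
    return scores

# Hypothetical candidates: (residual energy, neighbours in range, distance to sink)
candidates = [
    (0.9, 12, 40.0),
    (0.5, 15, 25.0),
    (0.8, 6, 90.0),
]
weights = [0.5, 0.25, 0.25]        # assumed weights, not taken from the paper
benefit = [True, True, False]      # assumed: distance to the sink is a cost

scores = topsis(candidates, weights, benefit)
best = max(range(len(candidates)), key=scores.__getitem__)
print("selected cluster head:", best, [round(s, 3) for s in scores])
```

    The sketch does not guard against a criterion column of all zeros or against identical candidates, either of which would cause a division by zero.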

  13. Analysis of plasmaspheric plumes: CLUSTER and IMAGE observations and numerical simulations

    NASA Technical Reports Server (NTRS)

    Darouzet, Fabien; DeKeyser, Johan; Decreau, Pierrette; Gallagher, Dennis; Pierrard, Viviane; Lemaire, Joseph; Dandouras, Iannis; Matsui, Hiroshi; Dunlop, Malcolm; Andre, Mats

    2005-01-01

    Plasmaspheric plumes have been routinely observed by CLUSTER and IMAGE. The CLUSTER mission provides high time resolution four-point measurements of the plasmasphere near perigee. Total electron density profiles can be derived from the plasma frequency and/or from the spacecraft potential (note that the electron spectrometer is usually not operating inside the plasmasphere); ion velocity is also measured onboard these satellites (but ion density is not reliable because of instrumental limitations). The EUV imager onboard the IMAGE spacecraft provides global images of the plasmasphere with a spatial resolution of 0.1 RE every 10 minutes; such images acquired near apogee from high above the pole show the geometry of plasmaspheric plumes, their evolution and motion. We present coordinated observations for 3 plume events and compare CLUSTER in-situ data (panel A) with global images of the plasmasphere obtained from IMAGE (panel B), and with numerical simulations for the formation of plumes based on a model that includes the interchange instability mechanism (panel C). In particular, we study the geometry and the orientation of plasmaspheric plumes by using a four-point analysis method, the spatial gradient. We also compare several aspects of their motion as determined by different methods: (i) inner and outer plume boundary velocity calculated from time delays of this boundary observed by the wave experiment WHISPER on the four spacecraft, (ii) ion velocity derived from the ion spectrometer CIS onboard CLUSTER, (iii) drift velocity measured by the electron drift instrument EDI onboard CLUSTER and (iv) global velocity determined from successive EUV images. These different techniques consistently indicate that plasmaspheric plumes rotate around the Earth, with their foot fully co-rotating, but with their tip rotating slower and moving farther out.

  14. Identification of responders to inhaled corticosteroids in a chronic obstructive pulmonary disease population using cluster analysis

    PubMed Central

    Hinds, David R; DiSantostefano, Rachael L; Le, Hoa V; Pascoe, Steven

    2016-01-01

    Objectives: To identify clusters of patients who may benefit from treatment with an inhaled corticosteroid (ICS)/long-acting β2 agonist (LABA) versus LABA alone, in terms of exacerbation reduction, and to validate previously identified clusters of patients with chronic obstructive pulmonary disease (COPD) (based on diuretic use and reversibility). Design: Post hoc supervised cluster analysis using a modified recursive partitioning algorithm of two 1-year randomised, controlled trials of fluticasone furoate (FF)/vilanterol (VI) versus VI alone, with the primary end points of the annual rate of moderate-to-severe exacerbations. Setting: Global. Participants: 3255 patients with COPD (intent-to-treat populations) with a history of exacerbations in the past year. Interventions: FF/VI 50/25 µg, 100/25 µg or 200/25 µg, or VI 25 µg; all one time per day. Outcome measures: Mean annual COPD exacerbation rate to identify clusters of patients who benefit from adding an ICS (FF) to VI bronchodilator therapy. Results: Three clusters were identified, including two groups that benefit from FF/VI versus VI: patients with blood eosinophils >2.4% (RR=0.68, 95% CI 0.58 to 0.79), or blood eosinophils ≤2.4% and smoking history ≤46 pack-years, experienced a reduced rate of exacerbations with FF/VI versus VI (RR=0.78, 95% CI 0.63 to 0.96), whereas those with blood eosinophils ≤2.4% and smoking history >46 pack-years were identified as non-responders (RR=1.22, 95% CI 0.94 to 1.58). Clusters of patients previously identified in the fluticasone propionate/salmeterol (SAL) versus SAL trials of similar design were not validated; all clusters of patients tended to benefit from FF/VI versus VI alone irrespective of diuretic use and reversibility. Conclusions: In patients with COPD with a history of exacerbations, those with greater blood eosinophils or a lower smoking history may benefit more from ICS/LABA versus LABA alone as measured by a reduced rate of exacerbations. In terms of

  15. Three-Dimensional Measurement and Cluster Analysis for Determining the Size Ranges of Chinese Temporomandibular Joint Replacement Prosthesis

    PubMed Central

    Zhang, Lu-Zhu; Meng, Shuai-Shuai; He, Dong-Mei; Fu, Yu-Zhuo; Liu, Ting; Wang, Fei-Yu; Dong, Min-Jun; Chang, Yu-Si

    2016-01-01

    The aim of this study was to investigate the osseous characteristics of Chinese temporomandibular joint (TMJ) and detect the size clusters for total joint prostheses design. Computed tomography (CT) data from 448 Chinese adults (226 male and 222 female, aged from 20 to 83 years, mean age 39.3 years) with 896 normal TMJs were chosen from the Department of Radiology in the Shanghai 9th People's Hospital. Proplan CMF 1.4 software was used to reconstruct the skulls. Three-dimensional (3D) measurements of the TMJ fossa and condyle-ramus units with 13 parameters were performed. Size clusters for prostheses design were determined by hierarchical cluster analyses, nonhierarchical (K-means) cluster analysis, and discriminant analysis. The glenoid fossa was grouped into 3 clusters, and the condyle-ramus units were grouped into 4 clusters. Discriminant analyses were capable of correctly classifying 97.24% of the glenoid fossa and 94.98% of the condyle-ramus units. The means and standard deviations for the parameter values in each cluster were determined. Fossa depth and angles between the condyle and ramus were important parameters for Chinese TMJ prostheses design. 3D measurements and cluster analysis of the osseous morphology of the TMJ provided an anatomical reference and identified the dimensions of the minimum numbers of prosthesis sizes required for Chinese TMJ replacement. PMID:26937929

  16. AVES: A high performance computer cluster array for the INTEGRAL satellite scientific data analysis

    NASA Astrophysics Data System (ADS)

    Federici, Memmo; Martino, Bruno Luigi; Ubertini, Pietro

    2012-07-01

    In this paper we describe a new computing system array, designed, built and now used at the Space Astrophysics and Planetary Institute (IAPS) in Rome, Italy, for the INTEGRAL Space Observatory scientific data analysis. This new system has become necessary in order to reduce the processing time of the INTEGRAL data accumulated during the more than 9 years of in-orbit operation. In order to fulfill the scientific data analysis requirements with a moderately limited investment, the starting approach has been to use a 'cluster' array of commercial quad-CPU computers, featuring the extremely large scientific and calibration data archive on line.

  17. Diversity of Xiphinema americanum-group Species and Hierarchical Cluster Analysis of Morphometrics.

    PubMed

    Lamberti, F; Ciancio, A

    1993-09-01

    Of the 39 species composing the Xiphinema americanum group, 14 were described originally from North America and two others have been reported from this region. Many species are very similar morphologically and can be distinguished only by a difficult comparison of various combinations of some morphometric characters. Study of morphometrics of 49 populations, including the type populations of the 39 species attributed to this group, by principal component analysis and hierarchical cluster analysis placed the populations into five subgroups, proposed here as the X. brevicolle subgroup (seven species), the X. americanum subgroup (17 species), the X. taylori subgroup (two species), the X. pachtaicum subgroup (eight species), and the X. lambertii subgroup (five species).

  18. Study of cluster analysis used in explosives classification with laser-induced breakdown spectroscopy

    NASA Astrophysics Data System (ADS)

    Wang, Q. Q.; He, L. A.; Zhao, Y.; Peng, Z.; Liu, L.

    2016-06-01

    Supervised learning methods (such as partial least squares regression-discriminant analysis, SIMCA, etc) are widely used in explosives recognition. The correct classification rate may be lowered if a sample or substrate is not included in the training dataset. Unsupervised learning methods (such as hierarchical clustering analysis, K-means, etc) have the potential to solve this problem. In this paper we analyzed results of using as input variables the intensities of seven lines and then five intensity ratios of the seven lines. It was demonstrated that unsupervised learning methods had the ability to achieve a better classification result.

  19. Multichannel biomedical time series clustering via hierarchical probabilistic latent semantic analysis.

    PubMed

    Wang, Jin; Sun, Xiangping; Nahavandi, Saeid; Kouzani, Abbas; Wu, Yuchuan; She, Mary

    2014-11-01

    Biomedical time series clustering that automatically groups a collection of time series according to their internal similarity is of importance for medical record management and inspection such as bio-signals archiving and retrieval. In this paper, a novel framework that automatically groups a set of unlabelled multichannel biomedical time series according to their internal structural similarity is proposed. Specifically, we treat a multichannel biomedical time series as a document and extract local segments from the time series as words. We extend a topic model, i.e., the Hierarchical probabilistic Latent Semantic Analysis (H-pLSA), which was originally developed for visual motion analysis to cluster a set of unlabelled multichannel time series. The H-pLSA models each channel of the multichannel time series using a local pLSA in the first layer. The topics learned in the local pLSA are then fed to a global pLSA in the second layer to discover the categories of multichannel time series. Experiments on a dataset extracted from multichannel Electrocardiography (ECG) signals demonstrate that the proposed method performs better than previous state-of-the-art approaches and is relatively robust to the variations of parameters including length of local segments and dictionary size. Although the experimental evaluation used the multichannel ECG signals in a biometric scenario, the proposed algorithm is a universal framework for multichannel biomedical time series clustering according to their structural similarity, which has many applications in biomedical time series management.

  20. A comprehensive comparison of different clustering methods for reliability analysis of microarray data.

    PubMed

    Kafieh, Rahele; Mehridehnavi, Alireza

    2013-01-01

    In this study, we considered some competitive learning methods including hard competitive learning and soft competitive learning with/without fixed network dimensionality for reliability analysis in microarrays. In order to have a more extensive view, and keeping in mind that competitive learning methods aim at error minimization or entropy maximization (different kinds of function optimization), we decided to investigate the abilities of mixture decomposition schemes. Therefore, we assert that this study covers the algorithms based on function optimization with particular emphasis on different competitive learning methods. The aim is to find the most powerful method according to a pre-specified criterion determined with numerical methods and matrix similarity measures. Furthermore, we should provide an indication showing the intrinsic ability of the dataset to form clusters before we apply a clustering algorithm. Therefore, we proposed the Hopkins statistic as a method for finding the intrinsic ability of a dataset to be clustered. The results show the remarkable ability of the Rayleigh mixture model in comparison with other methods in the reliability analysis task.

  1. A Comprehensive Comparison of Different Clustering Methods for Reliability Analysis of Microarray Data

    PubMed Central

    Kafieh, Rahele; Mehridehnavi, Alireza

    2013-01-01

    In this study, we considered some competitive learning methods including hard competitive learning and soft competitive learning with/without fixed network dimensionality for reliability analysis in microarrays. In order to have a more extensive view, and keeping in mind that competitive learning methods aim at error minimization or entropy maximization (different kinds of function optimization), we decided to investigate the abilities of mixture decomposition schemes. Therefore, we assert that this study covers the algorithms based on function optimization with particular emphasis on different competitive learning methods. The aim is to find the most powerful method according to a pre-specified criterion determined with numerical methods and matrix similarity measures. Furthermore, we should provide an indication showing the intrinsic ability of the dataset to form clusters before we apply a clustering algorithm. Therefore, we proposed the Hopkins statistic as a method for finding the intrinsic ability of a dataset to be clustered. The results show the remarkable ability of the Rayleigh mixture model in comparison with other methods in the reliability analysis task. PMID:24083134
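
    The Hopkins statistic mentioned here as a test of clustering tendency can be sketched in a few lines. The version below is one common convention (values near 0.5 for uniformly scattered data, values near 1 for clustered data) and is not necessarily the exact variant used in the study; the synthetic point sets, the probe count of 20 and the bounding-box sampling window are assumptions for illustration.

```python
import math
import random

def hopkins(points, m, rng):
    """Hopkins statistic: about 0.5 for uniform data, close to 1 for clustered data."""
    dim = len(points[0])
    lows = [min(p[k] for p in points) for k in range(dim)]
    highs = [max(p[k] for p in points) for k in range(dim)]
    # w: distance from m sampled real points to their nearest *other* real point.
    sample_idx = rng.sample(range(len(points)), m)
    w = [min(math.dist(points[i], q) for j, q in enumerate(points) if j != i)
         for i in sample_idx]
    # u: distance from m uniform probes in the bounding box to the nearest real point.
    probes = [[rng.uniform(lows[k], highs[k]) for k in range(dim)] for _ in range(m)]
    u = [min(math.dist(p, q) for q in points) for p in probes]
    su = sum(x ** dim for x in u)
    sw = sum(x ** dim for x in w)
    return su / (su + sw)

rng = random.Random(0)
# Synthetic data: two tight blobs versus uniformly scattered points.
clustered = [[rng.gauss(cx, 0.05), rng.gauss(cy, 0.05)]
             for cx, cy in [(0.0, 0.0), (5.0, 5.0)] for _ in range(100)]
uniform = [[rng.uniform(0, 5), rng.uniform(0, 5)] for _ in range(200)]

h_clustered = hopkins(clustered, 20, rng)
h_uniform = hopkins(uniform, 20, rng)
print(round(h_clustered, 2), round(h_uniform, 2))
```

    Some authors define the statistic the other way round (small values indicating clusters), so the direction should be checked against whichever definition a given paper uses.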

  2. Cluster Analysis of Physical and Cognitive Ageing Patterns in Older People from Shanghai.

    PubMed

    Bandelow, Stephan; Xu, Xin; Xiao, Shifu; Hogervorst, Eef

    2016-01-01

    This study investigated the relationship between education, cognitive and physical function in older age, and their respective impacts on activities of daily living (ADL). Data on 148 older participants from a community-based sample recruited in Shanghai, China, included the following measures: age, education, ADL, grip strength, balance, gait speed, global cognition and verbal memory. The majority of participants in the present cohort were cognitively and physically healthy and reported no problems with ADL. Twenty-eight percent of participants needed help with ADL, with the majority of this group being over 80 years of age. Significant predictors of reductions in functional independence included age, balance, global cognitive function (MMSE) and the gait measures. Cluster analysis revealed a protective effect of education on cognitive function that did not appear to extend to physical function. Consistency of such phenotypes of ageing clusters in other cohort studies may provide helpful models for dementia and frailty prevention measures. PMID:26907351

  3. Numerical Analysis of Base Flowfield for a Four-Engine Clustered Nozzle Configuration

    NASA Technical Reports Server (NTRS)

    Wang, Ten-See

    1995-01-01

    Excessive base heating has been a problem for many launch vehicles. For certain designs such as the direct dump of turbine exhaust inside and at the lip of the nozzle, the potential burning of the turbine exhaust in the base region can be of great concern. Accurate prediction of the base environment at altitudes is therefore very important during the vehicle design phase. Otherwise, undesirable consequences may occur. In this study, the turbulent base flowfield of a cold flow experimental investigation for a four-engine clustered nozzle was numerically benchmarked using a pressure-based computational fluid dynamics (CFD) method. This is a necessary step before the benchmarking of hot flow and combustion flow tests can be considered. Since the medium was unheated air, reasonable prediction of the base pressure distribution at high altitude was the main goal. Several physical phenomena pertaining to the multiengine clustered nozzle base flow physics were deduced from the analysis.

  4. Cluster Analysis of Physical and Cognitive Ageing Patterns in Older People from Shanghai

    PubMed Central

    Bandelow, Stephan; Xu, Xin; Xiao, Shifu; Hogervorst, Eef

    2016-01-01

    This study investigated the relationship between education, cognitive and physical function in older age, and their respective impacts on activities of daily living (ADL). Data on 148 older participants from a community-based sample recruited in Shanghai, China, included the following measures: age, education, ADL, grip strength, balance, gait speed, global cognition and verbal memory. The majority of participants in the present cohort were cognitively and physically healthy and reported no problems with ADL. Twenty-eight percent of participants needed help with ADL, with the majority of this group being over 80 years of age. Significant predictors of reductions in functional independence included age, balance, global cognitive function (MMSE) and the gait measures. Cluster analysis revealed a protective effect of education on cognitive function that did not appear to extend to physical function. Consistency of such phenotypes of ageing clusters in other cohort studies may provide helpful models for dementia and frailty prevention measures. PMID:26907351

  5. Supercomputer and cluster performance modeling and analysis efforts: 2004-2006.

    SciTech Connect

    Sturtevant, Judith E.; Ganti, Anand; Meyer, Harold Edward; Stevenson, Joel O.; Benner, Robert E., Jr.; Goudy, Susan Phelps; Doerfler, Douglas W.; Domino, Stefan Paul; Taylor, Mark A.; Malins, Robert Joseph; Scott, Ryan T.; Barnette, Daniel Wayne; Rajan, Mahesh; Ang, James Alfred; Black, Amalia Rebecca; Laub, Thomas William; Vaughan, Courtenay Thomas; Franke, Brian Claude

    2007-02-01

    This report describes efforts by the Performance Modeling and Analysis Team to investigate performance characteristics of Sandia's engineering and scientific applications on the ASC capability and advanced architecture supercomputers, and Sandia's capacity Linux clusters. Efforts to model various aspects of these computers are also discussed. The goals of these efforts are to quantify and compare Sandia's supercomputer and cluster performance characteristics; to reveal strengths and weaknesses in such systems; and to predict performance characteristics of, and provide guidelines for, future acquisitions and follow-on systems. Described herein are the results obtained from running benchmarks and applications to extract performance characteristics and comparisons, as well as modeling efforts, obtained during the time period 2004-2006. The format of the report, with hypertext links to numerous additional documents, purposefully minimizes the document size needed to disseminate the extensive results from our research.

  6. Using Image Processing Techniques for Cluster Analysis, and Droplet Formation in Phase Separating Fluids

    NASA Astrophysics Data System (ADS)

    Smith, Gregory; Oprisan, Ana; Hegseth, John; Oprisan, Sorinel; Lecoutre, Carole; Garrabos, Yves; Beysens, Daniel

    2009-03-01

    A series of experiments was performed using the Alice II apparatus in microgravity to study phase separation near the critical temperature. Using image analysis techniques, we were able to obtain quantitative information regarding the morphology of the gas-liquid interface near the critical point of pure SF6 fluid in microgravity. Growth laws for liquid and gas clusters were extracted based on image segmentation with both thresholding and k-means clustering. By measuring the image features we analyzed the formation of spherical droplets during the late stage of phase separation for a series of full view images. The growth of a wetting layer around the border of the cell containing the fluid was also investigated using image processing techniques.
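
    As a rough illustration of segmenting an image into two phases with k-means, the sketch below runs Lloyd's algorithm on scalar pixel intensities and counts the pixels assigned to the brighter cluster. The tiny grey-level image, the choice of k = 2 and the evenly spaced initial centres are assumptions for the example; the abstract does not describe the authors' actual pipeline at this level of detail.

```python
def kmeans_1d(values, k=2, iters=50):
    """Lloyd's algorithm on scalar intensities; returns (centres, labels)."""
    lo, hi = min(values), max(values)
    # Spread the initial centres evenly over the intensity range.
    centres = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each pixel goes to its nearest centre.
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        # Update step: each centre moves to the mean of its members.
        new_centres = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centres.append(sum(members) / len(members) if members else centres[c])
        if new_centres == centres:
            break
        centres = new_centres
    return centres, labels

# Made-up 4x6 grey-level image: dark background with one brighter patch.
image = [
    [20, 22, 19, 21, 23, 20],
    [21, 180, 185, 178, 22, 19],
    [20, 182, 190, 181, 21, 20],
    [19, 21, 20, 22, 20, 21],
]
n_rows, n_cols = len(image), len(image[0])
pixels = [v for row in image for v in row]
centres, labels = kmeans_1d(pixels, k=2)

bright = max(range(2), key=lambda c: centres[c])
mask = [[1 if labels[r * n_cols + c] == bright else 0 for c in range(n_cols)]
        for r in range(n_rows)]
cluster_area = sum(map(sum, mask))  # pixel count of the bright phase
print(cluster_area)
```

    Tracking a quantity such as `cluster_area` across successive frames is one way a growth law could be estimated; a fixed global threshold would produce the same mask on an image this clean.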

  8. Cluster analysis and quality assessment of logged water at an irrigation project, eastern Saudi Arabia.

    PubMed

    Hussain, Mahbub; Ahmed, Syed Munaf; Abderrahman, Walid

    2008-01-01

    A multivariate statistical technique, cluster analysis, was used to assess the logged surface water quality at an irrigation project at Al-Fadhley, Eastern Province, Saudi Arabia. The principal idea behind using the technique was to utilize all available hydrochemical variables in the quality assessment including trace elements and other ions which are not considered in conventional techniques for water quality assessments like Stiff and Piper diagrams. Furthermore, the area belongs to an irrigation project where water contamination associated with the use of fertilizers, insecticides and pesticides is expected. This quality assessment study was carried out on a total of 34 surface/logged water samples. To gain a greater insight in terms of the seasonal variation of water quality, 17 samples were collected from both summer and winter seasons. The collected samples were analyzed for a total of 23 water quality parameters including pH, TDS, conductivity, alkalinity, sulfate, chloride, bicarbonate, nitrate, phosphate, bromide, fluoride, calcium, magnesium, sodium, potassium, arsenic, boron, copper, cobalt, iron, lithium, manganese, molybdenum, nickel, selenium, mercury and zinc. Cluster analysis in both Q and R modes was used. Q-mode analysis resulted in three distinct water types for both the summer and winter seasons. Q-mode analysis also showed the spatial as well as temporal variation in water quality. R-mode cluster analysis led to the conclusion that there are two major sources of contamination for the surface/shallow groundwater in the area: fertilizers, micronutrients, pesticides, and insecticides used in agricultural activities, and non-point natural sources.
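
    The distinction between the Q and R modes in the abstract above comes down to what is compared: Q-mode groups samples (rows of the data matrix), while R-mode groups variables (columns). The sketch below shows only that step, building the two dissimilarity matrices a clustering routine would consume. The sample values are invented for illustration, and the z-score standardisation and Euclidean distance are assumptions, since the abstract does not state which measures were used.

```python
import math
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardise each variable (column) to zero mean and unit variance."""
    cols = list(zip(*rows))
    stats = [(mean(c), pstdev(c)) for c in cols]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, (m, s) in zip(row, stats)]
        for row in rows
    ]


def distance_matrix(items):
    """Pairwise Euclidean distances between the given vectors."""
    return [[math.dist(u, v) for v in items] for u in items]


# Hypothetical data: 4 water samples x 3 variables (pH, TDS, chloride).
samples = [
    [7.1, 500.0, 20.0],
    [7.3, 520.0, 22.0],
    [8.0, 900.0, 60.0],
    [8.2, 950.0, 65.0],
]

z = zscore_columns(samples)
q_mode = distance_matrix(z)                              # sample vs sample
r_mode = distance_matrix([list(c) for c in zip(*z)])     # variable vs variable
```

    Either matrix could then be passed to a hierarchical clustering routine; the only difference between the modes is the transposition of the standardised data.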

  9. Cloning and Analysis of the Planosporicin Lantibiotic Biosynthetic Gene Cluster of Planomonospora alba

    PubMed Central

    Sherwood, Emma J.; Hesketh, Andrew R.

    2013-01-01

    The increasing prevalence of antibiotic resistance in bacterial pathogens has renewed focus on natural products with antimicrobial properties. Lantibiotics are ribosomally synthesized peptide antibiotics that are posttranslationally modified to introduce (methyl)lanthionine bridges. Actinomycetes are renowned for their ability to produce a large variety of antibiotics, many with clinical applications, but are known to make only a few lantibiotics. One such compound is planosporicin produced by Planomonospora alba, which inhibits cell wall biosynthesis in Gram-positive pathogens. Planosporicin is a type AI lantibiotic structurally similar to those which bind lipid II, the immediate precursor for cell wall biosynthesis. The gene cluster responsible for planosporicin biosynthesis was identified by genome mining and subsequently isolated from a P. alba cosmid library. A minimal cluster of 15 genes sufficient for planosporicin production was defined by heterologous expression in Nonomuraea sp. strain ATCC 39727, while deletion of the gene encoding the precursor peptide from P. alba, which abolished planosporicin production, was also used to confirm the identity of the gene cluster. Deletion of genes encoding likely biosynthetic enzymes identified through bioinformatic analysis revealed that they, too, are essential for planosporicin production in the native host. Reverse transcription-PCR (RT-PCR) analysis indicated that the planosporicin gene cluster is transcribed in three operons. Expression of one of these, pspEF, which encodes an ABC transporter, in Streptomyces coelicolor A3(2) conferred some degree of planosporicin resistance on the heterologous host. The inability to delete these genes from P. alba suggests that they play an essential role in immunity in the natural producer. PMID:23475977

  10. Cloning and analysis of the planosporicin lantibiotic biosynthetic gene cluster of Planomonospora alba.

    PubMed

    Sherwood, Emma J; Hesketh, Andrew R; Bibb, Mervyn J

    2013-05-01

    The increasing prevalence of antibiotic resistance in bacterial pathogens has renewed focus on natural products with antimicrobial properties. Lantibiotics are ribosomally synthesized peptide antibiotics that are posttranslationally modified to introduce (methyl)lanthionine bridges. Actinomycetes are renowned for their ability to produce a large variety of antibiotics, many with clinical applications, but are known to make only a few lantibiotics. One such compound is planosporicin produced by Planomonospora alba, which inhibits cell wall biosynthesis in Gram-positive pathogens. Planosporicin is a type AI lantibiotic structurally similar to those which bind lipid II, the immediate precursor for cell wall biosynthesis. The gene cluster responsible for planosporicin biosynthesis was identified by genome mining and subsequently isolated from a P. alba cosmid library. A minimal cluster of 15 genes sufficient for planosporicin production was defined by heterologous expression in Nonomuraea sp. strain ATCC 39727, while deletion of the gene encoding the precursor peptide from P. alba, which abolished planosporicin production, was also used to confirm the identity of the gene cluster. Deletion of genes encoding likely biosynthetic enzymes identified through bioinformatic analysis revealed that they, too, are essential for planosporicin production in the native host. Reverse transcription-PCR (RT-PCR) analysis indicated that the planosporicin gene cluster is transcribed in three operons. Expression of one of these, pspEF, which encodes an ABC transporter, in Streptomyces coelicolor A3(2) conferred some degree of planosporicin resistance on the heterologous host. The inability to delete these genes from P. alba suggests that they play an essential role in immunity in the natural producer.

  11. Automated regional registration and characterization of corresponding microcalcification clusters on temporal pairs of mammograms for interval change analysis

    SciTech Connect

    Filev, Peter; Hadjiiski, Lubomir; Chan, Heang-Ping; Sahiner, Berkman; Ge Jun; Helvie, Mark A.; Roubidoux, Marilyn; Zhou Chuan

    2008-12-15

    A computerized regional registration and characterization system for analysis of microcalcification clusters on serial mammograms is being developed in our laboratory. The system consists of two stages. In the first stage, based on the location of a detected cluster on the current mammogram, a regional registration procedure identifies the local area on the prior that may contain the corresponding cluster. A search program is used to detect cluster candidates within the local area. The detected cluster on the current image is then paired with the cluster candidates on the prior image to form true (TP-TP) or false (TP-FP) pairs. Automatically extracted features were used in a newly designed correspondence classifier to reduce the number of false pairs. In the second stage, a temporal classifier, based on both current and prior information, is used if a cluster has been detected on the prior image, and a current classifier, based on current information alone, is used if no prior cluster has been detected. The data set used in this study consisted of 261 serial pairs containing biopsy-proven calcification clusters. An MQSA radiologist identified the corresponding clusters on the mammograms. On the priors, the radiologist rated the subtlety of 30 clusters (out of the 261 clusters) as 9 or 10 on a scale of 1 (very obvious) to 10 (very subtle). Leave-one-case-out resampling was used for feature selection and classification in both the correspondence and malignant/benign classification schemes. The search program detected 91.2% (238/261) of the clusters on the priors with an average of 0.42 FPs/image. The correspondence classifier identified 86.6% (226/261) of the TP-TP pairs with 20 false matches (0.08 FPs/image) relative to the entire set of 261 image pairs. In the malignant/benign classification stage the temporal classifier achieved a test A_z of 0.81 for the 246 pairs which contained a detection on the prior. In addition, a classifier was designed by using the

  12. Emergy-based comparative analysis on industrial clusters: economic and technological development zone of Shenyang area, China.

    PubMed

    Liu, Zhe; Geng, Yong; Zhang, Pan; Dong, Huijuan; Liu, Zuoxi

    2014-09-01

    In China, local governments in many areas give priority to the development of heavy industrial clusters in pursuit of high gross domestic product (GDP) growth as a political achievement, which usually results in higher costs from ecological degradation and environmental pollution. Therefore, effective methods and a reasonable evaluation system are urgently needed to evaluate the overall efficiency of industrial clusters. The emergy method links economic and ecological systems together and can evaluate the contribution of ecological products and services as well as the load placed on environmental systems. This method has been successfully applied in many case studies of ecosystems but seldom to industrial clusters. This study applied the methodology of emergy analysis to assess the efficiency of industrial clusters through a series of emergy-based indices as well as the proposed indicators. A case study of the Shenyang Economic Technological Development Area (SETDA) was investigated to show the emergy method's practical potential to evaluate industrial clusters and to inform environmental policy making. The results of our study showed that the industrial cluster of electric equipment and electronic manufacturing produced the most economic value and had the highest efficiency of energy utilization among the four industrial clusters. However, the sustainability index of the industrial cluster of food and beverage processing was better than that of the other industrial clusters.

  13. A process flood typology along an Alpine transect: classification based on cluster analysis

    NASA Astrophysics Data System (ADS)

    Zoccatelli, Davide; Parajka, Juraj; Gaal, Ladislav; Blöschl, Günter; Borga, Marco

    2015-04-01

    Classifying floods according to their causative processes helps to understand how flood regimes change across climates. The aim of this work is to create a flood classification scheme along a longitudinal Alpine transect spanning 200 km in a North-South direction. The investigation is focused on the analysis of floods that have similar properties and can be defined as a type. After the definition of flood types we analyzed their properties, their spatial organization and the relation with the topography of the transect. Precipitation and temperature follow a sharp gradient across the transect, with both precipitation and temperature low around the main alpine ridge. Along this gradient the causative processes of floods change, modifying the flood regimes of catchments. The three main floods each year on 33 alpine basins (from 50 to 500 km²) are isolated from about 20 years of hourly discharge. A hydrological model simulates the catchment conditions at the beginning of each event. For each flood we created a set of indexes to describe hydrograph properties, meteorological inputs and catchment conditions. A cluster analysis on these indexes defined how many flood types can be found in our data and what their unique properties are. Subsequently, a classification tree analysis defined the best criteria to identify those clusters. Results indicate that transect floods are best divided into three clusters, which can be related to Snowmelt, Rain and Flash Floods. The subsequent classification tree analysis showed that a good classification can also be achieved using few criteria, but that the application of a hydrological model is useful to identify snowmelt events. The distribution of these flood types in space and time across the Alps is reported, and it is in agreement with the processes involved. This approach proved, across different climates, to be able to identify groups of floods that could be related to the driving processes, and to define and evaluate

  14. Cluster and Principal Component Analysis of Human Glioblastoma Multiforme (GBM) Tumor Proteome

    PubMed Central

    Pooladi, Mehdi; Rezaei-Tavirani, Mostafa; Hashemi, Mehrdad; Hesami-Tackallou, Saeed; Khaghani-Razi-Abad, Solmaz; Moradi, Afshin; Zali, Ali Reza; Mousavi, Masoumeh; Firozi-Dalvand, Leila; Rakhshan, Azadeh; Zamanian Azodi, Mona

    2014-01-01

    Background: Glioblastoma Multiforme (GBM), or grade IV astrocytoma, is the most common and lethal adult malignant brain tumor. Several of the molecular alterations detected in gliomas may have diagnostic and/or prognostic implications. Proteomics has been widely applied in various areas of science, including deciphering the molecular pathogenesis of diseases. Methods: In this study, proteins were extracted from tumor and normal brain tissues, and the protein purity was then evaluated by the Bradford test and spectrophotometry. Proteins were separated by the 2-Dimensional Gel (2DG) electrophoresis method, and the spots were then analyzed and compared using statistical data and specific software. Protein clustering analysis was performed on the list of proteins deemed significantly altered in glioblastoma tumors (t-test and one-way ANOVA; P < 0.05). Results: The 2D gel showed a total of 876 spots. We report that 172 spots differed in expression level (fold > 2) in glioblastoma. On each analytical 2D gel, an average of 876 spots was observed. In this study, 188 spots exhibited upregulation of expression level, whereas the remaining 232 spots were decreased in glioblastoma tumor relative to normal tissue. Results demonstrate that functional clustering (up- and down-regulated) and Principal Component Analysis (PCA) have considerable merit in aiding the interpretation of proteomic data. Conclusion: 2D gel electrophoresis is the core of proteomics, permitting the separation of thousands of proteins. High resolution 2DE can resolve up to 5,000 proteins simultaneously. Using cluster analysis, we can also form groups of related variables, similar to what is practiced in factor analysis. PMID:25250155

  15. The use of the wavelet cluster analysis for asteroid family determination

    NASA Technical Reports Server (NTRS)

    Benjoya, Phillippe; Slezak, E.; Froeschle, Claude

    1992-01-01

    Asteroid family determination has been dependent on the analysis method for a long time. A new cluster analysis based on the wavelet transform has allowed an automatic definition of families with a degree of significance versus randomness. This method is in fact rather general and can be applied to any kind of structural analysis. Here we concentrate on the main features of the method. The analysis has been performed on the set of 4100 asteroid proper elements computed by Milani and Knezevic (see Milani and Knezevic 1990). Twenty-one families have been found, and the influence of the chosen metric has been tested. The results have been compared to those of Zappala et al. (see Zappala et al. 1990), obtained by the use of a completely different method applied to the same set of data. For the first time, a good overlap has been found between the results of both methods, not only for the big well-known families but also for the smallest ones.

  16. JOINT ANALYSIS OF CLUSTER OBSERVATIONS. II. CHANDRA/XMM-NEWTON X-RAY AND WEAK LENSING SCALING RELATIONS FOR A SAMPLE OF 50 RICH CLUSTERS OF GALAXIES

    SciTech Connect

    Mahdavi, Andisheh; Hoekstra, Henk; Babul, Arif; Bildfell, Chris; Jeltema, Tesla; Henry, J. Patrick

    2013-04-20

    We present a study of multiwavelength X-ray and weak lensing scaling relations for a sample of 50 clusters of galaxies. Our analysis combines Chandra and XMM-Newton data using an energy-dependent cross-calibration. After considering a number of scaling relations, we find that gas mass is the most robust estimator of weak lensing mass, yielding 15% ± 6% intrinsic scatter at r_500^WL (the pseudo-pressure Y_X yields a consistent scatter of 22% ± 5%). The scatter does not change when measured within a fixed physical radius of 1 Mpc. Clusters with small brightest cluster galaxy (BCG) to X-ray peak offsets constitute a very regular population whose members have the same gas mass fractions and whose even smaller (<10%) deviations from regularity can be ascribed to line of sight geometrical effects alone. Cool-core clusters, while a somewhat different population, also show the same (<10%) scatter in the gas mass-lensing mass relation. There is a good correlation and a hint of bimodality in the plane defined by BCG offset and central entropy (or central cooling time). The pseudo-pressure Y_X does not discriminate between the more relaxed and less relaxed populations, making it perhaps the more even-handed mass proxy for surveys. Overall, hydrostatic masses underestimate weak lensing masses by 10% on the average at r_500^WL; but cool-core clusters are consistent with no bias, while non-cool-core clusters have a large and constant 15%-20% bias between r_2500^WL and r_500^WL, in agreement with N-body simulations incorporating unthermalized gas. For non-cool-core clusters, the bias correlates well with BCG ellipticity. We also examine centroid shift variance and power ratios to quantify substructure; these quantities do not correlate with residuals in the scaling relations. Individual clusters have for the most part forgotten the source of their departures from self-similarity.

  17. Clustering drug-drug interaction networks with energy model layouts: community analysis and drug repurposing

    PubMed Central

    Udrescu, Lucreţia; Sbârcea, Laura; Topîrceanu, Alexandru; Iovanovici, Alexandru; Kurunczi, Ludovic; Bogdan, Paul; Udrescu, Mihai

    2016-01-01

    Analyzing drug-drug interactions may unravel previously unknown drug action patterns, leading to the development of new drug discovery tools. We present a new approach to analyzing drug-drug interaction networks, based on clustering and topological community detection techniques that are specific to complex network science. Our methodology uncovers functional drug categories along with the intricate relationships between them. Using modularity-based and energy-model layout community detection algorithms, we link the network clusters to 9 relevant pharmacological properties. Out of the 1141 drugs from the DrugBank 4.1 database, our extensive literature survey and cross-checking with other databases such as Drugs.com, RxList, and DrugBank 4.3 confirm the predicted properties for 85% of the drugs. As such, we argue that network analysis offers a high-level grasp of a wide area of pharmacological aspects, indicating possible unaccounted interactions and missing pharmacological properties that can lead to drug repositioning for the 15% of drugs that seem to be inconsistent with the predicted property. Also, by using network centralities, we can rank drugs according to their interaction potential for both simple and complex multi-pathology therapies. Moreover, our clustering approach can be extended to applications such as analyzing drug-target interactions or phenotyping patients in personalized medicine. PMID:27599720

  18. Selecting background galaxies in weak-lensing analysis of galaxy clusters

    NASA Astrophysics Data System (ADS)

    Formicola, I.; Radovich, M.; Meneghetti, M.; Mazzotta, P.; Grado, A.; Giocoli, C.

    2016-05-01

    In this paper, we present a new method to select the faint, background galaxies used to derive the mass of galaxy clusters by weak lensing. The method is based on the simultaneous analysis of the shear signal, that should be consistent with zero for the foreground, unlensed galaxies, and of the colours of the galaxies: photometric data from the COSMic evOlution Survey are used to train the colour selection. In order to validate this methodology, we test it against a set of state-of-the-art image simulations of mock galaxy clusters in different redshift [0.23-0.45] and mass [0.5-1.55 × 10^15 M⊙] ranges, mimicking medium-deep multicolour imaging observations [e.g. Subaru, Large Binocular Telescope]. The performance of our method in terms of contamination by unlensed sources is comparable to a selection based on photometric redshifts, which however requires a good spectral coverage and is thus much more observationally demanding. The application of our method to simulations gives an average ratio between estimated and true masses of ~0.98 ± 0.09. As a further test, we finally apply our method to real data, and compare our results with other weak-lensing mass estimates in the literature: for this purpose, we choose the cluster Abell 2219 (z = 0.228), for which multiband (BVRi) data are publicly available.

  19. Individual organisms as units of analysis: Bayesian-clustering alternatives in population genetics.

    PubMed

    Mank, Judith E; Avise, John C

    2004-12-01

    Population genetic analyses traditionally focus on the frequencies of alleles or genotypes in 'populations' that are delimited a priori. However, there are potential drawbacks of amalgamating genetic data into such composite attributes of assemblages of specimens: genetic information on individual specimens is lost or submerged as an inherent part of the analysis. A potential also exists for circular reasoning when a population's initial identification and subsequent genetic characterization are coupled. In principle, these problems are circumvented by some newer methods of population identification and individual assignment based on statistical clustering of specimen genotypes. Here we evaluate a recent method in this genre, Bayesian clustering, using four genotypic data sets involving different types of molecular markers in non-model organisms from nature. As expected, measures of population genetic structure (F_ST and Φ_ST) tended to be significantly greater in Bayesian a posteriori data treatments than in analyses where populations were delimited a priori. In the four biological contexts examined, which involved both geographic population structures and hybrid zones, Bayesian clustering was able to recover differentiated populations, and Bayesian assignments were able to identify likely population sources of specific individuals.

  20. Quasichemical analysis of the cluster-pair approximation for the thermodynamics of proton hydration

    NASA Astrophysics Data System (ADS)

    Pollard, Travis; Beck, Thomas L.

    2014-06-01

    A theoretical analysis of the cluster-pair approximation (CPA) is presented based on the quasichemical theory of solutions. The sought single-ion hydration free energy of the proton includes an interfacial potential contribution by definition. It is shown, however, that the CPA involves an extra-thermodynamic assumption that does not guarantee uniform convergence to a bulk free energy value with increasing cluster size. A numerical test of the CPA is performed using the classical polarizable AMOEBA force field and supporting quantum chemical calculations. The enthalpy and free energy differences are computed for the kosmotropic Na+/F- ion pair in water clusters of size n = 5, 25, 105. Additional calculations are performed for the chaotropic Rb+/I- ion pair. A small shift in the proton hydration free energy and a larger shift in the hydration enthalpy, relative to the CPA values, are predicted based on the n = 105 simulations. The shifts arise from a combination of sequential hydration and interfacial potential effects. The AMOEBA and quantum chemical results suggest an electrochemical surface potential of water in the range -0.4 to -0.5 V. The physical content of single-ion free energies and implications for ion-water force field development are also discussed.

  1. Cluster Analysis of Atmospheric Dynamics and Pollution Transport in a Coastal Area

    NASA Astrophysics Data System (ADS)

    Sokolov, Anton; Dmitriev, Egor; Maksimovich, Elena; Delbarre, Hervé; Augustin, Patrick; Gengembre, Cyril; Fourmentin, Marc; Locoge, Nadine

    2016-06-01

    Summertime atmospheric dynamics in the coastal zone of the industrialized Dunkerque agglomeration in northern France was characterized by a cluster analysis of back trajectories in the context of pollution transport. The MESO-NH atmospheric model was used to simulate the local dynamics at multiple scales with horizontal resolution down to 500 m, and for the online calculation of the Lagrangian backward trajectories with 30-min temporal resolution. Airmass transport was performed along six principal pathways obtained by the weighted k-means clustering technique. Four of these centroids corresponded to a range of wind speeds over the English Channel: two for wind directions from the north-east and two from the south-west. Another pathway corresponded to a south-westerly continental transport. The backward trajectories of the largest and most dispersed sixth cluster contained low wind speeds, including sea-breeze circulations. Based on analyses of meteorological data and pollution measurements, the principal atmospheric pathways were related to local air-contamination events. Continuous air quality and meteorological data were collected during the Benzene-Toluene-Ethylbenzene-Xylene 2006 campaign. The sites of the pollution measurements served as the endpoints for the backward trajectories. Pollutant transport pathways corresponding to the highest air contamination were defined.

  2. Comparison of similarity coefficients used for cluster analysis based on RAPD markers in wild olives.

    PubMed

    Sesli, M; Yegenoglu, E D

    2010-11-16

    Five different similarity coefficients (Jaccard, Sorensen-Dice, simple matching, Rogers and Tanimoto, and Russel and Rao) were evaluated and 10 wild olives analyzed with RAPD markers. The influence of the similarity coefficients on wild olives clustering was investigated. Forty-five primers were used on samples from 10 wild olives (Wild 1 and 2 obtained from Mugla province; Wild 3, 4, 5, 6, 7, and 8 from Manisa province and Wild 9 and 10 from Izmir province of Turkey). The similarity matrices obtained from RAPD markers were compared by the Mantel test. Cluster analysis was made with UPGMA dendrograms, and the consensus fork indexes between all pairs of dendrograms were calculated. The Jaccard and Sorensen-Dice coefficients gave the same results, due to the fact that both exclude negative co-occurrences. The dendrograms using the simple matching and Rogers and Tanimoto coefficients were similar; Wild 4 (Akhisar, Manisa) and Wild 9 (Bornova, Izmir) olives had the closest genetic similarities. This occurred because these coefficients include negative co-occurrences. The Russel and Rao coefficients produced different results, because they include negative co-occurrences in the denominator. We concluded that the coefficients that do not include negative co-occurrences are more efficient for studies of wild olives clustering based on RAPD markers.
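
    The five coefficients compared in the abstract above differ only in how they treat the four cells of a 2 × 2 presence/absence table: a (band present in both samples), b and c (present in one only) and d (absent in both, the "negative co-occurrence"). The sketch below uses the standard textbook definitions; the two band profiles are made up for illustration and are not data from the study.

```python
def binary_counts(x, y):
    """Return (a, b, c, d) for two equal-length 0/1 band profiles."""
    a = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)  # present in both
    b = sum(1 for i, j in zip(x, y) if i == 1 and j == 0)  # only in x
    c = sum(1 for i, j in zip(x, y) if i == 0 and j == 1)  # only in y
    d = sum(1 for i, j in zip(x, y) if i == 0 and j == 0)  # absent in both
    return a, b, c, d


def similarity_coefficients(x, y):
    """Standard definitions; assumes at least one band is present in x or y."""
    a, b, c, d = binary_counts(x, y)
    n = a + b + c + d
    return {
        "jaccard": a / (a + b + c),                          # ignores d
        "sorensen_dice": 2 * a / (2 * a + b + c),            # ignores d
        "simple_matching": (a + d) / n,                      # d counts as a match
        "rogers_tanimoto": (a + d) / (a + d + 2 * (b + c)),  # d counts as a match
        "russel_rao": a / n,                                 # d only in the denominator
    }


# Hypothetical RAPD band profiles (1 = band present, 0 = band absent).
wild_a = [1, 1, 0, 0, 1]
wild_b = [1, 0, 0, 1, 1]
coeffs = similarity_coefficients(wild_a, wild_b)
```

    Jaccard (J) and Sorensen-Dice (D) are tied by D = 2J / (1 + J), an increasing function of J, so the two always rank pairs of samples in the same order; this is consistent with the matching results reported for those two coefficients.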

  3. Quasichemical analysis of the cluster-pair approximation for the thermodynamics of proton hydration

    SciTech Connect

    Pollard, Travis; Beck, Thomas L.

    2014-06-14

    A theoretical analysis of the cluster-pair approximation (CPA) is presented based on the quasichemical theory of solutions. The sought single-ion hydration free energy of the proton includes an interfacial potential contribution by definition. It is shown, however, that the CPA involves an extra-thermodynamic assumption that does not guarantee uniform convergence to a bulk free energy value with increasing cluster size. A numerical test of the CPA is performed using the classical polarizable AMOEBA force field and supporting quantum chemical calculations. The enthalpy and free energy differences are computed for the kosmotropic Na+/F− ion pair in water clusters of size n = 5, 25, 105. Additional calculations are performed for the chaotropic Rb+/I− ion pair. A small shift in the proton hydration free energy and a larger shift in the hydration enthalpy, relative to the CPA values, are predicted based on the n = 105 simulations. The shifts arise from a combination of sequential hydration and interfacial potential effects. The AMOEBA and quantum chemical results suggest an electrochemical surface potential of water in the range −0.4 to −0.5 V. The physical content of single-ion free energies and implications for ion-water force field development are also discussed.

  4. Cluster Analysis of Atmospheric Dynamics and Pollution Transport in a Coastal Area

    NASA Astrophysics Data System (ADS)

    Sokolov, Anton; Dmitriev, Egor; Maksimovich, Elena; Delbarre, Hervé; Augustin, Patrick; Gengembre, Cyril; Fourmentin, Marc; Locoge, Nadine

    2016-11-01

    Summertime atmospheric dynamics in the coastal zone of the industrialized Dunkerque agglomeration in northern France was characterized by a cluster analysis of back trajectories in the context of pollution transport. The MESO-NH atmospheric model was used to simulate the local dynamics at multiple scales with horizontal resolution down to 500 m, and for the online calculation of the Lagrangian backward trajectories with 30-min temporal resolution. Airmass transport was performed along six principal pathways obtained by the weighted k-means clustering technique. Four of these centroids corresponded to a range of wind speeds over the English Channel: two for wind directions from the north-east and two from the south-west. Another pathway corresponded to a south-westerly continental transport. The backward trajectories of the largest and most dispersed sixth cluster contained low wind speeds, including sea-breeze circulations. Based on analyses of meteorological data and pollution measurements, the principal atmospheric pathways were related to local air-contamination events. Continuous air quality and meteorological data were collected during the Benzene-Toluene-Ethylbenzene-Xylene 2006 campaign. The sites of the pollution measurements served as the endpoints for the backward trajectories. Pollutant transport pathways corresponding to the highest air contamination were defined.

  5. The Open Connectome Project Data Cluster: Scalable Analysis and Vision for High-Throughput Neuroscience.

    PubMed

    Burns, Randal; Roncal, William Gray; Kleissas, Dean; Lillaney, Kunal; Manavalan, Priya; Perlman, Eric; Berger, Daniel R; Bock, Davi D; Chung, Kwanghun; Grosenick, Logan; Kasthuri, Narayanan; Weiler, Nicholas C; Deisseroth, Karl; Kazhdan, Michael; Lichtman, Jeff; Reid, R Clay; Smith, Stephen J; Szalay, Alexander S; Vogelstein, Joshua T; Vogelstein, R Jacob

    2013-01-01

    We describe a scalable database cluster for the spatial analysis and annotation of high-throughput brain imaging data, initially for 3-d electron microscopy image stacks, but for time-series and multi-channel data as well. The system was designed primarily for workloads that build connectomes (neural connectivity maps of the brain) using the parallel execution of computer vision algorithms on high-performance compute clusters. These services and open-science data sets are publicly available at openconnecto.me. The system design inherits much from NoSQL scale-out and data-intensive computing architectures. We distribute data to cluster nodes by partitioning a spatial index. We direct I/O to different systems (reads to parallel disk arrays and writes to solid-state storage) to avoid I/O interference and maximize throughput. All programming interfaces are RESTful Web services, which are simple and stateless, improving scalability and usability. We include a performance evaluation of the production system, highlighting the effectiveness of spatial data organization.

  6. Eating Behaviours of British University Students: A Cluster Analysis on a Neglected Issue

    PubMed Central

    Tanton, Jina; Dodd, Lorna J.; Woodfield, Lorayne; Mabhala, Mzwandile

    2015-01-01

    Unhealthy diet is a primary risk factor for noncommunicable diseases. University student populations are known to engage in health risking lifestyle behaviours including risky eating behaviours. The purpose of this study was to examine eating behaviour patterns in a population of British university students using a two-step cluster analysis. Consumption prevalence of snack, convenience, and fast foods in addition to fruit and vegetables was measured using a self-report “Student Eating Behaviours” questionnaire on 345 undergraduate university students. Four clusters were identified: “risky eating behaviours,” “mixed eating behaviours,” “moderate eating behaviours,” and “favourable eating behaviours.” Nineteen percent of students were categorised as having “favourable eating behaviours” whilst just under a third of students were categorised within the two most risky clusters. Riskier eating behaviour patterns were associated with living on campus and Christian faith. The findings of this study highlight the importance of university microenvironments on eating behaviours in university student populations. Religion as a mediator of eating behaviours is a novel finding. PMID:26550495

  7. The Open Connectome Project Data Cluster: Scalable Analysis and Vision for High-Throughput Neuroscience.

    PubMed

    Burns, Randal; Roncal, William Gray; Kleissas, Dean; Lillaney, Kunal; Manavalan, Priya; Perlman, Eric; Berger, Daniel R; Bock, Davi D; Chung, Kwanghun; Grosenick, Logan; Kasthuri, Narayanan; Weiler, Nicholas C; Deisseroth, Karl; Kazhdan, Michael; Lichtman, Jeff; Reid, R Clay; Smith, Stephen J; Szalay, Alexander S; Vogelstein, Joshua T; Vogelstein, R Jacob

    2013-01-01

    We describe a scalable database cluster for the spatial analysis and annotation of high-throughput brain imaging data, initially for 3-d electron microscopy image stacks, but for time-series and multi-channel data as well. The system was designed primarily for workloads that build connectomes (neural connectivity maps of the brain) using the parallel execution of computer vision algorithms on high-performance compute clusters. These services and open-science data sets are publicly available at openconnecto.me. The system design inherits much from NoSQL scale-out and data-intensive computing architectures. We distribute data to cluster nodes by partitioning a spatial index. We direct I/O to different systems (reads to parallel disk arrays and writes to solid-state storage) to avoid I/O interference and maximize throughput. All programming interfaces are RESTful Web services, which are simple and stateless, improving scalability and usability. We include a performance evaluation of the production system, highlighting the effectiveness of spatial data organization. PMID:24401992

  8. The Open Connectome Project Data Cluster: Scalable Analysis and Vision for High-Throughput Neuroscience

    PubMed Central

    Burns, Randal; Roncal, William Gray; Kleissas, Dean; Lillaney, Kunal; Manavalan, Priya; Perlman, Eric; Berger, Daniel R.; Bock, Davi D.; Chung, Kwanghun; Grosenick, Logan; Kasthuri, Narayanan; Weiler, Nicholas C.; Deisseroth, Karl; Kazhdan, Michael; Lichtman, Jeff; Reid, R. Clay; Smith, Stephen J.; Szalay, Alexander S.; Vogelstein, Joshua T.; Vogelstein, R. Jacob

    2013-01-01

    We describe a scalable database cluster for the spatial analysis and annotation of high-throughput brain imaging data, initially for 3-d electron microscopy image stacks, but for time-series and multi-channel data as well. The system was designed primarily for workloads that build connectomes (neural connectivity maps of the brain) using the parallel execution of computer vision algorithms on high-performance compute clusters. These services and open-science data sets are publicly available at openconnecto.me. The system design inherits much from NoSQL scale-out and data-intensive computing architectures. We distribute data to cluster nodes by partitioning a spatial index. We direct I/O to different systems (reads to parallel disk arrays and writes to solid-state storage) to avoid I/O interference and maximize throughput. All programming interfaces are RESTful Web services, which are simple and stateless, improving scalability and usability. We include a performance evaluation of the production system, highlighting the effectiveness of spatial data organization. PMID:24401992

  9. Clustering drug-drug interaction networks with energy model layouts: community analysis and drug repurposing.

    PubMed

    Udrescu, Lucreţia; Sbârcea, Laura; Topîrceanu, Alexandru; Iovanovici, Alexandru; Kurunczi, Ludovic; Bogdan, Paul; Udrescu, Mihai

    2016-01-01

    Analyzing drug-drug interactions may unravel previously unknown drug action patterns, leading to the development of new drug discovery tools. We present a new approach to analyzing drug-drug interaction networks, based on clustering and topological community detection techniques that are specific to complex network science. Our methodology uncovers functional drug categories along with the intricate relationships between them. Using modularity-based and energy-model layout community detection algorithms, we link the network clusters to 9 relevant pharmacological properties. Out of the 1141 drugs from the DrugBank 4.1 database, our extensive literature survey and cross-checking with other databases such as Drugs.com, RxList, and DrugBank 4.3 confirm the predicted properties for 85% of the drugs. As such, we argue that network analysis offers a high-level grasp on a wide area of pharmacological aspects, indicating possible unaccounted interactions and missing pharmacological properties that can lead to drug repositioning for the 15% of drugs which seem to be inconsistent with the predicted property. Also, by using network centralities, we can rank drugs according to their interaction potential for both simple and complex multi-pathology therapies. Moreover, our clustering approach can be extended for applications such as analyzing drug-target interactions or phenotyping patients in personalized medicine applications. PMID:27599720

  10. Eating Behaviours of British University Students: A Cluster Analysis on a Neglected Issue.

    PubMed

    Tanton, Jina; Dodd, Lorna J; Woodfield, Lorayne; Mabhala, Mzwandile

    2015-01-01

    Unhealthy diet is a primary risk factor for noncommunicable diseases. University student populations are known to engage in health risking lifestyle behaviours including risky eating behaviours. The purpose of this study was to examine eating behaviour patterns in a population of British university students using a two-step cluster analysis. Consumption prevalence of snack, convenience, and fast foods in addition to fruit and vegetables was measured using a self-report "Student Eating Behaviours" questionnaire on 345 undergraduate university students. Four clusters were identified: "risky eating behaviours," "mixed eating behaviours," "moderate eating behaviours," and "favourable eating behaviours." Nineteen percent of students were categorised as having "favourable eating behaviours" whilst just under a third of students were categorised within the two most risky clusters. Riskier eating behaviour patterns were associated with living on campus and Christian faith. The findings of this study highlight the importance of university microenvironments on eating behaviours in university student populations. Religion as a mediator of eating behaviours is a novel finding.

  11. Selection of key ambient particulate variables for epidemiological studies - applying cluster and heatmap analyses as tools for data reduction.

    PubMed

    Gu, Jianwei; Pitz, Mike; Breitner, Susanne; Birmili, Wolfram; von Klot, Stephanie; Schneider, Alexandra; Soentgen, Jens; Reller, Armin; Peters, Annette; Cyrys, Josef

    2012-10-01

    The success of epidemiological studies depends on the use of appropriate exposure variables. The purpose of this study is to extract a relatively small selection of variables characterizing ambient particulate matter from a large measurement data set. The original data set comprised a total of 96 particulate matter variables that have been continuously measured since 2004 at an urban background aerosol monitoring site in the city of Augsburg, Germany. Many of the original variables were derived from measured particle size distribution (PSD) across the particle diameter range 3 nm to 10 μm, including size-segregated particle number concentration, particle length concentration, particle surface concentration and particle mass concentration. The data set was complemented by integral aerosol variables. These variables were measured by independent instruments, including black carbon, sulfate, particle active surface concentration and particle length concentration. It is obvious that such a large number of measured variables cannot be used in health effect analyses simultaneously. The aim of this study is a pre-screening and a selection of the key variables that will be used as input in forthcoming epidemiological studies. In this study, we present two methods of parameter selection and apply them to data from a two-year period from 2007 to 2008. We used the agglomerative hierarchical cluster method to find groups of similar variables. In total, we selected 15 key variables from 9 clusters which are recommended for epidemiological analyses. We also applied a two-dimensional visualization technique called "heatmap" analysis to the Spearman correlation matrix. 12 key variables were selected using this method. Moreover, the positive matrix factorization (PMF) method was applied to the PSD data to characterize the possible particle sources. Correlations between the variables and PMF factors were used to interpret the meaning of the cluster and the heatmap analyses.
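
    The variable-grouping step described above can be sketched in a few lines. The example below clusters variables by agglomerative average linkage, using 1 - |Spearman correlation| as the distance between two variables. The abstract does not state the linkage, the distance measure or the stopping rule used in the study, so all three are assumptions here, and the variable names var_a, var_b and var_c with their values are made up for illustration.

```python
from itertools import combinations


def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


def cluster_variables(series, threshold):
    """Merge variables bottom-up until the closest clusters exceed threshold.

    `series` maps a variable name to its measurements. The distance between
    variables is 1 - |rho| (assumed), and clusters are compared by the mean
    of all pairwise distances (average linkage, assumed).
    """
    names = list(series)
    dist = {
        frozenset((p, q)): 1 - abs(spearman(series[p], series[q]))
        for p, q in combinations(names, 2)
    }
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        total = sum(dist[frozenset((p, q))] for p in c1 for q in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        if linkage(clusters[i], clusters[j]) > threshold:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


if __name__ == "__main__":
    data = {  # made-up measurements, not taken from the study
        "var_a": [1, 2, 3, 4, 5],
        "var_b": [2, 3, 5, 7, 11],
        "var_c": [5, 3, 4, 1, 2],
    }
    print(cluster_variables(data, threshold=0.1))
```

    Picking one representative variable per resulting cluster would then give a reduced variable set in the spirit of the selection described in the abstract; how the study chose its representatives is not stated here.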

  12. ANALYSIS OF DETACHED ECLIPSING BINARIES NEAR THE TURNOFF OF THE OPEN CLUSTER NGC 7142

    SciTech Connect

    Sandquist, Eric L.; Serio, Andrew W.; Orosz, Jerome; Shetrone, Matthew

    2013-08-01

    We analyze extensive $BVR_CI_C$ photometry and radial velocity measurements for three double-lined deeply eclipsing binary stars in the field of the old open cluster NGC 7142. The short period ($P = 1.9096825$ days) detached binary V375 Cep is a high probability cluster member, and has a total eclipse of the secondary star. The characteristics of the primary star ($M = 1.288 \pm 0.017\,M_\odot$) at the cluster turnoff indicate an age of 3.6 Gyr (with a random uncertainty of 0.25 Gyr), consistent with earlier analysis of the color-magnitude diagram. The secondary star ($M = 0.871 \pm 0.008\,M_\odot$) is not expected to have evolved significantly, but its radius is more than 10% larger than predicted by models. Because this binary system has a known age, it is useful for testing the idea that radius inflation can occur in short period binaries for stars with significant convective envelopes due to the inhibition of energy transport by magnetic fields. The brighter star in the binary also produces a precision estimate of the distance modulus, independent of reddening estimates: $(m - M)_V = 12.86 \pm 0.07$. The other two eclipsing binary systems are not cluster members, although one of the systems (V2) could only be conclusively ruled out as a present or former member once the stellar characteristics were determined. That binary is within $0.5^\circ$ of edge-on, is a fairly long-period eccentric binary, and contains two almost indistinguishable stars. The other binary (V1) has a small but nonzero eccentricity ($e = 0.038$) in spite of having an orbital period under 5 days.
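
    For readers who want a physical distance from the quoted modulus, the standard relation is d = 10^(((m - M)_V - A_V)/5 + 1) parsecs, where A_V is the V-band extinction. The abstract gives no value for A_V, so the sketch below leaves it as a parameter; the function name distance_pc and the default of zero extinction are assumptions for illustration, not values from the paper.

```python
def distance_pc(apparent_modulus: float, extinction_v: float = 0.0) -> float:
    """Distance in parsecs from an apparent V-band distance modulus.

    apparent_modulus is (m - M)_V in magnitudes; extinction_v is A_V in
    magnitudes and must be supplied by the caller (it is not in the abstract).
    """
    true_modulus = apparent_modulus - extinction_v
    return 10.0 ** (true_modulus / 5.0 + 1.0)


if __name__ == "__main__":
    # With zero extinction this is only an upper bound on the distance.
    print(distance_pc(12.86))
```

    Any positive extinction lowers the true modulus and therefore the distance, so the zero-extinction result should be read as an upper limit.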

  13. Comprehensive curation and analysis of fungal biosynthetic gene clusters of published natural products.

    PubMed

    Li, Yong Fuga; Tsai, Kathleen J S; Harvey, Colin J B; Li, James Jian; Ary, Beatrice E; Berlew, Erin E; Boehman, Brenna L; Findley, David M; Friant, Alexandra G; Gardner, Christopher A; Gould, Michael P; Ha, Jae H; Lilley, Brenna K; McKinstry, Emily L; Nawal, Saadia; Parry, Robert C; Rothchild, Kristina W; Silbert, Samantha D; Tentilucci, Michael D; Thurston, Alana M; Wai, Rebecca B; Yoon, Yongjin; Aiyar, Raeka S; Medema, Marnix H; Hillenmeyer, Maureen E; Charkoudian, Louise K

    2016-04-01

    Microorganisms produce a wide range of natural products (NPs) with clinically and agriculturally relevant biological activities. In bacteria and fungi, genes encoding successive steps in a biosynthetic pathway tend to be clustered on the chromosome as biosynthetic gene clusters (BGCs). Historically, "activity-guided" approaches to NP discovery have focused on bioactivity screening of NPs produced by culturable microbes. In contrast, recent "genome mining" approaches first identify candidate BGCs, express these biosynthetic genes using synthetic biology methods, and finally test for the production of NPs. Fungal genome mining efforts and the exploration of novel sequence and NP space are limited, however, by the lack of a comprehensive catalog of BGCs encoding experimentally-validated products. In this study, we generated a comprehensive reference set of fungal NPs whose biosynthetic gene clusters are described in the published literature. To generate this dataset, we first identified NCBI records that included both a peer-reviewed article and an associated nucleotide record. We filtered these records by text and homology criteria to identify putative NP-related articles and BGCs. Next, we manually curated the resulting articles, chemical structures, and protein sequences. The resulting catalog contains 197 unique NP compounds covering several major classes of fungal NPs, including polyketides, non-ribosomal peptides, terpenoids, and alkaloids. The distribution of articles published per compound shows a bias toward the study of certain popular compounds, such as the aflatoxins. Phylogenetic analysis of biosynthetic genes suggests that much chemical and enzymatic diversity remains to be discovered in fungi. Our catalog was incorporated into the recently launched Minimum Information about Biosynthetic Gene cluster (MIBiG) repository to create the largest known set of fungal BGCs and associated NPs, a resource that we anticipate will guide future genome mining and

  14. Molecular analysis of SCARECROW genes expressed in white lupin cluster roots.

    PubMed

    Sbabou, Laila; Bucciarelli, Bruna; Miller, Susan; Liu, Junqi; Berhada, Fatiha; Filali-Maltouf, Abdelkarim; Allan, Deborah; Vance, Carroll

    2010-03-01

    The Scarecrow (SCR) transcription factor plays a crucial role in root cell radial patterning and is required for maintenance of the quiescent centre and differentiation of the endodermis. In response to phosphorus (P) deficiency, white lupin (Lupinus albus L.) root surface area increases some 50-fold to 70-fold due to the development of cluster (proteoid) roots. Previously it was reported that SCR-like expressed sequence tags (ESTs) were expressed during early cluster root development. Here the cloning of two white lupin SCR genes, LaSCR1 and LaSCR2, is reported. The predicted amino acid sequences of both LaSCR gene products are highly similar to AtSCR and contain C-terminal conserved GRAS family domains. LaSCR1 and LaSCR2 transcript accumulation localized to the endodermis of both normal and cluster roots as shown by in situ hybridization and gene promoter::reporter staining. Transcript analysis as evaluated by quantitative real-time-PCR (qRT-PCR) and RNA gel hybridization indicated that the two LaSCR genes are expressed predominantly in roots. Expression of LaSCR genes was not directly responsive to the P status of the plant but was a function of cluster root development. Suppression of LaSCR1 in transformed roots of lupin and Medicago via RNAi (RNA interference) delivered through Agrobacterium rhizogenes resulted in decreased root numbers, reflecting the potential role of LaSCR1 in maintaining root growth in these species. The results suggest that the functional orthologues of AtSCR have been characterized.

  15. Coxiella burnetii Transcriptional Analysis Reveals Serendipity Clusters of Regulation in Intracellular Bacteria

    PubMed Central

    Leroy, Quentin; Lebrigand, Kevin; Armougom, Fabrice; Barbry, Pascal; Thiéry, Richard; Raoult, Didier

    2010-01-01

    Coxiella burnetii, the causative agent of the zoonotic disease Q fever, is mainly transmitted to humans through an aerosol route. A spore-like form allows C. burnetii to resist different environmental conditions. Because of this, analysis of the survival strategies used by this bacterium to adapt to new environmental conditions is critical for our understanding of C. burnetii pathogenicity. Here, we report the early transcriptional response of C. burnetii under temperature stresses. Our data show that C. burnetii exhibited minor changes in gene regulation under short exposure to heat or cold shock. While small differences were observed, C. burnetii seemed to respond similarly to cold and heat shock. The expression profiles obtained using microarrays produced in-house were confirmed by quantitative RT-PCR. Under temperature stresses, 190 genes were differentially expressed in at least one condition, with a fold change of up to 4. Globally, the differentially expressed genes in C. burnetii were associated with bacterial division, (p)ppGpp synthesis, wall and membrane biogenesis and, especially, lipopolysaccharide and peptidoglycan synthesis. These findings could be associated with growth arrest and witnessed transformation of the bacteria to a spore-like form. Unexpectedly, clusters of neighboring genes were differentially expressed. These clusters do not belong to operons or genetic networks; they have no evident associated functions and are not under the control of the same promoters. We also found undescribed but comparable clusters of regulation in previously reported transcriptomic analyses of intracellular bacteria, including Rickettsia sp. and Listeria monocytogenes. The transcriptomic patterns of C. burnetii observed under temperature stresses permit the recognition of unpredicted clusters of regulation for which the trigger mechanism remains unidentified but which may be the result of a new mechanism of epigenetic regulation. PMID:21203564

  16. Using cluster analysis to identify patterns in students' responses to contextually different conceptual problems

    NASA Astrophysics Data System (ADS)

    Stewart, John; Miller, Mayo; Audo, Christine; Stewart, Gay

    2012-12-01

    This study examined the evolution of student responses to seven contextually different versions of two Force Concept Inventory questions in an introductory physics course at the University of Arkansas. The consistency in answering the closely related questions evolved little over the seven-question exam. A model for the state of student knowledge involving the probability of selecting one of the multiple-choice answers was developed. Criteria for using clustering algorithms to extract model parameters were explored and it was found that the overlap between the probability distributions of the model vectors was an important parameter in characterizing the cluster models. The course data were then clustered and the extracted model showed that students largely fit into two groups both pre- and postinstruction: one that answered all questions correctly with high probability and one that selected the distracter representing the same misconception with high probability. For the course studied, 14% of the students were left with persistent misconceptions post instruction on a static force problem and 30% on a dynamic Newton’s third law problem. These students selected the answer representing the predominant misconception slightly more consistently postinstruction, indicating that the course studied had been ineffective at moving this subgroup of students nearer a Newtonian force concept and had instead moved them slightly farther away from a correct conceptual understanding of these two problems. The consistency in answering pairs of problems with varied physical contexts is shown to be an important supplementary statistic to the score on the problems and suggests that the inclusion of such problem pairs in future conceptual inventories would be efficacious. Multiple, contextually varied questions further probe the structure of students’ knowledge. To allow working instructors to make use of the additional insight gained from cluster analysis, it is our hope that the

  17. Statistical clustering techniques for the analysis of long molecular dynamics trajectories: analysis of 2.2-ns trajectories of YPGDV.

    PubMed

    Karpen, M E; Tobias, D J; Brooks, C L

    1993-01-19

    The microscopic interactions and mechanisms leading to nascent protein folding events are generally unknown. While such short time-scale events are difficult to study experimentally, molecular dynamics simulations of peptides can provide a useful model for studying events related to protein folding initiation. Recently, two extremely long molecular dynamics simulations (2.2 ns each) were carried out on the pentapeptide Tyr-Pro-Gly-Asp-Val [Tobias, D. J., Mertz, J. E., & Brooks, C. L., III (1991) Biochemistry 30, 6054-6058] that forms stable reverse turns in solution. Tobias et al. examined folding events in this large system (approximately 30,000 conformations) using traditional methods of trajectory analysis. The sheer magnitude of this problem prompted us to develop an automated approach, based on self-organizing neural nets, to extract the key features of the molecular dynamics trajectory. The neural net is used to perform conformational clustering, which reduces the complexity of a system while minimizing the loss of information. The conformations were grouped together using distances in dihedral angle space as a measure of conformational similarity. The resulting clusters represent "conformational states", and transitions between these states were examined to identify mechanisms of conformational change. Many conformational changes involved the rotation of only a single dihedral angle, but concerted angle changes were also found. Most of the conformational information in the 30,000 samples from the full trajectories was retained in the relatively few resultant clusters, providing a powerful tool for analysis of an expanding base of large molecular simulations.
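
    A distance "in dihedral angle space" has to respect the periodicity of angles: 179 degrees and -179 degrees are only 2 degrees apart. The sketch below shows one common way to do this, a Euclidean distance over wrapped per-angle differences. The abstract does not give the exact metric used with the neural net, so this formula and the names angle_diff and dihedral_distance are assumptions for illustration.

```python
import math


def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def dihedral_distance(conf_a, conf_b) -> float:
    """Euclidean distance between two conformations in dihedral angle space.

    Each conformation is a sequence of dihedral angles in degrees, listed in
    the same order; differences are wrapped so the metric is periodic.
    """
    return math.sqrt(sum(angle_diff(a, b) ** 2 for a, b in zip(conf_a, conf_b)))


if __name__ == "__main__":
    # Made-up angles: the first dihedral differs only by wrap-around.
    print(dihedral_distance([179.0, -60.0], [-179.0, -60.0]))
```

    A clustering method would then group conformations whose pairwise distances are small; with such a metric, a change in a single dihedral angle shows up as a move along one coordinate only.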

  18. Similarity and Cluster Analysis of Intermediate Deep Events in the Southeastern Aegean

    NASA Astrophysics Data System (ADS)

    Ruscic, Marija; Becker, Dirk; Brüstle, Andrea; Meier, Thomas

    2016-04-01

    In order to gain a better understanding of geodynamic processes in the Hellenic subduction zone (HSZ), in particular in the eastern part of the HSZ, we analyze a cluster of intermediate deep events in the region of Nisyros volcano. The cluster recorded during the deployment of the temporary seismic network EGELADOS consists of 159 events at 80 to 200 km depth with local magnitudes ranging from magnitude 0.2 to magnitude 4.1. The network itself consisted of 56 onshore and 23 offshore broadband stations complemented by 19 permanent stations from NOA, GEOFON and MedNet. It was deployed from September 2005 to March 2007 and it covered the entire HSZ. Here, both spatial and temporal clustering of the recorded events is studied by using three-component similarity analysis. The waveform cross-correlation was performed for all event combinations using data recorded on 45 onshore stations. The results are shown as a function of frequency for individual stations and as averaged values over the network. The cross-correlation coefficients at the single stations show a decreasing similarity with increasing epicentral distance as well as the effect of local heterogeneities at particular stations, causing noticeable differences in waveform similarities. Event relocation was performed by using the double-difference earthquake relocation software HypoDD and the results are compared with previously obtained single event locations which were calculated using the nonlinear location tool NonLinLoc and station corrections. For the relocation, both differential travel times obtained by separate cross-correlation of P- and S-waveforms and manual readings of onset times are used. It is shown that after the relocation the inter-event distance for highly similar events has been reduced. By comparing the results of the cluster analysis with results obtained from the synthetic catalogs, where the event rate, portion and occurrence time of the aftershocks are varied, it is shown that the event

  19. MC2: Dynamical Analysis of the Merging Galaxy Cluster MACS J1149.5+2223

    NASA Astrophysics Data System (ADS)

    Golovich, Nathan; Dawson, William A.; Wittman, David; Ogrean, Georgiana; van Weeren, Reinout; Bonafede, Annalisa

    2016-11-01

    We present an analysis of the merging cluster MACS J1149.5+2223 using archival imaging from Subaru/Suprime-Cam and multi-object spectroscopy from Keck/DEIMOS and Gemini/GMOS. We employ two- and three-dimensional substructure tests and determine that MACS J1149.5+2223 is composed of two separate mergers among three subclusters occurring ∼1 Gyr apart. The primary merger gives rise to elongated X-ray morphology and a radio relic in the southeast. The brightest cluster galaxy is a member of the northern subcluster of the primary merger. This subcluster is very massive ($16.7^{+1.25}_{-1.60} \times 10^{14}\,M_\odot$). The southern subcluster is also very massive ($10.8^{+3.37}_{-3.54} \times 10^{14}\,M_\odot$), yet it lacks an associated X-ray surface brightness peak, and it has been unidentified previously despite the detailed study of this Frontier Field cluster. A secondary merger is occurring in the north along the line of sight (LOS) with a third, less massive subcluster ($1.20^{+0.19}_{-0.34} \times 10^{14}\,M_\odot$). We perform a Monte Carlo dynamical analysis on the main merger and estimate a collision speed at pericenter of $2770^{+610}_{-310}$ km s$^{-1}$. We show the merger to be returning from apocenter with core passage occurring $1.16^{+0.50}_{-0.25}$ Gyr before the observed state. We identify the LOS merging subcluster in a strong lensing analysis in the literature and show that it is likely bound to MACS J1149 despite having reached an extreme collision velocity of ∼4000 km s$^{-1}$.

  20. Discovery of exacerbating cases in chronic hepatitis based on cluster analysis of time-series platelet count data

    NASA Astrophysics Data System (ADS)

    Hirano, Shoji; Tsumoto, Shusaku

    2007-04-01

    This paper reports the results of temporal analysis of platelet (PLT) data in a chronic hepatitis dataset. First, we briefly introduce a cluster analysis system for temporal data that we have developed. Second, we show the results of cluster analysis of PLT sequences. Third, we show the results of PLT value-based temporal analysis aimed at finding the years needed to reach F4, the years elapsed between stages, and their relationships with virus types and fibrotic stages. The results of cluster analysis indicate that the temporal courses of PLT can be grouped into several patterns, each of which presents similarity in average PLT level and increase/decrease trends. The results of value-based analysis suggest that liver fibrosis may proceed faster in the exacerbating cases.

  1. [Achene morphology cluster analysis of Taraxacum F. H. Wigg. from northeast China and molecule systematics evidence determined by SRAP].

    PubMed

    Li, Hai-juan; Zhao, Xin; Jia, Qing-fei; Li, Tian-lai; Ning, Wei

    2012-08-01

    The morphological and micro-morphological characteristics of the achenes of six species of the genus Taraxacum from northeastern China were examined, together with an SRAP cluster analysis, as evidence for their classification. The achenes were observed by microscope and EPMA. Cluster analysis was based on the size, shape, cone proportion, color and surface sculpture of the achenes. Inter-species differences in achene morphology within Taraxacum are obvious, particularly in spinule distribution and size, achene color and achene size. According to the achene morphology clustering, T. antungense Kitag. and T. urbanum Kitag. should be combined into a single species. The clustering results from achene morphology and from SRAP molecular marker systematics are consistent with the classification groups in "the Chinese flora". Achene morphological characteristics in Taraxacum are stable and conservative, so inter-species division and kinship analysis can be carried out according to differences in combinations of achene characteristics; both the achene morphology cluster analysis and the SRAP molecular marker systematics support the dandelion classification of "the Chinese flora".

  2. PepServe: a web server for peptide analysis, clustering and visualization

    PubMed Central

    Alexandridou, Anastasia; Dovrolis, Nikolas; Tsangaris, George Th.; Nikita, Konstantina; Spyrou, George

    2011-01-01

    Peptides, either as protein fragments or as naturally occurring entities, are characterized by their sequence and function features. Researchers often need to manage large peptide lists concerning protein identification, biomarker discovery, bioactivity, immune response or other functionalities. We present a web server that manages peptide lists in terms of feature analysis as well as interactive clustering and visualization of the given peptides. PepServe is a useful tool for understanding the distribution of peptide features among a group of peptides. The PepServe web application is freely available at http://bioserver-1.bioacademy.gr/Bioserver/PepServe/. PMID:21572105

  3. Topology analysis of emerging bipole clusters producing violent solar events observed by SDO

    NASA Astrophysics Data System (ADS)

    Schmieder, Brigitte; Demoulin, Pascal; Mandrini, Cristina H.; Guo, Yang

    2012-07-01

    During the rising phase of Solar Cycle 24, tremendous magnetic solar activity occurs on the Sun with fast and compact emergence of magnetic flux leading to bursts of flares (C to M and even X class). We have investigated the violent events occurring in the cluster of two active regions, AR 11121 and AR 11123, observed in November by SDO. In less than two days the magnetic field increases by a factor of 10 with the emergence of groups of bipoles. A topology analysis demonstrates the formation of multiple separatrices and quasi-separatrix layers explaining possible mechanisms for destabilization of the magnetic structures such as filaments and coronal loops.

  4. CLASH: Weak-lensing shear-and-magnification analysis of 20 galaxy clusters

    SciTech Connect

    Umetsu, Keiichi; Czakon, Nicole; Medezinski, Elinor; Lemze, Doron; Ford, Holland; Nonino, Mario; Balestra, Italo; Biviano, Andrea; Merten, Julian; Postman, Marc; Koekemoer, Anton; Meneghetti, Massimo; Donahue, Megan; Molino, Alberto; Benítez, Narciso; Seitz, Stella; Gruen, Daniel; Broadhurst, Tom; Grillo, Claudio; Melchior, Peter; and others

    2014-11-10

    We present a joint shear-and-magnification weak-lensing analysis of a sample of 16 X-ray-regular and 4 high-magnification galaxy clusters at 0.19 ≲ z ≲ 0.69 selected from the Cluster Lensing And Supernova survey with Hubble (CLASH). Our analysis uses wide-field multi-color imaging, taken primarily with Suprime-Cam on the Subaru Telescope. From a stacked-shear-only analysis of the X-ray-selected subsample, we detect the ensemble-averaged lensing signal with a total signal-to-noise ratio of ≅ 25 in the radial range of 200-3500 kpc $h^{-1}$, providing integrated constraints on the halo profile shape and concentration-mass relation. The stacked tangential-shear signal is well described by a family of standard density profiles predicted for dark-matter-dominated halos in gravitational equilibrium, namely, the Navarro-Frenk-White (NFW), truncated variants of NFW, and Einasto models. For the NFW model, we measure a mean concentration of $c_{200c} = 4.01^{+0.35}_{-0.32}$ at an effective halo mass of $M_{200c} = 1.34^{+0.10}_{-0.09} \times 10^{15}\,M_\odot$. We show that this is in excellent agreement with Λ cold dark matter (ΛCDM) predictions when the CLASH X-ray selection function and projection effects are taken into account. The best-fit Einasto shape parameter is $\alpha_E = 0.191^{+0.071}_{-0.068}$, which is consistent with the NFW-equivalent Einasto parameter of ∼0.18. We reconstruct projected mass density profiles of all CLASH clusters from a joint likelihood analysis of shear-and-magnification data and measure cluster masses at several characteristic radii assuming an NFW density profile. We also derive an ensemble-averaged total projected mass profile of the X-ray-selected subsample by stacking their individual mass profiles. The stacked total mass profile, constrained by the shear+magnification data, is shown to be consistent with our shear-based halo-model predictions, including the effects of surrounding large-scale structure as

  5. Regression Models for Demand Reduction based on Cluster Analysis of Load Profiles

    SciTech Connect

    Yamaguchi, Nobuyuki; Han, Junqiao; Ghatikar, Girish; Piette, Mary Ann; Asano, Hiroshi; Kiliccote, Sila

    2009-06-28

    This paper provides new regression models for the demand reduction achieved by Demand Response programs, intended for ex ante evaluation of the programs and for screening customers for recruitment into them. The proposed regression models use load sensitivity to outside air temperature and a representative load pattern derived from cluster analysis of customer baseline load as explanatory variables. The models' performance was examined in terms of the validity of the explanatory variables and the goodness of fit of the regressions, using actual load profile data from Pacific Gas and Electric Company's commercial and industrial customers who participated in the 2008 Critical Peak Pricing program, including Manual and Automated Demand Response.

  6. Portraying persons who inject drugs recently infected with hepatitis C accessing antiviral treatment: a cluster analysis.

    PubMed

    Bamvita, Jean-Marie; Roy, Elise; Zang, Geng; Jutras-Aswad, Didier; Artenie, Andreea Adelina; Levesque, Annie; Bruneau, Julie

    2014-01-01

    Objectives. To empirically determine a categorization of people who inject drugs (PWIDs) recently infected with hepatitis C virus (HCV), in order to identify profiles most likely associated with early HCV treatment uptake. Methods. The study population was composed of HIV-negative PWIDs with a documented recent HCV infection. Eligibility criteria included being 18 years old or over, and having injected drugs in the 6 months preceding the estimated date of HCV exposure. Participant classification was carried out using a TwoStep cluster analysis. Results. From September 2007 to December 2011, 76 participants were included in the study. 60 participants were eligible for HCV treatment. Twenty-one participants initiated HCV treatment. The cluster analysis yielded 4 classes: class 1: Lukewarm health seekers dismissing HCV treatment offer; class 2: multisubstance users willing to shake off the hell; class 3: PWIDs unlinked to health service use; class 4: health seeker PWIDs willing to reverse the fate. Conclusion. Profiles generated by our analysis suggest that prior health care utilization, a key element for treatment uptake, differs between older and younger PWIDs. Such profiles could inform the development of targeted strategies to improve health outcomes and reduce HCV infection among PWIDs. PMID:25349730

  7. Investigating properties of a set of variable AGN with cluster analysis

    NASA Astrophysics Data System (ADS)

    Nair, A. D.

    1997-05-01

    Optical and gamma-ray properties of a sample of active galactic nuclei monitored at the Rosemary Hill Observatory are analysed using cluster analysis. Cluster analysis can be used to analyse large amounts of data with many variables and investigate linear or non-linear relationships in the data. It is found that the time-scale of variation is not related to the amplitude of variability. For BL Lacs and optically violent variable (OVV) quasars the variability is proportional to the redshift and absolute magnitude, but this is not true for quasars in this sample. The analysis shows that gamma-ray-loud AGN tend to be associated with superluminal sources with OVV-like characteristics. The gamma-ray fluxes, for both OVV quasars and BL Lacs, are proportional to the apparent transverse velocity, and this may point to beaming as the dominant cause for the gamma-ray flux. A large majority of the OVV quasars that display a large amplitude of variability are gamma-ray-loud, but this is not true for BL Lacs.

  8. Performance analysis of the Alliant FX/8 multiprocessor using statistical clustering

    NASA Technical Reports Server (NTRS)

    Dimpsey, Robert Tod

    1988-01-01

    Results for two distinct, real, scientific workloads executed on an Alliant FX/8 are discussed. A combination of user concurrency and system overhead measurements was taken for both workloads. Preliminary analysis shows that the first sampled workload is comprised of consistently high user concurrency, low system overhead, and little paging. The second sample has much less user concurrency, but significant paging and system overhead. Statistical cluster analysis is used to extract a state transition model to jointly characterize user concurrency and system overhead. A skewness factor is introduced and used to bring out the effects of unbalanced clustering when determining states with important transitions. The results from the models show that during the collection of the first sample, the system was operating in states of high user concurrency approximately 75 percent of the time. The second workload sample shows the system in high user concurrency states only 26 percent of the time. In addition, it is ascertained that high system overhead is usually accompanied by low user concurrency. The analysis also shows a high predictability of system behavior for both workloads.
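
    As a rough illustration of how a state transition model can be read off clustered measurements, the sketch below takes a time-ordered sequence of cluster labels and estimates state occupancy fractions and transition probabilities. The function name, the labels and the toy sequence are hypothetical; the abstract does not give the author's actual procedure or the skewness factor, so neither is reproduced here.

```python
from collections import Counter, defaultdict


def state_transition_model(states):
    """Estimate occupancy fractions and transition probabilities from a
    time-ordered sequence of cluster (state) labels."""
    if len(states) < 2:
        raise ValueError("need at least two samples")
    # Fraction of samples spent in each state.
    occupancy = {s: c / len(states) for s, c in Counter(states).items()}
    # Count transitions between consecutive samples.
    counts = defaultdict(Counter)
    for current, following in zip(states, states[1:]):
        counts[current][following] += 1
    # Normalise each row of counts into probabilities.
    transitions = {
        current: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for current, row in counts.items()
    }
    return occupancy, transitions


# Hypothetical labels: "high" / "low" user-concurrency states per sample.
labels = ["high", "high", "high", "low", "high", "high", "low", "low"]
occupancy, transitions = state_transition_model(labels)
print(occupancy)
print(transitions)
```

    In this toy sequence the occupancy of a state plays the role of the "percent of the time" figures quoted in the abstract, while the rows of the transition table indicate how predictable the next state is.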

  9. Analysis of equatorial noise using data from the Cluster and Themis missions

    NASA Astrophysics Data System (ADS)

    Hrbackova, Zuzana; Santolik, Ondrej; Pickett, Jolene; Gurnett, Donald; Cornilleau-Wehrlin, Nicole; Lecontel, Olivier; Krupar, Vratislav

    We report the results of the analysis of equatorial noise (EN) using data from the Cluster and Themis spacecraft missions. EN is an intense electromagnetic wave emission that propagates close to the geomagnetic equator between the local proton cyclotron and local lower hybrid frequencies. Recent studies have shown that these waves might play a significant role in the acceleration of electrons to relativistic energies in the outer Van Allen radiation belt. The orbit of the Cluster mission has changed over the last two years, providing us with a larger and statistically more meaningful database from which to carry out our study of EN occurrence. We use onboard analyzed data from the STAFF-SA instrument and high-time resolution waveform data from the WBD instrument collected between 2002 and 2009. We present the results obtained by a systematic analysis of the fine spectral structures of the EN emissions observed by WBD. The frequencies of emission peaks have been visually selected from high-resolution spectrograms. We show histograms of the positions of the source regions of EN. The five spacecraft of the Themis mission have search coil magnetometers onboard which measure wave fluctuations in the frequency bandwidth from 0.1 Hz to 4 kHz. We present the results of the fine spectral analysis of these measurements.

  10. An empirical comparison of several clustered data approaches under confounding due to cluster effects in the analysis of complications of coronary angioplasty.

    PubMed

    Berlin, J A; Kimmel, S E; Ten Have, T R; Sammel, M D

    1999-06-01

    In the analysis of binary response data from many types of large studies, the data are likely to have arisen from multiple centers, resulting in a within-center correlation for the response. Such correlation, or clustering, occurs when outcomes within centers tend to be more similar to each other than to outcomes in other centers. In studies where there is also variability among centers with respect to the exposure of interest, analysis of the exposure-outcome association may be confounded, even after accounting for within-center correlations. We apply several analytic methods to compare the risk of major complications associated with two strategies, staged and combined procedures, for performing percutaneous transluminal coronary angioplasty (PTCA), a mechanical means of relieving blockage of blood vessels due to atherosclerosis. Combined procedures are used in some centers as a cost-cutting strategy. We performed a number of population-averaged and cluster-specific (conditional) analyses, which (a) make no adjustments for center effects of any kind; (b) make adjustments for the effect of center on only the response; or (c) make adjustments for both the effect of center on the response and the relationship between center and exposure. The method used for this third approach decomposes the procedure type variable into within-center and among-center components, resulting in two odds ratio estimates. The naive analysis, ignoring clusters, gave a highly significant effect of procedure type (OR = 1.6). Population average models gave marginally to very nonsignificant estimates of the OR for treatment type ranging from 1.6 to 1.2 with adjustment only for the effect of centers on response. These results depended on the assumed correlation structure. Conditional (cluster-specific) models and other methods that decomposed the treatment type variable into among- and within-center components all found no within-center effect of procedure type (OR = 1.02, consistently) and a

  11. Are transient environmental agents involved in the cause of primary biliary cirrhosis? Evidence from space-time clustering analysis.

    PubMed

    McNally, Richard J Q; Ducker, Samantha; James, Oliver F W

    2009-10-01

    The cause of primary biliary cirrhosis (PBC) is unclear. Both genetic and environmental factors are likely to contribute. Some studies have suggested that one or more infectious agents may be involved. To examine whether infections may contribute to the cause of PBC, we have analyzed for space-time clustering using population-based data from northeast England over a defined period (1987-2003). Space-time clustering is observed when excess cases of a disease are found within limited geographical areas at limited periods of time. If present, it is suggestive of the involvement of one or more environmental components in the cause of a disease and is especially supportive of infections. A second-order procedure based on K-functions was used to test for global space-time clustering using residential addresses at the time of diagnosis. The Knox method determined the spatiotemporal range over which global clustering was strongest. K-function tests were repeated using nearest neighbor thresholds to adjust for variations in population density. Individual space-time clusters were identified using Kulldorff's scan statistic. Analysis of 1015 cases showed highly statistically significant space-time clustering (P < 0.001). Clustering was most marked for cases diagnosed within 1-4 months of one another. A number of specific space-time clusters were identified. In conclusion, these novel results suggest that transient environmental agents may play a role in the cause of PBC.
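
    The idea of space-time clustering, excess pairs of cases that are close in both space and time, can be sketched with a simple Knox-type count and a permutation test. This is only an illustration: the function names, distance and time thresholds, and the toy coordinates are assumptions, and the sketch does not reproduce the K-function procedure or Kulldorff's scan statistic used in the study.

```python
import math
import random
from itertools import combinations


def knox_statistic(points, times, d_max, t_max):
    """Count case pairs that are close in both space and time."""
    count = 0
    for i, j in combinations(range(len(points)), 2):
        close_in_space = math.dist(points[i], points[j]) <= d_max
        close_in_time = abs(times[i] - times[j]) <= t_max
        if close_in_space and close_in_time:
            count += 1
    return count


def knox_permutation_test(points, times, d_max, t_max, n_perm=999, seed=0):
    """Permutation p-value: shuffle the times over the fixed locations."""
    rng = random.Random(seed)
    observed = knox_statistic(points, times, d_max, t_max)
    shuffled = list(times)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if knox_statistic(points, shuffled, d_max, t_max) >= observed:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (n_perm + 1)


# Made-up data: (x, y) in km and diagnosis time in months.
points = [(0, 0), (0.1, 0), (5, 5), (5.1, 5), (10, 0), (10, 0.1)]
times = [1, 2, 30, 31, 60, 61]
observed, p_value = knox_permutation_test(points, times, d_max=1.0, t_max=4)
print(observed, p_value)
```

    Repeating the count over a grid of `d_max` and `t_max` values is one way to look for the spatiotemporal range over which clustering is strongest.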

  12. Improved initialisation of model-based clustering using Gaussian hierarchical partitions

    PubMed Central

    Scrucca, Luca; Raftery, Adrian E.

    2015-01-01

    Initialisation of the EM algorithm in model-based clustering is often crucial. Various starting points in the parameter space often lead to different local maxima of the likelihood function and so to different clustering partitions. Among the several approaches available in the literature, model-based agglomerative hierarchical clustering is used to provide initial partitions in the popular mclust R package. This choice is computationally convenient and often yields good clustering partitions. However, in certain circumstances, poor initial partitions may cause the EM algorithm to converge to a local maximum of the likelihood function. We propose several simple and fast refinements based on data transformations and illustrate them through data examples. PMID:26949421

  13. A 2163: Merger events in the hottest Abell galaxy cluster. I. Dynamical analysis from optical data

    NASA Astrophysics Data System (ADS)

    Maurogordato, S.; Cappi, A.; Ferrari, C.; Benoist, C.; Mars, G.; Soucail, G.; Arnaud, M.; Pratt, G. W.; Bourdin, H.; Sauvageot, J.-L.

    2008-04-01

    Context: A 2163 is among the richest and most distant Abell clusters, presenting outstanding properties in different wavelength domains. X-ray observations have revealed a distorted gas morphology and strong features have been detected in the temperature map, suggesting that merging processes are important in this cluster. However, the merging scenario is not yet well-defined. Aims: We have undertaken a complementary optical analysis, aiming to understand the dynamics of the system, to constrain the merging scenario and to test its effect on the properties of galaxies. Methods: We present a detailed optical analysis of A 2163 based on new multicolor wide-field imaging and medium-to-high resolution spectroscopy of several hundred galaxies. Results: The projected galaxy density distribution shows strong subclustering with two dominant structures: a main central component (A), and a northern component (B), visible both in optical and in X-ray, with two other substructures detected at high significance in the optical. At magnitudes fainter than R = 19, the galaxy distribution shows a clear elongation approximately with the east-west axis extending over 4 $h_{70}^{-1}$ Mpc, while a nearly perpendicular bridge of galaxies along the north-south axis appears to connect (B) to (A). The (A) component shows a bimodal morphology, and the positions of its two density peaks depend on galaxy luminosity: at magnitudes fainter than R = 19, the axis joining the peaks shows a counterclockwise rotation (from NE/SW to E-W) centered on the position of the X-ray maximum. Our final spectroscopic catalog of 512 objects includes 476 new galaxy redshifts. We have identified 361 galaxies as cluster members; among them, 326 have high precision redshift measurements, which allow us to perform a detailed dynamical analysis of unprecedented accuracy. The cluster mean redshift and velocity dispersion are respectively z = 0.2005 ± 0.0003 and 1434 ± 60 km s$^{-1}$. We spectroscopically confirm that the northern

  14. Automated classification of mouse pup isolation syllables: from cluster analysis to an Excel-based "mouse pup syllable classification calculator".

    PubMed

    Grimsley, Jasmine M S; Gadziola, Marie A; Wenstrup, Jeffrey J

    2012-01-01

    Mouse pups vocalize at high rates when they are cold or isolated from the nest. The proportions of each syllable type produced carry information about disease state and are being used as behavioral markers for the internal state of animals. Manual classifications of these vocalizations identified 10 syllable types based on their spectro-temporal features. However, manual classification of mouse syllables is time consuming and vulnerable to experimenter bias. This study uses an automated cluster analysis to identify acoustically distinct syllable types produced by CBA/CaJ mouse pups, and then compares the results to prior manual classification methods. The cluster analysis identified two syllable types, based on their frequency bands, that have continuous frequency-time structure, and two syllable types featuring abrupt frequency transitions. Although cluster analysis computed fewer syllable types than manual classification, the clusters represented well the probability distributions of the acoustic features within syllables. These probability distributions indicate that some of the manually classified syllable types are not statistically distinct. The characteristics of the four classified clusters were used to generate a Microsoft Excel-based mouse syllable classifier that rapidly categorizes syllables, with over a 90% match, into the syllable types determined by cluster analysis.

  15. Analysis of Massive Emigration from Poland: The Model-Based Clustering Approach

    NASA Astrophysics Data System (ADS)

    Witek, Ewa

    The model-based approach assumes that data is generated by a finite mixture of probability distributions such as multivariate normal distributions. In finite mixture models, each component probability distribution corresponds to a cluster. The problem of determining the number of clusters and choosing an appropriate clustering method becomes the problem of statistical model choice. Hence, the model-based approach provides a key advantage over heuristic clustering algorithms, because it selects both the correct model and the number of clusters.
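
    The finite mixture assumption can be written compactly. The block below is a generic sketch with $G$ multivariate normal components, mixing proportions $\pi_k$, and component densities $\phi$. The use of the Bayesian information criterion (BIC) for model choice is an assumption added for illustration, since it is a common criterion in model-based clustering; the abstract does not say which criterion the author uses. Here $\hat{L}_{M,G}$ is the maximised likelihood of covariance model $M$ with $G$ components, $p_{M,G}$ its number of free parameters, and $n$ the number of observations.

```latex
f(x_i \mid \Theta) = \sum_{k=1}^{G} \pi_k \, \phi(x_i \mid \mu_k, \Sigma_k),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{G} \pi_k = 1

\mathrm{BIC}_{M,G} = 2 \log \hat{L}_{M,G} - p_{M,G} \log n
```

    With this sign convention the model and number of components with the largest BIC are preferred, which is how the choice of clustering method and of the number of clusters reduces to a single model comparison.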

  16. Statistical parametric mapping and cluster counting analysis of [18F] FDG-PET imaging in traumatic brain injury.

    PubMed

    Zhang, Jing; Mitsis, Effie M; Chu, Kingwai; Newmark, Randall E; Hazlett, Erin A; Buchsbaum, Monte S

    2010-01-01

    In this study we investigated regional cerebral glucose metabolism abnormalities of [(18)F] fluorodeoxyglucose (FDG) positron emission tomography (PET) imaging in traumatic brain injury (TBI). PET images of 81 TBI patients and 68 normal controls were acquired and a word list learning task was administered during the uptake period. The TBI group included 35 patients with positive structural imaging (CT or MRI) findings soon after injury, 40 patients with negative findings, and 6 cases without structural imaging. Statistical parametric mapping (SPM) analysis was applied with several levels of spatial smoothing. Cluster counting analysis was performed for each subject to identify abnormal clusters with contiguous voxel values that deviated by two standard deviations or more from the mean of the normal controls, and to count the number of clusters in 10 size categories. SPM maps demonstrated that the 81 patients had significantly lower FDG uptake than normal controls, widely across the cortex (including bilateral frontal and temporal regions), and in the thalamus. Cluster counting results indicated that TBI patients had a higher proportion of larger clusters than controls. These large low-FDG-uptake clusters of the TBI patients were closer to the brain edge than those of controls. These results suggest that deficits of cerebral metabolism in TBI are spread over multiple brain areas, that they are closer to the cortical surface than clusters in controls, and that group spatial patterns of abnormal cerebral metabolism may be similar in TBI patients with cognitive deficits with and without obvious acute abnormalities identified on structural imaging.
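
    The cluster counting step, finding contiguous groups of voxels that deviate by two standard deviations or more from the control mean, can be sketched as a flood fill over a thresholded volume. Everything concrete here is an assumption for illustration: the dictionary-of-voxels representation, the 6-neighbour connectivity, the function name and the toy values. The ten size categories mentioned in the abstract are not reproduced because their bin edges are not given.

```python
from collections import deque

# Face-sharing (6-connected) neighbours in a 3-D voxel grid; an assumption.
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def abnormal_cluster_sizes(subject, control_mean, control_sd, z_cut=2.0):
    """Return sizes of contiguous clusters of voxels whose value deviates
    from the control mean by at least z_cut control standard deviations."""
    abnormal = {
        voxel
        for voxel, value in subject.items()
        if abs(value - control_mean[voxel]) >= z_cut * control_sd[voxel]
    }
    sizes = []
    seen = set()
    for start in abnormal:
        if start in seen:
            continue
        # Breadth-first flood fill over abnormal neighbours.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            x, y, z = queue.popleft()
            size += 1
            for dx, dy, dz in NEIGHBOURS:
                neighbour = (x + dx, y + dy, z + dz)
                if neighbour in abnormal and neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        sizes.append(size)
    return sorted(sizes, reverse=True)


# Toy 4 x 4 x 1 volume with made-up uptake values.
voxels = [(x, y, 0) for x in range(4) for y in range(4)]
control_mean = {v: 100.0 for v in voxels}
control_sd = {v: 5.0 for v in voxels}
subject = {v: 100.0 for v in voxels}
for v in [(0, 0, 0), (1, 0, 0), (1, 1, 0)]:
    subject[v] = 85.0  # low-uptake patch of three contiguous voxels
subject[(3, 3, 0)] = 115.0  # isolated high-uptake voxel
sizes = abnormal_cluster_sizes(subject, control_mean, control_sd)
print(sizes)
```

    Binning the returned sizes into categories per subject would then give the per-subject cluster counts that are compared between patients and controls.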

  17. Cluster analysis of European Y-chromosomal STR haplotypes using the discrete Laplace method.

    PubMed

    Andersen, Mikkel Meyer; Eriksen, Poul Svante; Morling, Niels

    2014-07-01

    The European Y-chromosomal short tandem repeat (STR) haplotype distribution has previously been analysed in various ways. Here, we introduce a new way of analysing population substructure using a new method based on clustering within the discrete Laplace exponential family that models the probability distribution of the Y-STR haplotypes. Creating a consistent statistical model of the haplotypes enables us to perform a wide range of analyses. Previously, haplotype frequency estimation using the discrete Laplace method has been validated. In this paper we investigate how the discrete Laplace method can be used for cluster analysis to further validate the discrete Laplace method. A very important practical fact is that the calculations can be performed on a normal computer. We identified two sub-clusters of the Eastern and Western European Y-STR haplotypes similar to results of previous studies. We also compared pairwise distances (between geographically separated samples) with those obtained using the AMOVA method and found good agreement. Further analyses that are impossible with AMOVA were made using the discrete Laplace method: analysis of the homogeneity in two different ways and calculating marginal STR distributions. We found that the Y-STR haplotypes from e.g. Finland were relatively homogeneous as opposed to the relatively heterogeneous Y-STR haplotypes from e.g. Lublin, Eastern Poland and Berlin, Germany. We demonstrated that the observed distributions of alleles at each locus were similar to the expected ones. We also compared pairwise distances between geographically separated samples from Africa with those obtained using the AMOVA method and found good agreement.

  18. Similarity and Cluster Analysis of Intermediate Deep Events in the Southeastern Aegean

    NASA Astrophysics Data System (ADS)

    Ruscic, M.; Meier, T. M.; Becker, D.; Brüstle, A.

    2015-12-01

    In order to gain a better understanding of geodynamic processes in the Hellenic subduction zone (HSZ), in particular in the eastern part of the HSZ, we analyze a cluster of intermediate deep events in the region of Nisyros volcano. The events were recorded by the temporary seismic network EGELADOS deployed from September 2005 to March 2007. The network covered the entire Hellenic subduction zone and it consisted of 23 offshore and 56 onshore broadband stations complemented by 19 permanent stations from NOA, GEOFON and MedNet. The cluster of intermediate deep seismicity consists of 159 events with local magnitudes ranging from 0.2 to 4.1 at depths from 80 to 200 km. The events occur close to the top of the slab in a zone about 30 km thick. The spatio-temporal clustering is studied using three-component similarity analysis. Single-event locations obtained using the nonlinear location tool NonLinLoc are compared to relative relocations calculated using the double-difference earthquake relocation software HypoDD. The relocation is performed with both manual readings of onset times as well as with differential traveltimes obtained by separate cross-correlation of P- and S-waveforms. The three-component waveform cross-correlation was performed for all the events using data from 45 stations. The results of the similarity analysis are shown as a function of frequency for individual stations and averaged over the network. Average similarities between waveforms of all event pairs reveal a low number of highly similar events but a large number of moderate similarities. Interestingly, the single station similarities between the event pairs show (1) in general decreasing similarity with increasing epicentral distance, (2) reduced similarities for paths crossing boundaries of slab segments, and (3) the influence of strong local heterogeneity leading to a considerable reduction of waveform similarities e.g. in the center of the Santorini volcano.

  19. Unsupervised change detection in satellite images using fuzzy c-means clustering and principal component analysis

    NASA Astrophysics Data System (ADS)

    Kesikoğlu, M. H.; Atasever, Ü. H.; Özkan, C.

    2013-10-01

    Change detection is the process of identifying changes in nature or in the state of an object from observations made at different times, or of quantifying temporal effects using multitemporal data sets. Many change detection techniques are described in the literature, and they can be grouped under two main headings: supervised and unsupervised change detection. The aim of this study is to identify land cover changes in a specific area of Kayseri with unsupervised change detection techniques, using remotely sensed Landsat satellite images from different years. First, image-to-image registration is completed so that the images are referenced to each other. Image differencing is then applied to the images following image enhancement. The resulting gray-scale difference image is partitioned into 3 × 3 non-overlapping blocks. Principal Component Analysis is applied to obtain the eigenvector space, from which the principal components are derived. Finally, the feature vector space consisting of the principal components is partitioned into two clusters, changed and unchanged areas, using Fuzzy C-Means Clustering, which completes the change detection process.
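
    The last step of the pipeline, splitting features into two clusters with fuzzy c-means, can be sketched on a simplified input. The sketch below assumes scalar features (for example, one difference magnitude per pixel) instead of the principal-component feature vectors used in the study, and it skips registration, enhancement, blocking and PCA entirely. The function names, the fuzzifier `m = 2`, the initialisation and the toy values are all assumptions.

```python
def _memberships(values, centres, m):
    """Fuzzy membership of every value in every cluster."""
    rows = []
    for x in values:
        dists = [abs(x - c) for c in centres]
        if 0.0 in dists:
            # A value sitting exactly on a centre belongs fully to it.
            row = [1.0 if d == 0.0 else 0.0 for d in dists]
        else:
            row = [d ** (-2.0 / (m - 1.0)) for d in dists]
        total = sum(row)
        rows.append([r / total for r in row])
    return rows


def fuzzy_c_means_1d(values, n_clusters=2, m=2.0, n_iter=100, tol=1e-6):
    """Fuzzy c-means on scalar features; returns (centres, memberships)."""
    if n_clusters < 2:
        raise ValueError("need at least two clusters")
    lo, hi = min(values), max(values)
    # Deterministic start: centres spread evenly over the data range.
    centres = [lo + (hi - lo) * k / (n_clusters - 1) for k in range(n_clusters)]
    for _ in range(n_iter):
        u = _memberships(values, centres, m)
        new_centres = []
        for k in range(n_clusters):
            weights = [row[k] ** m for row in u]
            new_centres.append(
                sum(w * x for w, x in zip(weights, values)) / sum(weights)
            )
        shift = max(abs(a - b) for a, b in zip(centres, new_centres))
        centres = new_centres
        if shift < tol:
            break
    return centres, _memberships(values, centres, m)


# Made-up absolute gray-level differences for ten pixels.
diffs = [2, 3, 1, 4, 2, 90, 95, 3, 88, 1]
centres, u = fuzzy_c_means_1d(diffs)
changed_cluster = centres.index(max(centres))
changed = [row[changed_cluster] > 0.5 for row in u]
print(centres, changed)
```

    Treating the cluster with the larger centre as "changed" is a convention of this sketch; with principal-component features the same update rules apply with Euclidean distances between vectors.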

  20. RNA-seq analysis identifies an intricate regulatory network controlling cluster root development in white lupin

    PubMed Central

    2014-01-01

    Background: Highly adapted plant species are able to alter their root architecture to improve nutrient uptake and thrive in environments with limited nutrient supply. Cluster roots (CRs) are specialised structures of dense lateral roots formed by several plant species for the effective mining of nutrient-rich soil patches through a combination of increased surface area and exudation of carboxylates. White lupin is becoming a model-species allowing for the discovery of gene networks involved in CR development. A greater understanding of the underlying molecular mechanisms driving these developmental processes is important for the generation of smarter plants for a world with diminishing resources to improve food security. Results: RNA-seq analyses for three developmental stages of the CR formed under phosphorus-limited conditions and two of non-cluster roots have been performed for white lupin. In total 133,045,174 high-quality paired-end reads were used for a de novo assembly of the root transcriptome and merged with LAGI01 (Lupinus albus gene index) to generate an improved LAGI02 with 65,097 functionally annotated contigs. This was followed by comparative gene expression analysis. We show marked differences in the transcriptional response across the various cluster root stages to adjust to phosphate limitation by increasing uptake capacity and adjusting metabolic pathways. Several transcription factors such as PLT, SCR, PHB, PHV or AUX/IAA with a known role in the control of meristem activity and developmental processes show an increased expression in the tip of the CR. Genes involved in hormonal responses (PIN, LAX, YUC) and cell cycle control (CYCA/B, CDK) are also differentially expressed. In addition, we identify primary transcripts of miRNAs with established function in the root meristem. Conclusions: Our gene expression analysis shows an intricate network of transcription factors and plant hormones controlling CR initiation and formation. In addition

  1. Cluster analysis on the bulk elemental compositions of Antarctic stony meteorites

    NASA Astrophysics Data System (ADS)

    Miyamoto, Hideaki; Niihara, Takafumi; Kuritani, Takeshi; Hong, Peng K.; Dohm, James M.; Sugita, Seiji

    2016-05-01

    Remote sensing observations by recent successful missions to small bodies have revealed the difficulty in classifying the materials which cover their surfaces into a conventional classification of meteorites. Although reflectance spectroscopy is a powerful tool for this purpose, it is influenced by many factors, such as space weathering, lighting conditions, and surface physical conditions (e.g., particle size and style of mixing). Thus, complementary information, such as elemental compositions, which can be obtained by X-ray fluorescence (XRF) and gamma-ray spectrometers (GRS), have been considered very important. However, classifying planetary materials solely based on elemental compositions has not been investigated extensively. In this study, we perform principal component and cluster analyses on 12 major and minor elements of the bulk compositions of 500 meteorites reported in the National Institute of Polar Research (NIPR), Japan database. Our unique approach, which includes using hierarchical cluster analysis, indicates that meteorites can be classified into about 10 groups purely by their bulk elemental compositions. We suggest that Si, Fe, Mg, Ca, and Na are the optimal set of elements, as this set has been used successfully to classify meteorites of the NIPR database with more than 94% accuracy. Principal components analysis indicates that elemental compositions of meteorites form eight clusters in the three-dimensional space of the components. The three major principal components (PC1, PC2, and PC3) are interpreted as (1) degree of differentiations of the source body (i.e., primitive versus differentiated), (2) degree of thermal effects, and (3) degree of chemical fractionation, respectively.
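
    A hierarchical (agglomerative) cluster analysis of bulk compositions can be sketched in a few lines. The abstract does not state the linkage or the normalisation used, so the z-score scaling and average linkage below are assumptions, as are the function names and the made-up three-element compositions, which are not taken from the NIPR database.

```python
import math
from statistics import mean, pstdev


def z_scores(rows):
    """Scale each column to zero mean and unit (population) variance."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    sds = [pstdev(c) or 1.0 for c in columns]  # guard constant columns
    return [[(v - mu) / sd for v, mu, sd in zip(r, means, sds)] for r in rows]


def agglomerate(rows, n_clusters):
    """Average-linkage agglomerative clustering down to n_clusters groups."""
    data = z_scores(rows)
    clusters = [[i] for i in range(len(data))]

    def average_distance(a, b):
        return mean(math.dist(data[i], data[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find and merge the closest pair of clusters.
        a, b = min(
            ((p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))),
            key=lambda pq: average_distance(clusters[pq[0]], clusters[pq[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Made-up compositions in wt% as (Si, Fe, Mg); purely illustrative.
samples = [
    (18, 25, 14), (17, 27, 14), (19, 24, 15),
    (23, 14, 5), (24, 15, 4), (22, 13, 6),
]
groups = agglomerate(samples, n_clusters=2)
print(groups)
```

    Scaling before clustering matters here because elements with large abundances would otherwise dominate the Euclidean distances.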

  2. [Research on distribution of patents' holders for Chinese herbal compounds in treating cardiovascular and cerebrovascular based on cluster analysis].

    PubMed

    YANG, Xu-Jie; XIAO, Shi-Ying

    2015-09-01

    To examine the distribution of patent holders for Chinese herbal compounds for treating cardiovascular and cerebrovascular diseases, the patent holders were analyzed by means of simple statistics and cluster analysis. The clustering variables included the number of patent applications, the number of patents maintained, the number of related papers, etc. The patent holders were divided into four categories according to their scientific research and patent strength. The key to success for holders of Chinese herbal compound patents is to turn scientific research into patents and to coordinate patent protection with scientific innovation.

  3. Survey on granularity clustering.

    PubMed

    Ding, Shifei; Du, Mingjing; Zhu, Hong

    2015-12-01

    With the rapid development of uncertain artificial intelligence and the arrival of the big data era, conventional clustering analysis and granular computing fail to satisfy the requirements of intelligent information processing in this new setting. There is an essential relationship between granular computing and clustering analysis, so some researchers have tried to combine the two. Working from the idea of granularity, researchers have extended clustering analysis and sought the best clustering results with the help of the basic theories and methods of granular computing. The granularity clustering methods that have been proposed and studied have attracted more and more attention. This paper first summarizes the background of granularity clustering and the intrinsic connection between granular computing and clustering analysis, and then reviews the research status and various methods of granularity clustering. Finally, we analyze existing problems and propose directions for further research.

  4. Cosmological constraints from a combination of galaxy clustering and lensing - II. Fisher matrix analysis

    NASA Astrophysics Data System (ADS)

    More, Surhud; van den Bosch, Frank C.; Cacciato, Marcello; More, Anupreeta; Mo, Houjun; Yang, Xiaohu

    2013-04-01

    We quantify the accuracy with which the cosmological parameters characterizing the energy density of matter (Ωm), the amplitude of the power spectrum of matter fluctuations (σ8), the energy density of neutrinos (Ων) and the dark energy equation of state (w0) can be constrained using data from large galaxy redshift surveys. We advocate a joint analysis of the abundance of galaxies, galaxy clustering, and the galaxy-galaxy weak-lensing signal in order to simultaneously constrain the halo occupation statistics (i.e. galaxy bias) and the cosmological parameters of interest. We parametrize the halo occupation distribution of galaxies in terms of the conditional luminosity function and use the analytical framework of the halo model described in Cacciato et al. (our companion Paper III), to predict the relevant observables. By performing a Fisher matrix analysis, we show that a joint analysis of these observables, even with the precision with which they are currently measured from the Sloan Digital Sky Survey, can be used to obtain tight constraints on the cosmological parameters, fully marginalized over uncertainties in galaxy bias. We demonstrate that the cosmological constraints from such an analysis are nearly uncorrelated with the halo occupation distribution constraints, thus, minimizing the systematic impact of any imperfections in modelling the halo occupation statistics on the cosmological constraints. In fact, we demonstrate that the constraints from such an analysis are both complementary to and competitive with existing constraints on these parameters from a number of other techniques, such as cluster abundances, cosmic shear and/or baryon acoustic oscillations, thus paving the way to test the concordance cosmological model.
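
    For readers unfamiliar with the technique, a Fisher matrix forecast can be summarised in one line. The block below is the standard form under the assumption of a Gaussian likelihood whose data covariance $C$ does not depend on the parameters; $\mu$ denotes the vector of predicted observables and $\theta$ the full parameter vector (cosmological and halo occupation parameters together). The symbols are generic and are not taken from the paper itself.

```latex
F_{ij} = \sum_{a,b} \frac{\partial \mu_a}{\partial \theta_i} \, (C^{-1})_{ab} \, \frac{\partial \mu_b}{\partial \theta_j},
\qquad
\sigma(\theta_i) \ge \sqrt{(F^{-1})_{ii}}
```

    Inverting the full matrix before reading off the diagonal is what marginalises over the remaining parameters, such as those describing galaxy bias, and the off-diagonal elements of $F^{-1}$ indicate how strongly the cosmological and halo occupation constraints are correlated.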

  5. Application of statistics filter method and clustering analysis in fault diagnosis of roller bearings

    NASA Astrophysics Data System (ADS)

    Song, L. Y.; Wang, H. Q.; Gao, J. J.; Yang, J. F.; Liu, W. B.; Chen, P.

    2012-05-01

    Condition diagnosis of roller bearings depends largely on the feature analysis of vibration signals. The spectrum statistics filter (SSF) method can adaptively reduce noise. This method is based on hypothesis testing in the frequency domain to eliminate the identical component between the reference signal and the primary signal. This paper presents a statistical parameter, namely the similarity factor, to evaluate the filtering performance. The performance of the method is compared with the classical method, the band pass filter (BPF). Results show that the statistics filter is preferable to BPF in vibration signal processing. Moreover, the significance level α would be optimized by genetic algorithms. However, it is very difficult to identify fault states only from the time domain waveform or frequency spectrum when the effect of the noise is strong or the fault feature is not obvious. Pattern recognition is then applied to fault diagnosis in this study through the system clustering method. This paper processes experimental rig data after statistics filtering, and the accuracy of clustering analysis increases substantially.

  6. Unifying Blind Separation and Clustering for Resting-State EEG/MEG Functional Connectivity Analysis.

    PubMed

    Hirayama, Jun-Ichiro; Ogawa, Takeshi; Hyvärinen, Aapo

    2015-07-01

    Unsupervised analysis of the dynamics (nonstationarity) of functional brain connectivity during rest has recently received a lot of attention in the neuroimaging and neuroengineering communities. Most studies have used functional magnetic resonance imaging, but electroencephalography (EEG) and magnetoencephalography (MEG) also hold great promise for analyzing nonstationary functional connectivity with high temporal resolution. Previous EEG/MEG analyses divided the problem into two consecutive stages: the separation of neural sources and then the connectivity analysis of the separated sources. Such nonoptimal division into two stages may bias the result because of the different prior assumptions made about the data in the two stages. We propose a unified method for separating EEG/MEG sources and learning their functional connectivity (coactivation) patterns. We combine blind source separation (BSS) with unsupervised clustering of the activity levels of the sources in a single probabilistic model. A BSS is performed on the Hilbert transforms of band-limited EEG/MEG signals, and coactivation patterns are learned by a mixture model of source envelopes. Simulation studies show that the unified approach often outperforms conventional two-stage methods, indicating further the benefit of using Hilbert transforms to deal with oscillatory sources. Experiments on resting-state EEG data, acquired in conjunction with a cued motor imagery or nonimagery task, also show that the states (clusters) obtained by the proposed method often correlate better with physiologically meaningful quantities than those obtained by a two-stage method. PMID:25973547

  7. Eating or meeting? Cluster analysis reveals intricacies of white shark (Carcharodon carcharias) migration and offshore behavior.

    PubMed

    Jorgensen, Salvador J; Arnoldi, Natalie S; Estess, Ethan E; Chapple, Taylor K; Rückert, Martin; Anderson, Scot D; Block, Barbara A

    2012-01-01

    Elucidating how mobile ocean predators utilize the pelagic environment is vital to understanding the dynamics of oceanic species and ecosystems. Pop-up archival transmitting (PAT) tags have emerged as an important tool to describe animal migrations in oceanic environments where direct observation is not feasible. Available PAT tag data, however, are for the most part limited to geographic position, swimming depth and environmental temperature, making effective behavioral observation challenging. However, novel analysis approaches have the potential to extend the interpretive power of these limited observations. Here we developed an approach based on clustering analysis of PAT daily time-at-depth histogram records to distinguish behavioral modes in white sharks (Carcharodon carcharias). We found four dominant and distinctive behavioral clusters matching previously described behavioral patterns, including two distinctive offshore diving modes. Once validated, we mapped behavior mode occurrence in space and time. Our results demonstrate spatial, temporal and sex-based structure in the diving behavior of white sharks in the northeastern Pacific previously unrecognized including behavioral and migratory patterns resembling those of species with lek mating systems. We discuss our findings, in combination with available life history and environmental data, and propose specific testable hypotheses to distinguish between mating and foraging in northeastern Pacific white sharks that can provide a framework for future work. Our methodology can be applied to similar datasets from other species to further define behaviors during unobservable phases. PMID:23144707

  8. Joint Analysis of Galaxy-Galaxy Lensing and Galaxy Clustering: Methodology and Forecasts for DES

    SciTech Connect

    Park, Y.

    2015-07-19

    The joint analysis of galaxy-galaxy lensing and galaxy clustering is a promising method for inferring the growth function of large scale structure. Our analysis will be carried out on data from the Dark Energy Survey (DES), with its measurements of both the distribution of galaxies and the tangential shears of background galaxies induced by these foreground lenses. We develop a practical approach to modeling the assumptions and systematic effects affecting small scale lensing, which provides halo masses, and large scale galaxy clustering. Introducing parameters that characterize the halo occupation distribution (HOD), photometric redshift uncertainties, and shear measurement errors, we study how external priors on different subsets of these parameters affect our growth constraints. Degeneracies within the HOD model, as well as between the HOD and the growth function, are identified as the dominant source of complication, with other systematic effects sub-dominant. The impact of HOD parameters and their degeneracies necessitate the detailed joint modeling of the galaxy sample that we employ. Finally, we conclude that DES data will provide powerful constraints on the evolution of structure growth in the universe, conservatively/optimistically constraining the growth function to 7.9%/4.8% with its first-year data that covered over 1000 square degrees, and to 3.9%/2.3% with its full five-year data that will survey 5000 square degrees, including both statistical and systematic uncertainties.

  9. Dietary Patterns Derived by Cluster Analysis are Associated with Cognitive Function among Korean Older Adults

    PubMed Central

    Kim, Jihye; Yu, Areum; Choi, Bo Youl; Nam, Jung Hyun; Kim, Mi Kyung; Oh, Dong Hoon; Yang, Yoon Jung

    2015-01-01

    The objective of this study was to investigate major dietary patterns among older Korean adults through cluster analysis and to determine an association between dietary patterns and cognitive function. This is a cross-sectional study. The data from the Korean Multi-Rural Communities Cohort Study was used. Participants included 765 participants aged 60 years and over. A quantitative food frequency questionnaire with 106 items was used to investigate dietary intake. The Korean version of the MMSE-KC (Mini-Mental Status Examination–Korean version) was used to assess cognitive function. Two major dietary patterns were identified using K-means cluster analysis. The “MFDF” dietary pattern indicated high consumption of Multigrain rice, Fish, Dairy products, Fruits and fruit juices, while the “WNC” dietary pattern referred to higher intakes of White rice, Noodles, and Coffee. Means of the total MMSE-KC and orientation score of the participants in the MFDF dietary pattern were higher than those of the WNC dietary pattern. Compared with the WNC dietary pattern, the MFDF dietary pattern showed a lower risk of cognitive impairment after adjusting for covariates (OR 0.64, 95% CI 0.44–0.94). The MFDF dietary pattern, with high consumption of multigrain rice, fish, dairy products, and fruits may be related to better cognition among Korean older adults. PMID:26035243
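
    The K-means step mentioned above can be sketched as follows. This is a minimal Lloyd's-algorithm illustration on made-up two-dimensional intake scores; the data, the starting centroids and the function name `kmeans` are assumptions for the example and do not reproduce the study's analysis.

```python
def kmeans(points, centroids, n_iter=20):
    """Minimal Lloyd's algorithm: assign points, then recompute centroids."""
    centroids = [tuple(c) for c in centroids]

    def nearest(p):
        # Index of the centroid with the smallest squared Euclidean distance.
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        return dists.index(min(dists))

    for _ in range(n_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p)].append(p)
        updated = []
        for members, old in zip(clusters, centroids):
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break  # assignments can no longer change
        centroids = updated

    labels = [nearest(p) for p in points]
    return centroids, labels

# Hypothetical standardized intake scores (e.g. multigrain rice, white rice).
intakes = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centers, labels = kmeans(intakes, centroids=[(1.0, 1.0), (5.0, 5.0)])
print(labels)
```

    With two starting centroids the toy data separate into two groups, analogous to the two dietary patterns reported.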

  11. Eating or Meeting? Cluster Analysis Reveals Intricacies of White Shark (Carcharodon carcharias) Migration and Offshore Behavior

    PubMed Central

    Jorgensen, Salvador J.; Arnoldi, Natalie S.; Estess, Ethan E.; Chapple, Taylor K.; Rückert, Martin; Anderson, Scot D.; Block, Barbara A.

    2012-01-01

    Elucidating how mobile ocean predators utilize the pelagic environment is vital to understanding the dynamics of oceanic species and ecosystems. Pop-up archival transmitting (PAT) tags have emerged as an important tool to describe animal migrations in oceanic environments where direct observation is not feasible. Available PAT tag data, however, are for the most part limited to geographic position, swimming depth and environmental temperature, making effective behavioral observation challenging. However, novel analysis approaches have the potential to extend the interpretive power of these limited observations. Here we developed an approach based on clustering analysis of PAT daily time-at-depth histogram records to distinguish behavioral modes in white sharks (Carcharodon carcharias). We found four dominant and distinctive behavioral clusters matching previously described behavioral patterns, including two distinctive offshore diving modes. Once validated, we mapped behavior mode occurrence in space and time. Our results demonstrate spatial, temporal and sex-based structure in the diving behavior of white sharks in the northeastern Pacific previously unrecognized including behavioral and migratory patterns resembling those of species with lek mating systems. We discuss our findings, in combination with available life history and environmental data, and propose specific testable hypotheses to distinguish between mating and foraging in northeastern Pacific white sharks that can provide a framework for future work. Our methodology can be applied to similar datasets from other species to further define behaviors during unobservable phases. PMID:23144707

  12. THE CLUSTER AGES EXPERIMENT (CASE). VII. ANALYSIS OF TWO ECLIPSING BINARIES IN THE GLOBULAR CLUSTER NGC 6362

    SciTech Connect

    Kaluzny, J.; Rozyczka, M.; Schwarzenberg-Czerny, A.; Mazur, B.; Thompson, I. B.; Dotter, A.; Burley, G. S.; Rucinski, S. M.

    2015-11-15

    We use photometric and spectroscopic observations of the detached eclipsing binaries V40 and V41 in the globular cluster NGC 6362 to derive masses, radii, and luminosities of the component stars. The orbital periods of these systems are 5.30 and 17.89 days, respectively. The measured masses of the primary and secondary components (M_p, M_s) are (0.8337 ± 0.0063, 0.7947 ± 0.0048) M_⊙ for V40 and (0.8215 ± 0.0058, 0.7280 ± 0.0047) M_⊙ for V41. The measured radii (R_p, R_s) are (1.3253 ± 0.0075, 0.997 ± 0.013) R_⊙ for V40 and (1.0739 ± 0.0048, 0.7307 ± 0.0046) R_⊙ for V41. Based on the derived luminosities, we find that the distance modulus of the cluster is 14.74 ± 0.04 mag, in good agreement with 14.72 mag obtained from color–magnitude diagram (CMD) fitting. We compare the absolute parameters of component stars with theoretical isochrones in mass–radius and mass–luminosity diagrams. For assumed abundances [Fe/H] = −1.07, [α/Fe] = 0.4, and Y = 0.25 we find the most probable age of V40 to be 11.7 ± 0.2 Gyr, compatible with the age of the cluster derived from CMD fitting (12.5 ± 0.5 Gyr). V41 seems to be markedly younger than V40. If independently confirmed, this result will suggest that V41 belongs to the younger of the two stellar populations recently discovered in NGC 6362. The orbits of both systems are eccentric. Given the orbital period and age of V40, its orbit should have been tidally circularized some ∼7 Gyr ago. The observed eccentricity is most likely the result of a relatively recent close stellar encounter.

  13. An application of cluster analysis for determining homogeneous subregions: The agroclimatological point of view. [Rio Grande do Sul, Brazil]

    NASA Technical Reports Server (NTRS)

    Parada, N. D. J. (Principal Investigator); Cappelletti, C. A.

    1982-01-01

    A stratification oriented to crop area and yield estimation problems was performed using a clustering algorithm. The variables used were a set of agroclimatological characteristics measured in each of the 232 municipalities of the State of Rio Grande do Sul, Brazil. A nonhierarchical cluster analysis was used, and the pseudo F-statistic criterion was implemented for determining the "cut point" in the number of strata.
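
    One common form of the pseudo F-statistic is the Calinski-Harabasz ratio of between-cluster to within-cluster dispersion. The sketch below assumes that form (the record does not say which variant was used) and applies it to made-up one-variable data; `pseudo_f` and the sample values are illustrative names and numbers only.

```python
def pseudo_f(points, labels):
    """Pseudo F = [B / (k - 1)] / [W / (n - k)] for a given partition.

    B is the between-cluster and W the within-cluster sum of squares.
    Assumes W > 0, i.e. at least one cluster has some internal spread.
    """
    n = len(points)
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    k = len(groups)
    if k < 2 or k >= n:
        raise ValueError("need 2 <= number of clusters < number of points")

    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    grand = mean(points)
    between = sum(len(g) * sq_dist(mean(g), grand) for g in groups.values())
    within = sum(sq_dist(p, mean(g)) for g in groups.values() for p in g)
    return (between / (k - 1)) / (within / (n - k))

# Hypothetical single agroclimatological variable for six municipalities.
values = [(1.0,), (2.0,), (3.0,), (11.0,), (12.0,), (13.0,)]
f_good = pseudo_f(values, [0, 0, 0, 1, 1, 1])  # compact, well-separated strata
f_bad = pseudo_f(values, [0, 1, 0, 1, 0, 1])   # strata that mix the two groups
print(f_good, f_bad)
```

    In practice the statistic is computed for several candidate numbers of strata, and the partition with the largest value suggests the cut point.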

  14. Spatio-temporal cluster analysis of county-based human West Nile virus incidence in the continental United States

    PubMed Central

    Sugumaran, Ramanathan; Larson, Scott R; DeGroote, John P

    2009-01-01

    Background: West Nile virus (WNV) is a vector-borne illness that can severely affect human health. After introduction on the East Coast in 1999, the virus quickly spread and became established across the continental United States. However, there have been significant variations in levels of human WNV incidence spatially and temporally. In order to quantify these variations, we used Kulldorff's spatial scan statistic and Anselin's Local Moran's I statistic to uncover spatial clustering of human WNV incidence at the county level in the continental United States from 2002–2008. These two methods were applied with varying analysis thresholds in order to evaluate sensitivity of clusters identified. Results: The spatial scan and Local Moran's I statistics revealed several consistent, important clusters or hot-spots with significant year-to-year variation. In 2002, before the pathogen had spread throughout the country, there were significant regional clusters in the upper Midwest and in Louisiana and Mississippi. The largest and most consistent area of clustering throughout the study period was in the Northern Great Plains region including large portions of Nebraska, South Dakota, and North Dakota, and significant sections of Colorado, Wyoming, and Montana. In 2006, a very strong cluster centered in southwest Idaho was prominent. Both the spatial scan statistic and the Local Moran's I statistic were sensitive to the choice of input parameters. Conclusion: Significant spatial clustering of human WNV incidence has been demonstrated in the continental United States from 2002–2008. The two techniques were not always consistent in the location and size of clusters identified. Although there was significant inter-annual variation, consistent areas of clustering, with the most persistent and evident being in the Northern Great Plains, were demonstrated. Given the wide variety of mosquito species responsible and the environmental conditions they require, further spatio

  15. Dengue fever occurrence and vector detection by larval survey, ovitrap and MosquiTRAP: a space-time clusters analysis.

    PubMed

    de Melo, Diogo Portella Ornelas; Scherrer, Luciano Rios; Eiras, Álvaro Eduardo

    2012-01-01

    The use of vector surveillance tools for preventing dengue disease requires fine assessment of risk, in order to improve vector control activities. Nevertheless, the thresholds between vector detection and dengue fever occurrence are currently not well established. In Belo Horizonte (Minas Gerais, Brazil), dengue has been endemic for several years. From January 2007 to June 2008, the dengue vector Aedes (Stegomyia) aegypti was monitored by ovitrap, the sticky-trap MosquiTRAP™ and larval surveys in a study area in Belo Horizonte. Using a space-time scan for cluster detection implemented in SaTScan software, the vector presence recorded by the different monitoring methods was evaluated. Clusters of vectors and dengue fever were detected. It was verified that ovitrap and MosquiTRAP vector detection methods predicted dengue occurrence better than larval survey, both spatially and temporally. MosquiTRAP and ovitrap presented similar results of space-time intersections to dengue fever clusters. Nevertheless, ovitrap clusters presented longer duration periods than MosquiTRAP ones, less accurately signaling the dengue risk areas, since the detection of vector clusters during most of the study period was not necessarily correlated to dengue fever occurrence. It was verified that ovitrap clusters occurred more than 200 days (values ranged from 97.0±35.35 to 283.0±168.4 days) before dengue fever clusters, whereas MosquiTRAP clusters preceded dengue fever clusters by approximately 80 days (values ranged from 65.5±58.7 to 94.0±14.3 days), the latter proving to be more temporally precise. Thus, in the present cluster analysis study MosquiTRAP presented superior results for signaling dengue transmission risks both geographically and temporally. Since early detection is crucial for planning and deploying effective prevention, MosquiTRAP proved to be a reliable tool and this method provides groundwork for the development of even more precise tools. PMID:22848729

  16. Dengue Fever Occurrence and Vector Detection by Larval Survey, Ovitrap and MosquiTRAP: A Space-Time Clusters Analysis

    PubMed Central

    de Melo, Diogo Portella Ornelas; Scherrer, Luciano Rios; Eiras, Álvaro Eduardo

    2012-01-01

    The use of vector surveillance tools for preventing dengue disease requires fine assessment of risk, in order to improve vector control activities. Nevertheless, the thresholds between vector detection and dengue fever occurrence are currently not well established. In Belo Horizonte (Minas Gerais, Brazil), dengue has been endemic for several years. From January 2007 to June 2008, the dengue vector Aedes (Stegomyia) aegypti was monitored by ovitrap, the sticky-trap MosquiTRAP™ and larval surveys in a study area in Belo Horizonte. Using a space-time scan for cluster detection implemented in SaTScan software, the vector presence recorded by the different monitoring methods was evaluated. Clusters of vectors and dengue fever were detected. It was verified that ovitrap and MosquiTRAP vector detection methods predicted dengue occurrence better than larval survey, both spatially and temporally. MosquiTRAP and ovitrap presented similar results of space-time intersections to dengue fever clusters. Nevertheless, ovitrap clusters presented longer duration periods than MosquiTRAP ones, less accurately signaling the dengue risk areas, since the detection of vector clusters during most of the study period was not necessarily correlated to dengue fever occurrence. It was verified that ovitrap clusters occurred more than 200 days (values ranged from 97.0±35.35 to 283.0±168.4 days) before dengue fever clusters, whereas MosquiTRAP clusters preceded dengue fever clusters by approximately 80 days (values ranged from 65.5±58.7 to 94.0±14.3 days), the latter proving to be more temporally precise. Thus, in the present cluster analysis study MosquiTRAP presented superior results for signaling dengue transmission risks both geographically and temporally. Since early detection is crucial for planning and deploying effective prevention, MosquiTRAP proved to be a reliable tool and this method provides groundwork for the development of even more precise tools. PMID:22848729

  18. Morphometry and Cluster Analysis of Low Shield Volcanoes on Earth and Mars

    NASA Astrophysics Data System (ADS)

    Henderson, A.; Christiansen, E. H.; Radebaugh, J.

    2015-12-01

    Volcanoes are common on all terrestrial planets and their morphology is influenced by eruption mechanisms, volumes, and compositions and temperatures of the magmas; these are in turn influenced by the tectonic setting. In an attempt to better understand the relationship between morphometry and volcanic processes, we compared low-shield volcanoes on Syria Planum, Mars, with basaltic shields of the eastern Snake River Plain (eSRP). We used 133 volcanoes on Syria Planum that are covered by MOLA and HRSC elevation data and 246 eSRP shields covered by the NED. Shields on Syria Planum average 191 +/- 88 m tall, 12 +/- 6 km in diameter, 16 +/- 28 km³ in volume, and have 1.7° +/- 0.8° flank slopes. eSRP shields average 83 +/- 44 m tall, 4 +/- 3 km in diameter, 0.8 +/- 2 km³ in volume, and have 2.5° +/- 1° flank slopes. Bivariate plots of morphometric characteristics show that Syria Planum and eSRP low shields form the extremes of the same morphospace shared with some Icelandic olivine tholeiite shields, but are generally distinct from other terrestrial volcanoes. Cluster analysis of Syria Planum and eSRP shields with other terrestrial volcanoes separates these volcanoes into one cluster and the majority of them into the same sub-cluster that is distinct from other terrestrial volcanoes. Principal component and cluster analysis of Syria Planum and eSRP shields using height, area, volume, slope, and eccentricity shows that Syria Planum and eSRP low shields are similar in shape (slope and eccentricity). Apparently, these low shields formed by similar processes involving Hawaiian-type eruptions of low-viscosity (mafic) lavas with fissure-controlled eruptions, narrowing to central vents. Initially high eruption rates and long, tube-fed lava flows shifted to the development of small lava lakes that repeatedly overflowed, and on some shields to late fountaining that formed steeper spatter ramparts. However, Syria Planum shields are systematically larger than those on the eastern Snake River Plain.
The

  19. Diversity of Xiphinema americanum-group Species and Hierarchical Cluster Analysis of Morphometrics

    PubMed Central

    Lamberti, Franco; Ciancio, Aurelio

    1993-01-01

    Of the 39 species composing the Xiphinema americanum group, 14 were described originally from North America and two others have been reported from this region. Many species are very similar morphologically and can be distinguished only by a difficult comparison of various combinations of some morphometric characters. Study of morphometrics of 49 populations, including the type populations of the 39 species attributed to this group, by principal component analysis and hierarchical cluster analysis placed the populations into five subgroups, proposed here as the X. brevicolle subgroup (seven species), the X. americanum subgroup (17 species), the X. taylori subgroup (two species), the X. pachtaicum subgroup (eight species), and the X. lambertii subgroup (five species). PMID:19279776

  20. Potential emission flux to aerosol pollutants over Bengal Gangetic plain through combined trajectory clustering and aerosol source fields analysis

    NASA Astrophysics Data System (ADS)

    Kumar, D. Bharath; Verma, S.

    2016-09-01

    A hybrid source-receptor analysis was carried out to evaluate the potential emission flux to winter monsoon (WinMon) aerosols over Bengal Gangetic plain urban (Kolkata, Kol) and semi-urban atmospheres (Kharagpur, Kgp). This was done through application of fuzzy c-means clustering to back-trajectory data combined with emission flux and residence time weighted aerosols analysis. WinMon mean aerosol optical depth (AOD) and Ångström exponent (AE) at Kol (AOD: 0.77; AE: 1.17) were respectively slightly higher than and nearly equal to those at Kgp (AOD: 0.71; AE: 1.18). Out of six source region clusters over the Indian subcontinent and two over the Indian oceanic region, the cluster mean AOD was the highest when associated with the mean path of air mass originating from the Bay of Bengal and the Arabian Sea clusters at Kol and that from the Indo-Gangetic plain (IGP) cluster at Kgp. Spatial distribution of weighted AOD fields showed the highest potential source of aerosols over the IGP, primarily over upper IGP (e.g. Punjab, Haryana), lower IGP (e.g. Uttar Pradesh) and eastern region (e.g. West Bengal, Bihar, northeast India) clusters. The emission flux contribution potential (EFCP) of fossil fuel (FF) emissions at surface (SL) of Kol/Kgp, elevated layer (EL) of Kol, and of biomass burning (BB) emissions at SL of Kol were primarily from upper, lower, upper/lower IGP clusters respectively. The EFCP of FF/BB emissions at Kgp-EL/SL, and that of BB at EL of Kol/Kgp were mainly from eastern region and Africa (AFR) clusters respectively. Though the AFR cluster had a significantly high emission flux source potential for dust emissions, the EFCP of dust from northwest India (NWI) was comparable to that from AFR at Kol SL/EL.
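
    The membership step of fuzzy c-means can be sketched as below. Unlike hard clustering, each point receives a degree of membership in every cluster. Only the standard membership update is shown (not the full iteration over cluster centres), and the coordinates, the fuzzifier m = 2 and the function name are assumptions for illustration, not values from the study.

```python
import math

def fcm_memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of one point in each cluster.

    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)), where d_i is the distance
    from the point to centre i and m > 1 is the fuzzifier.
    """
    dists = [math.dist(point, c) for c in centers]
    if any(d == 0.0 for d in dists):
        # The point coincides with a centre: assign it fully to that centre.
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]

# Hypothetical trajectory endpoint and two hypothetical cluster centres.
u = fcm_memberships((2.0, 0.0), centers=[(0.0, 0.0), (10.0, 0.0)])
print(u)
```

    Memberships always sum to one, so a trajectory lying between two source regions contributes partially to both clusters.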

  1. Characterization of corresponding microcalcification clusters on temporal pairs of mammograms for interval change analysis: comparison of classifiers

    NASA Astrophysics Data System (ADS)

    Hadjiiski, Lubomir; Drouillard, Douglas; Chan, Heang-Ping; Sahiner, Berkman; Helvie, Mark A.; Roubidoux, Marilyn; Zhou, Chuan

    2006-03-01

    We are developing an automated system for analysis of microcalcification clusters on serial mammograms. Our automated system consists of two stages: (1) automatic registration of corresponding clusters on temporal pairs of mammograms producing true (TP-TP) and false (TP-FP) pairs; and (2) characterization of temporal pairs of clusters as malignant and benign using a temporal classifier. In this study, we focussed on the design of the temporal classifier. Morphological and texture (RLS and GLDS) features are automatically extracted from the detected current and prior cluster locations. Additionally, difference morphological and RLS features are obtained. The automatically detected cluster locations on the temporal pairs may deviate from the optimal locations as selected by expert radiologists. This will introduce "noise" to the extracted features and make the classification task more difficult. Linear discriminant analysis (LDA) and support vector machine (SVM) classifiers were trained to classify the true and false pairs. A leave-one-case-out resampling method was used for feature selection and classifier design. In this study, 175 serial mammogram pairs containing biopsy-proven microcalcification clusters were used. At the first stage of the system, 85% (149/175) of the TP-TP pairs were identified with 15 false matches within the 164 image pairs that had computer-detected clusters on the priors. At the second stage, an average of 7 features were selected (4 difference morphological, 1 difference RLS and 2 current GLDS). The LDA and SVM temporal classifiers achieved test A_z of 0.83 and 0.82, respectively, for the classification of the 164 cluster temporal pairs as malignant or benign. In comparison, an MQSA radiologist achieved an A_z of 0.72. Both the LDA and SVM classifiers were able to classify the automatically detected temporal pairs of microcalcification clusters with accuracy comparable to that of an experienced radiologist.

  2. A comparison of hierarchical cluster analysis and league table rankings as methods for analysis and presentation of district health system performance data in Uganda.

    PubMed

    Tashobya, Christine K; Dubourg, Dominique; Ssengooba, Freddie; Speybroeck, Niko; Macq, Jean; Criel, Bart

    2016-03-01

    In 2003, the Uganda Ministry of Health introduced the district league table for district health system performance assessment. The league table presents district performance against a number of input, process and output indicators and a composite index to rank districts. This study explores the use of hierarchical cluster analysis for analysing and presenting district health systems performance data and compares this approach with the use of the league table in Uganda. Ministry of Health and district plans and reports, and published documents were used to provide information on the development and utilization of the Uganda district league table. Quantitative data were accessed from the Ministry of Health databases. Statistical analysis was performed using SPSS version 20, with hierarchical cluster analysis utilizing Ward's method. The hierarchical cluster analysis was conducted on the basis of seven clusters determined for each year from 2003 to 2010, ranging from a cluster of good through moderate-to-poor performers. The characteristics and membership of clusters varied from year to year and were determined by the identity and magnitude of performance of the individual variables. Criticisms of the league table include: perceived unfairness, as it did not take into consideration district peculiarities; and being oversummarized and not adequately informative. Clustering organizes the many data points into clusters of similar entities according to an agreed set of indicators and can provide the beginning point for identifying factors behind the observed performance of districts. Although league table rankings emphasize summation and external control, clustering has the potential to encourage a formative, learning approach. More research is required to shed more light on factors behind observed performance of the different clusters. Other countries, especially low-income countries that share many similarities with Uganda, can learn from these experiences.
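
    Ward's method, as used above, repeatedly merges the two clusters whose union gives the smallest increase in within-cluster variance. The following is a minimal, unoptimized sketch of that idea; the indicator scores, the choice of three clusters (the study used seven) and the function names are assumptions for illustration, and the study itself used SPSS rather than code like this.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    return [sum(col) / len(points) for col in zip(*points)]

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * d2

def ward_clusters(points, n_clusters):
    """Agglomerate singletons with Ward's criterion until n_clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost(clusters[i], clusters[j])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical district scores on two performance indicators (0 to 1).
districts = [(0.90, 0.80), (0.85, 0.90), (0.50, 0.55),
             (0.45, 0.50), (0.10, 0.20), (0.15, 0.10)]
groups = ward_clusters(districts, n_clusters=3)
print(groups)
```

    The resulting groups correspond loosely to good, moderate and poor performers in this toy data.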

  3. Quantification and clustering of phenotypic screening data using time-series analysis for chemotherapy of schistosomiasis

    PubMed Central

    2012-01-01

    Background: Neglected tropical diseases, especially those caused by helminths, constitute some of the most common infections of the world's poorest people. Development of techniques for automated, high-throughput drug screening against these diseases, especially in whole-organism settings, constitutes one of the great challenges of modern drug discovery. Method: We present a method for enabling high-throughput phenotypic drug screening against diseases caused by helminths with a focus on schistosomiasis. The proposed method allows for a quantitative analysis of the systemic impact of a drug molecule on the pathogen as exhibited by the complex continuum of its phenotypic responses. This method consists of two key parts: first, biological image analysis is employed to automatically monitor and quantify shape-, appearance-, and motion-based phenotypes of the parasites. Next, we represent these phenotypes as time-series and show how to compare, cluster, and quantitatively reason about them using techniques of time-series analysis. Results: We present results on a number of algorithmic issues pertinent to the time-series representation of phenotypes. These include results on appropriate representation of phenotypic time-series, analysis of different time-series similarity measures for comparing phenotypic responses over time, and techniques for clustering such responses by similarity. Finally, we show how these algorithmic techniques can be used for quantifying the complex continuum of phenotypic responses of parasites. An important corollary is the ability of our method to recognize and rigorously group parasites based on the variability of their phenotypic response to different drugs. Conclusions: The methods and results presented in this paper enable automatic and quantitative scoring of high-throughput phenotypic screens focused on helminthic diseases. Furthermore, these methods allow us to analyze and stratify parasites based on their phenotypic response to drugs
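
    The abstract does not name the similarity measures that were compared. As one widely used example of such a measure, the sketch below computes a dynamic time warping (DTW) distance between two phenotype time-series. This is an assumption for illustration, not necessarily a measure the authors used; `dtw_distance` and the sample series are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # d[i][j] = cheapest alignment cost of a[:i] with b[:j].
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


# Hypothetical motility scores over time for three parasites.
baseline = [0, 1, 2, 1, 0]
delayed = [0, 0, 1, 2, 1, 0]   # same response, one time step later
flat = [2, 2, 2, 2, 2]         # no change over time

print(dtw_distance(baseline, delayed))  # time shift is absorbed by the warping
print(dtw_distance(baseline, flat))
```

    Because DTW tolerates shifts in timing, two parasites that respond in the same way but at slightly different times score as similar, which a point-by-point Euclidean distance would not capture.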

  4. Spherical cluster analysis for beam angle optimization in intensity-modulated radiation therapy treatment planning

    NASA Astrophysics Data System (ADS)

    Bangert, Mark; Oelfke, Uwe

    2010-10-01

    An intuitive heuristic to establish beam configurations for intensity-modulated radiation therapy is introduced as an extension of beam ensemble selection strategies applying scalar scoring functions. It is validated by treatment plan comparisons for three intra-cranial, pancreas, and prostate cases each. Based on a patient specific matrix listing the radiological quality of candidate beam directions individually for every target voxel, a set of locally ideal beam angles is generated. The spherical distribution of locally ideal beam angles is characteristic for every treatment site and patient: ideal beam angles typically cluster around distinct orientations. We interpret the cluster centroids, which are identified with a spherical K-means algorithm, as irradiation angles of an intensity-modulated radiation therapy treatment plan. The fluence profiles are subsequently optimized during a conventional inverse planning process. The average computation time for the pre-optimization of a beam ensemble is six minutes on a state-of-the-art work station. The treatment planning study demonstrates the potential benefit of the proposed beam angle optimization strategy. For the three prostate cases under investigation, the standard treatment plans applying nine coplanar equi-spaced beams and treatment plans applying an optimized non-coplanar nine-beam ensemble yield clinically comparable dose distributions. For symmetric patient geometries, the dose distribution formed by nine equi-spaced coplanar beams cannot be improved significantly. For the three pancreas and intra-cranial cases under investigation, the optimized non-coplanar beam ensembles enable better sparing of organs at risk while guaranteeing equivalent target coverage. Beam angle optimization by spherical cluster analysis shows the biggest impact for target volumes located asymmetrically within the patient and close to organs at risk.
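
    The spherical K-means step can be sketched as follows, under stated assumptions: directions are treated as unit vectors, each is assigned to the centroid with the largest dot product, and each centroid is the renormalized sum of its members. The function names, the initial centroids and the beam directions are hypothetical; the paper's actual implementation and initialization are not described in the abstract.

```python
import math


def normalize(v):
    """Scale a 3-vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return tuple(x / norm for x in v)


def spherical_kmeans(directions, centroids, iterations=20):
    """Cluster directions on the unit sphere by cosine similarity."""
    points = [normalize(p) for p in directions]
    centroids = [normalize(c) for c in centroids]
    for _ in range(iterations):
        # Assign each direction to the centroid with the largest dot product.
        labels = [
            max(range(len(centroids)),
                key=lambda j: sum(a * b for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Move each centroid to the normalized sum of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = normalize([sum(c) for c in zip(*members)])
    return labels, centroids


# Hypothetical locally ideal beam directions: three near +z, three near +x.
beams = [(0.1, 0.0, 1.0), (-0.1, 0.05, 1.0), (0.0, -0.1, 1.0),
         (1.0, 0.1, 0.0), (1.0, -0.1, 0.1), (1.0, 0.0, -0.1)]
labels, centroids = spherical_kmeans(beams, [beams[0], beams[3]])
print(labels)
```

    In the setting of the abstract, each resulting centroid would be read as one candidate irradiation angle. The sketch does not guard against a cluster whose members sum to the zero vector.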

  5. pong: fast analysis and visualization of latent clusters in population genetic data

    PubMed Central

    Behr, Aaron A.; Liu, Katherine Z.; Liu-Fang, Gracie; Nakka, Priyanka; Ramachandran, Sohini

    2016-01-01

    Motivation: A series of methods in population genetics use multilocus genotype data to assign individuals membership in latent clusters. These methods belong to a broad class of mixed-membership models, such as latent Dirichlet allocation used to analyze text corpora. Inference from mixed-membership models can produce different output matrices when repeatedly applied to the same inputs, and the number of latent clusters is a parameter that is often varied in the analysis pipeline. For these reasons, quantifying, visualizing, and annotating the output from mixed-membership models are bottlenecks for investigators across multiple disciplines from ecology to text data mining. Results: We introduce pong, a network-graphical approach for analyzing and visualizing membership in latent clusters with a native interactive D3.js visualization. pong leverages efficient algorithms for solving the Assignment Problem to dramatically reduce runtime while increasing accuracy compared with other methods that process output from mixed-membership models. We apply pong to 225 705 unlinked genome-wide single-nucleotide variants from 2426 unrelated individuals in the 1000 Genomes Project, and identify previously overlooked aspects of global human population structure. We show that pong outpaces current solutions by more than an order of magnitude in runtime while providing a customizable and interactive visualization of population structure that is more accurate than those produced by current tools. Availability and Implementation: pong is freely available and can be installed using the Python package management system pip. pong’s source code is available at https://github.com/abehr/pong. Contact: aaron_behr@alumni.brown.edu or sramachandran@brown.edu Supplementary Information: Supplementary data are available at Bioinformatics online. PMID:27283948
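
    The label-switching problem mentioned above can be illustrated with a small sketch: two runs of a mixed-membership model may describe the same clusters under different column orders, and aligning them is an instance of the Assignment Problem. The code below solves it by brute force over permutations, which is only practical for a handful of clusters; it is not pong's algorithm, and `align_clusters` and the membership matrices are hypothetical.

```python
from itertools import permutations


def align_clusters(run_a, run_b):
    """Find the column permutation of run_b that best matches run_a.

    Returns a tuple perm where perm[i] is the column of run_b that
    corresponds to column i of run_a.
    """
    k = len(run_a[0])

    def cost(perm):
        # Total absolute difference in membership after relabeling run_b.
        return sum(
            abs(row_a[i] - row_b[perm[i]])
            for row_a, row_b in zip(run_a, run_b)
            for i in range(k)
        )

    return min(permutations(range(k)), key=cost)


# Hypothetical membership matrices: rows are individuals, columns are clusters.
run_a = [[0.9, 0.1, 0.0], [0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.0, 0.1, 0.9]]
# Same memberships, but the clusters came out in a different order.
run_b = [[0.0, 0.9, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8], [0.9, 0.0, 0.1]]

best = align_clusters(run_a, run_b)
print(best)
```

    Brute force scales factorially with the number of clusters, which is why dedicated assignment solvers matter once K grows or many runs have to be compared.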

  6. Application of Factor Analysis on the Financial Ratios of Indian Cement Industry and Validation of the Results by Cluster Analysis

    NASA Astrophysics Data System (ADS)

    De, Anupam; Bandyopadhyay, Gautam; Chakraborty, B. N.

    2010-10-01

    Financial ratio analysis is an important and commonly used tool in analyzing the financial health of a firm. Quite a large number of financial ratios, which can be categorized in different groups, are used for this analysis. However, to reduce the number of ratios used for financial analysis and to regroup them on the basis of empirical evidence, the Factor Analysis technique has been used successfully by different researchers during the last three decades. In this study Factor Analysis has been applied to audited financial data of Indian cement companies for a period of 10 years. The sample companies are listed on the Indian stock exchanges (BSE and NSE). Factor Analysis, conducted over 44 variables (financial ratios) grouped in 7 categories, resulted in 11 underlying categories (factors). Each factor is named in an appropriate manner considering the factor loads and constituent variables (ratios). Representative ratios are identified for each such factor. To validate the results of Factor Analysis and to reach a final conclusion regarding the representative ratios, Cluster Analysis was performed.

  7. A cluster analysis of tic symptoms in children and adults with Tourette syndrome: clinical correlates and treatment outcome.

    PubMed

    McGuire, Joseph F; Nyirabahizi, Epiphanie; Kircanski, Katharina; Piacentini, John; Peterson, Alan L; Woods, Douglas W; Wilhelm, Sabine; Walkup, John T; Scahill, Lawrence

    2013-12-30

    Cluster analytic methods have examined the symptom presentation of chronic tic disorders (CTDs), with limited agreement across studies. The present study investigated patterns, clinical correlates, and treatment outcome of tic symptoms. 239 youth and adults with CTDs completed a battery of assessments at baseline to determine diagnoses, tic severity, and clinical characteristics. Participants were randomly assigned to receive either a comprehensive behavioral intervention for tics (CBIT) or psychoeducation and supportive therapy (PST). A cluster analysis was conducted on the baseline Yale Global Tic Severity Scale (YGTSS) symptom checklist to identify the constellations of tic symptoms. Four tic clusters were identified: Impulse Control and Complex Phonic Tics; Complex Motor Tics; Simple Head Motor/Vocal Tics; and Primarily Simple Motor Tics. Frequencies of tic symptoms showed few differences across youth and adults. Tic clusters had small associations with clinical characteristics and showed no associations with the presence of coexisting psychiatric conditions. Cluster membership scores did not predict treatment response to CBIT or tic severity reductions. Tic symptoms distinctly cluster with little difference across youth and adults, or coexisting conditions. This study, which is the first to examine tic clusters and response to treatment, suggested that tic symptom profiles respond equally well to CBIT. ClinicalTrials.gov identifiers: NCT00218777; NCT00231985.

  8. Analyzing Patients' Values by Applying Cluster Analysis and LRFM Model in a Pediatric Dental Clinic in Taiwan

    PubMed Central

    Lin, Shih-Yen; Liu, Chih-Wei

    2014-01-01

    This study combines cluster analysis and LRFM (length, recency, frequency, and monetary) model in a pediatric dental clinic in Taiwan to analyze patients' values. A two-stage approach by self-organizing maps and K-means method is applied to segment 1,462 patients into twelve clusters. The average values of L, R, and F excluding monetary covered by national health insurance program are computed for each cluster. In addition, customer value matrix is used to analyze customer values of twelve clusters in terms of frequency and monetary. Customer relationship matrix considering length and recency is also applied to classify different types of customers from these twelve clusters. The results show that three clusters can be classified into loyal patients with L, R, and F values greater than the respective average L, R, and F values, while three clusters can be viewed as lost patients without any variable above the average values of L, R, and F. When different types of patients are identified, marketing strategies can be designed to meet different patients' needs. PMID:25045741
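
    The classification rule stated in the abstract (loyal when L, R and F are all above their averages, lost when none is) can be written as a short sketch. The function name `label_cluster`, the "other" category and all numbers are hypothetical; the study's actual cluster means are not given here.

```python
def label_cluster(cluster_means, overall_means):
    """Label a cluster from its mean L, R and F values versus the overall means."""
    above = [cluster_means[key] > overall_means[key] for key in ("L", "R", "F")]
    if all(above):
        return "loyal"
    if not any(above):
        return "lost"
    return "other"  # mixed profile; the abstract does not name these groups


# Hypothetical overall averages and per-cluster averages (illustration only).
overall = {"L": 400.0, "R": 200.0, "F": 5.0}
clusters = {
    "A": {"L": 650.0, "R": 310.0, "F": 9.2},
    "B": {"L": 120.0, "R": 80.0, "F": 1.5},
    "C": {"L": 500.0, "R": 150.0, "F": 6.0},
}
labels = {name: label_cluster(means, overall) for name, means in clusters.items()}
print(labels)
```

    The comparison direction for recency follows the wording of the abstract; how R is encoded (for example, days since the last visit versus a recency score) is an assumption that would change the interpretation.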

  9. Analyzing patients' values by applying cluster analysis and LRFM model in a pediatric dental clinic in Taiwan.

    PubMed

    Wu, Hsin-Hung; Lin, Shih-Yen; Liu, Chih-Wei

    2014-01-01

    This study combines cluster analysis and LRFM (length, recency, frequency, and monetary) model in a pediatric dental clinic in Taiwan to analyze patients' values. A two-stage approach by self-organizing maps and K-means method is applied to segment 1,462 patients into twelve clusters. The average values of L, R, and F excluding monetary covered by national health insurance program are computed for each cluster. In addition, customer value matrix is used to analyze customer values of twelve clusters in terms of frequency and monetary. Customer relationship matrix considering length and recency is also applied to classify different types of customers from these twelve clusters. The results show that three clusters can be classified into loyal patients with L, R, and F values greater than the respective average L, R, and F values, while three clusters can be viewed as lost patients without any variable above the average values of L, R, and F. When different types of patients are identified, marketing strategies can be designed to meet different patients' needs. PMID:25045741

  10. SUPERMODEL ANALYSIS OF THE HARD X-RAY EXCESS IN THE COMA CLUSTER

    SciTech Connect

    Fusco-Femiano, R.; Lapi, A.

    2011-05-10

    The Supermodel (SM) provides an accurate description of the thermal contribution by the hot intracluster plasma, which is crucial for the analysis of the hard excess. In this paper, the thermal emissivity in the Coma cluster is derived starting from the intracluster gas temperature and density profiles obtained by the SM analysis of X-ray observables: the XMM-Newton temperature profile and the ROSAT brightness distribution. The SM analysis of the BeppoSAX/Phoswich Detector System (PDS) hard X-ray (HXR) spectrum confirms our previous results, namely, an excess at the confidence level (c.l.) of $\approx 4.8\sigma$ and a nonthermal (NT) flux of $(1.30 \pm 0.40) \times 10^{-11}$ erg cm$^{-2}$ s$^{-1}$ in the energy range 20-80 keV. A recent joint XMM-Newton/Suzaku analysis reports an upper limit of $\approx 6 \times 10^{-12}$ erg cm$^{-2}$ s$^{-1}$ in the energy range 20-80 keV for the NT flux with an average gas temperature of $8.45 \pm 0.06$ keV and an excess of NT radiation at a c.l. above $4\sigma$, without including systematic effects, for an average XMM-Newton temperature of 8.2 keV in the Suzaku/HXD-PIN FOV, in agreement with our earlier PDS analysis. Here we present further evidence of the compatibility between the Suzaku and BeppoSAX spectra, obtained by our SM analysis of the PDS data, when the smaller size of the HXD-PIN FOV and the two different average temperatures derived by XMM-Newton and by the joint XMM-Newton/Suzaku analysis are taken into account. The consistency of the PDS and HXD-PIN spectra reaffirms the presence of an NT component in the HXR spectrum of the Coma cluster. The SM analysis of the PDS data reports an excess at c.l. above $4\sigma$ also for the higher average temperature of 8.45 keV thanks to the PDS FOV being considerably greater than the HXD-PIN FOV.

  11. Task Analysis for Health Occupations. Cluster: Nursing. Occupation: Professional Nurse (Associate Degree). Education for Employment Task Lists.

    ERIC Educational Resources Information Center

    Lake County Area Vocational Center, Grayslake, IL.

    This document contains a task analysis for health occupations (professional nurse) in the nursing cluster. For each task listed, occupation, duty area, performance standard, steps, knowledge, attitudes, safety, equipment/supplies, source of analysis, and Illinois state goals for learning are listed. For the duty area of "providing therapeutic…

  12. THE CLUSTER AGES EXPERIMENT (CASE). V. ANALYSIS OF THREE ECLIPSING BINARIES IN THE GLOBULAR CLUSTER M4

    SciTech Connect

    Kaluzny, J.; Rozyczka, M.; Krzeminski, W.; Pych, W.; Thompson, I. B.; Burley, G. S.; Shectman, S. A.; Dotter, A.; Rucinski, S. M.

    2013-02-01

    We use photometric and spectroscopic observations of the eclipsing binaries V65, V66, and V69 in the field of the globular cluster M4 to derive masses, radii, and luminosities of their components. The orbital periods of these systems are 2.29, 8.11, and 48.19 days, respectively. The measured masses of the primary and secondary components ($M_p$ and $M_s$) are $0.8035 \pm 0.0086$ and $0.6050 \pm 0.0044\,M_\odot$ for V65, $0.7842 \pm 0.0045$ and $0.7443 \pm 0.0042\,M_\odot$ for V66, and $0.7665 \pm 0.0053$ and $0.7278 \pm 0.0048\,M_\odot$ for V69. The measured radii ($R_p$ and $R_s$) are $1.147 \pm 0.010$ and $0.6110 \pm 0.0092\,R_\odot$ for V65, $0.9347 \pm 0.0048$ and $0.8298 \pm 0.0053\,R_\odot$ for V66, and $0.8655 \pm 0.0097$ and $0.8074 \pm 0.0080\,R_\odot$ for V69. The orbits of V65 and V66 are circular, whereas that of V69 has an eccentricity of 0.38. Based on systemic velocities and relative proper motions, we show that all three systems are members of the cluster. We find that the distance to M4 is $1.82 \pm 0.04$ kpc, in good agreement with recent estimates based on entirely different methods. We compare the absolute parameters of V66 and V69 with two sets of theoretical isochrones in mass-radius and mass-luminosity diagrams, and for assumed [Fe/H] = -1.20, [$\alpha$/Fe] = 0.4, and Y = 0.25 we find the most probable age of M4 to be between 11.2 and 11.3 Gyr. Color-magnitude diagram (CMD) fitting with the same parameters yields an age close to, or slightly in excess of, 12 Gyr. However, considering the sources of uncertainty involved in CMD fitting, these two methods of age determination are not discrepant. Age and distance determinations can be further improved when infrared eclipse photometry is obtained.

  13. Addressing preference heterogeneity in public health policy by combining Cluster Analysis and Multi-Criteria Decision Analysis: Proof of Method.

    PubMed

    Kaltoft, Mette Kjer; Turner, Robin; Cunich, Michelle; Salkeld, Glenn; Nielsen, Jesper Bo; Dowie, Jack

    2015-01-01

    The use of subgroups based on biological-clinical and socio-demographic variables to deal with population heterogeneity is well-established in public policy. The use of subgroups based on preferences is rare, except when religion based, and controversial. If it were decided to treat subgroup preferences as valid determinants of public policy, a transparent analytical procedure is needed. In this proof of method study we show how public preferences could be incorporated into policy decisions in a way that respects both the multi-criterial nature of those decisions, and the heterogeneity of the population in relation to the importance assigned to relevant criteria. It involves combining Cluster Analysis (CA), to generate the subgroup sets of preferences, with Multi-Criteria Decision Analysis (MCDA), to provide the policy framework into which the clustered preferences are entered. We employ three techniques of CA to demonstrate that not only do different techniques produce different clusters, but that choosing among techniques (as well as developing the MCDA structure) is an important task to be undertaken in implementing the approach outlined in any specific policy context. Data for the illustrative, not substantive, application are from a Randomized Controlled Trial of online decision aids for Australian men aged 40-69 years considering Prostate-specific Antigen testing for prostate cancer. We show that such analyses can provide policy-makers with insights into the criterion-specific needs of different subgroups. Implementing CA and MCDA in combination to assist in the development of policies on important health and community issues such as drug coverage, reimbursement, and screening programs poses major challenges (conceptual, methodological, ethical-political, and practical), but most are exposed by the techniques, not created by them.

  14. Identifying and Tracking Individual Updraft Cores using Cluster Analysis: A TWP-ICE case study

    NASA Astrophysics Data System (ADS)

    Li, X.; Tao, W.; Collis, S. M.; Varble, A.

    2013-12-01

    Cumulus parameterizations in GCMs depend strongly on the vertical velocity structures of convective updraft cores, or plumes. There hasn't been an accurate way of identifying these cores. The majority of previous studies treat the updraft as a single grid column entity, thus missing many intrinsic characteristics, e.g., the size, strength and spatial orientation of an individual core, its life cycle, and the time variations of the entrainment/detrainment rates associated with its life cycle. In this study, we attempt to apply an innovative algorithm based on the centroid-based k-means cluster analysis to improve our understanding of convection and its associated updraft cores. Both 3-D Doppler radar retrievals and cloud-resolving model simulations of a TWP-ICE campaign case during the monsoon period will be used to test and improve this algorithm. This will provide for more in-depth comparisons between CRM simulations and observations that were not possible previously using the traditional piecewise analysis with each updraft column. The first step is to identify the strongest cores (maximum velocity >10 m/s), since they are well defined and produce definite answers when the cluster analysis algorithm is applied. The preliminary results show that the radar-retrieved updraft cores are smaller in size and have their maximum velocity located uniformly at higher levels compared with model simulations. Overall, the model simulations produce much stronger cores compared with the radar retrievals. Within the model simulations, the bulk microphysical scheme simulation produces stronger cores than the spectral bin microphysical scheme. Planned research includes using high temporal-resolution simulations to further track the life cycle of individual updraft cores and study their characteristics.
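
    A minimal sketch of the first step described above, under stated assumptions: grid points are filtered by a vertical-velocity threshold and the survivors are grouped into cores with plain k-means on their coordinates. The abstract applies the 10 m/s threshold to a core's maximum velocity, whereas this sketch applies it per grid point for simplicity. The grid values, the number of cores and the initial centroids are hypothetical, and the authors' actual algorithm may differ.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means (Lloyd's algorithm) with caller-supplied initial centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(len(centroids)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Recompute each centroid as the mean of its assigned points.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Hypothetical grid points as (x, y, z, w), with w the vertical velocity in m/s.
grid = [(1, 1, 5, 12.0), (1, 2, 6, 14.0), (2, 1, 6, 11.0), (9, 9, 8, 13.0),
        (9, 8, 9, 15.0), (5, 5, 3, 4.0), (8, 9, 9, 10.5)]

# Keep only strong updraft points (w > 10 m/s), then cluster their positions.
strong = [(x, y, z) for x, y, z, w in grid if w > 10.0]
labels, cores = kmeans(strong, [strong[0], strong[3]])
print(labels, cores)
```

    Each centroid then gives an approximate core position, and the spread of its members gives a rough size. Tracking a core through its life cycle would additionally require matching centroids between time steps, which this sketch does not attempt.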

  15. Molecular Reclassification of Crohn's Disease by Cluster Analysis of Genetic Variants

    PubMed Central

    Cleynen, Isabelle; Mahachie John, Jestinah M.; Henckaerts, Liesbet; Van Moerkercke, Wouter; Rutgeerts, Paul; Van Steen, Kristel; Vermeire, Severine

    2010-01-01

    Background Crohn's Disease (CD) has a heterogeneous presentation, and is typically classified according to extent and location of disease. The genetic susceptibility to CD is well known and genome-wide association scans (GWAS) and meta-analysis thereof have identified over 30 susceptibility loci. Except for the association between ileal CD and NOD2 mutations, efforts in trying to link CD genetics to clinical subphenotypes have not been very successful. We hypothesized that the large number of confirmed genetic variants enables (better) classification of CD patients. Methodology/Principal Findings To look for genetic-based subgroups, genotyping results of 46 SNPs identified from CD GWAS were analyzed by Latent Class Analysis (LCA) in CD patients and in healthy controls. Six genetic-based subgroups were identified in CD patients, which were significantly different from the five subgroups found in healthy controls. The identified CD-specific clusters are therefore likely to contribute to disease behavior. We then looked at whether we could relate the genetic-based subgroups to the currently used clinical parameters. Although modest differences in prevalence of disease location and behavior could be observed among the CD clusters, Random Forest analysis showed that patients could not be allocated to one of the 6 genetic-based subgroups based on the typically used clinical parameters alone. This points to a poor relationship between the genetic-based subgroups and the used clinical subphenotypes. Conclusions/Significance This approach serves as a first step to reclassify Crohn's disease. The used technique can be applied to other common complex diseases as well, and will help to complete patient characterization, in order to evolve towards personalized medicine. PMID:20886065

  16. Cluster analysis of Pinus taiwanensis for its ex situ conservation in China.

    PubMed

    Gao, X; Shi, L; Wu, Z

    2015-01-01

    Pinus taiwanensis Hayata is one of the most famous sights in the Huangshan Scenic Resort, China, because of its strong adaptability and ability to survive; however, this endemic species is currently under threat in China. Relationships between different P. taiwanensis populations have been well-documented; however, few studies have been conducted on how to protect this rare pine. In the present study, we propose the ex situ conservation of this species using geographical information system (GIS) cluster and genetic diversity analyses. The GIS cluster method was conducted as a preliminary analysis for establishing a sampling site category based on climatic factors. Genetic diversity was analyzed using morphological and genetic traits. By combining geographical information with genetic data, we demonstrate that growing conditions, morphological traits, and the genetic make-up of the population in the Huangshan Scenic Resort were most similar to conditions on Tianmu Mountain. Therefore, we suggest that Tianmu Mountain is the best choice for the ex situ conservation of P. taiwanensis. Our results provide a molecular basis for the sustainable management, utilization, and conservation of this species in Huangshan Scenic Resort.

  17. Outcome of patients with autoimmune diseases in the intensive care unit: a mixed cluster analysis

    PubMed Central

    Bernal-Macías, Santiago; Reyes-Beltrán, Benjamín; Molano-González, Nicolás; Augusto Vega, Daniel; Bichernall, Claudia; Díaz, Luis Aurelio; Rojas-Villarraga, Adriana; Anaya, Juan-Manuel

    2015-01-01

    Objectives: Interest in autoimmune diseases (ADs) and their outcome in the intensive care unit (ICU) has increased due to the clinical challenge for diagnosis and management as well as for prognosis. The current work presents a one-year experience on these topics in a tertiary hospital. Methods: The mixed-cluster methodology based on multivariate descriptive methods such as principal component analysis and multiple correspondence analyses was performed to summarize sets of related variables with strong associations and common clinical context. Results: Fifty adult patients with ADs with a mean age of 46.7±17.55 years were assessed. The two most common diagnoses were systemic lupus erythematosus and systemic sclerosis, registered in 45% and 20% of patients, respectively. The main causes of admission to ICU were infection and AD flare-up, observed in 36% and 24%, respectively. Mortality during ICU stay was 24%. The length of hospital stay before ICU admission, shock, vasopressors, mechanical ventilation, abdominal sepsis, Glasgow score and plasmapheresis were all factors associated with mortality. Two new clinical cluster variables (NCVs) were defined: Time ICU and ICU Support Profile, which were associated with survivor and non-survivor variables. Conclusions: Identification of single factors and groups of factors from NCVs will allow implementation of early and aggressive therapies in patients with ADs at the ICU in order to avoid fatal outcomes. PMID:26688741

  18. Active Tectonics of Southern California Revealed by Cluster Analysis of GPS Velocities

    NASA Astrophysics Data System (ADS)

    Thatcher, W. R.; Savage, J. C.; Simpson, R. W.

    2013-12-01

    We use cluster analysis of the USGS National Seismic Hazard Map GPS velocity field for southern California with standard deviations < 1 mm/yr to determine velocity gradients that locate the most important faults, the elastic strain associated with them, and regions of possible block-like behavior. Seven to ten well resolved clusters are statistically significant and spatially distinct with small overlap. In map view (see figure), the 7-cluster solution shows bands of relatively constant velocity sub-parallel to the San Andreas (SAF) and San Jacinto (SJF) faults and the major faults of the eastern Mojave shear zone (EMSZ). These bands are due both to elastic strain accumulation on the SAF and relative motion across lower slip rate faults in the EMSZ and Los Angeles and Ventura basins. At the largest scale, the 7-cluster map shows two main trends. The blue dots define the SJ and SA faults from northwest of the Salton Sea (SS) to Parkfield (P); the grey/magenta boundary suggests that the defined Eastern California Shear Zone could be extended farther south to the Salton Sea. The short ~80-km-long San Gorgonio Pass-San Bernardino Mountains (SGP) segment of the SAF has a much lower slip rate, ~7 mm/yr of right-lateral oblique convergence. As generally shown by previous GPS studies, right-lateral strike-slip movement rates vary considerably along the SAF. In the Imperial Valley (IV) the rate is ~40 mm/yr; east of the Salton Sea it drops to ~20 mm/yr, with 10-15 mm/yr having been shunted westward to the SJF; north of the Salton Sea ~10-15 mm/yr of strike-slip is transferred to the faults of the eastern Mojave; therefore the east-trending faults of San Gorgonio Pass (SGP) take up only ~5 mm/yr of strike slip and ~equal amounts of north-south shortening; on the Mojave (M) segment of the SAF the slip rate increases to ~15-20 mm/yr in the vicinity of Cajon Pass (CP) because of transfer of SJF slip back onto the San Andreas; northwest of Tejon Pass the rate increases again to

  19. The Asiago-ESO/RASS QSO Survey. III. Clustering Analysis and Theoretical Interpretation

    NASA Astrophysics Data System (ADS)

    Grazian, Andrea; Negrello, Mattia; Moscardini, Lauro; Cristiani, Stefano; Haehnelt, Martin G.; Matarrese, Sabino; Omizzolo, Alessandro; Vanzella, Eros

    2004-02-01

    This is the third paper in a series describing the Asiago-ESO/RASS QSO Survey (AERQS), a project aimed at the construction of an all-sky statistically well-defined sample of relatively bright quasi-stellar objects (QSOs; $B \le 15$) at $z \le 0.3$. We present here the clustering analysis of the full spectroscopically identified database (392 active galactic nuclei [AGNs]). The clustering signal at 0.02clustered in a way similar to radio galaxies, extremely red objects (EROs), and early-type galaxies in general, although with a marginally smaller amplitude. The comparison with recent results from the Two Degree Field (2dF) QSO Redshift Survey (2QZ) shows that the correlation function of QSOs is constant in redshift or marginally increasing toward low redshift. We discuss this behavior with physically motivated models, deriving interesting constraints on the typical mass of the dark matter halos hosting QSOs, $M_{\rm DMH} \sim 10^{12.7}\,h^{-1}\,M_\odot$ ($10^{12.0}$-$10^{13.5}\,h^{-1}\,M_\odot$ at the $1\sigma$ confidence level). Finally, we use the clustering data to infer the physical properties of local AGNs, obtaining $M_{\rm BH} \sim 2 \times 10^{8}\,h^{-1}\,M_\odot$ ($1 \times 10^{7}$-$3 \times 10^{9}\,h^{-1}\,M_\odot$) for the mass of the active black holes, $\tau_{\rm AGN} \sim 8 \times 10^{6}$ yr ($2 \times 10^{6}$-$5 \times 10^{7}$ yr) for their lifetime and $\eta \sim 0.14$ for their efficiency (always for a ΛCDM model). Based on observations collected at the European Southern Observatory, Chile (ESO P66.A-0277 and ESO P67.A-0537), with the Steward Observatory in Arizona and the National Telescope Galileo (TNG) during period AO3.

  20. Three-dimensional Multi-probe Analysis of the Galaxy Cluster A1689

    NASA Astrophysics Data System (ADS)

    Umetsu, Keiichi; Sereno, Mauro; Medezinski, Elinor; Nonino, Mario; Mroczkowski, Tony; Diego, Jose M.; Ettori, Stefano; Okabe, Nobuhiro; Broadhurst, Tom; Lemze, Doron

    2015-06-01

    We perform a three-dimensional multi-probe analysis of the rich galaxy cluster A1689, one of the most powerful known lenses on the sky, by combining improved weak-lensing data from new wide-field $BVR_{\rm C}i'z'$ Subaru/Suprime-Cam observations with strong-lensing, X-ray, and Sunyaev–Zel’dovich effect (SZE) data sets. We reconstruct the projected matter distribution from a joint weak-lensing analysis of two-dimensional shear and azimuthally integrated magnification constraints, the combination of which allows us to break the mass-sheet degeneracy. The resulting mass distribution reveals elongation with an axis ratio of ∼0.7 in projection, aligned well with the distributions of cluster galaxies and intracluster gas. When assuming a spherical halo, our full weak-lensing analysis yields a projected halo concentration of $c_{200c}^{2D} = 8.9 \pm 1.1$ ($c_{\rm vir}^{2D} \sim 11$), consistent with and improved from earlier weak-lensing work. We find excellent consistency between independent weak and strong lensing in the region of overlap. In a parametric triaxial framework, we constrain the intrinsic structure and geometry of the matter and gas distributions, by combining weak/strong lensing and X-ray/SZE data with minimal geometric assumptions. We show that the data favor a triaxial geometry with minor–major axis ratio 0.39 ± 0.15 and major axis closely aligned with the line of sight (22° ± 10°). We obtain a halo mass $M_{200c} = (1.2 \pm 0.2) \times 10^{15}\,M_\odot\,h^{-1}$ and a halo concentration $c_{200c} = 8.4 \pm 1.3$, which overlaps with the ≳1σ tail of the predicted distribution. The shape of the gas is rounder than the underlying matter but quite elongated with minor–major axis ratio 0.60 ± 0.14. The gas mass fraction within 0.9 Mpc is $10^{+3}_{-2}\%$, a typical value for high-mass clusters. The thermal gas pressure contributes to ∼60% of the equilibrium pressure, indicating a significant level of non-thermal pressure support. When compared to Planck's hydrostatic mass estimate

  1. Comparative analysis of a conserved zinc finger gene cluster on human chromosome 19q and mouse chromosome 7

    SciTech Connect

    Shannon, M.; Mucenski, M.L.; Stubbs, L.

    1996-04-01

    Several lines of evidence now suggest that many of the zinc-finger-containing (ZNF) genes in the human genome are arranged in clusters. However, little is known about the structure or function of the clusters or about their conservation throughout evolution. Here, we report the analysis of a conserved ZNF gene cluster located in human chromosome 19q13.2 and mouse chromosome 7. Our results indicate that the human cluster consists of at least 10 related Kruppel-associated box (KRAB)-containing ZNF genes organized in tandem over a distance of 350-450 kb. Two cDNA clones representing genes in the murine cluster have been studied in detail. The KRAB A domains of these genes are nearly identical and are highly similar to human 19q13.2-derived KRAB sequences, but DNA-binding ZNF domains and other portions of the genes differ considerably. The two murine genes display distinct expression patterns, but are coexpressed in some adult tissues. These studies pave the way for a systematic analysis of the evolution of structure and function of genes within the numerous clustered ZNF families located on human chromosome 19 and elsewhere in the human and mouse genomes. 32 refs., 7 figs.

  2. VizieR Online Data Catalog: Slug analysis of star clusters in NGC 628 & 7793 (Krumholz+, 2015)

    NASA Astrophysics Data System (ADS)

    Krumholz, M. R.; Adamo, A.; Fumagalli, M.; Wofford, A.; Calzetti, D.; Lee, J. C.; Whitmore, B. C.; Bright, S. N.; Grasha, K.; Gouliermis, D. A.; Kim, H.; Nair, P.; Ryon, J. E.; Smith, L. J.; Thilker, D.; Ubeda, L.; Zackrisson, E.

    2016-02-01

    In this paper we use slug, the Stochastically Lighting Up Galaxies code (da Silva et al. 2012ApJ...745..145D, 2014MNRAS.444.3275D; Krumholz et al. 2015MNRAS.452.1447K), and its post-processing tool for analysis of star cluster properties, cluster_slug, to analyze an initial sample of clusters from the LEGUS (Calzetti et al. 2015AJ....149...51C). A description of the steps required to produce final cluster catalogs of the Legacy Extragalactic UV Survey (LEGUS) targets can be found in Calzetti et al. (2015AJ....149...51C), and in A. Adamo et al. (2015, in preparation). LEGUS is an HST Cycle 21 Treasury program that is imaging 50 nearby galaxies in five broadbands with the WFC3/UVIS, from the NUV to the I band. (1 data file).

  3. Data Mining of University Philanthropic Giving: Cluster-Discriminant Analysis and Pareto Effects

    ERIC Educational Resources Information Center

    Le Blanc, Louis A.; Rucks, Conway T.

    2009-01-01

    A large sample of 33,000 university alumni records was cluster-analyzed to generate six groups relatively unique in their respective attribute values. The attributes used to cluster the former students included average gift to the university's foundation and to the alumni association for the same institution. Cluster detection is useful in this…

  4. Spherical harmonic analysis of particle velocity distribution function: Comparison of moments and anisotropies using Cluster data

    NASA Astrophysics Data System (ADS)

    Viñas, Adolfo F.; Gurgiolo, Chris

    2009-01-01

    This paper presents a spherical harmonic analysis of the plasma velocity distribution function using high-angular, energy, and time resolution Cluster data obtained from the PEACE spectrometer instrument to demonstrate how this analysis models the particle distribution function and its moments and anisotropies. The results show that spherical harmonic analysis produced a robust physical representation model of the velocity distribution function, resolving the main features of the measured distributions. From the spherical harmonic analysis, a minimum set of nine spectral coefficients was obtained from which the moment (up to the heat flux), anisotropy, and asymmetry calculations of the velocity distribution function were obtained. The spherical harmonic method provides a potentially effective "compression" technique that can be easily carried out onboard a spacecraft to determine the moments and anisotropies of the particle velocity distribution function for any species. These calculations were implemented using three different approaches, namely, the standard traditional integration, the spherical harmonic (SPH) spectral coefficients integration, and the singular value decomposition (SVD) on the spherical harmonic methods. A comparison among the various methods shows that both SPH and SVD approaches provide remarkable agreement with the standard moment integration method.

  5. Global Multilocus Sequence Typing Analysis of Mycoplasma bovis Isolates Reveals Two Main Population Clusters

    PubMed Central

    Churchward, C. P.; Schnee, C.; Sachse, K.; Lysnyansky, I.; Catania, S.; Iob, L.; Ayling, R. D.; Nicholas, R. A. J.

    2014-01-01

    Mycoplasma bovis is a major bovine pathogen associated with bovine respiratory disease complex and is responsible for substantial economic losses worldwide. M. bovis is also associated with other clinical presentations in cattle, including mastitis, otitis, arthritis, and reproductive disorders. To gain a better understanding of the genetic diversity of this pathogen, a multilocus sequence typing (MLST) scheme was developed and applied to the characterization of 137 M. bovis isolates from diverse geographical origins, obtained from healthy or clinically infected cattle. After in silico analysis, a final set of 7 housekeeping genes was selected (dnaA, metS, recA, tufA, atpA, rpoD, and tkt). MLST analysis demonstrated the presence of 35 different sequence types (STs) distributed in two main clonal complexes (CCs), defined at the double-locus variant level, namely, CC1, which included most of the British and German isolates, and CC2, which was a more heterogeneous and geographically distant group of isolates, including European, Asian, and Australian samples. Index of association analysis confirmed the clonal nature of the investigated M. bovis population, based on MLST data. This scheme has demonstrated high discriminatory power, with the analysis showing the presence of genetically distant and divergent clusters of isolates predominantly associated with geographical origins. PMID:25540400
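
    As a minimal sketch of how sequence types could be grouped into clonal complexes at the double-locus variant level, the Python below links two allelic profiles when they differ at no more than two of the seven loci and treats each connected group as one complex. The allele numbers and ST labels are invented for illustration, and the single-linkage grouping rule is an assumption rather than the authors' documented procedure.

```python
from itertools import combinations

# The seven housekeeping loci named in the abstract.
LOCI = ("dnaA", "metS", "recA", "tufA", "atpA", "rpoD", "tkt")


def locus_differences(profile_a, profile_b):
    """Count the loci at which two allelic profiles carry different alleles."""
    return sum(1 for a, b in zip(profile_a, profile_b) if a != b)


def clonal_complexes(profiles, max_diff=2):
    """Group STs whose profiles are chained by links of <= max_diff loci.

    profiles: dict mapping an ST label to a tuple of allele numbers,
    ordered as in LOCI. Returns a sorted list of sorted ST-label lists.
    """
    parent = {st: st for st in profiles}

    def find(st):
        # Union-find root lookup with path halving.
        while parent[st] != st:
            parent[st] = parent[parent[st]]
            st = parent[st]
        return st

    for a, b in combinations(profiles, 2):
        if locus_differences(profiles[a], profiles[b]) <= max_diff:
            parent[find(a)] = find(b)

    groups = {}
    for st in profiles:
        groups.setdefault(find(st), []).append(st)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical allelic profiles (not data from the study).
profiles = {
    "ST1": (1, 1, 1, 1, 1, 1, 1),
    "ST2": (1, 1, 2, 1, 1, 1, 1),  # single-locus variant of ST1
    "ST3": (1, 1, 2, 3, 1, 1, 1),  # double-locus variant of ST1
    "ST4": (5, 6, 7, 8, 9, 1, 1),  # differs from ST1 at five loci
    "ST5": (5, 6, 7, 8, 9, 2, 1),  # single-locus variant of ST4
}
complexes = clonal_complexes(profiles)
print(complexes)
```

    With these invented profiles the sketch yields two groups, mirroring the two-complex structure described above only in shape, not in content.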

  6. Spherical Harmonic Analysis of Particle Velocity Distribution Function: Comparison of Moments and Anisotropies using Cluster Data

    NASA Technical Reports Server (NTRS)

    Gurgiolo, Chris; Vinas, Adolfo F.

    2009-01-01

    This paper presents a spherical harmonic analysis of the plasma velocity distribution function using high-angular, energy, and time resolution Cluster data obtained from the PEACE spectrometer instrument to demonstrate how this analysis models the particle distribution function and its moments and anisotropies. The results show that spherical harmonic analysis produced a robust physical representation model of the velocity distribution function, resolving the main features of the measured distributions. From the spherical harmonic analysis, a minimum set of nine spectral coefficients was obtained from which the moment (up to the heat flux), anisotropy, and asymmetry calculations of the velocity distribution function were obtained. The spherical harmonic method provides a potentially effective "compression" technique that can be easily carried out onboard a spacecraft to determine the moments and anisotropies of the particle velocity distribution function for any species. These calculations were implemented using three different approaches, namely, the standard traditional integration, the spherical harmonic (SPH) spectral coefficients integration, and the singular value decomposition (SVD) on the spherical harmonic methods. A comparison among the various methods shows that both SPH and SVD approaches provide remarkable agreement with the standard moment integration method.

  7. The Statistical Analysis of stars with Hα emission in IC 348 Cluster

    NASA Astrophysics Data System (ADS)

    Nikoghosyan, E. H.; Vardanyan, A. V.; Khachatryan, K. G.

    2016-09-01

    In this work we present the results of a statistical analysis of ~200 stars with Hα emission in the IC 348 cluster. The sample is complete down to R < 20.0. The percentage of emission stars increases from brighter to fainter objects and reaches 80% in the range 13.0 ≤ R-AR ≤ 19.0. The ratio between WTTau and CTTau objects is 64% to 36%. Of the X-ray sources, 70% are WTTau stars. The age of the WTTau and CTTau objects is ~2·10^6 years. The age of the non-emission stars with less than a solar mass is also ~2·10^6 years, but the more massive non-emission objects are "older", with an age of ~7·10^6 years.

  8. Automatic clustering and population analysis of white matter tracts using maximum density paths.

    PubMed

    Prasad, Gautam; Joshi, Shantanu H; Jahanshad, Neda; Villalon-Reina, Julio; Aganj, Iman; Lenglet, Christophe; Sapiro, Guillermo; McMahon, Katie L; de Zubicaray, Greig I; Martin, Nicholas G; Wright, Margaret J; Toga, Arthur W; Thompson, Paul M

    2014-08-15

    We introduce a framework for population analysis of white matter tracts based on diffusion-weighted images of the brain. The framework enables extraction of fibers from high angular resolution diffusion images (HARDI); clustering of the fibers based partly on prior knowledge from an atlas; representation of the fiber bundles compactly using a path following points of highest density (maximum density path; MDP); and registration of these paths together using geodesic curve matching to find local correspondences across a population. We demonstrate our method on 4-Tesla HARDI scans from 565 young adults to compute localized statistics across 50 white matter tracts based on fractional anisotropy (FA). Experimental results show increased sensitivity in the determination of genetic influences on principal fiber tracts compared to the tract-based spatial statistics (TBSS) method. Our results show that the MDP representation reveals important parts of the white matter structure and considerably reduces the dimensionality over comparable fiber matching approaches. PMID:24747738

  9. Stream gradient Hotspot and Cluster Analysis (SL-HCA) for improving the longitudinal profiles metrics

    NASA Astrophysics Data System (ADS)

    Troiani, Francesco; Piacentini, Daniela; Della Seta, Marta

    2016-04-01

    Many studies have successfully analysed stream longitudinal profiles through the Stream Length-gradient (SL) index to detect, at different spatial scales, either tectonic structures or hillslope processes. The analysis and interpretation of the spatial variability of SL values, at both regional and local scales, is often complicated by the concomitance of different factors generating SL anomalies, including the bedrock composition. The creation of lithologically-filtered SL maps is often problematic in areas where homogeneously surveyed geological maps with sufficient resolution are unavailable. Moreover, both the SL map classification and unbiased anomaly detection are rather difficult. For instance, what is the best threshold to define anomalous SL values? Further, is there a minimum along-channel extent of anomalous SL values for objectively defining over-steepened segments on long-profiles? This research investigates the relevance and potential of a new approach based on Hotspot and Cluster Analysis of SL values (SL-HCA) for detecting knickzones on long-profiles at a regional scale and for fine-tuning the interpretation of their geological-geomorphological meaning. We developed this procedure within a 2800 km² area located in the mountainous sector of the Northern Apennines of Italy. The Getis-Ord Gi* statistic is applied for the SL-HCA approach. The value of SL, calculated from a 5 × 5 m Digital Elevation Model, is used as the weighting factor, and the Gi* index is calculated for each 50 m-long channel segment of the whole fluvial system. The outcomes indicate that high positive Gi* values imply the clustering of SL anomalies, and thus the occurrence of knickzones on the stream long-profiles. Results show that high and very high Gi* values (i.e. values beyond two standard deviations from the mean) correlate well with the principal knickzones detected with existing lithologically-filtered SL maps. Field checks and remote sensing
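
    To illustrate how a Getis-Ord Gi* score flags clustered high values, the sketch below computes the statistic along a single chain of channel segments. It assumes binary spatial weights over a fixed window of neighbouring segments (the abstract does not state the weighting scheme), and the SL values are invented for illustration.

```python
import math


def getis_ord_gi_star(values, half_window=1):
    """Gi* z-scores for a 1-D chain of segments with binary weights.

    Each segment's neighbourhood is itself plus up to `half_window`
    segments on either side (an assumed weighting scheme). Requires
    non-constant input and a window smaller than the whole chain,
    otherwise the denominator is zero.
    """
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation of all values, as in the Gi* definition.
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    scores = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        w_sum = hi - lo  # binary weights: sum(w) == sum(w^2) == neighbour count
        local_sum = sum(values[lo:hi])
        denom = s * math.sqrt((n * w_sum - w_sum ** 2) / (n - 1))
        scores.append((local_sum - mean * w_sum) / denom)
    return scores


# Hypothetical SL values: a flat profile with one steep three-segment reach.
sl_values = [100.0] * 10 + [900.0, 950.0, 1000.0] + [100.0] * 10
scores = getis_ord_gi_star(sl_values, half_window=1)
hotspots = [i for i, z in enumerate(scores) if z > 2.0]
print(hotspots)
```

    Segments whose score exceeds 2 correspond to the "beyond two standard deviations" reading of a hotspot; a real application would derive the neighbourhood from the channel network rather than from list order.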

  10. Finding Clothing That Fit through Cluster Analysis and Objective Interestingness Measures

    NASA Astrophysics Data System (ADS)

    Peña, Isis; Viktor, Herna L.; Paquet, Eric

    Clothes should fit consumers well, be aesthetically pleasing and comfortable. However, repeated studies of customers’ levels of satisfaction indicate that this is often not the case. For example, more robust males often find it difficult to find pants that are the correct length and fit their waists well. What, then, are the typical body profiles of the population? Would it be possible to identify the measurements that are of importance for different sizes and genders? Furthermore, assuming that we have access to an anthropometric database would there be a way to guide the data mining process to discover only those relevant body measurements that are of the most interest for apparel designers? This paper describes our results when addressing these questions through cluster analysis and interestingness measures-based feature selection. We explore a database containing anthropometric measurements as well as 3-D body scans, of a representative sample of the Dutch population.

  11. Temporary disaster debris management site identification using binomial cluster analysis and GIS.

    PubMed

    Grzeda, Stanislaw; Mazzuchi, Thomas A; Sarkani, Shahram

    2014-04-01

    An essential component of disaster planning and preparation is the identification and selection of temporary disaster debris management sites (DMS). However, since DMS identification is a complex process involving numerous variable constraints, many regional, county and municipal jurisdictions initiate this process during the post-disaster response and recovery phases, typically a period of severely stressed resources. Hence, a pre-disaster approach in identifying the most likely sites based on the number of locational constraints would significantly contribute to disaster debris management planning. As disasters vary in their nature, location and extent, an effective approach must facilitate scalability, flexibility and adaptability to variable local requirements, while also being generalisable to other regions and geographical extents. This study demonstrates the use of binomial cluster analysis in potential DMS identification in a case study conducted in Hamilton County, Indiana.

  12. Temporary disaster debris management site identification using binomial cluster analysis and GIS.

    PubMed

    Grzeda, Stanislaw; Mazzuchi, Thomas A; Sarkani, Shahram

    2014-04-01

    An essential component of disaster planning and preparation is the identification and selection of temporary disaster debris management sites (DMS). However, since DMS identification is a complex process involving numerous variable constraints, many regional, county and municipal jurisdictions initiate this process during the post-disaster response and recovery phases, typically a period of severely stressed resources. Hence, a pre-disaster approach in identifying the most likely sites based on the number of locational constraints would significantly contribute to disaster debris management planning. As disasters vary in their nature, location and extent, an effective approach must facilitate scalability, flexibility and adaptability to variable local requirements, while also being generalisable to other regions and geographical extents. This study demonstrates the use of binomial cluster analysis in potential DMS identification in a case study conducted in Hamilton County, Indiana. PMID:24601923

  13. Cluster analysis of the origins of the new influenza A(H1N1) virus.

    PubMed

    Solovyov, A; Palacios, G; Briese, T; Lipkin, W I; Rabadan, R

    2009-05-28

    In March and April 2009, a new strain of influenza A(H1N1) virus was isolated in Mexico and the United States. Since the initial reports, more than 10,000 cases have been reported to the World Health Organization from all around the world. Several hundred isolates have already been sequenced and deposited in public databases. We have studied the genetics of the new strain and identified its closest relatives through a cluster analysis approach. We show that the new virus combines genetic information related to different swine influenza viruses. Segments PB2, PB1, PA, HA, NP and NS are related to swine H1N2 and H3N2 influenza viruses isolated in North America. Segments NA and M are related to swine influenza viruses isolated in Eurasia. PMID:19480812

  14. Automatic Clustering and Population Analysis of White Matter Tracts using Maximum Density Paths

    PubMed Central

    Prasad, Gautam; Joshi, Shantanu H.; Jahanshad, Neda; Villalon-Reina, Julio; Aganj, Iman; Lenglet, Christophe; Sapiro, Guillermo; McMahon, Katie L.; de Zubicaray, Greig I.; Martin, Nicholas G.; Wright, Margaret J.; Toga, Arthur W.; Thompson, Paul M.

    2014-01-01

    We introduce a framework for population analysis of white matter tracts based on diffusion-weighted images of the brain. The framework enables extraction of fibers from high angular resolution diffusion images (HARDI); clustering of the fibers based partly on prior knowledge from an atlas; representation of the fiber bundles compactly using a path following points of highest density (maximum density path; MDP); and registration of these paths together using geodesic curve matching to find local correspondences across a population. We demonstrate our method on 4-Tesla HARDI scans from 565 young adults to compute localized statistics across 50 white matter tracts based on fractional anisotropy (FA). Experimental results show increased sensitivity in the determination of genetic influences on principal fiber tracts compared to the tract-based spatial statistics (TBSS) method. Our results show that the MDP representation reveals important parts of the white matter structure and considerably reduces the dimensionality over comparable fiber matching approaches. PMID:24747738

  15. Structure and substructure analysis of DAFT/FADA galaxy clusters in the [0.4-0.9] redshift range

    NASA Astrophysics Data System (ADS)

    Guennou, L.; Adami, C.; Durret, F.; Lima Neto, G. B.; Ulmer, M. P.; Clowe, D.; LeBrun, V.; Martinet, N.; Allam, S.; Annis, J.; Basa, S.; Benoist, C.; Biviano, A.; Cappi, A.; Cypriano, E. S.; Gavazzi, R.; Halliday, C.; Ilbert, O.; Jullo, E.; Just, D.; Limousin, M.; Márquez, I.; Mazure, A.; Murphy, K. J.; Plana, H.; Rostagni, F.; Russeil, D.; Schirmer, M.; Slezak, E.; Tucker, D.; Zaritsky, D.; Ziegler, B.

    2014-01-01

    Context. The DAFT/FADA survey is based on the study of ~90 rich (masses found in the literature >2 × 10^14 M⊙) and moderately distant clusters (redshifts 0.4 < z < 0.9), all with HST imaging data available. This survey has two main objectives: to constrain dark energy (DE) using weak lensing tomography on galaxy clusters and to build a database (deep multi-band imaging allowing photometric redshift estimates, spectroscopic data, X-ray data) of rich distant clusters to study their properties. Aims: We analyse the structures of all the clusters in the DAFT/FADA survey for which XMM-Newton and/or a sufficient number of galaxy redshifts in the cluster range are available, with the aim of detecting substructures and evidence for merging events. These properties are discussed in the framework of standard cold dark matter (ΛCDM) cosmology. Methods: In X-rays, we analysed the XMM-Newton data available, fit a β-model, and subtracted it to identify residuals. We used Chandra data, when available, to identify point sources. In the optical, we applied a Serna & Gerbal (SG) analysis to clusters with at least 15 spectroscopic galaxy redshifts available in the cluster range. We discuss the substructure detection efficiencies of both methods. Results: XMM-Newton data were available for 32 clusters, for which we derive the X-ray luminosity and a global X-ray temperature for 25 of them. For 23 clusters we were able to fit the X-ray emissivity with a β-model and subtract it to detect substructures in the X-ray gas. A dynamical analysis based on the SG method was applied to the clusters having at least 15 spectroscopic galaxy redshifts in the cluster range: 18 X-ray clusters and 11 clusters with no X-ray data. The choice of a minimum number of 15 redshifts implies that only major substructures will be detected. Ten substructures were detected both in X-rays and by the SG method. Most of the substructures detected both in X-rays and with the SG method are probably at their first

  16. The governance mechanism of industrial clusters based on the analysis of the demand shock

    NASA Astrophysics Data System (ADS)

    Wang, Shu-xian

    2009-07-01

    Based on a re-study on the organizational characteristics of industry clusters, the paper analyzes the interactive relation between the industry clusters and network organizations. The types of clusters are re-classified in terms of different responses to demand shock. It suggests that the actual firms and virtual firms be regarded as the core type of industry clusters. With the new thoughts of the governance mechanism, the paper makes the classical corporate governance theory applicable to the research on industry clusters governance.

  17. PCR screening and sequence analysis of iol clusters in Lactobacillus casei strains isolated from koumiss.

    PubMed

    Zhang, W; Sun, Z; Sun, T; Zhang, H

    2010-11-01

    The iol cluster (consisting of genes involved in myo-inositol utilization) was investigated in Lactobacillus casei strains isolated from koumiss. Ten strains were tested for the presence of iol cluster by PCR screening; three strains encoded this cluster. Full-sequencing procedure was conducted; the iol cluster was identical to that of L. casei BL23 (GenBank access. no. FM177140) except for an upstream transposase. The iol cluster is not a common feature for L. casei strains isolated from koumiss. PMID:21253906

  18. Application of cluster analysis and autoregressive neural networks for the noise diagnostics of the IBR-2M reactor

    NASA Astrophysics Data System (ADS)

    Pepelyshev, Yu. N.; Tsogtsaikhan, Ts.; Ososkov, G. A.

    2016-09-01

    The pattern recognition methodologies and artificial neural networks were used widely for the IBR-2M pulsed reactor noise diagnostics. The cluster analysis allows a detailed study of the structure and fast reactivity effects of IBR-2M and nonlinear autoregressive neural network (NAR) with local feedback connection allows predicting slow reactivity effects. In this work we present results of a study on pulse energy noise dynamics and prediction of liquid sodium flow rate through the core of the IBR-2M reactor using cluster analysis and an artificial neural network.

  19. Different Patterns of the Urban Heat Island Intensity from Cluster Analysis

    NASA Astrophysics Data System (ADS)

    Silva, F. B.; Longo, K.

    2014-12-01

    This study analyzes the different variability patterns of the Urban Heat Island intensity (UHII) in the Metropolitan Area of Rio de Janeiro (MARJ), one of the largest urban agglomerations in Brazil. The UHII is defined as the difference in the surface air temperature between the urban/suburban and rural/vegetated areas. To choose one or more stations that represent those areas we used the technique of cluster analysis on the air temperature observations from 14 surface weather stations in the MARJ. The cluster analysis aims to classify objects based on their characteristics, gathering similar groups. The results show homogeneity patterns between air temperature observations, with 6 homogeneous groups being defined. Among those groups, one might be a natural choice for the representative urban area (Central station); one corresponds to the suburban area (Afonsos station); and another group, referred to as the rural area, is composed of three stations (Ecologia, Santa Cruz and Xerém) that are located in vegetated regions. The arithmetic mean of temperature from the three rural stations is taken to represent the rural station temperature. The UHII is determined from these homogeneous groups. The first UHII is estimated from urban and rural temperature areas (Case 1), whilst the second UHII is obtained from suburban and rural temperature areas (Case 2). In Case 1, the maximum UHII occurs in two periods, one in the early morning and the other at night, while the minimum UHII occurs in the afternoon. In Case 2, the maximum UHII is observed during afternoon/night and the minimum during dawn/early morning. This study demonstrates that the choice of stations yields different UHII patterns, evidencing that distinct behaviors of this phenomenon can be identified.
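
    The UHII definition above reduces to a simple difference of time series. As a hedged sketch, the Python below subtracts the arithmetic mean of the three rural stations from an urban series, hour by hour. The station names come from the abstract; the temperatures and the function name are invented for illustration.

```python
from statistics import mean


def urban_heat_island_intensity(urban_temps, rural_station_temps):
    """Return urban temperature minus the mean rural temperature per time step.

    urban_temps: list of temperatures (deg C) for the urban (or suburban) station.
    rural_station_temps: dict of station name -> list of temperatures,
    aligned with urban_temps.
    """
    rural_mean = [
        mean(series[t] for series in rural_station_temps.values())
        for t in range(len(urban_temps))
    ]
    return [u - r for u, r in zip(urban_temps, rural_mean)]


# Hypothetical readings at three times of day (early morning, afternoon, night).
central = [24.0, 30.0, 27.0]
rural = {
    "Ecologia": [20.0, 29.0, 23.0],
    "Santa Cruz": [21.0, 30.0, 24.0],
    "Xerém": [22.0, 31.0, 25.0],
}
uhii_case1 = urban_heat_island_intensity(central, rural)
print(uhii_case1)
```

    Passing a suburban series in place of the urban one gives the Case 2 variant with the same rural reference.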

  20. Cluster analysis application identifies muscle characteristics of importance for beef tenderness

    PubMed Central

    2012-01-01

    Background: An important controversy in the relationship between beef tenderness and muscle characteristics including biochemical traits exists among meat researchers. The aim of this study is to explain variability in meat tenderness using muscle characteristics and biochemical traits available in the Integrated and Functional Biology of Beef (BIF-Beef) database. The BIF-Beef data warehouse contains characteristic measurements from animal, muscle, carcass, and meat quality derived from numerous experiments. We created three classes for tenderness (high, medium, and low) based on trained taste panel tenderness scores of all meat samples consumed (4,366 observations from 40 different experiments). For each tenderness class, the corresponding means for the mechanical characteristics, muscle fibre type, collagen content, and biochemical traits which may influence tenderness of the muscles were calculated. Results: Our results indicated that lower shear force values were associated with more tender meat. In addition, muscles in the highest tenderness cluster had the lowest total and insoluble collagen contents, the highest mitochondrial enzyme activity (isocitrate dehydrogenase), the highest proportion of slow oxidative muscle fibres, the lowest proportion of fast-glycolytic muscle fibres, and the lowest average muscle fibre cross-sectional area. Results were confirmed by correlation analyses, and differences between muscle types in terms of biochemical characteristics and tenderness score were evidenced by Principal Component Analysis (PCA). When the cluster analysis was repeated using only muscle samples from m. Longissimus thoracis (LT), the results were similar; only contrasting previous results by maintaining a relatively constant fibre-type composition between all three tenderness classes. Conclusion: Our results show that increased meat tenderness is related to lower shear forces, lower insoluble collagen and total collagen content, lower cross-sectional area of

  1. The Local Maximum Clustering Method and Its Application in Microarray Gene Expression Data Analysis

    NASA Astrophysics Data System (ADS)

    Wu, Xiongwu; Chen, Yidong; Brooks, Bernard R.; Su, Yan A.

    2004-12-01

    An unsupervised data clustering method, called the local maximum clustering (LMC) method, is proposed for identifying clusters in experiment data sets based on research interest. A magnitude property is defined according to research purposes, and data sets are clustered around each local maximum of the magnitude property. By properly defining a magnitude property, this method can overcome many difficulties in microarray data clustering such as reduced projection in similarities, noises, and arbitrary gene distribution. To critically evaluate the performance of this clustering method in comparison with other methods, we designed three model data sets with known cluster distributions and applied the LMC method as well as the hierarchic clustering method, the k-mean clustering method, and the self-organized map method to these model data sets. The results show that the LMC method produces the most accurate clustering results. As an example of application, we applied the method to cluster the leukemia samples reported in the microarray study of Golub et al. (1999).
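
    One plausible reading of "clustered around each local maximum of the magnitude property" is a hill-climbing assignment, sketched below in Python. Each point is linked to the highest-magnitude point within a fixed radius, and following those links leads to a local maximum that labels the cluster. The radius parameter, the Euclidean distance, and the sample data are all assumptions for illustration; the abstract does not specify the authors' exact algorithm.

```python
import math


def local_maximum_clustering(points, magnitude, radius):
    """Label each point with the index of the local maximum it climbs to.

    points: list of coordinate tuples.
    magnitude: list of magnitude-property values, one per point.
    radius: neighbourhood size (an assumed tuning parameter).
    """
    n = len(points)
    uphill = list(range(n))
    for i in range(n):
        best = i
        for j in range(n):
            close = math.dist(points[i], points[j]) <= radius
            if close and magnitude[j] > magnitude[best]:
                best = j
        uphill[i] = best  # stays i when no neighbour has a larger magnitude

    def summit(i):
        # Magnitude strictly increases along each link, so this terminates.
        while uphill[i] != i:
            i = uphill[i]
        return i

    return [summit(i) for i in range(n)]


# Hypothetical 1-D data with two magnitude peaks (at indices 1 and 4).
points = [(0.0,), (1.0,), (2.0,), (5.0,), (6.0,), (7.0,)]
magnitude = [1, 3, 2, 2, 5, 1]
labels = local_maximum_clustering(points, magnitude, radius=1.5)
print(labels)
```

    The number of clusters is not fixed in advance here; it follows from how many local maxima the chosen magnitude property and radius produce.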

  2. SEM-EDS investigation on PM10 data collected in Central Italy: Principal Component Analysis and Hierarchical Cluster Analysis

    PubMed Central

    2012-01-01

    Background: Principal Component Analysis (PCA) and Hierarchical Cluster Analysis (HCA) were applied to PM10 particle data in order to: identify particle clusters that can be differentiated on the basis of their chemical composition and morphology, investigate the relationship among the chemical and morphological parameters and evaluate differences among the sampling sites. PM10 was collected at 3 different sites in central Italy characterized by different conditions: yard, urban and rural sites. The concentrations of 20 chemical parameters (C, O, Na, Mg, Al, Si, P, Cd, Cl, K, Ca, Sn, Ti, Cr, Mn, Fe, Co, Ni, Cu, Zn) were determined by Scanning Electron Microscopy – Energy Dispersive X-ray Spectroscopy (SEM-EDS) and the particle images were processed by image analysis software in order to measure: Area, Aspect Ratio, Roundness, Fractal Dimension, Box Width, Box Height and Perimeter. Results: Results revealed the presence of different clusters of particles, differentiated on the basis of chemical composition and morphological parameters (aluminosilicates, calcium particles, biological particles, soot, cenosphere, sodium chloride, sulphates, metallic particles, iron spherical particles). Aluminosilicates and calcium particles of rural and urban sites showed a similar nature due to a mainly natural origin, while those of the yard site showed a more heterogeneous composition mainly related to human activity. Biological particles and soot can be differentiated on the basis of the higher loadings of Fractal Dimension, which characterizes soot, and the content of Na, Mg, Ca, Cl and K, which characterizes the biological ones. The soot of the urban site showed higher loadings of Roundness and Fractal Dimension than the soot belonging to the yard and rural sites; this was due to the different lifetimes of the particles. The metal particles, characterized mainly by the higher loading of iron, were present in two morphological forms: spherical and angular particles. The first were

  3. A study of area clustering using factor analysis in small area estimation (An analysis of per capita expenditures of subdistricts level in regency and municipality of Bogor)

    NASA Astrophysics Data System (ADS)

    Wahyudi, Notodiputro, Khairil Anwar; Kurnia, Anang; Anisa, Rahma

    2016-02-01

    Empirical Best Linear Unbiased Prediction (EBLUP) is an indirect estimation method used to estimate parameters of small areas. EBLUP uses area-level auxiliary variables together with area random effects. For non-sampled areas, the standard EBLUP can no longer be used because no information on the area random effects is available. To obtain a more suitable estimation method for non-sampled areas, the standard EBLUP model has to be modified by adding cluster information. The aim of this research was to study, by means of simulation, whether clustering methods using factor analysis provide better cluster information. The criterion used to evaluate the methods in the simulation study was the mean percentage of clustering accuracy. The results of the simulation study showed that the use of factor analysis in clustering increased the average percentage of accuracy, particularly with the Ward method. The method was then used to estimate per capita expenditures from SUSENAS based on Small Area Estimation (SAE) techniques, and the quality of the estimates was measured by RMSE. This research has shown that the modified EBLUP model with factor analysis provided better estimates than the standard EBLUP model and the modified EBLUP without factor analysis. Moreover, it was also shown that the clustering information is important in estimating non-sampled areas.

  4. Analysis of perceived similarity between pairs of microcalcification clusters in mammograms

    SciTech Connect

    Wang, Juan; Jing, Hao; Wernick, Miles N.; Yang, Yongyi; Nishikawa, Robert M.

    2014-05-15

    Purpose: Content-based image retrieval aims to assist radiologists by presenting example images with known pathology that are visually similar to the case being evaluated. In this work, the authors investigate several fundamental issues underlying the similarity ratings between pairs of microcalcification (MC) lesions on mammograms as judged by radiologists: the degree of variability in the similarity ratings, the impact of this variability on agreement between readers in retrieval of similar lesions, and the factors contributing to the readers’ similarity ratings. Methods: The authors conduct a reader study on a set of 1000 image pairs of MC lesions, in which a group of experienced breast radiologists rated the degree of similarity between each image pair. The image pairs are selected, from among possible pairings of 222 cases (110 malignant, 112 benign), based on quantitative image attributes (features) and the results of a preliminary reader study. Next, the authors apply analysis of variance (ANOVA) to quantify the level of variability in the readers’ similarity ratings, and study how the variability in individual reader ratings affects consistency between readers. The authors also measure the extent to which readers agree on images which are most similar to a given query, for which the Dice coefficient is used. To investigate how the similarity ratings potentially relate to the attributes underlying the cases, the authors study the fraction of perceptually similar images that also share the same benign or malignant pathology as the query image; moreover, the authors apply multidimensional scaling (MDS) to embed the cases according to their mutual perceptual similarity in a two-dimensional plot, which allows the authors to examine the manner in which similar lesions relate to one another in terms of benign or malignant pathology and clustered MCs. Results: The ANOVA results show that the coefficient of determination in the reader similarity ratings is 0

  5. Clustering analysis of high-redshift luminous red galaxies in Stripe 82

    NASA Astrophysics Data System (ADS)

    Nikoloudakis, N.; Shanks, T.; Sawangwit, U.

    2013-03-01

    We present a clustering analysis of luminous red galaxies (LRGs) in Stripe 82 from the Sloan Digital Sky Survey (SDSS). We study the angular two-point autocorrelation function, w(θ), of a selected sample of over 130 000 LRG candidates via colour-cut selections in izK with the K-band coverage coming from UKIRT (United Kingdom Infrared Telescope) Infrared Deep Sky Survey (UKIDSS) Large Area Survey (LAS). We have used the cross-correlation technique of Newman to establish the redshift distribution of the LRGs. Cross-correlating them with SDSS quasi-stellar objects (QSOs), MegaZ-LRGs and DEEP Extragalactic Evolutionary Probe 2 (DEEP2) galaxies, implies an average redshift of the LRGs to be z ≈ 1 with space density, $n_g \approx 3.20 \pm 0.16 \times 10^{-4}\,h^{3}\,\mathrm{Mpc}^{-3}$. For θ ≤ 10 arcmin (corresponding to ≈10 $h^{-1}$ Mpc), the LRG w(θ) significantly deviates from a conventional single power law as noted by previous clustering studies of highly biased and luminous galaxies. A double power law with a break at $r_b \approx 2.4\,h^{-1}$ Mpc fits the data better, with best-fitting scale length, $r_{0,1} = 7.63 \pm 0.27\,h^{-1}$ Mpc and slope $\gamma_1 = 2.01 \pm 0.02$ at small scales and $r_{0,2} = 9.92 \pm 0.40\,h^{-1}$ Mpc and $\gamma_2 = 1.64 \pm 0.04$ at large scales. Due to the flat slope at large scales, we find that a standard Λ cold dark matter (ΛCDM) linear model is accepted only at 2-3σ, with the best-fitting bias factor, b = 2.74 ± 0.07. We also fitted the halo occupation distribution (HOD) models to compare our measurements with the predictions of the dark matter clustering. The effective halo mass of Stripe 82 LRGs is estimated as $M_{\mathrm{eff}} = 3.3 \pm 0.6 \times 10^{13}\,h^{-1}\,M_\odot$. But at large scales, the current HOD models did not help explain the power excess in the clustering signal. We then compare the w(θ) results to the results of Sawangwit et al. from three samples of photometrically selected LRGs at lower redshifts to measure clustering evolution. We find that a long-lived model may be a poorer fit than at lower

  6. Image Segmentation By Cluster Analysis Of High Resolution Textured SPOT Images

    NASA Astrophysics Data System (ADS)

    Slimani, M.; Roux, C.; Hillion, A.

    1986-04-01

    Textural analysis is now a commonly used technique in digital image processing. In this paper, we present an application of textural analysis to high resolution SPOT satellite images. The purpose of the methodology is to improve classification results, i.e. image segmentation in remote sensing. Remote sensing techniques based on high resolution satellite data offer good prospects for the cartography of the littoral environment. Textural information contained in the panchromatic channel, at ten-metre resolution, is introduced in order to separate different types of structures. The technique we used is based on statistical pattern recognition models and operates in two steps. The first step, feature extraction, is carried out using a stepwise algorithm. Segmentation is then performed by cluster analysis using these extracted features. The texture features are computed over the immediate neighborhood of the pixel using two methods: the co-occurrence matrices method and the grey level difference statistics method. Image segmentation based only on texture features is then performed by pixel classification and finally discussed. In a future paper, we intend to compare the results with aerial data in view of the management of the littoral resources.
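
    The co-occurrence matrix method named in this abstract can be sketched in a few lines. This is an illustration under assumptions, not the authors' implementation: the whole toy image stands in for one pixel neighborhood, only a single horizontal offset is used, and the names `cooccurrence_matrix` and `texture_features`, as well as the choice of contrast and energy as features, are hypothetical.

```python
def cooccurrence_matrix(image, levels, dx=1, dy=0):
    """Count how often grey level i is followed by grey level j
    at the pixel offset (dx, dy)."""
    glcm = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                glcm[image[r][c]][image[r2][c2]] += 1
    return glcm

def texture_features(glcm):
    """Contrast and energy of the normalised co-occurrence matrix."""
    total = sum(map(sum, glcm))
    contrast = energy = 0.0
    for i, row in enumerate(glcm):
        for j, count in enumerate(row):
            p = count / total
            contrast += (i - j) ** 2 * p   # large when neighbours differ strongly
            energy += p * p                # large when few level pairs dominate
    return contrast, energy

# Toy 4 x 4 window quantised to four grey levels (made-up values).
window = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
glcm = cooccurrence_matrix(window, levels=4)
print(glcm, texture_features(glcm))
```

    In a segmentation setting, such features would be computed in a sliding window around each pixel and the resulting feature vectors passed to the cluster analysis step.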

  7. Analysis and clustering of natural gas consumption data for thermal energy use forecasting

    NASA Astrophysics Data System (ADS)

    Franco, Alessandro; Fantozzi, Fabio

    2015-11-01

    In this paper, after a brief analysis of the connections between natural gas use and thermal energy use, natural gas consumption data for the Italian market are analyzed and suitably clustered in order to compute the typical consumption profile for different days of the week, in different seasons, and for the different classes of users: residential, tertiary and industrial. The analysis of the data shows that the natural gas consumption profile is mainly related to the seasonality pattern and to weather conditions (outside temperature, humidity and wind chill). There is also an important daily pattern related to the industrial and civil sectors that, to a lesser degree than the seasonal one, affects the consumption profile and has to be taken into account in defining an effective short- and mid-term thermal energy forecasting method. A possible mathematical structure of the natural gas consumption profile is provided. Due to the strong link between thermal energy use and natural gas consumption, this analysis could be considered the first step in the development of a model for thermal energy forecasting.

  8. How do autoimmune diseases cluster in families? A systematic review and meta-analysis

    PubMed Central

    2013-01-01

    Background A primary characteristic of complex genetic diseases is that affected individuals tend to cluster in families (that is, familial aggregation). Aggregation of the same autoimmune condition, also referred to as familial autoimmune disease, has been extensively evaluated. However, aggregation of diverse autoimmune diseases, also known as familial autoimmunity, has been overlooked. Therefore, a systematic review and meta-analysis were performed aimed at gathering evidence about this topic. Methods Familial autoimmunity was investigated in five major autoimmune diseases, namely, rheumatoid arthritis, systemic lupus erythematosus, autoimmune thyroid disease, multiple sclerosis and type 1 diabetes mellitus. Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) guidelines were followed. Articles were searched in Pubmed and Embase databases. Results Out of a total of 61 articles, 44 were selected for final analysis. Familial autoimmunity was found in all the autoimmune diseases investigated. Aggregation of autoimmune thyroid disease, followed by systemic lupus erythematosus and rheumatoid arthritis, was the most encountered. Conclusions Familial autoimmunity is a frequently seen condition. Further study of familial autoimmunity will help to decipher the common mechanisms of autoimmunity. PMID:23497011

  9. Cluster Analysis of Tumor Suppressor Genes in Canine Leukocytes Identifies Activation State

    PubMed Central

    Daly, Julie-Anne; Mortlock, Sally-Anne; Taylor, Rosanne M.; Williamson, Peter

    2015-01-01

    Cells of the immune system undergo activation and subsequent proliferation in the normal course of an immune response. Infrequently, the molecular and cellular events that underlie the mechanisms of proliferation are dysregulated and may lead to oncogenesis, leading to tumor formation. The most common forms of immunological cancers are lymphomas, which in dogs account for 8%–20% of all cancers, affecting up to 1.2% of the dog population. Key genes involved in negatively regulating proliferation of lymphocytes include a group classified as tumor suppressor genes (TSGs). These genes are also known to be associated with progression of lymphoma in humans, mice, and dogs and are potential candidates for pathological grading and diagnosis. The aim of the present study was to analyze TSG profiles in stimulated leukocytes from dogs to identify genes that discriminate an activated phenotype. A total of 554 TSGs and three gene set collections were analyzed from microarray data. Cluster analysis of three subsets of genes discriminated between stimulated and unstimulated cells. These included 20 most upregulated and downregulated TSGs, TSG in hallmark gene sets significantly enriched in active cells, and a selection of candidate TSGs, p15 (CDKN2B), p18 (CDKN2C), p19 (CDKN1A), p21 (CDKN2A), p27 (CDKN1B), and p53 (TP53) in the third set. Analysis of two subsets suggested that these genes or a subset of these genes may be used as a specialized PCR set for additional analysis. PMID:27478369

  10. Cluster structure of EU-15 countries derived from the correlation matrix analysis of macroeconomic index fluctuations

    NASA Astrophysics Data System (ADS)

    Gligor, M.; Ausloos, M.

    2007-05-01

    The statistical distances between countries, calculated for various moving average time windows, are mapped into the ultrametric subdominant space as in classical Minimal Spanning Tree methods. The Moving Average Minimal Length Path (MAMLP) algorithm allows a decoupling of fluctuations with respect to the mass center of the system from the movement of the mass center itself. A Hamiltonian representation given by a factor graph is used and plays the role of cost function. The present analysis pertains to 11 macroeconomic (ME) indicators, namely the GDP (x1), Final Consumption Expenditure (x2), Gross Capital Formation (x3), Net Exports (x4), Consumer Price Index (y1), Rates of Interest of the Central Banks (y2), Labour Force (z1), Unemployment (z2), GDP/hour worked (z3), GDP/capita (w1) and Gini coefficient (w2). The target group of countries is composed of 15 EU countries, data taken between 1995 and 2004. By two different methods (the Bipartite Factor Graph Analysis and the Correlation Matrix Eigensystem Analysis) it is found that the strongly correlated countries with respect to the macroeconomic indicators fluctuations can be partitioned into stable clusters.
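
    The mapping of distances into the subdominant ultrametric space via a minimal spanning tree can be sketched as follows. This does not reproduce the MAMLP algorithm or the moving-average windows of the paper. The distance d = sqrt(2(1 - ρ)) is a common choice assumed here, and the function names and the toy correlation values are hypothetical.

```python
import math

def correlation_distance(rho):
    """Map a correlation coefficient in [-1, 1] to a distance in [0, 2]."""
    return math.sqrt(2.0 * (1.0 - rho))

def subdominant_ultrametric(dist):
    """Kruskal's algorithm on the complete graph. The ultrametric distance
    between i and j is the length of the tree edge that first joins their
    components, i.e. the longest edge on the tree path between them."""
    n = len(dist)
    edges = sorted((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    members = {i: [i] for i in range(n)}   # component root -> member list
    root = list(range(n))                  # node -> component root
    ultra = [[0.0] * n for _ in range(n)]
    tree = []
    for d, i, j in edges:
        ri, rj = root[i], root[j]
        if ri == rj:
            continue                       # edge would close a cycle
        tree.append((i, j, d))
        for a in members[ri]:
            for b in members[rj]:
                ultra[a][b] = ultra[b][a] = d
        for b in members[rj]:
            root[b] = ri
        members[ri].extend(members.pop(rj))
    return tree, ultra

# Made-up correlations between the index fluctuations of four countries.
rho = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
dist = [[correlation_distance(r) for r in row] for row in rho]
tree, ultra = subdominant_ultrametric(dist)
print(tree)
```

    Cutting the resulting ultrametric at a chosen threshold yields the same partition as single-linkage hierarchical clustering, which is why the tree is a convenient starting point for identifying clusters of countries.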

  11. A nonparametric clustering technique which estimates the number of clusters

    NASA Technical Reports Server (NTRS)

    Ramey, D. B.

    1983-01-01

    In applications of cluster analysis, one usually needs to determine the number of clusters, K, and the assignment of observations to each cluster. A clustering technique based on recursive application of a multivariate test of bimodality which automatically estimates both K and the cluster assignments is presented.

  12. Cluster analysis of polyphenol intake in a French middle-aged population (aged 35-64 years).

    PubMed

    Julia, Chantal; Touvier, Mathilde; Lassale, Camille; Fezeu, Léopold; Galan, Pilar; Hercberg, Serge; Kesse-Guyot, Emmanuelle

    2016-01-01

    Polyphenols have been suggested as protective factors for a range of chronic diseases. However, studying the impact of individual polyphenols on health is hindered by the intrinsic inter-correlations among polyphenols. Alternatively, studying foods rich in specific polyphenols fails to grasp the ubiquity of these components. Studying overall dietary patterns would allow for a more comprehensive description of polyphenol intakes in the population. Our objective was to identify clusters of dietary polyphenol intakes in a French middle-aged population (35-64 years old). Participants from the primary prevention trial SUpplementation en VItamines et Minéraux AntioXydants (SU.VI.MAX) study were included in the present cross-sectional study (n 6092; 57·8 % females; mean age 48·7 (sd 6·4) years). The fifty most consumed individual dietary polyphenols were divided into energy-adjusted tertiles and introduced in a multiple correspondence analysis (MCA), leading to comprehensive factors of dietary polyphenol intakes. The identified factors discriminating polyphenol intakes were used in a hierarchical clustering procedure. Four clusters were identified, corresponding broadly to clustered preferences for their respective food sources. Cluster 1 was characterised by high intakes of tea polyphenols. Cluster 2 was characterised by high intakes of wine polyphenols. Cluster 3 was characterised by high intakes of flavanones and flavones, corresponding to high consumption of fruit and vegetables, and more broadly to a healthier diet. Cluster 4 was characterised by high intakes of hydroxycinnamic acids, but was also associated with alcohol consumption and smoking. Profiles of polyphenol intakes allowed for the identification of meaningful combinations of polyphenol intakes in the diet. PMID:27547391

  14. Mutational analysis of the nor gene cluster which encodes nitric-oxide reductase from Paracoccus denitrificans.

    PubMed

    de Boer, A P; van der Oost, J; Reijnders, W N; Westerhoff, H V; Stouthamer, A H; van Spanning, R J

    1996-12-15

    The genes that encode the bc-type nitric-oxide reductase from Paracoccus denitrificans have been identified. They are part of a cluster of six genes (norCBQDEF) and are found near the gene cluster that encodes the cd1-type nitrite reductase, which was identified earlier [de Boer, A. P. N., Reijnders, W. N. M., Kuenen, J. G., Stouthamer, A. H. & van Spanning, R. J. M. (1994) Isolation, sequencing and mutational analysis of a gene cluster involved in nitrite reduction in Paracoccus denitrificans, Antonie Leeuwenhoek 66, 111-127]. norC and norB encode the cytochrome-c-containing subunit II and cytochrome b-containing subunit I of nitric-oxide reductase (NO reductase), respectively. norQ encodes a protein with an ATP-binding motif and has high similarity to NirQ from Pseudomonas stutzeri and Pseudomonas aeruginosa and CbbQ from Pseudomonas hydrogenothermophila. norE encodes a protein with five putative transmembrane alpha-helices and has similarity to CoxIII, the third subunit of the aa3-type cytochrome-c oxidases. norF encodes a small protein with two putative transmembrane alpha-helices. Mutagenesis of norC, norB, norQ and norD resulted in cells unable to grow anaerobically. Nitrite reductase and NO reductase (with succinate or ascorbate as substrates) and nitrous oxide reductase (with succinate as substrate) activities were not detected in these mutant strains. Nitrite extrusion was detected in the medium, indicating that nitrate reductase was active. The norQ and norD mutant strains retained about 16% and 23% of the wild-type level of NorC, respectively. The norE and norF mutant strains had specific growth rates and NorC contents similar to those of the wild-type strain, but had reduced NOR and NIR activities, indicating that their gene products are involved in regulation of enzyme activity. Mutant strains containing the norCBQDEF region on the broad-host-range vector pEG400 were able to grow anaerobically, although at a lower specific growth rate and with lower

  16. High-throughput Analysis of Large Microscopy Image Datasets on CPU-GPU Cluster Platforms.

    PubMed

    Teodoro, George; Pan, Tony; Kurc, Tahsin M; Kong, Jun; Cooper, Lee A D; Podhorszki, Norbert; Klasky, Scott; Saltz, Joel H

    2013-05-01

    Analysis of large pathology image datasets offers significant opportunities for the investigation of disease morphology, but the resource requirements of analysis pipelines limit the scale of such studies. Motivated by a brain cancer study, we propose and evaluate a parallel image analysis application pipeline for high throughput computation of large datasets of high resolution pathology tissue images on distributed CPU-GPU platforms. To achieve efficient execution on these hybrid systems, we have built runtime support that allows us to express the cancer image analysis application as a hierarchical data processing pipeline. The application is implemented as a coarse-grain pipeline of stages, where each stage may be further partitioned into another pipeline of fine-grain operations. The fine-grain operations are efficiently managed and scheduled for computation on CPUs and GPUs using performance aware scheduling techniques along with several optimizations, including architecture aware process placement, data locality conscious task assignment, data prefetching, and asynchronous data copy. These optimizations are employed to maximize the utilization of the aggregate computing power of CPUs and GPUs and minimize data copy overheads. Our experimental evaluation shows that the cooperative use of CPUs and GPUs achieves significant improvements on top of GPU-only versions (up to 1.6×) and that the execution of the application as a set of fine-grain operations provides more opportunities for runtime optimizations and attains better performance than coarser-grain, monolithic implementations used in other works. An implementation of the cancer image analysis pipeline using the runtime support was able to process an image dataset consisting of 36,848 4Kx4K-pixel image tiles (about 1.8TB uncompressed) in less than 4 minutes (150 tiles/second) on 100 nodes of a state-of-the-art hybrid cluster system. PMID:25419546

  17. Definition of a Family of Tissue-Protective Cytokines Using Functional Cluster Analysis: A Proof-of-Concept Study.

    PubMed

    Mengozzi, Manuela; Ermilov, Peter; Annenkov, Alexander; Ghezzi, Pietro; Pearl, Frances

    2014-01-01

    The discovery of the tissue-protective activities of erythropoietin (EPO) has underlined the importance of some cytokines in tissue protection, repair, and remodeling. As such activities have been reported for other cytokines, we asked whether we could define a class of tissue-protective cytokines. We therefore explored a novel approach based on functional clustering. In this pilot study, we started by analyzing a small number of cytokines (30). We functionally classified the 30 cytokines according to their interactions by using the bioinformatics tool STRING (Search Tool for the Retrieval of Interacting Genes), followed by hierarchical cluster analysis. The results of this functional clustering were different from those obtained by clustering cytokines simply according to their sequence. We previously reported that the protective activity of EPO in a model of cerebral ischemia was paralleled by an upregulation of synaptic plasticity genes, particularly early growth response 2 (EGR2). To assess the predictivity of functional clustering, we tested some of the cytokines clustering close to EPO (interleukin-11, IL-11; kit ligand, KITLG; leukemia inhibitory factor, LIF; thrombopoietin, THPO) in an in vitro model of human neuronal cells for their ability to induce EGR2. Two of these, LIF and IL-11, induced EGR2 expression. Although these data would need to be extended to a larger number of cytokines and the biological validation should be done using more robust in vivo models, rather than just one cell line, this study shows the feasibility of this approach. This type of functional cluster analysis could be extended to other fields of cytokine research and help design biological experiments. PMID:24672526

  18. Cluster Analysis of the Klein Sexual Orientation Grid in Clinical and Nonclinical Samples: When Bisexuality Is Not Bisexuality

    PubMed Central

    Klein, Fritz; McCutchan, J. Allen; Grant, Igor

    2014-01-01

    We used a cluster analysis to empirically address whether sexual orientation is a continuum or can usefully be divided into categories such as heterosexual, homosexual, and bisexual using scores on the Klein Sexual Orientation Grid (KSOG) in three samples: groups of men and women recruited through bisexual groups and the Internet (Main Study men; Main Study women), and men recruited for a clinical study of HIV and the nervous system (HIV Study men). A five-cluster classification was chosen for the Main Study men (n = 212), a four-cluster classification for the Main Study women (n = 120), and a five-cluster classification for the HIV Study men (n = 620). We calculated means and standard deviations of these 14 clusters on the 21 variables composing the KSOG. Generally, the KSOG’s overtly erotic items (Sexual Fantasies, Sexual Behavior, and Sexual Attraction), as well as the Self Identification items, tended to be more uniform within groups than the more social items were (Emotional Preference, Socialize with, and Lifestyle). The result is a set of objectively identified subgroups of bisexual men and women along with characterizations of the extent to which their KSOG scores describe and differentiate them. The Bisexual group identified by the cluster analysis of the HIV sample was distinctly different from any of the bisexual groups identified by the clustering process in the Main Sample. Simply put, the HIV sample’s bisexuality is not like bisexuality in general, and attempts to generalize (even cautiously) from this clinical Bisexual group to a larger population would be doomed to failure. This underscores the importance of recruiting non-clinical samples if one wants insight into the nature of bisexuality in the population at large. Although the importance of non-clinical sampling in studies of sexual orientation has been widely and justly asserted, it has rarely been demonstrated by direct comparisons of the type conducted in the present study. PMID:25530727

  19. Major Strands in Scientific Inquiry through Cluster Analysis of Research Abstracts

    NASA Astrophysics Data System (ADS)

    Yeh, Yi-Fen; Jen, Tsung-Hau; Hsu, Ying-Shao

    2012-12-01

    Scientific inquiry involves a variety of abilities scientists use to investigate the natural world. In order to develop students' scientific inquiry, researchers and educators have developed different curricula and a variety of instructional resources, which make features and descriptors of scientific inquiry in teaching and learning even more diverse and complex. To reveal how the multiple facets of scientific inquiry are inherently correlated, this study identified descriptors representing features of scientific inquiry and automatically reviewed the research abstracts where these descriptors were used. A cluster analysis was used to analyze 171 relevant article abstracts published in Web of Science from 1986 to 2010, using the data mining software WordStat v6.1. Networks of descriptors and of research strands showed the inter-relationships among descriptors and the research strands. By triangulating the categorization results from automatic data mining and expert researchers' qualitative reviewing, this study identified seven clusters of high-frequency descriptors and nine major strands of current research studies. The nine strands can further be grouped into five research themes: NOS, Knowledge Construction, Inquiry Ability, Explanatory-driven Inquiry, and Professional Development. With different levels of cohesiveness in the network, these themes demonstrated that scientific inquiry was composed of different levels of abilities students need to achieve as well as the endeavors of teachers. By exploring the network shared among most researchers, this study is expected to provide novice researchers with information about elements that expert researchers usually consider and, further, to give expert researchers some new directions to explore in research designs.

  20. Analysis of the inner structure of equatorial noise emissions using high-resolution data from Cluster

    NASA Astrophysics Data System (ADS)

    Hrbackova, Zuzana; Gurnett, Donald; Santolik, Ondrej; Pickett, Jolene; Cornilleau-Wehrlin, Nicole

    Equatorial noise (EN) emissions, sometimes called fast magnetosonic waves, belong to the strongest natural emissions in the frequency range between the ion cyclotron frequency and the lower hybrid frequency. This interval usually extends from a few hertz to hundreds of hertz at radial distances from about 2 $R_E$ to 10 $R_E$, where the EN emissions occur. They propagate in the plane of the geomagnetic equator, at wave vector angles close to perpendicular to the local magnetic field line. Recent studies have shown that all azimuthal angles of the wave propagation have been observed in the high density region of the Earth’s plasmasphere while they are close to perpendicular to the radial direction in the lower density region outside of the plasmasphere. We use data from the STAFF-SA instruments, which are analyzed on board, and from the high-resolution WBD instruments, recorded by the four Cluster spacecraft during 2001-2010. We have analyzed more than 5500 equatorial passages of all four Cluster spacecraft during this period and, altogether, 2166 EN events have been identified by the STAFF-SA instruments. From this database, we have identified 342 events measured by WBD in a specific mode giving us the frequency range from 70 Hz to 9.5 kHz. We show results of a systematic analysis of the inner structure of spectral lines embedded in the EN emissions. We assume that these lines correspond to imprints of local ion Bernstein modes in the generation region. We then use frequency intervals between the spectral lines to determine radial distances of the source region of these emissions with respect to the plasmapause location. This work receives EU support through the FP7-Space grant agreement no 284520 for the MAARBLE collaborative research project.
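
    The step from line spacing to source distance can be illustrated with a rough sketch. It rests on assumptions that the abstract does not state: the spacing between spectral lines is taken to equal the local proton cyclotron frequency, and the equatorial field is approximated by a centred dipole with a surface value of about 3.1e-5 T. The function names are hypothetical.

```python
import math

E_CHARGE = 1.602176634e-19   # C, elementary charge
M_PROTON = 1.67262192e-27    # kg, proton mass
B0_EQUATOR = 3.1e-5          # T, assumed equatorial surface field of a dipole Earth

def field_from_spacing(delta_f_hz):
    """Field strength for which the proton cyclotron frequency
    f_cp = e B / (2 pi m_p) equals the observed line spacing."""
    return 2.0 * math.pi * M_PROTON * delta_f_hz / E_CHARGE

def source_distance_re(delta_f_hz):
    """Equatorial distance in Earth radii where a dipole field,
    B(r) = B0 * (R_E / r)**3, matches that field strength."""
    return (B0_EQUATOR / field_from_spacing(delta_f_hz)) ** (1.0 / 3.0)

for spacing in (3.0, 7.4, 30.0):   # Hz, made-up example spacings
    print(spacing, "Hz ->", round(source_distance_re(spacing), 2), "R_E")
```

    Under these assumptions a spacing of about 7.4 Hz corresponds to roughly 4 $R_E$, and larger spacings map to sources closer to the Earth. A measured or modelled field would replace the dipole approximation in any real analysis.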

  1. Assessment of repeatability of composition of perfumed waters by high-performance liquid chromatography combined with numerical data analysis based on cluster analysis (HPLC UV/VIS - CA).

    PubMed

    Ruzik, L; Obarski, N; Papierz, A; Mojski, M

    2015-06-01

    High-performance liquid chromatography (HPLC) with UV/VIS spectrophotometric detection combined with the chemometric method of cluster analysis (CA) was used for the assessment of repeatability of composition of nine types of perfumed waters. In addition, the chromatographic method of separating components of the perfume waters under analysis was subjected to an optimization procedure. The chromatograms thus obtained were used as sources of data for the chemometric method of cluster analysis (CA). The result was a classification of a set comprising 39 perfumed water samples with a similar composition at a specified level of probability (level of agglomeration). A comparison of the classification with the manufacturer's declarations reveals a good degree of consistency and demonstrates similarity between samples in different classes. A combination of the chromatographic method with cluster analysis (HPLC UV/VIS - CA) makes it possible to quickly assess the repeatability of composition of perfumed waters at selected levels of probability.

  2. A tripartite clustering analysis on microRNA, gene and disease model.

    PubMed

    Shen, Chengcheng; Liu, Ying

    2012-02-01

    Alteration of gene expression in response to regulatory molecules or mutations could lead to different diseases. MicroRNAs (miRNAs) have been discovered to be involved in regulation of gene expression and a wide variety of diseases. In a tripartite biological network of human miRNAs, their predicted target genes and the diseases caused by altered expressions of these genes, valuable knowledge about the pathogenicity of miRNAs, involved genes and related disease classes can be revealed by co-clustering miRNAs, target genes and diseases simultaneously. Tripartite co-clustering can lead to more informative results than traditional co-clustering with only two kinds of members and pass the hidden relational information along the relation chain by considering multi-type members. Here we report a spectral co-clustering algorithm for k-partite graph to find clusters with heterogeneous members. We use the method to explore the potential relationships among miRNAs, genes and diseases. The clusters obtained from the algorithm have significantly higher density than randomly selected clusters, which means members in the same cluster are more likely to have common connections. Results also show that miRNAs in the same family based on the hairpin sequences tend to belong to the same cluster. We also validate the clustering results by checking the correlation of enriched gene functions and disease classes in the same cluster. Finally, widely studied miR-17-92 and its paralogs are analyzed as a case study to reveal that genes and diseases co-clustered with the miRNAs are in accordance with current research findings. PMID:22809308

  4. A Typology of Child School Behavior: Investigation Using Latent Profile Analysis and Cluster Analysis

    ERIC Educational Resources Information Center

    Mindrila, Diana L.

    2016-01-01

    To describe and facilitate the identification of child school behavior patterns, we developed a typology of child school behavior (ages 6-11 years) using the norming data (N = 2,338) for the second edition of the Behavior Assessment System for Children (Teacher Rating-Child form). Latent profile analysis was conducted with the entire data set,…

  5. Input Frequency and Lexical Variability in Phonological Development: A Survival Analysis of Word-Initial Cluster Production

    ERIC Educational Resources Information Center

    Ota, Mitsuhiko; Green, Sam J.

    2013-01-01

    Although it has been often hypothesized that children learn to produce new sound patterns first in frequently heard words, the available evidence in support of this claim is inconclusive. To re-examine this question, we conducted a survival analysis of word-initial consonant clusters produced by three children in the Providence Corpus (0 ; 11-4 ;…

  6. Ionic polymer cluster energetics: Computational analysis of pendant chain stiffness and charge imbalance

    NASA Astrophysics Data System (ADS)

    Weiland, Lisa Mauck; Leo, Donald J.

    2005-06-01

    In recent years there has been considerable study of the potential mechanisms underlying the electromechanical response of ionic-polymer-metal composites. The most recent models have been based on the response of the ion-containing clusters that are formed when the material is synthesized. Most of these efforts have employed assumptions of uniform ion distribution within spherical cluster shapes. This work investigates the impact of dispensing with these assumptions in order to better understand the parameters that impact cluster shape, size, and ion transport potential. A computational micromechanics model applying Monte Carlo methodology is employed to predict the equilibrium state of a single cluster of a solvated ionomeric polymer. For a constant solvated state, the model tracks the position of individual ions within a given cluster in response to ion-ion interaction, mechanical stiffness of the pendant chain, cluster surface energy, and external electric-field loading. Results suggest that cluster surface effects play a significant role in the equilibrium cluster state, including ion distribution; pendant chain stiffness also plays a role in ion distribution but to a lesser extent. Moreover, ion pairing is rarely complete even in cation-rich clusters; this in turn supports the supposition of the formation of anode and cathode boundary layers.

  7. Analysis of LAC Observations of Clusters of Galaxies and Supernova Remnants

    NASA Technical Reports Server (NTRS)

    Hughes, J.

    1996-01-01

    The following publications are included and serve as the final report: The X-ray Spectrum of Abell 665; Clusters of Galaxies; Ginga Observation of an Oxygen-rich Supernova Remnant; Ginga Observations of the Coma Cluster and Studies of the Spatial Distribution of Iron; A Measurement of the Hubble Constant from the X-ray Properties and the Sunyaev-Zel'dovich Effect of Abell 2218; Non-polytropic Model for the Coma Cluster; and Abundance Gradients in Cooling Flow Clusters: Ginga LAC (Large Area Counter) and Einstein SSS (Solid State Spectrometer) Spectra of A496, A1795, A2142, and A2199.

  8. An analysis of the first three catalogues of southern star clusters and nebulae

    NASA Astrophysics Data System (ADS)

    Cozens, Glendyn John

    2008-06-01

    of the Lacaille and Herschel catalogues. In order to identify and compare the catalogues, positions given for an object by each astronomer were precessed to J2000.0 coordinates. These modern positions for an object could then be plotted onto modern photographic star atlases and digital images of the sky, to determine the accuracy of the original positions. Analysis of the three non-stellar catalogues included the determination of the radial distance of each object from its "correct" position and diagrams of both difference in Right Ascension and difference in Declination against Right Ascension and Declination, in order to identify any trends. Each catalogue contained some copy or printing errors, but these were omitted from the statistical calculations performed. The results for the three catalogues, from the astrometric perspective, showed that the Herschel catalogue contained the most accurate positions, followed closely by the Lacaille catalogue with no obvious or systematic trends in their inaccuracies. In contrast, the Dunlop catalogue showed some clear trends in the positional inaccuracies which, regardless of mitigating circumstances, to some extent warranted John Herschel's criticism. Finally an examination of the completeness of each catalogue was undertaken to determine the thoroughness of each astronomer. Firstly the effective aperture and theoretical magnitude limit for each telescope was calculated. Next the non-stellar objects were grouped into five types, open clusters, globular clusters, diffuse nebulae, planetary nebulae and galaxies, and a single working magnitude limit was found for each catalogue. A number of indicators were used to determine the working magnitude limit. The number of faint objects of each type which were seen, and the number of bright objects which were missed by the three astronomers, was assessed. In both the Dunlop and Herschel catalogues galaxies gave the best indicator of the working magnitude limit. Globular clusters
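
    The positional comparison described above (radial distance from the "correct" position, plus separate differences in Right Ascension and Declination) can be sketched as follows. The sketch assumes both positions are already precessed to J2000.0 and given in decimal degrees; the precession step itself is not shown, and the function names are illustrative.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine form, inputs in degrees)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    # Clamp guards against rounding pushing sqrt(a) slightly above 1.
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a))))


def position_offsets_arcmin(cat_ra, cat_dec, true_ra, true_dec):
    """Catalogue-minus-true offsets in arcminutes.

    The RA offset is wrapped into [-180, 180) degrees and scaled by
    cos(Dec) so that it measures an angle on the sky, not a coordinate
    difference.
    """
    d_ra_deg = (cat_ra - true_ra + 180.0) % 360.0 - 180.0
    d_ra = d_ra_deg * math.cos(math.radians(true_dec)) * 60.0
    d_dec = (cat_dec - true_dec) * 60.0
    return d_ra, d_dec


if __name__ == "__main__":
    # Hypothetical catalogued and modern positions of one object (degrees).
    sep = angular_separation_deg(201.70, -47.40, 201.697, -47.479)
    print(f"radial offset: {sep * 60.0:.1f} arcmin")
```

    Plotting the two offsets against Right Ascension and Declination for every object in a catalogue would then expose systematic trends of the kind discussed in the abstract.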

  9. Clustered regularly interspaced short palindromic repeats (CRISPRs) analysis of members of the Mycobacterium tuberculosis complex.

    PubMed

    Botelho, Ana; Canto, Ana; Leão, Célia; Cunha, Mónica V

    2015-01-01

    Typical CRISPR (clustered, regularly interspaced, short palindromic repeat) regions are constituted by short direct repeats (DRs), interspersed with similarly sized non-repetitive spacers, derived from transmissible genetic elements, acquired when the cell is challenged with foreign DNA. The analysis of the structure, in number and nature, of CRISPR spacers is a valuable tool for molecular typing since these loci are polymorphic among strains, originating characteristic signatures. The existence of CRISPR structures in the genome of the members of Mycobacterium tuberculosis complex (MTBC) enabled the development of a genotyping method, based on the analysis of the presence or absence of 43 oligonucleotide spacers separated by conserved DRs. This method, called spoligotyping, consists of PCR amplification of the DR chromosomal region and recognition after hybridization of the spacers that are present. The workflow beneath this methodology implies that the PCR products are brought onto a membrane containing synthetic oligonucleotides that have complementary sequences to the spacer sequences. Lack of hybridization of the PCR products to a specific oligonucleotide sequence indicates absence of the corresponding spacer sequence in the examined strain. Spoligotyping gained great notoriety as a robust identification and typing tool for members of MTBC, enabling multiple epidemiological studies on human and animal tuberculosis.
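
    A spoligotype is, in effect, a 43-element presence/absence vector. The sketch below shows one way to store such a pattern and compress it into the 15-digit octal notation that is commonly used for reporting: 14 triplets of spacers each become one octal digit, and the final spacer becomes a single 0 or 1. The octal convention is not described in the abstract, and the function name and example pattern are hypothetical.

```python
N_SPACERS = 43  # number of spacers probed in the spoligotyping assay


def spoligotype_to_octal(presence):
    """Convert 43 presence/absence calls into a 15-digit octal string.

    `presence` is a sequence of 43 truthy/falsy values, spacer 1 first.
    Spacers 1-42 are read as 14 binary triplets (one octal digit each);
    spacer 43 contributes a final digit of 0 or 1.
    """
    if len(presence) != N_SPACERS:
        raise ValueError(f"expected {N_SPACERS} spacers, got {len(presence)}")
    bits = [1 if p else 0 for p in presence]
    digits = []
    for i in range(0, 42, 3):
        a, b, c = bits[i:i + 3]
        digits.append(str(4 * a + 2 * b + c))
    digits.append(str(bits[42]))
    return "".join(digits)


if __name__ == "__main__":
    # Hypothetical strain lacking spacers 39-43 (no hybridization signal).
    pattern = [True] * 38 + [False] * 5
    print(spoligotype_to_octal(pattern))
```

    Two strains can then be compared by simple string equality of their octal codes, or by counting differing positions in the underlying binary vectors.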

  10. Exploratory Analysis of Biological Networks through Visualization, Clustering, and Functional Annotation in Cytoscape.

    PubMed

    Baryshnikova, Anastasia

    2016-01-01

    Biological networks define how genes, proteins, and other cellular components interact with one another to carry out specific functions, providing a scaffold for understanding cellular organization. Although in-depth network analysis requires advanced mathematical and computational knowledge, a preliminary visual exploration of biological networks is accessible to anyone with basic computer skills. Visualization of biological networks is used primarily to examine network topology, identify functional modules, and predict gene functions based on gene connectivity within the network. Networks are excellent at providing a bird's-eye view of data sets and have the power of illustrating complex ideas in simple and intuitive terms. In addition, they enable exploratory analysis and generation of new hypotheses, which can then be tested using rigorous statistical and experimental tools. This protocol describes a simple procedure for visualizing a biological network using the genetic interaction similarity network for Saccharomyces cerevisiae as an example. The visualization procedure described here relies on the open-source network visualization software Cytoscape and includes detailed instructions on formatting and loading the data, clustering networks, and overlaying functional annotations. PMID:26988373

  11. Classification of Chinese herbs based on the cluster analysis of delayed luminescence.

    PubMed

    Pang, Jingxiang; Yang, Meina; Fu, Jialei; Zhao, Xiaolei; van Wijk, Eduard; Wang, Mei; Liu, Yanli; Zhou, Xiaoyan; Fan, Hua; Han, Jinxiang

    2016-03-01

    Traditional Chinese materia medica are an important component of the Chinese pharmacopeia. According to the traditional Chinese medicinal concept, Chinese herbal medicines are classified into different categories based on their therapeutic effects; however, the bioactive principles cannot be solely explained by chemical analysis. The aim of this study is to classify different Chinese herbs based on their therapeutic effects by using delayed luminescence (DL). The DL of 56 Chinese herbs was measured using an ultra-sensitive luminescence detection system. The different DL parameters were used to classify Chinese herbs according to a hierarchical cluster analysis. The samples were divided into two groups based on their DL kinetic parameters. Interestingly, the DL classification results were quite consistent with classification according to the Chinese medicinal concepts of 'cold' and 'heat' properties. In this paper, we show for the first time that by using DL technology, it is possible to classify Chinese herbs according to the Chinese medicinal concept and it may even be possible to predict their therapeutic properties.

  12. Clinical evaluation of nonsyndromic dental anomalies in Dravidian population: A cluster sample analysis

    PubMed Central

    Yamunadevi, Andamuthu; Selvamani, M.; Vinitha, V.; Srivandhana, R.; Balakrithiga, M.; Prabhu, S.; Ganapathy, N.

    2015-01-01

    Aim: To record the prevalence rate of dental anomalies in Dravidian population and analyze the percentage of individual anomalies in the population. Methodology: A cluster sample analysis was done, where 244 subjects studying in a dental institution were all included and analyzed for occurrence of dental anomalies by clinical examination, excluding third molars from analysis. Results: 31.55% of the study subjects had dental anomalies and shape anomalies were more prevalent (22.1%), followed by size (8.6%), number (3.2%) and position anomalies (0.4%). Retained deciduous teeth were seen in 1.63%. Among the individual anomalies, Talon's cusp (TC) was seen predominantly (14.34%), followed by microdontia (6.6%) and supernumerary cusps (5.73%). Conclusion: Prevalence rate of dental anomalies in the Dravidian population is 31.55% in the present study, exclusive of third molars. Shape anomalies are more common, and TC is the most commonly noted anomaly. Varying prevalence rate is reported in different geographical regions of the world. PMID:26538906

  13. Seismotectonic Implications Of Clustered Regional GPS Velocities In The San Francisco Bay Region, California

    NASA Astrophysics Data System (ADS)

    Graymer, R. W.; Simpson, R.

    2012-12-01

    We have used a hierarchical agglomerative clustering algorithm with Euclidean distance and centroid linkage, applied to continuous GPS observations for the Bay region available from the U.S. Geological Survey website. This analysis reveals 4 robust, spatially coherent clusters that coincide with 4 first-order structural blocks separated by 3 major fault systems: San Andreas (SA), Southern/Central Calaveras-Hayward-Rodgers Creek-Maacama (HAY), and Northern Calaveras-Concord-Green Valley-Berryessa-Bartlett Springs (NCAL). Because observations seaward of the San Gregorio (SG) fault are few in number, the cluster to the west of SA may actually contain 2 major structural blocks not adequately resolved: the Pacific plate to the west of the northern SA and a Peninsula block between the Peninsula SA and the SG fault. The average inter-block velocities are 11, 10, and 9 mm/yr across SA, HAY, and NCAL respectively. There appears to be a significant component of fault-normal compression across NCAL, whereas SA and HAY faults appear to be, on regional average, purely strike-slip. The velocities for the Sierra Nevada - Great Valley (SNGV) block to the east of NCAL are impressive in their similarity. The cluster of these velocities in a velocity plot forms a tighter grouping compared with the groupings for the other cluster blocks, suggesting a more rigid behavior for this block than the others. We note that for 4 clusters, none of the 3 cluster boundaries illuminates geologic structures other than north-northwest trending dominantly strike-slip faults, so plate motion is not accommodated by large-scale fault-parallel compression or extension in the region or by significant plastic deformation, at least over the time span of the GPS observations. Complexities of interseismic deformation of the upper crust do not allow simple application of inter-block velocities as long-term slip rates on bounding faults. However, 2D dislocation models using inter-block velocities and typical
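
    The clustering approach named above (agglomerative, Euclidean distance, centroid linkage) can be sketched in a few lines of plain Python. This is a naive illustration on hypothetical two-component velocities in mm/yr, not the authors' code; the station values and function names are invented for the example.

```python
import math
from itertools import combinations


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def centroid_linkage(vectors, n_clusters):
    """Agglomerative clustering with Euclidean distance and centroid linkage.

    Starts with one cluster per observation and repeatedly merges the two
    clusters whose centroids are closest, until n_clusters remain.
    Returns a list of clusters, each a list of indices into `vectors`.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        cents = [centroid([vectors[i] for i in c]) for c in clusters]
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: math.dist(cents[ab[0]], cents[ab[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    return clusters


if __name__ == "__main__":
    # Hypothetical (east, north) station velocities in mm/yr.
    velocities = [(0.0, 0.0), (0.2, 0.1), (10.0, 0.0), (10.1, 0.3),
                  (20.0, 0.0), (20.2, -0.1), (30.0, 0.0), (29.8, 0.2)]
    for members in centroid_linkage(velocities, 4):
        print(sorted(members), centroid([velocities[i] for i in members]))
```

    Differencing the centroids of adjacent clusters gives an average inter-block velocity in the sense used in the abstract. For real data sets a library routine would be preferable to this cubic-time loop.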

  14. The use of clustering software for the classification of comparative genomic hybridization data. An analysis of 109 malignant fibrous histiocytomas.

    PubMed

    Chibon, Frédéric; Mariani, Odette; Mairal, Aline; Derré, Josette; Coindre, Jean-Michel; Terrier, Philippe; Lagacé, Réal; Sastre, Xavier; Aurias, Alain

    2003-02-01

    Malignant fibrous histiocytoma (MFH) is considered the most frequent soft-tissue sarcoma of late adult life. Nevertheless, the validity of this entity has been recurrently questioned by pathologists. Preliminary analyses by comparative genomic hybridization (CGH) of series of MFH have suggested that this tumor group is heterogeneous at the genomic level, and that at least two main genetic subgroups exist. We report an analysis by CGH of a large series of 109 MFH and on the use of clustering software for an objective classification of these tumors. We confirm our preliminary CGH results and demonstrate that two main clusters of tumors are present in the series analyzed. PMID:12581902

  15. An Analysis of Rich Cluster Redshift Survey Data for Large Scale Structure Studies

    NASA Astrophysics Data System (ADS)

    Slinglend, K.; Batuski, D.; Haase, S.; Hill, J.

    1994-12-01

    The results from the COBE satellite show the existence of structure on scales on the order of 10% or more of the horizon scale of the universe. Rich clusters of galaxies from Abell's catalog show evidence of structure on scales of 100 Mpc and may hold the promise of confirming structure on the scale of the COBE result. However, many Abell clusters have zero or only one measured redshift, so present knowledge of their three dimensional distribution has quite large uncertainties. The shortage of measured redshifts for these clusters may also mask a problem of projection effects corrupting the membership counts for the clusters. Our approach in this effort has been to use the MX multifiber spectrometer on the Steward 2.3m to measure redshifts of at least ten galaxies in each of 80 Abell cluster fields with richness class R ≥ 1 and m_10 ≤ 16.8 (estimated z ≤ 0.12) and zero or one measured redshifts. This work will result in a deeper, more complete (and reliable) sample of positions of rich clusters. Our primary intent for the sample is for two-point correlation and other studies of the large scale structure traced by these clusters in an effort to constrain theoretical models for structure formation. We are also obtaining enough redshifts per cluster so that a much better sample of reliable cluster velocity dispersions will be available for other studies of cluster properties. To date, we have collected such data for 64 clusters, and for most of them, we have seven or more cluster members with redshifts, allowing for reliable velocity dispersion calculations. Velocity histograms and stripe density plots for several interesting cluster fields are presented, along with summary tables of cluster redshift results. Also, with 10 or more redshifts in most of our cluster fields (30' square, just about an 'Abell diameter' at z ~ 0.1) we have investigated the extent of projection effects within the Abell catalog in an effort to quantify and understand how this may affect
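
    A velocity dispersion of the kind mentioned above can be estimated from a handful of member redshifts. The sketch below uses the mean redshift as the cluster redshift and a plain sample standard deviation of the rest-frame line-of-sight velocities. Both choices are assumptions, since the abstract does not say which estimator was used, and the example redshifts are hypothetical.

```python
import math

C_KM_S = 299792.458  # speed of light, km/s


def cluster_velocity_dispersion(redshifts):
    """Return (mean redshift, line-of-sight velocity dispersion in km/s).

    Velocities are taken relative to the cluster mean and corrected to the
    cluster rest frame: v_i = c (z_i - z_mean) / (1 + z_mean).
    The dispersion is the sample standard deviation (n - 1 denominator).
    """
    n = len(redshifts)
    if n < 2:
        raise ValueError("need at least two redshifts")
    z_mean = sum(redshifts) / n
    velocities = [C_KM_S * (z - z_mean) / (1.0 + z_mean) for z in redshifts]
    # The velocities have zero mean by construction, so no mean is subtracted.
    variance = sum(v * v for v in velocities) / (n - 1)
    return z_mean, math.sqrt(variance)


if __name__ == "__main__":
    # Hypothetical redshifts for seven members of one cluster field.
    members = [0.0981, 0.0995, 0.1002, 0.1010, 0.0989, 0.1021, 0.0998]
    z_cl, sigma = cluster_velocity_dispersion(members)
    print(f"z = {z_cl:.4f}, sigma = {sigma:.0f} km/s")
```

    With only seven to ten members per cluster the estimate is noisy, and foreground or background galaxies projected onto the field would need to be excluded before applying it.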

  16. Space-time analysis of testicular cancer clusters using residential histories: a case-control study in Denmark.

    PubMed

    Sloan, Chantel D; Nordsborg, Rikke B; Jacquez, Geoffrey M; Raaschou-Nielsen, Ole; Meliker, Jaymie R

    2015-01-01

    Though the etiology is largely unknown, testicular cancer incidence has seen recent significant increases in northern Europe and throughout many Western regions. Testicular cancer is the most common cancer in males under age 40, and age-period-cohort models have posited exposures in the in utero environment or in early childhood as possible causes of increased risk. Some of these factors may be tied to geography through being associated with behavioral, cultural, sociodemographic or built environment characteristics. If so, this could result in detectable geographic clusters of cases that could lead to hypotheses regarding environmental targets for intervention. Given a latency period between exposure to an environmental carcinogen and testicular cancer diagnosis, mobility histories are beneficial for spatial cluster analyses. Nearest-neighbor based Q-statistics allow for the incorporation of changes in residency in spatial disease cluster detection. Using these methods, a space-time cluster analysis was conducted on a population-wide case-control population selected from the Danish Cancer Registry with mobility histories since 1971 extracted from the Danish Civil Registration System. Cases (N=3297) were diagnosed between 1991 and 2003, and two sets of controls (N=3297 for each set) matched on sex and date of birth were included in the study. We also examined spatial patterns in maternal residential history for those cases and controls born in 1971 or later (N=589 case-control pairs). Several small clusters were detected when aligning individuals by year prior to diagnosis, age at diagnosis and calendar year of diagnosis. However, the largest of these clusters contained only 2 statistically significant individuals at its center, and the clusters were not replicated in SaTScan spatial-only analyses, which are less susceptible to multiple testing bias. We found little evidence of local clusters in residential histories of testicular cancer cases in this Danish population. PMID

  17. Selection of Variables in Cluster Analysis: An Empirical Comparison of Eight Procedures

    ERIC Educational Resources Information Center

    Steinley, Douglas; Brusco, Michael J.

    2008-01-01

    Eight different variable selection techniques for model-based and non-model-based clustering are evaluated across a wide range of cluster structures. It is shown that several methods have difficulties when non-informative variables (i.e., random noise) are included in the model. Furthermore, the distribution of the random noise greatly impacts the…

  18. Thermodynamic and morphological analysis of large silicon self-interstitial clusters using atomistic simulations

    SciTech Connect

    Chuang, Claire Y.; Sinno, Talid; Sattler, Andreas

    2015-04-07

    We study computationally the formation thermodynamics and morphology of silicon self-interstitial clusters using a suite of methods driven by a recent parameterization of the Tersoff empirical potential. Formation free energies and cluster capture zones are computed across a wide range of cluster sizes (2 < N_i < 150) and temperatures (0.65 < T/T_m < 1). Self-interstitial clusters above a critical size (N_i ∼ 25) are found to exhibit complex morphological behavior in which clusters can assume either a variety of disordered, three-dimensional configurations, or one of two macroscopically distinct planar configurations. The latter correspond to the well-known Frank and perfect dislocation loops observed experimentally in ion-implanted silicon. The relative importance of the different cluster morphologies is a function of cluster size and temperature and is dictated by a balance between energetic and entropic forces. The competition between these thermodynamic forces produces a sharp transition between the three-dimensional and planar configurations, and represents a type of order-disorder transition. By contrast, the smaller state space available to smaller clusters restricts the diversity of possible structures and inhibits this morphological transition.

  19. Using Multilevel Factor Analysis with Clustered Data: Investigating the Factor Structure of the Positive Values Scale

    ERIC Educational Resources Information Center

    Huang, Francis L.; Cornell, Dewey G.

    2016-01-01

    Advances in multilevel modeling techniques now make it possible to investigate the psychometric properties of instruments using clustered data. Factor models that overlook the clustering effect can lead to underestimated standard errors, incorrect parameter estimates, and model fit indices. In addition, factor structures may differ depending on…

  20. Analysis of Ultra-Compact Dwarf Galaxies in the Antlia cluster

    NASA Astrophysics Data System (ADS)

    Caso, J. P.; Bassino, L. P.; Smith Castelli, A. V.

    As a continuation of the search for ultra-compact dwarf galaxies (UCDs) in the central region of the Antlia cluster, a new selection of these objects is presented, as well as a comparison between their colour distributions and those of globular cluster candidates. FULL TEXT IN SPANISH

  1. A typology of men who batter: three types derived from cluster analysis.

    PubMed

    Saunders, D G

    1992-04-01

    Important theoretical and treatment implications may be revealed when men who batter their intimate partners are categorized according to type. Data on 165 batterers were cluster analyzed, and three types identified: family-only aggressors, generalized aggressors, and emotionally volatile aggressors. The clustering variables explained 90% of the variance in category assignment. Implications for treatment are discussed.

  2. Analysis of the Tribolium homeotic complex: insights into mechanisms constraining insect Hox clusters

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The remarkable conservation of Hox clusters is an accepted but little understood principle of biology. Some organizational constraints have been identified for vertebrate Hox clusters, but most of these are thought to be recent innovations that may not apply to other organisms. Ironically, many mode...

  3. An Empirical Taxonomy of Youths' Fears: Cluster Analysis of the American Fear Survey Schedule

    ERIC Educational Resources Information Center

    Burnham, Joy J.; Schaefer, Barbara A.; Giesen, Judy

    2006-01-01

    Fear profiles among children and adolescents were explored using the Fear Survey Schedule for Children-American version (FSSC-AM; J.J. Burnham, 1995, 2005). Eight cluster profiles were identified via multistage Euclidean grouping and supported by homogeneity coefficients and replication. Four clusters reflected overall level of fears (i.e., very…

  4. Metabolic risk profiles created using cluster analysis are differentially associated with physical activity: The ARIC study

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Conditions such as hypertension, dyslipidemia, glucose intolerance, and obesity tend to cluster together and predict cardiovascular disease, type 2 diabetes, and premature mortality. This clustering has led to multiple definitions of the Metabolic Syndrome (MetS). While the definitions agree on the ...

  5. Thermodynamic and morphological analysis of large silicon self-interstitial clusters using atomistic simulations

    NASA Astrophysics Data System (ADS)

    Chuang, Claire Y.; Sattler, Andreas; Sinno, Talid

    2015-04-01

    We study computationally the formation thermodynamics and morphology of silicon self-interstitial clusters using a suite of methods driven by a recent parameterization of the Tersoff empirical potential. Formation free energies and cluster capture zones are computed across a wide range of cluster sizes (2 < N_i < 150) and temperatures (0.65 < T/T_m < 1). Self-interstitial clusters above a critical size (N_i ∼ 25) are found to exhibit complex morphological behavior in which clusters can assume either a variety of disordered, three-dimensional configurations, or one of two macroscopically distinct planar configurations. The latter correspond to the well-known Frank and perfect dislocation loops observed experimentally in ion-implanted silicon. The relative importance of the different cluster morphologies is a function of cluster size and temperature and is dictated by a balance between energetic and entropic forces. The competition between these thermodynamic forces produces a sharp transition between the three-dimensional and planar configurations, and represents a type of order-disorder transition. By contrast, the smaller state space available to smaller clusters restricts the diversity of possible structures and inhibits this morphological transition.