cluster enrichment analysis: Topics by Science.gov

Sample records for cluster enrichment analysis

Enrichment Clusters: A Practical Plan for Real-World, Student-Driven Learning.

ERIC Educational Resources Information Center

Renzulli, Joseph S.; Gentry, Marcia; Reis, Sally M.

This guidebook provides a rationale and guidelines for implementing a student-driven learning approach using enrichment clusters. Enrichment clusters allow students who share a common interest to meet each week to produce a product, performance, or targeted service based on that common interest. Chapter 1 discusses different models of learning.…
Chemical Enrichment History Of Abell 3112 Galaxy Cluster Out To The Virial Radius

NASA Astrophysics Data System (ADS)

Ezer, C.; Bulbul, E.; Ercan, E.; Smith, R.; Bautz, M.; Loewenstein, M.; McDonald, M.; Miller, E.

2017-10-01

The deep potential well of the galaxy clusters confines all metals produced via supernova explosions within the intra-cluster medium (ICM). The radial distributions of these metals along the ICM are direct records of the metal enrichment history. In this work, we investigate the chemical enrichment history of Abell 3112 galaxy cluster from cluster's core to out to radius R_{200} (˜ 1470 kpc) by analyzing a deep 1.2 Ms Suzaku observations with overlapping 72 ks Chandra observations. The fraction of supernova explosions enriching the ICM is obtained by fitting the X-ray spectra with a robust snapec model implemented in XSPEC. The ratio of supernova type Ia explosions to the core collapse supernova explosions is found in the range 0.12 - 0.16 and uniformly distributed out to R_{200}. The uniform spatial distribution of supernova enrichment indicates an early metal enrichment between the epoch of z ˜ 2 - 3. We also observe that W7, CDD, and WDD SN Ia models equally better explain the highest signal-to-noise region compared to 2D delayed detonation model CDDT. We further report the first time temperature (3.37 ± 0.77 keV) and metallicity (0.22 ± 0.08 Z_{⊙}) measurements of this archetypal cluster at its virial radius.
A Clustering-Based Approach to Enriching Code Foraging Environment.

PubMed

Niu, Nan; Jin, Xiaoyu; Niu, Zhendong; Cheng, Jing-Ru C; Li, Ling; Kataev, Mikhail Yu

2016-09-01

Developers often spend valuable time navigating and seeking relevant code in software maintenance. Currently, there is a lack of theoretical foundations to guide tool design and evaluation to best shape the code base to developers. This paper contributes a unified code navigation theory in light of the optimal food-foraging principles. We further develop a novel framework for automatically assessing the foraging mechanisms in the context of program investigation. We use the framework to examine to what extent the clustering of software entities affects code foraging. Our quantitative analysis of long-lived open-source projects suggests that clustering enriches the software environment and improves foraging efficiency. Our qualitative inquiry reveals concrete insights into real developer's behavior. Our research opens the avenue toward building a new set of ecologically valid code navigation tools.
The colour-magnitude relation of globular clusters in Centaurus and Hydra. Constraints on star cluster self-enrichment with a link to massive Milky Way globular clusters

NASA Astrophysics Data System (ADS)

Fensch, J.; Mieske, S.; Müller-Seidlitz, J.; Hilker, M.

2014-07-01

Aims: We investigate the colour-magnitude relation of metal-poor globular clusters, the so-called blue tilt, in the Hydra and Centaurus galaxy clusters and constrain the primordial conditions for star cluster self-enrichment. Methods: We analyse U,I photometry for about 2500 globular clusters in the central regions of Hydra and Centaurus, based on VLT/FORS1 data. We measure the relation between mean colour and luminosity for the blue and red subpopulation of the globular cluster samples. We convert these relations into mass-metallicity space and compare the obtained GC mass-metallicity relation with predictions from the star cluster self-enrichment model by Bailin & Harris (2009, ApJ, 695, 1082). For this we include effects of dynamical and stellar evolution and a physically well motivated primordial mass-radius scaling. Results: We obtain a mass-metallicity scaling of Z ∝ M0.27 ± 0.05 for Centaurus GCs and Z ∝ M0.40 ± 0.06 for Hydra GCs, consistent with the range of observed relations in other environments. We find that the GC mass-metallicity relation already sets in at present-day masses of a few and is well established in the luminosity range of massive MW clusters like ω Centauri. The inclusion of a primordial mass-radius scaling of star clusters significantly improves the fit of the self-enrichment model to the data. The self-enrichment model accurately reproduces the observed relations for average primordial half-light radii rh ~ 1-1.5 pc, star formation efficiencies f⋆ ~ 0.3-0.4, and pre-enrichment levels of [Fe/H] - 1.7 dex. The slightly steeper blue tilt for Hydra can be explained either by a ~30% smaller average rh at fixed f⋆ ~ 0.3, or analogously by a ~20% smaller f⋆ at fixed rh ~ 1.5 pc. Within the self-enrichment scenario, the observed blue tilt implies a correlation between GC mass and width of the stellar metallicity distribution. We find that this implied correlation matches the trend of width with GC mass measured in Galactic GCs
Clusters of Antibiotic Resistance Genes Enriched Together Stay Together in Swine Agriculture

PubMed Central

Johnson, Timothy A.; Stedtfeld, Robert D.; Wang, Qiong; Cole, James R.; Hashsham, Syed A.; Looft, Torey; Zhu, Yong-Guan

2016-01-01

ABSTRACT Antibiotic resistance is a worldwide health risk, but the influence of animal agriculture on the genetic context and enrichment of individual antibiotic resistance alleles remains unclear. Using quantitative PCR followed by amplicon sequencing, we quantified and sequenced 44 genes related to antibiotic resistance, mobile genetic elements, and bacterial phylogeny in microbiomes from U.S. laboratory swine and from swine farms from three Chinese regions. We identified highly abundant resistance clusters: groups of resistance and mobile genetic element alleles that cooccur. For example, the abundance of genes conferring resistance to six classes of antibiotics together with class 1 integrase and the abundance of IS6100-type transposons in three Chinese regions are directly correlated. These resistance cluster genes likely colocalize in microbial genomes in the farms. Resistance cluster alleles were dramatically enriched (up to 1 to 10% as abundant as 16S rRNA) and indicate that multidrug-resistant bacteria are likely the norm rather than an exception in these communities. This enrichment largely occurred independently of phylogenetic composition; thus, resistance clusters are likely present in many bacterial taxa. Furthermore, resistance clusters contain resistance genes that confer resistance to antibiotics independently of their particular use on the farms. Selection for these clusters is likely due to the use of only a subset of the broad range of chemicals to which the clusters confer resistance. The scale of animal agriculture and its wastes, the enrichment and horizontal gene transfer potential of the clusters, and the vicinity of large human populations suggest that managing this resistance reservoir is important for minimizing human risk. PMID:27073098
A Detailed Study of Chemical Enrichment History of Galaxy Clusters out to Virial Radius

NASA Astrophysics Data System (ADS)

Loewenstein, Michael

The origin of the metal enrichment of the intracluster medium (ICM) represents a fundamental problem in extragalactic astrophysics, with implications for our understanding of how stars and galaxies form, the nature of Type Ia supernova (SNIa) progenitors, and the thermal history of the ICM. These heavy elements are ultimately synthesized by supernova (SN) explosions; however, the details of the sites of metal production and mechanisms that transport metals to the ICM remain unclear. To make progress, accurate abundance profiles for multiple elements extending from the cluster core out to the virial radius (r180) are required for a significant cluster sample. We propose an X-ray spectroscopic study of a carefully-chosen sample of archival Suzaku and XMM-Newton observations of 23 clusters: XMM-Newton data probe the cluster temperature and abundances out to (0.5-1)r500, while Suzaku data probe the cluster outskirts. A method devised by our team to utilize all elements with emission lines in the X-ray bandpass to measure the relative contributions of supernova explosions by direct modeling of their X-ray spectra will be applied in order to constrain the demographics of the enriching supernova population. In addition we will conduct a stacking analysis of our already existing Suzaku and XMM-Newton cluster spectra to search for weak emssion lines that are important SN diagnostics, and to look for trends with cluster mass and redshift. The funding we propose here will also support the data analysis of our recent Suzaku observations of the archetypal cluster A3112 (200 ks each on the core and outskirts). Our data analysis, intepreted using theoretical models we have developed, will enable us to constrain the star formation history, SN demographics, and nature of SNIa progenitors associated with galaxy cluster stellar populations - and, hence, directly addresess NASA s Strategic Objective 2.4.2 in Astrophysics that aims to improve the understanding of how the Universe works
Hemagglutinin Clusters in the Plasma Membrane Are Not Enriched with Cholesterol and Sphingolipids

DOE PAGES

Wilson, Robert L.; Frisz, Jessica F.; Klitzing, Haley A.; ...

2015-04-07

The clusters of the influenza envelope protein, hemagglutinin, within the plasma membrane are hypothesized to be enriched with cholesterol and sphingolipids. Here in this paper, we directly tested this hypothesis by using high-resolution secondary ion mass spectrometry to image the distributions of antibody-labeled hemagglutinin and isotope-labeled cholesterol and sphingolipids in the plasma membranes of fibroblast cells that stably express hemagglutinin. We found that the hemagglutinin clusters were neither enriched with cholesterol nor colocalized with sphingolipid domains. Thus, hemagglutinin clustering and localization in the plasma membrane is not controlled by cohesive interactions between hemagglutinin and liquid-ordered domains enriched with cholesterol andmore » sphingolipids, or from specific binding interactions between hemagglutinin, cholesterol, and/or the majority of sphingolipid species in the plasma membrane.« less
Implementing Enrichment Clusters in Elementary Schools: Lessons Learned

ERIC Educational Resources Information Center

Fiddyment, Gail E.

2014-01-01

Enrichment clusters offer a way for schools to encourage a high level of learning as students and adults work together to develop a product, service, or performance by applying advanced knowledge and authentic processes to real-world problems. This study utilized a qualitative research design to examine the perceptions and experiences of two…
Clusters of antibiotic resistance genes enriched together stay together in swine agriculture

DOE PAGES

Johnson, Timothy A.; Stedtfeld, Robert D.; Wang, Qiong; ...

2016-04-12

Antibiotic resistance is a worldwide health risk, but the influence of animal agriculture on the genetic context and enrichment of individual antibiotic resistance alleles remains unclear. Using quantitative PCR followed by amplicon sequencing, we quantified and sequenced 44 genes related to antibiotic resistance, mobile genetic elements, and bacterial phylogeny in microbiomes from U.S. laboratory swine and from swine farms from three Chinese regions. We identified highly abundant resistance clusters: groups of resistance and mobile genetic element alleles that cooccur. For example, the abundance of genes conferring resistance to six classes of antibiotics together with class 1 integrase and the abundancemore » of IS6100-type transposons in three Chinese regions are directly correlated. These resistance cluster genes likely colocalize in microbial genomes in the farms. Resistance cluster alleles were dramatically enriched (up to 1 to 10% as abundant as 16S rRNA) and indicate that multidrug-resistant bacteria are likely the norm rather than an exception in these communities. This enrichment largely occurred independently of phylogenetic composition; thus, resistance clusters are likely present in many bacterial taxa. Furthermore, resistance clusters contain resistance genes that confer resistance to antibiotics independently of their particular use on the farms. Selection for these clusters is likely due to the use of only a subset of the broad range of chemicals to which the clusters confer resistance. The scale of animal agriculture and its wastes, the enrichment and horizontal gene transfer potential of the clusters, and the vicinity of large human populations suggest that managing this resistance reservoir is important for minimizing human risk.Agricultural antibiotic use results in clusters of cooccurring resistance genes that together confer resistance to multiple antibiotics. The use of a single antibiotic could select for an entire suite of resistance
Clusters of antibiotic resistance genes enriched together stay together in swine agriculture

DOE Office of Scientific and Technical Information (OSTI.GOV)

Johnson, Timothy A.; Stedtfeld, Robert D.; Wang, Qiong

Antibiotic resistance is a worldwide health risk, but the influence of animal agriculture on the genetic context and enrichment of individual antibiotic resistance alleles remains unclear. Using quantitative PCR followed by amplicon sequencing, we quantified and sequenced 44 genes related to antibiotic resistance, mobile genetic elements, and bacterial phylogeny in microbiomes from U.S. laboratory swine and from swine farms from three Chinese regions. We identified highly abundant resistance clusters: groups of resistance and mobile genetic element alleles that cooccur. For example, the abundance of genes conferring resistance to six classes of antibiotics together with class 1 integrase and the abundancemore » of IS6100-type transposons in three Chinese regions are directly correlated. These resistance cluster genes likely colocalize in microbial genomes in the farms. Resistance cluster alleles were dramatically enriched (up to 1 to 10% as abundant as 16S rRNA) and indicate that multidrug-resistant bacteria are likely the norm rather than an exception in these communities. This enrichment largely occurred independently of phylogenetic composition; thus, resistance clusters are likely present in many bacterial taxa. Furthermore, resistance clusters contain resistance genes that confer resistance to antibiotics independently of their particular use on the farms. Selection for these clusters is likely due to the use of only a subset of the broad range of chemicals to which the clusters confer resistance. The scale of animal agriculture and its wastes, the enrichment and horizontal gene transfer potential of the clusters, and the vicinity of large human populations suggest that managing this resistance reservoir is important for minimizing human risk.Agricultural antibiotic use results in clusters of cooccurring resistance genes that together confer resistance to multiple antibiotics. The use of a single antibiotic could select for an entire suite of resistance
Clusters of Antibiotic Resistance Genes Enriched Together Stay Together in Swine Agriculture.

PubMed

Johnson, Timothy A; Stedtfeld, Robert D; Wang, Qiong; Cole, James R; Hashsham, Syed A; Looft, Torey; Zhu, Yong-Guan; Tiedje, James M

2016-04-12

Antibiotic resistance is a worldwide health risk, but the influence of animal agriculture on the genetic context and enrichment of individual antibiotic resistance alleles remains unclear. Using quantitative PCR followed by amplicon sequencing, we quantified and sequenced 44 genes related to antibiotic resistance, mobile genetic elements, and bacterial phylogeny in microbiomes from U.S. laboratory swine and from swine farms from three Chinese regions. We identified highly abundant resistance clusters: groups of resistance and mobile genetic element alleles that cooccur. For example, the abundance of genes conferring resistance to six classes of antibiotics together with class 1 integrase and the abundance of IS6100-type transposons in three Chinese regions are directly correlated. These resistance cluster genes likely colocalize in microbial genomes in the farms. Resistance cluster alleles were dramatically enriched (up to 1 to 10% as abundant as 16S rRNA) and indicate that multidrug-resistant bacteria are likely the norm rather than an exception in these communities. This enrichment largely occurred independently of phylogenetic composition; thus, resistance clusters are likely present in many bacterial taxa. Furthermore, resistance clusters contain resistance genes that confer resistance to antibiotics independently of their particular use on the farms. Selection for these clusters is likely due to the use of only a subset of the broad range of chemicals to which the clusters confer resistance. The scale of animal agriculture and its wastes, the enrichment and horizontal gene transfer potential of the clusters, and the vicinity of large human populations suggest that managing this resistance reservoir is important for minimizing human risk. Agricultural antibiotic use results in clusters of cooccurring resistance genes that together confer resistance to multiple antibiotics. The use of a single antibiotic could select for an entire suite of resistance genes if
Identification of complex metabolic states in critically injured patients using bioinformatic cluster analysis.

PubMed

Cohen, Mitchell J; Grossman, Adam D; Morabito, Diane; Knudson, M Margaret; Butte, Atul J; Manley, Geoffrey T

2010-01-01

Advances in technology have made extensive monitoring of patient physiology the standard of care in intensive care units (ICUs). While many systems exist to compile these data, there has been no systematic multivariate analysis and categorization across patient physiological data. The sheer volume and complexity of these data make pattern recognition or identification of patient state difficult. Hierarchical cluster analysis allows visualization of high dimensional data and enables pattern recognition and identification of physiologic patient states. We hypothesized that processing of multivariate data using hierarchical clustering techniques would allow identification of otherwise hidden patient physiologic patterns that would be predictive of outcome. Multivariate physiologic and ventilator data were collected continuously using a multimodal bioinformatics system in the surgical ICU at San Francisco General Hospital. These data were incorporated with non-continuous data and stored on a server in the ICU. A hierarchical clustering algorithm grouped each minute of data into 1 of 10 clusters. Clusters were correlated with outcome measures including incidence of infection, multiple organ failure (MOF), and mortality. We identified 10 clusters, which we defined as distinct patient states. While patients transitioned between states, they spent significant amounts of time in each. Clusters were enriched for our outcome measures: 2 of the 10 states were enriched for infection, 6 of 10 were enriched for MOF, and 3 of 10 were enriched for death. Further analysis of correlations between pairs of variables within each cluster reveals significant differences in physiology between clusters. Here we show for the first time the feasibility of clustering physiological measurements to identify clinically relevant patient states after trauma. These results demonstrate that hierarchical clustering techniques can be useful for visualizing complex multivariate data and may provide new
The origin of ICM enrichment in the outskirts of present-day galaxy clusters from cosmological hydrodynamical simulations

NASA Astrophysics Data System (ADS)

Biffi, V.; Planelles, S.; Borgani, S.; Rasia, E.; Murante, G.; Fabjan, D.; Gaspari, M.

2018-05-01

The uniformity of the intracluster medium (ICM) enrichment level in the outskirts of nearby galaxy clusters suggests that chemical elements were deposited and widely spread into the intergalactic medium before the cluster formation. This observational evidence is supported by numerical findings from cosmological hydrodynamical simulations, as presented in Biffi et al., including the effect of thermal feedback from active galactic nuclei. Here, we further investigate this picture, by tracing back in time the spatial origin and metallicity evolution of the gas residing at z = 0 in the outskirts of simulated galaxy clusters. In these regions, we find a large distribution of iron abundances, including a component of highly enriched gas, already present at z = 2. At z > 1, the gas in the present-day outskirts was distributed over tens of virial radii from the main cluster and had been already enriched within high-redshift haloes. At z = 2, about 40 {per cent} of the most Fe-rich gas at z = 0 was not residing in any halo more massive than 10^{11} h^{-1} M_{⊙} in the region and yet its average iron abundance was already 0.4, w.r.t. the solar value by Anders & Grevesse. This confirms that the in situ enrichment of the ICM in the outskirts of present-day clusters does not play a significant role, and its uniform metal abundance is rather the consequence of the accretion of both low-metallicity and pre-enriched (at z > 2) gas, from the diffuse component and through merging substructures. These findings do not depend on the mass of the cluster nor on its core properties.
Network Analysis Tools: from biological networks to clusters and pathways.

PubMed

Brohée, Sylvain; Faust, Karoline; Lima-Mendez, Gipsi; Vanderstocken, Gilles; van Helden, Jacques

2008-01-01

Network Analysis Tools (NeAT) is a suite of computer tools that integrate various algorithms for the analysis of biological networks: comparison between graphs, between clusters, or between graphs and clusters; network randomization; analysis of degree distribution; network-based clustering and path finding. The tools are interconnected to enable a stepwise analysis of the network through a complete analytical workflow. In this protocol, we present a typical case of utilization, where the tasks above are combined to decipher a protein-protein interaction network retrieved from the STRING database. The results returned by NeAT are typically subnetworks, networks enriched with additional information (i.e., clusters or paths) or tables displaying statistics. Typical networks comprising several thousands of nodes and arcs can be analyzed within a few minutes. The complete protocol can be read and executed in approximately 1 h.
On Iron Enrichment, Star Formation, and Type Ia Supernovae in Galaxy Clusters

NASA Technical Reports Server (NTRS)

Loewenstein, Michael

2006-01-01

The nature of star formation and Type Ia supernovae (SNIa) in galaxies in the field and in rich galaxy clusters are contrasted by juxtaposing the buildup of heavy metals in the universe inferred from observed star formation and supernovae rate histories with data on the evolution of Fe abundances in the intracluster medium (ICM). Models for the chemical evolution of Fe in these environments are constructed, subject to observational constraints, for this purpose. While models with a mean delay for SNIa of 3 Gyr and standard initial mass function (IMF) are fully consistent with observations in the field, cluster Fe enrichment immediately tracked a rapid, top-heavy phase of star formation - although transport of Fe into the ICM may have been more prolonged and star formation likely continued beyond redshift 1. The means of this prompt enrichment consisted of SNII yielding greater than or equal to 0.1 solar mass per explosion (if the SNIa rate normalization is scaled down from its value in the field according to the relative number of candidate progenitor stars in the 3 - 8 solar mass range) and/or SNIa with short delay times originating during the rapid star formation epoch. Star formation is greater than 3 times more efficient in rich clusters than in the field, mitigating the overcooling problem in numerical cluster simulations. Both the fraction of baryons cycled through stars, and the fraction of the total present-day stellar mass in the form of stellar remnants, are substantially greater in clusters than in the field.
Tissue enrichment analysis for C. elegans genomics.

PubMed

Angeles-Albores, David; N Lee, Raymond Y; Chan, Juancarlos; Sternberg, Paul W

2016-09-13

Over the last ten years, there has been explosive development in methods for measuring gene expression. These methods can identify thousands of genes altered between conditions, but understanding these datasets and forming hypotheses based on them remains challenging. One way to analyze these datasets is to associate ontologies (hierarchical, descriptive vocabularies with controlled relations between terms) with genes and to look for enrichment of specific terms. Although Gene Ontology (GO) is available for Caenorhabditis elegans, it does not include anatomical information. We have developed a tool for identifying enrichment of C. elegans tissues among gene sets and generated a website GUI where users can access this tool. Since a common drawback to ontology enrichment analyses is its verbosity, we developed a very simple filtering algorithm to reduce the ontology size by an order of magnitude. We adjusted these filters and validated our tool using a set of 30 gold standards from Expression Cluster data in WormBase. We show our tool can even discriminate between embryonic and larval tissues and can even identify tissues down to the single-cell level. We used our tool to identify multiple neuronal tissues that are down-regulated due to pathogen infection in C. elegans. Our Tissue Enrichment Analysis (TEA) can be found within WormBase, and can be downloaded using Python's standard pip installer. It tests a slimmed-down C. elegans tissue ontology for enrichment of specific terms and provides users with a text and graphic representation of the results.
Ranking metrics in gene set enrichment analysis: do they matter?

PubMed

Zyla, Joanna; Marczyk, Michal; Weiner, January; Polanska, Joanna

2017-05-12

There exist many methods for describing the complex relation between changes of gene expression in molecular pathways or gene ontologies under different experimental conditions. Among them, Gene Set Enrichment Analysis seems to be one of the most commonly used (over 10,000 citations). An important parameter, which could affect the final result, is the choice of a metric for the ranking of genes. Applying a default ranking metric may lead to poor results. In this work 28 benchmark data sets were used to evaluate the sensitivity and false positive rate of gene set analysis for 16 different ranking metrics including new proposals. Furthermore, the robustness of the chosen methods to sample size was tested. Using k-means clustering algorithm a group of four metrics with the highest performance in terms of overall sensitivity, overall false positive rate and computational load was established i.e. absolute value of Moderated Welch Test statistic, Minimum Significant Difference, absolute value of Signal-To-Noise ratio and Baumgartner-Weiss-Schindler test statistic. In case of false positive rate estimation, all selected ranking metrics were robust with respect to sample size. In case of sensitivity, the absolute value of Moderated Welch Test statistic and absolute value of Signal-To-Noise ratio gave stable results, while Baumgartner-Weiss-Schindler and Minimum Significant Difference showed better results for larger sample size. Finally, the Gene Set Enrichment Analysis method with all tested ranking metrics was parallelised and implemented in MATLAB, and is available at https://github.com/ZAEDPolSl/MrGSEA . Choosing a ranking metric in Gene Set Enrichment Analysis has critical impact on results of pathway enrichment analysis. The absolute value of Moderated Welch Test has the best overall sensitivity and Minimum Significant Difference has the best overall specificity of gene set analysis. When the number of non-normally distributed genes is high, using Baumgartner
clusterProfiler: an R package for comparing biological themes among gene clusters.

PubMed

Yu, Guangchuang; Wang, Li-Gen; Han, Yanyan; He, Qing-Yu

2012-05-01

Increasing quantitative data generated from transcriptomics and proteomics require integrative strategies for analysis. Here, we present an R package, clusterProfiler that automates the process of biological-term classification and the enrichment analysis of gene clusters. The analysis module and visualization module were combined into a reusable workflow. Currently, clusterProfiler supports three species, including humans, mice, and yeast. Methods provided in this package can be easily extended to other species and ontologies. The clusterProfiler package is released under Artistic-2.0 License within Bioconductor project. The source code and vignette are freely available at http://bioconductor.org/packages/release/bioc/html/clusterProfiler.html.
Concurrent formation of supermassive stars and globular clusters: implications for early self-enrichment

NASA Astrophysics Data System (ADS)

Gieles, Mark; Charbonnel, Corinne; Krause, Martin G. H.; Hénault-Brunet, Vincent; Agertz, Oscar; Lamers, Henny J. G. L. M.; Bastian, Nathan; Gualandris, Alessia; Zocchi, Alice; Petts, James A.

2018-04-01

We present a model for the concurrent formation of globular clusters (GCs) and supermassive stars (SMSs, ≳ 103 M⊙) to address the origin of the HeCNONaMgAl abundance anomalies in GCs. GCs form in converging gas flows and accumulate low-angular momentum gas, which accretes onto protostars. This leads to an adiabatic contraction of the cluster and an increase of the stellar collision rate. A SMS can form via runaway collisions if the cluster reaches sufficiently high density before two-body relaxation halts the contraction. This condition is met if the number of stars ≳ 106 and the gas accretion rate ≳ 105 M⊙/Myr, reminiscent of GC formation in high gas-density environments, such as - but not restricted to - the early Universe. The strong SMS wind mixes with the inflowing pristine gas, such that the protostars accrete diluted hot-hydrogen burning yields of the SMS. Because of continuous rejuvenation, the amount of processed material liberated by the SMS can be an order of magnitude higher than its maximum mass. This `conveyor-belt' production of hot-hydrogen burning products provides a solution to the mass budget problem that plagues other scenarios. Additionally, the liberated material is mildly enriched in helium and relatively rich in other hot-hydrogen burning products, in agreement with abundances of GCs today. Finally, we find a super-linear scaling between the amount of processed material and cluster mass, providing an explanation for the observed increase of the fraction of processed material with GC mass. We discuss open questions of this new GC enrichment scenario and propose observational tests.
NEAT: an efficient network enrichment analysis test.

PubMed

Signorelli, Mirko; Vinciotti, Veronica; Wit, Ernst C

2016-09-05

Network enrichment analysis is a powerful method, which allows to integrate gene enrichment analysis with the information on relationships between genes that is provided by gene networks. Existing tests for network enrichment analysis deal only with undirected networks, they can be computationally slow and are based on normality assumptions. We propose NEAT, a test for network enrichment analysis. The test is based on the hypergeometric distribution, which naturally arises as the null distribution in this context. NEAT can be applied not only to undirected, but to directed and partially directed networks as well. Our simulations indicate that NEAT is considerably faster than alternative resampling-based methods, and that its capacity to detect enrichments is at least as good as the one of alternative tests. We discuss applications of NEAT to network analyses in yeast by testing for enrichment of the Environmental Stress Response target gene set with GO Slim and KEGG functional gene sets, and also by inspecting associations between functional sets themselves. NEAT is a flexible and efficient test for network enrichment analysis that aims to overcome some limitations of existing resampling-based tests. The method is implemented in the R package neat, which can be freely downloaded from CRAN ( https://cran.r-project.org/package=neat ).

Origin of central abundances in the hot intra-cluster medium. II. Chemical enrichment and supernova yield models

NASA Astrophysics Data System (ADS)

Mernier, F.; de Plaa, J.; Pinto, C.; Kaastra, J. S.; Kosec, P.; Zhang, Y.-Y.; Mao, J.; Werner, N.; Pols, O. R.; Vink, J.

2016-11-01

The hot intra-cluster medium (ICM) is rich in metals, which are synthesised by supernovae (SNe) and accumulate over time into the deep gravitational potential well of clusters of galaxies. Since most of the elements visible in X-rays are formed by type Ia (SNIa) and/or core-collapse (SNcc) supernovae, measuring their abundances gives us direct information on the nucleosynthesis products of billions of SNe since the epoch of the star formation peak (z 2-3). In this study, we compare the most accurate average X/Fe abundance ratios (compiled in a previous work from XMM-Newton EPIC and RGS observations of 44 galaxy clusters, groups, and ellipticals), representative of the chemical enrichment in the nearby ICM, to various SNIa and SNcc nucleosynthesis models found in the literature. The use of a SNcc model combined to any favoured standard SNIa model (deflagration or delayed-detonation) fails to reproduce our abundance pattern. In particular, the Ca/Fe and Ni/Fe ratios are significantly underestimated by the models. We show that the Ca/Fe ratio can be reproduced better, either by taking a SNIa delayed-detonation model that matches the observations of the Tycho supernova remnant, or by adding a contribution from the "Ca-rich gap transient" SNe, whose material should easily mix into the hot ICM. On the other hand, the Ni/Fe ratio can be reproduced better by assuming that both deflagration and delayed-detonation SNIa contribute in similar proportions to the ICM enrichment. In either case, the fraction of SNIa over the total number of SNe (SNIa+SNcc) contributing to the ICM enrichment ranges within 29-45%. This fraction is found to be systematically higher than the corresponding SNIa/(SNIa+SNcc) fraction contributing to the enrichment of the proto-solar environnement (15-25%). We also discuss and quantify two useful constraints on both SNIa (I.e. the initial metallicity on SNIa progenitors and the fraction of low-mass stars that result in SNIa) and SNcc (I.e. the effect of
Fluorine Variations in the Globular Cluster NGC 6656 (M22): Implications for Internal Enrichment Timescales

NASA Astrophysics Data System (ADS)

D'Orazi, Valentina; Lucatello, Sara; Lugaro, Maria; Gratton, Raffaele G.; Angelou, George; Bragaglia, Angela; Carretta, Eugenio; Alves-Brito, Alan; Ivans, Inese I.; Masseron, Thomas; Mucciarelli, Alessio

2013-01-01

Observed chemical (anti)correlations in proton-capture elements among globular cluster stars are presently recognized as the signature of self-enrichment from now extinct, previous generations of stars. This defines the multiple population scenario. Since fluorine is also affected by proton captures, determining its abundance in globular clusters provides new and complementary clues regarding the nature of these previous generations and supplies strong observational constraints to the chemical enrichment timescales. In this paper, we present our results on near-infrared CRIRES spectroscopic observations of six cool giant stars in NGC 6656 (M22): the main objective is to derive the F content and its internal variation in this peculiar cluster, which exhibits significant changes in both light- and heavy-element abundances. Across our sample, we detected F variations beyond the measurement uncertainties and found that the F abundances are positively correlated with O and anticorrelated with Na, as expected according to the multiple population framework. Furthermore, our observations reveal an increase in the F content between the two different sub-groups, s-process rich and s-process poor, hosted within M22. The comparison with theoretical models suggests that asymptotic giant stars with masses between 4 and 5 M ⊙ are responsible for the observed chemical pattern, confirming evidence from previous works: the difference in age between the two sub-components in M22 must be not larger than a few hundred Myr. Based on observations taken with ESO telescopes under program 087.0319(A).
Pancreatic islet enhancer clusters enriched in type 2 diabetes risk-associated variants.

PubMed

Pasquali, Lorenzo; Gaulton, Kyle J; Rodríguez-Seguí, Santiago A; Mularoni, Loris; Miguel-Escalada, Irene; Akerman, İldem; Tena, Juan J; Morán, Ignasi; Gómez-Marín, Carlos; van de Bunt, Martijn; Ponsa-Cobas, Joan; Castro, Natalia; Nammo, Takao; Cebola, Inês; García-Hurtado, Javier; Maestro, Miguel Angel; Pattou, François; Piemonti, Lorenzo; Berney, Thierry; Gloyn, Anna L; Ravassard, Philippe; Skarmeta, José Luis Gómez; Müller, Ferenc; McCarthy, Mark I; Ferrer, Jorge

2014-02-01

Type 2 diabetes affects over 300 million people, causing severe complications and premature death, yet the underlying molecular mechanisms are largely unknown. Pancreatic islet dysfunction is central in type 2 diabetes pathogenesis, and understanding islet genome regulation could therefore provide valuable mechanistic insights. We have now mapped and examined the function of human islet cis-regulatory networks. We identify genomic sequences that are targeted by islet transcription factors to drive islet-specific gene activity and show that most such sequences reside in clusters of enhancers that form physical three-dimensional chromatin domains. We find that sequence variants associated with type 2 diabetes and fasting glycemia are enriched in these clustered islet enhancers and identify trait-associated variants that disrupt DNA binding and islet enhancer activity. Our studies illustrate how islet transcription factors interact functionally with the epigenome and provide systematic evidence that the dysregulation of islet enhancers is relevant to the mechanisms underlying type 2 diabetes.
Cluster Correspondence Analysis.

PubMed

van de Velden, M; D'Enza, A Iodice; Palumbo, F

2017-03-01

A method is proposed that combines dimension reduction and cluster analysis for categorical data by simultaneously assigning individuals to clusters and optimal scaling values to categories in such a way that a single between variance maximization objective is achieved. In a unified framework, a brief review of alternative methods is provided and we show that the proposed method is equivalent to GROUPALS applied to categorical data. Performance of the methods is appraised by means of a simulation study. The results of the joint dimension reduction and clustering methods are compared with the so-called tandem approach, a sequential analysis of dimension reduction followed by cluster analysis. The tandem approach is conjectured to perform worse when variables are added that are unrelated to the cluster structure. Our simulation study confirms this conjecture. Moreover, the results of the simulation study indicate that the proposed method also consistently outperforms alternative joint dimension reduction and clustering methods.
CHEERS: Chemical enrichment of clusters of galaxies measured using a large XMM-Newton sample

NASA Astrophysics Data System (ADS)

de Plaa, J.; Mernier, F.; Kaastra, J.; Pinto, C.

2017-10-01

The Chemical Enrichment RGS Sample (CHEERS) is aimed to be a sample of the most optimal clusters of galaxies for observation with the Reflection Grating Spectrometer (RGS) aboard XMM-Newton. It consists of 5 Ms of deep cluster observations of 44 objects obtained through a very large program and archival observations. The main goal is to measure chemical abundances in the hot Intra-Cluster Medium (ICM) of clusters to provide constraints on chemical evolution models. Especially the origin and evolution of type Ia supernovae is still poorly known and X-ray observations could contribute to constrain models regarding the SNIa explosion mechanism. Due to the high quality of the data, the uncertainties on the abundances are dominated by systematic effects. By carefully treating each systematic effect, we increase the accuracy or estimate the remaining uncertainty on the measurement. The resulting abundances are then compared to supernova models. In addition, also radial abundance profiles are derived. In the talk, we present an overview of the results that the CHEERS collaboration obtained based on the CHEERS data. We focus on the abundance measurements. The other topics range from turbulence measurements through line broadening to cool gas in groups.
GOMA: functional enrichment analysis tool based on GO modules

PubMed Central

Huang, Qiang; Wu, Ling-Yun; Wang, Yong; Zhang, Xiang-Sun

2013-01-01

Analyzing the function of gene sets is a critical step in interpreting the results of high-throughput experiments in systems biology. A variety of enrichment analysis tools have been developed in recent years, but most output a long list of significantly enriched terms that are often redundant, making it difficult to extract the most meaningful functions. In this paper, we present GOMA, a novel enrichment analysis method based on the new concept of enriched functional Gene Ontology (GO) modules. With this method, we systematically revealed functional GO modules, i.e., groups of functionally similar GO terms, via an optimization model and then ranked them by enrichment scores. Our new method simplifies enrichment analysis results by reducing redundancy, thereby preventing inconsistent enrichment results among functionally similar terms and providing more biologically meaningful results. PMID:23237213
MicroRNA-Target Network Inference and Local Network Enrichment Analysis Identify Two microRNA Clusters with Distinct Functions in Head and Neck Squamous Cell Carcinoma

PubMed Central

Sass, Steffen; Pitea, Adriana; Unger, Kristian; Hess, Julia; Mueller, Nikola S.; Theis, Fabian J.

2015-01-01

MicroRNAs represent ~22 nt long endogenous small RNA molecules that have been experimentally shown to regulate gene expression post-transcriptionally. One main interest in miRNA research is the investigation of their functional roles, which can typically be accomplished by identification of mi-/mRNA interactions and functional annotation of target gene sets. We here present a novel method “miRlastic”, which infers miRNA-target interactions using transcriptomic data as well as prior knowledge and performs functional annotation of target genes by exploiting the local structure of the inferred network. For the network inference, we applied linear regression modeling with elastic net regularization on matched microRNA and messenger RNA expression profiling data to perform feature selection on prior knowledge from sequence-based target prediction resources. The novelty of miRlastic inference originates in predicting data-driven intra-transcriptome regulatory relationships through feature selection. With synthetic data, we showed that miRlastic outperformed commonly used methods and was suitable even for low sample sizes. To gain insight into the functional role of miRNAs and to determine joint functional properties of miRNA clusters, we introduced a local enrichment analysis procedure. The principle of this procedure lies in identifying regions of high functional similarity by evaluating the shortest paths between genes in the network. We can finally assign functional roles to the miRNAs by taking their regulatory relationships into account. We thoroughly evaluated miRlastic on a cohort of head and neck cancer (HNSCC) patients provided by The Cancer Genome Atlas. We inferred an mi-/mRNA regulatory network for human papilloma virus (HPV)-associated miRNAs in HNSCC. The resulting network best enriched for experimentally validated miRNA-target interaction, when compared to common methods. Finally, the local enrichment step identified two functional clusters of mi
MicroRNA-Target Network Inference and Local Network Enrichment Analysis Identify Two microRNA Clusters with Distinct Functions in Head and Neck Squamous Cell Carcinoma.

PubMed

Sass, Steffen; Pitea, Adriana; Unger, Kristian; Hess, Julia; Mueller, Nikola S; Theis, Fabian J

2015-12-18

MicroRNAs represent ~22 nt long endogenous small RNA molecules that have been experimentally shown to regulate gene expression post-transcriptionally. One main interest in miRNA research is the investigation of their functional roles, which can typically be accomplished by identification of mi-/mRNA interactions and functional annotation of target gene sets. We here present a novel method "miRlastic", which infers miRNA-target interactions using transcriptomic data as well as prior knowledge and performs functional annotation of target genes by exploiting the local structure of the inferred network. For the network inference, we applied linear regression modeling with elastic net regularization on matched microRNA and messenger RNA expression profiling data to perform feature selection on prior knowledge from sequence-based target prediction resources. The novelty of miRlastic inference originates in predicting data-driven intra-transcriptome regulatory relationships through feature selection. With synthetic data, we showed that miRlastic outperformed commonly used methods and was suitable even for low sample sizes. To gain insight into the functional role of miRNAs and to determine joint functional properties of miRNA clusters, we introduced a local enrichment analysis procedure. The principle of this procedure lies in identifying regions of high functional similarity by evaluating the shortest paths between genes in the network. We can finally assign functional roles to the miRNAs by taking their regulatory relationships into account. We thoroughly evaluated miRlastic on a cohort of head and neck cancer (HNSCC) patients provided by The Cancer Genome Atlas. We inferred an mi-/mRNA regulatory network for human papilloma virus (HPV)-associated miRNAs in HNSCC. The resulting network best enriched for experimentally validated miRNA-target interaction, when compared to common methods. Finally, the local enrichment step identified two functional clusters of miRNAs that
Principal Angle Enrichment Analysis (PAEA): Dimensionally Reduced Multivariate Gene Set Enrichment Analysis Tool.

PubMed

Clark, Neil R; Szymkiewicz, Maciej; Wang, Zichen; Monteiro, Caroline D; Jones, Matthew R; Ma'ayan, Avi

2015-11-01

Gene set analysis of differential expression, which identifies collectively differentially expressed gene sets, has become an important tool for biology. The power of this approach lies in its reduction of the dimensionality of the statistical problem and its incorporation of biological interpretation by construction. Many approaches to gene set analysis have been proposed, but benchmarking their performance in the setting of real biological data is difficult due to the lack of a gold standard. In a previously published work we proposed a geometrical approach to differential expression which performed highly in benchmarking tests and compared well to the most popular methods of differential gene expression. As reported, this approach has a natural extension to gene set analysis which we call Principal Angle Enrichment Analysis (PAEA). PAEA employs dimensionality reduction and a multivariate approach for gene set enrichment analysis. However, the performance of this method has not been assessed nor its implementation as a web-based tool. Here we describe new benchmarking protocols for gene set analysis methods and find that PAEA performs highly. The PAEA method is implemented as a user-friendly web-based tool, which contains 70 gene set libraries and is freely available to the community.
Redefining the Breast Cancer Exosome Proteome by Tandem Mass Tag Quantitative Proteomics and Multivariate Cluster Analysis.

PubMed

Clark, David J; Fondrie, William E; Liao, Zhongping; Hanson, Phyllis I; Fulton, Amy; Mao, Li; Yang, Austin J

2015-10-20

Exosomes are microvesicles of endocytic origin constitutively released by multiple cell types into the extracellular environment. With evidence that exosomes can be detected in the blood of patients with various malignancies, the development of a platform that uses exosomes as a diagnostic tool has been proposed. However, it has been difficult to truly define the exosome proteome due to the challenge of discerning contaminant proteins that may be identified via mass spectrometry using various exosome enrichment strategies. To better define the exosome proteome in breast cancer, we incorporated a combination of Tandem-Mass-Tag (TMT) quantitative proteomics approach and Support Vector Machine (SVM) cluster analysis of three conditioned media derived fractions corresponding to a 10 000g cellular debris pellet, a 100 000g crude exosome pellet, and an Optiprep enriched exosome pellet. The quantitative analysis identified 2 179 proteins in all three fractions, with known exosomal cargo proteins displaying at least a 2-fold enrichment in the exosome fraction based on the TMT protein ratios. Employing SVM cluster analysis allowed for the classification 251 proteins as "true" exosomal cargo proteins. This study provides a robust and vigorous framework for the future development of using exosomes as a potential multiprotein marker phenotyping tool that could be useful in breast cancer diagnosis and monitoring disease progression.
DOSE: an R/Bioconductor package for disease ontology semantic and enrichment analysis.

PubMed

Yu, Guangchuang; Wang, Li-Gen; Yan, Guang-Rong; He, Qing-Yu

2015-02-15

Disease ontology (DO) annotates human genes in the context of disease. DO is important annotation in translating molecular findings from high-throughput data to clinical relevance. DOSE is an R package providing semantic similarity computations among DO terms and genes which allows biologists to explore the similarities of diseases and of gene functions in disease perspective. Enrichment analyses including hypergeometric model and gene set enrichment analysis are also implemented to support discovering disease associations of high-throughput biological data. This allows biologists to verify disease relevance in a biological experiment and identify unexpected disease associations. Comparison among gene clusters is also supported. DOSE is released under Artistic-2.0 License. The source code and documents are freely available through Bioconductor (http://www.bioconductor.org/packages/release/bioc/html/DOSE.html). Supplementary data are available at Bioinformatics online. gcyu@connect.hku.hk or tqyhe@jnu.edu.cn. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Principal Angle Enrichment Analysis (PAEA): Dimensionally Reduced Multivariate Gene Set Enrichment Analysis Tool

PubMed Central

Clark, Neil R.; Szymkiewicz, Maciej; Wang, Zichen; Monteiro, Caroline D.; Jones, Matthew R.; Ma’ayan, Avi

2016-01-01

Gene set analysis of differential expression, which identifies collectively differentially expressed gene sets, has become an important tool for biology. The power of this approach lies in its reduction of the dimensionality of the statistical problem and its incorporation of biological interpretation by construction. Many approaches to gene set analysis have been proposed, but benchmarking their performance in the setting of real biological data is difficult due to the lack of a gold standard. In a previously published work we proposed a geometrical approach to differential expression which performed highly in benchmarking tests and compared well to the most popular methods of differential gene expression. As reported, this approach has a natural extension to gene set analysis which we call Principal Angle Enrichment Analysis (PAEA). PAEA employs dimensionality reduction and a multivariate approach for gene set enrichment analysis. However, the performance of this method has not been assessed nor its implementation as a web-based tool. Here we describe new benchmarking protocols for gene set analysis methods and find that PAEA performs highly. The PAEA method is implemented as a user-friendly web-based tool, which contains 70 gene set libraries and is freely available to the community. PMID:26848405
Enrichment 2.0 Gifted and Talented Education for the 21st Century

ERIC Educational Resources Information Center

Eckstein, Michelle

2009-01-01

Enrichment clusters, a component of the Schoolwide Enrichment Model, are multigrade investigative groups based on constructivist learning methodology. Enrichment clusters are organized around major disciplines, interdisciplinary themes, or cross-disciplinary topics. Within clusters, students are grouped across grade levels by interests and focused…
Comprehensive cluster analysis with Transitivity Clustering.

PubMed

Wittkop, Tobias; Emig, Dorothea; Truss, Anke; Albrecht, Mario; Böcker, Sebastian; Baumbach, Jan

2011-03-01

Transitivity Clustering is a method for the partitioning of biological data into groups of similar objects, such as genes, for instance. It provides integrated access to various functions addressing each step of a typical cluster analysis. To facilitate this, Transitivity Clustering is accessible online and offers three user-friendly interfaces: a powerful stand-alone version, a web interface, and a collection of Cytoscape plug-ins. In this paper, we describe three major workflows: (i) protein (super)family detection with Cytoscape, (ii) protein homology detection with incomplete gold standards and (iii) clustering of gene expression data. This protocol guides the user through the most important features of Transitivity Clustering and takes ∼1 h to complete.
[Cluster analysis in biomedical researches].

PubMed

Akopov, A S; Moskovtsev, A A; Dolenko, S A; Savina, G D

2013-01-01

Cluster analysis is one of the most popular methods for the analysis of multi-parameter data. The cluster analysis reveals the internal structure of the data, group the separate observations on the degree of their similarity. The review provides a definition of the basic concepts of cluster analysis, and discusses the most popular clustering algorithms: k-means, hierarchical algorithms, Kohonen networks algorithms. Examples are the use of these algorithms in biomedical research.
A uniform contribution of core-collapse and type Ia supernovae to the chemical enrichment pattern in the outskirts of the Virgo Cluster

DOE PAGES

Simionescu, A.; Werner, N.; Urban, O.; ...

2015-09-24

We present the first measurements of the abundances of α-elements (Mg, Si, and S) extending out beyond the virial radius of a cluster of galaxies. Our results, based on Suzaku Key Project observations of the Virgo Cluster, show that the chemical composition of the intracluster medium is consistent with being constant on large scales, with a flat distribution of the Si/Fe, S/Fe, and Mg/Fe ratios as a function of radius and azimuth out to 1.4 Mpc (1.3 r 200). Chemical enrichment of the intergalactic medium due solely to core-collapse supernovae (SNcc) is excluded with very high significance; instead, the measuredmore » metal abundance ratios are generally consistent with the solar value. The uniform metal abundance ratios observed today are likely the result of an early phase of enrichment and mixing, with both SNcc and SNe Ia contributing to the metal budget during the period of peak star formation activity at redshifts of 2–3. Furthermore, we estimate the ratio between the number of SNe Ia and the total number of supernovae enriching the intergalactic medium to be between 12% and 37%, broadly consistent with the metal abundance patterns in our own Galaxy or with the SN Ia contribution estimated for the cluster cores.« less
A UNIFORM CONTRIBUTION OF CORE-COLLAPSE AND TYPE Ia SUPERNOVAE TO THE CHEMICAL ENRICHMENT PATTERN IN THE OUTSKIRTS OF THE VIRGO CLUSTER

DOE Office of Scientific and Technical Information (OSTI.GOV)

Simionescu, A.; Ichinohe, Y.; Werner, N.

2015-10-01

We present the first measurements of the abundances of α-elements (Mg, Si, and S) extending out beyond the virial radius of a cluster of galaxies. Our results, based on Suzaku Key Project observations of the Virgo Cluster, show that the chemical composition of the intracluster medium is consistent with being constant on large scales, with a flat distribution of the Si/Fe, S/Fe, and Mg/Fe ratios as a function of radius and azimuth out to 1.4 Mpc (1.3 r{sub 200}). Chemical enrichment of the intergalactic medium due solely to core-collapse supernovae (SNcc) is excluded with very high significance; instead, the measuredmore » metal abundance ratios are generally consistent with the solar value. The uniform metal abundance ratios observed today are likely the result of an early phase of enrichment and mixing, with both SNcc and SNe Ia contributing to the metal budget during the period of peak star formation activity at redshifts of 2–3. We estimate the ratio between the number of SNe Ia and the total number of supernovae enriching the intergalactic medium to be between 12% and 37%, broadly consistent with the metal abundance patterns in our own Galaxy or with the SN Ia contribution estimated for the cluster cores.« less
Clustering in the stellar abundance space

NASA Astrophysics Data System (ADS)

Boesso, R.; Rocha-Pinto, H. J.

2018-03-01

We have studied the chemical enrichment history of the interstellar medium through an analysis of the n-dimensional stellar abundance space. This work is a non-parametric analysis of the stellar chemical abundance space. The main goal is to study the stars from their organization within this abundance space. Within this space, we seek to find clusters (in a statistical sense), that is, stars likely to share similar chemo-evolutionary history, using two methods: the hierarchical clustering and the principal component analysis. We analysed some selected abundance surveys available in the literature. For each sample, we labelled the group of stars according to its average abundance curve. In all samples, we identify the existence of a main enrichment pattern of the stars, which we call chemical enrichment flow. This flow is set by the structured and well-defined mean rate at which the abundances of the interstellar medium increase, resulting from the mixture of the material ejected from the stars and stellar mass-loss and interstellar medium gas. One of the main results of our analysis is the identification of subgroups of stars with peculiar chemistry. These stars are situated in regions outside of the enrichment flow in the abundance space. These peculiar stars show a mismatch in the enrichment rate of a few elements, such as Mg, Si, Sc and V, when compared to the mean enrichment rate of the other elements of the same stars. We believe that the existence of these groups of stars with peculiar chemistry may be related to the accretion of planetary material on to stellar surfaces or may be due to production of the same chemical element by different nucleosynthetic sites.
ALMA REVEALS POTENTIAL LOCALIZED DUST ENRICHMENT FROM MASSIVE STAR CLUSTERS IN II Zw 40

DOE Office of Scientific and Technical Information (OSTI.GOV)

Consiglio, S. Michelle; Turner, Jean L.; Beck, Sara

2016-12-10

We present subarcsecond images of submillimeter CO and continuum emission from a local galaxy forming massive star clusters: the blue compact dwarf galaxy II Zw 40. At ∼0.″4 resolution (20 pc), the CO(3-2), CO(1-0), 3 mm, and 870 μ m continuum maps illustrate star formation on the scales of individual molecular clouds. Dust contributes about one-third of the 870 μ m continuum emission, with free–free accounting for the rest. On these scales, there is not a good correspondence between gas, dust, and free–free emission. Dust continuum is enhanced toward the star-forming region as compared to the CO emission. We suggestmore » that an unexpectedly low and spatially variable gas-to-dust ratio is the result of rapid and localized dust enrichment of clouds by the massive clusters of the starburst.« less
ProbCD: enrichment analysis accounting for categorization uncertainty.

PubMed

Vêncio, Ricardo Z N; Shmulevich, Ilya

2007-10-12

As in many other areas of science, systems biology makes extensive use of statistical association and significance estimates in contingency tables, a type of categorical data analysis known in this field as enrichment (also over-representation or enhancement) analysis. In spite of efforts to create probabilistic annotations, especially in the Gene Ontology context, or to deal with uncertainty in high throughput-based datasets, current enrichment methods largely ignore this probabilistic information since they are mainly based on variants of the Fisher Exact Test. We developed an open-source R-based software to deal with probabilistic categorical data analysis, ProbCD, that does not require a static contingency table. The contingency table for the enrichment problem is built using the expectation of a Bernoulli Scheme stochastic process given the categorization probabilities. An on-line interface was created to allow usage by non-programmers and is available at: http://xerad.systemsbiology.net/ProbCD/. We present an analysis framework and software tools to address the issue of uncertainty in categorical data analysis. In particular, concerning the enrichment analysis, ProbCD can accommodate: (i) the stochastic nature of the high-throughput experimental techniques and (ii) probabilistic gene annotation.

Globular cluster formation with multiple stellar populations: self-enrichment in fractal massive molecular clouds

NASA Astrophysics Data System (ADS)

Bekki, Kenji

2017-08-01

Internal chemical abundance spreads are one of fundamental properties of globular clusters (GCs) in the Galaxy. In order to understand the origin of such abundance spreads, we numerically investigate GC formation from massive molecular clouds (MCs) with fractal structures using our new hydrodynamical simulations with star formation and feedback effects of core-collapse supernovae (SNe) and asymptotic giant branch (AGB) stars. We particularly investigate star formation from gas chemically contaminated by SNe and AGB stars ('self-enrichment') in forming GCs within MCs with different initial conditions and environments. The principal results are as follows. GCs with multiple generations of stars can be formed from merging of hierarchical star cluster complexes that are developed from high-density regions of fractal MCs. Feedback effects of SNe and AGB stars can control the formation efficiencies of stars formed from original gas of MCs and from gas ejected from AGB stars. The simulated GCs have strong radial gradients of helium abundances within the central 3 pc. The original MC masses need to be as large as 107 M⊙ for a canonical initial stellar mass function (IMF) so that the final masses of stars formed from AGB ejecta can be ˜105 M⊙. Since star formation from AGB ejecta is rather prolonged (˜108 yr), their formation can be strongly suppressed by SNe of the stars themselves. This result implies that the so-called mass budget problem is much more severe than ever thought in the self-enrichment scenario of GC formation and thus that IMF for the second generation of stars should be 'top-light'.
ClusterViz: A Cytoscape APP for Cluster Analysis of Biological Network.

PubMed

Wang, Jianxin; Zhong, Jiancheng; Chen, Gang; Li, Min; Wu, Fang-xiang; Pan, Yi

2015-01-01

Cluster analysis of biological networks is one of the most important approaches for identifying functional modules and predicting protein functions. Furthermore, visualization of clustering results is crucial to uncover the structure of biological networks. In this paper, ClusterViz, an APP of Cytoscape 3 for cluster analysis and visualization, has been developed. In order to reduce complexity and enable extendibility for ClusterViz, we designed the architecture of ClusterViz based on the framework of Open Services Gateway Initiative. According to the architecture, the implementation of ClusterViz is partitioned into three modules including interface of ClusterViz, clustering algorithms and visualization and export. ClusterViz fascinates the comparison of the results of different algorithms to do further related analysis. Three commonly used clustering algorithms, FAG-EC, EAGLE and MCODE, are included in the current version. Due to adopting the abstract interface of algorithms in module of the clustering algorithms, more clustering algorithms can be included for the future use. To illustrate usability of ClusterViz, we provided three examples with detailed steps from the important scientific articles, which show that our tool has helped several research teams do their research work on the mechanism of the biological networks.
Changing cluster composition in cluster randomised controlled trials: design and analysis considerations

PubMed Central

2014-01-01

Background There are many methodological challenges in the conduct and analysis of cluster randomised controlled trials, but one that has received little attention is that of post-randomisation changes to cluster composition. To illustrate this, we focus on the issue of cluster merging, considering the impact on the design, analysis and interpretation of trial outcomes. Methods We explored the effects of merging clusters on study power using standard methods of power calculation. We assessed the potential impacts on study findings of both homogeneous cluster merges (involving clusters randomised to the same arm of a trial) and heterogeneous merges (involving clusters randomised to different arms of a trial) by simulation. To determine the impact on bias and precision of treatment effect estimates, we applied standard methods of analysis to different populations under analysis. Results Cluster merging produced a systematic reduction in study power. This effect depended on the number of merges and was most pronounced when variability in cluster size was at its greatest. Simulations demonstrate that the impact on analysis was minimal when cluster merges were homogeneous, with impact on study power being balanced by a change in observed intracluster correlation coefficient (ICC). We found a decrease in study power when cluster merges were heterogeneous, and the estimate of treatment effect was attenuated. Conclusions Examples of cluster merges found in previously published reports of cluster randomised trials were typically homogeneous rather than heterogeneous. Simulations demonstrated that trial findings in such cases would be unbiased. However, simulations also showed that any heterogeneous cluster merges would introduce bias that would be hard to quantify, as well as having negative impacts on the precision of estimates obtained. Further methodological development is warranted to better determine how to analyse such trials appropriately. Interim recommendations
Spectral gene set enrichment (SGSE).

PubMed

Frost, H Robert; Li, Zhigang; Moore, Jason H

2015-03-03

Gene set testing is typically performed in a supervised context to quantify the association between groups of genes and a clinical phenotype. In many cases, however, a gene set-based interpretation of genomic data is desired in the absence of a phenotype variable. Although methods exist for unsupervised gene set testing, they predominantly compute enrichment relative to clusters of the genomic variables with performance strongly dependent on the clustering algorithm and number of clusters. We propose a novel method, spectral gene set enrichment (SGSE), for unsupervised competitive testing of the association between gene sets and empirical data sources. SGSE first computes the statistical association between gene sets and principal components (PCs) using our principal component gene set enrichment (PCGSE) method. The overall statistical association between each gene set and the spectral structure of the data is then computed by combining the PC-level p-values using the weighted Z-method with weights set to the PC variance scaled by Tracy-Widom test p-values. Using simulated data, we show that the SGSE algorithm can accurately recover spectral features from noisy data. To illustrate the utility of our method on real data, we demonstrate the superior performance of the SGSE method relative to standard cluster-based techniques for testing the association between MSigDB gene sets and the variance structure of microarray gene expression data. Unsupervised gene set testing can provide important information about the biological signal held in high-dimensional genomic data sets. Because it uses the association between gene sets and samples PCs to generate a measure of unsupervised enrichment, the SGSE method is independent of cluster or network creation algorithms and, most importantly, is able to utilize the statistical significance of PC eigenvalues to ignore elements of the data most likely to represent noise.
Mixture modelling for cluster analysis.

PubMed

McLachlan, G J; Chang, S U

2004-10-01

Cluster analysis via a finite mixture model approach is considered. With this approach to clustering, the data can be partitioned into a specified number of clusters g by first fitting a mixture model with g components. An outright clustering of the data is then obtained by assigning an observation to the component to which it has the highest estimated posterior probability of belonging; that is, the ith cluster consists of those observations assigned to the ith component (i = 1,..., g). The focus is on the use of mixtures of normal components for the cluster analysis of data that can be regarded as being continuous. But attention is also given to the case of mixed data, where the observations consist of both continuous and discrete variables.
Mass-invariance of the iron enrichment in the hot haloes of massive ellipticals, groups, and clusters of galaxies

NASA Astrophysics Data System (ADS)

Mernier, F.; de Plaa, J.; Werner, N.; Kaastra, J. S.; Raassen, A. J. J.; Gu, L.; Mao, J.; Urdampilleta, I.; Truong, N.; Simionescu, A.

2018-05-01

X-ray measurements find systematically lower Fe abundances in the X-ray emitting haloes pervading groups (kT ≲ 1.7 keV) than in clusters of galaxies. These results have been difficult to reconcile with theoretical predictions. However, models using incomplete atomic data or the assumption of isothermal plasmas may have biased the best fit Fe abundance in groups and giant elliptical galaxies low. In this work, we take advantage of a major update of the atomic code in the spectral fitting package SPEX to re-evaluate the Fe abundance in 43 clusters, groups, and elliptical galaxies (the CHEERS sample) in a self-consistent analysis and within a common radius of 0.1r500. For the first time, we report a remarkably similar average Fe enrichment in all these systems. Unlike previous results, this strongly suggests that metals are synthesised and transported in these haloes with the same average efficiency across two orders of magnitude in total mass. We show that the previous metallicity measurements in low temperature systems were biased low due to incomplete atomic data in the spectral fitting codes. The reasons for such a code-related Fe bias, also implying previously unconsidered biases in the emission measure and temperature structure, are discussed.
The limitations of simple gene set enrichment analysis assuming gene independence.

PubMed

Tamayo, Pablo; Steinhardt, George; Liberzon, Arthur; Mesirov, Jill P

2016-02-01

Since its first publication in 2003, the Gene Set Enrichment Analysis method, based on the Kolmogorov-Smirnov statistic, has been heavily used, modified, and also questioned. Recently a simplified approach using a one-sample t-test score to assess enrichment and ignoring gene-gene correlations was proposed by Irizarry et al. 2009 as a serious contender. The argument criticizes Gene Set Enrichment Analysis's nonparametric nature and its use of an empirical null distribution as unnecessary and hard to compute. We refute these claims by careful consideration of the assumptions of the simplified method and its results, including a comparison with Gene Set Enrichment Analysis's on a large benchmark set of 50 datasets. Our results provide strong empirical evidence that gene-gene correlations cannot be ignored due to the significant variance inflation they produced on the enrichment scores and should be taken into account when estimating gene set enrichment significance. In addition, we discuss the challenges that the complex correlation structure and multi-modality of gene sets pose more generally for gene set enrichment methods. © The Author(s) 2012.
White wines aroma recovery and enrichment: Sensory-led aroma selection and consumer perception.

PubMed

Lezaeta, Alvaro; Bordeu, Edmundo; Agosin, Eduardo; Pérez-Correa, J Ricardo; Varela, Paula

2018-06-01

We developed a sensory-based methodology to aromatically enrich wines using different aromatic fractions recovered during fermentations of Sauvignon Blanc must. By means of threshold determination and generic descriptive analysis using a trained sensory panel, the aromatic fractions were characterized, selected, and clustered. The selected fractions were grouped, re-assessed, and validated by the trained panel. A consumer panel assessed overall liking and answered a CATA question on some enriched wines and their ideal sample. Differences in elicitation rates between non-enriched and enriched wines with respect to the ideal product highlighted product optimization and the role of aromatic enrichment. Enrichment with aromatic fractions increased the aromatic quality of wines and enhanced consumer appreciation. Copyright © 2018. Published by Elsevier Ltd.
Glycemic index and microstructure analysis of a newly developed fiber enriched cookie.

PubMed

Schuchardt, Jan Philipp; Wonik, Jasmin; Bindrich, Ute; Heinemann, Michaela; Kohrs, Heike; Schneider, Inga; Möller, Katharina; Hahn, Andreas

2016-01-01

A diet with a high glycemic index (GI) is associated with an elevated risk for obesity or type 2 diabetes. We investigated the GI of a newly-developed fiber enriched cookie and characterized the microstructure of ingredients used. In a study with 26 non-diabetic healthy volunteers it was shown that the fiber enriched cookie has a GI of 58.9 in relation to white bread as reference. Using a conversion factor of 1.4, the GI of the fiber enriched cookie in relation to a glucose-solution is 42.0 and can be classified as a low-GI food. Postprandial insulin concentration was significantly lower after consumption of fiber enriched cookies compared to white bread. Glucose release after in vitro digestion was significantly lower from fiber enriched cookies compared to other cookies tested. In addition to its high percentage of fiber, the cookies' low GI can be attributed to the limited gelatinization potential of the starch granules found in the ingredients used. Using confocal laser scanning microscopy it is shown that starch granule surface area of whole grain barley flour, spelt flour and oat flakes bears cluster-shaped protein-NSPS complexes that preferentially absorb water in conditions of water shortage and thereby prevent starch gelatinization.
Globular Cluster Formation at High Density: A Model for Elemental Enrichment with Fast Recycling of Massive-star Debris

DOE Office of Scientific and Technical Information (OSTI.GOV)

Elmegreen, Bruce G., E-mail: bge@us.ibm.com

The self-enrichment of massive star clusters by p -processed elements is shown to increase significantly with increasing gas density as a result of enhanced star formation rates and stellar scatterings compared to the lifetime of a massive star. Considering the type of cloud core where a globular cluster (GC) might have formed, we follow the evolution and enrichment of the gas and the time dependence of stellar mass. A key assumption is that interactions between massive stars are important at high density, including interactions between massive stars and massive-star binaries that can shred stellar envelopes. Massive-star interactions should also scattermore » low-mass stars out of the cluster. Reasonable agreement with the observations is obtained for a cloud-core mass of ∼4 × 10{sup 6} M {sub ⊙} and a density of ∼2 × 10{sup 6} cm{sup −3}. The results depend primarily on a few dimensionless parameters, including, most importantly, the ratio of the gas consumption time to the lifetime of a massive star, which has to be low, ∼10%, and the efficiency of scattering low-mass stars per unit dynamical time, which has to be relatively large, such as a few percent. Also for these conditions, the velocity dispersions of embedded GCs should be comparable to the high gas dispersions of galaxies at that time, so that stellar ejection by multistar interactions could cause low-mass stars to leave a dwarf galaxy host altogether. This could solve the problem of missing first-generation stars in the halos of Fornax and WLM.« less
Cluster analysis in phenotyping a Portuguese population.

PubMed

Loureiro, C C; Sa-Couto, P; Todo-Bom, A; Bousquet, J

2015-09-03

Unbiased cluster analysis using clinical parameters has identified asthma phenotypes. Adding inflammatory biomarkers to this analysis provided a better insight into the disease mechanisms. This approach has not yet been applied to asthmatic Portuguese patients. To identify phenotypes of asthma using cluster analysis in a Portuguese asthmatic population treated in secondary medical care. Consecutive patients with asthma were recruited from the outpatient clinic. Patients were optimally treated according to GINA guidelines and enrolled in the study. Procedures were performed according to a standard evaluation of asthma. Phenotypes were identified by cluster analysis using Ward's clustering method. Of the 72 patients enrolled, 57 had full data and were included for cluster analysis. Distribution was set in 5 clusters described as follows: cluster (C) 1, early onset mild allergic asthma; C2, moderate allergic asthma, with long evolution, female prevalence and mixed inflammation; C3, allergic brittle asthma in young females with early disease onset and no evidence of inflammation; C4, severe asthma in obese females with late disease onset, highly symptomatic despite low Th2 inflammation; C5, severe asthma with chronic airflow obstruction, late disease onset and eosinophilic inflammation. In our study population, the identified clusters were mainly coincident with other larger-scale cluster analysis. Variables such as age at disease onset, obesity, lung function, FeNO (Th2 biomarker) and disease severity were important for cluster distinction. Copyright © 2015. Published by Elsevier España, S.L.U.
Data depth based clustering analysis

DOE PAGES

Jeong, Myeong -Hun; Cai, Yaping; Sullivan, Clair J.; ...

2016-01-01

Here, this paper proposes a new algorithm for identifying patterns within data, based on data depth. Such a clustering analysis has an enormous potential to discover previously unknown insights from existing data sets. Many clustering algorithms already exist for this purpose. However, most algorithms are not affine invariant. Therefore, they must operate with different parameters after the data sets are rotated, scaled, or translated. Further, most clustering algorithms, based on Euclidean distance, can be sensitive to noises because they have no global perspective. Parameter selection also significantly affects the clustering results of each algorithm. Unlike many existing clustering algorithms, themore » proposed algorithm, called data depth based clustering analysis (DBCA), is able to detect coherent clusters after the data sets are affine transformed without changing a parameter. It is also robust to noises because using data depth can measure centrality and outlyingness of the underlying data. Further, it can generate relatively stable clusters by varying the parameter. The experimental comparison with the leading state-of-the-art alternatives demonstrates that the proposed algorithm outperforms DBSCAN and HDBSCAN in terms of affine invariance, and exceeds or matches the ro-bustness to noises of DBSCAN or HDBSCAN. The robust-ness to parameter selection is also demonstrated through the case study of clustering twitter data.« less
WordCluster: detecting clusters of DNA words and genomic elements

PubMed Central

2011-01-01

Background Many k-mers (or DNA words) and genomic elements are known to be spatially clustered in the genome. Well established examples are the genes, TFBSs, CpG dinucleotides, microRNA genes and ultra-conserved non-coding regions. Currently, no algorithm exists to find these clusters in a statistically comprehensible way. The detection of clustering often relies on densities and sliding-window approaches or arbitrarily chosen distance thresholds. Results We introduce here an algorithm to detect clusters of DNA words (k-mers), or any other genomic element, based on the distance between consecutive copies and an assigned statistical significance. We implemented the method into a web server connected to a MySQL backend, which also determines the co-localization with gene annotations. We demonstrate the usefulness of this approach by detecting the clusters of CAG/CTG (cytosine contexts that can be methylated in undifferentiated cells), showing that the degree of methylation vary drastically between inside and outside of the clusters. As another example, we used WordCluster to search for statistically significant clusters of olfactory receptor (OR) genes in the human genome. Conclusions WordCluster seems to predict biological meaningful clusters of DNA words (k-mers) and genomic entities. The implementation of the method into a web server is available at http://bioinfo2.ugr.es/wordCluster/wordCluster.php including additional features like the detection of co-localization with gene regions or the annotation enrichment tool for functional analysis of overlapped genes. PMID:21261981
Exosome enrichment of human serum using multiple cycles of centrifugation.

PubMed

Kim, Jeongkwon; Tan, Zhijing; Lubman, David M

2015-09-01

In this work, we compared the use of repeated cycles of centrifugation at conventional speeds for enrichment of exosomes from human serum compared to the use of ultracentrifugation (UC). After removal of cells and cell debris, a speed of 110 000 × g or 40 000 × g was used for the UC or centrifugation enrichment process, respectively. The enriched exosomes were analyzed using the bicinchoninic acid assay, 1D gel separation, transmission electron microscopy, Western blotting, and high-resolution LC-MS/MS analysis. It was found that a five-cycle repetition of UC or centrifugation is necessary for successful removal of nonexosomal proteins in the enrichment of exosomes from human serum. More significantly, 5× centrifugation enrichment was found to provide similar or better performance than 5× UC enrichment in terms of enriched exosome protein amount, Western blot band intensity for detection of CD-63, and numbers of identified exosome-related proteins and cluster of differentiation (CD) proteins. A total of 478 proteins were identified in the LC-MS/MS analyses of exosome proteins obtained from 5× UCs and 5× centrifugations including many important CD membrane proteins. The presence of previously reported exosome-related proteins including key exosome protein markers demonstrates the utility of this method for analysis of proteins in human serum. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
The PhytoClust tool for metabolic gene clusters discovery in plant genomes

PubMed Central

Fuchs, Lisa-Maria

2017-01-01

Abstract The existence of Metabolic Gene Clusters (MGCs) in plant genomes has recently raised increased interest. Thus far, MGCs were commonly identified for pathways of specialized metabolism, mostly those associated with terpene type products. For efficient identification of novel MGCs, computational approaches are essential. Here, we present PhytoClust; a tool for the detection of candidate MGCs in plant genomes. The algorithm employs a collection of enzyme families related to plant specialized metabolism, translated into hidden Markov models, to mine given genome sequences for physically co-localized metabolic enzymes. Our tool accurately identifies previously characterized plant MGCs. An exhaustive search of 31 plant genomes detected 1232 and 5531 putative gene cluster types and candidates, respectively. Clustering analysis of putative MGCs types by species reflected plant taxonomy. Furthermore, enrichment analysis revealed taxa- and species-specific enrichment of certain enzyme families in MGCs. When operating through our web-interface, PhytoClust users can mine a genome either based on a list of known cluster types or by defining new cluster rules. Moreover, for selected plant species, the output can be complemented by co-expression analysis. Altogether, we envisage PhytoClust to enhance novel MGCs discovery which will in turn impact the exploration of plant metabolism. PMID:28486689
Tissue-enriched expression profiles in Aedes aegypti identify hemocyte-specific transcriptome responses to infection

PubMed Central

Choi, Young-Jun; Fuchs, Jeremy F.; Mayhew, George F.; Yu, Helen E.; Christensen, Bruce M.

2012-01-01

Hemocytes are integral components of mosquito immune mechanisms such as phagocytosis, melanization, and production of antimicrobial peptides. However, our understanding of hemocyte-specific molecular processes and their contribution to shaping the host immune response remains limited. To better understand the immunophysiological features distinctive of hemocytes, we conducted genome-wide analysis of hemocyte-enriched transcripts, and examined how tissue-enriched expression patterns change with the immune status of the host. Our microarray data indicate that the hemocyte-enriched trascriptome is dynamic and context-dependent. Analysis of transcripts enriched after bacterial challenge in circulating hemocytes with respect to carcass added a dimension to evaluating infection-responsive genes and immune-related gene families. We resolved patterns of transcriptional change unique to hemocytes from those that are likely shared by other immune responsive tissues, and identified clusters of genes preferentially induced in hemocytes, likely reflecting their involvement in cell type specific functions. In addition, the study revealed conserved hemocyte-enriched molecular repertoires which might be implicated in core hemocyte function by cross-species meta-analysis of microarray expression data from Anopheles gambiae and Drosophila melanogaster. PMID:22796331
On 7Li Enrichment by Low-Mass Metal-Poor Red Giant Branch Stars.

PubMed

de La Reza R; da Silva L; Drake; Terra

2000-06-01

First-ascent red giants with strong and very strong Li lines have just been discovered in globular clusters. Using the stellar internal prompt (7)Li enrichment-mass-loss scenario, we explore the possibility of (7)Li enrichment in the interstellar matter of the globular cluster M3 produced by these Li-rich giants. We found that enrichment as large as 70% or more compared to the initial (7)Li content of M3 can be obtained during the entire life of this cluster. However, because M3 will cross into the Galactic plane several times, the new (7)Li will be very probably removed by ram pressure into the disk. Globular clusters appear then as possible new sources of (7)Li in the Galactic disk. It is also suggested that the known Na/Al variations in stars of globular clusters could be somehow related to the (7)Li variations and that the cool bottom process mixing mechanism acting in the case of (7)Li could also play a role in the case of Na and Al surface enrichments.
Using Cluster Analysis to Examine Husband-Wife Decision Making

ERIC Educational Resources Information Center

Bonds-Raacke, Jennifer M.

2006-01-01

Cluster analysis has a rich history in many disciplines and although cluster analysis has been used in clinical psychology to identify types of disorders, its use in other areas of psychology has been less popular. The purpose of the current experiments was to use cluster analysis to investigate husband-wife decision making. Cluster analysis was…
DICON: interactive visual analysis of multidimensional clusters.

PubMed

Cao, Nan; Gotz, David; Sun, Jimeng; Qu, Huamin

2011-12-01

Clustering as a fundamental data analysis technique has been widely used in many analytic applications. However, it is often difficult for users to understand and evaluate multidimensional clustering results, especially the quality of clusters and their semantics. For large and complex data, high-level statistical information about the clusters is often needed for users to evaluate cluster quality while a detailed display of multidimensional attributes of the data is necessary to understand the meaning of clusters. In this paper, we introduce DICON, an icon-based cluster visualization that embeds statistical information into a multi-attribute display to facilitate cluster interpretation, evaluation, and comparison. We design a treemap-like icon to represent a multidimensional cluster, and the quality of the cluster can be conveniently evaluated with the embedded statistical information. We further develop a novel layout algorithm which can generate similar icons for similar clusters, making comparisons of clusters easier. User interaction and clutter reduction are integrated into the system to help users more effectively analyze and refine clustering results for large datasets. We demonstrate the power of DICON through a user study and a case study in the healthcare domain. Our evaluation shows the benefits of the technique, especially in support of complex multidimensional cluster analysis. © 2011 IEEE
The PhytoClust tool for metabolic gene clusters discovery in plant genomes.

PubMed

Töpfer, Nadine; Fuchs, Lisa-Maria; Aharoni, Asaph

2017-07-07

The existence of Metabolic Gene Clusters (MGCs) in plant genomes has recently raised increased interest. Thus far, MGCs were commonly identified for pathways of specialized metabolism, mostly those associated with terpene type products. For efficient identification of novel MGCs, computational approaches are essential. Here, we present PhytoClust; a tool for the detection of candidate MGCs in plant genomes. The algorithm employs a collection of enzyme families related to plant specialized metabolism, translated into hidden Markov models, to mine given genome sequences for physically co-localized metabolic enzymes. Our tool accurately identifies previously characterized plant MGCs. An exhaustive search of 31 plant genomes detected 1232 and 5531 putative gene cluster types and candidates, respectively. Clustering analysis of putative MGCs types by species reflected plant taxonomy. Furthermore, enrichment analysis revealed taxa- and species-specific enrichment of certain enzyme families in MGCs. When operating through our web-interface, PhytoClust users can mine a genome either based on a list of known cluster types or by defining new cluster rules. Moreover, for selected plant species, the output can be complemented by co-expression analysis. Altogether, we envisage PhytoClust to enhance novel MGCs discovery which will in turn impact the exploration of plant metabolism. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

Are clusters of dietary patterns and cluster membership stable over time? Results of a longitudinal cluster analysis study.

PubMed

Walthouwer, Michel Jean Louis; Oenema, Anke; Soetens, Katja; Lechner, Lilian; de Vries, Hein

2014-11-01

Developing nutrition education interventions based on clusters of dietary patterns can only be done adequately when it is clear if distinctive clusters of dietary patterns can be derived and reproduced over time, if cluster membership is stable, and if it is predictable which type of people belong to a certain cluster. Hence, this study aimed to: (1) identify clusters of dietary patterns among Dutch adults, (2) test the reproducibility of these clusters and stability of cluster membership over time, and (3) identify sociodemographic predictors of cluster membership and cluster transition. This study had a longitudinal design with online measurements at baseline (N=483) and 6 months follow-up (N=379). Dietary intake was assessed with a validated food frequency questionnaire. A hierarchical cluster analysis was performed, followed by a K-means cluster analysis. Multinomial logistic regression analyses were conducted to identify the sociodemographic predictors of cluster membership and cluster transition. At baseline and follow-up, a comparable three-cluster solution was derived, distinguishing a healthy, moderately healthy, and unhealthy dietary pattern. Male and lower educated participants were significantly more likely to have a less healthy dietary pattern. Further, 251 (66.2%) participants remained in the same cluster, 45 (11.9%) participants changed to an unhealthier cluster, and 83 (21.9%) participants shifted to a healthier cluster. Men and people living alone were significantly more likely to shift toward a less healthy dietary pattern. Distinctive clusters of dietary patterns can be derived. Yet, cluster membership is unstable and only few sociodemographic factors were associated with cluster membership and cluster transition. These findings imply that clusters based on dietary intake may not be suitable as a basis for nutrition education interventions. Copyright © 2014 Elsevier Ltd. All rights reserved.
CytoCluster: A Cytoscape Plugin for Cluster Analysis and Visualization of Biological Networks.

PubMed

Li, Min; Li, Dongyan; Tang, Yu; Wu, Fangxiang; Wang, Jianxin

2017-08-31

Nowadays, cluster analysis of biological networks has become one of the most important approaches to identifying functional modules as well as predicting protein complexes and network biomarkers. Furthermore, the visualization of clustering results is crucial to display the structure of biological networks. Here we present CytoCluster, a cytoscape plugin integrating six clustering algorithms, HC-PIN (Hierarchical Clustering algorithm in Protein Interaction Networks), OH-PIN (identifying Overlapping and Hierarchical modules in Protein Interaction Networks), IPCA (Identifying Protein Complex Algorithm), ClusterONE (Clustering with Overlapping Neighborhood Expansion), DCU (Detecting Complexes based on Uncertain graph model), IPC-MCE (Identifying Protein Complexes based on Maximal Complex Extension), and BinGO (the Biological networks Gene Ontology) function. Users can select different clustering algorithms according to their requirements. The main function of these six clustering algorithms is to detect protein complexes or functional modules. In addition, BinGO is used to determine which Gene Ontology (GO) categories are statistically overrepresented in a set of genes or a subgraph of a biological network. CytoCluster can be easily expanded, so that more clustering algorithms and functions can be added to this plugin. Since it was created in July 2013, CytoCluster has been downloaded more than 9700 times in the Cytoscape App store and has already been applied to the analysis of different biological networks. CytoCluster is available from http://apps.cytoscape.org/apps/cytocluster.
Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool

PubMed Central

2013-01-01

Background System-wide profiling of genes and proteins in mammalian cells produce lists of differentially expressed genes/proteins that need to be further analyzed for their collective functions in order to extract new knowledge. Once unbiased lists of genes or proteins are generated from such experiments, these lists are used as input for computing enrichment with existing lists created from prior knowledge organized into gene-set libraries. While many enrichment analysis tools and gene-set libraries databases have been developed, there is still room for improvement. Results Here, we present Enrichr, an integrative web-based and mobile software application that includes new gene-set libraries, an alternative approach to rank enriched terms, and various interactive visualization approaches to display enrichment results using the JavaScript library, Data Driven Documents (D3). The software can also be embedded into any tool that performs gene list analysis. We applied Enrichr to analyze nine cancer cell lines by comparing their enrichment signatures to the enrichment signatures of matched normal tissues. We observed a common pattern of up regulation of the polycomb group PRC2 and enrichment for the histone mark H3K27me3 in many cancer cell lines, as well as alterations in Toll-like receptor and interlukin signaling in K562 cells when compared with normal myeloid CD33+ cells. Such analyses provide global visualization of critical differences between normal tissues and cancer cell lines but can be applied to many other scenarios. Conclusions Enrichr is an easy to use intuitive enrichment analysis web-based tool providing various types of visualization summaries of collective functions of gene lists. Enrichr is open source and freely available online at: http://amp.pharm.mssm.edu/Enrichr. PMID:23586463
Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool.

PubMed

Chen, Edward Y; Tan, Christopher M; Kou, Yan; Duan, Qiaonan; Wang, Zichen; Meirelles, Gabriela Vaz; Clark, Neil R; Ma'ayan, Avi

2013-04-15

System-wide profiling of genes and proteins in mammalian cells produce lists of differentially expressed genes/proteins that need to be further analyzed for their collective functions in order to extract new knowledge. Once unbiased lists of genes or proteins are generated from such experiments, these lists are used as input for computing enrichment with existing lists created from prior knowledge organized into gene-set libraries. While many enrichment analysis tools and gene-set libraries databases have been developed, there is still room for improvement. Here, we present Enrichr, an integrative web-based and mobile software application that includes new gene-set libraries, an alternative approach to rank enriched terms, and various interactive visualization approaches to display enrichment results using the JavaScript library, Data Driven Documents (D3). The software can also be embedded into any tool that performs gene list analysis. We applied Enrichr to analyze nine cancer cell lines by comparing their enrichment signatures to the enrichment signatures of matched normal tissues. We observed a common pattern of up regulation of the polycomb group PRC2 and enrichment for the histone mark H3K27me3 in many cancer cell lines, as well as alterations in Toll-like receptor and interlukin signaling in K562 cells when compared with normal myeloid CD33+ cells. Such analyses provide global visualization of critical differences between normal tissues and cancer cell lines but can be applied to many other scenarios. Enrichr is an easy to use intuitive enrichment analysis web-based tool providing various types of visualization summaries of collective functions of gene lists. Enrichr is open source and freely available online at: http://amp.pharm.mssm.edu/Enrichr.
GOEAST: a web-based software toolkit for Gene Ontology enrichment analysis.

PubMed

Zheng, Qi; Wang, Xiu-Jie

2008-07-01

Gene Ontology (GO) analysis has become a commonly used approach for functional studies of large-scale genomic or transcriptomic data. Although there have been a lot of software with GO-related analysis functions, new tools are still needed to meet the requirements for data generated by newly developed technologies or for advanced analysis purpose. Here, we present a Gene Ontology Enrichment Analysis Software Toolkit (GOEAST), an easy-to-use web-based toolkit that identifies statistically overrepresented GO terms within given gene sets. Compared with available GO analysis tools, GOEAST has the following improved features: (i) GOEAST displays enriched GO terms in graphical format according to their relationships in the hierarchical tree of each GO category (biological process, molecular function and cellular component), therefore, provides better understanding of the correlations among enriched GO terms; (ii) GOEAST supports analysis for data from various sources (probe or probe set IDs of Affymetrix, Illumina, Agilent or customized microarrays, as well as different gene identifiers) and multiple species (about 60 prokaryote and eukaryote species); (iii) One unique feature of GOEAST is to allow cross comparison of the GO enrichment status of multiple experiments to identify functional correlations among them. GOEAST also provides rigorous statistical tests to enhance the reliability of analysis results. GOEAST is freely accessible at http://omicslab.genetics.ac.cn/GOEAST/
Variable Screening for Cluster Analysis.

ERIC Educational Resources Information Center

Donoghue, John R.

Inclusion of irrelevant variables in a cluster analysis adversely affects subgroup recovery. This paper examines using moment-based statistics to screen variables; only variables that pass the screening are then used in clustering. Normal mixtures are analytically shown often to possess negative kurtosis. Two related measures, "m" and…
Somatotyping using 3D anthropometry: a cluster analysis.

PubMed

Olds, Tim; Daniell, Nathan; Petkov, John; David Stewart, Arthur

2013-01-01

Somatotyping is the quantification of human body shape, independent of body size. Hitherto, somatotyping (including the most popular method, the Heath-Carter system) has been based on subjective visual ratings, sometimes supported by surface anthropometry. This study used data derived from three-dimensional (3D) whole-body scans as inputs for cluster analysis to objectively derive clusters of similar body shapes. Twenty-nine dimensions normalised for body size were measured on a purposive sample of 301 adults aged 17-56 years who had been scanned using a Vitus Smart laser scanner. K-means Cluster Analysis with v-fold cross-validation was used to determine shape clusters. Three male and three female clusters emerged, and were visualised using those scans closest to the cluster centroid and a caricature defined by doubling the difference between the average scan and the cluster centroid. The male clusters were decidedly endomorphic (high fatness), ectomorphic (high linearity), and endo-mesomorphic (a mixture of fatness and muscularity). The female clusters were clearly endomorphic, ectomorphic, and the ecto-mesomorphic (a mixture of linearity and muscularity). An objective shape quantification procedure combining 3D scanning and cluster analysis yielded shape clusters strikingly similar to traditional somatotyping.
Pathway enrichment analysis approach based on topological structure and updated annotation of pathway.

PubMed

Yang, Qian; Wang, Shuyuan; Dai, Enyu; Zhou, Shunheng; Liu, Dianming; Liu, Haizhou; Meng, Qianqian; Jiang, Bin; Jiang, Wei

2017-08-16

Pathway enrichment analysis has been widely used to identify cancer risk pathways, and contributes to elucidating the mechanism of tumorigenesis. However, most of the existing approaches use the outdated pathway information and neglect the complex gene interactions in pathway. Here, we first reviewed the existing widely used pathway enrichment analysis approaches briefly, and then, we proposed a novel topology-based pathway enrichment analysis (TPEA) method, which integrated topological properties and global upstream/downstream positions of genes in pathways. We compared TPEA with four widely used pathway enrichment analysis tools, including database for annotation, visualization and integrated discovery (DAVID), gene set enrichment analysis (GSEA), centrality-based pathway enrichment (CePa) and signaling pathway impact analysis (SPIA), through analyzing six gene expression profiles of three tumor types (colorectal cancer, thyroid cancer and endometrial cancer). As a result, we identified several well-known cancer risk pathways that could not be obtained by the existing tools, and the results of TPEA were more stable than that of the other tools in analyzing different data sets of the same cancer. Ultimately, we developed an R package to implement TPEA, which could online update KEGG pathway information and is available at the Comprehensive R Archive Network (CRAN): https://cran.r-project.org/web/packages/TPEA/. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
The metallicity of the intracluster medium over cosmic time: further evidence for early enrichment

NASA Astrophysics Data System (ADS)

Mantz, Adam B.; Allen, Steven W.; Morris, R. Glenn; Simionescu, Aurora; Urban, Ondrej; Werner, Norbert; Zhuravleva, Irina

2017-12-01

We use Chandra X-ray data to measure the metallicity of the intracluster medium (ICM) in 245 massive galaxy clusters selected from X-ray and Sunyaev-Zel'dovich (SZ) effect surveys, spanning redshifts 0 < z < 1.2. Metallicities were measured in three different radial ranges, spanning cluster cores through their outskirts. We explore trends in these measurements as a function of cluster redshift, temperature and surface brightness 'peakiness' (a proxy for gas cooling efficiency in cluster centres). The data at large radii (0.5-1 r500) are consistent with a constant metallicity, while at intermediate radii (0.1-0.5 r500) we see a late-time increase in enrichment, consistent with the expected production and mixing of metals in cluster cores. In cluster centres, there are strong trends of metallicity with temperature and peakiness, reflecting enhanced metal production in the lowest entropy gas. Within the cool-core/sharply peaked cluster population, there is a large intrinsic scatter in central metallicity and no overall evolution, indicating significant astrophysical variations in the efficiency of enrichment. The central metallicity in clusters with flat surface brightness profiles is lower, with a smaller intrinsic scatter, but increases towards lower redshifts. Our results are consistent with other recent measurements of ICM metallicity as a function of redshift. They reinforce the picture implied by observations of uniform metal distributions in the outskirts of nearby clusters, in which most of the enrichment of the ICM takes place before cluster formation, with significant later enrichment taking place only in cluster centres, as the stellar populations of the central galaxies evolve.
Mixed-Initiative Clustering

ERIC Educational Resources Information Center

Huang, Yifen

2010-01-01

Mixed-initiative clustering is a task where a user and a machine work collaboratively to analyze a large set of documents. We hypothesize that a user and a machine can both learn better clustering models through enriched communication and interactive learning from each other. The first contribution or this thesis is providing a framework of…
CytoCluster: A Cytoscape Plugin for Cluster Analysis and Visualization of Biological Networks

PubMed Central

Li, Min; Li, Dongyan; Tang, Yu; Wang, Jianxin

2017-01-01

Nowadays, cluster analysis of biological networks has become one of the most important approaches to identifying functional modules as well as predicting protein complexes and network biomarkers. Furthermore, the visualization of clustering results is crucial to display the structure of biological networks. Here we present CytoCluster, a cytoscape plugin integrating six clustering algorithms, HC-PIN (Hierarchical Clustering algorithm in Protein Interaction Networks), OH-PIN (identifying Overlapping and Hierarchical modules in Protein Interaction Networks), IPCA (Identifying Protein Complex Algorithm), ClusterONE (Clustering with Overlapping Neighborhood Expansion), DCU (Detecting Complexes based on Uncertain graph model), IPC-MCE (Identifying Protein Complexes based on Maximal Complex Extension), and BinGO (the Biological networks Gene Ontology) function. Users can select different clustering algorithms according to their requirements. The main function of these six clustering algorithms is to detect protein complexes or functional modules. In addition, BinGO is used to determine which Gene Ontology (GO) categories are statistically overrepresented in a set of genes or a subgraph of a biological network. CytoCluster can be easily expanded, so that more clustering algorithms and functions can be added to this plugin. Since it was created in July 2013, CytoCluster has been downloaded more than 9700 times in the Cytoscape App store and has already been applied to the analysis of different biological networks. CytoCluster is available from http://apps.cytoscape.org/apps/cytocluster. PMID:28858211
Magnesium isotopes: a tool to understand self-enrichment in globular clusters

NASA Astrophysics Data System (ADS)

Ventura, P.; D'Antona, F.; Imbriani, G.; Di Criscienzo, M.; Dell'Agli, F.; Tailo, M.

2018-06-01

A critical issue in the asymptotic giant branch (AGB) self-enrichment scenario for the formation of multiple populations in globular clusters (GCs) is the inability to reproduce the magnesium isotopic ratios, despite the model in principle can account for the depletion of magnesium. In this work, we analyse how the uncertainties on the various p-capture cross sections affect the results related to the magnesium content of the ejecta of AGB stars. The observed distribution of the magnesium isotopes and of the overall Mg-Al trend in M13 and NGC 6752 are successfully reproduced when the proton-capture rate by 25Mg at the temperatures ˜100 MK, in particular the 25Mg(p, γ)26Alm channel, is enhanced by a factor ˜3 with respect to the most recent experimental determinations. This assumption also allows us to reproduce the full extent of the Mg spread and the Mg-Si anticorrelation observed in NGC 2419. The uncertainties in the rate of the 25Mg(p, γ)26Alm reaction at the temperatures of interest here leave space for our assumption and we suggest that new experimental measurements are needed to settle this problem. We also discuss the competitive model based on the supermassive star nucleosynthesis.
Quantitative mass spectrometric analysis of glycoproteins combined with enrichment methods.

PubMed

Ahn, Yeong Hee; Kim, Jin Young; Yoo, Jong Shin

2015-01-01

Mass spectrometry (MS) has been a core technology for high sensitive and high-throughput analysis of the enriched glycoproteome in aspects of quantitative assays as well as qualitative profiling of glycoproteins. Because it has been widely recognized that aberrant glycosylation in a glycoprotein may involve in progression of a certain disease, the development of efficient analysis tool for the aberrant glycoproteins is very important for deep understanding about pathological function of the glycoprotein and new biomarker development. This review first describes the protein glycosylation-targeting enrichment technologies mainly employing solid-phase extraction methods such as hydrizide-capturing, lectin-specific capturing, and affinity separation techniques based on porous graphitized carbon, hydrophilic interaction chromatography, or immobilized boronic acid. Second, MS-based quantitative analysis strategies coupled with the protein glycosylation-targeting enrichment technologies, by using a label-free MS, stable isotope-labeling, or targeted multiple reaction monitoring (MRM) MS, are summarized with recent published studies. © 2014 The Authors. Mass Spectrometry Reviews Published by Wiley Periodicals, Inc.
Formation, Heating And Chemical Enrichment Of The Intracluster Medium

NASA Astrophysics Data System (ADS)

Eckert, Dominique

2017-07-01

The intracluster medium (ICM) contains the majority of the baryons (80-90%) of galaxy clusters and groups. It has been progressively heated up by gravitational and non-gravitational processes since the cluster formation epoch (z 2-3) until it reaches the very high temperatures we see today, i.e. between 10 and 100 million degrees. The global properties of the ICM follow tight scaling laws with halo mass which are shaped both by gravitational and non-gravitational effects (in particular gas cooling and AGN feedback). Finally, we also know that the ICM is enriched in metals which have been ejected from cluster galaxies throughout the cluster formation history. I will give a review of what is currently known about the formation and evolution of the ICM, focusing on the heating processes (shocks, turbulence) and the metal enrichment history of the gas.
MicroRNA cluster miR-17-92 Cluster in Exosomes Enhance Neuroplasticity and Functional Recovery After Stroke in Rats.

PubMed

Xin, Hongqi; Katakowski, Mark; Wang, Fengjie; Qian, Jian-Yong; Liu, Xian Shuang; Ali, Meser M; Buller, Benjamin; Zhang, Zheng Gang; Chopp, Michael

2017-03-01

Multipotent mesenchymal stromal cell (MSC) harvested exosomes are hypothesized as the major paracrine effectors of MSCs. In vitro, the miR-17-92 cluster promotes oligodendrogenesis, neurogenesis, and axonal outgrowth. We, therefore, investigated whether the miR-17-92 cluster-enriched exosomes harvested from MSCs transfected with an miR-17-92 cluster plasmid enhance neurological recovery compared with control MSC-derived exosomes. Rats subjected to 2 hours of transient middle cerebral artery occlusion were intravenously administered miR-17-92 cluster-enriched exosomes, control MSC exosomes, or liposomes and were euthanized 28 days post-middle cerebral artery occlusion. Histochemistry, immunohistochemistry, and Golgi-Cox staining were used to assess dendritic, axonal, synaptic, and myelin remodeling. Expression of phosphatase and tensin homolog and activation of its downstream proteins, protein kinase B, mechanistic target of rapamycin, and glycogen synthase kinase 3β in the peri-infarct region were measured by means of Western blots. Compared with the liposome treatment, both exosome treatment groups exhibited significant improvement of functional recovery, but miR-17-92 cluster-enriched exosome treatment had significantly more robust effects on improvement of neurological function and enhancements of oligodendrogenesis, neurogenesis, and neurite remodeling/neuronal dendrite plasticity in the ischemic boundary zone (IBZ) than the control MSC exosome treatment. Moreover, miR-17-92 cluster-enriched exosome treatment substantially inhibited phosphatase and tensin homolog, a validated miR-17-92 cluster target gene, and subsequently increased the phosphorylation of phosphatase and tensin homolog downstream proteins, protein kinase B, mechanistic target of rapamycin, and glycogen synthase kinase 3β compared with control MSC exosome treatment. Our data suggest that treatment of stroke with tailored exosomes enriched with the miR-17-92 cluster increases neural plasticity
The uniformity and time-invariance of the intra-cluster metal distribution in galaxy clusters from the IllustrisTNG simulations

NASA Astrophysics Data System (ADS)

Vogelsberger, Mark; Marinacci, Federico; Torrey, Paul; Genel, Shy; Springel, Volker; Weinberger, Rainer; Pakmor, Rüdiger; Hernquist, Lars; Naiman, Jill; Pillepich, Annalisa; Nelson, Dylan

2018-02-01

The distribution of metals in the intra-cluster medium (ICM) encodes important information about the enrichment history and formation of galaxy clusters. Here, we explore the metal content of clusters in IllustrisTNG - a new suite of galaxy formation simulations building on the Illustris project. Our cluster sample contains 20 objects in TNG100 - a ˜(100 Mpc)3 volume simulation with 2 × 18203 resolution elements, and 370 objects in TNG300 - a ˜(300 Mpc)3 volume simulation with 2 × 25003 resolution elements. The z = 0 metallicity profiles agree with observations, and the enrichment history is consistent with observational data going beyond z ˜ 1, showing nearly no metallicity evolution. The abundance profiles vary only minimally within the cluster samples, especially in the outskirts with a relative scatter of ˜ 15 per cent. The average metallicity profile flattens towards the centre, where we find a logarithmic slope of -0.1 compared to -0.5 in the outskirts. Cool core clusters have more centrally peaked metallicity profiles (˜0.8 solar) compared to non-cool core systems (˜0.5 solar), similar to observational trends. Si/Fe and O/Fe radial profiles follow positive gradients. The outer abundance profiles do not evolve below z ˜ 2, whereas the inner profiles flatten towards z = 0. More than ˜ 80 per cent of the metals in the ICM have been accreted from the proto-cluster environment, which has been enriched to ˜0.1 solar already at z ˜ 2. We conclude that the intra-cluster metal distribution is uniform among our cluster sample, nearly time-invariant in the outskirts for more than 10 Gyr, and forms through a universal enrichment history.
Trends of metals enrichment in deposited particulate matter at semi-arid area of Iran.

PubMed

Fouladi Fard, Reza; Naddafi, Kazem; Hassanvand, Mohammad Sadegh; Khazaei, Mohammad; Rahmani, Farah

2018-04-30

The presence and enrichment of heavy metals in dust depositions have been recognized as an emerging environmental health issues in the urban and industrial areas. In this study, the deposition of some metals was found in Qom, a city located in a semi-desert area in Iran that is surrounded by industrial areas. Dust deposition samples were collected using five sampling stations during a year. Dust samples were digested applying acidic condition and then, the metal content was analyzed using inductively coupled plasma technology (ICP-OES). Comparative results showed the following order, from the maximum to the minimum concentration (mg/kg dust) of elements: Ca > Al > Fe > Mg > Ti > Si > K > B > Sr > Mn > P > Ba > Cr > Zn > Ni > Sn > Pb > V > Na > Cu > Co > U > Li > Ce > Ag. The differences among the average concentrations of metals in the five stations were not significant (p value > 0.05). The average concentration of some metals increased significantly during cold seasons. In this study, the cluster analysis (CA) and princicipal component analysis (PCA) were applied, and relationships among some elements in different clusters were found. In addition, the geo-accumulation and enrichment analysis revealed that the following metals had been enriched more than the average values: boron, silver, tin, uranium, lead, zinc, cobalt, chromium, lithium, nickel, strontium, and coper. The presence of thermal power plant, pesticide manufacturing plants, publishing centers, traffic jam, and some industrial areas around the city has resulted in the enrichment of some metals (particularly in cold seasons with atmospheric stable conditions) in dust deposition.
From virtual clustering analysis to self-consistent clustering analysis: a mathematical study

NASA Astrophysics Data System (ADS)

Tang, Shaoqiang; Zhang, Lei; Liu, Wing Kam

2018-03-01

In this paper, we propose a new homogenization algorithm, virtual clustering analysis (VCA), as well as provide a mathematical framework for the recently proposed self-consistent clustering analysis (SCA) (Liu et al. in Comput Methods Appl Mech Eng 306:319-341, 2016). In the mathematical theory, we clarify the key assumptions and ideas of VCA and SCA, and derive the continuous and discrete Lippmann-Schwinger equations. Based on a key postulation of "once response similarly, always response similarly", clustering is performed in an offline stage by machine learning techniques (k-means and SOM), and facilitates substantial reduction of computational complexity in an online predictive stage. The clear mathematical setup allows for the first time a convergence study of clustering refinement in one space dimension. Convergence is proved rigorously, and found to be of second order from numerical investigations. Furthermore, we propose to suitably enlarge the domain in VCA, such that the boundary terms may be neglected in the Lippmann-Schwinger equation, by virtue of the Saint-Venant's principle. In contrast, they were not obtained in the original SCA paper, and we discover these terms may well be responsible for the numerical dependency on the choice of reference material property. Since VCA enhances the accuracy by overcoming the modeling error, and reduce the numerical cost by avoiding an outer loop iteration for attaining the material property consistency in SCA, its efficiency is expected even higher than the recently proposed SCA algorithm.
Anthropogenic Enrichment of Heavy Metals in Urban Dust and Possible Corresponding Sources

NASA Astrophysics Data System (ADS)

van Laaten, Neele; Merten, Dirk; Pirrung, Michael

2017-04-01

Atmospheric dust (particulate matter, PM) is regarded as a crucial factor for human health and a major environmental problem in densely populated areas. Due to anthropogenic processes like traffic, waste incineration and industry increased amounts of PM can be detected in those areas. To reduce the amounts detailed knowledge on both the composition of PM and the source contribution in a target area is needed. The latter has, to our knowledge, rarely been regarded in central Europe. Within this study, spider webs from various locations in the city of Jena (Germany), that act as natural trappers of PM, were analyzed for the contents of 27 trace elements using aqua regia digestion followed by ICP-OES and ICP-MS determinations. Aerosol-crust enrichment factors were calculated for selected elements and both a cluster analysis and a factor analysis were executed to identify sources of PM. High values for the enrichment factors clearly show an anthropogenic influence. In addition, the cluster analysis leads to a grouping of the sampling points mainly depending on the kind and volume of traffic at the corresponding locations. Five different possible sources of PM can be found by the factor analysis: Soil erosion (41% of variance), abrasion of rails (16%), tyre and break wear (16%), charcoal combustion (8%) and oil combustion (7%).
A tripartite clustering analysis on microRNA, gene and disease model.

PubMed

Shen, Chengcheng; Liu, Ying

2012-02-01

Alteration of gene expression in response to regulatory molecules or mutations could lead to different diseases. MicroRNAs (miRNAs) have been discovered to be involved in regulation of gene expression and a wide variety of diseases. In a tripartite biological network of human miRNAs, their predicted target genes and the diseases caused by altered expressions of these genes, valuable knowledge about the pathogenicity of miRNAs, involved genes and related disease classes can be revealed by co-clustering miRNAs, target genes and diseases simultaneously. Tripartite co-clustering can lead to more informative results than traditional co-clustering with only two kinds of members and pass the hidden relational information along the relation chain by considering multi-type members. Here we report a spectral co-clustering algorithm for k-partite graph to find clusters with heterogeneous members. We use the method to explore the potential relationships among miRNAs, genes and diseases. The clusters obtained from the algorithm have significantly higher density than randomly selected clusters, which means members in the same cluster are more likely to have common connections. Results also show that miRNAs in the same family based on the hairpin sequences tend to belong to the same cluster. We also validate the clustering results by checking the correlation of enriched gene functions and disease classes in the same cluster. Finally, widely studied miR-17-92 and its paralogs are analyzed as a case study to reveal that genes and diseases co-clustered with the miRNAs are in accordance with current research findings.

The metallicity of the intracluster medium over cosmic time: further evidence for early enrichment

DOE PAGES

Mantz, Adam B.; Allen, Steven W.; Morris, R. Glenn; ...

2017-08-26

Here, we use Chandra X-ray data to measure the metallicity of the intracluster medium (ICM) in 245 massive galaxy clusters selected from X-ray and Sunyaev–Zel'dovich (SZ) effect surveys, spanning redshifts 0 < z < 1.2. Metallicities were measured in three different radial ranges, spanning cluster cores through their outskirts. We explore trends in these measurements as a function of cluster redshift, temperature and surface brightness ‘peakiness’ (a proxy for gas cooling efficiency in cluster centres). The data at large radii (0.5–1 r500) are consistent with a constant metallicity, while at intermediate radii (0.1–0.5 r500) we see a late-time increase inmore » enrichment, consistent with the expected production and mixing of metals in cluster cores. In cluster centres, there are strong trends of metallicity with temperature and peakiness, reflecting enhanced metal production in the lowest entropy gas. Within the cool-core/sharply peaked cluster population, there is a large intrinsic scatter in central metallicity and no overall evolution, indicating significant astrophysical variations in the efficiency of enrichment. The central metallicity in clusters with flat surface brightness profiles is lower, with a smaller intrinsic scatter, but increases towards lower redshifts. Our results are consistent with other recent measurements of ICM metallicity as a function of redshift. They reinforce the picture implied by observations of uniform metal distributions in the outskirts of nearby clusters, in which most of the enrichment of the ICM takes place before cluster formation, with significant later enrichment taking place only in cluster centres, as the stellar populations of the central galaxies evolve.« less
Investigating Subtypes of Child Development: A Comparison of Cluster Analysis and Latent Class Cluster Analysis in Typology Creation

ERIC Educational Resources Information Center

DiStefano, Christine; Kamphaus, R. W.

2006-01-01

Two classification methods, latent class cluster analysis and cluster analysis, are used to identify groups of child behavioral adjustment underlying a sample of elementary school children aged 6 to 11 years. Behavioral rating information across 14 subscales was obtained from classroom teachers and used as input for analyses. Both the procedures…
MCAM: multiple clustering analysis methodology for deriving hypotheses and insights from high-throughput proteomic datasets.

PubMed

Naegle, Kristen M; Welsch, Roy E; Yaffe, Michael B; White, Forest M; Lauffenburger, Douglas A

2011-07-01

Advances in proteomic technologies continue to substantially accelerate capability for generating experimental data on protein levels, states, and activities in biological samples. For example, studies on receptor tyrosine kinase signaling networks can now capture the phosphorylation state of hundreds to thousands of proteins across multiple conditions. However, little is known about the function of many of these protein modifications, or the enzymes responsible for modifying them. To address this challenge, we have developed an approach that enhances the power of clustering techniques to infer functional and regulatory meaning of protein states in cell signaling networks. We have created a new computational framework for applying clustering to biological data in order to overcome the typical dependence on specific a priori assumptions and expert knowledge concerning the technical aspects of clustering. Multiple clustering analysis methodology ('MCAM') employs an array of diverse data transformations, distance metrics, set sizes, and clustering algorithms, in a combinatorial fashion, to create a suite of clustering sets. These sets are then evaluated based on their ability to produce biological insights through statistical enrichment of metadata relating to knowledge concerning protein functions, kinase substrates, and sequence motifs. We applied MCAM to a set of dynamic phosphorylation measurements of the ERRB network to explore the relationships between algorithmic parameters and the biological meaning that could be inferred and report on interesting biological predictions. Further, we applied MCAM to multiple phosphoproteomic datasets for the ERBB network, which allowed us to compare independent and incomplete overlapping measurements of phosphorylation sites in the network. We report specific and global differences of the ERBB network stimulated with different ligands and with changes in HER2 expression. Overall, we offer MCAM as a broadly-applicable approach
Identification and functional analysis of endothelial tip cell-enriched genes.

PubMed

del Toro, Raquel; Prahst, Claudia; Mathivet, Thomas; Siegfried, Geraldine; Kaminker, Joshua S; Larrivee, Bruno; Breant, Christiane; Duarte, Antonio; Takakura, Nobuyuki; Fukamizu, Akiyoshi; Penninger, Josef; Eichmann, Anne

2010-11-11

Sprouting of developing blood vessels is mediated by specialized motile endothelial cells localized at the tips of growing capillaries. Following behind the tip cells, endothelial stalk cells form the capillary lumen and proliferate. Expression of the Notch ligand Delta-like-4 (Dll4) in tip cells suppresses tip cell fate in neighboring stalk cells via Notch signaling. In DLL4(+/-) mouse mutants, most retinal endothelial cells display morphologic features of tip cells. We hypothesized that these mouse mutants could be used to isolate tip cells and so to determine their genetic repertoire. Using transcriptome analysis of retinal endothelial cells isolated from DLL4(+/-) and wild-type mice, we identified 3 clusters of tip cell-enriched genes, encoding extracellular matrix degrading enzymes, basement membrane components, and secreted molecules. Secreted molecules endothelial-specific molecule 1, angiopoietin 2, and apelin bind to cognate receptors on endothelial stalk cells. Knockout mice and zebrafish morpholino knockdown of apelin showed delayed angiogenesis and reduced proliferation of stalk cells expressing the apelin receptor APJ. Thus, tip cells may regulate angiogenesis via matrix remodeling, production of basement membrane, and release of secreted molecules, some of which regulate stalk cell behavior.
A hybrid monkey search algorithm for clustering analysis.

PubMed

Chen, Xin; Zhou, Yongquan; Luo, Qifang

2014-01-01

Clustering is a popular data analysis and data mining technique. The k-means clustering algorithm is one of the most commonly used methods. However, it highly depends on the initial solution and is easy to fall into local optimum solution. In view of the disadvantages of the k-means method, this paper proposed a hybrid monkey algorithm based on search operator of artificial bee colony algorithm for clustering analysis and experiment on synthetic and real life datasets to show that the algorithm has a good performance than that of the basic monkey algorithm for clustering analysis.
Applications of cluster analysis to satellite soundings

NASA Technical Reports Server (NTRS)

Munteanu, M. J.; Jakubowicz, O.; Kalnay, E.; Piraino, P.

1984-01-01

The advantages of the use of cluster analysis in the improvement of satellite temperature retrievals were evaluated since the use of natural clusters, which are associated with atmospheric temperature soundings characteristic of different types of air masses, has the potential for improving stratified regression schemes in comparison with currently used methods which stratify soundings based on latitude, season, and land/ocean. The method of discriminatory analysis was used. The correct cluster of temperature profiles from satellite measurements was located in 85% of the cases. Considerable improvement was observed at all mandatory levels using regression retrievals derived in the clusters of temperature (weighted and nonweighted) in comparison with the control experiment and with the regression retrievals derived in the clusters of brightness temperatures of 3 MSU and 5 IR channels.
Meta-analysis of pathway enrichment: combining independent and dependent omics data sets.

PubMed

Kaever, Alexander; Landesfeind, Manuel; Feussner, Kirstin; Morgenstern, Burkhard; Feussner, Ivo; Meinicke, Peter

2014-01-01

A major challenge in current systems biology is the combination and integrative analysis of large data sets obtained from different high-throughput omics platforms, such as mass spectrometry based Metabolomics and Proteomics or DNA microarray or RNA-seq-based Transcriptomics. Especially in the case of non-targeted Metabolomics experiments, where it is often impossible to unambiguously map ion features from mass spectrometry analysis to metabolites, the integration of more reliable omics technologies is highly desirable. A popular method for the knowledge-based interpretation of single data sets is the (Gene) Set Enrichment Analysis. In order to combine the results from different analyses, we introduce a methodical framework for the meta-analysis of p-values obtained from Pathway Enrichment Analysis (Set Enrichment Analysis based on pathways) of multiple dependent or independent data sets from different omics platforms. For dependent data sets, e.g. obtained from the same biological samples, the framework utilizes a covariance estimation procedure based on the nonsignificant pathways in single data set enrichment analysis. The framework is evaluated and applied in the joint analysis of Metabolomics mass spectrometry and Transcriptomics DNA microarray data in the context of plant wounding. In extensive studies of simulated data set dependence, the introduced correlation could be fully reconstructed by means of the covariance estimation based on pathway enrichment. By restricting the range of p-values of pathways considered in the estimation, the overestimation of correlation, which is introduced by the significant pathways, could be reduced. When applying the proposed methods to the real data sets, the meta-analysis was shown not only to be a powerful tool to investigate the correlation between different data sets and summarize the results of multiple analyses but also to distinguish experiment-specific key pathways.
QUANTITATIVE MASS SPECTROMETRIC ANALYSIS OF GLYCOPROTEINS COMBINED WITH ENRICHMENT METHODS

PubMed Central

Ahn, Yeong Hee; Kim, Jin Young; Yoo, Jong Shin

2015-01-01

Mass spectrometry (MS) has been a core technology for high sensitive and high-throughput analysis of the enriched glycoproteome in aspects of quantitative assays as well as qualitative profiling of glycoproteins. Because it has been widely recognized that aberrant glycosylation in a glycoprotein may involve in progression of a certain disease, the development of efficient analysis tool for the aberrant glycoproteins is very important for deep understanding about pathological function of the glycoprotein and new biomarker development. This review first describes the protein glycosylation-targeting enrichment technologies mainly employing solid-phase extraction methods such as hydrizide-capturing, lectin-specific capturing, and affinity separation techniques based on porous graphitized carbon, hydrophilic interaction chromatography, or immobilized boronic acid. Second, MS-based quantitative analysis strategies coupled with the protein glycosylation-targeting enrichment technologies, by using a label-free MS, stable isotope-labeling, or targeted multiple reaction monitoring (MRM) MS, are summarized with recent published studies. © 2014 The Authors. Mass Spectrometry Reviews Published by Wiley Periodicals, Inc. Rapid Commun. Mass Spec Rev 34:148–165, 2015. PMID:24889823
Cluster analysis of multiple planetary flow regimes

NASA Technical Reports Server (NTRS)

Mo, Kingtse; Ghil, Michael

1987-01-01

A modified cluster analysis method was developed to identify spatial patterns of planetary flow regimes, and to study transitions between them. This method was applied first to a simple deterministic model and second to Northern Hemisphere (NH) 500 mb data. The dynamical model is governed by the fully-nonlinear, equivalent-barotropic vorticity equation on the sphere. Clusters of point in the model's phase space are associated with either a few persistent or with many transient events. Two stationary clusters have patterns similar to unstable stationary model solutions, zonal, or blocked. Transient clusters of wave trains serve as way stations between the stationary ones. For the NH data, cluster analysis was performed in the subspace of the first seven empirical orthogonal functions (EOFs). Stationary clusters are found in the low-frequency band of more than 10 days, and transient clusters in the bandpass frequency window between 2.5 and 6 days. In the low-frequency band three pairs of clusters determine, respectively, EOFs 1, 2, and 3. They exhibit well-known regional features, such as blocking, the Pacific/North American (PNA) pattern and wave trains. Both model and low-pass data show strong bimodality. Clusters in the bandpass window show wave-train patterns in the two jet exit regions. They are related, as in the model, to transitions between stationary clusters.
Anaerobic degradation of propane and butane by sulfate-reducing bacteria enriched from marine hydrocarbon cold seeps.

PubMed

Jaekel, Ulrike; Musat, Niculina; Adam, Birgit; Kuypers, Marcel; Grundmann, Olav; Musat, Florin

2013-05-01

The short-chain, non-methane hydrocarbons propane and butane can contribute significantly to the carbon and sulfur cycles in marine environments affected by oil or natural gas seepage. In the present study, we enriched and identified novel propane and butane-degrading sulfate reducers from marine oil and gas cold seeps in the Gulf of Mexico and Hydrate Ridge. The enrichment cultures obtained were able to degrade simultaneously propane and butane, but not other gaseous alkanes. They were cold-adapted, showing highest sulfate-reduction rates between 16 and 20 °C. Analysis of 16S rRNA gene libraries, followed by whole-cell hybridizations with sequence-specific oligonucleotide probes showed that each enrichment culture was dominated by a unique phylotype affiliated with the Desulfosarcina-Desulfococcus cluster within the Deltaproteobacteria. These phylotypes formed a distinct phylogenetic cluster of propane and butane degraders, including sequences from environments associated with hydrocarbon seeps. Incubations with (13)C-labeled substrates, hybridizations with sequence-specific probes and nanoSIMS analyses showed that cells of the dominant phylotypes were the first to become enriched in (13)C, demonstrating that they were directly involved in hydrocarbon degradation. Furthermore, using the nanoSIMS data, carbon assimilation rates were calculated for the dominant cells in each enrichment culture.
Cluster analysis of multiple planetary flow regimes

NASA Technical Reports Server (NTRS)

Mo, Kingtse; Ghil, Michael

1988-01-01

A modified cluster analysis method developed for the classification of quasi-stationary events into a few planetary flow regimes and for the examination of transitions between these regimes is described. The method was applied first to a simple deterministic model and then to a 500-mbar data set for Northern Hemisphere (NH), for which cluster analysis was carried out in the subspace of the first seven empirical orthogonal functions (EOFs). Stationary clusters were found in the low-frequency band of more than 10 days, while transient clusters were found in the band-pass frequency window between 2.5 and 6 days. In the low-frequency band, three pairs of clusters determined EOFs 1, 2, and 3, respectively; they exhibited well-known regional features, such as blocking, the Pacific/North American pattern, and wave trains. Both model and low-pass data exhibited strong bimodality.
[Visual field progression in glaucoma: cluster analysis].

PubMed

Bresson-Dumont, H; Hatton, J; Foucher, J; Fonteneau, M

2012-11-01

Visual field progression analysis is one of the key points in glaucoma monitoring, but distinction between true progression and random fluctuation is sometimes difficult. There are several different algorithms but no real consensus for detecting visual field progression. The trend analysis of global indices (MD, sLV) may miss localized deficits or be affected by media opacities. Conversely, point-by-point analysis makes progression difficult to differentiate from physiological variability, particularly when the sensitivity of a point is already low. The goal of our study was to analyse visual field progression with the EyeSuite™ Octopus Perimetry Clusters algorithm in patients with no significant changes in global indices or worsening of the analysis of pointwise linear regression. We analyzed the visual fields of 162 eyes (100 patients - 58 women, 42 men, average age 66.8 ± 10.91) with ocular hypertension or glaucoma. For inclusion, at least six reliable visual fields per eye were required, and the trend analysis (EyeSuite™ Perimetry) of visual field global indices (MD and SLV), could show no significant progression. The analysis of changes in cluster mode was then performed. In a second step, eyes with statistically significant worsening of at least one of their clusters were analyzed point-by-point with the Octopus Field Analysis (OFA). Fifty four eyes (33.33%) had a significant worsening in some clusters, while their global indices remained stable over time. In this group of patients, more advanced glaucoma was present than in stable group (MD 6.41 dB vs. 2.87); 64.82% (35/54) of those eyes in which the clusters progressed, however, had no statistically significant change in the trend analysis by pointwise linear regression. Most software algorithms for analyzing visual field progression are essentially trend analyses of global indices, or point-by-point linear regression. This study shows the potential role of analysis by clusters trend. However, for best
Cluster Analysis to Identify Possible Subgroups in Tinnitus Patients.

PubMed

van den Berge, Minke J C; Free, Rolien H; Arnold, Rosemarie; de Kleine, Emile; Hofman, Rutger; van Dijk, J Marc C; van Dijk, Pim

2017-01-01

In tinnitus treatment, there is a tendency to shift from a "one size fits all" to a more individual, patient-tailored approach. Insight in the heterogeneity of the tinnitus spectrum might improve the management of tinnitus patients in terms of choice of treatment and identification of patients with severe mental distress. The goal of this study was to identify subgroups in a large group of tinnitus patients. Data were collected from patients with severe tinnitus complaints visiting our tertiary referral tinnitus care group at the University Medical Center Groningen. Patient-reported and physician-reported variables were collected during their visit to our clinic. Cluster analyses were used to characterize subgroups. For the selection of the right variables to enter in the cluster analysis, two approaches were used: (1) variable reduction with principle component analysis and (2) variable selection based on expert opinion. Various variables of 1,783 tinnitus patients were included in the analyses. Cluster analysis (1) included 976 patients and resulted in a four-cluster solution. The effect of external influences was the most discriminative between the groups, or clusters, of patients. The "silhouette measure" of the cluster outcome was low (0.2), indicating a "no substantial" cluster structure. Cluster analysis (2) included 761 patients and resulted in a three-cluster solution, comparable to the first analysis. Again, a "no substantial" cluster structure was found (0.2). Two cluster analyses on a large database of tinnitus patients revealed that clusters of patients are mostly formed by a different response of external influences on their disease. However, both cluster outcomes based on this dataset showed a poor stability, suggesting that our tinnitus population comprises a continuum rather than a number of clearly defined subgroups.
Effective Enrichment and Mass Spectrometry Analysis of Phosphopeptides Using Mesoporous Metal Oxide Nanomaterials

PubMed Central

Nelson, Cory A.; Szczech, Jeannine R.; Dooley, Chad J.; Xu, Qingge; Lawrence, Matthew J.; Zhu, Haoyue; Jin, Song; Ge, Ying

2010-01-01

Mass spectrometry (MS)-based phosphoproteomics remains challenging due to the low abundance of phosphoproteins and substoichiometric phosphorylation. This demands better methods to effectively enrich phosphoproteins/peptides prior to MS analysis. We have previously communicated the first use of mesoporous zirconium oxide (ZrO2) nanomaterials for effective phosphopeptide enrichment. Here we present the full report including the synthesis, characterization, and application of mesoporous titanium dioxide (TiO2), ZrO2, and hafnium oxide (HfO2) in phosphopeptide enrichment and MS analysis. Mesoporous ZrO2 and HfO2 are demonstrated to be superior to TiO2 for phosphopeptide enrichment from a complex mixture with high specificity (>99%), which could almost be considered as “a purification”, mainly because of the extremely large active surface area of mesoporous nanomaterials. A single enrichment and Fourier transform MS analysis of phosphopeptides digested from a complex mixture containing 7% of α-casein identified 21 out of 22 phosphorylation sites for α-casein. Moreover, the mesoporous ZrO2 and HfO2 can be reused after a simple solution regeneration procedure with comparable enrichment performance to that of fresh materials. Mesoporous ZrO2 and HfO2 nanomaterials hold great promise for applications in MS-based phosphoproteomics. PMID:20704311
A cluster-based strategy for assessing the overlap between large chemical libraries and its application to a recent acquisition.

PubMed

Engels, Michael F M; Gibbs, Alan C; Jaeger, Edward P; Verbinnen, Danny; Lobanov, Victor S; Agrafiotis, Dimitris K

2006-01-01

We report on the structural comparison of the corporate collections of Johnson & Johnson Pharmaceutical Research & Development (JNJPRD) and 3-Dimensional Pharmaceuticals (3DP), performed in the context of the recent acquisition of 3DP by JNJPRD. The main objective of the study was to assess the druglikeness of the 3DP library and the extent to which it enriched the chemical diversity of the JNJPRD corporate collection. The two databases, at the time of acquisition, collectively contained more than 1.1 million compounds with a clearly defined structural description. The analysis was based on a clustering approach and aimed at providing an intuitive quantitative estimate and visual representation of this enrichment. A novel hierarchical clustering algorithm called divisive k-means was employed in combination with Kelley's cluster-level selection method to partition the combined data set into clusters, and the diversity contribution of each library was evaluated as a function of the relative occupancy of these clusters. Typical 3DP chemotypes enriching the diversity of the JNJPRD collection were catalogued and visualized using a modified maximum common substructure algorithm. The joint collection of JNJPRD and 3DP compounds was also compared to other databases of known medicinally active or druglike compounds. The potential of the methodology for the analysis of very large chemical databases is discussed.
A machine learning approach for ranking clusters of docked protein‐protein complexes by pairwise cluster comparison

PubMed Central

Pfeiffenberger, Erik; Chaleil, Raphael A.G.; Moal, Iain H.

2017-01-01

ABSTRACT Reliable identification of near‐native poses of docked protein–protein complexes is still an unsolved problem. The intrinsic heterogeneity of protein–protein interactions is challenging for traditional biophysical or knowledge based potentials and the identification of many false positive binding sites is not unusual. Often, ranking protocols are based on initial clustering of docked poses followed by the application of an energy function to rank each cluster according to its lowest energy member. Here, we present an approach of cluster ranking based not only on one molecular descriptor (e.g., an energy function) but also employing a large number of descriptors that are integrated in a machine learning model, whereby, an extremely randomized tree classifier based on 109 molecular descriptors is trained. The protocol is based on first locally enriching clusters with additional poses, the clusters are then characterized using features describing the distribution of molecular descriptors within the cluster, which are combined into a pairwise cluster comparison model to discriminate near‐native from incorrect clusters. The results show that our approach is able to identify clusters containing near‐native protein–protein complexes. In addition, we present an analysis of the descriptors with respect to their power to discriminate near native from incorrect clusters and how data transformations and recursive feature elimination can improve the ranking performance. Proteins 2017; 85:528–543. © 2016 Wiley Periodicals, Inc. PMID:27935158
Separate enrichment analysis of pathways for up- and downregulated genes.

PubMed

Hong, Guini; Zhang, Wenjing; Li, Hongdong; Shen, Xiaopei; Guo, Zheng

2014-03-06

Two strategies are often adopted for enrichment analysis of pathways: the analysis of all differentially expressed (DE) genes together or the analysis of up- and downregulated genes separately. However, few studies have examined the rationales of these enrichment analysis strategies. Using both microarray and RNA-seq data, we show that gene pairs with functional links in pathways tended to have positively correlated expression levels, which could result in an imbalance between the up- and downregulated genes in particular pathways. We then show that the imbalance could greatly reduce the statistical power for finding disease-associated pathways through the analysis of all-DE genes. Further, using gene expression profiles from five types of tumours, we illustrate that the separate analysis of up- and downregulated genes could identify more pathways that are really pertinent to phenotypic difference. In conclusion, analysing up- and downregulated genes separately is more powerful than analysing all of the DE genes together.
Nursing home care quality: a cluster analysis.

PubMed

Grøndahl, Vigdis Abrahamsen; Fagerli, Liv Berit

2017-02-13

Purpose The purpose of this paper is to explore potential differences in how nursing home residents rate care quality and to explore cluster characteristics. Design/methodology/approach A cross-sectional design was used, with one questionnaire including questions from quality from patients' perspective and Big Five personality traits, together with questions related to socio-demographic aspects and health condition. Residents ( n=103) from four Norwegian nursing homes participated (74.1 per cent response rate). Hierarchical cluster analysis identified clusters with respect to care quality perceptions. χ 2 tests and one-way between-groups ANOVA were performed to characterise the clusters ( p<0.05). Findings Two clusters were identified; Cluster 1 residents (28.2 per cent) had the best care quality perceptions and Cluster 2 (67.0 per cent) had the worst perceptions. The clusters were statistically significant and characterised by personal-related conditions: gender, psychological well-being, preferences, admission, satisfaction with staying in the nursing home, emotional stability and agreeableness, and by external objective care conditions: healthcare personnel and registered nurses. Research limitations/implications Residents assessed as having no cognitive impairments were included, thus excluding the largest group. By choosing questionnaire design and structured interviews, the number able to participate may increase. Practical implications Findings may provide healthcare personnel and managers with increased knowledge on which to develop strategies to improve specific care quality perceptions. Originality/value Cluster analysis can be an effective tool for differentiating between nursing homes residents' care quality perceptions.
LOLAweb: a containerized web server for interactive genomic locus overlap enrichment analysis.

PubMed

Nagraj, V P; Magee, Neal E; Sheffield, Nathan C

2018-06-06

The past few years have seen an explosion of interest in understanding the role of regulatory DNA. This interest has driven large-scale production of functional genomics data and analytical methods. One popular analysis is to test for enrichment of overlaps between a query set of genomic regions and a database of region sets. In this way, new genomic data can be easily connected to annotations from external data sources. Here, we present an interactive interface for enrichment analysis of genomic locus overlaps using a web server called LOLAweb. LOLAweb accepts a set of genomic ranges from the user and tests it for enrichment against a database of region sets. LOLAweb renders results in an R Shiny application to provide interactive visualization features, enabling users to filter, sort, and explore enrichment results dynamically. LOLAweb is built and deployed in a Linux container, making it scalable to many concurrent users on our servers and also enabling users to download and run LOLAweb locally.
Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale

PubMed Central

Kobourov, Stephen; Gallant, Mike; Börner, Katy

2016-01-01

Overview Notions of community quality underlie the clustering of networks. While studies surrounding network clustering are increasingly common, a precise understanding of the realtionship between different cluster quality metrics is unknown. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely-used network clustering algorithms—Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. Cluster Quality Metrics We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on the information recovery metrics. Additionally, our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. Network Clustering Algorithms Smart local moving is the overall best performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it an absolutely superior algorithm. Interestingly, Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain. We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large

Involvement of glycosphingolipid-enriched lipid rafts in inflammatory responses.

PubMed

Iwabuchi, Kazuhisa

2015-01-01

Glycosphingolipids (GSLs) are membrane components consisting of hydrophobic ceramide and hydrophilic sugar moieties. GSLs cluster with cholesterol in cell membranes to form GSL-enriched lipid rafts. Biochemical analyses have demonstrated that GSL-enriched lipid rafts contain several kinds of transducer molecules, including Src family kinases. Among the GSLs, lactosylceramide (LacCer, CDw17) can bind to various microorganisms, is highly expressed on the plasma membranes of human phagocytes, and forms lipid rafts containing the Src family tyrosine kinase Lyn. LacCer-enriched lipid rafts mediate immunological and inflammatory reactions, including superoxide generation, chemotaxis, and non-opsonic phagocytosis. Therefore, LacCer-enriched membrane microdomains are thought to function as pattern recognition receptors (PRRs), which recognize pathogen-associated molecular patterns (PAMPs) expressed on microorganisms. LacCer also serves as a signal transduction molecule for functions mediated by CD11b/CD18-integrin (αM/β2-integrin, CR3, Mac-1), as well as being associated with several key cellular processes. LacCer recruits PCKα/ε and phospholipase A2 to stimulate PECAM-1 expression in human monocytes and their adhesion to endothelial cells, as well as regulating β1-integrin clustering and endocytosis on cell surfaces. This review describes the organizational and inflammation-related functions of LacCer-enriched lipid rafts.
MAVTgsa: An R Package for Gene Set (Enrichment) Analysis

DOE PAGES

Chien, Chih-Yi; Chang, Ching-Wei; Tsai, Chen-An; ...

2014-01-01

Gene semore » t analysis methods aim to determine whether an a priori defined set of genes shows statistically significant difference in expression on either categorical or continuous outcomes. Although many methods for gene set analysis have been proposed, a systematic analysis tool for identification of different types of gene set significance modules has not been developed previously. This work presents an R package, called MAVTgsa, which includes three different methods for integrated gene set enrichment analysis. (1) The one-sided OLS (ordinary least squares) test detects coordinated changes of genes in gene set in one direction, either up- or downregulation. (2) The two-sided MANOVA (multivariate analysis variance) detects changes both up- and downregulation for studying two or more experimental conditions. (3) A random forests-based procedure is to identify gene sets that can accurately predict samples from different experimental conditions or are associated with the continuous phenotypes. MAVTgsa computes the P values and FDR (false discovery rate) q -value for all gene sets in the study. Furthermore, MAVTgsa provides several visualization outputs to support and interpret the enrichment results. This package is available online.« less
The smart cluster method. Adaptive earthquake cluster identification and analysis in strong seismic regions

NASA Astrophysics Data System (ADS)

Schaefer, Andreas M.; Daniell, James E.; Wenzel, Friedemann

2017-07-01

Earthquake clustering is an essential part of almost any statistical analysis of spatial and temporal properties of seismic activity. The nature of earthquake clusters and subsequent declustering of earthquake catalogues plays a crucial role in determining the magnitude-dependent earthquake return period and its respective spatial variation for probabilistic seismic hazard assessment. This study introduces the Smart Cluster Method (SCM), a new methodology to identify earthquake clusters, which uses an adaptive point process for spatio-temporal cluster identification. It utilises the magnitude-dependent spatio-temporal earthquake density to adjust the search properties, subsequently analyses the identified clusters to determine directional variation and adjusts its search space with respect to directional properties. In the case of rapid subsequent ruptures like the 1992 Landers sequence or the 2010-2011 Darfield-Christchurch sequence, a reclassification procedure is applied to disassemble subsequent ruptures using near-field searches, nearest neighbour classification and temporal splitting. The method is capable of identifying and classifying earthquake clusters in space and time. It has been tested and validated using earthquake data from California and New Zealand. A total of more than 1500 clusters have been found in both regions since 1980 with M m i n = 2.0. Utilising the knowledge of cluster classification, the method has been adjusted to provide an earthquake declustering algorithm, which has been compared to existing methods. Its performance is comparable to established methodologies. The analysis of earthquake clustering statistics lead to various new and updated correlation functions, e.g. for ratios between mainshock and strongest aftershock and general aftershock activity metrics.
Functional Principal Component Analysis and Randomized Sparse Clustering Algorithm for Medical Image Analysis

PubMed Central

Lin, Nan; Jiang, Junhai; Guo, Shicheng; Xiong, Momiao

2015-01-01

Due to the advancement in sensor technology, the growing large medical image data have the ability to visualize the anatomical changes in biological tissues. As a consequence, the medical images have the potential to enhance the diagnosis of disease, the prediction of clinical outcomes and the characterization of disease progression. But in the meantime, the growing data dimensions pose great methodological and computational challenges for the representation and selection of features in image cluster analysis. To address these challenges, we first extend the functional principal component analysis (FPCA) from one dimension to two dimensions to fully capture the space variation of image the signals. The image signals contain a large number of redundant features which provide no additional information for clustering analysis. The widely used methods for removing the irrelevant features are sparse clustering algorithms using a lasso-type penalty to select the features. However, the accuracy of clustering using a lasso-type penalty depends on the selection of the penalty parameters and the threshold value. In practice, they are difficult to determine. Recently, randomized algorithms have received a great deal of attentions in big data analysis. This paper presents a randomized algorithm for accurate feature selection in image clustering analysis. The proposed method is applied to both the liver and kidney cancer histology image data from the TCGA database. The results demonstrate that the randomized feature selection method coupled with functional principal component analysis substantially outperforms the current sparse clustering algorithms in image cluster analysis. PMID:26196383
Simultaneous Two-Way Clustering of Multiple Correspondence Analysis

ERIC Educational Resources Information Center

Hwang, Heungsun; Dillon, William R.

2010-01-01

A 2-way clustering approach to multiple correspondence analysis is proposed to account for cluster-level heterogeneity of both respondents and variable categories in multivariate categorical data. Specifically, in the proposed method, multiple correspondence analysis is combined with k-means in a unified framework in which "k"-means is…
The applicability and effectiveness of cluster analysis

NASA Technical Reports Server (NTRS)

Ingram, D. S.; Actkinson, A. L.

1973-01-01

An insight into the characteristics which determine the performance of a clustering algorithm is presented. In order for the techniques which are examined to accurately cluster data, two conditions must be simultaneously satisfied. First the data must have a particular structure, and second the parameters chosen for the clustering algorithm must be correct. By examining the structure of the data from the Cl flight line, it is clear that no single set of parameters can be used to accurately cluster all the different crops. The effectiveness of either a noniterative or iterative clustering algorithm to accurately cluster data representative of the Cl flight line is questionable. Thus extensive a prior knowledge is required in order to use cluster analysis in its present form for applications like assisting in the definition of field boundaries and evaluating the homogeneity of a field. New or modified techniques are necessary for clustering to be a reliable tool.
Cross-scale analysis of cluster correspondence using different operational neighborhoods

NASA Astrophysics Data System (ADS)

Lu, Yongmei; Thill, Jean-Claude

2008-09-01

Cluster correspondence analysis examines the spatial autocorrelation of multi-location events at the local scale. This paper argues that patterns of cluster correspondence are highly sensitive to the definition of operational neighborhoods that form the spatial units of analysis. A subset of multi-location events is examined for cluster correspondence if they are associated with the same operational neighborhood. This paper discusses the construction of operational neighborhoods for cluster correspondence analysis based on the spatial properties of the underlying zoning system and the scales at which the zones are aggregated into neighborhoods. Impacts of this construction on the degree of cluster correspondence are also analyzed. Empirical analyses of cluster correspondence between paired vehicle theft and recovery locations are conducted on different zoning methods and across a series of geographic scales and the dynamics of cluster correspondence patterns are discussed.
Allergen Sensitization Pattern by Sex: A Cluster Analysis in Korea.

PubMed

Ohn, Jungyoon; Paik, Seung Hwan; Doh, Eun Jin; Park, Hyun-Sun; Yoon, Hyun-Sun; Cho, Soyun

2017-12-01

Allergens tend to sensitize simultaneously. Etiology of this phenomenon has been suggested to be allergen cross-reactivity or concurrent exposure. However, little is known about specific allergen sensitization patterns. To investigate the allergen sensitization characteristics according to gender. Multiple allergen simultaneous test (MAST) is widely used as a screening tool for detecting allergen sensitization in dermatologic clinics. We retrospectively reviewed the medical records of patients with MAST results between 2008 and 2014 in our Department of Dermatology. A cluster analysis was performed to elucidate the allergen-specific immunoglobulin (Ig)E cluster pattern. The results of MAST (39 allergen-specific IgEs) from 4,360 cases were analyzed. By cluster analysis, 39items were grouped into 8 clusters. Each cluster had characteristic features. When compared with female, the male group tended to be sensitized more frequently to all tested allergens, except for fungus allergens cluster. The cluster and comparative analysis results demonstrate that the allergen sensitization is clustered, manifesting allergen similarity or co-exposure. Only the fungus cluster allergens tend to sensitize female group more frequently than male group.
Hierarchical Aligned Cluster Analysis for Temporal Clustering of Human Motion.

PubMed

Zhou, Feng; De la Torre, Fernando; Hodgins, Jessica K

2013-03-01

Temporal segmentation of human motion into plausible motion primitives is central to understanding and building computational models of human motion. Several issues contribute to the challenge of discovering motion primitives: the exponential nature of all possible movement combinations, the variability in the temporal scale of human actions, and the complexity of representing articulated motion. We pose the problem of learning motion primitives as one of temporal clustering, and derive an unsupervised hierarchical bottom-up framework called hierarchical aligned cluster analysis (HACA). HACA finds a partition of a given multidimensional time series into m disjoint segments such that each segment belongs to one of k clusters. HACA combines kernel k-means with the generalized dynamic time alignment kernel to cluster time series data. Moreover, it provides a natural framework to find a low-dimensional embedding for time series. HACA is efficiently optimized with a coordinate descent strategy and dynamic programming. Experimental results on motion capture and video data demonstrate the effectiveness of HACA for segmenting complex motions and as a visualization tool. We also compare the performance of HACA to state-of-the-art algorithms for temporal clustering on data of a honey bee dance. The HACA code is available online.
Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale.

PubMed

Emmons, Scott; Kobourov, Stephen; Gallant, Mike; Börner, Katy

2016-01-01

Notions of community quality underlie the clustering of networks. While studies surrounding network clustering are increasingly common, a precise understanding of the realtionship between different cluster quality metrics is unknown. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely-used network clustering algorithms-Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on the information recovery metrics. Additionally, our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. Smart local moving is the overall best performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it an absolutely superior algorithm. Interestingly, Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain. We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large graphs with well-defined clusters.
Assessment of cluster yield components by image analysis.

PubMed

Diago, Maria P; Tardaguila, Javier; Aleixos, Nuria; Millan, Borja; Prats-Montalban, Jose M; Cubero, Sergio; Blasco, Jose

2015-04-01

Berry weight, berry number and cluster weight are key parameters for yield estimation for wine and tablegrape industry. Current yield prediction methods are destructive, labour-demanding and time-consuming. In this work, a new methodology, based on image analysis was developed to determine cluster yield components in a fast and inexpensive way. Clusters of seven different red varieties of grapevine (Vitis vinifera L.) were photographed under laboratory conditions and their cluster yield components manually determined after image acquisition. Two algorithms based on the Canny and the logarithmic image processing approaches were tested to find the contours of the berries in the images prior to berry detection performed by means of the Hough Transform. Results were obtained in two ways: by analysing either a single image of the cluster or using four images per cluster from different orientations. The best results (R(2) between 69% and 95% in berry detection and between 65% and 97% in cluster weight estimation) were achieved using four images and the Canny algorithm. The model's capability based on image analysis to predict berry weight was 84%. The new and low-cost methodology presented here enabled the assessment of cluster yield components, saving time and providing inexpensive information in comparison with current manual methods. © 2014 Society of Chemical Industry.
Cluster-enriched Yang-Baxter equation from SUSY gauge theories

NASA Astrophysics Data System (ADS)

Yamazaki, Masahito

2018-04-01

We propose a new generalization of the Yang-Baxter equation, where the R-matrix depends on cluster y-variables in addition to the spectral parameters. We point out that we can construct solutions to this new equation from the recently found correspondence between Yang-Baxter equations and supersymmetric gauge theories. The S^2 partition function of a certain 2d N=(2,2) quiver gauge theory gives an R-matrix, whereas its FI parameters can be identified with the cluster y-variables.
Advanced analysis of forest fire clustering

NASA Astrophysics Data System (ADS)

Kanevski, Mikhail; Pereira, Mario; Golay, Jean

2017-04-01

Analysis of point pattern clustering is an important topic in spatial statistics and for many applications: biodiversity, epidemiology, natural hazards, geomarketing, etc. There are several fundamental approaches used to quantify spatial data clustering using topological, statistical and fractal measures. In the present research, the recently introduced multi-point Morisita index (mMI) is applied to study the spatial clustering of forest fires in Portugal. The data set consists of more than 30000 fire events covering the time period from 1975 to 2013. The distribution of forest fires is very complex and highly variable in space. mMI is a multi-point extension of the classical two-point Morisita index. In essence, mMI is estimated by covering the region under study by a grid and by computing how many times more likely it is that m points selected at random will be from the same grid cell than it would be in the case of a complete random Poisson process. By changing the number of grid cells (size of the grid cells), mMI characterizes the scaling properties of spatial clustering. From mMI, the data intrinsic dimension (fractal dimension) of the point distribution can be estimated as well. In this study, the mMI of forest fires is compared with the mMI of random patterns (RPs) generated within the validity domain defined as the forest area of Portugal. It turns out that the forest fires are highly clustered inside the validity domain in comparison with the RPs. Moreover, they demonstrate different scaling properties at different spatial scales. The results obtained from the mMI analysis are also compared with those of fractal measures of clustering - box counting and sand box counting approaches. REFERENCES Golay J., Kanevski M., Vega Orozco C., Leuenberger M., 2014: The multipoint Morisita index for the analysis of spatial patterns. Physica A, 406, 191-202. Golay J., Kanevski M. 2015: A new estimator of intrinsic dimension based on the multipoint Morisita index
Cluster and constraint analysis in tetrahedron packings

NASA Astrophysics Data System (ADS)

Jin, Weiwei; Lu, Peng; Liu, Lufeng; Li, Shuixiang

2015-04-01

The disordered packings of tetrahedra often show no obvious macroscopic orientational or positional order for a wide range of packing densities, and it has been found that the local order in particle clusters is the main order form of tetrahedron packings. Therefore, a cluster analysis is carried out to investigate the local structures and properties of tetrahedron packings in this work. We obtain a cluster distribution of differently sized clusters, and peaks are observed at two special clusters, i.e., dimer and wagon wheel. We then calculate the amounts of dimers and wagon wheels, which are observed to have linear or approximate linear correlations with packing density. Following our previous work, the amount of particles participating in dimers is used as an order metric to evaluate the order degree of the hierarchical packing structure of tetrahedra, and an order map is consequently depicted. Furthermore, a constraint analysis is performed to determine the isostatic or hyperstatic region in the order map. We employ a Monte Carlo algorithm to test jamming and then suggest a new maximally random jammed packing of hard tetrahedra from the order map with a packing density of 0.6337.
Onsite Gaseous Centrifuge Enrichment Plant UF6 Cylinder Destructive Analysis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Anheier, Norman C.; Cannon, Bret D.; Qiao, Hong

2012-07-17

The IAEA safeguards approach for gaseous centrifuge enrichment plants (GCEPs) includes measurements of gross, partial, and bias defects in a statistical sampling plan. These safeguard methods consist principally of mass and enrichment nondestructive assay (NDA) verification. Destructive assay (DA) samples are collected from a limited number of cylinders for high precision offsite mass spectrometer analysis. DA is typically used to quantify bias defects in the GCEP material balance. Under current safeguards measures, the operator collects a DA sample from a sample tap following homogenization. The sample is collected in a small UF6 sample bottle, then sealed and shipped under IAEAmore » chain of custody to an offsite analytical laboratory. Current practice is expensive and resource intensive. We propose a new and novel approach for performing onsite gaseous UF6 DA analysis that provides rapid and accurate assessment of enrichment bias defects. DA samples are collected using a custom sampling device attached to a conventional sample tap. A few micrograms of gaseous UF6 is chemically adsorbed onto a sampling coupon in a matter of minutes. The collected DA sample is then analyzed onsite using Laser Ablation Absorption Ratio Spectrometry-Destructive Assay (LAARS-DA). DA results are determined in a matter of minutes at sufficient accuracy to support reliable bias defect conclusions, while greatly reducing DA sample volume, analysis time, and cost.« less
ICAP - An Interactive Cluster Analysis Procedure for analyzing remotely sensed data

NASA Technical Reports Server (NTRS)

Wharton, S. W.; Turner, B. J.

1981-01-01

An Interactive Cluster Analysis Procedure (ICAP) was developed to derive classifier training statistics from remotely sensed data. ICAP differs from conventional clustering algorithms by allowing the analyst to optimize the cluster configuration by inspection, rather than by manipulating process parameters. Control of the clustering process alternates between the algorithm, which creates new centroids and forms clusters, and the analyst, who can evaluate and elect to modify the cluster structure. Clusters can be deleted, or lumped together pairwise, or new centroids can be added. A summary of the cluster statistics can be requested to facilitate cluster manipulation. The principal advantage of this approach is that it allows prior information (when available) to be used directly in the analysis, since the analyst interacts with ICAP in a straightforward manner, using basic terms with which he is more likely to be familiar. Results from testing ICAP showed that an informed use of ICAP can improve classification, as compared to an existing cluster analysis procedure.
Clustering analysis strategies for electron energy loss spectroscopy (EELS).

PubMed

Torruella, Pau; Estrader, Marta; López-Ortega, Alberto; Baró, Maria Dolors; Varela, Maria; Peiró, Francesca; Estradé, Sònia

2018-02-01

In this work, the use of cluster analysis algorithms, widely applied in the field of big data, is proposed to explore and analyze electron energy loss spectroscopy (EELS) data sets. Three different data clustering approaches have been tested both with simulated and experimental data from Fe 3 O 4 /Mn 3 O 4 core/shell nanoparticles. The first method consists on applying data clustering directly to the acquired spectra. A second approach is to analyze spectral variance with principal component analysis (PCA) within a given data cluster. Lastly, data clustering on PCA score maps is discussed. The advantages and requirements of each approach are studied. Results demonstrate how clustering is able to recover compositional and oxidation state information from EELS data with minimal user input, giving great prospects for its usage in EEL spectroscopy. Copyright © 2017 Elsevier B.V. All rights reserved.
An effective fuzzy kernel clustering analysis approach for gene expression data.

PubMed

Sun, Lin; Xu, Jiucheng; Yin, Jiaojiao

2015-01-01

Fuzzy clustering is an important tool for analyzing microarray data. A major problem in applying fuzzy clustering method to microarray gene expression data is the choice of parameters with cluster number and centers. This paper proposes a new approach to fuzzy kernel clustering analysis (FKCA) that identifies desired cluster number and obtains more steady results for gene expression data. First of all, to optimize characteristic differences and estimate optimal cluster number, Gaussian kernel function is introduced to improve spectrum analysis method (SAM). By combining subtractive clustering with max-min distance mean, maximum distance method (MDM) is proposed to determine cluster centers. Then, the corresponding steps of improved SAM (ISAM) and MDM are given respectively, whose superiority and stability are illustrated through performing experimental comparisons on gene expression data. Finally, by introducing ISAM and MDM into FKCA, an effective improved FKCA algorithm is proposed. Experimental results from public gene expression data and UCI database show that the proposed algorithms are feasible for cluster analysis, and the clustering accuracy is higher than the other related clustering algorithms.
Atlas-Guided Cluster Analysis of Large Tractography Datasets

PubMed Central

Ros, Christian; Güllmar, Daniel; Stenzel, Martin; Mentzel, Hans-Joachim; Reichenbach, Jürgen Rainer

2013-01-01

Diffusion Tensor Imaging (DTI) and fiber tractography are important tools to map the cerebral white matter microstructure in vivo and to model the underlying axonal pathways in the brain with three-dimensional fiber tracts. As the fast and consistent extraction of anatomically correct fiber bundles for multiple datasets is still challenging, we present a novel atlas-guided clustering framework for exploratory data analysis of large tractography datasets. The framework uses an hierarchical cluster analysis approach that exploits the inherent redundancy in large datasets to time-efficiently group fiber tracts. Structural information of a white matter atlas can be incorporated into the clustering to achieve an anatomically correct and reproducible grouping of fiber tracts. This approach facilitates not only the identification of the bundles corresponding to the classes of the atlas; it also enables the extraction of bundles that are not present in the atlas. The new technique was applied to cluster datasets of 46 healthy subjects. Prospects of automatic and anatomically correct as well as reproducible clustering are explored. Reconstructed clusters were well separated and showed good correspondence to anatomical bundles. Using the atlas-guided cluster approach, we observed consistent results across subjects with high reproducibility. In order to investigate the outlier elimination performance of the clustering algorithm, scenarios with varying amounts of noise were simulated and clustered with three different outlier elimination strategies. By exploiting the multithreading capabilities of modern multiprocessor systems in combination with novel algorithms, our toolkit clusters large datasets in a couple of minutes. Experiments were conducted to investigate the achievable speedup and to demonstrate the high performance of the clustering framework in a multiprocessing environment. PMID:24386292
Atlas-guided cluster analysis of large tractography datasets.

PubMed

Ros, Christian; Güllmar, Daniel; Stenzel, Martin; Mentzel, Hans-Joachim; Reichenbach, Jürgen Rainer

2013-01-01

Diffusion Tensor Imaging (DTI) and fiber tractography are important tools to map the cerebral white matter microstructure in vivo and to model the underlying axonal pathways in the brain with three-dimensional fiber tracts. As the fast and consistent extraction of anatomically correct fiber bundles for multiple datasets is still challenging, we present a novel atlas-guided clustering framework for exploratory data analysis of large tractography datasets. The framework uses an hierarchical cluster analysis approach that exploits the inherent redundancy in large datasets to time-efficiently group fiber tracts. Structural information of a white matter atlas can be incorporated into the clustering to achieve an anatomically correct and reproducible grouping of fiber tracts. This approach facilitates not only the identification of the bundles corresponding to the classes of the atlas; it also enables the extraction of bundles that are not present in the atlas. The new technique was applied to cluster datasets of 46 healthy subjects. Prospects of automatic and anatomically correct as well as reproducible clustering are explored. Reconstructed clusters were well separated and showed good correspondence to anatomical bundles. Using the atlas-guided cluster approach, we observed consistent results across subjects with high reproducibility. In order to investigate the outlier elimination performance of the clustering algorithm, scenarios with varying amounts of noise were simulated and clustered with three different outlier elimination strategies. By exploiting the multithreading capabilities of modern multiprocessor systems in combination with novel algorithms, our toolkit clusters large datasets in a couple of minutes. Experiments were conducted to investigate the achievable speedup and to demonstrate the high performance of the clustering framework in a multiprocessing environment.

NRVS and EPR Spectroscopy of 57Fe-enriched [FeFe] Hydrogenase Indicate Stepwise Assembly of the H-cluster†

PubMed Central

Kuchenreuther, Jon M.; Guo, Yisong; Wang, Hongxin; Myers, William K.; George, Simon J.; Boyke, Christine A.; Yoda, Yoshitaka; Alp, E. Ercan; Zhao, Jiyong; Britt, R. David; Swartz, James R.; Cramer, Stephen P.

2013-01-01

The [FeFe] hydrogenase from Clostridium pasteurianum (CpI) harbors four Fe–S clusters that facilitate electron transfer to the H-cluster, a ligand-coordinated six-iron prosthetic group that catalyzes the redox interconversion of protons and H2. Here, we have used 57Fe nuclear resonance vibrational spectroscopy (NRVS) to study the iron centers in CpI, and we compare our data to that for a [4Fe–4S] ferredoxin as well as a model complex resembling the [2Fe]H catalytic domain of the H-cluster. In order to enrich the hydrogenase with 57Fe nuclei, we used cell-free methods to post-translationally mature the enzyme. Specifically, inactive CpI apoprotein with 56Fe-labeled Fe–S clusters was activated in vitro using 57Fe-enriched maturation proteins. This approach enabled us to selectively label the [2Fe]H subcluster with 57Fe, which NRVS confirms by detecting 57Fe–CO and 57Fe–CN normal modes from the H-cluster nonprotein ligands. The NRVS and iron quantification results also suggest that the hydrogenase contains a second 57Fe–S cluster. EPR spectroscopy indicates that this 57Fe-enriched metal center is not the [4Fe– 4S]H subcluster of the H-cluster. This finding demonstrates that the CpI hydrogenase retained an 56Fe-enriched [4Fe–4S]H cluster during in vitro maturation, providing unambiguous evidence for stepwise assembly of the H-cluster. In addition, this work represents the first NRVS characterization of [FeFe] hydrogenases. PMID:23249091
Effects of Group Size and Lack of Sphericity on the Recovery of Clusters in K-Means Cluster Analysis

ERIC Educational Resources Information Center

de Craen, Saskia; Commandeur, Jacques J. F.; Frank, Laurence E.; Heiser, Willem J.

2006-01-01

K-means cluster analysis is known for its tendency to produce spherical and equally sized clusters. To assess the magnitude of these effects, a simulation study was conducted, in which populations were created with varying departures from sphericity and group sizes. An analysis of the recovery of clusters in the samples taken from these…
Two-Way Regularized Fuzzy Clustering of Multiple Correspondence Analysis.

PubMed

Kim, Sunmee; Choi, Ji Yeh; Hwang, Heungsun

2017-01-01

Multiple correspondence analysis (MCA) is a useful tool for investigating the interrelationships among dummy-coded categorical variables. MCA has been combined with clustering methods to examine whether there exist heterogeneous subclusters of a population, which exhibit cluster-level heterogeneity. These combined approaches aim to classify either observations only (one-way clustering of MCA) or both observations and variable categories (two-way clustering of MCA). The latter approach is favored because its solutions are easier to interpret by providing explicitly which subgroup of observations is associated with which subset of variable categories. Nonetheless, the two-way approach has been built on hard classification that assumes observations and/or variable categories to belong to only one cluster. To relax this assumption, we propose two-way fuzzy clustering of MCA. Specifically, we combine MCA with fuzzy k-means simultaneously to classify a subgroup of observations and a subset of variable categories into a common cluster, while allowing both observations and variable categories to belong partially to multiple clusters. Importantly, we adopt regularized fuzzy k-means, thereby enabling us to decide the degree of fuzziness in cluster memberships automatically. We evaluate the performance of the proposed approach through the analysis of simulated and real data, in comparison with existing two-way clustering approaches.
Phenotypes Determined by Cluster Analysis in Moderate to Severe Bronchial Asthma.

PubMed

Youroukova, Vania M; Dimitrova, Denitsa G; Valerieva, Anna D; Lesichkova, Spaska S; Velikova, Tsvetelina V; Ivanova-Todorova, Ekaterina I; Tumangelova-Yuzeir, Kalina D

2017-06-01

Bronchial asthma is a heterogeneous disease that includes various subtypes. They may share similar clinical characteristics, but probably have different pathological mechanisms. To identify phenotypes using cluster analysis in moderate to severe bronchial asthma and to compare differences in clinical, physiological, immunological and inflammatory data between the clusters. Forty adult patients with moderate to severe bronchial asthma out of exacerbation were included. All underwent clinical assessment, anthropometric measurements, skin prick testing, standard spirometry and measurement fraction of exhaled nitric oxide. Blood eosinophilic count, serum total IgE and periostin levels were determined. Two-step cluster approach, hierarchical clustering method and k-mean analysis were used for identification of the clusters. We have identified four clusters. Cluster 1 (n=14) - late-onset, non-atopic asthma with impaired lung function, Cluster 2 (n=13) - late-onset, atopic asthma, Cluster 3 (n=6) - late-onset, aspirin sensitivity, eosinophilic asthma, and Cluster 4 (n=7) - early-onset, atopic asthma. Our study is the first in Bulgaria in which cluster analysis is applied to asthmatic patients. We identified four clusters. The variables with greatest force for differentiation in our study were: age of asthma onset, duration of diseases, atopy, smoking, blood eosinophils, nonsteroidal anti-inflammatory drugs hypersensitivity, baseline FEV1/FVC and symptoms severity. Our results support the concept of heterogeneity of bronchial asthma and demonstrate that cluster analysis can be an useful tool for phenotyping of disease and personalized approach to the treatment of patients.
Uniform Contribution of Supernova Explosions to the Chemical Enrichment of Abell 3112 out to R{sub 200}

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ezer, Cemile; Ercan, E. Nihal; Bulbul, Esra

2017-02-10

The spatial distribution of the metals residing in the intra-cluster medium (ICM) of galaxy clusters records all the information on a cluster’s nucleosynthesis and chemical enrichment history. We present measurements from a total of 1.2 Ms Suzaku XIS and 72 ks Chandra observations of the cool-core galaxy cluster Abell 3112 out to its virial radius (∼1470 kpc). We find that the ratio of the observed supernova type Ia explosions to the total supernova explosions has a uniform distribution at a level of 12%–16% out to the cluster’s virial radius. The observed fraction of type Ia supernova explosions is in agreementmore » with the corresponding fraction found in our Galaxy and the chemical enrichment of our Galaxy. The non-varying supernova enrichment suggests that the ICM in cluster outskirts was enriched by metals at an early stage before the cluster itself was formed during a period of intense star formation activity. Additionally, we find that the 2D delayed detonation model CDDT produce significantly worse fits to the X-ray spectra compared to simple 1D W7 models. This is due to the relative overestimate of Si, and the underestimate of Mg in these models with respect to the measured abundances.« less
BiNChE: a web tool and library for chemical enrichment analysis based on the ChEBI ontology.

PubMed

Moreno, Pablo; Beisken, Stephan; Harsha, Bhavana; Muthukrishnan, Venkatesh; Tudose, Ilinca; Dekker, Adriano; Dornfeldt, Stefanie; Taruttis, Franziska; Grosse, Ivo; Hastings, Janna; Neumann, Steffen; Steinbeck, Christoph

2015-02-21

Ontology-based enrichment analysis aids in the interpretation and understanding of large-scale biological data. Ontologies are hierarchies of biologically relevant groupings. Using ontology annotations, which link ontology classes to biological entities, enrichment analysis methods assess whether there is a significant over or under representation of entities for ontology classes. While many tools exist that run enrichment analysis for protein sets annotated with the Gene Ontology, there are only a few that can be used for small molecules enrichment analysis. We describe BiNChE, an enrichment analysis tool for small molecules based on the ChEBI Ontology. BiNChE displays an interactive graph that can be exported as a high-resolution image or in network formats. The tool provides plain, weighted and fragment analysis based on either the ChEBI Role Ontology or the ChEBI Structural Ontology. BiNChE aids in the exploration of large sets of small molecules produced within Metabolomics or other Systems Biology research contexts. The open-source tool provides easy and highly interactive web access to enrichment analysis with the ChEBI ontology tool and is additionally available as a standalone library.
Radiological analysis of plutonium glass batches with natural/enriched boron

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rainisch, R.

2000-06-22

The disposition of surplus plutonium inventories by the US Department of Energy (DOE) includes the immobilization of certain plutonium materials in a borosilicate glass matrix, also referred to as vitrification. This paper addresses source terms of plutonium masses immobilized in a borosilicate glass matrix where the glass components include both natural boron and enriched boron. The calculated source terms pertain to neutron and gamma source strength (particles per second), and source spectrum changes. The calculated source terms corresponding to natural boron and enriched boron are compared to determine the benefits (decrease in radiation source terms) for to the use ofmore » enriched boron. The analysis of plutonium glass source terms shows that a large component of the neutron source terms is due to (a, n) reactions. The Americium-241 and plutonium present in the glass emit alpha particles (a). These alpha particles interact with low-Z nuclides like B-11, B-10, and O-17 in the glass to produce neutrons. The low-Z nuclides are referred to as target particles. The reference glass contains 9.4 wt percent B{sub 2}O{sub 3}. Boron-11 was found to strongly support the (a, n) reactions in the glass matrix. B-11 has a natural abundance of over 80 percent. The (a, n) reaction rates for B-10 are lower than for B-11 and the analysis shows that the plutonium glass neutron source terms can be reduced by artificially enriching natural boron with B-10. The natural abundance of B-10 is 19.9 percent. Boron enriched to 96-wt percent B-10 or above can be obtained commercially. Since lower source terms imply lower dose rates to radiation workers handling the plutonium glass materials, it is important to know the achievable decrease in source terms as a result of boron enrichment. Plutonium materials are normally handled in glove boxes with shielded glass windows and the work entails both extremity and whole-body exposures. Lowering the source terms of the plutonium batches will
Poor Prognosis Indicated by Venous Circulating Tumor Cell Clusters in Early-Stage Lung Cancers.

PubMed

Murlidhar, Vasudha; Reddy, Rishindra M; Fouladdel, Shamileh; Zhao, Lili; Ishikawa, Martin K; Grabauskiene, Svetlana; Zhang, Zhuo; Lin, Jules; Chang, Andrew C; Carrott, Philip; Lynch, William R; Orringer, Mark B; Kumar-Sinha, Chandan; Palanisamy, Nallasivam; Beer, David G; Wicha, Max S; Ramnath, Nithya; Azizi, Ebrahim; Nagrath, Sunitha

2017-09-15

Early detection of metastasis can be aided by circulating tumor cells (CTC), which also show potential to predict early relapse. Because of the limited CTC numbers in peripheral blood in early stages, we investigated CTCs in pulmonary vein blood accessed during surgical resection of tumors. Pulmonary vein (PV) and peripheral vein (Pe) blood specimens from patients with lung cancer were drawn during the perioperative period and assessed for CTC burden using a microfluidic device. From 108 blood samples analyzed from 36 patients, PV had significantly higher number of CTCs compared with preoperative Pe ( P < 0.0001) and intraoperative Pe ( P < 0.001) blood. CTC clusters with large number of CTCs were observed in 50% of patients, with PV often revealing larger clusters. Long-term surveillance indicated that presence of clusters in preoperative Pe blood predicted a trend toward poor prognosis. Gene expression analysis by RT-qPCR revealed enrichment of p53 signaling and extracellular matrix involvement in PV and Pe samples. Ki67 expression was detected in 62.5% of PV samples and 59.2% of Pe samples, with the majority (72.7%) of patients positive for Ki67 expression in PV having single CTCs as opposed to clusters. Gene ontology analysis revealed enrichment of cell migration and immune-related pathways in CTC clusters, suggesting survival advantage of clusters in circulation. Clusters display characteristics of therapeutic resistance, indicating the aggressive nature of these cells. Thus, CTCs isolated from early stages of lung cancer are predictive of poor prognosis and can be interrogated to determine biomarkers predictive of recurrence. Cancer Res; 77(18); 5194-206. ©2017 AACR . ©2017 American Association for Cancer Research.
Dynamic transcriptomic analysis in hircine longissimus dorsi muscle from fetal to neonatal development stages.

PubMed

Zhan, Siyuan; Zhao, Wei; Song, Tianzeng; Dong, Yao; Guo, Jiazhong; Cao, Jiaxue; Zhong, Tao; Wang, Linjie; Li, Li; Zhang, Hongping

2018-01-01

Muscle growth and development from fetal to neonatal stages consist of a series of delicately regulated and orchestrated changes in expression of genes. In this study, we performed whole transcriptome profiling based on RNA-Seq of caprine longissimus dorsi muscle tissue obtained from prenatal stages (days 45, 60, and 105 of gestation) and neonatal stage (the 3-day-old newborn) to identify genes that are differentially expressed and investigate their temporal expression profiles. A total of 3276 differentially expressed genes (DEGs) were identified (Q value < 0.01). Time-series expression profile clustering analysis indicated that DEGs were significantly clustered into eight clusters which can be divided into two classes (Q value < 0.05), class I profiles with downregulated patterns and class II profiles with upregulated patterns. Based on cluster analysis, GO enrichment analysis found that 75, 25, and 8 terms to be significantly enriched in biological process (BP), cellular component (CC), and molecular function (MF) categories in class I profiles, while 35, 21, and 8 terms to be significantly enriched in BP, CC, and MF in class II profiles. KEGG pathway analysis revealed that DEGs from class I profiles were significantly enriched in 22 pathways and the most enriched pathway was Rap1 signaling pathway. DEGs from class II profiles were significantly enriched in 17 pathways and the mainly enriched pathway was AMPK signaling pathway. Finally, six selected DEGs from our sequencing results were confirmed by qPCR. Our study provides a comprehensive understanding of the molecular mechanisms during goat skeletal muscle development from fetal to neonatal stages and valuable information for future studies of muscle development in goats.
Radial metal abundance profiles in the intra-cluster medium of cool-core galaxy clusters, groups, and ellipticals

NASA Astrophysics Data System (ADS)

Mernier, F.; de Plaa, J.; Kaastra, J. S.; Zhang, Y.-Y.; Akamatsu, H.; Gu, L.; Kosec, P.; Mao, J.; Pinto, C.; Reiprich, T. H.; Sanders, J. S.; Simionescu, A.; Werner, N.

2017-07-01

The hot intra-cluster medium (ICM) permeating galaxy clusters and groups is not pristine, as it has been continuously enriched by metals synthesised in Type Ia (SNIa) and core-collapse (SNcc) supernovae since the major epoch of star formation (z ≃ 2-3). The cluster/group enrichment history and mechanisms responsible for releasing and mixing the metals can be probed via the radial distribution of SNIa and SNcc products within the ICM. In this paper, we use deep XMM-Newton/EPIC observations from a sample of 44 nearby cool-core galaxy clusters, groups, and ellipticals (CHEERS) to constrain the average radial O, Mg, Si, S, Ar, Ca, Fe, and Ni abundance profiles. The radial distributions of all these elements, averaged over a large sample for the first time, represent the best constrained profiles available currently. Specific attention is devoted to a proper modelling of the EPIC spectral components, and to other systematic uncertainties that may affect our results. We find an overall decrease of the Fe abundance with radius out to 0.9 r500 and 0.6 r500 for clusters and groups, respectively, in good agreement with predictions from the most recent hydrodynamical simulations. The average radial profiles of all the other elements (X) are also centrally peaked and, when rescaled to their average central X/Fe ratios, follow well the Fe profile out to at least 0.5 r500. As predicted by recent simulations, we find that the relative contribution of SNIa (SNcc) to the total ICM enrichment is consistent with being uniform at all radii, both for clusters and groups using two sets of SNIa and SNcc yield models that reproduce the X/Fe abundance pattern in the core well. In addition to implying that the central metal peak is balanced between SNIa and SNcc, our results suggest that the enriching SNIa and SNcc products must share the same origin and that the delay between the bulk of the SNIa and SNcc explosions must be shorter than the timescale necessary to diffuse out the metals
GEAR: genomic enrichment analysis of regional DNA copy number changes.

PubMed

Kim, Tae-Min; Jung, Yu-Chae; Rhyu, Mun-Gan; Jung, Myeong Ho; Chung, Yeun-Jun

2008-02-01

We developed an algorithm named GEAR (genomic enrichment analysis of regional DNA copy number changes) for functional interpretation of genome-wide DNA copy number changes identified by array-based comparative genomic hybridization. GEAR selects two types of chromosomal alterations with potential biological relevance, i.e. recurrent and phenotype-specific alterations. Then it performs functional enrichment analysis using a priori selected functional gene sets to identify primary and clinical genomic signatures. The genomic signatures identified by GEAR represent functionally coordinated genomic changes, which can provide clues on the underlying molecular mechanisms related to the phenotypes of interest. GEAR can help the identification of key molecular functions that are activated or repressed in the tumor genomes leading to the improved understanding on the tumor biology. GEAR software is available with online manual in the website, http://www.systemsbiology.co.kr/GEAR/.
Star Clusters in the Magellanic Clouds

NASA Astrophysics Data System (ADS)

Gallagher, J. S., III

2014-09-01

The Magellanic Clouds (MC) are prime locations for studies of star clusters covering a full range in age and mass. This contribution briefly reviews selected properties of Magellanic star clusters, by focusing first on young systems that show evidence for hierarchical star formation. The structures and chemical abundance patterns of older intermediate age star clusters in the Small Magellanic Cloud (SMC) are a second topic. These suggest a complex history has affected the chemical enrichment in the SMC and that low tidal stresses in the SMC foster star cluster survival.
Cluster analysis for determining distribution center location

NASA Astrophysics Data System (ADS)

Lestari Widaningrum, Dyah; Andika, Aditya; Murphiyanto, Richard Dimas Julian

2017-12-01

Determination of distribution facilities is highly important to survive in the high level of competition in today’s business world. Companies can operate multiple distribution centers to mitigate supply chain risk. Thus, new problems arise, namely how many and where the facilities should be provided. This study examines a fast-food restaurant brand, which located in the Greater Jakarta. This brand is included in the category of top 5 fast food restaurant chain based on retail sales. There were three stages in this study, compiling spatial data, cluster analysis, and network analysis. Cluster analysis results are used to consider the location of the additional distribution center. Network analysis results show a more efficient process referring to a shorter distance to the distribution process.
No Evidence for Multiple Stellar Populations in the Low-mass Galactic Globular Cluster E 3

NASA Astrophysics Data System (ADS)

Salinas, Ricardo; Strader, Jay

2015-08-01

Multiple stellar populations are a widespread phenomenon among Galactic globular clusters. Even though the origin of the enriched material from which new generations of stars are produced remains unclear, it is likely that self-enrichment will be feasible only in clusters massive enough to retain this enriched material. We searched for multiple populations in the low mass (M˜ 1.4× {10}4 {M}⊙ ) globular cluster E3, analyzing SOAR/Goodman multi-object spectroscopy centered on the blue cyanogen (CN) absorption features of 23 red giant branch stars. We find that the CN abundance does not present the typical bimodal behavior seen in clusters hosting multistellar populations, but rather a unimodal distribution that indicates the presence of a genuine single stellar population, or a level of enrichment much lower than in clusters that show evidence for two populations from high-resolution spectroscopy. E3 would be the first bona fide Galactic old globular cluster where no sign of self-enrichment is found. Based on observations obtained at the Southern Astrophysical Research (SOAR) Telescope, which is a joint project of the Ministério da Ciência, Tecnologia, e Inovação (MCTI) da República Federativa do Brasil, the US National Optical Astronomy Observatory (NOAO), the University of North Carolina at Chapel Hill (UNC), and Michigan State University (MSU).
Using cluster analysis for medical resource decision making.

PubMed

Dilts, D; Khamalah, J; Plotkin, A

1995-01-01

Escalating costs of health care delivery have in the recent past often made the health care industry investigate, adapt, and apply those management techniques relating to budgeting, resource control, and forecasting that have long been used in the manufacturing sector. A strategy that has contributed much in this direction is the definition and classification of a hospital's output into "products" or groups of patients that impose similar resource or cost demands on the hospital. Existing classification schemes have frequently employed cluster analysis in generating these groupings. Unfortunately, the myriad articles and books on clustering and classification contain few formalized selection methodologies for choosing a technique for solving a particular problem, hence they often leave the novice investigator at a loss. This paper reviews the literature on clustering, particularly as it has been applied in the medical resource-utilization domain, addresses the critical choices facing an investigator in the medical field using cluster analysis, and offers suggestions (using the example of clustering low-vision patients) for how such choices can be made.
Genome Sequences of Three Cluster AU Arthrobacter Phages, Caterpillar, Nightmare, and Teacup

PubMed Central

Adair, Tamarah L.; Stowe, Emily; Pizzorno, Marie C.; Krukonis, Gregory; Harrison, Melinda; Garlena, Rebecca A.; Russell, Daniel A.; Jacobs-Sera, Deborah

2017-01-01

ABSTRACT Caterpillar, Nightmare, and Teacup are cluster AU siphoviral phages isolated from enriched soil on Arthrobacter sp. strain ATCC 21022. These genomes are 58 kbp long with an average G+C content of 50%. Sequence analysis predicts 86 to 92 protein-coding genes, including a large number of small proteins with predicted transmembrane domains. PMID:29122860
Cluster Analysis of Minnesota School Districts. A Research Report.

ERIC Educational Resources Information Center

Cleary, James

The term "cluster analysis" refers to a set of statistical methods that classify entities with similar profiles of scores on a number of measured dimensions, in order to create empirically based typologies. A 1980 Minnesota House Research Report employed cluster analysis to categorize school districts according to their relative mixtures…
Clinical Characteristics of Exacerbation-Prone Adult Asthmatics Identified by Cluster Analysis.

PubMed

Kim, Mi Ae; Shin, Seung Woo; Park, Jong Sook; Uh, Soo Taek; Chang, Hun Soo; Bae, Da Jeong; Cho, You Sook; Park, Hae Sim; Yoon, Ho Joo; Choi, Byoung Whui; Kim, Yong Hoon; Park, Choon Sik

2017-11-01

Asthma is a heterogeneous disease characterized by various types of airway inflammation and obstruction. Therefore, it is classified into several subphenotypes, such as early-onset atopic, obese non-eosinophilic, benign, and eosinophilic asthma, using cluster analysis. A number of asthmatics frequently experience exacerbation over a long-term follow-up period, but the exacerbation-prone subphenotype has rarely been evaluated by cluster analysis. This prompted us to identify clusters reflecting asthma exacerbation. A uniform cluster analysis method was applied to 259 adult asthmatics who were regularly followed-up for over 1 year using 12 variables, selected on the basis of their contribution to asthma phenotypes. After clustering, clinical profiles and exacerbation rates during follow-up were compared among the clusters. Four subphenotypes were identified: cluster 1 was comprised of patients with early-onset atopic asthma with preserved lung function, cluster 2 late-onset non-atopic asthma with impaired lung function, cluster 3 early-onset atopic asthma with severely impaired lung function, and cluster 4 late-onset non-atopic asthma with well-preserved lung function. The patients in clusters 2 and 3 were identified as exacerbation-prone asthmatics, showing a higher risk of asthma exacerbation. Two different phenotypes of exacerbation-prone asthma were identified among Korean asthmatics using cluster analysis; both were characterized by impaired lung function, but the age at asthma onset and atopic status were different between the two. Copyright © 2017 The Korean Academy of Asthma, Allergy and Clinical Immunology · The Korean Academy of Pediatric Allergy and Respiratory Disease
Geographic atrophy phenotype identification by cluster analysis.

PubMed

Monés, Jordi; Biarnés, Marc

2018-03-01

To identify ocular phenotypes in patients with geographic atrophy secondary to age-related macular degeneration (GA) using a data-driven cluster analysis. This was a retrospective analysis of data from a prospective, natural history study of patients with GA who were followed for ≥6 months. Cluster analysis was used to identify subgroups within the population based on the presence of several phenotypic features: soft drusen, reticular pseudodrusen (RPD), primary foveal atrophy, increased fundus autofluorescence (FAF), greyish FAF appearance and subfoveal choroidal thickness (SFCT). A comparison of features between the subgroups was conducted, and a qualitative description of the new phenotypes was proposed. The atrophy growth rate between phenotypes was then compared. Data were analysed from 77 eyes of 77 patients with GA. Cluster analysis identified three groups: phenotype 1 was characterised by high soft drusen load, foveal atrophy and slow growth; phenotype 3 showed high RPD load, extrafoveal and greyish FAF appearance and thin SFCT; the characteristics of phenotype 2 were midway between phenotypes 1 and 3. Phenotypes differed in all measured features (p≤0.013), with decreases in the presence of soft drusen, foveal atrophy and SFCT seen from phenotypes 1 to 3 and corresponding increases in high RPD load, high FAF and greyish FAF appearance. Atrophy growth rate differed between phenotypes 1, 2 and 3 (0.63, 1.91 and 1.73 mm 2 /year, respectively, p=0.0005). Cluster analysis identified three distinct phenotypes in GA. One of them showed a particularly slow growth pattern. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Obstructive Sleep Apnea: A Cluster Analysis at Time of Diagnosis

PubMed Central

Grillet, Yves; Richard, Philippe; Stach, Bruno; Vivodtzev, Isabelle; Timsit, Jean-Francois; Lévy, Patrick; Tamisier, Renaud; Pépin, Jean-Louis

2016-01-01

Background The classification of obstructive sleep apnea is on the basis of sleep study criteria that may not adequately capture disease heterogeneity. Improved phenotyping may improve prognosis prediction and help select therapeutic strategies. Objectives: This study used cluster analysis to investigate the clinical clusters of obstructive sleep apnea. Methods An ascending hierarchical cluster analysis was performed on baseline symptoms, physical examination, risk factor exposure and co-morbidities from 18,263 participants in the OSFP (French national registry of sleep apnea). The probability for criteria to be associated with a given cluster was assessed using odds ratios, determined by univariate logistic regression. Results: Six clusters were identified, in which patients varied considerably in age, sex, symptoms, obesity, co-morbidities and environmental risk factors. The main significant differences between clusters were minimally symptomatic versus sleepy obstructive sleep apnea patients, lean versus obese, and among obese patients different combinations of co-morbidities and environmental risk factors. Conclusions Our cluster analysis identified six distinct clusters of obstructive sleep apnea. Our findings underscore the high degree of heterogeneity that exists within obstructive sleep apnea patients regarding clinical presentation, risk factors and consequences. This may help in both research and clinical practice for validating new prevention programs, in diagnosis and in decisions regarding therapeutic strategies. PMID:27314230

Cluster analysis of the hot subdwarfs in the PG survey

NASA Technical Reports Server (NTRS)

Thejll, Peter; Charache, Darryl; Shipman, Harry L.

1989-01-01

Application of cluster analysis to the hot subdwarfs in the Palomar Green (PG) survey of faint blue high-Galactic-latitude objects is assessed, with emphasis on data noise and the number of clusters to subdivide the data into. The data used in the study are presented, and cluster analysis, using the CLUSTAN program, is applied to it. Distances are calculated using the Euclidean formula, and clustering is done by Ward's method. The results are discussed, and five groups representing natural divisions of the subdwarfs in the PG survey are presented.
A Survey of Popular R Packages for Cluster Analysis

ERIC Educational Resources Information Center

Flynt, Abby; Dean, Nema

2016-01-01

Cluster analysis is a set of statistical methods for discovering new group/class structure when exploring data sets. This article reviews the following popular libraries/commands in the R software language for applying different types of cluster analysis: from the stats library, the kmeans, and hclust functions; the mclust library; the poLCA…
Cluster Analysis in Nursing Research: An Introduction, Historical Perspective, and Future Directions.

PubMed

Dunn, Heather; Quinn, Laurie; Corbridge, Susan J; Eldeirawi, Kamal; Kapella, Mary; Collins, Eileen G

2017-05-01

The use of cluster analysis in the nursing literature is limited to the creation of classifications of homogeneous groups and the discovery of new relationships. As such, it is important to provide clarity regarding its use and potential. The purpose of this article is to provide an introduction to distance-based, partitioning-based, and model-based cluster analysis methods commonly utilized in the nursing literature, provide a brief historical overview on the use of cluster analysis in nursing literature, and provide suggestions for future research. An electronic search included three bibliographic databases, PubMed, CINAHL and Web of Science. Key terms were cluster analysis and nursing. The use of cluster analysis in the nursing literature is increasing and expanding. The increased use of cluster analysis in the nursing literature is positioning this statistical method to result in insights that have the potential to change clinical practice.
NetGen: a novel network-based probabilistic generative model for gene set functional enrichment analysis.

PubMed

Sun, Duanchen; Liu, Yinliang; Zhang, Xiang-Sun; Wu, Ling-Yun

2017-09-21

High-throughput experimental techniques have been dramatically improved and widely applied in the past decades. However, biological interpretation of the high-throughput experimental results, such as differential expression gene sets derived from microarray or RNA-seq experiments, is still a challenging task. Gene Ontology (GO) is commonly used in the functional enrichment studies. The GO terms identified via current functional enrichment analysis tools often contain direct parent or descendant terms in the GO hierarchical structure. Highly redundant terms make users difficult to analyze the underlying biological processes. In this paper, a novel network-based probabilistic generative model, NetGen, was proposed to perform the functional enrichment analysis. An additional protein-protein interaction (PPI) network was explicitly used to assist the identification of significantly enriched GO terms. NetGen achieved a superior performance than the existing methods in the simulation studies. The effectiveness of NetGen was explored further on four real datasets. Notably, several GO terms which were not directly linked with the active gene list for each disease were identified. These terms were closely related to the corresponding diseases when accessed to the curated literatures. NetGen has been implemented in the R package CopTea publicly available at GitHub ( http://github.com/wulingyun/CopTea/ ). Our procedure leads to a more reasonable and interpretable result of the functional enrichment analysis. As a novel term combination-based functional enrichment analysis method, NetGen is complementary to current individual term-based methods, and can help to explore the underlying pathogenesis of complex diseases.
[Study of the clinical phenotype of symptomatic chronic airways disease by hierarchical cluster analysis and two-step cluster analyses].

PubMed

Ning, P; Guo, Y F; Sun, T Y; Zhang, H S; Chai, D; Li, X M

2016-09-01

To study the distinct clinical phenotype of chronic airway diseases by hierarchical cluster analysis and two-step cluster analysis. A population sample of adult patients in Donghuamen community, Dongcheng district and Qinghe community, Haidian district, Beijing from April 2012 to January 2015, who had wheeze within the last 12 months, underwent detailed investigation, including a clinical questionnaire, pulmonary function tests, total serum IgE levels, blood eosinophil level and a peak flow diary. Nine variables were chosen as evaluating parameters, including pre-salbutamol forced expired volume in one second(FEV1)/forced vital capacity(FVC) ratio, pre-salbutamol FEV1, percentage of post-salbutamol change in FEV1, residual capacity, diffusing capacity of the lung for carbon monoxide/alveolar volume adjusted for haemoglobin level, peak expiratory flow(PEF) variability, serum IgE level, cumulative tobacco cigarette consumption (pack-years) and respiratory symptoms (cough and expectoration). Subjects' different clinical phenotype by hierarchical cluster analysis and two-step cluster analysis was identified. (1) Four clusters were identified by hierarchical cluster analysis. Cluster 1 was chronic bronchitis in smokers with normal pulmonary function. Cluster 2 was chronic bronchitis or mild chronic obstructive pulmonary disease (COPD) patients with mild airflow limitation. Cluster 3 included COPD patients with heavy smoking, poor quality of life and severe airflow limitation. Cluster 4 recognized atopic patients with mild airflow limitation, elevated serum IgE and clinical features of asthma. Significant differences were revealed regarding pre-salbutamol FEV1/FVC%, pre-salbutamol FEV1% pred, post-salbutamol change in FEV1%, maximal mid-expiratory flow curve(MMEF)% pred, carbon monoxide diffusing capacity per liter of alveolar(DLCO)/(VA)% pred, residual volume(RV)% pred, total serum IgE level, smoking history (pack-years), St.George's respiratory questionnaire
Differences in Pedaling Technique in Cycling: A Cluster Analysis.

PubMed

Lanferdini, Fábio J; Bini, Rodrigo R; Figueiredo, Pedro; Diefenthaeler, Fernando; Mota, Carlos B; Arndt, Anton; Vaz, Marco A

2016-10-01

To employ cluster analysis to assess if cyclists would opt for different strategies in terms of neuromuscular patterns when pedaling at the power output of their second ventilatory threshold (PO VT2 ) compared with cycling at their maximal power output (PO MAX ). Twenty athletes performed an incremental cycling test to determine their power output (PO MAX and PO VT2 ; first session), and pedal forces, muscle activation, muscle-tendon unit length, and vastus lateralis architecture (fascicle length, pennation angle, and muscle thickness) were recorded (second session) in PO MAX and PO VT2 . Athletes were assigned to 2 clusters based on the behavior of outcome variables at PO VT2 and PO MAX using cluster analysis. Clusters 1 (n = 14) and 2 (n = 6) showed similar power output and oxygen uptake. Cluster 1 presented larger increases in pedal force and knee power than cluster 2, without differences for the index of effectiveness. Cluster 1 presented less variation in knee angle, muscle-tendon unit length, pennation angle, and tendon length than cluster 2. However, clusters 1 and 2 showed similar muscle thickness, fascicle length, and muscle activation. When cycling at PO VT2 vs PO MAX , cyclists could opt for keeping a constant knee power and pedal-force production, associated with an increase in tendon excursion and a constant fascicle length. Increases in power output lead to greater variations in knee angle, muscle-tendon unit length, tendon length, and pennation angle of vastus lateralis for a similar knee-extensor activation and smaller pedal-force changes in cyclists from cluster 2 than in cluster 1.
Performance analysis of clustering techniques over microarray data: A case study

NASA Astrophysics Data System (ADS)

Dash, Rasmita; Misra, Bijan Bihari

2018-03-01

Handling big data is one of the major issues in the field of statistical data analysis. In such investigation cluster analysis plays a vital role to deal with the large scale data. There are many clustering techniques with different cluster analysis approach. But which approach suits a particular dataset is difficult to predict. To deal with this problem a grading approach is introduced over many clustering techniques to identify a stable technique. But the grading approach depends on the characteristic of dataset as well as on the validity indices. So a two stage grading approach is implemented. In this study the grading approach is implemented over five clustering techniques like hybrid swarm based clustering (HSC), k-means, partitioning around medoids (PAM), vector quantization (VQ) and agglomerative nesting (AGNES). The experimentation is conducted over five microarray datasets with seven validity indices. The finding of grading approach that a cluster technique is significant is also established by Nemenyi post-hoc hypothetical test.
OMERACT-based fibromyalgia symptom subgroups: an exploratory cluster analysis.

PubMed

Vincent, Ann; Hoskin, Tanya L; Whipple, Mary O; Clauw, Daniel J; Barton, Debra L; Benzo, Roberto P; Williams, David A

2014-10-16

The aim of this study was to identify subsets of patients with fibromyalgia with similar symptom profiles using the Outcome Measures in Rheumatology (OMERACT) core symptom domains. Female patients with a diagnosis of fibromyalgia and currently meeting fibromyalgia research survey criteria completed the Brief Pain Inventory, the 30-item Profile of Mood States, the Medical Outcomes Sleep Scale, the Multidimensional Fatigue Inventory, the Multiple Ability Self-Report Questionnaire, the Fibromyalgia Impact Questionnaire-Revised (FIQ-R) and the Short Form-36 between 1 June 2011 and 31 October 2011. Hierarchical agglomerative clustering was used to identify subgroups of patients with similar symptom profiles. To validate the results from this sample, hierarchical agglomerative clustering was repeated in an external sample of female patients with fibromyalgia with similar inclusion criteria. A total of 581 females with a mean age of 55.1 (range, 20.1 to 90.2) years were included. A four-cluster solution best fit the data, and each clustering variable differed significantly (P <0.0001) among the four clusters. The four clusters divided the sample into severity levels: Cluster 1 reflects the lowest average levels across all symptoms, and cluster 4 reflects the highest average levels. Clusters 2 and 3 capture moderate symptoms levels. Clusters 2 and 3 differed mainly in profiles of anxiety and depression, with Cluster 2 having lower levels of depression and anxiety than Cluster 3, despite higher levels of pain. The results of the cluster analysis of the external sample (n = 478) looked very similar to those found in the original cluster analysis, except for a slight difference in sleep problems. This was despite having patients in the validation sample who were significantly younger (P <0.0001) and had more severe symptoms (higher FIQ-R total scores (P = 0.0004)). In our study, we incorporated core OMERACT symptom domains, which allowed for clustering based on a
A Spectroscopic Analysis of the Galactic Globular Cluster NGC 6273 (M19)

NASA Astrophysics Data System (ADS)

Johnson, Christian I.; Rich, R. Michael; Pilachowski, Catherine A.; Caldwell, Nelson; Mateo, Mario; Bailey, John I., III; Crane, Jeffrey D.

2015-08-01

A combined effort utilizing spectroscopy and photometry has revealed the existence of a new globular cluster class. These “anomalous” clusters, which we refer to as “iron-complex” clusters, are differentiated from normal clusters by exhibiting large (≳0.10 dex) intrinsic metallicity dispersions, complex sub-giant branches, and correlated [Fe/H] and s-process enhancements. In order to further investigate this phenomenon, we have measured radial velocities and chemical abundances for red giant branch stars in the massive, but scarcely studied, globular cluster NGC 6273. The velocities and abundances were determined using high resolution (R ˜ 27,000) spectra obtained with the Michigan/Magellan Fiber System (M2FS) and MSpec spectrograph on the Magellan-Clay 6.5 m telescope at Las Campanas Observatory. We find that NGC 6273 has an average heliocentric radial velocity of +144.49 km s-1 (σ = 9.64 km s-1) and an extended metallicity distribution ([Fe/H] = -1.80 to -1.30) composed of at least two distinct stellar populations. Although the two dominant populations have similar [Na/Fe], [Al/Fe], and [α/Fe] abundance patterns, the more metal-rich stars exhibit significant [La/Fe] enhancements. The [La/Eu] data indicate that the increase in [La/Fe] is due to almost pure s-process enrichment. A third more metal-rich population with low [X/Fe] ratios may also be present. Therefore, NGC 6273 joins clusters such as ω Centauri, M2, M22, and NGC 5286 as a new class of iron-complex clusters exhibiting complicated star formation histories. This paper includes data gathered with the 6.5 m Magellan Telescopes located at Las Campanas Observatory, Chile.
Cluster analysis of Southeastern U.S. climate stations

NASA Astrophysics Data System (ADS)

Stooksbury, D. E.; Michaels, P. J.

1991-09-01

A two-step cluster analysis of 449 Southeastern climate stations is used to objectively determine general climate clusters (groups of climate stations) for eight southeastern states. The purpose is objectively to define regions of climatic homogeneity that should perform more robustly in subsequent climatic impact models. This type of analysis has been successfully used in many related climate research problems including the determination of corn/climate districts in Iowa (Ortiz-Valdez, 1985) and the classification of synoptic climate types (Davis, 1988). These general climate clusters may be more appropriate for climate research than the standard climate divisions (CD) groupings of climate stations, which are modifications of the agro-economic United States Department of Agriculture crop reporting districts. Unlike the CD's, these objectively determined climate clusters are not restricted by state borders and thus have reduced multicollinearity which makes them more appropriate for the study of the impact of climate and climatic change.
Modest validity and fair reproducibility of dietary patterns derived by cluster analysis.

PubMed

Funtikova, Anna N; Benítez-Arciniega, Alejandra A; Fitó, Montserrat; Schröder, Helmut

2015-03-01

Cluster analysis is widely used to analyze dietary patterns. We aimed to analyze the validity and reproducibility of the dietary patterns defined by cluster analysis derived from a food frequency questionnaire (FFQ). We hypothesized that the dietary patterns derived by cluster analysis have fair to modest reproducibility and validity. Dietary data were collected from 107 individuals from population-based survey, by an FFQ at baseline (FFQ1) and after 1 year (FFQ2), and by twelve 24-hour dietary recalls (24-HDR). Repeatability and validity were measured by comparing clusters obtained by the FFQ1 and FFQ2 and by the FFQ2 and 24-HDR (reference method), respectively. Cluster analysis identified a "fruits & vegetables" and a "meat" pattern in each dietary data source. Cluster membership was concordant for 66.7% of participants in FFQ1 and FFQ2 (reproducibility), and for 67.0% in FFQ2 and 24-HDR (validity). Spearman correlation analysis showed reasonable reproducibility, especially in the "fruits & vegetables" pattern, and lower validity also especially in the "fruits & vegetables" pattern. κ statistic revealed a fair validity and reproducibility of clusters. Our findings indicate a reasonable reproducibility and fair to modest validity of dietary patterns derived by cluster analysis. Copyright © 2015 Elsevier Inc. All rights reserved.
Enrichment and separation techniques for large-scale proteomics analysis of the protein post-translational modifications.

PubMed

Huang, Junfeng; Wang, Fangjun; Ye, Mingliang; Zou, Hanfa

2014-11-06

Comprehensive analysis of the post-translational modifications (PTMs) on proteins at proteome level is crucial to elucidate the regulatory mechanisms of various biological processes. In the past decades, thanks to the development of specific PTM enrichment techniques and efficient multidimensional liquid chromatography (LC) separation strategy, the identification of protein PTMs have made tremendous progress. A huge number of modification sites for some major protein PTMs have been identified by proteomics analysis. In this review, we first introduced the recent progresses of PTM enrichment methods for the analysis of several major PTMs including phosphorylation, glycosylation, ubiquitination, acetylation, methylation, and oxidation/reduction status. We then briefly summarized the challenges for PTM enrichment. Finally, we introduced the fractionation and separation techniques for efficient separation of PTM peptides in large-scale PTM analysis. Copyright © 2014 Elsevier B.V. All rights reserved.
A generalized analysis of hydrophobic and loop clusters within globular protein sequences

PubMed Central

Eudes, Richard; Le Tuan, Khanh; Delettré, Jean; Mornon, Jean-Paul; Callebaut, Isabelle

2007-01-01

Background Hydrophobic Cluster Analysis (HCA) is an efficient way to compare highly divergent sequences through the implicit secondary structure information directly derived from hydrophobic clusters. However, its efficiency and application are currently limited by the need of user expertise. In order to help the analysis of HCA plots, we report here the structural preferences of hydrophobic cluster species, which are frequently encountered in globular domains of proteins. These species are characterized only by their hydrophobic/non-hydrophobic dichotomy. This analysis has been extended to loop-forming clusters, using an appropriate loop alphabet. Results The structural behavior of hydrophobic cluster species, which are typical of protein globular domains, was investigated within banks of experimental structures, considered at different levels of sequence redundancy. The 294 more frequent hydrophobic cluster species were analyzed with regard to their association with the different secondary structures (frequencies of association with secondary structures and secondary structure propensities). Hydrophobic cluster species are predominantly associated with regular secondary structures, and a large part (60 %) reveals preferences for α-helices or β-strands. Moreover, the analysis of the hydrophobic cluster amino acid composition generally allows for finer prediction of the regular secondary structure associated with the considered cluster within a cluster species. We also investigated the behavior of loop forming clusters, using a "PGDNS" alphabet. These loop clusters do not overlap with hydrophobic clusters and are highly associated with coils. Finally, the structural information contained in the hydrophobic structural words, as deduced from experimental structures, was compared to the PSI-PRED predictions, revealing that β-strands and especially α-helices are generally over-predicted within the limits of typical β and α hydrophobic clusters. Conclusion The
The Psychology of Yoga Practitioners: A Cluster Analysis.

PubMed

Genovese, Jeremy E C; Fondran, Kristine M

2017-11-01

Yoga practitioners (N = 261) completed the revised Expression of Spirituality Inventory (ESI) and the Multidimensional Body-Self Relations Questionnaire. Cluster analysis revealed three clusters: Cluster A scored high on all four spiritual constructs. They had high positive evaluations of their appearance, but a lower orientation towards their appearance. They tended to have a high evaluation of their fitness and health, and higher body satisfaction. Cluster B showed lower scores on the spiritual constructs. Like Cluster A, members of Cluster B tended to show high positive evaluations of appearance and fitness. They also had higher body satisfaction. Members of Cluster B had a higher fitness orientation and a higher appearance orientation than members of Cluster A. Members of Cluster C had low scores for all spiritual constructs. They had a low evaluation of, and unhappiness with, their appearance. They were unhappy with the size and appearance of their bodies. They tended to see themselves as overweight. There was a significant difference in years of practice between the three groups (Kruskall -Wallis, p = .0041). Members of Cluster A have the most years of yoga experience and members of Cluster B have more yoga experience than members of Cluster C. These results suggest the possible existence of a developmental trajectory for yoga practitioners. Such a developmental sequence may have important implications for yoga practice and instruction.
The Psychology of Yoga Practitioners: A Cluster Analysis.

PubMed

Genovese, Jeremy E C; Fondran, Kristine M

2017-03-30

Yoga practitioners (N = 261) completed the revised Expression of Spirituality Inventory (ESI) and the Multidimensional Body-Self Relations Questionnaire. Cluster analysis revealed three clusters: Cluster A scored high on all four spiritual constructs. They had high positive evaluations of their appearance, but a lower orientation towards their appearance. They tended to have a high evaluation of their fitness and health, and higher body satisfaction. Cluster B showed lower scores on the spiritual constructs. Like Cluster A, members of Cluster B tended to show high positive evaluations of appearance and fitness. They also had higher body satisfaction. Members of Cluster B had a higher fitness orientation and a higher appearance orientation than members of Cluster A. Members of Cluster C had low scores for all spiritual constructs. They had a low evaluation of, and unhappiness with, their appearance. They were unhappy with the size and appearance of their bodies. They tended to see themselves as overweight. There was a significant difference in years of practice between the three groups (Kruskall-Wallis, p = .0041). Members of Cluster A have the most years of yoga experience and members of Cluster B have more yoga experience than members of Cluster C. These results suggest the possible existence of a developmental trajectory for yoga practitioners. Such a developmental sequence may have important implications for yoga practice and instruction.
Graph analysis of cell clusters forming vascular networks

NASA Astrophysics Data System (ADS)

Alves, A. P.; Mesquita, O. N.; Gómez-Gardeñes, J.; Agero, U.

2018-03-01

This manuscript describes the experimental observation of vasculogenesis in chick embryos by means of network analysis. The formation of the vascular network was observed in the area opaca of embryos from 40 to 55 h of development. In the area opaca endothelial cell clusters self-organize as a primitive and approximately regular network of capillaries. The process was observed by bright-field microscopy in control embryos and in embryos treated with Bevacizumab (Avastin), an antibody that inhibits the signalling of the vascular endothelial growth factor (VEGF). The sequence of images of the vascular growth were thresholded, and used to quantify the forming network in control and Avastin-treated embryos. This characterization is made by measuring vessels density, number of cell clusters and the largest cluster density. From the original images, the topology of the vascular network was extracted and characterized by means of the usual network metrics such as: the degree distribution, average clustering coefficient, average short path length and assortativity, among others. This analysis allows to monitor how the largest connected cluster of the vascular network evolves in time and provides with quantitative evidence of the disruptive effects that Avastin has on the tree structure of vascular networks.
Cluster analysis and prediction of treatment outcomes for chronic rhinosinusitis.

PubMed

Soler, Zachary M; Hyer, J Madison; Rudmik, Luke; Ramakrishnan, Viswanathan; Smith, Timothy L; Schlosser, Rodney J

2016-04-01

Current clinical classifications of chronic rhinosinusitis (CRS) have weak prognostic utility regarding treatment outcomes. Simplified discriminant analysis based on unsupervised clustering has identified novel phenotypic subgroups of CRS, but prognostic utility is unknown. We sought to determine whether discriminant analysis allows prognostication in patients choosing surgery versus continued medical management. A multi-institutional prospective study of patients with CRS in whom initial medical therapy failed who then self-selected continued medical management or surgical treatment was used to separate patients into 5 clusters based on a previously described discriminant analysis using total Sino-Nasal Outcome Test-22 (SNOT-22) score, age, and missed productivity. Patients completed the SNOT-22 at baseline and for 18 months of follow-up. Baseline demographic and objective measures included olfactory testing, computed tomography, and endoscopy scoring. SNOT-22 outcomes for surgical versus continued medical treatment were compared across clusters. Data were available on 690 patients. Baseline differences in demographics, comorbidities, objective disease measures, and patient-reported outcomes were similar to previous clustering reports. Three of 5 clusters identified by means of discriminant analysis had improved SNOT-22 outcomes with surgical intervention when compared with continued medical management (surgery was a mean of 21.2 points better across these 3 clusters at 6 months, P < .05). These differences were sustained at 18 months of follow-up. Two of 5 clusters had similar outcomes when comparing surgery with continued medical management. A simplified discriminant analysis based on 3 common clinical variables is able to cluster patients and provide prognostic information regarding surgical treatment versus continued medical management in patients with CRS. Copyright © 2015 American Academy of Allergy, Asthma & Immunology. Published by Elsevier Inc. All
A high fat diet containing saturated but not unsaturated fatty acids enhances T cell receptor clustering on the nanoscale.

PubMed

Shaikh, Saame Raza; Boyle, Sarah; Edidin, Michael

2015-09-01

Cell culture studies show that the nanoscale lateral organization of surface receptors, their clustering or dispersion, can be altered by changing the lipid composition of the membrane bilayer. However, little is known about similar changes in vivo, which can be effected by changing dietary lipids. We describe the use of a newly developed method, k-space image correlation spectroscopy, kICS, for analysis of quantum dot fluorescence to show that a high fat diet can alter the nanometer-scale clustering of the murine T cell receptor, TCR, on the surface of naive CD4(+) T cells. We found that diets enriched primarily in saturated fatty acids increased TCR nanoscale clustering to a level usually seen only on activated cells. Diets enriched in monounsaturated or n-3 polyunsaturated fatty acids had no effect on TCR clustering. Also none of the high fat diets affected TCR clustering on the micrometer scale. Furthermore, the effect of the diets was similar in young and middle aged mice. Our data establish proof-of-principle that TCR nanoscale clustering is sensitive to the composition of dietary fat. Copyright © 2015 Elsevier Ltd. All rights reserved.
Systematization of actinides using cluster analysis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kopyrin, A.A.; Terent`eva, T.N.; Khramov, N.N.

1994-11-01

A representation of the actinides in multidimensional property space is proposed for systematization of these elements using cluster analysis. Literature data for their atomic properties are used. Owing to the wide variation of published ionization potentials, medians are used to estimate them. Vertical dendograms are used for classification on the basis of distances between the actinides in atomic-property space. The properties of actinium and lawrencium are furthest removed from the main group. Thorium and mendelevium exhibit individualized properties. A cluster based on the einsteinium-fermium pair is joined by californium.
Pathway-based variant enrichment analysis on the example of dilated cardiomyopathy.

PubMed

Backes, Christina; Meder, Benjamin; Lai, Alan; Stoll, Monika; Rühle, Frank; Katus, Hugo A; Keller, Andreas

2016-01-01

Genome-wide association (GWA) studies have significantly contributed to the understanding of human genetic variation and its impact on clinical traits. Frequently only a limited number of highly significant associations were considered as biologically relevant. Increasingly, network analysis of affected genes is used to explore the potential role of the genetic background on disease mechanisms. Instead of first determining affected genes or calculating scores for genes and performing pathway analysis on the gene level, we integrated both steps and directly calculated enrichment on the genetic variant level. The respective approach has been tested on dilated cardiomyopathy (DCM) GWA data as showcase. To compute significance values, 5000 permutation tests were carried out and p values were adjusted for multiple testing. For 282 KEGG pathways, we computed variant enrichment scores and significance values. Of these, 65 were significant. Surprisingly, we discovered the "nucleotide excision repair" and "tuberculosis" pathways to be most significantly associated with DCM (p = 10(-9)). The latter pathway is driven by genes of the HLA-D antigen group, a finding that closely resembles previous discoveries made by expression quantitative trait locus analysis in the context of DCM-GWA. Next, we implemented a sub-network-based analysis, which searches for affected parts of KEGG, however, independent on the pre-defined pathways. Here, proteins of the contractile apparatus of cardiac cells as well as the FAS sub-network were found to be affected by common polymorphisms in DCM. In this work, we performed enrichment analysis directly on variants, leveraging the potential to discover biological information in thousands of published GWA studies. The applied approach is cutoff free and considers a ranked list of genetic variants as input.

A Multivariate Analysis of Galaxy Cluster Properties

NASA Astrophysics Data System (ADS)

Ogle, P. M.; Djorgovski, S.

1993-05-01

We have assembled from the literature a data base on on 394 clusters of galaxies, with up to 16 parameters per cluster. They include optical and x-ray luminosities, x-ray temperatures, galaxy velocity dispersions, central galaxy and particle densities, optical and x-ray core radii and ellipticities, etc. In addition, derived quantities, such as the mass-to-light ratios and x-ray gas masses are included. Doubtful measurements have been identified, and deleted from the data base. Our goal is to explore the correlations between these parameters, and interpret them in the framework of our understanding of evolution of clusters and large-scale structure, such as the Gott-Rees scaling hierarchy. Among the simple, monovariate correlations we found, the most significant include those between the optical and x-ray luminosities, x-ray temperatures, cluster velocity dispersions, and central galaxy densities, in various mutual combinations. While some of these correlations have been discussed previously in the literature, generally smaller samples of objects have been used. We will also present the results of a multivariate statistical analysis of the data, including a principal component analysis (PCA). Such an approach has not been used previously for studies of cluster properties, even though it is much more powerful and complete than the simple monovariate techniques which are commonly employed. The observed correlations may lead to powerful constraints for theoretical models of formation and evolution of galaxy clusters. P.M.O. was supported by a Caltech graduate fellowship. S.D. acknowledges a partial support from the NASA contract NAS5-31348 and the NSF PYI award AST-9157412.
Logistics Enterprise Evaluation Model Based On Fuzzy Clustering Analysis

NASA Astrophysics Data System (ADS)

Fu, Pei-hua; Yin, Hong-bo

In this thesis, we introduced an evaluation model based on fuzzy cluster algorithm of logistics enterprises. First of all,we present the evaluation index system which contains basic information, management level, technical strength, transport capacity,informatization level, market competition and customer service. We decided the index weight according to the grades, and evaluated integrate ability of the logistics enterprises using fuzzy cluster analysis method. In this thesis, we introduced the system evaluation module and cluster analysis module in detail and described how we achieved these two modules. At last, we gave the result of the system.
Topic modeling for cluster analysis of large biological and medical datasets

PubMed Central

2014-01-01

Background The big data moniker is nowhere better deserved than to describe the ever-increasing prodigiousness and complexity of biological and medical datasets. New methods are needed to generate and test hypotheses, foster biological interpretation, and build validated predictors. Although multivariate techniques such as cluster analysis may allow researchers to identify groups, or clusters, of related variables, the accuracies and effectiveness of traditional clustering methods diminish for large and hyper dimensional datasets. Topic modeling is an active research field in machine learning and has been mainly used as an analytical tool to structure large textual corpora for data mining. Its ability to reduce high dimensionality to a small number of latent variables makes it suitable as a means for clustering or overcoming clustering difficulties in large biological and medical datasets. Results In this study, three topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, are proposed and tested on the cluster analysis of three large datasets: Salmonella pulsed-field gel electrophoresis (PFGE) dataset, lung cancer dataset, and breast cancer dataset, which represent various types of large biological or medical datasets. All three various methods are shown to improve the efficacy/effectiveness of clustering results on the three datasets in comparison to traditional methods. A preferable cluster analysis method emerged for each of the three datasets on the basis of replicating known biological truths. Conclusion Topic modeling could be advantageously applied to the large datasets of biological or medical research. The three proposed topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, yield clustering improvements for the three different data types. Clusters more efficaciously represent truthful groupings and subgroupings in the data than
Topic modeling for cluster analysis of large biological and medical datasets.

PubMed

Zhao, Weizhong; Zou, Wen; Chen, James J

2014-01-01

The big data moniker is nowhere better deserved than to describe the ever-increasing prodigiousness and complexity of biological and medical datasets. New methods are needed to generate and test hypotheses, foster biological interpretation, and build validated predictors. Although multivariate techniques such as cluster analysis may allow researchers to identify groups, or clusters, of related variables, the accuracies and effectiveness of traditional clustering methods diminish for large and hyper dimensional datasets. Topic modeling is an active research field in machine learning and has been mainly used as an analytical tool to structure large textual corpora for data mining. Its ability to reduce high dimensionality to a small number of latent variables makes it suitable as a means for clustering or overcoming clustering difficulties in large biological and medical datasets. In this study, three topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, are proposed and tested on the cluster analysis of three large datasets: Salmonella pulsed-field gel electrophoresis (PFGE) dataset, lung cancer dataset, and breast cancer dataset, which represent various types of large biological or medical datasets. All three various methods are shown to improve the efficacy/effectiveness of clustering results on the three datasets in comparison to traditional methods. A preferable cluster analysis method emerged for each of the three datasets on the basis of replicating known biological truths. Topic modeling could be advantageously applied to the large datasets of biological or medical research. The three proposed topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, yield clustering improvements for the three different data types. Clusters more efficaciously represent truthful groupings and subgroupings in the data than traditional methods, suggesting
Multiscale visual quality assessment for cluster analysis with self-organizing maps

NASA Astrophysics Data System (ADS)

Bernard, Jürgen; von Landesberger, Tatiana; Bremm, Sebastian; Schreck, Tobias

2011-01-01

Cluster analysis is an important data mining technique for analyzing large amounts of data, reducing many objects to a limited number of clusters. Cluster visualization techniques aim at supporting the user in better understanding the characteristics and relationships among the found clusters. While promising approaches to visual cluster analysis already exist, these usually fall short of incorporating the quality of the obtained clustering results. However, due to the nature of the clustering process, quality plays an important aspect, as for most practical data sets, typically many different clusterings are possible. Being aware of clustering quality is important to judge the expressiveness of a given cluster visualization, or to adjust the clustering process with refined parameters, among others. In this work, we present an encompassing suite of visual tools for quality assessment of an important visual cluster algorithm, namely, the Self-Organizing Map (SOM) technique. We define, measure, and visualize the notion of SOM cluster quality along a hierarchy of cluster abstractions. The quality abstractions range from simple scalar-valued quality scores up to the structural comparison of a given SOM clustering with output of additional supportive clustering methods. The suite of methods allows the user to assess the SOM quality on the appropriate abstraction level, and arrive at improved clustering results. We implement our tools in an integrated system, apply it on experimental data sets, and show its applicability.
Periorbital melasma: Hierarchical cluster analysis of clinical features in Asian patients.

PubMed

Jung, Y S; Bae, J M; Kim, B J; Kang, J-S; Cho, S B

2017-11-01

Studies have shown melasma lesions to be distributed across the face in centrofacial, malar, and mandibular patterns. Meanwhile, however, melasma lesions of the periorbital area have yet to be thoroughly described. We analyzed normal and ultraviolet light-exposed photographs of patients with melasma. The periorbital melasma lesions were measured according to anatomical reference points and a hierarchical cluster analysis was performed. The periorbital melasma lesions showed clinical features of fine and homogenous melasma pigmentation, involving both the upper and lower eyelids that extended to other anatomical sites with a darker and coarser appearance. The hierarchical cluster analysis indicated that patients with periorbital melasma can be categorized into two clusters according to the surface anatomy of the face. Significant differences between cluster 1 and cluster 2 were found in lateral distance and inferolateral distance, but not in medial distance and superior distance. Comparing the two clusters, patients in cluster 2 were found to be significantly older and more commonly accompanied by melasma lesions of the temple and medial cheek. Our hierarchical cluster analysis of periorbital melasma lesions demonstrated that Asian patients with periorbital melasma can be categorized into two clusters according to the surface anatomy of the face. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Cluster Analysis of Clinical Data Identifies Fibromyalgia Subgroups

PubMed Central

Docampo, Elisa; Collado, Antonio; Escaramís, Geòrgia; Carbonell, Jordi; Rivera, Javier; Vidal, Javier; Alegre, José

2013-01-01

Introduction Fibromyalgia (FM) is mainly characterized by widespread pain and multiple accompanying symptoms, which hinder FM assessment and management. In order to reduce FM heterogeneity we classified clinical data into simplified dimensions that were used to define FM subgroups. Material and Methods 48 variables were evaluated in 1,446 Spanish FM cases fulfilling 1990 ACR FM criteria. A partitioning analysis was performed to find groups of variables similar to each other. Similarities between variables were identified and the variables were grouped into dimensions. This was performed in a subset of 559 patients, and cross-validated in the remaining 887 patients. For each sample and dimension, a composite index was obtained based on the weights of the variables included in the dimension. Finally, a clustering procedure was applied to the indexes, resulting in FM subgroups. Results Variables clustered into three independent dimensions: “symptomatology”, “comorbidities” and “clinical scales”. Only the two first dimensions were considered for the construction of FM subgroups. Resulting scores classified FM samples into three subgroups: low symptomatology and comorbidities (Cluster 1), high symptomatology and comorbidities (Cluster 2), and high symptomatology but low comorbidities (Cluster 3), showing differences in measures of disease severity. Conclusions We have identified three subgroups of FM samples in a large cohort of FM by clustering clinical data. Our analysis stresses the importance of family and personal history of FM comorbidities. Also, the resulting patient clusters could indicate different forms of the disease, relevant to future research, and might have an impact on clinical assessment. PMID:24098674
Towards Effective Clustering Techniques for the Analysis of Electric Power Grids

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hogan, Emilie A.; Cotilla Sanchez, Jose E.; Halappanavar, Mahantesh

2013-11-30

Clustering is an important data analysis technique with numerous applications in the analysis of electric power grids. Standard clustering techniques are oblivious to the rich structural and dynamic information available for power grids. Therefore, by exploiting the inherent topological and electrical structure in the power grid data, we propose new methods for clustering with applications to model reduction, locational marginal pricing, phasor measurement unit (PMU or synchrophasor) placement, and power system protection. We focus our attention on model reduction for analysis based on time-series information from synchrophasor measurement devices, and spectral techniques for clustering. By comparing different clustering techniques onmore » two instances of realistic power grids we show that the solutions are related and therefore one could leverage that relationship for a computational advantage. Thus, by contrasting different clustering techniques we make a case for exploiting structure inherent in the data with implications for several domains including power systems.« less
An enhanced cluster analysis program with bootstrap significance testing for ecological community analysis

USGS Publications Warehouse

McKenna, J.E.

2003-01-01

The biosphere is filled with complex living patterns and important questions about biodiversity and community and ecosystem ecology are concerned with structure and function of multispecies systems that are responsible for those patterns. Cluster analysis identifies discrete groups within multivariate data and is an effective method of coping with these complexities, but often suffers from subjective identification of groups. The bootstrap testing method greatly improves objective significance determination for cluster analysis. The BOOTCLUS program makes cluster analysis that reliably identifies real patterns within a data set more accessible and easier to use than previously available programs. A variety of analysis options and rapid re-analysis provide a means to quickly evaluate several aspects of a data set. Interpretation is influenced by sampling design and a priori designation of samples into replicate groups, and ultimately relies on the researcher's knowledge of the organisms and their environment. However, the BOOTCLUS program provides reliable, objectively determined groupings of multivariate data.
Development of small scale cluster computer for numerical analysis

NASA Astrophysics Data System (ADS)

Zulkifli, N. H. N.; Sapit, A.; Mohammed, A. N.

2017-09-01

In this study, two units of personal computer were successfully networked together to form a small scale cluster. Each of the processor involved are multicore processor which has four cores in it, thus made this cluster to have eight processors. Here, the cluster incorporate Ubuntu 14.04 LINUX environment with MPI implementation (MPICH2). Two main tests were conducted in order to test the cluster, which is communication test and performance test. The communication test was done to make sure that the computers are able to pass the required information without any problem and were done by using simple MPI Hello Program where the program written in C language. Additional, performance test was also done to prove that this cluster calculation performance is much better than single CPU computer. In this performance test, four tests were done by running the same code by using single node, 2 processors, 4 processors, and 8 processors. The result shows that with additional processors, the time required to solve the problem decrease. Time required for the calculation shorten to half when we double the processors. To conclude, we successfully develop a small scale cluster computer using common hardware which capable of higher computing power when compare to single CPU processor, and this can be beneficial for research that require high computing power especially numerical analysis such as finite element analysis, computational fluid dynamics, and computational physics analysis.
Electrical Load Profile Analysis Using Clustering Techniques

NASA Astrophysics Data System (ADS)

Damayanti, R.; Abdullah, A. G.; Purnama, W.; Nandiyanto, A. B. D.

2017-03-01

Data mining is one of the data processing techniques to collect information from a set of stored data. Every day the consumption of electricity load is recorded by Electrical Company, usually at intervals of 15 or 30 minutes. This paper uses a clustering technique, which is one of data mining techniques to analyse the electrical load profiles during 2014. The three methods of clustering techniques were compared, namely K-Means (KM), Fuzzy C-Means (FCM), and K-Means Harmonics (KHM). The result shows that KHM is the most appropriate method to classify the electrical load profile. The optimum number of clusters is determined using the Davies-Bouldin Index. By grouping the load profile, the demand of variation analysis and estimation of energy loss from the group of load profile with similar pattern can be done. From the group of electric load profile, it can be known cluster load factor and a range of cluster loss factor that can help to find the range of values of coefficients for the estimated loss of energy without performing load flow studies.
Method for exploratory cluster analysis and visualisation of single-trial ERP ensembles.

PubMed

Williams, N J; Nasuto, S J; Saddy, J D

2015-07-30

The validity of ensemble averaging on event-related potential (ERP) data has been questioned, due to its assumption that the ERP is identical across trials. Thus, there is a need for preliminary testing for cluster structure in the data. We propose a complete pipeline for the cluster analysis of ERP data. To increase the signal-to-noise (SNR) ratio of the raw single-trials, we used a denoising method based on Empirical Mode Decomposition (EMD). Next, we used a bootstrap-based method to determine the number of clusters, through a measure called the Stability Index (SI). We then used a clustering algorithm based on a Genetic Algorithm (GA) to define initial cluster centroids for subsequent k-means clustering. Finally, we visualised the clustering results through a scheme based on Principal Component Analysis (PCA). After validating the pipeline on simulated data, we tested it on data from two experiments - a P300 speller paradigm on a single subject and a language processing study on 25 subjects. Results revealed evidence for the existence of 6 clusters in one experimental condition from the language processing study. Further, a two-way chi-square test revealed an influence of subject on cluster membership. Our analysis operates on denoised single-trials, the number of clusters are determined in a principled manner and the results are presented through an intuitive visualisation. Given the cluster structure in some experimental conditions, we suggest application of cluster analysis as a preliminary step before ensemble averaging. Copyright © 2015 Elsevier B.V. All rights reserved.
Detection of Functional Change Using Cluster Trend Analysis in Glaucoma.

PubMed

Gardiner, Stuart K; Mansberger, Steven L; Demirel, Shaban

2017-05-01

Global analyses using mean deviation (MD) assess visual field progression, but can miss localized changes. Pointwise analyses are more sensitive to localized progression, but more variable so require confirmation. This study assessed whether cluster trend analysis, averaging information across subsets of locations, could improve progression detection. A total of 133 test-retest eyes were tested 7 to 10 times. Rates of change and P values were calculated for possible re-orderings of these series to generate global analysis ("MD worsening faster than x dB/y with P < y"), pointwise and cluster analyses ("n locations [or clusters] worsening faster than x dB/y with P < y") with specificity exactly 95%. These criteria were applied to 505 eyes tested over a mean of 10.5 years, to find how soon each detected "deterioration," and compared using survival models. This was repeated including two subsequent visual fields to determine whether "deterioration" was confirmed. The best global criterion detected deterioration in 25% of eyes in 5.0 years (95% confidence interval [CI], 4.7-5.3 years), compared with 4.8 years (95% CI, 4.2-5.1) for the best cluster analysis criterion, and 4.1 years (95% CI, 4.0-4.5) for the best pointwise criterion. However, for pointwise analysis, only 38% of these changes were confirmed, compared with 61% for clusters and 76% for MD. The time until 25% of eyes showed subsequently confirmed deterioration was 6.3 years (95% CI, 6.0-7.2) for global, 6.3 years (95% CI, 6.0-7.0) for pointwise, and 6.0 years (95% CI, 5.3-6.6) for cluster analyses. Although the specificity is still suboptimal, cluster trend analysis detects subsequently confirmed deterioration sooner than either global or pointwise analyses.
Role of Hydrophobic Clusters and Long-Range Contact Networks in the Folding of (α/β)8 Barrel Proteins

PubMed Central

Selvaraj, S.; Gromiha, M. Michael

2003-01-01

Analysis on the three dimensional structures of (α/β)8 barrel proteins provides ample light to understand the factors that are responsible for directing and maintaining their common fold. In this work, the hydrophobically enriched clusters are identified in 92% of the considered (α/β)8 barrel proteins. The residue segments with hydrophobic clusters have high thermal stability. Further, these clusters are formed and stabilized through long-range interactions. Specifically, a network of long-range contacts connects adjacent β-strands of the (α/β)8 barrel domain and the hydrophobic clusters. The implications of hydrophobic clusters and long-range networks in providing a feasible common mechanism for the folding of (α/β)8 barrel proteins are proposed. PMID:12609894
SUPERMODEL ANALYSIS OF GALAXY CLUSTERS

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fusco-Femiano, R.; Cavaliere, A.; Lapi, A.

2009-11-01

We present the analysis of the X-ray brightness and temperature profiles for six clusters belonging to both the Cool Core (CC) and Non Cool Core (NCC) classes, in terms of the Supermodel (SM) developed by Cavaliere et al. Based on the gravitational wells set by the dark matter (DM) halos, the SM straightforwardly expresses the equilibrium of the intracluster plasma (ICP) modulated by the entropy deposited at the boundary by standing shocks from gravitational accretion, and injected at the center by outgoing blast waves from mergers or from outbursts of active galactic nuclei. The cluster set analyzed here highlights notmore » only how simply the SM represents the main dichotomy CC versus NCC clusters in terms of a few ICP parameters governing the radial entropy run, but also how accurately it fits even complex brightness and temperature profiles. For CC clusters like A2199 and A2597, the SM with a low level of central entropy straightforwardly yields the characteristic peaked profile of the temperature marked by a decline toward the center, without requiring currently strong radiative cooling and high mass deposition rates. NCC clusters like A1656 require instead a central entropy floor of a substantial level, and some like A2256 and even more A644 feature structured temperature profiles that also call for a definite floor extension; in such conditions the SM accurately fits the observations, and suggests that in these clusters the ICP has been just remolded by a merger event, in the way of a remnant cool core. The SM also predicts that DM halos with high concentration should correlate with flatter entropy profiles and steeper brightness in the outskirts; this is indeed the case with A1689, for which from X-rays we find concentration values c approx 10, the hallmark of an early halo formation. Thus, we show the SM to constitute a fast tool not only to provide wide libraries of accurate fits to X-ray temperature and density profiles, but also to retrieve from the
Using Cluster Analysis for Data Mining in Educational Technology Research

ERIC Educational Resources Information Center

Antonenko, Pavlo D.; Toy, Serkan; Niederhauser, Dale S.

2012-01-01

Cluster analysis is a group of statistical methods that has great potential for analyzing the vast amounts of web server-log data to understand student learning from hyperlinked information resources. In this methodological paper we provide an introduction to cluster analysis for educational technology researchers and illustrate its use through…
[Typologies of Madrid's citizens (Spain) at the end-of-life: cluster analysis].

PubMed

Ortiz-Gonçalves, Belén; Perea-Pérez, Bernardo; Labajo González, Elena; Albarrán Juan, Elena; Santiago-Sáez, Andrés

2018-03-06

To establish typologies within Madrid's citizens (Spain) with regard to end-of-life by cluster analysis. The SPAD 8 programme was implemented in a sample from a health care centre in the autonomous region of Madrid (Spain). A multiple correspondence analysis technique was used, followed by a cluster analysis to create a dendrogram. A cross-sectional study was made beforehand with the results of the questionnaire. Five clusters stand out. Cluster 1: a group who preferred not to answer numerous questions (5%). Cluster 2: in favour of receiving palliative care and euthanasia (40%). Cluster 3: would oppose assisted suicide and would not ask for spiritual assistance (15%). Cluster 4: would like to receive palliative care and assisted suicide (16%). Cluster 5: would oppose assisted suicide and would ask for spiritual assistance (24%). The following four clusters stood out. Clusters 2 and 4 would like to receive palliative care, euthanasia (2) and assisted suicide (4). Clusters 4 and 5 regularly practiced their faith and their family members did not receive palliative care. Clusters 3 and 5 would be opposed to euthanasia and assisted suicide in particular. Clusters 2, 4 and 5 had not completed an advance directive document (2, 4 and 5). Clusters 2 and 3 seldom practiced their faith. This study could be taken into consideration to improve the quality of end-of-life care choices. Copyright © 2017 SESPAS. Publicado por Elsevier España, S.L.U. All rights reserved.
Identification of chronic rhinosinusitis phenotypes using cluster analysis.

PubMed

Soler, Zachary M; Hyer, J Madison; Ramakrishnan, Viswanathan; Smith, Timothy L; Mace, Jess; Rudmik, Luke; Schlosser, Rodney J

2015-05-01

Current clinical classifications of chronic rhinosinusitis (CRS) have been largely defined based upon preconceived notions of factors thought to be important, such as polyp or eosinophil status. Unfortunately, these classification systems have little correlation with symptom severity or treatment outcomes. Unsupervised clustering can be used to identify phenotypic subgroups of CRS patients, describe clinical differences in these clusters and define simple algorithms for classification. A multi-institutional, prospective study of 382 patients with CRS who had failed initial medical therapy completed the Sino-Nasal Outcome Test (SNOT-22), Rhinosinusitis Disability Index (RSDI), Medical Outcomes Study Short Form-12 (SF-12), Pittsburgh Sleep Quality Index (PSQI), and Patient Health Questionnaire (PHQ-2). Objective measures of CRS severity included Brief Smell Identification Test (B-SIT), CT, and endoscopy scoring. All variables were reduced and unsupervised hierarchical clustering was performed. After clusters were defined, variations in medication usage were analyzed. Discriminant analysis was performed to develop a simplified, clinically useful algorithm for clustering. Clustering was largely determined by age, severity of patient reported outcome measures, depression, and fibromyalgia. CT and endoscopy varied somewhat among clusters. Traditional clinical measures, including polyp/atopic status, prior surgery, B-SIT and asthma, did not vary among clusters. A simplified algorithm based upon productivity loss, SNOT-22 score, and age predicted clustering with 89% accuracy. Medication usage among clusters did vary significantly. A simplified algorithm based upon hierarchical clustering is able to classify CRS patients and predict medication usage. Further studies are warranted to determine if such clustering predicts treatment outcomes. © 2015 ARS-AAOA, LLC.
GeneSCF: a real-time based functional enrichment tool with support for multiple organisms.

PubMed

Subhash, Santhilal; Kanduri, Chandrasekhar

2016-09-13

High-throughput technologies such as ChIP-sequencing, RNA-sequencing, DNA sequencing and quantitative metabolomics generate a huge volume of data. Researchers often rely on functional enrichment tools to interpret the biological significance of the affected genes from these high-throughput studies. However, currently available functional enrichment tools need to be updated frequently to adapt to new entries from the functional database repositories. Hence there is a need for a simplified tool that can perform functional enrichment analysis by using updated information directly from the source databases such as KEGG, Reactome or Gene Ontology etc. In this study, we focused on designing a command-line tool called GeneSCF (Gene Set Clustering based on Functional annotations), that can predict the functionally relevant biological information for a set of genes in a real-time updated manner. It is designed to handle information from more than 4000 organisms from freely available prominent functional databases like KEGG, Reactome and Gene Ontology. We successfully employed our tool on two of published datasets to predict the biologically relevant functional information. The core features of this tool were tested on Linux machines without the need for installation of more dependencies. GeneSCF is more reliable compared to other enrichment tools because of its ability to use reference functional databases in real-time to perform enrichment analysis. It is an easy-to-integrate tool with other pipelines available for downstream analysis of high-throughput data. More importantly, GeneSCF can run multiple gene lists simultaneously on different organisms thereby saving time for the users. Since the tool is designed to be ready-to-use, there is no need for any complex compilation and installation procedures.
High-Resolution CCD Spectra of Stars in Globular Clusters. IX. The "Young" Clusters Ruprecht 106 and PAL 12

NASA Astrophysics Data System (ADS)

Brown, Jeffrey A.; Wallerstein, George; Zucker, Daniel

1997-07-01

We have performed a spectroscopic abundance analysis of two stars each in the anomalously young globular clusters Rup 106 and Pal 12. We find [Fe/H] =~ -1.45 for Rup 106 and -1.0 for Pal 12. The abundance ratios in both clusters are peculiar in comparison to other globulars: the alpha -elements are not enhanced over the solar ratio. We find that oxygen in Rup 106 is also relatively low, with [O/Fe] =~ 0.0 - +0.1. The similarity of the ratio of the alpha-elements to iron to the solar ratio shows that species contributed by supernovae of type Ia have ``caught up" with species produced by SN II's. The similar contributions of the alpha - and Fe-peak species to disk stars shows that age, not metallicity, is the determining factor in the ratio of SN II/SN Ia nucleosynthesis. Galactic enrichment models show that these abundance ratios can be understood as being the result of these two clusters coming from an environment with multiple discontinuous star formation events.

Identification and characterization of near-fatal asthma phenotypes by cluster analysis.

PubMed

Serrano-Pariente, J; Rodrigo, G; Fiz, J A; Crespo, A; Plaza, V

2015-09-01

Near-fatal asthma (NFA) is a heterogeneous clinical entity and several profiles of patients have been described according to different clinical, pathophysiological and histological features. However, there are no previous studies that identify in a unbiased way--using statistical methods such as clusters analysis--different phenotypes of NFA. Therefore, the aim of the present study was to identify and to characterize phenotypes of near fatal asthma using a cluster analysis. Over a period of 2 years, 33 Spanish hospitals enrolled 179 asthmatics admitted for an episode of NFA. A cluster analysis using two-steps algorithm was performed from data of 84 of these cases. The analysis defined three clusters of patients with NFA: cluster 1, the largest, including older patients with clinical and therapeutic criteria of severe asthma; cluster 2, with an high proportion of respiratory arrest (68%), impaired consciousness level (82%) and mechanical ventilation (93%); and cluster 3, which included younger patients, characterized by an insufficient anti-inflammatory treatment and frequent sensitization to Alternaria alternata and soybean. These results identify specific asthma phenotypes involved in NFA, confirming in part previous findings observed in studies with a clinical approach. The identification of patients with a specific NFA phenotype could suggest interventions to prevent future severe asthma exacerbations. © 2015 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Comprehensive assessment of cancer missense mutation clustering in protein structures.

PubMed

Kamburov, Atanas; Lawrence, Michael S; Polak, Paz; Leshchiner, Ignaty; Lage, Kasper; Golub, Todd R; Lander, Eric S; Getz, Gad

2015-10-06

Large-scale tumor sequencing projects enabled the identification of many new cancer gene candidates through computational approaches. Here, we describe a general method to detect cancer genes based on significant 3D clustering of mutations relative to the structure of the encoded protein products. The approach can also be used to search for proteins with an enrichment of mutations at binding interfaces with a protein, nucleic acid, or small molecule partner. We applied this approach to systematically analyze the PanCancer compendium of somatic mutations from 4,742 tumors relative to all known 3D structures of human proteins in the Protein Data Bank. We detected significant 3D clustering of missense mutations in several previously known oncoproteins including HRAS, EGFR, and PIK3CA. Although clustering of missense mutations is often regarded as a hallmark of oncoproteins, we observed that a number of tumor suppressors, including FBXW7, VHL, and STK11, also showed such clustering. Beside these known cases, we also identified significant 3D clustering of missense mutations in NUF2, which encodes a component of the kinetochore, that could affect chromosome segregation and lead to aneuploidy. Analysis of interaction interfaces revealed enrichment of mutations in the interfaces between FBXW7-CCNE1, HRAS-RASA1, CUL4B-CAND1, OGT-HCFC1, PPP2R1A-PPP2R5C/PPP2R2A, DICER1-Mg2+, MAX-DNA, SRSF2-RNA, and others. Together, our results indicate that systematic consideration of 3D structure can assist in the identification of cancer genes and in the understanding of the functional role of their mutations.
Comprehensive assessment of cancer missense mutation clustering in protein structures

PubMed Central

Kamburov, Atanas; Lawrence, Michael S.; Polak, Paz; Leshchiner, Ignaty; Lage, Kasper; Golub, Todd R.; Lander, Eric S.; Getz, Gad

2015-01-01

Large-scale tumor sequencing projects enabled the identification of many new cancer gene candidates through computational approaches. Here, we describe a general method to detect cancer genes based on significant 3D clustering of mutations relative to the structure of the encoded protein products. The approach can also be used to search for proteins with an enrichment of mutations at binding interfaces with a protein, nucleic acid, or small molecule partner. We applied this approach to systematically analyze the PanCancer compendium of somatic mutations from 4,742 tumors relative to all known 3D structures of human proteins in the Protein Data Bank. We detected significant 3D clustering of missense mutations in several previously known oncoproteins including HRAS, EGFR, and PIK3CA. Although clustering of missense mutations is often regarded as a hallmark of oncoproteins, we observed that a number of tumor suppressors, including FBXW7, VHL, and STK11, also showed such clustering. Beside these known cases, we also identified significant 3D clustering of missense mutations in NUF2, which encodes a component of the kinetochore, that could affect chromosome segregation and lead to aneuploidy. Analysis of interaction interfaces revealed enrichment of mutations in the interfaces between FBXW7-CCNE1, HRAS-RASA1, CUL4B-CAND1, OGT-HCFC1, PPP2R1A-PPP2R5C/PPP2R2A, DICER1-Mg2+, MAX-DNA, SRSF2-RNA, and others. Together, our results indicate that systematic consideration of 3D structure can assist in the identification of cancer genes and in the understanding of the functional role of their mutations. PMID:26392535
Psychosocial Costs of Racism to Whites: Exploring Patterns through Cluster Analysis

ERIC Educational Resources Information Center

Spanierman, Lisa B.; Poteat, V. Paul; Beer, Amanda M.; Armstrong, Patrick Ian

2006-01-01

Participants (230 White college students) completed the Psychosocial Costs of Racism to Whites (PCRW) Scale. Using cluster analysis, we identified 5 distinct cluster groups on the basis of PCRW subscale scores: the unempathic and unaware cluster contained the lowest empathy scores; the insensitive and afraid cluster consisted of low empathy and…
A scoping review of spatial cluster analysis techniques for point-event data.

PubMed

Fritz, Charles E; Schuurman, Nadine; Robertson, Colin; Lear, Scott

2013-05-01

Spatial cluster analysis is a uniquely interdisciplinary endeavour, and so it is important to communicate and disseminate ideas, innovations, best practices and challenges across practitioners, applied epidemiology researchers and spatial statisticians. In this research we conducted a scoping review to systematically search peer-reviewed journal databases for research that has employed spatial cluster analysis methods on individual-level, address location, or x and y coordinate derived data. To illustrate the thematic issues raised by our results, methods were tested using a dataset where known clusters existed. Point pattern methods, spatial clustering and cluster detection tests, and a locally weighted spatial regression model were most commonly used for individual-level, address location data (n = 29). The spatial scan statistic was the most popular method for address location data (n = 19). Six themes were identified relating to the application of spatial cluster analysis methods and subsequent analyses, which we recommend researchers to consider; exploratory analysis, visualization, spatial resolution, aetiology, scale and spatial weights. It is our intention that researchers seeking direction for using spatial cluster analysis methods, consider the caveats and strengths of each approach, but also explore the numerous other methods available for this type of analysis. Applied spatial epidemiology researchers and practitioners should give special consideration to applying multiple tests to a dataset. Future research should focus on developing frameworks for selecting appropriate methods and the corresponding spatial weighting schemes.
Using cluster analysis to organize and explore regional GPS velocities

USGS Publications Warehouse

Simpson, Robert W.; Thatcher, Wayne; Savage, James C.

2012-01-01

Cluster analysis offers a simple visual exploratory tool for the initial investigation of regional Global Positioning System (GPS) velocity observations, which are providing increasingly precise mappings of actively deforming continental lithosphere. The deformation fields from dense regional GPS networks can often be concisely described in terms of relatively coherent blocks bounded by active faults, although the choice of blocks, their number and size, can be subjective and is often guided by the distribution of known faults. To illustrate our method, we apply cluster analysis to GPS velocities from the San Francisco Bay Region, California, to search for spatially coherent patterns of deformation, including evidence of block-like behavior. The clustering process identifies four robust groupings of velocities that we identify with four crustal blocks. Although the analysis uses no prior geologic information other than the GPS velocities, the cluster/block boundaries track three major faults, both locked and creeping.
Diversity of bacteria and glycosyl hydrolase family 48 genes in cellulolytic consortia enriched from thermophilic biocompost.

PubMed

Izquierdo, Javier A; Sizova, Maria V; Lynd, Lee R

2010-06-01

The enrichment from nature of novel microbial communities with high cellulolytic activity is useful in the identification of novel organisms and novel functions that enhance the fundamental understanding of microbial cellulose degradation. In this work we identify predominant organisms in three cellulolytic enrichment cultures with thermophilic compost as an inoculum. Community structure based on 16S rRNA gene clone libraries featured extensive representation of clostridia from cluster III, with minor representation of clostridial clusters I and XIV and a novel Lutispora species cluster. Our studies reveal different levels of 16S rRNA gene diversity, ranging from 3 to 18 operational taxonomic units (OTUs), as well as variability in community membership across the three enrichment cultures. By comparison, glycosyl hydrolase family 48 (GHF48) diversity analyses revealed a narrower breadth of novel clostridial genes associated with cultured and uncultured cellulose degraders. The novel GHF48 genes identified in this study were related to the novel clostridia Clostridium straminisolvens and Clostridium clariflavum, with one cluster sharing as little as 73% sequence similarity with the closest known relative. In all, 14 new GHF48 gene sequences were added to the known diversity of 35 genes from cultured species.
Trace metal enrichment and organic matter sources in the surface sediments of Arabian Sea along southwest India (Kerala coast).

PubMed

Sreekanth, Athira; Mrudulrag, S K; Cheriyan, Eldhose; Sujatha, C H

2015-12-30

The geochemical distribution and enrichment of trace metals (Cd, Co, Cu, Fe, Mn, Ni, Pb and Zn) were determined in the surface sediments of Arabian Sea, along southwest India, Kerala coast. The results of geochemical indices indicated that surficial sediments of five transects are uncontaminated with respect to Mn, Zn and Cu, uncontaminated to moderately contaminated with Co and Ni, and moderately to strongly contaminated with Pb. The deposition of trace elements exhibited three different patterns i) Cd and Zn enhanced with settling biodetritus from the upwelled waters, ii) Pb, Co and Ni show higher enrichment, evidenced by the association through adsorption of iron-manganese nodules onto clay minerals and iii) Cu enrichment observed close to major urban sectors, initiated by the precipitation as Cu sulfides. Correlation, principal component analysis (PCA) and cluster analysis (CA) were used to confirm the origin information of metals and the nature of organic matter composition. Copyright © 2015 Elsevier Ltd. All rights reserved.
Suicide in the oldest old: an observational study and cluster analysis.

PubMed

Sinyor, Mark; Tan, Lynnette Pei Lin; Schaffer, Ayal; Gallagher, Damien; Shulman, Kenneth

2016-01-01

The older population are at a high risk for suicide. This study sought to learn more about the characteristics of suicide in the oldest-old and to use a cluster analysis to determine if oldest-old suicide victims assort into clinically meaningful subgroups. Data were collected from a coroner's chart review of suicide victims in Toronto from 1998 to 2011. We compared two age groups (65-79 year olds, n = 335, and 80+ year olds, n = 191) and then conducted a hierarchical agglomerative cluster analysis using Ward's method to identify distinct clusters in the 80+ group. The younger and older age groups differed according to marital status, living circumstances and pattern of stressors. The cluster analysis identified three distinct clusters in the 80+ group. Cluster 1 was the largest (n = 124) and included people who were either married or widowed who had significantly more depression and somewhat more medical health stressors. In contrast, cluster 2 (n = 50) comprised people who were almost all single and living alone with significantly less identified depression and slightly fewer medical health stressors. All members of cluster 3 (n = 17) lived in a retirement residence or nursing home, and this group had the highest rates of depression, dementia, other mental illness and past suicide attempts. This is the first study to use the cluster analysis technique to identify meaningful subgroups among suicide victims in the oldest-old. The results reveal different patterns of suicide in the older population that may be relevant for clinical care. Copyright © 2015 John Wiley & Sons, Ltd.
Effects of Group Size and Lack of Sphericity on the Recovery of Clusters in K-means Cluster Analysis.

PubMed

Craen, Saskia de; Commandeur, Jacques J F; Frank, Laurence E; Heiser, Willem J

2006-06-01

K-means cluster analysis is known for its tendency to produce spherical and equally sized clusters. To assess the magnitude of these effects, a simulation study was conducted, in which populations were created with varying departures from sphericity and group sizes. An analysis of the recovery of clusters in the samples taken from these populations showed a significant effect of lack of sphericity and group size. This effect was, however, not as large as expected, with still a recovery index of more than 0.5 in the "worst case scenario." An interaction effect between the two data aspects was also found. The decreasing trend in the recovery of clusters for increasing departures from sphericity is different for equal and unequal group sizes.
Regional heatwaves in china: a cluster analysis

NASA Astrophysics Data System (ADS)

Wang, Pinya; Tang, Jianping; Wang, Shuyu; Dong, Xinning; Fang, Juan

2018-03-01

With the consideration of spatial extension of heatwave events, two kind of regional heatwaves using absolute and relative thresholds, namely RHWs-A and RHWs-R, are investigated during 1959-2013. The temperature data is derived from the daily maximum temperatures (DMTs) of 587 stations in China. Totally 298 RHWs-A and 374 RHWs-R are identified during the past 55 years, and both of them are growing more frequent since the mid-1980s. By utilizing the cluster analysis, several typical spatial distributions of RHWs-A/RHWs-R are obtained. For RHWs-A, there are three clusters covering the southeastern, northwestern China and the lower reaches of Yangtze River, of which the southeastern cluster groups the most heatwaves. For RHWs-R, there are seven clusters distributed throughout the whole regions of China. The clusters in the northwestern and northeastern China are more stable than others for both RHWs-A and RHWs-R, and the northern clusters are of larger intensity than that of the southern ones. All RHWs-A/RHWs-R are accompanied by the anomalous high systems along with the reduced soil moisture. The southern clusters are controlled by Northwestern Pacific subtropical high (WPSH), and the northern ones are influenced by the mid-latitude high systems. The influences of atmospheric circulations and soil moisture on regional heatwaves are further demonstrated by two case analyses of the severe RHW-A in 2003 and the RHW-R in 2013.
Subtypes of female juvenile offenders: a cluster analysis of the Millon Adolescent Clinical Inventory.

PubMed

Stefurak, Tres; Calhoun, Georgia B

2007-01-01

The current study sought to explore subtypes of adolescents within a sample of female juvenile offenders. Using the Millon Adolescent Clinical Inventory with 101 female juvenile offenders, a two-step cluster analysis was performed beginning with a Ward's method hierarchical cluster analysis followed by a K-Means iterative partitioning cluster analysis. The results suggest an optimal three-cluster solution, with cluster profiles leading to the following group labels: Externalizing Problems, Depressed/Interpersonally Ambivalent, and Anxious Prosocial. Analysis along the factors of age, race, offense typology and offense chronicity were conducted to further understand the nature of found clusters. Only the effect for race was significant with the Anxious Prosocial and Depressed Intepersonally Ambivalent clusters appearing disproportionately comprised of African American girls. To establish external validity, clusters were compared across scales of the Behavioral Assessment System for Children - Self Report of Personality, and corroborative distinctions between clusters were found here.
Regression analysis of clustered failure time data with informative cluster size under the additive transformation models.

PubMed

Chen, Ling; Feng, Yanqin; Sun, Jianguo

2017-10-01

This paper discusses regression analysis of clustered failure time data, which occur when the failure times of interest are collected from clusters. In particular, we consider the situation where the correlated failure times of interest may be related to cluster sizes. For inference, we present two estimation procedures, the weighted estimating equation-based method and the within-cluster resampling-based method, when the correlated failure times of interest arise from a class of additive transformation models. The former makes use of the inverse of cluster sizes as weights in the estimating equations, while the latter can be easily implemented by using the existing software packages for right-censored failure time data. An extensive simulation study is conducted and indicates that the proposed approaches work well in both the situations with and without informative cluster size. They are applied to a dental study that motivated this study.
Enrichment and Bioavailability of Trace Elements in Soil in Vicinity of Railways in Japan.

PubMed

Wang, Zhen; Watanabe, Izumi; Ozaki, Hirozaku; Zhang, Jianqiang

2018-01-01

This study focuses on the concentrations, distribution, pollution levels, and bioavailability of 12 trace elements in soils along 6 different railways in Japan. Three diesel powered railways and three electricity powered railways were chosen as target. Surface soils (< 3 cm) were collected in vicinity of railways for analysis. Digestion and extraction were performed before concentration and bioavailability analysis. Enrichment factor was applied to investigate contamination levels of selected elements. The mean concentrations of Cr, Co, Ni, Cu, Zn, Sn, and Pb in soil samples were higher than soil background value in Japan. Concentrations of trace elements in soils along different railway had different characteristics. Horizontal distribution of Cu, Zn, Cd, Sn, and Pb in soil samples showed obviously downtrend with distance along railways with high frequency. Concentrations of V, Mn, Fe, and Co were higher in soils along railways which pass through city center. According to principal component analysis and cluster analysis, concentrations of Cu, Zn, Sn, and Pb could be considered as the indicators of soil contamination level along electricity powered trains, whereas indicators along diesel powered trains were not clear. Enrichment factor analysis proved that operation of freight trains had impact on pollution level of Cr, Ni, and Cd. Bioavailability of Mn, Co, Zn, and Cd in soil along electricity-powered railways were higher, and bioavailability of Pb in railways located in countryside was lower. Thus, enrichment and bioavailability of trace elements can be indicators of railway-originated trace elements pollution in soil.
Determining the Ages and Distances of 4 Open Clusters

NASA Astrophysics Data System (ADS)

Sawczynec, Erica A.; James D. Armstrong, Joe M. Ritter, Jeff Kuhn

2018-01-01

The study of nearby young open clusters can give insight into star formation and potentially the local rate of metal enrichment. Presented is a BVRI photometric analysis of 4 open clusters; NGC 2509, NGC 2483, NGC 2482, and NGC 6705, in order to reevaluate previously published ages and distances using modern CCD photometry, and newer stellar models. Observations were obtained from the Cerro Tololo node of the Las Cumbres Observatory 1.0 meter network. Color magnitude diagrams were compared to modeled isochrones and the updated ages and distances determined. An interesting stellar association was found in the color magnitude diagram of NGC 6705. The structure is suggestive of two epochs of stellar formation. Members of this structure were evaluated using the Gaia Archive in order to explore the possibility of a heterogeneous population. The status of NGC 2483 as an open cluster has been debated; however, it has been noted that there is a high concentration of Be stars found in the region. It is concluded that NGC 2483 is an open cluster.
The Productivity Analysis of Chennai Automotive Industry Cluster

NASA Astrophysics Data System (ADS)

Bhaskaran, E.

2014-07-01

Chennai, also called the Detroit of India, is India's second fastest growing auto market and exports auto components and vehicles to US, Germany, Japan and Brazil. For inclusive growth and sustainable development, 250 auto component industries in Ambattur, Thirumalisai and Thirumudivakkam Industrial Estates located in Chennai have adopted the Cluster Development Approach called Automotive Component Cluster. The objective is to study the Value Chain, Correlation and Data Envelopment Analysis by determining technical efficiency, peer weights, input and output slacks of 100 auto component industries in three estates. The methodology adopted is using Data Envelopment Analysis of Output Oriented Banker Charnes Cooper model by taking net worth, fixed assets, employment as inputs and gross output as outputs. The non-zero represents the weights for efficient clusters. The higher slack obtained reveals the excess net worth, fixed assets, employment and shortage in gross output. To conclude, the variables are highly correlated and the inefficient industries should increase their gross output or decrease the fixed assets or employment. Moreover for sustainable development, the cluster should strengthen infrastructure, technology, procurement, production and marketing interrelationships to decrease costs and to increase productivity and efficiency to compete in the indigenous and export market.
Cluster analysis of particulate matter (PM10) and black carbon (BC) concentrations

NASA Astrophysics Data System (ADS)

Žibert, Janez; Pražnikar, Jure

2012-09-01

The monitoring of air-pollution constituents like particulate matter (PM10) and black carbon (BC) can provide information about air quality and the dynamics of emissions. Air quality depends on natural and anthropogenic sources of emissions as well as the weather conditions. For a one-year period the diurnal concentrations of PM10 and BC in the Port of Koper were analysed by clustering days into similar groups according to the similarity of the BC and PM10 hourly derived day-profiles without any prior assumptions about working and non-working days, weather conditions or hot and cold seasons. The analysis was performed by using k-means clustering with the squared Euclidean distance as the similarity measure. The analysis showed that 10 clusters in the BC case produced 3 clusters with just one member day and 7 clusters that encompasses more than one day with similar BC profiles. Similar results were found in the PM10 case, where one cluster has a single-member day, while 7 clusters contain several member days. The clustering analysis revealed that the clusters with less pronounced bimodal patterns and low hourly and average daily concentrations for both types of measurements include the most days in the one-year analysis. A typical day profile of the BC measurements includes a bimodal pattern with morning and evening peaks, while the PM10 measurements reveal a less pronounced bimodality. There are also clusters with single-peak day-profiles. The BC data in such cases exhibit morning peaks, while the PM10 data consist of noon or afternoon single peaks. Single pronounced peaks can be explained by appropriate cluster wind speed profiles. The analysis also revealed some special day-profiles. The BC cluster with a high midnight peak at 30/04/2010 and the PM10 cluster with the highest observed concentration of PM10 at 01/05/2010 (208.0 μg m-3) coincide with 1 May, which is a national holiday in Slovenia and has very strong tradition of bonfire parties. The clustering of
A Note on Cluster Effects in Latent Class Analysis

ERIC Educational Resources Information Center

Kaplan, David; Keller, Bryan

2011-01-01

This article examines the effects of clustering in latent class analysis. A comprehensive simulation study is conducted, which begins by specifying a true multilevel latent class model with varying within- and between-cluster sample sizes, varying latent class proportions, and varying intraclass correlations. These models are then estimated under…
Sputum neutrophils are associated with more severe asthma phenotypes using cluster analysis

PubMed Central

Moore, Wendy C.; Hastie, Annette T.; Li, Xingnan; Li, Huashi; Busse, William W.; Jarjour, Nizar N.; Wenzel, Sally E.; Peters, Stephen P.; Meyers, Deborah A.; Bleecker, Eugene R.

2013-01-01

Background Clinical cluster analysis from the Severe Asthma Research Program (SARP) identified five asthma subphenotypes that represent the severity spectrum of early onset allergic asthma, late onset severe asthma and severe asthma with COPD characteristics. Analysis of induced sputum from a subset of SARP subjects showed four sputum inflammatory cellular patterns. Subjects with concurrent increases in eosinophils (≥2%) and neutrophils (≥40%) had characteristics of very severe asthma. Objective To better understand interactions between inflammation and clinical subphenotypes we integrated inflammatory cellular measures and clinical variables in a new cluster analysis. Methods Participants in SARP at three clinical sites who underwent sputum induction were included in this analysis (n=423). Fifteen variables including clinical characteristics and blood and sputum inflammatory cell assessments were selected by factor analysis for unsupervised cluster analysis. Results Four phenotypic clusters were identified. Cluster A (n=132) and B (n=127) subjects had mild-moderate early onset allergic asthma with paucigranulocytic or eosinophilic sputum inflammatory cell patterns. In contrast, these inflammatory patterns were present in only 7% of Cluster C (n=117) and D (n=47) subjects who had moderate-severe asthma with frequent health care utilization despite treatment with high doses of inhaled or oral corticosteroids, and in Cluster D, reduced lung function. The majority these subjects (>83%) had sputum neutrophilia either alone or with concurrent sputum eosinophilia. Baseline lung function and sputum neutrophils were the most important variables determining cluster assignment. Conclusion This multivariate approach identified four asthma subphenotypes representing the severity spectrum from mild-moderate allergic asthma with minimal or eosinophilic predominant sputum inflammation to moderate-severe asthma with neutrophilic predominant or mixed granulocytic inflammation
Stability-based validation of dietary patterns obtained by cluster analysis.

PubMed

Sauvageot, Nicolas; Schritz, Anna; Leite, Sonia; Alkerwi, Ala'a; Stranges, Saverio; Zannad, Faiez; Streel, Sylvie; Hoge, Axelle; Donneau, Anne-Françoise; Albert, Adelin; Guillaume, Michèle

2017-01-14

Cluster analysis is a data-driven method used to create clusters of individuals sharing similar dietary habits. However, this method requires specific choices from the user which have an influence on the results. Therefore, there is a need of an objective methodology helping researchers in their decisions during cluster analysis. The objective of this study was to use such a methodology based on stability of clustering solutions to select the most appropriate clustering method and number of clusters for describing dietary patterns in the NESCAV study (Nutrition, Environment and Cardiovascular Health), a large population-based cross-sectional study in the Greater Region (N = 2298). Clustering solutions were obtained with K-means, K-medians and Ward's method and a number of clusters varying from 2 to 6. Their stability was assessed with three indices: adjusted Rand index, Cramer's V and misclassification rate. The most stable solution was obtained with K-means method and a number of clusters equal to 3. The "Convenient" cluster characterized by the consumption of convenient foods was the most prevalent with 46% of the population having this dietary behaviour. In addition, a "Prudent" and a "Non-Prudent" patterns associated respectively with healthy and non-healthy dietary habits were adopted by 25% and 29% of the population. The "Convenient" and "Non-Prudent" clusters were associated with higher cardiovascular risk whereas the "Prudent" pattern was associated with a decreased cardiovascular risk. Associations with others factors showed that the choice of a specific dietary pattern is part of a wider lifestyle profile. This study is of interest for both researchers and public health professionals. From a methodological standpoint, we showed that using stability of clustering solutions could help researchers in their choices. From a public health perspective, this study showed the need of targeted health promotion campaigns describing the benefits of healthy

An enrichment method based on synergistic and reversible covalent interactions for large-scale analysis of glycoproteins.

PubMed

Xiao, Haopeng; Chen, Weixuan; Smeekens, Johanna M; Wu, Ronghu

2018-04-27

Protein glycosylation is ubiquitous in biological systems and essential for cell survival. However, the heterogeneity of glycans and the low abundance of many glycoproteins complicate their global analysis. Chemical methods based on reversible covalent interactions between boronic acid and glycans have great potential to enrich glycopeptides, but the binding affinity is typically not strong enough to capture low-abundance species. Here, we develop a strategy using dendrimer-conjugated benzoboroxole to enhance the glycopeptide enrichment. We test the performance of several boronic acid derivatives, showing that benzoboroxole markedly increases glycopeptide coverage from human cell lysates. The enrichment is further improved by conjugating benzoboroxole to a dendrimer, which enables synergistic benzoboroxole-glycan interactions. This robust and simple method is highly effective for sensitive glycoproteomics analysis, especially capturing low-abundance glycopeptides. Importantly, the enriched glycopeptides remain intact, making the current method compatible with mass-spectrometry-based approaches to identify glycosylation sites and glycan structures.
Discovering genetic variants in Crohn's disease by exploring genomic regions enriched of weak association signals.

PubMed

D'Addabbo, Annarita; Palmieri, Orazio; Maglietta, Rosalia; Latiano, Anna; Mukherjee, Sayan; Annese, Vito; Ancona, Nicola

2011-08-01

A meta-analysis has re-analysed previous genome-wide association scanning definitively confirming eleven genes and further identifying 21 new loci. However, the identified genes/loci still explain only the minority of genetic predisposition of Crohn's disease. To identify genes weakly involved in disease predisposition by analysing chromosomal regions enriched of single nucleotide polymorphisms with modest statistical association. We utilized the WTCCC data set evaluating 1748 CD and 2938 controls. The identification of candidate genes/loci was performed by a two-step procedure: first of all chromosomal regions enriched of weak association signals were localized; subsequently, weak signals clustered in gene regions were identified. The statistical significance was assessed by non parametric permutation tests. The cytoband enrichment analysis highlighted 44 regions (P≤0.05) enriched with single nucleotide polymorphisms significantly associated with the trait including 23 out of 31 previously confirmed and replicated genes. Importantly, we highlight further 20 novel chromosomal regions carrying approximately one hundred genes/loci with modest association. Amongst these we find compelling functional candidate genes such as MAPT, GRB2 and CREM, LCT, and IL12RB2. Our study suggests a different statistical perspective to discover genes weakly associated with a given trait, although further confirmatory functional studies are needed. Copyright © 2011 Editrice Gastroenterologica Italiana S.r.l. All rights reserved.
Water quality assessment with hierarchical cluster analysis based on Mahalanobis distance.

PubMed

Du, Xiangjun; Shao, Fengjing; Wu, Shunyao; Zhang, Hanlin; Xu, Si

2017-07-01

Water quality assessment is crucial for assessment of marine eutrophication, prediction of harmful algal blooms, and environment protection. Previous studies have developed many numeric modeling methods and data driven approaches for water quality assessment. The cluster analysis, an approach widely used for grouping data, has also been employed. However, there are complex correlations between water quality variables, which play important roles in water quality assessment but have always been overlooked. In this paper, we analyze correlations between water quality variables and propose an alternative method for water quality assessment with hierarchical cluster analysis based on Mahalanobis distance. Further, we cluster water quality data collected form coastal water of Bohai Sea and North Yellow Sea of China, and apply clustering results to evaluate its water quality. To evaluate the validity, we also cluster the water quality data with cluster analysis based on Euclidean distance, which are widely adopted by previous studies. The results show that our method is more suitable for water quality assessment with many correlated water quality variables. To our knowledge, it is the first attempt to apply Mahalanobis distance for coastal water quality assessment.
A Bayesian cluster analysis method for single-molecule localization microscopy data.

PubMed

Griffié, Juliette; Shannon, Michael; Bromley, Claire L; Boelen, Lies; Burn, Garth L; Williamson, David J; Heard, Nicholas A; Cope, Andrew P; Owen, Dylan M; Rubin-Delanchy, Patrick

2016-12-01

Cell function is regulated by the spatiotemporal organization of the signaling machinery, and a key facet of this is molecular clustering. Here, we present a protocol for the analysis of clustering in data generated by 2D single-molecule localization microscopy (SMLM)-for example, photoactivated localization microscopy (PALM) or stochastic optical reconstruction microscopy (STORM). Three features of such data can cause standard cluster analysis approaches to be ineffective: (i) the data take the form of a list of points rather than a pixel array; (ii) there is a non-negligible unclustered background density of points that must be accounted for; and (iii) each localization has an associated uncertainty in regard to its position. These issues are overcome using a Bayesian, model-based approach. Many possible cluster configurations are proposed and scored against a generative model, which assumes Gaussian clusters overlaid on a completely spatially random (CSR) background, before every point is scrambled by its localization precision. We present the process of generating simulated and experimental data that are suitable to our algorithm, the analysis itself, and the extraction and interpretation of key cluster descriptors such as the number of clusters, cluster radii and the number of localizations per cluster. Variations in these descriptors can be interpreted as arising from changes in the organization of the cellular nanoarchitecture. The protocol requires no specific programming ability, and the processing time for one data set, typically containing 30 regions of interest, is ∼18 h; user input takes ∼1 h.
Using Star Clusters as Tracers of Star Formation and Chemical Evolution: The Chemical Enrichment History of the Large Magellanic Cloud

NASA Astrophysics Data System (ADS)

Chilingarian, Igor V.; Asa’d, Randa

2018-05-01

The star formation (SFH) and chemical enrichment (CEH) histories of Local Group galaxies are traditionally studied by analyzing their resolved stellar populations in a form of color–magnitude diagrams obtained with the Hubble Space Telescope. Star clusters can be studied in integrated light using ground-based telescopes to much larger distances. They represent snapshots of the chemical evolution of their host galaxy at different ages. Here we present a simple theoretical framework for the chemical evolution based on the instantaneous recycling approximation (IRA) model. We infer a CEH from an SFH and vice versa using observational data. We also present a more advanced model for the evolution of individual chemical elements that takes into account the contribution of supernovae type Ia. We demonstrate that ages, iron, and α-element abundances of 15 star clusters derived from the fitting of their integrated optical spectra reliably trace the CEH of the Large Magellanic Cloud obtained from resolved stellar populations in the age range 40 Myr < t < 3.5 Gyr. The CEH predicted by our model from the global SFH of the LMC agrees remarkably well with the observed cluster age–metallicity relation. Moreover, the present-day total gas mass of the LMC estimated by the IRA model (6.2× {10}8 {M}ȯ ) matches within uncertainties the observed H I mass corrected for the presence of molecular gas (5.8+/- 0.5× {10}8 {M}ȯ ). We briefly discuss how our approach can be used to study SFHs of galaxies as distant as 10 Mpc at the level of detail that is currently available only in a handful of nearby Milky Way satellites. .
Cluster Analysis Identifies 3 Phenotypes within Allergic Asthma.

PubMed

Sendín-Hernández, María Paz; Ávila-Zarza, Carmelo; Sanz, Catalina; García-Sánchez, Asunción; Marcos-Vadillo, Elena; Muñoz-Bellido, Francisco J; Laffond, Elena; Domingo, Christian; Isidoro-García, María; Dávila, Ignacio

Asthma is a heterogeneous chronic disease with different clinical expressions and responses to treatment. In recent years, several unbiased approaches based on clinical, physiological, and molecular features have described several phenotypes of asthma. Some phenotypes are allergic, but little is known about whether these phenotypes can be further subdivided. We aimed to phenotype patients with allergic asthma using an unbiased approach based on multivariate classification techniques (unsupervised hierarchical cluster analysis). From a total of 54 variables of 225 patients with well-characterized allergic asthma diagnosed following American Thoracic Society (ATS) recommendation, positive skin prick test to aeroallergens, and concordant symptoms, we finally selected 19 variables by multiple correspondence analyses. Then a cluster analysis was performed. Three groups were identified. Cluster 1 was constituted by patients with intermittent or mild persistent asthma, without family antecedents of atopy, asthma, or rhinitis. This group showed the lowest total IgE levels. Cluster 2 was constituted by patients with mild asthma with a family history of atopy, asthma, or rhinitis. Total IgE levels were intermediate. Cluster 3 included patients with moderate or severe persistent asthma that needed treatment with corticosteroids and long-acting β-agonists. This group showed the highest total IgE levels. We identified 3 phenotypes of allergic asthma in our population. Furthermore, we described 2 phenotypes of mild atopic asthma mainly differentiated by a family history of allergy. Copyright © 2017 American Academy of Allergy, Asthma & Immunology. Published by Elsevier Inc. All rights reserved.
The Quantitative Analysis of Chennai Automotive Industry Cluster

NASA Astrophysics Data System (ADS)

Bhaskaran, Ethirajan

2016-07-01

Chennai, also called as Detroit of India due to presence of Automotive Industry producing over 40 % of the India's vehicle and components. During 2001-2002, the Automotive Component Industries (ACI) in Ambattur, Thirumalizai and Thirumudivakkam Industrial Estate, Chennai has faced problems on infrastructure, technology, procurement, production and marketing. The objective is to study the Quantitative Performance of Chennai Automotive Industry Cluster before (2001-2002) and after the CDA (2008-2009). The methodology adopted is collection of primary data from 100 ACI using quantitative questionnaire and analyzing using Correlation Analysis (CA), Regression Analysis (RA), Friedman Test (FMT), and Kruskall Wallis Test (KWT).The CA computed for the different set of variables reveals that there is high degree of relationship between the variables studied. The RA models constructed establish the strong relationship between the dependent variable and a host of independent variables. The models proposed here reveal the approximate relationship in a closer form. KWT proves, there is no significant difference between three locations clusters with respect to: Net Profit, Production Cost, Marketing Costs, Procurement Costs and Gross Output. This supports that each location has contributed for development of automobile component cluster uniformly. The FMT proves, there is no significant difference between industrial units in respect of cost like Production, Infrastructure, Technology, Marketing and Net Profit. To conclude, the Automotive Industries have fully utilized the Physical Infrastructure and Centralised Facilities by adopting CDA and now exporting their products to North America, South America, Europe, Australia, Africa and Asia. The value chain analysis models have been implemented in all the cluster units. This Cluster Development Approach (CDA) model can be implemented in industries of under developed and developing countries for cost reduction and productivity
A dynamic intron retention program enriched in RNA processing genes regulates gene expression during terminal erythropoiesis

DOE PAGES

Pimentel, Harold; Parra, Marilyn; Gee, Sherry L.; ...

2015-11-03

Differentiating erythroblasts execute a dynamic alternative splicing program shown here to include extensive and diverse intron retention (IR) events. Cluster analysis revealed hundreds of developmentallydynamic introns that exhibit increased IR in mature erythroblasts, and are enriched in functions related to RNA processing such as SF3B1 spliceosomal factor. Distinct, developmentally-stable IR clusters are enriched in metal-ion binding functions and include mitoferrin genes SLC25A37 and SLC25A28 that are critical for iron homeostasis. Some IR transcripts are abundant, e.g. comprising ~50% of highly-expressed SLC25A37 and SF3B1 transcripts in late erythroblasts, and thereby limiting functional mRNA levels. IR transcripts tested were predominantly nuclearlocalized. Splicemore » site strength correlated with IR among stable but not dynamic intron clusters, indicating distinct regulation of dynamically-increased IR in late erythroblasts. Retained introns were preferentially associated with alternative exons with premature termination codons (PTCs). High IR was observed in disease-causing genes including SF3B1 and the RNA binding protein FUS. Comparative studies demonstrated that the intron retention program in erythroblasts shares features with other tissues but ultimately is unique to erythropoiesis. Finally, we conclude that IR is a multi-dimensional set of processes that post-transcriptionally regulate diverse gene groups during normal erythropoiesis, misregulation of which could be responsible for human disease.« less
A dynamic intron retention program enriched in RNA processing genes regulates gene expression during terminal erythropoiesis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pimentel, Harold; Parra, Marilyn; Gee, Sherry L.

Differentiating erythroblasts execute a dynamic alternative splicing program shown here to include extensive and diverse intron retention (IR) events. Cluster analysis revealed hundreds of developmentallydynamic introns that exhibit increased IR in mature erythroblasts, and are enriched in functions related to RNA processing such as SF3B1 spliceosomal factor. Distinct, developmentally-stable IR clusters are enriched in metal-ion binding functions and include mitoferrin genes SLC25A37 and SLC25A28 that are critical for iron homeostasis. Some IR transcripts are abundant, e.g. comprising ~50% of highly-expressed SLC25A37 and SF3B1 transcripts in late erythroblasts, and thereby limiting functional mRNA levels. IR transcripts tested were predominantly nuclearlocalized. Splicemore » site strength correlated with IR among stable but not dynamic intron clusters, indicating distinct regulation of dynamically-increased IR in late erythroblasts. Retained introns were preferentially associated with alternative exons with premature termination codons (PTCs). High IR was observed in disease-causing genes including SF3B1 and the RNA binding protein FUS. Comparative studies demonstrated that the intron retention program in erythroblasts shares features with other tissues but ultimately is unique to erythropoiesis. Finally, we conclude that IR is a multi-dimensional set of processes that post-transcriptionally regulate diverse gene groups during normal erythropoiesis, misregulation of which could be responsible for human disease.« less
Globular cluster formation with multiple stellar populations from hierarchical star cluster complexes

NASA Astrophysics Data System (ADS)

Bekki, Kenji

2017-05-01

Most old globular clusters (GCs) in the Galaxy are observed to have internal chemical abundance spreads in light elements. We discuss a new GC formation scenario based on hierarchical star formation within fractal molecular clouds. In the new scenario, a cluster of bound and unbound star clusters ('star cluster complex', SCC) that have a power-law cluster mass function with a slope (β) of 2 is first formed from a massive gas clump developed in a dwarf galaxy. Such cluster complexes and β = 2 are observed and expected from hierarchical star formation. The most massive star cluster ('main cluster'), which is the progenitor of a GC, can accrete gas ejected from asymptotic giant branch (AGB) stars initially in the cluster and other low-mass clusters before the clusters are tidally stripped or destroyed to become field stars in the dwarf. The SCC is initially embedded in a giant gas hole created by numerous supernovae of the SCC so that cold gas outside the hole can be accreted on to the main cluster later. New stars formed from the accreted gas have chemical abundances that are different from those of the original SCC. Using hydrodynamical simulations of GC formation based on this scenario, we show that the main cluster with the initial mass as large as [2-5] × 105 M⊙ can accrete more than 105 M⊙ gas from AGB stars of the SCC. We suggest that merging of hierarchical SSCs can play key roles in stellar halo formation around GCs and self-enrichment processes in the early phase of GC formation.
ESEA: Discovering the Dysregulated Pathways based on Edge Set Enrichment Analysis

PubMed Central

Han, Junwei; Shi, Xinrui; Zhang, Yunpeng; Xu, Yanjun; Jiang, Ying; Zhang, Chunlong; Feng, Li; Yang, Haixiu; Shang, Desi; Sun, Zeguo; Su, Fei; Li, Chunquan; Li, Xia

2015-01-01

Pathway analyses are playing an increasingly important role in understanding biological mechanism, cellular function and disease states. Current pathway-identification methods generally focus on only the changes of gene expression levels; however, the biological relationships among genes are also the fundamental components of pathways, and the dysregulated relationships may also alter the pathway activities. We propose a powerful computational method, Edge Set Enrichment Analysis (ESEA), for the identification of dysregulated pathways. This provides a novel way of pathway analysis by investigating the changes of biological relationships of pathways in the context of gene expression data. Simulation studies illustrate the power and performance of ESEA under various simulated conditions. Using real datasets from p53 mutation, Type 2 diabetes and lung cancer, we validate effectiveness of ESEA in identifying dysregulated pathways. We further compare our results with five other pathway enrichment analysis methods. With these analyses, we show that ESEA is able to help uncover dysregulated biological pathways underlying complex traits and human diseases via specific use of the dysregulated biological relationships. We develop a freely available R-based tool of ESEA. Currently, ESEA can support pathway analysis of the seven public databases (KEGG; Reactome; Biocarta; NCI; SPIKE; HumanCyc; Panther). PMID:26267116
Clusters of Insomnia Disorder: An Exploratory Cluster Analysis of Objective Sleep Parameters Reveals Differences in Neurocognitive Functioning, Quantitative EEG, and Heart Rate Variability

PubMed Central

Miller, Christopher B.; Bartlett, Delwyn J.; Mullins, Anna E.; Dodds, Kirsty L.; Gordon, Christopher J.; Kyle, Simon D.; Kim, Jong Won; D'Rozario, Angela L.; Lee, Rico S.C.; Comas, Maria; Marshall, Nathaniel S.; Yee, Brendon J.; Espie, Colin A.; Grunstein, Ronald R.

2016-01-01

Study Objectives: To empirically derive and evaluate potential clusters of Insomnia Disorder through cluster analysis from polysomnography (PSG). We hypothesized that clusters would differ on neurocognitive performance, sleep-onset measures of quantitative (q)-EEG and heart rate variability (HRV). Methods: Research volunteers with Insomnia Disorder (DSM-5) completed a neurocognitive assessment and overnight PSG measures of total sleep time (TST), wake time after sleep onset (WASO), and sleep onset latency (SOL) were used to determine clusters. Results: From 96 volunteers with Insomnia Disorder, cluster analysis derived at least two clusters from objective sleep parameters: Insomnia with normal objective sleep duration (I-NSD: n = 53) and Insomnia with short sleep duration (I-SSD: n = 43). At sleep onset, differences in HRV between I-NSD and I-SSD clusters suggest attenuated parasympathetic activity in I-SSD (P < 0.05). Preliminary work suggested three clusters by retaining the I-NSD and splitting the I-SSD cluster into two: I-SSD A (n = 29): defined by high WASO and I-SSD B (n = 14): a second I-SSD cluster with high SOL and medium WASO. The I-SSD B cluster performed worse than I-SSD A and I-NSD for sustained attention (P ≤ 0.05). In an exploratory analysis, q-EEG revealed reduced spectral power also in I-SSD B before (Delta, Alpha, Beta-1) and after sleep-onset (Beta-2) compared to I-SSD A and I-NSD (P ≤ 0.05). Conclusions: Two insomnia clusters derived from cluster analysis differ in sleep onset HRV. Preliminary data suggest evidence for three clusters in insomnia with differences for sustained attention and sleep-onset q-EEG. Clinical Trial Registration: Insomnia 100 sleep study: Australia New Zealand Clinical Trials Registry (ANZCTR) identification number 12612000049875. URL: https://www.anzctr.org.au/Trial/Registration/TrialReview.aspx?id=347742. Citation: Miller CB, Bartlett DJ, Mullins AE, Dodds KL, Gordon CJ, Kyle SD, Kim JW, D'Rozario AL, Lee RS, Comas
Clusters of Insomnia Disorder: An Exploratory Cluster Analysis of Objective Sleep Parameters Reveals Differences in Neurocognitive Functioning, Quantitative EEG, and Heart Rate Variability.

PubMed

Miller, Christopher B; Bartlett, Delwyn J; Mullins, Anna E; Dodds, Kirsty L; Gordon, Christopher J; Kyle, Simon D; Kim, Jong Won; D'Rozario, Angela L; Lee, Rico S C; Comas, Maria; Marshall, Nathaniel S; Yee, Brendon J; Espie, Colin A; Grunstein, Ronald R

2016-11-01

To empirically derive and evaluate potential clusters of Insomnia Disorder through cluster analysis from polysomnography (PSG). We hypothesized that clusters would differ on neurocognitive performance, sleep-onset measures of quantitative ( q )-EEG and heart rate variability (HRV). Research volunteers with Insomnia Disorder (DSM-5) completed a neurocognitive assessment and overnight PSG measures of total sleep time (TST), wake time after sleep onset (WASO), and sleep onset latency (SOL) were used to determine clusters. From 96 volunteers with Insomnia Disorder, cluster analysis derived at least two clusters from objective sleep parameters: Insomnia with normal objective sleep duration (I-NSD: n = 53) and Insomnia with short sleep duration (I-SSD: n = 43). At sleep onset, differences in HRV between I-NSD and I-SSD clusters suggest attenuated parasympathetic activity in I-SSD (P < 0.05). Preliminary work suggested three clusters by retaining the I-NSD and splitting the I-SSD cluster into two: I-SSD A (n = 29): defined by high WASO and I-SSD B (n = 14): a second I-SSD cluster with high SOL and medium WASO. The I-SSD B cluster performed worse than I-SSD A and I-NSD for sustained attention (P ≤ 0.05). In an exploratory analysis, q -EEG revealed reduced spectral power also in I-SSD B before (Delta, Alpha, Beta-1) and after sleep-onset (Beta-2) compared to I-SSD A and I-NSD (P ≤ 0.05). Two insomnia clusters derived from cluster analysis differ in sleep onset HRV. Preliminary data suggest evidence for three clusters in insomnia with differences for sustained attention and sleep-onset q -EEG. Insomnia 100 sleep study: Australia New Zealand Clinical Trials Registry (ANZCTR) identification number 12612000049875. URL: https://www.anzctr.org.au/Trial/Registration/TrialReview.aspx?id=347742. © 2016 Associated Professional Sleep Societies, LLC.
Cluster analysis of obesity and asthma phenotypes.

PubMed

Sutherland, E Rand; Goleva, Elena; King, Tonya S; Lehman, Erik; Stevens, Allen D; Jackson, Leisa P; Stream, Amanda R; Fahy, John V; Leung, Donald Y M

2012-01-01

Asthma is a heterogeneous disease with variability among patients in characteristics such as lung function, symptoms and control, body weight, markers of inflammation, and responsiveness to glucocorticoids (GC). Cluster analysis of well-characterized cohorts can advance understanding of disease subgroups in asthma and point to unsuspected disease mechanisms. We utilized an hypothesis-free cluster analytical approach to define the contribution of obesity and related variables to asthma phenotype. In a cohort of clinical trial participants (n = 250), minimum-variance hierarchical clustering was used to identify clinical and inflammatory biomarkers important in determining disease cluster membership in mild and moderate persistent asthmatics. In a subset of participants, GC sensitivity was assessed via expression of GC receptor alpha (GCRα) and induction of MAP kinase phosphatase-1 (MKP-1) expression by dexamethasone. Four asthma clusters were identified, with body mass index (BMI, kg/m(2)) and severity of asthma symptoms (AEQ score) the most significant determinants of cluster membership (F = 57.1, p<0.0001 and F = 44.8, p<0.0001, respectively). Two clusters were composed of predominantly obese individuals; these two obese asthma clusters differed from one another with regard to age of asthma onset, measures of asthma symptoms (AEQ) and control (ACQ), exhaled nitric oxide concentration (F(E)NO) and airway hyperresponsiveness (methacholine PC(20)) but were similar with regard to measures of lung function (FEV(1) (%) and FEV(1)/FVC), airway eosinophilia, IgE, leptin, adiponectin and C-reactive protein (hsCRP). Members of obese clusters demonstrated evidence of reduced expression of GCRα, a finding which was correlated with a reduced induction of MKP-1 expression by dexamethasone Obesity is an important determinant of asthma phenotype in adults. There is heterogeneity in expression of clinical and inflammatory biomarkers of asthma across obese individuals
Cluster: A New Application for Spatial Analysis of Pixelated Data for Epiphytotics.

PubMed

Nelson, Scot C; Corcoja, Iulian; Pethybridge, Sarah J

2017-12-01

Spatial analysis of epiphytotics is essential to develop and test hypotheses about pathogen ecology, disease dynamics, and to optimize plant disease management strategies. Data collection for spatial analysis requires substantial investment in time to depict patterns in various frames and hierarchies. We developed a new approach for spatial analysis of pixelated data in digital imagery and incorporated the method in a stand-alone desktop application called Cluster. The user isolates target entities (clusters) by designating up to 24 pixel colors as nontargets and moves a threshold slider to visualize the targets. The app calculates the percent area occupied by targeted pixels, identifies the centroids of targeted clusters, and computes the relative compass angle of orientation for each cluster. Users can deselect anomalous clusters manually and/or automatically by specifying a size threshold value to exclude smaller targets from the analysis. Up to 1,000 stochastic simulations randomly place the centroids of each cluster in ranked order of size (largest to smallest) within each matrix while preserving their calculated angles of orientation for the long axes. A two-tailed probability t test compares the mean inter-cluster distances for the observed versus the values derived from randomly simulated maps. This is the basis for statistical testing of the null hypothesis that the clusters are randomly distributed within the frame of interest. These frames can assume any shape, from natural (e.g., leaf) to arbitrary (e.g., a rectangular or polygonal field). Cluster summarizes normalized attributes of clusters, including pixel number, axis length, axis width, compass orientation, and the length/width ratio, available to the user as a downloadable spreadsheet. Each simulated map may be saved as an image and inspected. Provided examples demonstrate the utility of Cluster to analyze patterns at various spatial scales in plant pathology and ecology and highlight the
Selective Enrichment and Direct Analysis of Protein S-Palmitoylation Sites.

PubMed

Thinon, Emmanuelle; Fernandez, Joseph P; Molina, Henrik; Hang, Howard C

2018-05-04

S-Fatty-acylation is the covalent attachment of long chain fatty acids, predominately palmitate (C16:0, S-palmitoylation), to cysteine (Cys) residues via a thioester linkage on proteins. This post-translational and reversible lipid modification regulates protein function and localization in eukaryotes and is important in mammalian physiology and human diseases. While chemical labeling methods have improved the detection and enrichment of S-fatty-acylated proteins, mapping sites of modification and characterizing the endogenously attached fatty acids are still challenging. Here, we describe the integration and optimization of fatty acid chemical reporter labeling with hydroxylamine-mediated enrichment of S-fatty-acylated proteins and direct tagging of modified Cys residues to selectively map lipid modification sites. This afforded improved enrichment and direct identification of many protein S-fatty-acylation sites compared to previously described methods. Notably, we directly identified the S-fatty-acylation sites of IFITM3, an important interferon-stimulated inhibitor of virus entry, and we further demonstrated that the highly conserved Cys residues are primarily modified by palmitic acid. The methods described here should facilitate the direct analysis of protein S-fatty-acylation sites and their endogenously attached fatty acids in diverse cell types and activation states important for mammalian physiology and diseases.
Graph-Theoretic Analysis of Monomethyl Phosphate Clustering in Ionic Solutions.

PubMed

Han, Kyungreem; Venable, Richard M; Bryant, Anne-Marie; Legacy, Christopher J; Shen, Rong; Li, Hui; Roux, Benoît; Gericke, Arne; Pastor, Richard W

2018-02-01

All-atom molecular dynamics simulations combined with graph-theoretic analysis reveal that clustering of monomethyl phosphate dianion (MMP 2- ) is strongly influenced by the types and combinations of cations in the aqueous solution. Although Ca 2+ promotes the formation of stable and large MMP 2- clusters, K + alone does not. Nonetheless, clusters are larger and their link lifetimes are longer in mixtures of K + and Ca 2+ . This "synergistic" effect depends sensitively on the Lennard-Jones interaction parameters between Ca 2+ and the phosphorus oxygen and correlates with the hydration of the clusters. The pronounced MMP 2- clustering effect of Ca 2+ in the presence of K + is confirmed by Fourier transform infrared spectroscopy. The characterization of the cation-dependent clustering of MMP 2- provides a starting point for understanding cation-dependent clustering of phosphoinositides in cell membranes.
Applications and theory of electrokinetic enrichment in micro-nanofluidic chips.

PubMed

Chen, Xueye; Zhang, Shuai; Zhang, Lei; Yao, Zhen; Chen, Xiaodong; Zheng, Yue; Liu, Yanlin

2017-09-01

This review reports the progress on the recent development of electrokinetic enrichment in micro-nanofluidic chips. The governing equations of electrokinetic enrichment in micro-nanofluidic chips are given. Various enrichment applications including protein analysis, DNA analysis, bacteria analysis, viruses analysis and cell analysis are illustrated and discussed. The advantages and difficulties of each enrichment method are expatiated. This paper will provide a particularly convenient and valuable reference to those who intend to research the electrokinetic enrichment based on micro-nanofluidic chips.
Comprehensive proteome analysis of Actinoplanes sp. SE50/110 highlighting the location of proteins encoded by the acarbose and the pyochelin biosynthesis gene cluster.

PubMed

Wendler, Sergej; Otto, Andreas; Ortseifen, Vera; Bonn, Florian; Neshat, Armin; Schneiker-Bekel, Susanne; Walter, Frederik; Wolf, Timo; Zemke, Till; Wehmeier, Udo F; Hecker, Michael; Kalinowski, Jörn; Becher, Dörte; Pühler, Alfred

2015-07-01

Acarbose is an α-glucosidase inhibitor produced by Actinoplanes sp. SE50/110 that is medically important due to its application in the treatment of type2 diabetes. In this work, a comprehensive proteome analysis of Actinoplanes sp. SE50/110 was carried out to determine the location of proteins of the acarbose (acb) and the putative pyochelin (pch) biosynthesis gene cluster. Therefore, a comprehensive state-of-the-art proteomics approach combining subcellular fractionation, shotgun proteomics and spectral counting to assess the relative abundance of proteins within fractions was applied. The analysis of four different proteome fractions (cytosolic, enriched membrane, membrane shaving and extracellular fraction) resulted in the identification of 1582 of the 8270 predicted proteins. All 22 Acb-proteins and 21 of the 23 Pch-proteins were detected. Predicted membrane-associated, integral membrane or extracellular proteins of the pch and the acb gene cluster were found among the most abundant proteins in corresponding fractions. Intracellular biosynthetic proteins of both gene clusters were not only detected in the cytosolic, but also in the enriched membrane fraction, indicating that the biosynthesis of acarbose and putative pyochelin metabolites takes place at the inner membrane. Actinoplanes sp. SE50/110 is a natural producer of the α-glucosidase inhibitor acarbose, a bacterial secondary metabolite that is used as a drug for the treatment of type 2 diabetes, a disease which is a global pandemic that currently affects 387 million people and accounts for 11% of worldwide healthcare expenditures (www.idf.org). The work presented here is the first comprehensive investigation of protein localization and abundance in Actinoplanes sp. SE50/110 and provides an extensive source of information for the selection of genes for future mutational analysis and other hypothesis driven experiments. The conclusion that acarbose or pyochelin family siderophores are synthesized at the
Predictable enriched environment prevents development of hyper-emotionality in the VPA rat model of autism.

PubMed

Favre, Mônica R; La Mendola, Deborah; Meystre, Julie; Christodoulou, Dimitri; Cochrane, Melissa J; Markram, Henry; Markram, Kamila

2015-01-01

Understanding the effects of environmental stimulation in autism can improve therapeutic interventions against debilitating sensory overload, social withdrawal, fear and anxiety. Here, we evaluate the role of environmental predictability on behavior and protein expression, and inter-individual differences, in the valproic acid (VPA) model of autism. Male rats embryonically exposed (E11.5) either to VPA, a known autism risk factor in humans, or to saline, were housed from weaning into adulthood in a standard laboratory environment, an unpredictably enriched environment, or a predictably enriched environment. Animals were tested for sociability, nociception, stereotypy, fear conditioning and anxiety, and for tissue content of glutamate signaling proteins in the primary somatosensory cortex, hippocampus and amygdala, and of corticosterone in plasma, amygdala and hippocampus. Standard group analyses on separate measures were complemented with a composite emotionality score, using Cronbach's Alpha analysis, and with multivariate profiling of individual animals, using Hierarchical Cluster Analysis. We found that predictable environmental enrichment prevented the development of hyper-emotionality in the VPA-exposed group, while unpredictable enrichment did not. Individual variation in the severity of the autistic-like symptoms (fear, anxiety, social withdrawal and sensory abnormalities) correlated with neurochemical profiles, and predicted their responsiveness to predictability in the environment. In controls, the association between socio-affective behaviors, neurochemical profiles and environmental predictability was negligible. This study suggests that rearing in a predictable environment prevents the development of hyper-emotional features in animals exposed to an autism risk factor, and demonstrates that unpredictable environments can lead to negative outcomes, even in the presence of environmental enrichment.

Predictable enriched environment prevents development of hyper-emotionality in the VPA rat model of autism

PubMed Central

Favre, Mônica R.; La Mendola, Deborah; Meystre, Julie; Christodoulou, Dimitri; Cochrane, Melissa J.; Markram, Henry; Markram, Kamila

2015-01-01

Understanding the effects of environmental stimulation in autism can improve therapeutic interventions against debilitating sensory overload, social withdrawal, fear and anxiety. Here, we evaluate the role of environmental predictability on behavior and protein expression, and inter-individual differences, in the valproic acid (VPA) model of autism. Male rats embryonically exposed (E11.5) either to VPA, a known autism risk factor in humans, or to saline, were housed from weaning into adulthood in a standard laboratory environment, an unpredictably enriched environment, or a predictably enriched environment. Animals were tested for sociability, nociception, stereotypy, fear conditioning and anxiety, and for tissue content of glutamate signaling proteins in the primary somatosensory cortex, hippocampus and amygdala, and of corticosterone in plasma, amygdala and hippocampus. Standard group analyses on separate measures were complemented with a composite emotionality score, using Cronbach's Alpha analysis, and with multivariate profiling of individual animals, using Hierarchical Cluster Analysis. We found that predictable environmental enrichment prevented the development of hyper-emotionality in the VPA-exposed group, while unpredictable enrichment did not. Individual variation in the severity of the autistic-like symptoms (fear, anxiety, social withdrawal and sensory abnormalities) correlated with neurochemical profiles, and predicted their responsiveness to predictability in the environment. In controls, the association between socio-affective behaviors, neurochemical profiles and environmental predictability was negligible. This study suggests that rearing in a predictable environment prevents the development of hyper-emotional features in animals exposed to an autism risk factor, and demonstrates that unpredictable environments can lead to negative outcomes, even in the presence of environmental enrichment. PMID:26089770
Cluster analysis as a prediction tool for pregnancy outcomes.

PubMed

Banjari, Ines; Kenjerić, Daniela; Šolić, Krešimir; Mandić, Milena L

2015-03-01

Considering specific physiology changes during gestation and thinking of pregnancy as a "critical window", classification of pregnant women at early pregnancy can be considered as crucial. The paper demonstrates the use of a method based on an approach from intelligent data mining, cluster analysis. Cluster analysis method is a statistical method which makes possible to group individuals based on sets of identifying variables. The method was chosen in order to determine possibility for classification of pregnant women at early pregnancy to analyze unknown correlations between different variables so that the certain outcomes could be predicted. 222 pregnant women from two general obstetric offices' were recruited. The main orient was set on characteristics of these pregnant women: their age, pre-pregnancy body mass index (BMI) and haemoglobin value. Cluster analysis gained a 94.1% classification accuracy rate with three branch- es or groups of pregnant women showing statistically significant correlations with pregnancy outcomes. The results are showing that pregnant women both of older age and higher pre-pregnancy BMI have a significantly higher incidence of delivering baby of higher birth weight but they gain significantly less weight during pregnancy. Their babies are also longer, and these women have significantly higher probability for complications during pregnancy (gestosis) and higher probability of induced or caesarean delivery. We can conclude that the cluster analysis method can appropriately classify pregnant women at early pregnancy to predict certain outcomes.
Identifying Peer Institutions Using Cluster Analysis

ERIC Educational Resources Information Center

Boronico, Jess; Choksi, Shail S.

2012-01-01

The New York Institute of Technology's (NYIT) School of Management (SOM) wishes to develop a list of peer institutions for the purpose of benchmarking and monitoring/improving performance against other business schools. The procedure utilizes relevant criteria for the purpose of establishing this peer group by way of a cluster analysis. The…
Analysis of correlated mutations in HIV-1 protease using spectral clustering.

PubMed

Liu, Ying; Eyal, Eran; Bahar, Ivet

2008-05-15

The ability of human immunodeficiency virus-1 (HIV-1) protease to develop mutations that confer multi-drug resistance (MDR) has been a major obstacle in designing rational therapies against HIV. Resistance is usually imparted by a cooperative mechanism that can be elucidated by a covariance analysis of sequence data. Identification of such correlated substitutions of amino acids may be obscured by evolutionary noise. HIV-1 protease sequences from patients subjected to different specific treatments (set 1), and from untreated patients (set 2) were subjected to sequence covariance analysis by evaluating the mutual information (MI) between all residue pairs. Spectral clustering of the resulting covariance matrices disclosed two distinctive clusters of correlated residues: the first, observed in set 1 but absent in set 2, contained residues involved in MDR acquisition; and the second, included those residues differentiated in the various HIV-1 protease subtypes, shortly referred to as the phylogenetic cluster. The MDR cluster occupies sites close to the central symmetry axis of the enzyme, which overlap with the global hinge region identified from coarse-grained normal-mode analysis of the enzyme structure. The phylogenetic cluster, on the other hand, occupies solvent-exposed and highly mobile regions. This study demonstrates (i) the possibility of distinguishing between the correlated substitutions resulting from neutral mutations and those induced by MDR upon appropriate clustering analysis of sequence covariance data and (ii) a connection between global dynamics and functional substitution of amino acids.
Reproducibility of Cognitive Profiles in Psychosis Using Cluster Analysis.

PubMed

Lewandowski, Kathryn E; Baker, Justin T; McCarthy, Julie M; Norris, Lesley A; Öngür, Dost

2018-04-01

Cognitive dysfunction is a core symptom dimension that cuts across the psychoses. Recent findings support classification of patients along the cognitive dimension using cluster analysis; however, data-derived groupings may be highly determined by sampling characteristics and the measures used to derive the clusters, and so their interpretability must be established. We examined cognitive clusters in a cross-diagnostic sample of patients with psychosis and associations with clinical and functional outcomes. We then compared our findings to a previous report of cognitive clusters in a separate sample using a different cognitive battery. Participants with affective or non-affective psychosis (n=120) and healthy controls (n=31) were administered the MATRICS Consensus Cognitive Battery, and clinical and community functioning assessments. Cluster analyses were performed on cognitive variables, and clusters were compared on demographic, cognitive, and clinical measures. Results were compared to findings from our previous report. A four-cluster solution provided a good fit to the data; profiles included a neuropsychologically normal cluster, a globally impaired cluster, and two clusters of mixed profiles. Cognitive burden was associated with symptom severity and poorer community functioning. The patterns of cognitive performance by cluster were highly consistent with our previous findings. We found evidence of four cognitive subgroups of patients with psychosis, with cognitive profiles that map closely to those produced in our previous work. Clusters were associated with clinical and community variables and a measure of premorbid functioning, suggesting that they reflect meaningful groupings: replicable, and related to clinical presentation and functional outcomes. (JINS, 2018, 24, 382-390).
Identifying novel phenotypes of acute heart failure using cluster analysis of clinical variables.

PubMed

Horiuchi, Yu; Tanimoto, Shuzou; Latif, A H M Mahbub; Urayama, Kevin Y; Aoki, Jiro; Yahagi, Kazuyuki; Okuno, Taishi; Sato, Yu; Tanaka, Tetsu; Koseki, Keita; Komiyama, Kota; Nakajima, Hiroyoshi; Hara, Kazuhiro; Tanabe, Kengo

2018-07-01

Acute heart failure (AHF) is a heterogeneous disease caused by various cardiovascular (CV) pathophysiology and multiple non-CV comorbidities. We aimed to identify clinically important subgroups to improve our understanding of the pathophysiology of AHF and inform clinical decision-making. We evaluated detailed clinical data of 345 consecutive AHF patients using non-hierarchical cluster analysis of 77 variables, including age, sex, HF etiology, comorbidities, physical findings, laboratory data, electrocardiogram, echocardiogram and treatment during hospitalization. Cox proportional hazards regression analysis was performed to estimate the association between the clusters and clinical outcomes. Three clusters were identified. Cluster 1 (n=108) represented "vascular failure". This cluster had the highest average systolic blood pressure at admission and lung congestion with type 2 respiratory failure. Cluster 2 (n=89) represented "cardiac and renal failure". They had the lowest ejection fraction (EF) and worst renal function. Cluster 3 (n=148) comprised mostly older patients and had the highest prevalence of atrial fibrillation and preserved EF. Death or HF hospitalization within 12-month occurred in 23% of Cluster 1, 36% of Cluster 2 and 36% of Cluster 3 (p=0.034). Compared with Cluster 1, risk of death or HF hospitalization was 1.74 (95% CI, 1.03-2.95, p=0.037) for Cluster 2 and 1.82 (95% CI, 1.13-2.93, p=0.014) for Cluster 3. Cluster analysis may be effective in producing clinically relevant categories of AHF, and may suggest underlying pathophysiology and potential utility in predicting clinical outcomes. Copyright © 2018 Elsevier B.V. All rights reserved.
Identification and validation of asthma phenotypes in Chinese population using cluster analysis.

PubMed

Wang, Lei; Liang, Rui; Zhou, Ting; Zheng, Jing; Liang, Bing Miao; Zhang, Hong Ping; Luo, Feng Ming; Gibson, Peter G; Wang, Gang

2017-10-01

Asthma is a heterogeneous airway disease, so it is crucial to clearly identify clinical phenotypes to achieve better asthma management. To identify and prospectively validate asthma clusters in a Chinese population. Two hundred eighty-four patients were consecutively recruited and 18 sociodemographic and clinical variables were collected. Hierarchical cluster analysis was performed by the Ward method followed by k-means cluster analysis. Then, a prospective 12-month cohort study was used to validate the identified clusters. Five clusters were successfully identified. Clusters 1 (n = 71) and 3 (n = 81) were mild asthma phenotypes with slight airway obstruction and low exacerbation risk, but with a sex differential. Cluster 2 (n = 65) described an "allergic" phenotype, cluster 4 (n = 33) featured a "fixed airflow limitation" phenotype with smoking, and cluster 5 (n = 34) was a "low socioeconomic status" phenotype. Patients in clusters 2, 4, and 5 had distinctly lower socioeconomic status and more psychological symptoms. Cluster 2 had a significantly increased risk of exacerbations (risk ratio [RR] 1.13, 95% confidence interval [CI] 1.03-1.25), unplanned visits for asthma (RR 1.98, 95% CI 1.07-3.66), and emergency visits for asthma (RR 7.17, 95% CI 1.26-40.80). Cluster 4 had an increased risk of unplanned visits (RR 2.22, 95% CI 1.02-4.81), and cluster 5 had increased emergency visits (RR 12.72, 95% CI 1.95-69.78). Kaplan-Meier analysis confirmed that cluster grouping was predictive of time to the first asthma exacerbation, unplanned visit, emergency visit, and hospital admission (P < .0001 for all comparisons). We identified 3 clinical clusters as "allergic asthma," "fixed airflow limitation," and "low socioeconomic status" phenotypes that are at high risk of severe asthma exacerbations and that have management implications for clinical practice in developing countries. Copyright © 2017 American College of Allergy, Asthma & Immunology. Published by Elsevier Inc
Epigenetic functions enriched in transcription factors binding to mouse recombination hotspots.

PubMed

Wu, Min; Kwoh, Chee-Keong; Przytycka, Teresa M; Li, Jing; Zheng, Jie

2012-06-21

The regulatory mechanism of recombination is a fundamental problem in genomics, with wide applications in genome-wide association studies, birth-defect diseases, molecular evolution, cancer research, etc. In mammalian genomes, recombination events cluster into short genomic regions called "recombination hotspots". Recently, a 13-mer motif enriched in hotspots is identified as a candidate cis-regulatory element of human recombination hotspots; moreover, a zinc finger protein, PRDM9, binds to this motif and is associated with variation of recombination phenotype in human and mouse genomes, thus is a trans-acting regulator of recombination hotspots. However, this pair of cis and trans-regulators covers only a fraction of hotspots, thus other regulators of recombination hotspots remain to be discovered. In this paper, we propose an approach to predicting additional trans-regulators from DNA-binding proteins by comparing their enrichment of binding sites in hotspots. Applying this approach on newly mapped mouse hotspots genome-wide, we confirmed that PRDM9 is a major trans-regulator of hotspots. In addition, a list of top candidate trans-regulators of mouse hotspots is reported. Using GO analysis we observed that the top genes are enriched with function of histone modification, highlighting the epigenetic regulatory mechanisms of recombination hotspots.
Epigenetic functions enriched in transcription factors binding to mouse recombination hotspots

PubMed Central

2012-01-01

The regulatory mechanism of recombination is a fundamental problem in genomics, with wide applications in genome-wide association studies, birth-defect diseases, molecular evolution, cancer research, etc. In mammalian genomes, recombination events cluster into short genomic regions called "recombination hotspots". Recently, a 13-mer motif enriched in hotspots is identified as a candidate cis-regulatory element of human recombination hotspots; moreover, a zinc finger protein, PRDM9, binds to this motif and is associated with variation of recombination phenotype in human and mouse genomes, thus is a trans-acting regulator of recombination hotspots. However, this pair of cis and trans-regulators covers only a fraction of hotspots, thus other regulators of recombination hotspots remain to be discovered. In this paper, we propose an approach to predicting additional trans-regulators from DNA-binding proteins by comparing their enrichment of binding sites in hotspots. Applying this approach on newly mapped mouse hotspots genome-wide, we confirmed that PRDM9 is a major trans-regulator of hotspots. In addition, a list of top candidate trans-regulators of mouse hotspots is reported. Using GO analysis we observed that the top genes are enriched with function of histone modification, highlighting the epigenetic regulatory mechanisms of recombination hotspots. PMID:22759569
Surface Analysis Cluster Tool | Materials Science | NREL

Science.gov Websites

spectroscopic ellipsometry during film deposition. The cluster tool can be used to study the effect of various prior to analysis. Here we illustrate the surface cleaning effect of an aqueous ammonia treatment on a
Central Elemental Abundance Ratios In the Perseus Cluster: Resonant Scattering or SN Ia Enrichment?

NASA Technical Reports Server (NTRS)

Dupke, Renato A.; Arnaud, Keith; White, Nicholas E. (Technical Monitor)

2001-01-01

We have determined abundance ratios in the core of the Perseus Cluster for several elements. These ratios indicate a central dominance of Type 1a supernova (SN Ia) ejects similar to that found for A496, A2199 and A3571. Simultaneous analysis of ASCA spectra from SIS1, GIS2, and GIS3 shows that the ratio of Ni to Fe abundances is approx. 3.4 +/- 1.1 times solar within the central 4'. This ratio is consistent with (and more precise than) that observed in other clusters whose central regions are dominated by SN Ia ejecta. Such a large Ni overabundance is predicted by "convective deflagration" explosion models for SNe Ia such as W7 but is inconsistent with delayed detonation models. We note that with current instrumentation the Ni K(alpha) line is confused with Fe K(beta) and that the Ni overabundance we observe has been interpreted by others as an anomalously large ratio of Fe K(beta) to Fe K(alpha) caused by resonant scattering in the Fe K(alpha) line. We argue that a central enhancement of SN Ia ejecta and hence a high ratio of Ni to Fe abundances are naturally explained by scenarios that include the generation of chemical gradients by suppressed SN Ia winds or ram pressure stripping of cluster galaxies. It is not necessary to suppose that the intracluster gas is optically thick to resonant scattering of the Fe K(alpha) line.
Using cluster analysis to identify phenotypes and validation of mortality in men with COPD.

PubMed

Chen, Chiung-Zuei; Wang, Liang-Yi; Ou, Chih-Ying; Lee, Cheng-Hung; Lin, Chien-Chung; Hsiue, Tzuen-Ren

2014-12-01

Cluster analysis has been proposed to examine phenotypic heterogeneity in chronic obstructive pulmonary disease (COPD). The aim of this study was to use cluster analysis to define COPD phenotypes and validate them by assessing their relationship with mortality. Male subjects with COPD were recruited to identify and validate COPD phenotypes. Seven variables were assessed for their relevance to COPD, age, FEV(1) % predicted, BMI, history of severe exacerbations, mMRC, SpO(2), and Charlson index. COPD groups were identified by cluster analysis and validated prospectively against mortality during a 4-year follow-up. Analysis of 332 COPD subjects identified five clusters from cluster A to cluster E. Assessment of the predictive validity of these clusters of COPD showed that cluster E patients had higher all cause mortality (HR 18.3, p < 0.0001), and respiratory cause mortality (HR 21.5, p < 0.0001) than those in the other four groups. Cluster E patients also had higher all cause mortality (HR 14.3, p = 0.0002) and respiratory cause mortality (HR 10.1, p = 0.0013) than patients in cluster D alone. COPD patient with severe airflow limitation, many symptoms, and a history of frequent severe exacerbations was a novel and distinct clinical phenotype predicting mortality in men with COPD.
A uniform metallicity in the outskirts of massive, nearby galaxy clusters

DOE PAGES

Urban, O.; Werner, N.; Allen, S. W.; ...

2017-06-20

Suzaku measurements of a homogeneous metal distribution of Z ~ 0:3 Solar in the outskirts of the nearby Perseus cluster suggest that chemical elements were deposited and mixed into the intergalactic medium before clusters formed, likely over 10 billion years ago. A key prediction of this early enrichment scenario is that the intracluster medium in all massive clusters should be uniformly enriched to a similar level. Here, we confirm this prediction by determining the iron abundances in the outskirts (r > 0:25r200) of a sample of ten other nearby galaxy clusters observed with Suzaku for which robust measurements based onmore » the Fe-K lines can be made. Across our sample the iron abundances are consistent with a constant value, ZFe = 0:316 ± 0:012 Solar (Χ 2 = 28:85 for 25 degrees of freedom). This is remarkably similar to the measurements for the Perseus cluster of ZFe = 0:314±0:012 Solar, using the Solar abundance scale of Asplund et al. (2009).« less
Genome-scale cluster analysis of replicated microarrays using shrinkage correlation coefficient.

PubMed

Yao, Jianchao; Chang, Chunqi; Salmi, Mari L; Hung, Yeung Sam; Loraine, Ann; Roux, Stanley J

2008-06-18

Currently, clustering with some form of correlation coefficient as the gene similarity metric has become a popular method for profiling genomic data. The Pearson correlation coefficient and the standard deviation (SD)-weighted correlation coefficient are the two most widely-used correlations as the similarity metrics in clustering microarray data. However, these two correlations are not optimal for analyzing replicated microarray data generated by most laboratories. An effective correlation coefficient is needed to provide statistically sufficient analysis of replicated microarray data. In this study, we describe a novel correlation coefficient, shrinkage correlation coefficient (SCC), that fully exploits the similarity between the replicated microarray experimental samples. The methodology considers both the number of replicates and the variance within each experimental group in clustering expression data, and provides a robust statistical estimation of the error of replicated microarray data. The value of SCC is revealed by its comparison with two other correlation coefficients that are currently the most widely-used (Pearson correlation coefficient and SD-weighted correlation coefficient) using statistical measures on both synthetic expression data as well as real gene expression data from Saccharomyces cerevisiae. Two leading clustering methods, hierarchical and k-means clustering were applied for the comparison. The comparison indicated that using SCC achieves better clustering performance. Applying SCC-based hierarchical clustering to the replicated microarray data obtained from germinating spores of the fern Ceratopteris richardii, we discovered two clusters of genes with shared expression patterns during spore germination. Functional analysis suggested that some of the genetic mechanisms that control germination in such diverse plant lineages as mosses and angiosperms are also conserved among ferns. This study shows that SCC is an alternative to the Pearson
Application of microarray analysis on computer cluster and cloud platforms.

PubMed

Bernau, C; Boulesteix, A-L; Knaus, J

2013-01-01

Analysis of recent high-dimensional biological data tends to be computationally intensive as many common approaches such as resampling or permutation tests require the basic statistical analysis to be repeated many times. A crucial advantage of these methods is that they can be easily parallelized due to the computational independence of the resampling or permutation iterations, which has induced many statistics departments to establish their own computer clusters. An alternative is to rent computing resources in the cloud, e.g. at Amazon Web Services. In this article we analyze whether a selection of statistical projects, recently implemented at our department, can be efficiently realized on these cloud resources. Moreover, we illustrate an opportunity to combine computer cluster and cloud resources. In order to compare the efficiency of computer cluster and cloud implementations and their respective parallelizations we use microarray analysis procedures and compare their runtimes on the different platforms. Amazon Web Services provide various instance types which meet the particular needs of the different statistical projects we analyzed in this paper. Moreover, the network capacity is sufficient and the parallelization is comparable in efficiency to standard computer cluster implementations. Our results suggest that many statistical projects can be efficiently realized on cloud resources. It is important to mention, however, that workflows can change substantially as a result of a shift from computer cluster to cloud computing.
A Cluster Analysis of Bronchial Asthma Patients with Depressive Symptoms.

PubMed

Seino, Yo; Hasegawa, Takashi; Koya, Toshiyuki; Sakagami, Takuro; Mashima, Ichiro; Shimizu, Natsue; Muramatsu, Yoshiyuki; Muramatsu, Kumiko; Suzuki, Eiichi; Kikuchi, Toshiaki

2018-03-09

Objective Whether or not depression affects the control or severity of asthma is unclear. We performed a cluster analysis of asthma patients with depressive symptoms to clarify their characteristics. Methods and subjects Multiple medical institutions in Niigata Prefecture, Japan, were surveyed in 2014. We recorded the age, disease duration, body mass index (BMI), medications, and surveyed asthma control status and severity, as well as depressive symptoms and adherence to treatment using questionnaires. A hierarchical cluster analysis was performed on the group of patients assessed as having depression. Results Of 2,273 patients, 128 were assessed as being positive for depressive symptoms (DS[+]). Thirty-three were excluded because of missing data, and the remaining 95 DS[+] patients were classified into 3 clusters (A, B, and C). The patients in cluster A (n=19) were elderly, had severe, poorly controlled asthma, and demonstrated possible adherence barriers; those in cluster B (n=26) were elderly with a low BMI and had no significant adherence barriers but had severe, poorly controlled asthma; and those in cluster C (n=50) were younger, with a high BMI, no significant adherence barriers, well-controlled asthma, and few were severely affected. The scores for depressive symptoms were not significantly different between clusters. Conclusion About half of the patients in the DS[+] group had severe, poorly controlled asthma, and these clusters were able to be distinguished by their ASK-12 score, which reflects adherence barriers. The control status and severity of asthma may also be related to the age, disease duration, and BMI in the DS[+] group.
A proteome view of structural, functional, and taxonomic characteristics of major protein domain clusters.

PubMed

Sun, Chia-Tsen; Chiang, Austin W T; Hwang, Ming-Jing

2017-10-27

Proteome-scale bioinformatics research is increasingly conducted as the number of completely sequenced genomes increases, but analysis of protein domains (PDs) usually relies on similarity in their amino acid sequences and/or three-dimensional structures. Here, we present results from a bi-clustering analysis on presence/absence data for 6,580 unique PDs in 2,134 species with a sequenced genome, thus covering a complete set of proteins, for the three superkingdoms of life, Bacteria, Archaea, and Eukarya. Our analysis revealed eight distinctive PD clusters, which, following an analysis of enrichment of Gene Ontology functions and CATH classification of protein structures, were shown to exhibit structural and functional properties that are taxa-characteristic. For examples, the largest cluster is ubiquitous in all three superkingdoms, constituting a set of 1,472 persistent domains created early in evolution and retained in living organisms and characterized by basic cellular functions and ancient structural architectures, while an Archaea and Eukarya bi-superkingdom cluster suggests its PDs may have existed in the ancestor of the two superkingdoms, and others are single superkingdom- or taxa (e.g. Fungi)-specific. These results contribute to increase our appreciation of PD diversity and our knowledge of how PDs are used in species, yielding implications on species evolution.
REGIONAL-SCALE WIND FIELD CLASSIFICATION EMPLOYING CLUSTER ANALYSIS

DOE Office of Scientific and Technical Information (OSTI.GOV)

Glascoe, L G; Glaser, R E; Chin, H S

2004-06-17

The classification of time-varying multivariate regional-scale wind fields at a specific location can assist event planning as well as consequence and risk analysis. Further, wind field classification involves data transformation and inference techniques that effectively characterize stochastic wind field variation. Such a classification scheme is potentially useful for addressing overall atmospheric transport uncertainty and meteorological parameter sensitivity issues. Different methods to classify wind fields over a location include the principal component analysis of wind data (e.g., Hardy and Walton, 1978) and the use of cluster analysis for wind data (e.g., Green et al., 1992; Kaufmann and Weber, 1996). The goalmore » of this study is to use a clustering method to classify the winds of a gridded data set, i.e, from meteorological simulations generated by a forecast model.« less
Enrichment and proteomic analysis of plasma membrane from rat dorsal root ganglions

PubMed Central

2009-01-01

Background Dorsal root ganglion (DRG) neurons are primary sensory neurons that conduct neuronal impulses related to pain, touch and temperature senses. Plasma membrane (PM) of DRG cells plays important roles in their functions. PM proteins are main performers of the functions. However, mainly due to the very low amount of DRG that leads to the difficulties in PM sample collection, few proteomic analyses on the PM have been reported and it is a subject that demands further investigation. Results By using aqueous polymer two-phase partition in combination with high salt and high pH washing, PMs were efficiently enriched, demonstrated by western blot analysis. A total of 954 non-redundant proteins were identified from the plasma membrane-enriched preparation with CapLC-MS/MS analysis subsequent to protein separation by sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) or shotgun digestion. 205 (21.5%) of the identified proteins were unambiguously assigned as PM proteins, including a large number of signal proteins, receptors, ion channel and transporters. Conclusion The aqueous polymer two-phase partition is a simple, rapid and relatively inexpensive method. It is well suitable for the purification of PMs from small amount of tissues. Therefore, it is reasonable for the DRG PM to be enriched by using aqueous two-phase partition as a preferred method. Proteomic analysis showed that DRG PM was rich in proteins involved in the fundamental biological processes including material exchange, energy transformation and information transmission, etc. These data would help to our further understanding of the fundamental DRG functions. PMID:19889238
Bayesian network meta-analysis for cluster randomized trials with binary outcomes.

PubMed

Uhlmann, Lorenz; Jensen, Katrin; Kieser, Meinhard

2017-06-01

Network meta-analysis is becoming a common approach to combine direct and indirect comparisons of several treatment arms. In recent research, there have been various developments and extensions of the standard methodology. Simultaneously, cluster randomized trials are experiencing an increased popularity, especially in the field of health services research, where, for example, medical practices are the units of randomization but the outcome is measured at the patient level. Combination of the results of cluster randomized trials is challenging. In this tutorial, we examine and compare different approaches for the incorporation of cluster randomized trials in a (network) meta-analysis. Furthermore, we provide practical insight on the implementation of the models. In simulation studies, it is shown that some of the examined approaches lead to unsatisfying results. However, there are alternatives which are suitable to combine cluster randomized trials in a network meta-analysis as they are unbiased and reach accurate coverage rates. In conclusion, the methodology can be extended in such a way that an adequate inclusion of the results obtained in cluster randomized trials becomes feasible. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.

Radial distribution of metals in the hot intra-cluster medium as observed by XMM-Newton

NASA Astrophysics Data System (ADS)

Mernier, F.; de Plaa, J.; Kaastra, J.; Zhang, Y.; Akamatsu, H.; Gu, L.; Mao, J.; Pinto, C.; Reiprich, T.; Sanders, J.

2017-10-01

The hot intra-cluster medium (ICM), which accounts for ˜80% of the baryonic content in galaxy clusters, is rich in heavy elements. Since these metals have been produced by stars and supernovae before enriching the ICM, measuring metal abundance distributions in galaxy clusters and groups provides essential clues to determine the main astrophysical source(s) and epoch(s) of the ICM enrichment. In this work, we present radial abundance profiles averaged over 44 nearby cool-core galaxy clusters, groups, and massive ellipticals (the CHEERS sample) measured with XMM-Newton EPIC. While most of the Fe of the Universe is thought to be synthesised by Type Ia supernovae (SNIa), lighter elements, such as O, Mg, Si or S, are mostly produced by core-collapse supernovae (SNcc). The derived average radial profiles of the O, Mg, Si, S, Ar, Ca, Fe, and Ni abundances out to ˜ 0.5 r_{500} allows us to accurately compare the distributions of SNIa and SNcc products in clusters and groups. By comparing our results with recent chemo-dynamical simulations, we discuss the interpretation of the profiles in the context of early and late ICM enrichments.
Fuzzy cluster analysis of high-field functional MRI data.

PubMed

Windischberger, Christian; Barth, Markus; Lamm, Claus; Schroeder, Lee; Bauer, Herbert; Gur, Ruben C; Moser, Ewald

2003-11-01

Functional magnetic resonance imaging (fMRI) based on blood-oxygen level dependent (BOLD) contrast today is an established brain research method and quickly gains acceptance for complementary clinical diagnosis. However, neither the basic mechanisms like coupling between neuronal activation and haemodynamic response are known exactly, nor can the various artifacts be predicted or controlled. Thus, modeling functional signal changes is non-trivial and exploratory data analysis (EDA) may be rather useful. In particular, identification and separation of artifacts as well as quantification of expected, i.e. stimulus correlated, and novel information on brain activity is important for both, new insights in neuroscience and future developments in functional MRI of the human brain. After an introduction on fuzzy clustering and very high-field fMRI we present several examples where fuzzy cluster analysis (FCA) of fMRI time series helps to identify and locally separate various artifacts. We also present and discuss applications and limitations of fuzzy cluster analysis in very high-field functional MRI: differentiate temporal patterns in MRI using (a) a test object with static and dynamic parts, (b) artifacts due to gross head motion artifacts. Using a synthetic fMRI data set we quantitatively examine the influences of relevant FCA parameters on clustering results in terms of receiver-operator characteristics (ROC) and compare them with a commonly used model-based correlation analysis (CA) approach. The application of FCA in analyzing in vivo fMRI data is shown for (a) a motor paradigm, (b) data from multi-echo imaging, and (c) a fMRI study using mental rotation of three-dimensional cubes. We found that differentiation of true "neural" from false "vascular" activation is possible based on echo time dependence and specific activation levels, as well as based on their signal time-course. Exploratory data analysis methods in general and fuzzy cluster analysis in particular may
First CCD UBVI photometric analysis of six open cluster candidates

NASA Astrophysics Data System (ADS)

Piatti, A. E.; Clariá, J. J.; Ahumada, A. V.

2011-04-01

We have obtained CCD UBVIKC photometry down to V ˜ 22 for the open cluster candidates Haffner 3, Haffner 5, NGC 2368, Haffner 25, Hogg 3 and Hogg 4 and their surrounding fields. None of these objects have been photometrically studied so far. Our analysis shows that these stellar groups are not genuine open clusters since no clear main sequences or other meaningful features can be seen in their colour-magnitude and colour-colour diagrams. We checked for possible differential reddening across the studied fields that could be hiding the characteristics of real open clusters. However, the dust in the directions to these objects appears to be uniformly distributed. Moreover, star counts carried out within and outside the open cluster candidate fields do not support the hypothesis that these objects are real open clusters or even open cluster remnants.
CHRONOS: a time-varying method for microRNA-mediated subpathway enrichment analysis.

PubMed

Vrahatis, Aristidis G; Dimitrakopoulou, Konstantina; Balomenos, Panos; Tsakalidis, Athanasios K; Bezerianos, Anastasios

2016-03-15

In the era of network medicine and the rapid growth of paired time series mRNA/microRNA expression experiments, there is an urgent need for pathway enrichment analysis methods able to capture the time- and condition-specific 'active parts' of the biological circuitry as well as the microRNA impact. Current methods ignore the multiple dynamical 'themes'-in the form of enriched biologically relevant microRNA-mediated subpathways-that determine the functionality of signaling networks across time. To address these challenges, we developed time-vaRying enriCHment integrOmics Subpathway aNalysis tOol (CHRONOS) by integrating time series mRNA/microRNA expression data with KEGG pathway maps and microRNA-target interactions. Specifically, microRNA-mediated subpathway topologies are extracted and evaluated based on the temporal transition and the fold change activity of the linked genes/microRNAs. Further, we provide measures that capture the structural and functional features of subpathways in relation to the complete organism pathway atlas. Our application to synthetic and real data shows that CHRONOS outperforms current subpathway-based methods into unraveling the inherent dynamic properties of pathways. CHRONOS is freely available at http://biosignal.med.upatras.gr/chronos/ tassos.bezerianos@nus.edu.sg Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Common factor analysis versus principal component analysis: choice for symptom cluster research.

PubMed

Kim, Hee-Ju

2008-03-01

The purpose of this paper is to examine differences between two factor analytical methods and their relevance for symptom cluster research: common factor analysis (CFA) versus principal component analysis (PCA). Literature was critically reviewed to elucidate the differences between CFA and PCA. A secondary analysis (N = 84) was utilized to show the actual result differences from the two methods. CFA analyzes only the reliable common variance of data, while PCA analyzes all the variance of data. An underlying hypothetical process or construct is involved in CFA but not in PCA. PCA tends to increase factor loadings especially in a study with a small number of variables and/or low estimated communality. Thus, PCA is not appropriate for examining the structure of data. If the study purpose is to explain correlations among variables and to examine the structure of the data (this is usual for most cases in symptom cluster research), CFA provides a more accurate result. If the purpose of a study is to summarize data with a smaller number of variables, PCA is the choice. PCA can also be used as an initial step in CFA because it provides information regarding the maximum number and nature of factors. In using factor analysis for symptom cluster research, several issues need to be considered, including subjectivity of solution, sample size, symptom selection, and level of measure.
Metabolomic analysis can detect the composition of pasta enriched with fibre after cooking.

PubMed

Beleggia, Romina; Menga, Valeria; Platani, Cristiano; Nigro, Franca; Fragasso, Mariagiovanna; Fares, Clara

2016-07-01

Several studies have demonstrated that metabolomics has a definite place in food quality, nutritional value, and safety issues. The aim of the present study was to determine and compare the metabolites in different pasta samples with fibre, and to investigate the modifications induced in these different kinds of pasta during cooking, using a gas chromatography-mass spectrometry-based metabolomics approach. Differences were seen for some of the amino acids, which were absent in control pasta, while were present both in the commercially available high-fibre pasta (samples A-C) and the enriched pasta (samples D-F). The highest content in reducing sugars was observed in enriched samples in comparison with high-fibre pasta. The presence of stigmasterol in samples enriched with wheat bran was relevant. Cooking decreased all of the metabolites: the high-fibre pasta (A-C) and Control showed losses of amino acids and tocopherols, while for sugars and organic acids, the decrease depended on the pasta sample. The enriched pasta samples (D-F) showed the same decreases with the exception of phytosterols, and in pasta with barley the decrease of saturated fatty acids was not significant as for tocopherols in pasta with oat. Principal component analysis of the metabolites and the pasta discrimination was effective in differentiating the enriched pasta from the commercial pasta, both uncooked and cooked. The study has established that such metabolomic analyses provide useful tools in the evaluation of the changes in nutritional compounds in high-fibre and enriched pasta, both before and after cooking. © 2015 Society of Chemical Industry. © 2015 Society of Chemical Industry.
ToNER: A tool for identifying nucleotide enrichment signals in feature-enriched RNA-seq data.

PubMed

Promworn, Yuttachon; Kaewprommal, Pavita; Shaw, Philip J; Intarapanich, Apichart; Tongsima, Sissades; Piriyapongsa, Jittima

2017-01-01

Biochemical methods are available for enriching 5' ends of RNAs in prokaryotes, which are employed in the differential RNA-seq (dRNA-seq) and the more recent Cappable-seq protocols. Computational methods are needed to locate RNA 5' ends from these data by statistical analysis of the enrichment. Although statistical-based analysis methods have been developed for dRNA-seq, they may not be suitable for Cappable-seq data. The more efficient enrichment method employed in Cappable-seq compared with dRNA-seq could affect data distribution and thus algorithm performance. We present Transformation of Nucleotide Enrichment Ratios (ToNER), a tool for statistical modeling of enrichment from RNA-seq data obtained from enriched and unenriched libraries. The tool calculates nucleotide enrichment scores and determines the global transformation for fitting to the normal distribution using the Box-Cox procedure. From the transformed distribution, sites of significant enrichment are identified. To increase power of detection, meta-analysis across experimental replicates is offered. We tested the tool on Cappable-seq and dRNA-seq data for identifying Escherichia coli transcript 5' ends and compared the results with those from the TSSAR tool, which is designed for analyzing dRNA-seq data. When combining results across Cappable-seq replicates, ToNER detects more known transcript 5' ends than TSSAR. In general, the transcript 5' ends detected by ToNER but not TSSAR occur in regions which cannot be locally modeled by TSSAR. ToNER uses a simple yet robust statistical modeling approach, which can be used for detecting RNA 5'ends from Cappable-seq data, in particular when combining information from experimental replicates. The ToNER tool could potentially be applied for analyzing other RNA-seq datasets in which enrichment for other structural features of RNA is employed. The program is freely available for download at ToNER webpage (http://www4a.biotec.or.th/GI/tools/toner) and Git
ToNER: A tool for identifying nucleotide enrichment signals in feature-enriched RNA-seq data

PubMed Central

Promworn, Yuttachon; Kaewprommal, Pavita; Shaw, Philip J.; Intarapanich, Apichart; Tongsima, Sissades

2017-01-01

Background Biochemical methods are available for enriching 5′ ends of RNAs in prokaryotes, which are employed in the differential RNA-seq (dRNA-seq) and the more recent Cappable-seq protocols. Computational methods are needed to locate RNA 5′ ends from these data by statistical analysis of the enrichment. Although statistical-based analysis methods have been developed for dRNA-seq, they may not be suitable for Cappable-seq data. The more efficient enrichment method employed in Cappable-seq compared with dRNA-seq could affect data distribution and thus algorithm performance. Results We present Transformation of Nucleotide Enrichment Ratios (ToNER), a tool for statistical modeling of enrichment from RNA-seq data obtained from enriched and unenriched libraries. The tool calculates nucleotide enrichment scores and determines the global transformation for fitting to the normal distribution using the Box-Cox procedure. From the transformed distribution, sites of significant enrichment are identified. To increase power of detection, meta-analysis across experimental replicates is offered. We tested the tool on Cappable-seq and dRNA-seq data for identifying Escherichia coli transcript 5′ ends and compared the results with those from the TSSAR tool, which is designed for analyzing dRNA-seq data. When combining results across Cappable-seq replicates, ToNER detects more known transcript 5′ ends than TSSAR. In general, the transcript 5′ ends detected by ToNER but not TSSAR occur in regions which cannot be locally modeled by TSSAR. Conclusion ToNER uses a simple yet robust statistical modeling approach, which can be used for detecting RNA 5′ends from Cappable-seq data, in particular when combining information from experimental replicates. The ToNER tool could potentially be applied for analyzing other RNA-seq datasets in which enrichment for other structural features of RNA is employed. The program is freely available for download at ToNER webpage (http://www4a
Tobacco, Marijuana, and Alcohol Use in University Students: A Cluster Analysis

PubMed Central

Primack, Brian A.; Kim, Kevin H.; Shensa, Ariel; Sidani, Jaime E.; Barnett, Tracey E.; Switzer, Galen E.

2012-01-01

Objective Segmentation of populations may facilitate development of targeted substance abuse prevention programs. We aimed to partition a national sample of university students according to profiles based on substance use. Participants We used 2008–2009 data from the National College Health Assessment from the American College Health Association. Our sample consisted of 111,245 individuals from 158 institutions. Method We partitioned the sample using cluster analysis according to current substance use behaviors. We examined the association of cluster membership with individual and institutional characteristics. Results Cluster analysis yielded six distinct clusters. Three individual factors—gender, year in school, and fraternity/sorority membership—were the most strongly associated with cluster membership. Conclusions In a large sample of university students, we were able to identify six distinct patterns of substance abuse. It may be valuable to target specific populations of college-aged substance users based on individual factors. However, comprehensive intervention will require a multifaceted approach. PMID:22686360
Unsupervised feature relevance analysis applied to improve ECG heartbeat clustering.

PubMed

Rodríguez-Sotelo, J L; Peluffo-Ordoñez, D; Cuesta-Frau, D; Castellanos-Domínguez, G

2012-10-01

The computer-assisted analysis of biomedical records has become an essential tool in clinical settings. However, current devices provide a growing amount of data that often exceeds the processing capacity of normal computers. As this amount of information rises, new demands for more efficient data extracting methods appear. This paper addresses the task of data mining in physiological records using a feature selection scheme. An unsupervised method based on relevance analysis is described. This scheme uses a least-squares optimization of the input feature matrix in a single iteration. The output of the algorithm is a feature weighting vector. The performance of the method was assessed using a heartbeat clustering test on real ECG records. The quantitative cluster validity measures yielded a correctly classified heartbeat rate of 98.69% (specificity), 85.88% (sensitivity) and 95.04% (general clustering performance), which is even higher than the performance achieved by other similar ECG clustering studies. The number of features was reduced on average from 100 to 18, and the temporal cost was a 43% lower than in previous ECG clustering schemes. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
Hierarchical cluster analysis of progression patterns in open-angle glaucoma patients with medical treatment.

PubMed

Bae, Hyoung Won; Rho, Seungsoo; Lee, Hye Sun; Lee, Naeun; Hong, Samin; Seong, Gong Je; Sung, Kyung Rim; Kim, Chan Yun

2014-04-29

To classify medically treated open-angle glaucoma (OAG) by the pattern of progression using hierarchical cluster analysis, and to determine OAG progression characteristics by comparing clusters. Ninety-five eyes of 95 OAG patients who received medical treatment, and who had undergone visual field (VF) testing at least once per year for 5 or more years. OAG was classified into subgroups using hierarchical cluster analysis based on the following five variables: baseline mean deviation (MD), baseline visual field index (VFI), MD slope, VFI slope, and Glaucoma Progression Analysis (GPA) printout. After that, other parameters were compared between clusters. Two clusters were made after a hierarchical cluster analysis. Cluster 1 showed -4.06 ± 2.43 dB baseline MD, 92.58% ± 6.27% baseline VFI, -0.28 ± 0.38 dB per year MD slope, -0.52% ± 0.81% per year VFI slope, and all "no progression" cases in GPA printout, whereas cluster 2 showed -8.68 ± 3.81 baseline MD, 77.54 ± 12.98 baseline VFI, -0.72 ± 0.55 MD slope, -2.22 ± 1.89 VFI slope, and seven "possible" and four "likely" progression cases in GPA printout. There were no significant differences in age, sex, mean IOP, central corneal thickness, and axial length between clusters. However, cluster 2 included more high-tension glaucoma patients and used a greater number of antiglaucoma eye drops significantly compared with cluster 1. Hierarchical cluster analysis of progression patterns divided OAG into slow and fast progression groups, evidenced by assessing the parameters of glaucomatous progression in VF testing. In the fast progression group, the prevalence of high-tension glaucoma was greater and the number of antiglaucoma medications administered was increased versus the slow progression group. Copyright 2014 The Association for Research in Vision and Ophthalmology, Inc.
Who are the healthy active seniors? A cluster analysis.

PubMed

Lai, Claudia K Y; Chan, Engle Angela; Chin, Kenny C W

2014-12-01

This paper reports a cluster analysis of a sample recruited from a randomized controlled trial that explored the effect of using a life story work approach to improve the psychological outcomes of older people in the community. 238 subjects from community centers were included in this analysis. After statistical testing, 169 seniors were assigned to the active ageing (AG) cluster and 69 to the inactive ageing (IG) cluster. Those in the AG were younger and healthier, with fewer chronic diseases and fewer depressive symptoms than those in the IG. They were more satisfied with their lives, and had higher self-esteem. They met with their family members more frequently, they engaged in more leisure activities and were more likely to have the ability to move freely. In summary, active ageing was observed in people with better health and functional performance. Our results echoed the limited findings reported in the literature.
Cluster analysis of spontaneous preterm birth phenotypes identifies potential associations among preterm birth mechanisms

PubMed Central

Esplin, M Sean; Manuck, Tracy A.; Varner, Michael W.; Christensen, Bryce; Biggio, Joseph; Bukowski, Radek; Parry, Samuel; Zhang, Heping; Huang, Hao; Andrews, William; Saade, George; Sadovsky, Yoel; Reddy, Uma M.; Ilekis, John

2015-01-01

Objective We sought to employ an innovative tool based on common biological pathways to identify specific phenotypes among women with spontaneous preterm birth (SPTB), in order to enhance investigators' ability to identify to highlight common mechanisms and underlying genetic factors responsible for SPTB. Study Design A secondary analysis of a prospective case-control multicenter study of SPTB. All cases delivered a preterm singleton at SPTB ≤34.0 weeks gestation. Each woman was assessed for the presence of underlying SPTB etiologies. A hierarchical cluster analysis was used to identify groups of women with homogeneous phenotypic profiles. One of the phenotypic clusters was selected for candidate gene association analysis using VEGAS software. Results 1028 women with SPTB were assigned phenotypes. Hierarchical clustering of the phenotypes revealed five major clusters. Cluster 1 (N=445) was characterized by maternal stress, cluster 2 (N=294) by premature membrane rupture, cluster 3 (N=120) by familial factors, and cluster 4 (N=63) by maternal comorbidities. Cluster 5 (N=106) was multifactorial, characterized by infection (INF), decidual hemorrhage (DH) and placental dysfunction (PD). These three phenotypes were highly correlated by Chi-square analysis [PD and DH (p<2.2e-6); PD and INF (p=6.2e-10); INF and DH (p=0.0036)]. Gene-based testing identified the INS (insulin) gene as significantly associated with cluster 3 of SPTB. Conclusion We identified 5 major clusters of SPTB based on a phenotype tool and hierarchal clustering. There was significant correlation between several of the phenotypes. The INS gene was associated with familial factors underlying SPTB. PMID:26070700
Tweets clustering using latent semantic analysis

NASA Astrophysics Data System (ADS)

Rasidi, Norsuhaili Mahamed; Bakar, Sakhinah Abu; Razak, Fatimah Abdul

2017-04-01

Social media are becoming overloaded with information due to the increasing number of information feeds. Unlike other social media, Twitter users are allowed to broadcast a short message called as `tweet". In this study, we extract tweets related to MH370 for certain of time. In this paper, we present overview of our approach for tweets clustering to analyze the users' responses toward tragedy of MH370. The tweets were clustered based on the frequency of terms obtained from the classification process. The method we used for the text classification is Latent Semantic Analysis. As a result, there are two types of tweets that response to MH370 tragedy which is emotional and non-emotional. We show some of our initial results to demonstrate the effectiveness of our approach.
The composite sequential clustering technique for analysis of multispectral scanner data

NASA Technical Reports Server (NTRS)

Su, M. Y.

1972-01-01

The clustering technique consists of two parts: (1) a sequential statistical clustering which is essentially a sequential variance analysis, and (2) a generalized K-means clustering. In this composite clustering technique, the output of (1) is a set of initial clusters which are input to (2) for further improvement by an iterative scheme. This unsupervised composite technique was employed for automatic classification of two sets of remote multispectral earth resource observations. The classification accuracy by the unsupervised technique is found to be comparable to that by traditional supervised maximum likelihood classification techniques. The mathematical algorithms for the composite sequential clustering program and a detailed computer program description with job setup are given.
Symptom Cluster Research With Biomarkers and Genetics Using Latent Class Analysis.

PubMed

Conley, Samantha

2017-12-01

The purpose of this article is to provide an overview of latent class analysis (LCA) and examples from symptom cluster research that includes biomarkers and genetics. A review of LCA with genetics and biomarkers was conducted using Medline, Embase, PubMed, and Google Scholar. LCA is a robust latent variable model used to cluster categorical data and allows for the determination of empirically determined symptom clusters. Researchers should consider using LCA to link empirically determined symptom clusters to biomarkers and genetics to better understand the underlying etiology of symptom clusters. The full potential of LCA in symptom cluster research has not yet been realized because it has been used in limited populations, and researchers have explored limited biologic pathways.
Cluster-based analysis of multi-model climate ensembles

NASA Astrophysics Data System (ADS)

Hyde, Richard; Hossaini, Ryan; Leeson, Amber A.

2018-06-01

Clustering - the automated grouping of similar data - can provide powerful and unique insight into large and complex data sets, in a fast and computationally efficient manner. While clustering has been used in a variety of fields (from medical image processing to economics), its application within atmospheric science has been fairly limited to date, and the potential benefits of the application of advanced clustering techniques to climate data (both model output and observations) has yet to be fully realised. In this paper, we explore the specific application of clustering to a multi-model climate ensemble. We hypothesise that clustering techniques can provide (a) a flexible, data-driven method of testing model-observation agreement and (b) a mechanism with which to identify model development priorities. We focus our analysis on chemistry-climate model (CCM) output of tropospheric ozone - an important greenhouse gas - from the recent Atmospheric Chemistry and Climate Model Intercomparison Project (ACCMIP). Tropospheric column ozone from the ACCMIP ensemble was clustered using the Data Density based Clustering (DDC) algorithm. We find that a multi-model mean (MMM) calculated using members of the most-populous cluster identified at each location offers a reduction of up to ˜ 20 % in the global absolute mean bias between the MMM and an observed satellite-based tropospheric ozone climatology, with respect to a simple, all-model MMM. On a spatial basis, the bias is reduced at ˜ 62 % of all locations, with the largest bias reductions occurring in the Northern Hemisphere - where ozone concentrations are relatively large. However, the bias is unchanged at 9 % of all locations and increases at 29 %, particularly in the Southern Hemisphere. The latter demonstrates that although cluster-based subsampling acts to remove outlier model data, such data may in fact be closer to observed values in some locations. We further demonstrate that clustering can provide a viable and
Analysis of temporal gene expression profiles: clustering by simulated annealing and determining the optimal number of clusters.

PubMed

Lukashin, A V; Fuchs, R

2001-05-01

Cluster analysis of genome-wide expression data from DNA microarray hybridization studies has proved to be a useful tool for identifying biologically relevant groupings of genes and samples. In the present paper, we focus on several important issues related to clustering algorithms that have not yet been fully studied. We describe a simple and robust algorithm for the clustering of temporal gene expression profiles that is based on the simulated annealing procedure. In general, this algorithm guarantees to eventually find the globally optimal distribution of genes over clusters. We introduce an iterative scheme that serves to evaluate quantitatively the optimal number of clusters for each specific data set. The scheme is based on standard approaches used in regular statistical tests. The basic idea is to organize the search of the optimal number of clusters simultaneously with the optimization of the distribution of genes over clusters. The efficiency of the proposed algorithm has been evaluated by means of a reverse engineering experiment, that is, a situation in which the correct distribution of genes over clusters is known a priori. The employment of this statistically rigorous test has shown that our algorithm places greater than 90% genes into correct clusters. Finally, the algorithm has been tested on real gene expression data (expression changes during yeast cell cycle) for which the fundamental patterns of gene expression and the assignment of genes to clusters are well understood from numerous previous studies.
Enrichment Strategies in Pediatric Drug Development: An Analysis of Trials Submitted to the US Food and Drug Administration.

PubMed

Green, Dionna J; Liu, Xiaomei I; Hua, Tianyi; Burnham, Janelle M; Schuck, Robert; Pacanowski, Michael; Yao, Lynne; McCune, Susan K; Burckart, Gilbert J; Zineh, Issam

2017-12-08

Clinical trial enrichment involves prospectively incorporating trial design elements that increase the probability of detecting a treatment effect. The use of enrichment strategies in pediatric drug development has not been systematically assessed. We analyzed the use of enrichment strategies in pediatric trials submitted to the US Food and Drug Administration from 2012-2016. In all, 112 efficacy studies associated with 76 drug development programs were assessed and their overall success rates were 78% and 75%, respectively. Eighty-eight trials (76.8%) employed at least one enrichment strategy; of these, 66.3% employed multiple enrichment strategies. The highest trial success rates were achieved when all three enrichment strategies (practical, predictive, and prognostic) were used together within a single trial (87.5%), while the lowest success rate was observed when no enrichment strategy was used (65.4%). The use of enrichment strategies in pediatric trials was found to be associated with trial and program success in our analysis. © 2017 American Society for Clinical Pharmacology and Therapeutics.
Globular Cluster Abundances from High-resolution, Integrated-light Spectroscopy. II. Expanding the Metallicity Range for Old Clusters and Updated Analysis Techniques

NASA Astrophysics Data System (ADS)

Colucci, Janet E.; Bernstein, Rebecca A.; McWilliam, Andrew

2017-01-01

We present abundances of globular clusters (GCs) in the Milky Way and Fornax from integrated-light (IL) spectra. Our goal is to evaluate the consistency of the IL analysis relative to standard abundance analysis for individual stars in those same clusters. This sample includes an updated analysis of seven clusters from our previous publications and results for five new clusters that expand the metallicity range over which our technique has been tested. We find that the [Fe/H] measured from IL spectra agrees to ˜0.1 dex for GCs with metallicities as high as [Fe/H] = -0.3, but the abundances measured for more metal-rich clusters may be underestimated. In addition we systematically evaluate the accuracy of abundance ratios, [X/Fe], for Na I, Mg I, Al I, Si I, Ca I, Ti I, Ti II, Sc II, V I, Cr I, Mn I, Co I, Ni I, Cu I, Y II, Zr I, Ba II, La II, Nd II, and Eu II. The elements for which the IL analysis gives results that are most similar to analysis of individual stellar spectra are Fe I, Ca I, Si I, Ni I, and Ba II. The elements that show the greatest differences include Mg I and Zr I. Some elements show good agreement only over a limited range in metallicity. More stellar abundance data in these clusters would enable more complete evaluation of the IL results for other important elements. This paper includes data gathered with the 6.5 m Magellan Telescopes located at Las Campanas Observatory, Chile.

The Hottest Horizontal-Branch Stars in Omega Centauri: Late Hot Flasher vs. Helium Enrichment

NASA Technical Reports Server (NTRS)

Moehler, S.; Dreizler, S.; Lanz, T.; Bono, G.; Sweigart, A V.; Calamida, A.; Monelli, M.; Nonino, M.

2007-01-01

UV observations of some massive globular clusters uncovered a significant population of very hot stars below the hot end of the horizontal branch (HB), the so-called blue hook stars. This feature might be explained either by the late hot flasher scenario here stars experience the helium flash while on the white dwarf cooling curve or by the helium-rich sub-population recently postulated to exist in some clusters. Spectroscopic analyses of blue hook stars in omega Cen and NGC 2808 support the late hot flasher scenario, but the stars contain much less helium than expected and the predicted C, N enrichment could not be verified from existing data. We want to determine effective temperatures, surface gravities and abundances of He, C, N in blue hook and canonical extreme horizontal branch (EHB) star candidates. Moderately high resolution spectra of stars at the hot end of the blue horizontal branch in the globular cluster omega Cen were analysed for atmospheric parameters (T(sub eff), log g) and abundances using LTE and Non-LTE model atmospheres. In the temperature range 30,000 K to 50,000 K we find that 37% of our stars are helium-poor (log nHe/nH less than -2), 49% have solar helium abundance within a factor of 3 (-1.5 less than or equal to log nHe/nH less than or equal to -0.5) and 14% are helium rich (log nHe/nH greater than -0.4). We also find carbon enrichment in step with helium enrichment, with a maximum carbon enrichment of 3% by mass. At least 30% of the hottest HB stars in omega Centauri show helium abundances well above the predictions from the helium enrichment scenario (Y = 0.42 corresponding to log nHe/nH approximately equal to -0.74). In addition the most helium-rich stars show strong carbon enrichment as predicted by the late hot flasher scenario. We conclude that the helium-rich HB stars in omega Cen cannot be explained solely by the helium-enrichment scenario invoked to explain the blue main sequence.
Evaluation and comparison of bioinformatic tools for the enrichment analysis of metabolomics data.

PubMed

Marco-Ramell, Anna; Palau-Rodriguez, Magali; Alay, Ania; Tulipani, Sara; Urpi-Sarda, Mireia; Sanchez-Pla, Alex; Andres-Lacueva, Cristina

2018-01-02

Bioinformatic tools for the enrichment of 'omics' datasets facilitate interpretation and understanding of data. To date few are suitable for metabolomics datasets. The main objective of this work is to give a critical overview, for the first time, of the performance of these tools. To that aim, datasets from metabolomic repositories were selected and enriched data were created. Both types of data were analysed with these tools and outputs were thoroughly examined. An exploratory multivariate analysis of the most used tools for the enrichment of metabolite sets, based on a non-metric multidimensional scaling (NMDS) of Jaccard's distances, was performed and mirrored their diversity. Codes (identifiers) of the metabolites of the datasets were searched in different metabolite databases (HMDB, KEGG, PubChem, ChEBI, BioCyc/HumanCyc, LipidMAPS, ChemSpider, METLIN and Recon2). The databases that presented more identifiers of the metabolites of the dataset were PubChem, followed by METLIN and ChEBI. However, these databases had duplicated entries and might present false positives. The performance of over-representation analysis (ORA) tools, including BioCyc/HumanCyc, ConsensusPathDB, IMPaLA, MBRole, MetaboAnalyst, Metabox, MetExplore, MPEA, PathVisio and Reactome and the mapping tool KEGGREST, was examined. Results were mostly consistent among tools and between real and enriched data despite the variability of the tools. Nevertheless, a few controversial results such as differences in the total number of metabolites were also found. Disease-based enrichment analyses were also assessed, but they were not found to be accurate probably due to the fact that metabolite disease sets are not up-to-date and the difficulty of predicting diseases from a list of metabolites. We have extensively reviewed the state-of-the-art of the available range of tools for metabolomic datasets, the completeness of metabolite databases, the performance of ORA methods and disease-based analyses
Two worlds collide: Image analysis methods for quantifying structural variation in cluster molecular dynamics

DOE Office of Scientific and Technical Information (OSTI.GOV)

Steenbergen, K. G., E-mail: kgsteen@gmail.com; Gaston, N.

2014-02-14

Inspired by methods of remote sensing image analysis, we analyze structural variation in cluster molecular dynamics (MD) simulations through a unique application of the principal component analysis (PCA) and Pearson Correlation Coefficient (PCC). The PCA analysis characterizes the geometric shape of the cluster structure at each time step, yielding a detailed and quantitative measure of structural stability and variation at finite temperature. Our PCC analysis captures bond structure variation in MD, which can be used to both supplement the PCA analysis as well as compare bond patterns between different cluster sizes. Relying only on atomic position data, without requirement formore » a priori structural input, PCA and PCC can be used to analyze both classical and ab initio MD simulations for any cluster composition or electronic configuration. Taken together, these statistical tools represent powerful new techniques for quantitative structural characterization and isomer identification in cluster MD.« less
Two worlds collide: image analysis methods for quantifying structural variation in cluster molecular dynamics.

PubMed

Steenbergen, K G; Gaston, N

2014-02-14

Inspired by methods of remote sensing image analysis, we analyze structural variation in cluster molecular dynamics (MD) simulations through a unique application of the principal component analysis (PCA) and Pearson Correlation Coefficient (PCC). The PCA analysis characterizes the geometric shape of the cluster structure at each time step, yielding a detailed and quantitative measure of structural stability and variation at finite temperature. Our PCC analysis captures bond structure variation in MD, which can be used to both supplement the PCA analysis as well as compare bond patterns between different cluster sizes. Relying only on atomic position data, without requirement for a priori structural input, PCA and PCC can be used to analyze both classical and ab initio MD simulations for any cluster composition or electronic configuration. Taken together, these statistical tools represent powerful new techniques for quantitative structural characterization and isomer identification in cluster MD.
Sputum neutrophil counts are associated with more severe asthma phenotypes using cluster analysis.

PubMed

Moore, Wendy C; Hastie, Annette T; Li, Xingnan; Li, Huashi; Busse, William W; Jarjour, Nizar N; Wenzel, Sally E; Peters, Stephen P; Meyers, Deborah A; Bleecker, Eugene R

2014-06-01

Clinical cluster analysis from the Severe Asthma Research Program (SARP) identified 5 asthma subphenotypes that represent the severity spectrum of early-onset allergic asthma, late-onset severe asthma, and severe asthma with chronic obstructive pulmonary disease characteristics. Analysis of induced sputum from a subset of SARP subjects showed 4 sputum inflammatory cellular patterns. Subjects with concurrent increases in eosinophil (≥2%) and neutrophil (≥40%) percentages had characteristics of very severe asthma. To better understand interactions between inflammation and clinical subphenotypes, we integrated inflammatory cellular measures and clinical variables in a new cluster analysis. Participants in SARP who underwent sputum induction at 3 clinical sites were included in this analysis (n = 423). Fifteen variables, including clinical characteristics and blood and sputum inflammatory cell assessments, were selected using factor analysis for unsupervised cluster analysis. Four phenotypic clusters were identified. Cluster A (n = 132) and B (n = 127) subjects had mild-to-moderate early-onset allergic asthma with paucigranulocytic or eosinophilic sputum inflammatory cell patterns. In contrast, these inflammatory patterns were present in only 7% of cluster C (n = 117) and D (n = 47) subjects who had moderate-to-severe asthma with frequent health care use despite treatment with high doses of inhaled or oral corticosteroids and, in cluster D, reduced lung function. The majority of these subjects (>83%) had sputum neutrophilia either alone or with concurrent sputum eosinophilia. Baseline lung function and sputum neutrophil percentages were the most important variables determining cluster assignment. This multivariate approach identified 4 asthma subphenotypes representing the severity spectrum from mild-to-moderate allergic asthma with minimal or eosinophil-predominant sputum inflammation to moderate-to-severe asthma with neutrophil-predominant or mixed granulocytic
GSL-enriched membrane microdomains in innate immune responses.

PubMed

Nakayama, Hitoshi; Ogawa, Hideoki; Takamori, Kenji; Iwabuchi, Kazuhisa

2013-06-01

Many pathogens target glycosphingolipids (GSLs), which, together with cholesterol, GPI-anchored proteins, and various signaling molecules, cluster on host cell membranes to form GSL-enriched membrane microdomains (lipid rafts). These GSL-enriched membrane microdomains may therefore be involved in host-pathogen interactions. Innate immune responses are triggered by the association of pathogens with phagocytes, such as neutrophils, macrophages and dendritic cells. Phagocytes express a diverse array of pattern-recognition receptors (PRRs), which sense invading microorganisms and trigger pathogen-specific signaling. PRRs can recognize highly conserved pathogen-associated molecular patterns expressed on microorganisms. The GSL lactosylceramide (LacCer, CDw17), which binds to various microorganisms, including Candida albicans, is expressed predominantly on the plasma membranes of human mature neutrophils and forms membrane microdomains together with the Src family tyrosine kinase Lyn. These LacCer-enriched membrane microdomains can mediate superoxide generation, migration, and phagocytosis, indicating that LacCer functions as a PRR in innate immunity. Moreover, the interactions of GSL-enriched membrane microdomains with membrane proteins, such as growth factor receptors, are important in mediating the physiological properties of these proteins. Similarly, we recently found that interactions between LacCer-enriched membrane microdomains and CD11b/CD18 (Mac-1, CR3, or αMβ2-integrin) are significant for neutrophil phagocytosis of non-opsonized microorganisms. This review describes the functional role of LacCer-enriched membrane microdomains and their interactions with CD11b/CD18.
Characterizing Heterogeneity within Head and Neck Lesions Using Cluster Analysis of Multi-Parametric MRI Data.

PubMed

Borri, Marco; Schmidt, Maria A; Powell, Ceri; Koh, Dow-Mu; Riddell, Angela M; Partridge, Mike; Bhide, Shreerang A; Nutting, Christopher M; Harrington, Kevin J; Newbold, Katie L; Leach, Martin O

2015-01-01

To describe a methodology, based on cluster analysis, to partition multi-parametric functional imaging data into groups (or clusters) of similar functional characteristics, with the aim of characterizing functional heterogeneity within head and neck tumour volumes. To evaluate the performance of the proposed approach on a set of longitudinal MRI data, analysing the evolution of the obtained sub-sets with treatment. The cluster analysis workflow was applied to a combination of dynamic contrast-enhanced and diffusion-weighted imaging MRI data from a cohort of squamous cell carcinoma of the head and neck patients. Cumulative distributions of voxels, containing pre and post-treatment data and including both primary tumours and lymph nodes, were partitioned into k clusters (k = 2, 3 or 4). Principal component analysis and cluster validation were employed to investigate data composition and to independently determine the optimal number of clusters. The evolution of the resulting sub-regions with induction chemotherapy treatment was assessed relative to the number of clusters. The clustering algorithm was able to separate clusters which significantly reduced in voxel number following induction chemotherapy from clusters with a non-significant reduction. Partitioning with the optimal number of clusters (k = 4), determined with cluster validation, produced the best separation between reducing and non-reducing clusters. The proposed methodology was able to identify tumour sub-regions with distinct functional properties, independently separating clusters which were affected differently by treatment. This work demonstrates that unsupervised cluster analysis, with no prior knowledge of the data, can be employed to provide a multi-parametric characterization of functional heterogeneity within tumour volumes.
Parallel Comparison of N-Linked Glycopeptide Enrichment Techniques Reveals Extensive Glycoproteomic Analysis of Plasma Enabled by SAX-ERLIC.

PubMed

Totten, Sarah M; Feasley, Christa L; Bermudez, Abel; Pitteri, Sharon J

2017-03-03

Protein glycosylation is of increasing interest due to its important roles in protein function and aberrant expression with disease. Characterizing protein glycosylation remains analytically challenging due to its low abundance, ion suppression issues, and microheterogeneity at glycosylation sites, especially in complex samples such as human plasma. In this study, the utility of three common N-linked glycopeptide enrichment techniques is compared using human plasma. By analysis on an LTQ-Orbitrap Elite mass spectrometer, electrostatic repulsion hydrophilic interaction liquid chromatography using strong anion exchange solid-phase extraction (SAX-ERLIC) provided the most extensive N-linked glycopeptide enrichment when compared with multilectin affinity chromatography (M-LAC) and Sepharose-HILIC enrichments. SAX-ERLIC enrichment yielded 191 unique glycoforms across 72 glycosylation sites from 48 glycoproteins, which is more than double that detected using other enrichment techniques. The greatest glycoform diversity was observed in SAX-ERLIC enrichment, with no apparent bias toward specific glycan types. SAX-ERLIC enrichments were additionally analyzed by an Orbitrap Fusion Lumos mass spectrometer to maximize glycopeptide identifications for a more comprehensive assessment of protein glycosylation. In these experiments, 829 unique glycoforms were identified across 208 glycosylation sites from 95 plasma glycoproteins, a significant improvement from the initial method comparison and one of the most extensive site-specific glycosylation analysis in immunodepleted human plasma to date. Data are available via ProteomeXchange with identifier PXD005655.
Clustering gene expression regulators: new approach to disease subtyping.

PubMed

Pyatnitskiy, Mikhail; Mazo, Ilya; Shkrob, Maria; Schwartz, Elena; Kotelnikova, Ekaterina

2014-01-01

One of the main challenges in modern medicine is to stratify different patient groups in terms of underlying disease molecular mechanisms as to develop more personalized approach to therapy. Here we propose novel method for disease subtyping based on analysis of activated expression regulators on a sample-by-sample basis. Our approach relies on Sub-Network Enrichment Analysis algorithm (SNEA) which identifies gene subnetworks with significant concordant changes in expression between two conditions. Subnetwork consists of central regulator and downstream genes connected by relations extracted from global literature-extracted regulation database. Regulators found in each patient separately are clustered together and assigned activity scores which are used for final patients grouping. We show that our approach performs well compared to other related methods and at the same time provides researchers with complementary level of understanding of pathway-level biology behind a disease by identification of significant expression regulators. We have observed the reasonable grouping of neuromuscular disorders (triggered by structural damage vs triggered by unknown mechanisms), that was not revealed using standard expression profile clustering. For another experiment we were able to suggest the clusters of regulators, responsible for colorectal carcinoma vs adenoma discrimination and identify frequently genetically changed regulators that could be of specific importance for the individual characteristics of cancer development. Proposed approach can be regarded as biologically meaningful feature selection, reducing tens of thousands of genes down to dozens of clusters of regulators. Obtained clusters of regulators make possible to generate valuable biological hypotheses about molecular mechanisms related to a clinical outcome for individual patient.
Clustering Gene Expression Regulators: New Approach to Disease Subtyping

PubMed Central

Pyatnitskiy, Mikhail; Mazo, Ilya; Shkrob, Maria; Schwartz, Elena; Kotelnikova, Ekaterina

2014-01-01

One of the main challenges in modern medicine is to stratify different patient groups in terms of underlying disease molecular mechanisms as to develop more personalized approach to therapy. Here we propose novel method for disease subtyping based on analysis of activated expression regulators on a sample-by-sample basis. Our approach relies on Sub-Network Enrichment Analysis algorithm (SNEA) which identifies gene subnetworks with significant concordant changes in expression between two conditions. Subnetwork consists of central regulator and downstream genes connected by relations extracted from global literature-extracted regulation database. Regulators found in each patient separately are clustered together and assigned activity scores which are used for final patients grouping. We show that our approach performs well compared to other related methods and at the same time provides researchers with complementary level of understanding of pathway-level biology behind a disease by identification of significant expression regulators. We have observed the reasonable grouping of neuromuscular disorders (triggered by structural damage vs triggered by unknown mechanisms), that was not revealed using standard expression profile clustering. For another experiment we were able to suggest the clusters of regulators, responsible for colorectal carcinoma vs adenoma discrimination and identify frequently genetically changed regulators that could be of specific importance for the individual characteristics of cancer development. Proposed approach can be regarded as biologically meaningful feature selection, reducing tens of thousands of genes down to dozens of clusters of regulators. Obtained clusters of regulators make possible to generate valuable biological hypotheses about molecular mechanisms related to a clinical outcome for individual patient. PMID:24416320
Impact of Upfront Cellular Enrichment by Laser Capture Microdissection on Protein and Phosphoprotein Drug Target Signaling Activation Measurements in Human Lung Cancer: Implications for Personalized Medicine

PubMed Central

Elisa, Baldelli; B., Haura Eric; Lucio, Crinò; Douglas, Cress W.; Vienna, Ludovini; B., Schabath Matthew; A., Liotta Lance; F., Petricoin Emanuel; Mariaelena, Pierobon

2015-01-01

Purpose The aim of this study was to evaluate whether upfront cellular enrichment via laser capture microdissection is necessary for accurately quantifying predictive biomarkers in non-small cell lung cancer tumors. Experimental design Fifteen snap frozen surgical biopsies were analyzed. Whole tissue lysate and matched highly enriched tumor epithelium via laser capture microdissection (LCM) were obtained for each patient. The expression and activation/phosphorylation levels of 26 proteins were measured by reverse phase protein microarray. Differences in signaling architecture of dissected and undissected matched pairs were visualized using unsupervised clustering analysis, bar graphs, and scatter plots. Results Overall patient matched LCM and undissected material displayed very distinct and differing signaling architectures with 93% of the matched pairs clustering separately. These differences were seen regardless of the amount of starting tumor epithelial content present in the specimen. Conclusions and clinical relevance These results indicate that LCM driven upfront cellular enrichment is necessary to accurately determine the expression/activation levels of predictive protein signaling markers although results should be evaluated in larger clinical settings. Upfront cellular enrichment of the target cell appears to be an important part of the workflow needed for the accurate quantification of predictive protein signaling biomarkers. Larger independent studies are warranted. PMID:25676683
Autoantibodies in pediatric systemic lupus erythematosus: ethnic grouping, cluster analysis, and clinical correlations.

PubMed

Jurencák, Roman; Fritzler, Marvin; Tyrrell, Pascal; Hiraki, Linda; Benseler, Susanne; Silverman, Earl

2009-02-01

(1) To evaluate the spectrum of serum autoantibodies in pediatric-onset systemic lupus erythematosus (pSLE) with a focus on ethnic differences; (2) using cluster analysis, to identify patients with similar autoantibody patterns and to determine their clinical associations. A single-center cohort study of all patients with newly diagnosed pSLE seen over an 8-year period was performed. Ethnicity, clinical, and serological data were prospectively collected from 156/169 patients (92%). The frequencies of 10 selected autoantibodies among ethnic groups were compared. Cluster analysis identified groups of patients with similar autoantibody profiles. Associations of these groups with clinical and laboratory features of pSLE were examined. Among our 5 ethnic groups, there were differences only in the prevalence of anti-U1RNP and anti-Sm antibodies, which occurred more frequently in non-Caucasian patients (p < 0.0001, p < 0.01, respectively). Cluster analysis revealed 3 autoantibody clusters. Cluster 1 consisted of anti-dsDNA antibodies. Cluster 2 consisted of anti-dsDNA, antichromatin, antiribosomal P, anti-U1RNP, anti-Sm, anti-Ro and anti-La autoantibody. Cluster 3 consisted of anti-dsDNA, anti-RNP, and anti-Sm autoantibody. The highest proportion of Caucasians was in cluster 1 (p < 0.05), which was characterized by a mild disease with infrequent major organ involvement compared to cluster 2, which had the highest frequency of nephritis, renal failure, serositis, and hemolytic anemia, or cluster 3, which was characterized by frequent neuropsychiatric disease and nephritis. We observed ethnic differences in autoantibody profiles in pSLE. Autoantibodies tended to cluster together and these clusters were associated with different clinical courses.
Phenotypes determined by cluster analysis in severe or difficult-to-treat asthma.

PubMed

Schatz, Michael; Hsu, Jin-Wen Y; Zeiger, Robert S; Chen, Wansu; Dorenbaum, Alejandro; Chipps, Bradley E; Haselkorn, Tmirah

2014-06-01

Asthma phenotyping can facilitate understanding of disease pathogenesis and potential targeted therapies. To further characterize the distinguishing features of phenotypic groups in difficult-to-treat asthma. Children ages 6-11 years (n = 518) and adolescents and adults ages ≥12 years (n = 3612) with severe or difficult-to-treat asthma from The Epidemiology and Natural History of Asthma: Outcomes and Treatment Regimens (TENOR) study were evaluated in this post hoc cluster analysis. Analyzed variables included sex, race, atopy, age of asthma onset, smoking (adolescents and adults), passive smoke exposure (children), obesity, and aspirin sensitivity. Cluster analysis used the hierarchical clustering algorithm with the Ward minimum variance method. The results were compared among clusters by χ(2) analysis; variables with significant (P < .05) differences among clusters were considered as distinguishing feature candidates. Associations among clusters and asthma-related health outcomes were assessed in multivariable analyses by adjusting for socioeconomic status, environmental exposures, and intensity of therapy. Five clusters were identified in each age stratum. Sex, atopic status, and nonwhite race were distinguishing variables in both strata; passive smoke exposure was distinguishing in children and aspirin sensitivity in adolescents and adults. Clusters were not related to outcomes in children, but 2 adult and adolescent clusters distinguished by nonwhite race and aspirin sensitivity manifested poorer quality of life (P < .0001), and the aspirin-sensitive cluster experienced more frequent asthma exacerbations (P < .0001). Distinct phenotypes appear to exist in patients with severe or difficult-to-treat asthma, which is related to outcomes in adolescents and adults but not in children. The study of the therapeutic implications of these phenotypes is warranted. Copyright © 2013 American Academy of Allergy, Asthma & Immunology. Published by Mosby, Inc. All rights
Phenotypes of comorbidity in OSAS patients: combining categorical principal component analysis with cluster analysis.

PubMed

Vavougios, George D; George D, George; Pastaka, Chaido; Zarogiannis, Sotirios G; Gourgoulianis, Konstantinos I

2016-02-01

Phenotyping obstructive sleep apnea syndrome's comorbidity has been attempted for the first time only recently. The aim of our study was to determine phenotypes of comorbidity in obstructive sleep apnea syndrome patients employing a data-driven approach. Data from 1472 consecutive patient records were recovered from our hospital's database. Categorical principal component analysis and two-step clustering were employed to detect distinct clusters in the data. Univariate comparisons between clusters included one-way analysis of variance with Bonferroni correction and chi-square tests. Predictors of pairwise cluster membership were determined via a binary logistic regression model. The analyses revealed six distinct clusters: A, 'healthy, reporting sleeping related symptoms'; B, 'mild obstructive sleep apnea syndrome without significant comorbidities'; C1: 'moderate obstructive sleep apnea syndrome, obesity, without significant comorbidities'; C2: 'moderate obstructive sleep apnea syndrome with severe comorbidity, obesity and the exclusive inclusion of stroke'; D1: 'severe obstructive sleep apnea syndrome and obesity without comorbidity and a 33.8% prevalence of hypertension'; and D2: 'severe obstructive sleep apnea syndrome with severe comorbidities, along with the highest Epworth Sleepiness Scale score and highest body mass index'. Clusters differed significantly in apnea-hypopnea index, oxygen desaturation index; arousal index; age, body mass index, minimum oxygen saturation and daytime oxygen saturation (one-way analysis of variance P < 0.0001). Binary logistic regression indicated that older age, greater body mass index, lower daytime oxygen saturation and hypertension were associated independently with an increased risk of belonging in a comorbid cluster. Six distinct phenotypes of obstructive sleep apnea syndrome and its comorbidities were identified. Mapping the heterogeneity of the obstructive sleep apnea syndrome may help the early identification of at
Development and optimization of SPECT gated blood pool cluster analysis for the prediction of CRT outcome.

PubMed

Lalonde, Michel; Wells, R Glenn; Birnie, David; Ruddy, Terrence D; Wassenaar, Richard

2014-07-01

Phase analysis of single photon emission computed tomography (SPECT) radionuclide angiography (RNA) has been investigated for its potential to predict the outcome of cardiac resynchronization therapy (CRT). However, phase analysis may be limited in its potential at predicting CRT outcome as valuable information may be lost by assuming that time-activity curves (TAC) follow a simple sinusoidal shape. A new method, cluster analysis, is proposed which directly evaluates the TACs and may lead to a better understanding of dyssynchrony patterns and CRT outcome. Cluster analysis algorithms were developed and optimized to maximize their ability to predict CRT response. About 49 patients (N = 27 ischemic etiology) received a SPECT RNA scan as well as positron emission tomography (PET) perfusion and viability scans prior to undergoing CRT. A semiautomated algorithm sampled the left ventricle wall to produce 568 TACs from SPECT RNA data. The TACs were then subjected to two different cluster analysis techniques, K-means, and normal average, where several input metrics were also varied to determine the optimal settings for the prediction of CRT outcome. Each TAC was assigned to a cluster group based on the comparison criteria and global and segmental cluster size and scores were used as measures of dyssynchrony and used to predict response to CRT. A repeated random twofold cross-validation technique was used to train and validate the cluster algorithm. Receiver operating characteristic (ROC) analysis was used to calculate the area under the curve (AUC) and compare results to those obtained for SPECT RNA phase analysis and PET scar size analysis methods. Using the normal average cluster analysis approach, the septal wall produced statistically significant results for predicting CRT results in the ischemic population (ROC AUC = 0.73;p < 0.05 vs. equal chance ROC AUC = 0.50) with an optimal operating point of 71% sensitivity and 60% specificity. Cluster analysis results were
Cluster Analysis of the Luria-Nebraska Neuropsychological Battery with Learning Disabled Adults.

ERIC Educational Resources Information Center

McCue, Michael; And Others

The study reports a cluster analysis of Luria-Nebraska Neuropsychological Battery sources of 25 learning disabled adults. The cluster analysis suggested the presence of three subgroups within this sample, one having high elevations on the Rhythm, Writing, Reading, and Arithmetic Rhythm scales, the second having an extremely high evelation on the…
Multi-viewpoint clustering analysis

NASA Technical Reports Server (NTRS)

Mehrotra, Mala; Wild, Chris

1993-01-01

In this paper, we address the feasibility of partitioning rule-based systems into a number of meaningful units to enhance the comprehensibility, maintainability and reliability of expert systems software. Preliminary results have shown that no single structuring principle or abstraction hierarchy is sufficient to understand complex knowledge bases. We therefore propose the Multi View Point - Clustering Analysis (MVP-CA) methodology to provide multiple views of the same expert system. We present the results of using this approach to partition a deployed knowledge-based system that navigates the Space Shuttle's entry. We also discuss the impact of this approach on verification and validation of knowledge-based systems.
Cluster Analysis of Vulnerable Groups in Acute Traumatic Brain Injury Rehabilitation.

PubMed

Kucukboyaci, N Erkut; Long, Coralynn; Smith, Michelle; Rath, Joseph F; Bushnik, Tamara

2018-01-06

To analyze the complex relation between various social indicators that contribute to socioeconomic status and health care barriers. Cluster analysis of historical patient data obtained from inpatient visits. Inpatient rehabilitation unit in a large urban university hospital. Adult patients (N=148) receiving acute inpatient care, predominantly for closed head injury. Not applicable. We examined the membership of patients with traumatic brain injury in various "vulnerable group" clusters (eg, homeless, unemployed, racial/ethnic minority) and characterized the rehabilitation outcomes of patients (eg, duration of stay, changes in FIM scores between admission to inpatient stay and discharge). The cluster analysis revealed 4 major clusters (ie, clusters A-D) separated by vulnerable group memberships, with distinct durations of stay and FIM gains during their stay. Cluster B, the largest cluster and also consisting of mostly racial/ethnic minorities, had the shortest duration of hospital stay and one of the lowest FIM improvements among the 4 clusters despite higher FIM scores at admission. In cluster C, also consisting of mostly ethnic minorities with multiple socioeconomic status vulnerabilities, patients were characterized by low cognitive FIM scores at admission and the longest duration of stay, and they showed good improvement in FIM scores. Application of clustering techniques to inpatient data identified distinct clusters of patients who may experience differences in their rehabilitation outcome due to their membership in various "at-risk" groups. The results identified patients (ie, cluster B, with minority patients; and cluster D, with elderly patients) who attain below-average gains in brain injury rehabilitation. The results also suggested that systemic (eg, duration of stay) or clinical service improvements (eg, staff's language skills, ability to offer substance abuse therapy, provide appropriate referrals, liaise with intensive social work services, or plan
Clustering analysis for muon tomography data elaboration in the Muon Portal project

NASA Astrophysics Data System (ADS)

Bandieramonte, M.; Antonuccio-Delogu, V.; Becciani, U.; Costa, A.; La Rocca, P.; Massimino, P.; Petta, C.; Pistagna, C.; Riggi, F.; Riggi, S.; Sciacca, E.; Vitello, F.

2015-05-01

Clustering analysis is one of multivariate data analysis techniques which allows to gather statistical data units into groups, in order to minimize the logical distance within each group and to maximize the one between different groups. In these proceedings, the authors present a novel approach to the muontomography data analysis based on clustering algorithms. As a case study we present the Muon Portal project that aims to build and operate a dedicated particle detector for the inspection of harbor containers to hinder the smuggling of nuclear materials. Clustering techniques, working directly on scattering points, help to detect the presence of suspicious items inside the container, acting, as it will be shown, as a filter for a preliminary analysis of the data.
Enrichment and single-cell analysis of circulating tumor cells

PubMed Central

Song, Yanling; Tian, Tian; Shi, Yuanzhi; Liu, Wenli; Zou, Yuan; Khajvand, Tahereh; Wang, Sili; Zhu, Zhi

2017-01-01

Up to 90% of cancer-related deaths are caused by metastatic cancer. Circulating tumor cells (CTCs), a type of cancer cell that spreads through the blood after detaching from a solid tumor, are essential for the establishment of distant metastasis for a given cancer. As a new type of liquid biopsy, analysis of CTCs offers the possibility to avoid invasive tissue biopsy procedures with practical implications for diagnostics. The fundamental challenges of analyzing and profiling CTCs are the extremely low abundances of CTCs in the blood and the intrinsic heterogeneity of CTCs. Various technologies have been proposed for the enrichment and single-cell analysis of CTCs. This review aims to provide in-depth insights into CTC analysis, including various techniques for isolation of CTCs with capture methods based on physical and biochemical principles, and single-cell analysis of CTCs at the genomic, proteomic and phenotypic level, as well as current developmental trends and promising research directions. PMID:28451298

A novel symptom cluster analysis among ambulatory HIV/AIDS patients in Uganda.

PubMed

Namisango, Eve; Harding, Richard; Katabira, Elly T; Siegert, Richard J; Powell, Richard A; Atuhaire, Leonard; Moens, Katrien; Taylor, Steve

2015-01-01

Symptom clusters are gaining importance given HIV/AIDS patients experience multiple, concurrent symptoms. This study aimed to: determine clusters of patients with similar symptom combinations; describe symptom combinations distinguishing the clusters; and evaluate the clusters regarding patient socio-demographic, disease and treatment characteristics, quality of life (QOL) and functional performance. This was a cross-sectional study of 302 adult HIV/AIDS outpatients consecutively recruited at two teaching and referral hospitals in Uganda. Socio-demographic and seven-day period symptom prevalence and distress data were self-reported using the Memorial Symptom Assessment Schedule. QOL was assessed using the Medical Outcome Scale and functional performance using the Karnofsky Performance Scale. Symptom clusters were established using hierarchical cluster analysis with squared Euclidean distances using Ward's clustering methods based on symptom occurrence. Analysis of variance compared clusters on mean QOL and functional performance scores. Patient subgroups were categorised based on symptom occurrence rates. Five symptom occurrence clusters were identified: Cluster 1 (n=107), high-low for sensory discomfort and eating difficulties symptoms; Cluster 2 (n=47), high-low for psycho-gastrointestinal symptoms; Cluster 3 (n=71), high for pain and sensory disturbance symptoms; Cluster 4 (n=35), all high for general HIV/AIDS symptoms; and Cluster 5 (n=48), all low for mood-cognitive symptoms. The all high occurrence cluster was associated with worst functional status, poorest QOL scores and highest symptom-associated distress. Use of antiretroviral therapy was associated with all high symptom occurrence rate (Fisher's exact=4, P<0.001). CD4 count group below 200 was associated with the all high occurrence rate symptom cluster (Fisher's exact=41, P<0.001). Symptom clusters have a differential, affect HIV/AIDS patients' self-reported outcomes, with the subgroup experiencing high
Novel approach to classifying patients with pulmonary arterial hypertension using cluster analysis.

PubMed

Parikh, Kishan S; Rao, Youlan; Ahmad, Tariq; Shen, Kai; Felker, G Michael; Rajagopal, Sudarshan

2017-01-01

Pulmonary arterial hypertension (PAH) patients have distinct disease courses and responses to treatment, but current diagnostic and treatment schemes provide limited insight. We aimed to see if cluster analysis could distinguish clinical phenotypes in PAH. An unbiased cluster analysis was performed on 17 baseline clinical variables of PAH patients from the FREEDOM-M, FREEDOM-C, and FREEDOM-C2 randomized trials of oral treprostinil versus placebo. Participants were either treatment-naïve (FREEDOM-M) or on background therapy (FREEDOM-C, FREEDOM-C2). We tested for association of clusters with outcomes and interaction with respect to treatment. Primary outcome was 6-minute walking distance (6MWD) change. We included 966 participants with 12-week (FREEDOM-M) or 16-week (FREEDOM-C and FREEDOM-C2) follow-up. Four patient clusters were identified. Compared with Clusters 1 (n = 131) and 2 (n = 496), Clusters 3 (n = 246) and 4 (n = 93) patients were older, heavier, had worse baseline functional class, 6MWD, Borg Dyspnea Index, and fewer years since PAH diagnosis. Clusters also differed by PAH etiology and background therapies, but not gender or race. Mean treatment effect of oral treprostinil differed across Clusters 1-4 increased in a monotonic fashion (Cluster 1: 10.9 m; Cluster 2: 13.0 m; Cluster 3: 25.0 m; Cluster 4: 50.9 m; interaction P value = 0.048). We identified four distinct clusters of PAH patients based on common patient characteristics. Patients who were older, diagnosed with PAH for a shorter period, and had worse baseline symptoms and exercise capacity had the greatest response to oral treprostinil treatment.
SOMFlow: Guided Exploratory Cluster Analysis with Self-Organizing Maps and Analytic Provenance.

PubMed

Sacha, Dominik; Kraus, Matthias; Bernard, Jurgen; Behrisch, Michael; Schreck, Tobias; Asano, Yuki; Keim, Daniel A

2018-01-01

Clustering is a core building block for data analysis, aiming to extract otherwise hidden structures and relations from raw datasets, such as particular groups that can be effectively related, compared, and interpreted. A plethora of visual-interactive cluster analysis techniques has been proposed to date, however, arriving at useful clusterings often requires several rounds of user interactions to fine-tune the data preprocessing and algorithms. We present a multi-stage Visual Analytics (VA) approach for iterative cluster refinement together with an implementation (SOMFlow) that uses Self-Organizing Maps (SOM) to analyze time series data. It supports exploration by offering the analyst a visual platform to analyze intermediate results, adapt the underlying computations, iteratively partition the data, and to reflect previous analytical activities. The history of previous decisions is explicitly visualized within a flow graph, allowing to compare earlier cluster refinements and to explore relations. We further leverage quality and interestingness measures to guide the analyst in the discovery of useful patterns, relations, and data partitions. We conducted two pair analytics experiments together with a subject matter expert in speech intonation research to demonstrate that the approach is effective for interactive data analysis, supporting enhanced understanding of clustering results as well as the interactive process itself.
The Use of Cluster Analysis in Typological Research on Community College Students

ERIC Educational Resources Information Center

Bahr, Peter Riley; Bielby, Rob; House, Emily

2011-01-01

One useful and increasingly popular method of classifying students is known commonly as cluster analysis. The variety of techniques that comprise the cluster analytic family are intended to sort observations (for example, students) within a data set into subsets (clusters) that share similar characteristics and differ in meaningful ways from other…
Characterizing Heterogeneity within Head and Neck Lesions Using Cluster Analysis of Multi-Parametric MRI Data

PubMed Central

Borri, Marco; Schmidt, Maria A.; Powell, Ceri; Koh, Dow-Mu; Riddell, Angela M.; Partridge, Mike; Bhide, Shreerang A.; Nutting, Christopher M.; Harrington, Kevin J.; Newbold, Katie L.; Leach, Martin O.

2015-01-01

Purpose To describe a methodology, based on cluster analysis, to partition multi-parametric functional imaging data into groups (or clusters) of similar functional characteristics, with the aim of characterizing functional heterogeneity within head and neck tumour volumes. To evaluate the performance of the proposed approach on a set of longitudinal MRI data, analysing the evolution of the obtained sub-sets with treatment. Material and Methods The cluster analysis workflow was applied to a combination of dynamic contrast-enhanced and diffusion-weighted imaging MRI data from a cohort of squamous cell carcinoma of the head and neck patients. Cumulative distributions of voxels, containing pre and post-treatment data and including both primary tumours and lymph nodes, were partitioned into k clusters (k = 2, 3 or 4). Principal component analysis and cluster validation were employed to investigate data composition and to independently determine the optimal number of clusters. The evolution of the resulting sub-regions with induction chemotherapy treatment was assessed relative to the number of clusters. Results The clustering algorithm was able to separate clusters which significantly reduced in voxel number following induction chemotherapy from clusters with a non-significant reduction. Partitioning with the optimal number of clusters (k = 4), determined with cluster validation, produced the best separation between reducing and non-reducing clusters. Conclusion The proposed methodology was able to identify tumour sub-regions with distinct functional properties, independently separating clusters which were affected differently by treatment. This work demonstrates that unsupervised cluster analysis, with no prior knowledge of the data, can be employed to provide a multi-parametric characterization of functional heterogeneity within tumour volumes. PMID:26398888
Variable number of tandem repeats and pulsed-field gel electrophoresis cluster analysis of enterohemorrhagic Escherichia coli serovar O157 strains.

PubMed

Yokoyama, Eiji; Uchimura, Masako

2007-11-01

Ninety-five enterohemorrhagic Escherichia coli serovar O157 strains, including 30 strains isolated from 13 intrafamily outbreaks and 14 strains isolated from 3 mass outbreaks, were studied by pulsed-field gel electrophoresis (PFGE) and variable number of tandem repeats (VNTR) typing, and the resulting data were subjected to cluster analysis. Cluster analysis of the VNTR typing data revealed that 57 (60.0%) of 95 strains, including all epidemiologically linked strains, formed clusters with at least 95% similarity. Cluster analysis of the PFGE patterns revealed that 67 (70.5%) of 95 strains, including all but 1 of the epidemiologically linked strains, formed clusters with 90% similarity. The number of epidemiologically unlinked strains forming clusters was significantly less by VNTR cluster analysis than by PFGE cluster analysis. The congruence value between PFGE and VNTR cluster analysis was low and did not show an obvious correlation. With two-step cluster analysis, the number of clustered epidemiologically unlinked strains by PFGE cluster analysis that were divided by subsequent VNTR cluster analysis was significantly higher than the number by VNTR cluster analysis that were divided by subsequent PFGE cluster analysis. These results indicate that VNTR cluster analysis is more efficient than PFGE cluster analysis as an epidemiological tool to trace the transmission of enterohemorrhagic E. coli O157.
Transcriptional analysis of exopolysaccharides biosynthesis gene clusters in Lactobacillus plantarum.

PubMed

Vastano, Valeria; Perrone, Filomena; Marasco, Rosangela; Sacco, Margherita; Muscariello, Lidia

2016-04-01

Exopolysaccharides (EPS) from lactic acid bacteria contribute to specific rheology and texture of fermented milk products and find applications also in non-dairy foods and in therapeutics. Recently, four clusters of genes (cps) associated with surface polysaccharide production have been identified in Lactobacillus plantarum WCFS1, a probiotic and food-associated lactobacillus. These clusters are involved in cell surface architecture and probably in release and/or exposure of immunomodulating bacterial molecules. Here we show a transcriptional analysis of these clusters. Indeed, RT-PCR experiments revealed that the cps loci are organized in five operons. Moreover, by reverse transcription-qPCR analysis performed on L. plantarum WCFS1 (wild type) and WCFS1-2 (ΔccpA), we demonstrated that expression of three cps clusters is under the control of the global regulator CcpA. These results, together with the identification of putative CcpA target sequences (catabolite responsive element CRE) in the regulatory region of four out of five transcriptional units, strongly suggest for the first time a role of the master regulator CcpA in EPS gene transcription among lactobacilli.
Bioinformatics/biostatistics: microarray analysis.

PubMed

Eichler, Gabriel S

2012-01-01

The quantity and complexity of the molecular-level data generated in both research and clinical settings require the use of sophisticated, powerful computational interpretation techniques. It is for this reason that bioinformatic analysis of complex molecular profiling data has become a fundamental technology in the development of personalized medicine. This chapter provides a high-level overview of the field of bioinformatics and outlines several, classic bioinformatic approaches. The highlighted approaches can be aptly applied to nearly any sort of high-dimensional genomic, proteomic, or metabolomic experiments. Reviewed technologies in this chapter include traditional clustering analysis, the Gene Expression Dynamics Inspector (GEDI), GoMiner (GoMiner), Gene Set Enrichment Analysis (GSEA), and the Learner of Functional Enrichment (LeFE).
Length bias correction in gene ontology enrichment analysis using logistic regression.

PubMed

Mi, Gu; Di, Yanming; Emerson, Sarah; Cumbie, Jason S; Chang, Jeff H

2012-01-01

When assessing differential gene expression from RNA sequencing data, commonly used statistical tests tend to have greater power to detect differential expression of genes encoding longer transcripts. This phenomenon, called "length bias", will influence subsequent analyses such as Gene Ontology enrichment analysis. In the presence of length bias, Gene Ontology categories that include longer genes are more likely to be identified as enriched. These categories, however, are not necessarily biologically more relevant. We show that one can effectively adjust for length bias in Gene Ontology analysis by including transcript length as a covariate in a logistic regression model. The logistic regression model makes the statistical issue underlying length bias more transparent: transcript length becomes a confounding factor when it correlates with both the Gene Ontology membership and the significance of the differential expression test. The inclusion of the transcript length as a covariate allows one to investigate the direct correlation between the Gene Ontology membership and the significance of testing differential expression, conditional on the transcript length. We present both real and simulated data examples to show that the logistic regression approach is simple, effective, and flexible.
[Raman spectroscopy fluorescence background correction and its application in clustering analysis of medicines].

PubMed

Chen, Shan; Li, Xiao-ning; Liang, Yi-zeng; Zhang, Zhi-min; Liu, Zhao-xia; Zhang, Qi-ming; Ding, Li-xia; Ye, Fei

2010-08-01

During Raman spectroscopy analysis, the organic molecules and contaminations will obscure or swamp Raman signals. The present study starts from Raman spectra of prednisone acetate tablets and glibenclamide tables, which are acquired from the BWTek i-Raman spectrometer. The background is corrected by R package baselineWavelet. Then principle component analysis and random forests are used to perform clustering analysis. Through analyzing the Raman spectra of two medicines, the accurate and validity of this background-correction algorithm is checked and the influences of fluorescence background on Raman spectra clustering analysis is discussed. Thus, it is concluded that it is important to correct fluorescence background for further analysis, and an effective background correction solution is provided for clustering or other analysis.
Concordant integrative gene set enrichment analysis of multiple large-scale two-sample expression data sets.

PubMed

Lai, Yinglei; Zhang, Fanni; Nayak, Tapan K; Modarres, Reza; Lee, Norman H; McCaffrey, Timothy A

2014-01-01

Gene set enrichment analysis (GSEA) is an important approach to the analysis of coordinate expression changes at a pathway level. Although many statistical and computational methods have been proposed for GSEA, the issue of a concordant integrative GSEA of multiple expression data sets has not been well addressed. Among different related data sets collected for the same or similar study purposes, it is important to identify pathways or gene sets with concordant enrichment. We categorize the underlying true states of differential expression into three representative categories: no change, positive change and negative change. Due to data noise, what we observe from experiments may not indicate the underlying truth. Although these categories are not observed in practice, they can be considered in a mixture model framework. Then, we define the mathematical concept of concordant gene set enrichment and calculate its related probability based on a three-component multivariate normal mixture model. The related false discovery rate can be calculated and used to rank different gene sets. We used three published lung cancer microarray gene expression data sets to illustrate our proposed method. One analysis based on the first two data sets was conducted to compare our result with a previous published result based on a GSEA conducted separately for each individual data set. This comparison illustrates the advantage of our proposed concordant integrative gene set enrichment analysis. Then, with a relatively new and larger pathway collection, we used our method to conduct an integrative analysis of the first two data sets and also all three data sets. Both results showed that many gene sets could be identified with low false discovery rates. A consistency between both results was also observed. A further exploration based on the KEGG cancer pathway collection showed that a majority of these pathways could be identified by our proposed method. This study illustrates that we can
Equivalent damage validation by variable cluster analysis

NASA Astrophysics Data System (ADS)

Drago, Carlo; Ferlito, Rachele; Zucconi, Maria

2016-06-01

The main aim of this work is to perform a clustering analysis on the damage relieved in the old center of L'Aquila after the earthquake occurred on April 6, 2009 and to validate an Indicator of Equivalent Damage ED that summarizes the information reported on the AeDES card regarding the level of damage and their extension on the surface of the buildings. In particular we used a sample of 13442 masonry buildings located in an area characterized by a Macroseismic Intensity equal to 8 [1]. The aim is to ensure the coherence between the clusters and its hierarchy identified in the data of damage detected and in the data of the ED elaborated.
Cluster analysis of spontaneous preterm birth phenotypes identifies potential associations among preterm birth mechanisms.

PubMed

Esplin, M Sean; Manuck, Tracy A; Varner, Michael W; Christensen, Bryce; Biggio, Joseph; Bukowski, Radek; Parry, Samuel; Zhang, Heping; Huang, Hao; Andrews, William; Saade, George; Sadovsky, Yoel; Reddy, Uma M; Ilekis, John

2015-09-01

We sought to use an innovative tool that is based on common biologic pathways to identify specific phenotypes among women with spontaneous preterm birth (SPTB) to enhance investigators' ability to identify and to highlight common mechanisms and underlying genetic factors that are responsible for SPTB. We performed a secondary analysis of a prospective case-control multicenter study of SPTB. All cases delivered a preterm singleton at SPTB ≤34.0 weeks' gestation. Each woman was assessed for the presence of underlying SPTB causes. A hierarchic cluster analysis was used to identify groups of women with homogeneous phenotypic profiles. One of the phenotypic clusters was selected for candidate gene association analysis with the use of VEGAS software. One thousand twenty-eight women with SPTB were assigned phenotypes. Hierarchic clustering of the phenotypes revealed 5 major clusters. Cluster 1 (n = 445) was characterized by maternal stress; cluster 2 (n = 294) was characterized by premature membrane rupture; cluster 3 (n = 120) was characterized by familial factors, and cluster 4 (n = 63) was characterized by maternal comorbidities. Cluster 5 (n = 106) was multifactorial and characterized by infection (INF), decidual hemorrhage (DH), and placental dysfunction (PD). These 3 phenotypes were correlated highly by χ(2) analysis (PD and DH, P < 2.2e-6; PD and INF, P = 6.2e-10; INF and DH, (P = .0036). Gene-based testing identified the INS (insulin) gene as significantly associated with cluster 3 of SPTB. We identified 5 major clusters of SPTB based on a phenotype tool and hierarch clustering. There was significant correlation between several of the phenotypes. The INS gene was associated with familial factors that were underlying SPTB. Copyright © 2015 Elsevier Inc. All rights reserved.
[Principal component analysis and cluster analysis of inorganic elements in sea cucumber Apostichopus japonicus].

PubMed

Liu, Xiao-Fang; Xue, Chang-Hu; Wang, Yu-Ming; Li, Zhao-Jie; Xue, Yong; Xu, Jie

2011-11-01

The present study is to investigate the feasibility of multi-elements analysis in determination of the geographical origin of sea cucumber Apostichopus japonicus, and to make choice of the effective tracers in sea cucumber Apostichopus japonicus geographical origin assessment. The content of the elements such as Al, V, Cr, Mn, Fe, Co, Ni, Cu, Zn, As, Se, Mo, Cd, Hg and Pb in sea cucumber Apostichopus japonicus samples from seven places of geographical origin were determined by means of ICP-MS. The results were used for the development of elements database. Cluster analysis(CA) and principal component analysis (PCA) were applied to differentiate the sea cucumber Apostichopus japonicus geographical origin. Three principal components which accounted for over 89% of the total variance were extracted from the standardized data. The results of Q-type cluster analysis showed that the 26 samples could be clustered reasonably into five groups, the classification results were significantly associated with the marine distribution of the sea cucumber Apostichopus japonicus samples. The CA and PCA were the effective methods for elements analysis of sea cucumber Apostichopus japonicus samples. The content of the mineral elements in sea cucumber Apostichopus japonicus samples was good chemical descriptors for differentiating their geographical origins.
CLUSFAVOR 5.0: hierarchical cluster and principal-component analysis of microarray-based transcriptional profiles

PubMed Central

Peterson, Leif E

2002-01-01

CLUSFAVOR (CLUSter and Factor Analysis with Varimax Orthogonal Rotation) 5.0 is a Windows-based computer program for hierarchical cluster and principal-component analysis of microarray-based transcriptional profiles. CLUSFAVOR 5.0 standardizes input data; sorts data according to gene-specific coefficient of variation, standard deviation, average and total expression, and Shannon entropy; performs hierarchical cluster analysis using nearest-neighbor, unweighted pair-group method using arithmetic averages (UPGMA), or furthest-neighbor joining methods, and Euclidean, correlation, or jack-knife distances; and performs principal-component analysis. PMID:12184816
An integrated microfluidic analysis microsystems with bacterial capture enrichment and in-situ impedance detection

NASA Astrophysics Data System (ADS)

Liu, Hai-Tao; Wen, Zhi-Yu; Xu, Yi; Shang, Zheng-Guo; Peng, Jin-Lan; Tian, Peng

2017-09-01

In this paper, an integrated microfluidic analysis microsystems with bacterial capture enrichment and in-situ impedance detection was purposed based on microfluidic chips dielectrophoresis technique and electrochemical impedance detection principle. The microsystems include microfluidic chip, main control module, and drive and control module, and signal detection and processing modulet and result display unit. The main control module produce the work sequence of impedance detection system parts and achieve data communication functions, the drive and control circuit generate AC signal which amplitude and frequency adjustable, and it was applied on the foodborne pathogens impedance analysis microsystems to realize the capture enrichment and impedance detection. The signal detection and processing circuit translate the current signal into impendence of bacteria, and transfer to computer, the last detection result is displayed on the computer. The experiment sample was prepared by adding Escherichia coli standard sample into chicken sample solution, and the samples were tested on the dielectrophoresis chip capture enrichment and in-situ impedance detection microsystems with micro-array electrode microfluidic chips. The experiments show that the Escherichia coli detection limit of microsystems is 5 × 104 CFU/mL and the detection time is within 6 min in the optimization of voltage detection 10 V and detection frequency 500 KHz operating conditions. The integrated microfluidic analysis microsystems laid the solid foundation for rapid real-time in-situ detection of bacteria.
Searching for a Gulf War syndrome using cluster analysis.

PubMed

Everitt, B; Ismail, K; David, A S; Wessely, S

2002-11-01

Gulf veterans report medically unexplained symptoms more frequently than non-Gulf veterans did. We examined whether Gulf and non-Gulf veterans could be distinguished by their patterns of symptom reporting. A k-means cluster analysis was applied to 500 randomly sampled veterans from each of three United Kingdom military cohorts of veterans; those deployed to the Gulf conflict between 1990 and 1991; to the Bosnia peacekeeping mission between 1992 and 1997; and military personnel who were in active service but not deployed to the Gulf (Era). Sociodemographic, health variables and scores for ten symptom groups were calculated. The gap statistic indicated the five-group solution as one that provided a particularly informative description of the structure in the data. Cluster 1 consisted of low scores for all symptom groups. Cluster 2 had veterans with highest symptom scores for musculoskeletal symptoms and high scores for psychiatric symptoms. Cluster 3 had high scores for psychiatric symptoms and marginally elevated scores for the remaining nine groups symptom groups. Cluster 4 had elevated scores for musculoskeletal symptoms only and cluster 5 was distinguishable from the other clusters in having high scores in all symptom groups, especially psychiatric and musculoskeletal. The findings do not support the existence of a unique syndrome affecting a subgroup of Gulf veterans but emphasize the excess of non-specific self-reported ill health in this group.
Development and optimization of SPECT gated blood pool cluster analysis for the prediction of CRT outcome

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lalonde, Michel, E-mail: mlalonde15@rogers.com; Wassenaar, Richard; Wells, R. Glenn

2014-07-15

Purpose: Phase analysis of single photon emission computed tomography (SPECT) radionuclide angiography (RNA) has been investigated for its potential to predict the outcome of cardiac resynchronization therapy (CRT). However, phase analysis may be limited in its potential at predicting CRT outcome as valuable information may be lost by assuming that time-activity curves (TAC) follow a simple sinusoidal shape. A new method, cluster analysis, is proposed which directly evaluates the TACs and may lead to a better understanding of dyssynchrony patterns and CRT outcome. Cluster analysis algorithms were developed and optimized to maximize their ability to predict CRT response. Methods: Aboutmore » 49 patients (N = 27 ischemic etiology) received a SPECT RNA scan as well as positron emission tomography (PET) perfusion and viability scans prior to undergoing CRT. A semiautomated algorithm sampled the left ventricle wall to produce 568 TACs from SPECT RNA data. The TACs were then subjected to two different cluster analysis techniques, K-means, and normal average, where several input metrics were also varied to determine the optimal settings for the prediction of CRT outcome. Each TAC was assigned to a cluster group based on the comparison criteria and global and segmental cluster size and scores were used as measures of dyssynchrony and used to predict response to CRT. A repeated random twofold cross-validation technique was used to train and validate the cluster algorithm. Receiver operating characteristic (ROC) analysis was used to calculate the area under the curve (AUC) and compare results to those obtained for SPECT RNA phase analysis and PET scar size analysis methods. Results: Using the normal average cluster analysis approach, the septal wall produced statistically significant results for predicting CRT results in the ischemic population (ROC AUC = 0.73;p < 0.05 vs. equal chance ROC AUC = 0.50) with an optimal operating point of 71% sensitivity and 60% specificity
Integrating Data Clustering and Visualization for the Analysis of 3D Gene Expression Data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Data Analysis and Visualization; nternational Research Training Group ``Visualization of Large and Unstructured Data Sets,'' University of Kaiserslautern, Germany; Computational Research Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720, USA

2008-05-12

The recent development of methods for extracting precise measurements of spatial gene expression patterns from three-dimensional (3D) image data opens the way for new analyses of the complex gene regulatory networks controlling animal development. We present an integrated visualization and analysis framework that supports user-guided data clustering to aid exploration of these new complex datasets. The interplay of data visualization and clustering-based data classification leads to improved visualization and enables a more detailed analysis than previously possible. We discuss (i) integration of data clustering and visualization into one framework; (ii) application of data clustering to 3D gene expression data; (iii)more » evaluation of the number of clusters k in the context of 3D gene expression clustering; and (iv) improvement of overall analysis quality via dedicated post-processing of clustering results based on visualization. We discuss the use of this framework to objectively define spatial pattern boundaries and temporal profiles of genes and to analyze how mRNA patterns are controlled by their regulatory transcription factors.« less
Analysis of RXTE data on Clusters of Galaxies

NASA Technical Reports Server (NTRS)

Petrosian, Vahe

2004-01-01

This grant provided support for the reduction, analysis and interpretation of of hard X-ray (HXR, for short) observations of the cluster of galaxies RXJO658--5557 scheduled for the week of August 23, 2002 under the RXTE Cycle 7 program (PI Vahe Petrosian, Obs. ID 70165). The goal of the observation was to search for and characterize the shape of the HXR component beyond the well established thermal soft X-ray (SXR) component. Such hard components have been detected in several nearby clusters. distant cluster would provide information on the characteristics of this radiation at a different epoch in the evolution of the imiverse and shed light on its origin. We (Petrosian, 2001) have argued that thermal bremsstrahlung, as proposed earlier, cannot be the mechanism for the production of the HXRs and that the most likely mechanism is Compton upscattering of the cosmic microwave radiation by relativistic electrons which are known to be present in the clusters and be responsible for the observed radio emission. Based on this picture we estimated that this cluster, in spite of its relatively large distance, will have HXR signal comparable to the other nearby ones. The planned observation of a relatively The proposed RXTE observations were carried out and the data have been analyzed. We detect a hard X-ray tail in the spectrum of this cluster with a flux very nearly equal to our predicted value. This has strengthen the case for the Compton scattering model. We intend the data obtained via this observation to be a part of a larger data set. We have identified other clusters of galaxies (in archival RXTE and other instrument data sets) with sufficiently high quality data where we can search for and measure (or at least put meaningful limits) on the strength of the hard component. With these studies we expect to clarify the mechanism for acceleration of particles in the intercluster medium and provide guidance for future observations of this intriguing phenomenon by instrument

Improved Methods for the Enrichment and Analysis of Glycated Peptides

PubMed Central

Zhang, Qibin; Schepmoes, Athena A.; Brock, Jonathan W. C.; Wu, Si; Moore, Ronald J.; Purvine, Samuel O.; Baynes, John W.; Smith, Richard D.; Metz, Thomas O.

2009-01-01

Nonenzymatic glycation of tissue proteins has important implications in the development of complications of diabetes mellitus. Herein we report improved methods for the enrichment and analysis of glycated peptides using boronate affinity chromatography and electron-transfer dissociation mass spectrometry, respectively. The enrichment of glycated peptides was improved by replacing an off-line desalting step with an online wash of column-bound glycated peptides using 50 mM ammonium acetate, followed by elution with 100 mM acetic acid. The analysis of glycated peptides by MS/MS was improved by considering only higher charged (≥3) precursor ions during data-dependent acquisition, which increased the number of glycated peptide identifications. Similarly, the use of supplemental collisional activation after electron transfer (ETcaD) resulted in more glycated peptide identifications when the MS survey scan was acquired with enhanced resolution. Acquiring ETD-MS/MS data at a normal MS survey scan rate, in conjunction with the rejection of both 1+ and 2+ precursor ions, increased the number of identified glycated peptides relative to ETcaD or the enhanced MS survey scan rate. Finally, an evaluation of trypsin, Arg-C, and Lys-C showed that tryptic digestion of glycated proteins was comparable to digestion with Lys-C and that both were better than Arg-C in terms of the number of glycated peptides and corresponding glycated proteins identified by LC–MS/MS. PMID:18989935
Generalized Self-Organizing Maps for Automatic Determination of the Number of Clusters and Their Multiprototypes in Cluster Analysis.

PubMed

Gorzalczany, Marian B; Rudzinski, Filip

2017-06-07

This paper presents a generalization of self-organizing maps with 1-D neighborhoods (neuron chains) that can be effectively applied to complex cluster analysis problems. The essence of the generalization consists in introducing mechanisms that allow the neuron chain--during learning--to disconnect into subchains, to reconnect some of the subchains again, and to dynamically regulate the overall number of neurons in the system. These features enable the network--working in a fully unsupervised way (i.e., using unlabeled data without a predefined number of clusters)--to automatically generate collections of multiprototypes that are able to represent a broad range of clusters in data sets. First, the operation of the proposed approach is illustrated on some synthetic data sets. Then, this technique is tested using several real-life, complex, and multidimensional benchmark data sets available from the University of California at Irvine (UCI) Machine Learning repository and the Knowledge Extraction based on Evolutionary Learning data set repository. A sensitivity analysis of our approach to changes in control parameters and a comparative analysis with an alternative approach are also performed.
A formal concept analysis approach to consensus clustering of multi-experiment expression data

PubMed Central

2014-01-01

Background Presently, with the increasing number and complexity of available gene expression datasets, the combination of data from multiple microarray studies addressing a similar biological question is gaining importance. The analysis and integration of multiple datasets are expected to yield more reliable and robust results since they are based on a larger number of samples and the effects of the individual study-specific biases are diminished. This is supported by recent studies suggesting that important biological signals are often preserved or enhanced by multiple experiments. An approach to combining data from different experiments is the aggregation of their clusterings into a consensus or representative clustering solution which increases the confidence in the common features of all the datasets and reveals the important differences among them. Results We propose a novel generic consensus clustering technique that applies Formal Concept Analysis (FCA) approach for the consolidation and analysis of clustering solutions derived from several microarray datasets. These datasets are initially divided into groups of related experiments with respect to a predefined criterion. Subsequently, a consensus clustering algorithm is applied to each group resulting in a clustering solution per group. These solutions are pooled together and further analysed by employing FCA which allows extracting valuable insights from the data and generating a gene partition over all the experiments. In order to validate the FCA-enhanced approach two consensus clustering algorithms are adapted to incorporate the FCA analysis. Their performance is evaluated on gene expression data from multi-experiment study examining the global cell-cycle control of fission yeast. The FCA results derived from both methods demonstrate that, although both algorithms optimize different clustering characteristics, FCA is able to overcome and diminish these differences and preserve some relevant biological
NeAT: a toolbox for the analysis of biological networks, clusters, classes and pathways.

PubMed

Brohée, Sylvain; Faust, Karoline; Lima-Mendez, Gipsi; Sand, Olivier; Janky, Rekin's; Vanderstocken, Gilles; Deville, Yves; van Helden, Jacques

2008-07-01

The network analysis tools (NeAT) (http://rsat.ulb.ac.be/neat/) provide a user-friendly web access to a collection of modular tools for the analysis of networks (graphs) and clusters (e.g. microarray clusters, functional classes, etc.). A first set of tools supports basic operations on graphs (comparison between two graphs, neighborhood of a set of input nodes, path finding and graph randomization). Another set of programs makes the connection between networks and clusters (graph-based clustering, cliques discovery and mapping of clusters onto a network). The toolbox also includes programs for detecting significant intersections between clusters/classes (e.g. clusters of co-expression versus functional classes of genes). NeAT are designed to cope with large datasets and provide a flexible toolbox for analyzing biological networks stored in various databases (protein interactions, regulation and metabolism) or obtained from high-throughput experiments (two-hybrid, mass-spectrometry and microarrays). The web interface interconnects the programs in predefined analysis flows, enabling to address a series of questions about networks of interest. Each tool can also be used separately by entering custom data for a specific analysis. NeAT can also be used as web services (SOAP/WSDL interface), in order to design programmatic workflows and integrate them with other available resources.
Interactive K-Means Clustering Method Based on User Behavior for Different Analysis Target in Medicine.

PubMed

Lei, Yang; Yu, Dai; Bin, Zhang; Yang, Yang

2017-01-01

Clustering algorithm as a basis of data analysis is widely used in analysis systems. However, as for the high dimensions of the data, the clustering algorithm may overlook the business relation between these dimensions especially in the medical fields. As a result, usually the clustering result may not meet the business goals of the users. Then, in the clustering process, if it can combine the knowledge of the users, that is, the doctor's knowledge or the analysis intent, the clustering result can be more satisfied. In this paper, we propose an interactive K -means clustering method to improve the user's satisfactions towards the result. The core of this method is to get the user's feedback of the clustering result, to optimize the clustering result. Then, a particle swarm optimization algorithm is used in the method to optimize the parameters, especially the weight settings in the clustering algorithm to make it reflect the user's business preference as possible. After that, based on the parameter optimization and adjustment, the clustering result can be closer to the user's requirement. Finally, we take an example in the breast cancer, to testify our method. The experiments show the better performance of our algorithm.
XMM-Newton Observations of the Toothbrush and Sausage Clusters

NASA Astrophysics Data System (ADS)

Kara, S.; Mernier, F.; Ezer, C.; Akamatsu, H.; Ercan, E.

2017-10-01

Galaxy clusters are the largest gravitationally-bound objects in the universe. The member galaxies are embedded in a hot X-ray emitting Intra Cluster Medium (ICM) that has been enriched with metals produced by supernovae over the last billion years. Here we report new results from XMM-Newton archival observations of the merging clusters 1RXSJ0603.3+4213 and CIZA J2242.8+5301. These two clusters, also known as the Toothbrush and Sausage clusters, respectively, show a large radio relic associated with a merger shock North of their respective core. We show the distribution of the metal abundances with respect to the merger structures in these two clusters. The results are derived from spatially resolved X-ray spectra from the EPIC instrument on board XMM-Newton.
Analyses of n-alkanes degrading community dynamics of a high-temperature methanogenic consortium enriched from production water of a petroleum reservoir by a combination of molecular techniques.

PubMed

Zhou, Lei; Li, Kai-Ping; Mbadinga, Serge Maurice; Yang, Shi-Zhong; Gu, Ji-Dong; Mu, Bo-Zhong

2012-08-01

Despite the knowledge on anaerobic degradation of hydrocarbons and signature metabolites in the oil reservoirs, little is known about the functioning microbes and the related biochemical pathways involved, especially about the methanogenic communities. In the present study, a methanogenic consortium enriched from high-temperature oil reservoir production water and incubated at 55 °C with a mixture of long chain n-alkanes (C(15)-C(20)) as the sole carbon and energy sources was characterized. Biodegradation of n-alkanes was observed as methane production in the alkanes-amended methanogenic enrichment reached 141.47 μmol above the controls after 749 days of incubation, corresponding to 17 % of the theoretical total. GC-MS analysis confirmed the presence of putative downstream metabolites probably from the anaerobic biodegradation of n-alkanes and indicating an incomplete conversion of the n-alkanes to methane. Enrichment cultures taken at different incubation times were subjected to microbial community analysis. Both 16S rRNA gene clone libraries and DGGE profiles showed that alkanes-degrading community was dynamic during incubation. The dominant bacterial species in the enrichment cultures were affiliated with Firmicutes members clustering with thermophilic syntrophic bacteria of the genera Moorella sp. and Gelria sp. Other represented within the bacterial community were members of the Leptospiraceae, Thermodesulfobiaceae, Thermotogaceae, Chloroflexi, Bacteroidetes and Candidate Division OP1. The archaeal community was predominantly represented by members of the phyla Crenarchaeota and Euryarchaeota. Corresponding sequences within the Euryarchaeota were associated with methanogens clustering with orders Methanomicrobiales, Methanosarcinales and Methanobacteriales. On the other hand, PCR amplification for detection of functional genes encoding the alkylsuccinate synthase α-subunit (assA) was positive in the enrichment cultures. Moreover, the appearance of a new ass
Improving estimation of kinetic parameters in dynamic force spectroscopy using cluster analysis

NASA Astrophysics Data System (ADS)

Yen, Chi-Fu; Sivasankar, Sanjeevi

2018-03-01

Dynamic Force Spectroscopy (DFS) is a widely used technique to characterize the dissociation kinetics and interaction energy landscape of receptor-ligand complexes with single-molecule resolution. In an Atomic Force Microscope (AFM)-based DFS experiment, receptor-ligand complexes, sandwiched between an AFM tip and substrate, are ruptured at different stress rates by varying the speed at which the AFM-tip and substrate are pulled away from each other. The rupture events are grouped according to their pulling speeds, and the mean force and loading rate of each group are calculated. These data are subsequently fit to established models, and energy landscape parameters such as the intrinsic off-rate (koff) and the width of the potential energy barrier (xβ) are extracted. However, due to large uncertainties in determining mean forces and loading rates of the groups, errors in the estimated koff and xβ can be substantial. Here, we demonstrate that the accuracy of fitted parameters in a DFS experiment can be dramatically improved by sorting rupture events into groups using cluster analysis instead of sorting them according to their pulling speeds. We test different clustering algorithms including Gaussian mixture, logistic regression, and K-means clustering, under conditions that closely mimic DFS experiments. Using Monte Carlo simulations, we benchmark the performance of these clustering algorithms over a wide range of koff and xβ, under different levels of thermal noise, and as a function of both the number of unbinding events and the number of pulling speeds. Our results demonstrate that cluster analysis, particularly K-means clustering, is very effective in improving the accuracy of parameter estimation, particularly when the number of unbinding events are limited and not well separated into distinct groups. Cluster analysis is easy to implement, and our performance benchmarks serve as a guide in choosing an appropriate method for DFS data analysis.
Symptom Clusters in People Living with HIV Attending Five Palliative Care Facilities in Two Sub-Saharan African Countries: A Hierarchical Cluster Analysis.

PubMed

Moens, Katrien; Siegert, Richard J; Taylor, Steve; Namisango, Eve; Harding, Richard

2015-01-01

Symptom research across conditions has historically focused on single symptoms, and the burden of multiple symptoms and their interactions has been relatively neglected especially in people living with HIV. Symptom cluster studies are required to set priorities in treatment planning, and to lessen the total symptom burden. This study aimed to identify and compare symptom clusters among people living with HIV attending five palliative care facilities in two sub-Saharan African countries. Data from cross-sectional self-report of seven-day symptom prevalence on the 32-item Memorial Symptom Assessment Scale-Short Form were used. A hierarchical cluster analysis was conducted using Ward's method applying squared Euclidean Distance as the similarity measure to determine the clusters. Contingency tables, X2 tests and ANOVA were used to compare the clusters by patient specific characteristics and distress scores. Among the sample (N=217) the mean age was 36.5 (SD 9.0), 73.2% were female, and 49.1% were on antiretroviral therapy (ART). The cluster analysis produced five symptom clusters identified as: 1) dermatological; 2) generalised anxiety and elimination; 3) social and image; 4) persistently present; and 5) a gastrointestinal-related symptom cluster. The patients in the first three symptom clusters reported the highest physical and psychological distress scores. Patient characteristics varied significantly across the five clusters by functional status (worst functional physical status in cluster one, p<0.001); being on ART (highest proportions for clusters two and three, p=0.012); global distress (F=26.8, p<0.001), physical distress (F=36.3, p<0.001) and psychological distress subscale (F=21.8, p<0.001) (all subscales worst for cluster one, best for cluster four). The greatest burden is associated with cluster one, and should be prioritised in clinical management. Further symptom cluster research in people living with HIV with longitudinally collected symptom data to
Insights into the chemical composition of the metal-poor Milky Way halo globular cluster NGC 6426

NASA Astrophysics Data System (ADS)

Hanke, M.; Koch, A.; Hansen, C. J.; McWilliam, A.

2017-03-01

We present our detailed spectroscopic analysis of the chemical composition of four red giant stars in the halo globular cluster NGC 6426. We obtained high-resolution spectra using the Magellan2/MIKE spectrograph, from which we derived equivalent widths and subsequently computed abundances of 24 species of 22 chemical elements. For the purpose of measuring equivalent widths, we developed a new semi-automated tool, called EWCODE. We report a mean Fe content of [Fe/H] =-2.34 ± 0.05 dex (stat.) in accordance with previous studies. At a mean α-abundance of [(Mg, Si, Ca)/3 Fe] = 0.39 ± 0.03 dex, NGC 6426 falls on the trend drawn by the Milky Way halo and other globular clusters at comparably low metallicities. The distribution of the lighter α-elements as well as the enhanced ratio [Zn/Fe] = 0.39 dex could originate from hypernova enrichment of the pre-cluster medium. We find tentative evidence for a spread in the elements Mg, Si, and Zn, indicating an enrichment scenario, where ejecta of evolved massive stars of a slightly older population have polluted a newly born younger one. The heavy element abundances in this cluster fit well into the picture of metal-poor globular clusters, which in that respect appear to be remarkably homogeneous. The pattern of the neutron-capture elements heavier than Zn points toward an enrichment history governed by the r-process with little, if any, sign of s-process contributions. This finding is supported by the striking similarity of our program stars to the metal-poor field star HD 108317. This paper includes data gathered with the 6.5-m Magellan Telescopes located at Las Campanas Observatory, Chile.Equivalent widths and full Table 2 are only available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (http://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/599/A97
Distinct Phenotypes of Cigarette Smokers Identified by Cluster Analysis of Patients with Severe Asthma.

PubMed

Konno, Satoshi; Taniguchi, Natsuko; Makita, Hironi; Nakamaru, Yuji; Shimizu, Kaoruko; Shijubo, Noriharu; Fuke, Satoshi; Takeyabu, Kimihiro; Oguri, Mitsuru; Kimura, Hirokazu; Maeda, Yukiko; Suzuki, Masaru; Nagai, Katsura; Ito, Yoichi M; Wenzel, Sally E; Nishimura, Masaharu

2015-12-01

Smoking may have multifactorial effects on asthma phenotypes, particularly in severe asthma. Cluster analysis has been applied to explore novel phenotypes, which are not based on any a priori hypotheses. To explore novel severe asthma phenotypes by cluster analysis when including cigarette smokers. We recruited a total of 127 subjects with severe asthma, including 59 current or ex-smokers, from our university hospital and its 29 affiliated hospitals/pulmonary clinics. Twelve clinical variables obtained during a 2-day hospital stay were used for cluster analysis. After clustering using clinical variables, the sputum levels of 14 molecules were measured to biologically characterize the clinical clusters. Five clinical clusters were identified, including two characterized by high pack-year exposure to cigarette smoking and low FEV1/FVC. There were marked differences between the two clusters of cigarette smokers. One had high levels of circulating eosinophils, high IgE levels, and a high sinus disease score. The other was characterized by low levels of the same parameters. Sputum analysis revealed increased levels of IL-5 in the former cluster and increased levels of IL-6 and osteopontin in the latter. The other three clusters were similar to those previously reported: young onset/atopic, nonsmoker/less eosinophilic, and female/obese. Key clinical variables were confirmed to be stable and consistent 1 year later. This study reveals two distinct phenotypes of severe asthma in current and former cigarette smokers with potentially different biological pathways contributing to fixed airflow limitation. Clinical trial registered with www.umin.ac.jp (000003254).
The dynamics of cyclone clustering in re-analysis and a high-resolution climate model

NASA Astrophysics Data System (ADS)

Priestley, Matthew; Pinto, Joaquim; Dacre, Helen; Shaffrey, Len

2017-04-01

Extratropical cyclones have a tendency to occur in groups (clusters) in the exit of the North Atlantic storm track during wintertime, potentially leading to widespread socioeconomic impacts. The Winter of 2013/14 was the stormiest on record for the UK and was characterised by the recurrent clustering of intense extratropical cyclones. This clustering was associated with a strong, straight and persistent North Atlantic 250 hPa jet with Rossby wave-breaking (RWB) on both flanks, pinning the jet in place. Here, we provide for the first time an analysis of all clustered events in 36 years of the ERA-Interim Re-analysis at three latitudes (45˚ N, 55˚ N, 65˚ N) encompassing various regions of Western Europe. The relationship between the occurrence of RWB and cyclone clustering is studied in detail. Clustering at 55˚ N is associated with an extended and anomalously strong jet flanked on both sides by RWB. However, clustering at 65(45)˚ N is associated with RWB to the south (north) of the jet, deflecting the jet northwards (southwards). A positive correlation was found between the intensity of the clustering and RWB occurrence to the north and south of the jet. However, there is considerable spread in these relationships. Finally, analysis has shown that the relationships identified in the re-analysis are also present in a high-resolution coupled global climate model (HiGEM). In particular, clustering is associated with the same dynamical conditions at each of our three latitudes in spite of the identified biases in frequency and intensity of RWB.
N-of-1-pathways MixEnrich: advancing precision medicine via single-subject analysis in discovering dynamic changes of transcriptomes.

PubMed

Li, Qike; Schissler, A Grant; Gardeux, Vincent; Achour, Ikbel; Kenost, Colleen; Berghout, Joanne; Li, Haiquan; Zhang, Hao Helen; Lussier, Yves A

2017-05-24

Transcriptome analytic tools are commonly used across patient cohorts to develop drugs and predict clinical outcomes. However, as precision medicine pursues more accurate and individualized treatment decisions, these methods are not designed to address single-patient transcriptome analyses. We previously developed and validated the N-of-1-pathways framework using two methods, Wilcoxon and Mahalanobis Distance (MD), for personal transcriptome analysis derived from a pair of samples of a single patient. Although, both methods uncover concordantly dysregulated pathways, they are not designed to detect dysregulated pathways with up- and down-regulated genes (bidirectional dysregulation) that are ubiquitous in biological systems. We developed N-of-1-pathways MixEnrich, a mixture model followed by a gene set enrichment test, to uncover bidirectional and concordantly dysregulated pathways one patient at a time. We assess its accuracy in a comprehensive simulation study and in a RNA-Seq data analysis of head and neck squamous cell carcinomas (HNSCCs). In presence of bidirectionally dysregulated genes in the pathway or in presence of high background noise, MixEnrich substantially outperforms previous single-subject transcriptome analysis methods, both in the simulation study and the HNSCCs data analysis (ROC Curves; higher true positive rates; lower false positive rates). Bidirectional and concordant dysregulated pathways uncovered by MixEnrich in each patient largely overlapped with the quasi-gold standard compared to other single-subject and cohort-based transcriptome analyses. The greater performance of MixEnrich presents an advantage over previous methods to meet the promise of providing accurate personal transcriptome analysis to support precision medicine at point of care.
Sun Protection Belief Clusters: Analysis of Amazon Mechanical Turk Data.

PubMed

Santiago-Rivas, Marimer; Schnur, Julie B; Jandorf, Lina

2016-12-01

This study aimed (i) to determine whether people could be differentiated on the basis of their sun protection belief profiles and individual characteristics and (ii) explore the use of a crowdsourcing web service for the assessment of sun protection beliefs. A sample of 500 adults completed an online survey of sun protection belief items using Amazon Mechanical Turk. A two-phased cluster analysis (i.e., hierarchical and non-hierarchical K-means) was utilized to determine clusters of sun protection barriers and facilitators. Results yielded three distinct clusters of sun protection barriers and three distinct clusters of sun protection facilitators. Significant associations between gender, age, sun sensitivity, and cluster membership were identified. Results also showed an association between barrier and facilitator cluster membership. The results of this study provided a potential alternative approach to developing future sun protection promotion initiatives in the population. Findings add to our knowledge regarding individuals who support, oppose, or are ambivalent toward sun protection and inform intervention research by identifying distinct subtypes that may best benefit from (or have a higher need for) skin cancer prevention efforts.
Comprehensive viral enrichment enables sensitive respiratory virus genomic identification and analysis by next generation sequencing.

PubMed

O'Flaherty, Brigid M; Li, Yan; Tao, Ying; Paden, Clinton R; Queen, Krista; Zhang, Jing; Dinwiddie, Darrell L; Gross, Stephen M; Schroth, Gary P; Tong, Suxiang

2018-06-01

Next generation sequencing (NGS) technologies have revolutionized the genomics field and are becoming more commonplace for identification of human infectious diseases. However, due to the low abundance of viral nucleic acids (NAs) in relation to host, viral identification using direct NGS technologies often lacks sufficient sensitivity. Here, we describe an approach based on two complementary enrichment strategies that significantly improves the sensitivity of NGS-based virus identification. To start, we developed two sets of DNA probes to enrich virus NAs associated with respiratory diseases. The first set of probes spans the genomes, allowing for identification of known viruses and full genome sequencing, while the second set targets regions conserved among viral families or genera, providing the ability to detect both known and potentially novel members of those virus groups. Efficiency of enrichment was assessed by NGS testing reference virus and clinical samples with known infection. We show significant improvement in viral identification using enriched NGS compared to unenriched NGS. Without enrichment, we observed an average of 0.3% targeted viral reads per sample. However, after enrichment, 50%-99% of the reads per sample were the targeted viral reads for both the reference isolates and clinical specimens using both probe sets. Importantly, dramatic improvements on genome coverage were also observed following virus-specific probe enrichment. The methods described here provide improved sensitivity for virus identification by NGS, allowing for a more comprehensive analysis of disease etiology. © 2018 O'Flaherty et al.; Published by Cold Spring Harbor Laboratory Press.
Gathering Real World Evidence with Cluster Analysis for Clinical Decision Support.

PubMed

Xia, Eryu; Liu, Haifeng; Li, Jing; Mei, Jing; Li, Xuejun; Xu, Enliang; Li, Xiang; Hu, Gang; Xie, Guotong; Xu, Meilin

2017-01-01

Clinical decision support systems are information technology systems that assist clinical decision-making tasks, which have been shown to enhance clinical performance. Cluster analysis, which groups similar patients together, aims to separate patient cases into phenotypically heterogenous groups and defining therapeutically homogeneous patient subclasses. Useful as it is, the application of cluster analysis in clinical decision support systems is less reported. Here, we describe the usage of cluster analysis in clinical decision support systems, by first dividing patient cases into similar groups and then providing diagnosis or treatment suggestions based on the group profiles. This integration provides data for clinical decisions and compiles a wide range of clinical practices to inform the performance of individual clinicians. We also include an example usage of the system under the scenario of blood lipid management in type 2 diabetes. These efforts represent a step toward promoting patient-centered care and enabling precision medicine.
GLOBULAR CLUSTER ABUNDANCES FROM HIGH-RESOLUTION, INTEGRATED-LIGHT SPECTROSCOPY. II. EXPANDING THE METALLICITY RANGE FOR OLD CLUSTERS AND UPDATED ANALYSIS TECHNIQUES

DOE Office of Scientific and Technical Information (OSTI.GOV)

Colucci, Janet E.; Bernstein, Rebecca A.; McWilliam, Andrew

2017-01-10

We present abundances of globular clusters (GCs) in the Milky Way and Fornax from integrated-light (IL) spectra. Our goal is to evaluate the consistency of the IL analysis relative to standard abundance analysis for individual stars in those same clusters. This sample includes an updated analysis of seven clusters from our previous publications and results for five new clusters that expand the metallicity range over which our technique has been tested. We find that the [Fe/H] measured from IL spectra agrees to ∼0.1 dex for GCs with metallicities as high as [Fe/H] = −0.3, but the abundances measured for more metal-rich clustersmore » may be underestimated. In addition we systematically evaluate the accuracy of abundance ratios, [X/Fe], for Na i, Mg i, Al i, Si i, Ca i, Ti i, Ti ii, Sc ii, V i, Cr i, Mn i, Co i, Ni i, Cu i, Y ii, Zr i, Ba ii, La ii, Nd ii, and Eu ii. The elements for which the IL analysis gives results that are most similar to analysis of individual stellar spectra are Fe i, Ca i, Si i, Ni i, and Ba ii. The elements that show the greatest differences include Mg i and Zr i. Some elements show good agreement only over a limited range in metallicity. More stellar abundance data in these clusters would enable more complete evaluation of the IL results for other important elements.« less
Chaotic map clustering algorithm for EEG analysis

NASA Astrophysics Data System (ADS)

Bellotti, R.; De Carlo, F.; Stramaglia, S.

2004-03-01

The non-parametric chaotic map clustering algorithm has been applied to the analysis of electroencephalographic signals, in order to recognize the Huntington's disease, one of the most dangerous pathologies of the central nervous system. The performance of the method has been compared with those obtained through parametric algorithms, as K-means and deterministic annealing, and supervised multi-layer perceptron. While supervised neural networks need a training phase, performed by means of data tagged by the genetic test, and the parametric methods require a prior choice of the number of classes to find, the chaotic map clustering gives a natural evidence of the pathological class, without any training or supervision, thus providing a new efficient methodology for the recognition of patterns affected by the Huntington's disease.
Instability of Hierarchical Cluster Analysis Due to Input Order of the Data: The PermuCLUSTER Solution

ERIC Educational Resources Information Center

van der Kloot, Willem A.; Spaans, Alexander M. J.; Heiser, Willem J.

2005-01-01

Hierarchical agglomerative cluster analysis (HACA) may yield different solutions under permutations of the input order of the data. This instability is caused by ties, either in the initial proximity matrix or arising during agglomeration. The authors recommend to repeat the analysis on a large number of random permutations of the rows and columns…
Cluster analysis of autoantibodies in 852 patients with systemic lupus erythematosus from a single center.

PubMed

Artim-Esen, Bahar; Çene, Erhan; Şahinkaya, Yasemin; Ertan, Semra; Pehlivan, Özlem; Kamali, Sevil; Gül, Ahmet; Öcal, Lale; Aral, Orhan; Inanç, Murat

2014-07-01

Associations between autoantibodies and clinical features have been described in systemic lupus erythematosus (SLE). Herein, we aimed to define autoantibody clusters and their clinical correlations in a large cohort of patients with SLE. We analyzed 852 patients with SLE who attended our clinic. Seven autoantibodies were selected for cluster analysis: anti-DNA, anti-Sm, anti-RNP, anticardiolipin (aCL) immunoglobulin (Ig)G or IgM, lupus anticoagulant (LAC), anti-Ro, and anti-La. Two-step clustering and Kaplan-Meier survival analyses were used. Five clusters were identified. A cluster consisted of patients with only anti-dsDNA antibodies, a cluster of anti-Sm and anti-RNP, a cluster of aCL IgG/M and LAC, and a cluster of anti-Ro and anti-La antibodies. Analysis revealed 1 more cluster that consisted of patients who did not belong to any of the clusters formed by antibodies chosen for cluster analysis. Sm/RNP cluster had significantly higher incidence of pulmonary hypertension and Raynaud phenomenon. DsDNA cluster had the highest incidence of renal involvement. In the aCL/LAC cluster, there were significantly more patients with neuropsychiatric involvement, antiphospholipid syndrome, autoimmune hemolytic anemia, and thrombocytopenia. According to the Systemic Lupus International Collaborating Clinics damage index, the highest frequency of damage was in the aCL/LAC cluster. Comparison of 10 and 20 years survival showed reduced survival in the aCL/LAC cluster. This study supports the existence of autoantibody clusters with distinct clinical features in SLE and shows that forming clinical subsets according to autoantibody clusters may be useful in predicting the outcome of the disease. Autoantibody clusters in SLE may exhibit differences according to the clinical setting or population.

Application of cluster analysis to geochemical compositional data for identifying ore-related geochemical anomalies

NASA Astrophysics Data System (ADS)

Zhou, Shuguang; Zhou, Kefa; Wang, Jinlin; Yang, Genfang; Wang, Shanshan

2017-12-01

Cluster analysis is a well-known technique that is used to analyze various types of data. In this study, cluster analysis is applied to geochemical data that describe 1444 stream sediment samples collected in northwestern Xinjiang with a sample spacing of approximately 2 km. Three algorithms (the hierarchical, k-means, and fuzzy c-means algorithms) and six data transformation methods (the z-score standardization, ZST; the logarithmic transformation, LT; the additive log-ratio transformation, ALT; the centered log-ratio transformation, CLT; the isometric log-ratio transformation, ILT; and no transformation, NT) are compared in terms of their effects on the cluster analysis of the geochemical compositional data. The study shows that, on the one hand, the ZST does not affect the results of column- or variable-based (R-type) cluster analysis, whereas the other methods, including the LT, the ALT, and the CLT, have substantial effects on the results. On the other hand, the results of the row- or observation-based (Q-type) cluster analysis obtained from the geochemical data after applying NT and the ZST are relatively poor. However, we derive some improved results from the geochemical data after applying the CLT, the ILT, the LT, and the ALT. Moreover, the k-means and fuzzy c-means clustering algorithms are more reliable than the hierarchical algorithm when they are used to cluster the geochemical data. We apply cluster analysis to the geochemical data to explore for Au deposits within the study area, and we obtain a good correlation between the results retrieved by combining the CLT or the ILT with the k-means or fuzzy c-means algorithms and the potential zones of Au mineralization. Therefore, we suggest that the combination of the CLT or the ILT with the k-means or fuzzy c-means algorithms is an effective tool to identify potential zones of mineralization from geochemical data.
MMPI-2: Cluster Analysis of Personality Profiles in Perinatal Depression—Preliminary Evidence

PubMed Central

Grillo, Alessandra; Lauriola, Marco; Giacchetti, Nicoletta

2014-01-01

Background. To assess personality characteristics of women who develop perinatal depression. Methods. The study started with a screening of a sample of 453 women in their third trimester of pregnancy, to which was administered a survey data form, the Edinburgh Postnatal Depression Scale (EPDS) and the Minnesota Multiphasic Personality Inventory 2 (MMPI-2). A clinical group of subjects with perinatal depression (PND, 55 subjects) was selected; clinical and validity scales of MMPI-2 were used as predictors in hierarchical cluster analysis carried out. Results. The analysis identified three clusters of personality profile: two “clinical” clusters (1 and 3) and an “apparently common” one (cluster 2). The first cluster (39.5%) collects structures of personality with prevalent obsessive or dependent functioning tending to develop a “psychasthenic” depression; the third cluster (13.95%) includes women with prevalent borderline functioning tending to develop “dysphoric” depression; the second cluster (46.5%) shows a normal profile with a “defensive” attitude, probably due to the presence of defense mechanisms or to the fear of stigma. Conclusion. Characteristics of personality have a key role in clinical manifestations of perinatal depression; it is important to detect them to identify mothers at risk and to plan targeted therapeutic interventions. PMID:25574499
Clustering analysis of proteins from microbial genomes at multiple levels of resolution.

PubMed

Zaslavsky, Leonid; Ciufo, Stacy; Fedorov, Boris; Tatusova, Tatiana

2016-08-31

Microbial genomes at the National Center for Biotechnology Information (NCBI) represent a large collection of more than 35,000 assemblies. There are several complexities associated with the data: a great variation in sampling density since human pathogens are densely sampled while other bacteria are less represented; different protein families occur in annotations with different frequencies; and the quality of genome annotation varies greatly. In order to extract useful information from these sophisticated data, the analysis needs to be performed at multiple levels of phylogenomic resolution and protein similarity, with an adequate sampling strategy. Protein clustering is used to construct meaningful and stable groups of similar proteins to be used for analysis and functional annotation. Our approach is to create protein clusters at three levels. First, tight clusters in groups of closely-related genomes (species-level clades) are constructed using a combined approach that takes into account both sequence similarity and genome context. Second, clustroids of conservative in-clade clusters are organized into seed global clusters. Finally, global protein clusters are built around the the seed clusters. We propose filtering strategies that allow limiting the protein set included in global clustering. The in-clade clustering procedure, subsequent selection of clustroids and organization into seed global clusters provides a robust representation and high rate of compression. Seed protein clusters are further extended by adding related proteins. Extended seed clusters include a significant part of the data and represent all major known cell machinery. The remaining part, coming from either non-conservative (unique) or rapidly evolving proteins, from rare genomes, or resulting from low-quality annotation, does not group together well. Processing these proteins requires significant computational resources and results in a large number of questionable clusters. The developed
A hierarchical cluster analysis of normal-tension glaucoma using spectral-domain optical coherence tomography parameters.

PubMed

Bae, Hyoung Won; Ji, Yongwoo; Lee, Hye Sun; Lee, Naeun; Hong, Samin; Seong, Gong Je; Sung, Kyung Rim; Kim, Chan Yun

2015-01-01

Normal-tension glaucoma (NTG) is a heterogenous disease, and there is still controversy about subclassifications of this disorder. On the basis of spectral-domain optical coherence tomography (SD-OCT), we subdivided NTG with hierarchical cluster analysis using optic nerve head (ONH) parameters and retinal nerve fiber layer (RNFL) thicknesses. A total of 200 eyes of 200 NTG patients between March 2011 and June 2012 underwent SD-OCT scans to measure ONH parameters and RNFL thicknesses. We classified NTG into homogenous subgroups based on these variables using a hierarchical cluster analysis, and compared clusters to evaluate diverse NTG characteristics. Three clusters were found after hierarchical cluster analysis. Cluster 1 (62 eyes) had the thickest RNFL and widest rim area, and showed early glaucoma features. Cluster 2 (60 eyes) was characterized by the largest cup/disc ratio and cup volume, and showed advanced glaucomatous damage. Cluster 3 (78 eyes) had small disc areas in SD-OCT and were comprised of patients with significantly younger age, longer axial length, and greater myopia than the other 2 groups. A hierarchical cluster analysis of SD-OCT scans divided NTG patients into 3 groups based upon ONH parameters and RNFL thicknesses. It is anticipated that the small disc area group comprised of younger and more myopic patients may show unique features unlike the other 2 groups.
Principal Component Clustering Approach to Teaching Quality Discriminant Analysis

ERIC Educational Resources Information Center

Xian, Sidong; Xia, Haibo; Yin, Yubo; Zhai, Zhansheng; Shang, Yan

2016-01-01

Teaching quality is the lifeline of the higher education. Many universities have made some effective achievement about evaluating the teaching quality. In this paper, we establish the Students' evaluation of teaching (SET) discriminant analysis model and algorithm based on principal component clustering analysis. Additionally, we classify the SET…
A cluster analysis investigation of workaholism as a syndrome.

PubMed

Aziz, Shahnaz; Zickar, Michael J

2006-01-01

Workaholism has been conceptualized as a syndrome although there have been few tests that explicitly consider its syndrome status. The authors analyzed a three-dimensional scale of workaholism developed by Spence and Robbins (1992) using cluster analysis. The authors identified three clusters of individuals, one of which corresponded to Spence and Robbins's profile of the workaholic (high work involvement, high drive to work, low work enjoyment). Consistent with previously conjectured relations with workaholism, individuals in the workaholic cluster were more likely to label themselves as workaholics, more likely to have acquaintances label them as workaholics, and more likely to have lower life satisfaction and higher work-life imbalance. The importance of considering workaholism as a syndrome and the implications for effective interventions are discussed. Copyright 2006 APA.
Sejong Open Cluster Survey (SOS). 0. Target Selection and Data Analysis

NASA Astrophysics Data System (ADS)

Sung, Hwankyung; Lim, Beomdu; Bessell, Michael S.; Kim, Jinyoung S.; Hur, Hyeonoh; Chun, Moo-Young; Park, Byeong-Gon

2013-06-01

Star clusters are superb astrophysical laboratories containing cospatial and coeval samples of stars with similar chemical composition. We initiate the Sejong Open cluster Survey (SOS) - a project dedicated to providing homogeneous photometry of a large number of open clusters in the SAAO Johnson-Cousins' UBVI system. To achieve our main goal, we pay much attention to the observation of standard stars in order to reproduce the SAAO standard system. Many of our targets are relatively small sparse clusters that escaped previous observations. As clusters are considered building blocks of the Galactic disk, their physical properties such as the initial mass function, the pattern of mass segregation, etc. give valuable information on the formation and evolution of the Galactic disk. The spatial distribution of young open clusters will be used to revise the local spiral arm structure of the Galaxy. In addition, the homogeneous data can also be used to test stellar evolutionary theory, especially concerning rare massive stars. In this paper we present the target selection criteria, the observational strategy for accurate photometry, and the adopted calibrations for data analysis such as color-color relations, zero-age main sequence relations, Sp - M_V relations, Sp - T_{eff} relations, Sp - color relations, and T_{eff} - BC relations. Finally we provide some data analysis such as the determination of the reddening law, the membership selection criteria, and distance determination.
LBT/MODS spectroscopy of globular clusters in the irregular galaxy NGC 4449

NASA Astrophysics Data System (ADS)

Annibali, F.; Morandi, E.; Watkins, L. L.; Tosi, M.; Aloisi, A.; Buzzoni, A.; Cusano, F.; Fumana, M.; Marchetti, A.; Mignoli, M.; Mucciarelli, A.; Romano, D.; van der Marel, R. P.

2018-05-01

We present intermediate-resolution (R ˜ 1000) spectra in the ˜3500-10 000 Å range of 14 globular clusters in the Magellanic irregular galaxy NGC 4449 acquired with the Multi-Object Double Spectrograph on the Large Binocular Telescope. We derived Lick indices in the optical and the Ca II triplet index in the near-infrared in order to infer the clusters' stellar population properties. The inferred cluster ages are typically older than ˜9 Gyr, although ages are derived with large uncertainties. The clusters exhibit intermediate metallicities, in the range -1.2 ≲ [Fe/H] ≲ -0.7, and typically sub-solar [α/Fe] ratios, with a peak at ˜-0.4. These properties suggest that (i) during the first few Gyr NGC 4449 formed stars slowly and inefficiently, with galactic winds having possibly contributed to the expulsion of the α-elements, and (ii) globular clusters in NGC 4449 formed relatively `late', from a medium already enriched in the products of Type Ia supernovae. The majority of clusters appear also underabundant in CN compared to Milky Way halo globular clusters, perhaps because of the lack of a conspicuous N-enriched, second generation of stars like that observed in Galactic globular clusters. Using the cluster velocities, we infer the dynamical mass of NGC 4449 inside 2.88 kpc to be M(<2.88 kpc) = 3.15^{+3.16}_{-0.75} × 10^9 M_{\\odot }. We also report the serendipitous discovery of a planetary nebula within one of the targeted clusters, a rather rare event.
Importance of Viral Sequence Length and Number of Variable and Informative Sites in Analysis of HIV Clustering.

PubMed

Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor; Essex, M

2015-05-01

To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty was compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice.
Text grouping in patent analysis using adaptive K-means clustering algorithm

NASA Astrophysics Data System (ADS)

Shanie, Tiara; Suprijadi, Jadi; Zulhanif

2017-03-01

Patents are one of the Intellectual Property. Analyzing patent is one requirement in knowing well the development of technology in each country and in the world now. This study uses the patent document coming from the Espacenet server about Green Tea. Patent documents related to the technology in the field of tea is still widespread, so it will be difficult for users to information retrieval (IR). Therefore, it is necessary efforts to categorize documents in a specific group of related terms contained therein. This study uses titles patent text data with the proposed Green Tea in Statistical Text Mining methods consists of two phases: data preparation and data analysis stage. The data preparation phase uses Text Mining methods and data analysis stage is done by statistics. Statistical analysis in this study using a cluster analysis algorithm, the Adaptive K-Means Clustering Algorithm. Results from this study showed that based on the maximum value Silhouette, generate 87 clusters associated fifteen terms therein that can be utilized in the process of information retrieval needs.
Visual verification and analysis of cluster detection for molecular dynamics.

PubMed

Grottel, Sebastian; Reina, Guido; Vrabec, Jadran; Ertl, Thomas

2007-01-01

A current research topic in molecular thermodynamics is the condensation of vapor to liquid and the investigation of this process at the molecular level. Condensation is found in many physical phenomena, e.g. the formation of atmospheric clouds or the processes inside steam turbines, where a detailed knowledge of the dynamics of condensation processes will help to optimize energy efficiency and avoid problems with droplets of macroscopic size. The key properties of these processes are the nucleation rate and the critical cluster size. For the calculation of these properties it is essential to make use of a meaningful definition of molecular clusters, which currently is a not completely resolved issue. In this paper a framework capable of interactively visualizing molecular datasets of such nucleation simulations is presented, with an emphasis on the detected molecular clusters. To check the quality of the results of the cluster detection, our framework introduces the concept of flow groups to highlight potential cluster evolution over time which is not detected by the employed algorithm. To confirm the findings of the visual analysis, we coupled the rendering view with a schematic view of the clusters' evolution. This allows to rapidly assess the quality of the molecular cluster detection algorithm and to identify locations in the simulation data in space as well as in time where the cluster detection fails. Thus, thermodynamics researchers can eliminate weaknesses in their cluster detection algorithms. Several examples for the effective and efficient usage of our tool are presented.
Profiling physical activity motivation based on self-determination theory: a cluster analysis approach.

PubMed

Friederichs, Stijn Ah; Bolman, Catherine; Oenema, Anke; Lechner, Lilian

2015-01-01

In order to promote physical activity uptake and maintenance in individuals who do not comply with physical activity guidelines, it is important to increase our understanding of physical activity motivation among this group. The present study aimed to examine motivational profiles in a large sample of adults who do not comply with physical activity guidelines. The sample for this study consisted of 2473 individuals (31.4% male; age 44.6 ± 12.9). In order to generate motivational profiles based on motivational regulation, a cluster analysis was conducted. One-way analyses of variance were then used to compare the clusters in terms of demographics, physical activity level, motivation to be active and subjective experience while being active. Three motivational clusters were derived based on motivational regulation scores: a low motivation cluster, a controlled motivation cluster and an autonomous motivation cluster. These clusters differed significantly from each other with respect to physical activity behavior, motivation to be active and subjective experience while being active. Overall, the autonomous motivation cluster displayed more favorable characteristics compared to the other two clusters. The results of this study provide additional support for the importance of autonomous motivation in the context of physical activity behavior. The three derived clusters may be relevant in the context of physical activity interventions as individuals within the different clusters might benefit most from different intervention approaches. In addition, this study shows that cluster analysis is a useful method for differentiating between motivational profiles in large groups of individuals who do not comply with physical activity guidelines.
Deconstructing Bipolar Disorder and Schizophrenia: A cross-diagnostic cluster analysis of cognitive phenotypes.

PubMed

Lee, Junghee; Rizzo, Shemra; Altshuler, Lori; Glahn, David C; Miklowitz, David J; Sugar, Catherine A; Wynn, Jonathan K; Green, Michael F

2017-02-01

Bipolar disorder (BD) and schizophrenia (SZ) show substantial overlap. It has been suggested that a subgroup of patients might contribute to these overlapping features. This study employed a cross-diagnostic cluster analysis to identify subgroups of individuals with shared cognitive phenotypes. 143 participants (68 BD patients, 39 SZ patients and 36 healthy controls) completed a battery of EEG and performance assessments on perception, nonsocial cognition and social cognition. A K-means cluster analysis was conducted with all participants across diagnostic groups. Clinical symptoms, functional capacity, and functional outcome were assessed in patients. A two-cluster solution across 3 groups was the most stable. One cluster including 44 BD patients, 31 controls and 5 SZ patients showed better cognition (High cluster) than the other cluster with 24 BD patients, 35 SZ patients and 5 controls (Low cluster). BD patients in the High cluster performed better than BD patients in the Low cluster across cognitive domains. Within each cluster, participants with different clinical diagnoses showed different profiles across cognitive domains. All patients are in the chronic phase and out of mood episode at the time of assessment and most of the assessment were behavioral measures. This study identified two clusters with shared cognitive phenotype profiles that were not proxies for clinical diagnoses. The finding of better social cognitive performance of BD patients than SZ patients in the Lowe cluster suggest that relatively preserved social cognition may be important to identify disease process distinct to each disorder. Copyright © 2016 Elsevier B.V. All rights reserved.
How do components of evidence-based psychological treatment cluster in practice? A survey and cluster analysis.

PubMed

Gifford, Elizabeth V; Tavakoli, Sara; Weingardt, Kenneth R; Finney, John W; Pierson, Heather M; Rosen, Craig S; Hagedorn, Hildi J; Cook, Joan M; Curran, Geoff M

2012-01-01

Evidence-based psychological treatments (EBPTs) are clusters of interventions, but it is unclear how providers actually implement these clusters in practice. A disaggregated measure of EBPTs was developed to characterize clinicians' component-level evidence-based practices and to examine relationships among these practices. Survey items captured components of evidence-based treatments based on treatment integrity measures. The Web-based survey was conducted with 75 U.S. Department of Veterans Affairs (VA) substance use disorder (SUD) practitioners and 149 non-VA community-based SUD practitioners. Clinician's self-designated treatment orientations were positively related to their endorsement of those EBPT components; however, clinicians used components from a variety of EBPTs. Hierarchical cluster analysis indicated that clinicians combined and organized interventions from cognitive-behavioral therapy, the community reinforcement approach, motivational interviewing, structured family and couples therapy, 12-step facilitation, and contingency management into clusters including empathy and support, treatment engagement and activation, abstinence initiation, and recovery maintenance. Understanding how clinicians use EBPT components may lead to improved evidence-based practice dissemination and implementation. Published by Elsevier Inc.
Descriptive Statistics and Cluster Analysis for Extreme Rainfall in Java Island

NASA Astrophysics Data System (ADS)

E Komalasari, K.; Pawitan, H.; Faqih, A.

2017-03-01

This study aims to describe regional pattern of extreme rainfall based on maximum daily rainfall for period 1983 to 2012 in Java Island. Descriptive statistics analysis was performed to obtain centralization, variation and distribution of maximum precipitation data. Mean and median are utilized to measure central tendency data while Inter Quartile Range (IQR) and standard deviation are utilized to measure variation of data. In addition, skewness and kurtosis used to obtain shape the distribution of rainfall data. Cluster analysis using squared euclidean distance and ward method is applied to perform regional grouping. Result of this study show that mean (average) of maximum daily rainfall in Java Region during period 1983-2012 is around 80-181mm with median between 75-160mm and standard deviation between 17 to 82. Cluster analysis produces four clusters and show that western area of Java tent to have a higher annual maxima of daily rainfall than northern area, and have more variety of annual maximum value.
Predicting healthcare outcomes in prematurely born infants using cluster analysis.

PubMed

MacBean, Victoria; Lunt, Alan; Drysdale, Simon B; Yarzi, Muska N; Rafferty, Gerrard F; Greenough, Anne

2018-05-23

Prematurely born infants are at high risk of respiratory morbidity following neonatal unit discharge, though prediction of outcomes is challenging. We have tested the hypothesis that cluster analysis would identify discrete groups of prematurely born infants with differing respiratory outcomes during infancy. A total of 168 infants (median (IQR) gestational age 33 (31-34) weeks) were recruited in the neonatal period from consecutive births in a tertiary neonatal unit. The baseline characteristics of the infants were used to classify them into hierarchical agglomerative clusters. Rates of viral lower respiratory tract infections (LRTIs) were recorded for 151 infants in the first year after birth. Infants could be classified according to birth weight and duration of neonatal invasive mechanical ventilation (MV) into three clusters. Cluster one (MV ≤5 days) had few LRTIs. Clusters two and three (both MV ≥6 days, but BW ≥or <882 g respectively), had significantly higher LRTI rates. Cluster two had a higher proportion of infants experiencing respiratory syncytial virus LRTIs (P = 0.01) and cluster three a higher proportion of rhinovirus LRTIs (P < 0.001) CONCLUSIONS: Readily available clinical data allowed classification of prematurely born infants into one of three distinct groups with differing subsequent respiratory morbidity in infancy. © 2018 Wiley Periodicals, Inc.
Enrichment analysis in high-throughput genomics - accounting for dependency in the NULL.

PubMed

Gold, David L; Coombes, Kevin R; Wang, Jing; Mallick, Bani

2007-03-01

Translating the overwhelming amount of data generated in high-throughput genomics experiments into biologically meaningful evidence, which may for example point to a series of biomarkers or hint at a relevant pathway, is a matter of great interest in bioinformatics these days. Genes showing similar experimental profiles, it is hypothesized, share biological mechanisms that if understood could provide clues to the molecular processes leading to pathological events. It is the topic of further study to learn if or how a priori information about the known genes may serve to explain coexpression. One popular method of knowledge discovery in high-throughput genomics experiments, enrichment analysis (EA), seeks to infer if an interesting collection of genes is 'enriched' for a Consortium particular set of a priori Gene Ontology Consortium (GO) classes. For the purposes of statistical testing, the conventional methods offered in EA software implicitly assume independence between the GO classes. Genes may be annotated for more than one biological classification, and therefore the resulting test statistics of enrichment between GO classes can be highly dependent if the overlapping gene sets are relatively large. There is a need to formally determine if conventional EA results are robust to the independence assumption. We derive the exact null distribution for testing enrichment of GO classes by relaxing the independence assumption using well-known statistical theory. In applications with publicly available data sets, our test results are similar to the conventional approach which assumes independence. We argue that the independence assumption is not detrimental.
Study on Adaptive Parameter Determination of Cluster Analysis in Urban Management Cases

NASA Astrophysics Data System (ADS)

Fu, J. Y.; Jing, C. F.; Du, M. Y.; Fu, Y. L.; Dai, P. P.

2017-09-01

The fine management for cities is the important way to realize the smart city. The data mining which uses spatial clustering analysis for urban management cases can be used in the evaluation of urban public facilities deployment, and support the policy decisions, and also provides technical support for the fine management of the city. Aiming at the problem that DBSCAN algorithm which is based on the density-clustering can not realize parameter adaptive determination, this paper proposed the optimizing method of parameter adaptive determination based on the spatial analysis. Firstly, making analysis of the function Ripley's K for the data set to realize adaptive determination of global parameter MinPts, which means setting the maximum aggregation scale as the range of data clustering. Calculating every point object's highest frequency K value in the range of Eps which uses K-D tree and setting it as the value of clustering density to realize the adaptive determination of global parameter MinPts. Then, the R language was used to optimize the above process to accomplish the precise clustering of typical urban management cases. The experimental results based on the typical case of urban management in XiCheng district of Beijing shows that: The new DBSCAN clustering algorithm this paper presents takes full account of the data's spatial and statistical characteristic which has obvious clustering feature, and has a better applicability and high quality. The results of the study are not only helpful for the formulation of urban management policies and the allocation of urban management supervisors in XiCheng District of Beijing, but also to other cities and related fields.
Cluster analysis of S. Cerevisiae nucleosome binding sites

NASA Astrophysics Data System (ADS)

Suvorova, Y.; Korotkov, E.

2017-12-01

It is well known that major part of a eukaryotic genome is wrapped around histone proteins forming nucleosomes. It was also demonstrated that the DNA sequence itself is playing an important role in the nucleosome positioning process. In this work, a cluster analysis of 67 517 nucleosome binding sites from the S. Cerevisiae genome was carried out. The classification method is based on the self-adjusting dinucleotides position weight matrix. As a result, 135 significant clusters were discovered that contain 43225 sequences (which constitutes 64% of the initial set). The meaning of the found classes is discussed, as well as the possibility of the further usage.
HICOSMO - X-ray analysis of a complete sample of galaxy clusters

NASA Astrophysics Data System (ADS)

Schellenberger, G.; Reiprich, T.

2017-10-01

Galaxy clusters are known to be the largest virialized objects in the Universe. Based on the theory of structure formation one can use them as cosmological probes, since they originate from collapsed overdensities in the early Universe and witness its history. The X-ray regime provides the unique possibility to measure in detail the most massive visible component, the intra cluster medium. Using Chandra observations of a local sample of 64 bright clusters (HIFLUGCS) we provide total (hydrostatic) and gas mass estimates of each cluster individually. Making use of the completeness of the sample we quantify two interesting cosmological parameters by a Bayesian cosmological likelihood analysis. We find Ω_{M}=0.3±0.01 and σ_{8}=0.79±0.03 (statistical uncertainties) using our default analysis strategy combining both, a mass function analysis and the gas mass fraction results. The main sources of biases that we discuss and correct here are (1) the influence of galaxy groups (higher incompleteness in parent samples and a differing behavior of the L_{x} - M relation), (2) the hydrostatic mass bias (as determined by recent hydrodynamical simulations), (3) the extrapolation of the total mass (comparing various methods), (4) the theoretical halo mass function and (5) other cosmological (non-negligible neutrino mass), and instrumental (calibration) effects.

Evaluating biomarkers for prognostic enrichment of clinical trials.

PubMed

Kerr, Kathleen F; Roth, Jeremy; Zhu, Kehao; Thiessen-Philbrook, Heather; Meisner, Allison; Wilson, Francis Perry; Coca, Steven; Parikh, Chirag R

2017-12-01

A potential use of biomarkers is to assist in prognostic enrichment of clinical trials, where only patients at relatively higher risk for an outcome of interest are eligible for the trial. We investigated methods for evaluating biomarkers for prognostic enrichment. We identified five key considerations when considering a biomarker and a screening threshold for prognostic enrichment: (1) clinical trial sample size, (2) calendar time to enroll the trial, (3) total patient screening costs and the total per-patient trial costs, (4) generalizability of trial results, and (5) ethical evaluation of trial eligibility criteria. Items (1)-(3) are amenable to quantitative analysis. We developed the Biomarker Prognostic Enrichment Tool for evaluating biomarkers for prognostic enrichment at varying levels of screening stringency. We demonstrate that both modestly prognostic and strongly prognostic biomarkers can improve trial metrics using Biomarker Prognostic Enrichment Tool. Biomarker Prognostic Enrichment Tool is available as a webtool at http://prognosticenrichment.com and as a package for the R statistical computing platform. In some clinical settings, even biomarkers with modest prognostic performance can be useful for prognostic enrichment. In addition to the quantitative analysis provided by Biomarker Prognostic Enrichment Tool, investigators must consider the generalizability of trial results and evaluate the ethics of trial eligibility criteria.
Groundwater source contamination mechanisms: Physicochemical profile clustering, risk factor analysis and multivariate modelling

NASA Astrophysics Data System (ADS)

Hynds, Paul; Misstear, Bruce D.; Gill, Laurence W.; Murphy, Heather M.

2014-04-01

An integrated domestic well sampling and "susceptibility assessment" programme was undertaken in the Republic of Ireland from April 2008 to November 2010. Overall, 211 domestic wells were sampled, assessed and collated with local climate data. Based upon groundwater physicochemical profile, three clusters have been identified and characterised by source type (borehole or hand-dug well) and local geological setting. Statistical analysis indicates that cluster membership is significantly associated with the prevalence of bacteria (p = 0.001), with mean Escherichia coli presence within clusters ranging from 15.4% (Cluster-1) to 47.6% (Cluster-3). Bivariate risk factor analysis shows that on-site septic tank presence was the only risk factor significantly associated (p < 0.05) with bacterial presence within all clusters. Point agriculture adjacency was significantly associated with both borehole-related clusters. Well design criteria were associated with hand-dug wells and boreholes in areas characterised by high permeability subsoils, while local geological setting was significant for hand-dug wells and boreholes in areas dominated by low/moderate permeability subsoils. Multivariate susceptibility models were developed for all clusters, with predictive accuracies of 84% (Cluster-1) to 91% (Cluster-2) achieved. Septic tank setback was a common variable within all multivariate models, while agricultural sources were also significant, albeit to a lesser degree. Furthermore, well liner clearance was a significant factor in all models, indicating that direct surface ingress is a significant well contamination mechanism. Identification and elucidation of cluster-specific contamination mechanisms may be used to develop improved overall risk management and wellhead protection strategies, while also informing future remediation and maintenance efforts.
Identifying influential individuals on intensive care units: using cluster analysis to explore culture.

PubMed

Fong, Allan; Clark, Lindsey; Cheng, Tianyi; Franklin, Ella; Fernandez, Nicole; Ratwani, Raj; Parker, Sarah Henrickson

2017-07-01

The objective of this paper is to identify attribute patterns of influential individuals in intensive care units using unsupervised cluster analysis. Despite the acknowledgement that culture of an organisation is critical to improving patient safety, specific methods to shift culture have not been explicitly identified. A social network analysis survey was conducted and an unsupervised cluster analysis was used. A total of 100 surveys were gathered. Unsupervised cluster analysis was used to group individuals with similar dimensions highlighting three general genres of influencers: well-rounded, knowledge and relational. Culture is created locally by individual influencers. Cluster analysis is an effective way to identify common characteristics among members of an intensive care unit team that are noted as highly influential by their peers. To change culture, identifying and then integrating the influencers in intervention development and dissemination may create more sustainable and effective culture change. Additional studies are ongoing to test the effectiveness of utilising these influencers to disseminate patient safety interventions. This study offers an approach that can be helpful in both identifying and understanding influential team members and may be an important aspect of developing methods to change organisational culture. © 2017 John Wiley & Sons Ltd.
KinFin: Software for Taxon-Aware Analysis of Clustered Protein Sequences.

PubMed

Laetsch, Dominik R; Blaxter, Mark L

2017-10-05

The field of comparative genomics is concerned with the study of similarities and differences between the information encoded in the genomes of organisms. A common approach is to define gene families by clustering protein sequences based on sequence similarity, and analyze protein cluster presence and absence in different species groups as a guide to biology. Due to the high dimensionality of these data, downstream analysis of protein clusters inferred from large numbers of species, or species with many genes, is nontrivial, and few solutions exist for transparent, reproducible, and customizable analyses. We present KinFin, a streamlined software solution capable of integrating data from common file formats and delivering aggregative annotation of protein clusters. KinFin delivers analyses based on systematic taxonomy of the species analyzed, or on user-defined, groupings of taxa, for example, sets based on attributes such as life history traits, organismal phenotypes, or competing phylogenetic hypotheses. Results are reported through graphical and detailed text output files. We illustrate the utility of the KinFin pipeline by addressing questions regarding the biology of filarial nematodes, which include parasites of veterinary and medical importance. We resolve the phylogenetic relationships between the species and explore functional annotation of proteins in clusters in key lineages and between custom taxon sets, identifying gene families of interest. KinFin can easily be integrated into existing comparative genomic workflows, and promotes transparent and reproducible analysis of clustered protein data. Copyright © 2017 Laetsch and Blaxter.
Size-Based Enrichment Technologies for Non-cancerous Tumor-Derived Cells in Blood.

PubMed

Mong, Jamie; Tan, Min-Han

2018-05-01

Enumeration of circulating tumor cells (CTCs) in the bloodstream can predict prognosis and survival in cancer patients. However, CTC rarity and heterogeneity pose challenges in using them as biomarkers. Recent publications have reported new classes of circulating, non-cancerous tumor-derived cells present in cancer patients but not in healthy controls; these include cancer-associated macrophages, tumor-endothelial clusters (TECs), and cancer-associated fibroblasts (CAFs). Well-established marker-dependent CTC enrichment technologies will miss this group of circulating cells. To maximize our chance of finding useful circulating biomarkers in cancer patients, we propose the use of size-based enrichment technologies to isolate both cancerous and non-cancerous cells in circulation. We review their biological properties and discuss device features to consider in their enrichment. Copyright © 2018 Elsevier Ltd. All rights reserved.
Importance of Viral Sequence Length and Number of Variable and Informative Sites in Analysis of HIV Clustering

PubMed Central

Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor

2015-01-01

Abstract To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty was compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice. PMID:25560745
LITHIUM-RICH GIANTS IN GLOBULAR CLUSTERS

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kirby, Evan N.; Cohen, Judith G.; Guhathakurta, Puragra

Although red giants deplete lithium on their surfaces, some giants are Li-rich. Intermediate-mass asymptotic giant branch (AGB) stars can generate Li through the Cameron–Fowler conveyor, but the existence of Li-rich, low-mass red giant branch (RGB) stars is puzzling. Globular clusters are the best sites to examine this phenomenon because it is straightforward to determine membership in the cluster and to identify the evolutionary state of each star. In 72 hours of Keck/DEIMOS exposures in 25 clusters, we found four Li-rich RGB and two Li-rich AGB stars. There were 1696 RGB and 125 AGB stars with measurements or upper limits consistentmore » with normal abundances of Li. Hence, the frequency of Li-richness in globular clusters is (0.2 ± 0.1)% for the RGB, (1.6 ± 1.1)% for the AGB, and (0.3 ± 0.1)% for all giants. Because the Li-rich RGB stars are on the lower RGB, Li self-generation mechanisms proposed to occur at the luminosity function bump or He core flash cannot explain these four lower RGB stars. We propose the following origin for Li enrichment: (1) All luminous giants experience a brief phase of Li enrichment at the He core flash. (2) All post-RGB stars with binary companions on the lower RGB will engage in mass transfer. This scenario predicts that 0.1% of lower RGB stars will appear Li-rich due to mass transfer from a recently Li-enhanced companion. This frequency is at the lower end of our confidence interval.« less
Near real-time space-time cluster analysis for detection of enteric disease outbreaks in a community setting.

PubMed

Glatman-Freedman, Aharona; Kaufman, Zalman; Kopel, Eran; Bassal, Ravit; Taran, Diana; Valinsky, Lea; Agmon, Vered; Shpriz, Manor; Cohen, Daniel; Anis, Emilia; Shohat, Tamy

2016-08-01

To enhance timely surveillance of bacterial enteric pathogens, space-time cluster analysis was introduced in Israel in May 2013. Stool isolation data of Salmonella, Shigella, and Campylobacter from patients of a large Health Maintenance Organization were analyzed weekly by ArcGIS and SaTScan, and cluster results were sent promptly to local departments of health (LDOHs). During eighteen months, we identified 52 Shigella sonnei clusters, two Salmonella clusters, and no Campylobacter clusters. S. sonnei clusters lasted from one to 33 days and included three to 30 individuals. Thirty-one (60%) of the S. sonnei clusters were known to LDOHs prior to cluster analysis. Clusters not previously known by the LDOHs prompted epidemiologic investigations. In 31 of the 37 (84%) confirmed clusters, educational institutes (nursery schools, kindergartens, and a primary school) were involved. Cluster analysis demonstrated capability to complement enteric disease surveillance. Scaling up the system can further enhance timely detection and control of outbreaks. Copyright © 2016 The British Infection Association. Published by Elsevier Ltd. All rights reserved.
Functional Analysis of Human Hub Proteins and Their Interactors Involved in the Intrinsic Disorder-Enriched Interactions

PubMed Central

Hu, Gang; Wu, Zhonghua

2017-01-01

Some of the intrinsically disordered proteins and protein regions are promiscuous interactors that are involved in one-to-many and many-to-one binding. Several studies have analyzed enrichment of intrinsic disorder among the promiscuous hub proteins. We extended these works by providing a detailed functional characterization of the disorder-enriched hub protein-protein interactions (PPIs), including both hubs and their interactors, and by analyzing their enrichment among disease-associated proteins. We focused on the human interactome, given its high degree of completeness and relevance to the analysis of the disease-linked proteins. We quantified and investigated numerous functional and structural characteristics of the disorder-enriched hub PPIs, including protein binding, structural stability, evolutionary conservation, several categories of functional sites, and presence of over twenty types of posttranslational modifications (PTMs). We showed that the disorder-enriched hub PPIs have a significantly enlarged number of disordered protein binding regions and long intrinsically disordered regions. They also include high numbers of targeting, catalytic, and many types of PTM sites. We empirically demonstrated that these hub PPIs are significantly enriched among 11 out of 18 considered classes of human diseases that are associated with at least 100 human proteins. Finally, we also illustrated how over a dozen specific human hubs utilize intrinsic disorder for their promiscuous PPIs. PMID:29257115
The quantitative analysis of silicon carbide surface smoothing by Ar and Xe cluster ions

NASA Astrophysics Data System (ADS)

Ieshkin, A. E.; Kireev, D. S.; Ermakov, Yu. A.; Trifonov, A. S.; Presnov, D. E.; Garshev, A. V.; Anufriev, Yu. V.; Prokhorova, I. G.; Krupenin, V. A.; Chernysh, V. S.

2018-04-01

The gas cluster ion beam technique was used for the silicon carbide crystal surface smoothing. The effect of processing by two inert cluster ions, argon and xenon, was quantitatively compared. While argon is a standard element for GCIB, results for xenon clusters were not reported yet. Scanning probe microscopy and high resolution transmission electron microscopy techniques were used for the analysis of the surface roughness and surface crystal layer quality. The gas cluster ion beam processing results in surface relief smoothing down to average roughness about 1 nm for both elements. It was shown that xenon as the working gas is more effective: sputtering rate for xenon clusters is 2.5 times higher than for argon at the same beam energy. High resolution transmission electron microscopy analysis of the surface defect layer gives values of 7 ± 2 nm and 8 ± 2 nm for treatment with argon and xenon clusters.
Analysis of expressed sequence tags generated from full-length enriched cDNA libraries of melon

PubMed Central

2011-01-01

Background Melon (Cucumis melo), an economically important vegetable crop, belongs to the Cucurbitaceae family which includes several other important crops such as watermelon, cucumber, and pumpkin. It has served as a model system for sex determination and vascular biology studies. However, genomic resources currently available for melon are limited. Result We constructed eleven full-length enriched and four standard cDNA libraries from fruits, flowers, leaves, roots, cotyledons, and calluses of four different melon genotypes, and generated 71,577 and 22,179 ESTs from full-length enriched and standard cDNA libraries, respectively. These ESTs, together with ~35,000 ESTs available in public domains, were assembled into 24,444 unigenes, which were extensively annotated by comparing their sequences to different protein and functional domain databases, assigning them Gene Ontology (GO) terms, and mapping them onto metabolic pathways. Comparative analysis of melon unigenes and other plant genomes revealed that 75% to 85% of melon unigenes had homologs in other dicot plants, while approximately 70% had homologs in monocot plants. The analysis also identified 6,972 gene families that were conserved across dicot and monocot plants, and 181, 1,192, and 220 gene families specific to fleshy fruit-bearing plants, the Cucurbitaceae family, and melon, respectively. Digital expression analysis identified a total of 175 tissue-specific genes, which provides a valuable gene sequence resource for future genomics and functional studies. Furthermore, we identified 4,068 simple sequence repeats (SSRs) and 3,073 single nucleotide polymorphisms (SNPs) in the melon EST collection. Finally, we obtained a total of 1,382 melon full-length transcripts through the analysis of full-length enriched cDNA clones that were sequenced from both ends. Analysis of these full-length transcripts indicated that sizes of melon 5' and 3' UTRs were similar to those of tomato, but longer than many other dicot
A view of the H-band light-element chemical patterns in globular clusters under the AGB self-enrichment scenario

NASA Astrophysics Data System (ADS)

Dell'Agli, F.; García-Hernández, D. A.; Ventura, P.; Mészáros, Sz; Masseron, T.; Fernández-Trincado, J. G.; Tang, B.; Shetrone, M.; Zamora, O.; Lucatello, S.

2018-04-01

We discuss the self-enrichment scenario by asymptotic giant branch (AGB) stars for the formation of multiple populations in globular clusters (GCs) by analysing data set of giant stars observed in nine Galactic GCs, covering a wide range of metallicities and for which the simultaneous measurements of C, N, O, Mg, Al, and Si are available. To this aim, we calculated six sets of AGB models, with the same chemical composition as the stars belonging to the first generation of each GC. We find that the AGB yields can reproduce the set of observations available, not only in terms of the degree of contamination shown by stars in each GC but, more important, also the observed trend with metallicity, which agrees well with the predictions from AGB evolution modelling. While further observational evidences are required to definitively fix the main actors in the pollution of the interstellar medium from which new generation of stars formed in GCs, the present results confirm that the gas ejected by stars of mass in the range 4 M_{⊙} ≤ M ≤ 8 M_{⊙} during the AGB phase share the same chemical patterns traced by stars in GCs.
Sensory over responsivity and obsessive compulsive symptoms: A cluster analysis.

PubMed

Ben-Sasson, Ayelet; Podoly, Tamar Yonit

2017-02-01

Several studies have examined the sensory component in Obsesseive Compulsive Disorder (OCD) and described an OCD subtype which has a unique profile, and that Sensory Phenomena (SP) is a significant component of this subtype. SP has some commonalities with Sensory Over Responsivity (SOR) and might be in part a characteristic of this subtype. Although there are some studies that have examined SOR and its relation to Obsessive Compulsive Symptoms (OCS), literature lacks sufficient data on this interplay. First to further examine the correlations between OCS and SOR, and to explore the correlations between SOR modalities (i.e. smell, touch, etc.) and OCS subscales (i.e. washing, ordering, etc.). Second, to investigate the cluster analysis of SOR and OCS dimensions in adults, that is, to classify the sample using the sensory scores to find whether a sensory OCD subtype can be specified. Our third goal was to explore the psychometric features of a new sensory questionnaire: the Sensory Perception Quotient (SPQ). A sample of non clinical adults (n=350) was recruited via e-mail, social media and social networks. Participants completed questionnaires for measuring SOR, OCS, and anxiety. SOR and OCI-F scores were moderately significantly correlated (n=274), significant correlations between all SOR modalities and OCS subscales were found with no specific higher correlation between one modality to one OCS subscale. Cluster analysis revealed four distinct clusters: (1) No OC and SOR symptoms (NONE; n=100), (2) High OC and SOR symptoms (BOTH; n=28), (3) Moderate OC symptoms (OCS; n=63), (4) Moderate SOR symptoms (SOR; n=83). The BOTH cluster had significantly higher anxiety levels than the other clusters, and shared OC subscales scores with the OCS cluster. The BOTH cluster also reported higher SOR scores across tactile, vision, taste and olfactory modalities. The SPQ was found reliable and suitable to detect SOR, the sample SPQ scores was normally distributed (n=350). SOR is a
Perspectives on Intracluster Enrichment and the Stellar Initial Mass Function in Elliptical Galaxies

NASA Technical Reports Server (NTRS)

Lowenstein, Michael

2013-01-01

The amount of metals in the Intracluster Medium (ICM) in rich galaxy clusters exceeds that expected based on the observed stellar population by a large factor. We quantify this discrepancy--which we term the "cluster elemental abundance paradox"--and investigate the required properties of the ICM-enriching population. The necessary enhancement in metal enrichment may, in principle, originate in the observed stellar population if a larger fraction of stars in the supernova-progenitor mass range form from an initial mass function (IMF) that is either bottom-light or top-heavy, with the latter in some conflict with observed ICM abundance ratios. Other alternatives that imply more modest revisions to the IMF, mass return and remnant fractions, and primordial fraction, posit an increase in the fraction of 3-8 solar mass stars that explode as SNIa or assume that there are more stars than conventionally thought--although the latter implies a high star formation efficiency. We discuss the feasibility of these various solutions and the implications for the diversity of star formation, the process of elliptical galaxy formation, and the nature of this hidden source of ICM metal enrichment in light of recent evidence of an elliptical galaxy IMF that, because it is skewed to low masses, deepens the paradox.
Transcriptional Regulatory Network Analysis of MYB Transcription Factor Family Genes in Rice.

PubMed

Smita, Shuchi; Katiyar, Amit; Chinnusamy, Viswanathan; Pandey, Dev M; Bansal, Kailash C

2015-01-01

MYB transcription factor (TF) is one of the largest TF families and regulates defense responses to various stresses, hormone signaling as well as many metabolic and developmental processes in plants. Understanding these regulatory hierarchies of gene expression networks in response to developmental and environmental cues is a major challenge due to the complex interactions between the genetic elements. Correlation analyses are useful to unravel co-regulated gene pairs governing biological process as well as identification of new candidate hub genes in response to these complex processes. High throughput expression profiling data are highly useful for construction of co-expression networks. In the present study, we utilized transcriptome data for comprehensive regulatory network studies of MYB TFs by "top-down" and "guide-gene" approaches. More than 50% of OsMYBs were strongly correlated under 50 experimental conditions with 51 hub genes via "top-down" approach. Further, clusters were identified using Markov Clustering (MCL). To maximize the clustering performance, parameter evaluation of the MCL inflation score (I) was performed in terms of enriched GO categories by measuring F-score. Comparison of co-expressed cluster and clads analyzed from phylogenetic analysis signifies their evolutionarily conserved co-regulatory role. We utilized compendium of known interaction and biological role with Gene Ontology enrichment analysis to hypothesize function of coexpressed OsMYBs. In the other part, the transcriptional regulatory network analysis by "guide-gene" approach revealed 40 putative targets of 26 OsMYB TF hubs with high correlation value utilizing 815 microarray data. The putative targets with MYB-binding cis-elements enrichment in their promoter region, functional co-occurrence as well as nuclear localization supports our finding. Specially, enrichment of MYB binding regions involved in drought-inducibility implying their regulatory role in drought response in rice
Evaluation of hierarchical agglomerative cluster analysis methods for discrimination of primary biological aerosol

NASA Astrophysics Data System (ADS)

Crawford, I.; Ruske, S.; Topping, D. O.; Gallagher, M. W.

2015-07-01

In this paper we present improved methods for discriminating and quantifying Primary Biological Aerosol Particles (PBAP) by applying hierarchical agglomerative cluster analysis to multi-parameter ultra violet-light induced fluorescence (UV-LIF) spectrometer data. The methods employed in this study can be applied to data sets in excess of 1×106 points on a desktop computer, allowing for each fluorescent particle in a dataset to be explicitly clustered. This reduces the potential for misattribution found in subsampling and comparative attribution methods used in previous approaches, improving our capacity to discriminate and quantify PBAP meta-classes. We evaluate the performance of several hierarchical agglomerative cluster analysis linkages and data normalisation methods using laboratory samples of known particle types and an ambient dataset. Fluorescent and non-fluorescent polystyrene latex spheres were sampled with a Wideband Integrated Bioaerosol Spectrometer (WIBS-4) where the optical size, asymmetry factor and fluorescent measurements were used as inputs to the analysis package. It was found that the Ward linkage with z-score or range normalisation performed best, correctly attributing 98 and 98.1 % of the data points respectively. The best performing methods were applied to the BEACHON-RoMBAS ambient dataset where it was found that the z-score and range normalisation methods yield similar results with each method producing clusters representative of fungal spores and bacterial aerosol, consistent with previous results. The z-score result was compared to clusters generated with previous approaches (WIBS AnalysiS Program, WASP) where we observe that the subsampling and comparative attribution method employed by WASP results in the overestimation of the fungal spore concentration by a factor of 1.5 and the underestimation of bacterial aerosol concentration by a factor of 5. We suggest that this likely due to errors arising from misatrribution due to poor
EviNet: a web platform for network enrichment analysis with flexible definition of gene sets.

PubMed

Jeggari, Ashwini; Alekseenko, Zhanna; Petrov, Iurii; Dias, José M; Ericson, Johan; Alexeyenko, Andrey

2018-06-09

The new web resource EviNet provides an easily run interface to network enrichment analysis for exploration of novel, experimentally defined gene sets. The major advantages of this analysis are (i) applicability to any genes found in the global network rather than only to those with pathway/ontology term annotations, (ii) ability to connect genes via different molecular mechanisms rather than within one high-throughput platform, and (iii) statistical power sufficient to detect enrichment of very small sets, down to individual genes. The users' gene sets are either defined prior to upload or derived interactively from an uploaded file by differential expression criteria. The pathways and networks used in the analysis can be chosen from the collection menu. The calculation is typically done within seconds or minutes and the stable URL is provided immediately. The results are presented in both visual (network graphs) and tabular formats using jQuery libraries. Uploaded data and analysis results are kept in separated project directories not accessible by other users. EviNet is available at https://www.evinet.org/.
Outcome-Driven Cluster Analysis with Application to Microarray Data.

PubMed

Hsu, Jessie J; Finkelstein, Dianne M; Schoenfeld, David A

2015-01-01

One goal of cluster analysis is to sort characteristics into groups (clusters) so that those in the same group are more highly correlated to each other than they are to those in other groups. An example is the search for groups of genes whose expression of RNA is correlated in a population of patients. These genes would be of greater interest if their common level of RNA expression were additionally predictive of the clinical outcome. This issue arose in the context of a study of trauma patients on whom RNA samples were available. The question of interest was whether there were groups of genes that were behaving similarly, and whether each gene in the cluster would have a similar effect on who would recover. For this, we develop an algorithm to simultaneously assign characteristics (genes) into groups of highly correlated genes that have the same effect on the outcome (recovery). We propose a random effects model where the genes within each group (cluster) equal the sum of a random effect, specific to the observation and cluster, and an independent error term. The outcome variable is a linear combination of the random effects of each cluster. To fit the model, we implement a Markov chain Monte Carlo algorithm based on the likelihood of the observed data. We evaluate the effect of including outcome in the model through simulation studies and describe a strategy for prediction. These methods are applied to trauma data from the Inflammation and Host Response to Injury research program, revealing a clustering of the genes that are informed by the recovery outcome.
Kinematic gait patterns in healthy runners: A hierarchical cluster analysis.

PubMed

Phinyomark, Angkoon; Osis, Sean; Hettinga, Blayne A; Ferber, Reed

2015-11-05

Previous studies have demonstrated distinct clusters of gait patterns in both healthy and pathological groups, suggesting that different movement strategies may be represented. However, these studies have used discrete time point variables and usually focused on only one specific joint and plane of motion. Therefore, the first purpose of this study was to determine if running gait patterns for healthy subjects could be classified into homogeneous subgroups using three-dimensional kinematic data from the ankle, knee, and hip joints. The second purpose was to identify differences in joint kinematics between these groups. The third purpose was to investigate the practical implications of clustering healthy subjects by comparing these kinematics with runners experiencing patellofemoral pain (PFP). A principal component analysis (PCA) was used to reduce the dimensionality of the entire gait waveform data and then a hierarchical cluster analysis (HCA) determined group sets of similar gait patterns and homogeneous clusters. The results show two distinct running gait patterns were found with the main between-group differences occurring in frontal and sagittal plane knee angles (P<0.001), independent of age, height, weight, and running speed. When these two groups were compared to PFP runners, one cluster exhibited greater while the other exhibited reduced peak knee abduction angles (P<0.05). The variability observed in running patterns across this sample could be the result of different gait strategies. These results suggest care must be taken when selecting samples of subjects in order to investigate the pathomechanics of injured runners. Copyright © 2015 Elsevier Ltd. All rights reserved.
A Dimensionality Reduction-Based Multi-Step Clustering Method for Robust Vessel Trajectory Analysis

PubMed Central

Liu, Jingxian; Wu, Kefeng

2017-01-01

The Shipboard Automatic Identification System (AIS) is crucial for navigation safety and maritime surveillance, data mining and pattern analysis of AIS information have attracted considerable attention in terms of both basic research and practical applications. Clustering of spatio-temporal AIS trajectories can be used to identify abnormal patterns and mine customary route data for transportation safety. Thus, the capacities of navigation safety and maritime traffic monitoring could be enhanced correspondingly. However, trajectory clustering is often sensitive to undesirable outliers and is essentially more complex compared with traditional point clustering. To overcome this limitation, a multi-step trajectory clustering method is proposed in this paper for robust AIS trajectory clustering. In particular, the Dynamic Time Warping (DTW), a similarity measurement method, is introduced in the first step to measure the distances between different trajectories. The calculated distances, inversely proportional to the similarities, constitute a distance matrix in the second step. Furthermore, as a widely-used dimensional reduction method, Principal Component Analysis (PCA) is exploited to decompose the obtained distance matrix. In particular, the top k principal components with above 95% accumulative contribution rate are extracted by PCA, and the number of the centers k is chosen. The k centers are found by the improved center automatically selection algorithm. In the last step, the improved center clustering algorithm with k clusters is implemented on the distance matrix to achieve the final AIS trajectory clustering results. In order to improve the accuracy of the proposed multi-step clustering algorithm, an automatic algorithm for choosing the k clusters is developed according to the similarity distance. Numerous experiments on realistic AIS trajectory datasets in the bridge area waterway and Mississippi River have been implemented to compare our proposed method with

Person mobility in the design and analysis of cluster-randomized cohort prevention trials.

PubMed

Vuchinich, Sam; Flay, Brian R; Aber, Lawrence; Bickman, Leonard

2012-06-01

Person mobility is an inescapable fact of life for most cluster-randomized (e.g., schools, hospitals, clinic, cities, state) cohort prevention trials. Mobility rates are an important substantive consideration in estimating the effects of an intervention. In cluster-randomized trials, mobility rates are often correlated with ethnicity, poverty and other variables associated with disparity. This raises the possibility that estimated intervention effects may generalize to only the least mobile segments of a population and, thus, create a threat to external validity. Such mobility can also create threats to the internal validity of conclusions from randomized trials. Researchers must decide how to deal with persons who leave study clusters during a trial (dropouts), persons and clusters that do not comply with an assigned intervention, and persons who enter clusters during a trial (late entrants), in addition to the persons who remain for the duration of a trial (stayers). Statistical techniques alone cannot solve the key issues of internal and external validity raised by the phenomenon of person mobility. This commentary presents a systematic, Campbellian-type analysis of person mobility in cluster-randomized cohort prevention trials. It describes four approaches for dealing with dropouts, late entrants and stayers with respect to data collection, analysis and generalizability. The questions at issue are: 1) From whom should data be collected at each wave of data collection? 2) Which cases should be included in the analyses of an intervention effect? and 3) To what populations can trial results be generalized? The conclusions lead to recommendations for the design and analysis of future cluster-randomized cohort prevention trials.
A nonparametric clustering technique which estimates the number of clusters

NASA Technical Reports Server (NTRS)

Ramey, D. B.

1983-01-01

In applications of cluster analysis, one usually needs to determine the number of clusters, K, and the assignment of observations to each cluster. A clustering technique based on recursive application of a multivariate test of bimodality which automatically estimates both K and the cluster assignments is presented.
Characterizing Suicide in Toronto: An Observational Study and Cluster Analysis

PubMed Central

Sinyor, Mark; Schaffer, Ayal; Streiner, David L

2014-01-01

Objective: To determine whether people who have died from suicide in a large epidemiologic sample form clusters based on demographic, clinical, and psychosocial factors. Method: We conducted a coroner’s chart review for 2886 people who died in Toronto, Ontario, from 1998 to 2010, and whose death was ruled as suicide by the Office of the Chief Coroner of Ontario. A cluster analysis using known suicide risk factors was performed to determine whether suicide deaths separate into distinct groups. Clusters were compared according to person- and suicide-specific factors. Results: Five clusters emerged. Cluster 1 had the highest proportion of females and nonviolent methods, and all had depression and a past suicide attempt. Cluster 2 had the highest proportion of people with a recent stressor and violent suicide methods, and all were married. Cluster 3 had mostly males between the ages of 20 and 64, and all had either experienced recent stressors, suffered from mental illness, or had a history of substance abuse. Cluster 4 had the youngest people and the highest proportion of deaths by jumping from height, few were married, and nearly one-half had bipolar disorder or schizophrenia. Cluster 5 had all unmarried people with no prior suicide attempts, and were the least likely to have an identified mental illness and most likely to leave a suicide note. Conclusions: People who die from suicide assort into different patterns of demographic, clinical, and death-specific characteristics. Identifying and studying subgroups of suicides may advance our understanding of the heterogeneous nature of suicide and help to inform development of more targeted suicide prevention strategies. PMID:24444321
FLOCK cluster analysis of mast cell event clustering by high-sensitivity flow cytometry predicts systemic mastocytosis.

PubMed

Dorfman, David M; LaPlante, Charlotte D; Pozdnyakova, Olga; Li, Betty

2015-11-01

In our high-sensitivity flow cytometric approach for systemic mastocytosis (SM), we identified mast cell event clustering as a new diagnostic criterion for the disease. To objectively characterize mast cell gated event distributions, we performed cluster analysis using FLOCK, a computational approach to identify cell subsets in multidimensional flow cytometry data in an unbiased, automated fashion. FLOCK identified discrete mast cell populations in most cases of SM (56/75 [75%]) but only a minority of non-SM cases (17/124 [14%]). FLOCK-identified mast cell populations accounted for 2.46% of total cells on average in SM cases and 0.09% of total cells on average in non-SM cases (P < .0001) and were predictive of SM, with a sensitivity of 75%, a specificity of 86%, a positive predictive value of 76%, and a negative predictive value of 85%. FLOCK analysis provides useful diagnostic information for evaluating patients with suspected SM, and may be useful for the analysis of other hematopoietic neoplasms. Copyright© by the American Society for Clinical Pathology.
Using Enrichment Clusters to Address the Needs of Culturally and Linguistically Diverse Learners

ERIC Educational Resources Information Center

Allen, Jennifer K.; Robbins, Margaret A.; Payne, Yolanda Denise; Brown, Katherine Backes

2016-01-01

Using data from teacher interviews, classroom observations, and a professional development workshop, this article explains how one component of the schoolwide enrichment model (SEM) has been implemented at a culturally diverse elementary school serving primarily Latina/o and African American students. Based on a broadened conception of giftedness,…
Cluster headache and the hypocretin receptor 2 reconsidered: a genetic association study and meta-analysis.

PubMed

Weller, Claudia M; Wilbrink, Leopoldine A; Houwing-Duistermaat, Jeanine J; Koelewijn, Stephany C; Vijfhuizen, Lisanne S; Haan, Joost; Ferrari, Michel D; Terwindt, Gisela M; van den Maagdenberg, Arn M J M; de Vries, Boukje

2015-08-01

Cluster headache is a severe neurological disorder with a complex genetic background. A missense single nucleotide polymorphism (rs2653349; p.Ile308Val) in the HCRTR2 gene that encodes the hypocretin receptor 2 is the only genetic factor that is reported to be associated with cluster headache in different studies. However, as there are conflicting results between studies, we re-evaluated its role in cluster headache. We performed a genetic association analysis for rs2653349 in our large Leiden University Cluster headache Analysis (LUCA) program study population. Systematic selection of the literature yielded three additional studies comprising five study populations, which were included in our meta-analysis. Data were extracted according to predefined criteria. A total of 575 cluster headache patients from our LUCA study and 874 controls were genotyped for HCRTR2 SNP rs2653349 but no significant association with cluster headache was found (odds ratio 0.91 (95% confidence intervals 0.75-1.10), p = 0.319). In contrast, the meta-analysis that included in total 1167 cluster headache cases and 1618 controls from the six study populations, which were part of four different studies, showed association of the single nucleotide polymorphism with cluster headache (random effect odds ratio 0.69 (95% confidence intervals 0.53-0.90), p = 0.006). The association became weaker, as the odds ratio increased to 0.80, when the meta-analysis was repeated without the initial single South European study with the largest effect size. Although we did not find evidence for association of rs2653349 in our LUCA study, which is the largest investigated study population thus far, our meta-analysis provides genetic evidence for a role of HCRTR2 in cluster headache. Regardless, we feel that the association should be interpreted with caution as meta-analyses with individual populations that have limited power have diminished validity. © International Headache Society 2014.
Cluster Analysis of International Information and Social Development.

ERIC Educational Resources Information Center

Lau, Jesus

1990-01-01

Analyzes information activities in relation to socioeconomic characteristics in low, middle, and highly developed economies for the years 1960 and 1977 through the use of cluster analysis. Results of data from 31 countries suggest that information development is achieved mainly by countries that have also achieved social development. (26…
Hubble Admires a Youthful Globular Star Cluster

NASA Image and Video Library

2017-12-08

Hubble sees an unusal global cluster that is enriching the interstellar medium with metals Globular clusters offer some of the most spectacular sights in the night sky. These ornate spheres contain hundreds of thousands of stars, and reside in the outskirts of galaxies. The Milky Way contains over 150 such clusters — and the one shown in this NASA/ESA Hubble Space Telescope image, named NGC 362, is one of the more unusual ones. As stars make their way through life they fuse elements together in their cores, creating heavier and heavier elements — known in astronomy as metals — in the process. When these stars die, they flood their surroundings with the material they have formed during their lifetimes, enriching the interstellar medium with metals. Stars that form later therefore contain higher proportions of metals than their older relatives. By studying the different elements present within individual stars in NGC 362, astronomers discovered that the cluster boasts a surprisingly high metal content, indicating that it is younger than expected. Although most globular clusters are much older than the majority of stars in their host galaxy, NGC 362 bucks the trend, with an age lying between 10 and 11 billion years old. For reference, the age of the Milky Way is estimated to be above 13 billion years. This image, in which you can view NGC 362’s individual stars, was taken by Hubble’s Advanced Camera for Surveys (ACS). Credit: ESA/Hubble& NASA NASA image use policy. NASA Goddard Space Flight Center enables NASA’s mission through four scientific endeavors: Earth Science, Heliophysics, Solar System Exploration, and Astrophysics. Goddard plays a leading role in NASA’s accomplishments by contributing compelling scientific knowledge to advance the Agency’s mission. Follow us on Twitter Like us on Facebook Find us on Instagram
Electrochemical and genomic analysis of novel electroactive isolates obtained via potentiostatic enrichment from tropical sediment

NASA Astrophysics Data System (ADS)

Doyle, Lucinda E.; Yung, Pui Yi; Mitra, Sumitra D.; Wuertz, Stefan; Williams, Rohan B. H.; Lauro, Federico M.; Marsili, Enrico

2017-07-01

Enrichment of electrochemically-active microorganisms (EAM) to date has mostly relied on microbial fuel cells fed with wastewater. This study aims to enrich novel EAM by exposing tropical sediment, not frequently reported in the literature, to sustained anodic potentials. Voltamperometric techniques and electrochemical impedance spectroscopy, performed over a wide range of potentials, characterise extracellular electron transfer (EET) over time. Applied potential is found to affect biofilm electrochemical signature. Geobacter metallireducens is heavily enriched on the electrodes, as determined by metagenomic and metatranscriptomic analysis, in the first report of the species in a lactate-fed system. Two novel isolates are grown in pure culture from the enrichment, identified by 16S rRNA gene sequencing as Aeromonas and Enterobacter, respectively. The names proposed are Aeromonas sp. CL-1 and Enterobacter sp. EA-1. Both isolates are capable of EET on carbon felt and screen-printed carbon electrodes without the addition of exogenous redox mediators. Enterobacter sp. EA-1 can also perform mediated electron transfer using the soluble redox mediator 2-hydroxy-1,4-naphthoquinone (HNQ). Both isolates are able to use acetate and lactate as electron donors. This work outlines a comprehensive methodology for characterising novel EAM from unconventional inocula.
PHOTONICS AND NANOTECHNOLOGY Pulsed laser ablation of binary semiconductors: mechanisms of vaporisation and cluster formation

NASA Astrophysics Data System (ADS)

Bulgakov, A. V.; Evtushenko, A. B.; Shukhov, Yu G.; Ozerov, I.; Marin, W.

2010-12-01

Formation of small clusters during pulsed ablation of two binary semiconductors, zinc oxide and indium phosphide, in vacuum by UV, visible, and IR laser radiation is comparatively studied. The irradiation conditions favourable for generation of neutral and charged ZnnOm and InnPm clusters of different stoichiometry in the ablation products are found. The size and composition of the clusters, their expansion dynamics and reactivity are analysed by time-of-flight mass spectrometry. A particular attention is paid to the mechanisms of ZnO and InP ablation as a function of laser fluence, with the use of different ablation models. It is established that ZnO evapourates congruently in a wide range of irradiation conditions, while InP ablation leads to enrichment of the target surface with indium. It is shown that this radically different character of semiconductor ablation determines the composition of the nanostructures formed: zinc oxide clusters are mainly stoichiometric, whereas InnPm particles are significantly enriched with indium.
A systematic analysis of genomic changes in Tg2576 mice.

PubMed

Tan, Lu; Wang, Xiong; Ni, Zhong-Fei; Zhu, Xiuming; Wu, Wei; Zhu, Ling-Qiang; Liu, Dan

2013-06-01

Alzheimer's disease (AD) is an age-related neurodegenerative disorder characterized by intelligence decline, behavioral disorders and cognitive disability. The purpose of this study was to investigate gene expression in AD, based on published microarray data on Tg2576 mice. Hierarchical Cluster Analysis and Gene Ontology were employed to group genes together on the basis of their product characteristics and annotation data. Genes with prominent alterations were clustered into apoptosis and axon guidance pathways. Based on our findings and those of previous studies, we propose that the mitochondria-mediated apoptotic pathway plays a crucial role in the neuronal loss and synaptic dysfunction associated with AD. Furthermore, based on the findings of Positional Gene Enrichment analysis and Gene Set Enrichment analysis, we propose that the regulation of transcription of AD genes may be an important pathogenic factor in this neurodegenerative disease. Our results highlight the importance of genes that could subsequently be examined for their potential as prognostic markers for AD.
An additional k-means clustering step improves the biological features of WGCNA gene co-expression networks.

PubMed

Botía, Juan A; Vandrovcova, Jana; Forabosco, Paola; Guelfi, Sebastian; D'Sa, Karishma; Hardy, John; Lewis, Cathryn M; Ryten, Mina; Weale, Michael E

2017-04-12

Weighted Gene Co-expression Network Analysis (WGCNA) is a widely used R software package for the generation of gene co-expression networks (GCN). WGCNA generates both a GCN and a derived partitioning of clusters of genes (modules). We propose k-means clustering as an additional processing step to conventional WGCNA, which we have implemented in the R package km2gcn (k-means to gene co-expression network, https://github.com/juanbot/km2gcn ). We assessed our method on networks created from UKBEC data (10 different human brain tissues), on networks created from GTEx data (42 human tissues, including 13 brain tissues), and on simulated networks derived from GTEx data. We observed substantially improved module properties, including: (1) few or zero misplaced genes; (2) increased counts of replicable clusters in alternate tissues (x3.1 on average); (3) improved enrichment of Gene Ontology terms (seen in 48/52 GCNs) (4) improved cell type enrichment signals (seen in 21/23 brain GCNs); and (5) more accurate partitions in simulated data according to a range of similarity indices. The results obtained from our investigations indicate that our k-means method, applied as an adjunct to standard WGCNA, results in better network partitions. These improved partitions enable more fruitful downstream analyses, as gene modules are more biologically meaningful.
Expressed sequence enrichment for candidate gene analysis of citrus tristeza virus resistance.

PubMed

Bernet, G P; Bretó, M P; Asins, M J

2004-02-01

Several studies have reported markers linked to a putative resistance gene from Poncirus trifoliata ( Ctv-R) located at linkage group 4 that confers resistance against one of the most important citrus pathogens, citrus tristeza virus (CTV). To be successful in both marker-assisted selection and transformation experiments, its accurate mapping is needed. Several factors may affect its localization, among them two are considered here: the definition of resistance and the genetic background of progeny. Two progenies derived from P. trifoliata, by self-pollination and by crossing with sour orange ( Citrus aurantium), a citrus rootstock well-adapted to arid and semi-arid areas, were used for linkage group-4 marker enrichment. Two new methodologies were used to enrich this region with expressed sequences. The enrichment of group 4 resulted in the fusion of several C. aurantium linkage groups. The new one A(7+3+4) is now saturated with 48 markers including expressed sequences. Surprisingly, sour orange was as resistant to the CTV isolate tested as was P. trifoliata, and three hybrids that carry Ctv-R, as deduced from its flanking markers, are susceptible to CTV. The new linkage maps were used to map Ctv-R under the hypothesis of monogenic inheritance. Its position on linkage group 4 of P. trifoliata differs from the location previously reported in other progenies. The genetic analysis of virus-plant interaction in the family derived from C. aurantium after a CTV chronic infection showed the segregation of five types of interaction, which is not compatible with the hypothesis of a single gene controlling resistance. Two major issues are discussed: another type of genetic analysis of CTV resistance is needed to avoid the assumption of monogenic inheritance, and transferring Ctv-R from P. trifoliata to sour orange might not avoid the CTV decline of sweet orange trees.
Chandra Observation of the Cluster of Galaxies MS 0839.9+2938 at z = 0.194: The Central Excess Iron and Type Ia Supernova Enrichment

NASA Astrophysics Data System (ADS)

Wang, Yu; Xu, Haiguang; Zhang, Zhongli; Xu, Yueheng; Wu, Xiang-Ping; Xue, Sui-Jian; Li, Zongwei

2005-09-01

We present the Chandra study of the intermediate-redshift (z=0.194) cluster of galaxies MS 0839.9+2938. By performing both projected and deprojected spectral analyses, we find that the gas temperature is approximately constant at about 4 keV within 130-444 h-170 kpc. In the inner regions, the gas temperature decreases toward the center, reaching <~3 keV in the central 37 h-170 kpc. This implies that the lower and upper limits of the mass deposit rate are 9-34 and 96-126 Msolar yr-1, respectively, within 74 h-170 kpc, where the gas is significantly colder. Along with the temperature drop, we detect a significant inward iron abundance increase from about 0.4 Zsolar in the outer regions to ~=1 Zsolar within the central 37 h-170 kpc. Thus, MS 0839.9+2938 is the cluster showing the most significant central iron excess at z>~0.2. We argue that most of the excess iron should have been contributed by SNe Ia. Using the observed SN Ia rate and stellar mass loss rate, we estimate that the time needed to enrich the central region with excess iron is 6.4-7.9 Gyr, which is similar to those found for nearby clusters. Coinciding with the optical extension of the cD galaxy (up to about 30 h-170 kpc), the observed X-ray surface brightness profile exhibits an excess beyond the distribution expected by either the β model or the Navarro-Frenk-White (NFW) model and can be well fitted with an empirical two-β model that leads to a relatively flatter mass profile in the innermost region.
Analysis of basic clustering algorithms for numerical estimation of statistical averages in biomolecules.

PubMed

Anandakrishnan, Ramu; Onufriev, Alexey

2008-03-01

In statistical mechanics, the equilibrium properties of a physical system of particles can be calculated as the statistical average over accessible microstates of the system. In general, these calculations are computationally intractable since they involve summations over an exponentially large number of microstates. Clustering algorithms are one of the methods used to numerically approximate these sums. The most basic clustering algorithms first sub-divide the system into a set of smaller subsets (clusters). Then, interactions between particles within each cluster are treated exactly, while all interactions between different clusters are ignored. These smaller clusters have far fewer microstates, making the summation over these microstates, tractable. These algorithms have been previously used for biomolecular computations, but remain relatively unexplored in this context. Presented here, is a theoretical analysis of the error and computational complexity for the two most basic clustering algorithms that were previously applied in the context of biomolecular electrostatics. We derive a tight, computationally inexpensive, error bound for the equilibrium state of a particle computed via these clustering algorithms. For some practical applications, it is the root mean square error, which can be significantly lower than the error bound, that may be more important. We how that there is a strong empirical relationship between error bound and root mean square error, suggesting that the error bound could be used as a computationally inexpensive metric for predicting the accuracy of clustering algorithms for practical applications. An example of error analysis for such an application-computation of average charge of ionizable amino-acids in proteins-is given, demonstrating that the clustering algorithm can be accurate enough for practical purposes.
Convex Clustering: An Attractive Alternative to Hierarchical Clustering

PubMed Central

Chen, Gary K.; Chi, Eric C.; Ranola, John Michael O.; Lange, Kenneth

2015-01-01

The primary goal in cluster analysis is to discover natural groupings of objects. The field of cluster analysis is crowded with diverse methods that make special assumptions about data and address different scientific aims. Despite its shortcomings in accuracy, hierarchical clustering is the dominant clustering method in bioinformatics. Biologists find the trees constructed by hierarchical clustering visually appealing and in tune with their evolutionary perspective. Hierarchical clustering operates on multiple scales simultaneously. This is essential, for instance, in transcriptome data, where one may be interested in making qualitative inferences about how lower-order relationships like gene modules lead to higher-order relationships like pathways or biological processes. The recently developed method of convex clustering preserves the visual appeal of hierarchical clustering while ameliorating its propensity to make false inferences in the presence of outliers and noise. The solution paths generated by convex clustering reveal relationships between clusters that are hidden by static methods such as k-means clustering. The current paper derives and tests a novel proximal distance algorithm for minimizing the objective function of convex clustering. The algorithm separates parameters, accommodates missing data, and supports prior information on relationships. Our program CONVEXCLUSTER incorporating the algorithm is implemented on ATI and nVidia graphics processing units (GPUs) for maximal speed. Several biological examples illustrate the strengths of convex clustering and the ability of the proximal distance algorithm to handle high-dimensional problems. CONVEXCLUSTER can be freely downloaded from the UCLA Human Genetics web site at http://www.genetics.ucla.edu/software/ PMID:25965340
Convex clustering: an attractive alternative to hierarchical clustering.

PubMed

Chen, Gary K; Chi, Eric C; Ranola, John Michael O; Lange, Kenneth

2015-05-01

The primary goal in cluster analysis is to discover natural groupings of objects. The field of cluster analysis is crowded with diverse methods that make special assumptions about data and address different scientific aims. Despite its shortcomings in accuracy, hierarchical clustering is the dominant clustering method in bioinformatics. Biologists find the trees constructed by hierarchical clustering visually appealing and in tune with their evolutionary perspective. Hierarchical clustering operates on multiple scales simultaneously. This is essential, for instance, in transcriptome data, where one may be interested in making qualitative inferences about how lower-order relationships like gene modules lead to higher-order relationships like pathways or biological processes. The recently developed method of convex clustering preserves the visual appeal of hierarchical clustering while ameliorating its propensity to make false inferences in the presence of outliers and noise. The solution paths generated by convex clustering reveal relationships between clusters that are hidden by static methods such as k-means clustering. The current paper derives and tests a novel proximal distance algorithm for minimizing the objective function of convex clustering. The algorithm separates parameters, accommodates missing data, and supports prior information on relationships. Our program CONVEXCLUSTER incorporating the algorithm is implemented on ATI and nVidia graphics processing units (GPUs) for maximal speed. Several biological examples illustrate the strengths of convex clustering and the ability of the proximal distance algorithm to handle high-dimensional problems. CONVEXCLUSTER can be freely downloaded from the UCLA Human Genetics web site at http://www.genetics.ucla.edu/software/.
CLUSTERnGO: a user-defined modelling platform for two-stage clustering of time-series data.

PubMed

Fidaner, Işık Barış; Cankorur-Cetinkaya, Ayca; Dikicioglu, Duygu; Kirdar, Betul; Cemgil, Ali Taylan; Oliver, Stephen G

2016-02-01

Simple bioinformatic tools are frequently used to analyse time-series datasets regardless of their ability to deal with transient phenomena, limiting the meaningful information that may be extracted from them. This situation requires the development and exploitation of tailor-made, easy-to-use and flexible tools designed specifically for the analysis of time-series datasets. We present a novel statistical application called CLUSTERnGO, which uses a model-based clustering algorithm that fulfils this need. This algorithm involves two components of operation. Component 1 constructs a Bayesian non-parametric model (Infinite Mixture of Piecewise Linear Sequences) and Component 2, which applies a novel clustering methodology (Two-Stage Clustering). The software can also assign biological meaning to the identified clusters using an appropriate ontology. It applies multiple hypothesis testing to report the significance of these enrichments. The algorithm has a four-phase pipeline. The application can be executed using either command-line tools or a user-friendly Graphical User Interface. The latter has been developed to address the needs of both specialist and non-specialist users. We use three diverse test cases to demonstrate the flexibility of the proposed strategy. In all cases, CLUSTERnGO not only outperformed existing algorithms in assigning unique GO term enrichments to the identified clusters, but also revealed novel insights regarding the biological systems examined, which were not uncovered in the original publications. The C++ and QT source codes, the GUI applications for Windows, OS X and Linux operating systems and user manual are freely available for download under the GNU GPL v3 license at http://www.cmpe.boun.edu.tr/content/CnG. sgo24@cam.ac.uk Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
Chemical pre-processing of cluster galaxies over the past 10 billion years in the IllustrisTNG simulations

NASA Astrophysics Data System (ADS)

Gupta, Anshu; Yuan, Tiantian; Torrey, Paul; Vogelsberger, Mark; Martizzi, Davide; Tran, Kim-Vy H.; Kewley, Lisa J.; Marinacci, Federico; Nelson, Dylan; Pillepich, Annalisa; Hernquist, Lars; Genel, Shy; Springel, Volker

2018-06-01

We use the IllustrisTNG simulations to investigate the evolution of the mass-metallicity relation (MZR) for star-forming cluster galaxies as a function of the formation history of their cluster host. The simulations predict an enhancement in the gas-phase metallicities of star-forming cluster galaxies (109 < M* < 1010 M⊙ h-1) at z ≤ 1.0 in comparisons to field galaxies. This is qualitatively consistent with observations. We find that the metallicity enhancement of cluster galaxies appears prior to their infall into the central cluster potential, indicating for the first time a systematic `chemical pre-processing' signature for infalling cluster galaxies. Namely, galaxies that will fall into a cluster by z = 0 show a ˜0.05 dex enhancement in the MZR compared to field galaxies at z ≤ 0.5. Based on the inflow rate of gas into cluster galaxies and its metallicity, we identify that the accretion of pre-enriched gas is the key driver of the chemical evolution of such galaxies, particularly in the stellar mass range (109 < M* < 1010 M⊙ h-1). We see signatures of an environmental dependence of the ambient/inflowing gas metallicity that extends well outside the nominal virial radius of clusters. Our results motivate future observations looking for pre-enrichment signatures in dense environments.
The application of cluster analysis in the intercomparison of loop structures in RNA.

PubMed

Huang, Hung-Chung; Nagaswamy, Uma; Fox, George E

2005-04-01

We have developed a computational approach for the comparison and classification of RNA loop structures. Hairpin or interior loops identified in atomic resolution RNA structures were intercompared by conformational matching. The root-mean-square deviation (RMSD) values between all pairs of RNA fragments of interest, even if from different molecules, are calculated. Subsequently, cluster analysis is performed on the resulting matrix of RMSD distances using the unweighted pair group method with arithmetic mean (UPGMA). The cluster analysis objectively reveals groups of folds that resemble one another. To demonstrate the utility of the approach, a comprehensive analysis of all the terminal hairpin tetraloops that have been observed in 15 RNA structures that have been determined by X-ray crystallography was undertaken. The method found major clusters corresponding to the well-known GNRA and UNCG types. In addition, two tetraloops with the unusual primary sequence UMAC (M is A or C) were successfully assigned to the GNRA cluster. Larger loop structures were also examined and the clustering results confirmed the occurrence of variations of the GNRA and UNCG tetraloops in these loops and provided a systematic means for locating them. Nineteen examples of larger loops that closely resemble either the GNRA or UNCG tetraloop were found in the large ribosomal RNAs. When the clustering approach was extended to include all structures in the SCOR database, novel relationships were detected including one between the ANYA motif and a less common folding of the GAAA tetraloop sequence.

The application of cluster analysis in the intercomparison of loop structures in RNA

PubMed Central

HUANG, HUNG-CHUNG; NAGASWAMY, UMA; FOX, GEORGE E.

2005-01-01

We have developed a computational approach for the comparison and classification of RNA loop structures. Hairpin or interior loops identified in atomic resolution RNA structures were intercompared by conformational matching. The root-mean-square deviation (RMSD) values between all pairs of RNA fragments of interest, even if from different molecules, are calculated. Subsequently, cluster analysis is performed on the resulting matrix of RMSD distances using the unweighted pair group method with arithmetic mean (UPGMA). The cluster analysis objectively reveals groups of folds that resemble one another. To demonstrate the utility of the approach, a comprehensive analysis of all the terminal hairpin tetraloops that have been observed in 15 RNA structures that have been determined by X-ray crystallography was undertaken. The method found major clusters corresponding to the well-known GNRA and UNCG types. In addition, two tetraloops with the unusual primary sequence UMAC (M is A or C) were successfully assigned to the GNRA cluster. Larger loop structures were also examined and the clustering results confirmed the occurrence of variations of the GNRA and UNCG tetraloops in these loops and provided a systematic means for locating them. Nineteen examples of larger loops that closely resemble either the GNRA or UNCG tetraloop were found in the large ribosomal RNAs. When the clustering approach was extended to include all structures in the SCOR database, novel relationships were detected including one between the ANYA motif and a less common folding of the GAAA tetraloop sequence. PMID:15769871
Minimum number of clusters and comparison of analysis methods for cross sectional stepped wedge cluster randomised trials with binary outcomes: A simulation study.

PubMed

Barker, Daniel; D'Este, Catherine; Campbell, Michael J; McElduff, Patrick

2017-03-09

Stepped wedge cluster randomised trials frequently involve a relatively small number of clusters. The most common frameworks used to analyse data from these types of trials are generalised estimating equations and generalised linear mixed models. A topic of much research into these methods has been their application to cluster randomised trial data and, in particular, the number of clusters required to make reasonable inferences about the intervention effect. However, for stepped wedge trials, which have been claimed by many researchers to have a statistical power advantage over the parallel cluster randomised trial, the minimum number of clusters required has not been investigated. We conducted a simulation study where we considered the most commonly used methods suggested in the literature to analyse cross-sectional stepped wedge cluster randomised trial data. We compared the per cent bias, the type I error rate and power of these methods in a stepped wedge trial setting with a binary outcome, where there are few clusters available and when the appropriate adjustment for a time trend is made, which by design may be confounding the intervention effect. We found that the generalised linear mixed modelling approach is the most consistent when few clusters are available. We also found that none of the common analysis methods for stepped wedge trials were both unbiased and maintained a 5% type I error rate when there were only three clusters. Of the commonly used analysis approaches, we recommend the generalised linear mixed model for small stepped wedge trials with binary outcomes. We also suggest that in a stepped wedge design with three steps, at least two clusters be randomised at each step, to ensure that the intervention effect estimator maintains the nominal 5% significance level and is also reasonably unbiased.
Nearest clusters based partial least squares discriminant analysis for the classification of spectral data.

PubMed

Song, Weiran; Wang, Hui; Maguire, Paul; Nibouche, Omar

2018-06-07

Partial Least Squares Discriminant Analysis (PLS-DA) is one of the most effective multivariate analysis methods for spectral data analysis, which extracts latent variables and uses them to predict responses. In particular, it is an effective method for handling high-dimensional and collinear spectral data. However, PLS-DA does not explicitly address data multimodality, i.e., within-class multimodal distribution of data. In this paper, we present a novel method termed nearest clusters based PLS-DA (NCPLS-DA) for addressing the multimodality and nonlinearity issues explicitly and improving the performance of PLS-DA on spectral data classification. The new method applies hierarchical clustering to divide samples into clusters and calculates the corresponding centre of every cluster. For a given query point, only clusters whose centres are nearest to such a query point are used for PLS-DA. Such a method can provide a simple and effective tool for separating multimodal and nonlinear classes into clusters which are locally linear and unimodal. Experimental results on 17 datasets, including 12 UCI and 5 spectral datasets, show that NCPLS-DA can outperform 4 baseline methods, namely, PLS-DA, kernel PLS-DA, local PLS-DA and k-NN, achieving the highest classification accuracy most of the time. Copyright © 2018 Elsevier B.V. All rights reserved.
Subgroups of physically abusive parents based on cluster analysis of parenting behavior and affect.

PubMed

Haskett, Mary E; Smith Scott, Susan; Sabourin Ward, Caryn

2004-10-01

Cluster analysis of observed parenting and self-reported discipline was used to categorize 83 abusive parents into subgroups. A 2-cluster solution received support for validity. Cluster 1 parents were relatively warm, positive, sensitive, and engaged during interactions with their children, whereas Cluster 2 parents were relatively negative, disengaged or intrusive, and insensitive. Further, clusters differed in emotional health, parenting stress, perceptions of children, and problem solving. Children of parents in the 2 clusters differed on several indexes of social adjustment. Cluster 1 parents were similar to nonabusive parents (n = 66) on parenting and related constructs, but Cluster 2 parents differed from nonabusive parents on all clustering variables and many validation variables. Results highlight clinically relevant diversity in parenting practices and functioning among abusive parents. ((c) 2004 APA, all rights reserved).
Cluster Analysis of Acute Care Use Yields Insights for Tailored Pediatric Asthma Interventions.

PubMed

Abir, Mahshid; Truchil, Aaron; Wiest, Dawn; Nelson, Daniel B; Goldstick, Jason E; Koegel, Paul; Lozon, Marie M; Choi, Hwajung; Brenner, Jeffrey

2017-09-01

We undertake this study to understand patterns of pediatric asthma-related acute care use to inform interventions aimed at reducing potentially avoidable hospitalizations. Hospital claims data from 3 Camden city facilities for 2010 to 2014 were used to perform cluster analysis classifying patients aged 0 to 17 years according to their asthma-related hospital use. Clusters were based on 2 variables: asthma-related ED visits and hospitalizations. Demographics and a number of sociobehavioral and use characteristics were compared across clusters. Children who met the criteria (3,170) were included in the analysis. An examination of a scree plot showing the decline in within-cluster heterogeneity as the number of clusters increased confirmed that clusters of pediatric asthma patients according to hospital use exist in the data. Five clusters of patients with distinct asthma-related acute care use patterns were observed. Cluster 1 (62% of patients) showed the lowest rates of acute care use. These patients were least likely to have a mental health-related diagnosis, were less likely to have visited multiple facilities, and had no hospitalizations for asthma. Cluster 2 (19% of patients) had a low number of asthma ED visits and onetime hospitalization. Cluster 3 (11% of patients) had a high number of ED visits and low hospitalization rates, and the highest rates of multiple facility use. Cluster 4 (7% of patients) had moderate ED use for both asthma and other illnesses, and high rates of asthma hospitalizations; nearly one quarter received care at all facilities, and 1 in 10 had a mental health diagnosis. Cluster 5 (1% of patients) had extreme rates of acute care use. Differences observed between groups across multiple sociobehavioral factors suggest these clusters may represent children who differ along multiple dimensions, in addition to patterns of service use, with implications for tailored interventions. Copyright © 2017 American College of Emergency Physicians
High-dimensional cluster analysis with the Masked EM Algorithm

PubMed Central

Kadir, Shabnam N.; Goodman, Dan F. M.; Harris, Kenneth D.

2014-01-01

Cluster analysis faces two problems in high dimensions: first, the “curse of dimensionality” that can lead to overfitting and poor generalization performance; and second, the sheer time taken for conventional algorithms to process large amounts of high-dimensional data. We describe a solution to these problems, designed for the application of “spike sorting” for next-generation high channel-count neural probes. In this problem, only a small subset of features provide information about the cluster member-ship of any one data vector, but this informative feature subset is not the same for all data points, rendering classical feature selection ineffective. We introduce a “Masked EM” algorithm that allows accurate and time-efficient clustering of up to millions of points in thousands of dimensions. We demonstrate its applicability to synthetic data, and to real-world high-channel-count spike sorting data. PMID:25149694
Preliminary Cluster Analysis For Several Representatives Of Genus Kerivoula (Chiroptera: Vespertilionidae) in Borneo

NASA Astrophysics Data System (ADS)

Hasan, Noor Haliza; Abdullah, M. T.

2008-01-01

The aim of the study is to use cluster analysis on morphometric parameters within the genus Kerivoula to produce a dendrogram and to determine the suitability of this method to describe the relationship among species within this genus. A total of 15 adult male individuals from genus Kerivoula taken from sampling trips around Borneo and specimens kept at the zoological museum of Universiti Malaysia Sarawak were examined. A total of 27 characters using dental, skull and external body measurements were recorded. Clustering analysis illustrated the grouping and morphometric relationships between the species of this genus. It has clearly separated each species from each other despite the overlapping of measurements of some species within the genus. Cluster analysis provides an alternative approach to make a preliminary identification of a species.
Phenotypes of asthma in low-income children and adolescents: cluster analysis.

PubMed

Cabral, Anna Lucia Barros; Sousa, Andrey Wirgues; Mendes, Felipe Augusto Rodrigues; Carvalho, Celso Ricardo Fernandes de

2017-01-01

Studies characterizing asthma phenotypes have predominantly included adults or have involved children and adolescents in developed countries. Therefore, their applicability in other populations, such as those of developing countries, remains indeterminate. Our objective was to determine how low-income children and adolescents with asthma in Brazil are distributed across a cluster analysis. We included 306 children and adolescents (6-18 years of age) with a clinical diagnosis of asthma and under medical treatment for at least one year of follow-up. At enrollment, all the patients were clinically stable. For the cluster analysis, we selected 20 variables commonly measured in clinical practice and considered important in defining asthma phenotypes. Variables with high multicollinearity were excluded. A cluster analysis was applied using a twostep agglomerative test and log-likelihood distance measure. Three clusters were defined for our population. Cluster 1 (n = 94) included subjects with normal pulmonary function, mild eosinophil inflammation, few exacerbations, later age at asthma onset, and mild atopy. Cluster 2 (n = 87) included those with normal pulmonary function, a moderate number of exacerbations, early age at asthma onset, more severe eosinophil inflammation, and moderate atopy. Cluster 3 (n = 108) included those with poor pulmonary function, frequent exacerbations, severe eosinophil inflammation, and severe atopy. Asthma was characterized by the presence of atopy, number of exacerbations, and lung function in low-income children and adolescents in Brazil. The many similarities with previous cluster analyses of phenotypes indicate that this approach shows good generalizability. Estudos que caracterizam fenótipos de asma predominantemente incluem adultos ou foram realizados em crianças e adolescentes de países desenvolvidos; portanto, sua aplicabilidade em outras populações, tais como as de países em desenvolvimento, permanece indeterminada. Nosso
Cluster Cooperation in Wireless-Powered Sensor Networks: Modeling and Performance Analysis.

PubMed

Zhang, Chao; Zhang, Pengcheng; Zhang, Weizhan

2017-09-27

A wireless-powered sensor network (WPSN) consisting of one hybrid access point (HAP), a near cluster and the corresponding far cluster is investigated in this paper. These sensors are wireless-powered and they transmit information by consuming the harvested energy from signal ejected by the HAP. Sensors are able to harvest energy as well as store the harvested energy. We propose that if sensors in near cluster do not have their own information to transmit, acting as relays, they can help the sensors in a far cluster to forward information to the HAP in an amplify-and-forward (AF) manner. We use a finite Markov chain to model the dynamic variation process of the relay battery, and give a general analyzing model for WPSN with cluster cooperation. Though the model, we deduce the closed-form expression for the outage probability as the metric of this network. Finally, simulation results validate the start point of designing this paper and correctness of theoretical analysis and show how parameters have an effect on system performance. Moreover, it is also known that the outage probability of sensors in far cluster can be drastically reduced without sacrificing the performance of sensors in near cluster if the transmit power of HAP is fairly high. Furthermore, in the aspect of outage performance of far cluster, the proposed scheme significantly outperforms the direct transmission scheme without cooperation.
Cluster Cooperation in Wireless-Powered Sensor Networks: Modeling and Performance Analysis

PubMed Central

Zhang, Chao; Zhang, Pengcheng; Zhang, Weizhan

2017-01-01

A wireless-powered sensor network (WPSN) consisting of one hybrid access point (HAP), a near cluster and the corresponding far cluster is investigated in this paper. These sensors are wireless-powered and they transmit information by consuming the harvested energy from signal ejected by the HAP. Sensors are able to harvest energy as well as store the harvested energy. We propose that if sensors in near cluster do not have their own information to transmit, acting as relays, they can help the sensors in a far cluster to forward information to the HAP in an amplify-and-forward (AF) manner. We use a finite Markov chain to model the dynamic variation process of the relay battery, and give a general analyzing model for WPSN with cluster cooperation. Though the model, we deduce the closed-form expression for the outage probability as the metric of this network. Finally, simulation results validate the start point of designing this paper and correctness of theoretical analysis and show how parameters have an effect on system performance. Moreover, it is also known that the outage probability of sensors in far cluster can be drastically reduced without sacrificing the performance of sensors in near cluster if the transmit power of HAP is fairly high. Furthermore, in the aspect of outage performance of far cluster, the proposed scheme significantly outperforms the direct transmission scheme without cooperation. PMID:28953231
Motor-Enriched Learning Activities Can Improve Mathematical Performance in Preadolescent Children.

PubMed

Beck, Mikkel M; Lind, Rune R; Geertsen, Svend S; Ritz, Christian; Lundbye-Jensen, Jesper; Wienecke, Jacob

2016-01-01

Objective: An emerging field of research indicates that physical activity can benefit cognitive functions and academic achievements in children. However, less is known about how academic achievements can benefit from specific types of motor activities (e.g., fine and gross) integrated into learning activities. Thus, the aim of this study was to investigate whether fine or gross motor activity integrated into math lessons (i.e., motor-enrichment) could improve children's mathematical performance. Methods: A 6-week within school cluster-randomized intervention study investigated the effects of motor-enriched mathematical teaching in Danish preadolescent children ( n = 165, age = 7.5 ± 0.02 years). Three groups were included: a control group (CON), which received non-motor enriched conventional mathematical teaching, a fine motor math group (FMM) and a gross motor math group (GMM), which received mathematical teaching enriched with fine and gross motor activity, respectively. The children were tested before (T0), immediately after (T1) and 8 weeks after the intervention (T2). A standardized mathematical test (50 tasks) was used to evaluate mathematical performance. Furthermore, it was investigated whether motor-enriched math was accompanied by different effects in low and normal math performers. Additionally, the study investigated the potential contribution of cognitive functions and motor skills on mathematical performance. Results: All groups improved their mathematical performance from T0 to T1. However, from T0 to T1, the improvement was significantly greater in GMM compared to FMM (1.87 ± 0.71 correct answers) ( p = 0.02). At T2 no significant differences in mathematical performance were observed. A subgroup analysis revealed that normal math-performers benefitted from GMM compared to both CON 1.78 ± 0.73 correct answers ( p = 0.04) and FMM 2.14 ± 0.72 correct answers ( p = 0.008). These effects were not observed in low math-performers. The effects were
Motor-Enriched Learning Activities Can Improve Mathematical Performance in Preadolescent Children

PubMed Central

Beck, Mikkel M.; Lind, Rune R.; Geertsen, Svend S.; Ritz, Christian; Lundbye-Jensen, Jesper; Wienecke, Jacob

2016-01-01

Objective: An emerging field of research indicates that physical activity can benefit cognitive functions and academic achievements in children. However, less is known about how academic achievements can benefit from specific types of motor activities (e.g., fine and gross) integrated into learning activities. Thus, the aim of this study was to investigate whether fine or gross motor activity integrated into math lessons (i.e., motor-enrichment) could improve children's mathematical performance. Methods: A 6-week within school cluster-randomized intervention study investigated the effects of motor-enriched mathematical teaching in Danish preadolescent children (n = 165, age = 7.5 ± 0.02 years). Three groups were included: a control group (CON), which received non-motor enriched conventional mathematical teaching, a fine motor math group (FMM) and a gross motor math group (GMM), which received mathematical teaching enriched with fine and gross motor activity, respectively. The children were tested before (T0), immediately after (T1) and 8 weeks after the intervention (T2). A standardized mathematical test (50 tasks) was used to evaluate mathematical performance. Furthermore, it was investigated whether motor-enriched math was accompanied by different effects in low and normal math performers. Additionally, the study investigated the potential contribution of cognitive functions and motor skills on mathematical performance. Results: All groups improved their mathematical performance from T0 to T1. However, from T0 to T1, the improvement was significantly greater in GMM compared to FMM (1.87 ± 0.71 correct answers) (p = 0.02). At T2 no significant differences in mathematical performance were observed. A subgroup analysis revealed that normal math-performers benefitted from GMM compared to both CON 1.78 ± 0.73 correct answers (p = 0.04) and FMM 2.14 ± 0.72 correct answers (p = 0.008). These effects were not observed in low math-performers. The effects were partly
Language Learner Motivational Types: A Cluster Analysis Study

ERIC Educational Resources Information Center

Papi, Mostafa; Teimouri, Yasser

2014-01-01

The study aimed to identify different second language (L2) learner motivational types drawing on the framework of the L2 motivational self system. A total of 1,278 secondary school students learning English in Iran completed a questionnaire survey. Cluster analysis yielded five different groups based on the strength of different variables within…
Study on behaviors and performances of universal N-glycopeptide enrichment methods.

PubMed

Xue, Yu; Xie, Juanjuan; Fang, Pan; Yao, Jun; Yan, Guoquan; Shen, Huali; Yang, Pengyuan

2018-04-16

Glycosylation is a crucial process in protein biosynthesis. However, the analysis of glycopeptides through MS remains challenging due to the microheterogeneity and macroheterogeneity of the glycoprotein. Selective enrichment of glycopeptides from complex samples prior to MS analysis is essential for successful glycoproteome research. In this work, we systematically investigated the behaviors and performances of boronic acid chemistry, ZIC-HILIC, and PGC of glycopeptide enrichment to promote understanding of these methods. We also optimized boronic acid chemistry and ZIC-HILIC enrichment methods and applied them to enrich glycopeptides from mouse liver. The intact N-glycopeptides were interpreted using the in-house analysis software pGlyco 2.0. We found that boronic acid chemistry in this study preferred to capture glycopeptides with high mannose glycans, ZIC-HILIC enriched most N-glycopeptides and did not show significant preference during enrichment and PGC was not suitable for separating glycopeptides with a long amino acid sequence. We performed a detailed study on the behaviors and performances of boronic acid chemistry, ZIC-HILIC, and PGC enrichment methods and provide a better understanding of enrichment methods for further glycoproteomics research.
Clustering Financial Time Series by Network Community Analysis

NASA Astrophysics Data System (ADS)

Piccardi, Carlo; Calatroni, Lisa; Bertoni, Fabio

In this paper, we describe a method for clustering financial time series which is based on community analysis, a recently developed approach for partitioning the nodes of a network (graph). A network with N nodes is associated to the set of N time series. The weight of the link (i, j), which quantifies the similarity between the two corresponding time series, is defined according to a metric based on symbolic time series analysis, which has recently proved effective in the context of financial time series. Then, searching for network communities allows one to identify groups of nodes (and then time series) with strong similarity. A quantitative assessment of the significance of the obtained partition is also provided. The method is applied to two distinct case-studies concerning the US and Italy Stock Exchange, respectively. In the US case, the stability of the partitions over time is also thoroughly investigated. The results favorably compare with those obtained with the standard tools typically used for clustering financial time series, such as the minimal spanning tree and the hierarchical tree.
Investigating Faculty Familiarity with Assessment Terminology by Applying Cluster Analysis to Interpret Survey Data

ERIC Educational Resources Information Center

Raker, Jeffrey R.; Holme, Thomas A.

2014-01-01

A cluster analysis was conducted with a set of survey data on chemistry faculty familiarity with 13 assessment terms. Cluster groupings suggest a high, middle, and low overall familiarity with the terminology and an independent high and low familiarity with terms related to fundamental statistics. The six resultant clusters were found to be…
Fuzzy cluster analysis of air quality in Beijing district

NASA Astrophysics Data System (ADS)

Liu, Hongkai

2018-02-01

The principle of fuzzy clustering analysis is applied in this article, by using the method of transitive closure, the main air pollutants in 17 districts of Beijing from 2014 to 2016 were classified. The results of the analysis reflects the nearly three year’s changes of the main air pollutants in Beijing. This can provide the scientific for atmospheric governance in the Beijing area and digital support.
Cardiovascular reactivity patterns and pathways to hypertension: a multivariate cluster analysis.

PubMed

Brindle, R C; Ginty, A T; Jones, A; Phillips, A C; Roseboom, T J; Carroll, D; Painter, R C; de Rooij, S R

2016-12-01

Substantial evidence links exaggerated mental stress induced blood pressure reactivity to future hypertension, but the results for heart rate reactivity are less clear. For this reason multivariate cluster analysis was carried out to examine the relationship between heart rate and blood pressure reactivity patterns and hypertension in a large prospective cohort (age range 55-60 years). Four clusters emerged with statistically different systolic and diastolic blood pressure and heart rate reactivity patterns. Cluster 1 was characterised by a relatively exaggerated blood pressure and heart rate response while the blood pressure and heart rate responses of cluster 2 were relatively modest and in line with the sample mean. Cluster 3 was characterised by blunted cardiovascular stress reactivity across all variables and cluster 4, by an exaggerated blood pressure response and modest heart rate response. Membership to cluster 4 conferred an increased risk of hypertension at 5-year follow-up (hazard ratio=2.98 (95% CI: 1.50-5.90), P<0.01) that survived adjustment for a host of potential confounding variables. These results suggest that the cardiac reactivity plays a potentially important role in the link between blood pressure reactivity and hypertension and support the use of multivariate approaches to stress psychophysiology.
Lithium-rich Giants in Globular Clusters

NASA Astrophysics Data System (ADS)

Kirby, Evan N.; Guhathakurta, Puragra; Zhang, Andrew J.; Hong, Jerry; Guo, Michelle; Guo, Rachel; Cohen, Judith G.; Cunha, Katia

2016-03-01

Although red giants deplete lithium on their surfaces, some giants are Li-rich. Intermediate-mass asymptotic giant branch (AGB) stars can generate Li through the Cameron-Fowler conveyor, but the existence of Li-rich, low-mass red giant branch (RGB) stars is puzzling. Globular clusters are the best sites to examine this phenomenon because it is straightforward to determine membership in the cluster and to identify the evolutionary state of each star. In 72 hours of Keck/DEIMOS exposures in 25 clusters, we found four Li-rich RGB and two Li-rich AGB stars. There were 1696 RGB and 125 AGB stars with measurements or upper limits consistent with normal abundances of Li. Hence, the frequency of Li-richness in globular clusters is (0.2 ± 0.1)% for the RGB, (1.6 ± 1.1)% for the AGB, and (0.3 ± 0.1)% for all giants. Because the Li-rich RGB stars are on the lower RGB, Li self-generation mechanisms proposed to occur at the luminosity function bump or He core flash cannot explain these four lower RGB stars. We propose the following origin for Li enrichment: (1) All luminous giants experience a brief phase of Li enrichment at the He core flash. (2) All post-RGB stars with binary companions on the lower RGB will engage in mass transfer. This scenario predicts that 0.1% of lower RGB stars will appear Li-rich due to mass transfer from a recently Li-enhanced companion. This frequency is at the lower end of our confidence interval. The data presented herein were obtained at the W. M. Keck Observatory, which is operated as a scientific partnership among the California Institute of Technology, the University of California and the National Aeronautics and Space Administration. The Observatory was made possible by the generous financial support of the W. M. Keck Foundation.
Application of Geostatistical Methods and Machine Learning for spatio-temporal Earthquake Cluster Analysis

NASA Astrophysics Data System (ADS)

Schaefer, A. M.; Daniell, J. E.; Wenzel, F.

2014-12-01

Earthquake clustering tends to be an increasingly important part of general earthquake research especially in terms of seismic hazard assessment and earthquake forecasting and prediction approaches. The distinct identification and definition of foreshocks, aftershocks, mainshocks and secondary mainshocks is taken into account using a point based spatio-temporal clustering algorithm originating from the field of classic machine learning. This can be further applied for declustering purposes to separate background seismicity from triggered seismicity. The results are interpreted and processed to assemble 3D-(x,y,t) earthquake clustering maps which are based on smoothed seismicity records in space and time. In addition, multi-dimensional Gaussian functions are used to capture clustering parameters for spatial distribution and dominant orientations. Clusters are further processed using methodologies originating from geostatistics, which have been mostly applied and developed in mining projects during the last decades. A 2.5D variogram analysis is applied to identify spatio-temporal homogeneity in terms of earthquake density and energy output. The results are mitigated using Kriging to provide an accurate mapping solution for clustering features. As a case study, seismic data of New Zealand and the United States is used, covering events since the 1950s, from which an earthquake cluster catalogue is assembled for most of the major events, including a detailed analysis of the Landers and Christchurch sequences.

Analysis of candidates for interacting galaxy clusters. I. A1204 and A2029/A2033

NASA Astrophysics Data System (ADS)

Gonzalez, Elizabeth Johana; de los Rios, Martín; Oio, Gabriel A.; Lang, Daniel Hernández; Tagliaferro, Tania Aguirre; Domínguez R., Mariano J.; Castellón, José Luis Nilo; Cuevas L., Héctor; Valotto, Carlos A.

2018-04-01

Context. Merging galaxy clusters allow for the study of different mass components, dark and baryonic, separately. Also, their occurrence enables to test the ΛCDM scenario, which can be used to put constraints on the self-interacting cross-section of the dark-matter particle. Aim. It is necessary to perform a homogeneous analysis of these systems. Hence, based on a recently presented sample of candidates for interacting galaxy clusters, we present the analysis of two of these cataloged systems. Methods: In this work, the first of a series devoted to characterizing galaxy clusters in merger processes, we perform a weak lensing analysis of clusters A1204 and A2029/A2033 to derive the total masses of each identified interacting structure together with a dynamical study based on a two-body model. We also describe the gas and the mass distributions in the field through a lensing and an X-ray analysis. This is the first of a series of works which will analyze these type of system in order to characterize them. Results: Neither merging cluster candidate shows evidence of having had a recent merger event. Nevertheless, there is dynamical evidence that these systems could be interacting or could interact in the future. Conclusions: It is necessary to include more constraints in order to improve the methodology of classifying merging galaxy clusters. Characterization of these clusters is important in order to properly understand the nature of these systems and their connection with dynamical studies.
Transcriptome Analysis of Aspergillus flavus Reveals veA-Dependent Regulation of Secondary Metabolite Gene Clusters, Including the Novel Aflavarin Cluster

PubMed Central

Cary, J. W.; Han, Z.; Yin, Y.; Lohmar, J. M.; Shantappa, S.; Harris-Coward, P. Y.; Mack, B.; Ehrlich, K. C.; Wei, Q.; Arroyo-Manzanares, N.; Uka, V.; Vanhaecke, L.; Bhatnagar, D.; Yu, J.; Nierman, W. C.; Johns, M. A.; Sorensen, D.; Shen, H.; De Saeger, S.; Diana Di Mavungu, J.

2015-01-01

The global regulatory veA gene governs development and secondary metabolism in numerous fungal species, including Aspergillus flavus. This is especially relevant since A. flavus infects crops of agricultural importance worldwide, contaminating them with potent mycotoxins. The most well-known are aflatoxins, which are cytotoxic and carcinogenic polyketide compounds. The production of aflatoxins and the expression of genes implicated in the production of these mycotoxins are veA dependent. The genes responsible for the synthesis of aflatoxins are clustered, a signature common for genes involved in fungal secondary metabolism. Studies of the A. flavus genome revealed many gene clusters possibly connected to the synthesis of secondary metabolites. Many of these metabolites are still unknown, or the association between a known metabolite and a particular gene cluster has not yet been established. In the present transcriptome study, we show that veA is necessary for the expression of a large number of genes. Twenty-eight out of the predicted 56 secondary metabolite gene clusters include at least one gene that is differentially expressed depending on presence or absence of veA. One of the clusters under the influence of veA is cluster 39. The absence of veA results in a downregulation of the five genes found within this cluster. Interestingly, our results indicate that the cluster is expressed mainly in sclerotia. Chemical analysis of sclerotial extracts revealed that cluster 39 is responsible for the production of aflavarin. PMID:26209694
Identification of homogeneous genetic architecture of multiple genetically correlated traits by block clustering of genome-wide associations.

PubMed

Gupta, Mayetri; Cheung, Ching-Lung; Hsu, Yi-Hsiang; Demissie, Serkalem; Cupples, L Adrienne; Kiel, Douglas P; Karasik, David

2011-06-01

Genome-wide association studies (GWAS) using high-density genotyping platforms offer an unbiased strategy to identify new candidate genes for osteoporosis. It is imperative to be able to clearly distinguish signal from noise by focusing on the best phenotype in a genetic study. We performed GWAS of multiple phenotypes associated with fractures [bone mineral density (BMD), bone quantitative ultrasound (QUS), bone geometry, and muscle mass] with approximately 433,000 single-nucleotide polymorphisms (SNPs) and created a database of resulting associations. We performed analysis of GWAS data from 23 phenotypes by a novel modification of a block clustering algorithm followed by gene-set enrichment analysis. A data matrix of standardized regression coefficients was partitioned along both axes--SNPs and phenotypes. Each partition represents a distinct cluster of SNPs that have similar effects over a particular set of phenotypes. Application of this method to our data shows several SNP-phenotype connections. We found a strong cluster of association coefficients of high magnitude for 10 traits (BMD at several skeletal sites, ultrasound measures, cross-sectional bone area, and section modulus of femoral neck and shaft). These clustered traits were highly genetically correlated. Gene-set enrichment analyses indicated the augmentation of genes that cluster with the 10 osteoporosis-related traits in pathways such as aldosterone signaling in epithelial cells, role of osteoblasts, osteoclasts, and chondrocytes in rheumatoid arthritis, and Parkinson signaling. In addition to several known candidate genes, we also identified PRKCH and SCNN1B as potential candidate genes for multiple bone traits. In conclusion, our mining of GWAS results revealed the similarity of association results between bone strength phenotypes that may be attributed to pleiotropic effects of genes. This knowledge may prove helpful in identifying novel genes and pathways that underlie several correlated
Analysis of microbial community and nitrogen transition with enriched nitrifying soil microbes for organic hydroponics.

PubMed

Saijai, Sakuntala; Ando, Akinori; Inukai, Ryuya; Shinohara, Makoto; Ogawa, Jun

2016-06-27

Nitrifying microbial consortia were enriched from bark compost in a water system by regulating the amounts of organic nitrogen compounds and by controlling the aeration conditions with addition of CaCO 3 for maintaining suitable pH. Repeated enrichment showed reproducible mineralization of organic nitrogen via the conversion of ammonium ions ([Formula: see text]) and nitrite ions ([Formula: see text]) into nitrate ions ([Formula: see text]). The change in microbial composition during the enrichment was investigated by PCR-DGGE analysis with a focus on prokaryote, ammonia-oxidizing bacteria, nitrite-oxidizing bacteria, and eukaryote cell types. The microbial transition had a simple profile and showed clear relation to nitrogen ions transition. Nitrosomonas and Nitrobacter were mainly detected during [Formula: see text] and [Formula: see text] oxidation, respectively. These results revealing representative microorganisms acting in each ammonification and nitrification stages will be valuable for the development of artificial simple microbial consortia for organic hydroponics that consisted of identified heterotrophs and autotrophic nitrifying bacteria.
Evaluation of hierarchical agglomerative cluster analysis methods for discrimination of primary biological aerosol

NASA Astrophysics Data System (ADS)

Crawford, I.; Ruske, S.; Topping, D. O.; Gallagher, M. W.

2015-11-01

In this paper we present improved methods for discriminating and quantifying primary biological aerosol particles (PBAPs) by applying hierarchical agglomerative cluster analysis to multi-parameter ultraviolet-light-induced fluorescence (UV-LIF) spectrometer data. The methods employed in this study can be applied to data sets in excess of 1 × 106 points on a desktop computer, allowing for each fluorescent particle in a data set to be explicitly clustered. This reduces the potential for misattribution found in subsampling and comparative attribution methods used in previous approaches, improving our capacity to discriminate and quantify PBAP meta-classes. We evaluate the performance of several hierarchical agglomerative cluster analysis linkages and data normalisation methods using laboratory samples of known particle types and an ambient data set. Fluorescent and non-fluorescent polystyrene latex spheres were sampled with a Wideband Integrated Bioaerosol Spectrometer (WIBS-4) where the optical size, asymmetry factor and fluorescent measurements were used as inputs to the analysis package. It was found that the Ward linkage with z-score or range normalisation performed best, correctly attributing 98 and 98.1 % of the data points respectively. The best-performing methods were applied to the BEACHON-RoMBAS (Bio-hydro-atmosphere interactions of Energy, Aerosols, Carbon, H2O, Organics and Nitrogen-Rocky Mountain Biogenic Aerosol Study) ambient data set, where it was found that the z-score and range normalisation methods yield similar results, with each method producing clusters representative of fungal spores and bacterial aerosol, consistent with previous results. The z-score result was compared to clusters generated with previous approaches (WIBS AnalysiS Program, WASP) where we observe that the subsampling and comparative attribution method employed by WASP results in the overestimation of the fungal spore concentration by a factor of 1.5 and the underestimation of
Stressful jobs and non-stressful jobs: a cluster analysis of office jobs.

PubMed

Carayon, P

1994-02-01

The purpose of the study was to determine if office jobs could be characterized by a small number of combinations of stressors that could be related to job-title information and self-report of psychological strain. Two-hundred-and-sixty-two office workers from three public service organizations provided data on nine job stressors and seven indicators of psychological strain. Using cluster analysis on the nine stressors, office jobs were classified into three clusters. The first cluster included jobs with high skill utilization, task clarity, job control and social support and low future ambiguity, but also high on job demands such as quantitative work-load, attention and work pressure. The second cluster included jobs with high demands and future ambiguity and low skill utilization, task clarity, job control and social support. The third cluster was intermediary between the first two clusters. The three clusters were related to job-title information. The second cluster was the highest on a range of psychological strain indicators, while the other two clusters were high on certain strain indicators but low on others. The study showed that office jobs could be characterized by a small number of combinations of stressors that were related to job-title information and psychological strain.
Acoustic Enrichment of Extracellular Vesicles from Biological Fluids.

PubMed

Ku, Anson; Lim, Hooi Ching; Evander, Mikael; Lilja, Hans; Laurell, Thomas; Scheding, Stefan; Ceder, Yvonne

2018-06-11

Extracellular vesicles (EVs) have emerged as a rich source of biomarkers providing diagnostic and prognostic information in diseases such as cancer. Large-scale investigations into the contents of EVs in clinical cohorts are warranted, but a major obstacle is the lack of a rapid, reproducible, efficient, and low-cost methodology to enrich EVs. Here, we demonstrate the applicability of an automated acoustic-based technique to enrich EVs, termed acoustic trapping. Using this technology, we have successfully enriched EVs from cell culture conditioned media and urine and blood plasma from healthy volunteers. The acoustically trapped samples contained EVs ranging from exosomes to microvesicles in size and contained detectable levels of intravesicular microRNAs. Importantly, this method showed high reproducibility and yielded sufficient quantities of vesicles for downstream analysis. The enrichment could be obtained from a sample volume of 300 μL or less, an equivalent to 30 min of enrichment time, depending on the sensitivity of downstream analysis. Taken together, acoustic trapping provides a rapid, automated, low-volume compatible, and robust method to enrich EVs from biofluids. Thus, it may serve as a novel tool for EV enrichment from large number of samples in a clinical setting with minimum sample preparation.
Microfluidic droplet enrichment for targeted sequencing

PubMed Central

Eastburn, Dennis J.; Huang, Yong; Pellegrino, Maurizio; Sciambi, Adam; Ptáček, Louis J.; Abate, Adam R.

2015-01-01

Targeted sequence enrichment enables better identification of genetic variation by providing increased sequencing coverage for genomic regions of interest. Here, we report the development of a new target enrichment technology that is highly differentiated from other approaches currently in use. Our method, MESA (Microfluidic droplet Enrichment for Sequence Analysis), isolates genomic DNA fragments in microfluidic droplets and performs TaqMan PCR reactions to identify droplets containing a desired target sequence. The TaqMan positive droplets are subsequently recovered via dielectrophoretic sorting, and the TaqMan amplicons are removed enzymatically prior to sequencing. We demonstrated the utility of this approach by generating an average 31.6-fold sequence enrichment across 250 kb of targeted genomic DNA from five unique genomic loci. Significantly, this enrichment enabled a more comprehensive identification of genetic polymorphisms within the targeted loci. MESA requires low amounts of input DNA, minimal prior locus sequence information and enriches the target region without PCR bias or artifacts. These features make it well suited for the study of genetic variation in a number of research and diagnostic applications. PMID:25873629
Information-dependent enrichment analysis reveals time-dependent transcriptional regulation of the estrogen pathway of toxicity.

PubMed

Pendse, Salil N; Maertens, Alexandra; Rosenberg, Michael; Roy, Dipanwita; Fasani, Rick A; Vantangoli, Marguerite M; Madnick, Samantha J; Boekelheide, Kim; Fornace, Albert J; Odwin, Shelly-Ann; Yager, James D; Hartung, Thomas; Andersen, Melvin E; McMullen, Patrick D

2017-04-01

The twenty-first century vision for toxicology involves a transition away from high-dose animal studies to in vitro and computational models (NRC in Toxicity testing in the 21st century: a vision and a strategy, The National Academies Press, Washington, DC, 2007). This transition requires mapping pathways of toxicity by understanding how in vitro systems respond to chemical perturbation. Uncovering transcription factors/signaling networks responsible for gene expression patterns is essential for defining pathways of toxicity, and ultimately, for determining the chemical modes of action through which a toxicant acts. Traditionally, transcription factor identification is achieved via chromatin immunoprecipitation studies and summarized by calculating which transcription factors are statistically associated with up- and downregulated genes. These lists are commonly determined via statistical or fold-change cutoffs, a procedure that is sensitive to statistical power and may not be as useful for determining transcription factor associations. To move away from an arbitrary statistical or fold-change-based cutoff, we developed, in the context of the Mapping the Human Toxome project, an enrichment paradigm called information-dependent enrichment analysis (IDEA) to guide identification of the transcription factor network. We used a test case of activation in MCF-7 cells by 17β estradiol (E2). Using this new approach, we established a time course for transcriptional and functional responses to E2. ERα and ERβ were associated with short-term transcriptional changes in response to E2. Sustained exposure led to recruitment of additional transcription factors and alteration of cell cycle machinery. TFAP2C and SOX2 were the transcription factors most highly correlated with dose. E2F7, E2F1, and Foxm1, which are involved in cell proliferation, were enriched only at 24 h. IDEA should be useful for identifying candidate pathways of toxicity. IDEA outperforms gene set enrichment
Clustering analysis of moving target signatures

NASA Astrophysics Data System (ADS)

Martone, Anthony; Ranney, Kenneth; Innocenti, Roberto

2010-04-01

Previously, we developed a moving target indication (MTI) processing approach to detect and track slow-moving targets inside buildings, which successfully detected moving targets (MTs) from data collected by a low-frequency, ultra-wideband radar. Our MTI algorithms include change detection, automatic target detection (ATD), clustering, and tracking. The MTI algorithms can be implemented in a real-time or near-real-time system; however, a person-in-the-loop is needed to select input parameters for the clustering algorithm. Specifically, the number of clusters to input into the cluster algorithm is unknown and requires manual selection. A critical need exists to automate all aspects of the MTI processing formulation. In this paper, we investigate two techniques that automatically determine the number of clusters: the adaptive knee-point (KP) algorithm and the recursive pixel finding (RPF) algorithm. The KP algorithm is based on a well-known heuristic approach for determining the number of clusters. The RPF algorithm is analogous to the image processing, pixel labeling procedure. Both algorithms are used to analyze the false alarm and detection rates of three operational scenarios of personnel walking inside wood and cinderblock buildings.
Identification of five chronic obstructive pulmonary disease subgroups with different prognoses in the ECLIPSE cohort using cluster analysis.

PubMed

Rennard, Stephen I; Locantore, Nicholas; Delafont, Bruno; Tal-Singer, Ruth; Silverman, Edwin K; Vestbo, Jørgen; Miller, Bruce E; Bakke, Per; Celli, Bartolomé; Calverley, Peter M A; Coxson, Harvey; Crim, Courtney; Edwards, Lisa D; Lomas, David A; MacNee, William; Wouters, Emiel F M; Yates, Julie C; Coca, Ignacio; Agustí, Alvar

2015-03-01

Chronic obstructive pulmonary disease (COPD) is a heterogeneous disease that likely includes clinically relevant subgroups. To identify subgroups of COPD in ECLIPSE (Evaluation of COPD Longitudinally to Identify Predictive Surrogate Endpoints) subjects using cluster analysis and to assess clinically meaningful outcomes of the clusters during 3 years of longitudinal follow-up. Factor analysis was used to reduce 41 variables determined at recruitment in 2,164 patients with COPD to 13 main factors, and the variables with the highest loading were used for cluster analysis. Clusters were evaluated for their relationship with clinically meaningful outcomes during 3 years of follow-up. The relationships among clinical parameters were evaluated within clusters. Five subgroups were distinguished using cross-sectional clinical features. These groups differed regarding outcomes. Cluster A included patients with milder disease and had fewer deaths and hospitalizations. Cluster B had less systemic inflammation at baseline but had notable changes in health status and emphysema extent. Cluster C had many comorbidities, evidence of systemic inflammation, and the highest mortality. Cluster D had low FEV1, severe emphysema, and the highest exacerbation and COPD hospitalization rate. Cluster E was intermediate for most variables and may represent a mixed group that includes further clusters. The relationships among clinical variables within clusters differed from that in the entire COPD population. Cluster analysis using baseline data in ECLIPSE identified five COPD subgroups that differ in outcomes and inflammatory biomarkers and show different relationships between clinical parameters, suggesting the clusters represent clinically and biologically different subtypes of COPD.
Investigating the usefulness of a cluster-based trend analysis to detect visual field progression in patients with open-angle glaucoma.

PubMed

Aoki, Shuichiro; Murata, Hiroshi; Fujino, Yuri; Matsuura, Masato; Miki, Atsuya; Tanito, Masaki; Mizoue, Shiro; Mori, Kazuhiko; Suzuki, Katsuyoshi; Yamashita, Takehiro; Kashiwagi, Kenji; Hirasawa, Kazunori; Shoji, Nobuyuki; Asaoka, Ryo

2017-12-01

To investigate the usefulness of the Octopus (Haag-Streit) EyeSuite's cluster trend analysis in glaucoma. Ten visual fields (VFs) with the Humphrey Field Analyzer (Carl Zeiss Meditec), spanning 7.7 years on average were obtained from 728 eyes of 475 primary open angle glaucoma patients. Mean total deviation (mTD) trend analysis and EyeSuite's cluster trend analysis were performed on various series of VFs (from 1st to 10th: VF1-10 to 6th to 10th: VF6-10). The results of the cluster-based trend analysis, based on different lengths of VF series, were compared against mTD trend analysis. Cluster-based trend analysis and mTD trend analysis results were significantly associated in all clusters and with all lengths of VF series. Between 21.2% and 45.9% (depending on VF series length and location) of clusters were deemed to progress when the mTD trend analysis suggested no progression. On the other hand, 4.8% of eyes were observed to progress using the mTD trend analysis when cluster trend analysis suggested no progression in any two (or more) clusters. Whole field trend analysis can miss local VF progression. Cluster trend analysis appears as robust as mTD trend analysis and useful to assess both sectorial and whole field progression. Cluster-based trend analyses, in particular the definition of two or more progressing cluster, may help clinicians to detect glaucomatous progression in a timelier manner than using a whole field trend analysis, without significantly compromising specificity. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
[Optimization of cluster analysis based on drug resistance profiles of MRSA isolates].

PubMed

Tani, Hiroya; Kishi, Takahiko; Gotoh, Minehiro; Yamagishi, Yuka; Mikamo, Hiroshige

2015-12-01

We examined 402 methicillin-resistant Staphylococcus aureus (MRSA) strains isolated from clinical specimens in our hospital between November 19, 2010 and December 27, 2011 to evaluate the similarity between cluster analysis of drug susceptibility tests and pulsed-field gel electrophoresis (PFGE). The results showed that the 402 strains tested were classified into 27 PFGE patterns (151 subtypes of patterns). Cluster analyses of drug susceptibility tests with the cut-off distance yielding a similar classification capability showed favorable results--when the MIC method was used, and minimum inhibitory concentration (MIC) values were used directly in the method, the level of agreement with PFGE was 74.2% when 15 drugs were tested. The Unweighted Pair Group Method with Arithmetic mean (UPGMA) method was effective when the cut-off distance was 16. Using the SIR method in which susceptible (S), intermediate (I), and resistant (R) were coded as 0, 2, and 3, respectively, according to the Clinical and Laboratory Standards Institute (CLSI) criteria, the level of agreement with PFGE was 75.9% when the number of drugs tested was 17, the method used for clustering was the UPGMA, and the cut-off distance was 3.6. In addition, to assess the reproducibility of the results, 10 strains were randomly sampled from the overall test and subjected to cluster analysis. This was repeated 100 times under the same conditions. The results indicated good reproducibility of the results, with the level of agreement with PFGE showing a mean of 82.0%, standard deviation of 12.1%, and mode of 90.0% for the MIC method and a mean of 80.0%, standard deviation of 13.4%, and mode of 90.0% for the SIR method. In summary, cluster analysis for drug susceptibility tests is useful for the epidemiological analysis of MRSA.
Spatiotemporal Clustering Analysis of Malaria Infection in Pakistan.

PubMed

Umer, Muhammad Farooq; Zofeen, Shumaila; Majeed, Abdul; Hu, Wenbiao; Qi, Xin; Zhuang, Guihua

2018-06-07

Despite tremendous progress, malaria remains a serious public health problem in Pakistan. Very few studies have been done on spatiotemporal evaluation of malaria infection in Pakistan. The study aimed to detect the spatiotemporal pattern of malaria infection at the district level in Pakistan, and to identify the clusters of high-risk disease areas in the country. Annual data on malaria for two dominant species ( Plasmodium falciparum , Plasmodium vivax ) and mixed infections from 2011 to 2016 were obtained from the Directorate of Malaria Control Program, Pakistan. Population data were collected from the Pakistan Bureau of Statistics. A geographical information system was used to display the spatial distribution of malaria at the district level throughout Pakistan. Purely spatiotemporal clustering analysis was performed to identify the high-risk areas of malaria infection in Pakistan. A total of 1,593,409 positive cases were included in this study over a period of 6 years (2011⁻2016). The maximum number of P . vivax cases (474,478) were reported in Khyber Pakhtunkhwa (KPK). The highest burden of P . falciparum (145,445) was in Balochistan, while the highest counts of mixed Plasmodium cases were reported in Sindh (22,421) and Balochistan (22,229), respectively. In Balochistan, incidence of all three types of malaria was very high. Cluster analysis showed that primary clusters of P . vivax malaria were in the same districts in 2014, 2015 and 2016 (total 24 districts, 12 in Federally Administered Tribal Areas (FATA), 9 in KPK, 2 in Punjab and 1 in Balochistan); those of P . falciparum malaria were unchanged in 2012 and 2013 (total 18 districts, all in Balochistan), and mixed infections remained the same in 2014 and 2015 (total 7 districts, 6 in Balochistan and 1 in FATA). This study indicated that the transmission cycles of malaria infection vary in different spatiotemporal settings in Pakistan. Efforts in controlling P . vivax malaria in particular need to be
Motif enrichment tool.

PubMed

Blatti, Charles; Sinha, Saurabh

2014-07-01

The Motif Enrichment Tool (MET) provides an online interface that enables users to find major transcriptional regulators of their gene sets of interest. MET searches the appropriate regulatory region around each gene and identifies which transcription factor DNA-binding specificities (motifs) are statistically overrepresented. Motif enrichment analysis is currently available for many metazoan species including human, mouse, fruit fly, planaria and flowering plants. MET also leverages high-throughput experimental data such as ChIP-seq and DNase-seq from ENCODE and ModENCODE to identify the regulatory targets of a transcription factor with greater precision. The results from MET are produced in real time and are linked to a genome browser for easy follow-up analysis. Use of the web tool is free and open to all, and there is no login requirement. ADDRESS: http://veda.cs.uiuc.edu/MET/. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
A QUANTITATIVE ANALYSIS OF DISTANT OPEN CLUSTERS

DOE Office of Scientific and Technical Information (OSTI.GOV)

Janes, Kenneth A.; Hoq, Sadia

2011-03-15

The oldest open star clusters are important for tracing the history of the Galactic disk, but many of the more distant clusters are heavily reddened and projected against the rich stellar background of the Galaxy. We have undertaken an investigation of several distant clusters (Berkeley 19, Berkeley 44, King 25, NGC 6802, NGC 6827, Berkeley 52, Berkeley 56, NGC 7142, NGC 7245, and King 9) to develop procedures for separating probable cluster members from the background field. We next created a simple quantitative approach for finding approximate cluster distances, reddenings, and ages. We first conclude that with the possible exceptionmore » of King 25 they are probably all physical clusters. We also find that for these distant clusters our typical errors are about {+-}0.07 in E(B - V), {+-}0.15 in log(age), and {+-}0.25 in (m - M){sub o}. The clusters range in age from 470 Myr to 7 Gyr and range from 7.1 to 16.4 kpc from the Galactic center.« less
Development of microsatellite markers from an enriched genomic library for genetic analysis of melon (Cucumis melo L.)

PubMed Central

Ritschel, Patricia Silva; Lins, Tulio Cesar de Lima; Tristan, Rodrigo Lourenço; Buso, Gláucia Salles Cortopassi; Buso, José Amauri; Ferreira, Márcio Elias

2004-01-01

Background Despite the great advances in genomic technology observed in several crop species, the availability of molecular tools such as microsatellite markers has been limited in melon (Cucumis melo L.) and cucurbit species. The development of microsatellite markers will have a major impact on genetic analysis and breeding of melon, especially on the generation of marker saturated genetic maps and implementation of marker assisted breeding programs. Genomic microsatellite enriched libraries can be an efficient alternative for marker development in such species. Results Seven hundred clones containing microsatellite sequences from a Tsp-AG/TC microsatellite enriched library were identified and one-hundred and forty-four primer pairs designed and synthesized. When 67 microsatellite markers were tested on a panel of melon and other cucurbit accessions, 65 revealed DNA polymorphisms among the melon accessions. For some cucurbit species, such as Cucumis sativus, up to 50% of the melon microsatellite markers could be readily used for DNA polymophism assessment, representing a significant reduction of marker development costs. A random sample of 25 microsatellite markers was extracted from the new microsatellite marker set and characterized on 40 accessions of melon, generating an allelic frequency database for the species. The average expected heterozygosity was 0.52, varying from 0.45 to 0.70, indicating that a small set of selected markers should be sufficient to solve questions regarding genotype identity and variety protection. Genetic distances based on microsatellite polymorphism were congruent with data obtained from RAPD marker analysis. Mapping analysis was initiated with 55 newly developed markers and most primers showed segregation according to Mendelian expectations. Linkage analysis detected linkage between 56% of the markers, distributed in nine linkage groups. Conclusions Genomic library microsatellite enrichment is an efficient procedure for marker
Severe or life-threatening asthma exacerbation: patient heterogeneity identified by cluster analysis.

PubMed

Sekiya, K; Nakatani, E; Fukutomi, Y; Kaneda, H; Iikura, M; Yoshida, M; Takahashi, K; Tomii, K; Nishikawa, M; Kaneko, N; Sugino, Y; Shinkai, M; Ueda, T; Tanikawa, Y; Shirai, T; Hirabayashi, M; Aoki, T; Kato, T; Iizuka, K; Homma, S; Taniguchi, M; Tanaka, H

2016-08-01

Severe or life-threatening asthma exacerbation is one of the worst outcomes of asthma because of the risk of death. To date, few studies have explored the potential heterogeneity of this condition. To examine the clinical characteristics and heterogeneity of patients with severe or life-threatening asthma exacerbation. This was a multicentre, prospective study of patients with severe or life-threatening asthma exacerbation and pulse oxygen saturation < 90% who were admitted to 17 institutions across Japan. Cluster analysis was performed using variables from patient- and physician-orientated structured questionnaires. Analysis of data from 175 patients with severe or life-threatening asthma exacerbation revealed five distinct clusters. Cluster 1 (n = 27) was younger-onset asthma with severe symptoms at baseline, including limitation of activities, a higher frequency of treatment with oral corticosteroids and short-acting beta-agonists, and a higher frequency of asthma hospitalizations in the past year. Cluster 2 (n = 35) was predominantly composed of elderly females, with the highest frequency of comorbid, chronic hyperplastic rhinosinusitis/nasal polyposis, and a long disease duration. Cluster 3 (n = 40) was allergic asthma without inhaled corticosteroid use at baseline. Patients in this cluster had a higher frequency of atopy, including allergic rhinitis and furred pet hypersensitivity, and a better prognosis during hospitalization compared with the other clusters. Cluster 4 (n = 34) was characterized by elderly males with concomitant chronic obstructive pulmonary disease (COPD). Although cluster 5 (n = 39) had very mild symptoms at baseline according to the patient questionnaires, 41% had previously been hospitalized for asthma. This study demonstrated that significant heterogeneity exists among patients with severe or life-threatening asthma exacerbation. Differences were observed in the severity of asthma symptoms and use of inhaled corticosteroids at baseline
Clustering ENTLN sferics to improve TGF temporal analysis

NASA Astrophysics Data System (ADS)

Pradhan, E.; Briggs, M. S.; Stanbro, M.; Cramer, E.; Heckman, S.; Roberts, O.

2017-12-01

Using TGFs detected with Fermi Gamma-ray Burst Monitor (GBM) and simultaneous radio sferics detected by Earth Network Total Lightning Network (ENTLN), we establish a temporal co-relation between them. The first step is to find ENTLN strokes that that are closely associated to GBM TGFs. We then identify all the related strokes in the lightning flash that the TGF-associated-stroke belongs to. After trying several algorithms, we found out that the DBSCAN clustering algorithm was best for clustering related ENTLN strokes into flashes. The operation of DBSCAN was optimized using a single seperation measure that combined time and distance seperation. Previous analysis found that these strokes show three timescales with respect to the gamma-ray time. We will use the improved identification of flashes to research this.
Cluster: Mission Overview and End-of-Life Analysis

NASA Technical Reports Server (NTRS)

Pallaschke, S.; Munoz, I.; Rodriquez-Canabal, J.; Sieg, D.; Yde, J. J.

2007-01-01

The Cluster mission is part of the scientific programme of the European Space Agency (ESA) and its purpose is the analysis of the Earth's magnetosphere. The Cluster project consists of four satellites. The selected polar orbit has a shape of 4.0 and 19.2 Re which is required for performing measurements near the cusp and the tail of the magnetosphere. When crossing these regions the satellites form a constellation which in most of the cases so far has been a regular tetrahedron. The satellite operations are carried out by the European Space Operations Centre (ESOC) at Darmstadt, Germany. The paper outlines the future orbit evolution and the envisaged operations from a Flight Dynamics point of view. In addition a brief summary of the LEOP and routine operations is included beforehand.

Cluster-based exposure variation analysis

PubMed Central

2013-01-01

Background Static posture, repetitive movements and lack of physical variation are known risk factors for work-related musculoskeletal disorders, and thus needs to be properly assessed in occupational studies. The aims of this study were (i) to investigate the effectiveness of a conventional exposure variation analysis (EVA) in discriminating exposure time lines and (ii) to compare it with a new cluster-based method for analysis of exposure variation. Methods For this purpose, we simulated a repeated cyclic exposure varying within each cycle between “low” and “high” exposure levels in a “near” or “far” range, and with “low” or “high” velocities (exposure change rates). The duration of each cycle was also manipulated by selecting a “small” or “large” standard deviation of the cycle time. Theses parameters reflected three dimensions of exposure variation, i.e. range, frequency and temporal similarity. Each simulation trace included two realizations of 100 concatenated cycles with either low (ρ = 0.1), medium (ρ = 0.5) or high (ρ = 0.9) correlation between the realizations. These traces were analyzed by conventional EVA, and a novel cluster-based EVA (C-EVA). Principal component analysis (PCA) was applied on the marginal distributions of 1) the EVA of each of the realizations (univariate approach), 2) a combination of the EVA of both realizations (multivariate approach) and 3) C-EVA. The least number of principal components describing more than 90% of variability in each case was selected and the projection of marginal distributions along the selected principal component was calculated. A linear classifier was then applied to these projections to discriminate between the simulated exposure patterns, and the accuracy of classified realizations was determined. Results C-EVA classified exposures more correctly than univariate and multivariate EVA approaches; classification accuracy was 49%, 47% and 52% for EVA (univariate
Transcriptome profiling analysis reveals biomarkers in colon cancer samples of various differentiation

PubMed Central

Yu, Tonghu; Zhang, Huaping; Qi, Hong

2018-01-01

The aim of the present study was to investigate more colon cancer-related genes in different stages. Gene expression profile E-GEOD-62932 was extracted for differentially expressed gene (DEG) screening. Series test of cluster analysis was used to obtain significant trending models. Based on the Gene Ontology and Kyoto Encyclopedia of Genes and Genomes databases, functional and pathway enrichment analysis were processed and a pathway relation network was constructed. Gene co-expression network and gene signal network were constructed for common DEGs. The DEGs with the same trend were clustered and in total, 16 clusters with statistical significance were obtained. The screened DEGs were enriched into small molecule metabolic process and metabolic pathways. The pathway relation network was constructed with 57 nodes. A total of 328 common DEGs were obtained. Gene signal network was constructed with 71 nodes. Gene co-expression network was constructed with 161 nodes and 211 edges. ABCD3, CPT2, AGL and JAM2 are potential biomarkers for the diagnosis of colon cancer. PMID:29928385
Enrichr: a comprehensive gene set enrichment analysis web server 2016 update

PubMed Central

Kuleshov, Maxim V.; Jones, Matthew R.; Rouillard, Andrew D.; Fernandez, Nicolas F.; Duan, Qiaonan; Wang, Zichen; Koplev, Simon; Jenkins, Sherry L.; Jagodnik, Kathleen M.; Lachmann, Alexander; McDermott, Michael G.; Monteiro, Caroline D.; Gundersen, Gregory W.; Ma'ayan, Avi

2016-01-01

Enrichment analysis is a popular method for analyzing gene sets generated by genome-wide experiments. Here we present a significant update to one of the tools in this domain called Enrichr. Enrichr currently contains a large collection of diverse gene set libraries available for analysis and download. In total, Enrichr currently contains 180 184 annotated gene sets from 102 gene set libraries. New features have been added to Enrichr including the ability to submit fuzzy sets, upload BED files, improved application programming interface and visualization of the results as clustergrams. Overall, Enrichr is a comprehensive resource for curated gene sets and a search engine that accumulates biological knowledge for further biological discoveries. Enrichr is freely available at: http://amp.pharm.mssm.edu/Enrichr. PMID:27141961
Spitzer Clusters

NASA Astrophysics Data System (ADS)

Krick, Kessica

examine the effect that evolutions of cluster redshift and dynamical state have on SFG and AGN in groups and clusters. In addition to environment, we will study the timescales of chemical enrichment of the ICM, using the SFG and AGN as tracers of processes that can transport metals outside of galaxies. Cosmological parameters can be measured based on observing galaxy clusters as signposts of the growth of structure in the universe. The best way to select a redshift independent sample is to use the SZ effect with mm observations to detect a shift in the cosmic microwave background spectrum as those photons scatter off hot gas in clusters. However, such mm observations are contaminated by the emission of SFG and AGN. We intend to characterize the magnitude of this effect on SZ surveys by understanding the frequency, radial distribution, and redshift distribution of these galaxies in clusters. Lastly, a compiled cluster catalog of all Spitzer observed clusters would be useful to the broader astronomical community. We plan to incorporate ancillary multi-wavelength data, where available, and to both publish our catalog in journals, and work with NED to make the catalog easily accessible in an efficient manner by the community.
Molecular-dynamics analysis of mobile helium cluster reactions near surfaces of plasma-exposed tungsten

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hu, Lin; Maroudas, Dimitrios, E-mail: maroudas@ecs.umass.edu; Hammond, Karl D.

We report the results of a systematic atomic-scale analysis of the reactions of small mobile helium clusters (He{sub n}, 4 ≤ n ≤ 7) near low-Miller-index tungsten (W) surfaces, aiming at a fundamental understanding of the near-surface dynamics of helium-carrying species in plasma-exposed tungsten. These small mobile helium clusters are attracted to the surface and migrate to the surface by Fickian diffusion and drift due to the thermodynamic driving force for surface segregation. As the clusters migrate toward the surface, trap mutation (TM) and cluster dissociation reactions are activated at rates higher than in the bulk. TM produces W adatoms and immobile complexes ofmore » helium clusters surrounding W vacancies located within the lattice planes at a short distance from the surface. These reactions are identified and characterized in detail based on the analysis of a large number of molecular-dynamics trajectories for each such mobile cluster near W(100), W(110), and W(111) surfaces. TM is found to be the dominant cluster reaction for all cluster and surface combinations, except for the He{sub 4} and He{sub 5} clusters near W(100) where cluster partial dissociation following TM dominates. We find that there exists a critical cluster size, n = 4 near W(100) and W(111) and n = 5 near W(110), beyond which the formation of multiple W adatoms and vacancies in the TM reactions is observed. The identified cluster reactions are responsible for important structural, morphological, and compositional features in the plasma-exposed tungsten, including surface adatom populations, near-surface immobile helium-vacancy complexes, and retained helium content, which are expected to influence the amount of hydrogen re-cycling and tritium retention in fusion tokamaks.« less
Microfluidic-Based Enrichment and Retrieval of Circulating Tumor Cells for RT-PCR Analysis.

PubMed

Gogoi, Priya; Sepehri, Saedeh; Chow, Will; Handique, Kalyan; Wang, Yixin

2017-01-01

Molecular analysis of circulating tumor cells (CTCs) is hindered by low sensitivity and high level of background leukocytes of currently available CTC enrichment technologies. We have developed a novel device to enrich and retrieve CTCs from blood samples by using a microfluidic chip. The Celsee PREP100 device captures CTCs with high sensitivity and allows the captured CTCs to be retrieved for molecular analysis. It uses the microfluidic chip which has approximately 56,320 capture chambers. Based on differences in cell size and deformability, each chamber ensures that small blood escape while larger CTCs of varying sizes are trapped and isolated in the chambers. In this report, we used the Celsee PREP100 to capture cancer cells spiked into normal donor blood samples. We were able to show that the device can capture as low as 10 cells with high reproducibility. The captured CTCs were retrieved from the microfluidic chip. The cell recovery rate of this back-flow procedure is 100% and the level of remaining background leukocytes is very low (about 300-400 cells). RNA from the retrieved cells are extracted and converted to cDNA, and gene expression analysis of selected cancer markers can be carried out by using RT-PCR assays. The sensitive and easy-to-use Celsee PREP100 system represents a promising technology for capturing and molecular characterization of CTCs.
Tetraspanin-enriched microdomains: a functional unit in cell plasma membranes.

PubMed

Yáñez-Mó, María; Barreiro, Olga; Gordon-Alonso, Mónica; Sala-Valdés, Mónica; Sánchez-Madrid, Francisco

2009-09-01

Membrane lipids and proteins are non-randomly distributed and are unable to diffuse freely in the plane of the membrane. This is because of multiple constraints imposed both by the cortical cytoskeleton and by the preference of lipids and proteins to cluster into diverse and specialized membrane domains, including tetraspanin-enriched microdomains, glycosylphosphatidyl inositol-linked proteins nanodomains and caveolae, among others. Recent biophysical characterization of tetraspanin-enriched microdomains suggests that they might be specially suited for the regulation of avidity of adhesion receptors and the compartmentalization of enzymatic activities. Moreover, modulation by tetraspanins of the function of adhesion receptors involved in inflammation, lymphocyte activation, cancer and pathogen infection suggests potential as therapeutic targets. This review explores this emerging picture of tetraspanin microdomains and discusses the implications for cell adhesion, proteolysis and pathogenesis.
Cluster and principal component analysis based on SSR markers of Amomum tsao-ko in Jinping County of Yunnan Province

NASA Astrophysics Data System (ADS)

Ma, Mengli; Lei, En; Meng, Hengling; Wang, Tiantao; Xie, Linyan; Shen, Dong; Xianwang, Zhou; Lu, Bingyue

2017-08-01

Amomum tsao-ko is a commercial plant that used for various purposes in medicinal and food industries. For the present investigation, 44 germplasm samples were collected from Jinping County of Yunnan Province. Clusters analysis and 2-dimensional principal component analysis (PCA) was used to represent the genetic relations among Amomum tsao-ko by using simple sequence repeat (SSR) markers. Clustering analysis clearly distinguished the samples groups. Two major clusters were formed; first (Cluster I) consisted of 34 individuals, the second (Cluster II) consisted of 10 individuals, Cluster I as the main group contained multiple sub-clusters. PCA also showed 2 groups: PCA Group 1 included 29 individuals, PCA Group 2 included 12 individuals, consistent with the results of cluster analysis. The purpose of the present investigation was to provide information on genetic relationship of Amomum tsao-ko germplasm resources in main producing areas, also provide a theoretical basis for the protection and utilization of Amomum tsao-ko resources.
The history of chemical enrichment in the intracluster medium from cosmological simulations

NASA Astrophysics Data System (ADS)

Biffi, V.; Planelles, S.; Borgani, S.; Fabjan, D.; Rasia, E.; Murante, G.; Tornatore, L.; Dolag, K.; Granato, G. L.; Gaspari, M.; Beck, A. M.

2017-06-01

The distribution of metals in the intracluster medium (ICM) of galaxy clusters provides valuable information on their formation and evolution, on the connection with the cosmic star formation and on the effects of different gas processes. By analysing a sample of simulated galaxy clusters, we study the chemical enrichment of the ICM, its evolution, and its relation with the physical processes included in the simulation and with the thermal properties of the core. These simulations, consisting of re-simulations of 29 Lagrangian regions performed with an upgraded version of the smoothed particle hydrodynamics (SPH) gadget-3 code, have been run including two different sets of baryonic physics: one accounts for radiative cooling, star formation, metal enrichment and supernova (SN) feedback, and the other one further includes the effects of feedback from active galactic nuclei (AGN). In agreement with observations, we find an anti-correlation between entropy and metallicity in cluster cores, and similar radial distributions of heavy-element abundances and abundance ratios out to large cluster-centric distances (˜R180). In the outskirts, namely outside of ˜0.2 R180, we find a remarkably homogeneous metallicity distribution, with almost flat profiles of the elements produced by either SNIa or SNII. We investigated the origin of this phenomenon and discovered that it is due to the widespread displacement of metal-rich gas by early (z > 2-3) AGN powerful bursts, acting on small high-redshift haloes. Our results also indicate that the intrinsic metallicity of the hot gas for this sample is on average consistent with no evolution between z = 2 and z = 0, across the entire radial range.
Going beyond Clustering in MD Trajectory Analysis: An Application to Villin Headpiece Folding

PubMed Central

Rajan, Aruna; Freddolino, Peter L.; Schulten, Klaus

2010-01-01

Recent advances in computing technology have enabled microsecond long all-atom molecular dynamics (MD) simulations of biological systems. Methods that can distill the salient features of such large trajectories are now urgently needed. Conventional clustering methods used to analyze MD trajectories suffer from various setbacks, namely (i) they are not data driven, (ii) they are unstable to noise and changes in cut-off parameters such as cluster radius and cluster number, and (iii) they do not reduce the dimensionality of the trajectories, and hence are unsuitable for finding collective coordinates. We advocate the application of principal component analysis (PCA) and a non-metric multidimensional scaling (nMDS) method to reduce MD trajectories and overcome the drawbacks of clustering. To illustrate the superiority of nMDS over other methods in reducing data and reproducing salient features, we analyze three complete villin headpiece folding trajectories. Our analysis suggests that the folding process of the villin headpiece is structurally heterogeneous. PMID:20419160
Going beyond clustering in MD trajectory analysis: an application to villin headpiece folding.

PubMed

Rajan, Aruna; Freddolino, Peter L; Schulten, Klaus

2010-04-15

Recent advances in computing technology have enabled microsecond long all-atom molecular dynamics (MD) simulations of biological systems. Methods that can distill the salient features of such large trajectories are now urgently needed. Conventional clustering methods used to analyze MD trajectories suffer from various setbacks, namely (i) they are not data driven, (ii) they are unstable to noise and changes in cut-off parameters such as cluster radius and cluster number, and (iii) they do not reduce the dimensionality of the trajectories, and hence are unsuitable for finding collective coordinates. We advocate the application of principal component analysis (PCA) and a non-metric multidimensional scaling (nMDS) method to reduce MD trajectories and overcome the drawbacks of clustering. To illustrate the superiority of nMDS over other methods in reducing data and reproducing salient features, we analyze three complete villin headpiece folding trajectories. Our analysis suggests that the folding process of the villin headpiece is structurally heterogeneous.
Marketing Mix Formulation for Higher Education: An Integrated Analysis Employing Analytic Hierarchy Process, Cluster Analysis and Correspondence Analysis

ERIC Educational Resources Information Center

Ho, Hsuan-Fu; Hung, Chia-Chi

2008-01-01

Purpose: The purpose of this paper is to examine how a graduate institute at National Chiayi University (NCYU), by using a model that integrates analytic hierarchy process, cluster analysis and correspondence analysis, can develop effective marketing strategies. Design/methodology/approach: This is primarily a quantitative study aimed at…
Magnetite-doped polydimethylsiloxane (PDMS) for phosphopeptide enrichment.

PubMed

Sandison, Mairi E; Jensen, K Tveen; Gesellchen, F; Cooper, J M; Pitt, A R

2014-10-07

Reversible phosphorylation plays a key role in numerous biological processes. Mass spectrometry-based approaches are commonly used to analyze protein phosphorylation, but such analysis is challenging, largely due to the low phosphorylation stoichiometry. Hence, a number of phosphopeptide enrichment strategies have been developed, including metal oxide affinity chromatography (MOAC). Here, we describe a new material for performing MOAC that employs a magnetite-doped polydimethylsiloxane (PDMS), that is suitable for the creation of microwell array and microfluidic systems to enable low volume, high throughput analysis. Incubation time and sample loading were explored and optimized and demonstrate that the embedded magnetite is able to enrich phosphopeptides. This substrate-based approach is rapid, straightforward and suitable for simultaneously performing multiple, low volume enrichments.
Anthropogenic signature of sediment organic matter probed by UV-Visible and fluorescence spectroscopy and the association with heavy metal enrichment.

PubMed

He, Wei; Lee, Jong-Hyun; Hur, Jin

2016-05-01

Sediment organic matter (SOM) was extracted in an alkaline solution from 43 stream sediments in order to explore the anthropogenic signatures. The SOM spectroscopic characteristics including excitation-emission matrix (EEM)-parallel factor analysis (PARAFAC) were compared for five sampling site groups classified by the anthropogenic variables of land use, population density, the loadings of organics and nutrients, and metal enrichment. The conventional spectroscopic characteristics including specific UV absorbance, absorbance ratio, and humification index did not properly discriminate among the different cluster groups except in the case of metal enrichment. Of the four decomposed PARAFAC components, humic-like and tryptophan-like fluorescence responded negatively and positively, respectively, to increasing degrees of the anthropogenic variables except for land use. The anthropogenic enrichment of heavy metals was positively associated with the abundance of tryptophan-like component. In contrast, humic-like component, known to be mostly responsible for metal binding, exhibited a decreasing trend corresponding with metal enrichment. These conflicting trends can be attributed to the overwhelmed effects of the coupled discharges of heavy metals and organic pollutants into sediments. Our study suggests that the PARAFAC components can be used as functional signatures to probe the anthropogenic influences on sediments. Copyright © 2016 Elsevier Ltd. All rights reserved.
Weighing the Giants - I. Weak-lensing masses for 51 massive galaxy clusters: project overview, data analysis methods and cluster images

NASA Astrophysics Data System (ADS)

von der Linden, Anja; Allen, Mark T.; Applegate, Douglas E.; Kelly, Patrick L.; Allen, Steven W.; Ebeling, Harald; Burchat, Patricia R.; Burke, David L.; Donovan, David; Morris, R. Glenn; Blandford, Roger; Erben, Thomas; Mantz, Adam

2014-03-01

This is the first in a series of papers in which we measure accurate weak-lensing masses for 51 of the most X-ray luminous galaxy clusters known at redshifts 0.15 ≲ zCl ≲ 0.7, in order to calibrate X-ray and other mass proxies for cosmological cluster experiments. The primary aim is to improve the absolute mass calibration of cluster observables, currently the dominant systematic uncertainty for cluster count experiments. Key elements of this work are the rigorous quantification of systematic uncertainties, high-quality data reduction and photometric calibration, and the `blind' nature of the analysis to avoid confirmation bias. Our target clusters are drawn from X-ray catalogues based on the ROSAT All-Sky Survey, and provide a versatile calibration sample for many aspects of cluster cosmology. We have acquired wide-field, high-quality imaging using the Subaru Telescope and Canada-France-Hawaii Telescope for all 51 clusters, in at least three bands per cluster. For a subset of 27 clusters, we have data in at least five bands, allowing accurate photometric redshift estimates of lensed galaxies. In this paper, we describe the cluster sample and observations, and detail the processing of the SuprimeCam data to yield high-quality images suitable for robust weak-lensing shape measurements and precision photometry. For each cluster, we present wide-field three-colour optical images and maps of the weak-lensing mass distribution, the optical light distribution and the X-ray emission. These provide insights into the large-scale structure in which the clusters are embedded. We measure the offsets between X-ray flux centroids and the brightest cluster galaxies in the clusters, finding these to be small in general, with a median of 20 kpc. For offsets ≲100 kpc, weak-lensing mass measurements centred on the brightest cluster galaxies agree well with values determined relative to the X-ray centroids; miscentring is therefore not a significant source of systematic
The origin of the Milky Way globular clusters

NASA Astrophysics Data System (ADS)

Renaud, Florent; Agertz, Oscar; Gieles, Mark

2017-03-01

We present a cosmological zoom-in simulation of a Milky Way-like galaxy used to explore the formation and evolution of star clusters. We investigate in particular the origin of the bimodality observed in the colour and metallicity of globular clusters, and the environmental evolution through cosmic times in the form of tidal tensors. Our results self-consistently confirm previous findings that the blue, metal-poor clusters form in satellite galaxies that are accreted on to the Milky Way, while the red, metal-rich clusters form mostly in situ, or, to a lower extent, in massive, self-enriched galaxies merging with the Milky Way. By monitoring the tidal fields these populations experience, we find that clusters formed in situ (generally centrally concentrated) feel significantly stronger tides than the accreted ones, both in the present day, and when averaged over their entire life. Furthermore, we note that the tidal field experienced by Milky Way clusters is significantly weaker in the past than at present day, confirming that it is unlikely that a power-law cluster initial mass function like that of young massive clusters, is transformed into the observed peaked distribution in the Milky Way with relaxation-driven evaporation in a tidal field.
The identification of credit card encoders by hierarchical cluster analysis of the jitters of magnetic stripes.

PubMed

Leung, S C; Fung, W K; Wong, K H

1999-01-01

The relative bit density variation graphs of 207 specimen credit cards processed by 12 encoding machines were examined first visually, and then classified by means of hierarchical cluster analysis. Twenty-nine credit cards being treated as 'questioned' samples were tested by way of cluster analysis against 'controls' derived from known encoders. It was found that hierarchical cluster analysis provided a high accuracy of identification with all 29 'questioned' samples classified correctly. On the other hand, although visual comparison of jitter graphs was less discriminating, it was nevertheless capable of giving a reasonably accurate result.
A hierarchical model for clustering m(6)A methylation peaks in MeRIP-seq data.

PubMed

Cui, Xiaodong; Meng, Jia; Zhang, Shaowu; Rao, Manjeet K; Chen, Yidong; Huang, Yufei

2016-08-22

The recent advent of the state-of-art high throughput sequencing technology, known as Methylated RNA Immunoprecipitation combined with RNA sequencing (MeRIP-seq) revolutionizes the area of mRNA epigenetics and enables the biologists and biomedical researchers to have a global view of N (6)-Methyladenosine (m(6)A) on transcriptome. Yet there is a significant need for new computation tools for processing and analysing MeRIP-Seq data to gain a further insight into the function and m(6)A mRNA methylation. We developed a novel algorithm and an open source R package ( http://compgenomics.utsa.edu/metcluster ) for uncovering the potential types of m(6)A methylation by clustering the degree of m(6)A methylation peaks in MeRIP-Seq data. This algorithm utilizes a hierarchical graphical model to model the reads account variance and the underlying clusters of the methylation peaks. Rigorous statistical inference is performed to estimate the model parameter and detect the number of clusters. MeTCluster is evaluated on both simulated and real MeRIP-seq datasets and the results demonstrate its high accuracy in characterizing the clusters of methylation peaks. Our algorithm was applied to two different sets of real MeRIP-seq datasets and reveals a novel pattern that methylation peaks with less peak enrichment tend to clustered in the 5' end of both in both mRNAs and lncRNAs, whereas those with higher peak enrichment are more likely to be distributed in CDS and towards the 3'end of mRNAs and lncRNAs. This result might suggest that m(6)A's functions could be location specific. In this paper, a novel hierarchical graphical model based algorithm was developed for clustering the enrichment of methylation peaks in MeRIP-seq data. MeTCluster is written in R and is publicly available.
Abell 1142 and the Missing Central Galaxy – A Cluster in Transition?

NASA Astrophysics Data System (ADS)

Jones, Alexander; Su, Yuanyuan; Buote, David; Forman, William; van Weeren, Reinout; Jones, Christine; Gastaldello, Fabio; Kraft, Ralph; Randall, Scott

2018-01-01

Two types of galaxy clusters exist: cool core (CC) clusters which exhibit centrally-peaked metallicity and X-ray emission and non-cool core (NCC) clusters, possessing comparably homogeneous metallicity and X-ray emission distributions. However, the origin of this dichotomy is still unknown. The current prevailing theories state that either there is a primordial entropy limit, above which a CC is unable to form, or that clusters can change type through major mergers and radiative cooling. Abell 1142 is a galaxy cluster that can provide a unique probe of the root of this cluster-type division. It is formed of two merging sub-clusters, each with its own brightest cluster galaxies (BCG). Its enriched X-ray centroid (possible CC remnant) lies between these two BCGs. We present the thermal and chemical distributions of this system using deep (180ks) XMM-Newton observations to shed light on the role of mergers in the evolution of galaxy clusters.
Identifying At-Risk Students in General Chemistry via Cluster Analysis of Affective Characteristics

ERIC Educational Resources Information Center

Chan, Julia Y. K.; Bauer, Christopher F.

2014-01-01

The purpose of this study is to identify academically at-risk students in first-semester general chemistry using affective characteristics via cluster analysis. Through the clustering of six preselected affective variables, three distinct affective groups were identified: low (at-risk), medium, and high. Students in the low affective group…

Differences Between Ward's and UPGMA Methods of Cluster Analysis: Implications for School Psychology.

ERIC Educational Resources Information Center

Hale, Robert L.; Dougherty, Donna

1988-01-01

Compared the efficacy of two methods of cluster analysis, the unweighted pair-groups method using arithmetic averages (UPGMA) and Ward's method, for students grouped on intelligence, achievement, and social adjustment by both clustering methods. Found UPGMA more efficacious based on output, on cophenetic correlation coefficients generated by each…
High-resolution spectroscopic observations of binary stars and yellow stragglers in three open clusters: NGC 2360, NGC 3680, and NGC 5822

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sales Silva, J. V.; Peña Suárez, V. J.; Katime Santrich, O. J.

2014-11-01

Binary stars in open clusters are very useful targets in constraining the nucleosynthesis process. The luminosities of the stars are known because the distances of the clusters are also known, so chemical peculiarities can be linked directly to the evolutionary status of a star. In addition, binary stars offer the opportunity to verify a relationship between them and the straggler population in both globular and open clusters. We carried out a detailed spectroscopic analysis to derive the atmospheric parameters for 16 red giants in binary systems and the chemical composition of 11 of them in the open clusters NGC 2360,more » NGC 3680, and NGC 5822. We obtained abundances of C, N, O, Na, Mg, Al, Ca, Si, Ti, Ni, Cr, Y, Zr, La, Ce, and Nd. The atmospheric parameters of the studied stars and their chemical abundances were determined using high-resolution optical spectroscopy. We employ the local thermodynamic equilibrium model atmospheres of Kurucz and the spectral analysis code MOOG. The abundances of the light elements were derived using the spectral synthesis technique. We found that the stars NGC 2360-92 and 96, NGC 3680-34, and NGC 5822-4 and 312 are yellow straggler stars. We show that the spectra of NGC 5822-4 and 312 present evidence of contamination by an A-type star as a secondary star. For the other yellow stragglers, evidence of contamination is given by the broad wings of the Hα. Detection of yellow straggler stars is important because the observed number can be compared with the number predicted by simulations of binary stellar evolution in open clusters. We also found that the other binary stars are not s-process enriched, which may suggest that in these binaries the secondary star is probably a faint main-sequence object. The lack of any s-process enrichment is very useful in setting constraints for the number of white dwarfs in the open cluster, a subject that is related to the birthrate of these kinds of stars in open clusters and also to the age
Bacterial isolates from polysaccharide enrichments cluster by host origin for Firmicutes but not Bacteroidetes.

USDA-ARS?s Scientific Manuscript database

The intestinal microbiota allows mammals to recover energy stored in plant biomass through fermentation of plant cell walls, primarily cellulose and hemicellulose. Bacteria were isolated from 8 week continuous culture enrichments with cellulose and xylan/pectin from cow (C, n=4), goat (G, n=4), huma...
Applications of cluster analysis to the creation of perfectionism profiles: a comparison of two clustering approaches.

PubMed

Bolin, Jocelyn H; Edwards, Julianne M; Finch, W Holmes; Cassady, Jerrell C

2014-01-01

Although traditional clustering methods (e.g., K-means) have been shown to be useful in the social sciences it is often difficult for such methods to handle situations where clusters in the population overlap or are ambiguous. Fuzzy clustering, a method already recognized in many disciplines, provides a more flexible alternative to these traditional clustering methods. Fuzzy clustering differs from other traditional clustering methods in that it allows for a case to belong to multiple clusters simultaneously. Unfortunately, fuzzy clustering techniques remain relatively unused in the social and behavioral sciences. The purpose of this paper is to introduce fuzzy clustering to these audiences who are currently relatively unfamiliar with the technique. In order to demonstrate the advantages associated with this method, cluster solutions of a common perfectionism measure were created using both fuzzy clustering and K-means clustering, and the results compared. Results of these analyses reveal that different cluster solutions are found by the two methods, and the similarity between the different clustering solutions depends on the amount of cluster overlap allowed for in fuzzy clustering.
Applications of cluster analysis to the creation of perfectionism profiles: a comparison of two clustering approaches

PubMed Central

Bolin, Jocelyn H.; Edwards, Julianne M.; Finch, W. Holmes; Cassady, Jerrell C.

2014-01-01

Although traditional clustering methods (e.g., K-means) have been shown to be useful in the social sciences it is often difficult for such methods to handle situations where clusters in the population overlap or are ambiguous. Fuzzy clustering, a method already recognized in many disciplines, provides a more flexible alternative to these traditional clustering methods. Fuzzy clustering differs from other traditional clustering methods in that it allows for a case to belong to multiple clusters simultaneously. Unfortunately, fuzzy clustering techniques remain relatively unused in the social and behavioral sciences. The purpose of this paper is to introduce fuzzy clustering to these audiences who are currently relatively unfamiliar with the technique. In order to demonstrate the advantages associated with this method, cluster solutions of a common perfectionism measure were created using both fuzzy clustering and K-means clustering, and the results compared. Results of these analyses reveal that different cluster solutions are found by the two methods, and the similarity between the different clustering solutions depends on the amount of cluster overlap allowed for in fuzzy clustering. PMID:24795683
Analysis of risk factors for cluster behavior of dental implant failures.

PubMed

Chrcanovic, Bruno Ramos; Kisch, Jenö; Albrektsson, Tomas; Wennerberg, Ann

2017-08-01

Some studies indicated that implant failures are commonly concentrated in few patients. To identify and analyze cluster behavior of dental implant failures among subjects of a retrospective study. This retrospective study included patients receiving at least three implants only. Patients presenting at least three implant failures were classified as presenting a cluster behavior. Univariate and multivariate logistic regression models and generalized estimating equations analysis evaluated the effect of explanatory variables on the cluster behavior. There were 1406 patients with three or more implants (8337 implants, 592 failures). Sixty-seven (4.77%) patients presented cluster behavior, with 56.8% of all implant failures. The intake of antidepressants and bruxism were identified as potential negative factors exerting a statistically significant influence on a cluster behavior at the patient-level. The negative factors at the implant-level were turned implants, short implants, poor bone quality, age of the patient, the intake of medicaments to reduce the acid gastric production, smoking, and bruxism. A cluster pattern among patients with implant failure is highly probable. Factors of interest as predictors for implant failures could be a number of systemic and local factors, although a direct causal relationship cannot be ascertained. © 2017 Wiley Periodicals, Inc.
Countries population determination to test rice crisis indicator at national level using k-means cluster analysis

NASA Astrophysics Data System (ADS)

Hidayat, Y.; Purwandari, T.; Sukono; Ariska, Y. D.

2017-01-01

This study aimed to obtain information on the population of the countries which is have similarities with Indonesia based on three characteristics, that is the democratic atmosphere, rice consumption and purchasing power of rice. It is useful as a reference material for research which tested the strength and predictability of the rice crisis indicators Unprecedented Restlessness (UR). The similarities countries with Indonesia were conducted using multivariate analysis that is non-hierarchical cluster analysis k-Means with 38 countries as the data population. This analysis is done repeatedly until the obtainment number of clusters which is capable to show the differentiator power of the three characteristics and describe the high similarity within clusters. Based on the results, it turns out with 6 clusters can describe the differentiator power of characteristics of formed clusters. However, to answer the purpose of the study, only one cluster which will be taken accordance with the criteria of success for the population of countries that have similarities with Indonesia that cluster contain Indonesia therein, there are countries which is sustain crisis and non-crisis of rice in 2008, and cluster which is have the largest member among them. This criterion is met by cluster 2, which consists of 22 countries, namely Indonesia, Brazil, Costa Rica, Djibouti, Dominican Republic, Ecuador, Fiji, Guinea-Bissau, Haiti, India, Jamaica, Japan, Korea South, Madagascar, Malaysia, Mali, Nicaragua, Panama, Peru, Senegal, Sierra Leone and Suriname.
AMOEBA clustering revisited. [cluster analysis, classification, and image display program

NASA Technical Reports Server (NTRS)

Bryant, Jack

1990-01-01

A description of the clustering, classification, and image display program AMOEBA is presented. Using a difficult high resolution aircraft-acquired MSS image, the steps the program takes in forming clusters are traced. A number of new features are described here for the first time. Usage of the program is discussed. The theoretical foundation (the underlying mathematical model) is briefly presented. The program can handle images of any size and dimensionality.
Optimizing disinfection by-product monitoring points in a distribution system using cluster analysis.

PubMed

Delpla, Ianis; Florea, Mihai; Pelletier, Geneviève; Rodriguez, Manuel J

2018-06-04

Trihalomethanes (THMs) and Haloacetic Acids (HAAs) are the main groups detected in drinking water and are consequently strictly regulated. However, the increasing quantity of data for disinfection byproducts (DBPs) produced from research projects and regulatory programs remains largely unexploited, despite a great potential for its use in optimizing drinking water quality monitoring to meet specific objectives. In this work, we developed a procedure to optimize locations and periods for DBPs monitoring based on a set of monitoring scenarios using the cluster analysis technique. The optimization procedure used a robust set of spatio-temporal monitoring results on DBPs (THMs and HAAs) generated from intensive sampling campaigns conducted in a residential sector of a water distribution system. Results shows that cluster analysis allows for the classification of water quality in different groups of THMs and HAAs according to their similarities, and the identification of locations presenting water quality concerns. By using cluster analysis with different monitoring objectives, this work provides a set of monitoring solutions and a comparison between various monitoring scenarios for decision-making purposes. Finally, it was demonstrated that the data from intensive monitoring of free chlorine residual and water temperature as DBP proxy parameters, when processed using cluster analysis, could also help identify the optimal sampling points and periods for regulatory THMs and HAAs monitoring. Copyright © 2018 Elsevier Ltd. All rights reserved.
Identification of different nutritional status groups in institutionalized elderly people by cluster analysis.

PubMed

López-Contreras, María José; López, Maria Ángeles; Canteras, Manuel; Candela, María Emilia; Zamora, Salvador; Pérez-Llamas, Francisca

2014-03-01

To apply a cluster analysis to groups of individuals of similar characteristics in an attempt to identify undernutrition or the risk of undernutrition in this population. A cross-sectional study. Seven public nursing homes in the province of Murcia, on the Mediterranean coast of Spain. 205 subjects aged 65 and older (131 women and 74 men). Dietary intake (energy and nutrients), anthropometric (body mass index, skinfold thickness, mid-arm muscle circumference, mid-arm muscle area, corrected arm muscle area, waist to hip ratio) and biochemical and haematological (serum albumin, transferrin, total cholesterol, total lymphocyte count). Variables were analyzed by cluster analysis. The results of the cluster analysis, including intake, anthropometric and analytical data showed that, of the 205 elderly subjects, 66 (32.2%) were over - weight/obese, 72 (35.1%) had an adequate nutritional status and 67 (32.7%) were undernourished or at risk of undernutrition. The undernourished or at risk of undernutrition group showed the lowest values for dietary intake and the anthropometric and analytical parameters measured. Our study shows that cluster analysis is a useful statistical method for assessing the nutritional status of institutionalized elderly populations. In contrast, use of the specific reference values frequently described in the literature might fail to detect real cases of undernourishment or those at risk of undernutrition. Copyright AULA MEDICA EDICIONES 2014. Published by AULA MEDICA. All rights reserved.
Analysis of genetic association using hierarchical clustering and cluster validation indices.

PubMed

Pagnuco, Inti A; Pastore, Juan I; Abras, Guillermo; Brun, Marcel; Ballarin, Virginia L

2017-10-01

It is usually assumed that co-expressed genes suggest co-regulation in the underlying regulatory network. Determining sets of co-expressed genes is an important task, based on some criteria of similarity. This task is usually performed by clustering algorithms, where the genes are clustered into meaningful groups based on their expression values in a set of experiment. In this work, we propose a method to find sets of co-expressed genes, based on cluster validation indices as a measure of similarity for individual gene groups, and a combination of variants of hierarchical clustering to generate the candidate groups. We evaluated its ability to retrieve significant sets on simulated correlated and real genomics data, where the performance is measured based on its detection ability of co-regulated sets against a full search. Additionally, we analyzed the quality of the best ranked groups using an online bioinformatics tool that provides network information for the selected genes. Copyright © 2017 Elsevier Inc. All rights reserved.
A microfluidic device for label-free, physical capture of circulating tumor cell-clusters

PubMed Central

Sarioglu, A. Fatih; Aceto, Nicola; Kojic, Nikola; Donaldson, Maria C.; Zeinali, Mahnaz; Hamza, Bashar; Engstrom, Amanda; Zhu, Huili; Sundaresan, Tilak K.; Miyamoto, David T.; Luo, Xi; Bardia, Aditya; Wittner, Ben S.; Ramaswamy, Sridhar; Shioda, Toshi; Ting, David T.; Stott, Shannon L.; Kapur, Ravi; Maheswaran, Shyamala; Haber, Daniel A.; Toner, Mehmet

2015-01-01

Cancer cells metastasize through the bloodstream either as single migratory circulating tumor cells (CTCs) or as multicellular groupings (CTC-clusters). Existing technologies for CTC enrichment are designed primarily to isolate single CTCs, and while CTC-clusters are detectable in some cases, their true prevalence and significance remain to be determined. Here, we developed a microchip technology (Cluster-Chip) specifically designed to capture CTC-clusters independent of tumor-specific markers from unprocessed blood. CTC-clusters are isolated through specialized bifurcating traps under low shear-stress conditions that preserve their integrity and even two-cell clusters are captured efficiently. Using the Cluster-Chip, we identify CTC-clusters in 30–40% of patients with metastatic cancers of the breast, prostate and melanoma. RNA sequencing of CTC-clusters confirms their tumor origin and identifies leukocytes within the clusters as tissue-derived macrophages. Together, the development of a device for efficient capture of CTC-clusters will enable detailed characterization of their biological properties and role in cancer metastasis. PMID:25984697
Cluster signal-to-noise analysis for evaluation of the information content in an image.

PubMed

Weerawanich, Warangkana; Shimizu, Mayumi; Takeshita, Yohei; Okamura, Kazutoshi; Yoshida, Shoko; Yoshiura, Kazunori

2018-01-01

(1) To develop an observer-free method of analysing image quality related to the observer performance in the detection task and (2) to analyse observer behaviour patterns in the detection of small mass changes in cone-beam CT images. 13 observers detected holes in a Teflon phantom in cone-beam CT images. Using the same images, we developed a new method, cluster signal-to-noise analysis, to detect the holes by applying various cut-off values using ImageJ and reconstructing cluster signal-to-noise curves. We then evaluated the correlation between cluster signal-to-noise analysis and the observer performance test. We measured the background noise in each image to evaluate the relationship with false positive rates (FPRs) of the observers. Correlations between mean FPRs and intra- and interobserver variations were also evaluated. Moreover, we calculated true positive rates (TPRs) and accuracies from background noise and evaluated their correlations with TPRs from observers. Cluster signal-to-noise curves were derived in cluster signal-to-noise analysis. They yield the detection of signals (true holes) related to noise (false holes). This method correlated highly with the observer performance test (R 2 = 0.9296). In noisy images, increasing background noise resulted in higher FPRs and larger intra- and interobserver variations. TPRs and accuracies calculated from background noise had high correlation with actual TPRs from observers; R 2 was 0.9244 and 0.9338, respectively. Cluster signal-to-noise analysis can simulate the detection performance of observers and thus replace the observer performance test in the evaluation of image quality. Erroneous decision-making increased with increasing background noise.
Understanding the Support Needs of People with Intellectual and Related Developmental Disabilities through Cluster Analysis and Factor Analysis of Statewide Data

ERIC Educational Resources Information Center

Viriyangkura, Yuwadee

2014-01-01

Through a secondary analysis of statewide data from Colorado, people with intellectual and related developmental disabilities (ID/DD) were classified into five clusters based on their support needs characteristics using cluster analysis techniques. Prior latent factor models of support needs in the field of ID/DD were examined to investigate the…
LA-ICP-MS analysis of isolated phosphatic grains indicates selective rare earth element enrichment during reworking and transport processes

NASA Astrophysics Data System (ADS)

Auer, Gerald; Reuter, Markus; Hauzenberger, Christoph A.; Piller, Werner E.

2016-04-01

water chemistry under certain well constrained circumstances of primary authigenesis. Are these conditions not met, REE patterns are more likely to reflect complex enrichment processes that likely already started to occur during reworking over geologically relatively short time frames. Similarities in the REE patterns of clearly detrital and biogenic phosphate further suggest that the often observed 'hat-shaped' pattern in biogenic phosphates can easily result from increased middle REE (Neodymium to Holmium) scavenging during taphonomic processes prior to final deposition. Finally, cluster analysis coupled with sedimentological considerations proved a valuable tool for the characterization of REE patterns of phosphates in terms of their formation conditions and depositional history, such as the distinction of phosphates formed in situ from reworked and transported phosphate grains.
Tracking Undergraduate Student Achievement in a First-Year Physiology Course Using a Cluster Analysis Approach

ERIC Educational Resources Information Center

Brown, S. J.; White, S.; Power, N.

2015-01-01

A cluster analysis data classification technique was used on assessment scores from 157 undergraduate nursing students who passed 2 successive compulsory courses in human anatomy and physiology. Student scores in five summative assessment tasks, taken in each of the courses, were used as inputs for a cluster analysis procedure. We aimed to group…
Cluster Analysis of Velocity Field Derived from Dense GNSS Network of Japan

NASA Astrophysics Data System (ADS)

Takahashi, A.; Hashimoto, M.

2015-12-01

Dense GNSS networks have been widely used to observe crustal deformation. Simpson et al. (2012) and Savage and Simpson (2013) have conducted cluster analyses of GNSS velocity field in the San Francisco Bay Area and Mojave Desert, respectively. They have successfully found velocity discontinuities. They also showed an advantage of cluster analysis for classifying GNSS velocity field. Since in western United States, strike-slip events are dominant, geometry is simple. However, the Japanese Islands are tectonically complicated due to subduction of oceanic plates. There are many types of crustal deformation such as slow slip event and large postseismic deformation. We propose a modified clustering method of GNSS velocity field in Japan to separate time variant and static crustal deformation. Our modification is performing cluster analysis every several months or years, then qualifying cluster member similarity. If a GNSS station moved differently from its neighboring GNSS stations, the station will not belong to in the cluster which includes its surrounding stations. With this method, time variant phenomena were distinguished. We applied our method to GNSS data of Japan from 1996 to 2015. According to the analyses, following conclusions were derived. The first is the clusters boundaries are consistent with known active faults. For examples, the Arima-Takatsuki-Hanaore fault system and the Shimane-Tottori segment proposed by Nishimura (2015) are recognized, though without using prior information. The second is improving detectability of time variable phenomena, such as a slow slip event in northern part of Hokkaido region detected by Ohzono et al. (2015). The last one is the classification of postseismic deformation caused by large earthquakes. The result suggested velocity discontinuities in postseismic deformation of the Tohoku-oki earthquake. This result implies that postseismic deformation is not continuously decaying proportional to distance from its epicenter.
Witnessing the Formation of a Brightest Cluster Galaxy in a Nearby X-ray Cluster

NASA Astrophysics Data System (ADS)

Rasmussen, Jesper; Mulchaey, John S.; Bai, Lei; Ponman, Trevor J.; Raychaudhury, Somak; Dariush, Ali

2010-07-01

The central dominant galaxies in galaxy clusters constitute the most massive and luminous galaxies in the universe. Despite this, the formation of these brightest cluster galaxies (BCGs) and the impact of this on the surrounding cluster environment remain poorly understood. Here we present multiwavelength observations of the nearby poor X-ray cluster MZ 10451, in which both processes can be studied in unprecedented detail. Chandra observations of the intracluster medium (ICM) in the cluster core, which harbors two optically bright early-type galaxies in the process of merging, show that the system has retained a cool core and a central metal excess. This suggests that any merger-induced ICM heating and mixing remain modest at this stage. Tidally stripped stars seen around either galaxy likely represent an emerging intracluster light component, and the central ICM abundance enhancement may have a prominent contribution from in situ enrichment provided by these stars. The smaller of the merging galaxies shows evidence for having retained a hot gas halo, along with tentative evidence for some obscured star formation, suggesting that not all BCG major mergers at low redshift are completely dissipationless. Both galaxies are slightly offset from the peak of the ICM emission, with all three lying on an axis that roughly coincides with the large-scale elongation of the ICM. Our data are consistent with a picture in which central BCGs are built up by mergers close to the cluster core, by galaxies infalling on radial orbits aligned with the cosmological filaments feeding the cluster. This paper includes data gathered with the 6.5 m Magellan Telescopes located at Las Campanas Observatory, Chile.
Salient concerns in using analgesia for cancer pain among outpatients: A cluster analysis study.

PubMed

Meghani, Salimah H; Knafl, George J

2017-02-10

To identify unique clusters of patients based on their concerns in using analgesia for cancer pain and predictors of the cluster membership. This was a 3-mo prospective observational study ( n = 207). Patients were included if they were adults (≥ 18 years), diagnosed with solid tumors or multiple myelomas, and had at least one prescription of around-the-clock pain medication for cancer or cancer-treatment-related pain. Patients were recruited from two outpatient medical oncology clinics within a large health system in Philadelphia. A choice-based conjoint (CBC) analysis experiment was used to elicit analgesic treatment preferences (utilities). Patients employed trade-offs based on five analgesic attributes (percent relief from analgesics, type of analgesic, type of side-effects, severity of side-effects, out of pocket cost). Patients were clustered based on CBC utilities using novel adaptive statistical methods. Multiple logistic regression was used to identify predictors of cluster membership. The analyses found 4 unique clusters: Most patients made trade-offs based on the expectation of pain relief (cluster 1, 41%). For a subset, the main underlying concern was type of analgesic prescribed, i.e ., opioid vs non-opioid (cluster 2, 11%) and type of analgesic side effects (cluster 4, 21%), respectively. About one in four made trade-offs based on multiple concerns simultaneously including pain relief, type of side effects, and severity of side effects (cluster 3, 28%). In multivariable analysis, to identify predictors of cluster membership, clinical and socioeconomic factors (education, health literacy, income, social support) rather than analgesic attitudes and beliefs were found important; only the belief, i.e ., pain medications can mask changes in health or keep you from knowing what is going on in your body was found significant in predicting two of the four clusters [cluster 1 (-); cluster 4 (+)]. Most patients appear to be driven by a single salient concern
Data Clustering

NASA Astrophysics Data System (ADS)

Wagstaff, Kiri L.

2012-03-01

On obtaining a new data set, the researcher is immediately faced with the challenge of obtaining a high-level understanding from the observations. What does a typical item look like? What are the dominant trends? How many distinct groups are included in the data set, and how is each one characterized? Which observable values are common, and which rarely occur? Which items stand out as anomalies or outliers from the rest of the data? This challenge is exacerbated by the steady growth in data set size [11] as new instruments push into new frontiers of parameter space, via improvements in temporal, spatial, and spectral resolution, or by the desire to "fuse" observations from different modalities and instruments into a larger-picture understanding of the same underlying phenomenon. Data clustering algorithms provide a variety of solutions for this task. They can generate summaries, locate outliers, compress data, identify dense or sparse regions of feature space, and build data models. It is useful to note up front that "clusters" in this context refer to groups of items within some descriptive feature space, not (necessarily) to "galaxy clusters" which are dense regions in physical space. The goal of this chapter is to survey a variety of data clustering methods, with an eye toward their applicability to astronomical data analysis. In addition to improving the individual researcher’s understanding of a given data set, clustering has led directly to scientific advances, such as the discovery of new subclasses of stars [14] and gamma-ray bursts (GRBs) [38]. All clustering algorithms seek to identify groups within a data set that reflect some observed, quantifiable structure. Clustering is traditionally an unsupervised approach to data analysis, in the sense that it operates without any direct guidance about which items should be assigned to which clusters. There has been a recent trend in the clustering literature toward supporting semisupervised or constrained

Bacterial community analysis in chlorpyrifos enrichment cultures via DGGE and use of bacterial consortium for CP biodegradation.

PubMed

Akbar, Shamsa; Sultan, Sikander; Kertesz, Michael

2014-10-01

The organophosphate pesticide chlorpyrifos (CP) has been used extensively since the 1960s for insect control. However, its toxic effects on mammals and persistence in environment necessitate its removal from contaminated sites, biodegradation studies of CP-degrading microbes are therefore of immense importance. Samples from a Pakistani agricultural soil with an extensive history of CP application were used to prepare enrichment cultures using CP as sole carbon source for bacterial community analysis and isolation of CP metabolizing bacteria. Bacterial community analysis (denaturing gradient gel electrophoresis) revealed that the dominant genera enriched under these conditions were Pseudomonas, Acinetobacter and Stenotrophomonas, along with lower numbers of Sphingomonas, Agrobacterium and Burkholderia. Furthermore, it revealed that members of Bacteroidetes, Firmicutes, α- and γ-Proteobacteria and Actinobacteria were present at initial steps of enrichment whereas β-Proteobacteria appeared in later steps and only Proteobacteria were selected by enrichment culturing. However, when CP-degrading strains were isolated from this enrichment culture, the most active organisms were strains of Acinetobacter calcoaceticus, Pseudomonas mendocina and Pseudomonas aeruginosa. These strains degraded 6-7.4 mg L(-1) day(-1) of CP when cultivated in mineral medium, while the consortium of all four strains degraded 9.2 mg L(-1) day(-1) of CP (100 mg L(-1)). Addition of glucose as an additional C source increased the degradation capacity by 8-14 %. After inoculation of contaminated soil with CP (200 mg kg(-1)) disappearance rates were 3.83-4.30 mg kg(-1) day(-1) for individual strains and 4.76 mg kg(-1) day(-1) for the consortium. These results indicate that these organisms are involved in the degradation of CP in soil and represent valuable candidates for in situ bioremediation of contaminated soils and waters.
Sirenomelia in Argentina: Prevalence, geographic clusters and temporal trends analysis.

PubMed

Groisman, Boris; Liascovich, Rosa; Gili, Juan Antonio; Barbero, Pablo; Bidondo, María Paz

2016-07-01

Sirenomelia is a severe malformation of the lower body characterized by a single medial lower limb and a variable combination of visceral abnormalities. Given that Sirenomelia is a very rare birth defect, epidemiological studies are scarce. The aim of this study is to evaluate prevalence, geographic clusters and time trends of sirenomelia in Argentina, using data from the National Network of Congenital Anomalies of Argentina (RENAC) from November 2009 until December 2014. This is a descriptive study using data from the RENAC, a hospital-based surveillance system for newborns affected with major morphological congenital anomalies. We calculated sirenomelia prevalence throughout the period, searched for geographical clusters, and evaluated time trends. The prevalence of confirmed cases of sirenomelia throughout the period was 2.35 per 100,000 births. Cluster analysis showed no statistically significant geographical aggregates. Time-trends analysis showed that the prevalence was higher in years 2009 to 2010. The observed prevalence was higher than the observed in previous epidemiological studies in other geographic regions. We observed a likely real increase in the initial period of our study. We used strict diagnostic criteria, excluding cases that only had clinical diagnosis of sirenomelia. Therefore, real prevalence could be even higher. This study did not show any geographic clusters. Because etiology of sirenomelia has not yet been established, studies of epidemiological features of this defect may contribute to define its causes. Birth Defects Research (Part A) 106:604-611, 2016. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.
Tox21 Enricher: Web-based Chemical/Biological Functional Annotation Analysis Tool Based on Tox21 Toxicity Screening Platform.

PubMed

Hur, Junguk; Danes, Larson; Hsieh, Jui-Hua; McGregor, Brett; Krout, Dakota; Auerbach, Scott

2018-05-01

The US Toxicology Testing in the 21st Century (Tox21) program was established to develop more efficient and human-relevant toxicity assessment methods. The Tox21 program screens >10,000 chemicals using quantitative high-throughput screening (qHTS) of assays that measure effects on toxicity pathways. To date, more than 70 assays have yielded >12 million concentration-response curves. The patterns of activity across assays can be used to define similarity between chemicals. Assuming chemicals with similar activity profiles have similar toxicological properties, we may infer toxicological properties based on its neighbourhood. One approach to inference is chemical/biological annotation enrichment analysis. Here, we present Tox21 Enricher, a web-based chemical annotation enrichment tool for the Tox21 toxicity screening platform. Tox21 Enricher identifies over-represented chemical/biological annotations among lists of chemicals (neighbourhoods), facilitating the identification of the toxicological properties and mechanisms in the chemical set. © 2018 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
Using Cluster Analysis and ICP-MS to Identify Groups of Ecstasy Tablets in Sao Paulo State, Brazil.

PubMed

Maione, Camila; de Oliveira Souza, Vanessa Cristina; Togni, Loraine Rezende; da Costa, José Luiz; Campiglia, Andres Dobal; Barbosa, Fernando; Barbosa, Rommel Melgaço

2017-11-01

The variations found in the elemental composition in ecstasy samples result in spectral profiles with useful information for data analysis, and cluster analysis of these profiles can help uncover different categories of the drug. We provide a cluster analysis of ecstasy tablets based on their elemental composition. Twenty-five elements were determined by ICP-MS in tablets apprehended by Sao Paulo's State Police, Brazil. We employ the K-means clustering algorithm along with C4.5 decision tree to help us interpret the clustering results. We found a better number of two clusters within the data, which can refer to the approximated number of sources of the drug which supply the cities of seizures. The C4.5 model was capable of differentiating the ecstasy samples from the two clusters with high prediction accuracy using the leave-one-out cross-validation. The model used only Nd, Ni, and Pb concentration values in the classification of the samples. © 2017 American Academy of Forensic Sciences.
Comparative study on gene set and pathway topology-based enrichment methods.

PubMed

Bayerlová, Michaela; Jung, Klaus; Kramer, Frank; Klemm, Florian; Bleckmann, Annalen; Beißbarth, Tim

2015-10-22

Enrichment analysis is a popular approach to identify pathways or sets of genes which are significantly enriched in the context of differentially expressed genes. The traditional gene set enrichment approach considers a pathway as a simple gene list disregarding any knowledge of gene or protein interactions. In contrast, the new group of so called pathway topology-based methods integrates the topological structure of a pathway into the analysis. We comparatively investigated gene set and pathway topology-based enrichment approaches, considering three gene set and four topological methods. These methods were compared in two extensive simulation studies and on a benchmark of 36 real datasets, providing the same pathway input data for all methods. In the benchmark data analysis both types of methods showed a comparable ability to detect enriched pathways. The first simulation study was conducted with KEGG pathways, which showed considerable gene overlaps between each other. In this study with original KEGG pathways, none of the topology-based methods outperformed the gene set approach. Therefore, a second simulation study was performed on non-overlapping pathways created by unique gene IDs. Here, methods accounting for pathway topology reached higher accuracy than the gene set methods, however their sensitivity was lower. We conducted one of the first comprehensive comparative works on evaluating gene set against pathway topology-based enrichment methods. The topological methods showed better performance in the simulation scenarios with non-overlapping pathways, however, they were not conclusively better in the other scenarios. This suggests that simple gene set approach might be sufficient to detect an enriched pathway under realistic circumstances. Nevertheless, more extensive studies and further benchmark data are needed to systematically evaluate these methods and to assess what gain and cost pathway topology information introduces into enrichment analysis. Both
Constraints on the Parental Melts of Enriched Shergottites from Image Analysis and High Pressure Experiments

NASA Technical Reports Server (NTRS)

Collinet, M.; Medard, E.; Devouard, B.; Peslier, A.

2012-01-01

Martian basalts can be classified in at least two geochemically different families: enriched and depleted shergottites. Enriched shergottites are characterized by higher incompatible element concentrations and initial Sr-87/Sr-86 and lower initial Nd-143/Nd-144 and Hf-176/Hf-177 than depleted shergottites [e.g. 1, 2]. It is now generally admitted that shergottites result from the melting of at least two distinct mantle reservoirs [e.g. 2, 3]. Some of the olivine-phyric shergottites (either depleted or enriched), the most magnesian Martian basalts, could represent primitive melts, which are of considerable interest to constrain mantle sources. Two depleted olivine-phyric shergottites, Yamato (Y) 980459 and Northwest Africa (NWA) 5789, are in equilibrium with their most magnesian olivine (Fig. 1) and their bulk rock compositions are inferred to represent primitive melts [4, 5]. Larkman Nunatak (LAR) 06319 [3, 6, 7] and NWA 1068 [8], the most magnesian enriched basalts, have bulk Mg# that are too high to be in equilibrium with their olivine megacryst cores. Parental melt compositions have been estimated by subtracting the most magnesian olivine from the bulk rock composition, assuming that olivine megacrysts have partially accumulated [3, 9]. However, because this technique does not account for the actual petrography of these meteorites, we used image analysis to study these rocks history, reconstruct their parent magma and understand the nature of olivine megacrysts.
Social Media Use and Depression and Anxiety Symptoms: A Cluster Analysis.

PubMed

Shensa, Ariel; Sidani, Jaime E; Dew, Mary Amanda; Escobar-Viera, César G; Primack, Brian A

2018-03-01

Individuals use social media with varying quantity, emotional, and behavioral at- tachment that may have differential associations with mental health outcomes. In this study, we sought to identify distinct patterns of social media use (SMU) and to assess associations between those patterns and depression and anxiety symptoms. In October 2014, a nationally-representative sample of 1730 US adults ages 19 to 32 completed an online survey. Cluster analysis was used to identify patterns of SMU. Depression and anxiety were measured using respective 4-item Patient-Reported Outcome Measurement Information System (PROMIS) scales. Multivariable logistic regression models were used to assess associations between clus- ter membership and depression and anxiety. Cluster analysis yielded a 5-cluster solu- tion. Participants were characterized as "Wired," "Connected," "Diffuse Dabblers," "Concentrated Dabblers," and "Unplugged." Membership in 2 clusters - "Wired" and "Connected" - increased the odds of elevated depression and anxiety symptoms (AOR = 2.7, 95% CI = 1.5-4.7; AOR = 3.7, 95% CI = 2.1-6.5, respectively, and AOR = 2.0, 95% CI = 1.3-3.2; AOR = 2.0, 95% CI = 1.3-3.1, respectively). SMU pattern characterization of a large population suggests 2 pat- terns are associated with risk for depression and anxiety. Developing educational interventions that address use patterns rather than single aspects of SMU (eg, quantity) would likely be useful.
Interactive Parallel Data Analysis within Data-Centric Cluster Facilities using the IPython Notebook

NASA Astrophysics Data System (ADS)

Pascoe, S.; Lansdowne, J.; Iwi, A.; Stephens, A.; Kershaw, P.

2012-12-01

The data deluge is making traditional analysis workflows for many researchers obsolete. Support for parallelism within popular tools such as matlab, IDL and NCO is not well developed and rarely used. However parallelism is necessary for processing modern data volumes on a timescale conducive to curiosity-driven analysis. Furthermore, for peta-scale datasets such as the CMIP5 archive, it is no longer practical to bring an entire dataset to a researcher's workstation for analysis, or even to their institutional cluster. Therefore, there is an increasing need to develop new analysis platforms which both enable processing at the point of data storage and which provides parallelism. Such an environment should, where possible, maintain the convenience and familiarity of our current analysis environments to encourage curiosity-driven research. We describe how we are combining the interactive python shell (IPython) with our JASMIN data-cluster infrastructure. IPython has been specifically designed to bridge the gap between the HPC-style parallel workflows and the opportunistic curiosity-driven analysis usually carried out using domain specific languages and scriptable tools. IPython offers a web-based interactive environment, the IPython notebook, and a cluster engine for parallelism all underpinned by the well-respected Python/Scipy scientific programming stack. JASMIN is designed to support the data analysis requirements of the UK and European climate and earth system modeling community. JASMIN, with its sister facility CEMS focusing the earth observation community, has 4.5 PB of fast parallel disk storage alongside over 370 computing cores provide local computation. Through the IPython interface to JASMIN, users can make efficient use of JASMIN's multi-core virtual machines to perform interactive analysis on all cores simultaneously or can configure IPython clusters across multiple VMs. Larger-scale clusters can be provisioned through JASMIN's batch scheduling system
Cluster Analysis of Longidorus Species (Nematoda: Longidoridae), a New Approach in Species Identification

PubMed Central

Ye, Weimin; Robbins, R. T.

2004-01-01

Hierarchical cluster analysis based on female morphometric character means including body length, distance from vulva opening to anterior end, head width, odontostyle length, esophagus length, body width, tail length, and tail width were used to examine the morphometric relationships and create dendrograms for (i) 62 populations belonging to 9 Longidorus species from Arkansas, (ii) 137 published Longidorus species, and (iii) 137 published Longidorus species plus 86 populations of 16 Longidorus species from Arkansas and various other locations by using JMP 4.02 software (SAS Institute, Cary, NC). Cluster analysis dendograms visually illustrated the grouping and morphometric relationships of the species and populations. It provided a computerized statistical approach to assist by helping to identify and distinguish species, by indicating morphometric relationships among species, and by assisting with new species diagnosis. The preliminary species identification can be accomplished by running cluster analysis for unknown species together with the data matrix of known published Longidorus species. PMID:19262809
On the Partitioning of Squared Euclidean Distance and Its Applications in Cluster Analysis.

ERIC Educational Resources Information Center

Carter, Randy L.; And Others

1989-01-01

The partitioning of squared Euclidean--E(sup 2)--distance between two vectors in M-dimensional space into the sum of squared lengths of vectors in mutually orthogonal subspaces is discussed. Applications to specific cluster analysis problems are provided (i.e., to design Monte Carlo studies for performance comparisons of several clustering methods…
Dwarf Galaxies in the Coma Cluster. II. Photometry and Analysis

NASA Astrophysics Data System (ADS)

Secker, J.; Harris, W. E.; Plummer, J. D.

1997-12-01

We use the data set derived in our previous paper (Secker & Harris 1997) to study the dwarf galaxy population in the central =~ 700 arcmin(2) of the Coma cluster, the majority of which are early-type dwarf elliptical (dE) galaxies. Analysis of the statistically-decontaminated dE galaxy sequence in the color-magnitude diagram reveals that the mean dE color at R = 18.0 mag is (B-R) =~ 1.4 mag, but that a highly significant trend of color with magnitude exists (Delta (B-R)/Delta R = -0.056+/-0.002 mag) in the sense that fainter dEs are bluer and thus presumably more metal-poor. The mean color of the faintest dEs in our sample is (B-R) =~ 1.15 mag, consistent with a color measurement of the diffuse intracluster light in the Coma core. This intracluster light could then have originated from the tidal disruption of faint dEs in the cluster core. The total galaxy luminosity function (LF) is well modeled as the sum of a log-normal distribution for the giant galaxies, and a Schechter function for the dE galaxies with a faint-end slope alpha = -1.41+/-0.05. This value of alpha is consistent with those measured for the Virgo and Fornax clusters. The spatial distribution of the faint dE galaxies (19.0 < R <= 22.5 mag) is well fit by a standard King model with a central surface density of Sigma_0 = 1.17 dEs arcmin(-2) and a core radius R_c = 22.15 arcmin ( =~ 0.46h(-1) Mpc). This core is significantly larger than the R_c = 13.71 arcmin ( =~ 0.29h(-1) Mpc) found for the cluster giants and the brighter dEs (R <= 19.0 mag), again consistent with the idea that faint dEs in the dense core have been disrupted. Finally, we find that most dEs belong to the general Coma cluster potential rather than as satellites of individual giant galaxies: An analysis of the number counts around 10 cluster giants reveals that they each have on average 4+/- 1 dE companions within a projected radius of 13.9h(-1) kpc. (SECTION: Galaxies)
Hierarchical Spatio-temporal Visual Analysis of Cluster Evolution in Electrocorticography Data

DOE PAGES

Murugesan, Sugeerth; Bouchard, Kristofer; Chang, Edward; ...

2016-10-02

Here, we present ECoG ClusterFlow, a novel interactive visual analysis tool for the exploration of high-resolution Electrocorticography (ECoG) data. Our system detects and visualizes dynamic high-level structures, such as communities, using the time-varying spatial connectivity network derived from the high-resolution ECoG data. ECoG ClusterFlow provides a multi-scale visualization of the spatio-temporal patterns underlying the time-varying communities using two views: 1) an overview summarizing the evolution of clusters over time and 2) a hierarchical glyph-based technique that uses data aggregation and small multiples techniques to visualize the propagation of clusters in their spatial domain. ECoG ClusterFlow makes it possible 1) tomore » compare the spatio-temporal evolution patterns across various time intervals, 2) to compare the temporal information at varying levels of granularity, and 3) to investigate the evolution of spatial patterns without occluding the spatial context information. Lastly, we present case studies done in collaboration with neuroscientists on our team for both simulated and real epileptic seizure data aimed at evaluating the effectiveness of our approach.« less
Improving Cluster Analysis with Automatic Variable Selection Based on Trees

DTIC Science & Technology

2014-12-01

regression trees Daisy DISsimilAritY PAM partitioning around medoids PMA penalized multivariate analysis SPC sparse principal components UPGMA unweighted...unweighted pair-group average method ( UPGMA ). This method measures dissimilarities between all objects in two clusters and takes the average value
Mapping Informative Clusters in a Hierarchial Framework of fMRI Multivariate Analysis

PubMed Central

Xu, Rui; Zhen, Zonglei; Liu, Jia

2010-01-01

Pattern recognition methods have become increasingly popular in fMRI data analysis, which are powerful in discriminating between multi-voxel patterns of brain activities associated with different mental states. However, when they are used in functional brain mapping, the location of discriminative voxels varies significantly, raising difficulties in interpreting the locus of the effect. Here we proposed a hierarchical framework of multivariate approach that maps informative clusters rather than voxels to achieve reliable functional brain mapping without compromising the discriminative power. In particular, we first searched for local homogeneous clusters that consisted of voxels with similar response profiles. Then, a multi-voxel classifier was built for each cluster to extract discriminative information from the multi-voxel patterns. Finally, through multivariate ranking, outputs from the classifiers were served as a multi-cluster pattern to identify informative clusters by examining interactions among clusters. Results from both simulated and real fMRI data demonstrated that this hierarchical approach showed better performance in the robustness of functional brain mapping than traditional voxel-based multivariate methods. In addition, the mapped clusters were highly overlapped for two perceptually equivalent object categories, further confirming the validity of our approach. In short, the hierarchical framework of multivariate approach is suitable for both pattern classification and brain mapping in fMRI studies. PMID:21152081
The methodology of multi-viewpoint clustering analysis

NASA Technical Reports Server (NTRS)

Mehrotra, Mala; Wild, Chris

1993-01-01

One of the greatest challenges facing the software engineering community is the ability to produce large and complex computer systems, such as ground support systems for unmanned scientific missions, that are reliable and cost effective. In order to build and maintain these systems, it is important that the knowledge in the system be suitably abstracted, structured, and otherwise clustered in a manner which facilitates its understanding, manipulation, testing, and utilization. Development of complex mission-critical systems will require the ability to abstract overall concepts in the system at various levels of detail and to consider the system from different points of view. Multi-ViewPoint - Clustering Analysis MVP-CA methodology has been developed to provide multiple views of large, complicated systems. MVP-CA provides an ability to discover significant structures by providing an automated mechanism to structure both hierarchically (from detail to abstract) and orthogonally (from different perspectives). We propose to integrate MVP/CA into an overall software engineering life cycle to support the development and evolution of complex mission critical systems.
A New Classification of Diabetic Gait Pattern Based on Cluster Analysis of Biomechanical Data

PubMed Central

Sawacha, Zimi; Guarneri, Gabriella; Avogaro, Angelo; Cobelli, Claudio

2010-01-01

Background The diabetic foot, one of the most serious complications of diabetes mellitus and a major risk factor for plantar ulceration, is determined mainly by peripheral neuropathy. Neuropathic patients exhibit decreased stability while standing as well as during dynamic conditions. A new methodology for diabetic gait pattern classification based on cluster analysis has been proposed that aims to identify groups of subjects with similar patterns of gait and verify if three-dimensional gait data are able to distinguish diabetic gait patterns from one of the control subjects. Method The gait of 20 nondiabetic individuals and 46 diabetes patients with and without peripheral neuropathy was analyzed [mean age 59.0 (2.9) and 61.1(4.4) years, mean body mass index (BMI) 24.0 (2.8), and 26.3 (2.0)]. K-means cluster analysis was applied to classify the subjects' gait patterns through the analysis of their ground reaction forces, joints and segments (trunk, hip, knee, ankle) angles, and moments. Results Cluster analysis classification led to definition of four well-separated clusters: one aggregating just neuropathic subjects, one aggregating both neuropathics and non-neuropathics, one including only diabetes patients, and one including either controls or diabetic and neuropathic subjects. Conclusions Cluster analysis was useful in grouping subjects with similar gait patterns and provided evidence that there were subgroups that might otherwise not be observed if a group ensemble was presented for any specific variable. In particular, we observed the presence of neuropathic subjects with a gait similar to the controls and diabetes patients with a long disease duration with a gait as altered as the neuropathic one. PMID:20920432
Multi-scale visual analysis of time-varying electrocorticography data via clustering of brain regions

DOE PAGES

Murugesan, Sugeerth; Bouchard, Kristofer; Chang, Edward; ...

2017-06-06

There exists a need for effective and easy-to-use software tools supporting the analysis of complex Electrocorticography (ECoG) data. Understanding how epileptic seizures develop or identifying diagnostic indicators for neurological diseases require the in-depth analysis of neural activity data from ECoG. Such data is multi-scale and is of high spatio-temporal resolution. Comprehensive analysis of this data should be supported by interactive visual analysis methods that allow a scientist to understand functional patterns at varying levels of granularity and comprehend its time-varying behavior. We introduce a novel multi-scale visual analysis system, ECoG ClusterFlow, for the detailed exploration of ECoG data. Our systemmore » detects and visualizes dynamic high-level structures, such as communities, derived from the time-varying connectivity network. The system supports two major views: 1) an overview summarizing the evolution of clusters over time and 2) an electrode view using hierarchical glyph-based design to visualize the propagation of clusters in their spatial, anatomical context. We present case studies that were performed in collaboration with neuroscientists and neurosurgeons using simulated and recorded epileptic seizure data to demonstrate our system's effectiveness. ECoG ClusterFlow supports the comparison of spatio-temporal patterns for specific time intervals and allows a user to utilize various clustering algorithms. Neuroscientists can identify the site of seizure genesis and its spatial progression during various the stages of a seizure. Our system serves as a fast and powerful means for the generation of preliminary hypotheses that can be used as a basis for subsequent application of rigorous statistical methods, with the ultimate goal being the clinical treatment of epileptogenic zones.« less
Functional analysis of the upstream regulatory region of chicken miR-17-92 cluster.

PubMed

Cheng, Min; Zhang, Wen-jian; Xing, Tian-yu; Yan, Xiao-hong; Li, Yu-mao; Li, Hui; Wang, Ning

2016-08-01

miR-17-92 cluster plays important roles in cell proliferation, differentiation, apoptosis, animal development and tumorigenesis. The transcriptional regulation of miR-17-92 cluster has been extensively studied in mammals, but not in birds. To date, avian miR-17-92 cluster genomic structure has not been fully determined. The promoter location and sequence of miR-17-92 cluster have not been determined, due to the existence of a genomic gap sequence upstream of miR-17-92 cluster in all the birds whose genomes have been sequenced. In this study, genome walking was used to close the genomic gap upstream of chicken miR-17-92 cluster. In addition, bioinformatics analysis, reporter gene assay and truncation mutagenesis were used to investigate functional role of the genomic gap sequence. Genome walking analysis showed that the gap region was 1704 bp long, and its GC content was 80.11%. Bioinformatics analysis showed that in the gap region, there was a 200 bp conserved sequence among the tested 10 species (Gallus gallus, Homo sapiens, Pan troglodytes, Bos taurus, Sus scrofa, Rattus norvegicus, Mus musculus, Possum, Danio rerio, Rana nigromaculata), which is core promoter region of mammalian miR-17-92 host gene (MIR17HG). Promoter luciferase reporter gene vector of the gap region was constructed and reporter assay was performed. The result showed that the promoter activity of pGL3-cMIR17HG (-4228/-2506) was 417 times than that of negative control (empty pGL3 basic vector), suggesting that chicken miR-17-92 cluster promoter exists in the gap region. To further gain insight into the promoter structure, two different truncations for the cloned gap sequence were generated by PCR. One had a truncation of 448 bp at the 5'-end and the other had a truncation of 894 bp at the 3'-end. Further reporter analysis showed that compared with the promoter activity of pGL3-cMIR17HG (-4228/-2506), the reporter activities of the 5'-end truncation and the 3'-end truncation were reduced by 19
Analysis of Medium-Chain-Length Polyhydroxyalkanoate-Producing Bacteria in Activated Sludge Samples Enriched by Aerobic Periodic Feeding.

PubMed

Lee, Sun Hee; Kim, Jae Hee; Chung, Chung-Wook; Kim, Do Young; Rhee, Young Ha

2018-04-01

Analysis of mixed microbial populations responsible for the production of medium-chain-length polyhydroxyalkanoates (MCL-PHAs) under periodic substrate feeding in a sequencing batch reactor (SBR) was conducted. Regardless of activated sludge samples and the different MCL alkanoic acids used as the sole external carbon substrate, denaturing gradient gel electrophoresis analysis indicated that Pseudomonas aeruginosa was the dominant bacterium enriched during the SBR process. Several P. aeruginosa strains were isolated from the enriched activated sludge samples. The isolates were subdivided into two groups, one that produced only MCL-PHAs and another that produced both MCL- and short-chain-length PHAs. The SBR periodic feeding experiments with five representative MCL-PHA-producing Pseudomonas species revealed that P. aeruginosa has an advantage over other species that enables it to become dominant in the bacterial community.
Paternal age related schizophrenia (PARS): Latent subgroups detected by k-means clustering analysis.

PubMed

Lee, Hyejoo; Malaspina, Dolores; Ahn, Hongshik; Perrin, Mary; Opler, Mark G; Kleinhaus, Karine; Harlap, Susan; Goetz, Raymond; Antonius, Daniel

2011-05-01

Paternal age related schizophrenia (PARS) has been proposed as a subgroup of schizophrenia with distinct etiology, pathophysiology and symptoms. This study uses a k-means clustering analysis approach to generate hypotheses about differences between PARS and other cases of schizophrenia. We studied PARS (operationally defined as not having any family history of schizophrenia among first and second-degree relatives and fathers' age at birth ≥ 35 years) in a series of schizophrenia cases recruited from a research unit. Data were available on demographic variables, symptoms (Positive and Negative Syndrome Scale; PANSS), cognitive tests (Wechsler Adult Intelligence Scale-Revised; WAIS-R) and olfaction (University of Pennsylvania Smell Identification Test; UPSIT). We conducted a series of k-means clustering analyses to identify clusters of cases containing high concentrations of PARS. Two analyses generated clusters with high concentrations of PARS cases. The first analysis (N=136; PARS=34) revealed a cluster containing 83% PARS cases, in which the patients showed a significant discrepancy between verbal and performance intelligence. The mean paternal and maternal ages were 41 and 33, respectively. The second analysis (N=123; PARS=30) revealed a cluster containing 71% PARS cases, of which 93% were females; the mean age of onset of psychosis, at 17.2, was significantly early. These results strengthen the evidence that PARS cases differ from other patients with schizophrenia. Hypothesis-generating findings suggest that features of PARS may include a discrepancy between verbal and performance intelligence, and in females, an early age of onset. These findings provide a rationale for separating these phenotypes from others in future clinical, genetic and pathophysiologic studies of schizophrenia and in considering responses to treatment. Copyright © 2011 Elsevier B.V. All rights reserved.

Standardized Effect Size Measures for Mediation Analysis in Cluster-Randomized Trials

ERIC Educational Resources Information Center

Stapleton, Laura M.; Pituch, Keenan A.; Dion, Eric

2015-01-01

This article presents 3 standardized effect size measures to use when sharing results of an analysis of mediation of treatment effects for cluster-randomized trials. The authors discuss 3 examples of mediation analysis (upper-level mediation, cross-level mediation, and cross-level mediation with a contextual effect) with demonstration of the…
Exploring patterns enriched in a dataset with contrastive principal component analysis.

PubMed

Abid, Abubakar; Zhang, Martin J; Bagaria, Vivek K; Zou, James

2018-05-30

Visualization and exploration of high-dimensional data is a ubiquitous challenge across disciplines. Widely used techniques such as principal component analysis (PCA) aim to identify dominant trends in one dataset. However, in many settings we have datasets collected under different conditions, e.g., a treatment and a control experiment, and we are interested in visualizing and exploring patterns that are specific to one dataset. This paper proposes a method, contrastive principal component analysis (cPCA), which identifies low-dimensional structures that are enriched in a dataset relative to comparison data. In a wide variety of experiments, we demonstrate that cPCA with a background dataset enables us to visualize dataset-specific patterns missed by PCA and other standard methods. We further provide a geometric interpretation of cPCA and strong mathematical guarantees. An implementation of cPCA is publicly available, and can be used for exploratory data analysis in many applications where PCA is currently used.
Thermodynamic Analysis of Oxygen-Enriched Direct Smelting of Jamesonite Concentrate

NASA Astrophysics Data System (ADS)

Zhang, Zhong-Tang; Dai, Xi; Zhang, Wen-Hai

2017-12-01

Thermodynamic analysis of oxygen-enriched direct smelting of jamesonite concentrate is reported in this article. First, the occurrence state of lead, antimony and other metallic elements in the smelting process was investigated theoretically. Then, the verification test was carried out. The results indicate that lead and antimony mainly exist in the alloy in the form of metallic lead and metallic antimony. Simultaneously, lead and antimony were also oxidized into the slag in the form of lead-antimony oxide. Iron and copper could be oxidized into the slag in the form of oxides in addition to combining with antimony in the alloy, while zinc was mainly oxidized into the slag in the form of zinc oxide. The verification test indicates that the main phases in the alloy contain metallic lead, metallic antimony and a small amount of Cu2Sb, FeSb2 intermetallic compounds, and the slag is mainly composed of kirschsteinite, fayalite and zinc oxide, in agreement with the thermodynamic analysis.
Failure Mode Identification Through Clustering Analysis

NASA Technical Reports Server (NTRS)

Arunajadai, Srikesh G.; Stone, Robert B.; Tumer, Irem Y.; Clancy, Daniel (Technical Monitor)

2002-01-01

Research has shown that nearly 80% of the costs and problems are created in product development and that cost and quality are essentially designed into products in the conceptual stage. Currently, failure identification procedures (such as FMEA (Failure Modes and Effects Analysis), FMECA (Failure Modes, Effects and Criticality Analysis) and FTA (Fault Tree Analysis)) and design of experiments are being used for quality control and for the detection of potential failure modes during the detail design stage or post-product launch. Though all of these methods have their own advantages, they do not give information as to what are the predominant failures that a designer should focus on while designing a product. This work uses a functional approach to identify failure modes, which hypothesizes that similarities exist between different failure modes based on the functionality of the product/component. In this paper, a statistical clustering procedure is proposed to retrieve information on the set of predominant failures that a function experiences. The various stages of the methodology are illustrated using a hypothetical design example.
Galaxy formation through hierarchical clustering

NASA Astrophysics Data System (ADS)

White, Simon D. M.; Frenk, Carlos S.

1991-09-01

Analytic methods for studying the formation of galaxies by gas condensation within massive dark halos are presented. The present scheme applies to cosmogonies where structure grows through hierarchical clustering of a mixture of gas and dissipationless dark matter. The simplest models consistent with the current understanding of N-body work on dissipationless clustering, and that of numerical and analytic work on gas evolution and cooling are adopted. Standard models for the evolution of the stellar population are also employed, and new models for the way star formation heats and enriches the surrounding gas are constructed. Detailed results are presented for a cold dark matter universe with Omega = 1 and H(0) = 50 km/s/Mpc, but the present methods are applicable to other models. The present luminosity functions contain significantly more faint galaxies than are observed.
Combining Multiobjective Optimization and Cluster Analysis to Study Vocal Fold Functional Morphology

PubMed Central

Palaparthi, Anil; Riede, Tobias

2017-01-01

Morphological design and the relationship between form and function have great influence on the functionality of a biological organ. However, the simultaneous investigation of morphological diversity and function is difficult in complex natural systems. We have developed a multiobjective optimization (MOO) approach in association with cluster analysis to study the form-function relation in vocal folds. An evolutionary algorithm (NSGA-II) was used to integrate MOO with an existing finite element model of the laryngeal sound source. Vocal fold morphology parameters served as decision variables and acoustic requirements (fundamental frequency, sound pressure level) as objective functions. A two-layer and a three-layer vocal fold configuration were explored to produce the targeted acoustic requirements. The mutation and crossover parameters of the NSGA-II algorithm were chosen to maximize a hypervolume indicator. The results were expressed using cluster analysis and were validated against a brute force method. Results from the MOO and the brute force approaches were comparable. The MOO approach demonstrated greater resolution in the exploration of the morphological space. In association with cluster analysis, MOO can efficiently explore vocal fold functional morphology. PMID:24771563
Identifying Subgroups of Tinnitus Using Novel Resting State fMRI Biomarkers and Cluster Analysis

DTIC Science & Technology

2016-10-01

AWARD NUMBER: W81XWH-15-2-0032 TITLE: Identifying Subgroups of Tinnitus Using Novel Resting State fMRI Biomarkers and Cluster Analysis PRINCIPAL...4. TITLE AND SUBTITLE 5a. CONTRACT NUMBER Identifying Subgroups of Tinnitus Using Novel Resting State fMRI Biomarkers and Cluster Analysis 5b...Public Release; Distribution Unlimited 13. SUPPLEMENTARY NOTES 14. ABSTRACT The subject of the project is FY14 PRMRP Topic Area – Tinnitus . The broad
A method of using cluster analysis to study statistical dependence in multivariate data

NASA Technical Reports Server (NTRS)

Borucki, W. J.; Card, D. H.; Lyle, G. C.

1975-01-01

A technique is presented that uses both cluster analysis and a Monte Carlo significance test of clusters to discover associations between variables in multidimensional data. The method is applied to an example of a noisy function in three-dimensional space, to a sample from a mixture of three bivariate normal distributions, and to the well-known Fisher's Iris data.
A comparison of heuristic and model-based clustering methods for dietary pattern analysis.

PubMed

Greve, Benjamin; Pigeot, Iris; Huybrechts, Inge; Pala, Valeria; Börnhorst, Claudia

2016-02-01

Cluster analysis is widely applied to identify dietary patterns. A new method based on Gaussian mixture models (GMM) seems to be more flexible compared with the commonly applied k-means and Ward's method. In the present paper, these clustering approaches are compared to find the most appropriate one for clustering dietary data. The clustering methods were applied to simulated data sets with different cluster structures to compare their performance knowing the true cluster membership of observations. Furthermore, the three methods were applied to FFQ data assessed in 1791 children participating in the IDEFICS (Identification and Prevention of Dietary- and Lifestyle-Induced Health Effects in Children and Infants) Study to explore their performance in practice. The GMM outperformed the other methods in the simulation study in 72 % up to 100 % of cases, depending on the simulated cluster structure. Comparing the computationally less complex k-means and Ward's methods, the performance of k-means was better in 64-100 % of cases. Applied to real data, all methods identified three similar dietary patterns which may be roughly characterized as a 'non-processed' cluster with a high consumption of fruits, vegetables and wholemeal bread, a 'balanced' cluster with only slight preferences of single foods and a 'junk food' cluster. The simulation study suggests that clustering via GMM should be preferred due to its higher flexibility regarding cluster volume, shape and orientation. The k-means seems to be a good alternative, being easier to use while giving similar results when applied to real data.
Validation of hierarchical cluster analysis for identification of bacterial species using 42 bacterial isolates

NASA Astrophysics Data System (ADS)

Ghebremedhin, Meron; Yesupriya, Shubha; Luka, Janos; Crane, Nicole J.

2015-03-01

Recent studies have demonstrated the potential advantages of the use of Raman spectroscopy in the biomedical field due to its rapidity and noninvasive nature. In this study, Raman spectroscopy is applied as a method for differentiating between bacteria isolates for Gram status and Genus species. We created models for identifying 28 bacterial isolates using spectra collected with a 785 nm laser excitation Raman spectroscopic system. In order to investigate the groupings of these samples, partial least squares discriminant analysis (PLSDA) and hierarchical cluster analysis (HCA) was implemented. In addition, cluster analyses of the isolates were performed using various data types consisting of, biochemical tests, gene sequence alignment, high resolution melt (HRM) analysis and antimicrobial susceptibility tests of minimum inhibitory concentration (MIC) and degree of antimicrobial resistance (SIR). In order to evaluate the ability of these models to correctly classify bacterial isolates using solely Raman spectroscopic data, a set of 14 validation samples were tested using the PLSDA models and consequently the HCA models. External cluster evaluation criteria of purity and Rand index were calculated at different taxonomic levels to compare the performance of clustering using Raman spectra as well as the other datasets. Results showed that Raman spectra performed comparably, and in some cases better than, the other data types with Rand index and purity values up to 0.933 and 0.947, respectively. This study clearly demonstrates that the discrimination of bacterial species using Raman spectroscopic data and hierarchical cluster analysis is possible and has the potential to be a powerful point-of-care tool in clinical settings.
Analysis of precipitation data in Bangladesh through hierarchical clustering and multidimensional scaling

NASA Astrophysics Data System (ADS)

Rahman, Md. Habibur; Matin, M. A.; Salma, Umma

2017-12-01

The precipitation patterns of seventeen locations in Bangladesh from 1961 to 2014 were studied using a cluster analysis and metric multidimensional scaling. In doing so, the current research applies four major hierarchical clustering methods to precipitation in conjunction with different dissimilarity measures and metric multidimensional scaling. A variety of clustering algorithms were used to provide multiple clustering dendrograms for a mixture of distance measures. The dendrogram of pre-monsoon rainfall for the seventeen locations formed five clusters. The pre-monsoon precipitation data for the areas of Srimangal and Sylhet were located in two clusters across the combination of five dissimilarity measures and four hierarchical clustering algorithms. The single linkage algorithm with Euclidian and Manhattan distances, the average linkage algorithm with the Minkowski distance, and Ward's linkage algorithm provided similar results with regard to monsoon precipitation. The results of the post-monsoon and winter precipitation data are shown in different types of dendrograms with disparate combinations of sub-clusters. The schematic geometrical representations of the precipitation data using metric multidimensional scaling showed that the post-monsoon rainfall of Cox's Bazar was located far from those of the other locations. The results of a box-and-whisker plot, different clustering techniques, and metric multidimensional scaling indicated that the precipitation behaviour of Srimangal and Sylhet during the pre-monsoon season, Cox's Bazar and Sylhet during the monsoon season, Maijdi Court and Cox's Bazar during the post-monsoon season, and Cox's Bazar and Khulna during the winter differed from those at other locations in Bangladesh.
Five Genes Encoding Surface-Exposed LPXTG Proteins Are Enriched in Hospital-Adapted Enterococcus faecium Clonal Complex 17 Isolates▿

PubMed Central

Hendrickx, Antoni P. A.; van Wamel, Willem J. B.; Posthuma, George; Bonten, Marc J. M.; Willems, Rob J. L.

2007-01-01

Most Enterococcus faecium isolates associated with hospital outbreaks and invasive infections belong to a distinct genetic subpopulation called clonal complex 17 (CC17). It has been postulated that the genetic evolution of CC17 involves the acquisition of various genes involved in antibiotic resistance, metabolic pathways, and virulence. To gain insight into additional genes that may have favored the rapid emergence of this nosocomial pathogen, we aimed to identify surface-exposed LPXTG cell wall-anchored proteins (CWAPs) specifically enriched in CC17 E. faecium. Using PCR and Southern and dot blot hybridizations, 131 E. faecium isolates (40 CC17 and 91 non-CC17) were screened for the presence of 22 putative CWAP genes identified from the E. faecium TX0016 genome. Five genes encoding LPXTG surface proteins were specifically enriched in E. faecium CC17 isolates. These five LPXTG surface protein genes were found in 28 to 40 (70 to 100%) of CC17 and in only 7 to 24 (8 to 26%) of non-CC17 isolates (P < 0.05). Three of these CWAP genes clustered together on the E. faecium TX0016 genome, which may comprise a novel enterococcal pathogenicity island covering E. faecium contig 609. Expression at the mRNA level was demonstrated, and immunotransmission electron microscopy revealed an association of the five LPXTG surface proteins with the cell wall. Minimal spanning tree analysis based on the presence and absence of 22 CWAP genes revealed grouping of all 40 CC17 strains together with 18 hospital-derived but evolutionary unrelated non-CC17 isolates in a distinct CWAP-enriched cluster, suggesting horizontal transfer of CWAP genes and a role of these CWAPs in hospital adaptation. PMID:17873043
Cluster analysis and subgrouping to investigate inter-individual variability to non-invasive brain stimulation: a systematic review.

PubMed

Pellegrini, Michael; Zoghi, Maryam; Jaberzadeh, Shapour

2018-01-12

Cluster analysis and other subgrouping techniques have risen in popularity in recent years in non-invasive brain stimulation research in the attempt to investigate the issue of inter-individual variability - the issue of why some individuals respond, as traditionally expected, to non-invasive brain stimulation protocols and others do not. Cluster analysis and subgrouping techniques have been used to categorise individuals, based on their response patterns, as responder or non-responders. There is, however, a lack of consensus and consistency on the most appropriate technique to use. This systematic review aimed to provide a systematic summary of the cluster analysis and subgrouping techniques used to date and suggest recommendations moving forward. Twenty studies were included that utilised subgrouping techniques, while seven of these additionally utilised cluster analysis techniques. The results of this systematic review appear to indicate that statistical cluster analysis techniques are effective in identifying subgroups of individuals based on response patterns to non-invasive brain stimulation. This systematic review also reports a lack of consensus amongst researchers on the most effective subgrouping technique and the criteria used to determine whether an individual is categorised as a responder or a non-responder. This systematic review provides a step-by-step guide to carrying out statistical cluster analyses and subgrouping techniques to provide a framework for analysis when developing further insights into the contributing factors of inter-individual variability in response to non-invasive brain stimulation.
Health and disease phenotyping in old age using a cluster network analysis.

PubMed

Valenzuela, Jesus Felix; Monterola, Christopher; Tong, Victor Joo Chuan; Ng, Tze Pin; Larbi, Anis

2017-11-15

Human ageing is a complex trait that involves the synergistic action of numerous biological processes that interact to form a complex network. Here we performed a network analysis to examine the interrelationships between physiological and psychological functions, disease, disability, quality of life, lifestyle and behavioural risk factors for ageing in a cohort of 3,270 subjects aged ≥55 years. We considered associations between numerical and categorical descriptors using effect-size measures for each variable pair and identified clusters of variables from the resulting pairwise effect-size network and minimum spanning tree. We show, by way of a correspondence analysis between the two sets of clusters, that they correspond to coarse-grained and fine-grained structure of the network relationships. The clusters obtained from the minimum spanning tree mapped to various conceptual domains and corresponded to physiological and syndromic states. Hierarchical ordering of these clusters identified six common themes based on interactions with physiological systems and common underlying substrates of age-associated morbidity and disease chronicity, functional disability, and quality of life. These findings provide a starting point for indepth analyses of ageing that incorporate immunologic, metabolomic and proteomic biomarkers, and ultimately offer low-level-based typologies of healthy and unhealthy ageing.
A comparison of visual search strategies of elite and non-elite tennis players through cluster analysis.

PubMed

Murray, Nicholas P; Hunfalvay, Melissa

2017-02-01

Considerable research has documented that successful performance in interceptive tasks (such as return of serve in tennis) is based on the performers' capability to capture appropriate anticipatory information prior to the flight path of the approaching object. Athletes of higher skill tend to fixate on different locations in the playing environment prior to initiation of a skill than their lesser skilled counterparts. The purpose of this study was to examine visual search behaviour strategies of elite (world ranked) tennis players and non-ranked competitive tennis players (n = 43) utilising cluster analysis. The results of hierarchical (Ward's method) and nonhierarchical (k means) cluster analyses revealed three different clusters. The clustering method distinguished visual behaviour of high, middle-and low-ranked players. Specifically, high-ranked players demonstrated longer mean fixation duration and lower variation of visual search than middle-and low-ranked players. In conclusion, the results demonstrated that cluster analysis is a useful tool for detecting and analysing the areas of interest for use in experimental analysis of expertise and to distinguish visual search variables among participants'.
Structures of small Pd Pt bimetallic clusters by Monte Carlo simulation

NASA Astrophysics Data System (ADS)

Cheng, Daojian; Huang, Shiping; Wang, Wenchuan

2006-11-01

Segregation phenomena of Pd-Pt bimetallic clusters with icosahedral and decahedral structures are investigated by using Monte Carlo method based on the second-moment approximation of the tight-binding (TB-SMA) potentials. The simulation results indicate that the Pd atoms generally lie on the surface of the smaller clusters. The three-shell onion-like structures are observed in 55-atom Pd-Pt bimetallic clusters, in which a single Pd atom is located in the center, and the Pt atoms are in the middle shell, while the Pd atoms are enriched on the surface. With the increase of Pd mole fraction in 55-atom Pd-Pt bimetallic clusters, the Pd atoms occupy the vertices of clusters first, then edge and center sites, and finally the interior shell. It is noticed that some decahedral structures can be transformed into the icosahedron-like structure at 300 and 500 K. Comparisons are made with previous experiments and theoretical studies of Pd-Pt bimetallic clusters.
Time series clustering analysis of health-promoting behavior

NASA Astrophysics Data System (ADS)

Yang, Chi-Ta; Hung, Yu-Shiang; Deng, Guang-Feng

2013-10-01

Health promotion must be emphasized to achieve the World Health Organization goal of health for all. Since the global population is aging rapidly, ComCare elder health-promoting service was developed by the Taiwan Institute for Information Industry in 2011. Based on the Pender health promotion model, ComCare service offers five categories of health-promoting functions to address the everyday needs of seniors: nutrition management, social support, exercise management, health responsibility, stress management. To assess the overall ComCare service and to improve understanding of the health-promoting behavior of elders, this study analyzed health-promoting behavioral data automatically collected by the ComCare monitoring system. In the 30638 session records collected for 249 elders from January, 2012 to March, 2013, behavior patterns were identified by fuzzy c-mean time series clustering algorithm combined with autocorrelation-based representation schemes. The analysis showed that time series data for elder health-promoting behavior can be classified into four different clusters. Each type reveals different health-promoting needs, frequencies, function numbers and behaviors. The data analysis result can assist policymakers, health-care providers, and experts in medicine, public health, nursing and psychology and has been provided to Taiwan National Health Insurance Administration to assess the elder health-promoting behavior.
Unequal cluster sizes in stepped-wedge cluster randomised trials: a systematic review.

PubMed

Kristunas, Caroline; Morris, Tom; Gray, Laura

2017-11-15

To investigate the extent to which cluster sizes vary in stepped-wedge cluster randomised trials (SW-CRT) and whether any variability is accounted for during the sample size calculation and analysis of these trials. Any, not limited to healthcare settings. Any taking part in an SW-CRT published up to March 2016. The primary outcome is the variability in cluster sizes, measured by the coefficient of variation (CV) in cluster size. Secondary outcomes include the difference between the cluster sizes assumed during the sample size calculation and those observed during the trial, any reported variability in cluster sizes and whether the methods of sample size calculation and methods of analysis accounted for any variability in cluster sizes. Of the 101 included SW-CRTs, 48% mentioned that the included clusters were known to vary in size, yet only 13% of these accounted for this during the calculation of the sample size. However, 69% of the trials did use a method of analysis appropriate for when clusters vary in size. Full trial reports were available for 53 trials. The CV was calculated for 23 of these: the median CV was 0.41 (IQR: 0.22-0.52). Actual cluster sizes could be compared with those assumed during the sample size calculation for 14 (26%) of the trial reports; the cluster sizes were between 29% and 480% of that which had been assumed. Cluster sizes often vary in SW-CRTs. Reporting of SW-CRTs also remains suboptimal. The effect of unequal cluster sizes on the statistical power of SW-CRTs needs further exploration and methods appropriate to studies with unequal cluster sizes need to be employed. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Unequal cluster sizes in stepped-wedge cluster randomised trials: a systematic review

PubMed Central

Morris, Tom; Gray, Laura

2017-01-01

Objectives To investigate the extent to which cluster sizes vary in stepped-wedge cluster randomised trials (SW-CRT) and whether any variability is accounted for during the sample size calculation and analysis of these trials. Setting Any, not limited to healthcare settings. Participants Any taking part in an SW-CRT published up to March 2016. Primary and secondary outcome measures The primary outcome is the variability in cluster sizes, measured by the coefficient of variation (CV) in cluster size. Secondary outcomes include the difference between the cluster sizes assumed during the sample size calculation and those observed during the trial, any reported variability in cluster sizes and whether the methods of sample size calculation and methods of analysis accounted for any variability in cluster sizes. Results Of the 101 included SW-CRTs, 48% mentioned that the included clusters were known to vary in size, yet only 13% of these accounted for this during the calculation of the sample size. However, 69% of the trials did use a method of analysis appropriate for when clusters vary in size. Full trial reports were available for 53 trials. The CV was calculated for 23 of these: the median CV was 0.41 (IQR: 0.22–0.52). Actual cluster sizes could be compared with those assumed during the sample size calculation for 14 (26%) of the trial reports; the cluster sizes were between 29% and 480% of that which had been assumed. Conclusions Cluster sizes often vary in SW-CRTs. Reporting of SW-CRTs also remains suboptimal. The effect of unequal cluster sizes on the statistical power of SW-CRTs needs further exploration and methods appropriate to studies with unequal cluster sizes need to be employed. PMID:29146637
Alteration mapping at Goldfield, Nevada, by cluster and discriminant analysis of LANDSAT digital data

NASA Technical Reports Server (NTRS)

Ballew, G.

1977-01-01

The ability of Landsat multispectral digital data to differentiate among 62 combinations of rock and alteration types at the Goldfield mining district of Western Nevada was investigated by using statistical techniques of cluster and discriminant analysis. Multivariate discriminant analysis was not effective in classifying each of the 62 groups, with classification results essentially the same whether data of four channels alone or combined with six ratios of channels were used. Bivariate plots of group means revealed a cluster of three groups including mill tailings, basalt and all other rock and alteration types. Automatic hierarchical clustering based on the fourth dimensional Mahalanobis distance between group means of 30 groups having five or more samples was performed. The results of the cluster analysis revealed hierarchies of mill tailings vs. natural materials, basalt vs. non-basalt, highly reflectant rocks vs. other rocks and exclusively unaltered rocks vs. predominantly altered rocks. The hierarchies were used to determine the order in which sets of multiple discriminant analyses were to be performed and the resulting discriminant functions were used to produce a map of geology and alteration which has an overall accuracy of 70 percent for discriminating exclusively altered rocks from predominantly altered rocks.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.