analyses cluster analysis: Topics by Science.gov

Sample records for analyses cluster analysis

Detection of Functional Change Using Cluster Trend Analysis in Glaucoma.

PubMed

Gardiner, Stuart K; Mansberger, Steven L; Demirel, Shaban

2017-05-01

Global analyses using mean deviation (MD) assess visual field progression, but can miss localized changes. Pointwise analyses are more sensitive to localized progression, but more variable so require confirmation. This study assessed whether cluster trend analysis, averaging information across subsets of locations, could improve progression detection. A total of 133 test-retest eyes were tested 7 to 10 times. Rates of change and P values were calculated for possible re-orderings of these series to generate global analysis ("MD worsening faster than x dB/y with P < y"), pointwise and cluster analyses ("n locations [or clusters] worsening faster than x dB/y with P < y") with specificity exactly 95%. These criteria were applied to 505 eyes tested over a mean of 10.5 years, to find how soon each detected "deterioration," and compared using survival models. This was repeated including two subsequent visual fields to determine whether "deterioration" was confirmed. The best global criterion detected deterioration in 25% of eyes in 5.0 years (95% confidence interval [CI], 4.7-5.3 years), compared with 4.8 years (95% CI, 4.2-5.1) for the best cluster analysis criterion, and 4.1 years (95% CI, 4.0-4.5) for the best pointwise criterion. However, for pointwise analysis, only 38% of these changes were confirmed, compared with 61% for clusters and 76% for MD. The time until 25% of eyes showed subsequently confirmed deterioration was 6.3 years (95% CI, 6.0-7.2) for global, 6.3 years (95% CI, 6.0-7.0) for pointwise, and 6.0 years (95% CI, 5.3-6.6) for cluster analyses. Although the specificity is still suboptimal, cluster trend analysis detects subsequently confirmed deterioration sooner than either global or pointwise analyses.
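The cluster criterion described above ("n clusters worsening faster than x dB/y with P < y") can be illustrated with a small sketch. It is not the authors' implementation: the location names, cluster definitions, cut-offs and toy data are hypothetical, and the P value is computed here by exhaustively re-ordering each series, which is only one plausible reading of the abstract.

```python
from itertools import permutations
from statistics import mean

def ols_slope(times, values):
    """Ordinary least-squares slope of values against times (dB per year)."""
    t_bar, v_bar = mean(times), mean(values)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    return sxy / sxx

def permutation_p_value(times, values):
    """One-sided P value: share of re-orderings with a slope at least as negative."""
    observed = ols_slope(times, values)
    slopes = [ols_slope(times, perm) for perm in permutations(values)]
    return sum(s <= observed + 1e-12 for s in slopes) / len(slopes)

def cluster_series(fields, locations):
    """Average the deviation values of one cluster's locations at each visit."""
    return [mean(field[loc] for loc in locations) for field in fields]

def worsening_clusters(times, fields, clusters, rate_cutoff, p_cutoff):
    """Names of clusters worsening faster than rate_cutoff with P < p_cutoff."""
    flagged = []
    for name, locations in clusters.items():
        series = cluster_series(fields, locations)
        if (ols_slope(times, series) < rate_cutoff
                and permutation_p_value(times, series) < p_cutoff):
            flagged.append(name)
    return flagged

# Hypothetical toy data: six visits (years), four locations, two clusters.
times = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
fields = [
    {"a": 0.0, "b": -0.2, "c": 0.1, "d": 0.0},
    {"a": -0.6, "b": -0.5, "c": -0.1, "d": 0.2},
    {"a": -1.0, "b": -1.3, "c": 0.2, "d": -0.1},
    {"a": -1.7, "b": -1.6, "c": 0.0, "d": 0.1},
    {"a": -2.1, "b": -2.2, "c": -0.2, "d": 0.0},
    {"a": -2.4, "b": -2.9, "c": 0.1, "d": -0.1},
]
clusters = {"superior": ["a", "b"], "inferior": ["c", "d"]}

print(worsening_clusters(times, fields, clusters, rate_cutoff=-0.5, p_cutoff=0.05))
```

Averaging within a cluster before regressing is what distinguishes this from a pointwise analysis. Exhaustive re-ordering is only practical for short series like this one.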
Cluster Analysis to Identify Possible Subgroups in Tinnitus Patients.

PubMed

van den Berge, Minke J C; Free, Rolien H; Arnold, Rosemarie; de Kleine, Emile; Hofman, Rutger; van Dijk, J Marc C; van Dijk, Pim

2017-01-01

In tinnitus treatment, there is a tendency to shift from a "one size fits all" to a more individual, patient-tailored approach. Insight into the heterogeneity of the tinnitus spectrum might improve the management of tinnitus patients in terms of choice of treatment and identification of patients with severe mental distress. The goal of this study was to identify subgroups in a large group of tinnitus patients. Data were collected from patients with severe tinnitus complaints visiting our tertiary referral tinnitus care group at the University Medical Center Groningen. Patient-reported and physician-reported variables were collected during their visit to our clinic. Cluster analyses were used to characterize subgroups. For the selection of the right variables to enter in the cluster analysis, two approaches were used: (1) variable reduction with principal component analysis and (2) variable selection based on expert opinion. Various variables of 1,783 tinnitus patients were included in the analyses. Cluster analysis (1) included 976 patients and resulted in a four-cluster solution. The effect of external influences was the most discriminative between the groups, or clusters, of patients. The "silhouette measure" of the cluster outcome was low (0.2), indicating "no substantial" cluster structure. Cluster analysis (2) included 761 patients and resulted in a three-cluster solution, comparable to the first analysis. Again, "no substantial" cluster structure was found (0.2). Two cluster analyses on a large database of tinnitus patients revealed that clusters of patients are mostly formed by differing responses to external influences on their disease. However, both cluster outcomes based on this dataset showed poor stability, suggesting that our tinnitus population comprises a continuum rather than a number of clearly defined subgroups.
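For readers unfamiliar with the "silhouette measure" quoted above, the following is a minimal sketch of the mean silhouette width, assuming Euclidean distance and at least two clusters. The function name and toy points are hypothetical, and the study's own software and distance measure are not stated in the abstract.

```python
from math import dist
from statistics import mean

def silhouette_score(points, labels):
    """Mean silhouette width over all points (Euclidean distance, >= 2 clusters)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    widths = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            widths.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's distance to itself contributes zero to the sum).
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(mean(dist(point, q) for q in other)
                for other_label, other in clusters.items() if other_label != label)
        widths.append((b - a) / max(a, b))
    return mean(widths)

# Hypothetical toy data: two tight, well-separated pairs.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_score(points, [0, 0, 1, 1])  # labels follow the real grouping
bad = silhouette_score(points, [0, 1, 0, 1])   # labels cut across the grouping
print(round(good, 3), round(bad, 3))
```

Values near 1 indicate compact, well-separated clusters, while values near or below 0 indicate that points sit as close to another cluster as to their own, which is the situation the abstract's 0.2 points toward.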
Investigating Subtypes of Child Development: A Comparison of Cluster Analysis and Latent Class Cluster Analysis in Typology Creation

ERIC Educational Resources Information Center

DiStefano, Christine; Kamphaus, R. W.

2006-01-01

Two classification methods, latent class cluster analysis and cluster analysis, are used to identify groups of child behavioral adjustment underlying a sample of elementary school children aged 6 to 11 years. Behavioral rating information across 14 subscales was obtained from classroom teachers and used as input for analyses. Both the procedures…
X-ray and optical substructures of the DAFT/FADA survey clusters

NASA Astrophysics Data System (ADS)

Guennou, L.; Durret, F.; Adami, C.; Lima Neto, G. B.

2013-04-01

We have undertaken the DAFT/FADA survey with the double aim of setting constraints on dark energy based on weak lensing tomography and of obtaining homogeneous and high quality data for a sample of 91 massive clusters in the redshift range 0.4-0.9 for which there were HST archive data. We have analysed the XMM-Newton data available for 42 of these clusters to derive their X-ray temperatures and luminosities and search for substructures. Out of these, a spatial analysis was possible for 30 clusters, but only 23 had deep enough X-ray data for a really robust analysis. This study was coupled with a dynamical analysis for the 26 clusters having at least 30 spectroscopic galaxy redshifts in the cluster range. Altogether, the X-ray sample of 23 clusters and the optical sample of 26 clusters have 14 clusters in common. We present preliminary results on the coupled X-ray and dynamical analyses of these 14 clusters.
Use of multiple cluster analysis methods to explore the validity of a community outcomes concept map.

PubMed

Orsi, Rebecca

2017-02-01

Concept mapping is now a commonly-used technique for articulating and evaluating programmatic outcomes. However, research regarding validity of knowledge and outcomes produced with concept mapping is sparse. The current study describes quantitative validity analyses using a concept mapping dataset. We sought to increase the validity of concept mapping evaluation results by running multiple cluster analysis methods and then using several metrics to choose from among solutions. We present four different clustering methods based on analyses using the R statistical software package: partitioning around medoids (PAM), fuzzy analysis (FANNY), agglomerative nesting (AGNES) and divisive analysis (DIANA). We then used the Dunn and Davies-Bouldin indices to assist in choosing a valid cluster solution for a concept mapping outcomes evaluation. We conclude that the validity of the outcomes map is high, based on the analyses described. Finally, we discuss areas for further concept mapping methods research. Copyright © 2016 Elsevier Ltd. All rights reserved.
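The two validity indices named above can be sketched in a few lines. The study used R; this Python version is an illustrative stand-in with hypothetical function names and toy data, using Euclidean distance and the textbook definitions (Dunn: smallest between-cluster distance over largest cluster diameter; Davies-Bouldin: average worst-case ratio of within-cluster scatter to centroid separation).

```python
from itertools import combinations
from math import dist
from statistics import mean

def group_by_label(points, labels):
    """Return the clusters as lists of points, one list per label."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    return list(clusters.values())

def dunn_index(points, labels):
    """Higher is better. Undefined if every cluster is a single point."""
    clusters = group_by_label(points, labels)
    min_between = min(dist(p, q)
                      for a, b in combinations(clusters, 2) for p in a for q in b)
    max_diameter = max(dist(p, q) for c in clusters for p in c for q in c)
    return min_between / max_diameter

def davies_bouldin_index(points, labels):
    """Lower is better."""
    clusters = group_by_label(points, labels)
    centroids = [tuple(mean(coord) for coord in zip(*c)) for c in clusters]
    scatter = [mean(dist(p, cen) for p in c) for c, cen in zip(clusters, centroids)]
    k = len(clusters)
    worst = [max((scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
                 for j in range(k) if j != i)
             for i in range(k)]
    return mean(worst)

# Hypothetical toy data: compare a sensible and a poor two-cluster solution.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
for labels in ([0, 0, 1, 1], [0, 1, 0, 1]):
    print(labels, dunn_index(points, labels), davies_bouldin_index(points, labels))
```

Choosing among candidate solutions then amounts to preferring the one with the larger Dunn index and the smaller Davies-Bouldin index, when the two agree.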
Challenges in microarray class discovery: a comprehensive examination of normalization, gene selection and clustering

PubMed Central

2010-01-01

Background Cluster analysis, and in particular hierarchical clustering, is widely used to extract information from gene expression data. The aim is to discover new classes, or sub-classes, of either individuals or genes. Performing a cluster analysis commonly involves decisions on how to handle missing values, standardize the data and select genes. In addition, pre-processing, involving various types of filtration and normalization procedures, can have an effect on the ability to discover biologically relevant classes. Here we consider cluster analysis in a broad sense and perform a comprehensive evaluation that covers several aspects of cluster analyses, including normalization. Result We evaluated 2780 cluster analysis methods on seven publicly available 2-channel microarray data sets with common reference designs. Each cluster analysis method differed in data normalization (5 normalizations were considered), missing value imputation (2), standardization of data (2), gene selection (19) or clustering method (11). The cluster analyses are evaluated using known classes, such as cancer types, and the adjusted Rand index. The performances of the different analyses vary between the data sets and it is difficult to give general recommendations. However, normalization, gene selection and clustering method are all variables that have a significant impact on the performance. In particular, gene selection is important and it is generally necessary to include a relatively large number of genes in order to get good performance. Selecting genes with high standard deviation or using principal component analysis are shown to be the preferred gene selection methods. Hierarchical clustering using Ward's method, k-means clustering and Mclust are the clustering methods considered in this paper that achieve the highest adjusted Rand index.
Normalization can have a significant positive impact on the ability to cluster individuals, and there are indications that background correction is preferable, in particular if the gene selection is successful. However, this is an area that needs to be studied further in order to draw any general conclusions. Conclusions The choice of cluster analysis, and in particular gene selection, has a large impact on the ability to cluster individuals correctly based on expression profiles. Normalization has a positive effect, but the relative performance of different normalizations is an area that needs more research. In summary, although clustering, gene selection and normalization are considered standard methods in bioinformatics, our comprehensive analysis shows that selecting the right methods, and the right combinations of methods, is far from trivial and that much is still unexplored in what is considered to be the most basic analysis of genomic data. PMID:20937082
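The adjusted Rand index used in the abstract above to score cluster solutions against known classes can be computed from pair counts. The sketch below follows the standard Hubert-Arabie formula; the function name and toy labels are hypothetical, and it assumes at least two samples.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Chance-corrected agreement between two partitions of the same samples."""
    n = len(labels_true)
    # Pairs of samples placed together in both partitions.
    together_in_both = sum(comb(c, 2)
                           for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs placed together within each partition separately.
    together_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    together_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = together_true * together_pred / comb(n, 2)
    maximum = (together_true + together_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single cluster
    return (together_in_both - expected) / (maximum - expected)

known_classes = [0, 0, 1, 1]
print(adjusted_rand_index(known_classes, [1, 1, 0, 0]))  # same grouping, renamed
print(adjusted_rand_index(known_classes, [0, 1, 0, 1]))  # grouping cuts across classes
```

The index is 1 for identical partitions regardless of label names, close to 0 for random agreement, and can be negative when agreement is worse than chance.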
Defining objective clusters for rabies virus sequences using affinity propagation clustering

PubMed Central

Fischer, Susanne; Freuling, Conrad M.; Pfaff, Florian; Bodenhofer, Ulrich; Höper, Dirk; Fischer, Mareike; Marston, Denise A.; Fooks, Anthony R.; Mettenleiter, Thomas C.; Conraths, Franz J.; Homeier-Bachmann, Timo

2018-01-01

Rabies is caused by lyssaviruses, and is one of the oldest known zoonoses. In recent years, more than 21,000 nucleotide sequences of rabies viruses (RABV), from the prototype species rabies lyssavirus, have been deposited in public databases. Subsequent phylogenetic analyses in combination with metadata suggest geographic distributions of RABV. However, these analyses run into technical difficulties in defining verifiable criteria for cluster allocations in phylogenetic trees, which invites a more rational approach. Therefore, we applied a relatively new mathematical clustering algorithm named ‘affinity propagation clustering’ (AP) to propose a standardized sub-species classification utilizing full-genome RABV sequences. Because AP has the advantage that it is computationally fast and works for any meaningful measure of similarity between data samples, it has previously been applied successfully in bioinformatics, for analysis of microarray and gene expression data; however, cluster analysis of sequences is still in its infancy. Existing (516) and original (46) full genome RABV sequences were used to demonstrate the application of AP for RABV clustering. On a global scale, AP proposed four clusters, i.e. New World cluster, Arctic/Arctic-like, Cosmopolitan, and Asian as previously assigned by phylogenetic studies. By combining AP with established phylogenetic analyses, it is possible to resolve phylogenetic relationships between verifiably determined clusters and sequences. This workflow will be useful in confirming cluster distributions in a uniform transparent manner, not only for RABV, but also for other comparative sequence analyses. PMID:29357361
A comparison of visual search strategies of elite and non-elite tennis players through cluster analysis.

PubMed

Murray, Nicholas P; Hunfalvay, Melissa

2017-02-01

Considerable research has documented that successful performance in interceptive tasks (such as return of serve in tennis) is based on the performers' capability to capture appropriate anticipatory information prior to the flight path of the approaching object. Athletes of higher skill tend to fixate on different locations in the playing environment prior to initiation of a skill than their lesser skilled counterparts. The purpose of this study was to examine visual search behaviour strategies of elite (world ranked) tennis players and non-ranked competitive tennis players (n = 43) utilising cluster analysis. The results of hierarchical (Ward's method) and nonhierarchical (k means) cluster analyses revealed three different clusters. The clustering method distinguished visual behaviour of high-, middle- and low-ranked players. Specifically, high-ranked players demonstrated longer mean fixation duration and lower variation of visual search than middle- and low-ranked players. In conclusion, the results demonstrated that cluster analysis is a useful tool for detecting and analysing the areas of interest for use in experimental analysis of expertise and to distinguish visual search variables among participants.
A Cluster Analytic Approach to Identifying Predictors and Moderators of Psychosocial Treatment for Bipolar Depression: Results from STEP-BD

PubMed Central

Deckersbach, Thilo; Peters, Amy T.; Sylvia, Louisa G.; Gold, Alexandra K.; da Silva Magalhaes, Pedro Vieira; Henry, David B.; Frank, Ellen; Otto, Michael W.; Berk, Michael; Dougherty, Darin D.; Nierenberg, Andrew A.; Miklowitz, David J.

2016-01-01

Background We sought to address how predictors and moderators of psychotherapy for bipolar depression – identified individually in prior analyses – can inform the development of a metric for prospectively classifying treatment outcome in intensive psychotherapy (IP) versus collaborative care (CC) adjunctive to pharmacotherapy in the Systematic Treatment Enhancement Program (STEP-BD) study. Methods We conducted post-hoc analyses on 135 STEP-BD participants using cluster analysis to identify subsets of participants with similar clinical profiles and investigated this combined metric as a moderator and predictor of response to IP. We used agglomerative hierarchical cluster analyses and k-means clustering to determine the content of the clinical profiles. Logistic regression and Cox proportional hazard models were used to evaluate whether the resulting clusters predicted or moderated likelihood of recovery or time until recovery. Results The cluster analysis yielded a two-cluster solution: 1) “less-recurrent/severe” and 2) “chronic/recurrent.” Rates of recovery in IP were similar for less-recurrent/severe and chronic/recurrent participants. Less-recurrent/severe patients were more likely than chronic/recurrent patients to achieve recovery in CC (p = .040, OR = 4.56). IP yielded a faster recovery for chronic/recurrent participants, whereas CC led to recovery sooner in the less-recurrent/severe cluster (p = .034, OR = 2.62). Limitations Cluster analyses require list-wise deletion of cases with missing data so we were unable to conduct analyses on all STEP-BD participants. Conclusions A well-powered, parametric approach can distinguish patients based on illness history and provide clinicians with symptom profiles of patients that confer differential prognosis in CC vs. IP. PMID:27289316
Business and Marketing Cluster. Task Analyses.

ERIC Educational Resources Information Center

Henrico County Public Schools, Glen Allen, VA. Virginia Vocational Curriculum and Resource Center.

Developed in Virginia, this publication contains task analysis guides to support selected tech prep programs that prepare students for careers in the business and marketing cluster. Guides are included for accounting systems, legal systems administration, office systems technology, and retail marketing. Each task analyses guide has the following…
Multivariate Statistical Analysis of MSL APXS Bulk Geochemical Data

NASA Astrophysics Data System (ADS)

Hamilton, V. E.; Edwards, C. S.; Thompson, L. M.; Schmidt, M. E.

2014-12-01

We apply cluster and factor analyses to bulk chemical data of 130 soil and rock samples measured by the Alpha Particle X-ray Spectrometer (APXS) on the Mars Science Laboratory (MSL) rover Curiosity through sol 650. Multivariate approaches such as principal components analysis (PCA), cluster analysis, and factor analysis complement more traditional approaches (e.g., Harker diagrams), with the advantage of simultaneously examining the relationships between multiple variables for large numbers of samples. Principal components analysis has been applied with success to APXS, Pancam, and Mössbauer data from the Mars Exploration Rovers. Factor analysis and cluster analysis have been applied with success to thermal infrared (TIR) spectral data of Mars. Cluster analyses group the input data by similarity, where there are a number of different methods for defining similarity (hierarchical, density, distribution, etc.). For example, without any assumptions about the chemical contributions of surface dust, preliminary hierarchical and K-means cluster analyses clearly distinguish the physically adjacent rock targets Windjana and Stephen as being distinctly different from lithologies observed prior to Curiosity's arrival at The Kimberley. In addition, they are separated from each other, consistent with chemical trends observed in variation diagrams but without requiring assumptions about chemical relationships. We will discuss the variation in cluster analysis results as a function of clustering method and pre-processing (e.g., log transformation, correction for dust cover) and implications for interpreting chemical data. Factor analysis shares some similarities with PCA, and examines the variability among observed components of a dataset so as to reveal variations attributable to unobserved components.
Factor analysis has been used to extract the TIR spectra of components that are typically observed in mixtures and only rarely in isolation; there is the potential for similar results with data from APXS. These techniques offer new ways to understand the chemical relationships between the materials interrogated by Curiosity, and potentially their relation to materials observed by APXS instruments on other landed missions.
Visualizing Confidence in Cluster-Based Ensemble Weather Forecast Analyses.

PubMed

Kumpf, Alexander; Tost, Bianca; Baumgart, Marlene; Riemer, Michael; Westermann, Rudiger; Rautenhaus, Marc

2018-01-01

In meteorology, cluster analysis is frequently used to determine representative trends in ensemble weather predictions in a selected spatio-temporal region, e.g., to reduce a set of ensemble members to simplify and improve their analysis. Identified clusters (i.e., groups of similar members), however, can be very sensitive to small changes of the selected region, so that clustering results can be misleading and bias subsequent analyses. In this article, we, a team of visualization scientists and meteorologists, deliver visual analytics solutions to analyze the sensitivity of clustering results with respect to changes of a selected region. We propose an interactive visual interface that enables simultaneous visualization of a) the variation in composition of identified clusters (i.e., their robustness), b) the variability in cluster membership for individual ensemble members, and c) the uncertainty in the spatial locations of identified trends. We demonstrate that our solution shows meteorologists how representative a clustering result is, and with respect to which changes in the selected region it becomes unstable. Furthermore, our solution helps to identify those ensemble members which stably belong to a given cluster and can thus be considered similar. In a real-world application case we show how our approach is used to analyze the clustering behavior of different regions in a forecast of "Tropical Cyclone Karl", guiding the user towards the cluster robustness information required for subsequent ensemble analysis.
Coordinate based random effect size meta-analysis of neuroimaging studies.

PubMed

Tench, C R; Tanasescu, Radu; Constantinescu, C S; Auer, D P; Cottam, W J

2017-06-01

Low power in neuroimaging studies can make them difficult to interpret, and coordinate based meta-analysis (CBMA) may go some way to mitigating this issue. CBMA has been used in many analyses to detect where published functional MRI or voxel-based morphometry studies testing similar hypotheses report significant summary results (coordinates) consistently. Only the reported coordinates and possibly t statistics are analysed, and statistical significance of clusters is determined by coordinate density. Here a method of performing coordinate based random effect size meta-analysis and meta-regression is introduced. The algorithm (ClusterZ) analyses both coordinates and reported t statistic or Z score, standardised by the number of subjects. Statistical significance is determined not by coordinate density, but by a random effects meta-analysis of reported effects performed cluster-wise using standard statistical methods and taking account of censoring inherent in the published summary results. Type 1 error control is achieved using the false cluster discovery rate (FCDR), which is based on the false discovery rate. This controls both the family wise error rate under the null hypothesis that coordinates are randomly drawn from a standard stereotaxic space, and the proportion of significant clusters that are expected under the null. Such control is necessary to avoid propagating and even amplifying the very issues motivating the meta-analysis in the first place. ClusterZ is demonstrated on both numerically simulated data and on real data from reports of grey matter loss in multiple sclerosis (MS) and syndromes suggestive of MS, and of painful stimulus in healthy controls. The software implementation is available to download and use freely. Copyright © 2017 Elsevier Inc. All rights reserved.
X-ray aspects of the DAFT/FADA clusters

NASA Astrophysics Data System (ADS)

Guennou, L.; Durret, F.; Lima Neto, G. B.; Adami, C.

2012-12-01

We have undertaken the DAFT/FADA survey with the aim of applying constraints on dark energy based on weak lensing tomography as well as obtaining homogeneous and high quality data for a sample of 91 massive clusters in the redshift range [0.4,0.9] for which there are HST archive data. We have analysed the XMM-Newton data available for 42 of these clusters to derive their X-ray temperatures and luminosities and search for substructures. This study was coupled with a dynamical analysis for the 26 clusters having at least 30 spectroscopic galaxy redshifts in the cluster range. We present preliminary results on the coupled X-ray and dynamical analyses of these clusters.
Cross-scale analysis of cluster correspondence using different operational neighborhoods

NASA Astrophysics Data System (ADS)

Lu, Yongmei; Thill, Jean-Claude

2008-09-01

Cluster correspondence analysis examines the spatial autocorrelation of multi-location events at the local scale. This paper argues that patterns of cluster correspondence are highly sensitive to the definition of operational neighborhoods that form the spatial units of analysis. A subset of multi-location events is examined for cluster correspondence if they are associated with the same operational neighborhood. This paper discusses the construction of operational neighborhoods for cluster correspondence analysis based on the spatial properties of the underlying zoning system and the scales at which the zones are aggregated into neighborhoods. Impacts of this construction on the degree of cluster correspondence are also analyzed. Empirical analyses of cluster correspondence between paired vehicle theft and recovery locations are conducted on different zoning methods and across a series of geographic scales and the dynamics of cluster correspondence patterns are discussed.
Using Cluster Analysis to Compartmentalize a Large Managed Wetland Based on Physical, Biological, and Climatic Geospatial Attributes.

PubMed

Hahus, Ian; Migliaccio, Kati; Douglas-Mankin, Kyle; Klarenberg, Geraldine; Muñoz-Carpena, Rafael

2018-04-27

Hierarchical and partitional cluster analyses were used to compartmentalize Water Conservation Area 1, a managed wetland within the Arthur R. Marshall Loxahatchee National Wildlife Refuge in southeast Florida, USA, based on physical, biological, and climatic geospatial attributes. Single, complete, average, and Ward's linkages were tested during the hierarchical cluster analyses, with average linkage providing the best results. In general, the partitional method, partitioning around medoids, found clusters that were more evenly sized and more spatially aggregated than those resulting from the hierarchical analyses. However, hierarchical analysis appeared to be better suited to identify outlier regions that were significantly different from other areas. The clusters identified by geospatial attributes were similar to clusters developed for the interior marsh in a separate study using water quality attributes, suggesting that similar factors have influenced variations in both the set of physical, biological, and climatic attributes selected in this study and water quality parameters. However, geospatial data allowed further subdivision of several interior marsh clusters identified from the water quality data, potentially indicating zones with important differences in function. Identification of these zones can be useful to managers and modelers by informing the distribution of monitoring equipment and personnel as well as delineating regions that may respond similarly to future changes in management or climate.
A Model-Based Cluster Analysis of Maternal Emotion Regulation and Relations to Parenting Behavior.

PubMed

Shaffer, Anne; Whitehead, Monica; Davis, Molly; Morelen, Diana; Suveg, Cynthia

2017-10-15

In a diverse community sample of mothers (N = 108) and their preschool-aged children (M age = 3.50 years), this study conducted person-oriented analyses of maternal emotion regulation (ER) based on a multimethod assessment incorporating physiological, observational, and self-report indicators. A model-based cluster analysis was applied to five indicators of maternal ER: maternal self-report, observed negative affect in a parent-child interaction, baseline respiratory sinus arrhythmia (RSA), and RSA suppression across two laboratory tasks. Model-based cluster analyses revealed four maternal ER profiles, including a group of mothers with average ER functioning, characterized by socioeconomic advantage and more positive parenting behavior. A dysregulated cluster demonstrated the greatest challenges with parenting and dyadic interactions. Two clusters of intermediate dysregulation were also identified. Implications for assessment and applications to parenting interventions are discussed. © 2017 Family Process Institute.
Transmission clustering among newly diagnosed HIV patients in Chicago, 2008 to 2011: using phylogenetics to expand knowledge of regional HIV transmission patterns

PubMed Central

Lubelchek, Ronald J.; Hoehnen, Sarah C.; Hotton, Anna L.; Kincaid, Stacey L.; Barker, David E.; French, Audrey L.

2014-01-01

Introduction HIV transmission cluster analyses can inform HIV prevention efforts. We describe the first such assessment for transmission clustering among HIV patients in Chicago. Methods We performed transmission cluster analyses using HIV pol sequences from newly diagnosed patients presenting to Chicago’s largest HIV clinic between 2008 and 2011. We compared sequences via progressive pairwise alignment, using neighbor joining to construct an un-rooted phylogenetic tree. We defined clusters as >2 sequences among which each sequence had at least one partner within a genetic distance of ≤ 1.5%. We used multivariable regression to examine factors associated with clustering and used geospatial analysis to assess geographic proximity of phylogenetically clustered patients. Results We compared sequences from 920 patients; median age 35 years; 75% male; 67% Black, 23% Hispanic; 8% had a Rapid Plasma Reagin (RPR) titer ≥ 1:16 concurrent with their HIV diagnosis. We had HIV transmission risk data for 54%; 43% identified as men who have sex with men (MSM). Phylogenetic analysis demonstrated 123 patients (13%) grouped into 26 clusters, the largest having 20 members. In multivariable regression, age < 25, Black race, MSM status, male gender, higher HIV viral load, and RPR ≥ 1:16 associated with clustering. We did not observe geographic grouping of genetically clustered patients. Discussion Our results demonstrate high rates of HIV transmission clustering, without local geographic foci, among young Black MSM in Chicago. Applied prospectively, phylogenetic analyses could guide prevention efforts and help break the cycle of transmission. PMID:25321182
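The cluster definition in the Methods (more than two sequences, each within a genetic distance of 1.5% of at least one partner) can be read as connected components of a distance-threshold graph. The sketch below takes that reading as an assumption; the sequence names and pairwise distances are made up, and the authors' actual tooling is not described in the abstract.

```python
def transmission_clusters(names, distances, threshold=0.015, min_size=3):
    """Group sequences linked by chains of pairwise distances <= threshold.

    distances maps (name_a, name_b) pairs to genetic distance as a fraction;
    pairs that are not listed are treated as unlinked.
    """
    parent = {name: name for name in names}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), d in distances.items():
        if d <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    # ">2 sequences" is taken to mean at least three members.
    return sorted(sorted(g) for g in groups.values() if len(g) >= min_size)

# Hypothetical example: s1-s2-s3 form a chain; s4-s5 is only a pair; s6 is alone.
names = ["s1", "s2", "s3", "s4", "s5", "s6"]
distances = {
    ("s1", "s2"): 0.010,
    ("s2", "s3"): 0.014,
    ("s1", "s3"): 0.030,
    ("s3", "s4"): 0.040,
    ("s4", "s5"): 0.012,
    ("s5", "s6"): 0.050,
}
print(transmission_clusters(names, distances))
```

Note that s1 and s3 end up in the same cluster even though their direct distance exceeds the threshold, because each has a partner (s2) within it.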
A Systems Biology Approach for Identifying Hepatotoxicant Groups Based on Similarity in Mechanisms of Action and Chemical Structure.

PubMed

Hebels, Dennie G A J; Rasche, Axel; Herwig, Ralf; van Westen, Gerard J P; Jennen, Danyel G J; Kleinjans, Jos C S

2016-01-01

When evaluating compound similarity, addressing multiple sources of information to reach conclusions about common pharmaceutical and/or toxicological mechanisms of action is a crucial strategy. In this chapter, we describe a systems biology approach that incorporates analyses of hepatotoxicant data for 33 compounds from three different sources: a chemical structure similarity analysis based on the 3D Tanimoto coefficient, a chemical structure-based protein target prediction analysis, and a cross-study/cross-platform meta-analysis of in vitro and in vivo human and rat transcriptomics data derived from public resources (i.e., the diXa data warehouse). Hierarchical clustering of the outcome scores of the separate analyses did not result in a satisfactory grouping of compounds considering their known toxic mechanism as described in literature. However, a combined analysis of multiple data types may hypothetically compensate for missing or unreliable information in any of the single data types. We therefore performed an integrated clustering analysis of all three data sets using the R-based tool iClusterPlus. This indeed improved the grouping results. The compound clusters that were formed by means of iClusterPlus represent groups that show similar gene expression while simultaneously integrating a similarity in structure and protein targets, which corresponds much better with the known mechanism of action of these toxicants. Using an integrative systems biology approach may thus overcome the limitations of the separate analyses when grouping liver toxicants sharing a similar mechanism of toxicity.
[Visual field progression in glaucoma: cluster analysis].

PubMed

Bresson-Dumont, H; Hatton, J; Foucher, J; Fonteneau, M

2012-11-01

Visual field progression analysis is one of the key points in glaucoma monitoring, but distinction between true progression and random fluctuation is sometimes difficult. There are several different algorithms but no real consensus for detecting visual field progression. The trend analysis of global indices (MD, sLV) may miss localized deficits or be affected by media opacities. Conversely, point-by-point analysis makes progression difficult to differentiate from physiological variability, particularly when the sensitivity of a point is already low. The goal of our study was to analyse visual field progression with the EyeSuite™ Octopus Perimetry Clusters algorithm in patients with no significant changes in global indices or worsening of the analysis of pointwise linear regression. We analyzed the visual fields of 162 eyes (100 patients - 58 women, 42 men, average age 66.8 ± 10.91) with ocular hypertension or glaucoma. For inclusion, at least six reliable visual fields per eye were required, and the trend analysis (EyeSuite™ Perimetry) of visual field global indices (MD and sLV) had to show no significant progression. The analysis of changes in cluster mode was then performed. In a second step, eyes with statistically significant worsening of at least one of their clusters were analyzed point-by-point with the Octopus Field Analysis (OFA). Fifty-four eyes (33.33%) had a significant worsening in some clusters, while their global indices remained stable over time. In this group of patients, more advanced glaucoma was present than in the stable group (MD 6.41 dB vs. 2.87); 64.82% (35/54) of those eyes in which the clusters progressed, however, had no statistically significant change in the trend analysis by pointwise linear regression. Most software algorithms for analyzing visual field progression are essentially trend analyses of global indices, or point-by-point linear regression. This study shows the potential role of cluster trend analysis.
However, for best results, it is preferable to compare the analyses of several tests in combination with morphologic exam. Copyright © 2012 Elsevier Masson SAS. All rights reserved.
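The cluster trend idea in the record above can be illustrated with a minimal sketch: average the pointwise values within each cluster at every visit, then fit an ordinary least-squares line against time. This is not the EyeSuite algorithm, whose details the abstract does not give. The cluster names, location indices, toy data, and function names are all hypothetical, and significance testing is omitted.

```python
from statistics import mean

def ols_slope(times, values):
    """Ordinary least-squares slope of values regressed on times."""
    t_bar, v_bar = mean(times), mean(values)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    return sxy / sxx

def cluster_trends(times, fields, clusters):
    """Rate of change (units per year) of the mean value in each cluster.

    times:    visit times in years
    fields:   one list of pointwise values per visit
    clusters: hypothetical mapping of cluster name -> list of location indices
    """
    trends = {}
    for name, locations in clusters.items():
        # Average the locations of this cluster at every visit.
        series = [mean(field[i] for i in locations) for field in fields]
        trends[name] = ols_slope(times, series)
    return trends

# Toy series (hypothetical): locations 2 and 3 worsen by 1 unit per year,
# locations 0 and 1 stay stable.
times = [0.0, 1.0, 2.0, 3.0]
fields = [
    [1.0, 1.0, 2.0, 2.0],
    [1.0, 1.0, 3.0, 3.0],
    [1.0, 1.0, 4.0, 4.0],
    [1.0, 1.0, 5.0, 5.0],
]
clusters = {"superior": [0, 1], "inferior": [2, 3]}
trends = cluster_trends(times, fields, clusters)

# A global index averages all locations and so dilutes the localized change.
global_trend = ols_slope(times, [mean(field) for field in fields])
```

In this toy series the global slope is half the slope of the worsening cluster, which is the dilution effect the abstract attributes to global indices.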

Percolation analyses of observed and simulated galaxy clustering

NASA Astrophysics Data System (ADS)

Bhavsar, S. P.; Barrow, J. D.

1983-11-01

A percolation cluster analysis is performed on equivalent regions of the CFA redshift survey of galaxies and the 4000 body simulations of gravitational clustering made by Aarseth, Gott and Turner (1979). The observed and simulated percolation properties are compared and, unlike correlation and multiplicity function analyses, favour high density (Omega = 1) models with n = - 1 initial data. The present results show that the three-dimensional data are consistent with the degree of filamentary structure present in isothermal models of galaxy formation at the level of percolation analysis. It is also found that the percolation structure of the CFA data is a function of depth. Percolation structure does not appear to be a sensitive probe of intrinsic filamentary structure.
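As a rough illustration of what a percolation cluster analysis computes, the sketch below joins any two points closer than a chosen linking length and reports the connected groups. It is a generic example, not the procedure used in the paper; the function names, linking lengths, and toy coordinates are assumptions.

```python
from itertools import combinations
from math import dist

def percolation_clusters(points, linking_length):
    """Group point indices by joining any pair closer than linking_length."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= linking_length:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    # Largest cluster first.
    return sorted(groups.values(), key=len, reverse=True)

def largest_cluster_fraction(points, linking_length):
    """Fraction of all points that belong to the largest cluster."""
    return len(percolation_clusters(points, linking_length)[0]) / len(points)

# Toy 3D positions (hypothetical): a chain of three points and a distant pair.
points = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (10, 0, 0), (11, 0, 0)]
groups = percolation_clusters(points, 1.5)
```

Tracking `largest_cluster_fraction` as the linking length grows shows the point at which the largest cluster spans the sample, which is the kind of property such analyses compare between observed and simulated data.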
Spatio-temporal analysis of wildfire ignitions in the St. Johns River Water Management District, Florida

Treesearch

Marc G. Genton; David T. Butry; Marcia L. Gumpertz; Jeffrey P. Prestemon

2006-01-01

We analyse the spatio-temporal structure of wildfire ignitions in the St. Johns River Water Management District in north-eastern Florida. We show, using tools to analyse point patterns (e.g. the L-function), that wildfire events occur in clusters. Clustering of these events correlates with irregular distribution of fire ignitions, including lightning...
Sulfur in Cometary Dust

NASA Technical Reports Server (NTRS)

Fomenkova, M. N.

1997-01-01

The computer-intensive project consisted of the analysis and synthesis of existing data on composition of comet Halley dust particles. The main objective was to obtain a complete inventory of sulfur containing compounds in the comet Halley dust by building upon the existing classification of organic and inorganic compounds and applying a variety of statistical techniques for cluster and cross-correlational analyses. A student hired for this project wrote and tested the software to perform cluster analysis. The following tasks were carried out: (1) selecting the data from existing database for the proposed project; (2) finding access to a standard library of statistical routines for cluster analysis; (3) reformatting the data as necessary for input into the library routines; (4) performing cluster analysis and constructing hierarchical cluster trees using three methods to define the proximity of clusters; (5) presenting the output results in different formats to facilitate the interpretation of the obtained cluster trees; (6) selecting groups of data points common for all three trees as stable clusters. We have also considered the chemistry of sulfur in inorganic compounds.
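Tasks (4) and (6) above can be sketched as follows, under stated assumptions: the abstract does not name the three proximity definitions, so single, complete, and average linkage are used here as stand-ins. Each tree is cut at a fixed number of clusters `k`, which is a simplification of comparing whole trees. All names and data are hypothetical.

```python
from itertools import combinations
from math import dist

# Three common ways to define the proximity of two clusters (assumed here).
LINKAGES = {
    "single": min,                           # closest pair of members
    "complete": max,                         # farthest pair of members
    "average": lambda d: sum(d) / len(d),    # mean over all member pairs
}

def agglomerate(points, k, linkage):
    """Merge the two closest clusters until k remain; returns a set of frozensets."""
    clusters = [[i] for i in range(len(points))]
    combine = LINKAGES[linkage]

    def proximity(a, b):
        return combine([dist(points[i], points[j]) for i in a for j in b])

    while len(clusters) > k:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: proximity(clusters[ab[0]], clusters[ab[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b
    return {frozenset(c) for c in clusters}

def stable_clusters(points, k):
    """Clusters that come out identical under all three linkage rules."""
    partitions = [agglomerate(points, k, name) for name in LINKAGES]
    return set.intersection(*partitions)

# Toy data (hypothetical): two tight pairs and one outlier.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
stable = stable_clusters(points, 3)
```

Groups that survive the intersection are those on which the three proximity definitions agree, which mirrors the abstract's notion of stable clusters.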
Non-targeted analyses of animal plasma: betaine and choline represent the nutritional and metabolic status.

PubMed

Katayama, K; Sato, T; Arai, T; Amao, H; Ohta, Y; Ozawa, T; Kenyon, P R; Hickson, R E; Tazaki, H

2013-02-01

Simple liquid chromatography-mass spectrometry (LC-MS) was applied to non-targeted metabolic analyses to discover new metabolic markers in animal plasma. Principal component analysis (PCA) and partial least squares-discriminant analysis (PLS-DA) were used to analyse LC-MS multivariate data. PCA clearly generated two separate clusters for artificially induced diabetic mice and healthy control mice. PLS-DA of time-course changes in plasma metabolites of chicks after feeding generated three clusters (pre- and immediately after feeding, 0.5-3 h after feeding and 4 h after feeding). Two separate clusters were also generated for plasma metabolites of pregnant Angus heifers with differing live-weight change profiles (gaining or losing). The accompanying PLS-DA loading plot detailed the metabolites that contribute the most to the cluster separation. In each case, the same highly hydrophilic metabolite was strongly correlated to the group separation. The metabolite was identified as betaine by LC-MS/MS. This result indicates that betaine and its metabolic precursor, choline, may be useful biomarkers to evaluate the nutritional and metabolic status of animals. © 2011 Blackwell Verlag GmbH.
Whole Genome Sequence and Phylogenetic Analysis Show Helicobacter pylori Strains from Latin America Have Followed a Unique Evolution Pathway

PubMed Central

Muñoz-Ramírez, Zilia Y.; Mendez-Tenorio, Alfonso; Kato, Ikuko; Bravo, Maria M.; Rizzato, Cosmeri; Thorell, Kaisa; Torres, Roberto; Aviles-Jimenez, Francisco; Camorlinga, Margarita; Canzian, Federico; Torres, Javier

2017-01-01

Helicobacter pylori (HP) genetics may determine its clinical outcomes. Despite high prevalence of HP infection in Latin America (LA), there have been no phylogenetic studies in the region. We aimed to understand the structure of HP populations in LA mestizo individuals, where gastric cancer incidence remains high. The genomes of 107 HP strains from Mexico, Nicaragua and Colombia were analyzed with 59 publicly available worldwide genomes. To study bacterial relationships at the whole-genome level, we propose a virtual hybridization technique using thousands of high-entropy 13 bp DNA probes to generate fingerprints. Phylogenetic virtual genome fingerprint (VGF) was compared with Multi Locus Sequence Analysis (MLST) and with phylogenetic analyses of cagPAI virulence island sequences. With MLST some Nicaraguan and Mexican strains clustered close to African isolates, whereas European isolates were spread without clustering and intermingled with LA isolates. VGF analysis resulted in increased resolution of populations, separating European from LA strains. Furthermore, clusters with exclusively Colombian, Mexican, or Nicaraguan strains were observed, where the Colombian cluster separated from Europe, Asia, and Africa, while Nicaraguan and Mexican clades grouped close to Africa. In addition, a mixed large LA cluster including Mexican, Colombian, Nicaraguan, Peruvian, and Salvadorian strains was observed; all LA clusters separated from the Amerind clade. With cagPAI sequence analyses LA clades clearly separated from Europe, Asia and Amerind, and Colombian strains formed a single cluster. A NeighborNet analysis suggested frequent and recent recombination events, particularly among LA strains. Results suggest that in the New World, H. pylori has evolved to fit mestizo LA populations, already 500 years after the Spanish colonization. This co-adaptation may account for regional variability in gastric cancer risk. PMID:28293542
CyTOF workflow: differential discovery in high-throughput high-dimensional cytometry datasets

PubMed Central

Nowicka, Malgorzata; Krieg, Carsten; Weber, Lukas M.; Hartmann, Felix J.; Guglietta, Silvia; Becher, Burkhard; Levesque, Mitchell P.; Robinson, Mark D.

2017-01-01

High dimensional mass and flow cytometry (HDCyto) experiments have become a method of choice for high throughput interrogation and characterization of cell populations. Here, we present an R-based pipeline for differential analyses of HDCyto data, largely based on Bioconductor packages. We computationally define cell populations using FlowSOM clustering, and facilitate an optional but reproducible strategy for manual merging of algorithm-generated clusters. Our workflow offers different analysis paths, including association of cell type abundance with a phenotype or changes in signaling markers within specific subpopulations, or differential analyses of aggregated signals. Importantly, the differential analyses we show are based on regression frameworks where the HDCyto data is the response; thus, we are able to model arbitrary experimental designs, such as those with batch effects, paired designs and so on. In particular, we apply generalized linear mixed models to analyses of cell population abundance or cell-population-specific analyses of signaling markers, allowing overdispersion in cell count or aggregated signals across samples to be appropriately modeled. To support the formal statistical analyses, we encourage exploratory data analysis at every step, including quality control (e.g. multi-dimensional scaling plots), reporting of clustering results (dimensionality reduction, heatmaps with dendrograms) and differential analyses (e.g. plots of aggregated signals). PMID:28663787
Exploring the application of latent class cluster analysis for investigating pedestrian crash injury severities in Switzerland.

PubMed

Sasidharan, Lekshmi; Wu, Kun-Feng; Menendez, Monica

2015-12-01

One of the major challenges in traffic safety analyses is the heterogeneous nature of safety data, due to the sundry factors involved in it. This heterogeneity often leads to difficulties in interpreting results and conclusions due to unrevealed relationships. Understanding the underlying relationship between injury severities and influential factors is critical for the selection of appropriate safety countermeasures. A method commonly employed to address systematic heterogeneity is to focus on any subgroup of data based on the research purpose. However, this need not ensure homogeneity in the data. In this paper, latent class cluster analysis is applied to identify homogeneous subgroups for a specific crash type: pedestrian crashes. The manuscript employs data from police-reported pedestrian crashes (2009-2012) in Switzerland. The analyses demonstrate that dividing pedestrian severity data into seven clusters helps to reduce the systematic heterogeneity of the data and to understand the hidden relationships between crash severity levels and socio-demographic, environmental, vehicle, temporal, and traffic factors, and the main reason for the crash. The pedestrian crash injury severity models were developed for the whole data and individual clusters, and were compared using receiver operating characteristic curves, for which results favored clustering. Overall, the study suggests that the latent class clustered regression approach is suitable for reducing heterogeneity and revealing important hidden relationships in traffic safety analyses. Copyright © 2015 Elsevier Ltd. All rights reserved.
Cluster analysis of phytoplankton data collected from the National Stream Quality Accounting Network in the Tennessee River basin, 1974-81

USGS Publications Warehouse

Stephens, D.W.; Wangsgard, J.B.

1988-01-01

A computer program, Numerical Taxonomy System of Multivariate Statistical Programs (NTSYS), was used with interfacing software to perform cluster analyses of phytoplankton data stored in the biological files of the U.S. Geological Survey. The NTSYS software performs various types of statistical analyses and is capable of handling a large matrix of data. Cluster analyses were done on phytoplankton data collected from 1974 to 1981 at four National Stream Quality Accounting Network stations in the Tennessee River basin. Analysis of the changes in clusters of phytoplankton genera indicated possible changes in the water quality of the French Broad River near Knoxville, Tennessee. At this station, the most common diatom groups indicated a shift in dominant forms with some of the less common diatoms being replaced by green and blue-green algae. There was a reduction in genera variability between 1974-77 and 1979-81 sampling periods. Statistical analysis of chloride and dissolved solids confirmed that concentrations of these substances were smaller in 1974-77 than in 1979-81. At Pickwick Landing Dam, the furthest downstream station used in the study, there was an increase in the number of genera of 'rare' organisms with time. The appearance of two groups of green and blue-green algae indicated that an increase in temperature or nutrient concentrations occurred from 1974 to 1981, but this could not be confirmed using available water quality data. Associations of genera forming the phytoplankton communities at three stations on the Tennessee River were found to be seasonal. Nodal analysis of combined data from all four stations used in the study did not identify any seasonal or temporal patterns during 1974-81. Cluster analysis using the NTSYS programs was effective in reducing the large phytoplankton data set to a manageable size and provided considerable insight into the structure of phytoplankton communities in the Tennessee River basin.
Problems encountered using cluster analysis were the subjectivity introduced in the definition of meaningful clusters, and the lack of taxonomic identification to the species level. (Author's abstract)
Investigating the usefulness of a cluster-based trend analysis to detect visual field progression in patients with open-angle glaucoma.

PubMed

Aoki, Shuichiro; Murata, Hiroshi; Fujino, Yuri; Matsuura, Masato; Miki, Atsuya; Tanito, Masaki; Mizoue, Shiro; Mori, Kazuhiko; Suzuki, Katsuyoshi; Yamashita, Takehiro; Kashiwagi, Kenji; Hirasawa, Kazunori; Shoji, Nobuyuki; Asaoka, Ryo

2017-12-01

To investigate the usefulness of the Octopus (Haag-Streit) EyeSuite's cluster trend analysis in glaucoma. Ten visual fields (VFs) with the Humphrey Field Analyzer (Carl Zeiss Meditec), spanning 7.7 years on average, were obtained from 728 eyes of 475 primary open angle glaucoma patients. Mean total deviation (mTD) trend analysis and EyeSuite's cluster trend analysis were performed on various series of VFs (from 1st to 10th: VF1-10 to 6th to 10th: VF6-10). The results of the cluster-based trend analysis, based on different lengths of VF series, were compared against mTD trend analysis. Cluster-based trend analysis and mTD trend analysis results were significantly associated in all clusters and with all lengths of VF series. Between 21.2% and 45.9% (depending on VF series length and location) of clusters were deemed to progress when the mTD trend analysis suggested no progression. On the other hand, 4.8% of eyes were observed to progress using the mTD trend analysis when cluster trend analysis suggested no progression in any two (or more) clusters. Whole field trend analysis can miss local VF progression. Cluster trend analysis appears as robust as mTD trend analysis and useful to assess both sectorial and whole field progression. Cluster-based trend analyses, in particular the definition of two or more progressing clusters, may help clinicians to detect glaucomatous progression in a timelier manner than using a whole field trend analysis, without significantly compromising specificity. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Response to "Comparison and Evaluation of Clustering Algorithms for Tandem Mass Spectra".

PubMed

Griss, Johannes; Perez-Riverol, Yasset; The, Matthew; Käll, Lukas; Vizcaíno, Juan Antonio

2018-05-04

In the recent benchmarking article entitled "Comparison and Evaluation of Clustering Algorithms for Tandem Mass Spectra", Rieder et al. compared several different approaches to cluster MS/MS spectra. While we certainly recognize the value of the manuscript, here, we report some shortcomings detected in the original analyses. First, for most analyses, the authors clustered only single MS/MS runs. In one of the reported analyses, three MS/MS runs were processed together, which already led to computational performance issues in many of the tested approaches. This fact highlights the difficulties of using many of the tested algorithms on the average proteomics data sets produced nowadays. Second, the authors only processed identified spectra when merging MS runs. Thereby, all unidentified spectra that are of lower quality were already removed from the data set and could not influence the clustering results. Next, we found that the authors did not analyze the effect of chimeric spectra on the clustering results. In our analysis, we found that 3% of the spectra in the used data sets were chimeric, and this had marked effects on the behavior of the different clustering algorithms tested. Finally, the authors' choice to evaluate the MS-Cluster and spectra-cluster algorithms using a precursor tolerance of 5 Da for high-resolution Orbitrap data only was, in our opinion, not adequate to assess the performance of MS/MS clustering approaches.
[Study of the clinical phenotype of symptomatic chronic airways disease by hierarchical cluster analysis and two-step cluster analyses].

PubMed

Ning, P; Guo, Y F; Sun, T Y; Zhang, H S; Chai, D; Li, X M

2016-09-01

To study the distinct clinical phenotype of chronic airway diseases by hierarchical cluster analysis and two-step cluster analysis. A population sample of adult patients in Donghuamen community, Dongcheng district and Qinghe community, Haidian district, Beijing from April 2012 to January 2015, who had wheeze within the last 12 months, underwent detailed investigation, including a clinical questionnaire, pulmonary function tests, total serum IgE levels, blood eosinophil level and a peak flow diary. Nine variables were chosen as evaluating parameters, including pre-salbutamol forced expired volume in one second(FEV1)/forced vital capacity(FVC) ratio, pre-salbutamol FEV1, percentage of post-salbutamol change in FEV1, residual capacity, diffusing capacity of the lung for carbon monoxide/alveolar volume adjusted for haemoglobin level, peak expiratory flow(PEF) variability, serum IgE level, cumulative tobacco cigarette consumption (pack-years) and respiratory symptoms (cough and expectoration). Subjects' different clinical phenotype by hierarchical cluster analysis and two-step cluster analysis was identified. (1) Four clusters were identified by hierarchical cluster analysis. Cluster 1 was chronic bronchitis in smokers with normal pulmonary function. Cluster 2 was chronic bronchitis or mild chronic obstructive pulmonary disease (COPD) patients with mild airflow limitation. Cluster 3 included COPD patients with heavy smoking, poor quality of life and severe airflow limitation. Cluster 4 recognized atopic patients with mild airflow limitation, elevated serum IgE and clinical features of asthma. 
Significant differences were revealed regarding pre-salbutamol FEV1/FVC%, pre-salbutamol FEV1% pred, post-salbutamol change in FEV1%, maximal mid-expiratory flow curve(MMEF)% pred, carbon monoxide diffusing capacity per liter of alveolar(DLCO)/(VA)% pred, residual volume(RV)% pred, total serum IgE level, smoking history (pack-years), St.George's respiratory questionnaire(SGRQ) score, acute exacerbation in the past one year, PEF variability and allergic dermatitis (P<0.05). (2) Four clusters were also identified by two-step cluster analysis, as follows: cluster 1, COPD patients with moderate to severe airflow limitation; cluster 2, asthma and COPD patients with heavy smoking, airflow limitation and increased airways reversibility; cluster 3, patients having less smoking and normal pulmonary function with wheezing but no chronic cough; cluster 4, chronic bronchitis patients with normal pulmonary function and chronic cough. Significant differences were revealed regarding gender distribution, respiratory symptoms, pre-salbutamol FEV1/FVC%, pre-salbutamol FEV1% pred, post-salbutamol change in FEV1%, MMEF% pred, DLCO/VA% pred, RV% pred, PEF variability, total serum IgE level, cumulative tobacco cigarette consumption (pack-years), and SGRQ score (P<0.05). By different cluster analyses, distinct clinical phenotypes of chronic airway diseases are identified. Thus, these phenotypes may guide doctors in providing individualized treatments.
Missing continuous outcomes under covariate dependent missingness in cluster randomised trials

PubMed Central

Diaz-Ordaz, Karla; Bartlett, Jonathan W

2016-01-01

Attrition is a common occurrence in cluster randomised trials which leads to missing outcome data. Two approaches for analysing such trials are cluster-level analysis and individual-level analysis. This paper compares the performance of unadjusted cluster-level analysis, baseline covariate adjusted cluster-level analysis and linear mixed model analysis, under baseline covariate dependent missingness in continuous outcomes, in terms of bias, average estimated standard error and coverage probability. The methods of complete records analysis and multiple imputation are used to handle the missing outcome data. We considered four scenarios, with the missingness mechanism and baseline covariate effect on outcome either the same or different between intervention groups. We show that both unadjusted cluster-level analysis and baseline covariate adjusted cluster-level analysis give unbiased estimates of the intervention effect only if both intervention groups have the same missingness mechanisms and there is no interaction between baseline covariate and intervention group. Linear mixed model and multiple imputation give unbiased estimates under all four considered scenarios, provided that an interaction of intervention and baseline covariate is included in the model when appropriate. Cluster mean imputation has been proposed as a valid approach for handling missing outcomes in cluster randomised trials. We show that cluster mean imputation only gives unbiased estimates when missingness mechanism is the same between the intervention groups and there is no interaction between baseline covariate and intervention group. Multiple imputation shows overcoverage for small number of clusters in each intervention group. PMID:27177885
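For readers unfamiliar with the term, an unadjusted cluster-level analysis can be sketched as follows: summarise each cluster by the mean of its observed outcomes, then compare the arms using those cluster means. The sketch assumes complete records analysis (missing outcomes are simply absent from the lists) and a pooled-variance standard error. The arm labels, function name, and data are hypothetical and are not taken from the paper.

```python
from math import sqrt
from statistics import mean, variance

def cluster_level_estimate(data):
    """Unadjusted cluster-level analysis of a two-arm cluster randomised trial.

    data: mapping arm -> list of clusters, each a list of observed outcomes.
    Returns (difference in the mean of cluster means, pooled standard error).
    """
    # Step 1: reduce every cluster to a single summary, its observed mean.
    summaries = {
        arm: [mean(cluster) for cluster in clusters if cluster]
        for arm, clusters in data.items()
    }
    treat, control = summaries["intervention"], summaries["control"]

    # Step 2: compare arms on the cluster means (two-sample, pooled variance).
    diff = mean(treat) - mean(control)
    n1, n0 = len(treat), len(control)
    pooled = ((n1 - 1) * variance(treat) + (n0 - 1) * variance(control)) / (n1 + n0 - 2)
    return diff, sqrt(pooled * (1 / n1 + 1 / n0))

# Toy trial (hypothetical): three clusters per arm with unequal observed sizes.
data = {
    "intervention": [[4.0, 6.0], [7.0, 7.0], [9.0]],
    "control": [[2.0, 4.0], [5.0], [6.0, 8.0]],
}
diff, se = cluster_level_estimate(data)
```

Because the analysis uses only observed outcomes, its validity depends on the missingness mechanism, which is the question the abstract examines.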
Missing continuous outcomes under covariate dependent missingness in cluster randomised trials.

PubMed

Hossain, Anower; Diaz-Ordaz, Karla; Bartlett, Jonathan W

2017-06-01

Attrition is a common occurrence in cluster randomised trials which leads to missing outcome data. Two approaches for analysing such trials are cluster-level analysis and individual-level analysis. This paper compares the performance of unadjusted cluster-level analysis, baseline covariate adjusted cluster-level analysis and linear mixed model analysis, under baseline covariate dependent missingness in continuous outcomes, in terms of bias, average estimated standard error and coverage probability. The methods of complete records analysis and multiple imputation are used to handle the missing outcome data. We considered four scenarios, with the missingness mechanism and baseline covariate effect on outcome either the same or different between intervention groups. We show that both unadjusted cluster-level analysis and baseline covariate adjusted cluster-level analysis give unbiased estimates of the intervention effect only if both intervention groups have the same missingness mechanisms and there is no interaction between baseline covariate and intervention group. Linear mixed model and multiple imputation give unbiased estimates under all four considered scenarios, provided that an interaction of intervention and baseline covariate is included in the model when appropriate. Cluster mean imputation has been proposed as a valid approach for handling missing outcomes in cluster randomised trials. We show that cluster mean imputation only gives unbiased estimates when missingness mechanism is the same between the intervention groups and there is no interaction between baseline covariate and intervention group. Multiple imputation shows overcoverage for small number of clusters in each intervention group.
Clusters of midlife women by physical activity and their racial/ethnic differences.

PubMed

Im, Eun-Ok; Ko, Young; Chee, Eunice; Chee, Wonshik; Mao, Jun James

2017-04-01

The purpose of this study was to identify clusters of midlife women by physical activity and to determine racial/ethnic differences in physical activities in each cluster. This was a secondary analysis of the data from 542 women (157 non-Hispanic [NH] Whites, 127 Hispanics, 135 NH African Americans, and 123 NH Asian) in a larger Internet study on midlife women's attitudes toward physical activity. The instruments included the Barriers to Health Activities Scale, the Physical Activity Assessment Inventory, the Questions on Attitudes toward Physical Activity, Subjective Norm, Perceived Behavioral Control, and Behavioral Intention, and the Kaiser Physical Activity Survey. The data were analyzed using hierarchical cluster analyses, analysis of variance, and multinomial logistic analyses. A three-cluster solution was adopted: cluster 1 (high active living and sports/exercise activity group; 48%), cluster 2 (high household/caregiving and occupational activity group; 27%), and cluster 3 (low active living and sports/exercise activity group; 26%). There were significant racial/ethnic differences in occupational activities of clusters 1 and 3 (all P < 0.01). Compared with cluster 1, cluster 2 tended to have lower family income, less access to health care, higher unemployment, higher perceived barriers scores, and lower social influences scores (all P < 0.01). Compared with cluster 1, cluster 3 tended to have greater obesity, less access to health care, higher perceived barriers scores, more negative attitudes toward physical activity, and lower self-efficacy scores (all P < 0.01). Midlife women's unique patterns of physical activity and their associated factors need to be considered in future intervention development.
Integrating Data Clustering and Visualization for the Analysis of 3D Gene Expression Data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Data Analysis and Visualization; International Research Training Group "Visualization of Large and Unstructured Data Sets," University of Kaiserslautern, Germany; Computational Research Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720, USA

2008-05-12

The recent development of methods for extracting precise measurements of spatial gene expression patterns from three-dimensional (3D) image data opens the way for new analyses of the complex gene regulatory networks controlling animal development. We present an integrated visualization and analysis framework that supports user-guided data clustering to aid exploration of these new complex datasets. The interplay of data visualization and clustering-based data classification leads to improved visualization and enables a more detailed analysis than previously possible. We discuss (i) integration of data clustering and visualization into one framework; (ii) application of data clustering to 3D gene expression data; (iii) evaluation of the number of clusters k in the context of 3D gene expression clustering; and (iv) improvement of overall analysis quality via dedicated post-processing of clustering results based on visualization. We discuss the use of this framework to objectively define spatial pattern boundaries and temporal profiles of genes and to analyze how mRNA patterns are controlled by their regulatory transcription factors.
Use of Spatial Epidemiology and Hot Spot Analysis to Target Women Eligible for Prenatal Women, Infants, and Children Services

PubMed Central

Krawczyk, Christopher; Gradziel, Pat; Geraghty, Estella M.

2014-01-01

Objectives. We used a geographic information system and cluster analyses to determine locations in need of enhanced Special Supplemental Nutrition Program for Women, Infants, and Children (WIC) Program services. Methods. We linked documented births in the 2010 California Birth Statistical Master File with the 2010 data from the WIC Integrated Statewide Information System. Analyses focused on the density of pregnant women who were eligible for but not receiving WIC services in California’s 7049 census tracts. We used incremental spatial autocorrelation and hot spot analyses to identify clusters of WIC-eligible nonparticipants. Results. We detected clusters of census tracts with higher-than-expected densities, compared with the state mean density of WIC-eligible nonparticipants, in 21 of 58 (36.2%) California counties (P < .05). In subsequent county-level analyses, we located neighborhood-level clusters of higher-than-expected densities of eligible nonparticipants in Sacramento, San Francisco, Fresno, and Los Angeles Counties (P < .05). Conclusions. Hot spot analyses provided a rigorous and objective approach to determine the locations of statistically significant clusters of WIC-eligible nonparticipants. Results helped inform WIC program and funding decisions, including the opening of new WIC centers, and offered a novel approach for targeting public health services. PMID:24354821
Ecological tolerances of Miocene larger benthic foraminifera from Indonesia

NASA Astrophysics Data System (ADS)

Novak, Vibor; Renema, Willem

2018-01-01

To provide a comprehensive palaeoenvironmental reconstruction based on larger benthic foraminifera (LBF), a quantitative analysis of their assemblage composition is needed. Besides microfacies analysis which includes environmental preferences of foraminiferal taxa, statistical analyses should also be employed. Therefore, detrended correspondence analysis and cluster analysis were performed on relative abundance data of identified LBF assemblages deposited in mixed carbonate-siliciclastic (MCS) systems and blue-water (BW) settings. Studied MCS system localities include ten sections from the central part of the Kutai Basin in East Kalimantan, ranging from late Burdigalian to Serravallian age. The BW samples were collected from eleven sections of the Bulu Formation on Central Java, dated as Serravallian. Results from detrended correspondence analysis reveal significant differences between these two environmental settings. Cluster analysis produced five clusters of samples; clusters 1 and 2 comprise dominantly MCS samples, clusters 3 and 4 with dominance of BW samples, and cluster 5 showing a mixed composition with both MCS and BW samples. The results of cluster analysis were afterwards subjected to indicator species analysis resulting in the interpretation that generated three groups among LBF taxa: typical assemblage indicators, regularly occurring taxa and rare taxa. By interpreting the results of detrended correspondence analysis, cluster analysis and indicator species analysis, along with environmental preferences of identified LBF taxa, a palaeoenvironmental model is proposed for the distribution of LBF in Miocene MCS systems and adjacent BW settings of Indonesia.
Cluster signal-to-noise analysis for evaluation of the information content in an image.

PubMed

Weerawanich, Warangkana; Shimizu, Mayumi; Takeshita, Yohei; Okamura, Kazutoshi; Yoshida, Shoko; Yoshiura, Kazunori

2018-01-01

(1) To develop an observer-free method of analysing image quality related to the observer performance in the detection task and (2) to analyse observer behaviour patterns in the detection of small mass changes in cone-beam CT images. 13 observers detected holes in a Teflon phantom in cone-beam CT images. Using the same images, we developed a new method, cluster signal-to-noise analysis, to detect the holes by applying various cut-off values using ImageJ and reconstructing cluster signal-to-noise curves. We then evaluated the correlation between cluster signal-to-noise analysis and the observer performance test. We measured the background noise in each image to evaluate the relationship with false positive rates (FPRs) of the observers. Correlations between mean FPRs and intra- and interobserver variations were also evaluated. Moreover, we calculated true positive rates (TPRs) and accuracies from background noise and evaluated their correlations with TPRs from observers. Cluster signal-to-noise curves were derived in cluster signal-to-noise analysis. They yield the detection of signals (true holes) related to noise (false holes). This method correlated highly with the observer performance test (R² = 0.9296). In noisy images, increasing background noise resulted in higher FPRs and larger intra- and interobserver variations. TPRs and accuracies calculated from background noise had high correlation with actual TPRs from observers; R² was 0.9244 and 0.9338, respectively. Cluster signal-to-noise analysis can simulate the detection performance of observers and thus replace the observer performance test in the evaluation of image quality. Erroneous decision-making increased with increasing background noise.
Minimum number of clusters and comparison of analysis methods for cross sectional stepped wedge cluster randomised trials with binary outcomes: A simulation study.

PubMed

Barker, Daniel; D'Este, Catherine; Campbell, Michael J; McElduff, Patrick

2017-03-09

Stepped wedge cluster randomised trials frequently involve a relatively small number of clusters. The most common frameworks used to analyse data from these types of trials are generalised estimating equations and generalised linear mixed models. A topic of much research into these methods has been their application to cluster randomised trial data and, in particular, the number of clusters required to make reasonable inferences about the intervention effect. However, for stepped wedge trials, which have been claimed by many researchers to have a statistical power advantage over the parallel cluster randomised trial, the minimum number of clusters required has not been investigated. We conducted a simulation study where we considered the most commonly used methods suggested in the literature to analyse cross-sectional stepped wedge cluster randomised trial data. We compared the per cent bias, the type I error rate and power of these methods in a stepped wedge trial setting with a binary outcome, where there are few clusters available and when the appropriate adjustment for a time trend is made, which by design may be confounding the intervention effect. We found that the generalised linear mixed modelling approach is the most consistent when few clusters are available. We also found that none of the common analysis methods for stepped wedge trials were both unbiased and maintained a 5% type I error rate when there were only three clusters. Of the commonly used analysis approaches, we recommend the generalised linear mixed model for small stepped wedge trials with binary outcomes. We also suggest that in a stepped wedge design with three steps, at least two clusters be randomised at each step, to ensure that the intervention effect estimator maintains the nominal 5% significance level and is also reasonably unbiased.
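The design recommendation in this abstract (three steps, at least two clusters randomised at each step) can be pictured as an exposure schedule. The following is a minimal sketch under stated assumptions, not the authors' simulation code: the function name `stepped_wedge_schedule` and the layout (one baseline period plus one period per step) are hypothetical, and the random allocation of clusters to steps is left out.

```python
def stepped_wedge_schedule(n_steps, clusters_per_step):
    """Return one row per cluster holding 0/1 exposure flags for each period.

    Assumed layout: period 0 is a baseline in which no cluster is exposed,
    followed by one period per step.
    """
    n_periods = n_steps + 1
    schedule = []
    for step in range(1, n_steps + 1):
        for _ in range(clusters_per_step):
            # A cluster that crosses over at `step` is unexposed before that
            # period and exposed from it onwards.
            row = [1 if period >= step else 0 for period in range(n_periods)]
            schedule.append(row)
    return schedule


# Three steps with two clusters per step, as suggested in the abstract.
schedule = stepped_wedge_schedule(n_steps=3, clusters_per_step=2)
for row in schedule:
    print(row)
```

Each row could then serve as the intervention indicator for one cluster in a simulated data set, with period entered as a covariate to adjust for the time trend.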
KinFin: Software for Taxon-Aware Analysis of Clustered Protein Sequences.

PubMed

Laetsch, Dominik R; Blaxter, Mark L

2017-10-05

The field of comparative genomics is concerned with the study of similarities and differences between the information encoded in the genomes of organisms. A common approach is to define gene families by clustering protein sequences based on sequence similarity, and analyze protein cluster presence and absence in different species groups as a guide to biology. Due to the high dimensionality of these data, downstream analysis of protein clusters inferred from large numbers of species, or species with many genes, is nontrivial, and few solutions exist for transparent, reproducible, and customizable analyses. We present KinFin, a streamlined software solution capable of integrating data from common file formats and delivering aggregative annotation of protein clusters. KinFin delivers analyses based on systematic taxonomy of the species analyzed, or on user-defined groupings of taxa, for example, sets based on attributes such as life history traits, organismal phenotypes, or competing phylogenetic hypotheses. Results are reported through graphical and detailed text output files. We illustrate the utility of the KinFin pipeline by addressing questions regarding the biology of filarial nematodes, which include parasites of veterinary and medical importance. We resolve the phylogenetic relationships between the species and explore functional annotation of proteins in clusters in key lineages and between custom taxon sets, identifying gene families of interest. KinFin can easily be integrated into existing comparative genomic workflows, and promotes transparent and reproducible analysis of clustered protein data. Copyright © 2017 Laetsch and Blaxter.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hadgu, Teklu; Appel, Gordon John

Sandia National Laboratories (SNL) continued evaluation of total system performance assessment (TSPA) computing systems for the previously considered Yucca Mountain Project (YMP). This was done to maintain the operational readiness of the computing infrastructure (computer hardware and software) and knowledge capability for total system performance assessment (TSPA) type analysis, as directed by the National Nuclear Security Administration (NNSA), DOE 2010. This work is a continuation of the ongoing readiness evaluation reported in Lee and Hadgu (2014) and Hadgu et al. (2015). The TSPA computing hardware (CL2014) and storage system described in Hadgu et al. (2015) were used for the current analysis. One floating license of GoldSim with Versions 9.60.300, 10.5 and 11.1.6 was installed on the cluster head node, and its distributed processing capability was mapped on the cluster processors. Other supporting software was tested and installed to support the TSPA-type analysis on the server cluster. The current tasks included verification of the TSPA-LA uncertainty and sensitivity analyses, and preliminary upgrade of the TSPA-LA from Version 9.60.300 to the latest version 11.1. All the TSPA-LA uncertainty and sensitivity analyses modeling cases were successfully tested and verified for the model reproducibility on the upgraded 2014 server cluster (CL2014). The uncertainty and sensitivity analyses used TSPA-LA modeling cases output generated in FY15 based on GoldSim Version 9.60.300 documented in Hadgu et al. (2015). The model upgrade task successfully converted the Nominal Modeling case to GoldSim Version 11.1. Upgrade of the remaining modeling cases and distributed processing tasks will continue. The 2014 server cluster and supporting software systems are fully operational to support TSPA-LA type analysis.
Assessment of hybridization among wild and cultivated Vigna unguiculata subspecies revealed by arbitrarily primed polymerase chain reaction analysis

PubMed Central

Vijaykumar, Archana; Saini, Ajay; Jawali, Narendra

2012-01-01

Background and aims: Intra-species hybridization and incompletely homogenized ribosomal RNA repeat units have earlier been reported in 21 accessions of Vigna unguiculata from six subspecies using internal transcribed spacer (ITS) and 5S intergenic spacer (IGS) analyses. However, the relationships among these accessions were not clear from these analyses. We therefore assessed intra-species hybridization in the same set of accessions. Methodology: Arbitrarily primed polymerase chain reaction (AP-PCR) analysis was carried out using 12 primers. The PCR products were resolved on agarose gels and the DNA fragments were scored manually. Genetic relationships were inferred by TREECON software using unweighted paired group method with arithmetic averages (UPGMA) cluster analysis evaluated by bootstrapping and compared with previous analyses based on ITS and 5S IGS. Principal results: A total of 202 (86%) fragments were found to be polymorphic and used for generating a genetic distance matrix. Twenty-one V. unguiculata accessions were grouped into three main clusters. The cultivated subspecies (var. unguiculata) and most of its wild progenitors (var. spontanea) were placed in cluster I along with ssp. pubescens and ssp. stenophylla. Whereas var. spontanea were grouped with ssp. alba and ssp. tenuis accessions in cluster II, ssp. alba and ssp. baoulensis were included in cluster III. Close affinities of ssp. unguiculata, ssp. alba and ssp. tenuis suggested inter-subspecies hybridization. Conclusions: Multi-locus AP-PCR analysis reveals that intra-species hybridization is prevalent among V. unguiculata subspecies and suggests that grouping of accessions from two different subspecies is not solely due to the similarity in the ITS and 5S IGS regions but also due to other regions of the genome. PMID:22619698
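UPGMA, the clustering method named in this abstract, repeatedly merges the two groups with the smallest average pairwise distance. The sketch below is a plain-Python illustration of that rule, not the TREECON implementation and without bootstrapping; the function name `upgma`, the labels and the toy distance matrix are all hypothetical.

```python
def upgma(dist, labels):
    """Average-linkage agglomeration; returns merges as (left, right, distance)."""
    clusters = {i: [i] for i in range(len(labels))}   # cluster id -> member indices
    names = {i: labels[i] for i in range(len(labels))}
    merges = []
    next_id = len(labels)
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                a, b = ids[x], ids[y]
                # UPGMA distance: mean of all member-to-member distances.
                total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((names[a], names[b], d))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        names[next_id] = "(" + names[a] + "," + names[b] + ")"
        next_id += 1
    return merges


# Hypothetical genetic distances between four accessions.
labels = ["A", "B", "C", "D"]
dist = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
merges = upgma(dist, labels)
for left, right, d in merges:
    print(left, right, d)
```

In a dendrogram the branch height is conventionally drawn at half the merge distance; the sketch reports the merge distance itself.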
Analysis of genetic diversity in banana cultivars (Musa cvs.) from the South of Oman using AFLP markers and classification by phylogenetic, hierarchical clustering and principal component analyses

PubMed Central

Opara, Umezuruike Linus; Jacobson, Dan; Al-Saady, Nadiya Abubakar

2010-01-01

Banana is an important crop grown in Oman and there is a dearth of information on its genetic diversity to assist in crop breeding and improvement programs. This study employed amplified fragment length polymorphism (AFLP) to investigate the genetic variation in local banana cultivars from the southern region of Oman. Using 12 primer combinations, a total of 1094 bands were scored, of which 1012 were polymorphic. Eighty-two unique markers were identified, which revealed the distinct separation of the seven cultivars. The results obtained show that AFLP can be used to differentiate the banana cultivars. Further classification by phylogenetic, hierarchical clustering and principal component analyses showed significant differences between the clusters found with molecular markers and those clusters created by previous studies using morphological analysis. Based on the analytical results, a consensus dendrogram of the banana cultivars is presented. PMID:20443211
Characterizing Heterogeneity within Head and Neck Lesions Using Cluster Analysis of Multi-Parametric MRI Data.

PubMed

Borri, Marco; Schmidt, Maria A; Powell, Ceri; Koh, Dow-Mu; Riddell, Angela M; Partridge, Mike; Bhide, Shreerang A; Nutting, Christopher M; Harrington, Kevin J; Newbold, Katie L; Leach, Martin O

2015-01-01

To describe a methodology, based on cluster analysis, to partition multi-parametric functional imaging data into groups (or clusters) of similar functional characteristics, with the aim of characterizing functional heterogeneity within head and neck tumour volumes. To evaluate the performance of the proposed approach on a set of longitudinal MRI data, analysing the evolution of the obtained sub-sets with treatment. The cluster analysis workflow was applied to a combination of dynamic contrast-enhanced and diffusion-weighted imaging MRI data from a cohort of squamous cell carcinoma of the head and neck patients. Cumulative distributions of voxels, containing pre and post-treatment data and including both primary tumours and lymph nodes, were partitioned into k clusters (k = 2, 3 or 4). Principal component analysis and cluster validation were employed to investigate data composition and to independently determine the optimal number of clusters. The evolution of the resulting sub-regions with induction chemotherapy treatment was assessed relative to the number of clusters. The clustering algorithm was able to separate clusters which significantly reduced in voxel number following induction chemotherapy from clusters with a non-significant reduction. Partitioning with the optimal number of clusters (k = 4), determined with cluster validation, produced the best separation between reducing and non-reducing clusters. The proposed methodology was able to identify tumour sub-regions with distinct functional properties, independently separating clusters which were affected differently by treatment. This work demonstrates that unsupervised cluster analysis, with no prior knowledge of the data, can be employed to provide a multi-parametric characterization of functional heterogeneity within tumour volumes.
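The workflow in this abstract partitions data into k clusters for k = 2, 3 or 4 and uses cluster validation to choose k. The abstract does not name the algorithm or the validation index, so the sketch below assumes k-means with a deterministic farthest-point start and the mean silhouette width as one common choice; the two-feature toy points stand in for voxel parameters and are hypothetical.

```python
import math


def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


def farthest_point_init(points, k):
    centers = [points[0]]
    while len(centers) < k:
        # Next centre: the point whose nearest chosen centre is farthest away.
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, n_iter=50):
    centers = farthest_point_init(points, k)
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = (sum(m[0] for m in members) / len(members),
                              sum(m[1] for m in members) / len(members))
    return labels


def mean_silhouette(points, labels):
    k = max(labels) + 1
    total = 0.0
    for i, p in enumerate(points):
        mean_to = []
        for j in range(k):
            ds = [dist(p, q) for m, q in enumerate(points) if labels[m] == j and m != i]
            mean_to.append(sum(ds) / len(ds) if ds else None)
        a = mean_to[labels[i]]  # mean distance within own cluster
        others = [d for j, d in enumerate(mean_to) if j != labels[i] and d is not None]
        if a is None or not others:
            continue  # singleton cluster contributes a silhouette of 0
        b = min(others)  # mean distance to the nearest other cluster
        total += (b - a) / max(a, b)
    return total / len(points)


# Hypothetical data: four tight groups in a two-parameter space.
points = [(0, 0), (0.5, 0), (0, 0.5), (10, 0), (10.5, 0), (10, 0.5),
          (0, 6), (0.5, 6), (0, 6.5), (10, 6), (10.5, 6), (10, 6.5)]
scores = {k: mean_silhouette(points, kmeans(points, k)) for k in (2, 3, 4)}
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

On real imaging data the number of voxels makes the quadratic silhouette loop expensive, so a subsample or a library implementation would likely be preferred.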
Ecological characteristics of Simulium breeding sites in West Africa.

PubMed

Cheke, Robert A; Young, Stephen; Garms, Rolf

2017-03-01

Twenty-nine taxa of Simulium were identified amongst 527 collections of larvae and pupae from untreated rivers and streams in Liberia (362 collections in 1967-71 & 1989), Togo (125 in 1979-81), Benin (35 in 1979-81) and Ghana (5 in 1980-81). Presence or absence of associations between different taxa were used to group them into six clusters using Ward agglomerative hierarchical cluster analysis. Environmental data associated with the pre-imaginal habitats were then analysed in relation to the six clusters by one way ANOVA. The results revealed significant effects in determining the clusters of maximum river width (all P<0.001 unless stated otherwise), water temperature, dry bulb air temperature, relative humidity, altitude, type of water (on a range from trickle to large river), water level, slope, current, vegetation, light conditions, discharge, length of breeding area, environs, terrain, river bed type (P<0.01), and the supports to which the insects were attached (P<0.01). When four non-significant contributors (wet bulb temperature, river features, height of waterfall and depth) were excluded and the reduced data-set analysed by principal components analysis (PCA), the first two principal components (PCs) accounted for 87% of the variance, with geographical features dominant in PC1 and hydrological characteristics in PC2. The analyses also revealed the ecological characteristics of each taxon's pre-imaginal habitats, which are discussed with particular reference to members of the Simulium damnosum species complex, whose breeding site distributions were further analysed by canonical correspondence analysis (CCA), a method also applied to the data on non-vector species. Copyright © 2016 Elsevier B.V. All rights reserved.
Cluster analysis of European Y-chromosomal STR haplotypes using the discrete Laplace method.

PubMed

Andersen, Mikkel Meyer; Eriksen, Poul Svante; Morling, Niels

2014-07-01

The European Y-chromosomal short tandem repeat (STR) haplotype distribution has previously been analysed in various ways. Here, we introduce a new way of analysing population substructure using a new method based on clustering within the discrete Laplace exponential family that models the probability distribution of the Y-STR haplotypes. Creating a consistent statistical model of the haplotypes enables us to perform a wide range of analyses. Previously, haplotype frequency estimation using the discrete Laplace method has been validated. In this paper we investigate how the discrete Laplace method can be used for cluster analysis to further validate the discrete Laplace method. A very important practical fact is that the calculations can be performed on a normal computer. We identified two sub-clusters of the Eastern and Western European Y-STR haplotypes similar to results of previous studies. We also compared pairwise distances (between geographically separated samples) with those obtained using the AMOVA method and found good agreement. Further analyses that are impossible with AMOVA were made using the discrete Laplace method: analysis of the homogeneity in two different ways and calculating marginal STR distributions. We found that the Y-STR haplotypes from e.g. Finland were relatively homogeneous as opposed to the relatively heterogeneous Y-STR haplotypes from e.g. Lublin, Eastern Poland and Berlin, Germany. We demonstrated that the observed distributions of alleles at each locus were similar to the expected ones. We also compared pairwise distances between geographically separated samples from Africa with those obtained using the AMOVA method and found good agreement. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
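For readers unfamiliar with the distribution behind this method, the sketch below evaluates the discrete Laplace probability mass function, P(X = x) = (1 - p)/(1 + p) · p^|x - mu|, and combines it into a haplotype probability. The mixture form (cluster weights times a product over loci, with loci treated as independent within a cluster) is an assumption about how such a model is typically set up, and every parameter value is hypothetical; it is not taken from the paper's fitted model.

```python
def discrete_laplace_pmf(x, mu, p):
    """P(X = x) for a discrete Laplace distribution centred at integer mu, 0 < p < 1."""
    return (1 - p) / (1 + p) * p ** abs(x - mu)


def haplotype_probability(haplotype, clusters):
    """Mixture probability of a haplotype (tuple of allele repeat counts).

    `clusters` is a list of (weight, centres, dispersions); loci are assumed
    independent within a cluster (an assumption of this sketch).
    """
    total = 0.0
    for weight, centres, dispersions in clusters:
        prob = weight
        for allele, mu, p in zip(haplotype, centres, dispersions):
            prob *= discrete_laplace_pmf(allele, mu, p)
        total += prob
    return total


# Hypothetical two-locus, two-cluster model.
clusters = [
    (0.6, (14, 13), (0.2, 0.3)),
    (0.4, (16, 12), (0.25, 0.25)),
]
prob = haplotype_probability((14, 13), clusters)
print(prob)
```

Because the probability mass decays geometrically with the distance from the cluster centre, haplotypes a few repeat units away from every centre receive very small probabilities.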
Cluster Analysis of Adolescent Blogs

ERIC Educational Resources Information Center

Liu, Eric Zhi-Feng; Lin, Chun-Hung; Chen, Feng-Yi; Peng, Ping-Chuan

2012-01-01

Emerging web applications and networking systems such as blogs have become popular, and they offer unique opportunities and environments for learners, especially for adolescent learners. This study attempts to explore the writing styles and genres used by adolescents in their blogs by employing content, factor, and cluster analyses. Factor…
An integrated bioinformatics approach to improve two-color microarray quality-control: impact on biological conclusions.

PubMed

van Haaften, Rachel I M; Luceri, Cristina; van Erk, Arie; Evelo, Chris T A

2009-06-01

Omics technology used for large-scale measurements of gene expression is rapidly evolving. This work pointed out the need for extensive bioinformatics analyses for array quality assessment before and after gene expression clustering and pathway analysis. A study focused on the effect of red wine polyphenols on rat colon mucosa was used to test the impact of quality control and normalisation steps on the biological conclusions. The integration of data visualization, pathway analysis and clustering revealed an artifact problem that was solved with an adapted normalisation. We propose a possible point-by-point standard analysis procedure, based on a combination of clustering and data visualization for the analysis of microarray data.
Paternal age related schizophrenia (PARS): Latent subgroups detected by k-means clustering analysis.

PubMed

Lee, Hyejoo; Malaspina, Dolores; Ahn, Hongshik; Perrin, Mary; Opler, Mark G; Kleinhaus, Karine; Harlap, Susan; Goetz, Raymond; Antonius, Daniel

2011-05-01

Paternal age related schizophrenia (PARS) has been proposed as a subgroup of schizophrenia with distinct etiology, pathophysiology and symptoms. This study uses a k-means clustering analysis approach to generate hypotheses about differences between PARS and other cases of schizophrenia. We studied PARS (operationally defined as not having any family history of schizophrenia among first and second-degree relatives and fathers' age at birth ≥ 35 years) in a series of schizophrenia cases recruited from a research unit. Data were available on demographic variables, symptoms (Positive and Negative Syndrome Scale; PANSS), cognitive tests (Wechsler Adult Intelligence Scale-Revised; WAIS-R) and olfaction (University of Pennsylvania Smell Identification Test; UPSIT). We conducted a series of k-means clustering analyses to identify clusters of cases containing high concentrations of PARS. Two analyses generated clusters with high concentrations of PARS cases. The first analysis (N=136; PARS=34) revealed a cluster containing 83% PARS cases, in which the patients showed a significant discrepancy between verbal and performance intelligence. The mean paternal and maternal ages were 41 and 33, respectively. The second analysis (N=123; PARS=30) revealed a cluster containing 71% PARS cases, of which 93% were females; the mean age of onset of psychosis, at 17.2, was significantly early. These results strengthen the evidence that PARS cases differ from other patients with schizophrenia. Hypothesis-generating findings suggest that features of PARS may include a discrepancy between verbal and performance intelligence, and in females, an early age of onset. These findings provide a rationale for separating these phenotypes from others in future clinical, genetic and pathophysiologic studies of schizophrenia and in considering responses to treatment. Copyright © 2011 Elsevier B.V. All rights reserved.
Comprehensive Molecular Characterization of Muscle-Invasive Bladder Cancer.

PubMed

Robertson, A Gordon; Kim, Jaegil; Al-Ahmadie, Hikmat; Bellmunt, Joaquim; Guo, Guangwu; Cherniack, Andrew D; Hinoue, Toshinori; Laird, Peter W; Hoadley, Katherine A; Akbani, Rehan; Castro, Mauro A A; Gibb, Ewan A; Kanchi, Rupa S; Gordenin, Dmitry A; Shukla, Sachet A; Sanchez-Vega, Francisco; Hansel, Donna E; Czerniak, Bogdan A; Reuter, Victor E; Su, Xiaoping; de Sa Carvalho, Benilton; Chagas, Vinicius S; Mungall, Karen L; Sadeghi, Sara; Pedamallu, Chandra Sekhar; Lu, Yiling; Klimczak, Leszek J; Zhang, Jiexin; Choo, Caleb; Ojesina, Akinyemi I; Bullman, Susan; Leraas, Kristen M; Lichtenberg, Tara M; Wu, Catherine J; Schultz, Nicholaus; Getz, Gad; Meyerson, Matthew; Mills, Gordon B; McConkey, David J; Weinstein, John N; Kwiatkowski, David J; Lerner, Seth P

2017-10-19

We report a comprehensive analysis of 412 muscle-invasive bladder cancers characterized by multiple TCGA analytical platforms. Fifty-eight genes were significantly mutated, and the overall mutational load was associated with APOBEC-signature mutagenesis. Clustering by mutation signature identified a high-mutation subset with 75% 5-year survival. mRNA expression clustering refined prior clustering analyses and identified a poor-survival "neuronal" subtype in which the majority of tumors lacked small cell or neuroendocrine histology. Clustering by mRNA, long non-coding RNA (lncRNA), and miRNA expression converged to identify subsets with differential epithelial-mesenchymal transition status, carcinoma in situ scores, histologic features, and survival. Our analyses identified 5 expression subtypes that may stratify response to different treatments. Copyright © 2017 Elsevier Inc. All rights reserved.
[Cluster analysis applicability to fitness evaluation of cosmonauts on long-term missions of the International space station].

PubMed

Egorov, A D; Stepantsov, V I; Nosovskiĭ, A M; Shipov, A A

2009-01-01

Cluster analysis was applied to evaluate locomotion training (running and running intermingled with walking) of 13 cosmonauts on long-term ISS missions by the parameters of duration (min), distance (m) and intensity (km/h). Based on the results of analyses, the cosmonauts were distributed into three steady groups of 2, 5 and 6 persons. Distance and speed showed a statistical rise (p < 0.03) from group 1 to group 3. Duration of physical locomotion training was not statistically different in the groups (p = 0.125). Therefore, cluster analysis is an adequate method of evaluating fitness of cosmonauts on long-term missions.
Is It Feasible to Identify Natural Clusters of TSC-Associated Neuropsychiatric Disorders (TAND)?

PubMed

Leclezio, Loren; Gardner-Lubbe, Sugnet; de Vries, Petrus J

2018-04-01

Tuberous sclerosis complex (TSC) is a genetic disorder with multisystem involvement. The lifetime prevalence of TSC-Associated Neuropsychiatric Disorders (TAND) is in the region of 90% in an apparently unique, individual pattern. This "uniqueness" poses significant challenges for diagnosis, psycho-education, and intervention planning. To date, no studies have explored whether there may be natural clusters of TAND. The purpose of this feasibility study was (1) to investigate the practicability of identifying natural TAND clusters, and (2) to identify appropriate multivariate data analysis techniques for larger-scale studies. TAND Checklist data were collected from 56 individuals with a clinical diagnosis of TSC (n = 20 from South Africa; n = 36 from Australia). Using R, the open-source statistical platform, mean squared contingency coefficients were calculated to produce a correlation matrix, and various cluster analyses and exploratory factor analysis were examined. Ward's method rendered six TAND clusters with good face validity and significant convergence with a six-factor exploratory factor analysis solution. The "bottom-up" data-driven strategies identified a "scholastic" cluster of TAND manifestations, an "autism spectrum disorder-like" cluster, a "dysregulated behavior" cluster, a "neuropsychological" cluster, a "hyperactive/impulsive" cluster, and a "mixed/mood" cluster. These feasibility results suggest that a combination of cluster analysis and exploratory factor analysis methods may be able to identify clinically meaningful natural TAND clusters. Findings require replication and expansion in larger datasets, and could include quantification of cluster or factor scores at an individual level. Copyright © 2018 Elsevier Inc. All rights reserved.
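The correlation matrix described in this abstract is built from pairwise associations between yes/no checklist items. The sketch below assumes that the mean squared contingency coefficient corresponds to the phi coefficient of a 2×2 table, which is the usual reading for two binary variables; the study itself used R, and the item names and responses here are hypothetical.

```python
import math


def phi_coefficient(x, y):
    """Phi coefficient between two equal-length lists of 0/1 responses."""
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    n00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    # A constant item has no defined association; return 0.0 by convention here.
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0


def phi_matrix(items):
    """Symmetric matrix of phi coefficients; `items` maps item name -> responses."""
    names = list(items)
    return {a: {b: phi_coefficient(items[a], items[b]) for b in names} for a in names}


# Hypothetical checklist responses from four individuals.
items = {
    "anxiety": [1, 1, 0, 0],
    "low_mood": [1, 1, 0, 0],
    "overactivity": [1, 0, 1, 0],
}
matrix = phi_matrix(items)
print(matrix["anxiety"]["low_mood"], matrix["anxiety"]["overactivity"])
```

A matrix of this kind can be converted to dissimilarities (for example 1 minus the coefficient) before being passed to a hierarchical method such as Ward's.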
A scoping review of spatial cluster analysis techniques for point-event data.

PubMed

Fritz, Charles E; Schuurman, Nadine; Robertson, Colin; Lear, Scott

2013-05-01

Spatial cluster analysis is a uniquely interdisciplinary endeavour, and so it is important to communicate and disseminate ideas, innovations, best practices and challenges across practitioners, applied epidemiology researchers and spatial statisticians. In this research we conducted a scoping review to systematically search peer-reviewed journal databases for research that has employed spatial cluster analysis methods on individual-level, address location, or x and y coordinate derived data. To illustrate the thematic issues raised by our results, methods were tested using a dataset where known clusters existed. Point pattern methods, spatial clustering and cluster detection tests, and a locally weighted spatial regression model were most commonly used for individual-level, address location data (n = 29). The spatial scan statistic was the most popular method for address location data (n = 19). Six themes were identified relating to the application of spatial cluster analysis methods and subsequent analyses, which we recommend that researchers consider: exploratory analysis, visualization, spatial resolution, aetiology, scale and spatial weights. It is our intention that researchers seeking direction for using spatial cluster analysis methods consider the caveats and strengths of each approach, but also explore the numerous other methods available for this type of analysis. Applied spatial epidemiology researchers and practitioners should give special consideration to applying multiple tests to a dataset. Future research should focus on developing frameworks for selecting appropriate methods and the corresponding spatial weighting schemes.
Characterization of Oxygen Defect Clusters in UO2+x Using Neutron Scattering and PDF Analysis.

PubMed

Ma, Yue; Garcia, Philippe; Lechelle, Jacques; Miard, Audrey; Desgranges, Lionel; Baldinozzi, Gianguido; Simeone, David; Fischer, Henry E

2018-06-18

In hyper-stoichiometric uranium oxide, both neutron diffraction work and, more recently, theoretical analyses report the existence of clusters such as the 2:2:2 cluster, comprising two anion vacancies and two types of anion interstitials. However, little is known about whether there exists a region of low deviation-from-stoichiometry in which defects remain isolated, or indeed whether at high deviation-from-stoichiometry defect clusters prevail that contain more excess oxygen atoms than the di-interstitial cluster. In this study, we report pair distribution function (PDF) analyses of UO2 and UO2+x (x ≈ 0.007 and x ≈ 0.16) samples obtained from high-temperature in situ neutron scattering experiments. PDF refinement for the lower deviation from stoichiometry sample suggests the system is too dilute to differentiate between isolated defects and di-interstitial clusters. For the UO2.16 sample, several defect structures are tested, and it is found that the data are best represented assuming the presence of center-occupied cuboctahedra.
Characterizing Heterogeneity within Head and Neck Lesions Using Cluster Analysis of Multi-Parametric MRI Data

PubMed Central

Borri, Marco; Schmidt, Maria A.; Powell, Ceri; Koh, Dow-Mu; Riddell, Angela M.; Partridge, Mike; Bhide, Shreerang A.; Nutting, Christopher M.; Harrington, Kevin J.; Newbold, Katie L.; Leach, Martin O.

2015-01-01

Purpose: To describe a methodology, based on cluster analysis, to partition multi-parametric functional imaging data into groups (or clusters) of similar functional characteristics, with the aim of characterizing functional heterogeneity within head and neck tumour volumes. To evaluate the performance of the proposed approach on a set of longitudinal MRI data, analysing the evolution of the obtained sub-sets with treatment. Material and Methods: The cluster analysis workflow was applied to a combination of dynamic contrast-enhanced and diffusion-weighted imaging MRI data from a cohort of squamous cell carcinoma of the head and neck patients. Cumulative distributions of voxels, containing pre and post-treatment data and including both primary tumours and lymph nodes, were partitioned into k clusters (k = 2, 3 or 4). Principal component analysis and cluster validation were employed to investigate data composition and to independently determine the optimal number of clusters. The evolution of the resulting sub-regions with induction chemotherapy treatment was assessed relative to the number of clusters. Results: The clustering algorithm was able to separate clusters which significantly reduced in voxel number following induction chemotherapy from clusters with a non-significant reduction. Partitioning with the optimal number of clusters (k = 4), determined with cluster validation, produced the best separation between reducing and non-reducing clusters. Conclusion: The proposed methodology was able to identify tumour sub-regions with distinct functional properties, independently separating clusters which were affected differently by treatment. This work demonstrates that unsupervised cluster analysis, with no prior knowledge of the data, can be employed to provide a multi-parametric characterization of functional heterogeneity within tumour volumes. PMID:26398888
Clustering Analysis of Antibiograms and Antibiogram Types of Streptococcus agalactiae Strains from Tilapia in China.

PubMed

Liu, Chan; Feng, Juan; Zhang, Defeng; Xie, Yundan; Li, Anxing; Wang, Jiangyong; Su, Youlu

2018-05-11

In view of the changing antibiotic-resistance profiles of Streptococcus agalactiae from tilapia in China, antimicrobial susceptibilities of 75 S. agalactiae strains were determined by the disc diffusion method, and cluster analyses of the antibiograms and antibiogram types were performed. All strains displayed multidrug resistance (MDR). The antimicrobial-resistance rates were highest (>90%) to aminoglycosides, sulfonamides, pipemidic acid, and norfloxacin, followed by penicillin, ampicillin, and ciprofloxacin (26.7-38.7%); those to furadantin, lincomycin, erythromycin, ofloxacin, tetracycline, and florfenicol were low (<10%), and no resistance to vancomycin, cefalexin, cefoxitin, amoxicillin, medemycin, doxitard, oxytetracycline, rifampin, chloramphenicol, or thiamphenicol was detected. Statistical analysis showed that the resistance rate to ciprofloxacin increased significantly in 2016 (p = 0.009), whereas that to trimethoprim/sulfamethoxazole decreased (p = 0.017). Cluster analyses identified that the strains had 23 antibiogram types (A-W) and clustered in five groups (Groups I-V). The strains with higher antimicrobial resistance mainly clustered in Groups I and II. Our results show that the antibiograms varied with time and by location and that antibiogram types are constantly updating and expanding. Effective measures must be taken to reduce the antimicrobial resistance and spread of MDR strains.
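An antibiogram type, as used in this abstract, can be read as a distinct resistance profile shared by one or more strains. The sketch below groups strains by identical profile and computes per-drug resistance rates; it assumes resistance is coded as 1 and susceptibility as 0, labels types with letters in order of first appearance, and uses hypothetical strain names and data rather than the study's results.

```python
from collections import defaultdict


def antibiogram_types(profiles):
    """Group strains with identical resistance profiles; label the types A, B, C, ..."""
    groups = defaultdict(list)
    for strain, profile in profiles.items():
        groups[tuple(profile)].append(strain)
    # Letter labels follow the order in which profiles were first seen
    # (this simple scheme supports at most 26 types).
    return {chr(ord("A") + i): strains for i, strains in enumerate(groups.values())}


def resistance_rates(profiles, drugs):
    """Fraction of strains resistant to each drug."""
    n = len(profiles)
    return {drug: sum(p[i] for p in profiles.values()) / n for i, drug in enumerate(drugs)}


# Hypothetical disc-diffusion results: 1 = resistant, 0 = susceptible.
drugs = ["penicillin", "ciprofloxacin", "tetracycline"]
profiles = {
    "S1": (1, 1, 0),
    "S2": (1, 1, 0),
    "S3": (0, 1, 0),
    "S4": (1, 0, 0),
}
types = antibiogram_types(profiles)
rates = resistance_rates(profiles, drugs)
print(types)
print(rates)
```

The resulting profiles could then be fed into a hierarchical clustering step to form broader groups of similar types.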
Cluster analysis of autoantibodies in 852 patients with systemic lupus erythematosus from a single center.

PubMed

Artim-Esen, Bahar; Çene, Erhan; Şahinkaya, Yasemin; Ertan, Semra; Pehlivan, Özlem; Kamali, Sevil; Gül, Ahmet; Öcal, Lale; Aral, Orhan; Inanç, Murat

2014-07-01

Associations between autoantibodies and clinical features have been described in systemic lupus erythematosus (SLE). Herein, we aimed to define autoantibody clusters and their clinical correlations in a large cohort of patients with SLE. We analyzed 852 patients with SLE who attended our clinic. Seven autoantibodies were selected for cluster analysis: anti-DNA, anti-Sm, anti-RNP, anticardiolipin (aCL) immunoglobulin (Ig)G or IgM, lupus anticoagulant (LAC), anti-Ro, and anti-La. Two-step clustering and Kaplan-Meier survival analyses were used. Five clusters were identified. A cluster consisted of patients with only anti-dsDNA antibodies, a cluster of anti-Sm and anti-RNP, a cluster of aCL IgG/M and LAC, and a cluster of anti-Ro and anti-La antibodies. Analysis revealed 1 more cluster that consisted of patients who did not belong to any of the clusters formed by antibodies chosen for cluster analysis. Sm/RNP cluster had significantly higher incidence of pulmonary hypertension and Raynaud phenomenon. DsDNA cluster had the highest incidence of renal involvement. In the aCL/LAC cluster, there were significantly more patients with neuropsychiatric involvement, antiphospholipid syndrome, autoimmune hemolytic anemia, and thrombocytopenia. According to the Systemic Lupus International Collaborating Clinics damage index, the highest frequency of damage was in the aCL/LAC cluster. Comparison of 10 and 20 years survival showed reduced survival in the aCL/LAC cluster. This study supports the existence of autoantibody clusters with distinct clinical features in SLE and shows that forming clinical subsets according to autoantibody clusters may be useful in predicting the outcome of the disease. Autoantibody clusters in SLE may exhibit differences according to the clinical setting or population.
HICOSMO - cosmology with a complete sample of galaxy clusters - I. Data analysis, sample selection and luminosity-mass scaling relation

NASA Astrophysics Data System (ADS)

Schellenberger, G.; Reiprich, T. H.

2017-08-01

The X-ray regime, where the most massive visible component of galaxy clusters, the intracluster medium, is visible, offers directly measured quantities, like the luminosity, and derived quantities, like the total mass, to characterize these objects. The aim of this project is to analyse a complete sample of galaxy clusters in detail and constrain cosmological parameters, like the matter density, Ωm, or the amplitude of initial density fluctuations, σ8. The purely X-ray flux-limited sample (HIFLUGCS) consists of the 64 X-ray brightest galaxy clusters, which are excellent targets to study the systematic effects that can bias results. We analysed in total 196 Chandra observations of the 64 HIFLUGCS clusters, with a total exposure time of 7.7 Ms. Here, we present our data analysis procedure (including an automated substructure detection and an energy band optimization for surface brightness profile analysis) that gives individually determined, robust total mass estimates. These masses are tested against dynamical and Planck Sunyaev-Zeldovich (SZ) derived masses of the same clusters, where good overall agreement is found with the dynamical masses. The Planck SZ masses seem to show a mass-dependent bias to our hydrostatic masses; possible biases in this mass-mass comparison are discussed including the Planck selection function. Furthermore, we show the results for the (0.1-2.4) keV luminosity versus mass scaling relation. The overall slope of the sample (1.34) is in agreement with expectations and values from the literature. Splitting the sample into galaxy groups and clusters reveals, even after a selection bias correction, that galaxy groups exhibit a significantly steeper slope (1.88) compared to clusters (1.06).
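A luminosity-mass scaling relation of this kind is a power law, L ∝ M^slope, so the slope is the gradient of a straight line in log-log space. The sketch below is only an illustration of that idea using ordinary least squares on synthetic, noise-free values. The function name, the normalisation and the data are assumptions made here; the abstract does not say which regression method or selection-bias correction the authors used.

```python
import math

def fit_power_law(masses, luminosities):
    """Ordinary least-squares fit of log10(L) = intercept + slope * log10(M)."""
    xs = [math.log10(m) for m in masses]
    ys = [math.log10(l) for l in luminosities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Synthetic values in arbitrary units (NOT HIFLUGCS data), generated from
# a power law with the overall slope quoted in the abstract.
masses = [1e13, 3e13, 1e14, 3e14, 1e15]
luminosities = [2.0 * (m / 1e14) ** 1.34 for m in masses]

slope, intercept = fit_power_law(masses, luminosities)
print(f"fitted slope: {slope:.2f}")
```

Fitting groups and clusters separately, as the abstract describes, would amount to calling the same function on two mass-selected subsets.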
Cluster Analysis of Velocity Field Derived from Dense GNSS Network of Japan

NASA Astrophysics Data System (ADS)

Takahashi, A.; Hashimoto, M.

2015-12-01

Dense GNSS networks have been widely used to observe crustal deformation. Simpson et al. (2012) and Savage and Simpson (2013) have conducted cluster analyses of GNSS velocity field in the San Francisco Bay Area and Mojave Desert, respectively. They successfully found velocity discontinuities and showed an advantage of cluster analysis for classifying GNSS velocity field. Because strike-slip events are dominant in the western United States, the geometry there is simple. However, the Japanese Islands are tectonically complicated due to subduction of oceanic plates. There are many types of crustal deformation, such as slow slip events and large postseismic deformation. We propose a modified clustering method of GNSS velocity field in Japan to separate time variant and static crustal deformation. Our modification is performing cluster analysis every several months or years, then qualifying cluster member similarity. If a GNSS station moved differently from its neighboring GNSS stations, the station will not belong to the cluster that includes its surrounding stations. With this method, time variant phenomena were distinguished. We applied our method to GNSS data of Japan from 1996 to 2015. The analyses led to the following conclusions. First, the cluster boundaries are consistent with known active faults. For example, the Arima-Takatsuki-Hanaore fault system and the Shimane-Tottori segment proposed by Nishimura (2015) are recognized, though without using prior information. Second, the method improves the detectability of time variable phenomena, such as a slow slip event in the northern part of the Hokkaido region detected by Ohzono et al. (2015). The last is the classification of postseismic deformation caused by large earthquakes. The result suggested velocity discontinuities in postseismic deformation of the Tohoku-oki earthquake. This result implies that postseismic deformation does not decay continuously in proportion to distance from its epicenter.
Millon Clinical Multiaxial Inventory–III Subtypes of Opioid Dependence: Validity and Matching to Behavioral Therapies

PubMed Central

Ball, Samuel A.; Nich, Charla; Rounsaville, Bruce J.; Eagan, Dorothy; Carroll, Kathleen M.

2013-01-01

The concurrent and predictive validity of 2 different methods of Millon Clinical Multiaxial Inventory–III subtyping (protocol sorting, cluster analysis) was evaluated in 125 recently detoxified opioid-dependent outpatients in a 12-week randomized clinical trial. Participants received naltrexone and relapse prevention group counseling and were assigned to 1 of 3 intervention conditions: (a) no-incentive vouchers, (b) incentive vouchers alone, or (c) incentive vouchers plus relationship counseling. Affective disturbance was the most common Axis I protocol-sorted subtype (66%), antisocial–narcissistic was the most common Axis II subtype (46%), and cluster analysis suggested that a 2-cluster solution (high vs. low psychiatric severity) was optimal. Predictive validity analyses indicated less symptom improvement for the higher problem subtypes, and patient treatment matching analyses indicated that some subtypes had better outcomes in the no-incentive voucher conditions. PMID:15301655

Morphological and Inter Simple Sequence Repeat (ISSR) markers analyses of Corynespora cassiicola isolates from rubber plantations in Malaysia.

PubMed

Nghia, Nguyen Anh; Kadir, Jugah; Sunderasan, E; Puad Abdullah, Mohd; Malik, Adam; Napis, Suhaimi

2008-10-01

Morphological features and Inter Simple Sequence Repeat (ISSR) polymorphism were employed to analyse 21 Corynespora cassiicola isolates obtained from a number of Hevea clones grown in rubber plantations in Malaysia. The C. cassiicola isolates used in this study were collected from several states in Malaysia from 1998 to 2005. The morphology of the isolates was characteristic of that previously described for C. cassiicola. Variations in colony and conidial morphology were observed not only among isolates but also within a single isolate with no inclination to either clonal or geographical origin of the isolates. ISSR analysis delineated the isolates into two distinct clusters. The dendrogram created from UPGMA analysis based on Nei and Li's coefficient (calculated from the binary matrix data of 106 amplified DNA bands generated from 8 ISSR primers) showed that cluster 1 encompasses 12 isolates from the states of Johor and Selangor (this cluster was further split into 2 sub clusters, 1A and 1B; sub cluster 1B consists of a unique isolate, CKT05D), while cluster 2 comprises 9 isolates that were obtained from the other states. Detached leaf assay performed on selected Hevea clones showed that the pathogenicity of representative isolates from cluster 1 (with the exception of CKT05D) resembled that of race 1, and isolates in cluster 2 showed pathogenicity similar to race 2 of the fungus that was previously identified in Malaysia. The isolate CKT05D from sub cluster 1B showed pathogenicity dissimilar to either race 1 or race 2.
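The dendrogram construction described above can be sketched in two steps: compute Nei and Li's coefficient (twice the number of shared bands divided by the total number of bands in both profiles) for every pair of isolates, then merge the closest groups by average linkage (UPGMA). The following is a minimal pure-Python sketch under stated assumptions: the isolate names and band profiles are hypothetical, distance is taken as one minus the coefficient, and the software the authors actually used is not stated in the abstract.

```python
from itertools import combinations

def nei_li_similarity(a, b):
    """Nei and Li coefficient: 2 * shared bands / (bands in a + bands in b)."""
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    total = sum(a) + sum(b)
    return 2.0 * shared / total if total else 0.0

def upgma(profiles):
    """Return the merge history as [(group_a, group_b, distance), ...]."""
    names = list(profiles)
    # Pairwise distances between individual isolates (1 - similarity).
    dist = {frozenset((p, q)): 1.0 - nei_li_similarity(profiles[p], profiles[q])
            for p, q in combinations(names, 2)}
    clusters = [(name,) for name in names]
    merges = []

    def average_distance(c1, c2):
        # UPGMA: unweighted mean over all between-group isolate pairs.
        return sum(dist[frozenset((p, q))] for p in c1 for q in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: average_distance(*pair))
        merges.append((c1, c2, average_distance(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges

# Hypothetical binary band profiles (1 = band present); not the study's data.
profiles = {
    "iso_A": [1, 1, 1, 0, 0, 0],
    "iso_B": [1, 1, 0, 0, 0, 0],
    "iso_C": [0, 0, 0, 1, 1, 1],
    "iso_D": [0, 0, 0, 1, 1, 0],
}
merges = upgma(profiles)
for group_a, group_b, distance in merges:
    print(group_a, group_b, round(distance, 3))
```

With these toy profiles the two pairs that share bands join first, and the final merge at the largest distance separates the two main clusters, which is the pattern a two-cluster dendrogram shows.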
Spatial variation of volcanic rock geochemistry in the Virunga Volcanic Province: Statistical analysis of an integrated database

NASA Astrophysics Data System (ADS)

Barette, Florian; Poppe, Sam; Smets, Benoît; Benbakkar, Mhammed; Kervyn, Matthieu

2017-10-01

We present an integrated, spatially-explicit database of existing geochemical major-element analyses available from (post-) colonial scientific reports, PhD Theses and international publications for the Virunga Volcanic Province, located in the western branch of the East African Rift System. This volcanic province is characterised by alkaline volcanism, including silica-undersaturated, alkaline and potassic lavas. The database contains a total of 908 geochemical analyses of eruptive rocks for the entire volcanic province with a localisation for most samples. A preliminary analysis of the overall consistency of the database, using statistical techniques on sets of geochemical analyses with contrasted analytical methods or dates, demonstrates that the database is consistent. We applied a principal component analysis and cluster analysis on whole-rock major element compositions included in the database to study the spatial variation of the chemical composition of eruptive products in the Virunga Volcanic Province. These statistical analyses identify spatially distributed clusters of eruptive products. The known geochemical contrasts are highlighted by the spatial analysis, such as the unique geochemical signature of Nyiragongo lavas compared to other Virunga lavas, the geochemical heterogeneity of the Bulengo area, and the trachyte flows of Karisimbi volcano. Most importantly, we identified separate clusters of eruptive products which originate from primitive magmatic sources. These lavas of primitive composition are preferentially located along NE-SW inherited rift structures, often at distance from the central Virunga volcanoes. Our results illustrate the relevance of a spatial analysis on integrated geochemical data for a volcanic province, as a complement to classical petrological investigations. This approach indeed helps to characterise geochemical variations within a complex of magmatic systems and to identify specific petrologic and geochemical investigations that should be tackled within a study area.
The molecular epidemiology of HIV-1 in the Comunidad Valenciana (Spain): analysis of transmission clusters.

PubMed

Patiño-Galindo, Juan Ángel; Torres-Puente, Manoli; Bracho, María Alma; Alastrué, Ignacio; Juan, Amparo; Navarro, David; Galindo, María José; Ocete, Dolores; Ortega, Enrique; Gimeno, Concepción; Belda, Josefina; Domínguez, Victoria; Moreno, Rosario; González-Candelas, Fernando

2017-09-14

HIV infections are still a very serious concern for public health worldwide. We have applied molecular evolution methods to study the HIV-1 epidemics in the Comunidad Valenciana (CV, Spain) from a public health surveillance perspective. For this, we analysed 1804 HIV-1 sequences comprising protease and reverse transcriptase (PR/RT) coding regions, sampled between 2004 and 2014. These sequences were subtyped and subjected to phylogenetic analyses in order to detect transmission clusters. In addition, univariate and multinomial comparisons were performed to detect epidemiological differences between HIV-1 subtypes, and risk groups. The HIV epidemic in the CV is dominated by subtype B infections among local men who have sex with men (MSM). 270 transmission clusters were identified (>57% of the dataset), 12 of which included ≥10 patients; 11 of subtype B (9 affecting MSMs) and one (n = 21) of CRF14, affecting predominately intravenous drug users (IDUs). Dated phylogenies revealed these large clusters to have originated from the mid-1980s to the early 2000s. Subtype B is more likely to form transmission clusters than non-B variants and MSMs to cluster than other risk groups. Multinomial analyses revealed an association between non-B variants, which are not established in the local population yet, and different foreign groups.
Tardigrade workbench: comparing stress-related proteins, sequence-similar and functional protein clusters as well as RNA elements in tardigrades

PubMed Central

2009-01-01

Background: Tardigrades represent an animal phylum with extraordinary resistance to environmental stress. Results: To gain insights into their stress-specific adaptation potential, major clusters of related and similar proteins are identified, as well as specific functional clusters delineated comparing all tardigrades and individual species (Milnesium tardigradum, Hypsibius dujardini, Echiniscus testudo, Tulinus stephaniae, Richtersius coronifer) and functional elements in tardigrade mRNAs are analysed. We find that 39.3% of the total sequences clustered in 58 clusters of more than 20 proteins. Among these are ten tardigrade specific as well as a number of stress-specific protein clusters. Tardigrade-specific functional adaptations include strong protein, DNA- and redox protection, maintenance and protein recycling. Specific regulatory elements regulate tardigrade mRNA stability such as lox P DICE elements whereas 14 other RNA elements of higher eukaryotes are not found. Further features of tardigrade specific adaption are rapidly identified by sequence and/or pattern search on the web-tool tardigrade analyzer http://waterbear.bioapps.biozentrum.uni-wuerzburg.de. The work-bench offers nucleotide pattern analysis for promotor and regulatory element detection (tardigrade specific; nrdb) as well as rapid COG search for function assignments including species-specific repositories of all analysed data. Conclusion: Different protein clusters and regulatory elements implicated in tardigrade stress adaptations are analysed including unpublished tardigrade sequences. PMID:19821996
Tardigrade workbench: comparing stress-related proteins, sequence-similar and functional protein clusters as well as RNA elements in tardigrades.

PubMed

Förster, Frank; Liang, Chunguang; Shkumatov, Alexander; Beisser, Daniela; Engelmann, Julia C; Schnölzer, Martina; Frohme, Marcus; Müller, Tobias; Schill, Ralph O; Dandekar, Thomas

2009-10-12

Tardigrades represent an animal phylum with extraordinary resistance to environmental stress. To gain insights into their stress-specific adaptation potential, major clusters of related and similar proteins are identified, as well as specific functional clusters delineated comparing all tardigrades and individual species (Milnesium tardigradum, Hypsibius dujardini, Echiniscus testudo, Tulinus stephaniae, Richtersius coronifer) and functional elements in tardigrade mRNAs are analysed. We find that 39.3% of the total sequences clustered in 58 clusters of more than 20 proteins. Among these are ten tardigrade specific as well as a number of stress-specific protein clusters. Tardigrade-specific functional adaptations include strong protein, DNA- and redox protection, maintenance and protein recycling. Specific regulatory elements regulate tardigrade mRNA stability such as lox P DICE elements whereas 14 other RNA elements of higher eukaryotes are not found. Further features of tardigrade specific adaption are rapidly identified by sequence and/or pattern search on the web-tool tardigrade analyzer http://waterbear.bioapps.biozentrum.uni-wuerzburg.de. The work-bench offers nucleotide pattern analysis for promotor and regulatory element detection (tardigrade specific; nrdb) as well as rapid COG search for function assignments including species-specific repositories of all analysed data. Different protein clusters and regulatory elements implicated in tardigrade stress adaptations are analysed including unpublished tardigrade sequences.
The Relationship among the Six Vocational Identity Statuses and Five Dimensions of Planned Happenstance Career Skills

ERIC Educational Resources Information Center

Rhee, Eunjeong; Lee, Bo Hyun; Kim, Boyoung; Ha, Gyuyoung; Lee, Sang Min

2016-01-01

The current study investigated how the five components of planned happenstance skills are related to vocational identity statuses. For determination of relationships, cluster and discriminant analyses were conducted sequentially on a sample of 515 university students in South Korea. Cluster analysis revealed vocational identity statuses to be…
Are clusters of dietary patterns and cluster membership stable over time? Results of a longitudinal cluster analysis study.

PubMed

Walthouwer, Michel Jean Louis; Oenema, Anke; Soetens, Katja; Lechner, Lilian; de Vries, Hein

2014-11-01

Developing nutrition education interventions based on clusters of dietary patterns can only be done adequately when it is clear if distinctive clusters of dietary patterns can be derived and reproduced over time, if cluster membership is stable, and if it is predictable which type of people belong to a certain cluster. Hence, this study aimed to: (1) identify clusters of dietary patterns among Dutch adults, (2) test the reproducibility of these clusters and stability of cluster membership over time, and (3) identify sociodemographic predictors of cluster membership and cluster transition. This study had a longitudinal design with online measurements at baseline (N=483) and 6 months follow-up (N=379). Dietary intake was assessed with a validated food frequency questionnaire. A hierarchical cluster analysis was performed, followed by a K-means cluster analysis. Multinomial logistic regression analyses were conducted to identify the sociodemographic predictors of cluster membership and cluster transition. At baseline and follow-up, a comparable three-cluster solution was derived, distinguishing a healthy, moderately healthy, and unhealthy dietary pattern. Male and lower educated participants were significantly more likely to have a less healthy dietary pattern. Further, 251 (66.2%) participants remained in the same cluster, 45 (11.9%) participants changed to an unhealthier cluster, and 83 (21.9%) participants shifted to a healthier cluster. Men and people living alone were significantly more likely to shift toward a less healthy dietary pattern. Distinctive clusters of dietary patterns can be derived. Yet, cluster membership is unstable and only few sociodemographic factors were associated with cluster membership and cluster transition. These findings imply that clusters based on dietary intake may not be suitable as a basis for nutrition education interventions. Copyright © 2014 Elsevier Ltd. All rights reserved.
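The stability figures above come from comparing each participant's cluster at baseline with their cluster at follow-up. The sketch below shows that bookkeeping, assuming (for illustration only) that clusters are coded as ordinal integers where a larger value means a healthier pattern. The label vectors are toy data constructed to match the counts in the abstract, not the study's records.

```python
def transition_summary(baseline, follow_up):
    """Count participants who stay, move to a healthier, or move to a
    less healthy cluster between two measurements.

    Labels are assumed to be ordinal integers where a larger value means
    a healthier dietary pattern (an assumption made for this sketch).
    """
    counts = {"same": 0, "healthier": 0, "unhealthier": 0}
    for before, after in zip(baseline, follow_up):
        if after == before:
            counts["same"] += 1
        elif after > before:
            counts["healthier"] += 1
        else:
            counts["unhealthier"] += 1
    total = sum(counts.values())
    shares = {key: round(100.0 * value / total, 1)
              for key, value in counts.items()}
    return counts, shares

# Toy labels: 0 = unhealthy, 1 = moderately healthy, 2 = healthy.
baseline = [1] * 379
follow_up = [1] * 251 + [0] * 45 + [2] * 83

counts, shares = transition_summary(baseline, follow_up)
print(counts)
print(shares)
```

The three counts sum to the 379 participants measured at follow-up, which is the denominator behind the reported percentages.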
Combined Analyses of Bacterial, Fungal and Nematode Communities in Andosolic Agricultural Soils in Japan

PubMed Central

Bao, Zhihua; Ikunaga, Yoko; Matsushita, Yuko; Morimoto, Sho; Takada-Hoshino, Yuko; Okada, Hiroaki; Oba, Hirosuke; Takemoto, Shuhei; Niwa, Shigeru; Ohigashi, Kentaro; Suzuki, Chika; Nagaoka, Kazunari; Takenaka, Makoto; Urashima, Yasufumi; Sekiguchi, Hiroyuki; Kushida, Atsuhiko; Toyota, Koki; Saito, Masanori; Tsushima, Seiya

2012-01-01

We simultaneously examined the bacteria, fungi and nematode communities in Andosols from four agro-geographical sites in Japan using polymerase chain reaction-denaturing gradient gel electrophoresis (PCR-DGGE) and statistical analyses to test the effects of environmental factors including soil properties on these communities depending on geographical sites. Statistical analyses such as Principal component analysis (PCA) and Redundancy analysis (RDA) revealed that the compositions of the three soil biota communities were strongly affected by geographical sites, which were in turn strongly associated with soil characteristics such as total C (TC), total N (TN), C/N ratio and annual mean soil temperature (ST). In particular, the TC, TN and C/N ratio had stronger effects on bacterial and fungal communities than on the nematode community. Additionally, two-way cluster analysis using the combined DGGE profile also indicated that all soil samples were classified into four clusters corresponding to the four sites, showing high site specificity of soil samples, and all DNA bands were classified into four clusters, showing the coexistence of specific DGGE bands of bacteria, fungi and nematodes in Andosol fields. The results of this study suggest that geography relative to soil properties has a simultaneous impact on soil microbial and nematode community compositions. This is the first combined profile analysis of bacteria, fungi and nematodes at different sites with agricultural Andosols. PMID:22223474
Combined analyses of bacterial, fungal and nematode communities in andosolic agricultural soils in Japan.

PubMed

Bao, Zhihua; Ikunaga, Yoko; Matsushita, Yuko; Morimoto, Sho; Takada-Hoshino, Yuko; Okada, Hiroaki; Oba, Hirosuke; Takemoto, Shuhei; Niwa, Shigeru; Ohigashi, Kentaro; Suzuki, Chika; Nagaoka, Kazunari; Takenaka, Makoto; Urashima, Yasufumi; Sekiguchi, Hiroyuki; Kushida, Atsuhiko; Toyota, Koki; Saito, Masanori; Tsushima, Seiya

2012-01-01

We simultaneously examined the bacteria, fungi and nematode communities in Andosols from four agro-geographical sites in Japan using polymerase chain reaction-denaturing gradient gel electrophoresis (PCR-DGGE) and statistical analyses to test the effects of environmental factors including soil properties on these communities depending on geographical sites. Statistical analyses such as Principal component analysis (PCA) and Redundancy analysis (RDA) revealed that the compositions of the three soil biota communities were strongly affected by geographical sites, which were in turn strongly associated with soil characteristics such as total C (TC), total N (TN), C/N ratio and annual mean soil temperature (ST). In particular, the TC, TN and C/N ratio had stronger effects on bacterial and fungal communities than on the nematode community. Additionally, two-way cluster analysis using the combined DGGE profile also indicated that all soil samples were classified into four clusters corresponding to the four sites, showing high site specificity of soil samples, and all DNA bands were classified into four clusters, showing the coexistence of specific DGGE bands of bacteria, fungi and nematodes in Andosol fields. The results of this study suggest that geography relative to soil properties has a simultaneous impact on soil microbial and nematode community compositions. This is the first combined profile analysis of bacteria, fungi and nematodes at different sites with agricultural Andosols.
Characteristics of airflow and particle deposition in COPD current smokers

NASA Astrophysics Data System (ADS)

Zou, Chunrui; Choi, Jiwoong; Haghighi, Babak; Choi, Sanghun; Hoffman, Eric A.; Lin, Ching-Long

2017-11-01

A recent imaging-based cluster analysis of computed tomography (CT) lung images in a chronic obstructive pulmonary disease (COPD) cohort identified four clusters, viz. disease sub-populations. Cluster 1 had relatively normal airway structures; Cluster 2 had wall thickening; Cluster 3 exhibited decreased wall thickness and luminal narrowing; Cluster 4 had a significant decrease of luminal diameter and a significant reduction of lung deformation, thus having relatively low pulmonary functions. To better understand the characteristics of airflow and particle deposition in these clusters, we performed computational fluid and particle dynamics analyses on representative cluster patients and healthy controls using CT-based airway models and subject-specific 3D-1D coupled boundary conditions. The results show that particle deposition in central airways of cluster 4 patients was noticeably increased especially with increasing particle size despite reduced vital capacity as compared to other clusters and healthy controls. This may be attributable in part to significant airway constriction in cluster 4. This study demonstrates the potential application of cluster-guided CFD analysis in disease populations. NIH Grants U01HL114494 and S10-RR022421, and FDA Grant U01FD005837.
Spatial Analysis of Rice Blast in China at Three Different Scales.

PubMed

Guo, Fangfang; Chen, Xinglong; Lu, Minghong; Yang, Li; Wang, Shi Wei; Wu, Bo Ming

2018-05-22

In this study, spatial analyses were conducted at three different scales to better understand the epidemiology of rice blast, a major rice disease caused by Magnaporthe oryzae. At regional scale, across the major rice production regions in China, rice blast incidence was monitored on 101 dates at 193 stations from June 10th to Sep. 10th during 2009-2014, and surveyed in 143 fields in September 2016; at county scale, 3 surveys were done covering 1-5 counties in 2015-2016; and at field scale, blast was evaluated in 6 fields in 2015-2016. Spatial cluster and hot spot analyses were conducted in GIS on the geographical pattern of the disease at regional scale, and geostatistical analysis was performed at all three scales. Cluster and hot spot analyses revealed that high-disease areas were clustered in mountainous areas in China. Geostatistical analyses detected spatial dependence of blast incidence with influence ranges of 399 to 1080 km at regional scale, and 5 to 10 m at field scale, but not at county scale. The spatial patterns at different scales might be determined by inherent properties of rice blast and environmental driving forces, and findings from this study provide helpful information for sampling and management of rice blast.
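Spatial dependence and its range are commonly read off an empirical semivariogram, which averages half the squared difference between observations separated by a given lag. The sketch below computes that quantity for a one-dimensional transect. It is an illustration of the general technique only: the abstract does not state which estimator or variogram model the authors fitted, and the positions and scores here are made up.

```python
def empirical_semivariogram(positions, values, lag, tolerance=1e-9):
    """gamma(h) = sum((z_i - z_j)^2) / (2 * N(h)) over pairs about `lag` apart."""
    squared_diffs = []
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            separation = abs(positions[i] - positions[j])
            if abs(separation - lag) <= tolerance:
                squared_diffs.append((values[i] - values[j]) ** 2)
    if not squared_diffs:
        return None  # no pairs found at this lag
    return sum(squared_diffs) / (2.0 * len(squared_diffs))

# Hypothetical transect: positions in metres, arbitrary disease scores.
positions = [0.0, 1.0, 2.0, 3.0]
scores = [0.0, 1.0, 2.0, 3.0]

gamma_1 = empirical_semivariogram(positions, scores, lag=1.0)
gamma_2 = empirical_semivariogram(positions, scores, lag=2.0)
print(gamma_1, gamma_2)
```

In practice the semivariance is plotted against lag, and the distance at which it levels off is taken as the range of spatial influence.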
Bias and inference from misspecified mixed-effect models in stepped wedge trial analysis.

PubMed

Thompson, Jennifer A; Fielding, Katherine L; Davey, Calum; Aiken, Alexander M; Hargreaves, James R; Hayes, Richard J

2017-10-15

Many stepped wedge trials (SWTs) are analysed by using a mixed-effect model with a random intercept and fixed effects for the intervention and time periods (referred to here as the standard model). However, it is not known whether this model is robust to misspecification. We simulated SWTs with three groups of clusters and two time periods; one group received the intervention during the first period and two groups in the second period. We simulated period and intervention effects that were either common-to-all or varied-between clusters. Data were analysed with the standard model or with additional random effects for period effect or intervention effect. In a second simulation study, we explored the weight given to within-cluster comparisons by simulating a larger intervention effect in the group of the trial that experienced both the control and intervention conditions and applying the three analysis models described previously. Across 500 simulations, we computed bias and confidence interval coverage of the estimated intervention effect. We found up to 50% bias in intervention effect estimates when period or intervention effects varied between clusters and were treated as fixed effects in the analysis. All misspecified models showed undercoverage of 95% confidence intervals, particularly the standard model. A large weight was given to within-cluster comparisons in the standard model. In the SWTs simulated here, mixed-effect models were highly sensitive to departures from the model assumptions, which can be explained by the high dependence on within-cluster comparisons. Trialists should consider including a random effect for time period in their SWT analysis model. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
Bias and inference from misspecified mixed‐effect models in stepped wedge trial analysis

PubMed Central

Fielding, Katherine L.; Davey, Calum; Aiken, Alexander M.; Hargreaves, James R.; Hayes, Richard J.

2017-01-01

Many stepped wedge trials (SWTs) are analysed by using a mixed‐effect model with a random intercept and fixed effects for the intervention and time periods (referred to here as the standard model). However, it is not known whether this model is robust to misspecification. We simulated SWTs with three groups of clusters and two time periods; one group received the intervention during the first period and two groups in the second period. We simulated period and intervention effects that were either common‐to‐all or varied‐between clusters. Data were analysed with the standard model or with additional random effects for period effect or intervention effect. In a second simulation study, we explored the weight given to within‐cluster comparisons by simulating a larger intervention effect in the group of the trial that experienced both the control and intervention conditions and applying the three analysis models described previously. Across 500 simulations, we computed bias and confidence interval coverage of the estimated intervention effect. We found up to 50% bias in intervention effect estimates when period or intervention effects varied between clusters and were treated as fixed effects in the analysis. All misspecified models showed undercoverage of 95% confidence intervals, particularly the standard model. A large weight was given to within‐cluster comparisons in the standard model. In the SWTs simulated here, mixed‐effect models were highly sensitive to departures from the model assumptions, which can be explained by the high dependence on within‐cluster comparisons. Trialists should consider including a random effect for time period in their SWT analysis model. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd. PMID:28556355
Changing cluster composition in cluster randomised controlled trials: design and analysis considerations

PubMed Central

2014-01-01

Background: There are many methodological challenges in the conduct and analysis of cluster randomised controlled trials, but one that has received little attention is that of post-randomisation changes to cluster composition. To illustrate this, we focus on the issue of cluster merging, considering the impact on the design, analysis and interpretation of trial outcomes. Methods: We explored the effects of merging clusters on study power using standard methods of power calculation. We assessed the potential impacts on study findings of both homogeneous cluster merges (involving clusters randomised to the same arm of a trial) and heterogeneous merges (involving clusters randomised to different arms of a trial) by simulation. To determine the impact on bias and precision of treatment effect estimates, we applied standard methods of analysis to different populations under analysis. Results: Cluster merging produced a systematic reduction in study power. This effect depended on the number of merges and was most pronounced when variability in cluster size was at its greatest. Simulations demonstrate that the impact on analysis was minimal when cluster merges were homogeneous, with impact on study power being balanced by a change in observed intracluster correlation coefficient (ICC). We found a decrease in study power when cluster merges were heterogeneous, and the estimate of treatment effect was attenuated. Conclusions: Examples of cluster merges found in previously published reports of cluster randomised trials were typically homogeneous rather than heterogeneous. Simulations demonstrated that trial findings in such cases would be unbiased. However, simulations also showed that any heterogeneous cluster merges would introduce bias that would be hard to quantify, as well as having negative impacts on the precision of estimates obtained. Further methodological development is warranted to better determine how to analyse such trials appropriately. Interim recommendations include avoidance of cluster merges where possible, discontinuation of clusters following heterogeneous merges, allowance for potential loss of clusters and additional variability in cluster size in the original sample size calculation, and use of appropriate ICC estimates that reflect cluster size. PMID:24884591
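One way to see why merging clusters costs power is through the design effect, which grows with both mean cluster size and variability in cluster size. The sketch below uses a commonly cited approximation, DE = 1 + ((1 + cv²)·m̄ − 1)·ICC, where cv is the coefficient of variation of cluster size. Whether this is the exact calculation the authors used is an assumption, as are the cluster sizes and the ICC value. The ICC is also held fixed here, whereas the abstract notes that the observed ICC changes after homogeneous merges.

```python
def design_effect(cluster_sizes, icc):
    """Design effect allowing for unequal cluster sizes.

    DE = 1 + ((1 + cv^2) * mean_size - 1) * icc, with cv the coefficient
    of variation of cluster size (population variance is used).
    """
    k = len(cluster_sizes)
    mean_size = sum(cluster_sizes) / k
    variance = sum((m - mean_size) ** 2 for m in cluster_sizes) / k
    cv_squared = variance / mean_size ** 2
    return 1.0 + ((1.0 + cv_squared) * mean_size - 1.0) * icc

def effective_sample_size(cluster_sizes, icc):
    """Number of independent observations the clustered sample is worth."""
    return sum(cluster_sizes) / design_effect(cluster_sizes, icc)

# Hypothetical trial: 20 clusters of 30; then two clusters merge into one of 60.
icc = 0.05
before = [30] * 20
after = [30] * 18 + [60]

print(round(design_effect(before, icc), 2), round(effective_sample_size(before, icc), 1))
print(round(design_effect(after, icc), 2), round(effective_sample_size(after, icc), 1))
```

The total number of participants is unchanged by the merge, yet the effective sample size falls because there are fewer, more unequal clusters, which is consistent with the direction of the power loss reported above.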
Multilevel models for cost-effectiveness analyses that use cluster randomised trial data: An approach to model choice.

PubMed

Ng, Edmond S-W; Diaz-Ordaz, Karla; Grieve, Richard; Nixon, Richard M; Thompson, Simon G; Carpenter, James R

2016-10-01

Multilevel models provide a flexible modelling framework for cost-effectiveness analyses that use cluster randomised trial data. However, there is a lack of guidance on how to choose the most appropriate multilevel models. This paper illustrates an approach for deciding what level of model complexity is warranted; in particular how best to accommodate complex variance-covariance structures, right-skewed costs and missing data. Our proposed models differ according to whether or not they allow individual-level variances and correlations to differ across treatment arms or clusters and by the assumed cost distribution (Normal, Gamma, Inverse Gaussian). The models are fitted by Markov chain Monte Carlo methods. Our approach to model choice is based on four main criteria: the characteristics of the data, model pre-specification informed by the previous literature, diagnostic plots and assessment of model appropriateness. This is illustrated by re-analysing a previous cost-effectiveness analysis that uses data from a cluster randomised trial. We find that the most useful criterion for model choice was the deviance information criterion, which distinguishes amongst models with alternative variance-covariance structures, as well as between those with different cost distributions. This strategy for model choice can help cost-effectiveness analyses provide reliable inferences for policy-making when using cluster trials, including those with missing data. © The Author(s) 2013.
Substructures in DAFT/FADA survey clusters based on XMM and optical data

NASA Astrophysics Data System (ADS)

Durret, F.; DAFT/FADA Team

2014-07-01

The DAFT/FADA survey was initiated to perform weak lensing tomography on a sample of 90 massive clusters in the redshift range [0.4,0.9] with HST imaging available. The complementary deep multiband imaging constitutes a high quality imaging data base for these clusters. In X-rays, we have analysed the XMM-Newton and/or Chandra data available for 32 clusters, and for 23 clusters we fit the X-ray emissivity with a beta-model and subtract it to search for substructures in the X-ray gas. This study was coupled with a dynamical analysis for the 18 clusters with at least 15 spectroscopic galaxy redshifts in the cluster range, based on a Serna & Gerbal (SG) analysis. We detected ten substructures in eight clusters by both methods (X-rays and SG). The percentage of mass included in substructures is found to be roughly constant with redshift, with values of 5-15%. Most of the substructures detected both in X-rays and with the SG method are found to be relatively recent infalls, probably at their first cluster pericenter approach.
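The model-subtraction step in the abstract above can be sketched as follows. The abstract does not give the functional form it fitted, so this sketch assumes the standard isothermal beta-model for surface brightness, S(r) = S0 [1 + (r/rc)^2]^(-3*beta + 1/2). The radial profile, the parameter values and the function names are hypothetical and are not taken from the survey.

```python
import math


def beta_model(r, s0, r_core, beta):
    """Standard isothermal beta-model surface brightness at projected radius r."""
    return s0 * (1.0 + (r / r_core) ** 2) ** (-3.0 * beta + 0.5)


def residual_profile(radii, observed, s0, r_core, beta):
    """Observed minus model brightness; positive excesses may hint at substructure."""
    return [obs - beta_model(r, s0, r_core, beta) for r, obs in zip(radii, observed)]


# Hypothetical profile: a pure beta-model with an artificial excess injected at r = 2.
radii = [0.0, 1.0, 2.0, 3.0]
params = dict(s0=10.0, r_core=1.0, beta=2.0 / 3.0)
observed = [beta_model(r, **params) for r in radii]
observed[2] += 0.5

residuals = residual_profile(radii, observed, **params)
```

In this toy case the residual is zero everywhere except at the radius where the excess was injected. That is the kind of signature a substructure search looks for after the smooth component is removed.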
Catchment classification by runoff behaviour with self-organizing maps (SOM)

NASA Astrophysics Data System (ADS)

Ley, R.; Casper, M. C.; Hellebrand, H.; Merz, R.

2011-09-01

Catchments show a wide range of response behaviour, even if they are adjacent. For many purposes it is necessary to characterise and classify them, e.g. for regionalisation, prediction in ungauged catchments, model parameterisation. In this study, we investigate hydrological similarity of catchments with respect to their response behaviour. We analyse more than 8200 event runoff coefficients (ERCs) and flow duration curves of 53 gauged catchments in Rhineland-Palatinate, Germany, for the period from 1993 to 2008, covering a huge variability of weather and runoff conditions. The spatio-temporal variability of event-runoff coefficients and flow duration curves are assumed to represent how different catchments "transform" rainfall into runoff. From the runoff coefficients and flow duration curves we derive 12 signature indices describing various aspects of catchment response behaviour to characterise each catchment. Hydrological similarity of catchments is defined by high similarities of their indices. We identify, analyse and describe hydrologically similar catchments by cluster analysis using Self-Organizing Maps (SOM). As a result of the cluster analysis we get five clusters of similarly behaving catchments where each cluster represents one differentiated class of catchments. As catchment response behaviour is supposed to be dependent on its physiographic and climatic characteristics, we compare groups of catchments clustered by response behaviour with clusters of catchments based on catchment properties. Results show an overlap of 67% between these two pools of clustered catchments which can be improved using the topologic correctness of SOMs.
Catchment classification by runoff behaviour with self-organizing maps (SOM)

NASA Astrophysics Data System (ADS)

Ley, R.; Casper, M. C.; Hellebrand, H.; Merz, R.

2011-03-01

Catchments show a wide range of response behaviour, even if they are adjacent. For many purposes it is necessary to characterise and classify them, e.g. for regionalisation, prediction in ungauged catchments, model parameterisation. In this study, we investigate hydrological similarity of catchments with respect to their response behaviour. We analyse more than 8200 event runoff coefficients (ERCs) and flow duration curves of 53 gauged catchments in Rhineland-Palatinate, Germany, for the period from 1993 to 2008, covering a huge variability of weather and runoff conditions. The spatio-temporal variability of event-runoff coefficients and flow duration curves are assumed to represent how different catchments "transform" rainfall into runoff. From the runoff coefficients and flow duration curves we derive 12 signature indices describing various aspects of catchment response behaviour to characterise each catchment. Hydrological similarity of catchments is defined by high similarities of their indices. We identify, analyse and describe hydrologically similar catchments by cluster analysis using Self-Organizing Maps (SOM). As a result of the cluster analysis we get five clusters of similarly behaving catchments where each cluster represents one differentiated class of catchments. As catchment response behaviour is supposed to be dependent on its physiographic and climatic characteristics, we compare groups of catchments clustered by response behaviour with clusters of catchments based on catchment properties. Results show an overlap of 67% between these two pools of clustered catchments which can be improved using the topologic correctness of SOMs.
Efficient generation of low-energy folded states of a model protein

NASA Astrophysics Data System (ADS)

Gordon, Heather L.; Kwan, Wai Kei; Gong, Chunhang; Larrass, Stefan; Rothstein, Stuart M.

2003-01-01

A number of short simulated annealing runs are performed on a highly-frustrated 46-"residue" off-lattice model protein. We perform, in an iterative fashion, a principal component analysis of the 946 nonbonded interbead distances, followed by two varieties of cluster analyses: hierarchical and k-means clustering. We identify several distinct sets of conformations with reasonably consistent cluster membership. Nonbonded distance constraints are derived for each cluster and are employed within a distance geometry approach to generate many new conformations, previously unidentified by the simulated annealing experiments. Subsequent analyses suggest that these new conformations are members of the parent clusters from which they were generated. Furthermore, several novel, previously unobserved structures with low energy were uncovered, augmenting the ensemble of simulated annealing results, and providing a complete distribution of low-energy states. The computational cost of this approach to generating low-energy conformations is small when compared to the expense of further Monte Carlo simulated annealing runs.
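As a quick arithmetic check on the figure of 946 nonbonded interbead distances for a 46-bead chain, the sketch below counts bead pairs separated by at least three positions along the chain. That exclusion rule (dropping directly bonded pairs and next-nearest neighbours) is an assumption that happens to reproduce the number; the abstract itself does not state how "nonbonded" is defined.

```python
from itertools import combinations


def count_nonbonded_pairs(n_beads, min_separation=3):
    """Count bead pairs (i, j) with j - i >= min_separation along a linear chain."""
    return sum(
        1
        for i, j in combinations(range(n_beads), 2)
        if j - i >= min_separation
    )


# Assumed rule: exclude i,i+1 (bonded) and i,i+2 (bond-angle) pairs.
n_pairs = count_nonbonded_pairs(46)
```

Under that assumption the count is 43 * 44 / 2 = 946, which matches the dimensionality of the principal component analysis described above.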
The Australian longitudinal study on male health sampling design and survey weighting: implications for analysis and interpretation of clustered data.

PubMed

Spittal, Matthew J; Carlin, John B; Currier, Dianne; Downes, Marnie; English, Dallas R; Gordon, Ian; Pirkis, Jane; Gurrin, Lyle

2016-10-31

The Australian Longitudinal Study on Male Health (Ten to Men) used a complex sampling scheme to identify potential participants for the baseline survey. This raises important questions about when and how to adjust for the sampling design when analyzing data from the baseline survey. We describe the sampling scheme used in Ten to Men focusing on four important elements: stratification, multi-stage sampling, clustering and sample weights. We discuss how these elements fit together when using baseline data to estimate a population parameter (e.g., population mean or prevalence) or to estimate the association between an exposure and an outcome (e.g., an odds ratio). We illustrate this with examples using a continuous outcome (weight in kilograms) and a binary outcome (smoking status). Estimates of a population mean or disease prevalence using Ten to Men baseline data are influenced by the extent to which the sampling design is addressed in an analysis. Estimates of mean weight and smoking prevalence are larger in unweighted analyses than weighted analyses (e.g., mean = 83.9 kg vs. 81.4 kg; prevalence = 18.0 % vs. 16.7 %, for unweighted and weighted analyses respectively) and the standard error of the mean is 1.03 times larger in an analysis that acknowledges the hierarchical (clustered) structure of the data compared with one that does not. For smoking prevalence, the corresponding standard error is 1.07 times larger. Measures of association (mean group differences, odds ratios) are generally similar in unweighted or weighted analyses and whether or not adjustment is made for clustering. The extent to which the Ten to Men sampling design is accounted for in any analysis of the baseline data will depend on the research question. When the goals of the analysis are to estimate the prevalence of a disease or risk factor in the population or the magnitude of a population-level exposure-outcome association, our advice is to adopt an analysis that respects the sampling design.
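The two adjustments discussed in the abstract above, sample weighting and allowance for clustering, can be illustrated with a minimal sketch. The toy body-mass values and sampling weights below are hypothetical and are not Ten to Men data. Only the standard-error ratios 1.03 and 1.07 come from the abstract, and squaring them to obtain a variance-scale design effect is a conventional conversion, not something the abstract reports.

```python
def weighted_mean(values, weights):
    """Weighted mean: each value counts in proportion to its sampling weight."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def design_effect_from_se_ratio(se_ratio):
    """Variance inflation implied by a ratio of standard errors."""
    return se_ratio ** 2


# Hypothetical toy sample: heavier respondents are over-represented (low weight).
body_mass_kg = [90.0, 80.0, 70.0]
sampling_weights = [1.0, 2.0, 2.0]

unweighted = weighted_mean(body_mass_kg, [1.0, 1.0, 1.0])
weighted = weighted_mean(body_mass_kg, sampling_weights)

# Standard-error ratios quoted in the abstract, converted to design effects.
deff_mean_weight = design_effect_from_se_ratio(1.03)
deff_smoking = design_effect_from_se_ratio(1.07)
```

In the toy sample the unweighted mean exceeds the weighted mean, which is the same direction of difference as the abstract reports for mean weight, although the magnitudes here are purely illustrative.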

ANALYSIS AND CHARACTERIZATION OF OZONE-RICH EPISODES IN NORTHEAST PORTUGAL

NASA Astrophysics Data System (ADS)

Carvalho, A.; Monteiro, A.; Ribeiro, I.; Tchepel, O.; Miranda, A.; Borrego, C.; Saavedra, S.; Souto, J. A.; Casares, J. J.

2009-12-01

Each summer, extremely high ozone levels are registered at the rural background station of Lamas d’Olo, located in the Northeast of Portugal. On average, 30% of the total alert threshold registered in Portugal is detected at this site. The main purpose of this study is to characterize the atmospheric conditions that lead to the ozone-rich episodes. Synoptic pattern anomaly and back-trajectory cluster analyses were performed for the 76 days on which maximum ozone concentrations were above 200 µg m⁻³. This analysis was performed for the period between 2004 and 2007. The obtained anomaly fields suggested that a positive temperature anomaly is visible above the Iberian Peninsula. In addition, a strong wind flow pattern from NE is visible in the North of Portugal and Galicia, in Spain. These two features may lead to an enhancement of the photochemical production and to the transport of pollutants from Spain to Portugal. In addition, the 3D mean back trajectories associated with the ozone episode days were analysed. A clustering method was applied to the obtained back trajectories. Four main clusters of ozone-rich episodes were identified, with different frequencies of occurrence: north-westerly flows (11%), north-easterly flows (45%), southern flow (4%) and westerly flows (40%). Both analyses highlight the NE flow as a dominant pattern over the North of Portugal. The analysis of the ozone concentrations for each selected cluster indicates that this northeast circulation pattern, together with the southern flow, is responsible for the highest ozone peak episodes. This also suggests that long-range transport of atmospheric pollutants may be the main contributor to the ozone levels registered at Lamas d’Olo. This is also highlighted by the correlation of the ozone time series with the meteorological parameters analysed in the frequency domain.
Task Analysis for Health Occupations. Cluster: Medical Assisting. Occupation: Medical Assistant. Education for Employment Task Lists.

ERIC Educational Resources Information Center

Lathrop, Janice

Task analyses are provided for two duty areas for the occupation of medical assistant in the medical assisting cluster. Five tasks for the duty area "providing therapeutic measures" are as follows: assist with dressing change, apply clean dressing, apply elastic bandage, assist physician in therapeutic procedure, and apply topical…
Atmospheric effects on cluster analyses. [for remote sensing application]

NASA Technical Reports Server (NTRS)

Kiang, R. K.

1979-01-01

Ground-reflected radiance, from which information is extracted through cluster analysis techniques for remote sensing applications, is altered by the atmosphere before it reaches the satellite. Therefore it is essential to understand the effects of the atmosphere on Landsat measurements, cluster characteristics and analysis accuracy. A doubling model is employed to compute the effective reflectivity, observed from the satellite, as a function of ground reflectivity, solar zenith angle and aerosol optical thickness for a standard atmosphere. The relation between the effective reflectivity and ground reflectivity is approximately linear. It is shown that for a horizontally homogeneous atmosphere, the classification statistics from a maximum likelihood classifier remain unchanged under these transforms. If inhomogeneity is present, the divergence between clusters is reduced, and correlation between spectral bands increases. Radiance reflected by the background area surrounding the target may also reach the satellite. The influence of background reflectivity on effective reflectivity is discussed.
Assessment of cluster yield components by image analysis.

PubMed

Diago, Maria P; Tardaguila, Javier; Aleixos, Nuria; Millan, Borja; Prats-Montalban, Jose M; Cubero, Sergio; Blasco, Jose

2015-04-01

Berry weight, berry number and cluster weight are key parameters for yield estimation in the wine and table grape industries. Current yield prediction methods are destructive, labour-demanding and time-consuming. In this work, a new methodology based on image analysis was developed to determine cluster yield components in a fast and inexpensive way. Clusters of seven different red varieties of grapevine (Vitis vinifera L.) were photographed under laboratory conditions and their cluster yield components manually determined after image acquisition. Two algorithms based on the Canny and the logarithmic image processing approaches were tested to find the contours of the berries in the images prior to berry detection performed by means of the Hough Transform. Results were obtained in two ways: by analysing either a single image of the cluster or using four images per cluster from different orientations. The best results (R² between 69% and 95% in berry detection and between 65% and 97% in cluster weight estimation) were achieved using four images and the Canny algorithm. The capability of the image-analysis-based model to predict berry weight was 84%. The new and low-cost methodology presented here enabled the assessment of cluster yield components, saving time and providing inexpensive information in comparison with current manual methods. © 2014 Society of Chemical Industry.
Students' Perceptions of Motivational Climate and Enjoyment in Finnish Physical Education: A Latent Profile Analysis.

PubMed

Jaakkola, Timo; Wang, C K John; Soini, Markus; Liukkonen, Jarmo

2015-09-01

The purpose of this study was to identify student clusters with homogeneous profiles in perceptions of task- and ego-involving, autonomy, and social relatedness supporting motivational climate in school physical education. Additionally, we investigated whether different motivational climate groups differed in their enjoyment in PE. Participants of the study were 2,594 girls and 1,803 boys, aged 14-15 years. Students responded to questionnaires assessing their perception of motivational climate and enjoyment in physical education. Latent profile analyses produced a five-cluster solution labeled 1) 'low autonomy, relatedness, task, and moderate ego climate' group, 2) 'low autonomy, relatedness, and high task and ego climate' group, 3) 'moderate autonomy, relatedness, task and ego climate' group, 4) 'high autonomy, relatedness, task, and moderate ego climate' group, and 5) 'high relatedness and task but moderate autonomy and ego climate' group. Analyses of variance showed that students in clusters 4 and 5 perceived the highest level of enjoyment whereas students in cluster 1 experienced the lowest level of enjoyment. The results showed that the students' perceptions of various motivational climates created differential levels of enjoyment in PE classes.
The association between mood state and chronobiological characteristics in bipolar I disorder: a naturalistic, variable cluster analysis-based study.

PubMed

Gonzalez, Robert; Suppes, Trisha; Zeitzer, Jamie; McClung, Colleen; Tamminga, Carol; Tohen, Mauricio; Forero, Angelica; Dwivedi, Alok; Alvarado, Andres

2018-02-19

Multiple types of chronobiological disturbances have been reported in bipolar disorder, including characteristics associated with general activity levels, sleep, and rhythmicity. Previous studies have focused on examining the individual relationships between affective state and chronobiological characteristics. The aim of this study was to conduct a variable cluster analysis in order to ascertain how mood states are associated with chronobiological traits in bipolar I disorder (BDI). We hypothesized that manic symptomatology would be associated with disturbances of rhythm. Variable cluster analysis identified five chronobiological clusters in 105 BDI subjects. Cluster 1, comprising subjective sleep quality was associated with both mania and depression. Cluster 2, which comprised variables describing the degree of rhythmicity, was associated with mania. Significant associations between mood state and cluster analysis-identified chronobiological variables were noted. Disturbances of mood were associated with subjectively assessed sleep disturbances as opposed to objectively determined, actigraphy-based sleep variables. No associations with general activity variables were noted. Relationships between gender and medication classes in use and cluster analysis-identified chronobiological characteristics were noted. Exploratory analyses noted that medication class had a larger impact on these relationships than the number of psychiatric medications in use. In a BDI sample, variable cluster analysis was able to group related chronobiological variables. The results support our primary hypothesis that mood state, particularly mania, is associated with chronobiological disturbances. Further research is required in order to define these relationships and to determine the directionality of the associations between mood state and chronobiological characteristics.
Analysis of ambient SO2 concentrations and winds in the complex surroundings of a thermal power plant

NASA Astrophysics Data System (ADS)

Mlakar, P.

2004-11-01

SO2 pollution is still a significant problem in Slovenia, especially around large thermal power plants (TPPs), like the one at Šoštanj. The Šoštanj TPP is the exclusive source of SO2 in the area and is therefore a perfect example for air pollution studies. In order to understand air pollution around the Šoštanj TPP in detail, some analyses of emissions and ambient concentrations of SO2 at six automated monitoring stations in the surroundings of the TPP were made. The data base from 1991 to 1993 was used when there were no desulfurisation plants in operation. Statistical analyses of the influence of the emissions from the three TPP stacks at different measuring points were made. The analyses prove that the smallest stack (100 m) mainly pollutes villages and towns near the TPP within a radius of a few kilometres. The medium stack's (150 m) influence is noticed at shorter as well as at longer distances up to more than ten kilometres. The highest stack (230 m) pollutes mainly at longer distances, where the plume reaches the higher hills. Detailed analyses of ambient SO2 concentrations were made. They show the temporal and spatial distribution of different classes of SO2 concentrations from very low to alarming values. These analyses show that pollution patterns at a particular station remain the same if observed on a yearly basis, but can vary very much if observed on a monthly basis, mainly because of different weather patterns. Therefore the winds in the basin (as the most important feature influencing air pollution dispersion) were further analysed in detail to find clusters of similar patterns. For cluster analysis of ground-level winds patterns in the basin around the Šoštanj Thermal Power Plant, the Kohonen neural network and Leaders' method were used. Furthermore, the dependence of ambient SO2 concentrations on the clusters obtained was analysed. 
The results proved that effective cluster analysis can be a useful tool for compressing a huge wind data base in order to find the correlation between winds and pollutant concentrations. The analyses made provide a better insight into air pollution over complex terrain.
Stability of operational taxonomic units: an important but neglected property for analyzing microbial diversity.

PubMed

He, Yan; Caporaso, J Gregory; Jiang, Xiao-Tao; Sheng, Hua-Fang; Huse, Susan M; Rideout, Jai Ram; Edgar, Robert C; Kopylova, Evguenia; Walters, William A; Knight, Rob; Zhou, Hong-Wei

2015-01-01

The operational taxonomic unit (OTU) is widely used in microbial ecology. Reproducibility in microbial ecology research depends on the reliability of OTU-based 16S ribosomal subunit RNA (rRNA) analyses. Here, we report that many hierarchical and greedy clustering methods produce unstable OTUs, with membership that depends on the number of sequences clustered. If OTUs are regenerated with additional sequences or samples, sequences originally assigned to a given OTU can be split into different OTUs. Alternatively, sequences assigned to different OTUs can be merged into a single OTU. This OTU instability affects alpha-diversity analyses such as rarefaction curves, beta-diversity analyses such as distance-based ordination (for example, Principal Coordinate Analysis (PCoA)), and the identification of differentially represented OTUs. Our results show that the proportion of unstable OTUs varies for different clustering methods. We found that the closed-reference method is the only one that produces completely stable OTUs, with the caveat that sequences that do not match a pre-existing reference sequence collection are discarded. As a compromise to the factors listed above, we propose using an open-reference method to enhance OTU stability. This type of method clusters sequences against a database and includes unmatched sequences by clustering them via a relatively stable de novo clustering method. OTU stability is an important consideration when analyzing microbial diversity and is a feature that should be taken into account during the development of novel OTU clustering methods.
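The instability described in the abstract above can be reproduced with a deliberately simplified sketch. Integers stand in for sequences and their absolute difference stands in for sequence dissimilarity. The first-match greedy rule below is a generic illustration and is not the algorithm of any particular OTU-picking tool. All values are hypothetical.

```python
def greedy_cluster(sequences, threshold):
    """Greedy centroid clustering: join the first centroid within threshold, else seed a new one."""
    centroids = []
    assignment = {}
    for seq in sequences:
        for centroid in centroids:
            if abs(seq - centroid) <= threshold:
                assignment[seq] = centroid
                break
        else:
            # No existing centroid is close enough, so this item seeds a new cluster.
            centroids.append(seq)
            assignment[seq] = seq
    return assignment


# Original data set: items 5 and 8 end up in the same cluster.
before = greedy_cluster([0, 5, 8], threshold=3)

# Same items plus one additional item (11) that is processed earlier,
# for example because it is more abundant: 5 and 8 are now split.
after = greedy_cluster([0, 11, 5, 8], threshold=3)
```

Here `before[5] == before[8]`, but after the extra item is added, 8 is captured by the new centroid 11 while 5 seeds its own cluster. A closed-reference approach avoids this because the centroids are fixed in advance and do not depend on which sequences are in the data set.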
Cluster analysis and subgrouping to investigate inter-individual variability to non-invasive brain stimulation: a systematic review.

PubMed

Pellegrini, Michael; Zoghi, Maryam; Jaberzadeh, Shapour

2018-01-12

Cluster analysis and other subgrouping techniques have risen in popularity in recent years in non-invasive brain stimulation research in the attempt to investigate the issue of inter-individual variability - the issue of why some individuals respond, as traditionally expected, to non-invasive brain stimulation protocols and others do not. Cluster analysis and subgrouping techniques have been used to categorise individuals, based on their response patterns, as responders or non-responders. There is, however, a lack of consensus and consistency on the most appropriate technique to use. This systematic review aimed to provide a systematic summary of the cluster analysis and subgrouping techniques used to date and suggest recommendations moving forward. Twenty studies were included that utilised subgrouping techniques, while seven of these additionally utilised cluster analysis techniques. The results of this systematic review appear to indicate that statistical cluster analysis techniques are effective in identifying subgroups of individuals based on response patterns to non-invasive brain stimulation. This systematic review also reports a lack of consensus amongst researchers on the most effective subgrouping technique and the criteria used to determine whether an individual is categorised as a responder or a non-responder. This systematic review provides a step-by-step guide to carrying out statistical cluster analyses and subgrouping techniques to provide a framework for analysis when developing further insights into the contributing factors of inter-individual variability in response to non-invasive brain stimulation.
Clumpak: a program for identifying clustering modes and packaging population structure inferences across K.

PubMed

Kopelman, Naama M; Mayzel, Jonathan; Jakobsson, Mattias; Rosenberg, Noah A; Mayrose, Itay

2015-09-01

The identification of the genetic structure of populations from multilocus genotype data has become a central component of modern population-genetic data analysis. Application of model-based clustering programs often entails a number of steps, in which the user considers different modelling assumptions, compares results across different predetermined values of the number of assumed clusters (a parameter typically denoted K), examines multiple independent runs for each fixed value of K, and distinguishes among runs belonging to substantially distinct clustering solutions. Here, we present Clumpak (Cluster Markov Packager Across K), a method that automates the postprocessing of results of model-based population structure analyses. For analysing multiple independent runs at a single K value, Clumpak identifies sets of highly similar runs, separating distinct groups of runs that represent distinct modes in the space of possible solutions. This procedure, which generates a consensus solution for each distinct mode, is performed by the use of a Markov clustering algorithm that relies on a similarity matrix between replicate runs, as computed by the software Clumpp. Next, Clumpak identifies an optimal alignment of inferred clusters across different values of K, extending a similar approach implemented for a fixed K in Clumpp and simplifying the comparison of clustering results across different K values. Clumpak incorporates additional features, such as implementations of methods for choosing K and comparing solutions obtained by different programs, models, or data subsets. Clumpak, available at http://clumpak.tau.ac.il, simplifies the use of model-based analyses of population structure in population genetics and molecular ecology. © 2015 John Wiley & Sons Ltd.
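Comparing replicate runs, as described in the abstract above, first requires handling label switching: two runs can encode the same clustering with the cluster labels permuted. The sketch below aligns one membership matrix to another by brute-force search over column permutations, using a simple sum of products as the similarity score. Both the score and the exhaustive search are illustrative assumptions and do not reproduce the statistic or algorithm used by Clumpp or Clumpak. The membership matrices are hypothetical.

```python
from itertools import permutations


def align_labels(q_reference, q_run):
    """Find the column permutation of q_run that best matches q_reference."""
    n_clusters = len(q_reference[0])

    def score(perm):
        # Sum over individuals and clusters of matched membership products.
        return sum(
            ref_row[k] * run_row[perm[k]]
            for ref_row, run_row in zip(q_reference, q_run)
            for k in range(n_clusters)
        )

    best = max(permutations(range(n_clusters)), key=score)
    aligned = [[row[best[k]] for k in range(n_clusters)] for row in q_run]
    return best, aligned


# Hypothetical membership matrices (rows: individuals, columns: K = 2 clusters).
run_a = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
run_b = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1]]  # same solution, labels swapped

perm, aligned_b = align_labels(run_a, run_b)
```

Once the columns are aligned, the two runs are identical, so they would belong to the same mode. Exhaustive search is only practical for small K, since the number of permutations grows as K factorial.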
Species-richness of the Anopheles annulipes Complex (Diptera: Culicidae) Revealed by Tree and Model-Based Allozyme Clustering Analyses

DTIC Science & Technology

2007-01-01

including tree-based methods such as the unweighted pair group method of analysis (UPGMA) and Neighbour-joining (NJ) (Saitou & Nei, 1987). By...based Bayesian approach and the tree-based UPGMA and NJ clustering methods. The results obtained suggest that far more species occur in the An...unlikely that groups that differ by more than these levels are conspecific. Genetic distances were clustered using the UPGMA and NJ algorithms in MEGA
Task Analysis for Health Occupations. Cluster: Rehabilitation Services. Occupation: Physical Therapist Assistant. Education for Employment Task Lists.

ERIC Educational Resources Information Center

Lathrop, Janice

Task analyses are provided for two duty areas for the occupation of physical therapist assistant in the rehabilitation services cluster. Ten tasks are listed for the duty area "providing therapeutic measures": apply cold compress, administer hot soak, apply heat lamp, apply warm compress, apply ice bag, assist with dressing change, apply…
A General Framework for Power Analysis to Detect the Moderator Effects in Two- and Three-Level Cluster Randomized Trials

ERIC Educational Resources Information Center

Dong, Nianbo; Spybrook, Jessaca; Kelcey, Ben

2016-01-01

The purpose of this study is to propose a general framework for power analyses to detect the moderator effects in two- and three-level cluster randomized trials (CRTs). The study specifically aims to: (1) develop the statistical formulations for calculating statistical power, minimum detectable effect size (MDES) and its confidence interval to…
Health and Human Services Cluster. Task Analyses. Physical Therapist Aide and Physical Therapist Assistant. A Competency-Based Curriculum Guide.

ERIC Educational Resources Information Center

Henrico County Public Schools, Glen Allen, VA. Virginia Vocational Curriculum and Resource Center.

Developed in Virginia, this publication contains task analysis guides to support selected tech prep programs that prepare students for careers in the health and human services cluster. Occupations profiled are physical therapist aide and physical therapist assistant. Each guide contains the following elements: (1) an occupational task list derived…
Factors influencing the quality of life of haemodialysis patients according to symptom cluster.

PubMed

Shim, Hye Yeung; Cho, Mi-Kyoung

2018-05-01

To identify the characteristics in each symptom cluster and factors influencing the quality of life of haemodialysis patients in Korea according to cluster. Despite developments in renal replacement therapy, haemodialysis still restricts the activities of daily living due to pain and impairs physical functioning induced by the disease and its complications. Descriptive survey. Two hundred and thirty dialysis patients aged >18 years. They completed self-administered questionnaires of Dialysis Symptom Index and Kidney Disease Quality of Life instrument-Short Form 1.3. To determine the optimal number of clusters, the collected data were analysed using polytomous variable latent class analysis in R software (poLCA) to estimate the latent class models and the latent class regression models for polytomous outcome variables. Differences in characteristics, symptoms and QOL according to the symptom cluster of haemodialysis patients were analysed using the independent t test and chi-square test. The factors influencing the QOL according to symptom cluster were identified using hierarchical multiple regression analysis. Physical and emotional symptoms were significantly more severe, and the QOL was significantly worse in Cluster 1 than in Cluster 2. The factors influencing the QOL were spouse, job, insurance type and physical and emotional symptoms in Cluster 1, with these variables having an explanatory power of 60.9%. Physical and emotional symptoms were the only influencing factors in Cluster 2, and they had an explanatory power of 37.4%. Mitigating the symptoms experienced by haemodialysis patients and improving their QOL require educational and therapeutic symptom management interventions that are tailored according to the characteristics and symptoms in each cluster. 
The findings of this study are expected to lead to practical guidelines for addressing the symptoms experienced by haemodialysis patients, and they provide basic information for developing nursing interventions to manage these symptoms and improve the QOL of these patients. © 2017 John Wiley & Sons Ltd.
Pan-genome and phylogeny of Bacillus cereus sensu lato.

PubMed

Bazinet, Adam L

2017-08-02

Bacillus cereus sensu lato (s. l.) is an ecologically diverse bacterial group of medical and agricultural significance. In this study, I use publicly available genomes and novel bioinformatic workflows to characterize the B. cereus s. l. pan-genome and perform the largest phylogenetic and population genetic analyses of this group to date in terms of the number of genes and taxa included. With these fundamental data in hand, I identify genes associated with particular phenotypic traits (i.e., "pan-GWAS" analysis), and quantify the degree to which taxa sharing common attributes are phylogenetically clustered. A rapid k-mer based approach (Mash) was used to create reduced representations of selected Bacillus genomes, and a fast distance-based phylogenetic analysis of this data (FastME) was performed to determine which species should be included in B. cereus s. l. The complete genomes of eight B. cereus s. l. species were annotated de novo with Prokka, and these annotations were used by Roary to produce the B. cereus s. l. pan-genome. Scoary was used to associate gene presence and absence patterns with various phenotypes. The orthologous protein sequence clusters produced by Roary were filtered and used to build HaMStR databases of gene models that were used in turn to construct phylogenetic data matrices. Phylogenetic analyses used RAxML, DendroPy, ClonalFrameML, PAUP*, and SplitsTree. Bayesian model-based population genetic analysis assigned taxa to clusters using hierBAPS. The genealogical sorting index was used to quantify the phylogenetic clustering of taxa sharing common attributes. The B. cereus s. l. pan-genome currently consists of ≈60,000 genes, ≈600 of which are "core" (common to at least 99% of taxa sampled). Pan-GWAS analysis revealed genes associated with phenotypes such as isolation source, oxygen requirement, and ability to cause diseases such as anthrax or food poisoning. 
Extensive phylogenetic analyses using an unprecedented amount of data produced phylogenies that were largely concordant with each other and with previous studies. Phylogenetic support as measured by bootstrap probabilities increased markedly when all suitable pan-genome data was included in phylogenetic analyses, as opposed to when only core genes were used. Bayesian population genetic analysis recommended subdividing the three major clades of B. cereus s. l. into nine clusters. Taxa sharing common traits and species designations exhibited varying degrees of phylogenetic clustering. All phylogenetic analyses recapitulated two previously used classification systems, and taxa were consistently assigned to the same major clade and group. By including accessory genes from the pan-genome in the phylogenetic analyses, I produced an exceptionally well-supported phylogeny of 114 complete B. cereus s. l. genomes. The best-performing methods were used to produce a phylogeny of all 498 publicly available B. cereus s. l. genomes, which was in turn used to compare three different classification systems and to test the monophyly status of various B. cereus s. l. species. The majority of the methodology used in this study is generic and could be leveraged to produce pan-genome estimates and similarly robust phylogenetic hypotheses for other bacterial groups.
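The abstract defines "core" genes as those common to at least 99% of taxa sampled. A minimal sketch of that thresholding rule follows, assuming a simple gene-to-taxa mapping. The function name `core_genes` and the toy data are hypothetical illustrations, not Roary's actual output format or the author's code.

```python
def core_genes(presence, threshold=0.99):
    """Return genes present in at least `threshold` of all taxa.

    `presence` maps a gene name to the set of taxa that carry it
    (an assumed input layout chosen for this illustration).
    """
    taxa = set().union(*presence.values())
    n_taxa = len(taxa)
    return sorted(
        gene
        for gene, carriers in presence.items()
        if len(carriers) / n_taxa >= threshold
    )


# Hypothetical toy pan-genome: 4 taxa, 3 genes.
toy = {
    "geneA": {"t1", "t2", "t3", "t4"},
    "geneB": {"t1", "t2"},
    "geneC": {"t1", "t2", "t3", "t4"},
}
core = core_genes(toy)                        # genes carried by >= 99% of taxa
shell = core_genes(toy, threshold=0.5)        # looser cutoff keeps geneB too
```

With only four taxa, a 99% cutoff is equivalent to requiring presence in every taxon; the distinction matters only for samples of 100 taxa or more.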
Applications of cluster analysis to the creation of perfectionism profiles: a comparison of two clustering approaches.

PubMed

Bolin, Jocelyn H; Edwards, Julianne M; Finch, W Holmes; Cassady, Jerrell C

2014-01-01

Although traditional clustering methods (e.g., K-means) have been shown to be useful in the social sciences, it is often difficult for such methods to handle situations where clusters in the population overlap or are ambiguous. Fuzzy clustering, a method already recognized in many disciplines, provides a more flexible alternative to these traditional clustering methods. Fuzzy clustering differs from other traditional clustering methods in that it allows for a case to belong to multiple clusters simultaneously. Unfortunately, fuzzy clustering techniques remain relatively unused in the social and behavioral sciences. The purpose of this paper is to introduce fuzzy clustering to these audiences who are currently relatively unfamiliar with the technique. In order to demonstrate the advantages associated with this method, cluster solutions of a common perfectionism measure were created using both fuzzy clustering and K-means clustering, and the results compared. Results of these analyses reveal that different cluster solutions are found by the two methods, and the similarity between the different clustering solutions depends on the amount of cluster overlap allowed for in fuzzy clustering.
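The contrast between hard and fuzzy assignment can be sketched with a small one-dimensional fuzzy c-means. This is an illustrative sketch under stated assumptions: the data, the starting centers, and the function names are hypothetical, and the fuzzifier `m` stands in for "the amount of cluster overlap allowed"; the abstract does not say which implementation or settings the authors used.

```python
def fuzzy_memberships(points, centers, m=2.0):
    """Membership of each 1-D point in each cluster; every row sums to 1."""
    power = 2.0 / (m - 1.0)
    rows = []
    for x in points:
        dists = [abs(x - c) for c in centers]
        if 0.0 in dists:
            # Point lies exactly on a center: give it full membership there.
            zeros = dists.count(0.0)
            rows.append([1.0 / zeros if d == 0.0 else 0.0 for d in dists])
            continue
        inv = [d ** -power for d in dists]
        total = sum(inv)
        rows.append([v / total for v in inv])
    return rows


def fuzzy_c_means(points, centers, m=2.0, iterations=50):
    """Alternate membership and center updates for a fixed number of rounds."""
    for _ in range(iterations):
        u = fuzzy_memberships(points, centers, m)
        centers = [
            sum((row[k] ** m) * x for row, x in zip(u, points))
            / sum(row[k] ** m for row in u)
            for k in range(len(centers))
        ]
    return centers, fuzzy_memberships(points, centers, m)


# Hypothetical scores: two tight groups plus one ambiguous case at 3.0.
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 3.0]
centers, memberships = fuzzy_c_means(points, [0.0, 6.0])

# A K-means-style hard label keeps only the largest membership,
# which hides how ambiguous the last case is.
hard_labels = [row.index(max(row)) for row in memberships]
```

The case at 3.0 ends up with roughly equal membership in both clusters, whereas a hard assignment would force it into one of them.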
Applications of cluster analysis to the creation of perfectionism profiles: a comparison of two clustering approaches

PubMed Central

Bolin, Jocelyn H.; Edwards, Julianne M.; Finch, W. Holmes; Cassady, Jerrell C.

2014-01-01

Although traditional clustering methods (e.g., K-means) have been shown to be useful in the social sciences, it is often difficult for such methods to handle situations where clusters in the population overlap or are ambiguous. Fuzzy clustering, a method already recognized in many disciplines, provides a more flexible alternative to these traditional clustering methods. Fuzzy clustering differs from other traditional clustering methods in that it allows for a case to belong to multiple clusters simultaneously. Unfortunately, fuzzy clustering techniques remain relatively unused in the social and behavioral sciences. The purpose of this paper is to introduce fuzzy clustering to these audiences who are currently relatively unfamiliar with the technique. In order to demonstrate the advantages associated with this method, cluster solutions of a common perfectionism measure were created using both fuzzy clustering and K-means clustering, and the results compared. Results of these analyses reveal that different cluster solutions are found by the two methods, and the similarity between the different clustering solutions depends on the amount of cluster overlap allowed for in fuzzy clustering. PMID:24795683
Group sequential designs for stepped-wedge cluster randomised trials

PubMed Central

Grayling, Michael J; Wason, James MS; Mander, Adrian P

2017-01-01

Background/Aims: The stepped-wedge cluster randomised trial design has received substantial attention in recent years. Although various extensions to the original design have been proposed, no guidance is available on the design of stepped-wedge cluster randomised trials with interim analyses. In an individually randomised trial setting, group sequential methods can provide notable efficiency gains and ethical benefits. We address this by discussing how established group sequential methodology can be adapted for stepped-wedge designs. Methods: Utilising the error spending approach to group sequential trial design, we detail the assumptions required for the determination of stepped-wedge cluster randomised trials with interim analyses. We consider early stopping for efficacy, futility, or efficacy and futility. We describe first how this can be done for any specified linear mixed model for data analysis. We then focus on one particular commonly utilised model and, using a recently completed stepped-wedge cluster randomised trial, compare the performance of several designs with interim analyses to the classical stepped-wedge design. Finally, the performance of a quantile substitution procedure for dealing with the case of unknown variance is explored. Results: We demonstrate that the incorporation of early stopping in stepped-wedge cluster randomised trial designs could reduce the expected sample size under the null and alternative hypotheses by up to 31% and 22%, respectively, with no cost to the trial’s type-I and type-II error rates. The use of restricted error maximum likelihood estimation was found to be more important than quantile substitution for controlling the type-I error rate. Conclusion: The addition of interim analyses into stepped-wedge cluster randomised trials could help guard against time-consuming trials conducted on poor performing treatments and also help expedite the implementation of efficacious treatments. 
In future, trialists should consider incorporating early stopping of some kind into stepped-wedge cluster randomised trials according to the needs of the particular trial. PMID:28653550
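The error spending approach mentioned above allocates the overall type-I error across interim analyses as a function of the information fraction. The sketch below uses two standard spending functions from the group sequential literature (the Lan-DeMets O'Brien-Fleming-type and Pocock-type forms). Which spending function the authors used is not stated in the abstract, so both choices, the four equally spaced looks, and the function names are assumptions for illustration only.

```python
import math
from statistics import NormalDist


def obf_spending(t, alpha=0.05):
    """O'Brien-Fleming-type cumulative alpha spent at information fraction t."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * (1 - NormalDist().cdf(z / math.sqrt(t)))


def pocock_spending(t, alpha=0.05):
    """Pocock-type cumulative alpha spent at information fraction t."""
    return alpha * math.log(1 + (math.e - 1) * t)


# Hypothetical design: four equally spaced analyses (three interim, one final).
fractions = [0.25, 0.5, 0.75, 1.0]
obf = [obf_spending(t) for t in fractions]
pocock = [pocock_spending(t) for t in fractions]

# Alpha newly available at each look is the increment in cumulative spend.
obf_increments = [b - a for a, b in zip([0.0] + obf, obf)]
```

Both functions spend the full alpha by the final analysis; the O'Brien-Fleming-type form spends very little early, which makes early stopping for efficacy harder than under the Pocock-type form.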
Group sequential designs for stepped-wedge cluster randomised trials.

PubMed

Grayling, Michael J; Wason, James Ms; Mander, Adrian P

2017-10-01

The stepped-wedge cluster randomised trial design has received substantial attention in recent years. Although various extensions to the original design have been proposed, no guidance is available on the design of stepped-wedge cluster randomised trials with interim analyses. In an individually randomised trial setting, group sequential methods can provide notable efficiency gains and ethical benefits. We address this by discussing how established group sequential methodology can be adapted for stepped-wedge designs. Utilising the error spending approach to group sequential trial design, we detail the assumptions required for the determination of stepped-wedge cluster randomised trials with interim analyses. We consider early stopping for efficacy, futility, or efficacy and futility. We describe first how this can be done for any specified linear mixed model for data analysis. We then focus on one particular commonly utilised model and, using a recently completed stepped-wedge cluster randomised trial, compare the performance of several designs with interim analyses to the classical stepped-wedge design. Finally, the performance of a quantile substitution procedure for dealing with the case of unknown variance is explored. We demonstrate that the incorporation of early stopping in stepped-wedge cluster randomised trial designs could reduce the expected sample size under the null and alternative hypotheses by up to 31% and 22%, respectively, with no cost to the trial's type-I and type-II error rates. The use of restricted error maximum likelihood estimation was found to be more important than quantile substitution for controlling the type-I error rate. The addition of interim analyses into stepped-wedge cluster randomised trials could help guard against time-consuming trials conducted on poor performing treatments and also help expedite the implementation of efficacious treatments. 
In future, trialists should consider incorporating early stopping of some kind into stepped-wedge cluster randomised trials according to the needs of the particular trial.

Replicating cluster subtypes for the prevention of adolescent smoking and alcohol use.

PubMed

Babbin, Steven F; Velicer, Wayne F; Paiva, Andrea L; Brick, Leslie Ann D; Redding, Colleen A

2015-01-01

Substance abuse interventions tailored to the individual level have produced effective outcomes for a wide variety of behaviors. One approach to enhancing tailoring involves using cluster analysis to identify prevention subtypes that represent different attitudes about substance use. This study applied this approach to better understand tailored interventions for smoking and alcohol prevention. Analyses were performed on a sample of sixth graders from 20 New England middle schools involved in a 36-month tailored intervention study. Most adolescents reported being in the Acquisition Precontemplation (aPC) stage at baseline: not smoking or not drinking and not planning to start in the next six months. For smoking (N=4059) and alcohol (N=3973), each sample was randomly split into five subsamples. Cluster analysis was performed within each subsample based on three variables: Pros and Cons (from Decisional Balance Scales), and Situational Temptations. Across all subsamples for both smoking and alcohol, the following four clusters were identified: (1) Most Protected (MP; low Pros, high Cons, low Temptations); (2) Ambivalent (AM; high Pros, average Cons and Temptations); (3) Risk Denial (RD; average Pros, low Cons, average Temptations); and (4) High Risk (HR; high Pros, low Cons, and very high Temptations). Finding the same four clusters within aPC for both smoking and alcohol, replicating the results across the five subsamples, and demonstrating hypothesized relations among the clusters with additional external validity analyses provide strong evidence of the robustness of these results. These clusters demonstrate evidence of validity and can provide a basis for tailoring interventions. Copyright © 2014. Published by Elsevier Ltd.
Replicating cluster subtypes for the prevention of adolescent smoking and alcohol use

PubMed Central

Babbin, Steven F.; Velicer, Wayne F.; Paiva, Andrea L.; Brick, Leslie Ann D.; Redding, Colleen A.

2015-01-01

Introduction: Substance abuse interventions tailored to the individual level have produced effective outcomes for a wide variety of behaviors. One approach to enhancing tailoring involves using cluster analysis to identify prevention subtypes that represent different attitudes about substance use. This study applied this approach to better understand tailored interventions for smoking and alcohol prevention. Methods: Analyses were performed on a sample of sixth graders from 20 New England middle schools involved in a 36-month tailored intervention study. Most adolescents reported being in the Acquisition Precontemplation (aPC) stage at baseline: not smoking or not drinking and not planning to start in the next six months. For smoking (N = 4059) and alcohol (N = 3973), each sample was randomly split into five subsamples. Cluster analysis was performed within each subsample based on three variables: Pros and Cons (from Decisional Balance Scales), and Situational Temptations. Results: Across all subsamples for both smoking and alcohol, the following four clusters were identified: (1) Most Protected (MP; low Pros, high Cons, low Temptations); (2) Ambivalent (AM; high Pros, average Cons and Temptations); (3) Risk Denial (RD; average Pros, low Cons, average Temptations); and (4) High Risk (HR; high Pros, low Cons, and very high Temptations). Conclusions: Finding the same four clusters within aPC for both smoking and alcohol, replicating the results across the five subsamples, and demonstrating hypothesized relations among the clusters with additional external validity analyses provide strong evidence of the robustness of these results. These clusters demonstrate evidence of validity and can provide a basis for tailoring interventions. PMID:25222849
Clusters of Midlife Women by Physical Activity and Their Racial/Ethnic Differences

PubMed Central

Im, Eun-Ok; Ko, Young; Chee, Eunice; Chee, Wonshik; Mao, Jun James

2016-01-01

Objective: The purpose of this study was to identify clusters of midlife women by physical activity and to determine racial/ethnic differences in physical activities in each cluster. Methods: This was a secondary analysis of the data from 542 women (157 Non-Hispanic [NH] Whites, 127 Hispanics, 135 NH African Americans, and 123 NH Asian) in a larger Internet study on midlife women’s attitudes toward physical activity. The instruments included the Barriers to Health Activities Scale, the Physical Activity Assessment Inventory, the Questions on Attitudes toward Physical Activity, Subjective Norm, Perceived Behavioral Control, and Behavioral Intention, and the Kaiser Physical Activity Survey. The data were analyzed using hierarchical cluster analyses, ANOVA, and multinomial logistic analyses. Results: A three-cluster solution was adopted: Cluster 1 (high active living and sports/exercise activity group; 48%), Cluster 2 (high household/caregiving and occupational activity group; 27%), and Cluster 3 (low active living and sports/exercise activity group; 26%). There were significant racial/ethnic differences in occupational activities of Clusters 1 and 3 (all p<.01). Compared with Cluster 1, Cluster 2 tended to have lower family income, less access to health care, higher unemployment, higher perceived barriers scores, and lower social influences scores (all p<.01). Compared with Cluster 1, Cluster 3 tended to have greater obesity, less access to health care, higher perceived barriers scores, more negative attitudes toward physical activity, and lower self-efficacy scores (all p<.01). Conclusions: Midlife women’s unique patterns of physical activity and their associated factors need to be considered in future intervention development. PMID:27846052
Chronic Obstructive Pulmonary Disease heterogeneity: challenges for health risk assessment, stratification and management.

PubMed

Roca, Josep; Vargas, Claudia; Cano, Isaac; Selivanov, Vitaly; Barreiro, Esther; Maier, Dieter; Falciani, Francesco; Wagner, Peter; Cascante, Marta; Garcia-Aymerich, Judith; Kalko, Susana; De Mas, Igor; Tegnér, Jesper; Escarrabill, Joan; Agustí, Alvar; Gomez-Cabrero, David

2014-11-28

Heterogeneity in clinical manifestations and disease progression in Chronic Obstructive Pulmonary Disease (COPD) has consequences for patient health risk assessment, stratification and management. Implicit in the classical "spill over" hypothesis is that COPD heterogeneity is driven by the pulmonary events of the disease. Alternatively, we hypothesized that COPD heterogeneities result from the interplay of mechanisms governing three conceptually different phenomena: 1) pulmonary disease, 2) systemic effects of COPD and 3) co-morbidity clustering, each of them with their own dynamics. The aim was to explore the potential of a systems analysis of COPD heterogeneity, focused on skeletal muscle dysfunction and on co-morbidity clustering, to generate predictive modeling with impact on patient management. To this end, strategies combining deterministic modeling and network medicine analyses of the Biobridge dataset were used to investigate the mechanisms of skeletal muscle dysfunction. An independent data-driven analysis of co-morbidity clustering examining associated genes and pathways was performed using a large dataset (ICD9-CM data from Medicare, 13 million people). Finally, a targeted network analysis using the outcomes of the two approaches (skeletal muscle dysfunction and co-morbidity clustering) explored shared pathways between these phenomena. (1) Evidence of abnormal regulation of skeletal muscle bioenergetics and skeletal muscle remodeling showing a significant association with nitroso-redox disequilibrium was observed in COPD; (2) COPD patients presented a higher risk for co-morbidity clustering than non-COPD patients, increasing with ageing; and (3) the on-going targeted network analyses suggest shared pathways between skeletal muscle dysfunction and co-morbidity clustering. The results indicate the high potential of a systems approach to address COPD heterogeneity. Significant knowledge gaps were identified that are relevant to shape strategies aiming at fostering 4P Medicine for patients with COPD.
Subphenotypes of mild-to-moderate COPD by factor and cluster analysis of pulmonary function, CT imaging and breathomics in a population-based survey.

PubMed

Fens, Niki; van Rossum, Annelot G J; Zanen, Pieter; van Ginneken, Bram; van Klaveren, Rob J; Zwinderman, Aeilko H; Sterk, Peter J

2013-06-01

Classification of COPD is currently based on the presence and severity of airways obstruction. However, this may not fully reflect the phenotypic heterogeneity of COPD in the (ex-) smoking community. We hypothesized that factor analysis followed by cluster analysis of functional, clinical, radiological and exhaled breath metabolomic features identifies subphenotypes of COPD in a community-based population of heavy (ex-) smokers. Adults between 50-75 years with a smoking history of at least 15 pack-years derived from a random population-based survey as part of the NELSON study underwent detailed assessment of pulmonary function, chest CT scanning, questionnaires and exhaled breath molecular profiling using an electronic nose. Factor and cluster analyses were performed on the subgroup of subjects fulfilling the GOLD criteria for COPD (post-BD FEV1/FVC < 0.70). Three hundred subjects were recruited, of which 157 fulfilled the criteria for COPD and were included in the factor and cluster analysis. Four clusters were identified: cluster 1 (n = 35; 22%): mild COPD, limited symptoms and good quality of life. Cluster 2 (n = 48; 31%): low lung function, combined emphysema and chronic bronchitis and a distinct breath molecular profile. Cluster 3 (n = 60; 38%): emphysema predominant COPD with preserved lung function. Cluster 4 (n = 14; 9%): highly symptomatic COPD with mildly impaired lung function. In a leave-one-out validation analysis an accuracy of 97.4% was reached. This unbiased taxonomy for mild to moderate COPD reinforces clusters found in previous studies and thereby allows better phenotyping of COPD in the general (ex-) smoking population.
Profiling physical activity motivation based on self-determination theory: a cluster analysis approach.

PubMed

Friederichs, Stijn Ah; Bolman, Catherine; Oenema, Anke; Lechner, Lilian

2015-01-01

In order to promote physical activity uptake and maintenance in individuals who do not comply with physical activity guidelines, it is important to increase our understanding of physical activity motivation among this group. The present study aimed to examine motivational profiles in a large sample of adults who do not comply with physical activity guidelines. The sample for this study consisted of 2473 individuals (31.4% male; age 44.6 ± 12.9). In order to generate motivational profiles based on motivational regulation, a cluster analysis was conducted. One-way analyses of variance were then used to compare the clusters in terms of demographics, physical activity level, motivation to be active and subjective experience while being active. Three motivational clusters were derived based on motivational regulation scores: a low motivation cluster, a controlled motivation cluster and an autonomous motivation cluster. These clusters differed significantly from each other with respect to physical activity behavior, motivation to be active and subjective experience while being active. Overall, the autonomous motivation cluster displayed more favorable characteristics compared to the other two clusters. The results of this study provide additional support for the importance of autonomous motivation in the context of physical activity behavior. The three derived clusters may be relevant in the context of physical activity interventions as individuals within the different clusters might benefit most from different intervention approaches. In addition, this study shows that cluster analysis is a useful method for differentiating between motivational profiles in large groups of individuals who do not comply with physical activity guidelines.
A Cyber-Attack Detection Model Based on Multivariate Analyses

NASA Astrophysics Data System (ADS)

Sakai, Yuto; Rinsaka, Koichiro; Dohi, Tadashi

In the present paper, we propose a novel cyber-attack detection model that applies two multivariate-analysis methods to the audit data observed on a host machine. The statistical techniques used here are the well-known Hayashi's quantification method IV and cluster analysis. We quantify the observed qualitative audit event sequence via quantification method IV, and collect similar audit event sequences into the same groups based on the cluster analysis. Simulation experiments show that our model can improve the cyber-attack detection accuracy in some realistic cases where both normal and attack activities are intermingled.
Novel clustering of items from the Autism Diagnostic Interview-Revised to define phenotypes within autism spectrum disorders

PubMed Central

Hu, Valerie W.; Steinberg, Mara E.

2009-01-01

Heterogeneity in phenotypic presentation of ASD has been cited as one explanation for the difficulty in pinpointing specific genes involved in autism. Recent studies have attempted to reduce the “noise” in genetic and other biological data by reducing the phenotypic heterogeneity of the sample population. The current study employs multiple clustering algorithms on 123 item scores from the Autism Diagnostic Interview-Revised (ADI-R) diagnostic instrument of nearly 2000 autistic individuals to identify subgroups of autistic probands with clinically relevant behavioral phenotypes in order to isolate more homogeneous groups of subjects for gene expression analyses. Our combined cluster analyses suggest optimal division of the autistic probands into 4 phenotypic clusters based on similarity of symptom severity across the 123 selected item scores. One cluster is characterized by severe language deficits, while another exhibits milder symptoms across the domains. A third group possesses a higher frequency of savant skills while the fourth group exhibited intermediate severity across all domains. Grouping autistic individuals by multivariate cluster analysis of ADI-R scores reveals meaningful phenotypes of subgroups within the autistic spectrum which we show, in a related (accompanying) study, to be associated with distinct gene expression profiles. PMID:19455643
Autism spectrum disorder in Down syndrome: cluster analysis of Aberrant Behaviour Checklist data supports diagnosis.

PubMed

Ji, N Y; Capone, G T; Kaufmann, W E

2011-11-01

The diagnostic validity of autism spectrum disorder (ASD) based on Diagnostic and Statistical Manual of Mental Disorders (DSM) has been challenged in Down syndrome (DS), because of the high prevalence of cognitive impairments in this population. Therefore, we attempted to validate DSM-based diagnoses via an unbiased categorisation of participants with a DSM-independent behavioural instrument. Based on scores on the Aberrant Behaviour Checklist - Community, we performed sequential factor (four DS-relevant factors: Autism-Like Behaviour, Disruptive Behaviour, Hyperactivity, Self-Injury) and cluster analyses on a 293-participant paediatric DS clinic cohort. The four resulting clusters were compared with DSM-delineated groups: DS + ASD, DS + None (no DSM diagnosis), DS + DBD (disruptive behaviour disorder) and DS + SMD (stereotypic movement disorder), the latter two as comparison groups. Two clusters were identified with DS + ASD: Cluster 1 (35.1%) with higher disruptive behaviour and Cluster 4 (48.2%) with more severe autistic behaviour and higher percentage of late onset ASD. The majority of participants in DS + None (71.9%) and DS + DBD (87.5%) were classified into Cluster 2 and 3, respectively, while participants in DS + SMD were relatively evenly distributed throughout the four clusters. Our unbiased, DSM-independent analyses, using a rating scale specifically designed for individuals with severe intellectual disability, demonstrated that DSM-based criteria of ASD are applicable to DS individuals despite their cognitive impairments. Two DS + ASD clusters were identified and supported the existence of at least two subtypes of ASD in DS, which deserve further characterisation. Despite the prominence of stereotypic behaviour in DS, the SMD diagnosis was not identified by cluster analysis, suggesting that high-level stereotypy is distributed throughout DS. 
Further supporting DSM diagnoses, typically behaving DS participants were easily distinguished as a group from those with maladaptive behaviours. © 2011 The Authors. Journal of Intellectual Disability Research © 2011 Blackwell Publishing Ltd.
High ozone levels in the northeast of Portugal: Analysis and characterization

NASA Astrophysics Data System (ADS)

Carvalho, A.; Monteiro, A.; Ribeiro, I.; Tchepel, O.; Miranda, A. I.; Borrego, C.; Saavedra, S.; Souto, J. A.; Casares, J. J.

2010-03-01

Each summer, extremely high ozone levels are registered at the rural background station of Lamas d'Olo, located in the Northeast of Portugal. On average, 30% of the total alert threshold exceedances registered in Portugal are detected at this site. The main purpose of this study is to characterize the atmospheric conditions that lead to the ozone-rich episodes at this site. Analyses of synoptic pattern anomalies and cluster analysis of back trajectories were performed for the period between 2004 and 2007, considering 76 days when maximum hourly ozone concentrations were above 200 μg m⁻³. The obtained atmospheric anomaly fields suggest a positive temperature anomaly above the Iberian Peninsula. A strong wind flow pattern from the NE is observable in the North of Portugal and Galicia, in Spain. These two features may lead to an enhancement of the photochemical production and to the transport of pollutants from Spain to Portugal. In addition, the 3D mean back trajectories associated with the ozone episode days were analysed. A clustering method was applied to the obtained back trajectories. Four main clusters of ozone-rich episodes were identified, with different frequencies of occurrence: north-westerly flows (11%), north-easterly flows (45%), southern flow (4%) and westerly flows (40%). Both analyses highlight the NE flow as a dominant pattern over the North of Portugal during summer. The analysis of the ozone concentrations for each selected cluster indicates that this northeast circulation pattern, together with the southern flow, is responsible for the highest ozone peak episodes. This also suggests that long-range transport of atmospheric pollutants is the main contributor to the ozone levels registered at Lamas d'Olo. This is also highlighted by the correlation of the ozone time-series with the meteorological parameters analysed in the frequency domain.
A comparison of hierarchical cluster analysis and league table rankings as methods for analysis and presentation of district health system performance data in Uganda.

PubMed

Tashobya, Christine K; Dubourg, Dominique; Ssengooba, Freddie; Speybroeck, Niko; Macq, Jean; Criel, Bart

2016-03-01

In 2003, the Uganda Ministry of Health introduced the district league table for district health system performance assessment. The league table presents district performance against a number of input, process and output indicators and a composite index to rank districts. This study explores the use of hierarchical cluster analysis for analysing and presenting district health systems performance data and compares this approach with the use of the league table in Uganda. Ministry of Health and district plans and reports, and published documents were used to provide information on the development and utilization of the Uganda district league table. Quantitative data were accessed from the Ministry of Health databases. Statistical analysis was performed using SPSS version 20, and hierarchical cluster analysis utilizing Ward's method was used. The hierarchical cluster analysis was conducted on the basis of seven clusters determined for each year from 2003 to 2010, ranging from a cluster of good through moderate-to-poor performers. The characteristics and membership of clusters varied from year to year and were determined by the identity and magnitude of performance of the individual variables. Criticisms of the league table include: perceived unfairness, as it did not take into consideration district peculiarities; and being oversummarized and not adequately informative. Clustering organizes the many data points into clusters of similar entities according to an agreed set of indicators and can provide the beginning point for identifying factors behind the observed performance of districts. Although league table rankings emphasize summation and external control, clustering has the potential to encourage a formative, learning approach. More research is required to shed more light on factors behind observed performance of the different clusters. Other countries, especially low-income countries that share many similarities with Uganda, can learn from these experiences. © The Author 2015. Published by Oxford University Press in association with The London School of Hygiene and Tropical Medicine.
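Hierarchical clustering with Ward's method repeatedly merges the two clusters whose union gives the smallest increase in within-cluster sum of squares. The sketch below is a naive pure-Python version of that merge rule, written only to illustrate the idea; the study used SPSS, and the function name, the two indicators, and the district values here are hypothetical.

```python
def ward_clusters(rows, n_clusters):
    """Agglomerative clustering using Ward's merge criterion.

    Returns a sorted list of clusters, each a sorted list of row indices.
    """
    clusters = [[i] for i in range(len(rows))]
    dims = len(rows[0])

    def centroid(members):
        return [sum(rows[i][d] for i in members) / len(members) for d in range(dims)]

    def merge_cost(a, b):
        # Increase in within-cluster sum of squares if a and b are merged.
        ca, cb = centroid(a), centroid(b)
        sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * sq_dist

    while len(clusters) > n_clusters:
        pairs = (
            (p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        p, q = min(pairs, key=lambda pq: merge_cost(clusters[pq[0]], clusters[pq[1]]))
        merged = sorted(clusters[p] + clusters[q])
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)] + [merged]
    return sorted(clusters)


# Hypothetical districts scored on two indicators scaled to the range 0-1.
districts = [
    [0.90, 0.80], [0.85, 0.90],   # strong performers
    [0.50, 0.55], [0.45, 0.50],   # moderate performers
    [0.10, 0.20], [0.15, 0.10],   # weak performers
]
groups = ward_clusters(districts, n_clusters=3)
```

The study cut the hierarchy at seven clusters per year; three are used here only because the toy data set is small. Unlike a single composite rank, each group can then be inspected indicator by indicator.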
A comparison of hierarchical cluster analysis and league table rankings as methods for analysis and presentation of district health system performance data in Uganda

PubMed Central

Tashobya, Christine K; Dubourg, Dominique; Ssengooba, Freddie; Speybroeck, Niko; Macq, Jean; Criel, Bart

2016-01-01

In 2003, the Uganda Ministry of Health introduced the district league table for district health system performance assessment. The league table presents district performance against a number of input, process and output indicators and a composite index to rank districts. This study explores the use of hierarchical cluster analysis for analysing and presenting district health systems performance data and compares this approach with the use of the league table in Uganda. Ministry of Health and district plans and reports, and published documents were used to provide information on the development and utilization of the Uganda district league table. Quantitative data were accessed from the Ministry of Health databases. Statistical analysis was performed using SPSS version 20, and hierarchical cluster analysis utilizing Ward’s method was used. The hierarchical cluster analysis was conducted on the basis of seven clusters determined for each year from 2003 to 2010, ranging from a cluster of good through moderate-to-poor performers. The characteristics and membership of clusters varied from year to year and were determined by the identity and magnitude of performance of the individual variables. Criticisms of the league table include: perceived unfairness, as it did not take into consideration district peculiarities; and being oversummarized and not adequately informative. Clustering organizes the many data points into clusters of similar entities according to an agreed set of indicators and can provide the beginning point for identifying factors behind the observed performance of districts. Although league table rankings emphasize summation and external control, clustering has the potential to encourage a formative, learning approach. More research is required to shed more light on factors behind observed performance of the different clusters. Other countries, especially low-income countries that share many similarities with Uganda, can learn from these experiences. PMID:26024882
Cluster headache and the hypocretin receptor 2 reconsidered: a genetic association study and meta-analysis.

PubMed

Weller, Claudia M; Wilbrink, Leopoldine A; Houwing-Duistermaat, Jeanine J; Koelewijn, Stephany C; Vijfhuizen, Lisanne S; Haan, Joost; Ferrari, Michel D; Terwindt, Gisela M; van den Maagdenberg, Arn M J M; de Vries, Boukje

2015-08-01

Cluster headache is a severe neurological disorder with a complex genetic background. A missense single nucleotide polymorphism (rs2653349; p.Ile308Val) in the HCRTR2 gene that encodes the hypocretin receptor 2 is the only genetic factor that is reported to be associated with cluster headache in different studies. However, as there are conflicting results between studies, we re-evaluated its role in cluster headache. We performed a genetic association analysis for rs2653349 in our large Leiden University Cluster headache Analysis (LUCA) program study population. Systematic selection of the literature yielded three additional studies comprising five study populations, which were included in our meta-analysis. Data were extracted according to predefined criteria. A total of 575 cluster headache patients from our LUCA study and 874 controls were genotyped for HCRTR2 SNP rs2653349 but no significant association with cluster headache was found (odds ratio 0.91 (95% confidence intervals 0.75-1.10), p = 0.319). In contrast, the meta-analysis that included in total 1167 cluster headache cases and 1618 controls from the six study populations, which were part of four different studies, showed association of the single nucleotide polymorphism with cluster headache (random effect odds ratio 0.69 (95% confidence intervals 0.53-0.90), p = 0.006). The association became weaker, as the odds ratio increased to 0.80, when the meta-analysis was repeated without the initial single South European study with the largest effect size. Although we did not find evidence for association of rs2653349 in our LUCA study, which is the largest investigated study population thus far, our meta-analysis provides genetic evidence for a role of HCRTR2 in cluster headache. Regardless, we feel that the association should be interpreted with caution as meta-analyses with individual populations that have limited power have diminished validity. © International Headache Society 2014.
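The pooled odds ratio reported above comes from a random-effects meta-analysis. The sketch below shows one common way such a pooled estimate can be computed, the DerSimonian-Laird estimator applied to log odds ratios. The abstract does not say which estimator the authors used, so the estimator choice is an assumption, and the 2x2 counts in the usage lines are invented for illustration, not taken from the study.

```python
import math


def log_odds_ratio(a, b, c, d):
    """Log odds ratio and its variance for one 2x2 table.

    a, b = carrier / non-carrier counts among cases
    c, d = carrier / non-carrier counts among controls
    """
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def random_effects_pool(tables):
    """DerSimonian-Laird pooled OR, 95% CI bounds and between-study variance."""
    effects = [log_odds_ratio(*t) for t in tables]
    y = [e for e, _ in effects]
    w = [1 / v for _, v in effects]  # fixed-effect (inverse-variance) weights
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0
    w_star = [1 / (v + tau2) for _, v in effects]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return (math.exp(pooled), math.exp(pooled - 1.96 * se),
            math.exp(pooled + 1.96 * se), tau2)


# Invented counts for three study populations (not data from the paper).
example_tables = [(40, 160, 70, 130), (55, 245, 60, 240), (30, 120, 45, 105)]
pooled_or, ci_low, ci_high, tau2 = random_effects_pool(example_tables)
```

When the between-study variance `tau2` is estimated as zero, the result coincides with the fixed-effect inverse-variance estimate.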
Intracluster age gradients in numerous young stellar clusters

NASA Astrophysics Data System (ADS)

Getman, K. V.; Feigelson, E. D.; Kuhn, M. A.; Bate, M. R.; Broos, P. S.; Garmire, G. P.

2018-05-01

The pace and pattern of star formation leading to rich young stellar clusters is quite uncertain. In this context, we analyse the spatial distribution of ages within 19 young (median t ≲ 3 Myr on the Siess et al. time-scale), morphologically simple, isolated, and relatively rich stellar clusters. Our analysis is based on young stellar object (YSO) samples from the Massive Young Star-Forming Complex Study in Infrared and X-ray and Star Formation in Nearby Clouds surveys, and a new estimator of pre-main sequence (PMS) stellar ages, AgeJX, derived from X-ray and near-infrared photometric data. Median cluster ages are computed within four annular subregions of the clusters. We confirm and extend the earlier result of Getman et al. (2014): 80 per cent of the clusters show age trends where stars in cluster cores are younger than in outer regions. Our cluster stacking analyses establish the existence of an age gradient to high statistical significance in several ways. Time-scales vary with the choice of PMS evolutionary model; the inferred median age gradient across the studied clusters ranges from 0.75 to 1.5 Myr pc⁻¹. The empirical finding reported in the present study - late or continuing formation of stars in the cores of star clusters with older stars dispersed in the outer regions - has a strong foundation with other observational studies and with the astrophysical models like the global hierarchical collapse model of Vázquez-Semadeni et al.
Detection of protein complex from protein-protein interaction network using Markov clustering

NASA Astrophysics Data System (ADS)

Ochieng, P. J.; Kusuma, W. A.; Haryanto, T.

2017-05-01

Detection of complexes, or groups of functionally related proteins, is an important challenge when analysing biological networks. However, existing algorithms to identify protein complexes are insufficient when applied to dense networks of experimentally derived interaction data. Therefore, we introduced a graph clustering method based on the Markov clustering (MCL) algorithm to identify protein complexes within highly interconnected protein-protein interaction (PPI) networks. A protein-protein interaction network was first constructed to develop a geometrical network; the network was then partitioned using Markov clustering to detect protein complexes. The value of the proposed method was illustrated by its application to human proteins associated with type II diabetes mellitus. Flow simulation of the MCL algorithm was initially performed, and topological properties of the resultant network were analysed for detection of protein complexes. The results indicated that the proposed method successfully detected a total of 34 complexes, with 11 complexes consisting of overlapping modules and 20 non-overlapping modules. The major complex consisted of 102 proteins and 521 interactions, with cluster modularity and density of 0.745 and 0.101, respectively. The comparison analysis revealed that MCL outperformed the AP, MCODE and SCPS algorithms, with a high clustering coefficient (0.751), network density and modularity index (0.630). This demonstrated that MCL was the most reliable and efficient graph clustering algorithm for detection of protein complexes from PPI networks.
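The flow simulation mentioned above alternates two matrix operations, expansion and inflation, until the flow matrix settles into separate clusters. The following is a minimal pure-Python sketch of that general Markov clustering idea, not the authors' pipeline. The inflation value of 2.0, the fixed iteration count, the added self-loops and the toy six-node graph are all assumptions made for illustration.

```python
def normalize_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def markov_cluster(adjacency, inflation=2.0, iterations=50):
    """Cluster an undirected graph given as a 0/1 adjacency matrix."""
    n = len(adjacency)
    # Add self-loops, then convert to transition probabilities.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = matmul(m, m)  # expansion: flow spreads along two-step walks
        # inflation: raise entries to a power, strengthening strong flows
        m = normalize_columns([[x ** inflation for x in row] for row in m])
    # Rows that keep non-negligible mass act as attractors; the columns they
    # attract form one cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Toy network: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
toy = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
complexes = markov_cluster(toy)
```

Larger inflation values tend to produce more, smaller clusters; networks of realistic size would call for sparse matrices rather than nested lists.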
Knowledge, attitudes towards and acceptability of genetic modification in Germany.

PubMed

Christoph, Inken B; Bruhn, Maike; Roosen, Jutta

2008-07-01

Genetic modification remains a controversial issue. The aim of this study is to analyse the attitudes towards genetic modification, the knowledge about it and its acceptability in different application areas among German consumers. Results are based on a survey from spring 2005. An exploratory factor analysis is conducted to identify the attitudes towards genetic modification. The identified factors are used in a cluster analysis that identified a cluster of supporters, of opponents and a group of indifferent consumers. Respondents' knowledge of genetics and biotechnology differs among the found clusters without revealing a clear relationship between knowledge and support of genetic modification. The acceptability of genetic modification varies by application area and cluster, and genetically modified non-food products are more widely accepted than food products. The perception of personal health risks has high explanatory power for attitudes and acceptability.
Space-time analysis of pneumonia hospitalisations in the Netherlands.

PubMed

Benincà, Elisa; van Boven, Michiel; Hagenaars, Thomas; van der Hoek, Wim

2017-01-01

Community acquired pneumonia is a major global public health problem. In the Netherlands there are 40,000-50,000 hospital admissions for pneumonia per year. In the large majority of these hospital admissions the etiologic agent is not determined and a real-time surveillance system is lacking. Localised and temporal increases in hospital admissions for pneumonia are therefore only detected retrospectively and the etiologic agents remain unknown. Here, we perform spatio-temporal analyses of pneumonia hospital admission data in the Netherlands. To this end, we scanned for spatial clusters on yearly and seasonal basis, and applied wavelet cluster analysis on the time series of five main regions. The pneumonia hospital admissions show strong clustering in space and time superimposed on a regular yearly cycle with high incidence in winter and low incidence in summer. Cluster analysis reveals a heterogeneous pattern, with most significant clusters occurring in the western, highly urbanised, and in the eastern, intensively farmed, part of the Netherlands. Quantitatively, the relative risk (RR) of the significant clusters for the age-standardised incidence varies from a minimum of 1.2 to a maximum of 2.2. We discuss possible underlying causes for the patterns observed, such as variations in air pollution.
Relationship between Procedural Tactical Knowledge and Specific Motor Skills in Young Soccer Players

PubMed Central

Aquino, Rodrigo; Marques, Renato Francisco R.; Petiot, Grégory Hallé; Gonçalves, Luiz Guilherme C.; Moraes, Camila; Santiago, Paulo Roberto P.; Puggina, Enrico Fuini

2016-01-01

The purpose of this study was to investigate the association between offensive tactical knowledge and the soccer-specific motor skills performance. Fifteen participants were submitted to two evaluation tests, one to assess their technical and tactical analysis. The motor skills performance was measured through four tests of technical soccer skills: ball control, shooting, passing and dribbling. The tactical performance was based on a tactical assessment system called FUT-SAT (Analyses of Procedural Tactical Knowledge in Soccer). Afterwards, technical and tactical evaluation scores were ranked with and without the use of the cluster method. A positive, weak correlation was perceived in both analyses (rho = 0.39, not significant p = 0.14 (with cluster analysis); and rho = 0.35; not significant p = 0.20 (without cluster analysis)). We can conclude that there was a weak association between the technical and the offensive tactical knowledge. This shows the need to reflect on the use of such tests to assess technical skills in team sports since they do not take into account the variability and unpredictability of game actions and disregard the inherent needs to assess such skill performance in the game. PMID:29910300
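The correlation reported above is given as rho, which is assumed here to denote Spearman's rank correlation. The sketch below shows how such a coefficient can be computed from two score lists by ranking each list and correlating the ranks. The function names and the sample scores are hypothetical and unrelated to the study's data.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the rank-transformed inputs."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical technical and tactical scores for five players.
technical = [12, 15, 9, 20, 17]
tactical = [30, 28, 25, 40, 33]
rho = spearman_rho(technical, tactical)
```

With only fifteen participants, as in the study, even a moderate rho can fail to reach significance, which is consistent with the p values reported.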
Phenotypes determined by cluster analysis in severe or difficult-to-treat asthma.

PubMed

Schatz, Michael; Hsu, Jin-Wen Y; Zeiger, Robert S; Chen, Wansu; Dorenbaum, Alejandro; Chipps, Bradley E; Haselkorn, Tmirah

2014-06-01

Asthma phenotyping can facilitate understanding of disease pathogenesis and potential targeted therapies. To further characterize the distinguishing features of phenotypic groups in difficult-to-treat asthma. Children ages 6-11 years (n = 518) and adolescents and adults ages ≥12 years (n = 3612) with severe or difficult-to-treat asthma from The Epidemiology and Natural History of Asthma: Outcomes and Treatment Regimens (TENOR) study were evaluated in this post hoc cluster analysis. Analyzed variables included sex, race, atopy, age of asthma onset, smoking (adolescents and adults), passive smoke exposure (children), obesity, and aspirin sensitivity. Cluster analysis used the hierarchical clustering algorithm with the Ward minimum variance method. The results were compared among clusters by χ(2) analysis; variables with significant (P < .05) differences among clusters were considered as distinguishing feature candidates. Associations among clusters and asthma-related health outcomes were assessed in multivariable analyses by adjusting for socioeconomic status, environmental exposures, and intensity of therapy. Five clusters were identified in each age stratum. Sex, atopic status, and nonwhite race were distinguishing variables in both strata; passive smoke exposure was distinguishing in children and aspirin sensitivity in adolescents and adults. Clusters were not related to outcomes in children, but 2 adult and adolescent clusters distinguished by nonwhite race and aspirin sensitivity manifested poorer quality of life (P < .0001), and the aspirin-sensitive cluster experienced more frequent asthma exacerbations (P < .0001). Distinct phenotypes appear to exist in patients with severe or difficult-to-treat asthma, which is related to outcomes in adolescents and adults but not in children. The study of the therapeutic implications of these phenotypes is warranted. Copyright © 2013 American Academy of Allergy, Asthma & Immunology. Published by Mosby, Inc. 
Probabilistic Analysis of Hierarchical Cluster Protocols for Wireless Sensor Networks

NASA Astrophysics Data System (ADS)

Kaj, Ingemar

Wireless sensor networks are designed to extract data from the deployment environment and combine sensing, data processing and wireless communication to provide useful information for the network users. Hundreds or thousands of small embedded units, which operate under low-energy supply and with limited access to central network control, rely on interconnecting protocols to coordinate data aggregation and transmission. Energy efficiency is crucial and it has been proposed that cluster based and distributed architectures such as LEACH are particularly suitable. We analyse the random cluster hierarchy in this protocol and provide a solution for low-energy and limited-loss optimization. Moreover, we extend these results to a multi-level version of LEACH, where clusters of nodes again self-organize to form clusters of clusters, and so on.
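For readers unfamiliar with LEACH, the sketch below simulates its randomized cluster-head election. The threshold rule T = p / (1 - p (r mod 1/p)) comes from the original LEACH protocol description rather than from this abstract, and the function names, node count and desired head fraction p are illustrative assumptions.

```python
import random


def leach_threshold(p, round_index):
    """Election threshold for nodes that have not yet served in this epoch."""
    epoch = round(1 / p)
    return p / (1 - p * (round_index % epoch))


def elect_cluster_heads(node_ids, p, rounds, seed=0):
    """Return, for each round, the set of nodes that elect themselves head."""
    rng = random.Random(seed)
    epoch = round(1 / p)
    all_nodes = list(node_ids)
    eligible = set(all_nodes)
    history = []
    for r in range(rounds):
        if r % epoch == 0:
            eligible = set(all_nodes)  # new epoch: every node may serve again
        t = leach_threshold(p, r)
        # Each eligible node draws a uniform number and serves if it is below t.
        heads = {n for n in sorted(eligible) if rng.random() < t}
        eligible -= heads
        history.append(heads)
    return history


# Eight hypothetical nodes, 25% desired cluster heads, one epoch of 4 rounds.
schedule = elect_cluster_heads(range(8), p=0.25, rounds=4)
```

Because the threshold rises to 1 in the last round of an epoch, every node serves as cluster head exactly once per epoch, which spreads the energy cost of aggregation and transmission across the network.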

Gene duplications in prokaryotes can be associated with environmental adaptation

PubMed Central

2010-01-01

Background Gene duplication is a normal evolutionary process. If there is no selective advantage in keeping the duplicated gene, it is usually reduced to a pseudogene and disappears from the genome. However, some paralogs are retained. These gene products are likely to be beneficial to the organism, e.g. in adaptation to new environmental conditions. The aim of our analysis is to investigate the properties of paralog-forming genes in prokaryotes, and to analyse the role of these retained paralogs by relating gene properties to life style of the corresponding prokaryotes. Results Paralogs were identified in a number of prokaryotes, and these paralogs were compared to singletons of persistent orthologs based on functional classification. This showed that the paralogs were associated with for example energy production, cell motility, ion transport, and defence mechanisms. A statistical overrepresentation analysis of gene and protein annotations was based on paralogs of the 200 prokaryotes with the highest fraction of paralog-forming genes. Biclustering of overrepresented gene ontology terms versus species was used to identify clusters of properties associated with clusters of species. The clusters were classified using similarity scores on properties and species to identify interesting clusters, and a subset of clusters were analysed by comparison to literature data. This analysis showed that paralogs often are associated with properties that are important for survival and proliferation of the specific organisms. This includes processes like ion transport, locomotion, chemotaxis and photosynthesis. However, the analysis also showed that the gene ontology terms sometimes were too general, imprecise or even misleading for automatic analysis. Conclusions Properties described by gene ontology terms identified in the overrepresentation analysis are often consistent with individual prokaryote lifestyles and are likely to give a competitive advantage to the organism. 
Paralogs and singletons dominate different categories of functional classification, where paralogs in particular seem to be associated with processes involving interaction with the environment. PMID:20961426
Gene duplications in prokaryotes can be associated with environmental adaptation.

PubMed

Bratlie, Marit S; Johansen, Jostein; Sherman, Brad T; Huang, Da Wei; Lempicki, Richard A; Drabløs, Finn

2010-10-20

Gene duplication is a normal evolutionary process. If there is no selective advantage in keeping the duplicated gene, it is usually reduced to a pseudogene and disappears from the genome. However, some paralogs are retained. These gene products are likely to be beneficial to the organism, e.g. in adaptation to new environmental conditions. The aim of our analysis is to investigate the properties of paralog-forming genes in prokaryotes, and to analyse the role of these retained paralogs by relating gene properties to life style of the corresponding prokaryotes. Paralogs were identified in a number of prokaryotes, and these paralogs were compared to singletons of persistent orthologs based on functional classification. This showed that the paralogs were associated with for example energy production, cell motility, ion transport, and defence mechanisms. A statistical overrepresentation analysis of gene and protein annotations was based on paralogs of the 200 prokaryotes with the highest fraction of paralog-forming genes. Biclustering of overrepresented gene ontology terms versus species was used to identify clusters of properties associated with clusters of species. The clusters were classified using similarity scores on properties and species to identify interesting clusters, and a subset of clusters were analysed by comparison to literature data. This analysis showed that paralogs often are associated with properties that are important for survival and proliferation of the specific organisms. This includes processes like ion transport, locomotion, chemotaxis and photosynthesis. However, the analysis also showed that the gene ontology terms sometimes were too general, imprecise or even misleading for automatic analysis. Properties described by gene ontology terms identified in the overrepresentation analysis are often consistent with individual prokaryote lifestyles and are likely to give a competitive advantage to the organism. 
Paralogs and singletons dominate different categories of functional classification, where paralogs in particular seem to be associated with processes involving interaction with the environment.
The smart cluster method. Adaptive earthquake cluster identification and analysis in strong seismic regions

NASA Astrophysics Data System (ADS)

Schaefer, Andreas M.; Daniell, James E.; Wenzel, Friedemann

2017-07-01

Earthquake clustering is an essential part of almost any statistical analysis of spatial and temporal properties of seismic activity. The nature of earthquake clusters and subsequent declustering of earthquake catalogues plays a crucial role in determining the magnitude-dependent earthquake return period and its respective spatial variation for probabilistic seismic hazard assessment. This study introduces the Smart Cluster Method (SCM), a new methodology to identify earthquake clusters, which uses an adaptive point process for spatio-temporal cluster identification. It utilises the magnitude-dependent spatio-temporal earthquake density to adjust the search properties, subsequently analyses the identified clusters to determine directional variation and adjusts its search space with respect to directional properties. In the case of rapid subsequent ruptures like the 1992 Landers sequence or the 2010-2011 Darfield-Christchurch sequence, a reclassification procedure is applied to disassemble subsequent ruptures using near-field searches, nearest neighbour classification and temporal splitting. The method is capable of identifying and classifying earthquake clusters in space and time. It has been tested and validated using earthquake data from California and New Zealand. A total of more than 1500 clusters have been found in both regions since 1980 with M_min = 2.0. Utilising the knowledge of cluster classification, the method has been adjusted to provide an earthquake declustering algorithm, which has been compared to existing methods. Its performance is comparable to established methodologies. The analysis of earthquake clustering statistics led to various new and updated correlation functions, e.g. for ratios between mainshock and strongest aftershock and general aftershock activity metrics.
Text-mining analysis of mHealth research.

PubMed

Ozaydin, Bunyamin; Zengul, Ferhat; Oner, Nurettin; Delen, Dursun

2017-01-01

In recent years, because of the advancements in communication and networking technologies, mobile technologies have been developing at an unprecedented rate. mHealth, the use of mobile technologies in medicine, and the related research has also surged parallel to these technological advancements. Although there have been several attempts to review mHealth research through manual processes such as systematic reviews, the sheer magnitude of the number of studies published in recent years makes this task very challenging. The most recent developments in machine learning and text mining offer some potential solutions to address this challenge by allowing analyses of large volumes of texts through semi-automated processes. The objective of this study is to analyze the evolution of mHealth research by utilizing text-mining and natural language processing (NLP) analyses. The study sample included abstracts of 5,644 mHealth research articles, which were gathered from five academic search engines by using search terms such as mobile health, and mHealth. The analysis used the Text Explorer module of JMP Pro 13 and an iterative semi-automated process involving tokenizing, phrasing, and terming. After developing the document term matrix (DTM), analyses such as singular value decomposition (SVD), topic, and hierarchical document clustering were performed, along with the topic-informed document clustering approach. The results were presented in the form of word-clouds and trend analyses. There were several major findings regarding research clusters and trends. First, our results confirmed the time-dependent nature of terminology use in mHealth research. For example, in earlier versus recent years the use of terminology changed from "mobile phone" to "smartphone" and from "applications" to "apps". 
Second, ten clusters for mHealth research were identified including (I) Clinical Research on Lifestyle Management, (II) Community Health, (III) Literature Review, (IV) Medical Interventions, (V) Research Design, (VI) Infrastructure, (VII) Applications, (VIII) Research and Innovation in Health Technologies, (IX) Sensor-based Devices and Measurement Algorithms, (X) Survey-based Research. Third, the trend analyses indicated the infrastructure cluster as the highest percentage researched area until 2014. The Research and Innovation in Health Technologies cluster experienced the largest increase in numbers of publications in recent years, especially after 2014. This study is unique because it is the only known study utilizing text-mining analyses to reveal the streams and trends for mHealth research. The fast growth in mobile technologies is expected to lead to higher numbers of studies focusing on mHealth and its implications for various healthcare outcomes. Findings of this study can be utilized by researchers in identifying areas for future studies.
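The document term matrix (DTM) that underlies the analyses above can be illustrated with a short sketch: each row is a document, each column a term, and each cell a count. The study built its DTM with JMP Pro's Text Explorer, so the tokenizer, the function name and the two sample documents below are simplifying assumptions rather than a reproduction of that workflow.

```python
import re
from collections import Counter


def document_term_matrix(docs):
    """Return (vocabulary, matrix) with one row of term counts per document."""
    # Lowercase each document and keep alphabetic tokens only.
    tokenized = [re.findall(r"[a-z]+", d.lower()) for d in docs]
    vocab = sorted({term for doc in tokenized for term in doc})
    counts = [Counter(doc) for doc in tokenized]
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


# Two made-up abstract fragments echoing the terminology shift noted above.
sample_docs = ["Mobile phone applications", "Smartphone apps and more apps"]
vocab, dtm = document_term_matrix(sample_docs)
```

A matrix of this shape is what later steps such as singular value decomposition or hierarchical document clustering would take as input, usually after stop-word removal and term weighting.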
Text-mining analysis of mHealth research

PubMed Central

Zengul, Ferhat; Oner, Nurettin; Delen, Dursun

2017-01-01

In recent years, because of the advancements in communication and networking technologies, mobile technologies have been developing at an unprecedented rate. mHealth, the use of mobile technologies in medicine, and the related research has also surged parallel to these technological advancements. Although there have been several attempts to review mHealth research through manual processes such as systematic reviews, the sheer magnitude of the number of studies published in recent years makes this task very challenging. The most recent developments in machine learning and text mining offer some potential solutions to address this challenge by allowing analyses of large volumes of texts through semi-automated processes. The objective of this study is to analyze the evolution of mHealth research by utilizing text-mining and natural language processing (NLP) analyses. The study sample included abstracts of 5,644 mHealth research articles, which were gathered from five academic search engines by using search terms such as mobile health, and mHealth. The analysis used the Text Explorer module of JMP Pro 13 and an iterative semi-automated process involving tokenizing, phrasing, and terming. After developing the document term matrix (DTM), analyses such as singular value decomposition (SVD), topic, and hierarchical document clustering were performed, along with the topic-informed document clustering approach. The results were presented in the form of word-clouds and trend analyses. There were several major findings regarding research clusters and trends. First, our results confirmed the time-dependent nature of terminology use in mHealth research. For example, in earlier versus recent years the use of terminology changed from “mobile phone” to “smartphone” and from “applications” to “apps”. 
Second, ten clusters for mHealth research were identified including (I) Clinical Research on Lifestyle Management, (II) Community Health, (III) Literature Review, (IV) Medical Interventions, (V) Research Design, (VI) Infrastructure, (VII) Applications, (VIII) Research and Innovation in Health Technologies, (IX) Sensor-based Devices and Measurement Algorithms, (X) Survey-based Research. Third, the trend analyses indicated the infrastructure cluster as the highest percentage researched area until 2014. The Research and Innovation in Health Technologies cluster experienced the largest increase in numbers of publications in recent years, especially after 2014. This study is unique because it is the only known study utilizing text-mining analyses to reveal the streams and trends for mHealth research. The fast growth in mobile technologies is expected to lead to higher numbers of studies focusing on mHealth and its implications for various healthcare outcomes. Findings of this study can be utilized by researchers in identifying areas for future studies. PMID:29430456
Alteration mapping at Goldfield, Nevada, by cluster and discriminant analysis of LANDSAT digital data

NASA Technical Reports Server (NTRS)

Ballew, G.

1977-01-01

The ability of Landsat multispectral digital data to differentiate among 62 combinations of rock and alteration types at the Goldfield mining district of Western Nevada was investigated by using statistical techniques of cluster and discriminant analysis. Multivariate discriminant analysis was not effective in classifying each of the 62 groups, with classification results essentially the same whether data of four channels alone or combined with six ratios of channels were used. Bivariate plots of group means revealed a cluster of three groups including mill tailings, basalt and all other rock and alteration types. Automatic hierarchical clustering based on the four-dimensional Mahalanobis distance between group means of 30 groups having five or more samples was performed. The results of the cluster analysis revealed hierarchies of mill tailings vs. natural materials, basalt vs. non-basalt, highly reflectant rocks vs. other rocks and exclusively unaltered rocks vs. predominantly altered rocks. The hierarchies were used to determine the order in which sets of multiple discriminant analyses were to be performed and the resulting discriminant functions were used to produce a map of geology and alteration which has an overall accuracy of 70 percent for discriminating exclusively altered rocks from predominantly altered rocks.
Deriving spatial patterns from a novel database of volcanic rock geochemistry in the Virunga Volcanic Province, East African Rift

NASA Astrophysics Data System (ADS)

Poppe, Sam; Barette, Florian; Smets, Benoît; Benbakkar, Mhammed; Kervyn, Matthieu

2016-04-01

The Virunga Volcanic Province (VVP) is situated within the western branch of the East-African Rift. The geochemistry and petrology of its volcanic products have been studied extensively in a fragmented manner. They represent a unique collection of silica-undersaturated, ultra-alkaline and ultra-potassic compositions, displaying marked geochemical variations over the area occupied by the VVP. We present a novel spatially-explicit database of existing whole-rock geochemical analyses of the VVP volcanics, compiled from international publications, (post-)colonial scientific reports and PhD theses. In the database, a total of 703 geochemical analyses of whole-rock samples collected from the 1950s until recently have been characterised with a geographical location, eruption source location, analytical results and uncertainty estimates for each of these categories. Comparative box plots and Kruskal-Wallis H tests on subsets of analyses with contrasting ages or analytical methods suggest that the overall database accuracy is consistent. We demonstrate how statistical techniques such as Principal Component Analysis (PCA) and subsequent cluster analysis allow the identification of clusters of samples with similar major-element compositions. The spatial patterns represented by the contrasting clusters show that both the historically active volcanoes represent compositional clusters which can be identified based on their contrasted silica and alkali contents. Furthermore, two sample clusters are interpreted to represent the most primitive, deep magma source within the VVP, different from the shallow magma reservoirs that feed the eight dominant large volcanoes. The samples from these two clusters systematically originate from locations which 1. are distal compared to the eight large volcanoes and 2. mostly coincide with the surface expressions of rift faults or NE-SW-oriented inherited Precambrian structures which were reactivated during rifting. 
The lava from the Mugogo eruption of 1957 belongs to these primitive clusters and is the only one known to have erupted outside the current rift valley in historical times. We thus infer there is a distributed hazard of vent opening susceptibility additional to the susceptibility associated with the main Virunga edifices. This study suggests that the statistical analysis of such a geochemical database may help to understand complex volcanic plumbing systems and the spatial distribution of volcanic hazards in active and poorly known volcanic areas such as the Virunga Volcanic Province.
Molecular Clustering Interrelationships and Carbohydrate Conformation in Hull and Seeds Among Barley Cultivars

DOE Office of Scientific and Technical Information (OSTI.GOV)

N Liu; P Yu

2011-12-31

The objective of this study was to use molecular spectral analyses with the diffuse reflectance Fourier transform infrared spectroscopy (DRIFT) bioanalytical technique to study carbohydrate conformation features, molecular clustering and interrelationships in hull and seed among six barley cultivars (AC Metcalfe, CDC Dolly, McLeod, CDC Helgason, CDC Trey, CDC Cowboy), which had different degradation kinetics in rumen. The molecular structure spectral analyses in both hull and seed involved the fingerprint regions of ca. 1536-1484 cm⁻¹ (attributed mainly to aromatic lignin semicircle ring stretch), ca. 1293-1212 cm⁻¹ (attributed mainly to cellulosic compounds in the hull), ca. 1269-1217 cm⁻¹ (attributed mainly to cellulosic compounds in the seeds), and ca. 1180-800 cm⁻¹ (attributed mainly to total CHO C-O stretching vibrations) together with agglomerative hierarchical cluster analysis (AHCA) and principal component analysis (PCA). The results showed that the DRIFT technique plus AHCA and PCA molecular analyses were able to reveal carbohydrate conformation features and identify carbohydrate molecular structure differences in both hull and seeds among the barley varieties. The carbohydrate molecular spectral analyses at the region of ca. 1185-800 cm⁻¹ together with the AHCA and PCA were able to show that the barley seed inherent structures exhibited distinguishable differences among the barley varieties. CDC Helgason had differences from AC Metcalfe, McLeod, CDC Cowboy and CDC Dolly in carbohydrate conformation in the seed. Clear molecular cluster classes could be distinguished and identified in AHCA analysis and the separate ellipses could be grouped in PCA analysis. But CDC Helgason had no distinguished differences from CDC Trey in carbohydrate conformation. These carbohydrate conformation/structure differences could partially explain why the varieties were different in digestive behaviors in animals. The molecular spectroscopy technique used in this study could also be used for other plant-based feed and food structure studies.
An Ecological Analysis of the Effects of Deviant Peer Clustering on Sexual Promiscuity, Problem Behavior, and Childbearing from Early Adolescence to Adulthood: An Enhancement of the Life History Framework

ERIC Educational Resources Information Center

Dishion, Thomas J.; Ha, Thao; Veronneau, Marie-Helene

2012-01-01

The authors propose that peer relationships should be included in a life history perspective on adolescent problem behavior. Longitudinal analyses were used to examine deviant peer clustering as the mediating link between attenuated family ties, peer marginalization, and social disadvantage in early adolescence and sexual promiscuity in middle…
Employing linear tetranuclear [Zn4(COO)4(OH)2] clusters as building subunits to construct a new Zn(II) coordination polymer with tunable luminescent properties

NASA Astrophysics Data System (ADS)

Li, Wu-Wu; Zhang, Zun-Ting

2016-02-01

A new Zn(II) coordination polymer, [Zn2(btc)(biimpy)(OH)]n (1; H3btc = 1,3,5-benzenetricarboxylic acid, biimpy = 2,6-bis(1-imidazolyl)pyridine), has been successfully synthesized and characterized by elemental analysis and by powder and single-crystal X-ray diffraction analyses. Compound 1 features a 3D framework employing linear tetranuclear [Zn4(COO)4(OH)2] clusters as building subunits. Topological analysis reveals that it represents a (3,10)-connected structural topology when btc3-, the linear tetranuclear clusters and biimpy are viewed as 3-connected nodes, 10-connected nodes and linear linkers, respectively. Moreover, the thermal stability and luminescent properties of compound 1 have been investigated.
Global optimization of small bimetallic Pd-Co binary nanoalloy clusters: a genetic algorithm approach at the DFT level.

PubMed

Aslan, Mikail; Davis, Jack B A; Johnston, Roy L

2016-03-07

The global optimisation of small bimetallic PdCo binary nanoalloys is systematically investigated using the Birmingham Cluster Genetic Algorithm (BCGA). The effects of size and composition on the structures, stability, and magnetic and electronic properties of Pd-Co binary nanoalloys, including the binding energies, second finite difference energies and mixing energies, are discussed. A detailed analysis of Pd-Co structural motifs and segregation effects is also presented. The maximal mixing energy corresponds to Pd atom compositions for which the number of mixed Pd-Co bonds is maximised. Global minimum clusters are distinguished from transition states by vibrational frequency analysis. HOMO-LUMO gap, electric dipole moment and vibrational frequency analyses are made to enable correlation with future experiments.
The adiposity of children is associated with their lifestyle behaviours: a cluster analysis of school-aged children from 12 nations.

PubMed

Dumuid, Dorothea; Olds, T; Lewis, L K; Martin-Fernández, J A; Barreira, T; Broyles, S; Chaput, J-P; Fogelholm, M; Hu, G; Kuriyan, R; Kurpad, A; Lambert, E V; Maia, J; Matsudo, V; Onywera, V O; Sarmiento, O L; Standage, M; Tremblay, M S; Tudor-Locke, C; Zhao, P; Katzmarzyk, P; Gillison, F; Maher, C

2018-02-01

The relationship between children's adiposity and lifestyle behaviour patterns is an area of growing interest. The objectives of this study are to identify clusters of children based on lifestyle behaviours and compare children's adiposity among clusters. Cross-sectional data from the International Study of Childhood Obesity, Lifestyle and the Environment were used. The participants were children (9-11 years) from 12 nations (n = 5710). 24-h accelerometry and self-reported diet and screen time were clustering input variables. Objectively measured adiposity indicators were waist-to-height ratio, percent body fat and body mass index z-scores. Sex-stratified analyses were performed on the global sample and repeated on a site-wise basis. Cluster analysis (using isometric log ratios for compositional data) was used to identify common lifestyle behaviour patterns. Site representation and adiposity were compared across clusters using linear models. Four clusters emerged: (1) Junk Food Screenies, (2) Actives, (3) Sitters and (4) All-Rounders. Countries were represented differently among clusters. Chinese children were over-represented in Sitters and Colombian children in Actives. Adiposity varied across clusters, being highest in Sitters and lowest in Actives. Children from different sites clustered into groups of similar lifestyle behaviours. Cluster membership was linked with differing adiposity. Findings support the implementation of activity interventions in all countries, targeting both physical activity and sedentary time. © 2016 World Obesity Federation.
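The abstract above mentions isometric log ratios for compositional data such as 24-h time use. As a minimal sketch of that transform, assuming the common pivot-coordinate basis (the abstract does not say which basis the authors used), the function name `ilr_pivot` and the example day below are hypothetical:

```python
import math

def ilr_pivot(parts):
    """Map a D-part composition to D-1 pivot (isometric log-ratio) coordinates."""
    if any(p <= 0 for p in parts):
        raise ValueError("all parts must be strictly positive")
    logs = [math.log(p) for p in parts]
    coords = []
    for i in range(len(parts) - 1):
        rest = logs[i + 1:]                       # logs of the parts after part i
        mean_log_rest = sum(rest) / len(rest)     # log of their geometric mean
        scale = math.sqrt(len(rest) / (len(rest) + 1))
        coords.append(scale * (logs[i] - mean_log_rest))
    return coords

# Hypothetical 24-h day in minutes: sleep, sedentary, light activity, vigorous activity
day = [540, 480, 360, 60]
z = ilr_pivot(day)  # three unconstrained coordinates usable as clustering inputs
```

The coordinates depend only on the ratios between parts, so the same day expressed in hours or as proportions gives the same values, which is why such a transform is applied before distance-based clustering.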
Baseline adjustments for binary data in repeated cross-sectional cluster randomized trials.

PubMed

Nixon, R M; Thompson, S G

2003-09-15

Analysis of covariance models, which adjust for a baseline covariate, are often used to compare treatment groups in a controlled trial in which individuals are randomized. Such analysis adjusts for any baseline imbalance and usually increases the precision of the treatment effect estimate. We assess the value of such adjustments in the context of a cluster randomized trial with repeated cross-sectional design and a binary outcome. In such a design, a new sample of individuals is taken from the clusters at each measurement occasion, so that baseline adjustment has to be at the cluster level. Logistic regression models are used to analyse the data, with cluster level random effects to allow for different outcome probabilities in each cluster. We compare the estimated treatment effect and its precision in models that incorporate a covariate measuring the cluster level probabilities at baseline and those that do not. In two data sets, taken from a cluster randomized trial in the treatment of menorrhagia, the value of baseline adjustment is only evident when the number of subjects per cluster is large. We assess the generalizability of these findings by undertaking a simulation study, and find that increased precision of the treatment effect requires both large cluster sizes and substantial heterogeneity between clusters at baseline, but baseline imbalance arising by chance in a randomized study can always be effectively adjusted for. Copyright 2003 John Wiley & Sons, Ltd.
Statistical analysis of atom probe data: detecting the early stages of solute clustering and/or co-segregation.

PubMed

Hyde, J M; Cerezo, A; Williams, T J

2009-04-01

Statistical analysis of atom probe data has improved dramatically in the last decade and it is now possible to determine the size, the number density and the composition of individual clusters or precipitates such as those formed in reactor pressure vessel (RPV) steels during irradiation. However, the characterisation of the onset of clustering or co-segregation is more difficult and has traditionally focused on the use of composition frequency distributions (for detecting clustering) and contingency tables (for detecting co-segregation). In this work, the authors investigate the possibility of directly examining the neighbourhood of each individual solute atom as a means of identifying the onset of solute clustering and/or co-segregation. The methodology involves comparing the mean observed composition around a particular type of solute with that expected from the overall composition of the material. The methodology has been applied to atom probe data obtained from several irradiated RPV steels. The results show that the new approach is more sensitive to fine scale clustering and co-segregation than that achievable using composition frequency distribution and contingency table analyses.
Electrical Load Profile Analysis Using Clustering Techniques

NASA Astrophysics Data System (ADS)

Damayanti, R.; Abdullah, A. G.; Purnama, W.; Nandiyanto, A. B. D.

2017-03-01

Data mining is a data processing technique for extracting information from a set of stored data. Every day, electricity load consumption is recorded by the electrical company, usually at intervals of 15 or 30 minutes. This paper uses clustering, one of the data mining techniques, to analyse the electrical load profiles recorded during 2014. Three clustering methods were compared, namely K-Means (KM), Fuzzy C-Means (FCM), and K-Harmonic Means (KHM). The results show that KHM is the most appropriate method for classifying the electrical load profiles. The optimum number of clusters is determined using the Davies-Bouldin Index. Grouping the load profiles makes it possible to analyse demand variation and to estimate energy loss for each group of load profiles with a similar pattern. From each group of load profiles, the cluster load factor and a range for the cluster loss factor can be obtained, which help to find the range of coefficient values for estimating energy loss without performing load flow studies.
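The Davies-Bouldin Index named above can be illustrated with a short sketch. It is not the authors' implementation: the candidate labelings are written by hand here (in the study they would come from KM, FCM or KHM), and the two-feature "load profiles" are hypothetical. Lower index values indicate more compact, better separated clusters.

```python
import math

def centroid(rows):
    """Component-wise mean of a list of equal-length points."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def davies_bouldin(points, labels):
    """Davies-Bouldin index of a labelled partition (lower is better)."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    if len(groups) < 2:
        raise ValueError("need at least two clusters")
    cents = {lab: centroid(rows) for lab, rows in groups.items()}
    # Scatter: mean distance of a cluster's members to its own centroid
    scatter = {lab: sum(math.dist(p, cents[lab]) for p in rows) / len(rows)
               for lab, rows in groups.items()}
    total = 0.0
    for i in groups:
        # Worst-case similarity of cluster i to any other cluster
        total += max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                     for j in groups if j != i)
    return total / len(groups)

# Hypothetical profiles summarised by two features (e.g. morning and evening load)
profiles = [(1.0, 1.0), (1.0, 2.0), (5.0, 1.0), (5.0, 2.0), (9.0, 1.0), (9.0, 2.0)]
candidates = {2: [0, 0, 0, 0, 1, 1], 3: [0, 0, 1, 1, 2, 2]}
scores = {k: davies_bouldin(profiles, labels) for k, labels in candidates.items()}
best_k = min(scores, key=scores.get)  # candidate with the smallest index
```

For this toy data the three-cluster labeling scores lower than the two-cluster one, so it would be preferred.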
Potential of SNP markers for the characterization of Brazilian cassava germplasm.

PubMed

de Oliveira, Eder Jorge; Ferreira, Cláudia Fortes; da Silva Santos, Vanderlei; de Jesus, Onildo Nunes; Oliveira, Gilmara Alvarenga Fachardo; da Silva, Maiane Suzarte

2014-06-01

High-throughput markers, such as SNPs, along with different methodologies were used to evaluate the applicability of the Bayesian approach and the multivariate analysis in structuring the genetic diversity in cassavas. The objective of the present work was to evaluate the diversity and genetic structure of the largest cassava germplasm bank in Brazil. Complementary methodological approaches such as discriminant analysis of principal components (DAPC), Bayesian analysis and molecular analysis of variance (AMOVA) were used to understand the structure and diversity of 1,280 accessions genotyped using 402 single nucleotide polymorphism markers. The genetic diversity (0.327) and the average observed heterozygosity (0.322) were high considering the bi-allelic markers. In terms of population, the presence of a complex genetic structure was observed indicating the formation of 30 clusters by DAPC and 34 clusters by Bayesian analysis. Both methodologies presented difficulties and controversies in terms of the allocation of some accessions to specific clusters. However, the clusters suggested by the DAPC analysis seemed to be more consistent, as they presented a higher probability of allocation of the accessions within the clusters. Prior information related to breeding patterns and geographic origins of the accessions was not sufficient for providing clear differentiation between the clusters according to the AMOVA analysis. In contrast, the FST was maximized when considering the clusters suggested by the Bayesian and DAPC analyses. The high frequency of germplasm exchange between producers and the subsequent alteration of the name of the same material may be one of the causes of the low association between genetic diversity and geographic origin. The results of this study may benefit cassava germplasm conservation programs, and contribute to the maximization of genetic gains in breeding programs.
Cluster-level statistical inference in fMRI datasets: The unexpected behavior of random fields in high dimensions.

PubMed

Bansal, Ravi; Peterson, Bradley S

2018-06-01

Identifying regional effects of interest in MRI datasets usually entails testing a priori hypotheses across many thousands of brain voxels, requiring control for false positive findings in these multiple hypotheses testing. Recent studies have suggested that parametric statistical methods may have incorrectly modeled functional MRI data, thereby leading to higher false positive rates than their nominal rates. Nonparametric methods for statistical inference when conducting multiple statistical tests, in contrast, are thought to produce false positives at the nominal rate, which has thus led to the suggestion that previously reported studies should reanalyze their fMRI data using nonparametric tools. To understand better why parametric methods may yield excessive false positives, we assessed their performance when applied both to simulated datasets of 1D, 2D, and 3D Gaussian Random Fields (GRFs) and to 710 real-world, resting-state fMRI datasets. We showed that both the simulated 2D and 3D GRFs and the real-world data contain a small percentage (<6%) of very large clusters (on average 60 times larger than the average cluster size), which were not present in 1D GRFs. These unexpectedly large clusters were deemed statistically significant using parametric methods, leading to empirical familywise error rates (FWERs) as high as 65%: the high empirical FWERs were not a consequence of parametric methods failing to model spatial smoothness accurately, but rather of these very large clusters that are inherently present in smooth, high-dimensional random fields. In fact, when discounting these very large clusters, the empirical FWER for parametric methods was 3.24%. Furthermore, even an empirical FWER of 65% would yield on average less than one of those very large clusters in each brain-wide analysis. 
Nonparametric methods, in contrast, estimated distributions from those large clusters, and therefore, by construct rejected the large clusters as false positives at the nominal FWERs. Those rejected clusters were outlying values in the distribution of cluster size but cannot be distinguished from true positive findings without further analyses, including assessing whether fMRI signal in those regions correlates with other clinical, behavioral, or cognitive measures. Rejecting the large clusters, however, significantly reduced the statistical power of nonparametric methods in detecting true findings compared with parametric methods, which would have detected most true findings that are essential for making valid biological inferences in MRI data. Parametric analyses, in contrast, detected most true findings while generating relatively few false positives: on average, less than one of those very large clusters would be deemed a true finding in each brain-wide analysis. We therefore recommend the continued use of parametric methods that model nonstationary smoothness for cluster-level, familywise control of false positives, particularly when using a Cluster Defining Threshold of 2.5 or higher, and subsequently assessing rigorously the biological plausibility of the findings, even for large clusters. Finally, because nonparametric methods yielded a large reduction in statistical power to detect true positive findings, we conclude that the modest reduction in false positive findings that nonparametric analyses afford does not warrant a re-analysis of previously published fMRI studies using nonparametric techniques. Copyright © 2018 Elsevier Inc. All rights reserved.
MOCCA-SURVEY Database I: Is NGC 6535 a dark star cluster harbouring an IMBH?

NASA Astrophysics Data System (ADS)

Askar, Abbas; Bianchini, Paolo; de Vita, Ruggero; Giersz, Mirek; Hypki, Arkadiusz; Kamann, Sebastian

2017-01-01

We describe the dynamical evolution of a unique type of dark star cluster model in which the majority of the cluster mass at Hubble time is dominated by an intermediate-mass black hole (IMBH). We analysed results from about 2000 star cluster models (Survey Database I) simulated using the Monte Carlo code MOnte Carlo Cluster simulAtor and identified these dark star cluster models. Taking one of these models, we apply the method of simulating realistic 'mock observations' by utilizing the Cluster simulatiOn Comparison with ObservAtions (COCOA) and Simulating Stellar Cluster Observation (SISCO) codes to obtain the photometric and kinematic observational properties of the dark star cluster model at 12 Gyr. We find that the perplexing Galactic globular cluster NGC 6535 closely matches the observational photometric and kinematic properties of the dark star cluster model presented in this paper. Based on our analysis and currently observed properties of NGC 6535, we suggest that this globular cluster could potentially harbour an IMBH. If it exists, the presence of this IMBH can be detected robustly with proposed kinematic observations of NGC 6535.
Career paths in physicians' postgraduate training - an eight-year follow-up study.

PubMed

Buddeberg-Fischer, Barbara; Stamm, Martina; Klaghofer, Richard

2010-10-06

To date, there are hardly any studies on the choice of career path in medical school graduates. The present study aimed to investigate what career paths can be identified in the course of postgraduate training of physicians; what factors have an influence on the choice of a career path; and in what way the career paths are correlated with career-related factors as well as with work-life balance aspirations. The data reported originates from five questionnaire surveys of the prospective SwissMedCareer Study, beginning in 2001 (T1, last year of medical school). The study sample consisted of 358 physicians (197 females, 55%; 161 males, 45%) participating at each assessment from T2 (2003, first year of residency) to T5 (2009, seventh year of residency), answering the question: What career do you aspire to have? Furthermore, personal characteristics, chosen specialty, career motivation, mentoring experience, work-life balance as well as workload, career success and career satisfaction were assessed. Career paths were analysed with cluster analysis, and differences between clusters analysed with multivariate methods. The cluster analysis revealed four career clusters which discriminated distinctly between each other: (1) career in practice, (2) hospital career, (3) academic career, and (4) changing career goal. From T3 (third year of residency) to T5, respondents in Cluster 1-3 were rather stable in terms of their career path aspirations, while those assigned to Cluster 4 showed a high fluctuation in their career plans. Physicians in Cluster 1 showed high values in extraprofessional concerns and often consider part-time work. Cluster 2 and 3 were characterised by high instrumentality, intrinsic and extrinsic career motivation, career orientation and high career success. No cluster differences were seen in career satisfaction. In Cluster 1 and 4, females were overrepresented. 
Trainees should be supported to stay on the career path that best suits their personal and professional profile. Attention should be paid to the subgroup of physicians in Cluster 4 switching from one career goal to another in the course of their postgraduate training.
The Use of Web 2.0 Tools by Students in Learning and Leisure Contexts: A Study in a Portuguese Institution of Higher Education

ERIC Educational Resources Information Center

Costa, Carolina; Alvelos, Helena; Teixeira, Leonor

2016-01-01

This study analyses and compares the use of Web 2.0 tools by students in both learning and leisure contexts. Data were collected based on a questionnaire applied to 234 students from the University of Aveiro (Portugal) and the results were analysed by using descriptive analysis, paired samples t-tests, cluster analyses and Kruskal-Wallis tests.…

Simultaneous Classification and Multidimensional Scaling with External Information

ERIC Educational Resources Information Center

Kiers, Henk A. L.; Vicari, Donatella; Vichi, Maurizio

2005-01-01

For the exploratory analysis of a matrix of proximities or (dis)similarities between objects, one often uses cluster analysis (CA) or multidimensional scaling (MDS). Solutions resulting from such analyses are sometimes interpreted using external information on the objects. Usually the procedures of CA, MDS and using external information are…
A Three-Step Spatial-Temporal Clustering Method for Human Activity Pattern Analysis

NASA Astrophysics Data System (ADS)

Huang, W.; Li, S.; Xu, S.

2016-06-01

How people move in cities and what they do in various locations at different times form human activity patterns. Human activity patterns play a key role in urban planning, traffic forecasting, public health and safety, emergency response, friend recommendation, and so on. Therefore, scholars from different fields, such as social science, geography, transportation, physics and computer science, have made great efforts in modelling and analysing human activity patterns or human mobility patterns. One of the essential tasks in such studies is to find the locations or places where individuals stay to perform some kind of activity before further activity pattern analysis. In the era of Big Data, the emergence of social media along with wearable devices enables human activity data to be collected more easily and efficiently. Furthermore, the dimensionality of the accessible human activity data has been extended from two or three dimensions (space or space-time) to four (space, time and semantics). More specifically, not only the location and time at which people stay are collected, but also what people "say" at a location at a given time can be obtained. The characteristics of these datasets shed new light on the analysis of human mobility, and new methodologies should accordingly be developed to handle them. Traditional methods such as neural networks, statistics and clustering have been applied to study human activity patterns using geosocial media data. Among them, clustering methods have been widely used to analyse spatiotemporal patterns. However, to the best of our knowledge, few clustering algorithms have been specifically developed for handling datasets that contain spatial, temporal and semantic aspects all together. In this work, we propose a three-step human activity clustering method based on space, time and semantics to fill this gap. One year of Twitter data, posted in Toronto, Canada, is used to test the clustering-based method. The results show that approximately 55% of the spatiotemporal clusters distributed in different locations can eventually be grouped as the same type of cluster when the semantic aspect is considered.
Statistical analyses and characteristics of volcanic tremor on Stromboli Volcano (Italy)

NASA Astrophysics Data System (ADS)

Falsaperla, S.; Langer, H.; Spampinato, S.

A study of volcanic tremor on Stromboli is carried out on the basis of data recorded daily between 1993 and 1995 by a permanent seismic station (STR) located 1.8 km away from the active craters. We also consider the signal of a second station (TF1), which operated for a shorter time span. Changes in the spectral tremor characteristics can be related to modifications in volcanic activity, particularly to lava effusions and explosive sequences. Statistical analyses were carried out on a set of spectra calculated daily from seismic signals where explosion quakes were present or excluded. Principal component analysis and cluster analysis were applied to identify different classes of spectra. Three clusters of spectra are associated with two different states of volcanic activity. One cluster corresponds to a state of low to moderate activity, whereas the two other clusters are present during phases with a high magma column as inferred from the occurrence of lava fountains or effusions. We therefore conclude that variations in volcanic activity at Stromboli are usually linked to changes in the spectral characteristics of volcanic tremor. Site effects are evident when comparing the spectra calculated from signals synchronously recorded at STR and TF1. However, some major spectral peaks at both stations may reflect source properties. Statistical considerations and polarization analysis are in favor of a prevailing presence of P-waves in the tremor signal along with a position of the source northwest of the craters and at shallow depth.
Spatial-temporal clustering of companion animal enteric syndrome: detection and investigation through the use of electronic medical records from participating private practices.

PubMed

Anholt, R M; Berezowski, J; Robertson, C; Stephen, C

2015-09-01

There is interest in the potential of companion animal surveillance to provide data to improve pet health and to provide early warning of environmental hazards to people. We implemented a companion animal surveillance system in Calgary, Alberta and the surrounding communities. Informatics technologies automatically extracted electronic medical records from participating veterinary practices and identified cases of enteric syndrome in the warehoused records. The data were analysed using time-series analyses and a retrospective space-time permutation scan statistic. We identified a seasonal pattern of reports of occurrences of enteric syndromes in companion animals and four statistically significant clusters of enteric syndrome cases. The cases within each cluster were examined and information about the animals involved (species, age, sex), their vaccination history, possible exposure or risk behaviour history, information about disease severity, and the aetiological diagnosis was collected. We then assessed whether the cases within the cluster were unusual and if they represented an animal or public health threat. There was often insufficient information recorded in the medical record to characterize the clusters by aetiology or exposures. Space-time analysis of companion animal enteric syndrome cases found evidence of clustering. Collection of more epidemiologically relevant data would enhance the utility of practice-based companion animal surveillance.
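The space-time permutation scan statistic mentioned above compares the cases observed in a space-time cylinder with the number expected if place and time were independent. The sketch below shows only that core calculation under stated assumptions: the function names and the small zone-by-day table are hypothetical, and the Monte Carlo permutation step used to obtain P values is omitted. It is not the software or code used in the study.

```python
import math

def expected_in_cylinder(counts, zones, days):
    """counts[z][d] holds the cases in zone z on day d.

    The expectation assumes no space-time interaction:
    (zone total * day total) / grand total, summed over the cylinder.
    """
    total = sum(sum(row) for row in counts)
    mu = 0.0
    for z in zones:
        zone_total = sum(counts[z])
        for d in days:
            day_total = sum(row[d] for row in counts)
            mu += zone_total * day_total / total
    return mu

def log_glr(observed, expected, total):
    """Log generalized likelihood ratio; zero unless cases exceed expectation."""
    if observed <= expected:
        return 0.0
    inside = observed * math.log(observed / expected)
    n_out = total - observed
    outside = n_out * math.log(n_out / (total - expected)) if n_out > 0 else 0.0
    return inside + outside

# Hypothetical counts: 3 zones (rows) by 3 days (columns)
counts = [[1, 0, 5],
          [1, 1, 0],
          [0, 1, 1]]
total = sum(sum(row) for row in counts)
mu = expected_in_cylinder(counts, zones=[0], days=[2])  # cylinder: zone 0 on day 2
score = log_glr(counts[0][2], mu, total)
```

In a full analysis the cylinder with the largest score would be compared against scores from many random reassignments of dates to cases.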
Strain-Level Diversity of Secondary Metabolism in Streptomyces albus

PubMed Central

Seipke, Ryan F.

2015-01-01

Streptomyces spp. are robust producers of medicinally-, industrially- and agriculturally-important small molecules. Increased resistance to antibacterial agents and the lack of new antibiotics in the pipeline have led to a renaissance in natural product discovery. This endeavor has benefited from inexpensive high quality DNA sequencing technology, which has generated more than 140 genome sequences for taxonomic type strains and environmental Streptomyces spp. isolates. Many of the sequenced streptomycetes belong to the same species. For instance, Streptomyces albus has been isolated from diverse environmental niches and seven strains have been sequenced; consequently, this species has been sequenced more than any other streptomycete, allowing valuable analyses of strain-level diversity in secondary metabolism. Bioinformatics analyses identified a total of 48 unique biosynthetic gene clusters harboured by Streptomyces albus strains. Eighteen of these gene clusters specify the core secondary metabolome of the species. Fourteen of the gene clusters are contained by one or more strain and are considered auxiliary, while 16 of the gene clusters encode the production of putative strain-specific secondary metabolites. Analysis of Streptomyces albus strains suggests that each strain of a Streptomyces species likely harbours at least one strain-specific biosynthetic gene cluster. Importantly, this implies that deep sequencing of a species will not exhaust gene cluster diversity and will continue to yield novelty. PMID:25635820
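The core, auxiliary and strain-specific grouping described above can be sketched as a simple presence count. The rule used here is an assumption, since the abstract does not spell out its thresholds: a gene cluster found in every strain is treated as core, one found in exactly one strain as strain-specific, and anything in between as auxiliary. The function name, cluster names and strain names are hypothetical.

```python
def classify_gene_clusters(presence, n_strains):
    """presence maps a gene-cluster name to the set of strains carrying it."""
    classes = {"core": [], "auxiliary": [], "strain-specific": []}
    for name, strains in sorted(presence.items()):
        if not strains:
            raise ValueError(f"{name} is not present in any strain")
        if len(strains) == n_strains:
            classes["core"].append(name)             # assumed rule: in every strain
        elif len(strains) == 1:
            classes["strain-specific"].append(name)  # assumed rule: in one strain only
        else:
            classes["auxiliary"].append(name)        # assumed rule: shared by some strains
    return classes

# Hypothetical presence table for three strains
presence = {
    "bgc_a": {"strain1", "strain2", "strain3"},
    "bgc_b": {"strain1", "strain2"},
    "bgc_c": {"strain3"},
}
groups = classify_gene_clusters(presence, n_strains=3)
```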
The association between content of the elements S, Cl, K, Fe, Cu, Zn and Br in normal and cirrhotic liver tissue from Danes and Greenlandic Inuit examined by dual hierarchical clustering analysis.

PubMed

Laursen, Jens; Milman, Nils; Pind, Niels; Pedersen, Henrik; Mulvad, Gert

2014-01-01

Meta-analysis of previous studies evaluating associations between content of elements sulphur (S), chlorine (Cl), potassium (K), iron (Fe), copper (Cu), zinc (Zn) and bromine (Br) in normal and cirrhotic autopsy liver tissue samples. Normal liver samples from 45 Greenlandic Inuit, median age 60 years and from 71 Danes, median age 61 years. Cirrhotic liver samples from 27 Danes, median age 71 years. Element content was measured using X-ray fluorescence spectrometry. Dual hierarchical clustering analysis, creating a dual dendrogram, one clustering element contents according to calculated similarities, one clustering elements according to correlation coefficients between the element contents, both using Euclidean distance and Ward's procedure. One dendrogram separated subjects in 7 clusters showing no differences in ethnicity, gender or age. The analysis discriminated between elements in normal and cirrhotic livers. The other dendrogram clustered elements in four clusters: sulphur and chlorine; copper and bromine; potassium and zinc; iron. There were significant correlations between the elements in normal liver samples: S was associated with Cl, K, Br and Zn; Cl with S and Br; K with S, Br and Zn; Cu with Br; Zn with S and K; Br with S, Cl, K and Cu. Fe did not show significant associations with any other element. In contrast to simple statistical methods, which analyse the content of elements separately one by one, dual hierarchical clustering analysis incorporates all elements at the same time and can be used to examine the linkage and interplay between multiple elements in tissue samples. Copyright © 2013 Elsevier GmbH. All rights reserved.
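The idea of clustering both the rows (subjects) and the columns (elements) of one data table with Ward's procedure on Euclidean distance can be sketched as follows. This is a simplified illustration, not the study's analysis: the function name and the small data table are hypothetical, and the columns are clustered on their raw values rather than on correlation coefficients as the abstract describes.

```python
def ward_clusters(rows, k):
    """Agglomerative clustering with Ward's criterion; returns k groups of row indices."""
    clusters = [([i], list(r)) for i, r in enumerate(rows)]  # (member indices, centroid)
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ma, ca), (mb, cb) = clusters[a], clusters[b]
                sq = sum((x - y) ** 2 for x, y in zip(ca, cb))
                # Increase in within-cluster sum of squares if a and b were merged
                cost = len(ma) * len(mb) / (len(ma) + len(mb)) * sq
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        n = len(ma) + len(mb)
        merged = (ma + mb, [(len(ma) * x + len(mb) * y) / n for x, y in zip(ca, cb)])
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    return sorted(sorted(members) for members, _ in clusters)

# Hypothetical table: 4 subjects (rows) by 3 elements (columns)
data = [[1.0, 1.1, 5.0],
        [1.2, 1.0, 5.2],
        [4.0, 4.1, 0.9],
        [4.2, 3.9, 1.0]]
subject_groups = ward_clusters(data, 2)                           # cluster the rows
element_groups = ward_clusters([list(c) for c in zip(*data)], 2)  # cluster the columns
```

Running the same routine on the table and on its transpose is what yields the two dendrogram-style groupings, one for subjects and one for elements.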
Person mobility in the design and analysis of cluster-randomized cohort prevention trials.

PubMed

Vuchinich, Sam; Flay, Brian R; Aber, Lawrence; Bickman, Leonard

2012-06-01

Person mobility is an inescapable fact of life for most cluster-randomized (e.g., schools, hospitals, clinics, cities, states) cohort prevention trials. Mobility rates are an important substantive consideration in estimating the effects of an intervention. In cluster-randomized trials, mobility rates are often correlated with ethnicity, poverty and other variables associated with disparity. This raises the possibility that estimated intervention effects may generalize to only the least mobile segments of a population and, thus, create a threat to external validity. Such mobility can also create threats to the internal validity of conclusions from randomized trials. Researchers must decide how to deal with persons who leave study clusters during a trial (dropouts), persons and clusters that do not comply with an assigned intervention, and persons who enter clusters during a trial (late entrants), in addition to the persons who remain for the duration of a trial (stayers). Statistical techniques alone cannot solve the key issues of internal and external validity raised by the phenomenon of person mobility. This commentary presents a systematic, Campbellian-type analysis of person mobility in cluster-randomized cohort prevention trials. It describes four approaches for dealing with dropouts, late entrants and stayers with respect to data collection, analysis and generalizability. The questions at issue are: 1) From whom should data be collected at each wave of data collection? 2) Which cases should be included in the analyses of an intervention effect? and 3) To what populations can trial results be generalized? The conclusions lead to recommendations for the design and analysis of future cluster-randomized cohort prevention trials.
The faces of pain: a cluster analysis of individual differences in facial activity patterns of pain.

PubMed

Kunz, M; Lautenbacher, S

2014-07-01

There is general agreement that facial activity during pain conveys pain-specific information but is nevertheless characterized by substantial inter-individual differences. With the present study we aim to investigate whether these differences represent idiosyncratic variations or whether they can be clustered into distinct facial activity patterns. Facial actions during heat pain were assessed in two samples of pain-free individuals (n = 128; n = 112) and were later analysed using the Facial Action Coding System. Hierarchical cluster analyses were used to look for combinations of single facial actions in episodes of pain. The stability/replicability of facial activity patterns was determined across samples as well as across different basic social situations. Cluster analyses revealed four distinct activity patterns during pain, which stably occurred across samples and situations: (I) narrowed eyes with furrowed brows and wrinkled nose; (II) opened mouth with narrowed eyes; (III) raised eyebrows; and (IV) furrowed brows with narrowed eyes. In addition, a considerable number of participants were facially completely unresponsive during pain induction (stoic cluster). These activity patterns seem to be reaction stereotypies in the majority of individuals (in nearly two-thirds), whereas a minority displayed varying clusters across situations. These findings suggest that there is no uniform set of facial actions but instead there are at least four different facial activity patterns occurring during pain that are composed of different configurations of facial actions. Raising awareness about these different 'faces of pain' might hold the potential of improving the detection and, thereby, the communication of pain. © 2013 European Pain Federation - EFIC®
Bayesian multivariate hierarchical transformation models for ROC analysis.

PubMed

O'Malley, A James; Zou, Kelly H

2006-02-15

A Bayesian multivariate hierarchical transformation model (BMHTM) is developed for receiver operating characteristic (ROC) curve analysis based on clustered continuous diagnostic outcome data with covariates. Two special features of this model are that it incorporates non-linear monotone transformations of the outcomes and that multiple correlated outcomes may be analysed. The mean, variance, and transformation components are all modelled parametrically, enabling a wide range of inferences. The general framework is illustrated by focusing on two problems: (1) analysis of the diagnostic accuracy of a covariate-dependent univariate test outcome requiring a Box-Cox transformation within each cluster to map the test outcomes to a common family of distributions; (2) development of an optimal composite diagnostic test using multivariate clustered outcome data. In the second problem, the composite test is estimated using discriminant function analysis and compared to the test derived from logistic regression analysis where the gold standard is a binary outcome. The proposed methodology is illustrated on prostate cancer biopsy data from a multi-centre clinical trial.
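As a rough illustration of the Box-Cox transformation mentioned in this abstract, the sketch below applies the standard one-parameter form to a single outcome. The function name `box_cox`, the example values and the choice of power parameter are illustrative assumptions; the authors' Bayesian model estimates the transformation within each cluster and is not reproduced here.

```python
import math


def box_cox(y, lam):
    """Standard one-parameter Box-Cox transform of a positive value y."""
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive outcomes")
    if abs(lam) < 1e-12:
        # Limit of (y**lam - 1) / lam as lam approaches 0.
        return math.log(y)
    return (y ** lam - 1.0) / lam


# Hypothetical test outcomes for one cluster (illustrative values only).
outcomes = [0.8, 1.5, 4.0, 9.7]
transformed = [box_cox(y, 0.5) for y in outcomes]
print(transformed)
```

In practice the power parameter would be estimated from the data rather than fixed at 0.5 as it is here.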
Bayesian multivariate hierarchical transformation models for ROC analysis

PubMed Central

O'Malley, A. James; Zou, Kelly H.

2006-01-01

SUMMARY A Bayesian multivariate hierarchical transformation model (BMHTM) is developed for receiver operating characteristic (ROC) curve analysis based on clustered continuous diagnostic outcome data with covariates. Two special features of this model are that it incorporates non-linear monotone transformations of the outcomes and that multiple correlated outcomes may be analysed. The mean, variance, and transformation components are all modelled parametrically, enabling a wide range of inferences. The general framework is illustrated by focusing on two problems: (1) analysis of the diagnostic accuracy of a covariate-dependent univariate test outcome requiring a Box–Cox transformation within each cluster to map the test outcomes to a common family of distributions; (2) development of an optimal composite diagnostic test using multivariate clustered outcome data. In the second problem, the composite test is estimated using discriminant function analysis and compared to the test derived from logistic regression analysis where the gold standard is a binary outcome. The proposed methodology is illustrated on prostate cancer biopsy data from a multi-centre clinical trial. PMID:16217836
Genetic structure of Plasmodium falciparum populations across the Honduras-Nicaragua border.

PubMed

Larrañaga, Nerea; Mejía, Rosa E; Hormaza, José I; Montoya, Alberto; Soto, Aida; Fontecha, Gustavo A

2013-10-04

The Caribbean coast of Central America remains an area of malaria transmission caused by Plasmodium falciparum despite the fact that morbidity has been reduced in recent years. Parasite populations in that region show interesting characteristics such as chloroquine susceptibility and low mortality rates. Genetic structure and diversity of P. falciparum populations on the Honduras-Nicaragua border were analysed in this study. Seven neutral microsatellite loci were analysed in 110 P. falciparum isolates from endemic areas of Honduras (n = 77) and Nicaragua (n = 33), mostly from the border region called the Moskitia. Several analyses concerning the genetic diversity, linkage disequilibrium, population structure, molecular variance, and haplotype clustering were conducted. There was a low level of genetic diversity in P. falciparum populations from Honduras and Nicaragua. Expected heterozygosity (H(e)) results were similarly low for both populations. A moderate differentiation was revealed by the F(ST) index between both populations, and two putative clusters were defined through a structure analysis. The main cluster grouped most of the samples from Honduras and Nicaragua, while the second cluster was smaller and included all the samples from the Siuna community in Nicaragua. This result could partially explain the stronger linkage disequilibrium (LD) in the parasite population from that country. These findings are congruent with the decreasing rates of malaria endemicity in Central America.
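The two summary statistics named in this abstract can be sketched for a single locus as follows. This is a minimal illustration under stated assumptions: the sample-size-corrected form of expected heterozygosity, a simple two-population F(ST) computed as (H_T - H_S) / H_T with equal population weights, and invented allele labels. The abstract does not say which estimators or software the authors used.

```python
from collections import Counter


def gene_diversity(alleles):
    """Uncorrected gene diversity, 1 - sum(p_i^2), at one locus."""
    n = len(alleles)
    return 1.0 - sum((c / n) ** 2 for c in Counter(alleles).values())


def expected_heterozygosity(alleles):
    """Gene diversity with the n / (n - 1) small-sample correction."""
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two sampled alleles")
    return (n / (n - 1)) * gene_diversity(alleles)


def fst(pop_a, pop_b):
    """Two-population F_ST = (H_T - H_S) / H_T with equal weights."""
    h_s = (gene_diversity(pop_a) + gene_diversity(pop_b)) / 2.0
    count_a, count_b = Counter(pop_a), Counter(pop_b)
    # Total diversity uses the mean allele frequency across populations.
    mean_freqs = [
        (count_a[a] / len(pop_a) + count_b[a] / len(pop_b)) / 2.0
        for a in set(pop_a) | set(pop_b)
    ]
    h_t = 1.0 - sum(p * p for p in mean_freqs)
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


# Hypothetical microsatellite allele sizes at one locus (not study data).
honduras = [180, 180, 180, 184, 184, 188]
nicaragua = [180, 184, 184, 184, 188, 188]
print(expected_heterozygosity(honduras), fst(honduras, nicaragua))
```

A multi-locus analysis would typically average such quantities over loci.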
Transformation and model choice for RNA-seq co-expression analysis.

PubMed

Rau, Andrea; Maugis-Rabusseau, Cathy

2018-05-01

Although a large number of clustering algorithms have been proposed to identify groups of co-expressed genes from microarray data, the question of if and how such methods may be applied to RNA sequencing (RNA-seq) data remains unaddressed. In this work, we investigate the use of data transformations in conjunction with Gaussian mixture models for RNA-seq co-expression analyses, as well as a penalized model selection criterion to select both an appropriate transformation and number of clusters present in the data. This approach has the advantage of accounting for per-cluster correlation structures among samples, which can be strong in RNA-seq data. In addition, it provides a rigorous statistical framework for parameter estimation, an objective assessment of data transformations and number of clusters and the possibility of performing diagnostic checks on the quality and homogeneity of the identified clusters. We analyze four varied RNA-seq data sets to illustrate the use of transformations and model selection in conjunction with Gaussian mixture models. Finally, we propose a Bioconductor package coseq (co-expression of RNA-seq data) to facilitate implementation and visualization of the recommended RNA-seq co-expression analyses.
Cluster-distinguishing genotypic and phenotypic diversity of carbapenem-resistant Gram-negative bacteria in solid-organ transplantation patients: a comparative study.

PubMed

Karampatakis, Theodoros; Geladari, Anastasia; Politi, Lida; Antachopoulos, Charalampos; Iosifidis, Elias; Tsiatsiou, Olga; Karyoti, Aggeliki; Papanikolaou, Vasileios; Tsakris, Athanassios; Roilides, Emmanuel

2017-07-31

Solid-organ transplant recipients may display high rates of colonization and/or infection by multidrug-resistant bacteria. We analysed and compared the phenotypic and genotypic diversity of carbapenem-resistant (CR) strains of Klebsiella pneumoniae, Pseudomonas aeruginosa and Acinetobacter baumannii isolated from patients in the Solid Organ Transplantation department of our hospital. Between March 2012 and August 2013, 56 CR strains from various biological fluids underwent antimicrobial susceptibility testing with VITEK 2, molecular analysis by PCR amplification and genotypic analysis with pulsed-field gel electrophoresis (PFGE). They were clustered according to antimicrobial drug susceptibility and genotypic profiles. Diversity analyses were performed by calculating Simpson's diversity index and applying computed rarefaction curves. Results/Key findings: Among K. pneumoniae, KPC-producers predominated (57.1 %). VIM and OXA-23 carbapenemases prevailed among P. aeruginosa and A. baumannii (89.4 and 88.9 %, respectively). KPC-producing K. pneumoniae and OXA-23 A. baumannii were assigned to single PFGE pulsotypes. VIM-producing P. aeruginosa generated multiple pulsotypes. CR K. pneumoniae strains displayed phenotypic diversity in tigecycline, colistin (CS), amikacin (AMK), gentamicin (GEN) and co-trimoxazole (SXT) (16 clusters); P. aeruginosa displayed phenotypic diversity in cefepime (FEP), ceftazidime, aztreonam, piperacillin, piperacillin-tazobactam, AMK, GEN and CS (9 clusters); and A. baumannii displayed phenotypic diversity in AMK, GEN, SXT, FEP, tobramycin and rifampicin (8 clusters). The Simpson diversity indices for the interpretative phenotype and PFGE analysis were 0.89 and 0.6, respectively, for K. pneumoniae strains (P<0.001); 0.77 and 0.6 for P. aeruginosa (P=0.22); and 0.86 and 0.19 for A. baumannii (P=0.004). The presence of different antimicrobial susceptibility profiles does not preclude the possibility that two CR K. pneumoniae or A. baumannii isolates are clonally related.
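Simpson's diversity index, used in this abstract to compare phenotypic and PFGE typing, can be computed from cluster assignments as sketched below. The form shown, 1 - Σ n_i(n_i - 1) / (N(N - 1)), is the variant commonly applied to typing schemes; whether the authors used exactly this variant is an assumption, and the pulsotype labels are invented for illustration.

```python
from collections import Counter


def simpson_diversity(labels):
    """Probability that two isolates drawn without replacement differ in type."""
    counts = list(Counter(labels).values())
    n_total = sum(counts)
    if n_total < 2:
        raise ValueError("need at least two isolates")
    same_type_pairs = sum(n * (n - 1) for n in counts)
    return 1.0 - same_type_pairs / (n_total * (n_total - 1))


# Hypothetical pulsotype assignments for ten isolates (not data from the study).
pulsotypes = ["A", "A", "A", "A", "B", "B", "C", "C", "D", "E"]
print(round(simpson_diversity(pulsotypes), 3))
```

An index near 0 indicates that most isolates fall into one type, while an index near 1 indicates that almost every isolate has its own type.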
Moment tensor clustering: a tool to monitor mining induced seismicity

NASA Astrophysics Data System (ADS)

Cesca, Simone; Dahm, Torsten; Tolga Sen, Ali

2013-04-01

Automated moment tensor inversion routines have been set up in recent decades for the analysis of global and regional seismicity. Recent developments could be used to analyse smaller events and larger datasets. In particular, applications to microseismicity, e.g. in mining environments, have led to the generation of large moment tensor catalogues. Moment tensor catalogues provide valuable information about the earthquake source and details of rupturing processes taking place in the seismogenic region. Earthquake focal mechanisms can be used to discuss the local stress field and possible orientations of the fault system, or to evaluate the presence of shear and/or tensile cracks. Focal mechanism and moment tensor solutions are typically analysed for selected events, and quick and robust tools for the automated analysis of larger catalogues are needed. We propose here a method to perform cluster analysis for large moment tensor catalogues and identify families of events which characterize the studied microseismicity. Clusters include events with similar focal mechanisms, which first requires the definition of a distance between focal mechanisms. Different metrics are proposed here for pure double couple, constrained moment tensor and full moment tensor catalogues. Different clustering approaches are implemented and discussed. The method is applied here to synthetic and real datasets from mining environments to demonstrate its potential: the proposed clustering techniques prove able to automatically recognise major clusters. An important application for mining monitoring concerns the early identification of anomalous rupture processes, which is relevant for hazard assessment. This study is funded by the project MINE, which is part of the R&D-Programme GEOTECHNOLOGIEN. The project MINE is funded by the German Ministry of Education and Research (BMBF), Grant of project BMBF03G0737.
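The abstract does not specify the distance metrics it proposes. As one generic possibility, the sketch below defines a distance between two full moment tensors from their normalized inner product, scaled to lie between 0 (identical mechanisms) and 1 (opposite polarity). The six-component ordering, the function names and the example tensors are assumptions made for illustration only.

```python
import math


def mt_inner(a, b):
    """Frobenius inner product of two symmetric 3x3 tensors.

    Tensors are given as (mxx, myy, mzz, mxy, mxz, myz); each off-diagonal
    component appears twice in the full matrix, so it is counted twice.
    """
    diagonal = sum(a[i] * b[i] for i in range(3))
    off_diagonal = sum(a[i] * b[i] for i in range(3, 6))
    return diagonal + 2.0 * off_diagonal


def mt_distance(a, b):
    """Normalized inner-product distance in [0, 1], independent of magnitude."""
    norm = math.sqrt(mt_inner(a, a) * mt_inner(b, b))
    if norm == 0.0:
        raise ValueError("moment tensors must be non-zero")
    cosine = max(-1.0, min(1.0, mt_inner(a, b) / norm))
    return (1.0 - cosine) / 2.0


# Hypothetical double-couple tensors (illustrative, not from any catalogue).
event_1 = (0.0, 0.0, 0.0, 1.0, 0.0, 0.0)
event_2 = (1.0, -1.0, 0.0, 0.0, 0.0, 0.0)
print(mt_distance(event_1, event_2))
```

A pairwise matrix of such distances could then be passed to any standard clustering routine.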
A comparison of three clustering methods for finding subgroups in MRI, SMS or clinical data: SPSS TwoStep Cluster analysis, Latent Gold and SNOB.

PubMed

Kent, Peter; Jensen, Rikke K; Kongsted, Alice

2014-10-02

There are various methodological approaches to identifying clinically important subgroups and one method is to identify clusters of characteristics that differentiate people in cross-sectional and/or longitudinal data using Cluster Analysis (CA) or Latent Class Analysis (LCA). There is a scarcity of head-to-head comparisons that can inform the choice of which clustering method might be suitable for particular clinical datasets and research questions. Therefore, the aim of this study was to perform a head-to-head comparison of three commonly available methods (SPSS TwoStep CA, Latent Gold LCA and SNOB LCA). The performance of these three methods was compared: (i) quantitatively using the number of subgroups detected, the classification probability of individuals into subgroups, the reproducibility of results, and (ii) qualitatively using subjective judgments about each program's ease of use and interpretability of the presentation of results. We analysed five real datasets of varying complexity in a secondary analysis of data from other research projects. Three datasets contained only MRI findings (n = 2,060 to 20,810 vertebral disc levels), one dataset contained only pain intensity data collected for 52 weeks by text (SMS) messaging (n = 1,121 people), and the last dataset contained a range of clinical variables measured in low back pain patients (n = 543 people). Four artificial datasets (n = 1,000 each) containing subgroups of varying complexity were also analysed, testing the ability of these clustering methods to detect subgroups and correctly classify individuals when subgroup membership was known. The results from the real clinical datasets indicated that the number of subgroups detected varied, the certainty of classifying individuals into those subgroups varied, the findings had perfect reproducibility, some programs were easier to use and the interpretability of the presentation of their findings also varied. The results from the artificial datasets indicated that all three clustering methods showed a near-perfect ability to detect known subgroups and correctly classify individuals into those subgroups. Our subjective judgement was that Latent Gold offered the best balance of sensitivity to subgroups, ease of use and presentation of results with these datasets but we recognise that different clustering methods may suit other types of data and clinical research questions.
A formal concept analysis approach to consensus clustering of multi-experiment expression data

PubMed Central

2014-01-01

Background Presently, with the increasing number and complexity of available gene expression datasets, the combination of data from multiple microarray studies addressing a similar biological question is gaining importance. The analysis and integration of multiple datasets are expected to yield more reliable and robust results since they are based on a larger number of samples and the effects of the individual study-specific biases are diminished. This is supported by recent studies suggesting that important biological signals are often preserved or enhanced by multiple experiments. An approach to combining data from different experiments is the aggregation of their clusterings into a consensus or representative clustering solution which increases the confidence in the common features of all the datasets and reveals the important differences among them. Results We propose a novel generic consensus clustering technique that applies Formal Concept Analysis (FCA) approach for the consolidation and analysis of clustering solutions derived from several microarray datasets. These datasets are initially divided into groups of related experiments with respect to a predefined criterion. Subsequently, a consensus clustering algorithm is applied to each group resulting in a clustering solution per group. These solutions are pooled together and further analysed by employing FCA which allows extracting valuable insights from the data and generating a gene partition over all the experiments. In order to validate the FCA-enhanced approach two consensus clustering algorithms are adapted to incorporate the FCA analysis. Their performance is evaluated on gene expression data from multi-experiment study examining the global cell-cycle control of fission yeast. 
The FCA results derived from both methods demonstrate that, although both algorithms optimize different clustering characteristics, FCA is able to overcome and diminish these differences and preserve some relevant biological signals. Conclusions The proposed FCA-enhanced consensus clustering technique is a general approach to the combination of clustering algorithms with FCA for deriving clustering solutions from multiple gene expression matrices. The experimental results presented herein demonstrate that it is a robust data integration technique able to produce good quality clustering solution that is representative for the whole set of expression matrices. PMID:24885407
A HIV-1 heterosexual transmission chain in Guangzhou, China: a molecular epidemiological study.

PubMed

Han, Zhigang; Leung, Tommy W C; Zhao, Jinkou; Wang, Ming; Fan, Lirui; Li, Kai; Pang, Xinli; Liang, Zhenbo; Lim, Wilina W L; Xu, Huifang

2009-09-25

We conducted molecular analyses to confirm four clustering HIV-1 infections (Patients A, B, C and D) in Guangzhou, China. These cases were identified by epidemiological investigation and suspected to have acquired the infection through a common heterosexual transmission chain. The env C2V3V4 region, gag p17/p24 junction and partial pol gene of the HIV-1 genome from serum specimens of these infected cases were amplified by reverse transcription polymerase chain reaction (RT-PCR) and nucleotide sequenced. Phylogenetic analyses indicated that their viral nucleotide sequences were significantly clustered together (bootstrap values of 99%, 98% and 100% in the env, gag and pol trees, respectively). Evolutionary distance analysis indicated that the genetic diversities of their env, gag and pol genes were significantly lower than those of non-clustered controls, as measured by unpaired t-test (env gene comparison: p < 0.005; gag gene comparison: p < 0.005; pol gene comparison: p < 0.005). Epidemiological results and molecular analyses consistently indicated that these four cases represented a transmission chain that spread in the locality through heterosexual contact involving a commercial sex worker.
Vitamin and mineral supplement users. Do they have healthy or unhealthy dietary behaviours?

PubMed

van der Horst, Klazine; Siegrist, Michael

2011-12-01

It is unknown whether people use vitamin and mineral supplements (VMS) to compensate for unhealthy diets, or whether people who already have a healthy diet use VMS. Therefore, this study aimed to examine correlates of VMS use and whether VMS users can be categorised into specific clusters based on dietary lifestyle variables. The data used came from the Swiss Food Panel questionnaire for 2010. The sample consisted of 6189 respondents; mean age was 54 years and 47.6% were males. Data were analysed with logistic regression analysis and hierarchical cluster analysis. The results revealed that for VMS use, gender, age, education, chronic illness, health consciousness, benefits of fortification, convenience food and sugar-sweetened beverage consumption were of importance. Cluster analysis revealed three clusters: (1) healthy diet, (2) unhealthy diet and (3) modest diet. Compared to non-users, a higher percentage of VMS users was categorised in the healthy cluster and a lower percentage in the unhealthy cluster. More VMS users were categorised as having an unhealthy diet (31.4%) than having a healthy diet (20.6%). The results suggest that both hypotheses (that VMS are used by people with unhealthy diets, and by people who least need them) hold true.
Structure and substructure analysis of DAFT/FADA galaxy clusters in the [0.4-0.9] redshift range

NASA Astrophysics Data System (ADS)

Guennou, L.; Adami, C.; Durret, F.; Lima Neto, G. B.; Ulmer, M. P.; Clowe, D.; LeBrun, V.; Martinet, N.; Allam, S.; Annis, J.; Basa, S.; Benoist, C.; Biviano, A.; Cappi, A.; Cypriano, E. S.; Gavazzi, R.; Halliday, C.; Ilbert, O.; Jullo, E.; Just, D.; Limousin, M.; Márquez, I.; Mazure, A.; Murphy, K. J.; Plana, H.; Rostagni, F.; Russeil, D.; Schirmer, M.; Slezak, E.; Tucker, D.; Zaritsky, D.; Ziegler, B.

2014-01-01

Context: The DAFT/FADA survey is based on the study of ~90 rich (masses found in the literature >2 × 10^14 M⊙) and moderately distant clusters (redshifts 0.4 < z < 0.9), all with HST imaging data available. This survey has two main objectives: to constrain dark energy (DE) using weak lensing tomography on galaxy clusters and to build a database (deep multi-band imaging allowing photometric redshift estimates, spectroscopic data, X-ray data) of rich distant clusters to study their properties. Aims: We analyse the structures of all the clusters in the DAFT/FADA survey for which XMM-Newton and/or a sufficient number of galaxy redshifts in the cluster range are available, with the aim of detecting substructures and evidence for merging events. These properties are discussed in the framework of standard cold dark matter (ΛCDM) cosmology. Methods: In X-rays, we analysed the XMM-Newton data available, fit a β-model, and subtracted it to identify residuals. We used Chandra data, when available, to identify point sources. In the optical, we applied a Serna & Gerbal (SG) analysis to clusters with at least 15 spectroscopic galaxy redshifts available in the cluster range. We discuss the substructure detection efficiencies of both methods. Results: XMM-Newton data were available for 32 clusters, for which we derive the X-ray luminosity and a global X-ray temperature for 25 of them. For 23 clusters we were able to fit the X-ray emissivity with a β-model and subtract it to detect substructures in the X-ray gas. A dynamical analysis based on the SG method was applied to the clusters having at least 15 spectroscopic galaxy redshifts in the cluster range: 18 X-ray clusters and 11 clusters with no X-ray data. The choice of a minimum number of 15 redshifts implies that only major substructures will be detected. Ten substructures were detected both in X-rays and by the SG method. 
Most of the substructures detected both in X-rays and with the SG method are probably at their first cluster pericentre approach and are relatively recent infalls. We also find hints of a decreasing X-ray gas density profile core radius with redshift. Conclusions: The percentage of mass included in substructures was found to be roughly constant with redshift values of 5-15%, in agreement both with the general CDM framework and with the results of numerical simulations. Galaxies in substructures show the same general behaviour as regular cluster galaxies; however, in substructures, there is a deficiency of both late type and old stellar population galaxies. Late type galaxies with recent bursts of star formation seem to be missing in the substructures close to the bottom of the host cluster potential well. However, our sample would need to be increased to allow a more robust analysis. Tables 1, 2, 4 and Appendices A-C are available in electronic form at http://www.aanda.org
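The β-model subtraction described in the Methods can be illustrated with the standard isothermal surface-brightness profile, S(r) = S0 [1 + (r / r_c)^2]^(-3β + 1/2). The sketch below only evaluates a profile with given parameters and subtracts it from observed values; the fitting step, the parameter values and the radial profile are hypothetical, and the authors' actual two-dimensional procedure is not reproduced.

```python
def beta_model(r, s0, r_core, beta):
    """Standard beta-model surface brightness at projected radius r."""
    return s0 * (1.0 + (r / r_core) ** 2) ** (-3.0 * beta + 0.5)


def residuals(radii, observed, s0, r_core, beta):
    """Observed minus model brightness; positive values hint at excess emission."""
    return [
        obs - beta_model(r, s0, r_core, beta)
        for r, obs in zip(radii, observed)
    ]


# Hypothetical radial profile in arbitrary units (not survey data).
radii = [0.0, 1.0, 2.0, 4.0]
observed = [10.0, 8.0, 6.5, 2.0]
print(residuals(radii, observed, s0=10.0, r_core=2.0, beta=0.5))
```

In this toy profile, only the third radial bin departs from the smooth model, which is the kind of residual that would be inspected as a candidate substructure.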
Space-time analysis of Down syndrome: results consistent with transient pre-disposing contagious agent.

PubMed

McNally, Richard J Q; Rankin, Judith; Shirley, Mark D F; Rushton, Stephen P; Pless-Mulloli, Tanja

2008-10-01

Whilst maternal age is an established risk factor for Patau syndrome (trisomy 13), Edwards syndrome (trisomy 18) and Down syndrome (trisomy 21), the aetiology and contribution of genetic and environmental factors remains unclear. We analysed for space-time clustering using high quality fully population-based data from a geographically defined region. The study included all cases of Patau, Edwards and Down syndrome, delivered during 1985-2003 and resident in the former Northern Region of England, including terminations of pregnancy for fetal anomaly. We applied the K-function test for space-time clustering with fixed thresholds of close in space and time using residential addresses at time of delivery. The Knox test was used to indicate the range over which the clustering effect occurred. Tests were repeated using nearest neighbour (NN) thresholds to adjust for variable population density. The study analysed 116 cases of Patau syndrome, 240 cases of Edwards syndrome and 1084 cases of Down syndrome. There was evidence of space-time clustering for Down syndrome (fixed threshold of close in space: P = 0.01, NN threshold: P = 0.02), but little or no clustering for Patau (P = 0.57, P = 0.19) or Edwards (P = 0.37, P = 0.06) syndromes. Clustering of Down syndrome was associated with cases from more densely populated areas and evidence of clustering persisted when cases were restricted to maternal age <40 years. The highly novel space-time clustering for Down syndrome suggests an aetiological role for transient environmental factors, such as infections.
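The Knox test mentioned in this abstract counts pairs of cases that are close in both space and time. The sketch below assesses that count by randomly permuting the times, which is one common way to obtain a significance level. The thresholds, the coordinates and the permutation approach are illustrative assumptions rather than the authors' exact procedure.

```python
import math
import random


def knox_statistic(cases, space_limit, time_limit):
    """Number of case pairs within both the space and time thresholds.

    Each case is a tuple (x, y, t).
    """
    close_pairs = 0
    for i in range(len(cases)):
        xi, yi, ti = cases[i]
        for j in range(i + 1, len(cases)):
            xj, yj, tj = cases[j]
            near_in_space = math.hypot(xi - xj, yi - yj) <= space_limit
            near_in_time = abs(ti - tj) <= time_limit
            if near_in_space and near_in_time:
                close_pairs += 1
    return close_pairs


def knox_permutation_p(cases, space_limit, time_limit, n_perm=999, seed=0):
    """Observed statistic and a one-sided permutation P value."""
    rng = random.Random(seed)
    observed = knox_statistic(cases, space_limit, time_limit)
    times = [t for _, _, t in cases]
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(times)  # break any link between location and time
        permuted = [(x, y, t) for (x, y, _), t in zip(cases, times)]
        if knox_statistic(permuted, space_limit, time_limit) >= observed:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (n_perm + 1)


# Hypothetical cases as (x km, y km, day); not data from the study.
cases = [(0.0, 0.0, 0), (0.1, 0.0, 1), (10.0, 10.0, 100), (10.1, 10.0, 101)]
print(knox_permutation_p(cases, space_limit=1.0, time_limit=2))
```

With so few cases the P value is only a demonstration of the mechanics; a real analysis would use the full case series.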

[Structural analysis of the functional status of the brain as affected by bemethyl using pattern recognition theory].

PubMed

Bobkov, Iu G; Machula, A I; Morozov, Iu I; Dvalishvili, E G

1987-11-01

Evoked visual potentials in associated, parietal and second somatosensory zones of the neocortex were analysed in trained cats using implanted electrodes. The influence of bemethyl on the structure of behavioral reactions was analysed using theoretical methods of perceptual images, particularly the method of cluster analysis. Bemethyl was shown to increase the level of interaction between the functional elements of the system, leading to a more stable resolution of problems facing the system, as compared to the initial state.
Descriptive epidemiology of typhoid fever during an epidemic in Harare, Zimbabwe, 2012.

PubMed

Polonsky, Jonathan A; Martínez-Pino, Isabel; Nackers, Fabienne; Chonzi, Prosper; Manangazira, Portia; Van Herp, Michel; Maes, Peter; Porten, Klaudia; Luquero, Francisco J

2014-01-01

Typhoid fever remains a significant public health problem in developing countries. In October 2011, a typhoid fever epidemic was declared in Harare, Zimbabwe - the fourth enteric infection epidemic since 2008. To orient control activities, we described the epidemiology and spatiotemporal clustering of the epidemic in Dzivaresekwa and Kuwadzana, the two most affected suburbs of Harare. A typhoid fever case-patient register was analysed to describe the epidemic. To explore clustering, we constructed a dataset comprising GPS coordinates of case-patient residences and randomly sampled residential locations (spatial controls). The scale and significance of clustering was explored with Ripley K functions. Cluster locations were determined by a random labelling technique and confirmed using Kulldorff's spatial scan statistic. We analysed data from 2570 confirmed and suspected case-patients, and found significant spatiotemporal clustering of typhoid fever in two non-overlapping areas, which appeared to be linked to environmental sources. Peak relative risk was more than six times greater than in areas lying outside the cluster ranges. Clusters were identified in similar geographical ranges by both random labelling and Kulldorff's spatial scan statistic. The spatial scale at which typhoid fever clustered was highly localised, with significant clustering at distances up to 4.5 km and peak levels at approximately 3.5 km. The epicentre of infection transmission shifted from one cluster to the other during the course of the epidemic. This study demonstrated highly localised clustering of typhoid fever during an epidemic in an urban African setting, and highlights the importance of spatiotemporal analysis for making timely decisions about targetting prevention and control activities and reinforcing treatment during epidemics. This approach should be integrated into existing surveillance systems to facilitate early detection of epidemics and identify their spatial range.
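Ripley's K function, used in this abstract to explore the scale of clustering, can be estimated naively as K(r) = A / (n(n - 1)) × (number of ordered pairs within distance r). The sketch below omits edge correction and the comparison with spatial controls that the study relied on, and it uses invented coordinates.

```python
import math


def ripley_k(points, radius, area):
    """Naive Ripley's K estimate without edge correction.

    points is a list of (x, y) coordinates; area is the study-region area.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    ordered_pairs = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= radius:
                ordered_pairs += 1
    return area * ordered_pairs / (n * (n - 1))


# Hypothetical case residences at the corners of a unit square.
corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
for r in (0.5, 1.0, 1.5):
    print(r, ripley_k(corners, r, area=1.0))
```

Under complete spatial randomness K(r) equals πr², so estimates well above that value at a given distance suggest clustering at that scale.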
Descriptive Epidemiology of Typhoid Fever during an Epidemic in Harare, Zimbabwe, 2012

PubMed Central

Polonsky, Jonathan A.; Martínez-Pino, Isabel; Nackers, Fabienne; Chonzi, Prosper; Manangazira, Portia; Van Herp, Michel; Maes, Peter; Porten, Klaudia; Luquero, Francisco J.

2014-01-01

Background Typhoid fever remains a significant public health problem in developing countries. In October 2011, a typhoid fever epidemic was declared in Harare, Zimbabwe - the fourth enteric infection epidemic since 2008. To orient control activities, we described the epidemiology and spatiotemporal clustering of the epidemic in Dzivaresekwa and Kuwadzana, the two most affected suburbs of Harare. Methods A typhoid fever case-patient register was analysed to describe the epidemic. To explore clustering, we constructed a dataset comprising GPS coordinates of case-patient residences and randomly sampled residential locations (spatial controls). The scale and significance of clustering was explored with Ripley K functions. Cluster locations were determined by a random labelling technique and confirmed using Kulldorff's spatial scan statistic. Principal Findings We analysed data from 2570 confirmed and suspected case-patients, and found significant spatiotemporal clustering of typhoid fever in two non-overlapping areas, which appeared to be linked to environmental sources. Peak relative risk was more than six times greater than in areas lying outside the cluster ranges. Clusters were identified in similar geographical ranges by both random labelling and Kulldorff's spatial scan statistic. The spatial scale at which typhoid fever clustered was highly localised, with significant clustering at distances up to 4.5 km and peak levels at approximately 3.5 km. The epicentre of infection transmission shifted from one cluster to the other during the course of the epidemic. Conclusions This study demonstrated highly localised clustering of typhoid fever during an epidemic in an urban African setting, and highlights the importance of spatiotemporal analysis for making timely decisions about targetting prevention and control activities and reinforcing treatment during epidemics. 
This approach should be integrated into existing surveillance systems to facilitate early detection of epidemics and identify their spatial range. PMID:25486292
Assessment of self-organizing maps to analyze sole-carbon source utilization profiles.

PubMed

Leflaive, Joséphine; Céréghino, Régis; Danger, Michaël; Lacroix, Gérard; Ten-Hage, Loïc

2005-07-01

Community-level physiological profiles obtained with Biolog microplates are widely used to assess the functional diversity of bacterial communities. Biolog produces a large amount of data, the analysis of which has been the subject of many studies. In most cases, after some transformations, these data were investigated with classical multivariate analyses. Here we provide an alternative to this method: the use of an artificial intelligence technique, the Self-Organizing Map (SOM, an unsupervised neural network). We used data from a microcosm study of algae-associated bacterial communities placed in various nutritive conditions. Analyses were carried out on the net absorbances at two incubation times for each substrate and on the chemical guild categorization of the total bacterial activity. Compared to Principal Components Analysis and cluster analysis, SOM appeared to be a valuable tool for community classification and for establishing clear relationships between clusters of bacterial communities and sole-carbon source utilization. Specifically, SOMs offered a clear bidimensional projection of a relatively large volume of data and were easier to interpret than plots commonly obtained with multivariate analyses. They could be recommended for patterning the temporal evolution of communities' functional diversity.
Developmental analysis of the dopamine-containing neurons of the Drosophila brain

PubMed Central

Hartenstein, Volker; Cruz, Louie; Lovick, Jennifer K.; Guo, Ming

2016-01-01

The Drosophila dopaminergic (DA) system consists of a relatively small number of neurons clustered throughout the brain and ventral nerve cord. Previous work shows that clusters of DA neurons innervate different brain compartments, which in part accounts for functional diversity of the DA system. In this paper, we analyzed the association between DA neuron clusters and specific brain lineages, developmental and structural units of the Drosophila brain which provide a framework of connections that can be followed throughout development. The hatching larval brain contains six groups of primary DA neurons (born in the embryo), which we assign to six distinct lineages. We can show that all larval DA clusters persist into the adult brain. Some clusters increase in cell number during late larval stages while others do not become DA-positive until early pupa. Ablating neuroblasts with hydroxyurea (HU) prior to onset of larval proliferation (generates secondary neurons) confirms these added DA clusters are primary neurons born in the embryo, rather than secondary neurons. A single cluster that becomes DA-positive in the late pupa, PAM1/lineage DALcm1/2, forms part of a secondary lineage which can be ablated by larval HU application. By supplying lineage information for each DA cluster, our analysis promotes further developmental and functional analyses of this important system of neurons. PMID:27350102
Fatality rate of pedestrians and fatal crash involvement rate of drivers in pedestrian crashes: a case study of Iran.

PubMed

Kashani, Ali Tavakoli; Besharati, Mohammad Mehdi

2017-06-01

The aim of this study was to uncover patterns of pedestrian crashes. In the first stage, 34,178 pedestrian-involved crashes that occurred in Iran during a four-year period were grouped into homogeneous clusters using a clustering analysis. Next, some in-cluster and inter-cluster crash patterns were analysed. The clustering analysis yielded six pedestrian crash groups. Car/van/pickup crashes on rural roads as well as heavy vehicle crashes were found to be less frequent but more likely to be fatal compared to other crash clusters. In addition, after controlling for crash frequency in each cluster, it was found that the fatality rate of each pedestrian age group as well as the fatal crash involvement rate of each driver age group varies across the six clusters. The results of the present study have some policy implications, including promoting pedestrian safety training sessions for heavy vehicle drivers, imposing limitations on elderly heavy vehicle drivers, and reinforcing penalties for drivers under 19 and motorcyclists. In addition, road safety campaigns in rural areas may be promoted to inform people about the higher fatality rate of pedestrians on rural roads. The crash patterns uncovered in this study might also be useful for prioritizing future pedestrian safety research areas.
Alteration mapping at Goldfield, Nevada, by cluster and discriminant analysis of Landsat digital data. [mapping of hydrothermally altered volcanic rocks]

NASA Technical Reports Server (NTRS)

Ballew, G.

1977-01-01

The ability of Landsat multispectral digital data to differentiate among 62 combinations of rock and alteration types at the Goldfield mining district of Western Nevada was investigated by using statistical techniques of cluster and discriminant analysis. Multivariate discriminant analysis was not effective in classifying each of the 62 groups, with classification results essentially the same whether data of four channels alone or combined with six ratios of channels were used. Bivariate plots of group means revealed a cluster of three groups including mill tailings, basalt and all other rock and alteration types. Automatic hierarchical clustering based on the fourth dimensional Mahalanobis distance between group means of 30 groups having five or more samples was performed using Johnson's HICLUS program. The results of the cluster analysis revealed hierarchies of mill tailings vs. natural materials, basalt vs. non-basalt, highly reflectant rocks vs. other rocks and exclusively unaltered rocks vs. predominantly altered rocks. The hierarchies were used to determine the order in which sets of multiple discriminant analyses were to be performed and the resulting discriminant functions were used to produce a map of geology and alteration which has an overall accuracy of 70 percent for discriminating exclusively altered rocks from predominantly altered rocks.
Comparative analysis of bones, mites, soil chemistry, nematodes and soil micro-eukaryotes from a suspected homicide to estimate the post-mortem interval.

PubMed

Szelecz, Ildikó; Lösch, Sandra; Seppey, Christophe V W; Lara, Enrique; Singer, David; Sorge, Franziska; Tschui, Joelle; Perotti, M Alejandra; Mitchell, Edward A D

2018-01-08

Criminal investigations of suspected murder cases require estimating the post-mortem interval (PMI, or time after death) which is challenging for long PMIs. Here we present the case of human remains found in a Swiss forest. We have used a multidisciplinary approach involving the analysis of bones and soil samples collected beneath the remains of the head, upper and lower body and "control" samples taken a few meters away. We analysed soil chemical characteristics, mites and nematodes (by microscopy) and micro-eukaryotes (by Illumina high throughput sequencing). The PMI estimate on hair 14C data via bomb peak radiocarbon dating gave a time range of 1 to 3 years before the discovery of the remains. Cluster analyses for soil chemical constituents, nematodes, mites and micro-eukaryotes revealed two clusters 1) head and upper body and 2) lower body and controls. From mite evidence, we conclude that the body was probably brought to the site after death. However, chemical analyses, nematode community analyses and the analyses of micro-eukaryotes indicate that decomposition took place at least partly on site. This study illustrates the usefulness of combining several lines of evidence for the study of homicide cases to better calibrate PMI inference tools.
Health and disease phenotyping in old age using a cluster network analysis.

PubMed

Valenzuela, Jesus Felix; Monterola, Christopher; Tong, Victor Joo Chuan; Ng, Tze Pin; Larbi, Anis

2017-11-15

Human ageing is a complex trait that involves the synergistic action of numerous biological processes that interact to form a complex network. Here we performed a network analysis to examine the interrelationships between physiological and psychological functions, disease, disability, quality of life, lifestyle and behavioural risk factors for ageing in a cohort of 3,270 subjects aged ≥55 years. We considered associations between numerical and categorical descriptors using effect-size measures for each variable pair and identified clusters of variables from the resulting pairwise effect-size network and minimum spanning tree. We show, by way of a correspondence analysis between the two sets of clusters, that they correspond to coarse-grained and fine-grained structure of the network relationships. The clusters obtained from the minimum spanning tree mapped to various conceptual domains and corresponded to physiological and syndromic states. Hierarchical ordering of these clusters identified six common themes based on interactions with physiological systems and common underlying substrates of age-associated morbidity and disease chronicity, functional disability, and quality of life. These findings provide a starting point for in-depth analyses of ageing that incorporate immunologic, metabolomic and proteomic biomarkers, and ultimately offer low-level-based typologies of healthy and unhealthy ageing.
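The idea of reading variable clusters off a minimum spanning tree can be sketched as follows. This is only an illustration under stated assumptions, not the authors' procedure: the variable names and effect sizes are invented, the distance is taken as 1 minus the effect size, and clusters are obtained by dropping the longest tree edges. The function names `minimum_spanning_tree` and `clusters_from_tree` are hypothetical.

```python
def minimum_spanning_tree(nodes, edges):
    """Kruskal's algorithm. edges is a list of (distance, u, v) tuples."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for dist, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:  # edge joins two separate components, so keep it
            parent[ru] = rv
            tree.append((dist, u, v))
    return tree


def clusters_from_tree(nodes, tree, k):
    """Drop the k-1 longest tree edges and return the connected components."""
    kept = sorted(tree)[: len(tree) - (k - 1)]
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for _, u, v in kept:
        parent[find(u)] = find(v)
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return sorted(groups.values(), key=lambda g: sorted(g))


# Invented pairwise effect sizes between four hypothetical variables.
effect = {
    ("grip", "gait"): 0.8, ("mood", "sleep"): 0.7, ("gait", "mood"): 0.2,
    ("gait", "sleep"): 0.15, ("grip", "mood"): 0.1, ("grip", "sleep"): 0.1,
}
# Assumption: a strong association means a short distance.
edges = [(1.0 - e, u, v) for (u, v), e in effect.items()]
nodes = sorted({n for pair in effect for n in pair})
tree = minimum_spanning_tree(nodes, edges)
groups = clusters_from_tree(nodes, tree, k=2)
```

With these toy values the two strongly associated pairs end up in separate groups, since the only tree edge linking them is the weakest one.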
Microsatellite markers identify three lineages of Phytophthora ramorum in US nurseries, yet single lineages in US forest and European nursery populations.

PubMed

Ivors, K; Garbelotto, M; Vries, I D E; Ruyter-Spira, C; Te Hekkert, B; Rosenzweig, N; Bonants, P

2006-05-01

Analysis of 12 polymorphic simple sequence repeats identified in the genome sequence of Phytophthora ramorum, causal agent of 'sudden oak death', revealed genotypic diversity to be significantly higher in nurseries (91% of total) than in forests (18% of total). Our analysis identified only two closely related genotypes in US forests, while the genetic structure of populations from European nurseries was of intermediate complexity, including multiple, closely related genotypes. Multilocus analysis determined populations in US forests reproduce clonally and are likely descendants of a single introduced individual. The 151 isolates analysed clustered in three clades. US forest and European nursery isolates clustered into two distinct clades, while one isolate from a US nursery belonged to a third novel clade. The combined microsatellite, sequencing and morphological analyses suggest the three clades represent distinct evolutionary lineages. All three clades were identified in some US nurseries, emphasizing the role of commercial plant trade in the movement of this pathogen.
Assessing the genome level diversity of Listeria monocytogenes from contaminated ice cream and environmental samples linked to a listeriosis outbreak in the United States.

PubMed

Chen, Yi; Luo, Yan; Curry, Phillip; Timme, Ruth; Melka, David; Doyle, Matthew; Parish, Mickey; Hammack, Thomas S; Allard, Marc W; Brown, Eric W; Strain, Errol A

2017-01-01

A listeriosis outbreak in the United States implicated contaminated ice cream produced by one company, which operated 3 facilities. We performed single nucleotide polymorphism (SNP)-based whole genome sequencing (WGS) analysis on Listeria monocytogenes from food, environmental and clinical sources, identifying two clusters and a single branch, belonging to PCR serogroup IIb and genetic lineage I. WGS Cluster I, representing one outbreak strain, contained 82 food and environmental isolates from Facility I and 4 clinical isolates. These isolates differed by up to 29 SNPs, exhibited 9 pulsed-field gel electrophoresis (PFGE) profiles and multilocus sequence typing (MLST) sequence type (ST) 5 of clonal complex 5 (CC5). WGS Cluster II contained 51 food and environmental isolates from Facility II, 4 food isolates from Facility I and 5 clinical isolates. Among them the isolates from Facility II and clinical isolates formed a clade and represented another outbreak strain. Isolates in this clade differed by up to 29 SNPs, exhibited 3 PFGE profiles and ST5. The only isolate collected from Facility III belonged to singleton ST489, which was in a single branch separate from Clusters I and II, and was not associated with the outbreak. WGS analyses clustered together outbreak-associated isolates exhibiting multiple PFGE profiles, while differentiating them from epidemiologically unrelated isolates that exhibited outbreak PFGE profiles. The complete genome of a Cluster I isolate allowed the identification and analyses of putative prophages, revealing that Cluster I isolates differed by the gain or loss of three putative prophages, causing the banding pattern differences among all 3 AscI-PFGE profiles observed in Cluster I isolates. WGS data suggested that certain ice cream varieties and/or production lines might have contamination sources unique to them. The SNP-based analysis was able to distinguish CC5 as a group from non-CC5 isolates and differentiate among CC5 isolates from different outbreaks/incidents.
Assessing the genome level diversity of Listeria monocytogenes from contaminated ice cream and environmental samples linked to a listeriosis outbreak in the United States

PubMed Central

Chen, Yi; Luo, Yan; Curry, Phillip; Timme, Ruth; Melka, David; Doyle, Matthew; Parish, Mickey; Hammack, Thomas S.; Allard, Marc W.; Brown, Eric W.; Strain, Errol A.

2017-01-01

A listeriosis outbreak in the United States implicated contaminated ice cream produced by one company, which operated 3 facilities. We performed single nucleotide polymorphism (SNP)-based whole genome sequencing (WGS) analysis on Listeria monocytogenes from food, environmental and clinical sources, identifying two clusters and a single branch, belonging to PCR serogroup IIb and genetic lineage I. WGS Cluster I, representing one outbreak strain, contained 82 food and environmental isolates from Facility I and 4 clinical isolates. These isolates differed by up to 29 SNPs, exhibited 9 pulsed-field gel electrophoresis (PFGE) profiles and multilocus sequence typing (MLST) sequence type (ST) 5 of clonal complex 5 (CC5). WGS Cluster II contained 51 food and environmental isolates from Facility II, 4 food isolates from Facility I and 5 clinical isolates. Among them the isolates from Facility II and clinical isolates formed a clade and represented another outbreak strain. Isolates in this clade differed by up to 29 SNPs, exhibited 3 PFGE profiles and ST5. The only isolate collected from Facility III belonged to singleton ST489, which was in a single branch separate from Clusters I and II, and was not associated with the outbreak. WGS analyses clustered together outbreak-associated isolates exhibiting multiple PFGE profiles, while differentiating them from epidemiologically unrelated isolates that exhibited outbreak PFGE profiles. The complete genome of a Cluster I isolate allowed the identification and analyses of putative prophages, revealing that Cluster I isolates differed by the gain or loss of three putative prophages, causing the banding pattern differences among all 3 AscI-PFGE profiles observed in Cluster I isolates. WGS data suggested that certain ice cream varieties and/or production lines might have contamination sources unique to them. The SNP-based analysis was able to distinguish CC5 as a group from non-CC5 isolates and differentiate among CC5 isolates from different outbreaks/incidents. PMID:28166293
Space-Time Analysis of Testicular Cancer Clusters Using Residential Histories: A Case-Control Study in Denmark

PubMed Central

Sloan, Chantel D.; Nordsborg, Rikke B.; Jacquez, Geoffrey M.; Raaschou-Nielsen, Ole; Meliker, Jaymie R.

2015-01-01

Though the etiology is largely unknown, testicular cancer incidence has seen recent significant increases in northern Europe and throughout many Western regions. The most common cancer in males under age 40, age period cohort models have posited exposures in the in utero environment or in early childhood as possible causes of increased risk of testicular cancer. Some of these factors may be tied to geography through being associated with behavioral, cultural, sociodemographic or built environment characteristics. If so, this could result in detectable geographic clusters of cases that could lead to hypotheses regarding environmental targets for intervention. Given a latency period between exposure to an environmental carcinogen and testicular cancer diagnosis, mobility histories are beneficial for spatial cluster analyses. Nearest-neighbor based Q-statistics allow for the incorporation of changes in residency in spatial disease cluster detection. Using these methods, a space-time cluster analysis was conducted on a population-wide case-control population selected from the Danish Cancer Registry with mobility histories since 1971 extracted from the Danish Civil Registration System. Cases (N=3297) were diagnosed between 1991 and 2003, and two sets of controls (N=3297 for each set) matched on sex and date of birth were included in the study. We also examined spatial patterns in maternal residential history for those cases and controls born in 1971 or later (N=589 case-control pairs). Several small clusters were detected when aligning individuals by year prior to diagnosis, age at diagnosis and calendar year of diagnosis. However, the largest of these clusters contained only 2 statistically significant individuals at their center, and were not replicated in SaTScan spatial-only analyses which are less susceptible to multiple testing bias. We found little evidence of local clusters in residential histories of testicular cancer cases in this Danish population. PMID:25756204
Space-time analysis of testicular cancer clusters using residential histories: a case-control study in Denmark.

PubMed

Sloan, Chantel D; Nordsborg, Rikke B; Jacquez, Geoffrey M; Raaschou-Nielsen, Ole; Meliker, Jaymie R

2015-01-01

Though the etiology is largely unknown, testicular cancer incidence has seen recent significant increases in northern Europe and throughout many Western regions. The most common cancer in males under age 40, age period cohort models have posited exposures in the in utero environment or in early childhood as possible causes of increased risk of testicular cancer. Some of these factors may be tied to geography through being associated with behavioral, cultural, sociodemographic or built environment characteristics. If so, this could result in detectable geographic clusters of cases that could lead to hypotheses regarding environmental targets for intervention. Given a latency period between exposure to an environmental carcinogen and testicular cancer diagnosis, mobility histories are beneficial for spatial cluster analyses. Nearest-neighbor based Q-statistics allow for the incorporation of changes in residency in spatial disease cluster detection. Using these methods, a space-time cluster analysis was conducted on a population-wide case-control population selected from the Danish Cancer Registry with mobility histories since 1971 extracted from the Danish Civil Registration System. Cases (N=3297) were diagnosed between 1991 and 2003, and two sets of controls (N=3297 for each set) matched on sex and date of birth were included in the study. We also examined spatial patterns in maternal residential history for those cases and controls born in 1971 or later (N= 589 case-control pairs). Several small clusters were detected when aligning individuals by year prior to diagnosis, age at diagnosis and calendar year of diagnosis. However, the largest of these clusters contained only 2 statistically significant individuals at their center, and were not replicated in SaTScan spatial-only analyses which are less susceptible to multiple testing bias. We found little evidence of local clusters in residential histories of testicular cancer cases in this Danish population.
Assessment and application of clustering techniques to atmospheric particle number size distribution for the purpose of source apportionment

NASA Astrophysics Data System (ADS)

Salimi, F.; Ristovski, Z.; Mazaheri, M.; Laiman, R.; Crilley, L. R.; He, C.; Clifford, S.; Morawska, L.

2014-06-01

Long-term measurements of particle number size distribution (PNSD) produce a very large number of observations and their analysis requires an efficient approach in order to produce results in the least possible time and with maximum accuracy. Clustering techniques are a family of sophisticated methods which have been recently employed to analyse PNSD data; however, very little information is available comparing the performance of different clustering techniques on PNSD data. This study aims to apply several clustering techniques (i.e. K-means, PAM, CLARA and SOM) to PNSD data, in order to identify and apply the optimum technique to PNSD data measured at 25 sites across Brisbane, Australia. A new method, based on the Generalised Additive Model (GAM) with a basis of penalised B-splines, was proposed to parameterise the PNSD data and the temporal weight of each cluster was also estimated using the GAM. In addition, each cluster was associated with its possible source based on the results of this parameterisation, together with the characteristics of each cluster. The performances of four clustering techniques were compared using the Dunn index and silhouette width validation values and the K-means technique was found to have the highest performance, with five clusters being the optimum. Therefore, five clusters were found within the data using the K-means technique. The diurnal occurrence of each cluster was used together with other air quality parameters, temporal trends and the physical properties of each cluster, in order to attribute each cluster to its source and origin. The five clusters were attributed to three major sources and origins, including regional background particles, photochemically induced nucleated particles and vehicle generated particles. Overall, clustering was found to be an effective technique for attributing each particle size spectrum to its source and the GAM was suitable to parameterise the PNSD data. These two techniques can help researchers immensely in analysing PNSD data for characterisation and source apportionment purposes.
Assessment and application of clustering techniques to atmospheric particle number size distribution for the purpose of source apportionment

NASA Astrophysics Data System (ADS)

Salimi, F.; Ristovski, Z.; Mazaheri, M.; Laiman, R.; Crilley, L. R.; He, C.; Clifford, S.; Morawska, L.

2014-11-01

Long-term measurements of particle number size distribution (PNSD) produce a very large number of observations and their analysis requires an efficient approach in order to produce results in the least possible time and with maximum accuracy. Clustering techniques are a family of sophisticated methods that have been recently employed to analyse PNSD data; however, very little information is available comparing the performance of different clustering techniques on PNSD data. This study aims to apply several clustering techniques (i.e. K means, PAM, CLARA and SOM) to PNSD data, in order to identify and apply the optimum technique to PNSD data measured at 25 sites across Brisbane, Australia. A new method, based on the Generalised Additive Model (GAM) with a basis of penalised B-splines, was proposed to parameterise the PNSD data and the temporal weight of each cluster was also estimated using the GAM. In addition, each cluster was associated with its possible source based on the results of this parameterisation, together with the characteristics of each cluster. The performances of four clustering techniques were compared using the Dunn index and Silhouette width validation values and the K means technique was found to have the highest performance, with five clusters being the optimum. Therefore, five clusters were found within the data using the K means technique. The diurnal occurrence of each cluster was used together with other air quality parameters, temporal trends and the physical properties of each cluster, in order to attribute each cluster to its source and origin. The five clusters were attributed to three major sources and origins, including regional background particles, photochemically induced nucleated particles and vehicle generated particles. Overall, clustering was found to be an effective technique for attributing each particle size spectrum to its source and the GAM was suitable to parameterise the PNSD data. These two techniques can help researchers immensely in analysing PNSD data for characterisation and source apportionment purposes.
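The silhouette width used above as a validation value can be illustrated with a small sketch. It is not the study's code: the points are invented, Euclidean distance is an assumption, and the function name `silhouette_width` is hypothetical. For each point, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to the members of any other cluster, and the point's silhouette is (b - a) / max(a, b).

```python
import math


def silhouette_width(points, labels):
    """Mean silhouette width; singleton clusters contribute 0 by convention."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette width needs at least two clusters")
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # silhouette of a singleton is defined as 0
        # a: mean distance to the other members of the point's own cluster
        # (the point's distance to itself is 0, so it does not affect the sum)
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        if max(a, b) > 0:  # guard against all-identical points
            total += (b - a) / max(a, b)
    return total / len(points)


# Invented 2-D points forming two tight, well-separated groups.
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_width(pts, [0, 0, 1, 1])  # labels match the visible groups
bad = silhouette_width(pts, [0, 1, 0, 1])   # labels split each group in half
```

Values near 1 indicate compact, well-separated clusters and negative values indicate points that sit closer to another cluster, so the first labelling scores much higher than the second. Comparing such scores across candidate numbers of clusters is one way to pick an optimum.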
Identification among morphologically similar Argyreia (Convolvulaceae) based on leaf anatomy and phenetic analyses.

PubMed

Traiperm, Paweena; Chow, Janene; Nopun, Possathorn; Staples, G; Swangpol, Sasivimon C

2017-12-01

The genus Argyreia Lour. is one of the species-rich Asian genera in the family Convolvulaceae. Several species complexes were recognized in which taxon delimitation was imprecise, especially when examining herbarium materials without fully developed open flowers. The main goal of this study is to investigate and describe leaf anatomy for some morphologically similar Argyreia using epidermal peeling, leaf and petiole transverse sections, and scanning electron microscopy. Phenetic analyses including cluster analysis and principal component analysis were used to investigate the similarity of these morpho-types. Anatomical differences observed between the morpho-types include epidermal cell walls and the trichome types on the leaf epidermis. Additional differences in the leaf and petiole transverse sections include the epidermal cell shape of the adaxial leaf blade, the leaf margins, and the petiole transverse sectional outline. The phenogram from cluster analysis using the UPGMA method represented four groups with an R value of 0.87. Moreover, the important quantitative and qualitative leaf anatomical traits of the four groups were confirmed by the principal component analysis of the first two components. The results from phenetic analyses confirmed the anatomical differentiation between the morpho-types. Leaf anatomical features regarded as particularly informative for morpho-type differentiation can be used to supplement macro morphological identification.
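The abstract above does not say what its R value measures. One statistic commonly reported for a UPGMA phenogram is the cophenetic correlation, and the sketch below assumes that reading. It is a minimal pure-Python illustration on an invented distance matrix, not the authors' analysis, and the function names `upgma_cophenetic` and `pearson` are hypothetical.

```python
import itertools


def upgma_cophenetic(dist):
    """UPGMA on a dict {(i, j): d} with i < j over items 0..n-1.

    Returns {(i, j): height at which items i and j first share a cluster}.
    """
    n = 1 + max(j for _, j in dist)
    clusters = [(i,) for i in range(n)]
    # d holds the average distance between every pair of current clusters
    d = {frozenset([(i,), (j,)]): v for (i, j), v in dist.items()}
    coph = {}
    while len(clusters) > 1:
        a, b = min(itertools.combinations(clusters, 2),
                   key=lambda pair: d[frozenset(pair)])
        height = d[frozenset((a, b))]
        for i in a:
            for j in b:
                coph[(min(i, j), max(i, j))] = height
        merged = a + b
        for c in clusters:
            if c not in (a, b):
                # size-weighted mean equals the average over all item pairs
                d[frozenset((merged, c))] = (
                    len(a) * d[frozenset((a, c))]
                    + len(b) * d[frozenset((b, c))]
                ) / len(merged)
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
    return coph


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Invented distances between four specimens (0 and 1 similar, 2 and 3 similar).
dist = {(0, 1): 1.0, (0, 2): 6.0, (0, 3): 7.0,
        (1, 2): 5.0, (1, 3): 6.0, (2, 3): 2.0}
coph = upgma_cophenetic(dist)
pairs = sorted(dist)
r = pearson([dist[p] for p in pairs], [coph[p] for p in pairs])
```

The closer r is to 1, the more faithfully the tree heights reproduce the original pairwise distances.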
DMINDA: an integrated web server for DNA motif identification and analyses

PubMed Central

Ma, Qin; Zhang, Hanyuan; Mao, Xizeng; Zhou, Chuan; Liu, Bingqiang; Chen, Xin; Xu, Ying

2014-01-01

DMINDA (DNA motif identification and analyses) is an integrated web server for DNA motif identification and analyses, which is accessible at http://csbl.bmb.uga.edu/DMINDA/. This web site is freely available to all users and there is no login requirement. This server provides a suite of cis-regulatory motif analysis functions on DNA sequences, which are important to elucidation of the mechanisms of transcriptional regulation: (i) de novo motif finding for a given set of promoter sequences along with statistical scores for the predicted motifs derived based on information extracted from a control set, (ii) scanning motif instances of a query motif in provided genomic sequences, (iii) motif comparison and clustering of identified motifs, and (iv) co-occurrence analyses of query motifs in given promoter sequences. The server is powered by a backend computer cluster with over 150 computing nodes, and is particularly useful for motif prediction and analyses in prokaryotic genomes. We believe that DMINDA, as a new and comprehensive web server for cis-regulatory motif finding and analyses, will benefit the genomic research community in general and prokaryotic genome researchers in particular. PMID:24753419
Who attends a Children's Hospital Emergency Department for dental reasons? A two-step cluster analysis approach.

PubMed

Marshman, Z; Broomhead, T; Rodd, H D; Jones, K; Burke, D; Baker, S R

2016-09-28

Emergency departments (EDs) have been identified as key providers of dental care although few studies have examined patterns of attendance or clusters of characteristics. The aim was to identify the reasons for visits to an ED, whether these remained stable over time, and characterize clusters of patients by socio-demographic and attendance variables. Pseudonymized data were obtained for children who attended the ED in 2003-2004, 2004-2005 and 2012-2013. Presenting complaint was categorized as attending for dental or nondental reasons. Other variables analysed included patient (age, sex, ethnicity and deprivation) and attendance characteristics (distance travelled, season, nature of complaint, time elapsed since onset of symptoms, day of week and hours of attendance), together with treatment outcome (advice, antibiotics and referral). To assess trends over time, analyses were conducted on patient, attendance and treatment outcome variables. To examine whether patients could be characterized by socio-demographic and attendance variables, a two-step cluster analysis was undertaken on the 2003-2004 data set and validated on the 2004-2005 and 2012-2013 data sets. In 2003-2004, 550 children attended the ED for dental reasons, rising to 687 in 2012-2013. The most important predictors of dental attendance were as follows: nature of complaint, ethnicity, time elapsed, sex and deprivation of the area in which children lived. The analysis showed two clusters. Cluster 1 comprised children who attended the ED for dental injury, were of White ethnicity and attended within 24 h of onset of symptoms. Children in this cluster were likely to be from the least or less deprived areas (compared to Cluster 2) and were more likely to be males. Cluster 2 comprised children attending the ED for caries, oral mucosal lesions or other complaints, who were likely to be of other (non-White) ethnicities and were likely to attend more than 24 h after symptoms began. Children in this cluster were more likely to come from the most deprived areas and were both males and females. The clusters varied according to treatment outcome; those patients in Cluster 2 were more likely to be prescribed medication, whilst those children in Cluster 1 were more likely to be referred to another specialty. A significant number of visits to the ED were for dental reasons with two clusters of children. The results have identified groups of patients for whom appropriate dental provision is lacking and where targeted services are needed to improve outcomes for children and reduce the burden on EDs. © 2016 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Bootstrap-based methods for estimating standard errors in Cox's regression analyses of clustered event times.

PubMed

Xiao, Yongling; Abrahamowicz, Michal

2010-03-30

We propose two bootstrap-based methods to correct the standard errors (SEs) from Cox's model for within-cluster correlation of right-censored event times. The cluster-bootstrap method resamples, with replacement, only the clusters, whereas the two-step bootstrap method resamples (i) the clusters, and (ii) individuals within each selected cluster, with replacement. In simulations, we evaluate both methods and compare them with the existing robust variance estimator and the shared gamma frailty model, which are available in statistical software packages. We simulate clustered event time data, with latent cluster-level random effects, which are ignored in the conventional Cox's model. For cluster-level covariates, both proposed bootstrap methods yield accurate SEs, and type I error rates, and acceptable coverage rates, regardless of the true random effects distribution, and avoid serious variance under-estimation by conventional Cox-based standard errors. However, the two-step bootstrap method over-estimates the variance for individual-level covariates. We also apply the proposed bootstrap methods to obtain confidence bands around flexible estimates of time-dependent effects in a real-life analysis of cluster event times.
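The two resampling schemes described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes toy data stored as a dict mapping a cluster id to a list of values, and it bootstraps the standard error of a simple pooled mean rather than a Cox regression coefficient. The names `cluster_bootstrap`, `two_step_bootstrap`, `bootstrap_se` and `pooled_mean` are hypothetical.

```python
import random
import statistics


def cluster_bootstrap(data, rng):
    """Resample whole clusters with replacement; members are kept as they are."""
    ids = list(data)
    picked = [rng.choice(ids) for _ in ids]
    return [list(data[c]) for c in picked]


def two_step_bootstrap(data, rng):
    """Resample clusters, then resample individuals within each picked cluster."""
    ids = list(data)
    picked = [rng.choice(ids) for _ in ids]
    return [[rng.choice(data[c]) for _ in data[c]] for c in picked]


def pooled_mean(sample):
    """Toy estimator standing in for a regression coefficient."""
    values = [v for cluster in sample for v in cluster]
    return sum(values) / len(values)


def bootstrap_se(data, resampler, estimator, n_boot=500, seed=0):
    """Standard deviation of the estimator over bootstrap replicates."""
    rng = random.Random(seed)
    estimates = [estimator(resampler(data, rng)) for _ in range(n_boot)]
    return statistics.stdev(estimates)


# Invented toy data: three clusters with clearly different levels.
toy = {"a": [1.0, 1.2, 0.9], "b": [3.1, 2.9, 3.0, 3.2], "c": [5.0, 5.1]}
se_cluster = bootstrap_se(toy, cluster_bootstrap, pooled_mean)
se_two_step = bootstrap_se(toy, two_step_bootstrap, pooled_mean)
```

In a real analysis the estimator would refit the survival model on each resampled data set; the resampling functions themselves would stay the same.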

Traveling around Cape Horn: Otolith chemistry reveals a mixed stock of Patagonian hoki with separate Atlantic and Pacific spawning grounds

USGS Publications Warehouse

Schuchert, P.C.; Arkhipkin, A.I.; Koenig, A.E.

2010-01-01

Trace element fingerprints of edge and core regions in otoliths from 260 specimens of Patagonian hoki, Macruronus magellanicus Lönnberg, 1907, were analyzed by LA-ICPMS to reveal whether this species forms one or more population units (stocks) in the Southern Oceans. Fish were caught on their spawning grounds in Chile and feeding grounds in Chile and the Falkland Islands. Univariate and multivariate analyses of trace element concentrations in the otolith edges, which relate to the adult life of fish, could not distinguish between Atlantic (Falkland) and Pacific (Chile) hoki. Cluster analyses of element concentrations in the otolith edges produced three different clusters in all sample areas indicating high mixture of the stocks. Cluster analysis of trace element concentrations in the otolith cores, relating to juvenile and larval life stages, produced two separate clusters mainly distinguished by 137Ba concentrations. The results suggest that Patagonian hoki is a highly mixed fish stock with at least two spawning grounds around South America. © 2009 Elsevier B.V.
Broad DNA methylation changes of spermatogenesis, inflammation and immune response-related genes in a subgroup of sperm samples for assisted reproduction.

PubMed

Schütte, B; El Hajj, N; Kuhtz, J; Nanda, I; Gromoll, J; Hahn, T; Dittrich, M; Schorsch, M; Müller, T; Haaf, T

2013-11-01

Aberrant sperm DNA methylation patterns, mainly in imprinted genes, have been associated with male subfertility and oligospermia. Here, we performed a genome-wide methylation analysis in sperm samples representing a wide range of semen parameters. Sperm DNA samples of 38 males attending a fertility centre were analysed with Illumina HumanMethylation27 BeadChips, which quantify methylation of >27 000 CpG sites in cis-regulatory regions of almost 15 000 genes. In an unsupervised analysis of methylation of all analysed sites, the patient samples clustered into a major and a minor group. The major group clustered with samples from normozoospermic healthy volunteers and, thus, may more closely resemble the normal situation. When correlating the clusters with semen and clinical parameters, the sperm counts were significantly different between groups with the minor group exhibiting sperm counts in the low normal range. A linear model identified almost 3000 CpGs with significant methylation differences between groups. Functional analysis revealed a broad gain of methylation in spermatogenesis-related genes and a loss of methylation in inflammation- and immune response-related genes. Quantitative bisulfite pyrosequencing validated differential methylation in three of five significant candidate genes on the array. Collectively, we identified a subgroup of sperm samples for assisted reproduction with sperm counts in the low normal range and broad methylation changes (affecting approximately 10% of analysed CpG sites) in specific pathways, most importantly spermatogenesis-related genes. We propose that epigenetic analysis can supplement traditional semen parameters and has the potential to provide new insights into the aetiology of male subfertility. © 2013 American Society of Andrology and European Academy of Andrology.
Floral and Vegetative Morphometrics of Five Pleurothallis (Orchidaceae) Species: Correlation with Taxonomy, Phylogeny, Genetic Variability and Pollination Systems

PubMed Central

BORBA, EDUARDO L.; SHEPHERD, GEORGE J.; BERG, CÁSSIO VAN DEN; SEMIR, JOÃO

2002-01-01

Morphometric analyses of vegetative and floral characters were conducted in 21 populations of five Pleurothallis (Orchidaceae) species occurring in Brazilian ‘campo rupestre’ vegetation. A phylogenetic analysis of this species group was also carried out using nuclear ribosomal DNA internal transcribed spacers (ITS1 and ITS2). Results of the ordination and cluster analyses agree with species’ delimitation revealed by taxonomic and allozyme studies. The groups formed in ordination analysis correspond to the pollinator groups determined in a previous pollination study. Relationships among the species in the cluster analysis using only vegetative characters are similar to those found in a previous allozyme study, but those indicated by cluster analysis using only floral characters differ. These results support the hypothesis that floral similarities are due to convergence driven by similar pollination mechanisms, and therefore floral traits may not be good indicators of phylogenetic relationships in this group. The results of the phylogenetic analysis support this conclusion to some extent. There is no correlation between genetic (allozyme) and morphological variability in the populations nor in the way this variability is distributed among conspecific populations. We describe a new subspecies of Pleurothallis ochreata based on differences in vegetative and chemical characters as well as geographic distribution. Absence of differentiation in floral characters, attraction of the same pollinator species, interfertility and genetic similarity support the argument for subspecific rather than specific status. PMID:12197519
Diary Data Subjected to Cluster Analysis of Intake/Output/Void Habits with Resulting Clusters Compared by Continence Status, Age, Race

PubMed Central

Miller, Janis M; Guo, Ying; Rodseth, Sarah Becker

2011-01-01

Background Data that incorporate the full complexity of healthy beverage intake and voiding frequency do not exist; therefore, clinicians reviewing bladder habits or voiding diaries for continence care must rely on expert opinion recommendations. Objective To use data-driven cluster analyses to reduce complex voiding diary variables into discrete patterns or data cluster profiles, descriptively name the clusters, and perform validity testing. Method Participants were 352 community women who filled out a 3-day voiding diary. Six variables (void frequency during daytime hours, void frequency during nighttime hours, modal output, total output, total intake, and body mass index) were entered into cluster analyses. The clusters were analyzed for differences by continence status, age, race (Black women, n = 196; White women, n = 156), and for those who were incontinent, by leakage episode severity. Results Three clusters emerged, labeled descriptively as Conventional, Benchmark, and Superplus. The Conventional cluster (68% of the sample) demonstrated mean daily intake of 45 ± 13 ounces, mean daily output of 37 ± 15 ounces, mean daily voids 5 ± 2 times, mean modal daytime output 10 ± 0.5 ounces, and mean nighttime voids 1 ± 1 times. The Superplus cluster (7% of the sample) showed double or triple these values across the 5 variables, and the Benchmark cluster (25%) showed values consistent with current popular recommendations on intake and output (e.g., meeting or exceeding the 8 × 8 fluid intake rule of thumb). The clusters differed significantly (p < .05) by age, race, amount of irritating beverages consumed, and incontinence status. Discussion Identification of three discrete clusters provides for a potential parsimonious but data-driven means of classifying individuals for additional epidemiological or clinical study. The clinical utility rests with potential for intervening to move an individual from a high risk to low risk cluster with regards to incontinence. PMID:21317828
Reproducibility of Cognitive Profiles in Psychosis Using Cluster Analysis.

PubMed

Lewandowski, Kathryn E; Baker, Justin T; McCarthy, Julie M; Norris, Lesley A; Öngür, Dost

2018-04-01

Cognitive dysfunction is a core symptom dimension that cuts across the psychoses. Recent findings support classification of patients along the cognitive dimension using cluster analysis; however, data-derived groupings may be highly determined by sampling characteristics and the measures used to derive the clusters, and so their interpretability must be established. We examined cognitive clusters in a cross-diagnostic sample of patients with psychosis and associations with clinical and functional outcomes. We then compared our findings to a previous report of cognitive clusters in a separate sample using a different cognitive battery. Participants with affective or non-affective psychosis (n=120) and healthy controls (n=31) were administered the MATRICS Consensus Cognitive Battery, and clinical and community functioning assessments. Cluster analyses were performed on cognitive variables, and clusters were compared on demographic, cognitive, and clinical measures. Results were compared to findings from our previous report. A four-cluster solution provided a good fit to the data; profiles included a neuropsychologically normal cluster, a globally impaired cluster, and two clusters of mixed profiles. Cognitive burden was associated with symptom severity and poorer community functioning. The patterns of cognitive performance by cluster were highly consistent with our previous findings. We found evidence of four cognitive subgroups of patients with psychosis, with cognitive profiles that map closely to those produced in our previous work. Clusters were associated with clinical and community variables and a measure of premorbid functioning, suggesting that they reflect meaningful groupings: replicable, and related to clinical presentation and functional outcomes. (JINS, 2018, 24, 382-390).
A Morphometric Analysis of Hylarana signata Group (Previously Known as Rana signata and Rana picturata) of Malaysia

NASA Astrophysics Data System (ADS)

Zainudin, Ramlah; Sazali, Siti Nurlydia

A study of morphometric variation in the Malaysian Hylarana signata group was conducted to reveal the morphological relationships within the species group. Twenty-seven morphological characters from 18 individuals of H. signata and H. picturata were measured and recorded. The numerical data were analysed using Discriminant Function Analysis in SPSS version 16.0 and UPGMA Cluster Analysis in Minitab version 14.0. The results show complex clustering among the examined species, which might be due to ancient polymorphism of the lineages or to cryptic species within the group. Hence, further study should include more representatives in order to fully elucidate the morphological relationships of the H. signata group.
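As an illustration of the UPGMA step named in this abstract, the following is a minimal pure-Python sketch of average-linkage agglomerative clustering. It is not the Minitab procedure used in the study; the function name `upgma`, the specimen labels, and the distances are all hypothetical and chosen only to make the merge order easy to follow.

```python
from itertools import combinations


def upgma(labels, dist):
    """Average-linkage (UPGMA) agglomerative clustering.

    labels: list of item names.
    dist:   dict mapping frozenset({a, b}) to the distance between items a and b.
    Returns the merges in order as (cluster_a, cluster_b, average_distance).
    """
    clusters = [(name,) for name in labels]  # every item starts as its own cluster
    merges = []

    def avg_dist(c1, c2):
        # UPGMA: unweighted mean of all pairwise distances between members.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Merge the two clusters with the smallest average distance.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: avg_dist(*pair))
        merges.append((c1, c2, avg_dist(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Hypothetical distances between four specimens (not data from the study).
labels = ["s1", "s2", "p1", "p2"]
dist = {
    frozenset(("s1", "s2")): 1.0,
    frozenset(("p1", "p2")): 2.0,
    frozenset(("s1", "p1")): 6.0,
    frozenset(("s1", "p2")): 6.0,
    frozenset(("s2", "p1")): 6.0,
    frozenset(("s2", "p2")): 6.0,
}
merges = upgma(labels, dist)
for left, right, d in merges:
    print(left, right, d)
```

In this toy input the two "s" specimens merge first, then the two "p" specimens, and the two pairs join last. Real morphometric data would first need a distance matrix computed from the measured characters.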
ETE: a python Environment for Tree Exploration.

PubMed

Huerta-Cepas, Jaime; Dopazo, Joaquín; Gabaldón, Toni

2010-01-13

Many bioinformatics analyses, ranging from gene clustering to phylogenetics, produce hierarchical trees as their main result. These are used to represent the relationships among different biological entities, thus facilitating their analysis and interpretation. A number of standalone programs are available that focus on tree visualization or that perform specific analyses on them. However, such applications are rarely suitable for large-scale surveys, in which a higher level of automation is required. Currently, many genome-wide analyses rely on tree-like data representation and hence there is a growing need for scalable tools to handle tree structures at large scale. Here we present the Environment for Tree Exploration (ETE), a python programming toolkit that assists in the automated manipulation, analysis and visualization of hierarchical trees. ETE libraries provide a broad set of tree handling options as well as specific methods to analyze phylogenetic and clustering trees. Among other features, ETE allows for the independent analysis of tree partitions, has support for the extended newick format, provides an integrated node annotation system and permits to link trees to external data such as multiple sequence alignments or numerical arrays. In addition, ETE implements a number of built-in analytical tools, including phylogeny-based orthology prediction and cluster validation techniques. Finally, ETE's programmable tree drawing engine can be used to automate the graphical rendering of trees with customized node-specific visualizations. ETE provides a complete set of methods to manipulate tree data structures that extends current functionality in other bioinformatic toolkits of a more general purpose. ETE is free software and can be downloaded from http://ete.cgenomics.org.
ETE: a python Environment for Tree Exploration

PubMed Central

2010-01-01

Background Many bioinformatics analyses, ranging from gene clustering to phylogenetics, produce hierarchical trees as their main result. These are used to represent the relationships among different biological entities, thus facilitating their analysis and interpretation. A number of standalone programs are available that focus on tree visualization or that perform specific analyses on them. However, such applications are rarely suitable for large-scale surveys, in which a higher level of automation is required. Currently, many genome-wide analyses rely on tree-like data representation and hence there is a growing need for scalable tools to handle tree structures at large scale. Results Here we present the Environment for Tree Exploration (ETE), a python programming toolkit that assists in the automated manipulation, analysis and visualization of hierarchical trees. ETE libraries provide a broad set of tree handling options as well as specific methods to analyze phylogenetic and clustering trees. Among other features, ETE allows for the independent analysis of tree partitions, has support for the extended newick format, provides an integrated node annotation system and permits to link trees to external data such as multiple sequence alignments or numerical arrays. In addition, ETE implements a number of built-in analytical tools, including phylogeny-based orthology prediction and cluster validation techniques. Finally, ETE's programmable tree drawing engine can be used to automate the graphical rendering of trees with customized node-specific visualizations. Conclusions ETE provides a complete set of methods to manipulate tree data structures that extends current functionality in other bioinformatic toolkits of a more general purpose. ETE is free software and can be downloaded from http://ete.cgenomics.org. PMID:20070885
Analysis of Chromobacterium sp. natural isolates from different Brazilian ecosystems

PubMed Central

Lima-Bittencourt, Cláudia I; Astolfi-Filho, Spartaco; Chartone-Souza, Edmar; Santos, Fabrício R; Nascimento, Andréa MA

2007-01-01

Background Chromobacterium violaceum is a free-living bacterium able to survive under diverse environmental conditions. In this study we evaluate the genetic and physiological diversity of Chromobacterium sp. isolates from three Brazilian ecosystems: Brazilian Savannah (Cerrado), Atlantic Rain Forest and Amazon Rain Forest. We have analyzed the diversity with molecular approaches (16S rRNA gene sequences and amplified ribosomal DNA restriction analysis) and phenotypic surveys of antibiotic resistance and biochemistry profiles. Results In general, the clusters based on physiological profiles included isolates from two or more geographical locations indicating that they are not restricted to a single ecosystem. The isolates from Brazilian Savannah presented greater physiologic diversity and their biochemical profile was the most variable of all groupings. The isolates recovered from Amazon and Atlantic Rain Forests presented the most similar biochemical characteristics to the Chromobacterium violaceum ATCC 12472 strain. Clusters based on biochemical profiles were congruent with clusters obtained by the 16S rRNA gene tree. According to the phylogenetic analyses, isolates from the Amazon Rain Forest and Savannah displayed a closer relationship to the Chromobacterium violaceum ATCC 12472. Furthermore, 16S rRNA gene tree revealed a good correlation between phylogenetic clustering and geographic origin. Conclusion The physiological analyses clearly demonstrate the high biochemical versatility found in the C. violaceum genome and molecular methods allowed to detect the intra and inter-population diversity of isolates from three Brazilian ecosystems. PMID:17584942
New Insights into the Diversity of the Genus Faecalibacterium.

PubMed

Benevides, Leandro; Burman, Sriti; Martin, Rebeca; Robert, Véronique; Thomas, Muriel; Miquel, Sylvie; Chain, Florian; Sokol, Harry; Bermudez-Humaran, Luis G; Morrison, Mark; Langella, Philippe; Azevedo, Vasco A; Chatel, Jean-Marc; Soares, Siomar

2017-01-01

Faecalibacterium prausnitzii is a commensal bacterium, ubiquitous in the gastrointestinal tracts of animals and humans. This species is a functionally important member of the microbiota and studies suggest it has an impact on the physiology and health of the host. F. prausnitzii is the only identified species in the genus Faecalibacterium, but a recent study clustered strains of this species in two different phylogroups. Here, we propose the existence of distinct species in this genus through the use of comparative genomics. Briefly, we performed analyses of 16S rRNA gene phylogeny, phylogenomics, whole genome Multi-Locus Sequence Typing (wgMLST), Average Nucleotide Identity (ANI), gene synteny, and pangenome to better elucidate the phylogenetic relationships among strains of Faecalibacterium. For this, we used 12 newly sequenced, assembled, and curated genomes of F. prausnitzii, which were isolated from feces of healthy volunteers from France and Australia, and combined these with published data from 5 strains downloaded from public databases. The phylogenetic analysis of the 16S rRNA sequences, together with the wgMLST profiles and a phylogenomic tree based on comparisons of genome similarity, all supported the clustering of Faecalibacterium strains in different genospecies. Additionally, the global analysis of gene synteny among all strains showed a highly fragmented profile, whereas the intra-cluster analyses revealed larger and more conserved collinear blocks. Finally, ANI analysis substantiated the presence of three distinct clusters (A, B, and C) composed of five, four, and four strains, respectively. The pangenome analysis of each cluster corroborated the classification of these clusters into three distinct species, each containing less variability than that found within the global pangenome of all strains. Here, we propose that comparison of pangenome subsets and their associated α values may be used as an alternative approach, together with ANI, in the in silico classification of new species. Altogether, our results provide evidence not only for the reconsideration of the phylogenetic and genomic relatedness among strains currently assigned to F. prausnitzii, but also the need for lineage (strain-based) differentiation of this taxon to better define how specific members might be associated with positive or negative host interactions.
ANALYSIS OF LOTIC MACROINVERTEBRATE ASSEMBLAGES IN CALIFORNIA'S CENTRAL VALLEY

EPA Science Inventory

Using multivariate and cluster analyses, we examined the relationships between chemical and physical characteristics and macroinvertebrate assemblages at sites sampled by R-EMAP in California's Central Valley. By contrasting results where community structure was summarized as met...
Assessing different measures of population-level vaccine protection using a case-control study.

PubMed

Ali, Mohammad; You, Young Ae; Kanungo, Suman; Manna, Byomkesh; Deen, Jacqueline L; Lopez, Anna Lena; Wierzba, Thomas F; Bhattacharya, Sujit K; Sur, Dipika; Clemens, John D

2015-11-27

Case-control studies have not been examined for their utility in assessing population-level vaccine protection in individually randomized trials. We used the data of a randomized, placebo-controlled trial of a cholera vaccine to compare the results of case-control analyses with those of cohort analyses. Cases of cholera were selected from the trial population followed for three years following dosing. For each case, we selected 4 age-matched controls who had not developed cholera. For each case and control, GIS was used to calculate vaccine coverage of individuals in a surrounding "virtual" cluster. Specific selection strategies were used to evaluate the vaccine protective effects. 66,900 out of 108,389 individuals received two doses of the assigned regimen. For direct protection among subjects in low vaccine coverage clusters, we observed 78% (95% CI: 47-91%) protection in a cohort analysis and 84% (95% CI: 60-94%) in case-control analysis after adjusting for confounding factors. Using our GIS-based approach, estimated indirect protection was 52% (95% CI: 10-74%) in cohort and 76% (95% CI: 47-89%) in case-control analysis. Estimates of total and overall effectiveness were similar for cohort and case-control analyses. The findings show that case-control analyses of individually randomized vaccine trials may be used to evaluate direct as well as population-level vaccine protection. Copyright © 2015. Published by Elsevier Ltd.
Comparative analysis of prophages in Streptococcus mutans genomes

PubMed Central

Fu, Tiwei; Fan, Xiangyu; Long, Quanxin; Deng, Wanyan; Song, Jinlin

2017-01-01

Prophages have been considered genetic units that have an intimate association with novel phenotypic properties of bacterial hosts, such as pathogenicity and genomic variation. Little is known about the genetic information of prophages in the genome of Streptococcus mutans, a major pathogen of human dental caries. In this study, we identified 35 prophage-like elements in S. mutans genomes and performed a comparative genomic analysis. Comparative genomic and phylogenetic analyses of prophage sequences revealed that the prophages could be classified into three main large clusters: Cluster A, Cluster B, and Cluster C. The S. mutans prophages in each cluster were compared. The genomic sequences of phismuN66-1, phismuNLML9-1, and phismu24-1 all shared similarities with the previously reported S. mutans phages M102, M102AD, and ϕAPCM01. The genomes were organized into seven major gene clusters according to the putative functions of the predicted open reading frames: packaging and structural modules, integrase, host lysis modules, DNA replication/recombination modules, transcriptional regulatory modules, other protein modules, and hypothetical protein modules. Moreover, an integrase gene was only identified in phismuNLML9-1 prophages. PMID:29158986
Effect of wheat flour characteristics on sponge cake quality.

PubMed

Moiraghi, Malena; de la Hera, Esther; Pérez, Gabriela T; Gómez, Manuel

2013-02-01

To select the flour parameters that relate strongly to cake-making performance, in this study the relationship between sponge cake quality, solvent retention capacity (SRC) profile and flour physicochemical characteristics was investigated using 38 soft wheat samples of different origins. Particle size average, protein, damaged starch, water-soluble pentosans, total pentosans, SRC and pasting properties were analysed. Sponge cake volume and crumb texture were measured to evaluate cake quality. Cluster analysis was applied to assess differences in flour quality parameters among wheat lines based on the SRC profile. Cluster 1 showed significantly higher sponge cake volume and crumb softness, finer particle size and lower SRC sucrose, SRC carbonate, SRC water, damaged starch and protein content. Particle size, damaged starch, protein, thickening capacity and SRC parameters correlated negatively with sponge cake volume, while total pentosans and pasting temperature showed the opposite effect. The negative correlations between cake volume and SRC parameters along with the cluster analysis results indicated that flours with smaller particle size, lower absorption capacity and higher pasting temperature had better cake-making performance. Some simple analyses, such as SRC, particle size distribution and pasting properties, may help to choose flours suitable for cake making. Copyright © 2012 Society of Chemical Industry.
Identifying technical aliases in SELDI mass spectra of complex mixtures of proteins

PubMed Central

2013-01-01

Background Biomarker discovery datasets created using mass spectrum protein profiling of complex mixtures of proteins contain many peaks that represent the same protein with different charge states. Correlated variables such as these can confound the statistical analyses of proteomic data. Previously we developed an algorithm that clustered mass spectrum peaks that were biologically or technically correlated. Here we demonstrate an algorithm that clusters correlated technical aliases only. Results In this paper, we propose a preprocessing algorithm that can be used for grouping technical aliases in mass spectrometry protein profiling data. The stringency of the variance allowed for clustering is customizable, thereby affecting the number of peaks that are clustered. Subsequent analysis of the clusters, instead of individual peaks, helps reduce difficulties associated with technically-correlated data, and can aid more efficient biomarker identification. Conclusions This software can be used to pre-process and thereby decrease the complexity of protein profiling proteomics data, thus simplifying the subsequent analysis of biomarkers by decreasing the number of tests. The software is also a practical tool for identifying which features to investigate further by purification, identification and confirmation. PMID:24010718
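To make the idea of peaks that "represent the same protein with different charge states" concrete, here is a small sketch that groups peaks whose m/z values are consistent with one neutral mass at charges 1+, 2+, and 3+. This is not the authors' algorithm, which the abstract describes only in terms of correlation and a customizable variance stringency. The helper names, the tolerance, the peak list, and the assumption that the largest remaining peak is the singly charged ion are all hypothetical simplifications.

```python
PROTON_MASS = 1.007276  # mass of a proton in daltons


def expected_mz(neutral_mass, charge):
    # m/z of a protonated ion: (M + z * proton mass) / z
    return (neutral_mass + charge * PROTON_MASS) / charge


def group_charge_aliases(peaks, max_charge=3, tol=0.5):
    """Group m/z peaks that could be charge-state aliases of one protein.

    Simplifying assumption: the largest remaining peak is treated as the
    singly charged (1+) ion, and lower peaks are matched against the m/z
    expected for charges 2..max_charge within +/- tol.
    """
    remaining = sorted(peaks, reverse=True)
    groups = []
    while remaining:
        base = remaining.pop(0)
        neutral_mass = base - PROTON_MASS  # invert the 1+ relation
        group = [base]
        for charge in range(2, max_charge + 1):
            target = expected_mz(neutral_mass, charge)
            matches = [p for p in remaining if abs(p - target) <= tol]
            for p in matches:
                group.append(p)
                remaining.remove(p)
        groups.append(group)
    return groups


# Hypothetical peak list: three aliases of one ~12 kDa protein plus one unrelated peak.
peaks = [12001.0, 6001.0, 4001.0, 8500.0]
groups = group_charge_aliases(peaks)
print(groups)
```

Analysing one representative per group, rather than every peak, is one way to reduce the number of correlated variables, which is the benefit the abstract attributes to clustering technical aliases.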
Whole-Genome and Epigenomic Landscapes of Etiologically Distinct Subtypes of Cholangiocarcinoma

PubMed Central

Jusakul, Apinya; Cutcutache, Ioana; Yong, Chern Han; Lim, Jing Quan; Huang, Mi Ni; Padmanabhan, Nisha; Nellore, Vishwa; Kongpetch, Sarinya; Ng, Alvin Wei Tian; Ng, Ley Moy; Choo, Su Pin; Myint, Swe Swe; Thanan, Raynoo; Nagarajan, Sanjanaa; Lim, Weng Khong; Ng, Cedric Chuan Young; Boot, Arnoud; Liu, Mo; Ong, Choon Kiat; Rajasegaran, Vikneswari; Lie, Stefanus; Lim, Alvin Soon Tiong; Lim, Tse Hui; Tan, Jing; Loh, Jia Liang; McPherson, John R.; Khuntikeo, Narong; Bhudhisawasdi, Vajaraphongsa; Yongvanit, Puangrat; Wongkham, Sopit; Totoki, Yasushi; Nakamura, Hiromi; Arai, Yasuhito; Yamasaki, Satoshi; Chow, Pierce Kah-Hoe; Chung, Alexander Yaw Fui; Ooi, London Lucien Peng Jin; Lim, Kiat Hon; Dima, Simona; Duda, Dan G.; Popescu, Irinel; Broet, Philippe; Hsieh, Sen-Yung; Yu, Ming-Chin; Scarpa, Aldo; Lai, Jiaming; Luo, Di-Xian; Carvalho, André Lopes; Vettore, André Luiz; Rhee, Hyungjin; Park, Young Nyun; Alexandrov, Ludmil B.; Gordân, Raluca; Rozen, Steven G.; Shibata, Tatsuhiro; Pairojkul, Chawalit; Teh, Bin Tean; Tan, Patrick

2017-01-01

Cholangiocarcinoma (CCA) is a hepatobiliary malignancy exhibiting high incidence in countries with endemic liver-fluke infection. We analysed 489 CCAs from 10 countries, combining whole-genome (71 cases), targeted/exome, copy-number, gene expression, and DNA methylation information. Integrative clustering defined four CCA clusters – Fluke-Positive CCAs (Clusters 1/2) are enriched in ERBB2 amplifications and TP53 mutations, conversely Fluke-Negative CCAs (Clusters 3/4) exhibit high copy-number alterations and PD-1/PD-L2 expression, or epigenetic mutations (IDH1/2, BAP1) and FGFR/PRKA-related gene rearrangements. Whole-genome analysis highlighted FGFR2 3′UTR deletion as a mechanism of FGFR2 upregulation. Integration of non-coding promoter mutations with protein-DNA binding profiles demonstrates pervasive modulation of H3K27me3-associated sites in CCA. Clusters 1 and 4 exhibit distinct DNA hypermethylation patterns targeting either CpG islands or shores – mutation signature and subclonality analysis suggests that these reflect different mutational pathways. Our results exemplify how genetics, epigenetics and environmental carcinogens can interplay across different geographies to generate distinct molecular subtypes of cancer. PMID:28667006
DOE Office of Scientific and Technical Information (OSTI.GOV)

Jusakul, Apinya; Cutcutache, Ioana; Yong, Chern Han

Cholangiocarcinoma (CCA) is a hepatobiliary malignancy exhibiting high incidence in countries with endemic liver-fluke infection. We analysed 489 CCAs from 10 countries, combining whole-genome (71 cases), targeted/exome, copy-number, gene expression, and DNA methylation information. Integrative clustering defined four CCA clusters - Fluke-Positive CCAs (Clusters 1/2) are enriched in ERBB2 amplifications and TP53 mutations, conversely Fluke-Negative CCAs (Clusters 3/4) exhibit high copy-number alterations and PD-1/PD-L2 expression, or epigenetic mutations (IDH1/2, BAP1) and FGFR/PRKA-related gene rearrangements. Whole-genome analysis highlighted FGFR2 3’UTR deletion as a mechanism of FGFR2 upregulation. Integration of non-coding promoter mutations with protein-DNA binding profiles demonstrates pervasive modulation of H3K27me3-associated sites in CCA. Clusters 1 and 4 exhibit distinct DNA hypermethylation patterns targeting either CpG islands or shores - mutation signature and subclonality analysis suggests that these reflect different mutational pathways. Lastly, our results exemplify how genetics, epigenetics and environmental carcinogens can interplay across different geographies to generate distinct molecular subtypes of cancer.
Patterns of gender equality at workplaces and psychological distress.

PubMed

Elwér, Sofia; Harryson, Lisa; Bolin, Malin; Hammarström, Anne

2013-01-01

Research in the field of occupational health often uses a risk factor approach which has been criticized by feminist researchers for not considering the combination of many different variables that are at play simultaneously. To overcome this shortcoming this study aims to identify patterns of gender equality at workplaces and to investigate how these patterns are associated with psychological distress. Questionnaire data from the Northern Swedish Cohort (n = 715) have been analysed and supplemented with register data about the participants' workplaces. The register data were used to create gender equality indicators of women/men ratios of number of employees, educational level, salary and parental leave. Cluster analysis was used to identify patterns of gender equality at the workplaces. Differences in psychological distress between the clusters were analysed by chi-square test and logistic regression analyses, adjusting for individual socio-demographics and previous psychological distress. The cluster analysis resulted in six distinctive clusters with different patterns of gender equality at the workplaces that were associated with psychological distress for women but not for men. For women, the highest odds of psychological distress were found at traditionally gender-unequal workplaces. The lowest overall occurrence of psychological distress, as well as the same occurrence for women and men, was found at the most gender-equal workplaces. The results from this study support the convergence hypothesis as gender equality at the workplace does not only relate to better mental health for women, but also more similar occurrence of mental ill-health between women and men. This study highlights the importance of utilizing a multidimensional view of gender equality to understand its association with health outcomes. Health policies need to consider gender equality at the workplace level as a social determinant of health that is of importance for reducing differences in health outcomes for women and men.
Testing feedback message framing and comparators to address prescribing of high-risk medications in nursing homes: protocol for a pragmatic, factorial, cluster-randomized trial.

PubMed

Ivers, Noah M; Desveaux, Laura; Presseau, Justin; Reis, Catherine; Witteman, Holly O; Taljaard, Monica K; McCleary, Nicola; Thavorn, Kednapa; Grimshaw, Jeremy M

2017-07-14

Audit and feedback (AF) interventions that leverage routine administrative data offer a scalable and relatively low-cost method to improve processes of care. AF interventions are usually designed to highlight discrepancies between desired and actual performance and to encourage recipients to act to address such discrepancies. Comparing to a regional average is a common approach, but more recipients would have a discrepancy if compared to a higher-than-average level of performance. In addition, how recipients perceive and respond to discrepancies may depend on how the feedback itself is framed. We aim to evaluate the effectiveness of different comparators and framing in feedback on high-risk prescribing in nursing homes. This is a pragmatic, 2 × 2 factorial, cluster-randomized controlled trial testing variations in the comparator and framing on the effectiveness of quarterly AF in changing high-risk prescribing in nursing homes in Ontario, Canada. We grouped homes that share physicians into clusters and randomized these clusters into the four experimental conditions. Outcomes will be assessed after 6 months; all primary analyses will be by intention-to-treat. The primary outcome (monthly number of high-risk medications received by each patient) will be analysed using a general linear mixed effects regression model. We will present both four-arm and factorial analyses. With 160 clusters and an average of 350 beds per cluster, assuming no interaction and similar effects for each intervention, we anticipate 90% power to detect an absolute mean difference of 0.3 high-risk medications prescribed. A mixed-methods process evaluation will explore potential mechanisms underlying the observed effects, exploring targeted constructs including intention, self-efficacy, outcome expectations, descriptive norms, and goal prioritization. An economic analysis will examine cost-effectiveness from the perspective of the publicly funded health care system. This protocol describes the rationale and methodology of a trial testing manipulations of theory-informed components of an audit and feedback intervention to determine how to improve an existing intervention and provide generalizable insights for implementation science. NCT02979964.
Cluster analysis of particulate matter (PM10) and black carbon (BC) concentrations

NASA Astrophysics Data System (ADS)

Žibert, Janez; Pražnikar, Jure

2012-09-01

The monitoring of air-pollution constituents like particulate matter (PM10) and black carbon (BC) can provide information about air quality and the dynamics of emissions. Air quality depends on natural and anthropogenic sources of emissions as well as the weather conditions. For a one-year period the diurnal concentrations of PM10 and BC in the Port of Koper were analysed by clustering days into similar groups according to the similarity of the BC and PM10 hourly derived day-profiles without any prior assumptions about working and non-working days, weather conditions or hot and cold seasons. The analysis was performed by using k-means clustering with the squared Euclidean distance as the similarity measure. The analysis showed that 10 clusters in the BC case produced 3 clusters with just one member day and 7 clusters that encompasses more than one day with similar BC profiles. Similar results were found in the PM10 case, where one cluster has a single-member day, while 7 clusters contain several member days. The clustering analysis revealed that the clusters with less pronounced bimodal patterns and low hourly and average daily concentrations for both types of measurements include the most days in the one-year analysis. A typical day profile of the BC measurements includes a bimodal pattern with morning and evening peaks, while the PM10 measurements reveal a less pronounced bimodality. There are also clusters with single-peak day-profiles. The BC data in such cases exhibit morning peaks, while the PM10 data consist of noon or afternoon single peaks. Single pronounced peaks can be explained by appropriate cluster wind speed profiles. The analysis also revealed some special day-profiles. The BC cluster with a high midnight peak at 30/04/2010 and the PM10 cluster with the highest observed concentration of PM10 at 01/05/2010 (208.0 μg m-3) coincide with 1 May, which is a national holiday in Slovenia and has very strong tradition of bonfire parties. 
The clustering of the diurnal concentrations showed that various different day-profiles are present in the cold period, while this is not the case for the hot season. Additional analysis of ship traffic and rainfall data showed that there is no statistically significant difference between the ship gross (bruto) registered tonnage (BRT) values in the case of BC and PM10 clusters, but that there are statistically significant differences between the rainfall in the BC and PM10 clusters. The wind-rose for clusters which included most days in the sampling period indicated that emitted PM10 and BC from the Port of Koper were mainly transported in the west direction over the sea and in the east direction, where there is no populated area. The presented analysis showed that both BC and PM10 concentrations were driven by rain intensity and wind speed.
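The record above describes grouping days by their hourly concentration profiles with k-means and the squared Euclidean distance. The following is a minimal sketch of that general technique, not the authors' implementation: the function names, the random initialisation, the iteration cap and the synthetic 24-hour profiles are all assumptions made for illustration.

```python
import random


def sq_euclidean(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(profiles, k, iterations=50, seed=0):
    """Plain k-means on day-profiles (hypothetical helper, illustration only)."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct days chosen at random.
    centroids = [list(p) for p in rng.sample(profiles, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each day joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_euclidean(p, centroids[c]))
            for p in profiles
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the hourly mean of its member days.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def make_day(base, peak):
    """Synthetic 24-hour profile with peaks at 08:00 and 19:00 (made-up values)."""
    day = [base] * 24
    day[8] += peak
    day[19] += peak
    return day


# Two flat low-concentration days and two bimodal days.
days = [make_day(1.0, 0.0), make_day(1.2, 0.0), make_day(1.0, 10.0), make_day(1.0, 11.0)]
labels, centroids = kmeans(days, k=2)
print(labels)
```

With real data each row would hold the 24 hourly means of one day, and the number of clusters would have to be chosen by the analyst, as the abstract does with 10 clusters.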

Spatial cluster analysis of human cases of Crimean Congo hemorrhagic fever reported in Pakistan.

PubMed

Abbas, Tariq; Younus, Muhammad; Muhammad, Sayyad Aun

2015-01-01

Crimean Congo hemorrhagic fever (CCHF) is a tick-borne viral zoonotic disease that has been reported in almost all geographic regions in Pakistan. The aim of this study was to identify spatial clusters of human cases of CCHF reported in the country. Kulldorff's spatial scan statistic, Anselin's Local Moran's I and Getis Ord Gi* tests were applied to the data (i.e. the number of laboratory confirmed cases reported from each district during 2013). The analyses revealed a large multi-district cluster of high CCHF incidence in the uplands of Balochistan province near its border with Afghanistan. The cluster comprised the following districts: Qilla Abdullah, Qilla Saifullah, Loralai, Quetta, Sibi, Chagai, and Mastung. Another cluster was detected in Punjab and included Rawalpindi district and a part of Islamabad. We provide empirical evidence of spatial clustering of human CCHF cases in the country. The districts in the clusters should be given priority in surveillance, control programs, and further research.
Epidemiological analysis of Salmonella clusters identified by whole genome sequencing, England and Wales 2014.

PubMed

Waldram, Alison; Dolan, Gayle; Ashton, Philip M; Jenkins, Claire; Dallman, Timothy J

2018-05-01

The unprecedented level of bacterial strain discrimination provided by whole genome sequencing (WGS) presents new challenges with respect to the utility and interpretation of the data. Whole genome sequences from 1445 isolates of Salmonella belonging to the most commonly identified serotypes in England and Wales isolated between April and August 2014 were analysed. Single linkage single nucleotide polymorphism thresholds at the 10, 5 and 0 level were explored for evidence of epidemiological links between clustered cases. Analysis of the WGS data organised 566 of the 1445 isolates into 32 clusters of five or more. A statistically significant epidemiological link was identified for 17 clusters. The clusters were associated with foreign travel (n = 8), consumption of Chinese takeaways (n = 4), chicken eaten at home (n = 2), and one each of the following; eating out, contact with another case in the home and contact with reptiles. In the same time frame, one cluster was detected using traditional outbreak detection methods. WGS can be used for the highly specific and highly sensitive detection of biologically related isolates when epidemiological links are obscured. Improvements in the collection of detailed, standardised exposure information would enhance cluster investigations. Copyright © 2017 Elsevier Ltd. All rights reserved.
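Single linkage clustering at a SNP threshold, as mentioned in the record above, places two isolates in the same cluster whenever a chain of pairwise SNP distances at or below the threshold connects them. The sketch below illustrates that idea with a union-find structure; the isolate names and distances are invented, and the study's actual pipeline is not described in the abstract.

```python
def single_linkage_clusters(names, snp_distance, threshold):
    """Group isolates linked by chains of pairs within `threshold` SNPs.

    `snp_distance` is a dict keyed by frozenset({a, b}); hypothetical input format.
    """
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if snp_distance[frozenset((a, b))] <= threshold:
                parent[find(a)] = find(b)  # merge the two clusters

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Invented pairwise SNP distances between four isolates.
isolates = ["iso_A", "iso_B", "iso_C", "iso_D"]
dist = {
    frozenset(("iso_A", "iso_B")): 3,
    frozenset(("iso_B", "iso_C")): 4,
    frozenset(("iso_A", "iso_C")): 7,
    frozenset(("iso_A", "iso_D")): 40,
    frozenset(("iso_B", "iso_D")): 42,
    frozenset(("iso_C", "iso_D")): 45,
}
clusters_5 = single_linkage_clusters(isolates, dist, threshold=5)
clusters_0 = single_linkage_clusters(isolates, dist, threshold=0)
print(clusters_5, clusters_0)
```

Note that iso_A and iso_C end up together at the 5-SNP threshold even though they differ by 7 SNPs, because iso_B links them; this chaining is the defining property of single linkage.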
Quantitative and qualitative analysis of semantic verbal fluency in patients with temporal lobe epilepsy.

PubMed

Jaimes-Bautista, A G; Rodríguez-Camacho, M; Martínez-Juárez, I E; Rodríguez-Agudelo, Y

2017-08-29

Patients with temporal lobe epilepsy (TLE) perform poorly on semantic verbal fluency (SVF) tasks. Completing these tasks successfully involves multiple cognitive processes simultaneously. Therefore, quantitative analysis of SVF (number of correct words in one minute), conducted in most studies, has been found to be insufficient to identify cognitive dysfunction underlying SVF difficulties in TLE. We aimed to determine whether a sample of patients with TLE had SVF difficulties compared with a control group (CG), and to identify the cognitive components associated with SVF difficulties using quantitative and qualitative analysis. SVF was evaluated in 25 patients with TLE and 24 healthy controls; the semantic verbal fluency test included 5 semantic categories: animals, fruits, occupations, countries, and verbs. All 5 categories were analysed quantitatively (number of correct words per minute and interval of execution: 0-15, 16-30, 31-45, and 46-60 seconds); the categories animals and fruits were also analysed qualitatively (clusters, cluster size, switches, perseverations, and intrusions). Patients generated fewer words for all categories and intervals and fewer clusters and switches for animals and fruits than the CG (P<.01). Differences between groups were not significant in terms of cluster size and number of intrusions and perseverations (P>.05). Our results suggest an association between SVF difficulties in TLE and difficulty activating semantic networks, impaired strategic search, and poor cognitive flexibility. Attention, inhibition, and working memory are preserved in these patients. Copyright © 2017 Sociedad Española de Neurología. Publicado por Elsevier España, S.L.U. All rights reserved.
Comparing population structure as inferred from genealogical versus genetic information.

PubMed

Colonna, Vincenza; Nutile, Teresa; Ferrucci, Ronald R; Fardella, Giulio; Aversano, Mario; Barbujani, Guido; Ciullo, Marina

2009-12-01

Algorithms for inferring population structure from genetic data (ie, population assignment methods) have been shown to effectively recognize genetic clusters in human populations. However, their performance in identifying groups of genealogically related individuals, especially in scanty-differentiated populations, has not been tested empirically thus far. For this study, we had access to both genealogical and genetic data from two closely related, isolated villages in southern Italy. We found that nearly all living individuals were included in a single pedigree, with multiple inbreeding loops. Despite F(st) between villages being a low 0.008, genetic clustering analysis identified two clusters roughly corresponding to the two villages. Average kinship between individuals (estimated from genealogies) increased at increasing values of group membership (estimated from the genetic data), showing that the observed genetic clusters represent individuals who are more closely related to each other than to random members of the population. Further, average kinship within clusters and F(st) between clusters increase with increasingly stringent membership threshold requirements. We conclude that a limited number of genetic markers is sufficient to detect structuring, and that the results of genetic analyses faithfully mirror the structuring inferred from detailed analyses of population genealogies, even when F(st) values are low, as in the case of the two villages. We then estimate the impact of observed levels of population structure on association studies using simulated data.
Comparing population structure as inferred from genealogical versus genetic information

PubMed Central

Colonna, Vincenza; Nutile, Teresa; Ferrucci, Ronald R; Fardella, Giulio; Aversano, Mario; Barbujani, Guido; Ciullo, Marina

2009-01-01

Algorithms for inferring population structure from genetic data (ie, population assignment methods) have been shown to effectively recognize genetic clusters in human populations. However, their performance in identifying groups of genealogically related individuals, especially in scanty-differentiated populations, has not been tested empirically thus far. For this study, we had access to both genealogical and genetic data from two closely related, isolated villages in southern Italy. We found that nearly all living individuals were included in a single pedigree, with multiple inbreeding loops. Despite Fst between villages being a low 0.008, genetic clustering analysis identified two clusters roughly corresponding to the two villages. Average kinship between individuals (estimated from genealogies) increased at increasing values of group membership (estimated from the genetic data), showing that the observed genetic clusters represent individuals who are more closely related to each other than to random members of the population. Further, average kinship within clusters and Fst between clusters increase with increasingly stringent membership threshold requirements. We conclude that a limited number of genetic markers is sufficient to detect structuring, and that the results of genetic analyses faithfully mirror the structuring inferred from detailed analyses of population genealogies, even when Fst values are low, as in the case of the two villages. We then estimate the impact of observed levels of population structure on association studies using simulated data. PMID:19550436
Validation of hierarchical cluster analysis for identification of bacterial species using 42 bacterial isolates

NASA Astrophysics Data System (ADS)

Ghebremedhin, Meron; Yesupriya, Shubha; Luka, Janos; Crane, Nicole J.

2015-03-01

Recent studies have demonstrated the potential advantages of the use of Raman spectroscopy in the biomedical field due to its rapidity and noninvasive nature. In this study, Raman spectroscopy is applied as a method for differentiating between bacteria isolates for Gram status and Genus species. We created models for identifying 28 bacterial isolates using spectra collected with a 785 nm laser excitation Raman spectroscopic system. In order to investigate the groupings of these samples, partial least squares discriminant analysis (PLSDA) and hierarchical cluster analysis (HCA) were implemented. In addition, cluster analyses of the isolates were performed using various data types consisting of biochemical tests, gene sequence alignment, high resolution melt (HRM) analysis and antimicrobial susceptibility tests of minimum inhibitory concentration (MIC) and degree of antimicrobial resistance (SIR). In order to evaluate the ability of these models to correctly classify bacterial isolates using solely Raman spectroscopic data, a set of 14 validation samples was tested using the PLSDA models and consequently the HCA models. External cluster evaluation criteria of purity and Rand index were calculated at different taxonomic levels to compare the performance of clustering using Raman spectra as well as the other datasets. Results showed that Raman spectra performed comparably, and in some cases better than, the other data types with Rand index and purity values up to 0.933 and 0.947, respectively. This study clearly demonstrates that the discrimination of bacterial species using Raman spectroscopic data and hierarchical cluster analysis is possible and has the potential to be a powerful point-of-care tool in clinical settings.
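The record above evaluates clusterings with purity and the Rand index. As a hedged illustration of how these two external criteria are commonly defined (the abstract does not give formulas), purity is the fraction of samples that belong to the majority reference class of their cluster, and the Rand index is the fraction of sample pairs on which the clustering and the reference labels agree about "same group" versus "different group". The labels below are invented.

```python
from collections import Counter
from itertools import combinations


def purity(true_labels, cluster_labels):
    """Fraction of samples that fall in the majority true class of their cluster."""
    per_cluster = {}
    for truth, cluster in zip(true_labels, cluster_labels):
        per_cluster.setdefault(cluster, Counter())[truth] += 1
    majority_total = sum(max(counts.values()) for counts in per_cluster.values())
    return majority_total / len(true_labels)


def rand_index(true_labels, cluster_labels):
    """Fraction of sample pairs on which the two labelings agree."""
    pairs = list(combinations(range(len(true_labels)), 2))
    agreements = sum(
        (true_labels[i] == true_labels[j]) == (cluster_labels[i] == cluster_labels[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Invented example: six isolates from two genera, split into two clusters.
genus = ["A", "A", "A", "B", "B", "B"]
cluster = [0, 0, 1, 1, 1, 1]
print(purity(genus, cluster), rand_index(genus, cluster))
```

In this toy case one genus-A isolate is placed in the wrong cluster, so 5 of 6 samples match their cluster's majority class and 10 of the 15 pairs agree.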
5S ribosomal ribonucleic acid sequences in Bacteroides and Fusobacterium: evolutionary relationships within these genera and among eubacteria in general

NASA Technical Reports Server (NTRS)

Van den Eynde, H.; De Baere, R.; Shah, H. N.; Gharbia, S. E.; Fox, G. E.; Michalik, J.; Van de Peer, Y.; De Wachter, R.

1989-01-01

The 5S ribosomal ribonucleic acid (rRNA) sequences were determined for Bacteroides fragilis, Bacteroides thetaiotaomicron, Bacteroides capillosus, Bacteroides veroralis, Porphyromonas gingivalis, Anaerorhabdus furcosus, Fusobacterium nucleatum, Fusobacterium mortiferum, and Fusobacterium varium. A dendrogram constructed by a clustering algorithm from these sequences, which were aligned with all other hitherto known eubacterial 5S rRNA sequences, showed differences as well as similarities with respect to results derived from 16S rRNA analyses. In the 5S rRNA dendrogram, Bacteroides clustered together with Cytophaga and Fusobacterium, as in 16S rRNA analyses. Intraphylum relationships deduced from 5S rRNAs suggested that Bacteroides is specifically related to Cytophaga rather than to Fusobacterium, as was suggested by 16S rRNA analyses. Previous taxonomic considerations concerning the genus Bacteroides, based on biochemical and physiological data, were confirmed by the 5S rRNA sequence analysis.
Dynamic secondary ion mass spectroscopy of Au nanoparticles on Si wafer using Bi3+ as primary ion coupled with surface etching by Ar cluster ion beam: The effect of etching conditions on surface structure

NASA Astrophysics Data System (ADS)

Park, Eun Ji; Choi, Chang Min; Kim, Il Hee; Kim, Jung-Hwan; Lee, Gaehang; Jin, Jong Sung; Ganteför, Gerd; Kim, Young Dok; Choi, Myoung Choul

2018-01-01

Wet-chemically synthesized Au nanoparticles were deposited on Si wafer surfaces, and the secondary ions mass spectra (SIMS) from these samples were collected using Bi3+ with an energy of 30 keV as the primary ions. In the SIMS, Au cluster cations with a well-known, even-odd alteration pattern in the signal intensity were observed. We also performed depth profile SIMS analyses, i.e., etching the surface using an Ar gas cluster ion beam (GCIB), and a subsequent Bi3+ SIMS analysis was repetitively performed. Here, two different etching conditions (Ar1600 clusters of 10 keV energy or Ar1000 of 2.5 keV denoted as "harsh" or "soft" etching conditions, respectively) were used. Etching under harsh conditions induced emission of the Au-Si binary cluster cations in the SIMS spectra of the Bi3+ primary ions. The formation of binary cluster cations can be induced by either fragmentation of Au nanoparticles or alloying of Au and Si, increasing Au-Si coordination on the sample surface during harsh GCIB etching. Alternatively, use of the soft GCIB etching conditions resulted in exclusive emission of pure Au cluster cations with nearly no Au-Si cluster cation formation. Depth profile analyses of the Bi3+ SIMS combined with soft GCIB etching can be useful for studying the chemical environments of atoms at the surface without altering the original interface structure during etching.
Trajectories of acute low back pain: a latent class growth analysis.

PubMed

Downie, Aron S; Hancock, Mark J; Rzewuska, Magdalena; Williams, Christopher M; Lin, Chung-Wei Christine; Maher, Christopher G

2016-01-01

Characterising the clinical course of back pain by mean pain scores over time may not adequately reflect the complexity of the clinical course of acute low back pain. We analysed pain scores over 12 weeks for 1585 patients with acute low back pain presenting to primary care to identify distinct pain trajectory groups and baseline patient characteristics associated with membership of each cluster. This was a secondary analysis of the PACE trial that evaluated paracetamol for acute low back pain. Latent class growth analysis determined a 5 cluster model, which comprised 567 (35.8%) patients who recovered by week 2 (cluster 1, rapid pain recovery); 543 (34.3%) patients who recovered by week 12 (cluster 2, pain recovery by week 12); 222 (14.0%) patients whose pain reduced but did not recover (cluster 3, incomplete pain recovery); 167 (10.5%) patients whose pain initially decreased but then increased by week 12 (cluster 4, fluctuating pain); and 86 (5.4%) patients who experienced high-level pain for the whole 12 weeks (cluster 5, persistent high pain). Patients with longer pain duration were more likely to experience delayed recovery or nonrecovery. Belief in greater risk of persistence was associated with nonrecovery, but not delayed recovery. Higher pain intensity, longer duration, and workers' compensation were associated with persistent high pain, whereas older age and increased number of episodes were associated with fluctuating pain. Identification of discrete pain trajectory groups offers the potential to better manage acute low back pain.
Fast gene ontology based clustering for microarray experiments.

PubMed

Ovaska, Kristian; Laakso, Marko; Hautaniemi, Sampsa

2008-11-21

Analysis of a microarray experiment often results in a list of hundreds of disease-associated genes. In order to suggest common biological processes and functions for these genes, Gene Ontology annotations with statistical testing are widely used. However, these analyses can produce a very large number of significantly altered biological processes. Thus, it is often challenging to interpret GO results and identify novel testable biological hypotheses. We present fast software for advanced gene annotation using semantic similarity for Gene Ontology terms combined with clustering and heat map visualisation. The methodology allows rapid identification of genes sharing the same Gene Ontology cluster. Our R based semantic similarity open-source package has a speed advantage of over 2000-fold compared to existing implementations. From the resulting hierarchical clustering dendrogram genes sharing a GO term can be identified, and their differences in the gene expression patterns can be seen from the heat map. These methods facilitate advanced annotation of genes resulting from data analysis.
Genetic structure of Plasmodium falciparum populations across the Honduras-Nicaragua border

PubMed Central

2013-01-01

Background: The Caribbean coast of Central America remains an area of malaria transmission caused by Plasmodium falciparum despite the fact that morbidity has been reduced in recent years. Parasite populations in that region show interesting characteristics such as chloroquine susceptibility and low mortality rates. Genetic structure and diversity of P. falciparum populations in the Honduras-Nicaragua border were analysed in this study. Methods: Seven neutral microsatellite loci were analysed in 110 P. falciparum isolates from endemic areas of Honduras (n = 77) and Nicaragua (n = 33), mostly from the border region called the Moskitia. Several analyses concerning the genetic diversity, linkage disequilibrium, population structure, molecular variance, and haplotype clustering were conducted. Results: There was a low level of genetic diversity in P. falciparum populations from Honduras and Nicaragua. Expected heterozygosity (He) results were similarly low for both populations. A moderate differentiation was revealed by the FST index between both populations, and two putative clusters were defined through a structure analysis. The main cluster grouped most of the samples from Honduras and Nicaragua, while the second cluster was smaller and included all the samples from the Siuna community in Nicaragua. This result could partially explain the stronger linkage disequilibrium (LD) in the parasite population from that country. These findings are congruent with the decreasing rates of malaria endemicity in Central America. PMID:24093629
Sonora exploratory study for the detection of wheat-leaf rust

NASA Technical Reports Server (NTRS)

Payne, R. W. (Principal Investigator)

1980-01-01

The applicability of LANDSAT remote sensing technology to the detection of a wheat-leaf-rust epidemic in Sonora, Mexico, during 1977 was investigated. LANDSAT data acquired during crop years 1975-76 and 1976-77 were clustered, classified, and analyzed in order to detect agricultural changes. Analysis of 1977 data indicates a significant proportion of the identified wheat is stressed (potentially rust-infected). Additional analyses show a significant increase in fallowing during the year, as well as a substantial decrease in reservoir levels in the Sonora agricultural region. Ground observations are required to substantiate these analyses. The possibility exists that wheat-leaf rust is not LANDSAT detectable and that the clusters identified as containing stressed signatures represent different varieties of wheat or perhaps nonwheat crops.
Advanced multivariate analysis to assess remediation of hydrocarbons in soils.

PubMed

Lin, Deborah S; Taylor, Peter; Tibbett, Mark

2014-10-01

Accurate monitoring of degradation levels in soils is essential in order to understand and achieve complete degradation of petroleum hydrocarbons in contaminated soils. We aimed to develop the use of multivariate methods for the monitoring of biodegradation of diesel in soils and to determine if diesel contaminated soils could be remediated to a chemical composition similar to that of an uncontaminated soil. An incubation experiment was set up with three contrasting soil types. Each soil was exposed to diesel at varying stages of degradation and then analysed for key hydrocarbons throughout 161 days of incubation. Hydrocarbon distributions were analysed by Principal Coordinate Analysis and similar samples grouped by cluster analysis. Variation and differences between samples were determined using permutational multivariate analysis of variance. It was found that all soils followed trajectories approaching the chemical composition of the unpolluted soil. Some contaminated soils were no longer significantly different to that of uncontaminated soil after 161 days of incubation. The use of cluster analysis allows the assignment of a percentage chemical similarity of a diesel contaminated soil to an uncontaminated soil sample. This will aid in the monitoring of hydrocarbon contaminated sites and the establishment of potential endpoints for successful remediation.
Machine-learned cluster identification in high-dimensional data.

PubMed

Ultsch, Alfred; Lötsch, Jörn

2017-02-01

High-dimensional biomedical data are frequently clustered to identify subgroup structures pointing at distinct disease subtypes. It is crucial that the used cluster algorithm works correctly. However, by imposing a predefined shape on the clusters, classical algorithms occasionally suggest a cluster structure in homogenously distributed data or assign data points to incorrect clusters. We analyzed whether this can be avoided by using emergent self-organizing feature maps (ESOM). Data sets with different degrees of complexity were submitted to ESOM analysis with large numbers of neurons, using an interactive R-based bioinformatics tool. On top of the trained ESOM the distance structure in the high dimensional feature space was visualized in the form of a so-called U-matrix. Clustering results were compared with those provided by classical common cluster algorithms including single linkage, Ward and k-means. Ward clustering imposed cluster structures on cluster-less "golf ball", "cuboid" and "S-shaped" data sets that contained no structure at all (random data). Ward clustering also imposed structures on permuted real world data sets. By contrast, the ESOM/U-matrix approach correctly found that these data contain no cluster structure. However, ESOM/U-matrix was correct in identifying clusters in biomedical data truly containing subgroups. It was always correct in cluster structure identification in further canonical artificial data. Using intentionally simple data sets, it is shown that popular clustering algorithms typically used for biomedical data sets may fail to cluster data correctly, suggesting that they are also likely to perform erroneously on high dimensional biomedical data. The present analyses emphasized that generally established classical hierarchical clustering algorithms carry a considerable tendency to produce erroneous results. 
By contrast, unsupervised machine-learned analysis of cluster structures, applied using the ESOM/U-matrix method, is a viable, unbiased method to identify true clusters in the high-dimensional space of complex data. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
Preliminary Comparisons of the Information Content and Utility of TM Versus MSS Data

NASA Technical Reports Server (NTRS)

Markham, B. L.

1984-01-01

Comparisons were made between subscenes from the first TM scene acquired of the Washington, D.C. area and a MSS scene acquired approximately one year earlier. Three types of analyses were conducted to compare TM and MSS data: a water body analysis, a principal components analysis and a spectral clustering analysis. The water body analysis compared the capability of the TM to the MSS for detecting small uniform targets. Of the 59 ponds located on aerial photographs 34 (58%) were detected by the TM with six commission errors (15%) and 13 (22%) were detected by the MSS with three commission errors (19%). The smallest water body detected by the TM was 16 meters; the smallest detected by the MSS was 40 meters. For the principal components analysis, means and covariance matrices were calculated for each subscene, and principal components images generated and characterized. In the spectral clustering comparison each scene was independently clustered and the clusters were assigned to informational classes. The preliminary comparison indicated that TM data provides enhancements over MSS in terms of (1) small target detection and (2) data dimensionality (even with 4-band data). The extra dimension, partially resultant from TM band 1, appears useful for built-up/non-built-up area separation.
Evolution of coding and non-coding genes in HOX clusters of a marsupial.

PubMed

Yu, Hongshi; Lindsay, James; Feng, Zhi-Ping; Frankenberg, Stephen; Hu, Yanqiu; Carone, Dawn; Shaw, Geoff; Pask, Andrew J; O'Neill, Rachel; Papenfuss, Anthony T; Renfree, Marilyn B

2012-06-18

The HOX gene clusters are thought to be highly conserved amongst mammals and other vertebrates, but the long non-coding RNAs have only been studied in detail in human and mouse. The sequencing of the kangaroo genome provides an opportunity to use comparative analyses to compare the HOX clusters of a mammal with a distinct body plan to those of other mammals. Here we report a comparative analysis of HOX gene clusters between an Australian marsupial of the kangaroo family and the eutherians. There was a strikingly high level of conservation of HOX gene sequence and structure and non-protein coding genes including the microRNAs miR-196a, miR-196b, miR-10a and miR-10b and the long non-coding RNAs HOTAIR, HOTAIRM1 and HOXA11AS that play critical roles in regulating gene expression and controlling development. By microRNA deep sequencing and comparative genomic analyses, two conserved microRNAs (miR-10a and miR-10b) were identified and one new candidate microRNA with typical hairpin precursor structure that is expressed in both fibroblasts and testes was found. The prediction of microRNA target analysis showed that several known microRNA targets, such as miR-10, miR-414 and miR-464, were found in the tammar HOX clusters. In addition, several novel and putative miRNAs were identified that originated from elsewhere in the tammar genome and that target the tammar HOXB and HOXD clusters. This study confirms that the emergence of known long non-coding RNAs in the HOX clusters clearly predate the marsupial-eutherian divergence 160 Ma ago. It also identified a new potentially functional microRNA as well as conserved miRNAs. These non-coding RNAs may participate in the regulation of HOX genes to influence the body plan of this marsupial.
Evolution of coding and non-coding genes in HOX clusters of a marsupial

PubMed Central

2012-01-01

Background: The HOX gene clusters are thought to be highly conserved amongst mammals and other vertebrates, but the long non-coding RNAs have only been studied in detail in human and mouse. The sequencing of the kangaroo genome provides an opportunity to use comparative analyses to compare the HOX clusters of a mammal with a distinct body plan to those of other mammals. Results: Here we report a comparative analysis of HOX gene clusters between an Australian marsupial of the kangaroo family and the eutherians. There was a strikingly high level of conservation of HOX gene sequence and structure and non-protein coding genes including the microRNAs miR-196a, miR-196b, miR-10a and miR-10b and the long non-coding RNAs HOTAIR, HOTAIRM1 and HOXA11AS that play critical roles in regulating gene expression and controlling development. By microRNA deep sequencing and comparative genomic analyses, two conserved microRNAs (miR-10a and miR-10b) were identified and one new candidate microRNA with typical hairpin precursor structure that is expressed in both fibroblasts and testes was found. The prediction of microRNA target analysis showed that several known microRNA targets, such as miR-10, miR-414 and miR-464, were found in the tammar HOX clusters. In addition, several novel and putative miRNAs were identified that originated from elsewhere in the tammar genome and that target the tammar HOXB and HOXD clusters. Conclusions: This study confirms that the emergence of known long non-coding RNAs in the HOX clusters clearly predate the marsupial-eutherian divergence 160 Ma ago. It also identified a new potentially functional microRNA as well as conserved miRNAs. These non-coding RNAs may participate in the regulation of HOX genes to influence the body plan of this marsupial. PMID:22708672
Regional heatwaves in china: a cluster analysis

NASA Astrophysics Data System (ADS)

Wang, Pinya; Tang, Jianping; Wang, Shuyu; Dong, Xinning; Fang, Juan

2018-03-01

Taking the spatial extent of heatwave events into consideration, two kinds of regional heatwaves defined by absolute and relative thresholds, namely RHWs-A and RHWs-R, are investigated for 1959-2013. The temperature data are derived from the daily maximum temperatures (DMTs) of 587 stations in China. In total, 298 RHWs-A and 374 RHWs-R are identified during the past 55 years, and both have become more frequent since the mid-1980s. Using cluster analysis, several typical spatial distributions of RHWs-A/RHWs-R are obtained. For RHWs-A, there are three clusters covering southeastern China, northwestern China and the lower reaches of the Yangtze River, of which the southeastern cluster groups the most heatwaves. For RHWs-R, there are seven clusters distributed throughout all regions of China. The clusters in northwestern and northeastern China are more stable than others for both RHWs-A and RHWs-R, and the northern clusters are of greater intensity than the southern ones. All RHWs-A/RHWs-R are accompanied by anomalous high systems along with reduced soil moisture. The southern clusters are controlled by the Northwestern Pacific subtropical high (WPSH), and the northern ones are influenced by the mid-latitude high systems. The influences of atmospheric circulations and soil moisture on regional heatwaves are further demonstrated by two case analyses of the severe RHW-A in 2003 and the RHW-R in 2013.
Genetic Interaction Score (S-Score) Calculation, Clustering, and Visualization of Genetic Interaction Profiles for Yeast.

PubMed

Roguev, Assen; Ryan, Colm J; Xu, Jiewei; Colson, Isabelle; Hartsuiker, Edgar; Krogan, Nevan

2018-02-01

This protocol describes computational analysis of genetic interaction screens, ranging from data capture (plate imaging) to downstream analyses. Plate imaging approaches using both digital camera and office flatbed scanners are included, along with a protocol for the extraction of colony size measurements from the resulting images. A commonly used genetic interaction scoring method, calculation of the S-score, is discussed. These methods require minimal computer skills, but some familiarity with MATLAB and Linux/Unix is a plus. Finally, an outline for using clustering and visualization software for analysis of resulting data sets is provided. © 2018 Cold Spring Harbor Laboratory Press.
Intermediate and advanced topics in multilevel logistic regression analysis.

PubMed

Austin, Peter C; Merlo, Juan

2017-09-10

Multilevel data occur frequently in health services, population and public health, and epidemiologic research. In such research, binary outcomes are common. Multilevel logistic regression models allow one to account for the clustering of subjects within clusters of higher-level units when estimating the effect of subject and cluster characteristics on subject outcomes. A search of the PubMed database demonstrated that the use of multilevel or hierarchical regression models is increasing rapidly. However, our impression is that many analysts simply use multilevel regression models to account for the nuisance of within-cluster homogeneity that is induced by clustering. In this article, we describe a suite of analyses that can complement the fitting of multilevel logistic regression models. These ancillary analyses permit analysts to estimate the marginal or population-average effect of covariates measured at the subject and cluster level, in contrast to the within-cluster or cluster-specific effects arising from the original multilevel logistic regression model. We describe the interval odds ratio and the proportion of opposed odds ratios, which are summary measures of effect for cluster-level covariates. We describe the variance partition coefficient and the median odds ratio, which are measures of components of variance and heterogeneity in outcomes. These measures allow one to quantify the magnitude of the general contextual effect. We describe an R² measure that allows analysts to quantify the proportion of variation explained by different multilevel logistic regression models. We illustrate the application and interpretation of these measures by analyzing mortality in patients hospitalized with a diagnosis of acute myocardial infarction. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
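The record above names the variance partition coefficient (VPC) and the median odds ratio (MOR) without giving formulas. The sketch below uses the formulations commonly applied to a random-intercept logistic model, which is an assumption on our part: under the latent-variable approach VPC = σ² / (σ² + π²/3), and MOR = exp(√(2σ²) · Φ⁻¹(0.75)), where σ² is the between-cluster variance of the random intercepts and Φ⁻¹ is the standard normal quantile function. The variance value used in the example is made up.

```python
import math
from statistics import NormalDist


def variance_partition_coefficient(sigma2):
    """Latent-variable VPC: share of total variance lying between clusters."""
    return sigma2 / (sigma2 + math.pi ** 2 / 3)


def median_odds_ratio(sigma2):
    """Median odds ratio between two randomly chosen clusters."""
    z75 = NormalDist().inv_cdf(0.75)  # 75th percentile of the standard normal
    return math.exp(math.sqrt(2 * sigma2) * z75)


# Hypothetical between-hospital variance on the log-odds scale.
sigma2 = 1.0
vpc = variance_partition_coefficient(sigma2)
mor = median_odds_ratio(sigma2)
print(round(vpc, 3), round(mor, 3))
```

A between-cluster variance of zero gives VPC = 0 and MOR = 1, meaning the cluster a subject belongs to carries no information about the outcome; larger variances push both measures upward.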

Phylogenomic and MALDI-TOF MS Analysis of Streptococcus sinensis HKU4T Reveals a Distinct Phylogenetic Clade in the Genus Streptococcus

PubMed Central

Tse, Herman; Chen, Jonathan H.K.; Tang, Ying; Lau, Susanna K.P.; Woo, Patrick C.Y.

2014-01-01

Streptococcus sinensis is a recently discovered human pathogen isolated from blood cultures of patients with infective endocarditis. Its phylogenetic position, as well as those of its closely related species, remains inconclusive when single genes are used for phylogenetic analysis. For example, S. sinensis branched out from members of the anginosus, mitis, and sanguinis groups in the 16S ribosomal RNA gene phylogenetic tree, but it was clustered with members of the anginosus and sanguinis groups when groEL gene sequences were used for analysis. In this study, we sequenced the draft genome of S. sinensis and used a polyphasic approach, including concatenated genes, whole genomes, and matrix-assisted laser desorption ionization-time of flight mass spectrometry to analyze the phylogeny of S. sinensis. The size of the S. sinensis draft genome is 2.06 Mb, with GC content of 42.2%. Phylogenetic analysis using 50 concatenated genes or whole genomes revealed that S. sinensis formed a distinct cluster with Streptococcus oligofermentans and Streptococcus cristatus, and these three streptococci were clustered with the “sanguinis group.” As for phylogenetic analysis using hierarchical cluster analysis of the mass spectra of streptococci, S. sinensis also formed a distinct cluster with S. oligofermentans and S. cristatus, but these three streptococci were clustered with the “mitis group.” On the basis of the findings, we propose a novel group, named “sinensis group,” to include S. sinensis, S. oligofermentans, and S. cristatus, in the Streptococcus genus. Our study also illustrates the power of phylogenomic analyses for resolving ambiguities in bacterial taxonomy. PMID:25331233
Phylogenomic and MALDI-TOF MS analysis of Streptococcus sinensis HKU4T reveals a distinct phylogenetic clade in the genus Streptococcus.

PubMed

Teng, Jade L L; Huang, Yi; Tse, Herman; Chen, Jonathan H K; Tang, Ying; Lau, Susanna K P; Woo, Patrick C Y

2014-10-20

Streptococcus sinensis is a recently discovered human pathogen isolated from blood cultures of patients with infective endocarditis. Its phylogenetic position, as well as those of its closely related species, remains inconclusive when single genes are used for phylogenetic analysis. For example, S. sinensis branched out from members of the anginosus, mitis, and sanguinis groups in the 16S ribosomal RNA gene phylogenetic tree, but it was clustered with members of the anginosus and sanguinis groups when groEL gene sequences were used for analysis. In this study, we sequenced the draft genome of S. sinensis and used a polyphasic approach, including concatenated genes, whole genomes, and matrix-assisted laser desorption ionization-time of flight mass spectrometry to analyze the phylogeny of S. sinensis. The size of the S. sinensis draft genome is 2.06 Mb, with GC content of 42.2%. Phylogenetic analysis using 50 concatenated genes or whole genomes revealed that S. sinensis formed a distinct cluster with Streptococcus oligofermentans and Streptococcus cristatus, and these three streptococci were clustered with the "sanguinis group." As for phylogenetic analysis using hierarchical cluster analysis of the mass spectra of streptococci, S. sinensis also formed a distinct cluster with S. oligofermentans and S. cristatus, but these three streptococci were clustered with the "mitis group." On the basis of the findings, we propose a novel group, named "sinensis group," to include S. sinensis, S. oligofermentans, and S. cristatus, in the Streptococcus genus. Our study also illustrates the power of phylogenomic analyses for resolving ambiguities in bacterial taxonomy. © The Author(s) 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Gas and galaxies in filaments between clusters of galaxies. The study of A399-A401

NASA Astrophysics Data System (ADS)

Bonjean, V.; Aghanim, N.; Salomé, P.; Douspis, M.; Beelen, A.

2018-01-01

We have performed a multi-wavelength analysis of two galaxy cluster systems selected with the thermal Sunyaev-Zel'dovich (tSZ) effect and composed of cluster pairs and an inter-cluster filament. We have focused on one pair of particular interest: A399-A401 at redshift z ~ 0.073, separated by 3 Mpc. We have also performed the first analysis of one lower-significance newly associated pair: A21-PSZ2 G114.09-34.34 at z ~ 0.094, separated by 4.2 Mpc. We have characterised the intra-cluster gas using the tSZ signal from Planck and, when possible, the galaxy optical and infrared (IR) properties based on two photometric redshift catalogues: 2MPZ and WISExSCOS. From the tSZ data, we measured the gas pressure in the clusters and in the inter-cluster filaments. In the case of A399-A401, the results are in perfect agreement with previous studies and, using the temperature measured from the X-rays, we further estimate the gas density in the filament and find n0 = (4.3 ± 0.7) × 10⁻⁴ cm⁻³. The optical and IR colour-colour and colour-magnitude analyses of the galaxies selected in the cluster system, together with their star formation rate, show no segregation between galaxy populations, both in the clusters and in the filament of A399-A401. Galaxies are all passive, early type, and red and dead. The gas and galaxy properties of this system suggest that the whole system formed at the same time and corresponds to a pre-merger, with a cosmic filament gas heated by the collapse. For the other cluster system, the tSZ analysis was performed and the pressure in the clusters and in the inter-cluster filament was constrained. However, the limited or nonexistent optical and IR data prevent us from concluding on the presence of an actual cosmic filament or from proposing a scenario.
Identifying the regional-scale groundwater-surface water interaction on the Sanjiang Plain, Northeast China.

PubMed

Wang, Xihua; Zhang, Guangxin; Xu, Y Jun; Sun, Guangzhi

2015-11-01

Assessment of the interaction between groundwater and surface water (GW-SW) can generate information that is critical to regional water resource management, especially for regions that are highly dependent on groundwater resources for irrigation. This study investigated such interaction on China's Sanjiang Plain (10.9 × 10⁴ km²) and produced results to assist sustainable regional water management for intensive agricultural activities. Methods of hierarchical cluster analysis (HCA), principal component analysis (PCA), and statistical analysis were used in this study. One hundred two water samples (60 from shallow groundwater, 7 from deep groundwater, and 35 from surface water) were collected and grouped into three clusters and seven sub-clusters during the analyses. The PCA identified four principal components of the interaction, which explained 85.9% of the variance of the total database, attributed to the dissolution and evolution of gypsum, feldspar, and other natural minerals in the region that was affected by anthropic and geological (sedimentary rock mineral) activities. The analyses showed that surface water in the upper region of the Sanjiang Plain gained water from local shallow groundwater, indicating that the surface water in the upper region was relatively more resilient to withdrawal for usage, whereas in the middle region, there was only a weak interaction between shallow groundwater and surface water. In the lower region of the Sanjiang Plain, surface water lost water to shallow groundwater, indicating that the groundwater was vulnerable to pollution by pesticides and fertilizers from terrestrial sources.
Choosing appropriate analysis methods for cluster randomised cross-over trials with a binary outcome.

PubMed

Morgan, Katy E; Forbes, Andrew B; Keogh, Ruth H; Jairath, Vipul; Kahan, Brennan C

2017-01-30

In cluster randomised cross-over (CRXO) trials, clusters receive multiple treatments in a randomised sequence over time. In such trials, there is usually correlation between patients in the same cluster. In addition, within a cluster, patients in the same period may be more similar to each other than to patients in other periods. We demonstrate that it is necessary to account for these correlations in the analysis to obtain correct Type I error rates. We then use simulation to compare different methods of analysing a binary outcome from a two-period CRXO design. Our simulations demonstrated that hierarchical models without random effects for period-within-cluster, which do not account for any extra within-period correlation, performed poorly with greatly inflated Type I errors in many scenarios. In scenarios where extra within-period correlation was present, a hierarchical model with random effects for cluster and period-within-cluster only had correct Type I errors when there were large numbers of clusters; with small numbers of clusters, the error rate was inflated. We also found that generalised estimating equations did not give correct error rates in any scenarios considered. An unweighted cluster-level summary regression performed best overall, maintaining an error rate close to 5% for all scenarios, although it lost power when extra within-period correlation was present, especially for small numbers of clusters. Results from our simulation study show that it is important to model both levels of clustering in CRXO trials, and that any extra within-period correlation should be accounted for. Copyright © 2016 John Wiley & Sons, Ltd.
On the Analysis of Clustering in an Irradiated Low Alloy Reactor Pressure Vessel Steel Weld.

PubMed

Lindgren, Kristina; Stiller, Krystyna; Efsing, Pål; Thuvander, Mattias

2017-04-01

Radiation-induced clustering affects the mechanical properties, that is, the ductile-to-brittle transition temperature (DBTT), of reactor pressure vessel (RPV) steel of nuclear power plants. The combination of low Cu and high Ni used in some RPV welds is known to further enhance the DBTT shift during long-time operation. In this study, RPV weld samples containing 0.04 at% Cu and 1.6 at% Ni were irradiated to 2.0 and 6.4 × 10²³ n/m² in the Halden test reactor. Atom probe tomography (APT) was applied to study clustering of Ni, Mn, Si, and Cu. As the clusters are in the nanometer range, APT is a very suitable technique for this type of study. From APT analyses, information about size distribution, number density, and composition of the clusters can be obtained. However, the quantification of these attributes is not trivial. The maximum separation method (MSM) has been used to characterize the clusters, and a detailed study of the influence of the choice of MSM cluster parameters, primarily on the cluster number density, has been undertaken.
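The sensitivity of cluster counts to MSM parameters can be illustrated with a small sketch. It assumes the usual textbook reading of the maximum separation method: solute atoms closer than a maximum separation are linked, and linked groups with fewer than a minimum number of atoms are discarded. The parameter names `d_max` and `n_min`, the coordinates, and the brute-force pair search are illustrative assumptions, not the settings or software used in the study.

```python
import math
from collections import defaultdict


def maximum_separation_clusters(atoms, d_max, n_min):
    """Group solute atoms linked by separations <= d_max.

    atoms : list of (x, y, z) solute-atom coordinates (hypothetical units)
    d_max : maximum separation for two atoms to be linked (assumed name)
    n_min : minimum number of atoms for a group to count as a cluster
    Returns a list of clusters, each a list of atom indices.
    """
    parent = list(range(len(atoms)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Brute-force O(n^2) pair search; fine for a sketch, not for real APT data.
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            if math.dist(atoms[i], atoms[j]) <= d_max:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i in range(len(atoms)):
        groups[find(i)].append(i)
    return [members for members in groups.values() if len(members) >= n_min]


# Hypothetical solute positions: a chain of three, one isolated atom, one pair.
atoms = [(0.0, 0.0, 0.0), (0.3, 0.0, 0.0), (0.6, 0.0, 0.0),
         (5.0, 5.0, 5.0), (10.0, 0.0, 0.0), (10.2, 0.0, 0.0)]

for d_max, n_min in [(0.5, 3), (0.5, 2), (0.25, 2)]:
    found = maximum_separation_clusters(atoms, d_max, n_min)
    print(f"d_max={d_max}, n_min={n_min}: {len(found)} cluster(s)")
```

Changing either parameter changes how many clusters are reported for the same atom positions, which is why the number density depends on the parameter choice.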
First evidence of diffuse ultra-steep-spectrum radio emission surrounding the cool core of a cluster

NASA Astrophysics Data System (ADS)

Savini, F.; Bonafede, A.; Brüggen, M.; van Weeren, R.; Brunetti, G.; Intema, H.; Botteon, A.; Shimwell, T.; Wilber, A.; Rafferty, D.; Giacintucci, S.; Cassano, R.; Cuciti, V.; de Gasperin, F.; Röttgering, H.; Hoeft, M.; White, G.

2018-05-01

Diffuse synchrotron radio emission from cosmic-ray electrons is observed at the center of a number of galaxy clusters. These sources can be classified either as giant radio halos, which occur in merging clusters, or as mini halos, which are found only in cool-core clusters. In this paper, we present the first discovery of a cool-core cluster with an associated mini halo that also shows ultra-steep-spectrum emission extending well beyond the core that resembles radio halo emission. The large-scale component is discovered thanks to LOFAR observations at 144 MHz. We also analyse GMRT observations at 610 MHz to characterise the spectrum of the radio emission. An X-ray analysis reveals that the cluster is slightly disturbed, and we suggest that the steep-spectrum radio emission outside the core could be produced by a minor merger that powers electron re-acceleration without disrupting the cool core. This discovery suggests that, under particular circumstances, both a mini and giant halo could co-exist in a single cluster, opening new perspectives for particle acceleration mechanisms in galaxy clusters.
Implementation of novel statistical procedures and other advanced approaches to improve analysis of CASA data.

PubMed

Ramón, M; Martínez-Pastor, F

2018-04-23

Computer-aided sperm analysis (CASA) produces a wealth of data that is frequently ignored. The use of multiparametric statistical methods can help explore these datasets, unveiling the subpopulation structure of sperm samples. In this review we analyse the significance of the internal heterogeneity of sperm samples and its relevance. We also provide a brief description of the statistical tools used for extracting sperm subpopulations from the datasets, namely unsupervised clustering (with non-hierarchical, hierarchical and two-step methods) and the most advanced supervised methods, based on machine learning. The former method has allowed exploration of subpopulation patterns in many species, whereas the latter offers further possibilities, especially considering functional studies and the practical use of subpopulation analysis. We also consider novel approaches, such as the use of geometric morphometrics or imaging flow cytometry. Finally, although the data provided by CASA systems provide valuable information on sperm samples by applying clustering analyses, there are several caveats. Protocols for capturing and analysing motility or morphometry should be standardised and adapted to each experiment, and the algorithms should be open in order to allow comparison of results between laboratories. Moreover, we must be aware of new technology that could change the paradigm for studying sperm motility and morphology.
Evaluation of hierarchical agglomerative cluster analysis methods for discrimination of primary biological aerosol

NASA Astrophysics Data System (ADS)

Crawford, I.; Ruske, S.; Topping, D. O.; Gallagher, M. W.

2015-07-01

In this paper we present improved methods for discriminating and quantifying Primary Biological Aerosol Particles (PBAP) by applying hierarchical agglomerative cluster analysis to multi-parameter ultra violet-light induced fluorescence (UV-LIF) spectrometer data. The methods employed in this study can be applied to data sets in excess of 1×10⁶ points on a desktop computer, allowing for each fluorescent particle in a dataset to be explicitly clustered. This reduces the potential for misattribution found in subsampling and comparative attribution methods used in previous approaches, improving our capacity to discriminate and quantify PBAP meta-classes. We evaluate the performance of several hierarchical agglomerative cluster analysis linkages and data normalisation methods using laboratory samples of known particle types and an ambient dataset. Fluorescent and non-fluorescent polystyrene latex spheres were sampled with a Wideband Integrated Bioaerosol Spectrometer (WIBS-4) where the optical size, asymmetry factor and fluorescent measurements were used as inputs to the analysis package. It was found that the Ward linkage with z-score or range normalisation performed best, correctly attributing 98 and 98.1% of the data points respectively. The best performing methods were applied to the BEACHON-RoMBAS ambient dataset where it was found that the z-score and range normalisation methods yield similar results with each method producing clusters representative of fungal spores and bacterial aerosol, consistent with previous results. The z-score result was compared to clusters generated with previous approaches (WIBS AnalysiS Program, WASP) where we observe that the subsampling and comparative attribution method employed by WASP results in the overestimation of the fungal spore concentration by a factor of 1.5 and the underestimation of bacterial aerosol concentration by a factor of 5. We suggest that this is likely due to errors arising from misattribution due to poor centroid definition and failure to assign particles to a cluster as a result of the subsampling and comparative attribution method employed by WASP. The methods used here allow for the entire fluorescent population of particles to be analysed, yielding an explicit cluster attribution for each particle, improving cluster centroid definition and our capacity to discriminate and quantify PBAP meta-classes compared to previous approaches.
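The combination of z-score normalisation and Ward linkage described in this abstract can be sketched in a few lines. This is a naive illustration with hypothetical particle values (optical size, fluorescence intensity), not the analysis package used by the authors; its cubic running time would not scale to the 1×10⁶ points mentioned above, and it assumes every input column has non-zero spread.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Rescale each column to zero mean and unit (population) std deviation."""
    columns = list(zip(*rows))
    mus = [mean(c) for c in columns]
    sds = [pstdev(c) for c in columns]  # assumed non-zero for every column
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]


def ward_cost(rows, a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    dims = range(len(rows[0]))
    ca = [mean(rows[i][d] for i in a) for d in dims]
    cb = [mean(rows[i][d] for i in b) for d in dims]
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist


def ward_clusters(rows, k):
    """Agglomerate singletons with Ward's criterion until k clusters remain."""
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > k:
        _, i, j = min(
            (ward_cost(rows, clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical particles: (optical size, fluorescence intensity).
particles = [[1.0, 100.0], [1.1, 120.0], [0.9, 110.0],
             [3.0, 900.0], [3.2, 950.0], [2.9, 1000.0]]

labels = ward_clusters(zscore_columns(particles), k=2)
print(sorted(sorted(c) for c in labels))
```

Normalising first keeps the large-valued fluorescence column from dominating the distance calculation, which is one reason the choice of normalisation can change the resulting clusters.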
Gaussian mixture clustering and imputation of microarray data.

PubMed

Ouyang, Ming; Welsh, William J; Georgopoulos, Panos

2004-04-12

In microarray experiments, missing entries arise from blemishes on the chips. In large-scale studies, virtually every chip contains some missing entries and more than 90% of the genes are affected. Many analysis methods require a full set of data. Either those genes with missing entries are excluded, or the missing entries are filled with estimates prior to the analyses. This study compares methods of missing value estimation. Two evaluation metrics of imputation accuracy are employed. First, the root mean squared error measures the difference between the true values and the imputed values. Second, the number of mis-clustered genes measures the difference between clustering with true values and that with imputed values; it examines the bias introduced by imputation to clustering. The Gaussian mixture clustering with model averaging imputation is superior to all other imputation methods, according to both evaluation metrics, on both time-series (correlated) and non-time series (uncorrelated) data sets.
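The first evaluation metric in this abstract, root mean squared error between true and imputed values, can be sketched as follows. The row-mean imputation used here is only a simple stand-in baseline and is not the Gaussian mixture method evaluated in the study; the function names and the tiny expression matrix are hypothetical.

```python
import math


def row_mean_impute(matrix):
    """Fill each missing entry (None) with the mean of its row's observed values.

    A deliberately simple baseline; assumes every row has at least one
    observed value.
    """
    filled = []
    for row in matrix:
        observed = [x for x in row if x is not None]
        fill = sum(observed) / len(observed)
        filled.append([fill if x is None else x for x in row])
    return filled


def imputation_rmse(truth, masked, imputed):
    """RMSE between true and imputed values, over the masked entries only."""
    squared_errors = [
        (truth[i][j] - imputed[i][j]) ** 2
        for i, row in enumerate(masked)
        for j, x in enumerate(row)
        if x is None
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical expression values: complete truth and a copy with entries hidden.
truth = [[1.0, 2.0, 6.0], [4.0, 4.0, 4.0]]
masked = [[1.0, 2.0, None], [4.0, None, 4.0]]

imputed = row_mean_impute(masked)
print(f"RMSE = {imputation_rmse(truth, masked, imputed):.3f}")
```

Restricting the error to the masked positions keeps the observed entries, which are reproduced exactly, from diluting the score.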
DMINDA: an integrated web server for DNA motif identification and analyses.

PubMed

Ma, Qin; Zhang, Hanyuan; Mao, Xizeng; Zhou, Chuan; Liu, Bingqiang; Chen, Xin; Xu, Ying

2014-07-01

DMINDA (DNA motif identification and analyses) is an integrated web server for DNA motif identification and analyses, which is accessible at http://csbl.bmb.uga.edu/DMINDA/. This web site is freely available to all users and there is no login requirement. This server provides a suite of cis-regulatory motif analysis functions on DNA sequences, which are important to elucidation of the mechanisms of transcriptional regulation: (i) de novo motif finding for a given set of promoter sequences along with statistical scores for the predicted motifs derived based on information extracted from a control set, (ii) scanning motif instances of a query motif in provided genomic sequences, (iii) motif comparison and clustering of identified motifs, and (iv) co-occurrence analyses of query motifs in given promoter sequences. The server is powered by a backend computer cluster with over 150 computing nodes, and is particularly useful for motif prediction and analyses in prokaryotic genomes. We believe that DMINDA, as a new and comprehensive web server for cis-regulatory motif finding and analyses, will benefit the genomic research community in general and prokaryotic genome researchers in particular. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
An Empirical Typology of Perfectionism in Academically Talented Children.

ERIC Educational Resources Information Center

Parker, Wayne D.

1997-01-01

A national sample of 820 academically talented children took the Multidimensional Perfectionism Scale. Cluster analyses of scores found a three-cluster solution. Further analyses indicated that these clusters were: nonperfectionistic (32.%), healthy perfectionistic (41.7%), and dysfunctional perfectionistic (25.5%). The construct of perfectionism…
Compositional classification and sedimentological interpretation of the laminated lacustrine sediments at Baumkirchen (Western Austria) using XRF core scanning data

NASA Astrophysics Data System (ADS)

Barrett, Samuel; Tjallingii, Rik; Bloemsma, Menno; Brauer, Achim; Starnberger, Reinhard; Spötl, Christoph; Dulski, Peter

2015-04-01

The outcrop at Baumkirchen (Austria) encloses part of a unique sequence of laminated lacustrine sediments deposited during the last glacial cycle. A ~250m long composite sediment record recovered at this location now continuously covers the periods ~33 to ~45 ka BP (MIS 3) and ~59 to ~73 ka BP (MIS 4), which are separated by a hiatus. The well-laminated (mm-cm scale) and almost entirely clastic sediments reveal alternations of clayey silt and medium silt to very-fine sand layers. Although radiocarbon and optically stimulated luminescence (OSL) dating provide a robust chronology, accurate dating of the sediment laminations appears to be problematic due to very high sedimentation rates (3-8 cm/yr). X-ray fluorescence (XRF) core scanning provided a detailed ~150m long record of compositional changes of the sediments at Baumkirchen. Changes in the sediments are subtle and classification into different facies based on individual elements is therefore subjective. We applied a statistically robust clustering analysis to provide an objective compositional classification without prior knowledge, based on XRF measurements for 15 analysed elements (all those with an acceptable signal-noise ratio: Zr, Sr, Ca, Mn, Cu, Zn, Rb, Ni, Fe, K, Cr, V, Si, Ba, Ti). The clustering analysis indicates a distinct compositional change between sediments deposited below and above the stratigraphic hiatus, but also differentiates between different individual laminae. Preliminary results suggest variations in the sequence are largely controlled by the relative occurrence of different kinds of sediment represented by different clusters. Three clusters identify well-laminated sediments, visually similar in appearance, each dominated by an anti-correlation between Ca and one or more of the detrital elements K, Zr, Ti, Si and Fe. Two of these clusters occur throughout the entire sequence, one frequently and the other restricted to short sections, while the third occurs almost exclusively below the hiatus, indicating a geochemically distinct component that possibly represents a specific sediment source. In a similar manner, three other clusters identify event layers with different compositions of which two occur exclusively above the hiatus and one exclusively below. The variations in the occurrence of these clusters revealing distinct event layers suggest variations in dominant sediment source both above and below the hiatus and within the section above it. More detailed comparisons between compositional variations of the individual clusters obtained from biplots and microscopic observations on thin sections, grain-size analyses, and mineralogical analyses are needed to further differentiate between sediment sources and transport mechanisms.
Insight on AV-45 binding in white and grey matter from histogram analysis: a study on early Alzheimer's disease patients and healthy subjects

PubMed Central

Nemmi, Federico; Saint-Aubert, Laure; Adel, Djilali; Salabert, Anne-Sophie; Pariente, Jérémie; Barbeau, Emmanuel; Payoux, Pierre; Péran, Patrice

2014-01-01

Purpose: AV-45 amyloid biomarker is known to show uptake in white matter in patients with Alzheimer’s disease (AD) but also in the healthy population. This binding, thought to be of a non-specific lipophilic nature, has not yet been investigated. The aim of this study was to determine the differential pattern of AV-45 binding in healthy and pathological populations in white matter. Methods: We recruited 24 patients presenting with AD at early stage and 17 matched, healthy subjects. We used an optimized PET-MRI registration method and an approach based on intensity histogram using several indexes. We compared the results of the intensity histogram analyses with a more canonical approach based on target-to-cerebellum Standard Uptake Value (SUVr) in white and grey matters using MANOVA and discriminant analyses. A cluster analysis on white and grey matter histograms was also performed. Results: White matter histogram analysis revealed significant differences between AD and healthy subjects, which were not revealed by SUVr analysis. However, white matter histograms were not decisive in discriminating groups, and indexes based on grey matter only showed better discriminative power than SUVr. The cluster analysis divided our sample into two clusters, showing different uptakes in grey but also in white matter. Conclusion: These results demonstrate that AV-45 binding in white matter conveys subtle information not detectable using the SUVr approach. Although it is not better than standard SUVr at discriminating AD patients from healthy subjects, this information could reveal white matter modifications. PMID:24573658
Analyses of amplified fragment length polymorphisms (AFLP) indicate rapid radiation of Diospyros species (Ebenaceae) endemic to New Caledonia

PubMed Central

2013-01-01

Background: Radiation in some plant groups has occurred on islands and due to the characteristic rapid pace of phenotypic evolution, standard molecular markers often provide insufficient variation for phylogenetic reconstruction. To resolve relationships within a clade of 21 closely related New Caledonian Diospyros species and evaluate species boundaries we analysed genome-wide DNA variation via amplified fragment length polymorphisms (AFLP). Results: A neighbour-joining (NJ) dendrogram based on Dice distances shows all species except D. minimifolia, D. parviflora and D. vieillardii to form unique clusters of genetically similar accessions. However, there was little variation between these species clusters, resulting in unresolved species relationships and a star-like general NJ topology. Correspondingly, analyses of molecular variance showed more variation within species than between them. A Bayesian analysis with BEAST produced a similar result. Another Bayesian method, this time a clustering method, Structure, demonstrated the presence of two groups, highly congruent with those observed in a principal coordinate analysis (PCO). Molecular divergence between the two groups is low and does not correspond to any hypothesised taxonomic, ecological or geographical patterns. Conclusions: We hypothesise that such a pattern could have been produced by rapid and complex evolution involving a widespread progenitor for which an initial split into two groups was followed by subsequent fragmentation into many diverging populations, which was followed by range expansion of then divergent entities. Overall, this process resulted in an opportunistic pattern of phenotypic diversification. The time since divergence was probably insufficient for some species to become genetically well-differentiated, resulting in progenitor/derivative relationships being exhibited in a few cases. In other cases, our analyses may have revealed evidence for the existence of cryptic species, for which more study of morphology and ecology are now required. PMID:24330478
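The Dice distances underlying the neighbour-joining dendrogram in this abstract can be illustrated with a minimal sketch. It assumes AFLP profiles coded as presence/absence (1/0) vectors and the standard Dice coefficient 2a / (2a + b + c), where a counts shared bands and b and c count bands unique to each accession; the accession names and band patterns are hypothetical.

```python
def dice_distance(x, y):
    """Dice distance between two presence/absence band profiles.

    Distance = 1 - 2a / (2a + b + c), where a is the number of bands
    present in both profiles; shared absences are ignored.
    """
    shared = sum(1 for a, b in zip(x, y) if a and b)
    total = sum(x) + sum(y)  # equals 2a + b + c
    if total == 0:
        return 0.0  # two empty profiles: treated as identical by convention
    return 1 - 2 * shared / total


# Hypothetical AFLP band profiles for three accessions.
profiles = {
    "acc_A": [1, 1, 0, 1, 0],
    "acc_B": [1, 1, 0, 0, 0],
    "acc_C": [0, 0, 1, 0, 1],
}

names = sorted(profiles)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        d = dice_distance(profiles[first], profiles[second])
        print(f"{first} vs {second}: {d:.2f}")
```

A pairwise matrix of such distances is the kind of input a neighbour-joining routine takes; building the tree itself is outside this sketch.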
Identifying contextual influences of community reintegration among injured servicemembers.

PubMed

Hawkins, Brent L; McGuire, Francis A; Britt, Thomas W; Linder, Sandra M

2015-01-01

Research suggests that community reintegration (CR) after injury and rehabilitation is difficult for many injured servicemembers. However, little is known about the influence of the contextual factors, both personal and environmental, that influence CR. Framed within the International Classification of Functioning, Disability and Health and Social Cognitive Theory, the quantitative portion of a larger mixed-methods study of 51 injured, community-dwelling servicemembers compared the relative contribution of contextual factors between groups of servicemembers with different levels of CR. Cluster analysis indicated three groups of servicemembers showing low, moderate, and high levels of CR. Statistical analyses identified contextual factors (e.g., personal and environmental factors) that significantly discriminated between CR clusters. Multivariate analysis of variance and discriminant analysis indicated significant contributions of general self-efficacy, services and assistance barriers, physical and structural barriers, attitudes and support barriers, perceived level of disability and/or handicap, work and school barriers, and policy barriers on CR scores. Overall, analyses indicated that injured servicemembers with lower CR scores had lower general self-efficacy scores, reported more difficulty with environmental barriers, and reported their injuries as more disabling.
Clustering of dietary intake and sedentary behavior in 2-year-old children.

PubMed

Gubbels, Jessica S; Kremers, Stef P J; Stafleu, Annette; Dagnelie, Pieter C; de Vries, Sanne I; de Vries, Nanne K; Thijs, Carel

2009-08-01

To examine clustering of energy balance-related behaviors (EBRBs) in young children. This is crucial because lifestyle habits are formed at an early age and track in later life. This study is the first to examine EBRB clustering in children as young as 2 years. Cross-sectional data originated from the Child, Parent and Health: Lifestyle and Genetic Constitution (KOALA) Birth Cohort Study. Parents of 2578 2-year-old children completed a questionnaire. Correlation analyses, principal component analyses, and linear regression analyses were performed to examine clustering of EBRBs. We found modest but consistent correlations in EBRBs. Two clusters emerged: a "sedentary-snacking cluster" and a "fiber cluster." Television viewing clustered with computer use and unhealthy dietary behaviors. Children who frequently consumed vegetables also consumed fruit and brown bread more often and white bread less often. Lower maternal education and maternal obesity were associated with high scores on the sedentary-snacking cluster, whereas higher educational level was associated with high fiber cluster scores. Obesity-prone behavioral clusters are already visible in 2-year-old children and are related to maternal characteristics. The findings suggest that obesity prevention should apply an integrated approach to physical activity and dietary intake in early childhood.
Psychosocial Clusters and their Associations with Well-Being and Health: An Empirical Strategy for Identifying Psychosocial Predictors Most Relevant to Racially/Ethnically Diverse Women’s Health

PubMed Central

Jabson, Jennifer M.; Bowen, Deborah; Weinberg, Janice; Kroenke, Candyce; Luo, Juhua; Messina, Catherine; Shumaker, Sally; Tindle, Hilary A.

2016-01-01

BACKGROUND Strategies for identifying the most relevant psychosocial predictors in studies of racial/ethnic minority women’s health are limited because they largely exclude cultural influences and they assume that psychosocial predictors are independent. This paper proposes and tests an empirical solution. METHODS Hierarchical cluster analysis, conducted with data from 140,652 Women’s Health Initiative participants, identified clusters among individual psychosocial predictors. Multivariable analyses tested associations between clusters and health outcomes. RESULTS A Social Cluster and a Stress Cluster were identified. The Social Cluster was positively associated with well-being and inversely associated with chronic disease index, and the Stress Cluster was inversely associated with well-being and positively associated with chronic disease index. As hypothesized, the magnitude of association between clusters and outcomes differed by race/ethnicity. CONCLUSIONS By identifying psychosocial clusters and their associations with health, we have taken an important step toward understanding how individual psychosocial predictors interrelate and how empirically formed Stress and Social clusters relate to health outcomes. This study has also demonstrated important insight about differences in associations between these psychosocial clusters and health among racial/ethnic minorities. These differences could signal the best pathways for intervention modification and tailoring. PMID:27279761
Spatial Analysis of Great Lakes Regional Icing Cloud Liquid Water Content

NASA Technical Reports Server (NTRS)

Ryerson, Charles C.; Koenig, George G.; Melloh, Rae A.; Meese, Debra A.; Reehorst, Andrew L.; Miller, Dean R.

2003-01-01

Clustering of cloud microphysical conditions, such as liquid water content (LWC) and drop size, can affect the rate and shape of ice accretion and the airworthiness of aircraft. Clustering may also degrade the accuracy of cloud LWC measurements from radars and microwave radiometers being developed by the government for remotely mapping icing conditions ahead of aircraft in flight. This paper evaluates spatial clustering of LWC in icing clouds using measurements collected during NASA research flights in the Great Lakes region. We used graphical and analytical approaches to describe clustering. The analytical approach involves determining the average size of clusters and computing a clustering intensity parameter. We analyzed flight data composed of 1-s-frequency LWC measurements for 12 periods ranging from 17.4 minutes (73 km) to 45.3 minutes (190 km) in duration. Graphically, some flight segments showed evidence of consistency with regard to clustering patterns. Cluster intensity varied from 0.06, indicating little clustering, to a high of 2.42. Cluster lengths ranged from 0.1 minutes (0.6 km) to 4.1 minutes (17.3 km). Additional analyses will allow us to determine if clustering climatologies can be developed to characterize cluster conditions by region, time period, or weather condition.
Copy-number analysis and inference of subclonal populations in cancer genomes using Sclust.

PubMed

Cun, Yupeng; Yang, Tsun-Po; Achter, Viktor; Lang, Ulrich; Peifer, Martin

2018-06-01

The genomes of cancer cells constantly change during pathogenesis. This evolutionary process can lead to the emergence of drug-resistant mutations in subclonal populations, which can hinder therapeutic intervention in patients. Data derived from massively parallel sequencing can be used to infer these subclonal populations using tumor-specific point mutations. The accurate determination of copy-number changes and tumor impurity is necessary to reliably infer subclonal populations by mutational clustering. This protocol describes how to use Sclust, a copy-number analysis method with a recently developed mutational clustering approach. In a series of simulations and comparisons with alternative methods, we have previously shown that Sclust accurately determines copy-number states and subclonal populations. Performance tests show that the method is computationally efficient, with copy-number analysis and mutational clustering taking <10 min. Sclust is designed such that even non-experts in computational biology or bioinformatics with basic knowledge of the Linux/Unix command-line syntax should be able to carry out analyses of subclonal populations.

Low Divergence of Clonorchis sinensis in China Based on Multilocus Analysis

PubMed Central

Sun, Jiufeng; Huang, Yan; Huang, Huaiqiu; Liang, Pei; Wang, Xiaoyun; Mao, Qiang; Men, Jingtao; Chen, Wenjun; Deng, Chuanhuan; Zhou, Chenhui; Lv, Xiaoli; Zhou, Juanjuan; Zhang, Fan; Li, Ran; Tian, Yanli; Lei, Huali; Liang, Chi; Hu, Xuchu; Xu, Jin; Li, Xuerong; Yu, Xinbing

2013-01-01

Clonorchis sinensis, an ancient parasite that infects a number of piscivorous mammals, attracts significant public health interest due to zoonotic exposure risks in Asia. The available studies are insufficient to reflect the prevalence, geographic distribution, and intraspecific genetic diversity of C. sinensis in endemic areas. Here, a multilocus analysis based on eight genes (ITS1, act, tub, ef-1a, cox1, cox3, nad4 and nad5 [4.986 kb]) was employed to explore the intraspecific genetic structure of C. sinensis in China. Two hundred and fifty-six C. sinensis isolates were obtained from environmental reservoirs from 17 provinces of China. A total of 254 recognized Multilocus Types (MSTs) showed high diversity among these isolates using multilocus analysis. The comparison analysis of nuclear and mitochondrial phylogeny supports separate clusters in a nuclear dendrogram. Genetic differentiation analysis of three clusters (A, B, and C) showed low divergence within populations. Most isolates from clusters B and C are geographically limited to central China, while cluster A is extraordinarily genetically diverse. Further genetic analyses between different geographic distributions, water bodies and hosts support the low population divergence. The latter haplotype analyses were consistent with the phylogenetic and genetic differentiation results. A recombination network based on concatenated sequences showed a concentrated linkage recombination population in cox1, cox3, nad4 and nad5, with spatial structuring in ITS1. Coupled with the historical record and archaeological evidence of C. sinensis infection in mummified desiccated feces, these data point to an ancient origin of C. sinensis in China. In conclusion, we present a likely phylogenetic structure of the C. sinensis population in mainland China, highlighting its possible tendency for biogeographic expansion. Meanwhile, ITS1 was found to be an effective marker for tracking C. sinensis infection worldwide. Thus, the present study improves our understanding of the global epidemiology and evolution of C. sinensis. PMID:23825605
Identifying Two Groups of Entitled Individuals: Cluster Analysis Reveals Emotional Stability and Self-Esteem Distinction.

PubMed

Crowe, Michael L; LoPilato, Alexander C; Campbell, W Keith; Miller, Joshua D

2016-12-01

The present study hypothesized that there exist two distinct groups of entitled individuals: grandiose-entitled, and vulnerable-entitled. Self-report scores of entitlement were collected for 916 individuals using an online platform. Model-based cluster analyses were conducted on the individuals with scores one standard deviation above mean (n = 159) using the five-factor model dimensions as clustering variables. The results support the existence of two groups of entitled individuals categorized as emotionally stable and emotionally vulnerable. The emotionally stable cluster reported emotional stability, high self-esteem, more positive affect, and antisocial behavior. The emotionally vulnerable cluster reported low self-esteem and high levels of neuroticism, disinhibition, conventionality, psychopathy, negative affect, childhood abuse, intrusive parenting, and attachment difficulties. Compared to the control group, both clusters reported being more antagonistic, extraverted, Machiavellian, and narcissistic. These results suggest important differences are missed when simply examining the linear relationships between entitlement and various aspects of its nomological network.
Changing the paradigm: messages for hand hygiene education and audit from cluster analysis.

PubMed

Gould, D J; Navaie, D; Purssell, E; Drey, N S; Creedon, S

2018-04-01

Hand hygiene is considered to be the foremost infection prevention measure. How healthcare workers accept and make sense of the hand hygiene message is likely to contribute to the success and sustainability of initiatives to improve performance, which is often poor. A survey of nurses in critical care units in three National Health Service trusts in England was undertaken to explore opinions about hand hygiene, use of alcohol hand rubs, audit with performance feedback, and other key hand-hygiene-related issues. Data were analysed descriptively and subjected to cluster analysis. Three main clusters of opinion were visualized, each forming a significant group: positive attitudes, pragmatism and scepticism. A smaller cluster suggested possible guilt about ability to perform hand hygiene. Cluster analysis identified previously unsuspected constellations of beliefs about hand hygiene that offer a plausible explanation for behaviour. Healthcare workers might respond to education and audit differently according to these beliefs. Those holding predominantly positive opinions might comply with hand hygiene policy and perform well as infection prevention link nurses and champions. Those holding pragmatic attitudes are likely to respond favourably to the need for professional behaviour and need to protect themselves from infection. Greater persuasion may be needed to encourage those who are sceptical about the importance of hand hygiene to comply with guidelines. Interventions to increase compliance should be sufficiently broad in scope to tackle different beliefs. Alternatively, cluster analysis of hand hygiene beliefs could be used to identify the most effective educational and monitoring strategies for a particular clinical setting. Copyright © 2017 The Healthcare Infection Society. Published by Elsevier Ltd. All rights reserved.
East Greenland and Barents Sea polar bears (Ursus maritimus): adaptive variation between two populations using skull morphometrics as an indicator of environmental and genetic differences.

PubMed

Pertoldi, Cino; Sonne, Christian; Wiig, Øystein; Baagøe, Hans J; Loeschcke, Volker; Bechshøft, Thea Østergaard

2012-06-01

A morphometric study was conducted on four skull traits of 37 male and 18 female adult East Greenland polar bears (Ursus maritimus) collected 1892-1968, and on 54 male and 44 female adult Barents Sea polar bears collected 1950-1969. The aim was to compare differences in size and shape of the bear skulls using a multivariate approach, characterizing the variation between the two populations using morphometric traits as an indicator of environmental and genetic differences. Mixture analysis testing for geographic differentiation within each population revealed three clusters for Barents Sea males and three clusters for Barents Sea females. East Greenland consisted of one female and one male cluster. A principal component analysis (PCA) conducted on the clusters defined by the mixture analysis showed that East Greenland and Barents Sea polar bear populations overlapped to a large degree, especially with regard to females. Multivariate analyses of variance (MANOVA) showed no significant differences in morphometric means between the two populations, but differences were detected between clusters from each respective geographic locality. To estimate the importance of genetics and environment in the morphometric differences between the bears, a PCA was performed on the covariance matrix derived from the skull measurements. Skull trait size (PC1) explained approx. 80% of the morphometric variation, whereas shape (PC2) defined approx. 15%, indicating some genetic differentiation. Hence, both environmental and genetic factors seem to have contributed to the observed skull differences between the two populations. Overall, results indicate that many Barents Sea polar bears are morphometrically similar to the East Greenland ones, suggesting an exchange of individuals between the two populations. Furthermore, a subpopulation structure in the Barents Sea population was also indicated from the present analyses, which should be considered with regard to future management decisions. © 2012 The Authors.
Spatial Analysis of the Human Immunodeficiency Virus Epidemic among Men Who Have Sex with Men in China, 2006-2015.

PubMed

Qin, Qianqian; Guo, Wei; Tang, Weiming; Mahapatra, Tanmay; Wang, Liyan; Zhang, Nanci; Ding, Zhengwei; Cai, Chang; Cui, Yan; Sun, Jiangping

2017-04-01

Studies have shown a recent upsurge in human immunodeficiency virus (HIV) burden among men who have sex with men (MSM) in China, especially in urban areas. For intervention planning and resource allocation, spatial analyses of HIV/AIDS case-clusters were required to identify epidemic foci and trends among MSM in China. Information regarding MSM recorded as HIV/AIDS cases during 2006-2015 were extracted from the National Case Reporting System. Demographic trends were determined through Cochran-Armitage trend tests. Distribution of case-clusters was examined using spatial autocorrelation. Spatial-temporal scan was used to detect disease clustering. Spatial correlations between cases and socioenvironmental factors were determined by spatial regression. Between 2006 and 2015, in China, 120 371 HIV/AIDS cases were identified among MSM. Newly identified HIV/AIDS cases among self-reported MSM increased from 487 cases in 2006 to >30 000 cases in 2015. Among those HIV/AIDS cases recorded during 2006-2015, 47.0% were 20-29 years old and 24.9% were aged 30-39 years. Based on clusters of HIV/AIDS cases identified through spatial analysis, the epidemic was concentrated among MSM in large cities. Spatial-temporal clusters contained municipalities, provincial capitals, and main cities such as Beijing, Shanghai, Chongqing, Chengdu, and Guangzhou. Spatial regression analysis showed that sociodemographic indicators such as population density, per capita gross domestic product, and number of county-level medical institutions had statistically significant positive correlations with HIV/AIDS among MSM. Assorted spatial analyses revealed an increasingly concentrated HIV epidemic among young MSM in Chinese cities, calling for targeted health education and intensive interventions at an early age. © The Author 2017. Published by Oxford University Press for the Infectious Diseases Society of America. All rights reserved. For permissions, e-mail: journals.permissions@oup.com.
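The abstract above mentions spatial autocorrelation without naming the statistic. A common choice is global Moran's I, sketched below purely as an illustration; the function name, the toy adjacency matrix, and the example values are all hypothetical and are not taken from the study.

```python
def morans_i(values, weights):
    """Global Moran's I for observations x_i and a spatial weight matrix w_ij.

    I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2,
    where W is the sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    variance_sum = sum(d * d for d in dev)
    return (n / total_weight) * (cross / variance_sum)


# Hypothetical example: four regions in a row, each adjacent to its neighbours.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth_gradient = morans_i([1, 2, 3, 4], adjacency)    # similar neighbours
checkerboard = morans_i([1, -1, 1, -1], adjacency)     # dissimilar neighbours
```

Positive values indicate that neighbouring areas tend to have similar case counts (clustering), and negative values indicate alternation. A real analysis would also need a significance test, which this sketch omits.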
Conformational and functional analysis of molecular dynamics trajectories by Self-Organising Maps

PubMed Central

2011-01-01

Background: Molecular dynamics (MD) simulations are powerful tools to investigate the conformational dynamics of proteins, which are often a critical element of their function. Identification of functionally relevant conformations is generally done by clustering the large ensemble of structures that are generated. Recently, Self-Organising Maps (SOMs) were reported to perform more accurately and provide more consistent results than traditional clustering algorithms in various data mining problems. We present a novel strategy to analyse and compare conformational ensembles of protein domains using a two-level approach that combines SOMs and hierarchical clustering. Results: The conformational dynamics of the α-spectrin SH3 protein domain and six single mutants were analysed by MD simulations. The Cα's Cartesian coordinates of conformations sampled in the essential space were used as input data vectors for SOM training, then complete linkage clustering was performed on the SOM prototype vectors. A specific protocol to optimize a SOM for structural ensembles was proposed: the optimal SOM was selected by means of a Taguchi experimental design plan applied to different data sets, and the optimal sampling rate of the MD trajectory was selected. The proposed two-level approach was applied to single trajectories of the SH3 domain independently as well as to groups of them at the same time. The results demonstrated the potential of this approach in the analysis of large ensembles of molecular structures: the possibility of producing a topological mapping of the conformational space in a simple 2D visualisation, as well as of effectively highlighting differences in the conformational dynamics directly related to biological functions. Conclusions: The use of a two-level approach combining SOMs and hierarchical clustering for conformational analysis of structural ensembles of proteins was proposed. It can easily be extended to other study cases and to conformational ensembles from other sources. PMID:21569575
[Optimization of cluster analysis based on drug resistance profiles of MRSA isolates].

PubMed

Tani, Hiroya; Kishi, Takahiko; Gotoh, Minehiro; Yamagishi, Yuka; Mikamo, Hiroshige

2015-12-01

We examined 402 methicillin-resistant Staphylococcus aureus (MRSA) strains isolated from clinical specimens in our hospital between November 19, 2010 and December 27, 2011 to evaluate the similarity between cluster analysis of drug susceptibility tests and pulsed-field gel electrophoresis (PFGE). The results showed that the 402 strains tested were classified into 27 PFGE patterns (151 subtypes of patterns). Cluster analyses of drug susceptibility tests, with the cut-off distance chosen to yield a similar classification capability, showed favorable results. Using the MIC method, in which minimum inhibitory concentration (MIC) values were used directly, the level of agreement with PFGE was 74.2% when the number of drugs tested was 15, the method used for clustering was the Unweighted Pair Group Method with Arithmetic mean (UPGMA), and the cut-off distance was 16. Using the SIR method, in which susceptible (S), intermediate (I), and resistant (R) were coded as 0, 2, and 3, respectively, according to the Clinical and Laboratory Standards Institute (CLSI) criteria, the level of agreement with PFGE was 75.9% when the number of drugs tested was 17, the method used for clustering was UPGMA, and the cut-off distance was 3.6. In addition, to assess the reproducibility of the results, 10 strains were randomly sampled from the overall test and subjected to cluster analysis. This was repeated 100 times under the same conditions. The results indicated good reproducibility, with the level of agreement with PFGE showing a mean of 82.0%, standard deviation of 12.1%, and mode of 90.0% for the MIC method and a mean of 80.0%, standard deviation of 13.4%, and mode of 90.0% for the SIR method. In summary, cluster analysis of drug susceptibility tests is useful for the epidemiological analysis of MRSA.
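The SIR method described above can be sketched as follows. This is only an illustration under stated assumptions: the S/I/R coding (0, 2, 3), the UPGMA linkage and the cut-off of 3.6 come from the abstract, while the Euclidean metric, the function names, and the four toy susceptibility profiles are hypothetical.

```python
import math
from itertools import combinations

SIR_CODE = {"S": 0, "I": 2, "R": 3}  # coding reported in the abstract


def encode(profile):
    """Turn a string such as 'SSRI' (one letter per drug) into a numeric vector."""
    return [SIR_CODE[c] for c in profile]


def euclidean(a, b):
    # Assumption: the abstract does not state which distance metric was used.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def upgma_clusters(vectors, cutoff):
    """Average-linkage (UPGMA) agglomeration, stopped at a cut-off distance."""
    clusters = [[i] for i in range(len(vectors))]

    def avg_dist(c1, c2):
        # UPGMA distance: mean of all pairwise distances between two clusters.
        total = sum(euclidean(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: avg_dist(clusters[p[0]], clusters[p[1]]))
        if avg_dist(clusters[a], clusters[b]) > cutoff:
            break  # closest pair is already farther apart than the cut-off
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters


# Hypothetical profiles for four isolates tested against four drugs.
profiles = ["SSRR", "SSRI", "RRSS", "RISS"]
groups = upgma_clusters([encode(p) for p in profiles], cutoff=3.6)
```

Because average linkage merges at non-decreasing distances, stopping at the first merge above the cut-off gives the same groups as cutting the full dendrogram at that height.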
Development of an automated energy audit protocol for office buildings

NASA Astrophysics Data System (ADS)

Deb, Chirag

This study aims to enhance the building energy audit process, and bring about reduction in time and cost requirements in the conduction of a full physical audit. For this, a total of 5 Energy Service Companies in Singapore have collaborated and provided energy audit reports for 62 office buildings. Several statistical techniques are adopted to analyse these reports. These techniques comprise cluster analysis and development of prediction models to predict energy savings for buildings. The cluster analysis shows that there are 3 clusters of buildings experiencing different levels of energy savings. To understand the effect of building variables on the change in EUI, a robust iterative process for selecting the appropriate variables is developed. The results show that the 4 variables of GFA, non-air-conditioning energy consumption, average chiller plant efficiency and installed capacity of chillers should be taken for clustering. This analysis is extended to the development of prediction models using linear regression and artificial neural networks (ANN). An exhaustive variable selection algorithm is developed to select the input variables for the two energy saving prediction models. The results show that the ANN prediction model can predict the energy saving potential of a given building with an accuracy of +/-14.8%.
Evidence of new species for malaria vector Anopheles nuneztovari sensu lato in the Brazilian Amazon region.

PubMed

Scarpassa, Vera Margarete; Cunha-Machado, Antonio Saulo; Saraiva, José Ferreira

2016-04-12

Anopheles nuneztovari sensu lato comprises cryptic species in northern South America, and the Brazilian populations encompass distinct genetic lineages within the Brazilian Amazon region. This study investigated, based on two molecular markers, whether these lineages might actually deserve species status. Specimens were collected in five localities of the Brazilian Amazon, including Manaus, Careiro Castanho and Autazes, in the State of Amazonas; Tucuruí, in the State of Pará; and Abacate da Pedreira, in the State of Amapá, and analysed for the COI gene (Barcode region) and 12 microsatellite loci. Phylogenetic analyses were performed using the maximum likelihood (ML) approach. Intra- and inter-sample genetic diversity was estimated using population genetics analyses, and the genetic groups were identified by means of the ML, Bayesian and factorial correspondence analyses and the Bayesian analysis of population structure. The Barcode region dataset (N = 103) generated 27 haplotypes. The haplotype network suggested three lineages. The ML tree retrieved five monophyletic groups. Group I clustered all specimens from Manaus and Careiro Castanho, the majority of Autazes and a few from Abacate da Pedreira. Group II clustered most of the specimens from Abacate da Pedreira and a few from Autazes and Tucuruí. Group III clustered only specimens from Tucuruí (lineage III), strongly supported (97%). Groups IV and V clustered specimens of A. nuneztovari s.s. and A. dunhami, strongly (98%) and weakly (70%) supported, respectively. In the second phylogenetic analysis, the sequences from GenBank, identified as A. goeldii, clustered to groups I and II, but not to group III. Genetic distances (Kimura 2-parameter model) among the groups ranged from 1.60% (between I and II) to 2.32% (between I and III). Microsatellite data revealed very high intra-population genetic variability. Genetic distances showed the highest and significant values (P = 0.005) between Tucuruí and all the other samples, and between Abacate da Pedreira and all the other samples. Genetic distances, Bayesian (Structure and BAPS) analyses and FCA suggested three distinct biological groups, supporting the barcode region results. The two markers revealed three genetic lineages for A. nuneztovari s.l. in the Brazilian Amazon region. Lineages I and II may represent genetically distinct groups or species within A. goeldii. Lineage III may represent a new species, distinct from the A. goeldii group, and may be the most ancestral in the Brazilian Amazon. They may have differences in Plasmodium susceptibility and should therefore be investigated further.
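For readers unfamiliar with the Kimura 2-parameter distance reported above, the sketch below shows how it is computed for one pair of aligned sequences, using d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions. The function name and the toy sequences are hypothetical; the study's own values are averages between groups obtained with dedicated software.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only sites where both sequences have an unambiguous base.
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    n = len(pairs)
    differences = [(a, b) for a, b in pairs if a != b]
    # Transition: purine <-> purine or pyrimidine <-> pyrimidine.
    transitions = sum(1 for a, b in differences
                      if {a, b} <= PURINES or {a, b} <= PYRIMIDINES)
    transversions = len(differences) - transitions
    p = transitions / n
    q = transversions / n
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Hypothetical 100-bp sequences differing by a single A->G transition.
reference = "A" * 100
variant = "G" + "A" * 99
distance = k2p_distance(reference, variant)
```

The corrected distance is slightly larger than the raw proportion of differing sites (0.01 here) because the model allows for multiple substitutions at the same site.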
Genetic Markers Analyses and Bioinformatic Approaches to Distinguish Between Olive Tree (Olea europaea L.) Cultivars.

PubMed

Ben Ayed, Rayda; Ben Hassen, Hanen; Ennouri, Karim; Rebai, Ahmed

2016-12-01

The genetic diversity of 22 olive tree cultivars (Olea europaea L.) sampled from different Mediterranean countries was assessed using 5 SNP markers (FAD2.1, FAD2.3, CALC, SOD and ANTHO3) located in four different genes. The genotyping analysis of the 22 cultivars with 5 SNP loci revealed 11 alleles (average 2.2 per locus). The dendrogram based on cultivar genotypes revealed three clusters consistent with the cultivar classification. In addition, the results obtained with the five SNPs were compared to those obtained with SSR markers using bioinformatic analyses and by computing a cophenetic correlation coefficient, indicating the usefulness of the UPGMA method for clustering plant genotypes. In a principal coordinate analysis using a similarity matrix, the first two coordinates explained 54.94% of the total variance. This work provides a more comprehensive explanation of the diversity available in Tunisian olive cultivars, and an important contribution to olive breeding and olive oil authenticity.
Study of quantitative and qualitative variations in essential oils of Sicilian Rosmarinus officinalis L.

PubMed

Tuttolomondo, Teresa; Dugo, Giacomo; Ruberto, Giuseppe; Leto, Claudio; Napoli, Edoardo M; Cicero, Nicola; Gervasi, Teresa; Virga, Giuseppe; Leone, Raffaele; Licata, Mario; La Bella, Salvatore

2015-01-01

In this study, the chemical characterisation of the essential oils of 10 Sicilian Rosmarinus officinalis L. biotypes is reported. The main goal of this work was to analyse the relationship between the essential oil yield and the geographical distribution of the plants. The essential oils were analysed by GC-FID and GC-MS. Hierarchical cluster analysis and principal component analysis were used to cluster biotypes according to the chemical composition of their essential oils. The essential oil yield ranged from 0.8 to 2.3 (v/w). In total, 82 compounds were identified, representing 96.7-99.9% of the essential oil. The most represented compounds in the essential oils were 1,8-cineole, linalool, α-terpineol, verbenone, α-pinene, limonene, bornyl acetate and terpinolene. The results show that the essential oil yield of the 10 biotypes is affected by the environmental characteristics of the sampling sites, while the chemical composition is linked to the genetic characteristics of the different biotypes.
Salient concerns in using analgesia for cancer pain among outpatients: A cluster analysis study.

PubMed

Meghani, Salimah H; Knafl, George J

2017-02-10

To identify unique clusters of patients based on their concerns in using analgesia for cancer pain and predictors of the cluster membership. This was a 3-month prospective observational study (n = 207). Patients were included if they were adults (≥ 18 years), diagnosed with solid tumors or multiple myelomas, and had at least one prescription of around-the-clock pain medication for cancer or cancer-treatment-related pain. Patients were recruited from two outpatient medical oncology clinics within a large health system in Philadelphia. A choice-based conjoint (CBC) analysis experiment was used to elicit analgesic treatment preferences (utilities). Patients employed trade-offs based on five analgesic attributes (percent relief from analgesics, type of analgesic, type of side-effects, severity of side-effects, out of pocket cost). Patients were clustered based on CBC utilities using novel adaptive statistical methods. Multiple logistic regression was used to identify predictors of cluster membership. The analyses found 4 unique clusters. Most patients made trade-offs based on the expectation of pain relief (cluster 1, 41%). For two subsets, the main underlying concern was the type of analgesic prescribed, i.e., opioid vs non-opioid (cluster 2, 11%), and the type of analgesic side effects (cluster 4, 21%), respectively. About one in four made trade-offs based on multiple concerns simultaneously, including pain relief, type of side effects, and severity of side effects (cluster 3, 28%). In multivariable analysis to identify predictors of cluster membership, clinical and socioeconomic factors (education, health literacy, income, social support) rather than analgesic attitudes and beliefs were found important; only one belief, i.e., that pain medications can mask changes in health or keep you from knowing what is going on in your body, was found significant in predicting two of the four clusters [cluster 1 (-); cluster 4 (+)]. Most patients appear to be driven by a single salient concern in using analgesia for cancer pain. Addressing these concerns, perhaps through real time clinical assessments, may improve patients' analgesic adherence patterns and cancer pain outcomes.
Highly dynamically evolved intermediate-age open clusters

NASA Astrophysics Data System (ADS)

Piatti, Andrés E.; Dias, Wilton S.; Sampedro, Laura M.

2017-04-01

We present a comprehensive UBVRI and Washington CT1T2 photometric analysis of seven catalogued open clusters, namely: Ruprecht 3, 9, 37, 74, 150, ESO 324-15 and 436-2. The multiband photometric data sets in combination with 2MASS photometry and Gaia astrometry for the brighter stars were used to estimate their structural parameters and fundamental astrophysical properties. We found that Ruprecht 3 and ESO 436-2 do not show self-consistent evidence of being physical systems. The remaining studied objects are open clusters of intermediate age (9.0 ≤ log(t/yr) ≤ 9.6), of relatively small size (r_cls ~ 0.4-1.3 pc) and placed between 0.6 and 2.9 kpc from the Sun. We analysed the relationships between core, half-mass, tidal and Jacobi radii as well as half-mass relaxation times to conclude that the studied clusters are in an evolved dynamical stage. The total cluster masses obtained by summing those of the observed cluster stars turned out to be ~10-15 per cent of the masses of open clusters of similar age located closer than 2 kpc from the Sun. We found that cluster stars occupy volumes as large as those for tidally filled clusters.
Identification of Reliable Components in Multivariate Curve Resolution-Alternating Least Squares (MCR-ALS): a Data-Driven Approach across Metabolic Processes.

PubMed

Motegi, Hiromi; Tsuboi, Yuuri; Saga, Ayako; Kagami, Tomoko; Inoue, Maki; Toki, Hideaki; Minowa, Osamu; Noda, Tetsuo; Kikuchi, Jun

2015-11-04

There is an increasing need to use multivariate statistical methods for understanding biological functions, identifying the mechanisms of diseases, and exploring biomarkers. In addition to classical analyses such as hierarchical cluster analysis, principal component analysis, and partial least squares discriminant analysis, various multivariate strategies, including independent component analysis, non-negative matrix factorization, and multivariate curve resolution, have recently been proposed. However, determining the number of components is problematic. Despite the proposal of several different methods, no satisfactory approach has yet been reported. To resolve this problem, we implemented a new idea: classifying a component as "reliable" or "unreliable" based on the reproducibility of its appearance, regardless of the number of components in the calculation. Using the clustering method for classification, we applied this idea to multivariate curve resolution-alternating least squares (MCR-ALS). Comparisons between conventional and modified methods applied to proton nuclear magnetic resonance (¹H-NMR) spectral datasets derived from known standard mixtures and biological mixtures (urine and feces of mice) revealed that more plausible results are obtained by the modified method. In particular, clusters containing little information were detected with reliability. This strategy, named "cluster-aided MCR-ALS," will facilitate the attainment of more reliable results in metabolomics datasets.
Exploring root symbiotic programs in the model legume Medicago truncatula using EST analysis.

PubMed

Journet, Etienne-Pascal; van Tuinen, Diederik; Gouzy, Jérome; Crespeau, Hervé; Carreau, Véronique; Farmer, Mary-Jo; Niebel, Andreas; Schiex, Thomas; Jaillon, Olivier; Chatagnier, Odile; Godiard, Laurence; Micheli, Fabienne; Kahn, Daniel; Gianinazzi-Pearson, Vivienne; Gamas, Pascal

2002-12-15

We report on a large-scale expressed sequence tag (EST) sequencing and analysis program aimed at characterizing the sets of genes expressed in roots of the model legume Medicago truncatula during interactions with either of two microsymbionts, the nitrogen-fixing bacterium Sinorhizobium meliloti or the arbuscular mycorrhizal fungus Glomus intraradices. We have designed specific tools for in silico analysis of EST data, in relation to chimeric cDNA detection, EST clustering, encoded protein prediction, and detection of differential expression. Our 21 473 5'- and 3'-ESTs could be grouped into 6359 EST clusters, corresponding to distinct virtual genes, along with 52 498 other M. truncatula ESTs available in the dbEST (NCBI) database that were recruited in the process. These clusters were manually annotated, using a specifically developed annotation interface. Analysis of EST cluster distribution in various M. truncatula cDNA libraries, supported by a refined R test to evaluate statistical significance and by 'electronic northern' representation, enabled us to identify a large number of novel genes predicted to be up- or down-regulated during either symbiotic root interaction. These in silico analyses provide a first global view of the genetic programs for root symbioses in M. truncatula. A searchable database has been built and can be accessed through a public interface.
Deriving temperature, mass, and age of evolved stars from high-resolution spectra. Application to field stars and the open cluster IC 4651

NASA Astrophysics Data System (ADS)

Biazzo, K.; Pasquini, L.; Girardi, L.; Frasca, A.; da Silva, L.; Setiawan, J.; Marilli, E.; Hatzes, A. P.; Catalano, S.

2007-12-01

Aims: We test our capability of deriving stellar physical parameters of giant stars by analysing a sample of field stars and the well studied open cluster IC 4651 with different spectroscopic methods. Methods: The use of a technique based on line-depth ratios (LDRs) allows us to determine with high precision the effective temperature of the stars and to compare the results with those obtained with a classical LTE abundance analysis. Results: (i) For the field stars we find that the temperatures derived by means of the LDR method are in excellent agreement with those found by the spectral synthesis. This result is extremely encouraging because it shows that spectra can be used to firmly derive population characteristics (e.g., mass and age) of the observed stars. (ii) For the IC 4651 stars we use the determined effective temperature to derive the following results. a) The reddening E(B-V) of the cluster is 0.12±0.02, largely independent of the color-temperature calibration used. b) The age of the cluster is 1.2±0.2 Gyr. c) The typical mass of the analysed giant stars is 2.0±0.2 M⊙. Moreover, we find a systematic difference of about 0.2 dex in log g between spectroscopic and evolutionary values. Conclusions: We conclude that, in spite of known limitations, a classical spectroscopic analysis of giant stars may indeed result in very reliable stellar parameters. We caution that the quality of the agreement, on the other hand, depends on the details of the adopted spectroscopic analysis. Based on observations collected at the ESO telescopes at the Paranal and La Silla Observatories, Chile.
Clustering of lifestyle risk behaviours among residents of forty deprived neighbourhoods in London: lessons for targeting public health interventions.

PubMed

Watts, P; Buck, D; Netuveli, G; Renton, A

2016-06-01

Clustering of lifestyle risk behaviours is very important in predicting premature mortality. Understanding the extent to which risk behaviours are clustered in deprived communities is vital to most effectively target public health interventions. We examined co-occurrence and associations between risk behaviours (smoking, alcohol consumption, poor diet, low physical activity and high sedentary time) reported by adults living in deprived London neighbourhoods. Associations between sociodemographic characteristics and clustered risk behaviours were examined. Latent class analysis was used to identify underlying clustering of behaviours. Over 90% of respondents reported at least one risk behaviour. Reporting specific risk behaviours predicted reporting of further risk behaviours. Latent class analyses revealed four underlying classes. Membership of a maximal risk behaviour class was more likely for young, white males who were unable to work. Compared with recent national level analysis, there was a weaker relationship between education and clustering of behaviours and a very high prevalence of clustering of risk behaviours in those unable to work. Young, white men who report difficulty managing on income were at high risk of reporting multiple risk behaviours. These groups may be an important target for interventions to reduce premature mortality caused by multiple risk behaviours. © The Author 2015. Published by Oxford University Press on behalf of Faculty of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Modifiable lifestyle behavior patterns, sedentary time and physical activity contexts: a cluster analysis among middle school boys and girls in the SALTA study.

PubMed

Marques, Elisa A; Pizarro, Andreia N; Figueiredo, Pedro; Mota, Jorge; Santos, Maria P

2013-06-01

To analyze how modifiable health-related variables are clustered and associated with children's participation in play, active travel and structured exercise and sport among boys and girls. Data were collected from 9 middle-schools in Porto (Portugal) area. A total of 636 children in the 6th grade (340 girls and 296 boys) with a mean age of 11.64 years old participated in the study. Cluster analyses were used to identify patterns of lifestyle and healthy/unhealthy behaviors. Multinomial logistic regression analysis was used to estimate associations between cluster allocation, sedentary time and participation in three different physical activity (PA) contexts: play, active travel, and structured exercise/sport. Four distinct clusters were identified based on four lifestyle risk factors. The most disadvantaged cluster was characterized by high body mass index, low high-density lipoprotein cholesterol and cardiorespiratory fitness and a moderate level of moderate to vigorous PA. Everyday outdoor play (OR=1.85, 95%CI 0.318-0.915) and structured exercise/sport (OR=1.85, 95%CI 0.291-0.990) were associated with healthier lifestyle patterns. There were no significant associations between health patterns and sedentary time or travel mode. Outdoor play and sport/exercise participation seem more important than active travel from school in influencing children's healthy cluster profiles. Copyright © 2013 Elsevier Inc. All rights reserved.
Towards Tunable Consensus Clustering for Studying Functional Brain Connectivity During Affective Processing.

PubMed

Liu, Chao; Abu-Jamous, Basel; Brattico, Elvira; Nandi, Asoke K

2017-03-01

In the past decades, neuroimaging of humans has gained a position of status within neuroscience, and data-driven approaches and functional connectivity analyses of functional magnetic resonance imaging (fMRI) data are increasingly favored to depict the complex architecture of human brains. However, the reliability of these findings is jeopardized by too many analysis methods and sometimes too few samples used, which leads to discord among researchers. We propose a tunable consensus clustering paradigm that aims at overcoming the clustering methods selection problem as well as reliability issues in neuroimaging by means of first applying several analysis methods (three in this study) on multiple datasets and then integrating the clustering results. To validate the method, we applied it to a complex fMRI experiment involving affective processing of hundreds of music clips. We found that brain structures related to visual, reward, and auditory processing have intrinsic spatial patterns of coherent neuroactivity during affective processing. The comparisons between the results obtained from our method and those from each individual clustering algorithm demonstrate that our paradigm has notable advantages over traditional single clustering algorithms in being able to evidence robust connectivity patterns even with complex neuroimaging data involving a variety of stimuli and affective evaluations of them. The consensus clustering method is implemented in the R package "UNCLES" available on http://cran.r-project.org/web/packages/UNCLES/index.html .
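The idea of integrating the results of several clustering methods can be sketched with a co-association matrix. This is a generic consensus technique chosen here for illustration, not the UNCLES algorithm itself; the function names, the toy partitions, and the agreement threshold are all assumptions made for this sketch.

```python
def coassociation(partitions):
    """Fraction of partitions in which each pair of items shares a cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(partitions)
    return [[c / m for c in row] for row in counts]


def consensus_groups(matrix, threshold):
    """Group items connected by pairwise agreement >= threshold."""
    n = len(matrix)
    group = [None] * n
    next_id = 0
    for start in range(n):
        if group[start] is not None:
            continue
        group[start] = next_id
        stack = [start]
        while stack:  # depth-first walk over sufficiently strong links
            i = stack.pop()
            for j in range(n):
                if group[j] is None and matrix[i][j] >= threshold:
                    group[j] = next_id
                    stack.append(j)
        next_id += 1
    return group


# Hypothetical labels for five items from three clustering methods.
# Label values are arbitrary; only co-membership matters.
partitions = [
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1],
]
agreement = coassociation(partitions)
strict = consensus_groups(agreement, threshold=1.0)   # unanimous agreement only
relaxed = consensus_groups(agreement, threshold=0.6)  # majority agreement
```

Raising the threshold keeps only groupings on which every method agrees, so the contested item ends up alone; lowering it lets the majority decide. This loosely mirrors the "tunable" aspect described in the abstract.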
Suicide methods in children and adolescents.

PubMed

Kõlves, Kairi; de Leo, Diego

2017-02-01

There are notable differences in suicide methods between countries. The aim of this paper is to analyse and describe suicide methods in children and adolescents aged 10-19 years in different countries/territories worldwide. Suicide data by ICD-10 X codes were obtained from the WHO Mortality Database and population data from the World Bank. In total, 101 countries or territories have data for at least 5 years in 2000-2009. Cluster analysis by suicide methods was performed for countries/territories with at least 10 suicide cases separately by gender (74 for males and 71 for females) in 2000-2009. The most frequent suicide method was hanging, followed by poisoning by pesticides for females and firearms for males. Cluster analyses of similarities in the country/territory level suicide method patterns by gender identified four clusters for both genders. Hanging and poisoning by pesticides defined the clusters of countries/territories by their suicide patterns in youth for both genders. In addition, a mixed method and a jumping from height cluster were identified for females and two mixed method clusters for males. A number of geographical similarities were observed. Overall, the patterns of suicide methods in children and adolescents reflect lethality, availability and acceptability of suicide means similarly to country specific patterns of all ages. Means restriction has very good potential in preventing youth suicides in different countries. It is also crucial to consider cognitive availability influenced by sensationalised media reporting and/or provision of technical details about specific methods.

Early Environment and Neurobehavioral Development Predict Adult Temperament Clusters

PubMed Central

Congdon, Eliza; Service, Susan; Wessman, Jaana; Seppänen, Jouni K.; Schönauer, Stefan; Miettunen, Jouko; Turunen, Hannu; Koiranen, Markku; Joukamaa, Matti; Järvelin, Marjo-Riitta; Veijola, Juha; Mannila, Heikki; Paunio, Tiina; Freimer, Nelson B.

2012-01-01

Background: Investigation of the environmental influences on human behavioral phenotypes is important for our understanding of the causation of psychiatric disorders. However, there are complexities associated with the assessment of environmental influences on behavior. Methods/Principal Findings: We conducted a series of analyses using a prospective, longitudinal study of a nationally representative birth cohort from Finland (the Northern Finland 1966 Birth Cohort). Participants included a total of 3,761 male and female cohort members who were living in Finland at the age of 16 years and who had complete temperament scores. Our initial analyses (Wessman et al., in press) provide evidence in support of four stable and robust temperament clusters. Using these temperament clusters, as well as independent temperament dimensions for comparison, we conducted a data-driven analysis to assess the influence of a broad set of life course measures, assessed pre-natally, in infancy, and during adolescence, on adult temperament. Results: Measures of early environment, neurobehavioral development, and adolescent behavior significantly predict adult temperament, classified by both cluster membership and temperament dimensions. Specifically, our results suggest that a relatively consistent set of life course measures is associated with adult temperament profiles, including maternal education, characteristics of the family’s location and residence, adolescent academic performance, and adolescent smoking. Conclusions: Our finding that a consistent set of life course measures predicts temperament clusters indicates that these clusters represent distinct developmental temperament trajectories and that information about a subset of life course measures has implications for adult health outcomes. PMID:22815688
Whole-Genome and Epigenomic Landscapes of Etiologically Distinct Subtypes of Cholangiocarcinoma

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jusakul, Apinya; Cutcutache, Ioana; Yong, Chern Han

Cholangiocarcinoma (CCA) is a hepatobiliary malignancy exhibiting high incidence in countries with endemic liver-fluke infection. We analysed 489 CCAs from 10 countries, combining whole-genome (71 cases), targeted/exome, copy-number, gene expression, and DNA methylation information. Integrative clustering defined four CCA clusters - Fluke-Positive CCAs (Clusters 1/2) are enriched in ERBB2 amplifications and TP53 mutations, conversely Fluke-Negative CCAs (Clusters 3/4) exhibit high copy-number alterations and PD-1/PD-L2 expression, or epigenetic mutations (IDH1/2, BAP1) and FGFR/PRKA-related gene rearrangements. Whole-genome analysis highlighted FGFR2 3’UTR deletion as a mechanism of FGFR2 upregulation. Integration of non-coding promoter mutations with protein-DNA binding profiles demonstrates pervasive modulation of H3K27me3-associated sites in CCA. Clusters 1 and 4 exhibit distinct DNA hypermethylation patterns targeting either CpG islands or shores - mutation signature and subclonality analysis suggests that these reflect different mutational pathways. Lastly, our results exemplify how genetics, epigenetics and environmental carcinogens can interplay across different geographies to generate distinct molecular subtypes of cancer.
Whole-Genome and Epigenomic Landscapes of Etiologically Distinct Subtypes of Cholangiocarcinoma

DOE PAGES

Jusakul, Apinya; Cutcutache, Ioana; Yong, Chern Han; ...

2017-06-30

Cholangiocarcinoma (CCA) is a hepatobiliary malignancy exhibiting high incidence in countries with endemic liver-fluke infection. We analysed 489 CCAs from 10 countries, combining whole-genome (71 cases), targeted/exome, copy-number, gene expression, and DNA methylation information. Integrative clustering defined four CCA clusters - Fluke-Positive CCAs (Clusters 1/2) are enriched in ERBB2 amplifications and TP53 mutations, conversely Fluke-Negative CCAs (Clusters 3/4) exhibit high copy-number alterations and PD-1/PD-L2 expression, or epigenetic mutations (IDH1/2, BAP1) and FGFR/PRKA-related gene rearrangements. Whole-genome analysis highlighted FGFR2 3’UTR deletion as a mechanism of FGFR2 upregulation. Integration of non-coding promoter mutations with protein-DNA binding profiles demonstrates pervasive modulation of H3K27me3-associated sites in CCA. Clusters 1 and 4 exhibit distinct DNA hypermethylation patterns targeting either CpG islands or shores - mutation signature and subclonality analysis suggests that these reflect different mutational pathways. Lastly, our results exemplify how genetics, epigenetics and environmental carcinogens can interplay across different geographies to generate distinct molecular subtypes of cancer.
Symptom clusters predict mortality among dialysis patients in Norway: a prospective observational cohort study.

PubMed

Amro, Amin; Waldum, Bård; von der Lippe, Nanna; Brekke, Fredrik Barth; Dammen, Toril; Miaskowski, Christine; Os, Ingrid

2015-01-01

Patients with end-stage renal disease on dialysis have reduced survival rates compared with the general population. Symptoms are frequent in dialysis patients, and a symptom cluster is defined as two or more related co-occurring symptoms. The aim of this study was to explore the associations between symptom clusters and mortality in dialysis patients. In a prospective observational cohort study of dialysis patients (n = 301), Kidney Disease and Quality of Life Short Form and Beck Depression Inventory questionnaires were administered. To generate symptom clusters, principal component analysis with varimax rotation was used on 11 kidney-specific self-reported physical symptoms. A Beck Depression Inventory score of 16 or greater was defined as clinically significant depressive symptoms. Physical and mental component summary scores were generated from Short Form-36. Multivariate Cox regression analysis was used for the survival analysis; Kaplan-Meier curves and log-rank statistics were applied to compare survival rates between the groups. Three different symptom clusters were identified; one included loading of several uremic symptoms. In multivariate analyses and after adjustment for health-related quality of life and depressive symptoms, the worst perceived quartile of the "uremic" symptom cluster independently predicted all-cause mortality (hazard ratio 2.47, 95% CI 1.44-4.22, P = 0.001) compared with the other quartiles during a follow-up period that ranged from four to 52 months. The two other symptom clusters ("neuromuscular" and "skin") or the individual symptoms did not predict mortality. Clustering of uremic symptoms predicted mortality. Assessing co-occurring symptoms rather than single symptoms may help to identify dialysis patients at high risk for mortality. Copyright © 2015 American Academy of Hospice and Palliative Medicine. Published by Elsevier Inc. All rights reserved.
Individual participant data meta-analyses should not ignore clustering

PubMed Central

Abo-Zaid, Ghada; Guo, Boliang; Deeks, Jonathan J.; Debray, Thomas P.A.; Steyerberg, Ewout W.; Moons, Karel G.M.; Riley, Richard David

2013-01-01

Objectives: Individual participant data (IPD) meta-analyses often analyze their IPD as if coming from a single study. We compare this approach with analyses that rather account for clustering of patients within studies. Study Design and Setting: Comparison of effect estimates from logistic regression models in real and simulated examples. Results: The estimated prognostic effect of age in patients with traumatic brain injury is similar, regardless of whether clustering is accounted for. However, a family history of thrombophilia is found to be a diagnostic marker of deep vein thrombosis [odds ratio, 1.30; 95% confidence interval (CI): 1.00, 1.70; P = 0.05] when clustering is accounted for but not when it is ignored (odds ratio, 1.06; 95% CI: 0.83, 1.37; P = 0.64). Similarly, the treatment effect of nicotine gum on smoking cessation is severely attenuated when clustering is ignored (odds ratio, 1.40; 95% CI: 1.02, 1.92) rather than accounted for (odds ratio, 1.80; 95% CI: 1.29, 2.52). Simulations show models accounting for clustering perform consistently well, but downwardly biased effect estimates and low coverage can occur when ignoring clustering. Conclusion: Researchers must routinely account for clustering in IPD meta-analyses; otherwise, misleading effect estimates and conclusions may arise. PMID:23651765
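Why ignoring clustering can distort an odds ratio can be illustrated with a small sketch. The abstract above describes logistic regression models; the sketch below swaps in the simpler Mantel-Haenszel stratified estimator as the "clustering accounted for" analysis, and the 2x2 counts are invented for illustration only, not taken from the paper.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a, b: events and non-events among exposed participants
    c, d: events and non-events among unexposed participants
    """
    return (a * d) / (b * c)


def pooled_odds_ratio(tables):
    """Ignore clustering: collapse all studies into a single 2x2 table."""
    a = sum(t[0] for t in tables)
    b = sum(t[1] for t in tables)
    c = sum(t[2] for t in tables)
    d = sum(t[3] for t in tables)
    return odds_ratio(a, b, c, d)


def mantel_haenszel_odds_ratio(tables):
    """Account for clustering: combine within-study comparisons only."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


# Hypothetical studies (a, b, c, d). Each has a within-study odds ratio of 2,
# but baseline risk and the share of exposed participants differ by study.
studies = [
    (40, 60, 100, 300),  # high baseline risk, mostly unexposed
    (20, 480, 2, 96),    # low baseline risk, mostly exposed
]

within = [odds_ratio(*t) for t in studies]         # 2.0 in each study
pooled = pooled_odds_ratio(studies)                # about 0.43
stratified = mantel_haenszel_odds_ratio(studies)   # about 2.0
```

In this constructed example the collapsed table even reverses the direction of the effect, because study membership is associated with both exposure and outcome. Real IPD data are rarely this extreme, but the mechanism is the same.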
A simple algorithm for the identification of clinical COPD phenotypes.

PubMed

Burgel, Pierre-Régis; Paillasseur, Jean-Louis; Janssens, Wim; Piquet, Jacques; Ter Riet, Gerben; Garcia-Aymerich, Judith; Cosio, Borja; Bakke, Per; Puhan, Milo A; Langhammer, Arnulf; Alfageme, Inmaculada; Almagro, Pere; Ancochea, Julio; Celli, Bartolome R; Casanova, Ciro; de-Torres, Juan P; Decramer, Marc; Echazarreta, Andrés; Esteban, Cristobal; Gomez Punter, Rosa Mar; Han, MeiLan K; Johannessen, Ane; Kaiser, Bernhard; Lamprecht, Bernd; Lange, Peter; Leivseth, Linda; Marin, Jose M; Martin, Francis; Martinez-Camblor, Pablo; Miravitlles, Marc; Oga, Toru; Sofia Ramírez, Ana; Sin, Don D; Sobradillo, Patricia; Soler-Cataluña, Juan J; Turner, Alice M; Verdu Rivera, Francisco Javier; Soriano, Joan B; Roche, Nicolas

2017-11-01

This study aimed to identify simple rules for allocating chronic obstructive pulmonary disease (COPD) patients to clinical phenotypes identified by cluster analyses. Data from 2409 COPD patients of French/Belgian COPD cohorts were analysed using cluster analysis resulting in the identification of subgroups, for which clinical relevance was determined by comparing 3-year all-cause mortality. Classification and regression trees (CARTs) were used to develop an algorithm for allocating patients to these subgroups. This algorithm was tested in 3651 patients from the COPD Cohorts Collaborative International Assessment (3CIA) initiative. Cluster analysis identified five subgroups of COPD patients with different clinical characteristics (especially regarding severity of respiratory disease and the presence of cardiovascular comorbidities and diabetes). The CART-based algorithm indicated that the variables relevant for patient grouping differed markedly between patients with isolated respiratory disease (FEV1, dyspnoea grade) and those with multi-morbidity (dyspnoea grade, age, FEV1 and body mass index). Application of this algorithm to the 3CIA cohorts confirmed that it identified subgroups of patients with different clinical characteristics, mortality rates (median, from 4% to 27%) and age at death (median, from 68 to 76 years). A simple algorithm, integrating respiratory characteristics and comorbidities, allowed the identification of clinically relevant COPD phenotypes. Copyright ©ERS 2017.
Hepatitis A virus genotypes and strains from an endemic area of Europe, Bulgaria 2012-2014.

PubMed

Bruni, Roberto; Taffon, Stefania; Equestre, Michele; Cella, Eleonora; Lo Presti, Alessandra; Costantino, Angela; Chionne, Paola; Madonna, Elisabetta; Golkocheva-Markova, Elitsa; Bankova, Diljana; Ciccozzi, Massimo; Teoharov, Pavel; Ciccaglione, Anna Rita

2017-07-14

Hepatitis A virus (HAV) infection is endemic in Eastern European and Balkan region countries. In 2012, Bulgaria showed the highest rate (67.13 cases per 100,000) in Europe. Nevertheless, HAV genotypes and strains circulating in this country have never been described. The present study reports the molecular characterization of HAV from 105 patients from Bulgaria. Anti-HAV IgM positive serum samples collected in 2012-2014 from different towns and villages in Bulgaria were analysed by nested RT-PCR, sequencing of the VP1/2A region and phylogenetic analysis; the results were analysed together with patient and geographical data. Phylogenetic analysis revealed two main sequence groups corresponding to the IA (78/105, 74%) and IB (27/105, 26%) sub-genotypes. In the IA group, a major and a minor cluster were observed (62 and 16 sequences, respectively). Most sequences from the major cluster (44/62, 71%) belonged to either of two strains, termed "strain 1" and "strain 2", differing only by a single specific nucleotide; the remaining sequences (18/62, 29%) showed few (1 to 4) nucleotide variations with respect to strains 1 and 2. Strain 2 is identical to the strain previously responsible for an outbreak in the Czech Republic in 2008 and a large multi-country European outbreak caused by contaminated mixed frozen berries in 2013. Most sequences of the IA minor cluster and the IB group were detected in large/medium centers (LMCs). Overall, sequences from the IA major cluster were more frequent in small centers (SCs), but strain 1 and strain 2 showed an opposite relative frequency in SCs and LMCs (strain 1 more frequent in SCs, strain 2 in LMCs). Genotype IA predominated in Bulgaria in 2012-2014 and phylogenetic analysis identified a major cluster of highly related or identical IA sequences, representing 59% of the analysed cases; these isolates were mostly detected in SCs, in which HAV shows higher endemicity than in LMCs. The distribution of viral sequences suggests the existence of some differences between the transmission routes in SCs and LMCs. Molecular characterization of an increased number of isolates from Bulgaria, regularly collected over time, will be useful to explore specific transmission routes and plan appropriate preventive measures.
Cluster Analysis Identifies 3 Phenotypes within Allergic Asthma.

PubMed

Sendín-Hernández, María Paz; Ávila-Zarza, Carmelo; Sanz, Catalina; García-Sánchez, Asunción; Marcos-Vadillo, Elena; Muñoz-Bellido, Francisco J; Laffond, Elena; Domingo, Christian; Isidoro-García, María; Dávila, Ignacio

Asthma is a heterogeneous chronic disease with different clinical expressions and responses to treatment. In recent years, several unbiased approaches based on clinical, physiological, and molecular features have described several phenotypes of asthma. Some phenotypes are allergic, but little is known about whether these phenotypes can be further subdivided. We aimed to phenotype patients with allergic asthma using an unbiased approach based on multivariate classification techniques (unsupervised hierarchical cluster analysis). From a total of 54 variables of 225 patients with well-characterized allergic asthma diagnosed following American Thoracic Society (ATS) recommendation, positive skin prick test to aeroallergens, and concordant symptoms, we finally selected 19 variables by multiple correspondence analyses. Then a cluster analysis was performed. Three groups were identified. Cluster 1 was constituted by patients with intermittent or mild persistent asthma, without family antecedents of atopy, asthma, or rhinitis. This group showed the lowest total IgE levels. Cluster 2 was constituted by patients with mild asthma with a family history of atopy, asthma, or rhinitis. Total IgE levels were intermediate. Cluster 3 included patients with moderate or severe persistent asthma that needed treatment with corticosteroids and long-acting β-agonists. This group showed the highest total IgE levels. We identified 3 phenotypes of allergic asthma in our population. Furthermore, we described 2 phenotypes of mild atopic asthma mainly differentiated by a family history of allergy. Copyright © 2017 American Academy of Allergy, Asthma & Immunology. Published by Elsevier Inc. All rights reserved.
Cluster and Multiple Correspondence Analyses in Rheumatology: Paths to Uncovering Relationships in a Sea of Data.

PubMed

Han, Lu; Benseler, Susanne M; Tyrrell, Pascal N

2018-05-01

Rheumatic diseases encompass a wide range of conditions caused by inflammation and dysregulation of the immune system resulting in organ damage. Research in these heterogeneous diseases benefits from multivariate methods. The aim of this review was to describe and evaluate current literature in rheumatology regarding cluster analysis and correspondence analysis. A systematic review showed an increase in studies making use of these 2 methods. However, standardization in how these methods are applied and reported is needed. Researcher expertise was determined to be the main barrier to considering these approaches, whereas education and collaborating with a biostatistician were suggested ways forward. Copyright © 2018 Elsevier Inc. All rights reserved.
Different disease subtypes with distinct clinical expression in familial Mediterranean fever: results of a cluster analysis.

PubMed

Akar, Servet; Solmaz, Dilek; Kasifoglu, Timucin; Bilge, Sule Yasar; Sari, Ismail; Gumus, Zeynep Zehra; Tunca, Mehmet

2016-02-01

The aim of this study was to evaluate whether there are clinical subgroups that may have different prognoses among FMF patients. The cumulative clinical features of a large group of FMF patients [1168 patients, 593 (50.8%) male, mean age 35.3 years (s.d. 12.4)] were studied. To analyse our data and identify groups of FMF patients with similar clinical characteristics, a two-step cluster analysis using log-likelihood distance measures was performed. For clustering the FMF patients, we evaluated the following variables: gender, current age, age at symptom onset, age at diagnosis, presence of major clinical features, variables related with therapy and family history for FMF, renal failure and carriage of M694V. Three distinct groups of FMF patients were identified. Cluster 1 was characterized by a high prevalence of arthritis, pleuritis, erysipelas-like erythema (ELE) and febrile myalgia. The dosage of colchicine and the frequency of amyloidosis were lower in cluster 1. Patients in cluster 2 had an earlier age of disease onset and diagnosis. M694V carriage and amyloidosis prevalence were the highest in cluster 2. This group of patients was using the highest dose of colchicine. Patients in cluster 3 had the lowest prevalence of arthritis, ELE and febrile myalgia. The frequencies of M694V carriage and amyloidosis were lower in cluster 3 than the overall FMF patients. Non-response to colchicine was also slightly lower in cluster 3. Patients with FMF can be clustered into distinct patterns of clinical and genetic manifestations and these patterns may have different prognostic significance. © The Author 2015. Published by Oxford University Press on behalf of the British Society for Rheumatology. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Analysis of Rainfall and PM2.5 Data Using Clustered Trajectory Analysis for National Park Sites in the Western U.S.

NASA Astrophysics Data System (ADS)

Solorzano, N. N.; Hafner, W.; Jaffe, D.

2005-12-01

We calculated daily kinematic back-trajectories using the NOAA-HYSPLIT model to analyze 7 years of PM2.5 data from National Park sites in the Western U.S. (Glacier N.P., Mount Rainier N.P., Sequoia N.P., Rocky Mountain N.P. and Denali N.P.). The back-trajectories were clustered using a k-means clustering algorithm to segregate the trajectories into 6 main transport patterns. We calculated trajectory clusters for 1, 5 and 10 days to represent short, medium and long-range flow patterns. Some trajectory types and clusters show marked seasonality. Generally, faster flow patterns are more prevalent in winter and slower/stagnant patterns are more prevalent in summer. In addition, we found significant inter-annual variability that may be important for explaining variations in rainfall and/or pollutant concentrations. The 5 and 10-day analyses revealed that, for the 4 non-Alaskan sites, trajectories from Asia tend to be less frequent in the summer, compared to the rest of the year. The clusters of different duration show very different predictive power for rainfall and PM2.5. We found that the 1-day clusters are a better predictor for precipitation and PM2.5 concentrations, as compared to the 5 and 10-day clusters. At each of the sites, there is at least one cluster with an average PM2.5 concentration that is different than the average for the site, indicating distinctive transport patterns. The same is true for 5 and 10-day clusters. Interestingly, only one site, Mount Rainier N.P., shows seasonal differences in PM2.5 concentrations between the clusters that differ from the average.
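The trajectory clustering step can be sketched as plain k-means on flattened trajectories. This is a minimal illustration under stated assumptions: the coordinates are invented, plain Euclidean distance on latitude/longitude is used for simplicity, and the first k trajectories seed the centroids, whereas a real analysis would typically use several random starts and a distance suited to geographic paths.

```python
def flatten(trajectory):
    """Turn [(lat, lon), ...] into one feature vector [lat0, lon0, lat1, ...]."""
    return [coord for point in trajectory for coord in point]


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, k, iterations=50):
    """Lloyd's k-means with a deterministic start (first k vectors)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: nearest centroid for every vector.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(v, centroids[j]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stable, so we have converged
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical back-trajectories, each sampled at three time steps.
trajectories = [
    [(47, -122), (47, -130), (47, -138)],  # arriving from the west
    [(47, -122), (52, -122), (57, -122)],  # arriving from the north
    [(47, -122), (48, -129), (48, -137)],  # arriving from the west
    [(47, -122), (51, -123), (56, -123)],  # arriving from the north
]
labels, centroids = kmeans([flatten(t) for t in trajectories], k=2)
```

Once every trajectory carries a cluster label, per-cluster averages of rainfall or PM2.5 can be compared against the site average, which is the kind of comparison the abstract reports.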
Using Self-Organizing Neural Network Map Combined with Ward's Clustering Algorithm for Visualization of Students' Cognitive Structural Models about Aliveness Concept

PubMed Central

Ugulu, Ilker; Aydin, Halil

2016-01-01

We propose an approach to clustering and visualization of students' cognitive structural models. We use the self-organizing map (SOM) combined with Ward's clustering to conduct cluster analysis. In the study carried out on 100 subjects, a conceptual understanding test consisting of open-ended questions was used as a data collection tool. The results of analyses indicated that students constructed the aliveness concept by associating it predominantly with human. Motion appeared as the most frequently associated term with the aliveness concept. The results suggest that the aliveness concept has been constructed using anthropocentric and animistic cognitive structures. In the next step, we used the data obtained from the conceptual understanding test for training the SOM. Consequently, we propose a visualization method about cognitive structure of the aliveness concept. PMID:26819579
Applications of Stochastic Analyses for Collaborative Learning and Cognitive Assessment

DTIC Science & Technology

2007-04-01

models (Visser, Maartje, Raijmakers, & Molenaar, 2002). The second part of this paper illustrates two applications of the methods described in the...clustering three-way data sets. Computational Statistics and Data Analysis, 51 (11), 5368–5376. Visser, I., Maartje, E., Raijmakers, E. J., & Molenaar
Clustering of food and activity preferences in primary school children.

PubMed

Rodenburg, Gerda; Oenema, Anke; Pasma, Marleen; Kremers, Stef P J; van de Mheen, Dike

2013-01-01

This study examined clustering of food and activity preferences in Dutch primary school children. It also explored whether the preference clusters are associated with child and parental background characteristics and with parenting practices. Data were used from 1480 parent-child dyads participating in the IVO Nutrition and Physical Activity Child cohort (INPACT). Children aged 8-11 years reported their preferences for food (e.g. fruit and sweet snacks) and activities (e.g. biking and watching television) at school with a newly-developed, visual instrument designed for primary school children. Parents completed a questionnaire at home. Principal component analysis was used to identify preference clusters. Backward regression analyses were used to examine the relationship between child and parental characteristics with cluster scores. We found (1) a clustering of preferences for unhealthy foods and unhealthy drinks, (2) a clustering of preferences for various physical activity behaviours, and (3) a clustering of preferences for unhealthy drinks and sedentary behaviour. Boys had a higher cluster score than girls on all three preference clusters. In addition, physical activity-related parenting practices were negatively related to unhealthy preference clusters and positively to the physical-activity-preference cluster. The next step is to relate our preference clusters to child dietary and activity behaviours, with special attention to gender differences. This may help in the development of interventions aimed at improving children's food and activity preferences. Copyright © 2012 Elsevier Ltd. All rights reserved.
Village-based spatio-temporal cluster analysis of the schistosomiasis risk in the Poyang Lake Region, China.

PubMed

Xia, Congcong; Bergquist, Robert; Lynn, Henry; Hu, Fei; Lin, Dandan; Hao, Yuwan; Li, Shizhu; Hu, Yi; Zhang, Zhijie

2017-03-08

The Poyang Lake Region, one of the major epidemic sites of schistosomiasis in China, remains a severe challenge. To improve our understanding of the current endemic status of schistosomiasis and to better control the transmission of the disease in the Poyang Lake Region, it is important to analyse the clustering pattern of schistosomiasis and detect the hotspots of transmission risk. Based on annual surveillance data, at the village level in this region from 2009 to 2014, spatial and temporal cluster analyses were conducted to assess the pattern of schistosomiasis infection risk among humans through purely spatial (Local Moran's I, Kulldorff and Flexible scan statistic) and space-time scan statistics (Kulldorff). A dramatic decline was found in the infection rate during the study period, which was shown to be maintained at a low level. The number of spatial clusters declined over time and were concentrated in counties around Poyang Lake, including Yugan, Yongxiu, Nanchang, Xingzi, Xinjian, De'an as well as Pengze, situated along the Yangtze River and the most serious area found in this study. Space-time analysis revealed that the clustering time frame appeared between 2009 and 2011 and the most likely cluster with the widest range was particularly concentrated in Pengze County. This study detected areas at high risk for schistosomiasis both in space and time at the village level from 2009 to 2014 in Poyang Lake Region. The high-risk areas are now more concentrated and mainly distributed at the river inflows Poyang Lake and along Yangtze River in Pengze County. It was assumed that the water projects including reservoirs and a recently breached dyke in this area were partly to blame. This study points out that attempts to reduce the negative effects of water projects in China should focus on the Poyang Lake Region.
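For readers unfamiliar with the Local Moran's I statistic named in this abstract, the following is a minimal sketch of how it is commonly computed. The village rates and the neighbour map are hypothetical, the weights are assumed to be row-standardised, and the variance term divides by n (some software divides by n - 1); none of these choices is taken from the study.

```python
def local_morans_i(values, neighbours):
    """Local Moran's I_i = (z_i / m2) * sum_j w_ij * z_j with row-standardised weights."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]          # deviations from the overall mean
    m2 = sum(d * d for d in z) / n          # second moment (assumed divisor: n)
    result = []
    for i in range(n):
        # Spatial lag: average deviation of the neighbouring villages.
        lag = sum(z[j] for j in neighbours[i]) / len(neighbours[i])
        result.append(z[i] / m2 * lag)
    return result

# Hypothetical infection rates (%) for five villages along a shoreline,
# each neighbouring only the villages next to it.
rates = [1.0, 1.0, 1.0, 9.0, 9.0]
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
scores = local_morans_i(rates, neighbours)
```

A positive score marks a village that resembles its neighbours (high next to high, or low next to low), while a negative score marks a local outlier. Judging significance would additionally need something like a permutation test, which this sketch omits.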
AMMI adjustment for statistical analysis of an international wheat yield trial.

PubMed

Crossa, J; Fox, P N; Pfeiffer, W H; Rajaram, S; Gauch, H G

1991-01-01

Multilocation trials are important for the CIMMYT Bread Wheat Program in producing high-yielding, adapted lines for a wide range of environments. This study investigated procedures for improving predictive success of a yield trial, grouping environments and genotypes into homogeneous subsets, and determining the yield stability of 18 CIMMYT bread wheats evaluated at 25 locations. Additive Main effects and Multiplicative Interaction (AMMI) analysis gave more precise estimates of genotypic yields within locations than means across replicates. This precision facilitated formation by cluster analysis of more cohesive groups of genotypes and locations for biological interpretation of interactions than occurred with unadjusted means. Locations were clustered into two subsets for which genotypes with positive interactions manifested in high, stable yields were identified. The analyses highlighted superior selections with both broad and specific adaptation.
Proposed shade guide for human facial skin and lip: a pilot study.

PubMed

Wee, Alvin G; Beatty, Mark W; Gozalo-Diaz, David J; Kim-Pusateri, Seungyee; Marx, David B

2013-08-01

Currently, no commercially available facial shade guide exists in the United States for the fabrication of facial prostheses. The purpose of this study was to measure facial skin and lip color in a human population sample stratified by age, gender, and race. Clustering analysis was used to determine optimal color coordinates for a proposed facial shade guide. Participants (n=119) were recruited from 4 racial/ethnic groups, 5 age groups, and both genders. Reflectance measurements of participants' noses and lower lips were made by using a spectroradiometer and xenon arc lamp with a 45/0 optical configuration. Repeated measures ANOVA (α=.05), to identify skin and lip color differences resulting from race, age, gender, and location, and a hierarchical clustering analysis, to identify clusters of skin colors, were used. Significant contributors to L*a*b* facial color were race and facial location (P<.01). b* affected all factors (P<.05). Age affected only b* (P<.001), while gender affected only L* (P<.05) and b* (P<.05). Analyses identified 5 clusters of skin color. The study showed that skin color caused by age and gender primarily occurred within the yellow-blue axis. A significant lightness difference between gender groups was also found. Clustering analysis identified 5 distinct skin shade tabs. Copyright © 2013 The Editorial Council of the Journal of Prosthetic Dentistry. Published by Mosby, Inc. All rights reserved.
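As an illustration of how hierarchical clustering can group color measurements into shade tabs, here is a small sketch. The L*a*b* triples are invented, and both the Euclidean distance and the average-linkage rule are assumptions, since the abstract does not say which distance or linkage the authors used.

```python
import math

def average_linkage(points, k):
    """Agglomerative clustering: merge the two closest clusters until k remain."""
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(a, b):
        # Average pairwise Euclidean distance between members of a and b.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        pairs = [(cluster_distance(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        _, i, j = min(pairs)
        clusters[i] = clusters[i] + clusters[j]   # merge j into i
        del clusters[j]                           # j > i, so index i stays valid
    return clusters

# Hypothetical (L*, a*, b*) skin measurements, not data from the study.
lab = [(65.0, 12.0, 18.0), (66.0, 11.5, 18.5),
       (45.0, 14.0, 20.0), (44.0, 14.5, 21.0),
       (55.0, 13.0, 25.0)]
groups = average_linkage(lab, 3)
```

Each resulting group could then be summarised by its mean L*a*b* coordinates to propose one shade tab per cluster.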
Phrase Mining of Textual Data to Analyze Extracellular Matrix Protein Patterns Across Cardiovascular Disease.

PubMed

Liem, David Alexandre; Murali, Sanjana; Sigdel, Dibakar; Shi, Yu; Wang, Xuan; Shen, Jiaming; Choi, Howard; Caufield, J Harry; Wang, Wei; Ping, Peipei; Han, Jiawei

2018-05-18

Extracellular matrix (ECM) proteins have been shown to play important roles regulating multiple biological processes in an array of organ systems, including the cardiovascular system. By using a novel bioinformatics text-mining tool, we studied six categories of cardiovascular disease (CVD), namely ischemic heart disease (IHD), cardiomyopathies (CM), cerebrovascular accident (CVA), congenital heart disease (CHD), arrhythmias (ARR), and valve disease (VD), anticipating novel ECM protein-disease and protein-protein relationships hidden within vast quantities of textual data. We conducted a phrase-mining analysis, delineating the relationships of 709 ECM proteins with the six groups of CVDs reported in 1,099,254 abstracts. The technology pipeline known as Context-aware Semantic Online Analytical Processing (CaseOLAP) was applied to semantically rank the association of proteins to each and all six CVDs, performing analyses to quantify each protein-disease relationship. We performed principal component analysis and hierarchical clustering of the data, where each protein is visualized as a six dimensional vector. We found that ECM proteins display variable degrees of association with the six CVDs; certain CVDs share groups of associated proteins whereas others have divergent protein associations. We identified 82 ECM proteins sharing associations with all six CVDs. Our bioinformatics analysis ascribed distinct ECM pathways (via Reactome) from this subset of proteins, namely insulin-like growth factor regulation and interleukin-4 and interleukin-13 signaling, suggesting their contribution to the pathogenesis of all six CVDs. Finally, we performed hierarchical clustering analysis and identified protein clusters associated with a targeted CVD; analyses revealed unexpected insights underlying ECM-pathogenesis of CVDs.
A cross-sectional cluster analysis of the combined association of physical activity and sleep with sociodemographic and health characteristics in mid-aged and older adults.

PubMed

Rayward, Anna T; Duncan, Mitch J; Brown, Wendy J; Plotnikoff, Ronald C; Burton, Nicola W

2017-08-01

This study aimed to identify how different patterns of physical activity, sleep duration and sleep quality cluster together, and to examine how the identified clusters differ in terms of socio-demographic and health characteristics. Participants were adults from Brisbane, Australia, aged 42-72 years who reported their physical activity, sleep duration, sleep quality, socio-demographic and health characteristics in 2011 (n=5854). Two-step Cluster Analyses were used to identify clusters. Cluster differences in socio-demographic and health characteristics were examined using chi square tests (p<0.05). Four clusters were identified: 'Poor Sleepers' (31.2%), 'Moderate Sleepers' (30.7%), 'Mixed Sleepers/Highly Active' (20.5%), and 'Excellent Sleepers/Mixed Activity' (17.6%). The 'Poor Sleepers' cluster had the highest proportion of participants with less-than-recommended sleep duration and poor sleep quality, had the poorest health characteristics and a high proportion of participants with low physical activity. Physical activity, sleep duration and sleep quality cluster together in distinct patterns and clusters of poor behaviours are associated with poor health status. Multiple health behaviour change interventions which target both physical activity and sleep should be prioritised to improve health outcomes in mid-aged adults. Copyright © 2017 Elsevier B.V. All rights reserved.
A reanalysis of cluster randomized trials showed interrupted time-series studies were valuable in health system evaluation.

PubMed

Fretheim, Atle; Zhang, Fang; Ross-Degnan, Dennis; Oxman, Andrew D; Cheyne, Helen; Foy, Robbie; Goodacre, Steve; Herrin, Jeph; Kerse, Ngaire; McKinlay, R James; Wright, Adam; Soumerai, Stephen B

2015-03-01

There is often substantial uncertainty about the impacts of health system and policy interventions. Despite that, randomized controlled trials (RCTs) are uncommon in this field, partly because experiments can be difficult to carry out. An alternative method for impact evaluation is the interrupted time-series (ITS) design. Little is known, however, about how results from the two methods compare. Our aim was to explore whether ITS studies yield results that differ from those of randomized trials. We conducted single-arm ITS analyses (segmented regression) based on data from the intervention arm of cluster randomized trials (C-RCTs), that is, discarding control arm data. Secondarily, we included the control group data in the analyses, by subtracting control group data points from intervention group data points, thereby constructing a time series representing the difference between the intervention and control groups. We compared the results from the single-arm and controlled ITS analyses with results based on conventional aggregated analyses of trial data. The findings were largely concordant, yielding effect estimates with overlapping 95% confidence intervals (CI) across different analytical methods. However, our analyses revealed the importance of a concurrent control group and of taking baseline and follow-up trends into account in the analysis of C-RCTs. The ITS design is valuable for evaluation of health systems interventions, both when RCTs are not feasible and in the analysis and interpretation of data from C-RCTs. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
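The controlled ITS analysis described here (subtracting control data points from intervention data points, then applying segmented regression) can be sketched as follows. The series, the interruption time `t0`, and the helper names are hypothetical. Fitting separate lines before and after `t0` is used as a simple stand-in for a segmented regression model with level-change and slope-change terms.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def segmented_regression(times, values, t0):
    """Fit lines before t0 and from t0 onward; return level and slope change at t0."""
    pre = [(t, v) for t, v in zip(times, values) if t < t0]
    post = [(t, v) for t, v in zip(times, values) if t >= t0]
    a_pre, b_pre = fit_line(*zip(*pre))
    a_post, b_post = fit_line(*zip(*post))
    level_change = (a_post + b_post * t0) - (a_pre + b_pre * t0)
    return level_change, b_post - b_pre

# Synthetic monthly outcomes; the intervention adds 4 units from month 6.
times = list(range(12))
t0 = 6
control = [50 + 0.5 * t for t in times]
intervention = [52 + 0.5 * t + (4 if t >= t0 else 0) for t in times]

# Controlled series: intervention minus control at each time point.
difference = [i - c for i, c in zip(intervention, control)]
level_change, slope_change = segmented_regression(times, difference, t0)
```

Passing `intervention` instead of `difference` gives the single-arm variant, which has no protection against a trend shared by both arms.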

A census of variability in globular cluster M 68 (NGC 4590)

NASA Astrophysics Data System (ADS)

Kains, N.; Arellano Ferro, A.; Figuera Jaimes, R.; Bramich, D. M.; Skottfelt, J.; Jørgensen, U. G.; Tsapras, Y.; Street, R. A.; Browne, P.; Dominik, M.; Horne, K.; Hundertmark, M.; Ipatov, S.; Snodgrass, C.; Steele, I. A.; Lcogt/Robonet Consortium; Alsubai, K. A.; Bozza, V.; Calchi Novati, S.; Ciceri, S.; D'Ago, G.; Galianni, P.; Gu, S.-H.; Harpsøe, K.; Hinse, T. C.; Juncher, D.; Korhonen, H.; Mancini, L.; Popovas, A.; Rabus, M.; Rahvar, S.; Southworth, J.; Surdej, J.; Vilela, C.; Wang, X.-B.; Wertz, O.; Mindstep Consortium

2015-06-01

Aims: We analyse 20 nights of CCD observations in the V and I bands of the globular cluster M 68 (NGC 4590) and use them to detect variable objects. We also obtained electron-multiplying CCD (EMCCD) observations for this cluster in order to explore its core with unprecedented spatial resolution from the ground. Methods: We reduced our data using difference image analysis to achieve the best possible photometry in the crowded field of the cluster. In doing so, we show that when dealing with identical networked telescopes, a reference image from any telescope may be used to reduce data from any other telescope, which facilitates the analysis significantly. We then used our light curves to estimate the properties of the RR Lyrae (RRL) stars in M 68 through Fourier decomposition and empirical relations. The variable star properties then allowed us to derive the cluster's metallicity and distance. Results: M 68 had 45 previously confirmed variables, including 42 RRL and 2 SX Phoenicis (SX Phe) stars. In this paper we determine new periods and search for new variables, especially in the core of the cluster where our method performs particularly well. We detect 4 additional SX Phe stars and confirm the variability of another star, bringing the total number of confirmed variable stars in this cluster to 50. We also used archival data stretching back to 1951 to derive period changes for some of the single-mode RRL stars, and analyse the significant number of double-mode RRL stars in M 68. Furthermore, we find evidence for double-mode pulsation in one of the SX Phe stars in this cluster. 
Using the different classes of variables, we derived values for the metallicity of the cluster of [Fe/H] = -2.07 ± 0.06 on the ZW scale, or -2.20 ± 0.10 on the UVES scale, and found true distance moduli μ0 = 15.00 ± 0.11 mag (using RR0 stars), 15.00 ± 0.05 mag (using RR1 stars), 14.97 ± 0.11 mag (using SX Phe stars), and 15.00 ± 0.07 mag (using the MV -[Fe/H] relation for RRL stars), corresponding to physical distances of 10.00 ± 0.49, 9.99 ± 0.21, 9.84 ± 0.50, and 10.00 ± 0.30 kpc, respectively. Thanks to the first use of difference image analysis on time-series observations of M 68, we are now confident that we have a complete census of the RRL stars in this cluster. The full Table 2 is only available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (ftp://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/578/A128
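The step from the true distance moduli to physical distances follows the standard relation μ0 = 5 log10(d / 10 pc). The sketch below applies it to the moduli quoted above, with a first-order error propagation that is an assumption about how uncertainties might be carried over, not the authors' stated procedure.

```python
import math

def distance_kpc(mu0):
    """Distance in kpc from a true distance modulus mu0 = 5*log10(d / 10 pc)."""
    return 10 ** (mu0 / 5 + 1) / 1000.0

def distance_error_kpc(mu0, sigma_mu):
    """First-order propagation: sigma_d = d * ln(10) / 5 * sigma_mu."""
    return distance_kpc(mu0) * math.log(10) / 5 * sigma_mu

# Moduli and uncertainties as quoted in the abstract.
moduli = [("RR0", 15.00, 0.11), ("RR1", 15.00, 0.05),
          ("SX Phe", 14.97, 0.11), ("MV-[Fe/H]", 15.00, 0.07)]
for label, mu0, sigma in moduli:
    d = distance_kpc(mu0)
    err = distance_error_kpc(mu0, sigma)
    print(f"{label}: {d:.2f} +/- {err:.2f} kpc")
```

The results land close to the quoted distances but need not match them to the last digit, presumably because the moduli in the abstract are rounded.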
Software system for data management and distributed processing of multichannel biomedical signals.

PubMed

Franaszczuk, P J; Jouny, C C

2004-01-01

The presented software is designed for efficient utilization of a cluster of PC computers for signal analysis of multichannel physiological data. The system consists of three main components: 1) a library of input and output procedures, 2) a database storing additional information about location in a storage system, 3) a user interface for selecting data for analysis, choosing programs for analysis, and distributing computing and output data on cluster nodes. The system allows for processing multichannel time series data in multiple binary formats. The description of data format, channels and time of recording are included in separate text files. Definition and selection of multiple channel montages is possible. Epochs for analysis can be selected both manually and automatically. Implementation of new signal processing procedures is possible with minimal programming overhead for the input/output processing and user interface. The number of nodes in the cluster used for computations and the amount of storage can be changed with no major modification to the software. Current implementations include the time-frequency analysis of multiday, multichannel recordings of intracranial EEG of epileptic patients as well as evoked response analyses of repeated cognitive tasks.
Probing the History of Galaxy Clusters with Metallicity and Entropy Measurements

NASA Astrophysics Data System (ADS)

Elkholy, Tamer Yohanna

Galaxy clusters are the largest gravitationally bound objects found today in our Universe. The gas they contain, the intra-cluster medium (ICM), is heated to temperatures in the approximate range of 1 to 10 keV, and thus emits X-ray radiation. Studying the ICM through the spatial and spectral analysis of its emission returns the richest information about both the overall cosmological context which governs the formation of clusters, as well as the physical processes occurring within. The aim of this thesis is to learn about the history of the physical processes that drive the evolution of galaxy clusters, through careful, spatially resolved measurements of their metallicity and entropy content. A sample of 45 nearby clusters observed with Chandra is analyzed to produce radial density, temperature, entropy and metallicity profiles. The entropy profiles are computed to larger radial extents than in previous Chandra analyses. The results of this analysis are made available to the scientific community in an electronic database. Comparing metallicity and entropy in the outskirts of clusters, we find no signature on the entropy profiles of the ensemble of supernovae that produced the observed metals. In the centers of clusters, we find that the metallicities of high-mass clusters are much less dispersed than those of low-mass clusters. A comparison of metallicity with the regularity of the X-ray emission morphology suggests that metallicities in low-mass clusters are more susceptible to increase from violent events such as mergers. We also find that the variation in the stellar-to-gas mass ratio as a function of cluster mass can explain the variation of central metallicity with cluster mass, only if we assume that there is a constant level of metallicity for clusters of all masses, above which the observed galaxies add more metals in proportion to their mass. (Copies available exclusively from MIT Libraries, libraries.mit.edu/docs - docs mit.edu)
Using coordinate-based meta-analyses to explore structural imaging genetics.

PubMed

Janouschek, Hildegard; Eickhoff, Claudia R; Mühleisen, Thomas W; Eickhoff, Simon B; Nickl-Jockschat, Thomas

2018-05-05

Imaging genetics has become a highly popular approach in the field of schizophrenia research. A frequently reported finding is that effects from common genetic variation are associated with a schizophrenia-related structural endophenotype. Genetic contributions to a structural endophenotype may be easier to delineate, when referring to biological rather than diagnostic criteria. We used coordinate-based meta-analyses, namely the anatomical likelihood estimation (ALE) algorithm on 30 schizophrenia-related imaging genetics studies, representing 44 single-nucleotide polymorphisms at 26 gene loci investigated in 4682 subjects. To test whether analyses based on biological information would improve the convergence of results, gene ontology (GO) terms were used to group the findings from the published studies. We did not find any significant results for the main contrast. However, our analysis enrolling studies on genotype × diagnosis interaction yielded two clusters in the left temporal lobe and the medial orbitofrontal cortex. All other subanalyses did not yield any significant results. To gain insight into possible biological relationships between the genes implicated by these clusters, we mapped five of them to GO terms of the category "biological process" (AKT1, CNNM2, DISC1, DTNBP1, VAV3), then five to "cellular component" terms (AKT1, CNNM2, DISC1, DTNBP1, VAV3), and three to "molecular function" terms (AKT1, VAV3, ZNF804A). A subsequent cluster analysis identified representative, non-redundant subsets of semantically similar terms that aided a further interpretation. We regard this approach as a new option to systematically explore the richness of the literature in imaging genetics.
Statistical analysis and handling of missing data in cluster randomized trials: a systematic review.

PubMed

Fiero, Mallorie H; Huang, Shuang; Oren, Eyal; Bell, Melanie L

2016-02-09

Cluster randomized trials (CRTs) randomize participants in groups, rather than as individuals and are key tools used to assess interventions in health research where treatment contamination is likely or if individual randomization is not feasible. Two potential major pitfalls exist regarding CRTs, namely handling missing data and not accounting for clustering in the primary analysis. The aim of this review was to evaluate approaches for handling missing data and statistical analysis with respect to the primary outcome in CRTs. We systematically searched for CRTs published between August 2013 and July 2014 using PubMed, Web of Science, and PsycINFO. For each trial, two independent reviewers assessed the extent of the missing data and method(s) used for handling missing data in the primary and sensitivity analyses. We evaluated the primary analysis and determined whether it was at the cluster or individual level. Of the 86 included CRTs, 80 (93%) trials reported some missing outcome data. Of those reporting missing data, the median percent of individuals with a missing outcome was 19% (range 0.5 to 90%). The most common way to handle missing data in the primary analysis was complete case analysis (44, 55%), whereas 18 (22%) used mixed models, six (8%) used single imputation, four (5%) used unweighted generalized estimating equations, and two (2%) used multiple imputation. Fourteen (16%) trials reported a sensitivity analysis for missing data, but most assumed the same missing data mechanism as in the primary analysis. Overall, 67 (78%) trials accounted for clustering in the primary analysis. High rates of missing outcome data are present in the majority of CRTs, yet handling missing data in practice remains suboptimal. Researchers and applied statisticians should carry out appropriate missing data methods, which are valid under plausible assumptions in order to increase statistical power in trials and reduce the possibility of bias. 
Sensitivity analysis should be performed, with weakened assumptions regarding the missing data mechanism to explore the robustness of results reported in the primary analysis.
Voxel-based statistical analysis of cerebral blood flow using Tc-99m ECD brain SPECT in patients with traumatic brain injury: group and individual analyses.

PubMed

Shin, Yong Beom; Kim, Seong-Jang; Kim, In-Ju; Kim, Yong-Ki; Kim, Dong-Soo; Park, Jae Heung; Yeom, Seok-Ran

2006-06-01

Statistical parametric mapping (SPM) was applied to brain perfusion single photon emission computed tomography (SPECT) images in patients with traumatic brain injury (TBI) to investigate regional cerebral abnormalities compared to age-matched normal controls. Thirteen patients with TBI who underwent brain perfusion SPECT were included in this study (10 males, three females, mean age 39.8 +/- 18.2, range 21 - 74). SPM2 software implemented in MATLAB 5.3 was used for spatial pre-processing and analysis and to determine the quantitative differences between TBI patients and age-matched normal controls. Three large voxel clusters of significantly decreased cerebral blood perfusion were found in patients with TBI. The largest clusters were areas including the medial frontal gyrus (voxel number 3642, peak Z-value = 4.31, 4.27, p = 0.000) in both hemispheres. The second largest clusters were areas including the cingulate gyrus and anterior cingulate gyrus of the left hemisphere (voxel number 381, peak Z-value = 3.67, 3.62, p = 0.000). Other clusters were the parahippocampal gyrus (voxel number 173, peak Z-value = 3.40, p = 0.000) and hippocampus (voxel number 173, peak Z-value = 3.23, p = 0.001) in the left hemisphere. The false discovery rate (FDR) was less than 0.04. From this study, group and individual analyses of SPM2 could clearly identify the perfusion abnormalities of brain SPECT in patients with TBI. Group analysis of SPM2 showed a hypoperfusion pattern in the areas including the medial frontal gyrus of both hemispheres, cingulate gyrus, anterior cingulate gyrus, parahippocampal gyrus and hippocampus in the left hemisphere compared to age-matched normal controls. Also, the left parahippocampal gyrus and left hippocampus were additional hypoperfusion areas. However, these findings deserve further investigation in a larger number of patients to allow better validation of objective SPM analysis in patients with TBI.
Analysis of Genetic Diversity and Structure Pattern of Indigofera Pseudotinctoria in Karst Habitats of the Wushan Mountains Using AFLP Markers.

PubMed

Fan, Yan; Zhang, Chenglin; Wu, Wendan; He, Wei; Zhang, Li; Ma, Xiao

2017-10-16

Indigofera pseudotinctoria Mats is an agronomically and economically important perennial legume shrub with a high forage yield, protein content and strong adaptability, which is subject to natural habitat fragmentation and serious human disturbance. Until now, our knowledge of the genetic relationships and intraspecific genetic diversity for its wild collections is still poor, especially at small spatial scales. Here amplified fragment length polymorphism (AFLP) technology was employed for analysis of genetic diversity, differentiation, and structure of 364 genotypes of I. pseudotinctoria from 15 natural locations in Wushan Mountain, a highly structured mountain with typical karst landforms in Southwest China. We also tested whether eco-climate factors have affected genetic structure by correlating genetic diversity with habitat features. A total of 515 distinctly scoreable bands were generated, and 324 of them were polymorphic. The polymorphic information content (PIC) ranged from 0.694 to 0.890 with an average of 0.789 per primer pair. On species level, Nei's gene diversity (Hj), the Bayesian genetic diversity index (HB) and the Shannon information index (I) were 0.2465, 0.2363 and 0.3772, respectively. High differentiation among all sampling sites was detected (FST = 0.2217, GST = 0.1746, G'ST = 0.2060, θB = 0.1844), and gene flow among accessions (Nm = 1.1819) was restricted. The population genetic structure resolved by the UPGMA tree, principal coordinate analysis, and Bayesian-based cluster analyses irrefutably grouped all accessions into two distinct clusters, i.e., lowland and highland groups. This structure pattern may indicate joint effects by the neutral evolution and natural selection. Restricted Nm was observed across all accessions, and genetic barriers were detected between adjacent accessions owing to the specific geographical landforms.
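The reported gene-flow value can be related to the reported G_ST through Wright's island-model approximation, Nm ≈ (1 - G_ST) / (4 G_ST). The abstract does not say which estimator was used, so the sketch below is only a consistency check under that assumption.

```python
def gene_flow_from_gst(gst):
    """Estimated migrants per generation, Nm ~ (1 - Gst) / (4 * Gst).

    Assumes Wright's island model; other software conventions use a
    factor of 2 instead of 4 in the denominator.
    """
    return (1.0 - gst) / (4.0 * gst)

# G_ST as quoted in the abstract.
nm = gene_flow_from_gst(0.1746)
print(f"Nm = {nm:.3f}")
```

Under this assumption the result agrees with the quoted Nm of 1.1819 to about three decimal places, which suggests, but does not confirm, that this formula underlies the reported value.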
Bayesian Nonparametric Ordination for the Analysis of Microbial Communities.

PubMed

Ren, Boyu; Bacallado, Sergio; Favaro, Stefano; Holmes, Susan; Trippa, Lorenzo

2017-01-01

Human microbiome studies use sequencing technologies to measure the abundance of bacterial species or Operational Taxonomic Units (OTUs) in samples of biological material. Typically the data are organized in contingency tables with OTU counts across heterogeneous biological samples. In the microbial ecology community, ordination methods are frequently used to investigate latent factors or clusters that capture and describe variations of OTU counts across biological samples. It remains important to evaluate how uncertainty in estimates of each biological sample's microbial distribution propagates to ordination analyses, including visualization of clusters and projections of biological samples on low dimensional spaces. We propose a Bayesian analysis for dependent distributions to endow frequently used ordinations with estimates of uncertainty. A Bayesian nonparametric prior for dependent normalized random measures is constructed, which is marginally equivalent to the normalized generalized Gamma process, a well-known prior for nonparametric analyses. In our prior, the dependence and similarity between microbial distributions is represented by latent factors that concentrate in a low dimensional space. We use a shrinkage prior to tune the dimensionality of the latent factors. The resulting posterior samples of model parameters can be used to evaluate uncertainty in analyses routinely applied in microbiome studies. Specifically, by combining them with multivariate data analysis techniques we can visualize credible regions in ecological ordination plots. The characteristics of the proposed model are illustrated through a simulation study and applications in two microbiome datasets.
Gene expression profiles of breast biopsies from healthy women identify a group with claudin-low features.

PubMed

Haakensen, Vilde D; Lingjaerde, Ole Christian; Lüders, Torben; Riis, Margit; Prat, Aleix; Troester, Melissa A; Holmen, Marit M; Frantzen, Jan Ole; Romundstad, Linda; Navjord, Dina; Bukholm, Ida K; Johannesen, Tom B; Perou, Charles M; Ursin, Giske; Kristensen, Vessela N; Børresen-Dale, Anne-Lise; Helland, Aslaug

2011-11-01

Increased understanding of the variability in normal breast biology will enable us to identify mechanisms of breast cancer initiation and the origin of different subtypes, and to better predict breast cancer risk. Gene expression patterns in breast biopsies from 79 healthy women referred to breast diagnostic centers in Norway were explored by unsupervised hierarchical clustering and supervised analyses, such as gene set enrichment analysis and gene ontology analysis and comparison with previously published genelists and independent datasets. Unsupervised hierarchical clustering identified two separate clusters of normal breast tissue based on gene-expression profiling, regardless of clustering algorithm and gene filtering used. Comparison of the expression profile of the two clusters with several published gene lists describing breast cells revealed that the samples in cluster 1 share characteristics with stromal cells and stem cells, and to a certain degree with mesenchymal cells and myoepithelial cells. The samples in cluster 1 also share many features with the newly identified claudin-low breast cancer intrinsic subtype, which also shows characteristics of stromal and stem cells. More women belonging to cluster 1 have a family history of breast cancer and there is a slight overrepresentation of nulliparous women in cluster 1. Similar findings were seen in a separate dataset consisting of histologically normal tissue from both breasts harboring breast cancer and from mammoplasty reductions. This is the first study to explore the variability of gene expression patterns in whole biopsies from normal breasts and identified distinct subtypes of normal breast tissue. Further studies are needed to determine the specific cell contribution to the variation in the biology of normal breasts, how the clusters identified relate to breast cancer risk and their possible link to the origin of the different molecular subtypes of breast cancer.
The community structure of endophytic bacteria in different parts of Huanglongbing-affected citrus plants

USDA-ARS's Scientific Manuscript database

The analysis methods of Pearson correlation coefficient (PCC), hierarchical cluster analysis and diversity index were used to study the relevance between citrus huanglongbing (HLB) and the endophytic bacteria in different branches and leaves as well as roots of huanglongbing (HLB)-affected citrus tr...
Antagonists in Mutual Antipathies: A Person-Oriented Approach

ERIC Educational Resources Information Center

Guroglu, Berna; Haselager, Gerbert J. T.; van Lieshout, Cornelis F. M.; Scholte, Ron H. J.

2009-01-01

This study investigated the heterogeneity of mutual antipathy relationships. Separate cluster analyses of peer interactions of early adolescents (mean age 11 years) and adolescents (mean age of 14) yielded 3 "types of individuals" in each age group, namely Prosocial, Antisocial, and Withdrawn. Prevalence analysis of the 6 possible combinations of…
Geographic atrophy phenotype identification by cluster analysis.

PubMed

Monés, Jordi; Biarnés, Marc

2018-03-01

To identify ocular phenotypes in patients with geographic atrophy secondary to age-related macular degeneration (GA) using a data-driven cluster analysis. This was a retrospective analysis of data from a prospective, natural history study of patients with GA who were followed for ≥6 months. Cluster analysis was used to identify subgroups within the population based on the presence of several phenotypic features: soft drusen, reticular pseudodrusen (RPD), primary foveal atrophy, increased fundus autofluorescence (FAF), greyish FAF appearance and subfoveal choroidal thickness (SFCT). A comparison of features between the subgroups was conducted, and a qualitative description of the new phenotypes was proposed. The atrophy growth rate between phenotypes was then compared. Data were analysed from 77 eyes of 77 patients with GA. Cluster analysis identified three groups: phenotype 1 was characterised by high soft drusen load, foveal atrophy and slow growth; phenotype 3 showed high RPD load, extrafoveal and greyish FAF appearance and thin SFCT; the characteristics of phenotype 2 were midway between phenotypes 1 and 3. Phenotypes differed in all measured features (p≤0.013), with decreases in the presence of soft drusen, foveal atrophy and SFCT seen from phenotypes 1 to 3 and corresponding increases in high RPD load, high FAF and greyish FAF appearance. Atrophy growth rate differed between phenotypes 1, 2 and 3 (0.63, 1.91 and 1.73 mm²/year, respectively, p=0.0005). Cluster analysis identified three distinct phenotypes in GA. One of them showed a particularly slow growth pattern. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Cluster randomised crossover trials with binary data and unbalanced cluster sizes: application to studies of near-universal interventions in intensive care.

PubMed

Forbes, Andrew B; Akram, Muhammad; Pilcher, David; Cooper, Jamie; Bellomo, Rinaldo

2015-02-01

Cluster randomised crossover trials have been utilised in recent years in the health and social sciences. Methods for analysis have been proposed; however, for binary outcomes, these have received little assessment of their appropriateness. In addition, methods for determination of sample size are currently limited to balanced cluster sizes both between clusters and between periods within clusters. This article aims to extend this work to unbalanced situations and to evaluate the properties of a variety of methods for analysis of binary data, with a particular focus on the setting of potential trials of near-universal interventions in intensive care to reduce in-hospital mortality. We derive a formula for sample size estimation for unbalanced cluster sizes, and apply it to the intensive care setting to demonstrate the utility of the cluster crossover design. We conduct a numerical simulation of the design in the intensive care setting and for more general configurations, and we assess the performance of three cluster summary estimators and an individual-data estimator based on binomial-identity-link regression. For settings similar to the intensive care scenario involving large cluster sizes and small intra-cluster correlations, the sample size formulae developed and analysis methods investigated are found to be appropriate, with the unweighted cluster summary method performing well relative to the more optimal but more complex inverse-variance weighted method. More generally, we find that the unweighted and cluster-size-weighted summary methods perform well, with the relative efficiency of each largely determined systematically from the study design parameters. Performance of individual-data regression is adequate with small cluster sizes but becomes inefficient for large, unbalanced cluster sizes. 
When outcome prevalences are 6% or less and the within-cluster-within-period correlation is 0.05 or larger, all methods display sub-nominal confidence interval coverage, with the less prevalent the outcome the worse the coverage. As with all simulation studies, conclusions are limited to the configurations studied. We confined attention to detecting intervention effects on an absolute risk scale using marginal models and did not explore properties of binary random effects models. Cluster crossover designs with binary outcomes can be analysed using simple cluster summary methods, and sample size in unbalanced cluster size settings can be determined using relatively straightforward formulae. However, caution needs to be applied in situations with low prevalence outcomes and moderate to high intra-cluster correlations. © The Author(s) 2014.
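As a rough illustration of the cluster summary idea in the abstract above, the following Python sketch computes an unweighted and a cluster-size-weighted estimate of the absolute risk difference. It is not the authors' implementation: the data are hypothetical, period effects are ignored for simplicity, and the choice of total cluster size as the weight is an assumption made only for this example.

```python
from math import sqrt
from statistics import mean, stdev


def cluster_summary_estimates(clusters):
    """Summarise a two-period cluster crossover trial at cluster level.

    clusters: list of (events_int, n_int, events_ctl, n_ctl) tuples, one
    per cluster, giving events and patients in the intervention and
    control periods. Period effects are ignored (an assumption).
    """
    # Within-cluster difference in event proportions (absolute risk scale).
    diffs = [ei / ni - ec / nc for ei, ni, ec, nc in clusters]
    # Assumed weight for the size-weighted version: total cluster size.
    sizes = [ni + nc for _, ni, _, nc in clusters]

    unweighted = mean(diffs)
    se_unweighted = stdev(diffs) / sqrt(len(diffs))
    weighted = sum(w * d for w, d in zip(sizes, diffs)) / sum(sizes)
    return unweighted, se_unweighted, weighted


# Hypothetical ICU data: (deaths, patients) per period, unbalanced sizes.
icus = [
    (40, 500, 50, 500),
    (9, 100, 12, 120),
    (150, 2000, 160, 1900),
    (20, 250, 30, 300),
]
unweighted, se_unweighted, weighted = cluster_summary_estimates(icus)
print(f"unweighted: {unweighted:.4f} (SE {se_unweighted:.4f})")
print(f"size-weighted: {weighted:.4f}")
```

The unweighted version gives every cluster the same influence regardless of size, which is the simplicity the abstract refers to; an inverse-variance weighted version would need variance estimates for each cluster difference.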
Patterns of Gender Equality at Workplaces and Psychological Distress

PubMed Central

Bolin, Malin; Hammarström, Anne

2013-01-01

Research in the field of occupational health often uses a risk factor approach which has been criticized by feminist researchers for not considering the combination of many different variables that are at play simultaneously. To overcome this shortcoming this study aims to identify patterns of gender equality at workplaces and to investigate how these patterns are associated with psychological distress. Questionnaire data from the Northern Swedish Cohort (n = 715) have been analysed and supplemented with register data about the participants' workplaces. The register data were used to create gender equality indicators of women/men ratios of number of employees, educational level, salary and parental leave. Cluster analysis was used to identify patterns of gender equality at the workplaces. Differences in psychological distress between the clusters were analysed by chi-square test and logistic regression analyses, adjusting for individual socio-demographics and previous psychological distress. The cluster analysis resulted in six distinctive clusters with different patterns of gender equality at the workplaces that were associated with psychological distress for women but not for men. For women the highest odds of psychological distress were found at traditionally gender-unequal workplaces. The lowest overall occurrence of psychological distress, as well as the same occurrence for women and men, was found at the most gender-equal workplaces. The results from this study support the convergence hypothesis, as gender equality at the workplace relates not only to better mental health for women but also to a more similar occurrence of mental ill-health between women and men. This study highlights the importance of utilizing a multidimensional view of gender equality to understand its association with health outcomes.
Health policies need to consider gender equality at the workplace level as a social determinant of health that is of importance for reducing differences in health outcomes for women and men. PMID:23326404
The Impact of Multilocus Variable-Number Tandem-Repeat Analysis on PulseNet Canada Escherichia coli O157:H7 Laboratory Surveillance and Outbreak Support, 2008-2012.

PubMed

Rumore, Jillian Leigh; Tschetter, Lorelee; Nadon, Celine

2016-05-01

The lack of pattern diversity among pulsed-field gel electrophoresis (PFGE) profiles for Escherichia coli O157:H7 in Canada does not consistently provide optimal discrimination, and therefore, differentiating temporally and/or geographically associated sporadic cases from potential outbreak cases can at times impede investigations. To address this limitation, DNA sequence-based methods such as multilocus variable-number tandem-repeat analysis (MLVA) have been explored. To assess the performance of MLVA as a supplemental method to PFGE from the Canadian perspective, a retrospective analysis of all E. coli O157:H7 isolated in Canada from January 2008 to December 2012 (inclusive) was conducted. A total of 2285 E. coli O157:H7 isolates and 63 clusters of cases (by PFGE) were selected for the study. Based on the qualitative analysis, the addition of MLVA improved the categorization of cases for 60% of clusters and no change was observed for ∼40% of clusters investigated. In such situations, MLVA serves to confirm PFGE results, but may not add further information per se. The findings of this study demonstrate that MLVA data, when used in combination with PFGE-based analyses, provide additional resolution to the detection of clusters lacking PFGE diversity as well as demonstrate good epidemiological concordance. In addition, MLVA is able to identify cluster-associated isolates with variant PFGE pattern combinations that may have been previously missed by PFGE alone. Optimal laboratory surveillance in Canada is achieved with the application of PFGE and MLVA in tandem for routine surveillance, cluster detection, and outbreak response.
The inner formal structure of the H-T-P drawings: an exploratory study.

PubMed

Vass, Z

1998-08-01

The study describes some interrelated patterns of traits of the House-Tree-Person (H-T-P) drawings with the instruments of hierarchical cluster analysis. First, according to the literature 17 formal or structural aspects of the projective drawings were collected, after which a detailed manual for coding was compiled. Second, the interrater reliability and the consistency of this manual was tested. Third, the hierarchical cluster structure of the reliable and consistent formal aspects was analysed. Results are: (a) a psychometrically tested coding manual of the investigated formal-structural aspects, each of them illustrated with drawings that showed the highest interrater agreement; and (b) the hierarchic cluster structure of the formal aspects of the H-T-P drawings of "normal" adults.
Intertumoral Heterogeneity within Medulloblastoma Subgroups.

PubMed

Cavalli, Florence M G; Remke, Marc; Rampasek, Ladislav; Peacock, John; Shih, David J H; Luu, Betty; Garzia, Livia; Torchia, Jonathon; Nor, Carolina; Morrissy, A Sorana; Agnihotri, Sameer; Thompson, Yuan Yao; Kuzan-Fischer, Claudia M; Farooq, Hamza; Isaev, Keren; Daniels, Craig; Cho, Byung-Kyu; Kim, Seung-Ki; Wang, Kyu-Chang; Lee, Ji Yeoun; Grajkowska, Wieslawa A; Perek-Polnik, Marta; Vasiljevic, Alexandre; Faure-Conter, Cecile; Jouvet, Anne; Giannini, Caterina; Nageswara Rao, Amulya A; Li, Kay Ka Wai; Ng, Ho-Keung; Eberhart, Charles G; Pollack, Ian F; Hamilton, Ronald L; Gillespie, G Yancey; Olson, James M; Leary, Sarah; Weiss, William A; Lach, Boleslaw; Chambless, Lola B; Thompson, Reid C; Cooper, Michael K; Vibhakar, Rajeev; Hauser, Peter; van Veelen, Marie-Lise C; Kros, Johan M; French, Pim J; Ra, Young Shin; Kumabe, Toshihiro; López-Aguilar, Enrique; Zitterbart, Karel; Sterba, Jaroslav; Finocchiaro, Gaetano; Massimino, Maura; Van Meir, Erwin G; Osuka, Satoru; Shofuda, Tomoko; Klekner, Almos; Zollo, Massimo; Leonard, Jeffrey R; Rubin, Joshua B; Jabado, Nada; Albrecht, Steffen; Mora, Jaume; Van Meter, Timothy E; Jung, Shin; Moore, Andrew S; Hallahan, Andrew R; Chan, Jennifer A; Tirapelli, Daniela P C; Carlotti, Carlos G; Fouladi, Maryam; Pimentel, José; Faria, Claudia C; Saad, Ali G; Massimi, Luca; Liau, Linda M; Wheeler, Helen; Nakamura, Hideo; Elbabaa, Samer K; Perezpeña-Diazconti, Mario; Chico Ponce de León, Fernando; Robinson, Shenandoah; Zapotocky, Michal; Lassaletta, Alvaro; Huang, Annie; Hawkins, Cynthia E; Tabori, Uri; Bouffet, Eric; Bartels, Ute; Dirks, Peter B; Rutka, James T; Bader, Gary D; Reimand, Jüri; Goldenberg, Anna; Ramaswamy, Vijay; Taylor, Michael D

2017-06-12

While molecular subgrouping has revolutionized medulloblastoma classification, the extent of heterogeneity within subgroups is unknown. Similarity network fusion (SNF) applied to genome-wide DNA methylation and gene expression data across 763 primary samples identifies very homogeneous clusters of patients, supporting the presence of medulloblastoma subtypes. After integration of somatic copy-number alterations, and clinical features specific to each cluster, we identify 12 different subtypes of medulloblastoma. Integrative analysis using SNF further delineates group 3 from group 4 medulloblastoma, which is not as readily apparent through analyses of individual data types. Two clear subtypes of infants with Sonic Hedgehog medulloblastoma with disparate outcomes and biology are identified. Medulloblastoma subtypes identified through integrative clustering have important implications for stratification of future clinical trials. Copyright © 2017 Elsevier Inc. All rights reserved.
Formation of multiply charged ions from large molecules using massive-cluster impact.

PubMed

Mahoney, J F; Cornett, D S; Lee, T D

1994-05-01

Massive-cluster impact is demonstrated to be an effective ionization technique for the mass analysis of proteins as large as 17 kDa. The design of the cluster source permits coupling to both magnetic-sector and quadrupole mass spectrometers. Mass spectra are characterized by the almost total absence of chemical background and a predominance of multiply charged ions formed from 100% glycerol matrix. The number of charge states produced by the technique is observed to range from +3 to +9 for chicken egg lysozyme (14,310 Da). The lower m/z values provided by higher charge states increase the effective mass range of analyses performed with conventional ionization by fast-atom bombardment or liquid secondary ion mass spectrometry.
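The effect of charge state on m/z described above can be checked with a short calculation. This sketch assumes the ions are formed by simple protonation, [M + zH]^z+, which the abstract does not state explicitly; the function name is illustrative.

```python
PROTON_MASS_DA = 1.007276  # mass of a proton in daltons


def mz(neutral_mass_da, charge):
    """m/z of an ion carrying `charge` extra protons (assumed mechanism)."""
    return (neutral_mass_da + charge * PROTON_MASS_DA) / charge


LYSOZYME_MASS_DA = 14310.0  # chicken egg lysozyme mass quoted in the abstract

# Charge states +3 to +9, the range reported for lysozyme.
mz_by_charge = {z: mz(LYSOZYME_MASS_DA, z) for z in range(3, 10)}
for z, value in mz_by_charge.items():
    print(f"+{z}: m/z = {value:.1f}")
```

Under this assumption the +9 ion appears near m/z 1591, well below the 14,310 Da neutral mass, which is why higher charge states bring large proteins within reach of analysers with a limited m/z range.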
Evaluation of hierarchical agglomerative cluster analysis methods for discrimination of primary biological aerosol

NASA Astrophysics Data System (ADS)

Crawford, I.; Ruske, S.; Topping, D. O.; Gallagher, M. W.

2015-11-01

In this paper we present improved methods for discriminating and quantifying primary biological aerosol particles (PBAPs) by applying hierarchical agglomerative cluster analysis to multi-parameter ultraviolet-light-induced fluorescence (UV-LIF) spectrometer data. The methods employed in this study can be applied to data sets in excess of 1 × 10⁶ points on a desktop computer, allowing for each fluorescent particle in a data set to be explicitly clustered. This reduces the potential for misattribution found in subsampling and comparative attribution methods used in previous approaches, improving our capacity to discriminate and quantify PBAP meta-classes. We evaluate the performance of several hierarchical agglomerative cluster analysis linkages and data normalisation methods using laboratory samples of known particle types and an ambient data set. Fluorescent and non-fluorescent polystyrene latex spheres were sampled with a Wideband Integrated Bioaerosol Spectrometer (WIBS-4) where the optical size, asymmetry factor and fluorescent measurements were used as inputs to the analysis package. It was found that the Ward linkage with z-score or range normalisation performed best, correctly attributing 98 and 98.1 % of the data points respectively. The best-performing methods were applied to the BEACHON-RoMBAS (Bio-hydro-atmosphere interactions of Energy, Aerosols, Carbon, H2O, Organics and Nitrogen-Rocky Mountain Biogenic Aerosol Study) ambient data set, where it was found that the z-score and range normalisation methods yield similar results, with each method producing clusters representative of fungal spores and bacterial aerosol, consistent with previous results.
The z-score result was compared to clusters generated with previous approaches (WIBS AnalysiS Program, WASP) where we observe that the subsampling and comparative attribution method employed by WASP results in the overestimation of the fungal spore concentration by a factor of 1.5 and the underestimation of bacterial aerosol concentration by a factor of 5. We suggest that this is likely due to errors arising from misattribution due to poor centroid definition and failure to assign particles to a cluster as a result of the subsampling and comparative attribution method employed by WASP. The methods used here allow for the entire fluorescent population of particles to be analysed, yielding an explicit cluster attribution for each particle and improving cluster centroid definition and our capacity to discriminate and quantify PBAP meta-classes compared to previous approaches.
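To make the combination of normalisation and Ward linkage concrete, here is a minimal pure-Python sketch. It is not the authors' analysis package: the particle values are hypothetical, and the naive search over all cluster pairs is only suitable for tiny examples, not the million-point data sets mentioned above.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Scale each column to zero mean and unit standard deviation."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def range_columns(rows):
    """Scale each column to the interval [0, 1]."""
    cols = list(zip(*rows))
    los = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(v - lo) / sp for v, lo, sp in zip(row, los, spans)] for row in rows]


def ward_clusters(rows, k):
    """Agglomerate until k clusters remain, using Ward's merge cost."""
    clusters = [([i], list(r)) for i, r in enumerate(rows)]  # (members, centroid)
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ma, ca), (mb, cb) = clusters[a], clusters[b]
                d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
                # Increase in within-cluster sum of squares if a and b merge.
                cost = len(ma) * len(mb) / (len(ma) + len(mb)) * d2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        na, nb = len(ma), len(mb)
        merged = (ma + mb, [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)])
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    labels = [0] * len(rows)
    for label, (members, _) in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Hypothetical particles: (optical size, asymmetry factor, fluorescence).
particles = [
    [1.0, 10, 50], [1.1, 12, 55], [0.9, 11, 48],
    [3.0, 30, 400], [3.2, 28, 420], [2.9, 31, 390],
]
labels_z = ward_clusters(zscore_columns(particles), 2)
labels_r = ward_clusters(range_columns(particles), 2)
print(labels_z, labels_r)
```

Normalising first matters because the raw fluorescence values are orders of magnitude larger than the size values and would otherwise dominate the squared distances that Ward's criterion minimises.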
Performance Assessment of Kernel Density Clustering for Gene Expression Profile Data

PubMed Central

Zeng, Beiyan; Chen, Yiping P.; Smith, Oscar H.

2003-01-01

Kernel density smoothing techniques have been used in classification or supervised learning of gene expression profile (GEP) data, but their applications to clustering or unsupervised learning of those data have not been explored and assessed. Here we report a kernel density clustering method for analysing GEP data and compare its performance with the three most widely-used clustering methods: hierarchical clustering, K-means clustering, and multivariate mixture model-based clustering. Using several methods to measure agreement, between-cluster isolation, and within-cluster coherence, such as the Adjusted Rand Index, the Pseudo F test, the r² test, and the profile plot, we have assessed the effectiveness of kernel density clustering for recovering clusters, and its robustness against noise on clustering both simulated and real GEP data. Our results show that the kernel density clustering method has excellent performance in recovering clusters from simulated data and in grouping large real expression profile data sets into compact and well-isolated clusters, and that it is the most robust clustering method for analysing noisy expression profile data compared to the other three methods assessed. PMID:18629292
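The Adjusted Rand Index mentioned above measures agreement between two partitions of the same items, corrected for chance. A minimal sketch of the standard pair-counting formula follows; the label vectors are hypothetical and the function expects at least two items.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two clusterings of the same items."""
    n = len(labels_a)
    # Pairs of items placed together in both partitions.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each partition separately.
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # both partitions trivial; treat as full agreement
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical labels: the same grouping under different label names.
true_groups = [0, 0, 1, 1, 2, 2]
recovered = [1, 1, 0, 0, 2, 2]
ari_same = adjusted_rand_index(true_groups, recovered)

# A partition that cuts across the first one.
ari_crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(ari_same, ari_crossed)
```

Because the index depends only on which items share a cluster, renaming the clusters does not change it, which makes it suitable for comparing a recovered clustering against known simulated groups.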

Hydrogeochemical processes and isotopes analysis. Study case: "La Línea Tunnel", Colombia

NASA Astrophysics Data System (ADS)

Piña, Adriana; Donado, Leonardo; Cramer, Thomas

2017-04-01

Hydrogeochemical and stable isotope analyses have been widely used to identify recharge and discharge zones, flowpaths, the type, origin and age of water, chemical processes between minerals and groundwater, and the effects of anthropogenic or natural pollution. In this paper we analyze the interactions between groundwater and surface water, using the tunnels at the La Línea Massif in the Cordillera Central of the Colombian Andes as a natural laboratory. The massif is formed by two fractured igneous-metamorphic complexes (Cajamarca and Quebradagrande group) plus andesitic porphyry rocks from the Tertiary period. There, eight main fault zones related to surface creeks were identified and the main inflows inside the tunnels were reported. Sixty water samples were collected at the surface and inside the tunnel in fault zones in two different years, 2010 and 2015. To classify the water samples, a multivariate statistical analysis combining Factor Analysis (FA) with Hierarchical Cluster Analysis (HCA) was performed. Then, analyses of the major chemical elements and water isotopes (¹⁸O, ²H and ³H) were used to define the origin of dissolved components and to analyse the evolution in time. Most samples were classified as bicarbonate calcite water or bicarbonate magnesium water type. Isotopic analyses show a characteristic behavior for the east and west watersheds and for each geologic group. According to the FA and HCA, the factors and clusters obtained are related first to the location of the samples (surface or tunnel samples) and then to the geology. Surface samples behave according to the Colombian meteoric line as inflows related to permeable faults while less permeable faults show hydrothermal processes. Finally, water evolution in time shows a decrease of pH, conductivity and Mg²⁺ related to silicate weathering or precipitation/dissolution processes that affect the spacing in fractures and, consequently, the hydraulic properties.
Intermediate and advanced topics in multilevel logistic regression analysis

PubMed Central

Merlo, Juan

2017-01-01

Multilevel data occur frequently in health services, population and public health, and epidemiologic research. In such research, binary outcomes are common. Multilevel logistic regression models allow one to account for the clustering of subjects within clusters of higher‐level units when estimating the effect of subject and cluster characteristics on subject outcomes. A search of the PubMed database demonstrated that the use of multilevel or hierarchical regression models is increasing rapidly. However, our impression is that many analysts simply use multilevel regression models to account for the nuisance of within‐cluster homogeneity that is induced by clustering. In this article, we describe a suite of analyses that can complement the fitting of multilevel logistic regression models. These ancillary analyses permit analysts to estimate the marginal or population‐average effect of covariates measured at the subject and cluster level, in contrast to the within‐cluster or cluster‐specific effects arising from the original multilevel logistic regression model. We describe the interval odds ratio and the proportion of opposed odds ratios, which are summary measures of effect for cluster‐level covariates. We describe the variance partition coefficient and the median odds ratio which are measures of components of variance and heterogeneity in outcomes. These measures allow one to quantify the magnitude of the general contextual effect. We describe an R² measure that allows analysts to quantify the proportion of variation explained by different multilevel logistic regression models. We illustrate the application and interpretation of these measures by analyzing mortality in patients hospitalized with a diagnosis of acute myocardial infarction. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd. PMID:28543517
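Two of the measures named above can be computed directly from the estimated between-cluster variance of a random-intercept logistic model. The sketch below uses the commonly cited latent-variable formulas, which the abstract itself does not spell out, and a hypothetical variance of 0.5.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def variance_partition_coefficient(cluster_variance):
    """Share of latent-scale variance attributable to clusters.

    Uses the latent-variable formulation, in which the subject-level
    residual variance of a logistic model is fixed at pi^2 / 3.
    """
    return cluster_variance / (cluster_variance + pi ** 2 / 3)


def median_odds_ratio(cluster_variance):
    """Median odds ratio between two randomly chosen clusters."""
    z75 = NormalDist().inv_cdf(0.75)  # 75th percentile of the standard normal
    return exp(sqrt(2 * cluster_variance) * z75)


sigma2 = 0.5  # hypothetical between-hospital variance of the random intercept
vpc = variance_partition_coefficient(sigma2)
mor = median_odds_ratio(sigma2)
print(f"VPC = {vpc:.3f}, MOR = {mor:.3f}")
```

A median odds ratio of 1 corresponds to no between-cluster variation; larger values indicate that a subject's odds of the outcome depend more strongly on which cluster they belong to.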
New natural products isolated from Metarhizium robertsii ARSEF 23 by chemical screening and identification of the gene cluster through engineered biosynthesis in Aspergillus nidulans A1145.

PubMed

Kato, Hiroki; Tsunematsu, Yuta; Yamamoto, Tsuyoshi; Namiki, Takuya; Kishimoto, Shinji; Noguchi, Hiroshi; Watanabe, Kenji

2016-07-01

To rapidly identify novel natural products and their associated biosynthetic genes from underutilized and genetically difficult-to-manipulate microbes, we developed a method that uses (1) chemical screening to isolate novel microbial secondary metabolites, (2) bioinformatic analyses to identify a potential biosynthetic gene cluster and (3) heterologous expression of the genes in a convenient host to confirm the identity of the gene cluster and the proposed biosynthetic mechanism. The chemical screen was achieved by searching known natural product databases with data from liquid chromatographic and high-resolution mass spectrometric analyses collected on the extract from a target microbe culture. Using this method, we were able to isolate two new meroterpenes, subglutinols C (1) and D (2), from an entomopathogenic filamentous fungus Metarhizium robertsii ARSEF 23. Bioinformatics analysis of the genome allowed us to identify a gene cluster likely to be responsible for the formation of subglutinols. Heterologous expression of three genes from the gene cluster encoding a polyketide synthase, a prenyltransferase and a geranylgeranyl pyrophosphate synthase in Aspergillus nidulans A1145 afforded an α-pyrone-fused uncyclized diterpene, the expected intermediate of the subglutinol biosynthesis, thereby confirming the gene cluster to be responsible for the subglutinol biosynthesis. These results indicate the usefulness of our methodology in isolating new natural products and identifying their associated biosynthetic gene cluster from microbes that are not amenable to genetic manipulation. Our method should facilitate the natural product discovery efforts by expediting the identification of new secondary metabolites and their associated biosynthetic genes from a wider source of microbes.
A new clustering of antibody CDR loop conformations

PubMed Central

North, Benjamin; Lehmann, Andreas; Dunbrack, Roland L.

2010-01-01

Previous analyses of the complementarity determining regions (CDRs) of antibodies have focused on a small number of “canonical” conformations for each loop. This is primarily the result of the work of Chothia and colleagues, most recently in 1997. Because of the widespread utility of antibodies, we have revisited the clustering of conformations of the six CDR loops with the much larger amount of structural information currently available. In this work, we were careful to use a high-quality data set by eliminating low-resolution structures and CDRs with high B-factors or high conformational energies. We used a distance function based on directional statistics and an effective clustering algorithm using affinity propagation. With this data set of over 300 non-redundant antibody structures, we were able to cover 28 CDR-length combinations (e.g., L1 length 11, or “L1-11” in our nomenclature) for L1, L2, L3, H1 and H2. The Chothia analysis covered only 20 CDR-lengths. Only four of these had more than one conformational cluster, of which two could easily be distinguished by gene source (mouse/human; κ/λ) and one purely by the presence and positions of Pro residues (L3-9). Thus using the Chothia analysis does not require the complicated set of “structure-determining residues” that is often assumed. Of our 28 CDR-lengths, 15 of them have multiple conformational clusters including ten for which Chothia had only one canonical class. We have a total of 72 clusters for the non-H3 CDRs; approximately 85% of the non-H3 sequences can be assigned to a conformational cluster based on gene source and/or sequence. We found that earlier predictions of “bulged” vs. “non-bulged” conformations based on the presence or absence of anchor residues Arg/Lys94 and Asp101 of H3 have not held up, since all four combinations lead to a majority of conformations that are bulged. Thus the earlier analyses have been significantly enhanced by the increased data. 
We believe the new classification will lead to improved methods for antibody structure prediction and design. PMID:21035459
The Prevalence of Comorbid Personality Disorders in Treatment-Seeking Problem Gamblers: A Systematic Review and Meta-Analysis.

PubMed

Dowling, Nicki A; Cowlishaw, S; Jackson, A C; Merkouris, S S; Francis, K L; Christensen, D R

2015-12-01

The aim of this study was to systematically review and meta-analyze the prevalence of comorbid personality disorders among treatment-seeking problem gamblers. Almost one half (47.9%) of problem gamblers displayed comorbid personality disorders. They were most likely to display Cluster B disorders (17.6%), with smaller proportions reporting Cluster C disorders (12.6%) and Cluster A disorders (6.1%). The most prevalent personality disorders were narcissistic (16.6%), antisocial (14.0%), avoidant (13.4%), obsessive-compulsive (13.4%), and borderline (13.1%) personality disorders. Sensitivity analyses suggested that these prevalence estimates were robust to the inclusion of clinical trials and self-selected samples. Although there was significant variability in reported rates, subgroup analyses revealed no significant differences in estimates of antisocial personality disorder according to problem gambling severity, measure of comorbidity employed, and study jurisdiction. The findings highlight the need for gambling treatment services to conduct routine screening and assessment of co-occurring personality disorders and to provide treatment approaches that adequately address these comorbid conditions.
The MUSE-Wide survey: detection of a clustering signal from Lyman α emitters in the range 3 < z < 6

NASA Astrophysics Data System (ADS)

Diener, C.; Wisotzki, L.; Schmidt, K. B.; Herenz, E. C.; Urrutia, T.; Garel, T.; Kerutt, J.; Saust, R. L.; Bacon, R.; Cantalupo, S.; Contini, T.; Guiderdoni, B.; Marino, R. A.; Richard, J.; Schaye, J.; Soucail, G.; Weilbacher, P. M.

2017-11-01

We present a clustering analysis of a sample of 238 Ly α emitters at redshift 3 ≲ z ≲ 6 from the MUSE-Wide survey. This survey mosaics extragalactic legacy fields with 1h MUSE pointings to detect statistically relevant samples of emission line galaxies. We analysed the first year observations from MUSE-Wide making use of the clustering signal in the line-of-sight direction. This method relies on comparing pair-counts at close redshifts for a fixed transverse distance and thus exploits the full potential of the redshift range covered by our sample. A clear clustering signal with a correlation length of r0 = 2.9 (+1.0/−1.1) Mpc (comoving) is detected. Whilst this result is based on only about a quarter of the full survey size, it already shows the immense potential of MUSE for efficiently observing and studying the clustering of Ly α emitters.
Russian consumers' motives for food choice.

PubMed

Honkanen, Pirjo; Frewer, Lynn

2009-04-01

Knowledge about food choice motives which have potential to influence consumer consumption decisions is important when designing food and health policies, as well as marketing strategies. Russian consumers' food choice motives were studied in a survey (1081 respondents across four cities), with the purpose of identifying consumer segments based on these motives. These segments were then profiled using consumption, attitudinal and demographic variables. Face-to-face interviews were used to sample the data, which were analysed with two-step cluster analysis (SPSS). Three clusters emerged, representing 21.5%, 45.8% and 32.7% of the sample. The clusters were similar in terms of the order of motivations, but differed in motivational level. Sensory factors and availability were the most important motives for food choice in all three clusters, followed by price. This may reflect the turbulence which Russia has recently experienced politically and economically. Cluster profiles differed in relation to socio-demographic factors, consumption patterns and attitudes towards health and healthy food.
Sloshing Gas in the Core of the Most Luminous Galaxy Cluster RXJ1347.5-1145

NASA Technical Reports Server (NTRS)

Johnson, Ryan E.; Zuhone, John; Jones, Christine; Forman, William R.; Markevitch, Maxim

2011-01-01

We present new constraints on the merger history of the most X-ray luminous cluster of galaxies, RXJ1347.5-1145, based on its unique multiwavelength morphology. Our X-ray analysis confirms the core gas is undergoing "sloshing" resulting from a prior, large scale, gravitational perturbation. In combination with extensive multiwavelength observations, the sloshing gas points to the primary and secondary clusters having had at least two prior strong gravitational interactions. The evidence supports a model in which the secondary subcluster with mass M = (4.8 ± 2.4) × 10¹⁴ solar masses has previously (≳0.6 Gyr ago) passed by the primary cluster, and has now returned for a subsequent crossing where the subcluster's gas has been completely stripped from its dark matter halo. RXJ1347 is a prime example of how core gas sloshing may be used to constrain the merger histories of galaxy clusters through multiwavelength analyses.
Recognizing millions of consistently unidentified spectra across hundreds of shotgun proteomics datasets

PubMed Central

Griss, Johannes; Perez-Riverol, Yasset; Lewis, Steve; Tabb, David L.; Dianes, José A.; del-Toro, Noemi; Rurik, Marc; Walzer, Mathias W.; Kohlbacher, Oliver; Hermjakob, Henning; Wang, Rui; Vizcaíno, Juan Antonio

2016-01-01

Mass spectrometry (MS) is the main technology used in proteomics approaches. However, on average 75% of spectra analysed in an MS experiment remain unidentified. We propose to use spectrum clustering at large scale to shed light on these unidentified spectra. PRoteomics IDEntifications database (PRIDE) Archive is one of the largest MS proteomics public data repositories worldwide. By clustering all tandem MS spectra publicly available in PRIDE Archive, coming from hundreds of datasets, we were able to consistently characterize three distinct groups of spectra: 1) incorrectly identified spectra, 2) spectra correctly identified but below the set scoring threshold, and 3) truly unidentified spectra. Using a multitude of complementary analysis approaches, we were able to identify less than 20% of the consistently unidentified spectra. The complete spectrum clustering results are available through the new version of the PRIDE Cluster resource (http://www.ebi.ac.uk/pride/cluster). This resource is intended, among other aims, to encourage and simplify further investigation into these unidentified spectra. PMID:27493588
Recognizing millions of consistently unidentified spectra across hundreds of shotgun proteomics datasets.

PubMed

Griss, Johannes; Perez-Riverol, Yasset; Lewis, Steve; Tabb, David L; Dianes, José A; Del-Toro, Noemi; Rurik, Marc; Walzer, Mathias W; Kohlbacher, Oliver; Hermjakob, Henning; Wang, Rui; Vizcaíno, Juan Antonio

2016-08-01

Mass spectrometry (MS) is the main technology used in proteomics approaches. However, on average 75% of spectra analysed in an MS experiment remain unidentified. We propose to use spectrum clustering at large scale to shed light on these unidentified spectra. PRoteomics IDEntifications database (PRIDE) Archive is one of the largest MS proteomics public data repositories worldwide. By clustering all tandem MS spectra publicly available in PRIDE Archive, coming from hundreds of datasets, we were able to consistently characterize three distinct groups of spectra: 1) incorrectly identified spectra, 2) spectra correctly identified but below the set scoring threshold, and 3) truly unidentified spectra. Using a multitude of complementary analysis approaches, we were able to identify less than 20% of the consistently unidentified spectra. The complete spectrum clustering results are available through the new version of the PRIDE Cluster resource (http://www.ebi.ac.uk/pride/cluster). This resource is intended, among other aims, to encourage and simplify further investigation into these unidentified spectra.
Improving clustering with metabolic pathway data.

PubMed

Milone, Diego H; Stegmayer, Georgina; López, Mariana; Kamenetzky, Laura; Carrari, Fernando

2014-04-10

It is common practice in bioinformatics to validate each group returned by a clustering algorithm through manual analysis, according to a priori biological knowledge. This procedure helps to find functionally related patterns and to propose hypotheses about their behavior and the biological processes involved. This knowledge is therefore used only as a second step, after the data have been clustered according to their expression patterns alone. It would thus be very useful to improve the clustering of biological data by incorporating prior knowledge into the cluster formation itself, in order to enhance the biological value of the clusters. A novel training algorithm for clustering is presented, which evaluates the biological internal connections of the data points while the clusters are being formed. Within this training algorithm, the calculation of distances between data points and neuron centroids includes a new term based on information from well-known metabolic pathways. Standard self-organizing map (SOM) training and biologically-inspired SOM (bSOM) training were tested with two real data sets of transcripts and metabolites from the species Solanum lycopersicum and Arabidopsis thaliana. Classical data mining validation measures were used to evaluate the clustering solutions obtained by both algorithms. Moreover, a new measure that takes into account the biological connectivity of the clusters was applied. The results of bSOM show important improvements in convergence and performance for the proposed clustering method in comparison to standard SOM training, in particular from the application point of view. Analyses of the clusters obtained with bSOM indicate that including biological information during training can increase the biological value of the clusters found with the proposed method. 
It is worth highlighting that this has effectively improved the results, which can simplify their further analysis. The algorithm is available as a web demo at http://fich.unl.edu.ar/sinc/web-demo/bsom-lite/. The source code and the data sets supporting the results of this article are available at http://sourceforge.net/projects/sourcesinc/files/bsom.
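The idea of adding a pathway-based term to the distance between a data point and a neuron centroid can be illustrated with a minimal sketch. This is not the authors' bSOM code: the function names, the weighting parameter `alpha`, and the 0/1 form of the pathway penalty are all assumptions made for the example.

```python
import math

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pathway_penalty(point_pathways, neuron_pathways):
    """Hypothetical biological term: 0.0 if the data point shares at least
    one pathway with the items already mapped to the neuron, else 1.0."""
    if not neuron_pathways:
        return 0.0  # empty neuron: no biological evidence either way
    return 0.0 if point_pathways & neuron_pathways else 1.0

def best_matching_unit(x, x_pathways, centroids, neuron_pathways, alpha=0.5):
    """Index of the neuron minimising expression distance plus the
    alpha-weighted pathway penalty (alpha = 0 gives standard SOM matching)."""
    scores = [
        euclidean(x, c) + alpha * pathway_penalty(x_pathways, p)
        for c, p in zip(centroids, neuron_pathways)
    ]
    return min(range(len(scores)), key=scores.__getitem__)

# Illustrative data: two neurons, one data point annotated with the TCA cycle.
centroids = [(0.0, 0.0), (1.0, 0.0)]
neuron_pathways = [{"glycolysis"}, {"TCA"}]
x, x_pathways = (0.45, 0.0), {"TCA"}
standard = best_matching_unit(x, x_pathways, centroids, neuron_pathways, alpha=0.0)
biological = best_matching_unit(x, x_pathways, centroids, neuron_pathways, alpha=0.5)
```

With `alpha=0.0` the point goes to the neuron that is closest in expression space; with a positive `alpha` it can be pulled toward a slightly more distant neuron that shares a pathway with it.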
Development of a model of the tobacco industry's interference with tobacco control programmes

PubMed Central

Trochim, W; Stillman, F; Clark, P; Schmitt, C

2003-01-01

Objective: To construct a conceptual model of tobacco industry tactics to undermine tobacco control programmes for the purposes of: (1) developing measures to evaluate industry tactics, (2) improving tobacco control planning, and (3) supplementing current or future frameworks used to classify and analyse tobacco industry documents. Design: Web based concept mapping was conducted, including expert brainstorming, sorting, and rating of statements describing industry tactics. Statistical analyses used multidimensional scaling and cluster analysis. Interpretation of the resulting maps was accomplished by an expert panel during a face-to-face meeting. Subjects: 34 experts, selected because of their previous encounters with industry resistance or because of their research into industry tactics, took part in some or all phases of the project. Results: Maps with eight non-overlapping clusters in two dimensional space were developed, with importance ratings of the statements and clusters. Cluster and quadrant labels were agreed upon by the experts. Conclusions: The conceptual maps summarise the tactics used by the industry and their relationships to each other, and suggest a possible hierarchy for measures that can be used in statistical modelling of industry tactics and for review of industry documents. Finally, the maps enable hypothesis of a likely progression of industry reactions as public health programmes become more successful, and therefore more threatening to industry profits. PMID:12773723
Phenotypes of comorbidity in OSAS patients: combining categorical principal component analysis with cluster analysis.

PubMed

Vavougios, George D; George D, George; Pastaka, Chaido; Zarogiannis, Sotirios G; Gourgoulianis, Konstantinos I

2016-02-01

Phenotyping obstructive sleep apnea syndrome's comorbidity has been attempted for the first time only recently. The aim of our study was to determine phenotypes of comorbidity in obstructive sleep apnea syndrome patients employing a data-driven approach. Data from 1472 consecutive patient records were recovered from our hospital's database. Categorical principal component analysis and two-step clustering were employed to detect distinct clusters in the data. Univariate comparisons between clusters included one-way analysis of variance with Bonferroni correction and chi-square tests. Predictors of pairwise cluster membership were determined via a binary logistic regression model. The analyses revealed six distinct clusters: A, 'healthy, reporting sleeping related symptoms'; B, 'mild obstructive sleep apnea syndrome without significant comorbidities'; C1: 'moderate obstructive sleep apnea syndrome, obesity, without significant comorbidities'; C2: 'moderate obstructive sleep apnea syndrome with severe comorbidity, obesity and the exclusive inclusion of stroke'; D1: 'severe obstructive sleep apnea syndrome and obesity without comorbidity and a 33.8% prevalence of hypertension'; and D2: 'severe obstructive sleep apnea syndrome with severe comorbidities, along with the highest Epworth Sleepiness Scale score and highest body mass index'. Clusters differed significantly in apnea-hypopnea index, oxygen desaturation index; arousal index; age, body mass index, minimum oxygen saturation and daytime oxygen saturation (one-way analysis of variance P < 0.0001). Binary logistic regression indicated that older age, greater body mass index, lower daytime oxygen saturation and hypertension were associated independently with an increased risk of belonging in a comorbid cluster. Six distinct phenotypes of obstructive sleep apnea syndrome and its comorbidities were identified. 
Mapping the heterogeneity of the obstructive sleep apnea syndrome may help the early identification of at-risk groups. Finally, determining predictors of comorbidity for the moderate and severe strata of these phenotypes implies a need to take these factors into account when considering obstructive sleep apnea syndrome treatment options. © 2015 The Authors. Journal of Sleep Research published by John Wiley & Sons Ltd on behalf of European Sleep Research Society.
Modeling Uncertainties in EEG Microstates: Analysis of Real and Imagined Motor Movements Using Probabilistic Clustering-Driven Training of Probabilistic Neural Networks.

PubMed

Dinov, Martin; Leech, Robert

2017-01-01

Part of the process of EEG microstate estimation involves clustering EEG channel data at the global field power (GFP) maxima, very commonly using a modified K-means approach. Clustering has also been done deterministically, despite there being uncertainties in multiple stages of the microstate analysis, including the GFP peak definition, the clustering itself and in the post-clustering assignment of microstates back onto the EEG timecourse of interest. We perform a fully probabilistic microstate clustering and labeling, to account for these sources of uncertainty using the closest probabilistic analog to KM called Fuzzy C-means (FCM). We train softmax multi-layer perceptrons (MLPs) using the KM and FCM-inferred cluster assignments as target labels, to then allow for probabilistic labeling of the full EEG data instead of the usual correlation-based deterministic microstate label assignment typically used. We assess the merits of the probabilistic analysis vs. the deterministic approaches in EEG data recorded while participants perform real or imagined motor movements from a publicly available data set of 109 subjects. Though FCM group template maps that are almost topographically identical to KM were found, there is considerable uncertainty in the subsequent assignment of microstate labels. In general, imagined motor movements are less predictable on a time point-by-time point basis, possibly reflecting the more exploratory nature of the brain state during imagined, compared to during real motor movements. We find that some relationships may be more evident using FCM than using KM and propose that future microstate analysis should preferably be performed probabilistically rather than deterministically, especially in situations such as with brain computer interfaces, where both training and applying models of microstates need to account for uncertainty. 
Probabilistic neural network-driven microstate assignment has a number of advantages that we have discussed, which are likely to be further developed and exploited in future studies. In conclusion, probabilistic clustering and a probabilistic neural network-driven approach to microstate analysis is likely to better model and reveal details and the variability hidden in current deterministic and binarized microstate assignment and analyses.
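The contrast between hard K-means labels and Fuzzy C-means memberships can be sketched with the standard FCM membership formula. This is an illustration only, not the authors' pipeline: the distances are made-up numbers, the fuzzifier `m = 2` is an assumed default, and the function names are hypothetical.

```python
def fcm_memberships(distances, m=2.0):
    """Standard FCM membership of one sample in each cluster, given its
    distances to the cluster centres: u_i = 1 / sum_j (d_i / d_j)^(2/(m-1))."""
    zero = [i for i, d in enumerate(distances) if d == 0.0]
    if zero:
        # The sample coincides with one or more centres: share membership there.
        return [1.0 / len(zero) if i in zero else 0.0
                for i in range(len(distances))]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_j) ** exponent for d_j in distances)
        for d_i in distances
    ]

def hard_label(distances):
    """K-means style assignment: index of the nearest centre."""
    return min(range(len(distances)), key=distances.__getitem__)

# One sample, two candidate microstate templates (illustrative distances).
distances = [1.0, 2.0]
memberships = fcm_memberships(distances)  # graded membership, sums to 1
label = hard_label(distances)             # single deterministic label
```

The hard label discards how close the second template was, whereas the membership vector keeps that uncertainty available for later stages such as training a softmax classifier on soft targets.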
Modeling Uncertainties in EEG Microstates: Analysis of Real and Imagined Motor Movements Using Probabilistic Clustering-Driven Training of Probabilistic Neural Networks

PubMed Central

Dinov, Martin; Leech, Robert

2017-01-01

Part of the process of EEG microstate estimation involves clustering EEG channel data at the global field power (GFP) maxima, very commonly using a modified K-means approach. Clustering has also been done deterministically, despite there being uncertainties in multiple stages of the microstate analysis, including the GFP peak definition, the clustering itself and in the post-clustering assignment of microstates back onto the EEG timecourse of interest. We perform a fully probabilistic microstate clustering and labeling, to account for these sources of uncertainty using the closest probabilistic analog to KM called Fuzzy C-means (FCM). We train softmax multi-layer perceptrons (MLPs) using the KM and FCM-inferred cluster assignments as target labels, to then allow for probabilistic labeling of the full EEG data instead of the usual correlation-based deterministic microstate label assignment typically used. We assess the merits of the probabilistic analysis vs. the deterministic approaches in EEG data recorded while participants perform real or imagined motor movements from a publicly available data set of 109 subjects. Though FCM group template maps that are almost topographically identical to KM were found, there is considerable uncertainty in the subsequent assignment of microstate labels. In general, imagined motor movements are less predictable on a time point-by-time point basis, possibly reflecting the more exploratory nature of the brain state during imagined, compared to during real motor movements. We find that some relationships may be more evident using FCM than using KM and propose that future microstate analysis should preferably be performed probabilistically rather than deterministically, especially in situations such as with brain computer interfaces, where both training and applying models of microstates need to account for uncertainty. 
Probabilistic neural network-driven microstate assignment has a number of advantages that we have discussed, which are likely to be further developed and exploited in future studies. In conclusion, probabilistic clustering and a probabilistic neural network-driven approach to microstate analysis is likely to better model and reveal details and the variability hidden in current deterministic and binarized microstate assignment and analyses. PMID:29163110
Identifying Patient Attitudinal Clusters Associated with Asthma Control: The European REALISE Survey.

PubMed

van der Molen, Thys; Fletcher, Monica; Price, David

Asthma is a highly heterogeneous disease that can be classified into different clinical phenotypes, and treatment may be tailored accordingly. However, factors beyond purely clinical traits, such as patient attitudes and behaviors, can also have a marked impact on treatment outcomes. The objective of this study was to further analyze data from the REcognise Asthma and LInk to Symptoms and Experience (REALISE) Europe survey, to identify distinct patient groups sharing common attitudes toward asthma and its management. Factor analysis of respondent data (N = 7,930) from the REALISE Europe survey consolidated the 34 attitudinal variables provided by the study population into a set of 8 summary factors. Cluster analyses were used to identify patient clusters that showed similar attitudes and behaviors toward each of the 8 summary factors. Five distinct patient clusters were identified and named according to the key characteristics comprising that cluster: "Confident and self-managing," "Confident and accepting of their asthma," "Confident but dependent on others," "Concerned but confident in their health care professional (HCP)," and "Not confident in themselves or their HCP." Clusters showed clear variability in attributes such as degree of confidence in managing their asthma, use of reliever and preventer medication, and level of asthma control. The 5 patient clusters identified in this analysis displayed distinctly different personal attitudes that would require different approaches in the consultation room certainly for asthma but probably also for other chronic diseases. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.
Pattern recognition approach to the subsequent event of damaging earthquakes in Italy

NASA Astrophysics Data System (ADS)

Gentili, S.; Di Giovambattista, R.

2017-05-01

In this study, we investigate the occurrence of large aftershocks following the most significant earthquakes that occurred in Italy after 1980. In accordance with previous studies (Vorobieva and Panza, 1993; Vorobieva, 1999), we group clusters associated with mainshocks into two categories: "type A" if, given a main shock of magnitude M, the subsequent strongest earthquake in the cluster has magnitude ≥ M − 1, or "type B" otherwise. In this paper, we apply a pattern recognition approach using statistical features to foresee the class of the analysed clusters. The classification of the two categories is based on some features of the time, space, and magnitude distribution of the aftershocks. Specifically, we analyse the temporal evolution of the radiated energy at different elapsed times after the mainshock, the spatio-temporal evolution of the aftershocks occurring within a few days, and the probability of a strong earthquake. An attempt is made to classify the studied region into smaller seismic zones with a prevalence of type A and B clusters. We demonstrate that the two types of clusters have distinct preferred geographic locations inside the Italian territory that likely reflected key properties of the deforming regions, different crustal domains and faulting style. We use decision trees as classifiers of single features to characterize the features depending on the cluster type. The performance of the classification is tested by the Leave-One-Out method. The analysis is performed on different time-spans after the mainshock to simulate the dependence of the accuracy on the information available as data increased over a longer period with increasing time after the mainshock.
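The type A / type B labelling rule, and the idea of testing a single-feature classifier by leave-one-out, can be sketched as follows. This is a simplified illustration and not the authors' code: a one-threshold decision stump stands in for their decision trees, and the feature values are invented for the example.

```python
def cluster_type(mainshock_mag, aftershock_mags):
    """Type A if the strongest subsequent event has magnitude >= M - 1,
    type B otherwise (including clusters with no recorded aftershock)."""
    if aftershock_mags and max(aftershock_mags) >= mainshock_mag - 1.0:
        return "A"
    return "B"

def fit_stump(xs, ys):
    """One-feature decision stump: choose the threshold and orientation
    with the fewest training errors. Returns (threshold, label_at_or_above)."""
    best = None
    for t in sorted(set(xs)):
        for above in ("A", "B"):
            below = "B" if above == "A" else "A"
            errors = sum((above if x >= t else below) != y
                         for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, above)
    return best[1], best[2]

def loo_accuracy(xs, ys):
    """Leave-one-out: refit on all clusters but one, predict the held-out one."""
    hits = 0
    for i in range(len(xs)):
        t, above = fit_stump(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        below = "B" if above == "A" else "A"
        hits += (above if xs[i] >= t else below) == ys[i]
    return hits / len(xs)

# Hypothetical single feature per cluster, with the observed cluster type.
feature = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
labels = ["B", "B", "B", "A", "A", "A"]
accuracy = loo_accuracy(feature, labels)
```

Because each held-out cluster is excluded from the fit that predicts it, the leave-one-out score is a less optimistic estimate than the training accuracy, which matters when only a few dozen clusters are available.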
Integrated simultaneous analysis of different biomedical data types with exact weighted bi-cluster editing.

PubMed

Sun, Peng; Guo, Jiong; Baumbach, Jan

2012-07-17

The explosion of biological data has largely influenced the focus of today’s biology research. Integrating and analysing large quantities of data to provide meaningful insights has become the main challenge to biologists and bioinformaticians. One major problem is the combined analysis of data of different types, such as phenotypes and genotypes. This data is modelled as bi-partite graphs where nodes correspond to the different data points, mutations and diseases for instance, and weighted edges relate to associations between them. Bi-clustering is a special case of clustering designed for partitioning two different types of data simultaneously. We present a bi-clustering approach that solves the NP-hard weighted bi-cluster editing problem by transforming a given bi-partite graph into a disjoint union of bi-cliques. Here we contribute an exact algorithm that is based on fixed-parameter tractability. We evaluated its performance on artificial graphs first. Afterwards we applied our Java implementation, as an example, to data from genome-wide association studies (GWAS), aiming to discover new, previously unobserved geno-to-pheno associations. We believe that our results will serve as guidelines for further wet lab investigations. Generally our software can be applied to any kind of data that can be modelled as bi-partite graphs. To our knowledge it is the fastest exact method for the weighted bi-cluster editing problem.
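The objective behind weighted bi-cluster editing can be made concrete with a small sketch that scores a candidate solution. This is only an illustration of the cost being minimised, not the exact fixed-parameter algorithm of the paper: the sign convention (positive weight means an edge is present, negative means absent) and all names and numbers are assumptions for the example.

```python
def editing_cost(weights, left_cluster, right_cluster):
    """Cost of turning a weighted bi-partite graph into disjoint bi-cliques.

    weights[(u, v)] > 0: edge present, cost w to delete it.
    weights[(u, v)] < 0: edge absent, cost |w| to insert it.
    left_cluster / right_cluster map each node to its bi-cluster id.
    """
    cost = 0.0
    for (u, v), w in weights.items():
        same = left_cluster[u] == right_cluster[v]
        if w > 0 and not same:
            cost += w      # existing edge between different bi-clusters: delete
        elif w < 0 and same:
            cost += -w     # missing edge inside a bi-cluster: insert
    return cost

# Hypothetical genotype (g) to phenotype (d) association weights.
weights = {
    ("g1", "d1"): 2.0, ("g1", "d2"): -1.0,
    ("g2", "d1"): 0.5, ("g2", "d2"): 3.0,
}
split = editing_cost(weights, {"g1": 0, "g2": 1}, {"d1": 0, "d2": 1})
merged = editing_cost(weights, {"g1": 0, "g2": 0}, {"d1": 0, "d2": 0})
```

An exact method searches for the assignment with the smallest such cost; here the two-cluster solution is cheaper than merging everything into one bi-clique.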
Integrated simultaneous analysis of different biomedical data types with exact weighted bi-cluster editing.

PubMed

Sun, Peng; Guo, Jiong; Baumbach, Jan

2012-06-01

The explosion of biological data has largely influenced the focus of today's biology research. Integrating and analysing large quantities of data to provide meaningful insights has become the main challenge to biologists and bioinformaticians. One major problem is the combined analysis of data of different types, such as phenotypes and genotypes. This data is modelled as bi-partite graphs where nodes correspond to the different data points, mutations and diseases for instance, and weighted edges relate to associations between them. Bi-clustering is a special case of clustering designed for partitioning two different types of data simultaneously. We present a bi-clustering approach that solves the NP-hard weighted bi-cluster editing problem by transforming a given bi-partite graph into a disjoint union of bi-cliques. Here we contribute an exact algorithm that is based on fixed-parameter tractability. We evaluated its performance on artificial graphs first. Afterwards we applied our Java implementation, as an example, to data from genome-wide association studies (GWAS), aiming to discover new, previously unobserved geno-to-pheno associations. We believe that our results will serve as guidelines for further wet lab investigations. Generally our software can be applied to any kind of data that can be modelled as bi-partite graphs. To our knowledge it is the fastest exact method for the weighted bi-cluster editing problem.
Health-related fitness profiles in adolescents with complex congenital heart disease.

PubMed

Klausen, Susanne Hwiid; Wetterslev, Jørn; Søndergaard, Lars; Andersen, Lars L; Mikkelsen, Ulla Ramer; Dideriksen, Kasper; Zoffmann, Vibeke; Moons, Philip

2015-04-01

This study investigates whether subgroups of different health-related fitness (HrF) profiles exist among girls and boys with complex congenital heart disease (ConHD) and how these are associated with lifestyle behaviors. We measured the cardiorespiratory fitness, muscle strength, and body composition of 158 adolescents aged 13-16 years with previous surgery for a complex ConHD. Data on lifestyle behaviors were collected concomitantly between October 2010 and April 2013. A cluster analysis was conducted to identify profiles with similar HrF. For comparisons between clusters, multivariate analyses of covariance were used to test the differences in lifestyle behaviors. Three distinct profiles were formed: (1) Robust (43, 27%; 20 girls and 23 boys); (2) Moderately Robust (85, 54%; 37 girls and 48 boys); and (3) Less robust (30, 19%; 9 girls and 21 boys). The participants in the Robust clusters reported leading a physically active lifestyle and participants in the Less robust cluster reported leading a sedentary lifestyle. Diagnoses were evenly distributed between clusters. The cluster analysis attributed some of the variability in cardiorespiratory fitness among adolescents with complex ConHD to lifestyle behaviors and physical activity. Profiling of HrF offers a valuable new option in the management of person-centered health promotion. Copyright © 2015 Society for Adolescent Health and Medicine. Published by Elsevier Inc. All rights reserved.

Constraining the Mass of the Spectacular Pandora's Cluster, Abell 2744

NASA Astrophysics Data System (ADS)

Carrasco, Rodrigo; Frye, Brenda; Coe, Dan; Dupke, Renato; Merten, Julian; Sodre, Laerte; Massey, Richard; Braglia, Filberto; Cypriano, Eduardo; Zitrin, Adi; Krick, Jessica; Benitez, Narciso

2011-08-01

Violent cluster mergers provide a unique opportunity to study the interplay between dark matter (DM) and ICM and to set constraints on the nature of DM. In particular, cluster mergers near first core passage allow us to "see" DM by comparing the spatial distribution of the intra-cluster gas (baryonic) to that of DM. We have recently finished a lensing analysis of the particularly interesting merging system, A2744, the Pandora cluster. We found that it is the result of a spectacular merging event, significantly more complex than the "Bullet Cluster", that produced a wide variety of new phenomenologies, among them, a Bullet, a Dark sub-cluster (no gas), a Ghost sub-cluster (no DM), which can provide fundamental insights to the physics of the ICM, and begs further observations. Our analyses revealed 34 arcs produced by strong gravitational lensing, none of which had been published to date. Spectroscopic redshifts of these arcs are essential to determine precise masses of the main merging system providing crucial information for further numerical simulations and to set stronger constraints on the DM self-interaction cross-section. Therefore we are requesting 17.2 hours on Gemini+GMOS-S, primarily to obtain spectroscopic redshifts of multiply strongly lensed arcs produced by this impressive cluster.
A randomised trial of adaptive pacing therapy, cognitive behaviour therapy, graded exercise, and specialist medical care for chronic fatigue syndrome (PACE): statistical analysis plan

PubMed Central

2013-01-01

Background The publication of protocols by medical journals is increasingly becoming an accepted means for promoting good quality research and maximising transparency. Recently, Finfer and Bellomo have suggested the publication of statistical analysis plans (SAPs). The aim of this paper is to make public and to report in detail the planned analyses that were approved by the Trial Steering Committee in May 2010 for the principal papers of the PACE (Pacing, graded Activity, and Cognitive behaviour therapy: a randomised Evaluation) trial, a treatment trial for chronic fatigue syndrome. It illustrates planned analyses of a complex intervention trial that allows for the impact of clustering by care providers, where multiple care-providers are present for each patient in some but not all arms of the trial. Results The trial design, objectives and data collection are reported. Considerations relating to blinding, samples, adherence to the protocol, stratification, centre and other clustering effects, missing data, multiplicity and compliance are described. Descriptive, interim and final analyses of the primary and secondary outcomes are then outlined. Conclusions This SAP maximises transparency, providing a record of all planned analyses, and it may be a resource for those who are developing SAPs, acting as an illustrative example for teaching and methodological research. It is not the sum of the statistical analysis sections of the principal papers, being completed well before individual papers were drafted. Trial registration ISRCTN54285094 assigned 22 May 2003; First participant was randomised on 18 March 2005. PMID:24225069
Classification of municipal occupations.

PubMed

Ilmarinen, J; Suurnäkki, T; Nygård, C H; Landau, K

1991-01-01

Eighty-eight job titles were analyzed with the "ergonomic job analysis procedure" [Arbeitswissenschaftliche Erhebungsverfahren zur Tätigkeitsanalyse (abbreviated AET) in German]. The objective was to classify the wide range of municipal jobs into homogeneous groups according to job demand and to provide better possibilities to study the relationships between work and health among the aging municipal working population. Altogether 216 items were classified. First, a hierarchical cluster analysis was made, and a dendrogram of the analyzed job titles was drawn. Second, a profile analysis was done in which the single items were grouped into 39 sum items, and a graphic profile was drawn. Finally, the stress factors were listed and drawn in ranking order. The cluster analysis formed 13 groups. Groups exposed to the highest stress factor level were kitchen supervisors, dentists, and physicians. More than 10 stress factors (greater than 50% of the maximum) were found in nursing, administration, installation, transport, and technical supervision.
[On measuring of factors influencing the complex need for cultural entertainments of the inhabitants in geriatric nursing homes (3rd information) (author's transl)].

PubMed

Kuhlmey, J; Lautsch, E

1980-01-01

In our second report on the investigation of the need for cultural entertainment among residents of geriatric nursing homes, we tested the influence of the factors age, sex, kind of work, and duration of stay in the nursing home singly and successively for each individual indicator of this complex need. In this third report, the influence of these four factors on the indicators was investigated with simultaneous consideration of their contradictory dependency. The contradictory dependency of the factors was represented by typification (cluster analysis). The cluster analysis produced classes in which similarly disposed residents belong to the same class. The average characteristics of these classes were obtained, and differences were analysed by statistical methods (multidimensional analysis of variance and discriminant analysis).
Analysis of Radiation Damage in Light Water Reactors: Comparison of Cluster Analysis Methods for the Analysis of Atom Probe Data.

PubMed

Hyde, Jonathan M; DaCosta, Gérald; Hatzoglou, Constantinos; Weekes, Hannah; Radiguet, Bertrand; Styman, Paul D; Vurpillot, Francois; Pareige, Cristelle; Etienne, Auriane; Bonny, Giovanni; Castin, Nicolas; Malerba, Lorenzo; Pareige, Philippe

2017-04-01

Irradiation of reactor pressure vessel (RPV) steels causes the formation of nanoscale microstructural features (termed radiation damage), which affect the mechanical properties of the vessel. A key tool for characterizing these nanoscale features is atom probe tomography (APT), due to its high spatial resolution and the ability to identify different chemical species in three dimensions. Microstructural observations using APT can underpin development of a mechanistic understanding of defect formation. However, with atom probe analyses there are currently multiple methods for analyzing the data. This can result in inconsistencies between results obtained from different researchers and unnecessary scatter when combining data from multiple sources. This makes interpretation of results more complex and calibration of radiation damage models challenging. In this work simulations of a range of different microstructures are used to directly compare different cluster analysis algorithms and identify their strengths and weaknesses.
Migratory connectivity and effects of winter temperatures on migratory behaviour of the European robin Erithacus rubecula: a continent-wide analysis.

PubMed

Ambrosini, Roberto; Cuervo, José Javier; du Feu, Chris; Fiedler, Wolfgang; Musitelli, Federica; Rubolini, Diego; Sicurella, Beatrice; Spina, Fernando; Saino, Nicola; Møller, Anders Pape

2016-05-01

Many partially migratory species show phenotypically divergent populations in terms of migratory behaviour, with climate hypothesized to be a major driver of such variability through its differential effects on sedentary and migratory individuals. Based on long-term (1947-2011) bird ringing data, we analysed phenotypic differentiation of migratory behaviour among populations of the European robin Erithacus rubecula across Europe. We showed that clusters of populations sharing breeding and wintering ranges varied from partial (British Isles and Western Europe, NW cluster) to completely migratory (Scandinavia and north-eastern Europe, NE cluster). Distance migrated by birds of the NE (but not of the NW) cluster decreased through time because of a north-eastwards shift in the wintering grounds. Moreover, when winter temperatures in the breeding areas were cold, individuals from the NE cluster also migrated longer distances, while those of the NW cluster moved over shorter distances. Climatic conditions may therefore affect migratory behaviour of robins, although large geographical variation in response to climate seems to exist. © 2016 The Authors. Journal of Animal Ecology © 2016 British Ecological Society.
Ab Initio Electronic Structure Calculation of [4Fe-3S] Cluster of Hydrogenase as Dihydrogen Dissociation/Production Catalyst

NASA Astrophysics Data System (ADS)

Kim, Jaehyun; Kang, Jiyoung; Nishigami, Hiroshi; Kino, Hiori; Tateno, Masaru

2018-03-01

Hydrogenases catalyze both the dissociation and production of dihydrogen (H2). Most hydrogenases are inactivated rapidly and reactivated slowly (in vitro), in the presence of dioxygen (O2) and H2, respectively. However, membrane-bound [NiFe] hydrogenases (MBHs) sustain their activity even together with O2, which is termed "O2 tolerance". In previous experimental analyses, an MBH was shown to include a hydroxyl ion (OH-) bound to an Fe of the super-oxidized [4Fe-3S]5+ cluster in the proximity of the [NiFe] catalytic cluster. In this study, the functional role of the OH- in the O2 tolerance was investigated by ab initio electronic structure calculation of the [4Fe-3S] proximal cluster. The analysis revealed that the OH- significantly altered the electronic structure, thereby inducing the delocalization of the lowest unoccupied molecular orbital (LUMO) toward the [NiFe] catalytic cluster, which may intermediate the electron transfer between the catalytic and proximal clusters. This can promote the O2-tolerant catalytic cycle in the hydrogenase reaction.
Representation of Tinnitus in the US Newspaper Media and in Facebook Pages: Cross-Sectional Analysis of Secondary Data

PubMed Central

Ratinaud, Pierre; Andersson, Gerhard

2018-01-01

Background When people with health conditions begin to manage their health issues, one important issue that emerges is the question as to what exactly do they do with the information that they have obtained through various sources (eg, news media, social media, health professionals, friends, and family). The information they gather helps form their opinions and, to some degree, influences their attitudes toward managing their condition. Objective This study aimed to understand how tinnitus is represented in the US newspaper media and in Facebook pages (ie, social media) using text pattern analysis. Methods This was a cross-sectional study based upon secondary analyses of publicly available data. The 2 datasets (ie, text corpuses) analyzed in this study were generated from US newspaper media during 1980-2017 (downloaded from the database US Major Dailies by ProQuest) and Facebook pages during 2010-2016. The text corpuses were analyzed using the Iramuteq software using cluster analysis and chi-square tests. Results The newspaper dataset had 432 articles. The cluster analysis resulted in 5 clusters, which were named as follows: (1) brain stimulation (26.2%), (2) symptoms (13.5%), (3) coping (19.8%), (4) social support (24.2%), and (5) treatment innovation (16.4%). A time series analysis of clusters indicated a change in the pattern of information presented in newspaper media during 1980-2017 (eg, more emphasis on cluster 5, focusing on treatment inventions). The Facebook dataset had 1569 texts. The cluster analysis resulted in 7 clusters, which were named as: (1) diagnosis (21.9%), (2) cause (4.1%), (3) research and development (13.6%), (4) social support (18.8%), (5) challenges (11.1%), (6) symptoms (21.4%), and (7) coping (9.2%). A time series analysis of clusters indicated no change in information presented in Facebook pages on tinnitus during 2011-2016. 
Conclusions The study highlights the specific aspects about tinnitus that the US newspaper media and Facebook pages focus on, as well as how these aspects change over time. These findings can help health care providers better understand the presuppositions that tinnitus patients may have. More importantly, the findings can help public health experts and health communication experts in tailoring health information about tinnitus to promote self-management, as well as assisting in appropriate choices of treatment for those living with tinnitus. PMID:29739734
Understanding Statistical Power in Cluster Randomized Trials: Challenges Posed by Differences in Notation and Terminology

ERIC Educational Resources Information Center

Spybrook, Jessaca; Hedges, Larry; Borenstein, Michael

2014-01-01

Research designs in which clusters are the unit of randomization are quite common in the social sciences. Given the multilevel nature of these studies, the power analyses for these studies are more complex than in a simple individually randomized trial. Tools are now available to help researchers conduct power analyses for cluster randomized…
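As a hedged illustration of why power analysis is more involved when clusters are randomized, the sketch below computes the standard design effect, 1 + (m - 1)ρ, and the effective sample size that follows from it. The function names, the cluster size m, and the intraclass correlation ρ are illustrative assumptions and do not come from the record above.

```python
def design_effect(cluster_size, icc):
    """Variance inflation from randomizing equal-sized clusters instead of individuals."""
    return 1.0 + (cluster_size - 1) * icc


def effective_sample_size(n_clusters, cluster_size, icc):
    """Number of independent observations the clustered sample is worth."""
    total = n_clusters * cluster_size
    return total / design_effect(cluster_size, icc)


# Hypothetical trial: 40 clusters of 25 participants, intraclass correlation 0.10.
deff = design_effect(25, 0.10)
n_eff = effective_sample_size(40, 25, 0.10)
print(round(deff, 2), round(n_eff, 1))
```

Under these assumed values, 1,000 enrolled participants carry the information of roughly 294 independent ones, which is why power for such trials depends on the number of clusters as well as on the total sample size.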
An investigation on thermal patterns in Iran based on spatial autocorrelation

NASA Astrophysics Data System (ADS)

Fallah Ghalhari, Gholamabbas; Dadashi Roudbari, Abbasali

2018-02-01

The present study aimed at investigating temporal-spatial patterns and monthly patterns of temperature in Iran using new spatial statistical methods such as cluster and outlier analysis, and hotspot analysis. To do so, climatic parameters, namely the monthly average temperature of 122 synoptic stations, were assessed. Statistical analysis showed that January, with 120.75%, had the most fluctuation among the studied months. Global Moran's Index revealed that yearly changes of temperature in Iran followed a strong spatially clustered pattern. Findings showed that the strongest thermal cluster pattern in Iran, 0.975388, occurred in May. Cluster and outlier analyses showed that thermal homogeneity in Iran decreases in cold months, while it increases in warm months. This is due to the radiation angle and synoptic systems, which strongly influence thermal order in Iran. Elevation, however, plays the most notable role, as shown by the geographically weighted regression model. Hotspot analysis showed that hot thermal patterns (very hot, hot, and semi-hot) were dominant in the South, covering 33.5% of the area (about 552,145.3 km²). Regions such as mountain foot and lowlands, covering 25.2% (about 415,345.1 km²), lack any significant spatial autocorrelation. The last is the cold thermal area (very cold, cold, and semi-cold), with about 25.2%, covering about 552,145.3 km² of the whole area of Iran.
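The Global Moran's Index mentioned in this record can be sketched as follows. This is a minimal pure-Python version of the textbook formula I = (n / Σw_ij) · Σw_ij(x_i − x̄)(x_j − x̄) / Σ(x_i − x̄)². The binary adjacency weights and the five station temperatures are hypothetical, since the record does not state which weighting scheme the authors used.

```python
def morans_i(values, weights):
    """Global Moran's I for values x_i and a spatial weight matrix w_ij."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(weights[i][j] for i in range(n) for j in range(n))
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    var = sum(d * d for d in dev)
    return (n / w_sum) * (cross / var)


def line_adjacency(n):
    """Assumed weights: stations on a line, neighbours are adjacent stations."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


# Hypothetical monthly mean temperatures: similar values sit next to each other.
temps = [10.0, 11.0, 12.0, 20.0, 21.0]
clustered = morans_i(temps, line_adjacency(5))

# A perfectly alternating pattern gives strong negative autocorrelation.
alternating = morans_i([1.0, -1.0, 1.0, -1.0], line_adjacency(4))
print(clustered, alternating)
```

Values near +1 indicate that similar temperatures cluster in space, values near 0 indicate no spatial pattern, and negative values indicate that neighbouring stations tend to differ.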
Wing morphometrics as a possible tool for the diagnosis of the Ceratitis fasciventris, C. anonae, C. rosa complex (Diptera, Tephritidae).

PubMed

Van Cann, Joannes; Virgilio, Massimiliano; Jordaens, Kurt; De Meyer, Marc

2015-01-01

Previous attempts to resolve the Ceratitis FAR complex (Ceratitis fasciventris, Ceratitis anonae, Ceratitis rosa, Diptera, Tephritidae) showed contrasting results and revealed the occurrence of five microsatellite genotypic clusters (A, F1, F2, R1, R2). In this paper we explore the potential of wing morphometrics for the diagnosis of FAR morphospecies and genotypic clusters. We considered a set of 227 specimens previously morphologically identified and genotyped at 16 microsatellite loci. Seventeen wing landmarks and 6 wing band areas were used for morphometric analyses. Permutational multivariate analysis of variance detected significant differences both across morphospecies and genotypic clusters (for both males and females). Unconstrained and constrained ordinations did not properly resolve groups corresponding to morphospecies or genotypic clusters. However, posterior group membership probabilities (PGMPs) of the Discriminant Analysis of Principal Components (DAPC) allowed the consistent identification of a relevant proportion of specimens (but with performances differing across morphospecies and genotypic clusters). This study suggests that wing morphometrics and PGMPs might represent a possible tool for the diagnosis of species within the FAR complex. Here, we propose a tentative diagnostic method and provide a first reference library of morphometric measures that might be used for the identification of additional and unidentified FAR specimens.
Dark Energy Survey Year 1 Results: Methodology and Projections for Joint Analysis of Galaxy Clustering, Galaxy Lensing, and CMB Lensing Two-point Functions

DOE Office of Scientific and Technical Information (OSTI.GOV)

Giannantonio, T.; et al.

Optical imaging surveys measure both the galaxy density and the gravitational lensing-induced shear fields across the sky. Recently, the Dark Energy Survey (DES) collaboration used a joint fit to two-point correlations between these observables to place tight constraints on cosmology (DES Collaboration et al. 2017). In this work, we develop the methodology to extend the DES Collaboration et al. (2017) analysis to include cross-correlations of the optical survey observables with gravitational lensing of the cosmic microwave background (CMB) as measured by the South Pole Telescope (SPT) and Planck. Using simulated analyses, we show how the resulting set of five two-point functions increases the robustness of the cosmological constraints to systematic errors in galaxy lensing shear calibration. Additionally, we show that contamination of the SPT+Planck CMB lensing map by the thermal Sunyaev-Zel'dovich effect is a potentially large source of systematic error for two-point function analyses, but show that it can be reduced to acceptable levels in our analysis by masking clusters of galaxies and imposing angular scale cuts on the two-point functions. The methodology developed here will be applied to the analysis of data from the DES, the SPT, and Planck in a companion work.
Visualizing statistical significance of disease clusters using cartograms.

PubMed

Kronenfeld, Barry J; Wong, David W S

2017-05-15

Health officials and epidemiological researchers often use maps of disease rates to identify potential disease clusters. Because these maps exaggerate the prominence of low-density districts and hide potential clusters in urban (high-density) areas, many researchers have used density-equalizing maps (cartograms) as a basis for epidemiological mapping. However, we do not have existing guidelines for visual assessment of statistical uncertainty. To address this shortcoming, we develop techniques for visual determination of statistical significance of clusters spanning one or more districts on a cartogram. We developed the techniques within a geovisual analytics framework that does not rely on automated significance testing, and can therefore facilitate visual analysis to detect clusters that automated techniques might miss. On a cartogram of the at-risk population, the statistical significance of a disease cluster is determinate from the rate, area and shape of the cluster under standard hypothesis testing scenarios. We develop formulae to determine, for a given rate, the area required for statistical significance of a priori and a posteriori designated regions under certain test assumptions. Uniquely, our approach enables dynamic inference of aggregate regions formed by combining individual districts. The method is implemented in interactive tools that provide choropleth mapping, automated legend construction and dynamic search tools to facilitate cluster detection and assessment of the validity of tested assumptions. A case study of leukemia incidence analysis in California demonstrates the ability to visually distinguish between statistically significant and insignificant regions. The proposed geovisual analytics approach enables intuitive visual assessment of statistical significance of arbitrarily defined regions on a cartogram. 
Our research prompts a broader discussion of the role of geovisual exploratory analyses in disease mapping and the appropriate framework for visually assessing the statistical significance of spatial clusters.
Disrupted Cerebro-cerebellar Intrinsic Functional Connectivity in Young Adults with High-functioning Autism Spectrum Disorder: A Data-driven, Whole-brain, High Temporal Resolution fMRI Study.

PubMed

Arnold Anteraper, Sheeba; Guell, Xavier; D'Mello, Anila; Joshi, Neha; Whitfield-Gabrieli, Susan; Joshi, Gagan

2018-06-13

To examine the resting-state functional-connectivity (RsFc) in young adults with high-functioning autism spectrum disorder (HF-ASD) using state-of-the-art fMRI data acquisition and analysis techniques. Simultaneous multi-slice, high temporal resolution fMRI acquisition; unbiased whole-brain connectome-wide multivariate pattern analysis (MVPA) techniques for assessing RsFc; and post-hoc whole-brain seed-to-voxel analyses using MVPA results as seeds. MVPA revealed two clusters of abnormal connectivity in the cerebellum. Whole-brain seed-based functional connectivity analyses informed by MVPA-derived clusters showed significant underconnectivity between the cerebellum and social, emotional, and language brain regions in the HF-ASD group compared to healthy controls. The results we report are coherent with existing structural, functional, and RsFc literature in autism, extend previous literature reporting cerebellar abnormalities in the neuropathology of autism, and highlight the cerebellum as a potential target for therapeutic, diagnostic, predictive, and prognostic developments in ASD. The description of functional connectivity abnormalities using whole-brain, data-driven analyses as reported in the present study may crucially advance the development of ASD biomarkers, targets for therapeutic interventions, and neural predictors for measuring treatment response.
Potential Environmental Justice (EJ) areas in Region 2 based on 2000 Census [EPA.EJAREAS_2000]

EPA Pesticide Factsheets

Potential Environmental Justice (EJ) areas in Region 2. This dataset was derived from 2000 census data and based on the criteria set forth in the Region 2 Interim Environmental Justice Policy. The two criteria for Region 2's EJ demographic analysis are percent poverty and percent minority. The percent minority and percent poverty numbers for each block group are compared to the benchmark value for the state. Census block groups with percent poverty or percent minority higher than the state threshold are considered potential EJ areas. The cutoffs for each state were derived by using a statistical method, cluster analysis. Cluster analysis was chosen as the most objective way of evaluating the demographic data and determining cutoff values for minority and low income. With cluster analysis, data are divided into two distinct groups (e.g., minority and non-minority, and low income and non-low income). Cluster analysis examines natural breaks of the data. Separate analyses were conducted for minority and low income, respectively, for each State. All census block groups within a State were ranked in descending order according to the demographic factor under evaluation. This resulted in a ranking for percent minority by block group and a separate ranking for percent low income by block group. An iterative process was employed where the data were (1) split into two groups; (2) the means for each of the two groups were calculated; (3) the difference between the
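The record breaks off before stating the criterion used in step (3), so the following is only a sketch of one plausible reading: rank the values in descending order, try every two-way split, and keep the split that minimizes the total within-group squared deviation (equivalent to one-dimensional k-means with two clusters). The function name, the criterion, the midpoint cutoff, and the sample values are all assumptions, not the EPA procedure.

```python
def two_group_cutoff(values):
    """Return (cutoff, high_group, low_group) for the best two-way split."""
    ranked = sorted(values, reverse=True)  # descending, as in the description
    best_sse, best_k = None, None
    for k in range(1, len(ranked)):
        high, low = ranked[:k], ranked[k:]
        mean_high = sum(high) / len(high)
        mean_low = sum(low) / len(low)
        # Assumed criterion: total within-group squared deviation.
        sse = (sum((v - mean_high) ** 2 for v in high)
               + sum((v - mean_low) ** 2 for v in low))
        if best_sse is None or sse < best_sse:
            best_sse, best_k = sse, k
    # Assumed cutoff: midpoint of the natural break between the two groups.
    cutoff = (ranked[best_k - 1] + ranked[best_k]) / 2
    return cutoff, ranked[:best_k], ranked[best_k:]


# Hypothetical percent-minority values for eight block groups in one state.
pct_minority = [82.0, 75.5, 71.0, 68.0, 22.0, 18.5, 12.0, 9.0]
cutoff, high, low = two_group_cutoff(pct_minority)
print(cutoff, high, low)
```

With these made-up values the natural break falls between 68.0 and 22.0, so block groups above the resulting cutoff would be flagged for that demographic factor.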
Hemodynamic Response to Interictal Epileptiform Discharges Addressed by Personalized EEG-fNIRS Recordings

PubMed Central

Pellegrino, Giovanni; Machado, Alexis; von Ellenrieder, Nicolas; Watanabe, Satsuki; Hall, Jeffery A.; Lina, Jean-Marc; Kobayashi, Eliane; Grova, Christophe

2016-01-01

Objective: We aimed at studying the hemodynamic response (HR) to Interictal Epileptic Discharges (IEDs) using patient-specific and prolonged simultaneous ElectroEncephaloGraphy (EEG) and functional Near InfraRed Spectroscopy (fNIRS) recordings. Methods: The epileptic generator was localized using Magnetoencephalography source imaging. fNIRS montage was tailored for each patient, using an algorithm to optimize the sensitivity to the epileptic generator. Optodes were glued using collodion to achieve prolonged acquisition with high quality signal. fNIRS data analysis was handled with no a priori constraint on HR time course, averaging fNIRS signals to similar IEDs. Cluster-permutation analysis was performed on 3D reconstructed fNIRS data to identify significant spatio-temporal HR clusters. Standard (GLM with fixed HRF) and cluster-permutation EEG-fMRI analyses were performed for comparison purposes. Results: fNIRS detected HR to IEDs for 8/9 patients. It mainly consisted of oxy-hemoglobin increases (seven patients), followed by oxy-hemoglobin decreases (six patients). HR was lateralized in six patients and lasted from 8.5 to 30 s. Standard EEG-fMRI analysis detected an HR in 4/9 patients (4/9 without enough IEDs, 1/9 unreliable result). The cluster-permutation EEG-fMRI analysis restricted to the region investigated by fNIRS showed additional strong and non-canonical BOLD responses starting earlier than the IEDs and lasting up to 30 s. Conclusions: (i) EEG-fNIRS is suitable to detect the HR to IEDs and can outperform EEG-fMRI because of prolonged recordings and greater chance to detect IEDs; (ii) cluster-permutation analysis unveils additional HR features underestimated when imposing a canonical HR function; (iii) the HR is often bilateral and lasts up to 30 s. PMID:27047325
Within-Group Differences in Sexual Orientation and Identity

ERIC Educational Resources Information Center

Worthington, Roger L.; Reynolds, Amy L.

2009-01-01

The purpose of this investigation was to examine within-group differences among self-identified sexual orientation and identity groups. To understand these within-group differences, 2 types of analysis were conducted. First, a sample of 2,732 participants completed the Sexual Orientation and Identity Scale. Cluster analyses were used to identify 3…
Substitutability and Independence: Matching Analyses of Brands and Products

ERIC Educational Resources Information Center

Foxall, Gordon R.; Wells, Victoria K.; Chang, Shing Wan; Oliveira-Castro, Jorge M.

2010-01-01

This article presents a comprehensive examination of panel data for 1,847 consumers and 2,209 brands of "biscuits" (a total of 76,682 records) in which matching analysis is employed to define brand substitutability and potential product clusters within the overall category. The results indicate that, while brands performed as expected as perfect…
Taxonomic evaluation of Streptomyces albus and related species using multilocus sequence analysis

USDA-ARS's Scientific Manuscript database

In phylogenetic analyses of the genus Streptomyces using 16S rRNA gene sequences, Streptomyces albus subsp. albus NRRL B-1811T formed a cluster with 5 other species having identical or nearly identical 16S rRNA gene sequences. Moreover, the morphological and physiological characteristics of these ot...
Associations Across Caregiver and Care Recipient Symptoms: Self-Organizing Map and Meta-analysis.

PubMed

Voutilainen, Ari; Ruokostenpohja, Nora; Välimäki, Tarja

2018-03-19

The main objective of this study was to reveal generalizable associations across caregiver burden (CGB), caregiver depression (CGD), care recipient cognitive ability (CRCA), and care recipient behavioral and psychological symptoms of dementia (BPSD). Studies published between 2004 and 2014 and reporting CGB and/or CGD together with CRCA and/or BPSD were included. Only 95 out of 1,955 studies provided enough data for data clustering with the Self-Organizing Map (SOM) and 27 of them for meta-analyses based on correlation coefficients. Caregiver and care recipient symptoms were not tightly associated with each other, except for the CGB-BPSD interaction at the individual level. SOM emphasized the cluster comprising studies reporting low CGB, low CGD, high CRCA, and few BPSD. Meta-analyses indicated high heterogeneity between the original studies. Relationships between caregiver and care recipient symptoms should be treated as situation-specific phenomena, at least when the symptoms are moderate at most. Dementia caregiving per se should not be understood as a source of stress and mental health problems. More systematic and coherent use of measures is necessary to enable a comprehensive analysis of caregiving.

Comparative Microbial Modules Resource: Generation and Visualization of Multi-species Biclusters

PubMed Central

Bate, Ashley; Eichenberger, Patrick; Bonneau, Richard

2011-01-01

The increasing abundance of large-scale, high-throughput datasets for many closely related organisms provides opportunities for comparative analysis via the simultaneous biclustering of datasets from multiple species. These analyses require a reformulation of how to organize multi-species datasets and visualize comparative genomics data analyses results. Recently, we developed a method, multi-species cMonkey, which integrates heterogeneous high-throughput datatypes from multiple species to identify conserved regulatory modules. Here we present an integrated data visualization system, built upon the Gaggle, enabling exploration of our method's results (available at http://meatwad.bio.nyu.edu/cmmr.html). The system can also be used to explore other comparative genomics datasets and outputs from other data analysis procedures – results from other multiple-species clustering programs or from independent clustering of different single-species datasets. We provide an example use of our system for two bacteria, Escherichia coli and Salmonella Typhimurium. We illustrate the use of our system by exploring conserved biclusters involved in nitrogen metabolism, uncovering a putative function for yjjI, a currently uncharacterized gene that we predict to be involved in nitrogen assimilation. PMID:22144874
Comparative microbial modules resource: generation and visualization of multi-species biclusters.

PubMed

Kacmarczyk, Thadeous; Waltman, Peter; Bate, Ashley; Eichenberger, Patrick; Bonneau, Richard

2011-12-01

The increasing abundance of large-scale, high-throughput datasets for many closely related organisms provides opportunities for comparative analysis via the simultaneous biclustering of datasets from multiple species. These analyses require a reformulation of how to organize multi-species datasets and visualize comparative genomics data analyses results. Recently, we developed a method, multi-species cMonkey, which integrates heterogeneous high-throughput datatypes from multiple species to identify conserved regulatory modules. Here we present an integrated data visualization system, built upon the Gaggle, enabling exploration of our method's results (available at http://meatwad.bio.nyu.edu/cmmr.html). The system can also be used to explore other comparative genomics datasets and outputs from other data analysis procedures - results from other multiple-species clustering programs or from independent clustering of different single-species datasets. We provide an example use of our system for two bacteria, Escherichia coli and Salmonella Typhimurium. We illustrate the use of our system by exploring conserved biclusters involved in nitrogen metabolism, uncovering a putative function for yjjI, a currently uncharacterized gene that we predict to be involved in nitrogen assimilation. © 2011 Kacmarczyk et al.
Cluster analysis of obsessive-compulsive spectrum disorders in patients with obsessive-compulsive disorder: clinical and genetic correlates.

PubMed

Lochner, Christine; Hemmings, Sian M J; Kinnear, Craig J; Niehaus, Dana J H; Nel, Daniel G; Corfield, Valerie A; Moolman-Smook, Johanna C; Seedat, Soraya; Stein, Dan J

2005-01-01

Comorbidity of certain obsessive-compulsive spectrum disorders (OCSDs; such as Tourette's disorder) in obsessive-compulsive disorder (OCD) may serve to define important OCD subtypes characterized by differing phenomenology and neurobiological mechanisms. Comorbidity of the putative OCSDs in OCD has, however, not often been systematically investigated. The Structured Clinical Interview for Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, Axis I Disorders-Patient Version as well as a Structured Clinical Interview for Putative OCSDs (SCID-OCSD) were administered to 210 adult patients with OCD (N = 210, 102 men and 108 women; mean age, 35.7 +/- 13.3). A subset of Caucasian subjects (with OCD, n = 171; control subjects, n = 168), including subjects from the genetically homogeneous Afrikaner population (with OCD, n = 77; control subjects, n = 144), was genotyped for polymorphisms in genes involved in monoamine function. Because the items of the SCID-OCSD are binary (present/absent), a cluster analysis (Ward's method) using the items of SCID-OCSD was conducted. The association of identified clusters with demographic variables (age, gender), clinical variables (age of onset, obsessive-compulsive symptom severity and dimensions, level of insight, temperament/character, treatment response), and monoaminergic genotypes was examined. Cluster analysis of the OCSDs in our sample of patients with OCD identified 3 separate clusters at a 1.1 linkage distance level. The 3 clusters were named as follows: (1) "reward deficiency" (including trichotillomania, Tourette's disorder, pathological gambling, and hypersexual disorder), (2) "impulsivity" (including compulsive shopping, kleptomania, eating disorders, self-injury, and intermittent explosive disorder), and (3) "somatic" (including body dysmorphic disorder and hypochondriasis).
Several significant associations were found between cluster scores and other variables; for example, cluster I scores were associated with earlier age of onset of OCD and the presence of tics, cluster II scores were associated with female gender and childhood emotional abuse, and cluster III scores were associated with less insight and with somatic obsessions and compulsions. However, none of these clusters were associated with any particular genetic variant. Analysis of comorbid OCSDs in OCD suggested that these lie on a number of different dimensions. These dimensions are partially consistent with previous theoretical approaches taken toward classifying OCD spectrum disorders. The lack of genetic validation of these clusters in the present study may indicate the involvement of other, as yet untested, genes. Further genetic and cluster analyses of comorbid OCSDs in OCD may ultimately contribute to a better delineation of OCD endophenotypes.
WAIS-III index score profiles in the Canadian standardization sample.

PubMed

Lange, Rael T

2007-01-01

Representative index score profiles were examined in the Canadian standardization sample of the Wechsler Adult Intelligence Scale-Third Edition (WAIS-III). The identification of profile patterns was based on the methodology proposed by Lange, Iverson, Senior, and Chelune (2002) that aims to maximize the influence of profile shape and minimize the influence of profile magnitude on the cluster solution. A two-step cluster analysis procedure was used (i.e., hierarchical and k-means analyses). Cluster analysis of the four index scores (i.e., Verbal Comprehension [VCI], Perceptual Organization [POI], Working Memory [WMI], Processing Speed [PSI]) identified six profiles in this sample. Profiles were differentiated by pattern of performance and were primarily characterized as (a) high VCI/POI, low WMI/PSI, (b) low VCI/POI, high WMI/PSI, (c) high PSI, (d) low PSI, (e) high VCI/WMI, low POI/PSI, and (f) low VCI, high POI. These profiles are potentially useful for determining whether a patient's WAIS-III performance is unusual in a normal population.
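The two-step procedure named in this record (a hierarchical pass followed by k-means) can be sketched in a few lines. This is a simplified illustration, not the method of Lange et al.: it assumes that "minimizing profile magnitude" means subtracting each person's own mean from the four index scores, uses centroid linkage for the hierarchical step, and runs on four hypothetical (VCI, POI, WMI, PSI) profiles.

```python
def center_profile(profile):
    """Remove the overall level so only the shape of the profile remains."""
    mean = sum(profile) / len(profile)
    return [x - mean for x in profile]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]


def hierarchical_seeds(points, k):
    """Step 1: merge the two closest clusters (centroid linkage) until k remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        pairs = [(sq_dist(centroid(clusters[i]), centroid(clusters[j])), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        _, i, j = min(pairs)
        merged = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i].extend(merged)
    return [centroid(c) for c in clusters]


def kmeans(points, centers, iterations=20):
    """Step 2: refine the hierarchical solution with k-means."""
    labels = []
    for _ in range(iterations):
        labels = [min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = centroid(members)
    return labels


# Hypothetical (VCI, POI, WMI, PSI) index scores for four people.
profiles = [
    [115, 112, 95, 94],   # high VCI/POI, low WMI/PSI
    [100, 98, 82, 80],    # same shape, lower overall level
    [90, 92, 108, 110],   # low VCI/POI, high WMI/PSI
    [105, 104, 120, 121], # same shape, higher overall level
]
points = [center_profile(p) for p in profiles]
labels = kmeans(points, hierarchical_seeds(points, 2))
print(labels)
```

Because each profile is centered first, people with the same pattern of strengths and weaknesses land in the same cluster even when their overall ability level differs.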
Cluster Physics with Merging Galaxy Clusters

NASA Astrophysics Data System (ADS)

Molnar, Sandor

Collisions between galaxy clusters provide a unique opportunity to study matter in a parameter space which cannot be explored in our laboratories on Earth. In the standard ΛCDM model, where the total density is dominated by the cosmological constant (Λ) and the matter density by cold dark matter (CDM), structure formation is hierarchical, and clusters grow mostly by merging. Mergers of two massive clusters are the most energetic events in the universe after the Big Bang, hence they provide a unique laboratory to study cluster physics. The two main mass components in clusters behave differently during collisions: the dark matter is nearly collisionless, responding only to gravity, while the gas is subject to pressure forces and dissipation, and shocks and turbulence are developed during collisions. In the present contribution we review the different methods used to derive the physical properties of merging clusters. Different physical processes leave their signatures on different wavelengths, thus our review is based on a multifrequency analysis. In principle, the best way to analyze multifrequency observations of merging clusters is to model them using N-body/HYDRO numerical simulations. We discuss the results of such detailed analyses. New high spatial and spectral resolution ground and space based telescopes will come online in the near future. Motivated by these new opportunities, we briefly discuss methods which will be feasible in the near future in studying merging clusters.
Molecular analysis of hepatitis A virus strains obtained from patients with acute hepatitis A in Mongolia, 2004-2013.

PubMed

Tsatsralt-Od, Bira; Baasanjav, Nachin; Nyamkhuu, Dulmaa; Ohnishi, Hiroshi; Takahashi, Masaharu; Kobayashi, Tominari; Nagashima, Shigeo; Nishizawa, Tsutomu; Okamoto, Hiroaki

2016-04-01

Despite the high endemicity of hepatitis A virus (HAV) in Mongolia, the genetic information on those HAV strains is limited. Serum samples obtained from 935 patients with acute hepatitis in Ulaanbaatar, Mongolia during 2004-2013 were tested for the presence of HAV RNA using reverse transcription-PCR with primers targeting the VP1-2B region (481 nucleotides, primer sequences at both ends excluded). Overall, 180 patients (19.3%) had detectable HAV RNA. These 180 isolates shared 94.6-100% identity and formed four phylogenetic clusters within subgenotype IA. One or three representative HAV isolates from each cluster exhibited 2.6-3.9% difference between clusters over the entire genome. Cluster 1 accounted for 65.0% of the total, followed by Cluster 2 (30.6%), Cluster 3 (3.3%), and Cluster 4 (1.1%). Clusters 1 and 2 were predominant throughout the observation period, whereas Cluster 3 was undetectable in 2009 and 2013 and Cluster 4 became undetectable after 2009. The Mongolian HAV isolates were closest to those of Chinese or Japanese origin (97.7-98.5% identities over the entire genome), suggesting the evolution from a common ancestor with those circulating in China and Japan. Further molecular epidemiological analyses of HAV infection are necessary to investigate the factors underlying the spread of HAV and to implement appropriate prevention measures in Mongolia. © 2015 Wiley Periodicals, Inc.
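The identity percentages quoted in this record are pairwise comparisons of aligned nucleotide sequences. A minimal sketch of that calculation follows. The function name and the synthetic 481-nucleotide sequences are hypothetical, and the sketch assumes the two sequences are already aligned to equal length with no special treatment of gaps.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of identical positions between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


# Synthetic 481-nt reference and a variant with substitutions at the first 26 positions.
reference = "ACGT" * 120 + "A"
swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
variant = "".join(swap[base] for base in reference[:26]) + reference[26:]

identity = percent_identity(reference, variant)
print(round(identity, 1))
```

In this synthetic case 455 of 481 positions match, so two sequences of that length differing at 26 sites would be reported as about 94.6% identical.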
Joining X-Ray to Lensing: An Accurate Combined Analysis of MACS J0416.1-2403

NASA Astrophysics Data System (ADS)

Bonamigo, M.; Grillo, C.; Ettori, S.; Caminha, G. B.; Rosati, P.; Mercurio, A.; Annunziatella, M.; Balestra, I.; Lombardi, M.

2017-06-01

We present a novel approach for a combined analysis of X-ray and gravitational lensing data and apply this technique to the merging galaxy cluster MACS J0416.1-2403. The method exploits the information on the intracluster gas distribution that comes from a fit of the X-ray surface brightness and then includes the hot gas as a fixed mass component in the strong-lensing analysis. With our new technique, we can separate the collisional from the collision-less diffuse mass components, thus obtaining a more accurate reconstruction of the dark matter distribution in the core of a cluster. We introduce an analytical description of the X-ray emission coming from a set of dual pseudo-isothermal elliptical mass distributions, which can be directly used in most lensing software packages. By combining Chandra observations with Hubble Frontier Fields imaging and Multi Unit Spectroscopic Explorer spectroscopy in MACS J0416.1-2403, we measure a projected gas-to-total mass fraction of approximately 10% at 350 kpc from the cluster center. Compared to the results of a more traditional cluster mass model (diffuse halos plus member galaxies), we find a significant difference in the cumulative projected mass profile of the dark matter component and that the dark matter over total mass fraction is almost constant, out to more than 350 kpc. In the coming era of large surveys, these results show the need for multiprobe analyses for detailed dark matter studies in galaxy clusters.
Classification of Forefoot Plantar Pressure Distribution in Persons with Diabetes: A Novel Perspective for the Mechanical Management of Diabetic Foot?

PubMed Central

Deschamps, Kevin; Matricali, Giovanni Arnoldo; Roosen, Philip; Desloovere, Kaat; Bruyninckx, Herman; Spaepen, Pieter; Nobels, Frank; Tits, Jos; Flour, Mieke; Staes, Filip

2013-01-01

Background The aim of this study was to identify groups of subjects with similar patterns of forefoot loading and verify if specific groups of patients with diabetes could be isolated from non-diabetics. Methodology/Principal Findings Ninety-seven patients with diabetes and 33 control participants between 45 and 70 years were prospectively recruited in two Belgian Diabetic Foot Clinics. Barefoot plantar pressure measurements were recorded and subsequently analysed using a semi-automatic total mapping technique. K-means cluster analysis was applied on relative regional impulses of six forefoot segments in order to pursue a classification for the control group separately, the diabetic group separately and both groups together. Cluster analysis led to identification of three distinct groups when considering only the control group. For the diabetic group, and the computation considering both groups together, four distinct groups were isolated. Compared to the cluster analysis of the control group an additional forefoot loading pattern was identified. This group comprised diabetic feet only. The relevance of the reported clusters was supported by ANOVA statistics indicating significant differences between different regions of interest and different clusters. Conclusions/Significance A new era seems to be emerging in diabetic foot medicine, one which embraces the classification of diabetic patients according to their biomechanical profile. Classification of the plantar pressure distribution has the potential to provide a means to determine mechanical interventions for the prevention and/or treatment of the diabetic foot. PMID:24278219
Temporal and spatial analysis of psittacosis in association with poultry farming in the Netherlands, 2000-2015.

PubMed

Hogerwerf, Lenny; Holstege, Manon M C; Benincà, Elisa; Dijkstra, Frederika; van der Hoek, Wim

2017-07-26

Human psittacosis is a highly underdiagnosed zoonotic disease, commonly linked to psittacine birds. Psittacosis in birds, also known as avian chlamydiosis, is endemic in poultry, but the risk for people living close to poultry farms is unknown. Therefore, our study aimed to explore the temporal and spatial patterns of human psittacosis infections and identify possible associations with poultry farming in the Netherlands. We analysed data on 700 human cases of psittacosis notified between 01-01-2000 and 01-09-2015. First, we studied the temporal behaviour of psittacosis notifications by applying wavelet analysis. Then, to identify possible spatial patterns, we applied spatial cluster analysis. Finally, we investigated the possible spatial association between psittacosis notifications and data on the Dutch poultry sector at municipality level using a multivariable model. We found a large spatial cluster that covered a highly poultry-dense area but additional clusters were found in areas that had a low poultry density. There were marked geographical differences in the awareness of psittacosis and the amount and the type of laboratory diagnostics used for psittacosis, making it difficult to draw conclusions about the correlation between the large cluster and poultry density. The multivariable model showed that the presence of chicken processing plants and slaughter duck farms in a municipality was associated with a higher rate of human psittacosis notifications. The significance of the associations was influenced by the inclusion or exclusion of farm density in the model. Our temporal and spatial analyses showed weak associations between poultry-related variables and psittacosis notifications. Because of the low number of psittacosis notifications available for analysis, the power of our analysis was relatively low.
Because of the exploratory nature of this research, the associations found cannot be interpreted as evidence for airborne transmission of psittacosis from poultry to the general population. Further research is needed to determine the prevalence of C. psittaci in Dutch poultry. Also, efforts to promote PCR-based testing for C. psittaci and genotyping for source tracing are important to reduce the diagnostic deficit, and to provide better estimates of the human psittacosis burden, and the possible role of poultry.
A Phylogenetic Analysis of the Genus Fragaria (Strawberry) Using Intron-Containing Sequence from the ADH-1 Gene

PubMed Central

DiMeglio, Laura M.; Yu, Hongrun; Davis, Thomas M.

2014-01-01

The genus Fragaria encompasses species at ploidy levels ranging from diploid to decaploid. The cultivated strawberry, Fragaria×ananassa, and its two immediate progenitors, F. chiloensis and F. virginiana, are octoploids. To elucidate the ancestries of these octoploid species, we performed a phylogenetic analysis using intron-containing sequences of the nuclear ADH-1 gene from 39 germplasm accessions representing nineteen Fragaria species and one outgroup species, Dasiphora fruticosa. All trees from Maximum Parsimony and Maximum Likelihood analyses showed two major clades, Clade A and Clade B. Each of the sampled octoploids contributed alleles to both major clades. All octoploid-derived alleles in Clade A clustered with alleles of diploid F. vesca, with the exception of one octoploid allele that clustered with the alleles of diploid F. mandshurica. All octoploid-derived alleles in clade B clustered with the alleles of only one diploid species, F. iinumae. When gaps encoded as binary characters were included in the Maximum Parsimony analysis, tree resolution was improved with the addition of six nodes, and the bootstrap support was generally higher, rising above the 50% threshold for an additional nine branches. These results, coupled with the congruence of the sequence data and the coded gap data, validate and encourage the employment of sequence sets containing gaps for phylogenetic analysis. Our phylogenetic conclusions, based upon sequence data from the ADH-1 gene located on F. vesca linkage group II, complement and generally agree with those obtained from analyses of protein-encoding genes GBSSI-2 and DHAR located on F. vesca linkage groups V and VII, respectively, but differ from a previous study that utilized rDNA sequences and did not detect the ancestral role of F. iinumae. PMID:25078607
Beverage consumption patterns of Canadian adults aged 19 to 65 years.

PubMed

Nikpartow, Nooshin; Danyliw, Adrienne D; Whiting, Susan J; Lim, Hyun J; Vatanparast, Hassanali

2012-12-01

To investigate the beverage intake patterns of Canadian adults and explore characteristics of participants in different beverage clusters. Analyses of nationally representative data with cross-sectional complex stratified design. Canadian Community Health Survey, Cycle 2.2 (2004). A total of 14 277 participants aged 19-65 years, in whom dietary intake was assessed using a single 24 h recall, were included in the study. After determining total intake and the contribution of beverages to total energy intake among age/sex groups, cluster analysis (K-means method) was used to classify males and females into distinct clusters based on the dominant pattern of beverage intakes. To test differences across clusters, χ2 tests and 95 % confidence intervals of the mean intakes were used. Six beverage clusters in women and seven beverage clusters in men were identified. 'Sugar-sweetened' beverage clusters - regular soft drinks and fruit drinks - as well as a 'beer' cluster, appeared for both men and women. No 'milk' cluster appeared among women. The mean consumption of the dominant beverage in each cluster was higher among men than women. The 'soft drink' cluster in men had the lowest proportion of the higher levels of education, and in women the highest proportion of inactivity, compared with other beverage clusters. Patterns of beverage intake in Canadian women indicate high consumption of sugar-sweetened beverages particularly fruit drinks, low intake of milk and high intake of beer. These patterns in women have implications for poor bone health, risk of obesity and other morbidities.
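As an illustration of the K-means step mentioned in the abstract above, the following is a minimal sketch and not the authors' analysis. The intake values, the three beverage columns and the choice of starting centroids are all hypothetical, and the survey's complex sampling weights are ignored.

```python
import math

def kmeans(points, centroids, iterations=20):
    """Plain Lloyd's algorithm; `centroids` are caller-supplied starting points."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(v) / len(members) for v in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster where it was
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return labels, centroids

# Hypothetical daily intakes in mL: (soft drinks, milk, beer)
intakes = [
    (900, 50, 0), (850, 100, 0), (800, 0, 50),
    (50, 600, 0), (0, 700, 0),
    (100, 50, 900), (0, 0, 1000),
]
labels, centroids = kmeans(intakes, centroids=[intakes[0], intakes[3], intakes[5]])
```

Each resulting cluster can then be named after the beverage with the largest centroid value, which is one plausible reading of the "dominant pattern" described in the abstract.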
Typology of people with first-episode psychosis.

PubMed

Subramaniam, Mythily; Zheng, Huili; Soh, Pauline; Poon, Lye Yin; Vaingankar, Janhavi A; Chong, Siow Ann; Verma, Swapna

2016-08-01

The aim of the current study was to create a typology of patients with first-episode psychosis based on sociodemographic and clinical characteristics, service use and outcomes using cluster analysis. Data from all respondents who were accepted into the Early Psychosis Intervention Programme (EPIP), Singapore from 2007 to 2011 were analysed. A two-step clustering method was carried out to classify the patients into distinct clusters. Two clusters were identified. Cluster 1 consisted largely of younger people with a mean age of 25.5 (6.0) years at treatment contact, who were predominantly male (55.3%), single (98.3%) and living with parents (86.3%). Cluster 1 had a higher proportion of people diagnosed with a schizophrenia spectrum disorder (71.4%) and with a positive family history of psychiatric illness. Patients in cluster 2 were generally older with a mean age of 33.6 (4.7) years and the majority were women (74.2%). Cluster 1 had people with higher Positive and Negative Syndrome Scale (PANSS) scores at baseline as compared with cluster 2. After a 1-year follow-up, their scores were still poorer than those of their counterparts in cluster 2, especially for the PANSS negative score. The functioning level of people in cluster 1 showed less improvement than that of people in cluster 2 after a year of treatment. There is a compelling need to develop new therapies and intensively treat young people presenting with psychosis as this group tends to have poorer outcomes even after 1 year of treatment. © 2014 Wiley Publishing Asia Pty Ltd.
Characterizing cognitive heterogeneity on the schizophrenia-bipolar disorder spectrum.

PubMed

Van Rheenen, T E; Lewandowski, K E; Tan, E J; Ospina, L H; Ongur, D; Neill, E; Gurvich, C; Pantelis, C; Malhotra, A K; Rossell, S L; Burdick, K E

2017-07-01

Current group-average analysis suggests quantitative but not qualitative cognitive differences between schizophrenia (SZ) and bipolar disorder (BD). There is increasing recognition that cognitive within-group heterogeneity exists in both disorders, but it remains unclear as to whether between-group comparisons of performance in cognitive subgroups emerging from within each of these nosological categories uphold group-average findings. We addressed this by identifying cognitive subgroups in large samples of SZ and BD patients independently, and comparing their cognitive profiles. The utility of a cross-diagnostic clustering approach to understanding cognitive heterogeneity in these patients was also explored. Hierarchical clustering analyses were conducted using cognitive data from 1541 participants (SZ n = 564, BD n = 402, healthy control n = 575). Three qualitatively and quantitatively similar clusters emerged within each clinical group: a severely impaired cluster, a mild-moderately impaired cluster and a relatively intact cognitive cluster. A cross-diagnostic clustering solution also resulted in three subgroups and was superior in reducing cognitive heterogeneity compared with disorder clustering independently. Quantitative SZ-BD cognitive differences commonly seen using group averages did not hold when cognitive heterogeneity was factored into our sample. Members of each corresponding subgroup, irrespective of diagnosis, might be manifesting the outcome of differences in shared cognitive risk factors.
International linkage of two food-borne hepatitis A clusters through traceback of mussels, the Netherlands, 2012.

PubMed

Boxman, Ingeborg L A; Verhoef, Linda; Vennema, Harry; Ngui, Siew-Lin; Friesema, Ingrid H M; Whiteside, Chris; Lees, David; Koopmans, Marion

2016-01-01

This report describes an outbreak investigation starting with two closely related suspected food-borne clusters of Dutch hepatitis A cases, nine primary cases in total, with an unknown source in the Netherlands. The hepatitis A virus (HAV) genotype IA sequences of both clusters were highly similar (459/460 nt) and were not reported earlier. Food questionnaires and a case-control study revealed an association with consumption of mussels. Analysis of mussel supply chains identified the most likely production area. International enquiries led to identification of a cluster of patients near this production area with identical HAV sequences with onsets predating the first Dutch cluster of cases. The most likely source for this cluster was a case who returned from an endemic area in Central America, and a subsequent household cluster from which treated domestic sewage was discharged into the suspected mussel production area. Notably, mussels from this area were also consumed by a separate case in the United Kingdom sharing an identical strain with the second Dutch cluster. In conclusion, a small number of patients in a non-endemic area led to geographically dispersed hepatitis A outbreaks with food as vehicle. This link would have gone unnoticed without sequence analyses and international collaboration.
Methodological approaches in analysing observational data: A practical example on how to address clustering and selection bias.

PubMed

Trutschel, Diana; Palm, Rebecca; Holle, Bernhard; Simon, Michael

2017-11-01

Because not every scientific question on effectiveness can be answered with randomised controlled trials, research methods that minimise bias in observational studies are required. Two major concerns influence the internal validity of effect estimates: selection bias and clustering. Hence, to reduce the bias of the effect estimates, more sophisticated statistical methods are needed. To introduce statistical approaches such as propensity score matching and mixed models into representative real-world analysis, and to present their implementation in the statistical software R so that the results can be reproduced. We perform a two-level analytic strategy to address the problems of bias and clustering: (i) generalised models with different abilities to adjust for dependencies are used to analyse binary data and (ii) the genetic matching and covariate adjustment methods are used to adjust for selection bias. Hence, we analyse the data from two population samples, the sample produced by the matching method and the full sample. The different analysis methods in this article present different results but still point in the same direction. In our example, the estimate of the probability of receiving a case conference is higher in the treatment group than in the control group. Both strategies, genetic matching and covariate adjustment, have their limitations but complement each other to provide the whole picture. The statistical approaches were feasible for reducing bias but were nevertheless limited by the sample used. For each study and obtained sample, the pros and cons of the different methods have to be weighed. Copyright © 2017 The Author(s). Published by Elsevier Ltd. All rights reserved.
Condom Use among Immigrant Latino Sexual Minorities: Multilevel Analysis after Respondent-Driven Sampling

PubMed Central

Rhodes, Scott D.; McCoy, Thomas P.

2014-01-01

This study explored correlates of condom use within a respondent-driven sample of 190 Spanish-speaking immigrant Latino sexual minorities, including gay and bisexual men, other men who have sex with men (MSM), and transgender persons, in North Carolina. Five analytic approaches for modeling data collected using respondent-driven sampling (RDS) were compared. Across most approaches, knowledge of HIV and sexually transmitted infections (STIs) and increased condom use self-efficacy predicted consistent condom use and increased homophobia predicted decreased consistent condom use. The same correlates were not significant in all analyses but were consistent in most. Clustering due to recruitment chains was low, while clustering due to recruiter was substantial. This highlights the importance of accounting for clustering when analyzing RDS data. PMID:25646728
Application of cluster and discriminant analyses to diagnose lithological heterogeneity of the parent material according to its particle-size distribution

NASA Astrophysics Data System (ADS)

Giniyatullin, K. G.; Valeeva, A. A.; Smirnova, E. V.

2017-08-01

Particle-size distribution in soddy-podzolic and light gray forest soils of the Botanical Garden of Kazan Federal University has been studied. The cluster analysis of data on the samples from genetic soil horizons attests to the lithological heterogeneity of the profiles of all the studied soils. It is probable that they are developed from the two-layered sediments with the upper colluvial layer underlain by the alluvial layer. According to the discriminant analysis, the major contribution to the discrimination of colluvial and alluvial layers is that of the fraction >0.25 mm. The results of canonical analysis show that there is only one significant discriminant function that separates alluvial and colluvial sediments on the investigated territory. The discriminant function correlates with the contents of fractions 0.05-0.01, 0.25-0.05, and >0.25 mm. Classification functions making it possible to distinguish between alluvial and colluvial sediments have been calculated. Statistical assessment of particle-size distribution data obtained for the plow horizons on ten plowed fields within the garden indicates that this horizon is formed from colluvial sediments. We conclude that the contents of separate fractions and their ratios cannot be used as a universal criterion of the lithological heterogeneity. However, adequate combination of the cluster and discriminant analyses makes it possible to give a comprehensive assessment of the lithology of soil samples from data on the contents of sand and silt fractions, which considerably increases the information value and reliability of the results.
A method of alignment masking for refining the phylogenetic signal of multiple sequence alignments.

PubMed

Rajan, Vaibhav

2013-03-01

Inaccurate inference of positional homologies in multiple sequence alignments and systematic errors introduced by alignment heuristics obfuscate phylogenetic inference. Alignment masking, the elimination of phylogenetically uninformative or misleading sites from an alignment before phylogenetic analysis, is a common practice in phylogenetic analysis. Although masking is often done manually, automated methods are necessary to handle the much larger data sets being prepared today. In this study, we introduce the concept of subsplits and demonstrate their use in extracting phylogenetic signal from alignments. We design a clustering approach for alignment masking where each cluster contains similar columns (similarity being defined on the basis of compatible subsplits); our approach then identifies noisy clusters and eliminates them. Trees inferred from the columns in the retained clusters are found to be topologically closer to the reference trees. We test our method on numerous standard benchmarks (both synthetic and biological data sets) and compare its performance with other methods of alignment masking. We find that our method can eliminate sites more accurately than other methods, particularly on divergent data, and can improve the topologies of the inferred trees in likelihood-based analyses. Software available upon request from the author.
Fingerprints of resistant Escherichia coli O157:H7 from vegetables and environmental samples.

PubMed

Abakpa, Grace Onyukwo; Umoh, Veronica J; Kamaruzaman, Sijam; Ibekwe, Mark

2018-01-01

Some routes of transmission of Escherichia coli O157:H7 to fresh produce include contaminated irrigation water and manure-polluted soils. The aim of the present study was to determine the genetic relationships of E. coli O157:H7 isolated from produce-growing regions in Nigeria using enterobacterial repetitive intergenic consensus (ERIC) DNA fingerprinting analysis. A total of 440 samples comprising leafy greens, irrigation water, manure and soil were obtained from vegetable-producing regions in Kano and Plateau States, Nigeria. Genes coding for the quinolone resistance-determinant (gyrA) and plasmid (pCT) coding for multidrug resistance (MDR) were determined using polymerase chain reaction (PCR) in 16 isolates that showed MDR. Cluster analysis of the ERIC-PCR profiles based on band sizes revealed six main clusters from the sixteen isolates analysed. The largest cluster (cluster 3) grouped isolates from vegetables and manure at a similarity coefficient of 0.72. The present study provides data that support the potential transmission of resistant strains of E. coli O157:H7 from vegetables and environmental sources to humans with potential public health implications, especially in developing countries. © 2017 Society of Chemical Industry.
Spatial analysis of malaria in Anhui province, China

PubMed Central

Zhang, Wenyi; Wang, Liping; Fang, Liqun; Ma, Jiaqi; Xu, Youfu; Jiang, Jiafu; Hui, Fengming; Wang, Jianjun; Liang, Song; Yang, Hong; Cao, Wuchun

2008-01-01

Background: Malaria has re-emerged in Anhui Province, China, and this province was the most seriously affected by malaria during 2005–2006. It is necessary to understand the spatial distribution of malaria cases and to identify highly endemic areas for future public health planning and resource allocation in Anhui Province. Methods: The annual average incidence at the county level was calculated using malaria cases reported between 2000 and 2006 in Anhui Province. GIS-based spatial analyses were conducted to detect spatial distribution and clustering of malaria incidence at the county level. Results: The spatial distribution of malaria cases in Anhui Province from 2000 to 2006 was mapped at the county level to show crude incidence, excess hazard and spatial smoothed incidence. Spatial cluster analysis suggested 10 and 24 counties were at increased risk for malaria (P < 0.001) with the maximum spatial cluster sizes at < 50% and < 25% of the total population, respectively. Conclusion: The application of GIS, together with spatial statistical techniques, provides a means to quantify explicit malaria risks and to further identify environmental factors responsible for the re-emerged malaria risks. Future public health planning and resource allocation in Anhui Province should be focused on the maximum spatial cluster region. PMID:18847489

Cluster Analysis of Atmospheric Dynamics and Pollution Transport in a Coastal Area

NASA Astrophysics Data System (ADS)

Sokolov, Anton; Dmitriev, Egor; Maksimovich, Elena; Delbarre, Hervé; Augustin, Patrick; Gengembre, Cyril; Fourmentin, Marc; Locoge, Nadine

2016-11-01

Summertime atmospheric dynamics in the coastal zone of the industrialized Dunkerque agglomeration in northern France was characterized by a cluster analysis of back trajectories in the context of pollution transport. The MESO-NH atmospheric model was used to simulate the local dynamics at multiple scales with horizontal resolution down to 500 m, and for the online calculation of the Lagrangian backward trajectories with 30-min temporal resolution. Airmass transport was performed along six principal pathways obtained by the weighted k-means clustering technique. Four of these centroids corresponded to a range of wind speeds over the English Channel: two for wind directions from the north-east and two from the south-west. Another pathway corresponded to a south-westerly continental transport. The backward trajectories of the largest and most dispersed sixth cluster contained low wind speeds, including sea-breeze circulations. Based on analyses of meteorological data and pollution measurements, the principal atmospheric pathways were related to local air-contamination events. Continuous air quality and meteorological data were collected during the Benzene-Toluene-Ethylbenzene-Xylene 2006 campaign. The sites of the pollution measurements served as the endpoints for the backward trajectories. Pollutant transport pathways corresponding to the highest air contamination were defined.
Free-energy landscape, principal component analysis, and structural clustering to identify representative conformations from molecular dynamics simulations: the myoglobin case.

PubMed

Papaleo, Elena; Mereghetti, Paolo; Fantucci, Piercarlo; Grandori, Rita; De Gioia, Luca

2009-01-01

Several molecular dynamics (MD) simulations were used to sample conformations in the neighborhood of the native structure of holo-myoglobin (holo-Mb), collecting trajectories spanning 0.22 μs at 300 K. Principal component (PCA) and free-energy landscape (FEL) analyses, integrated by cluster analysis, which was performed considering the position and structures of the individual helices of the globin fold, were carried out. The coherence between the different structural clusters and the basins of the FEL, together with the convergence of parameters derived by PCA, indicates that an accurate description of the Mb conformational space around the native state was achieved by multiple MD trajectories spanning at least 0.14 μs. The integration of FEL, PCA, and structural clustering was shown to be a very useful approach to gain an overall view of the conformational landscape accessible to a protein and to identify representative protein substates. This method could also be used to investigate the conformational and dynamical properties of Mb apo-, mutant, or delete versions, in which greater conformational variability is expected and, therefore, identification of representative substates from the simulations is relevant to disclose structure-function relationship.
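A free-energy landscape along a principal component is commonly estimated from the sampled population of each bin as F = -k_B T ln(P / P_max). The sketch below shows that standard relation in one dimension. It is not the authors' protocol: the projection values and bin width are hypothetical, and only the temperature of 300 K comes from the abstract.

```python
import math
from collections import Counter

K_B = 0.0083144626  # Boltzmann constant in kJ/(mol*K)

def free_energy_profile(projections, bin_width, temperature=300.0):
    """Return {bin_index: free energy in kJ/mol}, relative to the most populated bin."""
    # Histogram the projections; bin i covers [i * bin_width, (i + 1) * bin_width).
    counts = Counter(math.floor(x / bin_width) for x in projections)
    max_count = max(counts.values())
    kt = K_B * temperature
    # F = -kT ln(P / P_max); the normalisation cancels, so raw counts suffice.
    return {b: -kt * math.log(n / max_count) for b, n in sorted(counts.items())}

# Hypothetical projections of trajectory frames onto the first principal component
pc1 = [0.1, 0.2, 0.3, 0.4, 1.1, 1.2, -0.5]
profile = free_energy_profile(pc1, bin_width=1.0)
```

Bins with low free energy correspond to basins, and frames falling in the same basin are natural candidates for the structural clustering described above.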
Rapid Diversification of FoxP2 in Teleosts through Gene Duplication in the Teleost-Specific Whole Genome Duplication Event

PubMed Central

Song, Xiaowei; Wang, Yajun; Tang, Yezhong

2013-01-01

As one of the most conserved genes in vertebrates, FoxP2 is widely involved in a number of important physiological and developmental processes. We systematically studied the evolutionary history and functional adaptations of FoxP2 in teleosts. The duplicated FoxP2 genes (FoxP2a and FoxP2b), which were identified in teleosts using synteny and paralogon analysis on genome databases of eight organisms, were probably generated in the teleost-specific whole genome duplication event. A credible classification with FoxP2, FoxP2a and FoxP2b in phylogenetic reconstructions confirmed the teleost-specific FoxP2 duplication. The unavailability of FoxP2b in Danio rerio suggests that the gene was deleted through nonfunctionalization of the redundant copy after the Otocephala-Euteleostei split. Heterogeneity in evolutionary rates among clusters consisting of FoxP2 in Sarcopterygii (Cluster 1), FoxP2a in Teleostei (Cluster 2) and FoxP2b in Teleostei (Cluster 3), particularly between Clusters 2 and 3, reveals asymmetric functional divergence after the gene duplication. Hierarchical cluster analyses of hydrophobicity profiles demonstrated significant structural divergence among the three clusters with verification of subsequent stepwise discriminant analysis, in which FoxP2 of Leucoraja erinacea and Lepisosteus oculatus were classified into Cluster 1, whereas FoxP2b of Salmo salar was grouped into Cluster 2 rather than Cluster 3. The simulated thermodynamic stability variations of the forkhead box domain (monomer and homodimer) showed remarkable divergence in FoxP2, FoxP2a and FoxP2b clusters. Relaxed purifying selection and positive Darwinian selection probably were complementary driving forces for the accelerated evolution of FoxP2 in ray-finned fishes, especially for the adaptive evolution of FoxP2a and FoxP2b in teleosts subsequent to the teleost-specific gene duplication. PMID:24349554
On the Distribution of Orbital Poles of Milky Way Satellites

NASA Astrophysics Data System (ADS)

Palma, Christopher; Majewski, Steven R.; Johnston, Kathryn V.

2002-01-01

In numerous studies of the outer Galactic halo some evidence for accretion has been found. If the outer halo did form in part or wholly through merger events, we might expect to find coherent streams of stars and globular clusters following orbits similar to those of their parent objects, which are assumed to be present or former Milky Way dwarf satellite galaxies. We present a study of this phenomenon by assessing the likelihood of potential descendant "dynamical families" in the outer halo. We conduct two analyses: one that involves a statistical analysis of the spatial distribution of all known Galactic dwarf satellite galaxies (DSGs) and globular clusters, and a second, more specific analysis of those globular clusters and DSGs for which full phase space dynamical data exist. In both cases our methodology is appropriate only to members of descendant dynamical families that retain nearly aligned orbital poles today. Since the Sagittarius dwarf (Sgr) is considered a paradigm for the type of merger/tidal interaction event for which we are searching, we also undertake a case study of the Sgr system and identify several globular clusters that may be members of its extended dynamical family. In our first analysis, the distribution of possible orbital poles for the entire sample of outer (Rgc>8 kpc) halo globular clusters is tested for statistically significant associations among globular clusters and DSGs. Our methodology for identifying possible associations is similar to that used by Lynden-Bell & Lynden-Bell, but we put the associations on a more statistical foundation. Moreover, we study the degree of possible dynamical clustering among various interesting ensembles of globular clusters and satellite galaxies. Among the ensembles studied, we find the globular cluster subpopulation with the highest statistical likelihood of association with one or more of the Galactic DSGs to be the distant, outer halo (Rgc>25 kpc), second-parameter globular clusters. The results of our orbital pole analysis are supported by the great circle cell count methodology of Johnston, Hernquist, & Bolte. The space motions of the clusters Pal 4, NGC 6229, NGC 7006, and Pyxis are predicted to be among those most likely to show the clusters to be following stream orbits, since these clusters are responsible for the majority of the statistical significance of the association between outer halo, second-parameter globular clusters and the Milky Way DSGs. In our second analysis, we study the orbits of the 41 globular clusters and six Milky Way-bound DSGs having measured proper motions to look for objects with both coplanar orbits and similar angular momenta. Unfortunately, the majority of globular clusters with measured proper motions are inner halo clusters that are less likely to retain memory of their original orbit. Although four potential globular cluster/DSG associations are found, we believe three of these associations involving inner halo clusters to be coincidental. While the present sample of objects with complete dynamical data is small and does not include many of the globular clusters that are more likely to have been captured by the Milky Way, the methodology we adopt will become increasingly powerful as more proper motions are measured for distant Galactic satellites and globular clusters, and especially as results from the Space Interferometry Mission (SIM) become available.
The ergot alkaloid gene cluster: functional analyses and evolutionary aspects.

PubMed

Lorenz, Nicole; Haarmann, Thomas; Pazoutová, Sylvie; Jung, Manfred; Tudzynski, Paul

2009-01-01

Ergot alkaloids and their derivatives have been traditionally used as therapeutic agents in migraine, blood pressure regulation and help in childbirth and abortion. Their production in submerged culture is a long-established biotechnological process. Ergot alkaloids are produced mainly by members of the genus Claviceps, with Claviceps purpurea as the best-investigated species concerning the biochemistry of ergot alkaloid synthesis (EAS). Genes encoding enzymes involved in EAS have been shown to be clustered; functional analyses of EAS cluster genes have allowed specific functions to be assigned to several gene products. Various Claviceps species differ with respect to their host specificity and their alkaloid content; comparison of the ergot alkaloid clusters in these species (and of clavine alkaloid clusters in other genera) yields interesting insights into the evolution of cluster structure. This review focuses on recently published and also as yet unpublished data on the structure and evolution of the EAS gene cluster and on the function and regulation of cluster genes. These analyses also have significant biotechnological implications: the characterization of non-ribosomal peptide synthetases (NRPS) involved in the synthesis of the peptide moiety of ergopeptines opened interesting perspectives for the synthesis of ergot alkaloids; on the other hand, defined mutants could be generated producing interesting intermediates or only single peptide alkaloids (instead of the alkaloid mixtures usually produced by industrial strains).
The characteristics of depressive symptoms in medical students during medical education and training: a cross-sectional study.

PubMed

Baldassin, Sergio; Alves, Tânia Correa de Toledo Ferraz; de Andrade, Arthur Guerra; Nogueira Martins, Luiz Antonio

2008-12-11

Medical education and training can contribute to the development of depressive symptoms that might lead to possible academic and professional consequences. We aimed to investigate the characteristics of depressive symptoms among 481 medical students (79.8% of the total who matriculated). The Beck Depression Inventory (BDI) and cluster analyses were used in order to better describe the characteristics of depressive symptoms. Medical education and training in Brazil is divided into basic (1st and 2nd years), intermediate (3rd and 4th years), and internship (5th and 6th years) periods. The study organized each item from the BDI into the following three clusters: affective, cognitive, and somatic. Statistical analyses were performed using analysis of variance (ANOVA) with post-hoc Tukey corrected for multiple comparisons. There were 184 (38.2%) students with depressive symptoms (BDI > 9). The internship period resulted in the highest BDI scores in comparison to both the basic (p < .001) and intermediate (p < .001) periods. Affective, cognitive, and somatic clusters were significantly higher in the internship period. An exploratory analysis of possible risk factors showed that being female (p = .020), not having a parent who practiced medicine (p = .016), and the internship period (p = .001) were factors for the development of depressive symptoms. There is a high prevalence of depressive symptoms among medical students, particularly among females, those at the internship level, and those not having a parent who practiced medicine, mainly involving the somatic and affective clusters. The active assessment of these students in evaluating their depressive symptoms is important in order to prevent the development of co-morbidities and suicide risk.
Phenotyping asthma, rhinitis and eczema in MeDALL population-based birth cohorts: an allergic comorbidity cluster.

PubMed

Garcia-Aymerich, J; Benet, M; Saeys, Y; Pinart, M; Basagaña, X; Smit, H A; Siroux, V; Just, J; Momas, I; Rancière, F; Keil, T; Hohmann, C; Lau, S; Wahn, U; Heinrich, J; Tischer, C G; Fantini, M P; Lenzi, J; Porta, D; Koppelman, G H; Postma, D S; Berdel, D; Koletzko, S; Kerkhof, M; Gehring, U; Wickman, M; Melén, E; Hallberg, J; Bindslev-Jensen, C; Eller, E; Kull, I; Lødrup Carlsen, K C; Carlsen, K-H; Lambrecht, B N; Kogevinas, M; Sunyer, J; Kauffmann, F; Bousquet, J; Antó, J M

2015-08-01

Asthma, rhinitis and eczema often co-occur in children, but their interrelationships at the population level have been poorly addressed. We assessed co-occurrence of childhood asthma, rhinitis and eczema using unsupervised statistical techniques. We included 17 209 children at 4 years and 14 585 at 8 years from seven European population-based birth cohorts (MeDALL project). At each age period, children were grouped, using partitioning cluster analysis, according to the distribution of 23 variables covering symptoms 'ever' and 'in the last 12 months', doctor diagnosis, age of onset and treatments of asthma, rhinitis and eczema; immunoglobulin E sensitization; weight; and height. We tested the sensitivity of our estimates to subject and variable selections, and to different statistical approaches, including latent class analysis and self-organizing maps. Two groups were identified as the optimal way to cluster the data at both age periods and in all sensitivity analyses. The first (reference) group at 4 and 8 years (including 70% and 79% of children, respectively) was characterized by a low prevalence of symptoms and sensitization, whereas the second (symptomatic) group exhibited more frequent symptoms and sensitization. Ninety-nine percent of children with comorbidities (co-occurrence of asthma, rhinitis and/or eczema) were included in the symptomatic group at both ages. The children's characteristics in both groups were consistent in all sensitivity analyses. At 4 and 8 years, at the population level, asthma, rhinitis and eczema can be classified together as an allergic comorbidity cluster. Future research including time-repeated assessments and biological data will help in understanding the interrelationships between these diseases. © 2015 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
The characteristics of depressive symptoms in medical students during medical education and training: a cross-sectional study

PubMed Central

Baldassin, Sergio; Alves, Tânia Correa de Toledo Ferraz; de Andrade, Arthur Guerra; Nogueira Martins, Luiz Antonio

2008-01-01

Background: Medical education and training can contribute to the development of depressive symptoms that might lead to possible academic and professional consequences. We aimed to investigate the characteristics of depressive symptoms among 481 medical students (79.8% of the total who matriculated). Methods: The Beck Depression Inventory (BDI) and cluster analyses were used in order to better describe the characteristics of depressive symptoms. Medical education and training in Brazil is divided into basic (1st and 2nd years), intermediate (3rd and 4th years), and internship (5th and 6th years) periods. The study organized each item from the BDI into the following three clusters: affective, cognitive, and somatic. Statistical analyses were performed using analysis of variance (ANOVA) with post-hoc Tukey tests corrected for multiple comparisons. Results: There were 184 (38.2%) students with depressive symptoms (BDI > 9). The internship period resulted in the highest BDI scores in comparison to both the basic (p < .001) and intermediate (p < .001) periods. Affective, cognitive, and somatic cluster scores were significantly higher in the internship period. An exploratory analysis of possible risk factors showed that being female (p = .020), not having a parent who practiced medicine (p = .016), and the internship period (p = .001) were factors for the development of depressive symptoms. Conclusion: There is a high prevalence of depressive symptoms among medical students, particularly among females, those in the internship period, and those without a parent who practiced medicine, mainly involving the somatic and affective clusters. The active assessment of these students in evaluating their depressive symptoms is important in order to prevent the development of co-morbidities and suicide risk. PMID:19077227
Phytochemical, phylogenetic, and anti-inflammatory evaluation of 43 Urtica accessions (stinging nettle) based on UPLC-Q-TOF-MS metabolomic profiles.

PubMed

Farag, Mohamed A; Weigend, Maximilian; Luebert, Federico; Brokamp, Grischa; Wessjohann, Ludger A

2013-12-01

Several species of the genus Urtica (especially Urtica dioica, Urticaceae) are used medicinally to treat a variety of ailments. To better understand the chemical diversity of the genus and to compare different accessions and different taxa of Urtica, 63 leaf samples representing a broad geographical, taxonomical and morphological diversity were evaluated under controlled conditions. A molecular phylogeny for all taxa investigated was prepared to compare phytochemical similarity with phylogenetic relatedness. Metabolites were analyzed via UPLC-PDA-MS and multivariate data analyses. In total, 43 metabolites were identified, with phenolic compounds and hydroxy fatty acids as the dominant substance groups. Principal component analysis (PCA) and hierarchical clustering analysis (HCA) provide a first structured chemotaxonomy of the genus. The molecular data present a highly resolved phylogeny with well-supported clades and subclades. U. dioica is retrieved as both para- and polyphyletic. European members of the U. dioica group and the North American subspecies share a rather similar metabolite profile and were largely retrieved as one, nearly exclusive cluster by metabolite data. This latter cluster also includes the remotely related Urtica urens, which is pharmaceutically used in the same way as U. dioica. However, most highly supported phylogenetic clades were not retrieved in the metabolite cluster analyses. Overall, metabolite profiles indicate considerable phytochemical diversity in the genus, which largely falls into a group characterized by high contents of hydroxy fatty acids (e.g., most Andean-American taxa) and another group characterized by high contents of phenolic acids (especially the U. dioica-clade). Anti-inflammatory in vitro COX1 enzyme inhibition assays suggest that bioactivity may be predicted by gross metabolic profiling in Urtica. Copyright © 2013. Published by Elsevier Ltd.
Spike sorting using locality preserving projection with gap statistics and landmark-based spectral clustering.

PubMed

Nguyen, Thanh; Khosravi, Abbas; Creighton, Douglas; Nahavandi, Saeid

2014-12-30

Understanding neural functions requires knowledge from analysing electrophysiological data. The process of assigning spikes of a multichannel signal into clusters, called spike sorting, is one of the important problems in such analysis. There have been various automated spike sorting techniques with both advantages and disadvantages regarding accuracy and computational costs. Therefore, developing spike sorting methods that are highly accurate and computationally inexpensive is always a challenge in biomedical engineering practice. An automatic unsupervised spike sorting method is proposed in this paper. The method uses features extracted by the locality preserving projection (LPP) algorithm. These features afterwards serve as inputs for the landmark-based spectral clustering (LSC) method. Gap statistics (GS) is employed to evaluate the number of clusters before the LSC can be performed. The proposed LPP-LSC is a highly accurate and computationally inexpensive spike sorting approach. LPP spike features are very discriminative and thereby boost the performance of clustering methods. Furthermore, the LSC method exhibits its efficiency when integrated with the cluster evaluator GS. The proposed method's accuracy is approximately 13% superior to that of the benchmark combination of wavelet transformation and superparamagnetic clustering (WT-SPC). Additionally, LPP-LSC computing time is six times less than that of the WT-SPC. LPP-LSC thus offers a win-win spike sorting solution meeting both accuracy and computational cost criteria. LPP and LSC are linear algorithms that help reduce computational burden, and thus their combination can be applied to real-time spike analysis. Copyright © 2014 Elsevier B.V. All rights reserved.
Know thy eHealth user: Development of biopsychosocial personas from a study of older adults with heart failure.

PubMed

Holden, Richard J; Kulanthaivel, Anand; Purkayastha, Saptarshi; Goggins, Kathryn M; Kripalani, Sunil

2017-12-01

Personas are a canonical user-centered design method increasingly used in health informatics research. Personas (empirically derived user archetypes) can be used by eHealth designers to gain a robust understanding of their target end users such as patients. The objective was to develop biopsychosocial personas of older patients with heart failure using quantitative analysis of survey data. Data were collected using standardized surveys and medical record abstraction from 32 older adults with heart failure recently hospitalized for acute heart failure exacerbation. Hierarchical cluster analysis was performed on a final dataset of n=30. Nonparametric analyses were used to identify differences between clusters on 30 clustering variables and seven outcome variables. Six clusters were produced, ranging in size from two to eight patients per cluster. Clusters differed significantly on these biopsychosocial domains and subdomains: demographics (age, sex); medical status (comorbid diabetes); functional status (exhaustion, household work ability, hygiene care ability, physical ability); psychological status (depression, health literacy, numeracy); technology (Internet availability); healthcare system (visit by home healthcare, trust in providers); social context (informal caregiver support, cohabitation, marital status); and economic context (employment status). Tabular and narrative persona descriptions provide an easy reference guide for informatics designers. Personas development using approaches such as clustering of structured survey data is an important tool for health informatics professionals. We describe insights from our study of patients with heart failure, then recommend a generic ten-step personas development process. Methodological strengths and limitations of the study and of personas development generally are discussed. Copyright © 2017 Elsevier B.V. All rights reserved.
Bayesian hierarchical models for cost-effectiveness analyses that use data from cluster randomized trials.

PubMed

Grieve, Richard; Nixon, Richard; Thompson, Simon G

2010-01-01

Cost-effectiveness analyses (CEA) may be undertaken alongside cluster randomized trials (CRTs) where randomization is at the level of the cluster (for example, the hospital or primary care provider) rather than the individual. Costs (and outcomes) within clusters may be correlated so that the assumption made by standard bivariate regression models, that observations are independent, is incorrect. This study develops a flexible modeling framework to acknowledge the clustering in CEA that use CRTs. The authors extend previous Bayesian bivariate models for CEA of multicenter trials to recognize the specific form of clustering in CRTs. They develop new Bayesian hierarchical models (BHMs) that allow mean costs and outcomes, and also variances, to differ across clusters. They illustrate how each model can be applied using data from a large (1732 cases, 70 primary care providers) CRT evaluating alternative interventions for reducing postnatal depression. The analyses compare cost-effectiveness estimates from BHMs with standard bivariate regression models that ignore the data hierarchy. The BHMs show high levels of cost heterogeneity across clusters (intracluster correlation coefficient, 0.17). Compared with standard regression models, the BHMs yield substantially increased uncertainty surrounding the cost-effectiveness estimates, and altered point estimates. The authors conclude that ignoring clustering can lead to incorrect inferences. The BHMs that they present offer a flexible modeling framework that can be applied more generally to CEA that use CRTs.
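To make the intracluster correlation coefficient mentioned above concrete, the following is a minimal sketch of the classical one-way ANOVA estimator for balanced clusters. It is not the Bayesian hierarchical model developed by the authors; the function name `anova_icc` and the toy data are illustrative assumptions only.

```python
def anova_icc(clusters):
    """One-way ANOVA estimate of the intracluster correlation coefficient.

    `clusters` is a list of equally sized lists of observations
    (a balanced design is assumed for simplicity).
    """
    k = len(clusters)       # number of clusters
    n = len(clusters[0])    # observations per cluster
    if k < 2 or n < 2 or any(len(c) != n for c in clusters):
        raise ValueError("need at least two clusters of equal size >= 2")

    means = [sum(c) / n for c in clusters]
    grand = sum(means) / k

    # Between-cluster and within-cluster mean squares.
    msb = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    msw = sum((x - m) ** 2 for c, m in zip(clusters, means) for x in c) / (k * (n - 1))

    denom = msb + (n - 1) * msw
    if denom == 0:
        raise ValueError("all observations are identical; ICC is undefined")
    return (msb - msw) / denom


# Hypothetical costs for three providers with four patients each.
costs = [[100, 110, 105, 95], [150, 160, 155, 145], [120, 125, 130, 115]]
print(round(anova_icc(costs), 3))
```

An estimate near 0 suggests that observations within a cluster are no more alike than observations from different clusters, whereas values well above 0 indicate that an analysis assuming independence would understate uncertainty.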
XMM-Newton X-ray and HST weak gravitational lensing study of the extremely X-ray luminous galaxy cluster Cl J120958.9+495352 (z = 0.902)

NASA Astrophysics Data System (ADS)

Thölken, Sophia; Schrabback, Tim; Reiprich, Thomas H.; Lovisari, Lorenzo; Allen, Steven W.; Hoekstra, Henk; Applegate, Douglas; Buddendiek, Axel; Hicks, Amalia

2018-03-01

Context. Observations of relaxed, massive, and distant clusters can provide important tests of standard cosmological models, for example by using the gas mass fraction. To perform this test, the dynamical state of the cluster and its gas properties have to be investigated. X-ray analyses provide one of the best opportunities to access this information and to determine important properties such as temperature profiles, gas mass, and the total X-ray hydrostatic mass. For the last of these, weak gravitational lensing analyses are complementary independent probes that are essential in order to test whether X-ray masses could be biased. Aims: We study the very luminous, high redshift (z = 0.902) galaxy cluster Cl J120958.9+495352 using XMM-Newton data. We measure global cluster properties and study the temperature profile and the cooling time to investigate the dynamical status with respect to the presence of a cool core. We use Hubble Space Telescope (HST) weak lensing data to estimate its total mass and determine the gas mass fraction. Methods: We perform a spectral analysis using an XMM-Newton observation of 15 ks cleaned exposure time. As the treatment of the background is crucial, we use two different approaches to account for the background emission to verify our results. We account for point spread function effects and deproject our results to estimate the gas mass fraction of the cluster. We measure weak lensing galaxy shapes from mosaic HST imaging and select background galaxies photometrically in combination with imaging data from the William Herschel Telescope. Results: The X-ray luminosity of Cl J120958.9+495352 in the 0.1-2.4 keV band estimated from our XMM-Newton data is $L_X = (13.4^{+1.2}_{-1.0}) \times 10^{44}$ erg/s and thus it is one of the most X-ray luminous clusters known at similarly high redshift. We find clear indications for the presence of a cool core from the temperature profile and the central cooling time, which is very rare at such high redshifts. Based on the weak lensing analysis, we estimate a cluster mass of $M_{500}/10^{14}\,M_\odot = 4.4^{+2.2}_{-2.0}$ (stat.) $\pm 0.6$ (sys.) and a gas mass fraction of $f_{\mathrm{gas},2500} = 0.11^{+0.06}_{-0.03}$, in good agreement with previous findings for high redshift and local clusters.
Characterization of HIV Transmission in South-East Austria

PubMed Central

Hoenigl, Martin; Chaillon, Antoine; Kessler, Harald H.; Haas, Bernhard; Stelzl, Evelyn; Weninger, Karin; Little, Susan J.; Mehta, Sanjay R.

2016-01-01

To gain deeper insight into the epidemiology of HIV-1 transmission in South-East Austria we performed a retrospective analysis of 259 HIV-1 partial pol sequences obtained from unique individuals newly diagnosed with HIV infection in South-East Austria from 2008 through 2014. After quality filtering, putative transmission linkages were inferred when two sequences were ≤1.5% genetically different. Multiple linkages were resolved into putative transmission clusters. Further phylogenetic analyses were performed using BEAST v1.8.1. Finally, we investigated putative links between the 259 sequences from South-East Austria and all publicly available HIV polymerase sequences in the Los Alamos National Laboratory HIV sequence database. We found that 45.6% (118/259) of the sampled sequences were genetically linked with at least one other sequence from South-East Austria forming putative transmission clusters. Clustering individuals were more likely to be men who have sex with men (MSM; p<0.001), infected with subtype B (p<0.001) or subtype F (p = 0.02). Among clustered males who reported only heterosexual (HSX) sex as an HIV risk, 47% clustered closely with MSM (either as pairs or within larger MSM clusters). One hundred and seven of the 259 sequences (41.3%) from South-East Austria had at least one putative inferred linkage with sequences from a total of 69 other countries. In conclusion, analysis of HIV-1 sequences from newly diagnosed individuals residing in South-East Austria revealed a high degree of national and international clustering mainly within MSM. Interestingly, we found that a high number of heterosexual males clustered within MSM networks, suggesting either linkage between risk groups or misrepresentation of sexual risk behaviors by subjects. PMID:26967154
Characterization of HIV Transmission in South-East Austria.

PubMed

Hoenigl, Martin; Chaillon, Antoine; Kessler, Harald H; Haas, Bernhard; Stelzl, Evelyn; Weninger, Karin; Little, Susan J; Mehta, Sanjay R

2016-01-01

To gain deeper insight into the epidemiology of HIV-1 transmission in South-East Austria we performed a retrospective analysis of 259 HIV-1 partial pol sequences obtained from unique individuals newly diagnosed with HIV infection in South-East Austria from 2008 through 2014. After quality filtering, putative transmission linkages were inferred when two sequences were ≤1.5% genetically different. Multiple linkages were resolved into putative transmission clusters. Further phylogenetic analyses were performed using BEAST v1.8.1. Finally, we investigated putative links between the 259 sequences from South-East Austria and all publicly available HIV polymerase sequences in the Los Alamos National Laboratory HIV sequence database. We found that 45.6% (118/259) of the sampled sequences were genetically linked with at least one other sequence from South-East Austria forming putative transmission clusters. Clustering individuals were more likely to be men who have sex with men (MSM; p<0.001), infected with subtype B (p<0.001) or subtype F (p = 0.02). Among clustered males who reported only heterosexual (HSX) sex as an HIV risk, 47% clustered closely with MSM (either as pairs or within larger MSM clusters). One hundred and seven of the 259 sequences (41.3%) from South-East Austria had at least one putative inferred linkage with sequences from a total of 69 other countries. In conclusion, analysis of HIV-1 sequences from newly diagnosed individuals residing in South-East Austria revealed a high degree of national and international clustering mainly within MSM. Interestingly, we found that a high number of heterosexual males clustered within MSM networks, suggesting either linkage between risk groups or misrepresentation of sexual risk behaviors by subjects.
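The linkage step described above (pairs of sequences at most 1.5% different, with multiple linkages resolved into clusters) can be sketched as a threshold graph whose connected components are the putative clusters. This is a minimal illustration under stated assumptions: it uses the uncorrected proportion of differing sites as the distance, which may differ from the distance model used in the study, and the function names and toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


def transmission_clusters(seqs, threshold=0.015):
    """Link sequences within `threshold`; return connected components of size >= 2."""
    parent = {name: name for name in seqs}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    # Singletons are unlinked sequences, not clusters.
    return sorted((sorted(g) for g in groups.values() if len(g) > 1), key=lambda g: g[0])


# Toy alignment: s1-s2 and s2-s3 are each 1.0% apart, s1-s3 is 2.0% apart.
seqs = {
    "s1": "A" * 1000,
    "s2": "C" * 10 + "A" * 990,
    "s3": "C" * 20 + "A" * 980,
    "s4": "T" * 1000,
}
print(transmission_clusters(seqs))
```

Note that s1 and s3 end up in the same cluster even though they are not directly linked, because both are linked to s2. This chaining is what resolving multiple linkages into clusters amounts to in this sketch.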
Planck 2015 results. XXIV. Cosmology from Sunyaev-Zeldovich cluster counts

NASA Astrophysics Data System (ADS)

Planck Collaboration; Ade, P. A. R.; Aghanim, N.; Arnaud, M.; Ashdown, M.; Aumont, J.; Baccigalupi, C.; Banday, A. J.; Barreiro, R. B.; Bartlett, J. G.; Bartolo, N.; Battaner, E.; Battye, R.; Benabed, K.; Benoît, A.; Benoit-Lévy, A.; Bernard, J.-P.; Bersanelli, M.; Bielewicz, P.; Bock, J. J.; Bonaldi, A.; Bonavera, L.; Bond, J. R.; Borrill, J.; Bouchet, F. R.; Bucher, M.; Burigana, C.; Butler, R. C.; Calabrese, E.; Cardoso, J.-F.; Catalano, A.; Challinor, A.; Chamballu, A.; Chary, R.-R.; Chiang, H. C.; Christensen, P. R.; Church, S.; Clements, D. L.; Colombi, S.; Colombo, L. P. L.; Combet, C.; Comis, B.; Couchot, F.; Coulais, A.; Crill, B. P.; Curto, A.; Cuttaia, F.; Danese, L.; Davies, R. D.; Davis, R. J.; de Bernardis, P.; de Rosa, A.; de Zotti, G.; Delabrouille, J.; Désert, F.-X.; Diego, J. M.; Dolag, K.; Dole, H.; Donzelli, S.; Doré, O.; Douspis, M.; Ducout, A.; Dupac, X.; Efstathiou, G.; Elsner, F.; Enßlin, T. A.; Eriksen, H. K.; Falgarone, E.; Fergusson, J.; Finelli, F.; Forni, O.; Frailis, M.; Fraisse, A. A.; Franceschi, E.; Frejsel, A.; Galeotta, S.; Galli, S.; Ganga, K.; Giard, M.; Giraud-Héraud, Y.; Gjerløw, E.; González-Nuevo, J.; Górski, K. M.; Gratton, S.; Gregorio, A.; Gruppuso, A.; Gudmundsson, J. E.; Hansen, F. K.; Hanson, D.; Harrison, D. L.; Henrot-Versillé, S.; Hernández-Monteagudo, C.; Herranz, D.; Hildebrandt, S. R.; Hivon, E.; Hobson, M.; Holmes, W. A.; Hornstrup, A.; Hovest, W.; Huffenberger, K. M.; Hurier, G.; Jaffe, A. H.; Jaffe, T. R.; Jones, W. C.; Juvela, M.; Keihänen, E.; Keskitalo, R.; Kisner, T. S.; Kneissl, R.; Knoche, J.; Kunz, M.; Kurki-Suonio, H.; Lagache, G.; Lähteenmäki, A.; Lamarre, J.-M.; Lasenby, A.; Lattanzi, M.; Lawrence, C. R.; Leonardi, R.; Lesgourgues, J.; Levrier, F.; Liguori, M.; Lilje, P. B.; Linden-Vørnle, M.; López-Caniego, M.; Lubin, P. M.; Macías-Pérez, J. F.; Maggio, G.; Maino, D.; Mandolesi, N.; Mangilli, A.; Maris, M.; Martin, P. G.; Martínez-González, E.; Masi, S.; Matarrese, S.; McGehee, P.; Meinhold, P. R.; Melchiorri, A.; Melin, J.-B.; Mendes, L.; Mennella, A.; Migliaccio, M.; Mitra, S.; Miville-Deschênes, M.-A.; Moneti, A.; Montier, L.; Morgante, G.; Mortlock, D.; Moss, A.; Munshi, D.; Murphy, J. A.; Naselsky, P.; Nati, F.; Natoli, P.; Netterfield, C. B.; Nørgaard-Nielsen, H. U.; Noviello, F.; Novikov, D.; Novikov, I.; Oxborrow, C. A.; Paci, F.; Pagano, L.; Pajot, F.; Paoletti, D.; Partridge, B.; Pasian, F.; Patanchon, G.; Pearson, T. J.; Perdereau, O.; Perotto, L.; Perrotta, F.; Pettorino, V.; Piacentini, F.; Piat, M.; Pierpaoli, E.; Pietrobon, D.; Plaszczynski, S.; Pointecouteau, E.; Polenta, G.; Popa, L.; Pratt, G. W.; Prézeau, G.; Prunet, S.; Puget, J.-L.; Rachen, J. P.; Rebolo, R.; Reinecke, M.; Remazeilles, M.; Renault, C.; Renzi, A.; Ristorcelli, I.; Rocha, G.; Roman, M.; Rosset, C.; Rossetti, M.; Roudier, G.; Rubiño-Martín, J. A.; Rusholme, B.; Sandri, M.; Santos, D.; Savelainen, M.; Savini, G.; Scott, D.; Seiffert, M. D.; Shellard, E. P. S.; Spencer, L. D.; Stolyarov, V.; Stompor, R.; Sudiwala, R.; Sunyaev, R.; Sutton, D.; Suur-Uski, A.-S.; Sygnet, J.-F.; Tauber, J. A.; Terenzi, L.; Toffolatti, L.; Tomasi, M.; Tristram, M.; Tucci, M.; Tuovinen, J.; Türler, M.; Umana, G.; Valenziano, L.; Valiviita, J.; Van Tent, B.; Vielva, P.; Villa, F.; Wade, L. A.; Wandelt, B. D.; Wehus, I. K.; Weller, J.; White, S. D. M.; Yvon, D.; Zacchei, A.; Zonca, A.

2016-09-01

We present cluster counts and corresponding cosmological constraints from the Planck full mission data set. Our catalogue consists of 439 clusters detected via their Sunyaev-Zeldovich (SZ) signal down to a signal-to-noise ratio of 6, and is more than a factor of 2 larger than the 2013 Planck cluster cosmology sample. The counts are consistent with those from 2013 and yield compatible constraints under the same modelling assumptions. Taking advantage of the larger catalogue, we extend our analysis to the two-dimensional distribution in redshift and signal-to-noise. We use mass estimates from two recent studies of gravitational lensing of background galaxies by Planck clusters to provide priors on the hydrostatic bias parameter, (1-b). In addition, we use lensing of cosmic microwave background (CMB) temperature fluctuations by Planck clusters as an independent constraint on this parameter. These various calibrations imply constraints on the present-day amplitude of matter fluctuations in varying degrees of tension with those from the Planck analysis of primary fluctuations in the CMB; for the lowest estimated values of (1-b) the tension is mild, only a little over one standard deviation, while it remains substantial (3.7σ) for the largest estimated value. We also examine constraints on extensions to the base flat ΛCDM model by combining the cluster and CMB constraints. The combination appears to favour non-minimal neutrino masses, but this possibility does little to relieve the overall tension because it simultaneously lowers the implied value of the Hubble parameter, thereby exacerbating the discrepancy with most current astrophysical estimates. Improving the precision of cluster mass calibrations from the current 10%-level to 1% would significantly strengthen these combined analyses and provide a stringent test of the base ΛCDM model.
Degree-based statistic and center persistency for brain connectivity analysis.

PubMed

Yoo, Kwangsun; Lee, Peter; Chung, Moo K; Sohn, William S; Chung, Sun Ju; Na, Duk L; Ju, Daheen; Jeong, Yong

2017-01-01

Brain connectivity analyses have been widely performed to investigate the organization and functioning of the brain, or to observe changes in neurological or psychiatric conditions. However, connectivity analysis inevitably introduces the problem of mass-univariate hypothesis testing. Although several cluster-wise correction methods have been suggested to address this problem and shown to provide high sensitivity, these approaches fundamentally have two drawbacks: the lack of spatial specificity (localization power) and the arbitrariness of an initial cluster-forming threshold. In this study, we propose a novel method, degree-based statistic (DBS), performing cluster-wise inference. DBS is designed to overcome the above-mentioned two shortcomings. From a network perspective, a few brain regions are of critical importance and considered to play pivotal roles in network integration. Regarding this notion, DBS defines a cluster as a set of edges of which one ending node is shared. This definition enables the efficient detection of clusters and their center nodes. Furthermore, we introduce a new measure of a cluster, center persistency (CP). We demonstrated the efficiency of DBS in a simulation with a known "ground truth." We then applied DBS to two experimental datasets and showed that it successfully detects the persistent clusters. In conclusion, by adopting a graph theoretical concept of degrees and borrowing the concept of persistence from algebraic topology, DBS could sensitively identify clusters with centric nodes that would play pivotal roles in an effect of interest. DBS is potentially widely applicable to various cognitive or clinical situations and allows us to obtain statistically reliable and easily interpretable results. Hum Brain Mapp 38:165-181, 2017. © 2016 Wiley Periodicals, Inc.
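The cluster definition used by DBS (a set of edges sharing one ending node) can be illustrated with a small sketch. This shows only the grouping of suprathreshold edges by shared node at a single threshold; the statistical inference and the center persistency measure described in the abstract are not reproduced here. The function names, the threshold, and the edge statistics are hypothetical.

```python
from collections import defaultdict


def star_clusters(edge_stats, threshold):
    """Group suprathreshold edges by shared node.

    `edge_stats` maps an edge (u, v) to its test statistic. Returns a dict
    from each node to the sorted list of suprathreshold edges touching it.
    """
    clusters = defaultdict(list)
    for (u, v), stat in edge_stats.items():
        if stat > threshold:
            clusters[u].append((u, v))
            clusters[v].append((u, v))
    return {node: sorted(edges) for node, edges in clusters.items()}


def center_nodes(edge_stats, threshold, min_degree=2):
    """Nodes whose suprathreshold degree is at least `min_degree`, largest first."""
    degrees = {n: len(e) for n, e in star_clusters(edge_stats, threshold).items()}
    return sorted((n for n, d in degrees.items() if d >= min_degree),
                  key=lambda n: (-degrees[n], n))


# Hypothetical edge-wise statistics between four regions A-D.
edge_stats = {
    ("A", "B"): 3.1,
    ("A", "C"): 2.9,
    ("A", "D"): 3.5,
    ("B", "C"): 0.4,
    ("C", "D"): 2.6,
}
print(center_nodes(edge_stats, threshold=2.5))
```

In this toy example region A has the highest degree among suprathreshold edges, so it would be the candidate center node. Repeating the computation over a range of thresholds is one way to see how stable such a center is, in the spirit of the persistency idea.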
Planck 2015 results: XXIV. Cosmology from Sunyaev-Zeldovich cluster counts

DOE PAGES

Ade, P. A. R.; Aghanim, N.; Arnaud, M.; ...

2016-09-20

In this work, we present cluster counts and corresponding cosmological constraints from the Planck full mission data set. Our catalogue consists of 439 clusters detected via their Sunyaev-Zeldovich (SZ) signal down to a signal-to-noise ratio of 6, and is more than a factor of 2 larger than the 2013 Planck cluster cosmology sample. The counts are consistent with those from 2013 and yield compatible constraints under the same modelling assumptions. Taking advantage of the larger catalogue, we extend our analysis to the two-dimensional distribution in redshift and signal-to-noise. We use mass estimates from two recent studies of gravitational lensing of background galaxies by Planck clusters to provide priors on the hydrostatic bias parameter, (1-b). In addition, we use lensing of cosmic microwave background (CMB) temperature fluctuations by Planck clusters as an independent constraint on this parameter. These various calibrations imply constraints on the present-day amplitude of matter fluctuations in varying degrees of tension with those from the Planck analysis of primary fluctuations in the CMB; for the lowest estimated values of (1-b) the tension is mild, only a little over one standard deviation, while it remains substantial (3.7σ) for the largest estimated value. We also examine constraints on extensions to the base flat ΛCDM model by combining the cluster and CMB constraints. The combination appears to favour non-minimal neutrino masses, but this possibility does little to relieve the overall tension because it simultaneously lowers the implied value of the Hubble parameter, thereby exacerbating the discrepancy with most current astrophysical estimates. In conclusion, improving the precision of cluster mass calibrations from the current 10%-level to 1% would significantly strengthen these combined analyses and provide a stringent test of the base ΛCDM model.
Planck 2015 results: XXIV. Cosmology from Sunyaev-Zeldovich cluster counts

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ade, P. A. R.; Aghanim, N.; Arnaud, M.

In this work, we present cluster counts and corresponding cosmological constraints from the Planck full mission data set. Our catalogue consists of 439 clusters detected via their Sunyaev-Zeldovich (SZ) signal down to a signal-to-noise ratio of 6, and is more than a factor of 2 larger than the 2013 Planck cluster cosmology sample. The counts are consistent with those from 2013 and yield compatible constraints under the same modelling assumptions. Taking advantage of the larger catalogue, we extend our analysis to the two-dimensional distribution in redshift and signal-to-noise. We use mass estimates from two recent studies of gravitational lensing of background galaxies by Planck clusters to provide priors on the hydrostatic bias parameter, (1-b). In addition, we use lensing of cosmic microwave background (CMB) temperature fluctuations by Planck clusters as an independent constraint on this parameter. These various calibrations imply constraints on the present-day amplitude of matter fluctuations in varying degrees of tension with those from the Planck analysis of primary fluctuations in the CMB; for the lowest estimated values of (1-b) the tension is mild, only a little over one standard deviation, while it remains substantial (3.7σ) for the largest estimated value. We also examine constraints on extensions to the base flat ΛCDM model by combining the cluster and CMB constraints. The combination appears to favour non-minimal neutrino masses, but this possibility does little to relieve the overall tension because it simultaneously lowers the implied value of the Hubble parameter, thereby exacerbating the discrepancy with most current astrophysical estimates. In conclusion, improving the precision of cluster mass calibrations from the current 10%-level to 1% would significantly strengthen these combined analyses and provide a stringent test of the base ΛCDM model.

UQlust: combining profile hashing with linear-time ranking for efficient clustering and analysis of big macromolecular data.

PubMed

Adamczak, Rafal; Meller, Jarek

2016-12-28

Advances in computing have enabled current protein and RNA structure prediction and molecular simulation methods to dramatically increase their sampling of conformational spaces. The quickly growing number of experimentally resolved structures, and databases such as the Protein Data Bank, also implies large scale structural similarity analyses to retrieve and classify macromolecular data. Consequently, the computational cost of structure comparison and clustering for large sets of macromolecular structures has become a bottleneck that necessitates further algorithmic improvements and development of efficient software solutions. uQlust is a versatile and easy-to-use tool for ultrafast ranking and clustering of macromolecular structures. uQlust makes use of structural profiles of proteins and nucleic acids, while combining a linear-time algorithm for implicit comparison of all pairs of models with profile hashing to enable efficient clustering of large data sets with a low memory footprint. In addition to ranking and clustering of large sets of models of the same protein or RNA molecule, uQlust can also be used in conjunction with fragment-based profiles in order to cluster structures of arbitrary length. For example, hierarchical clustering of the entire PDB using profile hashing can be performed on a typical laptop, thus opening an avenue for structural explorations previously limited to dedicated resources. The uQlust package is freely available under the GNU General Public License at https://github.com/uQlust . uQlust represents a drastic reduction in the computational complexity and memory requirements with respect to existing clustering and model quality assessment methods for macromolecular structure analysis, while yielding results on par with traditional approaches for both proteins and RNAs.
Micro-PIXE analysis of trace element concentrations of natural rubies from different locations in Myanmar

NASA Astrophysics Data System (ADS)

Sanchez, J. L.; Osipowicz, T.; Tang, S. M.; Tay, T. S.; Win, T. T.

1997-07-01

The trace element concentrations found in geological samples can shed light on the formation process. In the case of gemstones, which might be of artificial or natural origin, there is also considerable interest in the development of methods that provide identification of the origin of a sample. For rubies, trace element concentrations present in natural samples were shown previously to be significant indicators of the region of origin [S.M. Tang et al., Appl. Spectr. 42 (1988) 44, and 43 (1989) 219]. Here we report the results of micro-PIXE analyses of trace element (Ti, V, Cr, Fe, Cu and Ga) concentrations of a large set (n = 130) of natural rough rubies from nine locations in Myanmar (Burma). The resulting concentrations are subjected to statistical analysis. Six of the nine groups form clusters when the database is evaluated using tree clustering and principal component analysis.
Definition of run-off-road crash clusters-For safety benefit estimation and driver assistance development.

PubMed

Nilsson, Daniel; Lindman, Magdalena; Victor, Trent; Dozza, Marco

2018-04-01

Single-vehicle run-off-road crashes are a major traffic safety concern, as they are associated with a high proportion of fatal outcomes. In addressing run-off-road crashes, the development and evaluation of advanced driver assistance systems requires test scenarios that are representative of the variability found in real-world crashes. We apply hierarchical agglomerative cluster analysis to define similarities in a set of crash data variables; the resulting clusters can then be used as the basis for test scenario development. Out of 13 clusters, nine test scenarios are derived, corresponding to crashes characterised by: drivers drifting off the road in daytime and night-time, high speed departures, high-angle departures on narrow roads, highways, snowy roads, loss-of-control on wet roadways, sharp curves, and high speeds on roads with severe road surface conditions. In addition, each cluster was analysed with respect to crash variables related to the crash cause and reason for the unintended lane departure. The study shows that cluster analysis of representative data provides a statistically based method to identify relevant properties for run-off-road test scenarios. This was done to support development of vehicle-based run-off-road countermeasures and driver behaviour models used in virtual testing. Future studies should use driver behaviour from naturalistic driving data to further define how test scenarios and behavioural causation mechanisms should be included. Copyright © 2018 Elsevier Ltd. All rights reserved.
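The agglomerative step named in this abstract can be sketched in a few lines of Python. This is only an illustration, not the authors' procedure: the abstract does not state the distance measure or the linkage rule, so Euclidean distance and average linkage are assumed here, and the two crash variables (departure speed and departure angle) and their values are hypothetical.

```python
import math
from itertools import combinations


def agglomerate(points, n_clusters):
    """Bottom-up clustering: start with one cluster per point and
    repeatedly merge the two clusters with the smallest average
    pairwise distance (average linkage, an assumption)."""
    clusters = [[i] for i in range(len(points))]

    def average_linkage(a, b):
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # combinations() yields index pairs with a < b, so deleting b
        # after merging it into a keeps the remaining indices valid.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical crashes: (departure speed in km/h, departure angle in degrees)
crashes = [(50, 2), (55, 3), (120, 4), (125, 5), (60, 30)]
print(agglomerate(crashes, 3))
```

In practice the variables would usually be standardised first, and categorical crash variables would need a different dissimilarity measure than the Euclidean distance assumed above.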
Denaturing gradient gel electrophoresis profiles of bacteria from the saliva of twenty four different individuals form clusters that showed no relationship to the yeasts present.

PubMed

Weerasekera, Manjula M; Sissons, Chris H; Wong, Lisa; Anderson, Sally A; Holmes, Ann R; Cannon, Richard D

2017-10-01

The aim was to investigate the relationship between groups of bacteria identified by cluster analysis of the DGGE fingerprints and the amounts and diversity of yeast present. Bacterial and yeast populations in saliva samples from 24 adults were analysed using denaturing gradient gel electrophoresis (DGGE) of the bacteria present and by yeast culture. Eubacterial DGGE banding patterns showed considerable variation between individuals. Seventy-one different amplicon bands were detected; the band number per saliva sample ranged from 21 to 39 (mean ± SD = 29.3 ± 4.9). Cluster and principal component analysis of the bacterial DGGE patterns yielded three major clusters containing 20 of the samples. Seventeen of the 24 (71%) saliva samples were yeast positive, with concentrations up to 10^3 cfu/mL. Candida albicans was the predominant species in saliva samples, although six other yeast species, including Candida dubliniensis, Candida tropicalis, Candida krusei, Candida guilliermondii, Candida rugosa and Saccharomyces cerevisiae, were identified. The presence, concentration, and species of yeast in samples showed no clear relationship to the bacterial clusters. Despite indications of in vitro bacteria-yeast interactions, there was a lack of association between the presence, identity and diversity of yeasts and the bacterial DGGE fingerprint clusters in saliva. This suggests significant ecological individual-specificity of these associations in highly complex in vivo oral biofilm systems under normal oral conditions. Copyright © 2017 Elsevier Ltd. All rights reserved.
Identification of nitrogen-fixing genes and gene clusters from metagenomic library of acid mine drainage.

PubMed

Dai, Zhimin; Guo, Xue; Yin, Huaqun; Liang, Yili; Cong, Jing; Liu, Xueduan

2014-01-01

Biological nitrogen fixation is an essential function of acid mine drainage (AMD) microbial communities. However, most acidophiles in AMD environments are uncultured microorganisms and little is known about the diversity of nitrogen-fixing genes and structure of nif gene cluster in AMD microbial communities. In this study, we used metagenomic sequencing to isolate nif genes in the AMD microbial community from Dexing Copper Mine, China. Meanwhile, a metagenome microarray containing 7,776 large-insertion fosmids was constructed to screen novel nif gene clusters. Metagenomic analyses revealed that 742 sequences were identified as nif genes including structural subunit genes nifH, nifD, nifK and various additional genes. The AMD community is massively dominated by the genus Acidithiobacillus. However, the phylogenetic diversity of nitrogen-fixing microorganisms is much higher than previously thought in the AMD community. Furthermore, a 32.5-kb genomic sequence harboring nif, fix and associated genes was screened by metagenome microarray. Comparative genome analysis indicated that most nif genes in this cluster are most similar to those of Herbaspirillum seropedicae, but the organization of the nif gene cluster had significant differences from H. seropedicae. Sequence analysis and reverse transcription PCR also suggested that distinct transcription units of nif genes exist in this gene cluster. nifQ gene falls into the same transcription unit with fixABCX genes, which have not been reported in other diazotrophs before. All of these results indicated that more novel diazotrophs survive in the AMD community.
Identification of Nitrogen-Fixing Genes and Gene Clusters from Metagenomic Library of Acid Mine Drainage

PubMed Central

Yin, Huaqun; Liang, Yili; Cong, Jing; Liu, Xueduan

2014-01-01

Biological nitrogen fixation is an essential function of acid mine drainage (AMD) microbial communities. However, most acidophiles in AMD environments are uncultured microorganisms and little is known about the diversity of nitrogen-fixing genes and structure of nif gene cluster in AMD microbial communities. In this study, we used metagenomic sequencing to isolate nif genes in the AMD microbial community from Dexing Copper Mine, China. Meanwhile, a metagenome microarray containing 7,776 large-insertion fosmids was constructed to screen novel nif gene clusters. Metagenomic analyses revealed that 742 sequences were identified as nif genes including structural subunit genes nifH, nifD, nifK and various additional genes. The AMD community is massively dominated by the genus Acidithiobacillus. However, the phylogenetic diversity of nitrogen-fixing microorganisms is much higher than previously thought in the AMD community. Furthermore, a 32.5-kb genomic sequence harboring nif, fix and associated genes was screened by metagenome microarray. Comparative genome analysis indicated that most nif genes in this cluster are most similar to those of Herbaspirillum seropedicae, but the organization of the nif gene cluster had significant differences from H. seropedicae. Sequence analysis and reverse transcription PCR also suggested that distinct transcription units of nif genes exist in this gene cluster. nifQ gene falls into the same transcription unit with fixABCX genes, which have not been reported in other diazotrophs before. All of these results indicated that more novel diazotrophs survive in the AMD community. PMID:24498417
Dynamic analysis and pattern visualization of forest fires.

PubMed

Lopes, António M; Tenreiro Machado, J A

2014-01-01

This paper analyses forest fires from the perspective of dynamical systems. Forest fires exhibit complex correlations in size, space and time, revealing features often present in complex systems, such as the absence of a characteristic length-scale, or the emergence of long range correlations and persistent memory. This study addresses a public domain forest fires catalogue, containing information on events in Portugal during the period from 1980 to 2012. The data are analysed on an annual basis, modelling the occurrences as sequences of Dirac impulses with amplitude proportional to the burnt area. First, we consider mutual information to correlate annual patterns. We use visualization trees, generated by hierarchical clustering algorithms, in order to compare and to extract relationships among the data. Second, we adopt the Multidimensional Scaling (MDS) visualization tool. MDS generates maps where each object corresponds to a point. Objects that are perceived to be similar to each other are placed on the map forming clusters. The results are analysed in order to extract relationships among the data and to identify forest fire patterns.
Dynamic Analysis and Pattern Visualization of Forest Fires

PubMed Central

Lopes, António M.; Tenreiro Machado, J. A.

2014-01-01

This paper analyses forest fires from the perspective of dynamical systems. Forest fires exhibit complex correlations in size, space and time, revealing features often present in complex systems, such as the absence of a characteristic length-scale, or the emergence of long range correlations and persistent memory. This study addresses a public domain forest fires catalogue, containing information on events in Portugal during the period from 1980 to 2012. The data are analysed on an annual basis, modelling the occurrences as sequences of Dirac impulses with amplitude proportional to the burnt area. First, we consider mutual information to correlate annual patterns. We use visualization trees, generated by hierarchical clustering algorithms, in order to compare and to extract relationships among the data. Second, we adopt the Multidimensional Scaling (MDS) visualization tool. MDS generates maps where each object corresponds to a point. Objects that are perceived to be similar to each other are placed on the map forming clusters. The results are analysed in order to extract relationships among the data and to identify forest fire patterns. PMID:25137393
Fractal analysis of earthquake swarms of Vogtland/NW-Bohemia intraplate seismicity

NASA Astrophysics Data System (ADS)

Mittag, Reinhard J.

2003-03-01

The special type of intraplate microseismicity with swarm-like occurrence of earthquakes within the Vogtland/NW-Bohemian Region is analysed to reveal the nature and the origin of the seismogenic regime. The long-term data set of continuous seismic monitoring since 1962, including more than 26000 events within a range of about 5 units of local magnitude, provides a unique database for statistical investigations. Most earthquakes occur in narrow hypocentral volumes (clusters) within the lower part of the upper crust, but single events outside of spatial clusters are also observed. The temporal distribution of events is concentrated in clusters (swarms), which last from some days to a few months depending on intensity. Since 1962 three strong swarms have occurred (1962, 1985/86, 2000), including two seismic cycles. Spatial clusters are distributed along a fault system of regional extension (Leipzig-Regensburger Störung), which is supposed to act as the joint tectonic fracture zone for the whole seismogenic region. Seismicity is analysed by fractal analysis, suggesting a unifractal behaviour of seismicity and a uniform character of the seismotectonic regime for the whole region. A tendency of decreasing fractal dimension values is observed for the temporal distribution of earthquakes, indicating an increasing degree of temporal clustering from swarm to swarm. Following the idea of earthquake triggering by magma intrusions and related fluid and gas release into the tectonically pre-stressed parts of the crust, a steadily increasing intensity of intrusion and/or fluid and gas release might account for that observation. Additionally, seismic parameters for Vogtland/NW-Bohemia intraplate seismicity are compared with an adequate data set of mining-induced seismicity in a nearby mine of Lubin/Poland and with synthetic data sets to evaluate parameter estimation. Due to the different seismogenic regimes of tectonic and induced seismicity, significant differences between b-values and temporal dimension values are observed. Most significant for intraplate seismicity are relatively low fractal dimension values for the temporal distribution. That observation reflects the strong degree of temporal earthquake clustering, which might explain the episodic character of earthquake swarms and support the idea of push-like triggering of earthquake avalanches by intruding magma.
A comprehensive study of large-scale structures in the GOODS-SOUTH field up to z ~ 2.5

NASA Astrophysics Data System (ADS)

Salimbeni, S.; Castellano, M.; Pentericci, L.; Trevese, D.; Fiore, F.; Grazian, A.; Fontana, A.; Giallongo, E.; Boutsia, K.; Cristiani, S.; de Santis, C.; Gallozzi, S.; Menci, N.; Nonino, M.; Paris, D.; Santini, P.; Vanzella, E.

2009-07-01

Aims: The aim of the present paper is to identify and study the properties and galactic content of groups and clusters in the GOODS-South field up to z ~ 2.5, and to analyse the physical properties of galaxies as a continuous function of environmental density up to high redshift. Methods: We used the deep (z850 ~ 26), multi-wavelength GOODS-MUSIC catalogue, which has spectroscopic redshifts for 15% of sources and accurate photometric redshifts for the remaining fraction. On these data, we applied a (2+1)D algorithm, previously developed by our group, that provides an adaptive estimate of the 3D density field. We supported our analysis with simulations to evaluate the purity and the completeness of the cluster catalogue produced by our algorithm. Results: We find several high-density peaks embedded in larger structures in the redshift range 0.4-2.5. From the analysis of their physical properties (mass profile, M200, σ_v, L_X, U-B vs. B diagram), we find that most of them are groups of galaxies, while two are poor clusters with masses a few times 10^14 M_⊙. For these two clusters we find from the Chandra 2Ms data an X-ray emission significantly lower than expected from their optical properties, suggesting that the two clusters are either not virialised or are gas poor. We find that the slope of the colour magnitude relation, for these groups and clusters, is constant at least up to z ~ 1. We also analyse the dependence on environment of galaxy colours, luminosities, stellar masses, ages, and star formations. We find that galaxies in high-density regions are, on average, more luminous and massive than field galaxies up to z ~ 2. The fraction of red galaxies increases with luminosity and with density up to z ~ 1.2. At higher z this dependence on density disappears. The variation of galaxy properties as a function of redshift and density suggests that a significant change occurs at z ~ 1.5-2.
A Web service substitution method based on service cluster nets

NASA Astrophysics Data System (ADS)

Du, YuYue; Gai, JunJing; Zhou, MengChu

2017-11-01

Service substitution is an important research topic in the fields of Web services and service-oriented computing. This work presents a novel method to analyse and substitute Web services. A new concept, called a Service Cluster Net Unit, is proposed based on Web service clusters. A service cluster is converted into a Service Cluster Net Unit. Then it is used to analyse whether the services in the cluster can satisfy some service requests. Meanwhile, the substitution methods of an atomic service and a composite service are proposed. The correctness of the proposed method is proved, and the effectiveness is shown and compared with the state-of-the-art method via an experiment. It can be readily applied to e-commerce service substitution to meet the business automation needs.
Selection of key ambient particulate variables for epidemiological studies - applying cluster and heatmap analyses as tools for data reduction.

PubMed

Gu, Jianwei; Pitz, Mike; Breitner, Susanne; Birmili, Wolfram; von Klot, Stephanie; Schneider, Alexandra; Soentgen, Jens; Reller, Armin; Peters, Annette; Cyrys, Josef

2012-10-01

The success of epidemiological studies depends on the use of appropriate exposure variables. The purpose of this study is to extract a relatively small selection of variables characterizing ambient particulate matter from a large measurement data set. The original data set comprised a total of 96 particulate matter variables that have been continuously measured since 2004 at an urban background aerosol monitoring site in the city of Augsburg, Germany. Many of the original variables were derived from measured particle size distribution (PSD) across the particle diameter range 3 nm to 10 μm, including size-segregated particle number concentration, particle length concentration, particle surface concentration and particle mass concentration. The data set was complemented by integral aerosol variables. These variables were measured by independent instruments, including black carbon, sulfate, particle active surface concentration and particle length concentration. It is obvious that such a large number of measured variables cannot be used in health effect analyses simultaneously. The aim of this study is a pre-screening and a selection of the key variables that will be used as input in forthcoming epidemiological studies. In this study, we present two methods of parameter selection and apply them to data from a two-year period from 2007 to 2008. We used the agglomerative hierarchical cluster method to find groups of similar variables. In total, we selected 15 key variables from 9 clusters which are recommended for epidemiological analyses. We also applied a two-dimensional visualization technique called "heatmap" analysis to the Spearman correlation matrix. 12 key variables were selected using this method. Moreover, the positive matrix factorization (PMF) method was applied to the PSD data to characterize the possible particle sources. Correlations between the variables and PMF factors were used to interpret the meaning of the cluster and the heatmap analyses. 
Copyright © 2012 Elsevier B.V. All rights reserved.
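The Spearman correlation matrix that underlies the "heatmap" analysis in this abstract can be illustrated with a small pure-Python sketch. It is not the authors' code: the variable names and daily values below are hypothetical, and turning correlations into a dissimilarity of 1 - rho for clustering is a common convention that is assumed here rather than taken from the abstract.

```python
import math


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical daily values for three particulate variables
variables = {
    "number_conc": [5200, 6100, 4800, 7000, 6600],
    "black_carbon": [1.1, 1.6, 0.9, 2.0, 1.7],
    "pm10_mass": [22, 18, 25, 21, 19],
}
names = list(variables)
for a in names:
    row = [round(spearman(variables[a], variables[b]), 2) for b in names]
    print(a, row)
```

Variables whose rows look alike in such a matrix are candidates for the same group, from which one representative could be kept.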
Childhood cancer in small geographical areas and proximity to air-polluting industries.

PubMed

Ortega-García, Juan A; López-Hernández, Fernando A; Cárceles-Álvarez, Alberto; Fuster-Soler, José L; Sotomayor, Diana I; Ramis, Rebeca

2017-07-01

Pediatric cancer has been associated with exposure to certain environmental carcinogens. The purpose of this work is to analyse the relationship between environmental pollution and pediatric cancer risk. We analysed all incident cases of pediatric cancer (<15 years) diagnosed in a Spanish region during the period 1998-2015. The place of residence of each patient and the exact geographical coordinates of the main industrial facilities were codified in order to analyse the spatial distribution of cancer cases in relation to industrial areas. Focal tests and focused Scan methodology were used to identify high-incidence-rate spatial clusters around the main industrial pollution foci. The crude rate for the period was 148.0 cases per 1,000,000 children. The incidence of pediatric cancer increased significantly over the study period. With respect to spatial distribution, results showed significantly high incidence around some groups of industrial pollution foci, and the Scan methodology identified spatial clustering. Considering all foci together, we observed a higher overall incidence of non-Hodgkin lymphomas (NHL), and a high incidence of sympathetic nervous system tumours (SNST) around the energy and electric and the organic and inorganic chemical industry foci groups. In the focus-by-focus analysis, the focused Scan test identified several significant spatial clusters. In particular, three significant clusters were identified: the first, of SNST, was around energy-generating chemical industries (2 cases versus the expected 0.26); another, of NHL, was around residue-valorisation plants (5 cases versus the expected 0.91); and finally one cluster of Hodgkin lymphoma was around building materials (3 cases versus the expected 2.2). Conclusion: Results suggest a possible association between proximity to certain industries and pediatric cancer risk. More evidence is necessary before establishing the relationship between industrial pollution and pediatric cancer incidence. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
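To give a feel for the observed-versus-expected counts quoted in this abstract, the sketch below computes the observed/expected ratio and a plain Poisson upper-tail probability for each pair. This is a simplified illustration only: it is not the focused Scan test the authors used, it ignores the multiple-testing adjustment a scan statistic performs, and the assumption that case counts follow a Poisson distribution is introduced here.

```python
import math


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected)."""
    below = sum(
        math.exp(-expected) * expected ** k / math.factorial(k)
        for k in range(observed)
    )
    return 1.0 - below


# (observed cases, expected cases) as quoted in the abstract
clusters = {
    "SNST": (2, 0.26),
    "NHL": (5, 0.91),
    "Hodgkin lymphoma": (3, 2.2),
}
for name, (observed, expected) in clusters.items():
    ratio = observed / expected
    tail = poisson_upper_tail(observed, expected)
    print(f"{name}: O/E = {ratio:.2f}, Poisson tail = {tail:.4f}")
```

Because the scan methodology evaluates candidate clusters differently, these tail probabilities should not be read as the significance levels reported by the study.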
Social Support, Academic Adversity and Academic Buoyancy: A Person-Centred Analysis and Implications for Academic Outcomes

ERIC Educational Resources Information Center

Collie, Rebecca J.; Martin, Andrew J.; Bottrell, Dorothy; Armstrong, Derrick; Ungar, Michael; Liebenberg, Linda

2017-01-01

The present study employed person-centred analyses that enabled identification of groups of students separated on the basis of their perceptions of social support (home and community), academic support, academic adversity and academic buoyancy. Among a sample of 249 young people, including many from high-needs communities, cluster analysis…
Geographic variation of jack pine (Pinus banksiana Lamb.)

Treesearch

Jung Oh Hyun

1977-01-01

Ten traits were measured on 10-year-old jack pine grown at Cloquet, Minnesota, from seed collected from 90 provenances. The traits were examined by using analysis of variance and computing correlations for all combinations of 9 traits plus latitude, longitude, and elevation of the seed sources and cluster analyses using the D2 values from the...
Consequences of Not Accounting for One-Group Clustering in Meta-Analysis

ERIC Educational Resources Information Center

Citkowicz, Martyna; Polanin, Joshua R.

2014-01-01

Meta-analyses are syntheses of effect-size estimates obtained from a collection of studies to summarize a particular field or topic (Hedges, 1992; Lipsey & Wilson, 2001). These reviews are used to integrate knowledge that can inform both scientific inquiry and public policy; therefore, it is important to ensure that the estimates of the effect…
Psychopathic Traits in Youth: Is There Evidence for Primary and Secondary Subtypes?

ERIC Educational Resources Information Center

Lee, Zina; Salekin, Randall T.; Iselin, Anne-Marie R.

2010-01-01

The current study employed model-based cluster analysis in a sample of male adolescent offenders (n = 94) to examine subtypes based on psychopathic traits and anxiety. Using the Psychopathy Checklist: Youth Version (PCL:YV; Forth et al. 2003) and the self-report Antisocial Process Screening Device (APSD; Caputo et al. 1999), analyses identified…
Classification of microvascular patterns via cluster analysis reveals their prognostic significance in glioblastoma.

PubMed

Chen, Long; Lin, Zhi-Xiong; Lin, Guo-Shi; Zhou, Chang-Fu; Chen, Yu-Peng; Wang, Xing-Fu; Zheng, Zong-Qing

2015-01-01

Research on microvascular patterns (MVPs) in human glioblastoma and their prognostic impact is limited. We evaluated MVPs of 78 glioblastomas by CD34/periodic acid-Schiff dual staining and by cluster analysis of the percentage of microvascular area for distinct microvascular formations. The distribution of 5 types of basic microvascular formations, that is, microvascular sprouting (MS), vascular cluster (VC), vascular garland (VG), glomeruloid vascular proliferation (GVP), and vasculogenic mimicry (VM), was variable. Accordingly, cluster analysis classified MVPs into 2 types: type I MVP displayed prominent MSs and VCs, whereas type II MVP had numerous VGs, GVPs, and VMs. By analyzing the proportion of microvascular area for each type of formation, we determined that glioblastomas with few MSs and VCs had many GVPs and VMs, and vice versa. VG seemed to be a transitional type of formation. In type I MVP, expression of Ki-67 and p53 but not MGMT was significantly higher than in type II MVP (P < .05). Survival analysis showed that the type of MVP was an independent prognostic factor of progression-free survival (PFS) and overall survival (OS) (both P < .001). Type II MVP had a more negative influence on PFS and OS than did type I MVP. We conclude that the heterogeneous MVPs in glioblastoma can be categorized properly by certain histopathologic and statistical analyses and may influence clinical outcome. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.
Extending the Functionality of Behavioural Change-Point Analysis with k-Means Clustering: A Case Study with the Little Penguin (Eudyptula minor)

PubMed Central

Zhang, Jingjing; Dennis, Todd E.

2015-01-01

We present a simple framework for classifying mutually exclusive behavioural states within the geospatial lifelines of animals. This method involves use of three sequentially applied statistical procedures: (1) behavioural change point analysis to partition movement trajectories into discrete bouts of same-state behaviours, based on abrupt changes in the spatio-temporal autocorrelation structure of movement parameters; (2) hierarchical multivariate cluster analysis to determine the number of different behavioural states; and (3) k-means clustering to classify inferred bouts of same-state location observations into behavioural modes. We demonstrate application of the method by analysing synthetic trajectories of known ‘artificial behaviours’ comprised of different correlated random walks, as well as real foraging trajectories of little penguins (Eudyptula minor) obtained by global-positioning-system telemetry. Our results show that the modelling procedure correctly classified 92.5% of all individual location observations in the synthetic trajectories, demonstrating reasonable ability to successfully discriminate behavioural modes. Most individual little penguins were found to exhibit three unique behavioural states (resting, commuting/active searching, area-restricted foraging), with variation in the timing and locations of observations apparently related to ambient light, bathymetry, and proximity to coastlines and river mouths. Addition of k-means clustering extends the utility of behavioural change point analysis, by providing a simple means through which the behaviours inferred for the location observations comprising individual movement trajectories can be objectively classified. PMID:25922935
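Step (3) of the framework, k-means classification of bouts into behavioural modes, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the number of states is simply taken as three (in the abstract it comes from the hierarchical clustering step), the per-bout features (mean speed and mean turning angle), their values, and the starting centroids are all hypothetical.

```python
import math


def kmeans(points, centroids, iterations=20):
    """Lloyd's algorithm: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)), key=lambda c: math.dist(p, centroids[c])
            )
            groups[nearest].append(p)
        # An empty group keeps its previous centroid.
        centroids = [
            tuple(sum(axis) / len(group) for axis in zip(*group)) if group else old
            for group, old in zip(groups, centroids)
        ]
    labels = [
        min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        for p in points
    ]
    return labels, centroids


# Hypothetical bouts: (mean speed in m/s, mean turning angle in degrees)
bouts = [(0.1, 5), (0.2, 8), (3.0, 10), (3.2, 12), (1.0, 90), (1.2, 85)]
start = [bouts[0], bouts[2], bouts[4]]  # assumed initial centroids
labels, centres = kmeans(bouts, start)
print(labels)
```

With real telemetry the features would normally be scaled before clustering, since Euclidean distance is otherwise dominated by the variable with the largest numeric range.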
Extending the Functionality of Behavioural Change-Point Analysis with k-Means Clustering: A Case Study with the Little Penguin (Eudyptula minor).

PubMed

Zhang, Jingjing; O'Reilly, Kathleen M; Perry, George L W; Taylor, Graeme A; Dennis, Todd E

2015-01-01

We present a simple framework for classifying mutually exclusive behavioural states within the geospatial lifelines of animals. This method involves use of three sequentially applied statistical procedures: (1) behavioural change point analysis to partition movement trajectories into discrete bouts of same-state behaviours, based on abrupt changes in the spatio-temporal autocorrelation structure of movement parameters; (2) hierarchical multivariate cluster analysis to determine the number of different behavioural states; and (3) k-means clustering to classify inferred bouts of same-state location observations into behavioural modes. We demonstrate application of the method by analysing synthetic trajectories of known 'artificial behaviours' comprised of different correlated random walks, as well as real foraging trajectories of little penguins (Eudyptula minor) obtained by global-positioning-system telemetry. Our results show that the modelling procedure correctly classified 92.5% of all individual location observations in the synthetic trajectories, demonstrating reasonable ability to successfully discriminate behavioural modes. Most individual little penguins were found to exhibit three unique behavioural states (resting, commuting/active searching, area-restricted foraging), with variation in the timing and locations of observations apparently related to ambient light, bathymetry, and proximity to coastlines and river mouths. Addition of k-means clustering extends the utility of behavioural change point analysis, by providing a simple means through which the behaviours inferred for the location observations comprising individual movement trajectories can be objectively classified.

Identifying Likely Transmission Pathways within a 10-Year Community Outbreak of Tuberculosis by High-Depth Whole Genome Sequencing

PubMed Central

Sadsad, Rosemarie; Martinez, Elena; Jelfs, Peter; Hill-Cawthorne, Grant A.; Gilbert, Gwendolyn L.; Marais, Ben J.; Sintchenko, Vitali

2016-01-01

Background: Improved tuberculosis control and the need to contain the spread of drug-resistant strains provide a strong rationale for exploring tuberculosis transmission dynamics at the population level. Whole-genome sequencing provides optimal strain resolution, facilitating detailed mapping of potential transmission pathways. Methods: We sequenced 22 isolates from a Mycobacterium tuberculosis cluster in New South Wales, Australia, identified during routine 24-locus mycobacterial interspersed repetitive unit typing. Following high-depth paired-end sequencing using the Illumina HiSeq 2000 platform, two independent pipelines were employed for analysis, both employing read mapping onto reference genomes as well as de novo assembly, to control biases in variant detection. In addition to single-nucleotide polymorphisms, the analyses also sought to identify insertions, deletions and structural variants. Results: Isolates were highly similar, with a distance of 13 variants between the most distant members of the cluster. The most sensitive analysis classified the 22 isolates into 18 groups. Four of the isolates did not appear to share a recent common ancestor with the largest clade; another four isolates had an uncertain ancestral relationship with the largest clade. Conclusion: Whole genome sequencing, with analysis of single-nucleotide polymorphisms, insertions, deletions, structural variants and subpopulations, enabled the highest possible level of discrimination between cluster members, clarifying likely transmission pathways and exposing the complexity of strain origin. The analysis provides a basis for targeted public health intervention and enhanced classification of future isolates linked to the cluster. PMID:26938641
Clinical evaluation of a novel population-based regression analysis for detecting glaucomatous visual field progression.

PubMed

Kovalska, M P; Bürki, E; Schoetzau, A; Orguel, S F; Orguel, S; Grieshaber, M C

2011-04-01

The distinction of real progression from test variability in visual field (VF) series may be based on clinical judgment, on trend analysis based on follow-up of test parameters over time, or on identification of a significant change relative to the mean of baseline exams (event analysis). The aim of this study was to compare a new population-based method (Octopus field analysis, OFA) with classic regression analyses and clinical judgment for detecting glaucomatous VF changes. 240 VF series of 240 patients with at least 9 consecutive examinations available were included in this study. They were independently classified by two experienced investigators. The results of this classification served as the reference for comparison with the following statistical tests: (a) t-test global, (b) r-test global, (c) regression analysis of 10 VF clusters and (d) point-wise linear regression analysis. 32.5% of the VF series were classified as progressive by the investigators. The sensitivity and specificity were 89.7% and 92.0% for the r-test, and 73.1% and 93.8% for the t-test, respectively. In the point-wise linear regression analysis, the specificity was comparable (89.5% versus 92%), but the sensitivity was clearly lower than in the r-test (22.4% versus 89.7%) at a significance level of p = 0.01. A regression analysis for the 10 VF clusters showed a markedly higher sensitivity for the r-test (37.7%) than the t-test (14.1%) at a similar specificity (88.3% versus 93.8%) for a significant trend (p = 0.005). In regard to the cluster distribution, the paracentral clusters and the superior nasal hemifield progressed most frequently. The population-based regression analysis seems to be superior to the trend analysis in detecting VF progression in glaucoma, and may eliminate the drawbacks of the event analysis. Further, it may assist the clinician in the evaluation of VF series and may allow better visualization of the correlation between function and structure owing to VF clusters. © Georg Thieme Verlag KG Stuttgart · New York.
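The difference between point-wise and cluster regression mentioned in this abstract can be illustrated with a minimal sketch. It is not the OFA method or the authors' code: the sensitivity values, the yearly spacing, and the names `ols_slope`, `years` and `cluster` are hypothetical, and the significance test on the slope is omitted.

```python
def ols_slope(times, values):
    """Least-squares slope of values against times (e.g. dB per year)."""
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    sxx = sum((t - t_mean) ** 2 for t in times)
    return sxy / sxx

# Nine examinations at an assumed yearly spacing (hypothetical).
years = [0, 1, 2, 3, 4, 5, 6, 7, 8]

# Hypothetical sensitivities (dB) at three test locations forming one cluster.
cluster = {
    "loc_a": [30 - 0.5 * t for t in years],
    "loc_b": [28 - 1.0 * t for t in years],
    "loc_c": [29.0] * len(years),
}

# Point-wise analysis: one regression per location.
pointwise = {name: ols_slope(years, series) for name, series in cluster.items()}

# Cluster analysis: average the locations at each visit, then regress once.
cluster_mean = [sum(vals) / len(vals) for vals in zip(*cluster.values())]
cluster_slope = ols_slope(years, cluster_mean)
```

Because averaging is linear, the cluster slope equals the mean of the point-wise slopes; with real data the averaging also reduces measurement noise, which is the usual motivation for cluster-level trends.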
Cluster analysis of intradiurnal holm oak pollen cycles at peri-urban and rural sampling sites in southwestern Spain

NASA Astrophysics Data System (ADS)

Hernández-Ceballos, M. A.; García-Mozo, H.; Galán, C.

2015-08-01

The impact of regional and local weather and of local topography on intradiurnal variations in airborne pollen levels was assessed by analysing bi-hourly holm oak ( Quercus ilex subsp. ballota (Desf.) Samp.) pollen counts at two sampling stations located 40 km apart, in southwestern Spain (Cordoba city and El Cabril nature reserve) over the period 2010-2011. Pollen grains were captured using Hirst-type volumetric spore traps. Analysis of regional weather conditions was based on the computation of backward trajectories using the HYSPLIT model. Sampling days were selected on the basis of phenological data; rainy days were eliminated, as were days lying outside a given range of percentiles (P95-P5). Analysis of cycles for the study period, as a whole, revealed differences between sampling sites, with peak bi-hourly pollen counts at night in Cordoba and at midday in El Cabril. Differences were also noted in the influence of surface weather conditions (temperature, relative humidity and wind). Cluster analysis of diurnal holm oak pollen cycles revealed the existence of five clusters at each sampling site. Analysis of backward trajectories highlighted specific regional air-flow patterns associated with each site. Findings indicated the contribution of both nearby and distant pollen sources to diurnal cycles. The combined use of cluster analysis and meteorological analysis proved highly suitable for charting the impact of local weather conditions on airborne pollen-count patterns. This method, and the specific tools used here, could be used not only to study diurnal variations in counts for other pollen types and in other biogeographical settings, but also in a number of other research fields involving airborne particle transport modelling, e.g. radionuclide transport in emergency preparedness exercises.
Cluster analysis of intradiurnal holm oak pollen cycles at peri-urban and rural sampling sites in southwestern Spain.

PubMed

Hernández-Ceballos, M A; García-Mozo, H; Galán, C

2015-08-01

The impact of regional and local weather and of local topography on intradiurnal variations in airborne pollen levels was assessed by analysing bi-hourly holm oak (Quercus ilex subsp. ballota (Desf.) Samp.) pollen counts at two sampling stations located 40 km apart, in southwestern Spain (Cordoba city and El Cabril nature reserve) over the period 2010-2011. Pollen grains were captured using Hirst-type volumetric spore traps. Analysis of regional weather conditions was based on the computation of backward trajectories using the HYSPLIT model. Sampling days were selected on the basis of phenological data; rainy days were eliminated, as were days lying outside a given range of percentiles (P95-P5). Analysis of cycles for the study period, as a whole, revealed differences between sampling sites, with peak bi-hourly pollen counts at night in Cordoba and at midday in El Cabril. Differences were also noted in the influence of surface weather conditions (temperature, relative humidity and wind). Cluster analysis of diurnal holm oak pollen cycles revealed the existence of five clusters at each sampling site. Analysis of backward trajectories highlighted specific regional air-flow patterns associated with each site. Findings indicated the contribution of both nearby and distant pollen sources to diurnal cycles. The combined use of cluster analysis and meteorological analysis proved highly suitable for charting the impact of local weather conditions on airborne pollen-count patterns. This method, and the specific tools used here, could be used not only to study diurnal variations in counts for other pollen types and in other biogeographical settings, but also in a number of other research fields involving airborne particle transport modelling, e.g. radionuclide transport in emergency preparedness exercises.
Examining the effectiveness of discriminant function analysis and cluster analysis in species identification of male field crickets based on their calling songs.

PubMed

Jaiswara, Ranjana; Nandi, Diptarup; Balakrishnan, Rohini

2013-01-01

Traditional taxonomy based on morphology has often failed in accurate species identification owing to the occurrence of cryptic species, which are reproductively isolated but morphologically identical. Molecular data have thus been used to complement morphology in species identification. The sexual advertisement calls in several groups of acoustically communicating animals are species-specific and can thus complement molecular data as non-invasive tools for identification. Several statistical tools and automated identifier algorithms have been used to investigate the efficiency of acoustic signals in species identification. Despite a plethora of such methods, there is a general lack of knowledge regarding the appropriate usage of these methods in specific taxa. In this study, we investigated the performance of two commonly used statistical methods, discriminant function analysis (DFA) and cluster analysis, in identification and classification based on acoustic signals of field cricket species belonging to the subfamily Gryllinae. Using a comparative approach we evaluated the optimal number of species and calling song characteristics for both the methods that lead to most accurate classification and identification. The accuracy of classification using DFA was high and was not affected by the number of taxa used. However, a constraint in using discriminant function analysis is the need for a priori classification of songs. Accuracy of classification using cluster analysis, which does not require a priori knowledge, was maximum for 6-7 taxa and decreased significantly when more than ten taxa were analysed together. We also investigated the efficacy of two novel derived acoustic features in improving the accuracy of identification. Our results show that DFA is a reliable statistical tool for species identification using acoustic signals. Our results also show that cluster analysis of acoustic signals in crickets works effectively for species classification and identification.
Spectral characteristics and the extent of paleosols of the Palouse formation

NASA Technical Reports Server (NTRS)

Frazier, B. E.; Busacca, A.; Cheng, Y.; Wherry, D.; Hart, J.; Gill, S.

1986-01-01

Spectral relationships were investigated for several bare soil fields which were in summer fallow rotation on the date of the imagery. Printouts of each band were examined and compared to aerial photography. Bands with dissimilar reflectance patterns for known areas were then combined using ratio techniques which had proven useful in other studies (Williams, 1983). Selected ratios were Thematic Mapper (TM) 1/TM4, TM3/TM4, and TM5/TM4. Cluster analyses and Bayesian and Fastclass classifier images were produced using the three ratio images. Plots of cluster analysis outputs revealed distinct groupings of reflectance data representing green crops, ripened crops, soil and green plants, and bare soil. Bare soil was represented by a line of clusters on plots of the ratios TM5/TM4 and TM3/TM4. The soil line was investigated further to determine factors involved in the distribution of clusters along the line. The clusters representing the bare soil line were also studied by plotting the TM5/TM4, TM1/TM4 dimension. A total of 76 soil samples were gathered and analyzed for organic carbon.
Compositional variability in Mediterranean archaeofaunas from Upper Paleolithic Southwest Europe

NASA Astrophysics Data System (ADS)

Jones, Emily Lena

2018-03-01

Recent meta-analyses of Upper Paleolithic Southwestern European archaeofaunas (Jones, 2015, 2016) have identified a consistent "Mediterranean" cluster from the Last Glacial Maximum through the early Holocene, suggesting similarities in environment and/or consistency in hunting strategy across this region through time despite radical changes in climate. However, while these archaeofaunas from this cluster all derive from sites located within today's Mediterranean bioclimatic region, many of them are from locations far from the Mediterranean Sea - Atlantic Portugal, the Spanish Meseta - which today differ significantly from each other in biotic composition. In this paper, I explore clustering (through cluster analysis and non-metric multidimensional scaling) within the Mediterranean archaeofaunal group. I test for the influence of sample size as well as the geographic variables of site elevation, latitude, and longitude on variability in the large mammal portions of archaeofaunal assemblages. ANOVA shows no relationship between cluster-defined groups and site elevation or longitude; instead, site latitude appears to be a primary contributor to patterning. However, the overall compositional similarity of the Mediterranean archaeofaunas in this dataset suggests more consistency than variability in Upper Paleolithic hunting strategy in this region.
Race, deprivation, and immigrant isolation: The spatial demography of air-toxic clusters in the continental United States.

PubMed

Liévanos, Raoul S

2015-11-01

This article contributes to environmental inequality outcomes research on the spatial and demographic factors associated with cumulative air-toxic health risks at multiple geographic scales across the United States. It employs a rigorous spatial cluster analysis of census tract-level 2005 estimated lifetime cancer risk (LCR) of ambient air-toxic emissions from stationary (e.g., facility) and mobile (e.g., vehicular) sources to locate spatial clusters of air-toxic LCR risk in the continental United States. It then tests intersectional environmental inequality hypotheses on the predictors of tract presence in air-toxic LCR clusters with tract-level principal component factor measures of economic deprivation by race and immigrant status. Logistic regression analyses show that net of controls, isolated Latino immigrant-economic deprivation is the strongest positive demographic predictor of tract presence in air-toxic LCR clusters, followed by black-economic deprivation and isolated Asian/Pacific Islander immigrant-economic deprivation. Findings suggest scholarly and practical implications for future research, advocacy, and policy. Copyright © 2015 Elsevier Inc. All rights reserved.
MUSE crowded field 3D spectroscopy of over 12 000 stars in the globular cluster NGC 6397. I. The first comprehensive HRD of a globular cluster

NASA Astrophysics Data System (ADS)

Husser, Tim-Oliver; Kamann, Sebastian; Dreizler, Stefan; Wendt, Martin; Wulff, Nina; Bacon, Roland; Wisotzki, Lutz; Brinchmann, Jarle; Weilbacher, Peter M.; Roth, Martin M.; Monreal-Ibero, Ana

2016-04-01

Aims: We demonstrate the high multiplex advantage of crowded field 3D spectroscopy with the new integral field spectrograph MUSE by means of a spectroscopic analysis of more than 12 000 individual stars in the globular cluster NGC 6397. Methods: The stars are deblended with a point spread function fitting technique, using a photometric reference catalogue from HST as prior, including relative positions and brightnesses. This catalogue is also used for a first analysis of the extracted spectra, followed by an automatic in-depth analysis via a full-spectrum fitting method based on a large grid of PHOENIX spectra. Results: We analysed the largest sample so far available for a single globular cluster of 18 932 spectra from 12 307 stars in NGC 6397. We derived a mean radial velocity of vrad = 17.84 ± 0.07 km s⁻¹ and a mean metallicity of [Fe/H] = -2.120 ± 0.002, with the latter seemingly varying with temperature for stars on the red giant branch (RGB). We determine Teff and [Fe/H] from the spectra, and log g from HST photometry. This is the first very comprehensive Hertzsprung-Russell diagram (HRD) for a globular cluster based on the analysis of several thousands of stellar spectra, ranging from the main sequence to the tip of the RGB. Furthermore, two interesting objects were identified; one is a post-AGB star and the other is a possible millisecond-pulsar companion. Data products are available at http://muse-vlt.eu/science. Based on observations obtained at the Very Large Telescope (VLT) of the European Southern Observatory, Paranal, Chile (ESO Programme ID 60.A-9100(C)).
Dropping Out or Keeping Up? Early-Dropouts, Late-Dropouts, and Maintainers Differ in Their Automatic Evaluations of Exercise Already before a 14-Week Exercise Course.

PubMed

Antoniewicz, Franziska; Brand, Ralf

2016-01-01

The aim of this study was to examine how automatic evaluations of exercising (AEE) varied according to adherence to an exercise program. Eighty-eight participants (24.98 years ± 6.88; 51.1% female) completed a Brief-Implicit Association Task assessing their AEE, positive and negative associations to exercising at the beginning of a 3-month exercise program. Attendance data were collected for all participants and used in a cluster analysis of adherence patterns. Three different adherence patterns (52 maintainers, 16 early dropouts, 20 late dropouts; 40.91% overall dropouts) were detected using cluster analyses. Participants from these three clusters differed significantly with regard to their positive and negative associations to exercising before the first course meeting (ηp2 = 0.07). Discriminant function analyses revealed that positive associations to exercising was a particularly good discriminating factor. This is the first study to provide evidence of the differential impact of positive and negative associations on exercise behavior over the medium term. The findings contribute to theoretical understanding of evaluative processes from a dual-process perspective and may provide a basis for targeted interventions.
Dropping Out or Keeping Up? Early-Dropouts, Late-Dropouts, and Maintainers Differ in Their Automatic Evaluations of Exercise Already before a 14-Week Exercise Course

PubMed Central

Antoniewicz, Franziska; Brand, Ralf

2016-01-01

The aim of this study was to examine how automatic evaluations of exercising (AEE) varied according to adherence to an exercise program. Eighty-eight participants (24.98 years ± 6.88; 51.1% female) completed a Brief-Implicit Association Task assessing their AEE, positive and negative associations to exercising at the beginning of a 3-month exercise program. Attendance data were collected for all participants and used in a cluster analysis of adherence patterns. Three different adherence patterns (52 maintainers, 16 early dropouts, 20 late dropouts; 40.91% overall dropouts) were detected using cluster analyses. Participants from these three clusters differed significantly with regard to their positive and negative associations to exercising before the first course meeting (ηp2 = 0.07). Discriminant function analyses revealed that positive associations to exercising was a particularly good discriminating factor. This is the first study to provide evidence of the differential impact of positive and negative associations on exercise behavior over the medium term. The findings contribute to theoretical understanding of evaluative processes from a dual-process perspective and may provide a basis for targeted interventions. PMID:27313559
Algorithmic localisation of noise sources in the tip region of a low-speed axial flow fan

NASA Astrophysics Data System (ADS)

Tóth, Bence; Vad, János

2017-04-01

An objective and algorithmised methodology is proposed to analyse beamform data obtained for axial fans. Its application is demonstrated in a case study regarding the tip region of a low-speed cooling fan. First, beamforming is carried out in a co-rotating frame of reference. Then, a distribution of source strength is extracted along the circumference of the rotor at the blade tip radius in each analysed third-octave band. The circumferential distributions are expanded into Fourier series, which allows for filtering out the effects of perturbations, on the basis of an objective criterion. The remaining Fourier components are then considered as base sources to determine the blade-passage-periodic flow mechanisms responsible for the broadband noise. Based on their frequency and angular location, the base sources are grouped together. This is done using the fuzzy c-means clustering method to allow the overlap of the source mechanisms. The number of clusters is determined in a validity analysis. Finally, the obtained clusters are assigned to source mechanisms based on the literature. Thus, turbulent boundary layer - trailing edge interaction noise, tip leakage flow noise, and double leakage flow noise are identified.
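As an illustration of the fuzzy c-means step described in this abstract, the following is a minimal sketch of the standard algorithm, not the authors' implementation. The two-dimensional points, the initial centres, the fuzzifier `m = 2` and the function names are assumptions made for the example; plain Euclidean distance is used, so the wrap-around of a true angular coordinate is ignored.

```python
import math

def memberships(points, centers, m):
    """Membership u[k][i] of point k in cluster i; each row sums to 1."""
    exponent = 2.0 / (m - 1.0)
    rows = []
    for p in points:
        # Clamp distances so a point sitting exactly on a centre does not divide by zero.
        d = [max(math.dist(p, c), 1e-12) for c in centers]
        rows.append([1.0 / sum((d[i] / d[j]) ** exponent for j in range(len(d)))
                     for i in range(len(d))])
    return rows

def fuzzy_c_means(points, centers, m=2.0, iters=50):
    """Alternate membership and centre updates for a fixed number of iterations."""
    centers = [tuple(c) for c in centers]
    for _ in range(iters):
        u = memberships(points, centers, m)
        new_centers = []
        for i in range(len(centers)):
            w = [row[i] ** m for row in u]          # weights u^m
            total = sum(w)
            new_centers.append(tuple(
                sum(wk * p[axis] for wk, p in zip(w, points)) / total
                for axis in range(len(points[0]))))
        centers = new_centers
    return centers, memberships(points, centers, m)

# Hypothetical base sources: two tight groups plus one point halfway between them.
sources = [(0.0, 0.0), (1.0, 1.0), (10.0, 10.0), (11.0, 11.0), (5.5, 5.5)]
centres, u = fuzzy_c_means(sources, centers=[(0.0, 0.0), (11.0, 11.0)])
```

Unlike a hard clustering, the halfway point ends up with a membership of about 0.5 in each cluster, which is the kind of overlap between source mechanisms the abstract refers to.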
Geospatial Characterization of Fluvial Wood Arrangement in a Semi-confined Alluvial River

NASA Astrophysics Data System (ADS)

Martin, D. J.; Harden, C. P.; Pavlowsky, R. T.

2014-12-01

Large woody debris (LWD) has become universally recognized as an integral component of fluvial systems, and as a result, has become increasingly common as a river restoration tool. However, "natural" processes of wood recruitment and the subsequent arrangement of LWD within the river network are poorly understood. This research used a suite of spatial statistics to investigate longitudinal arrangement patterns of LWD in a low-gradient, Midwestern river. First, a large-scale GPS inventory of LWD, performed on the Big River in the eastern Missouri Ozarks, resulted in over 4,000 logged positions of LWD along seven river segments that covered nearly 100 km of the 237 km river system. A global Moran's I analysis indicates that LWD density is spatially autocorrelated and displays a clustering tendency within all seven river segments (P-value range = 0.000 to 0.054). A local Moran's I analysis identified specific locations along the segments where clustering occurs and revealed that, on average, clusters of LWD density (high or low) spanned 400 m. Spectral analyses revealed that, in some segments, LWD density is spatially periodic. Two segments displayed strong periodicity, while the remaining segments displayed varying degrees of noisiness. Periodicity showed a positive association with gravel bar spacing and meander wavelength, although there were insufficient data to statistically confirm the relationship. A wavelet analysis was then performed to investigate periodicity relative to location along the segment. The wavelet analysis identified significant (α = 0.05) periodicity at discrete locations along each of the segments. Those reaches yielding strong periodicity showed stronger relationships between LWD density and the geomorphic/riparian independent variables tested. Analyses consistently identified valley width and sinuosity as being associated with LWD density. The results of these analyses contribute a new perspective on the longitudinal distribution of LWD in a river system, which should help identify physical and/or riparian control mechanisms of LWD arrangement and support the development of models of LWD arrangement. Additionally, the spatial statistical tools presented here have proven valuable for identifying longitudinal patterns in river system components.
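For readers unfamiliar with the global Moran's I statistic used in this abstract, a minimal sketch follows. The abstract does not state the spatial weights used, so binary adjacency between consecutive reaches is assumed here; the density values and the names `morans_i` and `chain_weights` are hypothetical, and no significance test is included.

```python
def morans_i(values, weights):
    """Global Moran's I: (n / sum of w_ij) * sum_ij w_ij d_i d_j / sum_i d_i^2."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)

def chain_weights(n):
    """Binary weights for n consecutive river reaches: each reach neighbours i-1 and i+1."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]

# Hypothetical LWD densities per reach, listed in downstream order.
clustered = [9, 8, 9, 1, 0, 1, 8, 9]      # similar values sit next to each other
alternating = [9, 0, 9, 0, 9, 0, 9, 0]    # neighbours are as different as possible

w = chain_weights(8)
i_clustered = morans_i(clustered, w)
i_alternating = morans_i(alternating, w)
```

Positive values indicate that neighbouring reaches tend to have similar densities (clustering), while negative values indicate alternation; here the clustered series gives a positive I and the alternating series gives I = -1.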
Classification of Cowpox Viruses into Several Distinct Clades and Identification of a Novel Lineage

PubMed Central

Franke, Annika; Pfaff, Florian; Jenckel, Maria; Hoffmann, Bernd; Höper, Dirk; Antwerpen, Markus; Meyer, Hermann; Beer, Martin; Hoffmann, Donata

2017-01-01

Cowpox virus (CPXV) was considered a uniform species within the genus Orthopoxvirus (OPV). Previous phylogenetic analysis indicated that CPXV is polyphyletic and isolates may cluster into different clades with two of these clades showing genetic similarities to either variola (VARV) or vaccinia viruses (VACV). Further analyses were initiated to assess both the genetic diversity and the evolutionary background of circulating CPXVs. Here we report the full-length sequences of 20 CPXV strains isolated from different animal species and humans in Germany. A phylogenetic analysis of altogether 83 full-length OPV genomes confirmed the polyphyletic character of the species CPXV and suggested at least four different clades. The German isolates from this study mainly clustered into two CPXV-like clades, and VARV- and VACV-like strains were not observed. A single strain, isolated from a cotton-top tamarin, clustered distantly from all other CPXVs and might represent a novel and unique evolutionary lineage. The classification of CPXV strains into clades roughly followed their geographic origin, with the highest clade diversity so far observed for Germany. Furthermore, we found evidence for recombination between OPV clades without significant disruption of the observed clustering. In conclusion, this analysis markedly expands the number of available CPXV full-length sequences, confirms the co-circulation of several CPXV clades in Germany, and provides the first data about a new evolutionary CPXV lineage. PMID:28604604
Identification of a current hot spot of HIV type 1 transmission in Mongolia by molecular epidemiological analysis.

PubMed

Davaalkham, Jagdagsuren; Unenchimeg, Puntsag; Baigalmaa, Chultem; Erdenetuya, Gombo; Nyamkhuu, Dulmaa; Shiino, Teiichiro; Tsuchiya, Kiyoto; Hayashida, Tsunefusa; Gatanaga, Hiroyuki; Oka, Shinichi

2011-10-01

We investigated the current molecular epidemiological status of HIV-1 in Mongolia, a country with very low incidence of HIV-1 though with rapid expansion in recent years. HIV-1 pol (1065 nt) and env (447 nt) genes were sequenced to construct phylogenetic trees. The evolutionary rates, molecular clock phylogenies, and other evolutionary parameters were estimated from heterochronous genomic sequences of HIV-1 subtype B by the Bayesian Markov chain Monte Carlo method. We obtained 41 sera from 56 reported HIV-1-positive cases as of May 2009. The main route of infection was men who have sex with men (MSM). Dominant subtypes were subtype B in 32 cases (78%) followed by subtype CRF02_AG (9.8%). The phylogenetic analysis of the pol gene identified two clusters in subtype B sequences. Cluster 1 consisted of 21 cases including MSM and other routes of infection, and cluster 2 consisted of eight MSM cases. The tree analyses demonstrated very short branch lengths in cluster 1, suggesting a surprisingly active expansion of HIV-1 transmission during a short period with the same ancestor virus. Evolutionary analysis indicated that the outbreak started around the early 2000s. This study identified a current hot spot of HIV-1 transmission and potential seed of the epidemic in Mongolia. Comprehensive preventive measures targeting this group are urgently needed.
Inferring Phylogenetic Relationships of Indian Citron (Citrus medica L.) based on rbcL and matK Sequences of Chloroplast DNA.

PubMed

Uchoi, Ajit; Malik, Surendra Kumar; Choudhary, Ravish; Kumar, Susheel; Rohini, M R; Pal, Digvender; Ercisli, Sezai; Chaudhury, Rekha

2016-06-01

Phylogenetic relationships of Indian Citron (Citrus medica L.) with other important Citrus species have been inferred through sequence analyses of rbcL and matK gene region of chloroplast DNA. The study was based on 23 accessions of Citrus genotypes representing 15 taxa of Indian Citrus, collected from wild, semi-wild, and domesticated stocks. The phylogeny was inferred using the maximum parsimony (MP) and neighbor-joining (NJ) methods. Both MP and NJ trees separated all the 23 accessions of Citrus into five distinct clusters. The chloroplast DNA (cpDNA) analysis based on rbcL and matK sequence data carried out in Indian taxa of Citrus was useful in differentiating all the true species and species/varieties of probable hybrid origin in distinct clusters or groups. Sequence analysis based on rbcL and matK gene provided unambiguous identification and disposition of true species like C. maxima, C. medica, C. reticulata, and related hybrids/cultivars. The separation of C. maxima, C. medica, and C. reticulata in distinct clusters or sub-clusters supports their distinctiveness as the basic species of edible Citrus. However, the cpDNA sequence analysis of rbcL and matK gene could not find any clear cut differentiation between subgenera Citrus and Papeda as proposed in Swingle's system of classification.
Metabolic Analysis of Various Date Palm Fruit (Phoenix dactylifera L.) Cultivars from Saudi Arabia to Assess Their Nutritional Quality.

PubMed

Hamad, Ismail; AbdElgawad, Hamada; Al Jaouni, Soad; Zinta, Gaurav; Asard, Han; Hassan, Sherif; Hegab, Momtaz; Hagagy, Nashwa; Selim, Samy

2015-07-27

Date palm is an important crop, especially in the hot-arid regions of the world. Date palm fruits have high nutritional and therapeutic value and possess significant antibacterial and antifungal properties. In this study, we performed bioactivity analyses and metabolic profiling of date fruits of 12 cultivars from Saudi Arabia to assess their nutritional value. Our results showed that the date extracts from different cultivars have different free radical scavenging and anti-lipid peroxidation activities. Moreover, the cultivars showed significant differences in their chemical composition, e.g., the phenolic content (10.4-22.1 mg/100 g DW), amino acids (37-108 μmol·g⁻¹ FW) and minerals (237-969 mg/100 g DW). Principal component analysis (PCA) showed a clear separation of the cultivars into four different groups. The first group consisted of the Sokary and Nabtit Ali cultivars; the second of Khlas Al Kharj, Khla Al Qassim, Mabroom, and Khlas Al Ahsa; the third of Khals Elshiokh, Nabot Saif, and Khodry; and the fourth of Ajwa Al Madinah, Saffawy, and Rashodia. Hierarchical cluster analysis (HCA) revealed clustering of date cultivars into two groups. The first cluster consisted of the Sokary, Rashodia and Nabtit Ali cultivars, and the second cluster contained all the other tested cultivars. These results indicate that date fruits have high nutritive value, and different cultivars have different chemical composition.
Organic Food Market Segmentation in Lebanon

NASA Astrophysics Data System (ADS)

Tleis, Malak; Roma, Rocco; Callieris, Roberta

2015-04-01

Organic farming in Lebanon is not a new concept. It started with the efforts of the private sector more than a decade ago and is still present even with the limited agricultural production. The local market is quite developed in comparison to neighboring countries, depending mainly on imports. Few studies have addressed organic consumption in Lebanon, and none of them dealt with organic consumer analysis. Therefore, our objectives were to identify the profiles of Lebanese organic and non-organic consumers and to propose appropriate marketing strategies for each consumer segment, with the final aim of developing the Lebanese organic market. A survey based on a closed-ended questionnaire was administered to 400 consumers in the capital, Beirut, from the end of February until the end of March 2014. Data underwent descriptive analyses, principal component analyses (PCA) and cluster analyses (k-means method) using the statistical software SPSS. Four clusters were obtained based on psychographic characteristics and willingness to pay (WTP) for the principal organic products purchased. The "Localists" and "Health conscious" clusters constituted the largest proportion of the selected sample and were thus the most critical to address with specific marketing strategies emphasizing the combination of local and organic food and the healthy properties of organic products. The "Rational" and "Irregular" clusters were relatively small groups, addressed by pricing and promotional strategies. This study showed a positive attitude among Lebanese consumers towards organic food, with egoistic motives prevailing over altruistic motives. High prices of organic commodities and low trust in organic farming remain constraints on raising organic consumption. The combined efforts of the public and private sectors are required to spread knowledge about the positive environmental payback of organic agriculture and to promote locally produced organic goods.
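The k-means step named in this abstract can be sketched as follows. This is a plain Lloyd's-algorithm illustration, not the SPSS procedure the authors ran: the two-component respondent scores, the choice of two clusters, the initial centroids and the name `k_means` are all hypothetical.

```python
import math

def k_means(points, centroids, iters=100):
    """Lloyd's algorithm: assign points to the nearest centroid, then recompute centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iters):
        # Assignment step: index of the closest centroid for every point.
        labels = [min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
                  for p in points]
        # Update step: each centroid becomes the mean of its assigned points.
        updated = []
        for i, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(old)  # an empty cluster keeps its previous centroid
        if updated == centroids:
            break  # assignments can no longer change
        centroids = updated
    return centroids, labels

# Hypothetical respondent scores on two principal components.
scores = [(0.9, 0.8), (1.0, 0.9), (0.8, 1.0),
          (-0.9, -1.0), (-1.0, -0.8), (-0.8, -0.9)]
final_centroids, segment = k_means(scores, centroids=[scores[0], scores[3]])
```

In a segmentation study the resulting labels would then be profiled against the survey variables (for example WTP) to name and describe each segment.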
Calcisponges have a ParaHox gene and dynamic expression of dispersed NK homeobox genes.

PubMed

Fortunato, Sofia A V; Adamski, Marcin; Ramos, Olivia Mendivil; Leininger, Sven; Liu, Jing; Ferrier, David E K; Adamska, Maja

2014-10-30

Sponges are simple animals with few cell types, but their genomes paradoxically contain a wide variety of developmental transcription factors, including homeobox genes belonging to the Antennapedia (ANTP) class, which in bilaterians encompass Hox, ParaHox and NK genes. In the genome of the demosponge Amphimedon queenslandica, no Hox or ParaHox genes are present, but NK genes are linked in a tight cluster similar to the NK clusters of bilaterians. It has been proposed that Hox and ParaHox genes originated from NK cluster genes after divergence of sponges from the lineage leading to cnidarians and bilaterians. On the other hand, synteny analysis lends support to the notion that the absence of Hox and ParaHox genes in Amphimedon is a result of secondary loss (the ghost locus hypothesis). Here we analysed complete suites of ANTP-class homeoboxes in two calcareous sponges, Sycon ciliatum and Leucosolenia complicata. Our phylogenetic analyses demonstrate that these calcisponges possess orthologues of bilaterian NK genes (Hex, Hmx and Msx), a varying number of additional NK genes and one ParaHox gene, Cdx. Despite the generation of scaffolds spanning multiple genes, we find no evidence of clustering of Sycon NK genes. All Sycon ANTP-class genes are developmentally expressed, with patterns suggesting their involvement in cell type specification in embryos and adults, metamorphosis and body plan patterning. These results demonstrate that ParaHox genes predate the origin of sponges, thus confirming the ghost locus hypothesis, and highlight the need to analyse the genomes of multiple sponge lineages to obtain a complete picture of the ancestral composition of the first animal genome.
Typical patterns of modifiable health risk factors (MHRFs) in elderly women in Germany: results from the cross-sectional German Health Update (GEDA) study, 2009 and 2010.

PubMed

Jentsch, Franziska; Allen, Jennifer; Fuchs, Judith; von der Lippe, Elena

2017-04-04

Modifiable health risk factors (MHRFs) significantly affect morbidity and mortality rates and frequently occur in specific combinations or risk clusters. Using five MHRFs (smoking, high-risk alcohol consumption, physical inactivity, low intake of fruits and vegetables, and obesity) this study investigates the extent to which risk clusters are observed in a representative sample of women aged 65 and older in Germany. Additionally, the structural composition of the clusters is systematically compared with data and findings from other countries. A pooled data set of Germany's representative cross-sectional surveys GEDA09 and GEDA10 was used. The cohort comprised 4,617 women aged 65 and older. Specific risk clusters based on five MHRFs are identified, using hierarchical cluster analysis. The MHRFs were defined as current smoking (daily or occasionally), risk alcohol consumption (according to the Alcohol Use Disorders Identification Test, a sum score of 4 or more points), physical inactivity (less active than 5 days per week for at least 30 min and lack of sports-related activity in the last three months), low intake of fruits and vegetables (less than one serving of fruits and one of vegetables per day), and obesity (a body mass index equal to or greater than 30). A total of 4,292 cases with full information on these factors are included in the cluster analysis. Extended analyses were also performed to include the number of chronic diseases by age and socioeconomic status of group members. A total of seven risk clusters were identified. In a comparison with data from international studies, the seven risk clusters were found to be stable with a high degree of structural equivalency. Evidence of the stability of risk clusters across various study populations provides a useful starting point for long-term targeted health interventions. The structural clusters provide information through which various MHRFs can be evaluated simultaneously.

Societal burden of cluster headache in the United States: a descriptive economic analysis.

PubMed

Ford, Janet H; Nero, Damion; Kim, Gilwan; Chu, Bong Chul; Fowler, Robert; Ahl, Jonna; Martinez, James M

2018-01-01

To estimate direct and indirect costs in patients with a diagnosis of cluster headache in the US. Adult patients (18-64 years of age) enrolled in the Marketscan Commercial and Medicare Databases with ≥2 non-diagnostic outpatient (≥30 days apart between the two outpatient claims) or ≥1 inpatient diagnoses of cluster headache (ICD-9-CM code 339.00, 339.01, or 339.02) between January 1, 2009 and June 30, 2014, were included in the analyses. Patients had ≥6 months of continuous enrollment with medical and pharmacy coverage before and after the index date (first cluster headache diagnosis). Three outcomes were evaluated: (1) healthcare resource utilization, (2) direct healthcare costs, and (3) indirect costs associated with work days lost due to absenteeism and short-term disability. Direct costs included costs of all-cause and cluster headache-related outpatient, inpatient hospitalization, surgery, and pharmacy claims. Indirect costs were based on an average daily wage, which was estimated from the 2014 US Bureau of Labor Statistics and inflated to 2015 dollars. There were 9,328 patients with cluster headache claims included in the analysis. Cluster headache-related total direct costs (mean [standard deviation]) were $3,132 [$13,396] per patient per year (PPPY), accounting for 17.8% of the all-cause total direct cost. Cluster headache-related inpatient hospitalizations ($1,604) and pharmacy ($809) together ($2,413) contributed over 75% of the cluster headache-related direct healthcare cost. There were three sub-groups of patients with claims associated with indirect costs that included absenteeism, short-term disability, and absenteeism + short-term disability. Indirect costs PPPY were $4,928 [$4,860] for absenteeism, $803 [$2,621] for short-term disability, and $3,374 [$3,198] for absenteeism + disability. 
Patients with cluster headache have high healthcare costs that are associated with inpatient admissions and pharmacy fulfillments, and high indirect costs associated with absenteeism and short-term disability.
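The cost shares quoted in this abstract can be checked with a few lines of arithmetic. The dollar figures are those reported above; the implied all-cause total is derived here from the 17.8% share and is not a number reported in the abstract.

```python
# Figures reported in the abstract (US dollars per patient per year).
ch_total_direct = 3132   # cluster headache-related total direct cost
ch_inpatient = 1604      # inpatient hospitalizations
ch_pharmacy = 809        # pharmacy claims

combined = ch_inpatient + ch_pharmacy
share = combined / ch_total_direct

# Derived, not reported: all-cause direct cost implied by the 17.8% share.
implied_all_cause = ch_total_direct / 0.178

print(f"inpatient + pharmacy: ${combined:,} ({share:.1%} of cluster headache-related direct cost)")
print(f"implied all-cause direct cost: about ${implied_all_cause:,.0f} PPPY")
```

The combined share works out to roughly 77%, consistent with the "over 75%" stated in the abstract.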
Gene expression profiles of breast biopsies from healthy women identify a group with claudin-low features

PubMed Central

2011-01-01

Background Increased understanding of the variability in normal breast biology will enable us to identify mechanisms of breast cancer initiation and the origin of different subtypes, and to better predict breast cancer risk. Methods Gene expression patterns in breast biopsies from 79 healthy women referred to breast diagnostic centers in Norway were explored by unsupervised hierarchical clustering and supervised analyses, such as gene set enrichment analysis and gene ontology analysis and comparison with previously published gene lists and independent datasets. Results Unsupervised hierarchical clustering identified two separate clusters of normal breast tissue based on gene-expression profiling, regardless of clustering algorithm and gene filtering used. Comparison of the expression profile of the two clusters with several published gene lists describing breast cells revealed that the samples in cluster 1 share characteristics with stromal cells and stem cells, and to a certain degree with mesenchymal cells and myoepithelial cells. The samples in cluster 1 also share many features with the newly identified claudin-low breast cancer intrinsic subtype, which also shows characteristics of stromal and stem cells. More women belonging to cluster 1 have a family history of breast cancer and there is a slight overrepresentation of nulliparous women in cluster 1. Similar findings were seen in a separate dataset consisting of histologically normal tissue from both breasts harboring breast cancer and from mammoplasty reductions. Conclusion This is the first study to explore the variability of gene expression patterns in whole biopsies from normal breasts, and it identified distinct subtypes of normal breast tissue. 
Further studies are needed to determine the specific cell contribution to the variation in the biology of normal breasts, how the clusters identified relate to breast cancer risk and their possible link to the origin of the different molecular subtypes of breast cancer. PMID:22044755
Identifying and characterizing hepatitis C virus hotspots in Massachusetts: a spatial epidemiological approach.

PubMed

Stopka, Thomas J; Goulart, Michael A; Meyers, David J; Hutcheson, Marga; Barton, Kerri; Onofrey, Shauna; Church, Daniel; Donahue, Ashley; Chui, Kenneth K H

2017-04-20

Hepatitis C virus (HCV) infections have increased during the past decade but little is known about geographic clustering patterns. We used a unique analytical approach, combining geographic information systems (GIS), spatial epidemiology, and statistical modeling to identify and characterize HCV hotspots, statistically significant clusters of census tracts with elevated HCV counts and rates. We compiled sociodemographic and HCV surveillance data (n = 99,780 cases) for Massachusetts census tracts (n = 1464) from 2002 to 2013. We used a five-step spatial epidemiological approach, calculating incremental spatial autocorrelations and Getis-Ord Gi* statistics to identify clusters. We conducted logistic regression analyses to determine factors associated with the HCV hotspots. We identified nine HCV clusters, with the largest in Boston, New Bedford/Fall River, Worcester, and Springfield (p < 0.05). In multivariable analyses, we found that HCV hotspots were independently and positively associated with the percent of the population that was Hispanic (adjusted odds ratio [AOR]: 1.07; 95% confidence interval [CI]: 1.04, 1.09) and the percent of households receiving food stamps (AOR: 1.83; 95% CI: 1.22, 2.74). HCV hotspots were independently and negatively associated with the percent of the population that were high school graduates or higher (AOR: 0.91; 95% CI: 0.89, 0.93) and the percent of the population in the "other" race/ethnicity category (AOR: 0.88; 95% CI: 0.85, 0.91). We identified locations where HCV clusters were a concern, and where enhanced HCV prevention, treatment, and care can help combat the HCV epidemic in Massachusetts. GIS, spatial epidemiological and statistical analyses provided a rigorous approach to identify hotspot clusters of disease, which can inform public health policy and intervention targeting. 
Further studies that incorporate spatiotemporal cluster analyses, Bayesian spatial and geostatistical models, spatially weighted regression analyses, and assessment of associations between HCV clustering and the built environment are needed to expand upon our combined spatial epidemiological and statistical methods.
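The Getis-Ord Gi* statistic used for the hotspot step can be sketched as follows. This is a minimal illustration of the standard Gi* z-score only, not the authors' five-step workflow; the tract rates and the binary adjacency weights are hypothetical.

```python
import math

def getis_ord_gi_star(values, weights):
    """Return the Gi* z-score for every location.

    values  -- list of n observations (e.g. case rates per tract)
    weights -- n x n spatial weights; weights[i][i] is non-zero because
               Gi* includes the focal location in its own neighbourhood.
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    scores = []
    for i in range(n):
        w = weights[i]
        sum_w = sum(w)
        sum_w2 = sum(x * x for x in w)
        numerator = sum(wj * xj for wj, xj in zip(w, values)) - mean * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores

# Hypothetical rates for six tracts laid out along a line.
rates = [2.0, 3.0, 2.5, 9.0, 10.0, 9.5]
n = len(rates)
# Assumed binary weights: each tract neighbours itself and the adjacent tracts.
weights = [[1.0 if abs(i - j) <= 1 else 0.0 for j in range(n)] for i in range(n)]

z_scores = getis_ord_gi_star(rates, weights)
for tract, z in enumerate(z_scores):
    print(f"tract {tract}: Gi* z = {z:+.2f}")
```

Large positive z-scores mark candidate hotspots and large negative ones mark cold spots; in practice the weights would come from tract geometry and the resulting p-values would usually be corrected for multiple testing.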
Social phobia subtypes in the general population revealed by cluster analysis.

PubMed

Furmark, T; Tillfors, M; Stattin, H; Ekselius, L; Fredrikson, M

2000-11-01

Epidemiological data on subtypes of social phobia are scarce and their defining features are debated. Hence, the present study explored the prevalence and descriptive characteristics of empirically derived social phobia subgroups in the general population. To reveal subtypes, data on social distress, functional impairment, number of social fears and criteria fulfilled for avoidant personality disorder were extracted from a previously published epidemiological study of 188 social phobics and entered into a hierarchical cluster analysis. Criterion validity was evaluated by comparing clusters on the Social Phobia Scale (SPS) and the Social Interaction Anxiety Scale (SIAS). Finally, profile analyses were performed in which clusters were compared on a set of sociodemographic and descriptive characteristics. Three clusters emerged, consisting of phobics scoring either high (generalized subtype), intermediate (non-generalized subtype) or low (discrete subtype) on all variables. Point prevalence rates were 2.0%, 5.9% and 7.7% respectively. All subtypes were distinguished on both SPS and SIAS. Generalized or severe social phobia tended to be over-represented among individuals with low levels of educational attainment and social support. Overall, public speaking was the most common fear. Although categorical distinctions may be used, the present data suggest that social phobia subtypes in the general population mainly differ dimensionally along a mild-moderate-severe continuum, and that the number of cases declines with increasing severity.
Representation of Tinnitus in the US Newspaper Media and in Facebook Pages: Cross-Sectional Analysis of Secondary Data.

PubMed

Manchaiah, Vinaya; Ratinaud, Pierre; Andersson, Gerhard

2018-05-08

When people with health conditions begin to manage their health issues, one important issue that emerges is the question as to what exactly do they do with the information that they have obtained through various sources (eg, news media, social media, health professionals, friends, and family). The information they gather helps form their opinions and, to some degree, influences their attitudes toward managing their condition. This study aimed to understand how tinnitus is represented in the US newspaper media and in Facebook pages (ie, social media) using text pattern analysis. This was a cross-sectional study based upon secondary analyses of publicly available data. The 2 datasets (ie, text corpuses) analyzed in this study were generated from US newspaper media during 1980-2017 (downloaded from the database US Major Dailies by ProQuest) and Facebook pages during 2010-2016. The text corpuses were analyzed using the Iramuteq software using cluster analysis and chi-square tests. The newspaper dataset had 432 articles. The cluster analysis resulted in 5 clusters, which were named as follows: (1) brain stimulation (26.2%), (2) symptoms (13.5%), (3) coping (19.8%), (4) social support (24.2%), and (5) treatment innovation (16.4%). A time series analysis of clusters indicated a change in the pattern of information presented in newspaper media during 1980-2017 (eg, more emphasis on cluster 5, focusing on treatment inventions). The Facebook dataset had 1569 texts. The cluster analysis resulted in 7 clusters, which were named as: (1) diagnosis (21.9%), (2) cause (4.1%), (3) research and development (13.6%), (4) social support (18.8%), (5) challenges (11.1%), (6) symptoms (21.4%), and (7) coping (9.2%). A time series analysis of clusters indicated no change in information presented in Facebook pages on tinnitus during 2011-2016. 
The study highlights the specific aspects about tinnitus that the US newspaper media and Facebook pages focus on, as well as how these aspects change over time. These findings can help health care providers better understand the presuppositions that tinnitus patients may have. More importantly, the findings can help public health experts and health communication experts in tailoring health information about tinnitus to promote self-management, as well as assisting in appropriate choices of treatment for those living with tinnitus. ©Vinaya Manchaiah, Pierre Ratinaud, Gerhard Andersson. Originally published in the Interactive Journal of Medical Research (http://www.i-jmr.org/), 08.05.2018.
Bacterial community comparisons by taxonomy-supervised analysis independent of sequence alignment and clustering

PubMed Central

Sul, Woo Jun; Cole, James R.; Jesus, Ederson da C.; Wang, Qiong; Farris, Ryan J.; Fish, Jordan A.; Tiedje, James M.

2011-01-01

High-throughput sequencing of 16S rRNA genes has increased our understanding of microbial community structure, but now even higher-throughput methods to the Illumina scale allow the creation of much larger datasets with more samples and orders-of-magnitude more sequences that swamp current analytic methods. We developed a method capable of handling these larger datasets on the basis of assignment of sequences into an existing taxonomy using a supervised learning approach (taxonomy-supervised analysis). We compared this method with a commonly used clustering approach based on sequence similarity (taxonomy-unsupervised analysis). We sampled 211 different bacterial communities from various habitats and obtained ∼1.3 million 16S rRNA sequences spanning the V4 hypervariable region by pyrosequencing. Both methodologies gave similar ecological conclusions in that β-diversity measures calculated by using these two types of matrices were significantly correlated to each other, as were the ordination configurations and hierarchical clustering dendrograms. In addition, our taxonomy-supervised analyses were also highly correlated with phylogenetic methods, such as UniFrac. The taxonomy-supervised analysis has the advantages that it is not limited by the exhaustive computation required for the alignment and clustering necessary for the taxonomy-unsupervised analysis, is more tolerant of sequencing errors, and allows comparisons when sequences are from different regions of the 16S rRNA gene. With the tremendous expansion in 16S rRNA data acquisition underway, the taxonomy-supervised approach offers the potential to provide more rapid and extensive community comparisons across habitats and samples. PMID:21873204
Comparison of a non-stationary voxelation-corrected cluster-size test with TFCE for group-Level MRI inference.

PubMed

Li, Huanjie; Nickerson, Lisa D; Nichols, Thomas E; Gao, Jia-Hong

2017-03-01

Two powerful methods for statistical inference on MRI brain images have been proposed recently, a non-stationary voxelation-corrected cluster-size test (CST) based on random field theory and threshold-free cluster enhancement (TFCE) based on calculating the level of local support for a cluster, then using permutation testing for inference. Unlike other statistical approaches, these two methods do not rest on the assumptions of a uniform and high degree of spatial smoothness of the statistic image. Thus, they are strongly recommended for group-level fMRI analysis compared to other statistical methods. In this work, the non-stationary voxelation-corrected CST and TFCE methods for group-level analysis were evaluated for both stationary and non-stationary images under varying smoothness levels, degrees of freedom and signal-to-noise ratios. Our results suggest that both methods provide adequate control for the number of voxel-wise statistical tests being performed during inference on fMRI data and they are both superior to current CSTs implemented in popular MRI data analysis software packages. However, TFCE is more sensitive and stable for group-level analysis of VBM data. Thus, the voxelation-corrected CST approach may confer some advantages by being computationally less demanding for fMRI data analysis than TFCE with permutation testing and by also being applicable for single-subject fMRI analyses, while the TFCE approach is advantageous for VBM data. Hum Brain Mapp 38:1269-1280, 2017. © 2016 Wiley Periodicals, Inc.
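The "level of local support" that TFCE computes can be illustrated in one dimension. The sketch below is a toy version under stated assumptions: it works on a list rather than a 3-D image, it uses the commonly cited exponents E = 0.5 and H = 2 (not given in the abstract), and it omits the permutation test that would be needed for actual inference.

```python
def tfce_1d(stat, dh=0.1, E=0.5, H=2.0):
    """Threshold-free cluster enhancement for a 1-D list of statistics.

    For each threshold h (in steps of dh), every contiguous run of values
    at or above h contributes extent**E * h**H * dh to each point in the run.
    """
    out = [0.0] * len(stat)
    steps = int(max(stat) / dh) if stat else 0
    for k in range(1, steps + 1):
        h = k * dh
        i = 0
        while i < len(stat):
            if stat[i] >= h:
                j = i
                while j < len(stat) and stat[j] >= h:
                    j += 1
                extent = j - i  # size of the supra-threshold run
                for p in range(i, j):
                    out[p] += (extent ** E) * (h ** H) * dh
                i = j
            else:
                i += 1
    return out

# Hypothetical statistic profile: a tall peak with shoulders and an isolated spike.
stat = [0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 2.0, 0.0]
enhanced = tfce_1d(stat)
print([round(v, 2) for v in enhanced])
```

Points below every threshold stay at zero, while a peak is boosted both by its height and by the extent of the run that supports it, which is why no single cluster-forming threshold has to be chosen.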
Streptomyces scabies 87-22 contains a coronafacic acid-like biosynthetic cluster that contributes to plant-microbe interactions.

PubMed

Bignell, Dawn R D; Seipke, Ryan F; Huguet-Tapia, José C; Chambers, Alan H; Parry, Ronald J; Loria, Rosemary

2010-02-01

Plant-pathogenic Streptomyces spp. cause scab disease on economically important root and tuber crops, the most important of which is potato. Key virulence determinants produced by these species include the cellulose synthesis inhibitor, thaxtomin A, and the secreted Nec1 protein that is required for colonization of the plant host. Recently, the genome sequence of Streptomyces scabies 87-22 was completed, and a biosynthetic cluster was identified that is predicted to synthesize a novel compound similar to coronafacic acid (CFA), a component of the virulence-associated coronatine phytotoxin produced by the plant-pathogenic bacterium Pseudomonas syringae. Southern analysis indicated that the cfa-like cluster in S. scabies 87-22 is likely conserved in other strains of S. scabies but is absent from two other pathogenic streptomycetes, S. turgidiscabies and S. acidiscabies. Transcriptional analyses demonstrated that the cluster is expressed during plant-microbe interactions and that expression requires a transcriptional regulator embedded in the cluster as well as the bldA tRNA. A knockout strain of the biosynthetic cluster displayed a reduced virulence phenotype on tobacco seedlings compared with the wild-type strain. Thus, the cfa-like biosynthetic cluster is a newly discovered locus in S. scabies that contributes to host-pathogen interactions.
Review of methods for handling confounding by cluster and informative cluster size in clustered data

PubMed Central

Seaman, Shaun; Pavlou, Menelaos; Copas, Andrew

2014-01-01

Clustered data are common in medical research. Typically, one is interested in a regression model for the association between an outcome and covariates. Two complications that can arise when analysing clustered data are informative cluster size (ICS) and confounding by cluster (CBC). ICS and CBC mean that the outcome of a member given its covariates is associated with, respectively, the number of members in the cluster and the covariate values of other members in the cluster. Standard generalised linear mixed models for cluster-specific inference and standard generalised estimating equations for population-average inference assume, in general, the absence of ICS and CBC. Modifications of these approaches have been proposed to account for CBC or ICS. This article is a review of these methods. We express their assumptions in a common format, thus providing greater clarity about the assumptions that methods proposed for handling CBC make about ICS and vice versa, and about when different methods can be used in practice. We report relative efficiencies of methods where available, describe how methods are related, identify a previously unreported equivalence between two key methods, and propose some simple additional methods. Unnecessarily using a method that allows for ICS/CBC has an efficiency cost when ICS and CBC are absent. We review tools for identifying ICS/CBC. A strategy for analysis when CBC and ICS are suspected is demonstrated by examining the association between socio-economic deprivation and preterm neonatal death in Scotland. PMID:25087978
Development of methodology for identification the nature of the polyphenolic extracts by FTIR associated with multivariate analysis

NASA Astrophysics Data System (ADS)

Grasel, Fábio dos Santos; Ferrão, Marco Flôres; Wolf, Carlos Rodolfo

2016-01-01

Tannins are polyphenolic compounds of complex structures formed by secondary metabolism in several plants. These polyphenolic compounds have different applications, such as drugs, anti-corrosion agents, flocculants, and tanning agents. This study analyses six different types of polyphenolic extracts by Fourier transform infrared spectroscopy (FTIR) combined with multivariate analysis. Through both principal component analysis (PCA) and hierarchical cluster analysis (HCA), we observed well-defined separation between condensed (quebracho and black wattle) and hydrolysable (valonea, chestnut, myrobalan, and tara) tannins. For hydrolysable tannins, it was also possible to observe the formation of two different subgroups between samples of chestnut and valonea and between samples of tara and myrobalan. Among all samples analysed, the chestnut and valonea showed the greatest similarity, indicating that these extracts contain equivalent chemical compositions and structure and, therefore, similar properties.
A multilevel analysis of gatekeeper characteristics and consistent condom use among establishment-based female sex workers in Guangxi, China.

PubMed

Li, Qing; Li, Xiaoming; Stanton, Bonita; Fang, Xiaoyi; Zhao, Ran

2010-11-01

Multilevel analytical techniques are being applied in condom use research to ensure the validity of investigation on environmental/structural influences and clustered data from venue-based sampling. The literature contains reports of consistent associations between perceived gatekeeper support and condom use among entertainment establishment-based female sex workers (FSWs) in Guangxi, China. However, the clustering inherent in the data (FSWs being clustered within establishment) has not been accounted for in most of the analyses. We used multilevel analyses to examine perceived features of gatekeepers and individual correlates of consistent condom use among FSWs and to validate the findings in the existing literature. We analyzed cross-sectional data from 318 FSWs from 29 entertainment establishments in Guangxi, China in 2004, with a minimum of 5 FSWs per establishment. The Hierarchical Linear Models program with Laplace estimation was used to estimate the parameters in models containing random effects and binary outcomes. About 11.6% of women reported consistent condom use with clients. The intraclass correlation coefficient indicated 18.5% of the variance in condom use could be attributed to similarity between FSWs within the same establishments. Women's perceived gatekeeper support and education remained positively associated with condom use (P < 0.05), after controlling for other individual characteristics and clustering. After adjusting for data clustering, perceived gatekeeper support remains associated with consistent condom use with clients among FSWs in China. The results imply that combined interventions targeting both gatekeepers and individual FSWs may effectively promote consistent condom use.
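The reported intraclass correlation coefficient can be related to an establishment-level variance under one common convention for multilevel logistic models. The abstract does not say how the ICC was computed, so the latent-variable formula below (residual variance fixed at pi squared over 3) is an assumption, and the implied variance is a derived illustration rather than a reported result.

```python
import math

# Variance of the standard logistic distribution, used as the level-1
# residual variance in the latent-variable ICC for binary outcomes.
LOGISTIC_RESIDUAL_VARIANCE = math.pi ** 2 / 3  # about 3.29

def icc_from_variance(between_variance):
    """ICC = between-cluster variance / (between-cluster + residual variance)."""
    return between_variance / (between_variance + LOGISTIC_RESIDUAL_VARIANCE)

def variance_from_icc(icc):
    """Invert the formula above to recover the between-cluster variance."""
    return icc * LOGISTIC_RESIDUAL_VARIANCE / (1.0 - icc)

# 18.5% is the ICC reported in the abstract.
implied_variance = variance_from_icc(0.185)
print(f"implied establishment-level variance: {implied_variance:.3f}")
print(f"round trip ICC: {icc_from_variance(implied_variance):.3f}")
```

Under this assumption an ICC of 18.5% corresponds to a random-intercept variance of roughly 0.75 on the logit scale, which is large enough that ignoring establishment-level clustering would understate standard errors.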
A Multilevel Analysis of Gatekeeper Characteristics and Consistent Condom Use Among Establishment-Based Female Sex Workers in Guangxi, China

PubMed Central

Li, Qing; Li, Xiaoming; Stanton, Bonita; Fang, Xiaoyi; Zhao, Ran

2010-01-01

Background Multilevel analytical techniques are being applied in condom use research to ensure the validity of investigation on environmental/structural influences and clustered data from venue-based sampling. The literature contains reports of consistent associations between perceived gatekeeper support and condom use among entertainment establishment-based female sex workers (FSWs) in Guangxi, China. However, the clustering inherent in the data (FSWs being clustered within establishment) has not been accounted for in most of the analyses. We used multilevel analyses to examine perceived features of gatekeepers and individual correlates of consistent condom use among FSWs and to validate the findings in the existing literature. Methods We analyzed cross-sectional data from 318 FSWs from 29 entertainment establishments in Guangxi, China in 2004, with a minimum of 5 FSWs per establishment. The Hierarchical Linear Models program with Laplace estimation was used to estimate the parameters in models containing random effects and binary outcomes. Results About 11.6% of women reported consistent condom use with clients. The intraclass correlation coefficient indicated 18.5% of the variance in condom use could be attributed to similarity between FSWs within the same establishments. Women’s perceived gatekeeper support and education remained positively associated with condom use (P < 0.05), after controlling for other individual characteristics and clustering. Conclusions After adjusting for data clustering, perceived gatekeeper support remains associated with consistent condom use with clients among FSWs in China. The results imply that combined interventions targeting both gatekeepers and individual FSWs may effectively promote consistent condom use. PMID:20539262
Evaluation of the Social Motivation Hypothesis of Autism: A Systematic Review and Meta-analysis.

PubMed

Clements, Caitlin C; Zoltowski, Alisa R; Yankowitz, Lisa D; Yerys, Benjamin E; Schultz, Robert T; Herrington, John D

2018-06-13

The social motivation hypothesis posits that individuals with autism spectrum disorder (ASD) find social stimuli less rewarding than do people with neurotypical activity. However, functional magnetic resonance imaging (fMRI) studies of reward processing have yielded mixed results. This review examined whether individuals with ASD process rewarding stimuli differently than typically developing individuals (controls), whether differences are limited to social rewards, and whether contradictory findings in the literature might be due to sample characteristics. Articles were identified in PubMed, Embase, and PsycINFO from database inception until June 1, 2017. Functional MRI data from these articles were provided by most authors. Publications were included that provided brain activation contrasts between a sample with ASD and controls on a reward task, determined by multiple reviewer consensus. When fMRI data were not provided by authors, multiple reviewers extracted peak coordinates and effect sizes from articles to recreate statistical maps using seed-based d mapping software. Random-effects meta-analyses of responses to social, nonsocial, and restricted interest stimuli, as well as all of these domains together, were performed. Secondary analyses included meta-analyses of wanting and liking, meta-regression with age, and correlations with ASD severity. All procedures were conducted in accordance with Meta-analysis of Observational Studies in Epidemiology guidelines. The main outcome was the difference in brain activation between groups with ASD and typically developing controls while processing rewards. All analyses except the domain-general meta-analysis were planned before data collection. The meta-analysis included 13 studies (30 total fMRI contrasts) from 259 individuals with ASD and 246 controls. 
Autism spectrum disorder was associated with aberrant processing of both social and nonsocial rewards in striatal regions and increased activation in response to restricted interests (social reward, caudate cluster: d = -0.25 [95% CI, -0.41 to -0.08]; nonsocial reward, caudate and anterior cingulate cluster: d = -0.22 [95% CI, -0.42 to -0.02]; restricted interests, caudate and nucleus accumbens cluster: d = 0.42 [95% CI, 0.07 to 0.78]). Individuals with ASD show atypical processing of social and nonsocial rewards. Findings support a broader interpretation of the social motivation hypothesis of ASD whereby general atypical reward processing encompasses social reward, nonsocial reward, and perhaps restricted interests. This meta-analysis also suggests that prior mixed results could be driven by sample age differences, warranting further study of the developmental trajectory for reward processing in ASD.
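The effect sizes and confidence intervals quoted above can be converted into approximate standard errors and z statistics. This is a back-of-the-envelope check that assumes symmetric normal-theory 95% intervals; the reported intervals are rounded, so the results are approximate and are not values stated in the abstract.

```python
Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution

# (d, lower bound, upper bound) as reported in the abstract.
reported = {
    "social reward (caudate)": (-0.25, -0.41, -0.08),
    "nonsocial reward (caudate, anterior cingulate)": (-0.22, -0.42, -0.02),
    "restricted interests (caudate, nucleus accumbens)": (0.42, 0.07, 0.78),
}

def se_and_z(d, lower, upper):
    """Approximate standard error from the CI width, then z = d / SE."""
    se = (upper - lower) / (2 * Z_95)
    return se, d / se

results = {name: se_and_z(*values) for name, values in reported.items()}
for name, (se, z) in results.items():
    print(f"{name}: SE ~ {se:.3f}, z ~ {z:+.2f}")
```

All three z statistics exceed 1.96 in absolute value, in line with confidence intervals that exclude zero.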
Co-citation Network Analysis of Religious Texts

NASA Astrophysics Data System (ADS)

Murai, Hajime; Tokosumi, Akifumi

This paper introduces a method of representing in a network the thoughts of individual authors of dogmatic texts numerically and objectively by means of co-citation analysis and a method of distinguishing between the thoughts of various authors by clustering and analysis of clustered elements, generated by the clustering process. Using these methods, this paper creates and analyzes the co-citation networks for five authoritative Christian theologians through history (Augustine, Thomas Aquinas, Jean Calvin, Karl Barth, John Paul II). These analyses were able to extract the core element of Christian thought (Jn 1:14, Ph 2:6, Ph 2:7, Ph 2:8, Ga 4:4), as well as distinctions between the individual theologians in terms of their sect (Catholic or Protestant) and era (thinking about the importance of God's creation and the necessity of spreading the Gospel). By supplementing conventional literary methods in areas such as philosophy and theology with these numerical and objective methods, it should be possible to compare the characteristics of various doctrines. The ability to numerically and objectively represent the characteristics of various thoughts opens up the possibility of utilizing new information technology, such as web ontology and artificial intelligence, to process information about ideological thoughts in the future.
Biomarker clusters are differentially associated with longitudinal cognitive decline in late midlife

PubMed Central

Racine, Annie M.; Koscik, Rebecca L.; Berman, Sara E.; Nicholas, Christopher R.; Clark, Lindsay R.; Okonkwo, Ozioma C.; Rowley, Howard A.; Asthana, Sanjay; Bendlin, Barbara B.; Blennow, Kaj; Zetterberg, Henrik; Gleason, Carey E.; Carlsson, Cynthia M.

2016-01-01

The ability to detect preclinical Alzheimer’s disease is of great importance, as this stage of the Alzheimer’s continuum is believed to provide a key window for intervention and prevention. As Alzheimer’s disease is characterized by multiple pathological changes, a biomarker panel reflecting co-occurring pathology will likely be most useful for early detection. Towards this end, 175 late middle-aged participants (mean age 55.9 ± 5.7 years at first cognitive assessment, 70% female) were recruited from two longitudinally followed cohorts to undergo magnetic resonance imaging and lumbar puncture. Cluster analysis was used to group individuals based on biomarkers of amyloid pathology (cerebrospinal fluid amyloid-β42/amyloid-β40 assay levels), magnetic resonance imaging-derived measures of neurodegeneration/atrophy (cerebrospinal fluid-to-brain volume ratio, and hippocampal volume), neurofibrillary tangles (cerebrospinal fluid phosphorylated tau181 assay levels), and a brain-based marker of vascular risk (total white matter hyperintensity lesion volume). Four biomarker clusters emerged consistent with preclinical features of (i) Alzheimer’s disease; (ii) mixed Alzheimer’s disease and vascular aetiology; (iii) suspected non-Alzheimer’s disease aetiology; and (iv) healthy ageing. Cognitive decline was then analysed between clusters using longitudinal assessments of episodic memory, semantic memory, executive function, and global cognitive function with linear mixed effects modelling. Cluster 1 exhibited a higher intercept and greater rates of decline on tests of episodic memory. Cluster 2 had a lower intercept on a test of semantic memory and both Cluster 2 and Cluster 3 had steeper rates of decline on a test of global cognition. Additional analyses on Cluster 3, which had the smallest hippocampal volume, suggest that its biomarker profile is more likely due to hippocampal vulnerability and not to detectable specific volume loss exceeding the rate of normal ageing. 
Our results demonstrate that pathology, as indicated by biomarkers, in a preclinical timeframe is related to patterns of longitudinal cognitive decline. Such biomarker patterns may be useful for identifying at-risk populations to recruit for clinical trials. PMID:27324877
Biomarker clusters are differentially associated with longitudinal cognitive decline in late midlife.

PubMed

Racine, Annie M; Koscik, Rebecca L; Berman, Sara E; Nicholas, Christopher R; Clark, Lindsay R; Okonkwo, Ozioma C; Rowley, Howard A; Asthana, Sanjay; Bendlin, Barbara B; Blennow, Kaj; Zetterberg, Henrik; Gleason, Carey E; Carlsson, Cynthia M; Johnson, Sterling C

2016-08-01

The ability to detect preclinical Alzheimer's disease is of great importance, as this stage of the Alzheimer's continuum is believed to provide a key window for intervention and prevention. As Alzheimer's disease is characterized by multiple pathological changes, a biomarker panel reflecting co-occurring pathology will likely be most useful for early detection. Towards this end, 175 late middle-aged participants (mean age 55.9 ± 5.7 years at first cognitive assessment, 70% female) were recruited from two longitudinally followed cohorts to undergo magnetic resonance imaging and lumbar puncture. Cluster analysis was used to group individuals based on biomarkers of amyloid pathology (cerebrospinal fluid amyloid-β42/amyloid-β40 assay levels), magnetic resonance imaging-derived measures of neurodegeneration/atrophy (cerebrospinal fluid-to-brain volume ratio, and hippocampal volume), neurofibrillary tangles (cerebrospinal fluid phosphorylated tau181 assay levels), and a brain-based marker of vascular risk (total white matter hyperintensity lesion volume). Four biomarker clusters emerged consistent with preclinical features of (i) Alzheimer's disease; (ii) mixed Alzheimer's disease and vascular aetiology; (iii) suspected non-Alzheimer's disease aetiology; and (iv) healthy ageing. Cognitive decline was then analysed between clusters using longitudinal assessments of episodic memory, semantic memory, executive function, and global cognitive function with linear mixed effects modelling. Cluster 1 exhibited a higher intercept and greater rates of decline on tests of episodic memory. Cluster 2 had a lower intercept on a test of semantic memory and both Cluster 2 and Cluster 3 had steeper rates of decline on a test of global cognition. Additional analyses on Cluster 3, which had the smallest hippocampal volume, suggest that its biomarker profile is more likely due to hippocampal vulnerability and not to detectable specific volume loss exceeding the rate of normal ageing. 
Our results demonstrate that pathology, as indicated by biomarkers, in a preclinical timeframe is related to patterns of longitudinal cognitive decline. Such biomarker patterns may be useful for identifying at-risk populations to recruit for clinical trials. © The Author (2016). Published by Oxford University Press on behalf of the Guarantors of Brain. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Using Public Data for Comparative Proteome Analysis in Precision Medicine Programs.

PubMed

Hughes, Christopher S; Morin, Gregg B

2018-03-01

Maximizing the clinical utility of information obtained in longitudinal precision medicine programs would benefit from robust comparative analyses to known information to assess biological features of patient material toward identifying the underlying features driving their disease phenotype. Herein, the potential for utilizing publicly deposited mass-spectrometry-based proteomics data to perform inter-study comparisons of cell-line or tumor-tissue materials is investigated. To investigate the robustness of comparison between MS-based proteomics studies carried out with different methodologies, deposited data representative of label-free (MS1) and isobaric tagging (MS2 and MS3 quantification) are utilized. In-depth quantitative proteomics data acquired from analysis of ovarian cancer cell lines revealed the robust recapitulation of observable gene expression dynamics between individual studies carried out using significantly different methodologies. The observed signatures enable robust inter-study clustering of cell line samples. In addition, the ability to classify and cluster tumor samples based on observed gene expression trends when using a single patient sample is established. With this analysis, relevant gene expression dynamics are obtained from a single patient tumor, in the context of a precision medicine analysis, by leveraging a large cohort of repository data as a comparator. Together, these data establish the potential for state-of-the-art MS-based proteomics data to serve as resources for robust comparative analyses in precision medicine applications. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Using exploratory data analysis to identify and predict patterns of human Lyme disease case clustering within a multistate region, 2010-2014.

PubMed

Hendricks, Brian; Mark-Carew, Miguella

2017-02-01

Lyme disease is the most commonly reported vectorborne disease in the United States. The objective of our study was to identify patterns of Lyme disease reporting after multistate inclusion to mitigate potential border effects. County-level human Lyme disease surveillance data were obtained from Kentucky, Maryland, Ohio, Pennsylvania, Virginia, and West Virginia state health departments. Rate smoothing and Local Moran's I were performed to identify clusters of reporting activity and identify spatial outliers. A logistic generalized estimating equation was performed to identify significant associations in disease clustering over time. Resulting analyses identified statistically significant (P=0.05) clusters of high reporting activity and trends over time. High reporting activity aggregated near border counties in high incidence states, while low reporting aggregated near shared county borders in non-high incidence states. Findings highlight the need for exploratory surveillance approaches to describe the extent to which state level reporting affects accurate estimation of Lyme disease progression. Copyright © 2017 Elsevier Ltd. All rights reserved.
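As a rough illustration of the Local Moran's I step named in this abstract, the sketch below computes the statistic for a few units. It is not the authors' code: the five county rates, the adjacency list, and the row-standardised weights are all hypothetical assumptions, and no significance testing or rate smoothing is attempted.

```python
from statistics import mean

def local_morans_i(values, neighbours):
    """Return Local Moran's I for each unit.

    values: list of rates, one per unit
    neighbours: dict mapping unit index -> list of adjacent unit indices
    Weights are row-standardised (an assumption): each unit's neighbours
    share a total weight of 1.
    """
    n = len(values)
    mu = mean(values)
    z = [v - mu for v in values]            # deviations from the mean
    m2 = sum(d * d for d in z) / n          # second moment of the deviations
    result = []
    for i in range(n):
        nbrs = neighbours.get(i, [])
        # Spatial lag: average deviation of the neighbouring units
        lag = sum(z[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        result.append(z[i] * lag / m2)
    return result

# Hypothetical rates per 100,000 for five counties arranged in a line
rates = [50.0, 45.0, 5.0, 4.0, 48.0]
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
scores = local_morans_i(rates, adjacency)
```

A positive score marks a unit whose neighbours deviate from the mean in the same direction (a candidate cluster), and a negative score marks a unit that differs from its neighbours (a candidate spatial outlier).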
Parallel and Scalable Clustering and Classification for Big Data in Geosciences

NASA Astrophysics Data System (ADS)

Riedel, M.

2015-12-01

Machine learning, data mining, and statistical computing are common techniques to perform analysis in earth sciences. This contribution will focus on two concrete and widely used data analytics methods suitable to analyse 'big data' in the context of geoscience use cases: clustering and classification. From the broad class of available clustering methods we focus on the density-based spatial clustering of applications with noise (DBSCAN) algorithm that enables the identification of outliers or interesting anomalies. A new open source parallel and scalable DBSCAN implementation will be discussed in the light of a scientific use case that detects water mixing events in the Koljoefjords. The second technique we cover is classification, with a focus set on the support vector machines algorithm (SVMs), as one of the best out-of-the-box classification algorithms. A parallel and scalable SVM implementation will be discussed in the light of a scientific use case in the field of remote sensing with 52 different classes of land cover types.
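To make the DBSCAN idea concrete, here is a minimal serial sketch in plain Python. It is not the parallel implementation discussed in the abstract; the function name `dbscan`, the parameter values, and the sample points are hypothetical, and the brute-force neighbour search is only suitable for tiny inputs.

```python
from math import dist

NOISE = -1

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)

    def region(i):
        # Indices of all points within eps of point i (including i itself)
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = region(j)
            if len(nbrs) >= min_pts:   # j is a core point, so keep expanding
                queue.extend(nbrs)
    return labels

# Two dense groups plus one isolated point (hypothetical data)
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

Points that end up labelled `NOISE` are the outliers or anomalies the abstract refers to; they belong to no dense region.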
Including foreshocks and aftershocks in time-independent probabilistic seismic hazard analyses

USGS Publications Warehouse

Boyd, Oliver S.

2012-01-01

Time‐independent probabilistic seismic‐hazard analysis treats each source as being temporally and spatially independent; hence foreshocks and aftershocks, which are both spatially and temporally dependent on the mainshock, are removed from earthquake catalogs. Yet, intuitively, these earthquakes should be considered part of the seismic hazard, capable of producing damaging ground motions. In this study, I consider the mainshock and its dependents as a time‐independent cluster, each cluster being temporally and spatially independent from any other. The cluster has a recurrence time of the mainshock; and, by considering the earthquakes in the cluster as a union of events, dependent events have an opportunity to contribute to seismic ground motions and hazard. Based on the methods of the U.S. Geological Survey for a high‐hazard site, the inclusion of dependent events causes ground motions that are exceeded at probability levels of engineering interest to increase by about 10% but could be as high as 20% if variations in aftershock productivity can be accounted for reliably.
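The "union of events" idea can be sketched numerically. The example below is a simplified illustration, not the U.S. Geological Survey procedure: it assumes that, given a cluster occurs, each event exceeds the ground-motion level independently, and all probabilities and rates are hypothetical.

```python
from math import exp, prod

def cluster_exceedance_prob(event_probs):
    """P(at least one event in the cluster exceeds the ground-motion level).

    Assumes the per-event exceedance probabilities are independent
    given that the cluster occurs.
    """
    return 1.0 - prod(1.0 - p for p in event_probs)

def poisson_exceedance(rate_per_year, prob_given_cluster, years):
    """P(level exceeded at least once in `years`) for clusters that recur
    as a Poisson process at the mainshock rate."""
    return 1.0 - exp(-rate_per_year * prob_given_cluster * years)

# Hypothetical exceedance probabilities: mainshock alone, then the
# mainshock together with two dependent events
mainshock_only = cluster_exceedance_prob([0.20])
with_dependents = cluster_exceedance_prob([0.20, 0.05, 0.02])

# Hypothetical recurrence rate of 1 cluster per 100 years, 50-year window
p50_main = poisson_exceedance(0.01, mainshock_only, 50)
p50_cluster = poisson_exceedance(0.01, with_dependents, 50)
```

Because the union can only add to the exceedance probability, including dependent events never lowers the computed hazard in this sketch.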

DOE Office of Scientific and Technical Information (OSTI.GOV)

Andersson, Gunther G., E-mail: gunther.andersson@flinders.edu.au, E-mail: vladimir.golovko@canterbury.ac.nz, E-mail: greg.metha@adelaide.edu.au; Al Qahtani, Hassan S.; Golovko, Vladimir B., E-mail: gunther.andersson@flinders.edu.au, E-mail: vladimir.golovko@canterbury.ac.nz, E-mail: greg.metha@adelaide.edu.au

Chemically made, atomically precise phosphine-stabilized clusters Au9(PPh3)8(NO3)3 were deposited on titania and silica from solutions at various concentrations and the samples heated under vacuum to remove the ligands. Metastable induced electron spectroscopy was used to determine the density of states at the surface, and X-ray photoelectron spectroscopy for analysing the composition of the surface. It was found for the Au9 cluster deposited on titania that the ligands react with the titania substrate. Based on analysis using the singular value decomposition algorithm, the series of MIE spectra can be described as a linear combination of 3 base spectra that are assigned to the spectra of the substrate, the phosphine ligands on the substrate, and the Au clusters anchored to titania after removal of the ligands. On silica, the Au clusters show significant agglomeration after heat treatment and no interaction of the ligands with the substrate can be identified.
The MMPI-2 in sexual harassment and discrimination litigants.

PubMed

Long, Barbara; Rouse, Steven V; Nelsen, R Owen; Butcher, James N

2004-06-01

In order to understand patterns of respondents on validity and clinical scales, this study analyzed archival Minnesota Multiphasic Personality Inventory 2s (MMPI-2s) produced by 192 women and 14 men who initiated legal claims of ongoing emotional harm related to workplace sexual harassment and discrimination. The MMPI-2s were administered as a part of a comprehensive psychiatric forensic evaluation of the claimants' current psychological condition. All validity and clinical scale scores were manually entered into the computer, and codetype and cluster analyses were obtained. Among the women, 28% produced a "normal limits" profile, providing no MMPI-2 support for their claims of ongoing emotional distress. Cluster analysis of the validity scales of the remaining profiles produced four distinctive clusters of profiles representing different approaches to the test items. Copyright 2004 Wiley Periodicals, Inc.
The co-occurrence of autistic traits and borderline personality disorder traits is associated to increased suicidal ideation in nonclinical young adults.

PubMed

Chabrol, Henri; Raynal, Patrick

2018-04-01

The co-occurrence of Autism Spectrum Disorder (ASD) and Borderline Personality Disorder (BPD) is not rare and has been linked to increased suicidality. Despite this significant comorbidity between ASD and BPD, no study had examined the co-occurrence of autistic traits and borderline personality disorder traits in the general population. The aim of the present study was to examine the co-occurrence of autistic and borderline traits in a non-clinical sample of young adults and its influence on the levels of suicidal ideation and depressive symptomatology. Participants were 474 college students who completed self-report questionnaires. Data were analysed using correlation and cluster analyses. Borderline personality traits and autistic traits were weakly correlated. However, cluster analysis yielded four groups: a low traits group, a borderline traits group, an autistic traits group, and a group characterized by high levels of both traits. Cluster analysis revealed that autistic and borderline traits can co-occur in a significant proportion of young adults. The high autistic and borderline traits group constituted 17% of the total sample and had a higher level of suicidal ideation than the borderline traits group, despite similar levels of depressive symptoms. This result suggests that the higher suicidality observed in patients with comorbid ASD and BPD may extend to non-clinical individuals with high levels of co-occurrent autistic and borderline traits. Copyright © 2018 Elsevier Inc. All rights reserved.
Novel linkage disequilibrium clustering algorithm identifies new lupus genes on meta-analysis of GWAS datasets.

PubMed

Saeed, Mohammad

2017-05-01

Systemic lupus erythematosus (SLE) is a complex disorder. Genetic association studies of complex disorders suffer from the following three major issues: phenotypic heterogeneity, false positive (type I error), and false negative (type II error) results. Hence, genes with low to moderate effects are missed in standard analyses, especially after statistical corrections. OASIS is a novel linkage disequilibrium clustering algorithm that can potentially address false positives and negatives in genome-wide association studies (GWAS) of complex disorders such as SLE. OASIS was applied to two SLE dbGAP GWAS datasets (6077 subjects; ∼0.75 million single-nucleotide polymorphisms). OASIS identified three known SLE genes viz. IFIH1, TNIP1, and CD44, not previously reported using these GWAS datasets. In addition, 22 novel loci for SLE were identified and the 5 SLE genes previously reported using these datasets were verified. OASIS methodology was validated using single-variant replication and gene-based analysis with GATES. This led to the verification of 60% of OASIS loci. New SLE genes that OASIS identified and were further verified include TNFAIP6, DNAJB3, TTF1, GRIN2B, MON2, LATS2, SNX6, RBFOX1, NCOA3, and CHAF1B. This study presents the OASIS algorithm, software, and the meta-analyses of two publicly available SLE GWAS datasets along with the novel SLE genes. Hence, OASIS is a novel linkage disequilibrium clustering method that can be universally applied to existing GWAS datasets for the identification of new genes.
Binomial outcomes in dataset with some clusters of size two: can the dependence of twins be accounted for? A simulation study comparing the reliability of statistical methods based on a dataset of preterm infants.

PubMed

Sauzet, Odile; Peacock, Janet L

2017-07-20

The analysis of perinatal outcomes often involves datasets with some multiple births. These are datasets mostly formed of independent observations and a limited number of clusters of size two (twins) and maybe of size three or more. This non-independence needs to be accounted for in the statistical analysis. Using simulated data based on a dataset of preterm infants we have previously investigated the performance of several approaches to the analysis of continuous outcomes in the presence of some clusters of size two. Mixed models have been developed for binomial outcomes but very little is known about their reliability when only a limited number of small clusters are present. Using simulated data based on a dataset of preterm infants we investigated the performance of several approaches to the analysis of binomial outcomes in the presence of some clusters of size two. Logistic models, several methods of estimation for the logistic random intercept models and generalised estimating equations were compared. The presence of even a small percentage of twins means that a logistic regression model will underestimate all parameters but a logistic random intercept model fails to estimate the correlation between siblings if the percentage of twins is too small and will provide similar estimates to logistic regression. The method which seems to provide the best balance between estimation of the standard error and the parameter for any percentage of twins is the generalised estimating equations. This study has shown that the number of covariates or the level two variance do not necessarily affect the performance of the various methods used to analyse datasets containing twins but when the percentage of small clusters is too small, mixed models cannot capture the dependence between siblings.
SUPERMODEL ANALYSIS OF A1246 AND J255: ON THE EVOLUTION OF GALAXY CLUSTERS FROM HIGH TO LOW ENTROPY STATES

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fusco-Femiano, R.; Lapi, A., E-mail: roberto.fuscofemiano@iaps.inaf.it

2015-02-10

We present an analysis of high-quality X-ray data out to the virial radius for the two galaxy clusters A1246 and GMBCG J255.34805+64.23661 (J255) by means of our entropy-based SuperModel. For A1246 we find that the spherically averaged entropy profile of the intracluster medium (ICM) progressively flattens outward, and that a nonthermal pressure component amounting to ≈20% of the total is required to support hydrostatic equilibrium in the outskirts; there we also estimate a modest value C ≈ 1.6 of the ICM clumping factor. These findings agree with previous analyses on other cool-core, relaxed clusters, and lend further support to the picture by Lapi et al. that relates the entropy flattening, the development of the nonthermal pressure component, and the azimuthal variation of ICM properties to weakening boundary shocks. In this scenario clusters are born in a high-entropy state throughout, and are expected to develop on similar timescales a low-entropy state both at the center due to cooling, and in the outskirts due to weakening shocks. However, the analysis of J255 testifies how such a typical evolutionary course can be interrupted or even reversed by merging especially at intermediate redshift, as predicted by Cavaliere et al. In fact, a merger has rejuvenated the ICM of this cluster at z ≈ 0.45 by reestablishing a high-entropy state in the outskirts, while leaving intact or erasing only partially the low-entropy, cool core at the center.
Transcriptomic analysis of neuregulin-1 regulated genes following ischemic stroke by computational identification of promoter binding sites: A role for the ETS-1 transcription factor.

PubMed

Surles-Zeigler, Monique C; Li, Yonggang; Distel, Timothy J; Omotayo, Hakeem; Ge, Shaokui; Ford, Byron D

2018-01-01

Ischemic stroke is a major cause of mortality in the United States. We previously showed that neuregulin-1 (NRG1) was neuroprotective in rat models of ischemic stroke. We used gene expression profiling to understand the early cellular and molecular mechanisms of NRG1's effects after the induction of ischemia. Ischemic stroke was induced by middle cerebral artery occlusion (MCAO). Rats were allocated to 3 groups: (1) control, (2) MCAO and (3) MCAO + NRG1. Cortical brain tissues were collected three hours following MCAO and NRG1 treatment and subjected to microarray analysis. Data and statistical analyses were performed using R/Bioconductor platform alongside Genesis, Ingenuity Pathway Analysis and Enrichr software packages. There were 2693 genes differentially regulated following ischemia and NRG1 treatment. These genes were organized by expression patterns into clusters using a K-means clustering algorithm. We further analyzed genes in clusters where ischemia altered gene expression, which was reversed by NRG1 (clusters 4 and 10). NRG1, IRS1, OPA3, and POU6F1 were central linking (node) genes in cluster 4. Conserved Transcription Factor Binding Site Finder (CONFAC) identified ETS-1 as a potential transcriptional regulator of NRG1 suppressed genes following ischemia. A transcription factor activity array showed that ETS-1 activity was increased 2-fold, 3 hours following ischemia and this activity was attenuated by NRG1. These findings reveal key early transcriptional mechanisms associated with neuroprotection by NRG1 in the ischemic penumbra.
Clustering of longitudinal data by using an extended baseline: A new method for treatment efficacy clustering in longitudinal data.

PubMed

Schramm, Catherine; Vial, Céline; Bachoud-Lévi, Anne-Catherine; Katsahian, Sandrine

2018-01-01

Heterogeneity in treatment efficacy is a major concern in clinical trials. Clustering may help to identify the treatment responders and the non-responders. In the context of longitudinal cluster analyses, sample size and variability of the times of measurements are the main issues with the current methods. Here, we propose a new two-step method for the Clustering of Longitudinal data by using an Extended Baseline. The first step relies on a piecewise linear mixed model for repeated measurements with a treatment-time interaction. The second step clusters the random predictions and considers several parametric (model-based) and non-parametric (partitioning, ascendant hierarchical clustering) algorithms. A simulation study compares all options of the clustering of longitudinal data by using an extended baseline method with the latent-class mixed model. The clustering of longitudinal data by using an extended baseline method with the two model-based algorithms was the more robust model. The clustering of longitudinal data by using an extended baseline method with all the non-parametric algorithms failed when there were unequal variances of treatment effect between clusters or when the subgroups had unbalanced sample sizes. The latent-class mixed model failed when the between-patients slope variability is high. Two real data sets on neurodegenerative disease and on obesity illustrate the clustering of longitudinal data by using an extended baseline method and show how clustering may help to identify the marker(s) of the treatment response. The application of the clustering of longitudinal data by using an extended baseline method in exploratory analysis as the first stage before setting up stratified designs can provide a better estimation of treatment effect in future clinical trials.
Comprehensive Genomic Analyses of the OM43 Clade, Including a Novel Species from the Red Sea, Indicate Ecotype Differentiation among Marine Methylotrophs

PubMed Central

Jimenez-Infante, Francy; Ngugi, David Kamanda; Vinu, Manikandan; Alam, Intikhab; Kamau, Allan Anthony; Blom, Jochen; Bajic, Vladimir B.

2015-01-01

The OM43 clade within the family Methylophilaceae of Betaproteobacteria represents a group of methylotrophs that play important roles in the metabolism of C1 compounds in marine environments and other aquatic environments around the globe. Using dilution-to-extinction cultivation techniques, we successfully isolated a novel species of this clade (here designated MBRS-H7) from the ultraoligotrophic open ocean waters of the central Red Sea. Phylogenomic analyses indicate that MBRS-H7 is a novel species that forms a distinct cluster together with isolate KB13 from Hawaii (Hawaii-Red Sea [H-RS] cluster) that is separate from the cluster represented by strain HTCC2181 (from the Oregon coast). Phylogenetic analyses using the robust 16S-23S internal transcribed spacer revealed a potential ecotype separation of the marine OM43 clade members, which was further confirmed by metagenomic fragment recruitment analyses that showed trends of higher abundance in low-chlorophyll and/or high-temperature provinces for the H-RS cluster but a preference for colder, highly productive waters for the HTCC2181 cluster. This potential environmentally driven niche differentiation is also reflected in the metabolic gene inventories, which in the case of the H-RS cluster include those conferring resistance to high levels of UV irradiation, temperature, and salinity. Interestingly, we also found different energy conservation modules between these OM43 subclades, namely, the existence of the NADH:quinone oxidoreductase complex I (NUO) system in the H-RS cluster and the nonhomologous NADH:quinone oxidoreductase (NQR) system in the HTCC2181 cluster, which might have implications for their overall energetic yields. PMID:26655752
Integration K-Means Clustering Method and Elbow Method For Identification of The Best Customer Profile Cluster

NASA Astrophysics Data System (ADS)

Syakur, M. A.; Khotimah, B. K.; Rochman, E. M. S.; Satoto, B. D.

2018-04-01

Clustering is a data mining technique used to analyse large and varied data sets. It groups data into clusters so that the members of each cluster are as similar as possible to one another and different from the members of other clusters. Small and medium-sized enterprises (SMEs) in Indonesia have a variety of customers, but they have no mapping of these customers and so do not know which customers are loyal and which are not. Customer mapping groups customer profiles to support SME analysis and policy in the production of goods, especially batik sales. The researchers combine the K-means method with the elbow method to make K-means more efficient and effective when processing large amounts of data. K-means clustering is a local optimization method that is sensitive to the choice of the starting midpoints (centroids) of the clusters, so a poor choice of starting midpoints leads to high error and poor cluster results. The K-means algorithm also has difficulty determining the best number of clusters, so the elbow method is used to find the best number of clusters for K-means. The results show that the elbow method can produce the same number of clusters K for different amounts of data. The number of clusters determined with the elbow method is then used as the default for the characterization process in the case study. The best clusters were determined from SSE values on 500 clusters of batik visitors. The results show a sharp decrease at K = 3, so K = 3 is taken as the cut-off point and the best number of clusters.
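The elbow procedure described in this abstract can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the data points are hypothetical, and the deterministic farthest-first initialisation is an assumption chosen only to make the example reproducible.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def init_centroids(points, k):
    # Deterministic farthest-first start (an assumption; the abstract only
    # notes that K-means is sensitive to the starting midpoints)
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids

def kmeans_sse(points, k, max_iter=100):
    """Run Lloyd's K-means and return the sum of squared errors (SSE)."""
    centroids = init_centroids(points, k)
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            groups[nearest].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty
        updated = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
        if updated == centroids:
            break
        centroids = updated
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

# Hypothetical data: three well-separated groups of four points each
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (20, 0), (20, 1), (21, 0), (21, 1)]

sse = {k: kmeans_sse(points, k) for k in range(1, 6)}
# Drop in SSE gained by moving from k-1 to k clusters
drops = {k: sse[k - 1] - sse[k] for k in range(2, 6)}
```

The "elbow" is the last K at which the drop in SSE is still large; on this toy data the drop is large up to K = 3 and small afterwards, mirroring the sharp decrease at K = 3 that the abstract reports for its own data.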
Genetic differences in the two main groups of the Japanese population based on autosomal SNPs and haplotypes.

PubMed

Yamaguchi-Kabata, Yumi; Tsunoda, Tatsuhiko; Kumasaka, Natsuhiko; Takahashi, Atsushi; Hosono, Naoya; Kubo, Michiaki; Nakamura, Yusuke; Kamatani, Naoyuki

2012-05-01

Although the Japanese population has a rather low genetic diversity, we recently confirmed the presence of two main clusters (the Hondo and Ryukyu clusters) through principal component analysis of genome-wide single-nucleotide polymorphism (SNP) genotypes. Understanding the genetic differences between the two main clusters requires further genome-wide analyses based on a dense SNP set and comparison of haplotype frequencies. In the present study, we determined haplotypes for the Hondo cluster of the Japanese population by detecting SNP homozygotes with 388,591 autosomal SNPs from 18,379 individuals and estimated the haplotype frequencies. Haplotypes for the Ryukyu cluster were inferred by a statistical approach using the genotype data from 504 individuals. We then compared the haplotype frequencies between the Hondo and Ryukyu clusters. In most genomic regions, the haplotype frequencies in the Hondo and Ryukyu clusters were very similar. However, in addition to the human leukocyte antigen region on chromosome 6, other genomic regions (chromosomes 3, 4, 5, 7, 10 and 12) showed dissimilarities in haplotype frequency. These regions were enriched for genes involved in the immune system, cell-cell adhesion and the intracellular signaling cascade. These differentiated genomic regions between the Hondo and Ryukyu clusters are of interest because they (1) should be examined carefully in association studies and (2) likely contain genes responsible for morphological or physiological differences between the two groups.
Spectroscopic Confirmation of Five Galaxy Clusters at z > 1.25 in the 2500 deg^2 SPT-SZ Survey

NASA Astrophysics Data System (ADS)

Khullar, Gourav; Bleem, Lindsey; Bayliss, Matthew; Gladders, Michael; South Pole Telescope (SPT) Collaboration

2018-06-01

We present spectroscopic confirmation of 5 galaxy clusters at 1.25 < z < 1.5, discovered in the 2500 deg^2 South Pole Telescope Sunyaev-Zel’dovich (SPT-SZ) survey. These clusters, taken from a nearly redshift-independent mass-limited sample of clusters, have multi-wavelength follow-up imaging data from the X-ray to the near-IR, and currently form the most homogeneous massive high-redshift cluster sample in existence. We briefly describe the analysis pipeline used on the low S/N spectra of these faint galaxies, and describe the multiple techniques used to extract robust redshifts from a combination of absorption-line (Ca II H&K doublet - λλ3934,3968Å) and emission-line ([OII] λλ3727,3729Å) spectral features. We present several ensemble analyses of cluster member galaxies that demonstrate the reliability of the measured redshifts. We also identify modest [OII] emission and pronounced CN and Hδ absorption in a composite stacked spectrum of 28 low S/N passive galaxy spectra with redshifts derived primarily from Ca II H&K features. This work increases the number of spectroscopically-confirmed SPT-SZ galaxy clusters at z > 1.25 from 2 to 7, further demonstrating the efficacy of SZ selection for the highest redshift massive clusters, and enabling further detailed study of these confirmed systems.
Connecting Different Data Sources to Assess the Interconnections between Biosecurity, Health, Welfare, and Performance in Commercial Pig Farms in Great Britain.

PubMed

Pandolfi, Fanny; Edwards, Sandra A; Maes, Dominiek; Kyriazakis, Ilias

2018-01-01

This study aimed to provide an overview of the interconnections between biosecurity, health, welfare, and performance in commercial pig farms in Great Britain. We collected on-farm data about the level of biosecurity and animal performance in 40 fattening pig farms and 28 breeding pig farms between 2015 and 2016. We identified interconnections between these data, slaughterhouse health indicators, and welfare indicator records in fattening pig farms. After achieving the connections between databases, a secondary data analysis was performed to assess the interconnections between biosecurity, health, welfare, and performance using correlation analysis, principal component analysis, and hierarchical clustering. Although we could connect the different data sources the final sample size was limited, suggesting room for improvement in database connection to conduct secondary data analyses. The farm biosecurity scores ranged from 40 to 90 out of 100, with internal biosecurity scores being lower than external biosecurity scores. Our analysis suggested several interconnections between health, welfare, and performance. The initial correlation analysis showed that the prevalence of lameness and severe tail lesions was associated with the prevalence of enzootic pneumonia-like lesions and pyaemia, and the prevalence of severe body marks was associated with several disease indicators, including peritonitis and milk spots ( r > 0.3; P < 0.05). Higher average daily weight gain (ADG) was associated with lower prevalence of pleurisy ( r > 0.3; P < 0.05), but no connection was identified between mortality and health indicators. A subsequent cluster analysis enabled identification of patterns which considered concurrently indicators of health, welfare, and performance. Farms from cluster 1 had lower biosecurity scores, lower ADG, and higher prevalence of several disease and welfare indicators. 
Farms from cluster 2 had higher biosecurity scores than cluster 1, but a higher prevalence of pigs requiring hospitalization and lameness, which confirmed the correlation between biosecurity and the prevalence of pigs requiring hospitalization (r > 0.3; P < 0.05). Farms from cluster 3 had higher biosecurity, higher ADG, and lower prevalence for some disease and welfare indicators. The study suggests a smaller impact of biosecurity on issues such as mortality, prevalence of lameness, and pigs requiring hospitalization. The correlations and the identified clusters suggested the importance of animal welfare for the pig industry.
Variation in the fumonisin biosynthetic gene cluster in fumonisin-producing and nonproducing black aspergilli.

PubMed

Susca, Antonia; Proctor, Robert H; Butchko, Robert A E; Haidukowski, Miriam; Stea, Gaetano; Logrieco, Antonio; Moretti, Antonio

2014-12-01

The ability to produce fumonisin mycotoxins varies among members of the black aspergilli. Previously, analyses of selected genes in the fumonisin biosynthetic gene (fum) cluster in black aspergilli from California grapes indicated that fumonisin-nonproducing isolates of Aspergillus welwitschiae lack six fum genes, but nonproducing isolates of Aspergillus niger do not. In the current study, analyses of black aspergilli from grapes from the Mediterranean Basin indicate that the genomic context of the fum cluster is the same in isolates of A. niger and A. welwitschiae regardless of fumonisin-production ability and that full-length clusters occur in producing isolates of both species and nonproducing isolates of A. niger. In contrast, the cluster has undergone an eight-gene deletion in fumonisin-nonproducing isolates of A. welwitschiae. Phylogenetic analyses suggest each species consists of a mixed population of fumonisin-producing and nonproducing individuals, and that existence of both production phenotypes may provide a selective advantage to these species. Differences in gene content of fum cluster homologues and phylogenetic relationships of fum genes suggest that the mutation(s) responsible for the nonproduction phenotype differs, and therefore arose independently, in the two species. Partial fum cluster homologues were also identified in genome sequences of four other black Aspergillus species. Gene content of these partial clusters and phylogenetic relationships of fum sequences indicate that non-random partial deletion of the cluster has occurred multiple times among the species. This in turn suggests that an intact cluster and fumonisin production were once more widespread among black aspergilli. Copyright © 2014 Elsevier Inc. All rights reserved.
Classification and discrimination of pediatric patients undergoing open heart surgery with and without methylprednisolone treatment by cytomics

NASA Astrophysics Data System (ADS)

Bocsi, Jozsef; Mittag, Anja; Pierzchalski, Arkadiusz; Osmancik, Pavel; Dähnert, Ingo; Tárnok, Attila

2011-02-01

Introduction: Methylprednisolone (MP) is frequently administered preoperatively in children undergoing open heart surgery. The aim of this medication is to inhibit overshooting immune responses. Earlier studies demonstrated cellular and humoral immunological changes in pediatric patients undergoing heart surgeries with and without MP administration. Here in a retrospective study we investigated the modulation of the cellular immune response by MP. The aim was to identify suitable parameters characterizing MP effects by cluster analysis. Methods: Blood samples were analysed from two age-matched groups with surgical correction of septum defects. The group without MP treatment consisted of 10 patients; MP was administered to 21 patients (median dose: 11 mg/kg) before cardiopulmonary bypass (CPB). EDTA anticoagulated blood was obtained 24 h preoperatively, after anesthesia, at CPB begin and end (CPB2), 4 h, 24 h, 48 h after surgery, at discharge and at out-patient follow-up (8.2; 3.3-12.2 months after surgery; median and IQR). Flow cytometry showed the biggest MP-relevant changes at CPB2 and 4 h postoperatively. They were used for clustering analysis. Classification was made by discriminant analysis and cluster analysis by means of Genes@work software. Results & conclusion: 146 parameters were obtained from analysis. Cross-validation revealed several parameters able to discriminate between MP groups and to identify immune modulation. MP administration resulted in a delayed activation of monocytes, an increased ratio of neutrophils, and reduced T-lymphocyte counts. Cluster analysis demonstrated that classification of patients is possible based on the identified cytomics parameters. Further investigation of these parameters might help to understand the MP effects in pediatric open heart surgery.
Multilocus microsatellite typing shows three different genetic clusters of Leishmania major in Iran.

PubMed

Mahnaz, Tashakori; Al-Jawabreh, Amer; Kuhls, Katrin; Schönian, Gabriele

2011-10-01

Ten polymorphic microsatellite markers were used to analyse 25 strains of Leishmania major collected from cutaneous leishmaniasis cases in different endemic areas in Iran. Nine of the markers were polymorphic, revealing 21 different genotypes. The data displayed significant microsatellite polymorphism with rare allelic heterozygosity. Bayesian statistical and distance-based analyses identified three genetic clusters among the 25 strains analysed. Cluster I represented mainly strains isolated in the west and south-west of Iran, with the exception of four strains originating from central Iran. Cluster II comprised strains from the central part of Iran, and cluster III included only strains from north Iran. The geographical distribution of L. major in Iran was supported by comparing the microsatellite profiles of the 25 Iranian strains to those of 105 strains collected in 19 Asian and African countries. The Iranian clusters I and II were separated from three previously described populations comprising strains from Africa, the Middle East and Central Asia, whereas cluster III grouped together with the Central Asian population. The considerable genetic variability of L. major might be related to the existence of different populations of Phlebotomus papatasi and/or to differences in reservoir host abundance in different parts of Iran. Copyright © 2011 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.
Variable number of tandem repeats and pulsed-field gel electrophoresis cluster analysis of enterohemorrhagic Escherichia coli serovar O157 strains.

PubMed

Yokoyama, Eiji; Uchimura, Masako

2007-11-01

Ninety-five enterohemorrhagic Escherichia coli serovar O157 strains, including 30 strains isolated from 13 intrafamily outbreaks and 14 strains isolated from 3 mass outbreaks, were studied by pulsed-field gel electrophoresis (PFGE) and variable number of tandem repeats (VNTR) typing, and the resulting data were subjected to cluster analysis. Cluster analysis of the VNTR typing data revealed that 57 (60.0%) of 95 strains, including all epidemiologically linked strains, formed clusters with at least 95% similarity. Cluster analysis of the PFGE patterns revealed that 67 (70.5%) of 95 strains, including all but 1 of the epidemiologically linked strains, formed clusters with 90% similarity. The number of epidemiologically unlinked strains forming clusters was significantly less by VNTR cluster analysis than by PFGE cluster analysis. The congruence value between PFGE and VNTR cluster analysis was low and did not show an obvious correlation. With two-step cluster analysis, the number of clustered epidemiologically unlinked strains by PFGE cluster analysis that were divided by subsequent VNTR cluster analysis was significantly higher than the number by VNTR cluster analysis that were divided by subsequent PFGE cluster analysis. These results indicate that VNTR cluster analysis is more efficient than PFGE cluster analysis as an epidemiological tool to trace the transmission of enterohemorrhagic E. coli O157.
The X-CLASS-redMaPPer galaxy cluster comparison. I. Identification procedures

NASA Astrophysics Data System (ADS)

Sadibekova, T.; Pierre, M.; Clerc, N.; Faccioli, L.; Gastaud, R.; Le Fevre, J.-P.; Rozo, E.; Rykoff, E.

2014-11-01

Context. This paper is the first in a series undertaking a comprehensive correlation analysis between optically selected and X-ray-selected cluster catalogues. The rationale of the project is to develop a holistic picture of galaxy clusters utilising optical and X-ray-cluster-selected catalogues with well-understood selection functions. Aims: Unlike most of the X-ray/optical cluster correlations to date, the present paper focuses on the non-matching objects in either waveband. We investigate how the differences observed between the optical and X-ray catalogues may stem from (1) a shortcoming of the detection algorithms; (2) dispersion in the X-ray/optical scaling relations; or (3) substantial intrinsic differences between the cluster populations probed in the X-ray and optical bands. The aim is to inventory and elucidate these effects in order to account for selection biases in the further determination of X-ray/optical cluster scaling relations. Methods: We correlated the X-CLASS serendipitous cluster catalogue extracted from the XMM archive with the redMaPPer optical cluster catalogue derived from the Sloan Digital Sky Survey (DR8). We performed a detailed and, in large part, interactive analysis of the matching output from the correlation. The overlap between the two catalogues has been accurately determined and possible cluster positional errors were manually recovered. The final samples comprise 270 and 355 redMaPPer and X-CLASS clusters, respectively. X-ray cluster matching rates were analysed as a function of optical richness. In the second step, the redMaPPer clusters were correlated with the entire X-ray catalogue, containing point and uncharacterised sources (down to a few 10^-15 erg s^-1 cm^-2 in the [0.5-2] keV band). A stacking analysis was performed for the remaining undetected optical clusters. Results: We find that all rich (λ ≥ 80) clusters are detected in X-rays out to z = 0.6. Below this redshift, the richness threshold for X-ray detection steadily decreases with redshift. Likewise, all X-ray bright clusters are detected by redMaPPer. After correcting for obvious pipeline shortcomings (about 10% of the cases both in optical and X-ray), ~50% of the redMaPPer (down to a richness of 20) are found to coincide with an X-CLASS cluster; when considering X-ray sources of any type, this fraction increases to ~80%; for the remaining objects, the stacking analysis finds a weak signal within 0.5 Mpc around the cluster optical centres. The fraction of clusters totally dominated by AGN-type emission appears to be a few percent. Conversely, ~40% of the X-CLASS clusters are identified with a redMaPPer (down to a richness of 20) - part of the non-matches being due to the X-CLASS sample extending further out than redMaPPer (z< 1.5 vs. z< 0.6), but extending the correlation down to a richness of 5 raises the matching rate to ~65%. Conclusions: This state-of-the-art study involving two well-validated cluster catalogues has shown itself to be complex, and it points to a number of issues inherent to blind cross-matching, owing both to pipeline shortcomings and cluster peculiar properties. These can only be accounted for after a manual check. The combined X-ray and optical scaling relations will be presented in a subsequent article.
Dark Energy Survey Year 1 Results: Multi-Probe Methodology and Simulated Likelihood Analyses

DOE Office of Scientific and Technical Information (OSTI.GOV)

Krause, E.; et al.

We present the methodology for and detail the implementation of the Dark Energy Survey (DES) 3x2pt DES Year 1 (Y1) analysis, which combines configuration-space two-point statistics from three different cosmological probes: cosmic shear, galaxy-galaxy lensing, and galaxy clustering, using data from the first year of DES observations. We have developed two independent modeling pipelines and describe the code validation process. We derive expressions for analytical real-space multi-probe covariances, and describe their validation with numerical simulations. We stress-test the inference pipelines in simulated likelihood analyses that vary 6-7 cosmology parameters plus 20 nuisance parameters and precisely resemble the analysis to be presented in the DES 3x2pt analysis paper, using a variety of simulated input data vectors with varying assumptions. We find that any disagreement between pipelines leads to changes in assigned likelihood $$\Delta \chi^2 \le 0.045$$ with respect to the statistical error of the DES Y1 data vector. We also find that angular binning and survey mask do not impact our analytic covariance at a significant level. We determine lower bounds on scales used for analysis of galaxy clustering (8 Mpc$$~h^{-1}$$) and galaxy-galaxy lensing (12 Mpc$$~h^{-1}$$) such that the impact of modeling uncertainties in the non-linear regime is well below statistical errors, and show that our analysis choices are robust against a variety of systematics. These tests demonstrate that we have a robust analysis pipeline that yields unbiased cosmological parameter inferences for the flagship 3x2pt DES Y1 analysis. We emphasize that the level of independent code development and subsequent code comparison as demonstrated in this paper is necessary to produce credible constraints from increasingly complex multi-probe analyses of current data.
Clonal structure in Ichthyobacterium seriolicida, the causative agent of bacterial haemolytic jaundice in yellowtail, Seriola quinqueradiata, inferred from molecular epidemiological analysis.

PubMed

Matsuyama, T; Fukuda, Y; Sakai, T; Tanimoto, N; Nakanishi, M; Nakamura, Y; Takano, T; Nakayasu, C

2017-08-01

Bacterial haemolytic jaundice caused by Ichthyobacterium seriolicida has been responsible for mortality in farmed yellowtail, Seriola quinqueradiata, in western Japan since the 1980s. In this study, polymorphic analysis of I. seriolicida was performed using three molecular methods: amplified fragment length polymorphism (AFLP) analysis, multilocus sequence typing (MLST) and multiple-locus variable-number tandem repeat analysis (MLVA). Twenty-eight isolates were analysed using AFLP, while 31 isolates were examined by MLST and MLVA. No polymorphisms were identified by AFLP analysis using EcoRI and MseI, or by MLST of internal fragments of eight housekeeping genes. However, MLVA revealed variation in repeat numbers of three elements, allowing separation of the isolates into 16 sequence types. The unweighted pair group method using arithmetic averages cluster analysis of the MLVA data identified four major clusters, and all isolates belonged to clonal complexes. It is likely that I. seriolicida populations share a common ancestor, which may be a recently introduced strain. © 2016 John Wiley & Sons Ltd.

Effects of different preservation methods on inter simple sequence repeat (ISSR) and random amplified polymorphic DNA (RAPD) molecular markers in botanic samples.

PubMed

Wang, Xiaolong; Li, Lin; Zhao, Jiaxin; Li, Fangliang; Guo, Wei; Chen, Xia

2017-04-01

To evaluate the effects of different preservation methods (stored in a -20°C ice chest, preserved in liquid nitrogen and dried in silica gel) on inter simple sequence repeat (ISSR) or random amplified polymorphic DNA (RAPD) analyses in various botanical specimens (including broad-leaved plants, needle-leaved plants and succulent plants) for different times (three weeks and three years), we used a statistical analysis based on the number of bands, genetic index and cluster analysis. The results demonstrate that methods used to preserve samples can provide sufficient amounts of genomic DNA for ISSR and RAPD analyses; however, the effect of different preservation methods on these analyses varies significantly, and the preservation time has little effect on these analyses. Our results provide a reference for researchers to select the most suitable preservation method depending on their study subject for the analysis of molecular markers based on genomic DNA. Copyright © 2017 Académie des sciences. Published by Elsevier Masson SAS. All rights reserved.
Rocky Mountain spotted fever in Georgia, 1961-75: analysis of social and environmental factors affecting occurrence.

PubMed Central

Newhouse, V F; Choi, K; Holman, R C; Thacker, S B; D'Angelo, L J; Smith, J D

1986-01-01

For the period of 1961 through 1975, 10 geographic and sociologic variables in each of the 159 counties of Georgia were analyzed to determine how they were correlated with the occurrence of Rocky Mountain spotted fever (RMSF). Combinations of variables were transformed into a smaller number of factors using principal-component analysis. Based upon the relative values of these factors, geographic areas of similarity were delineated by cluster analysis. It was found by use of these analyses that the counties of the State formed four similarity clusters, which we called south, central, lower north and upper north. When the incidence of RMSF was subsequently calculated for each of these regions of similarity, the regions had differing RMSF incidence; low in the south and upper north, moderate in the central, and high in the lower north. The four similarity clusters agreed closely with the incidence of RMSF when both were plotted on a map. Thus, when analyzed simultaneously, the 10 variables selected could be used to predict the occurrence of RMSF. The most important variables were those of climate and geography. Of secondary, but still major importance, were the changes over the 15-year period in variables associated with humans and their environmental alterations. Detailed examination of these factors has permitted quantitative evaluation of the simultaneous impacts of the geographic and sociologic variables on the occurrence of RMSF in Georgia. These analyses could be updated to reflect changes in the relevant variables and tested as a means of identifying new high risk areas for RMSF in the State. More generally, this method might be adapted to clarify our understanding of the relative importance of individual variables in the ecology of other diseases or environmental health problems. PMID:3090609
Automated image analysis for quantitative fluorescence in situ hybridization with environmental samples.

PubMed

Zhou, Zhi; Pons, Marie Noëlle; Raskin, Lutgarde; Zilles, Julie L

2007-05-01

When fluorescence in situ hybridization (FISH) analyses are performed with complex environmental samples, difficulties related to the presence of microbial cell aggregates and nonuniform background fluorescence are often encountered. The objective of this study was to develop a robust and automated quantitative FISH method for complex environmental samples, such as manure and soil. The method and duration of sample dispersion were optimized to reduce the interference of cell aggregates. An automated image analysis program that detects cells from 4',6'-diamidino-2-phenylindole (DAPI) micrographs and extracts the maximum and mean fluorescence intensities for each cell from corresponding FISH images was developed with the software Visilog. Intensity thresholds were not consistent even for duplicate analyses, so alternative ways of classifying signals were investigated. In the resulting method, the intensity data were divided into clusters using fuzzy c-means clustering, and the resulting clusters were classified as target (positive) or nontarget (negative). A manual quality control confirmed this classification. With this method, 50.4, 72.1, and 64.9% of the cells in two swine manure samples and one soil sample, respectively, were positive as determined with a 16S rRNA-targeted bacterial probe (S-D-Bact-0338-a-A-18). Manual counting resulted in corresponding values of 52.3, 70.6, and 61.5%, respectively. In two swine manure samples and one soil sample 21.6, 12.3, and 2.5% of the cells were positive with an archaeal probe (S-D-Arch-0915-a-A-20), respectively. Manual counting resulted in corresponding values of 22.4, 14.0, and 2.9%, respectively. This automated method should facilitate quantitative analysis of FISH images for a variety of complex environmental samples.
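The clustering step described in this abstract can be illustrated with a minimal fuzzy c-means sketch in plain Python. This is not the authors' Visilog program: the function name `fuzzy_c_means_1d`, the fuzzifier `m = 2`, the random initialisation, and the rule that the cluster with the higher centre is called "positive" are all assumptions made for illustration, and the intensity values are invented.

```python
import random

def fuzzy_c_means_1d(values, n_clusters=2, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Fuzzy c-means on 1-D data; returns (centers, memberships)."""
    rng = random.Random(seed)
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in values:
        row = [rng.random() for _ in range(n_clusters)]
        total = sum(row)
        u.append([r / total for r in row])
    centers = [0.0] * n_clusters
    for _ in range(n_iter):
        # Centre update: mean of the data weighted by membership**m.
        for j in range(n_clusters):
            w = [u[i][j] ** m for i in range(len(values))]
            centers[j] = sum(wi * v for wi, v in zip(w, values)) / sum(w)
        # Membership update: proportional to distance**(-2 / (m - 1)).
        new_u = []
        for v in values:
            inv = [max(abs(v - c), 1e-12) ** (-2.0 / (m - 1.0)) for c in centers]
            total = sum(inv)
            new_u.append([x / total for x in inv])
        shift = max(abs(a - b) for ra, rb in zip(new_u, u) for a, b in zip(ra, rb))
        u = new_u
        if shift < tol:
            break
    return centers, u

# Hypothetical per-cell mean fluorescence intensities (invented values).
intensities = [9.0, 10.0, 11.0, 99.0, 100.0, 101.0]
centers, memberships = fuzzy_c_means_1d(intensities)
positive_cluster = centers.index(max(centers))  # assumption: brighter cluster = target
is_positive = [row.index(max(row)) == positive_cluster for row in memberships]
```

Unlike a fixed intensity threshold, the memberships adapt to each image's own intensity distribution, which is the property the abstract relies on when thresholds were inconsistent between duplicate analyses.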
A new strategy for earthquake focal mechanisms using waveform-correlation-derived relative polarities and cluster analysis: Application to the 2014 Long Valley Caldera earthquake swarm

USGS Publications Warehouse

Shelly, David R.; Hardebeck, Jeanne L.; Ellsworth, William L.; Hill, David P.

2016-01-01

In microseismicity analyses, reliable focal mechanisms can typically be obtained for only a small subset of located events. We address this limitation here, presenting a framework for determining robust focal mechanisms for entire populations of very small events. To achieve this, we resolve relative P and S wave polarities between pairs of waveforms by using their signed correlation coefficients—a by-product of previously performed precise earthquake relocation. We then use cluster analysis to group events with similar patterns of polarities across the network. Finally, we apply a standard mechanism inversion to the grouped data, using either catalog or correlation-derived P wave polarity data sets. This approach has great potential for enhancing analyses of spatially concentrated microseismicity such as earthquake swarms, mainshock-aftershock sequences, and industrial reservoir stimulation or injection-induced seismic sequences. To demonstrate its utility, we apply this technique to the 2014 Long Valley Caldera earthquake swarm. In our analysis, 85% of the events (7212 out of 8494 located by Shelly et al. [2016]) fall within five well-constrained mechanism clusters, more than 12 times the number with network-determined mechanisms. Of the earthquakes we characterize, 3023 (42%) have magnitudes smaller than 0.0. We find that mechanism variations are strongly associated with corresponding hypocentral structure, yet mechanism heterogeneity also occurs where it cannot be resolved by hypocentral patterns, often confined to small-magnitude events. Small (5–20°) rotations between mechanism orientations and earthquake location trends persist when we apply 3-D velocity models and might reflect a geometry of en echelon, interlinked shear, and dilational faulting.
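The idea of reading relative polarity from a signed correlation coefficient can be sketched as follows. This is an illustrative simplification, not the authors' processing chain: the function name `signed_peak_correlation`, the lag search range, and the toy waveforms are assumptions.

```python
import math

def signed_peak_correlation(a, b, max_lag):
    """Return the normalised cross-correlation value of largest magnitude
    over integer lags in [-max_lag, max_lag]. Its sign indicates whether
    the two waveforms have the same (+) or opposite (-) polarity."""
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        # Sum products over the samples where both traces overlap at this lag.
        num = sum(a[i] * b[i + lag] for i in range(len(a)) if 0 <= i + lag < len(b))
        cc = num / norm
        if abs(cc) > abs(best):
            best = cc
    return best

# Toy example: trace_b is trace_a delayed by one sample and flipped in sign.
trace_a = [0.0, 1.0, 3.0, -2.0, -1.0, 0.0, 0.0, 0.0]
trace_b = [0.0, 0.0, -1.0, -3.0, 2.0, 1.0, 0.0, 0.0]
cc = signed_peak_correlation(trace_a, trace_b, max_lag=2)
relative_polarity = 1 if cc > 0 else -1
```

Collecting such signs for many event pairs at each station gives the polarity patterns that the abstract then groups by cluster analysis.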
Time-dependent risks of cancer clustering among couples: a nationwide population-based cohort study in Taiwan.

PubMed

Wang, Jong-Yi; Liang, Yia-Wen; Yeh, Chun-Chen; Liu, Chiu-Shong; Wang, Chen-Yu

2018-02-21

Spousal clustering of cancer warrants attention. Whether the common environment or high-age vulnerability determines cancer clustering is unclear. The risk of clustering in couples versus non-couples is undetermined. The time to cancer clustering after the first cancer diagnosis is yet to be reported. This study investigated cancer clustering over time among couples by using nationwide data. A cohort of 5643 married couples in the 2002-2013 Taiwan National Health Insurance Research Database was identified and randomly matched with 5643 non-couple pairs through dual propensity score matching. Factors associated with clustering (both spouses with tumours) were analysed by using the Cox proportional hazard model. Propensity-matched analysis revealed that the risk of clustering of all tumours among couples (13.70%) was significantly higher than that among non-couples (11.84%) (OR=1.182, 95% CI 1.058 to 1.321, P=0.0031). The median time to clustering of all tumours and of malignant tumours was 2.92 and 2.32 years, respectively. Risk characteristics associated with clustering included high age and comorbidity. Shared environmental factors among spouses might be linked to a high incidence of cancer clustering. Cancer incidence in one spouse may signal cancer vulnerability in the other spouse. Promoting family-oriented cancer care in vulnerable families and preventing shared lifestyle risk factors for cancer are suggested. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
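As a quick arithmetic check, the reported odds ratio can be recomputed from the two clustering percentages quoted in the abstract. The helper name `odds_ratio` is an assumption, and the inputs are the rounded published percentages, so only agreement to about three decimals should be expected.

```python
def odds_ratio(p_group: float, p_reference: float) -> float:
    """Odds ratio of two proportions: [p1 / (1 - p1)] / [p0 / (1 - p0)]."""
    odds_group = p_group / (1.0 - p_group)
    odds_reference = p_reference / (1.0 - p_reference)
    return odds_group / odds_reference

# Clustering of all tumours: 13.70% among couples, 11.84% among non-couples.
or_couples = odds_ratio(0.1370, 0.1184)
print(round(or_couples, 3))
```

The result agrees with the OR of 1.182 stated in the abstract, which suggests the published figure is an unadjusted odds ratio on the matched sample.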
Developing appropriate methods for cost-effectiveness analysis of cluster randomized trials.

PubMed

Gomes, Manuel; Ng, Edmond S-W; Grieve, Richard; Nixon, Richard; Carpenter, James; Thompson, Simon G

2012-01-01

Cost-effectiveness analyses (CEAs) may use data from cluster randomized trials (CRTs), where the unit of randomization is the cluster, not the individual. However, most studies use analytical methods that ignore clustering. This article compares alternative statistical methods for accommodating clustering in CEAs of CRTs. Our simulation study compared the performance of statistical methods for CEAs of CRTs with 2 treatment arms. The study considered a method that ignored clustering--seemingly unrelated regression (SUR) without a robust standard error (SE)--and 4 methods that recognized clustering--SUR and generalized estimating equations (GEEs), both with robust SE, a "2-stage" nonparametric bootstrap (TSB) with shrinkage correction, and a multilevel model (MLM). The base case assumed CRTs with moderate numbers of balanced clusters (20 per arm) and normally distributed costs. Other scenarios included CRTs with few clusters, imbalanced cluster sizes, and skewed costs. Performance was reported as bias, root mean squared error (rMSE), and confidence interval (CI) coverage for estimating incremental net benefits (INBs). We also compared the methods in a case study. Each method reported low levels of bias. Without the robust SE, SUR gave poor CI coverage (base case: 0.89 v. nominal level: 0.95). The MLM and TSB performed well in each scenario (CI coverage, 0.92-0.95). With few clusters, the GEE and SUR (with robust SE) had coverage below 0.90. In the case study, the mean INBs were similar across all methods, but ignoring clustering underestimated statistical uncertainty and the value of further research. MLMs and the TSB are appropriate analytical methods for CEAs of CRTs with the characteristics described. SUR and GEE are not recommended for studies with few clusters.
A Multivariate Model and Analysis of Competitive Strategy in the U.S. Hardwood Lumber Industry

Treesearch

Robert J. Bush; Steven A. Sinclair

1991-01-01

Business-level competitive strategy in the hardwood lumber industry was modeled through the identification of strategic groups among large U.S. hardwood lumber producers. Strategy was operationalized using a measure based on the variables developed by Dess and Davis (1984). Factor and cluster analyses were used to define strategic groups along the dimensions of cost...
Attitudes to Agricultural Policy and Farming Futures in the Context of the 2003 CAP Reform: A Comparison of Farmers in Selected Established and New Member States

ERIC Educational Resources Information Center

Gorton, Matthew; Douarin, Elodie; Davidova, Sophia; Latruffe, Laure

2008-01-01

Farmers' attitudes, to agricultural production, diversification and policy support, and behavioural intentions in five Member States of the EU (France, Lithuania, Slovakia, Sweden, England) are analysed comparatively. Groups of farmers with similarly held attitudes are identified using cluster analysis to investigate whether differences in…
Rising prevalence of non-B HIV-1 subtypes in North Carolina and evidence for local onward transmission.

PubMed

Dennis, Ann M; Hué, Stephane; Learner, Emily; Sebastian, Joseph; Miller, William C; Eron, Joseph J

2017-01-01

HIV-1 diversity is increasing in North American and European cohorts which may have public health implications. However, little is known about non-B subtype diversity in the southern United States, despite the region being the epicenter of the nation's epidemic. We characterized HIV-1 diversity and transmission clusters to identify the extent to which non-B strains are transmitted locally. We conducted cross-sectional analyses of HIV-1 partial pol sequences collected from 1997 to 2014 from adults accessing routine clinical care in North Carolina (NC). Subtypes were evaluated using COMET and phylogenetic analysis. Putative transmission clusters were identified using maximum-likelihood trees. Clusters involving non-B strains were confirmed and their dates of origin were estimated using Bayesian phylogenetics. Data were combined with demographic information collected at the time of sample collection and country of origin for a subset of patients. Among 24,972 sequences from 15,246 persons, the non-B subtype prevalence increased from 0% to 3.46% over the study period. Of 325 persons with non-B subtypes, diversity was high with over 15 pure subtypes and recombinants; subtype C (28.9%) and CRF02_AG (24.0%) were most common. While identification of transmission clusters was lower for persons with non-B versus B subtypes, several local transmission clusters (≥3 persons) involving non-B subtypes were identified and all were presumably due to heterosexual transmission. Prevalence of non-B subtype diversity remains low in NC but a statistically significant rise was identified over time which likely reflects multiple importation. However, the combined phylogenetic clustering analysis reveals evidence for local onward transmission. Detection of these non-B clusters suggests heterosexual transmission and may guide diagnostic and prevention interventions.
Projected alignment of non-sphericities of stellar, gas, and dark matter distributions in galaxy clusters: analysis of the Horizon-AGN simulation

NASA Astrophysics Data System (ADS)

Okabe, Taizo; Nishimichi, Takahiro; Oguri, Masamune; Peirani, Sébastien; Kitayama, Tetsu; Sasaki, Shin; Suto, Yasushi

2018-07-01

While various observations measured ellipticities of galaxy clusters and alignments between orientations of the brightest cluster galaxies and their host clusters, there are only a handful of numerical simulations that implement realistic baryon physics to allow direct comparisons with those observations. Here, we investigate ellipticities of galaxy clusters and alignments between various components of them and the central galaxies in the state-of-the-art cosmological hydrodynamical simulation Horizon-AGN, which contains dark matter, stellar, and gas components in a large simulation box of (100 h^-1 Mpc)^3 with high spatial resolution (~1 kpc). We estimate ellipticities of total matter, dark matter, stellar, gas surface mass density distributions, X-ray surface brightness, and the Compton y-parameter of the Sunyaev-Zel'dovich effect, as well as alignments between these components and the central galaxies for 120 projected images of galaxy clusters with masses M_200 > 5 × 10^13 M⊙. Our results indicate that the distributions of these components are well aligned with the major axes of the central galaxies, with the root-mean-square value of differences of their position angles of ~20°, which vary little from inner to the outer regions. We also estimate alignments of these various components with total matter distributions, and find tighter alignments than those for central galaxies with the root-mean-square value of ~15°. We compare our results with previous observations of ellipticities and position angle alignments and find reasonable agreements. The comprehensive analysis presented in this paper provides useful prior information for analysing stacked lensing signals as well as designing future observations to study ellipticities and alignments of galaxy clusters.
A multitask clustering approach for single-cell RNA-seq analysis in Recessive Dystrophic Epidermolysis Bullosa

PubMed Central

Petegrosso, Raphael; Tolar, Jakub

2018-01-01

Single-cell RNA sequencing (scRNA-seq) has been widely applied to discover new cell types by detecting sub-populations in a heterogeneous group of cells. Since scRNA-seq experiments have lower read coverage/tag counts and introduce more technical biases compared to bulk RNA-seq experiments, the limited number of sampled cells combined with the experimental biases and other dataset specific variations presents a challenge to cross-dataset analysis and discovery of relevant biological variations across multiple cell populations. In this paper, we introduce a method of variance-driven multitask clustering of single-cell RNA-seq data (scVDMC) that utilizes multiple single-cell populations from biological replicates or different samples. scVDMC clusters single cells in multiple scRNA-seq experiments of similar cell types and markers but varying expression patterns such that the scRNA-seq data are better integrated than typical pooled analyses which only increase the sample size. By controlling the variance among the cell clusters within each dataset and across all the datasets, scVDMC detects cell sub-populations in each individual experiment with shared cell-type markers but varying cluster centers among all the experiments. Applied to two real scRNA-seq datasets with several replicates and one large-scale droplet-based dataset on three patient samples, scVDMC more accurately detected cell populations and known cell markers than pooled clustering and other recently proposed scRNA-seq clustering methods. In the case study applied to in-house Recessive Dystrophic Epidermolysis Bullosa (RDEB) scRNA-seq data, scVDMC revealed several new cell types and unknown markers validated by flow cytometry. MATLAB/Octave code available at https://github.com/kuanglab/scVDMC. PMID:29630593
Characterizing the spatial distribution of brown marmorated stink bug, Halyomorpha halys Stål (Hemiptera: Pentatomidae), populations in peach orchards

PubMed Central

Hahn, Noel G.

2017-01-01

Geospatial analyses were used to investigate the spatial distribution of populations of Halyomorpha halys, an important invasive agricultural pest in mid-Atlantic peach orchards. This spatial analysis will improve efficiency by allowing growers and farm managers to predict insect arrangement and target management strategies. Data on the presence of H. halys were collected from five peach orchards at four farms in New Jersey from 2012–2014 located in different land-use contexts. A point pattern analysis, using Ripley’s K function, was used to describe clustering of H. halys. In addition, the clustering of damage indicative of H. halys feeding was described. With low populations early in the growing season, H. halys did not exhibit signs of clustering in the orchards at most distances. At sites with low populations throughout the season, clustering was not apparent. However, later in the season, high infestation levels led to more evident clustering of H. halys. Damage, although present throughout the entire orchard, was found at low levels. When looking at trees with greater than 10% fruit damage, damage was shown to cluster in orchards. The Moran’s I statistic showed that spatial autocorrelation of H. halys was present within the orchards on the August sample dates, in relation to both population density and levels of damage. Kriging the abundance of H. halys and the severity of damage to peaches revealed that the estimations of these are generally found in the same region of the orchards. This information on the clustering of H. halys populations will be useful to help predict presence of insects for use in management or scouting programs. PMID:28362797
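For readers unfamiliar with the Moran's I statistic mentioned above, a minimal sketch of the standard global formula follows. The function name `morans_i`, the binary neighbour weights, and the toy counts along a single row of four trees are assumptions for illustration and are not taken from the study.

```python
def morans_i(values, weights):
    """Global Moran's I: (n / W) * sum_ij w_ij * z_i * z_j / sum_i z_i**2,
    where z_i are deviations from the mean and W is the sum of all weights."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / w_total) * cross / sum(d * d for d in z)

# Four trees in a row; adjacent trees are neighbours (binary weights).
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([1, 1, 5, 5], chain)    # similar counts sit side by side
alternating = morans_i([1, 5, 1, 5], chain)  # high and low counts alternate
```

Positive values indicate that neighbouring trees carry similar counts (clustering), while negative values indicate that high and low counts tend to alternate.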
Sequence analyses reveal that a TPR-DP module, surrounded by recombinable flanking introns, could be at the origin of eukaryotic Hop and Hip TPR-DP domains and prokaryotic GerD proteins.

PubMed

Hernández Torres, Jorge; Papandreou, Nikolaos; Chomilier, Jacques

2009-05-01

The co-chaperone Hop [heat shock protein (HSP) organising protein] is known to bind both Hsp70 and Hsp90. Hop comprises three repeats of a tetratricopeptide repeat (TPR) domain, each consisting of three TPR motifs. The first and last TPR domains are followed by a domain containing several dipeptide (DP) repeats called the DP domain. These analyses suggest that the hop genes result from successive recombination events of an ancestral TPR-DP module. From a hydrophobic cluster analysis of homologous Hop protein sequences derived from gene families, we can postulate that shifts in the open reading frames are at the origin of the present sequences. Moreover, these shifts can be related to the presence or absence of biological function. We propose to extend the family of Hop co-chaperones into the kingdom of bacteria, as several structurally related genes have been identified by hydrophobic cluster analysis. We also provide evidence of common structural characteristics between hop and hip genes, suggesting a shared precursor of ancestral TPR-DP domains.
Analysis of Patent Databases Using VxInsight

DOE Office of Scientific and Technical Information (OSTI.GOV)

BOYACK,KEVIN W.; WYLIE,BRIAN N.; DAVIDSON,GEORGE S.

2000-12-12

We present the application of a new knowledge visualization tool, VxInsight, to the mapping and analysis of patent databases. Patent data are mined and placed in a database, relationships between the patents are identified, primarily using the citation and classification structures, then the patents are clustered using a proprietary force-directed placement algorithm. Related patents cluster together to produce a 3-D landscape view of the tens of thousands of patents. The user can navigate the landscape by zooming into or out of regions of interest. Querying the underlying database places a colored marker on each patent matching the query. Automatically generated labels, showing landscape content, update continually upon zooming. Optionally, citation links between patents may be shown on the landscape. The combination of these features enables powerful analyses of patent databases.
The evolution of Lachancea thermotolerans is driven by geographical determination, anthropisation and flux between different ecosystems

PubMed Central

Bely, Marina; Masneuf-Pomarede, Isabelle; Jiranek, Vladimir; Albertin, Warren

2017-01-01

The yeast Lachancea thermotolerans (formerly Kluyveromyces thermotolerans) is a species with remarkable, yet underexplored, biotechnological potential. This ubiquist occupies a range of natural and anthropic habitats covering a wide geographic span. To gain an insight into L. thermotolerans population diversity and structure, 172 isolates sourced from diverse habitats worldwide were analysed using a set of 14 microsatellite markers. The resultant clustering revealed that the evolution of L. thermotolerans has been driven by the geography and ecological niche of the isolation sources. Isolates originating from anthropic environments, in particular grapes and wine, were genetically close, thus suggesting domestication events within the species. The observed clustering was further validated by several means, including population structure analysis, F-statistics, Mantel’s test and the analysis of molecular variance (AMOVA). Phenotypic performance of isolates was tested using several growth substrates and physicochemical conditions, providing added support for the clustering. Altogether, this study sheds light on the genotypic and phenotypic diversity of L. thermotolerans, contributing to a better understanding of the population structure, ecology and evolution of this non-Saccharomyces yeast. PMID:28910346
Cluster analysis of Pinus taiwanensis for its ex situ conservation in China.

PubMed

Gao, X; Shi, L; Wu, Z

2015-06-01

Pinus taiwanensis Hayata is one of the most famous sights in the Huangshan Scenic Resort, China, because of its strong adaptability and ability to survive; however, this endemic species is currently under threat in China. Relationships between different P. taiwanensis populations have been well-documented; however, few studies have been conducted on how to protect this rare pine. In the present study, we propose the ex situ conservation of this species using geographical information system (GIS) cluster and genetic diversity analyses. The GIS cluster method was conducted as a preliminary analysis for establishing a sampling site category based on climatic factors. Genetic diversity was analyzed using morphological and genetic traits. By combining geographical information with genetic data, we demonstrate that growing conditions, morphological traits, and the genetic make-up of the population in the Huangshan Scenic Resort were most similar to conditions on Tianmu Mountain. Therefore, we suggest that Tianmu Mountain is the best choice for the ex situ conservation of P. taiwanensis. Our results provide a molecular basis for the sustainable management, utilization, and conservation of this species in Huangshan Scenic Resort.
Patterns of comorbidity in community-dwelling older people hospitalised for fall-related injury: A cluster analysis

PubMed Central

2011-01-01

Background Community-dwelling older people aged 65+ years sustain falls frequently; these can result in physical injuries necessitating medical attention including emergency department care and hospitalisation. Certain health conditions and impairments have been shown to contribute independently to the risk of falling or experiencing a fall injury, suggesting that individuals with these conditions or impairments should be the focus of falls prevention. Since older people commonly have multiple conditions/impairments, knowledge about which conditions/impairments coexist in at-risk individuals would be valuable in the implementation of a targeted prevention approach. The objective of this study was therefore to examine the prevalence and patterns of comorbidity in this population group. Methods We analysed hospitalisation data from Victoria, Australia's second most populous state, to estimate the prevalence of comorbidity in patients hospitalised at least once between 2005-6 and 2007-8 for treatment of acute fall-related injuries. In patients with two or more comorbid conditions (multicomorbidity) we used an agglomerative hierarchical clustering method to cluster comorbidity variables and identify constellations of conditions. Results More than one in four patients had at least one comorbid condition and among patients with comorbidity one in three had multicomorbidity (range 2-7). The prevalence of comorbidity varied by gender, age group, ethnicity and injury type; it was also associated with a significant increase in the average cumulative length of stay per patient. The cluster analysis identified five distinct, biologically plausible clusters of comorbidity: cardiopulmonary/metabolic, neurological, sensory, stroke and cancer. The cardiopulmonary/metabolic cluster was the largest cluster among the clusters identified. 
Conclusions The consequences of comorbidity clustering in terms of falls and/or injury outcomes of hospitalised patients should be investigated by future studies. Our findings have particular relevance for falls prevention strategies, clinical practice and planning of follow-up services for these patients. PMID:21851627
Cluster Analysis on Longitudinal Data of Patients with Adult-Onset Asthma.

PubMed

Ilmarinen, Pinja; Tuomisto, Leena E; Niemelä, Onni; Tommola, Minna; Haanpää, Jussi; Kankaanranta, Hannu

Previous cluster analyses on asthma are based on cross-sectional data. The objective was to identify phenotypes of adult-onset asthma by using data from baseline (diagnostic) and 12-year follow-up visits. The Seinäjoki Adult Asthma Study is a 12-year follow-up study of patients with new-onset adult asthma. K-means cluster analysis was performed by using variables from baseline and follow-up visits on 171 patients to identify phenotypes. Five clusters were identified. Patients in cluster 1 (n = 38) were predominantly nonatopic males with moderate smoking history at baseline. At follow-up, 40% of these patients had developed persistent obstruction but the number of patients with uncontrolled asthma (5%) and rhinitis (10%) was the lowest. Cluster 2 (n = 19) was characterized by older men with heavy smoking history, poor lung function, and persistent obstruction at baseline. At follow-up, these patients were mostly uncontrolled (84%) despite daily use of inhaled corticosteroid (ICS) with add-on therapy. Cluster 3 (n = 50) consisted mostly of nonsmoking females with good lung function at diagnosis/follow-up and well-controlled/partially controlled asthma at follow-up. Cluster 4 (n = 25) had obese and symptomatic patients at baseline/follow-up. At follow-up, these patients had several comorbidities (40% psychiatric disease) and were treated daily with ICS and add-on therapy. Patients in cluster 5 (n = 39) were mostly atopic and had the earliest onset of asthma, the highest blood eosinophils, and FEV1 reversibility at diagnosis. At follow-up, these patients used the lowest ICS dose but 56% were well controlled. Results can be used to predict outcomes of patients with adult-onset asthma and to aid in development of personalized therapy (NCT02733016 at ClinicalTrials.gov). Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
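For readers unfamiliar with k-means, the method named in the abstract above, a minimal Python sketch of Lloyd's algorithm follows. The two-feature "patients", the choice of two clusters, and the starting centroids are all hypothetical; the study's variable selection, scaling, and software are not described here, so this shows only the general technique.

```python
def kmeans(points, centroids, iterations=20):
    """Lloyd's algorithm: alternate point assignment and centroid update."""
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda k: sum((p - c) ** 2
                                  for p, c in zip(point, centroids[k])))
            for point in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels, centroids


# Hypothetical standardized features per patient (two made-up variables).
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
          (3.0, 3.1), (3.2, 2.9), (2.9, 3.0)]
labels, centroids = kmeans(points, [points[0], points[3]])
print(labels)     # [0, 0, 0, 1, 1, 1]
print(centroids)
```

In practice the result depends on the starting centroids and on how variables are scaled, so analyses of this kind usually standardize inputs and repeat the fit from several initializations.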
Geotemporal Analysis of Neisseria meningitidis Clones in the United States: 2000–2005

PubMed Central

Wiringa, Ann E.; Shutt, Kathleen A.; Marsh, Jane W.; Cohn, Amanda C.; Messonnier, Nancy E.; Zansky, Shelley M.; Petit, Susan; Farley, Monica M.; Gershman, Ken; Lynfield, Ruth; Reingold, Arthur; Schaffner, William; Thompson, Jamie; Brown, Shawn T.; Lee, Bruce Y.; Harrison, Lee H.

2013-01-01

Background The detection of meningococcal outbreaks relies on serogrouping and epidemiologic definitions. Advances in molecular epidemiology have improved the ability to distinguish unique Neisseria meningitidis strains, enabling the classification of isolates into clones. Around 98% of meningococcal cases in the United States are believed to be sporadic. Methods Meningococcal isolates from 9 Active Bacterial Core surveillance sites throughout the United States from 2000 through 2005 were classified according to serogroup, multilocus sequence typing, and outer membrane protein (porA, porB, and fetA) genotyping. Clones were defined as isolates that were indistinguishable according to this characterization. Case data were aggregated to the census tract level and all non-singleton clones were assessed for non-random spatial and temporal clustering using retrospective space-time analyses with a discrete Poisson probability model. Results Among 1,062 geocoded cases with available isolates, 438 unique clones were identified, 78 of which had ≥2 isolates. 702 cases were attributable to non-singleton clones, accounting for 66.0% of all geocoded cases. 32 statistically significant clusters comprised of 107 cases (10.1% of all geocoded cases) were identified. Clusters had the following attributes: included 2 to 11 cases; 1 day to 33 months duration; radius of 0 to 61.7 km; and attack rate of 0.7 to 57.8 cases per 100,000 population. Serogroups represented among the clusters were: B (n = 12 clusters, 45 cases), C (n = 11 clusters, 27 cases), and Y (n = 9 clusters, 35 cases); 20 clusters (62.5%) were caused by serogroups represented in meningococcal vaccines that are commercially available in the United States. Conclusions Around 10% of meningococcal disease cases in the U.S. could be assigned to a geotemporal cluster. 
Molecular characterization of isolates, combined with geotemporal analysis, is a useful tool for understanding the spread of virulent meningococcal clones and patterns of transmission in populations. PMID:24349182
Patterns of comorbidity in community-dwelling older people hospitalised for fall-related injury: a cluster analysis.

PubMed

Vu, Trang; Finch, Caroline F; Day, Lesley

2011-08-18

Community-dwelling older people aged 65+ years sustain falls frequently; these can result in physical injuries necessitating medical attention including emergency department care and hospitalisation. Certain health conditions and impairments have been shown to contribute independently to the risk of falling or experiencing a fall injury, suggesting that individuals with these conditions or impairments should be the focus of falls prevention. Since older people commonly have multiple conditions/impairments, knowledge about which conditions/impairments coexist in at-risk individuals would be valuable in the implementation of a targeted prevention approach. The objective of this study was therefore to examine the prevalence and patterns of comorbidity in this population group. We analysed hospitalisation data from Victoria, Australia's second most populous state, to estimate the prevalence of comorbidity in patients hospitalised at least once between 2005-6 and 2007-8 for treatment of acute fall-related injuries. In patients with two or more comorbid conditions (multicomorbidity) we used an agglomerative hierarchical clustering method to cluster comorbidity variables and identify constellations of conditions. More than one in four patients had at least one comorbid condition and among patients with comorbidity one in three had multicomorbidity (range 2-7). The prevalence of comorbidity varied by gender, age group, ethnicity and injury type; it was also associated with a significant increase in the average cumulative length of stay per patient. The cluster analysis identified five distinct, biologically plausible clusters of comorbidity: cardiopulmonary/metabolic, neurological, sensory, stroke and cancer. The cardiopulmonary/metabolic cluster was the largest cluster among the clusters identified. The consequences of comorbidity clustering in terms of falls and/or injury outcomes of hospitalised patients should be investigated by future studies. 
Our findings have particular relevance for falls prevention strategies, clinical practice and planning of follow-up services for these patients.

Type 2 diabetes mellitus: distribution of genetic markers in Kazakh population.

PubMed

Sikhayeva, Nurgul; Talzhanov, Yerkebulan; Iskakova, Aisha; Dzharmukhanov, Jarkyn; Nugmanova, Raushan; Zholdybaeva, Elena; Ramanculov, Erlan

2018-01-01

Ethnic differences exist in the frequencies of genetic variations that contribute to the risk of common disease. This study aimed to analyse the distribution of several genes, previously associated with susceptibility to type 2 diabetes and obesity-related phenotypes, in a Kazakh population. A total of 966 individuals belonging to the Kazakh ethnicity were recruited from an outpatient clinic. We genotyped 41 common single nucleotide polymorphisms (SNPs) previously associated with type 2 diabetes in other ethnic groups and 31 of these were in Hardy-Weinberg equilibrium. The obtained allele frequencies were further compared to publicly available data from other ethnic populations. Allele frequencies for other (compared) populations were pooled from the haplotype map (HapMap) database. Principal component analysis (PCA), cluster analysis, and multidimensional scaling (MDS) were used for the analysis of genetic relationship between the populations. Comparative analysis of allele frequencies of the studied SNPs showed significant differentiation among the studied populations. The Kazakh population was grouped with Asian populations according to the cluster analysis and with the Caucasian populations according to PCA. According to MDS, results of the current study show that the Kazakh population holds an intermediate position between Caucasian and Asian populations. A high percentage of population differentiation was observed between Kazakh and world populations. The Kazakh population was clustered with Caucasian populations, and this result may indicate a significant Caucasian component in the Kazakh gene pool.
Factors that cause genotype by environment interaction and use of a multiple-trait herd-cluster model for milk yield of Holstein cattle from Brazil and Colombia.

PubMed

Cerón-Muñoz, M F; Tonhati, H; Costa, C N; Rojas-Sarmiento, D; Echeverri Echeverri, D M

2004-08-01

Descriptive herd variables (DVHE) were used to explain genotype by environment interactions (G x E) for milk yield (MY) in Brazilian and Colombian production environments and to develop a herd-cluster model to estimate covariance components and genetic parameters for each herd environment group. Data consisted of 180,522 lactation records of 94,558 Holstein cows from 937 Brazilian and 400 Colombian herds. Herds in both countries were jointly grouped in thirds according to 8 DVHE: production level, phenotypic variability, age at first calving, calving interval, percentage of imported semen, lactation length, and herd size. For each DVHE, REML bivariate animal model analyses were used to estimate genetic correlations for MY between upper and lower thirds of the data. Based on estimates of genetic correlations, weights were assigned to each DVHE to group herds in a cluster analysis using the FASTCLUS procedure in SAS. Three clusters were defined, and genetic and residual variance components were heterogeneous among herd clusters. Estimates of heritability in clusters 1 and 3 were 0.28 and 0.29, respectively, but the estimate was larger (0.39) in Cluster 2. The genetic correlations of MY from different clusters ranged from 0.89 to 0.97. The herd-cluster model based on DVHE properly takes into account G x E by grouping similar environments accordingly and seems to be an alternative to simply considering country borders to distinguish between environments.
[Space-time suicide clustering in the community of Antequera (Spain)].

PubMed

Pérez-Costillas, Lucía; Blasco-Fontecilla, Hilario; Benítez, Nicolás; Comino, Raquel; Antón, José Miguel; Ramos-Medina, Valentín; Lopez, Amalia; Palomo, José Luis; Madrigal, Lucía; Alcalde, Javier; Perea-Millá, Emilio; Artieda-Urrutia, Paula; de León-Martínez, Victoria; de Diego Otero, Yolanda

2015-01-01

Approximately 3,500 people commit suicide every year in Spain. The main aim of this study is to explore if a spatial and temporal clustering of suicide exists in the region of Antequera (Málaga, Spain). Sample and procedure: All suicides from January 1, 2004 to December 31, 2008 were identified using data from the Forensic Pathology Department of the Institute of Legal Medicine, Málaga (Spain). Geolocalisation: Google Earth was used to calculate the coordinates for each suicide decedent's address. Statistical analysis: A spatiotemporal permutation scan statistic and the Ripley's K function were used to explore spatiotemporal clustering. Pearson's chi-squared was used to determine whether there were differences between suicides inside and outside the spatiotemporal clusters. A total of 120 individuals committed suicide within the region of Antequera, of which 96 (80%) were included in our analyses. Statistically significant evidence for 7 spatiotemporal suicide clusters emerged within critical limits for the 0-2.5 km distance and for the first and second weeks (P<.05 in both cases) after suicide. There was not a single subject diagnosed with a current psychotic disorder among suicides within clusters, whereas outside clusters, 20% had this diagnosis (χ²=4.13; df=1; P<.05). There are spatiotemporal suicide clusters in the area surrounding Antequera. Patients diagnosed with current psychotic disorder are less likely to be influenced by the factors explaining suicide clustering. Copyright © 2013 SEP y SEPB. Published by Elsevier España. All rights reserved.
A singular value decomposition approach for improved taxonomic classification of biological sequences

PubMed Central

2011-01-01

Background Singular value decomposition (SVD) is a powerful technique for information retrieval; it helps uncover relationships between elements that are not prima facie related. SVD was initially developed to reduce the time needed for information retrieval and analysis of very large data sets in the complex internet environment. Since information retrieval from large-scale genome and proteome data sets has a similar level of complexity, SVD-based methods could also facilitate data analysis in this research area. Results We found that SVD applied to amino acid sequences demonstrates relationships and provides a basis for producing clusters and cladograms, demonstrating evolutionary relatedness of species that correlates well with Linnaean taxonomy. The choice of a reasonable number of singular values is crucial for SVD-based studies. We found that fewer singular values are needed to produce biologically significant clusters when SVD is employed. Subsequently, we developed a method to determine the lowest number of singular values and fewest clusters needed to guarantee biological significance; this system was developed and validated by comparison with Linnaean taxonomic classification. Conclusions By using SVD, we can reduce uncertainty concerning the appropriate rank value necessary to perform accurate information retrieval analyses. In tests, clusters that we developed with SVD perfectly matched what was expected based on Linnaean taxonomy. PMID:22369633
3D Nearest Neighbour Search Using a Clustered Hierarchical Tree Structure

NASA Astrophysics Data System (ADS)

Suhaibah, A.; Uznir, U.; Anton, F.; Mioc, D.; Rahman, A. A.

2016-06-01

Locating and analysing sites for new stores or outlets is a common issue facing retailers and franchisers. The aim is to ensure that newly opened stores are strategically located to attract the highest possible number of customers. Spatial information is used to manage, maintain and analyse these store locations. However, since franchises and chain stores in urban areas operate within high-rise, multi-level buildings, a three-dimensional (3D) method is required to locate and identify surrounding information, such as the level on which a franchise unit will be located or whether a unit is on the best level for visibility. One of the analyses commonly used to retrieve surrounding information is Nearest Neighbour (NN) analysis, which takes a point location and identifies its surrounding neighbours. However, with the immense size of urban datasets, efficient retrieval and analysis of nearest neighbour information becomes more complex and more crucial. In this paper, we present a technique to retrieve nearest neighbour information in 3D space using a clustered hierarchical tree structure. Based on our findings, the proposed approach showed a substantial improvement in response time compared to existing spatial access methods in databases. The query performance was tested using a dataset consisting of 500,000 point locations of buildings and franchising units. The results are presented in this paper. Another advantage of this structure is that it offers minimal overlap and coverage among nodes, which can reduce repetitive data entry.
Genetic diversity among air yam (Dioscorea bulbifera) varieties based on single sequence repeat markers.

PubMed

Silva, D M; Siqueira, M V B M; Carrasco, N F; Mantello, C C; Nascimento, W F; Veasey, E A

2016-05-23

Dioscorea is the largest genus in the Dioscoreaceae family, and includes a number of economically important species including the air yam, D. bulbifera L. This study aimed to develop new single sequence repeat primers and characterize the genetic diversity of local varieties that originated in several municipalities of Brazil. We developed an enriched genomic library for D. bulbifera resulting in seven primers, six of which were polymorphic, and added four polymorphic loci developed for other Dioscorea species. This resulted in 10 polymorphic primers to evaluate 42 air yam accessions. Thirty-three alleles (bands) were found, with an average of 3.3 alleles per locus. The discrimination power ranged from 0.113 to 0.834, with an average of 0.595. Both principal coordinate and cluster analyses (using the Jaccard Index) failed to clearly separate the accessions according to their origins. However, the 13 accessions from Conceição dos Ouros, Minas Gerais State were clustered above zero on the principal coordinate 2 axis, and were also clustered into one subgroup in the cluster analysis. Accessions from Ubatuba, São Paulo State were clustered below zero on the same principal coordinate 2 axis, except for one accession, although they were scattered in several subgroups in the cluster analysis. Therefore, we found little spatial structure in the accessions, although those from Conceição dos Ouros and Ubatuba exhibited some spatial structure, and that there is a considerable level of genetic diversity in D. bulbifera maintained by traditional farmers in Brazil.
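The Jaccard index mentioned in the abstract above compares two accessions by the bands (alleles) they share. The following Python sketch shows the calculation on made-up presence/absence profiles; the profiles and the handling of two all-absent profiles are assumptions for illustration, not details taken from the study.

```python
def jaccard(a, b):
    """Jaccard similarity of two binary band profiles (1 = band present)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Assumption: two profiles with no bands at all are treated as identical.
    return both / either if either else 1.0


# Hypothetical presence/absence scores for five bands in three accessions.
acc1 = [1, 1, 0, 1, 0]
acc2 = [1, 0, 0, 1, 1]
acc3 = [0, 0, 1, 0, 0]

print(jaccard(acc1, acc2))  # 0.5: two shared bands out of four present in either
print(jaccard(acc1, acc3))  # 0.0: no shared bands
```

Shared absences do not count towards similarity, which is one reason this index is commonly chosen for band-scored marker data. A distance for clustering can be taken as one minus the similarity.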
A model-based cluster analysis of social experiences in clinically anxious youth: links to emotional functioning.

PubMed

Suveg, Cynthia; Jacob, Marni L; Whitehead, Monica; Jones, Anna; Kingery, Julie Newman

2014-01-01

Social difficulties are commonly associated with anxiety disorders in youth, yet are not well specified in the literature. The aim of this study was to identify patterns of social experiences in clinically anxious children and examine the associations with indices of emotional functioning. A model-based cluster analysis was conducted on parent-, teacher-, and child-reports of social experiences with 64 children, ages 7-12 years (M = 8.86 years, SD = 1.59 years; 60.3% boys; 85.7% Caucasian) with a primary diagnosis of separation anxiety disorder, social phobia, and/or generalized anxiety disorder. Follow-up analyses examined cluster differences on indices of emotional functioning. Findings yielded three clusters of social experiences that were unrelated to diagnosis: (1) Unaware Children (elevated scores on parent- and teacher-reports of social difficulties but relatively low scores on child-reports, n = 12), (2) Average Functioning (relatively average scores across all informants, n = 44), and (3) Victimized and Lonely (elevated child-reports of overt and relational victimization and loneliness and relatively low scores on parent- and teacher-reports of social difficulties, n = 8). Youth in the Unaware Children cluster were rated as more emotionally dysregulated by teachers and had a greater number of diagnoses than youth in the Average Functioning group. In contrast, the Victimized and Lonely group self-reported greater frequency of negative affect and reluctance to share emotional experiences than the Average Functioning cluster. Overall, this study demonstrates that social maladjustment in clinically anxious children can manifest in a variety of ways and assessment should include multiple informants and methods.
Pseudomonas aeruginosa in Dairy Goats: Genotypic and Phenotypic Comparison of Intramammary and Environmental Isolates

PubMed Central

Scaccabarozzi, Licia; Leoni, Livia; Ballarini, Annalisa; Barberio, Antonio; Locatelli, Clara; Casula, Antonio; Bronzo, Valerio; Pisoni, Giuliano; Jousson, Olivier; Morandi, Stefano; Rapetti, Luca; García-Fernández, Aurora; Moroni, Paolo

2015-01-01

Following the identification of a case of severe clinical mastitis in a Saanen dairy goat (goat A), an average of 26 lactating goats in the herd was monitored over a period of 11 months. Milk microbiological analysis revealed the presence of Pseudomonas aeruginosa in 7 of the goats. Among these 7 does, only goat A showed clinical signs of mastitis. The 7 P. aeruginosa isolates from the goat milk and 26 P. aeruginosa isolates from environmental samples were clustered by RAPD-PCR and PFGE analyses in 3 genotypes (G1, G2, G3) and 4 clusters (A, B, C, D), respectively. PFGE clusters A and B correlated with the G1 genotype and included the 7 milk isolates. Although it was not possible to identify the infection source, these results strongly suggest a spreading of the infection from goat A. Clusters C and D overlapped with genotypes G2 and G3, respectively, and included only environmental isolates. The outcome of the antimicrobial susceptibility test performed on the isolates revealed 2 main patterns of multiple resistance to beta-lactam antibiotics and macrolides. Virulence related phenotypes were analyzed, such as swarming and swimming motility, production of biofilm and production of secreted virulence factors. The isolates had distinct phenotypic profiles, corresponding to genotypes G1, G2 and G3. Overall, correlation analysis showed a strong correlation between sampling source, RAPD genotype, PFGE clusters, and phenotypic clusters. The comparison of the levels of virulence related phenotypes did not indicate a higher pathogenic potential in the milk isolates as compared to the environmental isolates. PMID:26606430
Psychiatrist-patient verbal and nonverbal communications during split-treatment appointments.

PubMed

Cruz, Mario; Roter, Debra; Cruz, Robyn Flaum; Wieland, Melissa; Cooper, Lisa A; Larson, Susan; Pincus, Harold Alan

2011-11-01

This study characterized psychiatrist and patient communication behaviors and affective voice tones during pharmacotherapy appointments with depressed patients at four community-based mental health clinics where psychiatrists provided medication management and other mental health professionals provided therapy ("split treatment"). Audiorecordings of 84 unique pairs of psychiatrists and patients with a depressive disorder were analyzed with the Roter Interaction Analysis System, which identifies 41 discrete speech categories that can be grouped into composites representing broad conceptual communication domains. Cluster analysis identified psychiatrist communication patterns. T test and chi square analyses compared the clusters for verbal dominance, affective voice tone, and characteristics of psychiatrist and patients. On average, 53% of psychiatrist talk was devoted to partnering and relationship building, and 67% of patient talk was about biomedical subjects, such as depression symptoms, and psychosocial information giving. Psychiatrist communication patterns were characterized by two clusters, a biomedical-centered cluster that emphasized biomedical questions (η²=.22, df=82, p<.001) and education or counseling (η²=.20, df=82, p<.001) and a patient-centered cluster focused on psychosocial and lifestyle questions (η²=.24, df=82, p<.001) and information giving (η²=.17, df=82, p<.001). The patient-centered cluster was associated with patients' expression of distress, anger, or other negative affects (t=3.22, df= 82, p=.002). Psychiatrists devoted much of their talk to partnering and relationship building while maintaining a focus on symptoms or psychosocial issues. However, patient behaviors did not reflect a similar level of partnering. Future studies should identify psychiatrist communication behaviors that activate collaborative patient communications or improve treatment outcomes.
The role of CSP in the electricity system of South Africa - technical operation, grid constraints, market structure and economics

NASA Astrophysics Data System (ADS)

Kost, Christoph; Friebertshäuser, Chris; Hartmann, Niklas; Fluri, Thomas; Nitz, Peter

2017-06-01

This paper analyses the role of solar technologies (CSP and PV) and their interaction in the South African electricity system by using a fundamental electricity system model (ENTIGRIS-SouthAfrica). The model is used to analyse the South African long-term electricity generation portfolio mix, optimized site selection and required transmission capacities until the year 2050. In particular, the location and grid integration of solar technology (PV and CSP) and wind power plants are analysed. This analysis is carried out by using detailed resource assessment of both technologies. A cluster approach is presented to reduce complexity by integrating the data in an optimization model.
A new tool called DISSECT for analysing large genomic data sets using a Big Data approach

PubMed Central

Canela-Xandri, Oriol; Law, Andy; Gray, Alan; Woolliams, John A.; Tenesa, Albert

2015-01-01

Large-scale genetic and genomic data are increasingly available and the major bottleneck in their analysis is a lack of sufficiently scalable computational tools. To address this problem in the context of complex traits analysis, we present DISSECT. DISSECT is a new and freely available software that is able to exploit the distributed-memory parallel computational architectures of compute clusters, to perform a wide range of genomic and epidemiologic analyses, which currently can only be carried out on reduced sample sizes or under restricted conditions. We demonstrate the usefulness of our new tool by addressing the challenge of predicting phenotypes from genotype data in human populations using mixed-linear model analysis. We analyse simulated traits from 470,000 individuals genotyped for 590,004 SNPs in ∼4 h using the combined computational power of 8,400 processor cores. We find that prediction accuracies in excess of 80% of the theoretical maximum could be achieved with large sample sizes. PMID:26657010
SSR analysis of genetic diversity and structure of the germplasm of faba bean (Vicia faba L.).

PubMed

El-Esawi, Mohamed A

Assessing the diversity and genetic structure of faba bean (Vicia faba L.) germplasm is essential to improve the quality and yield of this economically important crop. In this study, simple sequence repeats (SSRs) were utilized to evaluate the diversity and structure of 35 faba bean genotypes originating from three different geographical regions (Northern Africa, Eastern Africa, and Near East). All 15 SSR loci generated a total of 100 alleles. The allele number per locus varied from 4 to 11, with a mean of 6.67. The expected heterozygosity (He) of SSR loci ranged between 0.51 and 0.81, with a mean of 0.63. The PIC value also varied from 0.44 to 0.78, with an average of 0.58. The expected heterozygosity of 22 faba bean genotypes was higher than the observed one. Interestingly, AMOVA analysis showed that much of the variability resided within accessions (79.2%). A highly significant difference among regions was also evidenced, and represented 5.3% of the total variation. Moreover, cluster analysis divided the 35 faba bean genotypes into two main clusters. The first main cluster comprised all faba bean genotypes originating from the Near East region, whereas the second main cluster comprised all the genotypes originating from the Northern and Eastern Africa regions, indicating that the Northern and Eastern African faba bean genotypes were more closely related to each other than to the Near East genotypes. Structure analysis also revealed that the 35 faba bean genotypes might be assigned to two populations, in complete accordance with cluster analysis data. In conclusion, this study showed high levels of diversity in the analysed genotypes of faba bean, and could be utilized in future breeding programmes to develop new cultivars of high yield. Copyright © 2017 Académie des sciences. Published by Elsevier Masson SAS. All rights reserved.
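The per-locus statistics quoted in this abstract (expected heterozygosity and PIC) are functions of allele frequencies. The abstract does not say which estimators were used, so the following is only a minimal sketch using the textbook definitions, He = 1 - Σ p_i² and PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j², without any small-sample correction. The allele frequencies are hypothetical and are not taken from the study.

```python
from itertools import combinations


def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2) for the allele frequencies at one locus."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    cross = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross


# Hypothetical locus with four equally frequent alleles (illustration only).
freqs = [0.25, 0.25, 0.25, 0.25]
he = expected_heterozygosity(freqs)
pic_value = pic(freqs)
print(he, pic_value)
```

PIC is never larger than He for the same locus, which is consistent with the ranges reported above (mean He 0.63, mean PIC 0.58).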
Defining syndromes using cattle meat inspection data for syndromic surveillance purposes: a statistical approach with the 2005-2010 data from ten French slaughterhouses.

PubMed

Dupuy, Céline; Morignat, Eric; Maugey, Xavier; Vinard, Jean-Luc; Hendrikx, Pascal; Ducrot, Christian; Calavas, Didier; Gay, Emilie

2013-04-30

The slaughterhouse is a central processing point for food animals and thus a source of both demographic data (age, breed, sex) and health-related data (reason for condemnation and condemned portions) that are not available through other sources. Using these data for syndromic surveillance is therefore tempting. However, many possible reasons for condemnation and condemned portions exist, making the definition of relevant syndromes challenging. The objective of this study was to determine a typology of cattle with at least one portion of the carcass condemned in order to define syndromes. Multiple factor analysis (MFA) in combination with clustering methods was performed using both health-related data and demographic data. Analyses were performed on 381,186 cattle with at least one portion of the carcass condemned among the 1,937,917 cattle slaughtered in ten French abattoirs. Results of the MFA and clustering methods led to 12 clusters considered as stable according to year of slaughter and slaughterhouse. One cluster was specific to a disease of public health importance (cysticercosis). Two clusters were linked to the slaughtering process (fecal contamination of heart or lungs and deterioration lesions). Two clusters respectively characterized by chronic liver lesions and chronic peritonitis could be linked to diseases of economic importance to farmers. Three clusters could be linked respectively to reticulo-pericarditis, fatty liver syndrome and farmer's lung syndrome, which are related to both diseases of economic importance to farmers and herd management issues. Three clusters respectively characterized by arthritis, myopathy and Dark Firm Dry (DFD) meat could notably be linked to animal welfare issues. Finally, one cluster, characterized by bronchopneumonia, could be linked to both animal health and herd management issues.
The statistical approach of combining multiple factor analysis with cluster analysis showed its relevance for the detection of syndromes using available large and complex slaughterhouse data. The advantages of this statistical approach are to i) define groups of reasons for condemnation based on meat inspection data, ii) help grouping reasons for condemnation among a list of various possible reasons for condemnation for which a consensus among experts could be difficult to reach, iii) assign each animal to a single syndrome which allows the detection of changes in trends of syndromes to detect unusual patterns in known diseases and emergence of new diseases.
Cluster stability in the analysis of mass cytometry data.

PubMed

Melchiotti, Rossella; Gracio, Filipe; Kordasti, Shahram; Todd, Alan K; de Rinaldis, Emanuele

2017-01-01

Manual gating has been traditionally applied to cytometry data sets to identify cells based on protein expression. The advent of mass cytometry allows for a higher number of proteins to be simultaneously measured on cells, therefore providing a means to define cell clusters in a high dimensional expression space. This enhancement, whilst opening unprecedented opportunities for single cell-level analyses, makes the incremental replacement of manual gating with automated clustering a compelling need. To this aim many methods have been implemented and their successful applications demonstrated in different settings. However, the reproducibility of automatically generated clusters is proving challenging and an analytical framework to distinguish spurious clusters from more stable entities, and presumably more biologically relevant ones, is still missing. One way to estimate cell clusters' stability is the evaluation of their consistent re-occurrence within- and between-algorithms, a metric that is commonly used to evaluate results from gene expression. Herein we report the usage and importance of cluster stability evaluations, when applied to results generated from three popular clustering algorithms - SPADE, FLOCK and PhenoGraph - run on four different data sets. These algorithms were shown to generate clusters with various degrees of statistical stability, many of them being unstable. By comparing the results of automated clustering with manually gated populations, we illustrate how information on cluster stability can assist towards a more rigorous and informed interpretation of clustering results. We also explore the relationships between statistical stability and other properties such as clusters' compactness and isolation, demonstrating that whilst cluster stability is linked to other properties it cannot be reliably predicted by any of them. 
Our study proposes the introduction of cluster stability as a necessary checkpoint for cluster interpretation and contributes to the construction of a more systematic and standardized analytical framework for the assessment of cytometry clustering results. © 2016 International Society for Advancement of Cytometry.
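The abstract describes cluster stability as the consistent re-occurrence of a cluster across runs but does not spell out the metric. One common way to quantify re-occurrence is the Jaccard similarity between a reference cluster and its best match in each repeated run. The sketch below assumes that choice; the function names and the toy cell identifiers are hypothetical, and the resampling step of a full bootstrap procedure is omitted for brevity.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of cell identifiers."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_stability(reference_cluster, rerun_clusterings):
    """Average, over reruns, of the best Jaccard match to the reference cluster."""
    best_matches = [
        max(jaccard(reference_cluster, cluster) for cluster in clustering)
        for clustering in rerun_clusterings
    ]
    return sum(best_matches) / len(best_matches)


# Toy example: one reference cluster and three hypothetical reruns.
reference = {1, 2, 3, 4}
reruns = [
    [{1, 2, 3, 4}, {5, 6}],    # recovered exactly
    [{1, 2, 3}, {4, 5, 6}],    # one cell lost to another cluster
    [{1, 2}, {3, 4}, {5, 6}],  # split in two
]
stability = cluster_stability(reference, reruns)
print(round(stability, 3))
```

A score near 1 means the cluster reappears almost unchanged in every run; a low score flags a cluster that should be interpreted with caution.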
Structural and functional annotation of the porcine immunome

PubMed Central

2013-01-01

Background The domestic pig is known as an excellent model for human immunology and the two species share many pathogens. Susceptibility to infectious disease is one of the major constraints on swine performance, yet the structure and function of genes comprising the pig immunome are not well-characterized. The completion of the pig genome provides the opportunity to annotate the pig immunome, and compare and contrast pig and human immune systems. Results The Immune Response Annotation Group (IRAG) used computational curation and manual annotation of the swine genome assembly 10.2 (Sscrofa10.2) to refine the currently available automated annotation of 1,369 immunity-related genes through sequence-based comparison to genes in other species. Within these genes, we annotated 3,472 transcripts. Annotation provided evidence for gene expansions in several immune response families, and identified artiodactyl-specific expansions in the cathelicidin and type 1 Interferon families. We found gene duplications for 18 genes, including 13 immune response genes and five non-immune response genes discovered in the annotation process. Manual annotation provided evidence for many new alternative splice variants and 8 gene duplications. Over 1,100 transcripts without porcine sequence evidence were detected using cross-species annotation. We used a functional approach to discover and accurately annotate porcine immune response genes. A co-expression clustering analysis of transcriptomic data from selected experimental infections or immune stimulations of blood, macrophages or lymph nodes identified a large cluster of genes that exhibited a correlated positive response upon infection across multiple pathogens or immune stimuli. Interestingly, this gene cluster (cluster 4) is enriched for known general human immune response genes, yet contains many un-annotated porcine genes. 
A phylogenetic analysis of the encoded proteins of cluster 4 genes showed that 15% exhibited an accelerated evolution as compared to 4.1% across the entire genome. Conclusions This extensive annotation dramatically extends the genome-based knowledge of the molecular genetics and structure of a major portion of the porcine immunome. Our complementary functional approach using co-expression during immune response has provided new putative immune response annotation for over 500 porcine genes. Our phylogenetic analysis of this core immunome cluster confirms rapid evolutionary change in this set of genes, and that, as in other species, such genes are important components of the pig’s adaptation to pathogen challenge over evolutionary time. These comprehensive and integrated analyses increase the value of the porcine genome sequence and provide important tools for global analyses and data-mining of the porcine immune response. PMID:23676093
Space-time cluster analysis of sea lice infestation (Caligus clemensi and Lepeophtheirus salmonis) on wild juvenile Pacific salmon in the Broughton Archipelago of Canada.

PubMed

Patanasatienkul, Thitiwan; Sanchez, Javier; Rees, Erin E; Pfeiffer, Dirk; Revie, Crawford W

2015-06-15

Sea lice infestation levels on wild chum and pink salmon in the Broughton Archipelago region are known to vary spatially and temporally; however, the locations of areas associated with a high infestation level had not been investigated yet. In the present study, the multivariate spatial scan statistic based on a Poisson model was used to assess spatial clustering of elevated sea lice (Caligus clemensi and Lepeophtheirus salmonis) infestation levels on wild chum and pink salmon sampled between March and July of 2004 to 2012 in the Broughton Archipelago and Knight Inlet regions of British Columbia, Canada. Three covariates, seine type (beach and purse seining), fish size, and year effect, were used to provide adjustment within the analyses. The analyses were carried out across the five months/datasets and between two fish species to assess the consistency of the identified clusters. Sea lice stages were explored separately for the early life stages (non-motile) and the late life stages of sea lice (motile). Spatial patterns in fish migration were also explored using monthly plots showing the average number of each fish species captured per sampling site. The results revealed three clusters for non-motile C. clemensi, two clusters for non-motile L. salmonis, and one cluster for the motile stage in each of the sea lice species. In general, the location and timing of clusters detected for both fish species were similar. Early in the season, the clusters of elevated sea lice infestation levels on wild fish are detected in areas closer to the rivers, with decreasing relative risks as the season progresses. Clusters were detected further from the estuaries later in the season, accompanied by increasing relative risks. 
In addition, the plots for fish migration exhibit similar patterns for both fish species in that, as expected, the juveniles move from the rivers toward the open ocean as the season progresses. The identification of space-time clustering of infestation on wild fish from this study can help in targeting investigations of factors associated with these infestations and thereby support the development of more effective sea lice control measures. Copyright © 2015 Elsevier B.V. All rights reserved.
Development of methodology for identification the nature of the polyphenolic extracts by FTIR associated with multivariate analysis.

PubMed

Grasel, Fábio dos Santos; Ferrão, Marco Flôres; Wolf, Carlos Rodolfo

2016-01-15

Tannins are polyphenolic compounds of complex structures formed by secondary metabolism in several plants. These polyphenolic compounds have different applications, such as drugs, anti-corrosion agents, flocculants, and tanning agents. This study analyses six different types of polyphenolic extracts by Fourier transform infrared spectroscopy (FTIR) combined with multivariate analysis. Through both principal component analysis (PCA) and hierarchical cluster analysis (HCA), we observed well-defined separation between condensed (quebracho and black wattle) and hydrolysable (valonea, chestnut, myrobalan, and tara) tannins. For hydrolysable tannins, it was also possible to observe the formation of two different subgroups between samples of chestnut and valonea and between samples of tara and myrobalan. Among all samples analysed, the chestnut and valonea showed the greatest similarity, indicating that these extracts contain equivalent chemical compositions and structure and, therefore, similar properties. Copyright © 2015 Elsevier B.V. All rights reserved.
Genetic variability of Brazilian isolates of Alternaria alternata detected by AFLP and RAPD techniques

PubMed Central

Dini-Andreote, Francisco; Pietrobon, Vivian Cristina; Andreote, Fernando Dini; Romão, Aline Silva; Spósito, Marcel Bellato; Araújo, Welington Luiz

2009-01-01

Alternaria brown spot (ABS) is a disease of tangerine plants and their hybrids caused by the fungus Alternaria alternata f. sp. citri, which has been found in Brazil since 2001. Due to its recent occurrence in Brazilian orchards, the epidemiology and genetic variability of this pathogen are still issues to be addressed. Here, a survey of the genetic variability of this fungus is presented, based on the characterization of twenty-four pathogenic isolates of A. alternata f. sp. citri from citrus plants and four endophytic isolates from mango (one Alternaria tenuissima and three Alternaria arborescens). The application of two molecular markers, Random Amplified Polymorphic DNA (RAPD) and Amplified Fragment Length Polymorphism (AFLP), revealed that the isolates clustered in distinct groups when fingerprints were analyzed by Principal Components Analysis (PCA). Despite the better assessment of the genetic variability through AFLP, significant modifications in cluster components were not observed, and only slight shifts in the positioning of isolates LRS 39/3 and 25M were observed in PCA plots. Furthermore, in both analyses, only the isolates from lemon plants clustered together, whereas no clustering was observed for other hosts or plant tissues. In summary, RAPD and AFLP analyses were both efficient in detecting the genetic variability within the population of the pathogenic fungus Alternaria spp., supplying information on the genetic variability of this species as a basis for further studies aimed at disease control. PMID:24031413
What do you mean "drunk"? Convergent validation of multiple methods of mapping alcohol expectancy memory networks.

PubMed

Reich, Richard R; Ariel, Idan; Darkes, Jack; Goldman, Mark S

2012-09-01

The configuration and activation of memory networks have been theorized as mechanisms that underlie the often observed link between alcohol expectancies and drinking. A key component of this network is the expectancy "drunk." The memory network configuration of "drunk" was mapped by using cluster analysis of data gathered from the paired-similarities task (PST) and the Alcohol Expectancy Multi-Axial Assessment (AEMAX). A third task, the free associates task (FA), assessed participants' strongest alcohol expectancy associates and was used as a validity check for the cluster analyses. Six hundred forty-seven 18-19-year-olds completed these measures and a measure of alcohol consumption at baseline assessment for a 5-year longitudinal study. For both the PST and AEMAX, "drunk" clustered with mainly negative and sedating effects (e.g., "sick," "dizzy," "sleepy") in lighter drinkers and with more positive and arousing effects (e.g., "happy," "horny," "outgoing") in heavier drinkers, showing that the cognitive organization of expectancies reflected drinker type (and might influence the choice to drink). Consistent with the cluster analyses, in participants who gave "drunk" as an FA response, heavier drinkers rated the word as more positive and arousing than lighter drinkers. Additionally, gender did not account for the observed drinker-type differences. These results support the notion that for some emerging adults, drinking may be linked to what they mean by the word "drunk." PsycINFO Database Record (c) 2012 APA, all rights reserved.
Wildfire cluster detection using space-time scan statistics

NASA Astrophysics Data System (ADS)

Tonini, M.; Tuia, D.; Ratle, F.; Kanevski, M.

2009-04-01

The aim of the present study is to identify spatio-temporal clusters of fire sequences using space-time scan statistics. These statistical methods are specifically designed to detect clusters and assess their significance. Basically, scan statistics work by comparing a set of events occurring inside a scanning window (or a space-time cylinder for spatio-temporal data) with those that lie outside. Windows of increasing size scan the zone across space and time: the likelihood ratio is calculated for each window (comparing the ratio "observed cases over expected" inside and outside); the window with the maximum value is assumed to be the most probable cluster, and so on. Under the null hypothesis of spatial and temporal randomness, these events are distributed according to a known discrete-state random process (Poisson or Bernoulli), whose parameters can be estimated. Given this assumption, it is possible to test whether or not the null hypothesis holds in a specific area. In order to deal with fire data, the space-time permutation scan statistic has been applied since it does not require the explicit specification of the population at risk in each cylinder. The case study is represented by Florida daily fire detection using the Moderate Resolution Imaging Spectroradiometer (MODIS) active fire product during the period 2003-2006. As a result, statistically significant clusters have been identified. Performing the analyses over the entire study period, three out of the five most likely clusters have been identified in the forest areas, in the north of the state; the other two clusters cover a large zone in the south, corresponding to agricultural land and the prairies in the Everglades. Furthermore, the analyses have been performed separately for the four years to analyze whether the wildfires recur each year during the same period.
It emerges that clusters of forest fires are more frequent in hot seasons (spring and summer), while in the southern areas they are present throughout the year. Analysing the distribution of fires to evaluate whether they are statistically more frequent in some areas and/or in some periods of the year can be useful to support fire management and to focus prevention measures.
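The window comparison described in this abstract can be illustrated with the Poisson log-likelihood ratio commonly used in scan statistics. This is a minimal sketch, not the authors' implementation: it assumes the expected count for each candidate window is already known, it only scores windows with an excess of events, and it omits the Monte Carlo step used to assess significance. The window names and counts are hypothetical.

```python
import math


def poisson_llr(observed, expected, total):
    """Log-likelihood ratio for one window under a Poisson model.

    observed: events inside the window
    expected: events expected inside the window under the null hypothesis
    total:    events in the whole study area and period
    Returns 0.0 when the window shows no excess of events.
    """
    if observed <= expected:
        return 0.0
    inside = observed * math.log(observed / expected)
    rest = total - observed
    outside = rest * math.log(rest / (total - expected)) if rest > 0 else 0.0
    return inside + outside


# Hypothetical candidate cylinders: name -> (observed, expected).
total_events = 100
windows = {"A": (30, 10.0), "B": (12, 10.0), "C": (5, 10.0)}

scores = {name: poisson_llr(c, e, total_events) for name, (c, e) in windows.items()}
most_likely = max(scores, key=scores.get)
print(most_likely, round(scores[most_likely], 2))
```

In a full analysis the maximum score would be compared with the distribution of maxima obtained from many random permutations of the data, which is how a p-value is attached to the most likely cluster.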
