Sample records for statistically significant clusters

  1. Visualizing statistical significance of disease clusters using cartograms.

    PubMed

    Kronenfeld, Barry J; Wong, David W S

    2017-05-15

    Health officials and epidemiological researchers often use maps of disease rates to identify potential disease clusters. Because these maps exaggerate the prominence of low-density districts and hide potential clusters in urban (high-density) areas, many researchers have used density-equalizing maps (cartograms) as a basis for epidemiological mapping. However, there are no existing guidelines for visual assessment of statistical uncertainty. To address this shortcoming, we develop techniques for visual determination of the statistical significance of clusters spanning one or more districts on a cartogram. We developed the techniques within a geovisual analytics framework that does not rely on automated significance testing and can therefore facilitate visual analysis to detect clusters that automated techniques might miss. On a cartogram of the at-risk population, the statistical significance of a disease cluster can be determined from the rate, area and shape of the cluster under standard hypothesis testing scenarios. We develop formulae to determine, for a given rate, the area required for statistical significance of a priori and a posteriori designated regions under certain test assumptions. Uniquely, our approach enables dynamic inference of aggregate regions formed by combining individual districts. The method is implemented in interactive tools that provide choropleth mapping, automated legend construction and dynamic search tools to facilitate cluster detection and assessment of the validity of tested assumptions. A case study of leukemia incidence analysis in California demonstrates the ability to visually distinguish between statistically significant and insignificant regions. The proposed geovisual analytics approach enables intuitive visual assessment of the statistical significance of arbitrarily defined regions on a cartogram. Our research prompts a broader discussion of the role of geovisual exploratory analyses in disease mapping and the appropriate framework for visually assessing the statistical significance of spatial clusters.
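
    The link between rate, area and significance can be illustrated with a deliberately simplified calculation. The sketch below is not the authors' formulae: it assumes a one-sided test of an a priori region against a known baseline rate, a normal approximation to the Poisson count, and a cartogram on which area is proportional to at-risk population (the hypothetical `people_per_unit_area` scale factor). Under those assumptions a region with observed rate r is significantly above baseline r0 when its population P satisfies P >= r0 * z_alpha^2 / (r - r0)^2. A posteriori regions would need a stricter threshold to account for multiple testing.

```python
from statistics import NormalDist


def min_population_for_significance(rate, baseline_rate, alpha=0.05):
    """Smallest at-risk population P for which an a priori region whose observed
    rate is `rate` lies significantly above `baseline_rate`.

    Assumes a one-sided test and a normal approximation to the Poisson count:
    z = (rate - baseline_rate) * sqrt(P / baseline_rate) >= z_alpha.
    """
    if rate <= baseline_rate:
        raise ValueError("rate must exceed the baseline rate")
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value
    return baseline_rate * z_alpha ** 2 / (rate - baseline_rate) ** 2


def min_cartogram_area(rate, baseline_rate, people_per_unit_area, alpha=0.05):
    """On a density-equalizing cartogram area is proportional to population, so the
    population threshold converts directly into an area threshold.
    `people_per_unit_area` is a hypothetical scale factor of the cartogram."""
    return min_population_for_significance(rate, baseline_rate, alpha) / people_per_unit_area


# A region at twice a baseline rate of 1 per 1,000 needs about 2,706 people.
print(round(min_population_for_significance(0.002, 0.001)))  # prints 2706
```

    Because the required population scales with 1 / (r - r0)^2, halving the excess rate quadruples the area a cluster must cover on the cartogram before it can be significant.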

  2. Quantification and statistical significance analysis of group separation in NMR-based metabonomics studies

    PubMed Central

    Goodpaster, Aaron M.; Kennedy, Michael A.

    2015-01-01

    Currently, no standard metrics are used to quantify cluster separation in PCA or PLS-DA scores plots for metabonomics studies or to determine whether cluster separation is statistically significant. Lack of such measures makes it virtually impossible to compare independent or inter-laboratory studies and can lead to confusion in the metabonomics literature when authors putatively identify metabolites distinguishing classes of samples based on visual and qualitative inspection of scores plots that exhibit marginal separation. While previous papers have addressed quantification of cluster separation in PCA scores plots, none have advocated routine use of a quantitative measure of separation that is supported by a standard and rigorous assessment of whether or not the cluster separation is statistically significant. Here, quantification and statistical significance of separation of group centroids in PCA and PLS-DA scores plots are considered. The Mahalanobis distance is used to quantify the distance between group centroids, and the two-sample Hotelling's T² statistic is computed for the data and converted to an F-statistic, to which an F-test is applied to determine whether the cluster separation is statistically significant. We demonstrate the value of this approach using four datasets containing various degrees of separation, ranging from groups that had no apparent visual cluster separation to groups that had no visual cluster overlap. Widespread adoption of such concrete metrics to quantify and evaluate the statistical significance of PCA and PLS-DA cluster separation would help standardize reporting of metabonomics data. PMID:26246647
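
    The centroid-separation test described above can be sketched with the textbook two-sample formulas: pooled covariance S, Mahalanobis D² = d'S⁻¹d between the group means, T² = n1·n2/(n1 + n2)·D², and F = (n1 + n2 - p - 1)/(p(n1 + n2 - 2))·T² with (p, n1 + n2 - p - 1) degrees of freedom. This is a minimal, dependency-free sketch, not the authors' code; the p-value can then be read from the F distribution, for example with `scipy.stats.f.sf(F, df1, df2)`.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def covariance(rows, mean):
    """Unbiased sample covariance matrix (divisor n - 1)."""
    p = len(mean)
    cov = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [r[j] - mean[j] for j in range(p)]
        for i in range(p):
            for j in range(p):
                cov[i][j] += d[i] * d[j]
    return [[c / (len(rows) - 1) for c in row] for row in cov]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("pooled covariance matrix is singular")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def hotelling_two_sample(group1, group2):
    """Return (Mahalanobis D^2, Hotelling T^2, F, (df1, df2)) for two groups of score vectors."""
    n1, n2, p = len(group1), len(group2), len(group1[0])
    if n1 + n2 - p - 1 <= 0:
        raise ValueError("need n1 + n2 > p + 1 observations")
    m1, m2 = mean_vector(group1), mean_vector(group2)
    s1, s2 = covariance(group1, m1), covariance(group2, m2)
    pooled = [[((n1 - 1) * s1[i][j] + (n2 - 1) * s2[i][j]) / (n1 + n2 - 2)
               for j in range(p)] for i in range(p)]
    diff = [a - b for a, b in zip(m1, m2)]
    d2 = sum(d * x for d, x in zip(diff, solve(pooled, diff)))  # diff' S^-1 diff
    t2 = n1 * n2 / (n1 + n2) * d2
    df2 = n1 + n2 - p - 1
    f_stat = df2 / (p * (n1 + n2 - 2)) * t2
    return d2, t2, f_stat, (p, df2)


# Two hypothetical groups of (PC1, PC2) scores
a = [(0, 0), (1, 0), (0, 1), (1, 1)]
b = [(2, 0), (3, 0), (2, 1), (3, 1)]
d2, t2, f_stat, dfs = hotelling_two_sample(a, b)
print([round(v, 6) for v in (d2, t2, f_stat)], dfs)  # prints [12.0, 24.0, 10.0] (2, 5)
```

    With a single score dimension, T² reduces to the square of the pooled two-sample t statistic, which is a convenient sanity check.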

  3. Detecting Statistically Significant Communities of Triangle Motifs in Undirected Networks

    DTIC Science & Technology

    2016-04-26

    Report type: Final. Dates covered: 15 Oct 2014 to 14 Jan 2015. Title: Detecting statistically significant clusters of ... extend the work of Perry et al. [6] by developing a statistical framework that supports the detection of triangle motif-based clusters in complex ... priori, the need for triangle motif-based clustering. 2. Developed an algorithm for clustering undirected networks, where the triangle configuration was
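
    Triangle-motif clustering starts from knowing which triangles each node takes part in. The sketch below only counts per-node triangle participation in an undirected graph; it is an illustrative building block under that assumption, not the statistical framework of the report or of Perry et al.

```python
from itertools import combinations


def triangles_per_node(edges):
    """Number of triangles (3-cliques) each node of an undirected graph belongs to."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # self-loops cannot be part of a triangle
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    counts = {}
    for node, nbrs in adj.items():
        # every pair of adjacent neighbours closes one triangle through `node`
        counts[node] = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return counts


# K4 on nodes 0-3 plus a pendant node 4 attached to node 3
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
counts = triangles_per_node(edges)
print(counts, sum(counts.values()) // 3)  # per-node counts, total triangles
```

    Each triangle is counted once at each of its three corners, which is why the total is the sum divided by three.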

  4. Performance of cancer cluster Q-statistics for case-control residential histories

    PubMed Central

    Sloan, Chantel D.; Jacquez, Geoffrey M.; Gallagher, Carolyn M.; Ward, Mary H.; Raaschou-Nielsen, Ole; Nordsborg, Rikke Baastrup; Meliker, Jaymie R.

    2012-01-01

    Few investigations of health event clustering have evaluated residential mobility, though causative exposures for chronic diseases such as cancer often occur long before diagnosis. Recently developed Q-statistics incorporate human mobility into disease cluster investigations by quantifying space- and time-dependent nearest neighbor relationships. Using residential histories from two cancer case-control studies, we created simulated clusters to examine Q-statistic performance. Results suggest the intersection of cases with significant clustering over their life course, Qi, with cases who are constituents of significant local clusters at given times, Qit, yielded the best performance, which improved with increasing cluster size. Upon comparison, a larger proportion of true positives were detected with Kulldorff’s spatial scan method if the time of clustering was provided. We recommend using Q-statistics to identify when and where clustering may have occurred, followed by the scan method to localize the candidate clusters. Future work should investigate the generalizability of these findings. PMID:23149326
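
    The nearest-neighbour counts behind Qit and Qi can be sketched as follows, assuming the usual definition: at each time slice, Qit is the number of cases among case i's k nearest neighbours (cases and controls alike), and Qi sums Qit over time. Significance testing (in practice by randomization of case-control labels) is not shown, and ties in distance are simply broken by subject index.

```python
import math


def q_statistics(locations, is_case, k):
    """Case-control Q-statistics from residential histories.

    locations[t][i] is the (x, y) residence of subject i at time slice t;
    is_case[i] is True for cases. Returns (Q_it, Q_i, Q)."""
    n = len(is_case)
    q_it = []
    for positions in locations:
        row = []
        for i in range(n):
            if not is_case[i]:
                row.append(0)  # only cases contribute
                continue
            others = sorted((j for j in range(n) if j != i),
                            key=lambda j: math.dist(positions[i], positions[j]))
            # count cases among the k nearest neighbours (ties broken by index)
            row.append(sum(1 for j in others[:k] if is_case[j]))
        q_it.append(row)
    q_i = [sum(row[i] for row in q_it) for i in range(n)]
    return q_it, q_i, sum(q_i)


# Two cases (0, 1) and two controls (2, 3): the cases live together at t = 0, apart at t = 1.
history = [
    [(0, 0), (1, 0), (10, 0), (11, 0)],
    [(0, 0), (20, 0), (1, 0), (21, 0)],
]
print(q_statistics(history, [True, True, False, False], k=1))
```

    The example shows the point the abstract makes about mobility: the two cases cluster only during the first time slice, which Qit records and a single-residence analysis would miss.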

  5. Testing prediction methods: Earthquake clustering versus the Poisson model

    USGS Publications Warehouse

    Michael, A.J.

    1997-01-01

    Testing earthquake prediction methods requires statistical techniques that compare observed success to random chance. One technique is to produce simulated earthquake catalogs and measure the relative success of predicting real and simulated earthquakes. The accuracy of these tests depends on the validity of the statistical model used to simulate the earthquakes. This study tests the effect of clustering in the statistical earthquake model on the results. Three simulation models were used to produce significance levels for a VLF earthquake prediction method. As the degree of simulated clustering increases, the statistical significance drops. Hence, the use of a seismicity model with insufficient clustering can lead to overly optimistic results. A successful method must pass the statistical tests with a model that fully replicates the observed clustering. However, a method can be rejected based on tests with a model that contains insufficient clustering. U.S. copyright. Published in 1997 by the American Geophysical Union.
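
    The simulated-catalog test reduces to an empirical p-value: the share of catalogs, counting the real one, on which the method does at least as well as on the real earthquakes. The sketch below uses the common (1 + count) / (1 + number of simulations) convention and hypothetical success counts; it illustrates why a more clustered simulation model, which tends to produce more chance successes, yields a larger p-value (lower significance).

```python
def monte_carlo_p_value(observed_successes, simulated_successes):
    """Fraction of catalogs, counting the real one, that score at least as well
    as the real catalog: (1 + #{sim >= obs}) / (1 + #sims)."""
    exceed = sum(1 for s in simulated_successes if s >= observed_successes)
    return (exceed + 1) / (len(simulated_successes) + 1)


# Hypothetical success counts of the same prediction method on simulated catalogs
poisson_like = [0, 1, 1, 2, 0, 1, 3, 0, 1, 2]
clustered = [0, 4, 0, 6, 1, 0, 5, 0, 7, 2]
print(monte_carlo_p_value(4, poisson_like), monte_carlo_p_value(4, clustered))
```

    With these illustrative numbers the same 4 successes look significant against the unclustered model (p of about 0.09) but not against the clustered one (p of about 0.45).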

  6. Identifying irregularly shaped crime hot-spots using a multiobjective evolutionary algorithm

    NASA Astrophysics Data System (ADS)

    Wu, Xiaolan; Grubesic, Tony H.

    2010-12-01

    Spatial cluster detection techniques are widely used in criminology, geography, epidemiology, and other fields. In particular, spatial scan statistics are popular and efficient techniques for detecting areas of elevated crime or disease events. The majority of spatial scan approaches attempt to delineate geographic zones by evaluating the significance of clusters using likelihood ratio statistics tested with the Poisson distribution. While this can be effective, many scan statistics give preference to circular clusters, diminishing their ability to identify elongated and/or irregularly shaped clusters. Although adjusting the shape of the scan window can mitigate some of these problems, both the significance of irregular clusters and their spatial structure must be accounted for in a meaningful way. This paper utilizes a multiobjective evolutionary algorithm to find clusters with maximum significance while quantitatively tracking their geographic structure. Crime data for the city of Cincinnati are utilized to demonstrate the advantages of the new approach and highlight its benefits versus more traditional scan statistics.
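
    The two ingredients of a multiobjective cluster search can be sketched separately: Kulldorff's Poisson log-likelihood ratio as the significance objective, and a Pareto filter that keeps zones no other zone beats on both significance and shape. The compactness score is a hypothetical, precomputed input, and the evolutionary search that generates candidate zones is not shown; the paper's actual objectives may differ.

```python
import math


def poisson_llr(c, e, total_cases):
    """Kulldorff's Poisson log-likelihood ratio for a zone with c observed and e
    expected cases, expected counts being scaled to sum to total_cases."""
    if c <= e:
        return 0.0  # only zones with an elevated rate are scored
    out_c, out_e = total_cases - c, total_cases - e
    outside = out_c * math.log(out_c / out_e) if out_c > 0 else 0.0
    return c * math.log(c / e) + outside


def pareto_front(candidates):
    """Names of candidate zones not dominated on (llr, compactness), both maximised.
    candidates: list of (name, llr, compactness); compactness is a hypothetical,
    precomputed shape score (for example an isoperimetric ratio)."""
    front = []
    for name, llr, comp in candidates:
        dominated = any(
            o_llr >= llr and o_comp >= comp and (o_llr > llr or o_comp > comp)
            for _, o_llr, o_comp in candidates
        )
        if not dominated:
            front.append(name)
    return front


zones = [
    ("circle", poisson_llr(22, 7, 28), 0.9),
    ("corridor", poisson_llr(25, 8, 28), 0.3),
    ("blob", poisson_llr(15, 7, 28), 0.6),
]
print(pareto_front(zones))  # prints ['circle', 'corridor']
```

    Keeping the whole front, rather than only the single highest-LLR zone, is what lets an elongated "corridor" survive alongside a compact, slightly weaker cluster.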

  7. Data-driven inference for the spatial scan statistic.

    PubMed

    Almeida, Alexandre C L; Duarte, Anderson R; Duczmal, Luiz H; Oliveira, Fernando L P; Takahashi, Ricardo H C

    2011-08-02

    Kulldorff's spatial scan statistic for aggregated area maps searches for clusters of cases without specifying their size (number of areas) or geographic location in advance. Their statistical significance is tested while adjusting for the multiple testing inherent in such a procedure. However, as is shown in this work, this adjustment is not done evenly for all possible cluster sizes. A modification is proposed to the usual inference test of the spatial scan statistic, incorporating additional information about the size of the most likely cluster found. A new interpretation of the results of the spatial scan statistic is proposed, posing a modified inference question: what is the probability that the null hypothesis is rejected for the original observed cases map with a most likely cluster of size k, taking into account only those most likely clusters of size k found under the null hypothesis for comparison? This question is especially important when the p-value computed by the usual inference process is near the alpha significance level, because the correctness of the decision based on this inference is then in doubt. A practical procedure is provided to make more accurate inferences about the most likely cluster found by the spatial scan statistic.
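
    The modified inference question can be sketched directly: instead of ranking the observed LLR against every null replicate's most likely cluster, rank it only against null replicates whose most likely cluster has the same size k. The replicate values below are hypothetical, and the paper's practical procedure may add refinements not shown here.

```python
def size_conditional_p_value(observed_llr, observed_size, null_mlcs):
    """null_mlcs: (size, llr) of the most likely cluster found in each null replicate.
    Compares the observed LLR only with null replicates of the same cluster size."""
    same_size = [llr for size, llr in null_mlcs if size == observed_size]
    exceed = sum(1 for llr in same_size if llr >= observed_llr)
    return (exceed + 1) / (len(same_size) + 1)


def usual_p_value(observed_llr, null_mlcs):
    """Standard scan-statistic p-value using every null replicate."""
    exceed = sum(1 for _, llr in null_mlcs if llr >= observed_llr)
    return (exceed + 1) / (len(null_mlcs) + 1)


# Hypothetical null replicates: (size of most likely cluster, its LLR)
null_mlcs = [(1, 5.0), (1, 9.0), (3, 6.0), (3, 8.0), (3, 4.0)]
print(usual_p_value(7.0, null_mlcs), size_conditional_p_value(7.0, 1, null_mlcs))  # prints 0.5 0.6666666666666666
```

    In practice thousands of replicates are needed so that each cluster size k is represented often enough for the conditional p-value to be stable.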

  8. Local multiplicity adjustment for the spatial scan statistic using the Gumbel distribution.

    PubMed

    Gangnon, Ronald E

    2012-03-01

    The spatial scan statistic is an important and widely used tool for cluster detection. It is based on the simultaneous evaluation of the statistical significance of the maximum likelihood ratio test statistic over a large collection of potential clusters. In most cluster detection problems, there is variation in the extent of local multiplicity across the study region. For example, using a fixed maximum geographic radius for clusters, urban areas typically have many overlapping potential clusters, whereas rural areas have relatively few. The spatial scan statistic does not account for local multiplicity variation. We describe a previously proposed local multiplicity adjustment based on a nested Bonferroni correction and propose a novel adjustment based on a Gumbel distribution approximation to the distribution of a local scan statistic. We compare the performance of all three statistics in terms of power and a novel unbiased cluster detection criterion. These methods are then applied to the well-known New York leukemia dataset and a Wisconsin breast cancer incidence dataset. © 2011, The International Biometric Society.
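
    A Gumbel approximation replaces a large set of Monte Carlo maxima with a fitted extreme-value distribution, which gives smooth tail p-values from fewer replicates. The sketch below fits a Gumbel distribution by the method of moments (an assumption; the paper's fitting procedure and its definition of the local scan statistic are not reproduced) and returns the upper-tail probability of an observed value.

```python
import math
from statistics import mean, stdev

EULER_GAMMA = 0.5772156649015329


def fit_gumbel_moments(maxima):
    """Method-of-moments fit of a Gumbel (maximum) distribution to simulated maxima."""
    beta = stdev(maxima) * math.sqrt(6) / math.pi  # scale
    mu = mean(maxima) - EULER_GAMMA * beta          # location
    return mu, beta


def gumbel_p_value(observed, mu, beta):
    """Upper-tail probability P(X >= observed) under the fitted Gumbel."""
    return 1.0 - math.exp(-math.exp(-(observed - mu) / beta))


# Hypothetical maxima of a local scan statistic from a small number of null replicates
null_maxima = [6.1, 7.4, 5.8, 8.9, 6.7, 7.0, 9.6, 6.3, 7.8, 6.9]
mu, beta = fit_gumbel_moments(null_maxima)
print(round(gumbel_p_value(12.0, mu, beta), 4))
```

    An observed statistic above every simulated maximum still receives a finite, non-zero p-value, which a purely rank-based Monte Carlo test cannot provide.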

  9. Local multiplicity adjustment for the spatial scan statistic using the Gumbel distribution

    PubMed Central

    Gangnon, Ronald E.

    2011-01-01

    The spatial scan statistic is an important and widely used tool for cluster detection. It is based on the simultaneous evaluation of the statistical significance of the maximum likelihood ratio test statistic over a large collection of potential clusters. In most cluster detection problems, there is variation in the extent of local multiplicity across the study region. For example, using a fixed maximum geographic radius for clusters, urban areas typically have many overlapping potential clusters, while rural areas have relatively few. The spatial scan statistic does not account for local multiplicity variation. We describe a previously proposed local multiplicity adjustment based on a nested Bonferroni correction and propose a novel adjustment based on a Gumbel distribution approximation to the distribution of a local scan statistic. We compare the performance of all three statistics in terms of power and a novel unbiased cluster detection criterion. These methods are then applied to the well-known New York leukemia dataset and a Wisconsin breast cancer incidence dataset. PMID:21762118

  10. Detection of Clostridium difficile infection clusters, using the temporal scan statistic, in a community hospital in southern Ontario, Canada, 2006-2011.

    PubMed

    Faires, Meredith C; Pearl, David L; Ciccotelli, William A; Berke, Olaf; Reid-Smith, Richard J; Weese, J Scott

    2014-05-12

    In hospitals, Clostridium difficile infection (CDI) surveillance relies on unvalidated guidelines or threshold criteria to identify outbreaks. This can result in false-positive and -negative cluster alarms. The application of statistical methods to identify and understand CDI clusters may be a useful alternative or complement to standard surveillance techniques. The objectives of this study were to investigate the utility of the temporal scan statistic for detecting CDI clusters and determine if there are significant differences in the rate of CDI cases by month, season, and year in a community hospital. Bacteriology reports of patients identified with a CDI from August 2006 to February 2011 were collected. For patients detected with CDI from March 2010 to February 2011, stool specimens were obtained. Clostridium difficile isolates were characterized by ribotyping and investigated for the presence of toxin genes by PCR. CDI clusters were investigated using a retrospective temporal scan test statistic. Statistically significant clusters were compared to known CDI outbreaks within the hospital. A negative binomial regression model was used to identify associations between year, season, month and the rate of CDI cases. Overall, 86 CDI cases were identified. Eighteen specimens were analyzed and nine ribotypes were classified with ribotype 027 (n = 6) the most prevalent. The temporal scan statistic identified significant CDI clusters at the hospital (n = 5), service (n = 6), and ward (n = 4) levels (P ≤ 0.05). Three clusters were concordant with the one C. difficile outbreak identified by hospital personnel. Two clusters were identified as potential outbreaks. The negative binomial model indicated years 2007-2010 (P ≤ 0.05) had decreased CDI rates compared to 2006 and spring had an increased CDI rate compared to the fall (P = 0.023). 
Application of the temporal scan statistic identified several clusters, including potential outbreaks not detected by hospital personnel. The identification of time periods with decreased or increased CDI rates may have been a result of specific hospital events. Understanding the clustering of CDIs can aid in the interpretation of surveillance data and lead to the development of better early detection systems.
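
    A retrospective temporal scan can be sketched as follows: slide windows of up to `max_len` consecutive periods over the case series, score each with the Poisson log-likelihood ratio, and assess the best window by redistributing the same total number of cases at random over the periods. The uniform expected count per period and the parameter names are assumptions for illustration; a real analysis would use a population-at-risk or patient-day denominator, as implemented in tools such as SaTScan.

```python
import math
import random


def poisson_llr(c, e, total):
    """Kulldorff's Poisson log-likelihood ratio for a window (elevated rates only)."""
    if c <= e:
        return 0.0
    out_c, out_e = total - c, total - e
    return c * math.log(c / e) + (out_c * math.log(out_c / out_e) if out_c > 0 else 0.0)


def best_window(counts, max_len):
    """Highest-LLR window (start, end), inclusive, of at most max_len periods."""
    total = sum(counts)
    per_period = total / len(counts)  # assumption: uniform baseline over time
    best = (0.0, None)
    for start in range(len(counts)):
        for end in range(start, min(start + max_len, len(counts))):
            c = sum(counts[start:end + 1])
            llr = poisson_llr(c, per_period * (end - start + 1), total)
            if llr > best[0]:
                best = (llr, (start, end))
    return best


def temporal_scan(counts, max_len, n_sims=999, seed=1):
    """Most likely window, its LLR and a Monte Carlo p-value."""
    llr_obs, window = best_window(counts, max_len)
    rng = random.Random(seed)
    total, periods = sum(counts), len(counts)
    exceed = 0
    for _ in range(n_sims):
        sim = [0] * periods
        for _ in range(total):  # place each case in a uniformly random period
            sim[rng.randrange(periods)] += 1
        if best_window(sim, max_len)[0] >= llr_obs:
            exceed += 1
    return window, llr_obs, (exceed + 1) / (n_sims + 1)


# Hypothetical monthly case counts with a two-month excess
window, llr, p = temporal_scan([1, 1, 1, 10, 12, 1, 1, 1], max_len=3)
print(window, round(llr, 2), p)  # window (3, 4)
```

    Because the null distribution is built from the maximum over all windows, the p-value already accounts for having searched many candidate periods.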

  11. A scan statistic to extract causal gene clusters from case-control genome-wide rare CNV data.

    PubMed

    Nishiyama, Takeshi; Takahashi, Kunihiko; Tango, Toshiro; Pinto, Dalila; Scherer, Stephen W; Takami, Satoshi; Kishino, Hirohisa

    2011-05-26

    Several statistical tests have been developed for analyzing genome-wide association data by incorporating gene pathway information in terms of gene sets. Using these methods, hundreds of gene sets are typically tested, and the tested gene sets often overlap. This overlapping greatly increases the probability of generating false positives, and the results obtained are difficult to interpret, particularly when many gene sets show statistical significance. We propose a flexible statistical framework to circumvent these problems. Inspired by spatial scan statistics for detecting clustering of disease occurrence in the field of epidemiology, we developed a scan statistic to extract disease-associated gene clusters from a whole gene pathway. Extracting one or a few significant gene clusters from a global pathway limits the overall false positive probability, which results in increased statistical power, and facilitates the interpretation of test results. In the present study, we applied our method to genome-wide association data for rare copy-number variations, which have been strongly implicated in common diseases. Application of our method to a simulated dataset demonstrated the high accuracy of this method in detecting disease-associated gene clusters in a whole gene pathway. The scan statistic approach proposed here shows a high level of accuracy in detecting gene clusters in a whole gene pathway. This study has provided a sound statistical framework for analyzing genome-wide rare CNV data by incorporating topological information on the gene pathway.

  12. Retrospective space-time cluster analysis of whooping cough re-emergence in Barcelona, Spain, 2000-2011.

    PubMed

    Solano, Rubén; Gómez-Barroso, Diana; Simón, Fernando; Lafuente, Sarah; Simón, Pere; Rius, Cristina; Gorrindo, Pilar; Toledo, Diana; Caylà, Joan A

    2014-05-01

    A retrospective space-time study of whooping cough cases reported to the Public Health Agency of Barcelona, Spain, between 2000 and 2011 is presented. It is based on 633 individual whooping cough cases and the 2006 population census from the Spanish National Statistics Institute, stratified by age and sex at the census tract level. Cluster identification was attempted using the space-time scan statistic, assuming a Poisson distribution and restricting the temporal extent to 7 days and the spatial distance to 500 m. Statistical calculations were performed with Stata 11 and SaTScan, and mapping was performed with ArcGIS 10.0. Only clusters showing statistical significance (P < 0.05) were mapped. The most likely cluster identified included five census tracts located in three neighbourhoods in central Barcelona during the week from 17 to 23 August 2011. This cluster included five cases compared with an expected 0.0021 (relative risk = 2436, P < 0.001). In addition, 11 secondary significant space-time clusters were detected, occurring at different times and locations. Spatial statistics are considered useful as a complement to epidemiological surveillance systems, because visualizing excess numbers of cases in space and time increases the possibility of identifying outbreaks not reported by the surveillance system.

  13. Geographic clusters in underimmunization and vaccine refusal.

    PubMed

    Lieu, Tracy A; Ray, G Thomas; Klein, Nicola P; Chung, Cindy; Kulldorff, Martin

    2015-02-01

    Parental refusal and delay of childhood vaccines have increased in recent years and are believed to cluster in some communities. Such clusters could pose public health risks and barriers to achieving immunization quality benchmarks. Our aims were to (1) describe geographic clusters of underimmunization and vaccine refusal, (2) compare clusters of underimmunization with different vaccines, and (3) evaluate whether vaccine refusal clusters may pose barriers to achieving high immunization rates. We analyzed electronic health records among children born between 2000 and 2011 with membership in Kaiser Permanente Northern California. The study population included 154,424 children in 13 counties with continuous membership from birth to 36 months of age. We used spatial scan statistics to identify clusters of underimmunization (having missed 1 or more vaccines by 36 months of age) and vaccine refusal (based on International Classification of Diseases, Ninth Revision, Clinical Modification codes). We identified 5 statistically significant clusters of underimmunization among children who turned 36 months old during 2010-2012. The underimmunization rate within clusters ranged from 18% to 23%, and the rate outside them was 11%. Children in the most statistically significant cluster had 1.58 (P < .001) times the rate of underimmunization as others. Underimmunization with the measles-mumps-rubella and varicella vaccines clustered in similar geographic areas. Vaccine refusal also clustered, with rates of 5.5% to 13.5% within clusters, compared with 2.6% outside them. Underimmunization and vaccine refusal cluster geographically. Spatial scan statistics may be a useful tool to identify locations with challenges to achieving high immunization rates, which deserve focused intervention. Copyright © 2015 by the American Academy of Pediatrics.

  14. A scan statistic for binary outcome based on hypergeometric probability model, with an application to detecting spatial clusters of Japanese encephalitis.

    PubMed

    Zhao, Xing; Zhou, Xiao-Hua; Feng, Zijian; Guo, Pengfei; He, Hongyan; Zhang, Tao; Duan, Lei; Li, Xiaosong

    2013-01-01

    As a useful tool for geographical cluster detection of events, the spatial scan statistic is widely applied in many fields and plays an increasingly important role. The classic version of the spatial scan statistic for a binary outcome was developed by Kulldorff, based on the Bernoulli or the Poisson probability model. In this paper, we apply the hypergeometric probability model to construct the likelihood function under the null hypothesis. Compared with existing methods, the likelihood function under the null hypothesis offers an alternative, indirect way to identify the potential cluster, and the test statistic is the extreme value of this likelihood function. As in Kulldorff's methods, we use a Monte Carlo test to assess significance. Both methods were applied to detect spatial clusters of Japanese encephalitis in Sichuan province, China, in 2009, and the detected clusters were identical. A simulation on independent benchmark data indicated that the test statistic based on the hypergeometric model outperforms Kulldorff's statistics for clusters of high population density or large size; otherwise, Kulldorff's statistics are superior.
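
    One reading of the hypergeometric approach can be sketched as follows: conditional on the total number of cases C among N people, the null probability that a zone of n people contains exactly c cases is hypergeometric, and the most extreme (smallest) null probability among zones with an elevated rate serves as the test statistic. This interpretation is an assumption, as are the zone names and counts; the Monte Carlo significance test described in the abstract is not shown.

```python
from math import comb


def hypergeometric_null_probability(c, n, total_cases, total_pop):
    """P(exactly c cases in a zone of n people | total_cases among total_pop)."""
    return comb(total_cases, c) * comb(total_pop - total_cases, n - c) / comb(total_pop, n)


def most_extreme_zone(zones, total_cases, total_pop):
    """zones: dict name -> (cases c, population n). Among zones whose rate exceeds the
    overall rate, return the one with the smallest null probability."""
    best_name, best_prob = None, 1.0
    for name, (c, n) in zones.items():
        if c * total_pop <= n * total_cases:
            continue  # zone rate not above the overall rate
        prob = hypergeometric_null_probability(c, n, total_cases, total_pop)
        if prob < best_prob:
            best_name, best_prob = name, prob
    return best_name, best_prob


# Hypothetical map: 3 cases among 10 people
zones = {"A": (3, 3), "B": (2, 4), "C": (0, 3)}
print(most_extreme_zone(zones, total_cases=3, total_pop=10))
```

    Zone A, which holds every case in only three people, has null probability 1/120 and is selected; zone C is skipped because its rate is below the overall rate.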

  15. The statistical average of optical properties for alumina particle cluster in aircraft plume

    NASA Astrophysics Data System (ADS)

    Li, Jingying; Bai, Lu; Wu, Zhensen; Guo, Lixin

    2018-04-01

    We establish a model for the lognormal distribution of the monomer radius and number of alumina particle clusters in the plume. Based on Multi-Sphere T Matrix (MSTM) theory, we provide a method for finding the statistical average of the optical properties of alumina particle clusters in the plume, analyze the effect of different distributions and different detection wavelengths on this statistical average, and compare the statistical average optical properties under the alumina particle cluster model established in this study with those under three simplified alumina particle models. The calculation results show that the monomer number of an alumina particle cluster and its size distribution have a considerable effect on its statistical average optical properties. The statistical averages of the optical properties of alumina particle clusters differ markedly across common detection wavelengths, and these differences strongly affect the modeling of the IR and UV radiation properties of the plume. Compared with the three simplified models, the alumina particle cluster model presented here yields both higher extinction and higher scattering efficiencies. Therefore, an accurate description of the scattering properties of alumina particles in aircraft plumes is of great significance for the study of plume radiation properties.
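
    A statistical average over a lognormal size distribution is a weighted integral, ⟨Q⟩ = ∫ Q(r) p(r) dr. The sketch below evaluates it in log space with the trapezoid rule. The function `prop` is a hypothetical stand-in for an MSTM-computed quantity such as extinction efficiency (the MSTM calculation itself is not shown), and only the monomer-radius distribution is averaged here, whereas the paper also treats the monomer number.

```python
import math


def lognormal_average(prop, mu, sigma, n_points=2001, width=8.0):
    """Average prop(r) over r ~ LogNormal(mu, sigma).

    Integrates over x = ln r, where x is normal with mean mu and standard deviation
    sigma, using the trapezoid rule on [mu - width*sigma, mu + width*sigma]."""
    lo, hi = mu - width * sigma, mu + width * sigma
    h = (hi - lo) / (n_points - 1)
    total = 0.0
    for k in range(n_points):
        x = lo + k * h
        weight = math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        term = weight * prop(math.exp(x))
        total += term if 0 < k < n_points - 1 else 0.5 * term  # half weight at the ends
    return total * h


# Check against the closed-form lognormal mean exp(mu + sigma^2 / 2)
print(round(lognormal_average(lambda r: r, 0.0, 0.5), 6), round(math.exp(0.125), 6))  # both 1.133148
```

    Integrating in ln r keeps the grid evenly spaced where the lognormal density actually has its mass, so a modest number of points is enough.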

  16. Towards Accurate Modelling of Galaxy Clustering on Small Scales: Testing the Standard ΛCDM + Halo Model

    NASA Astrophysics Data System (ADS)

    Sinha, Manodeep; Berlind, Andreas A.; McBride, Cameron K.; Scoccimarro, Roman; Piscionere, Jennifer A.; Wibking, Benjamin D.

    2018-04-01

    Interpreting the small-scale clustering of galaxies with halo models can elucidate the connection between galaxies and dark matter halos. Unfortunately, the modelling is typically not sufficiently accurate for ruling out models statistically. It is thus difficult to use the information encoded in small scales to test cosmological models or probe subtle features of the galaxy-halo connection. In this paper, we attempt to push halo modelling into the "accurate" regime with a fully numerical mock-based methodology and careful treatment of statistical and systematic errors. With our forward-modelling approach, we can incorporate clustering statistics beyond the traditional two-point statistics. We use this modelling methodology to test the standard ΛCDM + halo model against the clustering of SDSS DR7 galaxies. Specifically, we use the projected correlation function, group multiplicity function and galaxy number density as constraints. We find that while the model fits each statistic separately, it struggles to fit them simultaneously. Adding group statistics leads to a more stringent test of the model and significantly tighter constraints on model parameters. We explore the impact of varying the adopted halo definition and cosmological model and find that changing the cosmology makes a significant difference. The most successful model we tried (Planck cosmology with Mvir halos) matches the clustering of low luminosity galaxies, but exhibits a 2.3σ tension with the clustering of luminous galaxies, thus providing evidence that the "standard" halo model needs to be extended. This work opens the door to adding interesting freedom to the halo model and including additional clustering statistics as constraints.

  17. Spatiotemporal Analysis of the Ebola Hemorrhagic Fever in West Africa in 2014

    NASA Astrophysics Data System (ADS)

    Xu, M.; Cao, C. X.; Guo, H. F.

    2017-09-01

    Ebola hemorrhagic fever (EHF) is an acute hemorrhagic disease caused by the Ebola virus, which is highly contagious. This paper aimed to explore the areas where EHF cases concentrated in West Africa in 2014 and to identify endemic areas and their tendency by means of space-time analysis. We mapped the distribution of EHF incidence and explored statistically significant space, time and space-time disease clusters. We used hotspot analysis to find the spatial clustering pattern on the basis of the actual outbreak cases. Spatial-temporal cluster analysis was used to analyze the spatial or temporal distribution of disease aggregation and to examine whether this distribution is statistically significant. Local clusters were investigated using Kulldorff's scan statistic approach. The results reveal that the epidemic was mainly concentrated in the western part of Africa near the North Atlantic, with an obvious regional distribution. For the current epidemic, spatial cluster analysis identified areas of high Ebola virus disease (EVD) incidence.
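
    The abstract does not name the hotspot statistic; a common choice for this kind of analysis is the Getis-Ord Gi* z-score, which compares the weighted sum of values around each area (including the area itself) with what would be expected if values were spatially random. The sketch below assumes Gi* with a user-supplied weights matrix; the weights and incidence values are hypothetical.

```python
import math


def getis_ord_gi_star(values, weights):
    """Gi* z-scores. weights[i][j] is the spatial weight between areas i and j,
    with weights[i][i] included (this is what distinguishes Gi* from Gi)."""
    n = len(values)
    x_bar = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - x_bar ** 2)
    z = []
    for i in range(n):
        w = weights[i]
        sw = sum(w)
        sw2 = sum(wij * wij for wij in w)
        num = sum(wij * xj for wij, xj in zip(w, values)) - x_bar * sw
        den = s * math.sqrt((n * sw2 - sw ** 2) / (n - 1))
        z.append(num / den)
    return z


# Five areas along a line; each area's neighbourhood is itself plus adjacent areas
incidence = [10, 10, 1, 1, 1]
w = [[1 if abs(i - j) <= 1 else 0 for j in range(5)] for i in range(5)]
print([round(v, 2) for v in getis_ord_gi_star(incidence, w)])
```

    Large positive z-scores flag hot spots (high values surrounded by high values) and large negative ones flag cold spots; the denominator is undefined when all values are equal or when an area's weights are constant across the whole map.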

  18. A spatial scan statistic for multiple clusters.

    PubMed

    Li, Xiao-Zhou; Wang, Jin-Feng; Yang, Wei-Zhong; Li, Zhong-Jie; Lai, Sheng-Jie

    2011-10-01

    Spatial scan statistics are commonly used for geographical disease surveillance and cluster detection. When multiple clusters coexist in the study area, they become difficult to detect because each cluster shadows the others. The recently proposed sequential method showed better power for detecting the second, weaker cluster, but did not improve the ability to detect the first, stronger cluster, which is more important than the second. We propose a new extension of the spatial scan statistic that can be used to detect multiple clusters. By constructing two or more clusters in the alternative hypothesis, our proposed method accounts for other coexisting clusters in the detection and evaluation process. The performance of the proposed method is compared with that of the sequential method in an intensive simulation study, in which our method shows better power both in rejecting the null hypothesis and in accurately detecting the coexisting clusters. In a real study of hand-foot-mouth disease data in Pingdu city, our method successfully detects a true cluster town that the standard method cannot evaluate as statistically significant because of another cluster's shadowing effect. Copyright © 2011 Elsevier Inc. All rights reserved.
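
    One way to put several clusters into the alternative hypothesis is a Poisson model with a separate rate in each candidate cluster and one more rate for the rest of the map. When expected counts sum to the total number of cases, the log-likelihood ratio against a single common rate simplifies to the sum of c·ln(c/e) over all parts, and with a single zone it reduces to Kulldorff's statistic. This is an illustrative construction under those assumptions; Li et al.'s exact formulation may differ.

```python
import math


def partition_llr(zones, total_cases):
    """Poisson log-likelihood ratio for a model with a separate rate in each of several
    disjoint candidate clusters (plus the rest of the map) versus one common rate.
    zones: list of (observed, expected); expected counts over the whole map are
    assumed to be scaled so that they sum to total_cases."""
    c_in = sum(c for c, _ in zones)
    e_in = sum(e for _, e in zones)
    rest = (total_cases - c_in, total_cases - e_in)
    if rest[1] <= 0:
        raise ValueError("candidate clusters must leave part of the map outside")
    rest_rate = rest[0] / rest[1]
    if any(c / e <= rest_rate for c, e in zones):
        return 0.0  # every cluster must have an elevated rate relative to the rest
    return sum(c * math.log(c / e) for c, e in list(zones) + [rest] if c > 0)


# Hypothetical map with 30 cases: scoring the first cluster alone leaves the second
# one in the "rest" of the map, which weakens the evidence for the first.
one = partition_llr([(10, 3)], 30)
two = partition_llr([(10, 3), (8, 3)], 30)
print(round(one, 2), round(two, 2))  # prints 6.04 11.57
```

    The comparison illustrates the shadowing effect described above: once the second cluster is modelled explicitly, the remainder of the map no longer carries its excess cases and the overall likelihood ratio rises.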

  19. Coordinate based random effect size meta-analysis of neuroimaging studies.

    PubMed

    Tench, C R; Tanasescu, Radu; Constantinescu, C S; Auer, D P; Cottam, W J

    2017-06-01

    Low power in neuroimaging studies can make them difficult to interpret, and coordinate-based meta-analysis (CBMA) may go some way to mitigating this issue. CBMA has been used in many analyses to detect where published functional MRI or voxel-based morphometry studies testing similar hypotheses consistently report significant summary results (coordinates). Only the reported coordinates and possibly t statistics are analysed, and the statistical significance of clusters is determined by coordinate density. Here a method of performing coordinate-based random effect size meta-analysis and meta-regression is introduced. The algorithm (ClusterZ) analyses both coordinates and the reported t statistic or Z score, standardised by the number of subjects. Statistical significance is determined not by coordinate density but by a random-effects meta-analysis of reported effects, performed cluster-wise using standard statistical methods and taking account of the censoring inherent in the published summary results. Type 1 error control is achieved using the false cluster discovery rate (FCDR), which is based on the false discovery rate. This controls both the family-wise error rate under the null hypothesis that coordinates are randomly drawn from a standard stereotaxic space and the proportion of significant clusters that are expected under the null. Such control is necessary to avoid propagating and even amplifying the very issues motivating the meta-analysis in the first place. ClusterZ is demonstrated on both numerically simulated data and real data from reports of grey matter loss in multiple sclerosis (MS) and syndromes suggestive of MS, and of painful stimulus in healthy controls. The software implementation is freely available to download and use. Copyright © 2017 Elsevier Inc. All rights reserved.
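
    The cluster-wise random-effects step can be illustrated with the standard DerSimonian-Laird estimator, which pools study effects after estimating between-study variance τ² from Cochran's Q. This is a swapped-in textbook technique used for illustration only; ClusterZ's own estimator, its handling of censored (unreported) effects and the FCDR procedure are not reproduced.

```python
import math


def random_effects_dl(effects, variances):
    """DerSimonian-Laird random-effects meta-analysis.
    Returns (pooled effect, its standard error, between-study variance tau^2)."""
    w = [1 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw   # fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_star = [1 / (v + tau2) for v in variances]                 # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, se, tau2


# Two hypothetical standardised effects with unit sampling variance
print(random_effects_dl([0.0, 4.0], [1.0, 1.0]))  # prints (2.0, 2.0, 7.0)
```

    When studies disagree more than their sampling variances allow, τ² grows and the pooled estimate's standard error widens accordingly, which is the behaviour a random-effects analysis is meant to capture.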

  20. Towards accurate modelling of galaxy clustering on small scales: testing the standard ΛCDM + halo model

    NASA Astrophysics Data System (ADS)

    Sinha, Manodeep; Berlind, Andreas A.; McBride, Cameron K.; Scoccimarro, Roman; Piscionere, Jennifer A.; Wibking, Benjamin D.

    2018-07-01

    Interpreting the small-scale clustering of galaxies with halo models can elucidate the connection between galaxies and dark matter haloes. Unfortunately, the modelling is typically not sufficiently accurate for ruling out models statistically. It is thus difficult to use the information encoded in small scales to test cosmological models or probe subtle features of the galaxy-halo connection. In this paper, we attempt to push halo modelling into the 'accurate' regime with a fully numerical mock-based methodology and careful treatment of statistical and systematic errors. With our forward-modelling approach, we can incorporate clustering statistics beyond the traditional two-point statistics. We use this modelling methodology to test the standard Λ cold dark matter (ΛCDM) + halo model against the clustering of Sloan Digital Sky Survey (SDSS) seventh data release (DR7) galaxies. Specifically, we use the projected correlation function, group multiplicity function, and galaxy number density as constraints. We find that while the model fits each statistic separately, it struggles to fit them simultaneously. Adding group statistics leads to a more stringent test of the model and significantly tighter constraints on model parameters. We explore the impact of varying the adopted halo definition and cosmological model and find that changing the cosmology makes a significant difference. The most successful model we tried (Planck cosmology with Mvir haloes) matches the clustering of low-luminosity galaxies, but exhibits a 2.3σ tension with the clustering of luminous galaxies, thus providing evidence that the 'standard' halo model needs to be extended. This work opens the door to adding interesting freedom to the halo model and including additional clustering statistics as constraints.

  1. Cluster size statistic and cluster mass statistic: two novel methods for identifying changes in functional connectivity between groups or conditions.

    PubMed

    Ing, Alex; Schwarzbauer, Christian

    2014-01-01

    Functional connectivity has become an increasingly important area of research in recent years. At a typical spatial resolution, approximately 300 million connections link each voxel in the brain with every other. This pattern of connectivity is known as the functional connectome. Connectivity is often compared between experimental groups and conditions. Standard methods used to control the type 1 error rate are likely to be insensitive when comparisons are carried out across the whole connectome, due to the huge number of statistical tests involved. To address this problem, two new cluster-based methods, the cluster size statistic (CSS) and the cluster mass statistic (CMS), are introduced to control the family-wise error rate across all connectivity values. These methods operate within a statistical framework similar to the cluster-based methods used in conventional task-based fMRI. Both methods are data-driven and permutation-based, and require minimal statistical assumptions. Here, the performance of each procedure is evaluated in a receiver operating characteristic (ROC) analysis, utilising a simulated dataset. The relative sensitivity of each method is also tested on real data: BOLD (blood oxygen level dependent) fMRI scans were carried out on twelve subjects under normal conditions and during the hypercapnic state (induced through the inhalation of 6% CO2 in 21% O2 and 73% N2). Both CSS and CMS detected significant changes in connectivity between normal and hypercapnic states. A family-wise error correction carried out at the individual connection level exhibited no significant changes in connectivity.
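    The cluster-based permutation idea can be illustrated with a generic one-dimensional sketch. It is not the authors' connectome implementation: here a "cluster" is assumed to be a run of adjacent positions whose Welch t statistic exceeds a hypothetical `threshold`, cluster size counts those positions, and cluster mass is assumed to be the sum of their |t| values. Group labels are permuted, and the largest cluster size and mass from each permutation form the null distributions, which is what provides family-wise error control.

```python
import random
import statistics


def welch_t(group_a, group_b):
    """Welch t statistic at each position; each group is a list of equal-length rows."""
    stats = []
    for j in range(len(group_a[0])):
        a = [row[j] for row in group_a]
        b = [row[j] for row in group_b]
        se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
        stats.append((statistics.mean(a) - statistics.mean(b)) / se if se > 0 else 0.0)
    return stats


def find_clusters(stats, threshold):
    """Return (size, mass) for each run of adjacent positions with |t| > threshold."""
    clusters, size, mass = [], 0, 0.0
    for t in stats:
        if abs(t) > threshold:
            size, mass = size + 1, mass + abs(t)
        elif size:
            clusters.append((size, mass))
            size, mass = 0, 0.0
    if size:
        clusters.append((size, mass))
    return clusters


def cluster_permutation_test(group_a, group_b, threshold=2.0, n_perm=999, seed=0):
    """Permutation p-values for cluster size and cluster mass (max-statistic method)."""
    observed = find_clusters(welch_t(group_a, group_b), threshold)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    rng = random.Random(seed)
    max_sizes, max_masses = [], []
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of subjects under the null
        perm = find_clusters(welch_t(pooled[:n_a], pooled[n_a:]), threshold)
        max_sizes.append(max((s for s, _ in perm), default=0))
        max_masses.append(max((m for _, m in perm), default=0.0))
    results = []
    for size, mass in observed:
        p_size = (1 + sum(s >= size for s in max_sizes)) / (n_perm + 1)
        p_mass = (1 + sum(m >= mass for m in max_masses)) / (n_perm + 1)
        results.append((size, mass, p_size, p_mass))
    return results
```

    Comparing each observed cluster with the permutation maximum, rather than with a per-cluster null, is what makes the threshold valid across all positions at once; with `n_perm=999` the smallest attainable p-value is 0.001.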

  2. Cluster Size Statistic and Cluster Mass Statistic: Two Novel Methods for Identifying Changes in Functional Connectivity Between Groups or Conditions

    PubMed Central

    Ing, Alex; Schwarzbauer, Christian

    2014-01-01

    Functional connectivity has become an increasingly important area of research in recent years. At a typical spatial resolution, approximately 300 million connections link each voxel in the brain with every other. This pattern of connectivity is known as the functional connectome. Connectivity is often compared between experimental groups and conditions. Standard methods used to control the type 1 error rate are likely to be insensitive when comparisons are carried out across the whole connectome, due to the huge number of statistical tests involved. To address this problem, two new cluster-based methods, the cluster size statistic (CSS) and the cluster mass statistic (CMS), are introduced to control the family-wise error rate across all connectivity values. These methods operate within a statistical framework similar to the cluster-based methods used in conventional task-based fMRI. Both methods are data-driven and permutation-based, and require minimal statistical assumptions. Here, the performance of each procedure is evaluated in a receiver operating characteristic (ROC) analysis, utilising a simulated dataset. The relative sensitivity of each method is also tested on real data: BOLD (blood oxygen level dependent) fMRI scans were carried out on twelve subjects under normal conditions and during the hypercapnic state (induced through the inhalation of 6% CO2 in 21% O2 and 73% N2). Both CSS and CMS detected significant changes in connectivity between normal and hypercapnic states. A family-wise error correction carried out at the individual connection level exhibited no significant changes in connectivity. PMID:24906136

  3. WordCluster: detecting clusters of DNA words and genomic elements

    PubMed Central

    2011-01-01

    Background: Many k-mers (or DNA words) and genomic elements are known to be spatially clustered in the genome. Well-established examples are the genes, TFBSs, CpG dinucleotides, microRNA genes and ultra-conserved non-coding regions. Currently, no algorithm exists to find these clusters in a statistically comprehensible way. The detection of clustering often relies on densities and sliding-window approaches or arbitrarily chosen distance thresholds. Results: We introduce here an algorithm to detect clusters of DNA words (k-mers), or any other genomic element, based on the distance between consecutive copies and an assigned statistical significance. We implemented the method in a web server connected to a MySQL backend, which also determines the co-localization with gene annotations. We demonstrate the usefulness of this approach by detecting the clusters of CAG/CTG (cytosine contexts that can be methylated in undifferentiated cells), showing that the degree of methylation varies drastically between the inside and outside of the clusters. As another example, we used WordCluster to search for statistically significant clusters of olfactory receptor (OR) genes in the human genome. Conclusions: WordCluster seems to predict biologically meaningful clusters of DNA words (k-mers) and genomic entities. The implementation of the method as a web server is available at http://bioinfo2.ugr.es/wordCluster/wordCluster.php, including additional features such as the detection of co-localization with gene regions and an annotation enrichment tool for functional analysis of overlapped genes. PMID:21261981
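    The abstract does not give WordCluster's scoring formula, so the following is only a hypothetical, simplified sketch of distance-based clustering with a significance score: consecutive copies closer than an assumed `max_gap` are linked into candidate clusters, and each candidate is scored with a binomial tail probability under an assumed null in which every position independently carries a copy with probability p = n / L (n copies in a sequence of length L).

```python
from math import exp, lgamma, log, log1p


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log(p) + (n - i) * log1p(-p))
        total += exp(log_pmf)
    return min(total, 1.0)


def word_clusters(positions, seq_len, max_gap, alpha=0.05):
    """Link copies whose consecutive distance is <= max_gap, then keep significant groups.

    positions: sorted start coordinates of the word (k-mer) or element copies.
    Returns (first, last, n_copies, p_value) for each group with p_value <= alpha.
    """
    p = len(positions) / seq_len  # assumed uniform background density
    groups, current = [], [positions[0]]
    for prev, pos in zip(positions, positions[1:]):
        if pos - prev <= max_gap:
            current.append(pos)
        else:
            groups.append(current)
            current = [pos]
    groups.append(current)
    results = []
    for group in groups:
        if len(group) < 2:
            continue
        span = group[-1] - group[0] + 1
        p_value = binomial_tail(len(group), span, p)
        if p_value <= alpha:
            results.append((group[0], group[-1], len(group), p_value))
    return results
```

    With no multiple-testing correction, `alpha` here is a per-cluster level; a real genome-wide scan would need to account for the number of candidate groups.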

  4. The use of the temporal scan statistic to detect methicillin-resistant Staphylococcus aureus clusters in a community hospital.

    PubMed

    Faires, Meredith C; Pearl, David L; Ciccotelli, William A; Berke, Olaf; Reid-Smith, Richard J; Weese, J Scott

    2014-07-08

    In healthcare facilities, conventional surveillance techniques using rule-based guidelines may result in under- or over-reporting of methicillin-resistant Staphylococcus aureus (MRSA) outbreaks, as these guidelines are generally unvalidated. The objectives of this study were to investigate the utility of the temporal scan statistic for detecting MRSA clusters, validate clusters using molecular techniques and hospital records, and determine significant differences in the rate of MRSA cases using regression models. Patients admitted to a community hospital between August 2006 and February 2011, and identified with MRSA more than 48 hours following hospital admission, were included in this study. Between March 2010 and February 2011, MRSA specimens were obtained for spa typing. MRSA clusters were investigated using a retrospective temporal scan statistic. Tests were conducted on a monthly scale and significant clusters were compared to MRSA outbreaks identified by hospital personnel. Associations between the rate of MRSA cases and the variables year, month, and season were investigated using a negative binomial regression model. During the study period, 735 MRSA cases were identified and 167 MRSA isolates were spa typed. Nine different spa types were identified, with spa type 2/t002 (88.6%) the most prevalent. The temporal scan statistic identified significant MRSA clusters at the hospital (n=2), service (n=16), and ward (n=10) levels (P ≤ 0.05). Seven clusters were concordant with nine MRSA outbreaks identified by hospital staff. For the remaining clusters, seven events may have been equivalent to true outbreaks and six clusters demonstrated possible transmission events. The regression analysis indicated that the years 2009-2011, compared to 2006, and the months March and April, compared to January, were associated with an increase in the rate of MRSA cases (P ≤ 0.05).
The application of the temporal scan statistic identified several MRSA clusters that were not detected by hospital personnel. The identification of specific years and months with increased MRSA rates may be attributable to several hospital level factors including the presence of other pathogens. Within hospitals, the incorporation of the temporal scan statistic to standard surveillance techniques is a valuable tool for healthcare workers to evaluate surveillance strategies and aid in the identification of MRSA clusters.
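    A retrospective temporal scan statistic can be sketched as follows, assuming a Poisson model with monthly case counts and a positive monthly denominator such as patient-days (the abstract does not say which denominator or software was used; the function and argument names here are hypothetical). Every window of up to `max_len` consecutive months is scored with Kulldorff's Poisson log-likelihood ratio, and the maximum is compared with the maxima from data simulated under the null hypothesis of no temporal clustering.

```python
import math
import random
from collections import Counter


def poisson_llr(c, e, total):
    """Kulldorff's Poisson log-likelihood ratio for a window with c observed and e expected cases."""
    if c <= e:
        return 0.0  # only windows with an excess of cases are of interest
    outside = 0.0 if c == total else (total - c) * math.log((total - c) / (total - e))
    return c * math.log(c / e) + outside


def best_window(counts, denominators, max_len):
    """Most likely cluster among all windows of 1..max_len consecutive periods."""
    total, total_den = sum(counts), sum(denominators)
    best_llr, best = 0.0, None
    for start in range(len(counts)):
        c, den = 0, 0.0
        for end in range(start, min(start + max_len, len(counts))):
            c += counts[end]
            den += denominators[end]
            llr = poisson_llr(c, total * den / total_den, total)
            if llr > best_llr:
                best_llr, best = llr, (start, end)
    return best_llr, best


def temporal_scan(counts, denominators, max_len, n_sim=999, seed=0):
    """Return (window, LLR, Monte Carlo p-value) for the most likely temporal cluster."""
    observed_llr, window = best_window(counts, denominators, max_len)
    rng = random.Random(seed)
    periods = range(len(counts))
    exceed = 0
    for _ in range(n_sim):
        # Null model: the same total number of cases, spread in proportion to the denominators.
        draws = Counter(rng.choices(periods, weights=denominators, k=sum(counts)))
        simulated = [draws[t] for t in periods]
        if best_window(simulated, denominators, max_len)[0] >= observed_llr:
            exceed += 1
    return window, observed_llr, (exceed + 1) / (n_sim + 1)
```

    Because the p-value is computed from the maximum over all windows, it already accounts for the many overlapping windows that were examined; dedicated software such as SaTScan implements this scheme with many more options.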

  5. The effect of clustering of galaxies on the statistics of gravitational lenses

    NASA Technical Reports Server (NTRS)

    Anderson, N.; Alcock, C.

    1986-01-01

    We examine whether clustering of galaxies can significantly alter the statistical properties of gravitational lenses. Only models of clustering that resemble the observed distribution of galaxies in the properties of the two-point correlation function are considered. Monte Carlo simulations of the imaging process are described. It is found that the effect of clustering is too small to be significant, unless the mass of the deflectors is so large that gravitational lenses become common occurrences. A special model is described which was concocted to optimize the effect of clustering on gravitational lensing but still resemble the observed distribution of galaxies; even this simulation did not satisfactorily produce large numbers of wide-angle lenses.

  6. Spatial scan statistics for detection of multiple clusters with arbitrary shapes.

    PubMed

    Lin, Pei-Sheng; Kung, Yi-Hung; Clayton, Murray

    2016-12-01

    In applying scan statistics for public health research, it would be valuable to develop a detection method for multiple clusters that accommodates spatial correlation and covariate effects in an integrated model. In this article, we connect the concepts of the likelihood ratio (LR) scan statistic and the quasi-likelihood (QL) scan statistic to provide a series of detection procedures sufficiently flexible to apply to clusters of arbitrary shape. First, we use an independent scan model for detection of clusters and then a variogram tool to examine the existence of spatial correlation and regional variation based on residuals of the independent scan model. When the estimate of regional variation is significantly different from zero, a mixed QL estimating equation is developed to estimate coefficients of geographic clusters and covariates. We use the Benjamini-Hochberg procedure (1995) to find a threshold for p-values to address the multiple testing problem. A quasi-deviance criterion is used to regroup the estimated clusters to find geographic clusters with arbitrary shapes. We conduct simulations to compare the performance of the proposed method with other scan statistics. For illustration, the method is applied to enterovirus data from Taiwan. © 2016, The International Biometric Society.
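    The Benjamini-Hochberg step used to choose a p-value threshold across the candidate clusters can be sketched in a few lines: sort the m p-values, find the largest rank k with p_(k) ≤ kq/m, and reject the k smallest. The false discovery rate level `q` is a free parameter in this sketch, not a value reported in the abstract.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up procedure at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            cutoff = rank  # keep the largest rank that satisfies the bound
    return sorted(order[:cutoff])
```

    For example, `benjamini_hochberg([0.03, 0.04])` returns `[0, 1]`: the smaller p-value fails its own bound of 0.025, but it is still rejected because the procedure steps up to the largest rank that passes (0.04 ≤ 0.05).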

  7. Dissociation kinetics of metal clusters on multiple electronic states including electronic level statistics into the vibronic soup

    NASA Astrophysics Data System (ADS)

    Shvartsburg, Alexandre A.; Siu, K. W. Michael

    2001-06-01

    Modeling the delayed dissociation of clusters has been a frontline development area in chemical physics over the last decade. It is of fundamental interest how statistical kinetics methods previously validated for regular molecules and atomic nuclei apply to clusters, as this would help to understand the transferability of statistical models for the disintegration of complex systems across various classes of physical objects. From a practical perspective, accurate simulation of unimolecular decomposition is critical for extracting true thermochemical values from measurements on the decay of energized clusters. Metal clusters are particularly challenging because of the multitude of low-lying electronic states that are coupled to vibrations. This has previously been accounted for by approximating the average electronic structure of a conducting cluster with the levels of an electron in a cavity. While this provides a reasonable time-averaged description, it ignores the distribution of instantaneous electronic structures in a "boiling" cluster around that average. Here we set up a new treatment that incorporates the statistical distribution of electronic levels around the average picture using random matrix theory. This approach faithfully reflects the completely chaotic "vibronic soup" nature of hot metal clusters. We found that the consideration of electronic level statistics significantly promotes electronic excitation and thus increases the magnitude of its effect. As this excitation always depresses the decay rates, the inclusion of level statistics results in slower dissociation of metal clusters.

  8. Geospatial clustering in sugar-sweetened beverage consumption among Boston youth.

    PubMed

    Tamura, Kosuke; Duncan, Dustin T; Athens, Jessica K; Bragg, Marie A; Rienti, Michael; Aldstadt, Jared; Scott, Marc A; Elbel, Brian

    2017-09-01

    The objective was to detect geospatial clustering of sugar-sweetened beverage (SSB) intake in Boston adolescents (age = 16.3 ± 1.3 years [range: 13-19]; female = 56.1%; White = 10.4%, Black = 42.6%, Hispanics = 32.4%, and others = 14.6%) using spatial scan statistics. We used data on self-reported SSB intake from the 2008 Boston Youth Survey Geospatial Dataset (n = 1292). Two binary variables were created: consumption of SSB (never versus any) on (1) soda and (2) other sugary drinks (e.g., lemonade). A Bernoulli spatial scan statistic was used to identify geospatial clusters of soda and other sugary drinks in unadjusted models and models adjusted for age, gender, and race/ethnicity. There was no statistically significant clustering of soda consumption in the unadjusted model. In contrast, a cluster of non-soda SSB consumption emerged in the middle of Boston (relative risk = 1.20, p = .005), indicating that adolescents within the cluster had a 20% higher probability of reporting non-soda SSB intake than outside the cluster. The cluster was no longer significant in the adjusted model, suggesting spatial variation in non-soda SSB drink intake correlates with the geographic distribution of students by race/ethnicity, age, and gender.
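    For a binary outcome such as any versus no non-soda SSB intake, a Bernoulli scan statistic compares the proportion of cases inside a candidate window with the proportion outside it. The sketch below covers only the log-likelihood ratio for a single window and a relative risk defined as the inside proportion divided by the outside proportion, which matches reading RR = 1.20 as a 20% higher probability; the search over many windows and the Monte Carlo p-value are omitted, and the function names are hypothetical.

```python
import math


def _log_lik(successes, n):
    """Binomial log-likelihood at the MLE, with the convention 0 * log(0) = 0."""
    ll = 0.0
    for k in (successes, n - successes):
        if k > 0:
            ll += k * math.log(k / n)
    return ll


def bernoulli_llr(cases_in, n_in, cases_total, n_total):
    """Log-likelihood ratio of a Bernoulli scan window versus a constant-risk null."""
    cases_out, n_out = cases_total - cases_in, n_total - n_in
    if cases_in / n_in <= cases_out / n_out:
        return 0.0  # scanning for elevated probability only
    return (_log_lik(cases_in, n_in) + _log_lik(cases_out, n_out)
            - _log_lik(cases_total, n_total))


def relative_risk(cases_in, n_in, cases_total, n_total):
    """Proportion of cases inside the window divided by the proportion outside it."""
    return (cases_in / n_in) / ((cases_total - cases_in) / (n_total - n_in))
```

    For instance, a hypothetical window holding 200 of 1,000 students, 120 of whom are among 500 cases in total, gives a relative risk of 0.6 / 0.475 ≈ 1.26.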

  9. Statistical Significance for Hierarchical Clustering

    PubMed Central

    Kimes, Patrick K.; Liu, Yufeng; Hayes, D. Neil; Marron, J. S.

    2017-01-01

    Summary: Cluster analysis has proved to be an invaluable tool for the exploratory and unsupervised analysis of high dimensional datasets. Among methods for clustering, hierarchical approaches have enjoyed substantial popularity in genomics and other fields for their ability to simultaneously uncover multiple layers of clustering structure. A critical and challenging question in cluster analysis is whether the identified clusters represent important underlying structure or are artifacts of natural sampling variation. Few approaches have been proposed for addressing this problem in the context of hierarchical clustering, for which the problem is further complicated by the natural tree structure of the partition, and the multiplicity of tests required to parse the layers of nested clusters. In this paper, we propose a Monte Carlo based approach for testing statistical significance in hierarchical clustering which addresses these issues. The approach is implemented as a sequential testing procedure guaranteeing control of the family-wise error rate. Theoretical justification is provided for our approach, and its power to detect true clustering structure is illustrated through several simulation studies and applications to two cancer gene expression datasets. PMID:28099990

  10. Statistical significance test for transition matrices of atmospheric Markov chains

    NASA Technical Reports Server (NTRS)

    Vautard, Robert; Mo, Kingtse C.; Ghil, Michael

    1990-01-01

    Low-frequency variability of large-scale atmospheric dynamics can be represented schematically by a Markov chain of multiple flow regimes. This Markov chain contains useful information for the long-range forecaster, provided that the statistical significance of the associated transition matrix can be reliably tested. Monte Carlo simulation yields a very reliable significance test for the elements of this matrix. The results of this test agree with previously used empirical formulae when each cluster of maps identified as a distinct flow regime is sufficiently large and when they all contain a comparable number of maps. Monte Carlo simulation provides a more reliable way to test the statistical significance of transitions to and from small clusters. It can determine the most likely transitions, as well as the most unlikely ones, with a prescribed level of statistical significance.
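    The Monte Carlo test can be sketched for a sequence of daily regime labels. The abstract does not state the null model, so this sketch assumes one: the observed sequence is randomly reordered, which preserves how often each regime occurs but removes any memory between consecutive days. Each transition count is then compared with its simulated distribution in both directions, flagging both unusually likely and unusually unlikely transitions.

```python
import random
from collections import Counter


def transition_counts(sequence):
    """Count transitions (from_regime, to_regime) between consecutive time steps."""
    return Counter(zip(sequence, sequence[1:]))


def transition_significance(sequence, n_sim=999, seed=0):
    """One-sided Monte Carlo p-values for every regime-to-regime transition count.

    Returns {(a, b): (p_high, p_low)}: a small p_high marks a transition that occurs
    more often than chance, a small p_low one that occurs less often than chance.
    Null model (assumption): random reorderings of the observed regime sequence.
    """
    observed = transition_counts(sequence)
    states = sorted(set(sequence))
    pairs = [(a, b) for a in states for b in states]
    rng = random.Random(seed)
    shuffled = list(sequence)
    at_least, at_most = Counter(), Counter()
    for _ in range(n_sim):
        rng.shuffle(shuffled)
        simulated = transition_counts(shuffled)
        for pair in pairs:
            at_least[pair] += simulated[pair] >= observed[pair]
            at_most[pair] += simulated[pair] <= observed[pair]
    return {pair: ((at_least[pair] + 1) / (n_sim + 1), (at_most[pair] + 1) / (n_sim + 1))
            for pair in pairs}
```

    Unlike closed-form approximations, the simulated distribution remains usable when some regimes are visited only a few times, which is the small-cluster situation the abstract highlights.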

  11. Comparison of Salmonella enteritidis phage types isolated from layers and humans in Belgium in 2005.

    PubMed

    Welby, Sarah; Imberechts, Hein; Riocreux, Flavien; Bertrand, Sophie; Dierick, Katelijne; Wildemauwe, Christa; Hooyberghs, Jozef; Van der Stede, Yves

    2011-08-01

    The aim of this study was to investigate the available results for Belgium of the European Union coordinated monitoring program (2004/665 EC) on Salmonella in layers in 2005, as well as the results of the monthly outbreak reports of Salmonella Enteritidis in humans in 2005, to identify a possible statistically significant trend in both populations. Separate descriptive statistics and univariate analyses were carried out, and parametric and/or non-parametric hypothesis tests were conducted. A time cluster analysis was performed for all Salmonella Enteritidis phage types (PTs) isolated. The proportions of each Salmonella Enteritidis PT in layers and in humans were compared, and the monthly distribution of the most common PT isolated in both populations was evaluated. The time cluster analysis revealed significant clusters during May and June for layers and during May, July, August, and September for humans. PT21, the most frequently isolated PT in both populations in 2005, seemed to be responsible for these significant clusters. PT4 was the second most frequently isolated PT. No significant difference was found in the monthly trends of either PT between the two populations based on parametric and non-parametric methods. A similar monthly trend of PT distribution in humans and layers during the year 2005 was observed. The time cluster analysis and the statistical significance testing confirmed these results. Moreover, the time cluster analysis showed significant clusters during the summer, slightly delayed in humans relative to layers. These results suggest a common link between the prevalence of Salmonella Enteritidis in layers and the occurrence of the pathogen in humans. Phage typing was confirmed to be a useful tool for identifying temporal trends.

  12. Connecting optical and X-ray tracers of galaxy cluster relaxation

    NASA Astrophysics Data System (ADS)

    Roberts, Ian D.; Parker, Laura C.; Hlavacek-Larrondo, Julie

    2018-04-01

    Substantial effort has been devoted to determining the ideal proxy for quantifying the morphology of the hot intracluster medium in clusters of galaxies. These proxies, based on X-ray emission, typically require expensive, high-quality X-ray observations, making them difficult to apply to large surveys of groups and clusters. Here, we compare optical relaxation proxies with X-ray asymmetries and centroid shifts for a sample of Sloan Digital Sky Survey clusters with high-quality, archival X-ray data from Chandra and XMM-Newton. The three optical relaxation measures considered are the shape of the member-galaxy projected velocity distribution (measured by the Anderson-Darling (AD) statistic), the stellar mass gap between the most-massive and second-most-massive cluster galaxy, and the offset between the most-massive galaxy (MMG) position and the luminosity-weighted cluster centre. The AD statistic and stellar mass gap correlate significantly with X-ray relaxation proxies, with the AD statistic being the stronger correlator. Conversely, we find no evidence for a correlation between X-ray asymmetry or centroid shift and the MMG offset. High-mass clusters (M_halo > 10^14.5 M⊙) in this sample have X-ray asymmetries, centroid shifts, and Anderson-Darling statistics which are systematically larger than for low-mass systems. Finally, considering the dichotomy of Gaussian and non-Gaussian clusters (measured by the AD test), we show that the probability of being a non-Gaussian cluster correlates significantly with X-ray asymmetry but only shows a marginal correlation with centroid shift. These results confirm the shape of the radial velocity distribution as a useful proxy for cluster relaxation, which can then be applied to large redshift surveys lacking extensive X-ray coverage.
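    The Anderson-Darling statistic used to measure the shape of a cluster's member-galaxy velocity distribution can be computed against a Gaussian null as follows. This is a minimal sketch assuming the mean and dispersion are estimated from the same velocities and applying Stephens' small-sample adjustment; the paper's exact implementation and significance threshold are not given in the abstract.

```python
import math


def anderson_darling_normal(values):
    """Modified Anderson-Darling statistic A*^2 for normality, mean and sd estimated from the data."""
    x = sorted(values)
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    z = [(v - mean) / sd for v in x]
    log_cdf = [math.log(0.5 * math.erfc(-zi / math.sqrt(2))) for zi in z]  # log Phi(z)
    log_sf = [math.log(0.5 * math.erfc(zi / math.sqrt(2))) for zi in z]    # log(1 - Phi(z))
    s = sum((2 * i + 1) * (log_cdf[i] + log_sf[n - 1 - i]) for i in range(n))
    a2 = -n - s / n
    # Small-sample adjustment for estimated parameters (Stephens); 5% critical value is about 0.752.
    return a2 * (1 + 0.75 / n + 2.25 / n**2)
```

    Under these assumptions, values of the adjusted statistic above roughly 0.752 reject a Gaussian velocity distribution at the 5% level; larger values indicate a less Gaussian, and plausibly less relaxed, cluster.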

  13. Geographic Clusters of Basal Cell Carcinoma in a Northern California Health Plan Population.

    PubMed

    Ray, G Thomas; Kulldorff, Martin; Asgari, Maryam M

    2016-11-01

    Rates of skin cancer, including basal cell carcinoma (BCC), the most common cancer, have been increasing over the past 3 decades. A better understanding of geographic clustering of BCCs can help target screening and prevention efforts. The objective was to present a methodology to identify spatial clusters of BCC and to identify such clusters in a northern California population. This retrospective study used a BCC registry to determine rates of BCC by census block group, and used spatial scan statistics to identify statistically significant geographic clusters of BCCs, adjusting for age, sex, and socioeconomic status. The study population consisted of white, non-Hispanic members of Kaiser Permanente Northern California during years 2011 and 2012. The main outcome was statistically significant geographic clusters of BCC, as determined by spatial scan statistics. Spatial analysis of 28 408 individuals who received a diagnosis of at least 1 BCC in 2011 or 2012 revealed distinct geographic areas with elevated BCC rates. Among the 14 counties studied, BCC incidence ranged from 661 to 1598 per 100 000 person-years. After adjustment for age, sex, and neighborhood socioeconomic status, a pattern of 5 discrete geographic clusters emerged, with a relative risk ranging from 1.12 (95% CI, 1.03-1.21; P = .006) for a cluster in eastern Sonoma and northern Napa Counties to 1.40 (95% CI, 1.15-1.71; P < .001) for a cluster in east Contra Costa and west San Joaquin Counties, compared with persons residing outside that cluster. In this study of a northern California population, we identified several geographic clusters with modestly elevated incidence of BCC. Knowledge of geographic clusters can help inform future research on the underlying etiology of the clustering including factors related to the environment, health care access, or other characteristics of the resident population, and can help target screening efforts to areas of highest yield.

  14. Dynamics of cD Clusters of Galaxies. 4; Conclusion of a Survey of 25 Abell Clusters

    NASA Technical Reports Server (NTRS)

    Oegerle, William R.; Hill, John M.; Fisher, Richard R. (Technical Monitor)

    2001-01-01

    We present the final results of a spectroscopic study of a sample of cD galaxy clusters. The goal of this program has been to study the dynamics of the clusters, with emphasis on determining the nature and frequency of cD galaxies with peculiar velocities. Redshifts measured with the MX Spectrometer have been combined with those obtained from the literature to obtain typically 50-150 observed velocities in each of 25 galaxy clusters containing a central cD galaxy. We present a dynamical analysis of the final 11 clusters to be observed in this sample. All 25 clusters are analyzed in a uniform manner to test for the presence of substructure, and to determine peculiar velocities and their statistical significance for the central cD galaxy. These peculiar velocities were used to determine whether or not the central cD galaxy is at rest in the cluster potential well. We find that 30-50% of the clusters in our sample possess significant subclustering (depending on the cluster radius used in the analysis), which is in agreement with other studies of non-cD clusters. Hence, the dynamical state of cD clusters is not different from that of other present-day clusters. After careful study, four of the clusters appear to have a cD galaxy with a significant peculiar velocity. Dressler-Shectman tests indicate that three of these four clusters have statistically significant substructure within 1.5 h_75^-1 Mpc of the cluster center. The dispersion of the cD peculiar velocities is 164 +41/-34 km/s around the mean cluster velocity. This represents a significant detection of peculiar cD velocities, but at a level which is far below the mean velocity dispersion for this sample of clusters. The picture that emerges is one in which cD galaxies are nearly at rest with respect to the cluster potential well, but have small residual velocities due to subcluster mergers.

  15. Regional variation in the severity of pesticide exposure outcomes: applications of geographic information systems and spatial scan statistics.

    PubMed

    Sudakin, Daniel L; Power, Laura E

    2009-03-01

    Geographic information systems and spatial scan statistics have been utilized to assess regional clustering of symptomatic pesticide exposures reported to a state Poison Control Center (PCC) during a single year. In the present study, we analyzed five subsequent years of PCC data to test whether there are significant geographic differences in pesticide exposure incidents resulting in serious (moderate, major, and fatal) medical outcomes. A PCC provided the data on unintentional pesticide exposures for the time period 2001-2005. The geographic location of the caller, the location where the exposure occurred, the exposure route, and the medical outcome were abstracted. There were 273 incidents resulting in moderate effects (n = 261), major effects (n = 10), or fatalities (n = 2). Spatial scan statistics identified a geographic area consisting of two adjacent counties (one urban, one rural), where statistically significant clustering of serious outcomes was observed. The relative risk of moderate, major, and fatal outcomes was 2.0 in this spatial cluster (p = 0.0005). PCC data, geographic information systems, and spatial scan statistics can identify clustering of serious outcomes from human exposure to pesticides. These analyses may be useful for public health officials to target preventive interventions. Further investigation is warranted to understand better the potential explanations for geographical clustering, and to assess whether preventive interventions have an impact on reducing pesticide exposure incidents resulting in serious medical outcomes.

  16. Water quality analysis of the Rapur area, Andhra Pradesh, South India using multivariate techniques

    NASA Astrophysics Data System (ADS)

    Nagaraju, A.; Sreedhar, Y.; Thejaswi, A.; Sayadi, Mohammad Hossein

    2017-10-01

    Groundwater samples from the Rapur area were collected from different sites to evaluate the major ion chemistry. The large volume of data can lead to difficulties in the integration, interpretation, and representation of the results. Two multivariate statistical methods, hierarchical cluster analysis (HCA) and factor analysis (FA), were applied to evaluate their usefulness to classify and identify geochemical processes controlling groundwater geochemistry. Four statistically significant clusters were obtained from 30 sampling stations. This resulted in two important clusters, viz., cluster 1 (pH, Si, CO3, Mg, SO4, Ca, K, HCO3, alkalinity, Na, Na + K, Cl, and hardness) and cluster 2 (EC and TDS), which are released to the study area from different sources. The application of different multivariate statistical techniques, such as principal component analysis (PCA), assists in the interpretation of complex data matrices for a better understanding of the water quality of a study area. From PCA, it is clear that the first factor (factor 1), which accounted for 36.2% of the total variance, had high positive loadings on EC, Mg, Cl, TDS, and hardness. Based on the PCA scores, four significant cluster groups of sampling locations were detected on the basis of similarity of their water quality.

  17. Testing for X-Ray–SZ Differences and Redshift Evolution in the X-Ray Morphology of Galaxy Clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nurgaliev, D.; McDonald, M.; Benson, B. A.

    We present a quantitative study of the X-ray morphology of galaxy clusters, as a function of their detection method and redshift. We analyze two separate samples of galaxy clusters: a sample of 36 clusters at 0.35 < z < 0.9 selected in the X-ray with the ROSAT PSPC 400 deg² survey, and a sample of 90 clusters at 0.25 < z < 1.2 selected via the Sunyaev–Zel’dovich (SZ) effect with the South Pole Telescope. Clusters from both samples have similar-quality Chandra observations, which allow us to quantify their X-ray morphologies via two distinct methods: centroid shifts (w) and photon asymmetry (A_phot). The latter technique provides nearly unbiased morphology estimates for clusters spanning a broad range of redshift and data quality. We further compare the X-ray morphologies of X-ray- and SZ-selected clusters with those of simulated clusters. We do not find a statistically significant difference in the measured X-ray morphology of X-ray and SZ-selected clusters over the redshift range probed by these samples, suggesting that the two are probing similar populations of clusters. We find that the X-ray morphologies of simulated clusters are statistically indistinguishable from those of X-ray- or SZ-selected clusters, implying that the most important physics for dictating the large-scale gas morphology (outside of the core) is well-approximated in these simulations. Finally, we find no statistically significant redshift evolution in the X-ray morphology (both for observed and simulated clusters), over the range of z ~ 0.3 to z ~ 1, seemingly in contradiction with the redshift-dependent halo merger rate predicted by simulations.

  18. Testing for X-Ray–SZ Differences and Redshift Evolution in the X-Ray Morphology of Galaxy Clusters

    DOE PAGES

    Nurgaliev, D.; McDonald, M.; Benson, B. A.; ...

    2017-05-16

    We present a quantitative study of the X-ray morphology of galaxy clusters, as a function of their detection method and redshift. We analyze two separate samples of galaxy clusters: a sample of 36 clusters at 0.35 < z < 0.9 selected in the X-ray with the ROSAT PSPC 400 deg² survey, and a sample of 90 clusters at 0.25 < z < 1.2 selected via the Sunyaev–Zel’dovich (SZ) effect with the South Pole Telescope. Clusters from both samples have similar-quality Chandra observations, which allow us to quantify their X-ray morphologies via two distinct methods: centroid shifts (w) and photon asymmetry (A_phot). The latter technique provides nearly unbiased morphology estimates for clusters spanning a broad range of redshift and data quality. We further compare the X-ray morphologies of X-ray- and SZ-selected clusters with those of simulated clusters. We do not find a statistically significant difference in the measured X-ray morphology of X-ray and SZ-selected clusters over the redshift range probed by these samples, suggesting that the two are probing similar populations of clusters. We find that the X-ray morphologies of simulated clusters are statistically indistinguishable from those of X-ray- or SZ-selected clusters, implying that the most important physics for dictating the large-scale gas morphology (outside of the core) is well-approximated in these simulations. Finally, we find no statistically significant redshift evolution in the X-ray morphology (both for observed and simulated clusters), over the range of z ~ 0.3 to z ~ 1, seemingly in contradiction with the redshift-dependent halo merger rate predicted by simulations.

  19. Wildfire cluster detection using space-time scan statistics

    NASA Astrophysics Data System (ADS)

    Tonini, M.; Tuia, D.; Ratle, F.; Kanevski, M.

    2009-04-01

    The aim of the present study is to identify spatio-temporal clusters of fire sequences using space-time scan statistics. These statistical methods are specifically designed to detect clusters and assess their significance. Scan statistics work by comparing the events occurring inside a scanning window (or a space-time cylinder, for spatio-temporal data) with those lying outside it. Windows of increasing size scan the region across space and time, and for each window a likelihood ratio is calculated by comparing the ratio of observed to expected cases inside and outside the window; the window with the maximum value is taken as the most likely cluster, and so on. Under the null hypothesis of spatial and temporal randomness, events are distributed according to a known discrete-state random process (Poisson or Bernoulli) whose parameters can be estimated. Given this assumption, it is possible to test whether the null hypothesis holds in a specific area. To handle fire data, the space-time permutation scan statistic was applied, since it does not require explicit specification of the population at risk in each cylinder. The case study uses daily fire detections in Florida from the Moderate Resolution Imaging Spectroradiometer (MODIS) active fire product for the period 2003-2006. Statistically significant clusters were identified. Over the entire study period, three of the five most likely clusters were located in forested areas in the north of the state; the other two cover a large zone in the south, corresponding to agricultural land and the prairies of the Everglades. The analyses were also performed separately for each of the four years to determine whether wildfires recur in the same period each year.
It emerges that clusters of forest fires are more frequent in the hot seasons (spring and summer), while in the southern areas clusters occur throughout the year. Analyzing the distribution of fires to evaluate whether they are statistically more frequent in particular areas and/or periods of the year can support fire management and help focus prevention measures.
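    A minimal Python sketch of how the space-time permutation model scores one cylinder, assuming fire detections have already been binned into a zones × days count table `counts[z][d]`. As the model is usually formulated, the expected count for zone z on day d is N_z·N_d/N (zone total times day total over the grand total), and the cylinder is scored with a Poisson log-likelihood ratio of observed versus expected events inside and outside. The function names are hypothetical, and the Monte Carlo step that shuffles event dates to obtain a p-value is omitted.

```python
import math


def stp_expected(counts):
    """Expected counts under the space-time permutation null hypothesis.

    counts[z][d] is the number of events in zone z on day d. Conditioning on
    zone totals N_z and day totals N_d, the expected count is N_z * N_d / N.
    """
    total = sum(sum(row) for row in counts)
    zone_totals = [sum(row) for row in counts]
    day_totals = [sum(col) for col in zip(*counts)]
    return [[zt * dt / total for dt in day_totals] for zt in zone_totals]


def cylinder_llr(counts, zones, days):
    """Poisson log-likelihood ratio for the cylinder zones x days.

    Compares observed/expected inside the cylinder with observed/expected
    outside it; cylinders without an excess of events score 0.
    """
    mu = stp_expected(counts)
    total = sum(sum(row) for row in counts)
    c = sum(counts[z][d] for z in zones for d in days)
    e = sum(mu[z][d] for z in zones for d in days)
    if c <= e:
        return 0.0
    llr = c * math.log(c / e)
    if total > c:
        llr += (total - c) * math.log((total - c) / (total - e))
    return llr


# Toy table: 2 zones x 2 days, events concentrated on the diagonal.
counts = [[5, 0], [0, 5]]
print(cylinder_llr(counts, zones=[0], days=[0]))  # about 1.438
```

    Because expected counts come only from the marginal totals, no population-at-risk figure is needed, which is the property the study relies on for fire data.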

  20. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Van den Bergh, Sidney

    It is widely believed that lenticular (S0) galaxies were initially spirals from which the gas has been removed by interactions with hot cluster gas, or by ram pressure stripping of cool gas from spirals that are orbiting within rich clusters of galaxies. However, this interpretation faces several problems. (1) Some lenticulars, such as NGC 3115, are isolated field galaxies rather than cluster members. (2) The distributions of flattening values of S0 galaxies in clusters, in groups, and in the field are statistically indistinguishable. This is surprising, because one might have expected most of the progenitors of field S0 galaxies to have been flattened late-type galaxies, whereas lenticulars in clusters are thought to have mostly been derived from bulge-dominated early-type galaxies. (3) It should be hardest for ram pressure to strip massive, luminous galaxies with deep potential wells. However, no statistically significant differences are seen between the luminosity distributions of early-type Shapley-Ames galaxies in clusters, in groups, and in the field. (4) Finally, both ram pressure stripping and evaporation by hot intracluster gas would be most efficient in rich clusters. However, the small amount of available data in the Shapley-Ames sample appears to show no statistically significant differences between the relative frequencies of dust-poor S0$_1$ and dust-rich S0$_3$ galaxies in clusters, in groups, and in the field. It is tentatively concluded that ram pressure stripping and heating by intracluster gas may not be the only evolutionary channels that lead to the formation of lenticular galaxies. It is speculated that gas starvation, or gas ejection by active nuclei, may have played a major role in the formation of a significant fraction of all S0 galaxies.

  1. Regional and Temporal Variation in Methamphetamine-Related Incidents: Applications of Spatial and Temporal Scan Statistics

    PubMed Central

    Sudakin, Daniel L.

    2009-01-01

    Introduction This investigation used spatial scan statistics, geographic information systems, and multiple data sources to assess spatial clustering of statewide methamphetamine-related incidents. Temporal and spatial associations with regulatory interventions to reduce access to precursor chemicals (pseudoephedrine) were also explored. Methods Four statewide data sources were used: regional poison control center statistics, fatality incidents, methamphetamine laboratory seizures, and hazardous substance releases involving methamphetamine laboratories. Spatial clustering of methamphetamine incidents was assessed using SaTScan™. SaTScan™ was also used to assess space-time clustering of methamphetamine laboratory incidents in relation to the enactment of regulations to reduce access to pseudoephedrine. Results Five counties with a significantly higher relative risk of methamphetamine-related incidents were identified. The county identified as the most likely cluster had a significantly elevated relative risk of methamphetamine laboratories (RR = 11.5), hazardous substance releases (RR = 8.3), and fatalities relating to methamphetamine (RR = 1.4). A significant increase in the relative risk of methamphetamine laboratory incidents was apparent in this same geographic area (RR = 20.7) during the period in 2004 and 2005 when regulations restricting access to pseudoephedrine were enacted. After these regulations were enacted, a significantly lower rate of incidents (RR = 0.111, p = 0.0001) was observed over a large geographic area of the state, including regions that previously had significantly higher rates. Conclusions Spatial and temporal scan statistics can be effectively applied to multiple data sources to assess regional variation in methamphetamine-related incidents and to explore the impact of preventive regulatory interventions. PMID:19225949
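    The relative risks above are cluster-level figures. To our understanding, scan-statistic output such as SaTScan's reports a cluster's relative risk as the observed-to-expected ratio inside the cluster divided by the same ratio outside it. A minimal sketch, assuming a Poisson setting where expected counts are proportional to population; the function names and the three-region numbers are hypothetical, and the study's own expected counts may have been derived differently.

```python
def expected_cases(populations, total_cases):
    """Expected cases per region if risk were uniform (Poisson model)."""
    total_pop = sum(populations.values())
    return {r: total_cases * p / total_pop for r, p in populations.items()}


def cluster_relative_risk(observed, expected, total_cases):
    """Relative risk for a scan-statistic cluster:
    (observed / expected inside) / (observed / expected outside)."""
    inside = observed / expected
    outside = (total_cases - observed) / (total_cases - expected)
    return inside / outside


# Hypothetical counts: 3 regions, 100 incidents in total.
pops = {"A": 10_000, "B": 30_000, "C": 60_000}
exp_counts = expected_cases(pops, 100)   # {"A": 10.0, "B": 30.0, "C": 60.0}
rr = cluster_relative_risk(observed=25, expected=exp_counts["A"], total_cases=100)
print(round(rr, 2))                      # 25/10 divided by 75/90 -> 3.0
```

    Dividing by the outside ratio, rather than by 1, means a cluster's RR reflects contrast with the rest of the study area, which is why an RR below 1 (as in the post-regulation period) marks a region with fewer incidents than elsewhere.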

  2. Spatial distribution and cluster analysis of retail drug shop characteristics and antimalarial behaviors as reported by private medicine retailers in western Kenya: informing future interventions.

    PubMed

    Rusk, Andria; Highfield, Linda; Wilkerson, J Michael; Harrell, Melissa; Obala, Andrew; Amick, Benjamin

    2016-02-19

    Efforts to improve malaria case management in sub-Saharan Africa have shifted focus to private antimalarial retailers to increase access to appropriate treatment. Demands to decrease intervention costs while increasing efficacy require interventions tailored to geographic regions with demonstrated need. Cluster analysis presents an opportunity to meet this demand, but has not been applied to the retail sector or to antimalarial retailer behaviors. This research applied cluster analysis to medicine retailer behaviors in Kenya to improve malaria case management and inform future interventions. Ninety-seven surveys were collected from medicine retailers working in the Webuye Health and Demographic Surveillance Site. Survey items included retailer training, education, antimalarial drug knowledge, recommending behavior, sales, and shop characteristics, and were analyzed using Kulldorff's spatial scan statistic. The Bernoulli purely spatial model for binomial data was used, comparing cases to controls. The statistical significance of detected clusters was tested with a likelihood ratio test under the null hypothesis of no clustering, with p values based on 999 Monte Carlo simulations. The null hypothesis was rejected at p values of 0.05 or less. A statistically significant cluster with fewer pharmacy-trained retailers than expected under a random distribution was found (RR = .09, p = .001). Drug recommending behavior also yielded a statistically significant cluster, with fewer retailers than expected recommending the correct antimalarial medication to adults (RR = .018, p = .01), and fewer shops than expected under a random distribution selling that medication more often than outdated antimalarials (RR = 0.23, p = .007). All three of these clusters were co-located, overlapping in the northwest of the study area. Spatial clustering was found in the data.
A concerning amount of correlation was found in one specific region in the study area where multiple behaviors converged in space, highlighting a prime target for interventions. These results also demonstrate the utility of applying geospatial methods in the study of medicine retailer behaviors, making the case for expanding this approach to other regions.
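    A minimal Python sketch of a purely spatial Bernoulli scan with a Monte Carlo p-value, under stated assumptions: retailer locations are planar coordinates, each retailer is labelled 1 (case) or 0 (control), candidate windows are circles formed by a location plus its nearest neighbours (up to half of all locations), and the null distribution comes from randomly reshuffling the case labels. The function names are hypothetical, and this is not SaTScan's implementation.

```python
import math
import random


def xlogy(x, y):
    """x * log(y) with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def bernoulli_llr(c, n, total_c, total_n, high=True):
    """Kulldorff Bernoulli log-likelihood ratio for a window holding c cases
    among n individuals, out of total_c cases among total_n individuals."""
    out_c, out_n = total_c - c, total_n - n
    if out_n == 0:
        return 0.0
    p_in, p_out = c / n, out_c / out_n
    # Score only windows whose risk deviates in the requested direction.
    if (high and p_in <= p_out) or (not high and p_in >= p_out):
        return 0.0
    ll_alt = (xlogy(c, p_in) + xlogy(n - c, 1 - p_in)
              + xlogy(out_c, p_out) + xlogy(out_n - out_c, 1 - p_out))
    p0 = total_c / total_n
    ll_null = xlogy(total_c, p0) + xlogy(total_n - total_c, 1 - p0)
    return ll_alt - ll_null


def neighbour_orders(points):
    """For each location, all locations sorted by distance from it."""
    return [sorted(range(len(points)), key=lambda j: math.dist(p, points[j]))
            for p in points]


def max_llr(labels, orders, max_size, high):
    """Largest LLR over circular windows (a centre plus its nearest neighbours)."""
    total_c, total_n = sum(labels), len(labels)
    best = 0.0
    for order in orders:
        c = 0
        for k in range(max_size):
            c += labels[order[k]]
            best = max(best, bernoulli_llr(c, k + 1, total_c, total_n, high))
    return best


def bernoulli_scan(points, labels, high=True, replicates=999, seed=0):
    """Return (observed max LLR, Monte Carlo p-value); labels are 1=case, 0=control."""
    orders = neighbour_orders(points)
    max_size = max(1, len(points) // 2)  # windows hold at most 50% of points
    observed = max_llr(labels, orders, max_size, high)
    rng = random.Random(seed)
    shuffled = list(labels)
    exceed = 0
    for _ in range(replicates):
        rng.shuffle(shuffled)  # null hypothesis: case labels placed at random
        if max_llr(shuffled, orders, max_size, high) >= observed:
            exceed += 1
    return observed, (1 + exceed) / (1 + replicates)
```

    With `replicates=999`, the smallest attainable p-value is 1/1000 = 0.001, matching the resolution of the p values reported above. Passing `high=False` scores windows with fewer cases than expected, the direction of the clusters described in this study.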

  3. Descriptive epidemiology of typhoid fever during an epidemic in Harare, Zimbabwe, 2012.

    PubMed

    Polonsky, Jonathan A; Martínez-Pino, Isabel; Nackers, Fabienne; Chonzi, Prosper; Manangazira, Portia; Van Herp, Michel; Maes, Peter; Porten, Klaudia; Luquero, Francisco J

    2014-01-01

    Typhoid fever remains a significant public health problem in developing countries. In October 2011, a typhoid fever epidemic was declared in Harare, Zimbabwe, the fourth enteric infection epidemic since 2008. To orient control activities, we described the epidemiology and spatiotemporal clustering of the epidemic in Dzivaresekwa and Kuwadzana, the two most affected suburbs of Harare. A typhoid fever case-patient register was analysed to describe the epidemic. To explore clustering, we constructed a dataset comprising GPS coordinates of case-patient residences and randomly sampled residential locations (spatial controls). The scale and significance of clustering were explored with Ripley's K functions. Cluster locations were determined by a random labelling technique and confirmed using Kulldorff's spatial scan statistic. We analysed data from 2570 confirmed and suspected case-patients and found significant spatiotemporal clustering of typhoid fever in two non-overlapping areas, which appeared to be linked to environmental sources. Peak relative risk was more than six times greater than in areas lying outside the cluster ranges. Clusters were identified in similar geographical ranges by both random labelling and Kulldorff's spatial scan statistic. The spatial scale at which typhoid fever clustered was highly localised, with significant clustering at distances up to 4.5 km and peak levels at approximately 3.5 km. The epicentre of infection transmission shifted from one cluster to the other during the course of the epidemic. This study demonstrated highly localised clustering of typhoid fever during an epidemic in an urban African setting, and highlights the importance of spatiotemporal analysis for making timely decisions about targeting prevention and control activities and reinforcing treatment during epidemics. This approach should be integrated into existing surveillance systems to facilitate early detection of epidemics and identify their spatial range.
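    A minimal Python sketch of the Ripley's K and random-labelling idea: compute K(r) for cases and for controls, take the difference D(r) = K_cases(r) - K_controls(r), and compare it with the same statistic after randomly reassigning case/control labels to the pooled locations. The naive estimator below has no edge correction and assumes a known study-region area; the study's exact estimator, edge correction, and distance grid are not stated in the abstract, and the function names are hypothetical.

```python
import math
import random


def ripley_k(points, r, area):
    """Naive Ripley's K(r) = area * (ordered pairs within r) / (n * (n - 1)),
    with no edge correction."""
    n = len(points)
    pairs = sum(1 for i in range(n) for j in range(n)
                if i != j and math.dist(points[i], points[j]) <= r)
    return area * pairs / (n * (n - 1))


def random_labelling_test(cases, controls, r, area, replicates=99, seed=0):
    """Test D(r) = K_cases(r) - K_controls(r) against random relabelling."""
    observed = ripley_k(cases, r, area) - ripley_k(controls, r, area)
    pooled = list(cases) + list(controls)
    rng = random.Random(seed)
    exceed = 0
    for _ in range(replicates):
        rng.shuffle(pooled)  # keep locations, reassign case/control labels
        sim_cases, sim_controls = pooled[:len(cases)], pooled[len(cases):]
        if ripley_k(sim_cases, r, area) - ripley_k(sim_controls, r, area) >= observed:
            exceed += 1
    return observed, (1 + exceed) / (1 + replicates)
```

    Repeating the test over a grid of distances r is how a clustering scale, such as the 4.5 km range and 3.5 km peak reported above, would show up in this kind of analysis.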

  4. Descriptive Epidemiology of Typhoid Fever during an Epidemic in Harare, Zimbabwe, 2012

    PubMed Central

    Polonsky, Jonathan A.; Martínez-Pino, Isabel; Nackers, Fabienne; Chonzi, Prosper; Manangazira, Portia; Van Herp, Michel; Maes, Peter; Porten, Klaudia; Luquero, Francisco J.

    2014-01-01

    Background Typhoid fever remains a significant public health problem in developing countries. In October 2011, a typhoid fever epidemic was declared in Harare, Zimbabwe, the fourth enteric infection epidemic since 2008. To orient control activities, we described the epidemiology and spatiotemporal clustering of the epidemic in Dzivaresekwa and Kuwadzana, the two most affected suburbs of Harare. Methods A typhoid fever case-patient register was analysed to describe the epidemic. To explore clustering, we constructed a dataset comprising GPS coordinates of case-patient residences and randomly sampled residential locations (spatial controls). The scale and significance of clustering were explored with Ripley's K functions. Cluster locations were determined by a random labelling technique and confirmed using Kulldorff's spatial scan statistic. Principal Findings We analysed data from 2570 confirmed and suspected case-patients and found significant spatiotemporal clustering of typhoid fever in two non-overlapping areas, which appeared to be linked to environmental sources. Peak relative risk was more than six times greater than in areas lying outside the cluster ranges. Clusters were identified in similar geographical ranges by both random labelling and Kulldorff's spatial scan statistic. The spatial scale at which typhoid fever clustered was highly localised, with significant clustering at distances up to 4.5 km and peak levels at approximately 3.5 km. The epicentre of infection transmission shifted from one cluster to the other during the course of the epidemic. Conclusions This study demonstrated highly localised clustering of typhoid fever during an epidemic in an urban African setting, and highlights the importance of spatiotemporal analysis for making timely decisions about targeting prevention and control activities and reinforcing treatment during epidemics.
This approach should be integrated into existing surveillance systems to facilitate early detection of epidemics and identify their spatial range. PMID:25486292

  5. The writer independent online handwriting recognition system frog on hand and cluster generative statistical dynamic time warping.

    PubMed

    Bahlmann, Claus; Burkhardt, Hans

    2004-03-01

    In this paper, we give a comprehensive description of our writer-independent online handwriting recognition system, frog on hand. This work focuses on the classification and training approach, which we call cluster generative statistical dynamic time warping (CSDTW). CSDTW is a general, scalable, HMM-based method for variable-sized sequential data that holistically combines cluster analysis and statistical sequence modeling. It can handle general classification problems that rely on this type of sequential data, e.g., speech recognition, genome processing, and robotics. Unlike previous approaches, it embeds clustering and statistical sequence modeling in a single feature space and uses a closely related distance measure for both. We present character recognition experiments with frog on hand using CSDTW on the UNIPEN online handwriting database. The recognition accuracy is significantly higher than the reported results of other handwriting recognition systems. Finally, we describe the real-time implementation of frog on hand on a Linux Compaq iPAQ embedded device.
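    For readers unfamiliar with dynamic time warping, the following is a minimal sketch of the classic DTW distance between two pen trajectories, plus a nearest-template classifier. This is plain DTW, not the authors' CSDTW, which adds clustering and HMM-based statistical modeling on top; the function names and the Euclidean point-to-point cost are assumptions.

```python
import math


def dtw_distance(a, b):
    """Classic dynamic time warping distance between two sequences of points.

    D[i][j] is the cheapest cost of aligning a[:i] with b[:j]; each step may
    advance in a, in b, or in both, so sequences written at different speeds align.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def classify(sample, templates):
    """Label of the template (label -> list of sequences) closest under DTW."""
    best_label, best_dist = None, float("inf")
    for label, seqs in templates.items():
        for seq in seqs:
            d = dtw_distance(sample, seq)
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label


# A slowly written stroke aligns to the faster template at zero cost.
fast = [(0, 0), (1, 0), (2, 0)]
slow = [(0, 0), (0, 0), (1, 0), (1, 0), (2, 0)]
print(dtw_distance(fast, slow))  # 0.0
```

    Nearest-template DTW scales poorly with the number of templates; summarizing many samples per class is part of the motivation for combining clustering with statistical sequence models, as CSDTW does.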

  6. Using Geographic Information Science to Explore Associations between Air Pollution, Environmental Amenities, and Preterm Births

    PubMed Central

    Ogneva-Himmelberger, Yelena; Dahlberg, Tyler; Kelly, Kristen; Simas, Tiffany A. Moore

    2015-01-01

    The study uses geographic information science (GIS) and statistics to determine whether there are statistically significant differences between full-term and preterm births to non-Hispanic white, non-Hispanic Black, and Hispanic mothers in their exposure to air pollution and access to environmental amenities (green space and vendors of healthy food) in Worcester, Massachusetts, the second-largest city in New England. Proximity to a Toxic Release Inventory site has a statistically significant effect on preterm birth regardless of race. The air-pollution hazard score from the Risk Screening Environmental Indicators Model is also a statistically significant factor when preterm births are categorized into three groups based on the degree of prematurity. Proximity to green space and to a healthy food vendor did not have an effect on preterm births. The study also used cluster analysis and found statistically significant spatial clusters of high preterm birth volume for non-Hispanic white, non-Hispanic Black, and Hispanic mothers. PMID:29546120

  7. Using Geographic Information Science to Explore Associations between Air Pollution, Environmental Amenities, and Preterm Births.

    PubMed

    Ogneva-Himmelberger, Yelena; Dahlberg, Tyler; Kelly, Kristen; Simas, Tiffany A Moore

    2015-01-01

    The study uses geographic information science (GIS) and statistics to determine whether there are statistically significant differences between full-term and preterm births to non-Hispanic white, non-Hispanic Black, and Hispanic mothers in their exposure to air pollution and access to environmental amenities (green space and vendors of healthy food) in Worcester, Massachusetts, the second-largest city in New England. Proximity to a Toxic Release Inventory site has a statistically significant effect on preterm birth regardless of race. The air-pollution hazard score from the Risk Screening Environmental Indicators Model is also a statistically significant factor when preterm births are categorized into three groups based on the degree of prematurity. Proximity to green space and to a healthy food vendor did not have an effect on preterm births. The study also used cluster analysis and found statistically significant spatial clusters of high preterm birth volume for non-Hispanic white, non-Hispanic Black, and Hispanic mothers.

  8. Penalized likelihood and multi-objective spatial scans for the detection and inference of irregular clusters

    PubMed Central

    2010-01-01

    Background Irregularly shaped spatial clusters are difficult to delineate. A cluster found by an algorithm often spreads through large portions of the map, which weakens its geographic meaning. Penalized likelihood methods for Kulldorff's spatial scan statistic have been used to control the excessive freedom of cluster shape. Penalty functions based on cluster geometry and non-connectivity have been proposed recently. Another approach uses a multi-objective algorithm to maximize two objectives: the spatial scan statistic and the geometric penalty function. Results & Discussion We present a novel scan statistic algorithm employing a function based on the graph topology to penalize the presence of under-populated disconnection nodes in candidate clusters: the disconnection nodes cohesion function. A disconnection node is defined as a region within a cluster such that its removal disconnects the cluster. By applying this function, the most geographically meaningful clusters are sifted from the immense set of possible irregularly shaped candidate cluster solutions. To evaluate the statistical significance of solutions for multi-objective scans, a statistical approach based on the concept of the attainment function is used. In this paper we compare different penalized likelihoods employing the geometric and non-connectivity regularity functions and the novel disconnection nodes cohesion function. We also build multi-objective scans using those three functions and compare them with the previous penalized likelihood scans. An application is presented using comprehensive statewide data for Chagas' disease in puerperal women in Minas Gerais state, Brazil. Conclusions We show that, compared with the other single-objective algorithms, multi-objective scans perform better in terms of power, sensitivity, and positive predictive value.
The multi-objective non-connectivity scan is faster and better suited for the detection of moderately irregularly shaped clusters. The multi-objective cohesion scan is most effective for the detection of highly irregularly shaped clusters. PMID:21034451
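    The disconnection-node definition above corresponds to an articulation point of the cluster's region adjacency graph. A minimal brute-force sketch, assuming the map is given as an adjacency dictionary `adj` (region -> neighbouring regions): it finds the disconnection nodes only, since the abstract does not give the exact form of the cohesion function, and any population weighting for "under-populated" nodes would be a further assumption. The function names are hypothetical.

```python
def is_connected(nodes, adj):
    """True if the regions in `nodes` form one connected piece of the map."""
    nodes = set(nodes)
    if not nodes:
        return True
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj.get(u, ()):
            if v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes


def disconnection_nodes(cluster, adj):
    """Regions whose removal would split the (connected) cluster into pieces."""
    cluster = set(cluster)
    return sorted(v for v in cluster if not is_connected(cluster - {v}, adj))


# Two groups of regions joined through the bridging regions "C" and "D".
adj = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"],
       "D": ["C", "E"], "E": ["D"]}
print(disconnection_nodes({"A", "B", "C", "D", "E"}, adj))  # ['C', 'D']
```

    The brute-force check costs one graph traversal per region, which is fine for a single candidate cluster; Tarjan's linear-time articulation-point algorithm would be the usual choice when scoring many candidates.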

  9. Identifying clusters of active transportation using spatial scan statistics.

    PubMed

    Huang, Lan; Stinchcomb, David G; Pickle, Linda W; Dill, Jennifer; Berrigan, David

    2009-08-01

    There is intense interest in the possibility that neighborhood characteristics influence active transportation such as walking or biking. The purpose of this paper is to illustrate how a spatial cluster identification method can evaluate the geographic variation of active transportation and identify neighborhoods with unusually high/low levels of active transportation. Self-reported walking/biking prevalence, demographic characteristics, street connectivity variables, and neighborhood socioeconomic data were collected from respondents to the 2001 California Health Interview Survey (CHIS; N=10,688) in Los Angeles County (LAC) and San Diego County (SDC). Spatial scan statistics were used to identify clusters of high or low prevalence (with and without age-adjustment) and the quantity of time spent walking and biking. The data, a subset from the 2001 CHIS, were analyzed in 2007-2008. Geographic clusters of significantly high or low prevalence of walking and biking were detected in LAC and SDC. Structural variables such as street connectivity and shorter block lengths are consistently associated with higher levels of active transportation, but associations between active transportation and socioeconomic variables at the individual and neighborhood levels are mixed. Only one cluster with less time spent walking and biking among walkers/bikers was detected in LAC, and this was of borderline significance. Age-adjustment affects the clustering pattern of walking/biking prevalence in LAC, but not in SDC. The use of spatial scan statistics to identify significant clustering of health behaviors such as active transportation adds to the more traditional regression analysis that examines associations between behavior and environmental factors by identifying specific geographic areas with unusual levels of the behavior independent of predefined administrative units.
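    The abstract does not say how age-adjustment was implemented. One common approach in scan-statistic work, assumed here, is indirect standardization: each area's expected count is built from study-wide age-specific rates, so an area with a different age mix gets a different baseline before any cluster is inferred. A minimal sketch; the function name, area names, and numbers are hypothetical.

```python
def expected_counts(data, adjust=True):
    """Expected counts per area under no clustering.

    data[area][age_group] = (population, count). With adjust=True the
    expectation uses study-wide age-specific rates (indirect standardization);
    otherwise a single crude rate is applied to each area's total population.
    """
    pop_by_age, count_by_age = {}, {}
    for groups in data.values():
        for age, (pop, count) in groups.items():
            pop_by_age[age] = pop_by_age.get(age, 0) + pop
            count_by_age[age] = count_by_age.get(age, 0) + count
    crude_rate = sum(count_by_age.values()) / sum(pop_by_age.values())
    expected = {}
    for area, groups in data.items():
        if adjust:
            expected[area] = sum(pop * count_by_age[age] / pop_by_age[age]
                                 for age, (pop, _) in groups.items())
        else:
            expected[area] = crude_rate * sum(pop for pop, _ in groups.values())
    return expected


# Made-up numbers: "north" has an older population, and the outcome is more
# common in the older group.
data = {"north": {"young": (100, 10), "old": (100, 30)},
        "south": {"young": (300, 30), "old": (0, 0)}}
print(expected_counts(data, adjust=False))  # crude: north ~28, south ~42
print(expected_counts(data, adjust=True))   # adjusted: north ~40, south ~30
```

    Here north's observed count of 40 looks elevated against the crude expectation but matches the age-adjusted one, the kind of shift that can make a cluster appear or disappear once age is taken into account.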

  10. Identifying Clusters of Active Transportation Using Spatial Scan Statistics

    PubMed Central

    Huang, Lan; Stinchcomb, David G.; Pickle, Linda W.; Dill, Jennifer; Berrigan, David

    2009-01-01

    Background There is intense interest in the possibility that neighborhood characteristics influence active transportation such as walking or biking. The purpose of this paper is to illustrate how a spatial cluster identification method can evaluate the geographic variation of active transportation and identify neighborhoods with unusually high/low levels of active transportation. Methods Self-reported walking/biking prevalence, demographic characteristics, street connectivity variables, and neighborhood socioeconomic data were collected from respondents to the 2001 California Health Interview Survey (CHIS; N=10,688) in Los Angeles County (LAC) and San Diego County (SDC). Spatial scan statistics were used to identify clusters of high or low prevalence (with and without age-adjustment) and the quantity of time spent walking and biking. The data, a subset from the 2001 CHIS, were analyzed in 2007–2008. Results Geographic clusters of significantly high or low prevalence of walking and biking were detected in LAC and SDC. Structural variables such as street connectivity and shorter block lengths are consistently associated with higher levels of active transportation, but associations between active transportation and socioeconomic variables at the individual and neighborhood levels are mixed. Only one cluster with less time spent walking and biking among walkers/bikers was detected in LAC, and this was of borderline significance. Age-adjustment affects the clustering pattern of walking/biking prevalence in LAC, but not in SDC. Conclusions The use of spatial scan statistics to identify significant clustering of health behaviors such as active transportation adds to the more traditional regression analysis that examines associations between behavior and environmental factors by identifying specific geographic areas with unusual levels of the behavior independent of predefined administrative units. PMID:19589451

  11. Accounting for Multiple Births in Neonatal and Perinatal Trials: Systematic Review and Case Study

    PubMed Central

    Hibbs, Anna Maria; Black, Dennis; Palermo, Lisa; Cnaan, Avital; Luan, Xianqun; Truog, William E; Walsh, Michele C; Ballard, Roberta A

    2010-01-01

    Objectives To determine the prevalence in the neonatal literature of statistical approaches accounting for the unique clustering patterns of multiple births, and to explore the sensitivity of an actual trial to several analytic approaches to multiples. Methods A systematic review of recent perinatal trials assessed the prevalence of studies accounting for clustering of multiples. The NO CLD trial served as a case study of the sensitivity of the outcome to several statistical strategies. We calculated odds ratios using non-clustered (logistic regression) and clustered (generalized estimating equations, multiple outputation) analyses. Results In the systematic review, most studies did not describe the randomization of twins and did not account for clustering. Of those studies that did, exclusion of multiples and generalized estimating equations were the most common strategies. The NO CLD study included 84 infants with a sibling enrolled in the study. Multiples were more likely than singletons to be white and to have been born to older mothers (p<0.01). Analyses that accounted for clustering were statistically significant; analyses assuming independence were not. Conclusions The statistical approach to multiples can influence the odds ratio and the width of confidence intervals, thereby affecting the interpretation of a study outcome. A minority of perinatal studies address this issue. PMID:19969305
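    Multiple outputation, one of the clustered analyses named above, can be sketched briefly: repeatedly keep one randomly chosen infant per sibling cluster, estimate the log odds ratio on the resulting independent data, and combine across outputations. This sketch assumes a 2×2 treatment-by-outcome comparison with no empty cells, and uses the combining rule as the method is usually described (mean within-outputation variance minus the between-outputation variance of the estimates). It is not the NO CLD analysis code, and the function names and record layout are hypothetical.

```python
import math
import random
import statistics
from collections import defaultdict


def log_odds_ratio(rows):
    """Log odds ratio and its Woolf variance from (treated, outcome) pairs.
    Assumes all four cells of the 2x2 table are non-zero."""
    a = sum(1 for t, y in rows if t and y)
    b = sum(1 for t, y in rows if t and not y)
    c = sum(1 for t, y in rows if not t and y)
    d = sum(1 for t, y in rows if not t and not y)
    return math.log(a * d / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def multiple_outputation(records, n_outputations=200, seed=0):
    """records: (cluster_id, treated, outcome). Returns (estimate, variance)."""
    clusters = defaultdict(list)
    for cid, treated, outcome in records:
        clusters[cid].append((treated, outcome))
    rng = random.Random(seed)
    estimates, variances = [], []
    for _ in range(n_outputations):
        # Keep one randomly chosen infant per cluster -> independent data.
        sample = [rng.choice(members) for members in clusters.values()]
        est, var = log_odds_ratio(sample)
        estimates.append(est)
        variances.append(var)
    estimate = statistics.mean(estimates)
    # Mean within-outputation variance minus between-outputation spread
    # (can turn negative in small samples; this sketch does not guard it).
    variance = statistics.mean(variances) - statistics.variance(estimates)
    return estimate, variance
```

    Because every analysed dataset contains exactly one infant per family, no independence assumption is violated, while averaging over many outputations recovers most of the information discarded in any single one.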

  12. Constraining the mass–richness relationship of redMaPPer clusters with angular clustering

    DOE PAGES

    Baxter, Eric J.; Rozo, Eduardo; Jain, Bhuvnesh; ...

    2016-08-04

    The potential of using cluster clustering for calibrating the mass–richness relation of galaxy clusters has been recognized theoretically for over a decade. In this paper, we demonstrate the feasibility of this technique to achieve high-precision mass calibration using redMaPPer clusters in the Sloan Digital Sky Survey North Galactic Cap. By including cross-correlations between several richness bins in our analysis, we significantly improve the statistical precision of our mass constraints. The amplitude of the mass–richness relation is constrained to 7 per cent statistical precision by our analysis. However, the error budget is systematics dominated, reaching a 19 per cent total error that is dominated by theoretical uncertainty in the bias–mass relation for dark matter haloes. We confirm the result from Miyatake et al. that the clustering amplitude of redMaPPer clusters depends on galaxy concentration as defined therein, and we provide additional evidence that this dependence cannot be sourced by mass dependences: some other effect must account for the observed variation in clustering amplitude with galaxy concentration. Assuming that the observed dependence of redMaPPer clustering on galaxy concentration is a form of assembly bias, we find that such effects introduce a systematic error on the amplitude of the mass–richness relation that is comparable to the error bar from statistical noise. Finally, the results presented here demonstrate the power of cluster clustering for mass calibration and cosmology provided the current theoretical systematics can be ameliorated.

  13. Cluster-level statistical inference in fMRI datasets: The unexpected behavior of random fields in high dimensions.

    PubMed

    Bansal, Ravi; Peterson, Bradley S

    2018-06-01

    Identifying regional effects of interest in MRI datasets usually entails testing a priori hypotheses across many thousands of brain voxels, requiring control of false positive findings across these multiple hypothesis tests. Recent studies have suggested that parametric statistical methods may have incorrectly modeled functional MRI data, thereby leading to false positive rates higher than their nominal rates. Nonparametric methods for statistical inference when conducting multiple statistical tests, in contrast, are thought to produce false positives at the nominal rate, which has led to the suggestion that previously reported studies should reanalyze their fMRI data using nonparametric tools. To better understand why parametric methods may yield excessive false positives, we assessed their performance when applied both to simulated datasets of 1D, 2D, and 3D Gaussian Random Fields (GRFs) and to 710 real-world, resting-state fMRI datasets. We showed that both the simulated 2D and 3D GRFs and the real-world data contain a small percentage (<6%) of very large clusters (on average 60 times larger than the average cluster size), which were not present in 1D GRFs. These unexpectedly large clusters were deemed statistically significant using parametric methods, leading to empirical familywise error rates (FWERs) as high as 65%: the high empirical FWERs were not a consequence of parametric methods failing to model spatial smoothness accurately, but rather of these very large clusters that are inherently present in smooth, high-dimensional random fields. In fact, when these very large clusters were discounted, the empirical FWER for parametric methods was 3.24%. Furthermore, even an empirical FWER of 65% would yield on average fewer than one of those very large clusters in each brain-wide analysis.
Nonparametric methods, in contrast, estimated distributions from those large clusters and therefore, by construction, rejected the large clusters as false positives at the nominal FWERs. Those rejected clusters were outlying values in the distribution of cluster size but cannot be distinguished from true positive findings without further analyses, including assessing whether fMRI signal in those regions correlates with other clinical, behavioral, or cognitive measures. Rejecting the large clusters, however, significantly reduced the statistical power of nonparametric methods in detecting true findings compared with parametric methods, which would have detected most true findings that are essential for making valid biological inferences in MRI data. Parametric analyses, in contrast, detected most true findings while generating relatively few false positives: on average, fewer than one of those very large clusters would be deemed a true finding in each brain-wide analysis. We therefore recommend the continued use of parametric methods that model nonstationary smoothness for cluster-level, familywise control of false positives, particularly when using a Cluster Defining Threshold of 2.5 or higher, and subsequently assessing rigorously the biological plausibility of the findings, even for large clusters. Finally, because nonparametric methods yielded a large reduction in statistical power to detect true positive findings, we conclude that the modest reduction in false positive findings that nonparametric analyses afford does not warrant a re-analysis of previously published fMRI studies using nonparametric techniques. Copyright © 2018 Elsevier Inc. All rights reserved.
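    To make the mechanics of cluster-level inference concrete, the following is a minimal 2D sketch of the nonparametric variant (the approach the authors argue does not warrant re-analysis; the parametric random-field method they recommend is not shown). Pixels above a cluster-defining threshold are grouped into 4-connected clusters, and an observed cluster is declared significant when it is larger than the (1 - α) quantile of the maximum cluster size across null maps. Real fMRI analyses work in 3D with other connectivity rules and obtain null maps by permutation or simulation; the function names and toy maps are hypothetical.

```python
import math


def cluster_sizes(stat_map, cdt):
    """Sizes of 4-connected clusters of pixels whose statistic exceeds cdt."""
    rows, cols = len(stat_map), len(stat_map[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if stat_map[r][c] <= cdt or seen[r][c]:
                continue
            seen[r][c] = True
            stack, size = [(r, c)], 0
            while stack:
                y, x = stack.pop()
                size += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols and not seen[ny][nx]
                            and stat_map[ny][nx] > cdt):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            sizes.append(size)
    return sizes


def extent_threshold(null_maps, cdt, alpha=0.05):
    """(1 - alpha) quantile of the maximum cluster size over null maps (FWER control)."""
    max_sizes = sorted(max(cluster_sizes(m, cdt), default=0) for m in null_maps)
    index = math.ceil((1 - alpha) * len(max_sizes)) - 1
    return max_sizes[index]


observed = [[3, 3, 0], [3, 0, 0], [0, 0, 3]]
null_maps = [[[3, 0, 0], [0, 0, 0], [0, 0, 0]],
             [[3, 3, 0], [0, 0, 0], [0, 0, 0]],
             [[0, 0, 0], [0, 0, 3], [0, 0, 3]],
             [[0, 0, 0], [0, 0, 0], [0, 0, 0]]]
limit = extent_threshold(null_maps, cdt=2.5, alpha=0.25)
print([s for s in cluster_sizes(observed, 2.5) if s > limit])  # [3]
```

    Because the threshold comes from the maximum cluster size per null map, any very large clusters that smooth null fields happen to produce push the threshold up, which is the mechanism behind the power loss the authors describe.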

  14. Effect of spatial smoothing on t-maps: arguments for going back from t-maps to masked contrast images.

    PubMed

    Reimold, Matthias; Slifstein, Mark; Heinz, Andreas; Mueller-Schauenburg, Wolfgang; Bares, Roland

    2006-06-01

    Voxelwise statistical analysis has become popular in exploratory functional brain mapping with fMRI or PET. Usually, results are presented as voxelwise levels of significance (t-maps), and for clusters that survive correction for multiple testing the coordinates of the maximum t-value are reported. Before a voxelwise statistical test is calculated, spatial smoothing is required to achieve reasonable statistical power. Little attention has been paid to the fact that smoothing has a nonlinear effect on the voxel variances and thus on the local characteristics of a t-map, which becomes most evident after smoothing over different types of tissue. We investigated the related artifacts, for example, white matter peaks whose position depends on the relative variance (variance over contrast) of the surrounding regions, and suggest improving spatial precision with 'masked contrast images': colors encode the voxelwise contrast, and significant clusters (e.g., detected with statistical parametric mapping, SPM) are enlarged by including contiguous pixels with a contrast above the mean contrast in the original cluster, provided they satisfy P < 0.05. The potential benefit is demonstrated with simulations and data from a [11C]Carfentanil PET study. We conclude that spatial smoothing may lead to critical, sometimes counterintuitive artifacts in t-maps, especially in subcortical brain regions. If significant clusters are detected, for example with SPM, the suggested method is one way to improve spatial precision and may give the investigator a more direct sense of the underlying data. Its simplicity and the fact that no further assumptions are needed make it a useful complement to standard methods of statistical mapping.
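    The enlargement rule for masked contrast images can be sketched as region growing. This minimal 2D version assumes 4-connectivity, that growth continues transitively from newly added pixels, and that the threshold stays fixed at the mean contrast of the original cluster; these details and the function name are assumptions rather than the authors' implementation.

```python
def enlarge_cluster(cluster, contrast, pvals, alpha=0.05):
    """Grow a significant cluster into contiguous pixels whose contrast exceeds
    the original cluster's mean contrast and whose p-value is below alpha."""
    grown = set(cluster)
    mean_contrast = sum(contrast[r][c] for r, c in grown) / len(grown)
    rows, cols = len(contrast), len(contrast[0])
    stack = list(grown)
    while stack:
        r, c = stack.pop()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in grown
                    and contrast[nr][nc] > mean_contrast and pvals[nr][nc] < alpha):
                grown.add((nr, nc))
                stack.append((nr, nc))
    return grown


contrast = [[4, 6, 6, 1, 7]]
pvals = [[0.01, 0.01, 0.01, 0.01, 0.01]]
print(sorted(enlarge_cluster({(0, 0), (0, 1)}, contrast, pvals)))
# [(0, 0), (0, 1), (0, 2)]: (0, 3) is below the mean of 5, so (0, 4) is not reached
```

    Keeping the threshold tied to the original cluster's mean, rather than updating it as pixels are added, stops the cluster from drifting toward ever-lower contrast.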

  15. Spatiotemporal clusters of malaria cases at village level, northwest Ethiopia.

    PubMed

    Alemu, Kassahun; Worku, Alemayehu; Berhane, Yemane; Kumie, Abera

    2014-06-06

    Malaria attacks are not evenly distributed in space and time. In highland areas with low endemicity, malaria transmission is highly variable and malaria acquisition risk for individuals is unevenly distributed even within a neighbourhood. Characterizing the spatiotemporal distribution of malaria cases in high-altitude villages is necessary to prioritize the risk areas and facilitate interventions. Spatial scan statistics using the Bernoulli model were employed to identify spatial and temporal clusters of malaria in high-altitude villages. Daily malaria data were collected, using a passive surveillance system, from patients visiting local health facilities. Georeference data were collected at villages using hand-held global positioning system devices and linked to patient data. Bayesian Bernoulli models fitted with Markov chain Monte Carlo (MCMC) methods were used to identify the effects of factors on spatial clusters of malaria cases. The deviance information criterion (DIC) was used to assess the goodness-of-fit of the different models; the smaller the DIC, the better the model fit. Malaria cases were clustered in both space and time in high-altitude villages. Spatial scan statistics identified a total of 56 spatial clusters of malaria in high-altitude villages. Of these, 39 were the most likely clusters (LLR = 15.62, p < 0.00001) and 17 were secondary clusters (LLR = 7.05, p < 0.03). The significant most likely temporal malaria clusters were detected between August and December (LLR = 17.87, p < 0.001). Travel away from home, male sex, and age above 15 years had statistically significant effects on malaria clusters at high-altitude villages. The study identified spatial clusters of malaria cases occurring at high-elevation villages within the district. A patient who travelled away from home to a malaria-endemic area might be the most probable source of malaria infection in a high-altitude village.
Malaria interventions in high altitude villages should address factors associated with malaria clustering.
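The LLR values reported above come from a Bernoulli scan statistic, which compares the case proportion inside a candidate window with the proportion outside it. The Python sketch below computes that log-likelihood ratio for one window using the standard Bernoulli scan likelihood; the window search, the Monte Carlo replication that yields the p-values, and the study's Bayesian MCMC models are not shown. Function names are hypothetical.

```python
import math


def _xlogy(x: float, y: float) -> float:
    """x * log(y), using the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def bernoulli_llr(c: int, n: int, C: int, N: int) -> float:
    """Bernoulli scan log-likelihood ratio for one candidate window.

    c, n: cases and all individuals (cases + controls) inside the window.
    C, N: cases and all individuals in the whole study area.
    Only windows with an elevated case proportion score above zero.
    """
    # Hypothetical helper; validation assumes the window is a proper subset.
    if not (0 < n < N and 0 <= c <= n and 0 <= C - c <= N - n):
        raise ValueError("inconsistent counts")
    out_cases, out_total = C - c, N - n
    p_in, p_out = c / n, out_cases / out_total
    if p_in <= p_out:
        return 0.0
    ll_window = (_xlogy(c, p_in) + _xlogy(n - c, 1 - p_in)
                 + _xlogy(out_cases, p_out)
                 + _xlogy(out_total - out_cases, 1 - p_out))
    p_all = C / N
    ll_null = _xlogy(C, p_all) + _xlogy(N - C, 1 - p_all)
    return ll_window - ll_null
```

In a full analysis this value is computed for every candidate window (for example, circles of increasing radius around each village); the window with the largest value is the most likely cluster, and its significance is judged by repeating the search on data with case and control labels randomly reassigned.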

  16. Use of a spatial scan statistic to identify clusters of births occurring outside Ghanaian health facilities for targeted intervention.

    PubMed

    Bosomprah, Samuel; Dotse-Gborgbortsi, Winfred; Aboagye, Patrick; Matthews, Zoe

    2016-11-01

    To identify and evaluate clusters of births that occurred outside health facilities in Ghana for targeted intervention. A retrospective study was conducted using a convenience sample of live births registered in Ghanaian health facilities from January 1 to December 31, 2014. Data were extracted from the district health information system. A spatial scan statistic was used to investigate clusters of home births through a discrete Poisson probability model. Scanning with a circular spatial window was conducted only for clusters with high rates of such deliveries. The district was used as the geographic unit of analysis. The likelihood P value was estimated using Monte Carlo simulations. Ten statistically significant clusters with a high rate of home birth were identified. The relative risks ranged from 1.43 ("least likely" cluster; P=0.001) to 1.95 ("most likely" cluster; P=0.001). The relative risks of the top five "most likely" clusters ranged from 1.68 to 1.95; these clusters were located in the Ashanti, Brong Ahafo, Western, Eastern, and Greater Accra regions. Health facility records, geospatial techniques, and geographic information systems provided locally relevant information to assist policy makers in delivering targeted interventions to small geographic areas. Copyright © 2016 International Federation of Gynecology and Obstetrics. Published by Elsevier Ireland Ltd. All rights reserved.
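The design described above (a discrete Poisson scan over circular windows of districts, high rates only, with a Monte Carlo p-value) can be sketched in Python. Assumptions not taken from the study: district centroids are planar coordinates, each window grows around a centroid until it would exceed half of the total population (a common default cap, not a setting stated in the abstract), and 999 replications are used. All names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def circular_windows(xy: Sequence[Point], pops: Sequence[float],
                     max_frac: float = 0.5) -> List[List[int]]:
    """Nested windows around each centroid, stopping before max_frac of the population."""
    # Assumption: max_frac=0.5 caps window size at half the population at risk.
    total = sum(pops)
    windows = []
    for centre in xy:
        order = sorted(range(len(xy)), key=lambda j: math.dist(centre, xy[j]))
        members: List[int] = []
        pop = 0.0
        for j in order:
            if pop + pops[j] > max_frac * total:
                break
            members.append(j)
            pop += pops[j]
            windows.append(list(members))
    return windows


def poisson_llr(c: int, expected: float, total_cases: int) -> float:
    """Poisson scan log-likelihood ratio for one window; zero unless c > expected."""
    if c <= expected:
        return 0.0
    inside = c * math.log(c / expected)
    rest = total_cases - c
    outside = 0.0 if rest == 0 else rest * math.log(rest / (total_cases - expected))
    return inside + outside


def max_llr(cases: Sequence[int], pops: Sequence[float],
            windows: List[List[int]]) -> float:
    total_cases, total_pop = sum(cases), sum(pops)
    best = 0.0
    for w in windows:
        c = sum(cases[i] for i in w)
        expected = total_cases * sum(pops[i] for i in w) / total_pop
        best = max(best, poisson_llr(c, expected, total_cases))
    return best


def scan_test(cases: Sequence[int], pops: Sequence[float], xy: Sequence[Point],
              reps: int = 999, seed: int = 0) -> Tuple[float, float]:
    """Observed maximum LLR and its Monte Carlo p-value.

    Null replicates redistribute the observed total number of cases across
    districts in proportion to population (conditioning on the total)."""
    windows = circular_windows(xy, pops)
    observed = max_llr(cases, pops, windows)
    rng = random.Random(seed)
    districts = range(len(pops))
    exceed = 0
    for _ in range(reps):
        sim = [0] * len(pops)
        for i in rng.choices(districts, weights=pops, k=sum(cases)):
            sim[i] += 1
        if max_llr(sim, pops, windows) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (reps + 1)
```

Because the same window set and the maximum over windows are used for both observed and simulated data, the p-value accounts for the multiple windows scanned.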

  17. Applying the Anderson-Darling test to suicide clusters: evidence of contagion at U. S. universities?

    PubMed

    MacKenzie, Donald W

    2013-01-01

    Suicide clusters at Cornell University and the Massachusetts Institute of Technology (MIT) prompted popular and expert speculation about suicide contagion. However, some clustering is to be expected in any random process. This work tested whether suicide clusters at these two universities differed significantly from those expected under a homogeneous Poisson process, in which suicides occur randomly and independently of one another. Suicide dates were collected for MIT and Cornell for 1990-2012. The Anderson-Darling statistic was used to test the goodness-of-fit of the intervals between suicides to the distribution expected under the Poisson process. Suicides at MIT were consistent with the homogeneous Poisson process, while those at Cornell showed clustering inconsistent with such a process (p = .05). The Anderson-Darling test provides a statistically powerful means to identify suicide clustering in small samples. Practitioners can use this method to test for clustering in relevant communities. The difference in clustering behavior between the two institutions suggests that more institutions should be studied to determine the prevalence of suicide clustering in universities and its causes.
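Under a homogeneous Poisson process the intervals between events are exponentially distributed, so the test above amounts to an Anderson-Darling goodness-of-fit test against an exponential distribution. The Python sketch below computes A² with the rate estimated from the intervals themselves. Because estimating the rate changes the null distribution, it calibrates the p-value by parametric Monte Carlo simulation rather than by tabulated critical values; this is an assumption, not necessarily the author's procedure, and the function names are hypothetical. Intervals must be strictly positive.

```python
import math
import random
from typing import Sequence, Tuple


def ad_exponential(intervals: Sequence[float]) -> float:
    """Anderson-Darling A^2 for an exponential fit with the rate estimated as 1/mean."""
    x = sorted(intervals)
    n = len(x)
    mean = sum(x) / n
    u = [xi / mean for xi in x]
    # log F(x) computed as log(-expm1(-u)) for accuracy; log(1 - F(x)) is exactly -u.
    s = sum((2 * i + 1) * (math.log(-math.expm1(-u[i])) - u[n - 1 - i])
            for i in range(n))
    return -n - s / n


def ad_poisson_test(intervals: Sequence[float], reps: int = 999,
                    seed: int = 0) -> Tuple[float, float]:
    """A^2 and a Monte Carlo p-value under the homogeneous Poisson null.

    Assumption: Monte Carlo calibration instead of tabulated critical values.
    A^2 is scale-invariant once the rate is estimated, so null samples can be
    drawn from a unit-rate exponential."""
    if len(intervals) < 2 or min(intervals) <= 0:
        raise ValueError("need at least two strictly positive intervals")
    rng = random.Random(seed)
    n = len(intervals)
    observed = ad_exponential(intervals)
    exceed = sum(
        ad_exponential([rng.expovariate(1.0) for _ in range(n)]) >= observed
        for _ in range(reps)
    )
    return observed, (exceed + 1) / (reps + 1)
```

A² also grows when intervals are more regular than exponential (for example, equally spaced events), so a small p-value alone does not establish clustering; a coefficient of variation of the intervals above 1 indicates that the departure is in the clustered direction.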

  18. Common Scientific and Statistical Errors in Obesity Research

    PubMed Central

    George, Brandon J.; Beasley, T. Mark; Brown, Andrew W.; Dawson, John; Dimova, Rositsa; Divers, Jasmin; Goldsby, TaShauna U.; Heo, Moonseong; Kaiser, Kathryn A.; Keith, Scott; Kim, Mimi Y.; Li, Peng; Mehta, Tapan; Oakes, J. Michael; Skinner, Asheley; Stuart, Elizabeth; Allison, David B.

    2015-01-01

    We identify 10 common errors and problems in the statistical analysis, design, interpretation, and reporting of obesity research and discuss how they can be avoided. The 10 topics are: 1) misinterpretation of statistical significance, 2) inappropriate testing against baseline values, 3) excessive and undisclosed multiple testing and “p-value hacking,” 4) mishandling of clustering in cluster randomized trials, 5) misconceptions about nonparametric tests, 6) mishandling of missing data, 7) miscalculation of effect sizes, 8) ignoring regression to the mean, 9) ignoring confirmation bias, and 10) insufficient statistical reporting. We hope that discussion of these errors can improve the quality of obesity research by helping researchers to implement proper statistical practice and to know when to seek the help of a statistician. PMID:27028280

  19. Familial clustering of overweight and obesity among schoolchildren in northern China.

    PubMed

    Li, Zengning; Luo, Bin; Du, Limei; Hu, Huanyu; Xie, Ying

    2014-01-01

    We aimed to study the prevalence of overweight and obesity and to assess their familial clustering among schoolchildren in northern China. A cross-sectional study was conducted on 95,292 schoolchildren in northern China to investigate the prevalence of overweight and obesity. A group of overweight and obese children (n = 450) was selected using a cluster sampling method. Questionnaire answers on the nutrition and behaviors of these children and their families were recorded and analyzed statistically. The prevalence of overweight and obesity in schoolchildren was 27.4% and 13.2%, respectively. The prevalence of overweight and obesity was significantly higher in boys than in girls. The prevalence of familial clustering of overweight and obesity was 75.3% and 20.3%, respectively. The prevalence of overweight in first-generation (parents) and second-generation (grandparents) relatives was 54.6% and 53.1%, respectively. The rates of overweight and obesity showed a linear trend with age. The association between familial clustering of obesity and family income reached statistical significance. The prevalence of overweight and obesity was extremely high, especially among boys and their fathers. Evidence of familial clustering of overweight and obesity among schoolchildren and their parental family members in northern China is emerging.

  20. A comparison of risk factors associated with suicide ideation/attempts in American Indian and White youth in Montana.

    PubMed

    Manzo, Karen; Tiesman, Hope; Stewart, Jera; Hobbs, Gerald R; Knox, Sarah S

    2015-01-01

    We examined racial/ethnic and gender-specific associations between suicide ideation/attempts and risky behaviors, sadness/hopelessness, and victimization in Montana American Indian and White youth using 1999-2011 Youth Risk Behavior Survey data. Logistic regression was used to calculate odds ratios and 95% confidence intervals in stratified racial/ethnic-gender groups. The primary results of this study show that although American Indian youth had significantly more suicidal thoughts and attempts than White youth, they had fewer statistically significant predictors. Sadness/hopelessness was the strongest, and the only statistically significant, predictor of suicide ideation/attempts common across all four groups. The unhealthy weight control cluster was a significant predictor for the White youth and the American Indian/Alaska Native girls; the alcohol/tobacco/marijuana cluster was a significant predictor for the American Indian boys only. Results show important differences across the groups and indicate directions for future research targeting prevention and intervention.

  1. Measurement of surface roughness changes of unpolished and polished enamel following erosion

    PubMed Central

    Austin, Rupert S.; Parkinson, Charles R.; Hasan, Adam; Bartlett, David W.

    2017-01-01

    Objectives To determine if Sa roughness data from measuring one central location of unpolished and polished enamel were representative of the overall surfaces before and after erosion. Methods Twenty human enamel sections (4 × 4 mm) were embedded in bis-acryl composite and randomised to either a native or a polished enamel preparation protocol. Enamel samples were subjected to an acid challenge (15 minutes in 100 mL orange juice, pH 3.2, titratable acidity 41.3 mmol OH/L, 62.5 rpm agitation, repeated for three cycles). Median (IQR) surface roughness [Sa] was measured at baseline and after erosion from both a centralised cluster and four peripheral clusters. Within each cluster, five smaller areas (0.04 mm²) provided the Sa roughness data. Results For both unpolished and polished enamel samples there were no significant differences between measuring one central cluster or four peripheral clusters, before and after erosion. For unpolished enamel the single central cluster had a median (IQR) Sa roughness of 1.45 (2.58) μm and the four peripheral clusters had a median (IQR) of 1.32 (4.86) μm before erosion; after erosion there were statistically significant reductions to 0.38 (0.35) μm and 0.34 (0.49) μm respectively (p<0.0001). Polished enamel had a median (IQR) Sa roughness of 0.04 (0.17) μm for the single central cluster and 0.05 (0.15) μm for the four peripheral clusters, which increased significantly after erosion to 0.27 (0.08) μm for both (p<0.0001). Conclusion Measuring one central cluster of unpolished and polished enamel was representative of the overall enamel surface roughness, before and after erosion. PMID:28771562

  2. Searching for the 3.5 keV Line in the Stacked Suzaku Observations of Galaxy Clusters

    NASA Technical Reports Server (NTRS)

    Bulbul, Esra; Markevitch, Maxim; Foster, Adam; Miller, Eric; Bautz, Mark; Lowenstein, Mike; Randall, Scott W.; Smith, Randall K.

    2016-01-01

    We perform a detailed study of the stacked Suzaku observations of 47 galaxy clusters, spanning a redshift range of 0.01-0.45, to search for the unidentified 3.5 keV line. This sample provides an independent test for the previously detected line. We detect a 2σ-significant spectral feature at 3.5 keV in the spectrum of the full sample. When the sample is divided into two subsamples (cool-core and non-cool-core clusters), the cool-core subsample shows no statistically significant positive residuals at the line energy. A very weak (approx. 2σ confidence) spectral feature at 3.5 keV is permitted by the data from the non-cool-core cluster sample. The upper limit on a neutrino decay mixing angle of sin²(2θ) = 6.1 × 10⁻¹¹ from the full Suzaku sample is consistent with the previous detections in the stacked XMM-Newton sample of galaxy clusters (which had a higher statistical sensitivity to faint lines), M31, and the Galactic center, at a 90% confidence level. However, the constraint from the present sample, which does not include the Perseus cluster, is in tension with the previously reported line flux observed in the core of the Perseus cluster with XMM-Newton and Suzaku.

  3. Ankle plantarflexion strength in rearfoot and forefoot runners: a novel clusteranalytic approach.

    PubMed

    Liebl, Dominik; Willwacher, Steffen; Hamill, Joseph; Brüggemann, Gert-Peter

    2014-06-01

    The purpose of the present study was to test for differences in ankle plantarflexion strength between habitual rearfoot and forefoot runners. To approach this issue, we revisit the problem of classifying different footfall patterns in human runners. A dataset of 119 subjects running shod and barefoot (speed 3.5 m/s) was analyzed. The footfall patterns were clustered by a novel statistical approach, which is motivated by advances in the statistical literature on functional data analysis. We explain the novel statistical approach in detail and compare it to the classically used strike index of Cavanagh and Lafortune (1980). The two groups found by the new cluster approach are well interpretable as a forefoot and a rearfoot footfall group. The subsequent comparison study of the clustered subjects reveals that runners with a forefoot footfall pattern are capable of producing significantly higher joint moments in a maximum voluntary contraction (MVC) of their ankle plantarflexor muscle-tendon units; difference in means: 0.28 Nm/kg. This effect remains significant after controlling for an additional gender effect and for differences in training levels. Our analysis confirms the hypothesis that forefoot runners have a higher mean MVC plantarflexion strength than rearfoot runners. Furthermore, we demonstrate that our proposed stochastic cluster analysis provides a robust and useful framework for clustering foot strikes. Copyright © 2014 Elsevier B.V. All rights reserved.

  4. Structural parameters of young star clusters: fractal analysis

    NASA Astrophysics Data System (ADS)

    Hetem, A.

    2017-07-01

    A unified view of star formation in the Universe demands detailed and in-depth studies of young star clusters. This work is related to our previous study of fractal statistics estimated for a sample of young stellar clusters (Gregorio-Hetem et al. 2015, MNRAS 448, 2504). The structural properties can lead to significant conclusions about the early stages of cluster formation: 1) virial conditions can be used to distinguish warm collapsed; 2) bound or unbound behaviour can lead to conclusions about expansion; and 3) fractal statistics are correlated with the dynamical evolution and age. The error-bar estimation technique most used in the literature adopts inferential methods (like bootstrap) to estimate deviation and variance, which are valid only for an artificially generated cluster. In this paper, we expanded the number of studied clusters in order to enhance the investigation of the cluster properties and dynamic evolution. The structural parameters were compared with fractal statistics and reveal that the clusters' radial density profiles show a tendency for the mean separation of the stars to increase with the average surface density. The sample can be divided into two groups showing different dynamic behaviour, but they have the same dynamic evolution, since the entire sample was revealed as being expanding objects, for which the substructures do not seem to have been completely erased. These results are in agreement with the simulations adopting low surface densities and supervirial conditions.

  5. Accounting for multiple births in neonatal and perinatal trials: systematic review and case study.

    PubMed

    Hibbs, Anna Maria; Black, Dennis; Palermo, Lisa; Cnaan, Avital; Luan, Xianqun; Truog, William E; Walsh, Michele C; Ballard, Roberta A

    2010-02-01

    To determine the prevalence in the neonatal literature of statistical approaches accounting for the unique clustering patterns of multiple births and to explore the sensitivity of an actual trial to several analytic approaches to multiples. A systematic review of recent perinatal trials assessed the prevalence of studies accounting for clustering of multiples. The Nitric Oxide to Prevent Chronic Lung Disease (NO CLD) trial served as a case study of the sensitivity of the outcome to several statistical strategies. We calculated odds ratios using nonclustered (logistic regression) and clustered (generalized estimating equations, multiple outputation) analyses. In the systematic review, most studies did not describe the random assignment of twins and did not account for clustering. Of those studies that did, exclusion of multiples and generalized estimating equations were the most common strategies. The NO CLD study included 84 infants with a sibling enrolled in the study. Multiples were more likely than singletons to be white and to be born to older mothers (P < .01). Analyses that accounted for clustering were statistically significant; analyses assuming independence were not. The statistical approach to multiples can influence the odds ratio and width of confidence intervals, thereby affecting the interpretation of a study outcome. A minority of perinatal studies address this issue. Copyright 2010 Mosby, Inc. All rights reserved.
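Why the handling of multiples matters can be illustrated with the simplest possible model, an event proportion. The Python sketch below contrasts the usual independence-based standard error with a cluster-robust (sandwich, CR0) standard error that treats each sibling set as one cluster. It corresponds to an intercept-only GEE with an independence working correlation, not to the logistic regression, GEE, or multiple outputation models used in the NO CLD analysis; the function name is hypothetical.

```python
import math
from collections import defaultdict
from typing import Hashable, Sequence, Tuple


def proportion_standard_errors(outcomes: Sequence[int],
                               family_ids: Sequence[Hashable]) -> Tuple[float, float, float]:
    """Return (proportion, naive SE, cluster-robust SE) for binary outcomes.

    Assumption: intercept-only model; CR0 sandwich without small-sample correction."""
    n = len(outcomes)
    p = sum(outcomes) / n
    naive = math.sqrt(p * (1 - p) / n)
    # Sum residuals within each family before squaring, so correlated siblings
    # are not counted as independent pieces of information.
    family_resid = defaultdict(float)
    for y, fam in zip(outcomes, family_ids):
        family_resid[fam] += y - p
    robust = math.sqrt(sum(r * r for r in family_resid.values())) / n
    return p, naive, robust
```

When every infant is a singleton the two standard errors coincide; when twins always share the outcome, the robust standard error is √2 times the naive one. The direction and size of the change in a real trial depend on the data and the model, as the NO CLD results show.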

  6. A method of using cluster analysis to study statistical dependence in multivariate data

    NASA Technical Reports Server (NTRS)

    Borucki, W. J.; Card, D. H.; Lyle, G. C.

    1975-01-01

    A technique is presented that uses both cluster analysis and a Monte Carlo significance test of clusters to discover associations between variables in multidimensional data. The method is applied to an example of a noisy function in three-dimensional space, to a sample from a mixture of three bivariate normal distributions, and to the well-known Fisher's Iris data.

  7. Selection of the Maximum Spatial Cluster Size of the Spatial Scan Statistic by Using the Maximum Clustering Set-Proportion Statistic.

    PubMed

    Ma, Yue; Yin, Fei; Zhang, Tao; Zhou, Xiaohua Andrew; Li, Xiaosong

    2016-01-01

    Spatial scan statistics are widely used in various fields. The performance of these statistics is influenced by parameters, such as maximum spatial cluster size, and can be improved by parameter selection using performance measures. Current performance measures are based on the presence of clusters and are thus inapplicable to data sets without known clusters. In this work, we propose a novel overall performance measure called maximum clustering set-proportion (MCS-P), which is based on the likelihood of the union of detected clusters and the applied dataset. MCS-P was compared with existing performance measures in a simulation study to select the maximum spatial cluster size. Results of other performance measures, such as sensitivity and misclassification, suggest that the spatial scan statistic achieves accurate results in most scenarios with the maximum spatial cluster sizes selected using MCS-P. Given that previously known clusters are not required in the proposed strategy, selection of the optimal maximum cluster size with MCS-P can improve the performance of the scan statistic in applications without identified clusters.

  8. Selection of the Maximum Spatial Cluster Size of the Spatial Scan Statistic by Using the Maximum Clustering Set-Proportion Statistic

    PubMed Central

    Ma, Yue; Yin, Fei; Zhang, Tao; Zhou, Xiaohua Andrew; Li, Xiaosong

    2016-01-01

    Spatial scan statistics are widely used in various fields. The performance of these statistics is influenced by parameters, such as maximum spatial cluster size, and can be improved by parameter selection using performance measures. Current performance measures are based on the presence of clusters and are thus inapplicable to data sets without known clusters. In this work, we propose a novel overall performance measure called maximum clustering set–proportion (MCS-P), which is based on the likelihood of the union of detected clusters and the applied dataset. MCS-P was compared with existing performance measures in a simulation study to select the maximum spatial cluster size. Results of other performance measures, such as sensitivity and misclassification, suggest that the spatial scan statistic achieves accurate results in most scenarios with the maximum spatial cluster sizes selected using MCS-P. Given that previously known clusters are not required in the proposed strategy, selection of the optimal maximum cluster size with MCS-P can improve the performance of the scan statistic in applications without identified clusters. PMID:26820646

  9. Cluster analysis as a prediction tool for pregnancy outcomes.

    PubMed

    Banjari, Ines; Kenjerić, Daniela; Šolić, Krešimir; Mandić, Milena L

    2015-03-01

    Considering the specific physiological changes during gestation and thinking of pregnancy as a "critical window", classification of pregnant women in early pregnancy can be considered crucial. The paper demonstrates the use of a method based on an approach from intelligent data mining, cluster analysis. Cluster analysis is a statistical method that makes it possible to group individuals based on sets of identifying variables. The method was chosen to determine whether pregnant women could be classified in early pregnancy and to analyze unknown correlations between different variables so that certain outcomes could be predicted. A total of 222 pregnant women from two general obstetric offices were recruited. The analysis focused on the following characteristics of these pregnant women: their age, pre-pregnancy body mass index (BMI) and haemoglobin value. Cluster analysis achieved a 94.1% classification accuracy rate, with three branches or groups of pregnant women showing statistically significant correlations with pregnancy outcomes. The results show that pregnant women of both older age and higher pre-pregnancy BMI have a significantly higher incidence of delivering babies of higher birth weight but gain significantly less weight during pregnancy. Their babies are also longer, and these women have a significantly higher probability of complications during pregnancy (gestosis) and a higher probability of induced or caesarean delivery. We can conclude that the cluster analysis method can appropriately classify pregnant women in early pregnancy to predict certain outcomes.

  10. Cluster and propensity based approximation of a network

    PubMed Central

    2013-01-01

    Background The models in this article generalize current models for both correlation networks and multigraph networks. Correlation networks are widely applied in genomics research. In contrast to general networks, it is straightforward to test the statistical significance of an edge in a correlation network. It is also easy to decompose the underlying correlation matrix and generate informative network statistics such as the module eigenvector. However, correlation networks only capture the connections between numeric variables. An open question is whether one can find suitable decompositions of the similarity measures employed in constructing general networks. Multigraph networks are attractive because they support likelihood based inference. Unfortunately, it is unclear how to adjust current statistical methods to detect the clusters inherent in many data sets. Results Here we present an intuitive and parsimonious parametrization of a general similarity measure such as a network adjacency matrix. The cluster and propensity based approximation (CPBA) of a network not only generalizes correlation network methods but also multigraph methods. In particular, it gives rise to a novel and more realistic multigraph model that accounts for clustering and provides likelihood based tests for assessing the significance of an edge after controlling for clustering. We present a novel Majorization-Minimization (MM) algorithm for estimating the parameters of the CPBA. To illustrate the practical utility of the CPBA of a network, we apply it to gene expression data and to a bi-partite network model for diseases and disease genes from the Online Mendelian Inheritance in Man (OMIM). 
Conclusions The CPBA of a network is theoretically appealing since a) it generalizes correlation and multigraph network methods, b) it improves likelihood based significance tests for edge counts, c) it directly models higher-order relationships between clusters, and d) it suggests novel clustering algorithms. The CPBA of a network is implemented in Fortran 95 and bundled in the freely available R package PropClust. PMID:23497424

  11. On the Distribution of Orbital Poles of Milky Way Satellites

    NASA Astrophysics Data System (ADS)

    Palma, Christopher; Majewski, Steven R.; Johnston, Kathryn V.

    2002-01-01

    In numerous studies of the outer Galactic halo some evidence for accretion has been found. If the outer halo did form in part or wholly through merger events, we might expect to find coherent streams of stars and globular clusters following orbits similar to those of their parent objects, which are assumed to be present or former Milky Way dwarf satellite galaxies. We present a study of this phenomenon by assessing the likelihood of potential descendant "dynamical families" in the outer halo. We conduct two analyses: one that involves a statistical analysis of the spatial distribution of all known Galactic dwarf satellite galaxies (DSGs) and globular clusters, and a second, more specific analysis of those globular clusters and DSGs for which full phase space dynamical data exist. In both cases our methodology is appropriate only to members of descendant dynamical families that retain nearly aligned orbital poles today. Since the Sagittarius dwarf (Sgr) is considered a paradigm for the type of merger/tidal interaction event for which we are searching, we also undertake a case study of the Sgr system and identify several globular clusters that may be members of its extended dynamical family. In our first analysis, the distribution of possible orbital poles for the entire sample of outer (Rgc > 8 kpc) halo globular clusters is tested for statistically significant associations among globular clusters and DSGs. Our methodology for identifying possible associations is similar to that used by Lynden-Bell & Lynden-Bell, but we put the associations on a more statistical foundation. Moreover, we study the degree of possible dynamical clustering among various interesting ensembles of globular clusters and satellite galaxies. Among the ensembles studied, we find the globular cluster subpopulation with the highest statistical likelihood of association with one or more of the Galactic DSGs to be the distant, outer halo (Rgc > 25 kpc), second-parameter globular clusters.
The results of our orbital pole analysis are supported by the great circle cell count methodology of Johnston, Hernquist, & Bolte. The space motions of the clusters Pal 4, NGC 6229, NGC 7006, and Pyxis are predicted to be among those most likely to show the clusters to be following stream orbits, since these clusters are responsible for the majority of the statistical significance of the association between outer halo, second-parameter globular clusters and the Milky Way DSGs. In our second analysis, we study the orbits of the 41 globular clusters and six Milky Way-bound DSGs having measured proper motions to look for objects with both coplanar orbits and similar angular momenta. Unfortunately, the majority of globular clusters with measured proper motions are inner halo clusters that are less likely to retain memory of their original orbit. Although four potential globular cluster/DSG associations are found, we believe three of these associations involving inner halo clusters to be coincidental. While the present sample of objects with complete dynamical data is small and does not include many of the globular clusters that are more likely to have been captured by the Milky Way, the methodology we adopt will become increasingly powerful as more proper motions are measured for distant Galactic satellites and globular clusters, and especially as results from the Space Interferometry Mission (SIM) become available.
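The geometry behind the orbital pole analysis can be sketched in a few lines of Python. An orbit's pole is the direction of its angular momentum L = r × v, which is always perpendicular to the position vector r; so for an object without a measured proper motion, the possible poles lie on the great circle 90° from its position, and two objects can share a pole only near where their great circles cross. The sketch assumes Galactocentric Cartesian position and velocity vectors; the tolerance is an arbitrary illustrative value, and the functions are not the authors' code.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def orbital_pole(r: Vec, v: Vec) -> Vec:
    """Unit vector along the specific angular momentum L = r x v."""
    # Assumption: r and v are Galactocentric Cartesian vectors.
    lx = r[1] * v[2] - r[2] * v[1]
    ly = r[2] * v[0] - r[0] * v[2]
    lz = r[0] * v[1] - r[1] * v[0]
    norm = math.sqrt(lx * lx + ly * ly + lz * lz)
    if norm == 0:
        raise ValueError("purely radial motion has no defined orbital pole")
    return (lx / norm, ly / norm, lz / norm)


def angle_deg(a: Vec, b: Vec) -> float:
    """Angle between two vectors in degrees."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    # Clamp to guard against rounding just outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / (na * nb)))))


def pole_allowed(pole: Vec, position: Vec, tol_deg: float = 5.0) -> bool:
    """True if `pole` lies within tol_deg of the great circle of poles permitted
    by `position` alone, i.e. roughly perpendicular to the position vector."""
    # Assumption: tol_deg=5.0 is an arbitrary illustrative tolerance.
    return abs(angle_deg(pole, position) - 90.0) <= tol_deg
```

Objects with full phase-space data get a single pole from `orbital_pole`; comparing poles with `angle_deg` is one way to look for coplanar orbits, which the abstract combines with a check for similar angular momenta.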

  12. [Space-time suicide clustering in the community of Antequera (Spain)].

    PubMed

    Pérez-Costillas, Lucía; Blasco-Fontecilla, Hilario; Benítez, Nicolás; Comino, Raquel; Antón, José Miguel; Ramos-Medina, Valentín; Lopez, Amalia; Palomo, José Luis; Madrigal, Lucía; Alcalde, Javier; Perea-Millá, Emilio; Artieda-Urrutia, Paula; de León-Martínez, Victoria; de Diego Otero, Yolanda

    2015-01-01

    Approximately 3,500 people commit suicide every year in Spain. The main aim of this study is to explore whether spatial and temporal clustering of suicide exists in the region of Antequera (Málaga, Spain). Sample and procedure: All suicides from January 1, 2004 to December 31, 2008 were identified using data from the Forensic Pathology Department of the Institute of Legal Medicine, Málaga (Spain). Geolocation: Google Earth was used to calculate the coordinates of each suicide decedent's address. Statistical analysis: A spatiotemporal permutation scan statistic and Ripley's K function were used to explore spatiotemporal clustering. Pearson's chi-squared test was used to determine whether there were differences between suicides inside and outside the spatiotemporal clusters. A total of 120 individuals committed suicide within the region of Antequera, of whom 96 (80%) were included in our analyses. Statistically significant evidence for 7 spatiotemporal suicide clusters emerged within critical limits for the 0-2.5 km distance and for the first and second weeks (P<.05 in both cases) after suicide. Not a single subject among the suicides within clusters had been diagnosed with a current psychotic disorder, whereas 20% of those outside clusters had this diagnosis (χ²=4.13; df=1; P<.05). There are spatiotemporal suicide clusters in the area surrounding Antequera. Patients diagnosed with a current psychotic disorder are less likely to be influenced by the factors explaining suicide clustering. Copyright © 2013 SEP y SEPB. Published by Elsevier España. All rights reserved.
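One common extension of Ripley's K function to space and time (Diggle's formulation) compares the space-time K function with the product of its purely spatial and purely temporal counterparts; a positive difference D(d, t) suggests that events close in space also tend to be close in time. The Python sketch below uses naive estimators without edge correction, assumes coordinates in kilometres and times in days, and omits the permutation envelopes used to judge significance. Whether the study used exactly this estimator is an assumption, and the function name is hypothetical.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple

Event = Tuple[float, float, float]  # (x_km, y_km, day)


def space_time_k(events: Sequence[Event], d: float, t: float,
                 area: float, duration: float) -> Tuple[float, float, float, float]:
    """Return (K_s(d), K_t(t), K(d, t), D(d, t)) with D = K(d, t) - K_s(d) * K_t(t).

    Assumption: naive estimator, no edge correction."""
    n = len(events)
    if n < 2:
        raise ValueError("need at least two events")
    # Each unordered pair is visited once, so double it to count ordered pairs.
    scale = 2.0 / (n * (n - 1))
    near_s = near_t = near_st = 0
    for (x1, y1, t1), (x2, y2, t2) in combinations(events, 2):
        close_space = math.dist((x1, y1), (x2, y2)) <= d
        close_time = abs(t1 - t2) <= t
        near_s += int(close_space)
        near_t += int(close_time)
        near_st += int(close_space and close_time)
    k_s = area * scale * near_s
    k_t = duration * scale * near_t
    k_st = area * duration * scale * near_st
    return k_s, k_t, k_st, k_st - k_s * k_t
```

With d = 2.5 km and t = 7 or 14 days, this corresponds to the distance and time windows mentioned above; significance would normally be judged by recomputing D after randomly permuting event times among locations.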

  13. The Effect of Mergers on Galaxy Cluster Mass Estimates

    NASA Astrophysics Data System (ADS)

    Johnson, Ryan E.; Zuhone, John A.; Thorsen, Tessa; Hinds, Andre

    2015-08-01

    At vertices within the filamentary structure that describes the universal matter distribution, clusters of galaxies grow hierarchically through merging with other clusters. As such, the most massive galaxy clusters should have experienced many such mergers in their histories. Though we cannot see them evolve over time, these mergers leave lasting, measurable effects in the cluster galaxies' phase space. By simulating several different galaxy cluster mergers here, we examine how the cluster galaxies' kinematics are altered as a result of these mergers. Further, we also examine the effect of our line-of-sight viewing angle with respect to the merger axis. In projecting the 6-dimensional galaxy phase space onto a 3-dimensional space, we are able to simulate how these clusters might actually appear to optical redshift surveys. We find that for those optical cluster statistics which are most often used as a proxy for the cluster mass (variants of σv), the uncertainty due to an imprecise or unknown line of sight may alter the derived cluster masses more so than the kinematic disturbance of the merger itself. Finally, by examining these, and several other clustering statistics, we find that significant events (such as pericentric crossings) are identifiable over a range of merger initial conditions and from many different lines of sight.

  14. A statistical method (cross-validation) for bone loss region detection after spaceflight

    PubMed Central

    Zhao, Qian; Li, Wenjun; Li, Caixia; Chu, Philip W.; Kornak, John; Lang, Thomas F.

    2010-01-01

    Astronauts experience bone loss after long spaceflight missions. Identifying specific regions that undergo the greatest losses (e.g. the proximal femur) could reveal information about the processes of bone loss in disuse and disease. Methods for detecting such regions, however, remain an open problem. This paper focuses on statistical methods to detect such regions. We perform statistical parametric mapping to obtain t-maps of changes in images, and propose a new cross-validation method to select an optimal suprathreshold for forming clusters of pixels. Once these candidate clusters are formed, we use permutation testing of longitudinal labels to identify significant changes. PMID:20632144
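
    A simplified one-dimensional sketch of the cluster-forming and permutation steps: pixel-wise paired t statistics are thresholded, the largest run of adjacent suprathreshold pixels serves as the cluster statistic, and its null distribution is built by randomly swapping each subject's baseline and follow-up labels (a sign flip of that subject's differences). The threshold is fixed by hand here; the cross-validation procedure the authors propose for choosing it is not reproduced, and all names and data are hypothetical.

```python
import math
import random


def t_map(diffs):
    """One-sample t statistic per pixel; diffs[s][p] = follow-up minus baseline for subject s."""
    n = len(diffs)
    t = []
    for p in range(len(diffs[0])):
        col = [row[p] for row in diffs]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / (n - 1)
        t.append(mean / math.sqrt(var / n) if var > 0 else 0.0)
    return t


def max_cluster_size(t, threshold):
    """Longest run of adjacent pixels whose t value falls below -threshold (i.e. loss)."""
    best = run = 0
    for value in t:
        run = run + 1 if value < -threshold else 0
        best = max(best, run)
    return best


def cluster_permutation_test(diffs, threshold, n_perm=999, seed=0):
    """Permutation p-value for the largest suprathreshold cluster."""
    rng = random.Random(seed)
    observed = max_cluster_size(t_map(diffs), threshold)
    exceed = 0
    for _ in range(n_perm):
        flipped = []
        for row in diffs:
            # Swapping a subject's pre/post labels flips the sign of their differences.
            sign = rng.choice((-1, 1))
            flipped.append([sign * x for x in row])
        if max_cluster_size(t_map(flipped), threshold) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Hypothetical data: 8 subjects, 10 pixels, with a simulated loss in pixels 3-6.
diffs = [[(-5.0 if 3 <= p <= 6 else 0.0) + ((7 * s + 3 * p) % 5 - 2) * 0.1
          for p in range(10)] for s in range(8)]
print(cluster_permutation_test(diffs, threshold=3.0))
```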

  15. E-learning or educational leaflet: does it make a difference in oral health promotion? A clustered randomized trial.

    PubMed

    Al Bardaweel, Susan; Dashash, Mayssoon

    2018-05-10

    The early recognition of technology, together with a great ability to use computers and smart systems, has prompted researchers to investigate the possibilities of using technology to improve health care in children. The aim of this study was to compare traditional educational leaflets and E-applications in improving oral health knowledge, oral hygiene and gingival health in schoolchildren of Damascus city, Syria. A clustered randomized controlled trial was performed at two public primary schools. About 220 schoolchildren aged 10-11 years were included in this study and grouped into two clusters. Children in the Leaflet cluster received oral health education through leaflets, while children in the E-learning cluster received oral health education through an E-learning program. A questionnaire was designed to register information related to oral health knowledge, and the Plaque Index (PI) and Gingival Index (GI) were recorded. Questionnaire administration and clinical assessment were undertaken at baseline and after 6 and 12 weeks of oral health education. Data were analysed using one-way repeated measures ANOVA, post hoc Bonferroni tests and independent samples t-tests. The Leaflet cluster (107 participants) had statistically significantly better oral health knowledge than the E-learning cluster (104 participants) at 6 weeks (P < 0.05) and at 12 weeks (P < 0.05) (Leaflet cluster: 100 participants, E-learning cluster: 100 participants). The mean knowledge gain compared to baseline was higher in the Leaflet cluster than in the E-learning cluster. A significant reduction in mean PI at 6 and 12 weeks compared to baseline was observed in both clusters (P < 0.05). Children in the Leaflet cluster had significantly less plaque than those in the E-learning cluster at 6 weeks (P < 0.05) and at 12 weeks (P < 0.05). Similarly, a significant reduction in mean GI at 6 and 12 weeks compared to baseline was observed in both clusters (P < 0.05). 
Children in the Leaflet cluster had statistically significantly better gingival health than those in the E-learning cluster at 6 weeks (P < 0.05) and 12 weeks (P < 0.05). Traditional educational leaflets are an effective tool for improving both oral health knowledge and clinical indices of oral hygiene and care among Syrian children. Leaflets can be used in school-based oral health education for a positive outcome. Australian New Zealand Clinical Trials Registry (ACTRN12618000395235), date registered: 16/03/2018, retrospectively registered.
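
    Two of the tests named in this abstract, the independent-samples t-test and the Bonferroni adjustment, can be sketched in a few lines. This is an illustrative sketch, not the authors' analysis: it uses the classic pooled-variance (Student) t, leaves the p-value lookup to a statistics library such as SciPy, and does not model the repeated measures ANOVA. The example scores are hypothetical.

```python
import math
import statistics


def pooled_t(a, b):
    """Student's independent-samples t statistic (pooled variance) and its degrees of freedom."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2


def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of comparisons, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical knowledge scores for two small groups, and two raw p-values.
print(pooled_t([7, 8, 9, 8], [6, 7, 6, 5]))
print(bonferroni([0.01, 0.04]))
```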

  16. RELICS: Strong Lens Models for Five Galaxy Clusters from the Reionization Lensing Cluster Survey

    NASA Astrophysics Data System (ADS)

    Cerny, Catherine; Sharon, Keren; Andrade-Santos, Felipe; Avila, Roberto J.; Bradač, Maruša; Bradley, Larry D.; Carrasco, Daniela; Coe, Dan; Czakon, Nicole G.; Dawson, William A.; Frye, Brenda L.; Hoag, Austin; Huang, Kuang-Han; Johnson, Traci L.; Jones, Christine; Lam, Daniel; Lovisari, Lorenzo; Mainali, Ramesh; Oesch, Pascal A.; Ogaz, Sara; Past, Matthew; Paterno-Mahler, Rachel; Peterson, Avery; Riess, Adam G.; Rodney, Steven A.; Ryan, Russell E.; Salmon, Brett; Sendra-Server, Irene; Stark, Daniel P.; Strolger, Louis-Gregory; Trenti, Michele; Umetsu, Keiichi; Vulcani, Benedetta; Zitrin, Adi

    2018-06-01

    Strong gravitational lensing by galaxy clusters magnifies background galaxies, enhancing our ability to discover statistically significant samples of galaxies at z > 6 in order to constrain the high-redshift galaxy luminosity functions. Here, we present the first five lens models out of the Reionization Lensing Cluster Survey (RELICS) Hubble Treasury Program, based on new HST WFC3/IR and ACS imaging of the clusters RXC J0142.9+4438, Abell 2537, Abell 2163, RXC J2211.7–0349, and ACT-CLJ0102–49151. The derived lensing magnification is essential for estimating the intrinsic properties of high-redshift galaxy candidates, and properly accounting for the survey volume. We report on new spectroscopic redshifts of multiply imaged lensed galaxies behind these clusters, which are used as constraints, and detail our strategy to reduce systematic uncertainties due to lack of spectroscopic information. In addition, we quantify the uncertainty on the lensing magnification due to statistical and systematic errors related to the lens modeling process, and find that in all but one cluster, the magnification is constrained to better than 20% in at least 80% of the field of view, including statistical and systematic uncertainties. The five clusters presented in this paper span the range of masses and redshifts of the clusters in the RELICS program. We find that they exhibit similar strong lensing efficiencies to the clusters targeted by the Hubble Frontier Fields within the WFC3/IR field of view. Outputs of the lens models are made available to the community through the Mikulski Archive for Space Telescopes.

  17. Using scan statistics for congenital anomalies surveillance: the EUROCAT methodology.

    PubMed

    Teljeur, Conor; Kelly, Alan; Loane, Maria; Densem, James; Dolk, Helen

    2015-11-01

    Scan statistics have been used extensively to identify temporal clusters of health events. We describe the temporal cluster detection methodology adopted by the EUROCAT (European Surveillance of Congenital Anomalies) monitoring system. Since 2001, EUROCAT has implemented a variable window width scan statistic for detecting unusual temporal aggregations of congenital anomaly cases. The scan windows are based on numbers of cases rather than being defined by time. The methodology is embedded in the EUROCAT Central Database for annual application to centrally held registry data. The methodology was incrementally adapted to improve its utility and to address statistical issues. Simulation exercises were used to determine the power of the methodology to identify periods of raised risk (of 1-18 months). To operationalize the scan methodology, a number of adaptations were needed, including: estimating the date of conception as the unit of time; deciding the maximum length (in time) and recency of clusters of interest; reporting multiple and overlapping significant clusters; replacing the Monte Carlo simulation with a lookup table to reduce computation time; and placing a threshold on underlying population change and estimating the false positive rate by simulation. Exploration of power found that raised-risk periods lasting 1 month are unlikely to be detected except when the relative risk and case counts are high. The variable window width scan statistic is a useful tool for the surveillance of congenital anomalies. Numerous adaptations have improved the utility of the original methodology in the context of temporal cluster detection in congenital anomalies.
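
    The case-based window idea can be illustrated with a simplified sketch: for a window of k consecutive cases, the statistic is the shortest time span containing them, and a Monte Carlo comparison against uniformly scattered cases gives a p-value. This assumes a constant underlying population and one k at a time; EUROCAT's operational version scans several window widths, replaced the Monte Carlo step with a lookup table and handles population change, none of which is reproduced here. Function names and data are hypothetical.

```python
import random


def min_span(times, k):
    """Shortest time interval that contains k consecutive cases."""
    t = sorted(times)
    return min(t[i + k - 1] - t[i] for i in range(len(t) - k + 1))


def case_window_p(times, k, period, n_sim=999, seed=1):
    """Monte Carlo p-value for the tightest k-case window, assuming cases would otherwise
    be spread uniformly over [0, period] (i.e. a constant underlying population)."""
    rng = random.Random(seed)
    observed = min_span(times, k)
    hits = 0
    for _ in range(n_sim):
        simulated = [rng.uniform(0, period) for _ in times]
        if min_span(simulated, k) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_sim + 1)


# Hypothetical conception dates (in months) for 20 cases over a 100-month period.
dates = [10 * i for i in range(10)] + [50 + 0.1 * i for i in range(10)]
for k in (5, 10):
    print(k, case_window_p(dates, k, period=100))
```

    Checking several values of k, as a variable window width scan does, inflates the false positive rate, which is why the methodology above estimates that rate by simulation.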

  18. Identifying and characterizing hepatitis C virus hotspots in Massachusetts: a spatial epidemiological approach.

    PubMed

    Stopka, Thomas J; Goulart, Michael A; Meyers, David J; Hutcheson, Marga; Barton, Kerri; Onofrey, Shauna; Church, Daniel; Donahue, Ashley; Chui, Kenneth K H

    2017-04-20

    Hepatitis C virus (HCV) infections have increased during the past decade but little is known about geographic clustering patterns. We used a unique analytical approach, combining geographic information systems (GIS), spatial epidemiology, and statistical modeling to identify and characterize HCV hotspots, statistically significant clusters of census tracts with elevated HCV counts and rates. We compiled sociodemographic and HCV surveillance data (n = 99,780 cases) for Massachusetts census tracts (n = 1464) from 2002 to 2013. We used a five-step spatial epidemiological approach, calculating incremental spatial autocorrelations and Getis-Ord Gi* statistics to identify clusters. We conducted logistic regression analyses to determine factors associated with the HCV hotspots. We identified nine HCV clusters, with the largest in Boston, New Bedford/Fall River, Worcester, and Springfield (p < 0.05). In multivariable analyses, we found that HCV hotspots were independently and positively associated with the percent of the population that was Hispanic (adjusted odds ratio [AOR]: 1.07; 95% confidence interval [CI]: 1.04, 1.09) and the percent of households receiving food stamps (AOR: 1.83; 95% CI: 1.22, 2.74). HCV hotspots were independently and negatively associated with the percent of the population that were high school graduates or higher (AOR: 0.91; 95% CI: 0.89, 0.93) and the percent of the population in the "other" race/ethnicity category (AOR: 0.88; 95% CI: 0.85, 0.91). We identified locations where HCV clusters were a concern, and where enhanced HCV prevention, treatment, and care can help combat the HCV epidemic in Massachusetts. GIS, spatial epidemiological and statistical analyses provided a rigorous approach to identify hotspot clusters of disease, which can inform public health policy and intervention targeting. 
Further studies that incorporate spatiotemporal cluster analyses, Bayesian spatial and geostatistical models, spatially weighted regression analyses, and assessment of associations between HCV clustering and the built environment are needed to expand upon our combined spatial epidemiological and statistical methods.
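
    The Gi* statistic compares the weighted sum of values in each location's neighbourhood (including the location itself) with what would be expected if values were spread at random, returning a z-score per location; large positive values mark hot spots. A minimal sketch using the standard Getis-Ord Gi* formula; the binary contiguity weights and toy rates are assumptions, and the incremental spatial autocorrelation step used above to choose a distance band is not shown.

```python
import math


def getis_ord_gi_star(x, w):
    """Getis-Ord Gi* z-scores for values x and an n-by-n weight matrix w.

    For Gi*, each location is included in its own neighbourhood, so w[i][i] should be
    non-zero (1 for binary contiguity).
    """
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum(v * v for v in x) / n - mean * mean)
    scores = []
    for i in range(n):
        sw = sum(w[i])
        sw2 = sum(v * v for v in w[i])
        num = sum(w[i][j] * x[j] for j in range(n)) - mean * sw
        # Undefined if a row gives every location the same weight (denominator is zero).
        den = s * math.sqrt((n * sw2 - sw * sw) / (n - 1))
        scores.append(num / den)
    return scores


# Hypothetical rates for five tracts along a line; neighbours are adjacent tracts plus self.
rates = [10, 10, 1, 1, 1]
weights = [[1 if abs(i - j) <= 1 else 0 for j in range(5)] for i in range(5)]
print([round(z, 2) for z in getis_ord_gi_star(rates, weights)])
```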

  19. Familial clustering of overweight and obesity among schoolchildren in northern China

    PubMed Central

    Li, Zengning; Luo, Bin; Du, Limei; Hu, Huanyu; Xie, Ying

    2014-01-01

    Background: We aimed to study the prevalence of overweight and obesity and to assess their familial clustering among schoolchildren in northern China. Methods: A cross-sectional study of 95,292 schoolchildren in northern China was conducted to investigate the prevalence of overweight and obesity. A group of overweight and obese children (n = 450) was selected using a cluster sampling method. Their answers to a questionnaire on their own and their families’ nutrition and behaviors were recorded and analyzed statistically. Results: The prevalence of overweight and obesity in schoolchildren was 27.4% and 13.2%, respectively. The prevalence of overweight and obesity was significantly higher in boys than in girls. The prevalence of familial clustering of overweight and obesity was 75.3% and 20.3%, respectively. The prevalence of overweight in first-generation (parents) and second-generation (grandparents) relatives was 54.6% and 53.1%, respectively. The rates of overweight and obesity showed a linear trend with age. The association between familial clustering of obesity and family income reached statistical significance. Conclusion: The prevalence of overweight and obesity was extremely high, especially among boys and their fathers. Evidence of familial clustering of overweight and obesity among schoolchildren and their parental family members in northern China is emerging. PMID:25664106

  20. Dynamic evolution of nearby galaxy clusters

    NASA Astrophysics Data System (ADS)

    Biernacka, M.; Flin, P.

    2011-06-01

    A study of the evolution of 377 rich ACO clusters with redshift z < 0.2 is presented. The data on galaxies in the investigated clusters were obtained using the FOCAS package applied to the Digital Sky Survey I. The 377 galaxy clusters constitute a statistically uniform sample to which visual galaxy/star reclassifications were applied. Cluster shape within 2.0 h⁻¹ Mpc of the adopted cluster centre (the mean or the median of all galaxy coordinates, or the position of the brightest or of the third-brightest galaxy in the cluster) was determined through its ellipticity, calculated using two methods: the covariance ellipse method (hereafter CEM) and the method based on Minkowski functionals (hereafter MFM). We investigated the dependence of ellipticity on the radius of the circular annuli in which it was calculated, varying the radius from 0.5 to 2 Mpc in steps of 0.25 Mpc. Using Monte Carlo simulations, we generated clusters to which the two ellipticity methods were applied. We found that the covariance ellipse method works better than the method based on Minkowski functionals, and that the ellipticity distributions differ between the two methods. Using the ellipticity-redshift relation, we investigated the possibility of cluster evolution in the low-redshift Universe. The correlation of cluster ellipticity with redshift is undoubtedly an indicator of structural evolution. Using Student's t statistic, we found a statistically significant correlation between ellipticity and redshift at the 95% confidence level (α = 0.05). With one of the two shape determination methods ellipticity grew with redshift, while the other method gave the opposite result. Monte Carlo simulations showed that only ellipticities calculated at a distance of 1.5 Mpc from the cluster centre with the Minkowski functional method are robust enough to be taken into account, but for that radius we did not find any relation between e and z. 
Since CEM pointed towards the existence of the e(z) relation, we conclude that such an effect is real though rather weak. A detailed study of the e(z) relation showed that the observed relation is nonlinear, and the number of elongated structures grows rapidly for z>0.14.
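
    The covariance ellipse method can be sketched as follows: the ellipticity is derived from the eigenvalues of the covariance matrix of galaxy positions around the mean position (one of the centre choices listed above). This assumes the common definition e = 1 − b/a with axis ratio b/a = sqrt(λ_min/λ_max); the exact weighting and aperture handling used by the authors are not given here, so treat it as an illustration. The second helper is the Student's t statistic for testing a correlation coefficient such as r(e, z).

```python
import math


def covariance_ellipticity(points):
    """Ellipticity e = 1 - b/a, with b/a = sqrt(lambda_min / lambda_max) taken from the
    eigenvalues of the 2x2 covariance matrix of positions (x, y); points must not all coincide."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Eigenvalues of [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam_max = half_trace + root
    lam_min = max(0.0, half_trace - root)  # guard against tiny negative rounding errors
    return 1 - math.sqrt(lam_min / lam_max)


def correlation_t(r, n):
    """Student's t statistic (n - 2 degrees of freedom) for a Pearson correlation r."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Hypothetical galaxy positions stretched by a factor of 2 along x.
print(covariance_ellipticity([(2, 0), (-2, 0), (0, 1), (0, -1)]))
```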

  1. Cosmological Constraints from Galaxy Cluster Velocity Statistics

    NASA Astrophysics Data System (ADS)

    Bhattacharya, Suman; Kosowsky, Arthur

    2007-04-01

    Future microwave sky surveys will have the sensitivity to detect the kinematic Sunyaev-Zeldovich signal from moving galaxy clusters, thus providing a direct measurement of their line-of-sight peculiar velocity. We show that cluster peculiar velocity statistics applied to foreseeable surveys will put significant constraints on fundamental cosmological parameters. We consider three statistical quantities that can be constructed from a cluster peculiar velocity catalog: the probability density function, the mean pairwise streaming velocity, and the pairwise velocity dispersion. These quantities are applied to an envisioned data set that measures line-of-sight cluster velocities with normal errors of 100 km s⁻¹ for all clusters with masses larger than 10¹⁴ M☉ over a sky area of up to 5000 deg². A simple Fisher matrix analysis of this survey shows that the normalization of the matter power spectrum and the dark energy equation of state can be constrained to better than 10%, and that the Hubble constant and the primordial power spectrum index can be constrained to a few percent, independent of any other cosmological observations. We also find that the current constraint on the power spectrum normalization can be improved by more than a factor of 2 using data from a 400 deg² survey and WMAP third-year priors. We also show how the constraints on cosmological parameters change if cluster velocities are measured with normal errors of 300 km s⁻¹.
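
    The mechanics of a Fisher matrix forecast can be shown on a toy problem: for Gaussian measurement errors, F_ij sums the products of the derivatives of each observable with respect to parameters i and j, weighted by 1/σ², and the marginalised error on parameter i is sqrt((F⁻¹)_ii). The linear toy model below is an assumption standing in for the velocity statistics in the abstract. In this toy, where measurement noise is the only error source, scaling every error by 3 (for example from 100 to 300 km s⁻¹) scales every forecast error by 3.

```python
import math


def fisher_matrix(derivs, sigma):
    """F_ij = sum_k (d mu_k / d theta_i)(d mu_k / d theta_j) / sigma_k**2 (Gaussian errors).

    derivs[i][k] is the derivative of observable k with respect to parameter i.
    """
    n_par = len(derivs)
    return [[sum(derivs[i][k] * derivs[j][k] / sigma[k] ** 2 for k in range(len(sigma)))
             for j in range(n_par)]
            for i in range(n_par)]


def marginal_errors_2x2(f):
    """Marginalised 1-sigma errors sqrt((F^-1)_ii) for a two-parameter Fisher matrix."""
    (a, b), (c, d) = f
    det = a * d - b * c
    return math.sqrt(d / det), math.sqrt(a / det)


# Toy linear model mu_k = theta0 + theta1 * x_k, so d mu/d theta0 = 1 and d mu/d theta1 = x_k.
x = [0.0, 1.0, 2.0]
derivs = [[1.0] * len(x), x]
for err in (100.0, 300.0):
    f = fisher_matrix(derivs, [err] * len(x))
    print(err, marginal_errors_2x2(f))
```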

  2. Groundwater source contamination mechanisms: Physicochemical profile clustering, risk factor analysis and multivariate modelling

    NASA Astrophysics Data System (ADS)

    Hynds, Paul; Misstear, Bruce D.; Gill, Laurence W.; Murphy, Heather M.

    2014-04-01

    An integrated domestic well sampling and "susceptibility assessment" programme was undertaken in the Republic of Ireland from April 2008 to November 2010. Overall, 211 domestic wells were sampled, assessed and collated with local climate data. Based upon groundwater physicochemical profile, three clusters have been identified and characterised by source type (borehole or hand-dug well) and local geological setting. Statistical analysis indicates that cluster membership is significantly associated with the prevalence of bacteria (p = 0.001), with mean Escherichia coli presence within clusters ranging from 15.4% (Cluster-1) to 47.6% (Cluster-3). Bivariate risk factor analysis shows that on-site septic tank presence was the only risk factor significantly associated (p < 0.05) with bacterial presence within all clusters. Point agriculture adjacency was significantly associated with both borehole-related clusters. Well design criteria were associated with hand-dug wells and boreholes in areas characterised by high permeability subsoils, while local geological setting was significant for hand-dug wells and boreholes in areas dominated by low/moderate permeability subsoils. Multivariate susceptibility models were developed for all clusters, with predictive accuracies of 84% (Cluster-1) to 91% (Cluster-2) achieved. Septic tank setback was a common variable within all multivariate models, while agricultural sources were also significant, albeit to a lesser degree. Furthermore, well liner clearance was a significant factor in all models, indicating that direct surface ingress is a significant well contamination mechanism. Identification and elucidation of cluster-specific contamination mechanisms may be used to develop improved overall risk management and wellhead protection strategies, while also informing future remediation and maintenance efforts.

  3. Hot spot detection and spatio-temporal dispersion of dengue fever in Hanoi, Vietnam

    PubMed Central

    Toan, Do Thi Thanh; Hu, Wenbiao; Thai, Pham Quang; Hoat, Luu Ngoc; Wright, Pamela; Martens, Pim

    2013-01-01

    Introduction: Dengue fever (DF) in Vietnam remains a serious emerging arboviral disease, which generates significant concerns among international health authorities. Incidence rates of DF have increased significantly during the last few years in many provinces and cities, especially Hanoi. The purpose of this study was to detect DF hot spots and to identify the dynamics of DF dispersion over the period between 2004 and 2009 in Hanoi, Vietnam. Methods: Daily data on DF cases and population data for each postcode area of Hanoi between January 1998 and December 2009 were obtained from the Hanoi Center for Preventive Health and the General Statistics Office of Vietnam. Moran's I statistic was used to assess the spatial autocorrelation of reported DF. Spatial scan statistics and logistic regression were used to identify space–time clusters and dispersion of DF. Results: The study revealed a clear trend of geographic expansion of DF transmission in Hanoi through the study periods (OR 1.17, 95% CI 1.02–1.34). The spatial scan statistics showed that 6/14 (42.9%) districts in Hanoi had significant cluster patterns, which lasted 29 days and were limited to a radius of 1,000 m. The study also demonstrated that most DF cases occurred between June and November, during which the rainfall and temperatures are highest. Conclusions: There is evidence for the existence of statistically significant clusters of DF in Hanoi, and that the geographical distribution of DF has expanded over recent years. This finding provides a foundation for further investigation into the social and environmental factors responsible for changing disease patterns, and provides data to inform program planning for DF control. PMID:23364076
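
    Moran's I summarises global spatial autocorrelation: positive values mean that neighbouring areas have similar values, negative values that they alternate, and its expectation under no autocorrelation is −1/(n − 1). A minimal sketch with a user-supplied weight matrix; the binary adjacency weights and toy values are assumptions, and significance testing (by permutation or a normal approximation) is left out.

```python
def morans_i(x, w):
    """Global Moran's I for values x and an n-by-n spatial weight matrix w (w[i][i] = 0)."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    s0 = sum(sum(row) for row in w)  # sum of all weights
    num = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(v * v for v in z)
    return (n / s0) * (num / den)


# Hypothetical: four postcode areas in a row, each adjacent to its neighbours.
w = [[1 if abs(i - j) == 1 else 0 for j in range(4)] for i in range(4)]
print(morans_i([1, 0, 1, 0], w))  # alternating values: negative autocorrelation
print(morans_i([1, 1, 0, 0], w))  # similar values side by side: positive autocorrelation
```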

  4. Hot spot detection and spatio-temporal dispersion of dengue fever in Hanoi, Vietnam.

    PubMed

    Toan, Do Thi Thanh; Hu, Wenbiao; Quang Thai, Pham; Hoat, Luu Ngoc; Wright, Pamela; Martens, Pim

    2013-01-24

    Dengue fever (DF) in Vietnam remains a serious emerging arboviral disease, which generates significant concerns among international health authorities. Incidence rates of DF have increased significantly during the last few years in many provinces and cities, especially Hanoi. The purpose of this study was to detect DF hot spots and to identify the dynamics of DF dispersion over the period between 2004 and 2009 in Hanoi, Vietnam. Daily data on DF cases and population data for each postcode area of Hanoi between January 1998 and December 2009 were obtained from the Hanoi Center for Preventive Health and the General Statistics Office of Vietnam. Moran's I statistic was used to assess the spatial autocorrelation of reported DF. Spatial scan statistics and logistic regression were used to identify space-time clusters and dispersion of DF. The study revealed a clear trend of geographic expansion of DF transmission in Hanoi through the study periods (OR 1.17, 95% CI 1.02-1.34). The spatial scan statistics showed that 6/14 (42.9%) districts in Hanoi had significant cluster patterns, which lasted 29 days and were limited to a radius of 1,000 m. The study also demonstrated that most DF cases occurred between June and November, during which the rainfall and temperatures are highest. There is evidence for the existence of statistically significant clusters of DF in Hanoi, and that the geographical distribution of DF has expanded over recent years. This finding provides a foundation for further investigation into the social and environmental factors responsible for changing disease patterns, and provides data to inform program planning for DF control.

  5. Female cluster headache in the United States of America: what are the gender differences? Results from the United States Cluster Headache Survey.

    PubMed

    Rozen, Todd D; Fishman, Royce S

    2012-06-15

    To present results from the United States Cluster Headache Survey regarding gender differences in cluster headache demographics, clinical characteristics, diagnostic delay, triggers, treatment response and personal burden. Very few studies have looked at the gender differences in cluster headache presentation. The United States Cluster Headache Survey is the largest study of cluster headache sufferers ever completed in the United States and it is also the largest study of female cluster headache patients ever presented. The total survey consisted of 187 multiple choice questions which dealt with various issues related to cluster headache including: demographics, clinical characteristics, concomitant medical conditions, family history, triggers, smoking history, diagnosis, treatment response and personal burden. A group of questions was specifically targeted at female cluster headache patients. The survey was placed on a website from October to December 2008. For all survey responders, the diagnosis of cluster headache had to have been made by a neurologist, but there was no validation of the headache diagnosis by the authors. 1134 individuals completed the survey (816 male, 318 female). Key points that define the differences between female and male cluster headache include: a. Age of onset: women develop cluster headache at an earlier age than men and are more likely to develop a second peak of cluster headache onset after 50 years of age. b. Family history: female cluster headache sufferers are more likely to have a family history of both cluster headache and migraine and have an increased familial risk of Parkinson's disease. c. Comorbid conditions: female cluster headache sufferers are significantly more likely than males to experience depression and to have asthma. d. Aura issues: aura with cluster headache is equally common in both sexes, but aura duration is shorter in women. Women are much more likely to experience sensory, language and brainstem auras. e. 
Pain location: cluster headache pain is typically retro-orbital in location in both sexes, but women are significantly more likely than men to experience cluster headache pain in the jaw, cheek and ear. f. Associated symptoms: women with cluster headache develop more “migrainous” associated symptoms than men, especially nausea, and they are also more likely than men to show self-injurious behavior. g. Triggers: women with cluster headache are much less likely to have alcohol trigger a headache, but are significantly more likely than men to have “migrainous” triggers for their cluster headaches. h. Smoking issues: women are much less likely than male cluster headache sufferers to have a smoking history, and more likely to have never smoked prior to cluster headache onset. i. Cycle issues: spring and fall are the most common times to start a cluster headache cycle in both sexes. Women are statistically significantly less likely than men to start a cluster headache cycle in the months of October–December. Women have more attacks per day and more intense nighttime attacks than men. j. Acute treatment: women were statistically less responsive than men to injectable and nasal spray sumatriptan, but statistically more likely to respond to inhaled lidocaine. Inhaled oxygen was equally effective in both sexes, but the response was slower in women. For preventive treatment no significant gender differences were noted, but overall women were less responsive to almost all preventives than men. k. Diagnostic delay: there remains a significant diagnostic delay for cluster headache patients in both sexes, but women were more likely than men to be diagnosed more than 10 years after symptom onset, and significantly fewer women than men were diagnosed correctly at an initial physician visit. l. Female-specific issues: cluster headache does not appear to be influenced by menses or menopause, but 50% of the survey responders stated their headaches improved with pregnancy. 
Cluster headache does not appear to alter fertility rates in female cluster headache sufferers. m. Personal burden: cluster headache causes significantly more personal burden in women than men with more loss of employment and/or need of disability, as well as more homebound days. Overall women and men with cluster headache have a similar presentation but there are some distinct differences that have been suggested in smaller studies of female cluster headache that we have now verified, while some of our study conclusions have not been shown previously. One major limitation to the study is a lack of validation of diagnosis. A substantial false positive cluster headache diagnosis rate, especially in females, cannot be excluded by the study methods utilized. Copyright © 2012 Elsevier B.V. All rights reserved.

  6. Imprints of dynamical interactions on brown dwarf pairing statistics and kinematics

    NASA Astrophysics Data System (ADS)

    Sterzik, M. F.; Durisen, R. H.

    2003-03-01

    We present statistically robust predictions of brown dwarf properties arising from dynamical interactions during their early evolution in small clusters. Our conclusions are based on numerical calculations of the internal cluster dynamics as well as on Monte Carlo models. Accounting for recent observational constraints on the substellar mass function and initial properties in fragmenting star-forming clumps, we derive multiplicity fractions, mass ratios, separation distributions, and velocity dispersions. We compare them with observations of brown dwarfs in the field and in young clusters. Observed brown dwarf companion fractions of around 15 ± 7% for very low-mass stars, as reported recently by Close et al., are consistent with certain dynamical decay models. A significantly smaller mean separation for brown dwarf binaries than for binaries of late-type stars can be explained by a similar specific energy at the time of cluster formation for all cluster masses. Due to their higher velocity dispersions, brown dwarfs and low-mass single stars will undergo time-dependent spatial segregation from higher-mass stars and multiple systems. This will cause mass functions and binary statistics in star-forming regions to vary with the age of the region and the volume sampled.

  7. Uncertainties in the cluster-cluster correlation function

    NASA Astrophysics Data System (ADS)

    Ling, E. N.; Frenk, C. S.; Barrow, J. D.

    1986-12-01

    The bootstrap resampling technique is applied to estimate sampling errors and significance levels of the two-point correlation functions determined for a subset of the CfA redshift survey of galaxies and a redshift sample of 104 Abell clusters. The angular correlation function for a sample of 1664 Abell clusters is also calculated. The standard errors in ξ(r) for the Abell data are found to be considerably larger than quoted 'Poisson errors'. The best estimate for the ratio of the correlation length of Abell clusters (richness class R ≥ 1, distance class D ≤ 4) to that of CfA galaxies is 4.2 (+1.4/−1.0; 68 percentile errors). The enhancement of cluster clustering over galaxy clustering is statistically significant in the presence of resampling errors. The uncertainties found do not include the effects of possible systematic biases in the galaxy and cluster catalogs and could be regarded as lower bounds on the true uncertainty range.
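
    The bootstrap idea can be sketched as follows: estimate ξ in one separation bin with the natural estimator DD/RR − 1 (normalised pair counts), then recompute it on catalogues resampled with replacement and take the spread of the replicas as the error. This one-dimensional toy is an assumption made for brevity (the study worked with real survey separations and its estimator details are not given here). Resampling duplicates objects, which creates zero-separation pairs, so the lowest bin should start above zero.

```python
import random


def pair_fraction(points, r_min, r_max):
    """Fraction of distinct pairs (i < j) whose separation lies in [r_min, r_max)."""
    n = len(points)
    count = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if r_min <= abs(points[i] - points[j]) < r_max
    )
    return count / (n * (n - 1) / 2)


def bootstrap_xi(data, randoms, r_min, r_max, n_boot=200, seed=2):
    """Natural estimator xi = DD/RR - 1 and its bootstrap standard error."""
    rng = random.Random(seed)
    rr = pair_fraction(randoms, r_min, r_max)  # random catalogue is fixed, not resampled
    estimate = pair_fraction(data, r_min, r_max) / rr - 1
    replicas = []
    for _ in range(n_boot):
        resample = [rng.choice(data) for _ in data]  # draw objects with replacement
        replicas.append(pair_fraction(resample, r_min, r_max) / rr - 1)
    mean = sum(replicas) / n_boot
    se = (sum((v - mean) ** 2 for v in replicas) / (n_boot - 1)) ** 0.5
    return estimate, se


# Hypothetical 1-D positions: two tight groups of 20 objects, and a uniform random catalogue.
data = [0.2 + 0.0005 * i for i in range(20)] + [0.7 + 0.0005 * i for i in range(20)]
randoms = [i / 400 for i in range(400)]
print(bootstrap_xi(data, randoms, 0.001, 0.02))
```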

  8. Minimal spanning tree algorithm for γ-ray source detection in sparse photon images: cluster parameters and selection strategies

    DOE PAGES

    Campana, R.; Bernieri, E.; Massaro, E.; ...

    2013-05-22

    The minimal spanning tree (MST) algorithm is a graph-theoretical cluster-finding method. We previously applied it to two-dimensional γ-ray images, showing that it is quite sensitive in finding faint sources. Possible sources are associated with the regions where the photon arrival directions cluster. MST selects clusters starting from a particular “tree” connecting all the points of the image and performing a cut based on the angular distance between photons, with a number of events higher than a given threshold. In this paper, we show how a further filtering, based on some parameters linked to the cluster properties, can be applied to reduce spurious detections. We find that the most efficient parameter for this secondary selection is the magnitude M of a cluster, defined as the product of its number of events by its clustering degree. We test the sensitivity of the method by means of simulated and real Fermi Large Area Telescope (LAT) fields. Our results show that √M is strongly correlated with other statistical significance parameters, derived from a wavelet-based algorithm and maximum likelihood (ML) analysis, and that it can be used as a good estimator of the statistical significance of MST detections. Finally, we apply the method to a 2-year LAT image at energies higher than 3 GeV, and we show the presence of new clusters, likely associated with BL Lac objects.
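
    The MST selection can be sketched as below: build the minimal spanning tree of the photon positions, cut edges longer than a fraction of the mean edge length, keep groups with enough photons, and score each by M = N × g. The cut factor, the event threshold and the clustering-degree definition used for g (mean edge length of the whole tree divided by that of the cluster) are assumptions made for illustration, and plain Euclidean distance stands in for angular separation, which is reasonable only for small fields.

```python
import math


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (length, i, j) tree edges."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def mst_clusters(points, cut_factor=0.5, min_events=3):
    """Cut tree edges longer than cut_factor * (mean edge length), keep groups with at least
    min_events photons, and score each by M = N * g (g: assumed clustering degree).
    Needs at least two photons and no coincident positions."""
    edges = mst_edges(points)
    mean_all = sum(e[0] for e in edges) / len(edges)
    kept = [e for e in edges if e[0] <= cut_factor * mean_all]
    root = list(range(len(points)))

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]  # path halving
            i = root[i]
        return i

    for _, i, j in kept:
        root[find(i)] = find(j)
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    results = []
    for members in groups.values():
        if len(members) < min_events:
            continue
        member_set = set(members)
        inner = [e[0] for e in kept if e[1] in member_set]
        g = mean_all / (sum(inner) / len(inner))  # mean tree edge / mean cluster edge
        results.append({"members": sorted(members), "M": len(members) * g})
    return results


# Hypothetical photon positions (degrees): two compact groups plus scattered background.
photons = [(0, 0), (0.01, 0), (0, 0.01), (0.01, 0.01), (0.005, 0.005),
           (5, 5), (5.01, 5), (5, 5.01), (5.01, 5.01),
           (2, 8), (8, 1), (9, 9), (1, 5)]
for c in mst_clusters(photons):
    print(c["members"], round(c["M"], 1))
```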

  9. Analysis of basic clustering algorithms for numerical estimation of statistical averages in biomolecules.

    PubMed

    Anandakrishnan, Ramu; Onufriev, Alexey

    2008-03-01

    In statistical mechanics, the equilibrium properties of a physical system of particles can be calculated as the statistical average over accessible microstates of the system. In general, these calculations are computationally intractable since they involve summations over an exponentially large number of microstates. Clustering algorithms are one of the methods used to numerically approximate these sums. The most basic clustering algorithms first subdivide the system into a set of smaller subsets (clusters). Then, interactions between particles within each cluster are treated exactly, while all interactions between different clusters are ignored. These smaller clusters have far fewer microstates, making the summation over these microstates tractable. These algorithms have been used previously for biomolecular computations, but remain relatively unexplored in this context. Presented here is a theoretical analysis of the error and computational complexity of the two most basic clustering algorithms previously applied in the context of biomolecular electrostatics. We derive a tight, computationally inexpensive error bound for the equilibrium state of a particle computed via these clustering algorithms. For some practical applications, it is the root mean square error, which can be significantly lower than the error bound, that may be more important. We show that there is a strong empirical relationship between the error bound and the root mean square error, suggesting that the error bound could be used as a computationally inexpensive metric for predicting the accuracy of clustering algorithms for practical applications. An example of error analysis for such an application (computation of the average charge of ionizable amino acids in proteins) is given, demonstrating that the clustering algorithm can be accurate enough for practical purposes.
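
    The basic clustering approximation described above can be shown on a toy model of binary sites (for example, protonated or not), with energy E(s) = Σ_i h_i s_i + Σ_{i<j} J_ij s_i s_j in units of kT. The sketch compares the exact Boltzmann average of each site, obtained by enumerating all 2^n states, with the clustered estimate that enumerates each cluster separately and drops inter-cluster couplings. The energy model and parameters are hypothetical, and the error bound derived in the paper is not implemented.

```python
import itertools
import math


def site_averages(h, J, sites):
    """Exact Boltzmann averages <s_k> (kT = 1) over all 2**len(sites) states of `sites`,
    keeping only the couplings J[i][j] between sites inside the subset (J symmetric)."""
    totals = [0.0] * len(sites)
    z = 0.0
    for state in itertools.product((0, 1), repeat=len(sites)):
        energy = sum(h[sites[a]] * state[a] for a in range(len(sites)))
        energy += sum(J[sites[a]][sites[b]] * state[a] * state[b]
                      for a in range(len(sites)) for b in range(a + 1, len(sites)))
        weight = math.exp(-energy)
        z += weight
        for a in range(len(sites)):
            totals[a] += weight * state[a]
    return {sites[a]: totals[a] / z for a in range(len(sites))}


def clustered_averages(h, J, clusters):
    """Basic clustering approximation: exact within each cluster, inter-cluster terms ignored."""
    result = {}
    for cluster in clusters:
        result.update(site_averages(h, J, cluster))
    return result


# Hypothetical four-site system split into clusters {0, 1} and {2, 3}.
h = [0.5, -0.2, 1.0, 0.0]
J = [[0.0, 0.8, 0.0, 0.0],
     [0.8, 0.0, 1.5, 0.0],   # J[1][2] couples the two clusters
     [0.0, 1.5, 0.0, -0.4],
     [0.0, 0.0, -0.4, 0.0]]
exact = site_averages(h, J, [0, 1, 2, 3])
approx = clustered_averages(h, J, [[0, 1], [2, 3]])
for k in range(4):
    print(k, round(exact[k], 4), round(approx[k], 4))
```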

  10. The Peculiarities in O-Type Galaxy Clusters

    NASA Astrophysics Data System (ADS)

    Panko, E. A.; Emelyanov, S. I.

    We present the results of an analysis of the 2D distribution of galaxies in galaxy cluster fields. The Catalogue of Galaxy Clusters and Groups PF (Panko & Flin) was used as the input observational data set. We selected open, rich PF galaxy clusters containing 100 or more galaxies for our study. According to the Panko classification scheme, open galaxy clusters (O-type) show no concentration toward the cluster center. The data set contains both pure O-type clusters and O-type clusters with overdense belts, namely the OL and OF types. According to the ideas of Rood & Sastry and Struble & Rood, open galaxy clusters represent the initial stage of cluster evolution. In the O-type clusters we found several types of statistically significant regular peculiarities, such as two crossed belts or a curved strip. We suppose that the features found are connected with galaxy cluster evolution and with the distribution of dark matter (DM) inside the clusters.

  11. Evaluation of the Gini Coefficient in Spatial Scan Statistics for Detecting Irregularly Shaped Clusters

    PubMed Central

    Kim, Jiyu; Jung, Inkyung

    2017-01-01

    Spatial scan statistics with circular or elliptic scanning windows are commonly used for cluster detection in various applications, such as the identification of geographical disease clusters from epidemiological data. It has been pointed out that the method may have difficulty in correctly identifying non-compact, arbitrarily shaped clusters. In this paper, we evaluated the Gini coefficient for detecting irregularly shaped clusters through a simulation study. The Gini coefficient, the use of which in spatial scan statistics was recently proposed, is a criterion measure for optimizing the maximum reported cluster size. Our simulation study results showed that using the Gini coefficient works better than the original spatial scan statistic for identifying irregularly shaped clusters, by reporting an optimized and refined collection of clusters rather than a single larger cluster. We have provided a real data example that seems to support the simulation results. We think that using the Gini coefficient in spatial scan statistics can be helpful for the detection of irregularly shaped clusters. PMID:28129368
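    For reference, a Gini coefficient can be computed from a Lorenz curve of cumulative case share against cumulative population share. The sketch below is a generic version; how the cited work groups regions (for example, reported clusters versus the rest of the study area) is not given in the abstract, so the grouping passed in is an assumption, and `gini_coefficient` is a hypothetical helper name.

```python
def gini_coefficient(groups):
    """Gini coefficient of a Lorenz curve built from (cases, population) pairs.

    Groups are sorted by rate (cases / population, population > 0) in
    ascending order; the Lorenz curve plots cumulative population share (x)
    against cumulative case share (y). Gini = 1 - 2 * (area under the curve),
    with the area computed by the trapezoidal rule.
    """
    total_cases = sum(c for c, _ in groups)
    total_pop = sum(p for _, p in groups)
    if total_cases == 0 or total_pop == 0:
        return 0.0
    ordered = sorted(groups, key=lambda g: g[0] / g[1])
    area = 0.0
    x_prev = y_prev = 0.0
    for cases, pop in ordered:
        x = x_prev + pop / total_pop
        y = y_prev + cases / total_cases
        area += (x - x_prev) * (y + y_prev) / 2.0
        x_prev, y_prev = x, y
    return 1.0 - 2.0 * area


# Hypothetical (cases, population) groups: one high-rate group and the remainder.
print(gini_coefficient([(30, 1000), (70, 9000)]))  # approximately 0.2
```

    Equal rates in every group give a coefficient of 0; the more the cases concentrate in low-population groups, the closer it moves toward 1. Under the criterion described above, one would compare this value across candidate collections of reported clusters and favour the collection that maximizes it.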

  12. Probing the dynamical and X-ray mass proxies of the cluster of galaxies Abell S1101

    NASA Astrophysics Data System (ADS)

    Rabitz, Andreas; Zhang, Yu-Ying; Schwope, Axel; Verdugo, Miguel; Reiprich, Thomas H.; Klein, Matthias

    2017-01-01

    Context. The galaxy cluster Abell S1101 (S1101 hereafter) deviates significantly from the X-ray luminosity versus velocity dispersion relation (L-σ) of galaxy clusters in our previous study. Given a reliable X-ray luminosity measurement combining XMM-Newton and ROSAT data, this deviation is most likely caused by a bias in the velocity dispersion due to interlopers and low member statistics in the previous sample of member galaxies, which was based solely on 20 galaxy redshifts drawn from the literature. Aims: We intend to increase the galaxy member statistics to perform precision measurements of the velocity dispersion and dynamical mass of S1101. We aim for a detailed characterization of the substructure and dynamical state of this cluster, and a comparison of mass estimates derived from (I) the velocity dispersion (Mvir), (II) the caustic mass computation (Mcaustic), and (III) mass proxies from X-ray observations and the Sunyaev-Zel'dovich (SZ) effect. Methods: We carried out new optical spectroscopic observations of the galaxies in this cluster field with VIMOS, obtaining a sample of 60 member galaxies for S1101. We revised the cluster redshift and velocity dispersion measurements based on this sample and also applied the Dressler-Shectman substructure test. Results: The completeness of cluster members within r200 was significantly improved for this cluster. Tests for dynamical substructure do not show evidence of major disturbances or merging activity in S1101. We find good agreement between the dynamical cluster mass measurements and the X-ray mass estimates, which confirms the relaxed state of the cluster displayed in the 2D substructure test. The SZ mass proxy is slightly higher than the other estimates. The updated measurement of σ removed the deviation of S1101 from the L-σ relation. We also noticed a background structure in the cluster field of S1101. This structure is a galaxy group that is very close to the cluster S1101 in projection but at almost twice its redshift.
However, the mass of this structure is too low to significantly bias the observed bolometric X-ray luminosity of S1101. Hence, we conclude that the deviation of S1101 from the L-σ relation in our previous study can be explained by low member statistics and galaxy interlopers, which are known to introduce biases in the estimated velocity dispersion. We have made use of VLT/VIMOS observations taken with the ESO Telescope at the Paranal Observatory under programme 087.A-0096.
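    The Dressler-Shectman test mentioned above compares, for every galaxy, the mean velocity and dispersion of its nearest projected neighbours with the global values and sums the deviations into a statistic Delta, whose significance is judged by shuffling velocities among positions. The sketch assumes flat 2D sky coordinates, population standard deviations and a local group of 11 galaxies (the galaxy plus 10 neighbours); it is illustrative and not the pipeline used for S1101.

```python
import math
import random
import statistics


def ds_delta(positions, velocities, n_local=11):
    """Dressler-Shectman Delta = sum_i delta_i.

    The local group of galaxy i is itself plus its n_local - 1 nearest
    neighbours on the sky; delta_i^2 = n_local / sigma^2 *
    [(local mean - global mean)^2 + (local sigma - global sigma)^2].
    Assumes the global velocity dispersion is non-zero.
    """
    v_mean = statistics.fmean(velocities)
    sigma = statistics.pstdev(velocities)
    total = 0.0
    for p in positions:
        # Distance 0 puts the galaxy itself first in the ordering.
        order = sorted(range(len(positions)), key=lambda j: math.dist(p, positions[j]))
        local = [velocities[j] for j in order[:n_local]]
        d2 = (n_local / sigma ** 2) * ((statistics.fmean(local) - v_mean) ** 2
                                       + (statistics.pstdev(local) - sigma) ** 2)
        total += math.sqrt(d2)
    return total


def ds_test(positions, velocities, n_local=11, n_shuffle=1000, seed=0):
    """Observed Delta and the fraction of velocity shuffles with Delta >= observed."""
    rng = random.Random(seed)
    observed = ds_delta(positions, velocities, n_local)
    shuffled = list(velocities)
    count = 0
    for _ in range(n_shuffle):
        rng.shuffle(shuffled)
        if ds_delta(positions, shuffled, n_local) >= observed:
            count += 1
    return observed, count / n_shuffle
```

    A small fraction of shuffles reaching the observed Delta indicates substructure; a large fraction, as reported for S1101, is consistent with a relaxed cluster.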

  13. Relative risk estimates from spatial and space-time scan statistics: Are they biased?

    PubMed Central

    Prates, Marcos O.; Kulldorff, Martin; Assunção, Renato M.

    2014-01-01

    The purely spatial and space-time scan statistics have been used successfully by many scientists to detect and evaluate geographical disease clusters. Although the scan statistic has high power to correctly identify a cluster, no study has examined the estimates of relative risk in the detected cluster. In this paper we evaluate whether these estimated relative risks are biased. Intuitively, one may expect the estimated relative risks to have an upward bias, since the scan statistic cherry-picks high-rate areas to include in the cluster. We show that this intuition is correct for clusters with low statistical power, but with medium to high power the bias becomes negligible. The same behaviour is not observed for the prospective space-time scan statistic, where there is an increasingly conservative, downward bias in the relative risk as the power to detect the cluster increases. PMID:24639031
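    The relative risk discussed above can be illustrated with a one-dimensional toy scan. The sketch assumes the Poisson model, uses the log-likelihood ratio of Kulldorff's Poisson scan statistic and reports the relative risk as (c/e) / ((C - c)/(C - e)), the inside-versus-outside ratio commonly reported with the scan statistic. Restricting windows to contiguous regions along a line and capping them at 50% of the population are simplifications for illustration; the function names are hypothetical.

```python
import math


def poisson_llr(c, e, total):
    """Poisson log-likelihood ratio for a window with c observed and e expected
    cases out of `total`; zero unless the window shows excess risk (c > e)."""
    if c <= e:
        return 0.0
    llr = c * math.log(c / e)
    if total - c > 0:
        llr += (total - c) * math.log((total - c) / (total - e))
    return llr


def relative_risk(c, e, total):
    """Estimated risk inside the window divided by the risk outside it."""
    outside = total - c
    if outside == 0:
        return math.inf
    return (c / e) / (outside / (total - e))


def scan_1d(cases, population, max_pop_fraction=0.5):
    """Scan contiguous windows (start, end) of regions with positive populations.

    Returns (llr, window, relative_risk) for the window with the largest LLR,
    or (0.0, None, None) if no window has excess risk.
    """
    total = sum(cases)
    total_pop = sum(population)
    best = (0.0, None, None)
    for start in range(len(cases)):
        c = pop = 0
        for end in range(start, len(cases)):
            c += cases[end]
            pop += population[end]
            if pop > max_pop_fraction * total_pop:
                break
            e = total * pop / total_pop  # expected cases under uniform risk
            llr = poisson_llr(c, e, total)
            if llr > best[0]:
                best = (llr, (start, end), relative_risk(c, e, total))
    return best
```

    Because the reported window is the one that maximizes the LLR, its relative risk is a selected estimate; the abstract's finding is that the resulting upward bias matters mainly when the power to detect the cluster is low.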

  14. An application of seasonal ARIMA models on group commodities to forecast Philippine merchandise exports performance

    NASA Astrophysics Data System (ADS)

    Natividad, Gina May R.; Cawiding, Olive R.; Addawe, Rizavel C.

    2017-11-01

    The increase in the country's merchandise exports offers information about the Philippines' trading role within the global economy. Merchandise export statistics are used to monitor the part of the country's overall production that is consumed overseas. This paper compares two models obtained by (a) clustering the commodity groups into two groups based on their proportional contribution to total exports, and (b) treating only total exports. Different seasonal autoregressive integrated moving average (SARIMA) models were then developed for the clustered commodities and for total exports, based on the monthly merchandise exports of the Philippines from 2011 to 2016. The data set used in this study was retrieved from the Philippine Statistics Authority (PSA), the central statistical authority in the country responsible for primary data collection. A test of the significance of the difference between means at the 0.05 level of significance was then performed on the forecasts produced. The result indicates that there is a significant difference between the mean forecasts of the two models. Moreover, a comparison of the root mean square error (RMSE) and mean absolute error (MAE) of the models showed that the models for the clustered groups outperform the model for total exports.
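    The two accuracy measures used to compare the models are simple to compute. The sketch below shows RMSE and MAE on hypothetical monthly values; the numbers, and the assumption that the clustered-groups forecast of total exports is the sum of the per-cluster SARIMA forecasts, are illustrative only.

```python
import math


def _check(actual, forecast):
    if not actual or len(actual) != len(forecast):
        raise ValueError("need two non-empty sequences of equal length")


def rmse(actual, forecast):
    """Root mean square error of paired observations and forecasts."""
    _check(actual, forecast)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mae(actual, forecast):
    """Mean absolute error of paired observations and forecasts."""
    _check(actual, forecast)
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical monthly totals and two competing forecasts (not the paper's data).
actual = [5.1, 5.4, 5.0, 5.8]
from_clusters = [5.0, 5.5, 5.2, 5.6]  # assumed: sum of per-cluster forecasts
from_total = [5.3, 5.0, 5.4, 5.5]
for name, fc in (("clusters", from_clusters), ("total", from_total)):
    print(name, round(rmse(actual, fc), 3), round(mae(actual, fc), 3))
```

    RMSE penalizes large misses more heavily than MAE, so reporting both, as the paper does, shows whether one model's advantage comes from a few large errors or from consistently smaller ones.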

  15. Adjusted scaling of FDG positron emission tomography images for statistical evaluation in patients with suspected Alzheimer's disease.

    PubMed

    Buchert, Ralph; Wilke, Florian; Chakrabarti, Bhismadev; Martin, Brigitte; Brenner, Winfried; Mester, Janos; Clausen, Malte

    2005-10-01

    Statistical parametric mapping (SPM) has gained increasing acceptance for the voxel-based statistical evaluation of brain positron emission tomography (PET) with the glucose analog 2-[18F]-fluoro-2-deoxy-d-glucose (FDG) in patients with suspected Alzheimer's disease (AD). To increase the sensitivity for detecting local changes, individual differences in total brain FDG uptake are usually compensated for by proportional scaling. However, in cases with extensive hypometabolic areas, proportional scaling overestimates scaled uptake. This may cause the statistical test to significantly underestimate the extent of hypometabolic areas. To detect this problem, the authors tested for hypermetabolism. In patients with no visual evidence of true focal hypermetabolism, significant clusters of hypermetabolism in the presence of extended hypometabolism were interpreted as false-positive findings, indicating relevant overestimation of scaled uptake. In this case, scaled uptake was reduced step by step until there were no more significant clusters of hypermetabolism. In 22 consecutive patients with suspected AD, proportional scaling resulted in relevant overestimation of scaled uptake in 9 patients. Scaled uptake had to be reduced by 11.1% +/- 5.3% in these cases to eliminate the artifacts. Adjusted scaling resulted in the extension of existing clusters of hypometabolism and the appearance of new ones. The total volume of the additional voxels with significant hypometabolism depended linearly on the extent of the additional scaling and was 202 +/- 118 mL on average. Adjusted scaling helps to identify characteristic metabolic patterns in patients with suspected AD. It is expected to increase the specificity of FDG PET in this group of patients.
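    The step-by-step reduction described above can be sketched as a loop that lowers the scaling factor until no voxel remains significantly hypermetabolic. This is a simplified stand-in, not SPM: it assumes a hypothetical normal database giving a mean and standard deviation per voxel, flags individual voxels with z > 3 rather than testing clusters, and uses 1% steps; all names and thresholds are illustrative.

```python
def proportional_scale(uptake, reference=100.0):
    """Scale voxel uptake so that the global mean equals `reference`."""
    global_mean = sum(uptake) / len(uptake)
    return [v * reference / global_mean for v in uptake]


def adjusted_scaling(uptake, normal_mean, normal_sd, z_threshold=3.0,
                     step=0.01, reference=100.0, max_steps=50):
    """Lower the proportional scaling factor in `step` increments until no voxel
    exceeds z_threshold above the (hypothetical) normal mean.

    Returns (fractional_reduction, adjusted_values).
    """
    scaled = proportional_scale(uptake, reference)
    for k in range(max_steps + 1):
        factor = 1.0 - k * step
        candidate = [v * factor for v in scaled]
        hyper = any((v - m) / s > z_threshold
                    for v, m, s in zip(candidate, normal_mean, normal_sd))
        if not hyper:
            return k * step, candidate
    raise RuntimeError("no admissible scaling within max_steps")


# Illustrative patient: half the voxels strongly hypometabolic.
reduction, adjusted = adjusted_scaling([50, 50, 100, 100], [100] * 4, [5] * 4)
```

    In this example the low global mean inflates the normal voxels after proportional scaling, mimicking false-positive hypermetabolism; lowering the factor removes them and at the same time pushes the hypometabolic voxels further below normal, which is how the adjustment can enlarge hypometabolic clusters.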

  16. A measurement of CMB cluster lensing with SPT and DES year 1 data

    NASA Astrophysics Data System (ADS)

    Baxter, E. J.; Raghunathan, S.; Crawford, T. M.; Fosalba, P.; Hou, Z.; Holder, G. P.; Omori, Y.; Patil, S.; Rozo, E.; Abbott, T. M. C.; Annis, J.; Aylor, K.; Benoit-Lévy, A.; Benson, B. A.; Bertin, E.; Bleem, L.; Buckley-Geer, E.; Burke, D. L.; Carlstrom, J.; Carnero Rosell, A.; Carrasco Kind, M.; Carretero, J.; Chang, C. L.; Cho, H.-M.; Crites, A. T.; Crocce, M.; Cunha, C. E.; da Costa, L. N.; D'Andrea, C. B.; Davis, C.; de Haan, T.; Desai, S.; Dietrich, J. P.; Dobbs, M. A.; Dodelson, S.; Doel, P.; Drlica-Wagner, A.; Estrada, J.; Everett, W. B.; Fausti Neto, A.; Flaugher, B.; Frieman, J.; García-Bellido, J.; George, E. M.; Gaztanaga, E.; Giannantonio, T.; Gruen, D.; Gruendl, R. A.; Gschwend, J.; Gutierrez, G.; Halverson, N. W.; Harrington, N. L.; Hartley, W. G.; Holzapfel, W. L.; Honscheid, K.; Hrubes, J. D.; Jain, B.; James, D. J.; Jarvis, M.; Jeltema, T.; Knox, L.; Krause, E.; Kuehn, K.; Kuhlmann, S.; Kuropatkin, N.; Lahav, O.; Lee, A. T.; Leitch, E. M.; Li, T. S.; Lima, M.; Luong-Van, D.; Manzotti, A.; March, M.; Marrone, D. P.; Marshall, J. L.; Martini, P.; McMahon, J. J.; Melchior, P.; Menanteau, F.; Meyer, S. S.; Miller, C. J.; Miquel, R.; Mocanu, L. M.; Mohr, J. J.; Natoli, T.; Nord, B.; Ogando, R. L. C.; Padin, S.; Plazas, A. A.; Pryke, C.; Rapetti, D.; Reichardt, C. L.; Romer, A. K.; Roodman, A.; Ruhl, J. E.; Rykoff, E.; Sako, M.; Sanchez, E.; Sayre, J. T.; Scarpine, V.; Schaffer, K. K.; Schindler, R.; Schubnell, M.; Sevilla-Noarbe, I.; Shirokoff, E.; Smith, M.; Smith, R. C.; Soares-Santos, M.; Sobreira, F.; Staniszewski, Z.; Stark, A.; Story, K.; Suchyta, E.; Tarle, G.; Thomas, D.; Troxel, M. A.; Vanderlinde, K.; Vieira, J. D.; Walker, A. R.; Williamson, R.; Zhang, Y.; Zuntz, J.

    2018-05-01

    Clusters of galaxies gravitationally lens the cosmic microwave background (CMB) radiation, resulting in a distinct imprint in the CMB on arcminute scales. Measurement of this effect offers a promising way to constrain the masses of galaxy clusters, particularly those at high redshift. We use CMB maps from the South Pole Telescope Sunyaev-Zel'dovich (SZ) survey to measure the CMB lensing signal around galaxy clusters identified in optical imaging from first year observations of the Dark Energy Survey. The cluster catalogue used in this analysis contains 3697 members with mean redshift $\bar{z} = 0.45$. We detect lensing of the CMB by the galaxy clusters at 8.1σ significance. Using the measured lensing signal, we constrain the amplitude of the relation between cluster mass and optical richness to roughly 17 per cent precision, finding good agreement with recent constraints obtained with galaxy lensing. The error budget is dominated by statistical noise but includes significant contributions from systematic biases due to the thermal SZ effect and cluster miscentring.

  17. RRW: repeated random walks on genome-scale protein networks for local cluster discovery

    PubMed Central

    Macropol, Kathy; Can, Tolga; Singh, Ambuj K

    2009-01-01

    Background: We propose an efficient and biologically sensitive algorithm based on repeated random walks (RRW) for discovering functional modules, e.g., complexes and pathways, within large-scale protein networks. Compared to existing cluster identification techniques, RRW implicitly makes use of network topology, edge weights, and long range interactions between proteins. Results: We apply the proposed technique on a functional network of yeast genes and accurately identify statistically significant clusters of proteins. We validate the biological significance of the results using known complexes in the MIPS complex catalogue database and well-characterized biological processes. We find that 90% of the created clusters have the majority of their catalogued proteins belonging to the same MIPS complex, and about 80% have the majority of their proteins involved in the same biological process. We compare our method to various other clustering techniques, such as the Markov Clustering Algorithm (MCL), and find a significant improvement in the RRW clusters' precision and accuracy values. Conclusion: RRW, which is a technique that exploits the topology of the network, is more precise and robust in finding local clusters. In addition, it has the added flexibility of being able to find multi-functional proteins by allowing overlapping clusters. PMID:19740439
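    The building block of a repeated-random-walk method is a random walk with restart from a seed node: nodes the walker visits often, given that it keeps jumping home, form a local cluster around the seed. The sketch below computes the stationary visiting probabilities by power iteration on a weighted undirected graph; the restart probability, the top-k cut-off and the function names are assumptions, and RRW's repetition over seeds and merging of overlapping clusters are not reproduced.

```python
def random_walk_with_restart(adjacency, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a walk that restarts at `seed`.

    `adjacency` maps every node -> {neighbor: weight}. At each step the walker
    returns to `seed` with probability `restart`; otherwise it moves to a
    neighbor with probability proportional to the edge weight.
    """
    nodes = list(adjacency)
    p = {v: 0.0 for v in nodes}
    p[seed] = 1.0
    for _ in range(max_iter):
        nxt = {v: 0.0 for v in nodes}
        for u in nodes:
            out = sum(adjacency[u].values())
            if out == 0:
                nxt[seed] += (1 - restart) * p[u]  # dangling node: send mass home
                continue
            for v, w in adjacency[u].items():
                nxt[v] += (1 - restart) * p[u] * w / out
        nxt[seed] += restart  # restart mass (total probability is 1)
        diff = sum(abs(nxt[v] - p[v]) for v in nodes)
        p = nxt
        if diff < tol:
            break
    return p


def local_cluster(adjacency, seed, size, restart=0.3):
    """The `size` nodes with the highest visiting probability from `seed`."""
    p = random_walk_with_restart(adjacency, seed, restart)
    return sorted(p, key=p.get, reverse=True)[:size]
```

    Because the walk follows weighted edges over several steps, the scores reflect topology, edge weights and longer-range connections rather than direct neighbours alone, which matches the properties the abstract attributes to RRW.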

  18. The effectiveness of repeat lumbar transforaminal epidural steroid injections.

    PubMed

    Murthy, Naveen S; Geske, Jennifer R; Shelerud, Randy A; Wald, John T; Diehn, Felix E; Thielen, Kent R; Kaufmann, Timothy J; Morris, Jonathan M; Lehman, Vance T; Amrami, Kimberly K; Carter, Rickey E; Maus, Timothy P

    2014-10-01

    The aim of this study was to determine 1) whether repeat lumbar transforaminal epidural steroid injections (TFESIs) resulted in recovery of pain relief that had waned since an index injection, and 2) whether cumulative benefit could be achieved by repeat injections within 3 months of the index injection. This was a retrospective observational study, with statistical modeling of the response to repeat TFESI, in an academic radiology practice. Two thousand eighty-seven single-level TFESIs were performed for radicular pain on 933 subjects. Subjects received repeat TFESIs >2 weeks and <1 year from the index injection. Hierarchical linear modeling was performed to evaluate changes in continuous and categorical pain relief outcomes after repeat TFESI. Subgroup analyses were performed on patients with <3 months duration of pain (acute pain), patients receiving repeat injections within 3 months (clustered injections), and patients with both acute pain and clustered injections. Repeat TFESIs achieved pain relief in both continuous and categorical outcomes. Relative to the index injection, there was a minimal but statistically significant decrease in pain relief in modeled continuous outcome measures with subsequent injections. Acute pain patients recovered all prior benefit, with a statistically significant cumulative benefit. Patients receiving clustered injections achieved a statistically significant cumulative benefit, of greater magnitude in acute pain patients. Repeat TFESI may be performed for recurrence of radicular pain with the expectation of recovering most or all previously achieved benefit; acute pain patients will likely recover all prior benefit. Repeat TFESIs within 3 months of the index injection can provide cumulative benefit. Wiley Periodicals, Inc.

  19. Utility of K-Means clustering algorithm in differentiating apparent diffusion coefficient values between benign and malignant neck pathologies

    PubMed Central

    Srinivasan, A.; Galbán, C.J.; Johnson, T.D.; Chenevert, T.L.; Ross, B.D.; Mukherji, S.K.

    2014-01-01

    Purpose: The objective of our study was to analyze the differences in apparent diffusion coefficient (ADC) partitions (created using the K-Means algorithm) between benign and malignant neck lesions and to evaluate their benefit in distinguishing these entities. Material and methods: MRI studies of 10 benign and 10 malignant proven neck pathologies were post-processed on a PC using in-house software developed in MATLAB (The MathWorks, Inc., Natick, MA). Lesions were manually contoured by two neuroradiologists, and the ADC values within each lesion were clustered into two partitions (low ADC, ADCL; high ADC, ADCH) and three partitions (ADCL; intermediate ADC, ADCI; ADCH) using the K-Means clustering algorithm. An unpaired two-tailed Student's t-test was performed for all metrics to determine statistical differences in the means between the benign and malignant pathologies. Results: A statistically significant difference between the mean ADCL clusters in benign and malignant pathologies was seen in the 3-cluster models of both readers (p=0.03 and 0.022, respectively) and in the 2-cluster model of reader 2 (p=0.04), with the other metrics (ADCH, ADCI, whole-lesion mean ADC) not revealing any significant differences. Receiver operating characteristic curves demonstrated the quantitative difference in mean ADCH and ADCL in both the 2- and 3-cluster models to be predictive of malignancy (2 clusters: p=0.008, area under curve=0.850; 3 clusters: p=0.01, area under curve=0.825). Conclusion: The K-Means clustering algorithm, which generates partitions of large datasets, may provide a better characterization of neck pathologies and may be of additional benefit in distinguishing benign from malignant neck pathologies compared with whole-lesion mean ADC alone. PMID:20007723
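    The partition-and-compare workflow can be sketched in two steps: cluster the ADC values of one lesion with one-dimensional k-means and take the lowest cluster mean as ADCL, then compare ADCL between the benign and malignant groups with an unpaired Student's t statistic. The initialisation at evenly spaced order statistics is an assumption (the study's MATLAB software may differ), and only the t statistic is computed here; a p-value would need the t distribution with n_a + n_b - 2 degrees of freedom (for example via `scipy.stats.ttest_ind`).

```python
import math


def kmeans_1d(values, k, iterations=100):
    """Lloyd's k-means on scalar values; returns the cluster means, sorted.

    Centers start at evenly spaced order statistics of the data (an assumed
    initialisation); empty clusters keep their previous center.
    """
    data = sorted(values)
    n = len(data)
    centers = [data[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            groups[nearest].append(x)
        new_centers = [sum(g) / len(g) if g else centers[j]
                       for j, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers)


def student_t(a, b):
    """Unpaired two-sample Student t statistic with pooled variance."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    pooled = ss / (na + nb - 2)
    return (ma - mb) / math.sqrt(pooled * (1 / na + 1 / nb))
```

    In use, `kmeans_1d(lesion_adc_values, 3)[0]` would give a lesion's ADCL under the three-partition model (where `lesion_adc_values` is a hypothetical list of voxel values), and `student_t` would then be applied to the ADCL values of the benign and malignant lesions.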

  20. Utility of the k-means clustering algorithm in differentiating apparent diffusion coefficient values of benign and malignant neck pathologies.

    PubMed

    Srinivasan, A; Galbán, C J; Johnson, T D; Chenevert, T L; Ross, B D; Mukherji, S K

    2010-04-01

    Does the K-means algorithm do a better job of differentiating benign and malignant neck pathologies than mean ADC alone? The objective of our study was to analyze the differences between ADC partitions to evaluate whether the K-means technique can be of additional benefit to whole-lesion mean ADC alone in distinguishing benign and malignant neck pathologies. MR imaging studies of 10 benign and 10 malignant proven neck pathologies were postprocessed on a PC by using in-house software developed in Matlab. Two neuroradiologists manually contoured the lesions, with the ADC values within each lesion clustered into 2 (low, ADC-ADC(L); high, ADC-ADC(H)) and 3 partitions (ADC(L); intermediate, ADC-ADC(I); ADC(H)) by using the K-means clustering algorithm. An unpaired 2-tailed Student t test was performed for all metrics to determine statistical differences in the means of the benign and malignant pathologies. A statistically significant difference between the mean ADC(L) clusters in benign and malignant pathologies was seen in the 3-cluster models of both readers (P = .03 and .022, respectively) and the 2-cluster model of reader 2 (P = .04), with the other metrics (ADC(H), ADC(I), whole-lesion mean ADC) not revealing any significant differences. ROC curves demonstrated the quantitative differences in mean ADC(H) and ADC(L) in both the 2- and 3-cluster models to be predictive of malignancy (2 clusters: P = .008, area under curve = 0.850; 3 clusters: P = .01, area under curve = 0.825). The K-means clustering algorithm, which generates partitions of large datasets, may provide a better characterization of neck pathologies and may be of additional benefit in distinguishing benign and malignant neck pathologies compared with whole-lesion mean ADC alone.

  1. Case-control geographic clustering for residential histories accounting for risk factors and covariates.

    PubMed

    Jacquez, Geoffrey M; Meliker, Jaymie R; Avruskin, Gillian A; Goovaerts, Pierre; Kaufmann, Andy; Wilson, Mark L; Nriagu, Jerome

    2006-08-03

    Methods for analyzing space-time variation in risk in case-control studies typically ignore residential mobility. We develop an approach for analyzing case-control data for mobile individuals and apply it to study bladder cancer in 11 counties in southeastern Michigan. At this time data collection is incomplete and no inferences should be drawn; we analyze these data to demonstrate the novel methods. Global, local and focused clustering of residential histories for 219 cases and 437 controls is quantified using time-dependent nearest neighbor relationships. Business address histories for 268 industries that release known or suspected bladder cancer carcinogens are analyzed. A logistic model accounting for smoking, gender, age, race and education specifies the probability of being a case and is incorporated into the cluster randomization procedures. Sensitivity of clustering to the definition of the proximity metric is assessed for k = 1 to 75 nearest neighbors. Global clustering is partly explained by the covariates but remains statistically significant at 12 of the 14 levels of k considered. After accounting for the covariates, 26 local clusters are found in Lapeer, Ingham, Oakland and Jackson counties, with the clusters in Ingham and Oakland counties appearing in 1950 and persisting to the present. Statistically significant focused clusters are found about the business address histories of 22 industries located in Oakland (19 clusters), Ingham (2) and Jackson (1) counties. Clusters in central and southeastern Oakland County appear in the 1930s and persist to the present day. These methods provide a systematic approach for evaluating a series of increasingly realistic alternative hypotheses regarding the sources of excess risk. As long as the selection of cases and controls is population-based and not geographically biased, these tools can provide insights into geographic risk factors that were not specifically assessed in the case-control study design.
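    The study's statistics are time-dependent and covariate-adjusted; the sketch below shows only the static core idea, a Cuzick-Edwards-style count of case neighbours among each case's k nearest neighbours, with a Monte Carlo p-value from uniformly relabelling cases and controls. Covariate adjustment as described in the abstract would replace the uniform shuffle with labels drawn using each subject's logistic-model probability of being a case; that step, residential histories and the function names here are assumptions not taken from the paper.

```python
import math
import random


def knn_case_count(points, is_case, k):
    """Sum over cases of how many of each case's k nearest neighbours are cases."""
    total = 0
    for i, p in enumerate(points):
        if not is_case[i]:
            continue
        others = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        total += sum(1 for _, j in others[:k] if is_case[j])
    return total


def permutation_p_value(points, is_case, k, n_perm=999, seed=0):
    """Observed statistic and Monte Carlo p-value under random relabelling
    (unadjusted: every subject is equally likely to be a case)."""
    rng = random.Random(seed)
    observed = knn_case_count(points, is_case, k)
    labels = list(is_case)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if knn_case_count(points, labels, k) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

    Repeating the test for several values of k mirrors the sensitivity analysis over the proximity metric described in the abstract.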

  2. Detecting Genomic Clustering of Risk Variants from Sequence Data: Cases vs. Controls

    PubMed Central

    Schaid, Daniel J.; Sinnwell, Jason P.; McDonnell, Shannon K.; Thibodeau, Stephen N.

    2013-01-01

    As the ability to measure dense genetic markers approaches the limit of the DNA sequence itself, taking advantage of possible clustering of genetic variants in, and around, a gene would benefit genetic association analyses and likely provide biological insights. The greatest benefit might be realized when multiple rare variants cluster in a functional region. Several statistical tests have been developed, one of which is based on the popular Kulldorff scan statistic for spatial clustering of disease. We extended another popular spatial clustering method, Tango's statistic, to genomic sequence data. An advantage of Tango's method is that it is rapid to compute, and when a single test statistic is computed, its distribution is well approximated by a scaled chi-square distribution, making computation of p-values very rapid. We compared the Type-I error rates and power of several clustering statistics, as well as the omnibus sequence kernel association test (SKAT). Although our version of Tango's statistic, which we call the "Kernel Distance" statistic, took approximately half as long to compute as the Kulldorff scan statistic, it had slightly less power than the scan statistic. Our results showed that the Ionita-Laza version of Kulldorff's scan statistic had the greatest power over a range of clustering scenarios. PMID:23842950
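    Tango's index, in its classic spatial form C = (r - p)^T A (r - p), measures how strongly the excess of observed over expected case shares is concentrated among nearby locations. The sketch applies it to positions along a sequence with an exponential distance kernel A_ij = exp(-|x_i - x_j| / lam); the kernel, the scale parameter `lam` and this genomic framing are assumptions, and the paper's "Kernel Distance" statistic and its scaled chi-square approximation may be defined differently.

```python
import math


def tango_index(positions, case_counts, expected_props, lam):
    """Tango's clustering index C = (r - p)^T A (r - p).

    r_i: observed share of cases at location i; p_i: expected share under no
    clustering; A_ij = exp(-|x_i - x_j| / lam) up-weights pairs of nearby
    locations, so neighbouring excesses reinforce each other.
    """
    total = sum(case_counts)
    r = [c / total for c in case_counts]
    diff = [ri - pi for ri, pi in zip(r, expected_props)]
    index = 0.0
    for i, xi in enumerate(positions):
        for j, xj in enumerate(positions):
            a_ij = math.exp(-abs(xi - xj) / lam)
            index += diff[i] * a_ij * diff[j]
    return index
```

    With a very small `lam` the kernel approaches the identity matrix and C reduces to the sum of squared deviations; larger values let excesses at adjacent positions add up, which is what makes the index sensitive to clustering rather than to scattered excess.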

  3. Oxidative stress gene expression profile in inbred mouse after ischemia/reperfusion small bowel injury.

    PubMed

    Bertoletto, Paulo Roberto; Ikejiri, Adauto Tsutomu; Somaio Neto, Frederico; Chaves, José Carlos; Teruya, Roberto; Bertoletto, Eduardo Rodrigues; Taha, Murched Omar; Fagundes, Djalma José

    2012-11-01

    To determine the profile of gene expression associated with oxidative stress and thereby help establish parameters for the role of enzyme clusters in ischemia/reperfusion intestinal injury. Twelve male inbred mice (C57BL/6) were randomly assigned to a Control Group (CG), submitted to anesthesia and laparotomy and observed for 120 min, or an Ischemia/Reperfusion Group (IRG), submitted to anesthesia, laparotomy, 60 min of small bowel ischemia and 60 min of reperfusion. A pool of six samples was submitted to the qPCR-RT protocol (six clusters) for mouse oxidative stress and antioxidant defense pathways. Of the 84 genes investigated, 64 (76.2%) showed statistically significant expression and 20 (23.8%) showed no statistical difference from the control group. Of these 64 significantly expressed genes, 60 (93.7%) were up-regulated and 4 (6.3%) were down-regulated. Of the genes without statistically significant expression, 12 were up-regulated and 8 were down-regulated. Surprisingly, 37 genes (44.04%) showed a more than threefold up-regulation; these values were arbitrarily considered very significant. The remaining 47 genes (55.9%) were either up-regulated less than threefold (35 genes, 41.6%) or down-regulated less than threefold (12 genes, 14.3%). Intestinal ischemia and reperfusion promote a global hyper-expression profile across six different gene clusters related to antioxidant defense and oxidative stress.

  4. An X-ray method for detecting substructure in galaxy clusters - Application to Perseus, A2256, Centaurus, Coma, and Sersic 40/6

    NASA Technical Reports Server (NTRS)

    Mohr, Joseph J.; Fabricant, Daniel G.; Geller, Margaret J.

    1993-01-01

    We use the moments of the X-ray surface brightness distribution to constrain the dynamical state of a galaxy cluster. Using X-ray observations from the Einstein Observatory IPC, we measure the first moment FM, the ellipsoidal orientation angle, and the axial ratio at a sequence of radii in the cluster. We argue that a significant variation in the image centroid FM as a function of radius is evidence for a nonequilibrium feature in the intracluster medium (ICM) density distribution. In simple terms, centroid shifts indicate that the center of mass of the ICM varies with radius. This variation is a tracer of continuing dynamical evolution. For each cluster, we evaluate the significance of variations in the centroid of the IPC image by computing the same statistics on an ensemble of simulated cluster images. In producing these simulated images we include X-ray point source emission, telescope vignetting, Poisson noise, and characteristics of the IPC. Application of this new method to five Abell clusters reveals that the core of each one has significant substructure. In addition, we find significant variations in the orientation angle and the axial ratio for several of the clusters.
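    The centroid-shift idea can be sketched as computing the brightness-weighted centroid (first moment) of the image within circular apertures of increasing radius about a fixed centre and measuring how far each centroid moves from the innermost one. The fixed centre, pixel-grid image format and function names are assumptions; the paper's significance test against simulated images with point sources, vignetting and Poisson noise is not reproduced.

```python
import math


def aperture_centroid(image, center, radius):
    """Brightness-weighted centroid (x, y) of pixels within `radius` of `center`.

    `image` is a list of rows, so pixel (x, y) has brightness image[y][x].
    """
    cx, cy = center
    total = sx = sy = 0.0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                total += value
                sx += value * x
                sy += value * y
    return sx / total, sy / total


def centroid_shifts(image, center, radii):
    """Distance of each aperture's centroid from the innermost aperture's centroid."""
    cents = [aperture_centroid(image, center, r) for r in radii]
    x0, y0 = cents[0]
    return [math.hypot(x - x0, y - y0) for x, y in cents]
```

    For a relaxed, symmetric brightness distribution the shifts stay near zero at all radii; an off-centre clump that enters only the larger apertures produces a growing shift, the signature of substructure described above.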

  5. Design of partially supervised classifiers for multispectral image data

    NASA Technical Reports Server (NTRS)

    Jeon, Byeungwoo; Landgrebe, David

    1993-01-01

    A partially supervised classification problem is addressed, especially when the class definition and corresponding training samples are provided a priori for only one particular class. In practical applications of pattern classification techniques, a frequently observed characteristic is the heavy, often nearly impossible, requirement for representative prior statistical characteristics of all classes in a given data set. Considering the effort in both time and manpower required to obtain a well-defined, exhaustive list of classes with a corresponding representative set of training samples, this "partially" supervised capability would be very desirable, assuming adequate classifier performance can be obtained. Two different classification algorithms are developed to achieve simplicity in classifier design by reducing the requirement for prior statistical information without sacrificing significant classification capability. The first is based on optimal significance testing, where the optimal acceptance probability is estimated directly from the data set. In the second approach, partially supervised classification is treated as a problem of unsupervised clustering with initially one known cluster or class. A weighted unsupervised clustering procedure is developed to automatically define the other classes and estimate their class statistics. The operational simplicity thus realized should make these partially supervised classification schemes very viable tools in pattern classification.

  6. Galaxy Cluster Mass Reconstruction Project – III. The impact of dynamical substructure on cluster mass estimates

    DOE PAGES

    Old, L.; Wojtak, R.; Pearce, F. R.; ...

    2017-12-20

    With the advent of wide-field cosmological surveys, we are approaching samples of hundreds of thousands of galaxy clusters. While such large numbers will help reduce statistical uncertainties, the control of systematics in cluster masses is crucial. Here we examine the effects of an important source of systematic uncertainty in galaxy-based cluster mass estimation techniques: the presence of significant dynamical substructure. Dynamical substructure manifests as dynamically distinct subgroups in phase-space, indicating an ‘unrelaxed’ state. This issue affects around a quarter of clusters in a generally selected sample. We employ a set of mock clusters whose masses have been measured homogeneously with commonly used galaxy-based mass estimation techniques (kinematic, richness, caustic, radial methods). We use these to study how the relation between observationally estimated and true cluster mass depends on the presence of substructure, as identified by various popular diagnostics. We find that the scatter for an ensemble of clusters does not increase dramatically for clusters with dynamical substructure. However, we find a systematic bias for all methods, such that clusters with significant substructure have higher measured masses than their relaxed counterparts. This bias depends on cluster mass: the most massive clusters are largely unaffected by the presence of significant substructure, but masses are significantly overestimated for lower mass clusters, by ~10 percent at 10^14 and by ≳20 percent at ≲10^13.5. Finally, the use of cluster samples with different levels of substructure can therefore bias certain cosmological parameters up to a level comparable to the typical uncertainties in current cosmological studies.

  7. Galaxy Cluster Mass Reconstruction Project – III. The impact of dynamical substructure on cluster mass estimates

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Old, L.; Wojtak, R.; Pearce, F. R.

    With the advent of wide-field cosmological surveys, we are approaching samples of hundreds of thousands of galaxy clusters. While such large numbers will help reduce statistical uncertainties, the control of systematics in cluster masses is crucial. Here we examine the effects of an important source of systematic uncertainty in galaxy-based cluster mass estimation techniques: the presence of significant dynamical substructure. Dynamical substructure manifests as dynamically distinct subgroups in phase-space, indicating an ‘unrelaxed’ state. This issue affects around a quarter of clusters in a generally selected sample. We employ a set of mock clusters whose masses have been measured homogeneously with commonly used galaxy-based mass estimation techniques (kinematic, richness, caustic, radial methods). We use these to study how the relation between observationally estimated and true cluster mass depends on the presence of substructure, as identified by various popular diagnostics. We find that the scatter for an ensemble of clusters does not increase dramatically for clusters with dynamical substructure. However, we find a systematic bias for all methods, such that clusters with significant substructure have higher measured masses than their relaxed counterparts. This bias depends on cluster mass: the most massive clusters are largely unaffected by the presence of significant substructure, but masses are significantly overestimated for lower mass clusters, by ~10 percent at 10^14 and by ≳20 percent at ≲10^13.5. Finally, the use of cluster samples with different levels of substructure can therefore bias certain cosmological parameters up to a level comparable to the typical uncertainties in current cosmological studies.

  8. DENBRAN: A basic program for a significance test for multivariate normality of clusters from branching patterns in dendrograms

    NASA Astrophysics Data System (ADS)

    Sneath, P. H. A.

    A BASIC program is presented for significance tests to determine whether a dendrogram is derived from clustering of points that belong to a single multivariate normal distribution. The significance tests are based on statistics of the Kolmogorov-Smirnov type, obtained by comparing the observed cumulative graph of branch levels with a graph for the hypothesis of multivariate normality. The program also permits testing whether the dendrogram could be from a cluster of lower dimensionality due to character correlations. The program makes provision for three similarity coefficients, (1) Euclidean distances, (2) squared Euclidean distances, and (3) Simple Matching Coefficients, and for five cluster methods (1) WPGMA, (2) UPGMA, (3) Single Linkage (or Minimum Spanning Trees), (4) Complete Linkage, and (5) Ward's Increase in Sums of Squares. The program is entitled DENBRAN.
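
    The Kolmogorov-Smirnov-type comparison described above measures the largest vertical gap between an observed cumulative curve and a hypothesised one. A minimal Python sketch of the generic one-sample statistic, assuming the reference CDF is supplied by the caller; DENBRAN's own reference curve for branch levels under multivariate normality is not reproduced here, and `ks_statistic` and `uniform_cdf` are hypothetical names:

```python
def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov statistic D = sup_x |F_n(x) - F(x)|.

    `sample`: observed values (e.g. dendrogram branch levels).
    `cdf`: hypothesised cumulative distribution function (caller-supplied
    assumption; not DENBRAN's multivariate-normal reference curve).
    """
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x, so check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def uniform_cdf(x):
    """Hypothetical reference: CDF of the uniform distribution on [0, 1]."""
    return min(max(x, 0.0), 1.0)


# Usage: compare four illustrative values against the uniform reference.
print(ks_statistic([0.1, 0.4, 0.45, 0.9], uniform_cdf))
```

    Checking both sides of each jump matters because the supremum can occur just before or just after an observation; a significance level would still require the appropriate null distribution of D.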

  9. Quasi-Likelihood Techniques in a Logistic Regression Equation for Identifying Simulium damnosum s.l. Larval Habitats Intra-cluster Covariates in Togo.

    PubMed

    Jacob, Benjamin G; Novak, Robert J; Toe, Laurent; Sanfo, Moussa S; Afriyie, Abena N; Ibrahim, Mohammed A; Griffith, Daniel A; Unnasch, Thomas R

    2012-01-01

    The standard methods for regression analyses of clustered riverine larval habitat data of Simulium damnosum s.l., a major black-fly vector of onchocerciasis, postulate models relating observational ecological-sampled parameter estimators to prolific habitats without accounting for residual intra-cluster error correlation effects. Generally, this correlation comes from two sources: (1) the design of the random effects and their assumed covariance from the multiple levels within the regression model; and (2) the correlation structure of the residuals. Unfortunately, inconspicuous errors in residual intra-cluster correlation estimates can overstate precision in forecasted S. damnosum s.l. riverine larval habitat explanatory attributes regardless of how they are treated (e.g., independent, autoregressive, Toeplitz, etc.). In this research, the geographical locations of multiple riverine-based S. damnosum s.l. larval ecosystem habitats sampled from 2 pre-established epidemiological sites in Togo were identified and recorded from July 2009 to June 2010. Initially, the data were aggregated in PROC GENMOD. An agglomerative hierarchical residual cluster-based analysis was then performed. The sampled clustered study site data were then analyzed for statistical correlations using Monthly Biting Rates (MBR). Euclidean distance measurements and terrain-related geomorphological statistics were then generated in ArcGIS. A digital overlay was then performed, also in ArcGIS, using the georeferenced ground coordinates of high- and low-density clusters stratified by Annual Biting Rates (ABR). These data were overlain onto multitemporal sub-meter pixel resolution satellite data (i.e., QuickBird 0.61 m wavebands). Orthogonal spatial filter eigenvectors were then generated in SAS/GIS.
Univariate and non-linear regression-based models (i.e., logistic, Poisson and negative binomial) were also employed to determine probability distributions and to identify statistically significant parameter estimators from the sampled data. Thereafter, Durbin-Watson test statistics were used to test the null hypothesis that the regression residuals were not autocorrelated against the alternative that the residuals followed an autoregressive process in AUTOREG. Bayesian uncertainty matrices were also constructed employing normal priors for each of the sampled estimators in PROC MCMC. The residuals revealed both spatially structured and unstructured error effects in the high and low ABR-stratified clusters. The analyses also revealed that turbidity levels and the presence of rocks were statistically significant estimators for the high-ABR-stratified clusters, while distance between habitats and floating vegetation were important estimators for the low-ABR-stratified cluster. Varying- and constant-coefficient regression models, ABR-stratified GIS-generated clusters, sub-meter resolution satellite imagery, a robust residual intra-cluster diagnostic test, MBR-based histograms, eigendecomposition spatial filter algorithms and Bayesian matrices can enable accurate autoregressive estimation of latent uncertainty effects and other residual error probabilities (i.e., heteroskedasticity) for testing correlations between georeferenced S. damnosum s.l. riverine larval habitat estimators. The asymptotic distribution of the resulting residual-adjusted intra-cluster predictor error autocovariate coefficients can thereafter be established, while estimates of the asymptotic variance can lead to the construction of approximate confidence intervals for accurately targeting productive S. damnosum s.l. habitats based on spatiotemporal field-sampled count data.
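
    The Durbin-Watson test named above rests on a simple statistic computed from ordered regression residuals. A minimal Python sketch, assuming the residuals arrive as an ordered list; it computes only the statistic, not the critical values or p-values that a package such as SAS AUTOREG reports, and `durbin_watson` is a hypothetical helper name:

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for ordered regression residuals.

    DW = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2
    Values near 2 suggest no first-order autocorrelation; values toward 0
    suggest positive, and toward 4 negative, autocorrelation.
    """
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    # Squared successive differences of consecutive residuals.
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e * e for e in residuals)
    return num / den


# Usage: strictly alternating residuals indicate negative autocorrelation.
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # prints 3.0
```

    All-zero residuals would make the denominator zero; real residual series from a fitted model will not normally hit that case.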

  10. A flexible spatial scan statistic with a restricted likelihood ratio for detecting disease clusters.

    PubMed

    Tango, Toshiro; Takahashi, Kunihiko

    2012-12-30

    Spatial scan statistics are widely used tools for detecting disease clusters. In particular, the circular spatial scan statistic proposed by Kulldorff (1997) has been used in a wide variety of epidemiological studies and in disease surveillance. However, because it cannot detect noncircular, irregularly shaped clusters, many authors have proposed alternative spatial scan statistics, including an elliptic version of Kulldorff's scan statistic. The flexible spatial scan statistic proposed by Tango and Takahashi (2005) has also been used to detect irregularly shaped clusters. However, because of its heavy computational load, this method imposes a practical limit of at most 30 nearest neighbors when searching for candidate clusters. In this paper, we present a flexible spatial scan statistic implemented with the restricted likelihood ratio proposed by Tango (2008) that (1) eliminates the 30-nearest-neighbor limit and (2) requires much less computational time than the original flexible spatial scan statistic. As a side effect, Monte Carlo simulation shows that it detects clusters of any shape reasonably well as the relative risk of the cluster becomes large. We illustrate the proposed spatial scan statistic with data on mortality from cerebrovascular disease in the Tokyo Metropolitan area, Japan. Copyright © 2012 John Wiley & Sons, Ltd.
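
    To make the scanning idea concrete, a minimal Python sketch of the standard (unrestricted) Poisson log-likelihood ratio used by Kulldorff-type scan statistics, maximised over a caller-supplied list of candidate zones. This is an illustration under stated assumptions, not the paper's method: Tango's restricted likelihood ratio, which additionally restricts which regions may enter a candidate cluster, is not implemented, zone enumeration (circular or flexible) is left to the caller, expected counts are assumed positive and to sum to the total observed cases, and `poisson_llr` and `scan` are hypothetical names:

```python
import math


def poisson_llr(c, e, total):
    """Poisson log-likelihood ratio for one candidate zone.

    c: observed cases inside the zone; e: expected cases inside the zone
    (assumed > 0); total: total observed cases, assumed equal to the sum of
    expected counts under the null hypothesis.
    """
    if c <= e:
        return 0.0  # only zones with elevated risk are of interest
    inside = c * math.log(c / e)
    out_obs, out_exp = total - c, total - e
    # 0 * log(0) is taken as 0 when every case falls inside the zone.
    outside = out_obs * math.log(out_obs / out_exp) if out_obs > 0 else 0.0
    return inside + outside


def scan(cases, expected, zones):
    """Return (max LLR, zone) over candidate zones (lists of region indices)."""
    total = sum(cases)
    best_llr, best_zone = 0.0, None
    for zone in zones:
        c = sum(cases[i] for i in zone)
        e = sum(expected[i] for i in zone)
        llr = poisson_llr(c, e, total)
        if llr > best_llr:
            best_llr, best_zone = llr, zone
    return best_llr, best_zone


# Usage with illustrative numbers (not data from the study).
cases = [10, 2, 3, 5]
expected = [5.0, 5.0, 5.0, 5.0]
zones = [[0], [1], [0, 1]]  # hypothetical candidate zones
print(scan(cases, expected, zones))
```

    The significance of the maximising zone is usually assessed by Monte Carlo replication: the same maximum is recomputed on many datasets simulated under the null hypothesis, and the observed value is ranked among them.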

  11. A Measurement of CMB Cluster Lensing with SPT and DES Year 1 Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Baxter, E.J.; et al.

    2017-08-03

    Clusters of galaxies gravitationally lens the cosmic microwave background (CMB) radiation, resulting in a distinct imprint in the CMB on arcminute scales. Measurement of this effect offers a promising way to constrain the masses of galaxy clusters, particularly those at high redshift. We use CMB maps from the South Pole Telescope Sunyaev-Zel'dovich (SZ) survey to measure the CMB lensing signal around galaxy clusters identified in optical imaging from first year observations of the Dark Energy Survey. We detect lensing of the CMB by the galaxy clusters at 6.5σ significance. Using the measured lensing signal, we constrain the amplitude of the relation between cluster mass and optical richness to roughly 20% precision, finding good agreement with recent constraints obtained with galaxy lensing. The error budget is dominated by statistical noise but includes significant contributions from systematic biases due to the thermal SZ effect and cluster miscentering.

  12. Cluster mass inference via random field theory.

    PubMed

    Zhang, Hui; Nichols, Thomas E; Johnson, Timothy D

    2009-01-01

    Cluster extent and voxel intensity are two widely used statistics in neuroimaging inference. Cluster extent is sensitive to spatially extended signals while voxel intensity is better for intense but focal signals. In order to leverage strength from both statistics, several nonparametric permutation methods have been proposed to combine the two methods. Simulation studies have shown that of the different cluster permutation methods, the cluster mass statistic is generally the best. However, to date, there is no parametric cluster mass inference available. In this paper, we propose a cluster mass inference method based on random field theory (RFT). We develop this method for Gaussian images, evaluate it on Gaussian and Gaussianized t-statistic images and investigate its statistical properties via simulation studies and real data. Simulation results show that the method is valid under the null hypothesis and demonstrate that it can be more powerful than the cluster extent inference method. Further, analyses with a single subject and a group fMRI dataset demonstrate better power than traditional cluster size inference, and good accuracy relative to a gold-standard permutation test.
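
    The cluster mass statistic discussed above combines a cluster's spatial extent with its voxel intensities. A minimal Python sketch of computing it, under stated assumptions: a 2-D statistic image stored as nested lists, 4-connectivity, and, as one common convention that may not match the paper's exact definition, mass taken as the summed excess of each suprathreshold voxel over the cluster-forming threshold. The random-field-theory p-values that are the paper's contribution are not reproduced; `cluster_masses` is a hypothetical name:

```python
from collections import deque


def cluster_masses(image, threshold):
    """Masses of 4-connected suprathreshold clusters in a 2-D image.

    Mass is taken here as sum(value - threshold) over each cluster's voxels
    (an assumed convention; some definitions sum the raw statistic instead).
    Clusters are returned in row-major order of their first voxel.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    masses = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] <= threshold:
                continue
            # Breadth-first flood fill of one suprathreshold cluster.
            mass = 0.0
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                i, j = queue.popleft()
                mass += image[i][j] - threshold
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if (0 <= ni < rows and 0 <= nj < cols
                            and not seen[ni][nj] and image[ni][nj] > threshold):
                        seen[ni][nj] = True
                        queue.append((ni, nj))
            masses.append(mass)
    return masses


# Usage: a toy statistic image with two clusters above threshold 2.
image = [[3.0, 4.0, 0.0],
         [0.0, 0.0, 0.0],
         [0.0, 0.0, 2.5]]
print(cluster_masses(image, 2.0))  # prints [3.0, 0.5]
```

    In a permutation approach, the maximum cluster mass from each relabelled dataset would form the null distribution against which these observed masses are compared.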

  13. Spatial Autocorrelation of Cancer Incidence in Saudi Arabia

    PubMed Central

    Al-Ahmadi, Khalid; Al-Zahrani, Ali

    2013-01-01

    Little is known about the geographic distribution of common cancers in Saudi Arabia. We explored the spatial incidence patterns of common cancers in Saudi Arabia using spatial autocorrelation analyses, employing the global Moran’s I and Anselin’s local Moran’s I statistics to detect nonrandom incidence patterns. Global ordinary least squares (OLS) regression and local geographically-weighted regression (GWR) were applied to examine the spatial correlation of cancer incidences at the city level. Population-based records of cancers diagnosed between 1998 and 2004 were used. Male lung cancer and female breast cancer exhibited positive statistically significant global Moran’s I index values, indicating a tendency toward clustering. The Anselin’s local Moran’s I analyses revealed small significant clusters of lung cancer, prostate cancer and Hodgkin’s disease among males in the Eastern region and significant clusters of thyroid cancers in females in the Eastern and Riyadh regions. Additionally, both regression methods found significant associations among various cancers. For example, OLS and GWR revealed significant spatial associations among NHL, leukemia and Hodgkin’s disease (r² = 0.49–0.67 using OLS and r² = 0.52–0.68 using GWR) and between breast and prostate cancer (r² = 0.53 OLS and 0.57 GWR) in Saudi Arabian cities. These findings may help to generate etiologic hypotheses of cancer causation and identify spatial anomalies in cancer incidence in Saudi Arabia. Our findings should stimulate further research on the possible causes underlying these clusters and associations. PMID:24351742
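
    Global Moran's I, used above to detect nonrandom incidence patterns, compares each area's deviation from the mean with its neighbors' deviations. A minimal Python sketch, assuming a dense spatial weights matrix given as nested lists (for example, 1 for neighboring cities and 0 otherwise); significance testing and Anselin's local version are not included, and `morans_i` is a hypothetical name:

```python
def morans_i(values, weights):
    """Global Moran's I.

    I = (n / W) * sum_ij w_ij (x_i - xbar)(x_j - xbar) / sum_i (x_i - xbar)^2
    where W is the sum of all weights. Positive I indicates clustering of
    similar values; negative I indicates dispersion.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_total) * (num / den)


# Usage: four areas in a row, each adjacent to its immediate neighbors.
line = [[0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0]]
print(morans_i([1, 1, 0, 0], line))  # similar values adjacent: positive I
print(morans_i([1, 0, 1, 0], line))  # alternating values: negative I
```

    Under the null hypothesis of no spatial autocorrelation, the expected value of I is -1/(n - 1), so values must be judged against that baseline rather than against zero.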

  14. Detection of Galaxy Cluster Motions with the Kinematic Sunyaev-Zel'dovich Effect

    NASA Technical Reports Server (NTRS)

    Hand, Nick; Addison, Graeme E.; Aubourg, Eric; Battaglia, Nick; Battistelli, Elia S.; Bizyaev, Dmitry; Bond, J. Richard; Brewington, Howard; Brinkmann, Jon; Brown, Benjamin R.; ...

    2012-01-01

    Using high-resolution microwave sky maps made by the Atacama Cosmology Telescope, we detect, for the first time, motions of galaxy clusters and groups via microwave background temperature distortions due to the kinematic Sunyaev-Zel'dovich effect. Galaxy clusters are identified by their constituent luminous galaxies observed by the Baryon Oscillation Spectroscopic Survey, part of the Sloan Digital Sky Survey III. The mean pairwise momentum of clusters is measured at a statistical significance of 3.8σ, and the signal is consistent with the growth of cosmic structure in the standard model of cosmology.

  15. MO-DE-207B-03: Improved Cancer Classification Using Patient-Specific Biological Pathway Information Via Gene Expression Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Young, M; Craft, D

    Purpose: To develop an efficient, pathway-based classification system using network biology statistics to assist in patient-specific response predictions to radiation and drug therapies across multiple cancer types. Methods: We developed PICS (Pathway Informed Classification System), a novel two-step cancer classification algorithm. In PICS, a matrix m of mRNA expression values for a patient cohort is collapsed into a matrix p of biological pathways. The entries of p, which we term pathway scores, are obtained from either principal component analysis (PCA), normal tissue centroid (NTC), or gene expression deviation (GED). The pathway score matrix is clustered using both k-means and hierarchical clustering, and a clustering is judged by how well it groups patients into distinct survival classes. The most effective pathway scoring/clustering combination, as judged by clustering p-value, thus generates various ‘signatures’ for conventional and functional cancer classification. Results: PICS successfully regularized large-dimension gene data, separated normal and cancerous tissues, and clustered a large patient cohort spanning six cancer types. Furthermore, PICS clustered patient cohorts into distinct, statistically significant survival groups. For a suboptimally debulked ovarian cancer set, the pathway-classified Kaplan-Meier survival curve (p = .00127) showed significant improvement over that of a prior gene expression-classified study (p = .0179). For a pancreatic cancer set, the pathway-classified Kaplan-Meier survival curve (p = .00141) showed significant improvement over that of a prior gene expression-classified study (p = .04). Pathway-based classification confirmed biomarkers for the pyrimidine, WNT-signaling, glycerophosphoglycerol, beta-alanine, and pantothenic acid pathways for ovarian cancer. Despite its robust nature, PICS requires significantly less run time than current pathway scoring methods.
Conclusion: This work validates the PICS method to improve cancer classification using biological pathways. Patients are classified with greater specificity and physiological relevance as compared to current gene-specific approaches. Focus now moves to utilizing PICS for pan-cancer patient-specific treatment response prediction.

  16. A spatial scan statistic for nonisotropic two-level risk cluster.

    PubMed

    Li, Xiao-Zhou; Wang, Jin-Feng; Yang, Wei-Zhong; Li, Zhong-Jie; Lai, Sheng-Jie

    2012-01-30

    Spatial scan statistic methods are commonly used for geographical disease surveillance and cluster detection. The standard spatial scan statistic does not model any variability in the underlying risks of subregions belonging to a detected cluster. For a multilevel risk cluster, the isotonic spatial scan statistic can model a centralized high-risk kernel in the cluster. Because variations in disease risk are anisotropic owing to different social, economic, or transport factors, the real high-risk kernel will not necessarily occupy the center of the whole cluster area. We propose a spatial scan statistic for a nonisotropic two-level risk cluster, which can be used to detect a whole cluster and a noncentralized high-risk kernel within the cluster simultaneously. The performance of the three methods (the standard, isotonic, and proposed nonisotropic two-level scan statistics) was evaluated through an intensive simulation study. Our proposed nonisotropic two-level method showed better power and geographical precision in two-level risk cluster scenarios, especially for a noncentralized high-risk kernel. Our proposed method is illustrated using hand-foot-mouth disease data from Pingdu City, Shandong, China, in May 2009, and compared with the two other methods. In this practical study, the nonisotropic two-level method was the only one that precisely detected a high-risk area within a detected whole cluster. Copyright © 2011 John Wiley & Sons, Ltd.

  17. Application of Scan Statistics to Detect Suicide Clusters in Australia

    PubMed Central

    Cheung, Yee Tak Derek; Spittal, Matthew J.; Williamson, Michelle Kate; Tung, Sui Jay; Pirkis, Jane

    2013-01-01

    Background Suicide clustering occurs when multiple suicide incidents take place in a small area and/or within a short period of time. In spite of multinational research attention and particular efforts to prepare guidelines for tackling suicide clusters, the broader epidemiological picture of suicide clustering remains unclear. This study aimed to develop techniques for using scan statistics to detect clusters, taking the detection of suicide clusters in Australia as an example. Methods and Findings Scan statistics were applied to detect clusters among suicides occurring between 2004 and 2008. Parameter settings were manipulated and the area of analysis was changed to remedy shortcomings in existing methods. In total, 243 suicides out of 10,176 (2.4%) were identified as belonging to 15 suicide clusters. These clusters were mainly located in the Northern Territory, the northern part of Western Australia, and the northern part of Queensland. Among the 15 clusters, 4 (26.7%) were detected by both national and state cluster detections, 8 (53.3%) were only detected by the state cluster detection, and 3 (20%) were only detected by the national cluster detection. Conclusions These findings illustrate that the majority of spatial-temporal clusters of suicide were located in the inland northern areas, with socio-economic deprivation and higher proportions of indigenous people. Discrepancies between national and state/territory cluster detection by scan statistics were due to differences in the underlying suicide rates across states/territories. Performing both small-area and large-area analyses, and applying multiple parameter settings, may yield the most benefit for exploring clusters. PMID:23342098

  18. A measurement of CMB cluster lensing with SPT and DES year 1 data

    DOE PAGES

    Baxter, E. J.; Raghunathan, S.; Crawford, T. M.; ...

    2018-02-09

    Clusters of galaxies gravitationally lens the cosmic microwave background (CMB) radiation, resulting in a distinct imprint in the CMB on arcminute scales. Measurement of this effect offers a promising way to constrain the masses of galaxy clusters, particularly those at high redshift. We use CMB maps from the South Pole Telescope Sunyaev-Zel'dovich (SZ) survey to measure the CMB lensing signal around galaxy clusters identified in optical imaging from first year observations of the Dark Energy Survey. The cluster catalog used in this analysis contains 3697 members with mean redshift of z̄ = 0.45. We detect lensing of the CMB by the galaxy clusters at 8.1σ significance. Using the measured lensing signal, we constrain the amplitude of the relation between cluster mass and optical richness to roughly 17% precision, finding good agreement with recent constraints obtained with galaxy lensing. The error budget is dominated by statistical noise but includes significant contributions from systematic biases due to the thermal SZ effect and cluster miscentering.

  19. A measurement of CMB cluster lensing with SPT and DES year 1 data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Baxter, E. J.; Raghunathan, S.; Crawford, T. M.

    Clusters of galaxies gravitationally lens the cosmic microwave background (CMB) radiation, resulting in a distinct imprint in the CMB on arcminute scales. Measurement of this effect offers a promising way to constrain the masses of galaxy clusters, particularly those at high redshift. We use CMB maps from the South Pole Telescope Sunyaev-Zel'dovich (SZ) survey to measure the CMB lensing signal around galaxy clusters identified in optical imaging from first year observations of the Dark Energy Survey. The cluster catalog used in this analysis contains 3697 members with mean redshift of z̄ = 0.45. We detect lensing of the CMB by the galaxy clusters at 8.1σ significance. Using the measured lensing signal, we constrain the amplitude of the relation between cluster mass and optical richness to roughly 17% precision, finding good agreement with recent constraints obtained with galaxy lensing. The error budget is dominated by statistical noise but includes significant contributions from systematic biases due to the thermal SZ effect and cluster miscentering.

  20. Sertraline, paroxetine and venlafaxine in refugee posttraumatic stress disorder with depression symptoms.

    PubMed

    Smajkić, A; Weine, S; Durić-Bijedić, Z; Boskailo, E; Lewis, J; Pavković, I

    2001-01-01

    The authors describe the use of three new antidepressants, sertraline, paroxetine and venlafaxine, in treating posttraumatic stress disorder and symptoms of depression in adult Bosnian refugees who were victims of ethnic cleansing. Thirty-two Bosnian refugees with PTSD and symptoms of depression, presenting for treatment of the mental health consequences of surviving ethnic cleansing, participated in a case series study. All subjects completed open trials of sertraline (15), paroxetine (12) or venlafaxine (5) at standard clinical doses. Overall, sertraline and paroxetine yielded statistically significant improvement at 6 weeks in total PTSD symptom severity, in each symptom cluster, in the Beck Depression Inventory and in the Global Assessment of Functioning. Venlafaxine produced statistically significant improvement at 6 weeks in total PTSD symptom severity, in each symptom cluster and in the Global Assessment of Functioning, but did not yield significant improvement in symptoms of depression and had a high rate of side effects.

  1. An Empirical Taxonomy of Hospital Governing Board Roles

    PubMed Central

    Lee, Shoou-Yih D; Alexander, Jeffrey A; Wang, Virginia; Margolin, Frances S; Combes, John R

    2008-01-01

    Objective To develop a taxonomy of governing board roles in U.S. hospitals. Data Sources 2005 AHA Hospital Governance Survey, 2004 AHA Annual Survey of Hospitals, and Area Resource File. Study Design A governing board taxonomy was developed using cluster analysis. Results were validated and reviewed by industry experts. Differences in hospital and environmental characteristics across clusters were examined. Data Extraction Methods One thousand three hundred thirty-four hospitals with complete information on the study variables were included in the analysis. Principal Findings Five distinct clusters of hospital governing boards were identified. Statistical tests showed that the five clusters had high internal reliability and high internal validity. Statistically significant differences in hospital and environmental conditions were found among clusters. Conclusions The developed taxonomy provides policy makers, health care executives, and researchers with a useful way to describe and understand hospital governing board roles. The taxonomy may also facilitate valid and systematic assessment of governance performance. Further, the taxonomy could be used as a framework for governing boards themselves to identify areas for improvement and direction for change. PMID:18355260

  2. Cluster Headache Clinical Phenotypes: Tobacco Nonexposed (Never Smoker and No Parental Secondary Smoke Exposure as a Child) versus Tobacco-Exposed: Results from the United States Cluster Headache Survey.

    PubMed

    Rozen, Todd D

    2018-05-01

    To present results from the United States Cluster Headache Survey comparing the clinical presentation of tobacco-nonexposed and tobacco-exposed cluster headache patients. Cluster headache is uniquely tied to a personal history of tobacco use/cigarette smoking, and, if an individual cluster headache sufferer did not smoke, it has been shown that their parent(s) typically did and that the individual had significant secondary smoke exposure as a child. The truly nontobacco-exposed (no personal or secondary exposure) cluster headache sufferer has never been fully studied. The United States Cluster Headache Survey consisted of 187 multiple-choice questions related to cluster headache, including patient demographics, clinical headache characteristics, family history, triggers, smoking history (personal and secondary), and headache-related disability. The survey was placed on a website from October through December 2008. One thousand one hundred thirty-four individuals completed the survey. One hundred thirty-three subjects (12% of the surveyed population) had no personal smoking/tobacco use history and no secondary smoke exposure as an infant/child, and thus formed a nontobacco-exposed population. In the nonexposed population, there were 87 males and 46 females, a gender ratio of 1.9:1. Episodic cluster headache occurred in 80% of nonexposed subjects. One thousand and one survey responders (88%) were tobacco-exposed (729 males and 272 females), a gender ratio of 2.7:1. Eighty-three percent had a personal smoking history, while only 17% had solely parental secondary smoke exposure. Eighty-five percent of smokers had double exposure, with a personal smoking history and secondary exposure as a child. Nonexposed cluster headache subjects are significantly more likely to develop cluster headache at ages 40 years and younger, while exposed sufferers are significantly more likely to develop cluster headache at 40 years of age and older.
Nonexposed patients have a statistically significantly higher frequency of a family history of migraine. The exposed population is statistically significantly more likely than the nonexposed population to have a history of head trauma (19% vs 10%, P = .02). Tobacco-exposed patients are significantly more likely to transition from episodic to chronic cluster headache (23% vs 14%, P = .02). Cranial autonomic symptoms as well as agitation are more common in tobacco-exposed patients. Nonexposed patients are less likely to have specific cluster headache triggers. Exposed patients are significantly more likely to be triggered by alcohol. Tobacco-exposed patients are significantly heavier caffeine users than nonexposed patients. Nonexposed patients are significantly more likely than exposed patients to have cluster headache cycles that vary throughout the year (52% vs 40%, P = .02). Exposed patients are much more likely than nonexposed patients to develop cluster headache from 12 am to 6 am. Exposed patients experience significantly more frequent attacks per day and longer-duration cycles than nonexposed patients. A significantly larger percentage of the exposed population (57%) than of the nonexposed population (43%) has suicidal ideation with the syndrome (P = .003). With regard to disability, both subtypes are disabled by their headaches, but exposed patients have more work-related disability and lost home-days from headache. Both subgroups have a poor overall response to preventive and abortive medication other than inhaled oxygen and injectable sumatriptan. Cluster headache sufferers who were never exposed to tobacco (personally or secondarily as a child) appear to present uniquely compared with the tobacco-exposed subgroup. The tobacco-exposed clinical phenotype appears to have a more severe syndrome based on attack frequency, cycle duration, and headache-related disability. Tobacco exposure is associated with cluster headache chronification.
The nonexposed subtype appears to have an earlier age of onset, higher rate of familial migraine, and less circadian periodicity and daytime entrainment, suggesting a possible different underlying pathology than in the tobacco exposed sub-form. © 2018 American Headache Society.

  3. Calibrating First-Order Strong Lensing Mass Estimates in Clusters of Galaxies

    NASA Astrophysics Data System (ADS)

    Reed, Brendan; Remolian, Juan; Sharon, Keren; Li, Nan; SPT Clusters Cooperation

    2018-01-01

    We investigate methods to reduce the statistical and systematic errors inherent in using the Einstein radius as a first-order mass estimate in strong-lensing galaxy clusters. By finding an empirical universal calibration function, we aim to enable first-order mass estimates for large cluster data sets in a fraction of the time and effort of full-scale strong lensing mass modeling. We use data for 74 simulated clusters from Argonne National Laboratory in a lens redshift slice of [0.159, 0.667], with various source redshifts in the range [1.23, 2.69]. From the simulated density maps, we calculate the exact mass enclosed within the Einstein radius. We find that the mass inferred from the Einstein radius alone has an error width of ~39% with respect to the true mass. We explore an array of polynomial and exponential correction functions with dependence on cluster redshift and projected radii of the lensed images, aiming to reduce the statistical and systematic uncertainty. We find that the error on the mass inferred from the Einstein radius can be reduced significantly by using a universal correction function. Our study has implications for current and future large galaxy cluster surveys aiming to measure cluster mass and the mass-concentration relation.
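
    The first-order estimate being calibrated above is the mass enclosed within the Einstein radius. For an axially symmetric lens, the mean convergence inside θ_E equals 1, so M(<θ_E) = π (D_l θ_E)² Σ_cr with Σ_cr = c² D_s / (4πG D_l D_ls), i.e. M = c² D_l D_s θ_E² / (4G D_ls). A minimal Python sketch of that formula, assuming angular-diameter distances in metres are supplied by the caller (no cosmology calculation is done here) and without the empirical correction function the study derives; `einstein_mass` is a hypothetical name:

```python
import math

C_LIGHT = 299_792_458.0              # speed of light [m/s]
G_NEWTON = 6.67430e-11               # gravitational constant [m^3 kg^-1 s^-2]
M_SUN = 1.98847e30                   # solar mass [kg]
ARCSEC = math.pi / (180.0 * 3600.0)  # one arcsecond in radians
MPC = 3.0857e22                      # one megaparsec in metres (approximate)


def einstein_mass(theta_e_arcsec, d_l, d_s, d_ls):
    """Mass [solar masses] enclosed within the Einstein radius.

    Assumes an axially symmetric lens; d_l, d_s, d_ls are angular-diameter
    distances (observer-lens, observer-source, lens-source) in metres.
    """
    theta = theta_e_arcsec * ARCSEC
    # Critical surface density for lensing [kg/m^2].
    sigma_cr = C_LIGHT**2 * d_s / (4.0 * math.pi * G_NEWTON * d_l * d_ls)
    # Mean convergence of 1 inside theta_E: M = Sigma_cr * pi * (D_l theta_E)^2.
    mass_kg = math.pi * (d_l * theta) ** 2 * sigma_cr
    return mass_kg / M_SUN


# Usage with illustrative distances (not taken from the simulated sample).
print(einstein_mass(10.0, 1000 * MPC, 1500 * MPC, 800 * MPC))
```

    Because the mass scales as θ_E², a modest error in the measured Einstein radius, or a departure from axial symmetry, propagates into a larger fractional mass error, which is one motivation for an empirical correction.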

  4. Scalable Integrated Region-Based Image Retrieval Using IRM and Statistical Clustering.

    ERIC Educational Resources Information Center

    Wang, James Z.; Du, Yanping

    Statistical clustering is critical in designing scalable image retrieval systems. This paper presents a scalable algorithm for indexing and retrieving images based on region segmentation. The method uses statistical clustering on region features and IRM (Integrated Region Matching), a measure developed to evaluate overall similarity between images…

  5. Tsallis p⊥ distribution from statistical clusters

    NASA Astrophysics Data System (ADS)

    Bialas, A.

    2015-07-01

    It is shown that the transverse momentum distributions of particles emerging from the decay of statistical clusters, distributed according to a power law in their transverse energy, closely resemble those following from the Tsallis non-extensive statistical model. The experimental data are well reproduced with the cluster temperature T ≈ 160 MeV.

  6. Abundance gradients in cooling flow clusters: Ginga Large Area Counters and Einstein Solid State Spectrometer spectra of A496, A1795, A2142, and A2199

    NASA Technical Reports Server (NTRS)

    White, Raymond E., III; Day, C. S. R.; Hatsukade, Isamu; Hughes, John P.

    1994-01-01

    We analyze the Ginga Large Area Counters (LAC) and Einstein Solid State Spectrometer (SSS) spectra of four cooling flow clusters, A496, A1795, A2142, and A2199, each of which shows firm evidence of a relatively cool component. The inclusion of such cool spectral components in joint fits of SSS and LAC data leads to somewhat higher global temperatures than are derived from the high-energy LAC data alone. We find little evidence of cool emission outside the SSS field of view. Metal abundances appear to be centrally enhanced in all four clusters, with varying degrees of model dependence and statistical significance: the evidence is statistically strongest for A496 and A2142, somewhat weaker for A2199 and weakest for A1795. We also explore the model dependence in the amount of cold, X-ray-absorbing matter discovered in these clusters by White et al.

  7. Complex networks as a unified framework for descriptive analysis and predictive modeling in climate

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Steinhaeuser, Karsten J K; Chawla, Nitesh; Ganguly, Auroop R

    The analysis of climate data has relied heavily on hypothesis-driven statistical methods, while projections of future climate are based primarily on physics-based computational models. However, in recent years a wealth of new datasets has become available. Therefore, we take a more data-centric approach and propose a unified framework for studying climate, with an aim towards characterizing observed phenomena as well as discovering new knowledge in the climate domain. Specifically, we posit that complex networks are well-suited for both descriptive analysis and predictive modeling tasks. We show that the structural properties of climate networks have useful interpretation within the domain. Further, we extract clusters from these networks and demonstrate their predictive power as climate indices. Our experimental results establish that the network clusters are statistically significantly better predictors than clusters derived using a more traditional clustering approach. Using complex networks as data representation thus enables the unique opportunity for descriptive and predictive modeling to inform each other.
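
    One common way to turn gridded climate data into a network is to link locations whose time series are strongly correlated. A minimal Python sketch under stated assumptions: nodes are time series, an edge joins two nodes when the absolute Pearson correlation exceeds a chosen threshold, and "clusters" are simply connected components. The authors' actual network construction and cluster-extraction method are not given here and may differ (community detection is a frequent choice); `pearson` and `network_clusters` are hypothetical names:

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def network_clusters(series, threshold):
    """Link nodes whose |r| exceeds `threshold`; return connected components."""
    n = len(series)
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if abs(pearson(series[i], series[j])) > threshold:
                adj[i].append(j)
                adj[j].append(i)
    seen = [False] * n
    clusters = []
    for start in range(n):
        if seen[start]:
            continue
        # Depth-first search collects one connected component.
        stack, comp = [start], []
        seen[start] = True
        while stack:
            node = stack.pop()
            comp.append(node)
            for nb in adj[node]:
                if not seen[nb]:
                    seen[nb] = True
                    stack.append(nb)
        clusters.append(sorted(comp))
    return clusters


# Usage: toy series for four hypothetical grid points.
series = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1], [1, -1, 1, -1]]
print(network_clusters(series, 0.9))  # prints [[0, 1, 2], [3]]
```

    Using the absolute correlation treats strong anticorrelation as a link; whether that suits a given climate variable is a modelling choice rather than a fixed rule.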

  8. Formalized classification of moss litters in swampy spruce forests of intermontane depressions of Kuznetsk Alatau

    NASA Astrophysics Data System (ADS)

    Efremova, T. T.; Avrova, A. F.; Efremov, S. P.

    2016-09-01

    Multivariate statistical approaches have been used for the numerical classification of morphogenetic types of moss litters in swampy spruce forests according to their physicochemical properties (ash content, degree of decomposition, bulk density, pH, mass, and thickness). Three clusters of moss litters (peat, peaty, and high-ash peaty) have been identified. Classification functions for identifying new objects have been calculated and evaluated. The degree of decomposition and the ash content are the main classification parameters of litters, though all other characteristics are also statistically significant. The final accuracy of predicting a litter's cluster assignment is 86%. Two leading factors in the clustering of litters have been determined. The first factor, the degree of transformation of plant remains (quality), accounts for 49% of the total variance, and the second factor, the accumulation rate (quantity), accounts for 26% of the total variance. The morphogenetic structure and physicochemical properties of the clusters of moss litters are characterized.

  9. Testing Numerical Models of Cool Core Galaxy Cluster Formation with X-Ray Observations

    NASA Astrophysics Data System (ADS)

    Henning, Jason W.; Gantner, Brennan; Burns, Jack O.; Hallman, Eric J.

    2009-12-01

    Using archival Chandra and ROSAT data along with numerical simulations, we compare the properties of cool core and non-cool core galaxy clusters, paying particular attention to the region beyond the cluster cores. Using single and double β-models, we demonstrate a statistically significant difference in the slopes of observed cluster surface brightness profiles, while the cluster cores remain indistinguishable between the two cluster types. Additionally, through the use of hardness ratio profiles, we find evidence suggesting that cool core clusters are cooler beyond their cores than non-cool core clusters of comparable mass and temperature, in both observed and simulated clusters. The similarities between real and simulated clusters support a model presented in earlier work by the authors describing differing merger histories between cool core and non-cool core clusters. Discrepancies between real and simulated clusters will inform upcoming numerical models and simulations about new ways to incorporate feedback in these systems.
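    The single and double β-models mentioned above describe projected X-ray surface brightness profiles. A minimal sketch, assuming the commonly used isothermal form S(r) = S0 [1 + (r/r_c)^2]^(-3β + 1/2) (the abstract does not state the exact parameterization fitted), shows how the outer logarithmic slope, which approaches 1 - 6β at large radii, follows from β. Function names and parameter values are illustrative assumptions, with r and r_c in the same units.

```python
import math

def beta_model(r, s0, r_c, beta):
    """Single beta-model: S(r) = s0 * [1 + (r / r_c)**2] ** (-3*beta + 0.5)."""
    return s0 * (1.0 + (r / r_c) ** 2) ** (-3.0 * beta + 0.5)

def double_beta_model(r, core, envelope):
    """Sum of two beta components, each given as an (s0, r_c, beta) tuple."""
    return beta_model(r, *core) + beta_model(r, *envelope)

def log_slope(profile, r, eps=1e-4):
    """Numerical logarithmic slope d ln S / d ln r via a central difference in ln r."""
    lo, hi = r * math.exp(-eps), r * math.exp(eps)
    return (math.log(profile(hi)) - math.log(profile(lo))) / (2.0 * eps)

if __name__ == "__main__":
    # Hypothetical parameters, chosen only to show how the slope evolves with radius.
    single = lambda r: beta_model(r, 1.0, 100.0, 2.0 / 3.0)
    double = lambda r: double_beta_model(r, (1.0, 20.0, 0.7), (0.1, 150.0, 0.6))
    for r in (50.0, 500.0, 5000.0):
        print(r, log_slope(single, r), log_slope(double, r))
```

    Comparing `log_slope` at large radii between two fitted profiles is one simple way to express a difference in outer slopes; the statistical test the authors used is not reproduced here.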

  10. [Visual field progression in glaucoma: cluster analysis].

    PubMed

    Bresson-Dumont, H; Hatton, J; Foucher, J; Fonteneau, M

    2012-11-01

    Visual field progression analysis is one of the key points in glaucoma monitoring, but distinguishing true progression from random fluctuation is sometimes difficult. Several algorithms exist, but there is no real consensus on how to detect visual field progression. Trend analysis of global indices (MD, sLV) may miss localized deficits or be affected by media opacities. Conversely, point-by-point analysis makes progression difficult to differentiate from physiological variability, particularly when the sensitivity of a point is already low. The goal of our study was to analyse visual field progression with the EyeSuite™ Octopus Perimetry Clusters algorithm in patients with no significant change in global indices or worsening on pointwise linear regression analysis. We analyzed the visual fields of 162 eyes (100 patients: 58 women, 42 men; average age 66.8 ± 10.91 years) with ocular hypertension or glaucoma. For inclusion, at least six reliable visual fields per eye were required, and the trend analysis (EyeSuite™ Perimetry) of visual field global indices (MD and sLV) showed no significant progression. Change analysis in cluster mode was then performed. In a second step, eyes with statistically significant worsening of at least one cluster were analyzed point-by-point with the Octopus Field Analysis (OFA). Fifty-four eyes (33.33%) had significant worsening in some clusters while their global indices remained stable over time. In this group of patients, glaucoma was more advanced than in the stable group (MD 6.41 dB vs. 2.87 dB); however, 64.82% (35/54) of the eyes whose clusters progressed had no statistically significant change in the trend analysis by pointwise linear regression. Most software algorithms for analyzing visual field progression are essentially trend analyses of global indices or point-by-point linear regression. This study shows the potential role of cluster trend analysis.
However, for best results, it is preferable to compare the analyses of several tests in combination with a morphologic examination. Copyright © 2012 Elsevier Masson SAS. All rights reserved.
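    In its simplest form, trend analysis of a global index such as MD amounts to an ordinary least-squares regression of the index on time. The sketch below assumes that, as with Octopus mean defect, larger MD values mean more loss, so a positive slope indicates worsening. The function name, the example series and any decision threshold are hypothetical; EyeSuite's actual criteria are not described in the abstract.

```python
import math

def linear_trend(times, values):
    """OLS fit values ~ intercept + slope * times; returns (slope, intercept, t_stat).

    t_stat is the slope divided by its standard error (n - 2 degrees of freedom).
    """
    n = len(times)
    if n != len(values) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("times must not all be equal")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t
    residual_ss = sum((v - intercept - slope * t) ** 2 for t, v in zip(times, values))
    se = math.sqrt(residual_ss / (n - 2) / sxx)
    if se == 0:
        # A perfect fit: the t statistic is unbounded unless the slope is zero.
        t_stat = math.copysign(math.inf, slope) if slope != 0 else 0.0
    else:
        t_stat = slope / se
    return slope, intercept, t_stat

if __name__ == "__main__":
    # Hypothetical MD series (dB) at six visits; times in years.
    years = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
    md = [3.0, 3.1, 3.0, 3.4, 3.5, 3.7]
    print(linear_trend(years, md))
```

    The t statistic would then be compared with a one-sided critical value of Student's t with n - 2 degrees of freedom. The same function applies to a single cluster's mean defect or to one test point, which is what distinguishes cluster and pointwise trend analyses from a global-index trend.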

  11. Case-control geographic clustering for residential histories accounting for risk factors and covariates

    PubMed Central

    2006-01-01

    Background Methods for analyzing space-time variation in risk in case-control studies typically ignore residential mobility. We develop an approach for analyzing case-control data for mobile individuals and apply it to study bladder cancer in 11 counties in southeastern Michigan. Data collection is currently incomplete and no inferences should be drawn; we analyze these data to demonstrate the novel methods. Global, local and focused clustering of residential histories for 219 cases and 437 controls is quantified using time-dependent nearest neighbor relationships. Business address histories for 268 industries that release known or suspected bladder cancer carcinogens are analyzed. A logistic model accounting for smoking, gender, age, race and education specifies the probability of being a case, and is incorporated into the cluster randomization procedures. The sensitivity of clustering to the definition of the proximity metric is assessed for k = 1 to 75 nearest neighbors. Results Global clustering is partly explained by the covariates but remains statistically significant at 12 of the 14 levels of k considered. After accounting for the covariates, 26 local clusters are found in Lapeer, Ingham, Oakland and Jackson counties, with the clusters in Ingham and Oakland counties appearing in 1950 and persisting to the present. Statistically significant focused clusters are found around the business address histories of 22 industries located in Oakland (19 clusters), Ingham (2) and Jackson (1) counties. Clusters in central and southeastern Oakland County appear in the 1930s and persist to the present day. Conclusion These methods provide a systematic approach for evaluating a series of increasingly realistic alternative hypotheses regarding the sources of excess risk. 
So long as selection of cases and controls is population-based and not geographically biased, these tools can provide insights into geographic risk factors that were not specifically assessed in the case-control study design. PMID:16887016
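    A simpler, static relative of the nearest-neighbour approach above is the classic Cuzick-Edwards k-nearest-neighbour test. It counts, over all cases, how many of each case's k nearest neighbours are also cases, and estimates significance by randomly permuting case/control labels. This sketch is not the time-dependent residential-history method described above, and it omits the covariate-adjusted randomization (labels are shuffled uniformly). Coordinates and function names are hypothetical.

```python
import math
import random

def knn_indices(points, k):
    """For each point, the indices of its k nearest neighbours (Euclidean; ties broken by index)."""
    neighbours = []
    for i, p in enumerate(points):
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbours.append([j for _, j in ranked[:k]])
    return neighbours

def cuzick_edwards_t(is_case, neighbours):
    """T_k: number of (case, neighbour) pairs in which the neighbour is also a case."""
    return sum(1 for i, nb in enumerate(neighbours) if is_case[i] for j in nb if is_case[j])

def permutation_test(points, is_case, k, n_perm=999, seed=0):
    """Observed T_k and a permutation p-value from shuffling case/control labels."""
    neighbours = knn_indices(points, k)  # geography stays fixed; only labels move
    t_obs = cuzick_edwards_t(is_case, neighbours)
    rng = random.Random(seed)
    labels = list(is_case)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # keeps the number of cases fixed
        if cuzick_edwards_t(labels, neighbours) >= t_obs:
            exceed += 1
    return t_obs, (exceed + 1) / (n_perm + 1)

if __name__ == "__main__":
    # Hypothetical coordinates: four cases near the origin, four controls far away.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    cases = [True] * 4 + [False] * 4
    print(permutation_test(pts, cases, k=1))
```

    Repeating the test over a range of k values mirrors, in a simplified way, the sensitivity analysis over k described in the abstract.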

  12. A Gender Bias Habit-Breaking Intervention Led to Increased Hiring of Female Faculty in STEMM Departments.

    PubMed

    Devine, Patricia G; Forscher, Patrick S; Cox, William T L; Kaatz, Anna; Sheridan, Jennifer; Carnes, Molly

    2017-11-01

    Addressing the underrepresentation of women in science is a top priority for many institutions, but the majority of efforts to increase representation of women are neither evidence-based nor rigorously assessed. One exception is the gender bias habit-breaking intervention (Carnes et al., 2015), which, in a cluster-randomized trial involving all but two departmental clusters (N = 92) in the 6 STEMM-focused schools/colleges at the University of Wisconsin–Madison, led to increases in gender bias awareness and self-efficacy to promote gender equity in academic science departments. Following this initial success, the present study compares, in a preregistered analysis, hiring rates of new female faculty pre- and post-manipulation. Whereas the proportion of women hired by control departments remained stable over time, the proportion of women hired by intervention departments increased by an estimated 18 percentage points (OR = 2.23, d_OR = 0.34). Though the preregistered analysis did not achieve conventional levels of statistical significance (p < 0.07), our study has a hard upper limit on statistical power, as the cluster-randomized trial has a maximum sample size of 92 departmental clusters. These patterns have undeniable practical significance for the advancement of women in science, and provide promising evidence that psychological interventions can facilitate gender equity and diversity.

  13. Eye-gaze determination of user intent at the computer interface

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Goldberg, J.H.; Schryver, J.C.

    1993-12-31

    Determining user intent at the computer interface through eye-gaze monitoring can significantly aid applications for the disabled, as well as telerobotics and process control interfaces. Whereas current eye-gaze control applications are limited to object selection and x/y gazepoint tracking, a methodology was developed here to discriminate a more abstract interface operation: zooming in or out. This methodology first collects samples of eye-gaze location at 30 Hz while the user looks at controlled stimuli, just prior to the user's decision to zoom. The sample is broken into data frames, or temporal snapshots. Within a data frame, all spatial samples are connected into a minimum spanning tree and then clustered according to user-defined parameters. Each cluster is mapped to one in the prior data frame, and statistics are computed for each cluster, including cluster size, position, and pupil size. A multiple discriminant analysis uses these statistics both within and between data frames to formulate optimal rules for assigning the observations to zoom-in, zoom-out, or no-zoom conditions. The statistical procedure effectively generates heuristics for future assignments based on these variables. Future work will enhance the accuracy and precision of the modeling technique and will empirically test users in controlled experiments.
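    The within-frame clustering step can be sketched as minimum-spanning-tree (MST) clustering: build the MST over the gaze samples in one frame, cut edges longer than a threshold, and treat the remaining connected components as clusters. The cut rule and the `max_edge` parameter are assumptions standing in for the unspecified "user-defined parameters". Per-cluster statistics are reduced here to size and centroid, and pupil size is omitted.

```python
import math

def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (length, i, j) edges."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known connection of each point to the tree
    parent = [-1] * n
    edges = []
    if n == 0:
        return edges
    best[0] = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges

def mst_clusters(points, max_edge):
    """Cut MST edges longer than max_edge; label the remaining connected components."""
    root = list(range(len(points)))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for length, i, j in mst_edges(points):
        if length <= max_edge:
            root[find(i)] = find(j)
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(points))]

def cluster_summaries(points, labels):
    """Per-cluster (size, centroid)."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return {lab: (len(ps), tuple(sum(c) / len(ps) for c in zip(*ps))) for lab, ps in groups.items()}

if __name__ == "__main__":
    # Hypothetical gaze samples (screen pixels) from one data frame.
    frame = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
    labels = mst_clusters(frame, max_edge=2.0)
    print(labels, cluster_summaries(frame, labels))
```

    Matching clusters across consecutive frames (for example, by nearest centroid) and feeding the summaries to a discriminant analysis would be further steps; they are not shown here.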

  14. Intrinsic alignment in redMaPPer clusters – II. Radial alignment of satellites towards cluster centres

    DOE PAGES

    Huang, Hung-Jin; Mandelbaum, Rachel; Freeman, Peter E.; ...

    2017-11-23

    We study the orientations of satellite galaxies in redMaPPer clusters constructed from the Sloan Digital Sky Survey at 0.1 < z < 0.35 to determine whether there is any preferential tendency for satellites to point radially towards cluster centres. Here, we analyse the satellite alignment (SA) signal based on three shape measurement methods (re-Gaussianization, de Vaucouleurs, and isophotal shapes), which trace galaxy light profiles at different radii. The measured SA signal depends on these shape measurement methods. We detect the strongest SA signal in isophotal shapes, followed by de Vaucouleurs shapes. While no net SA signal is detected using re-Gaussianization shapes across the entire sample, the observed SA signal reaches a statistically significant level when the sample is restricted to higher luminosity satellites. We further investigate the impact of noise, systematics, and real physical isophotal twisting effects in the comparison between the SA signal detected via different shape measurement methods. Unlike previous studies, which only consider the dependence of SA on a few parameters, here we explore a total of 17 galaxy and cluster properties, using a statistical model averaging technique to naturally account for parameter correlations and identify significant SA predictors. We find that the measured SA signal is strongest for satellites with the following characteristics: higher luminosity, smaller distance to the cluster centre, rounder shape, higher bulge fraction, and distribution preferentially along the major axis directions of their centrals. Finally, we provide physical explanations for the identified dependences and discuss the connection to theories of SA.

  15. Intrinsic alignment in redMaPPer clusters – II. Radial alignment of satellites towards cluster centres

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Huang, Hung-Jin; Mandelbaum, Rachel; Freeman, Peter E.

    We study the orientations of satellite galaxies in redMaPPer clusters constructed from the Sloan Digital Sky Survey at 0.1 < z < 0.35 to determine whether there is any preferential tendency for satellites to point radially towards cluster centres. Here, we analyse the satellite alignment (SA) signal based on three shape measurement methods (re-Gaussianization, de Vaucouleurs, and isophotal shapes), which trace galaxy light profiles at different radii. The measured SA signal depends on these shape measurement methods. We detect the strongest SA signal in isophotal shapes, followed by de Vaucouleurs shapes. While no net SA signal is detected using re-Gaussianization shapes across the entire sample, the observed SA signal reaches a statistically significant level when the sample is restricted to higher luminosity satellites. We further investigate the impact of noise, systematics, and real physical isophotal twisting effects in the comparison between the SA signal detected via different shape measurement methods. Unlike previous studies, which only consider the dependence of SA on a few parameters, here we explore a total of 17 galaxy and cluster properties, using a statistical model averaging technique to naturally account for parameter correlations and identify significant SA predictors. We find that the measured SA signal is strongest for satellites with the following characteristics: higher luminosity, smaller distance to the cluster centre, rounder shape, higher bulge fraction, and distribution preferentially along the major axis directions of their centrals. Finally, we provide physical explanations for the identified dependences and discuss the connection to theories of SA.

  16. Effect of a new motorway on social-spatial patterning of road traffic accidents: A retrospective longitudinal natural experimental study

    PubMed Central

    Mitchell, Richard; Ogilvie, David

    2017-01-01

    Background The World Health Organisation reports that road traffic accidents (accidents) could become the seventh leading cause of death globally by 2030. Accidents often occur in spatial clusters and, generally, there are more accidents in less advantaged areas. Infrastructure changes, such as new roads, can affect the locations and magnitude of accident clusters but evidence of impact is lacking. A new 5-mile motorway extension was opened in 2011 in Glasgow, Scotland. Previous research found no impact on the number of accidents but did not consider their spatial location or socio-economic setting. We evaluated impacts on these, both locally and city-wide. Methods We used STATS19 data covering the period 2008 to 2014 and describing the location and details of all reported accidents involving a personal injury. Poisson-based continuous scan statistics were used to detect spatial clusters of accidents and any change in these over time. Change in the socio-economic distribution of accident cluster locations during the study period was also assessed. Results In each year accidents were strongly clustered, with statistically significant clusters more likely to occur in socio-economically deprived areas. There was no significant shift in the magnitude or location of accident clusters during motorway construction or following opening, either locally or city-wide. There was also no impact on the socio-economic patterning of accident cluster locations. Conclusions Although urban infrastructure changes occur constantly, all around the world, this is the first study to evaluate the impact of such changes on road accident clusters. Despite expectations to the contrary from both proponents and opponents of the M74 extension, we found no beneficial or adverse change in the socio-spatial distribution of accidents associated with its construction, opening or operation. Our approach and findings can help inform urban planning internationally. PMID:28880956

  17. Spatial clustering and local risk of leprosy in São Paulo, Brazil.

    PubMed

    Ramos, Antônio Carlos Vieira; Yamamura, Mellina; Arroyo, Luiz Henrique; Popolin, Marcela Paschoal; Chiaravalloti Neto, Francisco; Palha, Pedro Fredemir; Uchoa, Severina Alice da Costa; Pieri, Flávia Meneguetti; Pinto, Ione Carvalho; Fiorati, Regina Célia; Queiroz, Ana Angélica Rêgo de; Belchior, Aylana de Souza; Dos Santos, Danielle Talita; Garcia, Maria Concebida da Cunha; Crispim, Juliane de Almeida; Alves, Luana Seles; Berra, Thaís Zamboni; Arcêncio, Ricardo Alexandre

    2017-02-01

    Although the detection rate is decreasing, the proportion of new cases with WHO grade 2 disability (G2D) is increasing, creating concern among policy makers and the Brazilian government. This study aimed to identify spatial clustering of leprosy and classify high-risk areas in a major leprosy cluster using the SaTScan method. Data were obtained for all leprosy cases diagnosed between January 2006 and December 2013. In addition to the clinical variable, information was also gathered regarding the G2D of the patient at diagnosis and after treatment. The spatial scan statistic developed by Kulldorff and Nagarwalla was used to identify spatial clustering and to measure the local risk (relative risk, RR) of leprosy. Maps showing these risks and their confidence intervals were constructed. A total of 434 cases were identified, including 188 (43.31%) borderline leprosy and 101 (23.28%) lepromatous leprosy cases. There was a predominance of males, with ages ranging from 15 to 59 years, and 51 patients (11.75%) presented G2D. Two significant spatial clusters and three significant spatial-temporal clusters were also observed. The main spatial cluster (p < 0.001) contained 90 census tracts, a population of approximately 58,438 inhabitants, a detection rate of 22.6 cases per 100,000 people and an RR of approximately 3.41 (95% CI = 2.721-4.267). Regarding the spatial-temporal clusters, two clusters were observed, with RRs ranging between 24.35 (95% CI = 11.133-52.984) and 15.24 (95% CI = 10.114-22.919). These findings could contribute to improvements in policies and programming aimed at the eradication of leprosy in Brazil. The spatial scan statistic was found to be a useful resource for health managers and healthcare professionals to map the vulnerability of areas in terms of leprosy transmission risk and areas of underreporting.
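    For each candidate window, the Kulldorff spatial scan statistic compares the observed case count c with the count e expected from population size. It uses the Poisson log-likelihood ratio LLR = c ln(c/e) + (C - c) ln((C - c)/(C - e)) for windows with c > e (and 0 otherwise), where C is the total number of cases. The relative risk is RR = (c/e) / ((C - c)/(C - e)). The sketch below is a simplified illustration under stated assumptions, not SaTScan's implementation. Circular windows are grown around each region centroid up to an assumed cap of 50% of the total population, every region is assumed to have a positive population, and significance is estimated by Monte Carlo redistribution of the C cases in proportion to population. All inputs are hypothetical.

```python
import math
import random

def poisson_llr(c, e, total):
    """Kulldorff's Poisson log-likelihood ratio for one window (0 unless c > e)."""
    if c <= e:
        return 0.0
    llr = c * math.log(c / e)
    if total > c:  # the outside term vanishes when every case is inside the window
        llr += (total - c) * math.log((total - c) / (total - e))
    return llr

def circular_windows(centroids, populations, max_frac=0.5):
    """Windows grown around each centroid by adding nearest regions, capped by population."""
    cap = max_frac * sum(populations)
    windows = []
    for ci in centroids:
        order = sorted(range(len(centroids)), key=lambda j: (math.dist(ci, centroids[j]), j))
        members, pop = [], 0.0
        for j in order:
            if pop + populations[j] > cap:
                break
            members.append(j)
            pop += populations[j]
            windows.append(tuple(members))
    return windows

def most_likely_cluster(cases, populations, windows):
    """Return (llr, window, relative_risk) for the window with the largest LLR."""
    total_c, total_p = sum(cases), sum(populations)
    best = (0.0, None, None)
    for w in windows:
        c = sum(cases[j] for j in w)
        e = total_c * sum(populations[j] for j in w) / total_p
        llr = poisson_llr(c, e, total_c)
        if llr > best[0]:
            rr = (c / e) / ((total_c - c) / (total_c - e)) if total_c > c else math.inf
            best = (llr, w, rr)
    return best

def monte_carlo_p(cases, populations, windows, n_sim=999, seed=1):
    """Monte Carlo p-value: redistribute all cases in proportion to population."""
    observed = most_likely_cluster(cases, populations, windows)[0]
    rng = random.Random(seed)
    n, total_c = len(cases), sum(cases)
    exceed = 0
    for _ in range(n_sim):
        sim = [0] * n
        for j in rng.choices(range(n), weights=populations, k=total_c):
            sim[j] += 1
        if most_likely_cluster(sim, populations, windows)[0] >= observed:
            exceed += 1
    return (exceed + 1) / (n_sim + 1)
```

    Used on four hypothetical regions, `most_likely_cluster` picks the single region with the strongest excess. Its RR is reported in the same form as the relative risks in the abstract above.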

  18. Effect of a new motorway on social-spatial patterning of road traffic accidents: A retrospective longitudinal natural experimental study.

    PubMed

    Olsen, Jonathan R; Mitchell, Richard; Ogilvie, David

    2017-01-01

    The World Health Organisation reports that road traffic accidents (accidents) could become the seventh leading cause of death globally by 2030. Accidents often occur in spatial clusters and, generally, there are more accidents in less advantaged areas. Infrastructure changes, such as new roads, can affect the locations and magnitude of accident clusters but evidence of impact is lacking. A new 5-mile motorway extension was opened in 2011 in Glasgow, Scotland. Previous research found no impact on the number of accidents but did not consider their spatial location or socio-economic setting. We evaluated impacts on these, both locally and city-wide. We used STATS19 data covering the period 2008 to 2014 and describing the location and details of all reported accidents involving a personal injury. Poisson-based continuous scan statistics were used to detect spatial clusters of accidents and any change in these over time. Change in the socio-economic distribution of accident cluster locations during the study period was also assessed. In each year accidents were strongly clustered, with statistically significant clusters more likely to occur in socio-economically deprived areas. There was no significant shift in the magnitude or location of accident clusters during motorway construction or following opening, either locally or city-wide. There was also no impact on the socio-economic patterning of accident cluster locations. Although urban infrastructure changes occur constantly, all around the world, this is the first study to evaluate the impact of such changes on road accident clusters. Despite expectations to the contrary from both proponents and opponents of the M74 extension, we found no beneficial or adverse change in the socio-spatial distribution of accidents associated with its construction, opening or operation. Our approach and findings can help inform urban planning internationally.

  19. Spatial clustering by disease severity among reported Rocky Mountain spotted fever cases in the United States, 2001-2005.

    PubMed

    Adjemian, Jennifer Zipser; Krebs, John; Mandel, Eric; McQuiston, Jennifer

    2009-01-01

    Rocky Mountain spotted fever (RMSF) occurs throughout much of the United States, ranging in clinical severity from moderate to fatal infection. Yet, little is known about possible differences among severity levels across geographic locations. To identify significant spatial clusters of severe and non-severe disease, RMSF cases reported to Centers for Disease Control and Prevention (CDC) were geocoded by county and classified by severity level. The statistical software program SaTScan was used to detect significant spatial clusters. Of 4,533 RMSF cases reported, 1,089 hospitalizations (168 with complications) and 23 deaths occurred. Significant clusters of 6 deaths (P = 0.05, RR = 11.4) and 19 hospitalizations with complications (P = 0.02, RR = 3.45) were detected in southwestern Tennessee. Two geographic areas were identified in north-central North Carolina with unusually low rates of severity (P = 0.001, RR = 0.62 and P = 0.001, RR = 0.45, respectively). Of all hospitalizations, 20% were clustered in central Oklahoma (P = 0.02, RR = 1.43). Significant geographic differences in severity were observed, suggesting that biologic and/or anthropogenic factors may be impacting RMSF epidemiology in the United States.

  20. Significance tests for functional data with complex dependence structure.

    PubMed

    Staicu, Ana-Maria; Lahiri, Soumen N; Carroll, Raymond J

    2015-01-01

    We propose an L2-norm-based global testing procedure for the null hypothesis that multiple group mean functions are equal, for functional data with complex dependence structure. Specifically, we consider the setting of functional data with a multilevel structure of the form groups-clusters or subjects-units, where the unit-level profiles are spatially correlated within the cluster, and the cluster-level data are independent. Orthogonal series expansions are used to approximate the group mean functions and the test statistic is estimated using the basis coefficients. The asymptotic null distribution of the test statistic is developed, under mild regularity conditions. To our knowledge this is the first work that studies hypothesis testing when data have such complex multilevel functional and spatial structure. Two small-sample alternatives, including a novel block bootstrap for functional data, are proposed, and their performance is examined in simulation studies. The paper concludes with an illustration of a motivating experiment.
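    The simplest case illustrates the basic idea of an L2-norm statistic built from basis coefficients: independent curves observed on a common grid over [0, 1]. For functions in the span of an orthonormal basis, Parseval's identity makes the squared L2 distance between two functions equal to the squared Euclidean distance between their coefficient vectors. The sketch below assumes a truncated cosine basis, trapezoidal-rule projections and a size-weighted between-group statistic. It ignores the multilevel and spatial dependence that the paper addresses, and calibrating the statistic (for example, with a bootstrap) is not shown.

```python
import math

def cosine_basis(k, t):
    """Orthonormal cosine basis on [0, 1]: 1, sqrt(2) cos(pi t), sqrt(2) cos(2 pi t), ..."""
    return 1.0 if k == 0 else math.sqrt(2.0) * math.cos(k * math.pi * t)

def trapezoid(ys, ts):
    """Trapezoidal-rule integral of samples ys over grid ts."""
    return sum((ts[i + 1] - ts[i]) * (ys[i] + ys[i + 1]) / 2.0 for i in range(len(ts) - 1))

def coefficients(curve, grid, n_basis):
    """Project one sampled curve onto the first n_basis basis functions."""
    return [trapezoid([y * cosine_basis(k, t) for y, t in zip(curve, grid)], grid)
            for k in range(n_basis)]

def l2_group_statistic(groups, grid, n_basis=5):
    """Size-weighted sum of squared L2 distances between group means and the grand mean.

    groups: list of groups, each a list of curves sampled on the same grid.
    """
    sizes, group_means = [], []
    for curves in groups:
        coefs = [coefficients(c, grid, n_basis) for c in curves]
        group_means.append([sum(col) / len(coefs) for col in zip(*coefs)])
        sizes.append(len(curves))
    n = sum(sizes)
    grand = [sum(s * m[k] for s, m in zip(sizes, group_means)) / n for k in range(n_basis)]
    # Parseval: squared coefficient distance stands in for the squared L2 distance.
    return sum(s * sum((m[k] - grand[k]) ** 2 for k in range(n_basis))
               for s, m in zip(sizes, group_means))
```

    Large values are evidence against equal group means. How large counts as significant depends on the null distribution, which under the paper's dependence structure is exactly what its asymptotic theory and block bootstrap provide.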

  1. Kappa statistic for clustered matched-pair data.

    PubMed

    Yang, Zhao; Zhou, Ming

    2014-07-10

    The kappa statistic is widely used to assess agreement between two procedures in independent matched-pair data. For matched-pair data collected in clusters, we propose, on the basis of the delta method and sampling techniques, a nonparametric variance estimator for the kappa statistic that requires no assumptions about the within-cluster correlation structure or the underlying distribution. The results of an extensive Monte Carlo simulation study demonstrate that the proposed kappa statistic provides consistent estimation and that the proposed variance estimator behaves reasonably well for at least a moderately large number of clusters (e.g., K ≥ 50). Compared with the variance estimator that ignores dependence within a cluster, the proposed variance estimator performs better in maintaining the nominal coverage probability when the intra-cluster correlation is fair (ρ ≥ 0.3), with more pronounced improvement as ρ increases further. To illustrate the practical application of the proposed estimator, we analyze two real data examples of clustered matched-pair data. Copyright © 2014 John Wiley & Sons, Ltd.
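    The kappa point estimate itself is simple to compute from pooled matched pairs: kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is chance agreement from the marginals. The paper's variance estimator is derived with the delta method and is not reproduced here. As a clearly different, swapped-in nonparametric alternative, the sketch below uses a delete-one-cluster jackknife, which also respects within-cluster dependence because it removes whole clusters at a time. Function names and data are hypothetical.

```python
import math
from collections import Counter

def kappa(pairs):
    """Cohen's kappa for a list of (rating_a, rating_b) matched pairs."""
    n = len(pairs)
    p_obs = sum(1 for a, b in pairs if a == b) / n
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    # Chance agreement from the two marginal distributions (missing keys count as 0).
    p_exp = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when expected agreement equals 1")
    return (p_obs - p_exp) / (1.0 - p_exp)

def cluster_jackknife(clusters):
    """Pooled kappa and its delete-one-cluster jackknife standard error.

    clusters: list of clusters, each a list of (rating_a, rating_b) pairs.
    """
    k = len(clusters)
    if k < 2:
        raise ValueError("need at least two clusters")
    estimate = kappa([p for c in clusters for p in c])
    leave_one_out = [kappa([p for j, c in enumerate(clusters) if j != i for p in c])
                     for i in range(k)]
    mean_loo = sum(leave_one_out) / k
    variance = (k - 1) / k * sum((x - mean_loo) ** 2 for x in leave_one_out)
    return estimate, math.sqrt(variance)
```

    The jackknife inherits a similar caveat to the one stated in the abstract: with only a few clusters, its standard error can be unreliable.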

  2. Spatial-temporal clustering of companion animal enteric syndrome: detection and investigation through the use of electronic medical records from participating private practices.

    PubMed

    Anholt, R M; Berezowski, J; Robertson, C; Stephen, C

    2015-09-01

    There is interest in the potential of companion animal surveillance to provide data to improve pet health and to provide early warning of environmental hazards to people. We implemented a companion animal surveillance system in Calgary, Alberta and the surrounding communities. Informatics technologies automatically extracted electronic medical records from participating veterinary practices and identified cases of enteric syndrome in the warehoused records. The data were analysed using time-series analyses and a retrospective space-time permutation scan statistic. We identified a seasonal pattern of reports of occurrences of enteric syndromes in companion animals and four statistically significant clusters of enteric syndrome cases. The cases within each cluster were examined and information about the animals involved (species, age, sex), their vaccination history, possible exposure or risk behaviour history, information about disease severity, and the aetiological diagnosis was collected. We then assessed whether the cases within the cluster were unusual and if they represented an animal or public health threat. There was often insufficient information recorded in the medical record to characterize the clusters by aetiology or exposures. Space-time analysis of companion animal enteric syndrome cases found evidence of clustering. Collection of more epidemiologically relevant data would enhance the utility of practice-based companion animal surveillance.

  3. Massively parallel unsupervised single-particle cryo-EM data clustering via statistical manifold learning

    PubMed Central

    Wu, Jiayi; Ma, Yong-Bei; Congdon, Charles; Brett, Bevin; Chen, Shuobing; Xu, Yaofang; Ouyang, Qi

    2017-01-01

    Structural heterogeneity in single-particle cryo-electron microscopy (cryo-EM) data represents a major challenge for high-resolution structure determination. Unsupervised classification may serve as the first step in the assessment of structural heterogeneity. However, traditional algorithms for unsupervised classification, such as K-means clustering and maximum likelihood optimization, may classify images into the wrong classes as the signal-to-noise ratio (SNR) of the image data decreases, while demanding increased computational cost. Overcoming these limitations requires further development of clustering algorithms for high-performance cryo-EM data processing. Here we introduce an unsupervised single-particle clustering algorithm derived from a statistical manifold learning framework called generative topographic mapping (GTM). We show that unsupervised GTM clustering improves classification accuracy by about 40% in the absence of input references for data with lower SNRs. Applications to several experimental datasets suggest that our algorithm can detect subtle structural differences among classes via a hierarchical clustering strategy. After code optimization for a high-performance computing (HPC) environment, our software implementation was able to generate thousands of reference-free class averages within hours in a massively parallel fashion, which allows a significant improvement in ab initio 3D reconstruction and assists in the computational purification of homogeneous datasets for high-resolution visualization. PMID:28786986

  4. Massively parallel unsupervised single-particle cryo-EM data clustering via statistical manifold learning.

    PubMed

    Wu, Jiayi; Ma, Yong-Bei; Congdon, Charles; Brett, Bevin; Chen, Shuobing; Xu, Yaofang; Ouyang, Qi; Mao, Youdong

    2017-01-01

    Structural heterogeneity in single-particle cryo-electron microscopy (cryo-EM) data represents a major challenge for high-resolution structure determination. Unsupervised classification may serve as the first step in the assessment of structural heterogeneity. However, traditional algorithms for unsupervised classification, such as K-means clustering and maximum likelihood optimization, may classify images into the wrong classes as the signal-to-noise ratio (SNR) of the image data decreases, while demanding increased computational cost. Overcoming these limitations requires further development of clustering algorithms for high-performance cryo-EM data processing. Here we introduce an unsupervised single-particle clustering algorithm derived from a statistical manifold learning framework called generative topographic mapping (GTM). We show that unsupervised GTM clustering improves classification accuracy by about 40% in the absence of input references for data with lower SNRs. Applications to several experimental datasets suggest that our algorithm can detect subtle structural differences among classes via a hierarchical clustering strategy. After code optimization for a high-performance computing (HPC) environment, our software implementation was able to generate thousands of reference-free class averages within hours in a massively parallel fashion, which allows a significant improvement in ab initio 3D reconstruction and assists in the computational purification of homogeneous datasets for high-resolution visualization.

  5. Global, local and focused geographic clustering for case-control data with residential histories

    PubMed Central

    Jacquez, Geoffrey M; Kaufmann, Andy; Meliker, Jaymie; Goovaerts, Pierre; AvRuskin, Gillian; Nriagu, Jerome

    2005-01-01

    Background This paper introduces a new approach for evaluating clustering in case-control data that accounts for residential histories. Although many statistics have been proposed for assessing local, focused and global clustering in health outcomes, few, if any, exist for evaluating clusters when individuals are mobile. Methods Local, global and focused tests for residential histories are developed based on sets of matrices of nearest neighbor relationships that reflect the changing topology of cases and controls. Exposure traces are defined that account for the latency between exposure and disease manifestation and that use exposure windows whose duration may vary. Several of the methods so derived are applied to evaluate clustering of residential histories in a case-control study of bladder cancer in southeastern Michigan. These data are still being collected and the analysis is conducted for demonstration purposes only. Results Statistically significant clustering of residential histories of cases was found but is likely due to delayed reporting of cases by one of the hospitals participating in the study. Conclusion Data with residential histories are preferable when causative exposures and disease latencies occur on a long enough time span that human mobility matters. To analyze such data, methods are needed that take residential histories into account. PMID:15784151

  6. A metric to search for relevant words

    NASA Astrophysics Data System (ADS)

    Zhou, Hongding; Slater, Gary W.

    2003-11-01

    We propose a new metric to evaluate and rank the relevance of words in a text. The method uses the density fluctuations of a word to compute an index that measures its degree of clustering. Highly significant words tend to form clusters, while common words are essentially uniformly spread in a text. If a word is not rare, the metric is stable when we move any individual occurrence of this word in the text. Furthermore, we prove that the metric always increases when words are moved to form larger clusters, or when several independent documents are merged. Using the Holy Bible as an example, we show that our approach reduces the significance of common words when compared to a recently proposed statistical metric.
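    To make a word's "degree of clustering" concrete, here is a sketch of a simpler, swapped-in statistic rather than the density-fluctuation metric proposed in this abstract: the standard deviation of the gaps between successive occurrences of a word, divided by the mean gap. Evenly spread words score near 0, while bursty words score higher. The function name and the three-occurrence minimum are assumptions made for illustration.

```python
import statistics


def spacing_sigma(tokens, word):
    """Std. deviation of the gaps between successive occurrences of `word`, over the mean gap.

    Returns None when the word occurs fewer than three times (too few gaps to judge).
    """
    positions = [i for i, tok in enumerate(tokens) if tok == word]
    if len(positions) < 3:
        return None
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return statistics.pstdev(gaps) / statistics.mean(gaps)
```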

  7. Hydrometeor classification through statistical clustering of polarimetric radar measurements: a semi-supervised approach

    NASA Astrophysics Data System (ADS)

    Besic, Nikola; Ventura, Jordi Figueras i.; Grazioli, Jacopo; Gabella, Marco; Germann, Urs; Berne, Alexis

    2016-09-01

    Polarimetric radar-based hydrometeor classification is the procedure of identifying different types of hydrometeors by exploiting polarimetric radar observations. The main drawback of the existing supervised classification methods, mostly based on fuzzy logic, is a significant dependency on a presumed electromagnetic behaviour of different hydrometeor types; that is, the classification results rely largely on the quality of scattering simulations. The unsupervised approach, in turn, lacks the constraints related to hydrometeor microphysics. The proposed method aims to compensate for these drawbacks by combining the two approaches so that microphysical hypotheses can, to a degree, adjust the content of the classes obtained statistically from the observations. This is done by means of an iterative approach, performed offline, which, in a statistical framework, examines clustered representative polarimetric observations by comparing them to the presumed polarimetric properties of each hydrometeor class. In addition to this comparison, a routine alters the content of clusters by encouraging further statistical clustering in case of non-identification. By merging all identified clusters, the multi-dimensional polarimetric signatures of various hydrometeor types are obtained for each of the studied representative datasets, i.e. for each radar system of interest. These are depicted by sets of centroids, which are then employed in the operational labelling of different hydrometeors. The method has been applied to three C-band datasets, each acquired by a different operational radar from the MeteoSwiss Rad4Alp network, as well as to two X-band datasets acquired by two mobile research radars. The results are discussed through a comparative analysis that includes corresponding supervised and unsupervised approaches, emphasising the operational potential of the proposed method.
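    The final operational step, labelling new observations with the centroid sets, can be sketched as a nearest-centroid rule. This is an assumption for illustration: the abstract does not specify the distance measure, so the code uses a Euclidean distance on differences divided by per-variable scales. The function name, the feature layout and the scales are hypothetical.

```python
import math


def nearest_centroid_label(obs, centroids, scales):
    """Return the class whose centroid is closest to `obs`.

    obs and each centroid are equal-length sequences of polarimetric variables;
    dividing each difference by `scales` keeps variables in different units comparable.
    centroids maps a class name to its centroid vector.
    """
    def distance(centroid):
        return math.sqrt(sum(((o - c) / s) ** 2
                             for o, c, s in zip(obs, centroid, scales)))

    return min(centroids, key=lambda name: distance(centroids[name]))
```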

  8. Canine parvovirus in Australia: the role of socio-economic factors in disease clusters.

    PubMed

    Brady, S; Norris, J M; Kelman, M; Ward, M P

    2012-08-01

    To identify clusters of canine parvovirus-related disease occurring in Australia during 2010 and investigate the role of socio-economic factors contributing to these clusters, reported cases of canine parvovirus were extracted from an online disease surveillance system. Reported residential postcode was used to locate cases, and clusters were identified using a scan statistic. Cases included in clusters were compared to those not included in such clusters with respect to human socioeconomic factors (postcode area relative socioeconomic disadvantage, economic resources, education and occupation) and dog factors (neuter status, breed, age, gender, vaccination status). During 2010, there were 1187 cases of canine parvovirus reported. Nineteen significant (P<0.05) disease clusters were identified, most commonly located in New South Wales. Eleven (58%) clusters occurred between April and July, and the average cluster length was 5.7 days. All clusters occurred in postcodes with a significantly (P<0.05) greater level of relative socioeconomic disadvantage and a lower rank in education and occupation, and clustered cases were less likely to have been neutered (P=0.004). No significant difference (P>0.05) was found between cases reported from cluster postcodes and those not within clusters for dog age, gender, breed or vaccination status (although the latter needs to be interpreted with caution, since vaccination was absent in most of the cases). Further research is required to investigate the apparent association between indicators of poor socioeconomic status and clusters of reported canine parvovirus disease; however, these initial findings may be useful for developing geographically and temporally targeted prevention and disease control programs. Copyright © 2012 Elsevier Ltd. All rights reserved.
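    For readers unfamiliar with the scan statistic, the following is a minimal purely spatial, circular-window Poisson scan in the style of Kulldorff's statistic. It is a sketch under stated assumptions, not this study's configuration: areas are represented by hypothetical centroid coordinates and positive populations, windows are capped at a fraction of the total population, the temporal dimension of the reported clusters is ignored, and significance comes from Monte Carlo redistribution of the observed case total in proportion to population.

```python
import math
import random


def poisson_llr(c, expected, total):
    """Kulldorff's Poisson log-likelihood ratio for a window with c of `total` cases."""
    if c <= expected:
        return 0.0  # only windows with elevated risk are scored
    outside_c, outside_e = total - c, total - expected
    llr = c * math.log(c / expected)
    if outside_c > 0:
        llr += outside_c * math.log(outside_c / outside_e)
    return llr


def circular_windows(coords, pops, max_frac=0.5):
    """Windows grown around each centroid by adding nearest areas, capped by population."""
    cap = max_frac * sum(pops)
    windows = []
    for i in range(len(coords)):
        order = sorted(range(len(coords)), key=lambda j: math.dist(coords[i], coords[j]))
        window, pop = [], 0
        for j in order:
            if pop + pops[j] > cap:
                break
            window.append(j)
            pop += pops[j]
            windows.append(list(window))
    return windows


def best_window(cases, pops, windows):
    """Window with the largest LLR (first one found wins ties)."""
    total_c, total_p = sum(cases), sum(pops)
    best_llr, best = 0.0, None
    for w in windows:
        c = sum(cases[j] for j in w)
        e = total_c * sum(pops[j] for j in w) / total_p  # expected under H0
        llr = poisson_llr(c, e, total_c)
        if llr > best_llr:
            best_llr, best = llr, w
    return best_llr, best


def spatial_scan(coords, cases, pops, n_sim=999, seed=0, max_frac=0.5):
    """Most likely cluster, its LLR and a Monte Carlo p-value."""
    rng = random.Random(seed)
    windows = circular_windows(coords, pops, max_frac)
    obs_llr, obs_window = best_window(cases, pops, windows)
    n, total_c = len(pops), sum(cases)
    exceed = 0
    for _ in range(n_sim):
        sim = [0] * n
        # Under H0 the observed case total is spread over areas in proportion to population.
        for idx in rng.choices(range(n), weights=pops, k=total_c):
            sim[idx] += 1
        if best_window(sim, pops, windows)[0] >= obs_llr:
            exceed += 1
    return obs_window, obs_llr, (exceed + 1) / (n_sim + 1)
```

    Comparing the observed maximum LLR with the maxima of simulated data, rather than testing each window separately, is what controls for the many overlapping windows examined.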

  9. Clustering of 3D-Structure Similarity Based Network of Secondary Metabolites Reveals Their Relationships with Biological Activities.

    PubMed

    Ohtana, Yuki; Abdullah, Azian Azamimi; Altaf-Ul-Amin, Md; Huang, Ming; Ono, Naoaki; Sato, Tetsuo; Sugiura, Tadao; Horai, Hisayuki; Nakamura, Yukiko; Morita Hirai, Aki; Lange, Klaus W; Kibinge, Nelson K; Katsuragi, Tetsuo; Shirai, Tsuyoshi; Kanaya, Shigehiko

    2014-12-01

    Developing database systems connecting diverse species based on omics is the most important theme in big data biology. To attain this purpose, we have developed KNApSAcK Family Databases, which have been used in many metabolomics studies. In the present study, we have developed a four-step network-based approach to analyze relationships between the 3D structure and biological activity of metabolites: construction of a network of metabolites based on structural similarity (Step 1), classification of metabolites into structure groups (Step 2), assessment of statistically significant relations between structure groups and biological activities (Step 3), and 2-dimensional clustering of the constructed data matrix based on statistically significant relations between structure groups and biological activities (Step 4). Applying this method to a data set consisting of 2072 secondary metabolites and 140 biological activities reported in KNApSAcK Metabolite Activity DB, we obtained 983 statistically significant structure group-biological activity pairs. As a whole, we systematically analyzed the relationship between 3D-chemical structures of metabolites and biological activities. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
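    Step 3 needs a test of whether a structure group is enriched for a given activity. The abstract does not name the test; a common choice, assumed here, is the one-sided hypergeometric (Fisher exact) test, sketched below with hypothetical argument names. With thousands of group-activity pairs, a multiple-testing correction would also be needed in practice.

```python
from math import comb


def enrichment_p(overlap, group_size, activity_size, total):
    """One-sided hypergeometric p-value P(X >= overlap).

    total: number of metabolites; activity_size: how many have the activity;
    group_size: metabolites in the structure group; overlap: group members with the activity.
    """
    upper = min(group_size, activity_size)
    hits = sum(comb(activity_size, x) * comb(total - activity_size, group_size - x)
               for x in range(overlap, upper + 1))
    return hits / comb(total, group_size)
```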

  10. Real- and redshift-space halo clustering in f(R) cosmologies

    NASA Astrophysics Data System (ADS)

    Arnalte-Mur, Pablo; Hellwing, Wojciech A.; Norberg, Peder

    2017-05-01

    We present two-point correlation function statistics of the mass and the haloes in the chameleon f(R) modified gravity scenario using a series of large-volume N-body simulations. Three distinct variations of f(R) are considered (F4, F5 and F6) and compared to a fiducial Λ cold dark matter (ΛCDM) model in the redshift range z ∈ [0, 1]. We find that the matter clustering is indistinguishable for all models except for F4, which shows a significantly steeper slope. The ratio of the redshift- to real-space correlation function at scales >20 h⁻¹ Mpc agrees with the linear General Relativity (GR) Kaiser formula for the viable f(R) models considered. We consider three halo populations characterized by spatial abundances comparable to that of luminous red galaxies and galaxy clusters. The redshift-space halo correlation functions of F4 and F5 deviate significantly from ΛCDM at intermediate and high redshift, as the f(R) halo bias is smaller than or equal to that of the ΛCDM case. Finally, we introduce a new model-independent clustering statistic to distinguish f(R) from GR: the relative halo clustering ratio, R. The sampling required to adequately reduce the scatter in R will be available with the advent of the next-generation galaxy redshift surveys. This will foster a prospective avenue to obtain largely model-independent cosmological constraints on this class of modified gravity models.
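    As background on the two-point correlation function used here, the sketch below estimates it from brute-force pair counts with the Landy-Szalay estimator. The estimator choice is an assumption (the abstract does not name one), the geometry is a non-periodic box in real space, and no redshift-space distortions are modelled; it is meant for small toy catalogues, not N-body outputs.

```python
import bisect
import math


def pair_counts(a, b, edges, same):
    """Histogram of separations; same=True counts each unordered pair in one catalogue once."""
    counts = [0] * (len(edges) - 1)
    for i, p in enumerate(a):
        for q in (b[i + 1:] if same else b):
            k = bisect.bisect_right(edges, math.dist(p, q)) - 1  # edges[k] <= r < edges[k+1]
            if 0 <= k < len(counts):
                counts[k] += 1
    return counts


def landy_szalay(data, randoms, edges):
    """xi(r) = (DD - 2DR + RR) / RR, each pair count normalised by its number of pairs."""
    nd, nr = len(data), len(randoms)
    dd = pair_counts(data, data, edges, same=True)
    rr = pair_counts(randoms, randoms, edges, same=True)
    dr = pair_counts(data, randoms, edges, same=False)
    xi = []
    for k in range(len(edges) - 1):
        if rr[k] == 0:
            xi.append(float("nan"))  # no random pairs: estimator undefined in this bin
            continue
        DD = dd[k] / (nd * (nd - 1) / 2)
        RR = rr[k] / (nr * (nr - 1) / 2)
        DR = dr[k] / (nd * nr)
        xi.append((DD - 2 * DR + RR) / RR)
    return xi
```

    A clumped catalogue gives ξ well above zero in the smallest separation bins, while a catalogue drawn like the randoms gives ξ scattered around zero.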

  11. Symptom Clusters in Advanced Cancer Patients: An Empirical Comparison of Statistical Methods and the Impact on Quality of Life.

    PubMed

    Dong, Skye T; Costa, Daniel S J; Butow, Phyllis N; Lovell, Melanie R; Agar, Meera; Velikova, Galina; Teckle, Paulos; Tong, Allison; Tebbutt, Niall C; Clarke, Stephen J; van der Hoek, Kim; King, Madeleine T; Fayers, Peter M

    2016-01-01

    Symptom clusters in advanced cancer can influence patient outcomes. There is large heterogeneity in the methods used to identify symptom clusters. This study aimed to investigate the consistency of symptom cluster composition in advanced cancer patients using different statistical methodologies for all patients across five primary cancer sites, and to examine which clusters predict functional status, a global assessment of health and global quality of life. Principal component analysis and exploratory factor analysis (with different rotation and factor selection methods) and hierarchical cluster analysis (with different linkage and similarity measures) were used on a data set of 1562 advanced cancer patients who completed the European Organization for the Research and Treatment of Cancer Quality of Life Questionnaire-Core 30. Four clusters consistently formed for many of the methods and cancer sites: tense-worry-irritable-depressed (emotional cluster), fatigue-pain, nausea-vomiting, and concentration-memory (cognitive cluster). The emotional cluster was a stronger predictor of overall quality of life than the other clusters. Fatigue-pain was a stronger predictor of overall health than the other clusters. The cognitive cluster and fatigue-pain predicted physical functioning, role functioning, and social functioning. The four identified symptom clusters were consistent across statistical methods and cancer types, although there were some noteworthy differences. Statistical derivation of symptom clusters is in need of greater methodological guidance. A psychosocial pathway in the management of symptom clusters may improve quality of life. Biological mechanisms underpinning symptom clusters need to be delineated by future research. A framework for evidence-based screening, assessment, treatment, and follow-up of symptom clusters in advanced cancer is essential. Copyright © 2016 American Academy of Hospice and Palliative Medicine. Published by Elsevier Inc. All rights reserved.

  12. Population Genomics and the Statistical Values of Race: An Interdisciplinary Perspective on the Biological Classification of Human Populations and Implications for Clinical Genetic Epidemiological Research

    PubMed Central

    Maglo, Koffi N.; Mersha, Tesfaye B.; Martin, Lisa J.

    2016-01-01

    The biological status and biomedical significance of the concept of race as applied to humans continue to be contentious issues despite the use of advanced statistical and clustering methods to determine continental ancestry. It is thus imperative for researchers to understand the limitations as well as potential uses of the concept of race in biology and biomedicine. This paper deals with the theoretical assumptions behind cluster analysis in human population genomics. Adopting an interdisciplinary approach, it demonstrates that the hypothesis that attributes the clustering of human populations to “frictional” effects of landform barriers at continental boundaries is empirically incoherent. It then contrasts the scientific status of the “cluster” and “cline” constructs in human population genomics, and shows how clusters may be instrumentally produced. It also shows how statistical values of race vindicate Darwin's argument that race is evolutionarily meaningless. Finally, the paper explains why, due to spatiotemporal parameters, evolutionary forces, and socio-cultural factors influencing population structure, continental ancestry may be pragmatically relevant to global and public health genomics. Overall, this work demonstrates that, from a biological systematic and evolutionary taxonomical perspective, human races/continental groups or clusters have no natural meaning or objective biological reality. In fact, the utility of racial categorizations in research and in clinics can be explained by spatiotemporal parameters, socio-cultural factors, and evolutionary forces affecting disease causation and treatment response. PMID:26925096

  13. White matter alterations in college football players: a longitudinal diffusion tensor imaging study.

    PubMed

    Mayinger, Michael Christian; Merchant-Borna, Kian; Hufschmidt, Jakob; Muehlmann, Marc; Weir, Isabelle Ruth; Rauchmann, Boris-Stephan; Shenton, Martha Elizabeth; Koerte, Inga Katharina; Bazarian, Jeffrey John

    2018-02-01

    The aim of this study was to evaluate longitudinal changes in the diffusion characteristics of brain white matter (WM) in collegiate athletes at three time points: prior to the start of the football season (T1), after one season of football (T2), and after a subsequent six months of no-contact rest (T3). Fifteen male collegiate football players and 5 male non-athlete student controls underwent diffusion MR imaging and computerized cognitive testing at all three timepoints. Whole-brain tract-based spatial statistics (TBSS) were used to compare fractional anisotropy (FA), radial diffusivity (RD), axial diffusivity (AD), and trace between all timepoints. Average diffusion values were obtained from statistically significant clusters for each individual. No athlete suffered a concussion during the study period. After one season of play (T1 to T2), we observed a significant increase in trace in a cluster located in the brainstem and left temporal lobe, and a significant increase in FA in the left parietal lobe. After six months of no-contact rest (T2 to T3), there was a significant decrease in trace and FA in clusters that were partially overlapping or in close proximity with the initial clusters (T1 to T2), with no significant changes from T1 to T3. Repetitive head impacts (RHI) sustained during a single football season may result in alterations of the brain's WM in collegiate football players. These changes appear to return to baseline after 6 months of no-contact rest, suggesting remission of WM alterations. Our preliminary results suggest that collegiate football players might benefit from periods without exposure to RHI.

  14. Relative efficiency and sample size for cluster randomized trials with variable cluster sizes.

    PubMed

    You, Zhiying; Williams, O Dale; Aban, Inmaculada; Kabagambe, Edmond Kato; Tiwari, Hemant K; Cutter, Gary

    2011-02-01

    The statistical power of cluster randomized trials depends on two sample size components, the number of clusters per group and the numbers of individuals within clusters (cluster size). Variable cluster sizes are common and this variation alone may have significant impact on study power. Previous approaches have taken this into account by either adjusting total sample size using a designated design effect or adjusting the number of clusters according to an assessment of the relative efficiency of unequal versus equal cluster sizes. This article defines a relative efficiency of unequal versus equal cluster sizes using noncentrality parameters, investigates properties of this measure, and proposes an approach for adjusting the required sample size accordingly. We focus on comparing two groups with normally distributed outcomes using a t-test, and use the noncentrality parameter to define the relative efficiency of unequal versus equal cluster sizes and show that statistical power depends only on this parameter for a given number of clusters. We calculate the sample size required for an unequal cluster sizes trial to have the same power as one with equal cluster sizes. Relative efficiency based on the noncentrality parameter is straightforward to calculate and easy to interpret. It connects the required mean cluster size directly to the required sample size with equal cluster sizes. Consequently, our approach first determines the sample size requirements with equal cluster sizes for a pre-specified study power and then calculates the required mean cluster size while keeping the number of clusters unchanged. Our approach allows adjustment in mean cluster size alone or simultaneous adjustment in mean cluster size and number of clusters, and is a flexible alternative to and a useful complement to existing methods. Comparison indicated that we have defined a relative efficiency that is greater than the relative efficiency in the literature under some conditions. Our measure of relative efficiency might be less than the measure in the literature under some conditions, underestimating the relative efficiency. The relative efficiency of unequal versus equal cluster sizes defined using the noncentrality parameter suggests a sample size approach that is a flexible alternative and a useful complement to existing methods.
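    For comparison with the noncentrality-based relative efficiency described above, here is a sketch of the conventional route the abstract mentions: inflating an individually randomised sample size by a design effect. It uses the widely cited approximation DE = 1 + ((CV² + 1)m̄ − 1)ρ for variable cluster sizes, where m̄ is the mean cluster size, CV its coefficient of variation and ρ the intracluster correlation. It is not the authors' method, and the function names are hypothetical.

```python
import math
import statistics


def design_effect(cluster_sizes, icc):
    """Design effect 1 + ((CV^2 + 1) * mean_size - 1) * icc for variable cluster sizes."""
    mean_size = statistics.mean(cluster_sizes)
    cv = statistics.pstdev(cluster_sizes) / mean_size
    return 1 + ((cv ** 2 + 1) * mean_size - 1) * icc


def clusters_per_arm(n_individual, cluster_sizes, icc):
    """Clusters needed per arm after inflating an individually randomised sample size."""
    n_inflated = n_individual * design_effect(cluster_sizes, icc)
    return math.ceil(n_inflated / statistics.mean(cluster_sizes))
```

    With equal cluster sizes (CV = 0) the expression reduces to the familiar 1 + (m − 1)ρ; any spread in cluster sizes increases it.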

  15. Sensitivity and Specificity of Interictal EEG-fMRI for Detecting the Ictal Onset Zone at Different Statistical Thresholds

    PubMed Central

    Tousseyn, Simon; Dupont, Patrick; Goffin, Karolien; Sunaert, Stefan; Van Paesschen, Wim

    2014-01-01

    There is currently a lack of knowledge about electroencephalography (EEG)-functional magnetic resonance imaging (fMRI) specificity. Our aim was to define sensitivity and specificity of blood oxygen level dependent (BOLD) responses to interictal epileptic spikes during EEG-fMRI for detecting the ictal onset zone (IOZ). We studied 21 refractory focal epilepsy patients who had a well-defined IOZ after a full presurgical evaluation and interictal spikes during EEG-fMRI. Areas of spike-related BOLD changes overlapping the IOZ in patients were considered true positives; if no overlap was found, they were treated as false negatives. Matched healthy case-controls had undergone similar EEG-fMRI in order to determine true-negative and false-positive fractions. The spike-related regressor of the patient was used in the design matrix of the healthy case-control. Suprathreshold BOLD changes in the brain of controls were considered false positives, and absence of these changes as true negatives. Sensitivity and specificity were calculated for different statistical thresholds at the voxel level combined with different cluster size thresholds and represented in receiver operating characteristic (ROC) curves. Additionally, we calculated the ROC curves based on the cluster containing the maximal significant activation. We achieved a combination of 100% specificity and 62% sensitivity, using a Z-threshold in the interval 3.4–3.5 and cluster size threshold of 350 voxels. We could obtain higher sensitivity at the expense of specificity. Similar performance was found when using the cluster containing the maximal significant activation. Our data provide a guideline for different EEG-fMRI settings with their respective sensitivity and specificity for detecting the IOZ. The unique cluster containing the maximal significant BOLD activation was a sensitive and specific marker of the IOZ. PMID:25101049
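    The sensitivity/specificity bookkeeping across a grid of voxel-level Z thresholds and cluster-size thresholds can be sketched as follows. This is a deliberate simplification: each subject is reduced to a hypothetical list of (peak Z, cluster size) pairs that do not change with the threshold, whereas in real EEG-fMRI maps cluster extent depends on the voxel threshold. For patients only clusters overlapping the IOZ are listed; for controls, all clusters.

```python
def passes(cluster, z_thr, k_thr):
    """A (peak_z, size_in_voxels) cluster survives both the voxel and the extent threshold."""
    peak_z, size = cluster
    return peak_z >= z_thr and size >= k_thr


def sens_spec(patient_ioz_clusters, control_clusters, z_thr, k_thr):
    """Sensitivity and specificity at one (Z, cluster-size) threshold pair.

    patient_ioz_clusters[i]: clusters of patient i overlapping the IOZ (any survivor -> TP).
    control_clusters[i]: all clusters of control i (any survivor -> FP).
    """
    tp = sum(any(passes(c, z_thr, k_thr) for c in cl) for cl in patient_ioz_clusters)
    fp = sum(any(passes(c, z_thr, k_thr) for c in cl) for cl in control_clusters)
    fn = len(patient_ioz_clusters) - tp
    tn = len(control_clusters) - fp
    return tp / (tp + fn), tn / (tn + fp)


def roc_grid(patients, controls, z_grid, k_grid):
    """(sensitivity, specificity) for every combination of thresholds, e.g. for ROC curves."""
    return {(z, k): sens_spec(patients, controls, z, k) for z in z_grid for k in k_grid}
```

    Raising either threshold can only remove surviving clusters, which is why sensitivity is traded for specificity along the grid.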

  16. Accurate Modeling of Galaxy Clustering on Small Scales: Testing the Standard ΛCDM + Halo Model

    NASA Astrophysics Data System (ADS)

    Sinha, Manodeep; Berlind, Andreas A.; McBride, Cameron; Scoccimarro, Roman

    2015-01-01

    The large-scale distribution of galaxies can be explained fairly simply by assuming (i) a cosmological model, which determines the dark matter halo distribution, and (ii) a simple connection between galaxies and the halos they inhabit. This conceptually simple framework, called the halo model, has been remarkably successful at reproducing the clustering of galaxies on all scales, as observed in various galaxy redshift surveys. However, none of these previous studies have carefully modeled the systematics and thus truly tested the halo model in a statistically rigorous sense. We present a new accurate and fully numerical halo model framework and test it against clustering measurements from two luminosity samples of galaxies drawn from the SDSS DR7. We show that the simple ΛCDM cosmology + halo model is not able to simultaneously reproduce the galaxy projected correlation function and the group multiplicity function. In particular, the more luminous sample shows significant tension with theory. We discuss the implications of our findings and how this work paves the way for constraining galaxy formation by accurate simultaneous modeling of multiple galaxy clustering statistics.

  17. Genotyping and spatial analysis of pulmonary tuberculosis and diabetes cases in the state of Veracruz, Mexico.

    PubMed

    Blanco-Guillot, Francles; Castañeda-Cediel, M Lucía; Cruz-Hervert, Pablo; Ferreyra-Reyes, Leticia; Delgado-Sánchez, Guadalupe; Ferreira-Guerrero, Elizabeth; Montero-Campos, Rogelio; Bobadilla-Del-Valle, Miriam; Martínez-Gamboa, Rosa Areli; Torres-González, Pedro; Téllez-Vazquez, Norma; Canizales-Quintero, Sergio; Yanes-Lane, Mercedes; Mongua-Rodríguez, Norma; Ponce-de-León, Alfredo; Sifuentes-Osornio, José; García-García, Lourdes

    2018-01-01

    Genotyping and georeferencing in tuberculosis (TB) have been used to characterize the distribution of the disease and the occurrence of transmission within specific groups and communities. The objective of this study was to test the hypothesis that diabetes mellitus (DM) and pulmonary TB may occur in spatial and molecular aggregations. This was a retrospective cohort study of patients with pulmonary TB. The study area included 12 municipalities in the Sanitary Jurisdiction of Orizaba, Veracruz, México. Patients with acid-fast bacilli in sputum smears and/or Mycobacterium tuberculosis in sputum cultures were recruited from 1995 to 2010. Clinical (standardized questionnaire, physical examination, chest X-ray, blood glucose test and HIV test), microbiological, epidemiological, and molecular evaluations were carried out. Patients were considered "genotype-clustered" if two or more isolates from different patients were identified within 12 months of each other and either had six or more IS6110 bands in an identical pattern, or had < 6 bands with identical IS6110 RFLP patterns and a spoligotype with the same spacer oligonucleotides. Residential and health care center addresses were georeferenced using a Jeep hand-held GPS, and the coordinates were transferred from the GPS files to ArcGIS using ArcMap 9.3. We evaluated global spatial aggregation of patients in IS6110-RFLP/spoligotype clusters using global Moran's I. Since the global distribution was not random, we evaluated "hotspots" using the Getis-Ord Gi* statistic. Using bivariate and multivariate analysis, we analyzed sociodemographic, behavioral, clinical and bacteriological conditions associated with "hotspots". We used STATA® v13.1 for all statistical analyses. From 1995 to 2010, 1,370 patients aged >20 years were diagnosed with pulmonary TB; 33% had DM. The proportion of isolates that were genotyped was 80.7% (n = 1105), of which 31% (n = 342) were grouped in 91 genotype clusters with 2 to 23 patients each; 65.9% of all clusters were small (2 members), involving 35.08% of patients. Twenty-three percent (22.7%) of cases were classified as recent transmission. Moran's I indicated that the distribution of patients in IS6110-RFLP/spoligotype clusters was not random (Moran's I = 0.035468, Z value = 7.0, p = 0.00). Local spatial analysis showed statistically significant spatial aggregation of patients in IS6110-RFLP/spoligotype clusters, identifying "hotspots" and "coldspots". The Gi* statistic showed that the hotspot for spatial clustering was located in Camerino Z. Mendoza municipality; 14.6% (50/342) of patients in genotype clusters were located in a hotspot, and of these, 60% (30/50) lived with DM. Using logistic regression, the statistically significant variables associated with hotspots were DM [adjusted Odds Ratio (aOR) 7.04, 95% Confidence interval (CI) 3.03-16.38] and attending the health center in Camerino Z. Mendoza (aOR 18.04, 95% CI 7.35-44.28). The combination of molecular and epidemiological information with geospatial data allowed us to identify the concurrence of molecular clustering and spatial aggregation of patients with DM and TB. This information may be highly useful for TB control programs.
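    The two spatial statistics used here can be sketched from their textbook definitions: global Moran's I and the Getis-Ord Gi* z-score. The sketch assumes a user-supplied spatial weights matrix (for example binary adjacency) with a zero diagonal; Gi* adds each location to its own neighbourhood with weight 1. It is not the ArcGIS implementation, and no significance test is included; the function names are illustrative.

```python
import math


def morans_i(values, weights):
    """Global Moran's I; weights[i][j] is the spatial weight between i and j (zero diagonal)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / w_total) * cross / sum(d * d for d in dev)


def getis_ord_gi_star(values, weights):
    """Gi* z-scores; each location is included in its own neighbourhood with weight 1."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    scores = []
    for i in range(n):
        w = [1.0 if j == i else weights[i][j] for j in range(n)]
        sw = sum(w)
        sw2 = sum(x * x for x in w)
        num = sum(wj * vj for wj, vj in zip(w, values)) - mean * sw
        den = s * math.sqrt((n * sw2 - sw ** 2) / (n - 1))
        scores.append(num / den)
    return scores
```

    Positive Moran's I indicates that similar values sit next to each other; large positive Gi* scores flag "hotspots" and large negative ones "coldspots".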

  18. Space-Time Analysis of Testicular Cancer Clusters Using Residential Histories: A Case-Control Study in Denmark

    PubMed Central

    Sloan, Chantel D.; Nordsborg, Rikke B.; Jacquez, Geoffrey M.; Raaschou-Nielsen, Ole; Meliker, Jaymie R.

    2015-01-01

    Though the etiology is largely unknown, testicular cancer incidence has seen recent significant increases in northern Europe and throughout many Western regions. Testicular cancer is the most common cancer in males under age 40, and age-period-cohort models have posited exposures in the in utero environment or in early childhood as possible causes of increased risk. Some of these factors may be tied to geography through their association with behavioral, cultural, sociodemographic or built-environment characteristics. If so, this could result in detectable geographic clusters of cases that could lead to hypotheses regarding environmental targets for intervention. Given the latency period between exposure to an environmental carcinogen and testicular cancer diagnosis, mobility histories are beneficial for spatial cluster analyses. Nearest-neighbor-based Q-statistics allow changes in residence to be incorporated into spatial disease cluster detection. Using these methods, a space-time cluster analysis was conducted on a population-wide case-control population selected from the Danish Cancer Registry, with mobility histories since 1971 extracted from the Danish Civil Registration System. Cases (N = 3297) were diagnosed between 1991 and 2003, and two sets of controls (N = 3297 for each set) matched on sex and date of birth were included in the study. We also examined spatial patterns in maternal residential history for those cases and controls born in 1971 or later (N = 589 case-control pairs). Several small clusters were detected when aligning individuals by year prior to diagnosis, age at diagnosis and calendar year of diagnosis. However, the largest of these clusters contained only 2 statistically significant individuals at their center, and the clusters were not replicated in SaTScan spatial-only analyses, which are less susceptible to multiple testing bias. We found little evidence of local clusters in residential histories of testicular cancer cases in this Danish population. PMID:25756204

  19. Space-time analysis of testicular cancer clusters using residential histories: a case-control study in Denmark.

    PubMed

    Sloan, Chantel D; Nordsborg, Rikke B; Jacquez, Geoffrey M; Raaschou-Nielsen, Ole; Meliker, Jaymie R

    2015-01-01

    Though the etiology is largely unknown, testicular cancer incidence has seen recent significant increases in northern Europe and throughout many Western regions. Testicular cancer is the most common cancer in males under age 40, and age-period-cohort models have posited exposures in the in utero environment or in early childhood as possible causes of increased risk. Some of these factors may be tied to geography through their association with behavioral, cultural, sociodemographic or built-environment characteristics. If so, this could result in detectable geographic clusters of cases that could lead to hypotheses regarding environmental targets for intervention. Given the latency period between exposure to an environmental carcinogen and testicular cancer diagnosis, mobility histories are beneficial for spatial cluster analyses. Nearest-neighbor-based Q-statistics allow changes in residence to be incorporated into spatial disease cluster detection. Using these methods, a space-time cluster analysis was conducted on a population-wide case-control population selected from the Danish Cancer Registry, with mobility histories since 1971 extracted from the Danish Civil Registration System. Cases (N = 3297) were diagnosed between 1991 and 2003, and two sets of controls (N = 3297 for each set) matched on sex and date of birth were included in the study. We also examined spatial patterns in maternal residential history for those cases and controls born in 1971 or later (N = 589 case-control pairs). Several small clusters were detected when aligning individuals by year prior to diagnosis, age at diagnosis and calendar year of diagnosis. However, the largest of these clusters contained only 2 statistically significant individuals at their center, and the clusters were not replicated in SaTScan spatial-only analyses, which are less susceptible to multiple testing bias. We found little evidence of local clusters in residential histories of testicular cancer cases in this Danish population.

  20. A note on the kappa statistic for clustered dichotomous data.

    PubMed

    Zhou, Ming; Yang, Zhao

    2014-06-30

    The kappa statistic is widely used to assess the agreement between two raters. Motivated by a simulation-based cluster bootstrap method to calculate the variance of the kappa statistic for clustered physician-patients dichotomous data, we investigate its special correlation structure and develop a new simple and efficient data generation algorithm. For the clustered physician-patients dichotomous data, based on the delta method and its special covariance structure, we propose a semi-parametric variance estimator for the kappa statistic. An extensive Monte Carlo simulation study is performed to evaluate the performance of the new proposal and five existing methods with respect to the empirical coverage probability, root-mean-square error, and average width of the 95% confidence interval for the kappa statistic. The variance estimator ignoring the dependence within a cluster is generally inappropriate, and the variance estimators from the new proposal, bootstrap-based methods, and the sampling-based delta method perform reasonably well for at least a moderately large number of clusters (e.g., the number of clusters K ⩾ 50). The new proposal and sampling-based delta method provide convenient tools for efficient computations and non-simulation-based alternatives to the existing bootstrap-based methods. Moreover, the new proposal has acceptable performance even when the number of clusters is as small as K = 25. To illustrate the practical application of all the methods, one psychiatric research dataset and two simulated clustered physician-patients dichotomous datasets are analyzed. Copyright © 2014 John Wiley & Sons, Ltd.
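    The quantities discussed here can be sketched as follows: Cohen's kappa for two raters on 0/1 ratings, plus a simple cluster bootstrap variance that resamples whole clusters (e.g. physicians) with replacement, so within-cluster dependence is preserved. The semi-parametric delta-method estimator proposed in the abstract is not implemented; the function names and the cluster data layout are assumptions.

```python
import random
import statistics


def cohens_kappa(r1, r2):
    """Cohen's kappa for two raters' 0/1 ratings; None if chance agreement equals 1."""
    n = len(r1)
    p_obs = sum(a == b for a, b in zip(r1, r2)) / n
    p1, p2 = sum(r1) / n, sum(r2) / n
    p_exp = p1 * p2 + (1 - p1) * (1 - p2)
    if p_exp == 1:
        return None  # both raters used one identical category: kappa undefined
    return (p_obs - p_exp) / (1 - p_exp)


def cluster_bootstrap_variance(clusters, n_boot=2000, seed=0):
    """Variance of kappa from resampling whole clusters with replacement.

    clusters: list of (ratings_by_rater1, ratings_by_rater2) tuples, one per cluster.
    """
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_boot):
        sample = [rng.choice(clusters) for _ in clusters]
        r1 = [x for c in sample for x in c[0]]
        r2 = [x for c in sample for x in c[1]]
        k = cohens_kappa(r1, r2)
        if k is not None:  # skip degenerate resamples
            replicates.append(k)
    return statistics.variance(replicates)
```

    Resampling individual ratings instead of clusters would ignore the within-cluster correlation, which is the "variance estimator ignoring the dependence" that the abstract finds generally inappropriate.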

  1. Incidence trend and risk factors for campylobacter infections in humans in Norway

    PubMed Central

    Sandberg, Marianne; Nygård, Karin; Meldal, Hege; Valle, Paul Steinar; Kruse, Hilde; Skjerve, Eystein

    2006-01-01

    Background The objectives of the study were to evaluate whether the increase in incidence of campylobacteriosis observed in humans in Norway from 1995 to 2001 was statistically significant, and whether different biologically plausible risk factors were associated with the incidence of campylobacteriosis in the different counties in Norway. Methods To model the incidence of domestically acquired campylobacteriosis from 1995 to 2001, a population-average random-effects Poisson model was applied (the trend model). An equivalent model accounting for geographical clustering (the ecological model) was applied to case data and to assumed risk-factor/protective data from 2000 and 2001, such as sale of chicken, receipt of treated drinking water, density of dogs and grazing animals, occupation of people in the municipalities and climatic factors. Results The increase in incidence of campylobacteriosis in humans in Norway from 1995 to 2001 was statistically significant from 1998. Treated water was a protective factor against Campylobacter infections in humans with an IRR of 0.78 per percentage increase in people supplied. The two-level modelling technique showed no evidence of clustering of campylobacteriosis in any particular county. Aggregation of data on municipality level makes interpretation of the results at the individual level difficult. Conclusion The increase in incidence of Campylobacter infections in humans from 1995 to 2001 was statistically significant from 1998. Treated water was a protective factor against Campylobacter infections in humans with an IRR of 0.78 per percentage increase in people supplied. Campylobacter infections did not appear to be clustered in any particular county in Norway. PMID:16827925

  2. Accounting for regional background and population size in the detection of spatial clusters and outliers using geostatistical filtering and spatial neutral models: the case of lung cancer in Long Island, New York

    PubMed Central

    Goovaerts, Pierre; Jacquez, Geoffrey M

    2004-01-01

    Background Complete Spatial Randomness (CSR) is the null hypothesis employed by many statistical tests for spatial pattern, such as local cluster or boundary analysis. However, CSR is not a relevant null hypothesis for highly complex and organized systems, such as those encountered in the environmental and health sciences, in which underlying spatial pattern is present. This paper presents a geostatistical approach to filter the noise caused by spatially varying population size and to generate spatially correlated neutral models that account for regional background obtained by geostatistical smoothing of observed mortality rates. These neutral models were used in conjunction with the local Moran statistics to identify spatial clusters and outliers in the geographical distribution of male and female lung cancer in Nassau, Queens, and Suffolk counties, New York, USA. Results We developed a typology of neutral models that progressively relaxes the assumptions of null hypotheses, allowing for the presence of spatial autocorrelation, non-uniform risk, and incorporation of spatially heterogeneous population sizes. Incorporation of spatial autocorrelation led to fewer significant ZIP codes than found in previous studies, confirming earlier claims that CSR can lead to over-identification of the number of significant spatial clusters or outliers. Accounting for population size through geostatistical filtering increased the size of clusters while removing most of the spatial outliers. Integration of regional background into the neutral models yielded substantially different spatial clusters and outliers, leading to the identification of ZIP codes where SMR values significantly depart from their regional background. Conclusion The approach presented in this paper enables researchers to assess geographic relationships using appropriate null hypotheses that account for the background variation extant in real-world systems.
In particular, this new methodology allows one to identify geographic pattern above and beyond background variation. The implementation of this approach in spatial statistical software will facilitate the detection of spatial disparities in mortality rates, establishing the rationale for targeted cancer control interventions, including consideration of health services needs, and resource allocation for screening and diagnostic testing. It will allow researchers to systematically evaluate how sensitive their results are to assumptions implicit under alternative null hypotheses. PMID:15272930
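The local Moran statistic used in this study can be sketched as follows. This minimal version assumes a user-supplied binary spatial weight matrix with zeros on the diagonal, and computes each I_i together with a one-sided conditional-permutation pseudo p-value that tests for positive local autocorrelation (high-high or low-low clusters). Note that the shuffling null used here is the conventional randomization null that the abstract argues can over-identify clusters; it does not implement the authors' geostatistical filtering or spatially correlated neutral models. All function names are hypothetical.

```python
# Illustrative sketch of local Moran's I under a randomization null;
# function names are hypothetical, not from the paper or any library.
import random
from typing import List, Sequence, Tuple


def _deviations(values: Sequence[float]) -> Tuple[List[float], float]:
    """Deviations from the mean, and m2 = sum(z_i^2) / n."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(zi * zi for zi in z) / n
    if m2 == 0:
        raise ValueError("all values are identical; local Moran's I is undefined")
    return z, m2


def local_morans_i(values: Sequence[float], weights: List[List[float]]) -> List[float]:
    """I_i = z_i * sum_j w_ij z_j / m2, assuming weights[i][i] == 0."""
    z, m2 = _deviations(values)
    n = len(z)
    return [z[i] * sum(weights[i][j] * z[j] for j in range(n)) / m2 for i in range(n)]


def local_moran_pseudo_p(values: Sequence[float], weights: List[List[float]],
                         n_perm: int = 999, seed: int = 0) -> Tuple[List[float], List[float]]:
    """Observed I_i and one-sided (high-side) conditional-permutation pseudo p-values."""
    rng = random.Random(seed)
    z, m2 = _deviations(values)
    n = len(z)
    observed = local_morans_i(values, weights)
    p_values = []
    for i in range(n):
        others_idx = [j for j in range(n) if j != i]
        others_z = [z[j] for j in others_idx]
        hits = 0
        for _ in range(n_perm):
            rng.shuffle(others_z)  # z_i stays fixed; the rest are reassigned at random
            lag = sum(weights[i][j] * zj for j, zj in zip(others_idx, others_z))
            if z[i] * lag / m2 >= observed[i]:
                hits += 1
        p_values.append((hits + 1) / (n_perm + 1))
    return observed, p_values
```

Replacing the shuffle with draws from a spatially correlated, background-aware simulation is, conceptually, where a neutral-model approach like the one described above would differ.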

  3. DICON: interactive visual analysis of multidimensional clusters.

    PubMed

    Cao, Nan; Gotz, David; Sun, Jimeng; Qu, Huamin

    2011-12-01

    Clustering, a fundamental data analysis technique, has been widely used in many analytic applications. However, it is often difficult for users to understand and evaluate multidimensional clustering results, especially the quality of clusters and their semantics. For large and complex data, high-level statistical information about the clusters is often needed for users to evaluate cluster quality, while a detailed display of the multidimensional attributes of the data is necessary to understand the meaning of clusters. In this paper, we introduce DICON, an icon-based cluster visualization that embeds statistical information into a multi-attribute display to facilitate cluster interpretation, evaluation, and comparison. We design a treemap-like icon to represent a multidimensional cluster, and the quality of the cluster can be conveniently evaluated with the embedded statistical information. We further develop a novel layout algorithm that generates similar icons for similar clusters, making comparisons of clusters easier. User interaction and clutter reduction are integrated into the system to help users analyze and refine clustering results for large datasets more effectively. We demonstrate the power of DICON through a user study and a case study in the healthcare domain. Our evaluation shows the benefits of the technique, especially in support of complex multidimensional cluster analysis. © 2011 IEEE

  4. Performance analysis of clustering techniques over microarray data: A case study

    NASA Astrophysics Data System (ADS)

    Dash, Rasmita; Misra, Bijan Bihari

    2018-03-01

    Handling big data is one of the major issues in the field of statistical data analysis, and cluster analysis plays a vital role in dealing with such large-scale data. Many clustering techniques exist, each with a different approach to cluster analysis, but it is difficult to predict which approach suits a particular dataset. To deal with this problem, a grading approach is applied across many clustering techniques to identify a stable technique. However, the grading depends on the characteristics of the dataset as well as on the validity indices, so a two-stage grading approach is implemented. In this study, the grading approach is applied to five clustering techniques: hybrid swarm based clustering (HSC), k-means, partitioning around medoids (PAM), vector quantization (VQ), and agglomerative nesting (AGNES). The experiments are conducted on five microarray datasets with seven validity indices. The grading approach's finding that a clustering technique is significant is also confirmed by the Nemenyi post-hoc test.

  5. Use of multivariate statistics to identify unreliable data obtained using CASA.

    PubMed

    Martínez, Luis Becerril; Crispín, Rubén Huerta; Mendoza, Maximino Méndez; Gallegos, Oswaldo Hernández; Martínez, Andrés Aragón

    2013-06-01

    To identify unreliable data in a dataset of motility parameters obtained from a pilot study by a veterinarian experienced in boar semen handling but inexperienced in operating a computer-assisted sperm analysis (CASA) system, a multivariate graphical and statistical analysis was performed. Sixteen boar semen samples were aliquoted, incubated with progesterone at concentrations from 0 to 3.33 µg/ml, and analyzed in a CASA system. After standardization of the data, Chernoff faces were drawn for each measurement, and principal component analysis (PCA) was used to reduce the dimensionality and pre-process the data before hierarchical clustering. The first twelve individual measurements showed abnormal features when Chernoff faces were drawn. PCA revealed that principal components 1 and 2 explained 63.08% of the variance in the dataset. Principal component values for each individual measurement of the semen samples were mapped to identify differences among treatments or among boars. Twelve individual measurements had low values of principal component 1. Confidence ellipses on the map of principal components showed no statistically significant effects of treatment or boar. Hierarchical clustering performed on the first two principal components produced three clusters. Cluster 1 contained evaluations of the first two samples in each treatment, each from a different boar. With the exception of one individual measurement, all measurements in cluster 1 were the same as those showing abnormal Chernoff faces. The unreliable data in cluster 1 are probably related to the operator's inexperience with the CASA system. These findings could be used to objectively evaluate the skill level of a CASA system operator, which may be particularly useful in the quality control of semen analysis using CASA systems.

  6. Neuropsychological assessment of decision making in alcohol-dependent commercial pilots.

    PubMed

    Georgemiller, Randy; Machizawa, Sayaka; Young, Kathleen M; Martin, Cynthia N

    2013-09-01

    The aim of this exploratory archival study was to discern the utility of the Iowa Gambling Task (IGT) in identifying adaptive decision-making capacities among pilots with a history of alcohol dependence, both with and without Cluster B personality features. Participants included 18 male airmen at the rank of captain with a history of alcohol dependence treatment and subsequent referral for a fitness-for-duty evaluation. Data from prior comprehensive neuropsychological evaluations, conducted in a private practice setting at the mandate of the FAA using criteria outlined in the HIMS program, were used. ANOVA was conducted to compare pilots with (N = 4) and without (N = 14) Cluster B personality features on measures of decision-making capacity, intelligence, and executive functioning. Pilots with Cluster B personality features had a significantly lower Total Net T-Score on the IGT (M = 35.00, SD = 9.27) than pilots without Cluster B features (M = 56.36, SD = 9.55). Furthermore, with the exception of the first 20 cards (i.e., Net 1), the groups differed significantly in their Net scores. No statistically significant difference was found in the airmen's intelligence and executive functioning. The present study found that alcohol-dependent airmen with Cluster B personality features showed significantly poorer decision-making capacities, as measured by the IGT, than alcohol-dependent airmen without Cluster B personality features. Implications and limitations of the study are discussed.

  7. Multi-particle correlations in transverse momenta from statistical clusters

    NASA Astrophysics Data System (ADS)

    Bialas, Andrzej; Bzdak, Adam

    2016-09-01

    We evaluate n-particle (n = 2, 3, 4, 5) transverse momentum correlations for pions and kaons arising from the decay of statistical clusters. These correlation functions could provide strong constraints on the possible existence of thermal clusters in the process of particle production.

  8. Statistical uncertainty of extreme wind storms over Europe derived from a probabilistic clustering technique

    NASA Astrophysics Data System (ADS)

    Walz, Michael; Leckebusch, Gregor C.

    2016-04-01

    Extratropical wind storms are among the most dangerous and loss-intensive natural hazards in Europe. However, because only 50 years of high-quality observational data are available, it is difficult to assess the statistical uncertainty of these rare events from observations alone. Over the last decade, seasonal ensemble forecasts have become indispensable for quantifying the uncertainty of weather prediction on seasonal timescales. In this study, seasonal forecasts are used in a climatological context: by making use of up to 51 ensemble members, a broad and physically consistent statistical base can be created. This base can then be used to assess the statistical uncertainty of extreme wind storm occurrence more accurately. To determine the statistical uncertainty of storms with different paths of progression, a probabilistic clustering approach using regression mixture models is used to objectively assign storm tracks (based either on core pressure or on extreme wind speeds) to different clusters. The advantage of this technique is that the entire lifetime of a storm is considered by the clustering algorithm. Quadratic curves are found to describe the storm tracks most accurately. Three main clusters (diagonal, horizontal, or vertical progression of the storm track) can be identified, each of which has its own particular features. Basic storm features such as average velocity and duration are calculated and compared for each cluster. The main benefit of this clustering technique, however, is the ability to evaluate whether the clusters show different degrees of uncertainty, e.g. more (less) spread for tracks approaching Europe horizontally (diagonally). This statistical uncertainty is compared across different seasonal forecast products.

  9. Health-risk behaviour in Croatia.

    PubMed

    Bécue-Bertaut, Mónica; Kern, Josipa; Hernández-Maldonado, Maria-Luisa; Juresa, Vesna; Vuletic, Silvije

    2008-02-01

    To identify the health-risk behaviour of various homogeneous clusters of individuals. The study was conducted in 13 of the 20 Croatian counties and in Zagreb, the Croatian capital. In the first stage, general practices were selected in each county. The second-stage sample was created by drawing a random subsample of 10% of the patients registered at each selected general practice. The sample was divided into seven homogeneous clusters using statistical methodology that combined multiple factor analysis with a hybrid clustering method. Seven homogeneous clusters were identified, three composed of males and four composed of females, based on statistically significant differences between selected characteristics (P<0.001). Although self-assessed health generally declined with age, significant variations were observed within specific age intervals. Higher levels of self-assessed health were associated with higher levels of education and/or socio-economic status. Many individuals, especially females, who self-reported poor health were heavy consumers of sleeping pills. Males and females reported different health-risk behaviours related to lifestyle, diet and use of the healthcare system. Heavy alcohol and tobacco use, unhealthy diet, risky physical activity and non-use of the healthcare system influenced self-assessed health in males. Females were slightly less satisfied with their health than males of the same age and educational level. Even highly educated females who took preventive healthcare tests and ate a healthy diet reported a less satisfactory self-assessed level of health than expected. Sociodemographic characteristics, lifestyle, self-assessed health and use of the healthcare system were used in the identification of seven homogeneous population clusters. A comprehensive analysis of these clusters suggests that health-related prevention and intervention efforts should be geared towards specific populations.

  10. Prediction of operon-like gene clusters in the Arabidopsis thaliana genome based on co-expression analysis of neighboring genes.

    PubMed

    Wada, Masayoshi; Takahashi, Hiroki; Altaf-Ul-Amin, Md; Nakamura, Kensuke; Hirai, Masami Y; Ohta, Daisaku; Kanaya, Shigehiko

    2012-07-15

    Operon-like arrangements of genes occur in eukaryotes ranging from yeasts and filamentous fungi to nematodes, plants, and mammals. In plants, several examples of operon-like gene clusters involved in metabolic pathways have recently been characterized, e.g. the cyclic hydroxamic acid pathways in maize, the avenacin biosynthesis gene clusters in oat, the thalianol pathway in Arabidopsis thaliana, and the diterpenoid momilactone cluster in rice. Such operon-like gene clusters are defined by their co-regulation or by the neighboring positions of their genes in the immediate vicinity of chromosomal regions. A comprehensive analysis of the expression of neighboring genes is therefore a crucial step in revealing the complete set of operon-like gene clusters within a genome. Genome-wide prediction of operon-like gene clusters should contribute to functional annotation efforts and also provide novel insight into evolutionary aspects of the acquisition of certain biological functions. Using 1469 microarray gene expression datasets of A. thaliana, we predicted co-expressed gene clusters by comparing the Pearson correlation coefficients of neighboring genes with those of randomly selected gene pairs, based on a statistical method that takes the false discovery rate (FDR) into consideration. We estimated that A. thaliana contains 100 operon-like gene clusters in total. We predicted 34 statistically significant gene clusters consisting of 3 to 22 genes each, based on a stringent FDR threshold of 0.1. Functional relationships among genes in individual clusters were estimated from sequence similarity and functional annotation of genes. Duplicated gene pairs (determined by BLAST with a cutoff of E < 10^-5) are included in 27 clusters. Five clusters are associated with metabolism and contain P450 genes restricted to the Brassica family that are predicted to be involved in secondary metabolism.
Operon-like clusters tend to include genes encoding bio-machinery associated with ribosomes, the ubiquitin/proteasome system, secondary metabolic pathways, lipid and fatty-acid metabolism, and the lipid transfer system. Copyright © 2012 Elsevier B.V. All rights reserved.
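The abstract does not name the FDR procedure that was used. As a point of reference only, the sketch below shows the standard Benjamini-Hochberg step-up procedure applied to a list of p-values at the FDR level of 0.1 mentioned above; it is not necessarily the authors' method, and how p-values would be derived from the neighboring-versus-random correlation comparison is not shown. The function name is hypothetical.

```python
# Illustrative Benjamini-Hochberg sketch; the function name is hypothetical.
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], q: float = 0.1) -> List[int]:
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up
    procedure at false discovery rate q, returned in ascending index order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by increasing p
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            cutoff_rank = rank  # largest rank k with p_(k) <= k * q / m
    # reject every hypothesis up to and including the cutoff rank
    return sorted(order[:cutoff_rank])


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))  # [0, 1, 2]
```

Because the procedure is step-up, a hypothesis whose own p-value misses its rank threshold is still rejected if some larger-ranked p-value meets its threshold.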

  11. Targeting regional pediatric congenital hearing loss using a spatial scan statistic.

    PubMed

    Bush, Matthew L; Christian, Warren Jay; Bianchi, Kristin; Lester, Cathy; Schoenberg, Nancy

    2015-01-01

    Congenital hearing loss is a common problem, and timely identification and intervention are paramount for language development. Patients from rural regions may face many barriers to timely diagnosis and intervention. The purpose of this study was to examine the spatial and hospital-based distribution of failed infant hearing screening tests and pediatric congenital hearing loss throughout Kentucky. Data on live births and audiological reporting of infant hearing loss results in Kentucky from 2009 to 2011 were analyzed. The authors used spatial scan statistics to identify high-rate clusters of failed newborn screening tests and permanent congenital hearing loss (PCHL), based on the total number of live births per county. The authors conducted further analyses on PCHL and failed newborn hearing screening tests, based on birth hospital data and method of screening. The authors observed four statistically significant (p < 0.05) high-rate clusters of failed newborn hearing screenings in Kentucky, including two in the Appalachian region. Hospitals using two-stage otoacoustic emission testing demonstrated higher rates of failed screening (p = 0.009) than those using two-stage automated auditory brainstem response testing. A significant cluster with a high rate of PCHL was observed in Western Kentucky. Five of the 54 birthing hospitals were found to have a higher relative risk of PCHL, and two of those hospitals are located in a very rural region of Western Kentucky within the cluster. This spatial analysis of children in Kentucky has identified specific regions throughout the state with high rates of congenital hearing loss and failed newborn hearing screening tests. Further investigation regarding causative factors is warranted. This method of analysis can be useful in the setting of hearing health disparities to focus efforts on regions facing a high incidence of congenital hearing loss.
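Spatial scan statistics compare case counts inside candidate zones with the counts expected from the population at risk. The simplified sketch below assumes Kulldorff's Poisson likelihood ratio, caller-supplied candidate zones (rather than the circular windows that dedicated scan software generates), and integer case counts. It picks the zone with the largest log-likelihood ratio and estimates a p-value by Monte Carlo redistribution of all cases in proportion to population. Function names are hypothetical, and this is not the authors' exact analysis.

```python
# Illustrative Poisson scan-statistic sketch; function names are hypothetical.
import math
import random
from typing import List, Optional, Sequence, Tuple

Zone = Tuple[int, ...]  # indices of the areal units (e.g. counties) in a candidate zone


def poisson_llr(c: int, expected: float, total: int) -> float:
    """Kulldorff's Poisson log-likelihood ratio for an elevated-rate zone.

    c: observed cases in the zone; expected: cases expected in the zone under
    the null (population share times total cases); total: all observed cases.
    """
    if c <= expected:
        return 0.0  # only high-rate clusters are of interest here
    llr = c * math.log(c / expected)
    outside = total - c
    if outside > 0:
        llr += outside * math.log(outside / (total - expected))
    return llr


def best_zone(cases: Sequence[int], pops: Sequence[float],
              zones: List[Zone]) -> Tuple[float, Optional[Zone]]:
    """Return (max LLR, zone) over the candidate zones."""
    total = sum(cases)
    total_pop = sum(pops)
    best: Tuple[float, Optional[Zone]] = (0.0, None)
    for zone in zones:
        zone_pop = sum(pops[i] for i in zone)
        if zone_pop == 0:
            continue
        c = sum(cases[i] for i in zone)
        expected = total * zone_pop / total_pop
        llr = poisson_llr(c, expected, total)
        if llr > best[0]:
            best = (llr, zone)
    return best


def scan_test(cases: Sequence[int], pops: Sequence[float], zones: List[Zone],
              n_sim: int = 999, seed: int = 1) -> Tuple[Optional[Zone], float, float]:
    """Most likely cluster, its LLR, and a Monte Carlo p-value for that LLR."""
    rng = random.Random(seed)
    observed_llr, zone = best_zone(cases, pops, zones)
    total = sum(cases)
    units = range(len(pops))
    at_least_as_large = 0
    for _ in range(n_sim):
        simulated = [0] * len(pops)
        # under the null, each case falls in a unit with probability proportional to population
        for i in rng.choices(units, weights=pops, k=total):
            simulated[i] += 1
        if best_zone(simulated, pops, zones)[0] >= observed_llr:
            at_least_as_large += 1
    return zone, observed_llr, (at_least_as_large + 1) / (n_sim + 1)
```

Comparing each simulated maximum (not each zone's own LLR) against the observed maximum is what adjusts the p-value for having searched over many candidate zones.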

  12. Precipitation Cluster Distributions: Current Climate Storm Statistics and Projected Changes Under Global Warming

    NASA Astrophysics Data System (ADS)

    Quinn, Kevin Martin

    The total amount of precipitation integrated across a precipitation cluster (contiguous precipitating grid cells exceeding a minimum rain rate) is a useful measure of the aggregate size of the disturbance, expressed as the rate of water mass lost or latent heat released, i.e. the power of the disturbance. Probability distributions of cluster power are examined during boreal summer (May-September) and winter (January-March) using satellite-retrieved rain rates from the Tropical Rainfall Measuring Mission (TRMM) 3B42 and Special Sensor Microwave Imager and Sounder (SSM/I and SSMIS) programs, model output from the High Resolution Atmospheric Model (HIRAM, roughly 0.25-0.5° resolution), seven 1-2° resolution members of the Coupled Model Intercomparison Project Phase 5 (CMIP5) experiment, and National Center for Atmospheric Research Large Ensemble (NCAR LENS). Spatial distributions of precipitation-weighted centroids are also investigated in observations (TRMM-3B42) and climate models during winter as a metric for changes in mid-latitude storm tracks. Observed probability distributions for both seasons are scale-free from the smallest clusters up to a cutoff scale at high cluster power, after which the probability density drops rapidly. When low rain rates are excluded by choosing a minimum rain rate threshold in defining clusters, the models accurately reproduce observed cluster power statistics and winter storm tracks. Changes in behavior in the tail of the distribution, above the cutoff, are important for impacts since these quantify the frequency of the most powerful storms. End-of-century cluster power distributions and storm track locations are investigated in these models under a "business as usual" global warming scenario. The probability of high cluster power events increases by end-of-century across all models, by up to an order of magnitude for the highest-power events for which statistics can be computed.
For the three models in the suite with continuous time series of high-resolution output, there is substantial variability in when these probability increases for the most powerful precipitation clusters become detectable, ranging from detectable within the observational period to statistically significant trends emerging only after 2050. A similar analysis of National Centers for Environmental Prediction (NCEP) Reanalysis 2 and SSM/I-SSMIS rain rate retrievals in the recent observational record does not yield reliable evidence of trends in high-power cluster probabilities at this time. Large impacts on mid-latitude storm tracks are projected over the West Coast and eastern North America, with at least 8 of the 9 models examined showing large increases by end-of-century in the probability density of the most powerful storms, ranging up to a factor of 6.5 in the highest range bin for which historical statistics are computed. However, within these regional domains, there is considerable variation among models in pinpointing exactly where the largest increases will occur.
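The cluster definition used here (contiguous precipitating grid cells exceeding a minimum rain rate) can be sketched as a connected-component search. This minimal version assumes 4-neighbour connectivity, no wrap-around in longitude, and a uniform cell area, and it returns the summed rain rate times cell area (a water-mass rate) for each cluster. Converting to latent heat release would additionally require multiplying by the latent heat of vaporization. The function name and parameters are hypothetical.

```python
# Illustrative precipitation-cluster sketch; the function name is hypothetical.
from collections import deque
from typing import List


def cluster_powers(rain: List[List[float]], min_rate: float, cell_area: float = 1.0) -> List[float]:
    """Sum of rain rate * cell area over each 4-connected cluster of cells whose
    rain rate exceeds min_rate. Clusters are returned in row-major discovery order."""
    rows, cols = len(rain), len(rain[0])
    visited = [[False] * cols for _ in range(rows)]
    powers: List[float] = []
    for r in range(rows):
        for c in range(cols):
            if visited[r][c] or rain[r][c] <= min_rate:
                continue
            # breadth-first flood fill of one precipitation cluster
            visited[r][c] = True
            queue = deque([(r, c)])
            total = 0.0
            while queue:
                i, j = queue.popleft()
                total += rain[i][j] * cell_area
                for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                    if (0 <= ni < rows and 0 <= nj < cols
                            and not visited[ni][nj] and rain[ni][nj] > min_rate):
                        visited[ni][nj] = True
                        queue.append((ni, nj))
            powers.append(total)
    return powers


grid = [[0, 2, 2, 0],
        [0, 0, 0, 0],
        [3, 0, 1, 4]]
print(cluster_powers(grid, min_rate=0.5))  # [4.0, 3.0, 5.0]
```

Raising `min_rate` splits or shrinks clusters (here, `min_rate=1.5` drops the cell with rate 1 and the last cluster's power falls to 4.0), which is why the threshold choice matters when comparing models with observations.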

  13. Clinical Study of the 3D-Master Color System among the Spanish Population.

    PubMed

    Gómez-Polo, Cristina; Gómez-Polo, Miguel; Martínez Vázquez de Parga, Juan Antonio; Celemín-Viñuela, Alicia

    2017-01-12

    To study whether the shades of the 3D-Master System were grouped and represented in the chromatic space according to the three color coordinates of value, chroma, and hue. Maxillary central incisor color was measured on the tooth surface with the Easyshade Compact spectrophotometer in 1361 participants aged 16 to 89. The natural (unbleached) color of the middle third was recorded in 3D-Master System nomenclature and in the CIELCh system. Principal component analysis and cluster analysis were applied. Seventy-five colors of the 3D-Master System were found. The statistical analysis revealed the existence of 5 cluster groups. The centroid (the average of the 75 samples) was 74.64 for lightness (L*), 22.87 for chroma (C*), and 88.85 for hue (h*). All of the clusters except cluster 3 showed statistically significant differences from the centroid for the three color coordinates (p < 0.001). The results of this study indicated that the 75 shades in the 3D-Master System were grouped into 5 clusters according to the coordinates L*, C*, and h* obtained with the Vita Easyshade Compact dental spectrophotometer. The shades that composed each cluster did not belong to the same lightness (value) group. The colors of the 3D-Master System did not show a uniform chromatic distribution. © 2017 by the American College of Prosthodontists.

  14. Clustering of fast-food restaurants around schools: a novel application of spatial statistics to the study of food environments.

    PubMed

    Austin, S Bryn; Melly, Steven J; Sanchez, Brisa N; Patel, Aarti; Buka, Stephen; Gortmaker, Steven L

    2005-09-01

    We examined the concentration of fast-food restaurants in areas proximal to schools to characterize school neighborhood food environments. We used geocoded databases of restaurant and school addresses to examine locational patterns of fast-food restaurants and kindergartens and primary and secondary schools in Chicago. We used the bivariate K function statistical method to quantify the degree of clustering (spatial dependence) of fast-food restaurants around school locations. The median distance from any school in Chicago to the nearest fast-food restaurant was 0.52 km, a distance that an adult can walk in little more than 5 minutes, and 78% of schools had at least 1 fast-food restaurant within 800 m. Fast-food restaurants were statistically significantly clustered in areas within a short walking distance from schools, with an estimated 3 to 4 times as many fast-food restaurants within 1.5 km from schools than would be expected if the restaurants were distributed throughout the city in a way unrelated to school locations. Fast-food restaurants are concentrated within a short walking distance from schools, exposing children to poor-quality food environments in their school neighborhoods.

  15. Clustering of Fast-Food Restaurants Around Schools: A Novel Application of Spatial Statistics to the Study of Food Environments

    PubMed Central

    Austin, S. Bryn; Melly, Steven J.; Sanchez, Brisa N.; Patel, Aarti; Buka, Stephen; Gortmaker, Steven L.

    2005-01-01

    Objectives. We examined the concentration of fast-food restaurants in areas proximal to schools to characterize school neighborhood food environments. Methods. We used geocoded databases of restaurant and school addresses to examine locational patterns of fast-food restaurants and kindergartens and primary and secondary schools in Chicago. We used the bivariate K function statistical method to quantify the degree of clustering (spatial dependence) of fast-food restaurants around school locations. Results. The median distance from any school in Chicago to the nearest fast-food restaurant was 0.52 km, a distance that an adult can walk in little more than 5 minutes, and 78% of schools had at least 1 fast-food restaurant within 800 m. Fast-food restaurants were statistically significantly clustered in areas within a short walking distance from schools, with an estimated 3 to 4 times as many fast-food restaurants within 1.5 km from schools than would be expected if the restaurants were distributed throughout the city in a way unrelated to school locations. Conclusions. Fast-food restaurants are concentrated within a short walking distance from schools, exposing children to poor-quality food environments in their school neighborhoods. PMID:16118369
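The bivariate (cross) K function counts, for each school, the restaurants within distance r, and scales the total by the study area. The sketch below is a naive estimator with no edge correction and no Monte Carlo envelopes, both of which a real analysis would need; coordinates are assumed to be planar (for example, projected coordinates in km). When the two patterns are independent, K12(r) is expected to be about πr², so the ratio K12(r)/(πr²) is a rough analogue of the "times as many as expected" comparison in the abstract. All names are hypothetical.

```python
# Illustrative naive cross-K sketch (no edge correction); names are hypothetical.
import math
from typing import List, Tuple

Point = Tuple[float, float]  # planar coordinates, e.g. projected km


def cross_k(schools: List[Point], restaurants: List[Point], r: float, area: float) -> float:
    """K12(r) = area / (n1 * n2) * #{(i, j): dist(school_i, restaurant_j) <= r}."""
    n1, n2 = len(schools), len(restaurants)
    if n1 == 0 or n2 == 0:
        raise ValueError("both point patterns must be non-empty")
    close_pairs = sum(
        1
        for (sx, sy) in schools
        for (rx, ry) in restaurants
        if math.hypot(sx - rx, sy - ry) <= r
    )
    return area * close_pairs / (n1 * n2)


def ratio_to_independence(k_value: float, r: float) -> float:
    """Observed K12(r) divided by pi * r^2, its value under independent homogeneous patterns."""
    return k_value / (math.pi * r * r)


schools = [(0.0, 0.0), (10.0, 10.0)]
restaurants = [(0.5, 0.0), (0.0, 0.5), (10.0, 10.5), (50.0, 50.0)]
k = cross_k(schools, restaurants, r=1.0, area=10_000.0)
print(k, ratio_to_independence(k, 1.0))  # 3750.0 and a ratio far above 1
```

Without edge correction, schools near the study-area boundary have part of their radius-r disc outside the area, which biases K12(r) downward at larger r.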

  16. Is It Feasible to Identify Natural Clusters of TSC-Associated Neuropsychiatric Disorders (TAND)?

    PubMed

    Leclezio, Loren; Gardner-Lubbe, Sugnet; de Vries, Petrus J

    2018-04-01

    Tuberous sclerosis complex (TSC) is a genetic disorder with multisystem involvement. The lifetime prevalence of TSC-Associated Neuropsychiatric Disorders (TAND) is around 90%, and TAND presents in an apparently unique, individual pattern. This "uniqueness" poses significant challenges for diagnosis, psycho-education, and intervention planning. To date, no studies have explored whether there may be natural clusters of TAND. The purpose of this feasibility study was (1) to investigate the practicability of identifying natural TAND clusters, and (2) to identify appropriate multivariate data analysis techniques for larger-scale studies. TAND Checklist data were collected from 56 individuals with a clinical diagnosis of TSC (n = 20 from South Africa; n = 36 from Australia). Using R, the open-source statistical platform, mean squared contingency coefficients were calculated to produce a correlation matrix, and various cluster analyses and an exploratory factor analysis were examined. Ward's method rendered six TAND clusters with good face validity and significant convergence with a six-factor exploratory factor analysis solution. The "bottom-up" data-driven strategies identified a "scholastic" cluster of TAND manifestations, an "autism spectrum disorder-like" cluster, a "dysregulated behavior" cluster, a "neuropsychological" cluster, a "hyperactive/impulsive" cluster, and a "mixed/mood" cluster. These feasibility results suggest that a combination of cluster analysis and exploratory factor analysis methods may be able to identify clinically meaningful natural TAND clusters. Findings require replication and expansion in larger datasets and could include quantification of cluster or factor scores at an individual level. Copyright © 2018 Elsevier Inc. All rights reserved.

  17. Finding Statistically Significant Communities in Networks

    PubMed Central

    Lancichinetti, Andrea; Radicchi, Filippo; Ramasco, José J.; Fortunato, Santo

    2011-01-01

    Community structure is one of the main structural features of networks, revealing both their internal organization and the similarity of their elementary units. Despite the large variety of methods proposed to detect communities in graphs, there is a strong need for multi-purpose techniques able to handle different types of datasets and the subtleties of community structure. In this paper we present OSLOM (Order Statistics Local Optimization Method), the first method capable of detecting clusters in networks while accounting for edge directions, edge weights, overlapping communities, hierarchies, and community dynamics. It is based on the local optimization of a fitness function expressing the statistical significance of clusters with respect to random fluctuations, which is estimated with tools of Extreme and Order Statistics. OSLOM can be used alone or as a refinement procedure for partitions/covers delivered by other techniques. We have also implemented sequential algorithms combining OSLOM with other fast techniques, so that the community structure of very large networks can be uncovered. Our method performs comparably to the best existing algorithms on artificial benchmark graphs. Several applications on real networks are shown as well. OSLOM is implemented in freely available software (http://www.oslom.org), and we believe it will be a valuable tool in the analysis of networks. PMID:21559480

  18. A geographic analysis of individual and environmental risk factors for hypospadias births

    PubMed Central

    Winston, Jennifer J; Meyer, Robert E; Emch, Michael E

    2014-01-01

    Background Hypospadias is a relatively common birth defect affecting the male urinary tract. We explored the etiology of hypospadias by examining its spatial distribution in North Carolina and the spatial clustering of residuals from individual and environmental risk factors. Methods We used data collected by the North Carolina Birth Defects Monitoring Program from 2003-2005 to estimate local Moran's I statistics and identify geographic clustering of overall and severe hypospadias, using 995 overall cases and 16,013 controls. We conducted logistic regression and computed local Moran's I statistics on standardized residuals to consider the contribution of individual variables (maternal age, maternal race/ethnicity, maternal education, smoking, parity, and diabetes) and environmental variables (block group land cover) to this clustering. Results Local Moran's I statistics indicated significant clustering of overall and severe hypospadias in eastern central North Carolina. Spatial clustering of hypospadias persisted when controlling for individual factors, but diminished somewhat when controlling for environmental factors. In adjusted models, maternal residence in a block group with more than 5% crop cover was associated with overall hypospadias (OR = 1.22; 95% CI = 1.04-1.43); that is, living in a block group with greater than 5% crop cover was associated with a 22% increase in the odds of having a baby with hypospadias. Land cover was not associated with severe hypospadias. Conclusions This study illustrates the potential contribution of mapping in generating hypotheses about disease etiology. Results suggest that environmental factors including proximity to agriculture may play some role in the spatial distribution of hypospadias. PMID:25196538

  19. Use of Spatial Epidemiology and Hot Spot Analysis to Target Women Eligible for Prenatal Women, Infants, and Children Services

    PubMed Central

    Krawczyk, Christopher; Gradziel, Pat; Geraghty, Estella M.

    2014-01-01

    Objectives. We used a geographic information system and cluster analyses to determine locations in need of enhanced Special Supplemental Nutrition Program for Women, Infants, and Children (WIC) Program services. Methods. We linked documented births in the 2010 California Birth Statistical Master File with the 2010 data from the WIC Integrated Statewide Information System. Analyses focused on the density of pregnant women who were eligible for but not receiving WIC services in California’s 7049 census tracts. We used incremental spatial autocorrelation and hot spot analyses to identify clusters of WIC-eligible nonparticipants. Results. We detected clusters of census tracts with higher-than-expected densities, compared with the state mean density of WIC-eligible nonparticipants, in 21 of 58 (36.2%) California counties (P < .05). In subsequent county-level analyses, we located neighborhood-level clusters of higher-than-expected densities of eligible nonparticipants in Sacramento, San Francisco, Fresno, and Los Angeles Counties (P < .05). Conclusions. Hot spot analyses provided a rigorous and objective approach to determine the locations of statistically significant clusters of WIC-eligible nonparticipants. Results helped inform WIC program and funding decisions, including the opening of new WIC centers, and offered a novel approach for targeting public health services. PMID:24354821

  20. Cluster Detection Tests in Spatial Epidemiology: A Global Indicator for Performance Assessment

    PubMed Central

    Guttmann, Aline; Li, Xinran; Feschet, Fabien; Gaudart, Jean; Demongeot, Jacques; Boire, Jean-Yves; Ouchchane, Lemlih

    2015-01-01

    In disease cluster detection, the use of local cluster detection tests (CDTs) is common. These methods aim both to locate likely clusters and to test their statistical significance. New or improved CDTs are regularly proposed to epidemiologists and must be subjected to performance assessment. Because location accuracy has to be considered, performance assessment goes beyond the raw estimation of type I or II errors. As no consensus exists on performance evaluation, heterogeneous methods are used, and studies are therefore rarely comparable. A global indicator of performance, assessing both spatial accuracy and usual power, would facilitate the exploration of CDT behaviour and between-study comparisons. The Tanimoto coefficient (TC) is a well-known similarity measure that can assess location accuracy, but only for one detected cluster, whereas in a simulation study performance is measured over many tests. From the TC, we propose two statistics, the averaged TC and the cumulated TC, as indicators able to provide a global overview of CDT performance for both usual power and location accuracy. We demonstrate the properties of these two indicators and the superiority of the cumulated TC for assessing performance. We applied these indicators to conduct a systematic spatial assessment displayed through performance maps. PMID:26086911
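
    For a single detected cluster, the Tanimoto coefficient on sets reduces to the Jaccard index between the regions flagged by a CDT and the regions of the true (simulated) cluster. A minimal sketch, assuming clusters are given as sets of region IDs; `averaged_tc` here is a plain mean over simulation runs in which an empty detection scores 0. That averaging rule is an assumption for illustration and may not match the authors' exact averaged or cumulated TC definitions, and the cumulated TC is not reproduced.

```python
def tanimoto(detected, true_cluster):
    """Tanimoto coefficient |A & B| / |A | B| for two sets of region IDs.

    true_cluster is assumed non-empty, so the union is never empty.
    """
    a, b = set(detected), set(true_cluster)
    return len(a & b) / len(a | b)


def averaged_tc(detections, true_cluster):
    """Mean TC over simulation runs; a run with no detection scores 0."""
    return sum(tanimoto(d, true_cluster) for d in detections) / len(detections)


true_regions = {"A", "B", "C"}                      # hypothetical simulated cluster
runs = [{"A", "B", "C"}, {"B", "C", "D"}, set()]    # hypothetical CDT output per run
print(averaged_tc(runs, true_regions))
```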

  1. Counts-in-cylinders in the Sloan Digital Sky Survey with Comparisons to N-body Simulations

    NASA Astrophysics Data System (ADS)

    Berrier, Heather D.; Barton, Elizabeth J.; Berrier, Joel C.; Bullock, James S.; Zentner, Andrew R.; Wechsler, Risa H.

    2011-01-01

    Environmental statistics provide a necessary means of comparing the properties of galaxies in different environments, and a vital test of models of galaxy formation within the prevailing hierarchical cosmological model. We explore counts-in-cylinders, a common statistic defined as the number of companions of a particular galaxy found within a given projected radius and redshift interval. Galaxy distributions with the same two-point correlation functions do not necessarily have the same companion count distributions. We use this statistic to examine the environments of galaxies in the Sloan Digital Sky Survey Data Release 4 (SDSS DR4). We also make preliminary comparisons to four models for the spatial distributions of galaxies, based on N-body simulations and data from SDSS DR4, to study the utility of the counts-in-cylinders statistic. There is a very large scatter between the number of companions a galaxy has and the mass of its parent dark matter halo and the halo occupation, limiting the utility of this statistic for certain kinds of environmental studies. We also show that prevalent empirical models of galaxy clustering, which match observed two- and three-point clustering statistics well, fail to reproduce some aspects of the observed distribution of counts-in-cylinders on 1, 3, and 6 h⁻¹ Mpc scales. All models that we explore underpredict the fraction of galaxies with few or no companions in 3 and 6 h⁻¹ Mpc cylinders. Roughly 7% of galaxies in the real universe are significantly more isolated within a 6 h⁻¹ Mpc cylinder than the galaxies in any of the models we use. Simple phenomenological models that map galaxies to dark matter halos fail to reproduce high-order clustering statistics in low-density environments.
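
    The statistic defined above counts, for each galaxy, the companions inside a cylinder with a given projected radius and line-of-sight depth. A brute-force sketch, assuming flat-sky projected coordinates in h⁻¹ Mpc and line-of-sight velocities (cz, in km/s) standing in for the redshift interval; the function name, the velocity half-length and the four-galaxy sample are hypothetical and are not the paper's values.

```python
import math


def counts_in_cylinders(galaxies, radius, dv_max):
    """Number of companions of each galaxy inside a cylinder.

    galaxies : list of (x, y, v): projected position (h^-1 Mpc) and
               line-of-sight velocity (km/s)
    radius   : projected cylinder radius (h^-1 Mpc)
    dv_max   : half-length of the cylinder along the line of sight (km/s)
    """
    counts = []
    for i, (xi, yi, vi) in enumerate(galaxies):
        n = 0
        for j, (xj, yj, vj) in enumerate(galaxies):
            # A companion must lie inside both the projected radius and
            # the velocity window; the galaxy itself is excluded.
            if (i != j and math.hypot(xi - xj, yi - yj) <= radius
                    and abs(vi - vj) <= dv_max):
                n += 1
        counts.append(n)
    return counts


galaxies = [(0.0, 0.0, 0.0), (0.5, 0.0, 100.0), (0.0, 0.5, 2000.0), (10.0, 10.0, 0.0)]
print(counts_in_cylinders(galaxies, radius=1.0, dv_max=1000.0))
```

    The double loop is O(n²); for survey-sized samples a spatial index (for example a k-d tree over projected positions) would be the usual replacement.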

  2. Spatio-Temporal Analysis of Smear-Positive Tuberculosis in the Sidama Zone, Southern Ethiopia

    PubMed Central

    Dangisso, Mesay Hailu; Datiko, Daniel Gemechu; Lindtjørn, Bernt

    2015-01-01

    Background Tuberculosis (TB) is a disease of public health concern, with a distribution that varies across settings depending on socio-economic status, HIV burden, and the availability and performance of the health system. Ethiopia is a country with a high burden of TB, with regional variations in TB case notification rates (CNRs). However, TB program reports are often compiled and reported at higher administrative units that do not show the burden at lower units, so there is limited information about the spatial distribution of the disease. We therefore aim to assess the spatial distribution and presence of spatio-temporal clustering of the disease in different geographic settings over 10 years in the Sidama Zone in southern Ethiopia. Methods Retrospective space–time and spatial analyses were carried out at the kebele level (the lowest administrative unit within a district) to identify spatial and space-time clusters of smear-positive pulmonary TB (PTB). Scan statistics, Global Moran’s I, and Getis-Ord (Gi*) statistics were all used to help analyze the spatial distribution and clusters of the disease across settings. Results A total of 22,545 smear-positive PTB cases notified over 10 years were used for spatial analysis. In a purely spatial analysis, we identified the most likely cluster of smear-positive PTB in 192 kebeles in eight districts (RR = 2, p<0.001), with 12,155 observed and 8,668 expected cases. The Gi* statistic also identified clusters in the same areas, and the spatial clusters showed stability in most areas in each year during the study period. The space-time analysis also detected the most likely cluster in 193 kebeles in the same eight districts (RR = 1.92, p<0.001), with 7,584 observed and 4,738 expected cases in 2003-2012. Conclusion The study found variations in CNRs and significant spatio-temporal clusters of smear-positive PTB in the Sidama Zone. 
The findings can be used to guide TB control programs to devise effective TB control strategies for the geographic areas characterized by the highest CNRs. Further studies are required to understand the factors associated with clustering based on individual level locations and investigation of cases. PMID:26030162
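
    The scan statistic used in this study compares, for each candidate window, the observed and expected case counts inside and outside that window. Under a Poisson model, Kulldorff's log-likelihood ratio (LLR) for one window can be sketched as follows (function names are hypothetical). A full analysis maximizes this ratio over all circular or space-time windows and obtains the p-value by Monte Carlo replication, and neither step is shown here.

```python
import math


def poisson_scan_llr(c, e, total):
    """Kulldorff's Poisson log-likelihood ratio for one candidate window.

    c     : observed cases inside the window
    e     : expected cases inside the window under the null, scaled so that
            expected cases over the whole study region sum to `total`
    total : total observed cases C in the study region
    """
    if c <= e:
        return 0.0  # scanning for high-rate clusters only
    outside_obs = total - c
    outside_exp = total - e
    llr = c * math.log(c / e)
    if outside_obs > 0:  # the 0 * log(0) term is taken as 0
        llr += outside_obs * math.log(outside_obs / outside_exp)
    return llr


def relative_risk(c, e, total):
    """Estimated risk inside the window divided by risk outside it."""
    return (c / e) / ((total - c) / (total - e))


# Counts for the purely spatial cluster as given in the abstract.
print(poisson_scan_llr(12155, 8668, 22545), relative_risk(12155, 8668, 22545))
```

    The LLR on its own is not a p-value: significance comes from comparing the maximum LLR in the real data against the maxima obtained in data sets simulated under the null hypothesis.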

  3. Spatio-Temporal Clustering of Monitoring Network

    NASA Astrophysics Data System (ADS)

    Hussain, I.; Pilz, J.

    2009-04-01

    Seasonal variation differs greatly across locations in Pakistan. Some areas are deserts that remain very hot and dry; coastal areas along the Arabian Sea, for example, have a very warm season and little rainfall. Other areas are mountainous, with very low temperatures and heavy rainfall, for instance the Karakoram ranges. The most important variables affecting the climate are temperature, precipitation, humidity, wind speed and elevation. Furthermore, it is hard to find regions in Pakistan that are homogeneous with respect to climate variation. Identifying such homogeneous regions can be useful in many ways: it can help predict the climate in sub-regions and optimize the number of monitoring sites. No earlier study has tried to identify homogeneous regions of Pakistan with respect to climate variation, and there are only a few papers on spatio-temporal clustering of monitoring networks. Steinhaus (1956) presented the well-known K-means clustering method, which identifies a predefined number of clusters by iteratively assigning observations to the nearest cluster centroid and recomputing the centroids. Castro et al. (1997) developed a genetic heuristic algorithm to solve medoid-based clustering; their method relies on genetic recombination via random assorting recombination and is appropriate for clustering attributes that have genetic characteristics. Sap and Awan (2005) presented a robust weighted kernel K-means algorithm incorporating spatial constraints for clustering climate data; the algorithm can effectively handle noise, outliers and autocorrelation in spatial data, enabling effective and efficient analysis by exploring patterns and structures in the data. Soltani and Modarres (2006) used hierarchical and divisive cluster analysis to categorize patterns of rainfall in Iran. 
    They considered only rainfall at twenty-eight monitoring sites and concluded that eight clusters existed, classifying the sites using only their average rainfall without considering time replications or spatial coordinates. Kerby et al. (2007) proposed a likelihood-based spatial clustering method that accounts for geographic locations through the variance-covariance matrix. Their method works like hierarchical clustering methods; moreover, it is inappropriate for time-replicated data and does not perform well for a large number of sites. Tuia et al. (2008) used scan statistics to identify spatio-temporal clusters of fire sequences in the Tuscany region of Italy. The scan statistic was developed by Kulldorff (1997) to detect spatio-temporal clusters in epidemiology and assess their significance; it applies only to univariate discrete random variables. In this paper we use a very simple approach for spatio-temporal clustering that can create separable and homogeneous clusters. Most clustering methods are based on Euclidean distances, but geographic coordinates are spherical, and computing Euclidean distances directly from spherical coordinates is inappropriate. We therefore use the Lambert projection to transform geographic coordinates to rectangular (D-plane) coordinates. The partitioning around medoids (PAM) clustering method is then applied to the data, including the D-plane coordinates. Ordinary kriging is used as a validity measure for the precipitation data: the kriging results for the clusters are more accurate and show less variation than those for the complete monitoring network.
    References
    Castro, V.E. and Murray, A.T. (1997). Spatial Clustering with Data Mining with Genetic Algorithms. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.56.8573
    Kaufman, L. and Rousseeuw, P.J. (1990). Finding Groups in Data: An Introduction to Cluster Analysis. Wiley Series in Probability and Mathematical Statistics, New York.
    Kulldorff, M. (1997). A spatial scan statistic. Communications in Statistics - Theory and Methods 26(6), 1481-1496.
    Kerby, A., Marx, D., Samal, A. and Adamchuck, V. (2007). Spatial Clustering Using the Likelihood Function. Seventh IEEE International Conference on Data Mining - Workshops.
    Steinhaus, H. (1956). Sur la division des corps matériels en parties. Bull. Acad. Polon. Sci., Cl. III, vol. IV, 801-804.
    Snyder, J.P. (1987). Map Projections: A Working Manual. U.S. Geological Survey Professional Paper 1395. Washington, DC: U.S. Government Printing Office, pp. 104-110.
    Sap, M.N. and Awan, A.M. (2005). Finding Spatio-Temporal Patterns in Climate Data Using Clustering. Proceedings of the International Conference on Cyberworlds (CW'05).
    Soltani, S. and Modarres, R. (2006). Classification of Spatio-Temporal Pattern of Rainfall in Iran: Using Hierarchical and Divisive Cluster Analysis. Journal of Spatial Hydrology 6(2).
    Tuia, D., Ratle, F., Lasaponara, R., Telesca, L. and Kanevski, M. (2008). Scan Statistics Analysis for Forest Fire Clusters. Communications in Nonlinear Science and Numerical Simulation 13, 1689-1694.
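
    The two-step procedure described in this abstract (project site coordinates to a plane, then cluster by medoids on Euclidean distance) can be sketched as follows. The sketch assumes the spherical form of the Lambert conformal conic projection as given by Snyder (1987), since the abstract does not say which Lambert variant was used. It also replaces the full PAM algorithm of Kaufman and Rousseeuw (1990) with a simpler first-improvement SWAP search started from the first k sites, so results can differ from a reference PAM implementation. The standard parallels, the station coordinates and the function names are hypothetical.

```python
import math


def lambert_conformal_conic(lat, lon, lat1, lat2, lat0, lon0, radius=6371.0):
    """Spherical Lambert conformal conic projection.

    Angles in degrees; returns planar (x, y) in the units of `radius` (km).
    Assumes two distinct standard parallels, lat1 != lat2.
    """
    phi, lam = math.radians(lat), math.radians(lon)
    phi0, lam0 = math.radians(lat0), math.radians(lon0)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)

    def t(p):
        return math.tan(math.pi / 4 + p / 2)

    n = math.log(math.cos(phi1) / math.cos(phi2)) / math.log(t(phi2) / t(phi1))
    f = math.cos(phi1) * t(phi1) ** n / n
    rho = radius * f / t(phi) ** n
    rho0 = radius * f / t(phi0) ** n
    theta = n * (lam - lam0)
    return rho * math.sin(theta), rho0 - rho * math.cos(theta)


def pam_swap(points, k):
    """Simplified PAM: first-improvement SWAP starting from the first k points.

    Returns (medoid indices, cluster label per point), using Euclidean distance.
    """
    def cost(meds):
        return sum(min(math.dist(p, points[m]) for m in meds) for p in points)

    medoids = list(range(k))
    best = cost(medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for o in range(len(points)):
                if o in medoids:
                    continue
                trial = medoids[:i] + [o] + medoids[i + 1:]
                trial_cost = cost(trial)
                if trial_cost < best:  # accept any swap that lowers total cost
                    medoids, best, improved = trial, trial_cost, True
    labels = [
        min(range(k), key=lambda j: math.dist(p, points[medoids[j]]))
        for p in points
    ]
    return medoids, labels


# Hypothetical station coordinates (lat, lon) and projection parameters.
stations = [(30.0, 70.0), (30.1, 70.1), (30.2, 69.9),
            (35.0, 75.0), (35.1, 75.1), (34.9, 74.9)]
xy = [lambert_conformal_conic(la, lo, 25.0, 35.0, 30.0, 70.0) for la, lo in stations]
medoids, labels = pam_swap(xy, 2)
print(labels)
```

    Because each accepted swap strictly lowers the total distance to the nearest medoid, the search always terminates, but only at a local optimum; the full PAM algorithm evaluates every possible swap and applies the best one at each step.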

  4. Spatial Differentiation of Landscape Values in the Murray River Region of Victoria, Australia

    NASA Astrophysics Data System (ADS)

    Zhu, Xuan; Pfueller, Sharron; Whitelaw, Paul; Winter, Caroline

    2010-05-01

    This research advances the understanding of the location of perceived landscape values through a statistically based approach to spatial analysis of value densities. Survey data were obtained from a sample of people living in and using the Murray River region, Australia, where declining environmental quality prompted a reevaluation of its conservation status. When densities of 12 perceived landscape values were mapped using geographic information systems (GIS), valued places clustered along the entire river bank and in associated National/State Parks and reserves. While simple density mapping revealed high value densities in various locations, it did not indicate what density of a landscape value could be regarded as a statistically significant hotspot or distinguish whether overlapping areas of high density for different values indicate identical or adjacent locations. A spatial statistic Getis-Ord Gi* was used to indicate statistically significant spatial clusters of high value densities or “hotspots”. Of 251 hotspots, 40% were for single non-use values, primarily spiritual, therapeutic or intrinsic. Four hotspots had 11 landscape values. Two, lacking economic value, were located in ecologically important river red gum forests and two, lacking wilderness value, were near the major towns of Echuca-Moama and Albury-Wodonga. Hotspots for eight values showed statistically significant associations with another value. There were high associations between learning and heritage values while economic and biological diversity values showed moderate associations with several other direct and indirect use values. This approach may improve confidence in the interpretation of spatial analysis of landscape values by enhancing understanding of value relationships.
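
    The Getis-Ord Gi* statistic used here returns a z-score for each location that compares the weighted sum of values in its neighbourhood, including the location itself, with what would be expected if values were spatially random. A minimal sketch of the standard Gi* formula, assuming a dense weights matrix with non-zero diagonal; the function name and the six-area example are hypothetical. Values of |z| > 1.96 correspond to the conventional 95% threshold, with no correction for multiple testing.

```python
import math


def getis_ord_gi_star(values, weights):
    """Getis-Ord Gi* z-score for every location.

    weights[i][i] should be non-zero, because Gi* includes location i
    in its own neighbourhood. A row in which every location is a
    neighbour gives a zero denominator and is not handled here.
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(x * x for x in values) / n - mean ** 2)
    z = []
    for i in range(n):
        w = weights[i]
        sw = sum(w)                        # sum of weights for location i
        sw2 = sum(wij * wij for wij in w)  # sum of squared weights
        num = sum(wij * xj for wij, xj in zip(w, values)) - mean * sw
        den = s * math.sqrt((n * sw2 - sw ** 2) / (n - 1))
        z.append(num / den)
    return z


# Hypothetical example: six areas in a row; each area is its own neighbour.
w = [[1 if abs(i - j) <= 1 else 0 for j in range(6)] for i in range(6)]
print(getis_ord_gi_star([1, 1, 1, 1, 10, 10], w))
```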

  5. Geographical distribution patterns of iodine in drinking-water and its associations with geological factors in Shandong Province, China.

    PubMed

    Gao, Jie; Zhang, Zhijie; Hu, Yi; Bian, Jianchao; Jiang, Wen; Wang, Xiaoming; Sun, Liqian; Jiang, Qingwu

    2014-05-19

    County-based spatial distribution characteristics of iodine in drinking-water, and the related geological factors, were studied in Shandong Province (China). Spatial autocorrelation analysis and the spatial scan statistic were applied to analyze the spatial characteristics. Generalized linear models (GLMs) and geographically weighted regression (GWR) were used to explore the relationship between water iodine level and related geological factors. The spatial distribution of iodine in drinking-water was significantly heterogeneous in Shandong Province (Moran's I = 0.52, Z = 7.4, p < 0.001). Two clusters of high iodine in drinking-water were identified in the south-western and north-western parts of Shandong Province by the purely spatial scan statistic. Both GLMs and GWR indicated a significant global association between iodine in drinking-water and geological factors. Furthermore, GWR showed pronounced spatial variability across the study region. Soil type and distance to the Yellow River were statistically significant in most areas of Shandong Province, confirming the hypothesis that the Yellow River causes iodine deposits in Shandong Province. Our results suggest that more effective regional monitoring plans and water improvement strategies should target the cluster areas, based on the characteristics of geological factors and the spatial variability of local relationships between iodine in drinking-water and geological factors.
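
    The global Moran's I reported above (I = 0.52, Z = 7.4) summarizes spatial autocorrelation over the whole study region. The sketch below computes I for a dense weights matrix and attaches a permutation p-value. This is an alternative to the normal-approximation Z reported in the abstract, not the study's own procedure; the function names, the number of permutations and the six-area example are hypothetical.

```python
import random


def global_morans_i(values, weights):
    """Global Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    n = len(values)
    mean = sum(values) / n
    z = [x - mean for x in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    num = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(zi * zi for zi in z)
    return (n / s0) * num / den


def permutation_p_value(values, weights, n_perm=999, seed=0):
    """One-sided pseudo p-value: share of random relabelings with I >= observed."""
    rng = random.Random(seed)
    observed = global_morans_i(values, weights)
    shuffled = list(values)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any spatial arrangement of the values
        if global_morans_i(shuffled, weights) >= observed:
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)


# Hypothetical example: six areas in a row with a smooth gradient.
w = [[1 if abs(i - j) == 1 else 0 for j in range(6)] for i in range(6)]
i_obs, p = permutation_p_value([1, 2, 3, 4, 5, 6], w)
print(i_obs, p)
```

    Under spatial randomness the expected value of I is -1/(n - 1), so values well above that (as in the study) indicate that similar iodine levels occur in neighbouring counties.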

  6. Assessment of the climatic potential for tourism in Iran through biometeorology clustering.

    PubMed

    Roshan, Gholamreza; Yousefi, Robabe; Błażejczyk, Krzysztof

    2018-04-01

    This study presents a spatiotemporal analysis of bioclimatic comfort conditions in Iran using mean daily meteorological data from 1995 to 2014, analyzed with the Physiological Equivalent Temperature (PET) and the Universal Thermal Climate Index (UTCI), together with bioclimatic clustering. The results demonstrate that, owing to climate variability across Iran during the year, there is at any point in time a location with climatic conditions suitable for tourism. Mean values show maxima in bioclimatic comfort indices for the country in late winter and spring and minima in summer. Seven statistically significant clusters in bioclimatic indices were identified. Comparison with clustering performed on PET and UTCI showed maximum overlap between the two indices. The results further showed that the most appropriate bioclimatic clustering for Iran comprises seven clusters. Locating these clusters according to climatic suitability for tourism provides a valuable contribution to tourism management in the country, particularly for marketing destinations to maximize tourist flow.

  7. Modeling the Movement of Homicide by Type to Inform Public Health Prevention Efforts.

    PubMed

    Zeoli, April M; Grady, Sue; Pizarro, Jesenia M; Melde, Chris

    2015-10-01

    We modeled the spatiotemporal movement of hotspot clusters of homicide by motive in Newark, New Jersey, to investigate whether different homicide types have different patterns of clustering and movement. We obtained homicide data from the Newark Police Department Homicide Unit's investigative files from 1997 through 2007 (n = 560). We geocoded the address at which each homicide victim was found and recorded the date of and the motive for the homicide. We used cluster detection software to model the spatiotemporal movement of statistically significant homicide clusters by motive, using census tract and month of occurrence as the spatial and temporal units of analysis. Gang-motivated homicides showed evidence of clustering and diffusion through Newark. Additionally, gang-motivated homicide clusters overlapped to a degree with revenge and drug-motivated homicide clusters. Escalating dispute and nonintimate familial homicides clustered; however, there was no evidence of diffusion. Intimate partner and robbery homicides did not cluster. By tracking how homicide types diffuse through communities and determining which places have ongoing or emerging homicide problems by type, we can better inform the deployment of prevention and intervention efforts.

  8. Estimating multilevel logistic regression models when the number of clusters is low: a comparison of different statistical software procedures.

    PubMed

    Austin, Peter C

    2010-04-22

    Multilevel logistic regression models are increasingly being used to analyze clustered data in medical, public health, epidemiological, and educational research. Procedures for estimating the parameters of such models are available in many statistical software packages. There is currently little evidence on the minimum number of clusters necessary to reliably fit multilevel regression models. We conducted a Monte Carlo study to compare the performance of different statistical software procedures for estimating multilevel logistic regression models when the number of clusters was low. We examined procedures available in BUGS, HLM, R, SAS, and Stata. We found that there were qualitative differences in the performance of different software procedures for estimating multilevel logistic models when the number of clusters was low. Among the likelihood-based procedures, estimation methods based on adaptive Gauss-Hermite approximations to the likelihood (glmer in R and xtlogit in Stata) or adaptive Gaussian quadrature (Proc NLMIXED in SAS) tended to have superior performance for estimating variance components when the number of clusters was small, compared to software procedures based on penalized quasi-likelihood. However, only Bayesian estimation with BUGS allowed for accurate estimation of variance components when there were fewer than 10 clusters. For all statistical software procedures, estimation of variance components tended to be poor when there were only five subjects per cluster, regardless of the number of clusters.

  9. Spatial distribution and cluster analysis of risky sexual behaviours and STDs reported by Chinese adults in Guangzhou, China: a representative population-based study

    PubMed Central

    Chen, Wen; Zhou, Fangjing; Hall, Brian J; Wang, Yu; Latkin, Carl; Ling, Li; Tucker, Joseph D

    2016-01-01

    Objectives To assess associations between residential location, risky sexual behaviours and sexually transmitted diseases (STDs) among adults living in Guangzhou, China. Methods Data were obtained from 751 Chinese adults aged 18–59 years in Guangzhou, China, recruited by stratified random sampling; spatial epidemiological methods were used for analysis. Face-to-face household interviews were conducted to collect self-report data on risky sexual behaviours and diagnosed STDs. Kulldorff’s spatial scan statistic was used to identify the spatial distribution and clusters of risky sexual behaviours and STDs. The presence and location of statistically significant clusters were mapped in the study areas using ArcGIS software. Results The prevalence of self-reported risky sexual behaviours was between 5.1% and 50.0%. The self-reported lifetime prevalence of diagnosed STDs was 7.06%. Anal intercourse clustered in an area located along the border within the rural–urban continuum (p=0.001). High-rate clusters for alcohol or other drug use before sex (p=0.008) and for migrants who had lived in Guangzhou <1 year (p=0.007) overlapped this cluster. Excess cases of unprotected sex (p=0.031) overlapped the cluster of college students (p<0.001). Five of nine (55.6%) students who had sexual experience during the last 12 months were located in the cluster of unprotected sex. Conclusions Short-term migrants and college students reported more risky sexual behaviours. Programmes to increase safer sex within these communities to reduce the risk of STDs are warranted in Guangzhou. Spatial analysis identified geographical clusters of risky sexual behaviours, which is critical for optimising surveillance and targeting control measures for these locations in the future. PMID:26843400

  10. Identifying and Assessing Interesting Subgroups in a Heterogeneous Population.

    PubMed

    Lee, Woojoo; Alexeyenko, Andrey; Pernemalm, Maria; Guegan, Justine; Dessen, Philippe; Lazar, Vladimir; Lehtiö, Janne; Pawitan, Yudi

    2015-01-01

    Biological heterogeneity is common in many diseases and it is often the reason for therapeutic failures. Thus, there is great interest in classifying a disease into subtypes that have clinical significance in terms of prognosis or therapy response. One of the most popular methods to uncover unrecognized subtypes is cluster analysis. However, classical clustering methods such as k-means clustering or hierarchical clustering are not guaranteed to produce clinically interesting subtypes. This could be because the main statistical variability--the basis of cluster generation--is dominated by genes not associated with the clinical phenotype of interest. Furthermore, a strong prognostic factor might be relevant for a certain subgroup but not for the whole population; thus an analysis of the whole sample may not reveal this prognostic factor. To address these problems we investigate methods to identify and assess clinically interesting subgroups in a heterogeneous population. The identification step uses a clustering algorithm and to assess significance we use a false discovery rate- (FDR-) based measure. Under the heterogeneity condition the standard FDR estimate is shown to overestimate the true FDR value, but this is remedied by an improved FDR estimation procedure. As illustrations, two real data examples from gene expression studies of lung cancer are provided.

  11. Finding Groups Using Model-based Cluster Analysis: Heterogeneous Emotional Self-regulatory Processes and Heavy Alcohol Use Risk

    PubMed Central

    Mun, Eun-Young; von Eye, Alexander; Bates, Marsha E.; Vaschillo, Evgeny G.

    2010-01-01

    Model-based cluster analysis is a new clustering procedure that investigates population heterogeneity using finite mixtures of multivariate normal densities. It is an inferentially based, statistically principled procedure that uses the Bayesian Information Criterion (BIC) to compare multiple, including non-nested, models and identify the optimum number of clusters. The current study clustered 36 young men and women based on their baseline heart rate (HR) and HR variability (HRV), chronic alcohol use, and reasons for drinking. Two cluster groups were identified and labeled High Alcohol Risk and Normative groups. Compared to the Normative group, individuals in the High Alcohol Risk group had higher levels of alcohol use and more strongly endorsed disinhibition and suppression reasons for use. The High Alcohol Risk group showed significant HRV changes in response to positive and negative emotional and appetitive picture cues, compared to neutral cues. In contrast, the Normative group showed a significant HRV change only to negative cues. Findings suggest that individuals with autonomic self-regulatory difficulties may be more susceptible to heavy alcohol use and may use alcohol for emotional regulation. PMID:18331138

  12. Is transgendered male androphilia familial in non-Western populations? The case of a Samoan village.

    PubMed

    Vanderlaan, Doug P; Vokey, John R; Vasey, Paul L

    2013-04-01

    In Western populations, male gender atypicality (i.e., cross-gender behavior and identity) and male androphilia (i.e., sexual attraction to adult males) tend to cluster in particular families. Here, we examined whether this familial clustering effect extends to non-Western populations by examining the genealogical relationships of 17 Samoan transgendered androphilic males, known locally as fa'afafine, who were born in the same rural Samoan village. Specifically, we compared the genealogies of these 17 fa'afafine with those of 17 age-matched comparison males born in the same village. In addition to familial clustering, we examined birth order, sibship sex ratio, and sibship size. The fa'afafine were significantly later born than the comparison males. The fa'afafine and the comparison males clustered into five and 16 distinct lineages, respectively, which constituted a statistically significant degree of family clustering among the 17 fa'afafine. Hence, the present study indicated that transgendered male androphilia is familial in this particular Samoan village, adding to a growing literature demonstrating that male androphilia and gender atypicality have consistent developmental correlates across populations. The discussion focuses on the possible bases of this familial clustering effect and directions for future research.

  13. Genotyping and spatial analysis of pulmonary tuberculosis and diabetes cases in the state of Veracruz, Mexico

    PubMed Central

    Blanco-Guillot, Francles; Ferreyra-Reyes, Leticia; Delgado-Sánchez, Guadalupe; Ferreira-Guerrero, Elizabeth; Montero-Campos, Rogelio; Bobadilla-del-Valle, Miriam; Martínez-Gamboa, Rosa Areli; Torres-González, Pedro; Téllez-Vazquez, Norma; Canizales-Quintero, Sergio; Yanes-Lane, Mercedes; Mongua-Rodríguez, Norma; Ponce-de-León, Alfredo; Sifuentes-Osornio, José

    2018-01-01

    Background Genotyping and georeferencing in tuberculosis (TB) have been used to characterize the distribution of the disease and occurrence of transmission within specific groups and communities. Objective The objective of this study was to test the hypothesis that diabetes mellitus (DM) and pulmonary TB may occur in spatial and molecular aggregations. Material and methods Retrospective cohort study of patients with pulmonary TB. The study area included 12 municipalities in the Sanitary Jurisdiction of Orizaba, Veracruz, México. Patients with acid-fast bacilli in sputum smears and/or Mycobacterium tuberculosis in sputum cultures were recruited from 1995 to 2010. Clinical (standardized questionnaire, physical examination, chest X-ray, blood glucose test and HIV test), microbiological, epidemiological, and molecular evaluations were carried out. Patients were considered “genotype-clustered” if two or more isolates from different patients were identified within 12 months of each other and had six or more IS6110 bands in an identical pattern, or < 6 bands with identical IS6110 RFLP patterns and spoligotype with the same spacer oligonucleotides. Residential and health care center addresses were georeferenced. We used a Jeep hand GPS. The coordinates were transferred from the GPS files to ArcGIS using ArcMap 9.3. We evaluated global spatial aggregation of patients in IS6110-RFLP/spoligotype clusters using global Moran's I. Since the global distribution was not random, we evaluated “hotspots” using the Getis-Ord Gi* statistic. Using bivariate and multivariate analysis, we analyzed sociodemographic, behavioral, clinical and bacteriological conditions associated with “hotspots”. We used STATA® v13.1 for all statistical analysis. Results From 1995 to 2010, 1,370 patients >20 years were diagnosed with pulmonary TB; 33% had DM. 
The proportion of isolates that were genotyped was 80.7% (n = 1105), of which 31% (n = 342) were grouped in 91 genotype clusters with 2 to 23 patients each; 65.9% of total clusters were small (2 members), involving 35.08% of patients. Twenty-three percent (22.7%) of cases were classified as recent transmission. Moran's I indicated that the distribution of patients in IS6110-RFLP/spoligotype clusters was not random (Moran's I = 0.035468, Z value = 7.0, p = 0.00). Local spatial analysis showed statistically significant spatial aggregation of patients in IS6110-RFLP/spoligotype clusters, identifying “hotspots” and “coldspots”. The Gi* statistic showed that the hotspot for spatial clustering was located in Camerino Z. Mendoza municipality; 14.6% (50/342) of patients in genotype clusters were located in a hotspot; of these, 60% (30/50) lived with DM. Using logistic regression, the statistically significant variables associated with hotspots were DM [adjusted Odds Ratio (aOR) 7.04, 95% Confidence interval (CI) 3.03–16.38] and attending the health center in Camerino Z. Mendoza (aOR 18.04, 95% CI 7.35–44.28). Conclusions The combination of molecular and epidemiological information with geospatial data allowed us to identify the concurrence of molecular clustering and spatial aggregation of patients with DM and TB. This information may be highly useful for TB control programs. PMID:29534104

  14. Vaccines for preventing anthrax.

    PubMed

    Donegan, Sarah; Bellamy, Richard; Gamble, Carrol L

    2009-04-15

    Anthrax is a bacterial zoonosis that occasionally causes human disease and is potentially fatal. Anthrax vaccines include a live-attenuated vaccine, an alum-precipitated cell-free filtrate vaccine, and a recombinant protein vaccine. To evaluate the effectiveness, immunogenicity, and safety of vaccines for preventing anthrax. We searched the following databases (November 2008): Cochrane Infectious Diseases Group Specialized Register; CENTRAL (The Cochrane Library 2008, Issue 4); MEDLINE; EMBASE; LILACS; and mRCT. We also searched reference lists. We included randomized controlled trials (RCTs) of individuals and cluster-RCTs comparing anthrax vaccine with placebo, other (non-anthrax) vaccines, or no intervention; or comparing administration routes or treatment regimens of anthrax vaccine. Two authors independently considered trial eligibility, assessed risk of bias, and extracted data. We presented cases of anthrax and seroconversion rates using risk ratios (RR) and 95% confidence intervals (CI). We summarized immunoglobulin G (IgG) concentrations using geometric means. We carried out a sensitivity analysis to investigate the effect of clustering on the results from one cluster-RCT. No meta-analysis was undertaken. One cluster-RCT (with 157,259 participants) and four RCTs of individuals (1917 participants) met the inclusion criteria. The cluster-RCT from the former USSR showed that, compared with no vaccine, a live-attenuated vaccine (called STI) protected against clinical anthrax whether given by a needleless device (RR 0.16; 102,737 participants, 154 clusters) or the scarification method (RR 0.25; 104,496 participants, 151 clusters). Confidence intervals were statistically significant in unadjusted calculations, but when a small amount of association within clusters was assumed, the differences were not statistically significant. 
The four RCTs (of individuals) of inactivated vaccines (anthrax vaccine absorbed and recombinant protective antigen) showed a dose-response relationship for the anti-protective antigen IgG antibody titre. Intramuscular administration was associated with fewer injection site reactions than subcutaneous injection, and injection site reaction rates were lower when the dosage interval was longer. One cluster-RCT provides limited evidence that a live-attenuated vaccine is effective in preventing cutaneous anthrax. Vaccines based on anthrax antigens are immunogenic in most vaccinees with few adverse events or reactions. Ongoing randomized controlled trials are investigating the immunogenicity and safety of anthrax vaccines.
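The clustering sensitivity analysis described above can be illustrated with a risk ratio whose log-scale variance is inflated by a design effect. This is a sketch under stated assumptions. It uses the Kish design effect 1 + (m - 1)·ICC for equal cluster sizes, which may not be the adjustment the review authors used. The event counts in the usage lines are hypothetical because the record does not report them.

```python
import math


def risk_ratio_ci(events_a, n_a, events_b, n_b, design_effect=1.0, z=1.96):
    """Risk ratio of group A vs group B with a 95% CI on the log scale.

    design_effect inflates the variance of log(RR) to allow for correlation
    of outcomes within clusters (1.0 means no clustering). Event counts must
    be positive.
    """
    rr = (events_a / n_a) / (events_b / n_b)
    # Large-sample variance of log(RR) for two independent binomial samples
    var_log = 1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b
    se = math.sqrt(var_log * design_effect)
    return rr, math.exp(math.log(rr) - z * se), math.exp(math.log(rr) + z * se)


def design_effect(mean_cluster_size, icc):
    """Kish design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (mean_cluster_size - 1) * icc


m = 102737 / 154  # average participants per cluster, needleless-device comparison
for icc in (0.0, 0.001):  # hypothetical intraclass correlations
    de = design_effect(m, icc)
    print(icc, round(de, 3), risk_ratio_ci(10, 50000, 60, 50000, design_effect=de))
```

The average cluster is large, roughly 667 people if the 102,737 participants are split across 154 clusters. Even a tiny intraclass correlation therefore inflates the variance substantially. This is why a small assumed within-cluster association can remove statistical significance.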

  15. The balance between keystone clustering and bed roughness in experimental step-pool stabilization

    NASA Astrophysics Data System (ADS)

    Johnson, J. P.

    2016-12-01

    Predicting how mountain channels will respond to environmental perturbations such as floods requires an improved quantitative understanding of morphodynamic feedbacks among bed topography, surface grain size and sediment sorting. In boulder-rich gravel streams, transport and sorting often lead to the development of step-pool morphologies, which are expressed both in bed topography and coarse grain clustering. Bed stability is difficult to measure, and it is sometimes inferred from the presence of step pools. I use scaled flume experiments to explore feedbacks among surface grain sizes, coarse grain clustering, bed roughness and hydraulic roughness during progressive bed stabilization and over a range of sediment transport rates. While grain clusters are sometimes identified by subjective interpretation, I quantify the degree of coarse surface grain clustering using spatial statistics, including a novel normalization of Ripley's K function. This approach is objective and provides information on the strength of clustering over a range of length scales. Flume experiments start with an initial bed surface that has a broad grain size distribution and grains at spatially random positions. Flow causes the bed surface to progressively stabilize in response to erosion, surface coarsening, roughening and grain reorganization. At 95% confidence, many but not all beds stabilized with coarse grains becoming more clustered than expected under complete spatial randomness (CSR). I observe a tradeoff between topographic roughness and clustering. Beds that stabilized with higher degrees of coarse-grain clustering were topographically smoother, and vice versa. Initial conditions influenced the degree of clustering at stability: beds that happened to have fewer initial coarse grains had more coarse grain reorganization during stabilization, leading to more clustering. 
Finally, regressions demonstrate that clustering statistics actually predict hydraulic roughness significantly better than does D84 (the size at which 84% of grains are smaller). In the experimental data, the spatial organization of surface grains is a stronger control on flow characteristics than the size of surface grains.
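Ripley's K compares the number of neighbours within distance r of a typical point with the expectation under complete spatial randomness, K(r) = πr². The sketch below uses the naive estimator without edge correction, together with Besag's L(r) - r transform, which is one standard normalization. The author's novel normalization is not described in this record and is not reproduced here. The point sets are hypothetical.

```python
import math
import random


def ripley_k(points, r, area):
    """Naive Ripley's K estimate (no edge correction) at distance r."""
    n = len(points)
    count = 0
    for i in range(n):
        xi, yi = points[i]
        for j in range(n):
            if i != j:
                xj, yj = points[j]
                if math.hypot(xi - xj, yi - yj) <= r:
                    count += 1  # ordered pair (i, j) within distance r
    return area * count / (n * (n - 1))


def besag_l_minus_r(points, r, area):
    """L(r) - r: about 0 under CSR, > 0 for clustering, < 0 for regularity."""
    return math.sqrt(ripley_k(points, r, area) / math.pi) - r


rng = random.Random(0)
csr = [(rng.random(), rng.random()) for _ in range(200)]
clustered = [(0.5 + rng.gauss(0, 0.03), 0.5 + rng.gauss(0, 0.03)) for _ in range(200)]
for r in (0.02, 0.05, 0.1):
    print(r, besag_l_minus_r(csr, r, 1.0), besag_l_minus_r(clustered, r, 1.0))
```

Without edge correction the estimate is biased low near the boundary. In practice the observed curve is therefore compared against Monte Carlo envelopes from simulated CSR patterns, which is one way a statement such as "at 95% confidence" can be made.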

  16. Cluster analysis of fasciolosis in dairy cow herds in Munster province of Ireland and detection of major climatic and environmental predictors of the exposure risk.

    PubMed

    Selemetas, Nikolaos; Phelan, Paul; O'Kiely, Padraig; de Waal, Theo

    2015-03-19

    Fasciolosis caused by Fasciola hepatica is a widespread parasitic disease in cattle farms. The aim of this study was to detect clusters of fasciolosis in dairy cow herds in Munster Province, Ireland, and to identify significant climatic and environmental predictors of the exposure risk. In total, 1,292 dairy herds across Munster were sampled in September 2012, each providing a single bulk tank milk (BTM) sample. Analysis of the samples by an in-house antibody-detection enzyme-linked immunosorbent assay (ELISA) showed that 65% of the dairy herds (n = 842) had been exposed to F. hepatica. Using the Getis-Ord Gi* statistic, 16 high-risk and 24 low-risk (P < 0.01) clusters of fasciolosis were identified. The high-risk clusters were more dispersed and mainly located in the northern and western regions of Munster, whereas the low-risk clusters were mostly concentrated in the southern and eastern regions. The classes of variables that best distinguished high-risk from low-risk clusters were the total number of wet-days and rain-days, rainfall, the normalized difference vegetation index (NDVI), temperature and soil type. Well-drained soils made up a larger proportion of the low-risk clusters, whereas poorly drained soils were more common among the high-risk clusters. These results stress the role of precipitation, grazing, temperature and drainage in the life cycle of F. hepatica in the temperate Irish climate. The findings of this study highlight the importance of cluster analysis for identifying significant differences in climatic and environmental variables between high-risk and low-risk clusters of fasciolosis in Irish dairy herds.
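The Getis-Ord Gi* statistic used above is a z-score for whether an area and its neighbours together hold unusually high values (a hot spot) or low values (a cold spot). A minimal sketch follows. It assumes a weights matrix that includes each area itself (w[i][i] > 0), and the herd-level values in the example are hypothetical. Under the normal approximation, |Gi*| above about 2.58 corresponds to a two-sided P < 0.01, without any multiple-testing correction.

```python
import math


def getis_ord_gi_star(values, w):
    """Getis-Ord Gi* z-scores; each row w[i] must include w[i][i] > 0."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(x * x for x in values) / n - mean * mean)
    if s == 0:
        raise ValueError("Gi* is undefined when all values are equal")
    scores = []
    for i in range(n):
        wi = w[i]
        sum_w = sum(wi)
        sum_w2 = sum(x * x for x in wi)
        # Local weighted sum minus its expectation under the global mean
        num = sum(wij * xj for wij, xj in zip(wi, values)) - mean * sum_w
        den = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        scores.append(num / den)
    return scores


# Hypothetical: nine herds on a line, a block of high values in the middle.
vals = [0, 0, 0, 10, 10, 10, 0, 0, 0]
w = [[1 if abs(i - j) <= 1 else 0 for j in range(9)] for i in range(9)]
print([round(g, 2) for g in getis_ord_gi_star(vals, w)])
```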

  17. Overcoming the effects of false positives and threshold bias in graph theoretical analyses of neuroimaging data.

    PubMed

    Drakesmith, M; Caeyenberghs, K; Dutt, A; Lewis, G; David, A S; Jones, D K

    2015-09-01

    Graph theory (GT) is a powerful framework for quantifying topological features of neuroimaging-derived functional and structural networks. However, false positive (FP) connections arise frequently and influence the inferred topology of networks. Thresholding is often used to overcome this problem, but an appropriate threshold often relies on a priori assumptions, which will alter inferred network topologies. Four common network metrics (global efficiency, mean clustering coefficient, mean betweenness and small-worldness) were tested using a model tractography dataset. All four network metrics were significantly affected by even a single FP. Results also show that thresholding effectively dampens the impact of FPs, but at the expense of adding significant bias to network metrics. In a larger set of tractography datasets (n = 248), statistics were computed across random group permutations for a range of thresholds, revealing that statistics for network metrics varied significantly more than those for non-network metrics (i.e., number of streamlines and number of edges). Varying degrees of network atrophy were introduced artificially to half the datasets to test sensitivity to genuine group differences. For some network metrics, this atrophy was detected as significant (p < 0.05, determined using permutation testing) only across a limited range of thresholds. We propose a multi-threshold permutation correction (MTPC) method, based on the cluster-enhanced permutation correction approach, to identify sustained significant effects across clusters of thresholds. This approach minimises the need to choose a single threshold a priori. We demonstrate improved sensitivity of MTPC-corrected metrics to genuine group effects compared to an existing approach, and demonstrate the use of MTPC on a previously published network analysis of tractography data derived from a clinical population. 
In conclusion, we show that there are large biases and instability induced by thresholding, making statistical comparisons of network metrics difficult. However, by testing for effects across multiple thresholds using MTPC, true group differences can be robustly identified. Copyright © 2015. Published by Elsevier Inc.
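The sensitivity of network metrics to a single false-positive connection can be seen with the mean clustering coefficient of a binarized graph. In this sketch the weights are hypothetical, and MTPC itself is not implemented. A four-node ring has no triangles, but one weak spurious diagonal that survives a low threshold raises the mean clustering coefficient from 0 to 5/6.

```python
def threshold_matrix(weights, tau):
    """Binarize a symmetric weighted matrix: keep edges with weight >= tau."""
    n = len(weights)
    return [[1 if i != j and weights[i][j] >= tau else 0 for j in range(n)]
            for i in range(n)]


def mean_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    n = len(adj)
    total = 0.0
    for i in range(n):
        nbrs = [j for j in range(n) if adj[i][j]]
        k = len(nbrs)
        if k < 2:
            continue
        # Number of edges among the neighbours of node i (closed triangles)
        links = sum(adj[a][b] for idx, a in enumerate(nbrs) for b in nbrs[idx + 1:])
        total += 2.0 * links / (k * (k - 1))
    return total / n


# Ring 0-1-2-3-0 with strong weights, plus a weak spurious edge 0-2.
W = [
    [0.0, 0.9, 0.2, 0.9],
    [0.9, 0.0, 0.9, 0.0],
    [0.2, 0.9, 0.0, 0.9],
    [0.9, 0.0, 0.9, 0.0],
]
for tau in (0.5, 0.1):
    print(tau, mean_clustering(threshold_matrix(W, tau)))
```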

  18. Using exploratory data analysis to identify and predict patterns of human Lyme disease case clustering within a multistate region, 2010-2014.

    PubMed

    Hendricks, Brian; Mark-Carew, Miguella

    2017-02-01

    Lyme disease is the most commonly reported vectorborne disease in the United States. The objective of our study was to identify patterns of Lyme disease reporting after multistate inclusion to mitigate potential border effects. County-level human Lyme disease surveillance data were obtained from the Kentucky, Maryland, Ohio, Pennsylvania, Virginia, and West Virginia state health departments. Rate smoothing and local Moran's I were used to identify clusters of reporting activity and spatial outliers. A logistic generalized estimating equation (GEE) model was used to identify significant associations in disease clustering over time. The analyses identified statistically significant (P = 0.05) clusters of high reporting activity and trends over time. High reporting activity aggregated near border counties in high-incidence states, while low reporting aggregated near shared county borders in non-high-incidence states. Findings highlight the need for exploratory surveillance approaches to describe the extent to which state-level reporting affects accurate estimation of Lyme disease progression. Copyright © 2017 Elsevier Ltd. All rights reserved.
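Local Moran's I breaks global autocorrelation down into one value per county. The sign of each area's deviation and the sign of its neighbours' average deviation give the usual HH/LL cluster labels and HL/LH spatial-outlier labels. A minimal sketch with row-standardised weights follows. It does not implement the rate-smoothing step or the conditional permutation test used to decide significance, and the input values are hypothetical.

```python
def local_morans_i(values, neighbours):
    """Local Moran's I_i with row-standardised weights.

    neighbours[i] lists the indices adjacent to area i.
    Returns (I_i, quadrant) pairs; quadrant is HH, LL, HL or LH.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(zi * zi for zi in z) / n  # second moment of the deviations
    results = []
    for i in range(n):
        nb = neighbours[i]
        lag = sum(z[j] for j in nb) / len(nb) if nb else 0.0  # spatial lag
        quad = ("H" if z[i] >= 0 else "L") + ("H" if lag >= 0 else "L")
        results.append((z[i] * lag / m2, quad))
    return results


chain = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
print(local_morans_i([1, 1, 1, 9, 9, 9], chain))
```

A negative I_i with an HL or LH label marks a spatial outlier, such as a high-reporting county surrounded by low-reporting ones.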

  19. Geographic Variation of Amyotrophic Lateral Sclerosis Incidence in New Jersey, 2009–2011

    PubMed Central

    Henry, Kevin A.; Fagliano, Jerald; Jordan, Heather M.; Rechtman, Lindsay; Kaye, Wendy E.

    2015-01-01

    Few analyses in the United States have examined geographic variation and socioeconomic disparities in amyotrophic lateral sclerosis (ALS) incidence, because of a lack of population-based incidence data. In this analysis, we used population-based ALS data to determine whether ALS incidence clusters geographically and whether ALS risk varies by area-based socioeconomic status (SES). This study included 493 incident ALS cases diagnosed (via El Escorial criteria) in New Jersey between 2009 and 2011. Geographic variation and clustering of ALS incidence were assessed using a spatial scan statistic and Bayesian geoadditive models. Poisson regression was used to estimate the associations between ALS risk and SES based on census-tract median income while controlling for age, sex, and race. ALS incidence varied across and within counties, but there were no statistically significant geographic clusters. SES was associated with ALS incidence. After adjustment for age, sex, and race, the relative risk of ALS was significantly higher (relative risk (RR) = 1.37, 95% confidence interval (CI): 1.02, 1.82) in the highest income quartile than in the lowest. The relative risk of ALS was significantly lower among blacks (RR = 0.57, 95% CI: 0.39, 0.83) and Asians (RR = 0.63, 95% CI: 0.41, 0.97) than among whites. Our findings suggest that ALS incidence in New Jersey is associated with SES and race. PMID:26041711
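The record does not say which scan model was used. A common choice for incidence counts is Kulldorff's Poisson spatial scan statistic, which is sketched below as an assumption. Circles are grown around each area centroid, and the log-likelihood ratio compares the rate inside each zone with the rate outside it. The zone with the largest ratio is the most likely cluster. The Monte Carlo step that turns this maximum into a p-value is omitted; that step repeats the scan on data simulated under the null.

```python
import math


def poisson_llr(c, expected, total):
    """Kulldorff's Poisson log-likelihood ratio for a zone with c cases."""
    if c <= expected:
        return 0.0  # scan only for elevated risk
    inside = c * math.log(c / expected)
    outside = 0.0
    if total > c:
        outside = (total - c) * math.log((total - c) / (total - expected))
    return inside + outside


def circular_scan(coords, cases, pops, max_pop_frac=0.5):
    """Return (max LLR, zone indices) over circles centred on each area."""
    total_cases = sum(cases)
    total_pop = sum(pops)
    best_llr, best_zone = 0.0, []
    for cx, cy in coords:
        # Grow the circle by adding areas in order of distance from the centre
        order = sorted(range(len(coords)),
                       key=lambda j: (coords[j][0] - cx) ** 2 + (coords[j][1] - cy) ** 2)
        c = p = 0
        zone = []
        for j in order:
            if p + pops[j] > max_pop_frac * total_pop:
                break  # cap the zone at a fraction of the total population
            zone.append(j)
            c += cases[j]
            p += pops[j]
            llr = poisson_llr(c, total_cases * p / total_pop, total_cases)
            if llr > best_llr:
                best_llr, best_zone = llr, list(zone)
    return best_llr, best_zone


# Hypothetical: six equal-population tracts on a line, one with excess cases.
coords = [(float(i), 0.0) for i in range(6)]
print(circular_scan(coords, [2, 2, 2, 10, 2, 2], [100] * 6))
```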

  20. Sun protection at elementary schools: a cluster randomized trial.

    PubMed

    Hunter, Seft; Love-Jackson, Kymia; Abdulla, Rania; Zhu, Weiwei; Lee, Ji-Hyun; Wells, Kristen J; Roetzheim, Richard

    2010-04-07

    Elementary schools represent both a source of childhood sun exposure and a setting for educational interventions. Sun Protection of Florida's Children was a cluster randomized trial promoting hat use at school (primary outcome) and outside of school among fourth-grade students from August 8, 2006, through May 22, 2007. Twenty-two schools were randomly assigned to the intervention group (1115 students) or the control group (1376 students). Intervention schools received classroom sessions targeting sun protection attitudes and social norms. Each student attending an intervention school received two free wide-brimmed hats. Hat use at school was measured by direct observation, and hat use outside of school was measured by self-report. A subgroup of 378 students (178 in the intervention group and 200 in the control group) underwent serial measurements of skin pigmentation to explore potential physiological effects of the intervention. Generalized linear mixed models were used to evaluate the intervention effect, accounting for the cluster randomized trial design. All P values were two-sided, and P < .05 was considered statistically significant. The percentage of students observed wearing hats at control schools remained essentially unchanged during the school year (baseline = 2%, fall = 0%, and spring = 1%) but increased statistically significantly at intervention schools (baseline = 2%, fall = 30%, and spring = 41%) (P < .001 for the intervention effect comparing the change in rate of hat use over time at intervention vs control schools). Self-reported use of hats outside of school did not change statistically significantly during the study (control: baseline = 14%, fall = 14%, and spring = 11%; intervention: baseline = 24%, fall = 24%, and spring = 23%), and neither did measures of skin pigmentation. 
The intervention increased use of hats among fourth-grade students at school but had no effect on self-reported wide-brimmed hat use outside of school or on measures of skin pigmentation.

  1. Detection of the kinematic Sunyaev–Zel'dovich effect with DES Year 1 and SPT

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Soergel, B.; Flender, S.; Story, K. T.

    Here, we detect the kinematic Sunyaev-Zel'dovich (kSZ) effect with a statistical significance of $4.2\sigma$ by combining a cluster catalogue derived from the first year data of the Dark Energy Survey (DES) with CMB temperature maps from the South Pole Telescope Sunyaev-Zel'dovich (SPT-SZ) Survey. This measurement is performed with a differential statistic that isolates the pairwise kSZ signal, providing the first detection of the large-scale, pairwise motion of clusters using redshifts derived from photometric data. By fitting the pairwise kSZ signal to a theoretical template we measure the average central optical depth of the cluster sample, $\bar{\tau}_e = (3.75 \pm 0.89)\cdot 10^{-3}$. We compare the extracted signal to realistic simulations and find good agreement with respect to the signal-to-noise, the constraint on $\bar{\tau}_e$, and the corresponding gas fraction. High-precision measurements of the pairwise kSZ signal with future data will be able to place constraints on the baryonic physics of galaxy clusters, and could be used to probe gravity on scales $\gtrsim 100$ Mpc.

  2. Detection of the kinematic Sunyaev–Zel'dovich effect with DES Year 1 and SPT

    DOE PAGES

    Soergel, B.; Flender, S.; Story, K. T.; ...

    2016-06-17

    Here, we detect the kinematic Sunyaev-Zel'dovich (kSZ) effect with a statistical significance of $4.2\sigma$ by combining a cluster catalogue derived from the first year data of the Dark Energy Survey (DES) with CMB temperature maps from the South Pole Telescope Sunyaev-Zel'dovich (SPT-SZ) Survey. This measurement is performed with a differential statistic that isolates the pairwise kSZ signal, providing the first detection of the large-scale, pairwise motion of clusters using redshifts derived from photometric data. By fitting the pairwise kSZ signal to a theoretical template we measure the average central optical depth of the cluster sample, $\bar{\tau}_e = (3.75 \pm 0.89)\cdot 10^{-3}$. We compare the extracted signal to realistic simulations and find good agreement with respect to the signal-to-noise, the constraint on $\bar{\tau}_e$, and the corresponding gas fraction. High-precision measurements of the pairwise kSZ signal with future data will be able to place constraints on the baryonic physics of galaxy clusters, and could be used to probe gravity on scales $\gtrsim 100$ Mpc.
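A pairwise kSZ statistic weights the temperature differences between cluster pairs by a geometric factor that projects their relative motion onto the line of sight. The sketch below is the estimator commonly used in pairwise-kSZ analyses. It assumes comoving Cartesian positions with the observer at the origin and aperture-filtered temperatures at each cluster. It does not reproduce this paper's photometric-redshift treatment, covariance estimation, or template fit, and sign conventions differ between papers.

```python
import math


def pairwise_ksz(positions, temps, r_min, r_max):
    """Pairwise kSZ estimate for pairs with separation in [r_min, r_max).

    positions: comoving (x, y, z) of each cluster, observer at the origin.
    temps: temperature measured at each cluster (e.g. aperture-filtered).
    Returns nan when no pair falls in the separation bin.
    """
    num = den = 0.0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            pi, pj = positions[i], positions[j]
            ri = math.sqrt(sum(a * a for a in pi))
            rj = math.sqrt(sum(a * a for a in pj))
            sep = math.dist(pi, pj)
            if not (r_min <= sep < r_max):
                continue
            cos_t = sum(a * b for a, b in zip(pi, pj)) / (ri * rj)
            # Geometric weight projecting the pair separation onto the line of sight
            c_ij = (ri - rj) * (1 + cos_t) / (2 * sep)
            num += (temps[i] - temps[j]) * c_ij
            den += c_ij * c_ij
    return -num / den if den > 0 else float("nan")


# Hypothetical two clusters along the same line of sight, 10 Mpc apart.
print(pairwise_ksz([(0.0, 0.0, 100.0), (0.0, 0.0, 110.0)], [5.0, 0.0], 0.0, 50.0))
```

Fitting the binned estimate to a template whose amplitude scales with the mean optical depth is the kind of step that yields a constraint such as the quoted $\bar{\tau}_e$.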

  3. Light clusters and pasta phases in warm and dense nuclear matter

    NASA Astrophysics Data System (ADS)

    Avancini, Sidney S.; Ferreira, Márcio; Pais, Helena; Providência, Constança; Röpke, Gerd

    2017-04-01

    The pasta phases are calculated for warm stellar matter in a framework of relativistic mean-field models, including the possibility of light cluster formation. Results from three different semiclassical approaches are compared with a quantum statistical calculation. Light clusters are considered as point-like particles, and their abundances are determined from the minimization of the free energy. The couplings of the light clusters to mesons are determined from experimental chemical equilibrium constants and many-body quantum statistical calculations. The effect of these light clusters on the chemical potentials is also discussed. It is shown that, by including heavy clusters, light clusters are present up to larger nucleonic densities, although with smaller mass fractions.

  4. Temporal Clustering of Regional-Scale Extreme Precipitation Events in Southern Switzerland

    NASA Astrophysics Data System (ADS)

    Barton, Yannick; Giannakaki, Paraskevi; Von Waldow, Harald; Chevalier, Clément; Pfahl, Stephan; Martius, Olivia

    2017-04-01

    Temporal clustering of extreme precipitation events on subseasonal time scales is a form of compound extremes and is of crucial importance for the formation of large-scale flood events. Here, the temporal clustering of regional-scale extreme precipitation events in southern Switzerland is studied. These precipitation events are relevant for the flooding of lakes in southern Switzerland and northern Italy. This research determines whether temporal clustering is present and then identifies the dynamics that are responsible for the clustering. An observation-based gridded precipitation dataset of Swiss daily rainfall sums and ECMWF reanalysis datasets are used. To analyze the clustering in the precipitation time series, a modified version of Ripley's K function is used. It measures the average number of extreme events within a time window, which is used to characterize temporal clustering on subseasonal time scales and to determine the statistical significance of the clustering. Significant clustering of regional-scale precipitation extremes is found on subseasonal time scales during the fall season. Four high-impact clustering episodes are then selected and the dynamics responsible for the clustering are examined. During the four clustering episodes, all heavy precipitation events were associated with an upper-level breaking Rossby wave over western Europe, and in most cases strong diabatic processes upstream over the Atlantic played a role in the amplification of these breaking waves. Atmospheric blocking downstream over eastern Europe supported this wave breaking during two of the clustering episodes. During one of the clustering periods, several extratropical transitions of tropical cyclones in the Atlantic contributed to the formation of high-amplitude ridges over the Atlantic basin and downstream wave breaking. During another event, blocking over Alaska assisted the phase locking of the Rossby waves downstream over the Atlantic.
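A one-dimensional analogue of Ripley's K can be applied to event dates. For a typical extreme event, it counts how many other events fall within ±t days, scaled so that a homogeneous Poisson process gives roughly 2t. The sketch below uses the plain estimator with a Monte Carlo test against uniformly placed events. It is not the study's modified version, and the event dates are hypothetical. A uniform null also ignores seasonality, which a real analysis focused on one season would need to handle.

```python
import random


def temporal_k(times, t, period):
    """1-D Ripley's K: period / (n(n-1)) * number of ordered pairs within t."""
    n = len(times)
    close = sum(1 for i in range(n) for j in range(n)
                if i != j and abs(times[i] - times[j]) <= t)
    return period * close / (n * (n - 1))


def clustering_p_value(times, t, period, n_sim=999, seed=1):
    """Monte Carlo p-value of K(t) against events placed uniformly at random."""
    rng = random.Random(seed)
    observed = temporal_k(times, t, period)
    exceed = 0
    for _ in range(n_sim):
        sim = [rng.uniform(0, period) for _ in times]
        if temporal_k(sim, t, period) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_sim + 1)


# Hypothetical extreme-event days within one 365-day year, in three bursts.
events = [10, 11, 12, 100, 101, 102, 200, 201, 202]
print(clustering_p_value(events, 3, 365))
```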

  5. A Cluster-Randomized Trial of Insecticide-Treated Curtains for Dengue Vector Control in Thailand

    PubMed Central

    Lenhart, Audrey; Trongtokit, Yuwadee; Alexander, Neal; Apiwathnasorn, Chamnarn; Satimai, Wichai; Vanlerberghe, Veerle; Van der Stuyft, Patrick; McCall, Philip J.

    2013-01-01

    The efficacy of insecticide-treated window curtains (ITCs) for dengue vector control was evaluated in Thailand in a cluster-randomized controlled trial. A total of 2,037 houses in 26 clusters were randomized to receive the intervention or act as controls (no treatment). Entomological surveys measured Aedes infestations (Breteau index, house index, container index, and pupae per person index) and oviposition indices (mean numbers of eggs laid in oviposition traps) immediately before and after the intervention, and at 3-month intervals over 12 months. There were no consistent statistically significant differences in entomological indices between intervention and control clusters, although oviposition indices were lower (P < 0.01) in ITC clusters during the wet season. It is possible that the open housing structures in the study reduced the likelihood of mosquitoes making contact with ITCs. ITCs may therefore be unsuitable for dengue vector control in regions where this house design is common. PMID:23166195
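The Stegomyia indices named above have simple standard definitions. The sketch below computes them for one survey round:

* house index: percentage of houses with at least one positive container
* container index: percentage of water-holding containers that are positive
* Breteau index: positive containers per 100 houses
* pupae per person

The `House` record and its fields are hypothetical, and the oviposition-trap indices are not covered.

```python
from dataclasses import dataclass


@dataclass
class House:
    wet_containers: int       # water-holding containers inspected
    positive_containers: int  # containers with Aedes larvae or pupae
    pupae: int                # pupae counted in the house
    residents: int


def entomological_indices(houses):
    """Stegomyia indices for one survey round of inspected houses."""
    n_houses = len(houses)
    positive_houses = sum(1 for h in houses if h.positive_containers > 0)
    wet = sum(h.wet_containers for h in houses)
    positive = sum(h.positive_containers for h in houses)
    return {
        "house_index": 100 * positive_houses / n_houses,
        "container_index": 100 * positive / wet if wet else 0.0,
        "breteau_index": 100 * positive / n_houses,
        "pupae_per_person": sum(h.pupae for h in houses) / sum(h.residents for h in houses),
    }


survey = [House(4, 1, 3, 5), House(2, 0, 0, 3), House(5, 2, 7, 2), House(3, 0, 0, 4)]
print(entomological_indices(survey))
```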

  6. Incremental fuzzy C medoids clustering of time series data using dynamic time warping distance

    PubMed Central

    Chen, Jingli; Wu, Shuai; Liu, Zhizhong; Chao, Hao

    2018-01-01

    Clustering time series data is of great significance because it can extract meaningful statistics and other characteristics. Especially in biomedical engineering, outstanding clustering algorithms for time series may help improve people's health. Considering the data scale and time shifts of time series, in this paper we introduce two incremental fuzzy clustering algorithms based on a Dynamic Time Warping (DTW) distance. Following the Single-Pass and Online patterns, our algorithms can handle large-scale time series data by splitting it into a set of chunks that are processed sequentially. In addition, our algorithms use DTW to measure the distance between pairs of time series, which yields higher clustering accuracy because DTW can determine an optimal match between any two time series by stretching or compressing segments of temporal data. Our new algorithms are compared with several prominent existing incremental fuzzy clustering algorithms on 12 benchmark time series datasets. The experimental results show that the proposed approaches yield high-quality clusters and outperform all the competitors in clustering accuracy. PMID:29795600

  7. Incremental fuzzy C medoids clustering of time series data using dynamic time warping distance.

    PubMed

    Liu, Yongli; Chen, Jingli; Wu, Shuai; Liu, Zhizhong; Chao, Hao

    2018-01-01

    Clustering time series data is of great significance because it can extract meaningful statistics and other characteristics. Especially in biomedical engineering, outstanding clustering algorithms for time series may help improve people's health. Considering the data scale and time shifts of time series, in this paper we introduce two incremental fuzzy clustering algorithms based on a Dynamic Time Warping (DTW) distance. Following the Single-Pass and Online patterns, our algorithms can handle large-scale time series data by splitting it into a set of chunks that are processed sequentially. In addition, our algorithms use DTW to measure the distance between pairs of time series, which yields higher clustering accuracy because DTW can determine an optimal match between any two time series by stretching or compressing segments of temporal data. Our new algorithms are compared with several prominent existing incremental fuzzy clustering algorithms on 12 benchmark time series datasets. The experimental results show that the proposed approaches yield high-quality clusters and outperform all the competitors in clustering accuracy.
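Dynamic time warping aligns two series by stretching or compressing them in time, which is the property the abstract relies on. The sketch below pairs the classic dynamic-programming DTW distance with a fuzzy c-medoids membership update for a dissimilarity, using fuzzifier m. The incremental chunk processing and medoid selection of the proposed algorithms are not shown, and all names are illustrative.

```python
import math


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping with |x - y| cost."""
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def fuzzy_memberships(series, medoids, m=2.0):
    """Membership of each series in each medoid's cluster (rows sum to 1)."""
    result = []
    for x in series:
        dists = [dtw_distance(x, v) for v in medoids]
        if 0.0 in dists:
            # A series identical (under DTW) to a medoid belongs to it fully
            u = [1.0 if di == 0.0 else 0.0 for di in dists]
            total = sum(u)
            result.append([ui / total for ui in u])
        else:
            inv = [(1.0 / di) ** (1.0 / (m - 1.0)) for di in dists]
            total = sum(inv)
            result.append([w / total for w in inv])
    return result


print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # a time shift costs nothing here
print(fuzzy_memberships([[3, 3, 3], [1, 1, 2]], [[1, 1, 1], [5, 5, 5]]))
```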

  8. Statistical Clustering and the Contents of the Infant Vocabulary

    ERIC Educational Resources Information Center

    Swingley, Daniel

    2005-01-01

    Infants parse speech into word-sized units according to biases that develop in the first year. One bias, present before the age of 7 months, is to cluster syllables that tend to co-occur. The present computational research demonstrates that this statistical clustering bias could lead to the extraction of speech sequences that are actual words,…

  9. The Optical Gravitational Lensing Experiment

    NASA Technical Reports Server (NTRS)

    Udalski, A.; Szymanski, M.; Kaluzny, J.; Kubiak, M.; Mateo, Mario

    1992-01-01

    The technical features of the Optical Gravitational Lensing Experiment, which aims to detect a statistically significant number of microlensing events toward the Galactic bulge, are described. Clusters of galaxies observed during the 1992 season are listed and discussed, and the reduction methods are described. Future plans are addressed.

  10. Comparisons of non-Gaussian statistical models in DNA methylation analysis.

    PubMed

    Ma, Zhanyu; Teschendorff, Andrew E; Yu, Hong; Taghia, Jalil; Guo, Jun

    2014-06-16

    As a key regulatory mechanism of gene expression, DNA methylation patterns are widely altered in many complex genetic diseases, including cancer. DNA methylation is naturally quantified by data with bounded support; therefore, it is not Gaussian distributed. To capture such properties, we introduce several non-Gaussian statistical models to perform dimension reduction on DNA methylation data. Non-Gaussian statistical model-based unsupervised clustering strategies are then applied to cluster the data. Comparisons and analyses of different dimension reduction strategies and unsupervised clustering methods are presented. Experimental results show that the non-Gaussian statistical model-based methods are superior to the conventional Gaussian distribution-based method. They are meaningful tools for DNA methylation analysis. Moreover, among several non-Gaussian methods, the one that captures the bounded nature of DNA methylation data shows the best clustering performance.

  11. Comparisons of Non-Gaussian Statistical Models in DNA Methylation Analysis

    PubMed Central

    Ma, Zhanyu; Teschendorff, Andrew E.; Yu, Hong; Taghia, Jalil; Guo, Jun

    2014-01-01

    As a key regulatory mechanism of gene expression, DNA methylation patterns are widely altered in many complex genetic diseases, including cancer. DNA methylation is naturally quantified by data with bounded support; therefore, it is not Gaussian distributed. To capture such properties, we introduce several non-Gaussian statistical models to perform dimension reduction on DNA methylation data. Non-Gaussian statistical model-based unsupervised clustering strategies are then applied to cluster the data. Comparisons and analyses of different dimension reduction strategies and unsupervised clustering methods are presented. Experimental results show that the non-Gaussian statistical model-based methods are superior to the conventional Gaussian distribution-based method. They are meaningful tools for DNA methylation analysis. Moreover, among several non-Gaussian methods, the one that captures the bounded nature of DNA methylation data shows the best clustering performance. PMID:24937687
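Methylation beta values lie in [0, 1], which is why a bounded distribution can describe them better than a Gaussian. As a minimal illustration, the sketch below fits a beta distribution by the method of moments. It then compares the beta log-likelihood with that of a fitted Gaussian on hypothetical values strictly inside (0, 1). It does not implement the paper's dimension-reduction models or clustering methods.

```python
import math


def fit_beta_moments(xs):
    """Method-of-moments Beta(alpha, beta) fit for data strictly inside (0, 1)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    common = mean * (1 - mean) / var - 1
    if common <= 0:
        raise ValueError("variance too large for a beta distribution")
    return mean * common, (1 - mean) * common


def beta_loglik(xs, a, b):
    """Log-likelihood of xs under Beta(a, b)."""
    log_b = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return sum((a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - log_b for x in xs)


def gaussian_loglik(xs):
    """Log-likelihood of xs under a Gaussian with maximum-likelihood parameters."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return sum(-0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
               for x in xs)


# Hypothetical beta values for one CpG site across samples (mostly hypo/hyper).
site = [0.02, 0.05, 0.03, 0.95, 0.97, 0.90, 0.04, 0.93]
a, b = fit_beta_moments(site)
print(a, b, beta_loglik(site, a, b), gaussian_loglik(site))
```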

  12. Spatial suicide clusters in Australia between 2010 and 2012: a comparison of cluster and non-cluster among young people and adults.

    PubMed

    Robinson, Jo; Too, Lay San; Pirkis, Jane; Spittal, Matthew J

    2016-11-22

    A suicide cluster has been defined as a group of suicides that occur closer together in time and space than would normally be expected. We aimed to examine the extent to which suicide clusters exist among young people and adults in Australia and to determine whether differences exist between cluster and non-cluster suicides. Suicide data were obtained from the National Coronial Information System for the period 2010 to 2012. Data on date of death, postcode, age at the time of death, sex, suicide method, ICD-10 code for cause of death, marital status, employment status, and aboriginality were retrieved. We examined the presence of spatial clusters separately for youth suicides and adult suicides using the scan statistic. Pearson's chi-square test was used to compare the characteristics of cluster suicides with non-cluster suicides. We identified 12 spatial clusters between 2010 and 2012. Five occurred among young people (n = 53, representing 5.6% [53/940] of youth suicides) and seven occurred among adults (n = 137, representing 2.3% [137/5939] of adult suicides). Clusters ranged in size from three to 21 for youth and from three to 31 for adults. When compared to adults, suicides by young people were significantly more likely to occur as part of a cluster (difference = 3.3%, 95% confidence interval [CI] = 1.8 to 4.8, p < 0.0001). Suicides by people with an Indigenous background were also significantly more likely to occur in a cluster than suicides by non-Indigenous people, among both young people and adults. Suicide clusters have a significant negative impact on the communities in which they occur. As a result, it is important to find effective ways of managing and containing suicide clusters. To date there is limited evidence for the effectiveness of the strategies typically employed, particularly in Indigenous settings, and developing this evidence base needs to be a future priority. 
Future research that examines in more depth the socio-demographic and clinical factors associated with suicide clusters is also warranted in order that appropriate interventions can be developed.
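The reported difference between the youth and adult cluster proportions can be checked from the counts above. This sketch uses a Wald interval, which is an assumption. It gives a difference of about 3.3 percentage points with an interval of roughly 1.8 to 4.9. The published upper limit of 4.8 may reflect a different interval method or rounding.

```python
import math


def diff_proportions_ci(x1, n1, x2, n2, z=1.96):
    """Difference p1 - p2 with a Wald 95% confidence interval."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, diff - z * se, diff + z * se


# Youth vs adult suicides occurring in clusters: 53/940 vs 137/5939
diff, lo, hi = diff_proportions_ci(53, 940, 137, 5939)
print(f"{100 * diff:.1f} points (95% CI {100 * lo:.1f} to {100 * hi:.1f})")
```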

  13. Emergence of patterns in random processes

    NASA Astrophysics Data System (ADS)

    Newman, William I.; Turcotte, Donald L.; Malamud, Bruce D.

    2012-08-01

    Sixty years ago, it was observed that any independent and identically distributed (i.i.d.) random variable would produce a pattern of peak-to-peak sequences with, on average, three events per sequence. This outcome was used to show that randomness could provide a null-hypothesis explanation for the apparent 3-year cycles of animal populations. We show how a universal distribution of the lengths of peak-to-peak sequences in time series can be obtained explicitly, and that it can be used on long data sets as a test of their i.i.d. character. We illustrate the validity of our analysis using the peak-to-peak statistics of Gaussian white noise. We also consider the nearest-neighbor cluster statistics of point processes in time. If the time intervals are random, we show that cluster size statistics are identical to the peak-to-peak sequence statistics of time series. To study the influence of correlations in a time series, we determine the peak-to-peak sequence statistics for the Langevin equation of kinetic theory leading to Brownian motion. To test our methodology, we consider a variety of applications. Using a global catalog of earthquakes, we obtain the peak-to-peak statistics of earthquake magnitudes and the nearest-neighbor interoccurrence time statistics. In both cases, we find good agreement with the i.i.d. theory. We also consider the interval statistics of the Old Faithful geyser in Yellowstone National Park. In this case, we find a significant deviation from the i.i.d. theory, which we attribute to antipersistence. We also consider the interval statistics of the AL index of geomagnetic substorms, and again find a significant deviation from i.i.d. behavior, which we attribute to mild persistence. Finally, we examine the daily returns of the Standard and Poor's 500 stock index from 1928 to 2011 and show that, while they are close to being i.i.d., there is, again, significant persistence. 
We expect that there will be many other applications of our methodology both to interoccurrence statistics and to time series.
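The three-events-per-sequence result has a short explanation. In an i.i.d. continuous series, each interior point is the largest of itself and its two neighbours with probability 1/3. Peaks therefore occur at a rate of 1/3, and the mean peak-to-peak spacing is 3. The simulation below checks this on Gaussian white noise with a fixed seed.

```python
import random


def peak_indices(xs):
    """Indices of strict local maxima (interior points above both neighbours)."""
    return [i for i in range(1, len(xs) - 1) if xs[i - 1] < xs[i] > xs[i + 1]]


def mean_peak_to_peak(xs):
    """Average number of steps between consecutive peaks."""
    peaks = peak_indices(xs)
    gaps = [b - a for a, b in zip(peaks, peaks[1:])]
    return sum(gaps) / len(gaps)


rng = random.Random(12345)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
print(mean_peak_to_peak(white_noise))  # close to 3 for i.i.d. data
```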

  14. Spectral gene set enrichment (SGSE).

    PubMed

    Frost, H Robert; Li, Zhigang; Moore, Jason H

    2015-03-03

    Gene set testing is typically performed in a supervised context to quantify the association between groups of genes and a clinical phenotype. In many cases, however, a gene set-based interpretation of genomic data is desired in the absence of a phenotype variable. Although methods exist for unsupervised gene set testing, they predominantly compute enrichment relative to clusters of the genomic variables, with performance strongly dependent on the clustering algorithm and number of clusters. We propose a novel method, spectral gene set enrichment (SGSE), for unsupervised competitive testing of the association between gene sets and empirical data sources. SGSE first computes the statistical association between gene sets and principal components (PCs) using our principal component gene set enrichment (PCGSE) method. The overall statistical association between each gene set and the spectral structure of the data is then computed by combining the PC-level p-values using the weighted Z-method, with weights set to the PC variance scaled by Tracy-Widom test p-values. Using simulated data, we show that the SGSE algorithm can accurately recover spectral features from noisy data. To illustrate the utility of our method on real data, we demonstrate the superior performance of the SGSE method relative to standard cluster-based techniques for testing the association between MSigDB gene sets and the variance structure of microarray gene expression data. Unsupervised gene set testing can provide important information about the biological signal held in high-dimensional genomic data sets. Because it uses the association between gene sets and sample PCs to generate a measure of unsupervised enrichment, the SGSE method is independent of cluster or network creation algorithms and, most importantly, is able to use the statistical significance of PC eigenvalues to ignore elements of the data most likely to represent noise.
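The weighted Z-method mentioned above has three steps:

1. Convert each one-sided p-value to a standard normal score.
2. Take a weighted sum of the scores.
3. Rescale the sum by the square root of the sum of squared weights.

A minimal sketch follows. The weights are plain inputs here, so SGSE's choice (PC variance scaled by Tracy-Widom p-values) would have to be computed separately. The example values are hypothetical.

```python
import math
from statistics import NormalDist


def weighted_z_combine(p_values, weights):
    """Weighted Z-method: combine one-sided p-values (0 < p < 1) into one p-value."""
    std_normal = NormalDist()
    z_scores = [std_normal.inv_cdf(1.0 - p) for p in p_values]
    z = sum(w * zi for w, zi in zip(weights, z_scores)) / math.sqrt(sum(w * w for w in weights))
    return 1.0 - std_normal.cdf(z)


# Hypothetical PC-level p-values for one gene set and illustrative weights.
print(weighted_z_combine([0.01, 0.20, 0.60], [3.0, 1.5, 0.5]))
```

Giving larger weights to high-variance PCs lets strong associations with the dominant structure of the data drive the combined result, while PCs that likely represent noise contribute little.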

  15. Ion induced electron emission statistics under Agm- cluster bombardment of Ag

    NASA Astrophysics Data System (ADS)

    Breuers, A.; Penning, R.; Wucher, A.

    2018-05-01

    The electron emission from a polycrystalline silver surface under bombardment with Agm- cluster ions (m = 1, 2, 3) is investigated in terms of ion induced kinetic excitation. The electron yield γ is determined directly by a current measurement method on the one hand and implicitly by the analysis of the electron emission statistics on the other hand. Successful measurements of the electron emission spectra enable a deeper understanding of the ion induced kinetic electron emission process, with particular emphasis on the effect of the projectile cluster size on the yield as well as on the emission statistics. The results allow a quantitative comparison to computer simulations performed for silver atoms and clusters impinging onto a silver surface.
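    In the emission-statistics approach, the yield γ is the first moment of the distribution P_n of the number n of electrons released per projectile impact, γ = Σ n P_n. The sketch below computes γ and the variance of P_n from a histogram; it assumes the histogram has already been corrected for detection efficiency and pile-up, which a real analysis would need to address, and the function name and counts are hypothetical.

```python
def emission_moments(probabilities):
    """Mean yield and variance from an electron emission statistics distribution.

    probabilities[n] is the (possibly unnormalised) measured probability or
    count for an impact releasing exactly n electrons, n = 0, 1, 2, ...
    """
    total = sum(probabilities)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    norm = [p / total for p in probabilities]  # renormalise raw counts
    gamma = sum(n * p for n, p in enumerate(norm))  # gamma = sum_n n * P_n
    variance = sum((n - gamma) ** 2 * p for n, p in enumerate(norm))
    return gamma, variance


# Hypothetical histogram of electrons per projectile impact (counts for n = 0..3).
gamma, var = emission_moments([4200, 3100, 1900, 800])
print(f"gamma = {gamma:.3f}, variance = {var:.3f}")
```

    Comparing the variance with γ is one way to see whether the emission deviates from a Poisson process, for which the two would be equal.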

  16. Suicide clusters among young Kenyan men.

    PubMed

    Goodman, Michael L; Puffer, Eve S; Keiser, Philip H; Gitari, Stanley

    2017-11-01

    Suicide is a leading cause of global mortality. Suicide clusters have recently been identified among peer networks in high-income countries. This study investigates dynamics of suicide clustering within social networks of young Kenyan men (n = 532; 18-34 years). We found a strong, statistically significant association between the reported number of friends who previously attempted suicide and present suicide ideation (odds ratio = 1.9; 95% confidence interval (1.42, 2.54); p < 0.001). This association was mediated by lower collective self-esteem (23% of total effect). Meaning in life further mediated the association between collective self-esteem and suicide ideation. Survivors of peer suicide should be evaluated for suicide risk.
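    An odds ratio with a Wald 95% interval is symmetric on the log scale, so the reported numbers can be cross-checked: the interval width gives the standard error of ln(OR), and ln(OR)/SE gives the Wald z. The sketch below assumes a Wald-type interval (the study's exact model is not stated here); for OR = 1.9 and (1.42, 2.54) it gives z of about 4.3, consistent with the reported p < 0.001. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def wald_check_from_ci(odds_ratio, ci_low, ci_high, level=0.95):
    """Recover the log-scale standard error, Wald z and two-sided p-value
    implied by a reported odds ratio and its confidence interval.

    Assumes a Wald interval exp(ln(OR) +/- z_crit * SE), symmetric on the log scale.
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95 % interval
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return se, z, p_two_sided


se, z, p = wald_check_from_ci(1.9, 1.42, 2.54)
print(f"SE(log OR) = {se:.3f}, z = {z:.2f}, p = {p:.1e}")
# Geometric midpoint of the interval, which should be close to the reported OR:
print(f"sqrt(1.42 * 2.54) = {math.sqrt(1.42 * 2.54):.3f}")
```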

  17. Modeling of Cluster-Induced Turbulence in Particle-Laden Channel Flow

    NASA Astrophysics Data System (ADS)

    Baker, Michael; Capecelatro, Jesse; Kong, Bo; Fox, Rodney; Desjardins, Olivier

    2017-11-01

    A phenomenon often observed in gas-solid flows is the formation of mesoscale clusters of particles due to the relative motion between the solid and fluid phases that is sustained through the damping of collisional particle motion from interphase momentum coupling inside these clusters. The formation of such sustained clusters, leading to cluster-induced turbulence (CIT), can have a significant impact on industrial processes, particularly with regard to mixing, reaction progress, and heat transfer. Both Euler-Lagrange (EL) and Euler-Euler anisotropic Gaussian (EE-AG) approaches are used in this work to perform mesoscale simulations of CIT in fully developed gas-particle channel flow. The results from these simulations are applied in the development of a two-phase Reynolds-Averaged Navier-Stokes (RANS) model to capture the wall-normal flow characteristics in a less computationally expensive manner. Parameters such as mass loading, particle size, and gas velocity are varied to examine their respective impact on cluster formation and turbulence statistics. Acknowledging support from the NSF (AN:1437865).

  18. Spatial and space-time distribution of Plasmodium vivax and Plasmodium falciparum malaria in China, 2005-2014.

    PubMed

    Hundessa, Samuel H; Williams, Gail; Li, Shanshan; Guo, Jinpeng; Chen, Linping; Zhang, Wenyi; Guo, Yuming

    2016-12-19

    Despite the declining burden of malaria in China, the disease remains a significant public health problem with periodic outbreaks and spatial variation across the country. A better understanding of the spatial and temporal characteristics of malaria is essential for consolidating the disease control and elimination programme. This study aims to understand the spatial and spatiotemporal distribution of Plasmodium vivax and Plasmodium falciparum malaria in China during 2005-2014. The global Moran's I statistic was used to detect spatial autocorrelation of local P. falciparum and P. vivax malaria at the county level. Spatial and space-time scan statistics were applied to detect spatial and spatiotemporal clusters, respectively. Both P. vivax and P. falciparum malaria showed spatial autocorrelation. The most likely spatial cluster of P. vivax was detected in northern Anhui province between 2005 and 2009, and in western Yunnan province between 2010 and 2014. For P. falciparum, the clusters included several counties of western Yunnan province from 2005 to 2011, Guangxi from 2012 to 2013, and Anhui in 2014. The most likely space-time clusters of P. vivax malaria and P. falciparum malaria were detected in northern Anhui province and western Yunnan province, respectively, during 2005-2009. The spatial and space-time cluster analysis identified high-risk areas and periods for both P. vivax and P. falciparum malaria. Both malaria types showed significant spatial and spatiotemporal variations. In contrast to P. vivax, the high-risk areas for P. falciparum malaria shifted from the west to the east of China. Further studies are required to examine the spatial changes in risk of malaria transmission and identify the underlying causes of elevated risk in the high-risk areas.
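    Global Moran's I, used here at the county level, measures whether neighbouring units have similar values: I = (n / S0) Σ_ij w_ij (x_i − x̄)(x_j − x̄) / Σ_i (x_i − x̄)², where S0 is the sum of all weights. Its expectation under no autocorrelation is −1/(n − 1). The minimal sketch below computes the statistic only (significance is usually assessed by permutation or a normal approximation, not shown); the binary adjacency matrix and values are hypothetical.

```python
def morans_i(values, weights):
    """Global Moran's I for n areal units (e.g. counties).

    values  -- n observations, e.g. incidence rates
    weights -- n x n spatial weights as nested lists, zero on the diagonal
    """
    n = len(values)
    mean = sum(values) / n
    dev = [x - mean for x in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)


# Four hypothetical counties in a row; neighbours share an edge (binary weights).
chain = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
print(morans_i([1, 1, 5, 5], chain))   # similar values adjacent: positive I
print(morans_i([1, 5, 1, 5], chain))   # alternating values: negative I
```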

  19. Detection of Tuberculosis Infection Hotspots Using Activity Spaces Based Spatial Approach in an Urban Tokyo, from 2003 to 2011.

    PubMed

    Izumi, Kiyohiko; Ohkado, Akihiro; Uchimura, Kazuhiro; Murase, Yoshiro; Tatsumi, Yuriko; Kayebeta, Aya; Watanabe, Yu; Ishikawa, Nobukatsu

    2015-01-01

    Identifying ongoing tuberculosis infection sites is crucial for breaking chains of transmission in tuberculosis-prevalent urban areas. Previous studies have pointed out that detection of local accumulation of tuberculosis patients based on their residential addresses may be limited by a lack of matching between residences and tuberculosis infection sites. This study aimed to identify possible tuberculosis hotspots using TB genotype clustering statuses and the concept of "activity space", a place where patients spend most of their waking hours. We further compared the spatial distribution by different residential statuses and described urban environmental features of the detected hotspots. Culture-positive tuberculosis patients notified to Shinjuku city from 2003 to 2011 were enrolled in this case-based cross-sectional study, and their demographic and clinical information, TB genotype clustering statuses, and activity space were collected. Spatial statistics (Global Moran's I and Getis-Ord Gi* statistics) identified significant hotspots in 152 census tracts, and urban environmental features and tuberculosis patients' characteristics in these hotspots were assessed. Of the enrolled 643 culture-positive tuberculosis patients, 416 (64.2%) were general inhabitants, 42 (6.5%) were foreign-born people, and 184 (28.6%) were homeless people. The percentage of overall genotype clustering was 43.7%. Genotype-clustered general inhabitants and homeless people formed significant hotspots around a major railway station, whereas the non-clustered general inhabitants formed no hotspots. This suggested that the detected activity-space hotspots may reflect ongoing tuberculosis transmission sites; these hotspots were characterized by smaller residential floor size and a higher proportion of non-working households.
Activity space-based spatial analysis suggested possible TB transmission sites around the major railway station and it can assist in further comprehension of TB transmission dynamics in an urban setting in Japan.
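    The Getis-Ord Gi* statistic mentioned above compares the weighted sum of values around each unit (including the unit itself, hence the "star") with what would be expected if values were spread at random, and is expressed as a z-score; large positive values mark hot spots. The sketch below follows the standardized form commonly implemented in GIS software, with X̄ and S the global mean and standard deviation. The tract layout, counts and function name are hypothetical, and no multiple-testing correction is applied.

```python
import math


def getis_ord_gi_star(values, weights):
    """Getis-Ord Gi* z-scores for each areal unit.

    weights[i][j] is the spatial weight between units i and j; for the "star"
    statistic each unit counts as its own neighbour, so weights[i][i] > 0.
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(x * x for x in values) / n - mean * mean)
    scores = []
    for i in range(n):
        w = weights[i]
        sum_w = sum(w)
        sum_w2 = sum(wij * wij for wij in w)
        local = sum(wij * xj for wij, xj in zip(w, values))
        num = local - mean * sum_w
        den = s * math.sqrt((n * sum_w2 - sum_w * sum_w) / (n - 1))
        scores.append(num / den)
    return scores


# Five hypothetical census tracts in a row; tract 2 has many linked cases.
counts = [1, 1, 10, 1, 1]
w = [[1 if abs(i - j) <= 1 else 0 for j in range(5)] for i in range(5)]
for tract, z in enumerate(getis_ord_gi_star(counts, w)):
    print(tract, round(z, 3))
```

    With realistic numbers of tracts, |z| > 1.96 is the usual two-sided 5% threshold for calling a hot or cold spot.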

  20. Average Heating Rate of Hot Atmospheres in Distant Galaxy Clusters by Radio AGN: Evidence for Continuous AGN Heating

    NASA Astrophysics Data System (ADS)

    Ma, Cheng-Jiun; McNamara, B.; Nulsen, P.; Schaffer, R.

    2011-09-01

    X-ray observations of nearby clusters and galaxies have shown that energetic feedback from AGN is heating hot atmospheres and is probably the principal agent that is offsetting cooling flows. Here we examine AGN heating in distant X-ray clusters by cross-correlating clusters selected from the 400 Square Degree X-ray Cluster survey (400SD) with radio sources in the NRAO VLA Sky Survey. The jet power for each radio source was determined using scaling relations between radio power and cavity power determined for nearby clusters, groups, and galaxies with atmospheres containing X-ray cavities. Roughly 30% of the clusters show radio emission above a flux threshold of 3 mJy within the central 250 kpc that is presumably associated with the brightest cluster galaxy. We find no significant correlation between radio power, hence jet power, and the X-ray luminosities of clusters in the redshift range 0.1-0.6. The detection frequency of radio AGN is inconsistent with the presence of strong cooling flows in 400SD, but cannot rule out the presence of weak cooling flows. The average jet power of central radio AGN is approximately 2 × 10^{44} erg/s. The jet power corresponds to an average heating of approximately 0.2 keV/particle for gas within R_500. Assuming the current AGN heating rate remained constant out to redshifts of about 2, these figures would rise by a factor of two. Our results show that the integrated energy injected from radio AGN outbursts in clusters is statistically significant compared to the excess entropy in hot atmospheres that is required for the breaking of self-similarity in cluster scaling relations. It is not clear that central AGN in 400SD clusters are maintained by a self-regulated feedback loop at the base of a cooling flow. However, they may play a significant role in preventing the development of strong cooling flows at early epochs.

  1. Optimizing the maximum reported cluster size in the spatial scan statistic for ordinal data.

    PubMed

    Kim, Sehwi; Jung, Inkyung

    2017-01-01

    The spatial scan statistic is an important tool for spatial cluster detection. There have been numerous studies on scanning window shapes. However, little research has been done on the maximum scanning window size or maximum reported cluster size. Recently, Han et al. proposed using the Gini coefficient to optimize the maximum reported cluster size. However, the method has been developed and evaluated only for the Poisson model. We adapt the Gini coefficient to the spatial scan statistic for ordinal data to determine the optimal maximum reported cluster size. Through a simulation study and application to a real data example, we evaluate the performance of the proposed approach. With some sophisticated modification, the Gini coefficient can be effectively employed for the ordinal model. The Gini coefficient most often picked the optimal maximum reported cluster sizes that were the same as or smaller than the true cluster sizes with very high accuracy. It seems that we can obtain a more refined collection of clusters by using the Gini coefficient. The Gini coefficient developed specifically for the ordinal model can be useful for optimizing the maximum reported cluster size for ordinal data and helpful for properly and informatively discovering cluster patterns.
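    The Gini coefficient summarizes how unevenly cases are concentrated relative to population via the Lorenz curve. The sketch below computes a generic Gini coefficient from grouped data; treating each reported cluster plus the remainder of the study region as one group, and preferring the candidate maximum cluster size with the largest coefficient, is our assumption about how Han et al.'s comparison is set up, and the ordinal-model modification described in this abstract is not reproduced. Function name and numbers are hypothetical.

```python
def gini_from_groups(cases, population):
    """Gini coefficient of how cases concentrate across groups of a population.

    Groups are sorted by case rate (ascending) to trace the Lorenz curve
    (cumulative population share vs cumulative case share); the coefficient
    is G = 1 - sum_k (X_k - X_{k-1}) * (Y_k + Y_{k-1}).
    """
    groups = sorted(zip(cases, population), key=lambda g: g[0] / g[1])
    total_cases, total_pop = sum(cases), sum(population)
    x_prev = y_prev = 0.0
    trapezoids = 0.0
    for c, p in groups:
        x = x_prev + p / total_pop
        y = y_prev + c / total_cases
        trapezoids += (x - x_prev) * (y + y_prev)
        x_prev, y_prev = x, y
    return 1.0 - trapezoids


# Hypothetical: reported clusters under two candidate maximum sizes, each
# listed as (cases, population) with the rest of the study region appended last.
candidates = {
    "max 10%": ([40, 25, 135], [100, 80, 1820]),
    "max 50%": ([90, 110], [600, 1400]),
}
for label, (c, p) in candidates.items():
    print(label, round(gini_from_groups(c, p), 3))
```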

  2. Optimizing the maximum reported cluster size in the spatial scan statistic for ordinal data

    PubMed Central

    Kim, Sehwi

    2017-01-01

    The spatial scan statistic is an important tool for spatial cluster detection. There have been numerous studies on scanning window shapes. However, little research has been done on the maximum scanning window size or maximum reported cluster size. Recently, Han et al. proposed using the Gini coefficient to optimize the maximum reported cluster size. However, the method has been developed and evaluated only for the Poisson model. We adapt the Gini coefficient to the spatial scan statistic for ordinal data to determine the optimal maximum reported cluster size. Through a simulation study and application to a real data example, we evaluate the performance of the proposed approach. With some sophisticated modification, the Gini coefficient can be effectively employed for the ordinal model. The Gini coefficient most often picked the optimal maximum reported cluster sizes that were the same as or smaller than the true cluster sizes with very high accuracy. It seems that we can obtain a more refined collection of clusters by using the Gini coefficient. The Gini coefficient developed specifically for the ordinal model can be useful for optimizing the maximum reported cluster size for ordinal data and helpful for properly and informatively discovering cluster patterns. PMID:28753674

  3. Chronological, geographical, and seasonal trends of human cases of avian influenza A (H5N1) in Vietnam, 2003-2014: a spatial analysis.

    PubMed

    Manabe, Toshie; Yamaoka, Kazue; Tango, Toshiro; Binh, Nguyen Gia; Co, Dao Xuan; Tuan, Nguyen Dang; Izumi, Shinyu; Takasaki, Jin; Chau, Ngo Quy; Kudo, Koichiro

    2016-02-04

    Human cases of highly pathogenic avian influenza A (H5N1) virus infection continue to occur in Southeast Asia. The objective of this study was to identify when and where human H5N1 cases have occurred in Vietnam and how the situation has changed from the beginning of the H5N1 outbreaks in 2003 through 2014, to assist with implementing methods of targeted disease management. We assessed the disease clustering and seasonal variation of human H5N1 cases in Vietnam to evaluate the geographical and monthly timing trends. The clustering of H5N1 cases and associated mortality was examined over three time periods, the outbreak period (2003-2005), the post-outbreak period (2006-2009), and the recent period (2010-2014), using the flexibly shaped space-time scan statistic. The most likely cases to co-cluster and the elevated risks for incidence and mortality were assessed via calculation of the relative risk (RR). Seasonal variation in H5N1 cases was analysed as a cyclic trend in incidence data using Roger's statistical test. Between 2003 and 2005, H5N1 cases (RR: 2.15, p = 0.001) and mortality (RR: 2.49, p = 0.021) were significantly clustered in northern Vietnam. After 2010, H5N1 cases tended to occur on the border with Cambodia in the south, while H5N1 mortality clustered significantly in the Mekong delta area (RR: 6.62, p = 0.002). A significant seasonal variation was observed (p < 0.001), with a higher incidence of morbidity in December through April. These findings indicate that clinical preparedness for H5N1 in Vietnam needs to be strengthened in southern Vietnam in December-April.
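    The relative risks quoted for the detected clusters compare risk inside a cluster with risk outside it. One common definition, used for scan-statistic output, is RR = (c/e) / ((C − c)/(C − e)), where c and e are the observed and expected counts inside the cluster and C is the total count. This is a sketch of that definition only; the flexible scan software used in the study may compute RR differently, and the function name and numbers are hypothetical.

```python
def cluster_relative_risk(observed_in, expected_in, total_cases):
    """Relative risk of a scan-statistic cluster: observed/expected inside the
    cluster divided by observed/expected outside it, where the expected counts
    are scaled so that they sum to total_cases over the whole study region."""
    if not 0 < expected_in < total_cases:
        raise ValueError("expected count inside must lie strictly between 0 and the total")
    risk_inside = observed_in / expected_in
    risk_outside = (total_cases - observed_in) / (total_cases - expected_in)
    return risk_inside / risk_outside


# Hypothetical cluster: 30 observed vs 15 expected cases out of 100 in total.
print(round(cluster_relative_risk(30, 15, 100), 3))
```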

  4. Is There a Cosmological Constant?

    NASA Technical Reports Server (NTRS)

    Kochanek, Christopher; Oliversen, Ronald J. (Technical Monitor)

    2002-01-01

    The grant contributed to the publication of 18 refereed papers and 5 conference proceedings. The primary uses of the funding have been for page charges, travel for invited talks related to the grant research, and the support of a graduate student, Charles Keeton. The refereed papers address four of the primary goals of the proposal: (1) the statistics of radio lenses as a probe of the cosmological model (#1), (2) the role of spiral galaxies as lenses (#3), (3) the effects of dust on statistics of lenses (#7, #8), and (4) the role of groups and clusters as lenses (#2, #6, #10, #13, #15, #16). Four papers (#4, #5, #11, #12) address general issues of lens models, calibrations, and the relationship between lens galaxies and nearby galaxies. One considered cosmological effects in lensing X-ray sources (#9), and two addressed issues related to the overall power spectrum and theories of gravity (#17, #18). Our theoretical studies, combined with the explosion in the number of lenses and the quality of the data obtained for them, are greatly increasing our ability to characterize and understand the lens population. We can now firmly conclude both from our study of the statistics of radio lenses and our survey of extinctions in individual lenses that the statistics of optically selected quasars were significantly affected by extinction. However, the limits on the cosmological constant remain at lambda < 0.65 at a 2-sigma confidence level, which is in mild conflict with the results of the Type Ia supernova surveys. We continue to find that neither spiral galaxies nor groups and clusters contribute significantly to the production of gravitational lenses. The lack of group and cluster lenses is strong evidence for the role of baryonic cooling in increasing the efficiency of galaxies as lenses compared to groups and clusters of higher mass but lower central density.
Unfortunately for the ultimate objective of the proposal, improved constraints on the cosmological constant, the next large survey for gravitational lenses did not release its results during the term of the proposal. The research supported the career development of six graduate students (polar, Fletcher, Herold, Keeton, Deng and Rusin) and two post-docs (Labor and Munoz).

  5. The Effect of Cluster Sampling Design in Survey Research on the Standard Error Statistic.

    ERIC Educational Resources Information Center

    Wang, Lin; Fan, Xitao

    Standard statistical methods are used to analyze data that is assumed to be collected using a simple random sampling scheme. These methods, however, tend to underestimate variance when the data is collected with a cluster design, which is often found in educational survey research. The purposes of this paper are to demonstrate how a cluster design…
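    The abstract is truncated, so the paper's own demonstration is unknown; the standard Kish approximation illustrates why simple-random-sampling formulas understate variance under a cluster design. With m respondents per cluster and intraclass correlation ρ, the sampling variance is inflated by the design effect deff = 1 + (m − 1)ρ, so standard errors grow by √deff. The sketch assumes equal cluster sizes; the function names and survey numbers are hypothetical.

```python
import math


def design_effect(cluster_size, icc):
    """Kish design effect for equal-sized clusters: deff = 1 + (m - 1) * rho."""
    return 1.0 + (cluster_size - 1) * icc


def cluster_adjusted_se(srs_se, cluster_size, icc):
    """Inflate a standard error computed as if under simple random sampling."""
    return srs_se * math.sqrt(design_effect(cluster_size, icc))


# Hypothetical survey: 25 students sampled per school, intraclass correlation 0.2.
deff = design_effect(25, 0.2)
print(f"deff = {deff:.2f}, SE inflation factor = {math.sqrt(deff):.2f}")
print(f"naive SE 0.50 -> adjusted SE {cluster_adjusted_se(0.50, 25, 0.2):.2f}")
```

    Even a modest intraclass correlation matters when clusters are large: here the naive standard error would be too small by a factor of about 2.4.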

  6. On the Complexity of Race

    ERIC Educational Resources Information Center

    Zyphur, Michael J.

    2006-01-01

    Although a variety of studies have indicated that using statistical clustering techniques to examine genetic information may allow for geographically based groupings of individuals that tenuously map onto some conceptions of race, these studies have also indicated that the amount of genetic variation within these groupings is significantly larger…

  7. Hierarchical Bayesian modelling of gene expression time series across irregularly sampled replicates and clusters.

    PubMed

    Hensman, James; Lawrence, Neil D; Rattray, Magnus

    2013-08-20

    Time course data from microarrays and high-throughput sequencing experiments require simple, computationally efficient and powerful statistical models to extract meaningful biological signal, and for tasks such as data fusion and clustering. Existing methodologies fail to capture either the temporal or replicated nature of the experiments, and often impose constraints on the data collection process, such as regularly spaced samples, or similar sampling schema across replications. We propose hierarchical Gaussian processes as a general model of gene expression time-series, with application to a variety of problems. In particular, we illustrate the method's capacity for missing data imputation, data fusion and clustering. The method can impute data which is missing both systematically and at random: in a hold-out test on real data, performance is significantly better than commonly used imputation methods. The method's ability to model inter- and intra-cluster variance leads to more biologically meaningful clusters. The approach removes the necessity for evenly spaced samples, an advantage illustrated on a developmental Drosophila dataset with irregular replications. The hierarchical Gaussian process model provides an excellent statistical basis for several gene-expression time-series tasks. It has only a few additional parameters over a regular GP, has negligible additional complexity, is easily implemented and can be integrated into several existing algorithms. Our experiments were implemented in Python, and are available from the authors' website: http://staffwww.dcs.shef.ac.uk/people/J.Hensman/.
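    The hierarchy can be expressed through the covariance function. In a two-level version, each replicate curve is a shared gene-level function plus a replicate-specific deviation, both Gaussian processes, so observations from different replicates are correlated only through the gene-level kernel. Because covariances are evaluated at whatever times were observed, replicates need not share a sampling schedule. The sketch below assumes squared-exponential kernels with arbitrary hyperparameters (in practice these are fitted) and only builds the covariance matrix, not the full inference; a cluster level would add one more shared term in the same way. Function names are hypothetical.

```python
import math


def rbf(t1, t2, variance, lengthscale):
    """Squared-exponential covariance between two time points."""
    return variance * math.exp(-0.5 * ((t1 - t2) / lengthscale) ** 2)


def hierarchical_cov(observations, gene=(1.0, 2.0), replicate=(0.3, 1.0), noise=0.01):
    """Covariance matrix for (replicate_id, time) observations of one gene.

    y_r(t) = g(t) + f_r(t) + noise, with a shared gene-level GP g and an
    independent replicate-level GP f_r, so replicates may be sampled at
    different, irregular time points.
    """
    n = len(observations)
    K = [[0.0] * n for _ in range(n)]
    for i, (rep_i, t_i) in enumerate(observations):
        for j, (rep_j, t_j) in enumerate(observations):
            k = rbf(t_i, t_j, *gene)              # shared gene-level function
            if rep_i == rep_j:
                k += rbf(t_i, t_j, *replicate)    # replicate-specific deviation
            if i == j:
                k += noise                        # independent measurement noise
            K[i][j] = k
    return K


# Two hypothetical replicates sampled at different times.
obs = [("r1", 0.0), ("r1", 1.5), ("r2", 0.5), ("r2", 3.0)]
for row in hierarchical_cov(obs):
    print(["%.3f" % v for v in row])
```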

  8. The halo Boltzmann equation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Biagetti, Matteo; Desjacques, Vincent; Kehagias, Alex

    2016-04-01

    Dark matter halos are the building blocks of the universe as they host galaxies and clusters. The knowledge of the clustering properties of halos is therefore essential for the understanding of the galaxy statistical properties. We derive an effective halo Boltzmann equation which can be used to describe the halo clustering statistics. In particular, we show how the halo Boltzmann equation encodes a statistically biased gravitational force which generates a bias in the peculiar velocities of virialized halos with respect to the underlying dark matter, as recently observed in N-body simulations.

  9. Application of microarray analysis on computer cluster and cloud platforms.

    PubMed

    Bernau, C; Boulesteix, A-L; Knaus, J

    2013-01-01

    Analysis of recent high-dimensional biological data tends to be computationally intensive as many common approaches such as resampling or permutation tests require the basic statistical analysis to be repeated many times. A crucial advantage of these methods is that they can be easily parallelized due to the computational independence of the resampling or permutation iterations, which has induced many statistics departments to establish their own computer clusters. An alternative is to rent computing resources in the cloud, e.g. at Amazon Web Services. In this article we analyze whether a selection of statistical projects, recently implemented at our department, can be efficiently realized on these cloud resources. Moreover, we illustrate an opportunity to combine computer cluster and cloud resources. In order to compare the efficiency of computer cluster and cloud implementations and their respective parallelizations we use microarray analysis procedures and compare their runtimes on the different platforms. Amazon Web Services provide various instance types which meet the particular needs of the different statistical projects we analyzed in this paper. Moreover, the network capacity is sufficient and the parallelization is comparable in efficiency to standard computer cluster implementations. Our results suggest that many statistical projects can be efficiently realized on cloud resources. It is important to mention, however, that workflows can change substantially as a result of a shift from computer cluster to cloud computing.
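    The parallelization argument rests on the independence of permutation iterations: they can be split into batches, each with its own random stream, and combined afterwards by summing exceedance counts. That is what makes them easy to spread over cluster nodes or cloud instances. The sketch uses a thread pool only so it runs anywhere; for CPU-bound work one would swap in a process pool or separate jobs per node. The test statistic (difference of means), data and function names are illustrative assumptions, not the procedures benchmarked in the paper.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def mean_diff(a, b):
    return sum(a) / len(a) - sum(b) / len(b)


def permutation_chunk(job):
    """Run one independent batch of label permutations; return the exceedance count."""
    a, b, n_perm, seed = job
    rng = random.Random(seed)  # separate random stream per batch
    pooled = list(a) + list(b)
    observed = abs(mean_diff(a, b))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(mean_diff(pooled[:len(a)], pooled[len(a):])) >= observed:
            hits += 1
    return hits


def parallel_permutation_test(a, b, n_perm=10_000, n_workers=4, seed=0):
    """Two-sided permutation test of a mean difference, split into batches."""
    per_worker = n_perm // n_workers
    jobs = [(a, b, per_worker, seed + k) for k in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        hits = sum(pool.map(permutation_chunk, jobs))
    return (hits + 1) / (per_worker * n_workers + 1)


# Hypothetical expression values of one gene in two sample groups.
print(parallel_permutation_test([10, 11, 12, 13, 14], [0, 1, 2, 3, 4], n_perm=4000))
```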

  10. The Impact of Horizontal and Temporal Resolution on Convection and Precipitation with High-Resolution GEOS-5

    NASA Technical Reports Server (NTRS)

    Putman, William P.

    2012-01-01

    Using a high-resolution non-hydrostatic version of GEOS-5 with the cubed-sphere finite-volume dynamical core, the impact of spatial and temporal resolution on cloud properties will be evaluated. There are indications from examining convective cluster development in high resolution GEOS-5 forecasts that the temporal resolution within the model may play as significant a role as horizontal resolution. Comparing modeled convective cloud clusters with satellite observations of brightness temperature, we have found that improved temporal resolution in GEOS-5 accounts for a significant portion of the improvements in the statistical distribution of convective cloud clusters. Using satellite simulators in GEOS-5, we will compare the cloud optical properties of GEOS-5 at various spatial and temporal resolutions with those observed from MODIS. The potential impact of these results on tropical cyclone formation and intensity will be examined as well.

  11. Spatial, temporal and spatio-temporal clusters of measles incidence at the county level in Guangxi, China during 2004-2014: flexibly shaped scan statistics.

    PubMed

    Tang, Xianyan; Geater, Alan; McNeil, Edward; Deng, Qiuyun; Dong, Aihu; Zhong, Ge

    2017-04-04

    Outbreaks of measles re-emerged in Guangxi province during 2013-2014, where measles again became a major public health concern. A better understanding of the patterns of measles cases would help in identifying high-risk areas and periods for optimizing preventive strategies, yet these patterns remain largely unknown. Thus, this study aimed to determine the patterns of measles clusters in space, time and space-time at the county level over the period 2004-2014 in Guangxi. Annual data on measles cases and population sizes for each county were obtained from Guangxi CDC and Guangxi Bureau of Statistics, respectively. Epidemic curves and Kulldorff's temporal scan statistics were used to identify seasonal peaks and high-risk periods. Tango's flexible scan statistics were implemented to determine irregular spatial clusters. Spatio-temporal clusters in elliptical cylinder shapes were detected by Kulldorff's scan statistics. Population attributable risk percent (PAR%) of children aged ≤24 months was used to identify regions with a heavy burden of measles. Seasonal peaks occurred between April and June, and a temporal measles cluster was detected in 2014. Spatial clusters were identified in West, Southwest and North Central Guangxi. Three phases of spatio-temporal clusters with high relative risk were detected: Central Guangxi during 2004-2005, Midwest Guangxi in 2007, and West and Southwest Guangxi during 2013-2014. Regions with high PAR% were mainly clustered in West, Southwest, North and Central Guangxi. A temporal uptrend of measles incidence existed in Guangxi between 2010 and 2014, whereas a downtrend occurred during 2004-2009. The hotspots shifted from Central to West and Southwest Guangxi, regions overburdened with measles. Thus, intensifying surveillance of the timeliness and completeness of routine vaccination and implementing supplementary immunization activities for measles should be prioritized in these regions.
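    Kulldorff's temporal scan statistic compares every contiguous time window with the rest of the study period using a Poisson likelihood ratio, LLR = c ln(c/e) + (C − c) ln((C − c)/(C − e)) for windows with c > e. The minimal sketch below scans yearly counts and returns the most likely cluster; it assumes expected counts proportional to population, caps the window length, and omits the Monte Carlo replication that real analyses use to obtain a p-value. Counts and function names are hypothetical.

```python
import math


def poisson_llr(c, e, total):
    """Kulldorff's Poisson log-likelihood ratio for a window with c observed and
    e expected cases, out of `total` cases overall (expected counts sum to total).
    Only windows with elevated risk (c > e) score above zero."""
    if c <= e:
        return 0.0
    inside = c * math.log(c / e)
    outside = (total - c) * math.log((total - c) / (total - e)) if total > c else 0.0
    return inside + outside


def temporal_scan(cases, population, max_len):
    """Most likely temporal cluster among contiguous windows of up to max_len periods.

    Returns (llr, (start, end)) with end exclusive; no Monte Carlo p-value.
    """
    total = sum(cases)
    pop_total = sum(population)
    expected = [total * p / pop_total for p in population]  # null: risk constant over time
    best = (0.0, None)
    for start in range(len(cases)):
        for end in range(start + 1, min(len(cases), start + max_len) + 1):
            llr = poisson_llr(sum(cases[start:end]), sum(expected[start:end]), total)
            if llr > best[0]:
                best = (llr, (start, end))
    return best


# Hypothetical yearly counts for 2004-2014 with constant population.
counts = [120, 95, 80, 140, 60, 55, 50, 70, 90, 300, 420]
llr, (start, end) = temporal_scan(counts, [1] * len(counts), max_len=3)
print(f"most likely cluster: {2004 + start}-{2004 + end - 1}, LLR = {llr:.1f}")
```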

  12. The remarkable geographical pattern of gastric cancer mortality in Ecuador.

    PubMed

    Montero-Oleas, Nadia; Núñez-González, Solange; Simancas-Racines, Daniel

    2017-12-01

    This study aimed to describe the gastric cancer mortality trend, and to analyze the spatial distribution of gastric cancer mortality in Ecuador, between 2004 and 2015. Data were collected from the National Institute of Statistics and Census (INEC) database. Crude gastric cancer mortality rates, standardized mortality ratios (SMRs) and indirect standardized mortality rates (ISMRs) were calculated per 100,000 persons. For time trend analysis, joinpoint regression was used. The annual percent change (APC) and the average annual percent change (AAPC) were computed for each province. Spatial age-adjusted analysis was used to detect high risk clusters of gastric cancer mortality, from 2010 to 2015, using Kulldorff spatial scan statistics. In Ecuador, between 2004 and 2015, gastric cancer caused a total of 19,115 deaths: 10,679 in men and 8436 in women. When crude rates were analyzed, a significant decline was detected (AAPC: -1.8%; p<0.001). ISMR also decreased, but this change was not statistically significant (APC: -0.53%; p=0.36). From 2004 to 2007 and from 2008 to 2011 the province with the highest ISMR was Carchi; from 2012 to 2015 it was Cotopaxi. The most likely high occurrence cluster included Bolívar, Los Ríos, Chimborazo, Tungurahua, and Cotopaxi provinces, with a relative risk of 1.34 (p<0.001). There is a substantial geographic variation in gastric cancer mortality rates among Ecuadorian provinces. The spatial analysis indicates the presence of high occurrence clusters throughout the Andes Mountains. Copyright © 2017 The Authors. Published by Elsevier Ltd. All rights reserved.
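    In joinpoint regression, each segment is a log-linear model ln(rate) = a + b·year, and the annual percent change is APC = 100·(exp(b) − 1). The sketch below fits a single segment by ordinary least squares; it does not search for joinpoints or compute the permutation tests that joinpoint software uses, and the rates and function name are hypothetical. For several segments, the AAPC is formed by averaging the segment slopes b weighted by segment length and applying the same transformation.

```python
import math


def annual_percent_change(years, rates):
    """APC from a single log-linear segment ln(rate) = a + b * year (OLS fit).

    APC = 100 * (exp(b) - 1). Joinpoint regression fits several such segments
    and chooses the number of joinpoints; that search is not done here.
    """
    n = len(years)
    logs = [math.log(r) for r in rates]
    x_mean = sum(years) / n
    y_mean = sum(logs) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(years, logs))
    sxx = sum((x - x_mean) ** 2 for x in years)
    slope = sxy / sxx
    return 100.0 * (math.exp(slope) - 1.0)


# Hypothetical crude mortality rates per 100,000 for 2004-2015.
years = list(range(2004, 2016))
rates = [14.1, 13.9, 13.6, 13.5, 13.1, 12.9, 12.8, 12.5, 12.3, 12.0, 11.9, 11.6]
print(f"APC = {annual_percent_change(years, rates):.2f}% per year")
```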

  13. Modeling the Movement of Homicide by Type to Inform Public Health Prevention Efforts

    PubMed Central

    Grady, Sue; Pizarro, Jesenia M.; Melde, Chris

    2015-01-01

    Objectives. We modeled the spatiotemporal movement of hotspot clusters of homicide by motive in Newark, New Jersey, to investigate whether different homicide types have different patterns of clustering and movement. Methods. We obtained homicide data from the Newark Police Department Homicide Unit’s investigative files from 1997 through 2007 (n = 560). We geocoded the address at which each homicide victim was found and recorded the date of and the motive for the homicide. We used cluster detection software to model the spatiotemporal movement of statistically significant homicide clusters by motive, using census tract and month of occurrence as the spatial and temporal units of analysis. Results. Gang-motivated homicides showed evidence of clustering and diffusion through Newark. Additionally, gang-motivated homicide clusters overlapped to a degree with revenge and drug-motivated homicide clusters. Escalating dispute and nonintimate familial homicides clustered; however, there was no evidence of diffusion. Intimate partner and robbery homicides did not cluster. Conclusions. By tracking how homicide types diffuse through communities and determining which places have ongoing or emerging homicide problems by type, we can better inform the deployment of prevention and intervention efforts. PMID:26270315

  14. Adverse mental health effects of cannabis use in two indigenous communities in Arnhem Land, Northern Territory, Australia: exploratory study.

    PubMed

    Clough, Alan R; d'Abbs, Peter; Cairney, Sheree; Gray, Dennis; Maruff, Paul; Parker, Robert; O'Reilly, Bridie

    2005-07-01

    We investigated adverse mental health effects and their associations with levels of cannabis use among indigenous Australian cannabis users in remote communities in the Northern Territory. Local indigenous health workers and key informants assisted in developing 28 criteria describing mental health symptoms. Five symptom clusters were identified using cluster analysis of data compiled from interviews with 103 cannabis users. Agreement was assessed (method comparison approach, kappa-statistic) with a clinician's classification of the 28 criteria into five groups labelled: 'anxiety', 'dependency', 'mood', 'vegetative' and 'psychosis'. Participants were described as showing 'anxiety', 'dependency' etc., if they reported half or more of the symptoms comprising the cluster. Associations between participants' self-reported cannabis use and each symptom cluster were assessed (logistic regression adjusting for age, sex, other substance use). Agreement between two classifications of 28 criteria into five groups was 'moderate' (64%, kappa = 0.55, p < 0.001). When five clusters were combined into three, 'anxiety-dependency', 'mood-vegetative' and 'psychosis', agreement rose to 71% (kappa = 0.56, p < 0.001). 'Anxiety-dependency' was positively associated with number of 'cones' usually smoked per week and this remained significant when adjusted for confounders (p = 0.020) and tended to remain significant in those who had never sniffed petrol (p = 0.052). Users of more than five cones per week were more likely to display 'anxiety-dependency' symptoms than those who used one cone per week (OR = 15.8, 1.8-141.2, p = 0.013). A crude association between the 'mood-vegetative' symptom cluster and number of cones usually smoked per week (p = 0.014) also remained statistically significant when adjusted for confounders (p = 0.012) but was modified by interactions with petrol sniffing (p = 0.116) and alcohol use (p = 0.276). 
There were no associations between cannabis use and 'psychosis'. Risks for 'anxiety-dependency' symptoms in cannabis users increased as their level of use increased. Other plausible mental health effects of cannabis in this population of comparatively new users were probably masked by alcohol use and a history of petrol sniffing.
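    Cohen's kappa corrects raw agreement p_o for the agreement p_e expected by chance from the two classifications' marginal frequencies: κ = (p_o − p_e)/(1 − p_e). For instance, if chance agreement were 0.2 (as with five equally used groups), 64% raw agreement would give κ = (0.64 − 0.2)/(1 − 0.2) = 0.55. The sketch below is a generic implementation; the criterion labels in the usage example are hypothetical, not the study's assignments.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two classifications of the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: product of the two raters' marginal proportions per category.
    p_expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical grouping of eight symptom criteria by cluster analysis and by a clinician.
cluster_groups = ["anxiety", "anxiety", "mood", "mood", "psychosis", "dependency", "vegetative", "mood"]
clinician_groups = ["anxiety", "dependency", "mood", "vegetative", "psychosis", "dependency", "vegetative", "mood"]
print(round(cohens_kappa(cluster_groups, clinician_groups), 3))
```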

  15. Geographical Distribution Patterns of Iodine in Drinking-Water and Its Associations with Geological Factors in Shandong Province, China

    PubMed Central

    Gao, Jie; Zhang, Zhijie; Hu, Yi; Bian, Jianchao; Jiang, Wen; Wang, Xiaoming; Sun, Liqian; Jiang, Qingwu

    2014-01-01

    County-based spatial distribution characteristics and the related geological factors for iodine in drinking-water were studied in Shandong Province (China). Spatial autocorrelation analysis and the spatial scan statistic were applied to analyze the spatial characteristics. Generalized linear models (GLMs) and geographically weighted regression (GWR) studies were conducted to explore the relationship between water iodine level and its related geological factors. The spatial distribution of iodine in drinking-water was significantly heterogeneous in Shandong Province (Moran’s I = 0.52, Z = 7.4, p < 0.001). Two clusters for high iodine in drinking-water were identified in the south-western and north-western parts of Shandong Province by the purely spatial scan statistic approach. Both GLMs and GWR indicated a significant global association between iodine in drinking-water and geological factors. Furthermore, GWR showed marked spatial variability across the study region. Soil type and distance to the Yellow River were statistically significant in most areas of Shandong Province, confirming the hypothesis that the Yellow River causes iodine deposits in Shandong Province. Our results suggest that more effective regional monitoring plans and water improvement strategies should target the cluster areas, based on the characteristics of geological factors and the spatial variability of local relationships between iodine in drinking-water and geological factors. PMID:24852390
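    Geographically weighted regression fits a separate weighted least-squares regression at each location, down-weighting observations by distance, so coefficients can vary across the study region. The minimal sketch below handles one predictor with a fixed Gaussian kernel; real GWR analyses use several predictors and choose the bandwidth from the data (for example by cross-validation or AICc). The county coordinates, values and function name are hypothetical.

```python
import math


def gwr_local_fit(xs, ys, coords, target, bandwidth):
    """Local intercept and slope at `target` for one predictor.

    Weighted least squares with a fixed Gaussian kernel,
    w_i = exp(-0.5 * (d_i / bandwidth) ** 2), where d_i is the distance from
    observation i to the target location.
    """
    w = [math.exp(-0.5 * (math.dist(c, target) / bandwidth) ** 2) for c in coords]
    sw = sum(w)
    x_bar = sum(wi * x for wi, x in zip(w, xs)) / sw
    y_bar = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxy = sum(wi * (x - x_bar) * (y - y_bar) for wi, x, y in zip(w, xs, ys))
    sxx = sum(wi * (x - x_bar) ** 2 for wi, x in zip(w, xs))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical counties: the predictor-outcome relationship differs by region.
coords = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
xs = [1, 2, 3, 1, 2, 3]
ys = [2, 4, 6, -1, -2, -3]
print(gwr_local_fit(xs, ys, coords, (1, 0), bandwidth=1.0))   # west
print(gwr_local_fit(xs, ys, coords, (11, 0), bandwidth=1.0))  # east
```

    A single global regression on these data would average the two opposite slopes, which is the kind of local variability GWR is designed to reveal.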

  16. Spatial distribution of unspecified chronic kidney disease in El Salvador by crop area cultivated and ambient temperature.

    PubMed

    VanDervort, Darcy R; López, Dina L; Orantes, Carlos M; Rodríguez, David S

    2014-04-01

    Chronic kidney disease of unknown etiology is occurring in various geographic areas worldwide. Cases lack typical risk factors associated with chronic kidney disease, such as diabetes and hypertension. It is epidemic in El Salvador, Central America, where it is diagnosed with increasing frequency in young, otherwise-healthy male farmworkers. Suspected causes include agrochemical use (especially in sugarcane fields), physical heat stress, and heavy metal exposure. The aim was to evaluate the geographic relationship of unspecified chronic kidney disease (unCKD) and nondiabetic chronic renal failure (ndESRD) hospital admissions in El Salvador with proximity to cultivated crops and with ambient temperatures. Data on unCKD and ndESRD were compared with environmental variables: crop area cultivated (an indicator of agrochemical use) and high ambient temperatures. Using geographically weighted regression analysis, two model sets were created using reported municipal hospital admission rates per thousand population for unCKD (2006-2010) and ndESRD (2005-2010) [corrected]. These were assessed against the local percentage of land cultivated by crop (sugarcane, coffee, corn, cotton, sorghum, and beans) and mean maximum ambient temperature, with Moran's indices used to assess data clustering. Two-dimensional geographic models illustrated the spatial distribution of the parameters. Bivariate geographically weighted regressions showed statistically significant correlations of percent area of sugarcane, corn, cotton, coffee, and bean cultivation, as well as mean maximum ambient temperature, with both unCKD and ndESRD hospital admission rates. Percent area of sugarcane cultivation had the greatest statistical weight (p ≤ 0.001; Rp² = 0.77 for unCKD). 
The most statistically significant multivariate geographically weighted regression model for unCKD included percent area of sugarcane, cotton and corn cultivation (p ≤ 0.001; Rp² = 0.80), while, for ndESRD, it included the percent area of sugarcane, corn, cotton and coffee cultivation (Rp² = 0.52). Univariate unCKD and ndESRD Moran's I (0.20 and 0.33, respectively) indicated some degree of clustering. Ambient temperature did not improve multivariate geographically weighted regression models for unCKD or ndESRD. Local bivariate Moran's indices with relatively high positive values and statistical significance (0.3-1.0, p ≤ 0.05) indicated positive clustering between unCKD hospital admission rates and percent area of sugarcane as well as cotton cultivation. The highest positive clustering values did not consistently plot near the highest temperatures; there were some positive clusters in regions of lower temperatures. Clusters of ndESRD were also observed, some in areas of relatively low chronic kidney disease incidence in western El Salvador. High temperatures do not appear to strongly influence the occurrence of CKDu proxies. CKDu in El Salvador may arise from proximity to agriculture in which agrochemicals are applied, especially sugarcane cultivation. The findings of this preliminary ecological study suggest that more research is needed to assess and quantify the presence of specific agrochemicals in high-CKDu areas.
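
    Geographically weighted regression fits a separate weighted regression at each location, down-weighting distant observations. A minimal bivariate sketch, assuming a Gaussian distance kernel with a fixed, hypothetical `bandwidth`; GWR software usually selects the kernel and bandwidth by cross-validation or AICc, and the abstract does not say which was used. The input values are made up for illustration only:

```python
import math


def gaussian_weights(coords, i, bandwidth):
    """Kernel weight of every observation relative to location i."""
    xi, yi = coords[i]
    return [
        math.exp(-0.5 * ((x - xi) ** 2 + (y - yi) ** 2) / bandwidth ** 2)
        for x, y in coords
    ]


def weighted_line_fit(x, y, w):
    """Weighted least squares for y = a + b*x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return my - b * mx, b


def gwr_bivariate(coords, x, y, bandwidth):
    """Local (intercept, slope) pairs, one per location."""
    return [
        weighted_line_fit(x, y, gaussian_weights(coords, i, bandwidth))
        for i in range(len(coords))
    ]


# Made-up municipalities: x = % area cultivated, y = admission rate.
coords = [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1)]
pct_cultivated = [5, 10, 20, 8, 15]
admissions = [1.2, 1.9, 3.5, 1.4, 2.6]
for a, b in gwr_bivariate(coords, pct_cultivated, admissions, bandwidth=1.0):
    print(f"intercept={a:.3f} slope={b:.3f}")
```

    Mapping the local slopes is how GWR exposes spatial non-stationarity; when the relationship is identical everywhere, every location returns the same coefficients.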

  17. Defining functioning levels in patients with schizophrenia: A combination of a novel clustering method and brain SPECT analysis.

    PubMed

    Faget-Agius, Catherine; Vincenti, Aurélie; Guedj, Eric; Michel, Pierre; Richieri, Raphaëlle; Alessandrini, Marine; Auquier, Pascal; Lançon, Christophe; Boyer, Laurent

    2017-12-30

    This study aims to define functioning levels of patients with schizophrenia using an interpretable clustering method based on a specific functioning scale, the Functional Remission Of General Schizophrenia (FROGS) scale, and to test their validity with respect to clinical and neuroimaging characteristics. In this observational study, patients with schizophrenia were classified using a hierarchical top-down method called clustering using unsupervised binary trees (CUBT). Socio-demographic, clinical, and neuroimaging SPECT perfusion data were compared between the clusters to ensure their clinical relevance. A total of 242 patients were analyzed. A four-group functioning level structure was identified: 54 patients were classified as "minimal", 81 as "low", 64 as "moderate", and 43 as "high". The clustering shows satisfactory statistical properties, including reproducibility and discriminancy. The 4 clusters consistently differentiate patients. Patients with a "high" functioning level had significantly the lowest scores on the PANSS and the CDSS, and the highest scores on the GAF, the MARS and the S-QoL 18. Functioning levels were significantly associated with cerebral perfusion in two relevant areas: the left inferior parietal cortex and the anterior cingulate. Our study provides relevant functioning levels in schizophrenia and may enhance the use of functioning scales. Copyright © 2017 Elsevier B.V. All rights reserved.

  18. Identifying and Assessing Interesting Subgroups in a Heterogeneous Population

    PubMed Central

    Lee, Woojoo; Alexeyenko, Andrey; Pernemalm, Maria; Guegan, Justine; Dessen, Philippe; Lazar, Vladimir; Lehtiö, Janne; Pawitan, Yudi

    2015-01-01

    Biological heterogeneity is common in many diseases and it is often the reason for therapeutic failures. Thus, there is great interest in classifying a disease into subtypes that have clinical significance in terms of prognosis or therapy response. One of the most popular methods to uncover unrecognized subtypes is cluster analysis. However, classical clustering methods such as k-means clustering or hierarchical clustering are not guaranteed to produce clinically interesting subtypes. This could be because the main statistical variability—the basis of cluster generation—is dominated by genes not associated with the clinical phenotype of interest. Furthermore, a strong prognostic factor might be relevant for a certain subgroup but not for the whole population; thus an analysis of the whole sample may not reveal this prognostic factor. To address these problems we investigate methods to identify and assess clinically interesting subgroups in a heterogeneous population. The identification step uses a clustering algorithm and to assess significance we use a false discovery rate- (FDR-) based measure. Under the heterogeneity condition the standard FDR estimate is shown to overestimate the true FDR value, but this is remedied by an improved FDR estimation procedure. As illustrations, two real data examples from gene expression studies of lung cancer are provided. PMID:26339613
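
    The abstract does not give the authors' improved FDR estimator, so it is not reproduced here. For orientation only, a sketch of one common plug-in estimate, FDR(t) ≈ pi0 * m * t / R(t), where m is the number of tests, R(t) the number of p-values at or below the threshold t, and pi0 the proportion of true null hypotheses, estimated here with Storey's lambda method (lambda = 0.5 is an assumed default). Whether this is exactly the "standard" estimate the authors critique is an assumption:

```python
def storey_pi0(pvalues, lam=0.5):
    """Estimate the proportion of true nulls from the p-values above lam."""
    m = len(pvalues)
    above = sum(1 for p in pvalues if p > lam)
    return min(1.0, above / ((1.0 - lam) * m))


def plugin_fdr(pvalues, t, pi0=1.0):
    """Plug-in FDR estimate for rejecting every p-value <= t."""
    m = len(pvalues)
    rejected = sum(1 for p in pvalues if p <= t)
    if rejected == 0:
        return 0.0
    return min(1.0, pi0 * m * t / rejected)


pvals = [0.001, 0.01, 0.02, 0.5, 0.9]
pi0 = storey_pi0(pvals)
print(pi0, plugin_fdr(pvals, 0.02), plugin_fdr(pvals, 0.02, pi0))
```

    With pi0 = 1 the estimate is the conservative one implied by the Benjamini-Hochberg procedure; plugging in a smaller pi0 lowers it proportionally.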

  19. Predictors of comorbid personality disorders in patients with panic disorder with agoraphobia.

    PubMed

    Latas, M; Starcevic, V; Trajkovic, G; Bogojevic, G

    2000-01-01

    The aim of this study was to ascertain predictors of comorbid personality disorders in patients with panic disorder with agoraphobia (PDAG). Sixty consecutive outpatients with PDAG were administered the Structured Clinical Interview for DSM-IV Axis II Personality Disorders (SCID-II) for the purpose of diagnosing personality disorders. Logistic regressions were used to identify predictors of any comorbid personality disorder, any DSM-IV cluster A, cluster B, and cluster C personality disorder. Independent variables in these regressions were gender, age, duration of panic disorder (PD), severity of PDAG, and scores on self-report instruments that assess the patient's perception of their parents, childhood separation anxiety, and traumatic experiences. High levels of parental protection on the Parental Bonding Instrument (PBI), indicating a perception of the parents as overprotective and controlling, emerged as the only statistically significant predictor of any comorbid personality disorder. This finding was attributed to the association between parental overprotection and cluster B personality disorders, particularly borderline personality disorder. The duration of PD was a significant predictor of any cluster B and any cluster C personality disorder, suggesting that some of the cluster B and cluster C personality disorders may be a consequence of the long-lasting PDAG. Any cluster B personality disorder was also associated with younger age. In conclusion, despite a generally nonspecific nature of the relationship between parental overprotection in childhood and adult psychopathology, the findings of this study suggest some specificity for the association between parental overprotection in childhood and personality disturbance in PDAG patients, particularly cluster B personality disorders.

  20. A two-step initial mass function: Consequences of clustered star formation for binary properties

    NASA Astrophysics Data System (ADS)

    Durisen, R. H.; Sterzik, M. F.; Pickett, B. K.

    2001-06-01

    If stars originate in transient bound clusters of moderate size, these clusters will decay due to dynamic interactions in which a hard binary forms and ejects most or all of the other stars. When the cluster members are chosen at random from a reasonable initial mass function (IMF), the resulting binary characteristics do not match current observations. We find a significant improvement in the trends of binary properties from this scenario when an additional constraint is taken into account, namely that there is a distribution of total cluster masses set by the masses of the cloud cores from which the clusters form. Two distinct steps then determine final stellar masses: the choice of a cluster mass and the formation of the individual stars. We refer to this as a "two-step" IMF. Simple statistical arguments are used in this paper to show that a two-step IMF, combined with typical results from dynamic few-body system decay, tends to give better agreement between computed binary characteristics and observations than a one-step mass selection process.
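
    A toy Monte Carlo sketch of the two steps described above. Every distributional choice is an assumption for illustration, not taken from the paper: cluster (core) masses are drawn from a lognormal, the cluster mass is split among `n_stars` members with uniformly random (Dirichlet(1, ..., 1)) fractions, and dynamical decay is caricatured by letting the two most massive members form the surviving binary:

```python
import random


def two_step_cluster(rng, n_stars=4, mu=0.0, sigma=0.5):
    """Step 1: pick a cluster mass; step 2: split it among the members."""
    total = rng.lognormvariate(mu, sigma)  # assumed core-mass distribution
    # Normalised exponential draws give uniformly random (Dirichlet(1,...,1)) fractions.
    draws = [rng.expovariate(1.0) for _ in range(n_stars)]
    norm = sum(draws)
    masses = sorted((total * d / norm for d in draws), reverse=True)
    return total, masses


def binary_from_decay(masses):
    """Caricature of dynamical decay: the two heaviest members pair up."""
    primary, secondary = masses[0], masses[1]
    return primary, secondary / primary  # (primary mass, mass ratio q)


rng = random.Random(42)
ratios = [binary_from_decay(two_step_cluster(rng)[1])[1] for _ in range(10000)]
print(sum(ratios) / len(ratios))  # mean mass ratio q in this toy model
```

    In this toy the mass ratio q is independent of the drawn cluster mass, because the split is scale-free; it is a starting point for experimentation, not a reproduction of the paper's model.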

  1. Population changes in residential clusters in Japan.

    PubMed

    Sekiguchi, Takuya; Tamura, Kohei; Masuda, Naoki

    2018-01-01

    Population dynamics in urban and rural areas are different. Understanding the factors that contribute to local population changes has various socioeconomic and political implications. In the present study, we use population census data in Japan to examine contributors to the population growth of residential clusters between 2005 and 2010. The data set covers the entirety of Japan and has a high spatial resolution of 500 m × 500 m, enabling us to examine population dynamics in various parts of the country (urban and rural) using statistical analysis. We found that, in addition to the area, population density, and age, the shape of the cluster and the spatial distribution of inhabitants within the cluster are significantly related to the population growth rate of a residential cluster. Specifically, the population tends to grow if the cluster is "round" (given its area) and the population is concentrated near the center rather than the periphery of the cluster. Combining the present results and analysis framework with factors omitted from the present study, such as migration, terrain, and transportation infrastructure, would be fruitful.

  2. Point source pollution and variability of nitrate concentrations in water from shallow aquifers

    NASA Astrophysics Data System (ADS)

    Nemčić-Jurec, Jasna; Jazbec, Anamarija

    2017-06-01

    Agriculture is one of several major sources of nitrate pollution, and the EU Nitrate Directive, designed to decrease this pollution, has therefore been implemented. Point sources such as septic systems and broken sewage systems also contribute to water pollution. Nitrate pollution of groundwater from 19 shallow wells was studied in a typical agricultural region, middle Podravina, in northwest Croatia. Nitrate concentrations in well water ranged from <0.1 to 367 mg/l, and 29.8% of the 253 samples were above the maximum acceptable value (MAV) of 50 mg/l. Among regions R1-R6, there was no statistically significant difference in nitrate concentrations (F = 1.98; p = 0.15) during the years 2002-2007. Average nitrate concentrations in all 19 wells for all analyzed years were between the recommended limit value (RLV) of 25 mg/l and the MAV, except in 2002 (when the concentration was below the RLV). Repeated-measures ANOVA showed statistically significant differences between wells less than 10 m from a point source and wells more than 20 m from a point source (F = 10.6; p < 0.001). Average annual nitrate concentrations did not differ significantly across the years studied, but the interaction between proximity and year was statistically significant (F = 2.07; p = 0.04). k-means clustering confirmed a division into four clusters according to pollution. Principal component analysis showed that there is only one significant factor, proximity, which explains 91.6% of the total variability of nitrate. Differences in water quality were found as a result of different environmental factors. These results will contribute to the implementation of the Nitrate Directive in Croatia and the EU.
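
    The regional comparison above (R1-R6) is a one-way ANOVA F test. A minimal sketch of the F statistic, the between-group mean square divided by the within-group mean square; the repeated-measures ANOVA used for the proximity-by-year analysis needs a different error term and is not covered. If SciPy is available, `scipy.stats.f_oneway` returns both F and its p-value:

```python
def one_way_anova_f(groups):
    """F = MS_between / MS_within for a list of numeric samples."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


print(one_way_anova_f([[1, 2, 3], [4, 5, 6]]))
```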

  3. Identification of a prospective early motor progression cluster of Parkinson's disease: Data from the PPMI study.

    PubMed

    Vavougios, George D; Doskas, Triantafyllos; Kormas, Constantinos; Krogfelt, Karen A; Zarogiannis, Sotirios G; Stefanis, Leonidas

    2018-04-15

    The aim of our study is to phenotype PD motor progression and to detect whether serum, cerebrospinal fluid (CSF), neuroimaging biomarkers and neuropsychological measures characterize PD motor progression phenotypes. We defined motor progression as a difference of at least one point in the Hoehn & Yahr (H&Y) scale between the baseline (Visit 0, V0), 12-month (Visit 04, V04) and 36-month (Visit 08, V08) milestones of the Parkinson's Progression Markers Initiative (PPMI) study. H&Y progression events were recorded at each milestone and used as cluster analysis variables to produce progression phenotypes. Subsequently, cross-cluster comparisons before and after (pairwise) propensity score matching were performed to assess phenotype-defining characteristics. Four progression clusters were identified: SPPD (Secondarily Progressive PD), H&Y progression between V04 and V08; EPPD (Early Progressive PD), H&Y progression between V0 and V04; NPPD (Non-Progressive PD), no H&Y progression; and MIPD (Minimally Improving PD), minimal H&Y improvement between V04 and V08. Independent-samples Mann-Whitney U tests identified CSF aSyn (p = 0.006, adjusted p-value = 0.036) and Semantic Animal Fluency T-score (SFT; p = 0.003, adjusted p-value = 0.016) as statistically significant cross-cluster characteristics. Following propensity score matching, SFT, Hopkins Verbal Learning Test (Retention/Recall), serum IGF1, CSF aSyn, DaT-SPECT striatal binding ratios (SBRs) and the Benton Judgement of Line Orientation Test (BJLOT) were determined to be statistically significant predictors of cluster differentiation (p < 0.05). SFT, serum IGF1, CSF aSyn and DaT-SPECT-derived basal ganglia striatal binding ratios warrant further investigation as possible motor progression biomarkers. Copyright © 2018 Elsevier B.V. All rights reserved.
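
    The Mann-Whitney U statistic used for the cross-cluster comparisons counts, over all pairs drawn from two groups, how often a value from the first group exceeds one from the second, with ties counting one half. A minimal sketch of the statistic itself; for p-values, `scipy.stats.mannwhitneyu` is the usual library route, and the abstract does not state which multiple-comparison adjustment produced its adjusted p-values:

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y); U_x counts pairs with x_i > y_j, ties counting 0.5."""
    u_x = 0.0
    for a in x:
        for b in y:
            if a > b:
                u_x += 1.0
            elif a == b:
                u_x += 0.5
    return u_x, len(x) * len(y) - u_x


# Hypothetical biomarker values in two progression clusters.
print(mann_whitney_u([1, 2, 3], [4, 5, 6]))
```

    U_x + U_y always equals n_x * n_y. The pairwise loop is O(n_x * n_y); rank-sum formulas give the same value faster for large samples.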

  4. PRIMUS: Galaxy clustering as a function of luminosity and color at 0.2 < z < 1

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Skibba, Ramin A.; Smith, M. Stephen M.; Coil, Alison L.

    2014-04-01

    We present measurements of the luminosity and color-dependence of galaxy clustering at 0.2 < z < 1.0 in the Prism Multi-object Survey. We quantify the clustering with the redshift-space and projected two-point correlation functions, ξ(r_p, π) and w_p(r_p), using volume-limited samples constructed from a parent sample of over ∼130,000 galaxies with robust redshifts in seven independent fields covering 9 deg² of sky. We quantify how the scale-dependent clustering amplitude increases with increasing luminosity and redder color, with relatively small errors over large volumes. We find that red galaxies have stronger small-scale (0.1 Mpc h⁻¹ < r_p < 1 Mpc h⁻¹) clustering and steeper correlation functions compared to blue galaxies, as well as a strong color dependent clustering within the red sequence alone. We interpret our measured clustering trends in terms of galaxy bias and obtain values of b_gal ≈ 0.9-2.5, quantifying how galaxies are biased tracers of dark matter depending on their luminosity and color. We also interpret the color dependence with mock catalogs, and find that the clustering of blue galaxies is nearly constant with color, while redder galaxies have stronger clustering in the one-halo term due to a higher satellite galaxy fraction. In addition, we measure the evolution of the clustering strength and bias, and we do not detect statistically significant departures from passive evolution. We argue that the luminosity- and color-environment (or halo mass) relations of galaxies have not significantly evolved since z ∼ 1. Finally, using jackknife subsampling methods, we find that sampling fluctuations are important and that the COSMOS field is generally an outlier, due to having more overdense structures than other fields; we find that 'cosmic variance' can be a significant source of uncertainty for high-redshift clustering measurements.
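
    Correlation functions like those above are estimated from pair counts of galaxies (D) and of an unclustered random catalogue (R). A minimal sketch for a single separation bin using the widely used Landy-Szalay estimator, xi = (DD - 2DR + RR) / RR with each count normalised by its number of possible pairs. That PRIMUS used this particular estimator is an assumption here, and the projected w_p(r_p) additionally requires splitting each separation into r_p and π and integrating ξ(r_p, π) along π:

```python
import math
from itertools import combinations


def _in_bin(p, q, r_min, r_max):
    return r_min <= math.dist(p, q) < r_max


def landy_szalay(data, randoms, r_min, r_max):
    """Landy-Szalay estimate of xi for separations in [r_min, r_max).

    Needs at least one random-random pair in the bin (otherwise RR = 0).
    """
    nd, nr = len(data), len(randoms)
    dd = sum(_in_bin(p, q, r_min, r_max) for p, q in combinations(data, 2))
    rr = sum(_in_bin(p, q, r_min, r_max) for p, q in combinations(randoms, 2))
    dr = sum(_in_bin(p, q, r_min, r_max) for p in data for q in randoms)
    # Normalise each count by the number of possible pairs of that type.
    dd /= nd * (nd - 1) / 2
    rr /= nr * (nr - 1) / 2
    dr /= nd * nr
    return (dd - 2 * dr + rr) / rr


# Tiny hypothetical 2D example.
print(landy_szalay([(0, 0), (1, 0)], [(0, 0), (1, 0), (5, 0), (10, 0)], 0.5, 1.5))
```

    The brute-force pair loops are O(N²); survey-sized catalogues are normally handled with tree-based pair counters.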

  5. Synergistetes cluster A in saliva is associated with periodontitis.

    PubMed

    Belibasakis, G N; Oztürk, V-Ö; Emingil, G; Bostanci, N

    2013-12-01

    Synergistetes is a novel bacterial phylum consisting of gram-negative anaerobes. Increasing lines of evidence demonstrate that this phylum is associated with periodontal diseases. This study aimed to compare the presence and levels of Synergistetes clusters A and B in saliva of patients with chronic periodontitis (CP), generalized aggressive periodontitis (G-AgP) and non-periodontitis subjects, and to investigate their correlation with clinical parameters. Saliva was collected from patients with CP (n = 20), G-AgP (n = 21) and non-periodontitis subjects (n = 18). Full mouth clinical periodontal measurements were recorded. The numbers of Synergistetes cluster A and cluster B and of the associated species Jonquetella anthropi were quantified by fluorescent in situ hybridization and microscopy. Synergistetes cluster A bacteria were detected more frequently, and at higher numbers and proportions, in the two periodontitis groups than in the non-periodontitis control group. The prevalence was 27.7% in the control group, 85% in CP and 86% in G-AgP. Compared to the control group, the numbers were significantly higher by 12.5-fold in CP and 26.5-fold in G-AgP, whereas the difference between the two forms of periodontitis was not statistically significant. Within the total bacterial population, the proportion of this cluster was increased in CP and G-AgP compared to the control group, with the difference between the two forms of periodontitis also being significant. There was a positive correlation between the levels of Synergistetes cluster A in saliva and all full mouth clinical periodontal parameters. In contrast, Synergistetes cluster B bacteria and J. anthropi were detected infrequently and at low levels in all three subject groups. Synergistetes cluster A, but not cluster B, bacteria are found at higher prevalence, numbers and proportions in saliva from patients with periodontitis than from non-periodontitis subjects. 
These findings support the association of this cluster with periodontitis. © 2013 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  6. PRIMUS: Galaxy Clustering as a Function of Luminosity and Color at 0.2 < z < 1

    NASA Astrophysics Data System (ADS)

    Skibba, Ramin A.; Smith, M. Stephen M.; Coil, Alison L.; Moustakas, John; Aird, James; Blanton, Michael R.; Bray, Aaron D.; Cool, Richard J.; Eisenstein, Daniel J.; Mendez, Alexander J.; Wong, Kenneth C.; Zhu, Guangtun

    2014-04-01

    We present measurements of the luminosity and color-dependence of galaxy clustering at 0.2 < z < 1.0 in the Prism Multi-object Survey. We quantify the clustering with the redshift-space and projected two-point correlation functions, ξ(r_p, π) and w_p(r_p), using volume-limited samples constructed from a parent sample of over ~130,000 galaxies with robust redshifts in seven independent fields covering 9 deg² of sky. We quantify how the scale-dependent clustering amplitude increases with increasing luminosity and redder color, with relatively small errors over large volumes. We find that red galaxies have stronger small-scale (0.1 Mpc h⁻¹ < r_p < 1 Mpc h⁻¹) clustering and steeper correlation functions compared to blue galaxies, as well as a strong color dependent clustering within the red sequence alone. We interpret our measured clustering trends in terms of galaxy bias and obtain values of b_gal ≈ 0.9-2.5, quantifying how galaxies are biased tracers of dark matter depending on their luminosity and color. We also interpret the color dependence with mock catalogs, and find that the clustering of blue galaxies is nearly constant with color, while redder galaxies have stronger clustering in the one-halo term due to a higher satellite galaxy fraction. In addition, we measure the evolution of the clustering strength and bias, and we do not detect statistically significant departures from passive evolution. We argue that the luminosity- and color-environment (or halo mass) relations of galaxies have not significantly evolved since z ~ 1. Finally, using jackknife subsampling methods, we find that sampling fluctuations are important and that the COSMOS field is generally an outlier, due to having more overdense structures than other fields; we find that "cosmic variance" can be a significant source of uncertainty for high-redshift clustering measurements.
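
    Jackknife errors of the kind mentioned above can be illustrated with a delete-one sketch: recompute the statistic leaving out one subsample at a time (for example one of the seven survey fields) and scale the scatter of those estimates by (N - 1)/N. The `estimator` argument is a hypothetical stand-in for whatever clustering statistic is being measured:

```python
def jackknife(samples, estimator):
    """Return (full-sample estimate, delete-one jackknife variance).

    `samples` must be a list (it is sliced to drop one element at a time).
    """
    n = len(samples)
    loo = [estimator(samples[:i] + samples[i + 1:]) for i in range(n)]
    loo_mean = sum(loo) / n
    variance = (n - 1) / n * sum((t - loo_mean) ** 2 for t in loo)
    return estimator(samples), variance


def mean(xs):
    return sum(xs) / len(xs)


# Hypothetical per-field values; for the mean, the jackknife variance
# reproduces the textbook s^2 / N.
print(jackknife([1.0, 2.0, 3.0, 4.0], mean))
```

    An unusually overdense field, like COSMOS in the abstract, shows up as an outlying leave-one-out estimate.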

  7. Spatial modelling and mapping of female genital mutilation in Kenya.

    PubMed

    Achia, Thomas N O

    2014-03-25

    Female genital mutilation/cutting (FGM/C) is still prevalent in several communities in Kenya and other areas in Africa, as well as being practiced by some migrants from African countries living in other parts of the world. This study aimed at detecting clustering of FGM/C in Kenya, and identifying those areas within the country where women still intend to continue the practice. A broader goal of the study was to identify geographical areas where the practice continues unabated and where broad intervention strategies need to be introduced. The prevalence of FGM/C was investigated using the 2008 Kenya Demographic and Health Survey (KDHS) data. The 2008 KDHS used a multistage stratified random sampling plan to select women of reproductive age (15-49 years) and asked questions concerning their FGM/C status and their support for the continuation of FGM/C. A spatial scan statistical analysis was carried out using SaTScan™ to test for statistically significant clustering of the practice of FGM/C in the country. The risk of FGM/C was also modelled and mapped using a hierarchical spatial model under the Integrated Nested Laplace approximation approach using the INLA library in R. The prevalence of FGM/C stood at 28.2% and an estimated 10.3% of the women interviewed indicated that they supported the continuation of FGM. On the basis of the Deviance Information Criterion (DIC), hierarchical spatial models with spatially structured random effects were found to best fit the data for both response variables considered. Age, region, rural-urban classification, education, marital status, religion, socioeconomic status and media exposure were found to be significantly associated with FGM/C. The current FGM/C status of a woman was also a significant predictor of support for the continuation of FGM/C. Spatial scan statistics confirm FGM clusters in the North-Eastern and South-Western regions of Kenya (p<0.001). This suggests that the fight against FGM/C in Kenya is not yet over. 
There are still deep cultural and religious beliefs to be addressed in a bid to eradicate the practice. Interventions by government and other stakeholders must address these challenges and target the identified clusters.
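
    SaTScan's purely spatial scan evaluates many circular windows, keeps the one with the largest likelihood ratio, and assesses its significance by Monte Carlo replication. A minimal sketch for a binary outcome such as FGM/C status, using the Bernoulli-model log-likelihood ratio and windows grown around each location until they would exceed half of the population (SaTScan's default maximum window size). Which probability model the study used is not stated in the abstract; the data layout (`points`, `cases`, `population` lists) is hypothetical, and the Monte Carlo step is omitted:

```python
import math


def _xlogy(x, y):
    """x * ln(y), with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def bernoulli_llr(c, n, total_cases, total_pop):
    """Log-likelihood ratio of one window under the Bernoulli model.

    c: cases inside the window, n: people inside the window.
    Only windows whose inside rate exceeds the outside rate score above 0.
    """
    out_c, out_n = total_cases - c, total_pop - n
    if n == 0 or out_n == 0 or c / n <= out_c / out_n:
        return 0.0
    inside = _xlogy(c, c / n) + _xlogy(n - c, (n - c) / n)
    outside = _xlogy(out_c, out_c / out_n) + _xlogy(out_n - out_c, (out_n - out_c) / out_n)
    null = _xlogy(total_cases, total_cases / total_pop) + _xlogy(
        total_pop - total_cases, (total_pop - total_cases) / total_pop
    )
    return inside + outside - null


def circular_scan(points, cases, population, max_frac=0.5):
    """Return (best LLR, centre index, member indices) over circular windows."""
    total_cases, total_pop = sum(cases), sum(population)
    best = (0.0, None, [])
    for i, (xi, yi) in enumerate(points):
        # Grow the circle by adding locations in order of distance from the centre.
        order = sorted(
            range(len(points)),
            key=lambda j: (points[j][0] - xi) ** 2 + (points[j][1] - yi) ** 2,
        )
        c = n = 0
        members = []
        for j in order:
            if n + population[j] > max_frac * total_pop:
                break
            c += cases[j]
            n += population[j]
            members.append(j)
            llr = bernoulli_llr(c, n, total_cases, total_pop)
            if llr > best[0]:
                best = (llr, i, list(members))
    return best
```

    Significance would then come from repeating the scan on data with the cases randomly reassigned among individuals and comparing the observed maximum LLR with the simulated maxima.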

  8. Weighted community detection and data clustering using message passing

    NASA Astrophysics Data System (ADS)

    Shi, Cheng; Liu, Yanchen; Zhang, Pan

    2018-03-01

    Grouping objects into clusters based on the similarities or weights between them is one of the most important problems in science and engineering. In this work, by extending message-passing algorithms and spectral algorithms proposed for the unweighted community detection problem, we develop a non-parametric method based on statistical physics: we map the problem to the Potts model at the critical temperature of the spin-glass transition and apply belief propagation to solve for the marginals of the Boltzmann distribution. Our algorithm is robust to over-fitting and gives a principled way to determine whether there are significant clusters in the data and how many there are. We apply our method to different clustering tasks. In the community detection problem in weighted and directed networks, we show that our algorithm significantly outperforms existing algorithms. In the clustering problem, where the data were generated by mixture models in the sparse regime, we show that our method works all the way down to the theoretical limit of detectability and gives accuracy very close to that of optimal Bayesian inference. In the semi-supervised clustering problem, our method needs only a few labels to work perfectly on classic datasets. Finally, we develop Thouless-Anderson-Palmer equations, which greatly reduce the computational complexity in dense networks while giving almost the same performance as belief propagation.

  9. Manipulating measurement scales in medical statistical analysis and data mining: A review of methodologies

    PubMed Central

    Marateb, Hamid Reza; Mansourian, Marjan; Adibi, Peyman; Farina, Dario

    2014-01-01

    Background: selecting the correct statistical test and data mining method depends highly on the measurement scale of the data, the type of variables, and the purpose of the analysis. Different measurement scales are studied in detail, and statistical comparison, modeling, and data mining methods are studied using several medical examples. We present two examples of clustering ordinal variables, which are more challenging to analyze, using the Wisconsin Breast Cancer Data (WBCD). Ordinal-to-Interval scale conversion example: a breast cancer database of nine 10-level ordinal variables for 683 patients was analyzed by two ordinal-scale clustering methods. The performance of the clustering methods was assessed by comparison with the gold standard groups of malignant and benign cases that had been identified by clinical tests. Results: the sensitivity and accuracy of the two clustering methods were 98% and 96%, respectively. Their specificity was comparable. Conclusion: by using an appropriate clustering algorithm based on the measurement scale of the variables in the study, high performance can be achieved. Moreover, descriptive and inferential statistics, as well as the modeling approach, must be selected based on the scale of the variables. PMID:24672565

  10. ICAP - An Interactive Cluster Analysis Procedure for analyzing remotely sensed data

    NASA Technical Reports Server (NTRS)

    Wharton, S. W.; Turner, B. J.

    1981-01-01

    An Interactive Cluster Analysis Procedure (ICAP) was developed to derive classifier training statistics from remotely sensed data. ICAP differs from conventional clustering algorithms by allowing the analyst to optimize the cluster configuration by inspection, rather than by manipulating process parameters. Control of the clustering process alternates between the algorithm, which creates new centroids and forms clusters, and the analyst, who can evaluate and elect to modify the cluster structure. Clusters can be deleted, or lumped together pairwise, or new centroids can be added. A summary of the cluster statistics can be requested to facilitate cluster manipulation. The principal advantage of this approach is that it allows prior information (when available) to be used directly in the analysis, since the analyst interacts with ICAP in a straightforward manner, using basic terms with which he is more likely to be familiar. Results from testing ICAP showed that an informed use of ICAP can improve classification, as compared to an existing cluster analysis procedure.

  11. Cluster analysis of particulate matter (PM10) and black carbon (BC) concentrations

    NASA Astrophysics Data System (ADS)

    Žibert, Janez; Pražnikar, Jure

    2012-09-01

    The monitoring of air-pollution constituents like particulate matter (PM10) and black carbon (BC) can provide information about air quality and the dynamics of emissions. Air quality depends on natural and anthropogenic sources of emissions as well as on weather conditions. For a one-year period, the diurnal concentrations of PM10 and BC in the Port of Koper were analysed by clustering days into groups according to the similarity of their hourly BC and PM10 day-profiles, without any prior assumptions about working and non-working days, weather conditions or hot and cold seasons. The analysis was performed using k-means clustering with the squared Euclidean distance as the similarity measure. In the BC case, the 10 clusters comprised 3 clusters with just one member day and 7 clusters that encompass more than one day with similar BC profiles. Similar results were found in the PM10 case, where one cluster contains a single day, while 7 clusters contain several days. The clustering analysis revealed that the clusters with less pronounced bimodal patterns and low hourly and average daily concentrations for both types of measurements include the most days in the one-year analysis. A typical day profile of the BC measurements includes a bimodal pattern with morning and evening peaks, while the PM10 measurements reveal a less pronounced bimodality. There are also clusters with single-peak day-profiles. The BC data in such cases exhibit morning peaks, while the PM10 data consist of noon or afternoon single peaks. Single pronounced peaks can be explained by the corresponding cluster wind speed profiles. The analysis also revealed some special day-profiles. The BC cluster with a high midnight peak on 30/04/2010 and the PM10 cluster with the highest observed concentration of PM10 on 01/05/2010 (208.0 μg m-3) coincide with 1 May, which is a national holiday in Slovenia and has a very strong tradition of bonfire parties. 
The clustering of the diurnal concentrations showed that a variety of day-profiles occur in the cold period, while this is not the case for the hot season. Additional analysis of ship traffic and rainfall data showed no statistically significant difference in ship gross (bruto) registered tonnage (BRT) among the BC clusters or among the PM10 clusters, but statistically significant differences in rainfall among them. The wind roses for the clusters that included most days in the sampling period indicate that PM10 and BC emitted from the Port of Koper were mainly transported westward over the sea and eastward, where there is no populated area. The presented analysis showed that both BC and PM10 concentrations were driven by rain intensity and wind speed.
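
    The day-profile clustering above is k-means with squared Euclidean distance, where each day is a vector of hourly concentrations. A minimal sketch of Lloyd's algorithm; the farthest-first initialisation is an assumption chosen to make the example deterministic (the abstract does not say how centroids were seeded), and an empty cluster simply keeps its previous centroid:

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(profiles, k):
    """Deterministic seeding: start from the first profile, then repeatedly
    add the profile farthest from its nearest chosen centroid."""
    centers = [list(profiles[0])]
    while len(centers) < k:
        far = max(profiles, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    return centers


def kmeans(profiles, k, max_iter=100):
    """Lloyd's algorithm; returns (labels, centroids)."""
    centers = farthest_first(profiles, k)
    labels = None
    for _ in range(max_iter):
        # Assign every day to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in profiles
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Move each centroid to the mean profile of its member days.
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical 4-hour "day profiles" (real ones would have 24 hourly values).
days = [[1, 5, 1, 5], [1, 6, 1, 5], [9, 9, 9, 9], [9, 8, 9, 9]]
labels, centers = kmeans(days, k=2)
print(labels)
```

    Because k-means always assigns every day to some centroid, a day with an unusual profile, such as the 1 May bonfire peaks described above, can end up alone in a single-member cluster.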

  12. Evaluation of clinical image processing algorithms used in digital mammography.

    PubMed

    Zanca, Federica; Jacobs, Jurgen; Van Ongeval, Chantal; Claus, Filip; Celis, Valerie; Geniets, Catherine; Provost, Veerle; Pauwels, Herman; Marchal, Guy; Bosmans, Hilde

    2009-03-01

    Screening is the only proven approach to reduce the mortality of breast cancer, but significant numbers of breast cancers remain undetected even when all quality assurance guidelines are implemented. With the increasing adoption of digital mammography systems, image processing may be a key factor in the imaging chain. Although, to our knowledge, statistically significant effects of manufacturer-recommended image processing algorithms have not previously been demonstrated, the subjective experience of our radiologists that apparent image quality can vary considerably between algorithms motivated this study. This article addresses the impact of five such algorithms on the detection of clusters of microcalcifications. A database of unprocessed (raw) images of 200 normal digital mammograms, acquired with the Siemens Novation DR, was collected retrospectively. Realistic simulated microcalcification clusters were inserted in half of the unprocessed images. All unprocessed images were subsequently processed with five manufacturer-recommended image processing algorithms (Agfa Musica 1, IMS Raffaello Mammo 1.2, Sectra Mamea AB Sigmoid, Siemens OPVIEW v2, and Siemens OPVIEW v1). Four breast imaging radiologists were asked to locate and score the clusters in each image on a five-point rating scale. The free-response data were analyzed by the jackknife free-response receiver operating characteristic (JAFROC) method and, for comparison, also with the receiver operating characteristic (ROC) method. JAFROC analysis revealed highly significant differences between the image processing algorithms (F = 8.51, p < 0.0001), suggesting that image processing strongly impacts the detectability of clusters. Siemens OPVIEW2 and Siemens OPVIEW1 yielded the highest and lowest performances, respectively. ROC analysis of the data also revealed significant differences between the processing algorithms, but at lower significance (F = 3.47, p = 0.0305) than JAFROC. 
Both statistical analysis methods revealed that the same six pairs of modalities were significantly different, but the JAFROC confidence intervals were about 32% smaller than ROC confidence intervals. This study shows that image processing has a significant impact on the detection of microcalcifications in digital mammograms. Objective measurements, such as described here, should be used by the manufacturers to select the optimal image processing algorithm.

  13. Post-traumatic stress disorder in adult victims of cluster munitions in Lebanon: a 10-year longitudinal study.

    PubMed

    Fares, Jawad; Gebeily, Souheil; Saad, Mohamad; Harati, Hayat; Nabha, Sanaa; Said, Najwane; Kanso, Mohamad; Abdel Rassoul, Ronza; Fares, Youssef

    2017-08-18

    This study aims to explore the short-term and long-term prevalence and effects of post-traumatic stress disorder (PTSD) among victims of cluster munitions. This was a prospective 10-year longitudinal study conducted in Lebanon. Two hundred and forty-four Lebanese civilian victims of submunition blasts, who were injured in 2006 and were over 18 years old, were interviewed. Included were participants who had been diagnosed with PTSD according to the Diagnostic and Statistical Manual of Mental Disorders, 5th Edition (DSM-5) and the PTSD Checklist - Civilian Version in 2006. Interviewees were present for the 10-year follow-up. PTSD prevalence rates of participants in 2006 and 2016 were compared. Analysis of the demographic data pertaining to the association of long-term PTSD with other variables was performed. p values <0.05 were considered statistically significant for all analyses (95% CI). All 244 civilians injured by cluster munitions in 2006 responded and were present for long-term follow-up in 2016. The prevalence of PTSD decreased significantly from 98% to 43% after 10 years (p<0.001). A lower long-term prevalence was significantly associated with male sex (p<0.001), family support (p<0.001) and religion (p<0.001). Hospitalisation (p=0.005) and severe functional impairment (p<0.001) post-trauma were significantly associated with increased prevalence of long-term PTSD. Symptoms of negative cognition and mood were more common in the long run. In addition, job instability was the most frequent socioeconomic repercussion among the participants (88%). Psychological symptoms, especially PTSD, remain high in war-affected populations many years after the war; this is particularly evident for Lebanese civilians who were victimised by cluster munitions. Screening programmes and psychological interventions need to be implemented in vulnerable populations exposed to war traumas. 
Officials and public health advocates should consider the socioeconomic implications, and help raise awareness against the harm induced by cluster munitions and similar weaponry. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  14. Post-traumatic stress disorder in adult victims of cluster munitions in Lebanon: a 10-year longitudinal study

    PubMed Central

    Fares, Jawad; Gebeily, Souheil; Saad, Mohamad; Harati, Hayat; Nabha, Sanaa; Said, Najwane; Kanso, Mohamad; Abdel Rassoul, Ronza; Fares, Youssef

    2017-01-01

    Objective: This study aims to explore the short-term and long-term prevalence and effects of post-traumatic stress disorder (PTSD) among victims of cluster munitions. Design and setting: A prospective 10-year longitudinal study conducted in Lebanon. Participants: Two hundred and forty-four Lebanese civilian victims of submunition blasts, who were injured in 2006 and were over 18 years old, were interviewed. Included were participants who had been diagnosed with PTSD according to the Diagnostic and Statistical Manual of Mental Disorders, 5th Edition (DSM-5) and the PTSD Checklist - Civilian Version in 2006. Interviewees were present for the 10-year follow-up. Main outcome measures: PTSD prevalence rates of participants in 2006 and 2016 were compared. Analysis of the demographic data pertaining to the association of long-term PTSD with other variables was performed. p values <0.05 were considered statistically significant for all analyses (95% CI). Results: All 244 civilians injured by cluster munitions in 2006 responded and were present for long-term follow-up in 2016. The prevalence of PTSD decreased significantly from 98% to 43% after 10 years (p<0.001). A lower long-term prevalence was significantly associated with male sex (p<0.001), family support (p<0.001) and religion (p<0.001). Hospitalisation (p=0.005) and severe functional impairment (p<0.001) post-trauma were significantly associated with increased prevalence of long-term PTSD. Symptoms of negative cognition and mood were more common in the long run. In addition, job instability was the most frequent socioeconomic repercussion among the participants (88%). Conclusions: Psychological symptoms, especially PTSD, remain high in war-affected populations many years after the war; this is particularly evident for Lebanese civilians who were victimised by cluster munitions. Screening programmes and psychological interventions need to be implemented in vulnerable populations exposed to war traumas. 
Officials and public health advocates should consider the socioeconomic implications, and help raise awareness against the harm induced by cluster munitions and similar weaponry. PMID:28821528

  15. Spatio-temporal surveillance of water based infectious disease (malaria) in Rawalpindi, Pakistan using geostatistical modeling techniques.

    PubMed

    Ahmad, Sheikh Saeed; Aziz, Neelam; Butt, Amna; Shabbir, Rabia; Erum, Summra

    2015-09-01

    One of the features of medical geography that has made it so useful in health research is statistical spatial analysis, which enables the quantification and qualification of health events. The main objective of this research was to study the spatial distribution patterns of malaria in Rawalpindi district using spatial statistical techniques to identify hot spots and possible risk factors. Spatial statistical analyses were done in ArcGIS, and satellite images for land use classification were processed in ERDAS Imagine. Four hundred and fifty water samples were also collected from the study area to test for the presence or absence of microbial contamination. The results indicated that malaria incidence varied with geographical location and eco-climatic conditions and showed significant positive spatial autocorrelation. Hotspots (locations of clusters) were identified using the Getis-Ord Gi* statistic. Significant clustering of malaria incidence occurred in the rural central part of the study area, including Gujar Khan, Kaller Syedan, and parts of Kahuta and Rawalpindi Tehsils. Ordinary least squares (OLS) regression analysis was conducted to analyze the relationship of risk factors with disease cases. The relationship between land cover classes and disease cases indicated that malaria was more closely associated with the agriculture, low vegetation, and water classes. Temporal variation in malaria cases showed a significant positive association with meteorological variables, including average monthly rainfall and temperature. The results further suggest that the water supply, sewage, and solid waste collection systems need serious attention to prevent any outbreak in the study area.
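The Getis-Ord Gi* hotspot statistic named above can be sketched with the standard Gi* z-score formula. This is a minimal NumPy illustration, not ArcGIS's implementation; the helper name `getis_ord_gi_star`, the 9-district line layout, the binary contiguity weights (each district plus its immediate neighbours, self included as Gi* requires), and the counts are all hypothetical.

```python
import numpy as np


def getis_ord_gi_star(x, w):
    """Getis-Ord Gi* z-scores (hypothetical helper for illustration).

    x: attribute value per area (e.g. case counts), shape (n,).
    w: spatial weights, shape (n, n); for Gi* the diagonal w[i, i] is
       non-zero, so each area counts as its own neighbour.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    n = len(x)
    x_bar = x.mean()
    s = np.sqrt((x ** 2).mean() - x_bar ** 2)  # population standard deviation
    w_sum = w.sum(axis=1)
    w_sq_sum = (w ** 2).sum(axis=1)
    numerator = w @ x - x_bar * w_sum
    denominator = s * np.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
    return numerator / denominator


# Hypothetical example: 9 districts along a line with binary contiguity
# weights (each district plus its immediate left/right neighbours).
counts = [1, 1, 1, 10, 12, 11, 1, 1, 1]
n = len(counts)
w = np.zeros((n, n))
for i in range(n):
    for j in range(max(i - 1, 0), min(i + 2, n)):
        w[i, j] = 1.0
gi = getis_ord_gi_star(counts, w)
print(np.round(gi, 2))
```

Here the middle district gets the largest z-score (about 2.8), above the conventional two-sided 5% threshold of 1.96, while the low-count end districts get negative scores. Real analyses use weights derived from the actual district geometry.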

  16. Optimized Clustering Estimators for BAO Measurements Accounting for Significant Redshift Uncertainty

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ross, Ashley J.; Banik, Nilanjan; Avila, Santiago

    2017-05-15

    We determine an optimized clustering statistic to be used for galaxy samples with significant redshift uncertainty, such as those that rely on photometric redshifts. To do so, we study the baryon acoustic oscillation (BAO) information content as a function of the orientation of galaxy clustering modes with respect to their angle to the line-of-sight (LOS). The clustering along the LOS, as observed in redshift space with significant redshift uncertainty, has contributions from clustering modes with a range of orientations with respect to the true LOS. For redshift uncertainty σ_z ≥ 0.02(1 + z), we find that while the BAO information is confined to transverse clustering modes in the true space, it is spread nearly evenly in the observed space. Thus, measuring clustering in terms of the projected separation (regardless of the LOS) is an efficient and nearly lossless compression of the signal for σ_z ≥ 0.02(1 + z). For reduced redshift uncertainty, a more careful consideration is required. We then use more than 1700 realizations of galaxy simulations mimicking the Dark Energy Survey Year 1 sample to validate our analytic results and optimized analysis procedure. We find that using the correlation function binned in projected separation, we can achieve uncertainties that are within 10 per cent of those predicted by Fisher matrix forecasts. We predict that DES Y1 should achieve a 5 per cent distance measurement using our optimized methods. We expect the results presented here to be important for any future BAO measurements made using photometric redshift data.

  17. Optimized clustering estimators for BAO measurements accounting for significant redshift uncertainty

    NASA Astrophysics Data System (ADS)

    Ross, Ashley J.; Banik, Nilanjan; Avila, Santiago; Percival, Will J.; Dodelson, Scott; Garcia-Bellido, Juan; Crocce, Martin; Elvin-Poole, Jack; Giannantonio, Tommaso; Manera, Marc; Sevilla-Noarbe, Ignacio

    2017-12-01

    We determine an optimized clustering statistic to be used for galaxy samples with significant redshift uncertainty, such as those that rely on photometric redshifts. To do so, we study the baryon acoustic oscillation (BAO) information content as a function of the orientation of galaxy clustering modes with respect to their angle to the line of sight (LOS). The clustering along the LOS, as observed in redshift space with significant redshift uncertainty, has contributions from clustering modes with a range of orientations with respect to the true LOS. For redshift uncertainty σ_z ≥ 0.02(1 + z), we find that while the BAO information is confined to transverse clustering modes in the true space, it is spread nearly evenly in the observed space. Thus, measuring clustering in terms of the projected separation (regardless of the LOS) is an efficient and nearly lossless compression of the signal for σ_z ≥ 0.02(1 + z). For reduced redshift uncertainty, a more careful consideration is required. We then use more than 1700 realizations (combining two separate sets) of galaxy simulations mimicking the Dark Energy Survey Year 1 (DES Y1) sample to validate our analytic results and optimized analysis procedure. We find that using the correlation function binned in projected separation, we can achieve uncertainties that are within 10 per cent of those predicted by Fisher matrix forecasts. We predict that DES Y1 should achieve a 5 per cent distance measurement using our optimized methods. We expect the results presented here to be important for any future BAO measurements made using photometric redshift data.
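To make "projected separation" concrete, the following minimal sketch splits a galaxy pair's separation vector into a line-of-sight component and a transverse (projected) component, taking the line of sight along the pair's mid-point direction. This is one common convention, assumed here; the helper `pair_separations` and the toy coordinates are hypothetical and are not the DES Y1 pipeline. Binning pair counts by r_p alone, ignoring the LOS component, gives the kind of projected statistic the abstract describes.

```python
import numpy as np


def pair_separations(r1, r2):
    """Split a pair's separation into a projected (transverse) part r_p and a
    line-of-sight part r_par (hypothetical helper for illustration).

    r1, r2: comoving position vectors of the two galaxies, observer at origin.
    The line of sight is taken along the pair's mid-point direction.
    """
    r1 = np.asarray(r1, dtype=float)
    r2 = np.asarray(r2, dtype=float)
    s = r2 - r1                                   # separation vector
    los = (r1 + r2) / np.linalg.norm(r1 + r2)     # unit vector to mid-point
    r_par = abs(s @ los)                          # component along the LOS
    r_p = np.sqrt(max(s @ s - r_par ** 2, 0.0))   # component across the LOS
    return r_p, r_par


# A mostly transverse pair and a purely radial pair (toy coordinates).
r_p, r_par = pair_separations([0, 0, 100], [3, 4, 100])
r_p2, r_par2 = pair_separations([0, 0, 100], [0, 0, 110])
print(r_p, r_par, r_p2, r_par2)
```

With photometric redshifts the LOS component is the poorly measured one, which is why a statistic that depends only on r_p is attractive when σ_z is large.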

  18. Operational foreshock forecasting: Fifteen years after

    NASA Astrophysics Data System (ADS)

    Ogata, Y.

    2010-12-01

    We are concerned with operational forecasting of the probability that events are foreshocks of a forthcoming, significantly larger earthquake (the mainshock). Specifically, we define foreshocks as preshocks that are substantially smaller than the mainshock, by a magnitude gap of 0.5 or larger. The probability gain of a foreshock forecast is extremely high compared with long-term forecasts by renewal processes or various alarm-based intermediate-term forecasts, because of a large event's low occurrence rate in a short period and a narrow target region. It is therefore desirable to establish operational foreshock probability forecasting, as seismologists have done for aftershocks. When a series of earthquakes occurs in a region, we attempt to discriminate foreshocks from a swarm or a mainshock-aftershock sequence. Namely, after real-time identification of an earthquake cluster using methods such as the single-link algorithm, the probability is calculated by applying statistical features that discriminate foreshocks from other types of clusters, considering the events' stronger proximity in time and space and their tendency towards chronologically increasing magnitudes. These features were modeled for probability forecasting, and the coefficients of the model were estimated in Ogata et al. (1996) for the JMA hypocenter data (M≧4, 1926-1993). Fifteen years have now passed since the publication of that work, so we are able to present the performance and validation of the forecasts (1994-2009) using the same model. Taking isolated events into consideration, the probability that the first event in a potential cluster is a foreshock varies between 0+% and 10+% depending on its location. This conditional forecasting performs significantly better than the unconditional (average) foreshock probability of 3.7% throughout the Japan region. 
Furthermore, when additional events occur in a cluster, the forecast probabilities range more widely, from nearly 0% to about 40%, depending on the discrimination features among the events in the cluster. This conditional forecasting also performs significantly better than the unconditional foreshock probability of 7.3%, which is the average probability for the plural events in earthquake clusters. Indeed, the frequency ratios of the actual foreshocks are consistent with the forecasted probabilities. Reference: Ogata, Y., Utsu, T. and Katsura, K. (1996). Statistical discrimination of foreshocks from other earthquake clusters, Geophys. J. Int. 127, 17-30.
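The real-time cluster identification step mentioned above can be sketched as a space-time variant of single-link clustering: two events are linked when they are close enough in both epicentral distance and time, and a cluster is any chain of such links. The helper `single_link_clusters`, the flat-earth km coordinates, and the 5 km / 3 day thresholds are hypothetical choices for illustration, not the JMA settings used by Ogata et al.

```python
import math


def single_link_clusters(events, max_km, max_days):
    """Single-link grouping of events (hypothetical helper for illustration).

    events: list of (t_days, x_km, y_km) on a local flat-earth grid.
    Two events are linked if they lie within max_km and max_days of each
    other; clusters are the connected groups of links (union-find).
    Returns one integer cluster label per event.
    """
    n = len(events)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        ti, xi, yi = events[i]
        for j in range(i + 1, n):
            tj, xj, yj = events[j]
            if abs(ti - tj) <= max_days and math.hypot(xi - xj, yi - yj) <= max_km:
                parent[find(i)] = find(j)  # merge the two groups

    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


events = [(0.0, 0.0, 0.0), (1.0, 3.0, 0.0), (2.0, 6.0, 0.0),
          (100.0, 0.0, 0.0), (0.5, 200.0, 200.0)]
print(single_link_clusters(events, max_km=5.0, max_days=3.0))  # [0, 0, 0, 1, 2]
```

The first and third events are 6 km apart, beyond the 5 km threshold, yet they share a cluster through the middle event; this chaining is the defining property of single-link clustering. The foreshock probability model would then be applied to each identified cluster.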

  19. Cosmic variance of the galaxy cluster weak lensing signal

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gruen, D.; Seitz, S.; Becker, M. R.

    Intrinsic variations of the projected density profiles of clusters of galaxies at fixed mass are a source of uncertainty for cluster weak lensing. We present a semi-analytical model to account for this effect, based on a combination of variations in halo concentration, ellipticity and orientation, and the presence of correlated haloes. We calibrate the parameters of our model at the 10 per cent level to match the empirical cosmic variance of cluster profiles at M_200m ≈ 10^14…10^15 h^-1 M_⊙, z = 0.25…0.5 in a cosmological simulation. We show that weak lensing measurements of clusters significantly underestimate mass uncertainties if intrinsic profile variations are ignored, and that our model can be used to provide correct mass likelihoods. Effects on the achievable accuracy of weak lensing cluster mass measurements are particularly strong for the most massive clusters and deep observations (with ≈20 per cent uncertainty from cosmic variance alone at M_200m ≈ 10^15 h^-1 M_⊙ and z = 0.25), but significant also under typical ground-based conditions. We show that neglecting intrinsic profile variations leads to biases in the mass-observable relation constrained with weak lensing, both for intrinsic scatter and overall scale (the latter at the 15 per cent level). Furthermore, these biases are in excess of the statistical errors of upcoming surveys and can be avoided if the cosmic variance of cluster profiles is accounted for.

  20. Cosmic variance of the galaxy cluster weak lensing signal

    DOE PAGES

    Gruen, D.; Seitz, S.; Becker, M. R.; ...

    2015-04-13

    Intrinsic variations of the projected density profiles of clusters of galaxies at fixed mass are a source of uncertainty for cluster weak lensing. We present a semi-analytical model to account for this effect, based on a combination of variations in halo concentration, ellipticity and orientation, and the presence of correlated haloes. We calibrate the parameters of our model at the 10 per cent level to match the empirical cosmic variance of cluster profiles at M_200m ≈ 10^14…10^15 h^-1 M_⊙, z = 0.25…0.5 in a cosmological simulation. We show that weak lensing measurements of clusters significantly underestimate mass uncertainties if intrinsic profile variations are ignored, and that our model can be used to provide correct mass likelihoods. Effects on the achievable accuracy of weak lensing cluster mass measurements are particularly strong for the most massive clusters and deep observations (with ≈20 per cent uncertainty from cosmic variance alone at M_200m ≈ 10^15 h^-1 M_⊙ and z = 0.25), but significant also under typical ground-based conditions. We show that neglecting intrinsic profile variations leads to biases in the mass-observable relation constrained with weak lensing, both for intrinsic scatter and overall scale (the latter at the 15 per cent level). Furthermore, these biases are in excess of the statistical errors of upcoming surveys and can be avoided if the cosmic variance of cluster profiles is accounted for.

  1. Removal of impulse noise clusters from color images with local order statistics

    NASA Astrophysics Data System (ADS)

    Ruchay, Alexey; Kober, Vitaly

    2017-09-01

    This paper proposes a novel algorithm for restoring images corrupted with clusters of impulse noise. The noise clusters often occur when the probability of impulse noise is very high. The proposed noise removal algorithm consists of detection of bulky impulse noise in three color channels with local order statistics, followed by removal of the detected clusters by means of vector median filtering. With the help of computer simulation, we show that the proposed algorithm is able to effectively remove clustered impulse noise. The performance of the proposed algorithm is compared in terms of image restoration metrics with that of commonly used, successful algorithms.
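The removal stage described above, vector median filtering of pixels already flagged as noise, can be sketched as follows. The vector median of a window is the colour vector in the window with the smallest summed Euclidean distance to all the others. The detection stage based on local order statistics is not reproduced; the `noise_mask` input, the helper names, and the choice to exclude other flagged pixels from the window (so that a noise cluster does not vote for itself) are assumptions made for illustration.

```python
import numpy as np


def vector_median(vectors):
    """Return the vector in the set that minimises the sum of Euclidean
    distances to all vectors in the set (the vector median)."""
    v = np.asarray(vectors, dtype=float)
    d = np.linalg.norm(v[:, None, :] - v[None, :, :], axis=2)
    return v[d.sum(axis=1).argmin()]


def vector_median_filter_at(img, noise_mask, radius=1):
    """Replace each flagged pixel by the vector median of its window
    (hypothetical helper for illustration).

    img: H x W x C colour image; noise_mask: H x W boolean array marking
    pixels already detected as impulse noise. Flagged neighbours are left
    out of the window whenever at least one clean pixel remains (assumption).
    """
    img = np.asarray(img, dtype=float)
    out = img.copy()
    h, w, c = img.shape
    for r, col in zip(*np.nonzero(noise_mask)):
        r0, r1 = max(r - radius, 0), min(r + radius + 1, h)
        c0, c1 = max(col - radius, 0), min(col + radius + 1, w)
        window = img[r0:r1, c0:c1].reshape(-1, c)
        clean = ~noise_mask[r0:r1, c0:c1].reshape(-1)
        if clean.any():
            window = window[clean]
        out[r, col] = vector_median(window)  # reads from img, so order-independent
    return out


img = np.tile(np.array([10.0, 20.0, 30.0]), (5, 5, 1))
mask = np.zeros((5, 5), dtype=bool)
img[2, 2] = img[2, 3] = [255.0, 0.0, 255.0]   # a two-pixel noise cluster
mask[2, 2] = mask[2, 3] = True
restored = vector_median_filter_at(img, mask)
print(restored[2, 2], restored[2, 3])
```

Filtering the three channels jointly as vectors, rather than each channel separately, avoids creating colours that were not present in the window.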

  2. [Temporal-spatial analysis of bacillary dysentery in the Three Gorges Area of China, 2005-2016].

    PubMed

    Zhang, P; Zhang, J; Chang, Z R; Li, Z J

    2018-01-10

    Objective: To analyze the spatial and temporal distributions of bacillary dysentery in Chongqing, Yichang and Enshi (the Three Gorges Area) from 2005 to 2016, and to provide evidence for disease prevention and control. Methods: The incidence data of bacillary dysentery in the Three Gorges Area during this period were collected from the National Notifiable Infectious Disease Reporting System. The spatial-temporal scan statistic was performed with SaTScan 9.4, and bacillary dysentery clusters were visualized with ArcGIS 10.3. Results: A total of 126 196 cases were reported in the Three Gorges Area during 2005-2016, with an average incidence rate of 29.67/100 000. The overall incidence showed a downward trend, with an average annual decline rate of 4.74%. Cases occurred year-round, with an obvious seasonal increase between May and October. Among the reported cases, 44.71% (56 421/126 196) were children under 5 years old, and cases in children outside child care settings accounted for 41.93% (52 918/126 196) of the total. The incidence rates in the districts of Yuzhong, Dadukou, Jiangbei, Shapingba, Jiulongpo, Nanan, Yubei and Chengkou of Chongqing, and the districts of Xiling and Wujiagang of Yichang city, Hubei province, were high, ranging from 60.20/100 000 to 114.81/100 000. The spatial-temporal scan statistic revealed that temporal clustering occurred from May to October, and that there were 12 class Ⅰ clusters, 35 class Ⅱ clusters, and 9 clusters without statistical significance in counties with high incidence. All the class Ⅰ clusters were in the urban area of Chongqing (Yuzhong, Dadukou, Jiangbei, Shapingba, Jiulongpo, Nanan, Beibei, Yubei, Banan) and surrounding counties, and the class Ⅱ clusters shifted from a concentrated to a scattered distribution. 
Conclusions: Temporal and spatial clusters of bacillary dysentery incidence existed in the Three Gorges Area during 2005-2016. It is necessary to strengthen bacillary dysentery prevention and control in the urban areas of Chongqing and Yichang.

  3. Multi-Parent Clustering Algorithms from Stochastic Grammar Data Models

    NASA Technical Reports Server (NTRS)

    Mjolsness, Eric; Castano, Rebecca; Gray, Alexander

    1999-01-01

    We introduce a statistical data model and an associated optimization-based clustering algorithm which allows data vectors to belong to zero, one or several "parent" clusters. For each data vector the algorithm makes a discrete decision among these alternatives. Thus, a recursive version of this algorithm would place data clusters in a Directed Acyclic Graph rather than a tree. We test the algorithm with synthetic data generated according to the statistical data model. We also illustrate the algorithm using real data from large-scale gene expression assays.

  4. Spatial and temporal changes in household structure locations using high-resolution satellite imagery for population assessment: an analysis in southern Zambia, 2006-2011.

    PubMed

    Shields, Timothy; Pinchoff, Jessie; Lubinda, Jailos; Hamapumbu, Harry; Searle, Kelly; Kobayashi, Tamaki; Thuma, Philip E; Moss, William J; Curriero, Frank C

    2016-05-31

    Satellite imagery is increasingly available at high spatial resolution and can be used for various purposes in public health research and programme implementation. Comparing censuses generated from two satellite images of the same region in rural southern Zambia, obtained four and a half years apart, identified patterns of household locations and their change over time. The length of time that a satellite image-based census remains accurate determines its utility. Households were enumerated manually from satellite images of the same area obtained in 2006 and 2011. Spatial statistics were used to describe clustering, cluster detection, and spatial variation in the location of households. A total of 3821 household locations were enumerated in 2006 and 4256 in 2011, a net change of 435 houses (11.4% increase). Comparison of the images indicated that 971 (25.4%) structures were added and 536 (14.0%) removed. Further analysis suggested similar household clustering in the two images and no substantial difference in the concentration of households across the study area. Cluster detection analysis identified a small area where significantly more household structures were removed than expected; however, the amount of change was of limited practical significance. These findings suggest that random sampling of households for study participation would not induce geographic bias if based on a 4.5-year-old image in this region. Application of spatial statistical methods provides insights into the population distribution changes between two time periods and can be helpful in assessing the accuracy of satellite imagery.

  5. Coagulation-fragmentation for a finite number of particles and application to telomere clustering in the yeast nucleus

    NASA Astrophysics Data System (ADS)

    Hozé, Nathanaël; Holcman, David

    2012-01-01

    We develop a coagulation-fragmentation model to study a system composed of a small number of stochastic objects moving in a confined domain, that can aggregate upon binding to form local clusters of arbitrary sizes. A cluster can also dissociate into two subclusters with a uniform probability. To study the statistics of clusters, we combine a Markov chain analysis with a partition number approach. Interestingly, we obtain explicit formulas for the size and the number of clusters in terms of hypergeometric functions. Finally, we apply our analysis to study the statistical physics of telomeres (ends of chromosomes) clustering in the yeast nucleus and show that the diffusion-coagulation-fragmentation process can predict the organization of telomeres.

  6. The influence of academic examinations on energy and nutrient intake in male university students.

    PubMed

    Barker, Margo E; Blain, Richard J; Russell, Jean M

    2015-09-25

    Taking examinations is central to the student experience at university and may cause psychological stress. Although stress is recognised to impact on food intake, the effects of undertaking examinations on students' dietary intake have not been well characterised. The purpose of this study was to assess how students' energy and nutrient intake may alter during examination periods. The study design was a within-subject comparison of students' energy and nutrient intake during an examination period contrasted with that outside an examination period (baseline). A total of 20 male students from the University of Sheffield completed an automated photographic 4-d dietary record alongside four 24-h recalls in each time period. Daily energy and nutrient intakes were estimated for each student by time period, and the change in energy and nutrient intake was calculated. Intakes at baseline were compared to UK dietary recommendations. Cluster analysis categorised students according to their change in energy intake between baseline and the examination period. Non-parametric statistical tests identified differences by cluster. Baseline intakes did not meet recommendations for energy, non-milk extrinsic sugars, non-starch polysaccharide and sodium. Three clusters of students were identified: Cluster D, who decreased daily energy intake by 12.06 MJ (n = 5); Cluster S, who had similar energy intakes (n = 13); and Cluster I, who substantially increased energy intake by 6.37 MJ (n = 2) between baseline and the examination period. There were statistically significant differences (all p < 0.05) between clusters in the change in intake of protein, carbohydrate, calcium and sodium. Cluster D recorded greater energy, carbohydrate and protein intakes than Cluster I at baseline. The majority of students were dietarily resilient. 
Students who demonstrated hypophagia in the examination period had a high energy and nutrient intake at baseline, conversely those who showed hyperphagia had a low energy and nutrient intake. These patterns require confirmation in studies including women, but if confirmed, there is need to address some students' poor food choice especially during examinations.

  7. Multivariate statistical analysis: Principles and applications to coorbital streams of meteorite falls

    NASA Technical Reports Server (NTRS)

    Wolf, S. F.; Lipschutz, M. E.

    1993-01-01

    Multivariate statistical analysis techniques (linear discriminant analysis and logistic regression) can provide powerful discrimination tools that are generally unfamiliar to the planetary science community. Fall parameters were used to identify a group of 17 H chondrites (Cluster 1) that were part of a coorbital stream which intersected Earth's orbit in May, from 1855 to 1895, and can be distinguished from all other H chondrite falls. Using multivariate statistical techniques, it was demonstrated that, by a totally different criterion, the labile trace element contents (hence thermal histories) of 13 Cluster 1 meteorites are distinguishable from those of 45 non-Cluster 1 H chondrites. Here, we focus upon the principles of multivariate statistical techniques and illustrate their application using non-meteoritic and meteoritic examples.
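As an illustration of the linear discriminant analysis mentioned above, the following is a minimal sketch of the two-class Fisher discriminant: the projection direction is w ∝ S_w⁻¹(m₁ − m₀), where S_w is the pooled within-class scatter matrix, and a sample is assigned by comparing its projection with the midpoint of the projected class means. The helper name, the 3-variable synthetic groups, and the midpoint threshold (appropriate for equal class covariances and priors) are assumptions; the data are not the meteorite trace-element measurements.

```python
import numpy as np


def fisher_lda_direction(x0, x1):
    """Two-class Fisher linear discriminant (hypothetical helper).

    Returns w proportional to S_w^-1 (m1 - m0), with S_w the pooled
    within-class scatter matrix, and a midpoint decision threshold.
    """
    x0 = np.asarray(x0, dtype=float)
    x1 = np.asarray(x1, dtype=float)
    m0, m1 = x0.mean(axis=0), x1.mean(axis=0)
    s_w = (x0 - m0).T @ (x0 - m0) + (x1 - m1).T @ (x1 - m1)
    w = np.linalg.solve(s_w, m1 - m0)
    threshold = w @ (m0 + m1) / 2  # midpoint between projected class means
    return w, threshold


# Synthetic stand-in for two groups measured on three variables.
rng = np.random.default_rng(0)
group_a = rng.normal([0.0, 0.0, 0.0], 0.5, size=(30, 3))
group_b = rng.normal([2.0, 1.0, 0.0], 0.5, size=(30, 3))
w, t = fisher_lda_direction(group_a, group_b)
pred_b = np.concatenate([group_a, group_b]) @ w > t  # True = assigned to group b
truth_b = np.r_[np.zeros(30, bool), np.ones(30, bool)]
accuracy = (pred_b == truth_b).mean()
print(w, accuracy)
```

In practice the discrimination would be judged on held-out samples (for example by cross-validation) rather than on the training data used here.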

  8. Assessment of trace elements levels in patients with Type 2 diabetes using multivariate statistical analysis.

    PubMed

    Badran, M; Morsy, R; Soliman, H; Elnimr, T

    2016-01-01

    Trace element metabolism has been reported to play specific roles in the pathogenesis and progression of diabetes mellitus. Owing to the continuous increase in the population of patients with Type 2 diabetes (T2D), this study aims to assess the levels and inter-relationships of fasting blood glucose (FBG) and serum trace elements in Type 2 diabetic patients. This study was conducted on 40 Egyptian Type 2 diabetic patients and 36 healthy volunteers (Hospital of Tanta University, Tanta, Egypt). The blood serum was digested and then used to determine the levels of 24 trace elements by inductively coupled plasma mass spectrometry (ICP-MS). Multivariate statistical analysis based on correlation coefficients, cluster analysis (CA) and principal component analysis (PCA) was used to analyze the data. The results exhibited significant changes in FBG and in the levels of eight trace elements (Zn, Cu, Se, Fe, Mn, Cr, Mg, and As) in the blood serum of Type 2 diabetic patients relative to those of healthy controls. The multivariate statistical techniques reduced the number of experimental variables and grouped the trace elements in patients into three clusters. The application of PCA revealed a distinct difference in the associations of trace elements and their clustering patterns between the control and patient groups, in particular for Mg, Fe, Cu, and Zn, which appeared to be the factors most closely related to Type 2 diabetes. Therefore, on the basis of this study, the contributors to trace element content in Type 2 diabetic patients can be determined and specified using correlation relationships and multivariate statistical analysis, which confirms that the alteration of some essential trace metals may play a role in the development of diabetes mellitus. Copyright © 2015 Elsevier GmbH. All rights reserved.
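The principal component analysis step can be sketched as PCA on standardised variables (equivalent to PCA of the correlation matrix), computed through a singular value decomposition. The helper `pca` and the synthetic 40 × 4 table, in which three columns share one latent factor and the fourth is independent, are assumptions for illustration and do not reproduce the study's serum data.

```python
import numpy as np


def pca(data):
    """PCA of a samples x variables matrix via SVD of the standardised data
    (hypothetical helper). Returns (explained_variance_ratio, loadings, scores)."""
    X = np.asarray(data, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # z-score each variable
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S ** 2 / np.sum(S ** 2)  # singular values come sorted, largest first
    return explained, Vt.T, U * S


# Synthetic stand-in for a 40-subject x 4-element table (not the study data):
# the first three columns share one latent factor, the fourth is independent.
rng = np.random.default_rng(0)
latent = rng.normal(size=(40, 1))
data = np.hstack([latent + 0.1 * rng.normal(size=(40, 3)),
                  rng.normal(size=(40, 1))])
explained, loadings, scores = pca(data)
print(np.round(explained, 3))
print(np.round(loadings[:, 0], 3))
```

The first component captures the shared factor (roughly three of the four standardised variances), and its loadings show which variables move together, which is the kind of grouping the abstract reports for the trace elements.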

  9. Clustering, randomness and regularity in cloud fields. I - Theoretical considerations. II - Cumulus cloud fields

    NASA Technical Reports Server (NTRS)

    Weger, R. C.; Lee, J.; Zhu, Tianri; Welch, R. M.

    1992-01-01

    The current controversy regarding regularity versus clustering in cloud fields is examined by means of analysis and simulation studies based upon nearest-neighbor cumulative distribution statistics. It is shown that the Poisson representation of random point processes is superior to pseudorandom-number-generated models and that pseudorandom-number-generated models bias the observed nearest-neighbor statistics towards regularity. The interpretation of these nearest-neighbor statistics is discussed for many cases of superposed clustering, randomness, and regularity. A detailed analysis of cumulus cloud field spatial distributions is carried out based upon Landsat, AVHRR, and Skylab data, showing that, when both large and small clouds are included in the cloud field distributions, the cloud field always has a strong clustering signal.
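    The nearest-neighbor statistic discussed above can be illustrated with a minimal sketch. It assumes a homogeneous Poisson point process of intensity λ on the unit square, for which the nearest-neighbor distance CDF is G(r) = 1 - exp(-λπr²) when edge effects are ignored. An empirical curve lying above G(r) at small r suggests clustering, and one lying below it suggests regularity. The function names are hypothetical and this is not the authors' code.

```python
import math
import random


def poisson_count(lam, rng):
    """Draw a Poisson(lam) count with Knuth's multiplication method (fine for moderate lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_poisson_points(lam, rng):
    """Homogeneous Poisson process on the unit square: Poisson count, then uniform positions."""
    return [(rng.random(), rng.random()) for _ in range(poisson_count(lam, rng))]


def nn_distances(points):
    """Distance from each point to its nearest other point (brute force, O(n^2))."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    result = []
    for i, (xi, yi) in enumerate(points):
        result.append(min(math.hypot(xi - xj, yi - yj)
                          for j, (xj, yj) in enumerate(points) if j != i))
    return result


def empirical_g(distances, r):
    """Empirical CDF of nearest-neighbor distances evaluated at r."""
    return sum(d <= r for d in distances) / len(distances)


def poisson_g(lam, r):
    """Theoretical nearest-neighbor CDF for a planar Poisson process (no edge correction)."""
    return 1.0 - math.exp(-lam * math.pi * r * r)


if __name__ == "__main__":
    rng = random.Random(1)
    pts = simulate_poisson_points(400, rng)
    d = nn_distances(pts)
    for r in (0.01, 0.02, 0.03, 0.05):
        print(f"r={r:.2f}  empirical={empirical_g(d, r):.3f}  Poisson={poisson_g(400, r):.3f}")
```

    Generating the count from a Poisson distribution, rather than fixing it in advance, matches the Poisson representation that the abstract favors. Points near the square's border are not edge-corrected, so the empirical curve runs slightly low.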

  10. Extracting Galaxy Cluster Gas Inhomogeneity from X-Ray Surface Brightness: A Statistical Approach and Application to Abell 3667

    NASA Astrophysics Data System (ADS)

    Kawahara, Hajime; Reese, Erik D.; Kitayama, Tetsu; Sasaki, Shin; Suto, Yasushi

    2008-11-01

    Our previous analysis indicates that small-scale fluctuations in the intracluster medium (ICM) from cosmological hydrodynamic simulations follow the lognormal probability density function. To test the lognormal nature of the ICM directly against X-ray observations of galaxy clusters, we develop a method of extracting statistical information about the three-dimensional properties of the fluctuations from the two-dimensional X-ray surface brightness. We first create a set of synthetic clusters with lognormal fluctuations around their mean profile given by spherical isothermal β-models, later considering polytropic temperature profiles as well. Performing mock observations of these synthetic clusters, we find that the resulting X-ray surface brightness fluctuations also follow the lognormal distribution fairly well. Systematic analysis of the synthetic clusters provides an empirical relation between the three-dimensional density fluctuations and the two-dimensional X-ray surface brightness. We analyze Chandra observations of the galaxy cluster Abell 3667 and find that its X-ray surface brightness fluctuations follow the lognormal distribution. While the lognormal model was originally motivated by cosmological hydrodynamic simulations, this is the first observational confirmation of the lognormal signature in a real cluster. Finally, we check the synthetic cluster results against clusters from cosmological hydrodynamic simulations. Because simulated clusters exhibit complex structure, the empirical relation between the two- and three-dimensional fluctuation properties, calibrated with synthetic clusters, shows large scatter when applied to simulated clusters. Nevertheless, we are able to reproduce the true value of the fluctuation amplitude of simulated clusters within a factor of 2 from their two-dimensional X-ray surface brightness alone.
Our current methodology combined with existing observational data is useful in describing and inferring the statistical properties of the three-dimensional inhomogeneity in galaxy clusters.

  11. Detection of major climatic and environmental predictors of liver fluke exposure risk in Ireland using spatial cluster analysis.

    PubMed

    Selemetas, Nikolaos; de Waal, Theo

    2015-04-30

    Fasciolosis, caused by Fasciola hepatica (liver fluke), can result in significant economic and production losses on dairy cow farms. The aim of the current study was to identify important weather and environmental predictors of the exposure risk to liver fluke by detecting clusters of fasciolosis in Ireland. During autumn 2012, bulk-tank milk (BTM) samples from 4365 dairy farms were collected throughout Ireland. Analysis of the BTM samples with an in-house antibody-detection ELISA showed that 83% (n=3602) of dairy farms had been exposed to liver fluke. The Getis-Ord Gi* statistic identified 74 high-risk and 130 low-risk significant (P<0.01) clusters of fasciolosis. The low-risk clusters were mostly located in the southern regions of Ireland, whereas the high-risk clusters were mainly situated in the western part. Several climatic variables (monthly and seasonal mean rainfall and temperatures, total wet days and rain days) and environmental datasets (soil types, enhanced vegetation index and normalised difference vegetation index) were used to investigate dissimilarities in the exposure to liver fluke between clusters. Rainfall, total wet days and rain days, and soil type were the significant classes of climatic and environmental variables explaining the differences between significant clusters. A discriminant function analysis was used to predict the exposure risk to liver fluke, using 80% of the data for modelling and the remaining 20% for post hoc model validation. The most significant predictors of the model risk function were total rainfall in August and September and total wet days. The risk model achieved 100% sensitivity, 91% specificity and an accuracy of 95% correctly classified cases. A risk map of exposure to liver fluke was constructed, with a higher probability of exposure in western and north-western regions.
The results of this study identified differences between clusters of fasciolosis in Ireland regarding climatic and environmental variables and detected significant predictors of the exposure risk to liver fluke. Copyright © 2015 Elsevier B.V. All rights reserved.
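    The Getis-Ord Gi* statistic used above can be sketched as follows. This minimal version assumes binary distance-band weights: w_ij = 1 if location j lies within distance `band` of location i, including i itself. It returns the usual standardized (z-score) form, so |z| > 2.576 corresponds to two-sided P < 0.01 under the normal approximation. The abstract does not state the study's actual weights, distance band or software, so everything below is illustrative.

```python
import math


def getis_ord_gi_star(coords, values, band):
    """Return Gi* z-scores for each location using binary distance-band weights.

    Weights include the location itself (w_ii = 1), which is what distinguishes Gi* from Gi.
    """
    n = len(values)
    if n < 2 or len(coords) != n:
        raise ValueError("need at least two locations with matching coordinates")
    mean = sum(values) / n
    s = math.sqrt(max(sum(v * v for v in values) / n - mean * mean, 0.0))
    if s == 0:
        raise ValueError("values are constant; Gi* is undefined")
    scores = []
    for ci in coords:
        # Binary weights: 1 for every location within the distance band (including itself).
        w = [1.0 if math.dist(ci, cj) <= band else 0.0 for cj in coords]
        sum_w = sum(w)
        sum_w2 = sum(wj * wj for wj in w)
        numerator = sum(wj * xj for wj, xj in zip(w, values)) - mean * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        if denominator == 0:
            raise ValueError("distance band covers every location")
        scores.append(numerator / denominator)
    return scores
```

    Positive z-scores mark neighborhoods of high values (high-risk clusters), and negative z-scores mark neighborhoods of low values (low-risk clusters).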

  12. Spatial distribution of HIV, HCV, and co-infections among drug users in the southwestern border areas of China (2004-2014): a cohort study of a national methadone maintenance treatment program.

    PubMed

    Li, Mingli; Li, Rongjian; Shen, Zhiyong; Li, Chunying; Liang, Nengxiu; Peng, Zhenren; Huang, Wenbo; He, Chongwei; Zhong, Feng; Tang, Xianyan; Lan, Guanghua

    2017-09-30

    A methadone maintenance treatment (MMT) program to curb the dual epidemics of HIV/AIDS and drug use has been administered by China since 2004. Little is known regarding the geographic heterogeneity of HIV and hepatitis C virus (HCV) infections among MMT clients in the resource-constrained context of Chinese provinces, such as Guangxi. This study aimed to characterize the geographic distribution patterns and co-clustered epidemic factors of HIV, HCV and co-infections at the county level among drug users receiving MMT in Guangxi Zhuang Autonomous Region, located in the southwestern border area of China. Baseline data on drug users' demographic, behavioral and biological characteristics in the MMT clinics of Guangxi Zhuang Autonomous Region during the period from March 2004 to December 2014 were obtained from national HIV databases. Residential addresses were entered into a geographical information system (GIS) program and analyzed for spatial clustering of HIV, HCV and co-infections among MMT clients at the county level using geographic autocorrelation analysis and geographic scan statistics. A total of 31,015 MMT clients were analyzed, and the prevalences of HIV, HCV and co-infections were 13.05%, 72.51% and 11.96%, respectively. Both the geographic autocorrelation analysis and the geographic scan statistics showed that HIV, HCV and co-infections in Guangxi Zhuang Autonomous Region exhibited significant geographic clustering at the county level, with Moran's I values of 0.33, 0.41 and 0.30, respectively (P < 0.05). The most significant high-risk overlapping clusters for these infections were restricted to within a 10.95 km² radius of each of the 13 locations where P county was the cluster center. These infections also co-clustered with certain characteristics, such as being unmarried, having a primary level of education or below, having used drugs for more than 10 years, and receptive sharing of syringes with others.
The high-risk clusters for these characteristics were more likely to reside in the areas surrounding P county. HIV, HCV and co-infections among MMT clients in Guangxi Zhuang Autonomous Region all presented substantial geographic heterogeneity at the county level with a number of overlapping significant clusters. The areas surrounding P county were effective in enrolling high-risk clients in their MMT programs which, in turn, might enable people who inject drugs to inject less, share fewer syringes, and receive referrals for HIV or HCV treatment in a timely manner.
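    The global Moran's I values reported above (0.33, 0.41 and 0.30) summarize spatial autocorrelation across counties. Below is a minimal sketch of the statistic. It assumes a user-supplied spatial weights matrix with a zero diagonal; the example uses a hypothetical four-county chain with rook-style adjacency. Under the null hypothesis of no autocorrelation, the expected value is -1/(n-1). Inference, such as the P < 0.05 above, would also need a permutation or normal-approximation test, which is not shown.

```python
def morans_i(values, weights):
    """Global Moran's I: (n / W) * sum_ij w_ij (x_i - xbar)(x_j - xbar) / sum_i (x_i - xbar)^2."""
    n = len(values)
    if n < 2 or len(weights) != n or any(len(row) != n for row in weights):
        raise ValueError("weights must be an n x n matrix matching values")
    mean = sum(values) / n
    dev = [x - mean for x in values]
    denom = sum(d * d for d in dev)
    w_total = sum(sum(row) for row in weights)
    if denom == 0 or w_total == 0:
        raise ValueError("Moran's I is undefined for constant values or empty weights")
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / w_total) * (cross / denom)


# Hypothetical chain of four counties: each county neighbors the next one along the chain.
CHAIN = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
```

    A smooth gradient such as `[1, 2, 3, 4]` along the chain gives a positive value (1/3). An alternating pattern such as `[1, 2, 1, 2]` gives a negative value (-1).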

  13. Determining Distance, Age, and Activity in a New Benchmark Cluster: Ruprecht 147

    NASA Astrophysics Data System (ADS)

    Wright, Jason T.

    2009-08-01

    This proposal seeks 0.7 night of time on Hectochelle to observe the F, G, and K dwarfs of Ruprecht 147, recently identified as the closest old stellar cluster. At only ~ 200 pc and at an age of ~ 1-2 Gyr, this will be an important benchmark in stellar astrophysics, providing the only sample of spectroscopically accessible old, late-type stars of determinable age. Hectochelle is the ideal instrument to study this cluster, with a FOV, fiber count, and telescope aperture well matched to the cluster's diameter (~ 1°), richness (~ 100 identified members), and distance modulus (6.5-7 mag., putting the G and K dwarfs at B=11-15). Hectochelle will measure the Ca II line strengths of members to establish, for the first time, the chromospheric activity levels of a statistically significant sample of single G and K dwarfs of this modest age. Hectochelle will also vet background stars for suitability as astrometric reference stars for a forthcoming HST FGS proposal to robustly measure the cluster's distance.

  14. Local bladder cancer clusters in southeastern Michigan accounting for risk factors, covariates and residential mobility.

    PubMed

    Jacquez, Geoffrey M; Shi, Chen; Meliker, Jaymie R

    2015-01-01

    In case-control studies, disease risk not explained by the significant risk factors is the unexplained risk. Considering unexplained risk for specific populations, places and times can reveal the signature of unidentified risk factors and of risk factors not fully accounted for in the case-control study. This can potentially lead to new hypotheses regarding disease causation. Global, local and focused Q-statistics are applied to data from a population-based case-control study of 11 southeast Michigan counties. Analyses were conducted using both year- and age-based measures of time. The analyses were adjusted for arsenic exposure, education, smoking, family history of bladder cancer, occupational exposure to bladder cancer carcinogens, age, gender, and race. Significant global clustering of cases, which would have indicated large-scale clustering of cases relative to controls through time, was not found. However, highly significant local clusters were found in Ingham County near Lansing, in Oakland County, and in the City of Jackson, Michigan. The Jackson City cluster was observed in working ages and is thus consistent with occupational causes. The Ingham County cluster persists over time, suggesting a broad-based geographically defined exposure. Focused clusters were found for 20 industrial sites engaged in manufacturing activities associated with known or suspected bladder cancer carcinogens. Set-based tests that adjusted for multiple testing were not significant, although local clusters persisted through time and temporal trends in the probabilities of local tests were observed. Q analyses provide a powerful tool for unpacking unexplained disease risk from case-control studies. This is particularly useful when the effect of risk factors varies spatially, through time, or through both space and time.
For bladder cancer in Michigan, the next step is to investigate causal hypotheses that may explain the excess bladder cancer risk localized to areas of Oakland and Ingham counties, and to the City of Jackson.

  15. K-means cluster analysis of tourist destination in special region of Yogyakarta using spatial approach and social network analysis (a case study: post of @explorejogja instagram account in 2016)

    NASA Astrophysics Data System (ADS)

    Iswandhani, N.; Muhajir, M.

    2018-03-01

    This research was conducted in the Department of Statistics, Islamic University of Indonesia. The data are primary data obtained from posts on the @explorejogja Instagram account from January to December 2016. The @explorejogja account features many tourist destinations that can be visited by both domestic and foreign tourists. It is therefore useful to cluster these destinations by the number of likes from Instagram users, which is assumed to indicate popularity. The purpose of this research is to determine the distribution of the most popular tourist spots, the cluster formation of tourist destinations, and the central popularity of tourist destinations based on the @explorejogja Instagram account in 2016. The statistical analyses used were descriptive statistics, k-means clustering, and social network analysis. The results comprise the top 10 most popular destinations in Yogyakarta; an HTML-based map of the distribution of 121 tourist destination points; three clusters, with 52 destinations in cluster 1, 9 in cluster 2 and 60 in cluster 3; and the central popularity of tourist destinations in the Special Region of Yogyakarta by district.
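    The k-means step described above can be sketched with Lloyd's algorithm. This minimal version assumes a single feature per destination, for example its like count, and deterministic initialization from evenly spaced order statistics. The abstract does not give the study's actual features, initialization or software, and the numbers in the test are made-up toy values.

```python
def kmeans_1d(values, k, max_iter=100):
    """Lloyd's k-means on one-dimensional data; returns (labels, centroids)."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    ordered = sorted(values)
    n = len(ordered)
    # Hypothetical initialization: evenly spaced order statistics of the data.
    centroids = [ordered[int(i * (n - 1) / max(k - 1, 1))] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(max_iter):
        # Assignment step: each value goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members (kept if the cluster empties).
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(values, new_labels) if lab == c]
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_labels == labels and new_centroids == centroids:
            break
        labels, centroids = new_labels, new_centroids
    return labels, centroids
```

    Because k-means only reaches a local optimum, practical analyses usually rerun it from several initializations and keep the solution with the smallest within-cluster sum of squares.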

  16. Change in personality status in neurotic disorders.

    PubMed

    Seivewright, Helen; Tyrer, Peter; Johnson, Tony

    2002-06-29

    Personality disorders are generally thought not to change much over time. We assessed the personality status of 202 patients who had a defined Diagnostic and Statistical Manual (DSM)-III neurotic disorder: dysthymia, panic disorder, or generalised anxiety. All patients had received drug and psychological treatment in a randomised controlled trial. Twelve years after entry to the study, we reassessed the personality status of 178 (88%) of these patients using the same instrument (the Personality Assessment Schedule). The personality traits of patients in the cluster B flamboyant group (antisocial, histrionic) became significantly less pronounced over 12 years, but those in the cluster A odd, eccentric group (schizoid, schizotypal, paranoid) and the cluster C anxious, fearful group (obsessional, avoidant) became more pronounced. The agreement between baseline and 12-year personality clusters was poor or slight (kappa=0.14, 95% CI 0.04-0.23). Our results suggest that the assumption that personality characteristics do not change with time is incorrect.
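    The agreement figure above (kappa = 0.14) is a chance-corrected agreement statistic. Below is a minimal sketch of unweighted Cohen's kappa for two categorical classifications of the same patients. Whether the study used exactly this unweighted form is an assumption, and the labels in the test are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("need two non-empty label lists of equal length")
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: product of each category's marginal proportions, summed over categories.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)
```

    For example, if the baseline cluster is "A", "A", "B", "B" and the 12-year cluster is "A", "B", "B", "B", then observed agreement is 0.75 and chance agreement is 0.5, which gives kappa = 0.5.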

  17. Search for a gamma-ray line feature from a group of nearby galaxy clusters with Fermi LAT Pass 8 data

    NASA Astrophysics Data System (ADS)

    Liang, Yun-Feng; Shen, Zhao-Qiang; Li, Xiang; Fan, Yi-Zhong; Huang, Xiaoyuan; Lei, Shi-Jun; Feng, Lei; Liang, En-Wei; Chang, Jin

    2016-05-01

    Galaxy clusters are the largest gravitationally bound objects in the Universe and may be suitable targets for indirect dark matter searches. With 85 months of publicly available Fermi LAT Pass 8 data, we analyze the gamma-ray emission in the direction of 16 nearby galaxy clusters with an unbinned likelihood analysis. No statistically or globally significant γ-ray line feature is identified, and a tentative line signal may be present at ~43 GeV. The 95% confidence level upper limits on the velocity-averaged cross section of dark matter particles annihilating into two γ rays (i.e., ⟨σv⟩χχ→γγ) are derived. Unless very optimistic boost factors of dark matter annihilation in these galaxy clusters are assumed, such constraints are much weaker than the bounds set by Galactic γ-ray data.

  18. Kinetic energy spectra in thermionic emission from small tungsten cluster anions: evidence for nonclassical electron capture.

    PubMed

    Concina, Bruno; Baguenard, Bruno; Calvo, Florent; Bordas, Christian

    2010-03-14

    The delayed electron emission from small mass-selected anionic tungsten clusters Wₙ⁻ has been studied for sizes in the range 9 ≤ n ≤ 21. Kinetic energy spectra were measured with a velocity-map imaging spectrometer at delays of about 100 ns after laser excitation. They are analyzed in the framework of microreversible statistical theories. The low-energy behavior shows some significant deviations from the classical Langevin capture model, which we interpret as possibly due to quantum dynamical effects such as tunneling through the centrifugal barrier, rather than to shape effects. The cluster temperature has been extracted from both the experimental kinetic energy spectrum and the absolute decay rate. Discrepancies between the two approaches suggest that the sticking probability can be as low as a few percent for the smallest clusters.

  19. Pattern of comorbidity among anxious and odd personality disorders: the case of obsessive-compulsive personality disorder.

    PubMed

    Rossi, A; Marinangeli, M G; Butti, G; Kalyvoka, A; Petruzzi, C

    2000-09-01

    The aim of this study was to examine the pattern of comorbidity between obsessive-compulsive personality disorder (OCPD) and other personality disorders (PDs) in a sample of 400 psychiatric inpatients. PDs were assessed using the Semistructured Clinical Interview for DSM-III-R Personality Disorders (SCID-II). Odds ratios (ORs) were calculated to determine significant comorbidity between OCPD and other axis II disorders. The most elevated odds ratios were found for the co-occurrence of OCPD with cluster A PDs (the "odd" PDs, or paranoid and schizoid PDs). These results are consistent with those of previous studies showing a higher co-occurrence of OCPD with cluster A than with cluster C ("anxious") PDs. In light of these observations, issues associated with the nosologic status of OCPD within the Diagnostic and Statistical Manual of Mental Disorders clustering system remain unsettled.
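    The comorbidity analysis above rests on odds ratios from 2 x 2 tables (OCPD present/absent versus another PD present/absent). Below is a minimal sketch with a Woolf (log-scale) 95% confidence interval. The cell counts in the test are hypothetical, and the abstract does not state the study's interval method.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio for the 2x2 table [[a, b], [c, d]] with a Woolf (log-scale) confidence interval.

    a: both disorders present, b: OCPD only, c: other PD only, d: neither.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive (consider a continuity correction)")
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper
```

    An interval that excludes 1 is the usual criterion for calling a co-occurrence "significant" at the 5% level.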

  20. A Data Analytics Approach to Discovering Unique Microstructural Configurations Susceptible to Fatigue

    NASA Astrophysics Data System (ADS)

    Jha, S. K.; Brockman, R. A.; Hoffman, R. M.; Sinha, V.; Pilchak, A. L.; Porter, W. J.; Buchanan, D. J.; Larsen, J. M.; John, R.

    2018-05-01

    Principal component analysis and fuzzy c-means clustering algorithms were applied to slip-induced strain and geometric metric data in an attempt to discover unique microstructural configurations and their frequencies of occurrence in statistically representative instantiations of a titanium alloy microstructure. Grain-averaged fatigue indicator parameters were calculated for the same instantiation. The fatigue indicator parameters strongly correlated with the spatial location of the microstructural configurations in the principal components space. The fuzzy c-means clustering method identified clusters of data that varied in terms of their average fatigue indicator parameters. Furthermore, the number of points in each cluster was inversely correlated with the average fatigue indicator parameter. This analysis demonstrates that data-driven methods have significant potential for providing unbiased determination of unique microstructural configurations and their frequencies of occurrence in a given volume from the point of view of strain localization and fatigue crack initiation.
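    Fuzzy c-means, used above to group microstructural configurations, assigns each point a graded membership in every cluster rather than a single label. The sketch below is a generic version of the standard (Bezdek) algorithm, not the authors' implementation. It assumes Euclidean distance, fuzzifier m = 2 and deterministic initial centers taken from the data, and the toy points in the test are hypothetical.

```python
import math


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-6):
    """Standard fuzzy c-means; returns (centers, memberships) with memberships[i][k] for point i, cluster k."""
    n = len(points)
    if not 2 <= c <= n or m <= 1.0:
        raise ValueError("need 2 <= c <= len(points) and fuzzifier m > 1")
    dim = len(points[0])
    # Deterministic initial centers: evenly spaced data points (an assumption, not part of FCM itself).
    centers = [list(points[k * n // c]) for k in range(c)]
    power = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(max_iter):
        # Membership step: u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1)), d_ij = |point i - center j|.
        memberships = []
        for p in points:
            d = [math.dist(p, ctr) for ctr in centers]
            if 0.0 in d:
                # A point sitting exactly on a center belongs fully to that center.
                hits = [1.0 if di == 0.0 else 0.0 for di in d]
                memberships.append([h / sum(hits) for h in hits])
            else:
                memberships.append([1.0 / sum((d[k] / d[j]) ** power for j in range(c))
                                    for k in range(c)])
        # Center step: weighted mean of the points with weights u_ik ** m.
        new_centers = []
        for k in range(c):
            w = [memberships[i][k] ** m for i in range(n)]
            total = sum(w)
            new_centers.append([sum(w[i] * points[i][a] for i in range(n)) / total
                                for a in range(dim)])
        shift = max(math.dist(old, new) for old, new in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships
```

    Each row of memberships sums to 1. Counting the points whose largest membership falls in each cluster gives cluster sizes comparable to those the abstract relates to the fatigue indicator parameter.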

  1. Classification of patients based on their evaluation of hospital outcomes: cluster analysis following a national survey in Norway

    PubMed Central

    2013-01-01

    Background: A general trend towards positive patient-reported evaluations of hospitals could be taken as a sign that most patients form a homogeneous, reasonably pleased group, and consequently that there is little need for quality improvement. The objective of this study was to explore this assumption by identifying and statistically validating clusters of patients based on their evaluation of outcomes related to overall satisfaction, malpractice and benefit of treatment. Methods: Data were collected using a national patient-experience survey of 61 hospitals in the 4 health regions in Norway during spring 2011. Postal questionnaires were mailed to 23,420 patients after their discharge from hospital. Cluster analysis was performed to identify response clusters of patients, based on their responses to single items about overall patient satisfaction, benefit of treatment and perception of malpractice. Results: Cluster analysis identified six response groups, including one cluster with systematically poorer evaluation across outcomes (18.5% of patients) and one small outlier group (5.3%) with very poor scores across all outcomes. One-way ANOVA with post-hoc tests showed that most differences between the six response groups on the three outcome items were significant. The response groups were significantly associated with nine patient-experience indicators (p < 0.001), and all groups were significantly different from each of the other groups on a majority of the patient-experience indicators. Clusters were significantly associated with age, education, self-perceived health, gender, and the tendency to write open comments in the questionnaire. Conclusions: The study identified five response clusters with distinct patient-reported outcome scores, in addition to a heterogeneous outlier group with very poor scores across all outcomes.
The outlier group and the cluster with systematically poorer evaluation across outcomes comprised almost one-quarter of all patients, clearly demonstrating the need to tailor quality initiatives and improve patient-perceived quality in hospitals. More research on patient clustering in patient evaluation is needed, as well as standardization of methodology to increase comparability across studies. PMID:23433450

  2. The role of poverty rate and racial distribution in the geographic clustering of breast cancer survival among older women: a geographic and multilevel analysis.

    PubMed

    Schootman, Mario; Jeffe, Donna B; Lian, Min; Gillanders, William E; Aft, Rebecca

    2009-03-01

    The authors examined disparities in survival among women aged 66 years or older in association with census-tract-level poverty rate, racial distribution, and individual-level factors, including patient-, treatment-, and tumor-related factors, utilization of medical care, and mammography use. They used linked data from the 1992-1999 Surveillance, Epidemiology, and End Results (SEER) programs, 1991-1999 Medicare claims, and the 1990 US Census. A geographic information system and advanced statistics identified areas of increased or reduced breast cancer survival and possible reasons for geographic variation in survival in 2 of the 5 SEER areas studied. In the Detroit, Michigan, area, one geographic cluster of shorter-than-expected breast cancer survival was identified (hazard ratio (HR) = 1.60). An additional area where survival was longer than expected approached statistical significance (HR = 0.4; P = 0.056). In the Atlanta, Georgia, area, one cluster of shorter- (HR = 1.81) and one cluster of longer-than-expected (HR = 0.72) breast cancer survival were identified. Stage at diagnosis and census-tract poverty (and patient's race in Atlanta) explained the geographic variation in breast cancer survival. No geographic clusters were identified in the 3 other SEER programs. Interventions to reduce late-stage breast cancer, focusing on areas of high poverty and targeting African Americans, may reduce disparities in breast cancer survival in the Detroit and Atlanta areas.

  3. On the Determination of Poisson Statistics for Haystack Radar Observations of Orbital Debris

    NASA Technical Reports Server (NTRS)

    Stokely, Christopher L.; Benbrook, James R.; Horstman, Matt

    2007-01-01

    A convenient and powerful method is used to determine whether radar detections of orbital debris are observed according to Poisson statistics. This is done by analyzing the time interval between detection events. For Poisson statistics, the probability distribution of the time interval between events is shown to be an exponential distribution. This distribution is a special case of the Erlang distribution that is used in estimating traffic loads on telecommunication networks. Poisson statistics form the basis of many orbital debris models, but the statistical basis of these models has not been clearly demonstrated empirically until now. Interestingly, in the fiscal year 2003 observations with the Haystack radar in a fixed staring mode, no statistically significant deviations from Poisson statistics are observed, either independently of or as a function of altitude or inclination. One would potentially expect some significant clustering of events in time as a result of satellite breakups, but the presence of Poisson statistics indicates that such debris disperse rapidly with respect to Haystack's very narrow radar beam. An exception to Poisson statistics is observed in the months following the intentional breakup of the Fengyun satellite in January 2007.
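    The test described above can be sketched in two steps. First, convert detection times into inter-arrival intervals. Second, compare their empirical distribution with an exponential distribution whose rate is estimated from the data. This minimal version uses the Kolmogorov-Smirnov distance. Because the rate is estimated from the same data, the standard KS critical values are conservative, and a Lilliefors-type correction or simulation would be needed for a formal test. Function names and example data are hypothetical and are not taken from the Haystack analysis.

```python
import math
import random


def interarrival_times(event_times):
    """Intervals between consecutive events after sorting the event times."""
    t = sorted(event_times)
    return [later - earlier for earlier, later in zip(t, t[1:])]


def ks_exponential(intervals):
    """Fit an exponential rate by maximum likelihood and return (rate, KS distance)."""
    n = len(intervals)
    if n == 0:
        raise ValueError("need at least one interval")
    mean = sum(intervals) / n
    if mean <= 0:
        raise ValueError("intervals must have a positive mean")
    rate = 1.0 / mean  # maximum-likelihood estimate of the event rate
    d = 0.0
    for i, x in enumerate(sorted(intervals), start=1):
        cdf = 1.0 - math.exp(-rate * x)
        # The empirical CDF jumps from (i-1)/n to i/n at x; check both sides of the step.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return rate, d


if __name__ == "__main__":
    rng = random.Random(42)
    times, t = [], 0.0
    for _ in range(2000):
        t += rng.expovariate(2.0)  # simulated Poisson process, 2 events per unit time
        times.append(t)
    print(ks_exponential(interarrival_times(times)))       # small KS distance
    print(ks_exponential(interarrival_times(range(101))))  # evenly spaced events: large distance
```

    Bursts of detections, such as those after a breakup, show up as an excess of short intervals and therefore as a larger KS distance.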

  4. Cluster mass estimators from CMB temperature and polarization lensing

    NASA Astrophysics Data System (ADS)

    Hu, Wayne; DeDeo, Simon; Vale, Chris

    2007-12-01

    Upcoming Sunyaev-Zel'dovich surveys are expected to return ~10⁴ intermediate-mass clusters at high redshift. Their average masses must be known to the same accuracy as desired for the dark energy properties. Internal to the surveys, the cosmic microwave background (CMB) potentially provides a source for lensing mass measurements whose distance is precisely known and which lies behind all clusters. We develop statistical mass estimators from six quadratic combinations of CMB temperature and polarization fields that can simultaneously recover large-scale structure and cluster mass profiles. The performance of these estimators on idealized Navarro-Frenk-White (NFW) clusters suggests that surveys with a ~1′ beam and 10 μK′ noise in uncontaminated temperature maps can make a ~10σ detection, or equivalently a ~10% mass measurement, for each set of 10³ clusters. With internal or external acoustic scale E-polarization measurements, the ET cross-correlation estimator can provide a stringent test for contaminants on a first detection at ~1/3 the significance. For surveys that reach below 3 μK′, the EB cross-correlation estimator should provide the most precise measurements and potentially the strongest control over contaminants.

  5. Joint Clustering and Component Analysis of Correspondenceless Point Sets: Application to Cardiac Statistical Modeling.

    PubMed

    Gooya, Ali; Lekadir, Karim; Alba, Xenia; Swift, Andrew J; Wild, Jim M; Frangi, Alejandro F

    2015-01-01

    Construction of Statistical Shape Models (SSMs) from arbitrary point sets is a challenging problem due to significant shape variation and the lack of explicit point correspondence across the training data set. In medical imaging, point sets can generally represent different shape classes that span healthy and pathological exemplars. In such cases, the constructed SSM may not generalize well, largely because the probability density function (pdf) of the point sets deviates from the underlying assumption of Gaussian statistics. To this end, we propose a generative model for unsupervised learning of the pdf of point sets as a mixture of distinctive classes. A Variational Bayesian (VB) method is proposed for making joint inferences on the labels of point sets and the principal modes of variation in each cluster. The method provides a flexible framework to handle point sets with no explicit point-to-point correspondences. We also show that by maximizing the marginalized likelihood of the model, the optimal number of clusters of point sets can be determined. We illustrate this work in the context of understanding the anatomical phenotype of the left and right ventricles of the heart. To this end, we use a database containing hearts of healthy subjects, patients with Pulmonary Hypertension (PH), and patients with Hypertrophic Cardiomyopathy (HCM). We demonstrate that our method can outperform traditional PCA in both generalization and specificity measures.

  6. SparRec: An effective matrix completion framework of missing data imputation for GWAS

    NASA Astrophysics Data System (ADS)

    Jiang, Bo; Ma, Shiqian; Causey, Jason; Qiao, Linbo; Hardin, Matthew Price; Bitts, Ian; Johnson, Daniel; Zhang, Shuzhong; Huang, Xiuzhen

    2016-10-01

    Genome-wide association studies present computational challenges for missing data imputation, as advances in genotyping technologies generate datasets with large sample sizes, with sample sets genotyped on multiple SNP chips. We present a new framework, SparRec (Sparse Recovery), for imputation, with the following properties: (1) The optimization models of SparRec, based on low-rank and low number of co-clusters of matrices, differ from current statistical methods. While our low-rank matrix completion (LRMC) model is similar to Mendel-Impute, our matrix co-clustering factorization (MCCF) model is completely new. (2) SparRec, like other matrix completion methods, can be flexibly applied to missing data imputation for large meta-analyses with different cohorts genotyped on different sets of SNPs, even when there is no reference panel. This kind of meta-analysis is very challenging for current statistics-based methods. (3) SparRec has consistent performance and achieves high recovery accuracy even when the missing data rate is as high as 90%. Compared with Mendel-Impute, our low-rank based method achieves similar accuracy and efficiency, while the co-clustering based method has advantages in running time. Test results show that SparRec has significant advantages and competitive performance compared with other state-of-the-art statistical methods, including Beagle and fastPhase.

  7. Identifying seizure clusters in patients with psychogenic nonepileptic seizures.

    PubMed

    Baird, Grayson L; Harlow, Lisa L; Machan, Jason T; Thomas, Dave; LaFrance, W C

    2017-08-01

    The present study explored how seizure clusters may be defined for those with psychogenic nonepileptic seizures (PNES), a topic for which there is a paucity of literature. The sample was drawn from a multisite randomized clinical trial for PNES; seizure data are from participants' seizure diaries. Three possible cluster definitions were examined: (1) the common clinical definition, where ≥3 seizures in a day is considered a cluster; and two novel statistical definitions, where ≥3 seizures in a day are considered a cluster if the observed number of seizures statistically exceeds what would be expected relative to (2) the patient's average seizure rate prior to the trial or (3) the patient's observed seizure rate for the previous seven days. The prevalence of clusters was 62-68% and the occurrence rate of clusters was 6-19%, depending on the cluster definition used. Based on these data, clusters seem to be common in patients with PNES, and more research is needed to identify whether clusters are related to triggers and outcomes. Copyright © 2017 Elsevier Inc. All rights reserved.
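    The statistical cluster definitions above can be sketched with a one-sided Poisson test. This version assumes that daily seizure counts are Poisson with a rate equal either to a baseline rate (the pre-trial average) or to the mean of the previous seven days. A day is flagged if it has at least three seizures and P(X ≥ count) < 0.05. The abstract does not specify the exact test or significance level, so the threshold and function names here are assumptions.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def is_statistical_cluster(count, expected_rate, alpha=0.05, min_count=3):
    """Flag a day with >= min_count seizures whose count is improbably high for the expected rate."""
    return count >= min_count and poisson_sf(count, expected_rate) < alpha


def flag_clusters_trailing(daily_counts, window=7, alpha=0.05, min_count=3):
    """Indices of days flagged against the mean count of the preceding `window` days."""
    flagged = []
    for day in range(window, len(daily_counts)):
        rate = sum(daily_counts[day - window:day]) / window
        if is_statistical_cluster(daily_counts[day], rate, alpha, min_count):
            flagged.append(day)
    return flagged
```

    With a baseline of one seizure per day, three seizures in a day is not flagged (P ≈ 0.080), but four is (P ≈ 0.019). This shows how the statistical definitions can be stricter than the clinical ≥3 rule.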

  8. Spatial temporal clustering for hotspot using kulldorff scan statistic method (KSS): A case in Riau Province

    NASA Astrophysics Data System (ADS)

    Hudjimartsu, S. A.; Djatna, T.; Ambarwari, A.; Apriliantono

    2017-01-01

    Forest fires in Indonesia occur frequently in the dry season, and almost all of them are caused by human activity. The impacts of forest fires include loss of biodiversity, pollution hazards and harm to the economy of surrounding communities. Fire prevention requires suitable methods, one of which is spatial temporal clustering: grouping the data so that the resulting groups can serve as initial information for fire prevention. To analyze the fires, hotspot data were used as an early indicator of fire spots. Because hotspot data have both spatial and temporal dimensions, they can be processed using spatial temporal clustering with the Kulldorff scan statistic (KSS). The results of this research demonstrate the effectiveness of the KSS method for clustering spatial hotspots in a case study of Riau Province. The method produces two types of clusters, the most likely cluster and secondary clusters, which can be used as early fire warning information.
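    The core of the Kulldorff scan statistic is a likelihood ratio evaluated over many candidate zones. The sketch below is a purely spatial, circular version under the Poisson model. For a zone with c observed and E expected cases out of C total, log LR = c ln(c/E) + (C - c) ln((C - c)/(C - E)) when c > E, and 0 otherwise. It assumes point locations with case counts and populations, and it caps zones at 50% of the population, a commonly used default. The space-time variant used in the study scans cylinders (a circle plus a time window) instead, and significance is normally assessed by Monte Carlo replication; neither is shown. Names and example data are hypothetical.

```python
import math


def poisson_llr(c, e, total):
    """Kulldorff log-likelihood ratio for a zone with c observed and e expected cases (high-rate scan)."""
    if c <= e:
        return 0.0
    llr = c * math.log(c / e)
    if total > c:
        llr += (total - c) * math.log((total - c) / (total - e))
    return llr


def circular_scan(locations, max_pop_fraction=0.5):
    """Return (best LLR, sorted member indices) over circles centred on each location.

    locations: list of (x, y, cases, population) tuples with positive populations.
    """
    if not locations or any(loc[3] <= 0 for loc in locations):
        raise ValueError("need locations with positive populations")
    total_cases = sum(loc[2] for loc in locations)
    total_pop = sum(loc[3] for loc in locations)
    best_llr, best_zone = 0.0, []
    for cx, cy, _, _ in locations:
        # Grow the circle one neighbour at a time, nearest first (ties broken by index).
        order = sorted(range(len(locations)),
                       key=lambda j: math.hypot(locations[j][0] - cx, locations[j][1] - cy))
        cases = pop = 0
        zone = []
        for j in order:
            if pop + locations[j][3] > max_pop_fraction * total_pop:
                break
            cases += locations[j][2]
            pop += locations[j][3]
            zone.append(j)
            expected = total_cases * pop / total_pop
            llr = poisson_llr(cases, expected, total_cases)
            if llr > best_llr:
                best_llr, best_zone = llr, sorted(zone)
    return best_llr, best_zone
```

    The zone with the largest LLR is the most likely cluster. Secondary clusters are other zones with high LLR that do not overlap it.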

  9. Developing appropriate methods for cost-effectiveness analysis of cluster randomized trials.

    PubMed

    Gomes, Manuel; Ng, Edmond S-W; Grieve, Richard; Nixon, Richard; Carpenter, James; Thompson, Simon G

    2012-01-01

    Cost-effectiveness analyses (CEAs) may use data from cluster randomized trials (CRTs), where the unit of randomization is the cluster, not the individual. However, most studies use analytical methods that ignore clustering. This article compares alternative statistical methods for accommodating clustering in CEAs of CRTs. Our simulation study compared the performance of statistical methods for CEAs of CRTs with 2 treatment arms. The study considered a method that ignored clustering--seemingly unrelated regression (SUR) without a robust standard error (SE)--and 4 methods that recognized clustering--SUR and generalized estimating equations (GEEs), both with robust SE, a "2-stage" nonparametric bootstrap (TSB) with shrinkage correction, and a multilevel model (MLM). The base case assumed CRTs with moderate numbers of balanced clusters (20 per arm) and normally distributed costs. Other scenarios included CRTs with few clusters, imbalanced cluster sizes, and skewed costs. Performance was reported as bias, root mean squared error (rMSE), and confidence interval (CI) coverage for estimating incremental net benefits (INBs). We also compared the methods in a case study. Each method reported low levels of bias. Without the robust SE, SUR gave poor CI coverage (base case: 0.89 v. nominal level: 0.95). The MLM and TSB performed well in each scenario (CI coverage, 0.92-0.95). With few clusters, the GEE and SUR (with robust SE) had coverage below 0.90. In the case study, the mean INBs were similar across all methods, but ignoring clustering underestimated statistical uncertainty and the value of further research. MLMs and the TSB are appropriate analytical methods for CEAs of CRTs with the characteristics described. SUR and GEE are not recommended for studies with few clusters.
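    The incremental net benefit is INB = λ·ΔE − ΔC, where λ is the willingness to pay per unit of effect. A minimal sketch of a two-stage bootstrap follows: clusters are resampled with replacement within each arm, then individuals are resampled within each drawn cluster. It omits the shrinkage correction the study applied to the TSB, and the data layout (lists of clusters of (cost, effect) pairs), function names and percentile interval are assumptions for illustration.

```python
import random
import statistics
from typing import List, Tuple

Cluster = List[Tuple[float, float]]  # (cost, effect) for each individual


def inb(arm_t: List[Cluster], arm_c: List[Cluster], wtp: float) -> float:
    """Incremental net benefit: wtp * (difference in mean effect) - (difference in mean cost)."""
    def means(arm: List[Cluster]) -> Tuple[float, float]:
        people = [person for cluster in arm for person in cluster]
        return (statistics.fmean(c for c, _ in people),
                statistics.fmean(e for _, e in people))
    cost_t, eff_t = means(arm_t)
    cost_c, eff_c = means(arm_c)
    return wtp * (eff_t - eff_c) - (cost_t - cost_c)


def two_stage_resample(arm: List[Cluster], rng: random.Random) -> List[Cluster]:
    """Stage 1: resample clusters with replacement; stage 2: resample
    individuals with replacement within each drawn cluster."""
    drawn = [rng.choice(arm) for _ in arm]
    return [[rng.choice(cluster) for _ in cluster] for cluster in drawn]


def tsb_interval(arm_t: List[Cluster], arm_c: List[Cluster], wtp: float,
                 n_boot: int = 2000, seed: int = 1):
    """Point estimate and 95% percentile interval for the INB (no shrinkage correction)."""
    rng = random.Random(seed)
    draws = sorted(inb(two_stage_resample(arm_t, rng), two_stage_resample(arm_c, rng), wtp)
                   for _ in range(n_boot))
    lower = draws[int(0.025 * n_boot)]
    upper = draws[int(0.975 * n_boot) - 1]
    return inb(arm_t, arm_c, wtp), (lower, upper)
```

    Resampling whole clusters first is what lets the interval reflect between-cluster variability; a bootstrap over individuals alone would ignore clustering in the same way as SUR without a robust SE.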

  10. Source Evaluation and Trace Metal Contamination in Benthic Sediments from Equatorial Ecosystems Using Multivariate Statistical Techniques

    PubMed Central

    Benson, Nsikak U.; Asuquo, Francis E.; Williams, Akan B.; Essien, Joseph P.; Ekong, Cyril I.; Akpabio, Otobong; Olajire, Abaas A.

    2016-01-01

    Trace metal (Cd, Cr, Cu, Ni and Pb) concentrations in benthic sediments were analyzed through a multi-step fractionation scheme to assess the levels and sources of contamination in estuarine, riverine and freshwater ecosystems in the Niger Delta (Nigeria). The degree of contamination was assessed using the individual contamination factors (ICF) and global contamination factor (GCF). Multivariate statistical approaches, including principal component analysis (PCA), cluster analysis and correlation tests, were employed to evaluate the interrelationships and associated sources of contamination. The spatial distribution of metal concentrations followed the pattern Pb>Cu>Cr>Cd>Ni. Ecological risk index by ICF showed significant potential mobility and bioavailability for Cu, Cu and Ni. The ICF contamination trend in the benthic sediments at all studied sites was Cu>Cr>Ni>Cd>Pb. The principal component and agglomerative clustering analyses indicate that trace metal contamination in the ecosystems was influenced by multiple pollution sources. PMID:27257934
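    In fractionation studies the ICF is commonly defined as the sum of the non-residual fractions divided by the residual fraction, and the GCF as the sum of the ICFs over all metals at a site. The sketch below assumes those definitions and assumes the residual fraction is listed last; the concentrations are hypothetical, not values from this study.

```python
from typing import Dict, Sequence


def icf(fractions: Sequence[float]) -> float:
    """Individual contamination factor: sum of the non-residual fractions
    divided by the residual fraction (assumed to be the last entry)."""
    *non_residual, residual = fractions
    if residual <= 0:
        raise ValueError("residual fraction must be positive")
    return sum(non_residual) / residual


def gcf(metal_fractions: Dict[str, Sequence[float]]) -> float:
    """Global contamination factor: sum of the ICFs over all metals at a site."""
    return sum(icf(f) for f in metal_fractions.values())


# Hypothetical fraction concentrations (mg/kg) from a four-step extraction.
site = {"Cu": [1.0, 2.0, 3.0, 2.0], "Pb": [1.0, 1.0, 1.0, 3.0]}
print({metal: icf(f) for metal, f in site.items()}, gcf(site))
```

    Here Cu has an ICF of 3.0 and Pb of 1.0, giving a GCF of 4.0; a higher ICF indicates that more of the metal sits in the potentially mobile, non-residual fractions.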

  11. Regional Patterns and Spatial Clusters of Nonstationarities in Annual Peak Instantaneous Streamflow

    NASA Astrophysics Data System (ADS)

    White, K. D.; Baker, B.; Mueller, C.; Villarini, G.; Foley, P.; Friedman, D.

    2017-12-01

    Information about hydrologic changes resulting from changes in climate, land use, and land cover is a necessity for the planning and design of water resources infrastructure. The United States Army Corps of Engineers (USACE) evaluated and selected 12 methods to detect abrupt and slowly varying nonstationarities in records of maximum peak annual flows. They deployed a publicly available tool [1] in 2016 and a guidance document in 2017 to support the identification of nonstationarities in a reproducible manner using a robust statistical framework. This statistical framework has now been applied to streamflow records across the continental United States to explore the presence of regional patterns and spatial clusters of nonstationarities in peak annual flow. Incorporating this geographic dimension into the detection of nonstationarities provides valuable insight for the process of attribution of these significant changes. This poster summarizes the methods used and provides the results of the regional analysis. [1] Available here - http://www.corpsclimate.us/ptcih.cfm
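    To illustrate what detecting an abrupt nonstationarity involves, here is a minimal sketch of the Pettitt change-point test, a standard nonparametric test for a single shift in a series. It is offered as one example of the technique, not as a statement of which 12 methods USACE selected or how its tool implements them. The series below is hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple


def pettitt(x: Sequence[float]) -> Tuple[Optional[int], int, float]:
    """Pettitt change-point test. Returns (t, K, p): the series is split into
    x[:t] and x[t:], K = max |U_t|, and p is the usual approximate p-value."""
    n = len(x)

    def sgn(v: float) -> int:
        return (v > 0) - (v < 0)

    best_t, best_k = None, 0
    for t in range(1, n):
        # U_t compares every value before the split with every value after it.
        u = sum(sgn(x[i] - x[j]) for i in range(t) for j in range(t, n))
        if abs(u) > best_k:
            best_t, best_k = t, abs(u)
    p = min(1.0, 2.0 * math.exp(-6.0 * best_k**2 / (n**3 + n**2)))
    return best_t, best_k, p


# Hypothetical annual peak flows with a step change after year 10.
peaks = [1.0] * 10 + [5.0] * 10
print(pettitt(peaks))
```

    The double loop is O(n³) and meant for clarity; slowly varying (trend-type) nonstationarities would need a different test, such as Mann-Kendall.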

  12. Generic Feature Selection with Short Fat Data

    PubMed Central

    Clarke, B.; Chu, J.-H.

    2014-01-01

    SUMMARY Consider a regression problem in which there are many more explanatory variables than data points, i.e., p ≫ n. Essentially, without reducing the number of variables inference is impossible. So, we group the p explanatory variables into blocks by clustering, evaluate statistics on the blocks and then regress the response on these statistics under a penalized error criterion to obtain estimates of the regression coefficients. We examine the performance of this approach for a variety of choices of n, p, classes of statistics, clustering algorithms, penalty terms, and data types. When n is not large, the discrimination over number of statistics is weak, but computations suggest regressing on approximately [n/K] statistics where K is the number of blocks formed by a clustering algorithm. Small deviations from this are observed when the blocks of variables are of very different sizes. Larger deviations are observed when the penalty term is an Lq norm with high enough q. PMID:25346546
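    The three steps in this summary (cluster the p variables into K blocks, compute a statistic per block, then fit a penalized regression on those statistics) can be sketched as below. This assumes k-means on standardized columns, the block mean as the statistic and a ridge (L2) penalty; the paper compared several clustering algorithms, statistics and penalties, so these are illustrative choices only, and the function names are hypothetical.

```python
import numpy as np


def kmeans_columns(X: np.ndarray, k: int, n_iter: int = 50, seed: int = 0) -> np.ndarray:
    """Cluster the p columns of X (each a point in R^n) with Lloyd's k-means."""
    rng = np.random.default_rng(seed)
    V = X.T
    centers = V[rng.choice(V.shape[0], size=k, replace=False)].copy()
    labels = np.zeros(V.shape[0], dtype=int)
    for _ in range(n_iter):
        dist = ((V[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for b in range(k):
            if np.any(labels == b):
                centers[b] = V[labels == b].mean(axis=0)
    return labels


def block_ridge(X: np.ndarray, y: np.ndarray, k: int, lam: float = 1.0, seed: int = 0):
    """Group variables into k blocks, summarise each block by the mean of its
    standardised columns, then ridge-regress the centred response on the summaries."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)  # assumes no constant columns
    labels = kmeans_columns(Xs, k, seed=seed)
    blocks = [b for b in range(k) if np.any(labels == b)]
    Z = np.column_stack([Xs[:, labels == b].mean(axis=1) for b in blocks])
    yc = y - y.mean()
    beta = np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ yc)
    return labels, Z, beta
```

    With n = 20 and p = 200, for example, the regression is fitted on at most K block statistics rather than on 200 variables, which is what makes estimation feasible when p ≫ n.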

  13. *K-means and cluster models for cancer signatures.

    PubMed

    Kakushadze, Zura; Yu, Willie

    2017-09-01

    We present the *K-means clustering algorithm and source code, expanding statistical clustering methods applied in https://ssrn.com/abstract=2802753 to quantitative finance. *K-means is statistically deterministic without specifying initial centers, etc. We apply *K-means to extracting cancer signatures from genome data without using nonnegative matrix factorization (NMF). *K-means' computational cost is a fraction of NMF's. Using 1389 published samples for 14 cancer types, we find that 3 cancers (liver cancer, lung cancer and renal cell carcinoma) stand out and do not have cluster-like structures. Two clusters have especially high within-cluster correlations with 11 other cancers, indicating common underlying structures. Our approach opens a novel avenue for studying such structures. *K-means is universal and can be applied in other fields. We discuss some potential applications in quantitative finance.

  14. UV properties of hot stars in NGC 6752

    NASA Technical Reports Server (NTRS)

    Altner, Bruce

    1990-01-01

    The UV properties of hot stars found in the center of NGC 6752 are compared with those outside the core. Few, if any, faint sdB stars are found in the central region, whereas they occur in significant numbers far from the core. A statistically complete photographic survey is used to demonstrate that the faint blue stars in NGC 6752 occur in greater numbers with increasing distance from the center, and the International Ultraviolet Explorer (IUE) findings extend this result all the way to the center of the cluster. A similar phenomenon has been observed optically in other clusters, such as M15.

  15. Managing Clustered Data Using Hierarchical Linear Modeling

    ERIC Educational Resources Information Center

    Warne, Russell T.; Li, Yan; McKyer, E. Lisako J.; Condie, Rachel; Diep, Cassandra S.; Murano, Peter S.

    2012-01-01

    Nutrition researchers often use cluster or multistage sampling to gather participants for their studies. These sampling methods often violate the assumption of data independence that most traditional statistical methods share. Hierarchical linear modeling is a statistical method that can overcome violations of the independence…
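    The size of the independence violation is usually summarized by the intraclass correlation (ICC). As a sketch, assuming equal-sized clusters, the one-way ANOVA estimator ICC(1) and Kish's design effect 1 + (m − 1)·ICC show how much clustering inflates variances. This is not hierarchical linear modeling itself; it quantifies the dependence that such models are meant to handle. Function names are hypothetical.

```python
from typing import List, Sequence


def icc_oneway(groups: List[Sequence[float]]) -> float:
    """ICC(1) from a one-way ANOVA with k clusters of equal size m."""
    k, m = len(groups), len(groups[0])
    if k < 2 or m < 2 or any(len(g) != m for g in groups):
        raise ValueError("need >= 2 clusters of equal size >= 2")
    grand = sum(sum(g) for g in groups) / (k * m)
    means = [sum(g) / m for g in groups]
    msb = m * sum((mu - grand) ** 2 for mu in means) / (k - 1)  # between-cluster mean square
    msw = sum((x - mu) ** 2 for g, mu in zip(groups, means) for x in g) / (k * (m - 1))
    return (msb - msw) / (msb + (m - 1) * msw)


def design_effect(icc: float, m: int) -> float:
    """Kish design effect for clusters of size m: 1 + (m - 1) * ICC."""
    return 1 + (m - 1) * icc
```

    For example, an ICC of 0.05 with 21 participants per cluster gives a design effect of 2.0, so the effective sample size is roughly half the nominal one.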

  16. Distribution-based fuzzy clustering of electrical resistivity tomography images for interface detection

    NASA Astrophysics Data System (ADS)

    Ward, W. O. C.; Wilkinson, P. B.; Chambers, J. E.; Oxby, L. S.; Bai, L.

    2014-04-01

    A novel method for the effective identification of bedrock subsurface elevation from electrical resistivity tomography images is described. Identifying subsurface boundaries in the topographic data can be difficult due to smoothness constraints used in inversion, so a statistical population-based approach is used that extends previous work in calculating isoresistivity surfaces. The analysis framework involves a procedure for guiding a clustering approach based on the fuzzy c-means algorithm. An approximation of resistivity distributions, found using kernel density estimation, was utilized as a means of guiding the cluster centroids used to classify data. A fuzzy method was chosen over hard clustering due to uncertainty in hard edges in the topography data, and a measure of clustering uncertainty was identified based on the reciprocal of cluster membership. The algorithm was validated using a direct comparison of known observed bedrock depths at two 3-D survey sites, using real-time GPS information of exposed bedrock by quarrying on one site, and borehole logs at the other. Results show similarly accurate detection as a leading isosurface estimation method, and the proposed algorithm requires significantly less user input and prior site knowledge. Furthermore, the method is effectively dimension-independent and will scale to data of increased spatial dimensions without a significant effect on the runtime. A discussion on the results by automated versus supervised analysis is also presented.
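    Fuzzy c-means assigns each value a membership in every cluster rather than a hard label, which suits gradational resistivity boundaries. A minimal 1-D sketch follows, assuming (log-)resistivity values as input and user-supplied initial centres standing in for the kernel-density guidance described above, which is not reproduced here. The fuzzifier m = 2 and the toy values are assumptions.

```python
import numpy as np


def fuzzy_cmeans_1d(x, centers, m=2.0, n_iter=100, tol=1e-6):
    """Fuzzy c-means on 1-D data. Returns (centers, memberships) where
    memberships[k, i] is the degree to which x[i] belongs to cluster k."""
    x = np.asarray(x, dtype=float)
    c = np.asarray(centers, dtype=float).copy()
    for _ in range(n_iter):
        d = np.abs(x[None, :] - c[:, None]) + 1e-12  # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=0, keepdims=True)      # each column sums to 1
        um = u ** m
        new_c = (um @ x) / um.sum(axis=1)            # membership-weighted means
        converged = np.max(np.abs(new_c - c)) < tol
        c = new_c
        if converged:
            break
    return c, u


# Hypothetical log10 resistivities: a low-resistivity cover over higher-resistivity bedrock.
centers, memberships = fuzzy_cmeans_1d([1.0, 1.1, 0.9, 10.0, 10.2, 9.8], [0.0, 5.0])
print(centers)
```

    Values with memberships near 0.5 in two clusters are the ambiguous ones, which is the kind of information a hard clustering would discard.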

  17. Lymphohaematopoietic system cancer incidence in an urban area near a coke oven plant: an ecological investigation

    PubMed Central

    Parodi, S; Vercelli, M; Stella, A; Stagnaro, E; Valerio, F

    2003-01-01

    Aims: To evaluate the incidence risk of lymphohaematopoietic cancers for the 1986–94 period in Cornigliano, a district of Genoa (Italy), where a coke oven is located a few hundred metres from the residential area. Methods: The whole of Genoa and one of its 25 districts (Rivarolo) were selected as controls. The trend of risk around the coke oven was evaluated via Stone's method, while the geographic pattern of such risks across the Cornigliano district was evaluated by computing full Bayes estimates of standardised incidence ratio (FBE-SIR). Results: In males, elevated relative risks (RR) were observed for all lymphohaematopoietic cancers (RR 1.7 v Rivarolo and 1.6 v Genoa), for NHL (RR 2.4 v Rivarolo and 1.7 v Genoa), and for leukaemia (RR 2.4 v Rivarolo and 1.9 v Genoa). In females, statistically non-significant RR were observed. In males no excess of risk was found close to the coke oven. In females, a rising risk for NHL was observed approaching the plant, although statistical significance was not reached, while the risk for leukaemia was not evaluable due to the small number of cases. Analysis of the geographic pattern of risk suggested the presence of a cluster of NHL in both sexes in the eastern part of the district, where a foundry had been operational until the early 1980s. A cluster of leukaemia cases was observed in males in a northern part of the area, where no major sources of benzene seemed to be present. Conclusions: The estimated risks seem to be slightly or not at all related to the distance from the coke oven. The statistically significant higher risks observed in males for NHL and leukaemia, and the clusters of leukaemia in males and of NHL in both sexes deserve further investigations in order to trace the exposures associated with such risks. PMID:12598665
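    The study used full Bayes estimates of the standardised incidence ratio. For orientation, the classical SIR is simply observed over expected cases, and a common approximate 95% interval uses Byar's approximation to the exact Poisson limits. The sketch below shows that simpler frequentist version, not the authors' Bayesian method, and the counts in the example are hypothetical.

```python
import math
from typing import Tuple


def sir_byar(observed: int, expected: float, z: float = 1.96) -> Tuple[float, float, float]:
    """SIR = O/E with Byar's approximation to the exact Poisson confidence limits."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    o = observed
    sir = o / expected
    lower = 0.0 if o == 0 else o * (1 - 1 / (9 * o) - z / (3 * math.sqrt(o))) ** 3 / expected
    o1 = o + 1
    upper = o1 * (1 - 1 / (9 * o1) + z / (3 * math.sqrt(o1))) ** 3 / expected
    return sir, lower, upper


# Hypothetical district: 20 observed cases where 10 were expected from reference rates.
print(sir_byar(20, 10.0))
```

    Here the SIR is 2.0 with an interval of roughly 1.22 to 3.09; small areas with few cases give very wide intervals, which is one motivation for the Bayesian smoothing used in the study.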

  18. Laboratory-Based Prospective Surveillance for Community Outbreaks of Shigella spp. in Argentina

    PubMed Central

    Viñas, María R.; Tuduri, Ezequiel; Galar, Alicia; Yih, Katherine; Pichel, Mariana; Stelling, John; Brengi, Silvina P.; Della Gaspera, Anabella; van der Ploeg, Claudia; Bruno, Susana; Rogé, Ariel; Caffer, María I.; Kulldorff, Martin; Galas, Marcelo

    2013-01-01

    Background To implement effective control measures, timely outbreak detection is essential. Shigella is the most common cause of bacterial diarrhea in Argentina. Highly resistant clones of Shigella have emerged, and outbreaks have been recognized in closed settings and in whole communities. We hereby report our experience with an evolving, integrated, laboratory-based, near real-time surveillance system operating in six contiguous provinces of Argentina during April 2009 to March 2012. Methodology To detect localized shigellosis outbreaks in a timely manner, we used the prospective space-time permutation scan statistic algorithm of SaTScan, embedded in WHONET software. Twenty-three laboratories sent updated Shigella data on a weekly basis to the National Reference Laboratory. Cluster detection analysis was performed at several taxonomic levels: for all Shigella spp., for serotypes within species and for antimicrobial resistance phenotypes within species. Shigella isolates associated with statistically significant signals (clusters in time/space with recurrence interval ≥365 days) were subtyped by pulsed field gel electrophoresis (PFGE) using PulseNet protocols. Principal Findings In three years of active surveillance, our system detected 32 statistically significant events, 26 of them identified before hospital staff was aware of any unexpected increase in the number of Shigella isolates. Twenty-six signals were investigated by PFGE, which confirmed a close relationship among the isolates for 22 events (84.6%). Seven events were investigated epidemiologically, which revealed links among the patients. Seventeen events were found at the resistance profile level. The system detected events of public health importance: infrequent resistance profiles, long-lasting and/or re-emergent clusters and events important for their duration or size, which were reported to local public health authorities.
Conclusions/Significance The WHONET-SaTScan system may serve as a model for surveillance and can be applied to other pathogens, implemented by other networks, and scaled up to national and international levels for early detection and control of outbreaks. PMID:24349586
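    The space-time permutation scan statistic needs no population data: it conditions on the totals per area and per day, so under the null the expected count in zone z on day d is C_z·C_d / C. A candidate space-time cylinder is then scored with a Poisson-type log-likelihood ratio. The sketch below computes only these two quantities; the search over cylinders, the Monte Carlo permutations that yield p-values, and the recurrence-interval calculation are not shown, and the function names and counts are hypothetical.

```python
import math
from typing import List, Sequence


def poisson_llr(c: float, mu: float, total: float) -> float:
    """Log-likelihood ratio for c observed vs mu expected cases out of `total`
    (zero unless the cylinder has an excess)."""
    if c <= mu:
        return 0.0
    outside = 0.0 if c == total else (total - c) * math.log((total - c) / (total - mu))
    return c * math.log(c / mu) + outside


def expected_counts(counts: List[List[int]]) -> List[List[float]]:
    """mu[z][d] = (cases in zone z) * (cases on day d) / (all cases)."""
    total = sum(map(sum, counts))
    zone_tot = [sum(row) for row in counts]
    day_tot = [sum(col) for col in zip(*counts)]
    return [[zt * dt / total for dt in day_tot] for zt in zone_tot]


def cylinder_llr(counts: List[List[int]], zones: Sequence[int], days: Sequence[int]) -> float:
    """Score one space-time cylinder (a set of zones over a set of days)."""
    mu = expected_counts(counts)
    total = sum(map(sum, counts))
    c = sum(counts[z][d] for z in zones for d in days)
    e = sum(mu[z][d] for z in zones for d in days)
    return poisson_llr(c, e, total)


# Hypothetical isolates: rows are laboratories' catchment areas, columns are weeks.
counts = [[1, 1, 1], [1, 1, 1], [1, 1, 6]]
print(cylinder_llr(counts, zones=[2], days=[2]))
```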

  19. Enabling Comprehension of Patient Subgroups and Characteristics in Large Bipartite Networks: Implications for Precision Medicine

    PubMed Central

    Bhavnani, Suresh K.; Chen, Tianlong; Ayyaswamy, Archana; Visweswaran, Shyam; Bellala, Gowtham; Divekar, Rohit; Bassler, Kevin E.

    2017-01-01

    A primary goal of precision medicine is to identify patient subgroups based on their characteristics (e.g., comorbidities or genes) with the goal of designing more targeted interventions. While network visualization methods such as Fruchterman-Reingold have been used to successfully identify such patient subgroups in small- to medium-sized data sets, they often fail to reveal comprehensible visual patterns in large and dense networks despite having significant clustering. We therefore developed an algorithm called ExplodeLayout, which exploits the existence of significant clusters in bipartite networks to automatically “explode” a traditional network layout with the goal of separating overlapping clusters, while at the same time preserving key network topological properties that are critical for the comprehension of patient subgroups. We demonstrate the utility of ExplodeLayout by visualizing a large dataset extracted from Medicare consisting of readmitted hip-fracture patients and their comorbidities, demonstrate its statistically significant improvement over a traditional layout algorithm, and discuss how the resulting network visualization enabled clinicians to infer mechanisms precipitating hospital readmission in specific patient subgroups. PMID:28815099

  20. Patient-perceived changes in the system of values after cancer diagnosis.

    PubMed

    Greszta, Elżbieta; Siemińska, Maria J

    2011-03-01

    A cross-sectional study investigated changes in patients' value systems following a diagnosis of cancer. Fifty patients, 1 to 6 months after cancer diagnosis, were asked to compare their current values with their recollection of past values. Using the Rokeach Value Survey, we obtained statistically significant results showing that twenty-seven of thirty-six values changed in importance from the patients' perspective: 16 values significantly increased, while 11 values significantly decreased in importance. Changes in the remaining nine values were not significant. We identified the clusters of values that increased most in importance: Religious morality (Salvation, Forgiving, Helpful, Clean), Personal orientation (Self-Respect, True Friendship, Happiness), Self-constriction (Self-Controlled, Obedient, Honest), Family security (Family Security, Responsible), and Delayed gratification (Wisdom, Inner Harmony). We also observed that the following value clusters decreased in importance: Immediate gratification (An Exciting Life, Pleasure, A Comfortable Life); Self-expansion (Capable, Ambitious, Broadminded); and Competence (A Sense of Accomplishment, Imaginative, Intellectual). The remaining values belonged to clusters that as a group changed slightly or not at all. Practical implications of the study are discussed.

  1. Earthquake Predictability: Results From Aggregating Seismicity Data And Assessment Of Theoretical Individual Cases Via Synthetic Data

    NASA Astrophysics Data System (ADS)

    Adamaki, A.; Roberts, R.

    2016-12-01

    For many years an important aim in seismological studies has been forecasting the occurrence of large earthquakes. Despite some well-established statistical behavior of earthquake sequences, expressed by e.g. the Omori law for aftershock sequences and the Gutenberg-Richter distribution of event magnitudes, purely statistical approaches to short-term earthquake prediction have in general not been successful. It seems that a better understanding of the processes leading to critical stress build-up prior to larger events is necessary to identify useful precursory activity, if this exists, and statistical analyses are an important tool in this context. There has been considerable debate on the usefulness or otherwise of foreshock studies for short-term earthquake prediction. We investigate generic patterns of foreshock activity using aggregated data and by studying not only strong but also moderate magnitude events. Aggregating empirical local seismicity time series prior to larger events observed in and around Greece reveals a statistically significant increasing rate of seismicity over the 20 days prior to M>3.5 earthquakes. This increase cannot be explained by tempo-spatial clustering models such as ETAS, implying genuine changes in the mechanical situation just prior to larger events and thus the possible existence of useful precursory information. Because of tempo-spatial clustering, including aftershocks to foreshocks, even if such generic behavior exists it does not necessarily follow that foreshocks have the potential to provide useful precursory information for individual larger events. Using synthetic catalogs produced based on different clustering models and different presumed system sensitivities, we are now investigating to what extent the apparently established generic foreshock rate acceleration may or may not imply that the foreshocks have potential in the context of routine forecasting of larger events.
Preliminary results suggest that this is the case, but that it is likely that physically-based models of foreshock clustering will be a necessary, but not necessarily sufficient, basis for successful forecasting.
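    The aggregation step can be pictured as a superposed-epoch stack: for every event above the magnitude threshold, the events in each of the preceding 20 days are counted, and the counts are summed across all such events. A minimal sketch, assuming event times in days and the M>3.5 threshold from the abstract, follows; the ETAS comparison and any declustering are not shown, and the function name and toy catalog are hypothetical.

```python
import math
from typing import List, Sequence


def stacked_precursor_counts(times: Sequence[float], mags: Sequence[float],
                             main_mag: float = 3.5, window_days: int = 20) -> List[int]:
    """Stack event counts in daily bins over the `window_days` before every
    event with magnitude > main_mag. The last bin is the final day before it."""
    bins = [0] * window_days
    for t0, m0 in zip(times, mags):
        if m0 <= main_mag:
            continue
        for t in times:
            lag = t0 - t  # days before the larger event
            if 0 < lag <= window_days:
                bins[window_days - math.ceil(lag)] += 1
    return bins


# Hypothetical catalog: two small events followed by an M4.0 event on day 10.
print(stacked_precursor_counts([0.5, 9.5, 10.0], [2.0, 2.0, 4.0]))
```

    A rising trend toward the last bins of the stack is the kind of aggregate acceleration the abstract describes; individual sequences are usually too sparse to show it on their own.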

  2. Analysis of risk factors for cluster behavior of dental implant failures.

    PubMed

    Chrcanovic, Bruno Ramos; Kisch, Jenö; Albrektsson, Tomas; Wennerberg, Ann

    2017-08-01

    Some studies have indicated that implant failures are commonly concentrated in a few patients. The aim was to identify and analyze cluster behavior of dental implant failures among subjects of a retrospective study. This retrospective study included only patients receiving at least three implants. Patients presenting at least three implant failures were classified as presenting a cluster behavior. Univariate and multivariate logistic regression models and generalized estimating equations analysis evaluated the effect of explanatory variables on the cluster behavior. There were 1406 patients with three or more implants (8337 implants, 592 failures). Sixty-seven (4.77%) patients presented cluster behavior, with 56.8% of all implant failures. The intake of antidepressants and bruxism were identified as potential negative factors exerting a statistically significant influence on cluster behavior at the patient level. The negative factors at the implant level were turned implants, short implants, poor bone quality, age of the patient, the intake of medications to reduce gastric acid production, smoking, and bruxism. A cluster pattern among patients with implant failure is highly probable. Factors of interest as predictors for implant failures could be a number of systemic and local factors, although a direct causal relationship cannot be ascertained. © 2017 Wiley Periodicals, Inc.

  3. Statistical analysis of activation and reaction energies with quasi-variational coupled-cluster theory

    NASA Astrophysics Data System (ADS)

    Black, Joshua A.; Knowles, Peter J.

    2018-06-01

    The performance of quasi-variational coupled-cluster (QV) theory applied to the calculation of activation and reaction energies has been investigated. A statistical analysis of results obtained for six different sets of reactions has been carried out, and the results have been compared to those from standard single-reference methods. In general, the QV methods lead to increased activation energies and larger absolute reaction energies compared to those obtained with traditional coupled-cluster theory.

  4. Semisupervised Clustering by Iterative Partition and Regression with Neuroscience Applications

    PubMed Central

    Qian, Guoqi; Wu, Yuehua; Ferrari, Davide; Qiao, Puxue; Hollande, Frédéric

    2016-01-01

    Regression clustering combines unsupervised and supervised statistical learning and data mining, and is used in a wide range of applications including artificial intelligence and neuroscience. It performs unsupervised learning when it clusters the data according to their respective unobserved regression hyperplanes. The method also performs supervised learning when it fits regression hyperplanes to the corresponding data clusters. Applying regression clustering in practice requires means of determining the underlying number of clusters in the data, finding the cluster label of each data point, and estimating the regression coefficients of the model. In this paper, we review the estimation and selection issues in regression clustering with regard to the least squares and robust statistical methods. We also provide a model selection based technique to determine the number of regression clusters underlying the data. We further develop a computing procedure for regression clustering estimation and selection. Finally, simulation studies are presented for assessing the procedure, together with analyzing a real data set on RGB cell marking in neuroscience to illustrate and interpret the method. PMID:27212939
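    The alternation described above (fit a hyperplane to each cluster, then reassign each point to the hyperplane with the smallest residual) can be sketched with ordinary least squares as follows. This assumes a known number of clusters k and squared residuals; the paper's robust variants and its model-selection step for choosing k are not reproduced, and the function name is hypothetical.

```python
import numpy as np


def regression_clustering(X, y, k, init_labels=None, n_iter=50, seed=0):
    """Iterative partition and regression: alternately fit a least-squares
    hyperplane per cluster and reassign points to the best-fitting one."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), np.asarray(X, dtype=float)])  # add intercept
    labels = (rng.integers(0, k, size=n) if init_labels is None
              else np.asarray(init_labels).copy())
    coefs = np.zeros((k, A.shape[1]))
    for _ in range(n_iter):
        for c in range(k):
            idx = labels == c
            if idx.sum() >= A.shape[1]:  # enough points to fit this hyperplane
                coefs[c] = np.linalg.lstsq(A[idx], y[idx], rcond=None)[0]
        resid = (y[:, None] - A @ coefs.T) ** 2  # squared residual to each hyperplane
        new_labels = resid.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, coefs
```

    Like k-means, this alternation can stop at a poor local optimum (for example two lines crossing in an "X"), so in practice it is run from several starting partitions and the fit with the smallest total squared residual is kept.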

  5. Clustering, randomness, and regularity in cloud fields. 4. Stratocumulus cloud fields

    NASA Astrophysics Data System (ADS)

    Lee, J.; Chou, J.; Weger, R. C.; Welch, R. M.

    1994-07-01

    To complete the analysis of the spatial distribution of boundary layer cloudiness, the present study focuses on nine stratocumulus Landsat scenes. The results indicate many similarities between stratocumulus and cumulus spatial distributions. Most notably, at full spatial resolution all scenes exhibit a decidedly clustered distribution. The strength of the clustering signal decreases with increasing cloud size; the clusters themselves consist of a few clouds (less than 10), occupy a small percentage of the cloud field area (less than 5%), contain between 20% and 60% of the cloud field population, and are randomly located within the scene. In contrast, stratocumulus in almost every respect are more strongly clustered than are cumulus cloud fields. For instance, stratocumulus clusters contain more clouds per cluster, occupy a larger percentage of the total area, and have a larger percentage of clouds participating in clusters than the corresponding cumulus examples. To investigate clustering at intermediate spatial scales, the local dimensionality statistic is introduced. Results obtained from this statistic provide the first direct evidence for regularity among large (>900 m in diameter) clouds in stratocumulus and cumulus cloud fields, in support of the inhibition hypothesis of Ramirez and Bras (1990). Also, the size compensated point-to-cloud cumulative distribution function statistic is found to be necessary to obtain a consistent description of stratocumulus cloud distributions. A hypothesis regarding the underlying physical mechanisms responsible for cloud clustering is presented. It is suggested that cloud clusters often arise from 4 to 10 triggering events localized within regions less than 2 km in diameter and randomly distributed within the cloud field. As the size of the cloud surpasses the scale of the triggering region, the clustering signal weakens and the larger cloud locations become more random.

  6. Clustering, randomness, and regularity in cloud fields. 4: Stratocumulus cloud fields

    NASA Technical Reports Server (NTRS)

    Lee, J.; Chou, J.; Weger, R. C.; Welch, R. M.

    1994-01-01

    To complete the analysis of the spatial distribution of boundary layer cloudiness, the present study focuses on nine stratocumulus Landsat scenes. The results indicate many similarities between stratocumulus and cumulus spatial distributions. Most notably, at full spatial resolution all scenes exhibit a decidedly clustered distribution. The strength of the clustering signal decreases with increasing cloud size; the clusters themselves consist of a few clouds (less than 10), occupy a small percentage of the cloud field area (less than 5%), contain between 20% and 60% of the cloud field population, and are randomly located within the scene. In contrast, stratocumulus in almost every respect are more strongly clustered than are cumulus cloud fields. For instance, stratocumulus clusters contain more clouds per cluster, occupy a larger percentage of the total area, and have a larger percentage of clouds participating in clusters than the corresponding cumulus examples. To investigate clustering at intermediate spatial scales, the local dimensionality statistic is introduced. Results obtained from this statistic provide the first direct evidence for regularity among large (more than 900 m in diameter) clouds in stratocumulus and cumulus cloud fields, in support of the inhibition hypothesis of Ramirez and Bras (1990). Also, the size compensated point-to-cloud cumulative distribution function statistic is found to be necessary to obtain a consistent description of stratocumulus cloud distributions. A hypothesis regarding the underlying physical mechanisms responsible for cloud clustering is presented. It is suggested that cloud clusters often arise from 4 to 10 triggering events localized within regions less than 2 km in diameter and randomly distributed within the cloud field. As the size of the cloud surpasses the scale of the triggering region, the clustering signal weakens and the larger cloud locations become more random.
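    The clustered/random/regular distinction in these two records can be illustrated with the Clark-Evans nearest-neighbour ratio R, the observed mean nearest-neighbour distance divided by its expectation 1/(2√ρ) for a random (Poisson) pattern of density ρ. R < 1 indicates clustering, R ≈ 1 randomness, and R > 1 regularity. This is a standard statistic chosen for illustration, not the local dimensionality or point-to-cloud statistics the authors used, and it ignores edge effects. The point patterns are hypothetical.

```python
import math
from typing import Sequence, Tuple


def clark_evans_ratio(points: Sequence[Tuple[float, float]], area: float) -> float:
    """Observed mean nearest-neighbour distance over its expectation under
    complete spatial randomness (no edge correction)."""
    n = len(points)
    nearest = [min(math.dist(p, q) for j, q in enumerate(points) if j != i)
               for i, p in enumerate(points)]
    observed = sum(nearest) / n
    expected = 0.5 / math.sqrt(n / area)
    return observed / expected


# A regular 10 x 10 grid of cloud centres in a 10 x 10 scene.
grid = [(float(i), float(j)) for i in range(10) for j in range(10)]
print(clark_evans_ratio(grid, 100.0))
```

    The regular grid gives R = 2.0; packing the same number of points into a few tight groups drives R well below 1.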

  7. Transcriptome profiling analysis reveals biomarkers in colon cancer samples of various differentiation

    PubMed Central

    Yu, Tonghu; Zhang, Huaping; Qi, Hong

    2018-01-01

    The aim of the present study was to identify additional colon cancer-related genes in different stages. Gene expression profile E-GEOD-62932 was extracted for differentially expressed gene (DEG) screening. Series test of cluster analysis was used to obtain significant trending models. Based on the Gene Ontology and Kyoto Encyclopedia of Genes and Genomes databases, functional and pathway enrichment analyses were performed and a pathway relation network was constructed. Gene co-expression network and gene signal network were constructed for common DEGs. The DEGs with the same trend were clustered, and in total 16 clusters with statistical significance were obtained. The screened DEGs were enriched in small molecule metabolic process and metabolic pathways. The pathway relation network was constructed with 57 nodes. A total of 328 common DEGs were obtained. Gene signal network was constructed with 71 nodes. Gene co-expression network was constructed with 161 nodes and 211 edges. ABCD3, CPT2, AGL and JAM2 are potential biomarkers for the diagnosis of colon cancer. PMID:29928385

  8. Characterization of spatial and temporal variability in hydrochemistry of Johor Straits, Malaysia.

    PubMed

    Abdullah, Pauzi; Abdullah, Sharifah Mastura Syed; Jaafar, Othman; Mahmud, Mastura; Khalik, Wan Mohd Afiq Wan Mohd

    2015-12-15

    Characterization of hydrochemistry changes in the Johor Straits over 5 years of monitoring was successfully carried out. Water quality data sets (27 stations and 19 parameters) collected in this area were interpreted using multivariate statistical analysis. Cluster analysis grouped all the stations into four clusters ((Dlink/Dmax) × 100<90) and two clusters ((Dlink/Dmax) × 100<80) for site and period similarities. Principal component analysis rendered six significant components (eigenvalue>1) that explained 82.6% of the total variance of the data set. Classification matrix of discriminant analysis assigned 88.9-92.6% and 83.3-100% correctness in spatial and temporal variability, respectively. Time series analysis then confirmed that only four parameters did not change significantly over time. Therefore, it is imperative that the environmental impact of reclamation and dredging works, municipal or industrial discharge, marine aquaculture and shipping activities in this area be effectively controlled and managed. Copyright © 2015 Elsevier Ltd. All rights reserved.

  9. Epidemiological characteristics of reported sporadic and outbreak cases of E. coli O157 in people from Alberta, Canada (2000-2002): methodological challenges of comparing clustered to unclustered data.

    PubMed

    Pearl, D L; Louie, M; Chui, L; Doré, K; Grimsrud, K M; Martin, S W; Michel, P; Svenson, L W; McEwen, S A

    2008-04-01

    Using multivariable models, we assessed whether there were significant differences between reported outbreak and sporadic cases in terms of their sex, age, and mode and site of disease transmission. We also determined the potential role of administrative, temporal, and spatial factors within these models. We compared a variety of approaches to account for clustering of cases in outbreaks, including weighted logistic regression, random effects models, generalized estimating equations, robust variance estimates, and the random selection of one case from each outbreak. Age and mode of transmission were the only epidemiologically and statistically significant covariates in our final models using the above approaches. Weighting observations in a logistic regression model by the inverse of their outbreak size appeared to be a relatively robust and valid means for modelling these data. Some analytical techniques, designed to account for clustering, had difficulty converging or producing realistic measures of association.
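    Weighting each case by the inverse of its outbreak size means an outbreak of n cases contributes the same total weight as one sporadic case. A minimal sketch of weighted logistic regression fitted by Newton-Raphson (iteratively reweighted least squares) follows, assuming weights w = 1/outbreak size with sporadic cases given weight 1. It yields point estimates only; the variance estimation the authors used is not reproduced, and the function name and data are hypothetical.

```python
import numpy as np


def weighted_logistic(X, y, w, n_iter=25):
    """Maximise sum_i w_i * [y_i log p_i + (1 - y_i) log(1 - p_i)] by Newton-Raphson.
    Returns coefficients with the intercept first."""
    A = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ beta))
        grad = A.T @ (w * (y - p))                           # weighted score
        hess = A.T @ (A * (w * p * (1.0 - p))[:, None])      # weighted information
        beta = beta + np.linalg.solve(hess, grad)
    return beta


# Hypothetical cases: x = 1 marks one covariate level; the last three cases
# belong to a single outbreak of size 3, so each gets weight 1/3.
X = np.array([[0.0], [0.0], [1.0], [1.0], [1.0], [1.0]])
y = np.array([0.0, 1.0, 0.0, 1.0, 1.0, 1.0])
w = np.array([1.0, 1.0, 1.0, 1 / 3, 1 / 3, 1 / 3])
print(weighted_logistic(X, y, w))
```

    Unweighted, the outbreak's three cases would push the x = 1 log-odds up to log 3; with inverse-size weights the outbreak counts once, and both coefficients come out near zero.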

  10. Effects of seismic intensity and socioeconomic status on injury and displacement after the 2007 Peru earthquake.

    PubMed

    Milch, Karen; Gorokhovich, Yuri; Doocy, Shannon

    2010-10-01

    Earthquakes are a major cause of displacement, particularly in developing countries. Models of injury and displacement can be applied to assist governments and aid organisations in effectively targeting preparedness and relief efforts. A stratified cluster survey was conducted in January 2008 to evaluate risk factors for injury and displacement following the 15 August 2007 earthquake in southern Peru. In statistical modelling, seismic intensity, distance to rupture, living conditions, and educational attainment collectively explained 54.9 per cent of the variability in displacement rates across clusters. Living conditions was a particularly significant predictor of injury and displacement, indicating a strong relationship between risk and socioeconomic status. Contrary to expectations, urban, periurban, and rural clusters did not exhibit significantly different injury and displacement rates. Proxies of socioeconomic status, particularly the living conditions index score, proved relevant in explaining displacement, likely due to unmeasured aspects of housing construction practices and building materials. © 2010 The Author(s). Journal compilation © Overseas Development Institute, 2010.

  11. Statistical Analysis of Large Scale Structure by the Discrete Wavelet Transform

    NASA Astrophysics Data System (ADS)

    Pando, Jesus

    1997-10-01

    The discrete wavelet transform (DWT) is developed as a general statistical tool for the study of large scale structures (LSS) in astrophysics. The DWT is used in all aspects of structure identification, including cluster analysis, spectrum and two-point correlation studies, scale-scale correlation analysis, and measurement of deviations from Gaussian behavior. The techniques developed are demonstrated on 'academic' signals, on simulated models of the Lyman-α (Lyα) forest, and on observational data of the Lyα forest. This technique can detect clustering in the Lyα clouds where traditional techniques such as the two-point correlation function have failed. The position and strength of these clusters in both real and simulated data are determined, and it is shown that clusters exist on scales as large as at least 20 h⁻¹ Mpc at significance levels of 2-4σ. Furthermore, it is found that the strength distribution of the clusters can be used to distinguish between real data and simulated samples even where other traditional methods have failed to detect differences. Second, a method for measuring the power spectrum of a density field using the DWT is developed. All common features determined by the usual Fourier power spectrum can be calculated by the DWT. These features, such as the index of a power law or typical scales, can be detected even when the samples are geometrically complex or incomplete, or when the mean density on larger scales is not known (the infrared uncertainty). Using this method, the spectra of Lyα forests in both simulated and real samples are calculated. Third, a method for measuring hierarchical clustering is introduced. Because hierarchical evolution is characterized by a set of rules for how larger dark matter halos are formed by the merging of smaller halos, scale-scale correlations of the density field should be one of the most sensitive quantities in determining the merging history. We show that these correlations can be completely determined by the correlations between discrete wavelet coefficients on adjacent scales and at nearly the same spatial position, $C_{j,j+1}^{2\cdot 2}$. Scale-scale correlations are computed for two samples of QSO Lyα forest absorption spectra. Lastly, higher order statistics are developed to detect deviations from Gaussian behavior. These higher order statistics are necessary to fully characterize the Lyα forests because the usual second-order statistics, such as the two-point correlation function or power spectrum, give inconclusive results. It is shown how this technique takes advantage of the locality of the DWT to circumvent the central limit theorem. A non-Gaussian spectrum is defined, and this spectrum reveals not only the magnitude but also the scales of non-Gaussianity. When applied to simulated and observational samples of the Lyα clouds, it is found that different popular models of structure formation have different spectra, while two independent observational data sets have the same spectra. Moreover, the non-Gaussian spectra of real data sets are significantly different from the spectra of various possible random samples. (Abstract shortened by UMI.)
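
    The idea of correlating wavelet coefficients "on adjacent scales and at nearly the same spatial position" can be sketched with an orthonormal Haar DWT, where child coefficient l on scale j+1 sits under parent l // 2 on scale j. The Haar basis and the normalisation in `scale_scale_correlation` are assumptions chosen for illustration (the value is about 1 when fine-scale power is unrelated to coarse-scale power); the thesis's exact definition of the scale-scale statistic may differ.

```python
import math


def haar_details(signal):
    """Orthonormal Haar DWT of a length-2**J signal.

    Returns detail coefficients per level, coarsest first, so that
    levels[j] holds 2**j coefficients.
    """
    approx = list(signal)
    levels = []
    while len(approx) > 1:
        half = len(approx) // 2
        levels.append([(approx[2 * k] - approx[2 * k + 1]) / math.sqrt(2) for k in range(half)])
        approx = [(approx[2 * k] + approx[2 * k + 1]) / math.sqrt(2) for k in range(half)]
    return levels[::-1]


def scale_scale_correlation(parent, child):
    """Normalised correlation of squared coefficients on adjacent scales.

    Child coefficient l lies under parent l // 2 (nearly the same position).
    Roughly 1 when fine-scale power is unrelated to coarse-scale power;
    above 1 when fine-scale power concentrates under strong coarse features.
    """
    assert len(child) == 2 * len(parent)
    numerator = sum(child[l] ** 2 * parent[l // 2] ** 2 for l in range(len(child)))
    denominator = sum(p ** 2 for p in parent) * sum(c ** 2 for c in child)
    return len(parent) * numerator / denominator


# A single localized spike: its fine-scale and coarse-scale power coincide.
levels = haar_details([1.0, 1.0, 1.0, 1.0, 5.0, 1.0, 1.0, 1.0])
print(scale_scale_correlation(levels[1], levels[2]))
```

    For the localized spike the statistic comes out at about 2, signalling that fine-scale and coarse-scale power sit at the same position; this locality is what the Fourier power spectrum cannot express.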

  12. An approach to functionally relevant clustering of the protein universe: Active site profile‐based clustering of protein structures and sequences

    PubMed Central

    Knutson, Stacy T.; Westwood, Brian M.; Leuthaeuser, Janelle B.; Turner, Brandon E.; Nguyendac, Don; Shea, Gabrielle; Kumar, Kiran; Hayden, Julia D.; Harper, Angela F.; Brown, Shoshana D.; Morris, John H.; Ferrin, Thomas E.; Babbitt, Patricia C.

    2017-01-01

    Protein function identification remains a significant problem. Solving this problem at the molecular functional level would allow identification of mechanistic determinants: amino acids that distinguish between functional families within a superfamily. Active site profiling was developed to identify mechanistic determinants. DASP and DASP2 were developed as tools to search sequence databases using active site profiling. Here, TuLIP (Two‐Level Iterative clustering Process) is introduced as an iterative, divisive clustering process that uses active site profiling to separate structurally characterized superfamily members into functionally relevant clusters. Underlying TuLIP is the observation that functionally relevant families (curated by the Structure‐Function Linkage Database, SFLD) self‐identify in DASP2 searches, whereas clusters containing multiple functional families do not. Each TuLIP iteration produces candidate clusters, and each is evaluated to determine whether it self‐identifies using DASP2. If so, it is deemed a functionally relevant group. Divisive clustering continues until each structure is either a member of a functionally relevant group or a singlet. TuLIP is validated on enolase and glutathione transferase structures, superfamilies well curated by SFLD. Correlation is strong, although the small numbers of structures prevent a statistically significant analysis. TuLIP‐identified enolase clusters are used in DASP2 GenBank searches to identify sequences sharing functional site features. Analysis shows a true positive rate of 96%, a false negative rate of 4%, and a maximum false positive rate of 4%. F‐measure and performance analysis on the enolase search results and comparison to GEMMA and SCI‐PHY demonstrate that TuLIP avoids the over‐division problem of these methods. Mechanistic determinants for enolase families are evaluated and shown to correlate well with literature results. PMID:28054422
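
    The stopping rule of the iterative, divisive process can be sketched generically: keep splitting a group until it passes an acceptance test or is reduced to a singlet. In the sketch below, `accept` is a hypothetical stand-in for the DASP2 self-identification check and `split` a hypothetical stand-in for TuLIP's clustering step; this is not the published implementation.

```python
def divisive_until_accepted(items, split, accept):
    """Split groups until each one passes `accept` or is reduced to a singlet.

    `split` must return at least two non-empty, smaller groups, which
    guarantees termination. Both callables are supplied by the caller.
    """
    accepted, singlets = [], []
    pending = [list(items)]
    while pending:
        group = pending.pop()
        if len(group) == 1:
            singlets.append(group)
        elif accept(group):
            accepted.append(group)
        else:
            pending.extend(split(group))
    return accepted, singlets


# Hypothetical stand-ins: "self-identifies" = all members share a 4-letter family tag,
# and splitting simply halves the group (TuLIP uses a real clustering step instead).
def same_family(group):
    return len({name[:4] for name in group}) == 1


def halve(group):
    mid = len(group) // 2
    return [group[:mid], group[mid:]]


structures = ["enoA1", "enoA2", "enoB1", "gstC1"]
accepted, singlets = divisive_until_accepted(structures, halve, same_family)
print(accepted, singlets)
```

    The toy run accepts one group, ['enoA1', 'enoA2'], and leaves 'enoB1' and 'gstC1' as singlets, mirroring the "functionally relevant group member or singlet" outcome described above.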

  13. An approach to functionally relevant clustering of the protein universe: Active site profile-based clustering of protein structures and sequences.

    PubMed

    Knutson, Stacy T; Westwood, Brian M; Leuthaeuser, Janelle B; Turner, Brandon E; Nguyendac, Don; Shea, Gabrielle; Kumar, Kiran; Hayden, Julia D; Harper, Angela F; Brown, Shoshana D; Morris, John H; Ferrin, Thomas E; Babbitt, Patricia C; Fetrow, Jacquelyn S

    2017-04-01

    Protein function identification remains a significant problem. Solving this problem at the molecular functional level would allow identification of mechanistic determinants: amino acids that distinguish between functional families within a superfamily. Active site profiling was developed to identify mechanistic determinants. DASP and DASP2 were developed as tools to search sequence databases using active site profiling. Here, TuLIP (Two-Level Iterative clustering Process) is introduced as an iterative, divisive clustering process that uses active site profiling to separate structurally characterized superfamily members into functionally relevant clusters. Underlying TuLIP is the observation that functionally relevant families (curated by the Structure-Function Linkage Database, SFLD) self-identify in DASP2 searches, whereas clusters containing multiple functional families do not. Each TuLIP iteration produces candidate clusters, and each is evaluated to determine whether it self-identifies using DASP2. If so, it is deemed a functionally relevant group. Divisive clustering continues until each structure is either a member of a functionally relevant group or a singlet. TuLIP is validated on enolase and glutathione transferase structures, superfamilies well curated by SFLD. Correlation is strong, although the small numbers of structures prevent a statistically significant analysis. TuLIP-identified enolase clusters are used in DASP2 GenBank searches to identify sequences sharing functional site features. Analysis shows a true positive rate of 96%, a false negative rate of 4%, and a maximum false positive rate of 4%. F-measure and performance analysis on the enolase search results and comparison to GEMMA and SCI-PHY demonstrate that TuLIP avoids the over-division problem of these methods. Mechanistic determinants for enolase families are evaluated and shown to correlate well with literature results. © 2017 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society.

  14. Cluster Subcutaneous Allergen Specific Immunotherapy for the Treatment of Allergic Rhinitis: A Systematic Review and Meta-Analysis

    PubMed Central

    Sun, Yueqi; Luo, Xi; Li, Huabin

    2014-01-01

    Background: Although allergen-specific immunotherapy (SIT) represents the only immune-modifying and curative option available for patients with allergic rhinitis (AR), the optimal schedule for specific subcutaneous immunotherapy (SCIT) is still unknown. The objective of this study is to systematically assess the efficacy and safety of cluster SCIT for patients with AR. Methods: By searching PubMed, EMBASE and the Cochrane clinical trials database from 1980 through May 10th, 2013, we collected and analyzed the randomized controlled trials (RCTs) of cluster SCIT to assess its efficacy and safety. Results: Eight trials involving 567 participants were included in this systematic review. Our meta-analysis showed that cluster SCIT has a similar effect to conventional SCIT in reducing both rhinitis symptoms and the requirement for anti-allergic medication, but when cluster SCIT was compared with placebo, no statistically significant reduction was found in symptom scores or medication scores. Some caution is required in this interpretation, as there was significant heterogeneity between studies. Data relating to the Rhinoconjunctivitis Quality of Life Questionnaire (RQLQ) in 3 included studies were analyzed, and they consistently point to the efficacy of cluster SCIT in improving quality of life compared to placebo. Regarding safety, the meta-analysis showed no differences in the incidence of either local or systemic adverse reactions between the cluster group and the control group. Conclusion: Based on the current limited evidence, we still cannot conclude affirmatively that cluster SCIT is a safe and efficacious option for the treatment of AR patients. Further large-scale, well-designed RCTs on this topic are still needed. PMID:24489740
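
    The abstract does not state which pooling model was used, but the "significant heterogeneity between studies" caveat is commonly quantified with Cochran's Q and I². The sketch below shows standard inverse-variance fixed-effect pooling with those two statistics on hypothetical effect sizes; when heterogeneity is substantial, a random-effects model (for example DerSimonian-Laird) is often preferred instead.

```python
import math


def fixed_effect_meta(effects, variances):
    """Inverse-variance fixed-effect pooling with Cochran's Q and I^2."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total
    se = math.sqrt(1.0 / total)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I^2: share of total variation attributed to between-study heterogeneity.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, se, q, i2


# Hypothetical effect sizes (e.g. standardised mean differences) from three trials.
pooled, se, q, i2 = fixed_effect_meta([-0.30, -0.10, -0.55], [0.02, 0.03, 0.05])
print(f"pooled={pooled:.3f}, 95% CI=({pooled - 1.96 * se:.3f}, {pooled + 1.96 * se:.3f}), "
      f"Q={q:.2f}, I2={i2:.0%}")
```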

  15. Clustering of health-related behaviors among early and mid-adolescents in Tuscany: results from a representative cross-sectional study

    PubMed Central

    Lazzeri, Giacomo; Panatto, Donatella; Domnich, Alexander; Arata, Lucia; Pammolli, Andrea; Simi, Rita; Giacchi, Mariano Vincenzo; Amicizia, Daniela; Gasparini, Roberto

    2018-01-01

    Background: A large body of literature suggests that adolescents’ health-related behaviors tend to occur in clusters, and understanding such behavioral clustering may have direct implications for the effective tailoring of health-promotion interventions. Despite the usefulness of analyzing clustering, Italian data on this topic are scant. This study aimed to evaluate the clustering patterns of health-related behaviors. Methods: The present study is based on data from the Health Behaviors in School-aged Children (HBSC) study conducted in Tuscany in 2010, which involved 3291 students aged 11, 13 and 15 years. To aggregate students’ data on 22 health-related behaviors, factor analysis and subsequent cluster analysis were performed. Results: Factor analysis revealed eight factors, which were named according to their main traits: ‘Alcohol drinking’, ‘Smoking’, ‘Physical activity’, ‘Screen time’, ‘Signs & symptoms’, ‘Healthy eating’, ‘Violence’ and ‘Sweet tooth’. These factors explained 67% of the variance and were entered into cluster analysis. A six-cluster k-means solution was established with a 93.8% level of classification validity. The between-cluster differences in both mean age and gender distribution were highly statistically significant. Conclusions: Health-compromising behaviors are common among Tuscan teens and occur in distinct clusters. These results may be used by schools, health-promotion authorities and other stakeholders to design and implement tailored preventive interventions in Tuscany. PMID:27908972
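
    The study clustered students on eight factor scores with k-means. The sketch below shows Lloyd's algorithm for k-means on toy two-dimensional points. Seeding with the first k points is a simplification for determinism; practical analyses usually use k-means++ seeding or several random restarts, and the authors' software settings are not stated in the abstract.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; the first k points seed the centroids (deterministic)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return labels, centroids


# Hypothetical factor scores for four students.
labels, centroids = kmeans([(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)], 2)
print(labels, centroids)
```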

  16. Clustering of health-related behaviors among early and mid-adolescents in Tuscany: results from a representative cross-sectional study.

    PubMed

    Lazzeri, Giacomo; Panatto, Donatella; Domnich, Alexander; Arata, Lucia; Pammolli, Andrea; Simi, Rita; Giacchi, Mariano Vincenzo; Amicizia, Daniela; Gasparini, Roberto

    2018-03-01

    A large body of literature suggests that adolescents' health-related behaviors tend to occur in clusters, and understanding such behavioral clustering may have direct implications for the effective tailoring of health-promotion interventions. Despite the usefulness of analyzing clustering, Italian data on this topic are scant. This study aimed to evaluate the clustering patterns of health-related behaviors. The present study is based on data from the Health Behaviors in School-aged Children (HBSC) study conducted in Tuscany in 2010, which involved 3291 students aged 11, 13 and 15 years. To aggregate students' data on 22 health-related behaviors, factor analysis and subsequent cluster analysis were performed. Factor analysis revealed eight factors, which were named according to their main traits: 'Alcohol drinking', 'Smoking', 'Physical activity', 'Screen time', 'Signs & symptoms', 'Healthy eating', 'Violence' and 'Sweet tooth'. These factors explained 67% of the variance and were entered into cluster analysis. A six-cluster k-means solution was established with a 93.8% level of classification validity. The between-cluster differences in both mean age and gender distribution were highly statistically significant. Health-compromising behaviors are common among Tuscan teens and occur in distinct clusters. These results may be used by schools, health-promotion authorities and other stakeholders to design and implement tailored preventive interventions in Tuscany.

  17. Sulfur in Cometary Dust

    NASA Technical Reports Server (NTRS)

    Fomenkova, M. N.

    1997-01-01

    The computer-intensive project consisted of the analysis and synthesis of existing data on composition of comet Halley dust particles. The main objective was to obtain a complete inventory of sulfur containing compounds in the comet Halley dust by building upon the existing classification of organic and inorganic compounds and applying a variety of statistical techniques for cluster and cross-correlational analyses. A student hired for this project wrote and tested the software to perform cluster analysis. The following tasks were carried out: (1) selecting the data from existing database for the proposed project; (2) finding access to a standard library of statistical routines for cluster analysis; (3) reformatting the data as necessary for input into the library routines; (4) performing cluster analysis and constructing hierarchical cluster trees using three methods to define the proximity of clusters; (5) presenting the output results in different formats to facilitate the interpretation of the obtained cluster trees; (6) selecting groups of data points common for all three trees as stable clusters. We have also considered the chemistry of sulfur in inorganic compounds.
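
    One way to read step (6), "selecting groups of data points common for all three trees as stable clusters", is to cut each tree into flat clusters and keep the groups of points that share a cluster in every tree. The sketch below assumes such flat cuts are already available; the three proximity methods are not named in the abstract, so the single/complete/average labels here are hypothetical toy data.

```python
from collections import defaultdict


def stable_groups(*labelings):
    """Group points that share a cluster in every one of the given flat clusterings."""
    groups = defaultdict(list)
    for index, key in enumerate(zip(*labelings)):
        groups[key].append(index)  # same label tuple => together in all trees
    return sorted(groups.values())


# Hypothetical flat cuts of three trees built with different proximity definitions.
single = [0, 0, 0, 1, 1]
complete = [0, 0, 1, 1, 1]
average = [0, 0, 1, 1, 1]
print(stable_groups(single, complete, average))
```

    Because points are grouped by their tuple of labels, the result does not depend on how each method happens to number its clusters.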

  18. Assessing the Milky Way Satellites Associated with the Sagittarius Dwarf Spheroidal Galaxy

    NASA Astrophysics Data System (ADS)

    Law, David R.; Majewski, Steven R.

    2010-08-01

    Numerical models of the tidal disruption of the Sagittarius (Sgr) dwarf galaxy have recently been developed that for the first time simultaneously satisfy most observational constraints on the angular position, distance, and radial velocity trends of both leading and trailing tidal streams emanating from the dwarf. We use these dynamical models in combination with extant three-dimensional position and velocity data for Galactic globular clusters and dSph galaxies to identify those Milky Way satellites that are likely to have originally formed in the gravitational potential well of the Sgr dwarf, and have been stripped from Sgr during its extended interaction with the Milky Way. We conclude that the globular clusters Arp 2, M 54, NGC 5634, Terzan 8, and Whiting 1 are almost certainly associated with the Sgr dwarf, and that Berkeley 29, NGC 5053, Pal 12, and Terzan 7 are likely to be as well (albeit at lower confidence). The initial Sgr system therefore may have contained five to nine globular clusters, corresponding to a specific frequency S_N = 5-9 for an initial Sgr luminosity M_V = -15.0. Our result is consistent with the 8 ± 2 genuine Sgr globular clusters expected on the basis of statistical modeling of the Galactic globular cluster distribution and the corresponding false-association rate due to chance alignments with the Sgr streams. The globular clusters identified as most likely to be associated with Sgr are consistent with previous reconstructions of the Sgr age-metallicity relation, and show no evidence for a second-parameter effect shaping their horizontal branch morphologies. We find no statistically significant evidence to suggest that any of the recently discovered population of ultrafaint dwarf galaxies are associated with the Sgr tidal streams, but are unable to rule out this possibility conclusively for all systems.

  19. TURBULENCE-INDUCED RELATIVE VELOCITY OF DUST PARTICLES. IV. THE COLLISION KERNEL

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pan, Liubin; Padoan, Paolo, E-mail: lpan@cfa.harvard.edu, E-mail: ppadoan@icc.ub.edu

    Motivated by its importance for modeling dust particle growth in protoplanetary disks, we study turbulence-induced collision statistics of inertial particles as a function of the particle friction time, $\tau_p$. We show that turbulent clustering significantly enhances the collision rate for particles of similar sizes with $\tau_p$ corresponding to the inertial range of the flow. If the friction time, $\tau_{p,h}$, of the larger particle is in the inertial range, the collision kernel per unit cross section increases with increasing friction time, $\tau_{p,l}$, of the smaller particle and reaches the maximum at $\tau_{p,l} = \tau_{p,h}$, where the clustering effect peaks. This feature is not captured by the commonly used kernel formula, which neglects the effect of clustering. We argue that turbulent clustering helps alleviate the bouncing barrier problem for planetesimal formation. We also investigate the collision velocity statistics using a collision-rate weighting factor to account for the higher collision frequency of particle pairs with larger relative velocity. For $\tau_{p,h}$ in the inertial range, the rms relative velocity with collision-rate weighting is found to be invariant with $\tau_{p,l}$ and to scale roughly as $\tau_{p,h}^{1/2}$. The weighting factor favors collisions with larger relative velocity, and including it leads to more destructive and fewer sticking collisions. We compare two collision kernel formulations based on spherical and cylindrical geometries. The two formulations give consistent results for the collision rate and the collision-rate weighted statistics, except that the spherical formulation predicts more head-on collisions than the cylindrical formulation.

  20. Examining the effectiveness of discriminant function analysis and cluster analysis in species identification of male field crickets based on their calling songs.

    PubMed

    Jaiswara, Ranjana; Nandi, Diptarup; Balakrishnan, Rohini

    2013-01-01

    Traditional taxonomy based on morphology has often failed in accurate species identification owing to the occurrence of cryptic species, which are reproductively isolated but morphologically identical. Molecular data have thus been used to complement morphology in species identification. The sexual advertisement calls in several groups of acoustically communicating animals are species-specific and can thus complement molecular data as non-invasive tools for identification. Several statistical tools and automated identifier algorithms have been used to investigate the efficiency of acoustic signals in species identification. Despite a plethora of such methods, there is a general lack of knowledge regarding the appropriate usage of these methods in specific taxa. In this study, we investigated the performance of two commonly used statistical methods, discriminant function analysis (DFA) and cluster analysis, in identification and classification based on acoustic signals of field cricket species belonging to the subfamily Gryllinae. Using a comparative approach, we evaluated the optimal number of species and calling song characteristics for both methods that lead to the most accurate classification and identification. The accuracy of classification using DFA was high and was not affected by the number of taxa used. However, a constraint in using DFA is the need for a priori classification of songs. The accuracy of classification using cluster analysis, which does not require a priori knowledge, was highest for 6-7 taxa and decreased significantly when more than ten taxa were analysed together. We also investigated the efficacy of two novel derived acoustic features in improving the accuracy of identification. Our results show that DFA is a reliable statistical tool for species identification using acoustic signals. Our results also show that cluster analysis of acoustic signals in crickets works effectively for species classification and identification.

  1. An assessment of the effects of cell size on AGNPS modeling of watershed runoff

    USGS Publications Warehouse

    Wu, S.-S.; Usery, E.L.; Finn, M.P.; Bosch, D.D.

    2008-01-01

    This study investigates the changes in simulated watershed runoff from the Agricultural NonPoint Source (AGNPS) pollution model as a function of model input cell size resolution for eight different cell sizes (30 m, 60 m, 120 m, 210 m, 240 m, 480 m, 960 m, and 1920 m) for the Little River Watershed (Georgia, USA). Overland cell runoff (area-weighted cell runoff), total runoff volume, clustering statistics, and hot spot patterns were examined for the different cell sizes, and trends were identified. Total runoff volumes decreased with increasing cell size. Using data sets of 210-m cell size or smaller in conjunction with a representative watershed boundary allows one to model the runoff volumes within 0.2 percent accuracy. The runoff clustering statistics decrease with increasing cell size; a cell size of 960 m or smaller is necessary to indicate significant high-runoff clustering. Runoff hot spot areas show a decreasing trend with increasing cell size; a cell size of 240 m or smaller is required to detect important hot spots. Conclusions regarding cell size effects on runoff estimation cannot be applied to local watershed areas because runoff volume changes inconsistently with cell size; however, the optimal cell sizes for clustering and hot spot analyses are applicable to local watershed areas because their trends are consistent.

  2. The Abundance of Large Arcs From CLASH

    NASA Astrophysics Data System (ADS)

    Xu, Bingxiao; Postman, Marc; Meneghetti, Massimo; Coe, Dan A.; Clash Team

    2015-01-01

    We have developed an automated arc-finding algorithm to perform a rigorous comparison of the observed and simulated abundance of large lensed background galaxies (a.k.a. arcs). We use images from the CLASH program to derive our observed arc abundance. Simulated CLASH images are created by performing ray tracing through mock clusters generated by MOKA, a tool calibrated on N-body simulations, and by the MUSIC N-body/hydrodynamic simulations, over the same mass and redshift range as the CLASH X-ray selected sample. We derive a lensing efficiency of 15 ± 3 arcs per cluster for the X-ray selected CLASH sample and 4 ± 2 arcs per cluster for the simulated sample. The marginally significant difference (3.0σ) between the results for the observations and the simulations can be explained by the systematically smaller area with magnification larger than 3 (by a factor of ~4) in both the MOKA and MUSIC mass models relative to those derived from the CLASH data. Accounting for this difference brings the observed and simulated arc statistics into full agreement. We find that the source redshift distribution does not have a large impact on the arc abundance, but the arc abundance is very sensitive to the concentration of the dark matter halos. Our results suggest that the solution to the "arc statistics problem" lies primarily in matching the cluster dark matter distribution.
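
    As a quick consistency check on the quoted 3.0σ, the sketch below assumes the two lensing efficiencies are independent with Gaussian errors and combines their uncertainties in quadrature; the abstract does not say exactly how the authors computed the significance.

```python
import math


def difference_in_sigma(a, sigma_a, b, sigma_b):
    """Separation of two measurements in units of their combined (quadrature) error."""
    return abs(a - b) / math.hypot(sigma_a, sigma_b)


# 15 +/- 3 arcs per cluster (CLASH) vs 4 +/- 2 arcs per cluster (simulations)
print(round(difference_in_sigma(15, 3, 4, 2), 2))
```

    This gives 11/√13 ≈ 3.05σ, in line with the reported value.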

  3. An investigation on thermal patterns in Iran based on spatial autocorrelation

    NASA Astrophysics Data System (ADS)

    Fallah Ghalhari, Gholamabbas; Dadashi Roudbari, Abbasali

    2018-02-01

    The present study aimed to investigate the spatio-temporal and monthly patterns of temperature in Iran using spatial statistical methods such as cluster and outlier analysis and hot spot analysis. To do so, the monthly average temperature of 122 synoptic stations was assessed. Statistical analysis showed that January, at 120.75%, had the greatest fluctuation among the studied months. The global Moran's Index revealed that yearly changes of temperature in Iran followed a strongly clustered spatial pattern. The strongest thermal cluster pattern in Iran (0.975388) occurred in May. Cluster and outlier analyses showed that thermal homogeneity in Iran decreases in cold months and increases in warm months. This is due to the radiation angle and to synoptic systems, which strongly influence the thermal regime in Iran. Elevation, however, plays the most notable role, as shown by a geographically weighted regression model. Hot spot analysis of Iran's temperatures showed that hot thermal patterns (very hot, hot, and semi-hot) were dominant in the south, covering 33.5% of the area (about 552,145.3 km²). Regions such as mountain foothills and lowlands lack any significant spatial autocorrelation, covering 25.2% (about 415,345.1 km²). The last is the cold thermal area (very cold, cold, and semi-cold), covering about 25.2% (about 552,145.3 km²) of the whole area of Iran.
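
    Global Moran's I compares each station's deviation from the mean with its neighbours' deviations: values near +1 indicate clustering of similar temperatures, values near -1 a checkerboard pattern. The sketch below assumes a simple binary contiguity weight matrix for four hypothetical stations along a transect; the abstract does not state which spatial weights the study used.

```python
def morans_i(values, weights):
    """Global Moran's I for values x_i and a spatial weight matrix w_ij (w_ii = 0)."""
    n = len(values)
    mean = sum(values) / n
    z = [x - mean for x in values]  # deviations from the mean
    w_total = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / w_total) * cross / sum(zi * zi for zi in z)


# Four hypothetical stations along a transect; neighbours share an edge.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(morans_i([30.0, 29.0, 12.0, 11.0], chain))  # warm pair next to cold pair
print(morans_i([30.0, 11.0, 30.0, 11.0], chain))  # alternating warm/cold
```

    The clustered arrangement gives a positive index, while the strictly alternating one gives -1.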

  4. Alerts in electronic medical records to promote a colorectal cancer screening programme: a cluster randomised controlled trial in primary care.

    PubMed

    Guiriguet, Carolina; Muñoz-Ortiz, Laura; Burón, Andrea; Rivero, Irene; Grau, Jaume; Vela-Vallespín, Carmen; Vilarrubí, Mercedes; Torres, Miquel; Hernández, Cristina; Méndez-Boo, Leonardo; Toràn, Pere; Caballeria, Llorenç; Macià, Francesc; Castells, Antoni

    2016-07-01

    Participation rates in colorectal cancer screening are below recommended European targets. The aim was to evaluate the effectiveness of an alert in primary care electronic medical records (EMRs) in increasing individuals' participation in an organised, population-based colorectal cancer screening programme, compared with usual care. The study was a cluster randomised controlled trial in primary care centres in Barcelona, Spain. Participants were males and females aged 50-69 years who were invited to the first round of a screening programme based on the faecal immunochemical test (FIT) (n = 41 042), and their primary care professionals. The randomisation unit was the physician cluster (n = 130), and patients were blinded to the study group. The control group followed usual care as per the colorectal cancer screening programme. In the intervention group, in addition to usual care, an alert to health professionals (cluster level) promoting screening was introduced in the individual's primary care EMR for 1 year. The main outcome was colorectal cancer screening participation at the individual participant level. In total, 67 physicians and 21 619 patients (intervention group) and 63 physicians and 19 423 patients (control group) were randomised. In the intention-to-treat analysis, screening participation was 44.1% and 42.2%, respectively (odds ratio 1.08, 95% confidence interval [CI] = 0.97 to 1.20, P = 0.146). However, in the per-protocol analysis, screening uptake in the intervention group showed a statistically significant increase after adjusting for potential confounders (OR 1.11; 95% CI = 1.02 to 1.22; P = 0.018). The use of an alert in an individual's primary care EMR is associated with a statistically significant increase in uptake of an organised, FIT-based colorectal cancer screening programme in patients attending primary care centres. © British Journal of General Practice 2016.
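
    The intention-to-treat odds ratio can be roughly checked from the two participation proportions. This crude sketch ignores clustering by physician and any adjustment, so it can only reproduce the point estimate; a valid confidence interval would have to account for the cluster design.

```python
def odds(p):
    """Odds corresponding to a proportion p (0 < p < 1)."""
    return p / (1 - p)


def crude_odds_ratio(p_intervention, p_control):
    """Unadjusted odds ratio from two proportions."""
    return odds(p_intervention) / odds(p_control)


print(round(crude_odds_ratio(0.441, 0.422), 2))
```

    It gives about 1.08 (0.441/0.559 divided by 0.422/0.578), matching the reported point estimate.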

  5. Counts of galaxy clusters as cosmological probes: the impact of baryonic physics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Balaguera-Antolínez, Andrés; Porciani, Cristiano, E-mail: abalan@astro.uni-bonn.de, E-mail: porciani@astro.uni-bonn.de

    2013-04-01

    The halo mass function from N-body simulations of collisionless matter is generally used to retrieve cosmological parameters from observed counts of galaxy clusters. This neglects the observational fact that the baryonic mass fraction in clusters is a random variable that, on average, increases with the total mass (within an overdensity of 500). Considering a mock catalog that includes tens of thousands of galaxy clusters, as expected from the forthcoming generation of surveys, we show that the effect of a varying baryonic mass fraction will be observable with high statistical significance. The net effect is a change in the overall normalization of the cluster mass function and a milder modification of its shape. Our results indicate the necessity of taking into account baryonic corrections to the mass function if one wants to obtain unbiased estimates of the cosmological parameters from data of this quality. We introduce the formalism necessary to accomplish this goal. Our discussion is based on the conditional probability of finding a given value of the baryonic mass fraction for clusters of fixed total mass. Finally, we show that combining information from the cluster counts with measurements of the baryonic mass fraction in a small subsample of clusters (including only a few tens of objects) will nearly optimally constrain the cosmological parameters.

  6. Response to traumatic brain injury neurorehabilitation through an artificial intelligence and statistics hybrid knowledge discovery from databases methodology.

    PubMed

    Gibert, Karina; García-Rudolph, Alejandro; García-Molina, Alberto; Roig-Rovira, Teresa; Bernabeu, Montse; Tormos, José María

    2008-01-01

    Develop a classificatory tool to identify different populations of patients with Traumatic Brain Injury based on the characteristics of deficit and response to treatment. A KDD framework was used: first, descriptive statistics were computed for every variable, the data were cleaned, and relevant variables were selected. The data were then mined using a generalization of Clustering based on rules (CIBR), a hybrid AI and Statistics technique that combines inductive learning (AI) and clustering (Statistics). A prior Knowledge Base (KB) is used to bias the clustering appropriately; the semantic constraints implied by the KB hold in the final clusters, guaranteeing the interpretability of the results. A generalization (Exogenous Clustering based on rules, ECIBR) is presented that allows the KB to be defined in terms of variables that are not considered in the clustering process itself, for greater flexibility. Several tools, such as the Class panel graph, are introduced in the methodology to assist final interpretation. The system recommended a set of 5 classes, and their interpretation permitted the profiles to be labeled. From the medical point of view, the composition of the classes corresponds well with different patterns of increasing response to rehabilitation treatments. All the initially assessable patients form a single group. Severely impaired patients are subdivided into four profiles with clearly distinct response patterns. Of particular interest is the partial-response profile, in which patients could not improve executive functions. Meaningful classes were obtained and, from a semantic point of view, the results were substantially improved relative to classical clustering, supporting our view that hybrid AI and Statistics techniques are more powerful for KDD than pure ones.

  7. Comparing Regression Coefficients between Nested Linear Models for Clustered Data with Generalized Estimating Equations

    ERIC Educational Resources Information Center

    Yan, Jun; Aseltine, Robert H., Jr.; Harel, Ofer

    2013-01-01

    Comparing regression coefficients between models when one model is nested within another is of great practical interest when two explanations of a given phenomenon are specified as linear models. The statistical problem is whether the coefficients associated with a given set of covariates change significantly when other covariates are added into…

  8. ICAP: An Interactive Cluster Analysis Procedure for analyzing remotely sensed data. [to classify the radiance data to produce a thematic map]

    NASA Technical Reports Server (NTRS)

    Wharton, S. W.

    1980-01-01

    An Interactive Cluster Analysis Procedure (ICAP) was developed to derive classifier training statistics from remotely sensed data. The algorithm interfaces the rapid numerical processing capacity of a computer with the human ability to integrate qualitative information. Control of the clustering process alternates between the algorithm, which creates new centroids and forms clusters, and the analyst, who evaluates the cluster structure and may elect to modify it. Clusters can be deleted or lumped pairwise, or new centroids can be added. A summary of the cluster statistics can be requested to facilitate cluster manipulation. The ICAP was implemented in APL (A Programming Language), an interactive computer language. The flexibility of the algorithm was evaluated using data from different LANDSAT scenes to simulate two situations: one in which the analyst is assumed to have no prior knowledge about the data and wishes to have the clusters formed more or less automatically, and the other in which the analyst is assumed to have some knowledge about the data structure and wishes to use that information to closely supervise the clustering process. For comparison, an existing clustering method was also applied to the two data sets.
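    The alternation between algorithm steps and analyst steps described above can be sketched roughly as follows. This is not the APL implementation of ICAP; it is a minimal nearest-centroid illustration under the assumption that clusters are formed by Euclidean distance, and the helper names (`assign`, `centroids_of`, `lump`, `delete`, `summary`) and the toy data are hypothetical.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance; ties go to the lower index)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def assign(points, centroids):
    """Algorithm step: form clusters by assigning every point to its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        clusters[nearest(p, centroids)].append(p)
    return clusters

def centroids_of(clusters):
    """Algorithm step: new centroids are the coordinate-wise means of the non-empty clusters."""
    return [tuple(sum(axis) / len(cl) for axis in zip(*cl)) for cl in clusters if cl]

def lump(clusters, i, j):
    """Analyst step: merge clusters i and j into a single cluster (appended at the end)."""
    rest = [cl for k, cl in enumerate(clusters) if k not in (i, j)]
    return rest + [clusters[i] + clusters[j]]

def delete(clusters, i):
    """Analyst step: drop cluster i; its points are reassigned on the next assign() pass."""
    return [cl for k, cl in enumerate(clusters) if k != i]

def summary(clusters):
    """Cluster statistics an analyst might request: (size, mean) per non-empty cluster."""
    return [(len(cl), centroids_of([cl])[0]) for cl in clusters if cl]

# One interactive round on toy two-band "radiance" vectors (hypothetical data).
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
clusters = assign(points, [(0, 0), (0, 2), (10, 1)])  # algorithm forms 3 clusters
clusters = lump(clusters, 0, 1)                        # analyst lumps the first two
clusters = assign(points, centroids_of(clusters))      # algorithm re-forms clusters
print(summary(clusters))  # [(2, (10.0, 1.0)), (2, (0.0, 1.0))]
```

    Keeping each analyst operation as a pure function over the cluster list makes it easy to interleave them with the automatic assignment step in any order.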

  9. A new approach for the assessment of temporal clustering of extratropical wind storms

    NASA Astrophysics Data System (ADS)

    Schuster, Mareike; Eddounia, Fadoua; Kuhnel, Ivan; Ulbrich, Uwe

    2017-04-01

    A widely used methodology to assess the clustering of storms in a region is based on dispersion statistics of a simple homogeneous Poisson process. This clustering measure is determined by the ratio of the variance to the mean of the local storm statistics per grid point. Values larger than 1, i.e. when the variance is larger than the mean, indicate clustering, while values lower than 1 indicate a sequencing of storms that is more regular than a random process. However, a disadvantage of this methodology is that the characteristics are valid for a predefined climatological time period, so temporal variability of clustering cannot be identified. Also, the absolute value of the dispersion statistic is not particularly intuitive. We have developed an approach to describe temporal clustering of storms that is more intuitive and at the same time allows temporal variations to be assessed. The approach is based on the local distribution of waiting times between the occurrences of two individual storm events, the waiting times being computed by post-processing individual windstorm tracks, which in turn are obtained by an objective tracking algorithm. Based on this distribution, a threshold can be set, either as the waiting time expected from a random process or as a quantile of the observed distribution. It can then be determined whether two consecutive windstorm events count as part of a (temporal) cluster. We analyze extratropical windstorms in a reanalysis dataset and compare the results of the traditional clustering measure with our new methodology. We assess what range of clustering events (in terms of duration and frequency) is covered and whether the historically known clustered seasons are detectable by the new clustering measure in the reanalysis.
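    Both measures can be illustrated with a short sketch. It assumes storm counts per season at one grid point for the dispersion statistic (using the sample variance), and storm onset times in days for the waiting-time approach; the threshold shown is the mean waiting time of a homogeneous Poisson process (period divided by event count). A quantile of the observed waiting times could be used instead. The function names and data are hypothetical, not the authors' code.

```python
from statistics import mean, variance

def dispersion(counts):
    """Variance-to-mean ratio of storm counts (e.g. per winter) at one grid point.
    About 1 for a homogeneous Poisson process; > 1 suggests clustering, < 1 regularity.
    Uses the sample variance (n - 1 denominator); needs >= 2 counts and a positive mean."""
    return variance(counts) / mean(counts)

def poisson_threshold(n_events, period):
    """Mean waiting time expected if the events came from a homogeneous Poisson process."""
    return period / n_events

def temporal_clusters(event_times, threshold):
    """Group consecutive events whose waiting time is <= threshold; keep groups of >= 2 events."""
    times = sorted(event_times)
    if not times:
        return []
    groups = [[times[0]]]
    for prev, cur in zip(times, times[1:]):
        if cur - prev <= threshold:
            groups[-1].append(cur)
        else:
            groups.append([cur])
    return [g for g in groups if len(g) >= 2]

# Hypothetical storm onset days within a 30-day window at one grid point.
days = [1, 2, 3, 10, 20, 21]
print(dispersion([0, 4, 0, 4]))
print(temporal_clusters(days, poisson_threshold(len(days), 30)))  # [[1, 2, 3], [20, 21]]
```

    Unlike the single dispersion value for the whole period, the waiting-time grouping returns dated clusters, which is what allows their variation over time to be examined.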

  10. Developing Appropriate Methods for Cost-Effectiveness Analysis of Cluster Randomized Trials

    PubMed Central

    Gomes, Manuel; Ng, Edmond S.-W.; Nixon, Richard; Carpenter, James; Thompson, Simon G.

    2012-01-01

    Aim. Cost-effectiveness analyses (CEAs) may use data from cluster randomized trials (CRTs), where the unit of randomization is the cluster, not the individual. However, most studies use analytical methods that ignore clustering. This article compares alternative statistical methods for accommodating clustering in CEAs of CRTs. Methods. Our simulation study compared the performance of statistical methods for CEAs of CRTs with 2 treatment arms. The study considered a method that ignored clustering—seemingly unrelated regression (SUR) without a robust standard error (SE)—and 4 methods that recognized clustering—SUR and generalized estimating equations (GEEs), both with robust SE, a “2-stage” nonparametric bootstrap (TSB) with shrinkage correction, and a multilevel model (MLM). The base case assumed CRTs with moderate numbers of balanced clusters (20 per arm) and normally distributed costs. Other scenarios included CRTs with few clusters, imbalanced cluster sizes, and skewed costs. Performance was reported as bias, root mean squared error (rMSE), and confidence interval (CI) coverage for estimating incremental net benefits (INBs). We also compared the methods in a case study. Results. Each method reported low levels of bias. Without the robust SE, SUR gave poor CI coverage (base case: 0.89 v. nominal level: 0.95). The MLM and TSB performed well in each scenario (CI coverage, 0.92–0.95). With few clusters, the GEE and SUR (with robust SE) had coverage below 0.90. In the case study, the mean INBs were similar across all methods, but ignoring clustering underestimated statistical uncertainty and the value of further research. Conclusions. MLMs and the TSB are appropriate analytical methods for CEAs of CRTs with the characteristics described. SUR and GEE are not recommended for studies with few clusters. PMID:22016450
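    The incremental net benefit and the two-stage bootstrap can be sketched as below. The INB is taken here as willingness-to-pay times the difference in mean effects minus the difference in mean costs. The resampling is a simplified two-stage bootstrap (clusters with replacement, then individuals within each sampled cluster) and deliberately omits the shrinkage correction the abstract mentions; the data, the willingness-to-pay value and the function names are hypothetical.

```python
import random
from statistics import mean

def inb(arm_t, arm_c, wtp):
    """Incremental net benefit: wtp * (difference in mean effect) - (difference in mean cost).
    Each arm is a list of clusters; each cluster is a list of (cost, effect) pairs."""
    def means(arm):
        people = [p for cluster in arm for p in cluster]
        return mean(c for c, _ in people), mean(e for _, e in people)
    cost_t, eff_t = means(arm_t)
    cost_c, eff_c = means(arm_c)
    return wtp * (eff_t - eff_c) - (cost_t - cost_c)

def two_stage_resample(arm, rng):
    """Stage 1: resample whole clusters with replacement.
    Stage 2: resample individuals with replacement within each sampled cluster."""
    sampled = [rng.choice(arm) for _ in arm]
    return [[rng.choice(cluster) for _ in cluster] for cluster in sampled]

def bootstrap_ci(arm_t, arm_c, wtp, reps=2000, alpha=0.05, seed=0):
    """Percentile CI for the INB; arms are resampled independently (randomization is by cluster).
    No shrinkage correction is applied in this simplified version."""
    rng = random.Random(seed)
    draws = sorted(
        inb(two_stage_resample(arm_t, rng), two_stage_resample(arm_c, rng), wtp)
        for _ in range(reps)
    )
    return draws[int(alpha / 2 * reps)], draws[int((1 - alpha / 2) * reps) - 1]

# Hypothetical (cost, effect) data: three clusters per arm.
treatment = [[(120, 0.8), (100, 0.7)], [(150, 0.9), (130, 0.85)], [(110, 0.75)]]
control = [[(90, 0.6), (95, 0.65)], [(80, 0.55)], [(100, 0.7), (85, 0.6)]]
print(inb(treatment, control, wtp=200))
print(bootstrap_ci(treatment, control, wtp=200))
```

    Resampling whole clusters first is what carries the between-cluster variability into the interval; a naive bootstrap over pooled individuals would ignore it, which the abstract links to poor CI coverage.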

  11. The Detection of Clusters with Spatial Heterogeneity

    ERIC Educational Resources Information Center

    Zhang, Zuoyi

    2011-01-01

    This thesis consists of two parts. In Chapter 2, we focus on the spatial scan statistics with overdispersion and Chapter 3 is devoted to the randomized permutation test for identifying local patterns of spatial association. The spatial scan statistic has been widely used in spatial disease surveillance and spatial cluster detection. To apply it, a…

  12. Order statistics applied to the most massive and most distant galaxy clusters

    NASA Astrophysics Data System (ADS)

    Waizmann, J.-C.; Ettori, S.; Bartelmann, M.

    2013-06-01

    In this work, we present an analytic framework for calculating the individual and joint distributions of the nth most massive or nth highest redshift galaxy cluster for a given survey characteristic, allowing us to formulate Λ cold dark matter (ΛCDM) exclusion criteria. We show that the cumulative distribution functions steepen with increasing order, giving them a higher constraining power with respect to the extreme value statistics. Additionally, we find that the order statistics in mass (being dominated by clusters at lower redshifts) is sensitive to the matter density and the normalization of the matter fluctuations, whereas the order statistics in redshift is particularly sensitive to the geometric evolution of the Universe. For a fixed cosmology, both order statistics are efficient probes of the functional shape of the mass function at the high-mass end. To allow a quick assessment of both order statistics, we provide fits as a function of the survey area that allow percentile estimation with an accuracy better than 2 per cent. Furthermore, we discuss the joint distributions in the two-dimensional case and find that, for the combination of the largest and the second-largest observation, the two are most likely to have similar values, with a broadly peaked distribution. When combining the largest observation with higher orders, a larger gap between the observations is more likely, and when combining higher orders in general, the joint probability density function peaks more strongly. Having introduced the theory, we apply the order statistical analysis to the South Pole Telescope (SPT) massive cluster sample and the metacatalogue of X-ray detected clusters of galaxies, and find that the 10 most massive clusters in the sample are consistent with ΛCDM and the Tinker mass function. 
For the order statistics in redshift, we find a discrepancy between the data and the theoretical distributions, which could in principle indicate a deviation from the standard cosmology. However, we attribute this deviation to the uncertainty in the modelling of the SPT survey selection function. In turn, by assuming the ΛCDM reference cosmology, order statistics can also be utilized for consistency checks of the completeness of the observed sample and of the modelling of the survey selection function.
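    The idea behind the nth-order distribution can be illustrated under a simplifying assumption that is not necessarily the paper's full treatment: if the number of clusters above mass m is Poisson with expectation N(>m), the nth most massive cluster lies below m exactly when fewer than n clusters exceed m, so the CDF is a Poisson sum over k = 0..n−1. The power-law N(>m) below is a hypothetical stand-in for a real mass function (it is not the Tinker mass function), and masses are in arbitrary units.

```python
import math

def cdf_nth_largest(n, expected_above):
    """P(nth largest value <= m) = P(fewer than n objects above m), assuming the count
    above m is Poisson with mean `expected_above` = N(>m)."""
    term = math.exp(-expected_above)  # k = 0 term
    total = term
    for k in range(1, n):
        term *= expected_above / k    # Poisson term for k objects above m
        total += term
    return total

def toy_expected_above(m, amplitude=100.0, slope=3.0):
    """Hypothetical N(>m) = amplitude * m**(-slope); arbitrary mass units, m > 0."""
    return amplitude * m ** (-slope)

def percentile_mass(n, q, lo=1e-3, hi=1e3, iters=200):
    """Bisection (in log space) for the mass m with P(M_(n) <= m) = q under the toy N(>m)."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if cdf_nth_largest(n, toy_expected_above(mid)) < q:
            lo = mid  # CDF too small: the percentile lies at higher mass
        else:
            hi = mid
    return math.sqrt(lo * hi)

# Median mass of the 1st and 5th most massive cluster under the toy mass function.
print(percentile_mass(1, 0.5), percentile_mass(5, 0.5))
```

    For n = 1 the sum reduces to exp(−N(>m)), the usual extreme-value form, so the median satisfies N(>m) = ln 2.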

  13. Spatial modelling and mapping of female genital mutilation in Kenya

    PubMed Central

    2014-01-01

    Background Female genital mutilation/cutting (FGM/C) is still prevalent in several communities in Kenya and other areas in Africa, as well as being practiced by some migrants from African countries living in other parts of the world. This study aimed at detecting clustering of FGM/C in Kenya, and identifying those areas within the country where women still intend to continue the practice. A broader goal of the study was to identify geographical areas where the practice continues unabated and where broad intervention strategies need to be introduced. Methods The prevalence of FGM/C was investigated using the 2008 Kenya Demographic and Health Survey (KDHS) data. The 2008 KDHS used a multistage stratified random sampling plan to select women of reproductive age (15–49 years) and asked questions concerning their FGM/C status and their support for the continuation of FGM/C. A spatial scan statistical analysis was carried out using SaTScan™ to test for statistically significant clustering of the practice of FGM/C in the country. The risk of FGM/C was also modelled and mapped using a hierarchical spatial model under the Integrated Nested Laplace approximation approach using the INLA library in R. Results The prevalence of FGM/C stood at 28.2% and an estimated 10.3% of the women interviewed indicated that they supported the continuation of FGM. On the basis of the Deviance Information Criterion (DIC), hierarchical spatial models with spatially structured random effects were found to best fit the data for both response variables considered. Age, region, rural–urban classification, education, marital status, religion, socioeconomic status and media exposure were found to be significantly associated with FGM/C. The current FGM/C status of a woman was also a significant predictor of support for the continuation of FGM/C. Spatial scan statistics confirm FGM clusters in the North-Eastern and South-Western regions of Kenya (p < 0.001). 
Conclusion This suggests that the fight against FGM/C in Kenya is not yet over. There are still deep cultural and religious beliefs to be addressed in a bid to eradicate the practice. Interventions by government and other stakeholders must address these challenges and target the identified clusters. PMID:24661558
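    The spatial scan test used via SaTScan can be illustrated with a small sketch of Kulldorff's Poisson log-likelihood ratio and its Monte Carlo p-value. This is not SaTScan's code: SaTScan generates circular (or elliptic) windows itself, whereas here the candidate zones are supplied as hypothetical lists of district indices, every district is assumed to have a positive population, and the case counts are made up. Null replicates redistribute the observed total of cases across districts in proportion to population.

```python
import math
import random

def poisson_llr(c, e, total):
    """Kulldorff's Poisson log-likelihood ratio for a zone with c observed and e expected
    cases out of `total` cases; zero unless the zone shows excess risk (c > e)."""
    if c <= e:
        return 0.0
    rest = total - c
    outside = rest * math.log(rest / (total - e)) if rest > 0 else 0.0
    return c * math.log(c / e) + outside

def relative_risk(c, e, total):
    """Risk inside the zone divided by risk outside it."""
    return (c / e) / ((total - c) / (total - e))

def max_llr(cases, pop, zones):
    """Largest LLR over the candidate zones (the scan statistic)."""
    total, total_pop = sum(cases), sum(pop)
    best = 0.0
    for zone in zones:
        c = sum(cases[i] for i in zone)
        e = total * sum(pop[i] for i in zone) / total_pop
        best = max(best, poisson_llr(c, e, total))
    return best

def scan_p_value(cases, pop, zones, reps=999, seed=1):
    """Monte Carlo p-value: rank of the observed maximum among null replicates."""
    rng = random.Random(seed)
    observed = max_llr(cases, pop, zones)
    n, total = len(pop), sum(cases)
    exceed = 0
    for _ in range(reps):
        sim = [0] * n
        for i in rng.choices(range(n), weights=pop, k=total):
            sim[i] += 1
        if max_llr(sim, pop, zones) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (reps + 1)

# Hypothetical counts for five equally populated districts.
cases, pop = [30, 5, 5, 5, 5], [100] * 5
zones = [[0], [1], [0, 1], [3, 4]]
print(scan_p_value(cases, pop, zones, reps=199), relative_risk(30, 10, 50))
```

    Because the maximum is taken over all zones in every replicate, the p-value accounts for the multiple testing implied by scanning many candidate clusters.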

  14. 20 Years Spatial-Temporal Analysis of Dengue Fever and Hemorrhagic Fever in Mexico.

    PubMed

    Hernández-Gaytán, Sendy Isarel; Díaz-Vásquez, Francisco Javier; Duran-Arenas, Luis Gerardo; López Cervantes, Malaquías; Rothenberg, Stephen J

    2017-10-01

    Dengue Fever (DF) is a human vector-borne disease and a major public health problem worldwide. In Mexico, DF and Dengue Hemorrhagic Fever (DHF) cases have increased in recent years. The aim of this study was to identify variations in the spatial distribution of DF and DHF cases over time using space-time statistical analysis and geographic information systems. Official data of DF and DHF cases were obtained in 32 states from 1995-2015. Space-time scan statistics were used to determine the space-time clusters of DF and DHF cases nationwide, and a geographic information system was used to display the location of clusters. A total of 885,748 DF cases were registered in the 32 states from 1995-2015, of which 13.4% (n = 119,174) corresponded to DHF. The most likely cluster of DF (relative risk = 25.5) contained the states of Jalisco, Colima, and Nayarit, on the Pacific coast in 2009, and the most likely cluster of DHF (relative risk = 8.5) was in the states of Chiapas, Tabasco, Campeche, Oaxaca, Veracruz, Quintana Roo, Yucatán, Puebla, Morelos, and Guerrero principally on the Gulf coast over 2006-2015. The geographic distribution of DF and DHF cases has widened in recent years, and cases are significantly clustered in two coastal areas (Pacific and Gulf of Mexico). This provides the basis for further investigation of risk factors as well as interventions in specific areas. Copyright © 2018 IMSS. Published by Elsevier Inc. All rights reserved.

  15. Multimorbidity and health-related quality of life (HRQoL) in a nationally representative population sample: implications of count versus cluster method for defining multimorbidity on HRQoL.

    PubMed

    Wang, Lili; Palmer, Andrew J; Cocker, Fiona; Sanderson, Kristy

    2017-01-09

    No universally accepted definition of multimorbidity (MM) exists, and the implications of different definitions have not been explored. This study examined the performance of the count and cluster definitions of multimorbidity with respect to the sociodemographic profile and health-related quality of life (HRQoL) in a general population. Data were derived from the nationally representative 2007 Australian National Survey of Mental Health and Wellbeing (n = 8841). HRQoL scores were measured using the Assessment of Quality of Life (AQoL-4D) instrument. The simple count (2+ and 3+ conditions) and hierarchical cluster methods were used to define and identify clusters of multimorbidity. Linear regression was used to assess the associations between HRQoL and multimorbidity as defined by the different methods. Defining multimorbidity by the count method yielded prevalences of 26% (MM2+) and 10.1% (MM3+). Statistically significant clusters identified through hierarchical cluster analysis included heart or circulatory conditions (CVD)/arthritis (cluster-1, 9%) and major depressive disorder (MDD)/anxiety (cluster-2, 4%). A sensitivity analysis supported the stability of the clusters obtained by hierarchical clustering. The sociodemographic profiles were similar between MM2+, MM3+ and cluster-1, but differed from that of cluster-2. HRQoL was negatively associated with MM2+ (β: -0.18, SE: 0.01, p < 0.001), MM3+ (β: -0.23, SE: 0.02, p < 0.001), cluster-1 (β: -0.10, SE: 0.01, p < 0.001) and cluster-2 (β: -0.36, SE: 0.01, p < 0.001). Our findings confirm the existence of an inverse relationship between multimorbidity and HRQoL in the Australian population and, from this head-to-head comparison, indicate that the hierarchical clustering approach is valid when the outcome of interest is HRQoL. Moreover, a simple count fails to identify whether specific conditions of interest are driving poorer HRQoL. 
Researchers should exercise caution when selecting a definition of multimorbidity because it may significantly influence the study outcomes.
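    The count definition (MM2+, MM3+) is simple enough to show directly, together with why it hides which conditions co-occur. The sketch below assumes each respondent is represented as a set of condition labels; the optional survey weights, the function names and the example data are hypothetical, and the hierarchical clustering used in the study is not reproduced here.

```python
from collections import Counter
from itertools import combinations

def count_prevalence(people, k, weights=None):
    """Share of people with >= k conditions (MM2+ for k=2, MM3+ for k=3).
    `weights` are optional survey weights (hypothetical); unweighted if omitted."""
    weights = weights or [1.0] * len(people)
    hit = sum(w for conds, w in zip(people, weights) if len(conds) >= k)
    return hit / sum(weights)

def pair_counts(people):
    """How often each pair of conditions co-occurs; the count definition alone discards this."""
    counts = Counter()
    for conds in people:
        counts.update(combinations(sorted(conds), 2))
    return counts

# Hypothetical respondents.
people = [{"arthritis", "cvd"}, {"mdd"}, {"mdd", "anxiety", "cvd"}, set()]
print(count_prevalence(people, 2), count_prevalence(people, 3))  # 0.5 0.25
print(pair_counts(people).most_common(2))
```

    Two respondents can both count as MM2+ while sharing no condition at all, which is the information loss the abstract points to.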

  16. Dispersed or clustered housing for adults with intellectual disability: a systematic review.

    PubMed

    Mansell, Jim; Beadle-Brown, Julie

    2009-12-01

    The purpose of this review was to evaluate the available research on the quality and costs of dispersed community-based housing when compared with clustered housing. Searches against specified criteria yielded 19 papers based on 10 studies presenting data comparing dispersed housing with some kind of clustered housing (village communities, residential campuses, or clusters of houses). The studies reported the experience of nearly 2,500 people from four different countries. In five of eight quality of life domains there were no studies reporting benefits of clustered settings. In respect of interpersonal relations, emotional, and physical well-being, clustered settings had some advantages. However, in many of these cases the better results refer only to village communities and not to campus housing or clustered housing. In terms of costs, clustered housing was usually less expensive because of lower staffing levels. In two of the three studies that examined costs controlling for user characteristics, there was no statistically significant difference. Dispersed housing appears to be superior to clustered housing on the majority of quality indicators studied. The only exception to this is that village communities for people with less severe disabilities have some benefits; this is not, however, a model which can be feasibly provided for everyone. Clustered housing is usually less expensive than dispersed housing but this is because it provides fewer staff hours per person. There is no evidence that clustered housing can deliver the same quality of life as dispersed housing at a lower cost.

  17. Geovisual analytics to enhance spatial scan statistic interpretation: an analysis of U.S. cervical cancer mortality

    PubMed Central

    Chen, Jin; Roth, Robert E; Naito, Adam T; Lengerich, Eugene J; MacEachren, Alan M

    2008-01-01

    Background Kulldorff's spatial scan statistic and its software implementation – SaTScan – are widely used for detecting and evaluating geographic clusters. However, two issues make using the method and interpreting its results non-trivial: (1) the method lacks cartographic support for understanding the clusters in geographic context and (2) results from the method are sensitive to parameter choices related to cluster scaling (abbreviated as scaling parameters), but the system provides no direct support for making these choices. We employ both established and novel geovisual analytics methods to address these issues and to enhance the interpretation of SaTScan results. We demonstrate our geovisual analytics approach in a case study analysis of cervical cancer mortality in the U.S. Results We address the first issue by providing an interactive visual interface to support the interpretation of SaTScan results. Our research to address the second issue prompted a broader discussion about the sensitivity of SaTScan results to parameter choices. Sensitivity has two components: (1) the method can identify clusters that, while being statistically significant, have heterogeneous contents comprised of both high-risk and low-risk locations and (2) the method can identify clusters that are unstable in location and size as the spatial scan scaling parameter is varied. To investigate cluster result stability, we conducted multiple SaTScan runs with systematically selected parameters. The results, when scanning a large spatial dataset (e.g., U.S. data aggregated by county), demonstrate that no single spatial scan scaling value is known to be optimal to identify clusters that exist at different scales; instead, multiple scans that vary the parameters are necessary. We introduce a novel method of measuring and visualizing reliability that facilitates identification of homogeneous clusters that are stable across analysis scales. 
Finally, we propose a logical approach to proceed through the analysis of SaTScan results. Conclusion The geovisual analytics approach described in this manuscript facilitates the interpretation of spatial cluster detection methods by providing cartographic representation of SaTScan results and by providing visualization methods and tools that support selection of SaTScan parameters. Our methods distinguish between heterogeneous and homogeneous clusters and assess the stability of clusters across analytic scales. Method We analyzed the cervical cancer mortality data for the United States aggregated by county between 2000 and 2004. We ran SaTScan on the dataset fifty times with different parameter choices. Our geovisual analytics approach couples SaTScan with our visual analytic platform, allowing users to interactively explore and compare SaTScan results produced by different parameter choices. The Standardized Mortality Ratio and reliability scores are visualized for all the counties to identify stable, homogeneous clusters. We evaluated our analysis result by comparing it to that produced by other independent techniques including the Empirical Bayes Smoothing and Kafadar spatial smoother methods. The geovisual analytics approach introduced here is developed and implemented in our Java-based Visual Inquiry Toolkit. PMID:18992163

  18. Geovisual analytics to enhance spatial scan statistic interpretation: an analysis of U.S. cervical cancer mortality.

    PubMed

    Chen, Jin; Roth, Robert E; Naito, Adam T; Lengerich, Eugene J; Maceachren, Alan M

    2008-11-07

    Kulldorff's spatial scan statistic and its software implementation - SaTScan - are widely used for detecting and evaluating geographic clusters. However, two issues make using the method and interpreting its results non-trivial: (1) the method lacks cartographic support for understanding the clusters in geographic context and (2) results from the method are sensitive to parameter choices related to cluster scaling (abbreviated as scaling parameters), but the system provides no direct support for making these choices. We employ both established and novel geovisual analytics methods to address these issues and to enhance the interpretation of SaTScan results. We demonstrate our geovisual analytics approach in a case study analysis of cervical cancer mortality in the U.S. We address the first issue by providing an interactive visual interface to support the interpretation of SaTScan results. Our research to address the second issue prompted a broader discussion about the sensitivity of SaTScan results to parameter choices. Sensitivity has two components: (1) the method can identify clusters that, while being statistically significant, have heterogeneous contents comprised of both high-risk and low-risk locations and (2) the method can identify clusters that are unstable in location and size as the spatial scan scaling parameter is varied. To investigate cluster result stability, we conducted multiple SaTScan runs with systematically selected parameters. The results, when scanning a large spatial dataset (e.g., U.S. data aggregated by county), demonstrate that no single spatial scan scaling value is known to be optimal to identify clusters that exist at different scales; instead, multiple scans that vary the parameters are necessary. We introduce a novel method of measuring and visualizing reliability that facilitates identification of homogeneous clusters that are stable across analysis scales. 
Finally, we propose a logical approach to proceed through the analysis of SaTScan results. The geovisual analytics approach described in this manuscript facilitates the interpretation of spatial cluster detection methods by providing cartographic representation of SaTScan results and by providing visualization methods and tools that support selection of SaTScan parameters. Our methods distinguish between heterogeneous and homogeneous clusters and assess the stability of clusters across analytic scales. We analyzed the cervical cancer mortality data for the United States aggregated by county between 2000 and 2004. We ran SaTScan on the dataset fifty times with different parameter choices. Our geovisual analytics approach couples SaTScan with our visual analytic platform, allowing users to interactively explore and compare SaTScan results produced by different parameter choices. The Standardized Mortality Ratio and reliability scores are visualized for all the counties to identify stable, homogeneous clusters. We evaluated our analysis result by comparing it to that produced by other independent techniques including the Empirical Bayes Smoothing and Kafadar spatial smoother methods. The geovisual analytics approach introduced here is developed and implemented in our Java-based Visual Inquiry Toolkit.

  19. Role of childhood traumatic experience in personality disorders in China.

    PubMed

    Zhang, TianHong; Chow, Annabelle; Wang, LanLan; Dai, YunFei; Xiao, ZePing

    2012-08-01

    There has been no large-scale examination of the association between types of childhood abuse and personality disorders (PDs) in China using standardized assessment tools and the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) criteria. Hence, this study aimed to explore the relationship between retrospective reports of various types of childhood maltreatment and current DSM-IV PDs in a clinical population in Shanghai, China. One thousand four hundred two subjects were randomly sampled from the Shanghai Psychological Counselling Centre. PDs were assessed using the Personality Diagnostic Questionnaire, Fourth Edition Plus. Participants were also interviewed using the Structured Clinical Interview for DSM-IV axis II. The Child Trauma Questionnaire (CTQ) was used to assess childhood maltreatment in 5 domains (emotional abuse, physical abuse, sexual abuse, emotional neglect, and physical neglect). According to Pearson correlations, childhood maltreatment had a strong association with most PDs. Subsequently, using partial correlations, significant relationships were also demonstrated between cluster B PDs and all the traumatic factors except physical neglect. The strongest positive correlation was found between cluster B PD and CTQ total scores (r = .312, P < .01). Using the Kruskal-Wallis rank sum test, significant differences among 4 groups of subjects (clusters A, B, and C PD and non-PD) were found in terms of emotional abuse (χ² = 34.864, P < .01), physical abuse (χ² = 14.996, P < .05), sexual abuse (χ² = 9.211, P < .05), and emotional neglect (χ² = 17.987, P < .01). Stepwise regression analysis indicated that emotional abuse and emotional neglect were predictive for clusters A and B PD, and sexual abuse was highly predictive for cluster B PD; only emotional neglect was predictive for cluster C PD. Early traumatic experiences are strongly related to the development of PDs. 
The effects of childhood maltreatment in the 3 clusters of PDs are different. Childhood trauma has the most significant impact on cluster B PD. Copyright © 2012 Elsevier Inc. All rights reserved.
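    The Kruskal-Wallis statistic reported above (as χ²) can be computed with a short sketch: pool the scores, assign average ranks to ties, and compare rank sums across groups, applying the usual tie correction. The result is referred to a chi-square distribution with k − 1 degrees of freedom; for 4 groups (df = 3) the 5% critical value is about 7.815. The function names and the example scores are hypothetical, not the study's data.

```python
from collections import Counter

def average_ranks(values):
    """1-based ranks of `values`, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis(*groups):
    """Tie-corrected Kruskal-Wallis H and its degrees of freedom (k - 1)."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum * rank_sum / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * h - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("H is undefined when all values are tied")
    return h / correction, len(groups) - 1

# Hypothetical CTQ emotional-abuse scores for clusters A, B, C and non-PD subjects.
h, df = kruskal_wallis([9, 11, 12, 14], [13, 15, 16, 18], [8, 10, 11, 12], [6, 7, 8, 9])
print(h, df, h > 7.815)
```

    Working on ranks rather than raw scores makes the test robust to the skewed, bounded distributions typical of questionnaire totals.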

  20. EXPLORING ANTICORRELATIONS AND LIGHT ELEMENT VARIATIONS IN NORTHERN GLOBULAR CLUSTERS OBSERVED BY THE APOGEE SURVEY

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mészáros, Szabolcs; Martell, Sarah L.; Shetrone, Matthew

    We investigate the light-element behavior of red giant stars in northern globular clusters (GCs) observed by the SDSS-III Apache Point Observatory Galactic Evolution Experiment. We derive abundances of 9 elements (Fe, C, N, O, Mg, Al, Si, Ca, and Ti) for 428 red giant stars in 10 GCs. The intrinsic abundance range relative to measurement errors is examined, and the well-known C–N and Mg–Al anticorrelations are explored using an extreme-deconvolution code for the first time in a consistent way. We find that Mg and Al drive the population membership in most clusters, except in M107 and M71, the two most metal-rich clusters in our study, where the grouping is most sensitive to N. We also find a diversity in the abundance distributions, with some clusters exhibiting clear abundance bimodalities (for example M3 and M53) while others show extended distributions. The spread of Al abundances increases significantly as cluster average metallicity decreases as previously found by other works, which we take as evidence that low metallicity, intermediate mass AGB polluters were more common in the more metal-poor clusters. The statistically significant correlation of [Al/Fe] with [Si/Fe] in M15 suggests that 28Si leakage has occurred in this cluster. We also present C, N, and O abundances for stars cooler than 4500 K and examine the behavior of A(C+N+O) in each cluster as a function of temperature and [Al/Fe]. The scatter of A(C+N+O) is close to its estimated uncertainty in all clusters and independent of stellar temperature. A(C+N+O) exhibits small correlations and anticorrelations with [Al/Fe] in M3 and M13, but we cannot be certain about these relations given the size of our abundance uncertainties. Star-to-star variations of α-element (Si, Ca, Ti) abundances are comparable to our estimated errors in all clusters.

  1. Exploring Anticorrelations and Light Element Variations in Northern Globular Clusters Observed by the APOGEE Survey

    NASA Astrophysics Data System (ADS)

    Mészáros, Szabolcs; Martell, Sarah L.; Shetrone, Matthew; Lucatello, Sara; Troup, Nicholas W.; Bovy, Jo; Cunha, Katia; García-Hernández, Domingo A.; Overbeek, Jamie C.; Allende Prieto, Carlos; Beers, Timothy C.; Frinchaboy, Peter M.; García Pérez, Ana E.; Hearty, Fred R.; Holtzman, Jon; Majewski, Steven R.; Nidever, David L.; Schiavon, Ricardo P.; Schneider, Donald P.; Sobeck, Jennifer S.; Smith, Verne V.; Zamora, Olga; Zasowski, Gail

    2015-05-01

    We investigate the light-element behavior of red giant stars in northern globular clusters (GCs) observed by the SDSS-III Apache Point Observatory Galactic Evolution Experiment. We derive abundances of 9 elements (Fe, C, N, O, Mg, Al, Si, Ca, and Ti) for 428 red giant stars in 10 GCs. The intrinsic abundance range relative to measurement errors is examined, and the well-known C-N and Mg-Al anticorrelations are explored using an extreme-deconvolution code for the first time in a consistent way. We find that Mg and Al drive the population membership in most clusters, except in M107 and M71, the two most metal-rich clusters in our study, where the grouping is most sensitive to N. We also find a diversity in the abundance distributions, with some clusters exhibiting clear abundance bimodalities (for example M3 and M53) while others show extended distributions. The spread of Al abundances increases significantly as cluster average metallicity decreases as previously found by other works, which we take as evidence that low metallicity, intermediate mass AGB polluters were more common in the more metal-poor clusters. The statistically significant correlation of [Al/Fe] with [Si/Fe] in M15 suggests that 28Si leakage has occurred in this cluster. We also present C, N, and O abundances for stars cooler than 4500 K and examine the behavior of A(C+N+O) in each cluster as a function of temperature and [Al/Fe]. The scatter of A(C+N+O) is close to its estimated uncertainty in all clusters and independent of stellar temperature. A(C+N+O) exhibits small correlations and anticorrelations with [Al/Fe] in M3 and M13, but we cannot be certain about these relations given the size of our abundance uncertainties. Star-to-star variations of α-element (Si, Ca, Ti) abundances are comparable to our estimated errors in all clusters.

  2. Statistical analysis of 4 types of neck whiplash injuries based on classical meridian theory.

    PubMed

    Chen, Yemeng; Zhao, Yan; Xue, Xiaolin; Li, Hui; Wu, Xiuyan; Zhang, Qunce; Zheng, Xin; Wang, Tianfang

    2015-01-01

    As one component of the Chinese medicine meridian system, the meridian sinew (Jingjin, tendino-musculo) is described specifically in relation to acupuncture treatment of the musculoskeletal system because of its dynamic attributes and tender-point correlations. In recent decades, the therapeutic importance of the sinew meridian has been reassessed in clinical application. Based on this theory, the authors have established therapeutic strategies for acupuncture treatment of Whiplash-Associated Disorders (WAD) by categorizing neck symptom presentations into four types. The advantage of this new system is that it makes it much easier for the clinician to find effective acupuncture points. This study attempts to prove the significance of the proposed therapeutic strategies by analyzing data collected from a clinical survey of various WAD using unsupervised statistical methods, such as correlation analysis, factor analysis, and cluster analysis. The clinical survey data successfully verified discrete characteristics of the four neck syndromes, based on range of motion (ROM) and tender point location findings. A summary of the relationships among the symptoms of the four neck syndromes showed statistically significant correlation coefficients (P < 0.01 or P < 0.05), especially with regard to ROM. Furthermore, factor and cluster analyses yielded a total of 11 categories of general symptoms, which implies that syndrome factors are more related to the Liver, as originally described in classical theory. The hypothesis of meridian sinew syndromes in WAD is clearly supported by the statistical analysis of the clinical trials. This new discovery should be beneficial in improving therapeutic outcomes.

  3. Cluster mislocation in kinematic Sunyaev-Zel'dovich effect extraction

    NASA Astrophysics Data System (ADS)

    Calafut, Victoria; Bean, Rachel; Yu, Byeonghee

    2017-12-01

    We investigate the impact of a variety of analysis assumptions that influence cluster identification and location on the kinematic Sunyaev-Zel'dovich (kSZ) pairwise momentum signal and covariance estimation. Photometric and spectroscopic galaxy tracers from SDSS, WISE, and DECaLs, spanning redshifts 0.05

  4. Risk factors for pulmonary cavitation in tuberculosis patients from China.

    PubMed

    Zhang, Liqun; Pang, Yu; Yu, Xia; Wang, Yufeng; Lu, Jie; Gao, Mengqiu; Huang, Hairong; Zhao, Yanlin

    2016-10-12

    Pulmonary cavitation is one of the most frequently observed clinical characteristics of tuberculosis (TB). The objective of this study was to investigate the potential risk factors associated with cavitary TB in China. A total of 385 smear-positive patients were enrolled in the study, including 192 (49.9%) patients with cavitation as determined by radiographic findings. Statistical analysis revealed that the proportion of patients with diabetes was significantly higher in the cavitary group than in the non-cavitary group (adjusted odds ratio [OR] 12.08, 95% confidence interval [CI] 5.75–25.35, P < 0.001). Similarly, the proportion of individuals with multidrug-resistant TB was higher in the cavitary group than in the non-cavitary group (adjusted OR 2.48, 95% CI 1.52–4.07, P < 0.001). Of the 385 Mycobacterium tuberculosis strains, 330 (85.7%) were classified as the Beijing genotype, including 260 strains belonging to the modern Beijing sublineage and 70 to the ancient Beijing sublineage. In addition, 80 and 31 strains belonged to large and small clusters, respectively. Cavitary disease was observed more frequently among the large clusters than among the small clusters (P = 0.037). In conclusion, our findings demonstrate that diabetes and multidrug resistance are risk factors associated with cavitary TB. In addition, there was no significant difference in cavitary presentation between patients infected with Beijing genotype strains and those infected with non-Beijing genotype strains.
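
The crude odds ratio behind figures like these can be sketched as follows. This is a minimal illustration, not the authors' analysis: the study reports adjusted ORs from a multivariable model, whereas this sketch computes only an unadjusted OR from a hypothetical 2×2 table, with a Woolf (log-scale) 95% confidence interval. The function name and the counts are assumptions for illustration.

```python
import math


def odds_ratio_woolf(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Woolf (log-scale) confidence interval.

    Hypothetical 2x2 layout:
                       exposed   not exposed
        cavitary          a           b
        non-cavitary      c           d
    OR = (a / b) / (c / d) = a*d / (b*c)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive; apply a continuity correction first")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study.
or_, lo, hi = odds_ratio_woolf(10, 20, 5, 40)
print(f"OR {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Because the Woolf interval is symmetric on the log scale, the product of its lower and upper limits equals OR², which is a quick sanity check on any reported crude interval.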

  5. [Mortality from Suicide in the Municipalities of Mainland Portugal: Spatio-Temporal Evolution between 1980 and 2015].

    PubMed

    Loureiro, Adriana; Almendra, Ricardo; Costa, Cláudia; Santana, Paula

    2018-01-31

    Suicide is considered a public health priority. It is a complex phenomenon resulting from the interaction of several factors, which do not depend solely on individual conditions. This study analyzes the spatio-temporal evolution of suicide mortality between 1980 and 2015, identifying areas of high risk, and their variation, in the 278 municipalities of mainland Portugal. Based on the number of self-inflicted injuries and deaths from suicide and on the resident population, the spatio-temporal evolution of the suicide mortality rate was assessed via: i) a Poisson joinpoint regression model, and ii) spatio-temporal clustering methods. The suicide mortality rate showed statistically significant increases over three periods (1980–1984, 1999–2002 and 2006–2015) and statistically significant decreases over two periods (1984–1995 and 1995–1999). The spatio-temporal analysis identified five clusters of high suicide risk (relative risk > 1) and four clusters of low suicide risk (relative risk < 1). The periods when suicide mortality increased seem to overlap with times of economic and financial instability. The geographical pattern of suicide risk has changed: suicide rates in the municipalities of the Center and North are now more similar to those in the South, thus increasing the ruralization of suicide. Between 1980 and 2015, the spatio-temporal pattern of suicide mortality changed; suicide mortality has followed a growing trend since 2006, and the risk is higher in rural areas.
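
Joinpoint models usually summarize each segment by its annual percent change, APC = 100 × (e^β − 1), where β is the slope of log(rate) against calendar year within the segment. The sketch below shows only that last step. It assumes a single, pre-chosen segment and uses ordinary least squares on log rates rather than the Poisson maximum-likelihood fit and joinpoint search used in the study; the function name and the example rates are hypothetical.

```python
import math


def annual_percent_change(years, rates):
    """Annual percent change (APC) for one segment.

    Fits log(rate) = alpha + beta * year by ordinary least squares and
    returns 100 * (exp(beta) - 1). Rates must be strictly positive.
    """
    n = len(years)
    if n != len(rates) or n < 2:
        raise ValueError("need at least two matching (year, rate) pairs")
    logs = [math.log(r) for r in rates]
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    beta = sxy / sxx  # slope on the log scale
    return 100.0 * (math.exp(beta) - 1.0)


# Hypothetical segment: rates rising by exactly 5% per year, 2006 to 2015.
years = list(range(2006, 2016))
rates = [10.0 * 1.05 ** (y - 2006) for y in years]
print(round(annual_percent_change(years, rates), 6))  # 5.0
```

A negative APC marks a period of decline. In a full joinpoint analysis, the locations of the breakpoints themselves are also estimated and tested.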

  6. Cluster-based analysis improves predictive validity of spike-triggered receptive field estimates

    PubMed Central

    Malone, Brian J.

    2017-01-01

    Spectrotemporal receptive field (STRF) characterization is a central goal of auditory physiology. STRFs are often approximated by the spike-triggered average (STA), which reflects the average stimulus preceding a spike. In many cases, the raw STA is subjected to a threshold defined by gain values expected by chance. However, such correction methods have not been universally adopted, and the consequences of specific gain-thresholding approaches have not been investigated systematically. Here, we evaluate two classes of statistical correction techniques, using the resulting STRF estimates to predict responses to a novel validation stimulus. The first, more traditional technique eliminated STRF pixels (time-frequency bins) with gain values expected by chance. This correction method yielded significant increases in prediction accuracy, including when the threshold setting was optimized for each unit. The second technique was a two-step thresholding procedure wherein clusters of contiguous pixels surviving an initial gain threshold were then subjected to a cluster mass threshold based on summed pixel values. This approach significantly improved upon even the best gain-thresholding techniques. Additional analyses suggested that allowing threshold settings to vary independently for excitatory and inhibitory subfields of the STRF resulted in only marginal additional gains, at best. In summary, augmenting reverse correlation techniques with principled statistical correction choices increased prediction accuracy by over 80% for multi-unit STRFs and by over 40% for single-unit STRFs, furthering the interpretational relevance of the recovered spectrotemporal filters for auditory systems analysis. PMID:28877194
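
The two-step procedure described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the author's code. The STA is a plain list of lists (rows are frequency bins, columns are time lags). Clusters are 4-connected groups of supra-threshold pixels of the same sign, so excitatory and inhibitory regions are handled separately. Cluster mass is the summed absolute gain. Both thresholds are supplied by the caller rather than derived from chance-level gain distributions as in the study.

```python
from collections import deque


def cluster_mass_threshold(sta, gain_thresh, mass_thresh):
    """Two-step thresholding of a 2-D spike-triggered average.

    Step 1: keep pixels with |gain| > gain_thresh.
    Step 2: group surviving pixels of the same sign into 4-connected
            clusters; keep a cluster only if its mass (sum of |gain|)
            exceeds mass_thresh.
    Returns a new 2-D list with every rejected pixel set to 0.0.
    """
    n_rows, n_cols = len(sta), len(sta[0])
    keep = [[False] * n_cols for _ in range(n_rows)]
    seen = [[False] * n_cols for _ in range(n_rows)]

    def passes(r, c, positive):
        # Supra-threshold pixel with the requested sign.
        v = sta[r][c]
        return abs(v) > gain_thresh and (v > 0) == positive

    for r0 in range(n_rows):
        for c0 in range(n_cols):
            if seen[r0][c0] or abs(sta[r0][c0]) <= gain_thresh:
                continue
            positive = sta[r0][c0] > 0
            cluster, queue = [], deque([(r0, c0)])
            seen[r0][c0] = True
            while queue:  # breadth-first flood fill
                r, c = queue.popleft()
                cluster.append((r, c))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < n_rows and 0 <= cc < n_cols
                            and not seen[rr][cc] and passes(rr, cc, positive)):
                        seen[rr][cc] = True
                        queue.append((rr, cc))
            mass = sum(abs(sta[r][c]) for r, c in cluster)
            if mass > mass_thresh:
                for r, c in cluster:
                    keep[r][c] = True

    return [[sta[r][c] if keep[r][c] else 0.0 for c in range(n_cols)]
            for r in range(n_rows)]
```

In practice, the two thresholds would be calibrated from gain values expected by chance. One possible approach, assumed here rather than taken from the paper, is to use STAs computed from shuffled spike trains. That calibration step is not shown.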

  7. Improved dengue fever prevention through innovative intervention methods in the city of Salto, Uruguay.

    PubMed

    Basso, César; García da Rosa, Elsa; Romero, Sonnia; González, Cristina; Lairihoy, Rosario; Roche, Ingrid; Caffera, Ruben M; da Rosa, Ricardo; Calfani, Marisel; Alfonso-Sierra, Eduardo; Petzold, Max; Kroeger, Axel; Sommerfeld, Johannes

    2015-02-01

    Uruguay is located at the southern border of Aedes aegypti distribution on the South American sub-continent. The reported dengue cases in the country are all imported from surrounding countries. One of the cities at higher risk of local dengue transmission is Salto, a border city with heavy traffic from dengue endemic areas. We completed an intervention study using a cluster randomized trial design in 20 randomly selected 'clusters' in Salto. The clusters were located in neighborhoods differing in geography and in economic, cultural and social characteristics. Entomological surveys were carried out to measure the impact of the intervention on vector densities. Through a participatory process involving all stakeholders, an appropriate ecosystem management intervention was defined. Residents collected the abundant small water-holding containers, and the Ministry of Public Health and the Municipality of Salto were responsible for collecting and eliminating them. Additional vector breeding places were large water tanks; these were either altered so that they could no longer hold water or covered so that mosquitoes could not oviposit in them. The response from the community and national programme managers was encouraging. The intervention showed opportunities for cost savings and for reducing dengue vector densities (although not to statistically significant levels). The observed low vector density limits the potential reduction due to the intervention. A larger sample size is needed to obtain a statistically significant difference. © The author 2015. The World Health Organization has granted Oxford University Press permission for the reproduction of this article.

  8. Assessing market uncertainty by means of a time-varying intermittency parameter for asset price fluctuations

    NASA Astrophysics Data System (ADS)

    Rypdal, Martin; Sirnes, Espen; Løvsletten, Ola; Rypdal, Kristoffer

    2013-08-01

    Maximum likelihood estimation techniques for multifractal processes are applied to high-frequency data in order to quantify intermittency in the fluctuations of asset prices. From time records as short as one month these methods permit extraction of a meaningful intermittency parameter λ characterising the degree of volatility clustering. We can therefore study the time evolution of volatility clustering and test the statistical significance of this variability. By analysing data from the Oslo Stock Exchange, and comparing the results with the investment grade spread, we find that the estimates of λ are lower at times of high market uncertainty.

  9. Exploring the individual patterns of spiritual well-being in people newly diagnosed with advanced cancer: a cluster analysis.

    PubMed

    Bai, Mei; Dixon, Jane; Williams, Anna-Leila; Jeon, Sangchoon; Lazenby, Mark; McCorkle, Ruth

    2016-11-01

    Research shows that spiritual well-being correlates positively with quality of life (QOL) for people with cancer, whereas contradictory findings are frequently reported with respect to the differentiated associations between dimensions of spiritual well-being, namely peace, meaning and faith, and QOL. This study aimed to examine individual patterns of spiritual well-being among patients newly diagnosed with advanced cancer. Cluster analysis was based on the items of the 12-item Functional Assessment of Chronic Illness Therapy-Spiritual Well-Being Scale at Time 1. A combination of hierarchical and k-means (non-hierarchical) clustering methods was employed to jointly determine the number of clusters. Self-rated health, depressive symptoms, peace, meaning and faith, and overall QOL were compared at Time 1 and Time 2. Hierarchical and k-means clustering methods both suggested four clusters. Comparison of the four clusters supported statistically significant and clinically meaningful differences in QOL outcomes among clusters while revealing contrasting relations of faith with QOL. Cluster 1, Cluster 3, and Cluster 4 represented high, medium, and low levels of overall QOL, respectively, with correspondingly high, medium, and low levels of peace, meaning, and faith. Cluster 2 was distinguished from other clusters by its medium levels of overall QOL, peace, and meaning and its low level of faith. This study provides empirical support for individual differences in response to a newly diagnosed cancer and brings into focus conceptual and methodological challenges associated with the measurement of spiritual well-being, which may partly contribute to the attenuated relation between faith and QOL.

  10. Molecular dynamics study of Al and Ni₃Al sputtering by Al clusters bombardment

    NASA Astrophysics Data System (ADS)

    Zhurkin, Eugeni E.; Kolesnikov, Anton S.

    2002-06-01

    The sputtering of Al and Ni₃Al(100) surfaces induced by the impact of Al ions and Al_N clusters (N = 2, 4, 6, 9, 13, 55) with energies of 100 and 500 eV/atom is studied at the atomic scale by means of classical molecular dynamics (MD). The MD code we used implements a many-body tight-binding potential splined to the ZBL potential at short distances. Special attention has been paid to modelling dense cascades: we used large computational cells with lateral periodic and damped boundary conditions. In addition, long simulation times (10–25 ps) and representative statistics (up to 1000 runs for each case) were used. The total sputtering yields, the energy and time spectra of sputtered particles, and the preferential sputtering of the compound target were analyzed in both the linear and non-linear regimes. A significant "cluster enhancement" of the sputtering yield was found for cluster sizes N ⩾ 13. In parallel, we estimated collision cascade features as a function of cluster size in order to interpret the nature of the observed non-linear effects.

  11. Permutation Tests of Hierarchical Cluster Analyses of Carrion Communities and Their Potential Use in Forensic Entomology.

    PubMed

    van der Ham, Joris L

    2016-05-19

    Forensic entomologists can use carrion communities' ecological succession data to estimate the postmortem interval (PMI). Permutation tests of hierarchical cluster analyses of these data provide a conceptual method to estimate part of the PMI, the post-colonization interval (post-CI). This multivariate approach produces a baseline of statistically distinct clusters that reflect changes in the carrion community composition during the decomposition process. Carrion community samples of unknown post-CIs are compared with these baseline clusters to estimate the post-CI. In this short communication, I use data from previously published studies to demonstrate the conceptual feasibility of this multivariate approach. Analyses of these data produce a series of significantly distinct clusters, which represent carrion communities during 1- to 20-day periods of the decomposition process. For 33 carrion community samples, collected over an 11-day period, this approach correctly estimated the post-CI within an average range of 3.1 days. © The Authors 2016. Published by Oxford University Press on behalf of Entomological Society of America. All rights reserved.
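
A generic permutation test of whether two clusters of community samples are distinct might look like the sketch below. It is an assumption-laden illustration rather than the procedure in the paper. Each sample is a vector (for example, taxon abundances), and the two-group split is taken as given (for example, from cutting a dendrogram). The test statistic is the mean between-group Euclidean distance minus the mean within-group distance. The p-value counts how often randomly shuffled labels separate the groups at least as well. All names are hypothetical.

```python
import math
import random
from itertools import combinations


def separation(samples, labels):
    """Mean between-group distance minus mean within-group distance.

    Assumes two groups, each with at least two members.
    """
    within, between = [], []
    for i, j in combinations(range(len(samples)), 2):
        d = math.dist(samples[i], samples[j])
        (within if labels[i] == labels[j] else between).append(d)
    return sum(between) / len(between) - sum(within) / len(within)


def permutation_p_value(samples, labels, n_perm=999, seed=0):
    """One-sided permutation p-value for the observed two-group separation."""
    rng = random.Random(seed)
    observed = separation(samples, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if separation(samples, shuffled) >= observed:
            hits += 1
    # Count the observed labelling itself, so p is never exactly 0.
    return (hits + 1) / (n_perm + 1)


# Hypothetical one-dimensional "communities": two tight, distant groups.
samples = [(0.0,), (0.1,), (0.2,), (0.3,), (10.0,), (10.1,), (10.2,), (10.3,)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
print(permutation_p_value(samples, labels))  # expected to be small
```

With two groups of four, only 2 of the 70 possible label arrangements (the observed one and its mirror image) reach the observed separation, so the p-value lands near 2/70. Interleaved labels give a large p-value.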

  12. Using data mining to segment healthcare markets from patients' preference perspectives.

    PubMed

    Liu, Sandra S; Chen, Jie

    2009-01-01

    This paper aims to provide an example of how to use data mining techniques to identify patient segments with respect to preferences for healthcare attributes and demographic characteristics. Data were derived from a number of individuals who received in-patient care at a health network in 2006. Data mining and conventional hierarchical clustering with average linkage and Pearson correlation procedures are employed and compared to show how well each procedure determines segmentation variables. Data mining tools identified three differentiable segments by means of cluster analysis. These three clusters have significantly different demographic profiles. Compared with traditional statistical methods, data mining proved to be an efficient and effective tool for market segmentation. When numerous cluster variables are involved, researchers and practitioners need to incorporate factor analysis to reduce the number of variables so that the clusters can be clearly and meaningfully understood. Interest in, and applications of, data mining are increasing in many businesses. However, this technology is seldom applied to healthcare customer experience management. The paper shows that efficient and effective application of data mining methods can aid the understanding of patient healthcare preferences.

  13. High-resolution Spectroscopic Abundances of Red Giant Branch Stars in NGC 6584 and NGC 7099

    NASA Astrophysics Data System (ADS)

    O’Malley, Erin M.; Chaboyer, Brian

    2018-04-01

    We obtain high-resolution spectra of red giant branch stars in NGC 6584 and NGC 7099 to perform a detailed abundance analysis. We confirm cluster membership for these stars based on consistent radial velocities measured in this study and small pixel offsets between the observations of Sarajedini et al. and Piotto et al. We find mean metallicities of [Fe/H] = −1.53 ± 0.08 dex and [Fe/H] = −2.29 ± 0.07 dex for NGC 6584 and NGC 7099, respectively. We also find these clusters to be enhanced in their [α/Fe] ratios, consistent with what is expected for metal-poor globular clusters. Additionally, we find evidence of a statistically significant Na–O anti-correlation in both clusters. Finally, with the use of HST photometry, we compare the location of the enhanced and pristine populations in chromosome maps of the clusters to confirm previous photometric evidence of multiple stellar populations. Although we cannot confirm the nature of the polluter stars responsible for the abundance differences, our results can be used to constrain pollution models.

  14. Socio-Spatial Patterning of Off-Sale and On-Sale Alcohol Outlets in a Texas City

    PubMed Central

    Han, Daikwon; Gorman, Dennis M.

    2014-01-01

    Introduction and Aims: To examine the socio-spatial patterning of off-sale and on-sale alcohol outlets following a policy change that ended prohibition of off-sale outlets in Lubbock, Texas. Design and Methods: The spatial patterning of alcohol outlets by licensing type was examined using the k-function difference (D statistic) to compare the relative degree of spatial aggregation of the two types of alcohol outlets and by the spatial scan statistic to identify statistically significant geographic clusters of outlets. The sociodemographic characteristics of the areas containing clusters of outlets were compared to the rest of the city. In addition, the socioeconomic characteristics of census block groups with and without existing on-sale outlets were compared, as were the socioeconomic characteristics of census block groups with and without the newly issued off-sale licenses. Results: The existing on-sale premises in Lubbock and the newly established off-sale premises introduced as a result of the 2009 policy change displayed different spatial patterns, with the latter being more spatially dispersed. A large cluster of on-sale outlets identified in the north-east of the city was located in a socially and economically disadvantaged area of the city. Discussion and Conclusion: The findings support the view that it is important to understand the local context of deprivation within a city when examining the location of alcohol outlets and add to the existing research by drawing attention to the importance of geographic scale in assessing such relationships. PMID:24320205

  15. Socio-spatial patterning of off-sale and on-sale alcohol outlets in a Texas city.

    PubMed

    Han, Daikwon; Gorman, Dennis M

    2014-03-01

    To examine the socio-spatial patterning of off-sale and on-sale alcohol outlets following a policy change that ended prohibition of off-sale outlets in Lubbock, Texas. The spatial patterning of alcohol outlets by licensing type was examined using the k-function difference (D statistic) to compare the relative degree of spatial aggregation of the two types of alcohol outlets and by the spatial scan statistic to identify statistically significant geographic clusters of outlets. The sociodemographic characteristics of the areas containing clusters of outlets were compared with the rest of the city. In addition, the socioeconomic characteristics of census block groups with and without existing on-sale outlets were compared, as were the socioeconomic characteristics of census block groups with and without the newly issued off-sale licenses. The existing on-sale premises in Lubbock and the newly established off-sale premises introduced as a result of the 2009 policy change displayed different spatial patterns, with the latter being more spatially dispersed. A large cluster of on-sale outlets identified in the north-east of the city was located in a socially and economically disadvantaged area of the city. The findings support the view that it is important to understand the local context of deprivation within a city when examining the location of alcohol outlets and add to the existing research by drawing attention to the importance of geographic scale in assessing such relationships. © 2013 Australasian Professional Society on Alcohol and other Drugs.
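
The spatial scan statistic named in both versions of this record ranks candidate zones by a likelihood ratio. A minimal sketch of Kulldorff's Poisson log-likelihood ratio for a high-rate zone follows. It assumes the candidate zones are already enumerated. The full method scans circular windows of varying radius and assesses significance by Monte Carlo replication, and neither step is shown. Treating outlet counts as "cases" against a population baseline is an assumption for illustration, and the zone names and counts are hypothetical.

```python
import math


def poisson_scan_llr(cases_in, expected_in, total_cases):
    """Kulldorff's Poisson log-likelihood ratio for one zone (high-rate scan).

    expected_in is the count expected in the zone under the null hypothesis,
    e.g. total_cases * zone_population / total_population.
    """
    c, e, C = cases_in, expected_in, total_cases
    if c <= e:
        return 0.0  # only zones with an excess of cases are of interest
    llr = c * math.log(c / e)
    if C - c > 0:
        llr += (C - c) * math.log((C - c) / (C - e))
    return llr


def most_likely_cluster(zones, total_cases, total_pop):
    """zones: dict name -> (cases, population). Returns (name, llr) with max LLR."""
    best = None
    for name, (cases, pop) in zones.items():
        expected = total_cases * pop / total_pop
        llr = poisson_scan_llr(cases, expected, total_cases)
        if best is None or llr > best[1]:
            best = (name, llr)
    return best


zones = {"north-east": (30, 100), "south": (10, 100)}  # hypothetical (cases, population)
print(most_likely_cluster(zones, total_cases=100, total_pop=1000))  # ('north-east', about 15.37)
```

The zone with the largest LLR is the most likely cluster. Its p-value would come from repeating the scan on data simulated under the null hypothesis.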

  16. Pooled Genome-Wide Analysis to Identify Novel Risk Loci for Pediatric Allergic Asthma

    PubMed Central

    Ricci, Giampaolo; Astolfi, Annalisa; Remondini, Daniel; Cipriani, Francesca; Formica, Serena; Dondi, Arianna; Pession, Andrea

    2011-01-01

    Background: Genome-wide association studies (GWAS) of pooled DNA samples have been shown to be a valuable tool for identifying candidate SNPs associated with a phenotype. No such study had yet been applied to childhood allergic asthma, even though the very high complexity of asthma genetics makes it an appropriate field in which to explore the potential of the pooled GWAS approach. Methodology/Principal Findings: We performed a pooled GWAS and individual genotyping in 269 children with allergic respiratory diseases, comparing allergic children with and without asthma. We used a modular approach to identify the loci most significantly associated with asthma by combining silhouette statistics and a physical distance method with cluster-adapted thresholding. We found 97% concordance between pooled GWAS and individual genotyping, with 36 of the 37 top-scoring SNPs significant at the individual genotyping level. The most significant SNP is located inside the coding sequence of C5, an already identified asthma susceptibility gene, while the other loci regulate functions that are relevant to bronchial pathophysiology, such as immune- or inflammation-mediated mechanisms and airway smooth muscle contraction. Integration with gene expression data showed that almost half of the putative susceptibility genes are differentially expressed in experimental asthma mouse models. Conclusion/Significance: Combined silhouette statistics and cluster-adapted physical distance threshold analysis of pooled GWAS data is an efficient method for identifying candidate SNPs associated with asthma development in an allergic pediatric population. PMID:21359210
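
For reference, the standard silhouette statistic for a point i is s(i) = (b − a) / max(a, b). Here a is the point's mean distance to the other members of its own cluster, and b is its smallest mean distance to the members of any other cluster. The abstract does not say exactly how silhouette values were combined with the physical distance between SNPs, so the sketch below shows only the generic definition on toy one-dimensional data. The function name is hypothetical.

```python
import math


def silhouette_scores(points, labels):
    """Silhouette s(i) = (b - a) / max(a, b) for each point (Euclidean distance).

    a: mean distance from point i to the other members of its own cluster.
    b: smallest mean distance from point i to the members of another cluster.
    Requires at least two distinct labels; labels must be a list.
    """
    groups = set(labels)
    if len(groups) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points)
                if j != i and labels[j] == labels[i]]
        if not same:  # singleton cluster: s(i) is defined as 0
            scores.append(0.0)
            continue
        a = sum(same) / len(same)
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in groups if other != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


# Toy 1-D data: two well-separated groups.
pts = [(0.0,), (1.0,), (10.0,), (11.0,)]
labs = [0, 0, 1, 1]
s = silhouette_scores(pts, labs)
print(sum(s) / len(s))  # close to 1 for well-separated clusters
```

Values near 1 indicate that a point sits well inside its cluster. Values near 0 or below suggest that it lies on, or across, a cluster boundary.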

  17. Prevalence and clustering of soil-transmitted helminth infections in a tribal area in southern India.

    PubMed

    Kaliappan, Saravanakumar Puthupalayam; George, Santosh; Francis, Mark Rohit; Kattula, Deepthi; Sarkar, Rajiv; Minz, Shantidani; Mohan, Venkata Raghava; George, Kuryan; Roy, Sheela; Ajjampur, Sitara Swarna Rao; Muliyil, Jayaprakash; Kang, Gagandeep

    2013-12-01

    To estimate the prevalence, spatial patterns and clustering in the distribution of soil-transmitted helminth (STH) infections, and factors associated with hookworm infection, in a tribal population in Tamil Nadu, India. Cross-sectional study with one-stage cluster sampling of 22 clusters. Demographic and risk factor data, and stool samples for microscopic ova/cyst examination, were collected from 1237 participants. Geographical information systems mapping was used to assess spatial patterns of infection. The overall prevalence of STH was 39% (95% CI 36%–42%), with hookworm 38% (95% CI 35%–41%) and Ascaris lumbricoides 1.5% (95% CI 0.8%–2.2%). No Trichuris trichiura infection was detected. People involved in farming had higher odds of hookworm infection (OR 1.68, 95% CI 1.31–2.17, P < 0.001). In the multiple logistic regression, adults (OR 2.31, 95% CI 1.80–2.96, P < 0.001), people with pet cats (OR 1.55, 95% CI 1.10–2.18, P = 0.011) and people who did not wash their hands with soap after defecation (OR 1.84, 95% CI 1.27–2.67, P = 0.001) had higher odds of hookworm infection, but gender and poor use of footwear did not significantly increase risk. Cluster analysis based on design effect calculation did not show any clustering of cases in the study population; however, the spatial scan statistic detected a significant cluster of hookworm infections in one village. Multiple approaches, including health education, improvements to existing sanitary practices and regular preventive chemotherapy, are needed to control the burden of STH in similar endemic areas.
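
One common way to express clustering of cases as a number is the design effect, DEFF = 1 + (m − 1) × ICC. Here m is the mean cluster size and ICC is the intraclass correlation of the outcome. The abstract does not state which ICC estimator was used. The sketch below assumes the one-way ANOVA estimator for a binary outcome, and the function names and toy data are hypothetical. A DEFF close to 1 indicates little clustering.

```python
def anova_icc_binary(clusters):
    """One-way ANOVA estimator of the intraclass correlation for 0/1 outcomes.

    clusters: list of clusters, each a list of 0/1 outcomes.
    """
    k = len(clusters)
    sizes = [len(c) for c in clusters]
    n_total = sum(sizes)
    props = [sum(c) / n for c, n in zip(clusters, sizes)]
    p_bar = sum(sum(c) for c in clusters) / n_total
    # Between- and within-cluster sums of squares for a binary outcome.
    ssb = sum(n * (p - p_bar) ** 2 for p, n in zip(props, sizes))
    ssw = sum(n * p * (1 - p) for p, n in zip(props, sizes))
    msb = ssb / (k - 1)
    msw = ssw / (n_total - k)
    # Adjusted average cluster size (equals n when every cluster has size n).
    m0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)
    denom = msb + (m0 - 1) * msw
    return 0.0 if denom == 0 else (msb - msw) / denom


def design_effect(clusters):
    """DEFF = 1 + (m - 1) * ICC, where m is the mean cluster size."""
    m = sum(len(c) for c in clusters) / len(clusters)
    return 1 + (m - 1) * anova_icc_binary(clusters)


# Extreme toy case: infection status fully determined by cluster membership.
toy = [[1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0]]
print(design_effect(toy))  # 4.0, the maximum for clusters of size 4
```

In this extreme case, ICC = 1 and DEFF equals the cluster size. Real survey data with cases spread evenly across clusters would give an ICC near 0 and a DEFF near 1.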

  18. An Information-Theoretic-Cluster Visualization for Self-Organizing Maps.

    PubMed

    Brito da Silva, Leonardo Enzo; Wunsch, Donald C

    2018-06-01

    Improved data visualization will be a significant tool for enhancing cluster analysis. In this paper, an information-theoretic method for cluster visualization using self-organizing maps (SOMs) is presented. The information-theoretic visualization (IT-vis) has the same structure as the unified distance matrix, but instead of depicting Euclidean distances between adjacent neurons, it displays the similarity between the distributions associated with adjacent neurons. Each SOM neuron has an associated subset of the data set, whose cardinality controls the granularity of the IT-vis and from which the first- and second-order statistics are computed and used to estimate its probability density function. These are used to calculate the similarity measure, based on Renyi's quadratic cross entropy and cross information potential (CIP). The introduced visualizations combine the low computational cost and kernel estimation properties of the representative CIP with the data structure representation of a single-linkage-based grouping algorithm to generate an enhanced SOM-based visualization. The visual quality of the IT-vis is assessed by comparing it with other visualization methods on several real-world and synthetic benchmark data sets. This paper therefore also contains a substantial literature survey. The experiments demonstrate the cluster-revealing capabilities of IT-vis, with cluster boundaries sharply captured. Additionally, the information-theoretic visualizations are used to perform clustering of the SOM. Compared with other methods, IT-vis of large SOMs yielded the best results in this paper, with the quality of the final partitions evaluated using external validity indices.
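
The similarity measure can be illustrated under one simplifying assumption. If the data mapped to each neuron are summarized by their mean and variance (first- and second-order statistics) and modeled as a Gaussian, then the cross information potential CIP(p, q) = ∫ p(x) q(x) dx has a closed form: a Gaussian density with variance σ₁² + σ₂² evaluated at μ₁ − μ₂. Renyi's quadratic cross entropy is then −log CIP. The one-dimensional sketch below follows that assumption. The paper's multivariate formulation and kernel choices may differ, and all names are hypothetical.

```python
import math


def gaussian_cip(mean1, var1, mean2, var2):
    """Cross information potential, the integral of p(x) q(x), for two 1-D Gaussians.

    Closed form: a Gaussian density with variance var1 + var2 evaluated at
    the difference of the means.
    """
    v = var1 + var2
    return math.exp(-((mean1 - mean2) ** 2) / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)


def renyi_quadratic_cross_entropy(mean1, var1, mean2, var2):
    """H2(p; q) = -log CIP(p, q); lower values mean more similar distributions."""
    return -math.log(gaussian_cip(mean1, var1, mean2, var2))


def neuron_stats(samples):
    """Mean and unbiased variance of the 1-D data mapped to one neuron."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    return mean, var


# Hypothetical data assigned to two adjacent neurons.
a = neuron_stats([1.0, 2.0, 3.0])
b = neuron_stats([7.0, 8.0, 9.0])
print(renyi_quadratic_cross_entropy(*a, *a) < renyi_quadratic_cross_entropy(*a, *b))  # True
```

In a U-matrix-style display, each pair of adjacent neurons would be shaded by such a value. High cross entropy between neighbours would then mark a likely cluster boundary.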

  19. Correlates of the molecular vaginal microbiota composition of African women.

    PubMed

    Gautam, Raju; Borgdorff, Hanneke; Jespers, Vicky; Francis, Suzanna C; Verhelst, Rita; Mwaura, Mary; Delany-Moretlwe, Sinead; Ndayisaba, Gilles; Kyongo, Jordan K; Hardy, Liselotte; Menten, Joris; Crucitti, Tania; Tsivtsivadze, Evgeni; Schuren, Frank; van de Wijgert, Janneke H H M

    2015-02-21

    Sociodemographic, behavioral and clinical correlates of the vaginal microbiome (VMB) as characterized by molecular methods have not been adequately studied. VMB dominated by bacteria other than lactobacilli may cause inflammation, which may facilitate HIV acquisition and other adverse reproductive health outcomes. We characterized the VMB of women in Kenya, Rwanda, South Africa and Tanzania (KRST) using a 16S rDNA phylogenetic microarray. Cytokines were quantified in cervicovaginal lavages. Potential sociodemographic, behavioral, and clinical correlates were also evaluated. Three hundred thirteen samples from 230 women were available for analysis. Five VMB clusters were identified: one cluster each dominated by Lactobacillus crispatus (KRST-I) and L. iners (KRST-II), and three clusters not dominated by a single species but containing multiple (facultative) anaerobes (KRST-III/IV/V). Women in clusters KRST-I and II had lower mean concentrations of interleukin (IL)-1α (p < 0.001) and granulocyte colony-stimulating factor (G-CSF) (p = 0.01), but higher concentrations of interferon-γ-induced protein 10 (IP-10) (p < 0.01), than women in clusters KRST-III/IV/V. A lower proportion of women in cluster KRST-I tested positive for bacterial sexually transmitted infections (STIs; p for trend = 0.07) and urinary tract infection (UTI; p = 0.06), and a higher proportion of women in clusters KRST-I and II had vaginal candidiasis (p for trend = 0.09), but these associations did not reach statistical significance. Women who reported unusual vaginal discharge were more likely to belong to clusters KRST-III/IV/V (p = 0.05). Vaginal dysbiosis in African women was significantly associated with vaginal inflammation; the associations with increased prevalence of STIs and UTI, and decreased prevalence of vaginal candidiasis, should be confirmed in larger studies.

  20. The association between content of the elements S, Cl, K, Fe, Cu, Zn and Br in normal and cirrhotic liver tissue from Danes and Greenlandic Inuit examined by dual hierarchical clustering analysis.

    PubMed

    Laursen, Jens; Milman, Nils; Pind, Niels; Pedersen, Henrik; Mulvad, Gert

    2014-01-01

    Meta-analysis of previous studies evaluating associations between the content of the elements sulphur (S), chlorine (Cl), potassium (K), iron (Fe), copper (Cu), zinc (Zn) and bromine (Br) in normal and cirrhotic autopsy liver tissue samples. Normal liver samples were obtained from 45 Greenlandic Inuit (median age 60 years) and 71 Danes (median age 61 years), and cirrhotic liver samples from 27 Danes (median age 71 years). Element content was measured using X-ray fluorescence spectrometry. Dual hierarchical clustering analysis created a dual dendrogram: one clustering subjects according to calculated similarities in their element contents, and one clustering elements according to correlation coefficients between the element contents, both using Euclidean distance and Ward's procedure. One dendrogram separated subjects into 7 clusters showing no differences in ethnicity, gender or age. The analysis discriminated between elements in normal and cirrhotic livers. The other dendrogram grouped the elements into four clusters: sulphur and chlorine; copper and bromine; potassium and zinc; and iron. There were significant correlations between the elements in normal liver samples: S was associated with Cl, K, Br and Zn; Cl with S and Br; K with S, Br and Zn; Cu with Br; Zn with S and K; and Br with S, Cl, K and Cu. Fe did not show significant associations with any other element. In contrast to simple statistical methods, which analyse the content of each element separately, dual hierarchical clustering analysis incorporates all elements at the same time and can be used to examine the linkage and interplay between multiple elements in tissue samples. Copyright © 2013 Elsevier GmbH. All rights reserved.
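
Ward's procedure, used for both dendrograms, merges at each step the pair of clusters whose union least increases the within-cluster sum of squares. For clusters A and B with sizes n_A and n_B and centroids c_A and c_B, that increase is n_A n_B / (n_A + n_B) × ||c_A − c_B||². The naive sketch below applies this rule to Euclidean vectors only. It does not reproduce the paper's dual-dendrogram workflow or its correlation-based clustering of elements, and the toy data are hypothetical.

```python
import math


def ward_clusters(points, n_clusters):
    """Naive agglomerative clustering with Ward's criterion (Euclidean distance).

    Returns a list of clusters, each a sorted list of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    centroids = [list(p) for p in points]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                # Increase in within-cluster sum of squares if a and b merge.
                cost = na * nb / (na + nb) * math.dist(centroids[a], centroids[b]) ** 2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        na, nb = len(clusters[a]), len(clusters[b])
        # New centroid is the size-weighted mean of the two old centroids.
        centroids[a] = [(na * x + nb * y) / (na + nb)
                        for x, y in zip(centroids[a], centroids[b])]
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b], centroids[b]
    return clusters


pts = [(0.0,), (0.1,), (5.0,), (5.1,), (10.0,)]
print(ward_clusters(pts, 3))  # [[0, 1], [2, 3], [4]]
```

This brute-force search is cubic in the number of points, which is adequate for a handful of elements or subjects. Library implementations use faster update formulas.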

  1. Social influences on health-related behaviour clustering during adulthood in two British birth cohort studies.

    PubMed

    Mawditt, Claire; Sacker, Amanda; Britton, Annie; Kelly, Yvonne; Cable, Noriko

    2018-05-01

    Building upon evidence linking socio-economic position (SEP) in childhood and adulthood with health-related behaviours (HRB) in adulthood, we examined how pre-adolescent SEP predicted membership of three HRB clusters, "Risky", "Moderate Smokers" and "Mainstream" (the latter pattern consisting of more beneficial HRBs), which were detected in our previous work. Data were taken from two British cohorts (born in 1958 and 1970) in pre-adolescence (age 11 and 10, respectively) and adulthood (age 33 and 34). SEP constructs in pre-adolescence and adulthood were derived through Confirmatory Factor Analysis. Conceptualised paths from pre-adolescent SEP to HRB cluster membership via adult SEP in our path models were tested for statistical significance separately by gender and cohort. Adult SEP mediated the path between pre-adolescent SEP and adult HRB clusters. More disadvantaged SEP in pre-adolescence predicted more disadvantaged SEP in adulthood, which was associated with membership of the "Risky" and "Moderate Smokers" clusters compared with the "Mainstream" cluster. For example, large positive indirect effects between pre-adolescent SEP and adult HRB via adult SEP were present (coefficients: 1958 women = 0.39; 1970 women = 0.36; 1958 men = 0.51; 1970 men = 0.39; p < 0.01) when comparing "Risky" and "Mainstream" cluster membership. Among men, we found a small, significant direct association (p < 0.001) between pre-adolescent SEP and HRB cluster membership. Our findings suggest that associations between adult SEP and HRBs are not likely to be pre-determined by earlier social circumstances, providing optimism for interventions relevant to reducing social gradients in HRBs. Observing consistent findings across the cohorts implies that the social patterning of adult lifestyles may persist across time. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.

  2. On the blind use of statistical tools in the analysis of globular cluster stars

    NASA Astrophysics Data System (ADS)

    D'Antona, Francesca; Caloi, Vittoria; Tailo, Marco

    2018-04-01

    As with most data analysis methods, the Bayesian method must be handled with care. We show that its application to determine stellar evolution parameters within globular clusters can lead to paradoxical results if used without the necessary precautions. This is a cautionary tale on the use of statistical tools for big data analysis.

  3. Origin of Pareto-like spatial distributions in ecosystems.

    PubMed

    Manor, Alon; Shnerb, Nadav M

    2008-12-31

    Recent studies of cluster distribution in various ecosystems revealed Pareto statistics for the size of spatial colonies. These results were supported by cellular automata simulations that yield robust criticality for endogenous pattern formation based on positive feedback. We show that these patch statistics are a manifestation of the law of proportionate effect. Mapping the stochastic model to a Markov birth-death process, we show that the transition rates scale linearly with cluster size. This mapping provides a connection between patch statistics and the dynamics of the ecosystem; the "first passage time" for different colonies emerges as a powerful tool that discriminates between endogenous and exogenous clustering mechanisms. Imminent catastrophic shifts (such as desertification) manifest themselves in a drastic change of the stability properties of spatial colonies.
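The linear scaling of transition rates with colony size can be illustrated with a minimal linear birth-death process. This is a sketch under stated assumptions, not the authors' model: the per-individual rates `birth` and `death` and the helper names are hypothetical, and the simulation uses the standard Gillespie algorithm.

```python
import random


def transition_rates(n: int, birth: float, death: float) -> tuple[float, float]:
    """Rates for n -> n+1 and n -> n-1; both grow linearly with colony size n."""
    return birth * n, death * n


def simulate_colony(n0: int, birth: float, death: float,
                    t_max: float, seed: int = 0) -> tuple[int, float]:
    """Gillespie simulation of a linear birth-death process (assumes birth + death > 0).

    Returns (final size, time of the last event or of extinction).
    """
    rng = random.Random(seed)
    n, t = n0, 0.0
    while n > 0:
        up, down = transition_rates(n, birth, death)
        total = up + down
        dt = rng.expovariate(total)      # waiting time to the next event
        if t + dt > t_max:
            break
        t += dt
        n += 1 if rng.random() < up / total else -1
    return n, t
```

Because both rates are proportional to `n`, the expected relative growth per unit time does not depend on colony size, which is the law of proportionate effect; repeating the simulation over many seeds gives an empirical distribution of first-passage (e.g. extinction) times.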

  4. Effect of telecare on use of health and social care services: findings from the Whole Systems Demonstrator cluster randomised trial

    PubMed Central

    Steventon, Adam; Bardsley, Martin; Billings, John; Dixon, Jennifer; Doll, Helen; Beynon, Michelle; Hirani, Shashi; Cartwright, Martin; Rixon, Lorna; Knapp, Martin; Henderson, Catherine; Rogers, Anne; Hendy, Jane; Fitzpatrick, Ray; Newman, Stanton

    2013-01-01

    Objective: to assess the impact of telecare on the use of social and health care. Part of the evaluation of the Whole Systems Demonstrator trial. Participants and setting: a total of 2,600 people with social care needs were recruited from 217 general practices in three areas in England. Design: a cluster randomised trial comparing telecare with usual care, general practice being the unit of randomisation. Participants were followed up for 12 months and analyses were conducted as intention-to-treat. Data sources: trial data were linked at the person level to administrative data sets on care funded at least in part by local authorities or the National Health Service. Main outcome measures: the proportion of people admitted to hospital within 12 months. Secondary endpoints included mortality, rates of secondary care use (seven different metrics), contacts with general practitioners and practice nurses, proportion of people admitted to permanent residential or nursing care, weeks in domiciliary social care and notional costs. Results: 46.8% of intervention participants were admitted to hospital, compared with 49.2% of controls. Unadjusted differences were not statistically significant (odds ratio: 0.90, 95% CI: 0.75–1.07, P = 0.211). They reached statistical significance after adjusting for baseline covariates, but this was not replicated when adjusting for the predictive risk score. Secondary metrics including impacts on social care use were not statistically significant. Conclusions: telecare as implemented in the Whole Systems Demonstrator trial did not lead to significant reductions in service use, at least in terms of results assessed over 12 months. International Standard Randomised Controlled Trial Number Register ISRCTN43002091. PMID:23443509

  5. X-ray emission from a complete sample of Abell clusters of galaxies

    NASA Astrophysics Data System (ADS)

    Briel, Ulrich G.; Henry, J. Patrick

    1993-11-01

    The ROSAT All-Sky Survey (RASS) is used to investigate the X-ray properties of a complete sample of Abell clusters with measured redshifts and accurate positions. The sample comprises the 145 clusters within a 561 square degree region at high galactic latitude. The mean redshift is 0.17. This sample is especially well suited to be studied within the RASS since the mean exposure time is higher than average and the mean galactic column density is very low. These together produce a flux limit of about 4.2 × 10⁻¹³ erg cm⁻² s⁻¹ in the 0.5 to 2.5 keV energy band. Sixty-six (46%) individual clusters are detected at a significance level higher than 99.7%, of which 7 could be chance coincidences of background or foreground sources. At redshifts greater than 0.3, six clusters out of seven (86%) are detected at the same significance level. The detected objects show a clear X-ray luminosity–galaxy count relation with a dispersion consistent with other external estimates of the error in the counts. By analyzing the excess of positive fluctuations of the X-ray flux at the cluster positions, compared with the fluctuations of randomly drawn background fields, it is possible to extend these results below the nominal flux limit. We find that 80% of richness R ≥ 0 clusters and 86% of R ≥ 1 clusters are X-ray emitters with fluxes above 1 × 10⁻¹³ erg cm⁻² s⁻¹. Nearly 90% of the clusters meeting the requirements to be in Abell's statistical sample emit above the same level. We therefore conclude that almost all Abell clusters are real clusters and the Abell catalog is not strongly contaminated by projection effects. We use the Kaplan-Meier product limit estimator to calculate the cumulative X-ray luminosity function. 
We show that the shapes of the luminosity functions are similar for different richness classes, but the characteristic luminosities of richness 2 clusters are about twice those of richness 1 clusters, which are in turn about twice those of richness 0 clusters. This result is another manifestation of the luminosity–richness relation for Abell clusters.
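A minimal sketch of the Kaplan-Meier product-limit estimator for right-censored values follows. Luminosity functions built from non-detections involve upper limits (left-censoring); a common approach, assumed here and not stated in the abstract, is to negate the values (e.g. work with -log L) so that upper limits become right-censored before applying the same estimator. The function name is hypothetical.

```python
def kaplan_meier(times: list[float], observed: list[bool]) -> list[tuple[float, float]]:
    """Product-limit estimate S(t) = prod over event times t_i <= t of (1 - d_i / n_i).

    times: measured values; observed[k] is False when value k is right-censored.
    Returns (t_i, S(t_i)) at each distinct value with at least one event.
    """
    data = sorted(zip(times, observed))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        count = 0
        # Group tied values so they share one factor in the product.
        while i < len(data) and data[i][0] == t:
            events += data[i][1]
            count += 1
            i += 1
        if events:
            survival *= 1.0 - events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= count   # censored values leave the risk set without a step
    return curve
```

For example, `kaplan_meier([1, 2, 3, 4], [True, False, True, True])` steps down at 1, 3 and 4; the censored value at 2 only shrinks the risk set.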

  6. Spectral reflectance of surface soils - A statistical analysis

    NASA Technical Reports Server (NTRS)

    Crouse, K. R.; Henninger, D. L.; Thompson, D. R.

    1983-01-01

    The relationship of the physical and chemical properties of soils to their spectral reflectance as measured in six wavebands of the Thematic Mapper (TM) aboard NASA's Landsat-4 satellite was examined. The results of performing regressions of over 20 soil properties on the six TM bands indicated that organic matter, water, clay, cation exchange capacity, and calcium were the properties most readily predicted from TM data. The middle infrared bands, bands 5 and 7, were the best bands for predicting soil properties, and the near infrared band, band 4, was nearly as good. Clustering 234 soil samples on the TM bands and characterizing the clusters on the basis of soil properties revealed several clear relationships between properties and reflectance. Discriminant analysis found organic matter, fine sand, base saturation, sand, extractable acidity, and water to be significant in discriminating among clusters.

  7. Association between differential gene expression and body mass index among endometrial cancers from The Cancer Genome Atlas Project.

    PubMed

    Roque, Dario R; Makowski, Liza; Chen, Ting-Huei; Rashid, Naim; Hayes, D Neil; Bae-Jump, Victoria

    2016-08-01

    The Cancer Genome Atlas (TCGA) identified four integrated clusters for endometrial cancer (EC): POLE, MSI, CNL and CNH. We evaluated differences in gene expression profiles of obese and non-obese women with EC and examined the association of body mass index (BMI) within the clusters identified in TCGA. TCGA RNAseq data were used to identify genes related to increasing BMI among ECs. The POLE, MSI and CNL clusters were composed mostly of endometrioid EC. Patient BMI was compared between these three clusters with one-way ANOVA. Association between gene expression and BMI was also assessed while adjusting for potential confounding factors. p-Values testing the association between gene expression and BMI were adjusted for multiple hypothesis testing over the 20,531 genes considered. Mean BMI differed significantly between the ECs in the CNL (35.8) and POLE (29.8) clusters (p=0.006) and approached significance for the MSI (33.0) versus CNL (35.8) cluster (p=0.05). 181 genes were significantly up- or down-regulated with increasing BMI in endometrioid EC (q-value<0.01), including LPL, IRS-1, IGFBP4, IGFBP7 and the progesterone receptor. DAVID functional annotation analysis revealed significant enrichment in "cell cycle" (adjusted p-value=1.5E-5) and "DNA metabolic processes" (adjusted p-value=1E-3) for the identified genes. Obesity-related genes were found to be upregulated with increasing BMI among endometrioid ECs. Patients with POLE tumors have the lowest median BMI when compared to MSI and CNL. Given the heterogeneity among endometrioid EC, consideration should be given to abandoning the Type I and II classification of EC tumors. Copyright © 2016 Elsevier Inc. All rights reserved.
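The abstract does not name the multiple-testing correction behind its q-values. The Benjamini-Hochberg false discovery rate procedure is one common way to obtain such adjusted values and is sketched below purely as an assumed example; the function name is hypothetical.

```python
def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (often reported as q-values)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda k: p_values[k])   # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        k = order[rank - 1]
        running_min = min(running_min, p_values[k] * m / rank)
        adjusted[k] = running_min
    return adjusted
```

With 20,531 genes tested, each raw p-value is effectively scaled by m/rank, so a gene passes a q < 0.01 cut only if its raw p-value is far smaller than 0.01.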

  8. User’s guide for GcClust—An R package for clustering of regional geochemical data

    USGS Publications Warehouse

    Ellefsen, Karl J.; Smith, David B.

    2016-04-08

    GcClust is a software package developed by the U.S. Geological Survey for statistical clustering of regional geochemical data, and similar data such as regional mineralogical data. Functions within the software package are written in the R statistical programming language. These functions, their documentation, and a copy of the user’s guide are bundled together in R’s unit of sharable code, which is called a “package.” The user’s guide includes step-by-step instructions showing how the functions are used to cluster data and to evaluate the clustering results. These functions are demonstrated in this report using test data, which are included in the package.

  9. The Role of Large-Scale Structure and Assembly in the Quenching of Star Formation in Cluster Galaxies at z ≈ 0.2

    NASA Astrophysics Data System (ADS)

    Moran, Sean; Smith, G.; Haines, C.; Egami, E.; Hardegree-Ullman, E.; Heckman, T.

    2010-01-01

    We present results from LoCuSS, the Local Cluster Substructure Survey, on the distribution and abundance of cluster galaxies showing signatures of recently quenched star formation, within a sample of 15 z ≈ 0.2 clusters. Combining LoCuSS' wide-field UV through NIR photometry with weak-lensing derived mass maps for these clusters, we identify passive galaxies that have undergone recent quenching via both rapid (100 Myr) and slow (1 Gyr) mechanisms. By studying their abundance in a statistically significant sample of z ≈ 0.2 clusters, we explore how the effectiveness of environmental quenching of star formation varies as a function of the level of cluster substructure, in addition to global cluster characteristics such as mass or X-ray luminosity and temperature, with the aim of understanding the role that pre-processing of galaxies within groups and filaments plays in the overall buildup of the morphology-density and SFR-density relations. We find that clusters with large levels of substructure indicative of recent assembly or cluster-cluster mergers host a higher fraction of galaxies with signs of recent ram-pressure stripping by the hot intra-cluster gas. In addition, we find that the fraction of post-starburst galaxies increases with cluster mass (M500), but fractions of optically-selected AGN and GALEX-defined "Green Valley" galaxies show the opposite trend, being most abundant in rather low-mass clusters. These trends suggest a picture where quenching of star formation occurs most vigorously in actively assembling structures, with comparatively little activity in the most massive structures where most of the nearby large-scale structure has already been accreted and virialized into the main cluster body.

  10. Space-filling, multifractal, localized thermal spikes in Si, Ge and ZnO

    NASA Astrophysics Data System (ADS)

    Ahmad, Shoaib; Abbas, Muhammad Sabtain; Yousuf, Muhammad; Javeed, Sumera; Zeeshan, Sumaira; Yaqub, Kashif

    2018-04-01

    The mechanism responsible for the emission of clusters from heavy ion irradiated solids is proposed to be thermal spikes. Collision cascade-based theories describe atomic sputtering but cannot explain the consistently observed experimental evidence for significant cluster emission. Statistical thermodynamic arguments for thermal spikes are employed here for qualitative and quantitative estimation of the thermal spike-induced cluster emission from Si, Ge and ZnO. The evolving cascades and spikes in elemental and molecular semiconducting solids are shown to have fractal characteristics. A power-law interatomic potential is used to calculate the fractal dimension. As the recoiling particles lose energy, the successive branching ratios get smaller. The fractal dimension is shown to depend on the exponent m of the power-law interatomic potential, D = 1/(2m). Each irradiating ion has the probability of initiating a space-filling, multifractal thermal spike that may sublime a localized region near the surface by emitting clusters in relative ratios that depend upon the energies of formation of respective surface vacancies.

  11. A stereoscopic system for viewing the temporal evolution of brain activity clusters in response to linguistic stimuli

    NASA Astrophysics Data System (ADS)

    Forbes, Angus; Villegas, Javier; Almryde, Kyle R.; Plante, Elena

    2014-03-01

    In this paper, we present a novel application, 3D+Time Brain View, for the stereoscopic visualization of functional Magnetic Resonance Imaging (fMRI) data gathered from participants exposed to unfamiliar spoken languages. An analysis technique based on Independent Component Analysis (ICA) is used to identify statistically significant clusters of brain activity and their changes over time during different testing sessions. That is, our system illustrates the temporal evolution of participants' brain activity as they are introduced to a foreign language through displaying these clusters as they change over time. The raw fMRI data is presented as a stereoscopic pair in an immersive environment utilizing passive stereo rendering. The clusters are presented using a ray casting technique for volume rendering. Our system incorporates the temporal information and the results of the ICA into the stereoscopic 3D rendering, making it easier for domain experts to explore and analyze the data.

  12. Fast gene ontology based clustering for microarray experiments.

    PubMed

    Ovaska, Kristian; Laakso, Marko; Hautaniemi, Sampsa

    2008-11-21

    Analysis of a microarray experiment often results in a list of hundreds of disease-associated genes. In order to suggest common biological processes and functions for these genes, Gene Ontology (GO) annotations with statistical testing are widely used. However, these analyses can produce a very large number of significantly altered biological processes. Thus, it is often challenging to interpret GO results and identify novel testable biological hypotheses. We present fast software for advanced gene annotation using semantic similarity for Gene Ontology terms combined with clustering and heat map visualisation. The methodology allows rapid identification of genes sharing the same Gene Ontology cluster. Our R-based, open-source semantic similarity package is over 2000-fold faster than existing implementations. From the resulting hierarchical clustering dendrogram, genes sharing a GO term can be identified, and their differences in gene expression patterns can be seen from the heat map. These methods facilitate advanced annotation of genes resulting from data analysis.

  13. Spatio-Temporal Trends of Fire in Slash and Burn Agriculture Landscape: A Case Study from Nagaland, India

    NASA Astrophysics Data System (ADS)

    Padalia, H.; Mondal, P. P.

    2014-11-01

    The increasing incidence of fire from land conversion and residue burning in the tropics is a major concern for global warming. Spatial and temporal monitoring of trends in fire incidence is, therefore, important for determining the contribution of slash and burn agriculture to carbon emissions. In this study, we analyzed time-series Terra / Aqua MODIS satellite hotspot products from 2001 to 2013 to derive intra- and inter-annual trends in fire incidence in Nagaland state, located in the Indo-Burma biodiversity hotspot. Time-series regression was applied to MODIS fire products at variable spatial scales in GIS. The significance of change in fire frequency at each grid level was tested using the t statistic. Spatial clustering of higher or lower fire incidence across the study area was determined using the Getis-Ord Gi statistic. Maximum fire incidence was encountered in moist mixed deciduous forests (46%), followed by secondary moist bamboo brakes (30%). In most parts of the study area fire incidence peaked during March, while in warmer parts (e.g. Mon district, dominated by indigenous people) fire activity starts as early as November and peaks in January. Regression trend analysis captured noticeable areas with statistically significant positive (e.g. Mokokchung, Wokha, Mon, Tuensang and Kiphire districts) and negative (e.g. Kohima and the north-western part of Mokokchung district) inter-annual fire frequency trends, based on area-based aggregation of fire occurrences at different grid sizes. Spatial clusters of high fire incidence were localized in Mokokchung, Wokha, Mon, Tuensang and Kiphire districts.
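A minimal sketch of a Getis-Ord hot-spot score is shown below. It uses the Gi* form (each cell's own value is included through a nonzero self-weight), which is an assumption since the abstract does not say which variant was used; the grid values and weights in the usage note are hypothetical.

```python
import math


def getis_ord_gi_star(x: list[float], w: list[list[float]]) -> list[float]:
    """Getis-Ord Gi* z-scores; w[i][j] is the spatial weight, w[i][i] included."""
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum(v * v for v in x) / n - mean * mean)
    scores = []
    for i in range(n):
        wsum = sum(w[i])
        wsq = sum(wij * wij for wij in w[i])
        local = sum(wij * xj for wij, xj in zip(w[i], x))   # weighted local total
        num = local - mean * wsum                            # minus its expectation
        den = s * math.sqrt((n * wsq - wsum * wsum) / (n - 1))
        scores.append(num / den)
    return scores
```

For four cells in a row with fire counts `[10, 10, 1, 1]` and binary weights linking each cell to itself and its neighbours, the end cells score about +1.73 and -1.73. Large positive z-scores mark clusters of high incidence and large negative ones clusters of low incidence.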

  14. Reducing the use of sugar in public schools: a randomized cluster trial.

    PubMed

    Souza, Rita Adriana Gomes de; Mediano, Mauro Felippe Felix; Souza, Amanda de Moura; Sichieri, Rosely

    2013-08-01

    To test the efficacy of nutritional guidelines for school lunch cooks aiming to reduce added sugar in school meals and their own sugar intake. A controlled randomized cluster trial was carried out in twenty public schools in the municipality of Niteroi in Rio de Janeiro, Southeastern Brazil, from March to December 2007. A nutrition educational program was implemented in the schools in question through messages, activities and printed educational materials encouraging reduced levels of added sugar in school meals and in the school lunch cooks' own intake. The reduced availability of added sugar in schools was evaluated using spreadsheets including data on the monthly use of food item supplies. The cooks' individual food intake was evaluated by a Food Frequency Questionnaire. Anthropometric measurements were taken according to standardized techniques and variation in weight was measured throughout the duration of the study. There was a larger reduction in the intervention schools than in the control schools (-6.0 kg versus 0.34 kg), but the difference was not statistically significant (p = 0.21), although the study power was low. Both groups of school lunch cooks showed a reduction in the consumption of sweets and sweetened beverages, but the difference in sugar intake was not statistically significant. Weight loss and a reduction in total energy consumption occurred in both groups, but the difference between them was not statistically significant, and there was no alteration in the percentages of adequacy of macronutrients in relation to energy consumption. The strategy of reducing the use and consumption of sugar by school lunch cooks from public schools could not be proved to be effective.

  15. The effectiveness of a structured education pulmonary rehabilitation programme for improving the health status of people with moderate and severe chronic obstructive pulmonary disease in primary care: the PRINCE cluster randomised trial.

    PubMed

    Casey, Dympna; Murphy, Kathy; Devane, Declan; Cooney, Adeline; McCarthy, Bernard; Mee, Lorraine; Newell, John; O'Shea, Eamon; Scarrott, Carl; Gillespie, Paddy; Kirwan, Collette; Murphy, Andrew W

    2013-10-01

    To evaluate the effectiveness of a structured education pulmonary rehabilitation programme on the health status of people with chronic obstructive pulmonary disease (COPD). Two-arm, cluster randomised controlled trial. 32 general practices in the Republic of Ireland. 350 participants with a diagnosis of moderate or severe COPD. Experimental group received a structured education pulmonary rehabilitation programme, delivered by the practice nurse and physiotherapist. Control group received usual care. Health status as measured by the Chronic Respiratory Questionnaire (CRQ) at baseline and at 12-14 weeks post-completion of the programme. Participants allocated to the intervention group had statistically significantly higher mean change in total CRQ scores (adjusted mean difference (MD) 1.11, 95% CI 0.35 to 1.87). However, the CI does not exclude a difference smaller than the one prespecified as clinically important. Participants allocated to the intervention group also had statistically significantly higher mean CRQ Dyspnoea scores after intervention (adjusted MD 0.49, 95% CI 0.20 to 0.78) and CRQ Physical scores (adjusted MD 0.37, 95% CI 0.14 to 0.60). However, the CIs for both the CRQ Dyspnoea and CRQ Physical subscales do not exclude differences smaller than those prespecified as clinically important. No other statistically significant differences between groups were seen. A primary care based structured education pulmonary rehabilitation programme is feasible and may increase local accessibility to people with moderate and severe COPD. ISRCTN52403063.

  16. Spatial clusters of suicide in the municipality of São Paulo 1996-2005: an ecological study.

    PubMed

    Bando, Daniel H; Moreira, Rafael S; Pereira, Julio C R; Barrozo, Ligia V

    2012-08-23

    In a classic study, Durkheim mapped suicide rates, wealth, and low family density and observed that they clustered in northern France. Assessing other variables, such as religious society, he constructed a framework for the analysis of suicide that still allows international comparisons using the same basic methodology. The present study aims to identify possible significant clusters of suicide in the city of São Paulo and then to verify their statistical associations with socio-economic and cultural characteristics. A spatial scan statistical test was performed to analyze the geographical pattern of suicide deaths of residents in the city of São Paulo by Administrative District, from 1996 to 2005. Relative risks and high- and/or low-risk clusters were calculated with gender and age as covariates and analyzed using spatial scan statistics to identify geographical patterns. Logistic regression was used to estimate associations with socioeconomic variables, with the spatial cluster of high suicide rates as the response variable. Drawing from Durkheim's original work, current World Health Organization (WHO) reports and recent reviews, the following independent variables were considered: marital status, income, education, religion, and migration. The mean suicide rate was 4.1/100,000 inhabitant-years. Against this baseline, two clusters were identified: the first, of increased risk (RR=1.66), comprising 18 districts in the central region; the second, of decreased risk (RR=0.78), including 14 districts in the southern region. The downtown area toward the southwestern region of the city displayed the highest risk for suicide, and though the overall risk may be considered low, the rate climbs to an intermediate level in this region. One logistic regression analysis contrasted the risk cluster (18 districts) against the other 78 districts, testing the effects of socioeconomic-cultural variables. 
The following population proportions within the clusters were identified as risk factors: singles (OR=2.36), migrants (OR=1.50), Catholics (OR=1.37) and higher income (OR=1.06). In a second logistic model, conceived in the same way, the following population proportions were identified as protective factors: married (OR=0.49) and Evangelical (OR=0.60). This risk/protection profile is in accordance with the interpretation that, as a social phenomenon, suicide is related to social isolation. Thus, the classical framework put forward by Durkheim seems to still hold, even though its categorical expression requires re-interpretation.
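The spatial scan test mentioned above scores each candidate zone (here, a set of neighbouring districts) with a likelihood ratio and takes the maximum over all zones. A minimal sketch of Kulldorff's Poisson log-likelihood ratio for a high-risk zone follows; the function name and example counts are hypothetical, and the Monte Carlo replication used to assess significance is not shown. A low-risk scan mirrors the condition (scoring only zones with fewer cases than expected).

```python
import math


def poisson_scan_llr(c: float, e: float, total: float) -> float:
    """Kulldorff's Poisson log-likelihood ratio for one candidate zone.

    c: observed cases inside the zone; e: expected cases inside under the null
    (e.g. from age- and sex-specific rates); total: total observed cases, with
    expected counts scaled so that they also sum to total.
    Returns 0 unless the zone has more cases than expected (high-risk scan).
    """
    if c <= e:
        return 0.0
    inside = c * math.log(c / e)
    outside_obs = total - c
    outside_exp = total - e
    # 0 * log(0) is taken as 0 when every case falls inside the zone.
    outside = outside_obs * math.log(outside_obs / outside_exp) if outside_obs > 0 else 0.0
    return inside + outside
```

The zone with the largest ratio is the most likely cluster; its relative risk is then (c / e) divided by ((total - c) / (total - e)).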

  17. Effects of a Worksite Tobacco Control Intervention in India: The Mumbai Worksite Tobacco Control Study, a Cluster Randomized Trial

    PubMed Central

    Sorensen, Glorian; Pednekar, Mangesh; Cordeira, Laura Shulman; Pawar, Pratibha; Nagler, Eve; Stoddard, Anne M.; Kim, Hae-Young; Gupta, Prakash C.

    2016-01-01

    Objectives: We assessed a worksite intervention designed to promote tobacco control among manufacturing workers in Greater Mumbai, India. Methods: We used a cluster-randomized design to test an integrated health promotion/health protection intervention, which addressed changes at the management and worker levels. Between July 2012 and July 2013, we recruited 20 worksites on a rolling basis and randomly assigned them to intervention or delayed-intervention control conditions. The follow-up survey was conducted between December 2013 and November 2014. Results: The difference in 30-day quit rates between intervention and control conditions was statistically significant for production workers (OR=2.25, P=0.03), although not for the overall sample (OR=1.70; P=0.12). The intervention resulted in a doubling of the 6-month cessation rates among workers in the intervention worksites compared to those in the control, for production workers (OR=2.29; P=0.07) and for the overall sample (OR=1.81; P=0.13), but the difference did not reach statistical significance. Conclusions: These findings demonstrate the potential impact of a tobacco control intervention that combined tobacco control and health protection programming within Indian manufacturing worksites. PMID:26883793

  18. Epidemiological analysis of Salmonella clusters identified by whole genome sequencing, England and Wales 2014.

    PubMed

    Waldram, Alison; Dolan, Gayle; Ashton, Philip M; Jenkins, Claire; Dallman, Timothy J

    2018-05-01

    The unprecedented level of bacterial strain discrimination provided by whole genome sequencing (WGS) presents new challenges with respect to the utility and interpretation of the data. Whole genome sequences from 1445 isolates of Salmonella belonging to the most commonly identified serotypes in England and Wales, isolated between April and August 2014, were analysed. Single-linkage single nucleotide polymorphism (SNP) thresholds at the 10, 5 and 0 SNP levels were explored for evidence of epidemiological links between clustered cases. Analysis of the WGS data organised 566 of the 1445 isolates into 32 clusters of five or more. A statistically significant epidemiological link was identified for 17 clusters. The clusters were associated with foreign travel (n = 8), consumption of Chinese takeaways (n = 4), chicken eaten at home (n = 2), and one each of the following: eating out, contact with another case in the home, and contact with reptiles. In the same time frame, one cluster was detected using traditional outbreak detection methods. WGS can be used for the highly specific and highly sensitive detection of biologically related isolates when epidemiological links are obscured. Improvements in the collection of detailed, standardised exposure information would enhance cluster investigations. Copyright © 2017 Elsevier Ltd. All rights reserved.
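Single-linkage clustering at a SNP threshold can be sketched as connected components of a graph in which two isolates are linked when their pairwise SNP distance is at most the threshold. The pairwise distance matrix and the function name below are hypothetical inputs for illustration; the study's actual pipeline is not described in the abstract.

```python
def snp_clusters(dist: list[list[int]], threshold: int) -> list[list[int]]:
    """Single-linkage clusters: isolates i and j are linked if dist[i][j] <= threshold.

    Clusters are the connected components of that graph, found with union-find.
    """
    n = len(dist)
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= threshold:
                parent[find(i)] = find(j)
    groups: dict[int, list[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Single linkage chains: with a 10-SNP threshold, isolates 12 SNPs apart still share a cluster if a third isolate lies within 10 SNPs of each, which is one reason tighter thresholds (5 or 0) give smaller, more specific clusters.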

  19. Three estimates of the association between linear growth failure and cognitive ability.

    PubMed

    Cheung, Y B; Lam, K F

    2009-09-01

    To compare three estimators of association between growth stunting, as measured by height-for-age Z-score, and cognitive ability in children, and to examine the extent to which statistical adjustment for covariates is useful for removing confounding due to socio-economic status. Three estimators for panel data, namely the random-effects, within-cluster and between-cluster estimators, were used to estimate the association in a survey of 1105 pairs of siblings who were assessed for anthropometry and cognition. Furthermore, a 'combined' model was formulated to simultaneously provide the within- and between-cluster estimates. Random-effects and between-cluster estimators showed a strong association between linear growth and cognitive ability, even after adjustment for a range of socio-economic variables. In contrast, the within-cluster estimator showed a much more modest association: for every increase of one Z-score in linear growth, cognitive ability increased by about 0.08 standard deviation (P < 0.001). The combined model verified that the between-cluster estimate was significantly larger than the within-cluster estimate (P = 0.004). Residual confounding by socio-economic circumstances may explain a substantial proportion of the observed association between linear growth and cognition in studies that attempt to control the confounding by means of multivariable regression analysis. The within-cluster estimator provides more convincing and modest results about the strength of association.
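The difference between within- and between-cluster estimators can be sketched with simple least squares, treating each sibling pair as a cluster. This is a simplified illustration under stated assumptions (no covariates, no random effects, hypothetical function names), not the authors' combined model. The within estimator uses deviations from each family's mean, so anything siblings share (such as household socio-economic status) drops out; the between estimator uses only family means.

```python
from collections import defaultdict


def _slope(xs: list[float], ys: list[float]) -> float:
    """Ordinary least-squares slope of ys on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx


def within_between(cluster: list, x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (within-cluster slope, between-cluster slope)."""
    groups = defaultdict(list)
    for g, xi, yi in zip(cluster, x, y):
        groups[g].append((xi, yi))
    x_dev, y_dev, x_mean, y_mean = [], [], [], []
    for members in groups.values():
        gx = sum(p[0] for p in members) / len(members)
        gy = sum(p[1] for p in members) / len(members)
        x_mean.append(gx)
        y_mean.append(gy)
        for xi, yi in members:          # deviations from the family mean
            x_dev.append(xi - gx)
            y_dev.append(yi - gy)
    return _slope(x_dev, y_dev), _slope(x_mean, y_mean)
```

In a toy dataset where families differ strongly in both height and cognition but siblings differ only slightly, the between slope can be about ten times the within slope, mirroring the pattern the abstract attributes to residual confounding.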

  20. Multilocus microsatellite typing shows three different genetic clusters of Leishmania major in Iran.

    PubMed

    Mahnaz, Tashakori; Al-Jawabreh, Amer; Kuhls, Katrin; Schönian, Gabriele

    2011-10-01

    Ten polymorphic microsatellite markers were used to analyse 25 strains of Leishmania major collected from cutaneous leishmaniasis cases in different endemic areas in Iran. Nine of the markers were polymorphic, revealing 21 different genotypes. The data displayed significant microsatellite polymorphism with rare allelic heterozygosity. Bayesian statistical and distance-based analyses identified three genetic clusters among the 25 strains analysed. Cluster I represented mainly strains isolated in the west and south-west of Iran, with the exception of four strains originating from central Iran. Cluster II comprised strains from the central part of Iran, and cluster III included only strains from north Iran. The geographical distribution of L. major in Iran was supported by comparing the microsatellite profiles of the 25 Iranian strains to those of 105 strains collected in 19 Asian and African countries. The Iranian clusters I and II were separated from three previously described populations comprising strains from Africa, the Middle East and Central Asia, whereas cluster III grouped together with the Central Asian population. The considerable genetic variability of L. major might be related to the existence of different populations of Phlebotomus papatasi and/or to differences in reservoir host abundance in different parts of Iran. Copyright © 2011 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.

  1. Luminosity Function of Faint Globular Clusters in M87

    NASA Astrophysics Data System (ADS)

    Waters, Christopher Z.; Zepf, Stephen E.; Lauer, Tod R.; Baltz, Edward A.; Silk, Joseph

    2006-10-01

    We present the luminosity function to very faint magnitudes for the globular clusters in M87, based on a 30 orbit Hubble Space Telescope (HST) WFPC2 imaging program. The very deep images and corresponding improved false source rejection allow us to probe the mass function further beyond the turnover than has been done before. We compare our luminosity function to those that have been observed in the past, and confirm the similarity of the turnover luminosity between M87 and the Milky Way. We also find with high statistical significance that the M87 luminosity function is broader than that of the Milky Way. We discuss how determining the mass function of the cluster system to low masses can constrain theoretical models of the dynamical evolution of globular cluster systems. Our mass function is consistent with the dependence of mass loss on the initial cluster mass given by classical evaporation, and somewhat inconsistent with newer proposals that have a shallower mass dependence. In addition, the rate of mass loss is consistent with standard evaporation models, and not with the much higher rates proposed by some recent studies of very young cluster systems. We also find that the mass-size relation has very little slope, indicating that there is almost no increase in the size of a cluster with increasing mass.

  2. X-ray spectral observations of clusters of galaxies undergoing merger events

    NASA Astrophysics Data System (ADS)

    Henriksen, Mark J.

    1993-09-01

    We have analyzed the HEAO 1 A2 observations of two clusters whose optical and X-ray isophotes are suggestive of merging subclusters, A119 and A754, and find evidence of nonisothermal X-ray emission from both clusters. The X-ray spectrum of both clusters, when fitted with a single isothermal model, shows residual soft X-ray emission. There is a statistically significant reduction in chi-squared (98 percent probability based on the F-test) when a second temperature component is added. If the asymmetric isophotes seen in the soft X-ray image are indicative of merging subclusters, then our analysis of the Einstein IPC spectra and Solid State Spectrometer observations of A754, which provide some spatial and spectral resolution, suggests that the two temperature components seen in the HEAO 1 A2 spectra are associated with gas trapped in the subcluster potential wells. The implied subcluster isothermal masses suggest that a more massive cluster is accreting a less massive companion in A754. The present observations cannot rule out the alternative possibility that the cooler gas is associated with the outer cluster atmosphere rather than individual subclusters, as appears to be the case for A119. Astro D observations will be necessary to distinguish between these two possibilities for both clusters.
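The two-temperature result rests on a nested-model F-test: the drop in chi-squared from adding a second temperature component is compared with the residual chi-squared per degree of freedom. A minimal Python sketch, assuming the usual form of this test; the chi-squared values and degrees of freedom in the usage line are hypothetical and are not taken from the HEAO 1 A2 fits.

```python
def f_test_extra_component(chi2_simple, dof_simple, chi2_complex, dof_complex):
    """F statistic for adding free parameters to a nested spectral model.

    chi2_*: best-fit chi-squared of each model.
    dof_*:  degrees of freedom (spectral bins minus free parameters).
    """
    if dof_complex >= dof_simple:
        raise ValueError("the more complex model must have fewer degrees of freedom")
    extra_params = dof_simple - dof_complex
    # chi-squared improvement per added parameter ...
    improvement = (chi2_simple - chi2_complex) / extra_params
    # ... relative to the reduced chi-squared of the more complex model
    return improvement / (chi2_complex / dof_complex)


# Hypothetical fits: one temperature (chi2 = 60, 40 dof) vs. two (chi2 = 45, 38 dof)
f_value = f_test_extra_component(60.0, 40, 45.0, 38)
print(round(f_value, 3))  # 6.333
```

The probability quoted in the abstract would then come from the F distribution with (extra_params, dof_complex) degrees of freedom, for example via `scipy.stats.f.cdf(f_value, 2, 38)`.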

  3. Intracluster age gradients in numerous young stellar clusters

    NASA Astrophysics Data System (ADS)

    Getman, K. V.; Feigelson, E. D.; Kuhn, M. A.; Bate, M. R.; Broos, P. S.; Garmire, G. P.

    2018-05-01

    The pace and pattern of star formation leading to rich young stellar clusters is quite uncertain. In this context, we analyse the spatial distribution of ages within 19 young (median t ≲ 3 Myr on the Siess et al. time-scale), morphologically simple, isolated, and relatively rich stellar clusters. Our analysis is based on young stellar object (YSO) samples from the Massive Young Star-Forming Complex Study in Infrared and X-ray and Star Formation in Nearby Clouds surveys, and a new estimator of pre-main sequence (PMS) stellar ages, AgeJX, derived from X-ray and near-infrared photometric data. Median cluster ages are computed within four annular subregions of the clusters. We confirm and extend the earlier result of Getman et al. (2014): 80 per cent of the clusters show age trends where stars in cluster cores are younger than in outer regions. Our cluster stacking analyses establish the existence of an age gradient to high statistical significance in several ways. Time-scales vary with the choice of PMS evolutionary model; the inferred median age gradient across the studied clusters ranges from 0.75 to 1.5 Myr pc⁻¹. The empirical finding reported in the present study (late or continuing formation of stars in the cores of star clusters, with older stars dispersed in the outer regions) is well supported by other observational studies and by astrophysical models such as the global hierarchical collapse model of Vázquez-Semadeni et al.
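The annular-median step can be illustrated with a short sketch: bin stars by projected radius into four annuli, take the median age in each, and fit a straight line to obtain a gradient in Myr per pc. This assumes hypothetical star positions, ages and annulus edges; the study's AgeJX estimator, its scaling of annuli to each cluster, and the stacking analysis are not reproduced.

```python
from statistics import median


def annular_median_ages(stars, edges):
    """Median age of the stars falling in each annulus [edges[k], edges[k+1]).

    stars: list of (projected_radius_pc, age_myr) pairs.
    Returns one median per annulus, or None for an empty annulus.
    """
    medians = []
    for inner, outer in zip(edges[:-1], edges[1:]):
        ages = [age for r, age in stars if inner <= r < outer]
        medians.append(median(ages) if ages else None)
    return medians


def age_gradient(edges, medians):
    """Least-squares slope (Myr per pc) of median age vs. annulus mid-radius."""
    points = [((a + b) / 2, m) for a, b, m in zip(edges[:-1], edges[1:], medians)
              if m is not None]
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


# Hypothetical YSOs: (radius in pc, age in Myr), older toward the outskirts
stars = [(0.1, 1.0), (0.2, 1.2), (0.6, 1.5), (0.7, 1.7),
         (1.1, 2.0), (1.3, 2.2), (1.6, 2.5), (1.8, 2.7)]
edges = [0.0, 0.5, 1.0, 1.5, 2.0]  # four hypothetical annuli
medians = annular_median_ages(stars, edges)
print(medians)
print(age_gradient(edges, medians))
```

A positive slope means the outer annuli are older than the core, the sense of the trend reported above.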

  4. Nursing home care quality: a cluster analysis.

    PubMed

    Grøndahl, Vigdis Abrahamsen; Fagerli, Liv Berit

    2017-02-13

    Purpose: The purpose of this paper is to explore potential differences in how nursing home residents rate care quality and to explore cluster characteristics. Design/methodology/approach: A cross-sectional design was used, with one questionnaire including questions from quality from patients' perspective and Big Five personality traits, together with questions related to socio-demographic aspects and health condition. Residents (n = 103) from four Norwegian nursing homes participated (74.1 per cent response rate). Hierarchical cluster analysis identified clusters with respect to care quality perceptions. χ² tests and one-way between-groups ANOVA were performed to characterise the clusters (p < 0.05). Findings: Two clusters were identified: Cluster 1 residents (28.2 per cent) had the best care quality perceptions and Cluster 2 (67.0 per cent) had the worst perceptions. The clusters were statistically significant and characterised by personal-related conditions (gender, psychological well-being, preferences, admission, satisfaction with staying in the nursing home, emotional stability and agreeableness) and by external objective care conditions (healthcare personnel and registered nurses). Research limitations/implications: Only residents assessed as having no cognitive impairments were included, thus excluding the largest group. By choosing questionnaire design and structured interviews, the number able to participate may increase. Practical implications: Findings may provide healthcare personnel and managers with increased knowledge on which to develop strategies to improve specific care quality perceptions. Originality/value: Cluster analysis can be an effective tool for differentiating between nursing home residents' care quality perceptions.

  5. A First Estimate of the X-Ray Binary Frequency as a Function of Star Cluster Mass in a Single Galactic System

    NASA Astrophysics Data System (ADS)

    Clark, D. M.; Eikenberry, S. S.; Brandl, B. R.; Wilson, J. C.; Carson, J. C.; Henderson, C. P.; Hayward, T. L.; Barry, D. J.; Ptak, A. F.; Colbert, E. J. M.

    2008-05-01

    We use the 15 previously identified infrared star cluster counterparts to X-ray point sources in the interacting galaxies NGC 4038/4039 (the Antennae) to study the relationship between total cluster mass and X-ray binary number. This significant population of X-ray/IR associations allows us to perform, for the first time, a statistical study of X-ray point sources and their environments. We define a quantity, η, relating the fraction of X-ray sources per unit mass as a function of cluster mass in the Antennae. We compute cluster mass by fitting spectral evolutionary models to Ks luminosity. Because this method depends on cluster age, we use four different age distributions to explore the effects of cluster age on the value of η and find that it varies by less than a factor of 4. Across these distributions we find a mean value of η = 1.7 × 10⁻⁸ M⊙⁻¹ with σ_η = 1.2 × 10⁻⁸ M⊙⁻¹. Performing a χ² test, we demonstrate that η could exhibit a positive slope, but that this depends on the assumed distribution of cluster ages. While the estimated uncertainties in η are factors of a few, we believe this is the first estimate of this quantity to "order of magnitude" accuracy. We also compare our findings to theoretical models of open and globular cluster evolution, incorporating the X-ray binary fraction per cluster.

  6. Critical Analysis of Cluster Models and Exchange-Correlation Functionals for Calculating Magnetic Shielding in Molecular Solids.

    PubMed

    Holmes, Sean T; Iuliucci, Robbie J; Mueller, Karl T; Dybowski, Cecil

    2015-11-10

    Calculations of the principal components of magnetic-shielding tensors in crystalline solids require the inclusion of the effects of lattice structure on the local electronic environment to obtain significant agreement with experimental NMR measurements. We assess periodic (GIPAW) and GIAO/symmetry-adapted cluster (SAC) models for computing magnetic-shielding tensors by calculations on a test set containing 72 insulating molecular solids, with a total of 393 principal components of chemical-shift tensors from ¹³C, ¹⁵N, ¹⁹F, and ³¹P sites. When clusters are carefully designed to represent the local solid-state environment and when periodic calculations include sufficient variability, both methods predict magnetic-shielding tensors that agree well with experimental chemical-shift values, demonstrating the correspondence of the two computational techniques. At the basis-set limit, we find that the small differences in the computed values have no statistical significance for three of the four nuclides considered. Subsequently, we explore the effects of additional DFT methods available only with the GIAO/cluster approach, particularly the use of hybrid-GGA functionals, meta-GGA functionals, and hybrid meta-GGA functionals that demonstrate improved agreement in calculations on symmetry-adapted clusters. We demonstrate that meta-GGA functionals improve computed NMR parameters over those obtained by GGA functionals in all cases, and that hybrid functionals improve computed results over the respective pure DFT functional for all nuclides except ¹⁵N.

  7. The Influence of the Host Plant Is the Major Ecological Determinant of the Presence of Nitrogen-Fixing Root Nodule Symbiont Cluster II Frankia Species in Soil

    PubMed Central

    Battenberg, Kai; Wren, Jannah A.; Hillman, Janell; Edwards, Joseph; Huang, Liujing

    2016-01-01

    ABSTRACT The actinobacterial genus Frankia establishes nitrogen-fixing root nodule symbioses with specific hosts within the nitrogen-fixing plant clade. Of four genetically distinct subgroups of Frankia, cluster I, II, and III strains are capable of forming effective nitrogen-fixing symbiotic associations, while cluster IV strains generally do not. Cluster II Frankia strains have rarely been detected in soil devoid of host plants, unlike cluster I or III strains, suggesting a stronger association with their host. To investigate the degree of host influence, we characterized the cluster II Frankia strain distribution in rhizosphere soil in three locations in northern California. The presence/absence of cluster II Frankia strains at a given site correlated significantly with the presence/absence of host plants on the site, as determined by glutamine synthetase (glnA) gene sequence analysis, and by microbiome analysis (16S rRNA gene) of a subset of host/nonhost rhizosphere soils. However, the distribution of cluster II Frankia strains was not significantly affected by other potential determinants such as host-plant species, geographical location, climate, soil pH, or soil type. Rhizosphere soil microbiome analysis showed that cluster II Frankia strains occupied only a minute fraction of the microbiome even in the host-plant-present site and further revealed no statistically significant difference in the α-diversity or in the microbiome composition between the host-plant-present or -absent sites. Taken together, these data suggest that host plants provide a factor that is specific for cluster II Frankia strains, not a general growth-promoting factor. Further, the factor accumulates or is transported at the site level, i.e., beyond the host rhizosphere. IMPORTANCE Biological nitrogen fixation is a bacterial process that accounts for a major fraction of net new nitrogen input in terrestrial ecosystems. 
Transfer of fixed nitrogen to plant biomass is especially efficient via root nodule symbioses, which represent evolutionarily and ecologically specialized mutualistic associations. Frankia spp. (Actinobacteria), especially cluster II Frankia spp., have an extremely broad host range, yet comparatively little is known about the soil ecology of these organisms in relation to the host plants and their rhizosphere microbiomes. This study reveals a strong influence of the host plant on soil distribution of cluster II Frankia spp. PMID:27795313

  8. The Hubble Space Telescope Medium Deep Survey Cluster Sample: Methodology and Data

    NASA Astrophysics Data System (ADS)

    Ostrander, E. J.; Nichol, R. C.; Ratnatunga, K. U.; Griffiths, R. E.

    1998-12-01

    We present a new, objectively selected, sample of galaxy overdensities detected in the Hubble Space Telescope Medium Deep Survey (MDS). These clusters/groups were found using an automated procedure that involved searching for statistically significant galaxy overdensities. The contrast of the clusters against the field galaxy population is increased when morphological data are used to search around bulge-dominated galaxies. In total, we present 92 overdensities above a probability threshold of 99.5%. We show, via extensive Monte Carlo simulations, that at least 60% of these overdensities are likely to be real clusters and groups and not random line-of-sight superpositions of galaxies. For each overdensity in the MDS cluster sample, we provide a richness and the average of the bulge-to-total ratio of galaxies within each system. This MDS cluster sample potentially contains some of the most distant clusters/groups ever detected, with about 25% of the overdensities having estimated redshifts z ≳ 0.9. We have made this sample publicly available to facilitate spectroscopic confirmation of these clusters and help more detailed studies of cluster and galaxy evolution. We also report the serendipitous discovery of a new cluster close on the sky to the rich optical cluster Cl 0016+16 at z = 0.546. This new overdensity, HST 001831+16208, may be coincident with both an X-ray source and a radio source. HST 001831+16208 is the third cluster/group discovered near to Cl 0016+16 and appears to strengthen the claims of Connolly et al. of superclustering at high redshift.

  9. Cluster Stability Estimation Based on a Minimal Spanning Trees Approach

    NASA Astrophysics Data System (ADS)

    Volkovich, Zeev (Vladimir); Barzily, Zeev; Weber, Gerhard-Wilhelm; Toledano-Kitai, Dvora

    2009-08-01

    Among the areas of data and text mining employed today in science, economics and technology, clustering theory serves as a preprocessing step in data analysis. However, many open questions still await theoretical and practical treatment; for example, the problem of determining the true number of clusters has not been satisfactorily solved. In the current paper, this problem is addressed by the cluster stability approach. For several possible numbers of clusters we estimate the stability of partitions obtained from clustering samples. Partitions are considered consistent if their clusters are stable. Cluster validity is measured as the total number of edges, in the clusters' minimal spanning trees, connecting points from different samples; this is the Friedman and Rafsky two-sample test statistic. Under the homogeneity hypothesis (samples well mingled within the clusters), this statistic is asymptotically normally distributed. On this basis, the standard score of the edge count is computed, and partition quality is represented by the worst cluster, i.e., the one with the minimal standard score. It is natural to expect that the true number of clusters can be characterized by the empirical distribution having the shortest left tail. The proposed methodology sequentially creates the described value distribution and estimates its left-asymmetry. Numerical experiments presented in the paper demonstrate the ability of the approach to detect the true number of clusters.
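The validity measure above can be sketched for a single cluster: build the Euclidean minimal spanning tree over the cluster's points, count the edges joining points from different samples, and standardize that count. This is a minimal sketch, not the authors' implementation; the paper standardizes using the asymptotic normal distribution, whereas the sketch estimates the null mean and standard deviation by permuting the sample labels, and the example points are hypothetical.

```python
import math
import random


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (i, j) edges."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known connection of each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def cross_edges(edges, labels):
    """Number of MST edges joining points that come from different samples."""
    return sum(1 for i, j in edges if labels[i] != labels[j])


def fr_standard_score(sample_a, sample_b, n_perm=500, seed=0):
    """Observed cross-edge count and its standard score under label permutation."""
    points = list(sample_a) + list(sample_b)
    labels = [0] * len(sample_a) + [1] * len(sample_b)
    edges = mst_edges(points)  # the tree does not depend on the labels
    observed = cross_edges(edges, labels)
    rng = random.Random(seed)
    null = []
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        null.append(cross_edges(edges, shuffled))
    mean = sum(null) / n_perm
    sd = math.sqrt(sum((x - mean) ** 2 for x in null) / (n_perm - 1))
    return observed, (observed - mean) / sd


# Hypothetical cluster whose two samples do not mingle at all
a = [(0, 0), (0, 1), (1, 0), (1, 1)]
b = [(10, 10), (10, 11), (11, 10), (11, 11)]
print(fr_standard_score(a, b))  # one cross edge, negative score
```

A strongly negative score flags a cluster in which the two samples stay apart, i.e. an unstable cluster; taking the minimum over clusters gives the "worst cluster" score described above.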

  10. The capacity limitations of orientation summary statistics

    PubMed Central

    Attarha, Mouna; Moore, Cathleen M.

    2015-01-01

    The simultaneous–sequential method was used to test the processing capacity of establishing mean orientation summaries. Four clusters of oriented Gabor patches were presented in the peripheral visual field. One of the clusters had a mean orientation that was tilted either left or right while the mean orientations of the other three clusters were roughly vertical. All four clusters were presented at the same time in the simultaneous condition whereas the clusters appeared in temporal subsets of two in the sequential condition. Performance was lower when the means of all four clusters had to be processed concurrently than when only two had to be processed in the same amount of time. The advantage for establishing fewer summaries at a given time indicates that the processing of mean orientation engages limited-capacity processes (Experiment 1). This limitation cannot be attributed to crowding, low target-distractor discriminability, or a limited-capacity comparison process (Experiments 2 and 3). In contrast to the limitations of establishing multiple summary representations, establishing a single summary representation unfolds without interference (Experiment 4). When interpreted in the context of recent work on the capacity of summary statistics, these findings encourage reevaluation of the view that early visual perception consists of summary statistic representations that unfold independently across multiple areas of the visual field. PMID:25810160

  11. Gap Shape Classification using Landscape Indices and Multivariate Statistics

    PubMed Central

    Wu, Chih-Da; Cheng, Chi-Chuan; Chang, Che-Chang; Lin, Chinsu; Chang, Kun-Cheng; Chuang, Yung-Chung

    2016-01-01

    This study proposed a novel methodology to classify the shape of gaps using landscape indices and multivariate statistics. Patch-level indices were used to collect the qualified shape and spatial configuration characteristics for canopy gaps in the Lienhuachih Experimental Forest in Taiwan in 1998 and 2002. Non-hierarchical cluster analysis was used to assess the optimal number of gap clusters and canonical discriminant analysis was used to generate the discriminant functions for canopy gap classification. The gaps for the two periods were optimally classified into three categories. In general, gap type 1 had a more complex shape, gap type 2 was more elongated and gap type 3 had the largest gaps that were more regular in shape. Wilks’ lambda indicated that the classification was satisfactory (p < 0.001). The agreement rate of confusion matrices exceeded 96%. Differences in gap characteristics between the classified gap types, tested with one-way ANOVA, were statistically significant for all patch indices (p = 0.00) except the Euclidean nearest neighbor distance (ENN) in 2002. Taken together, these results demonstrated the feasibility and applicability of the proposed methodology to classify the shape of a gap. PMID:27901127

  12. Gap Shape Classification using Landscape Indices and Multivariate Statistics.

    PubMed

    Wu, Chih-Da; Cheng, Chi-Chuan; Chang, Che-Chang; Lin, Chinsu; Chang, Kun-Cheng; Chuang, Yung-Chung

    2016-11-30

    This study proposed a novel methodology to classify the shape of gaps using landscape indices and multivariate statistics. Patch-level indices were used to collect the qualified shape and spatial configuration characteristics for canopy gaps in the Lienhuachih Experimental Forest in Taiwan in 1998 and 2002. Non-hierarchical cluster analysis was used to assess the optimal number of gap clusters and canonical discriminant analysis was used to generate the discriminant functions for canopy gap classification. The gaps for the two periods were optimally classified into three categories. In general, gap type 1 had a more complex shape, gap type 2 was more elongated and gap type 3 had the largest gaps that were more regular in shape. Wilks' lambda indicated that the classification was satisfactory (p < 0.001). The agreement rate of confusion matrices exceeded 96%. Differences in gap characteristics between the classified gap types, tested with one-way ANOVA, were statistically significant for all patch indices (p = 0.00) except the Euclidean nearest neighbor distance (ENN) in 2002. Taken together, these results demonstrated the feasibility and applicability of the proposed methodology to classify the shape of a gap.

  13. Into the Bowels of Depression: Unravelling Medical Symptoms Associated with Depression by Applying Machine-Learning Techniques to a Community Based Population Sample.

    PubMed

    Dipnall, Joanna F; Pasco, Julie A; Berk, Michael; Williams, Lana J; Dodd, Seetal; Jacka, Felice N; Meyer, Denny

    2016-01-01

    Depression is commonly comorbid with many other somatic diseases and symptoms. Identification of individuals in clusters with comorbid symptoms may reveal new pathophysiological mechanisms and treatment targets. The aim of this research was to combine machine-learning (ML) algorithms with traditional regression techniques by utilising self-reported medical symptoms to identify and describe clusters of individuals with increased rates of depression from a large cross-sectional community based population epidemiological study. A multi-staged methodology utilising ML and traditional statistical techniques was performed using the community based population National Health and Nutrition Examination Survey (2009-2010) (N = 3,922). A Self-Organising Map (SOM) ML algorithm, combined with hierarchical clustering, was performed to create participant clusters based on 68 medical symptoms. Binary logistic regression, controlling for sociodemographic confounders, was then used to identify the key clusters of participants with higher levels of depression (PHQ-9 ≥ 10, n = 377). Finally, a Multiple Additive Regression Tree boosted ML algorithm was run to identify the important medical symptoms for each key cluster within 17 broad categories: heart, liver, thyroid, respiratory, diabetes, arthritis, fractures and osteoporosis, skeletal pain, blood pressure, blood transfusion, cholesterol, vision, hearing, psoriasis, weight, bowels and urinary. Five clusters of participants, based on medical symptoms, were identified to have significantly increased rates of depression compared to the cluster with the lowest rate: odds ratios ranged from 2.24 (95% CI 1.56, 3.24) to 6.33 (95% CI 1.67, 24.02). The ML boosted regression algorithm identified three key medical condition categories as being significantly more common in these clusters: bowel, pain and urinary symptoms. Bowel-related symptoms were found to dominate the relative importance of symptoms within the five key clusters. 
This methodology shows promise for the identification of conditions in general populations and supports the current focus on the potential importance of bowel symptoms and the gut in mental health research.
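The odds ratios above compare each cluster's odds of depression with those of the lowest-rate cluster. The study's estimates come from logistic regression adjusted for sociodemographic confounders; the sketch below computes only the simpler unadjusted odds ratio with a Wald 95% confidence interval from a 2×2 table, using hypothetical counts, to show what such a figure measures.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval from a 2x2 table.

    a: depressed in the cluster,          b: not depressed in the cluster
    c: depressed in the reference cluster, d: not depressed in the reference cluster
    """
    or_value = (a * d) / (b * c)
    # standard error of log(OR)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_value) - z * se_log)
    upper = math.exp(math.log(or_value) + z * se_log)
    return or_value, lower, upper


print(odds_ratio_wald(20, 80, 10, 90))  # hypothetical counts; OR = 2.25
```

As with the wide interval reported for the largest odds ratio, small cell counts inflate the standard error and widen the interval.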

  14. Spatial clusters of daytime sleepiness and association with nighttime noise levels in a Swiss general population (GeoHypnoLaus).

    PubMed

    Joost, Stéphane; Haba-Rubio, José; Himsl, Rebecca; Vollenweider, Peter; Preisig, Martin; Waeber, Gérard; Marques-Vidal, Pedro; Heinzer, Raphaël; Guessous, Idris

    2018-05-31

    Daytime sleepiness is highly prevalent in the general adult population and has been linked to an increased risk of workplace and vehicle accidents, lower professional performance and poorer health. Despite the established relationship between noise and daytime sleepiness, little research has explored the individual-level spatial distribution of noise-related sleep disturbances. We assessed the spatial dependence of daytime sleepiness and tested whether clusters of individuals exhibiting higher daytime sleepiness were characterized by higher nocturnal noise levels than other clusters. A population-based cross-sectional study was conducted in the city of Lausanne, Switzerland. Sleepiness was measured using the Epworth Sleepiness Scale (ESS) for 3697 georeferenced individuals from the CoLaus|PsyCoLaus cohort (period = 2009-2012). We used the sonBASE georeferenced database produced by the Swiss Federal Office for the Environment to characterize nighttime road traffic noise exposure throughout the city. We used the GeoDa software program to calculate the Getis-Ord Gi* statistics for unadjusted and adjusted ESS in order to detect spatial clusters of high and low ESS values. Modeled nighttime noise exposure from road and rail traffic was compared across ESS clusters. Daytime sleepiness was not randomly distributed and showed a significant spatial dependence. The median nighttime traffic noise exposure was significantly different across the three ESS Getis cluster classes (p < 0.001). The mean nighttime noise exposure in the high ESS cluster class was 47.6 dB(A), 5.2 dB(A) higher than in low clusters (p < 0.001) and 2.1 dB(A) higher than in the neutral class (p < 0.001). These associations were independent of major potential confounders including body mass index and neighborhood income level. Clusters of higher daytime sleepiness in adults are associated with higher median nighttime noise levels. The identification of these clusters can guide tailored public health interventions. 
Copyright © 2018 The Authors. Published by Elsevier GmbH. All rights reserved.
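The Getis-Ord Gi* statistic compares the weighted sum of values in each location's neighbourhood (including the location itself) with what would be expected if values were spatially random, and returns a z-score: large positive values mark hot spots, large negative values cold spots. A minimal sketch of the standard formula; the binary adjacency weights and ESS-like values are hypothetical, and the study's GeoDa settings (neighbourhood definition, covariate adjustment) are not reproduced.

```python
import math


def getis_ord_gi_star(values, weights):
    """Gi* z-score for every location.

    values:  attribute value x_j at each location.
    weights: weights[i][j] is the spatial weight between i and j; weights[i][i]
             is non-zero because Gi* counts each location as its own neighbour.
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(x * x for x in values) / n - mean ** 2)
    scores = []
    for i in range(n):
        w_sum = sum(weights[i])
        w_sq_sum = sum(w * w for w in weights[i])
        local = sum(w * x for w, x in zip(weights[i], values))
        numerator = local - mean * w_sum
        denominator = s * math.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores


# Hypothetical sleepiness scores for five people along a street;
# neighbours are the adjacent positions plus the person themselves.
ess = [4, 5, 4, 14, 15]
w = [[1.0 if abs(i - j) <= 1 else 0.0 for j in range(5)] for i in range(5)]
print([round(z, 2) for z in getis_ord_gi_star(ess, w)])
```

Locations at the high-scoring end of the street receive positive z-scores and those at the low end negative ones, which is how "high", "low" and "neutral" cluster classes are formed.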

  15. A spatial scan statistic for compound Poisson data.

    PubMed

    Rosychuk, Rhonda J; Chang, Hsing-Ming

    2013-12-20

    The topic of spatial cluster detection gained attention in statistics during the late 1980s and early 1990s. Effort has been devoted to the development of methods for detecting spatial clustering of cases and events in the biological sciences, astronomy and epidemiology. More recently, research has examined detecting clusters of correlated count data associated with health conditions of individuals. Such a method allows researchers to examine spatial relationships of disease-related events rather than just incident or prevalent cases. We introduce a spatial scan test that identifies clusters of events in a study region. Because an individual case may have multiple (repeated) events, we base the test on a compound Poisson model. We illustrate our method for cluster detection on emergency department visits, where individuals may make multiple disease-related visits. Copyright © 2013 John Wiley & Sons, Ltd.
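To make the scan idea concrete, the sketch below evaluates a set of candidate zones and returns the one with the largest log-likelihood ratio. It uses the ordinary Poisson scan statistic (Kulldorff's formulation), not the compound Poisson model introduced above, which additionally accounts for individuals contributing several events. The areas, counts and candidate zones are hypothetical, and the Monte Carlo replication normally used to assess significance is omitted.

```python
import math


def poisson_llr(c, e, total):
    """Poisson log-likelihood ratio for one candidate zone.

    c: observed cases in the zone, e: expected cases in the zone,
    total: observed cases in the whole region (expected counts sum to total).
    """
    if c <= e:
        return 0.0  # only zones with an excess of cases are of interest
    llr = c * math.log(c / e)
    outside, outside_expected = total - c, total - e
    if outside > 0:  # 0 * log(0) is taken as 0
        llr += outside * math.log(outside / outside_expected)
    return llr


def most_likely_cluster(cases, population, zones):
    """Zone (tuple of area indices) with the largest log-likelihood ratio."""
    total_cases, total_pop = sum(cases), sum(population)
    best_zone, best_llr = None, 0.0
    for zone in zones:
        c = sum(cases[i] for i in zone)
        e = total_cases * sum(population[i] for i in zone) / total_pop
        llr = poisson_llr(c, e, total_cases)
        if llr > best_llr:
            best_zone, best_llr = zone, llr
    return best_zone, best_llr


# Hypothetical: four equally populated areas on a line; zones are 1 or 2 adjacent areas
cases = [2, 3, 15, 2]
population = [1000, 1000, 1000, 1000]
zones = [(0,), (1,), (2,), (3,), (0, 1), (1, 2), (2, 3)]
print(most_likely_cluster(cases, population, zones))  # area 2 stands out
```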

  16. Impact of educational intervention on implementation of tobacco counselling among oral health professionals: a cluster-randomized community trial.

    PubMed

    Amemori, Masamitsu; Virtanen, Jorma; Korhonen, Tellervo; Kinnunen, Taru H; Murtomaa, Heikki

    2013-04-01

    Tobacco use adversely affects oral health. Clinical guidelines recommend that oral health professionals promote tobacco abstinence and provide patients who use tobacco with brief tobacco use cessation counselling. Research shows that these guidelines are seldom implemented successfully. This study aimed to evaluate two interventions to enhance tobacco use prevention and cessation (TUPAC) counselling among oral health professionals in Finland. We used a cluster-randomized community trial to test educational and fee-for-service interventions in enhancing TUPAC counselling among a sample of dentists (n=73) and dental hygienists (n=22) in Finland. Educational intervention consisted of 1 day of training, including lectures, interactive sessions, multimedia demonstrations and a role play session with standard patient cases. Fee-for-service intervention consisted of monetary compensation for providing tobacco use prevention or cessation counselling. TUPAC counselling procedures provided were reported and measured using an electronic dental records system. In data analysis, intent-to-treat principles were followed at both individual and cluster levels. Descriptive analysis included chi-square and t-tests. A general linear model for repeated measures was used to compare the outcome measures by intervention group. Of 95 providers, 73 participated (76.8%). In preventive counselling, there was no statistically significant time effect or group-by-time interaction. In cessation counselling, statistically significant group-by-time interaction was found after a 6-month follow-up (F=2.31; P=0.007), indicating that counselling activity increased significantly in intervention groups. On average, dental hygienists showed greater activity in tobacco prevention (F=12.13; P=0.001) and cessation counselling (F=30.19; P<0.001) than did dentists. 
In addition, cessation counselling showed a statistically significant provider-by-group-by-time interaction (F=5.95; P<0.001), indicating that interventions to enhance cessation counselling were more effective among dental hygienists. Educational intervention yielded positive short-term effects on cessation counselling, but not on preventive counselling. Adding a fee-for-service to education failed to significantly improve TUPAC counselling performance. Other approaches than monetary incentives may be needed to enhance the effectiveness of educational intervention. Further studies with focus on how to achieve long-term changes in TUPAC counselling activity among oral health professionals are needed. © 2012 John Wiley & Sons A/S.

  17. Partially supervised speaker clustering.

    PubMed

    Tang, Hao; Chu, Stephen Mingyu; Hasegawa-Johnson, Mark; Huang, Thomas S

    2012-05-01

    Content-based multimedia indexing, retrieval, and processing as well as multimedia databases demand the structuring of the media content (image, audio, video, text, etc.), one significant goal being to associate the identity of the content to the individual segments of the signals. In this paper, we specifically address the problem of speaker clustering, the task of assigning every speech utterance in an audio stream to its speaker. We offer a complete treatment of the idea of partially supervised speaker clustering, which refers to the use of our prior knowledge of speakers in general to assist the unsupervised speaker clustering process. By means of an independent training data set, we encode the prior knowledge at the various stages of the speaker clustering pipeline via 1) learning a speaker-discriminative acoustic feature transformation, 2) learning a universal speaker prior model, and 3) learning a discriminative speaker subspace, or equivalently, a speaker-discriminative distance metric. We study the directional scattering property of the Gaussian mixture model (GMM) mean supervector representation of utterances in the high-dimensional space, and advocate exploiting this property by using the cosine distance metric instead of the Euclidean distance metric for speaker clustering in the GMM mean supervector space. We propose to perform discriminant analysis based on the cosine distance metric, which leads to a novel distance metric learning algorithm: linear spherical discriminant analysis (LSDA). We show that the proposed LSDA formulation can be systematically solved within the elegant graph embedding general dimensionality reduction framework. 
Our speaker clustering experiments on the GALE database clearly indicate that 1) our speaker clustering methods based on the GMM mean supervector representation and vector-based distance metrics outperform traditional speaker clustering methods based on the “bag of acoustic features” representation and statistical model-based distance metrics, 2) our advocated use of the cosine distance metric yields consistent increases in the speaker clustering performance as compared to the commonly used Euclidean distance metric, 3) our partially supervised speaker clustering concept and strategies significantly improve the speaker clustering performance over the baselines, and 4) our proposed LSDA algorithm further leads to state-of-the-art speaker clustering performance.
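The case for the cosine metric rests on direction mattering more than length for these supervectors. The small sketch below contrasts the two metrics on tiny hypothetical vectors standing in for GMM mean supervectors; LSDA itself is not sketched.

```python
import math


def cosine_distance(u, v):
    """1 - cos(angle between u and v); ignores the vectors' lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 6.0]  # same direction as u, twice the length
w = [3.0, 2.0, 1.0]  # different direction, similar length
print(cosine_distance(u, v), math.dist(u, v))
print(cosine_distance(u, w), math.dist(u, w))
```

Euclidean distance places v farther from u than w is, while cosine distance treats u and v as identical in direction; if utterances from one speaker scatter mainly along a direction, the cosine metric groups them more naturally.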

  18. Unique relations between counterfactual thinking and DSM-5 PTSD symptom clusters.

    PubMed

    Mitchell, Melissa A; Contractor, Ateka A; Dranger, Paula; Shea, M Tracie

    2016-05-01

    Cognitive models of posttraumatic stress disorder (PTSD) propose that rumination about a trauma may increase particular symptom clusters. One type of rumination, termed counterfactual thinking (CFT), refers to thinking of alternative outcomes for an event. CFT centered on a trauma is thought to increase intrusions, negative alterations in mood and cognitions (NAMC), and marked alterations in arousal and reactivity (AAR). The theorized relations between CFT and specific symptom clusters have not been thoroughly investigated. Also, past work has not evaluated whether the relation is confounded by depressive symptoms, age, gender, or number of traumatic events experienced. The current study examined the unique associations between CFT and PTSD symptom clusters according to the Diagnostic and Statistical Manual of Mental Disorders (American Psychiatric Association, 2013) in 51 trauma-exposed treatment-seeking individuals. As predicted, CFT was associated with all PTSD symptom clusters. After controlling for common predictors of PTSD symptom severity (i.e., age, depressive symptoms, and number of traumatic life events endorsed), we found CFT to be significantly associated with the intrusion and avoidance symptom clusters but not the AAR or NAMC symptom clusters. Results from the present study provide further support for the role of rumination in specific PTSD symptom clusters above and beyond symptoms of depression, age, and number of traumatic life events endorsed. Future work may consider investigating interventions to reduce rumination in PTSD. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  19. Towards Development of Clustering Applications for Large-Scale Comparative Genotyping and Kinship Analysis Using Y-Short Tandem Repeats.

    PubMed

    Seman, Ali; Sapawi, Azizian Mohd; Salleh, Mohd Zaki

    2015-06-01

    Y-chromosome short tandem repeats (Y-STRs) are genetic markers with practical applications in human identification. However, where mass identification is required (e.g., in the aftermath of disasters with significant fatalities), the efficiency of the process could be improved with new statistical approaches. Clustering applications are relatively new tools for large-scale comparative genotyping, and the k-Approximate Modal Haplotype (k-AMH), an efficient algorithm for clustering large-scale Y-STR data, represents a promising method for developing these tools. In this study we improved the k-AMH and produced three new algorithms: the Nk-AMH I (including a new initial cluster center selection), the Nk-AMH II (including a new dominant weighting value), and the Nk-AMH III (combining I and II). The Nk-AMH III was the superior algorithm, with mean clustering accuracy that increased in four out of six datasets and remained at 100% in the other two. Additionally, the Nk-AMH III achieved a 2% higher overall mean clustering accuracy score than the k-AMH, as well as optimal accuracy for all datasets (0.84-1.00). With inclusion of the two new methods, the Nk-AMH III produced an optimal solution for clustering Y-STR data; thus, the algorithm has potential for further development towards fully automatic clustering of any large-scale genotypic data.

  20. Ecological tolerances of Miocene larger benthic foraminifera from Indonesia

    NASA Astrophysics Data System (ADS)

    Novak, Vibor; Renema, Willem

    2018-01-01

    To provide a comprehensive palaeoenvironmental reconstruction based on larger benthic foraminifera (LBF), a quantitative analysis of their assemblage composition is needed. Besides microfacies analysis, which includes the environmental preferences of foraminiferal taxa, statistical analyses should also be employed. Therefore, detrended correspondence analysis and cluster analysis were performed on relative abundance data of identified LBF assemblages deposited in mixed carbonate-siliciclastic (MCS) systems and blue-water (BW) settings. Studied MCS system localities include ten sections from the central part of the Kutai Basin in East Kalimantan, ranging from late Burdigalian to Serravallian age. The BW samples were collected from eleven sections of the Bulu Formation on Central Java, dated as Serravallian. Results from detrended correspondence analysis reveal significant differences between these two environmental settings. Cluster analysis produced five clusters of samples: clusters 1 and 2 comprise dominantly MCS samples, clusters 3 and 4 comprise dominantly BW samples, and cluster 5 shows a mixed composition with both MCS and BW samples. The cluster analysis results were then subjected to indicator species analysis, which separated the LBF taxa into three groups: typical assemblage indicators, regularly occurring taxa and rare taxa. By interpreting the results of detrended correspondence analysis, cluster analysis and indicator species analysis, along with the environmental preferences of identified LBF taxa, a palaeoenvironmental model is proposed for the distribution of LBF in Miocene MCS systems and adjacent BW settings of Indonesia.

  1. Interactive classification and content-based retrieval of tissue images

    NASA Astrophysics Data System (ADS)

    Aksoy, Selim; Marchisio, Giovanni B.; Tusk, Carsten; Koperski, Krzysztof

    2002-11-01

    We describe a system for interactive classification and retrieval of microscopic tissue images. Our system models tissues in pixel, region and image levels. Pixel level features are generated using unsupervised clustering of color and texture values. Region level features include shape information and statistics of pixel level feature values. Image level features include statistics and spatial relationships of regions. To reduce the gap between low-level features and high-level expert knowledge, we define the concept of prototype regions. The system learns the prototype regions in an image collection using model-based clustering and density estimation. Different tissue types are modeled using spatial relationships of these regions. Spatial relationships are represented by fuzzy membership functions. The system automatically selects significant relationships from training data and builds models which can also be updated using user relevance feedback. A Bayesian framework is used to classify tissues based on these models. Preliminary experiments show that the spatial relationship models we developed provide a flexible and powerful framework for classification and retrieval of tissue images.

  2. Satisfaction with Life, Meaning in Life, Sad Childhood Experiences, and Psychological Symptoms among Turkish Students.

    PubMed

    Cömert, Itır Tarı; Özyeşil, Zümra Atalay; Burcu Özgülük, S

    2016-02-01

    The aim of the current study was to investigate the contributions of sad childhood experiences, depression, anxiety, and stress, existence of a sense of meaning, and pursuit of meaning in explaining life satisfaction of young adults in Turkey. The sample comprised 400 undergraduate students (M age = 20.2 yr.) selected via random cluster sampling. There were no statistically significant differences between men and women in their scores on depression, existence of meaning, pursuit of meaning, and life satisfaction. However, there were statistically significant differences between men and women in sad childhood experiences, anxiety and stress. In hierarchical regression analysis, the model as a whole was significant. Depression and existence of meaning in life made unique significant contributions to the variance in life satisfaction. Students with lower depression and with a sense of meaning in life tended to be more satisfied with life.

  3. Clustered Stomates in "Begonia": An Exercise in Data Collection & Statistical Analysis of Biological Space

    ERIC Educational Resources Information Center

    Lau, Joann M.; Korn, Robert W.

    2007-01-01

    In this article, the authors present a laboratory exercise in data collection and statistical analysis in biological space using clustered stomates on leaves of "Begonia" plants. The exercise can be done in middle school classes by students making their own slides and seeing imprints of cells, or at the high school level through collecting data of…

  4. A General Framework for Power Analysis to Detect the Moderator Effects in Two- and Three-Level Cluster Randomized Trials

    ERIC Educational Resources Information Center

    Dong, Nianbo; Spybrook, Jessaca; Kelcey, Ben

    2016-01-01

    The purpose of this study is to propose a general framework for power analyses to detect the moderator effects in two- and three-level cluster randomized trials (CRTs). The study specifically aims to: (1) develop the statistical formulations for calculating statistical power, minimum detectable effect size (MDES) and its confidence interval to…

  5. The Ursa Major cluster of galaxies - III. Optical observations of dwarf galaxies and the luminosity function down to MR=-11

    NASA Astrophysics Data System (ADS)

    Trentham, Neil; Tully, R. Brent; Verheijen, Marc A. W.

    2001-07-01

    Results are presented of a deep optical survey of the Ursa Major cluster, a spiral-rich cluster of galaxies at a distance of 18.6 Mpc which contains about 30 per cent of the light but only 5 per cent of the mass of the nearby Virgo cluster. Fields around known cluster members and a pattern of blind fields along the major and minor axes of the cluster were studied with mosaic CCD cameras on the Canada-France-Hawaii Telescope. The dynamical crossing time for the Ursa Major cluster is only slightly less than a Hubble time. Most galaxies in the local Universe exist in similar moderate-density environments. The Ursa Major cluster is therefore a good place to study the statistical properties of dwarf galaxies, since this structure is at an evolutionary stage representative of typical environments, yet has enough galaxies that reasonable counting statistics can be accumulated. The main observational results of our survey are as follows. (i) The galaxy luminosity function is flat, with a logarithmic slope α=-1.1 for -17

  6. Magnification Bias in Gravitational Arc Statistics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Caminha, G. B.; Estrada, J.; Makler, M.

    2013-08-29

    The statistics of gravitational arcs in galaxy clusters is a powerful probe of cluster structure and may provide complementary cosmological constraints. Despite recent progress, discrepancies remain between modelling and observations of arc abundance, especially regarding the redshift distribution of strong lensing clusters. Moreover, fast "semi-analytic" methods have yet to incorporate the successes obtained with simulations. In this paper we discuss the contribution of the magnification in gravitational arc statistics. Although lensing conserves surface brightness, the magnification increases the signal-to-noise ratio of the arcs, enhancing their detectability. We present an approach to include this and other observational effects in semi-analytic calculations for arc statistics. The cross section for arc formation (σ) is computed through a semi-analytic method based on the ratio of the eigenvalues of the magnification tensor. Using this approach we obtained the scaling of σ with respect to the magnification and other parameters, allowing for a fast computation of the cross section. We apply this method to evaluate the expected number of arcs per cluster using an elliptical Navarro-Frenk-White matter distribution. Our results show that the magnification has a strong effect on the arc abundance, enhancing the fraction of arcs, moving the peak of the arc fraction to higher redshifts, and softening its decrease at high redshifts. We argue that the effect of magnification should be included in arc statistics modelling and that it could help to reconcile arc statistics predictions with the observational data.

  7. Text grouping in patent analysis using adaptive K-means clustering algorithm

    NASA Astrophysics Data System (ADS)

    Shanie, Tiara; Suprijadi, Jadi; Zulhanif

    2017-03-01

    Patents are one form of intellectual property. Analyzing patents is one way to understand how technology is developing in each country and worldwide. This study uses patent documents on green tea retrieved from the Espacenet server. Patent documents related to tea technology are widespread, which makes information retrieval (IR) difficult for users. It is therefore necessary to categorize the documents into groups of related terms. This study applies a statistical text mining method to green tea patent titles in two phases: data preparation and data analysis. The data preparation phase uses text mining methods, and the data analysis phase uses statistics, specifically a cluster analysis algorithm, the Adaptive K-Means Clustering Algorithm. Based on the maximum silhouette value, the analysis generated 87 clusters associated with fifteen terms, which can be used for information retrieval.
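The following sketch shows how to choose the number of clusters by the maximum silhouette value. It uses plain Lloyd's k-means with deterministic farthest-point seeding. This is not the Adaptive K-Means algorithm used in the study, whose details the abstract does not give. The 2-D points are hypothetical stand-ins for vectorized patent titles, and the function names are illustrative.

```python
import math

def farthest_point_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add the
    point whose nearest already-chosen center is farthest away."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = farthest_point_init(points, k)
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each non-empty cluster's center to its members' mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels

def silhouette(points, labels):
    """Mean silhouette width; a point alone in its cluster contributes 0."""
    scores = []
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not same:
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)

def choose_k(points, k_values):
    """Return (k, mean silhouette) for the candidate k with the highest silhouette."""
    best = (None, float("-inf"))
    for k in k_values:
        labels = kmeans(points, k)
        if len(set(labels)) < 2:
            continue  # silhouette needs at least two clusters
        score = silhouette(points, labels)
        if score > best[1]:
            best = (k, score)
    return best

# Hypothetical data: three tight groups of points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
k, score = choose_k(points, range(2, 6))
print(k)  # 3
```

Farthest-point seeding is used here only to keep the toy example deterministic. Random restarts or k-means++ seeding are common alternatives.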

  8. Altered activity of the sympathetic nervous system and changes in the balance of hypophyseal, pituitary and adrenal hormones in patients with cluster headache.

    PubMed

    Strittmatter, M; Hamann, G F; Grauer, M; Fischer, C; Blaes, F; Hoffmann, K H; Schimrigk, K

    1996-05-17

    Twelve patients (age 43.4 +/- 6.3 years) with episodic cluster headache (CH) were examined during the cluster period. Plasma norepinephrine levels in patients suffering from CH were significantly decreased compared with the control group (p < 0.01). There were also statistically significant correlations between norepinephrine levels and clinical features of the pain attacks including duration (r = 0.75, p < 0.05), intensity (r = 0.64, p < 0.05) and frequency (r = 0.68, p < 0.06), thereby suggesting a pathophysiological involvement of the sympathetic nervous system in CH. Increased plasma levels of cortisol and ACTH in patients with CH, especially in the morning and in the evening, suggest an alteration of the feedback circuit involving the hypothalamus, the pituitary and the adrenal gland, an imbalance in the hormones related to these structures, as well as an alteration of the circadian rhythm. In addition, CH patients demonstrated significantly decreased levels of norepinephrine (p < 0.05), HVA (p < 0.01) and 5-HIAA (p < 0.01) in the cerebrospinal fluid (CSF) consistent with a central genesis of CH. These significant relationships between neurochemical parameters and the clinical patterns suggest a complex interplay between the hypothalamus, neuroendocrinological parameters, activity of the autonomic nervous system and the pain of CH.

  9. COMPARING MID-INFRARED GLOBULAR CLUSTER COLORS WITH POPULATION SYNTHESIS MODELS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Barmby, P.; Jalilian, F. F.

    2012-04-15

    Several population synthesis models now predict integrated colors of simple stellar populations in the mid-infrared bands. To date, the models have not been extensively tested in this wavelength range. In a comparison of the predictions of several recent population synthesis models, the integrated colors are found to cover approximately the same range but to disagree in detail, for example, on the effects of metallicity. To test against observational data, globular clusters (GCs) are used as the closest objects to idealized groups of stars with a single age and single metallicity. Using recent mass estimates, we have compiled a sample of massive, old GCs in M31 which contain enough stars to guard against the stochastic effects of small-number statistics, and measured their integrated colors in the Spitzer/IRAC bands. Comparison of the cluster photometry in the IRAC bands with the model predictions shows that the models reproduce the cluster colors reasonably well, except for a small (not statistically significant) offset in [4.5] - [5.8]. In this color, models without circumstellar dust emission predict bluer values than are observed. Model predictions of colors formed from the V band and the IRAC 3.6 and 4.5 μm bands are redder than the observed data at high metallicities, and we discuss several possible explanations. In agreement with model predictions, V - [3.6] and V - [4.5] colors are found to have metallicity sensitivity similar to or slightly better than V - K_s.

  10. Soil chemistry and pollution study of a closed landfill site at Ampar Tenang, Selangor, Malaysia.

    PubMed

    Mohd Adnan, Siti Nur Syahirah Binti; Yusoff, Sumiani; Piaw, Chua Yan

    2013-06-01

    A total of 20 landfills are located in the State of Selangor, Malaysia, including the Ampar Tenang landfill site, which was closed on 26 January 2010. The landfill was reported to have been upgraded to a level I sanitary classification; however, the dumpsite area is not covered as that classification requires. In addition, municipal solid waste was dumped directly on top of the unlined natural alluvium formation. This not only contaminates surface and subsurface soils but also creates a potential risk of groundwater pollution. Previous studies have shown that the Ampar Tenang soil is no longer capable of preventing pollution migration. In this study, metal concentrations in soil samples down to 30 m depth were analyzed statistically. This is significant because research of this type had not been carried out before. The subsurface soils were significantly polluted by arsenic (As), lead (Pb), iron (Fe), copper (Cu) and aluminium (Al). As and Pb exceeded the safe limit values of 5.90 mg/kg and 31.00 mg/kg, respectively, based on the Provincial Sediment Quality Guidelines for Metals and the Interim Sediment Quality Values. Furthermore, only Cu concentrations showed a significantly decreasing trend with increasing depth. Cluster analysis showed that most metals were found in clay-type soils. The analysis also distinguished two clusters: cluster I (Pb, As, zinc, Cu, manganese, calcium, sodium, magnesium, potassium and Fe) and cluster II (Al). The different clustering may suggest different contamination sources for the metals.

  11. Investigating Faculty Familiarity with Assessment Terminology by Applying Cluster Analysis to Interpret Survey Data

    ERIC Educational Resources Information Center

    Raker, Jeffrey R.; Holme, Thomas A.

    2014-01-01

    A cluster analysis was conducted with a set of survey data on chemistry faculty familiarity with 13 assessment terms. Cluster groupings suggest a high, middle, and low overall familiarity with the terminology and an independent high and low familiarity with terms related to fundamental statistics. The six resultant clusters were found to be…

  12. Matched Filter Stochastic Background Characterization for Hyperspectral Target Detection

    DTIC Science & Technology

    2005-09-30

    ...and Pre-Clustering MVN Test; 4.2.3 Pre-Clustering Detection Results; 4.2.4 Pre-Clustering Target Influence; 4.2.5 Statistical Distance Exclusion and Low Contrast... al, 2001] Figure 2.7 ROC Curve Comparison of RX, K-Means, and Bayesian Pre-Clustering Applied to Anomaly Detection [Ashton, 1998] Figure 2.8 ROC

  13. Identifying seizure clusters in patients with epilepsy

    PubMed Central

    Lipton, R. B.; LeValley, A. J.; Hall, C. B.; Shinnar, S.

    2006-01-01

    Clinicians often encounter patients whose neurologic attacks appear to cluster. In a daily diary study, the authors explored whether clustering is a true phenomenon in epilepsy and can be identified in the clinical setting. Nearly half the subjects experienced at least one episode of three or more seizures in 24 hours; 20% also met a statistical clustering criterion. Utilizing the clinical definition of clustering should identify all seizure clusterers, and false positives can be determined with diary data. PMID:16247068
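The clinical criterion mentioned above (three or more seizures in 24 hours) can be checked against diary timestamps with a sliding window. The sketch below is minimal and makes some assumptions. The function name is hypothetical, and the window is treated as spanning strictly less than 24 hours. The study's separate statistical clustering criterion is not reproduced here.

```python
from datetime import datetime, timedelta

def has_clinical_cluster(seizure_times, min_seizures=3, window=timedelta(hours=24)):
    """True if some span shorter than `window` contains at least `min_seizures` seizures."""
    times = sorted(seizure_times)
    start = 0
    for end in range(len(times)):
        # Advance the left edge until the window spans less than `window`.
        while times[end] - times[start] >= window:
            start += 1
        if end - start + 1 >= min_seizures:
            return True
    return False

# Hypothetical diaries.
diary_a = [datetime(2006, 1, 1, 8), datetime(2006, 1, 1, 20), datetime(2006, 1, 2, 6)]
diary_b = [datetime(2006, 1, 1, 8), datetime(2006, 1, 2, 9), datetime(2006, 1, 3, 10)]
print(has_clinical_cluster(diary_a))  # True  (three seizures within 22 hours)
print(has_clinical_cluster(diary_b))  # False (consecutive seizures are 25 hours apart)
```

Because the window only moves forward, each diary is scanned in linear time after sorting.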

  14. Spatio-temporal pattern analysis for evaluation of the spread of human infections with avian influenza A(H7N9) virus in China, 2013-2014.

    PubMed

    Dong, Wen; Yang, Kun; Xu, Quanli; Liu, Lin; Chen, Juan

    2017-10-24

    A large number (n = 460) of human A(H7N9) infections were reported in China from March 2013 through December 2014. The virus has caused numerous disease outbreaks in domestic poultry and wild bird populations and severely threatens human health, making human H7N9 infection an emerging public health issue in China. The aims of this study were to investigate the directional trend of the epidemic and to identify significant spatio-temporal clustering of human influenza A(H7N9) cases between March 2013 and December 2014. Three distinct epidemic phases of A(H7N9) human infections were identified in this study. In each phase, standard deviational ellipse analysis was conducted to examine the directional trend of disease spread, and a retrospective space-time permutation scan statistic was then used to identify the spatio-temporal cluster patterns of H7N9 outbreaks in humans. The changing location and increasing size of the three identified standard deviational ellipses showed that the epidemic moved from the east to the southeast coast and then to some central regions, suggesting a future trend of continued dispersal to more central regions of China; a few new human cases might also appear in parts of western China. Furthermore, A(H7N9) human infections were clustered in space and time in the first two phases, with five significant spatio-temporal clusters (p < 0.05), but no significant cluster was identified in phase III. A new epidemiologic pattern emerged: the decrease in significant spatio-temporal clusters of A(H7N9) human infections was accompanied by an obvious spatial expansion of the outbreaks during the study period. Identifying the spatio-temporal patterns of the epidemic can provide valuable insights into the spreading dynamics of the disease in China.
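The space-time permutation scan statistic compares observed case counts in candidate space-time cylinders with the counts expected if location and time were independent. This minimal sketch follows the commonly described formulation:

- the expected count is C_z·C_d/C summed over the cells of the cylinder;
- each cylinder is scored with a Poisson log-likelihood ratio;
- significance comes from Monte Carlo permutation of the case dates.

The candidate cylinders here are hypothetical single-location windows of up to three days, not the circular windows used by dedicated software such as SaTScan. The case data and function names are invented for illustration.

```python
import math
import random
from collections import Counter

def log_likelihood_ratio(c, mu, total):
    """Poisson log-likelihood ratio for one cylinder; only excesses (c > mu) score above 0."""
    if c <= mu:
        return 0.0
    llr = c * math.log(c / mu)
    if total > c:
        llr += (total - c) * math.log((total - c) / (total - mu))
    return llr

def max_llr(cases, candidates):
    """cases: list of (location, day); candidates: list of (locations, first_day, last_day).
    Returns (best log-likelihood ratio, best candidate)."""
    total = len(cases)
    per_loc = Counter(loc for loc, _ in cases)
    per_day = Counter(day for _, day in cases)
    per_cell = Counter(cases)
    best = (0.0, None)
    for cand in candidates:
        locs, first, last = cand
        cells = [(z, d) for z in locs for d in range(first, last + 1)]
        observed = sum(per_cell[cell] for cell in cells)
        # Expected count with no space-time interaction: sum over cells of C_z * C_d / C.
        expected = sum(per_loc[z] * per_day[d] for z, d in cells) / total
        llr = log_likelihood_ratio(observed, expected, total)
        if llr > best[0]:
            best = (llr, cand)
    return best

def permutation_p_value(cases, candidates, n_perm=999, seed=1):
    """Monte Carlo p-value: shuffle case dates, which keeps both margins fixed."""
    rng = random.Random(seed)
    observed = max_llr(cases, candidates)[0]
    locations = [z for z, _ in cases]
    days = [d for _, d in cases]
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(days)
        if max_llr(list(zip(locations, days)), candidates)[0] >= observed:
            at_least_as_extreme += 1
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)

# Hypothetical data: one case per location per day, plus an invented burst at "A" on day 8.
cases = [(z, d) for z in "ABCD" for d in range(10)] + [("A", 8)] * 6
candidates = [({z}, d0, d1) for z in "ABCD" for d0 in range(10) for d1 in range(d0, min(d0 + 3, 10))]
llr, best = max_llr(cases, candidates)
print(best)  # ({'A'}, 8, 8)
observed, p = permutation_p_value(cases, candidates, n_perm=199)
```

Conditioning on both the spatial and temporal margins is what lets this model work with case data alone, without a population-at-risk denominator.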

  15. Quantifying opening-mode fracture spatial organization in horizontal wellbore image logs, core and outcrop: Application to Upper Cretaceous Frontier Formation tight gas sandstones, USA

    NASA Astrophysics Data System (ADS)

    Li, J. Z.; Laubach, S. E.; Gale, J. F. W.; Marrett, R. A.

    2018-03-01

    The Upper Cretaceous Frontier Formation is a naturally fractured gas-producing sandstone in Wyoming. Regionally, both random patterns and patterns statistically more clustered than random exist in the same upper to lower shoreface depositional facies. East-west- and north-south-striking regional fractures sampled using image logs and cores from three horizontal wells exhibit clustered patterns, whereas data collected from east-west-striking fractures in outcrop have patterns that are indistinguishable from random. Image log data analyzed with the correlation count method show clusters ∼35 m wide and spaced ∼50 to 90 m apart, as well as clusters up to 12 m wide with periodic inter-cluster spacings. A hierarchy of cluster sizes exists; organization within clusters is likely fractal. These rocks have markedly different structural and burial histories, so regional differences in degree of clustering are unsurprising. Clustered patterns correspond to fractures having core quartz deposition contemporaneous with fracture opening, circumstances that some models suggest might affect spacing patterns by interfering with fracture growth. Our results show that quantifying and identifying patterns as statistically more or less clustered than random delineates differences in fracture patterns that are not otherwise apparent but that may influence gas and water production, and therefore may be economically important.

  16. Resemblance profiles as clustering decision criteria: Estimating statistical power, error, and correspondence for a hypothesis test for multivariate structure.

    PubMed

    Kilborn, Joshua P; Jones, David L; Peebles, Ernst B; Naar, David F

    2017-04-01

    Clustering data continues to be a highly active area of data analysis, and resemblance profiles are being incorporated into ecological methodologies as a hypothesis testing-based approach to clustering multivariate data. However, these new clustering techniques have not been rigorously tested to determine the performance variability based on the algorithm's assumptions or any underlying data structures. Here, we use simulation studies to estimate the statistical error rates for the hypothesis test for multivariate structure based on dissimilarity profiles (DISPROF). We concurrently tested a widely used algorithm that employs the unweighted pair group method with arithmetic mean (UPGMA) to estimate the proficiency of clustering with DISPROF as a decision criterion. We simulated unstructured multivariate data from different probability distributions with increasing numbers of objects and descriptors, and grouped data with increasing overlap, overdispersion for ecological data, and correlation among descriptors within groups. Using simulated data, we measured the resolution and correspondence of clustering solutions achieved by DISPROF with UPGMA against the reference grouping partitions used to simulate the structured test datasets. Our results highlight the dynamic interactions between dataset dimensionality, group overlap, and the properties of the descriptors within a group (i.e., overdispersion or correlation structure) that are relevant to resemblance profiles as a clustering criterion for multivariate data. These methods are particularly useful for multivariate ecological datasets that benefit from distance-based statistical analyses. We propose guidelines for using DISPROF as a clustering decision tool that will help future users avoid potential pitfalls during the application of methods and the interpretation of results.
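UPGMA is the clustering algorithm that DISPROF served as a decision criterion for in the study above. It repeatedly merges the two closest clusters. It then sets the new cluster's dissimilarity to every other cluster to the size-weighted mean of the two merged clusters' dissimilarities. The sketch below covers only that merge rule. It does not reproduce the DISPROF permutation test that decides whether a split is supported, and the dissimilarities and function name are invented for illustration.

```python
def upgma(names, dissim):
    """names: object labels; dissim: dict frozenset({a, b}) -> dissimilarity.
    Returns the merge history as (cluster_a, cluster_b, merge_level) tuples."""
    sizes = {name: 1 for name in names}  # current clusters and their sizes
    d = dict(dissim)  # working copy; entries are replaced as clusters merge
    merges = []
    while len(sizes) > 1:
        closest = min(d, key=d.get)
        a, b = sorted(closest)  # sort for deterministic naming
        level = d.pop(closest)
        merged = f"({a},{b})"
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        for other in sizes:
            # UPGMA rule: average of the merged clusters' dissimilarities, weighted by size.
            d_a = d.pop(frozenset((a, other)))
            d_b = d.pop(frozenset((b, other)))
            d[frozenset((merged, other))] = (size_a * d_a + size_b * d_b) / (size_a + size_b)
        sizes[merged] = size_a + size_b
        merges.append((a, b, level))
    return merges

# Hypothetical dissimilarities among four objects.
pairs = {("A", "B"): 2, ("C", "D"): 4, ("A", "C"): 6, ("B", "C"): 6, ("A", "D"): 8, ("B", "D"): 8}
dissim = {frozenset(p): v for p, v in pairs.items()}
for step in upgma("ABCD", dissim):
    print(step)
# ('A', 'B', 2)
# ('C', 'D', 4)
# ('(A,B)', '(C,D)', 7.0)
```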

  17. Illuminating the star clusters and satellite galaxies with multi-scale baryonic simulations

    NASA Astrophysics Data System (ADS)

    Maji, Moupiya; Zhu, Qirong; Li, Yuexing; Marinacci, Federico; Charlton, Jane; Hernquist, Lars; Knebe, Alexander

    2018-01-01

    Over the past decade, advances in computational architecture have made it possible for the first time to investigate some of the fundamental questions around the formation, evolution and assembly of the building blocks of the universe: star clusters and galaxies. In this talk, I will focus on two major questions: What is the origin of the observed universal lognormal mass function in globular clusters? What is the statistical distribution of the properties of satellite planes in a large sample of satellite systems? Observations of globular clusters show that they have universal lognormal mass functions with a characteristic peak at 2×10^5 MSun, although the origin of this peaked distribution is unclear. We investigate the formation of star clusters in interacting galaxies using baryonic simulations and find that massive clusters preferentially form in extremely high pressure gas clouds which reside in highly shocked regions produced by galaxy interactions. These massive clusters have quasi-lognormal initial mass functions with a peak around ~10^6 MSun, which may survive dynamical evolution and slowly evolve into the universal lognormal profiles observed today. The classical Milky Way (MW) satellites are observed to be distributed in a highly flattened plane, called the Disk of Satellites (DoS). However, the significance, coherence and origin of the DoS are highly debated. To understand this, we first analyze all MW satellites and find that a small sample size can artificially produce a highly anisotropic spatial distribution and a strong clustering of their angular momentum. Comparing a baryonic simulation of a MW-sized galaxy with its N-body counterpart, we find that an anisotropic DoS can originate from baryonic processes. Furthermore, we explore the statistical distribution of DoS properties by analyzing 2591 satellite systems in the cosmological hydrodynamic simulation Illustris.
We find that the DoS becomes more isotropic with increasing sample sizes and most (~90%) satellite systems have no clear coherent rotation. Their overall evolution indicates that the DoS may be part of large-scale filamentary structure. Our results show that baryonic processes may be the key to solving many long-standing theoretical problems.

  18. Voronoi distance based prospective space-time scans for point data sets: a dengue fever cluster analysis in a southeast Brazilian town

    PubMed Central

    2011-01-01

    Background The Prospective Space-Time scan statistic (PST) is widely used for the evaluation of space-time clusters of point event data. Usually a window of cylindrical shape is employed, with a circular or elliptical base in the space domain. Recently, the concept of Minimum Spanning Tree (MST) was applied to specify the set of potential clusters, through the Density-Equalizing Euclidean MST (DEEMST) method, for the detection of arbitrarily shaped clusters. The original map is cartogram transformed, such that the control points are spread uniformly. That method is quite effective, but the cartogram construction is computationally expensive and complicated. Results A fast method for the detection and inference of point data set space-time disease clusters is presented, the Voronoi Based Scan (VBScan). A Voronoi diagram is built for points representing population individuals (cases and controls). The number of Voronoi cell boundaries intercepted by the line segment joining two case points defines the Voronoi distance between those points. That distance is used to approximate the density of the heterogeneous population and build the Voronoi distance MST linking the cases. The successive removal of edges from the Voronoi distance MST generates sub-trees which are the potential space-time clusters. Finally, those clusters are evaluated through the scan statistic. Monte Carlo replications of the original data are used to evaluate the significance of the clusters. An application for dengue fever in a small Brazilian city is presented. Conclusions The ability to promptly detect space-time clusters of disease outbreaks, when the number of individuals is large, was shown to be feasible, due to the reduced computational load of VBScan. Instead of changing the map, VBScan modifies the metric used to define the distance between cases, without requiring the cartogram construction.
Numerical simulations showed that VBScan has higher power of detection, sensitivity and positive predictive value than the Elliptic PST. Furthermore, as VBScan also incorporates topological information from the point neighborhood structure, in addition to the usual geometric information, it is more robust than purely geometric methods such as the elliptic scan. Those advantages were illustrated in a real setting for dengue fever space-time clusters. PMID:21513556
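The Voronoi distance described above can be approximated without building the diagram explicitly. A point lies in the Voronoi cell of its nearest population point, so sampling the segment between two cases and counting changes of nearest neighbour approximates the number of cell boundaries crossed. This is a minimal sketch under that sampling assumption: cells narrower than the sampling step can be missed, and the function names and toy population are hypothetical. The sketch then applies Prim's algorithm to build the Voronoi distance MST. VBScan's edge-removal and scan-statistic steps are not shown.

```python
import math

def nearest_generator(generators, point):
    """Index of the generator whose Voronoi cell contains `point` (its nearest neighbour)."""
    return min(range(len(generators)), key=lambda i: math.dist(point, generators[i]))

def voronoi_distance(p, q, generators, steps=1000):
    """Approximate count of Voronoi cell boundaries crossed by the segment p -> q,
    found by sampling the segment and counting changes of the containing cell."""
    current = nearest_generator(generators, p)
    crossings = 0
    for k in range(1, steps + 1):
        t = k / steps
        point = (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))
        cell = nearest_generator(generators, point)
        if cell != current:
            crossings += 1
            current = cell
    return crossings

def prim_mst(nodes, weight):
    """Prim's algorithm on the complete graph over `nodes`; returns (u, v, weight) edges."""
    in_tree = {nodes[0]}
    edges = []
    while len(in_tree) < len(nodes):
        u, v, w = min(
            ((u, v, weight(u, v)) for u in in_tree for v in nodes if v not in in_tree),
            key=lambda edge: edge[2],
        )
        in_tree.add(v)
        edges.append((u, v, w))
    return edges

# Hypothetical population on a line: dense between x = 0 and 4, sparse beyond.
population = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (10, 0)]
cases = [0, 4, 5]  # indices of population members who are cases

def vd(i, j):
    return voronoi_distance(population[i], population[j], population)

print(vd(0, 4), vd(4, 5))       # 4 1
print(prim_mst(cases, vd))      # [(0, 4, 4), (4, 5, 1)]
```

In the toy output, cases 4 and 5 are six units apart but only one Voronoi boundary apart. This is how the metric shrinks distances in sparsely populated areas.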

  19. Where are Low Mass X-ray Binaries Formed?

    NASA Astrophysics Data System (ADS)

    Kundu, A.; Maccarone, T. J.; Zepf, S. E.

    2004-08-01

    Chandra images of nearby galaxies reveal large numbers of low mass X-ray binaries (LMXBs). As in the Galaxy, a significant fraction of these are associated with globular clusters. We exploit the LMXB-globular cluster link to probe which physical properties of globular clusters promote the formation of LMXBs, and to study whether the non-cluster field LMXB population was originally formed in clusters and then released into the field. The large population of globular clusters around nearby galaxies and the range of properties such as age, metallicity and host galaxy environment spanned by these objects enables us to identify and probe the link between these characteristics and the formation of LMXBs. We present the results of our study of a large sample of elliptical and S0 galaxies which reveals, among other things, that bright LMXBs definitively prefer metal-rich cluster hosts and that this relationship is unlikely to be driven by age effects. The ancestry of the non-cluster field LMXBs is a matter of some debate: they might have formed in the field, or they might have been created in globular clusters and subsequently released into the field, either by being ejected from clusters by dynamical processes or as remnants of dynamically destroyed clusters. Each of these scenarios has a specific spatial signature that can be tested by our combined optical and X-ray study. Furthermore, these scenarios predict additional statistical variations that may be driven by the specific host galaxy environment. We present a detailed analysis of our sample galaxies and comment on the probability that the field sources were actually formed in clusters.

  20. Schizophrenia classification using functional network features

    NASA Astrophysics Data System (ADS)

    Rish, Irina; Cecchi, Guillermo A.; Heuton, Kyle

    2012-03-01

    This paper focuses on discovering statistical biomarkers (features) that are predictive of schizophrenia, with a particular focus on topological properties of fMRI functional networks. We consider several network properties, such as node (voxel) strength, clustering coefficients and local efficiency, as well as a subset of pairwise correlations. While all types of features demonstrate highly significant statistical differences in several brain areas and close to 80% classification accuracy, the most remarkable results, 93% accuracy, are achieved by using a small subset of only a dozen of the most informative (lowest p-value) correlation features. Our results suggest that voxel-level correlations and functional network features derived from them are highly informative about schizophrenia and can be used as statistical biomarkers for the disease.
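A node's clustering coefficient, one of the network features listed above, is the fraction of pairs of its neighbours that are themselves connected. This minimal sketch rests on several assumptions:

- a voxel correlation matrix has already been computed;
- edges keep only correlations at or above a hypothetical threshold of 0.5;
- nodes with fewer than two neighbours get a coefficient of 0.

The study's exact network construction is not given in the abstract, and the function names and matrix are illustrative.

```python
from itertools import combinations

def threshold_graph(corr, threshold):
    """Adjacency sets: connect nodes i != j whose correlation is at least `threshold`."""
    n = len(corr)
    return [{j for j in range(n) if j != i and corr[i][j] >= threshold} for i in range(n)]

def clustering_coefficients(adj):
    """Local clustering coefficient: fraction of a node's neighbour pairs that are linked."""
    coeffs = []
    for neighbours in adj:
        k = len(neighbours)
        if k < 2:
            coeffs.append(0.0)  # undefined for degree < 2; 0 by convention here
            continue
        links = sum(1 for u, v in combinations(sorted(neighbours), 2) if v in adj[u])
        coeffs.append(2.0 * links / (k * (k - 1)))
    return coeffs

# Hypothetical 4-voxel correlation matrix.
corr = [
    [1.0, 0.8, 0.8, 0.7],
    [0.8, 1.0, 0.8, 0.1],
    [0.8, 0.8, 1.0, 0.1],
    [0.7, 0.1, 0.1, 1.0],
]
coeffs = clustering_coefficients(threshold_graph(corr, 0.5))
print([round(c, 3) for c in coeffs])  # [0.333, 1.0, 1.0, 0.0]
```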

  1. Unusual clustering of coefficients of variation in published articles from a medical biochemistry department in India.

    PubMed

    Hudes, Mark L; McCann, Joyce C; Ames, Bruce N

    2009-03-01

    A simple statistical method is described to test whether data are consistent with minimum statistical variability expected in a biological experiment. The method is applied to data presented in data tables in a subset of 84 articles among more than 200 published by 3 investigators in a small medical biochemistry department at a major university in India and to 29 "control" articles selected by key word PubMed searches. Major conclusions include: 1) unusual clustering of coefficients of variation (CVs) was observed for data from the majority of articles analyzed that were published by the 3 investigators from 2000-2007; unusual clustering was not observed for data from any of their articles examined that were published between 1992 and 1999; and 2) among a group of 29 control articles retrieved by PubMed key word, title, or title/abstract searches, unusually clustered CVs were observed in 3 articles. Two of these articles were coauthored by 1 of the 3 investigators, and 1 was from the same university but a different department. We are unable to offer a statistical or biological explanation for the unusual clustering observed.

  2. Shifting Patterns of Aedes aegypti Fine Scale Spatial Clustering in Iquitos, Peru

    PubMed Central

    LaCon, Genevieve; Morrison, Amy C.; Astete, Helvio; Stoddard, Steven T.; Paz-Soldan, Valerie A.; Elder, John P.; Halsey, Eric S.; Scott, Thomas W.; Kitron, Uriel; Vazquez-Prokopec, Gonzalo M.

    2014-01-01

    Background Empirical evidence shows that Aedes aegypti abundance is spatially heterogeneous and that some areas and larval habitats produce more mosquitoes than others. There is a knowledge gap, however, with regard to the temporal persistence of such Ae. aegypti abundance hotspots. In this study, we used a longitudinal entomologic dataset from the city of Iquitos, Peru, to (1) quantify the spatial clustering patterns of adult Ae. aegypti and pupae counts per house, (2) determine overlap between clusters, (3) quantify the temporal stability of clusters over nine entomologic surveys spaced four months apart, and (4) quantify the extent of clustering at the household and neighborhood levels. Methodologies/Principal Findings Data from 13,662 household entomological visits performed in two Iquitos neighborhoods differing in Ae. aegypti abundance and dengue virus transmission were analyzed using global and local spatial statistics. The location and extent of Ae. aegypti pupae and adult hotspots (i.e., small groups of houses with significantly [p<0.05] high mosquito abundance) were calculated for each of the nine entomologic surveys. The extent of clustering was used to quantify the probability of finding spatially correlated populations. Our analyses indicate that Ae. aegypti distribution was highly focal (most clusters do not extend beyond 30 meters) and that hotspots of high vector abundance were common on every survey date, but they were temporally unstable over the period of study. Conclusions/Significance Our findings have implications for understanding Ae. aegypti distribution and for the design of surveillance and control activities relying on household-level data. In settings like Iquitos, where there is a relatively low percentage of Ae. aegypti in permanent water-holding containers, identifying and targeting key premises will be significantly challenged by shifting hotspots of Ae. aegypti infestation. Focusing efforts in large geographic areas with historically high levels of transmission may be more effective than targeting Ae. aegypti hotspots. PMID:25102062

  3. Cluster mislocation in kinematic Sunyaev-Zel'dovich (kSZ) effect extraction

    NASA Astrophysics Data System (ADS)

    Calafut, Victoria Rose; Bean, Rachel; Yu, Byeonghee

    2018-01-01

    We investigate the impact of a variety of analysis assumptions that influence cluster identification and location on the kSZ pairwise momentum signal and covariance estimation. Photometric and spectroscopic galaxy tracers from SDSS, WISE, and DECaLs, spanning redshifts 0.05

  4. Mapping concentrations of posttraumatic stress and depression trajectories following Hurricane Ike

    PubMed Central

    Gruebner, Oliver; Lowe, Sarah R.; Tracy, Melissa; Joshi, Spruha; Cerdá, Magdalena; Norris, Fran H.; Subramanian, S. V.; Galea, Sandro

    2016-01-01

    We investigated geographic concentration in elevated risk for a range of postdisaster trajectories of chronic posttraumatic stress symptom (PTSS) and depression symptoms in a longitudinal study (N = 561) of a Hurricane Ike affected population in Galveston and Chambers counties, TX. Using an unadjusted spatial scan statistic, we detected clusters of elevated risk of PTSS trajectories, but not depression trajectories, on Galveston Island. We then tested for predictors of membership in each trajectory of PTSS and depression (e.g., demographic variables, trauma exposure, social support), not taking the geographic nature of the data into account. After adjusting for significant predictors in the spatial scan statistic, we noted that spatial clusters of PTSS persisted and additional clusters of depression trajectories emerged. This is the first study to show that longitudinal trajectories of postdisaster mental health problems may vary depending on the geographic location and the individual- and community-level factors present at these locations. Such knowledge is crucial to identifying vulnerable regions and populations within them, to provide guidance for early responders, and to mitigate mental health consequences through early detection of mental health needs in the population. As human-made disasters increase, our approach may be useful also in other regions in comparable settings worldwide. PMID:27558011

  5. Mapping concentrations of posttraumatic stress and depression trajectories following Hurricane Ike.

    PubMed

    Gruebner, Oliver; Lowe, Sarah R; Tracy, Melissa; Joshi, Spruha; Cerdá, Magdalena; Norris, Fran H; Subramanian, S V; Galea, Sandro

    2016-08-25

    We investigated geographic concentration in elevated risk for a range of postdisaster trajectories of chronic posttraumatic stress symptom (PTSS) and depression symptoms in a longitudinal study (N = 561) of a Hurricane Ike affected population in Galveston and Chambers counties, TX. Using an unadjusted spatial scan statistic, we detected clusters of elevated risk of PTSS trajectories, but not depression trajectories, on Galveston Island. We then tested for predictors of membership in each trajectory of PTSS and depression (e.g., demographic variables, trauma exposure, social support), not taking the geographic nature of the data into account. After adjusting for significant predictors in the spatial scan statistic, we noted that spatial clusters of PTSS persisted and additional clusters of depression trajectories emerged. This is the first study to show that longitudinal trajectories of postdisaster mental health problems may vary depending on the geographic location and the individual- and community-level factors present at these locations. Such knowledge is crucial to identifying vulnerable regions and populations within them, to provide guidance for early responders, and to mitigate mental health consequences through early detection of mental health needs in the population. As human-made disasters increase, our approach may be useful also in other regions in comparable settings worldwide.

  6. Prediction of CpG-island function: CpG clustering vs. sliding-window methods

    PubMed Central

    2010-01-01

    Background Unmethylated stretches of CpG dinucleotides (CpG islands) are a distinctive feature of mammalian genomes. Conventionally, these regions are detected by sliding-window approaches using %G + C, the CpG observed/expected ratio and length thresholds as the main parameters. More recently, clustering methods have been used to detect clusters of CpG dinucleotides directly, as a statistical property of the genome sequence. Results We compare sliding-window and clustering (i.e. CpGcluster) predictions by applying new ways to detect putative functionality of CpG islands. Analyzing the co-localization with several genomic regions as a function of window size vs. statistical significance (p-value), CpGcluster shows a higher overlap with promoter regions and highly conserved elements, while at the same time showing less overlap with Alu retrotransposons. The major difference in the predictions was found for short islands (CpG islets), which are often predicted exclusively by CpGcluster. Many of these islets seem to be functional, as they are unmethylated, highly conserved and/or located within the promoter region. Finally, we show that window-based islands can spuriously overlap several differentially regulated promoters as well as different methylation domains, which might indicate an incorrect merge of several CpG islands into a single, very long island. The shorter CpGcluster islands seem to be much more specific with respect to the overlap with alternative transcription start sites or the detection of homogeneous methylation domains. Conclusions The main difference between sliding-window approaches and clustering methods is the length of the predicted islands. Short islands, often differentially methylated, are almost exclusively predicted by CpGcluster. This suggests that CpGcluster may be the algorithm of choice to explore the function of these short, but putatively functional CpG islands. PMID:20500903
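    The conventional sliding-window approach described above can be sketched as follows. This is a minimal illustration under assumed, commonly cited thresholds (200 bp windows, G+C fraction of at least 0.5, CpG observed/expected ratio of at least 0.6); it is not CpGcluster and not any specific published tool, and the function names are hypothetical. Overlapping windows that pass are merged, which also shows how neighboring islands can end up fused into one long island.

```python
def cpg_stats(seq):
    """Return (G+C fraction, CpG observed/expected ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def sliding_window_islands(seq, window=200, step=1, min_gc=0.5, min_oe=0.6):
    """Merge all windows passing both thresholds into (start, end) intervals."""
    islands = []
    for start in range(0, len(seq) - window + 1, step):
        gc_fraction, obs_exp = cpg_stats(seq[start:start + window])
        if gc_fraction >= min_gc and obs_exp >= min_oe:
            end = start + window
            if islands and start <= islands[-1][1]:
                # Overlaps the previous island: extend it.
                islands[-1] = (islands[-1][0], max(islands[-1][1], end))
            else:
                islands.append((start, end))
    return islands


seq = "AT" * 150 + "CG" * 150 + "AT" * 150
print(sliding_window_islands(seq))  # [(200, 700)]
```

    Note that the reported island (200 to 700) is wider than the CpG-rich core (300 to 600), because windows that only half overlap the core still pass the thresholds.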

  7. HICOSMO: cosmology with a complete sample of galaxy clusters - II. Cosmological results

    NASA Astrophysics Data System (ADS)

    Schellenberger, G.; Reiprich, T. H.

    2017-10-01

    The X-ray bright, hot gas in the potential well of a galaxy cluster enables systematic X-ray studies of samples of galaxy clusters to constrain cosmological parameters. HIFLUGCS consists of the 64 X-ray brightest galaxy clusters in the Universe, building up a local sample. Here, we utilize this sample to determine, for the first time, individual hydrostatic mass estimates for all the clusters of the sample and, by making use of the completeness of the sample, to quantify constraints on the two cosmological parameters of interest, Ωm and σ8. We apply our total hydrostatic and gas mass estimates from the X-ray analysis to a Bayesian cosmological likelihood analysis and leave several parameters free to be constrained. We find Ωm = 0.30 ± 0.01 and σ8 = 0.79 ± 0.03 (statistical uncertainties, 68 per cent credibility level) using our default analysis strategy combining both a mass function analysis and the gas mass fraction results. The main sources of bias that we correct here are (1) the influence of galaxy groups (incompleteness in parent samples and differing behaviour of the Lx-M relation), (2) the hydrostatic mass bias, (3) the extrapolation of the total mass (comparing various methods), (4) the theoretical halo mass function and (5) other physical effects (non-negligible neutrino mass). We find that galaxy groups introduce a strong bias, since their number density seems to be overpredicted by the halo mass function. On the other hand, incorporating baryonic effects does not result in a significant change in the constraints. The total (uncorrected) systematic uncertainties (∼20 per cent) clearly dominate the statistical uncertainties on cosmological parameters for our sample.

  8. The spatio-temporal mapping of epileptic networks: Combination of EEG–fMRI and EEG source imaging

    PubMed Central

    Vulliemoz, S.; Thornton, R.; Rodionov, R.; Carmichael, D.W.; Guye, M.; Lhatoo, S.; McEvoy, A.W.; Spinelli, L.; Michel, C.M.; Duncan, J.S.; Lemieux, L.

    2009-01-01

    Simultaneous EEG–fMRI acquisitions in patients with epilepsy often reveal distributed patterns of Blood Oxygen Level Dependent (BOLD) change correlated with epileptiform discharges. We investigated whether electrical source imaging (ESI) performed on the interictal epileptiform discharges (IED) acquired during fMRI acquisition could be used to study the dynamics of the networks identified by the BOLD effect, thereby avoiding the limitations of combining results from separate recordings. Nine selected patients (13 IED types identified) with focal epilepsy underwent EEG–fMRI. Statistical analysis was performed using SPM5 to create BOLD maps. ESI was performed on the IED recorded during fMRI acquisition using a realistic head model (SMAC) and a distributed linear inverse solution (LAURA). ESI could not be performed in one case. In 10/12 remaining studies, ESI at IED onset (ESIo) was anatomically close to one BOLD cluster. Interestingly, ESIo was closest to the positive BOLD cluster with maximal statistical significance in only 4/12 cases and closest to negative BOLD responses in 4/12 cases. Very small BOLD clusters could also have clinical relevance in some cases. ESI at a later time frame (ESIp) showed propagation to remote sources co-localised with other BOLD clusters in half of cases. In concordant cases, the distance between maxima of ESI and the closest EEG–fMRI cluster was less than 33 mm, in agreement with previous studies. We conclude that simultaneous ESI and EEG–fMRI analysis may be able to distinguish areas of BOLD response related to initiation of IED from propagation areas. This combination provides new opportunities for investigating epileptic networks. PMID:19408351

  9. The gamma-ray pulsar population of globular clusters: implications for the GeV excess

    NASA Astrophysics Data System (ADS)

    Hooper, Dan; Linden, Tim

    2016-08-01

    It has been suggested that the GeV excess, observed from the region surrounding the Galactic Center, might originate from a population of millisecond pulsars that formed in globular clusters. With this in mind, we employ the publicly available Fermi data to study the gamma-ray emission from 157 globular clusters, identifying a statistically significant signal from 25 of these sources (ten of which are not found in existing gamma-ray catalogs). We combine these observations with the predicted pulsar formation rate based on the stellar encounter rate of each globular cluster to constrain the gamma-ray luminosity function of millisecond pulsars in the Milky Way's globular cluster system. We find that this pulsar population exhibits a luminosity function that is quite similar to that of the millisecond pulsars observed in the field of the Milky Way (i.e. the thick disk). After pulsars are expelled from a globular cluster, however, they continue to lose rotational kinetic energy and become less luminous, causing their luminosity function to depart from the steady-state distribution. Using this luminosity function and a model for the globular cluster disruption rate, we show that millisecond pulsars born in globular clusters can account for only a few percent or less of the observed GeV excess. Among other challenges, scenarios in which the entire GeV excess is generated from such pulsars are in conflict with the observed mass of the Milky Way's Central Stellar Cluster.

  10. COVARIATE-ADAPTIVE CLUSTERING OF EXPOSURES FOR AIR POLLUTION EPIDEMIOLOGY COHORTS

    PubMed Central

    Keller, Joshua P.; Drton, Mathias; Larson, Timothy; Kaufman, Joel D.; Sandler, Dale P.; Szpiro, Adam A.

    2017-01-01

    Cohort studies in air pollution epidemiology aim to establish associations between health outcomes and air pollution exposures. Statistical analysis of such associations is complicated by the multivariate nature of the pollutant exposure data as well as the spatial misalignment that arises from the fact that exposure data are collected at regulatory monitoring network locations distinct from cohort locations. We present a novel clustering approach for addressing this challenge. Specifically, we present a method that uses geographic covariate information to cluster multi-pollutant observations and predict cluster membership at cohort locations. Our predictive k-means procedure identifies centers using a mixture model and is followed by multi-class spatial prediction. In simulations, we demonstrate that predictive k-means can reduce misclassification error by over 50% compared to ordinary k-means, with minimal loss in cluster representativeness. The improved prediction accuracy results in large gains of 30% or more in power for detecting effect modification by cluster in a simulated health analysis. In an analysis of the NIEHS Sister Study cohort using predictive k-means, we find that the association between systolic blood pressure (SBP) and long-term fine particulate matter (PM2.5) exposure varies significantly between different clusters of PM2.5 component profiles. Our cluster-based analysis shows that for subjects assigned to a cluster located in the Midwestern U.S., a 10 μg/m3 difference in exposure is associated with 4.37 mmHg (95% CI, 2.38, 6.35) higher SBP. PMID:28572869
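    For reference, the ordinary k-means baseline that predictive k-means is compared against can be sketched with Lloyd's algorithm. This is an illustration only, under stated assumptions: it clusters plain numeric vectors (standing in for multi-pollutant profiles) and does not include the geographic covariates, mixture model or spatial prediction step of the predictive procedure. The function name and toy data are hypothetical.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-center assignment and mean update."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, k=2)
print(labels)
```

    The cluster labels themselves are arbitrary integers; only the grouping matters. Predictive k-means, as described above, additionally has to predict these labels at cohort locations where no monitoring data exist.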

  11. The gamma-ray pulsar population of globular clusters: implications for the GeV excess

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hooper, Dan; Linden, Tim, E-mail: dhooper@fnal.gov, E-mail: linden.70@osu.edu

    It has been suggested that the GeV excess, observed from the region surrounding the Galactic Center, might originate from a population of millisecond pulsars that formed in globular clusters. With this in mind, we employ the publicly available Fermi data to study the gamma-ray emission from 157 globular clusters, identifying a statistically significant signal from 25 of these sources (ten of which are not found in existing gamma-ray catalogs). We combine these observations with the predicted pulsar formation rate based on the stellar encounter rate of each globular cluster to constrain the gamma-ray luminosity function of millisecond pulsars in the Milky Way's globular cluster system. We find that this pulsar population exhibits a luminosity function that is quite similar to that of the millisecond pulsars observed in the field of the Milky Way (i.e. the thick disk). After pulsars are expelled from a globular cluster, however, they continue to lose rotational kinetic energy and become less luminous, causing their luminosity function to depart from the steady-state distribution. Using this luminosity function and a model for the globular cluster disruption rate, we show that millisecond pulsars born in globular clusters can account for only a few percent or less of the observed GeV excess. Among other challenges, scenarios in which the entire GeV excess is generated from such pulsars are in conflict with the observed mass of the Milky Way's Central Stellar Cluster.

  12. The gamma-ray pulsar population of globular clusters: Implications for the GeV excess

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hooper, Dan; Linden, Tim

    It has been suggested that the GeV excess, observed from the region surrounding the Galactic Center, might originate from a population of millisecond pulsars that formed in globular clusters. With this in mind, we employ the publicly available Fermi data to study the gamma-ray emission from 157 globular clusters, identifying a statistically significant signal from 25 of these sources (ten of which are not found in existing gamma-ray catalogs). We combine these observations with the predicted pulsar formation rate based on the stellar encounter rate of each globular cluster to constrain the gamma-ray luminosity function of millisecond pulsars in the Milky Way's globular cluster system. We find that this pulsar population exhibits a luminosity function that is quite similar to that of the millisecond pulsars observed in the field of the Milky Way (i.e. the thick disk). After pulsars are expelled from a globular cluster, however, they continue to lose rotational kinetic energy and become less luminous, causing their luminosity function to depart from the steady-state distribution. Using this luminosity function and a model for the globular cluster disruption rate, we show that millisecond pulsars born in globular clusters can account for only a few percent or less of the observed GeV excess. Among other challenges, scenarios in which the entire GeV excess is generated from such pulsars are in conflict with the observed mass of the Milky Way's Central Stellar Cluster.

  13. The gamma-ray pulsar population of globular clusters: Implications for the GeV excess

    DOE PAGES

    Hooper, Dan; Linden, Tim

    2016-08-09

    It has been suggested that the GeV excess, observed from the region surrounding the Galactic Center, might originate from a population of millisecond pulsars that formed in globular clusters. With this in mind, we employ the publicly available Fermi data to study the gamma-ray emission from 157 globular clusters, identifying a statistically significant signal from 25 of these sources (ten of which are not found in existing gamma-ray catalogs). We combine these observations with the predicted pulsar formation rate based on the stellar encounter rate of each globular cluster to constrain the gamma-ray luminosity function of millisecond pulsars in the Milky Way's globular cluster system. We find that this pulsar population exhibits a luminosity function that is quite similar to that of the millisecond pulsars observed in the field of the Milky Way (i.e. the thick disk). After pulsars are expelled from a globular cluster, however, they continue to lose rotational kinetic energy and become less luminous, causing their luminosity function to depart from the steady-state distribution. Using this luminosity function and a model for the globular cluster disruption rate, we show that millisecond pulsars born in globular clusters can account for only a few percent or less of the observed GeV excess. Among other challenges, scenarios in which the entire GeV excess is generated from such pulsars are in conflict with the observed mass of the Milky Way's Central Stellar Cluster.

  14. cluster trials v. 1.0

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mitchell, John; Castillo, Andrew

    2016-09-21

    This software contains a set of Python modules (input, search, cluster, analysis). These modules read input files containing spatial coordinates and associated attributes, which can be used to perform nearest-neighbor search (spatial indexing via a k-d tree), cluster analysis/identification, and calculation of spatial statistics for analysis.
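    Nearest-neighbor search with a k-d tree, as mentioned above, can be illustrated with a small pure-Python sketch. It is hypothetical and does not reproduce the actual input/search/cluster/analysis modules; the nested-tuple tree layout and the function names are assumptions. Production code would more likely rely on an existing implementation such as SciPy's `scipy.spatial.cKDTree`.

```python
import math


def build_kdtree(points, depth=0):
    """Build a k-d tree as nested tuples: (point, axis, left, right)."""
    if not points:
        return None
    axis = depth % len(points[0])  # cycle through the coordinate axes
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2  # the median point becomes the splitting node
    return (
        pts[mid],
        axis,
        build_kdtree(pts[:mid], depth + 1),
        build_kdtree(pts[mid + 1:], depth + 1),
    )


def nearest(node, target, best=None):
    """Return the stored point closest (Euclidean) to target."""
    if node is None:
        return best
    point, axis, left, right = node
    if best is None or math.dist(target, point) < math.dist(target, best):
        best = point
    diff = target[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, target, best)
    # Search the far subtree only if the splitting plane is closer than the best match.
    if abs(diff) < math.dist(target, best):
        best = nearest(far, target, best)
    return best


points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(points)
print(nearest(tree, (9, 2)))  # (8, 1)
```

    The pruning test is what makes the tree faster than a brute-force scan on average: whole subtrees are skipped when their splitting plane lies farther away than the current best match.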

  15. Variable Screening for Cluster Analysis.

    ERIC Educational Resources Information Center

    Donoghue, John R.

    Inclusion of irrelevant variables in a cluster analysis adversely affects subgroup recovery. This paper examines using moment-based statistics to screen variables; only variables that pass the screening are then used in clustering. Normal mixtures are shown analytically to often possess negative kurtosis. Two related measures, "m" and…
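    The idea of moment-based screening can be sketched as follows: compute each variable's sample excess kurtosis and keep only variables below a threshold, since a mixture of well-separated normal subgroups tends to give negative kurtosis. This is a hypothetical illustration; the threshold of 0 and the function names are assumptions, and it does not implement the paper's own measures, which are cut off in this excerpt.

```python
def excess_kurtosis(xs):
    """Sample excess kurtosis m4 / m2**2 - 3 (biased moment estimator)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3.0


def screen_variables(data, threshold=0.0):
    """Keep variables whose excess kurtosis is below threshold.

    data: dict mapping variable name -> list of observed values.
    """
    return [name for name, xs in data.items() if excess_kurtosis(xs) < threshold]


data = {
    "bimodal": [-1.0] * 50 + [1.0] * 50,   # two equal-size subgroups
    "peaked": [0.0] * 98 + [-10.0, 10.0],  # heavy-tailed, no subgroups
}
print(screen_variables(data))  # ['bimodal']
```

    Some distributions without subgroups (the uniform, for example) also have negative excess kurtosis, so a screen of this kind is a heuristic filter rather than a test for cluster structure.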

  16. Logo image clustering based on advanced statistics

    NASA Astrophysics Data System (ADS)

    Wei, Yi; Kamel, Mohamed; He, Yiwei

    2007-11-01

    In recent years, there has been growing interest in research on image content description techniques. Among these, image clustering is one of the most frequently discussed topics. Like image recognition, image clustering is a high-level representation technique; however, it focuses on coarse categorization rather than accurate recognition. Based on the wavelet transform (WT) and advanced statistics, the authors propose a novel approach that divides logo images of various shapes into groups according to the external boundary of each logo image. Experimental results show that the presented method is accurate, fast and insensitive to defects.

  17. Chapter two: Phenomenology of tsunamis II: scaling, event statistics, and inter-event triggering

    USGS Publications Warehouse

    Geist, Eric L.

    2012-01-01

    Observations related to tsunami catalogs are reviewed and described in a phenomenological framework. An examination of scaling relationships between earthquake size (as expressed by scalar seismic moment and mean slip) and tsunami size (as expressed by mean and maximum local run-up and maximum far-field amplitude) indicates that scaling is significant at the 95% confidence level, although there is uncertainty in how well earthquake size can predict tsunami size (R² ~ 0.4-0.6). In examining tsunami event statistics, current methods used to estimate the size distribution of earthquakes and landslides and the inter-event time distribution of earthquakes are first reviewed. These methods are adapted to estimate the size and inter-event distribution of tsunamis at a particular recording station. When a modified Pareto size distribution is fitted, the best-fit power-law exponents of tsunamis recorded at nine Pacific tide-gauge stations exhibit marked variation, in contrast to the approximately constant power-law exponent for inter-plate thrust earthquakes. With regard to the inter-event time distribution, significant temporal clustering of tsunami sources is demonstrated. For tsunami sources occurring in close proximity to other sources in both space and time, a physical triggering mechanism, such as static stress transfer, is a likely cause for the anomalous clustering. Mechanisms of earthquake-to-earthquake and earthquake-to-landslide triggering are reviewed. Finally, a modification of statistical branching models developed for earthquake triggering is introduced to describe triggering among tsunami sources.
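    As a simplified illustration of estimating a power-law exponent from event sizes, the maximum-likelihood estimate for a pure (unmodified) Pareto tail above a chosen threshold x_min is beta = n / sum(ln(x_i / x_min)). The chapter uses a modified Pareto distribution, which adds a taper at large sizes; that refinement is not included here, and the threshold, function name and synthetic data are assumptions.

```python
import math
import random


def pareto_exponent_mle(sizes, x_min):
    """MLE of beta for P(X > x) = (x / x_min) ** -beta, using sizes >= x_min."""
    tail = [x for x in sizes if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if not tail or log_sum == 0:
        raise ValueError("need observations strictly above x_min")
    return len(tail) / log_sum


# Synthetic check: draw Pareto samples with beta = 1 by inverse-transform sampling.
rng = random.Random(1)
beta_true, x_min = 1.0, 0.1
samples = [x_min * (1.0 - rng.random()) ** (-1.0 / beta_true) for _ in range(20000)]
print(round(pareto_exponent_mle(samples, x_min), 2))
```

    With a few thousand events the estimate is tight, but station catalogs are usually much shorter, so the sampling error of beta (roughly beta divided by the square root of n) matters when comparing exponents between stations.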

  18. Spatial and temporal structure of typhoid outbreaks in Washington, D.C., 1906–1909: evaluating local clustering with the Gi* statistic

    PubMed Central

    Hinman, Sarah E; Blackburn, Jason K; Curtis, Andrew

    2006-01-01

    Background To better understand the distribution of typhoid outbreaks in Washington, D.C., the U.S. Public Health Service (PHS) conducted four investigations of typhoid fever. These studies included maps of cases reported between 1 May and 31 October of each year from 1906 to 1909. These data were entered into a GIS database and analyzed using Ripley's K-function followed by the Gi* statistic in yearly intervals to evaluate spatial clustering, the scale of clustering, and the temporal stability of these clusters. Results The Ripley's K-function indicated no global spatial autocorrelation. The Gi* statistic indicated clustering of typhoid at multiple scales across the four-year time period, refuting the conclusions drawn in all four PHS reports concerning the distribution of cases. While the PHS reports suggested an even distribution of the disease, this study quantified both areas of localized disease clustering and mobile, larger regions of clustering, indicating both highly localized and periodic generalized sources of infection within the city. Conclusion The methodology applied in this study was useful for evaluating the spatial distribution and annual-level temporal patterns of typhoid outbreaks in Washington, D.C. from 1906 to 1909. While advanced spatial analyses of historical data sets must be interpreted with caution, this study does suggest that there is utility in these types of analyses and that they provide new insights into the urban patterns of typhoid outbreaks during the early part of the twentieth century. PMID:16566830
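    The local Gi* statistic used in this study can be sketched as below. This is a minimal illustration assuming binary distance-band weights, where a location counts as its own neighbor (which is what distinguishes Gi* from Gi), and the standardized z-score form of the statistic; the distance band `d`, the function name and the toy data are assumptions, not the study's settings.

```python
import math


def gi_star(coords, values, d):
    """Getis-Ord Gi* z-scores with binary weights w_ij = 1 if dist(i, j) <= d.

    The sums run over all j including i itself (the "star" in Gi*).
    """
    n = len(values)
    xbar = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - xbar ** 2)
    z_scores = []
    for i in range(n):
        w = [1.0 if math.dist(coords[i], coords[j]) <= d else 0.0 for j in range(n)]
        sw = sum(w)
        sw2 = sum(wj * wj for wj in w)
        local_sum = sum(wj * x for wj, x in zip(w, values))
        num = local_sum - xbar * sw
        den = s * math.sqrt((n * sw2 - sw ** 2) / (n - 1))
        z_scores.append(num / den if den > 0 else 0.0)
    return z_scores


# Toy transect: ten equally spaced locations, with high case counts at the first three.
coords = [(float(x), 0.0) for x in range(10)]
values = [10, 10, 10, 1, 1, 1, 1, 1, 1, 1]
z = gi_star(coords, values, d=1.0)
print([round(v, 2) for v in z])
```

    Large positive z-scores mark hot spots and large negative ones cold spots; reading each |z| > 1.96 as significant at the 5% level ignores the multiple comparisons made across all locations.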

  19. Spatial and temporal structure of typhoid outbreaks in Washington, D.C., 1906-1909: evaluating local clustering with the Gi* statistic.

    PubMed

    Hinman, Sarah E; Blackburn, Jason K; Curtis, Andrew

    2006-03-27

    To better understand the distribution of typhoid outbreaks in Washington, D.C., the U.S. Public Health Service (PHS) conducted four investigations of typhoid fever. These studies included maps of cases reported between 1 May and 31 October of each year from 1906 to 1909. These data were entered into a GIS database and analyzed using Ripley's K-function followed by the Gi* statistic in yearly intervals to evaluate spatial clustering, the scale of clustering, and the temporal stability of these clusters. The Ripley's K-function indicated no global spatial autocorrelation. The Gi* statistic indicated clustering of typhoid at multiple scales across the four-year time period, refuting the conclusions drawn in all four PHS reports concerning the distribution of cases. While the PHS reports suggested an even distribution of the disease, this study quantified both areas of localized disease clustering and mobile, larger regions of clustering, indicating both highly localized and periodic generalized sources of infection within the city. The methodology applied in this study was useful for evaluating the spatial distribution and annual-level temporal patterns of typhoid outbreaks in Washington, D.C. from 1906 to 1909. While advanced spatial analyses of historical data sets must be interpreted with caution, this study does suggest that there is utility in these types of analyses and that they provide new insights into the urban patterns of typhoid outbreaks during the early part of the twentieth century.

  20. Zonation in the deep benthic megafauna: Application of a general test.

    PubMed

    Gardiner, Frederick P; Haedrich, Richard L

    1978-01-01

    A test based on Maxwell-Boltzmann statistics, instead of the formerly suggested but inappropriate Bose-Einstein statistics (Pielou and Routledge, 1976), examines the distribution of the boundaries of species' ranges distributed along a gradient, and indicates whether they are random or clustered (zoned). The test is most useful as a preliminary to the application of more instructive but less statistically rigorous methods such as cluster analysis. The test indicates that zonation is marked in the deep benthic megafauna living between 200 and 3000 m, but below 3000 m little zonation may be found.

  1. Evaluating SPLASH-2 Applications Using MapReduce

    NASA Astrophysics Data System (ADS)

    Zhu, Shengkai; Xiao, Zhiwei; Chen, Haibo; Chen, Rong; Zhang, Weihua; Zang, Binyu

    MapReduce has become prevalent for running data-parallel applications. By hiding non-functional concerns such as parallelism, fault tolerance and load balancing from programmers, MapReduce significantly simplifies the programming of large clusters. Because of these features, researchers have also explored the use of MapReduce in other application domains, such as machine learning, textual retrieval and statistical translation, among others.
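    The programming model that MapReduce exposes can be illustrated with the classic word-count example, simulated in a single Python process. This sketch shows only the map, shuffle (group-by-key) and reduce phases; it is not a distributed runtime, includes none of the parallelism, fault tolerance or load balancing the framework hides, and is unrelated to the SPLASH-2 ports evaluated in the paper. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word, 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all intermediate values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key."""
    return key, sum(values)


def mapreduce(documents):
    intermediate = chain.from_iterable(map_phase(doc) for doc in documents)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())


print(mapreduce(["a b a", "b c"]))  # {'a': 2, 'b': 2, 'c': 1}
```

    The programmer writes only the map and reduce functions; in a real framework each input split and each key group could be processed on a different machine.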

  2. Promoting the Development of Preschool Children's Emergent Literacy Skills: A Randomized Evaluation of a Literacy-Focused Curriculum and Two Professional Development Models

    ERIC Educational Resources Information Center

    Lonigan, Christopher J.; Farver, JoAnn M.; Phillips, Beth M.; Clancy-Menchetti, Jeanine

    2011-01-01

    To date, there have been few causally interpretable evaluations of the impacts of preschool curricula on the skills of children at-risk for academic difficulties, and even fewer studies have demonstrated statistically significant or educationally meaningful effects. In this cluster-randomized study, we evaluated the impacts of a literacy-focused…

  3. Laboratory-based prospective surveillance for community outbreaks of Shigella spp. in Argentina.

    PubMed

    Viñas, María R; Tuduri, Ezequiel; Galar, Alicia; Yih, Katherine; Pichel, Mariana; Stelling, John; Brengi, Silvina P; Della Gaspera, Anabella; van der Ploeg, Claudia; Bruno, Susana; Rogé, Ariel; Caffer, María I; Kulldorff, Martin; Galas, Marcelo

    2013-01-01

    To implement effective control measures, timely outbreak detection is essential. Shigella is the most common cause of bacterial diarrhea in Argentina. Highly resistant clones of Shigella have emerged, and outbreaks have been recognized in closed settings and in whole communities. We hereby report our experience with an evolving, integrated, laboratory-based, near real-time surveillance system operating in six contiguous provinces of Argentina during April 2009 to March 2012. To detect localized shigellosis outbreaks timely, we used the prospective space-time permutation scan statistic algorithm of SaTScan, embedded in WHONET software. Twenty three laboratories sent updated Shigella data on a weekly basis to the National Reference Laboratory. Cluster detection analysis was performed at several taxonomic levels: for all Shigella spp., for serotypes within species and for antimicrobial resistance phenotypes within species. Shigella isolates associated with statistically significant signals (clusters in time/space with recurrence interval ≥365 days) were subtyped by pulsed field gel electrophoresis (PFGE) using PulseNet protocols. In three years of active surveillance, our system detected 32 statistically significant events, 26 of them identified before hospital staff was aware of any unexpected increase in the number of Shigella isolates. Twenty-six signals were investigated by PFGE, which confirmed a close relationship among the isolates for 22 events (84.6%). Seven events were investigated epidemiologically, which revealed links among the patients. Seventeen events were found at the resistance profile level. The system detected events of public health importance: infrequent resistance profiles, long-lasting and/or re-emergent clusters and events important for their duration or size, which were reported to local public health authorities. 
The WHONET-SaTScan system may serve as a model for surveillance and can be applied to other pathogens, implemented by other networks, and scaled up to national and international levels for early detection and control of outbreaks.
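    A minimal sketch of the expected-count step behind a space-time permutation scan statistic, assuming the usual formulation in which the expected number of cases in a zone-day cell is (cases in that zone over all days) × (cases on that day over all zones) / (all cases). The function name `expected_in_cylinder` and the (zone, day) input format are hypothetical; SaTScan's likelihood ratio, its search over cylinders, and its Monte Carlo permutation testing are not reproduced here.

```python
from collections import Counter


def expected_in_cylinder(cases, zones, days):
    """Observed and expected case counts for one space-time cylinder.

    cases: list of (zone, day) pairs, one per isolate (hypothetical format).
    zones, days: collections defining the cylinder's spatial base and time window.
    """
    total = len(cases)
    by_zone = Counter(z for z, _ in cases)  # cases per zone, summed over all days
    by_day = Counter(d for _, d in cases)   # cases per day, summed over all zones
    observed = sum(1 for z, d in cases if z in zones and d in days)
    # Permutation null: each zone-day cell expects zone total * day total / grand total.
    expected = sum(by_zone[z] * by_day[d] for z in zones for d in days) / total
    return observed, expected


# Made-up isolates for illustration, not data from the study.
cases = [("A", 1), ("A", 1), ("A", 1), ("B", 1), ("B", 2), ("C", 2), ("C", 3), ("A", 3)]
print(expected_in_cylinder(cases, {"A"}, {1}))  # → (3, 2.0)
```

    Because the expected counts are conditioned on the spatial and temporal margins, no population-at-risk data are needed, which suits laboratory-based surveillance where denominators are unknown.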

  4. Identification of stress responsive genes by studying specific relationships between mRNA and protein abundance.

    PubMed

    Morimoto, Shimpei; Yahara, Koji

    2018-03-01

    Protein expression is regulated by the production and degradation of mRNAs and proteins, but the specifics of their relationship are controversial. Although technological advances have enabled genome-wide and time-series surveys of mRNA and protein abundance, recent studies have shown paradoxical results, with most statistical analyses limited to linear correlation or to analysis of variance applied separately to the mRNA and protein datasets. Here, using recently analyzed genome-wide time-series data, we developed a statistical analysis framework for identifying which types of genes or biological gene groups show significant correlation between mRNA and protein abundance after accounting for potential time delays. Our framework stratifies all genes by the extent of time delay, conducts gene clustering within each stratum, and performs a non-parametric statistical test of the correlation between mRNA and protein abundance in a gene cluster. As a result, we revealed stronger correlations than previously reported between mRNA and protein abundance in two metabolic pathways. Moreover, we identified a pair of stress-responsive genes (ADC17 and KIN1) that showed highly similar time series of mRNA and protein abundance. Furthermore, we confirmed the robustness of the analysis framework by applying it to another genome-wide time-series dataset and identifying a cytoskeleton-related gene cluster (keratin 18, keratin 17, and mitotic spindle positioning) that shows similar correlation. The significant correlation and highly similar changes in mRNA and protein abundance suggest a concerted role for these genes in the cellular stress response, which we believe provides an answer to the question of the specific relationships between mRNA and protein in a cell. 
In addition, our framework for studying the relationship between mRNAs and proteins in a cell will provide a basis for studying specific relationships between mRNA and protein abundance after accounting for potential time delays.

  5. Searching for bulk motions in the intracluster medium of massive, merging clusters with Chandra CCD data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liu, Ang; Yu, Heng; Tozzi, Paolo

    2016-04-10

    We search for bulk motions in the intracluster medium (ICM) of massive clusters showing evidence of an ongoing or recent major merger with spatially resolved spectroscopy in Chandra CCD data. We identify a sample of six merging clusters with >150 ks Chandra exposure in the redshift range 0.1 < z < 0.3. By performing X-ray spectral analysis of projected ICM regions selected according to their surface brightness, we obtain the projected redshift maps for all of these clusters. After performing a robust analysis of the statistical and systematic uncertainties in the measured X-ray redshift z_X, we check whether or not the global z_X distribution differs from that expected when the ICM is at rest. We find evidence of significant bulk motions at more than 3σ in A2142 and A115, and at less than 2σ in A2034 and A520. Focusing on single regions, we identify significant localized velocity differences in all of the merger clusters. We also perform the same analysis on two relaxed clusters with no signatures of recent mergers, finding no signs of bulk motions, as expected. Our results indicate that deep Chandra CCD data enable us to identify the presence of bulk motions at the level of v_BM > 1000 km s^−1 in the ICM of massive merging clusters at 0.1 < z < 0.3. Although the CCD spectral resolution is not sufficient for a detailed analysis of the ICM dynamics, Chandra CCD data constitute a key diagnostic tool complementing X-ray bolometers on board future X-ray missions.
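    Several records in this list quote significance in units of σ (for example "more than 3σ" above, or "7.6σ" for dark mass below). A minimal sketch of converting such a figure to a tail probability, assuming a Gaussian test statistic; individual papers may use one-sided or two-sided conventions, so the `two_sided` flag here is an assumption to set per case.

```python
import math


def sigma_to_p(z, two_sided=True):
    """Gaussian tail probability beyond z standard deviations."""
    one_sided = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z > z) for a standard normal Z
    return 2 * one_sided if two_sided else one_sided


for z in (2, 3, 7.6):
    print(z, sigma_to_p(z))
```

    Under this convention 3σ corresponds to a two-sided p of about 0.0027, while 2σ is about 0.046.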

  6. Prostate cancer incidence and tumor severity in Georgia: descriptive epidemiology, racial disparity, and geographic trends.

    PubMed

    Wagner, Sara E; Bauer, Sarah E; Bayakly, A Rana; Vena, John E

    2013-01-01

    Limited research has been conducted to describe the geographical clustering and distribution of prostate cancer (PrCA) incidence in Georgia (GA). This study describes and compares the temporal and geographic trends of PrCA incidence in GA with a specific focus on racial disparities. GA Comprehensive Cancer Registry PrCA incidence data were obtained for 1998-2008. Directly standardized age-adjusted PrCA incidence rates per 100,000 were analyzed by race, stage, grade, and county. County-level hotspots of PrCA incidence were analyzed with the Getis-Ord Gi* statistic in a geographic information system; a census tract-level cluster analysis was performed with a discrete Poisson model and implemented in SaTScan® software. Significant (p < 0.05) hotspots of PrCA incidence were observed in nine southwestern counties and six centrally located counties among men of both races. Six significant (p < 0.1) clusters of PrCA incidence rates were detected for men of both races in north and northwest central Georgia. When stratified by race, clusters among white and black men were similar, although centroids were slightly shifted. Most notably, a large (122 km radius) cluster in northwest central Georgia was detected only in whites, and two smaller clusters (0-32 km radii) were detected in Southwest Georgia only in black men. Clusters of high-grade and late-stage tumors were identified primarily in the northern portion of the state among men of both races. This study revealed a pattern of higher incidence and more advanced disease in northern and northwest central Georgia, highlighting geographic patterns that need more research and investigation of possible environmental determinants.
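    A minimal sketch of the Getis-Ord Gi* z-score used for county-level hotspot detection, assuming the standard formulation with self-inclusive weights (each area is counted as its own neighbour, which is what the "*" denotes). The rates and the binary contiguity matrix are invented for illustration, and GIS packages may differ in their weighting and variance details.

```python
import math


def getis_ord_gi_star(x, w):
    """Gi* z-scores.

    x: attribute values (e.g. age-adjusted rates) for n areas.
    w: n x n spatial weights with w[i][i] non-zero so each area includes itself.
    Assumes x is not constant (otherwise the standard deviation is zero).
    """
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum(v * v for v in x) / n - mean * mean)  # population std. dev.
    z = []
    for i in range(n):
        wi = w[i]
        sw = sum(wi)                   # sum of weights for area i
        sw2 = sum(v * v for v in wi)   # sum of squared weights
        num = sum(wij * xj for wij, xj in zip(wi, x)) - mean * sw
        den = s * math.sqrt((n * sw2 - sw * sw) / (n - 1))
        z.append(num / den)
    return z


# Hypothetical example: two adjacent high-rate areas and two adjacent low-rate areas.
x = [10.0, 10.0, 1.0, 1.0]
w = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
print(getis_ord_gi_star(x, w))
```

    Here the first two areas score about +1.73 (a hotspot tendency) and the last two about -1.73; with real data, a Gi* above 1.96 would usually be read as a significant hotspot at p < 0.05.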

  7. The nongravitational interactions of dark matter in colliding galaxy clusters.

    PubMed

    Harvey, David; Massey, Richard; Kitching, Thomas; Taylor, Andy; Tittley, Eric

    2015-03-27

    Collisions between galaxy clusters provide a test of the nongravitational forces acting on dark matter. Dark matter's lack of deceleration in the "bullet cluster" collision constrained its self-interaction cross section σ_DM/m < 1.25 square centimeters per gram (cm²/g) [68% confidence limit (CL)] (σ_DM, self-interaction cross section; m, unit mass of dark matter) for long-ranged forces. Using the Chandra and Hubble Space Telescopes, we have now observed 72 collisions, including both major and minor mergers. Combining these measurements statistically, we detect the existence of dark mass at 7.6σ significance. The position of the dark mass has remained closely aligned within 5.8 ± 8.2 kiloparsecs of associated stars, implying a self-interaction cross section σ_DM/m < 0.47 cm²/g (95% CL) and disfavoring some proposed extensions to the standard model. Copyright © 2015, American Association for the Advancement of Science.

  8. Effect of Stagger on the Vibroacoustic Loads from Clustered Rockets

    NASA Technical Reports Server (NTRS)

    Rojo, Raymundo; Tinney, Charles E.; Ruf, Joseph H.

    2016-01-01

    The effect of staggered startup on the vibroacoustic loads that form during the end-effects regime of clustered rockets is studied using both full-scale (hot-gas) and laboratory-scale (cold-gas) data. Both configurations comprise three nozzles with thrust-optimized parabolic contours that undergo free shock separated flow and restricted shock separated flow, as well as an end-effects regime, prior to flowing full. Acoustic pressure waveforms recorded at the base of the nozzle clusters are analyzed using various statistical metrics as well as time-frequency analysis. The findings reveal a significant reduction in end-effects-regime loads when engine ignition is staggered. However, regardless of stagger, both the skewness and kurtosis of the acoustic pressure time derivative rise to the same levels during the end-effects-regime event, demonstrating the intermittence and impulsiveness of the acoustic waveforms that form during engine startup.

  9. Kinematics and dynamics of the MKW/AWM poor clusters

    NASA Technical Reports Server (NTRS)

    Beers, Timothy C.; Kriessler, Jeffrey R.; Bird, Christina M.; Huchra, John P.

    1995-01-01

    We report 472 new redshifts for 416 galaxies in the regions of the 23 poor clusters of galaxies originally identified by Morgan, Kayser, and White (MKW), and Albert, White, and Morgan (AWM). Eighteen of the poor clusters now have 10 or more available redshifts within 1.5 h^−1 Mpc of the central galaxy; 11 clusters have at least 20 available redshifts. Based on the 21 clusters for which we have sufficient velocity information, the median velocity scale is 336 km/s, a factor of 2 smaller than found for rich clusters. Several of the poor clusters exhibit complex velocity distributions due to the presence of nearby clumps of galaxies. We check the velocity of the dominant galaxy in each poor cluster relative to the remaining cluster members. Significantly high relative velocities of the dominant galaxy are found in only 4 of 21 poor clusters, 3 of which we suspect are due to contamination of the parent velocity distribution. Several statistical tests indicate that the D/cD galaxies are at the kinematic centers of the parent poor cluster velocity distributions. Mass-to-light ratios for 13 of the 15 poor clusters for which we have the required data are in the range 50 ≤ M/L_B(0) ≤ 200 solar masses per solar luminosity. The complex nature of the regions surrounding many of the poor clusters suggests that these groupings may represent an early epoch of cluster formation. For example, the poor clusters MKW7 and MKWS are shown to be gravitationally bound and likely to merge to form a richer cluster within the next several Gyr. Eight of the nine other poor clusters for which simple two-body dynamical models can be carried out are consistent with being bound to other clumps in their vicinity. Additional complex systems with more than two gravitationally bound clumps are observed among the poor clusters.

  10. PyClone: statistical inference of clonal population structure in cancer.

    PubMed

    Roth, Andrew; Khattra, Jaswinder; Yap, Damian; Wan, Adrian; Laks, Emma; Biele, Justina; Ha, Gavin; Aparicio, Samuel; Bouchard-Côté, Alexandre; Shah, Sohrab P

    2014-04-01

    We introduce PyClone, a statistical model for inference of clonal population structures in cancers. PyClone is a Bayesian clustering method for grouping sets of deeply sequenced somatic mutations into putative clonal clusters while estimating their cellular prevalences and accounting for allelic imbalances introduced by segmental copy-number changes and normal-cell contamination. Single-cell sequencing validation demonstrates PyClone's accuracy.

  11. Occurrence of Radio Minihalos in a Mass-Limited Sample of Galaxy Clusters

    NASA Technical Reports Server (NTRS)

    Giacintucci, Simona; Markevitch, Maxim; Cassano, Rossella; Venturi, Tiziana; Clarke, Tracy E.; Brunetti, Gianfranco

    2017-01-01

    We investigate the occurrence of radio minihalos (diffuse radio sources of unknown origin observed in the cores of some galaxy clusters) in a statistical sample of 58 clusters drawn from the Planck Sunyaev-Zeldovich cluster catalog using a mass cut (M_500 > 6 × 10^14 solar masses). We supplement our statistical sample with a similarly sized nonstatistical sample mostly consisting of clusters in the ACCEPT X-ray catalog with suitable X-ray and radio data, which includes lower-mass clusters. Where necessary (for nine clusters), we reanalyzed the Very Large Array archival radio data to determine whether a minihalo is present. Our total sample includes all 28 currently known and recently discovered radio minihalos, including six candidates. We classify clusters as cool-core or non-cool-core according to the value of the specific entropy floor in the cluster center, rederived or newly derived from the Chandra X-ray density and temperature profiles where necessary (for 27 clusters). Contrary to the common wisdom that minihalos are rare, we find that almost all cool cores (at least 12 out of 15, or 80%) in our complete sample of massive clusters exhibit minihalos. The supplementary sample shows that the occurrence of minihalos may be lower in lower-mass cool-core clusters. No minihalos are found in non-cool cores or "warm cores." These findings will help test theories of the origin of minihalos and provide information on the physical processes and energetics of the cluster cores.

  12. [Alcohol consumption and positive alcohol expectancies in young adults: a typological approach using TwoStep cluster].

    PubMed

    Vautier, S; Jmel, S; Fourio, C; Moncany, D

    2007-09-01

    The present study investigates the heterogeneity of the population of young adult drinkers with respect to alcohol consumption and Positive Alcohol Expectancies (PAEs). Based on the positive relationship between both kinds of variables, PAE is commonly viewed as a potential motivational factor in alcohol addiction. However, empirical analyses based on the regression of alcohol consumption on PAEs assume that the observations are statistically homogeneous with respect to the level of alcohol consumption. We explored the existence of moderate drinkers with a high PAE profile and abusive drinkers with a low PAE profile. A total of 1,017 young adult drinkers (mean age = 23 ± 2.84 years) with various educational levels, comprising 506 males and 511 females, were recruited as voluntary participants in a survey by undergraduate psychology students from the University of Toulouse Le Mirail. They completed a French version of the Alcohol Use Disorders Identification Test (AUDIT) and a French adaptation of the Alcohol Expectancy Questionnaire (AEQ). Three levels of alcohol consumption were defined using the AUDIT score, and six composite scores were obtained by averaging the relevant item scores from the AEQ. The AEQ scores were interpreted as measurements of six kinds of PAEs, namely Global positive change, Sexual enhancement, Social and physical pleasure, Social assertiveness, Relaxation, and Arousal/Power. The TwoStep cluster methodology was used to explore the data. This methodology can handle a mix of quantitative and qualitative variables, and it provides a classification model optimized through an information criterion such as Schwarz's Bayesian Information Criterion (BIC). The automatic clustering suggested five clusters, whose stability was verified down to 75% of the sample size. Low drinkers (n=527) were split into one cluster of low PAEs (I1) and, interestingly, one cluster of high PAEs (I3, 46%). 
High drinkers (n=344) were split into one cluster of intermediate PAEs (II4) and one cluster of high PAEs (II5, 52%). Interestingly again, abusive drinkers (n=146) remained a single group (III2), exhibiting high PAEs. Clusters I3 and III3 comprised a significant proportion of males. Constraining the algorithm to find 6 clusters did not affect class III2, but split low drinkers into three clusters. Although the present results should be considered cautiously because of the novelty of the TwoStep cluster methodology, they suggest a group of moderate drinkers with high PAEs. Also, abusive drinkers express high PAEs (except for 2 cases). Statistical homogeneity of moderate drinkers with respect to PAE variables appears to be a dubious assumption.

  13. Protoplanetary disc truncation mechanisms in stellar clusters: comparing external photoevaporation and tidal encounters

    NASA Astrophysics Data System (ADS)

    Winter, A. J.; Clarke, C. J.; Rosotti, G.; Ih, J.; Facchini, S.; Haworth, T. J.

    2018-04-01

    Most stars form and spend their early life in regions of enhanced stellar density. Therefore, the evolution of protoplanetary discs (PPDs) hosted by such stars is subject to the influence of other members of the cluster. Physically, PPDs might be truncated either by photoevaporation due to ultraviolet flux from massive stars, or by tidal truncation due to close stellar encounters. Here we aim to compare the two effects in real cluster environments. To this end, we first review the properties of well-studied stellar clusters with a focus on stellar number density, which largely dictates the degree of tidal truncation, and far-ultraviolet (FUV) flux, which is indicative of the rate of external photoevaporation. We then review the theoretical PPD truncation radius due to an arbitrary encounter, additionally taking into account the role of eccentric encounters, which matter in hot clusters with a 1D velocity dispersion σ_v ≳ 2 km/s. Our treatment is then applied statistically to varying local environments to establish a canonical threshold for the local stellar density (n_c ≳ 10^4 pc^−3) above which encounters can play a significant role in shaping the distribution of PPD radii over a timescale of ~3 Myr. By combining theoretical mass-loss rates due to FUV flux with viscous spreading in a PPD, we establish a similar threshold for which a massive disc is completely destroyed by external photoevaporation. Comparing these thresholds in local clusters, we find that if either mechanism has a significant impact on the PPD population, then photoevaporation is always the dominant influence.

  14. Malaria control and prevention towards elimination: data from an eleven-year surveillance in Shandong Province, China.

    PubMed

    Kong, Xiangli; Liu, Xin; Tu, Hong; Xu, Yan; Niu, Jianbing; Wang, Yongbin; Zhao, Changlei; Kou, Jingxuan; Feng, Jun

    2017-01-31

    Shandong Province has experienced a declining trend in locally acquired malaria transmission, but increasing imported malaria remains a challenge. Therefore, an understanding of the epidemiological characteristics of malaria and of the control and elimination strategies and interventions is needed for better planning to achieve the overall elimination goal in Shandong Province. A retrospective study was conducted in which all individual cases from a web-based reporting system were reviewed and analysed to explore the characteristics of malaria endemicity in Shandong from 2005 to 2015. Annual malaria incidence reported in 2005-2015 was geocoded and matched to the county level. Spatial cluster analysis was performed to evaluate any identified spatial disease clusters for statistical significance. High-rate space-time clusters were detected by retrospective space-time scan analysis using the discrete Poisson model. The overall malaria incidence decreased to a low level during 2005-2015. In total, 1564 confirmed malaria cases were reported, 27.1% of which (n = 424) were indigenous cases. Most of the indigenous cases (n = 339, 80.0%) occurred from June to October. However, the number and scale of imported cases increased, but no significant difference was observed across months. Shandong is endemic for both Plasmodium vivax (n = 730) and Plasmodium falciparum (n = 674). The disease is mainly distributed in the Southern (n = 710) and Eastern (n = 424) regions of Shandong, such as Jining (n = 214 [13.7%]), Weihai (n = 151 [9.7%]), and Yantai (n = 107 [6.8%]). Furthermore, the spatial cluster analysis of malaria cases from 2005 to 2015 indicated that the disease was not randomly distributed. For indigenous cases, a total of 15 and 2 high-risk counties were identified for 2005 to 2009 (control phase) and 2010 to 2015 (elimination phase), respectively. 
For imported cases, a total of 26 and 29 high-risk counties were identified for 2005 to 2009 (control phase) and 2010 to 2015 (elimination phase), respectively. The spatial scan statistic identified 13 different significant spatial clusters between 2005 and 2015. The space-time clustering analysis determined that the most likely cluster included 14 and 19 counties for indigenous and imported cases, respectively. To meet the requirements of the malaria elimination phase, the surveillance system should be strengthened, particularly in regions with frequent migration, together with effective multisectoral cooperation and coordination mechanisms. Specific response packages should be tailored to different types of cities, and capacity building should also be improved, focusing mainly on emergency response and case management. Funding for scientific research should be maintained during both the elimination and post-elimination phases to consolidate the achievements of malaria elimination.
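    A minimal sketch of how a purely spatial discrete Poisson scan ranks candidate zones, assuming Kulldorff's log-likelihood ratio for high-rate clusters and circular zones grown around each region centroid and capped at 50% of the population at risk. Coordinates, counts, and the function names are hypothetical, and the Monte Carlo replication that SaTScan uses to attach a p-value to the most likely cluster is omitted.

```python
import math


def poisson_llr(c, e, total_cases):
    """Log-likelihood ratio for a zone with c observed and e expected cases.
    Only excess-risk (high-rate) zones score above zero."""
    if c <= e:
        return 0.0
    llr = c * math.log(c / e)
    rest_c, rest_e = total_cases - c, total_cases - e
    if rest_c > 0:  # 0 * log(0) is taken as 0
        llr += rest_c * math.log(rest_c / rest_e)
    return llr


def most_likely_cluster(regions, max_pop_frac=0.5):
    """regions: list of (x, y, cases, population) tuples (hypothetical format).
    Returns (llr, sorted region indices) for the highest-scoring circular zone."""
    total_c = sum(r[2] for r in regions)
    total_p = sum(r[3] for r in regions)
    best = (0.0, [])
    for cx, cy, _, _ in regions:
        # Grow a circle around this centroid by adding regions in order of distance.
        order = sorted(range(len(regions)),
                       key=lambda k: math.hypot(regions[k][0] - cx, regions[k][1] - cy))
        c = p = 0
        zone = []
        for k in order:
            if p + regions[k][3] > max_pop_frac * total_p:
                break
            zone.append(k)
            c += regions[k][2]
            p += regions[k][3]
            e = total_c * p / total_p  # expected cases under a uniform rate
            llr = poisson_llr(c, e, total_c)
            if llr > best[0]:
                best = (llr, sorted(zone))
    return best


# Made-up counties: two adjacent high-incidence ones and four low-incidence ones.
regions = [(0, 0, 20, 1000), (1, 0, 18, 1000), (10, 10, 2, 1000),
           (11, 10, 1, 1000), (10, 11, 3, 1000), (11, 11, 2, 1000)]
llr, zone = most_likely_cluster(regions)
print(zone)  # → [0, 1]
```

    In a full analysis, the same maximum LLR would be recomputed on many datasets simulated under the null hypothesis, and the rank of the observed value among them gives the cluster's p-value.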

  15. A study on phenomenology of Dhat syndrome in men in a general medical setting

    PubMed Central

    Prakash, Sathya; Sharan, Pratap; Sood, Mamta

    2016-01-01

    Background: “Dhat syndrome” is believed to be a culture-bound syndrome of the Indian subcontinent. Although many studies have been performed, many have methodological limitations, and there is a lack of agreement in many areas. Aims: The aim is to study the phenomenology of “Dhat syndrome” in men and to explore the possibility of subtypes within this entity. Settings and Design: This is a cross-sectional descriptive study conducted at a sex and marriage counseling clinic of a tertiary care teaching hospital in Northern India. Materials and Methods: An operational definition and assessment instrument for “Dhat syndrome” was developed after consulting all concerned stakeholders and reviewing the literature. It was applied to 100 patients along with a socio-demographic profile, the Hamilton Depression Rating Scale, the Hamilton Anxiety Rating Scale, the Mini International Neuropsychiatric Interview, and the Postgraduate Institute Neuroticism Scale. Statistical Analysis: Descriptive statistics, group comparisons, and Pearson's product-moment correlations were carried out. Factor analysis and cluster analysis were done to determine the factor structure and subtypes of “Dhat syndrome.” Results: A diagnostic and assessment instrument for “Dhat syndrome” has been developed, and the phenomenology in 100 patients has been described. Both the health beliefs scale and the associated symptoms scale demonstrated a three-factor structure. The patients with “Dhat syndrome” could be categorized into three clusters based on severity. Conclusions: There appears to be significant agreement among various stakeholders on the phenomenology of “Dhat syndrome,” although some differences exist. “Dhat syndrome” could be subtyped into three clusters based on severity. PMID:27385844

  16. Spatial dependency of V. cholera prevalence on open space refuse dumps in Kumasi, Ghana: a spatial statistical modelling

    PubMed Central

    Osei, Frank B; Duker, Alfred A

    2008-01-01

    Background Cholera has persisted in Ghana since its introduction in the early 1970s. From 1999 to 2005, the Ghana Ministry of Health officially reported a total of 26,924 cases and 620 deaths to the WHO. Etiological studies suggest that the natural habitat of V. cholerae is the aquatic environment. Its ability to survive within and outside the aquatic environment makes cholera a complex health problem to manage. Once the disease is introduced into a population, several environmental factors may lead to prolonged transmission and secondary cases. An important environmental factor that predisposes individuals to cholera infection is sanitation. In this study, we examine the importance of two main spatial measures of sanitation in cholera transmission in the city of Kumasi: the proximity and the density of refuse dumps within a community. Results Spatial statistical modelling carried out to determine the spatial dependency of cholera prevalence on refuse dumps showed a direct spatial relationship between cholera prevalence and the density of refuse dumps, and an inverse spatial relationship between cholera prevalence and distance to refuse dumps. A spatial scan statistic also identified four significant spatial clusters of cholera: a primary cluster with greater than expected cholera prevalence, and three secondary clusters with lower than expected cholera prevalence. A GIS-based buffer analysis also showed that refuse dumps should not be sited within 500 m of community centres. Conclusion The results suggest that proximity and density of open space refuse dumps play a contributory role in cholera infection in Kumasi. PMID:19087235

  17. The Mucciardi-Gose Clustering Algorithm and Its Applications in Automatic Pattern Recognition.

    DTIC Science & Technology

    A procedure known as the Mucciardi-Gose clustering algorithm, CLUSTR, for determining the geometrical or statistical relationships among groups of N...discussion of clustering algorithms is given; the particular advantages of the Mucciardi-Gose procedure are described. The mathematical basis for, and the

  18. Spatial spread of dengue in a non-endemic tropical city in northern Argentina.

    PubMed

    Gil, José F; Palacios, Maximiliano; Krolewiecki, Alejandro J; Cortada, Pedro; Flores, Rosana; Jaime, Cesar; Arias, Luis; Villalpando, Carlos; Alberti DÁmato, Anahí M; Nasser, Julio R; Aparicio, Juan P

    2016-06-01

    After more than eighty years, dengue re-emerged in Argentina in 1997. Since then, the largest epidemic in terms of geographical extent, magnitude and mortality was recorded in 2009. In this report, we analyzed the spread of the DEN-1 epidemic in Orán, a mid-size city in a non-endemic tropical area of Northern Argentina, and its correlation with demographic and socioeconomic factors. Cases were diagnosed by ELISA between January and June 2009. We applied space-time and spatial scan statistics under a Poisson model. Possible associations between dengue incidence and socio-economic variables were studied with the Spearman correlation test. The epidemic started from a case imported from Bolivia, and space-time analysis detected two clusters, one in February and another in April (in the south and the northeast of the city, respectively), with risk ratios of 25.24 and 4.07 (p<0.01). Subsequent cases spread widely around the city without significant space-time clustering. Maximum values of the entomological indices were observed in January, at the beginning of the epidemic (B=21.96; LH=8.39). No statistically significant association between socioeconomic variables and dengue incidence was found, but a positive correlation between population size and the number of cases (p<0.05) was detected. Two mechanisms may explain the observed pattern of epidemic spread in this non-endemic tropical city: a) short-range dispersal of mosquitoes and people generates clusters of cases, and b) long-distance (within the city) human movement contributes to a quasi-random distribution of cases. Copyright © 2016 Elsevier B.V. All rights reserved.
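    The Spearman test used above correlates ranks rather than raw values, so it is insensitive to skewed socioeconomic or incidence distributions. A minimal sketch of the coefficient, assuming ties receive their average rank; the function names and the neighbourhood values are hypothetical, and the significance test itself is not shown.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank for this tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Made-up neighbourhood populations and case counts, for illustration only.
population = [1200, 3400, 2100, 5000, 800]
cases = [3, 9, 4, 12, 3]
print(spearman_rho(population, cases))
```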

  19. Cluster analysis of water-quality data for Lake Sakakawea, Audubon Lake, and McClusky Canal, central North Dakota, 1990-2003

    USGS Publications Warehouse

    Ryberg, Karen R.

    2006-01-01

    As a result of the Dakota Water Resources Act of 2000, the Bureau of Reclamation, U.S. Department of the Interior, identified eight water-supply alternatives (including a no-action alternative) to meet future water needs in portions of the Red River of the North (Red River) Basin. Of those alternatives, four include the interbasin transfer of water from the Missouri River Basin to the Red River Basin. Three of the interbasin transfer alternatives would use the McClusky Canal, located in central North Dakota, to transport the water. Therefore, the water quality of the McClusky Canal and the sources of its water, Lake Sakakawea and Audubon Lake, is of interest to water-quality stakeholders. The Bureau of Reclamation collected water-quality samples at 23 sites on Lake Sakakawea, Audubon Lake, and the McClusky Canal system from 1990 through 2003. Physical properties and water-quality constituents from these samples were summarized and analyzed by the U.S. Geological Survey using hierarchical agglomerative cluster analysis (HACA). HACA separated the samples into related clusters, or groups. These groups were examined for statistical significance and relation to structure of the McClusky Canal system. Statistically, the sample groupings found using HACA were significantly different from each other and appear to result from spatial and temporal water-quality differences corresponding with different sections of the canal and different operational conditions. Future operational changes of the canal system may justify additional water-quality sampling to characterize possible water-quality changes.
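    The abstract does not state which distance measure or linkage rule the hierarchical agglomerative cluster analysis used. A minimal sketch of the general technique, assuming z-score standardization of each constituent, Euclidean distance, and average linkage; the function names and sample values are hypothetical.

```python
import math
import statistics


def standardize(rows):
    """Z-score each column so constituents measured in different units are comparable."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]


def average_linkage(rows, k):
    """Repeatedly merge the two clusters with the smallest mean pairwise distance
    until k clusters remain; returns clusters as sorted lists of sample indices."""
    pts = standardize(rows)
    clusters = [[i] for i in range(len(pts))]

    def link(c1, c2):
        total = sum(math.dist(pts[a], pts[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = link(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


# Made-up samples: (constituent 1, constituent 2) from two contrasting sections.
samples = [[1.0, 200.0], [1.1, 210.0], [0.9, 190.0], [5.0, 800.0], [5.2, 820.0]]
print(average_linkage(samples, 2))  # → [[0, 1, 2], [3, 4]]
```

    Stopping at a fixed k is a simplification; in practice the full merge tree (dendrogram) is usually inspected before the groups are tested for significant differences.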

  20. The Atacama Cosmology Telescope: Cosmology from Galaxy Clusters Detected Via the Sunyaev-Zel'dovich Effect

    NASA Technical Reports Server (NTRS)

    Sehgal, Neelima; Trac, Hy; Acquaviva, Viviana; Ade, Peter A. R.; Aguirre, Paula; Amiri, Mandana; Appel, John W.; Barrientos, L. Felipe; Battistelli, Elia S.; Bond, J. Richard; et al.

    2010-01-01

    We present constraints on cosmological parameters based on a sample of Sunyaev-Zel'dovich-selected galaxy clusters detected in a millimeter-wave survey by the Atacama Cosmology Telescope. The cluster sample used in this analysis consists of 9 optically confirmed high-mass clusters comprising the high-significance end of the total cluster sample identified in 455 square degrees of sky surveyed during 2008 at 148 GHz. We focus on the most massive systems to reduce the degeneracy between unknown cluster astrophysics and cosmology derived from SZ surveys. We describe the scaling relation between cluster mass and SZ signal with a 4-parameter fit. Marginalizing over the values of the parameters in this fit with conservative priors gives σ_8 = 0.851 ± 0.115 and w = -1.14 ± 0.35 for a spatially flat wCDM cosmological model with WMAP 7-year priors on cosmological parameters. This gives a modest improvement in statistical uncertainty over WMAP 7-year constraints alone. Fixing the scaling relation between cluster mass and SZ signal to a fiducial relation obtained from numerical simulations and calibrated by X-ray observations, we find σ_8 = 0.821 ± 0.044 and w = -1.05 ± 0.20. These results are consistent with constraints from WMAP 7 plus baryon acoustic oscillations plus type Ia supernovae, which give σ_8 = 0.802 ± 0.038 and w = -0.98 ± 0.053. A stacking analysis of the clusters in this sample compared to clusters simulated assuming the fiducial model also shows good agreement. These results suggest that, given the sample of clusters used here, both the astrophysics of massive clusters and the cosmological parameters derived from them are broadly consistent with current models.

  1. A flexible data-driven comorbidity feature extraction framework.

    PubMed

    Sideris, Costas; Pourhomayoun, Mohammad; Kalantarian, Haik; Sarrafzadeh, Majid

    2016-06-01

    Disease and symptom diagnostic codes are a valuable resource for classifying and predicting patient outcomes. In this paper, we propose a novel methodology for utilizing disease diagnostic information in a predictive machine learning framework. Our methodology relies on a novel, clustering-based feature extraction framework using disease diagnostic information. To reduce the data dimensionality, we identify disease clusters using co-occurrence statistics. We optimize the number of generated clusters in the training set and then use these clusters as features to predict patient severity of condition and patient readmission risk. We build our clustering and feature extraction algorithm using the 2012 National Inpatient Sample (NIS), Healthcare Cost and Utilization Project (HCUP), which contains 7 million hospital discharge records and ICD-9-CM codes. The proposed framework is tested on Ronald Reagan UCLA Medical Center Electronic Health Records (EHR) from 3041 Congestive Heart Failure (CHF) patients and the UCI 130-US diabetes dataset, which includes admissions from 69,980 diabetic patients. We compare our cluster-based feature set with the commonly used comorbidity frameworks, including Charlson's index, Elixhauser's comorbidities and their variations. The proposed approach was shown to yield significant gains of 10.7-22.1% in predictive accuracy for CHF severity-of-condition prediction and 4.65-5.75% in diabetes readmission prediction. Copyright © 2016 Elsevier Ltd. All rights reserved.

  2. Geospatial Distribution and Clustering of Chlamydia trachomatis in Communities Undergoing Mass Azithromycin Treatment

    PubMed Central

    Yohannan, Jithin; He, Bing; Wang, Jiangxia; Greene, Gregory; Schein, Yvette; Mkocha, Harran; Munoz, Beatriz; Quinn, Thomas C.; Gaydos, Charlotte; West, Sheila K.

    2014-01-01

    Purpose. We assessed spatial clustering over time of households with Chlamydia trachomatis infection (CI) and active trachoma (AT) in villages undergoing mass treatment with azithromycin (MDA). Methods. We obtained global positioning system (GPS) coordinates for all households in four villages in Kongwa District, Tanzania. Every 6 months for a period of 42 months, our team examined all children under 10 years of age for AT and tested them for CI using ocular swabs and the Amplicor assay. Villages underwent four rounds of annual MDA. We classified households as having ≥1 child with CI (or AT) or 0 children with CI (or AT). To detect clustering at each time point, we calculated the difference in the K function between households with and without CI or AT. Results. Between 918 and 991 households were included over the 42 months of this analysis. At baseline, 306 households (32.59%) had ≥1 child with CI, which declined to 73 households (7.50%) at 42 months. We observed borderline clustering of households with CI at 12 months, after one round of MDA, and statistically significant clustering with growing cluster sizes between 18 and 24 months, after two rounds of MDA. Clusters diminished in size at 30 months, after 3 rounds of MDA. Active trachoma did not cluster at any time point. Conclusions. This study demonstrates that CI clusters after multiple rounds of MDA. Clusters of infection may increase in size if the annual antibiotic pressure is removed. The absence of growth after three rounds suggests the start of control of transmission. PMID:24906862
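The K-function comparison in the Methods can be sketched as follows. This is an illustrative version only: it uses the naive Ripley's K estimator without edge correction, a single distance r, and a random-labelling Monte Carlo test (household locations fixed, infection labels shuffled). The authors' exact estimator and significance procedure are not stated in the abstract, and the function names and toy coordinates are hypothetical.

```python
import math
import random


def ripley_k(points, r, area):
    """Naive Ripley's K estimate at distance r (no edge correction)."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    close_pairs = sum(
        1
        for i, (xi, yi) in enumerate(points)
        for j, (xj, yj) in enumerate(points)
        if i != j and math.hypot(xi - xj, yi - yj) <= r
    )
    return area * close_pairs / (n * (n - 1))


def k_difference(cases, controls, r, area):
    """D(r) = K_cases(r) - K_controls(r); D > 0 suggests extra clustering of cases."""
    return ripley_k(cases, r, area) - ripley_k(controls, r, area)


def random_labelling_test(cases, controls, r, area, n_iter=199, seed=0):
    """One-sided Monte Carlo p-value: shuffle case/control labels, keep locations fixed."""
    rng = random.Random(seed)
    observed = k_difference(cases, controls, r, area)
    pooled = list(cases) + list(controls)
    n_cases = len(cases)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        if k_difference(pooled[:n_cases], pooled[n_cases:], r, area) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_iter + 1)


# Toy data: five infected households bunched together, 36 uninfected on a grid.
cases = [(50, 50), (51, 50), (50, 51), (51, 51), (50.5, 50.5)]
controls = [(10 * i + 5, 10 * j + 5) for i in range(6) for j in range(6)]
d_obs, p = random_labelling_test(cases, controls, r=2.0, area=60.0 * 60.0)
print(d_obs, p)
```

Repeating the test for several values of r, and at each survey round, would mirror the per-time-point comparison described in the abstract.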

  3. Seasonality and synchrony of reproduction in three species of nectarivorous Philippines bats.

    PubMed

    Heideman, Paul D; Utzurrum, Ruth C B

    2003-11-21

    Differences among species and among years in reproductive seasonality (the tendency for clusters of events to fall at approximately the same point in each year) and synchrony (amount of clustering of events within a year) have been intensively studied in bats, but are difficult to assess. Here, we use randomization methods with circular statistics to test for synchrony and seasonality of reproduction in three species of nectarivorous megachiropteran bats on Negros Island in the central Philippines. In Rousettus amplexicaudatus, estimated dates of birth were both highly synchronous and highly seasonal. In Macroglossus minimus, estimated births were seasonal and significantly clustered within years, but within each year births occurred over a broad period, indicating a low level of synchrony. In Eonycteris spelaea, estimated births were also seasonal and showed statistically significant synchrony, with birth periods within years intermediate in synchrony between those of R. amplexicaudatus and M. minimus. All three species had a similar seasonal pattern, with two birth periods in each year, centered on March or April and August or September. In one species, R. amplexicaudatus, primigravid females (in their first pregnancy) produced their young in June and July, a birth period significantly different in timing from the two birth periods of older adult females. This more conservative pattern in young females may allow higher survival of parents and offspring at the cost of a lost reproductive opportunity. There was weak evidence that in some years primigravid females of M. minimus might differ in timing from older adults. There were few significant differences in reproductive timing among different years, and those differences were generally less than two weeks, even during the severe drought of the 1983 El Niño. The results suggest that these species follow an obligately seasonal pattern of reproductive timing with very little phenotypic plasticity. 
In some cases, the resampling methods were sensitive to differences in timing of less than two weeks, suggesting that these methods are useful for analyses of seasonality in wild populations of bats.
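A sketch of a randomization test for within-year clustering of birth dates using circular statistics, assuming births are recorded as day of year and that the null model is births spread uniformly around the year. The statistic used here (the mean resultant length R) is a standard circular-statistics measure, but the abstract does not say which statistic or resampling scheme the authors used, so `synchrony_test`, its defaults and the example dates are hypothetical.

```python
import math
import random


def mean_resultant_length(days, period=365.0):
    """Length R of the mean resultant vector after mapping event days onto a circle.

    R is close to 1 when events pile up at the same time of year and close to 0
    when they are spread evenly around the year.
    """
    angles = [2 * math.pi * d / period for d in days]
    c = sum(math.cos(a) for a in angles) / len(angles)
    s = sum(math.sin(a) for a in angles) / len(angles)
    return math.hypot(c, s)


def synchrony_test(days, n_iter=5000, period=365.0, seed=0):
    """Randomization test: compare observed R with R for uniformly random birth days."""
    rng = random.Random(seed)
    observed = mean_resultant_length(days, period)
    hits = 0
    for _ in range(n_iter):
        simulated = [rng.uniform(0, period) for _ in days]
        if mean_resultant_length(simulated, period) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_iter + 1)


births = [85, 88, 90, 92, 95, 87, 91, 93]  # hypothetical birth days, late March
print(synchrony_test(births))
```

Because the species studied here have two birth periods per year, a single resultant vector computed over all births can partly cancel out; testing each birth period separately is one way around this.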

  4. Female labour force participation and suicide rates in the world.

    PubMed

    Chen, Ying-Yeh; Chen, Mengni; Lui, Carrie S M; Yip, Paul S F

    2017-12-01

    The current study aims to illustrate male-to-female suicide rate ratios in the world and to explore the correlations between female labour force participation rates (FLPR) and the suicide rates of both genders. It further examines whether the relationship between FLPR and suicide rates varies according to the human capabilities of a given country. Using suicide data obtained from the World Health Organization Statistical Information System, suicide gender ratios of 70 countries are illustrated. Based on the level of Human Development Index (HDI) and FLPR, the Bayesian Information Criterion (BIC) was used to determine the optimal number of clusters of those countries. FLPR and gender-specific suicide rates were plotted for each cluster, and Pearson's correlation coefficients were calculated. Three clusters were identified. There was no correlation between FLPR and suicide rates in the first cluster, where both HDI and FLPR were highest (Male: r = 0.29, P = 0.45; Female: r = 0.01, P = 0.97), whereas in Cluster 2, a higher level of FLPR corresponded to lower suicide rates in both genders, although the association was statistically significant only in females (Male: r = -0.32, P = 0.15; Female: r = -0.48, P = 0.03). In Cluster 3 countries, where HDI/FLPR were relatively lower, increased FLPR was associated with higher suicide rates for both genders (Male: r = 0.32, P = 0.04; Female: r = 0.32, P = 0.05). The relationship between egalitarian gender norms and suicide rates varies according to national context. More egalitarian gender norms may benefit both genders, but more so for women in countries equipped with better human capabilities. The beneficial effect may, however, reach a plateau in countries with the highest HDI/FLPR, whereas in countries with relatively lower HDI/FLPR, increased FLPR was associated with higher suicide rates. Copyright © 2017 Elsevier Ltd. All rights reserved.
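The per-cluster correlation step can be sketched as below, assuming each country is a (cluster, FLPR, suicide rate) record. The abstract reports Pearson's r with P values but does not name the test used for P; this sketch swaps in a two-sided permutation test, which is not necessarily what the authors used, and the function names and data are hypothetical. The BIC-based choice of the number of clusters is not reproduced.

```python
import math
import random


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p(x, y, n_iter=2000, seed=0):
    """Two-sided permutation p-value: how often shuffled pairings give |r| >= observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_iter + 1)


def correlation_by_cluster(rows):
    """rows: (cluster_id, flpr, suicide_rate) tuples -> {cluster_id: (r, p)}."""
    grouped = {}
    for cluster_id, flpr, rate in rows:
        xs, ys = grouped.setdefault(cluster_id, ([], []))
        xs.append(flpr)
        ys.append(rate)
    return {cid: (pearson_r(xs, ys), permutation_p(xs, ys)) for cid, (xs, ys) in grouped.items()}
```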

  5. Seizure clustering.

    PubMed

    Haut, Sheryl R

    2006-02-01

    Seizure clusters, also known as repetitive or serial seizures, occur commonly in epilepsy. Clustering implies that the occurrence of one seizure may influence the probability of a subsequent seizure; thus, the investigation of the clustering phenomenon yields insights into both specific mechanisms of seizure clustering and more general concepts of seizure occurrence. Seizure clustering has been defined clinically as a number of seizures per unit time and, statistically, as a deviation from a random distribution, or interseizure interval dependence. This review explores the pathophysiology, epidemiology, and clinical implications of clustering, as well as other periodic patterns of seizure occurrence. Risk factors for experiencing clusters and potential precipitants of clustering are also addressed.
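One way to make "deviation from a random distribution" concrete is to compare seizure counts per day with a Poisson process. The sketch below is an illustration under stated assumptions, not a method taken from the review: it uses the index of dispersion (variance divided by mean of daily counts, about 1 for Poisson counts and above 1 when seizures bunch together) and a Monte Carlo reference in which the observed total number of seizures is scattered uniformly over the same days. Function names and data are hypothetical.

```python
import random
from statistics import mean, variance


def dispersion_index(counts):
    """Variance-to-mean ratio of daily seizure counts (about 1 for Poisson counts).

    Raises ZeroDivisionError if no seizures were recorded at all.
    """
    return variance(counts) / mean(counts)


def clustering_test(counts, n_iter=2000, seed=0):
    """One-sided Monte Carlo p-value for over-dispersion (clustering) of daily counts.

    Null model: the observed total number of seizures falls on days chosen
    uniformly at random, i.e. a Poisson process conditioned on its total.
    """
    rng = random.Random(seed)
    n_days, total = len(counts), sum(counts)
    observed = dispersion_index(counts)
    hits = 0
    for _ in range(n_iter):
        simulated = [0] * n_days
        for _seizure in range(total):
            simulated[rng.randrange(n_days)] += 1
        if dispersion_index(simulated) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_iter + 1)


diary = [0, 0, 0, 0, 8, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0]  # hypothetical 15-day diary
print(clustering_test(diary))
```

A dispersion index well above 1 with a small p-value indicates that seizures occur in runs rather than independently, which is the statistical sense of clustering described above; it does not by itself distinguish clustering from other periodic patterns.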

  6. A quantitative study of the clustering of polycyclic aromatic hydrocarbons at high temperatures.

    PubMed

    Totton, Tim S; Misquitta, Alston J; Kraft, Markus

    2012-03-28

    The clustering of polycyclic aromatic hydrocarbon (PAH) molecules is investigated in the context of soot particle inception and growth using an isotropic potential developed from the benchmark PAHAP potential. This potential is used to estimate equilibrium constants of dimerisation for five representative PAH molecules based on a statistical mechanics model. Molecular dynamics simulations are also performed to study the clustering of homomolecular systems at a range of temperatures. The results from both sets of calculations demonstrate that at flame temperatures pyrene (C16H10) dimerisation cannot be a key step in soot particle formation and that much larger molecules (e.g. circumcoronene, C54H18) are required to form small clusters at flame temperatures. The importance of accurate descriptions of the intermolecular interactions is demonstrated by comparison with results calculated using a popular literature potential, which differ by an order of magnitude in the level of clustering observed. By using an accurate intermolecular potential, we are able to show that physical binding of PAH molecules based on van der Waals interactions alone can only be a viable soot inception mechanism if concentrations of large PAH molecules are significantly higher than currently thought.

  7. Clustering Analysis of Antibiograms and Antibiogram Types of Streptococcus agalactiae Strains from Tilapia in China.

    PubMed

    Liu, Chan; Feng, Juan; Zhang, Defeng; Xie, Yundan; Li, Anxing; Wang, Jiangyong; Su, Youlu

    2018-05-11

    In view of the changing antibiotic-resistance profiles of Streptococcus agalactiae from tilapia in China, antimicrobial susceptibilities of 75 S. agalactiae strains were determined by the disc diffusion method, and cluster analyses of the antibiograms and antibiogram types were performed. All strains displayed multidrug resistance (MDR). The antimicrobial-resistance rates were highest (>90%) to aminoglycosides, sulfonamides, pipemidic acid, and norfloxacin, followed by penicillin, ampicillin, and ciprofloxacin (26.7-38.7%); those to furadantin, lincomycin, erythromycin, ofloxacin, tetracycline, and florfenicol were low (<10%), and no resistance to vancomycin, cefalexin, cefoxitin, amoxicillin, medemycin, doxitard, oxytetracycline, rifampin, chloramphenicol, or thiamphenicol was detected. Statistical analysis showed that the resistance rate to ciprofloxacin increased significantly in 2016 (p = 0.009), whereas that to trimethoprim/sulfamethoxazole decreased (p = 0.017). Cluster analyses showed that the strains fell into 23 antibiogram types (A-W), which clustered into five groups (Groups I-V). The strains with higher antimicrobial resistance clustered mainly in Groups I and II. Our results show that the antibiograms varied over time and by location and that the set of antibiogram types is continually changing and expanding. Effective measures must be taken to reduce antimicrobial resistance and the spread of MDR strains.
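A sketch of the two steps named in the abstract: grouping strains with identical resistance profiles into antibiogram types, then clustering the types. Profiles are written as strings of "R" (resistant) and "S" (susceptible), one character per drug. The Hamming distance and single-linkage merging at a fixed cut-off are assumptions, since the abstract does not give the distance measure or linkage method, and all names and profiles are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def antibiogram_types(profiles):
    """Group strains by identical resistance profile ('R'/'S' per drug)."""
    types = defaultdict(list)
    for strain, profile in profiles.items():
        types[profile].append(strain)
    return dict(types)


def hamming(a, b):
    """Number of drugs on which two profiles disagree."""
    return sum(x != y for x, y in zip(a, b))


def group_types(type_profiles, max_distance):
    """Single linkage: types within max_distance of each other are chained into one group."""
    profiles = sorted(type_profiles)
    group_of = {p: i for i, p in enumerate(profiles)}  # every type starts in its own group
    for a, b in combinations(profiles, 2):
        if hamming(a, b) <= max_distance and group_of[a] != group_of[b]:
            old, new = group_of[b], group_of[a]
            for p in profiles:  # merge b's whole group into a's group
                if group_of[p] == old:
                    group_of[p] = new
    groups = defaultdict(list)
    for p in profiles:
        groups[group_of[p]].append(p)
    return sorted(groups.values())


strains = {"s1": "RRSS", "s2": "RRSS", "s3": "RRRS", "s4": "SSSS", "s5": "SSSR"}
types = antibiogram_types(strains)
print(types, group_types(types, max_distance=1))
```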

  8. Advances in Significance Testing for Cluster Detection

    NASA Astrophysics Data System (ADS)

    Coleman, Deidra Andrea

    Over the past two decades, much attention has been given to data-driven projects such as the Human Genome Project and the development of syndromic surveillance systems. A major component of these types of projects is analyzing the resulting abundance of data. Detecting clusters within the data can be beneficial, as it can lead to the identification of specific sequences of DNA nucleotides that are related to important biological functions, or of the locations of epidemics such as disease outbreaks or bioterrorism attacks. Cluster detection techniques require efficient and accurate hypothesis testing procedures. In this dissertation, we improve upon the hypothesis testing procedures for cluster detection by enhancing distributional theory and providing an alternative method for spatial cluster detection using syndromic surveillance data. In Chapter 2, we provide an efficient method to compute the exact distribution of the number and coverage of h-clumps of a collection of words. This method involves defining a Markov chain using a minimal deterministic automaton to reduce the number of states needed for computation. We allow words of the collection to contain other words of the collection, making the method more general. We use our method to compute the distributions of the number and coverage of h-clumps in the Chi motif of H. influenzae. In Chapter 3, we provide an efficient algorithm to compute the exact distribution of multiple window discrete scan statistics for higher-order, multi-state Markovian sequences. This algorithm involves defining a Markov chain to efficiently keep track of the probabilities needed to compute p-values of the statistic. We use our algorithm to identify cases where the available approximation does not perform well. We also use our algorithm to detect unusual clusters of made free throws by National Basketball Association players during the 2009-2010 regular season. 
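The discrete scan statistic from Chapter 3 can be illustrated in its simplest form: for a 0/1 sequence (for example, 1 for a made free throw and 0 for a miss) it is the largest number of successes in any window of w consecutive trials. The dissertation computes the exact distribution through a Markov chain and handles higher-order, multi-state sequences; this sketch instead swaps in a plain Monte Carlo p-value under an assumed independent Bernoulli model with success probability `p_success`, so it only approximates that setting. Function names are hypothetical.

```python
import random


def scan_statistic(seq, w):
    """Largest number of successes (1s) in any window of w consecutive trials."""
    if not 0 < w <= len(seq):
        raise ValueError("window length must be between 1 and len(seq)")
    current = best = sum(seq[:w])
    for i in range(w, len(seq)):
        current += seq[i] - seq[i - w]  # slide the window one trial to the right
        best = max(best, current)
    return best


def scan_p_value(seq, w, p_success, n_iter=5000, seed=0):
    """Monte Carlo P(S_w >= observed) for independent Bernoulli(p_success) trials."""
    rng = random.Random(seed)
    observed = scan_statistic(seq, w)
    hits = 0
    for _ in range(n_iter):
        sim = [1 if rng.random() < p_success else 0 for _ in seq]
        if scan_statistic(sim, w) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_iter + 1)


print(scan_statistic([1, 0, 1, 1, 1, 1, 1, 1, 0, 1], 5))
```

Under a Markovian model, where each shot's success probability depends on the previous outcomes, the independent simulation above would have to be replaced by a simulation (or exact computation) from that chain.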
In Chapter 4, we give a procedure to detect outbreaks using syndromic surveillance data while controlling the Bayesian False Discovery Rate (BFDR). The procedure entails choosing an appropriate Bayesian model that captures the spatial dependency inherent in epidemiological data and considers all days of interest, selecting a test statistic based on a chosen measure that provides the magnitude of the maximal spatial cluster for each day, and identifying a cutoff value that controls the BFDR for rejecting the collective null hypothesis of no outbreak over a collection of days for a specified region. We use our procedure to analyze botulism-like syndrome data collected by the North Carolina Disease Event Tracking and Epidemiologic Collection Tool (NC DETECT).
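The BFDR cutoff can be sketched in a simplified form. Assume a Bayesian model has already produced, for each day, a posterior probability that an outbreak is present; days are flagged in order of decreasing posterior probability for as long as the average posterior probability of "no outbreak" among the flagged days stays at or below alpha. This is a generic Bayesian FDR rule, not the dissertation's exact statistic (which is based on the magnitude of the maximal spatial cluster), and `bfdr_reject` and the probabilities are hypothetical.

```python
def bfdr_reject(post_outbreak, alpha=0.05):
    """Indices of days to flag so that the estimated Bayesian FDR stays <= alpha.

    post_outbreak[i] is the posterior probability of an outbreak on day i, so
    1 - post_outbreak[i] is the posterior probability that flagging day i is a
    false discovery.
    """
    order = sorted(range(len(post_outbreak)), key=lambda i: post_outbreak[i], reverse=True)
    flagged = []
    expected_false = 0.0
    for i in order:
        expected_false += 1.0 - post_outbreak[i]
        # The mean null probability among flagged days can only grow down the list.
        if expected_false / (len(flagged) + 1) > alpha:
            break
        flagged.append(i)
    return sorted(flagged)


print(bfdr_reject([0.99, 0.2, 0.97, 0.9, 0.5]))
```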

  9. Spatial Analysis of Dengue Seroprevalence and Modeling of Transmission Risk Factors in a Dengue Hyperendemic City of Venezuela

    PubMed Central

    Vincenti-Gonzalez, Maria F.; Grillet, María-Eugenia; Velasco-Salas, Zoraida I.; Lizarazo, Erley F.; Amarista, Manuel A.; Sierra, Gloria M.; Comach, Guillermo

    2017-01-01

    Background Dengue virus (DENV) transmission is spatially heterogeneous. Hence, stratifying dengue prevalence in space may be an efficacious strategy for targeting surveillance and control efforts cost-effectively, particularly in Venezuela, where dengue is hyperendemic and public health resources are scarce. Here, we identify hot spots of dengue seroprevalence and the risk factors associated with these clusters using local spatial statistics and a regression modeling approach. Methodology/Principal Findings From August 2010 to January 2011, a community-based cross-sectional study of 2012 individuals in 840 households was performed in high-incidence neighborhoods of a dengue hyperendemic city in Venezuela. Local spatial statistics conducted at household and block level identified clusters of recent dengue seroprevalence (39 hot spot households and 9 hot spot blocks) in all neighborhoods. However, no clusters were found for past dengue seroprevalence. Clustering of infection was detected at a very small scale (20-110 m), suggesting strong focal aggregation of disease. Factors associated with living in a hot spot household were occupation (being a domestic worker/housewife, P = 0.002), lower socio-economic status (living in a shack, P<0.001), sharing a household with <7 people (P = 0.004), promoting potential vector breeding sites (storing water in containers, P = 0.024; having litter outdoors, P = 0.002) and mosquito preventive measures (such as using repellent, P = 0.011). Similarly, low socio-economic status (living in crowded conditions, P<0.001), having an occupation of domestic worker/housewife (P = 0.012) and not using certain preventive measures against mosquitoes (P<0.05) were directly associated with living in a hot spot block. Conclusions/Significance Our findings contribute to a better understanding of the spatial dynamics of dengue by assessing the relationship between disease clusters and their risk factors. 
These results can inform health authorities in the design of surveillance and control activities. Focusing dengue control measures on high-risk zones at the household and neighborhood level during epidemic and inter-epidemic periods may reduce virus transmission significantly more than random interventions. PMID:28114342

  10. Classification of frailty using the Kihon checklist: A cluster analysis of older adults in urban areas.

    PubMed

    Kera, Takeshi; Kawai, Hisashi; Yoshida, Hideyo; Hirano, Hirohiko; Kojima, Motonaga; Fujiwara, Yoshinori; Ihara, Kazushige; Obuchi, Shuichi

    2017-01-01

    Frailty is an important predictor of the need for long-term care and hospitalization. Our aim was to categorize frailty in community-dwelling older adults. The present study was carried out in 2011-2013, and consisted of 1380 individuals over 65 years of age. Participants completed the Kihon checklist, which is widely used to assess frailty in Japan, and their physical, cognitive and social function was evaluated. Non-hierarchical cluster analysis was used to statistically categorize frailty. The optimum number of clusters was determined as the point at which the external reference values (instrumental activity of daily living score, grip power, 10-m walk time, body mass index, portable fall risk index, occlusal force and Mini-Mental State Examination score) differed. According to the Kihon checklist, 369 (26.7%) of the 1380 study participants were considered frail. When the cluster number was increased from two to six, the scores in each subdomain of the Kihon checklist significantly differed. The estimated minimum number of clusters was five, and each of the five cluster groups had distinct characteristics. The numbers of participants in cluster groups 1-5 were 105, 78, 62, 71 and 53, respectively. We identified five types of frailty in community-dwelling older adults in Japan: "experience of falling," "pre-frailty," "oral frailty," "housebound" and "severe frailty." Geriatr Gerontol Int 2017; 17: 69-77. © 2016 Japan Geriatrics Society.
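The abstract describes a non-hierarchical cluster analysis without naming the algorithm. k-means is the most common non-hierarchical method, so the sketch below assumes plain Lloyd-style k-means on per-participant score vectors (for example, the Kihon checklist subdomain scores). The number of clusters k is an input here, whereas the study chose it by checking the external reference values; names and data are hypothetical.

```python
import math
import random


def nearest(p, centroids):
    """Index of the centroid closest to point p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means; points are equal-length tuples of scores. Returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # k data points as starting centroids
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p, centroids) for p in points]  # assignment step
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):  # update step: move each centroid to its members' mean
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


scores = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(kmeans(scores, 2)[0])
```

Because k-means depends on the starting centroids, running it from several seeds and keeping the solution with the smallest within-cluster spread is a common safeguard.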

  11. Residential cancer cluster investigation nearby a Superfund Study Area with trichloroethylene contamination.

    PubMed

    Press, David J; McKinley, Meg; Deapen, Dennis; Clarke, Christina A; Gomez, Scarlett Lin

    2016-05-01

    Trichloroethylene (TCE) is an industrial solvent associated with liver cancer, kidney cancer, and non-Hodgkin's lymphoma (NHL). It is unclear whether an excess of TCE-associated cancers has occurred surrounding the Middlefield-Ellis-Whisman Superfund site in Mountain View, California. We conducted a population-based cancer cluster investigation comparing the incidence of NHL, liver, and kidney cancers in the neighborhood of interest to the incidence among residents in the surrounding four-county region. Case counts and address information were obtained using routinely collected data from the Greater Bay Area Cancer Registry, part of the Surveillance, Epidemiology, and End Results program. Population denominators were obtained from the 1990, 2000, and 2010 US censuses. Standardized incidence ratios (SIRs) with two-sided 99 % confidence intervals (CIs) were calculated for time intervals surrounding the US Censuses. There were no statistically significant differences between the neighborhood of interest and the larger region for cancers of the liver or kidney. A statistically significant elevation was observed for NHL during one of the three time periods evaluated (1996-2005: SIR = 1.8, 99 % CI 1.1-2.8). No statistically significant NHL elevation existed in the earlier 1988-1995 (SIR = 1.3, 99 % CI 0.5-2.6) or later 2006-2011 (SIR = 1.3, 99 % CI 0.6-2.4) periods. There is no evidence of an increased incidence of liver or kidney cancer, and there is a lack of evidence of a consistent, sustained, or more recent elevation in NHL occurrence in this neighborhood. This evaluation included existing cancer registry data, which cannot speak to specific exposures incurred by past or current residents of this neighborhood.
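The SIR and its confidence interval can be sketched as below. The SIR is the observed number of cases divided by the number expected if the reference region's rates applied to the neighborhood's population. The abstract does not state how the 99 % intervals were computed; this sketch assumes Byar's approximation to the exact Poisson interval, a common choice for registry counts, and `sir_with_ci` and the example counts are hypothetical.

```python
import math
from statistics import NormalDist


def sir_with_ci(observed, expected, level=0.99):
    """Standardized incidence ratio O/E with Byar's approximate Poisson interval.

    observed: case count in the area of interest.
    expected: cases expected if reference-region rates applied to the area's population.
    """
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 2.576 for a 99% interval
    sir = observed / expected
    if observed == 0:
        lower = 0.0
    else:
        o = observed
        lower = o * max(0.0, 1 - 1 / (9 * o) - z / (3 * math.sqrt(o))) ** 3 / expected
    o1 = observed + 1
    upper = o1 * (1 - 1 / (9 * o1) + z / (3 * math.sqrt(o1))) ** 3 / expected
    return sir, lower, upper


print(sir_with_ci(10, 8.0))
```

An interval that excludes 1 corresponds to a statistically significant difference from the reference region at the chosen level, which is how the NHL elevation for 1996-2005 is read above.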

  12. Residential cancer cluster investigation nearby a Superfund Study Area with trichloroethylene contamination

    PubMed Central

    McKinley, Meg; Deapen, Dennis; Clarke, Christina A.; Gomez, Scarlett Lin

    2017-01-01

    Purpose Trichloroethylene (TCE) is an industrial solvent associated with liver cancer, kidney cancer, and non-Hodgkin’s lymphoma (NHL). It is unclear whether an excess of TCE-associated cancers has occurred surrounding the Middlefield–Ellis–Whisman Superfund site in Mountain View, California. We conducted a population-based cancer cluster investigation comparing the incidence of NHL, liver, and kidney cancers in the neighborhood of interest to the incidence among residents in the surrounding four-county region. Methods Case counts and address information were obtained using routinely collected data from the Greater Bay Area Cancer Registry, part of the Surveillance, Epidemiology, and End Results program. Population denominators were obtained from the 1990, 2000, and 2010 US censuses. Standardized incidence ratios (SIRs) with two-sided 99 % confidence intervals (CIs) were calculated for time intervals surrounding the US Censuses. Results There were no statistically significant differences between the neighborhood of interest and the larger region for cancers of the liver or kidney. A statistically significant elevation was observed for NHL during one of the three time periods evaluated (1996–2005: SIR = 1.8, 99 % CI 1.1–2.8). No statistically significant NHL elevation existed in the earlier 1988–1995 (SIR = 1.3, 99 % CI 0.5–2.6) or later 2006–2011 (SIR = 1.3, 99 % CI 0.6–2.4) periods. Conclusion There is no evidence of an increased incidence of liver or kidney cancer, and there is a lack of evidence of a consistent, sustained, or more recent elevation in NHL occurrence in this neighborhood. This evaluation included existing cancer registry data, which cannot speak to specific exposures incurred by past or current residents of this neighborhood. PMID:26983615

  13. Microparticles (CD146) and Arterial Stiffness Versus Carotid Intima Media Thickness as an Early Predictors of Vascular Affection in Systemic Lupus Patients.

    PubMed

    Nassef, Sahar; El Guindey, Hala; Fawzy, Mary; Nasser, Amal; Reffai, Rasha; Shemiy, Doa

    2016-03-01

    This study aims to evaluate cluster of differentiation 146 (CD146) and pulse wave velocity (PWV) as non-invasive methods for predicting early vascular affection in systemic lupus erythematosus (SLE) patients without symptoms of vascular disease, to assess the outcome and reproducibility of these methods, and to correlate CD146 and PWV with lipid profile, intima media thickness (IMT), and ankle brachial index. Thirty female SLE patients (mean age 26.6±6.6 years; range 15 to 35 years) fulfilling the American College of Rheumatology 1997 revised criteria for SLE classification, and 15 age- and sex-matched healthy controls were included. All participants underwent full clinical assessments, including measurement of the Systemic Lupus Erythematosus Disease Activity Index, lipid profile, CD146, carotid IMT, PWV, and rise time (an indication of how fast the waveform rises). Cluster of differentiation 146 levels were elevated in patients with SLE compared to controls (p<0.001). There was a statistically significant difference between patients and controls in femoral, lower thigh, and ankle rise time. There were statistically significant correlations between IMT and patient age, Systemic Lupus Erythematosus Disease Activity Index, and brachial-below knee PWV, while there was no correlation between IMT and disease duration, lipid profile, brachial-femoral PWV, or brachial-ankle PWV. There were statistically significant correlations between brachial-femoral PWV and serum cholesterol level, and between brachial-ankle PWV and low density lipoprotein cholesterol. Our results showed that SLE vascular affection is more pronounced in small arteries. Elevated CD146 and brachial-femoral PWV are also useful early markers of vascular affection in SLE, and rise time may be a marker of arterial stiffness.

  14. Analysis of temporal gene expression profiles: clustering by simulated annealing and determining the optimal number of clusters.

    PubMed

    Lukashin, A V; Fuchs, R

    2001-05-01

    Cluster analysis of genome-wide expression data from DNA microarray hybridization studies has proved to be a useful tool for identifying biologically relevant groupings of genes and samples. In the present paper, we focus on several important issues related to clustering algorithms that have not yet been fully studied. We describe a simple and robust algorithm for clustering temporal gene expression profiles that is based on the simulated annealing procedure. In general, this algorithm is guaranteed to eventually find the globally optimal distribution of genes over clusters. We introduce an iterative scheme that quantitatively evaluates the optimal number of clusters for each specific data set. The scheme is based on standard approaches used in regular statistical tests. The basic idea is to search for the optimal number of clusters simultaneously with optimizing the distribution of genes over clusters. The efficiency of the proposed algorithm has been evaluated by means of a reverse engineering experiment, that is, a situation in which the correct distribution of genes over clusters is known a priori. This statistically rigorous test showed that our algorithm places more than 90% of genes into the correct clusters. Finally, the algorithm has been tested on real gene expression data (expression changes during the yeast cell cycle) for which the fundamental patterns of gene expression and the assignment of genes to clusters are well understood from numerous previous studies.
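A minimal sketch of clustering expression profiles by simulated annealing, assuming the cost is the within-cluster sum of squared Euclidean distances and that each move reassigns one gene to a different cluster, accepted by the Metropolis rule. The cost function, geometric cooling schedule and parameter values are assumptions rather than details taken from the paper, and the number of clusters is fixed instead of being searched as in the authors' iterative scheme. Names and data are hypothetical.

```python
import math
import random


def within_cluster_ss(profiles, labels, k):
    """Sum of squared Euclidean distances from each profile to its cluster mean."""
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(profiles, labels) if lab == c]
        if not members:
            continue
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += sum(sum((x - m) ** 2 for x, m in zip(p, centroid)) for p in members)
    return total


def anneal_clusters(profiles, k, t_start=10.0, t_end=1e-3, cooling=0.95,
                    moves_per_t=50, seed=0):
    """Assign profiles to k clusters by simulated annealing; returns (labels, cost)."""
    if k < 2:
        raise ValueError("k must be at least 2")
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in profiles]
    cost = within_cluster_ss(profiles, labels, k)
    t = t_start
    while t > t_end:
        for _ in range(moves_per_t):
            i = rng.randrange(len(profiles))
            old_c = labels[i]
            new_c = rng.randrange(k - 1)
            if new_c >= old_c:
                new_c += 1  # skip the current cluster so every move changes something
            labels[i] = new_c
            new_cost = within_cluster_ss(profiles, labels, k)
            delta = new_cost - cost
            # Metropolis rule: always accept improvements, sometimes accept worse moves.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost
            else:
                labels[i] = old_c  # reject: undo the move
        t *= cooling  # geometric cooling
    return labels, cost


# Hypothetical expression profiles over three time points: two clear patterns.
profiles = [(0, 0, 1), (0, 0.1, 1), (0.1, 0, 1), (5, 5, 0), (5, 5.1, 0), (5.1, 5, 0)]
print(anneal_clusters(profiles, 2))
```

Convergence guarantees for simulated annealing generally assume very slow (logarithmic) cooling; with a fast geometric schedule like this one there is no such guarantee, although well-separated groups like these are typically recovered.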

  15. Type-Ia Supernova Rates to Redshift 2.4 from Clash: The Cluster Lensing and Supernova Survey with Hubble

    NASA Technical Reports Server (NTRS)

    Graur, O.; Rodney, S. A.; Maoz, D.; Riess, A. G.; Jha, S. W.; Postman, M.; Dahlen, T.; Holoien, T. W.-S.; McCully, C.; Patel, B.

    2014-01-01

    We present the supernova (SN) sample and Type-Ia SN (SN Ia) rates from the Cluster Lensing And Supernova survey with Hubble (CLASH). Using the Advanced Camera for Surveys and the Wide Field Camera 3 on the Hubble Space Telescope (HST), we have imaged 25 galaxy-cluster fields and parallel fields of non-cluster galaxies. We report a sample of 27 SNe discovered in the parallel fields. Of these SNe, approximately 13 are classified as SN Ia candidates, including four SN Ia candidates at redshifts z > 1.2. We measure volumetric SN Ia rates to redshift 1.8 and add the first upper limit on the SN Ia rate in the range 1.8 < z < 2.4. The results are consistent with the rates measured by the HST/GOODS and Subaru Deep Field SN surveys. We model these results together with previous measurements at z < 1 from the literature. The best-fitting SN Ia delay-time distribution (DTD; the distribution of times that elapse between a short burst of star formation and subsequent SN Ia explosions) is a power law with an index of −1.00 +0.06(0.09)/−0.06(0.10) (statistical) +0.12/−0.08 (systematic), where the statistical uncertainty is a result of the 68% and 95% (in parentheses) statistical uncertainties reported for the various SN Ia rates (from this work and from the literature), and the systematic uncertainty reflects the range of possible cosmic star-formation histories. We also test DTD models produced by an assortment of published binary population synthesis (BPS) simulations. The shapes of all BPS double-degenerate DTDs are consistent with the volumetric SN Ia measurements, when the DTD models are scaled up by factors of 3-9. In contrast, all BPS single-degenerate DTDs are ruled out by the measurements at the >99% significance level.

  16. Examining the Effectiveness of Discriminant Function Analysis and Cluster Analysis in Species Identification of Male Field Crickets Based on Their Calling Songs

    PubMed Central

    Jaiswara, Ranjana; Nandi, Diptarup; Balakrishnan, Rohini

    2013-01-01

    Traditional taxonomy based on morphology has often failed in accurate species identification owing to the occurrence of cryptic species, which are reproductively isolated but morphologically identical. Molecular data have thus been used to complement morphology in species identification. The sexual advertisement calls in several groups of acoustically communicating animals are species-specific and can thus complement molecular data as non-invasive tools for identification. Several statistical tools and automated identifier algorithms have been used to investigate the efficiency of acoustic signals in species identification. Despite the plethora of such methods, there is a general lack of knowledge regarding their appropriate use in specific taxa. In this study, we investigated the performance of two commonly used statistical methods, discriminant function analysis (DFA) and cluster analysis, in identification and classification based on acoustic signals of field cricket species belonging to the subfamily Gryllinae. Using a comparative approach, we evaluated, for both methods, the optimal number of species and the calling song characteristics that lead to the most accurate classification and identification. The accuracy of classification using DFA was high and was not affected by the number of taxa used. However, a constraint in using discriminant function analysis is the need for a priori classification of songs. Accuracy of classification using cluster analysis, which does not require a priori knowledge, was highest for 6–7 taxa and decreased significantly when more than ten taxa were analysed together. We also investigated the efficacy of two novel derived acoustic features in improving the accuracy of identification. Our results show that DFA is a reliable statistical tool for species identification using acoustic signals. 
Our results also show that cluster analysis of acoustic signals in crickets works effectively for species classification and identification. PMID:24086666

  17. Type-Ia supernova rates to redshift 2.4 from clash: The cluster lensing and supernova survey with Hubble

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Graur, O.; Rodney, S. A.; Riess, A. G.

    2014-03-01

    We present the supernova (SN) sample and Type-Ia SN (SN Ia) rates from the Cluster Lensing And Supernova survey with Hubble (CLASH). Using the Advanced Camera for Surveys and the Wide Field Camera 3 on the Hubble Space Telescope (HST), we have imaged 25 galaxy-cluster fields and parallel fields of non-cluster galaxies. We report a sample of 27 SNe discovered in the parallel fields. Of these SNe, ∼13 are classified as SN Ia candidates, including four SN Ia candidates at redshifts z > 1.2. We measure volumetric SN Ia rates to redshift 1.8 and add the first upper limit on the SN Ia rate in the range 1.8 < z < 2.4. The results are consistent with the rates measured by the HST/GOODS and Subaru Deep Field SN surveys. We model these results together with previous measurements at z < 1 from the literature. The best-fitting SN Ia delay-time distribution (DTD; the distribution of times that elapse between a short burst of star formation and subsequent SN Ia explosions) is a power law with an index of −1.00 +0.06(0.09)/−0.06(0.10) (statistical) +0.12/−0.08 (systematic), where the statistical uncertainty is a result of the 68% and 95% (in parentheses) statistical uncertainties reported for the various SN Ia rates (from this work and from the literature), and the systematic uncertainty reflects the range of possible cosmic star-formation histories. We also test DTD models produced by an assortment of published binary population synthesis (BPS) simulations. The shapes of all BPS double-degenerate DTDs are consistent with the volumetric SN Ia measurements, when the DTD models are scaled up by factors of 3-9. In contrast, all BPS single-degenerate DTDs are ruled out by the measurements at >99% significance level.

  18. Circulation Clusters--An Empirical Approach to Decentralization of Academic Libraries.

    ERIC Educational Resources Information Center

    McGrath, William E.

    1986-01-01

    Discusses the issue of centralization or decentralization of academic library collections, and describes a statistical analysis of book circulation at the University of Southwestern Louisiana that yielded subject area clusters as a compromise solution to the problem. Applications of the cluster model for all types of library catalogs are…

  19. Method of identifying clusters representing statistical dependencies in multivariate data

    NASA Technical Reports Server (NTRS)

    Borucki, W. J.; Card, D. H.; Lyle, G. C.

    1975-01-01

    The approach is first to cluster the data and then to compute spatial boundaries for the resulting clusters. The next step is to compute, from a set of Monte Carlo samples obtained from scrambled data, estimates of the probabilities of obtaining at least as many points within the boundaries as were actually observed in the original data.
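
The scrambling test can be sketched as follows. This is a minimal illustration, not the NTRS implementation. It assumes that each cluster boundary is an axis-aligned box. It also assumes that "scrambling" means permuting each variable independently across records, which keeps every marginal distribution but destroys the dependencies between variables.

```python
import random


def count_in_box(points, lo, hi):
    """Number of points inside the axis-aligned box [lo, hi] (inclusive)."""
    return sum(all(l <= x <= h for x, l, h in zip(p, lo, hi)) for p in points)


def scramble(points, rng):
    """Permute each variable (column) independently to break inter-variable dependence."""
    columns = [list(col) for col in zip(*points)]
    for col in columns:
        rng.shuffle(col)
    return list(zip(*columns))


def box_probability(points, lo, hi, n_samples=1000, seed=0):
    """Observed count in the box and the Monte Carlo estimate of P(count >= observed)."""
    rng = random.Random(seed)
    observed = count_in_box(points, lo, hi)
    hits = sum(
        count_in_box(scramble(points, rng), lo, hi) >= observed
        for _ in range(n_samples)
    )
    return observed, hits / n_samples
```

For strongly dependent data, such as points on the diagonal x = y, a box around part of the diagonal holds far more points than any scrambled copy, so the estimated probability is close to zero.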

  20. Cluster Analysis of Minnesota School Districts. A Research Report.

    ERIC Educational Resources Information Center

    Cleary, James

    The term "cluster analysis" refers to a set of statistical methods that classify entities with similar profiles of scores on a number of measured dimensions, in order to create empirically based typologies. A 1980 Minnesota House Research Report employed cluster analysis to categorize school districts according to their relative mixtures…

  1. Genome-scale cluster analysis of replicated microarrays using shrinkage correlation coefficient.

    PubMed

    Yao, Jianchao; Chang, Chunqi; Salmi, Mari L; Hung, Yeung Sam; Loraine, Ann; Roux, Stanley J

    2008-06-18

    Clustering with some form of correlation coefficient as the gene similarity metric has become a popular method for profiling genomic data. The Pearson correlation coefficient and the standard deviation (SD)-weighted correlation coefficient are the two most widely used correlations as similarity metrics in clustering microarray data. However, these two correlations are not optimal for analyzing the replicated microarray data generated by most laboratories. An effective correlation coefficient is needed to provide statistically sufficient analysis of replicated microarray data. In this study, we describe a novel correlation coefficient, the shrinkage correlation coefficient (SCC), that fully exploits the similarity between replicated microarray experimental samples. The methodology considers both the number of replicates and the variance within each experimental group in clustering expression data, and provides a robust statistical estimate of the error of replicated microarray data. The value of SCC is revealed by comparing it with the two most widely used correlation coefficients (the Pearson correlation coefficient and the SD-weighted correlation coefficient) using statistical measures on both synthetic expression data and real gene expression data from Saccharomyces cerevisiae. Two leading clustering methods, hierarchical and k-means clustering, were applied for the comparison. The comparison indicated that using SCC achieves better clustering performance. Applying SCC-based hierarchical clustering to the replicated microarray data obtained from germinating spores of the fern Ceratopteris richardii, we discovered two clusters of genes with shared expression patterns during spore germination. Functional analysis suggested that some of the genetic mechanisms that control germination in such diverse plant lineages as mosses and angiosperms are also conserved among ferns.
This study shows that SCC is an alternative to the Pearson correlation coefficient and the SD-weighted correlation coefficient, and is particularly useful for clustering replicated microarray data. This computational approach should be generally useful for proteomic data or other high-throughput analysis methodology.
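
The abstract does not give the SCC formula, so this sketch covers only the kind of baseline SCC is compared against. Replicates are averaged per condition, and the Pearson correlation is turned into the distance 1 − r that hierarchical or k-means clustering would use. Averaging replicates first is our assumption of a typical baseline. SCC instead takes into account the number of replicates and the within-group variance.

```python
import math


def mean_profile(replicates):
    """Average replicate expression profiles condition by condition."""
    return [sum(values) / len(values) for values in zip(*replicates)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_distance(x, y):
    """Dissimilarity for clustering: 0 for perfectly correlated, 2 for anti-correlated."""
    return 1.0 - pearson(x, y)
```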

  2. Classification of Forefoot Plantar Pressure Distribution in Persons with Diabetes: A Novel Perspective for the Mechanical Management of Diabetic Foot?

    PubMed Central

    Deschamps, Kevin; Matricali, Giovanni Arnoldo; Roosen, Philip; Desloovere, Kaat; Bruyninckx, Herman; Spaepen, Pieter; Nobels, Frank; Tits, Jos; Flour, Mieke; Staes, Filip

    2013-01-01

    Background The aim of this study was to identify groups of subjects with similar patterns of forefoot loading and to verify whether specific groups of patients with diabetes could be isolated from non-diabetics. Methodology/Principal Findings Ninety-seven patients with diabetes and 33 control participants aged between 45 and 70 years were prospectively recruited in two Belgian Diabetic Foot Clinics. Barefoot plantar pressure measurements were recorded and subsequently analysed using a semi-automatic total mapping technique. K-means cluster analysis was applied to the relative regional impulses of six forefoot segments to classify the control group alone, the diabetic group alone, and both groups together. Cluster analysis identified three distinct groups when only the control group was considered. For the diabetic group, and for the computation considering both groups together, four distinct groups were isolated. Compared with the cluster analysis of the control group, an additional forefoot loading pattern was identified. This group comprised diabetic feet only. The relevance of the reported clusters was supported by ANOVA statistics indicating significant differences between different regions of interest and different clusters. Conclusions/Significance A new era seems to be emerging in diabetic foot medicine, one that embraces the classification of diabetic patients according to their biomechanical profile. Classification of the plantar pressure distribution has the potential to provide a means to determine mechanical interventions for the prevention and/or treatment of the diabetic foot. PMID:24278219
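
The classification step can be sketched as k-means on each foot's relative regional impulses. Each segment's impulse is divided by the forefoot total, so the six values sum to 1. This is a minimal illustration: the deterministic farthest-point initialisation, the segment order, and the impulse values below are hypothetical choices for reproducibility, not the study's settings or data.

```python
def relative_impulses(impulses):
    """Normalise segment impulses so they sum to 1 (relative regional impulse)."""
    total = sum(impulses)
    return [v / total for v in impulses]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next center: the point farthest from every center chosen so far.
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(farthest))
    labels = []
    for _ in range(iterations):
        # Assignment step: each foot joins its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centers[c])
        if updated == centers:
            break  # assignments are stable
        centers = updated
    return labels, centers


# Hypothetical impulses for six forefoot segments, two loading patterns.
feet = [relative_impulses(v) for v in (
    [5, 1, 1, 1, 1, 1], [6, 1, 1, 1, 1, 1], [5, 2, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 5], [1, 1, 1, 1, 1, 6], [1, 1, 1, 1, 2, 5],
)]
labels, centers = kmeans(feet, k=2)
```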

  3. A Fast Implementation of the ISOCLUS Algorithm

    NASA Technical Reports Server (NTRS)

    Memarsadeghi, Nargess; Mount, David M.; Netanyahu, Nathan S.; LeMoigne, Jacqueline

    2003-01-01

    Unsupervised clustering is a fundamental tool in numerous image processing and remote sensing applications. For example, unsupervised clustering is often used to obtain vegetation maps of an area of interest. This approach is useful when reliable training data are either scarce or expensive, and when relatively little a priori information about the data is available. Unsupervised clustering methods play a significant role in the pursuit of unsupervised classification. One of the most popular and widely used clustering schemes for remote sensing applications is the ISOCLUS algorithm, which is based on the ISODATA method. The algorithm is given a set of n data points (or samples) in d-dimensional space, an integer k indicating the initial number of clusters, and a number of additional parameters. The general goal is to compute a set of cluster centers in d-space. Although there is no specific optimization criterion, the algorithm is similar in spirit to the well-known k-means clustering method, in which the objective is to minimize the average squared distance of each point to its nearest center, called the average distortion. One significant feature of ISOCLUS over k-means is that clusters may be merged or split, so the final number of clusters may differ from the number k supplied as part of the input. This algorithm will be described later in this paper. The ISOCLUS algorithm can run very slowly, particularly on large data sets. Given its wide use in remote sensing, its efficient computation is an important goal. We have developed a fast implementation of the ISOCLUS algorithm. Our improvement is based on a recent acceleration to the k-means algorithm, the filtering algorithm, by Kanungo et al. They showed that, by storing the data in a kd-tree, it was possible to significantly reduce the running time of k-means. We have adapted this method for the ISOCLUS algorithm.
For technical reasons, which are explained later, it is necessary to make a minor modification to the ISOCLUS specification. We provide empirical evidence, on both synthetic and Landsat image data sets, that our algorithm's performance is essentially the same as that of ISOCLUS, but with significantly lower running times. We show that our algorithm runs from 3 to 30 times faster than a straightforward implementation of ISOCLUS. Our adaptation of the filtering algorithm involves the efficient computation of a number of cluster statistics that are needed for ISOCLUS, but not for k-means.
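
ISOCLUS differs from k-means in that it can merge and split clusters. This can be sketched with two simplified ISODATA-style rules: merge centers that lie closer than a distance threshold, and split a cluster whose largest per-dimension standard deviation exceeds a spread threshold. The thresholds, the greedy pairwise merge order, and the ±1 standard-deviation split are illustrative assumptions. The full ISOCLUS specification has more parameters and checks, and this sketch does not include the kd-tree filtering acceleration.

```python
import math


def merge_close_centers(centers, sizes, min_dist):
    """Greedily merge any two centers closer than min_dist into their size-weighted mean."""
    centers = [list(c) for c in centers]
    sizes = list(sizes)
    merged = True
    while merged:
        merged = False
        for i in range(len(centers)):
            for j in range(i + 1, len(centers)):
                if math.dist(centers[i], centers[j]) < min_dist:
                    n = sizes[i] + sizes[j]
                    centers[i] = [(sizes[i] * a + sizes[j] * b) / n
                                  for a, b in zip(centers[i], centers[j])]
                    sizes[i] = n
                    del centers[j], sizes[j]
                    merged = True
                    break
            if merged:
                break  # restart the scan after every merge
    return centers, sizes


def split_wide_cluster(points, max_std):
    """Return one center, or two centers offset by +/-1 std along the widest dimension."""
    n = len(points)
    mean = [sum(col) / n for col in zip(*points)]
    stds = [math.sqrt(sum((x - m) ** 2 for x in col) / n)
            for col, m in zip(zip(*points), mean)]
    d = max(range(len(stds)), key=lambda k: stds[k])
    if stds[d] <= max_std:
        return [mean]
    low, high = list(mean), list(mean)
    low[d] -= stds[d]
    high[d] += stds[d]
    return [low, high]
```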

  4. Universal partitioning of the hierarchical fold network of 50-residue segments in proteins

    PubMed Central

    Ito, Jun-ichi; Sonobe, Yuki; Ikeda, Kazuyoshi; Tomii, Kentaro; Higo, Junichi

    2009-01-01

    Background Several studies have demonstrated that protein fold space is structured hierarchically and that power-law statistics hold for the relation between the numbers of protein families and protein folds (or superfamilies). We examined the internal structure and statistics in the fold space of 50-amino-acid-residue segments taken from various protein folds. We used inter-residue contact patterns to measure the tertiary structural similarity among segments. Using this similarity measure, the segments were classified into a number (Kc) of clusters. We examined various Kc values for the clustering. The resolution for differentiating segment tertiary structures increases with increasing Kc. Furthermore, we constructed networks by linking structurally similar clusters. Results The network was partitioned persistently into four regions for Kc ≥ 1000. This main partitioning is consistent with the results of earlier studies, where similar partitioning was reported in classifying protein domain structures. Furthermore, the network was partitioned naturally into several dozen sub-networks (i.e., communities): clusters within a sub-network were mutually connected by numerous links, whereas clusters in different sub-networks were connected by only a few links. For Kc ≥ 1000, there were about 40 major sub-networks, and their contents were conserved. This sub-partitioning is a novel finding, suggesting that the network is structured hierarchically: segments construct a cluster, clusters form a sub-network, and sub-networks constitute a region. Additionally, the network was characterized by non-power-law statistics, which is also a novel finding. Conclusion The main findings are: (1) The universe of 50-residue segments found here was characterized by non-power-law statistics. In this respect, it differs from the universes previously reported for protein domains.
(2) The 50-residue segments were partitioned persistently and universally into some dozens (ca. 40) of major sub-networks, irrespective of the number of clusters. (3) These major sub-networks encompassed 90% of all segments. Consequently, protein tertiary structures are constructed from these few dozen elements (sub-networks). PMID:19454039
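
The abstract says segment similarity is measured with inter-residue contact patterns but gives no details. Here is one plausible sketch, with every choice in it an assumption rather than the authors' method: each residue is represented by a single coordinate (for example its Cα atom), a contact is a residue pair at least `min_sep` positions apart and closer than an 8 Å cutoff, and similarity is the Jaccard overlap of two contact sets.

```python
import math


def contact_map(coords, cutoff=8.0, min_sep=3):
    """Residue pairs (i, j), j - i >= min_sep, closer than `cutoff` (in Å)."""
    contacts = set()
    for i in range(len(coords)):
        for j in range(i + min_sep, len(coords)):
            if math.dist(coords[i], coords[j]) < cutoff:
                contacts.add((i, j))
    return contacts


def contact_similarity(a, b):
    """Jaccard overlap of two contact sets: 1.0 for identical patterns, 0.0 for disjoint."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

Pairwise similarities like these could then feed the segment clustering into Kc clusters, and linking sufficiently similar clusters gives the network described above.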

  5. Event Networks and the Identification of Crime Pattern Motifs

    PubMed Central

    2015-01-01

    In this paper we demonstrate the use of network analysis to characterise patterns of clustering in spatio-temporal events. Such clustering is of both theoretical and practical importance in the study of crime, and forms the basis for a number of preventative strategies. However, existing analytical methods show only that clustering is present in data, while offering little insight into the nature of the patterns present. Here, we show how the classification of pairs of events as close in space and time can be used to define a network, thereby generalising previous approaches. The application of graph-theoretic techniques to these networks can then offer significantly deeper insight into the structure of the data than previously possible. In particular, we focus on the identification of network motifs, which have clear interpretation in terms of spatio-temporal behaviour. Statistical analysis is complicated by the nature of the underlying data, and we provide a method by which appropriate randomised graphs can be generated. Two datasets are used as case studies: maritime piracy at the global scale, and residential burglary in an urban area. In both cases, the same significant 3-vertex motif is found; this result suggests that incidents tend to occur not just in pairs, but in fact in larger groups within a restricted spatio-temporal domain. In the 4-vertex case, different motifs are found to be significant in each case, suggesting that this technique is capable of discriminating between clustering patterns at a finer granularity than previously possible. PMID:26605544
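
The network construction described above can be sketched directly. Two events are linked when they fall within both a spatial and a temporal threshold, and the two connected 3-vertex motifs (closed triangles and open two-edge paths) are then counted. The thresholds are inputs. The time-shuffling null model included here is a simple Knox-style assumption, not the randomised-graph method the authors provide.

```python
import itertools
import math
import random


def event_network(events, max_dist, max_days):
    """Adjacency sets: events (x, y, t) i and j are linked if close in space and time."""
    adj = {i: set() for i in range(len(events))}
    for i, j in itertools.combinations(range(len(events)), 2):
        (xi, yi, ti), (xj, yj, tj) = events[i], events[j]
        if math.hypot(xi - xj, yi - yj) <= max_dist and abs(ti - tj) <= max_days:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def three_vertex_motifs(adj):
    """Count connected 3-vertex subgraphs: (closed triangles, open 2-paths)."""
    triangles = open_paths = 0
    for i, j, k in itertools.combinations(adj, 3):
        edges = (j in adj[i]) + (k in adj[i]) + (k in adj[j])
        if edges == 3:
            triangles += 1
        elif edges == 2:
            open_paths += 1
    return triangles, open_paths


def shuffled_times(events, rng):
    """Simple null model: keep locations, randomly reassign the event times."""
    times = [t for _, _, t in events]
    rng.shuffle(times)
    return [(x, y, t) for (x, y, _), t in zip(events, times)]
```

Repeating the motif count on many shuffled copies gives a reference distribution against which the observed counts can be compared.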

  6. General Practice Clinical Data Help Identify Dementia Hotspots: A Novel Geospatial Analysis Approach.

    PubMed

    Bagheri, Nasser; Wangdi, Kinley; Cherbuin, Nicolas; Anstey, Kaarin J

    2018-01-01

    We have a poor understanding of whether dementia clusters geographically, how this occurs, and how dementia may relate to socio-demographic factors. To shed light on these important questions, this study aimed to compute a dementia risk score for individuals to assess spatial variation of dementia risk, identify significant clusters (hotspots), and explore their association with socioeconomic status. We used clinical records from 16 general practices (468 Statistical Area Level 1 units, N = 14,746) in west Adelaide, Australia, covering the period from 1 January 2012 to 31 December 2014. Dementia risk was estimated using the Australian National University-Alzheimer's Disease Risk Index. Hotspot analyses were applied to examine potential clusters in dementia risk at the small-area level. Significant hotspots were observed in eastern and southern areas, while coldspots were observed in the western area within the study perimeter. Additionally, significant hotspots were observed in low socio-economic communities. We found that dementia risk scores increased with age, female sex, high cholesterol, no physical activity, living alone (widowed, divorced, separated, or never married), and comorbidities such as diabetes and depression. In contrast, smoking was associated with a lower dementia risk score. The identification of dementia risk clusters may provide insight into possible geographical variations in risk factors for dementia and quantify these risks at the community level. As such, this research may enable policy makers to tailor early prevention strategies to the correct individuals within their precise locations.
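
The abstract does not name the hotspot statistic. The Getis-Ord Gi* statistic is a common choice for small-area hotspot analysis, so this sketch assumes it, with binary neighbour weights that include each area itself. Large positive Gi* values mark hotspots and large negative values mark coldspots.

```python
import math


def getis_ord_gi_star(values, neighbours):
    """Gi* z-scores with binary weights; neighbours[i] is a set that includes i itself.

    Gi* = (sum_j w_ij x_j - xbar * W_i) / (S * sqrt((n * W_i - W_i**2) / (n - 1))),
    where W_i = sum_j w_ij, which equals sum_j w_ij**2 for binary weights.
    """
    n = len(values)
    xbar = sum(values) / n
    s = math.sqrt(sum(x * x for x in values) / n - xbar ** 2)
    scores = []
    for i in range(n):
        w = len(neighbours[i])  # number of areas (including i) in the local window
        local_sum = sum(values[j] for j in neighbours[i])
        denom = s * math.sqrt((n * w - w * w) / (n - 1))
        scores.append((local_sum - xbar * w) / denom)
    return scores
```

In practice the scores are compared with standard normal critical values (for example |Gi*| > 1.96 for roughly 5% two-sided significance), often with an adjustment for the many areas tested.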

  7. An investigation of war trauma types, symptom clusters, and risk-factors associated with post-traumatic stress disorder: where does gender fit?

    PubMed

    Farhood, Laila; Fares, Souha; Hamady, Carmen

    2018-05-25

    The female-male ratio in the prevalence of post-traumatic stress disorder (PTSD) is approximately 2:1. Gender differences in experienced trauma types, PTSD symptom clusters, and PTSD risk factors are unclear. We aimed to address this gap using a cross-sectional design. A sample of 991 civilians (522 women, 469 men) from South Lebanon was randomly selected in 2007, after the 2006 war. Trauma types were grouped into disaster and accident, loss, chronic disease, non-malignant disease, and violence. PTSD symptom clusters involved re-experiencing, avoidance, negative cognitions and mood, and arousal. These were assessed using parts I and IV of the Arabic version of the Harvard Trauma Questionnaire (HTQ). Risk factors were assessed using data from a social support and life events questionnaire in multiple regression models. Females were twice as likely as males to score above the PTSD threshold (24.3 vs. 10.4%, p < 0.001). Total scores on all trauma types were similar across genders. Females scored higher on all symptom clusters (p < 0.001). Social support, social life events, witnessed traumas, and domestic violence were significantly associated with PTSD in both genders. Conversely, the gender difference in experienced traumas was not statistically significant. These findings accentuate the need to reconsider the role of gender in the assessment and treatment of PTSD.

  8. Comparison between volatility return intervals of the S&P 500 index and two common models

    NASA Astrophysics Data System (ADS)

    Vodenska-Chitkushev, I.; Wang, F. Z.; Weber, P.; Yamasaki, K.; Havlin, S.; Stanley, H. E.

    2008-01-01

    We analyze the S&P 500 index data for the 13-year period, from January 1, 1984 to December 31, 1996, with one data point every 10 min. For this database, we study the distribution and clustering of volatility return intervals, which are defined as the time intervals between successive volatilities above a certain threshold q. We find that the long memory in the volatility leads to a clustering of above-median as well as below-median return intervals. In addition, it turns out that the short return intervals form larger clusters compared to the long return intervals. When comparing the empirical results to the ARMA-FIGARCH and fBm models for volatility, we find that the fBm model predicts scaling better than the ARMA-FIGARCH model, which is consistent with the argument that both ARMA-FIGARCH and fBm capture the long-term dependence in return intervals to a certain extent, but only fBm accounts for the scaling. We perform the Student's t-test to compare the empirical data with the shuffled records, ARMA-FIGARCH and fBm. We analyze separately the clusters of above-median return intervals and the clusters of below-median return intervals for different thresholds q. We find that the empirical data are statistically different from the shuffled data for all thresholds q. Our results also suggest that the ARMA-FIGARCH model is statistically different from the S&P 500 for intermediate q for both above-median and below-median clusters, while fBm is statistically different from S&P 500 for small and large q for above-median clusters and for small q for below-median clusters. Neither model can fully explain the entire regime of q studied.
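
The return-interval definition and the above- and below-median clusters can be sketched in a few lines. The code records the time steps at which volatility exceeds q, takes successive differences, and then splits the interval sequence into runs of consecutive above-median or below-median intervals. How the authors compute volatility from the 10-minute data is not shown here. The input is assumed to be an already-computed volatility series, and treating ties with the median as below-median is our choice.

```python
def return_intervals(volatility, q):
    """Gaps (in time steps) between successive volatilities above threshold q."""
    exceed = [i for i, v in enumerate(volatility) if v > q]
    return [b - a for a, b in zip(exceed, exceed[1:])]


def median_clusters(intervals):
    """Lengths of runs of consecutive above-median and below-median return intervals.

    Intervals equal to the median are treated as below-median (an assumed tie rule).
    """
    ordered = sorted(intervals)
    n = len(ordered)
    median = ordered[n // 2] if n % 2 else (ordered[n // 2 - 1] + ordered[n // 2]) / 2
    above, below = [], []
    current, length = None, 0
    for r in intervals:
        is_above = r > median
        if is_above == current:
            length += 1  # the current run continues
        else:
            if current is not None:
                (above if current else below).append(length)
            current, length = is_above, 1
    if current is not None:
        (above if current else below).append(length)
    return above, below
```

Comparing these run lengths for the real series against a shuffled copy of the intervals, which removes any memory, is the kind of comparison the abstract reports.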

  9. Degree-based statistic and center persistency for brain connectivity analysis.

    PubMed

    Yoo, Kwangsun; Lee, Peter; Chung, Moo K; Sohn, William S; Chung, Sun Ju; Na, Duk L; Ju, Daheen; Jeong, Yong

    2017-01-01

    Brain connectivity analyses have been widely performed to investigate the organization and functioning of the brain, or to observe changes in neurological or psychiatric conditions. However, connectivity analysis inevitably introduces the problem of mass-univariate hypothesis testing. Although several cluster-wise correction methods have been suggested to address this problem and shown to provide high sensitivity, these approaches fundamentally have two drawbacks: the lack of spatial specificity (localization power) and the arbitrariness of an initial cluster-forming threshold. In this study, we propose a novel method, degree-based statistic (DBS), for performing cluster-wise inference. DBS is designed to overcome the two shortcomings mentioned above. From a network perspective, a few brain regions are of critical importance and are considered to play pivotal roles in network integration. Building on this notion, DBS defines a cluster as a set of edges that share one end node. This definition enables the efficient detection of clusters and their center nodes. Furthermore, a new cluster measure, center persistency (CP), was introduced. The efficiency of DBS was demonstrated in a simulation with a known "ground truth". DBS was then applied to two experimental datasets, where it successfully detected the persistent clusters. In conclusion, by adopting the graph-theoretical concept of degree and borrowing the concept of persistence from algebraic topology, DBS could sensitively identify clusters with central nodes that would play pivotal roles in an effect of interest. DBS is potentially widely applicable to various cognitive or clinical situations and allows us to obtain statistically reliable and easily interpretable results. Hum Brain Mapp 38:165-181, 2017. © 2016 Wiley Periodicals, Inc.
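
The DBS cluster definition, a set of supra-threshold edges sharing one end node, reduces to a per-node degree count on a thresholded edge-statistic matrix. The sketch below also scores persistence as the number of thresholds, from a user-supplied grid, at which a node still qualifies as a center. The `min_degree` rule and this counting form of persistence are simplified assumptions of ours. The paper's CP measure and its statistical inference step are not reproduced here.

```python
def supra_threshold_edges(stat, threshold):
    """Edges (i, j), i < j, whose absolute edge statistic exceeds the threshold."""
    n = len(stat)
    return {(i, j) for i in range(n) for j in range(i + 1, n) if abs(stat[i][j]) > threshold}


def degree_clusters(stat, threshold, min_degree):
    """DBS-style clusters: center node -> set of supra-threshold edges sharing that node."""
    edges = supra_threshold_edges(stat, threshold)
    clusters = {}
    for node in range(len(stat)):
        star = {e for e in edges if node in e}
        if len(star) >= min_degree:
            clusters[node] = star
    return clusters


def center_persistence(stat, thresholds, min_degree):
    """Number of thresholds at which each node remains a cluster center (simplified)."""
    counts = [0] * len(stat)
    for thr in thresholds:
        for node in degree_clusters(stat, thr, min_degree):
            counts[node] += 1
    return counts
```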

  10. Pupal productivity in rainy and dry seasons: findings from the impact survey of a randomised controlled trial of dengue prevention in Guerrero, Mexico.

    PubMed

    Jiménez-Alejo, Abel; Morales-Pérez, Arcadio; Nava-Aguilera, Elizabeth; Flores-Moreno, Miguel; Apreza-Aguilar, Sinahí; Carranza-Alcaraz, Wilhelm; Cortés-Guzmán, Antonio Juan; Fernández-Salas, Ildefonso; Ledogar, Robert J; Cockcroft, Anne; Andersson, Neil

    2017-05-30

    The follow-up survey of a cluster-randomised controlled trial of evidence-based community mobilisation for dengue control in Nicaragua and Mexico included entomological information from the 2012 rainy and dry seasons. We used data from the Mexican arm of the trial to assess the impact of the community action on pupal production of the dengue vector Aedes aegypti in both rainy and dry seasons. Trained field workers inspected household water containers in 90 clusters and collected any pupae or larvae present for entomological examination. We calculated indices of pupae per person and pupae per household, and the traditional entomological indices of container index, household index and Breteau index, and compared these between rainy and dry seasons and between intervention and control clusters, using a cluster t-test to assess the significance of differences. In 11,933 houses in the rainy season, we inspected 40,323 containers and found 7070 Aedes aegypti pupae. In the dry season, we inspected 43,461 containers and counted 6552 pupae. All pupal and entomological indices were lower in the intervention clusters (IC) than in control clusters (CC) in both the rainy season (RS) and the dry season (DS): pupae per container 0.12 IC and 0.24 CC in RS, and 0.10 IC and 0.20 CC in DS; pupae per household 0.46 IC and 0.82 CC in RS, and 0.41 IC and 0.83 CC in DS; pupae per person 0.11 IC and 0.19 CC in RS, and 0.10 IC and 0.20 CC in DS; household index 16% IC and 21% CC in RS, and 12.1% IC and 17.9% CC in DS; container index 7.5% IC and 11.5% CC in RS, and 4.6% IC and 7.1% CC in DS; Breteau index 27% IC and 36% CC in RS, and 19% IC and 29% CC in DS. All differences between the intervention and control clusters were statistically significant, taking clustering into account. The trial intervention led to significant decreases in pupal and conventional entomological indices in both rainy and dry seasons. ISRCTN27581154.
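
The abstract does not restate the index definitions, so the conventional ones are assumed here. The household (house) index is the percentage of inspected houses with at least one infested container. The container index is the percentage of inspected water-holding containers that are infested. The Breteau index is the number of infested containers per 100 inspected houses. Pupal productivity is pupae divided by containers, households, or persons.

```python
def house_index(positive_houses, inspected_houses):
    """Percentage of inspected houses with at least one infested container."""
    return 100.0 * positive_houses / inspected_houses


def container_index(positive_containers, inspected_containers):
    """Percentage of inspected water-holding containers that are infested."""
    return 100.0 * positive_containers / inspected_containers


def breteau_index(positive_containers, inspected_houses):
    """Infested containers per 100 inspected houses."""
    return 100.0 * positive_containers / inspected_houses


def pupae_per(pupae, units):
    """Pupal productivity per unit (container, household, or person)."""
    return pupae / units


# Pooled rainy-season totals from the abstract (both trial arms combined).
print(round(pupae_per(7070, 40323), 2), round(pupae_per(7070, 11933), 2))
```

With the pooled rainy-season totals (7070 pupae, 40,323 containers, 11,933 houses), this gives about 0.18 pupae per container and 0.59 per household. Both values lie between the reported intervention and control figures. The arm-level counts needed for the other indices are not given in the abstract.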

  11. White Matter Tract Integrity in Alzheimer's Disease vs. Late Onset Bipolar Disorder and Its Correlation with Systemic Inflammation and Oxidative Stress Biomarkers.

    PubMed

    Besga, Ariadna; Chyzhyk, Darya; Gonzalez-Ortega, Itxaso; Echeveste, Jon; Graña-Lecuona, Marina; Graña, Manuel; Gonzalez-Pinto, Ana

    2017-01-01

    Background: Late Onset Bipolar Disorder (LOBD) is the development of Bipolar Disorder (BD) after the age of 50 years. It is often difficult to differentiate from other dementias of aging, such as Alzheimer's Disease (AD), because they share cognitive and behavioral impairment symptoms. Objectives: We look for white matter (WM) tract voxel clusters showing significant differences between AD and LOBD, and for their correlations with systemic blood plasma biomarkers (inflammatory, neurotrophic factors, and oxidative stress). Materials: A sample of healthy controls (HC) (n = 19), AD patients (n = 35), and LOBD patients (n = 24) was recruited at the Alava University Hospital. Blood plasma samples were obtained at recruitment time and analyzed to extract the inflammatory, oxidative stress, and neurotrophic factors. Several MRI modalities were acquired for each subject. Methods: Fractional anisotropy (FA) coefficients are obtained from diffusion weighted imaging (DWI). Tract-based spatial statistics (TBSS) finds FA skeleton clusters of WM tract voxels showing significant differences for all possible contrasts between HC, AD, and LOBD. An ANOVA F-test over all contrasts is carried out. Results of the F-test are used to mask the TBSS-detected clusters for the AD > LOBD and LOBD > AD contrasts to select the image clusters used for correlation analysis. Finally, Pearson's correlation coefficients between FA values at cluster sites and systemic blood plasma biomarker values are computed. Results: The TBSS contrasts masked by the ANOVA F-test identified strongly significant clusters in the forceps minor, inferior longitudinal fasciculus, inferior fronto-occipital fasciculus, and cingulum gyrus. The correlation analysis of these tract clusters found a strong negative correlation of AD with the nerve growth factor (NGF) and brain derived neurotrophic factor (BDNF) blood biomarkers. A negative correlation of AD and a positive correlation of LOBD with the inflammation biomarker IL6 were also found.
Conclusion: The tract-atlas localizations of the TBSS voxel clusters are consistent with greater behavioral impairment and mood disorders in LOBD than in AD. Correlation analysis confirms that neurotrophic factors (i.e., NGF, BDNF) play a major role in AD but are absent from LOBD pathophysiology. Also, the correlation results for IL1 and IL6 suggest stronger inflammatory effects in LOBD than in AD.
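
FA is computed from the diffusion tensor fitted to the DWI data. Given the tensor's three eigenvalues λ1, λ2, λ3, the standard formula is FA = √(1/2) · √((λ1−λ2)² + (λ2−λ3)² + (λ3−λ1)²) / √(λ1² + λ2² + λ3²). This minimal sketch assumes the eigenvalues are already available. Tensor fitting and the TBSS skeleton projection are not shown.

```python
import math


def fractional_anisotropy(l1, l2, l3):
    """FA from diffusion-tensor eigenvalues: 0 for isotropic, 1 for purely linear diffusion."""
    num = math.sqrt((l1 - l2) ** 2 + (l2 - l3) ** 2 + (l3 - l1) ** 2)
    den = math.sqrt(l1 ** 2 + l2 ** 2 + l3 ** 2)
    return math.sqrt(0.5) * num / den if den > 0 else 0.0
```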

  12. Could the clinical interpretability of subgroups detected using clustering methods be improved by using a novel two-stage approach?

    PubMed

    Kent, Peter; Stochkendahl, Mette Jensen; Christensen, Henrik Wulff; Kongsted, Alice

    2015-01-01

    Recognition of homogeneous subgroups of patients can usefully improve prediction of their outcomes and the targeting of treatment. There are a number of research approaches that have been used to recognise homogeneity in such subgroups and to test their implications. One approach is to use statistical clustering techniques, such as Cluster Analysis or Latent Class Analysis, to detect latent relationships between patient characteristics. Influential patient characteristics can come from diverse domains of health, such as pain, activity limitation, physical impairment, social role participation, psychological factors, biomarkers and imaging. However, such 'whole person' research may result in data-driven subgroups that are complex, difficult to interpret and challenging to recognise clinically. This paper describes a novel approach to applying statistical clustering techniques that may improve the clinical interpretability of derived subgroups and reduce sample size requirements. This approach involves clustering in two sequential stages. The first stage involves clustering within health domains and therefore requires creating as many clustering models as there are health domains in the available data. This first stage produces scoring patterns within each domain. The second stage involves clustering using the scoring patterns from each health domain (from the first stage) to identify subgroups across all domains. We illustrate this using chest pain data from the baseline presentation of 580 patients. The new two-stage clustering resulted in two subgroups that approximated the classic textbook descriptions of musculoskeletal chest pain and atypical angina chest pain. The traditional single-stage clustering resulted in five clusters that were also clinically recognisable but displayed less distinct differences. In this paper, a new approach to using clustering techniques to identify clinically useful subgroups of patients is suggested. 
Research designs, statistical methods and outcome metrics suitable for performing that testing are also described. This approach has potential benefits but requires broad testing, in multiple patient samples, to determine its clinical value. The usefulness of the approach is likely to be context-specific, depending on the characteristics of the available data and the research question being asked of it.
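
The two-stage procedure can be sketched as follows, with clearly labelled assumptions. Plain k-means stands in for whichever clustering technique (Cluster Analysis or Latent Class Analysis) is used at each stage. Each patient's stage-1 result is one-hot encoded per domain before the second clustering. The domain names, values, and cluster counts in the test are hypothetical.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=50):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(farthest))
    labels = []
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centers[c])
        if updated == centers:
            break
        centers = updated
    return labels, centers


def two_stage_clustering(domains, k_within, k_across):
    """domains: {name: [feature vector per patient]}, same patient order in every domain.

    Stage 1 clusters patients within each health domain; stage 2 clusters the
    one-hot encoded stage-1 scoring patterns across all domains.
    """
    stage1 = {name: kmeans(vectors, k_within[name])[0] for name, vectors in domains.items()}
    n_patients = len(next(iter(domains.values())))
    profiles = []
    for p in range(n_patients):
        row = []
        for name in domains:
            onehot = [0.0] * k_within[name]
            onehot[stage1[name][p]] = 1.0  # this patient's pattern within the domain
            row.extend(onehot)
        profiles.append(row)
    return kmeans(profiles, k_across)[0]
```

Because stage 2 sees only a handful of domain-level patterns rather than every raw variable, its subgroups can be described in terms of those patterns. This is the interpretability argument the abstract makes.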

  13. Indications of Intermediate-scale Anisotropy of Cosmic Rays with Energy Greater Than 57 EeV in the Northern Sky Measured with the Surface Detector of the Telescope Array Experiment

    NASA Astrophysics Data System (ADS)

    Abbasi, R. U.; Abe, M.; Abu-Zayyad, T.; Allen, M.; Anderson, R.; Azuma, R.; Barcikowski, E.; Belz, J. W.; Bergman, D. R.; Blake, S. A.; Cady, R.; Chae, M. J.; Cheon, B. G.; Chiba, J.; Chikawa, M.; Cho, W. R.; Fujii, T.; Fukushima, M.; Goto, T.; Hanlon, W.; Hayashi, Y.; Hayashida, N.; Hibino, K.; Honda, K.; Ikeda, D.; Inoue, N.; Ishii, T.; Ishimori, R.; Ito, H.; Ivanov, D.; Jui, C. C. H.; Kadota, K.; Kakimoto, F.; Kalashev, O.; Kasahara, K.; Kawai, H.; Kawakami, S.; Kawana, S.; Kawata, K.; Kido, E.; Kim, H. B.; Kim, J. H.; Kim, J. H.; Kitamura, S.; Kitamura, Y.; Kuzmin, V.; Kwon, Y. J.; Lan, J.; Lim, S. I.; Lundquist, J. P.; Machida, K.; Martens, K.; Matsuda, T.; Matsuyama, T.; Matthews, J. N.; Minamino, M.; Mukai, K.; Myers, I.; Nagasawa, K.; Nagataki, S.; Nakamura, T.; Nonaka, T.; Nozato, A.; Ogio, S.; Ogura, J.; Ohnishi, M.; Ohoka, H.; Oki, K.; Okuda, T.; Ono, M.; Oshima, A.; Ozawa, S.; Park, I. H.; Pshirkov, M. S.; Rodriguez, D. C.; Rubtsov, G.; Ryu, D.; Sagawa, H.; Sakurai, N.; Sampson, A. L.; Scott, L. M.; Shah, P. D.; Shibata, F.; Shibata, T.; Shimodaira, H.; Shin, B. K.; Smith, J. D.; Sokolsky, P.; Springer, R. W.; Stokes, B. T.; Stratton, S. R.; Stroman, T. A.; Suzawa, T.; Takamura, M.; Takeda, M.; Takeishi, R.; Taketa, A.; Takita, M.; Tameda, Y.; Tanaka, H.; Tanaka, K.; Tanaka, M.; Thomas, S. B.; Thomson, G. B.; Tinyakov, P.; Tkachev, I.; Tokuno, H.; Tomida, T.; Troitsky, S.; Tsunesada, Y.; Tsutsumi, K.; Uchihori, Y.; Udo, S.; Urban, F.; Vasiloff, G.; Wong, T.; Yamane, R.; Yamaoka, H.; Yamazaki, K.; Yang, J.; Yashiro, K.; Yoneda, Y.; Yoshida, S.; Yoshii, H.; Zollinger, R.; Zundel, Z.

    2014-08-01

    We have searched for intermediate-scale anisotropy in the arrival directions of ultrahigh-energy cosmic rays with energies above 57 EeV in the northern sky using data collected over a 5 yr period by the surface detector of the Telescope Array experiment. We report on a cluster of events that we call the hotspot, found by oversampling using 20° radius circles. The hotspot has a Li-Ma statistical significance of 5.1σ and is centered at R.A. = 146.°7, decl. = 43.°2. The position of the hotspot is about 19° from the supergalactic plane. The probability of a cluster of events of 5.1σ significance appearing by chance in an isotropic cosmic-ray sky is estimated to be 3.7 × 10⁻⁴ (3.4σ).
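
In its standard form (Li & Ma, eq. 17), the Li-Ma significance is a likelihood-ratio significance for on-source counts N_on against off-source counts N_off, scaled by the on/off exposure ratio α. The sketch below implements that formula only. How the Telescope Array analysis defines the on and off regions for its 20° oversampling circles is not shown. The second helper converts a significance into a one-sided Gaussian chance probability. For example, 3.4σ corresponds to about 3.4 × 10⁻⁴, consistent after rounding with the quoted 3.7 × 10⁻⁴ (3.4σ), assuming a one-sided conversion.

```python
import math


def li_ma_significance(n_on, n_off, alpha):
    """Li & Ma eq. 17 significance of an on-source excess.

    n_on: counts in the on-source region; n_off: counts in the background region;
    alpha: ratio of on-source to off-source exposure.
    """
    total = n_on + n_off
    term_on = n_on * math.log((1 + alpha) / alpha * n_on / total) if n_on > 0 else 0.0
    term_off = n_off * math.log((1 + alpha) * n_off / total) if n_off > 0 else 0.0
    s = math.sqrt(max(2.0 * (term_on + term_off), 0.0))
    return s if n_on >= alpha * n_off else -s  # negative sign marks a deficit


def one_sided_p(sigma):
    """Chance probability of a Gaussian upward fluctuation of at least `sigma`."""
    return 0.5 * math.erfc(sigma / math.sqrt(2.0))
```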

  14. Alerts in electronic medical records to promote a colorectal cancer screening programme: a cluster randomised controlled trial in primary care

    PubMed Central

    Guiriguet, Carolina; Muñoz-Ortiz, Laura; Burón, Andrea; Rivero, Irene; Grau, Jaume; Vela-Vallespín, Carmen; Vilarrubí, Mercedes; Torres, Miquel; Hernández, Cristina; Méndez-Boo, Leonardo; Toràn, Pere; Caballeria, Llorenç; Macià, Francesc; Castells, Antoni

    2016-01-01

    Background Participation rates in colorectal cancer screening are below recommended European targets. Aim To evaluate the effectiveness of an alert in primary care electronic medical records (EMRs) to increase individuals’ participation in an organised, population-based colorectal cancer screening programme when compared with usual care. Design and setting Cluster randomised controlled trial in primary care centres of Barcelona, Spain. Method Participants were males and females aged 50–69 years, who were invited to the first round of a screening programme based on the faecal immunochemical test (FIT) (n = 41 042), and their primary care professional. The randomisation unit was the physician cluster (n = 130) and patients were blinded to the study group. The control group followed usual care as per the colorectal cancer screening programme. In the intervention group, as well as usual care, an alert to health professionals (cluster level) to promote screening was introduced in the individual’s primary care EMR for 1 year. The main outcome was colorectal cancer screening participation at individual participant level. Results In total, 67 physicians and 21 619 patients (intervention group) and 63 physicians and 19 423 patients (control group) were randomised. In the intention-to-treat analysis screening participation was 44.1% and 42.2% respectively (odds ratio 1.08, 95% confidence interval [CI] = 0.97 to 1.20, P = 0.146). However, in the per-protocol analysis screening uptake in the intervention group showed a statistically significant increase, after adjusting for potential confounders (OR, 1.11; 95% CI = 1.02 to 1.22; P = 0.018). Conclusion The use of an alert in an individual’s primary care EMR is associated with a statistically significant increased uptake of an organised, FIT-based colorectal cancer screening programme in patients attending primary care centres. PMID:27266861
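
As an arithmetic check, the intention-to-treat participation rates of 44.1% and 42.2% give a crude odds ratio of about 1.08, matching the reported value. This crude calculation ignores physician-level clustering and confounders, so it is not expected to reproduce the reported confidence interval or the adjusted per-protocol estimate.

```python
def odds_ratio(p_treated, p_control):
    """Crude odds ratio from two proportions (no adjustment for clustering or confounders)."""
    return (p_treated / (1.0 - p_treated)) / (p_control / (1.0 - p_control))


print(round(odds_ratio(0.441, 0.422), 2))
```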

  15. [Cluster analysis applicability to fitness evaluation of cosmonauts on long-term missions of the International space station].

    PubMed

    Egorov, A D; Stepantsov, V I; Nosovskiĭ, A M; Shipov, A A

    2009-01-01

    Cluster analysis was applied to evaluate the locomotion training (running, and running interspersed with walking) of 13 cosmonauts on long-term ISS missions, using the parameters duration (min), distance (m) and intensity (km/h). Based on the results of the analyses, the cosmonauts were divided into three stable groups of 2, 5 and 6 persons. Distance and speed increased significantly (p < 0.03) from group 1 to group 3. The duration of locomotion training did not differ significantly between groups (p = 0.125). Therefore, cluster analysis is an adequate method for evaluating the fitness of cosmonauts on long-term missions.

  16. Discovery of a young asteroid cluster associated with P/2012 F5 (Gibbs)

    NASA Astrophysics Data System (ADS)

    Novaković, Bojan; Hsieh, Henry H.; Cellino, Alberto; Micheli, Marco; Pedani, Marco

    2014-03-01

    We present the results of our search for a dynamical family around the active Asteroid P/2012 F5 (Gibbs). By applying the hierarchical clustering method, we discover an extremely compact 9-body cluster associated with P/2012 F5. The statistical significance of this newly discovered Gibbs cluster is estimated to be >99.9%, strongly suggesting that its members share a common origin. The cluster is located in a dynamically cold region of the outer main belt at a proper semi-major axis of ∼3.005 AU, and all members are found to be dynamically stable over very long timescales. Backward numerical orbital integrations show that the age of the cluster is only 1.5 ± 0.1 Myr. Taxonomic classifications are unavailable for most of the cluster members, but SDSS spectrophotometry available for two cluster members indicates that both appear to be Q-type objects. We also estimate a lower limit on the size of the parent body of about 10 km, and find that the impact event which produced the Gibbs cluster is intermediate between a cratering and a catastrophic collision. In addition, we search for new main-belt comets in the region of the Gibbs cluster by observing seven asteroids that either belong to the cluster or lie very close to it in the space of proper orbital elements. However, we do not detect any convincing evidence of a tail or coma in any of our targets. Finally, we obtain optical images of P/2012 F5, and find absolute R-band and V-band magnitudes of HR = 17.0 ± 0.1 mag and HV = 17.4 ± 0.1 mag, respectively, corresponding to an upper limit on the diameter of the P/2012 F5 nucleus of ∼2 km.

  17. Pore-scale interfacial dynamics during gas-supersaturated water injection in porous media - on nucleation, growth and advection of disconnected fluid phases (Invited)

    NASA Astrophysics Data System (ADS)

    Or, D.; Ioannidis, M.

    2010-12-01

    Degassing and in situ development of mobile gas bubbles occur when a supersaturated aqueous phase is injected into water-saturated porous media. Supersaturated water injection (SWI) has potentially significant applications in the remediation of soils contaminated by non-aqueous phase liquids and in enhanced oil recovery. Pore network simulations indicate the formation of a region near the injection boundary where gas phase nuclei are activated and grow by mass transfer from the flowing supersaturated aqueous phase. Ramified clusters of gas-filled pores develop which, owing to the low prevailing Bond number, grow laterally to a significant extent before the onset of mobilization, and are thus likely to coalesce. Gas cluster mobilization invariably results in fragmentation and stranding, such that a macroscopic region containing few, tenuously connected, large gas clusters is established. Beyond this region, gas phase nucleation and mass transfer from the aqueous phase are limited by the diminishing supply of dissolved gas. New insights into SWI dynamics are obtained using rapid micro-visualization in transparent glass micromodels. Using high-speed imaging, we observe the nucleation, initial growth and subsequent fate (mobilization, fragmentation, collision, coalescence and stranding) of CO2 bubbles and clusters of gas-filled pores, and analyze cluster population statistics. We find significant support for the development of invasion-percolation-like patterns, but also report on hitherto unaccounted-for gas bubble behavior. Additionally, we report for the first time on the acoustic emission signature of SWI in porous media and relate it to the dynamics of bubble nucleation and growth. Finally, we identify the pore-scale mechanisms associated with the mobilization and subsequent recovery of a residual non-aqueous phase liquid due to gas bubble dynamics during SWI.

  18. Complex regional pain syndrome: evidence for warm and cold subtypes in a large prospective clinical sample.

    PubMed

    Bruehl, Stephen; Maihöfner, Christian; Stanton-Hicks, Michael; Perez, Roberto S G M; Vatine, Jean-Jacques; Brunner, Florian; Birklein, Frank; Schlereth, Tanja; Mackey, Sean; Mailis-Gagnon, Angela; Livshitz, Anatoly; Harden, R Norman

    2016-08-01

    Limited research suggests that there may be Warm complex regional pain syndrome (CRPS) and Cold CRPS subtypes, with inflammatory mechanisms contributing most strongly to the former. This study for the first time used an unbiased statistical pattern recognition technique to evaluate whether distinct Warm vs Cold CRPS subtypes can be discerned in the clinical population. An international, multisite study was conducted using standardized procedures to evaluate signs and symptoms in 152 patients with clinical CRPS at baseline, with 3-month follow-up evaluations in 112 of these patients. Two-step cluster analysis using automated cluster selection identified a 2-cluster solution as optimal. Results revealed a Warm CRPS patient cluster characterized by a warm, red, edematous, and sweaty extremity and a Cold CRPS patient cluster characterized by a cold, blue, and less edematous extremity. Median pain duration was significantly (P < 0.001) shorter in the Warm CRPS subtype (4.7 months) than in the Cold CRPS subtype (20 months), with comparable pain intensity. A derived total inflammatory score was significantly (P < 0.001) elevated in the Warm CRPS group (compared with Cold CRPS) at baseline but diminished significantly (P < 0.001) over the follow-up period, whereas this score did not diminish in the Cold CRPS group (time × subtype interaction: P < 0.001). Results support the existence of a Warm CRPS subtype common in patients with acute (<6 months) CRPS and a relatively distinct Cold CRPS subtype most common in chronic CRPS. The pattern of clinical features suggests that inflammatory mechanisms contribute most prominently to the Warm CRPS subtype but that these mechanisms diminish substantially during the first year postinjury.

  19. Cluster-randomized trial of infant nutrition training for caries prevention.

    PubMed

    Chaffee, B W; Feldens, C A; Vítolo, M R

    2013-07-01

    The objective of this study was to estimate the caries impact of providing training in infant feeding guidelines to workers at Brazilian public primary care clinics. In a cluster-randomized controlled trial (n = 20 clinics), health care workers either were trained in guidelines for infant nutrition, stressing healthful complementary feeding, or were assigned to a 'usual practices' control, which allowed for maternal counseling at practitioner discretion. Training occurred once; the amount of counseling provided to mothers was not assessed. Eligible pregnant women were enrolled to follow health outcomes in their children. Early childhood caries (ECC) was measured at age three years (n = 458 children). The overall reductions in ECC (relative risk [RR], 0.92; 95% CI, 0.75 to 1.12) and severe ECC (S-ECC) (RR, 0.87; 95% CI, 0.64 to 1.19) were not statistically significant. There was a protective effect among mothers who remained exclusively at the same health center (S-ECC RR, 0.68; 95% CI, 0.47 to 0.99) and among those naming the health center as their principal source of feeding advice (S-ECC RR, 0.53; 95% CI, 0.29 to 0.97). Health care worker training did not yield a statistically significant reduction in caries overall, although caries was reduced among children of mothers more connected to their health centers.
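    Whether each relative risk above counts as statistically significant can be read off its confidence interval: a two-sided 95% interval that contains RR = 1 is consistent with no effect at roughly the 5% level. The sketch below applies that rule to the four intervals reported in the abstract; the helper name is hypothetical, and the check assumes the intervals are conventional two-sided 95% intervals, so it only approximates whatever test the authors actually used.

```python
# (label, point estimate, lower 95% bound, upper 95% bound), as reported above
estimates = [
    ("ECC overall", 0.92, 0.75, 1.12),
    ("Severe ECC overall", 0.87, 0.64, 1.19),
    ("S-ECC, same health center", 0.68, 0.47, 0.99),
    ("S-ECC, center as main advice source", 0.53, 0.29, 0.97),
]


def excludes_null(lower, upper, null=1.0):
    """True when the interval [lower, upper] does not contain the null ratio."""
    return not (lower <= null <= upper)


for label, rr, lower, upper in estimates:
    verdict = "significant" if excludes_null(lower, upper) else "not significant"
    print(f"{label}: RR {rr} -> {verdict} at the 5% level")
```

    Only the two subgroup intervals exclude 1, matching the abstract's conclusion that the overall reductions were not significant.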

  20. Cluster detection methods applied to the Upper Cape Cod cancer data.

    PubMed

    Ozonoff, Al; Webster, Thomas; Vieira, Veronica; Weinberg, Janice; Ozonoff, David; Aschengrau, Ann

    2005-09-15

    A variety of statistical methods have been suggested to assess the degree and/or the location of spatial clustering of disease cases. However, relatively little of the literature is devoted to comparison and critique of different methods. Most of the available comparative studies rely on simulated data rather than real data sets. We have chosen three methods currently used for examining spatial disease patterns: the M-statistic of Bonetti and Pagano; the Generalized Additive Model (GAM) method as applied by Webster; and Kulldorff's spatial scan statistic. We apply these statistics to analyze breast cancer data from the Upper Cape Cancer Incidence Study using three different latency assumptions. The three different latency assumptions produced three different spatial patterns of cases and controls. For 20-year latency, all three methods generally concur. However, for 15-year latency and no-latency assumptions, the methods produce different results when testing for global clustering. The comparative analyses of real data sets by different statistical methods provide insight into directions for further research. We suggest a research program designed around examining real data sets to guide focused investigation of relevant features using simulated data, for the purpose of understanding how to interpret statistical methods applied to epidemiological data with a spatial component.
