Sample records for cluster sampling methodology

  1. Fusion And Inference From Multiple And Massive Disparate Distributed Dynamic Data Sets

    DTIC Science & Technology

    2017-07-01

    Principled methodology for two-sample graph testing; a provably almost-surely perfect vertex clustering algorithm for block model graphs; semi-supervised clustering methodology; robust hypothesis testing. Embedding the data in a Euclidean space allows the full arsenal of statistical and machine learning methodology for multivariate Euclidean data to be deployed.

  2. Systematic review finds major deficiencies in sample size methodology and reporting for stepped-wedge cluster randomised trials

    PubMed Central

    Martin, James; Taljaard, Monica; Girling, Alan; Hemming, Karla

    2016-01-01

    Background Stepped-wedge cluster randomised trials (SW-CRT) are increasingly being used in health policy and services research, but unless they are conducted and reported to the highest methodological standards, they are unlikely to be useful to decision-makers. Sample size calculations for these designs require allowance for clustering, time effects and repeated measures. Methods We carried out a methodological review of SW-CRTs up to October 2014. We assessed adherence to reporting each of the 9 sample size calculation items recommended in the 2012 extension of the CONSORT statement to cluster trials. Results We identified 32 completed trials and 28 independent protocols published between 1987 and 2014. Of these, 45 (75%) reported a sample size calculation, with a median of 5.0 (IQR 2.5–6.0) of the 9 CONSORT items reported. Of those that reported a sample size calculation, the majority, 33 (73%), allowed for clustering, but just 15 (33%) allowed for time effects. There was a small increase in the proportions reporting a sample size calculation (from 64% before to 84% after publication of the CONSORT extension, p=0.07). The type of design (cohort or cross-sectional) was not reported clearly in the majority of studies, but cohort designs seemed to be most prevalent. Sample size calculations in cohort designs were particularly poor with only 3 out of 24 (13%) of these studies allowing for repeated measures. Discussion The quality of reporting of sample size items in stepped-wedge trials is suboptimal. There is an urgent need for dissemination of the appropriate guidelines for reporting and methodological development to match the proliferation of the use of this design in practice. Time effects and repeated measures should be considered in all SW-CRT power calculations, and there should be clarity in reporting trials as cohort or cross-sectional designs. PMID:26846897

  3. Methods for sample size determination in cluster randomized trials

    PubMed Central

    Rutterford, Clare; Copas, Andrew; Eldridge, Sandra

    2015-01-01

    Background: The use of cluster randomized trials (CRTs) is increasing, along with the variety in their design and analysis. The simplest approach for their sample size calculation is to calculate the sample size assuming individual randomization and inflate this by a design effect to account for randomization by cluster. The assumptions of a simple design effect may not always be met; alternative or more complicated approaches are required. Methods: We summarise a wide range of sample size methods available for cluster randomized trials. For those familiar with sample size calculations for individually randomized trials but with less experience in the clustered case, this manuscript provides formulae for a wide range of scenarios with associated explanation and recommendations. For those with more experience, comprehensive summaries are provided that allow quick identification of methods for a given design, outcome and analysis method. Results: We present first those methods applicable to the simplest two-arm, parallel group, completely randomized design followed by methods that incorporate deviations from this design such as: variability in cluster sizes; attrition; non-compliance; or the inclusion of baseline covariates or repeated measures. The paper concludes with methods for alternative designs. Conclusions: There is a large amount of methodology available for sample size calculations in CRTs. This paper gives the most comprehensive description of published methodology for sample size calculation and provides an important resource for those designing these trials. PMID:26174515
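
    A minimal sketch (assuming Python with SciPy) of the simplest approach the review describes: compute the per-arm sample size under individual randomization, then inflate it by the design effect 1 + (m − 1)ρ. Function names and the example numbers are illustrative only.

    ```python
    from math import ceil
    from scipy.stats import norm

    def n_individual(delta, sd, alpha=0.05, power=0.80):
        """Per-arm sample size for a two-sample comparison of means."""
        z_a = norm.ppf(1 - alpha / 2)
        z_b = norm.ppf(power)
        return 2 * ((z_a + z_b) * sd / delta) ** 2

    def n_cluster_randomized(delta, sd, m, icc, alpha=0.05, power=0.80):
        """Inflate the individually randomized size by the design effect
        DEFF = 1 + (m - 1) * icc, with m the (assumed common) cluster size."""
        deff = 1 + (m - 1) * icc
        n = n_individual(delta, sd, alpha, power) * deff
        return ceil(n), ceil(n / m)  # subjects per arm, clusters per arm

    # Example: detect a 0.3 SD difference, 20 subjects per cluster, ICC = 0.05.
    print(n_cluster_randomized(delta=0.3, sd=1.0, m=20, icc=0.05))
    ```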

  4. MetaCAA: A clustering-aided methodology for efficient assembly of metagenomic datasets.

    PubMed

    Reddy, Rachamalla Maheedhar; Mohammed, Monzoorul Haque; Mande, Sharmila S

    2014-01-01

    A key challenge in analyzing metagenomics data pertains to assembly of sequenced DNA fragments (i.e. reads) originating from various microbes in a given environmental sample. Several existing methodologies can assemble reads originating from a single genome. However, these methodologies cannot be applied for efficient assembly of metagenomic sequence datasets. In this study, we present MetaCAA - a clustering-aided methodology which helps in improving the quality of metagenomic sequence assembly. MetaCAA initially groups sequences constituting a given metagenome into smaller clusters. Subsequently, sequences in each cluster are independently assembled using CAP3, an existing single genome assembly program. Contigs formed in each of the clusters along with the unassembled reads are then subjected to another round of assembly for generating the final set of contigs. Validation using simulated and real-world metagenomic datasets indicates that MetaCAA aids in improving the overall quality of assembly. A software implementation of MetaCAA is available at https://metagenomics.atc.tcs.com/MetaCAA. Copyright © 2014 Elsevier Inc. All rights reserved.

  5. A data-driven feature extraction framework for predicting the severity of condition of congestive heart failure patients.

    PubMed

    Sideris, Costas; Alshurafa, Nabil; Pourhomayoun, Mohammad; Shahmohammadi, Farhad; Samy, Lauren; Sarrafzadeh, Majid

    2015-01-01

    In this paper, we propose a novel methodology for utilizing disease diagnostic information to predict severity of condition for Congestive Heart Failure (CHF) patients. Our methodology relies on a novel, clustering-based feature extraction framework using disease diagnostic information. To reduce the dimensionality, we identify disease clusters using co-occurrence frequencies. We then utilize these clusters as features to predict patient severity of condition. We build our clustering and feature extraction algorithm using the 2012 National Inpatient Sample (NIS) of the Healthcare Cost and Utilization Project (HCUP), which contains 7 million discharge records and ICD-9-CM codes. The proposed framework is tested on Ronald Reagan UCLA Medical Center Electronic Health Records (EHR) from 3041 patients. We compare our cluster-based feature set with another that incorporates the Charlson comorbidity score as a feature and demonstrate an accuracy improvement of up to 14% in predicting severity of condition.
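
    The paper's code is not published; the sketch below only illustrates the general recipe the abstract describes, under stated assumptions (hypothetical data layout: one set of ICD-9-CM codes per discharge record; average-linkage clustering stands in for whatever linkage the authors used). Codes are clustered on a co-occurrence-derived distance, and a patient's feature vector counts codes per cluster.

    ```python
    import numpy as np
    from scipy.cluster.hierarchy import fcluster, linkage
    from scipy.spatial.distance import squareform

    def cooccurrence_clusters(records, codes, n_clusters=10):
        """records: list of sets of diagnosis codes, one set per discharge."""
        idx = {c: i for i, c in enumerate(codes)}
        co = np.zeros((len(codes), len(codes)))
        for rec in records:
            present = [idx[c] for c in rec if c in idx]
            for i in present:
                for j in present:
                    if i != j:
                        co[i, j] += 1
        sim = co / (co.max() or 1.0)      # similarity in [0, 1]
        np.fill_diagonal(sim, 1.0)
        dist = squareform(1.0 - sim)      # condensed distance matrix
        labels = fcluster(linkage(dist, method="average"),
                          n_clusters, criterion="maxclust")
        return dict(zip(codes, labels))

    def patient_features(rec, code_to_cluster, n_clusters=10):
        """Feature vector: how many of the patient's codes fall in each cluster."""
        feats = np.zeros(n_clusters)
        for c in rec:
            if c in code_to_cluster:
                feats[code_to_cluster[c] - 1] += 1
        return feats
    ```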

  6. Choosing a Cluster Sampling Design for Lot Quality Assurance Sampling Surveys

    PubMed Central

    Hund, Lauren; Bedrick, Edward J.; Pagano, Marcello

    2015-01-01

    Lot quality assurance sampling (LQAS) surveys are commonly used for monitoring and evaluation in resource-limited settings. Recently several methods have been proposed to combine LQAS with cluster sampling for more timely and cost-effective data collection. For some of these methods, the standard binomial model can be used for constructing decision rules as the clustering can be ignored. For other designs, considered here, clustering is accommodated in the design phase. In this paper, we compare these latter cluster LQAS methodologies and provide recommendations for choosing a cluster LQAS design. We compare technical differences in the three methods and determine situations in which the choice of method results in a substantively different design. We consider two different aspects of the methods: the distributional assumptions and the clustering parameterization. Further, we provide software tools for implementing each method and clarify misconceptions about these designs in the literature. We illustrate the differences in these methods using vaccination and nutrition cluster LQAS surveys as example designs. The cluster methods are not sensitive to the distributional assumptions but can result in substantially different designs (sample sizes) depending on the clustering parameterization. However, none of the clustering parameterizations used in the existing methods appears to be consistent with the observed data, and, consequently, choice between the cluster LQAS methods is not straightforward. Further research should attempt to characterize clustering patterns in specific applications and provide suggestions for best-practice cluster LQAS designs on a setting-specific basis. PMID:26125967

  7. Choosing a Cluster Sampling Design for Lot Quality Assurance Sampling Surveys.

    PubMed

    Hund, Lauren; Bedrick, Edward J; Pagano, Marcello

    2015-01-01

    Lot quality assurance sampling (LQAS) surveys are commonly used for monitoring and evaluation in resource-limited settings. Recently several methods have been proposed to combine LQAS with cluster sampling for more timely and cost-effective data collection. For some of these methods, the standard binomial model can be used for constructing decision rules as the clustering can be ignored. For other designs, considered here, clustering is accommodated in the design phase. In this paper, we compare these latter cluster LQAS methodologies and provide recommendations for choosing a cluster LQAS design. We compare technical differences in the three methods and determine situations in which the choice of method results in a substantively different design. We consider two different aspects of the methods: the distributional assumptions and the clustering parameterization. Further, we provide software tools for implementing each method and clarify misconceptions about these designs in the literature. We illustrate the differences in these methods using vaccination and nutrition cluster LQAS surveys as example designs. The cluster methods are not sensitive to the distributional assumptions but can result in substantially different designs (sample sizes) depending on the clustering parameterization. However, none of the clustering parameterizations used in the existing methods appears to be consistent with the observed data, and, consequently, choice between the cluster LQAS methods is not straightforward. Further research should attempt to characterize clustering patterns in specific applications and provide suggestions for best-practice cluster LQAS designs on a setting-specific basis.
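
    For the designs in which clustering can be ignored, the LQAS decision rule reduces to a binomial calculation. A hedged sketch (SciPy assumed; the n = 19 design and the 50%/80% thresholds are illustrative, not taken from the paper):

    ```python
    from scipy.stats import binom

    def lqas_errors(n, d, p_low, p_high):
        """Decision rule: 'accept' the lot if more than d sampled subjects
        are covered.  Returns the two classification error probabilities."""
        accept_bad = 1 - binom.cdf(d, n, p_low)   # accepting a low-coverage lot
        reject_good = binom.cdf(d, n, p_high)     # rejecting a high-coverage lot
        return accept_bad, reject_good

    # Illustrative design: n = 19, classify against 50% vs 80% coverage.
    for d in range(8, 15):
        a, b = lqas_errors(19, d, 0.50, 0.80)
        print(f"d={d}: P(accept|p=0.50)={a:.3f}  P(reject|p=0.80)={b:.3f}")
    ```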

  8. Cluster Stability Estimation Based on a Minimal Spanning Trees Approach

    NASA Astrophysics Data System (ADS)

    Volkovich, Zeev (Vladimir); Barzily, Zeev; Weber, Gerhard-Wilhelm; Toledano-Kitai, Dvora

    2009-08-01

    Among the areas of data and text mining which are employed today in science, economy and technology, clustering theory serves as a preprocessing step in data analysis. However, many open questions still await theoretical and practical treatment, e.g., the problem of determining the true number of clusters has not been satisfactorily solved. In the current paper, this problem is addressed by the cluster stability approach. For several possible numbers of clusters, we estimate the stability of partitions obtained from clustering of samples. Partitions are considered consistent if their clusters are stable. Cluster validity is measured as the total number of edges, in the clusters' minimal spanning trees, connecting points from different samples; in fact, we use the Friedman and Rafsky two-sample test statistic. The homogeneity hypothesis, of well-mingled samples within the clusters, leads to an asymptotic normal distribution of this statistic. Resting upon this fact, the standard score of the number of such edges is computed, and the partition quality is represented by the worst cluster, corresponding to the minimal standard score value. It is natural to expect that the true number of clusters can be characterized by the empirical distribution having the shortest left tail. The proposed methodology sequentially creates the described value distribution and estimates its left-asymmetry. Numerical experiments presented in the paper demonstrate the ability of the approach to detect the true number of clusters.
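
    A minimal sketch of the core ingredient (NumPy/SciPy assumed): pool two samples, build the Euclidean minimum spanning tree, and count the edges joining points from different samples, which is the Friedman-Rafsky two-sample statistic the abstract cites. A large count indicates well-mingled samples.

    ```python
    import numpy as np
    from scipy.sparse.csgraph import minimum_spanning_tree
    from scipy.spatial.distance import cdist

    def cross_sample_edges(x, y):
        """Friedman-Rafsky statistic: MST edges joining the two samples."""
        pooled = np.vstack([x, y])
        labels = np.r_[np.zeros(len(x)), np.ones(len(y))]
        mst = minimum_spanning_tree(cdist(pooled, pooled)).tocoo()
        return int(sum(labels[i] != labels[j] for i, j in zip(mst.row, mst.col)))

    rng = np.random.default_rng(0)
    x, y = rng.normal(0, 1, (50, 2)), rng.normal(0, 1, (50, 2))
    print(cross_sample_edges(x, y))  # same distribution: count is large
    ```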

  9. Eye-gaze determination of user intent at the computer interface

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Goldberg, J.H.; Schryver, J.C.

    1993-12-31

    Determination of user intent at the computer interface through eye-gaze monitoring can significantly aid applications for the disabled, as well as telerobotics and process control interfaces. Whereas current eye-gaze control applications are limited to object selection and x/y gazepoint tracking, a methodology was developed here to discriminate a more abstract interface operation: zooming in or out. This methodology first collects samples of eye-gaze location looking at controlled stimuli, at 30 Hz, just prior to a user's decision to zoom. The sample is broken into data frames, or temporal snapshots. Within a data frame, all spatial samples are connected into a minimum spanning tree, then clustered, according to user-defined parameters. Each cluster is mapped to one in the prior data frame, and statistics are computed from each cluster. These characteristics include cluster size, position, and pupil size. A multiple discriminant analysis uses these statistics both within and between data frames to formulate optimal rules for assigning the observations into zoom-in, zoom-out, or no-zoom conditions. The statistical procedure effectively generates heuristics for future assignments, based upon these variables. Future work will enhance the accuracy and precision of the modeling technique, and will empirically test users in controlled experiments.

  10. The XMM Cluster Survey: X-ray analysis methodology

    NASA Astrophysics Data System (ADS)

    Lloyd-Davies, E. J.; Romer, A. Kathy; Mehrtens, Nicola; Hosmer, Mark; Davidson, Michael; Sabirli, Kivanc; Mann, Robert G.; Hilton, Matt; Liddle, Andrew R.; Viana, Pedro T. P.; Campbell, Heather C.; Collins, Chris A.; Dubois, E. Naomi; Freeman, Peter; Harrison, Craig D.; Hoyle, Ben; Kay, Scott T.; Kuwertz, Emma; Miller, Christopher J.; Nichol, Robert C.; Sahlén, Martin; Stanford, S. A.; Stott, John P.

    2011-11-01

    The XMM Cluster Survey (XCS) is a serendipitous search for galaxy clusters using all publicly available data in the XMM-Newton Science Archive. Its main aims are to measure cosmological parameters and trace the evolution of X-ray scaling relations. In this paper we describe the data processing methodology applied to the 5776 XMM observations used to construct the current XCS source catalogue. A total of 3675 > 4σ cluster candidates with > 50 background-subtracted X-ray counts are extracted from a total non-overlapping area suitable for cluster searching of 410 deg². Of these, 993 candidates are detected with > 300 background-subtracted X-ray photon counts, and we demonstrate that robust temperature measurements can be obtained down to this count limit. We describe in detail the automated pipelines used to perform the spectral and surface brightness fitting for these candidates, as well as to estimate redshifts from the X-ray data alone. A total of 587 (122) X-ray temperatures to a typical accuracy of < 40 (< 10) per cent have been measured to date. We also present the methodology adopted for determining the selection function of the survey, and show that the extended source detection algorithm is robust to a range of cluster morphologies by inserting mock clusters derived from hydrodynamical simulations into real XMM images. These tests show that a simple isothermal β-profile is sufficient to capture the essential details of the cluster population detected in the archival XMM observations. The redshift follow-up of the XCS cluster sample is presented in a companion paper, together with a first data release of 503 optically confirmed clusters.

  11. Reporting and methodological quality of sample size calculations in cluster randomized trials could be improved: a review.

    PubMed

    Rutterford, Clare; Taljaard, Monica; Dixon, Stephanie; Copas, Andrew; Eldridge, Sandra

    2015-06-01

    To assess the quality of reporting and accuracy of a priori estimates used in sample size calculations for cluster randomized trials (CRTs). We reviewed 300 CRTs published between 2000 and 2008. The prevalence of reporting sample size elements from the 2004 CONSORT recommendations was evaluated and a priori estimates compared with those observed in the trial. Of the 300 trials, 166 (55%) reported a sample size calculation. Only 36 of 166 (22%) reported all recommended descriptive elements. Elements specific to CRTs were the worst reported: a measure of within-cluster correlation was specified in only 58 of 166 (35%). Only 18 of 166 articles (11%) reported both a priori and observed within-cluster correlation values. Except in two cases, observed within-cluster correlation values were either close to or less than a priori values. Even with the CONSORT extension for cluster randomization, the reporting of sample size elements specific to these trials remains below that necessary for transparent reporting. Journal editors and peer reviewers should implement stricter requirements for authors to follow CONSORT recommendations. Authors should report observed and a priori within-cluster correlation values to enable comparisons between these over a wider range of trials. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.

  12. The Hubble Space Telescope Medium Deep Survey Cluster Sample: Methodology and Data

    NASA Astrophysics Data System (ADS)

    Ostrander, E. J.; Nichol, R. C.; Ratnatunga, K. U.; Griffiths, R. E.

    1998-12-01

    We present a new, objectively selected, sample of galaxy overdensities detected in the Hubble Space Telescope Medium Deep Survey (MDS). These clusters/groups were found using an automated procedure that involved searching for statistically significant galaxy overdensities. The contrast of the clusters against the field galaxy population is increased when morphological data are used to search around bulge-dominated galaxies. In total, we present 92 overdensities above a probability threshold of 99.5%. We show, via extensive Monte Carlo simulations, that at least 60% of these overdensities are likely to be real clusters and groups and not random line-of-sight superpositions of galaxies. For each overdensity in the MDS cluster sample, we provide a richness and the average of the bulge-to-total ratio of galaxies within each system. This MDS cluster sample potentially contains some of the most distant clusters/groups ever detected, with about 25% of the overdensities having estimated redshifts z ≳ 0.9. We have made this sample publicly available to facilitate spectroscopic confirmation of these clusters and help more detailed studies of cluster and galaxy evolution. We also report the serendipitous discovery of a new cluster close on the sky to the rich optical cluster Cl 0016+16 at z = 0.546. This new overdensity, HST 001831+16208, may be coincident with both an X-ray source and a radio source. HST 001831+16208 is the third cluster/group discovered near to Cl 0016+16 and appears to strengthen the claims of Connolly et al. of superclustering at high redshift.

  13. What Constitutes Adoption of the Web: A Methodological Problem in Assessing Adoption of the World Wide Web for Electronic Commerce.

    ERIC Educational Resources Information Center

    White, Marilyn Domas; Abels, Eileen G.; Gordon-Murnane, Laura

    1998-01-01

    Reports on methodological developments in a project to assess the adoption of the Web by publishers of business information for electronic commerce. Describes the approach used on a sample of 20 business publishers to identify five clusters of publishers ranging from traditionalist to innovator. Distinguishes between adopters and nonadopters of…

  14. A hierarchical clustering methodology for the estimation of toxicity.

    PubMed

    Martin, Todd M; Harten, Paul; Venkatapathy, Raghuraman; Das, Shashikala; Young, Douglas M

    2008-01-01

    A quantitative structure-activity relationship (QSAR) methodology based on hierarchical clustering was developed to predict toxicological endpoints. This methodology utilizes Ward's method to divide a training set into a series of structurally similar clusters. The structural similarity is defined in terms of 2-D physicochemical descriptors (such as connectivity and E-state indices). A genetic algorithm-based technique is used to generate statistically valid QSAR models for each cluster (using the pool of descriptors described above). The toxicity for a given query compound is estimated using the weighted average of the predictions from the closest cluster from each step in the hierarchical clustering assuming that the compound is within the domain of applicability of the cluster. The hierarchical clustering methodology was tested using a Tetrahymena pyriformis acute toxicity data set containing 644 chemicals in the training set and with two prediction sets containing 339 and 110 chemicals. The results from the hierarchical clustering methodology were compared to the results from several different QSAR methodologies.
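
    A toy sketch of the clustering step under stated assumptions (NumPy/SciPy; `X_train` rows are descriptor vectors, `y_train` the endpoint values): compounds are clustered on their descriptors with Ward's method, and a query is scored from the nearest cluster. The per-cluster genetic-algorithm QSAR models of the paper are replaced here by a simple cluster-mean prediction.

    ```python
    import numpy as np
    from scipy.cluster.hierarchy import fcluster, ward
    from scipy.spatial.distance import pdist

    def cluster_estimate(X_train, y_train, x_query, n_clusters=5):
        """Cluster compounds with Ward's method; estimate the query's
        toxicity from the nearest cluster's mean endpoint (a simple
        stand-in for per-cluster QSAR models)."""
        labels = fcluster(ward(pdist(X_train)), n_clusters, criterion="maxclust")
        centroids = np.array([X_train[labels == k].mean(axis=0)
                              for k in range(1, n_clusters + 1)])
        nearest = np.argmin(np.linalg.norm(centroids - x_query, axis=1)) + 1
        return y_train[labels == nearest].mean()
    ```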

  15. An imbalance in cluster sizes does not lead to notable loss of power in cross-sectional, stepped-wedge cluster randomised trials with a continuous outcome.

    PubMed

    Kristunas, Caroline A; Smith, Karen L; Gray, Laura J

    2017-03-07

    The current methodology for sample size calculations for stepped-wedge cluster randomised trials (SW-CRTs) is based on the assumption of equal cluster sizes. However, as is often the case in cluster randomised trials (CRTs), the clusters in SW-CRTs are likely to vary in size, which in other designs of CRT leads to a reduction in power. The effect of an imbalance in cluster size on the power of SW-CRTs has not previously been reported, nor what an appropriate adjustment to the sample size calculation should be to allow for any imbalance. We aimed to assess the impact of an imbalance in cluster size on the power of a cross-sectional SW-CRT and recommend a method for calculating the sample size of a SW-CRT when there is an imbalance in cluster size. The effect of varying degrees of imbalance in cluster size on the power of SW-CRTs was investigated using simulations. The sample size was calculated using both the standard method and two proposed adjusted design effects (DEs), based on those suggested for CRTs with unequal cluster sizes. The data were analysed using generalised estimating equations with an exchangeable correlation matrix and robust standard errors. An imbalance in cluster size was not found to have a notable effect on the power of SW-CRTs. The two proposed adjusted DEs resulted in trials that were generally considerably over-powered. We recommend that the standard method of sample size calculation for SW-CRTs be used, provided that the assumptions of the method hold. However, it would be beneficial to investigate, through simulation, what effect the maximum likely amount of inequality in cluster sizes would have on the power of the trial and whether any inflation of the sample size would be required.
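
    For context, adjusted design effects of the kind evaluated in this study typically inflate the standard design effect by the coefficient of variation (CV) of cluster sizes. One commonly cited form is sketched below (the paper's two exact variants may differ; this is illustrative only):

    ```python
    import numpy as np

    def design_effect_unequal(sizes, icc):
        """Adjusted design effect 1 + ((cv**2 + 1) * m_bar - 1) * icc,
        where m_bar is the mean cluster size and cv the coefficient of
        variation of the cluster sizes."""
        sizes = np.asarray(sizes, dtype=float)
        m_bar = sizes.mean()
        cv = sizes.std() / m_bar
        return 1 + ((cv**2 + 1) * m_bar - 1) * icc

    # Unequal clusters inflate the design effect relative to a common size.
    print(design_effect_unequal([10, 20, 30, 60], icc=0.05))  # unequal
    print(design_effect_unequal([30, 30, 30, 30], icc=0.05))  # equal
    ```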

  16. On the Distribution of Orbital Poles of Milky Way Satellites

    NASA Astrophysics Data System (ADS)

    Palma, Christopher; Majewski, Steven R.; Johnston, Kathryn V.

    2002-01-01

    In numerous studies of the outer Galactic halo some evidence for accretion has been found. If the outer halo did form in part or wholly through merger events, we might expect to find coherent streams of stars and globular clusters following orbits similar to those of their parent objects, which are assumed to be present or former Milky Way dwarf satellite galaxies. We present a study of this phenomenon by assessing the likelihood of potential descendant "dynamical families" in the outer halo. We conduct two analyses: one that involves a statistical analysis of the spatial distribution of all known Galactic dwarf satellite galaxies (DSGs) and globular clusters, and a second, more specific analysis of those globular clusters and DSGs for which full phase space dynamical data exist. In both cases our methodology is appropriate only to members of descendant dynamical families that retain nearly aligned orbital poles today. Since the Sagittarius dwarf (Sgr) is considered a paradigm for the type of merger/tidal interaction event for which we are searching, we also undertake a case study of the Sgr system and identify several globular clusters that may be members of its extended dynamical family. In our first analysis, the distribution of possible orbital poles for the entire sample of outer (Rgc > 8 kpc) halo globular clusters is tested for statistically significant associations among globular clusters and DSGs. Our methodology for identifying possible associations is similar to that used by Lynden-Bell & Lynden-Bell, but we put the associations on a more statistical foundation. Moreover, we study the degree of possible dynamical clustering among various interesting ensembles of globular clusters and satellite galaxies. Among the ensembles studied, we find the globular cluster subpopulation with the highest statistical likelihood of association with one or more of the Galactic DSGs to be the distant, outer halo (Rgc > 25 kpc), second-parameter globular clusters. The results of our orbital pole analysis are supported by the great circle cell count methodology of Johnston, Hernquist, & Bolte. The space motions of the clusters Pal 4, NGC 6229, NGC 7006, and Pyxis are predicted to be among those most likely to show the clusters to be following stream orbits, since these clusters are responsible for the majority of the statistical significance of the association between outer halo, second-parameter globular clusters and the Milky Way DSGs. In our second analysis, we study the orbits of the 41 globular clusters and six Milky Way-bound DSGs having measured proper motions to look for objects with both coplanar orbits and similar angular momenta. Unfortunately, the majority of globular clusters with measured proper motions are inner halo clusters that are less likely to retain memory of their original orbit. Although four potential globular cluster/DSG associations are found, we believe three of these associations involving inner halo clusters to be coincidental. While the present sample of objects with complete dynamical data is small and does not include many of the globular clusters that are more likely to have been captured by the Milky Way, the methodology we adopt will become increasingly powerful as more proper motions are measured for distant Galactic satellites and globular clusters, and especially as results from the Space Interferometry Mission (SIM) become available.

  17. Regional health care planning: a methodology to cluster facilities using community utilization patterns

    PubMed Central

    2013-01-01

    Background Community-based health care planning and regulation necessitates grouping facilities and areal units into regions of similar health care use. Limited research has explored the methodologies used in creating these regions. We offer a new methodology that clusters facilities based on similarities in patient utilization patterns and geographic location. Our case study focused on Hospital Groups in Michigan, the allocation units used for predicting future inpatient hospital bed demand in the state’s Bed Need Methodology. The scientific, practical, and political concerns that were considered throughout the formulation and development of the methodology are detailed. Methods The clustering methodology employs a 2-step K-means + Ward’s clustering algorithm to group hospitals. The final number of clusters is selected using a heuristic that integrates both a statistical-based measure of cluster fit and characteristics of the resulting Hospital Groups. Results Using recent hospital utilization data, the clustering methodology identified 33 Hospital Groups in Michigan. Conclusions Despite being developed within the politically charged climate of Certificate of Need regulation, we have provided an objective, replicable, and sustainable methodology to create Hospital Groups. Because the methodology is built upon theoretically sound principles of clustering analysis and health care service utilization, it is highly transferable across applications and suitable for grouping facilities or areal units. PMID:23964905

  18. Regional health care planning: a methodology to cluster facilities using community utilization patterns.

    PubMed

    Delamater, Paul L; Shortridge, Ashton M; Messina, Joseph P

    2013-08-22

    Community-based health care planning and regulation necessitates grouping facilities and areal units into regions of similar health care use. Limited research has explored the methodologies used in creating these regions. We offer a new methodology that clusters facilities based on similarities in patient utilization patterns and geographic location. Our case study focused on Hospital Groups in Michigan, the allocation units used for predicting future inpatient hospital bed demand in the state's Bed Need Methodology. The scientific, practical, and political concerns that were considered throughout the formulation and development of the methodology are detailed. The clustering methodology employs a 2-step K-means + Ward's clustering algorithm to group hospitals. The final number of clusters is selected using a heuristic that integrates both a statistical-based measure of cluster fit and characteristics of the resulting Hospital Groups. Using recent hospital utilization data, the clustering methodology identified 33 Hospital Groups in Michigan. Despite being developed within the politically charged climate of Certificate of Need regulation, we have provided an objective, replicable, and sustainable methodology to create Hospital Groups. Because the methodology is built upon theoretically sound principles of clustering analysis and health care service utilization, it is highly transferable across applications and suitable for grouping facilities or areal units.
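
    A sketch of one common two-step K-means + Ward's variant, under stated assumptions (scikit-learn and SciPy; the paper's exact recipe, feature construction, and the heuristic for the final number of groups are not reproduced): K-means pre-clusters the hospitals, then Ward's method merges the pre-cluster centroids.

    ```python
    import numpy as np
    from scipy.cluster.hierarchy import fcluster, ward
    from sklearn.cluster import KMeans

    def two_step_groups(X, n_pre=50, n_groups=33, seed=0):
        """X: one row per hospital (e.g. utilization shares by patient ZIP
        plus coordinates).  K-means pre-clustering, then Ward's method on
        the pre-cluster centroids; returns a group label per hospital."""
        km = KMeans(n_clusters=n_pre, n_init=10, random_state=seed).fit(X)
        merged = fcluster(ward(km.cluster_centers_), n_groups,
                          criterion="maxclust")
        return merged[km.labels_]
    ```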

  19. Comparing the performance of cluster random sampling and integrated threshold mapping for targeting trachoma control, using computer simulation.

    PubMed

    Smith, Jennifer L; Sturrock, Hugh J W; Olives, Casey; Solomon, Anthony W; Brooker, Simon J

    2013-01-01

    Implementation of trachoma control strategies requires reliable district-level estimates of trachomatous inflammation-follicular (TF), generally collected using the recommended gold-standard cluster randomized surveys (CRS). Integrated Threshold Mapping (ITM) has been proposed as an integrated and cost-effective means of rapidly surveying trachoma in order to classify districts according to treatment thresholds. ITM differs from CRS in a number of important ways, including the use of a school-based sampling platform for children aged 1-9 and a different age distribution of participants. This study uses computerised sampling simulations to compare the performance of these survey designs and evaluate the impact of varying key parameters. Realistic pseudo gold standard data for 100 districts were generated that maintained the relative risk of disease between important sub-groups and incorporated empirical estimates of disease clustering at the household, village and district level. To simulate the different sampling approaches, 20 clusters were selected from each district, with individuals sampled according to the protocol for ITM and CRS. Results showed that ITM generally under-estimated the true prevalence of TF over a range of epidemiological settings and introduced more district misclassification according to treatment thresholds than did CRS. However, the extent of underestimation and resulting misclassification was found to be dependent on three main factors: (i) the district prevalence of TF; (ii) the relative risk of TF between enrolled and non-enrolled children within clusters; and (iii) the enrollment rate in schools. Although in some contexts the two methodologies may be equivalent, ITM can introduce a bias-dependent shift as prevalence of TF increases, resulting in a greater risk of misclassification around treatment thresholds. In addition to strengthening the evidence base around choice of trachoma survey methodologies, this study illustrates the use of a simulated approach in addressing operational research questions for trachoma but also other NTDs.

  20. Cluster-randomized Studies in Educational Research: Principles and Methodological Aspects

    PubMed Central

    Dreyhaupt, Jens; Mayer, Benjamin; Keis, Oliver; Öchsner, Wolfgang; Muche, Rainer

    2017-01-01

    An increasing number of studies are being performed in educational research to evaluate new teaching methods and approaches. These studies could be performed more efficiently and deliver more convincing results if they more strictly applied and complied with recognized standards of scientific studies. Such an approach could substantially increase the quality in particular of prospective, two-arm (intervention) studies that aim to compare two different teaching methods. A key standard in such studies is randomization, which can minimize systematic bias in study findings; such bias may result if the two study arms are not structurally equivalent. If possible, educational research studies should also achieve this standard, although this is not yet generally the case. Some difficulties and concerns exist, particularly regarding organizational and methodological aspects. An important point to consider in educational research studies is that usually individuals cannot be randomized, because of the teaching situation, and instead whole groups have to be randomized (so-called “cluster randomization”). Compared with studies with individual randomization, studies with cluster randomization normally require (significantly) larger sample sizes and more complex methods for calculating sample size. Furthermore, cluster-randomized studies require more complex methods for statistical analysis. The consequence of the above is that a competent expert with respective special knowledge needs to be involved in all phases of cluster-randomized studies. Studies to evaluate new teaching methods need to make greater use of randomization in order to achieve scientifically convincing results. Therefore, in this article we describe the general principles of cluster randomization and how to implement these principles, and we also outline practical aspects of using cluster randomization in prospective, two-arm comparative educational research studies. PMID:28584874

  1. Cluster-randomized Studies in Educational Research: Principles and Methodological Aspects.

    PubMed

    Dreyhaupt, Jens; Mayer, Benjamin; Keis, Oliver; Öchsner, Wolfgang; Muche, Rainer

    2017-01-01

    An increasing number of studies are being performed in educational research to evaluate new teaching methods and approaches. These studies could be performed more efficiently and deliver more convincing results if they more strictly applied and complied with recognized standards of scientific studies. Such an approach could substantially increase the quality in particular of prospective, two-arm (intervention) studies that aim to compare two different teaching methods. A key standard in such studies is randomization, which can minimize systematic bias in study findings; such bias may result if the two study arms are not structurally equivalent. If possible, educational research studies should also achieve this standard, although this is not yet generally the case. Some difficulties and concerns exist, particularly regarding organizational and methodological aspects. An important point to consider in educational research studies is that usually individuals cannot be randomized, because of the teaching situation, and instead whole groups have to be randomized (so-called "cluster randomization"). Compared with studies with individual randomization, studies with cluster randomization normally require (significantly) larger sample sizes and more complex methods for calculating sample size. Furthermore, cluster-randomized studies require more complex methods for statistical analysis. The consequence of the above is that a competent expert with respective special knowledge needs to be involved in all phases of cluster-randomized studies. Studies to evaluate new teaching methods need to make greater use of randomization in order to achieve scientifically convincing results. Therefore, in this article we describe the general principles of cluster randomization and how to implement these principles, and we also outline practical aspects of using cluster randomization in prospective, two-arm comparative educational research studies.

  2. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Carmichael, Joshua Daniel; Carr, Christina; Pettit, Erin C.

    We apply a fully autonomous icequake detection methodology to a single day of high-sample-rate (200 Hz) seismic network data recorded from the terminus of Taylor Glacier, Antarctica, that temporally coincided with a brine-release episode near Blood Falls (May 13, 2014). We demonstrate a statistically validated procedure to assemble waveforms triggered by icequakes into populations of clusters linked by intra-event waveform similarity. Our processing methodology implements a noise-adaptive power detector coupled with a complete-linkage clustering algorithm and noise-adaptive correlation detector. This detector-chain reveals a population of 20 multiplet sequences that includes ~150 icequakes and produces zero false alarms on the concurrent, diurnally variable noise. Our results are very promising for identifying changes in background seismicity associated with the presence or absence of brine-release episodes. We thereby suggest that our methodology could be applied to longer time periods to establish a brine-release monitoring program for Blood Falls that is based on icequake detections.
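
    A sketch of the multiplet-assembly stage only, under the assumption that detections are already available as aligned, equal-length waveform snippets (NumPy/SciPy): maximum normalized cross-correlation as similarity, complete-linkage clustering to form families. The noise-adaptive power and correlation detectors are not reproduced, and the 0.7 threshold is illustrative.

    ```python
    import numpy as np
    from scipy.cluster.hierarchy import fcluster, linkage
    from scipy.signal import correlate

    def max_norm_xcorr(a, b):
        """Peak normalized cross-correlation between equal-length traces."""
        a = (a - a.mean()) / (a.std() * len(a))
        b = (b - b.mean()) / b.std()
        return correlate(a, b, mode="full").max()

    def multiplet_clusters(waveforms, cc_threshold=0.7):
        """Complete linkage on distance 1 - max cross-correlation; the cut
        guarantees every intra-cluster pair correlates above threshold."""
        n = len(waveforms)
        d = np.array([1 - max_norm_xcorr(waveforms[i], waveforms[j])
                      for i in range(n) for j in range(i + 1, n)])
        return fcluster(linkage(d, method="complete"),
                        1 - cc_threshold, criterion="distance")
    ```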

  3. A Hierarchical Clustering Methodology for the Estimation of Toxicity

    EPA Science Inventory

    A Quantitative Structure Activity Relationship (QSAR) methodology based on hierarchical clustering was developed to predict toxicological endpoints. This methodology utilizes Ward's method to divide a training set into a series of structurally similar clusters. The structural sim...

  4. Construction of a catalog of colliding galaxy clusters [Construcción de un catálogo de cúmulos de galaxias en proceso de colisión]

    NASA Astrophysics Data System (ADS)

    de los Ríos, M.; Domínguez, M. J.; Paz, D.

    2015-08-01

    In this work we present the first results of the identification of colliding galaxy clusters in galaxy catalogs with redshift measurements (SDSS, 2dF), and introduce the methodology. We calibrated the method by studying the merger trees of clusters in a mock catalog based on a full-blown semi-analytic model of galaxy formation on top of the Millennium cosmological simulation. We also discuss future actions for studying our sample of colliding galaxy clusters, including X-ray observations and mass reconstruction using weak gravitational lensing.

  5. Irregular Breakfast Eating and Associated Health Behaviors: A Pilot Study among College Students

    ERIC Educational Resources Information Center

    Thiagarajah, Krisha; Torabi, Mohammad R.

    2009-01-01

    The purpose of this study was to examine prevalence of eating breakfast and associated health compromising behaviors. This study utilized a cross-sectional survey methodology. A purposive cluster sampling technique was utilized to collect data from a representative sample of college students in a Midwestern university in the U.S. A total of 1,257…

  6. A flexible data-driven comorbidity feature extraction framework.

    PubMed

    Sideris, Costas; Pourhomayoun, Mohammad; Kalantarian, Haik; Sarrafzadeh, Majid

    2016-06-01

    Disease and symptom diagnostic codes are a valuable resource for classifying and predicting patient outcomes. In this paper, we propose a novel methodology for utilizing disease diagnostic information in a predictive machine learning framework. Our methodology relies on a novel, clustering-based feature extraction framework using disease diagnostic information. To reduce the data dimensionality, we identify disease clusters using co-occurrence statistics. We optimize the number of generated clusters in the training set and then utilize these clusters as features to predict patient severity of condition and patient readmission risk. We build our clustering and feature extraction algorithm using the 2012 National Inpatient Sample (NIS) of the Healthcare Cost and Utilization Project (HCUP), which contains 7 million hospital discharge records and ICD-9-CM codes. The proposed framework is tested on Ronald Reagan UCLA Medical Center Electronic Health Records (EHR) from 3041 Congestive Heart Failure (CHF) patients and the UCI 130-US diabetes dataset that includes admissions from 69,980 diabetic patients. We compare our cluster-based feature set with the commonly used comorbidity frameworks including Charlson's index, Elixhauser's comorbidities and their variations. The proposed approach was shown to have significant gains of 10.7-22.1% in predictive accuracy for CHF severity of condition prediction and 4.65-5.75% in diabetes readmission prediction. Copyright © 2016 Elsevier Ltd. All rights reserved.

  7. Increase in iodine deficiency disorder due to inadequate sustainability of supply of iodized salt in District Solan, Himachal Pradesh.

    PubMed

    Kapil, Umesh; Pandey, R M; Jain, Vandana; Kabra, Madhulika; Sareen, Neha; Bhadoria, Ajeet Singh; Vijay, Jyoti; Nigam, Sukirty; Khenduja, Preetika

    2013-12-01

    Himachal Pradesh is a known endemic area for iodine deficiency disorders. A study was conducted in district Solan with the objective of assessing the prevalence of iodine deficiency disorders in school-age children. Thirty clusters were selected by using the probability-proportionate-to-size cluster sampling methodology. Clinical examination of the thyroid of 1898 children in the age-group of 6-12 years was conducted. Urine and salt samples were collected. The total goiter rate was found to be 15.4%. Median urinary iodine excretion level was 62.5 μg/l. Only 39% of the salt samples had iodine content of ≥15 ppm. Mild iodine deficiency was present in the subjects studied.
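
    The probability-proportionate-to-size step used in such 30-cluster surveys is a systematic walk along the cumulative population list; a minimal sketch (NumPy; the village populations are invented for illustration):

    ```python
    import numpy as np

    def pps_systematic(populations, n_clusters=30, seed=0):
        """Systematic PPS sampling: step through the cumulative population
        list with a fixed interval and a random start, so villages spanning
        more of the cumulative total are more likely to be selected."""
        rng = np.random.default_rng(seed)
        cum = np.cumsum(populations)
        step = cum[-1] / n_clusters
        targets = rng.uniform(0, step) + step * np.arange(n_clusters)
        return np.searchsorted(cum, targets)  # indices of selected villages

    print(pps_systematic([1200, 450, 3100, 800, 2600, 950], n_clusters=3))
    ```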

  8. Sample size calculations for the design of cluster randomized trials: A summary of methodology.

    PubMed

    Gao, Fei; Earnest, Arul; Matchar, David B; Campbell, Michael J; Machin, David

    2015-05-01

    Cluster randomized trial designs are growing in popularity in, for example, cardiovascular medicine research and other clinical areas and parallel statistical developments concerned with the design and analysis of these trials have been stimulated. Nevertheless, reviews suggest that design issues associated with cluster randomized trials are often poorly appreciated and there remain inadequacies in, for example, describing how the trial size is determined and the associated results are presented. In this paper, our aim is to provide pragmatic guidance for researchers on the methods of calculating sample sizes. We focus attention on designs with the primary purpose of comparing two interventions with respect to continuous, binary, ordered categorical, incidence rate and time-to-event outcome variables. Issues of aggregate and non-aggregate cluster trials, adjustment for variation in cluster size and the effect size are detailed. The problem of establishing the anticipated magnitude of between- and within-cluster variation to enable planning values of the intra-cluster correlation coefficient and the coefficient of variation are also described. Illustrative examples of calculations of trial sizes for each endpoint type are included. Copyright © 2015 Elsevier Inc. All rights reserved.

  9. A two-stage cluster sampling method using gridded population data, a GIS, and Google Earth™ imagery in a population-based mortality survey in Iraq.

    PubMed

    Galway, LP; Bell, Nathaniel; Al Shatari, SAE; Hagopian, Amy; Burnham, Gilbert; Flaxman, Abraham; Weiss, William M; Rajaratnam, Julie; Takaro, Tim K

    2012-04-27

    Mortality estimates can measure and monitor the impacts of conflict on a population, guide humanitarian efforts, and help to better understand the public health impacts of conflict. Vital statistics registration and surveillance systems are rarely functional in conflict settings, posing a challenge of estimating mortality using retrospective population-based surveys. We present a two-stage cluster sampling method for application in population-based mortality surveys. The sampling method utilizes gridded population data and a geographic information system (GIS) to select clusters in the first sampling stage and Google Earth™ imagery and sampling grids to select households in the second sampling stage. The sampling method is implemented in a household mortality study in Iraq in 2011. Factors affecting feasibility and methodological quality are described. Sampling is a challenge in retrospective population-based mortality studies and alternatives that improve on the conventional approaches are needed. The sampling strategy presented here was designed to generate a representative sample of the Iraqi population while reducing the potential for bias and considering the context specific challenges of the study setting. This sampling strategy, or variations on it, are adaptable and should be considered and tested in other conflict settings.

  10. A two-stage cluster sampling method using gridded population data, a GIS, and Google Earth™ imagery in a population-based mortality survey in Iraq

    PubMed Central

    2012-01-01

    Background Mortality estimates can measure and monitor the impacts of conflict on a population, guide humanitarian efforts, and help to better understand the public health impacts of conflict. Vital statistics registration and surveillance systems are rarely functional in conflict settings, posing a challenge of estimating mortality using retrospective population-based surveys. Results We present a two-stage cluster sampling method for application in population-based mortality surveys. The sampling method utilizes gridded population data and a geographic information system (GIS) to select clusters in the first sampling stage and Google Earth™ imagery and sampling grids to select households in the second sampling stage. The sampling method is implemented in a household mortality study in Iraq in 2011. Factors affecting feasibility and methodological quality are described. Conclusion Sampling is a challenge in retrospective population-based mortality studies and alternatives that improve on the conventional approaches are needed. The sampling strategy presented here was designed to generate a representative sample of the Iraqi population while reducing the potential for bias and considering the context specific challenges of the study setting. This sampling strategy, or variations on it, are adaptable and should be considered and tested in other conflict settings. PMID:22540266

  11. Further observations on comparison of immunization coverage by lot quality assurance sampling and 30 cluster sampling.

    PubMed

    Singh, J; Jain, D C; Sharma, R S; Verghese, T

    1996-06-01

    Lot Quality Assurance Sampling (LQAS) and standard EPI methodology (30 cluster sampling) were used to evaluate immunization coverage in a Primary Health Center (PHC) where coverage levels were reported to be more than 85%. Of 27 sub-centers (lots) evaluated by LQAS, only 2 were accepted for child coverage, whereas none was accepted for tetanus toxoid (TT) coverage in mothers. LQAS data were combined to obtain an estimate of coverage in the entire population; 41% (95% CI 36-46) of infants were immunized appropriately for their ages, while 42% (95% CI 37-47) of their mothers had received a second/booster dose of TT. TT coverage in 149 contemporary mothers sampled in the EPI survey was also 42% (95% CI 31-52). Although results by the two sampling methods were consistent with each other, a big gap was evident between reported coverage (in children as well as mothers) and survey results. LQAS was found to be operationally feasible, but it cost 40% more and required 2.5 times more time than the EPI survey. LQAS, therefore, is not a good substitute for current EPI methodology to evaluate immunization coverage in a large administrative area. However, LQAS has potential as a method to monitor health programs on a routine basis in small population sub-units, especially in areas with high and heterogeneously distributed immunization coverage.

  12. The K-selected Butcher-Oemler Effect

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Stanford, S A; De Propris, R; Dickinson, M

    2004-03-02

    We investigate the Butcher-Oemler effect using samples of galaxies brighter than observed frame K* + 1.5 in 33 clusters at 0.1 ≲ z ≲ 0.9. We attempt to duplicate as closely as possible the methodology of Butcher & Oemler. Apart from selecting in the K-band, the most important difference is that we use a brightness limit fixed at 1.5 magnitudes below an observed frame K* rather than the nominal limit of rest frame M(V) = -20 used by Butcher & Oemler. For an early type galaxy at z = 0.1 our sample cutoff is 0.2 magnitudes brighter than rest frame M(V) = -20, while at z = 0.9 our cutoff is 0.9 magnitudes brighter. If the blue galaxies tend to be faint, then the difference in magnitude limits should result in our measuring lower blue fractions. A more minor difference from the Butcher & Oemler methodology is that the area covered by our galaxy samples has a radius of 0.5 or 0.7 Mpc at all redshifts rather than R_30, the radius containing 30% of the cluster population. In practice our field sizes are generally similar to those used by Butcher & Oemler. We find the fraction of blue galaxies in our K-selected samples to be lower on average than that derived from several optically selected samples, and that it shows little trend with redshift. However, at the redshifts z < 0.6 where our sample overlaps with that of Butcher & Oemler, the difference in f_B as determined from our K-selected samples and those of Butcher & Oemler is much reduced. The large scatter in the measured f_B, even in small redshift ranges, in our study indicates that determining f_B for a much larger sample of clusters from K-selected galaxy samples is important. As a test of our methods, our data allow us to construct optically selected samples down to rest frame M(V) = -20, as used by Butcher & Oemler, for four clusters that are common between our sample and that of Butcher & Oemler. For these rest-V-selected samples, we find similar fractions of blue galaxies to Butcher & Oemler, while the K-selected samples for the same 4 clusters yield blue fractions which are typically half as large. This comparison indicates that selecting in the K-band is the primary difference between our study and previous optically based studies of the Butcher & Oemler effect. Selecting in the observed K-band is more nearly a process of selecting galaxies by their mass than is the case for optically selected samples. Our results suggest that the Butcher-Oemler effect is at least partly due to low-mass galaxies whose optical luminosities are boosted. These lower mass galaxies could evolve into the rich dwarf population observed in nearby clusters.

  13. A multiple-feature and multiple-kernel scene segmentation algorithm for humanoid robot.

    PubMed

    Liu, Zhi; Xu, Shuqiong; Zhang, Yun; Chen, Chun Lung Philip

    2014-11-01

    This technical correspondence presents a multiple-feature and multiple-kernel support vector machine (MFMK-SVM) methodology to achieve a more reliable and robust segmentation performance for humanoid robot. The pixel wise intensity, gradient, and C1 SMF features are extracted via the local homogeneity model and Gabor filter, which would be used as inputs of MFMK-SVM model. It may provide multiple features of the samples for easier implementation and efficient computation of MFMK-SVM model. A new clustering method, which is called feature validity-interval type-2 fuzzy C-means (FV-IT2FCM) clustering algorithm, is proposed by integrating a type-2 fuzzy criterion in the clustering optimization process to improve the robustness and reliability of clustering results by the iterative optimization. Furthermore, the clustering validity is employed to select the training samples for the learning of the MFMK-SVM model. The MFMK-SVM scene segmentation method is able to fully take advantage of the multiple features of scene image and the ability of multiple kernels. Experiments on the BSDS dataset and real natural scene images demonstrate the superior performance of our proposed method.

  14. Cluster Sampling Bias in Government-Sponsored Evaluations: A Correlational Study of Employment and Welfare Pilots in England.

    PubMed

    Vaganay, Arnaud

    2016-01-01

    For pilot or experimental employment programme results to apply beyond their test bed, researchers must select 'clusters' (i.e. the job centres delivering the new intervention) that are reasonably representative of the whole territory. More specifically, this requirement must account for conditions that could artificially inflate the effect of a programme, such as the fluidity of the local labour market or the performance of the local job centre. Failure to achieve representativeness results in Cluster Sampling Bias (CSB). This paper makes three contributions to the literature. Theoretically, it approaches the notion of CSB as a human behaviour. It offers a comprehensive theory, whereby researchers with limited resources and conflicting priorities tend to oversample 'effect-enhancing' clusters when piloting a new intervention. Methodologically, it advocates for a 'narrow and deep' scope, as opposed to the 'wide and shallow' scope, which has prevailed so far. The PILOT-2 dataset was developed to test this idea. Empirically, it provides evidence on the prevalence of CSB. In conditions similar to the PILOT-2 case study, investigators (1) do not sample clusters with a view to maximise generalisability; (2) do not oversample 'effect-enhancing' clusters; (3) consistently oversample some clusters, including those with higher-than-average client caseloads; and (4) report their sampling decisions in an inconsistent and generally poor manner. In conclusion, although CSB is prevalent, it is still unclear whether it is intentional and meant to mislead stakeholders about the expected effect of the intervention or due to higher-level constraints or other considerations.

  15. A Novel Clustering Methodology Based on Modularity Optimisation for Detecting Authorship Affinities in Shakespearean Era Plays

    PubMed Central

    Craig, Hugh; Berretta, Regina; Moscato, Pablo

    2016-01-01

    In this study we propose a novel, unsupervised clustering methodology for analyzing large datasets. This new, efficient methodology converts the general clustering problem into a community detection problem on graphs by using the Jensen-Shannon distance, a dissimilarity measure originating in Information Theory. Moreover, we use graph theoretic concepts for the generation and analysis of proximity graphs. Our methodology is based on a newly proposed memetic algorithm (iMA-Net) for discovering clusters of data elements by maximizing the modularity function in proximity graphs of literary works. To test the effectiveness of this general methodology, we apply it to a text corpus dataset, which contains frequencies of approximately 55,114 unique words across all 168 plays written in the Shakespearean era (16th and 17th centuries), to analyze and detect clusters of similar plays. Experimental results and comparison with state-of-the-art clustering methods demonstrate the remarkable performance of our new method for identifying high quality clusters which reflect the commonalities in the literary style of the plays. PMID:27571416
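
    A hedged sketch of the pipeline's shape (SciPy and NetworkX assumed): Jensen-Shannon distances between per-play word-frequency distributions define a k-nearest-neighbour proximity graph, on which modularity is then maximized. NetworkX's greedy optimizer stands in for the paper's memetic algorithm iMA-Net, and the parameter values are illustrative.

    ```python
    import networkx as nx
    import numpy as np
    from scipy.spatial.distance import jensenshannon

    def play_communities(freqs, names, k_neighbours=5):
        """freqs: rows are per-play word-frequency distributions (each
        summing to 1).  Build a k-NN proximity graph under Jensen-Shannon
        distance, then detect communities by greedy modularity maximization."""
        n = len(freqs)
        d = np.array([[jensenshannon(freqs[i], freqs[j]) for j in range(n)]
                      for i in range(n)])
        g = nx.Graph()
        g.add_nodes_from(names)
        for i in range(n):
            for j in np.argsort(d[i])[1:k_neighbours + 1]:  # skip self
                g.add_edge(names[i], names[j], weight=1 - d[i, j])
        comms = nx.algorithms.community.greedy_modularity_communities(
            g, weight="weight")
        return [sorted(c) for c in comms]
    ```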

  16. WAIS-III index score profiles in the Canadian standardization sample.

    PubMed

    Lange, Rael T

    2007-01-01

    Representative index score profiles were examined in the Canadian standardization sample of the Wechsler Adult Intelligence Scale-Third Edition (WAIS-III). The identification of profile patterns was based on the methodology proposed by Lange, Iverson, Senior, and Chelune (2002) that aims to maximize the influence of profile shape and minimize the influence of profile magnitude on the cluster solution. A two-step cluster analysis procedure was used (i.e., hierarchical and k-means analyses). Cluster analysis of the four index scores (i.e., Verbal Comprehension [VCI], Perceptual Organization [POI], Working Memory [WMI], Processing Speed [PSI]) identified six profiles in this sample. Profiles were differentiated by pattern of performance and were primarily characterized as (a) high VCI/POI, low WMI/PSI, (b) low VCI/POI, high WMI/PSI, (c) high PSI, (d) low PSI, (e) high VCI/WMI, low POI/PSI, and (f) low VCI, high POI. These profiles are potentially useful for determining whether a patient's WAIS-III performance is unusual in a normal population.
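
    A minimal sketch of this kind of two-step, shape-focused procedure is given below: each profile is row-centred so that magnitude is removed (one common way to emphasise shape; the cited Lange et al. scheme may differ in detail), Ward hierarchical clustering supplies the seeds, and k-means refines them. The data and the choice of six clusters are illustrative.

```python
# Hedged sketch of a two-step "shape over magnitude" cluster analysis:
# hierarchical clustering to seed k-means on row-centred index profiles.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
profiles = rng.normal(100, 15, size=(400, 4))            # toy VCI, POI, WMI, PSI
shape = profiles - profiles.mean(axis=1, keepdims=True)  # remove magnitude

k = 6
labels_h = fcluster(linkage(shape, method="ward"), t=k, criterion="maxclust")
seeds = np.vstack([shape[labels_h == c].mean(axis=0) for c in range(1, k + 1)])

km = KMeans(n_clusters=k, init=seeds, n_init=1, random_state=0).fit(shape)
print(np.bincount(km.labels_))                           # cluster sizes
```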

  17. Hierarchical Bayesian modelling of gene expression time series across irregularly sampled replicates and clusters.

    PubMed

    Hensman, James; Lawrence, Neil D; Rattray, Magnus

    2013-08-20

    Time course data from microarrays and high-throughput sequencing experiments require simple, computationally efficient and powerful statistical models to extract meaningful biological signal, and for tasks such as data fusion and clustering. Existing methodologies fail to capture either the temporal or replicated nature of the experiments, and often impose constraints on the data collection process, such as regularly spaced samples or similar sampling schemes across replications. We propose hierarchical Gaussian processes as a general model of gene expression time series, with application to a variety of problems. In particular, we illustrate the method's capacity for missing-data imputation, data fusion and clustering. The method can impute data that are missing both systematically and at random: in a hold-out test on real data, performance is significantly better than commonly used imputation methods. The method's ability to model inter- and intra-cluster variance leads to more biologically meaningful clusters. The approach removes the necessity for evenly spaced samples, an advantage illustrated on a developmental Drosophila dataset with irregular replications. The hierarchical Gaussian process model provides an excellent statistical basis for several gene-expression time-series tasks. It has only a few additional parameters over a regular GP, has negligible additional complexity, is easily implemented and can be integrated into several existing algorithms. Our experiments were implemented in Python, and are available from the authors' website: http://staffwww.dcs.shef.ac.uk/people/J.Hensman/.
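
    The covariance structure at the heart of such a hierarchical GP can be sketched in a few lines: a shared kernel contributes to every pair of observations, while a replicate-level kernel contributes only within a replicate, so irregular and unaligned time points are handled naturally. The RBF forms and hyperparameters below are assumptions, not the paper's fitted model.

```python
# Minimal numpy sketch of a hierarchical-GP covariance for replicated
# time series: shared kernel everywhere, replicate kernel within replicates.
import numpy as np

def rbf(t1, t2, var, length):
    d = t1[:, None] - t2[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

# Irregular sampling: each replicate has its own time points.
t = np.concatenate([np.sort(np.random.default_rng(r).uniform(0, 10, 7))
                    for r in range(3)])
rep = np.repeat([0, 1, 2], 7)

K_gene = rbf(t, t, var=1.0, length=2.0)
K_rep = rbf(t, t, var=0.3, length=1.0) * (rep[:, None] == rep[None, :])
K = K_gene + K_rep + 1e-6 * np.eye(len(t))     # jitter for stability

y = np.linalg.cholesky(K) @ np.random.default_rng(0).normal(size=len(t))
# Posterior mean of the shared (gene-level) function on a dense grid:
t_star = np.linspace(0, 10, 100)
mu = rbf(t_star, t, 1.0, 2.0) @ np.linalg.solve(K, y)
print(mu[:5])
```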

  18. Tests for informative cluster size using a novel balanced bootstrap scheme.

    PubMed

    Nevalainen, Jaakko; Oja, Hannu; Datta, Somnath

    2017-07-20

    Clustered data are often encountered in biomedical studies, and to date, a number of approaches have been proposed to analyze such data. However, the phenomenon of informative cluster size (ICS) is a challenging problem, and its presence has an impact on the choice of a correct analysis methodology. For example, Dutta and Datta (2015, Biometrics) presented a number of marginal distributions that could be tested. Depending on the nature and degree of informativeness of the cluster size, these marginal distributions may differ, as do the choices of the appropriate test. In particular, they applied their new test to a periodontal data set where the plausibility of the informativeness was mentioned, but no formal test for the same was conducted. We propose bootstrap tests for testing the presence of ICS. A balanced bootstrap method is developed to successfully estimate the null distribution by merging the re-sampled observations with closely matching counterparts. Relying on the assumption of exchangeability within clusters, the proposed procedure performs well in simulations even with a small number of clusters, at different distributions and against different alternative hypotheses, thus making it an omnibus test. We also explain how to extend the ICS test to a regression setting, thereby enhancing its practical utility. The methodologies are illustrated using the periodontal data set mentioned earlier. Copyright © 2017 John Wiley & Sons, Ltd.

  19. Clustering analysis of proteins from microbial genomes at multiple levels of resolution.

    PubMed

    Zaslavsky, Leonid; Ciufo, Stacy; Fedorov, Boris; Tatusova, Tatiana

    2016-08-31

    Microbial genomes at the National Center for Biotechnology Information (NCBI) represent a large collection of more than 35,000 assemblies. There are several complexities associated with the data: a great variation in sampling density, since human pathogens are densely sampled while other bacteria are less represented; different protein families occur in annotations with different frequencies; and the quality of genome annotation varies greatly. In order to extract useful information from these complex data, the analysis needs to be performed at multiple levels of phylogenomic resolution and protein similarity, with an adequate sampling strategy. Protein clustering is used to construct meaningful and stable groups of similar proteins to be used for analysis and functional annotation. Our approach is to create protein clusters at three levels. First, tight clusters in groups of closely related genomes (species-level clades) are constructed using a combined approach that takes into account both sequence similarity and genome context. Second, clustroids of conservative in-clade clusters are organized into seed global clusters. Finally, global protein clusters are built around the seed clusters. We propose filtering strategies that limit the protein set included in global clustering. The in-clade clustering procedure, subsequent selection of clustroids and organization into seed global clusters provide a robust representation and a high rate of compression. Seed protein clusters are further extended by adding related proteins. Extended seed clusters include a significant part of the data and represent all major known cell machinery. The remaining part, coming from non-conservative (unique) or rapidly evolving proteins, from rare genomes, or from low-quality annotation, does not group together well. Processing these proteins requires significant computational resources and results in a large number of questionable clusters. The developed filtering strategies allow such peripheral proteins to be identified and excluded, limiting the protein dataset used in global clustering. Overall, the proposed methodology allows relevant data to be obtained at different levels of detail and data redundancy to be eliminated, while biologically interesting variation is retained.

  20. Patterning ecological risk of pesticide contamination at the river basin scale.

    PubMed

    Faggiano, Leslie; de Zwart, Dick; García-Berthou, Emili; Lek, Sovan; Gevrey, Muriel

    2010-05-01

    Ecological risk assessment was conducted to determine the risk posed by pesticide mixtures to the Adour-Garonne river basin (south-western France). The objectives of this study were to assess the general state of this basin with regard to pesticide contamination using a risk assessment procedure and to detect patterns in toxic mixture assemblages through a self-organizing map (SOM) methodology in order to identify the locations at risk. Exposure assessment, risk assessment with species sensitivity distribution, and mixture toxicity rules were used to compute six relative risk predictors for different toxic modes of action: the multi-substance potentially affected fraction of species depending on the toxic mode of action of compounds found in the mixture (msPAF CA(TMoA) values). Those predictors, computed for the 131 sampling sites assessed in this study, were then patterned through the SOM learning process. Four clusters of sampling sites exhibiting similar toxic assemblages were identified. In the first cluster, which comprised 83% of the sampling sites, the risk posed by pesticide mixtures to aquatic species was weak (mean msPAF value for those sites <0.0036%), while in another cluster the risk was significant (mean msPAF <1.09%). GIS mapping highlighted an interesting spatial pattern in the distribution of sampling sites for each cluster, with a significant and highly localized risk in the French department called "Lot-et-Garonne". The combined use of the SOM methodology, mixture toxicity modelling and a clear geo-referenced representation of results not only revealed the general state of the Adour-Garonne basin with regard to contamination by pesticides but also enabled analysis of the spatial pattern of toxic mixture assemblages in order to prioritize the locations at risk and to detect the group of compounds causing the greatest risk at the basin scale. Copyright 2010 Elsevier B.V. All rights reserved.

  1. Changing cluster composition in cluster randomised controlled trials: design and analysis considerations

    PubMed Central

    2014-01-01

    Background There are many methodological challenges in the conduct and analysis of cluster randomised controlled trials, but one that has received little attention is that of post-randomisation changes to cluster composition. To illustrate this, we focus on the issue of cluster merging, considering the impact on the design, analysis and interpretation of trial outcomes. Methods We explored the effects of merging clusters on study power using standard methods of power calculation. We assessed the potential impacts on study findings of both homogeneous cluster merges (involving clusters randomised to the same arm of a trial) and heterogeneous merges (involving clusters randomised to different arms of a trial) by simulation. To determine the impact on bias and precision of treatment effect estimates, we applied standard methods of analysis to different populations under analysis. Results Cluster merging produced a systematic reduction in study power. This effect depended on the number of merges and was most pronounced when variability in cluster size was at its greatest. Simulations demonstrated that the impact on analysis was minimal when cluster merges were homogeneous, with the impact on study power being balanced by a change in the observed intracluster correlation coefficient (ICC). We found a decrease in study power when cluster merges were heterogeneous, and the estimate of treatment effect was attenuated. Conclusions Examples of cluster merges found in previously published reports of cluster randomised trials were typically homogeneous rather than heterogeneous. Simulations demonstrated that trial findings in such cases would be unbiased. However, simulations also showed that any heterogeneous cluster merges would introduce bias that would be hard to quantify, as well as negatively affecting the precision of the estimates obtained. Further methodological development is warranted to better determine how to analyse such trials appropriately. Interim recommendations include avoidance of cluster merges where possible, discontinuation of clusters following heterogeneous merges, allowance for potential loss of clusters and additional variability in cluster size in the original sample size calculation, and use of appropriate ICC estimates that reflect cluster size. PMID:24884591
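
    The recommendation to allow for extra variability in cluster size can be made concrete with the familiar coefficient-of-variation design effect (in the style of Eldridge et al.); the sketch below shows how the required sample size grows as the CV of cluster sizes rises, which is what cluster merging tends to do. All trial parameters are invented for illustration.

```python
# Hedged sketch: inflate an individually randomised sample size by a
# design effect that allows for unequal cluster sizes via their CV.
def design_effect(mean_size, cv, icc):
    """DEFF = 1 + ((cv^2 + 1) * m - 1) * icc for unequal cluster sizes."""
    return 1 + ((cv ** 2 + 1) * mean_size - 1) * icc

n_individual = 520          # size needed under individual randomisation
m, icc = 40.0, 0.05         # illustrative mean cluster size and ICC

for cv in (0.0, 0.4, 0.8):  # merging clusters typically raises the CV
    deff = design_effect(m, cv, icc)
    print(f"cv={cv:.1f}  deff={deff:.2f}  n={n_individual * deff:.0f}")
```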

  2. Linking Teacher Competences to Organizational Citizenship Behaviour: The Role of Empowerment

    ERIC Educational Resources Information Center

    Kasekende, Francis; Munene, John C.; Otengei, Samson Omuudu; Ntayi, Joseph Mpeera

    2016-01-01

    Purpose: The purpose of this paper is to examine relationship between teacher competences and organizational citizenship behavior (OCB) with empowerment as a mediating factor. Design/methodology/approach: The study took a cross-sectional descriptive and analytical design. Using cluster and random sampling procedures, data were obtained from 383…

  3. Image of Turkish Basic Schools: A Reflection from the Province of Ankara

    ERIC Educational Resources Information Center

    Eres, Figen

    2011-01-01

    The purpose of this study was to investigate the organizational image of basic schools in Turkey, a rapidly developing nation that has been investing significantly in education. Participants were 730 residents of Ankara province in the Golbasi district. The participants were selected using a cluster sampling methodology. Data were collected…

  4. Cluster randomized trials utilizing primary care electronic health records: methodological issues in design, conduct, and analysis (eCRT Study).

    PubMed

    Gulliford, Martin C; van Staa, Tjeerd P; McDermott, Lisa; McCann, Gerard; Charlton, Judith; Dregan, Alex

    2014-06-11

    There is growing interest in conducting clinical and cluster randomized trials through electronic health records. This paper reports on the methodological issues identified during the implementation of two cluster randomized trials using the electronic health records of the Clinical Practice Research Datalink (CPRD). Two trials were completed in primary care: one aimed to reduce inappropriate antibiotic prescribing for acute respiratory infection; the other aimed to increase physician adherence with secondary prevention interventions after first stroke. The paper draws on documentary records and trial datasets to report on the methodological experience with respect to research ethics and research governance approval, general practice recruitment and allocation, sample size calculation and power, intervention implementation, and trial analysis. We obtained research governance approvals from more than 150 primary care organizations in England, Wales, and Scotland. There were 104 CPRD general practices recruited to the antibiotic trial and 106 to the stroke trial, with the target number of practices being recruited within six months. Interventions were installed into practice information systems remotely over the internet. The mean number of participants per practice was 5,588 in the antibiotic trial and 110 in the stroke trial, with the coefficient of variation of practice sizes being 0.53 and 0.56 respectively. Outcome measures showed substantial correlations between the 12 months before and the 12 months after intervention, with coefficients ranging from 0.42 for diastolic blood pressure to 0.91 for the proportion of consultations with antibiotics prescribed. Defining practice and participant eligibility for analysis requires careful consideration. Cluster randomized trials may be performed efficiently in large samples from UK general practices using the electronic health records of a primary care database. The geographical dispersal of trial sites presents a difficulty for research governance approval and intervention implementation. Pretrial data analyses should inform trial design and analysis plans. Current Controlled Trials ISRCTN 47558792 and ISRCTN 35701810 (both registered on 17 March 2010).

  5. Cluster randomized trials utilizing primary care electronic health records: methodological issues in design, conduct, and analysis (eCRT Study)

    PubMed Central

    2014-01-01

    Background There is growing interest in conducting clinical and cluster randomized trials through electronic health records. This paper reports on the methodological issues identified during the implementation of two cluster randomized trials using the electronic health records of the Clinical Practice Research Datalink (CPRD). Methods Two trials were completed in primary care: one aimed to reduce inappropriate antibiotic prescribing for acute respiratory infection; the other aimed to increase physician adherence with secondary prevention interventions after first stroke. The paper draws on documentary records and trial datasets to report on the methodological experience with respect to research ethics and research governance approval, general practice recruitment and allocation, sample size calculation and power, intervention implementation, and trial analysis. Results We obtained research governance approvals from more than 150 primary care organizations in England, Wales, and Scotland. There were 104 CPRD general practices recruited to the antibiotic trial and 106 to the stroke trial, with the target number of practices being recruited within six months. Interventions were installed into practice information systems remotely over the internet. The mean number of participants per practice was 5,588 in the antibiotic trial and 110 in the stroke trial, with the coefficient of variation of practice sizes being 0.53 and 0.56 respectively. Outcome measures showed substantial correlations between the 12 months before and the 12 months after intervention, with coefficients ranging from 0.42 for diastolic blood pressure to 0.91 for the proportion of consultations with antibiotics prescribed. Defining practice and participant eligibility for analysis requires careful consideration. Conclusions Cluster randomized trials may be performed efficiently in large samples from UK general practices using the electronic health records of a primary care database. The geographical dispersal of trial sites presents a difficulty for research governance approval and intervention implementation. Pretrial data analyses should inform trial design and analysis plans. Trial registration Current Controlled Trials ISRCTN 47558792 and ISRCTN 35701810 (both registered on 17 March 2010). PMID:24919485

  6. The Very Small Scale Clustering of SDSS-II and SDSS-III Galaxies

    NASA Astrophysics Data System (ADS)

    Piscionere, Jennifer

    2015-01-01

    We measure the angular clustering of galaxies from the Sloan Digital Sky Survey Data Release 7 in order to probe the spatial distribution of satellite galaxies within their dark matter halos. Specifically, we measure the angular correlation function on very small scales (7″-320″) in a range of luminosity threshold samples (absolute r-band magnitudes of -18 up to -21) that are constructed from the subset of SDSS that has been spectroscopically observed more than once (the so-called plate-overlap region). We choose to measure angular clustering in this reduced survey footprint in order to minimize the effects of fiber-collision incompleteness, which are otherwise substantial on these small scales. We model our clustering measurements using a fully numerical halo model that populates dark matter halos in N-body simulations to create realistic mock galaxy catalogs. The model has free parameters that specify both the number and spatial distribution of galaxies within their host halos. We adopt a flexible density profile for the spatial distribution of satellite galaxies that is similar to the dark matter Navarro-Frenk-White (NFW) profile, except that the inner slope is allowed to vary. We find that the angular clustering of our most luminous samples (Mr < -20 and -21) suggests that luminous satellite galaxies have substantially steeper inner density profiles than NFW. Lower luminosity samples are less constraining, however, and are consistent with satellite galaxies having shallow density profiles. Our results confirm the findings of Watson et al. (2012) while using different clustering measurements and modeling methodology. With the new SDSS-III Baryon Oscillation Spectroscopic Survey (BOSS; Dawson et al., 2013), we can measure how the same class of galaxy evolves over time. The BOSS CMASS sample is of roughly constant stellar mass and number density out to z ≈ 0.6. The clustering of these samples appears to evolve very little with redshift, and each of the samples exhibits a flattening of wp at roughly the same comoving distance of 100 kpc.
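
    Angular correlation functions of this kind are conventionally estimated with the Landy-Szalay estimator, w(θ) = (DD − 2DR + RR)/RR; a toy version with naive pair counts (no survey mask, weighting, or fiber-collision correction) is sketched below on a small mock catalogue.

```python
# Toy Landy-Szalay sketch: normalised DD, DR, RR pair counts on mocks.
import numpy as np

rng = np.random.default_rng(3)
data = rng.uniform(0, 1, size=(300, 2))     # mock galaxy positions (deg)
rand = rng.uniform(0, 1, size=(1500, 2))    # random catalogue

def pair_counts(a, b, bins, same=False):
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    if same:
        d = d[np.triu_indices_from(d, k=1)]  # unique pairs only
    return np.histogram(d.ravel(), bins=bins)[0].astype(float)

bins = np.logspace(-2, -0.5, 8)
DD = pair_counts(data, data, bins, same=True) / (len(data) * (len(data) - 1) / 2)
RR = pair_counts(rand, rand, bins, same=True) / (len(rand) * (len(rand) - 1) / 2)
DR = pair_counts(data, rand, bins) / (len(data) * len(rand))

w = (DD - 2 * DR + RR) / RR
print(np.round(w, 3))
```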

  7. Cluster Sampling Bias in Government-Sponsored Evaluations: A Correlational Study of Employment and Welfare Pilots in England

    PubMed Central

    2016-01-01

    For pilot or experimental employment programme results to apply beyond their test bed, researchers must select ‘clusters’ (i.e. the job centres delivering the new intervention) that are reasonably representative of the whole territory. More specifically, this requirement must account for conditions that could artificially inflate the effect of a programme, such as the fluidity of the local labour market or the performance of the local job centre. Failure to achieve representativeness results in Cluster Sampling Bias (CSB). This paper makes three contributions to the literature. Theoretically, it approaches the notion of CSB as a human behaviour. It offers a comprehensive theory, whereby researchers with limited resources and conflicting priorities tend to oversample ‘effect-enhancing’ clusters when piloting a new intervention. Methodologically, it advocates for a ‘narrow and deep’ scope, as opposed to the ‘wide and shallow’ scope, which has prevailed so far. The PILOT-2 dataset was developed to test this idea. Empirically, it provides evidence on the prevalence of CSB. In conditions similar to the PILOT-2 case study, investigators (1) do not sample clusters with a view to maximise generalisability; (2) do not oversample ‘effect-enhancing’ clusters; (3) consistently oversample some clusters, including those with higher-than-average client caseloads; and (4) report their sampling decisions in an inconsistent and generally poor manner. In conclusion, although CSB is prevalent, it is still unclear whether it is intentional and meant to mislead stakeholders about the expected effect of the intervention or due to higher-level constraints or other considerations. PMID:27504823

  8. Quantity, topics, methods and findings of randomised controlled trials published by German university departments of general practice - systematic review.

    PubMed

    Heinmüller, Stefan; Schneider, Antonius; Linde, Klaus

    2016-04-23

    Academic infrastructures and networks for clinical research in primary care receive little funding in Germany. We aimed to provide an overview of the quantity, topics, methods and findings of randomised controlled trials published by German university departments of general practice. We searched Scopus (last search done in April 2015), publication lists of institutes and references of included articles. We included randomised trials published between January 2000 and December 2014 with a first or last author affiliated with a German university department of general practice or family medicine. Risk of bias was assessed with the Cochrane tool, and study findings were quantified using standardised mean differences (SMDs). Thirty-three trials met the inclusion criteria. Seventeen were cluster-randomised trials, with a majority investigating interventions aimed at improving processes compared with usual care. Sample sizes varied between 6 and 606 clusters and 168 and 7807 participants. The most frequent methodological problem was risk of selection bias due to recruitment of individuals after randomisation of clusters. Effects of interventions over usual care were mostly small (SMD <0.3). Sixteen trials randomising individual participants addressed a variety of treatment and educational interventions. Sample sizes varied between 20 and 1620 participants. The methodological quality of the trials was highly variable. Again, effects of experimental interventions over controls were mostly small. Despite limited funding, German university institutes of general practice or family medicine are increasingly performing randomised trials. Cluster-randomised trials on practice improvement are a focus, but problems with allocation concealment are frequent.

  9. A guide to enterotypes across the human body: meta-analysis of microbial community structures in human microbiome datasets.

    PubMed

    Koren, Omry; Knights, Dan; Gonzalez, Antonio; Waldron, Levi; Segata, Nicola; Knight, Rob; Huttenhower, Curtis; Ley, Ruth E

    2013-01-01

    Recent analyses of human-associated bacterial diversity have categorized individuals into 'enterotypes' or clusters based on the abundances of key bacterial genera in the gut microbiota. There is a lack of consensus, however, on the analytical basis for enterotypes and on the interpretation of these results. We tested how the following factors influenced the detection of enterotypes: clustering methodology, distance metrics, OTU-picking approaches, sequencing depth, data type (whole genome shotgun (WGS) vs. 16S rRNA gene sequence data), and 16S rRNA region. We included 16S rRNA gene sequences from the Human Microbiome Project (HMP) and from 16 additional studies and WGS sequences from the HMP and MetaHIT. In most body sites, we observed smooth abundance gradients of key genera without discrete clustering of samples. Some body habitats displayed bimodal (e.g., gut) or multimodal (e.g., vagina) distributions of sample abundances, but not all clustering methods and workflows accurately highlight such clusters. Because identifying enterotypes in datasets depends not only on the structure of the data but is also sensitive to the methods applied to identifying clustering strength, we recommend that multiple approaches be used and compared when testing for enterotypes.
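
    The recommendation to compare multiple approaches can be illustrated with a small robustness check: compute Jensen-Shannon distances between abundance profiles and inspect silhouette widths over a range of cluster numbers rather than trusting one partition. The original enterotype analyses used PAM among other methods; ordinary agglomerative clustering (scikit-learn >= 1.2 API) stands in for it here, and the data are simulated.

```python
# Hedged sketch of a clustering-strength check on toy genus profiles.
import numpy as np
from scipy.spatial.distance import jensenshannon
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(4)
profiles = rng.dirichlet(np.ones(30), size=60)     # toy genus abundances

n = len(profiles)
D = np.array([[jensenshannon(profiles[i], profiles[j]) for j in range(n)]
              for i in range(n)])

for k in range(2, 7):
    labels = AgglomerativeClustering(n_clusters=k, metric="precomputed",
                                     linkage="average").fit_predict(D)
    # Low silhouette widths across all k would argue against enterotypes.
    print(k, round(silhouette_score(D, labels, metric="precomputed"), 3))
```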

  10. A Guide to Enterotypes across the Human Body: Meta-Analysis of Microbial Community Structures in Human Microbiome Datasets

    PubMed Central

    Waldron, Levi; Segata, Nicola; Knight, Rob; Huttenhower, Curtis; Ley, Ruth E.

    2013-01-01

    Recent analyses of human-associated bacterial diversity have categorized individuals into 'enterotypes' or clusters based on the abundances of key bacterial genera in the gut microbiota. There is a lack of consensus, however, on the analytical basis for enterotypes and on the interpretation of these results. We tested how the following factors influenced the detection of enterotypes: clustering methodology, distance metrics, OTU-picking approaches, sequencing depth, data type (whole genome shotgun (WGS) vs. 16S rRNA gene sequence data), and 16S rRNA region. We included 16S rRNA gene sequences from the Human Microbiome Project (HMP) and from 16 additional studies and WGS sequences from the HMP and MetaHIT. In most body sites, we observed smooth abundance gradients of key genera without discrete clustering of samples. Some body habitats displayed bimodal (e.g., gut) or multimodal (e.g., vagina) distributions of sample abundances, but not all clustering methods and workflows accurately highlight such clusters. Because identifying enterotypes in datasets depends not only on the structure of the data but is also sensitive to the methods applied to identifying clustering strength, we recommend that multiple approaches be used and compared when testing for enterotypes. PMID:23326225

  11. Onto-clust--a methodology for combining clustering analysis and ontological methods for identifying groups of comorbidities for developmental disorders.

    PubMed

    Peleg, Mor; Asbeh, Nuaman; Kuflik, Tsvi; Schertz, Mitchell

    2009-02-01

    Children with developmental disorders usually exhibit multiple developmental problems (comorbidities). Hence, diagnosis needs to revolve around developmental disorder groups. Our objective is to systematically identify developmental disorder groups and represent them in an ontology. We developed a methodology that combines two methods: (1) a literature-based ontology that we created, which represents developmental disorders and potential developmental disorder groups, and (2) clustering for detecting comorbid developmental disorders in patient data. The ontology is used to interpret and improve clustering results, and the clustering results are used to validate the ontology and suggest directions for its development. We evaluated our methodology by applying it to data from 1,175 patients at a child development clinic. We demonstrated that the ontology improves clustering results, bringing them closer to an expert-generated gold standard. We have shown that our methodology successfully combines an ontology with a clustering method to support systematic identification and representation of developmental disorder groups.

  12. Using Theory of Planned Behavior to Predict Healthy Eating among Danish Adolescents

    ERIC Educational Resources Information Center

    Gronhoj, Alice; Bech-Larsen, Tino; Chan, Kara; Tsang, Lennon

    2013-01-01

    Purpose: The purpose of the study was to apply the theory of planned behavior to predict Danish adolescents' behavioral intention for healthy eating. Design/methodology/approach: A cluster sample survey of 410 students aged 11 to 16 years studying in Grade 6 to Grade 10 was conducted in Denmark. Findings: Perceived behavioral control followed by…

  13. Subtyping adolescents with bulimia nervosa.

    PubMed

    Chen, Eunice Y; Le Grange, Daniel

    2007-12-01

    Cluster analyses of eating disorder patients have yielded a "dietary-depressive" subtype, typified by greater negative affect, and a "dietary" subtype, typified by dietary restraint. This study aimed to replicate these findings in an adolescent sample with bulimia nervosa (BN) from a randomized controlled trial and to examine the validity and reliability of this methodology. In the sample of BN adolescents (N=80), cluster analysis revealed a "dietary-depressive" subtype (37.5%) and a "dietary" subtype (62.5%) using the Beck Depression Inventory, Rosenberg Self-Esteem Scale and Eating Disorder Examination Restraint subscale. The "dietary-depressive" subtype compared to the "dietary" subtype was significantly more likely to: (1) report co-occurring disorders, (2) greater eating and weight concerns, and (3) less vomiting abstinence at post-treatment (all p's<.05). The cluster analysis based on "dietary" and "dietary-depressive" subtypes appeared to have concurrent validity, yielding more distinct groups than subtyping by vomiting frequency. In order to assess the reliability of the subtyping scheme, a larger sample of adolescents with mixed eating and weight disorders in an outpatient eating disorder clinic (N=149) was subtyped, yielding similar subtypes. These results support the validity and reliability of the subtyping strategy in two adolescent samples.

  14. Analyzing simulation-based PRA data through traditional and topological clustering: A BWR station blackout case study

    DOE PAGES

    Maljovec, D.; Liu, S.; Wang, B.; ...

    2015-07-14

    Here, dynamic probabilistic risk assessment (DPRA) methodologies couple system simulator codes (e.g., RELAP and MELCOR) with simulation controller codes (e.g., RAVEN and ADAPT). Whereas system simulator codes model system dynamics deterministically, simulation controller codes introduce both deterministic (e.g., system control logic and operating procedures) and stochastic (e.g., component failures and parameter uncertainties) elements into the simulation. Typically, a DPRA is performed by sampling values of a set of parameters and simulating the system behavior for that specific set of parameter values. For complex systems, a major challenge in using DPRA methodologies is to analyze the large number of scenarios generated, where clustering techniques are typically employed to better organize and interpret the data. In this paper, we focus on the analysis of two nuclear simulation datasets that are part of the risk-informed safety margin characterization (RISMC) boiling water reactor (BWR) station blackout (SBO) case study. We provide the domain experts a software tool that encodes traditional and topological clustering techniques within an interactive analysis and visualization environment, for understanding the structures of such high-dimensional nuclear simulation datasets. We demonstrate through our case study that both types of clustering techniques complement each other for enhanced structural understanding of the data.

  15. [Alcohol consumption and positive alcohol expectancies in young adults: a typological approach using TwoStep cluster].

    PubMed

    Vautier, S; Jmel, S; Fourio, C; Moncany, D

    2007-09-01

    The present study investigates the heterogeneity of the population of young adult drinkers with respect to alcohol consumption and positive alcohol expectancies (PAEs). Based on the positive relationship between both kinds of variables, PAE is commonly viewed as a potential motivational factor in alcoholic addiction. Empirical analyses based on the regression of alcohol consumption on PAEs assume, however, that the observations are statistically homogeneous with respect to the level of alcohol consumption. We explored the existence of moderate drinkers with a high PAE profile, and abusive drinkers with a low PAE profile. 1,017 young adult drinkers, mean age=23 +/- 2.84, with various educational levels, comprising 506 males and 511 females, were recruited as voluntary participants in a survey by undergraduate psychology students from the University of Toulouse Le Mirail. They completed a French version of the Alcohol Use Disorders Identification Test (AUDIT) and a French adaptation of the Alcohol Expectancy Questionnaire (AEQ). Three levels of alcohol consumption were defined using the AUDIT score, and six composite scores were obtained by averaging the relevant item scores from the AEQ. The AEQ scores were interpreted as measurements of six kinds of PAEs, namely Global positive change, Sexual enhancement, Social and physical pleasure, Social assertiveness, Relaxation, and Arousal/Power. The TwoStep cluster methodology was used to explore the data. This methodology is convenient for dealing with a mix of quantitative and qualitative variables, and it provides a classification model which is optimized through the use of an information criterion such as Schwarz's Bayesian Information Criterion (BIC). The automatic clustering suggested five clusters, whose stability was ascertained down to 75% of the sample size. Low drinkers (n=527) were split into one cluster of low PAEs (I1) and, interestingly, one cluster of high PAEs (I3, 46%). High drinkers (n=344) were split into one cluster of intermediate PAEs (II4) and one cluster of high PAEs (II5, 52%). Interestingly again, abusive drinkers (n=146) remained a single group (III2), exhibiting high PAEs. Clusters I3 and III2 comprised a significant proportion of males. Constraining the algorithm to find 6 clusters did not affect class III2, but split the low drinkers into three clusters. Although the present results should be considered cautiously because of the novelty of the TwoStep cluster methodology, they suggest a group of moderate drinkers with high PAEs. Also, abusive drinkers express high PAEs (except for 2 cases). Statistical homogeneity of moderate drinkers with respect to PAE variables appears to be a dubious assumption.
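
    SPSS's TwoStep procedure itself is proprietary; a commonly used open stand-in for its BIC-guided choice of cluster number is a Gaussian mixture model scored over a range of component counts, sketched below on invented score data.

```python
# Hedged sketch: pick the number of clusters by minimum BIC with a
# Gaussian mixture (an open stand-in for SPSS TwoStep's BIC step).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(5)
# Toy stand-in for the six AEQ composite scores plus an AUDIT-based score.
X = np.vstack([rng.normal(0, 1, size=(300, 7)),
               rng.normal(2, 1, size=(200, 7))])

bics = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
        for k in range(1, 9)}
best = min(bics, key=bics.get)
print("BIC by k:", {k: round(v) for k, v in bics.items()})
print("selected number of clusters:", best)
```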

  16. Synchronization of world economic activity

    NASA Astrophysics Data System (ADS)

    Groth, Andreas; Ghil, Michael

    2017-12-01

    Common dynamical properties of business cycle fluctuations are studied in a sample of more than 100 countries that represent economic regions from all around the world. We apply the methodology of multivariate singular spectrum analysis (M-SSA) to identify oscillatory modes and to detect whether these modes are shared by clusters of phase- and frequency-locked oscillators. An extension of the M-SSA approach is introduced to help analyze structural changes in the cluster configuration of synchronization. With this novel technique, we are able to identify a common mode of business cycle activity across our sample, and thus point to the existence of a world business cycle. Superimposed on this mode, we further identify several major events that have markedly influenced the landscape of world economic activity in the postwar era.

  17. Synchronization of world economic activity.

    PubMed

    Groth, Andreas; Ghil, Michael

    2017-12-01

    Common dynamical properties of business cycle fluctuations are studied in a sample of more than 100 countries that represent economic regions from all around the world. We apply the methodology of multivariate singular spectrum analysis (M-SSA) to identify oscillatory modes and to detect whether these modes are shared by clusters of phase- and frequency-locked oscillators. An extension of the M-SSA approach is introduced to help analyze structural changes in the cluster configuration of synchronization. With this novel technique, we are able to identify a common mode of business cycle activity across our sample, and thus point to the existence of a world business cycle. Superimposed on this mode, we further identify several major events that have markedly influenced the landscape of world economic activity in the postwar era.
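
    A bare-bones univariate SSA sketch, the building block that M-SSA extends across many countries' series, is given below: embed the series in a trajectory matrix, take its SVD, and reconstruct the leading oscillatory pair by diagonal averaging. The window length and the toy series are illustrative.

```python
# Minimal univariate SSA sketch (M-SSA stacks many such channels).
import numpy as np

rng = np.random.default_rng(11)
t = np.arange(240)                                     # 20 "years", monthly
x = np.sin(2 * np.pi * t / 60) + 0.5 * rng.normal(size=t.size)

M = 60                                                 # embedding window
X = np.column_stack([x[i:i + M] for i in range(t.size - M + 1)]).T
U, s, Vt = np.linalg.svd(X, full_matrices=False)
print("leading variance fractions:", np.round((s**2 / np.sum(s**2))[:4], 3))

# Reconstruct the component of the leading pair (an oscillatory mode)
# by diagonal averaging of the rank-2 approximation.
X2 = (U[:, :2] * s[:2]) @ Vt[:2]
rc = np.array([np.mean(X2[::-1, :].diagonal(k))
               for k in range(-X2.shape[0] + 1, X2.shape[1])])
print("reconstructed component length:", rc.size)
```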

  18. Into the Bowels of Depression: Unravelling Medical Symptoms Associated with Depression by Applying Machine-Learning Techniques to a Community Based Population Sample.

    PubMed

    Dipnall, Joanna F; Pasco, Julie A; Berk, Michael; Williams, Lana J; Dodd, Seetal; Jacka, Felice N; Meyer, Denny

    2016-01-01

    Depression is commonly comorbid with many other somatic diseases and symptoms. Identification of individuals in clusters with comorbid symptoms may reveal new pathophysiological mechanisms and treatment targets. The aim of this research was to combine machine-learning (ML) algorithms with traditional regression techniques by utilising self-reported medical symptoms to identify and describe clusters of individuals with increased rates of depression from a large cross-sectional community based population epidemiological study. A multi-staged methodology utilising ML and traditional statistical techniques was performed using the community based population National Health and Nutrition Examination Study (2009-2010) (N = 3,922). A self-organising map (SOM) ML algorithm, combined with hierarchical clustering, was used to create participant clusters based on 68 medical symptoms. Binary logistic regression, controlling for sociodemographic confounders, was used to then identify the key clusters of participants with higher levels of depression (PHQ-9≥10, n = 377). Finally, a Multiple Additive Regression Tree boosted ML algorithm was run to identify the important medical symptoms for each key cluster within 17 broad categories: heart, liver, thyroid, respiratory, diabetes, arthritis, fractures and osteoporosis, skeletal pain, blood pressure, blood transfusion, cholesterol, vision, hearing, psoriasis, weight, bowels and urinary. Five clusters of participants, based on medical symptoms, were identified to have significantly increased rates of depression compared to the cluster with the lowest rate: odds ratios ranged from 2.24 (95% CI 1.56, 3.24) to 6.33 (95% CI 1.67, 24.02). The ML boosted regression algorithm identified three key medical condition categories as being significantly more common in these clusters: bowel, pain and urinary symptoms. Bowel-related symptoms were found to dominate the relative importance of symptoms within the five key clusters. This methodology shows promise for the identification of conditions in general populations and supports the current focus on the potential importance of bowel symptoms and the gut in mental health research.
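
    The SOM-plus-hierarchical-clustering stage of such a pipeline can be sketched with a minimal pure-numpy SOM (not the study's exact implementation): train a small grid on symptom vectors, cluster the learned codebook with Ward linkage, and map participants to clusters through their best matching units. Grid size, decay schedules and the five-cluster cut are assumptions.

```python
# Hedged sketch: tiny numpy SOM, then hierarchical clustering of its codebook.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(6)
X = rng.integers(0, 2, size=(500, 68)).astype(float)   # toy symptom flags

rows, cols = 6, 6
grid = np.array([(i, j) for i in range(rows) for j in range(cols)], float)
W = rng.normal(0.5, 0.1, size=(rows * cols, X.shape[1]))  # codebook

for t in range(2000):
    x = X[rng.integers(len(X))]
    bmu = np.argmin(((W - x) ** 2).sum(axis=1))           # best matching unit
    lr = 0.5 * np.exp(-t / 1000)                          # decaying rate
    sigma = 3.0 * np.exp(-t / 1000)                       # decaying radius
    h = np.exp(-((grid - grid[bmu]) ** 2).sum(axis=1) / (2 * sigma ** 2))
    W += lr * h[:, None] * (x - W)

node_labels = fcluster(linkage(W, method="ward"), t=5, criterion="maxclust")
sample_nodes = np.argmin(((X[:, None, :] - W[None, :, :]) ** 2).sum(-1), axis=1)
print(np.bincount(node_labels[sample_nodes]))             # participants per cluster
```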

  19. Genome-scale cluster analysis of replicated microarrays using shrinkage correlation coefficient.

    PubMed

    Yao, Jianchao; Chang, Chunqi; Salmi, Mari L; Hung, Yeung Sam; Loraine, Ann; Roux, Stanley J

    2008-06-18

    Currently, clustering with some form of correlation coefficient as the gene similarity metric has become a popular method for profiling genomic data. The Pearson correlation coefficient and the standard deviation (SD)-weighted correlation coefficient are the two most widely-used correlations as the similarity metrics in clustering microarray data. However, these two correlations are not optimal for analyzing replicated microarray data generated by most laboratories. An effective correlation coefficient is needed to provide statistically sufficient analysis of replicated microarray data. In this study, we describe a novel correlation coefficient, shrinkage correlation coefficient (SCC), that fully exploits the similarity between the replicated microarray experimental samples. The methodology considers both the number of replicates and the variance within each experimental group in clustering expression data, and provides a robust statistical estimation of the error of replicated microarray data. The value of SCC is revealed by its comparison with two other correlation coefficients that are currently the most widely-used (Pearson correlation coefficient and SD-weighted correlation coefficient) using statistical measures on both synthetic expression data as well as real gene expression data from Saccharomyces cerevisiae. Two leading clustering methods, hierarchical and k-means clustering were applied for the comparison. The comparison indicated that using SCC achieves better clustering performance. Applying SCC-based hierarchical clustering to the replicated microarray data obtained from germinating spores of the fern Ceratopteris richardii, we discovered two clusters of genes with shared expression patterns during spore germination. Functional analysis suggested that some of the genetic mechanisms that control germination in such diverse plant lineages as mosses and angiosperms are also conserved among ferns. This study shows that SCC is an alternative to the Pearson correlation coefficient and the SD-weighted correlation coefficient, and is particularly useful for clustering replicated microarray data. This computational approach should be generally useful for proteomic data or other high-throughput analysis methodology.

  20. Grouping methods for estimating the prevalences of rare traits from complex survey data that preserve confidentiality of respondents.

    PubMed

    Hyun, Noorie; Gastwirth, Joseph L; Graubard, Barry I

    2018-03-26

    Originally, 2-stage group testing was developed for efficiently screening individuals for a disease. In response to the HIV/AIDS epidemic, 1-stage group testing was adopted for estimating prevalences of a single trait or multiple traits from testing groups of size q, so individuals were not tested. This paper extends the methodology of 1-stage group testing to surveys with sample-weighted complex multistage-cluster designs. Sample-weighted generalized estimating equations are used to estimate the prevalences of categorical traits while accounting for the error rates inherent in the tests. Two difficulties arise when using group testing in complex samples: (1) how does one weight the results of the test on each group, as the sample weights will differ among observations in the same group? Furthermore, if the sample weights are related to positivity of the diagnostic test, then group-level weighting is needed to reduce bias in the prevalence estimation. (2) How does one form groups that will allow accurate estimation of the standard errors of prevalence estimates under multistage-cluster sampling, allowing for intracluster correlation of the test results? We study 5 different grouping methods to address the weighting and cluster-sampling aspects of complex designed samples. Finite-sample properties of the estimators of prevalences, variances, and confidence interval coverage for these grouping methods are studied using simulations. National Health and Nutrition Examination Survey data are used to illustrate the methods. Copyright © 2018 John Wiley & Sons, Ltd.
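
    The basic 1-stage group-testing estimator that this methodology generalises is easy to state: with groups of size q and test sensitivity Se and specificity Sp, the group-positivity rate θ maps back to an individual prevalence p. The sketch below uses that inversion on simulated groups and omits the paper's sample weighting and GEE machinery.

```python
# Sketch of the basic 1-stage group-testing prevalence estimator.
import numpy as np

def prevalence_from_groups(theta_hat, q, se=1.0, sp=1.0):
    """Invert theta = se*(1-(1-p)^q) + (1-sp)*(1-p)^q for p."""
    neg_q = (se - theta_hat) / (se + sp - 1)      # estimate of (1-p)^q
    return 1 - np.clip(neg_q, 0, 1) ** (1 / q)

rng = np.random.default_rng(7)
p_true, q, n_groups = 0.03, 10, 400
carriers = rng.random((n_groups, q)) < p_true     # simulated individuals
theta = np.mean(carriers.any(axis=1))             # perfect-test group positivity
print("true p:", p_true,
      " estimated p:", round(float(prevalence_from_groups(theta, q)), 4))
```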

  1. Methodology of a nationwide cross-sectional survey of prevalence and epidemiological patterns of hepatitis A, B and C infection in Brazil.

    PubMed

    Ximenes, Ricardo Arraes de Alencar; Pereira, Leila Maria Beltrão; Martelli, Celina Maria Turchi; Merchán-Hamann, Edgar; Stein, Airton Tetelbom; Figueiredo, Gerusa Maria; Braga, Maria Cynthia; Montarroyos, Ulisses Ramos; Brasil, Leila Melo; Turchi, Marília Dalva; Fonseca, José Carlos Ferraz da; Lima, Maria Luiza Carvalho de; Alencar, Luis Cláudio Arraes de; Costa, Marcelo; Coral, Gabriela; Moreira, Regina Celia; Cardoso, Maria Regina Alves

    2010-09-01

    A population-based survey to provide information on the prevalence of hepatitis viral infection and the pattern of risk factors was carried out in the urban population of all Brazilian state capitals and the Federal District, between 2005 and 2009. This paper describes the design and methodology of the study, which involved a population aged 5 to 19 for hepatitis A and 10 to 69 for hepatitis B and C. Interviews and blood samples were obtained through household visits. The sample was selected using stratified multi-stage cluster sampling and was drawn with equal probability from each domain of study (region and age-group). Nationwide, 19,280 households and ~31,000 residents were selected. The study is large enough to detect a prevalence of viral infection of around 0.1% and to support risk factor assessment within each region. The methodology seems to be a viable way of differentiating between distinct epidemiological patterns of hepatitis A, B and C. These data will be of value for the evaluation of vaccination policies and for the design of control program strategies.

  2. School and Emotional Well-Being: A Transcultural Analysis on Youth in Southern Spain

    ERIC Educational Resources Information Center

    Soriano, Encarnación; Cala, Verónica C. C.

    2018-01-01

    Purpose: The purpose of this paper is to assess and compare school well-being (SW) and emotional well-being (EW) among Romanian, Moroccan and Spanish youth, and to determine the degree of relation between EW and SW. Design/methodology/approach: The paper employed cross-sectional research with cluster sampling in two primary schools and…

  3. Spatially resolved analysis of plutonium isotopic signatures in environmental particle samples by laser ablation-MC-ICP-MS.

    PubMed

    Konegger-Kappel, Stefanie; Prohaska, Thomas

    2016-01-01

    Laser ablation-multi-collector-inductively coupled plasma mass spectrometry (LA-MC-ICP-MS) was optimized and investigated with respect to its performance for determining spatially resolved Pu isotopic signatures within radioactive fuel particle clusters. Fuel particles had been emitted from the Chernobyl nuclear power plant (ChNPP) during the 1986 accident and were deposited in the surrounding soil, where weathering processes caused their transformation into radioactive clusters, so-called micro-samples. The size of the investigated micro-samples, which showed surface alpha activities below 40 mBq, ranged from about 200 to 1000 μm. Direct single static point ablations made it possible to identify variations of Pu isotopic signatures not only between distinct fuel particle clusters but also within individual clusters. The resolution was limited to 100 to 120 μm as a result of the applied laser ablation spot sizes and the resolving power of the nuclear track radiography methodology that was applied for particle pre-selection. The determined (242)Pu/(239)Pu and (240)Pu/(239)Pu isotope ratios varied from low to high values, ranging from 0.007(2) to 0.047(8) for (242)Pu/(239)Pu and from 0.183(13) to 0.577(40) for (240)Pu/(239)Pu. In contrast to other studies, the applied methodology made it possible for the first time to display the Pu isotopic distribution in the Chernobyl fallout, which reflects the differences in the spent fuel composition over the reactor core. The measured Pu isotopic signatures are in good agreement with the expected Pu isotopic composition distribution that is typical for an RBMK-1000 reactor, indicating that the analyzed samples originate from the ill-fated Chernobyl reactor. The average Pu isotope ratios [(240)Pu/(239)Pu = 0.388(86), (242)Pu/(239)Pu = 0.028(11)] calculated from all investigated samples (n = 48) correspond well to previously published results of Pu analyses in contaminated samples from the vicinity of the Chernobyl NPP [e.g. (240)Pu/(239)Pu = 0.394(2) and (242)Pu/(239)Pu = 0.027(1); Nunnemann et al. (J Alloys Compd 271-273:45-48, 1998)].

  4. The use of hierarchical clustering for the design of optimized monitoring networks

    NASA Astrophysics Data System (ADS)

    Soares, Joana; Makar, Paul Andrew; Aklilu, Yayne; Akingunola, Ayodeji

    2018-05-01

    Associativity analysis is a powerful tool to deal with large-scale datasets by clustering the data on the basis of (dis)similarity and can be used to assess the efficacy and design of air quality monitoring networks. We describe here our use of Kolmogorov-Zurbenko filtering and hierarchical clustering of NO2 and SO2 passive and continuous monitoring data to analyse and optimize air quality networks for these species in the province of Alberta, Canada. The methodology applied in this study assesses dissimilarity between monitoring station time series based on two metrics: 1 - R, R being the Pearson correlation coefficient, and the Euclidean distance; we find that both should be used in evaluating monitoring site similarity. We have combined the analytic power of hierarchical clustering with the spatial information provided by deterministic air quality model results, using the gridded time series of model output as potential station locations, as a proxy for assessing monitoring network design and for network optimization. We demonstrate that clustering results depend on the air contaminant analysed, reflecting the difference in the respective emission sources of SO2 and NO2 in the region under study. Our work shows that much of the signal identifying the sources of NO2 and SO2 emissions resides in shorter timescales (hourly to daily) due to short-term variation of concentrations and that longer-term averages in data collection may lose the information needed to identify local sources. However, the methodology identifies stations mainly influenced by seasonality, if larger timescales (weekly to monthly) are considered. We have performed the first dissimilarity analysis based on gridded air quality model output and have shown that the methodology is capable of generating maps of subregions within which a single station will represent the entire subregion, to a given level of dissimilarity. We have also shown that our approach is capable of identifying different sampling methodologies as well as outliers (stations' time series which are markedly different from all others in a given dataset).
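
    Both analysis ingredients are simple to sketch: the Kolmogorov-Zurbenko filter is an iterated moving average that isolates a chosen timescale, and station similarity can then be assessed by hierarchical clustering under the 1 - R dissimilarity. Window and iteration settings below are illustrative, not the study's.

```python
# Sketch: KZ filtering of station series, then 1 - R hierarchical clustering.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def kz_filter(x, window, iterations):
    """Iterated centred moving average (valid mode, so the series shortens)."""
    kernel = np.ones(window) / window
    for _ in range(iterations):
        x = np.convolve(x, kernel, mode="valid")
    return x

rng = np.random.default_rng(8)
t = np.arange(24 * 90)                                  # 90 days, hourly
stations = [np.sin(2 * np.pi * t / 24 + ph) + rng.normal(0, 1, t.size)
            for ph in (0, 0.1, 3.0, 3.1)]               # two source regimes

filtered = np.array([kz_filter(s, window=3, iterations=3) for s in stations])
R = np.corrcoef(filtered)
D = 1 - R                                               # dissimilarity metric
labels = fcluster(linkage(D[np.triu_indices(4, 1)], method="average"),
                  t=2, criterion="maxclust")
print(labels)                                           # recovers the two regimes
```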

  5. The Association of Multiple Interacting Genes with Specific Phenotypes in Rice Using Gene Coexpression Networks

    PubMed Central

    Ficklin, Stephen P.; Luo, Feng; Feltus, F. Alex

    2010-01-01

    Discovering gene sets underlying the expression of a given phenotype is of great importance, as many phenotypes are the result of complex gene-gene interactions. Gene coexpression networks, built using a set of microarray samples as input, can help elucidate tightly coexpressed gene sets (modules) that are mixed with genes of known and unknown function. Functional enrichment analysis of modules further subdivides the coexpressed gene set into cofunctional gene clusters that may coexist in the module with other functionally related gene clusters. In this study, 45 coexpressed gene modules and 76 cofunctional gene clusters were discovered for rice (Oryza sativa) using a global, knowledge-independent paradigm and the combination of two network construction methodologies. Some clusters were enriched for previously characterized mutant phenotypes, providing evidence for specific gene sets (and their annotated molecular functions) that underlie specific phenotypes. PMID:20668062

  6. The association of multiple interacting genes with specific phenotypes in rice using gene coexpression networks.

    PubMed

    Ficklin, Stephen P; Luo, Feng; Feltus, F Alex

    2010-09-01

    Discovering gene sets underlying the expression of a given phenotype is of great importance, as many phenotypes are the result of complex gene-gene interactions. Gene coexpression networks, built using a set of microarray samples as input, can help elucidate tightly coexpressed gene sets (modules) that are mixed with genes of known and unknown function. Functional enrichment analysis of modules further subdivides the coexpressed gene set into cofunctional gene clusters that may coexist in the module with other functionally related gene clusters. In this study, 45 coexpressed gene modules and 76 cofunctional gene clusters were discovered for rice (Oryza sativa) using a global, knowledge-independent paradigm and the combination of two network construction methodologies. Some clusters were enriched for previously characterized mutant phenotypes, providing evidence for specific gene sets (and their annotated molecular functions) that underlie specific phenotypes.

  7. Who should be undertaking population-based surveys in humanitarian emergencies?

    PubMed Central

    Spiegel, Paul B

    2007-01-01

    Background Timely and accurate data are necessary to prioritise and effectively respond to humanitarian emergencies. 30-by-30 cluster surveys are commonly used in humanitarian emergencies because of their purported simplicity and reasonable validity and precision. Agencies have increasingly used 30-by-30 cluster surveys to undertake measurements beyond immunisation coverage and nutritional status. Methodological errors in cluster surveys have likely occurred for decades in humanitarian emergencies, often with unknown or unevaluated consequences. Discussion Most surveys in humanitarian emergencies are done by non-governmental organisations (NGOs). Some undertake good quality surveys while others have an already overburdened staff with limited epidemiological skills. Manuals explaining cluster survey methodology are available and in use. However, it is debatable whether using standardised, 'cookbook' survey methodologies is appropriate. Coordination of surveys is often lacking. If a coordinating body is established, as recommended, it is questionable whether it should have sole authority to release surveys due to insufficient independence. Donors should provide sufficient funding for personnel, training, and survey implementation, and not solely for direct programme implementation. Summary A dedicated corps of trained epidemiologists needs to be identified and made available to undertake surveys in humanitarian emergencies. NGOs in the field may need to form an alliance with certain specialised agencies or pool technically capable personnel. If NGOs continue to do surveys by themselves, a simple training manual with sample survey questionnaires, methodology, standardised files for data entry and analysis, and a manual for interpretation should be developed and modified locally for each situation. At the beginning of an emergency, a central coordinating body should be established that has sufficient authority to set survey standards, coordinate when and where surveys should be undertaken and act as a survey repository. Technical expertise is expensive and donors must pay for it. As donors increasingly demand evidence-based programming, they have an obligation to ensure that sufficient funds are provided so organisations have adequate technical staff. PMID:17543107

  8. Enhancing local health department disaster response capacity with rapid community needs assessments: validation of a computerized program for binary attribute cluster sampling.

    PubMed

    Groenewold, Matthew R

    2006-01-01

    Local health departments are among the first agencies to respond to disasters or other mass emergencies. However, they often lack the ability to handle large-scale events. Plans including locally developed and deployed tools may enhance local response. Simplified cluster sampling methods can be useful in assessing community needs after a sudden-onset, short duration event. Using an adaptation of the methodology used by the World Health Organization Expanded Programme on Immunization (EPI), a Microsoft Access-based application for two-stage cluster sampling of residential addresses in Louisville/Jefferson County Metro, Kentucky was developed. The sampling frame was derived from geographically referenced data on residential addresses and political districts available through the Louisville/Jefferson County Information Consortium (LOJIC). The program randomly selected 30 clusters, defined as election precincts, from within the area of interest, and then, randomly selected 10 residential addresses from each cluster. The program, called the Rapid Assessment Tools Package (RATP), was tested in terms of accuracy and precision using data on a dichotomous characteristic of residential addresses available from the local tax assessor database. A series of 30 samples were produced and analyzed with respect to their precision and accuracy in estimating the prevalence of the study attribute. Point estimates with 95% confidence intervals were calculated by determining the proportion of the study attribute values in each of the samples and compared with the population proportion. To estimate the design effect, corresponding simple random samples of 300 addresses were taken after each of the 30 cluster samples. The sample proportion fell within +/-10 absolute percentage points of the true proportion in 80% of the samples. In 93.3% of the samples, the point estimate fell within +/-12.5%, and 96.7% fell within +/-15%. All of the point estimates fell within +/-20% of the true proportion. Estimates of the design effect ranged from 0.926 to 1.436 (mean = 1.157, median = 1.170) for the 30 samples. Although prospective evaluation of its performance in field trials or a real emergency is required to confirm its utility, this study suggests that the RATP, a locally designed and deployed tool, may provide population-based estimates of community needs or the extent of event-related consequences that are precise enough to serve as the basis for the initial post-event decisions regarding relief efforts.
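
    The two-stage design described here can be sketched generically: select 30 clusters with probability proportional to size (systematic PPS), then draw a simple random sample of 10 addresses within each. The precinct sizes below are simulated, and very large precincts could in principle be selected more than once, which this sketch ignores.

```python
# Hedged sketch of a generic 30-cluster, 10-address two-stage sample.
import numpy as np

rng = np.random.default_rng(9)
sizes = rng.integers(200, 2000, size=600)      # addresses per precinct (toy)

def systematic_pps(sizes, n_clusters, rng):
    """Systematic probability-proportional-to-size selection of clusters."""
    cum = np.cumsum(sizes)
    step = cum[-1] / n_clusters
    start = rng.uniform(0, step)
    points = start + step * np.arange(n_clusters)
    return np.searchsorted(cum, points)        # selected precinct indices

clusters = systematic_pps(sizes, 30, rng)
sample = {int(c): rng.choice(sizes[c], size=10, replace=False)
          for c in clusters}                   # 10 address indices per precinct
print(sorted(sample)[:5])
```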

  9. Using scan statistics for congenital anomalies surveillance: the EUROCAT methodology.

    PubMed

    Teljeur, Conor; Kelly, Alan; Loane, Maria; Densem, James; Dolk, Helen

    2015-11-01

Scan statistics have been used extensively to identify temporal clusters of health events. We describe the temporal cluster detection methodology adopted by the EUROCAT (European Surveillance of Congenital Anomalies) monitoring system. Since 2001, EUROCAT has implemented a variable window width scan statistic for detecting unusual temporal aggregations of congenital anomaly cases. The scan windows are based on numbers of cases rather than being defined by time. The methodology is embedded in the EUROCAT Central Database for annual application to centrally held registry data. The methodology was incrementally adapted to improve its utility and to address statistical issues. Simulation exercises were used to determine the power of the methodology to identify periods of raised risk (of 1-18 months). In order to operationalize the scan methodology, a number of adaptations were needed, including: estimating date of conception as the unit of time; deciding the maximum length (in time) and recency of clusters of interest; reporting of multiple and overlapping significant clusters; replacing the Monte Carlo simulation with a lookup table to reduce computation time; placing a threshold on underlying population change; and estimating the false positive rate by simulation. Exploration of power found that raised risk periods lasting 1 month are unlikely to be detected except when the relative risk and case counts are high. The variable window width scan statistic is a useful tool for the surveillance of congenital anomalies. Numerous adaptations have improved the utility of the original methodology in the context of temporal cluster detection in congenital anomalies.
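
    A toy illustration of a case-count-based scan with variable time window: the window holds a fixed number of consecutive cases, and an unusually short time span containing them signals a cluster. The uniform-risk null and Monte Carlo significance below are illustrative simplifications (EUROCAT replaced the Monte Carlo step with a lookup table), and all names and parameters are invented:

```python
import random

def shortest_span(dates, w):
    """Shortest time span containing w consecutive cases (dates in days)."""
    dates = sorted(dates)
    return min(dates[i + w - 1] - dates[i] for i in range(len(dates) - w + 1))

def scan_p_value(dates, w, period_days=5 * 365, n_sim=999):
    observed = shortest_span(dates, w)
    hits = 0
    for _ in range(n_sim):
        # Null: cases spread uniformly at random over the study period
        sim = [random.uniform(0, period_days) for _ in dates]
        if shortest_span(sim, w) <= observed:
            hits += 1
    return (hits + 1) / (n_sim + 1)

cases = [random.uniform(0, 5 * 365) for _ in range(60)]  # toy registry
print(scan_p_value(cases, w=8))
```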

  10. Using Public Data for Comparative Proteome Analysis in Precision Medicine Programs.

    PubMed

    Hughes, Christopher S; Morin, Gregg B

    2018-03-01

Maximizing the clinical utility of information obtained in longitudinal precision medicine programs would benefit from robust comparative analyses to known information to assess biological features of patient material toward identifying the underlying features driving their disease phenotype. Herein, the potential for utilizing publicly deposited mass-spectrometry-based proteomics data to perform inter-study comparisons of cell-line or tumor-tissue materials is investigated. To investigate the robustness of comparison between MS-based proteomics studies carried out with different methodologies, deposited data representative of label-free (MS1) and isobaric tagging (MS2 and MS3 quantification) are utilized. In-depth quantitative proteomics data acquired from analysis of ovarian cancer cell lines revealed the robust recapitulation of observable gene expression dynamics between individual studies carried out using significantly different methodologies. The observed signatures enable robust inter-study clustering of cell line samples. In addition, the ability to classify and cluster tumor samples based on observed gene expression trends when using a single patient sample is established. With this analysis, relevant gene expression dynamics are obtained from a single patient tumor, in the context of a precision medicine analysis, by leveraging a large cohort of repository data as a comparator. Together, these data establish the potential for state-of-the-art MS-based proteomics data to serve as resources for robust comparative analyses in precision medicine applications. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  11. Cluster Randomized Test-Negative Design (CR-TND) Trials: A Novel and Efficient Method to Assess the Efficacy of Community Level Dengue Interventions.

    PubMed

    Anders, Katherine L; Cutcher, Zoe; Kleinschmidt, Immo; Donnelly, Christl A; Ferguson, Neil M; Indriani, Citra; O'Neill, Scott L; Jewell, Nicholas P; Simmons, Cameron P

    2018-05-07

    Cluster randomized trials are the gold standard for assessing efficacy of community-level interventions, such as vector control strategies against dengue. We describe a novel cluster randomized trial methodology with a test-negative design, which offers advantages over traditional approaches. It utilizes outcome-based sampling of patients presenting with a syndrome consistent with the disease of interest, who are subsequently classified as test-positive cases or test-negative controls on the basis of diagnostic testing. We use simulations of a cluster trial to demonstrate validity of efficacy estimates under the test-negative approach. This demonstrates that, provided study arms are balanced for both test-negative and test-positive illness at baseline and that other test-negative design assumptions are met, the efficacy estimates closely match true efficacy. We also briefly discuss analytical considerations for an odds ratio-based effect estimate arising from clustered data, and outline potential approaches to analysis. We conclude that application of the test-negative design to certain cluster randomized trials could increase their efficiency and ease of implementation.
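
    A deliberately small numerical illustration of the effect estimate: efficacy is one minus the odds ratio of intervention-arm residence among test-positives versus test-negatives. The counts are invented, and the pooled odds ratio below ignores the cluster-level correlation that the paper's analytical approaches are designed to handle:

```python
# Counts are invented; a real CR-TND analysis must account for clustering.
pos_int, pos_ctl = 40, 100   # test-positive (dengue) patients by study arm
neg_int, neg_ctl = 300, 290  # test-negative controls by study arm

odds_ratio = (pos_int / neg_int) / (pos_ctl / neg_ctl)
print(f"OR = {odds_ratio:.2f}, estimated efficacy = {1 - odds_ratio:.1%}")
```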

  12. Improving detection probabilities for pests in stored grain.

    PubMed

    Elmouttie, David; Kiermeier, Andreas; Hamilton, Grant

    2010-12-01

    The presence of insects in stored grain is a significant problem for grain farmers, bulk grain handlers and distributors worldwide. Inspection of bulk grain commodities is essential to detect pests and thereby to reduce the risk of their presence in exported goods. It has been well documented that insect pests cluster in response to factors such as microclimatic conditions within bulk grain. Statistical sampling methodologies for grain, however, have typically considered pests and pathogens to be homogeneously distributed throughout grain commodities. In this paper, a sampling methodology is demonstrated that accounts for the heterogeneous distribution of insects in bulk grain. It is shown that failure to account for the heterogeneous distribution of pests may lead to overestimates of the capacity for a sampling programme to detect insects in bulk grain. The results indicate the importance of the proportion of grain that is infested in addition to the density of pests within the infested grain. It is also demonstrated that the probability of detecting pests in bulk grain increases as the number of subsamples increases, even when the total volume or mass of grain sampled remains constant. This study underlines the importance of considering an appropriate biological model when developing sampling methodologies for insect pests. Accounting for a heterogeneous distribution of pests leads to a considerable improvement in the detection of pests over traditional sampling models. Copyright © 2010 Society of Chemical Industry.
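
    A hedged sketch of the qualitative point, assuming a simple Poisson detection model in which insects occupy only a fraction of the grain; the functional forms and parameter values are illustrative, not the paper's fitted model:

```python
import math

# Fraction theta of the grain is infested at density lam (insects/kg);
# k subsamples of m kg each are drawn. Poisson counts within infested
# grain are an illustrative assumption.

def p_detect_clustered(theta, lam, k, m):
    miss_one = (1 - theta) + theta * math.exp(-lam * m)  # one subsample empty
    return 1 - miss_one ** k

def p_detect_homogeneous(theta, lam, k, m):
    # Same total insect load assumed spread evenly through the bulk
    return 1 - math.exp(-theta * lam * k * m)

theta, lam, total_kg = 0.05, 2.0, 10.0
print("k  clustered  homogeneous")
for k in (1, 5, 20, 100):
    m = total_kg / k
    print(f"{k:3d}  {p_detect_clustered(theta, lam, k, m):.3f}      "
          f"{p_detect_homogeneous(theta, lam, k, m):.3f}")
```

    With the total sampled mass held at 10 kg, detection under clustering rises steeply with the number of subsamples, while the homogeneous model overstates detection for small k, which is the paper's central point.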

  13. Clustering of samples and variables with mixed-type data

    PubMed Central

    Edelmann, Dominic; Kopp-Schneider, Annette

    2017-01-01

    Analysis of data measured on different scales is a relevant challenge. Biomedical studies often focus on high-throughput datasets of, e.g., quantitative measurements. However, the need for integration of other features possibly measured on different scales, e.g. clinical or cytogenetic factors, becomes increasingly important. The analysis results (e.g. a selection of relevant genes) are then visualized, while adding further information, like clinical factors, on top. However, a more integrative approach is desirable, where all available data are analyzed jointly, and where also in the visualization different data sources are combined in a more natural way. Here we specifically target integrative visualization and present a heatmap-style graphic display. To this end, we develop and explore methods for clustering mixed-type data, with special focus on clustering variables. Clustering of variables does not receive as much attention in the literature as does clustering of samples. We extend the variables clustering methodology by two new approaches, one based on the combination of different association measures and the other on distance correlation. With simulation studies we evaluate and compare different clustering strategies. Applying specific methods for mixed-type data proves to be comparable and in many cases beneficial as compared to standard approaches applied to corresponding quantitative or binarized data. Our two novel approaches for mixed-type variables show similar or better performance than the existing methods ClustOfVar and bias-corrected mutual information. Further, in contrast to ClustOfVar, our methods provide dissimilarity matrices, which is an advantage, especially for the purpose of visualization. Real data examples aim to give an impression of various kinds of potential applications for the integrative heatmap and other graphical displays based on dissimilarity matrices. We demonstrate that the presented integrative heatmap provides more information than common data displays about the relationship among variables and samples. The described clustering and visualization methods are implemented in our R package CluMix available from https://cran.r-project.org/web/packages/CluMix. PMID:29182671
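
    CluMix itself is an R package; as a generic illustration of a mixed-type dissimilarity that could feed such clustering and heatmap displays, here is a small Gower-style computation (field names and data are invented):

```python
import numpy as np

def gower(rows, numeric, categorical):
    """Gower-style dissimilarity: range-scaled gaps for numeric fields,
    0/1 mismatches for categorical fields, averaged over all fields."""
    n = len(rows)
    ranges = {c: (max(r[c] for r in rows) - min(r[c] for r in rows)) or 1.0
              for c in numeric}
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            parts = [abs(rows[i][c] - rows[j][c]) / ranges[c] for c in numeric]
            parts += [float(rows[i][c] != rows[j][c]) for c in categorical]
            d[i, j] = d[j, i] = sum(parts) / len(parts)
    return d

patients = [{"age": 63, "marker": 2.4, "grade": "high", "mutated": True},
            {"age": 41, "marker": 1.1, "grade": "low", "mutated": False},
            {"age": 58, "marker": 2.0, "grade": "high", "mutated": True}]
print(gower(patients, numeric=["age", "marker"],
            categorical=["grade", "mutated"]))
```

    The resulting dissimilarity matrix can be passed directly to hierarchical clustering, which is the property the abstract highlights as an advantage over methods that do not expose a dissimilarity matrix.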

  14. The Grism Lens-Amplified Survey from Space (GLASS). V. Extent and Spatial Distribution of Star Formation in z ~ 0.5 Cluster Galaxies

    NASA Astrophysics Data System (ADS)

    Vulcani, Benedetta; Treu, Tommaso; Schmidt, Kasper B.; Poggianti, Bianca M.; Dressler, Alan; Fontana, Adriano; Bradač, Marusa; Brammer, Gabriel B.; Hoag, Austin; Huang, Kuan-Han; Malkan, Matthew; Pentericci, Laura; Trenti, Michele; von der Linden, Anja; Abramson, Louis; He, Julie; Morris, Glenn

    2015-12-01

We present the first study of the spatial distribution of star formation in z ~ 0.5 cluster galaxies. The analysis is based on data taken with the Wide Field Camera 3 as part of the Grism Lens-Amplified Survey from Space (GLASS). We illustrate the methodology by focusing on two clusters (MACS 0717.5+3745 and MACS 1423.8+2404) with different morphologies (one relaxed and one merging) and use foreground and background galaxies as a field control sample. The cluster+field sample consists of 42 galaxies with stellar masses in the range 10^8-10^11 M⊙ and star formation rates in the range 1-20 M⊙ yr^-1. Both in clusters and in the field, Hα is more extended than the rest-frame UV continuum in 60% of the cases, consistent with diffuse star formation and inside-out growth. In ~20% of the cases, the Hα emission appears more extended in cluster galaxies than in the field, pointing perhaps to ionized gas being stripped and/or star formation being enhanced at large radii. The peak of the Hα emission and that of the continuum are offset by less than 1 kpc. We investigate trends with the hot gas density as traced by the X-ray emission, and with the surface mass density as inferred from gravitational lens models, and find no conclusive results. The diversity of morphologies and sizes observed in Hα illustrates the complexity of the environmental processes that regulate star formation. Upcoming analysis of the full GLASS data set will increase our sample size by almost an order of magnitude, verifying and strengthening the inference from this initial data set.

  15. Generalized quantum kinetic expansion: Higher-order corrections to multichromophoric Förster theory

    NASA Astrophysics Data System (ADS)

    Wu, Jianlan; Gong, Zhihao; Tang, Zhoufei

    2015-08-01

For a general two-cluster energy transfer network, a new methodology, the generalized quantum kinetic expansion (GQKE) method, is developed, which predicts an exact time-convolution equation for the cluster population evolution under the initial condition of the local cluster equilibrium state. The cluster-to-cluster rate kernel is expanded over the inter-cluster couplings. The lowest second-order GQKE rate recovers the multichromophoric Förster theory (MCFT) rate. The higher-order corrections to the MCFT rate are systematically included using the continued fraction resummation form, resulting in the resummed GQKE method. The reliability of the GQKE methodology is verified in two model systems, revealing the relevance of higher-order corrections.

  16. PhyloChip™ microarray comparison of sampling methods used for coral microbial ecology

    USGS Publications Warehouse

    Kellogg, Christina A.; Piceno, Yvette M.; Tom, Lauren M.; DeSantis, Todd Z.; Zawada, David G.; Andersen, Gary L.

    2012-01-01

    Interest in coral microbial ecology has been increasing steadily over the last decade, yet standardized methods of sample collection still have not been defined. Two methods were compared for their ability to sample coral-associated microbial communities: tissue punches and foam swabs, the latter being less invasive and preferred by reef managers. Four colonies of star coral, Montastraea annularis, were sampled in the Dry Tortugas National Park (two healthy and two with white plague disease). The PhyloChip™ G3 microarray was used to assess microbial community structure of amplified 16S rRNA gene sequences. Samples clustered based on methodology rather than coral colony. Punch samples from healthy and diseased corals were distinct. All swab samples clustered closely together with the seawater control and did not group according to the health state of the corals. Although more microbial taxa were detected by the swab method, there is a much larger overlap between the water control and swab samples than punch samples, suggesting some of the additional diversity is due to contamination from water absorbed by the swab. While swabs are useful for noninvasive studies of the coral surface mucus layer, these results show that they are not optimal for studies of coral disease.

  17. PhyloChip™ microarray comparison of sampling methods used for coral microbial ecology.

    PubMed

    Kellogg, Christina A; Piceno, Yvette M; Tom, Lauren M; DeSantis, Todd Z; Zawada, David G; Andersen, Gary L

    2012-01-01

    Interest in coral microbial ecology has been increasing steadily over the last decade, yet standardized methods of sample collection still have not been defined. Two methods were compared for their ability to sample coral-associated microbial communities: tissue punches and foam swabs, the latter being less invasive and preferred by reef managers. Four colonies of star coral, Montastraea annularis, were sampled in the Dry Tortugas National Park (two healthy and two with white plague disease). The PhyloChip™ G3 microarray was used to assess microbial community structure of amplified 16S rRNA gene sequences. Samples clustered based on methodology rather than coral colony. Punch samples from healthy and diseased corals were distinct. All swab samples clustered closely together with the seawater control and did not group according to the health state of the corals. Although more microbial taxa were detected by the swab method, there is a much larger overlap between the water control and swab samples than punch samples, suggesting some of the additional diversity is due to contamination from water absorbed by the swab. While swabs are useful for noninvasive studies of the coral surface mucus layer, these results show that they are not optimal for studies of coral disease. Published by Elsevier B.V.

  18. Segmentation methodology for automated classification and differentiation of soft tissues in multiband images of high-resolution ultrasonic transmission tomography.

    PubMed

    Jeong, Jeong-Won; Shin, Dae C; Do, Synho; Marmarelis, Vasilis Z

    2006-08-01

This paper presents a novel segmentation methodology for automated classification and differentiation of soft tissues using multiband data obtained with the newly developed system of high-resolution ultrasonic transmission tomography (HUTT) for imaging biological organs. This methodology extends and combines two existing approaches: the L-level set active contour (AC) segmentation approach and the agglomerative hierarchical k-means approach for unsupervised clustering (UC). To prevent the trapping of the current iterative minimization AC algorithm in a local minimum, we introduce a multiresolution approach that applies the level set functions at successively increasing resolutions of the image data. The resulting AC clusters are subsequently rearranged by the UC algorithm that seeks the optimal set of clusters yielding the minimum within-cluster distances in the feature space. The presented results from Monte Carlo simulations and experimental animal-tissue data demonstrate that the proposed methodology outperforms other existing methods without depending on heuristic parameters and provides a reliable means for soft tissue differentiation in HUTT images.

  19. Chemodynamical Clustering Applied to APOGEE Data: Rediscovering Globular Clusters

    NASA Astrophysics Data System (ADS)

    Chen, Boquan; D’Onghia, Elena; Pardy, Stephen A.; Pasquali, Anna; Bertelli Motta, Clio; Hanlon, Bret; Grebel, Eva K.

    2018-06-01

    We have developed a novel technique based on a clustering algorithm that searches for kinematically and chemically clustered stars in the APOGEE DR12 Cannon data. As compared to classical chemical tagging, the kinematic information included in our methodology allows us to identify stars that are members of known globular clusters with greater confidence. We apply our algorithm to the entire APOGEE catalog of 150,615 stars whose chemical abundances are derived by the Cannon. Our methodology found anticorrelations between the elements Al and Mg, Na and O, and C and N previously identified in the optical spectra in globular clusters, even though we omit these elements in our algorithm. Our algorithm identifies globular clusters without a priori knowledge of their locations in the sky. Thus, not only does this technique promise to discover new globular clusters, but it also allows us to identify candidate streams of kinematically and chemically clustered stars in the Milky Way.

  20. Melissopalynological Characterization of North Algerian Honeys.

    PubMed

    Nair, Samira; Meddah, Boumedienne; Aoues, Abdelkader

    2013-03-07

A pollen analysis of Algerian honey was conducted on a total of 10 honey samples. The samples were prepared using the methodology described by Louveaux et al., which was then further adapted by Ohe et al. The samples were subsequently observed using light microscopy. A total of 36 pollen taxa were discovered and could be identified in the analyzed honey samples. Seventy percent of the studied samples belonged to the group of monofloral honeys represented by Eucalyptus globulus, Thymus vulgaris, Citrus sp. and Lavandula angustifolia. Multifloral honeys comprised 30% of the honey samples, with pollen grains of Lavandula stoechas (28.49%) standing out as the most prevalent. Based on cluster analysis, two different groups of honey were observed according to different pollen types found in the samples. The identified pollen spectrum of honey confirmed their botanical origin.

  1. Mining biological information from 3D short time-series gene expression data: the OPTricluster algorithm.

    PubMed

    Tchagang, Alain B; Phan, Sieu; Famili, Fazel; Shearer, Heather; Fobert, Pierre; Huang, Yi; Zou, Jitao; Huang, Daiqing; Cutler, Adrian; Liu, Ziying; Pan, Youlian

    2012-04-04

    Nowadays, it is possible to collect expression levels of a set of genes from a set of biological samples during a series of time points. Such data have three dimensions: gene-sample-time (GST). Thus they are called 3D microarray gene expression data. To take advantage of the 3D data collected, and to fully understand the biological knowledge hidden in the GST data, novel subspace clustering algorithms have to be developed to effectively address the biological problem in the corresponding space. We developed a subspace clustering algorithm called Order Preserving Triclustering (OPTricluster), for 3D short time-series data mining. OPTricluster is able to identify 3D clusters with coherent evolution from a given 3D dataset using a combinatorial approach on the sample dimension, and the order preserving (OP) concept on the time dimension. The fusion of the two methodologies allows one to study similarities and differences between samples in terms of their temporal expression profile. OPTricluster has been successfully applied to four case studies: immune response in mice infected by malaria (Plasmodium chabaudi), systemic acquired resistance in Arabidopsis thaliana, similarities and differences between inner and outer cotyledon in Brassica napus during seed development, and to Brassica napus whole seed development. These studies showed that OPTricluster is robust to noise and is able to detect the similarities and differences between biological samples. Our analysis showed that OPTricluster generally outperforms other well known clustering algorithms such as the TRICLUSTER, gTRICLUSTER and K-means; it is robust to noise and can effectively mine the biological knowledge hidden in the 3D short time-series gene expression data.
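
    A toy rendering of the order-preserving (OP) idea on the time dimension: a gene's pattern in a sample is the rank order of its expression across time points, and samples that agree on the pattern group together. Data and names are invented; this is not the OPTricluster implementation:

```python
from collections import defaultdict

def rank_pattern(series):
    """Indices of the time points sorted by expression value (the OP pattern)."""
    return tuple(sorted(range(len(series)), key=series.__getitem__))

expr = {  # gene -> {sample: expression at t0, t1, t2}
    "g1": {"control": [1.0, 2.5, 4.1], "treated": [0.5, 1.9, 3.3]},
    "g2": {"control": [3.2, 1.1, 0.4], "treated": [0.2, 1.4, 2.9]},
}

for gene, samples in expr.items():
    groups = defaultdict(list)
    for sample, series in samples.items():
        groups[rank_pattern(series)].append(sample)
    # g1 rises in both samples (conserved); g2 falls in one, rises in the other
    print(gene, dict(groups))
```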

  2. Design of the South East Asian Nutrition Survey (SEANUTS): a four-country multistage cluster design study.

    PubMed

    Schaafsma, Anne; Deurenberg, Paul; Calame, Wim; van den Heuvel, Ellen G H M; van Beusekom, Christien; Hautvast, Jo; Sandjaja; Bee Koon, Poh; Rojroongwasinkul, Nipa; Le Nguyen, Bao Khanh; Parikh, Panam; Khouw, Ilse

    2013-09-01

Nutrition is a well-known factor in the growth, health and development of children. It is also acknowledged that worldwide many people have dietary imbalances resulting in over- or undernutrition. In 2009, the multinational food company FrieslandCampina initiated the South East Asian Nutrition Survey (SEANUTS), a combination of surveys carried out in Indonesia, Malaysia, Thailand and Vietnam, to get a better insight into these imbalances. The present study describes the general study design and methodology, as well as some problems and pitfalls encountered. In each of these countries, participants in the age range of 0.5-12 years were recruited according to a multistage cluster randomised or stratified random sampling methodology. Field teams took care of recruitment and data collection. For the health status of children, growth and body composition, physical activity, bone density, and development and cognition were measured. For nutrition, food intake and food habits were assessed by questionnaires, whereas in subpopulations blood and urine samples were collected to measure the biochemical status parameters of Fe, vitamins A and D, and DHA. In Thailand, the researchers additionally studied the lipid profile in blood, whereas in Indonesia iodine excretion in urine was analysed. Biochemical data were analysed in certified laboratories. Study protocols and methodology were aligned where practically possible. In December 2011, data collection was finalised. In total, 16,744 children participated in the present study. Information that will be very relevant for formulating nutritional health policies, as well as for designing innovative food and nutrition research and development programmes, has become available.

  3. MCAM: multiple clustering analysis methodology for deriving hypotheses and insights from high-throughput proteomic datasets.

    PubMed

    Naegle, Kristen M; Welsch, Roy E; Yaffe, Michael B; White, Forest M; Lauffenburger, Douglas A

    2011-07-01

Advances in proteomic technologies continue to substantially accelerate capability for generating experimental data on protein levels, states, and activities in biological samples. For example, studies on receptor tyrosine kinase signaling networks can now capture the phosphorylation state of hundreds to thousands of proteins across multiple conditions. However, little is known about the function of many of these protein modifications, or the enzymes responsible for modifying them. To address this challenge, we have developed an approach that enhances the power of clustering techniques to infer functional and regulatory meaning of protein states in cell signaling networks. We have created a new computational framework for applying clustering to biological data in order to overcome the typical dependence on specific a priori assumptions and expert knowledge concerning the technical aspects of clustering. Multiple clustering analysis methodology ('MCAM') employs an array of diverse data transformations, distance metrics, set sizes, and clustering algorithms, in a combinatorial fashion, to create a suite of clustering sets. These sets are then evaluated based on their ability to produce biological insights through statistical enrichment of metadata relating to knowledge concerning protein functions, kinase substrates, and sequence motifs. We applied MCAM to a set of dynamic phosphorylation measurements of the ERBB network to explore the relationships between algorithmic parameters and the biological meaning that could be inferred and report on interesting biological predictions. Further, we applied MCAM to multiple phosphoproteomic datasets for the ERBB network, which allowed us to compare independent and incomplete overlapping measurements of phosphorylation sites in the network. We report specific and global differences of the ERBB network stimulated with different ligands and with changes in HER2 expression. Overall, we offer MCAM as a broadly-applicable approach for analysis of proteomic data which may help increase the current understanding of molecular networks in a variety of biological problems. © 2011 Naegle et al.
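
    A stripped-down sweep in the spirit of MCAM, assuming SciPy and scikit-learn are available: cross a few transformations, distance metrics and cluster counts, keep every resulting partition, and score each one. Silhouette width stands in here as a placeholder for MCAM's metadata-enrichment evaluation:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.metrics import silhouette_score

# Synthetic stand-in for a phosphorylation matrix: 75 sites x 6 conditions
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 0.4, size=(25, 6)) for m in (0.0, 1.5, 3.0)])

transforms = {"raw": lambda a: a, "log": lambda a: np.log1p(a - a.min() + 1)}
suite = {}
for tname, f in transforms.items():
    for metric in ("euclidean", "correlation"):
        Z = linkage(f(X), method="average", metric=metric)
        for k in (2, 3, 4):
            labels = fcluster(Z, t=k, criterion="maxclust")
            score = silhouette_score(f(X), labels, metric=metric)
            suite[(tname, metric, k)] = (labels, score)

best = max(suite, key=lambda key: suite[key][1])
print("best parameter set:", best, "silhouette:", round(suite[best][1], 2))
```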

  4. Function Clustering Self-Organization Maps (FCSOMs) for mining differentially expressed genes in Drosophila and its correlation with the growth medium.

    PubMed

    Liu, L L; Liu, M J; Ma, M

    2015-09-28

The central task of this study was to mine the gene-to-medium relationship. Adequate knowledge of this relationship could potentially improve the accuracy of differentially expressed gene mining. One of the approaches to differentially expressed gene mining uses conventional clustering algorithms to identify the gene-to-medium relationship. Compared to conventional clustering algorithms, self-organization maps (SOMs) identify the nonlinear aspects of the gene-to-medium relationships by mapping the input space into another higher dimensional feature space. However, SOMs are not suitable for huge datasets consisting of millions of samples. Therefore, a new computational model, the Function Clustering Self-Organization Maps (FCSOMs), was developed. FCSOMs take advantage of the theory of granular computing as well as advanced statistical learning methodologies, and are built specifically for each information granule (a function cluster of genes), which are intelligently partitioned by the clustering algorithm provided by the DAVID_6.7 software platform. However, only the gene functions, and not their expression values, are considered in the fuzzy clustering algorithm of DAVID. The experimental results show a marked improvement in classification accuracy with FCSOMs compared to the clustering algorithm of DAVID. FCSOMs can handle huge datasets and their complex classification problems, as each FCSOM (modeled for each function cluster) can be easily parallelized.

  5. Identification of Common Differentially Expressed Genes in Urinary Bladder Cancer

    PubMed Central

    Zaravinos, Apostolos; Lambrou, George I.; Boulalas, Ioannis; Delakas, Dimitris; Spandidos, Demetrios A.

    2011-01-01

Background Current diagnosis and treatment of urinary bladder cancer (BC) has shown great progress with the utilization of microarrays. Purpose Our goal was to identify common differentially expressed (DE) genes among clinically relevant subclasses of BC using microarrays. Methodology/Principal Findings BC samples and controls, both experimental and publicly available datasets, were analyzed by whole genome microarrays. We grouped the samples according to their histology and defined the DE genes in each sample individually, as well as in each tumor group. A dual analysis strategy was followed. First, experimental samples were analyzed and conclusions were formulated; and second, experimental sets were combined with publicly available microarray datasets and were further analyzed in search of common DE genes. The experimental dataset identified 831 genes that were DE in all tumor samples, simultaneously. Moreover, 33 genes were up-regulated and 85 genes were down-regulated in all 10 BC samples compared to the 5 normal tissues, simultaneously. Hierarchical clustering partitioned tumor groups in accordance with their histology. K-means clustering of all genes and all samples, as well as clustering of tumor groups, presented 49 clusters. K-means clustering of common DE genes in all samples revealed 24 clusters. Genes manifested various differential patterns of expression, based on PCA. YY1 and NFκB were among the most common transcription factors that regulated the expression of the identified DE genes. Chromosome 1 contained 32 DE genes, followed by chromosomes 2 and 11, which contained 25 and 23 DE genes, respectively. Chromosome 21 had the fewest DE genes. GO analysis revealed the prevalence of transport and binding genes in the common down-regulated DE genes; the prevalence of RNA metabolism and processing genes in the up-regulated DE genes; as well as the prevalence of genes responsible for cell communication and signal transduction in the DE genes that were down-regulated in T1-Grade III tumors and up-regulated in T2/T3-Grade III tumors. Combination of samples from all microarray platforms revealed 17 common DE genes (BMP4, CRYGD, DBH, GJB1, KRT83, MPZ, NHLH1, TACR3, ACTC1, MFAP4, SPARCL1, TAGLN, TPM2, CDC20, LHCGR, TM9SF1 and HCCS), 4 of which participate in numerous pathways. Conclusions/Significance The identification of the common DE genes among BC samples of different histology can provide further insight into the discovery of new putative markers. PMID:21483740

  6. The clustering of galaxies in the completed SDSS-III Baryon Oscillation Spectroscopic Survey: Single-probe measurements from DR12 galaxy clustering – towards an accurate model

    DOE PAGES

Chia-Hsun Chuang; Pellejero-Ibanez, Marco; Rodriguez-Torres, Sergio; ...

    2016-06-26

We analyze the broad-range shape of the monopole and quadrupole correlation functions of the BOSS Data Release 12 (DR12) CMASS and LOWZ galaxy sample to obtain constraints on the Hubble expansion rate H(z), the angular-diameter distance D_A(z), the normalised growth rate f(z)σ_8(z), and the physical matter density Ω_m h^2. In addition, we adopt wide and flat priors on all model parameters in order to ensure the results are those of a 'single-probe' galaxy clustering analysis. We also marginalize over three nuisance terms that account for potential observational systematics affecting the measured monopole. However, such Monte Carlo Markov Chain analysis is computationally expensive for advanced theoretical models, thus we develop a new methodology to speed up our analysis.

  7. A methodological study of genome-wide DNA methylation analyses using matched archival formalin-fixed paraffin embedded and fresh frozen breast tumors.

    PubMed

    Espinal, Allyson C; Wang, Dan; Yan, Li; Liu, Song; Tang, Li; Hu, Qiang; Morrison, Carl D; Ambrosone, Christine B; Higgins, Michael J; Sucheston-Campbell, Lara E

    2017-02-28

    DNA from archival formalin-fixed and paraffin embedded (FFPE) tissue is an invaluable resource for genome-wide methylation studies although concerns about poor quality may limit its use. In this study, we compared DNA methylation profiles of breast tumors using DNA from fresh-frozen (FF) tissues and three types of matched FFPE samples. For 9/10 patients, correlation and unsupervised clustering analysis revealed that the FF and FFPE samples were consistently correlated with each other and clustered into distinct subgroups. Greater than 84% of the top 100 loci previously shown to differentiate ER+ and ER- tumors in FF tissues were also FFPE DML. Weighted Correlation Gene Network Analyses (WCGNA) grouped the DML loci into 16 modules in FF tissue, with ~85% of the module membership preserved across tissue types. Restored FFPE and matched FF samples were profiled using the Illumina Infinium HumanMethylation450K platform. Methylation levels (β-values) across all loci and the top 100 loci previously shown to differentiate tumors by estrogen receptor status (ER+ or ER-) in a larger FF study, were compared between matched FF and FFPE samples using Pearson's correlation, hierarchical clustering and WCGNA. Positive predictive values and sensitivity levels for detecting differentially methylated loci (DML) in FF samples were calculated in an independent FFPE cohort. FFPE breast tumors samples show lower overall detection of DMLs versus FF, however FFPE and FF DMLs compare favorably. These results support the emerging consensus that the 450K platform can be employed to investigate epigenetics in large sets of archival FFPE tissues.

  8. Melissopalynological Characterization of North Algerian Honeys

    PubMed Central

    Nair, Samira; Meddah, Boumedienne; Aoues, Abdelkader

    2013-01-01

A pollen analysis of Algerian honey was conducted on a total of 10 honey samples. The samples were prepared using the methodology described by Louveaux et al., which was then further adapted by Ohe et al. The samples were subsequently observed using light microscopy. A total of 36 pollen taxa were discovered and could be identified in the analyzed honey samples. Seventy percent of the studied samples belonged to the group of monofloral honeys represented by Eucalyptus globulus, Thymus vulgaris, Citrus sp. and Lavandula angustifolia. Multifloral honeys comprised 30% of the honey samples, with pollen grains of Lavandula stoechas (28.49%) standing out as the most prevalent. Based on cluster analysis, two different groups of honey were observed according to different pollen types found in the samples. The identified pollen spectrum of honey confirmed their botanical origin. PMID:28239099

  9. Source Apportionment and Risk Assessment of Emerging Contaminants: An Approach of Pharmaco-Signature in Water Systems

    PubMed Central

    Jiang, Jheng Jie; Lee, Chon Lin; Fang, Meng Der; Boyd, Kenneth G.; Gibb, Stuart W.

    2015-01-01

This paper presents a methodology based on multivariate data analysis for characterizing potential source contributions of emerging contaminants (ECs) detected in 26 river water samples across multi-scape regions during dry and wet seasons. Based on this methodology, we unveil an approach toward potential source contributions of ECs, a concept we refer to as the “Pharmaco-signature.” Exploratory analysis of data points has been carried out by unsupervised pattern recognition (hierarchical cluster analysis, HCA) and receptor model (principal component analysis-multiple linear regression, PCA-MLR) in an attempt to demonstrate significant source contributions of ECs in different land-use zones. Robust cluster solutions grouped the database according to different EC profiles. PCA-MLR identified that 58.9% of the mean summed ECs were contributed by domestic impact, 9.7% by antibiotics application, and 31.4% by drug abuse. Diclofenac, ibuprofen, codeine, ampicillin, tetracycline, and erythromycin-H2O have significant pollution risk quotients (RQ>1), indicating potentially high risk to aquatic organisms in Taiwan. PMID:25874375
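
    A rough sketch of the PCA-MLR idea on synthetic data: PCA factors are extracted from the concentration matrix, the summed concentration is regressed on shifted factor scores, and mean contributions are converted to shares. The score shift and the absolute values are crude stand-ins for the full APCS-style treatment, and all data are invented:

```python
import numpy as np

rng = np.random.default_rng(1)
sources = rng.gamma(2.0, 1.0, size=(26, 2))               # sample x source
profiles = np.array([[1.0, 0.1], [0.8, 0.2], [0.7, 0.3],
                     [0.2, 0.9], [0.1, 1.0], [0.3, 0.8]]) # analyte x source
X = sources @ profiles.T + rng.normal(0, 0.05, (26, 6))   # concentrations

Xz = (X - X.mean(0)) / X.std(0)                 # standardize analytes
U, s, Vt = np.linalg.svd(Xz, full_matrices=False)
scores = U[:, :2] * s[:2]                       # first two PCA factors

total = X.sum(axis=1)                           # summed ECs per sample
shifted = scores - scores.min(axis=0)           # crude APCS-style shift
A = np.column_stack([np.ones(len(total)), shifted])
coef, *_ = np.linalg.lstsq(A, total, rcond=None)  # MLR on factor scores

# abs() papers over the sign ambiguity of PCA axes in this toy example
mean_contrib = np.abs(coef[1:] * shifted.mean(axis=0))
print("approx. source shares:", mean_contrib / mean_contrib.sum())
```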

  10. Mapping the spatial distribution of star formation in cluster galaxies at z ~ 0.5 with the Grism Lens-Amplified Survey from Space (GLASS)

    NASA Astrophysics Data System (ADS)

Vulcani, Benedetta

We present the first study of the spatial distribution of star formation in z ~ 0.5 cluster galaxies. The analysis is based on data taken with the Wide Field Camera 3 as part of the Grism Lens-Amplified Survey from Space (GLASS). We illustrate the methodology by focusing on two clusters (MACS0717.5+3745 and MACS1423.8+2404) with different morphologies (one relaxed and one merging) and use foreground and background galaxies as a field control sample. The cluster+field sample consists of 42 galaxies with stellar masses in the range 10^8-10^11 M⊙, and star formation rates in the range 1-20 M⊙ yr^-1. In both environments, Hα is more extended than the rest-frame UV continuum in 60% of the cases, consistent with diffuse star formation and inside-out growth. The Hα emission appears more extended in cluster galaxies than in the field, pointing perhaps to ionized gas being stripped and/or star formation being enhanced at large radii. The peak of the Hα emission and that of the continuum are offset by less than 1 kpc. We investigate trends with the hot gas density as traced by the X-ray emission, and with the surface mass density as inferred from gravitational lens models and find no conclusive results. The diversity of morphologies and sizes observed in Hα illustrates the complexity of the environmental processes that regulate star formation.

  11. Sequential analysis of hydrochemical data for watershed characterization.

    PubMed

    Thyne, Geoffrey; Güler, Cüneyt; Poeter, Eileen

    2004-01-01

A methodology for characterizing the hydrogeology of watersheds using hydrochemical data that combines statistical, geochemical, and spatial techniques is presented. Surface water and ground water base flow and spring runoff samples (180 total) from a single watershed are first classified using hierarchical cluster analysis. The statistical clusters are analyzed for spatial coherence, confirming that the clusters have a geological basis corresponding to topographic flowpaths and showing that the fractured rock aquifer behaves as an equivalent porous medium on the watershed scale. Then principal component analysis (PCA) is used to determine the sources of variation between parameters. PCA shows that the variations within the dataset are related to variations in Ca, Mg, SO4, and HCO3, which are derived from natural weathering reactions, and pH, NO3, and Cl, which indicate anthropogenic impact. PHREEQC modeling is used to quantitatively describe the natural hydrochemical evolution for the watershed and aid in discrimination of samples that have an anthropogenic component. Finally, the seasonal changes in the water chemistry of individual sites were analyzed to better characterize the spatial variability of vertical hydraulic conductivity. The integrated result provides a method to characterize the hydrogeology of the watershed that fully utilizes traditional data.

  12. Drugs and personality: comparison of drug users, nonusers, and other clinical groups on the 16PF.

    PubMed

    Spotts, J V; Shontz, F C

    1991-10-01

    This article reviews published 16PF research on drug users. It also compares the 16PF scores of a new sample of nonusers with scores of matched groups of heavy, chronic users of cocaine, amphetamine, opiates, and barbiturates/sedative hypnotics, as well as combined groups of stimulant users, depressant users, and a combined group of users of all substances. No significant differences were found among drug user groups, but the profile of the nonuser group was distinctive. K-Means Cluster Analyses, as well as Cattell's Similarity and Pearson Product Moment Correlation Coefficients, were used to compare profiles of these new samples with the 19 groups described in an earlier meta-analysis of published 16PF studies. Data from the new samples did not cluster with data from other published research, although certain specific similarities appeared in more detailed correlational analyses. Methodological problems are discussed, and it is recommended that in future studies drug user groups be more carefully selected and defined, sample descriptions be more thorough and complete, complete profile information be routinely provided, and efforts be made to explore the utility of the Cattell CAQ in studies of drug users/misusers.

  13. Principal Cluster Axes: A Projection Pursuit Index for the Preservation of Cluster Structures in the Presence of Data Reduction

    ERIC Educational Resources Information Center

    Steinley, Douglas; Brusco, Michael J.; Henson, Robert

    2012-01-01

    A measure of "clusterability" serves as the basis of a new methodology designed to preserve cluster structure in a reduced dimensional space. Similar to principal component analysis, which finds the direction of maximal variance in multivariate space, principal cluster axes find the direction of maximum clusterability in multivariate space.…

  14. A Refined Methodology for Defining Plant Communities Using Postagricultural Data from the Neotropics

    PubMed Central

    Myster, Randall W.

    2012-01-01

    How best to define and quantify plant communities was investigated using long-term plot data sampled from a recovering pasture in Puerto Rico and abandoned sugarcane and banana plantations in Ecuador. Significant positive associations between pairs of old field species were first computed and then clustered together into larger and larger species groups. I found that (1) no pasture or plantation had more than 5% of the possible significant positive associations, (2) clustering metrics showed groups of species participating in similar clusters among the five pasture/plantations over a gradient of decreasing association strength, and (3) there was evidence for repeatable communities—especially after banana cultivation—suggesting that past crops not only persist after abandonment but also form significant associations with invading plants. I then showed how the clustering hierarchy could be used to decide if any two pasture/plantation plots were in the same community, that is, to define old field communities. Finally, I suggested a similar procedure could be used for any plant community where the mechanisms and tolerances of species form the “cohesion” that produces clustering, making plant communities different than random assemblages of species. PMID:22536137

  15. Transport in the Subtropical Lowermost Stratosphere during CRYSTAL-FACE

    NASA Technical Reports Server (NTRS)

Pittman, Jasna V.; Weinstock, Elliot M.; Oglesby, Robert J.; Sayres, David S.; Smith, Jessica B.; Anderson, James G.; Cooper, Owen R.; Wofsy, Steven C.; Xueref, Irene; Gerbig, Cristoph

    2007-01-01

We use in situ measurements of water vapor (H2O), ozone (O3), carbon dioxide (CO2), carbon monoxide (CO), nitric oxide (NO), and total reactive nitrogen (NO(y)) obtained during the CRYSTAL-FACE campaign in July 2002 to study summertime transport in the subtropical lowermost stratosphere. We use an objective methodology to distinguish the latitudinal origin of the sampled air masses despite the influence of convection, and we calculate backward trajectories to elucidate their recent geographical history. The methodology consists of exploring the statistical behavior of the data by performing multivariate clustering and agglomerative hierarchical clustering calculations, and projecting cluster groups onto principal component space to identify air masses of like composition and hence presumed origin. The statistically derived cluster groups are then examined in physical space using tracer-tracer correlation plots. Interpretation of the principal component analysis suggests that the variability in the data is accounted for primarily by the mean age of air in the stratosphere, followed by the age of the convective influence, and lastly by the extent of convective influence, potentially related to the latitude of convective injection [Dessler and Sherwood, 2004]. We find that high-latitude stratospheric air is the dominant source region during the beginning of the campaign while tropical air is the dominant source region during the rest of the campaign. Influence of convection from both local and non-local events is frequently observed. The identification of air mass origin is confirmed with backward trajectories, and the behavior of the trajectories is associated with the North American monsoon circulation.

  16. Transport in the Subtropical Lowermost Stratosphere during the Cirrus Regional Study of Tropical Anvils and Cirrus Layers-Florida Area Cirrus Experiment

    NASA Technical Reports Server (NTRS)

Pittman, Jasna V.; Weinstock, Elliot M.; Oglesby, Robert J.; Sayres, David S.; Smith, Jessica B.; Anderson, James G.; Cooper, Owen R.; Wofsy, Steven C.; Xueref, Irene; Gerbig, Cristoph

    2007-01-01

    We use in situ measurements of water vapor (H2O), ozone (O3), carbon dioxide (CO2), carbon monoxide (CO), nitric oxide (NO), and total reactive nitrogen (NOy) obtained during the CRYSTAL-FACE campaign in July 2002 to study summertime transport in the subtropical lowermost stratosphere. We use an objective methodology to distinguish the latitudinal origin of the sampled air masses despite the influence of convection, and we calculate backward trajectories to elucidate their recent geographical history. The methodology consists of exploring the statistical behavior of the data by performing multivariate clustering and agglomerative hierarchical clustering calculations and projecting cluster groups onto principal component space to identify air masses of like composition and hence presumed origin. The statistically derived cluster groups are then examined in physical space using tracer-tracer correlation plots. Interpretation of the principal component analysis suggests that the variability in the data is accounted for primarily by the mean age of air in the stratosphere, followed by the age of the convective influence, and last by the extent of convective influence, potentially related to the latitude of convective injection (Dessler and Sherwood, 2004). We find that high-latitude stratospheric air is the dominant source region during the beginning of the campaign while tropical air is the dominant source region during the rest of the campaign. Influence of convection from both local and nonlocal events is frequently observed. The identification of air mass origin is confirmed with backward trajectories, and the behavior of the trajectories is associated with the North American monsoon circulation.

  17. Is the cluster environment quenching the Seyfert activity in elliptical and spiral galaxies?

    NASA Astrophysics Data System (ADS)

    de Souza, R. S.; Dantas, M. L. L.; Krone-Martins, A.; Cameron, E.; Coelho, P.; Hattab, M. W.; de Val-Borro, M.; Hilbe, J. M.; Elliott, J.; Hagen, A.; COIN Collaboration

    2016-09-01

We developed a hierarchical Bayesian model (HBM) to investigate how the presence of Seyfert activity in galaxies relates to their environment, herein represented by the galaxy cluster mass, M200, and the normalized cluster-centric distance, r/r200. We achieved this by constructing an unbiased sample of galaxies from the Sloan Digital Sky Survey, with morphological classifications provided by the Galaxy Zoo Project. A propensity score matching approach is introduced to control the effects of confounding variables: stellar mass, galaxy colour, and star formation rate. The connection between Seyfert activity and environmental properties in the de-biased sample is modelled within an HBM framework using the so-called logistic regression technique, suitable for the analysis of binary data (e.g. whether or not a galaxy hosts an AGN). Unlike standard ordinary least squares fitting methods, our methodology naturally allows modelling the probability of Seyfert-AGN activity in galaxies on its natural scale, i.e. as a binary variable. Furthermore, we demonstrate how an HBM can incorporate information on each particular galaxy morphological type in a unified framework. In elliptical galaxies our analysis indicates a strong correlation of Seyfert-AGN activity with r/r200, and a weaker correlation with the mass of the host cluster. In spiral galaxies these trends do not appear, suggesting that the link between Seyfert activity and the properties of spiral galaxies is independent of the environment.
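
    The core regression, reduced to a plain (non-hierarchical) logistic model fitted by Newton iterations on synthetic galaxies; the paper's hierarchical Bayesian layers, morphology-specific terms and propensity-score matching are all omitted, and the data are invented:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
log_m200 = rng.normal(14.0, 0.5, n)           # log10 host cluster mass
r_norm = rng.uniform(0.0, 2.0, n)             # r / r200
logit = -2.0 + 0.8 * r_norm - 0.3 * (log_m200 - 14.0)
y = rng.random(n) < 1 / (1 + np.exp(-logit))  # 1 = hosts a Seyfert AGN

X = np.column_stack([np.ones(n), r_norm, log_m200 - 14.0])
beta = np.zeros(3)
for _ in range(25):                           # Newton-Raphson for the MLE
    p = 1 / (1 + np.exp(-X @ beta))
    grad = X.T @ (y - p)
    hess = X.T @ (X * (p * (1 - p))[:, None])
    beta += np.linalg.solve(hess, grad)
print("intercept, r/r200 slope, mass slope:", beta.round(2))
```

    Modelling the 0/1 outcome directly on the logit scale is the point the abstract makes against ordinary least squares fits of a binary response.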

  18. Bacterial community comparisons by taxonomy-supervised analysis independent of sequence alignment and clustering

    PubMed Central

    Sul, Woo Jun; Cole, James R.; Jesus, Ederson da C.; Wang, Qiong; Farris, Ryan J.; Fish, Jordan A.; Tiedje, James M.

    2011-01-01

    High-throughput sequencing of 16S rRNA genes has increased our understanding of microbial community structure, but now even higher-throughput methods to the Illumina scale allow the creation of much larger datasets with more samples and orders-of-magnitude more sequences that swamp current analytic methods. We developed a method capable of handling these larger datasets on the basis of assignment of sequences into an existing taxonomy using a supervised learning approach (taxonomy-supervised analysis). We compared this method with a commonly used clustering approach based on sequence similarity (taxonomy-unsupervised analysis). We sampled 211 different bacterial communities from various habitats and obtained ∼1.3 million 16S rRNA sequences spanning the V4 hypervariable region by pyrosequencing. Both methodologies gave similar ecological conclusions in that β-diversity measures calculated by using these two types of matrices were significantly correlated to each other, as were the ordination configurations and hierarchical clustering dendrograms. In addition, our taxonomy-supervised analyses were also highly correlated with phylogenetic methods, such as UniFrac. The taxonomy-supervised analysis has the advantages that it is not limited by the exhaustive computation required for the alignment and clustering necessary for the taxonomy-unsupervised analysis, is more tolerant of sequencing errors, and allows comparisons when sequences are from different regions of the 16S rRNA gene. With the tremendous expansion in 16S rRNA data acquisition underway, the taxonomy-supervised approach offers the potential to provide more rapid and extensive community comparisons across habitats and samples. PMID:21873204
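
    A compact sketch of the β-diversity comparison, assuming SciPy: build Bray-Curtis dissimilarity matrices from taxonomy-level and OTU-level count tables for the same samples and correlate them. The count tables are random stand-ins for real 16S data, and a real analysis would use a permutation-based Mantel test rather than the naive Pearson p-value shown here:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import pearsonr

rng = np.random.default_rng(3)
base = rng.gamma(1.0, 1.0, size=(12, 40))           # 12 samples, shared structure
genus = base[:, :20] + rng.gamma(0.5, 0.2, (12, 20))  # taxonomy-supervised table
otu = np.repeat(base, 2, axis=1) + rng.gamma(0.5, 0.2, (12, 80))  # OTU table

d_genus = pdist(genus, metric="braycurtis")          # condensed beta-diversity
d_otu = pdist(otu, metric="braycurtis")
r, p = pearsonr(d_genus, d_otu)
print(f"matrix correlation r={r:.2f} (p={p:.1e})")
```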

  19. A new approach for the assessment of temporal clustering of extratropical wind storms

    NASA Astrophysics Data System (ADS)

    Schuster, Mareike; Eddounia, Fadoua; Kuhnel, Ivan; Ulbrich, Uwe

    2017-04-01

A widely used methodology to assess the clustering of storms in a region is based on dispersion statistics of a simple homogeneous Poisson process. This clustering measure is determined by the ratio of the variance to the mean of the local storm counts per grid point. Resulting values larger than 1, i.e. when the variance is larger than the mean, indicate clustering, while values lower than 1 indicate a sequencing of storms that is more regular than a random process. However, a disadvantage of this methodology is that the characteristics are valid only for a pre-defined climatological time period, and it is not possible to identify temporal variability in clustering. Also, the absolute value of the dispersion statistic is not particularly intuitive. We have developed an approach to describe the temporal clustering of storms that is more intuitive and at the same time allows temporal variations to be assessed. The approach is based on the local distribution of waiting times between the occurrences of two individual storm events; the waiting times are computed by post-processing individual windstorm tracks, which in turn are obtained from an objective tracking algorithm. Based on this distribution a threshold can be set, either at the waiting time expected from a random process or at a quantile of the observed distribution. Thus, it can be determined whether two consecutive wind storm events count as part of a (temporal) cluster. We analyze extratropical wind storms in a reanalysis dataset and compare the results of the traditional clustering measure with our new methodology. We assess what range of clustering events (in terms of duration and frequency) is covered and identify whether the historically known clustered seasons are detectable by the new clustering measure in the reanalysis.
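
    Both clustering measures, reduced to a few lines on invented storm dates; real input would be per-grid-point event dates from the objective tracking algorithm:

```python
import statistics

storm_days = [3, 5, 6, 8, 10, 40, 42, 44, 400, 800, 802, 804]  # toy 3-yr record
record_days = 3 * 365

# (1) Traditional measure: dispersion of seasonal counts (variance/mean > 1
#     indicates clustering relative to a homogeneous Poisson process).
counts = [sum(1 for d in storm_days if y * 365 <= d < (y + 1) * 365)
          for y in range(3)]
print("dispersion:", statistics.variance(counts) / statistics.mean(counts))

# (2) Waiting-time measure: flag consecutive events whose separation falls
#     below the mean waiting time expected from a random process.
threshold = record_days / len(storm_days)
waits = [b - a for a, b in zip(storm_days, storm_days[1:])]
clustered = [w for w in waits if w < threshold]
print(f"{len(clustered)}/{len(waits)} waiting times below "
      f"{threshold:.0f}-day random-process threshold")
```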

  20. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ward, Lee H.; Laros, James H., III

    This paper describes a methodology for implementing disk-less cluster systems using the Network File System (NFS) that scales to thousands of nodes. This method has been successfully deployed and is currently in use on several production systems at Sandia National Labs. This paper will outline our methodology and implementation, discuss hardware and software considerations in detail and present cluster configurations with performance numbers for various management operations like booting.

  1. Scattering of clusters of spherical particles—Modeling and inverse problem solution in the Rayleigh-Gans approximation

    NASA Astrophysics Data System (ADS)

    Eliçabe, Guillermo E.

    2013-09-01

In this work, an exact scattering model for a system of clusters of spherical particles, based on the Rayleigh-Gans approximation, has been parameterized in such a way that it can be solved in inverse form using Tikhonov regularization to obtain the morphological parameters of the clusters: that is to say, the average number of particles per cluster, the size of the primary spherical units that form the cluster, and the Discrete Distance Distribution Function, from which the z-average square radius of gyration of the system of clusters is obtained. The methodology is validated through a series of simulated and experimental examples of x-ray and light scattering that show that the proposed methodology works satisfactorily in non-ideal situations such as: presence of error in the measurements, presence of error in the model, and several types of non-idealities present in the experimental cases.
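
    A minimal Tikhonov-regularized inversion of a generic linear model I = A x, standing in for the parameterized Rayleigh-Gans model; the smoothing kernel, noise level and regularization weights below are invented for the sketch:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 60
t = np.linspace(0, 1, n)
A = np.exp(-30.0 * (t[:, None] - t[None, :]) ** 2)  # ill-conditioned kernel
x_true = np.exp(-0.5 * ((t - 0.4) / 0.08) ** 2)     # sought "distribution"
b = A @ x_true + rng.normal(0, 0.01, n)             # noisy measurement

def tikhonov(A, b, lam):
    # Solve (A^T A + lam^2 I) x = A^T b, the regularized normal equations
    return np.linalg.solve(A.T @ A + lam ** 2 * np.eye(A.shape[1]), A.T @ b)

for lam in (1e-6, 1e-2, 1.0):
    x = tikhonov(A, b, lam)
    print(f"lam={lam:g}: residual={np.linalg.norm(A @ x - b):.3f}, "
          f"error={np.linalg.norm(x - x_true):.3f}")
```

    The smallest λ fits the noise (tiny residual, large reconstruction error), illustrating why regularization is needed for this kind of ill-conditioned inversion.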

  2. Quantitative determination of the clustered silicon concentration in substoichiometric silicon oxide layer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Spinella, Corrado; Bongiorno, Corrado; Nicotra, Giuseppe

    2005-07-25

We present an analytical methodology, based on electron energy loss spectroscopy (EELS) and energy-filtered transmission electron microscopy, which allows us to quantify the clustered silicon concentration in annealed substoichiometric silicon oxide layers, deposited by plasma-enhanced chemical vapor deposition. The clustered Si volume fraction was deduced from a fit to the experimental EELS spectrum using a theoretical description proposed to calculate the dielectric function of a system of spherical particles of equal radii, located at random in a host material. The methodology allowed us to demonstrate that the clustered Si concentration is only one half of the excess Si concentration dissolved in the layer.

  3. Towards a methodology for cluster searching to provide conceptual and contextual "richness" for systematic reviews of complex interventions: case study (CLUSTER).

    PubMed

    Booth, Andrew; Harris, Janet; Croot, Elizabeth; Springett, Jane; Campbell, Fiona; Wilkins, Emma

    2013-09-28

Systematic review methodologies can be harnessed to help researchers to understand and explain how complex interventions may work. Typically, when reviewing complex interventions, a review team will seek to understand the theories that underpin an intervention and the specific context for that intervention. A single published report from a research project does not typically contain this required level of detail. A review team may find it more useful to examine a "study cluster": a group of related papers that explore and explain various features of a single project and thus supply necessary detail relating to theory and/or context. We sought to conduct a preliminary investigation, from a single case study review, of techniques required to identify a cluster of related research reports, to document the yield from such methods, and to outline a systematic methodology for cluster searching. In a systematic review of community engagement we identified a relevant project - the Gay Men's Task Force. From a single "key pearl citation" we conducted a series of related searches to find contextually or theoretically proximate documents. We followed up Citations, traced Lead authors, identified Unpublished materials, searched Google Scholar, tracked Theories, undertook ancestry searching for Early examples and followed up Related projects (embodied in the CLUSTER mnemonic). Our structured, formalised procedure for cluster searching identified useful reports that are not typically identified from topic-based searches on bibliographic databases. Items previously rejected by an initial sift were subsequently found to inform our understanding of underpinning theory (for example Diffusion of Innovations Theory), context or both. Relevant material included book chapters, a Web-based process evaluation, and peer reviewed reports of projects sharing a common ancestry. We used these reports to understand the context for the intervention and to explore explanations for its relative lack of success. Additional data helped us to challenge simplistic assumptions on the homogeneity of the target population. A single case study suggests the potential utility of cluster searching, particularly for reviews that depend on an understanding of context, e.g. realist synthesis. The methodology is transparent, explicit and reproducible. There is no reason to believe that cluster searching is not generalizable to other review topics. Further research should examine the contribution of the methodology beyond improved yield, to the final synthesis and interpretation, possibly by utilizing qualitative sensitivity analysis.

  4. Health-risk behaviour in Croatia.

    PubMed

    Bécue-Bertaut, Mónica; Kern, Josipa; Hernández-Maldonado, Maria-Luisa; Juresa, Vesna; Vuletic, Silvije

    2008-02-01

    To identify the health-risk behaviour of various homogeneous clusters of individuals. The study was conducted in 13 of the 20 Croatian counties and in Zagreb, the Croatian capital. In the first stage, general practices were selected in each county. The second-stage sample was created by drawing a random subsample of 10% of the patients registered at each selected general practice. The sample was divided into seven homogeneous clusters using statistical methodology combining multiple factor analysis with a hybrid clustering method. Seven homogeneous clusters were identified, three composed of males and four composed of females, based on statistically significant differences between selected characteristics (P<0.001). Although, in general, self-assessed health declined with age, significant variations were observed within specific age intervals. Higher levels of self-assessed health were associated with higher levels of education and/or socio-economic status. Many individuals, especially females, who self-reported poor health were heavy consumers of sleeping pills. Males and females reported different health-risk behaviours related to lifestyle, diet and use of the healthcare system. Heavy alcohol and tobacco use, unhealthy diet, risky physical activity and non-use of the healthcare system influenced self-assessed health in males. Females were slightly less satisfied with their health than males of the same age and educational level. Even highly educated females who took preventive healthcare tests and ate a healthy diet reported a less satisfactory self-assessed level of health than expected. Sociodemographic characteristics, lifestyle, self-assessed health and use of the healthcare system were used in the identification of seven homogeneous population clusters. A comprehensive analysis of these clusters suggests that health-related prevention and intervention efforts should be geared towards specific populations.

  5. A methodological study of genome-wide DNA methylation analyses using matched archival formalin-fixed paraffin embedded and fresh frozen breast tumors

    PubMed Central

    Yan, Li; Liu, Song; Tang, Li; Hu, Qiang; Morrison, Carl D.; Ambrosone, Christine B.; Higgins, Michael J.; Sucheston-Campbell, Lara E.

    2017-01-01

    Background DNA from archival formalin-fixed and paraffin embedded (FFPE) tissue is an invaluable resource for genome-wide methylation studies, although concerns about poor quality may limit its use. In this study, we compared DNA methylation profiles of breast tumors using DNA from fresh-frozen (FF) tissues and three types of matched FFPE samples. Results For 9/10 patients, correlation and unsupervised clustering analysis revealed that the FF and FFPE samples were consistently correlated with each other and clustered into distinct subgroups. Greater than 84% of the top 100 loci previously shown to differentiate ER+ and ER− tumors in FF tissues were also FFPE DML. Weighted Gene Correlation Network Analysis (WGCNA) grouped the DML into 16 modules in FF tissue, with ~85% of the module membership preserved across tissue types. Materials and Methods Restored FFPE and matched FF samples were profiled using the Illumina Infinium HumanMethylation450K platform. Methylation levels (β-values) across all loci, and across the top 100 loci previously shown to differentiate tumors by estrogen receptor status (ER+ or ER−) in a larger FF study, were compared between matched FF and FFPE samples using Pearson's correlation, hierarchical clustering and WGCNA. Positive predictive values and sensitivity levels for detecting differentially methylated loci (DML) in FF samples were calculated in an independent FFPE cohort. Conclusions FFPE breast tumor samples show lower overall detection of DML than FF; however, FFPE and FF DML compare favorably. These results support the emerging consensus that the 450K platform can be employed to investigate epigenetics in large sets of archival FFPE tissues. PMID:28118602
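
    As a rough illustration of the concordance analysis described above, the sketch below computes per-patient Pearson correlations between matched sample types and then clusters all samples on a correlation distance. The beta-value matrices, patient count and noise level are synthetic placeholders, not values from the study.

    ```python
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import pdist
    from scipy.stats import pearsonr

    # Synthetic beta-value matrices standing in for matched FF/FFPE profiles:
    # rows = patients, columns = CpG loci.
    rng = np.random.default_rng(0)
    ff = rng.beta(0.5, 0.5, size=(10, 5000))
    ffpe = np.clip(ff + rng.normal(0, 0.05, ff.shape), 0, 1)  # FFPE = FF + technical noise

    # Per-patient Pearson correlation between matched sample types
    for i in range(ff.shape[0]):
        r, _ = pearsonr(ff[i], ffpe[i])
        print(f"patient {i}: r = {r:.3f}")

    # Hierarchical clustering of all 20 samples on correlation distance;
    # matched pairs should co-cluster if preservation effects are modest.
    samples = np.vstack([ff, ffpe])
    labels = fcluster(linkage(pdist(samples, metric="correlation"), "average"),
                      t=10, criterion="maxclust")
    print(labels)
    ```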

  6. Stability-based validation of dietary patterns obtained by cluster analysis.

    PubMed

    Sauvageot, Nicolas; Schritz, Anna; Leite, Sonia; Alkerwi, Ala'a; Stranges, Saverio; Zannad, Faiez; Streel, Sylvie; Hoge, Axelle; Donneau, Anne-Françoise; Albert, Adelin; Guillaume, Michèle

    2017-01-14

    Cluster analysis is a data-driven method used to create clusters of individuals sharing similar dietary habits. However, this method requires specific choices from the user which influence the results. There is therefore a need for an objective methodology to help researchers with these decisions during cluster analysis. The objective of this study was to use such a methodology, based on the stability of clustering solutions, to select the most appropriate clustering method and number of clusters for describing dietary patterns in the NESCAV study (Nutrition, Environment and Cardiovascular Health), a large population-based cross-sectional study in the Greater Region (N = 2298). Clustering solutions were obtained with K-means, K-medians and Ward's method and a number of clusters varying from 2 to 6. Their stability was assessed with three indices: the adjusted Rand index, Cramer's V and the misclassification rate. The most stable solution was obtained with the K-means method and three clusters. The "Convenient" cluster, characterized by the consumption of convenience foods, was the most prevalent, with 46% of the population exhibiting this dietary behaviour. In addition, "Prudent" and "Non-Prudent" patterns, associated with healthy and unhealthy dietary habits respectively, were adopted by 25% and 29% of the population. The "Convenient" and "Non-Prudent" clusters were associated with higher cardiovascular risk whereas the "Prudent" pattern was associated with decreased cardiovascular risk. Associations with other factors showed that the choice of a specific dietary pattern is part of a wider lifestyle profile. This study is of interest to both researchers and public health professionals. From a methodological standpoint, we showed that the stability of clustering solutions can help researchers in their choices. From a public health perspective, this study showed the need for targeted health promotion campaigns describing the benefits of healthy dietary patterns.
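
    A minimal sketch of stability-based selection of the number of clusters, using bootstrap resampling and the adjusted Rand index with scikit-learn's K-means. The dietary-intake matrix is simulated here, and the paper additionally compares K-medians and Ward's method and two further indices (Cramer's V, misclassification rate) that are not reproduced in this sketch.

    ```python
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import adjusted_rand_score

    def stability(X, k, n_boot=20, seed=0):
        """Mean adjusted Rand index between K-means clusterings fitted on
        bootstrap resamples; each fitted model labels the full data set so
        that partitions from different resamples are comparable."""
        rng = np.random.default_rng(seed)
        n = X.shape[0]
        labelings = []
        for b in range(n_boot):
            idx = rng.choice(n, size=n, replace=True)
            km = KMeans(n_clusters=k, n_init=10, random_state=b).fit(X[idx])
            labelings.append(km.predict(X))
        scores = [adjusted_rand_score(labelings[i], labelings[j])
                  for i in range(n_boot) for j in range(i + 1, n_boot)]
        return float(np.mean(scores))

    # Hypothetical standardized intake matrix (individuals x food groups);
    # pick the k whose partitions are most reproducible under resampling.
    X = np.random.default_rng(1).normal(size=(300, 12))
    for k in range(2, 7):
        print(k, round(stability(X, k), 3))
    ```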

  7. Hierarchical cluster analysis of technical replicates to identify interferents in untargeted mass spectrometry metabolomics.

    PubMed

    Caesar, Lindsay K; Kvalheim, Olav M; Cech, Nadja B

    2018-08-27

    Mass spectral data sets often contain experimental artefacts, and data filtering prior to statistical analysis is crucial to extract reliable information. This is particularly true in untargeted metabolomics analyses, where the analyte(s) of interest are not known a priori. It is often assumed that chemical interferents (i.e. solvent contaminants such as plasticizers) are consistent across samples, and can be removed by background subtraction from blank injections. On the contrary, it is shown here that chemical contaminants may vary in abundance across each injection, potentially leading to their misidentification as relevant sample components. With this metabolomics study, we demonstrate the effectiveness of hierarchical cluster analysis (HCA) of replicate injections (technical replicates) as a methodology to identify chemical interferents and reduce their contaminating contribution to metabolomics models. Pools of metabolites with varying complexity were prepared from the botanical Angelica keiskei Koidzumi and spiked with known metabolites. Each set of pools was analyzed in triplicate and at multiple concentrations using ultraperformance liquid chromatography coupled to mass spectrometry (UPLC-MS). Before filtering, HCA failed to cluster replicates in the data sets. To identify contaminant peaks, we developed a filtering process that evaluated the relative peak area variance of each variable within triplicate injections. These interferent peaks were found across all samples, but did not show consistent peak area from injection to injection, even when evaluating the same chemical sample. This filtering process identified 128 ions that appear to originate from the UPLC-MS system. Data sets collected for a high number of pools with comparatively simple chemical composition were highly influenced by these chemical interferents, as were samples that were analyzed at a low concentration. When chemical interferent masses were removed, technical replicates clustered in all data sets. This work highlights the importance of technical replication in mass spectrometry-based studies, and presents a new application of HCA as a tool for evaluating the effectiveness of data filtering prior to statistical analysis. Copyright © 2018 Elsevier B.V. All rights reserved.
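
    The variance-based replicate filter can be approximated as below: compute the relative standard deviation of each feature within technical triplicates, drop features that are unstable within replicates of the same sample, and re-run HCA. The feature table, threshold and pool layout are hypothetical stand-ins for the published workflow.

    ```python
    import numpy as np
    from scipy.cluster.hierarchy import linkage

    # Hypothetical feature table: rows = injections (4 pools x 3 consecutive
    # technical replicates), columns = aligned m/z features.
    rng = np.random.default_rng(0)
    peak_areas = rng.lognormal(mean=8, sigma=1, size=(12, 500))

    triplicates = peak_areas.reshape(4, 3, -1)
    rsd = triplicates.std(axis=1, ddof=1) / triplicates.mean(axis=1)

    # Flag features whose peak area varies strongly WITHIN triplicate
    # injections of the same pool: a stand-in for the paper's variance filter.
    unstable = (rsd > 0.3).any(axis=0)
    filtered = peak_areas[:, ~unstable]
    print(f"removed {unstable.sum()} putative interferent ions, kept {filtered.shape[1]}")

    # HCA on the filtered, log-transformed table; replicates should now co-cluster.
    Z = linkage(np.log1p(filtered), method="ward")
    ```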

  8. THE RED-SEQUENCE CLUSTER SURVEY-2 (RCS-2): SURVEY DETAILS AND PHOTOMETRIC CATALOG CONSTRUCTION

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gilbank, David G.; Gladders, M. D.; Yee, H. K. C.

    2011-03-15

    The second Red-sequence Cluster Survey (RCS-2) is a ~1000 deg², multi-color imaging survey using the square-degree imager, MegaCam, on the Canada-France-Hawaii Telescope. It is designed to detect clusters of galaxies over the redshift range 0.1 ≲ z ≲ 1. The primary aim is to build a statistically complete, large (~10⁴) sample of clusters, covering a sufficiently long redshift baseline to be able to place constraints on cosmological parameters via the evolution of the cluster mass function. Other main science goals include building a large sample of high surface brightness, strongly gravitationally lensed arcs associated with these clusters, and an unprecedented sample of several tens of thousands of galaxy clusters and groups, spanning a large range of halo mass, with which to study the properties and evolution of their member galaxies. This paper describes the design of the survey and the methodology for acquiring, reducing, and calibrating the data for the production of high-precision photometric catalogs. We describe the method for calibrating our griz imaging data using the colors of the stellar locus and overlapping Two Micron All Sky Survey photometry. This yields an absolute accuracy of <0.03 mag on any color and ~0.05 mag in the r-band magnitude, verified with respect to the Sloan Digital Sky Survey (SDSS). Our astrometric calibration is accurate to ≪0.3″ from comparison with SDSS positions. RCS-2 reaches average 5σ point-source limiting magnitudes of griz = [24.4, 24.3, 23.7, 22.8], approximately 1-2 mag deeper than the SDSS. Due to the queue-scheduled nature of the observations, the data are highly uniform and taken in excellent seeing, mostly FWHM ≲ 0.7″ in the r band. In addition to the main science goals just described, these data form the basis for a number of other planned and ongoing projects (including the WiggleZ survey), making RCS-2 an important next-generation imaging survey.

  9. Consumption of junk foods by school-aged children in rural Himachal Pradesh, India.

    PubMed

    Gupta, Aakriti; Kapil, Umesh; Singh, Gajendra

    2018-01-01

    There has been an increase in the consumption of junk food (JF) among school-aged children (SAC), possibly leading to obesity and diet-related diseases among them. Evidence on JF consumption in rural areas is lacking; hence, we conducted a study to assess the consumption of JF by SAC in rural Himachal Pradesh. A total of 425 children in the age group of 12-18 years studying in 30 government schools (clusters) were included. The clusters were selected using probability proportional to size (PPS) sampling methodology. We found a high prevalence (36%) of JF consumption among SAC during the last 24 h. Efforts should be made to reduce the consumption of JF by promoting healthy dietary habits and educating children about the ill effects of JF.
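
    A minimal sketch of selecting clusters with probability proportional to size via the cumulative-total (systematic) method, as is standard for PPS school surveys; the enrolment figures are invented and the function name is ours, not from the study.

    ```python
    import numpy as np

    def pps_systematic(sizes, n_clusters, seed=0):
        """Select cluster indices with probability proportional to size using
        the cumulative-total (systematic) method: lay cluster sizes end to
        end, then sample at a fixed interval from a random start."""
        sizes = np.asarray(sizes, dtype=float)
        cum = np.cumsum(sizes)
        interval = cum[-1] / n_clusters
        rng = np.random.default_rng(seed)
        points = rng.uniform(0, interval) + interval * np.arange(n_clusters)
        return np.searchsorted(cum, points)  # indices of the selected clusters

    # Hypothetical enrolment counts for 120 schools; select 30 clusters.
    enrolment = np.random.default_rng(1).integers(50, 600, size=120)
    print(pps_systematic(enrolment, 30))
    ```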

  10. Analysis of candidates for interacting galaxy clusters. I. A1204 and A2029/A2033

    NASA Astrophysics Data System (ADS)

    Gonzalez, Elizabeth Johana; de los Rios, Martín; Oio, Gabriel A.; Lang, Daniel Hernández; Tagliaferro, Tania Aguirre; Domínguez R., Mariano J.; Castellón, José Luis Nilo; Cuevas L., Héctor; Valotto, Carlos A.

    2018-04-01

    Context: Merging galaxy clusters allow the dark and baryonic mass components to be studied separately. Their occurrence also makes it possible to test the ΛCDM scenario, which can be used to put constraints on the self-interaction cross-section of the dark-matter particle. Aims: A homogeneous analysis of these systems is needed. Hence, based on a recently presented sample of candidate interacting galaxy clusters, we present the analysis of two of these catalogued systems. Methods: In this work, the first of a series devoted to characterizing galaxy clusters in merger processes, we perform a weak lensing analysis of clusters A1204 and A2029/A2033 to derive the total masses of each identified interacting structure, together with a dynamical study based on a two-body model. We also describe the gas and the mass distributions in the field through a lensing and an X-ray analysis. Results: Neither merging cluster candidate shows evidence of having had a recent merger event. Nevertheless, there is dynamical evidence that these systems could be interacting or could interact in the future. Conclusions: More constraints need to be included in order to improve the methodology for classifying merging galaxy clusters. Characterizing these clusters is important for properly understanding the nature of these systems and their connection with dynamical studies.

  11. A Hierarchical Bayesian Procedure for Two-Mode Cluster Analysis

    ERIC Educational Resources Information Center

    DeSarbo, Wayne S.; Fong, Duncan K. H.; Liechty, John; Saxton, M. Kim

    2004-01-01

    This manuscript introduces a new Bayesian finite mixture methodology for the joint clustering of row and column stimuli/objects associated with two-mode asymmetric proximity, dominance, or profile data. That is, common clusters are derived which partition both the row and column stimuli/objects simultaneously into the same derived set of clusters.…

  12. Generalization of Clustering Coefficients to Signed Correlation Networks

    PubMed Central

    Costantini, Giulio; Perugini, Marco

    2014-01-01

    The recent interest in network analysis applications in personality psychology and psychopathology has put forward new methodological challenges. Personality and psychopathology networks are typically based on correlation matrices and therefore include both positive and negative edge signs. However, some applications of network analysis disregard negative edges, such as computing clustering coefficients. In this contribution, we illustrate the importance of the distinction between positive and negative edges in networks based on correlation matrices. The clustering coefficient is generalized to signed correlation networks: three new indices are introduced that take edge signs into account, each derived from an existing and widely used formula. The performances of the new indices are illustrated and compared with the performances of the unsigned indices, both on a signed simulated network and on a signed network based on actual personality psychology data. The results show that the new indices are more resistant to sample variations in correlation networks and therefore have higher convergence compared with the unsigned indices both in simulated networks and with real data. PMID:24586367
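
    One plausible signed generalization, in the spirit of Zhang's weighted clustering coefficient, keeps edge signs in the triangle (numerator) term and uses absolute weights in the normalization. This is a sketch of the general idea only; the exact formulas introduced in the paper may differ in detail.

    ```python
    import numpy as np

    def signed_clustering(W):
        """Signed, weighted clustering coefficient per node: the numerator
        sums signed triangle weights around each node, the denominator
        normalizes by all pairs of incident edge magnitudes.
        W: symmetric correlation (adjacency) matrix with zero diagonal."""
        W = np.asarray(W, dtype=float)
        A = np.abs(W)
        num = np.diag(W @ W @ W)                      # signed triangle mass
        den = A.sum(axis=1) ** 2 - (A ** 2).sum(axis=1)
        return np.where(den > 0, num / den, 0.0)

    # Example: a small signed network built from a correlation matrix
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 6))
    W = np.corrcoef(X, rowvar=False)
    np.fill_diagonal(W, 0.0)
    print(signed_clustering(W))
    ```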

  13. MUSIC-Expected maximization gaussian mixture methodology for clustering and detection of task-related neuronal firing rates.

    PubMed

    Ortiz-Rosario, Alexis; Adeli, Hojjat; Buford, John A

    2017-01-15

    Researchers often rely on simple methods to identify the involvement of neurons in a particular motor task. The historical approach has been to inspect large groups of neurons and subjectively separate them into groups based on the expertise of the investigator. In cases where neuron populations are small, it is reasonable to inspect these neuronal recordings and their firing rates carefully to avoid data omissions. In this paper, a new methodology is presented for the automatic, objective classification of neurons recorded in association with behavioral tasks into groups. By identifying the characteristics of neurons in a particular group, the investigator can then identify functional classes of neurons based on their relationship to the task. The methodology is based on the integration of a multiple signal classification (MUSIC) algorithm, to extract relevant features from the firing rate, with an expectation-maximization Gaussian mixture algorithm (EM-GMM) to cluster the extracted features. The methodology is capable of identifying and clustering similar firing rate profiles automatically based on specific signal features. An empirical wavelet transform (EWT) was used to validate the features found in the MUSIC pseudospectrum and the resulting signal features captured by the methodology. Additionally, this methodology was used to inspect the behavioral elements of neurons to physiologically validate the model. The methodology was tested using a set of data collected from awake, behaving non-human primates. Copyright © 2016 Elsevier B.V. All rights reserved.
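
    A hedged sketch of the clustering stage: fit Gaussian mixtures over a range of component counts to per-neuron feature vectors and keep the BIC-optimal model. BIC-based model selection is our addition for illustration, and the MUSIC feature-extraction step is replaced here by synthetic features.

    ```python
    import numpy as np
    from sklearn.mixture import GaussianMixture

    # Hypothetical per-neuron feature vectors, e.g. dominant pseudospectrum
    # frequencies/amplitudes extracted from each smoothed firing-rate profile.
    rng = np.random.default_rng(0)
    features = np.vstack([rng.normal(0, 1, (40, 4)), rng.normal(3, 1, (25, 4))])

    # Fit EM-GMMs for several component counts; keep the BIC-optimal model,
    # a common way to let the data choose the number of clusters.
    models = [GaussianMixture(n_components=k, n_init=5, random_state=0).fit(features)
              for k in range(1, 7)]
    best = min(models, key=lambda m: m.bic(features))
    labels = best.predict(features)
    print(best.n_components, np.bincount(labels))
    ```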

  14. Massive open star clusters using the VVV survey. I. Presentation of the data and description of the approach

    NASA Astrophysics Data System (ADS)

    Chené, A.-N.; Borissova, J.; Clarke, J. R. A.; Bonatto, C.; Majaess, D. J.; Moni Bidin, C.; Sale, S. E.; Mauro, F.; Kurtev, R.; Baume, G.; Feinstein, C.; Ivanov, V. D.; Geisler, D.; Catelan, M.; Minniti, D.; Lucas, P.; de Grijs, R.; Kumar, M. S. N.

    2012-09-01

    Context. The ESO Public Survey "VISTA Variables in the Vía Láctea" (VVV) provides deep multi-epoch infrared observations for an unprecedented 562 sq. degrees of the Galactic bulge and adjacent regions of the disk. Aims: The VVV observations will foster the construction of a sample of Galactic star clusters with reliable and homogeneously derived physical parameters (e.g., age, distance, and mass). In this first paper in a series, the methodology employed to establish cluster parameters for the envisioned database is elaborated upon by analysing four known young open clusters: Danks 1, Danks 2, RCW 79, and DBS 132. The analysis offers a first glimpse of the information that can be gleaned from the VVV observations for clusters in the final database. Methods: Wide-field, deep JHKs VVV observations, combined with new infrared spectroscopy, are employed to constrain fundamental parameters for a subset of clusters. Results: Results are inferred from VVV near-infrared photometry and numerous low resolution spectra (typically more than 10 per cluster). The high quality of the spectra and the deep wide-field VVV photometry enable us to precisely and independently determine the characteristics of the clusters studied, which we compare to previous determinations. An anomalous reddening law in the direction of the Danks clusters is found, specifically E(J - H)/E(H - Ks) = 2.20 ± 0.06, which exceeds published values for the inner Galaxy. The G305 star forming complex, which includes the Danks clusters, lies beyond the Sagittarius-Carina spiral arm and occupies the Centaurus arm. Finally, the first deep infrared colour-magnitude diagram of RCW 79 is presented, which reveals a sizeable pre-main sequence population. A list of candidate variable stars in the G305 region is reported. Conclusions: This study demonstrates the strength of the dataset and methodology employed, and constitutes the first step of a broader study which shall include reliable parameters for a sizeable number of poorly characterised and/or newly discovered clusters. Based on observations made with the NTT telescope at the La Silla Observatory, ESO, under programme ID 087.D-0490A, and with the Clay telescope at the Las Campanas Observatory under programme CN2011A-086. Also based on data from the VVV survey observed under programme ID 172.B-2002. Tables 1, 5 and 6 are available in electronic form at http://www.aanda.org

  15. Predicting protein complexes from weighted protein-protein interaction graphs with a novel unsupervised methodology: Evolutionary enhanced Markov clustering.

    PubMed

    Theofilatos, Konstantinos; Pavlopoulou, Niki; Papasavvas, Christoforos; Likothanassis, Spiros; Dimitrakopoulos, Christos; Georgopoulos, Efstratios; Moschopoulos, Charalampos; Mavroudi, Seferina

    2015-03-01

    Proteins are considered to be the most important individual components of biological systems and they combine to form physical protein complexes which are responsible for certain molecular functions. Despite the wide availability of protein-protein interaction (PPI) information, not much information is available about protein complexes. Experimental methods are limited in terms of time, efficiency, cost and performance constraints. Existing computational methods have provided encouraging preliminary results, but they face certain disadvantages: they require parameter tuning, some of them cannot handle weighted PPI data, and others do not allow a protein to participate in more than one protein complex. In the present paper, we propose a new fully unsupervised methodology for predicting protein complexes from weighted PPI graphs. The proposed methodology is called evolutionary enhanced Markov clustering (EE-MC) and it is a hybrid combination of an adaptive evolutionary algorithm and a state-of-the-art clustering algorithm named enhanced Markov clustering. EE-MC was compared with state-of-the-art methodologies when applied to datasets from the human and the yeast Saccharomyces cerevisiae organisms. Using publicly available datasets, EE-MC outperformed existing methodologies (in some datasets the separation metric was increased by 10-20%). Moreover, when applied to new human datasets its performance was encouraging in the prediction of protein complexes which consist of proteins with high functional similarity. Specifically, 5737 protein complexes were predicted and 72.58% of them are enriched for at least one gene ontology (GO) function term. EE-MC is by design able to overcome intrinsic limitations of existing methodologies such as their inability to handle weighted PPI networks, their constraint to assign every protein to exactly one cluster and the difficulties they face concerning parameter tuning. This was validated experimentally and, moreover, new potentially true human protein complexes were suggested as candidates for further validation using experimental techniques. Copyright © 2015 Elsevier B.V. All rights reserved.
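
    For orientation, a minimal implementation of standard Markov clustering (alternating expansion and inflation on a column-stochastic flow matrix); EE-MC's evolutionary parameter adaptation and other enhancements are not reproduced here.

    ```python
    import numpy as np

    def mcl(A, inflation=2.0, n_iter=50):
        """Minimal standard Markov clustering on a weighted adjacency matrix:
        alternate expansion (matrix squaring) and inflation (elementwise
        power plus re-normalization) until the flow matrix stabilizes."""
        M = A + np.eye(len(A))            # self-loops stabilize the iteration
        M = M / M.sum(axis=0)             # make columns stochastic
        for _ in range(n_iter):
            M = M @ M                     # expansion: flow spreads along paths
            M = M ** inflation            # inflation: strong flows are boosted
            M = M / M.sum(axis=0)
        # rows that keep mass are attractors; their non-zero entries are clusters
        clusters = {tuple(np.flatnonzero(row > 1e-6)) for row in M if row.sum() > 1e-6}
        return [sorted(c) for c in clusters]

    # Toy weighted PPI graph: two triangles joined by a single weak edge
    A = np.zeros((6, 6))
    for i, j, w in [(0, 1, .9), (0, 2, .8), (1, 2, .9),
                    (3, 4, .9), (3, 5, .8), (4, 5, .9), (2, 3, .1)]:
        A[i, j] = A[j, i] = w
    print(mcl(A))
    ```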

  16. Knowledge discovery about quality of life changes of spinal cord injury patients: clustering based on rules by states.

    PubMed

    Gibert, Karina; García-Rudolph, Alejandro; Curcoll, Lluïsa; Soler, Dolors; Pla, Laura; Tormos, José María

    2009-01-01

    In this paper, an integral knowledge discovery methodology, named Clustering Based on Rules by States, which incorporates artificial intelligence (AI) and statistical methods as well as interpretation-oriented tools, is used to extract knowledge patterns about the evolution over time of the quality of life (QoL) of patients with spinal cord injury. The methodology incorporates interaction with experts as a crucial element of the clustering process to guarantee the usefulness of the results. Four typical patterns are discovered by taking into account prior expert knowledge. Several hypotheses are elaborated about the reasons for psychological distress or decreases in QoL of patients over time. The knowledge discovery from data (KDD) approach turns out, once again, to be a suitable formal framework for handling the multidimensional complexity of health domains.

  17. Reducing the Matrix Effect in Organic Cluster SIMS Using Dynamic Reactive Ionization

    NASA Astrophysics Data System (ADS)

    Tian, Hua; Wucher, Andreas; Winograd, Nicholas

    2016-12-01

    Dynamic reactive ionization (DRI) utilizes a reactive molecule, HCl, which is doped into an Ar cluster projectile and activated to produce protons at the bombardment site on the cold sample surface in the presence of water. The methodology has been shown to enhance the ionization of protonated molecular ions and to reduce salt suppression in complex biomatrices. In this study, we further examine the possibility of obtaining improved quantitation with DRI during depth profiling of thin films. Using a trehalose film as a model system, we are able to define optimal DRI conditions for depth profiling. Next, the strategy is applied to a multilayer system consisting of the polymer antioxidants Irganox 1098 and 1010. These binary mixtures have demonstrated large matrix effects, making quantitative SIMS measurements infeasible. Systematic comparisons of depth profiling of this multilayer film, between directly using the Ar gas cluster ion beam (GCIB) and under DRI conditions, show that the latter enhances protonated ions for both components by 4- to 15-fold, resulting in uniform depth profiling in positive ion mode and almost no matrix effect in negative ion mode. The methodology offers a new strategy for tackling the matrix effect and should lead to improved quantitative measurement using SIMS.

  18. Net-zero Building Cluster Simulations and On-line Energy Forecasting for Adaptive and Real-Time Control and Decisions

    NASA Astrophysics Data System (ADS)

    Li, Xiwang

    Buildings consume about 41.1% of primary energy and 74% of the electricity in the U.S. Moreover, it is estimated by the National Energy Technology Laboratory that more than 1/4 of the 713 GW of U.S. electricity demand in 2010 could be dispatchable if only buildings could respond to that dispatch through advanced building energy control and operation strategies and smart grid infrastructure. In this study, it is envisioned that neighboring buildings will tend to form a cluster, an open cyber-physical system that exploits the economic opportunities provided by a smart grid, distributed power generation, and storage devices. Through optimized demand management, these building clusters will then reduce overall primary energy consumption and peak-time electricity consumption, and be more resilient to power disruptions. This project therefore seeks to develop a net-zero building cluster simulation testbed and high-fidelity energy forecasting models for adaptive and real-time control and decision-making strategy development in a net-zero building cluster. The following research activities are summarized in this thesis: 1) development of a building cluster emulator for assessing building cluster control and operation strategies; 2) development of a novel building energy forecasting methodology using active system identification and data fusion techniques, including a systematic approach for building energy system characteristic evaluation, system excitation and model adaptation, with the developed methodology compared against other literature-reported building energy forecasting methods; 3) development of high-fidelity on-line building cluster energy forecasting models, including energy forecasting models for buildings, PV panels, batteries and ice tank thermal storage systems; and 4) a small-scale real-building validation study to verify the performance of the developed building energy forecasting methodology. The outcomes of this thesis can be used for building cluster energy forecasting model development and model-based control and operation optimization. The thesis concludes with a summary of the key outcomes of this research, as well as a list of recommendations for future work.

  19. Multiscale mutation clustering algorithm identifies pan-cancer mutational clusters associated with pathway-level changes in gene expression

    PubMed Central

    Poole, William; Leinonen, Kalle; Shmulevich, Ilya

    2017-01-01

    Cancer researchers have long recognized that somatic mutations are not uniformly distributed within genes. However, most approaches for identifying cancer mutations focus on either the entire-gene or single amino-acid level. We have bridged these two methodologies with a multiscale mutation clustering algorithm that identifies variable length mutation clusters in cancer genes. We ran our algorithm on 539 genes using the combined mutation data in 23 cancer types from The Cancer Genome Atlas (TCGA) and identified 1295 mutation clusters. The resulting mutation clusters cover a wide range of scales and often overlap with many kinds of protein features including structured domains, phosphorylation sites, and known single nucleotide variants. We statistically associated these multiscale clusters with gene expression and drug response data to illuminate the functional and clinical consequences of mutations in our clusters. Interestingly, we find multiple clusters within individual genes that have differential functional associations: these include PTEN, FUBP1, and CDH1. This methodology has potential implications in identifying protein regions for drug targets, understanding the biological underpinnings of cancer, and personalizing cancer treatments. Toward this end, we have made the mutation clusters and the clustering algorithm available to the public. Clusters and pathway associations can be interactively browsed at m2c.systemsbiology.net. The multiscale mutation clustering algorithm is available at https://github.com/IlyaLab/M2C. PMID:28170390
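
    A toy sketch of the idea of variable-length mutation clusters: smooth per-residue mutation counts at several window widths and report contiguous enriched runs. The published M2C algorithm differs in detail, and the positions and thresholds below are invented.

    ```python
    import numpy as np

    def mutation_clusters(positions, length, widths=(3, 9, 27), enrich=2.0):
        """Smooth per-residue mutation counts with moving averages of several
        widths and return (width, start, end) runs exceeding an enrichment
        threshold over the gene-wide average: a multiscale illustration only."""
        counts = np.bincount(positions, minlength=length).astype(float)
        background = counts.mean()
        found = []
        for w in widths:
            smooth = np.convolve(counts, np.ones(w) / w, mode="same")
            hot = np.concatenate(([False], smooth > enrich * background, [False]))
            d = np.diff(hot.astype(int))
            starts = np.flatnonzero(d == 1)
            ends = np.flatnonzero(d == -1) - 1
            found += [(w, s, e) for s, e in zip(starts, ends)]
        return found

    # Hypothetical mutation positions in a 400-residue protein, two hotspots
    rng = np.random.default_rng(0)
    pos = np.concatenate([rng.integers(0, 400, 60),
                          rng.normal(120, 2, 40).astype(int),
                          rng.normal(300, 8, 40).astype(int)])
    pos = np.clip(pos, 0, 399)
    for w, s, e in mutation_clusters(pos, 400):
        print(f"width {w}: residues {s}-{e}")
    ```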

  20. Multiscale mutation clustering algorithm identifies pan-cancer mutational clusters associated with pathway-level changes in gene expression.

    PubMed

    Poole, William; Leinonen, Kalle; Shmulevich, Ilya; Knijnenburg, Theo A; Bernard, Brady

    2017-02-01

    Cancer researchers have long recognized that somatic mutations are not uniformly distributed within genes. However, most approaches for identifying cancer mutations focus on either the entire-gene or single amino-acid level. We have bridged these two methodologies with a multiscale mutation clustering algorithm that identifies variable length mutation clusters in cancer genes. We ran our algorithm on 539 genes using the combined mutation data in 23 cancer types from The Cancer Genome Atlas (TCGA) and identified 1295 mutation clusters. The resulting mutation clusters cover a wide range of scales and often overlap with many kinds of protein features including structured domains, phosphorylation sites, and known single nucleotide variants. We statistically associated these multiscale clusters with gene expression and drug response data to illuminate the functional and clinical consequences of mutations in our clusters. Interestingly, we find multiple clusters within individual genes that have differential functional associations: these include PTEN, FUBP1, and CDH1. This methodology has potential implications in identifying protein regions for drug targets, understanding the biological underpinnings of cancer, and personalizing cancer treatments. Toward this end, we have made the mutation clusters and the clustering algorithm available to the public. Clusters and pathway associations can be interactively browsed at m2c.systemsbiology.net. The multiscale mutation clustering algorithm is available at https://github.com/IlyaLab/M2C.

  1. The threshold bootstrap clustering: a new approach to find families or transmission clusters within molecular quasispecies.

    PubMed

    Prosperi, Mattia C F; De Luca, Andrea; Di Giambenedetto, Simona; Bracciale, Laura; Fabbiani, Massimiliano; Cauda, Roberto; Salemi, Marco

    2010-10-25

    Phylogenetic methods produce hierarchies of molecular species, inferring knowledge about taxonomy and evolution. However, there is not yet a consensus methodology that provides a crisp partition of taxa, which is desirable when considering the problem of intra/inter-patient quasispecies classification or the identification of infection transmission events. We introduce threshold bootstrap clustering (TBC), a new methodology for partitioning molecular sequences that does not require a phylogenetic tree estimation. TBC is an incremental partition algorithm, inspired by the stochastic Chinese restaurant process, which takes advantage of resampling techniques and models of sequence evolution. TBC takes as input a multiple alignment of molecular sequences and outputs a crisp partition of the taxa into an automatically determined number of clusters. By varying initial conditions, the algorithm can produce different partitions. We describe a procedure that selects a prime partition among a set of candidates and calculates a measure of cluster reliability. TBC was successfully tested for the identification of human immunodeficiency virus type 1 (HIV-1) and hepatitis C virus (HCV) subtypes, and compared with previously established methodologies. It was also evaluated on the problem of HIV-1 intra-patient quasispecies clustering, and for transmission cluster identification, using a set of sequences from patients with known transmission event histories. TBC has been shown to be effective for the subtyping of HIV and HCV, and for identifying intra-patient quasispecies. To some extent, the algorithm was also able to infer clusters corresponding to events of infection transmission. The computational complexity of TBC is quadratic in the number of taxa, lower than that of other established methods; in addition, TBC has been enhanced with a measure of cluster reliability. TBC can be useful for characterising molecular quasispecies in a broad context.

  2. Cross Section High Resolution Imaging of Polymer-Based Materials

    NASA Astrophysics Data System (ADS)

    Delaportas, D.; Aden, P.; Muckle, C.; Yeates, S.; Treutlein, R.; Haq, S.; Alexandrou, I.

    This paper describes a methodology for preparing cross sections of organic layers suitable for transmission electron microscopy (TEM) at high resolution. Our principal aim is to prepare samples that are tough enough to allow slicing into sub-150 nm sections. We also need strong contrast in the organic layer area to make it identifiable during TEM. Our approach is to deposit organic layers on flexible substrates and prepare thin cross sections using ultra-microtomy. We sandwich the organic layer between two thin metal films in order to isolate it and improve contrast. Our methodology is used to study the microstructure of polymer/nanotube composites, allowing us to accurately measure the organic layer thickness, determine nanotube dispersion and assess the effect of nanotube clustering on film structural stability.

  3. Methodological approaches in analysing observational data: A practical example on how to address clustering and selection bias.

    PubMed

    Trutschel, Diana; Palm, Rebecca; Holle, Bernhard; Simon, Michael

    2017-11-01

    Because not every scientific question on effectiveness can be answered with randomised controlled trials, research methods that minimise bias in observational studies are required. Two major concerns influence the internal validity of effect estimates: selection bias and clustering. Hence, to reduce the bias of the effect estimates, more sophisticated statistical methods are needed. The aim is to introduce statistical approaches such as propensity score matching and mixed models into representative real-world analysis, and to present their implementation in the statistical software R so that the results can be reproduced. We perform a two-level analytic strategy to address the problems of bias and clustering: (i) generalised models with different abilities to adjust for dependencies are used to analyse binary data and (ii) the genetic matching and covariate adjustment methods are used to adjust for selection bias. Hence, we analyse the data from two population samples: the sample produced by the matching method and the full sample. The different analysis methods in this article produce different results but still point in the same direction. In our example, the estimated probability of receiving a case conference is higher in the treatment group than in the control group. Both strategies, genetic matching and covariate adjustment, have their limitations but complement each other to provide the whole picture. The statistical approaches were feasible for reducing bias but were nevertheless limited by the sample used. For each study and obtained sample, the pros and cons of the different methods have to be weighed. Copyright © 2017 The Author(s). Published by Elsevier Ltd. All rights reserved.
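
    The paper works in R with genetic matching and mixed models; purely for illustration, the sketch below shows a simpler nearest-neighbour propensity-score-matching variant in Python on simulated data. All variable names and the data-generating process are ours.

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import NearestNeighbors

    # Simulated observational data: X = confounders, t = treatment, y = outcome
    rng = np.random.default_rng(0)
    n = 1000
    X = rng.normal(size=(n, 4))
    t = (X[:, 0] + rng.normal(size=n) > 0).astype(int)          # selection depends on X
    y = (0.5 * t + X[:, 0] + rng.normal(size=n) > 0).astype(int)

    # 1) Estimate propensity scores with a logistic model
    ps = LogisticRegression().fit(X, t).predict_proba(X)[:, 1]

    # 2) 1:1 nearest-neighbour matching on the propensity score
    treated, control = np.flatnonzero(t == 1), np.flatnonzero(t == 0)
    nn = NearestNeighbors(n_neighbors=1).fit(ps[control, None])
    _, idx = nn.kneighbors(ps[treated, None])
    matched_control = control[idx.ravel()]

    # 3) Compare outcome rates in the matched sample
    print("matched difference:", y[treated].mean() - y[matched_control].mean())
    ```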

  4. The relationship between personality and attainment in 16-19-year-old students in a sixth form college: II: Self-perception, gender and attainment.

    PubMed

    Summerfield, M; Youngman, M

    1999-06-01

    A related paper (Summerfield & Youngman, 1999) has described the development of a scale, the Student Self-Perception Scale (SSPS), designed to explore the relationship between academic self-concept, attainment and personality in sixth form college students. The study aimed to identify groups of students exhibiting varying patterns of relationship using a range of measures including the SSPS. Issues of gender were also examined. The samples comprised a pilot sample of 152 students (aged 16-17 years, from two sixth form colleges) and a main sample of 364 students (mean age 16 years 10 months, range 16:0 to 18:6 years, from one sixth form college). The main sample included similar numbers of male and female students (46% male, 54% female) and ethnic minority students comprised 14% of this sample. Data comprised responses to two personality measures (the SSPS, Summerfield, 1995, and the Nowicki-Strickland Locus of Control Scale, Nowicki & Strickland, 1973), various student and tutor estimates of success, and performance data from college records. Students were classified using relocation cluster analysis and cluster differences were verified using discriminant function analysis. Thirty outcome models were tested using covariance regression analysis. Eight distinct and interpretable groups, consistent with other research, were identified, but the hypothesis of a positive, linear relationship between mastery and academic attainment was not sustained without qualification. Previous attainment was the major determinant of final performance. Gender variations were detected on the personality measures, particularly Confidence of outcomes, Prediction discrepancy, Passivity, Mastery, Dependency and Locus of control, and these were implicated in the cluster characteristics. The results suggest that a non-linear methodology may be required to isolate relationships between self-concept, personality and attainment, especially where gender effects may exist.

  5. Forensic discrimination of blue ballpoint pens on documents by laser ablation inductively coupled plasma mass spectrometry and multivariate analysis.

    PubMed

    Alamilla, Francisco; Calcerrada, Matías; García-Ruiz, Carmen; Torre, Mercedes

    2013-05-10

    The differentiation of blue ballpoint pen inks written on documents through an LA-ICP-MS methodology is proposed. Small portions of common office paper containing ink strokes from 21 blue pens of known origin were cut and measured without any sample preparation. In a first step, Mg, Ca and Sr were proposed as internal standards (ISs) and used to normalize elemental intensities and subtract background signals from the paper. Then, specific criteria were designed and employed to identify target elements (Li, V, Mn, Co, Ni, Cu, Zn, Zr, Sn, W and Pb), which proved independent of the IS chosen in 98% of cases and allowed a qualitative clustering of the samples. In a second step, an elemental-related ratio (ink ratio) based on the previously identified targets was used to obtain mass-independent intensities and perform pairwise comparisons by means of multivariate statistical analyses (MANOVA, Tukey's HSD and Hotelling's T²). This treatment improved the discrimination power (DP) and provided objective results, achieving complete differentiation among different brands and partial differentiation within pen inks from the same brands. The designed data treatment, together with the use of multivariate statistical tools, represents an easy and useful approach for differentiating among blue ballpoint pen inks, with minimal sample destruction and without the need for methodological calibrations, making its use potentially advantageous from a forensic-practice standpoint. To test the procedure, it was applied to the analysis of real handwritten questioned contracts, previously studied by the Department of Forensic Document Exams of the Criminalistics Service of the Civil Guard (Spain). The results showed that all questioned ink entries were clustered in the same group, distinct from the remaining ink on the document. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
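
    Hotelling's T² for a pairwise comparison of two pens' element-ratio profiles can be computed directly; this is the standard two-sample formulation, and the replicate counts and intensity values below are invented.

    ```python
    import numpy as np
    from scipy.stats import f as f_dist

    def hotelling_t2(X1, X2):
        """Two-sample Hotelling T^2 test for equality of mean vectors, with
        the usual F-distribution transformation of the statistic."""
        n1, p = X1.shape
        n2 = X2.shape[0]
        d = X1.mean(0) - X2.mean(0)
        S = ((n1 - 1) * np.cov(X1, rowvar=False) +
             (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)  # pooled cov
        t2 = (n1 * n2) / (n1 + n2) * d @ np.linalg.solve(S, d)
        F = t2 * (n1 + n2 - p - 1) / (p * (n1 + n2 - 2))
        pval = 1 - f_dist.cdf(F, p, n1 + n2 - p - 1)
        return t2, pval

    # Hypothetical normalized intensity ratios (replicates x elements)
    rng = np.random.default_rng(0)
    pen_a = rng.normal(1.0, 0.05, size=(6, 4))
    pen_b = rng.normal(1.1, 0.05, size=(6, 4))
    print(hotelling_t2(pen_a, pen_b))
    ```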

  6. Spatial pattern recognition of seismic events in South West Colombia

    NASA Astrophysics Data System (ADS)

    Benítez, Hernán D.; Flórez, Juan F.; Duque, Diana P.; Benavides, Alberto; Lucía Baquero, Olga; Quintero, Jiber

    2013-09-01

    Recognition of seismogenic zones in geographical regions supports seismic hazard studies. This recognition is usually based on visual, qualitative and subjective analysis of data. Spatial pattern recognition provides a well-founded means to obtain relevant information from large amounts of data. The purpose of this work is to identify and classify spatial patterns in instrumental data from the South West Colombian seismic database. In this research, clustering tendency analysis validates whether the seismic database possesses a clustering structure. A non-supervised fuzzy clustering algorithm creates groups of seismic events. Given the sensitivity of fuzzy clustering algorithms to the initial positions of centroids, we propose a centroid initialization methodology that generates partitions stable with respect to the initialization. As a result of this work, a public software tool provides the user with the routines developed for the clustering methodology. The analysis of the seismogenic zones obtained reveals meaningful spatial patterns in South West Colombia. The clustering analysis provides a quantitative location and dispersion of seismogenic zones that facilitates seismological interpretation of seismic activity in South West Colombia.

  7. Compulsive buying disorder clustering based on sex, age, onset and personality traits.

    PubMed

    Granero, Roser; Fernández-Aranda, Fernando; Baño, Marta; Steward, Trevor; Mestre-Bach, Gemma; Del Pino-Gutiérrez, Amparo; Moragas, Laura; Mallorquí-Bagué, Núria; Aymamí, Neus; Goméz-Peña, Mónica; Tárrega, Salomé; Menchón, José M; Jiménez-Murcia, Susana

    2016-07-01

    In spite of the revived interest in compulsive buying disorder (CBD), its classification in contemporary nosologic systems continues to be debated, and few studies have addressed heterogeneity in the clinical phenotype through methodologies based on a person-centered approach. The aim was to identify empirical clusters of CBD employing personality traits, as well as patients' sex, age and the age of CBD onset, as indicators. An agglomerative hierarchical clustering method combining the Schwarz Bayesian Information Criterion and the log-likelihood was used. Three clusters were identified in a sample of n=110 patients attending a specialized CBD unit: a) "male compulsive buyers" reported the highest prevalence of comorbid gambling disorder and the lowest levels of reward dependence; b) "female low-dysfunctional" mainly included employed women, with the highest level of education, the oldest age of onset, the lowest scores in harm avoidance and the highest levels of persistence, self-directedness and cooperativeness; and c) "female highly-dysfunctional" had the youngest age of onset, the highest levels of comorbid psychopathology and harm avoidance, and the lowest score in self-directedness. Sociodemographic characteristics and personality traits can be used to determine CBD clusters which represent different clinical subtypes. These subtypes should be considered when developing assessment instruments, preventive programs and treatment interventions. Copyright © 2016 Elsevier Inc. All rights reserved.

  8. Sampling Molecular Conformers in Solution with Quantum Mechanical Accuracy at a Nearly Molecular-Mechanics Cost.

    PubMed

    Rosa, Marta; Micciarelli, Marco; Laio, Alessandro; Baroni, Stefano

    2016-09-13

    We introduce a method to evaluate the relative populations of different conformers of molecular species in solution, aiming at quantum mechanical accuracy, while keeping the computational cost at a nearly molecular-mechanics level. This goal is achieved by combining long classical molecular-dynamics simulations to sample the free-energy landscape of the system, advanced clustering techniques to identify the most relevant conformers, and thermodynamic perturbation theory to correct the resulting populations, using quantum-mechanical energies from density functional theory. A quantitative criterion for assessing the accuracy thus achieved is proposed. The resulting methodology is demonstrated in the specific case of cyanin (cyanidin-3-glucoside) in water solution.
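
    The population-correction step can be sketched as thermodynamic-perturbation-style reweighting of classical populations by the QM-MM energy difference. The energies, temperature and populations below are placeholders, and this generic correction is not the paper's exact protocol.

    ```python
    import numpy as np

    def reweighted_populations(p_mm, e_mm, e_qm, T=300.0):
        """Reweight conformer populations from classical sampling (p_mm)
        using higher-level energies: w_i is proportional to
        p_MM,i * exp(-(E_QM,i - E_MM,i) / kT). Energies in kcal/mol."""
        kT = 0.0019872041 * T                      # Boltzmann constant, kcal/(mol K)
        delta = np.asarray(e_qm) - np.asarray(e_mm)
        w = np.asarray(p_mm) * np.exp(-(delta - delta.min()) / kT)  # shifted for stability
        return w / w.sum()

    # Hypothetical MM cluster populations and representative energies (kcal/mol)
    p_mm = np.array([0.4, 0.3, 0.2, 0.1])
    e_mm = np.array([0.0, 0.3, 0.8, 1.1])
    e_qm = np.array([0.0, 0.9, 0.4, 1.5])
    print(reweighted_populations(p_mm, e_mm, e_qm))
    ```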

  9. Molecular Typing of Mycobacterium Tuberculosis Complex by 24-Locus Based MIRU-VNTR Typing in Conjunction with Spoligotyping to Assess Genetic Diversity of Strains Circulating in Morocco

    PubMed Central

    Bouklata, Nada; Supply, Philip; Jaouhari, Sanae; Charof, Reda; Seghrouchni, Fouad; Sadki, Khalid; El Achhab, Youness; Nejjari, Chakib; Filali-Maltouf, Abdelkarim

    2015-01-01

    Background Standard 24-locus Mycobacterial Interspersed Repetitive Unit Variable Number Tandem Repeat (MIRU-VNTR) typing provides improved resolution power for tracing TB transmission and predicting different strain (sub)lineages in a community. Methodology During 2010–2012, a total of 168 Mycobacterium tuberculosis complex (MTBC) isolates were collected by cluster sampling from 10 different Moroccan cities, and centralized by the National Reference Laboratory of Tuberculosis over the study period. All isolates were genotyped using spoligotyping, and a subset of 75 was genotyped using 24-locus MIRU-VNTR typing, followed by first-line drug susceptibility testing. Corresponding strain lineages were predicted using the MIRU-VNTRplus database. Principal Findings Spoligotyping placed 137 isolates in 18 clusters (2–50 isolates per cluster; clustering rate 81.54%) corresponding to a SIT number in the SITVIT database, while 31 (18.45%) patterns were unique, of which 10 were labelled as "unknown" according to the same database. The most prevalent spoligotype family was LAM (n = 81, or 48.24% of isolates, dominated by SIT42, n = 49), followed by Haarlem (23.80%), the T superfamily (15.47%), Beijing (2.97%), the U clade (2.38%) and the S clade (1.19%). Subsequent 24-locus MIRU-VNTR typing identified 64 unique types and 11 isolates in 5 clusters (2 to 3 isolates per cluster), substantially reducing the clusters defined by spoligotyping alone. The single cluster of three isolates corresponded to two previously treated MDR-TB cases and one new MDR-TB case known to be contacts of the same index case and members of the same family, albeit residing in 3 different administrative regions. MIRU-VNTR loci 4052, 802, 2996, 2163b, 3690, 1955, 424, 2531, 2401 and 960 were highly discriminative in our setting (HGDI > 0.6). Conclusions 24-locus MIRU-VNTR typing can substantially improve the resolution of large clusters initially defined by spoligotyping alone and predominating in Morocco, and could therefore be used to better study tuberculosis transmission in a population-based, multi-year sample context. PMID:26285026
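
    The Hunter-Gaston discriminatory index (HGDI) used to rank loci is simple to compute: it is the probability that two randomly drawn isolates carry different types. The allele counts in the example are invented.

    ```python
    from collections import Counter

    def hgdi(type_labels):
        """Hunter-Gaston discriminatory index of a typing method:
        1 - sum(n_j * (n_j - 1)) / (N * (N - 1)) over type counts n_j."""
        counts = Counter(type_labels).values()
        n = sum(counts)
        return 1 - sum(c * (c - 1) for c in counts) / (n * (n - 1))

    # Example: hypothetical allele counts at one MIRU-VNTR locus, 75 isolates
    locus_alleles = [2] * 30 + [3] * 20 + [4] * 15 + [5] * 10
    print(round(hgdi(locus_alleles), 3))  # loci with HGDI > 0.6 were called highly discriminative
    ```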

  10. Enrichment 2.0 Gifted and Talented Education for the 21st Century

    ERIC Educational Resources Information Center

    Eckstein, Michelle

    2009-01-01

    Enrichment clusters, a component of the Schoolwide Enrichment Model, are multigrade investigative groups based on constructivist learning methodology. Enrichment clusters are organized around major disciplines, interdisciplinary themes, or cross-disciplinary topics. Within clusters, students are grouped across grade levels by interests and focused…

  11. Finding clusters of similar events within clinical incident reports: a novel methodology combining case based reasoning and information retrieval

    PubMed Central

    Tsatsoulis, C; Amthauer, H

    2003-01-01

    A novel methodological approach for identifying clusters of similar medical incidents by analyzing large databases of incident reports is described. The discovery of similar events allows the identification of patterns and trends, and makes possible the prediction of future events and the establishment of barriers and best practices. Two techniques from the fields of information science and artificial intelligence have been integrated—namely, case based reasoning and information retrieval—and very good clustering accuracies have been achieved on a test data set of incident reports from transfusion medicine. This work suggests that clustering should integrate the features of an incident captured in traditional form based records together with the detailed information found in the narrative included in event reports. PMID:14645892

  12. A two-stage method for microcalcification cluster segmentation in mammography by deformable models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Arikidis, N.; Kazantzi, A.; Skiadopoulos, S.

    Purpose: Segmentation of microcalcification (MC) clusters in x-ray mammography is a difficult task for radiologists. Accurate segmentation is prerequisite for quantitative image analysis of MC clusters and subsequent feature extraction and classification in computer-aided diagnosis schemes. Methods: In this study, a two-stage semiautomated segmentation method of MC clusters is investigated. The first stage is targeted to accurate and time efficient segmentation of the majority of the particles of a MC cluster, by means of a level set method. The second stage is targeted to shape refinement of selected individual MCs, by means of an active contour model. Both methods are applied in the framework of a rich scale-space representation, provided by the wavelet transform at integer scales. Segmentation reliability of the proposed method in terms of inter and intraobserver agreements was evaluated in a case sample of 80 MC clusters originating from the digital database for screening mammography, corresponding to 4 morphology types (punctate: 22, fine linear branching: 16, pleomorphic: 18, and amorphous: 24) of MC clusters, assessing radiologists' segmentations quantitatively by two distance metrics (Hausdorff distance, HDIST_cluster; average of minimum distance, AMINDIST_cluster) and the area overlap measure (AOM_cluster). The effect of the proposed segmentation method on MC cluster characterization accuracy was evaluated in a case sample of 162 pleomorphic MC clusters (72 malignant and 90 benign). Ten MC cluster features, targeted to capture morphologic properties of individual MCs in a cluster (area, major length, perimeter, compactness, and spread), were extracted and a correlation-based feature selection method yielded a feature subset to feed in a support vector machine classifier. Classification performance of the MC cluster features was estimated by means of the area under receiver operating characteristic curve (Az ± standard error) utilizing tenfold cross-validation methodology. A previously developed B-spline active rays segmentation method was also considered for comparison purposes. Results: Interobserver and intraobserver segmentation agreements (median and [25%, 75%] quartile range) were substantial with respect to the distance metrics HDIST_cluster (2.3 [1.8, 2.9] and 2.5 [2.1, 3.2] pixels) and AMINDIST_cluster (0.8 [0.6, 1.0] and 1.0 [0.8, 1.2] pixels), while moderate with respect to AOM_cluster (0.64 [0.55, 0.71] and 0.59 [0.52, 0.66]). The proposed segmentation method outperformed (0.80 ± 0.04) statistically significantly (Mann-Whitney U-test, p < 0.05) the B-spline active rays segmentation method (0.69 ± 0.04), suggesting the significance of the proposed semiautomated method. Conclusions: Results indicate a reliable semiautomated segmentation method for MC clusters offered by deformable models, which could be utilized in MC cluster quantitative image analysis.

  13. cl-dash: rapid configuration and deployment of Hadoop clusters for bioinformatics research in the cloud.

    PubMed

    Hodor, Paul; Chawla, Amandeep; Clark, Andrew; Neal, Lauren

    2016-01-15

    One of the solutions proposed for addressing the challenge of the overwhelming abundance of genomic sequence and other biological data is the use of the Hadoop computing framework. Appropriate tools are needed to set up computational environments that facilitate research of novel bioinformatics methodology using Hadoop. Here, we present cl-dash, a complete starter kit for setting up such an environment. Configuring and deploying new Hadoop clusters can be done in minutes. Use of Amazon Web Services ensures no initial investment and minimal operation costs. Two sample bioinformatics applications help the researcher understand and learn the principles of implementing an algorithm using the MapReduce programming pattern. Source code is available at https://bitbucket.org/booz-allen-sci-comp-team/cl-dash.git. Contact: hodor_paul@bah.com. © The Author 2015. Published by Oxford University Press.
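
    To give a flavour of the MapReduce pattern that the sample applications teach, here is a hypothetical Hadoop Streaming-style k-mer counter; it is not one of the cl-dash samples, and in practice the mapper and reducer would be two separate scripts fed by the framework.

    ```python
    #!/usr/bin/env python3
    """Sketch of the MapReduce pattern: count k-mers in sequence data.
    With Hadoop Streaming, mapper and reducer run as separate processes and
    the framework sorts the mapper's key-value output by key."""
    import sys
    from itertools import groupby

    K = 8

    def mapper(lines):
        # Emit (k-mer, 1) for every k-mer in each sequence line.
        for line in lines:
            seq = line.strip()
            if not seq or seq.startswith(">"):
                continue
            for i in range(len(seq) - K + 1):
                yield seq[i:i + K], 1

    def reducer(pairs):
        # Pairs arrive grouped by key; sum the counts per k-mer.
        for kmer, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
            yield kmer, sum(c for _, c in group)

    if __name__ == "__main__":
        # Local simulation of map -> shuffle/sort -> reduce on stdin.
        for kmer, count in reducer(mapper(sys.stdin)):
            print(f"{kmer}\t{count}")
    ```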

  14. cl-dash: rapid configuration and deployment of Hadoop clusters for bioinformatics research in the cloud

    PubMed Central

    Hodor, Paul; Chawla, Amandeep; Clark, Andrew; Neal, Lauren

    2016-01-01

    Summary: One of the solutions proposed for addressing the challenge of the overwhelming abundance of genomic sequence and other biological data is the use of the Hadoop computing framework. Appropriate tools are needed to set up computational environments that facilitate research of novel bioinformatics methodology using Hadoop. Here, we present cl-dash, a complete starter kit for setting up such an environment. Configuring and deploying new Hadoop clusters can be done in minutes. Use of Amazon Web Services ensures no initial investment and minimal operation costs. Two sample bioinformatics applications help the researcher understand and learn the principles of implementing an algorithm using the MapReduce programming pattern. Availability and implementation: Source code is available at https://bitbucket.org/booz-allen-sci-comp-team/cl-dash.git. Contact: hodor_paul@bah.com PMID:26428290

  15. Characterizing Heterogeneity within Head and Neck Lesions Using Cluster Analysis of Multi-Parametric MRI Data.

    PubMed

    Borri, Marco; Schmidt, Maria A; Powell, Ceri; Koh, Dow-Mu; Riddell, Angela M; Partridge, Mike; Bhide, Shreerang A; Nutting, Christopher M; Harrington, Kevin J; Newbold, Katie L; Leach, Martin O

    2015-01-01

    To describe a methodology, based on cluster analysis, to partition multi-parametric functional imaging data into groups (or clusters) of similar functional characteristics, with the aim of characterizing functional heterogeneity within head and neck tumour volumes. To evaluate the performance of the proposed approach on a set of longitudinal MRI data, analysing the evolution of the obtained sub-sets with treatment. The cluster analysis workflow was applied to a combination of dynamic contrast-enhanced and diffusion-weighted MRI data from a cohort of patients with squamous cell carcinoma of the head and neck. Cumulative distributions of voxels, containing pre- and post-treatment data and including both primary tumours and lymph nodes, were partitioned into k clusters (k = 2, 3 or 4). Principal component analysis and cluster validation were employed to investigate data composition and to independently determine the optimal number of clusters. The evolution of the resulting sub-regions with induction chemotherapy treatment was assessed relative to the number of clusters. The clustering algorithm was able to separate clusters which significantly reduced in voxel number following induction chemotherapy from clusters with a non-significant reduction. Partitioning with the optimal number of clusters (k = 4), determined with cluster validation, produced the best separation between reducing and non-reducing clusters. The proposed methodology was able to identify tumour sub-regions with distinct functional properties, independently separating clusters which were affected differently by treatment. This work demonstrates that unsupervised cluster analysis, with no prior knowledge of the data, can be employed to provide a multi-parametric characterization of functional heterogeneity within tumour volumes.
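
    Internal cluster validation for choosing k can be sketched as below with the mean silhouette width (the paper's exact validation index may differ); the voxel feature matrix is simulated rather than derived from DCE or diffusion-weighted data.

    ```python
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_score
    from sklearn.preprocessing import StandardScaler

    # Hypothetical multi-parametric feature matrix, one row per voxel
    # (e.g. perfusion parameters plus ADC, pooled over lesions/time points).
    rng = np.random.default_rng(0)
    voxels = np.vstack([rng.normal(m, 0.4, size=(500, 3)) for m in (0, 1.5, 3, 4.5)])
    Xs = StandardScaler().fit_transform(voxels)

    # Pick the number of clusters that maximizes the mean silhouette width,
    # one common internal validation criterion.
    for k in range(2, 6):
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(Xs)
        print(k, round(silhouette_score(Xs, labels), 3))
    ```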

  16. Investigations of stacking fault density in perpendicular recording media

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Piramanayagam, S. N., E-mail: prem-SN@dsi.a-star.edu.sg; Varghese, Binni; Yang, Yi

    In magnetic recording media, the grains or clusters reverse their magnetization over a range of reversal fields, resulting in a switching field distribution. In order to achieve high areal densities, it is desirable to understand and minimize such a distribution. Clusters of grains which contain stacking faults (SF) or fcc phase have lower anisotropy, an order of magnitude lower than those without them. It is believed that such low-anisotropy regions reverse their magnetization at a much lower reversal field than the rest of the material with a larger anisotropy. Such clusters/grains cause recording performance deterioration, such as adjacent track erasure and dc noise. Therefore, the observation of clusters that reverse at very low reversal fields (nucleation sites, NS) could give information on the noise and the adjacent track erasure. Potentially, the observed clusters could also provide information on the SF. In this paper, we study the reversal of nucleation sites in granular perpendicular media based on a magnetic force microscope (MFM) methodology and validate the observations with high resolution cross-section transmission electron microscopy (HRTEM) measurements. Samples, wherein a high anisotropy CoPt layer was introduced to control the NS or SF in a systematic way, were evaluated by MFM, TEM, and magnetometry. The magnetic properties indicated that increasing the thickness of the CoPt layer results in an increase of nucleation sites. TEM measurements indicated a correlation between the thickness of the CoPt layer and the stacking fault density. A clear correlation was also observed between the MFM results, the TEM observations, and the coercivity and nucleation field of the samples, validating the effectiveness of the proposed method in evaluating nucleation sites which potentially arise from stacking faults.

  17. Molecular Taxonomy of Phytopathogenic Fungi: A Case Study in Peronospora

    PubMed Central

    Göker, Markus; García-Blázquez, Gema; Voglmayr, Hermann; Tellería, M. Teresa; Martín, María P.

    2009-01-01

    Background: Inappropriate taxon definitions may have severe consequences in many areas. For instance, biologically sensible species delimitation of plant pathogens is crucial for measures such as plant protection or biological control and for comparative studies involving model organisms. However, delimiting species is challenging in the case of organisms for which often only molecular data are available, such as prokaryotes, fungi, and many unicellular eukaryotes. Even in the case of organisms with well-established morphological characteristics, molecular taxonomy is often necessary to emend current taxonomic concepts and to analyze DNA sequences directly sampled from the environment. Typically, for this purpose clustering approaches to delineate molecular operational taxonomic units have been applied using arbitrary choices regarding the distance threshold values and the clustering algorithms. Methodology: Here, we report on a clustering optimization method to establish a molecular taxonomy of Peronospora based on ITS nrDNA sequences. Peronospora is the largest genus within the downy mildews, which are obligate parasites of higher plants, and includes various economically important pathogens. The method determines the distance function and clustering setting that result in an optimal agreement with selected reference data. Optimization was based on both taxonomy-based and host-based reference information, yielding the same outcome. Resampling and permutation methods indicate that the method is robust regarding taxon sampling and errors in the reference data. Tests with newly obtained ITS sequences demonstrate the use of the re-classified dataset in molecular identification of downy mildews. Conclusions: A corrected taxonomy is provided for all Peronospora ITS sequences contained in public databases. Clustering optimization appears to be broadly applicable in automated, sequence-based taxonomy. The method connects traditional and modern taxonomic disciplines by specifically addressing the issue of how to optimally account for both traditional species concepts and genetic divergence. PMID:19641601
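
    The clustering-optimization idea can be sketched as a scan over distance cut-offs for a hierarchical clustering of pairwise sequence distances, keeping the cut-off that agrees best with reference labels; agreement is scored here with the adjusted Rand index. The distances, labels and UPGMA linkage below are illustrative assumptions, not the authors' exact configuration.

```python
# Hedged sketch of clustering optimization against reference taxonomy.
import numpy as np
from scipy.cluster.hierarchy import average, fcluster
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(1)
n = 20
condensed = rng.uniform(0.01, 0.5, size=n * (n - 1) // 2)  # toy pairwise ITS distances
reference = rng.integers(0, 3, size=n)                     # toy reference taxon labels

link = average(condensed)                                  # UPGMA linkage
best_ari, best_t = max(
    (adjusted_rand_score(reference, fcluster(link, t, criterion="distance")), t)
    for t in np.linspace(0.05, 0.5, 10)
)
print(f"best agreement: ARI={best_ari:.2f} at cut-off {best_t:.2f}")
```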

  18. Assessment of cluster yield components by image analysis.

    PubMed

    Diago, Maria P; Tardaguila, Javier; Aleixos, Nuria; Millan, Borja; Prats-Montalban, Jose M; Cubero, Sergio; Blasco, Jose

    2015-04-01

    Berry weight, berry number and cluster weight are key parameters for yield estimation in the wine and table grape industries. Current yield prediction methods are destructive, labour-demanding and time-consuming. In this work, a new methodology based on image analysis was developed to determine cluster yield components in a fast and inexpensive way. Clusters of seven different red varieties of grapevine (Vitis vinifera L.) were photographed under laboratory conditions and their cluster yield components manually determined after image acquisition. Two algorithms based on the Canny and the logarithmic image processing approaches were tested to find the contours of the berries in the images prior to berry detection performed by means of the Hough Transform. Results were obtained in two ways: by analysing either a single image of the cluster or using four images per cluster from different orientations. The best results (R² between 69% and 95% in berry detection and between 65% and 97% in cluster weight estimation) were achieved using four images and the Canny algorithm. The capability of the image-based model to predict berry weight was 84%. The new and low-cost methodology presented here enabled the assessment of cluster yield components, saving time and providing inexpensive information in comparison with current manual methods. © 2014 Society of Chemical Industry.
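
    The contour-plus-Hough pipeline can be sketched in a few lines with OpenCV; the file name and every parameter value below are hypothetical rather than the settings tuned in the study, and OpenCV's HOUGH_GRADIENT method applies a Canny edge step internally.

```python
# Hedged sketch of berry detection by circle Hough transform with OpenCV.
import cv2

img = cv2.imread("cluster.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical image file
# HOUGH_GRADIENT runs a Canny edge step internally (param1 = upper threshold),
# mirroring the edge-detection-then-Hough pipeline described above.
berries = cv2.HoughCircles(
    img, cv2.HOUGH_GRADIENT, dp=1.2, minDist=15,
    param1=150, param2=25, minRadius=5, maxRadius=40,
)
print("detected berries:", 0 if berries is None else berries.shape[1])
```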

  19. Symptom Clusters in Advanced Cancer Patients: An Empirical Comparison of Statistical Methods and the Impact on Quality of Life.

    PubMed

    Dong, Skye T; Costa, Daniel S J; Butow, Phyllis N; Lovell, Melanie R; Agar, Meera; Velikova, Galina; Teckle, Paulos; Tong, Allison; Tebbutt, Niall C; Clarke, Stephen J; van der Hoek, Kim; King, Madeleine T; Fayers, Peter M

    2016-01-01

    Symptom clusters in advanced cancer can influence patient outcomes. There is large heterogeneity in the methods used to identify symptom clusters. To investigate the consistency of symptom cluster composition in advanced cancer patients using different statistical methodologies for all patients across five primary cancer sites, and to examine which clusters predict functional status, a global assessment of health and global quality of life. Principal component analysis and exploratory factor analysis (with different rotation and factor selection methods) and hierarchical cluster analysis (with different linkage and similarity measures) were used on a data set of 1562 advanced cancer patients who completed the European Organization for the Research and Treatment of Cancer Quality of Life Questionnaire-Core 30. Four clusters consistently formed for many of the methods and cancer sites: tense-worry-irritable-depressed (emotional cluster), fatigue-pain, nausea-vomiting, and concentration-memory (cognitive cluster). The emotional cluster was a stronger predictor of overall quality of life than the other clusters. Fatigue-pain was a stronger predictor of overall health than the other clusters. The cognitive cluster and fatigue-pain predicted physical functioning, role functioning, and social functioning. The four identified symptom clusters were consistent across statistical methods and cancer types, although there were some noteworthy differences. Statistical derivation of symptom clusters is in need of greater methodological guidance. A psychosocial pathway in the management of symptom clusters may improve quality of life. Biological mechanisms underpinning symptom clusters need to be delineated by future research. A framework for evidence-based screening, assessment, treatment, and follow-up of symptom clusters in advanced cancer is essential. Copyright © 2016 American Academy of Hospice and Palliative Medicine. Published by Elsevier Inc. All rights reserved.

  20. The cosmological analysis of X-ray cluster surveys - I. A new method for interpreting number counts

    NASA Astrophysics Data System (ADS)

    Clerc, N.; Pierre, M.; Pacaud, F.; Sadibekova, T.

    2012-07-01

    We present a new method aimed at simplifying the cosmological analysis of X-ray cluster surveys. It is based on purely instrumental observable quantities considered in a two-dimensional X-ray colour-magnitude diagram (hardness ratio versus count rate). The basic principle is that even in rather shallow surveys, substantial information on cluster redshift and temperature is present in the raw X-ray data and can be statistically extracted; in parallel, such diagrams can be readily predicted from an ab initio cosmological modelling. We illustrate the methodology for the case of a 100 deg² XMM survey having a sensitivity of ~10⁻¹⁴ erg s⁻¹ cm⁻² and fit, at the same time, the survey selection function, the cluster evolutionary scaling relations and the cosmology; our sole assumption - driven by the limited size of the sample considered in the case study - is that the local cluster scaling relations are known. We devote special attention to the realistic modelling of the count-rate measurement uncertainties and evaluate the potential of the method via a Fisher analysis. In the absence of individual cluster redshifts, the count rate and hardness ratio (CR-HR) method appears to be much more efficient than the traditional approach based on cluster counts (i.e. dn/dz, requiring redshifts). In the case where redshifts are available, our method performs similarly to the traditional mass function (dn/dM/dz) for the purely cosmological parameters, but better constrains the parameters defining the cluster scaling relations and their evolution. A further practical advantage of the CR-HR method is its simplicity: this fully top-down approach totally bypasses the tedious steps of deriving cluster masses from X-ray temperature measurements.

  1. Application of adaptive cluster sampling to low-density populations of freshwater mussels

    USGS Publications Warehouse

    Smith, D.R.; Villella, R.F.; Lemarie, D.P.

    2003-01-01

    Freshwater mussels appear to be promising candidates for adaptive cluster sampling because they are benthic macroinvertebrates that cluster spatially and are frequently found at low densities. We applied adaptive cluster sampling to estimate density of freshwater mussels at 24 sites along the Cacapon River, WV, where a preliminary timed search indicated that mussels were present at low density. Adaptive cluster sampling increased yield of individual mussels and detection of uncommon species; however, it did not improve precision of density estimates. Because finding uncommon species, collecting individuals of those species, and estimating their densities are important conservation activities, additional research is warranted on application of adaptive cluster sampling to freshwater mussels. However, at this time we do not recommend routine application of adaptive cluster sampling to freshwater mussel populations. The ultimate, and currently unanswered, question is how to tell when adaptive cluster sampling should be used, i.e., when is a population sufficiently rare and clustered for adaptive cluster sampling to be efficient and practical? A cost-effective procedure needs to be developed to identify biological populations for which adaptive cluster sampling is appropriate.
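
    The adaptive step itself is simple to simulate: starting from a random sample of grid units, the neighbours of any unit meeting the condition (here, at least one mussel found) are added to the sample, and the process repeats. The toy sketch below illustrates only this expansion; the design-unbiased density estimators used in practice are omitted.

```python
# Toy simulation of the adaptive expansion in adaptive cluster sampling.
import numpy as np

rng = np.random.default_rng(2)
grid = (rng.random((20, 20)) < 0.05) * rng.poisson(5, (20, 20))  # sparse counts

initial = {tuple(ix) for ix in rng.integers(0, 20, size=(30, 2))}
sample, frontier = set(initial), list(initial)
while frontier:
    i, j = frontier.pop()
    if grid[i, j] > 0:                      # condition met: adapt around this unit
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nb = (i + di, j + dj)
            if 0 <= nb[0] < 20 and 0 <= nb[1] < 20 and nb not in sample:
                sample.add(nb)
                frontier.append(nb)
print(len(initial), "initial units ->", len(sample), "units after adaptation")
```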

  2. Quadratic partial eigenvalue assignment in large-scale stochastic dynamic systems for resilient and economic design

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Das, Sonjoy; Goswami, Kundan; Datta, Biswa N.

    2014-12-10

    Failure of structural systems under dynamic loading can be prevented via active vibration control which shifts the damped natural frequencies of the systems away from the dominant range of loading spectrum. The damped natural frequencies and the dynamic load typically show significant variations in practice. A computationally efficient methodology based on quadratic partial eigenvalue assignment technique and optimization under uncertainty has been formulated in the present work that will rigorously account for these variations and result in an economic and resilient design of structures. A novel scheme based on hierarchical clustering and importance sampling is also developed in this work for accurate and efficient estimation of probability of failure to guarantee the desired resilience level of the designed system. Numerical examples are presented to illustrate the proposed methodology.
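
    The importance-sampling ingredient can be illustrated on a one-dimensional toy problem: to estimate a small failure probability, draw from a proposal density centred on the failure region and reweight by the likelihood ratio. The limit-state function and densities below are invented, and the hierarchical-clustering component of the authors' scheme is omitted.

```python
# Hedged sketch of importance sampling for a small failure probability.
import numpy as np
from scipy import stats

g = lambda x: 4.0 - x                     # failure when g(x) < 0, i.e. x > 4
f = stats.norm(0, 1)                      # nominal density of the input
h = stats.norm(4, 1)                      # proposal centred on the failure region

x = h.rvs(size=100_000, random_state=3)
weights = f.pdf(x) / h.pdf(x)             # likelihood ratio
p_fail = np.mean((g(x) < 0) * weights)
print(p_fail, stats.norm.sf(4))           # estimate vs exact tail probability
```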

  3. Using Cluster Analysis for Data Mining in Educational Technology Research

    ERIC Educational Resources Information Center

    Antonenko, Pavlo D.; Toy, Serkan; Niederhauser, Dale S.

    2012-01-01

    Cluster analysis is a group of statistical methods that has great potential for analyzing the vast amounts of web server-log data to understand student learning from hyperlinked information resources. In this methodological paper we provide an introduction to cluster analysis for educational technology researchers and illustrate its use through…

  4. DYNER: A DYNamic ClustER for Education and Research

    ERIC Educational Resources Information Center

    Kehagias, Dimitris; Grivas, Michael; Mamalis, Basilis; Pantziou, Grammati

    2006-01-01

    Purpose: The purpose of this paper is to evaluate the use of a non-expensive dynamic computing resource, consisting of a Beowulf class cluster and a NoW, as an educational and research infrastructure. Design/methodology/approach: Clusters, built using commodity-off-the-shelf (COTS) hardware components and free, or commonly used, software, provide…

  5. Statistical analysis of atom probe data: detecting the early stages of solute clustering and/or co-segregation.

    PubMed

    Hyde, J M; Cerezo, A; Williams, T J

    2009-04-01

    Statistical analysis of atom probe data has improved dramatically in the last decade and it is now possible to determine the size, the number density and the composition of individual clusters or precipitates such as those formed in reactor pressure vessel (RPV) steels during irradiation. However, the characterisation of the onset of clustering or co-segregation is more difficult and has traditionally focused on the use of composition frequency distributions (for detecting clustering) and contingency tables (for detecting co-segregation). In this work, the authors investigate the possibility of directly examining the neighbourhood of each individual solute atom as a means of identifying the onset of solute clustering and/or co-segregation. The methodology involves comparing the mean observed composition around a particular type of solute with that expected from the overall composition of the material. The methodology has been applied to atom probe data obtained from several irradiated RPV steels. The results show that the new approach is more sensitive to fine scale clustering and co-segregation than that achievable using composition frequency distribution and contingency table analyses.
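
    The neighbourhood-composition idea can be sketched on a synthetic point cloud: for each atom of a chosen solute, look at its nearest neighbours and compare the local solute fraction with the bulk level. The solute identity, fractions, and the choice of 10 nearest neighbours are illustrative assumptions.

```python
# Hedged sketch of nearest-neighbour composition analysis (synthetic data).
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(8)
pos = rng.uniform(0, 50, size=(5000, 3))          # synthetic atom positions (nm)
is_cu = rng.random(5000) < 0.02                   # 2% of atoms are the solute "Cu"

tree = cKDTree(pos)
neigh = tree.query(pos[is_cu], k=11)[1][:, 1:]    # 10 nearest neighbours (drop self)
local = is_cu[neigh].mean()                       # mean local solute fraction
print(f"local Cu fraction {local:.3f} vs bulk {is_cu.mean():.3f}")
# A local fraction well above bulk would indicate incipient clustering.
```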

  6. National sample survey to assess the new case disease burden of leprosy in India

    PubMed Central

    Katoch, Kiran; Aggarwal, Abha; Yadav, Virendra Singh; Pandey, Arvind

    2017-01-01

    A national sample survey of leprosy was undertaken in partnership with Indian Council of Medical Research (ICMR) institutions, the National Leprosy Eradication Programme (NLEP), Panchayati Raj members, and treated leprosy patients to detect new cases of leprosy in India. The objectives of the survey were to estimate the new leprosy case load; to record both Grade 1 and Grade 2 disabilities in the new cases; and to assess the magnitude of stigma and discrimination prevalent in society. A cluster-based, cross-sectional, door-to-door survey involving all States was conducted using inverse sampling methodology. Rural and urban clusters were sampled separately. The population screened until 28 new cases were detected in rural clusters and 30 in urban clusters was enumerated, recorded and analyzed. Data capture and analysis in different schedules were the main tools used. For quality control, three tiers of experts were utilized for the confirmation of cases and disabilities. Self-stigma was assessed, using the approved questionnaire, in more than half of the total new patients detected with disabilities. A different questionnaire was used to assess stigma in the community. A population of 14,725,525 (10,302,443 rural; 4,423,082 urban) was screened and 2161 new cases - 1300 paucibacillary (PB) and 861 multibacillary (MB) - were detected. The estimated new case load for leprosy was 330,346 (95% confidence limits, 287,445-380,851). Disabilities were observed at a rate of 2.05/100,000 population, i.e. in 13.9 per cent (302/2161) of new cases. Self-stigma in patients with disabilities was reduced, and the patients were well accepted by the spouse and neighbours, at the workplace and in social functions. PMID:29512601
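
    Inverse sampling means screening continues until a fixed number of cases has been found, so the number of people screened, rather than the number of cases, is random. A toy illustration with an invented prevalence, using the classical unbiased estimator (r - 1)/(n - 1):

```python
# Toy illustration of inverse (negative binomial) sampling.
import numpy as np

rng = np.random.default_rng(7)
true_prev, target_cases = 1 / 5000, 28    # invented prevalence; 28 as in rural clusters
screened, cases = 0, 0
while cases < target_cases:               # screen until the case quota is met
    screened += 1
    cases += rng.random() < true_prev
estimate = (target_cases - 1) / (screened - 1)
print(screened, estimate)                 # estimate should be near 1/5000 = 0.0002
```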

  7. RosettaAntibodyDesign (RAbD): A general framework for computational antibody design

    PubMed Central

    Adolf-Bryfogle, Jared; Kalyuzhniy, Oleks; Kubitz, Michael; Hu, Xiaozhen; Adachi, Yumiko; Schief, William R.

    2018-01-01

    A structural-bioinformatics-based computational methodology and framework have been developed for the design of antibodies to targets of interest. RosettaAntibodyDesign (RAbD) samples the diverse sequence, structure, and binding space of an antibody to an antigen in highly customizable protocols for the design of antibodies in a broad range of applications. The program samples antibody sequences and structures by grafting structures from a widely accepted set of the canonical clusters of CDRs (North et al., J. Mol. Biol., 406:228–256, 2011). It then performs sequence design according to amino acid sequence profiles of each cluster, and samples CDR backbones using a flexible-backbone design protocol incorporating cluster-based CDR constraints. Starting from an existing experimental or computationally modeled antigen-antibody structure, RAbD can be used to redesign a single CDR or multiple CDRs with loops of different length, conformation, and sequence. We rigorously benchmarked RAbD on a set of 60 diverse antibody–antigen complexes, using two design strategies—optimizing total Rosetta energy and optimizing interface energy alone. We utilized two novel metrics for measuring success in computational protein design. The design risk ratio (DRR) is equal to the frequency of recovery of native CDR lengths and clusters divided by the frequency of sampling of those features during the Monte Carlo design procedure. Ratios greater than 1.0 indicate that the design process is picking out the native more frequently than expected from their sampled rate. We achieved DRRs for the non-H3 CDRs of between 2.4 and 4.0. The antigen risk ratio (ARR) is the ratio of frequencies of the native amino acid types, CDR lengths, and clusters in the output decoys for simulations performed in the presence and absence of the antigen. For CDRs, we achieved cluster ARRs as high as 2.5 for L1 and 1.5 for H2. For sequence design simulations without CDR grafting, the overall recovery for the native amino acid types for residues that contact the antigen in the native structures was 72% in simulations performed in the presence of the antigen and 48% in simulations performed without the antigen, for an ARR of 1.5. For the non-contacting residues, the ARR was 1.08. This shows that the sequence profiles are able to maintain the amino acid types of these conserved, buried sites, while recovery of the exposed, contacting residues requires the presence of the antigen-antibody interface. We tested RAbD experimentally on both a lambda and kappa antibody–antigen complex, successfully improving their affinities 10 to 50 fold by replacing individual CDRs of the native antibody with new CDR lengths and clusters. PMID:29702641

  8. RosettaAntibodyDesign (RAbD): A general framework for computational antibody design.

    PubMed

    Adolf-Bryfogle, Jared; Kalyuzhniy, Oleks; Kubitz, Michael; Weitzner, Brian D; Hu, Xiaozhen; Adachi, Yumiko; Schief, William R; Dunbrack, Roland L

    2018-04-01

    A structural-bioinformatics-based computational methodology and framework have been developed for the design of antibodies to targets of interest. RosettaAntibodyDesign (RAbD) samples the diverse sequence, structure, and binding space of an antibody to an antigen in highly customizable protocols for the design of antibodies in a broad range of applications. The program samples antibody sequences and structures by grafting structures from a widely accepted set of the canonical clusters of CDRs (North et al., J. Mol. Biol., 406:228-256, 2011). It then performs sequence design according to amino acid sequence profiles of each cluster, and samples CDR backbones using a flexible-backbone design protocol incorporating cluster-based CDR constraints. Starting from an existing experimental or computationally modeled antigen-antibody structure, RAbD can be used to redesign a single CDR or multiple CDRs with loops of different length, conformation, and sequence. We rigorously benchmarked RAbD on a set of 60 diverse antibody-antigen complexes, using two design strategies-optimizing total Rosetta energy and optimizing interface energy alone. We utilized two novel metrics for measuring success in computational protein design. The design risk ratio (DRR) is equal to the frequency of recovery of native CDR lengths and clusters divided by the frequency of sampling of those features during the Monte Carlo design procedure. Ratios greater than 1.0 indicate that the design process is picking out the native more frequently than expected from their sampled rate. We achieved DRRs for the non-H3 CDRs of between 2.4 and 4.0. The antigen risk ratio (ARR) is the ratio of frequencies of the native amino acid types, CDR lengths, and clusters in the output decoys for simulations performed in the presence and absence of the antigen. For CDRs, we achieved cluster ARRs as high as 2.5 for L1 and 1.5 for H2. For sequence design simulations without CDR grafting, the overall recovery for the native amino acid types for residues that contact the antigen in the native structures was 72% in simulations performed in the presence of the antigen and 48% in simulations performed without the antigen, for an ARR of 1.5. For the non-contacting residues, the ARR was 1.08. This shows that the sequence profiles are able to maintain the amino acid types of these conserved, buried sites, while recovery of the exposed, contacting residues requires the presence of the antigen-antibody interface. We tested RAbD experimentally on both a lambda and kappa antibody-antigen complex, successfully improving their affinities 10 to 50 fold by replacing individual CDRs of the native antibody with new CDR lengths and clusters.
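
    Both risk ratios have direct arithmetic definitions and are trivial to compute; the sketch below transcribes them, with hypothetical input fractions apart from the 0.72/0.48 contact-residue example quoted in the abstract.

```python
# Direct transcription of the two metrics defined in the abstract.
def design_risk_ratio(recovered_frac, sampled_frac):
    """DRR: recovery frequency of native features / their sampling frequency."""
    return recovered_frac / sampled_frac

def antigen_risk_ratio(freq_with_antigen, freq_without_antigen):
    """ARR: native-feature frequency with antigen / without antigen."""
    return freq_with_antigen / freq_without_antigen

print(design_risk_ratio(0.40, 0.10))   # > 1: native picked more often than sampled
print(antigen_risk_ratio(0.72, 0.48))  # 1.5, the contact-residue example above
```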

  9. Sample size calculation for stepped wedge and other longitudinal cluster randomised trials.

    PubMed

    Hooper, Richard; Teerenstra, Steven; de Hoop, Esther; Eldridge, Sandra

    2016-11-20

    The sample size required for a cluster randomised trial is inflated compared with an individually randomised trial because outcomes of participants from the same cluster are correlated. Sample size calculations for longitudinal cluster randomised trials (including stepped wedge trials) need to take account of at least two levels of clustering: the clusters themselves and times within clusters. We derive formulae for sample size for repeated cross-section and closed cohort cluster randomised trials with normally distributed outcome measures, under a multilevel model allowing for variation between clusters and between times within clusters. Our formulae agree with those previously described for special cases such as crossover and analysis of covariance designs, although simulation suggests that the formulae could underestimate required sample size when the number of clusters is small. Whether using a formula or simulation, a sample size calculation requires estimates of nuisance parameters, which in our model include the intracluster correlation, cluster autocorrelation, and individual autocorrelation. A cluster autocorrelation less than 1 reflects a situation where individuals sampled from the same cluster at different times have less correlated outcomes than individuals sampled from the same cluster at the same time. Nuisance parameters could be estimated from time series obtained in similarly clustered settings with the same outcome measure, using analysis of variance to estimate variance components. Copyright © 2016 John Wiley & Sons, Ltd.
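
    For orientation, the basic inflation step can be sketched as below using only the classical design effect DE = 1 + (m - 1)·ICC for clusters of size m; the formulae derived in the paper extend this with the cluster and individual autocorrelations, which are omitted here.

```python
# Minimal sketch of sample size inflation by the classical design effect.
import math

def cluster_trial_n(n_individual, cluster_size, icc):
    """Inflate an individually randomised sample size by DE = 1 + (m - 1) * ICC."""
    design_effect = 1 + (cluster_size - 1) * icc
    return math.ceil(n_individual * design_effect)

print(cluster_trial_n(n_individual=400, cluster_size=20, icc=0.05))
# 400 * (1 + 19 * 0.05) = 780 participants
```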

  10. X-ray versus infrared selection of distant galaxy clusters: A case study using the XMM-LSS and SpARCS cluster samples

    NASA Astrophysics Data System (ADS)

    Willis, J. P.; Ramos-Ceja, M. E.; Muzzin, A.; Pacaud, F.; Yee, H. K. C.; Wilson, G.

    2018-04-01

    We present a comparison of two samples of z > 0.8 galaxy clusters selected using different wavelength-dependent techniques and examine the physical differences between them. We consider 18 clusters from the X-ray selected XMM-LSS distant cluster survey and 92 clusters from the optical-MIR selected SpARCS cluster survey. Both samples are selected from the same approximately 9 square degree sky area and we examine them using common XMM-Newton, Spitzer-SWIRE and CFHT Legacy Survey data. Clusters from each sample are compared employing aperture measures of X-ray and MIR emission. We divide the SpARCS distant cluster sample into three sub-samples: a) X-ray bright, b) X-ray faint, MIR bright, and c) X-ray faint, MIR faint clusters. We determine that X-ray and MIR selected clusters display very similar surface brightness distributions of galaxy MIR light. In addition, the average location and amplitude of the galaxy red sequence as measured from stacked colour histograms is very similar in the X-ray and MIR-selected samples. The sub-sample of X-ray faint, MIR bright clusters displays a distribution of BCG-barycentre position offsets which extends to higher values than all other samples. This observation indicates that such clusters may exist in a more disturbed state compared to the majority of the distant cluster population sampled by XMM-LSS and SpARCS. This conclusion is supported by stacked X-ray images for the X-ray faint, MIR bright cluster sub-sample that display weak, centrally-concentrated X-ray emission, consistent with a population of growing clusters accreting from an extended envelope of material.

  11. Characterizing Heterogeneity within Head and Neck Lesions Using Cluster Analysis of Multi-Parametric MRI Data

    PubMed Central

    Borri, Marco; Schmidt, Maria A.; Powell, Ceri; Koh, Dow-Mu; Riddell, Angela M.; Partridge, Mike; Bhide, Shreerang A.; Nutting, Christopher M.; Harrington, Kevin J.; Newbold, Katie L.; Leach, Martin O.

    2015-01-01

    Purpose: To describe a methodology, based on cluster analysis, to partition multi-parametric functional imaging data into groups (or clusters) of similar functional characteristics, with the aim of characterizing functional heterogeneity within head and neck tumour volumes. To evaluate the performance of the proposed approach on a set of longitudinal MRI data, analysing the evolution of the obtained sub-sets with treatment. Material and Methods: The cluster analysis workflow was applied to a combination of dynamic contrast-enhanced and diffusion-weighted imaging MRI data from a cohort of squamous cell carcinoma of the head and neck patients. Cumulative distributions of voxels, containing pre and post-treatment data and including both primary tumours and lymph nodes, were partitioned into k clusters (k = 2, 3 or 4). Principal component analysis and cluster validation were employed to investigate data composition and to independently determine the optimal number of clusters. The evolution of the resulting sub-regions with induction chemotherapy treatment was assessed relative to the number of clusters. Results: The clustering algorithm was able to separate clusters which significantly reduced in voxel number following induction chemotherapy from clusters with a non-significant reduction. Partitioning with the optimal number of clusters (k = 4), determined with cluster validation, produced the best separation between reducing and non-reducing clusters. Conclusion: The proposed methodology was able to identify tumour sub-regions with distinct functional properties, independently separating clusters which were affected differently by treatment. This work demonstrates that unsupervised cluster analysis, with no prior knowledge of the data, can be employed to provide a multi-parametric characterization of functional heterogeneity within tumour volumes. PMID:26398888

  12. Dark Energy Survey Year 1 Results: galaxy mock catalogues for BAO

    NASA Astrophysics Data System (ADS)

    Avila, S.; Crocce, M.; Ross, A. J.; García-Bellido, J.; Percival, W. J.; Banik, N.; Camacho, H.; Kokron, N.; Chan, K. C.; Andrade-Oliveira, F.; Gomes, R.; Gomes, D.; Lima, M.; Rosenfeld, R.; Salvador, A. I.; Friedrich, O.; Abdalla, F. B.; Annis, J.; Benoit-Lévy, A.; Bertin, E.; Brooks, D.; Carrasco Kind, M.; Carretero, J.; Castander, F. J.; Cunha, C. E.; da Costa, L. N.; Davis, C.; De Vicente, J.; Doel, P.; Fosalba, P.; Frieman, J.; Gerdes, D. W.; Gruen, D.; Gruendl, R. A.; Gutierrez, G.; Hartley, W. G.; Hollowood, D.; Honscheid, K.; James, D. J.; Kuehn, K.; Kuropatkin, N.; Miquel, R.; Plazas, A. A.; Sanchez, E.; Scarpine, V.; Schindler, R.; Schubnell, M.; Sevilla-Noarbe, I.; Smith, M.; Sobreira, F.; Suchyta, E.; Swanson, M. E. C.; Tarle, G.; Thomas, D.; Walker, A. R.; Dark Energy Survey Collaboration

    2018-05-01

    Mock catalogues are a crucial tool in the analysis of galaxy survey data, both for the accurate computation of covariance matrices, and for the optimisation of analysis methodology and validation of data sets. In this paper, we present a set of 1800 galaxy mock catalogues designed to match the Dark Energy Survey Year-1 BAO sample (Crocce et al. 2017) in abundance, observational volume, redshift distribution and uncertainty, and redshift dependent clustering. The simulated samples were built upon HALOGEN (Avila et al. 2015) halo catalogues, based on a 2LPT density field with an empirical halo bias. For each of them, a lightcone is constructed by the superposition of snapshots in the redshift range 0.45 < z < 1.4. Uncertainties introduced by so-called photometric redshift estimators were modelled with a double-skewed-Gaussian curve fitted to the data. We populate halos with galaxies by introducing a hybrid Halo Occupation Distribution - Halo Abundance Matching model with two free parameters. These are adjusted to achieve a galaxy bias evolution b(zph) that matches the data at the 1-σ level in the range 0.6 < zph < 1.0. We further analyse the galaxy mock catalogues and compare their clustering to the data using the angular correlation function w(θ), the comoving transverse separation clustering ξ_{μ<0.8}(s_⊥) and the angular power spectrum Cℓ, finding them in agreement. This is the first large set of three-dimensional {ra, dec, z} galaxy mock catalogues able to simultaneously accurately reproduce the photometric redshift uncertainties and the galaxy clustering.

  13. Dark Energy Survey Year 1 Results: galaxy mock catalogues for BAO

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Avila, S.; et al.

    Mock catalogues are a crucial tool in the analysis of galaxy survey data, both for the accurate computation of covariance matrices, and for the optimisation of analysis methodology and validation of data sets. In this paper, we present a set of 1800 galaxy mock catalogues designed to match the Dark Energy Survey Year-1 BAO sample (Crocce et al. 2017) in abundance, observational volume, redshift distribution and uncertainty, and redshift dependent clustering. The simulated samples were built upon HALOGEN (Avila et al. 2015) halo catalogues, based on a 2LPT density field with an exponential bias. For each of them, a lightcone is constructed by the superposition of snapshots in the redshift range 0.45 < z < 1.4.

  14. X-ray versus infrared selection of distant galaxy clusters: a case study using the XMM-LSS and SpARCS cluster samples

    NASA Astrophysics Data System (ADS)

    Willis, J. P.; Ramos-Ceja, M. E.; Muzzin, A.; Pacaud, F.; Yee, H. K. C.; Wilson, G.

    2018-07-01

    We present a comparison of two samples of z > 0.8 galaxy clusters selected using different wavelength-dependent techniques and examine the physical differences between them. We consider 18 clusters from the X-ray-selected XMM Large Scale Structure (LSS) distant cluster survey and 92 clusters from the optical-mid-infrared (MIR)-selected Spitzer Adaptation of the Red Sequence Cluster survey (SpARCS) cluster survey. Both samples are selected from the same approximately 9 sq deg sky area and we examine them using common XMM-Newton, Spitzer Wide-Area Infrared Extragalactic (SWIRE) survey, and Canada-France-Hawaii Telescope Legacy Survey data. Clusters from each sample are compared employing aperture measures of X-ray and MIR emission. We divide the SpARCS distant cluster sample into three sub-samples: (i) X-ray bright, (ii) X-ray faint, MIR bright, and (iii) X-ray faint, MIR faint clusters. We determine that X-ray- and MIR-selected clusters display very similar surface brightness distributions of galaxy MIR light. In addition, the average location and amplitude of the galaxy red sequence as measured from stacked colour histograms is very similar in the X-ray- and MIR-selected samples. The sub-sample of X-ray faint, MIR bright clusters displays a distribution of brightest cluster galaxy-barycentre position offsets which extends to higher values than all other samples. This observation indicates that such clusters may exist in a more disturbed state compared to the majority of the distant cluster population sampled by XMM-LSS and SpARCS. This conclusion is supported by stacked X-ray images for the X-ray faint, MIR bright cluster sub-sample that display weak, centrally concentrated X-ray emission, consistent with a population of growing clusters accreting from an extended envelope of material.

  15. Development of methodology for identification the nature of the polyphenolic extracts by FTIR associated with multivariate analysis

    NASA Astrophysics Data System (ADS)

    Grasel, Fábio dos Santos; Ferrão, Marco Flôres; Wolf, Carlos Rodolfo

    2016-01-01

    Tannins are polyphenolic compounds of complex structures formed by secondary metabolism in several plants. These polyphenolic compounds have different applications, such as drugs, anti-corrosion agents, flocculants, and tanning agents. This study analyses six different type of polyphenolic extracts by Fourier transform infrared spectroscopy (FTIR) combined with multivariate analysis. Through both principal component analysis (PCA) and hierarchical cluster analysis (HCA), we observed well-defined separation between condensed (quebracho and black wattle) and hydrolysable (valonea, chestnut, myrobalan, and tara) tannins. For hydrolysable tannins, it was also possible to observe the formation of two different subgroups between samples of chestnut and valonea and between samples of tara and myrobalan. Among all samples analysed, the chestnut and valonea showed the greatest similarity, indicating that these extracts contain equivalent chemical compositions and structure and, therefore, similar properties.
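
    The chemometric pipeline, PCA scores followed by hierarchical clustering, can be sketched on synthetic stand-ins for FTIR spectra; a real analysis would use measured absorbance spectra with appropriate preprocessing.

```python
# Hedged sketch: PCA plus hierarchical clustering on synthetic "spectra".
import numpy as np
from sklearn.decomposition import PCA
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(9)
condensed_tannins = rng.normal(0.8, 0.05, size=(6, 400))     # synthetic spectra
hydrolysable_tannins = rng.normal(0.5, 0.05, size=(6, 400))
spectra = np.vstack([condensed_tannins, hydrolysable_tannins])

scores = PCA(n_components=2).fit_transform(spectra)          # PCA scores
groups = fcluster(linkage(scores, method="ward"), t=2, criterion="maxclust")
print(groups)   # the two tannin classes should fall into separate groups
```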

  16. Marginal regression models for clustered count data based on zero-inflated Conway-Maxwell-Poisson distribution with applications.

    PubMed

    Choo-Wosoba, Hyoyoung; Levy, Steven M; Datta, Somnath

    2016-06-01

    Community water fluoridation is an important public health measure to prevent dental caries, but it continues to be somewhat controversial. The Iowa Fluoride Study (IFS) is a longitudinal study on a cohort of Iowa children that began in 1991. The main purposes of this study (http://www.dentistry.uiowa.edu/preventive-fluoride-study) were to quantify fluoride exposures from both dietary and nondietary sources and to associate longitudinal fluoride exposures with dental fluorosis (spots on teeth) and dental caries (cavities). We analyze a subset of the IFS data by a marginal regression model with a zero-inflated version of the Conway-Maxwell-Poisson distribution for count data exhibiting excessive zeros and a wide range of dispersion patterns. In general, we introduce two estimation methods for fitting a ZICMP marginal regression model. Finite sample behaviors of the estimators and the resulting confidence intervals are studied using extensive simulation studies. We apply our methodologies to the dental caries data. Our novel modeling incorporating zero inflation, clustering, and overdispersion sheds some new light on the effect of community water fluoridation and other factors. We also include a second application of our methodology to a genomic (next-generation sequencing) dataset that exhibits underdispersion. © 2015, The International Biometric Society.

  17. Evaluating the statistical methodology of randomized trials on dentin hypersensitivity management.

    PubMed

    Matranga, Domenica; Matera, Federico; Pizzo, Giuseppe

    2017-12-27

    The present study aimed to evaluate the characteristics and quality of the statistical methodology used in clinical studies on dentin hypersensitivity management. An electronic search was performed for data published from 2009 to 2014 by using PubMed, Ovid/MEDLINE, and Cochrane Library databases. The primary search terms were used in combination. Eligibility criteria included randomized clinical trials that evaluated the efficacy of desensitizing agents in terms of reducing dentin hypersensitivity. A total of 40 studies were considered eligible for assessment of the quality of their statistical methodology. The four main concerns identified were i) use of nonparametric tests in the presence of large samples, coupled with lack of information about normality and equality of variances of the response; ii) lack of P-value adjustment for multiple comparisons; iii) failure to account for interactions between treatment and follow-up time; and iv) no information about the number of teeth examined per patient and the consequent lack of a cluster-specific approach in data analysis. Owing to these concerns, statistical methodology was judged as inappropriate in 77.1% of the 35 studies that used parametric methods. Additional studies with appropriate statistical analysis are required for a reliable assessment of the efficacy of desensitizing agents.

  18. On the effect of model parameters on forecast objects

    NASA Astrophysics Data System (ADS)

    Marzban, Caren; Jones, Corinne; Li, Ning; Sandgathe, Scott

    2018-04-01

    Many physics-based numerical models produce a gridded, spatial field of forecasts, e.g., a temperature map. The field for some quantities generally consists of spatially coherent and disconnected objects. Such objects arise in many problems, including precipitation forecasts in atmospheric models, eddy currents in ocean models, and models of forest fires. Certain features of these objects (e.g., location, size, intensity, and shape) are generally of interest. Here, a methodology is developed for assessing the impact of model parameters on the features of forecast objects. The main ingredients of the methodology include the use of (1) Latin hypercube sampling for varying the values of the model parameters, (2) statistical clustering algorithms for identifying objects, (3) multivariate multiple regression for assessing the impact of multiple model parameters on the distribution (across the forecast domain) of object features, and (4) methods for reducing the number of hypothesis tests and controlling the resulting errors. The final output of the methodology is a series of box plots and confidence intervals that visually display the sensitivities. The methodology is demonstrated on precipitation forecasts from a mesoscale numerical weather prediction model.
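
    Ingredient (1), Latin hypercube sampling of the model parameters, can be sketched with SciPy's QMC module; the dimension, bounds and sample size below are invented for illustration.

```python
# Hedged sketch of Latin hypercube sampling over model parameters.
from scipy.stats import qmc

sampler = qmc.LatinHypercube(d=3, seed=4)
unit_samples = sampler.random(n=50)               # 50 points in the unit cube
# Scale to hypothetical parameter ranges (three invented model constants).
lower, upper = [0.1, 1e-4, 0.5], [0.9, 1e-2, 2.0]
params = qmc.scale(unit_samples, lower, upper)
print(params.shape)                               # (50, 3): one row per model run
```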

  19. Impact of Sampling Density on the Extent of HIV Clustering

    PubMed Central

    Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor

    2014-01-01

    Identifying and monitoring HIV clusters could be useful in tracking the leading edge of HIV transmission in epidemics. Currently, greater specificity in the definition of HIV clusters is needed to reduce confusion in the interpretation of HIV clustering results. We address sampling density as one of the key aspects of HIV cluster analysis. The proportion of viral sequences in clusters was estimated at sampling densities from 1.0% to 70%. A set of 1,248 HIV-1C env gp120 V1C5 sequences from a single community in Botswana was utilized in simulation studies. Matching numbers of HIV-1C V1C5 sequences from the LANL HIV Database were used as comparators. HIV clusters were identified by phylogenetic inference under bootstrapped maximum likelihood and pairwise distance cut-offs. Sampling density below 10% was associated with stochastic HIV clustering with broad confidence intervals. HIV clustering increased linearly at sampling density >10%, and was accompanied by narrowing confidence intervals. Patterns of HIV clustering were similar at bootstrap thresholds 0.7 to 1.0, but the extent of HIV clustering decreased with higher bootstrap thresholds. The origin of sampling (local concentrated vs. scattered global) had a substantial impact on HIV clustering at sampling densities ≥10%. Pairwise distances at 10% were estimated as a threshold for cluster analysis of HIV-1 V1C5 sequences. The node bootstrap support distribution provided additional evidence for 10% sampling density as the threshold for HIV cluster analysis. The detectability of HIV clusters is substantially affected by sampling density. A minimal genotyping density of 10% and sampling density of 50–70% are suggested for HIV-1 V1C5 cluster analysis. PMID:25275430

  20. Potential of SNP markers for the characterization of Brazilian cassava germplasm.

    PubMed

    de Oliveira, Eder Jorge; Ferreira, Cláudia Fortes; da Silva Santos, Vanderlei; de Jesus, Onildo Nunes; Oliveira, Gilmara Alvarenga Fachardo; da Silva, Maiane Suzarte

    2014-06-01

    High-throughput markers, such as SNPs, along with different methodologies were used to evaluate the applicability of the Bayesian approach and the multivariate analysis in structuring the genetic diversity in cassavas. The objective of the present work was to evaluate the diversity and genetic structure of the largest cassava germplasm bank in Brazil. Complementary methodological approaches such as discriminant analysis of principal components (DAPC), Bayesian analysis and molecular analysis of variance (AMOVA) were used to understand the structure and diversity of 1,280 accessions genotyped using 402 single nucleotide polymorphism markers. The genetic diversity (0.327) and the average observed heterozygosity (0.322) were high considering the bi-allelic markers. In terms of population, the presence of a complex genetic structure was observed indicating the formation of 30 clusters by DAPC and 34 clusters by Bayesian analysis. Both methodologies presented difficulties and controversies in terms of the allocation of some accessions to specific clusters. However, the clusters suggested by the DAPC analysis seemed to be more consistent for presenting higher probability of allocation of the accessions within the clusters. Prior information related to breeding patterns and geographic origins of the accessions was not sufficient for providing clear differentiation between the clusters according to the AMOVA analysis. In contrast, the F_ST was maximized when considering the clusters suggested by the Bayesian and DAPC analyses. The high frequency of germplasm exchange between producers and the subsequent alteration of the name of the same material may be one of the causes of the low association between genetic diversity and geographic origin. The results of this study may benefit cassava germplasm conservation programs, and contribute to the maximization of genetic gains in breeding programs.

  1. Cluster categorization of urban roads to optimize their noise monitoring.

    PubMed

    Zambon, G; Benocci, R; Brambilla, G

    2016-01-01

    Road traffic in urban areas is recognized to be associated with urban mobility and public health, and it is often the main source of noise pollution. Lately, noise maps have been considered a powerful tool to estimate the population exposure to environmental noise, but they need to be validated by measured noise data. The project Dynamic Acoustic Mapping (DYNAMAP), co-funded in the framework of the LIFE 2013 program, aims to develop a statistically based method to optimize the choice and the number of monitoring sites and to automate the noise mapping update using the data retrieved from a low-cost monitoring network. The first objective should improve on spatial sampling based on the legislative road classification, as that classification reflects mainly the geometrical characteristics of a road rather than its noise emission. The present paper describes the statistical approach of the methodology under development and the results of its preliminary application to a limited sample of roads in the city of Milan. The resulting categorization of roads, based on clustering the 24 hourly L_Aeq,h levels, looks promising for optimizing the spatial sampling of noise monitoring toward a description of the noise pollution of complex urban road networks that is more efficient than one based on the legislative road classification.
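
    The categorization step can be sketched by clustering roads on their 24 hourly L_Aeq,h values; synthetic diurnal profiles stand in for monitored data below, and k-means stands in for whatever clustering algorithm the project ultimately adopts.

```python
# Hedged sketch: cluster roads by synthetic 24-hour noise-level profiles.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(6)
hours = np.arange(24)
day_shape = 5 * np.sin((hours - 5) * np.pi / 12)          # crude diurnal pattern
busy = 65 + day_shape + rng.normal(0, 1, size=(20, 24))   # busy arterial roads
quiet = 50 + 0.5 * day_shape + rng.normal(0, 1, size=(20, 24))  # quiet local roads
profiles = np.vstack([busy, quiet])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(profiles)
print(np.bincount(labels))                                 # two road categories
```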

  2. Observed intra-cluster correlation coefficients in a cluster survey sample of patient encounters in general practice in Australia

    PubMed Central

    Knox, Stephanie A; Chondros, Patty

    2004-01-01

    Background: Cluster sample study designs are cost-effective; however, cluster samples violate the simple random sample assumption of independence of observations. Failure to account for the intra-cluster correlation of observations when sampling through clusters may lead to an under-powered study. Researchers therefore need estimates of intra-cluster correlation for a range of outcomes to calculate sample size. We report intra-cluster correlation coefficients observed within a large-scale cross-sectional study of general practice in Australia, where the general practitioner (GP) was the primary sampling unit and the patient encounter was the unit of inference. Methods: Each year the Bettering the Evaluation and Care of Health (BEACH) study recruits a random sample of approximately 1,000 GPs across Australia. Each GP completes details of 100 consecutive patient encounters. Intra-cluster correlation coefficients were estimated for patient demographics, morbidity managed and treatments received. Intra-cluster correlation coefficients were estimated for descriptive outcomes and for associations between outcomes and predictors and were compared across two independent samples of GPs drawn three years apart. Results: Between April 1999 and March 2000, a random sample of 1,047 Australian general practitioners recorded details of 104,700 patient encounters. Intra-cluster correlation coefficients for patient demographics ranged from 0.055 for patient sex to 0.451 for language spoken at home. Intra-cluster correlations for morbidity variables ranged from 0.005 for the management of eye problems to 0.059 for management of psychological problems. Intra-cluster correlation for the association between two variables was smaller than the descriptive intra-cluster correlation of each variable. When compared with the April 2002 to March 2003 sample (1,008 GPs) the estimated intra-cluster correlation coefficients were found to be consistent across samples. Conclusions: The demonstrated precision and reliability of the estimated intra-cluster correlations indicate that these coefficients will be useful for calculating sample sizes in future general practice surveys that use the GP as the primary sampling unit. PMID:15613248
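
    For this kind of two-level data the one-way ANOVA (variance-components) estimator of the intra-cluster correlation is compact; the sketch below simulates balanced data with a known ICC of 0.25/1.25 = 0.2 and assumes equal cluster sizes for simplicity.

```python
# One-way ANOVA moment estimator of the intra-cluster correlation.
import numpy as np

def anova_icc(y):
    """y: 2-D array, rows = clusters (e.g. GPs), columns = patient encounters."""
    k, m = y.shape
    msb = m * np.var(y.mean(axis=1), ddof=1)      # between-cluster mean square
    msw = np.mean(np.var(y, axis=1, ddof=1))      # within-cluster mean square
    return (msb - msw) / (msb + (m - 1) * msw)

rng = np.random.default_rng(5)
gp_effect = rng.normal(0, 0.5, size=(100, 1))     # cluster-level variation
y = gp_effect + rng.normal(0, 1.0, size=(100, 50))
print(round(anova_icc(y), 3))                     # should be near 0.2
```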

  3. The smart cluster method. Adaptive earthquake cluster identification and analysis in strong seismic regions

    NASA Astrophysics Data System (ADS)

    Schaefer, Andreas M.; Daniell, James E.; Wenzel, Friedemann

    2017-07-01

    Earthquake clustering is an essential part of almost any statistical analysis of spatial and temporal properties of seismic activity. The nature of earthquake clusters and subsequent declustering of earthquake catalogues plays a crucial role in determining the magnitude-dependent earthquake return period and its respective spatial variation for probabilistic seismic hazard assessment. This study introduces the Smart Cluster Method (SCM), a new methodology to identify earthquake clusters, which uses an adaptive point process for spatio-temporal cluster identification. It utilises the magnitude-dependent spatio-temporal earthquake density to adjust the search properties, subsequently analyses the identified clusters to determine directional variation and adjusts its search space with respect to directional properties. In the case of rapid subsequent ruptures like the 1992 Landers sequence or the 2010-2011 Darfield-Christchurch sequence, a reclassification procedure is applied to disassemble subsequent ruptures using near-field searches, nearest neighbour classification and temporal splitting. The method is capable of identifying and classifying earthquake clusters in space and time. It has been tested and validated using earthquake data from California and New Zealand. A total of more than 1500 clusters have been found in both regions since 1980 with M_min = 2.0. Utilising the knowledge of cluster classification, the method has been adjusted to provide an earthquake declustering algorithm, which has been compared to existing methods. Its performance is comparable to established methodologies. The analysis of earthquake clustering statistics led to various new and updated correlation functions, e.g. for ratios between mainshock and strongest aftershock and general aftershock activity metrics.

  4. The clustering of galaxies in the completed SDSS-III Baryon Oscillation Spectroscopic Survey: towards a computationally efficient analysis without informative priors

    NASA Astrophysics Data System (ADS)

    Pellejero-Ibanez, Marcos; Chuang, Chia-Hsun; Rubiño-Martín, J. A.; Cuesta, Antonio J.; Wang, Yuting; Zhao, Gongbo; Ross, Ashley J.; Rodríguez-Torres, Sergio; Prada, Francisco; Slosar, Anže; Vazquez, Jose A.; Alam, Shadab; Beutler, Florian; Eisenstein, Daniel J.; Gil-Marín, Héctor; Grieb, Jan Niklas; Ho, Shirley; Kitaura, Francisco-Shu; Percival, Will J.; Rossi, Graziano; Salazar-Albornoz, Salvador; Samushia, Lado; Sánchez, Ariel G.; Satpathy, Siddharth; Seo, Hee-Jong; Tinker, Jeremy L.; Tojeiro, Rita; Vargas-Magaña, Mariana; Brownstein, Joel R.; Nichol, Robert C.; Olmstead, Matthew D.

    2017-07-01

    We develop a new computationally efficient methodology called double-probe analysis with the aim of minimizing informative priors (those coming from extra probes) in the estimation of cosmological parameters. Using our new methodology, we extract the dark energy model-independent cosmological constraints from the joint data sets of the Baryon Oscillation Spectroscopic Survey (BOSS) galaxy sample and Planck cosmic microwave background (CMB) measurements. We measure the mean values and covariance matrix of {R, l_a, Ω_b h², n_s, log(A_s), Ω_k, H(z), D_A(z), f(z)σ_8(z)}, which give an efficient summary of the Planck data and two-point statistics from the BOSS galaxy sample. The CMB shift parameters are R = √(Ω_m H_0²) r(z_*) and l_a = π r(z_*)/r_s(z_*), where z_* is the redshift at the last scattering surface, and r(z_*) and r_s(z_*) denote our comoving distance to z_* and the sound horizon at z_*, respectively; Ω_b is the baryon fraction at z = 0. This approximate methodology guarantees that we will not need to put informative priors on the cosmological parameters that galaxy clustering is unable to constrain, i.e. Ω_b h² and n_s. The main advantage is that the computational time required for extracting these parameters is decreased by a factor of 60 with respect to exact full-likelihood analyses. The results obtained show no tension with the flat Λ cold dark matter (ΛCDM) cosmological paradigm. By comparing with the full-likelihood exact analysis with fixed dark energy models, on one hand we demonstrate that the double-probe method provides robust cosmological parameter constraints that can be conveniently used to study dark energy models, and on the other hand we provide a reliable set of measurements assuming dark energy models to be used, for example, in distance estimations. We extend our study to measure the sum of the neutrino mass using different methodologies, including double-probe analysis (introduced in this study), full-likelihood analysis and single-probe analysis. From full-likelihood analysis, we obtain Σm_ν < 0.12 (68 per cent), assuming ΛCDM and Σm_ν < 0.20 (68 per cent) assuming owCDM. We also find that there is degeneracy between observational systematics and neutrino masses, which suggests that one should take great care when estimating these parameters in the case of not having control over the systematics of a given sample.

  5. Post-decomposition optimizations using pattern matching and rule-based clustering for multi-patterning technology

    NASA Astrophysics Data System (ADS)

    Wang, Lynn T.-N.; Madhavan, Sriram

    2018-03-01

    A pattern matching and rule-based polygon clustering methodology with DFM scoring is proposed to detect decomposition-induced manufacturability detractors and fix the layout designs prior to manufacturing. A pattern matcher scans the layout for pre-characterized patterns from a library. If a pattern is detected, rule-based clustering identifies the neighboring polygons that interact with those captured by the pattern. Then, DFM scores are computed for the possible layout fixes: the fix with the best score is applied. The proposed methodology was applied to two 20 nm products with a chip area of 11 mm² on the metal 2 layer. All the hotspots were resolved. The number of DFM spacing violations decreased by 7-15%.

  6. Prediction of Solvent Physical Properties using the Hierarchical Clustering Method

    EPA Science Inventory

    Recently a QSAR (Quantitative Structure Activity Relationship) method, the hierarchical clustering method, was developed to estimate acute toxicity values for large, diverse datasets. This methodology has now been applied to estimate solvent physical properties including sur...

  7. A self-learning algorithm for biased molecular dynamics

    PubMed Central

    Tribello, Gareth A.; Ceriotti, Michele; Parrinello, Michele

    2010-01-01

    A new self-learning algorithm for accelerated dynamics, reconnaissance metadynamics, is proposed that is able to work with a very large number of collective coordinates. Acceleration of the dynamics is achieved by constructing a bias potential in terms of a patchwork of one-dimensional, locally valid collective coordinates. These collective coordinates are obtained from trajectory analyses so that they adapt to any new features encountered during the simulation. We show how this methodology can be used to enhance sampling in real chemical systems citing examples both from the physics of clusters and from the biological sciences. PMID:20876135

  8. Text Mining of Journal Articles for Sleep Disorder Terminologies.

    PubMed

    Lam, Calvin; Lai, Fu-Chih; Wang, Chia-Hui; Lai, Mei-Hsin; Hsu, Nanly; Chung, Min-Huey

    2016-01-01

    Research on publication trends in journal articles on sleep disorders (SDs) and the associated methodologies by using text mining has been limited. The present study involved text mining for terms to determine the publication trends in sleep-related journal articles published during 2000-2013 and to identify associations between SD and methodology terms as well as conducting statistical analyses of the text mining findings. SD and methodology terms were extracted from 3,720 sleep-related journal articles in the PubMed database by using MetaMap. The extracted data set was analyzed using hierarchical cluster analyses and adjusted logistic regression models to investigate publication trends and associations between SD and methodology terms. MetaMap had a text mining precision, recall, and false positive rate of 0.70, 0.77, and 11.51%, respectively. The most common SD term was breathing-related sleep disorder, whereas narcolepsy was the least common. Cluster analyses showed similar methodology clusters for each SD term, except narcolepsy. The logistic regression models showed an increasing prevalence of insomnia, parasomnia, and other sleep disorders but a decreasing prevalence of breathing-related sleep disorder during 2000-2013. Different SD terms were positively associated with different methodology terms regarding research design terms, measure terms, and analysis terms. Insomnia-, parasomnia-, and other sleep disorder-related articles showed an increasing publication trend, whereas those related to breathing-related sleep disorder showed a decreasing trend. Furthermore, experimental studies more commonly focused on hypersomnia and other SDs and less commonly on insomnia, breathing-related sleep disorder, narcolepsy, and parasomnia. Thus, text mining may facilitate the exploration of the publication trends in SDs and the associated methodologies.

  9. Fluorescent probes for tracking the transfer of iron–sulfur cluster and other metal cofactors in biosynthetic reaction pathways

    DOE PAGES

    Vranish, James N.; Russell, William K.; Yu, Lusa E.; ...

    2014-12-05

    Iron–sulfur (Fe–S) clusters are protein cofactors that are constructed and delivered to target proteins by elaborate biosynthetic machinery. Mechanistic insights into these processes have been limited by the lack of sensitive probes for tracking Fe–S cluster synthesis and transfer reactions. Here we present fusion protein- and intein-based fluorescent labeling strategies that can probe Fe–S cluster binding. The fluorescence is sensitive to different cluster types ([2Fe–2S] and [4Fe–4S] clusters), ligand environments ([2Fe–2S] clusters on Rieske, ferredoxin (Fdx), and glutaredoxin), and cluster oxidation states. The power of this approach is highlighted with an extreme example in which the kinetics of Fe–S cluster transfer reactions are monitored between two Fdx molecules that have identical Fe–S spectroscopic properties. This exchange reaction between labeled and unlabeled Fdx is catalyzed by dithiothreitol (DTT), a result that was confirmed by mass spectrometry. DTT likely functions in a ligand substitution reaction that generates a [2Fe–2S]–DTT species, which can transfer the cluster to either labeled or unlabeled Fdx. The ability to monitor this challenging cluster exchange reaction indicates that real-time Fe–S cluster incorporation can be tracked for a specific labeled protein in multicomponent assays that include several unlabeled Fe–S binding proteins or other chromophores. Such advanced kinetic experiments are required to untangle the intricate networks of transfer pathways and the factors affecting flux through branch points. High sensitivity and suitability with high-throughput methodology are additional benefits of this approach. Lastly, we anticipate that this cluster detection methodology will transform the study of Fe–S cluster pathways and potentially other metal cofactor biosynthetic pathways.

  10. Bayesian network meta-analysis for cluster randomized trials with binary outcomes.

    PubMed

    Uhlmann, Lorenz; Jensen, Katrin; Kieser, Meinhard

    2017-06-01

    Network meta-analysis is becoming a common approach to combine direct and indirect comparisons of several treatment arms. In recent research, there have been various developments and extensions of the standard methodology. Simultaneously, cluster randomized trials are experiencing an increased popularity, especially in the field of health services research, where, for example, medical practices are the units of randomization but the outcome is measured at the patient level. Combination of the results of cluster randomized trials is challenging. In this tutorial, we examine and compare different approaches for the incorporation of cluster randomized trials in a (network) meta-analysis. Furthermore, we provide practical insight on the implementation of the models. In simulation studies, it is shown that some of the examined approaches lead to unsatisfying results. However, there are alternatives which are suitable to combine cluster randomized trials in a network meta-analysis as they are unbiased and reach accurate coverage rates. In conclusion, the methodology can be extended in such a way that an adequate inclusion of the results obtained in cluster randomized trials becomes feasible. Copyright © 2016 John Wiley & Sons, Ltd.

  11. Whole-Volume Clustering of Time Series Data from Zebrafish Brain Calcium Images via Mixture Modeling.

    PubMed

    Nguyen, Hien D; Ullmann, Jeremy F P; McLachlan, Geoffrey J; Voleti, Venkatakaushik; Li, Wenze; Hillman, Elizabeth M C; Reutens, David C; Janke, Andrew L

    2018-02-01

    Calcium is a ubiquitous messenger in neural signaling events. An increasing number of techniques are enabling visualization of neurological activity in animal models via luminescent proteins that bind to calcium ions. These techniques generate large volumes of spatially correlated time series. A model-based functional data analysis methodology via Gaussian mixtures is proposed for the clustering of data from such visualizations. The methodology is theoretically justified and a computationally efficient approach to estimation is suggested. An example analysis of a zebrafish imaging experiment is presented.
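
    A hedged sketch of model-based clustering of voxel time series with a Gaussian mixture, in the spirit of the methodology above. The paper's own mixture formulation and estimation details are not reproduced; the data, the summary features, and the number of components are placeholders.

    ```python
    # Cluster placeholder "calcium traces" via a Gaussian mixture on features.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(1)
    n_voxels, n_timepoints = 500, 100
    series = rng.normal(size=(n_voxels, n_timepoints))   # placeholder traces

    # Reduce each series to a feature vector (simple summary statistics here);
    # the actual work clusters functional representations of the full series.
    features = np.column_stack([series.mean(axis=1),
                                series.std(axis=1),
                                np.abs(np.diff(series, axis=1)).mean(axis=1)])

    gmm = GaussianMixture(n_components=4, covariance_type="full", random_state=0)
    labels = gmm.fit_predict(features)    # cluster assignment per voxel
    ```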

  12. Relative efficiency and sample size for cluster randomized trials with variable cluster sizes.

    PubMed

    You, Zhiying; Williams, O Dale; Aban, Inmaculada; Kabagambe, Edmond Kato; Tiwari, Hemant K; Cutter, Gary

    2011-02-01

    The statistical power of cluster randomized trials depends on two sample size components, the number of clusters per group and the numbers of individuals within clusters (cluster size). Variable cluster sizes are common and this variation alone may have significant impact on study power. Previous approaches have taken this into account by either adjusting total sample size using a designated design effect or adjusting the number of clusters according to an assessment of the relative efficiency of unequal versus equal cluster sizes. This article defines a relative efficiency of unequal versus equal cluster sizes using noncentrality parameters, investigates properties of this measure, and proposes an approach for adjusting the required sample size accordingly. We focus on comparing two groups with normally distributed outcomes using the t-test, and use the noncentrality parameter to define the relative efficiency of unequal versus equal cluster sizes and show that statistical power depends only on this parameter for a given number of clusters. We calculate the sample size required for an unequal cluster sizes trial to have the same power as one with equal cluster sizes. Relative efficiency based on the noncentrality parameter is straightforward to calculate and easy to interpret. It connects the required mean cluster size directly to the required sample size with equal cluster sizes. Consequently, our approach first determines the sample size requirements with equal cluster sizes for a pre-specified study power and then calculates the required mean cluster size while keeping the number of clusters unchanged. Our approach allows adjustment in mean cluster size alone or simultaneous adjustment in mean cluster size and number of clusters, and is a flexible alternative to and a useful complement to existing methods. Comparison indicated that the relative efficiency defined here is greater than the measure in the literature under some conditions; in such cases, the literature measure underestimates the relative efficiency. The relative efficiency of unequal versus equal cluster sizes defined using the noncentrality parameter thus suggests a sample size approach that flexibly complements existing methods.
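
    A minimal sketch of the two-step logic described above, assuming the widely cited coefficient-of-variation approximation to the design effect for unequal cluster sizes, DEFF = 1 + ((cv² + 1)·m̄ − 1)·ρ. The paper's own noncentrality-based relative efficiency is not reproduced here, and all numbers are illustrative.

    ```python
    # Per-arm sample size with equal clusters, then inflate for size variation.
    import math
    from scipy.stats import norm

    def n_individual(delta, sd, alpha=0.05, power=0.8):
        """Per-arm n for a two-sample comparison under individual randomization."""
        z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
        return 2 * (z * sd / delta) ** 2

    def deff_unequal(m_bar, icc, cv):
        """Design effect allowing for cluster-size variation with CV `cv`."""
        return 1 + ((cv ** 2 + 1) * m_bar - 1) * icc

    n_ind = n_individual(delta=0.3, sd=1.0)              # ~175 per arm
    m_bar, icc, cv = 20, 0.05, 0.6
    n_clustered = n_ind * deff_unequal(m_bar, icc, cv)   # inflated per-arm n
    n_clusters = math.ceil(n_clustered / m_bar)          # clusters per arm
    print(round(n_ind), round(n_clustered), n_clusters)  # 175 403 21
    ```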

  13. Application of Classification Methods for Forecasting Mid-Term Power Load Patterns

    NASA Astrophysics Data System (ADS)

    Piao, Minghao; Lee, Heon Gyu; Park, Jin Hyoung; Ryu, Keun Ho

    An automated methodology based on data mining techniques is presented for the prediction of customer load patterns in long duration load profiles. The proposed approach consists of three stages: (i) data preprocessing: noise and outliers are removed and the continuous attribute-valued features are transformed to discrete values; (ii) cluster analysis: k-means clustering is used to create load pattern classes and the representative load profiles for each class; and (iii) classification: we evaluated several supervised learning methods in order to select a suitable prediction method. According to the proposed methodology, power load measured from an AMR (automatic meter reading) system, as well as customer indexes, were used as inputs for clustering. The output of clustering was the classification of representative load profiles (or classes). In order to evaluate the forecasting of load patterns, several classification methods were applied to a set of high voltage customers of the Korea power system, with class labels derived from clustering and other features used as inputs to produce classifiers. Lastly, the results of our experiments are presented.
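
    A compact sketch of the three-stage pipeline described above: cluster load profiles with k-means to form pattern classes, then train a supervised classifier to predict the class from customer features. Data, feature meanings, and hyperparameters are placeholders, not the paper's settings.

    ```python
    # Stage (ii) clustering and stage (iii) classification on placeholder data.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    profiles = rng.random((300, 24))          # 24-hour load profiles (placeholder)
    customer_features = rng.random((300, 5))  # e.g. contract type, usage indexes

    km = KMeans(n_clusters=6, n_init=10, random_state=0).fit(profiles)
    classes = km.labels_                      # representative load pattern classes

    clf = DecisionTreeClassifier(max_depth=4).fit(customer_features, classes)
    predicted = clf.predict(customer_features)   # forecast pattern class per customer
    ```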

  14. Breaking the bottleneck: Use of molecular tailoring approach for the estimation of binding energies at MP2/CBS limit for large water clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Singh, Gurmeet; Nandi, Apurba; Gadre, Shridhar R., E-mail: gadre@iitk.ac.in

    2016-03-14

    A pragmatic method based on the molecular tailoring approach (MTA) for estimating the complete basis set (CBS) limit at Møller-Plesset second order perturbation (MP2) theory accurately for large molecular clusters with limited computational resources is developed. It is applied to water clusters, (H₂O)ₙ (n = 7, 8, 10, 16, 17, and 25), optimized employing the aug-cc-pVDZ (aVDZ) basis set. Binding energies (BEs) of these clusters are estimated at the MP2/aug-cc-pVNZ (aVNZ) [N = T, Q, and 5 (whenever possible)] levels of theory employing the grafted MTA (GMTA) methodology and are found to lie within 0.2 kcal/mol of the corresponding full calculation MP2 BE, wherever available. The results are extrapolated to the CBS limit using a three point formula. The GMTA-MP2 calculations are feasible on off-the-shelf hardware and show around 50%–65% saving of computational time. The methodology has a potential for application to molecular clusters containing ∼100 atoms.
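
    A sketch of one common three-point extrapolation to the CBS limit, assuming a geometric (exponential) decay of the basis-set error, E(N) = E_CBS + A·r^N. This is a generic scheme for illustration; the specific three-point formula used by the authors may differ, and the energies below are hypothetical.

    ```python
    # Three-point CBS extrapolation under a geometric-decay assumption.
    def cbs_three_point(e3, e4, e5):
        """Extrapolate energies at N = T(3), Q(4), 5 to the CBS limit."""
        d1, d2 = e3 - e4, e4 - e5
        r = d2 / d1                      # common ratio of successive corrections
        return e5 - d2 * r / (1 - r)     # equivalent to Aitken's delta-squared

    # Hypothetical binding energies (kcal/mol) at aVTZ, aVQZ, aV5Z:
    print(cbs_three_point(-36.2, -36.9, -37.1))   # ≈ -37.18
    ```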

  15. Distinct Gene Expression Patterns between Nasal Mucosal Cells and Blood Collected from Allergic Rhinitis Sufferers.

    PubMed

    Watts, Annabelle M; West, Nicholas P; Cripps, Allan W; Smith, Pete K; Cox, Amanda J

    2018-06-19

    Investigations of gene expression in allergic rhinitis (AR) typically rely on invasive nasal biopsies (site of inflammation) or blood samples (systemic immunity) to obtain sufficient genetic material for analysis. New methodologies to circumvent the need for invasive sample collection offer promise to further the understanding of local immune mechanisms relevant in AR. A within-subject design was employed to compare immune gene expression profiles obtained from nasal washing/brushing and whole blood samples collected during peak pollen season. Twelve adults (age: 46.3 ± 12.3 years) with more than a 2-year history of AR and a confirmed grass pollen allergy participated in the study. Gene expression analysis was performed using a panel of 760 immune genes with the NanoString nCounter platform on nasal lavage/brushing cell lysates and compared to RNA extracted from blood. A total of 355 genes were significantly differentially expressed between sample types (9.87 to -9.71 log2 fold change). The top 3 genes significantly upregulated in nasal lysate samples were Mucin 1 (MUC1), Tight Junction Protein 1 (TJP1), and Lipocalin-2 (LCN2). The top 3 genes significantly upregulated in blood samples were cluster of differentiation 3e (CD3E), FYN Proto-Oncogene Src Family Tyrosine Kinase (FYN) and cluster of differentiation 3d (CD3D). Overall, the blood and nasal lavage samples showed vastly distinct gene expression profiles and functional gene pathways which reflect their anatomical and functional origins. Evaluating immune gene expression of the nasal mucosa in addition to blood samples may be beneficial in understanding AR pathophysiology and response to allergen challenge. © 2018 S. Karger AG, Basel.

  16. A Clustering Methodology of Web Log Data for Learning Management Systems

    ERIC Educational Resources Information Center

    Valsamidis, Stavros; Kontogiannis, Sotirios; Kazanidis, Ioannis; Theodosiou, Theodosios; Karakos, Alexandros

    2012-01-01

    Learning Management Systems (LMS) collect large amounts of data. Data mining techniques can be applied to analyse their web data log files. The instructors may use this data for assessing and measuring their courses. In this respect, we have proposed a methodology for analysing LMS courses and students' activity. This methodology uses a Markov…

  17. Development of methodology for identification the nature of the polyphenolic extracts by FTIR associated with multivariate analysis.

    PubMed

    Grasel, Fábio dos Santos; Ferrão, Marco Flôres; Wolf, Carlos Rodolfo

    2016-01-15

    Tannins are polyphenolic compounds of complex structures formed by secondary metabolism in several plants. These polyphenolic compounds have different applications, such as drugs, anti-corrosion agents, flocculants, and tanning agents. This study analyses six different types of polyphenolic extracts by Fourier transform infrared spectroscopy (FTIR) combined with multivariate analysis. Through both principal component analysis (PCA) and hierarchical cluster analysis (HCA), we observed well-defined separation between condensed (quebracho and black wattle) and hydrolysable (valonea, chestnut, myrobalan, and tara) tannins. For hydrolysable tannins, it was also possible to observe the formation of two different subgroups between samples of chestnut and valonea and between samples of tara and myrobalan. Among all samples analysed, the chestnut and valonea showed the greatest similarity, indicating that these extracts contain equivalent chemical compositions and structure and, therefore, similar properties. Copyright © 2015 Elsevier B.V. All rights reserved.
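
    A hedged sketch of the chemometric workflow above: PCA scores from FTIR spectra, then hierarchical clustering on those scores. The spectra here are random placeholders for the tannin extract measurements, and the cut into two groups stands in for the condensed/hydrolysable separation.

    ```python
    # PCA + HCA on placeholder FTIR spectra.
    import numpy as np
    from sklearn.decomposition import PCA
    from scipy.cluster.hierarchy import linkage, fcluster

    rng = np.random.default_rng(2)
    spectra = rng.random((24, 1800))            # absorbance at 1800 wavenumbers

    scores = PCA(n_components=3).fit_transform(spectra)
    Z = linkage(scores, method="ward")          # HCA on the PCA scores
    groups = fcluster(Z, t=2, criterion="maxclust")   # two top-level groups
    ```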

  18. Use of Machine Learning Algorithms to Propose a New Methodology to Conduct, Critique and Validate Urban Scale Building Energy Modeling

    NASA Astrophysics Data System (ADS)

    Pathak, Maharshi

    City administrators and real-estate developers have been setting rather aggressive energy efficiency targets. This, in turn, has led building science research groups across the globe to focus on urban scale building performance studies and the level of abstraction associated with the simulations of the same. The increasing maturity of stakeholders towards energy efficiency and creating comfortable working environments has led researchers to develop methodologies and tools for addressing policy driven interventions, whether urban level energy systems, buildings' operational optimization or retrofit guidelines. Typically, these large-scale simulations are carried out by grouping buildings based on their design similarities, i.e. standardization of the buildings. Such an approach does not necessarily lead to potential working inputs which can make decision-making effective. To address this, a novel approach is proposed in the present study. The principal objective of this study is to propose, define and evaluate a methodology that utilizes machine learning algorithms to define representative building archetypes for Stock-level Building Energy Modeling (SBEM) based on an operational parameter database. The study uses "Phoenix-climate" based CBECS-2012 survey microdata for analysis and validation. Using the database, parameter correlations are studied to understand the relation between input parameters and energy performance. Contrary to precedent, the study establishes that energy performance is better explained by non-linear models. The non-linear behavior is captured by advanced learning algorithms. Based on these algorithms, the buildings under study are grouped into meaningful clusters. The cluster medoids (the member buildings that best represent the statistical center of each cluster) are established to identify the level of abstraction that is acceptable for whole building energy simulations and, subsequently, for retrofit decision-making. Further, the methodology is validated by conducting Monte-Carlo simulations on 13 key input simulation parameters. The sensitivity analysis of these 13 parameters is utilized to identify the optimum retrofits. From the sample analysis, the envelope parameters are found to be the most sensitive with respect to the EUI of the building, and thus retrofit packages should be directed to maximize the energy usage reduction.
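
    A sketch of medoid-based archetype selection as described above: each cluster is represented by its medoid, the member building minimizing total dissimilarity to the rest of the cluster. Clustering itself is done with k-means for brevity; the study's actual algorithm and its CBECS-derived features are not reproduced.

    ```python
    # Pick the medoid building of each k-means cluster as the archetype.
    import numpy as np
    from sklearn.cluster import KMeans
    from scipy.spatial.distance import cdist

    rng = np.random.default_rng(3)
    buildings = rng.random((200, 8))     # operational parameters per building

    labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(buildings)

    medoids = []
    for k in range(5):
        members = buildings[labels == k]
        d = cdist(members, members)                      # pairwise dissimilarities
        medoids.append(members[d.sum(axis=1).argmin()])  # archetype building
    ```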

  19. Estimation of Carcinogenicity using Hierarchical Clustering and Nearest Neighbor Methodologies

    EPA Science Inventory

    Previously a hierarchical clustering (HC) approach and a nearest neighbor (NN) approach were developed to model acute aquatic toxicity end points. These approaches were developed to correlate the toxicity for large, noncongeneric data sets. In this study these approaches applie...

  20. Retinal ganglion cells in the eastern newt Notophthalmus viridescens: topography, morphology, and diversity.

    PubMed

    Pushchin, Igor I; Karetin, Yuriy A

    2009-10-20

    The topography and morphology of retinal ganglion cells (RGCs) in the eastern newt were studied. Cells were retrogradely labeled with tetramethylrhodamine-conjugated dextran amines or horseradish peroxidase and examined in retinal wholemounts. Their total number was 18,025 ± 3,602 (mean ± SEM). The spatial density of RGCs varied from 2,100 cells/mm² in the retinal periphery to 4,500 cells/mm² in the dorsotemporal retina. No prominent retinal specializations were found. The spatial resolution estimated from the spatial density of RGCs varied from 1.4 cycles per degree in the periphery to 1.95 cycles per degree in the region of the peak RGC density. A sample of 68 cells was camera lucida drawn and subjected to quantitative analysis. A total of 21 parameters related to RGC morphology and stratification in the retina were estimated. Partitionings obtained by using different clustering algorithms combined with automatic variable weighting and dimensionality reduction techniques were compared, and an effective solution was found by using silhouette analysis. A total of seven clusters were identified and associated with potential cell types. Kruskal-Wallis ANOVA-on-Ranks with post hoc Mann-Whitney U tests showed significant pairwise between-cluster differences in one or more of the clustering variables. The average silhouette values of the clusters were reasonably high, ranging from 0.52 to 0.79. Cells assigned to the same cluster displayed similar morphology and stratification in the retina. The advantages and limitations of the methodology adopted are discussed. The present classification is compared with known morphological and physiological RGC classifications in other salamanders.
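
    A sketch of the silhouette-based selection of a clustering solution used above: compare partitions with different numbers of clusters and keep the one with the highest mean silhouette. The morphometric features are random placeholders, and k-means stands in for the several algorithms the study compared.

    ```python
    # Choose the number of clusters by maximizing the mean silhouette.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_score

    rng = np.random.default_rng(4)
    cells = rng.random((68, 21))   # 68 cells x 21 morphology/stratification variables

    best = max(range(2, 10),
               key=lambda k: silhouette_score(
                   cells,
                   KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(cells)))
    print("clusters:", best)
    ```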

  1. Tumour-associated and non-tumour-associated microbiota in colorectal cancer

    PubMed Central

    Flemer, Burkhardt; Lynch, Denise B; Brown, Jillian M R; Jeffery, Ian B; Ryan, Feargal J; Claesson, Marcus J; O'Riordain, Micheal; Shanahan, Fergus; O'Toole, Paul W

    2017-01-01

    Objective A signature that unifies the colorectal cancer (CRC) microbiota across multiple studies has not been identified. In addition to methodological variance, heterogeneity may be caused by both microbial and host response differences, which was addressed in this study. Design We prospectively studied the colonic microbiota and the expression of specific host response genes using faecal and mucosal samples (‘ON’ and ‘OFF’ the tumour, proximal and distal) from 59 patients undergoing surgery for CRC, 21 individuals with polyps and 56 healthy controls. Microbiota composition was determined by 16S rRNA amplicon sequencing; expression of host genes involved in CRC progression and immune response was quantified by real-time quantitative PCR. Results The microbiota of patients with CRC differed from that of controls, but alterations were not restricted to the cancerous tissue. Differences between distal and proximal cancers were detected and faecal microbiota only partially reflected mucosal microbiota in CRC. Patients with CRC can be stratified based on higher level structures of mucosal-associated bacterial co-abundance groups (CAGs) that resemble the previously formulated concept of enterotypes. Of these, Bacteroidetes Cluster 1 and Firmicutes Cluster 1 were in decreased abundance in CRC mucosa, whereas Bacteroidetes Cluster 2, Firmicutes Cluster 2, Pathogen Cluster and Prevotella Cluster showed increased abundance in CRC mucosa. CRC-associated CAGs were differentially correlated with the expression of host immunoinflammatory response genes. Conclusions CRC-associated microbiota profiles differ from those in healthy subjects and are linked with distinct mucosal gene-expression profiles. Compositional alterations in the microbiota are not restricted to cancerous tissue and differ between distal and proximal cancers. PMID:26992426

  2. WGCNA: an R package for weighted correlation network analysis.

    PubMed

    Langfelder, Peter; Horvath, Steve

    2008-12-29

    Correlation networks are increasingly being used in bioinformatics applications. For example, weighted gene co-expression network analysis is a systems biology method for describing the correlation patterns among genes across microarray samples. Weighted correlation network analysis (WGCNA) can be used for finding clusters (modules) of highly correlated genes, for summarizing such clusters using the module eigengene or an intramodular hub gene, for relating modules to one another and to external sample traits (using eigengene network methodology), and for calculating module membership measures. Correlation networks facilitate network based gene screening methods that can be used to identify candidate biomarkers or therapeutic targets. These methods have been successfully applied in various biological contexts, e.g. cancer, mouse genetics, yeast genetics, and analysis of brain imaging data. While parts of the correlation network methodology have been described in separate publications, there is a need to provide a user-friendly, comprehensive, and consistent software implementation and an accompanying tutorial. The WGCNA R software package is a comprehensive collection of R functions for performing various aspects of weighted correlation network analysis. The package includes functions for network construction, module detection, gene selection, calculations of topological properties, data simulation, visualization, and interfacing with external software. Along with the R package we also present R software tutorials. While the methods development was motivated by gene expression data, the underlying data mining approach can be applied to a variety of different settings. The WGCNA package provides R functions for weighted correlation network analysis, e.g. co-expression network analysis of gene expression data. The R package along with its source code and additional material are freely available at http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/Rpackages/WGCNA.
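
    Since WGCNA itself is an R package, the following is only a minimal Python sketch, assuming placeholder data, of two of its core quantities: an unsigned soft-threshold adjacency built from gene-gene correlations, and a module eigengene computed as the first principal component of a module's standardized expression.

    ```python
    # Core WGCNA-style quantities on placeholder expression data.
    import numpy as np

    rng = np.random.default_rng(5)
    expr = rng.normal(size=(40, 1000))        # 40 samples x 1000 genes

    cor = np.corrcoef(expr.T)                 # gene-gene correlation matrix
    adjacency = np.abs(cor) ** 6              # unsigned soft thresholding, beta = 6

    module = expr[:, :50]                     # genes of one (placeholder) module
    module = (module - module.mean(0)) / module.std(0)
    u, s, vt = np.linalg.svd(module, full_matrices=False)
    eigengene = u[:, 0] * s[0]                # module eigengene across samples
    ```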

  3. WGCNA: an R package for weighted correlation network analysis

    PubMed Central

    Langfelder, Peter; Horvath, Steve

    2008-01-01

    Background Correlation networks are increasingly being used in bioinformatics applications. For example, weighted gene co-expression network analysis is a systems biology method for describing the correlation patterns among genes across microarray samples. Weighted correlation network analysis (WGCNA) can be used for finding clusters (modules) of highly correlated genes, for summarizing such clusters using the module eigengene or an intramodular hub gene, for relating modules to one another and to external sample traits (using eigengene network methodology), and for calculating module membership measures. Correlation networks facilitate network based gene screening methods that can be used to identify candidate biomarkers or therapeutic targets. These methods have been successfully applied in various biological contexts, e.g. cancer, mouse genetics, yeast genetics, and analysis of brain imaging data. While parts of the correlation network methodology have been described in separate publications, there is a need to provide a user-friendly, comprehensive, and consistent software implementation and an accompanying tutorial. Results The WGCNA R software package is a comprehensive collection of R functions for performing various aspects of weighted correlation network analysis. The package includes functions for network construction, module detection, gene selection, calculations of topological properties, data simulation, visualization, and interfacing with external software. Along with the R package we also present R software tutorials. While the methods development was motivated by gene expression data, the underlying data mining approach can be applied to a variety of different settings. Conclusion The WGCNA package provides R functions for weighted correlation network analysis, e.g. co-expression network analysis of gene expression data. The R package along with its source code and additional material are freely available at http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/Rpackages/WGCNA. PMID:19114008

  4. Clustering for unsupervised fault diagnosis in nuclear turbine shut-down transients

    NASA Astrophysics Data System (ADS)

    Baraldi, Piero; Di Maio, Francesco; Rigamonti, Marco; Zio, Enrico; Seraoui, Redouane

    2015-06-01

    Empirical methods for fault diagnosis usually entail a process of supervised training based on a set of examples of signal evolutions "labeled" with the corresponding, known classes of fault. In practice, however, the signals collected during plant operation are very often "unlabeled", i.e., the information on the corresponding type of occurred fault is not available. To cope with this practical situation, in this paper we develop a methodology for the identification of transient signals showing similar characteristics, under the conjecture that operational/faulty transient conditions of the same type lead to similar behavior in the evolution of the measured signals. The methodology is founded on a feature extraction procedure, which feeds a spectral clustering technique embedding the unsupervised fuzzy C-means (FCM) algorithm, which evaluates the functional similarity among the different operational/faulty transients. A procedure for validating the plausibility of the obtained clusters is also proposed, based on physical considerations. The methodology is applied to a real industrial case, on the basis of 148 shut-down transients of a Nuclear Power Plant (NPP) steam turbine.
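
    A compact fuzzy C-means (FCM) sketch, the unsupervised core of the methodology above. The feature extraction and spectral embedding that feed it in the paper are not reproduced; the input features are random placeholders and `m` is the usual fuzzifier.

    ```python
    # Plain FCM: alternate center updates and membership updates.
    import numpy as np

    def fcm(X, c=3, m=2.0, iters=100, seed=0):
        rng = np.random.default_rng(seed)
        U = rng.random((len(X), c))
        U /= U.sum(1, keepdims=True)                      # initial memberships
        for _ in range(iters):
            W = U ** m
            centers = (W.T @ X) / W.sum(0)[:, None]       # weighted cluster centers
            d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
            e = 2 / (m - 1)
            U = 1.0 / (d ** e * (1.0 / d ** e).sum(1, keepdims=True))
        return centers, U

    X = np.random.default_rng(6).random((148, 4))   # e.g. features of 148 transients
    centers, U = fcm(X)
    labels = U.argmax(1)                            # hard assignment if needed
    ```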

  5. Prospective Molecular Profiling of Canine Cancers Provides a Clinically Relevant Comparative Model for Evaluating Personalized Medicine (PMed) Trials

    PubMed Central

    Mazcko, Christina; Cherba, David; Hendricks, William; Lana, Susan; Ehrhart, E. J.; Charles, Brad; Fehling, Heather; Kumar, Leena; Vail, David; Henson, Michael; Childress, Michael; Kitchell, Barbara; Kingsley, Christopher; Kim, Seungchan; Neff, Mark; Davis, Barbara

    2014-01-01

    Background Molecularly-guided trials (i.e. PMed) now seek to aid clinical decision-making by matching cancer targets with therapeutic options. Progress has been hampered by the lack of cancer models that account for individual-to-individual heterogeneity within and across cancer types. Naturally occurring cancers in pet animals are heterogeneous and thus provide an opportunity to answer questions about these PMed strategies and optimize translation to human patients. In order to realize this opportunity, it is now necessary to demonstrate the feasibility of conducting molecularly-guided analysis of tumors from dogs with naturally occurring cancer in a clinically relevant setting. Methodology A proof-of-concept study was conducted by the Comparative Oncology Trials Consortium (COTC) to determine if tumor collection, prospective molecular profiling, and PMed report generation within 1 week was feasible in dogs. Thirty-one dogs with cancers of varying histologies were enrolled. Twenty-four of 31 samples (77%) successfully met all predefined QA/QC criteria and were analyzed via Affymetrix gene expression profiling. A subsequent bioinformatics workflow transformed genomic data into a personalized drug report. Average turnaround from biopsy to report generation was 116 hours (4.8 days). Unsupervised clustering of canine tumor expression data clustered by cancer type, but supervised clustering of tumors based on the personalized drug report clustered by drug class rather than cancer type. Conclusions Collection and turnaround of high quality canine tumor samples, centralized pathology, analyte generation, array hybridization, and bioinformatic analyses matching gene expression to therapeutic options is achievable in a practical clinical window (<1 week). Clustering data show robust signatures by cancer type but also showed patient-to-patient heterogeneity in drug predictions. This lends further support to the inclusion of a heterogeneous population of dogs with cancer into the preclinical modeling of personalized medicine. Future comparative oncology studies optimizing the delivery of PMed strategies may aid cancer drug development. PMID:24637659

  6. Co-variations and Clustering of Chronic Disease Behavioral Risk Factors in China: China Chronic Disease and Risk Factor Surveillance, 2007

    PubMed Central

    Li, Yichong; Zhang, Mei; Jiang, Yong; Wu, Fan

    2012-01-01

    Background Chronic diseases have become the leading causes of mortality in China and related behavioral risk factors (BRFs) changed dramatically in past decades. We aimed to examine the prevalence, co-variations, clustering and the independent correlates of five BRFs at the national level. Methodology/Principal Findings We used data from the 2007 China Chronic Disease and Risk Factor Surveillance, in which multistage cluster sampling was adopted to collect a nationally representative sample of 49,247 Chinese aged 15 to 69 years. We estimated the prevalence and clustering (mean number of BRFs) of five BRFs: tobacco use, excessive alcohol drinking, insufficient intake of vegetables and fruit, physical inactivity, and overweight or obesity. We fitted binary logistic regression models to examine the co-variations among the five BRFs with adjustment for demographic and socioeconomic factors, chronic conditions and other BRFs. An ordinal logistic regression was constructed to investigate the independent associations between each covariate and the clustering of BRFs within individuals. Overall, 57.0% of the Chinese population had at least two BRFs and the mean number of BRFs was 1.80 (95% confidence interval: 1.78–1.83). Eight of the ten pairs of bivariate associations between the five BRFs were found statistically significant. Chinese with older age, being male, living in rural areas, having lower education level and lower yearly household income experienced increased likelihood of having more BRFs. Conclusions/Significance Current BRFs place the majority of Chinese aged 15 to 69 years at risk for the future development of chronic disease, which calls for urgent public health programs to reduce these risk factors. Prominent correlations between BRFs imply that a combined package of interventions targeting multiple BRFs might be appropriate. These interventions should target the older population, men, and rural residents, especially those with lower SES. PMID:22439010

  7. The Gaia-ESO Survey: open clusters in Gaia-DR1 . A way forward to stellar age calibration

    NASA Astrophysics Data System (ADS)

    Randich, S.; Tognelli, E.; Jackson, R.; Jeffries, R. D.; Degl'Innocenti, S.; Pancino, E.; Re Fiorentin, P.; Spagna, A.; Sacco, G.; Bragaglia, A.; Magrini, L.; Prada Moroni, P. G.; Alfaro, E.; Franciosini, E.; Morbidelli, L.; Roccatagliata, V.; Bouy, H.; Bravi, L.; Jiménez-Esteban, F. M.; Jordi, C.; Zari, E.; Tautvaišiene, G.; Drazdauskas, A.; Mikolaitis, S.; Gilmore, G.; Feltzing, S.; Vallenari, A.; Bensby, T.; Koposov, S.; Korn, A.; Lanzafame, A.; Smiljanic, R.; Bayo, A.; Carraro, G.; Costado, M. T.; Heiter, U.; Hourihane, A.; Jofré, P.; Lewis, J.; Monaco, L.; Prisinzano, L.; Sbordone, L.; Sousa, S. G.; Worley, C. C.; Zaggia, S.

    2018-05-01

    Context. Determination and calibration of the ages of stars, which heavily rely on stellar evolutionary models, are very challenging, while representing a crucial aspect in many astrophysical areas. Aims: We describe the methodologies that, taking advantage of Gaia-DR1 and the Gaia-ESO Survey data, enable the comparison of observed open star cluster sequences with stellar evolutionary models. The final, long-term goal is the exploitation of open clusters as age calibrators. Methods: We perform a homogeneous analysis of eight open clusters using the Gaia-DR1 TGAS catalogue for bright members and information from the Gaia-ESO Survey for fainter stars. Cluster membership probabilities for the Gaia-ESO Survey targets are derived based on several spectroscopic tracers. The Gaia-ESO Survey also provides the cluster chemical composition. We obtain cluster parallaxes using two methods. The first one relies on the astrometric selection of a sample of bona fide members, while the other one fits the parallax distribution of a larger sample of TGAS sources. Ages and reddening values are recovered through a Bayesian analysis using the 2MASS magnitudes and three sets of standard models. Lithium depletion boundary (LDB) ages are also determined using literature observations and the same models employed for the Bayesian analysis. Results: For all but one cluster, parallaxes derived by us agree with those presented in Gaia Collaboration (2017, A&A, 601, A19), while a discrepancy is found for NGC 2516; we provide evidence supporting our own determination. Inferred cluster ages are robust against models and are generally consistent with literature values. Conclusions: The systematic parallax errors inherent in the Gaia DR1 data presently limit the precision of our results. Nevertheless, we have been able to place these eight clusters onto the same age scale for the first time, with good agreement between isochronal and LDB ages where there is overlap. Our approach appears promising and demonstrates the potential of combining Gaia and ground-based spectroscopic datasets. Based on observations collected with the FLAMES instrument at VLT/UT2 telescope (Paranal Observatory, ESO, Chile), for the Gaia-ESO Large Public Spectroscopic Survey (188.B-3002, 193.B-0936).Additional tables are only available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (http://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/612/A99

  8. Occurrence of Radio Minihalos in a Mass-Limited Sample of Galaxy Clusters

    NASA Technical Reports Server (NTRS)

    Giacintucci, Simona; Markevitch, Maxim; Cassano, Rossella; Venturi, Tiziana; Clarke, Tracy E.; Brunetti, Gianfranco

    2017-01-01

    We investigate the occurrence of radio minihalos (diffuse radio sources of unknown origin observed in the cores of some galaxy clusters) in a statistical sample of 58 clusters drawn from the Planck Sunyaev-Zeldovich cluster catalog using a mass cut (M500 > 6 × 10^14 solar masses). We supplement our statistical sample with a similarly sized nonstatistical sample mostly consisting of clusters in the ACCEPT X-ray catalog with suitable X-ray and radio data, which includes lower-mass clusters. Where necessary (for nine clusters), we reanalyzed the Very Large Array archival radio data to determine whether a minihalo is present. Our total sample includes all 28 currently known and recently discovered radio minihalos, including six candidates. We classify clusters as cool-core or non-cool-core according to the value of the specific entropy floor in the cluster center, rederived or newly derived from the Chandra X-ray density and temperature profiles where necessary (for 27 clusters). Contrary to the common wisdom that minihalos are rare, we find that almost all cool cores (at least 12 out of 15, or 80%) in our complete sample of massive clusters exhibit minihalos. The supplementary sample shows that the occurrence of minihalos may be lower in lower-mass cool-core clusters. No minihalos are found in non-cool cores or "warm cores." These findings will help test theories of the origin of minihalos and provide information on the physical processes and energetics of the cluster cores.

  9. Objective sampling design in a highly heterogeneous landscape - characterizing environmental determinants of malaria vector distribution in French Guiana, in the Amazonian region.

    PubMed

    Roux, Emmanuel; Gaborit, Pascal; Romaña, Christine A; Girod, Romain; Dessay, Nadine; Dusfour, Isabelle

    2013-12-01

    Sampling design is a key issue when establishing species inventories and characterizing habitats within highly heterogeneous landscapes. Sampling efforts in such environments may be constrained and many field studies only rely on subjective and/or qualitative approaches to design collection strategy. The region of Cacao, in French Guiana, provides an excellent study site to understand the presence and abundance of Anopheles mosquitoes, their species dynamics and the transmission risk of malaria across various environments. We propose an objective methodology to define a stratified sampling design. Following thorough environmental characterization, a factorial analysis of mixed groups allows the data to be reduced and non-collinear principal components to be identified while balancing the influences of the different environmental factors. Such components defined new variables which could then be used in a robust k-means clustering procedure. Then, we identified five clusters that corresponded to our sampling strata and selected sampling sites in each stratum. We validated our method by comparing the species overlap of entomological collections from selected sites and the environmental similarities of the same sites. The Morisita index was significantly correlated (Pearson linear correlation) with environmental similarity based on i) the balanced environmental variable groups considered jointly (p = 0.001) and ii) land cover/use (p-value < 0.001). The Jaccard index was significantly correlated with land cover/use-based environmental similarity (p-value = 0.001). The results validate our sampling approach. Land cover/use maps (based on high spatial resolution satellite images) were shown to be particularly useful when studying the presence, density and diversity of Anopheles mosquitoes at local scales and in very heterogeneous landscapes.

  10. Objective sampling design in a highly heterogeneous landscape - characterizing environmental determinants of malaria vector distribution in French Guiana, in the Amazonian region

    PubMed Central

    2013-01-01

    Background Sampling design is a key issue when establishing species inventories and characterizing habitats within highly heterogeneous landscapes. Sampling efforts in such environments may be constrained and many field studies only rely on subjective and/or qualitative approaches to design collection strategy. The region of Cacao, in French Guiana, provides an excellent study site to understand the presence and abundance of Anopheles mosquitoes, their species dynamics and the transmission risk of malaria across various environments. We propose an objective methodology to define a stratified sampling design. Following thorough environmental characterization, a factorial analysis of mixed groups allows the data to be reduced and non-collinear principal components to be identified while balancing the influences of the different environmental factors. Such components defined new variables which could then be used in a robust k-means clustering procedure. Then, we identified five clusters that corresponded to our sampling strata and selected sampling sites in each stratum. Results We validated our method by comparing the species overlap of entomological collections from selected sites and the environmental similarities of the same sites. The Morisita index was significantly correlated (Pearson linear correlation) with environmental similarity based on i) the balanced environmental variable groups considered jointly (p = 0.001) and ii) land cover/use (p-value < 0.001). The Jaccard index was significantly correlated with land cover/use-based environmental similarity (p-value = 0.001). Conclusions The results validate our sampling approach. Land cover/use maps (based on high spatial resolution satellite images) were shown to be particularly useful when studying the presence, density and diversity of Anopheles mosquitoes at local scales and in very heterogeneous landscapes. PMID:24289184

  11. Sampling designs for HIV molecular epidemiology with application to Honduras.

    PubMed

    Shepherd, Bryan E; Rossini, Anthony J; Soto, Ramon Jeremias; De Rivera, Ivette Lorenzana; Mullins, James I

    2005-11-01

    Proper sampling is essential to characterize the molecular epidemiology of human immunodeficiency virus (HIV). HIV sampling frames are difficult to identify, so most studies use convenience samples. We discuss statistically valid and feasible sampling techniques that overcome some of the potential for bias due to convenience sampling and ensure better representation of the study population. We employ a sampling design called stratified cluster sampling. This first divides the population into geographical and/or social strata. Within each stratum, a population of clusters is chosen from groups, locations, or facilities where HIV-positive individuals might be found. Some clusters are randomly selected within strata and individuals are randomly selected within clusters. Variation and cost help determine the number of clusters and the number of individuals within clusters that are to be sampled. We illustrate the approach through a study designed to survey the heterogeneity of subtype B strains in Honduras.
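
    A toy implementation of the stratified cluster design described above: clusters are sampled at random within strata, then individuals at random within the selected clusters. The strata, cluster lists, and sizes below are hypothetical, and a real survey would weight these counts by stratum-level variance and cost.

    ```python
    # Two-stage selection: strata -> random clusters -> random individuals.
    import random
    random.seed(0)

    strata = {
        "urban": [f"clinic_u{i}" for i in range(40)],
        "rural": [f"clinic_r{i}" for i in range(25)],
    }
    clusters_per_stratum = {"urban": 5, "rural": 3}
    individuals_per_cluster = 12

    sample = {}
    for stratum, clusters in strata.items():
        for c in random.sample(clusters, clusters_per_stratum[stratum]):
            # in practice, draw from the cluster's enumerated patient roster
            sample[c] = random.sample(range(200), individuals_per_cluster)
    ```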

  12. The ROSAT Brightest Cluster Sample - I. The compilation of the sample and the cluster log N-log S distribution

    NASA Astrophysics Data System (ADS)

    Ebeling, H.; Edge, A. C.; Bohringer, H.; Allen, S. W.; Crawford, C. S.; Fabian, A. C.; Voges, W.; Huchra, J. P.

    1998-12-01

    We present a 90 per cent flux-complete sample of the 201 X-ray-brightest clusters of galaxies in the northern hemisphere (δ >= 0 deg), at high Galactic latitudes (|b| >= 20 deg), with measured redshifts z <= 0.3 and fluxes higher than 4.4 × 10^-12 erg cm^-2 s^-1 in the 0.1-2.4 keV band. The sample, called the ROSAT Brightest Cluster Sample (BCS), is selected from ROSAT All-Sky Survey data and is the largest X-ray-selected cluster sample compiled to date. In addition to Abell clusters, which form the bulk of the sample, the BCS also contains the X-ray-brightest Zwicky clusters and other clusters selected from their X-ray properties alone. Effort has been made to ensure the highest possible completeness of the sample and the smallest possible contamination by non-cluster X-ray sources. X-ray fluxes are computed using an algorithm tailored for the detection and characterization of X-ray emission from galaxy clusters. These fluxes are accurate to better than 15 per cent (mean 1σ error). We find the cumulative log N-log S distribution of clusters to follow a power law κ S^-α with α = 1.31 (+0.06/-0.03; errors are the 10th and 90th percentiles) down to fluxes of 2 × 10^-12 erg cm^-2 s^-1, i.e. considerably below the BCS flux limit. Although our best-fitting slope disagrees formally with the canonical value of -1.5 for a Euclidean distribution, the BCS log N-log S distribution is consistent with a non-evolving cluster population if cosmological effects are taken into account. Our sample will allow us to examine large-scale structure in the northern hemisphere, determine the spatial cluster-cluster correlation function, investigate correlations between the X-ray and optical properties of the clusters, establish the X-ray luminosity function for galaxy clusters, and discuss the implications of the results for cluster evolution.
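
    As an illustration of the kind of power-law analysis described above: if the cumulative counts follow N(>S) ∝ S^-α, fluxes above the limit are Pareto-distributed with index α, for which a closed-form maximum-likelihood estimate exists. The sketch below fits simulated fluxes, not the BCS data, and the fit method differs from the paper's own.

    ```python
    # Maximum-likelihood slope of a Pareto (power-law) flux distribution.
    import numpy as np

    rng = np.random.default_rng(10)
    alpha_true, s_min = 1.31, 4.4e-12
    fluxes = s_min * (1 + rng.pareto(alpha_true, 201))   # simulated sample

    alpha_hat = len(fluxes) / np.log(fluxes / s_min).sum()
    err = alpha_hat / np.sqrt(len(fluxes))               # asymptotic 1-sigma error
    print(f"alpha = {alpha_hat:.2f} +/- {err:.2f}")
    ```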

  13. Correction of the angular dependence of satellite retrieved LST at global scale using parametric models

    NASA Astrophysics Data System (ADS)

    Ermida, S. L.; Trigo, I. F.; DaCamara, C.; Ghent, D.

    2017-12-01

    Land surface temperature (LST) values retrieved from satellite measurements in the thermal infrared (TIR) may be strongly affected by spatial anisotropy. This effect introduces significant discrepancies among LST estimations from different sensors, overlapping in space and time, that are not related to uncertainties in the methodologies or input data used. Furthermore, these directional effects deviate LST products from an ideally defined LST, which should represent the ensemble of directional radiometric temperatures of all surface elements within the FOV. Angular effects on LST are here conveniently estimated by means of a parametric model of the surface thermal emission, which describes the angular dependence of LST as a function of viewing and illumination geometry. Two models are consistently analyzed to evaluate their performance and to assess their respective potential to correct directional effects on LST for a wide range of surface conditions, in terms of tree coverage, vegetation density, and surface emissivity. We also propose an optimization of the correction of directional effects through a synergistic use of both models. The models are calibrated using LST data as provided by two sensors: MODIS on-board NASA's TERRA and AQUA, and SEVIRI on-board EUMETSAT's MSG. As shown in our previous feasibility studies, the sampling of illumination and view angles has a high impact on the model parameters. This impact may be mitigated when the sampling size is increased by aggregating pixels with similar surface conditions. Here we propose a methodology where the land surface is stratified by means of a cluster analysis using information on land cover type, fraction of vegetation cover and topography. The models are then adjusted to LST data corresponding to each cluster. It is shown that the quality of the cluster-based models is very close to that of the pixel-based ones. Furthermore, the reduced number of parameters allows improving the model through the incorporation of a seasonal component. The application of the procedure discussed here towards the harmonization of LST products from multiple sensors has been tested within the framework of the ESA DUE GlobTemperature project. It is also expected to help the characterization of directional effects in LST products generated within the EUMETSAT LSA SAF.

  14. PuReD-MCL: a graph-based PubMed document clustering methodology.

    PubMed

    Theodosiou, T; Darzentas, N; Angelis, L; Ouzounis, C A

    2008-09-01

    Biomedical literature is the principal repository of biomedical knowledge, with PubMed being the most complete database collecting, organizing and analyzing such textual knowledge. There are numerous efforts that attempt to exploit this information by using text mining and machine learning techniques. We developed a novel approach, called PuReD-MCL (Pubmed Related Documents-MCL), which is based on the graph clustering algorithm MCL and relevant resources from PubMed. PuReD-MCL avoids using natural language processing (NLP) techniques directly; instead, it takes advantage of existing resources available from PubMed. PuReD-MCL then clusters documents efficiently using the MCL graph clustering algorithm, which is based on graph flow simulation. This process allows users to analyse the results by highlighting important clues, and finally to visualize the clusters and all relevant information using an interactive graph layout algorithm, for instance BioLayout Express 3D. The methodology was applied to two different datasets, previously used for the validation of the document clustering tool TextQuest. The first dataset involves the organisms Escherichia coli and yeast, whereas the second is related to Drosophila development. PuReD-MCL successfully reproduces the annotated results obtained from TextQuest, while at the same time providing additional insights into the clusters and the corresponding documents. Source code in Perl and R is available from http://tartara.csd.auth.gr/~theodos/
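
    A compact sketch of the MCL graph-flow clustering at the heart of PuReD-MCL: alternate expansion (matrix squaring) and inflation (elementwise power plus column renormalization) on a column-stochastic matrix until the flow settles into clusters. The toy graph and thresholds are illustrative, not the tool's implementation.

    ```python
    # Minimal MCL on a small adjacency matrix.
    import numpy as np

    def mcl(adj, inflation=2.0, iters=50):
        M = adj + np.eye(len(adj))            # add self-loops
        M /= M.sum(0)                         # make columns stochastic
        for _ in range(iters):
            M = M @ M                         # expansion: flow along longer paths
            M = M ** inflation                # inflation: strengthen strong flows
            M /= M.sum(0)
        # attractor rows with residual mass define the clusters
        clusters = {tuple(int(j) for j in np.nonzero(row > 1e-6)[0])
                    for row in M if row.max() > 1e-6}
        return sorted(clusters)

    adj = np.array([[0, 1, 1, 0, 0],
                    [1, 0, 1, 0, 0],
                    [1, 1, 0, 0, 0],
                    [0, 0, 0, 0, 1],
                    [0, 0, 0, 1, 0]], float)
    print(mcl(adj))   # two clusters: (0, 1, 2) and (3, 4)
    ```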

  15. Analysis of grain elements and identification of best genotypes for Fe and P in Afghan wheat landraces

    PubMed Central

    Kondou, Youichi; Manickavelu, Alagu; Komatsu, Kenji; Arifi, Mujiburahman; Kawashima, Mika; Ishii, Takayoshi; Hattori, Tomohiro; Iwata, Hiroyoshi; Tsujimoto, Hisashi; Ban, Tomohiro; Matsui, Minami

    2016-01-01

    This study was carried out with the aim of developing a methodology to determine elemental composition in wheat and identify the best germplasm for further research. Orphan and genetically diverse Afghan wheat landraces were chosen, and EDXRF was used to measure the content of selected elements and establish the elemental composition in grains of 266 landraces using 10 reference lines. Four elements, K, Mg, P, and Fe, were measured with a standardized sample preparation. The results of hierarchical cluster analysis using the elemental composition data sets indicated that the Fe content has an opposite pattern to the other elements, especially that of K. By systematic analysis, the best wheat germplasms for P content and Fe content were identified. In order to assess the sensitivity of EDXRF, the ICP method was also used, and the similar results obtained confirmed the EDXRF methodology. The sampling method for measurement using EDXRF was optimized, resulting in high-throughput profiling of elemental composition in wheat grains at low cost. Using this method, we have characterized the Afghan wheat landraces and isolated the best genotypes that have high elemental content and the potential to be used in crop improvement. PMID:28163583

  16. Hierarchical modeling of cluster size in wildlife surveys

    USGS Publications Warehouse

    Royle, J. Andrew

    2008-01-01

    Clusters or groups of individuals are the fundamental unit of observation in many wildlife sampling problems, including aerial surveys of waterfowl, marine mammals, and ungulates. Explicit accounting of cluster size in models for estimating abundance is necessary because detection of individuals within clusters is not independent and detectability of clusters is likely to increase with cluster size. This induces a cluster size bias in which the average cluster size in the sample is larger than in the population at large. Thus, failure to account for the relationship between detectability and cluster size will tend to yield a positive bias in estimates of abundance or density. I describe a hierarchical modeling framework for accounting for cluster-size bias in animal sampling. The hierarchical model consists of models for the observation process conditional on the cluster size distribution and the cluster size distribution conditional on the total number of clusters. Optionally, a spatial model can be specified that describes variation in the total number of clusters per sample unit. Parameter estimation, model selection, and criticism may be carried out using conventional likelihood-based methods. An extension of the model is described for the situation where measurable covariates at the level of the sample unit are available. Several candidate models within the proposed class are evaluated for aerial survey data on mallard ducks (Anas platyrhynchos).
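
    A short simulation of the cluster-size bias described above: when detection probability increases with cluster size, the mean size in the observed sample exceeds the population mean. The distribution and detection model are illustrative assumptions, not the paper's.

    ```python
    # Size-biased detection inflates the observed mean cluster size.
    import numpy as np

    rng = np.random.default_rng(7)
    sizes = rng.poisson(3, 10_000) + 1        # population of cluster sizes
    p_detect = 1 - (1 - 0.3) ** sizes         # each member seen w.p. 0.3
    detected = sizes[rng.random(10_000) < p_detect]

    print(sizes.mean(), detected.mean())      # sample mean is biased upward
    ```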

  17. Superresolution Imaging of Aquaporin-4 Cluster Size in Antibody-Stained Paraffin Brain Sections

    PubMed Central

    Smith, Alex J.; Verkman, Alan S.

    2015-01-01

    The water channel aquaporin-4 (AQP4) forms supramolecular clusters whose size is determined by the ratio of M1- and M23-AQP4 isoforms. In cultured astrocytes, differences in the subcellular localization and macromolecular interactions of small and large AQP4 clusters results in distinct physiological roles for M1- and M23-AQP4. Here, we developed quantitative superresolution optical imaging methodology to measure AQP4 cluster size in antibody-stained paraffin sections of mouse cerebral cortex and spinal cord, human postmortem brain, and glioma biopsy specimens. This methodology was used to demonstrate that large AQP4 clusters are formed in AQP4−/− astrocytes transfected with only M23-AQP4, but not in those expressing only M1-AQP4, both in vitro and in vivo. Native AQP4 in mouse cortex, where both isoforms are expressed, was enriched in astrocyte foot-processes adjacent to microcapillaries; clusters in perivascular regions of the cortex were larger than in parenchymal regions, demonstrating size-dependent subcellular segregation of AQP4 clusters. Two-color superresolution imaging demonstrated colocalization of Kir4.1 with AQP4 clusters in perivascular areas but not in parenchyma. Surprisingly, the subcellular distribution of AQP4 clusters was different between gray and white matter astrocytes in spinal cord, demonstrating regional specificity in cluster polarization. Changes in AQP4 subcellular distribution are associated with several neurological diseases and we demonstrate that AQP4 clustering was preserved in a postmortem human cortical brain tissue specimen, but that AQP4 was not substantially clustered in a human glioblastoma specimen despite high-level expression. Our results demonstrate the utility of superresolution optical imaging for measuring the size of AQP4 supramolecular clusters in paraffin sections of brain tissue and support AQP4 cluster size as a primary determinant of its subcellular distribution. PMID:26682810
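
    A hedged sketch of one generic way to quantify cluster size from single-molecule localization data: density-based clustering (DBSCAN) of localization coordinates, then a radius of gyration per cluster. The paper's exact quantification pipeline is not reproduced; the coordinates and parameters below are illustrative.

    ```python
    # DBSCAN on simulated localizations, then per-cluster radius of gyration.
    import numpy as np
    from sklearn.cluster import DBSCAN

    rng = np.random.default_rng(9)
    # localizations (nm): three tight clusters plus uniform background noise
    locs = np.vstack([rng.normal(c, 30, size=(150, 2)) for c in (0, 500, 1200)]
                     + [rng.uniform(-200, 1400, size=(100, 2))])

    labels = DBSCAN(eps=50, min_samples=10).fit_predict(locs)   # -1 = noise
    for k in set(labels) - {-1}:
        pts = locs[labels == k]
        rg = np.sqrt(((pts - pts.mean(0)) ** 2).sum(1).mean())  # radius of gyration
        print(f"cluster {k}: n={len(pts)}, Rg={rg:.0f} nm")
    ```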

  18. The clustering of diet, physical activity and sedentary behavior in children and adolescents: a review.

    PubMed

    Leech, Rebecca M; McNaughton, Sarah A; Timperio, Anna

    2014-01-22

    Diet, physical activity (PA) and sedentary behavior are important, yet modifiable, determinants of obesity. Recent research into the clustering of these behaviors suggests that children and adolescents have multiple obesogenic risk factors. This paper reviews studies using empirical, data-driven methodologies, such as cluster analysis (CA) and latent class analysis (LCA), to identify clustering patterns of diet, PA and sedentary behavior among children or adolescents and their associations with socio-demographic indicators, and overweight and obesity. A literature search of electronic databases was undertaken to identify studies which have used data-driven methodologies to investigate the clustering of diet, PA and sedentary behavior among children and adolescents aged 5-18 years old. Eighteen studies (62% of potential studies) were identified that met the inclusion criteria, of which eight examined the clustering of PA and sedentary behavior and eight examined diet, PA and sedentary behavior. Studies were mostly cross-sectional and conducted in older children and adolescents (≥ 9 years). Findings from the review suggest that obesogenic cluster patterns are complex with a mixed PA/sedentary behavior cluster observed most frequently, but healthy and unhealthy patterning of all three behaviors was also reported. Cluster membership was found to differ according to age, gender and socio-economic status (SES). The tendency for older children/adolescents, particularly females, to comprise clusters defined by low PA was the most robust finding. Findings to support an association between obesogenic cluster patterns and overweight and obesity were inconclusive, with longitudinal research in this area limited. Diet, PA and sedentary behavior cluster together in complex ways that are not well understood. Further research, particularly in younger children, is needed to understand how cluster membership differs according to socio-demographic profile. Longitudinal research is also essential to establish how different cluster patterns track over time and their influence on the development of overweight and obesity.

  19. The clustering of diet, physical activity and sedentary behavior in children and adolescents: a review

    PubMed Central

    2014-01-01

    Diet, physical activity (PA) and sedentary behavior are important, yet modifiable, determinants of obesity. Recent research into the clustering of these behaviors suggests that children and adolescents have multiple obesogenic risk factors. This paper reviews studies using empirical, data-driven methodologies, such as cluster analysis (CA) and latent class analysis (LCA), to identify clustering patterns of diet, PA and sedentary behavior among children or adolescents and their associations with socio-demographic indicators, and overweight and obesity. A literature search of electronic databases was undertaken to identify studies which have used data-driven methodologies to investigate the clustering of diet, PA and sedentary behavior among children and adolescents aged 5–18 years old. Eighteen studies (62% of potential studies) were identified that met the inclusion criteria, of which eight examined the clustering of PA and sedentary behavior and eight examined diet, PA and sedentary behavior. Studies were mostly cross-sectional and conducted in older children and adolescents (≥9 years). Findings from the review suggest that obesogenic cluster patterns are complex with a mixed PA/sedentary behavior cluster observed most frequently, but healthy and unhealthy patterning of all three behaviors was also reported. Cluster membership was found to differ according to age, gender and socio-economic status (SES). The tendency for older children/adolescents, particularly females, to comprise clusters defined by low PA was the most robust finding. Findings to support an association between obesogenic cluster patterns and overweight and obesity were inconclusive, with longitudinal research in this area limited. Diet, PA and sedentary behavior cluster together in complex ways that are not well understood. Further research, particularly in younger children, is needed to understand how cluster membership differs according to socio-demographic profile. Longitudinal research is also essential to establish how different cluster patterns track over time and their influence on the development of overweight and obesity. PMID:24450617

  20. Self-organization and clustering algorithms

    NASA Technical Reports Server (NTRS)

    Bezdek, James C.

    1991-01-01

    Kohonen's feature maps approach to clustering is often likened to the k or c-means clustering algorithms. Here, the author identifies some similarities and differences between the hard and fuzzy c-Means (HCM/FCM) or ISODATA algorithms and Kohonen's self-organizing approach. The author concludes that some differences are significant, but at the same time there may be some important unknown relationships between the two methodologies. Several avenues of research are proposed.
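
    As a plain-language companion to the comparison above, the following minimal NumPy sketch implements the standard fuzzy c-means update equations (alternating centre and membership updates). The fuzzifier m = 2, tolerance and test data are illustrative assumptions; hardening the memberships to 0/1 assignments recovers hard c-means (k-means).

      import numpy as np

      def fuzzy_c_means(X, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
          """Alternate centre and membership updates until memberships settle."""
          rng = np.random.default_rng(seed)
          U = rng.random((X.shape[0], c))
          U /= U.sum(axis=1, keepdims=True)      # memberships sum to 1 per sample
          for _ in range(max_iter):
              Um = U ** m
              centers = (Um.T @ X) / Um.sum(axis=0)[:, None]   # weighted means
              d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
              inv = np.fmax(d, 1e-12) ** (-2.0 / (m - 1.0))
              U_new = inv / inv.sum(axis=1, keepdims=True)     # FCM membership update
              converged = np.max(np.abs(U_new - U)) < tol
              U = U_new
              if converged:
                  break
          return centers, U

      # two well-separated synthetic blobs
      X = np.vstack([np.random.default_rng(1).normal(loc, 0.3, size=(100, 2))
                     for loc in (0.0, 3.0)])
      centers, U = fuzzy_c_means(X, c=2)
      print(np.round(centers, 2))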

  1. The Chandra Strong Lens Sample: Revealing Baryonic Physics In Strong Lensing Selected Clusters

    NASA Astrophysics Data System (ADS)

    Bayliss, Matthew

    2017-08-01

    We propose for Chandra imaging of the hot intra-cluster gas in a unique new sample of 29 galaxy clusters selected purely on their strong gravitational lensing signatures. This will be the first program targeting a purely strong lensing selected cluster sample, enabling new comparisons between the ICM properties and scaling relations of strong lensing and mass/ICM selected cluster samples. Chandra imaging, combined with high precision strong lens models, ensures powerful constraints on the distribution and state of matter in the cluster cores. This represents a novel angle from which we can address the role played by baryonic physics -- the infamous "gastrophysics" -- in shaping the cores of massive clusters, and opens up an exciting new galaxy cluster discovery space with Chandra.

  2. The Chandra Strong Lens Sample: Revealing Baryonic Physics In Strong Lensing Selected Clusters

    NASA Astrophysics Data System (ADS)

    Bayliss, Matthew

    2017-09-01

    We propose for Chandra imaging of the hot intra-cluster gas in a unique new sample of 29 galaxy clusters selected purely on their strong gravitational lensing signatures. This will be the first program targeting a purely strong lensing selected cluster sample, enabling new comparisons between the ICM properties and scaling relations of strong lensing and mass/ICM selected cluster samples. Chandra imaging, combined with high precision strong lens models, ensures powerful constraints on the distribution and state of matter in the cluster cores. This represents a novel angle from which we can address the role played by baryonic physics -- the infamous "gastrophysics" -- in shaping the cores of massive clusters, and opens up an exciting new galaxy cluster discovery space with Chandra.

  3. Stratified sampling design based on data mining.

    PubMed

    Kim, Yeonkook J; Oh, Yoonhwan; Park, Sunghoon; Cho, Sungzoon; Park, Hayoung

    2013-09-01

    To explore classification rules based on data mining methodologies to be used in defining strata for stratified sampling of healthcare providers with improved sampling efficiency. We performed k-means clustering to group providers with similar characteristics and then constructed decision trees on the cluster labels to generate stratification rules. We assessed the variance explained by the stratification proposed in this study and by conventional stratification to evaluate the performance of the sampling design. We constructed a study database from health insurance claims data and providers' profile data made available to this study by the Health Insurance Review and Assessment Service of South Korea, and population data from Statistics Korea. From our database, we used the data for single-specialty clinics or hospitals in two specialties, general surgery and ophthalmology, for the year 2011. Data mining resulted in five strata in general surgery with two stratification variables (the number of inpatients per specialist and the population density of the provider location) and five strata in ophthalmology with two stratification variables (the number of inpatients per specialist and the number of beds). The percentages of variance in annual changes in specialist productivity explained by the stratification were 22% in general surgery and 8% in ophthalmology, whereas conventional stratification by type of provider location and number of beds explained 2% and 0.2% of the variance, respectively. This study demonstrated that data mining methods can be used to design efficient stratified sampling with variables readily available to the insurer and government; it offers an alternative to the existing stratification method that is widely used in healthcare provider surveys in South Korea.
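
    A minimal sketch of the two-step idea described above: k-means groups providers, and a shallow decision tree trained on the cluster labels yields human-readable stratification rules. The feature names, synthetic data and k = 5 are illustrative assumptions, not the study's actual settings.

      import numpy as np
      from sklearn.cluster import KMeans
      from sklearn.tree import DecisionTreeClassifier, export_text

      rng = np.random.default_rng(0)
      # hypothetical provider features: inpatients per specialist, population density
      X = rng.lognormal(mean=2.0, sigma=0.7, size=(500, 2))

      # step 1: group providers with similar characteristics
      labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)

      # step 2: a shallow tree on the cluster labels reads as stratification rules
      tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, labels)
      print(export_text(tree, feature_names=["inpatients_per_specialist",
                                             "population_density"]))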

  4. A fast learning method for large scale and multi-class samples of SVM

    NASA Astrophysics Data System (ADS)

    Fan, Yu; Guo, Huiming

    2017-06-01

    A fast learning method for multi-class classification SVMs (Support Vector Machines), based on a binary tree, is presented to address the low learning efficiency of SVMs on large-scale multi-class samples. A bottom-up method builds the binary tree hierarchy, and a sub-classifier is learned from the samples at each node of the resulting hierarchy. During learning, several class clusters are generated by a first clustering of the training samples. Central points are extracted directly from clusters containing only one type of sample. For clusters containing two types of samples, the numbers of positive and negative sub-clusters are set according to the degree of mixing, a secondary clustering is performed, and central points are then extracted from the resulting sub-class clusters. Sub-classifiers are obtained by learning from the reduced sample set formed by the extracted central points. Simulation experiments show that this fast learning method, based on multi-level clustering, maintains high classification accuracy while greatly reducing the number of samples and effectively improving learning efficiency.
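
    The sample-reduction step can be illustrated with a hedged, generic sketch: cluster each class's training points and train the SVM on the cluster centres only. This omits the paper's binary-tree hierarchy and secondary clustering of mixed clusters; all data and parameters are placeholders.

      import numpy as np
      from sklearn.cluster import KMeans
      from sklearn.svm import SVC
      from sklearn.datasets import make_classification

      X, y = make_classification(n_samples=5000, n_features=10, n_classes=3,
                                 n_informative=6, random_state=0)

      # replace each class's samples by 20 cluster centres
      centers, center_labels = [], []
      for cls in np.unique(y):
          km = KMeans(n_clusters=20, n_init=10, random_state=0).fit(X[y == cls])
          centers.append(km.cluster_centers_)
          center_labels.append(np.full(20, cls))

      X_red = np.vstack(centers)              # 60 training points instead of 5000
      y_red = np.concatenate(center_labels)
      clf = SVC(kernel="rbf").fit(X_red, y_red)
      print("accuracy on full set:", clf.score(X, y))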

  5. Crisp clustering of airborne geophysical data from the Alto Ligonha pegmatite field, northeastern Mozambique, to predict zones of increased rare earth element potential

    NASA Astrophysics Data System (ADS)

    Eberle, Detlef G.; Daudi, Elias X. F.; Muiuane, Elônio A.; Nyabeze, Peter; Pontavida, Alfredo M.

    2012-01-01

    The National Geology Directorate of Mozambique (DNG) and the Maputo-based Eduardo Mondlane University (UEM) entered a joint venture with the South African Council for Geoscience (CGS) to conduct a case study over the meso-Proterozoic Alto Ligonha pegmatite field in the Zambézia Province of northeastern Mozambique to support the local exploration and mining sectors. Rare-metal minerals, i.e. tantalum and niobium, as well as rare-earth minerals have been mined in the Alto Ligonha pegmatite field for decades, but production nearly ceased during the civil war (1977-1992). The Government now strives to promote mining in the region as a contribution to poverty alleviation. This study was undertaken to facilitate the extraction of geological information from the high-resolution airborne magnetic and radiometric data sets recently acquired through a World Bank funded survey and mapping project. The aim was to generate a value-added map from the airborne geophysical data that is easier for the exploration and mining industries to read and use than mere airborne geophysical grid data or maps. As a first step towards clustering, thorium (Th) and potassium (K) concentrations were determined from the airborne geophysical data, as well as apparent magnetic susceptibility and first vertical magnetic gradient data. These four datasets were projected onto a 100 m spaced regular grid to assemble 850,000 four-element (multivariate) sample vectors over the study area. Classification of the sample vectors using crisp clustering, based upon the Euclidean distance between sample and class centre, provided a (pseudo-)geology or value-added map displaying the spatial distribution of six different classes in the study area. To assess the quality of sample allocation, the degree of membership of each sample vector was determined using a posteriori discriminant analysis. Geophysical ground truth control was essential to assign geological/geophysical attributes to the six classes. The highest probability of encountering pegmatite bodies is in close vicinity to (magnetic) amphibole schist in areas where depletion of potassium, an indication of metasomatic processes, is evident from the airborne radiometric data. Clustering has proven to be a fast and effective method for compiling value-added maps from multivariate geophysical datasets. The experience gained in the Alto Ligonha pegmatite field encourages adopting this methodology for mapping other parts of the Mozambique Fold Belt.
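
    A minimal sketch of the crisp classification step, with synthetic data standing in for the four airborne channels: each grid sample is assigned to the nearest of six class centres in Euclidean distance. The a posteriori discriminant analysis used to judge allocation quality is approximated here, crudely, by the distance to the winning centre.

      import numpy as np
      from sklearn.cluster import KMeans

      rng = np.random.default_rng(0)
      # stand-ins for Th, K, susceptibility and vertical-gradient channels
      samples = rng.normal(size=(100_000, 4))   # the study used 850,000 grid nodes

      km = KMeans(n_clusters=6, n_init=10, random_state=0).fit(samples)
      class_map = km.labels_                        # one crisp class per grid node
      quality = km.transform(samples).min(axis=1)   # distance to winning centre
      print(np.bincount(class_map), quality.mean())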

  6. Modeling and clustering water demand patterns from real-world smart meter data

    NASA Astrophysics Data System (ADS)

    Cheifetz, Nicolas; Noumir, Zineb; Samé, Allou; Sandraz, Anne-Claire; Féliers, Cédric; Heim, Véronique

    2017-08-01

    Drinking water utilities need a detailed understanding of the water demand on their distribution networks in order to optimize resources efficiently, manage billing and propose new customer services. With the emergence of smart grids based on automated meter reading (AMR), a finer-grained understanding of consumption patterns is now accessible for smart cities. In this context, this paper evaluates a novel methodology for identifying relevant usage profiles from the water consumption data produced by smart meters. The methodology is fully data-driven, using the consumption time series, which are seen as functions or curves observed with an hourly time step. First, a Fourier-based additive time series decomposition model is introduced to extract seasonal patterns from the time series. These patterns are intended to represent the customer habits in terms of water consumption. Two functional clustering approaches are then used to classify the extracted seasonal patterns: the functional version of K-means, and the Fourier REgression Mixture (FReMix) model. The K-means approach produces a hard segmentation and K representative prototypes. The FReMix, on the other hand, is a generative model and also produces K profiles, together with a soft segmentation based on the posterior probabilities. The proposed approach is applied to a smart grid deployed on the largest water distribution network (WDN) in France. The two clustering strategies are evaluated and compared, and a realistic interpretation of the consumption habits is given for each cluster. The extensive experiments and the qualitative interpretation of the resulting clusters highlight the effectiveness of the proposed methodology.
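
    The two-stage scheme can be sketched as follows: fit each meter's series with a small Fourier design (daily and weekly harmonics are assumptions) to extract a smoothed seasonal pattern, then run K-means on the fitted coefficients. The FReMix model itself is not reproduced here, and the consumption data are placeholders.

      import numpy as np
      from sklearn.cluster import KMeans

      rng = np.random.default_rng(0)
      t = np.arange(24 * 365)                    # hourly index over one year
      meters = rng.normal(size=(200, t.size))    # placeholder consumption series

      # Fourier design matrix: intercept plus daily and weekly harmonics
      periods = [24, 24 * 7]
      F = np.column_stack([np.ones_like(t)] +
                          [f(2 * np.pi * t / p) for p in periods
                           for f in (np.sin, np.cos)])
      coef, *_ = np.linalg.lstsq(F, meters.T, rcond=None)  # one fit per meter
      patterns = (F @ coef).T                    # smoothed seasonal patterns

      km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(coef.T)
      print(np.bincount(km.labels_))             # cluster sizes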

  7. The Mass Function of Abell Clusters

    NASA Astrophysics Data System (ADS)

    Chen, J.; Huchra, J. P.; McNamara, B. R.; Mader, J.

    1998-12-01

    The velocity dispersion and mass functions for rich clusters of galaxies provide important constraints on models of the formation of Large-Scale Structure (e.g., Frenk et al. 1990). However, prior estimates of the velocity dispersion or mass function for galaxy clusters have been based on either very small samples of clusters (Bahcall and Cen 1993; Zabludoff et al. 1994) or large but incomplete samples (e.g., the Girardi et al. (1998) determination from a sample of clusters with more than 30 measured galaxy redshifts). In contrast, we approach the problem by constructing a volume-limited sample of Abell clusters. We collected individual galaxy redshifts for our sample from two major galaxy velocity databases, the NASA Extragalactic Database, NED, maintained at IPAC, and ZCAT, maintained at SAO. We assembled a database with velocity information for possible cluster members and then selected cluster members based on both spatial and velocity data. Cluster velocity dispersions and masses were calculated following the procedures of Danese, De Zotti, and di Tullio (1980) and Heisler, Tremaine, and Bahcall (1985), respectively. The final velocity dispersion and mass functions were analyzed in order to constrain cosmological parameters by comparison to the results of N-body simulations. Our data for the cluster sample as a whole and for the individual clusters (spatial maps and velocity histograms) in our sample is available on-line at http://cfa-www.harvard.edu/ huchra/clusters. This website will be updated as more data becomes available in the master redshift compilations, and will be expanded to include more clusters and large groups of galaxies.

  8. Construction of ground-state preserving sparse lattice models for predictive materials simulations

    NASA Astrophysics Data System (ADS)

    Huang, Wenxuan; Urban, Alexander; Rong, Ziqin; Ding, Zhiwei; Luo, Chuan; Ceder, Gerbrand

    2017-08-01

    First-principles based cluster expansion models are the dominant approach in ab initio thermodynamics of crystalline mixtures, enabling the prediction of phase diagrams and novel ground states. However, despite recent advances, the construction of accurate models still requires a careful and time-consuming manual parameter tuning process for ground-state preservation, since this property is not guaranteed by default. In this paper, we present a systematic and mathematically sound method to obtain cluster expansion models that are guaranteed to preserve the ground states of their reference data. The method builds on the recently introduced compressive sensing paradigm for cluster expansion and employs quadratic programming to impose constraints on the model parameters. The robustness of our methodology is illustrated for two lithium transition metal oxides with relevance for Li-ion battery cathodes, Li2xFe2(1-x)O2 and Li2xTi2(1-x)O2, for which the construction of cluster expansion models with compressive sensing alone has proven to be challenging. We demonstrate that our method not only guarantees ground-state preservation on the set of reference structures used for the model construction, but also achieves out-of-sample ground-state preservation up to relatively large supercell sizes through a rapidly converging iterative refinement. This method provides a general tool for building robust, compressed and constrained physical models with predictive power.
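
    A hedged sketch of the constrained fit, using cvxpy as a stand-in quadratic-programming layer: a compressive-sensing-style L1-regularised least-squares fit of effective cluster interactions, subject to linear inequalities that encode ground-state orderings. All matrices are random placeholders for the correlation matrix, reference energies and constraint rows, and the regularisation weight is an assumption.

      import numpy as np
      import cvxpy as cp

      rng = np.random.default_rng(0)
      n_struct, n_clusters = 120, 30
      A = rng.normal(size=(n_struct, n_clusters))   # cluster correlation matrix
      E = rng.normal(size=n_struct)                 # reference energies
      G = rng.normal(size=(40, n_clusters))         # ground-state inequality rows

      J = cp.Variable(n_clusters)                   # effective cluster interactions
      lam = 0.1                                     # L1 weight (compressive sensing)
      objective = cp.Minimize(cp.sum_squares(A @ J - E) + lam * cp.norm1(J))
      constraints = [G @ J <= 0]                    # keep reference ground states lowest
      cp.Problem(objective, constraints).solve()
      print(np.round(J.value, 3))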

  9. Biochemical imaging of tissues by SIMS for biomedical applications

    NASA Astrophysics Data System (ADS)

    Lee, Tae Geol; Park, Ji-Won; Shon, Hyun Kyong; Moon, Dae Won; Choi, Won Woo; Li, Kapsok; Chung, Jin Ho

    2008-12-01

    With the development of optimal surface cleaning techniques by cluster ion beam sputtering, applications of SIMS for analyzing cells and tissues have been actively investigated. For this report, we collaborated with biomedical scientists on bio-SIMS analyses of skin and cancer tissues for biomedical diagnostics. We paid close attention to establishing a routine procedure for preparing tissue specimens and treating the surface before obtaining the bio-SIMS data. Bio-SIMS was used to study two biosystems: skin tissues, for understanding the effects of photoaging, and colon cancer tissues, for insight into the development of new cancer diagnostics. Time-of-flight SIMS imaging measurements were taken after surface cleaning with cluster ion bombardment by Bi_n or C60 under varying conditions. The imaging capability of bio-SIMS, with a spatial resolution of a few microns, combined with principal component analysis reveals biologically meaningful information, but the lack of high molecular weight peaks, even with cluster ion bombardment, remains a problem. This, among other issues, shows that discourse with biologists and medical doctors is critical to glean meaningful information from SIMS mass spectrometric and imaging data. For SIMS to be accepted as a routine, daily analysis tool in biomedical laboratories, various practical sample handling methodologies such as surface matrix treatment, including nano-metal particles and metal coating, in addition to cluster sputtering, should be studied.

  10. Evaluation of primary immunization coverage of infants under universal immunization programme in an urban area of bangalore city using cluster sampling and lot quality assurance sampling techniques.

    PubMed

    K, Punith; K, Lalitha; G, Suman; Bs, Pradeep; Kumar K, Jayanth

    2008-07-01

    Is the LQAS technique better than the cluster sampling technique in terms of resources for evaluating immunization coverage in an urban area? To assess and compare lot quality assurance sampling against cluster sampling in the evaluation of primary immunization coverage. Population-based cross-sectional study. Areas under Mathikere Urban Health Center. Children aged 12 months to 23 months. 220 in cluster sampling, 76 in lot quality assurance sampling. Percentages and proportions, chi-square test. (1) Using cluster sampling, the percentages of completely immunized, partially immunized and unimmunized children were 84.09%, 14.09% and 1.82%, respectively. With lot quality assurance sampling, they were 92.11%, 6.58% and 1.31%, respectively. (2) Immunization coverage levels as evaluated by the cluster sampling technique were not statistically different from the coverage values obtained by the lot quality assurance sampling technique. Considering the time and resources required, lot quality assurance sampling was found to be the better technique for evaluating primary immunization coverage in an urban area.
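
    The statistical comparison reported above can be reproduced in outline with a chi-square test on the counts implied by the percentages (rounded, and with small expected counts in the unimmunized column, so the test is only approximate):

      from scipy.stats import chi2_contingency

      # rows: cluster sampling (n=220), LQAS (n=76)
      # cols: completely, partially, unimmunized (counts implied by percentages)
      table = [[185, 31, 4],
               [70, 5, 1]]
      chi2, p, dof, expected = chi2_contingency(table)
      print(f"chi2={chi2:.2f}, dof={dof}, p={p:.3f}")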

  11. Megacity analysis: a clustering approach to classification

    DTIC Science & Technology

    2017-06-01

    kinetic or non-kinetic urban operations. We develop and implement a methodology to classify megacities into groups. Using 33 variables, we construct a... is interested in these megacity networks and their implications for potential urban operations. We develop a methodology to group like megacities

  12. Distributive Education Competency-Based Curriculum Models by Occupational Clusters. Final Report.

    ERIC Educational Resources Information Center

    Davis, Rodney E.; Husted, Stewart W.

    To meet the needs of distributive education teachers and students, a project was initiated to develop competency-based curriculum models for marketing and distributive education clusters. The models which were developed incorporate competencies, materials and resources, teaching methodologies/learning activities, and evaluative criteria for the…

  13. Measuring Systemic and Climate Diversity in Ontario's University Sector

    ERIC Educational Resources Information Center

    Piché, Pierre Gilles

    2015-01-01

    This article proposes a methodology for measuring institutional diversity and applies it to Ontario's university sector. This study first used hierarchical cluster analysis, which suggested there has been very little change in diversity between 1994 and 2010 as universities were clustered in three groups for both years. However, by adapting…

  14. Impurity profiling of a chemical weapon precursor for possible forensic signatures by comprehensive two-dimensional gas chromatography/mass spectrometry and chemometrics.

    PubMed

    Hoggard, Jamin C; Wahl, Jon H; Synovec, Robert E; Mong, Gary M; Fraga, Carlos G

    2010-01-15

    In this report we present the feasibility of using analytical and chemometric methodologies to reveal and exploit the chemical impurity profiles from commercial dimethyl methylphosphonate (DMMP) samples to illustrate the type of forensic information that may be obtained from chemical-attack evidence. Using DMMP as a model compound of a toxicant that may be used in a chemical attack, we used comprehensive two-dimensional gas chromatography/time-of-flight mass spectrometry (GC x GC/TOF-MS) to detect and identify trace organic impurities in six samples of commercially acquired DMMP. The GC x GC/TOF-MS data was analyzed to produce impurity profiles for all six DMMP samples using 29 analyte impurities. The use of PARAFAC for the mathematical resolution of overlapped GC x GC peaks ensured clean spectra for the identification of many of the detected analytes by spectral library matching. The use of statistical pairwise comparison revealed that there were trace impurities that were quantitatively similar and different among five of the six DMMP samples. Two of the DMMP samples were revealed to have identical impurity profiles by this approach. The use of nonnegative matrix factorization indicated that there were five distinct DMMP sample types as illustrated by the clustering of the multiple DMMP analyses into five distinct clusters in the scores plots. The two indistinguishable DMMP samples were confirmed by their chemical supplier to be from the same bulk source. Sample information from the other chemical suppliers supported the idea that the other four DMMP samples were likely from different bulk sources. These results demonstrate that the matching of synthesized products from the same source is possible using impurity profiling. In addition, the identified impurities common to all six DMMP samples provide strong evidence that basic route information can be obtained from impurity profiles. Finally, impurities that may be unique to the sole bulk manufacturer of DMMP were found in some of the DMMP samples.
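
    A small sketch of the clustering step, with synthetic data standing in for the 29 impurity peak areas: nonnegative matrix factorization decomposes the profiles, and taking each sample's dominant component is a crude stand-in for reading clusters off the scores plots inspected in the study.

      import numpy as np
      from sklearn.decomposition import NMF

      rng = np.random.default_rng(0)
      # rows: replicate analyses; cols: 29 impurity peak areas (nonnegative)
      profiles = np.abs(rng.normal(size=(30, 29)))

      model = NMF(n_components=5, init="nndsvda", random_state=0, max_iter=500)
      scores = model.fit_transform(profiles)       # per-sample component loadings
      source = scores.argmax(axis=1)               # dominant component ~ sample type
      print(source)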

  15. Wide-Field Lensing Mass Maps from Dark Energy Survey Science Verification Data

    DOE PAGES

    Chang, C.

    2015-07-29

    We present a mass map reconstructed from weak gravitational lensing shear measurements over 139 deg² from the Dark Energy Survey science verification data. The mass map probes both luminous and dark matter, thus providing a tool for studying cosmology. We also find good agreement between the mass map and the distribution of massive galaxy clusters identified using a red-sequence cluster finder. Potential candidates for superclusters and voids are identified using these maps. We measure the cross-correlation between the mass map and a magnitude-limited foreground galaxy sample and find a detection at the 6.8σ level with 20 arcmin smoothing. These measurements are consistent with simulated galaxy catalogs based on N-body simulations from a cold dark matter model with a cosmological constant. This suggests low systematics uncertainties in the map. Finally, we summarize our key findings in this Letter; the detailed methodology and tests for systematics are presented in a companion paper.

  16. Wide-Field Lensing Mass Maps from Dark Energy Survey Science Verification Data.

    PubMed

    Chang, C; Vikram, V; Jain, B; Bacon, D; Amara, A; Becker, M R; Bernstein, G; Bonnett, C; Bridle, S; Brout, D; Busha, M; Frieman, J; Gaztanaga, E; Hartley, W; Jarvis, M; Kacprzak, T; Kovács, A; Lahav, O; Lin, H; Melchior, P; Peiris, H; Rozo, E; Rykoff, E; Sánchez, C; Sheldon, E; Troxel, M A; Wechsler, R; Zuntz, J; Abbott, T; Abdalla, F B; Allam, S; Annis, J; Bauer, A H; Benoit-Lévy, A; Brooks, D; Buckley-Geer, E; Burke, D L; Capozzi, D; Carnero Rosell, A; Carrasco Kind, M; Castander, F J; Crocce, M; D'Andrea, C B; Desai, S; Diehl, H T; Dietrich, J P; Doel, P; Eifler, T F; Evrard, A E; Fausti Neto, A; Flaugher, B; Fosalba, P; Gruen, D; Gruendl, R A; Gutierrez, G; Honscheid, K; James, D; Kent, S; Kuehn, K; Kuropatkin, N; Maia, M A G; March, M; Martini, P; Merritt, K W; Miller, C J; Miquel, R; Neilsen, E; Nichol, R C; Ogando, R; Plazas, A A; Romer, A K; Roodman, A; Sako, M; Sanchez, E; Sevilla, I; Smith, R C; Soares-Santos, M; Sobreira, F; Suchyta, E; Tarle, G; Thaler, J; Thomas, D; Tucker, D; Walker, A R

    2015-07-31

    We present a mass map reconstructed from weak gravitational lensing shear measurements over 139 deg² from the Dark Energy Survey science verification data. The mass map probes both luminous and dark matter, thus providing a tool for studying cosmology. We find good agreement between the mass map and the distribution of massive galaxy clusters identified using a red-sequence cluster finder. Potential candidates for superclusters and voids are identified using these maps. We measure the cross-correlation between the mass map and a magnitude-limited foreground galaxy sample and find a detection at the 6.8σ level with 20 arcmin smoothing. These measurements are consistent with simulated galaxy catalogs based on N-body simulations from a cold dark matter model with a cosmological constant. This suggests low systematics uncertainties in the map. We summarize our key findings in this Letter; the detailed methodology and tests for systematics are presented in a companion paper.

  17. [Methodological Aspects of the Sampling Design for the 2015 National Mental Health Survey].

    PubMed

    Rodríguez, Nelcy; Rodríguez, Viviana Alejandra; Ramírez, Eugenia; Cediel, Sandra; Gil, Fabián; Rondón, Martín Alonso

    2016-12-01

    The WHO has encouraged the development, implementation and evaluation of policies related to mental health all over the world. In Colombia, within this framework, promoted by the Ministry of Health and Social Protection and supported by Colciencias, the fourth National Mental Health Survey (NMHST) was conducted as an observational cross-sectional study. A summary of the methodology used for the sampling process is presented, following the relevant guidelines for sampling design. The fourth NMHST drew its sample from the Homes Master Sample for Studies in Health from the National System of Studies and Population Surveys for Health, developed and implemented in 2013 by the Ministry of Social Protection. The study included the non-institutionalised civilian population divided into four age groups: children aged 7-11 years, adolescents aged 12-17 years, adults aged 18-44 years, and adults aged 45 years or older. The sample size calculation was based on the prevalences reported in other studies for the outcomes of mental disorders, depression, suicide, associated morbidity, and alcohol use. A probabilistic, stratified, multistage cluster selection process was used, and expansion factors to the total population were calculated. A total of 15,351 completed surveys were collected, distributed across the age groups as follows: 2727 for 7-11 years, 1754 for 12-17 years, 5889 for 18-44 years, and 4981 for ≥45 years. The surveys were distributed over five regions: Atlantic, Oriental, Bogotá, Central and Pacific. A sufficient number of surveys were collected to obtain a more precise approximation of mental problems and disorders at the regional and national level.

  18. Simulating Asymmetric Top Impurities in Superfluid Clusters: A para-Water Dopant in para-Hydrogen.

    PubMed

    Zeng, Tao; Li, Hui; Roy, Pierre-Nicholas

    2013-01-03

    We present the first simulation study of bosonic clusters doped with an asymmetric top molecule. The path-integral Monte Carlo method with the latest methodological advance in treating rigid-body rotation [Noya, E. G.; Vega, C.; McBride, C. J. Chem. Phys. 2011, 134, 054117] is employed to study a para-water impurity in para-hydrogen clusters with up to 20 para-hydrogen molecules. The growth pattern of the doped clusters is similar in nature to that of pure clusters. The para-water molecule appears to rotate freely in the cluster. The presence of para-water substantially quenches the superfluid response of para-hydrogen with respect to the space-fixed frame.

  19. Creating peer groups for assessing and comparing nursing home performance.

    PubMed

    Byrne, Margaret M; Daw, Christina; Pietz, Ken; Reis, Brian; Petersen, Laura A

    2013-11-01

    Publicly reported performance data for hospitals and nursing homes are becoming ubiquitous. For such comparisons to be fair, facilities must be compared with their peers. To adapt a previously published methodology for developing hospital peer groupings so that it is applicable to nursing homes and to explore the characteristics of "nearest-neighbor" peer groupings. Analysis of Department of Veterans Affairs administrative databases and nursing home facility characteristics. The nearest-neighbor methodology for developing peer groupings involves calculating the Euclidean distance between facilities based on facility characteristics. We describe our steps in selection of facility characteristics, describe the characteristics of nearest-neighbor peer groups, and compare them with peer groups derived through classical cluster analysis. The facility characteristics most pertinent to nursing home groupings were found to be different from those that were most relevant for hospitals. Unlike classical cluster groups, nearest neighbor groups are not mutually exclusive, and the nearest-neighbor methodology resulted in nursing home peer groupings that were substantially less diffuse than nursing home peer groups created using traditional cluster analysis. It is essential that healthcare policy makers and administrators have a means of fairly grouping facilities for the purposes of quality, cost, or efficiency comparisons. In this research, we show that a previously published methodology can be successfully applied to a nursing home setting. The same approach could be applied in other clinical settings such as primary care.
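
    The nearest-neighbour grouping itself is simple to sketch: standardize the facility characteristics, compute pairwise Euclidean distances, and take each facility's k closest facilities as its peer group (which, as noted above, makes groups overlapping rather than mutually exclusive). The features and k below are illustrative.

      import numpy as np
      from scipy.spatial.distance import cdist

      rng = np.random.default_rng(0)
      X = rng.normal(size=(100, 4))           # e.g. beds, case mix, staffing, rurality
      Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize characteristics

      D = cdist(Z, Z)                         # pairwise Euclidean distances
      k = 10
      peers = np.argsort(D, axis=1)[:, 1:k+1] # skip column 0 (the facility itself)
      print(peers[0])                         # peer group for facility 0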

  20. Methodological framework for projecting the potential loss of intraspecific genetic diversity due to global climate change

    PubMed Central

    2012-01-01

    Background While research on the impact of global climate change (GCC) on ecosystems and species is flourishing, a fundamental component of biodiversity – molecular variation – has not yet received its due attention in such studies. Here we present a methodological framework for projecting the loss of intraspecific genetic diversity due to GCC. Methods The framework consists of multiple steps that combine 1) hierarchical genetic clustering methods to define comparable units of inference, 2) species accumulation curves (SAC) to infer sampling completeness, and 3) species distribution modelling (SDM) to project the genetic diversity loss under GCC. We suggest procedures for existing data sets as well as for specifically designed studies. We illustrate the approach with two worked examples, from a land snail (Trochulus villosus) and a caddisfly (Smicridea (S.) mucronata). Results Sampling completeness was diagnosed at the third coarsest haplotype clade level for T. villosus and at the second coarsest for S. mucronata. For both species, a substantial loss of species range was projected under the chosen climate scenario. However, despite substantial differences in data set quality concerning spatial sampling and sampling depth, no loss of haplotype clades due to GCC was predicted for either species. Conclusions The suggested approach presents a feasible method to tap the rich resources of existing phylogeographic data sets and to guide the design and analysis of studies explicitly aimed at estimating the impact of GCC on a currently still neglected level of biodiversity. PMID:23176586

  1. Stochastic coupled cluster theory: Efficient sampling of the coupled cluster expansion

    NASA Astrophysics Data System (ADS)

    Scott, Charles J. C.; Thom, Alex J. W.

    2017-09-01

    We consider the sampling of the coupled cluster expansion within stochastic coupled cluster theory. Observing the limitations of previous approaches due to the inherently non-linear behavior of a coupled cluster wavefunction representation, we propose new approaches based on an intuitive, well-defined condition for sampling weights and on sampling the expansion in cluster operators of different excitation levels. We term these modifications even and truncated selections, respectively. Utilising both approaches demonstrates dramatically improved calculation stability as well as reduced computational and memory costs. These modifications are particularly effective at higher truncation levels owing to the large number of terms within the cluster expansion that can be neglected, as demonstrated by the reduction of the number of terms to be sampled when truncating at triple excitations by 77% and hextuple excitations by 98%.

  2. Damage detection methodology under variable load conditions based on strain field pattern recognition using FBGs, nonlinear principal component analysis, and clustering techniques

    NASA Astrophysics Data System (ADS)

    Sierra-Pérez, Julián; Torres-Arredondo, M.-A.; Alvarez-Montoya, Joham

    2018-01-01

    Structural health monitoring consists of using sensors integrated within structures, together with algorithms, to perform load monitoring, damage detection, damage location, damage size and severity assessment, and prognosis. One possibility is to use strain sensors to infer structural integrity by comparing patterns in the strain field between the pristine and damaged conditions. In previous works, the authors have demonstrated that it is possible to detect small defects based on strain field pattern recognition by using robust machine learning techniques. They have focused on methodologies based on principal component analysis (PCA) and on the development of several unfolding and standardization techniques, which allow dealing with multiple load conditions. However, before a real implementation of this approach in engineering structures, changes in the strain field due to conditions other than damage occurrence need to be isolated. Since load conditions may vary in most engineering structures and promote significant changes in the strain field, it is necessary to implement novel techniques for uncoupling such changes from those produced by damage occurrence. A damage detection methodology based on optimal baseline selection (OBS) by means of clustering techniques is presented. The methodology includes the use of hierarchical nonlinear PCA as a nonlinear modeling technique in conjunction with Q and nonlinear-T2 damage indices. The methodology is experimentally validated using strain measurements obtained by 32 fiber Bragg grating sensors bonded to an aluminum beam under dynamic bending loads and simultaneously subjected to variations in its pitch angle. The results demonstrated the capability of the methodology to cluster data according to 13 different load conditions (pitch angles), perform the OBS and detect six different damages induced in a cumulative way. The proposed methodology showed a true positive rate of 100% and a false positive rate of 1.28% at a 99% confidence level.
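
    The two damage indices named above can be illustrated with a linear-PCA baseline (the paper uses hierarchical nonlinear PCA, so this is a simplified stand-in): Q is the squared reconstruction residual and T2 is the Hotelling statistic on the retained scores. Dimensions, thresholds and the synthetic "damage" shift are assumptions.

      import numpy as np
      from sklearn.decomposition import PCA

      rng = np.random.default_rng(0)
      baseline = rng.normal(size=(500, 32))     # pristine strain snapshots, 32 FBGs
      test = rng.normal(size=(100, 32)) + 0.3   # shifted "damaged" snapshots

      pca = PCA(n_components=5).fit(baseline)
      scores = pca.transform(test)
      recon = pca.inverse_transform(scores)

      Q = np.sum((test - recon) ** 2, axis=1)                   # residual (SPE) index
      T2 = np.sum(scores**2 / pca.explained_variance_, axis=1)  # Hotelling's T2
      print(Q.mean(), T2.mean())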

  3. Occurrence of Radio Minihalos in a Mass-limited Sample of Galaxy Clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Giacintucci, Simona; Clarke, Tracy E.; Markevitch, Maxim

    2017-06-01

    We investigate the occurrence of radio minihalos—diffuse radio sources of unknown origin observed in the cores of some galaxy clusters—in a statistical sample of 58 clusters drawn from the Planck Sunyaev–Zel’dovich cluster catalog using a mass cut (M_500 > 6 × 10^14 M_⊙). We supplement our statistical sample with a similarly sized nonstatistical sample mostly consisting of clusters in the ACCEPT X-ray catalog with suitable X-ray and radio data, which includes lower-mass clusters. Where necessary (for nine clusters), we reanalyzed the Very Large Array archival radio data to determine whether a minihalo is present. Our total sample includes all 28 currently known and recently discovered radio minihalos, including six candidates. We classify clusters as cool-core or non-cool-core according to the value of the specific entropy floor in the cluster center, rederived or newly derived from the Chandra X-ray density and temperature profiles where necessary (for 27 clusters). Contrary to the common wisdom that minihalos are rare, we find that almost all cool cores—at least 12 out of 15 (80%)—in our complete sample of massive clusters exhibit minihalos. The supplementary sample shows that the occurrence of minihalos may be lower in lower-mass cool-core clusters. No minihalos are found in non-cool cores or “warm cores.” These findings will help test theories of the origin of minihalos and provide information on the physical processes and energetics of the cluster cores.

  4. Combining cluster number counts and galaxy clustering

    NASA Astrophysics Data System (ADS)

    Lacasa, Fabien; Rosenfeld, Rogerio

    2016-08-01

    The abundance of clusters and the clustering of galaxies are two of the important cosmological probes for current and future large scale surveys of galaxies, such as the Dark Energy Survey. In order to combine them one has to account for the fact that they are not independent quantities, since they probe the same density field. It is important to develop a good understanding of their correlation in order to extract parameter constraints. We present a detailed modelling of the joint covariance matrix between cluster number counts and the galaxy angular power spectrum. We employ the framework of the halo model complemented by a Halo Occupation Distribution model (HOD). We demonstrate the importance of accounting for non-Gaussianity to produce accurate covariance predictions. Indeed, we show that the non-Gaussian covariance becomes dominant at small scales, low redshifts or high cluster masses. We discuss in particular the case of the super-sample covariance (SSC), including the effects of galaxy shot-noise, halo second order bias and non-local bias. We demonstrate that the SSC obeys mathematical inequalities and positivity. Using the joint covariance matrix and a Fisher matrix methodology, we examine the prospects of combining these two probes to constrain cosmological and HOD parameters. We find that the combination indeed results in noticeably better constraints, with improvements of order 20% on cosmological parameters compared to the best single probe, and even greater improvement on HOD parameters, with reduction of error bars by a factor 1.4-4.8. This happens in particular because the cross-covariance introduces a synergy between the probes on small scales. We conclude that accounting for non-Gaussian effects is required for the joint analysis of these observables in galaxy surveys.
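
    A toy Fisher-matrix sketch of why the joint analysis helps: build a joint covariance for two probes, form F = (dmu)^T C^(-1) (dmu), and compare the forecast parameter errors from one probe alone with those from the combination. The derivatives and covariances are random placeholders, not the halo-model predictions of the paper.

      import numpy as np

      rng = np.random.default_rng(0)
      n1, n2, npar = 10, 10, 3
      dmu = rng.normal(size=(n1 + n2, npar))         # d(observable)/d(parameter)
      A = rng.normal(size=(n1 + n2, n1 + n2))
      C = A @ A.T + 10 * np.eye(n1 + n2)             # joint covariance, incl. cross terms

      F_joint = dmu.T @ np.linalg.solve(C, dmu)      # F = dmu^T C^-1 dmu
      F_probe1 = dmu[:n1].T @ np.linalg.solve(C[:n1, :n1], dmu[:n1])

      err_joint = np.sqrt(np.diag(np.linalg.inv(F_joint)))
      err_probe1 = np.sqrt(np.diag(np.linalg.inv(F_probe1)))
      print(err_probe1 / err_joint)                  # per-parameter improvement factors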

  5. Group sequential designs for stepped-wedge cluster randomised trials

    PubMed Central

    Grayling, Michael J; Wason, James MS; Mander, Adrian P

    2017-01-01

    Background/Aims: The stepped-wedge cluster randomised trial design has received substantial attention in recent years. Although various extensions to the original design have been proposed, no guidance is available on the design of stepped-wedge cluster randomised trials with interim analyses. In an individually randomised trial setting, group sequential methods can provide notable efficiency gains and ethical benefits. We address this by discussing how established group sequential methodology can be adapted for stepped-wedge designs. Methods: Utilising the error spending approach to group sequential trial design, we detail the assumptions required for the determination of stepped-wedge cluster randomised trials with interim analyses. We consider early stopping for efficacy, futility, or efficacy and futility. We describe first how this can be done for any specified linear mixed model for data analysis. We then focus on one particular commonly utilised model and, using a recently completed stepped-wedge cluster randomised trial, compare the performance of several designs with interim analyses to the classical stepped-wedge design. Finally, the performance of a quantile substitution procedure for dealing with the case of unknown variance is explored. Results: We demonstrate that the incorporation of early stopping in stepped-wedge cluster randomised trial designs could reduce the expected sample size under the null and alternative hypotheses by up to 31% and 22%, respectively, with no cost to the trial’s type-I and type-II error rates. The use of restricted error maximum likelihood estimation was found to be more important than quantile substitution for controlling the type-I error rate. Conclusion: The addition of interim analyses into stepped-wedge cluster randomised trials could help guard against time-consuming trials conducted on poor performing treatments and also help expedite the implementation of efficacious treatments. In future, trialists should consider incorporating early stopping of some kind into stepped-wedge cluster randomised trials according to the needs of the particular trial. PMID:28653550

  6. Group sequential designs for stepped-wedge cluster randomised trials.

    PubMed

    Grayling, Michael J; Wason, James Ms; Mander, Adrian P

    2017-10-01

    The stepped-wedge cluster randomised trial design has received substantial attention in recent years. Although various extensions to the original design have been proposed, no guidance is available on the design of stepped-wedge cluster randomised trials with interim analyses. In an individually randomised trial setting, group sequential methods can provide notable efficiency gains and ethical benefits. We address this by discussing how established group sequential methodology can be adapted for stepped-wedge designs. Utilising the error spending approach to group sequential trial design, we detail the assumptions required for the determination of stepped-wedge cluster randomised trials with interim analyses. We consider early stopping for efficacy, futility, or efficacy and futility. We describe first how this can be done for any specified linear mixed model for data analysis. We then focus on one particular commonly utilised model and, using a recently completed stepped-wedge cluster randomised trial, compare the performance of several designs with interim analyses to the classical stepped-wedge design. Finally, the performance of a quantile substitution procedure for dealing with the case of unknown variance is explored. We demonstrate that the incorporation of early stopping in stepped-wedge cluster randomised trial designs could reduce the expected sample size under the null and alternative hypotheses by up to 31% and 22%, respectively, with no cost to the trial's type-I and type-II error rates. The use of restricted error maximum likelihood estimation was found to be more important than quantile substitution for controlling the type-I error rate. The addition of interim analyses into stepped-wedge cluster randomised trials could help guard against time-consuming trials conducted on poor performing treatments and also help expedite the implementation of efficacious treatments. In future, trialists should consider incorporating early stopping of some kind into stepped-wedge cluster randomised trials according to the needs of the particular trial.
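
    The error-spending ingredient of these designs can be sketched numerically. A spending function f(t) = alpha * t^2 allocates the type-I error across interim looks; for brevity, the per-look boundaries below ignore the correlation between analyses (real group sequential designs integrate the joint distribution of the test statistics, so these numbers are only indicative).

      import numpy as np
      from scipy.stats import norm

      alpha = 0.05
      looks = np.array([0.25, 0.5, 0.75, 1.0])  # information fractions at each look
      spent = alpha * looks**2                  # cumulative error spent by each look
      increments = np.diff(np.concatenate(([0.0], spent)))
      z_bounds = norm.ppf(1 - increments)       # crude per-look efficacy boundaries
      for t, z in zip(looks, z_bounds):
          print(f"t={t:.2f}: stop for efficacy if Z > {z:.2f}")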

  7. The quality of reporting in cluster randomised crossover trials: proposal for reporting items and an assessment of reporting quality.

    PubMed

    Arnup, Sarah J; Forbes, Andrew B; Kahan, Brennan C; Morgan, Katy E; McKenzie, Joanne E

    2016-12-06

    The cluster randomised crossover (CRXO) design is gaining popularity in trial settings where individual randomisation or parallel group cluster randomisation is not feasible or practical. Our aim is to stimulate discussion on the content of a reporting guideline for CRXO trials and to assess the reporting quality of published CRXO trials. We undertook a systematic review of CRXO trials. Searches of MEDLINE, EMBASE, and CINAHL Plus as well as citation searches of CRXO methodological articles were conducted to December 2014. Reporting quality was assessed against both modified items from 2010 CONSORT and 2012 cluster trials extension and other proposed quality measures. Of the 3425 records identified through database searching, 83 trials met the inclusion criteria. Trials were infrequently identified as "cluster randomis(z)ed crossover" in title (n = 7, 8%) or abstract (n = 21, 25%), and a rationale for the design was infrequently provided (n = 20, 24%). Design parameters such as the number of clusters and number of periods were well reported. Discussion of carryover took place in only 17 trials (20%). Sample size methods were only reported in 58% (n = 48) of trials. A range of approaches were used to report baseline characteristics. The analysis method was not adequately reported in 23% (n = 19) of trials. The observed within-cluster within-period intracluster correlation and within-cluster between-period intracluster correlation for the primary outcome data were not reported in any trial. The potential for selection, performance, and detection bias could be evaluated in 30%, 81%, and 70% of trials, respectively. There is a clear need to improve the quality of reporting in CRXO trials. Given the unique features of a CRXO trial, it is important to develop a CONSORT extension. Consensus amongst trialists on the content of such a guideline is essential.

  8. Evaluation of Primary Immunization Coverage of Infants Under Universal Immunization Programme in an Urban Area of Bangalore City Using Cluster Sampling and Lot Quality Assurance Sampling Techniques

    PubMed Central

    K, Punith; K, Lalitha; G, Suman; BS, Pradeep; Kumar K, Jayanth

    2008-01-01

    Research Question: Is the LQAS technique better than the cluster sampling technique in terms of resources for evaluating immunization coverage in an urban area? Objective: To assess and compare lot quality assurance sampling against cluster sampling in the evaluation of primary immunization coverage. Study Design: Population-based cross-sectional study. Study Setting: Areas under Mathikere Urban Health Center. Study Subjects: Children aged 12 months to 23 months. Sample Size: 220 in cluster sampling, 76 in lot quality assurance sampling. Statistical Analysis: Percentages and proportions, chi-square test. Results: (1) Using cluster sampling, the percentages of completely immunized, partially immunized and unimmunized children were 84.09%, 14.09% and 1.82%, respectively. With lot quality assurance sampling, they were 92.11%, 6.58% and 1.31%, respectively. (2) Immunization coverage levels as evaluated by the cluster sampling technique were not statistically different from the coverage values obtained by the lot quality assurance sampling technique. Considering the time and resources required, lot quality assurance sampling was found to be the better technique for evaluating primary immunization coverage in an urban area. PMID:19876474

  9. A priori evaluation of two-stage cluster sampling for accuracy assessment of large-area land-cover maps

    USGS Publications Warehouse

    Wickham, J.D.; Stehman, S.V.; Smith, J.H.; Wade, T.G.; Yang, L.

    2004-01-01

    Two-stage cluster sampling reduces the cost of collecting accuracy assessment reference data by constraining sample elements to fall within a limited number of geographic domains (clusters). However, because classification error is typically positively spatially correlated, within-cluster correlation may reduce the precision of the accuracy estimates. The detailed population information to quantify a priori the effect of within-cluster correlation on precision is typically unavailable. Consequently, a convenient, practical approach to evaluate the likely performance of a two-stage cluster sample is needed. We describe such an a priori evaluation protocol focusing on the spatial distribution of the sample by land-cover class across different cluster sizes and costs of different sampling options, including options not imposing clustering. This protocol also assesses the two-stage design's adequacy for estimating the precision of accuracy estimates for rare land-cover classes. We illustrate the approach using two large-area, regional accuracy assessments from the National Land-Cover Data (NLCD), and describe how the a priori evaluation was used as a decision-making tool when implementing the NLCD design.
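
    The precision cost of within-cluster correlation can be made concrete with the usual design effect, DEFF = 1 + (m - 1) * rho for clusters of m units: the effective sample size shrinks accordingly. The rho values and cluster size below are illustrative.

      def design_effect(m, rho):
          """DEFF = 1 + (m - 1) * rho for clusters of size m."""
          return 1 + (m - 1) * rho

      n_total, m = 1000, 25          # total reference sample, pixels per cluster
      for rho in (0.0, 0.1, 0.3):
          deff = design_effect(m, rho)
          print(f"rho={rho:.1f}: DEFF={deff:.1f}, effective n={n_total / deff:.0f}")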

  10. Efficient evaluation of sampling quality of molecular dynamics simulations by clustering of dihedral torsion angles and Sammon mapping.

    PubMed

    Frickenhaus, Stephan; Kannan, Srinivasaraghavan; Zacharias, Martin

    2009-02-01

    A direct conformational clustering and mapping approach for peptide conformations based on backbone dihedral angles has been developed and applied to compare conformational sampling of Met-enkephalin using two molecular dynamics (MD) methods. Efficient clustering in dihedrals has been achieved by evaluating all combinations resulting from independent clustering of each dihedral angle distribution, thus resolving all conformational substates. In contrast, Cartesian clustering was unable to accurately distinguish between all substates. Projection of clusters on dihedral principal component (PCA) subspaces did not result in efficient separation of highly populated clusters. However, representation in a nonlinear metric by Sammon mapping was able to separate well the 48 highest populated clusters in just two dimensions. In addition, this approach also allowed us to visualize the transition frequencies between clusters efficiently. Significantly, higher transition frequencies between more distinct conformational substates were found for a recently developed biasing-potential replica exchange MD simulation method allowing faster sampling of possible substates compared to conventional MD simulations. Although the number of theoretically possible clusters grows exponentially with peptide length, in practice, the number of clusters is only limited by the sampling size (typically much smaller), and therefore the method is well suited also for large systems. The approach could be useful to rapidly and accurately evaluate conformational sampling during MD simulations, to compare different sampling strategies and eventually to detect kinetic bottlenecks in folding pathways.
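
    A minimal sketch of the combination idea: cluster each dihedral separately on its (sin, cos) representation to respect periodicity, treat each unique tuple of per-angle labels as one conformational substate, and count transitions between consecutive frames. The angles here are synthetic and the per-angle cluster count is an assumption.

      import numpy as np
      from sklearn.cluster import KMeans

      rng = np.random.default_rng(0)
      n_frames, n_dihedrals = 10_000, 8
      angles = rng.uniform(-np.pi, np.pi, size=(n_frames, n_dihedrals))

      # circular data: cluster on (sin, cos) of each angle
      per_angle_labels = np.column_stack([
          KMeans(n_clusters=3, n_init=10, random_state=0)
          .fit_predict(np.column_stack([np.sin(a), np.cos(a)]))
          for a in angles.T
      ])

      # each distinct tuple of per-angle labels defines one substate
      substates, state_ids = np.unique(per_angle_labels, axis=0, return_inverse=True)
      transitions = np.count_nonzero(state_ids[1:] != state_ids[:-1])
      print(len(substates), "substates;", transitions, "transitions")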

  11. Changes to Serum Sample Tube and Processing Methodology Does Not Cause Inter-Individual Variation in Automated Whole Serum N-Glycan Profiling in Health and Disease

    PubMed Central

    Shubhakar, Archana; Kalla, Rahul; Nimmo, Elaine R.; Fernandes, Daryl L.; Satsangi, Jack; Spencer, Daniel I. R.

    2015-01-01

    Introduction Serum N-glycans have been identified as putative biomarkers for numerous diseases. The impact of different serum sample tubes and processing methods on N-glycan analysis has received relatively little attention. This study aimed to determine the effect of different sample tubes and processing methods on the whole serum N-glycan profile in both health and disease. A secondary objective was to describe a robot automated N-glycan release, labeling and cleanup process for use in a biomarker discovery system. Methods 25 patients with active and quiescent inflammatory bowel disease and controls had three different serum sample tubes taken at the same draw. Two different processing methods were used for three types of tube (with and without gel-separation medium). Samples were randomised and processed in a blinded fashion. Whole serum N-glycan release, 2-aminobenzamide labeling and cleanup was automated using a Hamilton Microlab STARlet Liquid Handling robot. Samples were analysed using a hydrophilic interaction liquid chromatography/ethylene bridged hybrid (BEH) column on an ultra-high performance liquid chromatography instrument. Data were analysed quantitatively by pairwise correlation and hierarchical clustering using the area under each chromatogram peak. Qualitatively, a blinded assessor attempted to match chromatograms to each individual. Results There was small intra-individual variation in serum N-glycan profiles from samples collected using different sample processing methods. Intra-individual correlation coefficients were between 0.99 and 1. Unsupervised hierarchical clustering and principal coordinate analyses accurately matched samples from the same individual. Qualitative analysis demonstrated good chromatogram overlay and a blinded assessor was able to accurately match individuals based on chromatogram profile, regardless of disease status. Conclusions The three different serum sample tubes processed using the described methods cause minimal intra-individual variation in serum whole N-glycan profile when processed using an automated workstream. This has important implications for N-glycan biomarker discovery studies using different serum processing standard operating procedures. PMID:25831126

  12. Changes to serum sample tube and processing methodology does not cause Intra-Individual [corrected] variation in automated whole serum N-glycan profiling in health and disease.

    PubMed

    Ventham, Nicholas T; Gardner, Richard A; Kennedy, Nicholas A; Shubhakar, Archana; Kalla, Rahul; Nimmo, Elaine R; Fernandes, Daryl L; Satsangi, Jack; Spencer, Daniel I R

    2015-01-01

    Serum N-glycans have been identified as putative biomarkers for numerous diseases. The impact of different serum sample tubes and processing methods on N-glycan analysis has received relatively little attention. This study aimed to determine the effect of different sample tubes and processing methods on the whole serum N-glycan profile in both health and disease. A secondary objective was to describe a robot automated N-glycan release, labeling and cleanup process for use in a biomarker discovery system. 25 patients with active and quiescent inflammatory bowel disease and controls had three different serum sample tubes taken at the same draw. Two different processing methods were used for three types of tube (with and without gel-separation medium). Samples were randomised and processed in a blinded fashion. Whole serum N-glycan release, 2-aminobenzamide labeling and cleanup was automated using a Hamilton Microlab STARlet Liquid Handling robot. Samples were analysed using a hydrophilic interaction liquid chromatography/ethylene bridged hybrid (BEH) column on an ultra-high performance liquid chromatography instrument. Data were analysed quantitatively by pairwise correlation and hierarchical clustering using the area under each chromatogram peak. Qualitatively, a blinded assessor attempted to match chromatograms to each individual. There was small intra-individual variation in serum N-glycan profiles from samples collected using different sample processing methods. Intra-individual correlation coefficients were between 0.99 and 1. Unsupervised hierarchical clustering and principal coordinate analyses accurately matched samples from the same individual. Qualitative analysis demonstrated good chromatogram overlay and a blinded assessor was able to accurately match individuals based on chromatogram profile, regardless of disease status. The three different serum sample tubes processed using the described methods cause minimal intra-individual variation in serum whole N-glycan profile when processed using an automated workstream. This has important implications for N-glycan biomarker discovery studies using different serum processing standard operating procedures.
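
    The quantitative analysis described above reduces to pairwise correlation of peak-area profiles followed by unsupervised hierarchical clustering. The sketch below, on synthetic profiles with three "tubes" per individual, checks that replicates group together; peak counts and noise level are assumptions.

      import numpy as np
      from scipy.cluster.hierarchy import linkage, fcluster
      from scipy.spatial.distance import squareform

      rng = np.random.default_rng(0)
      base = rng.random((25, 40))                    # 25 individuals, 40 glycan peaks
      profiles = np.repeat(base, 3, axis=0)          # 3 tubes per individual
      profiles += rng.normal(scale=0.01, size=profiles.shape)  # processing noise

      corr = np.corrcoef(profiles)                   # pairwise profile correlation
      dist = 1.0 - corr                              # correlation distance
      np.fill_diagonal(dist, 0.0)
      Z = linkage(squareform(dist, checks=False), method="average")
      groups = fcluster(Z, t=25, criterion="maxclust")
      print(groups.reshape(25, 3))                   # each row should be constant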

  13. Exploring innovative ways to conduct coverage surveys for neglected tropical diseases in Malawi, Mali, and Uganda.

    PubMed

    Woodhall, Dana M; Mkwanda, Square; Dembele, Massitan; Lwanga, Harriet; Drexler, Naomi; Dubray, Christine; Harris, Jennifer; Worrell, Caitlin; Mathieu, Els

    2014-04-01

    Currently, a 30-cluster survey to monitor drug coverage after mass drug administration for neglected tropical diseases is the most common methodology used by control programs. We investigated alternative survey methodologies that could potentially provide an estimation of drug coverage. Three alternative survey methods (market, village chief, and religious leader) were conducted and compared to the 30-cluster method in Malawi, Mali, and Uganda. In Malawi, drug coverage for the 30-cluster, market, village chief, and religious leader methods were 66.8% (95% CI 60.3-73.4), 74.3%, 76.3%, and 77.8%, respectively. In Mali, results for round 1 were 62.6% (95% CI 54.4-70.7), 56.1%, 74.8%, and 83.2%, and 57.2% (95% CI 49.0-65.4), 54.5%, 72.2%, and 73.3%, respectively, for round 2. Uganda survey results were 65.7% (59.4-72.0), 43.7%, 67.2%, and 77.6% respectively. Further research is needed to test different coverage survey methodologies to determine which survey methods are the most scientifically rigorous and resource efficient.

  14. Designing Trend-Monitoring Sounds for Helicopters: Methodological Issues and an Application

    ERIC Educational Resources Information Center

    Edworthy, Judy; Hellier, Elizabeth; Aldrich, Kirsteen; Loxley, Sarah

    2004-01-01

    This article explores methodological issues in sonification and sound design arising from the design of helicopter monitoring sounds. Six monitoring sounds (each with 5 levels) were tested for similarity and meaning with 3 different techniques: hierarchical cluster analysis, linkage analysis, and multidimensional scaling. In Experiment 1,…

  15. A TRAJECTORY-CLUSTERING CORRELATION METHODOLOGY FOR EXAMINING THE LONG-RANGE TRANSPORT OF AIR POLLUTANTS. (R825260)

    EPA Science Inventory

    We present a robust methodology for examining the relationship between synoptic-scale atmospheric transport patterns and pollutant concentration levels observed at a site. Our approach entails calculating a large number of back-trajectories from the observational site over a long...

  16. Automatic Clustering of Rolling Element Bearings Defects with Artificial Neural Network

    NASA Astrophysics Data System (ADS)

    Antonini, M.; Faglia, R.; Pedersoli, M.; Tiboni, M.

    2006-06-01

    The paper presents the optimization of a methodology for automatic clustering based on Artificial Neural Networks to detect the presence of defects in rolling bearings. The research activity was developed in co-operation with an Italian company expert in the production of water pumps for automotive use (Industrie Saleri Italo). The final goal of the work is to develop a system for the automatic control of the pumps at the end of the production line. With this in view, we gradually consider the main elements of the water pump that can cause malfunctioning. The first element considered is the rolling bearing, a very critical component of the system. The experimental activity is based on vibration measurements of deliberately damaged rolling bearings; the vibration signals are then processed, and the final phase is automatic clustering. Different signal processing techniques are compared to optimize the methodology.

  17. Modelling the angular effects on satellite retrieved LST at global scale using a land surface classification

    NASA Astrophysics Data System (ADS)

    Ermida, Sofia; DaCamara, Carlos C.; Trigo, Isabel F.; Pires, Ana C.; Ghent, Darren

    2017-04-01

    Land Surface Temperature (LST) is a key climatological variable and a diagnostic parameter of land surface conditions. Remote sensing constitutes the most effective method to observe LST over large areas and on a regular basis. Although LST estimation from remote sensing instruments operating in the Infrared (IR) is widely used and has been performed for nearly 3 decades, there is still a list of open issues. One of these is the LST dependence on viewing and illumination geometry. This effect introduces significant discrepancies among LST estimations from different sensors, overlapping in space and time, that are not related to uncertainties in the methodologies or input data used. Furthermore, these directional effects deviate LST products from an ideally defined LST, which should represent the ensemble of directional radiometric temperatures of all surface elements within the FOV. Angular effects on LST are here conveniently estimated by means of a kernel model of the surface thermal emission, which describes the angular dependence of LST as a function of viewing and illumination geometry. The model is calibrated using LST data as provided by a wide range of sensors to optimize spatial coverage, namely: 1) a LEO sensor - the Moderate Resolution Imaging Spectroradiometer (MODIS) on-board NASA's TERRA and AQUA; and 2) 3 GEO sensors - the Spinning Enhanced Visible and Infrared Imager (SEVIRI) on-board EUMETSAT's Meteosat Second Generation (MSG), the Japanese Meteorological Imager (JAMI) on-board the Japanese Meteorological Association (JMA) Multifunction Transport SATellite (MTSAT-2), and NASA's Geostationary Operational Environmental Satellites (GOES). As shown in our previous feasibility studies, the sampling of illumination and view angles has a high impact on the obtained model parameters. This impact may be mitigated when the sampling size is increased by aggregating pixels with similar surface conditions. Here we propose a methodology where the land surface is stratified by means of a cluster analysis using information on land cover type, fraction of vegetation cover and topography. The kernel model is then adjusted to the LST data corresponding to each cluster. It is shown that the quality of the cluster-based kernel model is very close to that of the pixel-based one. Furthermore, the reduced number of parameters (limited to the number of identified clusters, instead of a pixel-by-pixel model calibration) allows the kernel model to be improved through the incorporation of a seasonal component. The application of the procedure discussed here towards the harmonization of LST products from multiple sensors is part of the framework of the ESA DUE GlobTemperature project.

  18. Simultaneous clustering of gene expression data with clinical chemistry and pathological evaluations reveals phenotypic prototypes

    PubMed Central

    Bushel, Pierre R; Wolfinger, Russell D; Gibson, Greg

    2007-01-01

    Background Commonly employed clustering methods for analysis of gene expression data do not directly incorporate phenotypic data about the samples. Furthermore, clustering of samples with known phenotypes is typically performed in an informal fashion. The inability of clustering algorithms to incorporate biological data in the grouping process can limit proper interpretation of the data and its underlying biology. Results We present a more formal approach, the modk-prototypes algorithm, for clustering biological samples based on simultaneously considering microarray gene expression data and classes of known phenotypic variables such as clinical chemistry evaluations and histopathologic observations. The strategy involves constructing an objective function with the sum of the squared Euclidean distances for numeric microarray and clinical chemistry data and simple matching for histopathology categorical values in order to measure dissimilarity of the samples. Separate weighting terms are used for microarray, clinical chemistry and histopathology measurements to control the influence of each data domain on the clustering of the samples. The dynamic validity index for numeric data was modified with a category utility measure for determining the number of clusters in the data sets. A cluster's prototype, formed from the mean of the values for numeric features and the mode of the categorical values of all the samples in the group, is representative of the phenotype of the cluster members. The approach is shown to work well with a simulated mixed data set and two real data examples containing numeric and categorical data types: one from a heart disease study and another from a study of acetaminophen (an analgesic) exposure in rat liver, which causes centrilobular necrosis. Conclusion The modk-prototypes algorithm partitioned the simulated data into clusters with samples in their respective class group and the heart disease samples into two groups (sick and buff, denoting samples having pain types representative of angina and non-angina, respectively) with an accuracy of 79%. This is on par with, or better than, the assignment accuracy of the heart disease samples by several well-known and successful clustering algorithms. Following modk-prototypes clustering of the acetaminophen-exposed samples, informative genes from the cluster prototypes were identified that are descriptive of, and phenotypically anchored to, levels of necrosis of the centrilobular region of the rat liver. The biological processes cell growth and/or maintenance, amine metabolism, and stress response were shown to distinguish between no and moderate levels of acetaminophen-induced centrilobular necrosis. The use of well-known and traditional measurements directly in the clustering provides some guarantee that the resulting clusters will be meaningfully interpretable. PMID:17408499
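
    A minimal sketch of the mixed-type dissimilarity at the heart of modk-prototypes: weighted squared Euclidean distance on the numeric blocks plus simple matching on histopathology categories. The feature layout and weights below are invented for illustration.

      import numpy as np

      def modk_dissimilarity(s1, s2, w=(1.0, 1.0, 1.0)):
          """Sketch of a modk-prototypes-style dissimilarity: squared Euclidean
          distance on the microarray and clinical chemistry blocks plus simple
          matching on histopathology categories, each with its own weight."""
          (ma1, cc1, hp1), (ma2, cc2, hp2) = s1, s2
          d_ma = np.sum((np.asarray(ma1) - np.asarray(ma2)) ** 2)
          d_cc = np.sum((np.asarray(cc1) - np.asarray(cc2)) ** 2)
          d_hp = np.sum(np.asarray(hp1) != np.asarray(hp2))   # simple-matching count
          return w[0] * d_ma + w[1] * d_cc + w[2] * d_hp

      # Toy usage with made-up values: 4 expression values, 2 chemistry values,
      # 2 histopathology categories per sample.
      a = ([0.2, 1.4, -0.3, 0.8], [35.0, 1.2], ["necrosis_mild", "inflam_none"])
      b = ([0.1, 1.1, -0.2, 0.9], [60.0, 1.9], ["necrosis_mod", "inflam_none"])
      print(modk_dissimilarity(a, b, w=(0.5, 0.3, 0.2)))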

  19. Quantum clustering and network analysis of MD simulation trajectories to probe the conformational ensembles of protein-ligand interactions.

    PubMed

    Bhattacharyya, Moitrayee; Vishveshwara, Saraswathi

    2011-07-01

    In this article, we present a novel application of a quantum clustering (QC) technique to objectively cluster the conformations sampled by molecular dynamics simulations performed on different ligand-bound structures of the protein. We further portray each conformational population in terms of dynamically stable network parameters which beautifully capture the ligand-induced variations in the ensemble in atomistic detail. The conformational populations thus identified by the QC method and verified by network parameters are evaluated for different ligand-bound states of the protein pyrrolysyl-tRNA synthetase (DhPylRS) from D. hafniense. The ligand/environment induced re-distribution of protein conformational ensembles forms the basis for understanding several important biological phenomena such as allostery and enzyme catalysis. The atomistic-level characterization of each population in the conformational ensemble in terms of the re-orchestrated networks of amino acids is a challenging problem, especially when the changes are minimal at the backbone level. Here we demonstrate that the QC method is sensitive to such subtle changes and is able to cluster MD snapshots which are similar at the side-chain interaction level. Although we have applied these methods on simulation trajectories of a modest time scale (20 ns each), we emphasize that our methodology provides a general approach towards an objective clustering of large-scale MD simulation data and may be applied to probe multistate equilibria at longer time scales, and to problems related to protein folding, for any protein or protein-protein/RNA/DNA complex of interest with a known structure.

  20. Analysis of the structure and dynamics of human serum albumin.

    PubMed

    Guizado, T R Cuya

    2014-10-01

    Human serum albumin (HSA) is a biologically relevant protein that binds a variety of drugs and other small molecules. No fewer than 50 structures are deposited in the RCSB Protein Data Bank (PDB). Based on these structures, we first performed a clustering analysis. Despite the diversity of ligands, only two well defined conformations are detected, with a deviation of 0.46 nm between the average structures of the two clusters, while deviations within each cluster are smaller than 0.08 nm. These two conformations are representative of the apoprotein and the HSA-myristate complex already identified in previous literature. Considering the structures within each cluster as a representative sample of the dynamical states of the corresponding conformation, we scrutinize the structural and dynamical differences between both conformations. Analysis of the fluctuations within each cluster reveals that domain II is the most rigid one and best matches both conformations. Then, taking this domain as reference, we show that the structural difference between both conformations can be expressed in terms of twist and hinge motions of domains I and III, respectively. We also characterize the dynamical difference between conformations by computing correlations and principal components for each set of dynamical states. The two conformations display different collective motions. The results are compared with those obtained from the trajectories of short molecular dynamics simulations, giving consistent outcomes. Let us remark that, beyond the relevance of the results for the structural and dynamical characterization of HSA conformations, the present methodology could be extended to other proteins in the PDB archive.

  1. Assessment of economic status in trauma registries: A new algorithm for generating population-specific clustering-based models of economic status for time-constrained low-resource settings.

    PubMed

    Eyler, Lauren; Hubbard, Alan; Juillard, Catherine

    2016-10-01

    Low and middle-income countries (LMICs) and the world's poor bear a disproportionate share of the global burden of injury. Data regarding disparities in injury are vital to inform injury prevention and trauma systems strengthening interventions targeted towards vulnerable populations, but are limited in LMICs. We aim to facilitate injury disparities research by generating a standardized methodology for assessing economic status in resource-limited country trauma registries where complex metrics such as income, expenditures, and wealth index are infeasible to assess. To address this need, we developed a cluster analysis-based algorithm for generating simple population-specific metrics of economic status using nationally representative Demographic and Health Surveys (DHS) household assets data. For a limited number of variables, g, our algorithm performs weighted k-medoids clustering of the population using all combinations of g asset variables and selects the combination of variables and number of clusters that maximizes average silhouette width (ASW). In simulated datasets containing both randomly distributed variables and "true" population clusters defined by correlated categorical variables, the algorithm selected the correct variable combination and appropriate cluster numbers unless variable correlation was very weak. When used with 2011 Cameroonian DHS data, our algorithm identified twenty economic clusters with an ASW of 0.80, indicating well-defined population clusters. This economic model for assessing health disparities will be used in the new Cameroonian six-hospital centralized trauma registry. By describing our standardized methodology and algorithm for generating economic clustering models, we aim to facilitate measurement of health disparities in other trauma registries in resource-limited countries. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
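
    A sketch of the selection loop the algorithm describes, with a plain unweighted k-medoids and a simple-matching distance standing in for the paper's weighted version, run on simulated binary asset data.

      import numpy as np
      from itertools import combinations
      from sklearn.metrics import silhouette_score

      def k_medoids(D, k, n_iter=50, seed=0):
          """Plain (unweighted) k-medoids on a precomputed distance matrix."""
          rng = np.random.default_rng(seed)
          medoids = rng.choice(len(D), size=k, replace=False)
          for _ in range(n_iter):
              labels = np.argmin(D[:, medoids], axis=1)
              new = medoids.copy()
              for j in range(k):
                  members = np.flatnonzero(labels == j)
                  if members.size:   # medoid = member minimising within-cluster distance
                      new[j] = members[np.argmin(D[np.ix_(members, members)].sum(axis=1))]
              if np.array_equal(new, medoids):
                  break
              medoids = new
          return np.argmin(D[:, medoids], axis=1)

      rng = np.random.default_rng(0)
      assets = rng.integers(0, 2, size=(200, 6))          # 200 households x 6 binary assets
      best = (-1.0, None, None)
      for combo in combinations(range(assets.shape[1]), 3):   # g = 3 variables at a time
          X = assets[:, list(combo)]
          D = (X[:, None, :] != X[None, :, :]).mean(axis=2)   # simple-matching distance
          for k in range(2, 6):
              labels = k_medoids(D, k)
              if len(np.unique(labels)) < 2:
                  continue
              asw = silhouette_score(D, labels, metric="precomputed")
              if asw > best[0]:
                  best = (asw, combo, k)
      print("best ASW %.2f for variables %s with k=%d" % best)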

  2. Cluster designs to assess the prevalence of acute malnutrition by lot quality assurance sampling: a validation study by computer simulation.

    PubMed

    Olives, Casey; Pagano, Marcello; Deitchler, Megan; Hedt, Bethany L; Egge, Kari; Valadez, Joseph J

    2009-04-01

    Traditional lot quality assurance sampling (LQAS) methods require simple random sampling to guarantee valid results. However, cluster sampling has been proposed to reduce the number of random starting points. This study uses simulations to examine the classification error of two such designs, a 67x3 (67 clusters of three observations) and a 33x6 (33 clusters of six observations) sampling scheme, to assess the prevalence of global acute malnutrition (GAM). Further, we explore the use of a 67x3 sequential sampling scheme for LQAS classification of GAM prevalence. Results indicate that, for independent clusters with moderate intracluster correlation for the GAM outcome, the three sampling designs maintain approximate validity for LQAS analysis. Sequential sampling can substantially reduce the average sample size that is required for data collection. The presence of intercluster correlation can dramatically impact the classification error associated with LQAS analysis.
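
    A simulation sketch in the spirit of the study: cluster-level prevalences are drawn from a beta distribution whose dispersion matches an assumed intracluster correlation, and the 67x3 and 33x6 designs are compared. The threshold and prevalence values are illustrative, not the study's.

      import numpy as np

      rng = np.random.default_rng(2)

      def p_classify_high(p_true, n_clusters, m, icc=0.1, threshold=0.10, runs=20000):
          """Probability that a cluster-LQAS sample classifies GAM prevalence as
          above 'threshold', with beta-distributed cluster prevalences."""
          a = p_true * (1 - icc) / icc
          b = (1 - p_true) * (1 - icc) / icc
          cases = rng.binomial(m, rng.beta(a, b, size=(runs, n_clusters))).sum(axis=1)
          return np.mean(cases / (n_clusters * m) > threshold)

      for n_c, m in [(67, 3), (33, 6)]:   # the two designs examined in the study
          print(n_c, m, p_classify_high(0.08, n_c, m), p_classify_high(0.12, n_c, m))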

  3. General Framework for Effect Sizes in Cluster Randomized Experiments

    ERIC Educational Resources Information Center

    VanHoudnos, Nathan

    2016-01-01

    Cluster randomized experiments are ubiquitous in modern education research. Although a variety of modeling approaches are used to analyze these data, perhaps the most common methodology is a normal mixed effects model where some effects, such as the treatment effect, are regarded as fixed, and others, such as the effect of group random assignment…

  4. Communication via Chalkboard and on Paper. TLA-100.00 (F.U.F.).

    ERIC Educational Resources Information Center

    Hardison, Margaret J.

    The purpose of this cluster of learning modules is to increase the teacher intern's understanding and skills with regard to his role as a communicator. The cluster contains three modules: (a) objectives for teaching handwriting, (b) methodology of manuscript writing, and (c) practice teaching of manuscript handwriting. Each module contains a…

  5. Combining analytical hierarchy process and agglomerative hierarchical clustering in search of expert consensus in green corridors development management.

    PubMed

    Shapira, Aviad; Shoshany, Maxim; Nir-Goldenberg, Sigal

    2013-07-01

    Environmental management and planning are instrumental in resolving conflicts arising between societal needs for economic development on the one hand and for open green landscapes on the other hand. Allocating green corridors between fragmented core green areas may provide a partial solution to these conflicts. Decisions regarding green corridor development require the assessment of alternative allocations based on multiple criteria evaluations. Analytical Hierarchy Process provides a methodology both for a structured and consistent extraction of such evaluations and for the search for consensus among experts regarding the weights assigned to the different criteria. Implementing this methodology with 15 Israeli experts (landscape architects, regional planners, and geographers) revealed inherent differences in expert opinions in this field beyond professional divisions. The use of Agglomerative Hierarchical Clustering allowed the identification of clusters representing common decisions regarding criterion weights. Aggregating the evaluations of these clusters revealed an important dichotomy between a pragmatist approach that emphasizes the weight of statutory criteria and an ecological approach that emphasizes the role of natural conditions in allocating green landscape corridors.
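
    A compact sketch of the two-stage analysis: AHP criterion weights taken as the principal eigenvector of each expert's pairwise-comparison matrix, followed by agglomerative (Ward) clustering of the 15 weight vectors. The comparison values below are randomly generated stand-ins for real expert judgements.

      import numpy as np
      from scipy.cluster.hierarchy import linkage, fcluster

      def ahp_weights(pairwise):
          """Criterion weights as the normalised principal eigenvector of a
          positive reciprocal pairwise-comparison matrix (standard AHP)."""
          vals, vecs = np.linalg.eig(np.asarray(pairwise, dtype=float))
          w = np.real(vecs[:, np.argmax(np.real(vals))])
          return w / w.sum()

      rng = np.random.default_rng(7)
      experts = []
      for _ in range(15):                       # 15 experts, 3 criteria each
          a12, a13, a23 = rng.choice([1/3, 1/2, 1, 2, 3], size=3)
          M = [[1, a12, a13], [1/a12, 1, a23], [1/a13, 1/a23, 1]]
          experts.append(ahp_weights(M))

      Z = linkage(np.array(experts), method="ward")
      print(fcluster(Z, t=2, criterion="maxclust"))   # e.g. pragmatist vs ecological camps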

  6. Initial Analysis of and Predictive Model Development for Weather Reroute Advisory Use

    NASA Technical Reports Server (NTRS)

    Arneson, Heather M.

    2016-01-01

    In response to severe weather conditions, traffic management coordinators specify reroutes to route air traffic around affected regions of airspace. Providing analysis and recommendations of available reroute options would assist the traffic management coordinators in making more efficient rerouting decisions. These recommendations can be developed by examining historical data to determine which previous reroute options were used in similar weather and traffic conditions, essentially using previous information to inform future decisions. This paper describes the initial steps and methodology used towards this goal. A method to extract relevant features from the large volume of weather data to quantify the convective weather scenario during a particular time range is presented. Similar routes are clustered. The algorithm used to identify which cluster of reroute advisories was actually followed by pilots is also described. Models built for fifteen of the top twenty most frequently used reroute clusters correctly predict the use of the cluster for over 60% of the test examples. Results are preliminary but indicate that the methodology is worth pursuing, with modifications based on insight gained from this analysis.

  7. Combining Analytical Hierarchy Process and Agglomerative Hierarchical Clustering in Search of Expert Consensus in Green Corridors Development Management

    NASA Astrophysics Data System (ADS)

    Shapira, Aviad; Shoshany, Maxim; Nir-Goldenberg, Sigal

    2013-07-01

    Environmental management and planning are instrumental in resolving conflicts arising between societal needs for economic development on the one hand and for open green landscapes on the other hand. Allocating green corridors between fragmented core green areas may provide a partial solution to these conflicts. Decisions regarding green corridor development require the assessment of alternative allocations based on multiple criteria evaluations. Analytical Hierarchy Process provides a methodology both for a structured and consistent extraction of such evaluations and for the search for consensus among experts regarding the weights assigned to the different criteria. Implementing this methodology with 15 Israeli experts (landscape architects, regional planners, and geographers) revealed inherent differences in expert opinions in this field beyond professional divisions. The use of Agglomerative Hierarchical Clustering allowed the identification of clusters representing common decisions regarding criterion weights. Aggregating the evaluations of these clusters revealed an important dichotomy between a pragmatist approach that emphasizes the weight of statutory criteria and an ecological approach that emphasizes the role of natural conditions in allocating green landscape corridors.

  8. Clustering molecular dynamics trajectories for optimizing docking experiments.

    PubMed

    De Paris, Renata; Quevedo, Christian V; Ruiz, Duncan D; Norberto de Souza, Osmar; Barros, Rodrigo C

    2015-01-01

    Molecular dynamics simulations of protein receptors have become an attractive tool for rational drug discovery. However, the high computational cost of employing molecular dynamics trajectories in virtual screening of large repositories threatens the feasibility of this task. Computational intelligence techniques have been applied in this context, with the ultimate goal of reducing the overall computational cost so the task can become feasible. In particular, clustering algorithms have been widely used as a means to reduce the dimensionality of molecular dynamics trajectories. In this paper, we develop a novel methodology for clustering entire trajectories using structural features from the substrate-binding cavity of the receptor in order to optimize docking experiments in a cloud-based environment. The resulting partition was selected based on three clustering validity criteria, and it was further validated by analyzing the interactions between 20 ligands and a fully flexible receptor (FFR) model containing a 20 ns molecular dynamics simulation trajectory. Our proposed methodology shows that taking into account features of the substrate-binding cavity as input for the k-means algorithm is a promising technique for accurately selecting ensembles of representative structures tailored to a specific ligand.

  9. Understanding the cluster randomised crossover design: a graphical illustration of the components of variation and a sample size tutorial.

    PubMed

    Arnup, Sarah J; McKenzie, Joanne E; Hemming, Karla; Pilcher, David; Forbes, Andrew B

    2017-08-15

    In a cluster randomised crossover (CRXO) design, a sequence of interventions is assigned to a group, or 'cluster', of individuals. Each cluster receives each intervention in a separate period of time, forming 'cluster-periods'. Sample size calculations for CRXO trials need to account for both the cluster randomisation and crossover aspects of the design. Formulae are available for the two-period, two-intervention, cross-sectional CRXO design; however, implementation of these formulae is known to be suboptimal. The aims of this tutorial are to illustrate the intuition behind the design and to provide guidance on performing sample size calculations. Graphical illustrations are used to describe the effect of the cluster randomisation and crossover aspects of the design on the correlation between individual responses in a CRXO trial. Sample size calculations for binary and continuous outcomes are illustrated using parameters estimated from the Australia and New Zealand Intensive Care Society - Adult Patient Database (ANZICS-APD) for patient mortality and length of stay (LOS). The similarity between individual responses in a CRXO trial can be understood in terms of three components of variation: variation in cluster mean response; variation in cluster-period mean response; and variation between individual responses within a cluster-period; or equivalently in terms of the correlation between individual responses in the same cluster-period (within-cluster within-period correlation, WPC) and between individual responses in the same cluster, but in different periods (within-cluster between-period correlation, BPC). The BPC lies between zero and the WPC. When the WPC and BPC are equal, the precision gained by the crossover aspect of the CRXO design equals the precision lost by cluster randomisation. When the BPC is zero there is no advantage in a CRXO over a parallel-group cluster randomised trial. Sample size calculations illustrate that small changes in the specification of the WPC or BPC can increase the required number of clusters. By illustrating how the parameters required for sample size calculations arise from the CRXO design, and by providing guidance on both how to choose values for the parameters and how to perform the sample size calculations, this tutorial may improve the implementation of the sample size formulae for CRXO trials.
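
    A sketch of the kind of calculation the tutorial covers for a continuous outcome, using the two-period cross-sectional CRXO design effect 1 + (m - 1)WPC - m*BPC applied to the individually randomised sample size; the paper's exact formulae (e.g. for binary outcomes, or with small-sample corrections) may differ, and all parameter values below are illustrative.

      from math import ceil
      from scipy.stats import norm

      def crxo_clusters(delta, sd, m, wpc, bpc, alpha=0.05, power=0.8):
          """Approximate number of clusters for a two-period, two-intervention,
          cross-sectional CRXO trial with a continuous outcome; m is the number
          of individuals per cluster-period."""
          z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
          n_ind = 2 * (z * sd / delta) ** 2      # per arm, individual randomisation
          de = 1 + (m - 1) * wpc - m * bpc       # CRXO design effect
          n_total = 2 * n_ind * de               # measurements across both arms
          return ceil(n_total / (2 * m))         # each cluster contributes 2 periods of m

      print(crxo_clusters(delta=0.25, sd=1.0, m=20, wpc=0.05, bpc=0.02))

    Setting bpc=0 recovers the familiar parallel-CRT design effect 1 + (m - 1)*WPC, consistent with the remark above that a zero BPC removes the advantage of the crossover.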

  10. 75 FR 44937 - Submission for OMB Review; Comment Request

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-07-30

    ... is a block cluster, which consists of one or more contiguous census blocks. The P sample is a sample of housing units and persons obtained independently from the census for a sample of block clusters. The E sample is a sample of census housing units and enumerations in the same block clusters as the...

  11. Genetic Structure of Bluefin Tuna in the Mediterranean Sea Correlates with Environmental Variables

    PubMed Central

    Riccioni, Giulia; Stagioni, Marco; Landi, Monica; Ferrara, Giorgia; Barbujani, Guido; Tinti, Fausto

    2013-01-01

    Background Atlantic Bluefin Tuna (ABFT) shows complex demography and ecological variation in the Mediterranean Sea. Genetic surveys have detected significant, although weak, signals of population structuring; catch series analyses and tagging programs identified complex ABFT spatial dynamics and migration patterns. Here, we tested the hypothesis that the genetic structure of the ABFT in the Mediterranean is correlated with mean surface temperature and salinity. Methodology We used six samples collected from the Western and Central Mediterranean, integrated with a new sample collected from the recently identified easternmost reproductive area in the Levantine Sea. To assess population structure in the Mediterranean we used a multidisciplinary framework combining classical population genetics, spatial and Bayesian clustering methods and a multivariate approach based on factor analysis. Conclusions FST analysis and Bayesian clustering methods detected several subpopulations in the Mediterranean, a result also supported by multivariate analyses. In addition, we identified significant correlations of genetic diversity with mean salinity and surface temperature values, revealing that ABFT is genetically structured along two environmental gradients. These results suggest that a preference for some spawning habitat conditions could contribute to shaping ABFT genetic structuring in the Mediterranean. However, further studies should be performed to assess to what extent ABFT spawning behaviour in the Mediterranean Sea can be affected by environmental variation. PMID:24260341

  12. Evaluation of Nine Consensus Indices in Delphi Foresight Research and Their Dependency on Delphi Survey Characteristics: A Simulation Study and Debate on Delphi Design and Interpretation.

    PubMed

    Birko, Stanislav; Dove, Edward S; Özdemir, Vural

    2015-01-01

    The extent of consensus (or the lack thereof) among experts in emerging fields of innovation can serve as antecedents of scientific, societal, investor and stakeholder synergy or conflict. Naturally, how we measure consensus is of great importance to science and technology strategic foresight. The Delphi methodology is a widely used anonymous survey technique to evaluate consensus among a panel of experts. Surprisingly, there is little guidance on how indices of consensus can be influenced by parameters of the Delphi survey itself. We simulated a classic three-round Delphi survey building on the concept of clustered consensus/dissensus. We evaluated three study characteristics that are pertinent for design of Delphi foresight research: (1) the number of survey questions, (2) the sample size, and (3) the extent to which experts conform to group opinion (the Group Conformity Index) in a Delphi study. Their impacts on the following nine Delphi consensus indices were then examined in 1000 simulations: Clustered Mode, Clustered Pairwise Agreement, Conger's Kappa, De Moivre index, Extremities Version of the Clustered Pairwise Agreement, Fleiss' Kappa, Mode, the Interquartile Range and Pairwise Agreement. The dependency of a consensus index on the Delphi survey characteristics was expressed from 0.000 (no dependency) to 1.000 (full dependency). The number of questions (range: 6 to 40) in a survey did not have a notable impact whereby the dependency values remained below 0.030. The variation in sample size (range: 6 to 50) displayed the top three impacts for the Interquartile Range, the Clustered Mode and the Mode (dependency = 0.396, 0.130, 0.116, respectively). The Group Conformity Index, a construct akin to measuring stubbornness/flexibility of experts' opinions, greatly impacted all nine Delphi consensus indices (dependency = 0.200 to 0.504), except the Extremity CPWA and the Interquartile Range that were impacted only beyond the first decimal point (dependency = 0.087 and 0.083, respectively). Scholars in technology design, foresight research and future(s) studies might consider these new findings in strategic planning of Delphi studies, for example, in rational choice of consensus indices and sample size, or accounting for confounding factors such as experts' variable degrees of conformity (stubbornness/flexibility) in modifying their opinions.
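
    Two of the nine indices (the Interquartile Range and Pairwise Agreement) are simple enough to state in a few lines; the sketch below computes them in their plain, unclustered form for one question on simulated ratings. The clustered variants and the kappa statistics are omitted, and the IQR <= 2 convention mentioned in the comment is a common rule of thumb rather than a result of this study.

      import numpy as np

      rng = np.random.default_rng(3)
      ratings = rng.integers(3, 8, size=30)    # 30 experts, one question, 9-point scale

      q1, q3 = np.percentile(ratings, [25, 75])
      iqr = q3 - q1                            # consensus often declared when IQR <= 2

      n = len(ratings)
      agree = [ratings[i] == ratings[j] for i in range(n) for j in range(i + 1, n)]
      pwa = np.mean(agree)                     # share of expert pairs in exact agreement

      print(f"IQR = {iqr}, pairwise agreement = {pwa:.2f}")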

  13. Evaluation of Nine Consensus Indices in Delphi Foresight Research and Their Dependency on Delphi Survey Characteristics: A Simulation Study and Debate on Delphi Design and Interpretation

    PubMed Central

    Birko, Stanislav; Dove, Edward S.; Özdemir, Vural

    2015-01-01

    The extent of consensus (or the lack thereof) among experts in emerging fields of innovation can serve as antecedents of scientific, societal, investor and stakeholder synergy or conflict. Naturally, how we measure consensus is of great importance to science and technology strategic foresight. The Delphi methodology is a widely used anonymous survey technique to evaluate consensus among a panel of experts. Surprisingly, there is little guidance on how indices of consensus can be influenced by parameters of the Delphi survey itself. We simulated a classic three-round Delphi survey building on the concept of clustered consensus/dissensus. We evaluated three study characteristics that are pertinent for design of Delphi foresight research: (1) the number of survey questions, (2) the sample size, and (3) the extent to which experts conform to group opinion (the Group Conformity Index) in a Delphi study. Their impacts on the following nine Delphi consensus indices were then examined in 1000 simulations: Clustered Mode, Clustered Pairwise Agreement, Conger’s Kappa, De Moivre index, Extremities Version of the Clustered Pairwise Agreement, Fleiss’ Kappa, Mode, the Interquartile Range and Pairwise Agreement. The dependency of a consensus index on the Delphi survey characteristics was expressed from 0.000 (no dependency) to 1.000 (full dependency). The number of questions (range: 6 to 40) in a survey did not have a notable impact whereby the dependency values remained below 0.030. The variation in sample size (range: 6 to 50) displayed the top three impacts for the Interquartile Range, the Clustered Mode and the Mode (dependency = 0.396, 0.130, 0.116, respectively). The Group Conformity Index, a construct akin to measuring stubbornness/flexibility of experts’ opinions, greatly impacted all nine Delphi consensus indices (dependency = 0.200 to 0.504), except the Extremity CPWA and the Interquartile Range that were impacted only beyond the first decimal point (dependency = 0.087 and 0.083, respectively). Scholars in technology design, foresight research and future(s) studies might consider these new findings in strategic planning of Delphi studies, for example, in rational choice of consensus indices and sample size, or accounting for confounding factors such as experts’ variable degrees of conformity (stubbornness/flexibility) in modifying their opinions. PMID:26270647

  14. Unequal cluster sizes in stepped-wedge cluster randomised trials: a systematic review

    PubMed Central

    Morris, Tom; Gray, Laura

    2017-01-01

    Objectives To investigate the extent to which cluster sizes vary in stepped-wedge cluster randomised trials (SW-CRT) and whether any variability is accounted for during the sample size calculation and analysis of these trials. Setting Any, not limited to healthcare settings. Participants Any taking part in an SW-CRT published up to March 2016. Primary and secondary outcome measures The primary outcome is the variability in cluster sizes, measured by the coefficient of variation (CV) in cluster size. Secondary outcomes include the difference between the cluster sizes assumed during the sample size calculation and those observed during the trial, any reported variability in cluster sizes and whether the methods of sample size calculation and methods of analysis accounted for any variability in cluster sizes. Results Of the 101 included SW-CRTs, 48% mentioned that the included clusters were known to vary in size, yet only 13% of these accounted for this during the calculation of the sample size. However, 69% of the trials did use a method of analysis appropriate for when clusters vary in size. Full trial reports were available for 53 trials. The CV was calculated for 23 of these: the median CV was 0.41 (IQR: 0.22–0.52). Actual cluster sizes could be compared with those assumed during the sample size calculation for 14 (26%) of the trial reports; the cluster sizes were between 29% and 480% of that which had been assumed. Conclusions Cluster sizes often vary in SW-CRTs. Reporting of SW-CRTs also remains suboptimal. The effect of unequal cluster sizes on the statistical power of SW-CRTs needs further exploration and methods appropriate to studies with unequal cluster sizes need to be employed. PMID:29146637
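
    A short sketch of the CV statistic extracted by the review and, for context, a widely used inflation of the design effect for unequal cluster sizes in parallel-group CRTs (the stepped-wedge analogue is precisely the open question the authors raise); the cluster sizes and ICC below are hypothetical.

      import numpy as np

      sizes = np.array([12, 35, 18, 60, 25, 40, 9, 75])    # hypothetical cluster sizes
      cv = sizes.std(ddof=1) / sizes.mean()                # coefficient of variation

      # Common parallel-CRT adjustment: DE = 1 + ((cv^2 + 1) * m_bar - 1) * icc
      m_bar, icc = sizes.mean(), 0.05
      print(f"CV = {cv:.2f}")
      print(f"DE (equal sizes)   = {1 + (m_bar - 1) * icc:.2f}")
      print(f"DE (unequal sizes) = {1 + ((cv**2 + 1) * m_bar - 1) * icc:.2f}")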

  15. Deriving photometric redshifts using fuzzy archetypes and self-organizing maps - I. Methodology

    NASA Astrophysics Data System (ADS)

    Speagle, Joshua S.; Eisenstein, Daniel J.

    2017-07-01

    We propose a method to substantially increase the flexibility and power of template fitting-based photometric redshifts by transforming a large number of galaxy spectral templates into a corresponding collection of 'fuzzy archetypes' using a suitable set of perturbative priors designed to account for empirical variation in dust attenuation and emission-line strengths. To bypass widely separated degeneracies in parameter space (e.g. the redshift-reddening degeneracy), we train self-organizing maps (SOMs) on large 'model catalogues' generated from Monte Carlo sampling of our fuzzy archetypes to cluster the predicted observables in a topologically smooth fashion. Subsequent sampling over the SOM then allows full reconstruction of the relevant probability distribution functions (PDFs). This combined approach enables the multimodal exploration of known variation among galaxy spectral energy distributions with minimal modelling assumptions. We demonstrate the power of this approach to recover full redshift PDFs using discrete Markov chain Monte Carlo sampling methods combined with SOMs constructed from Large Synoptic Survey Telescope ugrizY and Euclid YJH mock photometry.

  16. Solidification kinetics of a Cu-Zr alloy: ground-based and microgravity experiments

    NASA Astrophysics Data System (ADS)

    Galenko, P. K.; Hanke, R.; Paul, P.; Koch, S.; Rettenmayr, M.; Gegner, J.; Herlach, D. M.; Dreier, W.; Kharanzhevski, E. V.

    2017-04-01

    Experimental and theoretical results obtained in the MULTIPHAS-project (ESA-European Space Agency and DLR-German Aerospace Center) are critically discussed regarding solidification kinetics of congruently melting and glass forming Cu50Zr50 alloy samples. The samples are investigated during solidification using a containerless technique in the Electromagnetic Levitation Facility [1]. Applying elaborated methodologies for ground-based and microgravity experimental investigations [2], the kinetics of primary dendritic solidification is quantitatively evaluated. Electromagnetic Levitator in microgravity (parabolic flights and on board of the International Space Station) and Electrostatic Levitator on Ground are employed. The solidification kinetics is determined using a high-speed camera and applying two evaluation methods: “Frame by Frame” (FFM) and “First Frame - Last Frame” (FLM). In the theoretical interpretation of the solidification experiments, special attention is given to the behavior of the cluster structure in Cu50Zr50 samples with the increase of undercooling. Experimental results on solidification kinetics are interpreted using a theoretical model of diffusion controlled dendrite growth.

  17. Diverse molecular signatures for ribosomally ‘active’ Perkinsea in marine sediments

    PubMed Central

    2014-01-01

    Background Perkinsea are a parasitic lineage within the eukaryotic superphylum Alveolata. Recent studies making use of environmental small sub-unit ribosomal RNA gene (SSU rDNA) sequencing methodologies have detected a significant diversity and abundance of Perkinsea-like phylotypes in freshwater environments. In contrast only a few Perkinsea environmental sequences have been retrieved from marine samples and only two groups of Perkinsea have been cultured and morphologically described and these are parasites of marine molluscs or marine protists. These two marine groups form separate and distantly related phylogenetic clusters, composed of closely related lineages on SSU rDNA trees. Here, we test the hypothesis that Perkinsea are a hitherto under-sampled group in marine environments. Using 454 diversity ‘tag’ sequencing we investigate the diversity and distribution of these protists in marine sediments and water column samples taken from the Deep Chlorophyll Maximum (DCM) and sub-surface using both DNA and RNA as the source template and sampling four European offshore locations. Results We detected the presence of 265 sequences branching with known Perkinsea, the majority of them recovered from marine sediments. Moreover, 27% of these sequences were sampled from RNA derived cDNA libraries. Phylogenetic analyses classify a large proportion of these sequences into 38 cluster groups (including 30 novel marine cluster groups), which share less than 97% sequence similarity suggesting this diversity encompasses a range of biologically and ecologically distinct organisms. Conclusions These results demonstrate that the Perkinsea lineage is considerably more diverse than previously detected in marine environments. This wide diversity of Perkinsea-like protists is largely retrieved in marine sediment with a significant proportion detected in RNA derived libraries suggesting this diversity represents ribosomally ‘active’ and intact cells. Given the phylogenetic range of hosts infected by known Perkinsea parasites, these data suggest that Perkinsea either play a significant but hitherto unrecognized role as parasites in marine sediments and/or members of this group are present in the marine sediment possibly as part of the ‘seed bank’ microbial community. PMID:24779375

  18. WHISPER or SHOUT study: protocol of a cluster-randomised controlled trial assessing mHealth sexual reproductive health and nutrition interventions among female sex workers in Mombasa, Kenya

    PubMed Central

    Ampt, Frances H; Mudogo, Collins; Gichangi, Peter; Lim, Megan S C; Manguro, Griffins; Chersich, Matthew; Jaoko, Walter; Temmerman, Marleen; Laini, Marilyn; Comrie-Thomson, Liz; Stoové, Mark; Agius, Paul A; Hellard, Margaret; L’Engle, Kelly; Luchters, Stanley

    2017-01-01

    Introduction New interventions are required to reduce unintended pregnancies among female sex workers (FSWs) in low- and middle-income countries and to improve their nutritional health. Given sex workers’ high mobile phone usage, repeated exposure to short messaging service (SMS) messages could address individual and interpersonal barriers to contraceptive uptake and better nutrition. Methods In this two-arm cluster randomised trial, each arm constitutes an equal-attention control group for the other. SMS messages were developed in a systematic, participatory and theory-driven way and cover either sexual and reproductive health (WHISPER) or nutrition (SHOUT). Messages are sent to participants 2–3 times/week for 12 months and include fact-based and motivational content as well as role model stories. Participants can send reply texts to obtain additional information. Sex work venues (clusters) in Mombasa, Kenya, were randomly sampled with a probability proportionate to venue size. Up to 10 women were recruited from each venue to enrol 860 women. FSWs aged 16–35 years who owned a mobile phone and were not pregnant at enrolment were eligible. Structured questionnaires, pregnancy tests, HIV and syphilis rapid tests and full blood counts were performed at enrolment, with subsequent visits at 6 and 12 months. Analysis The primary outcomes of WHISPER and SHOUT are unintended pregnancy incidence and prevalence of anaemia at 12 months, respectively. Each will be compared between study groups using discrete-time survival analysis. Potential limitations Contamination may occur if participants discuss their intervention with those in the other trial arm. This is mitigated by cluster recruitment and by sampling only a small proportion of sex work venues from the sampling frame. Conclusions The design allows for the simultaneous testing of two independent mHealth interventions for which messaging frequency and study procedures are identical. This trial may guide future mHealth initiatives and provide methodological insights into the use of reciprocal control groups. Trial registration number ACTRN12616000852459; Pre-results. PMID:28821530
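
    A minimal sketch of systematic probability-proportional-to-size (PPS) sampling of venues, the standard technique behind cluster selection of this kind; venue counts and the sample size are invented, and note that a venue larger than the sampling interval can be selected more than once.

      import numpy as np

      rng = np.random.default_rng(4)
      venue_sizes = rng.integers(5, 120, size=200)    # hypothetical FSW counts per venue

      n_sample = 86                                   # ~10 women per venue -> 860 women
      totals = np.cumsum(venue_sizes)
      step = totals[-1] / n_sample
      picks = rng.uniform(0, step) + step * np.arange(n_sample)   # random start, fixed step
      sampled = np.searchsorted(totals, picks)        # venue index for each systematic pick
      print(sampled[:10])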

  19. The effect of clustering on lot quality assurance sampling: a probabilistic model to calculate sample sizes for quality assessments

    PubMed Central

    2013-01-01

    Background Traditional Lot Quality Assurance Sampling (LQAS) designs assume observations are collected using simple random sampling. Alternatively, randomly sampling clusters of observations and then individuals within clusters reduces costs but decreases the precision of the classifications. In this paper, we develop a general framework for designing the cluster(C)-LQAS system and illustrate the method with the design of data quality assessments for the community health worker program in Rwanda. Results To determine sample size and decision rules for C-LQAS, we use the beta-binomial distribution to account for inflated risk of errors introduced by sampling clusters at the first stage. We present general theory and code for sample size calculations. The C-LQAS sample sizes provided in this paper constrain misclassification risks below user-specified limits. Multiple C-LQAS systems meet the specified risk requirements, but numerous considerations, including per-cluster versus per-individual sampling costs, help identify optimal systems for distinct applications. Conclusions We show the utility of C-LQAS for data quality assessments, but the method generalizes to numerous applications. This paper provides the necessary technical detail and supplemental code to support the design of C-LQAS for specific programs. PMID:24160725
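
    The paper derives misclassification risks analytically; the sketch below instead checks a candidate C-LQAS decision rule by Monte Carlo, with cluster-level variation modelled as beta-binomial via an assumed intracluster correlation. All parameter values are illustrative.

      import numpy as np

      rng = np.random.default_rng(5)

      def c_lqas_risks(n_clusters=20, m=10, d=130, p_hi=0.8, p_lo=0.5, icc=0.1, runs=20000):
          """Monte Carlo risks for the rule 'classify the lot as adequate when
          total successes >= d' under beta-binomial clustering."""
          def totals(p):
              a, b = p * (1 - icc) / icc, (1 - p) * (1 - icc) / icc
              return rng.binomial(m, rng.beta(a, b, size=(runs, n_clusters))).sum(axis=1)
          alpha = np.mean(totals(p_hi) < d)    # reject a lot truly at the upper standard
          beta = np.mean(totals(p_lo) >= d)    # accept a lot truly at the lower standard
          return alpha, beta

      print(c_lqas_risks())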

  20. The effect of clustering on lot quality assurance sampling: a probabilistic model to calculate sample sizes for quality assessments.

    PubMed

    Hedt-Gauthier, Bethany L; Mitsunaga, Tisha; Hund, Lauren; Olives, Casey; Pagano, Marcello

    2013-10-26

    Traditional Lot Quality Assurance Sampling (LQAS) designs assume observations are collected using simple random sampling. Alternatively, randomly sampling clusters of observations and then individuals within clusters reduces costs but decreases the precision of the classifications. In this paper, we develop a general framework for designing the cluster(C)-LQAS system and illustrate the method with the design of data quality assessments for the community health worker program in Rwanda. To determine sample size and decision rules for C-LQAS, we use the beta-binomial distribution to account for inflated risk of errors introduced by sampling clusters at the first stage. We present general theory and code for sample size calculations. The C-LQAS sample sizes provided in this paper constrain misclassification risks below user-specified limits. Multiple C-LQAS systems meet the specified risk requirements, but numerous considerations, including per-cluster versus per-individual sampling costs, help identify optimal systems for distinct applications. We show the utility of C-LQAS for data quality assessments, but the method generalizes to numerous applications. This paper provides the necessary technical detail and supplemental code to support the design of C-LQAS for specific programs.

  1. Stratified Sampling Design Based on Data Mining

    PubMed Central

    Kim, Yeonkook J.; Oh, Yoonhwan; Park, Sunghoon; Cho, Sungzoon

    2013-01-01

    Objectives To explore classification rules based on data mining methodologies which are to be used in defining strata in stratified sampling of healthcare providers with improved sampling efficiency. Methods We performed k-means clustering to group providers with similar characteristics and then constructed decision trees on cluster labels to generate stratification rules. We assessed the variance explained by the stratification proposed in this study and by conventional stratification to evaluate the performance of the sampling design. We constructed a study database from health insurance claims data and providers' profile data made available to this study by the Health Insurance Review and Assessment Service of South Korea, and population data from Statistics Korea. From our database, we used the data for single-specialty clinics or hospitals in two specialties, general surgery and ophthalmology, for the year 2011 in this study. Results Data mining resulted in five strata in general surgery with two stratification variables, the number of inpatients per specialist and population density of provider location, and five strata in ophthalmology with two stratification variables, the number of inpatients per specialist and number of beds. The percentages of variance in annual changes in the productivity of specialists explained by the stratification in general surgery and ophthalmology were 22% and 8%, respectively, whereas conventional stratification by the type of provider location and number of beds explained 2% and 0.2% of variance, respectively. Conclusions This study demonstrated that data mining methods can be used in designing efficient stratified sampling with variables readily available to the insurer and government; it offers an alternative to the existing stratification method that is widely used in healthcare provider surveys in South Korea. PMID:24175117
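
    A minimal sketch of the two-step design: k-means on provider profiles, then a shallow decision tree fitted to the cluster labels to yield human-readable stratification rules. The three profile variables and their distributions are invented stand-ins for the study's claims-derived features.

      import numpy as np
      from sklearn.cluster import KMeans
      from sklearn.tree import DecisionTreeClassifier, export_text

      rng = np.random.default_rng(6)
      # Hypothetical profiles: inpatients per specialist, beds, population density.
      X = np.column_stack([
          rng.gamma(2.0, 50.0, size=500),
          rng.integers(0, 100, size=500),
          rng.lognormal(7.0, 1.0, size=500),
      ])

      labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)
      # A shallow tree converts cluster labels into a handful of readable
      # stratification rules (at the cost of some purity relative to k-means).
      tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, labels)
      print(export_text(tree, feature_names=["inpt_per_spec", "beds", "pop_density"]))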

  2. Dimensional assessment of personality pathology in patients with eating disorders.

    PubMed

    Goldner, E M; Srikameswaran, S; Schroeder, M L; Livesley, W J; Birmingham, C L

    1999-02-22

    This study examined patients with eating disorders on personality pathology using a dimensional method. Female subjects who met DSM-IV diagnostic criteria for an eating disorder (n = 136) were evaluated and compared to an age-controlled general population sample (n = 68). We assessed 18 features of personality disorder with the Dimensional Assessment of Personality Pathology - Basic Questionnaire (DAPP-BQ). Factor analysis and cluster analysis were used to derive three clusters of patients. A five-factor solution was obtained with limited intercorrelation between factors. Cluster analysis produced three clusters with the following characteristics: Cluster 1 members (constituting 49.3% of the sample and labelled 'rigid') had higher mean scores on factors denoting compulsivity and interpersonal difficulties; Cluster 2 (18.4% of the sample) showed the highest scores on factors denoting psychopathy, neuroticism and impulsive features, and appeared to constitute a borderline psychopathology group; Cluster 3 (32.4% of the sample) was characterized by few differences in personality pathology in comparison to the normal population sample. Cluster membership was associated with DSM-IV diagnosis: a large proportion of patients with anorexia nervosa were members of Cluster 1. An empirical classification of eating-disordered patients derived from dimensional assessment of personality pathology identified three groups with clinical relevance.

  3. Effectiveness of the Comprehensive Approach to Rehabilitation (CARe) methodology: design of a cluster randomized controlled trial.

    PubMed

    Bitter, Neis A; Roeg, Diana P K; van Nieuwenhuizen, Chijs; van Weeghel, Jaap

    2015-07-22

    There is an increasing amount of evidence for the effectiveness of rehabilitation interventions for people with severe mental illness (SMI). In the Netherlands, a rehabilitation methodology that is well known and often applied is the Comprehensive Approach to Rehabilitation (CARe) methodology. The overall goal of the CARe methodology is to improve the client's quality of life by supporting the client in realizing his/her goals and wishes, handling his/her vulnerability and improving the quality of his/her social environment. The methodology is strongly influenced by the concept of 'personal recovery' and the 'strengths case management model'. No controlled effect studies of the CARe methodology have been conducted to date. This study is a two-armed cluster randomized controlled trial (RCT) that will be executed in teams from three organizations for sheltered and supported housing, which provide services to people with long-term severe mental illness. Teams in the intervention group will receive the multiple-day CARe methodology training from a specialized institute and will start working according to the CARe methodology guideline. Teams in the control group will continue working in their usual way. Standardized questionnaires will be completed at baseline (T0), and 10 (T1) and 20 months (T2) post baseline. Primary outcomes are recovery, social functioning and quality of life. The model fidelity of the CARe methodology will be assessed at T1 and T2. This study is the first controlled effect study on the CARe methodology and one of the few RCTs on a broad rehabilitation method or strengths-based approach. This study is relevant because mental health care organizations have become increasingly interested in recovery and rehabilitation-oriented care. The trial registration number is ISRCTN77355880.

  4. Automatic classification of atypical lymphoid B cells using digital blood image processing.

    PubMed

    Alférez, S; Merino, A; Mujica, L E; Ruiz, M; Bigorra, L; Rodellar, J

    2014-08-01

    There are automated systems for digital peripheral blood (PB) cell analysis, but they operate most effectively on nonpathological blood samples. The objective of this work was to design a methodology to improve the automatic classification of abnormal lymphoid cells. We analyzed 340 digital images of individual lymphoid cells from PB films obtained with the CellaVision DM96: 150 chronic lymphocytic leukemia (CLL) cells, 100 hairy cell leukemia (HCL) cells, and 90 normal lymphocytes (N). We implemented the Watershed Transformation to segment the nucleus, the cytoplasm, and the peripheral cell region. We extracted 44 features, and then Fuzzy C-Means (FCM) clustering was applied in two steps for the lymphocyte classification. The images were automatically clustered in three groups, one of them containing 98% of the HCL cells. The set of the remaining cells was clustered again using FCM and texture features. The two new groups contained 83.3% of the N cells and 71.3% of the CLL cells, respectively. The approach has been able to automatically classify three types of lymphoid cells with high precision. The addition of more descriptors and other classification techniques will allow extending the classification to other classes of atypical lymphoid cells. © 2013 John Wiley & Sons Ltd.
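
    A small self-contained fuzzy c-means, standing in for the FCM steps described above; the real pipeline clusters 44 segmentation-derived descriptors, whereas the 4-D toy features below are invented.

      import numpy as np

      def fuzzy_cmeans(X, c, m=2.0, n_iter=100, seed=0):
          """Minimal fuzzy c-means: returns soft memberships U (n x c) and centroids."""
          rng = np.random.default_rng(seed)
          U = rng.dirichlet(np.ones(c), size=len(X))          # random soft partition
          for _ in range(n_iter):
              W = U ** m
              centers = (W.T @ X) / W.sum(axis=0)[:, None]    # weighted centroids
              d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
              inv = d ** (-2.0 / (m - 1.0))                   # standard FCM update
              U = inv / inv.sum(axis=1, keepdims=True)
          return U, centers

      # Toy stand-in for the cell descriptors: three feature clusters in 4-D.
      rng = np.random.default_rng(1)
      X = np.vstack([rng.normal(mu, 0.3, size=(50, 4)) for mu in (0.0, 2.0, 4.0)])
      U, _ = fuzzy_cmeans(X, c=3)
      print(np.bincount(U.argmax(axis=1)))   # hardened cluster sizes, ~50 each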

  5. VariantSpark: population scale clustering of genotype information.

    PubMed

    O'Brien, Aidan R; Saunders, Neil F W; Guo, Yi; Buske, Fabian A; Scott, Rodney J; Bauer, Denis C

    2015-12-10

    Genomic information is increasingly used in medical practice, giving rise to the need for efficient analysis methodology able to cope with thousands of individuals and millions of variants. The widely used Hadoop MapReduce architecture and associated machine learning library, Mahout, provide the means for tackling computationally challenging tasks. However, many genomic analyses do not fit the Map-Reduce paradigm. We therefore utilise the recently developed SPARK engine, along with its associated machine learning library, MLlib, which offers more flexibility in the parallelisation of population-scale bioinformatics tasks. The resulting tool, VARIANTSPARK, provides an interface from MLlib to the standard variant format (VCF), offers seamless genome-wide sampling of variants and provides a pipeline for visualising results. To demonstrate the capabilities of VARIANTSPARK, we clustered more than 3,000 individuals with 80 million variants each to determine the population structure in the dataset. VARIANTSPARK is 80% faster than the SPARK-based genome clustering approach, ADAM, the comparable implementation using Hadoop/Mahout, as well as ADMIXTURE, a commonly used tool for determining individual ancestries. It is over 90% faster than traditional implementations using R and Python. The benefits of speed, resource consumption and scalability enable VARIANTSPARK to open up the usage of advanced, efficient machine learning algorithms to genomic data.

  6. Quantum wavepacket ab initio molecular dynamics: an approach for computing dynamically averaged vibrational spectra including critical nuclear quantum effects.

    PubMed

    Sumner, Isaiah; Iyengar, Srinivasan S

    2007-10-18

    We have introduced a computational methodology to study vibrational spectroscopy in clusters inclusive of critical nuclear quantum effects. This approach is based on the recently developed quantum wavepacket ab initio molecular dynamics method that combines quantum wavepacket dynamics with ab initio molecular dynamics. The computational efficiency of the dynamical procedure is drastically improved (by several orders of magnitude) through the utilization of wavelet-based techniques combined with the previously introduced time-dependent deterministic sampling procedure to achieve stable, picosecond-length, quantum-classical dynamics of electrons and nuclei in clusters. The dynamical information is employed to construct a novel cumulative flux/velocity correlation function, where the wavepacket flux from the quantized particle is combined with classical nuclear velocities to obtain the vibrational density of states. The approach is demonstrated by computing the vibrational density of states of [Cl-H-Cl]-, inclusive of critical quantum nuclear effects, and our results are in good agreement with experiment. A general hierarchical procedure is also provided, based on electronic structure harmonic frequencies, classical ab initio molecular dynamics, computation of nuclear quantum-mechanical eigenstates, and quantum wavepacket ab initio dynamics, to understand vibrational spectroscopy in hydrogen-bonded clusters that display large degrees of anharmonicity.

  7. Legal Research in the Context of Educational Leadership and Policy Studies

    ERIC Educational Resources Information Center

    Sughrue, Jennifer; Driscoll, Lisa G.

    2012-01-01

    Legal research methodology is not included in the cluster of research and design courses offered to undergraduate and graduate students in education by traditional departments of research and foundations, so it becomes the responsibility of education law faculty to instruct students in legal methodology. This narrow corridor of opportunity for…

  8. CoINcIDE: A framework for discovery of patient subtypes across multiple datasets.

    PubMed

    Planey, Catherine R; Gevaert, Olivier

    2016-03-09

    Patient disease subtypes have the potential to transform personalized medicine. However, many patient subtypes derived from unsupervised clustering analyses on high-dimensional datasets are not replicable across multiple datasets, limiting their clinical utility. We present CoINcIDE, a novel methodological framework for the discovery of patient subtypes across multiple datasets that requires no between-dataset transformations. We also present a high-quality database collection, curatedBreastData, with over 2,500 breast cancer gene expression samples. We use CoINcIDE to discover novel breast and ovarian cancer subtypes with prognostic significance and novel hypothesized ovarian therapeutic targets across multiple datasets. CoINcIDE and curatedBreastData are available as R packages.

  9. Fuzzy Clustering Analysis in Environmental Impact Assessment--A Complement Tool to Environmental Quality Index.

    ERIC Educational Resources Information Center

    Kung, Hsiang-Te; And Others

    1993-01-01

    In spite of rapid progress achieved in the methodological research underlying environmental impact assessment (EIA), the problem of weighting various parameters has not yet been solved. This paper presents a new approach, fuzzy clustering analysis, which is illustrated with an EIA case study on Baoshan-Wusong District in Shanghai, China. (Author)
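
    The computational core of fuzzy clustering analysis is the fuzzy c-means iteration. A minimal sketch follows, assuming standardized environmental parameters as the input rows (the Baoshan-Wusong case-study data are not reproduced; toy data stand in).

    ```python
    # Minimal fuzzy c-means sketch; an EIA application would feed
    # standardized environmental parameters as the rows of X.
    import numpy as np

    def fuzzy_cmeans(X, c=2, m=2.0, iters=100, seed=0):
        rng = np.random.default_rng(seed)
        U = rng.random((len(X), c))
        U /= U.sum(axis=1, keepdims=True)          # fuzzy memberships
        for _ in range(iters):
            W = U ** m
            centers = (W.T @ X) / W.sum(axis=0)[:, None]
            d = np.linalg.norm(X[:, None, :] - centers[None], axis=2) + 1e-12
            U = d ** (-2.0 / (m - 1))              # standard membership update
            U /= U.sum(axis=1, keepdims=True)
        return centers, U

    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(4, 1, (20, 2))])
    centers, U = fuzzy_cmeans(X)
    print(centers.round(2))      # two cluster centres
    print(U[:3].round(2))        # soft memberships of the first three samples
    ```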

  10. Marketing Mix Formulation for Higher Education: An Integrated Analysis Employing Analytic Hierarchy Process, Cluster Analysis and Correspondence Analysis

    ERIC Educational Resources Information Center

    Ho, Hsuan-Fu; Hung, Chia-Chi

    2008-01-01

    Purpose: The purpose of this paper is to examine how a graduate institute at National Chiayi University (NCYU), by using a model that integrates analytic hierarchy process, cluster analysis and correspondence analysis, can develop effective marketing strategies. Design/methodology/approach: This is primarily a quantitative study aimed at…

  11. The Atacama Cosmology Telescope: Physical Properties and Purity of a Galaxy Cluster Sample Selected Via the Sunyaev-Zel'Dovich Effect

    NASA Technical Reports Server (NTRS)

    Menanteau, Felipe; Gonzalez, Jorge; Juin, Jean-Baptiste; Marriage, Tobias; Reese, Erik D.; Acquaviva, Viviana; Aguirre, Paula; Appel, John Willam; Baker, Andrew J.; Barrientos, L. Felipe; ...

    2010-01-01

    We present optical and X-ray properties for the first confirmed galaxy cluster sample selected by the Sunyaev-Zel'dovich effect from 148 GHz maps over 455 square degrees of sky made with the Atacama Cosmology Telescope. These maps, coupled with multi-band imaging on 4-meter-class optical telescopes, have yielded a sample of 23 galaxy clusters with redshifts between 0.118 and 1.066. Of these 23 clusters, 10 are newly discovered. The selection of this sample is approximately mass limited and essentially independent of redshift. We provide optical positions, images, redshifts and X-ray fluxes and luminosities for the full sample, and X-ray temperatures for an important subset. The mass limit of the full sample is around 8.0 × 10^14 solar masses, with a number distribution that peaks around a redshift of 0.4. For the 10 highest-significance SZE-selected cluster candidates, all of which are optically confirmed, the mass threshold is 1 × 10^15 solar masses and the redshift range is 0.167 to 1.066. Archival observations from Chandra, XMM-Newton, and ROSAT provide X-ray luminosities and temperatures that are broadly consistent with this mass threshold. Our optical follow-up procedure also allowed us to assess the purity of the ACT cluster sample. Eighty (one hundred) percent of the 148 GHz candidates with signal-to-noise ratios greater than 5.1 (5.7) are confirmed as massive clusters. The reported sample represents one of the largest SZE-selected samples of massive clusters over all redshifts within a cosmologically significant survey volume, which will enable cosmological studies as well as future studies of the evolution, morphology, and stellar populations in the most massive clusters in the Universe.

  12. Workplace cluster of Bell’s palsy in Lima, Peru

    PubMed Central

    2014-01-01

    Background: We report on a workplace cluster of Bell's palsy that occurred within a four-month period in 2011 among employees of a three-story office building in Lima, Peru, and our investigation to determine the etiology and associated risk factors. Findings: An outbreak investigation was conducted to identify possible common infectious or environmental exposures; it included patient interviews, reviews of medical records, an epidemiologic survey, serological analysis for IgM and IgG antibodies to putative Bell's palsy-inducing pathogens, and an environmental exposure assessment of the office building. Three cases of Bell's palsy were reported among 65 at-risk employees, an attack rate of 4.6%. Although two patients had underlying risk factors, there was no clear association or common identifiable risk factor among all cases. Serologic analysis showed no evidence of recent infections, and measured levels of all known chemical toxins and neurotoxins in air and water samples were below maximum allowable exposure concentrations. Conclusions: An infection spread among workplace employees could not be excluded as a potential cause of this cluster; however, it was unlikely to have been a pathogen commonly associated with individual cases of Bell's palsy. Although a specific etiology was not identified, we believe this methodology will aid future outbreak investigations of Bell's palsy and contribute to a better understanding of its etiology. While environmental assessments may help ascertain the cause of clusters of Bell's palsy, future investigations should focus primarily on common infectious etiologies. PMID:24885256

  13. The evaluation of alternate methodologies for land cover classification in an urbanizing area

    NASA Technical Reports Server (NTRS)

    Smekofski, R. M.

    1981-01-01

    The usefulness of LANDSAT data for classifying land cover and for identifying and classifying land use change was investigated in an urbanizing study area. The primary focus of the study was to determine the best classification technique. The many computer-assisted techniques available for analyzing LANDSAT data were evaluated: statistical training techniques (polygons from CRT, unsupervised clustering, polygons from digitizer, and binary masks) were tested with minimum-distance-to-the-mean, maximum-likelihood, and canonical-analysis-with-minimum-distance classifiers. The twelve output images were compared with photointerpreted samples, ground-verified samples and a current land use data base. Results indicate that, for a reconnaissance inventory, unsupervised training with the canonical analysis-minimum distance classifier is the most efficient. If more detailed ground truth and ground verification are available, training on digitizer polygons with the canonical analysis-minimum distance classifier is more accurate.
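
    Of the classifiers compared, minimum distance to the mean is simple enough to sketch directly; the band values and class means below are invented for illustration, not taken from the study.

    ```python
    # Sketch of a minimum-distance-to-the-mean classifier: each pixel is
    # assigned to the land-cover class whose spectral mean is nearest.
    # Training statistics would come from polygons or unsupervised clusters;
    # random data stands in here.
    import numpy as np

    def min_distance_classify(pixels, class_means):
        """pixels: (n, bands); class_means: (k, bands) -> class indices (n,)."""
        d = np.linalg.norm(pixels[:, None, :] - class_means[None], axis=2)
        return d.argmin(axis=1)

    means = np.array([[30.0, 20.0, 15.0, 40.0],    # hypothetical "urban" class
                      [25.0, 35.0, 30.0, 90.0]])   # hypothetical "vegetation"
    rng = np.random.default_rng(0)
    pixels = rng.normal(loc=means[1], scale=3, size=(5, 4))
    print(min_distance_classify(pixels, means))    # mostly class 1
    ```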

  14. Cluster designs to assess the prevalence of acute malnutrition by lot quality assurance sampling: a validation study by computer simulation

    PubMed Central

    Olives, Casey; Pagano, Marcello; Deitchler, Megan; Hedt, Bethany L; Egge, Kari; Valadez, Joseph J

    2009-01-01

    Traditional lot quality assurance sampling (LQAS) methods require simple random sampling to guarantee valid results. However, cluster sampling has been proposed to reduce the number of random starting points. This study uses simulations to examine the classification error of two such designs, a 67×3 (67 clusters of three observations) and a 33×6 (33 clusters of six observations) sampling scheme, for assessing the prevalence of global acute malnutrition (GAM). Further, we explore the use of a 67×3 sequential sampling scheme for LQAS classification of GAM prevalence. Results indicate that, for independent clusters with moderate intracluster correlation for the GAM outcome, the three sampling designs maintain approximate validity for LQAS analysis. Sequential sampling can substantially reduce the average sample size required for data collection. The presence of intercluster correlation can dramatically impact the classification error associated with LQAS analysis. PMID:20011037
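
    A simulation in the spirit of the study can be set up with a beta-binomial model, which induces intracluster correlation in the GAM outcome. The decision threshold and the prevalence scenarios below are illustrative assumptions, not the paper's operating characteristics.

    ```python
    # Toy simulation of a 67x3 cluster design for LQAS classification:
    # cluster-level prevalences come from a Beta distribution with mean p
    # and intracluster correlation icc (beta-binomial model).
    import numpy as np

    rng = np.random.default_rng(1)

    def total_cases(p, icc, n_clusters=67, m=3, reps=5000):
        a, b = p * (1 - icc) / icc, (1 - p) * (1 - icc) / icc
        cluster_p = rng.beta(a, b, size=(reps, n_clusters))
        return rng.binomial(m, cluster_p).sum(axis=1)

    d = 20                                   # illustrative decision rule:
    low = total_cases(p=0.05, icc=0.10)      # classify "high" if cases > d
    high = total_cases(p=0.15, icc=0.10)
    print("P(false 'high' | truly low) :", (low > d).mean())
    print("P(false 'low'  | truly high):", (high <= d).mean())
    ```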

  15. Multivariate Analysis and Prediction of Dioxin-Furan ...

    EPA Pesticide Factsheets

    Peer review draft of the Regional Methods Initiative final report. Dioxins, which are bioaccumulative and environmentally persistent, pose an ongoing risk to human and ecosystem health. Fish constitute a significant source of dioxin exposure for humans and fish-eating wildlife. Current dioxin analytical methods are costly, time-consuming, and produce hazardous by-products. A Danish team developed a novel multivariate statistical methodology based on the covariance of dioxin-furan congener toxic equivalences (TEQs) and fatty acid methyl esters (FAMEs) and applied it to North Atlantic Ocean fishmeal samples. The goal of the current study was to extend this Danish methodology to 77 whole and composite fish samples from three trophic groups: predator (whole largemouth bass), benthic (whole flathead and channel catfish) and forage fish (composite bluegill, pumpkinseed and green sunfish) from two dioxin-contaminated rivers (Pocatalico R. and Kanawha R.) in West Virginia, USA. Multivariate statistical analyses, including principal components analysis (PCA), hierarchical clustering, and partial least squares (PLS) regression, were used to assess the relationship between the FAMEs and TEQs in these dioxin-contaminated freshwater fish from the Kanawha and Pocatalico Rivers. These three multivariate statistical methods all confirm that the pattern of FAMEs in these freshwater fish covaries with and is predictive of the WHO TE
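
    Of the three multivariate methods, the PLS step is the predictive core. A minimal sketch with scikit-learn follows; the FAME/TEQ values are simulated, since the 77-sample data set is not reproduced here.

    ```python
    # Hedged sketch of the predictive step: partial least squares regression
    # of dioxin-furan TEQs on fatty acid methyl ester (FAME) profiles,
    # using simulated data in place of the real fish samples.
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    n_samples, n_fames = 77, 20
    fames = rng.normal(size=(n_samples, n_fames))
    # Simulated TEQ that covaries with a few FAMEs, plus noise.
    teq = fames[:, :3] @ np.array([0.8, -0.5, 0.3]) \
          + rng.normal(scale=0.2, size=n_samples)

    pls = PLSRegression(n_components=3)
    print(cross_val_score(pls, fames, teq, cv=5, scoring="r2").mean())
    ```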

  16. Response to traumatic brain injury neurorehabilitation through an artificial intelligence and statistics hybrid knowledge discovery from databases methodology.

    PubMed

    Gibert, Karina; García-Rudolph, Alejandro; García-Molina, Alberto; Roig-Rovira, Teresa; Bernabeu, Montse; Tormos, José María

    2008-01-01

    The objective was to develop a classificatory tool to identify different populations of patients with traumatic brain injury based on the characteristics of deficit and response to treatment. A KDD framework was used in which descriptive statistics for every variable were first produced, followed by data cleaning and selection of relevant variables. The data were then mined using a generalization of clustering based on rules (CIBR), a hybrid AI and statistics technique that combines inductive learning (AI) and clustering (statistics). A prior knowledge base (KB) is considered to properly bias the clustering; semantic constraints implied by the KB hold in the final clusters, guaranteeing interpretability of the results. A generalization (exogenous clustering based on rules, ECIBR) is presented, which allows the KB to be defined in terms of variables that are not themselves considered in the clustering process, giving greater flexibility. Several tools, such as the class panel graph, are introduced in the methodology to assist final interpretation. A set of 5 classes was recommended by the system, and interpretation permitted the labeling of profiles. From the medical point of view, the composition of the classes corresponds well with different patterns of increasing response to rehabilitation treatment. All patients who were initially assessable form a single group. Severely impaired patients are subdivided into four profiles with clearly distinct response patterns. Particularly interesting is the partial-response profile, in which patients could not improve executive functions. Meaningful classes were obtained and, from a semantic point of view, the results were substantially improved with respect to classical clustering, supporting our opinion that hybrid AI and statistics techniques are more powerful for KDD than pure ones.

  17. Simulation modeling for stratified breast cancer screening - a systematic review of cost and quality of life assumptions.

    PubMed

    Arnold, Matthias

    2017-12-02

    The economic evaluation of stratified breast cancer screening is gaining momentum but produces very diverse results. Systematic reviews have so far focused on modeling techniques and epidemiologic assumptions; cost and utility parameters have received little attention. This systematic review assesses simulation models for stratified breast cancer screening based on their cost and utility parameters in each phase of breast cancer screening and care. A literature review was conducted to compare economic evaluations with simulation models of personalized breast cancer screening. Study quality was assessed using reporting guidelines. Cost and utility inputs were extracted, standardized and structured using a care delivery framework. Studies were then clustered according to their study aim, and parameters were compared within the clusters. Eighteen studies were identified within three study clusters. Reporting quality was very diverse in all three clusters: only two studies in cluster 1, four studies in cluster 2 and one study in cluster 3 scored highly in the quality appraisal. In addition to the quality appraisal, this review assessed whether the simulation models were consistent in integrating all relevant phases of care, whether the utility parameters were consistent and methodologically sound, and whether the costs were compatible and consistent in the actual parameters used for screening, diagnostic work-up and treatment. Of the 18 studies, only three did not show signs of potential bias. This systematic review shows that a closer look at the cost and utility parameters can help identify potential bias. Future simulation models should focus on integrating all relevant phases of care, using methodologically sound utility parameters and avoiding inconsistent cost parameters.

  18. Towards Accurate Modelling of Galaxy Clustering on Small Scales: Testing the Standard ΛCDM + Halo Model

    NASA Astrophysics Data System (ADS)

    Sinha, Manodeep; Berlind, Andreas A.; McBride, Cameron K.; Scoccimarro, Roman; Piscionere, Jennifer A.; Wibking, Benjamin D.

    2018-04-01

    Interpreting the small-scale clustering of galaxies with halo models can elucidate the connection between galaxies and dark matter halos. Unfortunately, the modelling is typically not sufficiently accurate for ruling out models statistically. It is thus difficult to use the information encoded in small scales to test cosmological models or probe subtle features of the galaxy-halo connection. In this paper, we attempt to push halo modelling into the "accurate" regime with a fully numerical mock-based methodology and careful treatment of statistical and systematic errors. With our forward-modelling approach, we can incorporate clustering statistics beyond the traditional two-point statistics. We use this modelling methodology to test the standard ΛCDM + halo model against the clustering of SDSS DR7 galaxies. Specifically, we use the projected correlation function, group multiplicity function and galaxy number density as constraints. We find that while the model fits each statistic separately, it struggles to fit them simultaneously. Adding group statistics leads to a more stringent test of the model and significantly tighter constraints on model parameters. We explore the impact of varying the adopted halo definition and cosmological model and find that changing the cosmology makes a significant difference. The most successful model we tried (Planck cosmology with Mvir halos) matches the clustering of low luminosity galaxies, but exhibits a 2.3σ tension with the clustering of luminous galaxies, thus providing evidence that the "standard" halo model needs to be extended. This work opens the door to adding interesting freedom to the halo model and including additional clustering statistics as constraints.

  19. Healthy Learning Mind - a school-based mindfulness and relaxation program: a study protocol for a cluster randomized controlled trial.

    PubMed

    Volanen, Salla-Maarit; Lassander, Maarit; Hankonen, Nelli; Santalahti, Päivi; Hintsanen, Mirka; Simonsen, Nina; Raevuori, Anu; Mullola, Sari; Vahlberg, Tero; But, Anna; Suominen, Sakari

    2016-07-11

    Mindfulness has shown positive effects on mental health, mental capacity and well-being in the adult population. Among children and adolescents, previous research on the effectiveness of mindfulness interventions on health and well-being has shown promising results, but studies with methodologically sound designs have been called for. Few intervention studies in this population have compared the effectiveness of mindfulness programs to alternative intervention programs with adequate sample sizes. Our primary aim is to explore the effectiveness of a school-based mindfulness intervention program compared to a standard relaxation program in a non-clinical sample of children and adolescents, and to a non-treatment control group, in the school context. In this study, we systematically examine the effects of the mindfulness intervention on mental well-being (the primary outcomes being resilience; presence/absence of depressive symptoms; and experienced psychological strengths and difficulties), cognitive functions, psychophysiological responses, academic achievement, and motivational determinants of practicing mindfulness. The design is a cluster randomized controlled trial with three arms (mindfulness intervention group, active control group, non-treatment group), and the sample includes 59 Finnish schools and approximately 3,000 students aged 12-15 years. The intervention consists of nine mindfulness-based lessons, 45 minutes per week for 9 weeks, with the dose identical in the active control group, which receives a standard relaxation program called Relax. The programs are delivered by 14 trained facilitators. Students, their teachers and parents will fill in the research questionnaires before and after the intervention, and all will be followed up 6 months after baseline. Additionally, students will be followed 12 months after baseline. For longer follow-up, consent to link the data to the main health registers has been sought from students and their parents. The present study systematically examines the effectiveness of a school-based mindfulness program compared to a standard relaxation program and a non-treatment control group. A strength of the current study lies in its methodologically rigorous, randomized controlled design, which allows novel evidence on the effectiveness of mindfulness over and above a standard relaxation program. ISRCTN18642659. Retrospectively registered 13 October 2015.

  1. Testing for X-Ray–SZ Differences and Redshift Evolution in the X-Ray Morphology of Galaxy Clusters

    DOE PAGES

    Nurgaliev, D.; McDonald, M.; Benson, B. A.; ...

    2017-05-16

    We present a quantitative study of the X-ray morphology of galaxy clusters, as a function of their detection method and redshift. We analyze two separate samples of galaxy clusters: a sample of 36 clusters at 0.35 < z < 0.9 selected in the X-ray with the ROSAT PSPC 400 deg² survey, and a sample of 90 clusters at 0.25 < z < 1.2 selected via the Sunyaev–Zel'dovich (SZ) effect with the South Pole Telescope. Clusters from both samples have similar-quality Chandra observations, which allow us to quantify their X-ray morphologies via two distinct methods: centroid shifts (w) and photon asymmetry (A_phot). The latter technique provides nearly unbiased morphology estimates for clusters spanning a broad range of redshift and data quality. We further compare the X-ray morphologies of X-ray- and SZ-selected clusters with those of simulated clusters. We do not find a statistically significant difference in the measured X-ray morphology of X-ray- and SZ-selected clusters over the redshift range probed by these samples, suggesting that the two are probing similar populations of clusters. We find that the X-ray morphologies of simulated clusters are statistically indistinguishable from those of X-ray- or SZ-selected clusters, implying that the most important physics for dictating the large-scale gas morphology (outside of the core) is well approximated in these simulations. Finally, we find no statistically significant redshift evolution in the X-ray morphology (both for observed and simulated clusters) over the range z ~ 0.3 to z ~ 1, seemingly in contradiction with the redshift-dependent halo merger rate predicted by simulations.

  2. Re-estimating sample size in cluster randomised trials with active recruitment within clusters.

    PubMed

    van Schie, S; Moerbeek, M

    2014-08-30

    Often only a limited number of clusters can be obtained in cluster randomised trials, although many potential participants can be recruited within each cluster. Thus, active recruitment is feasible within the clusters. To obtain an efficient sample size in a cluster randomised trial, the cluster level and individual level variance should be known before the study starts, but this is often not the case. We suggest using an internal pilot study design to address this problem of unknown variances. A pilot can be useful to re-estimate the variances and re-calculate the sample size during the trial. Using simulated data, it is shown that an initially low or high power can be adjusted using an internal pilot with the type I error rate remaining within an acceptable range. The intracluster correlation coefficient can be re-estimated with more precision, which has a positive effect on the sample size. We conclude that an internal pilot study design may be used if active recruitment is feasible within a limited number of clusters. Copyright © 2014 John Wiley & Sons, Ltd.
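
    A minimal sketch of the internal-pilot idea follows, assuming a continuous outcome, the usual one-way ANOVA estimator of the intracluster correlation, and the standard design effect; the paper's exact re-estimation procedure may differ, and the target inputs are illustrative.

    ```python
    # Sketch: re-estimate the ICC from pilot data and recompute the number
    # of clusters via the design effect deff = 1 + (m - 1) * icc.
    import numpy as np

    def anova_icc(data):
        """data: list of per-cluster arrays of outcomes."""
        k = len(data)
        m = np.mean([len(c) for c in data])          # average cluster size
        grand = np.mean(np.concatenate(data))
        msb = m * np.sum([(np.mean(c) - grand) ** 2 for c in data]) / (k - 1)
        msw = np.mean([np.var(c, ddof=1) for c in data])
        # Truncate negative estimates at zero.
        return max((msb - msw) / (msb + (m - 1) * msw), 0.0)

    def clusters_needed(n_individual, m, icc):
        deff = 1 + (m - 1) * icc                     # design effect
        return int(np.ceil(n_individual * deff / m))

    rng = np.random.default_rng(2)
    # Internal pilot: 8 clusters of 10, between-cluster SD 0.5, within SD 1.
    pilot = [rng.normal(loc=rng.normal(0, 0.5), size=10) for _ in range(8)]
    icc = anova_icc(pilot)
    print(f"re-estimated ICC = {icc:.3f}; "
          f"clusters per arm = {clusters_needed(128, 10, icc)}")
    ```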

  3. A PRIOR EVALUATION OF TWO-STAGE CLUSTER SAMPLING FOR ACCURACY ASSESSMENT OF LARGE-AREA LAND-COVER MAPS

    EPA Science Inventory

    Two-stage cluster sampling reduces the cost of collecting accuracy assessment reference data by constraining sample elements to fall within a limited number of geographic domains (clusters). However, because classification error is typically positively spatially correlated, withi...

  4. 2-Way k-Means as a Model for Microbiome Samples.

    PubMed

    Jackson, Weston J; Agarwal, Ipsita; Pe'er, Itsik

    2017-01-01

    Motivation. Microbiome sequencing allows defining clusters of samples with shared composition. However, this paradigm poorly accounts for samples whose composition is a mixture of cluster-characterizing ones and which therefore lie in between them in the cluster space. This paper addresses unsupervised learning of 2-way clusters. It defines a mixture model that allows 2-way cluster assignment and describes a variant of generalized k-means for learning such a model. We demonstrate applicability to microbial 16S rDNA sequencing data from the Human Vaginal Microbiome Project.
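
    A toy rendition of the 2-way idea (not the authors' mixture model): fit ordinary k-means, then let each sample be explained either by a single centroid or by the midpoint of a pair of centroids, whichever reconstructs it best.

    ```python
    # Toy sketch of 2-way assignment: samples lying between two clusters
    # are best explained by a pair of centroids rather than a single one.
    import numpy as np
    from itertools import combinations
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (30, 5)),
                   rng.normal(3, 1, (30, 5)),
                   rng.normal(1.5, 0.3, (10, 5))])   # a block lying "between"
    centers = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X).cluster_centers_

    candidates = [(i,) for i in range(len(centers))] + \
                 list(combinations(range(len(centers)), 2))
    for x in X[-3:]:                                 # the in-between samples
        errs = [np.linalg.norm(x - centers[list(c)].mean(axis=0))
                for c in candidates]
        print(candidates[int(np.argmin(errs))])      # often the 2-way pair (0, 1)
    ```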

  6. Evaluation of the procedure 1A component of the 1980 US/Canada wheat and barley exploratory experiment

    NASA Technical Reports Server (NTRS)

    Chapman, G. M. (Principal Investigator); Carnes, J. G.

    1981-01-01

    Several techniques that use clusters generated by a new clustering algorithm, CLASSY, are proposed as alternatives to random sampling to obtain greater precision in crop proportion estimation: (1) proportional allocation/relative count estimator (PA/RCE), which allocates dots to clusters in proportion to cluster size and uses a relative-count cluster-level estimate; (2) proportional allocation/Bayes estimator (PA/BE), which allocates dots to clusters proportionally and uses a Bayesian cluster-level estimate; and (3) Bayes sequential allocation/Bayesian estimator (BSA/BE), which allocates dots to clusters sequentially and uses a Bayesian cluster-level estimate. Clustering is an effective method for making proportion estimates. It is estimated that, to obtain the same precision with random sampling as obtained by proportional sampling of 50 dots with an unbiased estimator, samples of 85 or 166 dots would need to be taken if dot sets with AI labels (integrated procedure) or ground truth labels, respectively, were input. Dot reallocation provides dot sets that are unbiased. It is recommended that these proportion estimation techniques be maintained, particularly the PA/BE, because it provides the greatest precision.

  7. X-Ray Morphological Analysis of the Planck ESZ Clusters

    NASA Astrophysics Data System (ADS)

    Lovisari, Lorenzo; Forman, William R.; Jones, Christine; Ettori, Stefano; Andrade-Santos, Felipe; Arnaud, Monique; Démoclès, Jessica; Pratt, Gabriel W.; Randall, Scott; Kraft, Ralph

    2017-09-01

    X-ray observations show that galaxy clusters have a very large range of morphologies. The most disturbed systems, which are well suited to studying how clusters form and grow and to testing physical models, may complicate cosmological studies because the cluster mass determination becomes more challenging. Thus, we need to understand the cluster properties of our samples to reduce possible biases. This is complicated by the fact that different experiments may detect different cluster populations. For example, Sunyaev-Zeldovich (SZ) selected cluster samples have been found to include a greater fraction of disturbed systems than X-ray selected samples. In this paper we determine eight morphological parameters for the Planck Early Sunyaev-Zeldovich (ESZ) objects observed with XMM-Newton. We found that two parameters, concentration and centroid shift, are the best at distinguishing between relaxed and disturbed systems. For each parameter we provide the values that allow the most relaxed or most disturbed objects in a sample to be selected. We found no dependence of cluster dynamical state on mass. By comparing our results with those obtained for the REXCESS clusters, we also confirm that the ESZ clusters indeed tend to be more disturbed, as found by previous studies.

  9. Sampling procedures for inventory of commercial volume tree species in Amazon Forest.

    PubMed

    Netto, Sylvio P; Pelissari, Allan L; Cysneiros, Vinicius C; Bonazza, Marcelo; Sanquetta, Carlos R

    2017-01-01

    The spatial distribution of tropical tree species can affect the consistency of the estimators in commercial forest inventories; therefore, appropriate sampling procedures are required to survey species with different spatial patterns in the Amazon Forest. The present study aims to evaluate conventional sampling procedures and to introduce adaptive cluster sampling for volumetric inventories of Amazonian tree species, considering the hypotheses that density, spatial distribution and zero-plots affect the consistency of the estimators, and that adaptive cluster sampling allows more accurate volumetric estimation. We use data from a census carried out in the Jamari National Forest, Brazil, where trees with diameters of 40 cm or more were measured in 1,355 plots. Species with different spatial patterns were selected and sampled with simple random sampling, systematic sampling, linear cluster sampling and adaptive cluster sampling, and the accuracy of the volumetric estimates and the presence of zero-plots were evaluated. The sampling procedures were affected by the low density of trees and the large number of zero-plots; the adaptive clusters concentrated the sampling effort in plots containing trees and thus yielded more representative samples for estimating commercial volume.
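
    The mechanics of adaptive cluster sampling are easy to sketch: draw an initial random sample of plots and, whenever a plot meets the condition of interest, adaptively add its neighbours. The gridded population below is simulated, not the Jamari census data.

    ```python
    # Sketch of adaptive cluster sampling on a gridded forest: whenever a
    # sampled plot contains trees, its four neighbours are added.
    import numpy as np

    rng = np.random.default_rng(3)
    side = 30
    pop = np.zeros((side, side))                    # trees per plot
    for cx, cy in rng.integers(3, side - 3, size=(6, 2)):
        pop[cx - 1:cx + 2, cy - 1:cy + 2] = rng.poisson(4, (3, 3))

    def adaptive_cluster_sample(pop, n_initial=25):
        start = rng.choice(pop.size, n_initial, replace=False)
        frontier = [(i // side, i % side) for i in start]
        sampled = set()
        while frontier:
            i, j = frontier.pop()
            if (i, j) in sampled:
                continue
            sampled.add((i, j))
            if pop[i, j] > 0:                       # condition met: adapt
                frontier += [(i + di, j + dj)
                             for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
                             if 0 <= i + di < side and 0 <= j + dj < side]
        return sampled

    s = adaptive_cluster_sample(pop)
    print(len(s), "plots sampled;", sum(pop[c] > 0 for c in s), "with trees")
    ```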

  10. Adaptive Cluster Sampling for Forest Inventories

    Treesearch

    Francis A. Roesch

    1993-01-01

    Adaptive cluster sampling is shown to be a viable alternative for sampling forests when there are rare characteristics of the forest trees which are of interest and occur on clustered trees. The ideas of recent work in Thompson (1990) have been extended to the case in which the initial sample is selected with unequal probabilities. An example is given in which the...

  11. Sample size adjustments for varying cluster sizes in cluster randomized trials with binary outcomes analyzed with second-order PQL mixed logistic regression.

    PubMed

    Candel, Math J J M; Van Breukelen, Gerard J P

    2010-06-30

    Adjustments of sample size formulas are given for varying cluster sizes in cluster randomized trials with a binary outcome when testing the treatment effect with mixed effects logistic regression using second-order penalized quasi-likelihood estimation (PQL). Starting from first-order marginal quasi-likelihood (MQL) estimation of the treatment effect, the asymptotic relative efficiency of unequal versus equal cluster sizes is derived. A Monte Carlo simulation study shows this asymptotic relative efficiency to be rather accurate for realistic sample sizes, when employing second-order PQL. An approximate, simpler formula is presented to estimate the efficiency loss due to varying cluster sizes when planning a trial. In many cases sampling 14 per cent more clusters is sufficient to repair the efficiency loss due to varying cluster sizes. Since current closed-form formulas for sample size calculation are based on first-order MQL, planning a trial also requires a conversion factor to obtain the variance of the second-order PQL estimator. In a second Monte Carlo study, this conversion factor turned out to be 1.25 at most. (c) 2010 John Wiley & Sons, Ltd.
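
    The paper's corrections are derived for second-order PQL specifically; for planning intuition, a widely used approximation inflates the design effect by the squared coefficient of variation (CV) of cluster size, deff = 1 + ((CV² + 1)·m̄ − 1)·ρ. A quick sketch with illustrative inputs:

    ```python
    # Approximate design effect under varying cluster sizes (a common
    # planning formula, not the paper's exact PQL-based derivation), and
    # the implied extra clusters relative to equal cluster sizes.
    def deff_varying(m_bar, cv, icc):
        return 1 + ((cv**2 + 1) * m_bar - 1) * icc

    m_bar, icc = 20, 0.05                 # illustrative mean size and ICC
    for cv in (0.0, 0.4, 0.8):
        d = deff_varying(m_bar, cv, icc)
        extra = d / deff_varying(m_bar, 0.0, icc) - 1
        print(f"cv={cv:.1f}: deff={d:.2f}, extra clusters ~{100 * extra:.0f}%")
    ```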

  12. Searching for the 3.5 keV Line in the Stacked Suzaku Observations of Galaxy Clusters

    NASA Technical Reports Server (NTRS)

    Bulbul, Esra; Markevitch, Maxim; Foster, Adam; Miller, Eric; Bautz, Mark; Lowenstein, Mike; Randall, Scott W.; Smith, Randall K.

    2016-01-01

    We perform a detailed study of the stacked Suzaku observations of 47 galaxy clusters, spanning a redshift range of 0.01-0.45, to search for the unidentified 3.5 keV line. This sample provides an independent test for the previously detected line. We detect a 2σ-significant spectral feature at 3.5 keV in the spectrum of the full sample. When the sample is divided into two subsamples (cool-core and non-cool-core clusters), the cool-core subsample shows no statistically significant positive residuals at the line energy. A very weak (≈2σ confidence) spectral feature at 3.5 keV is permitted by the data from the non-cool-core cluster sample. The upper limit on a neutrino decay mixing angle of sin²(2θ) = 6.1 × 10⁻¹¹ from the full Suzaku sample is consistent with the previous detections in the stacked XMM-Newton sample of galaxy clusters (which had a higher statistical sensitivity to faint lines), M31, and the Galactic center, at the 90% confidence level. However, the constraint from the present sample, which does not include the Perseus cluster, is in tension with the previously reported line flux observed in the core of the Perseus cluster with XMM-Newton and Suzaku.

  13. The Meaning of Work among Chinese University Students: Findings from Prototype Research Methodology

    ERIC Educational Resources Information Center

    Zhou, Sili; Leung, S. Alvin; Li, Xu

    2012-01-01

    This study examined Chinese university students' conceptualization of the meaning of work. One hundred and ninety students (93 male, 97 female) from Beijing, China, participated in the study. Prototype research methodology (J. Li, 2001) was used to explore the meaning of work and the associations among the identified meanings. Cluster analysis was…

  14. Efficiently sampling conformations and pathways using the concurrent adaptive sampling (CAS) algorithm

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ahn, Surl-Hee; Grate, Jay W.; Darve, Eric F.

    Molecular dynamics (MD) simulations are useful for obtaining thermodynamic and kinetic properties of biomolecules but are limited by the timescale barrier; i.e., we may be unable to efficiently obtain properties because we need to run simulations of microseconds or longer using femtosecond time steps. While several existing methods overcome this timescale barrier and efficiently sample thermodynamic and/or kinetic properties, problems remain with regard to sampling unknown systems, dealing with the high-dimensional space of collective variables, and focusing the computational effort on slow timescales. Hence, a new sampling method, called the "Concurrent Adaptive Sampling (CAS) algorithm," has been developed to tackle these three issues and efficiently obtain conformations and pathways. The method is not constrained to use only one or two collective variables, unlike most reaction-coordinate-dependent methods. Instead, it can use a large number of collective variables and uses macrostates (a partition of the collective-variable space) to enhance the sampling. The exploration is done by running a large number of short simulations, and a clustering technique is used to accelerate the sampling. In this paper, we introduce the new methodology and show results from two-dimensional models and biomolecules, such as penta-alanine and a triazine polymer.

  15. CA II TRIPLET SPECTROSCOPY OF SMALL MAGELLANIC CLOUD RED GIANTS. III. ABUNDANCES AND VELOCITIES FOR A SAMPLE OF 14 CLUSTERS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Parisi, M. C.; Clariá, J. J.; Marcionni, N.

    2015-05-15

    We obtained spectra of red giants in 15 Small Magellanic Cloud (SMC) clusters in the region of the Ca II lines with FORS2 on the Very Large Telescope. We determined the mean metallicity and radial velocity, with mean errors of 0.05 dex and 2.6 km s⁻¹, respectively, from a mean of 6.5 members per cluster. One cluster (B113) was too young for a reliable metallicity determination and was excluded from the sample. We combined the sample studied here with 15 clusters previously studied by us using the same technique, and with 7 clusters whose metallicities, determined by other authors, are on a scale similar to ours. This compilation of 36 clusters is the largest SMC cluster sample currently available with accurate and homogeneously determined metallicities. We found a high probability that the metallicity distribution is bimodal, with potential peaks at −1.1 and −0.8 dex. Our data show no strong evidence of a metallicity gradient in the SMC clusters, somewhat at odds with recent evidence from Ca II triplet spectra of a large sample of field stars. This may be revealing possible differences in the chemical history of clusters and field stars. Our clusters show a significant dispersion of metallicities at any given age, which could reflect the lack of a unique age-metallicity relation in this galaxy. None of the chemical evolution models currently available in the literature satisfactorily represents the global chemical enrichment processes of SMC clusters.

  16. Self-Organizing Hidden Markov Model Map (SOHMMM): Biological Sequence Clustering and Cluster Visualization.

    PubMed

    Ferles, Christos; Beaufort, William-Scott; Ferle, Vanessa

    2017-01-01

    The present study devises mapping methodologies and projection techniques for visualizing and demonstrating biological sequence data clustering results. The Sequence Data Density Display (SDDD) and Sequence Likelihood Projection (SLP) visualizations represent the input symbolic sequences in a lower-dimensional space in such a way that the clusters and relations of data elements are depicted graphically. Both operate in combination/synergy with the Self-Organizing Hidden Markov Model Map (SOHMMM). The resulting unified framework is able to analyze raw sequence data automatically and directly. This analysis is carried out with little, or even a complete absence of, prior information/domain knowledge.

  17. Spectroscopic characterization of galaxy clusters in RCS-1: spectroscopic confirmation, redshift accuracy, and dynamical mass-richness relation

    NASA Astrophysics Data System (ADS)

    Gilbank, David G.; Barrientos, L. Felipe; Ellingson, Erica; Blindert, Kris; Yee, H. K. C.; Anguita, T.; Gladders, M. D.; Hall, P. B.; Hertling, G.; Infante, L.; Yan, R.; Carrasco, M.; Garcia-Vergara, Cristina; Dawson, K. S.; Lidman, C.; Morokuma, T.

    2018-05-01

    We present follow-up spectroscopic observations of galaxy clusters from the first Red-sequence Cluster Survey (RCS-1). This work focuses on two samples: a lower-redshift sample of ~30 clusters ranging from z ~ 0.2-0.6, observed with multiobject spectroscopy (MOS) on 4-6.5-m class telescopes, and a z ~ 1 sample of ~10 clusters observed with 8-m class telescopes. We examine the detection efficiency and redshift accuracy of the now widely used red-sequence technique for selecting clusters via overdensities of red-sequence galaxies. Using both these data and extended samples including previously published RCS-1 spectroscopy and spectroscopic redshifts from SDSS, we find that the red-sequence redshift based on simple two-filter cluster photometric redshifts is accurate to σz ≈ 0.035(1 + z) in RCS-1. This accuracy can potentially be improved with better survey photometric calibration. For the lower-redshift sample, ~5 per cent of clusters show some (minor) contamination from secondary systems with the same red sequence intruding into the measurement aperture of the original cluster. At z ~ 1, the rate rises to ~20 per cent. Approximately ten per cent of projections are expected to be serious, where the two components contribute significant numbers of their red-sequence galaxies to another cluster. Finally, we present a preliminary study of the mass-richness calibration using velocity dispersions to probe the dynamical masses of the clusters. We find a relation broadly consistent with that seen in the local universe from the WINGS sample at z ~ 0.05.

  18. Unequal cluster sizes in stepped-wedge cluster randomised trials: a systematic review.

    PubMed

    Kristunas, Caroline; Morris, Tom; Gray, Laura

    2017-11-15

    Objectives: to investigate the extent to which cluster sizes vary in stepped-wedge cluster randomised trials (SW-CRT) and whether any variability is accounted for during the sample size calculation and analysis of these trials. Setting: any, not limited to healthcare settings. Participants: any SW-CRT published up to March 2016. The primary outcome is the variability in cluster sizes, measured by the coefficient of variation (CV) in cluster size. Secondary outcomes include the difference between the cluster sizes assumed during the sample size calculation and those observed during the trial, any reported variability in cluster sizes and whether the methods of sample size calculation and methods of analysis accounted for any variability in cluster sizes. Of the 101 included SW-CRTs, 48% mentioned that the included clusters were known to vary in size, yet only 13% of these accounted for this during the calculation of the sample size. However, 69% of the trials did use a method of analysis appropriate for when clusters vary in size. Full trial reports were available for 53 trials. The CV was calculated for 23 of these: the median CV was 0.41 (IQR: 0.22-0.52). Actual cluster sizes could be compared with those assumed during the sample size calculation for 14 (26%) of the trial reports; the cluster sizes were between 29% and 480% of those that had been assumed. Cluster sizes often vary in SW-CRTs. Reporting of SW-CRTs also remains suboptimal. The effect of unequal cluster sizes on the statistical power of SW-CRTs needs further exploration, and methods appropriate to studies with unequal cluster sizes need to be employed. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  19. Applying a Resources Framework to Analysis of the Force and Motion Conceptual Evaluation

    ERIC Educational Resources Information Center

    Smith, Trevor I.; Wittman, Michael C.

    2008-01-01

    We suggest one redefinition of common clusters of questions used to analyze student responses on the Force and Motion Conceptual Evaluation. Our goal is to propose a methodology that moves beyond an analysis of student learning defined by correct responses, either on the overall test or on clusters of questions defined solely by content. We use…

  20. The clustering of galaxies in the completed SDSS-III Baryon Oscillation Spectroscopic Survey: combining correlated Gaussian posterior distributions

    DOE PAGES

    Sánchez, Ariel G.; Grieb, Jan Niklas; Salazar-Albornoz, Salvador; ...

    2016-09-30

    The cosmological information contained in anisotropic galaxy clustering measurements can often be compressed into a small number of parameters whose posterior distribution is well described by a Gaussian. Here, we present a general methodology to combine these estimates into a single set of consensus constraints that encode the total information of the individual measurements, taking into account the full covariance between the different methods. We also illustrate this technique by applying it to combine the results obtained from different clustering analyses, including measurements of the signature of baryon acoustic oscillations and redshift-space distortions, based on a set of mock catalogues of the final SDSS-III Baryon Oscillation Spectroscopic Survey (BOSS). Our results show that the region of the parameter space allowed by the consensus constraints is smaller than that of the individual methods, highlighting the importance of performing multiple analyses on galaxy surveys even when the measurements are highly correlated. Our paper is part of a set that analyses the final galaxy clustering data set from BOSS. The methodology presented here is used in Alam et al. to produce the final cosmological constraints from BOSS.
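
    The combination step for correlated Gaussian posteriors is generalized least squares: with the individual estimates stacked into a vector x, their full joint covariance C, and a design matrix A mapping the consensus parameters onto each estimate, the consensus is (AᵀC⁻¹A)⁻¹AᵀC⁻¹x. A minimal sketch with invented numbers:

    ```python
    # GLS combination of correlated Gaussian estimates of the same
    # parameter vector; the numbers below are purely illustrative.
    import numpy as np

    def consensus(x, C, n_methods):
        p = len(x) // n_methods                    # parameters per method
        A = np.tile(np.eye(p), (n_methods, 1))     # all estimate the same thing
        Cinv = np.linalg.inv(C)
        cov = np.linalg.inv(A.T @ Cinv @ A)
        return cov @ A.T @ Cinv @ x, cov

    # Two methods, one parameter, correlation 0.6 between their errors.
    x = np.array([1.0, 1.2])
    C = np.array([[0.2**2, 0.6 * 0.2 * 0.3],
                  [0.6 * 0.2 * 0.3, 0.3**2]])
    mean, cov = consensus(x, C, n_methods=2)
    print(mean, np.sqrt(np.diag(cov)))   # consensus estimate and standard error
    ```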

  1. Clustering Molecular Dynamics Trajectories for Optimizing Docking Experiments

    PubMed Central

    De Paris, Renata; Quevedo, Christian V.; Ruiz, Duncan D.; Norberto de Souza, Osmar; Barros, Rodrigo C.

    2015-01-01

    Molecular dynamics simulations of protein receptors have become an attractive tool for rational drug discovery. However, the high computational cost of employing molecular dynamics trajectories in virtual screening of large repositories threatens the feasibility of this task. Computational intelligence techniques have been applied in this context, with the ultimate goal of reducing the overall computational cost so the task can become feasible. In particular, clustering algorithms have been widely used as a means to reduce the dimensionality of molecular dynamics trajectories. In this paper, we develop a novel methodology for clustering entire trajectories using structural features from the substrate-binding cavity of the receptor in order to optimize docking experiments on a cloud-based environment. The resulting partition was selected based on three clustering validity criteria, and it was further validated by analyzing the interactions between 20 ligands and a fully flexible receptor (FFR) model containing a 20 ns molecular dynamics simulation trajectory. Our proposed methodology shows that taking into account features of the substrate-binding cavity as input for the k-means algorithm is a promising technique for accurately selecting ensembles of representative structures tailored to a specific ligand. PMID:25873944

  2. Time-Resolved Transposon Insertion Sequencing Reveals Genome-Wide Fitness Dynamics during Infection.

    PubMed

    Yang, Guanhua; Billings, Gabriel; Hubbard, Troy P; Park, Joseph S; Yin Leung, Ka; Liu, Qin; Davis, Brigid M; Zhang, Yuanxing; Wang, Qiyao; Waldor, Matthew K

    2017-10-03

    Transposon insertion sequencing (TIS) is a powerful high-throughput genetic technique that is transforming functional genomics in prokaryotes, because it enables genome-wide mapping of the determinants of fitness. However, current approaches for analyzing TIS data assume that selective pressures are constant over time and thus do not yield information regarding changes in the genetic requirements for growth in dynamic environments (e.g., during infection). Here, we describe a structured analysis of TIS data collected as a time series, termed pattern analysis of conditional essentiality (PACE). From a temporal series of TIS data, PACE derives a quantitative assessment of each mutant's fitness over the course of an experiment and identifies mutants with related fitness profiles. In so doing, PACE circumvents major limitations of existing methodologies, specifically the need for artificial effect-size thresholds and enumeration of bacterial population expansion. We used PACE to analyze TIS samples of Edwardsiella piscicida (a fish pathogen) collected over a 2-week infection period from a natural host (the flatfish turbot). PACE uncovered more genes that affect E. piscicida's fitness in vivo than were detected using a cutoff at a terminal sampling point, and it identified subpopulations of mutants with distinct fitness profiles, one of which informed the design of new live vaccine candidates. Overall, PACE enables efficient mining of time-series TIS data and enhances the power and sensitivity of TIS-based analyses. IMPORTANCE: Transposon insertion sequencing (TIS) enables genome-wide mapping of the genetic determinants of fitness, typically based on observations at a single sampling point. Here, we move beyond analysis of endpoint TIS data to create a framework for analysis of time-series TIS data, termed pattern analysis of conditional essentiality (PACE). We applied PACE to identify genes that contribute to colonization of a natural host by the fish pathogen Edwardsiella piscicida. PACE uncovered more genes that affect E. piscicida's fitness in vivo than were detected using a terminal sampling point, and its clustering of mutants with related fitness profiles informed the design of new live vaccine candidates. PACE yields insights into patterns of fitness dynamics and circumvents major limitations of existing methodologies. Finally, the PACE method should be applicable to additional "omic" time-series data, including screens based on clustered regularly interspaced short palindromic repeats with Cas9 (CRISPR/Cas9). Copyright © 2017 Yang et al.
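
    A hedged sketch of the trajectory idea (not the PACE implementation): summarize each mutant as a time series of log₂ fold changes in insertion abundance and group related fitness profiles by k-means. All profile shapes below are simulated assumptions.

    ```python
    # Toy grouping of mutant fitness trajectories over an infection time
    # course; neutral, early-defect and late-defect shapes are simulated.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(4)
    T = 5                                               # sampling timepoints
    neutral = rng.normal(0, 0.3, (100, T))              # flat fitness profiles
    early = np.cumsum(rng.normal(-0.8, 0.3, (30, T)), axis=1)   # early defect
    late = np.hstack([rng.normal(0, 0.3, (30, 2)),              # late defect
                      np.cumsum(rng.normal(-0.8, 0.3, (30, 3)), axis=1)])
    profiles = np.vstack([neutral, early, late])

    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(profiles)
    print(np.bincount(labels))                          # profile-group sizes
    ```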

  3. Spatial cluster detection for repeatedly measured outcomes while accounting for residential history.

    PubMed

    Cook, Andrea J; Gold, Diane R; Li, Yi

    2009-10-01

    Spatial cluster detection has become an important methodology for quantifying the effect of hazardous exposures. Previous methods have focused on cross-sectional outcomes that are binary or continuous; there are virtually no spatial cluster detection methods for longitudinal outcomes. This paper proposes a new spatial cluster detection method for repeated outcomes using cumulative geographic residuals. A major advantage of this method is its ability to readily incorporate information on study participants' relocations, which most cluster detection statistics cannot. Application of these methods is illustrated with the Home Allergens and Asthma prospective cohort study, analyzing the relationship between environmental exposures and a repeatedly measured outcome, occurrence of wheeze in the last 6 months, while taking mobile locations into account.

  4. X-ray and optical substructures of the DAFT/FADA survey clusters

    NASA Astrophysics Data System (ADS)

    Guennou, L.; Durret, F.; Adami, C.; Lima Neto, G. B.

    2013-04-01

    We have undertaken the DAFT/FADA survey with the double aim of setting constraints on dark energy based on weak lensing tomography and of obtaining homogeneous and high quality data for a sample of 91 massive clusters in the redshift range 0.4-0.9 for which there were HST archive data. We have analysed the XMM-Newton data available for 42 of these clusters to derive their X-ray temperatures and luminosities and search for substructures. Out of these, a spatial analysis was possible for 30 clusters, but only 23 had deep enough X-ray data for a really robust analysis. This study was coupled with a dynamical analysis for the 26 clusters having at least 30 spectroscopic galaxy redshifts in the cluster range. Altogether, the X-ray sample of 23 clusters and the optical sample of 26 clusters have 14 clusters in common. We present preliminary results on the coupled X-ray and dynamical analyses of these 14 clusters.

  5. Clustering approaches to feature change detection

    NASA Astrophysics Data System (ADS)

    G-Michael, Tesfaye; Gunzburger, Max; Peterson, Janet

    2018-05-01

    The automated detection of changes occurring between multi-temporal images is of significant importance in a wide range of medical, environmental, safety, as well as many other settings. The usage of k-means clustering is explored as a means for detecting objects added to a scene. The silhouette score for the clustering is used to define the optimal number of clusters that should be used. For simple images having a limited number of colors, new objects can be detected by examining the change between the optimal number of clusters for the original and modified images. For more complex images, new objects may need to be identified by examining the relative areas covered by corresponding clusters in the original and modified images. Which method is preferable depends on the composition and range of colors present in the images. In addition to describing the clustering and change detection methodology of our proposed approach, we provide some simple illustrations of its application.
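
    A minimal sketch of the first variant described, assuming the images have been reduced to arrays of pixel colour vectors: pick the optimal number of clusters via the silhouette score for both the original and modified images and compare the two counts.

    ```python
    # Change detection via cluster counts: a new object in the modified
    # image shows up as an increase in the silhouette-optimal k.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_score

    def optimal_k(pixels, k_range=range(2, 7)):
        scores = {k: silhouette_score(pixels,
                                      KMeans(n_clusters=k, n_init=10,
                                             random_state=0).fit_predict(pixels))
                  for k in k_range}
        return max(scores, key=scores.get)

    rng = np.random.default_rng(0)
    # "Original" image: two colour populations; "modified" adds a third object.
    original = np.vstack([rng.normal(0.2, 0.05, (300, 3)),
                          rng.normal(0.7, 0.05, (300, 3))])
    modified = np.vstack([original,
                          rng.normal([0.9, 0.1, 0.1], 0.02, (60, 3))])
    print(optimal_k(original), "->", optimal_k(modified))   # e.g. 2 -> 3
    ```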

  6. The methodology of multi-viewpoint clustering analysis

    NASA Technical Reports Server (NTRS)

    Mehrotra, Mala; Wild, Chris

    1993-01-01

    One of the greatest challenges facing the software engineering community is the ability to produce large and complex computer systems, such as ground support systems for unmanned scientific missions, that are reliable and cost effective. In order to build and maintain these systems, it is important that the knowledge in the system be suitably abstracted, structured, and otherwise clustered in a manner that facilitates its understanding, manipulation, testing, and utilization. Development of complex mission-critical systems will require the ability to abstract overall concepts in the system at various levels of detail and to consider the system from different points of view. The Multi-ViewPoint Clustering Analysis (MVP-CA) methodology has been developed to provide multiple views of large, complicated systems. MVP-CA supports the discovery of significant structures by providing an automated mechanism for structuring knowledge both hierarchically (from detail to abstract) and orthogonally (from different perspectives). We propose to integrate MVP-CA into the overall software engineering life cycle to support the development and evolution of complex mission-critical systems.

  7. Variation in Research Designs Used to Test the Effectiveness of Dissemination and Implementation Strategies: A Review.

    PubMed

    Mazzucca, Stephanie; Tabak, Rachel G; Pilar, Meagan; Ramsey, Alex T; Baumann, Ana A; Kryzer, Emily; Lewis, Ericka M; Padek, Margaret; Powell, Byron J; Brownson, Ross C

    2018-01-01

    The need for optimal study designs in dissemination and implementation (D&I) research is increasingly recognized. Despite the wide range of study designs available for D&I research, we lack understanding of the types of designs and methodologies that are routinely used in the field. This review assesses the designs and methodologies in recently proposed D&I studies and provides resources to guide design decisions. We reviewed 404 study protocols published in the journal Implementation Science from 2/2006 to 9/2017. Eligible studies tested the efficacy or effectiveness of D&I strategies (i.e., not the effectiveness of the underlying clinical or public health intervention); had a comparison by group and/or time; and used ≥1 quantitative measure. Several design elements were extracted: design category (e.g., randomized); design type [e.g., cluster randomized controlled trial (RCT)]; data type (e.g., quantitative); D&I theoretical framework; levels of treatment assignment, intervention, and measurement; and country in which the research was conducted. Each protocol was double-coded, and discrepancies were resolved through discussion. Of the 404 protocols reviewed, 212 (52%) studies, reported in 208 manuscripts, tested one or more implementation strategies and therefore met the inclusion criteria. Of the included studies, 77% utilized randomized designs, primarily cluster RCTs. The use of alternative designs (e.g., stepped wedge) increased over time. Fewer studies were quasi-experimental (17%) or observational (6%). Many study design categories (e.g., controlled pre-post, matched-pair cluster design) were represented by only one or two studies. Most articles proposed quantitative and qualitative methods (61%), with the remaining 39% proposing only quantitative. Half of the protocols (52%) reported using a theoretical framework to guide the study. The four most frequently reported frameworks were the Consolidated Framework for Implementation Research and RE-AIM (n = 16 each), followed by Promoting Action on Research Implementation in Health Services and the Theoretical Domains Framework (n = 12 each). While several novel designs for D&I research have been proposed (e.g., stepped wedge, adaptive designs), the majority of the studies in our sample employed RCT designs. Alternative study designs are increasing in use but may be underutilized for a variety of reasons, including the preference of funders or lack of awareness of these designs. Promisingly, the prevalent use of quantitative and qualitative methods together reflects methodological innovation in newer D&I research.

  8. Brief report: Academic amotivation in light of the dark side of identity formation.

    PubMed

    Cannard, Christine; Lannegrand-Willems, Lyda; Safont-Mottay, Claire; Zimmermann, Grégoire

    2016-02-01

    The study aimed to determine the motivational profiles of first-year undergraduates and to characterize them in terms of identity processes. First, a cluster analysis revealed five motivational profiles: combined (i.e., high quantity of motivation, low amotivation); intrinsic (i.e., high intrinsic, low introjected and external regulation, low amotivation); "demotivated" (i.e., very low quantity of motivation and amotivation); extrinsic (i.e., high extrinsic and identified regulation, low intrinsic motivation and amotivation); and "amotivated" (i.e., low intrinsic and identified regulation, very high amotivation). Second, using Lebart's (2000) methodology, the most characteristic identity processes were listed for each motivational cluster. The demotivated and amotivated profiles were refined in terms of adaptive and maladaptive forms of exploration. Notably, exploration in breadth and in depth were underrepresented in demotivated students compared to the total sample; commitment and ruminative exploration were underrepresented and overrepresented, respectively, in amotivated students. Educational and clinical implications are proposed and future research is suggested. Copyright © 2015 The Foundation for Professionals in Services for Adolescents. Published by Elsevier Ltd. All rights reserved.

  9. Comorbid forms of psychopathology: key patterns and future research directions.

    PubMed

    Cerdá, Magdalena; Sagdeo, Aditi; Galea, Sandro

    2008-01-01

    The purpose of this review is to systematically appraise the peer-reviewed literature about clustered forms of psychopathology and to present a framework that can be useful for studying comorbid psychiatric disorders. The review focuses on four of the most prevalent types of mental health problems: anxiety, depression, conduct disorder, and substance abuse. The authors summarize existing empirical research on the distribution of concurrent and sequential comorbidity in children and adolescents and in adults, and they review existing knowledge about exogenous risk factors that influence comorbidity. The authors include articles that used a longitudinal study design and used psychiatric definitions of the disorders. A total of 58 articles met the inclusion criteria and were assessed. Current evidence demonstrates a reciprocal, sequential relation between most comorbid pairs, although the mechanisms that mediate such links remain to be explained. Methodological concerns include the inconsistency of measurement of the disorders across studies, small sample sizes, and restricted follow-up times. Given the significant mental health burden placed by comorbid disorders, and their high prevalence across populations, research on the key risk factors for clustering of psychopathology is needed.

  10. An improved initialization center k-means clustering algorithm based on distance and density

    NASA Astrophysics Data System (ADS)

    Duan, Yanling; Liu, Qun; Xia, Shuyin

    2018-04-01

    To address the problem that the random choice of initial cluster centers in the k-means algorithm leaves clustering results sensitive to outlier samples and unstable across repeated runs, a center initialization method based on larger distance and higher density is proposed. The reciprocal of the weighted average distance to other points is used to represent sample density, and samples with both larger distance and higher density are selected as the initial cluster centers to optimize the clustering results. A clustering evaluation method based on distance and density is then designed to verify the feasibility and practicality of the algorithm; experimental results on UCI data sets show that the algorithm achieves a degree of stability and practicality.
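
    The abstract does not give the exact weighting scheme, so the following is a minimal numpy sketch of the general idea only: seed the first center at the densest sample (density taken as the reciprocal of the mean distance to all other samples) and choose each subsequent center by a combined distance-and-density score. The function name and the product scoring rule are illustrative assumptions.

```python
import numpy as np

def density_distance_init(X, k):
    """Pick k initial centers favouring dense samples far from existing
    centers (a sketch of the distance-and-density initialization idea)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances
    # Density as the reciprocal of each sample's mean distance to the others.
    density = 1.0 / (D.sum(axis=1) / (len(X) - 1))
    centers = [int(np.argmax(density))]      # densest sample seeds center 1
    for _ in range(k - 1):
        d_to_centers = D[:, centers].min(axis=1)
        score = d_to_centers * density       # far from centers AND dense
        score[centers] = -np.inf             # never reselect a center
        centers.append(int(np.argmax(score)))
    return X[centers]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, size=(50, 2)) for c in (0, 3, 6)])
print(density_distance_init(X, 3))           # one center per true cluster
```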

  11. Weak lensing magnification of SpARCS galaxy clusters

    NASA Astrophysics Data System (ADS)

    Tudorica, A.; Hildebrandt, H.; Tewes, M.; Hoekstra, H.; Morrison, C. B.; Muzzin, A.; Wilson, G.; Yee, H. K. C.; Lidman, C.; Hicks, A.; Nantais, J.; Erben, T.; van der Burg, R. F. J.; Demarco, R.

    2017-12-01

    Context. Measuring and calibrating relations between cluster observables is critical for resource-limited studies. The mass-richness relation of clusters offers an observationally inexpensive way of estimating masses. Its calibration is essential for cluster and cosmological studies, especially for high-redshift clusters. Weak gravitational lensing magnification is a promising method, complementary to shear studies, that can be applied at higher redshifts. Aims: We aim to employ the weak lensing magnification method to calibrate the mass-richness relation up to a redshift of 1.4. We used the Spitzer Adaptation of the Red-Sequence Cluster Survey (SpARCS) galaxy cluster candidates (0.2 < z < 1.4) and optical data from the Canada France Hawaii Telescope (CFHT) to test whether magnification can be effectively used to constrain the mass of high-redshift clusters. Methods: Lyman-break galaxies (LBGs), selected using the u-band dropout technique and their colours, were used as a background sample of sources. LBG positions were cross-correlated with the centres of the sample of SpARCS clusters to estimate the magnification signal, which was optimally weighted using an externally calibrated LBG luminosity function. The signal was measured for cluster sub-samples, binned in both redshift and richness. Results: We measured the cross-correlation between the positions of galaxy cluster candidates and LBGs and detected a weak lensing magnification signal for all bins at a detection significance of 2.6-5.5σ. In particular, the significance of the measurement for clusters with z > 1.0 is 4.1σ; for the entire cluster sample we obtained an average M200 of 1.28 (+0.23/−0.21) × 10¹⁴ M⊙. Conclusions: Our measurements demonstrated the feasibility of using weak lensing magnification as a viable tool for determining the average halo masses for samples of high-redshift galaxy clusters. The results also established the success of using galaxy over-densities to select massive clusters at z > 1. Additional studies are necessary for further modelling of the various systematic effects we discussed.

  12. High Prevalence of Intermediate Leptospira spp. DNA in Febrile Humans from Urban and Rural Ecuador.

    PubMed

    Chiriboga, Jorge; Barragan, Verónica; Arroyo, Gabriela; Sosa, Andrea; Birdsell, Dawn N; España, Karool; Mora, Ana; Espín, Emilia; Mejía, María Eugenia; Morales, Melba; Pinargote, Carmina; Gonzalez, Manuel; Hartskeerl, Rudy; Keim, Paul; Bretas, Gustavo; Eisenberg, Joseph N S; Trueba, Gabriel

    2015-12-01

    Leptospira spp., which comprise 3 clusters (pathogenic, saprophytic, and intermediate) that vary in pathogenicity, infect >1 million persons worldwide each year. The disease burden of the intermediate leptospires is unclear. To increase knowledge of this cluster, we used new molecular approaches to characterize Leptospira spp. in 464 samples from febrile patients in rural, semiurban, and urban communities in Ecuador; in 20 samples from nonfebrile persons in the rural community; and in 206 samples from animals in the semiurban community. We observed a higher percentage of leptospiral DNA-positive samples from febrile persons in rural (64%) versus urban (21%) and semiurban (25%) communities; no leptospires were detected in nonfebrile persons. The percentage of intermediate cluster strains in humans (96%) was higher than that of pathogenic cluster strains (4%); strains in animal samples belonged to intermediate (49%) and pathogenic (51%) clusters. Intermediate cluster strains may be causing a substantial amount of fever in coastal Ecuador.

  13. High Prevalence of Intermediate Leptospira spp. DNA in Febrile Humans from Urban and Rural Ecuador

    PubMed Central

    Chiriboga, Jorge; Barragan, Verónica; Arroyo, Gabriela; Sosa, Andrea; Birdsell, Dawn N.; España, Karool; Mora, Ana; Espín, Emilia; Mejía, María Eugenia; Morales, Melba; Pinargote, Carmina; Gonzalez, Manuel; Hartskeerl, Rudy; Keim, Paul; Bretas, Gustavo; Eisenberg, Joseph N.S.

    2015-01-01

    Leptospira spp., which comprise 3 clusters (pathogenic, saprophytic, and intermediate) that vary in pathogenicity, infect >1 million persons worldwide each year. The disease burden of the intermediate leptospires is unclear. To increase knowledge of this cluster, we used new molecular approaches to characterize Leptospira spp. in 464 samples from febrile patients in rural, semiurban, and urban communities in Ecuador; in 20 samples from nonfebrile persons in the rural community; and in 206 samples from animals in the semiurban community. We observed a higher percentage of leptospiral DNA–positive samples from febrile persons in rural (64%) versus urban (21%) and semiurban (25%) communities; no leptospires were detected in nonfebrile persons. The percentage of intermediate cluster strains in humans (96%) was higher than that of pathogenic cluster strains (4%); strains in animal samples belonged to intermediate (49%) and pathogenic (51%) clusters. Intermediate cluster strains may be causing a substantial amount of fever in coastal Ecuador. PMID:26583534

  14. An approach to accidents modeling based on compounds road environments.

    PubMed

    Fernandes, Ana; Neves, Jose

    2013-04-01

    The most common approach to studying the influence of certain road features on accidents has been the consideration of uniform road segments characterized by a unique feature. However, when an accident is related to the road infrastructure, its cause is usually not a single characteristic but rather a complex combination of several characteristics. The main objective of this paper is to describe a methodology developed to treat the road as a complete environment by using compound road environments, overcoming the limitations inherent in considering only uniform road segments. The methodology consists of: dividing a sample of roads into segments; grouping them into fairly homogeneous road environments using cluster analysis; and identifying the influence of skid resistance and texture depth on road accidents in each environment using generalized linear models. The application of this methodology is demonstrated for eight roads. Based on real data from accidents and road characteristics, three compound road environments were established in which the pavement surface properties significantly influence the occurrence of accidents. Results have clearly shown that road environments where braking maneuvers are more common, or those with small radii of curvature and high speeds, require higher skid resistance and texture depth as an important contribution to accident prevention. Copyright © 2013 Elsevier Ltd. All rights reserved.
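
    A minimal sketch of the two-step methodology, assuming synthetic segment data, scikit-learn for the cluster-analysis step and a statsmodels Poisson GLM for the accident-count step (the abstract says only "generalized linear models", so the Poisson family is an assumption, and all variable names are hypothetical).

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Hypothetical road-segment data; a real study would use many more features.
seg = pd.DataFrame({
    "skid_resistance": rng.uniform(0.3, 0.8, 400),
    "texture_depth":   rng.uniform(0.2, 1.5, 400),
    "curvature":       rng.uniform(0.0, 0.05, 400),
    "accidents":       rng.poisson(2.0, 400),
})

# Step 1: group segments into fairly homogeneous "road environments".
Z = StandardScaler().fit_transform(seg[["curvature", "texture_depth"]])
seg["environment"] = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Z)

# Step 2: within each environment, relate accident counts to surface properties.
for env, grp in seg.groupby("environment"):
    X = sm.add_constant(grp[["skid_resistance", "texture_depth"]])
    fit = sm.GLM(grp["accidents"], X, family=sm.families.Poisson()).fit()
    print(f"environment {env}: skid resistance coef = {fit.params['skid_resistance']:+.2f}")
```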

  15. Optimising cluster survey design for planning schistosomiasis preventive chemotherapy

    PubMed Central

    Sturrock, Hugh J. W.; Turner, Hugo; Whitton, Jane M.; Gower, Charlotte M.; Jemu, Samuel; Phillips, Anna E.; Meite, Aboulaye; Thomas, Brent; Kollie, Karsor; Thomas, Catherine; Rebollo, Maria P.; Styles, Ben; Clements, Michelle; Fenwick, Alan; Harrison, Wendy E.; Fleming, Fiona M.

    2017-01-01

    Background The cornerstone of current schistosomiasis control programmes is delivery of praziquantel to at-risk populations. Such preventive chemotherapy requires accurate information on the geographic distribution of infection, yet the performance of alternative survey designs for estimating prevalence and converting this into treatment decisions has not been thoroughly evaluated. Methodology/Principal findings We used baseline schistosomiasis mapping surveys from three countries (Malawi, Côte d’Ivoire and Liberia) to generate spatially realistic gold standard datasets, against which we tested alternative two-stage cluster survey designs. We assessed how sampling different numbers of schools per district (2–20) and children per school (10–50) influences the accuracy of prevalence estimates and treatment class assignment, and we compared survey cost-efficiency using data from Malawi. Due to the focal nature of schistosomiasis, up to 53% of simulated surveys involving 2–5 schools per district failed to detect schistosomiasis in low-endemicity areas (1–10% prevalence). Increasing the number of schools surveyed per district improved treatment class assignment far more than increasing the number of children sampled per school. For Malawi, surveys of 15 schools per district and 20–30 children per school reliably detected endemic schistosomiasis and maximised cost-efficiency. In sensitivity analyses where treatment costs and the country considered were varied, optimal survey size was remarkably consistent, with cost-efficiency maximised at 15–20 schools per district. Conclusions/Significance Among two-stage cluster surveys for schistosomiasis, our simulations indicated that surveying 15–20 schools per district and 20–30 children per school optimised cost-efficiency and minimised the risk of under-treatment, with surveys involving more schools becoming more cost-efficient as treatment costs rose. PMID:28552961
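
    A toy simulation in the spirit of the paper's experiments, showing why surveying only a few schools per district can miss focal, low-endemicity schistosomiasis entirely; the beta-distributed school prevalences and every other parameter here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_survey(school_prev, n_schools, n_children):
    """Two-stage cluster survey: sample schools, then children per school,
    and return the estimated district prevalence."""
    sampled = rng.choice(school_prev, size=n_schools, replace=False)
    positives = rng.binomial(n_children, sampled)       # stage-two sampling
    return positives.sum() / (n_schools * n_children)

# Hypothetical focal, low-endemicity district of 100 schools (mean ~5%).
school_prev = rng.beta(0.4, 8.0, size=100)
for n_schools in (2, 5, 15):
    est = np.array([simulate_survey(school_prev, n_schools, 25)
                    for _ in range(2000)])
    print(f"{n_schools:>2} schools: mean prevalence = {est.mean():.3f}, "
          f"P(no cases detected) = {(est == 0).mean():.2f}")
```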

  16. Noninvasive measurement of plasma glucose from exhaled breath in healthy and type 1 diabetic subjects

    PubMed Central

    Oliver, Stacy R.; Ngo, Jerry; Flores, Rebecca; Midyett, Jason; Meinardi, Simone; Carlson, Matthew K.; Rowland, F. Sherwood; Blake, Donald R.; Galassetti, Pietro R.

    2011-01-01

    Effective management of diabetes mellitus, affecting tens of millions of patients, requires frequent assessment of plasma glucose. Patient compliance for sufficient testing is often reduced by the unpleasantness of current methodologies, which require blood samples and often cause pain and skin callusing. We propose that the analysis of volatile organic compounds (VOCs) in exhaled breath can be used as a novel, alternative, noninvasive means to monitor glycemia in these patients. Seventeen healthy (9 females and 8 males, 28.0 ± 1.0 yr) and eight type 1 diabetic (T1DM) volunteers (5 females and 3 males, 25.8 ± 1.7 yr) were enrolled in a 240-min triphasic intravenous dextrose infusion protocol (baseline, hyperglycemia, euglycemia-hyperinsulinemia). In T1DM patients, insulin was also administered (using differing protocols on 2 repeated visits to separate the effects of insulinemia on breath composition). Exhaled breath and room air samples were collected at 12 time points, and concentrations of ∼100 VOCs were determined by gas chromatography and matched with direct plasma glucose measurements. Standard least squares regression was used on several subsets of exhaled gases to generate multilinear models to predict plasma glucose for each subject. Plasma glucose estimates based on two groups of four gases each (cluster A: acetone, methyl nitrate, ethanol, and ethyl benzene; cluster B: 2-pentyl nitrate, propane, methanol, and acetone) displayed very strong correlations with glucose concentrations (0.883 and 0.869 for clusters A and B, respectively) across nearly 300 measurements. Our study demonstrates the feasibility to accurately predict glycemia through exhaled breath analysis over a broad range of clinically relevant concentrations in both healthy and T1DM subjects. PMID:21467303
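
    A minimal sketch of the regression step on synthetic data: fit a four-gas multilinear model by standard least squares and report the correlation between predicted and measured glucose. The coefficients, units and noise level are invented, not the study's.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
# Hypothetical concentrations of a four-gas subset (arbitrary units); the
# study measured ~100 VOCs and selected subsets such as acetone and ethanol.
gases = rng.normal(size=(n, 4))
glucose = 100 + gases @ np.array([12.0, -8.0, 5.0, 3.0]) + rng.normal(0, 10, n)

# Standard least squares fit of the multilinear prediction model.
X = np.column_stack([np.ones(n), gases])
beta, *_ = np.linalg.lstsq(X, glucose, rcond=None)
pred = X @ beta
print(f"correlation of predicted vs measured glucose: "
      f"{np.corrcoef(pred, glucose)[0, 1]:.3f}")
```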

  17. Active learning for semi-supervised clustering based on locally linear propagation reconstruction.

    PubMed

    Chang, Chin-Chun; Lin, Po-Yi

    2015-03-01

    The success of semi-supervised clustering relies on the effectiveness of side information. To get effective side information, a new active learner learning pairwise constraints known as must-link and cannot-link constraints is proposed in this paper. Three novel techniques are developed for learning effective pairwise constraints. The first technique is used to identify samples less important to cluster structures. This technique makes use of a kernel version of locally linear embedding for manifold learning. Samples neither important to locally linear propagation reconstructions of other samples nor on flat patches in the learned manifold are regarded as unimportant samples. The second is a novel criterion for query selection. This criterion considers not only the importance of a sample to expanding the space coverage of the learned samples but also the expected number of queries needed to learn the sample. To facilitate semi-supervised clustering, the third technique yields inferred must-links for passing information about flat patches in the learned manifold to semi-supervised clustering algorithms. Experimental results have shown that the learned pairwise constraints can capture the underlying cluster structures and proven the feasibility of the proposed approach. Copyright © 2014 Elsevier Ltd. All rights reserved.

  18. Manual hierarchical clustering of regional geochemical data using a Bayesian finite mixture model

    USGS Publications Warehouse

    Ellefsen, Karl J.; Smith, David

    2016-01-01

    Interpretation of regional scale, multivariate geochemical data is aided by a statistical technique called “clustering.” We investigate a particular clustering procedure by applying it to geochemical data collected in the State of Colorado, United States of America. The clustering procedure partitions the field samples for the entire survey area into two clusters. The field samples in each cluster are partitioned again to create two subclusters, and so on. This manual procedure generates a hierarchy of clusters, and the different levels of the hierarchy show geochemical and geological processes occurring at different spatial scales. Although there are many different clustering methods, we use Bayesian finite mixture modeling with two probability distributions, which yields two clusters. The model parameters are estimated with Hamiltonian Monte Carlo sampling of the posterior probability density function, which usually has multiple modes. Each mode has its own set of model parameters; each set is checked to ensure that it is consistent both with the data and with independent geologic knowledge. The set of model parameters that is most consistent with the independent geologic knowledge is selected for detailed interpretation and partitioning of the field samples.
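
    A sketch of the recursive two-cluster partitioning, with scikit-learn's EM-fitted GaussianMixture standing in for the paper's Bayesian finite mixture estimated by Hamiltonian Monte Carlo (a deliberate simplification); the two-element synthetic data and the stopping rules are illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def split(X, depth=0, max_depth=2, min_size=30):
    """Recursively partition samples with a two-component mixture model,
    building the hierarchy of clusters described in the abstract."""
    if depth >= max_depth or len(X) < min_size:
        return {"n": len(X), "mean": X.mean(axis=0).round(2).tolist()}
    gm = GaussianMixture(n_components=2, n_init=5, random_state=0).fit(X)
    labels = gm.predict(X)
    return {"n": len(X),
            "children": [split(X[labels == c], depth + 1) for c in (0, 1)]}

rng = np.random.default_rng(4)
# Synthetic two-element "geochemical" data with nested group structure.
X = np.vstack([rng.normal(m, 0.4, size=(80, 2))
               for m in ([0, 0], [0, 2], [4, 0], [4, 2])])
print(split(X))
```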

  19. "A Richness Study of 14 Distant X-Ray Clusters from the 160 Square Degree Survey"

    NASA Technical Reports Server (NTRS)

    Jones, Christine; West, Donald (Technical Monitor)

    2001-01-01

    We have measured the surface density of galaxies toward 14 X-ray-selected cluster candidates at redshifts z ≳ 0.46, and we show that they are associated with rich galaxy concentrations. These clusters, having X-ray luminosities of L_X(0.5-2 keV) ≈ (0.5-2.6) × 10⁴⁴ erg/s, are among the most distant and luminous in our 160 deg² ROSAT Position Sensitive Proportional Counter cluster survey. We find that the clusters range between Abell richness classes 0 and 2 and have a most probable richness class of 1. We compare the richness distribution of our distant clusters to those for three samples of nearby clusters with similar X-ray luminosities. We find that the nearby and distant samples have similar richness distributions, which shows that clusters have apparently not evolved substantially in richness since redshift z = 0.5. There is, however, a marginal tendency for the distant clusters to be slightly poorer than nearby clusters, although deeper multicolor data for a large sample would be required to confirm this trend. We compare the distribution of distant X-ray clusters in the L_X-richness plane to the distribution of optically selected clusters from the Palomar Distant Cluster Survey. The optically selected clusters appear overly rich for their X-ray luminosities, when compared to X-ray-selected clusters. Apparently, X-ray and optical surveys do not necessarily sample identical mass concentrations at large redshifts. This may indicate the existence of a population of optically rich clusters with anomalously low X-ray emission. More likely, however, it reflects the tendency for optical surveys to select unvirialized mass concentrations, as might be expected when peering along large-scale filaments.

  20. The XXL survey XV: evidence for dry merger driven BCG growth in XXL-100-GC X-ray clusters

    NASA Astrophysics Data System (ADS)

    Lavoie, S.; Willis, J. P.; Démoclès, J.; Eckert, D.; Gastaldello, F.; Smith, G. P.; Lidman, C.; Adami, C.; Pacaud, F.; Pierre, M.; Clerc, N.; Giles, P.; Lieu, M.; Chiappetti, L.; Altieri, B.; Ardila, F.; Baldry, I.; Bongiorno, A.; Desai, S.; Elyiv, A.; Faccioli, L.; Gardner, B.; Garilli, B.; Groote, M. W.; Guennou, L.; Guzzo, L.; Hopkins, A. M.; Liske, J.; McGee, S.; Melnyk, O.; Owers, M. S.; Poggianti, B.; Ponman, T. J.; Scodeggio, M.; Spitler, L.; Tuffs, R. J.

    2016-11-01

    The growth of brightest cluster galaxies (BCGs) is closely related to the properties of their host cluster. We present evidence for dry mergers as the dominant source of BCG mass growth at z ≲ 1 in the XXL 100 brightest cluster sample. We use the global red sequence, Hα emission and mean star formation history to show that BCGs in the sample possess star formation levels comparable to field ellipticals of similar stellar mass and redshift. XXL 100 brightest clusters are less massive on average than those in other X-ray selected samples such as LoCuSS or HIFLUGCS. Few clusters in the sample display high central gas concentration, rendering inefficient the growth of BCGs via star formation resulting from the accretion of cool gas. Using measures of the relaxation state of their host clusters, we show that BCGs grow as relaxation proceeds. We find that the BCG stellar mass corresponds to a relatively constant fraction of 1 per cent of the total cluster mass in relaxed systems. We also show that, following a cluster-scale merger event, the BCG stellar mass lags behind the expected value from the M_cluster-M_BCG relation but subsequently accretes stellar mass via dry mergers as the BCG and cluster evolve towards a relaxed state.

  1. A Wavelet-Based Methodology for Grinding Wheel Condition Monitoring

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liao, T. W.; Ting, C.F.; Qu, Jun

    2007-01-01

    Grinding wheel surface condition changes as more material is removed. This paper presents a wavelet-based methodology for grinding wheel condition monitoring based on acoustic emission (AE) signals. Grinding experiments in creep feed mode were conducted to grind alumina specimens with a resinoid-bonded diamond wheel using two different conditions. During the experiments, AE signals were collected when the wheel was 'sharp' and when the wheel was 'dull'. Discriminant features were then extracted from each raw AE signal segment using the discrete wavelet decomposition procedure. An adaptive genetic clustering algorithm was finally applied to the extracted features in order to distinguish different states of grinding wheel condition. The test results indicate that the proposed methodology can achieve 97% clustering accuracy for the high material removal rate condition, 86.7% for the low material removal rate condition, and 76.7% for the combined grinding conditions if the base wavelet, the decomposition level, and the GA parameters are properly selected.
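
    A sketch of the feature-extraction step with PyWavelets on synthetic AE segments, using relative wavelet energies per decomposition level as discriminant features; plain k-means stands in for the paper's adaptive genetic clustering, and the db4 base wavelet and level 4 are assumptions.

```python
import numpy as np
import pywt
from sklearn.cluster import KMeans

def wavelet_energy_features(signal, wavelet="db4", level=4):
    """Relative energy per level of the discrete wavelet decomposition,
    a common discriminant feature for acoustic emission signals."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    energies = np.array([np.sum(c ** 2) for c in coeffs])
    return energies / energies.sum()

rng = np.random.default_rng(5)
# Synthetic AE segments: a "dull" wheel adds a strong low-frequency component.
sharp = [rng.normal(0, 1.0, 1024) for _ in range(20)]
dull = [rng.normal(0, 1.0, 1024) + 3 * np.sin(np.linspace(0, 60, 1024))
        for _ in range(20)]
F = np.array([wavelet_energy_features(s) for s in sharp + dull])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(F)
print(labels)   # first 20 (sharp) should separate from last 20 (dull)
```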

  2. The Atacama Cosmology Telescope: Cosmology from Galaxy Clusters Detected Via the Sunyaev-Zel'dovich Effect

    NASA Technical Reports Server (NTRS)

    Sehgal, Neelima; Trac, Hy; Acquaviva, Viviana; Ade, Peter A. R.; Aguirre, Paula; Amiri, Mandana; Appel, John W.; Barrientos, L. Felipe; Battistelli, Elia S.; Bond, J. Richard; hide

    2010-01-01

    We present constraints on cosmological parameters based on a sample of Sunyaev-Zel'dovich-selected galaxy clusters detected in a millimeter-wave survey by the Atacama Cosmology Telescope. The cluster sample used in this analysis consists of 9 optically-confirmed high-mass clusters comprising the high-significance end of the total cluster sample identified in 455 square degrees of sky surveyed during 2008 at 148 GHz. We focus on the most massive systems to reduce the degeneracy between unknown cluster astrophysics and cosmology derived from SZ surveys. We describe the scaling relation between cluster mass and SZ signal with a 4-parameter fit. Marginalizing over the values of the parameters in this fit with conservative priors gives σ8 = 0.851 ± 0.115 and w = -1.14 ± 0.35 for a spatially-flat wCDM cosmological model with WMAP 7-year priors on cosmological parameters. This gives a modest improvement in statistical uncertainty over WMAP 7-year constraints alone. Fixing the scaling relation between cluster mass and SZ signal to a fiducial relation obtained from numerical simulations and calibrated by X-ray observations, we find σ8 = 0.821 ± 0.044 and w = -1.05 ± 0.20. These results are consistent with constraints from WMAP 7-year data plus baryon acoustic oscillations plus type Ia supernovae, which give σ8 = 0.802 ± 0.038 and w = -0.98 ± 0.053. A stacking analysis of the clusters in this sample compared to clusters simulated assuming the fiducial model also shows good agreement. These results suggest that, given the sample of clusters used here, both the astrophysics of massive clusters and the cosmological parameters derived from them are broadly consistent with current models.

  3. One-step estimation of networked population size: Respondent-driven capture-recapture with anonymity.

    PubMed

    Khan, Bilal; Lee, Hsuan-Wei; Fellows, Ian; Dombrowski, Kirk

    2018-01-01

    Size estimation is particularly important for populations whose members experience disproportionate health issues or pose elevated health risks to the ambient social structures in which they are embedded. Efforts to derive size estimates are often frustrated when the population is hidden or hard-to-reach in ways that preclude conventional survey strategies, as is the case when social stigma is associated with group membership or when group members are involved in illegal activities. This paper extends prior research on the problem of network population size estimation, building on established survey/sampling methodologies commonly used with hard-to-reach groups. Three novel one-step, network-based population size estimators are presented, for use in the context of uniform random sampling, respondent-driven sampling, and when networks exhibit significant clustering effects. We give provably sufficient conditions for the consistency of these estimators in large configuration networks. Simulation experiments across a wide range of synthetic network topologies validate the performance of the estimators, which also perform well on a real-world location-based social networking data set with significant clustering. Finally, the proposed schemes are extended to allow them to be used in settings where participant anonymity is required. Systematic experiments show favorable tradeoffs between anonymity guarantees and estimator performance. Taken together, we demonstrate that reasonable population size estimates are derived from anonymous respondent driven samples of 250-750 individuals, within ambient populations of 5,000-40,000. The method thus represents a novel and cost-effective means for health planners and those agencies concerned with health and disease surveillance to estimate the size of hidden populations. We discuss limitations and future work in the concluding section.

  4. Towards accurate modelling of galaxy clustering on small scales: testing the standard ΛCDM + halo model

    NASA Astrophysics Data System (ADS)

    Sinha, Manodeep; Berlind, Andreas A.; McBride, Cameron K.; Scoccimarro, Roman; Piscionere, Jennifer A.; Wibking, Benjamin D.

    2018-07-01

    Interpreting the small-scale clustering of galaxies with halo models can elucidate the connection between galaxies and dark matter haloes. Unfortunately, the modelling is typically not sufficiently accurate for ruling out models statistically. It is thus difficult to use the information encoded in small scales to test cosmological models or probe subtle features of the galaxy-halo connection. In this paper, we attempt to push halo modelling into the `accurate' regime with a fully numerical mock-based methodology and careful treatment of statistical and systematic errors. With our forward-modelling approach, we can incorporate clustering statistics beyond the traditional two-point statistics. We use this modelling methodology to test the standard Λ cold dark matter (ΛCDM) + halo model against the clustering of Sloan Digital Sky Survey (SDSS) seventh data release (DR7) galaxies. Specifically, we use the projected correlation function, group multiplicity function, and galaxy number density as constraints. We find that while the model fits each statistic separately, it struggles to fit them simultaneously. Adding group statistics leads to a more stringent test of the model and significantly tighter constraints on model parameters. We explore the impact of varying the adopted halo definition and cosmological model and find that changing the cosmology makes a significant difference. The most successful model we tried (Planck cosmology with Mvir haloes) matches the clustering of low-luminosity galaxies, but exhibits a 2.3σ tension with the clustering of luminous galaxies, thus providing evidence that the `standard' halo model needs to be extended. This work opens the door to adding interesting freedom to the halo model and including additional clustering statistics as constraints.

  5. Uranium hydrogeochemical and stream sediment reconnaissance of the Arminto NTMS quadrangle, Wyoming, including concentrations of forty-three additional elements

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Morgan, T.L.

    1979-11-01

    During the summers of 1976 and 1977, 570 water and 1249 sediment samples were collected from 1517 locations within the 18,000 km² area of the Arminto NTMS quadrangle of central Wyoming. Water samples were collected from wells, springs, streams, and artificial ponds; sediment samples were collected from wet and dry streams, springs, and wet and dry ponds. All water samples were analyzed for 13 elements, including uranium, and each sediment sample was analyzed for 43 elements, including uranium and thorium. Uranium concentrations in water samples range from below the detection limit to 84.60 parts per billion (ppb) with a mean of 4.32 ppb. All water sample types except pond water samples were considered as a single population in interpreting the data. Pond water samples were excluded due to possible concentration of uranium by evaporation. Most of the water samples containing greater than 20 ppb uranium grouped into six clusters that indicate possible areas of interest for further investigation. One cluster is associated with the Pumpkin Buttes District, and two others are near the Kaycee and Mayoworth areas of uranium mineralization. The largest cluster is located on the west side of the Powder River Basin. One cluster is located in the central Big Horn Basin and another is in the Wind River Basin; both are in areas underlain by favorable host units. Uranium concentrations in sediment samples range from 0.08 parts per million (ppm) to 115.50 ppm with a mean of 3.50 ppm. Two clusters of sediment samples over 7 ppm were delineated. The first, containing the two highest-concentration samples, corresponds with the Copper Mountain District. Many of the high uranium concentrations in samples in this cluster may be due to contamination from mining or prospecting activity upstream from the sample sites. The second cluster encompasses a wide area in the Wind River Basin along the southern boundary of the quadrangle.

  6. Planck/SDSS Cluster Mass and Gas Scaling Relations for a Volume-Complete redMaPPer Sample

    NASA Astrophysics Data System (ADS)

    Jimeno, Pablo; Diego, Jose M.; Broadhurst, Tom; De Martino, I.; Lazkoz, Ruth

    2018-04-01

    Using Planck satellite data, we construct Sunyaev-Zel'dovich (SZ) gas pressure profiles for a large, volume-complete sample of optically selected clusters. We have defined a sample of over 8,000 redMaPPer clusters from the Sloan Digital Sky Survey (SDSS), within the volume-complete redshift region 0.100 < z < 0.325, for which we construct SZ effect maps by stacking Planck data over the full range of richness. Dividing the sample into richness bins we simultaneously solve for the mean cluster mass in each bin together with the corresponding radial pressure profile parameters, employing an MCMC analysis. These profiles are well detected over a much wider range of cluster mass and radius than previous work, showing a clear trend towards larger break radius with increasing cluster mass. Our SZ-based masses fall ∼16% below the mass-richness relations from weak lensing, in a similar fashion as the "hydrostatic bias" related with X-ray derived masses. Finally, we derive a tight Y500-M500 relation over a wide range of cluster mass, with a power law slope equal to 1.70 ± 0.07, that agrees well with the independent slope obtained by the Planck team with an SZ-selected cluster sample, but extends to lower masses with higher precision.

  7. Extending cluster Lot Quality Assurance Sampling designs for surveillance programs

    PubMed Central

    Hund, Lauren; Pagano, Marcello

    2014-01-01

    Lot quality assurance sampling (LQAS) has a long history of applications in industrial quality control. LQAS is frequently used for rapid surveillance in global health settings, with areas classified as poor or acceptable performance based on the binary classification of an indicator. Historically, LQAS surveys have relied on simple random samples from the population; however, implementing two-stage cluster designs for surveillance sampling is often more cost-effective than simple random sampling. By applying survey sampling results to the binary classification procedure, we develop a simple and flexible non-parametric procedure to incorporate clustering effects into the LQAS sample design to appropriately inflate the sample size, accommodating finite numbers of clusters in the population when relevant. We use this framework to then discuss principled selection of survey design parameters in longitudinal surveillance programs. We apply this framework to design surveys to detect rises in malnutrition prevalence in nutrition surveillance programs in Kenya and South Sudan, accounting for clustering within villages. By combining historical information with data from previous surveys, we design surveys to detect spikes in the childhood malnutrition rate. PMID:24633656

  8. Extending cluster lot quality assurance sampling designs for surveillance programs.

    PubMed

    Hund, Lauren; Pagano, Marcello

    2014-07-20

    Lot quality assurance sampling (LQAS) has a long history of applications in industrial quality control. LQAS is frequently used for rapid surveillance in global health settings, with areas classified as poor or acceptable performance on the basis of the binary classification of an indicator. Historically, LQAS surveys have relied on simple random samples from the population; however, implementing two-stage cluster designs for surveillance sampling is often more cost-effective than simple random sampling. By applying survey sampling results to the binary classification procedure, we develop a simple and flexible nonparametric procedure to incorporate clustering effects into the LQAS sample design to appropriately inflate the sample size, accommodating finite numbers of clusters in the population when relevant. We use this framework to then discuss principled selection of survey design parameters in longitudinal surveillance programs. We apply this framework to design surveys to detect rises in malnutrition prevalence in nutrition surveillance programs in Kenya and South Sudan, accounting for clustering within villages. By combining historical information with data from previous surveys, we design surveys to detect spikes in the childhood malnutrition rate. Copyright © 2014 John Wiley & Sons, Ltd.
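
    The authors develop a non-parametric procedure for this inflation; the sketch below shows only the familiar textbook version of the same idea, inflating a simple-random-sample LQAS size by the design effect DEFF = 1 + (m - 1) × ICC, with invented numbers.

```python
import math

def inflate_for_clustering(n_srs, cluster_size, icc):
    """Inflate an SRS-based sample size by the standard design effect
    for clusters of size m: DEFF = 1 + (m - 1) * ICC."""
    deff = 1 + (cluster_size - 1) * icc
    n = math.ceil(n_srs * deff)
    return n, math.ceil(n / cluster_size)

# E.g. a plan needing 192 children under SRS, surveyed 10 per village
# with a hypothetical intracluster correlation of 0.05.
n, k = inflate_for_clustering(n_srs=192, cluster_size=10, icc=0.05)
print(f"inflated sample: {n} children across {k} village clusters")
```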

  9. OMERACT-based fibromyalgia symptom subgroups: an exploratory cluster analysis.

    PubMed

    Vincent, Ann; Hoskin, Tanya L; Whipple, Mary O; Clauw, Daniel J; Barton, Debra L; Benzo, Roberto P; Williams, David A

    2014-10-16

    The aim of this study was to identify subsets of patients with fibromyalgia with similar symptom profiles using the Outcome Measures in Rheumatology (OMERACT) core symptom domains. Female patients with a diagnosis of fibromyalgia and currently meeting fibromyalgia research survey criteria completed the Brief Pain Inventory, the 30-item Profile of Mood States, the Medical Outcomes Sleep Scale, the Multidimensional Fatigue Inventory, the Multiple Ability Self-Report Questionnaire, the Fibromyalgia Impact Questionnaire-Revised (FIQ-R) and the Short Form-36 between 1 June 2011 and 31 October 2011. Hierarchical agglomerative clustering was used to identify subgroups of patients with similar symptom profiles. To validate the results from this sample, hierarchical agglomerative clustering was repeated in an external sample of female patients with fibromyalgia with similar inclusion criteria. A total of 581 females with a mean age of 55.1 (range, 20.1 to 90.2) years were included. A four-cluster solution best fit the data, and each clustering variable differed significantly (P <0.0001) among the four clusters. The four clusters divided the sample into severity levels: Cluster 1 reflects the lowest average levels across all symptoms, and cluster 4 reflects the highest average levels. Clusters 2 and 3 capture moderate symptoms levels. Clusters 2 and 3 differed mainly in profiles of anxiety and depression, with Cluster 2 having lower levels of depression and anxiety than Cluster 3, despite higher levels of pain. The results of the cluster analysis of the external sample (n = 478) looked very similar to those found in the original cluster analysis, except for a slight difference in sleep problems. This was despite having patients in the validation sample who were significantly younger (P <0.0001) and had more severe symptoms (higher FIQ-R total scores (P = 0.0004)). In our study, we incorporated core OMERACT symptom domains, which allowed for clustering based on a comprehensive symptom profile. Although our exploratory cluster solution needs confirmation in a longitudinal study, this approach could provide a rationale to support the study of individualized clinical evaluation and intervention.
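
    A minimal SciPy sketch of the analysis pattern, assuming standardized scores on five hypothetical symptom domains, Ward linkage and a four-cluster cut; the planted severity levels mimic the shape of the reported solution but all numbers are invented.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.stats import zscore

rng = np.random.default_rng(6)
# 200 synthetic patients, five symptom domains (pain, fatigue, sleep,
# mood, cognition), four planted severity levels.
scores = np.vstack([rng.normal(m, 1.0, size=(50, 5))
                    for m in (-1.0, -0.3, 0.3, 1.0)])

Z = linkage(zscore(scores), method="ward")       # hierarchical agglomerative
clusters = fcluster(Z, t=4, criterion="maxclust")
for c in np.unique(clusters):
    members = clusters == c
    print(f"cluster {c}: n = {members.sum()}, "
          f"mean symptom level = {scores[members].mean():+.2f}")
```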

  10. Book Lovers, Technophiles, Pragmatists, and Printers: The Social and Demographic Structure of User Attitudes toward E-Books

    ERIC Educational Resources Information Center

    Revelle, Andy; Messner, Kevin; Shrimplin, Aaron; Hurst, Susan

    2012-01-01

    Q-methodology was used to identify clusters of opinions about e-books at Miami University. The research identified four distinct opinion types among those investigated: Book Lovers, Technophiles, Pragmatists, and Printers. The initial Q-methodology study results were then used as a basis for a large-n survey of undergraduates, graduate students,…

  11. The Sex Determination Gene Shows No Founder Effect in the Giant Honey Bee, Apis dorsata

    PubMed Central

    Yan, Wei Yu; Wu, Xiao Bo; Zeng, Zhi Jiang; Huang, Zachary Y.

    2012-01-01

    Background All honey bee species (Apis spp) share the same sex determination mechanism using the complementary sex determination (csd) gene. Only individuals heterozygous at the csd locus develop into females; homozygous individuals develop into diploid males, which do not survive. The honeybees are therefore under selection pressure to generate new csd alleles. Previous studies have shown that the csd gene is under balancing selection. We hypothesize that, due to the long separation of Hainan Island, China, from the mainland, the giant honey bees (Apis dorsata) there should show a founder effect for the csd gene, with many different alleles clustered together that are absent on the mainland. Methodology/Principal Findings We sampled A. dorsata workers from both Hainan and Guangxi Provinces, then cloned and sequenced region 3 of the csd gene and constructed phylogenetic trees. We failed to find any clustering of the csd alleles according to their geographical origin, i.e. the Hainan and Guangxi samples did not form separate clades. Further analysis including previously published csd sequences also failed to show clade formation in samples from the Philippines and Malaysia. Conclusions/Significance Results from this study and those from previous studies did not support the expectations of a founder effect. We conclude that because of the extremely high mating frequency of A. dorsata queens, a founder effect does not apply in this species. PMID:22511940

  12. Landscape Changes Influence the Occurrence of the Melioidosis Bacterium Burkholderia pseudomallei in Soil in Northern Australia

    PubMed Central

    Kaestli, Mirjam; Mayo, Mark; Harrington, Glenda; Ward, Linda; Watt, Felicity; Hill, Jason V.; Cheng, Allen C.; Currie, Bart J.

    2009-01-01

    Background The soil-dwelling saprophyte bacterium Burkholderia pseudomallei is the cause of melioidosis, a severe disease of humans and animals in southeast Asia and northern Australia. Despite the detection of B. pseudomallei in various soil and water samples from endemic areas, the environmental habitat of B. pseudomallei remains unclear. Methodology/Principal Findings We performed a large survey in the Darwin area in tropical Australia and screened 809 soil samples for the presence of these bacteria. B. pseudomallei were detected by using a recently developed and validated protocol involving soil DNA extraction and real-time PCR targeting the B. pseudomallei–specific Type III Secretion System TTS1 gene cluster. Statistical analyses such as multivariable cluster logistic regression and principal component analysis were performed to assess the association of B. pseudomallei with environmental factors. The combination of factors describing the habitat of B. pseudomallei differed between undisturbed sites and environmentally manipulated areas. At undisturbed sites, the occurrence of B. pseudomallei was found to be significantly associated with areas rich in grasses, whereas at environmentally disturbed sites, B. pseudomallei was associated with the presence of livestock animals, lower soil pH and different combinations of soil texture and colour. Conclusions/Significance This study contributes to the elucidation of environmental factors influencing the occurrence of B. pseudomallei and raises concerns that B. pseudomallei may spread due to changes in land use. PMID:19156200

  13. Cluster randomised trials in the medical literature: two bibliometric surveys

    PubMed Central

    Bland, J Martin

    2004-01-01

    Background Several reviews of published cluster randomised trials have reported that about half did not take clustering into account in the analysis, which was thus incorrect and potentially misleading. In this paper I ask whether cluster randomised trials are increasing in both number and quality of reporting. Methods Computer search for papers on cluster randomised trials since 1980, hand search of trial reports published in selected volumes of the British Medical Journal over 20 years. Results There has been a large increase in the numbers of methodological papers and of trial reports using the term 'cluster random' in recent years, with about equal numbers of each type of paper. The British Medical Journal contained more such reports than any other journal. In this journal there was a corresponding increase over time in the number of trials where subjects were randomised in clusters. In 2003 all reports showed awareness of the need to allow for clustering in the analysis. In 1993 and before clustering was ignored in most such trials. Conclusion Cluster trials are becoming more frequent and reporting is of higher quality. Perhaps statistician pressure works. PMID:15310402

  14. Uncertainties in the cluster-cluster correlation function

    NASA Astrophysics Data System (ADS)

    Ling, E. N.; Frenk, C. S.; Barrow, J. D.

    1986-12-01

    The bootstrap resampling technique is applied to estimate sampling errors and significance levels of the two-point correlation functions determined for a subset of the CfA redshift survey of galaxies and a redshift sample of 104 Abell clusters. The angular correlation function for a sample of 1664 Abell clusters is also calculated. The standard errors in ξ(r) for the Abell data are found to be considerably larger than quoted 'Poisson errors'. The best estimate for the ratio of the correlation length of Abell clusters (richness class R ≥ 1, distance class D ≤ 4) to that of CfA galaxies is 4.2 +1.4/−1.0 (68 percentile error). The enhancement of cluster clustering over galaxy clustering is statistically significant in the presence of resampling errors. The uncertainties found do not include the effects of possible systematic biases in the galaxy and cluster catalogs and could be regarded as lower bounds on the true uncertainty range.
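
    A generic sketch of the bootstrap-resampling idea on a toy one-dimensional "catalogue": resample objects with replacement and take the spread of the recomputed statistic as its standard error. The mean pair separation below is a stand-in statistic, not the two-point correlation function itself.

```python
import numpy as np

def bootstrap_se(data, statistic, n_boot=1000, seed=0):
    """Bootstrap standard error: resample with replacement, recompute."""
    rng = np.random.default_rng(seed)
    n = len(data)
    reps = [statistic(data[rng.integers(0, n, n)]) for _ in range(n_boot)]
    return np.std(reps, ddof=1)

rng = np.random.default_rng(7)
positions = rng.uniform(0, 100, size=200)            # toy 1-D "catalogue"
mean_sep = lambda x: np.abs(x[:, None] - x[None, :]).mean()
print(f"mean pair separation = {mean_sep(positions):.2f} "
      f"+/- {bootstrap_se(positions, mean_sep):.2f} (bootstrap)")
```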

  15. [Applying the clustering technique for characterising maintenance outsourcing].

    PubMed

    Cruz, Antonio M; Usaquén-Perilla, Sandra P; Vanegas-Pabón, Nidia N; Lopera, Carolina

    2010-06-01

    Using clustering techniques for characterising companies providing health institutions with maintenance services. The study analysed seven pilot areas' equipment inventory (264 medical devices). Clustering techniques were applied using 26 variables. Response time (RT), operation duration (OD), availability and turnaround time (TAT) were amongst the most significant ones. Average biomedical equipment obsolescence value was 0.78. Four service provider clusters were identified: clusters 1 and 3 had better performance, lower TAT, RT and DR values (56 % of the providers coded O, L, C, B, I, S, H, F and G, had 1 to 4 day TAT values:

  16. Dynamics of cD Clusters of Galaxies. 4; Conclusion of a Survey of 25 Abell Clusters

    NASA Technical Reports Server (NTRS)

    Oegerle, William R.; Hill, John M.; Fisher, Richard R. (Technical Monitor)

    2001-01-01

    We present the final results of a spectroscopic study of a sample of cD galaxy clusters. The goal of this program has been to study the dynamics of the clusters, with emphasis on determining the nature and frequency of cD galaxies with peculiar velocities. Redshifts measured with the MX Spectrometer have been combined with those obtained from the literature to obtain typically 50 - 150 observed velocities in each of 25 galaxy clusters containing a central cD galaxy. We present a dynamical analysis of the final 11 clusters to be observed in this sample. All 25 clusters are analyzed in a uniform manner to test for the presence of substructure, and to determine peculiar velocities and their statistical significance for the central cD galaxy. These peculiar velocities were used to determine whether or not the central cD galaxy is at rest in the cluster potential well. We find that 30 - 50% of the clusters in our sample possess significant subclustering (depending on the cluster radius used in the analysis), which is in agreement with other studies of non-cD clusters. Hence, the dynamical state of cD clusters is not different than other present-day clusters. After careful study, four of the clusters appear to have a cD galaxy with a significant peculiar velocity. Dressler-Shectman tests indicate that three of these four clusters have statistically significant substructure within 1.5 h₇₅⁻¹ Mpc of the cluster center. The dispersion of the cD peculiar velocities is 164 +41/-34 km/s around the mean cluster velocity. This represents a significant detection of peculiar cD velocities, but at a level which is far below the mean velocity dispersion for this sample of clusters. The picture that emerges is one in which cD galaxies are nearly at rest with respect to the cluster potential well, but have small residual velocities due to subcluster mergers.

  17. Clustering Methods with Qualitative Data: A Mixed Methods Approach for Prevention Research with Small Samples

    PubMed Central

    Henry, David; Dymnicki, Allison B.; Mohatt, Nathaniel; Allen, James; Kelly, James G.

    2016-01-01

    Qualitative methods potentially add depth to prevention research, but can produce large amounts of complex data even with small samples. Studies conducted with culturally distinct samples often produce voluminous qualitative data, but may lack sufficient sample sizes for sophisticated quantitative analysis. Currently lacking in mixed methods research are methods allowing for more fully integrating qualitative and quantitative analysis techniques. Cluster analysis can be applied to coded qualitative data to clarify the findings of prevention studies by aiding efforts to reveal such things as the motives of participants for their actions and the reasons behind counterintuitive findings. By clustering groups of participants with similar profiles of codes in a quantitative analysis, cluster analysis can serve as a key component in mixed methods research. This article reports two studies. In the first study, we conduct simulations to test the accuracy of cluster assignment using three different clustering methods with binary data as produced when coding qualitative interviews. Results indicated that hierarchical clustering, K-Means clustering, and latent class analysis produced similar levels of accuracy with binary data, and that the accuracy of these methods did not decrease with samples as small as 50. Whereas the first study explores the feasibility of using common clustering methods with binary data, the second study provides a “real-world” example using data from a qualitative study of community leadership connected with a drug abuse prevention project. We discuss the implications of this approach for conducting prevention research, especially with small samples and culturally distinct communities. PMID:25946969

  18. Clustering Methods with Qualitative Data: a Mixed-Methods Approach for Prevention Research with Small Samples.

    PubMed

    Henry, David; Dymnicki, Allison B; Mohatt, Nathaniel; Allen, James; Kelly, James G

    2015-10-01

    Qualitative methods potentially add depth to prevention research but can produce large amounts of complex data even with small samples. Studies conducted with culturally distinct samples often produce voluminous qualitative data but may lack sufficient sample sizes for sophisticated quantitative analysis. Currently lacking in mixed-methods research are methods allowing for more fully integrating qualitative and quantitative analysis techniques. Cluster analysis can be applied to coded qualitative data to clarify the findings of prevention studies by aiding efforts to reveal such things as the motives of participants for their actions and the reasons behind counterintuitive findings. By clustering groups of participants with similar profiles of codes in a quantitative analysis, cluster analysis can serve as a key component in mixed-methods research. This article reports two studies. In the first study, we conduct simulations to test the accuracy of cluster assignment using three different clustering methods with binary data as produced when coding qualitative interviews. Results indicated that hierarchical clustering, K-means clustering, and latent class analysis produced similar levels of accuracy with binary data and that the accuracy of these methods did not decrease with samples as small as 50. Whereas the first study explores the feasibility of using common clustering methods with binary data, the second study provides a "real-world" example using data from a qualitative study of community leadership connected with a drug abuse prevention project. We discuss the implications of this approach for conducting prevention research, especially with small samples and culturally distinct communities.
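
    A sketch of the first study's comparison on simulated binary code data: cluster with Jaccard-distance hierarchical linkage and with k-means, then score each against the planted groups (latent class analysis is omitted here). All simulation settings are invented.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(8)
# 50 simulated participants x 20 binary codes, two planted groups with
# different code-endorsement profiles.
p = np.full((50, 20), 0.15)
p[:25, :10] = 0.7            # group 1 endorses the first ten codes
p[25:, 10:] = 0.7            # group 2 endorses the last ten
X = rng.binomial(1, p)
truth = np.repeat([0, 1], 25)

hc = fcluster(linkage(pdist(X, metric="jaccard"), method="average"),
              t=2, criterion="maxclust")              # binary-friendly distance
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

for name, lab in (("hierarchical", hc), ("k-means", km)):
    print(f"{name}: adjusted Rand vs planted groups = "
          f"{adjusted_rand_score(truth, lab):.2f}")
```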

  19. X-Ray Temperatures, Luminosities, and Masses from XMM-Newton Follow-up of the First Shear-selected Galaxy Cluster Sample

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Deshpande, Amruta J.; Hughes, John P.; Wittman, David, E-mail: amrejd@physics.rutgers.edu, E-mail: jph@physics.rutgers.edu, E-mail: dwittman@physics.ucdavis.edu

    We continue the study of the first sample of shear-selected clusters from the initial 8.6 square degrees of the Deep Lens Survey (DLS); a sample with well-defined selection criteria corresponding to the highest ranked shear peaks in the survey area. We aim to characterize the weak lensing selection by examining the sample’s X-ray properties. There are multiple X-ray clusters associated with nearly all the shear peaks: 14 X-ray clusters corresponding to seven DLS shear peaks. An additional three X-ray clusters cannot be definitively associated with shear peaks, mainly due to large positional offsets between the X-ray centroid and the shearmore » peak. Here we report on the XMM-Newton properties of the 17 X-ray clusters. The X-ray clusters display a wide range of luminosities and temperatures; the L {sub X} − T {sub X} relation we determine for the shear-associated X-ray clusters is consistent with X-ray cluster samples selected without regard to dynamical state, while it is inconsistent with self-similarity. For a subset of the sample, we measure X-ray masses using temperature as a proxy, and compare to weak lensing masses determined by the DLS team. The resulting mass comparison is consistent with equality. The X-ray and weak lensing masses show considerable intrinsic scatter (∼48%), which is consistent with X-ray selected samples when their X-ray and weak lensing masses are independently determined.« less

  20. 75 FR 16424 - Proposed Information Collection; Comment Request; Census Coverage Measurement Final Housing Unit...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-04-01

    ... unit is a block cluster, which consists of one or more geographically contiguous census blocks. As in... a number of distinct processes, ranging from forming block clusters, selecting the block clusters... sample of block clusters, while the E Sample is the census of housing units and enumerations in the same...

  1. White wines aroma recovery and enrichment: Sensory-led aroma selection and consumer perception.

    PubMed

    Lezaeta, Alvaro; Bordeu, Edmundo; Agosin, Eduardo; Pérez-Correa, J Ricardo; Varela, Paula

    2018-06-01

    We developed a sensory-based methodology to aromatically enrich wines using different aromatic fractions recovered during fermentations of Sauvignon Blanc must. By means of threshold determination and generic descriptive analysis using a trained sensory panel, the aromatic fractions were characterized, selected, and clustered. The selected fractions were grouped, re-assessed, and validated by the trained panel. A consumer panel assessed overall liking and answered a CATA question on some enriched wines and their ideal sample. Differences in elicitation rates between non-enriched and enriched wines with respect to the ideal product highlighted product optimization and the role of aromatic enrichment. Enrichment with aromatic fractions increased the aromatic quality of wines and enhanced consumer appreciation. Copyright © 2018. Published by Elsevier Ltd.

  2. Non-proportional odds multivariate logistic regression of ordinal family data.

    PubMed

    Zaloumis, Sophie G; Scurrah, Katrina J; Harrap, Stephen B; Ellis, Justine A; Gurrin, Lyle C

    2015-03-01

    Methods to examine whether genetic and/or environmental sources can account for the residual variation in ordinal family data usually assume proportional odds. However, standard software to fit the non-proportional odds model to ordinal family data is limited because the correlation structure of family data is more complex than for other types of clustered data. To perform these analyses we propose the non-proportional odds multivariate logistic regression model and take a simulation-based approach to model fitting using Markov chain Monte Carlo methods, such as partially collapsed Gibbs sampling and the Metropolis algorithm. We applied the proposed methodology to male pattern baldness data from the Victorian Family Heart Study. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
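
    The authors' partially collapsed Gibbs and Metropolis scheme targets a multivariate ordinal model for family data; the sketch below is only a generic random-walk Metropolis sampler on a toy binary logistic posterior, to illustrate the sampler itself rather than their model.

```python
import numpy as np

rng = np.random.default_rng(9)
# Toy data: one covariate, binary outcome (not ordinal family data).
x = rng.normal(size=200)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + 1.2 * x))))

def log_post(beta):
    """Log posterior: logistic log-likelihood plus a weak Gaussian prior."""
    eta = beta[0] + beta[1] * x
    return np.sum(y * eta - np.log1p(np.exp(eta))) - beta @ beta / 200

beta, lp = np.zeros(2), log_post(np.zeros(2))
draws = []
for _ in range(5000):
    prop = beta + rng.normal(0, 0.15, size=2)     # random-walk proposal
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:      # Metropolis accept/reject
        beta, lp = prop, lp_prop
    draws.append(beta.copy())
print(np.mean(draws[1000:], axis=0))              # posterior means, post burn-in
```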

  3. MicroRNA Expression in Formalin-fixed Paraffin-embedded Cancer Tissue: Identifying Reference MicroRNAs and Variability.

    PubMed

    Boisen, Mogens Karsbøl; Dehlendorff, Christian; Linnemann, Dorte; Schultz, Nicolai Aagaard; Jensen, Benny Vittrup; Høgdall, Estrid Vilma Solyom; Johansen, Julia Sidenius

    2015-12-29

    Archival formalin-fixed paraffin-embedded (FFPE) cancer tissue samples are a readily available resource for microRNA (miRNA) biomarker identification. No established standard for reference miRNAs in FFPE tissue exists. We sought to identify stable reference miRNAs for normalization of miRNA expression in FFPE tissue samples from patients with colorectal (CRC) and pancreatic (PC) cancer and to quantify the variability associated with sample age and fixation. High-throughput miRNA profiling results from 203 CRC and 256 PC FFPE samples as well as from 37 paired frozen/FFPE samples from nine other CRC tumors (methodological samples) were used. Candidate reference miRNAs were identified by their correlation with global mean expression. The stability of reference genes was analyzed according to published methods. The association between sample age and global mean miRNA expression was tested using linear regression. Variability was described using correlation coefficients and linear mixed effects models. Normalization effects were determined by changes in standard deviation and by hierarchical clustering. We created lists of 20 miRNAs with the best correlation to global mean expression in each cancer type. Nine of these miRNAs were present in both lists, and miR-103a-3p was the most stable reference miRNA for both CRC and PC FFPE tissue. The optimal number of reference miRNAs was 4 in CRC and 10 in PC. Sample age had a significant effect on global miRNA expression in PC (50% reduction over 20 years) but not in CRC. Formalin fixation for 2-6 days decreased miRNA expression 30-65%. Normalization using global mean expression reduced variability for technical and biological replicates, while normalization using the expression of the identified reference miRNAs reduced variability only for biological replicates. Normalization had only a minor impact on clustering results. We identified suitable reference miRNAs for future miRNA expression experiments using CRC and PC FFPE tissue samples. Formalin fixation decreased miRNA expression considerably, while the effect of increasing sample age was estimated to be negligible in a clinical setting.
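    As a sketch of the reference-selection step described above, the following Python fragment ranks miRNAs by their correlation with global mean expression and then normalizes against either the global mean or a few chosen references. The matrix shape and values are invented placeholders, not the study's data.

```python
# Sketch: rank candidate reference miRNAs by correlation with global mean
# expression, then normalize. `expr` is a placeholder (samples x miRNAs)
# matrix of log-scale values, not the study's data.
import numpy as np

rng = np.random.default_rng(0)
expr = rng.normal(loc=25.0, scale=2.0, size=(203, 500))

global_mean = expr.mean(axis=1)                    # per-sample global mean
r = np.array([np.corrcoef(expr[:, j], global_mean)[0, 1]
              for j in range(expr.shape[1])])      # correlation per miRNA
top20 = np.argsort(-r)[:20]                        # best-correlated candidates

# Normalize to the global mean, or to the mean of a few chosen references
# (the study found 4 references optimal for CRC and 10 for PC).
norm_global = expr - global_mean[:, None]
norm_refs = expr - expr[:, top20[:4]].mean(axis=1)[:, None]
```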

  4. Performance of small cluster surveys and the clustered LQAS design to estimate local-level vaccination coverage in Mali.

    PubMed

    Minetti, Andrea; Riera-Montes, Margarita; Nackers, Fabienne; Roederer, Thomas; Koudika, Marie Hortense; Sekkenes, Johanne; Taconet, Aurore; Fermon, Florence; Touré, Albouhary; Grais, Rebecca F; Checchi, Francesco

    2012-10-12

    Estimation of vaccination coverage at the local level is essential to identify communities that may require additional support. Cluster surveys can be used in resource-poor settings, when population figures are inaccurate. To be feasible, cluster samples need to be small, without losing robustness of results. The clustered LQAS (CLQAS) approach has been proposed as an alternative, as smaller sample sizes are required. We explored (i) the efficiency of cluster surveys of decreasing sample size through bootstrapping analysis and (ii) the performance of CLQAS under three alternative sampling plans to classify local vaccination coverage (VC), using data from a survey carried out in Mali after mass vaccination against meningococcal meningitis group A. VC estimates provided by a 10 × 15 cluster survey design were reasonably robust. We used them to classify health areas in three categories and guide mop-up activities: i) health areas not requiring supplemental activities; ii) health areas requiring additional vaccination; iii) health areas requiring further evaluation. As sample size decreased (from 10 × 15 to 10 × 3), the standard errors of the VC and intra-cluster correlation (ICC) estimates became increasingly unstable. Results of CLQAS simulations were not accurate for most health areas, with an overall risk of misclassification greater than 0.25 in one health area out of three. It was greater than 0.50 in one health area out of two under two of the three sampling plans. Small sample cluster surveys (10 × 15) are acceptably robust for classification of VC at local level. We do not recommend the CLQAS method as currently formulated for evaluating vaccination programmes.
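    The bootstrapping idea in (i) can be sketched as follows: resample whole clusters with replacement and track how the standard error of the VC estimate behaves as the within-cluster sample shrinks from 15 towards 3. The data here are simulated placeholders, and the simple cluster-mean estimator stands in for the survey's weighted estimator.

```python
# Sketch: bootstrap whole clusters to see how the vaccination coverage (VC)
# standard error behaves as the per-cluster sample shrinks. Simulated data.
import numpy as np

rng = np.random.default_rng(1)
data = rng.binomial(1, 0.8, size=(10, 15))    # 10 clusters x 15 children

def boot_se(sample, n_boot=1000):
    n_clusters = sample.shape[0]
    vcs = [sample[rng.integers(0, n_clusters, n_clusters)].mean()
           for _ in range(n_boot)]            # resample clusters, recompute VC
    return np.std(vcs)

for m in (15, 9, 6, 3):                       # 10 x 15 down to 10 x 3
    print(f"10 x {m:2d}: bootstrap SE of VC = {boot_se(data[:, :m]):.3f}")
```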

  5. Cholera Epidemic in Guinea-Bissau (2008): The Importance of “Place”

    PubMed Central

    Luquero, Francisco J.; Banga, Cunhate Na; Remartínez, Daniel; Palma, Pedro Pablo; Baron, Emanuel; Grais, Rebeca F.

    2011-01-01

    Background: As resources are limited when responding to cholera outbreaks, knowledge about where to orient interventions is crucial. We describe the cholera epidemic affecting Guinea-Bissau in 2008 focusing on the geographical spread in order to guide prevention and control activities. Methodology/Principal Findings: We conducted two studies: 1) a descriptive analysis of the cholera epidemic in Guinea-Bissau focusing on its geographical spread (country level and within the capital); and 2) a cross-sectional study to measure the prevalence of houses with at least one cholera case in the most affected neighbourhood of the capital (Bairro Bandim) to detect clustering of households with cases (cluster analysis). All cholera cases attending the cholera treatment centres in Guinea-Bissau who fulfilled a modified World Health Organization clinical case definition during the epidemic were included in the descriptive study. For the cluster analysis, a sample of houses was selected from a satellite photo (Google Earth™); 140 houses (and the four closest houses) were assessed from the 2,202 identified structures. We applied K-functions and kernel smoothing to detect clustering. We confirmed the clustering using Kulldorff's spatial scan statistic. A total of 14,222 cases and 225 deaths were reported in the country (attack rate (AR) = 0.94%, case fatality ratio (CFR) = 1.64%). The most affected regions were Biombo, Bijagos and Bissau (the capital). Bairro Bandim was the most affected neighbourhood of the capital (AR = 4.0). We found at least one case in 22.7% of the houses (95%CI: 19.5–26.2) in this neighbourhood. The cluster analysis identified two areas within Bairro Bandim at highest risk: a market and an intersection where runoff accumulates waste (p<0.001). Conclusions/Significance: Our analysis allowed for the identification of the most affected regions in Guinea-Bissau during the 2008 cholera outbreak, and the most affected areas within the capital. This information was essential for making decisions on where to reinforce treatment and to guide control and prevention activities. PMID:21572530

  6. Integrated GIS and multivariate statistical analysis for regional scale assessment of heavy metal soil contamination: A critical review.

    PubMed

    Hou, Deyi; O'Connor, David; Nathanail, Paul; Tian, Li; Ma, Yan

    2017-12-01

    Heavy metal soil contamination is associated with potential toxicity to humans or ecotoxicity. Scholars have increasingly used a combination of geographical information science (GIS) with geostatistical and multivariate statistical analysis techniques to examine the spatial distribution of heavy metals in soils at a regional scale. A review of such studies showed that most soil sampling programs were based on grid patterns and composite sampling methodologies. Many programs intended to characterize various soil types and land use types. The most often used sampling depth intervals were 0-0.10 m, or 0-0.20 m, below surface; and the sampling densities used ranged from 0.0004 to 6.1 samples per km², with a median of 0.4 samples per km². The most widely used spatial interpolators were inverse distance weighted interpolation and ordinary kriging; and the most often used multivariate statistical analysis techniques were principal component analysis and cluster analysis. The review also identified several determining and correlating factors in heavy metal distribution in soils, including soil type, soil pH, soil organic matter, land use type, Fe, Al, and heavy metal concentrations. The major natural and anthropogenic sources of heavy metals were found to derive from lithogenic origin, roadway and transportation, atmospheric deposition, wastewater and runoff from industrial and mining facilities, fertilizer application, livestock manure, and sewage sludge. This review argues that the full potential of integrated GIS and multivariate statistical analysis for assessing heavy metal distribution in soils on a regional scale has not yet been fully realized. It is proposed that future research be conducted to map multivariate results in GIS to pinpoint specific anthropogenic sources, to analyze temporal trends in addition to spatial patterns, to optimize modeling parameters, and to expand the use of different multivariate analysis tools beyond principal component analysis (PCA) and cluster analysis (CA). Copyright © 2017 Elsevier Ltd. All rights reserved.
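    A minimal sketch of the PCA-plus-cluster-analysis pipeline mentioned above, using scikit-learn; the metal concentrations are simulated stand-ins, and the log-transform and standardization choices are common conventions rather than prescriptions from the review.

```python
# Sketch: PCA followed by cluster analysis on (simulated) soil metal data.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
X = rng.lognormal(mean=2.0, sigma=0.5, size=(300, 6))  # e.g. Cd,Pb,Zn,Cu,Ni,Cr

Z = StandardScaler().fit_transform(np.log(X))   # log-transform, standardize
scores = PCA(n_components=2).fit_transform(Z)   # sources often separate on PCs
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scores)
# `labels` would then be mapped back into GIS to look for spatial source patterns
```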

  7. Prediction, Detection, and Validation of Isotope Clusters in Mass Spectrometry Data

    PubMed Central

    Treutler, Hendrik; Neumann, Steffen

    2016-01-01

    Mass spectrometry is a key analytical platform for metabolomics. The precise quantification and identification of small molecules is a prerequisite for elucidating metabolism, and the detection, validation, and evaluation of isotope clusters in LC-MS data are important for this task. Here, we present an approach for the improved detection of isotope clusters using chemical prior knowledge and the validation of detected isotope clusters depending on the substance mass using database statistics. We find remarkable improvements regarding the number of detected isotope clusters and are able to predict the correct molecular formula in the top three ranks in 92% of the cases. We make our methodology freely available as part of the Bioconductor packages xcms version 1.50.0 and CAMERA version 1.30.0. PMID:27775610
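    As an illustration of the chemical prior knowledge involved, successive isotopologue peaks are spaced by roughly 1.00336 Da divided by the charge state; a candidate cluster can be screened with a check like the following. The peak list is a toy example and the single-tolerance test is a simplification, not the xcms/CAMERA implementation.

```python
# Sketch: screen a candidate isotope cluster by isotopologue spacing.
NEUTRON_SHIFT = 1.00336  # Da, approximate +1 isotopologue mass difference

def is_isotope_cluster(mzs, charge=1, tol=0.01):
    """True if sorted m/z values are spaced by ~NEUTRON_SHIFT/charge."""
    mzs = sorted(mzs)
    step = NEUTRON_SHIFT / charge
    return all(abs((b - a) - step) < tol for a, b in zip(mzs, mzs[1:]))

print(is_isotope_cluster([301.141, 302.144, 303.148]))  # toy peaks -> True
```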

  8. Clustering of financial time series with application to index and enhanced index tracking portfolio

    NASA Astrophysics Data System (ADS)

    Dose, Christian; Cincotti, Silvano

    2005-09-01

    A stochastic-optimization technique based on time series cluster analysis is described for index tracking and enhanced index tracking problems. Our methodology solves the problem in two steps, i.e., by first selecting a subset of stocks and then setting the weight of each stock as a result of an optimization process (asset allocation). The present formulation takes into account constraints on the number of stocks and on the fraction of capital invested in each of them, whilst not including transaction costs. Computational results based on clustering selection are compared to those of random techniques and show the importance of clustering in noise reduction and robust forecasting applications, in particular for enhanced index tracking.
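    A rough sketch of the first step (cluster, then select representatives) follows; the correlation-distance metric, cluster count, and size-proportional weights are illustrative choices rather than the authors' exact formulation, and in the full method the final weights would come from the optimization step.

```python
# Sketch: correlation-distance clustering of stock return series, then one
# representative per cluster. Returns are simulated placeholders.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(3)
returns = rng.normal(0, 0.01, size=(250, 40))        # 250 days x 40 stocks

corr = np.corrcoef(returns, rowvar=False)
dist = np.sqrt(np.maximum(2.0 * (1.0 - corr), 0.0))  # correlation distance
Z = linkage(squareform(dist, checks=False), method="average")
labels = fcluster(Z, t=8, criterion="maxclust")      # 8 clusters of stocks

reps, weights = [], []
for c in np.unique(labels):
    members = np.where(labels == c)[0]
    mean_dist = dist[np.ix_(members, members)].mean(axis=1)
    reps.append(members[np.argmin(mean_dist)])       # most central stock
    weights.append(len(members) / returns.shape[1])  # crude size-based weight
```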

  9. Anonymous nuclear markers reveal taxonomic incongruence and long-term disjunction in a cactus species complex with continental-island distribution in South America.

    PubMed

    Perez, Manolo F; Carstens, Bryan C; Rodrigues, Gustavo L; Moraes, Evandro M

    2016-02-01

    The Pilosocereus aurisetus complex consists of eight cactus species with a fragmented distribution associated with xeric enclaves within the Cerrado biome in eastern South America. The phylogeny of these species is incompletely resolved, and this instability complicates evolutionary analyses. Previous analyses based on both plastid and microsatellite markers suggested that this complex contained species with inherent phylogeographic structure, which was attributed to recent diversification and recurring range shifts. However, limitations of the molecular markers used in these analyses prevented some questions from being properly addressed. In order to better understand the relationships among these species and make a preliminary assessment of the genetic structure within them, we developed anonymous nuclear loci from pyrosequencing data of 40 individuals from four species in the P. aurisetus complex. The data obtained from these loci were used to identify genetic clusters within species, and to investigate the phylogenetic relationships among these inferred clusters using a species tree methodology. Coupled with palaeodistributional modelling, our results reveal a deep phylogenetic and climatic disjunction between two geographic lineages. Our results highlight the importance of sampling more regions from the genome to gain better insights into the evolution of species with an intricate evolutionary history. The methodology used here provides a feasible approach to develop numerous genealogical molecular markers throughout the genome for non-model species. These data provide a more robust hypothesis for the relationships among the lineages of the P. aurisetus complex. Copyright © 2015 Elsevier Inc. All rights reserved.

  10. An agglomerative hierarchical clustering approach to visualisation in Bayesian clustering problems

    PubMed Central

    Dawson, Kevin J.; Belkhir, Khalid

    2009-01-01

    Clustering problems (including the clustering of individuals into outcrossing populations, hybrid generations, full-sib families and selfing lines) have recently received much attention in population genetics. In these clustering problems, the parameter of interest is a partition of the set of sampled individuals: the sample partition. In a fully Bayesian approach to clustering problems of this type, our knowledge about the sample partition is represented by a probability distribution on the space of possible sample partitions. Since the number of possible partitions grows very rapidly with the sample size, we cannot visualise this probability distribution in its entirety, unless the sample is very small. As a solution to this visualisation problem, we recommend using an agglomerative hierarchical clustering algorithm, which we call the exact linkage algorithm. This algorithm is a special case of the maximin clustering algorithm that we introduced previously. The exact linkage algorithm is now implemented in our software package Partition View. The exact linkage algorithm takes the posterior co-assignment probabilities as input, and yields as output a rooted binary tree or, more generally, a forest of such trees. Each node of this forest defines a set of individuals, and the node height is the posterior co-assignment probability of this set. This provides a useful visual representation of the uncertainty associated with the assignment of individuals to categories. It is also a useful starting point for a more detailed exploration of the posterior distribution in terms of the co-assignment probabilities. PMID:19337306
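    The general idea can be sketched with standard tools: treat one minus the posterior co-assignment probability as a distance and build a dendrogram, so that sets of individuals merge at heights reflecting their co-assignment support. This uses ordinary complete linkage as a stand-in for the authors' exact linkage algorithm, and the probability matrix is randomly generated.

```python
# Sketch: dendrogram from a posterior co-assignment probability matrix P,
# using distance 1 - P and ordinary complete linkage (a stand-in for the
# authors' exact linkage algorithm). P is randomly generated here.
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import squareform

rng = np.random.default_rng(4)
n = 12
P = rng.uniform(0.0, 1.0, size=(n, n))
P = (P + P.T) / 2.0            # symmetrise
np.fill_diagonal(P, 1.0)       # every individual co-assigned with itself

tree = linkage(squareform(1.0 - P, checks=False), method="complete")
info = dendrogram(tree, no_plot=True)   # merge heights ~ 1 - co-assignment
```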

  11. Characterization of Differential Toll-Like Receptor Responses below the Optical Diffraction Limit

    PubMed Central

    Aaron, Jesse S.; Carson, Bryan D.; Timlin, Jerilyn A.

    2013-01-01

    Many membrane receptors are recruited to specific cell surface domains to form nanoscale clusters upon ligand activation. This step appears to be necessary to initiate signaling, including pathways in innate immune system activation. However, virulent pathogens such as Yersinia pestis (the causative agent of plague) are known to evade innate immune detection, in contrast to similar microbes (such as E. coli) that elicit a robust response. This disparity has been partly attributed to the structure of lipopolysaccharides (LPS) on the bacterial cell wall, which are recognized by the innate immune receptor TLR4. As such, we hypothesized that nanoscale differences would exist in the spatial clustering of TLR4 upon binding of LPS derived from Y. pestis and E. coli. Although optical imaging can provide exquisite details of the spatial organization of biomolecules, there is a mismatch between the scale at which receptor clustering occurs (<300 nm) and the optical diffraction limit (>400 nm). The last decade has seen the emergence of super-resolution imaging methods that effectively break the optical diffraction barrier to yield truly nanoscale information in intact biological samples. This study reports the first visualizations of TLR4 distributions on intact cells at image resolutions of <30 nm using a novel, dual-color stochastic optical reconstruction microscopy (STORM) technique. This methodology permits receptors containing bound LPS to be distinguished from those without at the nanoscale. Importantly, we also show that LPS derived from immuno-stimulatory bacteria resulted in significantly larger LPS-TLR4 cluster sizes and a nearly two-fold greater ligand/receptor colocalization as compared to immuno-evading LPS. PMID:22807232

  12. No galaxy left behind: accurate measurements with the faintest objects in the Dark Energy Survey

    NASA Astrophysics Data System (ADS)

    Suchyta, E.; Huff, E. M.; Aleksić, J.; Melchior, P.; Jouvel, S.; MacCrann, N.; Ross, A. J.; Crocce, M.; Gaztanaga, E.; Honscheid, K.; Leistedt, B.; Peiris, H. V.; Rykoff, E. S.; Sheldon, E.; Abbott, T.; Abdalla, F. B.; Allam, S.; Banerji, M.; Benoit-Lévy, A.; Bertin, E.; Brooks, D.; Burke, D. L.; Carnero Rosell, A.; Carrasco Kind, M.; Carretero, J.; Cunha, C. E.; D'Andrea, C. B.; da Costa, L. N.; DePoy, D. L.; Desai, S.; Diehl, H. T.; Dietrich, J. P.; Doel, P.; Eifler, T. F.; Estrada, J.; Evrard, A. E.; Flaugher, B.; Fosalba, P.; Frieman, J.; Gerdes, D. W.; Gruen, D.; Gruendl, R. A.; James, D. J.; Jarvis, M.; Kuehn, K.; Kuropatkin, N.; Lahav, O.; Lima, M.; Maia, M. A. G.; March, M.; Marshall, J. L.; Miller, C. J.; Miquel, R.; Neilsen, E.; Nichol, R. C.; Nord, B.; Ogando, R.; Percival, W. J.; Reil, K.; Roodman, A.; Sako, M.; Sanchez, E.; Scarpine, V.; Sevilla-Noarbe, I.; Smith, R. C.; Soares-Santos, M.; Sobreira, F.; Swanson, M. E. C.; Tarle, G.; Thaler, J.; Thomas, D.; Vikram, V.; Walker, A. R.; Wechsler, R. H.; Zhang, Y.; DES Collaboration

    2016-03-01

    Accurate statistical measurement with large imaging surveys has traditionally required throwing away a sizable fraction of the data. This is because most measurements have relied on selecting nearly complete samples, where variations in the composition of the galaxy population with seeing, depth, or other survey characteristics are small. We introduce a new measurement method that aims to minimize this wastage, allowing precision measurement for any class of detectable stars or galaxies. We have implemented our proposal in BALROG, software which embeds fake objects in real imaging to accurately characterize measurement biases. We demonstrate this technique with an angular clustering measurement using Dark Energy Survey (DES) data. We first show that recovery of our injected galaxies depends on a variety of survey characteristics in the same way as the real data. We then construct a flux-limited sample of the faintest galaxies in DES, chosen specifically for their sensitivity to depth and seeing variations. Using the synthetic galaxies as randoms in the Landy-Szalay estimator suppresses the effects of variable survey selection by at least two orders of magnitude. With this correction, our measured angular clustering is found to be in excellent agreement with that of a matched sample from much deeper, higher resolution space-based Cosmological Evolution Survey (COSMOS) imaging; over angular scales of 0.004° < θ < 0.2°, we find a best-fitting scaling amplitude between the DES and COSMOS measurements of 1.00 ± 0.09. We expect this methodology to be broadly useful for extending measurements' statistical reach in a variety of upcoming imaging surveys.
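    For reference, the Landy-Szalay estimator used above is w = (DD − 2DR + RR)/RR on normalized pair counts; a brute-force toy version in two dimensions (flat geometry, invented positions, synthetic points playing the role of the randoms) looks like this.

```python
# Sketch: brute-force Landy-Szalay estimate in one separation bin.
import numpy as np

def pair_frac(a, b, r_lo, r_hi):
    """Fraction of pairs with separation in (r_lo, r_hi]."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    same = a is b
    n_pairs = len(a) * (len(a) - 1) if same else len(a) * len(b)
    return np.sum((d > r_lo) & (d <= r_hi)) / n_pairs

rng = np.random.default_rng(5)
data = rng.uniform(0, 1, size=(300, 2))   # toy galaxy positions
rand = rng.uniform(0, 1, size=(900, 2))   # synthetic/"random" positions

dd = pair_frac(data, data, 0.05, 0.10)
dr = pair_frac(data, rand, 0.05, 0.10)
rr = pair_frac(rand, rand, 0.05, 0.10)
w = (dd - 2 * dr + rr) / rr               # Landy-Szalay w in this bin
```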

  13. Potential use of MCR-ALS for the identification of coeliac-related biochemical changes in hyperspectral Raman maps from pediatric intestinal biopsies.

    PubMed

    Fornasaro, Stefano; Vicario, Annalisa; De Leo, Luigina; Bonifacio, Alois; Not, Tarcisio; Sergo, Valter

    2018-05-14

    Raman hyperspectral imaging is an emerging practice in biological and biomedical research for label-free analysis of tissues and cells. Using this method, both spatial distribution and spectral information of analyzed samples can be obtained. The current study reports the first Raman microspectroscopic characterisation of colon tissues from patients with Coeliac Disease (CD). The aim was to assess if Raman imaging coupled with hyperspectral multivariate image analysis is capable of detecting the alterations in the biochemical composition of intestinal tissues associated with CD. The analytical approach was based on a multi-step methodology: duodenal biopsies from healthy and coeliac patients were measured and processed with Multivariate Curve Resolution Alternating Least Squares (MCR-ALS). Based on the distribution maps and the pure spectra of the image constituents obtained from MCR-ALS, interesting biochemical differences between healthy and coeliac patients were derived. Noticeably, a reduced distribution of complex lipids in the pericryptic space and a different distribution and abundance of proteins rich in beta-sheet structures were found in CD patients. The output of the MCR-ALS analysis was then used as a starting point for two clustering algorithms (k-means clustering and hierarchical clustering methods). Both methods converged on similar results, providing precise segmentation over multiple Raman images of the studied tissues.

  14. Resuscitation Outcomes Consortium (ROC) PRIMED Cardiac Arrest Trial Methods Part 2: Rationale and Methodology for “Analyze Later” Protocol

    PubMed Central

    Stiell, Ian G.; Callaway, Clif; Davis, Dan; Terndrup, Tom; Powell, Judy; Cook, Andrea; Kudenchuk, Peter J.; Daya, Mohamud; Kerber, Richard; Idris, Ahamed; Morrison, Laurie J.; Aufderheide, Tom

    2008-01-01

    Objective: The primary objective of the trial is to compare survival to hospital discharge with Modified Rankin Score (MRS) ≤3 between a strategy that prioritizes a specified period of CPR before rhythm analysis (Analyze Later) versus a strategy of minimal CPR followed by early rhythm analysis (Analyze Early) in patients with out-of-hospital cardiac arrest. Methods: Design: Cluster randomized trial with cluster units defined by geographic region, or monitor/defibrillator machine. Population: Adults treated by Emergency Medical Service (EMS) providers for non-traumatic out-of-hospital cardiac arrest not witnessed by EMS. Setting: EMS systems participating in the Resuscitation Outcomes Consortium and agreeing to cluster randomization to the Analyze Later versus Analyze Early intervention in a crossover fashion. Sample Size: Based on a two-sided significance level of 0.05, a maximum of 13,239 evaluable patients will allow statistical power of 0.996 to detect a hypothesized improvement in the probability of survival to discharge with MRS ≤ 3 from 5.41% after Analyze Early to 7.45% after Analyze Later (2.04% absolute increase in primary outcome). Conclusion: If this trial demonstrates a significant improvement in survival with a strategy of Analyze Later, it is estimated that 4,000 premature deaths from cardiac arrest would be averted annually in North America alone. PMID:18487004

  15. Resuscitation Outcomes Consortium (ROC) PRIMED cardiac arrest trial methods part 2: rationale and methodology for "Analyze Later vs. Analyze Early" protocol.

    PubMed

    Stiell, Ian G; Callaway, Clif; Davis, Dan; Terndrup, Tom; Powell, Judy; Cook, Andrea; Kudenchuk, Peter J; Daya, Mohamud; Kerber, Richard; Idris, Ahamed; Morrison, Laurie J; Aufderheide, Tom

    2008-08-01

    The primary objective of the trial is to compare survival to hospital discharge with modified Rankin score (MRS) ≤ 3 between a strategy that prioritizes a specified period of CPR before rhythm analysis (Analyze Later) versus a strategy of minimal CPR followed by early rhythm analysis (Analyze Early) in patients with out-of-hospital cardiac arrest. Design: Cluster randomized trial with cluster units defined by geographic region, or monitor/defibrillator machine. Population: Adults treated by emergency medical service (EMS) providers for non-traumatic out-of-hospital cardiac arrest not witnessed by EMS. Setting: EMS systems participating in the Resuscitation Outcomes Consortium and agreeing to cluster randomization to the Analyze Later versus Analyze Early intervention in a crossover fashion. Sample size: Based on a two-sided significance level of 0.05, a maximum of 13,239 evaluable patients will allow statistical power of 0.996 to detect a hypothesized improvement in the probability of survival to discharge with MRS ≤ 3 from 5.41% after Analyze Early to 7.45% after Analyze Later (2.04% absolute increase in primary outcome). If this trial demonstrates a significant improvement in survival with a strategy of Analyze Later, it is estimated that 4000 premature deaths from cardiac arrest would be averted annually in North America alone.
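    The quoted power of 0.996 for 13,239 patients can be roughly reproduced with a two-proportion normal approximation, as sketched below; the design effect argument is a placeholder (set to 1 here, i.e., ignoring clustering), whereas the trial's actual calculation accounted for the cluster-crossover design.

```python
# Sketch: normal-approximation power for two proportions, with an optional
# design effect (deff). deff = 1 ignores clustering.
from math import sqrt
from scipy.stats import norm

def power_two_prop(p1, p2, n_per_arm, deff=1.0, alpha=0.05):
    pbar = (p1 + p2) / 2.0
    se_null = sqrt(deff * 2 * pbar * (1 - pbar) / n_per_arm)
    se_alt = sqrt(deff * (p1*(1 - p1) + p2*(1 - p2)) / n_per_arm)
    z_crit = norm.ppf(1 - alpha / 2)
    return norm.cdf((abs(p2 - p1) - z_crit * se_null) / se_alt)

print(power_two_prop(0.0541, 0.0745, 13239 // 2))  # ~0.998, near the 0.996 quoted
```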

  16. Sample size calculation in cost-effectiveness cluster randomized trials: optimal and maximin approaches.

    PubMed

    Manju, Md Abu; Candel, Math J J M; Berger, Martijn P F

    2014-07-10

    In this paper, the optimal sample sizes at the cluster and person levels for each of two treatment arms are obtained for cluster randomized trials where the cost-effectiveness of treatments on a continuous scale is studied. The optimal sample sizes maximize the efficiency or power for a given budget or minimize the budget for a given efficiency or power. Optimal sample sizes require information on the intra-cluster correlations (ICCs) for effects and costs, the correlations between costs and effects at individual and cluster levels, the ratio of the variance of effects translated into costs to the variance of the costs (the variance ratio), sampling and measuring costs, and the budget. When planning a study, information on the model parameters is usually not available. To overcome this local optimality problem, the current paper also presents maximin sample sizes. The maximin sample sizes turn out to be rather robust against misspecifying the correlation between costs and effects at the cluster and individual levels but may lose much efficiency when misspecifying the variance ratio. The robustness of the maximin sample sizes against misspecifying the ICCs depends on the variance ratio. The maximin sample sizes are robust under misspecification of the ICC for costs for realistic values of the variance ratio greater than one but not robust under misspecification of the ICC for effects. Finally, we show how to calculate optimal or maximin sample sizes that yield sufficient power for a test on the cost-effectiveness of an intervention.
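    The flavor of such optimal-design calculations can be conveyed by the classical single-outcome result, in which the variance-minimizing cluster size depends only on the cost ratio and the ICC; this is a simplified sketch with invented costs and budget, not the paper's bivariate cost-effectiveness formulas.

```python
# Sketch: the classical budget-constrained optimum for a single continuous
# outcome, where cluster size depends only on the cost ratio and the ICC.
# Costs, budget, and ICC below are invented for illustration.
from math import sqrt, ceil

def optimal_cluster_size(c_cluster, c_person, icc):
    return sqrt((c_cluster / c_person) * (1 - icc) / icc)

def clusters_for_budget(budget, c_cluster, c_person, m):
    return int(budget // (c_cluster + c_person * m))

m = ceil(optimal_cluster_size(c_cluster=500.0, c_person=25.0, icc=0.05))
k = clusters_for_budget(60000.0, 500.0, 25.0, m)
print(m, "persons per cluster;", k, "clusters affordable")
```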

  17. VizieR Online Data Catalog: LAMOST survey of star clusters in M31. II. (Chen+, 2016)

    NASA Astrophysics Data System (ADS)

    Chen, B.; Liu, X.; Xiang, M.; Yuan, H.; Huang, Y.; Shi, J.; Fan, Z.; Huo, Z.; Wang, C.; Ren, J.; Tian, Z.; Zhang, H.; Liu, G.; Cao, Z.; Zhang, Y.; Hou, Y.; Wang, Y.

    2016-09-01

    We select a sample of 306 massive star clusters observed with the Large Sky Area Multi-Object Fibre Spectroscopic Telescope (LAMOST) in the vicinity fields of M31 and M33. Massive clusters in our sample are all selected from the catalog presented in Paper I (Chen et al. 2015, Cat. J/other/RAA/15.1392), including five newly discovered clusters selected with the SDSS photometry, three newly confirmed, and 298 previously known clusters from the Revised Bologna Catalogue (RBC; Galleti et al. 2012, Cat. V/143; http://www.bo.astro.it/M31/). Since then another two objects, B341 and B207, have also been observed with LAMOST, and they are included in the current analysis. The current sample does not include those objects listed in Paper I that were selected from Johnson et al. 2012 (Cat. J/ApJ/752/95), since most of them are young but not so massive. All objects were observed with LAMOST between 2011 September and 2014 June. Table 1 lists the name, position, and radial velocity of all sample clusters analyzed in the current work. The LAMOST spectra cover the wavelength range 3700-9000Å at a resolving power of R~1800. Details about the observations and data reduction can be found in Paper I. The median signal-to-noise ratios (S/N) per pixel at 4750 and 7450Å of the spectra of all clusters in the current sample are, respectively, 14 and 37. Essentially all spectra have S/N(4750Å)>5 except for the spectra of 18 clusters; the latter have S/N(7450Å)>10. Peacock et al. 2010 (Cat. J/MNRAS/402/803) retrieved images of M31 star clusters and candidates from the SDSS archive and extracted ugriz aperture photometric magnitudes from those objects using SExtractor. They present a catalog containing homogeneous ugriz photometry of 572 star clusters and 373 candidates. Among them, 299 clusters are in our sample. (2 data files).

  18. CHEERS: The chemical evolution RGS sample

    NASA Astrophysics Data System (ADS)

    de Plaa, J.; Kaastra, J. S.; Werner, N.; Pinto, C.; Kosec, P.; Zhang, Y.-Y.; Mernier, F.; Lovisari, L.; Akamatsu, H.; Schellenberger, G.; Hofmann, F.; Reiprich, T. H.; Finoguenov, A.; Ahoranta, J.; Sanders, J. S.; Fabian, A. C.; Pols, O.; Simionescu, A.; Vink, J.; Böhringer, H.

    2017-11-01

    Context: The chemical yields of supernovae and the metal enrichment of the intra-cluster medium (ICM) are not well understood. The hot gas in clusters of galaxies has been enriched with metals originating from billions of supernovae and provides a fair sample of large-scale metal enrichment in the Universe. High-resolution X-ray spectra of clusters of galaxies provide a unique way of measuring abundances in the hot ICM. The abundance measurements can provide constraints on the supernova explosion mechanism and the initial-mass function of the stellar population. This paper introduces the CHEmical Enrichment RGS Sample (CHEERS), which is a sample of 44 bright local giant ellipticals, groups, and clusters of galaxies observed with XMM-Newton. Aims: The CHEERS project aims to provide the most accurate set of cluster abundances measured in X-rays using this sample. This paper focuses specifically on the abundance measurements of O and Fe using the reflection grating spectrometer (RGS) on board XMM-Newton. We aim to thoroughly discuss the cluster-to-cluster abundance variations and the robustness of the measurements. Methods: We have selected the CHEERS sample such that the oxygen abundance in each cluster is detected at a level of at least 5σ in the RGS. The dispersive nature of the RGS limits the sample to clusters with sharp surface brightness peaks. The deep exposures and the size of the sample allow us to quantify the intrinsic scatter and the systematic uncertainties in the abundances using spectral modeling techniques. Results: We report the oxygen and iron abundances as measured with the RGS in the core regions of all 44 clusters in the sample. We do not find a significant trend of O/Fe as a function of cluster temperature, but we do find an intrinsic scatter in the O and Fe abundances from cluster to cluster. The level of systematic uncertainties in the O/Fe ratio is estimated to be around 20-30%, while the systematic uncertainties in the absolute O and Fe abundances can be as high as 50% in extreme cases. Thanks to the high statistics of the observations, we were able to identify and correct a systematic bias in the oxygen abundance determination that was due to an inaccuracy in the spectral model. Conclusions: The lack of dependence of O/Fe on temperature suggests that the enrichment of the ICM does not depend on cluster mass and that most of the enrichment likely took place before the ICM was formed. We find that the observed scatter in the O/Fe ratio is due to a combination of intrinsic scatter in the source and systematic uncertainties in the spectral fitting, which we are unable to separate. The astrophysical source of intrinsic scatter could be due to differences in active galactic nucleus activity and ongoing star formation in the brightest cluster galaxy. The systematic scatter is due to uncertainties in the spatial line broadening, absorption column, multi-temperature structure, and the thermal plasma models.

  19. Designing trials for pressure ulcer risk assessment research: methodological challenges.

    PubMed

    Balzer, K; Köpke, S; Lühmann, D; Haastert, B; Kottner, J; Meyer, G

    2013-08-01

    For decades various pressure ulcer risk assessment scales (PURAS) have been developed and implemented into nursing practice despite uncertainty about whether use of these tools helps to prevent pressure ulcers. According to current methodological standards, randomised controlled trials (RCTs) are required to conclusively determine the clinical efficacy and safety of this risk assessment strategy. In these trials, PURAS-aided risk assessment has to be compared to nurses' clinical judgement alone in terms of its impact on pressure ulcer incidence and adverse outcomes. However, RCTs evaluating diagnostic procedures are prone to specific risks of bias and threats to the statistical power which may challenge their validity and feasibility. This discussion paper critically reflects on the rigour and feasibility of experimental research needed to substantiate the clinical efficacy of PURAS-aided risk assessment. Based on reflections on the methodological literature, a critical appraisal of available trials on this subject and an analysis of a protocol developed for a methodologically robust cluster-RCT, this paper arrives at the following conclusions: First, available trials do not provide reliable estimates of the impact of PURAS-aided risk assessment on pressure ulcer incidence compared to nurses' clinical judgement alone due to serious risks of bias and insufficient sample size. Second, it seems infeasible to assess this impact by means of rigorous experimental studies since sample size would become extremely high if likely threats to validity and power are properly taken into account. Third, evidence linkage methods currently seem to be the most promising approach for evaluating the clinical efficacy and safety of PURAS-aided risk assessment. With this kind of secondary research, the downstream effect of use of PURAS on pressure ulcer incidence could be modelled by combining best available evidence for single parts of this pathway. However, to yield reliable modelling results, more robust experimental research evaluating specific parts of the pressure ulcer risk assessment-prevention pathway is needed. Copyright © 2013 Elsevier Ltd. All rights reserved.

  20. Cluster Masses Derived from X-ray and Sunyaev-Zeldovich Effect Measurements

    NASA Technical Reports Server (NTRS)

    Laroque, S.; Joy, Marshall; Bonamente, M.; Carlstrom, J.; Dawson, K.

    2003-01-01

    We infer the gas mass and total gravitational mass of 11 clusters using two different methods: analysis of X-ray data from the Chandra X-ray Observatory and analysis of centimeter-wave Sunyaev-Zel'dovich Effect (SZE) data from the BIMA and OVRO interferometers. This flux-limited sample of clusters from the BCS cluster catalogue was chosen so as to be well above the surface brightness limit of the ROSAT All Sky Survey; this is therefore an orientation unbiased sample. The gas mass fraction, f_g, is calculated for each cluster using both X-ray and SZE data, and the results are compared at a fiducial radius of r_500. Comparison of the X-ray and SZE results for this orientation unbiased sample allows us to constrain cluster systematics, such as clumping of the intracluster medium. We derive an upper limit on Ω_M assuming that the mass composition of clusters within r_500 reflects the universal mass composition: Ω_M h_100 ≤ Ω_B / f_g. We also demonstrate how the mean f_g derived from the sample can be used to estimate the masses of clusters discovered by upcoming deep SZE surveys.

  1. Physical properties of star clusters in the outer LMC as observed by the DES

    DOE PAGES

    Pieres, A.; Santiago, B.; Balbinot, E.; ...

    2016-05-26

    The Large Magellanic Cloud (LMC) harbors a rich and diverse system of star clusters, whose ages, chemical abundances, and positions provide information about the LMC history of star formation. We use Science Verification imaging data from the Dark Energy Survey to increase the census of known star clusters in the outer LMC and to derive physical parameters for a large sample of such objects using a spatially and photometrically homogeneous data set. Our sample contains 255 visually identified cluster candidates, of which 109 were not listed in any previous catalog. We quantify the crowding effect for the stellar sample produced by the DES Data Management pipeline and conclude that the stellar completeness is < 10% inside typical LMC cluster cores. We therefore develop a pipeline to sample and measure stellar magnitudes and positions around the cluster candidates using DAOPHOT. We also implement a maximum-likelihood method to fit individual density profiles and colour-magnitude diagrams. For 117 (from a total of 255) of the cluster candidates (28 uncatalogued clusters), we obtain reliable ages, metallicities, distance moduli and structural parameters, confirming their nature as physical systems. The distribution of cluster metallicities shows a radial dependence, with no clusters more metal-rich than [Fe/H] ~ -0.7 beyond 8 kpc from the LMC center. Furthermore, the age distribution has two peaks at ≃ 1.2 Gyr and ≃ 2.7 Gyr.

  2. Physical properties of star clusters in the outer LMC as observed by the DES

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pieres, A.; Santiago, B.; Balbinot, E.

    The Large Magellanic Cloud (LMC) harbors a rich and diverse system of star clusters, whose ages, chemical abundances, and positions provide information about the LMC history of star formation. We use Science Verification imaging data from the Dark Energy Survey to increase the census of known star clusters in the outer LMC and to derive physical parameters for a large sample of such objects using a spatially and photometrically homogeneous data set. Our sample contains 255 visually identified cluster candidates, of which 109 were not listed in any previous catalog. We quantify the crowding effect for the stellar sample produced by the DES Data Management pipeline and conclude that the stellar completeness is < 10% inside typical LMC cluster cores. We therefore develop a pipeline to sample and measure stellar magnitudes and positions around the cluster candidates using DAOPHOT. We also implement a maximum-likelihood method to fit individual density profiles and colour-magnitude diagrams. For 117 (from a total of 255) of the cluster candidates (28 uncatalogued clusters), we obtain reliable ages, metallicities, distance moduli and structural parameters, confirming their nature as physical systems. The distribution of cluster metallicities shows a radial dependence, with no clusters more metal-rich than [Fe/H] ~ -0.7 beyond 8 kpc from the LMC center. Furthermore, the age distribution has two peaks at ≃ 1.2 Gyr and ≃ 2.7 Gyr.

  3. A Pareto frontier intersection-based approach for efficient multiobjective optimization of competing concept alternatives

    NASA Astrophysics Data System (ADS)

    Rousis, Damon A.

    The expected growth of civil aviation over the next twenty years places significant emphasis on revolutionary technology development aimed at mitigating the environmental impact of commercial aircraft. As the number of technology alternatives grows along with model complexity, current methods for Pareto finding and multiobjective optimization quickly become computationally infeasible. Coupled with the large uncertainty in the early stages of design, optimal designs are sought while avoiding the computational burden of excessive function calls when a single design change or technology assumption could alter the results. This motivates the need for a robust and efficient evaluation methodology for quantitative assessment of competing concepts. This research presents a novel approach that combines Bayesian adaptive sampling with surrogate-based optimization to efficiently place designs near Pareto frontier intersections of competing concepts. Efficiency is increased over sequential multiobjective optimization by focusing computational resources specifically on the location in the design space where optimality shifts between concepts. At the intersection of Pareto frontiers, the selection decisions are most sensitive to preferences placed on the objectives, and small perturbations can lead to vastly different final designs. These concepts are incorporated into an evaluation methodology that ultimately reduces the number of failed cases, infeasible designs, and Pareto-dominated solutions across all concepts. A set of algebraic samples along with a truss design problem are presented as canonical examples for the proposed approach. The methodology is applied to the design of ultra-high bypass ratio turbofans to guide NASA's technology development efforts for future aircraft. Geared-drive and variable geometry bypass nozzle concepts are explored as enablers for increased bypass ratio and potential alternatives over traditional configurations. The method is shown to improve sampling efficiency and provide clusters of feasible designs that motivate a shift towards revolutionary technologies that reduce fuel burn, emissions, and noise on future aircraft.
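    The building block behind frontier-intersection searches is extracting each concept's Pareto frontier from sampled designs; a minimal dominance filter (minimization in all objectives, toy samples for two hypothetical concepts) is sketched below.

```python
# Sketch: Pareto-dominance filter (all objectives minimized) applied to toy
# design samples from two hypothetical concepts.
import numpy as np

def pareto_front(F):
    """Rows of F are objective vectors; return mask of non-dominated rows."""
    mask = np.ones(len(F), dtype=bool)
    for i in range(len(F)):
        dominators = np.all(F <= F[i], axis=1) & np.any(F < F[i], axis=1)
        mask[i] = not dominators.any()
    return mask

rng = np.random.default_rng(6)
concept_a = rng.uniform(size=(200, 2))                # e.g. (fuel burn, noise)
concept_b = rng.uniform(size=(200, 2)) + [0.1, -0.1]  # shifted trade-off

front_a = concept_a[pareto_front(concept_a)]
front_b = concept_b[pareto_front(concept_b)]
# adaptive sampling would then concentrate where front_a and front_b cross
```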

  4. Cluster Randomised Trials in Cochrane Reviews: Evaluation of Methodological and Reporting Practice.

    PubMed

    Richardson, Marty; Garner, Paul; Donegan, Sarah

    2016-01-01

    Systematic reviews can include cluster-randomised controlled trials (C-RCTs), which require different analysis compared with standard individual-randomised controlled trials. However, it is not known whether review authors follow the methodological and reporting guidance when including these trials. The aim of this study was to assess the methodological and reporting practice of Cochrane reviews that included C-RCTs against criteria developed from existing guidance. Criteria were developed based on the methodological literature and personal experience supervising review production and quality. Criteria were grouped into four themes: identifying, reporting, assessing risk of bias, and analysing C-RCTs. The Cochrane Database of Systematic Reviews was searched (2nd December 2013), and the 50 most recent reviews that included C-RCTs were retrieved. Each review was then assessed using the criteria. The 50 reviews we identified were published by 26 Cochrane Review Groups between June 2013 and November 2013. For identifying C-RCTs, only 56% of reviews stated in their eligibility criteria that C-RCTs were eligible for inclusion. For reporting C-RCTs, only eight (24%) of the 33 reviews reported the method of cluster adjustment for their included C-RCTs. For assessing risk of bias, only one review assessed all five C-RCT-specific risk-of-bias criteria. For analysing C-RCTs, of the 27 reviews that presented unadjusted data, only nine (33%) provided a warning that confidence intervals may be artificially narrow. Of the 34 reviews that reported data from unadjusted C-RCTs, only 13 (38%) excluded the unadjusted results from the meta-analyses. The methodological and reporting practices in Cochrane reviews incorporating C-RCTs could be greatly improved, particularly with regard to analyses. Criteria developed as part of the current study could be used by review authors or editors to identify errors and improve the quality of published systematic reviews incorporating C-RCTs.

  5. Mass spectrometric identification of intermediates in the O2-driven [4Fe-4S] to [2Fe-2S] cluster conversion in FNR

    PubMed Central

    Crack, Jason C.; Thomson, Andrew J.

    2017-01-01

    The iron-sulfur cluster containing protein Fumarate and Nitrate Reduction (FNR) is the master regulator for the switch between anaerobic and aerobic respiration in Escherichia coli and many other bacteria. The [4Fe-4S] cluster functions as the sensory module, undergoing reaction with O2 that leads to conversion to a [2Fe-2S] form with loss of high-affinity DNA binding. Here, we report studies of the FNR cluster conversion reaction using time-resolved electrospray ionization mass spectrometry. The data provide insight into the reaction, permitting the detection of cluster conversion intermediates and products, including a [3Fe-3S] cluster and persulfide-coordinated [2Fe-2S] clusters [[2Fe-2S](S)n, where n = 1 or 2]. Analysis of kinetic data revealed a branched mechanism in which cluster sulfide oxidation occurs in parallel with cluster conversion and not as a subsequent, secondary reaction to generate [2Fe-2S](S)n species. This methodology shows great potential for broad application to studies of protein cofactor–small molecule interactions. PMID:28373574

  6. X-ray morphological study of galaxy cluster catalogues

    NASA Astrophysics Data System (ADS)

    Democles, Jessica; Pierre, Marguerite; Arnaud, Monique

    2016-07-01

    Context: The intra-cluster medium distribution, as probed by X-ray morphology-based analysis, gives a good indication of a system's dynamical state. In the race for the determination of precise scaling relations and understanding their scatter, the dynamical state offers valuable information. Method: We develop the analysis of the centroid shift so that it can be applied to characterize galaxy cluster surveys such as the XXL survey or high-redshift cluster samples. We use it together with the surface brightness concentration parameter and the offset between the X-ray peak and the brightest cluster galaxy in the context of the XXL bright cluster sample (Pacaud et al. 2015) and a set of high-redshift massive clusters detected by Planck and SPT and observed by both the XMM-Newton and Chandra observatories. Results: Using the wide redshift coverage of the XXL sample, we see no trend in the dynamical state of the systems with redshift.
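    A centroid-shift statistic of the general kind used in such morphology studies can be sketched as below: measure the X-ray centroid in a series of shrinking apertures and take the scatter of its offsets, normalized by the outer radius. The aperture fractions, the toy Poisson image, and the normalization are assumptions for illustration; published definitions differ in detail.

```python
# Sketch: a centroid-shift statistic on a toy X-ray counts image.
import numpy as np

def centroid_shift(img, center, r_max, n_ap=8):
    yy, xx = np.indices(img.shape)
    offsets = []
    for f in np.linspace(1.0, 0.3, n_ap):       # shrinking aperture fractions
        mask = (xx - center[0])**2 + (yy - center[1])**2 <= (f * r_max)**2
        w = img * mask
        cx, cy = (xx * w).sum() / w.sum(), (yy * w).sum() / w.sum()
        offsets.append(np.hypot(cx - center[0], cy - center[1]))
    return np.std(offsets, ddof=1) / r_max      # larger => more disturbed

rng = np.random.default_rng(9)
img = rng.poisson(5.0, size=(101, 101)).astype(float)  # toy counts image
print(centroid_shift(img, center=(50, 50), r_max=40))
```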

  7. Planck/SDSS cluster mass and gas scaling relations for a volume-complete redMaPPer sample

    NASA Astrophysics Data System (ADS)

    Jimeno, Pablo; Diego, Jose M.; Broadhurst, Tom; De Martino, I.; Lazkoz, Ruth

    2018-07-01

    Using Planck satellite data, we construct Sunyaev-Zel'dovich (SZ) gas pressure profiles for a large, volume-complete sample of optically selected clusters. We have defined a sample of over 8000 redMaPPer clusters from the Sloan Digital Sky Survey, within the volume-complete redshift region 0.100

  8. Toward An Understanding of Cluster Evolution: A Deep X-Ray Selected Cluster Catalog from ROSAT

    NASA Technical Reports Server (NTRS)

    Jones, Christine; Oliversen, Ronald (Technical Monitor)

    2002-01-01

    In the past year, we have focussed on studying individual clusters found in this sample with Chandra, as well as using Chandra to measure the luminosity-temperature relation for a sample of distant clusters identified through the ROSAT study, and finally we are continuing our study of fossil groups. For the luminosity-temperature study, we compared a sample of nearby clusters with a sample of distant clusters and, for the first time, measured a significant change in the relation as a function of redshift (Vikhlinin et al. in final preparation for submission to Cape). We also used our ROSAT analysis to select individual clusters and propose them for Chandra observations. We are now analyzing the Chandra observations of the distant cluster A520, which appears to have undergone a recent merger. Finally, we have completed the analysis of the fossil groups identified in ROSAT observations. In the past few months, we have derived X-ray fluxes and luminosities as well as X-ray extents for an initial sample of 89 objects. Based on the X-ray extents and the lack of bright galaxies, we have identified 16 fossil groups. We are comparing their X-ray and optical properties with those of optically rich groups. A paper is being readied for submission (Jones, Forman, and Vikhlinin in preparation).

  9. Ecological tolerances of Miocene larger benthic foraminifera from Indonesia

    NASA Astrophysics Data System (ADS)

    Novak, Vibor; Renema, Willem

    2018-01-01

    To provide a comprehensive palaeoenvironmental reconstruction based on larger benthic foraminifera (LBF), a quantitative analysis of their assemblage composition is needed. Besides microfacies analysis, which includes environmental preferences of foraminiferal taxa, statistical analyses should also be employed. Therefore, detrended correspondence analysis and cluster analysis were performed on relative abundance data of identified LBF assemblages deposited in mixed carbonate-siliciclastic (MCS) systems and blue-water (BW) settings. Studied MCS system localities include ten sections from the central part of the Kutai Basin in East Kalimantan, ranging from late Burdigalian to Serravallian age. The BW samples were collected from eleven sections of the Bulu Formation on Central Java, dated as Serravallian. Results from detrended correspondence analysis reveal significant differences between these two environmental settings. Cluster analysis produced five clusters of samples: clusters 1 and 2 comprising mostly MCS samples, clusters 3 and 4 dominated by BW samples, and cluster 5 showing a mixed composition with both MCS and BW samples. The results of the cluster analysis were then subjected to indicator species analysis, which yielded three groups of LBF taxa: typical assemblage indicators, regularly occurring taxa, and rare taxa. By interpreting the results of detrended correspondence analysis, cluster analysis and indicator species analysis, along with environmental preferences of identified LBF taxa, a palaeoenvironmental model is proposed for the distribution of LBF in Miocene MCS systems and adjacent BW settings of Indonesia.

  10. Identifying consumer segments in health services markets: an application of conjoint and cluster analyses to the ambulatory care pharmacy market.

    PubMed

    Carrol, N V; Gagon, J P

    1983-01-01

    Because of increasing competition, it is becoming more important that health care providers pursue consumer-based market segmentation strategies. This paper presents a methodology for identifying and describing consumer segments in health service markets, and demonstrates the use of the methodology by presenting a study of consumer segments in the ambulatory care pharmacy market.

  11. X-Ray Temperatures, Luminosities, and Masses from XMM-Newton Follow-upof the First Shear-selected Galaxy Cluster Sample

    NASA Astrophysics Data System (ADS)

    Deshpande, Amruta J.; Hughes, John P.; Wittman, David

    2017-04-01

    We continue the study of the first sample of shear-selected clusters from the initial 8.6 square degrees of the Deep Lens Survey (DLS); a sample with well-defined selection criteria corresponding to the highest ranked shear peaks in the survey area. We aim to characterize the weak lensing selection by examining the sample’s X-ray properties. There are multiple X-ray clusters associated with nearly all the shear peaks: 14 X-ray clusters corresponding to seven DLS shear peaks. An additional three X-ray clusters cannot be definitively associated with shear peaks, mainly due to large positional offsets between the X-ray centroid and the shear peak. Here we report on the XMM-Newton properties of the 17 X-ray clusters. The X-ray clusters display a wide range of luminosities and temperatures; the L_X − T_X relation we determine for the shear-associated X-ray clusters is consistent with X-ray cluster samples selected without regard to dynamical state, while it is inconsistent with self-similarity. For a subset of the sample, we measure X-ray masses using temperature as a proxy, and compare to weak lensing masses determined by the DLS team. The resulting mass comparison is consistent with equality. The X-ray and weak lensing masses show considerable intrinsic scatter (~48%), which is consistent with X-ray selected samples when their X-ray and weak lensing masses are independently determined. Some of the data presented herein were obtained at the W.M. Keck Observatory, which is operated as a scientific partnership among the California Institute of Technology, the University of California, and the National Aeronautics and Space Administration. The Observatory was made possible by the generous financial support of the W. M. Keck Foundation.
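    As an illustration of what fitting such a scaling relation involves, a power law L_X ∝ T^B becomes a straight line in log-log space; the sketch below uses ordinary least squares on invented data, whereas a real analysis would also model measurement errors and intrinsic scatter.

```python
# Sketch: ordinary least-squares power-law fit in log-log space on invented
# luminosities and temperatures for 17 clusters.
import numpy as np

rng = np.random.default_rng(10)
T = rng.uniform(1.0, 8.0, size=17)                  # keV
L = 1.2e43 * T**2.8 * rng.lognormal(0.0, 0.4, 17)   # erg/s, with scatter

slope, norm = np.polyfit(np.log10(T), np.log10(L), 1)
print(f"L_X ~ T^{slope:.2f}")   # slopes near 3 (vs the self-similar 2) are
                                # typical of observed samples
```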

  12. Phenetic Comparison of Prokaryotic Genomes Using k-mers

    PubMed Central

    Déraspe, Maxime; Raymond, Frédéric; Boisvert, Sébastien; Culley, Alexander; Roy, Paul H.; Laviolette, François; Corbeil, Jacques

    2017-01-01

    Bacterial genomics studies are getting more extensive and complex, requiring new ways to envision analyses. Using the Ray Surveyor software, we demonstrate that comparison of genomes based on their k-mer content allows reconstruction of phenetic trees without the need for prior data curation, such as core-genome alignment of a species. We validated the methodology using simulated genomes and previously published phylogenomic studies of Streptococcus pneumoniae and Pseudomonas aeruginosa. We also investigated the relationship of specific genetic determinants with bacterial population structures. By comparing clusters from the complete genomic content of a genome population with clusters from specific functional categories of genes, we can determine how the population structures are correlated. Indeed, the strain clustering based on a subset of k-mers allows determination of its similarity with the whole-genome clusters. We also applied this methodology to 42 species of bacteria to determine the correlational significance of five important bacterial genomic characteristics. For example, intrinsic resistance is more important in P. aeruginosa than in S. pneumoniae, and the former has increased correlation of its population structure with antibiotic resistance genes. The global view of the pangenome of bacteria also demonstrated the taxa-dependent interaction of population structure with antibiotic resistance, bacteriophage, plasmid, and mobile element k-mer data sets. PMID:28957508
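    The core of k-mer-based comparison can be sketched in a few lines: represent each genome as a k-mer set and compare sets with Jaccard distances, from which a phenetic tree can be built. The sequences are toys, k is unrealistically small, and reverse-complement canonicalization is omitted; this conveys the general idea, not the Ray Surveyor implementation.

```python
# Sketch: k-mer sets and Jaccard distances between toy "genomes".
from itertools import combinations

def kmers(seq, k):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard_distance(a, b):
    return 1.0 - len(a & b) / len(a | b)

genomes = {
    "g1": "ACGTACGTACGTACGTACGTACGTACGT",
    "g2": "ACGTACGTACGTACGTACGAACGTACGT",
    "g3": "TTTTACGTGGGGACGTCCCCACGTAAAA",
}
sets = {name: kmers(seq, k=8) for name, seq in genomes.items()}
for a, b in combinations(sets, 2):
    print(a, b, round(jaccard_distance(sets[a], sets[b]), 3))
# a distance matrix like this feeds a tree-building step (e.g. neighbor joining)
```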

  13. Relationship between solitary pulmonary nodule lung cancer and CT image features based on gradual clustering

    NASA Astrophysics Data System (ADS)

    Zhang, Weipeng

    2017-06-01

    The relationship between the medical characteristics of lung cancers and computer tomography (CT) images is explored so as to improve the early diagnosis rate of lung cancers. This research collected CT images of patients with solitary pulmonary nodule lung cancer and used a gradual clustering methodology to classify them. Preliminary classifications were made, followed by continuous modification and iteration to determine the optimal condensation point, until iteration stability was achieved. Reasonable classification results were obtained. The clustering results fell into three categories. The first type of patients was mostly female, with ages between 50 and 65 years; CT images of solitary pulmonary nodule lung cancer for this group contain complete lobulation and burr, with pleural indentation. The second type of patients was mostly male, with ages between 50 and 80 years; CT images of solitary pulmonary nodule lung cancer for this group contain complete lobulation and burr, but with no pleural indentation. The third type of patients was also mostly male, with ages between 50 and 80 years; CT images for this group showed no abnormalities. The application of the gradual clustering methodology can scientifically classify CT image features of patients with lung cancer in the initial lesion stage. These findings provide the basis for early detection and treatment of malignant lesions in patients with lung cancer.
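    Interpreting gradual clustering as an iterative refinement of condensation points, a sketch looks like the following; the choice of three clusters follows the abstract, while the feature vectors and the k-means-style update rule are assumptions for illustration.

```python
# Sketch: iterative refinement of condensation points until stability,
# on invented feature vectors standing in for encoded CT image features.
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(90, 3))                 # 90 patients x 3 encoded features
centers = X[rng.choice(len(X), 3, replace=False)]   # preliminary points

for _ in range(100):
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
    labels = d.argmin(axis=1)                # assign cases to nearest point
    new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                    else centers[j] for j in range(3)])
    if np.allclose(new, centers):            # iteration stability reached
        break
    centers = new
```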

  14. Hierarchical imputation of systematically and sporadically missing data: An approximate Bayesian approach using chained equations.

    PubMed

    Jolani, Shahab

    2018-03-01

    In health and medical sciences, multiple imputation (MI) is now becoming popular to obtain valid inferences in the presence of missing data. However, MI of clustered data, such as multicenter studies and individual participant data meta-analyses, requires advanced imputation routines that preserve the hierarchical structure of data. In clustered data, a specific challenge is the presence of systematically missing data, when a variable is completely missing in some clusters, and sporadically missing data, when it is partly missing in some clusters. Unfortunately, little is known about how to perform MI when both types of missing data occur simultaneously. We develop a new class of hierarchical imputation approaches based on chained equations methodology that simultaneously imputes systematically and sporadically missing data while allowing for arbitrary patterns of missingness among them. Here, we use a random effect imputation model and adopt a simplification over fully Bayesian techniques such as the Gibbs sampler to directly obtain draws of parameters within each step of the chained equations. We justify through theoretical arguments and extensive simulation studies that the proposed imputation methodology has good statistical properties in terms of bias and coverage rates of parameter estimates. An illustration is given in a case study with eight individual participant datasets. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
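    For contrast with the hierarchical method, plain single-level chained-equations imputation is readily available in scikit-learn, as sketched below; note this ignores the cluster structure and the systematic/sporadic distinction that the proposed approach is designed to handle.

```python
# Sketch: single-level chained-equations imputation with scikit-learn.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(8)
X = rng.normal(size=(200, 4))
X[rng.uniform(size=X.shape) < 0.15] = np.nan   # sporadically missing values

imputed = IterativeImputer(max_iter=10, random_state=0).fit_transform(X)
# For multiple imputation proper, repeat with sample_posterior=True and
# different random states, then pool estimates with Rubin's rules.
```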

  15. Review of Recent Methodological Developments in Group-Randomized Trials: Part 2-Analysis.

    PubMed

    Turner, Elizabeth L; Prague, Melanie; Gallis, John A; Li, Fan; Murray, David M

    2017-07-01

    In 2004, Murray et al. reviewed methodological developments in the design and analysis of group-randomized trials (GRTs). We have updated that review with developments in analysis of the past 13 years, with a companion article to focus on developments in design. We discuss developments in the topics of the earlier review (e.g., methods for parallel-arm GRTs, individually randomized group-treatment trials, and missing data) and in new topics, including methods to account for multiple-level clustering and alternative estimation methods (e.g., augmented generalized estimating equations, targeted maximum likelihood, and quadratic inference functions). In addition, we describe developments in analysis of alternative group designs (including stepped-wedge GRTs, network-randomized trials, and pseudocluster randomized trials), which require clustering to be accounted for in their design and analysis.

  16. Review of Recent Methodological Developments in Group-Randomized Trials: Part 1—Design

    PubMed Central

    Li, Fan; Gallis, John A.; Prague, Melanie; Murray, David M.

    2017-01-01

    In 2004, Murray et al. reviewed methodological developments in the design and analysis of group-randomized trials (GRTs). We have highlighted the developments of the past 13 years in design with a companion article to focus on developments in analysis. As a pair, these articles update the 2004 review. We have discussed developments in the topics of the earlier review (e.g., clustering, matching, and individually randomized group-treatment trials) and in new topics, including constrained randomization and a range of randomized designs that are alternatives to the standard parallel-arm GRT. These include the stepped-wedge GRT, the pseudocluster randomized trial, and the network-randomized GRT, which, like the parallel-arm GRT, require clustering to be accounted for in both their design and analysis. PMID:28426295

  17. Review of Recent Methodological Developments in Group-Randomized Trials: Part 1-Design.

    PubMed

    Turner, Elizabeth L; Li, Fan; Gallis, John A; Prague, Melanie; Murray, David M

    2017-06-01

    In 2004, Murray et al. reviewed methodological developments in the design and analysis of group-randomized trials (GRTs). We have highlighted the developments of the past 13 years in design with a companion article to focus on developments in analysis. As a pair, these articles update the 2004 review. We have discussed developments in the topics of the earlier review (e.g., clustering, matching, and individually randomized group-treatment trials) and in new topics, including constrained randomization and a range of randomized designs that are alternatives to the standard parallel-arm GRT. These include the stepped-wedge GRT, the pseudocluster randomized trial, and the network-randomized GRT, which, like the parallel-arm GRT, require clustering to be accounted for in both their design and analysis.

  18. Performance of small cluster surveys and the clustered LQAS design to estimate local-level vaccination coverage in Mali

    PubMed Central

    2012-01-01

    Background Estimation of vaccination coverage (VC) at the local level is essential to identify communities that may require additional support. Cluster surveys can be used in resource-poor settings, when population figures are inaccurate. To be feasible, cluster samples need to be small, without losing robustness of results. The clustered LQAS (CLQAS) approach has been proposed as an alternative, as smaller sample sizes are required. Methods We explored (i) the efficiency of cluster surveys of decreasing sample size through bootstrapping analysis and (ii) the performance of CLQAS under three alternative sampling plans to classify local VC, using data from a survey carried out in Mali after mass vaccination against meningococcal meningitis group A. Results VC estimates provided by a 10 × 15 cluster survey design were reasonably robust. We used them to classify health areas in three categories and guide mop-up activities: i) health areas not requiring supplemental activities; ii) health areas requiring additional vaccination; iii) health areas requiring further evaluation. As sample size decreased (from 10 × 15 to 10 × 3), standard errors of VC and intracluster correlation coefficient (ICC) estimates became increasingly unstable. Results of CLQAS simulations were not accurate for most health areas, with an overall risk of misclassification greater than 0.25 in one health area out of three. It was greater than 0.50 in one health area out of two under two of the three sampling plans. Conclusions Small sample cluster surveys (10 × 15) are acceptably robust for classification of VC at local level. We do not recommend the CLQAS method as currently formulated for evaluating vaccination programmes. PMID:23057445
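
    The first analysis in this record, bootstrapping a cluster survey as the within-cluster sample size shrinks, can be sketched directly. The data below are simulated (10 clusters of 15 children with cluster-level heterogeneity in coverage), and the resampling follows the usual two-stage cluster bootstrap, which may differ in detail from the authors' procedure.

    ```python
    # Two-stage cluster bootstrap of vaccination coverage (VC); simulated data.
    import numpy as np

    rng = np.random.default_rng(1)
    # 10 clusters x 15 children; 1 = vaccinated, cluster coverage varies.
    clusters = [rng.binomial(1, p, size=15) for p in rng.beta(8, 2, size=10)]

    def bootstrap_vc(clusters, per_cluster, n_boot=1000):
        """Resample clusters, then children within clusters; return VC draws."""
        estimates = []
        for _ in range(n_boot):
            idx = rng.integers(0, len(clusters), size=len(clusters))
            draws = [rng.choice(clusters[i], size=per_cluster, replace=True)
                     for i in idx]
            estimates.append(np.mean(np.concatenate(draws)))
        return np.array(estimates)

    for m in (15, 10, 5, 3):   # decreasing within-cluster sample size
        vc = bootstrap_vc(clusters, per_cluster=m)
        print(f"10 x {m:2d} design: VC = {vc.mean():.3f}, SE = {vc.std():.3f}")
    ```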

  19. Disease clusters, exact distributions of maxima, and P-values.

    PubMed

    Grimson, R C

    1993-10-01

    This paper presents combinatorial (exact) methods that are useful in the analysis of disease cluster data obtained from small environments, such as buildings and neighbourhoods. Maxwell-Boltzmann and Fermi-Dirac occupancy models are compared in terms of appropriateness of representation of disease incidence patterns (space and/or time) in these environments. The methods are illustrated by a statistical analysis of the incidence pattern of bone fractures in a setting wherein fracture clustering was alleged to be occurring. One of the methodological results derived in this paper is the exact distribution of the maximum cell frequency in occupancy models.
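
    The exact occupancy computation at the heart of this record can be reproduced for the Maxwell-Boltzmann case: with n distinguishable cases falling uniformly into k cells, the number of arrangements whose maximum cell count is at most m is n! times the coefficient of x^n in (sum_{i=0..m} x^i/i!)^k. A small sketch with hypothetical case counts:

    ```python
    # Exact distribution of the maximum cell frequency (Maxwell-Boltzmann occupancy).
    from fractions import Fraction
    from math import factorial

    def p_max_at_most(n, k, m):
        """P(max cell count <= m) for n distinguishable cases in k equiprobable cells."""
        base = [Fraction(1, factorial(i)) for i in range(m + 1)]
        poly = [Fraction(1)]                   # running product of truncated EGFs
        for _ in range(k):
            new = [Fraction(0)] * min(len(poly) + m, n + 1)
            for a, ca in enumerate(poly):
                for b, cb in enumerate(base):
                    if a + b <= n:
                        new[a + b] += ca * cb
            poly = new
        favourable = poly[n] * factorial(n) if n < len(poly) else 0
        return favourable / Fraction(k) ** n

    # P-value for observing a maximum of >= 5 cases in one of 6 cells, n = 12:
    n, k, m_obs = 12, 6, 5
    print(float(1 - p_max_at_most(n, k, m_obs - 1)))
    ```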

  20. Clustering behavior in microbial communities from acute endodontic infections.

    PubMed

    Montagner, Francisco; Jacinto, Rogério C; Signoretti, Fernanda G C; Sanches, Paula F; Gomes, Brenda P F A

    2012-02-01

    Acute endodontic infections harbor heterogeneous microbial communities in both the root canal (RC) system and apical tissues. Data comparing the microbial structure and diversity in endodontic infections in related ecosystems, such as RC with necrotic pulp and acute apical abscess (AAA), are scarce in the literature. The aim of this study was to examine the presence of selected endodontic pathogens in paired samples from necrotic RC and AAA using polymerase chain reaction (PCR) followed by the construction of cluster profiles. Paired samples of RC and AAA exudates were collected from 20 subjects and analyzed by PCR for the presence of selected strict and facultative anaerobic strains. The frequency of species was compared between the RC and the AAA samples. A stringent neighboring clustering algorithm was applied to investigate the existence of similar high-order groups of samples. A dendrogram was constructed to show the arrangement of the sample groups produced by the hierarchical clustering. All samples harbored bacterial DNA. Porphyromonas endodontalis, Prevotella nigrescens, Filifactor alocis, and Tannerella forsythia were frequently detected in both RC and AAA samples. The selected anaerobic species were distributed in diverse small bacterial consortia. The samples of RC and AAA that presented at least one of the targeted microorganisms were grouped in small clusters. Anaerobic species were frequently detected in acute endodontic infections, and heterogeneous microbial communities with low clustering behavior were observed in paired samples of RC and AAA. Copyright © 2012. Published by Elsevier Inc.
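
    The clustering step described here, grouping samples by their binary presence/absence profiles, is easy to sketch: profiles are compared with the Jaccard distance and joined hierarchically. The species columns and profiles below are hypothetical stand-ins for the PCR results.

    ```python
    # Hierarchical clustering of binary presence/absence profiles (hypothetical data).
    import numpy as np
    from scipy.cluster.hierarchy import fcluster, linkage
    from scipy.spatial.distance import pdist

    # Rows = samples (RC or AAA exudates); columns = target species detected (1/0).
    profiles = np.array([[1, 1, 0, 1],
                         [1, 1, 0, 0],
                         [0, 1, 1, 1],
                         [1, 0, 1, 1]])
    labels = ["RC-1", "AAA-1", "RC-2", "AAA-2"]

    tree = linkage(pdist(profiles, metric="jaccard"), method="average")
    groups = fcluster(tree, t=0.5, criterion="distance")
    print(dict(zip(labels, groups)))
    ```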

  1. Efficacy of a strategy for implementing a guideline for the control of cardiovascular risk in a primary healthcare setting: the SIRVA2 study a controlled, blinded community intervention trial randomised by clusters

    PubMed Central

    2011-01-01

    This work describes the methodology used to assess a strategy for implementing clinical practice guidelines (CPG) for cardiovascular risk control in a health area of Madrid. Background The results on clinical practice of introducing CPGs have been little studied in Spain. The strategy used to implement a CPG is known to influence its final use. Strategies based on the involvement of opinion leaders and that are easily executed appear to be among the most successful. Aim The main aim of the present work was to compare the effectiveness of two strategies for implementing a CPG designed to reduce cardiovascular risk in the primary healthcare setting, measured in terms of improvements in the recording of calculated cardiovascular risk or specific risk factors in patients' medical records, the control of cardiovascular risk factors, and the incidence of cardiovascular events. Methods This study involved a controlled, blinded community intervention in which the 21 health centres of the Number 2 Health Area of Madrid were randomly assigned by clusters to be involved in either a proposed CPG implementation strategy to reduce cardiovascular risk, or the normal dissemination strategy. The study subjects were patients ≥ 45 years of age whose health cards showed them to belong to the studied health area. The main variable examined was the proportion of patients whose medical histories included the calculation of their cardiovascular risk or that explicitly mentioned the presence of variables necessary for its calculation. The sample size was calculated for a comparison of proportions with alpha = 0.05 and beta = 0.20, and assuming that the intervention would lead to a 15% increase in the measured variables. Corrections were made for the design effect, assigning a sample size to each cluster proportional to the size of the population served by the corresponding health centre, and assuming losses of 20%. This demanded a final sample size of 620 patients. Data were analysed using summary measures for each cluster, both in making estimates and for hypothesis testing. Analysis of the variables was made on an intention-to-treat basis. Trial Registration ClinicalTrials.gov: NCT01270022 PMID:21504570
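
    The sample size logic in this record (a two-proportion comparison inflated by a design effect and by expected losses) follows a standard formula; a minimal sketch is below. The baseline proportion, cluster size, and ICC used in the example are illustrative assumptions, not values taken from the SIRVA2 protocol.

    ```python
    # Cluster-randomised sample size: two proportions, design effect, loss inflation.
    from scipy.stats import norm

    def cluster_trial_n(p0, p1, m, icc, alpha=0.05, power=0.80, loss=0.20):
        """Approximate per-arm sample size for a cluster-randomised comparison."""
        z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)
        p_bar = (p0 + p1) / 2
        n_ind = (z_a + z_b) ** 2 * 2 * p_bar * (1 - p_bar) / (p1 - p0) ** 2
        deff = 1 + (m - 1) * icc            # design effect, equal cluster sizes
        return n_ind * deff / (1 - loss)    # inflate for the expected losses

    # Illustrative: a 15-point rise from a 40% baseline, ~30 patients per centre,
    # ICC of 0.02, 20% losses (all assumed values).
    print(round(cluster_trial_n(0.40, 0.55, m=30, icc=0.02)))
    ```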

  2. Tobacco, Marijuana, and Alcohol Use in University Students: A Cluster Analysis

    PubMed Central

    Primack, Brian A.; Kim, Kevin H.; Shensa, Ariel; Sidani, Jaime E.; Barnett, Tracey E.; Switzer, Galen E.

    2012-01-01

    Objective Segmentation of populations may facilitate development of targeted substance abuse prevention programs. We aimed to partition a national sample of university students according to profiles based on substance use. Participants We used 2008–2009 data from the National College Health Assessment from the American College Health Association. Our sample consisted of 111,245 individuals from 158 institutions. Method We partitioned the sample using cluster analysis according to current substance use behaviors. We examined the association of cluster membership with individual and institutional characteristics. Results Cluster analysis yielded six distinct clusters. Three individual factors—gender, year in school, and fraternity/sorority membership—were the most strongly associated with cluster membership. Conclusions In a large sample of university students, we were able to identify six distinct patterns of substance abuse. It may be valuable to target specific populations of college-aged substance users based on individual factors. However, comprehensive intervention will require a multifaceted approach. PMID:22686360

  3. Spatial and temporal clustering of dengue virus transmission in Thai villages.

    PubMed

    Mammen, Mammen P; Pimgate, Chusak; Koenraadt, Constantianus J M; Rothman, Alan L; Aldstadt, Jared; Nisalak, Ananda; Jarman, Richard G; Jones, James W; Srikiatkhachorn, Anon; Ypil-Butac, Charity Ann; Getis, Arthur; Thammapalo, Suwich; Morrison, Amy C; Libraty, Daniel H; Green, Sharone; Scott, Thomas W

    2008-11-04

    Transmission of dengue viruses (DENV), the leading cause of arboviral disease worldwide, is known to vary through time and space, likely owing to a combination of factors related to the human host, virus, mosquito vector, and environment. An improved understanding of variation in transmission patterns is fundamental to conducting surveillance and implementing disease prevention strategies. To test the hypothesis that DENV transmission is spatially and temporally focal, we compared geographic and temporal characteristics within Thai villages where DENV are and are not being actively transmitted. Cluster investigations were conducted within 100 m of homes where febrile index children with (positive clusters) and without (negative clusters) acute dengue lived during two seasons of peak DENV transmission. Data on human infection and mosquito infection/density were examined to precisely (1) define the spatial and temporal dimensions of DENV transmission, (2) correlate these factors with variation in DENV transmission, and (3) determine the burden of inapparent and symptomatic infections. Among 556 village children enrolled as neighbors of 12 dengue-positive and 22 dengue-negative index cases, all 27 DENV infections (4.9% of enrollees) occurred in positive clusters (p < 0.01; attributable risk [AR] = 10.4 per 100; 95% confidence interval 1-19.8 per 100). In positive clusters, 12.4% of enrollees became infected in a 15-d period and DENV infections were aggregated centrally near homes of index cases. As only 1 of 217 pairs of serologic specimens tested in positive clusters revealed a recent DENV infection that occurred prior to cluster initiation, we attribute the observed DENV transmission subsequent to cluster investigation to recent DENV transmission activity. Of the 1,022 female adult Ae. aegypti collected, all eight (0.8%) dengue-infected mosquitoes came from houses in positive clusters; none from control clusters or schools. Distinguishing features between positive and negative clusters were greater availability of piped water in negative clusters (p < 0.01) and greater number of Ae. aegypti pupae per person in positive clusters (p = 0.04). During primarily DENV-4 transmission seasons, the ratio of inapparent to symptomatic infections was nearly 1:1 among child enrollees. Study limitations included inability to sample all children and mosquitoes within each cluster and our reliance on serologic rather than virologic evidence of interval infections in enrollees given restrictions on the frequency of blood collections in children. Our data reveal the remarkably focal nature of DENV transmission within a hyperendemic rural area of Thailand. These data suggest that active school-based dengue case detection prompting local spraying could contain recent virus introductions and reduce the longitudinal risk of virus spread within rural areas. Our results should prompt future cluster studies to explore how host immune and behavioral aspects may impact DENV transmission and prevention strategies. Cluster methodology could serve as a useful research tool for investigation of other temporally and spatially clustered infectious diseases.

  4. Spatial and Temporal Clustering of Dengue Virus Transmission in Thai Villages

    PubMed Central

    Mammen, Mammen P; Pimgate, Chusak; Koenraadt, Constantianus J. M; Rothman, Alan L; Aldstadt, Jared; Nisalak, Ananda; Jarman, Richard G; Jones, James W; Srikiatkhachorn, Anon; Ypil-Butac, Charity Ann; Getis, Arthur; Thammapalo, Suwich; Morrison, Amy C; Libraty, Daniel H; Green, Sharone; Scott, Thomas W

    2008-01-01

    Background Transmission of dengue viruses (DENV), the leading cause of arboviral disease worldwide, is known to vary through time and space, likely owing to a combination of factors related to the human host, virus, mosquito vector, and environment. An improved understanding of variation in transmission patterns is fundamental to conducting surveillance and implementing disease prevention strategies. To test the hypothesis that DENV transmission is spatially and temporally focal, we compared geographic and temporal characteristics within Thai villages where DENV are and are not being actively transmitted. Methods and Findings Cluster investigations were conducted within 100 m of homes where febrile index children with (positive clusters) and without (negative clusters) acute dengue lived during two seasons of peak DENV transmission. Data on human infection and mosquito infection/density were examined to precisely (1) define the spatial and temporal dimensions of DENV transmission, (2) correlate these factors with variation in DENV transmission, and (3) determine the burden of inapparent and symptomatic infections. Among 556 village children enrolled as neighbors of 12 dengue-positive and 22 dengue-negative index cases, all 27 DENV infections (4.9% of enrollees) occurred in positive clusters (p < 0.01; attributable risk [AR] = 10.4 per 100; 95% confidence interval 1–19.8 per 100). In positive clusters, 12.4% of enrollees became infected in a 15-d period and DENV infections were aggregated centrally near homes of index cases. As only 1 of 217 pairs of serologic specimens tested in positive clusters revealed a recent DENV infection that occurred prior to cluster initiation, we attribute the observed DENV transmission subsequent to cluster investigation to recent DENV transmission activity. Of the 1,022 female adult Ae. aegypti collected, all eight (0.8%) dengue-infected mosquitoes came from houses in positive clusters; none from control clusters or schools. Distinguishing features between positive and negative clusters were greater availability of piped water in negative clusters (p < 0.01) and greater number of Ae. aegypti pupae per person in positive clusters (p = 0.04). During primarily DENV-4 transmission seasons, the ratio of inapparent to symptomatic infections was nearly 1:1 among child enrollees. Study limitations included inability to sample all children and mosquitoes within each cluster and our reliance on serologic rather than virologic evidence of interval infections in enrollees given restrictions on the frequency of blood collections in children. Conclusions Our data reveal the remarkably focal nature of DENV transmission within a hyperendemic rural area of Thailand. These data suggest that active school-based dengue case detection prompting local spraying could contain recent virus introductions and reduce the longitudinal risk of virus spread within rural areas. Our results should prompt future cluster studies to explore how host immune and behavioral aspects may impact DENV transmission and prevention strategies. Cluster methodology could serve as a useful research tool for investigation of other temporally and spatially clustered infectious diseases. PMID:18986209

  5. Quantifying innovation in surgery.

    PubMed

    Hughes-Hallett, Archie; Mayer, Erik K; Marcus, Hani J; Cundy, Thomas P; Pratt, Philip J; Parston, Greg; Vale, Justin A; Darzi, Ara W

    2014-08-01

    The objectives of this study were to assess the applicability of patents and publications as metrics of surgical technology and innovation; to evaluate the historical relationship between patents and publications; and to develop a methodology that can be used to determine the rate of innovation growth in any given health care technology. The study of health care innovation represents an emerging academic field, yet it is limited by a lack of valid scientific methods for quantitative analysis. This article explores and cross-validates 2 innovation metrics using surgical technology as an exemplar. Electronic patenting databases and the MEDLINE database were searched between 1980 and 2010 for "surgeon" OR "surgical" OR "surgery." Resulting patent codes were grouped into technology clusters. Growth curves were plotted for these technology clusters to establish the rate and characteristics of growth. The initial search retrieved 52,046 patents and 1,801,075 publications. The top-performing technology cluster of the last 30 years was minimally invasive surgery. Robotic surgery, surgical staplers, and image guidance were the most emergent technology clusters. When examining the growth curves for these clusters, they were found to follow an S-shaped pattern of growth, with the emergent technologies lying on the exponential phases of their respective growth curves. In addition, publication and patent counts were closely correlated in areas of technology expansion. This article demonstrates the utility of publicly available patent and publication data to quantify innovations within surgical technology and proposes a novel methodology for assessing and forecasting areas of technological innovation.
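
    The S-shaped growth described in this record is conventionally modelled with a logistic curve, and fitting one to yearly counts shows directly whether a technology cluster is still in its exponential phase (midpoint in the future) or maturing. A sketch with simulated counts in place of the patent data:

    ```python
    # Fitting a logistic (S-shaped) growth curve to yearly counts; data simulated.
    import numpy as np
    from scipy.optimize import curve_fit

    def logistic(t, K, r, t0):
        """Logistic growth: capacity K, growth rate r, inflection year t0."""
        return K / (1 + np.exp(-r * (t - t0)))

    years = np.arange(1980, 2011, dtype=float)
    rng = np.random.default_rng(2)
    counts = logistic(years, 1200, 0.35, 2000) + rng.normal(0, 25, years.size)

    (K, r, t0), _ = curve_fit(logistic, years, counts,
                              p0=(counts.max(), 0.1, years.mean()))
    print(f"capacity ~ {K:.0f}, rate ~ {r:.2f}, inflection year ~ {t0:.0f}")
    ```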

  6. An analysis of the impact of pre-analytical factors on the urine proteome: Sample processing time, temperature, and proteolysis.

    PubMed

    Hepburn, Sophie; Cairns, David A; Jackson, David; Craven, Rachel A; Riley, Beverley; Hutchinson, Michelle; Wood, Steven; Smith, Matthew Welberry; Thompson, Douglas; Banks, Rosamonde E

    2015-06-01

    We have examined the impact of sample processing time delay, temperature, and the addition of protease inhibitors (PIs) on the urinary proteome and peptidome, an important aspect of biomarker studies. Ten urine samples from patients with varying pathologies were each divided and PIs added to one-half, with aliquots of each then processed and frozen immediately, or after a delay of 6 h at 4°C or room temperature (20-22°C), effectively yielding 60 samples in total. Samples were then analyzed by 2D-PAGE, SELDI-TOF-MS, and immunoassay. Interindividual variability in profiles was the dominant feature in all analyses. Minimal changes were observed by 2D-PAGE as a result of delay in processing, temperature, or PIs and no changes were seen in IgG, albumin, β2-microglobulin, or α1-microglobulin measured by immunoassay. Analysis of peptides showed clustering of some samples by presence/absence of PIs but the extent was very patient-dependent with most samples showing minimal effects. The extent of processing-induced changes and the benefit of PI addition are patient- and sample-dependent. A consistent processing methodology is essential within a study to avoid any confounding of the results. © 2014 The Authors PROTEOMICS Clinical Applications Published by Wiley-VCH Verlag GmbH & Co. KGaA.

  7. Autogrid-based clustering of kinases: selection of representative conformations for docking purposes.

    PubMed

    Marzaro, Giovanni; Ferrarese, Alessandro; Chilin, Adriana

    2014-08-01

    The selection of the most appropriate protein conformation is a crucial aspect in molecular docking experiments. In order to reduce the errors arising from the use of a single protein conformation, several authors suggest the use of several three-dimensional structures for the target. However, selecting the most appropriate protein conformations remains a challenging goal. Protein 3D-structure selection is mainly performed by computing pairwise root-mean-square deviation (RMSD) values, followed by hierarchical clustering. Herein we report an alternative strategy, based on the computation of only two atom-affinity maps for each protein conformation, followed by multivariate analysis and hierarchical clustering. This methodology was applied to seven different kinases of pharmaceutical interest. The comparison with the classical RMSD-based strategy was based on cross-docking of co-crystallized ligands. In the case of the epidermal growth factor receptor kinase, the docking performance on 220 known ligands was also evaluated, followed by 3D-QSAR studies. In all cases, the proposed methodology outperformed the RMSD-based one.

  8. Clustering by well-being in workplace social networks: Homophily and social contagion.

    PubMed

    Chancellor, Joseph; Layous, Kristin; Margolis, Seth; Lyubomirsky, Sonja

    2017-12-01

    Social interaction among employees is crucial at both an organizational and individual level. Demonstrating the value of recent methodological advances, 2 studies conducted in 2 workplaces and 2 countries sought to answer the following questions: (a) Do coworkers interact more with coworkers who have similar well-being? and, if yes, (b) what are the processes by which such affiliation occurs? Affiliation was assessed via 2 methodologies: a commonly used self-report measure (i.e., mutual nominations by coworkers) complemented by a behavioral measure (i.e., sociometric badges that track physical proximity and social interaction). We found that individuals who share similar levels of well-being (e.g., positive affect, life satisfaction, need satisfaction, and job satisfaction) were more likely to socialize with one another. Furthermore, time-lagged analyses suggested that clustering in need satisfaction arises from mutual attraction (homophily), whereas clustering in job satisfaction and organizational prosocial behavior results from emotional contagion. These results suggest ways in which organizations can physically and socially improve their workplace. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  9. X-ray emission from a complete sample of Abell clusters of galaxies

    NASA Astrophysics Data System (ADS)

    Briel, Ulrich G.; Henry, J. Patrick

    1993-11-01

    The ROSAT All-Sky Survey (RASS) is used to investigate the X-ray properties of a complete sample of Abell clusters with measured redshifts and accurate positions. The sample comprises the 145 clusters within a 561 square degree region at high galactic latitude. The mean redshift is 0.17. This sample is especially well suited to be studied within the RASS since the mean exposure time is higher than average and the mean galactic column density is very low. These together produce a flux limit of about 4.2 × 10^-13 erg/sq cm/s in the 0.5 to 2.5 keV energy band. Sixty-six (46%) individual clusters are detected at a significance level higher than 99.7%, of which 7 could be chance coincidences of background or foreground sources. At redshifts greater than 0.3, six clusters out of seven (86%) are detected at the same significance level. The detected objects show a clear X-ray luminosity versus galaxy count relation with a dispersion consistent with other external estimates of the error in the counts. By analyzing the excess of positive fluctuations of the X-ray flux at the cluster positions, compared with the fluctuations of randomly drawn background fields, it is possible to extend these results below the nominal flux limit. We find 80% of richness R ≥ 0 and 86% of R ≥ 1 clusters are X-ray emitters with fluxes above 1 × 10^-13 erg/sq cm/s. Nearly 90% of the clusters meeting the requirements to be in Abell's statistical sample emit above the same level. We therefore conclude that almost all Abell clusters are real clusters and the Abell catalog is not strongly contaminated by projection effects. We use the Kaplan-Meier product limit estimator to calculate the cumulative X-ray luminosity function. We show that the shapes of the luminosity functions are similar for different richness classes, but the characteristic luminosities of richness 2 clusters are about twice those of richness 1 clusters, which are in turn about twice those of richness 0 clusters. This result is another manifestation of the luminosity versus richness relation for Abell clusters.

  10. Copula based flexible modeling of associations between clustered event times.

    PubMed

    Geerdens, Candida; Claeskens, Gerda; Janssen, Paul

    2016-07-01

    Multivariate survival data are characterized by the presence of correlation between event times within the same cluster. First, we build multi-dimensional copulas with flexible and possibly symmetric dependence structures for such data. In particular, clustered right-censored survival data are modeled using mixtures of max-infinitely divisible bivariate copulas. Second, these copulas are fitted by a likelihood approach in which the many copula derivatives present in the likelihood are approximated by finite differences. Third, we formulate conditions for clustered right-censored survival data under which an information criterion for model selection is either weakly consistent or consistent. Several of the familiar selection criteria are included. A set of four-dimensional data on time-to-mastitis is used to demonstrate the developed methodology.

  11. Approximate solution of coupled cluster equations: application to the coupled cluster doubles method and non-covalent interacting systems.

    PubMed

    Smiga, Szymon; Fabiano, Eduardo

    2017-11-15

    We have developed a simplified coupled cluster (SCC) methodology, using the basic idea of scaled MP2 methods. The scheme has been applied to the coupled cluster doubles equations and implemented in three different non-iterative variants. This new method (especially the SCCD[3] variant, which utilizes a spin-resolved formalism) has been found to be very efficient and to yield an accurate approximation of the reference CCD results for both total and interaction energies of different atoms and molecules. Furthermore, we demonstrate that the equations determining the scaling coefficients for the SCCD[3] approach can generate non-empirical SCS-MP2 scaling coefficients which are in good agreement with previous theoretical investigations.

  12. Combinatorial Clustering and Its Application to 3D Polygonal Traffic Sign Reconstruction From Multiple Images

    NASA Astrophysics Data System (ADS)

    Vallet, B.; Soheilian, B.; Brédif, M.

    2014-08-01

    The 3D reconstruction of similar 3D objects detected in 2D faces a major issue when it comes to grouping the 2D detections into clusters to be used to reconstruct the individual 3D objects. Simple clustering heuristics fail as soon as similar objects are close. This paper formulates a framework to use the geometric quality of the reconstruction as a hint to do a proper clustering. We present a methodology to solve the resulting combinatorial optimization problem with some simplifications and approximations in order to make it tractable. The proposed method is applied to the reconstruction of 3D traffic signs from their 2D detections to demonstrate its capacity to solve ambiguities.

  13. New PLS analysis approach to wine volatile compounds characterization by near infrared spectroscopy (NIR).

    PubMed

    Genisheva, Z; Quintelas, C; Mesquita, D P; Ferreira, E C; Oliveira, J M; Amaral, A L

    2018-04-25

    This work aims to explore the potential of near infrared (NIR) spectroscopy to quantify volatile compounds in Vinho Verde wines, commonly determined by gas chromatography. For this purpose, 105 Vinho Verde wine samples were analyzed using Fourier transform near infrared (FT-NIR) transmission spectroscopy in the range of 5435 cm⁻¹ to 6357 cm⁻¹. Boxplot and principal components analysis (PCA) were performed for cluster identification and outlier removal. A partial least squares (PLS) regression was then applied to develop the calibration models, by a new iterative approach. The predictive ability of the models was confirmed by an external validation procedure with an independent sample set. The results were quite good, with coefficients of determination (R²) varying from 0.94 to 0.97. The current methodology, using NIR spectroscopy and chemometrics, can be seen as a promising rapid tool to determine volatile compounds in Vinho Verde wines. Copyright © 2017 Elsevier Ltd. All rights reserved.
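
    The calibration/validation workflow described here (PLS regression trained on one sample set and checked on an independent one) maps directly onto standard chemometrics tooling; a minimal sketch with simulated spectra and concentrations in place of the FT-NIR data:

    ```python
    # PLS calibration with external validation; spectra and targets are simulated.
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.metrics import r2_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(3)
    X = rng.normal(size=(105, 200))        # 105 samples x 200 spectral variables
    y = X[:, :5].sum(axis=1) + rng.normal(0, 0.1, 105)  # hypothetical concentration

    X_cal, X_val, y_cal, y_val = train_test_split(X, y, test_size=0.3,
                                                  random_state=0)
    pls = PLSRegression(n_components=5).fit(X_cal, y_cal)
    print("external-validation R2:", r2_score(y_val, pls.predict(X_val).ravel()))
    ```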

  14. Spatially explicit population estimates for black bears based on cluster sampling

    USGS Publications Warehouse

    Humm, J.; McCown, J. Walter; Scheick, B.K.; Clark, Joseph D.

    2017-01-01

    We estimated abundance and density of the 5 major black bear (Ursus americanus) subpopulations (i.e., Eglin, Apalachicola, Osceola, Ocala-St. Johns, Big Cypress) in Florida, USA, with spatially explicit capture-mark-recapture (SCR) by extracting DNA from hair samples collected at barbed-wire hair sampling sites. We employed a clustered sampling configuration with sampling sites arranged in 3 × 3 clusters, with sites spaced 2 km apart within each cluster and cluster centers spaced 16 km apart (center to center). We surveyed all 5 subpopulations, encompassing 38,960 km², during 2014 and 2015. Several landscape variables, most associated with forest cover, helped refine density estimates for the 5 subpopulations we sampled. Detection probabilities were affected by site-specific behavioral responses coupled with individual capture heterogeneity associated with sex. Model-averaged bear population estimates ranged from 120 bears (95% CI = 59–276), or a mean 0.025 bears/km² (95% CI = 0.011–0.44), for the Eglin subpopulation to 1,198 bears (95% CI = 949–1,537), or 0.127 bears/km² (95% CI = 0.101–0.163), for the Ocala-St. Johns subpopulation. The total population estimate for our 5 study areas was 3,916 bears (95% CI = 2,914–5,451). The clustered sampling method coupled with information on land cover was efficient and allowed us to estimate abundance across extensive areas that would not have been possible otherwise. Clustered sampling combined with spatially explicit capture-recapture methods has the potential to provide rigorous population estimates for a wide array of species that are extensive and heterogeneous in their distribution.

  15. Tracing Large Scale Structure with a Redshift Survey of Rich Clusters of Galaxies

    NASA Astrophysics Data System (ADS)

    Batuski, D.; Slinglend, K.; Haase, S.; Hill, J. M.

    1993-12-01

    Rich clusters of galaxies from Abell's catalog show evidence of structure on scales of 100 Mpc and hold promise of confirming the existence of structure in the more immediate universe on scales corresponding to COBE results (i.e., on the order of 10% or more of the horizon size of the universe). However, most Abell clusters do not as yet have measured redshifts (or, in the case of most low redshift clusters, have only one or two galaxies measured), so present knowledge of their three-dimensional distribution has quite large uncertainties. The shortage of measured redshifts for these clusters may also mask a problem of projection effects corrupting the membership counts for the clusters, perhaps even to the point of spurious identifications of some of the clusters themselves. Our approach in this effort has been to use the MX multifiber spectrometer to measure redshifts of at least ten galaxies in each of about 80 Abell cluster fields with richness class R ≥ 1 and mag10 ≤ 16.8. This work will result in a somewhat deeper, much more complete (and reliable) sample of positions of rich clusters. Our primary use for the sample is for two-point correlation and other studies of the large scale structure traced by these clusters. We are also obtaining enough redshifts per cluster so that a much better sample of reliable cluster velocity dispersions will be available for other studies of cluster properties. To date, we have collected such data for 40 clusters, and for most of them, we have seven or more cluster members with redshifts, allowing for reliable velocity dispersion calculations. Velocity histograms for several interesting cluster fields are presented, along with summary tables of cluster redshift results. Also, with 10 or more redshifts in most of our cluster fields (30′ square, just about an 'Abell diameter' at z ~ 0.1), we have investigated the extent of projection effects within the Abell catalog in an effort to quantify and understand how this may affect the Abell sample.

  16. The Mass Function in h+χ Persei

    NASA Astrophysics Data System (ADS)

    Bragg, Ann; Kenyon, Scott

    2000-08-01

    Knowledge of the stellar initial mass function (IMF) is critical to understanding star formation and galaxy evolution. Past studies of the IMF in open clusters have primarily used luminosity functions to determine mass functions, frequently in relatively sparse clusters. Our goal with this project is to derive a reliable, well-sampled IMF for a pair of very dense young clusters (h+χ Persei) with ages of 1-2 × 10^7 yr (e.g., Vogt A&A 11:359), where stellar evolution theory is robust. We will construct the HR diagram using both photometry and spectral types to derive more accurate stellar masses and ages than are possible using photometry alone. Results from the two clusters will be compared to examine the universality of the IMF. We currently have a spectroscopic sample covering an area within 9 arc-minutes of the center of each cluster taken with the FAST Spectrograph. The sample is complete to V=15.4 and contains ~ 1000 stars. We request 2 nights at WIYN/HYDRA to extend this sample to deeper magnitudes, allowing us to determine the IMF of the clusters to a lower limiting mass and to search for a pre-main sequence, theoretically predicted to be present for clusters of this age. Note that both clusters are contained within a single HYDRA field.

  17. Using preoperative unsupervised cluster analysis of chronic rhinosinusitis to inform patient decision and endoscopic sinus surgery outcome.

    PubMed

    Adnane, Choaib; Adouly, Taoufik; Khallouk, Amine; Rouadi, Sami; Abada, Redallah; Roubal, Mohamed; Mahtar, Mohamed

    2017-02-01

    The purpose of this study is to use unsupervised cluster methodology to identify phenotype and mucosal eosinophilia endotype subgroups of patients with medically refractory chronic rhinosinusitis (CRS), and to evaluate the difference in quality of life (QOL) outcomes after endoscopic sinus surgery (ESS) between these clusters for better surgical case selection. A prospective cohort study included 131 patients with medically refractory CRS who elected ESS. The Sino-Nasal Outcome Test (SNOT-22) was used to evaluate QOL before and 12 months after surgery. An unsupervised two-step clustering method was performed. One hundred and thirteen subjects were retained in this study: 46 patients with CRS without nasal polyps and 67 patients with nasal polyps. Nasal polyps, gender, mucosal eosinophilia profile, and prior sinus surgery were the most discriminating factors in the generated clusters. Three clusters were identified. A significant clinical improvement, with a reduction of SNOT-22 scores, was observed in all clusters 12 months after surgery. There was a significant difference in QOL outcomes between clusters; cluster 1 had the worst QOL improvement after ESS in comparison with clusters 2 and 3. All patients in cluster 1 presented CRS with nasal polyps (CRSwNP) with the highest mucosal eosinophilia endotype. The clustering method is able to classify CRS phenotypes and endotypes with different associated surgical outcomes.

  18. Recognition of genetically modified product based on affinity propagation clustering and terahertz spectroscopy

    NASA Astrophysics Data System (ADS)

    Liu, Jianjun; Kan, Jianquan

    2018-04-01

    In this paper, a new method for identifying genetically modified material from terahertz spectra is proposed, combining a support vector machine (SVM) with affinity propagation clustering. The affinity propagation algorithm clusters and labels the unlabeled training samples, and the SVM training data are continuously updated during the iterative process. Because the identification model is established without manually labeling the training samples, the error caused by human labeling is reduced and the identification accuracy of the model is greatly improved.
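
    The scheme as described reduces to: cluster the unlabeled spectra, treat the cluster assignments as pseudo-labels, and train the SVM on them. The sketch below shows a single pass of that idea with simulated spectra; the paper's iterative updating of the SVM training set is not reproduced.

    ```python
    # Affinity propagation pseudo-labelling followed by SVM training; data simulated.
    import numpy as np
    from sklearn.cluster import AffinityPropagation
    from sklearn.svm import SVC

    rng = np.random.default_rng(4)
    # Two simulated groups of "terahertz spectra" (30 samples x 50 points each).
    spectra = np.vstack([rng.normal(0, 1, (30, 50)), rng.normal(3, 1, (30, 50))])

    ap = AffinityPropagation(random_state=0).fit(spectra)
    pseudo_labels = ap.labels_            # cluster ids stand in for manual labels
    svm = SVC(kernel="rbf").fit(spectra, pseudo_labels)
    print("accuracy on pseudo-labels:", svm.score(spectra, pseudo_labels))
    ```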

  19. Open star clusters and Galactic structure

    NASA Astrophysics Data System (ADS)

    Joshi, Yogesh C.

    2018-04-01

    In order to understand Galactic structure, we perform a statistical analysis of the distribution of various cluster parameters based on an almost complete sample of the Galactic open clusters available to date. The geometrical and physical characteristics of a large number of open clusters given in the MWSC catalogue are used to study the spatial distribution of clusters in the Galaxy and to determine the scale height, solar offset, local mass density, and distribution of reddening material in the solar neighbourhood. We also explore the mass-radius and mass-age relations in Galactic open star clusters. We find that the estimated parameters of the Galactic disk are largely influenced by the choice of cluster sample.

  20. A Novel Information-Theoretic Approach for Variable Clustering and Predictive Modeling Using Dirichlet Process Mixtures

    PubMed Central

    Chen, Yun; Yang, Hui

    2016-01-01

    In the era of big data, there is increasing interest in clustering variables for the minimization of data redundancy and the maximization of variable relevancy. Existing clustering methods, however, depend on nontrivial assumptions about the data structure. Note that nonlinear interdependence among variables poses significant challenges to the traditional framework of predictive modeling. In the present work, we reformulate the problem of variable clustering from an information theoretic perspective that does not require the assumption of data structure for the identification of nonlinear interdependence among variables. Specifically, we propose the use of mutual information to characterize and measure nonlinear correlation structures among variables. Further, we develop Dirichlet process (DP) models to cluster variables based on the mutual-information measures among variables. Finally, orthonormalized variables in each cluster are integrated with a group elastic-net model to improve the performance of predictive modeling. Both simulation and real-world case studies showed that the proposed methodology not only effectively reveals the nonlinear interdependence structures among variables but also outperforms traditional variable clustering algorithms such as hierarchical clustering. PMID:27966581
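
    The first step of this approach, a pairwise mutual-information matrix among variables, is simple to sketch with histogram-based MI estimates; for brevity the sketch below substitutes plain hierarchical clustering for the paper's Dirichlet process mixture, so it illustrates the measure rather than the full model.

    ```python
    # Pairwise mutual information among variables, then clustering on it (sketch).
    from itertools import combinations
    import numpy as np
    from scipy.cluster.hierarchy import fcluster, linkage
    from sklearn.metrics import mutual_info_score

    rng = np.random.default_rng(5)
    n, p = 500, 6
    X = rng.normal(size=(n, p))
    X[:, 1] = np.sin(X[:, 0]) + rng.normal(0, 0.1, n)   # nonlinear dependence

    def mi(x, y, bins=10):
        """Histogram-based mutual information between two continuous variables."""
        cx = np.digitize(x, np.histogram_bin_edges(x, bins))
        cy = np.digitize(y, np.histogram_bin_edges(y, bins))
        return mutual_info_score(cx, cy)

    pairs = list(combinations(range(p), 2))
    mis = np.array([mi(X[:, i], X[:, j]) for i, j in pairs])
    dists = mis.max() - mis                 # high MI -> small distance
    labels = fcluster(linkage(dists, "average"), t=2, criterion="maxclust")
    print(labels)                           # variables 0 and 1 should co-cluster
    ```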

  1. A Novel Information-Theoretic Approach for Variable Clustering and Predictive Modeling Using Dirichlet Process Mixtures.

    PubMed

    Chen, Yun; Yang, Hui

    2016-12-14

    In the era of big data, there is increasing interest in clustering variables for the minimization of data redundancy and the maximization of variable relevancy. Existing clustering methods, however, depend on nontrivial assumptions about the data structure. Note that nonlinear interdependence among variables poses significant challenges to the traditional framework of predictive modeling. In the present work, we reformulate the problem of variable clustering from an information theoretic perspective that does not require the assumption of data structure for the identification of nonlinear interdependence among variables. Specifically, we propose the use of mutual information to characterize and measure nonlinear correlation structures among variables. Further, we develop Dirichlet process (DP) models to cluster variables based on the mutual-information measures among variables. Finally, orthonormalized variables in each cluster are integrated with a group elastic-net model to improve the performance of predictive modeling. Both simulation and real-world case studies showed that the proposed methodology not only effectively reveals the nonlinear interdependence structures among variables but also outperforms traditional variable clustering algorithms such as hierarchical clustering.

  2. Declustering of clustered preferential sampling for histogram and semivariogram inference

    USGS Publications Warehouse

    Olea, R.A.

    2007-01-01

    Measurements of attributes obtained more as a consequence of business ventures than sampling design frequently result in samplings that are preferential both in location and value, typically in the form of clusters along the pay. Preferential sampling requires preprocessing for the purpose of properly inferring characteristics of the parent population, such as the cumulative distribution and the semivariogram. Consideration of the distance to the nearest neighbor allows preparation of resampled sets that produce comparable results to those from previously proposed methods. Clustered sampling of size 140, taken from an exhaustive sampling, is employed to illustrate this approach. © International Association for Mathematical Geology 2007.
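
    The nearest-neighbour idea in this record can be sketched as a weighting scheme: each sample is weighted by its distance to its nearest neighbour, so densely clustered samples along the pay contribute less to histogram statistics. The coordinates and values below are simulated, and the weighting is one simple variant of declustering, not necessarily the author's exact procedure.

    ```python
    # Nearest-neighbour declustering weights for a preferentially clustered sampling.
    import numpy as np
    from scipy.spatial import cKDTree

    rng = np.random.default_rng(6)
    sparse = rng.uniform(0, 100, size=(40, 2))                  # background sites
    cluster = rng.normal(loc=(70, 70), scale=2, size=(100, 2))  # clustered pay zone
    xy = np.vstack([sparse, cluster])
    values = rng.normal(10, 2, len(xy))
    values[40:] += 5                      # preferential sampling of high values

    d_nn, _ = cKDTree(xy).query(xy, k=2)  # k=2: first neighbour is the point itself
    weights = d_nn[:, 1] / d_nn[:, 1].sum()
    print("naive mean:      ", values.mean())
    print("declustered mean:", np.average(values, weights=weights))
    ```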

  3. ALCHEMY: a reliable method for automated SNP genotype calling for small batch sizes and highly homozygous populations

    PubMed Central

    Wright, Mark H.; Tung, Chih-Wei; Zhao, Keyan; Reynolds, Andy; McCouch, Susan R.; Bustamante, Carlos D.

    2010-01-01

    Motivation: The development of new high-throughput genotyping products requires a significant investment in testing and training samples to evaluate and optimize the product before it can be used reliably on new samples. One reason for this is that current methods for automated calling of genotypes are based on clustering approaches, which require a large number of samples to be analyzed simultaneously, or an extensive training dataset to seed clusters. In systems where inbred samples are of primary interest, current clustering approaches perform poorly due to the inability to clearly identify a heterozygote cluster. Results: As part of the development of two custom single nucleotide polymorphism genotyping products for Oryza sativa (domestic rice), we have developed a new genotype calling algorithm called ‘ALCHEMY’ based on statistical modeling of the raw intensity data rather than modelless clustering. A novel feature of the model is the ability to estimate and incorporate inbreeding information on a per sample basis, allowing accurate genotyping of both inbred and heterozygous samples even when analyzed simultaneously. Since clustering is not used explicitly, ALCHEMY performs well on small sample sizes with accuracy exceeding 99% with as few as 18 samples. Availability: ALCHEMY is available for both commercial and academic use free of charge and distributed under the GNU General Public License at http://alchemy.sourceforge.net/ Contact: mhw6@cornell.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20926420

  4. Clusternomics: Integrative context-dependent clustering for heterogeneous datasets

    PubMed Central

    Wernisch, Lorenz

    2017-01-01

    Integrative clustering is used to identify groups of samples by jointly analysing multiple datasets describing the same set of biological samples, such as gene expression, copy number, methylation, etc. Most existing algorithms for integrative clustering assume that there is a shared consistent set of clusters across all datasets, and most of the data samples follow this structure. However, in practice, the structure across heterogeneous datasets can be more varied, with clusters being joined in some datasets and separated in others. In this paper, we present a probabilistic clustering method to identify groups across datasets that do not share the same cluster structure. The proposed algorithm, Clusternomics, identifies groups of samples that share their global behaviour across heterogeneous datasets. The algorithm models clusters on the level of individual datasets, while also extracting global structure that arises from the local cluster assignments. Clusters on both the local and the global level are modelled using a hierarchical Dirichlet mixture model to identify structure on both levels. We evaluated the model both on simulated and on real-world datasets. The simulated data exemplifies datasets with varying degrees of common structure. In such a setting Clusternomics outperforms existing algorithms for integrative and consensus clustering. In a real-world application, we used the algorithm for cancer subtyping, identifying subtypes of cancer from heterogeneous datasets. We applied the algorithm to the TCGA breast cancer dataset, integrating gene expression, miRNA expression, DNA methylation and proteomics. The algorithm extracted clinically meaningful clusters with significantly different survival probabilities. We also evaluated the algorithm on lung and kidney cancer TCGA datasets with high dimensionality, again showing clinically significant results and scalability of the algorithm. PMID:29036190

  5. Clusternomics: Integrative context-dependent clustering for heterogeneous datasets.

    PubMed

    Gabasova, Evelina; Reid, John; Wernisch, Lorenz

    2017-10-01

    Integrative clustering is used to identify groups of samples by jointly analysing multiple datasets describing the same set of biological samples, such as gene expression, copy number, methylation, etc. Most existing algorithms for integrative clustering assume that there is a shared consistent set of clusters across all datasets, and most of the data samples follow this structure. However, in practice, the structure across heterogeneous datasets can be more varied, with clusters being joined in some datasets and separated in others. In this paper, we present a probabilistic clustering method to identify groups across datasets that do not share the same cluster structure. The proposed algorithm, Clusternomics, identifies groups of samples that share their global behaviour across heterogeneous datasets. The algorithm models clusters on the level of individual datasets, while also extracting global structure that arises from the local cluster assignments. Clusters on both the local and the global level are modelled using a hierarchical Dirichlet mixture model to identify structure on both levels. We evaluated the model both on simulated and on real-world datasets. The simulated data exemplifies datasets with varying degrees of common structure. In such a setting Clusternomics outperforms existing algorithms for integrative and consensus clustering. In a real-world application, we used the algorithm for cancer subtyping, identifying subtypes of cancer from heterogeneous datasets. We applied the algorithm to the TCGA breast cancer dataset, integrating gene expression, miRNA expression, DNA methylation and proteomics. The algorithm extracted clinically meaningful clusters with significantly different survival probabilities. We also evaluated the algorithm on lung and kidney cancer TCGA datasets with high dimensionality, again showing clinically significant results and scalability of the algorithm.

  6. Analysis methods for Thematic Mapper data of urban regions

    NASA Technical Reports Server (NTRS)

    Wang, S. C.

    1984-01-01

    Studies have indicated the difficulty of deriving a detailed land-use/land-cover classification for heterogeneous metropolitan areas with Landsat MSS and TM data. The major methodological issues of digital analysis that may have affected the classification results are examined. In response to these methodological issues, a multichannel hierarchical clustering algorithm has been developed and tested for a more complete analysis of the data for urban areas.

  7. Towards the use of computationally inserted lesions for mammographic CAD assessment

    NASA Astrophysics Data System (ADS)

    Ghanian, Zahra; Pezeshk, Aria; Petrick, Nicholas; Sahiner, Berkman

    2018-03-01

    Computer-aided detection (CADe) devices used for breast cancer detection on mammograms are typically first developed and assessed for a specific "original" acquisition system, e.g., a specific image detector. When CADe developers are ready to apply their CADe device to a new mammographic acquisition system, they typically assess the CADe device with images acquired using the new system. Collecting large repositories of clinical images containing verified cancer locations and acquired by the new image acquisition system is costly and time consuming. Our goal is to develop a methodology to reduce the clinical data burden in the assessment of a CADe device for use with a different image acquisition system. We are developing an image blending technique that allows users to seamlessly insert lesions imaged using an original acquisition system into normal images or regions acquired with a new system. In this study, we investigated the insertion of microcalcification clusters imaged using an original acquisition system into normal images acquired with that same system utilizing our previously-developed image blending technique. We first performed a reader study to assess whether experienced observers could distinguish between computationally inserted and native clusters. For this purpose, we applied our insertion technique to clinical cases taken from the University of South Florida Digital Database for Screening Mammography (DDSM) and the Breast Cancer Digital Repository (BCDR). Regions of interest containing microcalcification clusters from one breast of a patient were inserted into the contralateral breast of the same patient. The reader study included 55 native clusters and their 55 inserted counterparts. Analysis of the reader ratings using receiver operating characteristic (ROC) methodology indicated that inserted clusters cannot be reliably distinguished from native clusters (area under the ROC curve, AUC=0.58±0.04). Furthermore, CADe sensitivity was evaluated on mammograms with native and inserted microcalcification clusters using a commercial CADe system. For this purpose, we used full field digital mammograms (FFDMs) from 68 clinical cases, acquired at the University of Michigan Health System. The average sensitivities for native and inserted clusters were equal, 85.3% (58/68). These results demonstrate the feasibility of using the inserted microcalcification clusters for assessing mammographic CAD devices.

  8. Spectroscopic studies of clusterization of methanol molecules isolated in a nitrogen matrix

    NASA Astrophysics Data System (ADS)

    Vaskivskyi, Ye.; Doroshenko, I.; Chernolevska, Ye.; Pogorelov, V.; Pitsevich, G.

    2017-12-01

    IR absorption spectra of methanol isolated in a nitrogen matrix are recorded at temperatures ranging from 9 to 34 K. The changes in the spectra with increasing matrix temperature are analyzed. Based on quantum-chemical calculations of the geometric and spectral parameters of different methanol clusters, the observed absorption bands are identified. The cluster composition of the sample is determined at each temperature. It is shown that as the matrix is heated there is a redistribution among the different cluster structures in the sample, from smaller to larger clusters.

  9. Using Cluster Analysis and ICP-MS to Identify Groups of Ecstasy Tablets in Sao Paulo State, Brazil.

    PubMed

    Maione, Camila; de Oliveira Souza, Vanessa Cristina; Togni, Loraine Rezende; da Costa, José Luiz; Campiglia, Andres Dobal; Barbosa, Fernando; Barbosa, Rommel Melgaço

    2017-11-01

    The variations found in the elemental composition of ecstasy samples result in spectral profiles with useful information for data analysis, and cluster analysis of these profiles can help uncover different categories of the drug. We provide a cluster analysis of ecstasy tablets based on their elemental composition. Twenty-five elements were determined by ICP-MS in tablets apprehended by Sao Paulo's State Police, Brazil. We employ the K-means clustering algorithm along with a C4.5 decision tree to help us interpret the clustering results. Two clusters best fit the data, which may correspond to the approximate number of sources of the drug supplying the cities where the seizures occurred. The C4.5 model was capable of differentiating the ecstasy samples from the two clusters with high prediction accuracy under leave-one-out cross-validation. The model used only the Nd, Ni, and Pb concentration values in classifying the samples. © 2017 American Academy of Forensic Sciences.
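
    The two-stage workflow in this record, unsupervised clustering followed by an interpretable classifier checked by leave-one-out cross-validation, can be sketched with a CART tree standing in for C4.5 (scikit-learn does not ship C4.5). The element-concentration matrix below is simulated.

    ```python
    # K-means clusters interpreted by a decision tree with leave-one-out validation.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.model_selection import LeaveOneOut, cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(7)
    # Simulated tablets: 80 samples x 25 element concentrations, two sources.
    X = np.vstack([rng.normal(0, 1, (40, 25)), rng.normal(2, 1, (40, 25))])

    clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    tree = DecisionTreeClassifier(max_depth=3, random_state=0)  # CART, not C4.5
    acc = cross_val_score(tree, X, clusters, cv=LeaveOneOut()).mean()
    print(f"leave-one-out accuracy: {acc:.2f}")
    ```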

  10. EVIDENCE FOR THE UNIVERSALITY OF PROPERTIES OF RED-SEQUENCE GALAXIES IN X-RAY- AND RED-SEQUENCE-SELECTED CLUSTERS AT z ∼ 1

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Foltz, R.; Wilson, G.; DeGroot, A.

    We study the slope, intercept, and scatter of the color–magnitude and color–mass relations for a sample of 10 infrared red-sequence-selected clusters at z ∼ 1. The quiescent galaxies in these clusters formed the bulk of their stars above z ≳ 3 with an age spread Δt ≳ 1 Gyr. We compare UVJ color–color and spectroscopic-based galaxy selection techniques, and find a 15% difference in the galaxy populations classified as quiescent by these methods. We compare the color–magnitude relations from our red-sequence selected sample with X-ray- and photometric-redshift-selected cluster samples of similar mass and redshift. Within uncertainties, we are unable to detect any difference in the ages and star formation histories of quiescent cluster members in clusters selected by different methods, suggesting that the dominant quenching mechanism is insensitive to cluster baryon partitioning at z ∼ 1.

  11. Measuring consistent masses for 25 Milky Way globular clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kimmig, Brian; Seth, Anil; Ivans, Inese I.

    2015-02-01

    We present central velocity dispersions, masses, mass-to-light ratios (M/L), and rotation strengths for 25 Galactic globular clusters (GCs). We derive radial velocities of 1951 stars in 12 GCs from single-order spectra taken with Hectochelle on the MMT telescope. To this sample we add an analysis of available archival data of individual stars. For the full set of data we fit King models to derive consistent dynamical parameters for the clusters. We find good agreement between single-mass King models and the observed radial dispersion profiles. The large, uniform sample of dynamical masses we derive enables us to examine trends of M/L with cluster mass and metallicity. The overall values of M/L and the trends with mass and metallicity are consistent with existing measurements from a large sample of M31 clusters. This includes a clear trend of increasing M/L with cluster mass and lower than expected M/L for the metal-rich clusters. We find no clear trend of increasing rotation with increasing cluster metallicity, as suggested in previous work.

  12. Lessons Learned From an Epidemiologist-Led Countywide Community Assessment for Public Health Emergency Response (CASPER) in Oregon.

    PubMed

    Repp, Kimberly K; Hawes, Eva; Rees, Kathleen J; Vorderstrasse, Beth; Mohnkern, Sue

    2018-06-07

    Conducting a large-scale Community Assessment for Public Health Emergency Response (CASPER) in a geographically and linguistically diverse county presents significant methodological challenges that require advance planning. The Centers for Disease Control and Prevention (CDC) has adapted methodology and provided a toolkit for a rapid needs assessment after a disaster. The assessment provides representative data of the sampling frame to help guide effective distribution of resources. This article describes methodological considerations and lessons learned from a CASPER exercise conducted by Washington County Public Health in June 2016 to assess community emergency preparedness. The CDC's CASPER toolkit provides detailed guidance for exercises in urban areas where city blocks are well defined with many single family homes. Converting the exercise to include rural areas with challenging geographical terrain, including accessing homes without public roads, required considerable adjustments in planning. Adequate preparations for vulnerable populations with English linguistic barriers required additional significant resources. Lessons learned are presented from the first countywide CASPER exercise in Oregon. Approximately 61% of interviews were completed, and 85% of volunteers reported they would participate in another CASPER exercise. Results from the emergency preparedness survey will be presented elsewhere. This experience indicates the most important considerations for conducting a CASPER exercise are oversampling clusters, overrecruiting volunteers, anticipating the actual cost of staff time, and ensuring timely language services are available during the event.

  13. Mixture Hidden Markov Models in Finance Research

    NASA Astrophysics Data System (ADS)

    Dias, José G.; Vermunt, Jeroen K.; Ramos, Sofia

    Finite mixture models have proven to be a powerful framework whenever unobserved heterogeneity cannot be ignored. We introduce into finance research the Mixture Hidden Markov Model (MHMM), which takes into account time and space heterogeneity simultaneously. This approach is flexible in the sense that it can deal with the specific features of financial time series data, such as asymmetry, kurtosis, and unobserved heterogeneity. The methodology is applied to model simultaneously 12 time series of Asian stock market indexes. Because we selected a heterogeneous sample of both developed and emerging countries, we expect that heterogeneity in market returns due to country idiosyncrasies will show up in the results. The best-fitting model was the one with two clusters at the country level, with different dynamics between the two regimes.
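
    A drastically simplified, hard-assignment sketch of the mixture-of-HMMs idea follows: fit one regime-switching HMM per cluster and reassign each market's return series to the HMM that scores it best. The real MHMM uses soft EM over the cluster memberships; the use of hmmlearn, the simulated series, and the two-regime choice are all assumptions made for illustration.

        import numpy as np
        from hmmlearn.hmm import GaussianHMM

        rng = np.random.default_rng(2)
        # 12 fake "market index" return series; half calmer, half more volatile
        series = [rng.normal(0, 1 + (i % 2), size=(500, 1)) for i in range(12)]
        K = 2
        assign = np.arange(len(series)) % K              # deterministic starting split

        for _ in range(10):                              # alternate: fit K HMMs, reassign series
            models = []
            for k in range(K):
                members = [s for s, a in zip(series, assign) if a == k]
                m = GaussianHMM(n_components=2, n_iter=50, random_state=0)
                m.fit(np.vstack(members), [len(s) for s in members])
                models.append(m)
            assign = np.array([np.argmax([m.score(s) for m in models]) for s in series])

        print("country-level cluster assignments:", assign)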

  14. An analysis of the impact of pre‐analytical factors on the urine proteome: Sample processing time, temperature, and proteolysis

    PubMed Central

    Hepburn, Sophie; Cairns, David A.; Jackson, David; Craven, Rachel A.; Riley, Beverley; Hutchinson, Michelle; Wood, Steven; Smith, Matthew Welberry; Thompson, Douglas

    2015-01-01

    Purpose We have examined the impact of sample processing time delay, temperature, and the addition of protease inhibitors (PIs) on the urinary proteome and peptidome, an important aspect of biomarker studies. Experimental design Ten urine samples from patients with varying pathologies were each divided and PIs added to one‐half, with aliquots of each then processed and frozen immediately, or after a delay of 6 h at 4°C or room temperature (20–22°C), effectively yielding 60 samples in total. Samples were then analyzed by 2D‐PAGE, SELDI‐TOF‐MS, and immunoassay. Results Interindividual variability in profiles was the dominant feature in all analyses. Minimal changes were observed by 2D‐PAGE as a result of delay in processing, temperature, or PIs and no changes were seen in IgG, albumin, β2‐microglobulin, or α1‐microglobulin measured by immunoassay. Analysis of peptides showed clustering of some samples by presence/absence of PIs but the extent was very patient‐dependent with most samples showing minimal effects. Conclusions and clinical relevance The extent of processing‐induced changes and the benefit of PI addition are patient‐ and sample‐dependent. A consistent processing methodology is essential within a study to avoid any confounding of the results. PMID:25400092

  15. Galaxy Cluster Mass Reconstruction Project – III. The impact of dynamical substructure on cluster mass estimates

    DOE PAGES

    Old, L.; Wojtak, R.; Pearce, F. R.; ...

    2017-12-20

    With the advent of wide-field cosmological surveys, we are approaching samples of hundreds of thousands of galaxy clusters. While such large numbers will help reduce statistical uncertainties, the control of systematics in cluster masses is crucial. Here we examine the effects of an important source of systematic uncertainty in galaxy-based cluster mass estimation techniques: the presence of significant dynamical substructure. Dynamical substructure manifests as dynamically distinct subgroups in phase-space, indicating an ‘unrelaxed’ state. This issue affects around a quarter of clusters in a generally selected sample. We employ a set of mock clusters whose masses have been measured homogeneously with commonly used galaxy-based mass estimation techniques (kinematic, richness, caustic, radial methods). We use these to study how the relation between observationally estimated and true cluster mass depends on the presence of substructure, as identified by various popular diagnostics. We find that the scatter for an ensemble of clusters does not increase dramatically for clusters with dynamical substructure. However, we find a systematic bias for all methods, such that clusters with significant substructure have higher measured masses than their relaxed counterparts. This bias depends on cluster mass: the most massive clusters are largely unaffected by the presence of significant substructure, but masses are significantly overestimated for lower mass clusters, by ~10 percent at 10^14 M☉ and ≳20 percent for ≲10^13.5 M☉. The use of cluster samples with different levels of substructure can therefore bias certain cosmological parameters up to a level comparable to the typical uncertainties in current cosmological studies.

  16. HICOSMO - cosmology with a complete sample of galaxy clusters - I. Data analysis, sample selection and luminosity-mass scaling relation

    NASA Astrophysics Data System (ADS)

    Schellenberger, G.; Reiprich, T. H.

    2017-08-01

    The X-ray regime, where the most massive visible component of galaxy clusters, the intracluster medium, is visible, offers directly measured quantities, like the luminosity, and derived quantities, like the total mass, to characterize these objects. The aim of this project is to analyse a complete sample of galaxy clusters in detail and constrain cosmological parameters, like the matter density, Ω_m, or the amplitude of initial density fluctuations, σ_8. The purely X-ray flux-limited sample (HIFLUGCS) consists of the 64 X-ray brightest galaxy clusters, which are excellent targets to study the systematic effects that can bias results. We analysed in total 196 Chandra observations of the 64 HIFLUGCS clusters, with a total exposure time of 7.7 Ms. Here, we present our data analysis procedure (including an automated substructure detection and an energy band optimization for surface brightness profile analysis) that gives individually determined, robust total mass estimates. These masses are tested against dynamical and Planck Sunyaev-Zeldovich (SZ) derived masses of the same clusters; good overall agreement is found with the dynamical masses. The Planck SZ masses seem to show a mass-dependent bias relative to our hydrostatic masses; possible biases in this mass-mass comparison are discussed, including the Planck selection function. Furthermore, we show the results for the (0.1-2.4) keV luminosity versus mass scaling relation. The overall slope of the sample (1.34) is in agreement with expectations and values from the literature. Splitting the sample into galaxy groups and clusters reveals, even after a selection bias correction, that galaxy groups exhibit a significantly steeper slope (1.88) compared to clusters (1.06).

  17. Testing spectral models for stellar populations with star clusters - II. Results

    NASA Astrophysics Data System (ADS)

    González Delgado, Rosa M.; Cid Fernandes, Roberto

    2010-04-01

    High spectral resolution evolutionary synthesis models have become a routinely used ingredient in extragalactic work, and as such deserve thorough testing. Star clusters are ideal laboratories for such tests. This paper applies the spectral fitting methodology outlined in Paper I to a sample of clusters, mainly from the Magellanic Clouds and spanning a wide range in age and metallicity, fitting their integrated light spectra with a suite of modern evolutionary synthesis models for single stellar populations. The combinations of model plus spectral library employed in this investigation are Galaxev/STELIB, Vazdekis/MILES, SED@/GRANADA and Galaxev/MILES+GRANADA, which provide a representative sample of models currently available for spectral fitting work. A series of empirical tests are performed with these models, comparing the quality of the spectral fits and the values of age, metallicity and extinction obtained with each of them. A comparison is also made between the properties derived from these spectral fits and literature data on these nearby, well studied clusters. These comparisons are done with the general goal of providing useful feedback for model makers, as well as guidance to the users of such models. We find the following. (i) All models are able to derive ages that are in good agreement both with each other and with literature data, although ages derived from spectral fits are on average slightly older than those based on the S-colour-magnitude diagram (S-CMD) method as calibrated by Girardi et al. (ii) There is less agreement between the models for the metallicity and extinction. In particular, Galaxev/STELIB models underestimate the metallicity by ~0.6 dex, and the extinction is overestimated by 0.1 mag. (iii) New generations of models using the GRANADA and MILES libraries are superior to STELIB-based models both in terms of spectral fit quality and regarding the accuracy with which age and metallicity are retrieved. Accuracies of about 0.1 dex in age and 0.3 dex in metallicity can be achieved as long as the models are not extrapolated beyond their expected range of validity.

  18. The clustering of galaxies in the completed SDSS-III Baryon Oscillation Spectroscopic Survey: Double-probe measurements from BOSS galaxy clustering and Planck data – towards an analysis without informative priors

    DOE PAGES

    Pellejero-Ibanez, Marco; Chuang, Chia -Hsun; Rubino-Martin, J. A.; ...

    2016-03-28

    Here, we develop a new methodology called double-probe analysis with the aim of minimizing informative priors in the estimation of cosmological parameters. We extract the dark-energy-model-independent cosmological constraints from the joint data sets of the Baryon Oscillation Spectroscopic Survey (BOSS) galaxy sample and the Planck cosmic microwave background (CMB) measurement. We measure the mean values and covariance matrix of {R, l_a, Ω_b h², n_s, log(A_s), Ω_k, H(z), D_A(z), f(z)σ_8(z)}, which give an efficient summary of the Planck data and the 2-point statistics from the BOSS galaxy sample, where R = √(Ω_m H_0²) r(z_*) and l_a = π r(z_*)/r_s(z_*), z_* is the redshift at the last scattering surface, and r(z_*) and r_s(z_*) denote our comoving distance to z_* and the sound horizon at z_*, respectively. The advantage of this method is that we do not need to put informative priors on the cosmological parameters that galaxy clustering is not able to constrain well, i.e. Ω_b h² and n_s. Using our double-probe results, we obtain Ω_m = 0.304 ± 0.009, H_0 = 68.2 ± 0.7 km s^-1 Mpc^-1, and σ_8 = 0.806 ± 0.014 assuming ΛCDM; and Ω_k = 0.002 ± 0.003 and w = -1.00 ± 0.07 assuming owCDM. The results show no tension with the flat ΛCDM cosmological paradigm. By comparing with the full-likelihood analyses with fixed dark energy models, we demonstrate that the double-probe method provides robust cosmological parameter constraints which can be conveniently used to study dark energy models. We extend our study to measure the sum of neutrino masses and obtain Σm_ν < 0.10/0.22 eV (68%/95%) assuming ΛCDM and Σm_ν < 0.26/0.52 eV (68%/95%) assuming wCDM. This paper is part of a set that analyses the final galaxy clustering dataset from BOSS.

  19. 21 CFR 118.7 - Sampling methodology for Salmonella Enteritidis (SE).

    Code of Federal Regulations, 2013 CFR

    2013-04-01

    ... 21 Food and Drugs 2 2013-04-01 2013-04-01 false Sampling methodology for Salmonella Enteritidis (SE). 118.7 Section 118.7 Food and Drugs FOOD AND DRUG ADMINISTRATION, DEPARTMENT OF HEALTH AND HUMAN....7 Sampling methodology for Salmonella Enteritidis (SE). (a) Environmental sampling. An environmental...

  1. 21 CFR 118.7 - Sampling methodology for Salmonella Enteritidis (SE).

    Code of Federal Regulations, 2014 CFR

    2014-04-01

    ... 21 Food and Drugs 2 2014-04-01 2014-04-01 false Sampling methodology for Salmonella Enteritidis (SE). 118.7 Section 118.7 Food and Drugs FOOD AND DRUG ADMINISTRATION, DEPARTMENT OF HEALTH AND HUMAN....7 Sampling methodology for Salmonella Enteritidis (SE). (a) Environmental sampling. An environmental...

  2. 21 CFR 118.7 - Sampling methodology for Salmonella Enteritidis (SE).

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ... 21 Food and Drugs 2 2012-04-01 2012-04-01 false Sampling methodology for Salmonella Enteritidis (SE). 118.7 Section 118.7 Food and Drugs FOOD AND DRUG ADMINISTRATION, DEPARTMENT OF HEALTH AND HUMAN....7 Sampling methodology for Salmonella Enteritidis (SE). (a) Environmental sampling. An environmental...

  3. 21 CFR 118.7 - Sampling methodology for Salmonella Enteritidis (SE).

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... 21 Food and Drugs 2 2010-04-01 2010-04-01 false Sampling methodology for Salmonella Enteritidis (SE). 118.7 Section 118.7 Food and Drugs FOOD AND DRUG ADMINISTRATION, DEPARTMENT OF HEALTH AND HUMAN....7 Sampling methodology for Salmonella Enteritidis (SE). (a) Environmental sampling. An environmental...

  4. 21 CFR 118.7 - Sampling methodology for Salmonella Enteritidis (SE).

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... 21 Food and Drugs 2 2011-04-01 2011-04-01 false Sampling methodology for Salmonella Enteritidis (SE). 118.7 Section 118.7 Food and Drugs FOOD AND DRUG ADMINISTRATION, DEPARTMENT OF HEALTH AND HUMAN....7 Sampling methodology for Salmonella Enteritidis (SE). (a) Environmental sampling. An environmental...

  5. Genetic Heterogeneity of Self-Reported Ancestry Groups in an Admixed Brazilian Population

    PubMed Central

    Lins, Tulio C; Vieira, Rodrigo G; Abreu, Breno S; Gentil, Paulo; Moreno-Lima, Ricardo; Oliveira, Ricardo J; Pereira, Rinaldo W

    2011-01-01

    Background Population stratification is the main source of spurious results and poor reproducibility in genetic association findings. Population heterogeneity can be controlled for by grouping individuals in ethnic clusters; however, in admixed populations, there is evidence that such proxies do not provide efficient stratification control. The aim of this study was to evaluate the relation of self-reported ancestry to genetic ancestry and the statistical risk of grouping an admixed sample based on self-reported ancestry. Methods A questionnaire that included an item on self-reported ancestry was completed by 189 female volunteers from an admixed Brazilian population. Individual genetic ancestry was then determined by genotyping ancestry informative markers. Results Self-reported ancestry was classified as white, intermediate, and black. The mean difference among self-reported groups was significant for European and African, but not Amerindian, genetic ancestry. Pairwise fixation index analysis revealed a significant difference among groups. However, the increase in the chance of type 1 error was estimated to be 14%. Conclusions Self-reporting of ancestry was not an appropriate methodology for clustering groups in a Brazilian population, due to high variance at the individual level. Ancestry informative markers are more useful for quantitative measurement of biological ancestry. PMID:21498954

  6. The Facebook influence model: a concept mapping approach.

    PubMed

    Moreno, Megan A; Kota, Rajitha; Schoohs, Shari; Whitehill, Jennifer M

    2013-07-01

    Facebook is a popular social media Web site that has been hypothesized to exert potential influence over users' attitudes, intentions, or behaviors. The purpose of this study was to develop a conceptual framework to explain influential aspects of Facebook. This mixed methods study applied concept mapping methodology, a validated five-step method to visually represent complex topics. The five steps comprise preparation, brainstorming, sort and rank, analysis, and interpretation. College student participants were identified using purposeful sampling. The 80 participants had a mean age of 20.5 years, and included 36% males. A total of 169 statements were generated during brainstorming, and sorted into between 6 and 22 groups. The final concept map included 13 clusters. Interpretation data led to grouping of clusters into four final domains, including connection, comparison, identification, and Facebook as an experience. The Facebook Influence Concept Map illustrates key constructs that contribute to influence, incorporating perspectives of older adolescent Facebook users. While Facebook provides a novel lens through which to consider behavioral influence, it can best be considered in the context of existing behavioral theory. The concept map may be used toward development of potential future intervention efforts.

  7. Assessment of metal pollution based on multivariate statistical modeling of 'hot spot' sediments from the Black Sea.

    PubMed

    Simeonov, V; Massart, D L; Andreev, G; Tsakovski, S

    2000-11-01

    The paper deals with the application of different statistical methods, namely cluster analysis, principal components analysis (PCA), and partial least squares (PLS) modeling. These approaches are efficient tools for achieving a better understanding of the contamination of two gulf regions in the Black Sea. As objects of the study, a collection of marine sediment samples from the Varna and Bourgas "hot spot" gulf areas is used. In the present case, the use of cluster analysis and PCA makes it possible to separate three zones of the marine environment with different levels of pollution by interpretation of the sediment analysis (Bourgas gulf, Varna gulf and lake buffer zone). Further, the extraction of four latent factors offers a specific interpretation of the possible pollution sources and separates natural from anthropogenic factors, the latter originating from contamination by chemical, oil refinery and steel-work enterprises. Finally, PLS modeling gives a better opportunity for predicting contaminant concentrations from tracer element(s) than the one-dimensional approach of the baseline models. The results of the study are important not only locally, as they allow a quick response in finding solutions and decision making, but also in a broader sense as a useful environmetric methodology.
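
    A minimal sketch of this chemometric workflow, with simulated stand-ins for the sediment chemistry, might look as follows in Python: standardize the (log) metal concentrations, extract latent factors by PCA, cluster the factor scores into three zones, and fit a PLS model predicting one contaminant from tracer elements. Column roles and parameter choices are illustrative only.

        import numpy as np
        from sklearn.preprocessing import StandardScaler
        from sklearn.decomposition import PCA
        from sklearn.cluster import KMeans
        from sklearn.cross_decomposition import PLSRegression

        rng = np.random.default_rng(3)
        X = rng.lognormal(size=(60, 12))                  # 60 sediment samples x 12 metals
        Xs = StandardScaler().fit_transform(np.log(X))

        pca = PCA(n_components=4).fit(Xs)                 # four latent factors, as in the study
        scores = pca.transform(Xs)
        zones = KMeans(n_clusters=3, n_init=10, random_state=3).fit_predict(scores)

        pls = PLSRegression(n_components=2).fit(Xs[:, :3], Xs[:, 3])  # metal 3 from tracers 0-2
        print("explained variance ratios:", pca.explained_variance_ratio_.round(2))
        print("zone sizes:", np.bincount(zones))
        print("PLS R^2 on the tracers:", round(pls.score(Xs[:, :3], Xs[:, 3]), 2))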

  8. VizieR Online Data Catalog: 44 SZ-selected galaxy clusters ACT observations (Sifon+, 2016)

    NASA Astrophysics Data System (ADS)

    Sifon, C.; Battaglia, N.; Hasselfield, M.; Menanteau, F.; Barrientos, L. F.; Bond, J. R.; Crichton, D.; Devlin, M. J.; Dunner, R.; Hilton, M.; Hincks, A. D.; Hlozek, R.; Huffenberger, K. M.; Hughes, J. P.; Infante, L.; Kosowsky, A.; Marsden, D.; Marriage, T. A.; Moodley, K.; Niemack, M. D.; Page, L. A.; Spergel, D. N.; Staggs, S. T.; Trac, H.; Wollack, E. J.

    2017-11-01

    ACT is a 6-metre off-axis Gregorian telescope located at an altitude of 5200 m in the Atacama desert in Chile, designed to observe the CMB at arcminute resolution. Galaxy clusters were detected in the 148 GHz band by matched-filtering the maps with the pressure profile suggested by Arnaud et al. (2010A&A...517A..92A), fit to X-ray selected local (z<0.2) clusters, with varying cluster sizes, θ500, from 1.18 to 27 arcmin. Because of the complete overlap of ACT equatorial observations with Sloan Digital Sky Survey Data Release 8 (SDSS DR8; Aihara et al., 2011ApJS..193...29A) imaging, all cluster candidates were assessed with optical data (Menanteau et al., 2013ApJ...765...67M). We observed 20 clusters from the equatorial sample with the Gemini Multi-Object Spectrograph (GMOS) on the Gemini-South telescope, split between semesters 2011B (ObsID:GS-2011B-C-1, PI:Barrientos/Menanteau) and 2012A (ObsID:GS-2012A-C-1, PI:Menanteau), prioritizing clusters in the cosmological sample at 0.3

  9. Versatile Method for the Site-Specific Modification of DNA with Boron Clusters: Anti-Epidermal Growth Factor Receptor (EGFR) Antisense Oligonucleotide Case.

    PubMed

    Ebenryter-Olbińska, Katarzyna; Kaniowski, Damian; Sobczak, Milena; Wojtczak, Błażej A; Janczak, Sławomir; Wielgus, Ewelina; Nawrot, Barbara; Leśnikowski, Zbigniew J

    2017-11-21

    A general and convenient approach for the incorporation of different types of boron clusters into specific locations of the DNA-oligonucleotide chain based on the automated phosphoramidite method of oligonucleotide synthesis and post-synthetic "click chemistry" modification has been developed. Pronounced effects of boron-cluster modification on the physico- and biochemical properties of the antisense oligonucleotides were observed. The silencing activity of antisense oligonucleotides bearing a single boron cluster modification in the middle of the oligonucleotide chain was substantially higher than that of unmodified oligonucleotides. This finding may be of importance for the design of therapeutic nucleic acids with improved properties. The proposed synthetic methodology broadens the availability of nucleic acid-boron cluster conjugates and opens up new avenues for their potential practical use. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  10. An Analysis of Rich Cluster Redshift Survey Data for Large Scale Structure Studies

    NASA Astrophysics Data System (ADS)

    Slinglend, K.; Batuski, D.; Haase, S.; Hill, J.

    1994-12-01

    The results from the COBE satellite show the existence of structure on scales of the order of 10% or more of the horizon scale of the universe. Rich clusters of galaxies from Abell's catalog show evidence of structure on scales of 100 Mpc and may hold the promise of confirming structure on the scale of the COBE result. However, many Abell clusters have zero or only one measured redshift, so present knowledge of their three-dimensional distribution has quite large uncertainties. The shortage of measured redshifts for these clusters may also mask a problem of projection effects corrupting the membership counts for the clusters. Our approach in this effort has been to use the MX multifiber spectrometer on the Steward 2.3m to measure redshifts of at least ten galaxies in each of 80 Abell cluster fields with richness class R >= 1 and mag10 <= 16.8 (estimated z <= 0.12) and zero or one measured redshifts. This work will result in a deeper, more complete (and reliable) sample of positions of rich clusters. Our primary intent for the sample is two-point correlation and other studies of the large scale structure traced by these clusters, in an effort to constrain theoretical models for structure formation. We are also obtaining enough redshifts per cluster so that a much better sample of reliable cluster velocity dispersions will be available for other studies of cluster properties. To date, we have collected such data for 64 clusters, and for most of them we have seven or more cluster members with redshifts, allowing for reliable velocity dispersion calculations. Velocity histograms and stripe density plots for several interesting cluster fields are presented, along with summary tables of cluster redshift results. Also, with 10 or more redshifts in most of our cluster fields (30' square, just about an 'Abell diameter' at z ~ 0.1), we have investigated the extent of projection effects within the Abell catalog in an effort to quantify and understand how this may affect the Abell sample.

  11. THE SWIFT AGN AND CLUSTER SURVEY. II. CLUSTER CONFIRMATION WITH SDSS DATA

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Griffin, Rhiannon D.; Dai, Xinyu; Kochanek, Christopher S.

    2016-01-15

    We study 203 (of 442) Swift AGN and Cluster Survey extended X-ray sources located in the SDSS DR8 footprint to search for galaxy over-densities in three-dimensional space using SDSS galaxy photometric redshifts and positions near the Swift cluster candidates. We find 104 Swift clusters with a >3σ galaxy over-density. The remaining targets are potentially located at higher redshifts and require deeper optical follow-up observations for confirmation as galaxy clusters. We present a series of cluster properties including the redshift, brightest cluster galaxy (BCG) magnitude, BCG-to-X-ray center offset, optical richness, and X-ray luminosity. We also detect red sequences in ∼85% of the 104 confirmed clusters. The X-ray luminosity and optical richness for the SDSS confirmed Swift clusters are correlated and follow previously established relations. The distribution of the separations between the X-ray centroids and the most likely BCG is also consistent with expectation. We compare the observed redshift distribution of the sample with a theoretical model, and find that our sample is complete for z ≲ 0.3 and is still 80% complete up to z ≃ 0.4, consistent with the SDSS survey depth. These analysis results suggest that our Swift cluster selection algorithm has yielded a statistically well-defined cluster sample for further study of cluster evolution and cosmology. We also match our SDSS confirmed Swift clusters to existing cluster catalogs, and find 42, 23, and 1 matches in optical, X-ray, and Sunyaev–Zel’dovich catalogs, respectively, and so the majority of these clusters are new detections.

  12. Profiling Local Optima in K-Means Clustering: Developing a Diagnostic Technique

    ERIC Educational Resources Information Center

    Steinley, Douglas

    2006-01-01

    Using the cluster generation procedure proposed by D. Steinley and R. Henson (2005), the author investigated the performance of K-means clustering under the following scenarios: (a) different probabilities of cluster overlap; (b) different types of cluster overlap; (c) varying sample sizes, clusters, and dimensions; (d) different multivariate…

  13. Cluster lot quality assurance sampling: effect of increasing the number of clusters on classification precision and operational feasibility.

    PubMed

    Okayasu, Hiromasa; Brown, Alexandra E; Nzioki, Michael M; Gasasira, Alex N; Takane, Marina; Mkanda, Pascal; Wassilak, Steven G F; Sutter, Roland W

    2014-11-01

    To assess the quality of supplementary immunization activities (SIAs), the Global Polio Eradication Initiative (GPEI) has used cluster lot quality assurance sampling (C-LQAS) methods since 2009. However, since the inception of C-LQAS, questions have been raised about the optimal balance between operational feasibility and precision of classification of lots to identify areas with low SIA quality that require corrective programmatic action. To determine if an increased precision in classification would result in differential programmatic decision making, we conducted a pilot evaluation in 4 local government areas (LGAs) in Nigeria with an expanded LQAS sample size of 16 clusters (instead of the standard 6 clusters) of 10 subjects each. The results showed greater heterogeneity between clusters than the assumed standard deviation of 10%, ranging from 12% to 23%. Comparing the distribution of 4-outcome classifications obtained from all possible combinations of 6-cluster subsamples to the observed classification of the 16-cluster sample, we obtained an exact match in classification in 56% to 85% of instances. We concluded that the 6-cluster C-LQAS provides acceptable classification precision for programmatic action. Considering the greater resources required to implement an expanded C-LQAS, the improvement in precision was deemed insufficient to warrant the effort. Published by Oxford University Press on behalf of the Infectious Diseases Society of America 2014. This work is written by (a) US Government employee(s) and is in the public domain in the US.
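
    The subsampling comparison at the heart of this evaluation is easy to mimic: classify every possible 6-cluster subsample of a 16-cluster lot and count how often its classification matches the full-sample one. The sketch below uses made-up coverage data and illustrative 4-outcome thresholds, not the GPEI's actual decision rules.

        from itertools import combinations
        from math import comb
        import numpy as np

        rng = np.random.default_rng(4)
        coverage = rng.normal(0.90, 0.15, 16).clip(0, 1)   # per-cluster coverage, 10 children each

        def classify(vals, cuts=(0.80, 0.85, 0.90)):
            """Illustrative 4-outcome classification by mean coverage (0 worst .. 3 best)."""
            return sum(np.mean(vals) >= c for c in cuts)

        full = classify(coverage)
        matches = sum(classify(coverage[list(c)]) == full
                      for c in combinations(range(16), 6))
        print(f"6-cluster subsamples matching the 16-cluster classification: "
              f"{matches / comb(16, 6):.0%}")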

  14. Clustering on very small scales from a large sample of confirmed quasar pairs: does quasar clustering track from Mpc to kpc scales?

    NASA Astrophysics Data System (ADS)

    Eftekharzadeh, S.; Myers, A. D.; Hennawi, J. F.; Djorgovski, S. G.; Richards, G. T.; Mahabal, A. A.; Graham, M. J.

    2017-06-01

    We present the most precise estimate to date of the clustering of quasars on very small scales, based on a sample of 47 binary quasars with magnitudes of g < 20.85 and proper transverse separations of ~25 h^-1 kpc. Our sample of binary quasars, which is about six times larger than any previous spectroscopically confirmed sample on these scales, is targeted using a kernel density estimation (KDE) technique applied to Sloan Digital Sky Survey (SDSS) imaging over most of the SDSS area. Our sample is 'complete' in that all of the KDE target pairs with 17.0 ≲ R ≲ 36.2 h^-1 kpc in our area of interest have been spectroscopically confirmed from a combination of previous surveys and our own long-slit observational campaign. We catalogue 230 candidate quasar pairs with angular separations of <8 arcsec, from which our binary quasars were identified. We determine the projected correlation function of quasars (W̄_p) in four bins of proper transverse scale over the range 17.0 ≲ R ≲ 36.2 h^-1 kpc. The implied small-scale quasar clustering amplitude from the projected correlation function, integrated across our entire redshift range, is A = 24.1 ± 3.6 at ~26.6 h^-1 kpc. Our sample is the first spectroscopically confirmed sample of quasar pairs that is sufficiently large to study how quasar clustering evolves with redshift at ~25 h^-1 kpc. We find that empirical descriptions of how quasar clustering evolves with redshift at ~25 h^-1 Mpc also adequately describe the evolution of quasar clustering at ~25 h^-1 kpc.

  15. Online clustering algorithms for radar emitter classification.

    PubMed

    Liu, Jun; Lee, Jim P Y; Li, Lingjie; Luo, Zhi-Quan; Wong, K Max

    2005-08-01

    Radar emitter classification is a special application of data clustering for classifying unknown radar emitters from received radar pulse samples. The main challenges of this task are the high dimensionality of radar pulse samples, small sample group size, and closely located radar pulse clusters. In this paper, two new online clustering algorithms are developed for radar emitter classification: One is model-based using the Minimum Description Length (MDL) criterion and the other is based on competitive learning. Computational complexity is analyzed for each algorithm and then compared. Simulation results show the superior performance of the model-based algorithm over competitive learning in terms of better classification accuracy, flexibility, and stability.
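
    The competitive-learning variant can be reduced to a few lines: each arriving pulse feature vector pulls its nearest centroid a small step toward itself. The sketch below is a generic winner-take-all updater on simulated pulses, not the authors' algorithm; the feature dimensions and learning rate are placeholders.

        import numpy as np

        rng = np.random.default_rng(5)
        true_centers = rng.normal(0, 5, size=(3, 8))        # 3 emitters, 8 pulse features
        stream = (true_centers[rng.integers(0, 3)] + rng.normal(0, 0.5, 8)
                  for _ in range(2000))                      # online stream of pulse vectors

        centroids = rng.normal(0, 5, size=(3, 8))            # random initialization
        eta = 0.05                                           # learning rate
        for x in stream:
            w = np.argmin(np.linalg.norm(centroids - x, axis=1))   # winner takes all
            centroids[w] += eta * (x - centroids[w])                # move winner toward x

        print("learned centroids (rows):")
        print(np.round(centroids, 1))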

  16. Estimating regression coefficients from clustered samples: Sampling errors and optimum sample allocation

    NASA Technical Reports Server (NTRS)

    Kalton, G.

    1983-01-01

    A number of surveys have been conducted to study the relationship between the level of aircraft or traffic noise exposure experienced by people living in a particular area and their annoyance with it. These surveys generally employ a clustered sample design, which affects the precision of the survey estimates. Regression analysis of annoyance on noise measures and other variables is often an important component of the survey analysis. Formulae are presented for estimating the standard errors of regression coefficients and ratios of regression coefficients that are applicable with a two- or three-stage clustered sample design. Using a simple cost function, the optimum allocation of the sample across the stages of the design is also determined for the estimation of a regression coefficient.
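
    Two textbook quantities underpin this kind of design work: the design effect deff = 1 + (m - 1)ρ, which inflates the variance of an estimate when m interviews are taken per cluster with intracluster correlation ρ, and the optimum cluster size m* = sqrt(c1(1 - ρ)/(c2 ρ)) under a cost of c1 per cluster plus c2 per interview. The sketch below encodes these standard formulas; they are consistent with, but not necessarily identical to, the report's own expressions.

        import math

        def design_effect(m, rho):
            """Variance inflation for a cluster sample: deff = 1 + (m - 1) * rho."""
            return 1.0 + (m - 1) * rho

        def optimum_cluster_size(c1, c2, rho):
            """m* = sqrt(c1 * (1 - rho) / (c2 * rho)); c1 = cost per cluster,
            c2 = cost per interview, rho = intracluster correlation."""
            return math.sqrt(c1 * (1.0 - rho) / (c2 * rho))

        rho = 0.05                      # illustrative noise-annoyance ICC
        m_star = optimum_cluster_size(c1=200.0, c2=10.0, rho=rho)
        print(f"optimum interviews per cluster: {m_star:.1f}")
        print(f"deff at that size: {design_effect(m_star, rho):.2f}")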

  17. Instructional Changes Adopted for an Engineering Course: Cluster Analysis on Academic Failure

    PubMed Central

    Álvarez-Bermejo, José A.; Belmonte-Ureña, Luis J.; Martos-Martínez, África; Barragán-Martín, Ana B.; Simón-Márquez, María M.

    2016-01-01

    As first year students come from diverse backgrounds, basic skills should be accessible to everyone as soon as possible. Transferring such skills to these students is challenging, especially in highly technical courses. Ensuring that essential knowledge is acquired quickly promotes the student’s self-esteem and may positively influence failure rates. Metaphors can help do this. Metaphors are used to understand the unknown. This paper shows how we changed the approach to student learning at the University of Almeria. Our hypothesis assumed that metaphors accelerate the acquisition of basic knowledge so that other skills built on that foundation are easily learned. With these goals in mind, we changed the way we teach by using metaphors and abstract concepts in a computer organization course, a technical course in the first year of an information technology engineering degree. Cluster analysis of the data on collective student performance after this methodological change clearly identified two distinct groups. These two groups perfectly matched the “before and after” scenarios of the use of metaphors. The study was conducted during 11 academic years (2002/2003 to 2012/2013). The 475 observations made during this period illustrate the usefulness of this change in teaching and learning, shifting from a propositional teaching/learning model to a more dynamic model based on metaphors and abstractions. Data covering the whole period showed favorable evolution of student achievement and reduced failure rates, not only in this course, but also in many of the following more advanced courses. The paper is structured in five sections. The first gives an introduction, the second describes the methodology. The third section describes the sample and the study carried out. The fourth section presents the results and, finally, the fifth section discusses the main conclusions. PMID:27895611

  18. Determining Criteria and Weights for Prioritizing Health Technologies Based on the Preferences of the General Population: A New Zealand Pilot Study.

    PubMed

    Sullivan, Trudy; Hansen, Paul

    2017-04-01

    The use of multicriteria decision analysis for health technology prioritization requires decision-making criteria and weights reflecting their relative importance. We report on a methodology for determining criteria and weights that was developed and piloted in New Zealand and enables extensive participation by members of the general population. Stimulated by a preliminary ranking exercise that involved prioritizing 14 diverse technologies, six focus groups discussed what matters to people when thinking about technologies that should be funded. These discussions informed the specification of criteria related to technologies' benefits for use in a discrete choice survey designed to generate weights for each individual participant as well as mean weights. A random sample of 3218 adults was invited to participate. To check test-retest reliability, a subsample completed the survey twice. Cluster analysis was performed to identify participants with similar patterns of weights. Six benefits-related criteria were distilled from the focus group discussions and included in the discrete choice survey, which was completed by 322 adults (10% response rate). Most participants (85%) found the survey easy to understand, and the survey exhibited test-retest reliability. The cluster analysis revealed that participant weights are related more to idiosyncratic personal preferences than to demographic and background characteristics. The methodology enables extensive participation by members of the general population, for whom it is both acceptable and reliable. Generating weights for each participant allows the heterogeneity of individual preferences, and the extent to which they are related to demographic and background characteristics, to be tested. Copyright © 2017 International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. All rights reserved.

  19. Magnetic signature of overbank sediment in industry impacted floodplains identified by data mining methods

    NASA Astrophysics Data System (ADS)

    Chudaničová, Monika; Hutchinson, Simon M.

    2016-11-01

    Our study attempts to identify a characteristic magnetic signature of overbank sediments exhibiting anthropogenically induced magnetic enhancement, and thereby to distinguish them from unenhanced sediments with weak magnetic background values, using a novel approach based on data mining methods, thus providing a means of rapid pollution determination. Data were obtained from 539 bulk samples from vertical profiles through overbank sediment, collected on seven rivers in the eastern Czech Republic and three rivers in northwest England. k-Means clustering and hierarchical clustering methods, paired group (UPGMA) and Ward's method, were used to divide the samples into natural groups according to their attributes. The interparametric ratios SIRM/χ, SIRM/ARM and S_-0.1T were chosen as attributes for the analyses, making the resultant model more widely applicable, since magnetic concentration values can differ by two orders of magnitude. Division into three clusters appeared to be optimal and corresponded to inherent clusters in the data scatter. Clustering managed to separate samples with relatively weak anthropogenically induced enhancement, relatively strong anthropogenically induced enhancement, and samples lacking enhancement. To describe the clusters explicitly, and thus obtain a discrete magnetic signature, classification rules (JRip method) and decision trees (J4.8 and Simple Cart methods) were used. Samples lacking anthropogenic enhancement typically exhibited S_-0.1T < c. 0.5, SIRM/ARM < c. 150 and SIRM/χ < c. 6000 A m^-1. Samples with magnetic enhancement all exhibited S_-0.1T > 0.5. Samples with relatively stronger anthropogenic enhancement were unequivocally distinguished from the samples with weaker enhancement by SIRM/ARM > c. 150. Samples with SIRM/ARM in the range c. 126-150 were classified as relatively strongly enhanced when their SIRM/χ > 18 000 A m^-1 and relatively less enhanced when their SIRM/χ < 18 000 A m^-1. An additional rule was arbitrarily added to exclude samples with χ_fd% > 6 per cent from the anthropogenically enhanced clusters, as samples with natural magnetic enhancement. The characteristics of the clusters resulted mainly from the relationships between SIRM/ARM and S_-0.1T, and between SIRM/χ and S_-0.1T. Both SIRM/ARM and SIRM/χ increase with increasing S_-0.1T values, reflecting a greater level of anthropogenic magnetic particles. Overall, data mining methods demonstrated good potential for utilization in environmental magnetism.
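
    Because the abstract states its classification rules explicitly, they can be encoded directly; the function below is one reading of those thresholds (the "c." values taken at face value), useful for rapid screening of new samples.

        def classify_sample(s_ratio, sirm_arm, sirm_chi, chi_fd_pct):
            """Label one overbank-sediment sample from its magnetic ratios.
            s_ratio: S-ratio at -0.1 T; sirm_arm: SIRM/ARM; sirm_chi: SIRM/chi (A/m);
            chi_fd_pct: frequency-dependent susceptibility (%)."""
            if chi_fd_pct > 6.0:             # excluded as natural (pedogenic) enhancement
                return "natural enhancement"
            if s_ratio < 0.5 and sirm_arm < 150 and sirm_chi < 6000:
                return "no anthropogenic enhancement"
            if sirm_arm > 150:
                return "strong anthropogenic enhancement"
            if 126 <= sirm_arm <= 150 and sirm_chi > 18000:
                return "strong anthropogenic enhancement"
            return "weak anthropogenic enhancement"

        print(classify_sample(s_ratio=0.95, sirm_arm=180, sirm_chi=25000, chi_fd_pct=2.0))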

  1. The phonotactic influence on the perception of a consonant cluster /pt/ by native English and native Polish listeners: A behavioral and event related potential (ERP) study

    PubMed Central

    Wagner, Monica; Shafer, Valerie L.; Martin, Brett; Steinschneider, Mitchell

    2013-01-01

    The effect of exposure to the contextual features of the /pt/ cluster was investigated in native-English and native-Polish listeners using behavioral and event-related potential (ERP) methodology. Both groups experience the /pt/ cluster in their languages, but only the Polish group experiences the cluster in the context of word onset examined in the current experiment. The /st/ cluster was used as an experimental control. ERPs were recorded while participants identified the number of syllables in the second word of nonsense word pairs. The results found that only Polish listeners accurately perceived the /pt/ cluster and perception was reflected within a late positive component of the ERP waveform. Furthermore, evidence of discrimination of /pt/ and /pǝt/ onsets in the neural signal was found even for non-native listeners who could not perceive the difference. These findings suggest that exposure to phoneme sequences in highly specific contexts may be necessary for accurate perception. PMID:22867752

  2. Support Vector Data Descriptions and k-Means Clustering: One Class?

    PubMed

    Gornitz, Nico; Lima, Luiz Alberto; Muller, Klaus-Robert; Kloft, Marius; Nakajima, Shinichi

    2017-09-27

    We present ClusterSVDD, a methodology that unifies support vector data descriptions (SVDDs) and k-means clustering into a single formulation. This allows both methods to benefit from one another, i.e., by adding flexibility using multiple spheres for SVDDs and increasing anomaly resistance and flexibility through kernels to k-means. In particular, our approach leads to a new interpretation of k-means as a regularized mode seeking algorithm. The unifying formulation further allows for deriving new algorithms by transferring knowledge from one-class learning settings to clustering settings and vice versa. As a showcase, we derive a clustering method for structured data based on a one-class learning scenario. Additionally, our formulation can be solved via a particularly simple optimization scheme. We evaluate our approach empirically to highlight some of the proposed benefits on artificially generated data, as well as on real-world problems, and provide a Python software package comprising various implementations of primal and dual SVDD as well as our proposed ClusterSVDD.
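
    The multiple-sphere intuition can be illustrated with a toy alternation (not the authors' package): assign points to their nearest sphere centre, refit the centres, then take each sphere's radius as a high quantile of member distances so a small fraction of points fall outside as anomalies. This is a hard-margin, linear-kernel caricature of the ClusterSVDD idea.

        import numpy as np

        rng = np.random.default_rng(6)
        X = np.vstack([rng.normal(-3, 1, (100, 2)), rng.normal(3, 1, (100, 2))])

        K = 2
        centers = X[rng.choice(len(X), K, replace=False)]
        for _ in range(20):                               # k-means-style alternation
            d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
            assign = d.argmin(axis=1)                     # nearest sphere wins
            centers = np.array([X[assign == k].mean(axis=0) for k in range(K)])

        radii = np.array([np.quantile(np.linalg.norm(X[assign == k] - centers[k], axis=1), 0.95)
                          for k in range(K)])             # 95th percentile radius per sphere
        outliers = np.linalg.norm(X - centers[assign], axis=1) > radii[assign]
        print("anomalies flagged:", outliers.sum())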

  3. Using cluster analysis for medical resource decision making.

    PubMed

    Dilts, D; Khamalah, J; Plotkin, A

    1995-01-01

    Escalating costs of health care delivery have in the recent past often made the health care industry investigate, adapt, and apply those management techniques relating to budgeting, resource control, and forecasting that have long been used in the manufacturing sector. A strategy that has contributed much in this direction is the definition and classification of a hospital's output into "products" or groups of patients that impose similar resource or cost demands on the hospital. Existing classification schemes have frequently employed cluster analysis in generating these groupings. Unfortunately, the myriad articles and books on clustering and classification contain few formalized selection methodologies for choosing a technique for solving a particular problem, hence they often leave the novice investigator at a loss. This paper reviews the literature on clustering, particularly as it has been applied in the medical resource-utilization domain, addresses the critical choices facing an investigator in the medical field using cluster analysis, and offers suggestions (using the example of clustering low-vision patients) for how such choices can be made.

  4. No Galaxy Left Behind: Accurate Measurements with the Faintest Objects in the Dark Energy Survey

    DOE PAGES

    Suchyta, E.

    2016-01-27

    Accurate statistical measurement with large imaging surveys has traditionally required throwing away a sizable fraction of the data. This is because most measurements have relied on selecting nearly complete samples, where variations in the composition of the galaxy population with seeing, depth, or other survey characteristics are small. We introduce a new measurement method that aims to minimize this wastage, allowing precision measurement for any class of stars or galaxies detectable in an imaging survey. We have implemented our proposal in Balrog, a software package which embeds fake objects in real imaging in order to accurately characterize measurement biases. We also demonstrate this technique with an angular clustering measurement using Dark Energy Survey (DES) data. We first show that recovery of our injected galaxies depends on a wide variety of survey characteristics in the same way as the real data. We then construct a flux-limited sample of the faintest galaxies in DES, chosen specifically for their sensitivity to depth and seeing variations. Using the synthetic galaxies as randoms in the standard Landy–Szalay correlation function estimator suppresses the effects of variable survey selection by at least two orders of magnitude. Our measured angular clustering is found to be in excellent agreement with that of a matched sample drawn from much deeper, higher-resolution space-based COSMOS imaging; over angular scales of 0.004° < θ < 0.2°, we find a best-fit scaling amplitude between the DES and COSMOS measurements of 1.00 ± 0.09. We expect this methodology to be broadly useful for extending the statistical reach of measurements in a wide variety of coming imaging surveys.

  5. Assessing map accuracy in a remotely sensed, ecoregion-scale cover map

    USGS Publications Warehouse

    Edwards, T.C.; Moisen, Gretchen G.; Cutler, D.R.

    1998-01-01

    Landscape- and ecoregion-based conservation efforts increasingly use a spatial component to organize data for analysis and interpretation. A challenge particular to remotely sensed cover maps generated from these efforts is how best to assess the accuracy of the cover maps, especially when they can exceed 1000s of km2 in size. Here we develop and describe a methodological approach for assessing the accuracy of large-area cover maps, using as a test case the 21.9 million ha cover map developed for Utah Gap Analysis. As part of our design process, we first reviewed the effect of intracluster correlation and a simple cost function on the relative efficiency of cluster sample designs to simple random designs. Our design ultimately combined clustered and subsampled field data stratified by ecological modeling unit and accessibility (hereafter a mixed design). We next outline estimation formulas for simple map accuracy measures under our mixed design and report results for eight major cover types and the three ecoregions mapped as part of the Utah Gap Analysis. Overall accuracy of the map was 83.2% (SE=1.4). Within ecoregions, accuracy ranged from 78.9% to 85.0%. Accuracy by cover type varied, ranging from a low of 50.4% for barren to a high of 90.6% for man modified. In addition, we examined gains in efficiency of our mixed design compared with a simple random sample approach. In regard to precision, our mixed design was more precise than a simple random design, given fixed sample costs. We close with a discussion of the logistical constraints facing attempts to assess the accuracy of large-area, remotely sensed cover maps.

  6. Q-Sample Construction: A Critical Step for a Q-Methodological Study.

    PubMed

    Paige, Jane B; Morin, Karen H

    2016-01-01

    Q-sample construction is a critical step in Q-methodological studies. Prior to conducting Q-studies, researchers start with a population of opinion statements (concourse) on a particular topic of interest from which a sample is drawn. These sampled statements are known as the Q-sample. Although literature exists on methodological processes to conduct Q-methodological studies, limited guidance exists on the practical steps to reduce the population of statements to a Q-sample. A case exemplar illustrates the steps to construct a Q-sample in preparation for a study that explored perspectives nurse educators and nursing students hold about simulation design. Experts in simulation and Q-methodology evaluated the Q-sample for readability, clarity, and for representativeness of opinions contained within the concourse. The Q-sample was piloted and feedback resulted in statement refinement. Researchers especially those undertaking Q-method studies for the first time may benefit from the practical considerations to construct a Q-sample offered in this article. © The Author(s) 2014.

  7. Unsupervised active learning based on hierarchical graph-theoretic clustering.

    PubMed

    Hu, Weiming; Hu, Wei; Xie, Nianhua; Maybank, Steve

    2009-10-01

    Most existing active learning approaches are supervised. Supervised active learning has the following problems: inefficiency in dealing with the semantic gap between the distribution of samples in the feature space and their labels, lack of ability in selecting new samples that belong to new categories that have not yet appeared in the training samples, and lack of adaptability to changes in the semantic interpretation of sample categories. To tackle these problems, we propose an unsupervised active learning framework based on hierarchical graph-theoretic clustering. In the framework, two promising graph-theoretic clustering algorithms, namely, dominant-set clustering and spectral clustering, are combined in a hierarchical fashion. Our framework has some advantages, such as ease of implementation, flexibility in architecture, and adaptability to changes in the labeling. Evaluations on data sets for network intrusion detection, image classification, and video classification have demonstrated that our active learning framework can effectively reduce the workload of manual classification while maintaining a high accuracy of automatic classification. It is shown that, overall, our framework outperforms the support-vector-machine-based supervised active learning, particularly in terms of dealing much more efficiently with new samples whose categories have not yet appeared in the training samples.

  8. Self-similarity of temperature profiles in distant galaxy clusters: the quest for a universal law

    NASA Astrophysics Data System (ADS)

    Baldi, A.; Ettori, S.; Molendi, S.; Gastaldello, F.

    2012-09-01

    Context. We present the XMM-Newton temperature profiles of 12 bright (L_X > 4 × 10^44 erg s^-1) clusters of galaxies at 0.4 < z < 0.9, having an average temperature in the range 5 ≲ kT ≲ 11 keV. Aims: The main goal of this paper is to study for the first time the temperature profiles of a sample of high-redshift clusters, to investigate their properties, and to define a universal law to describe the temperature radial profiles in galaxy clusters as a function of both cosmic time and their state of relaxation. Methods: We performed a spatially resolved spectral analysis, using Cash statistics, to measure the temperature in the intracluster medium at different radii. Results: We extracted temperature profiles for the clusters in our sample, finding that all profiles are declining toward larger radii. The normalized temperature profiles (normalized by the mean temperature T_500) are found to be generally self-similar. The sample was subdivided into five cool-core (CC) and seven non cool-core (NCC) clusters by introducing a pseudo-entropy ratio σ = (T_IN/T_OUT) × (EM_IN/EM_OUT)^(-1/3) and defining the objects with σ < 0.6 as CC clusters and those with σ ≥ 0.6 as NCC clusters. The profiles of CC and NCC clusters differ mainly in the central regions, with the latter exhibiting a slightly flatter central profile. A significant dependence of the temperature profiles on the pseudo-entropy ratio σ is detected by fitting a function of r and σ, showing an indication that the outer part of the profiles becomes steeper for higher values of σ (i.e. transitioning toward the NCC clusters). No significant evidence of redshift evolution could be found within the redshift range sampled by our clusters (0.4 < z < 0.9). A comparison of our high-z sample with intermediate clusters at 0.1 < z < 0.3 showed how the CC and NCC cluster temperature profiles have experienced some sort of evolution. This can happen because higher-z clusters are at a less advanced stage of their formation and did not have enough time to create a relaxed structure, which is characterized by a central temperature dip in CC clusters and by flatter profiles in NCC clusters. Conclusions: This is the first time that a systematic study of the temperature profiles of galaxy clusters at z > 0.4 has been attempted. We were able to define the closest possible relation to a universal law for the temperature profiles of galaxy clusters at 0.1 < z < 0.9, showing a dependence on both the relaxation state of the clusters and the redshift. Appendix A is only available in electronic form at http://www.aanda.org

  9. U.S. consumer demand for restaurant calorie information: targeting demographic and behavioral segments in labeling initiatives.

    PubMed

    Kolodinsky, Jane; Reynolds, Travis William; Cannella, Mark; Timmons, David; Bromberg, Daniel

    2009-01-01

    To identify different segments of U.S. consumers based on food choices, exercise patterns, and desire for restaurant calorie labeling. Using a stratified (by region) random sample of the U.S. population, trained interviewers collected data for this cross-sectional study through telephone surveys. Center for Rural Studies U.S. national health survey. The final sample included 580 responses (22% response rate); data were weighted to be representative of the age and gender characteristics of the U.S. population. Self-reported behaviors related to food choices, exercise patterns, desire for calorie information in restaurants, and sample demographics. Clusters were identified using the Schwarz Bayesian criterion. Impacts of demographic characteristics on cluster membership were analyzed using bivariate tests of association and multinomial logit regression. Cluster analysis revealed three clusters based on respondents' food choices, activity levels, and desire for restaurant labeling. Two clusters, comprising three quarters of the sample, desired calorie labeling in restaurants; the remaining cluster opposed restaurant labeling. Demographic variables significantly predicting cluster membership included region of residence (p < .10), income (p < .05), gender (p < .01), and age (p < .10). Though limited by a low response rate and potential self-reporting bias in the phone survey, this study suggests that several groups are likely to benefit from restaurant calorie labeling. Specific demographic clusters could be targeted through labeling initiatives.
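
    For readers unfamiliar with choosing a cluster count by the Schwarz Bayesian criterion, a toy sketch follows; the Gaussian mixture and synthetic data are stand-ins, not the survey's actual procedure:

    ```python
    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (200, 3)),    # e.g. food-choice scores,
                   rng.normal(3, 1, (200, 3)),    # exercise frequency,
                   rng.normal(-3, 1, (200, 3))])  # labeling-desire scale

    # Schwarz Bayesian criterion (BIC): the candidate with the lowest value wins.
    bics = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
            for k in range(1, 7)}
    print(min(bics, key=bics.get))  # 3 for this synthetic mixture
    ```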

  10. Testing the accuracy of clustering redshifts with simulations

    NASA Astrophysics Data System (ADS)

    Scottez, V.; Benoit-Lévy, A.; Coupon, J.; Ilbert, O.; Mellier, Y.

    2018-03-01

    We explore the accuracy of clustering-based redshift inference within the MICE2 simulation. This method uses the spatial clustering of galaxies between a spectroscopic reference sample and a sample of unknown redshifts, and this study gives an estimate of the accuracy the method can reach. First, we discuss the requirements on the number of objects in the two samples, confirming that this method does not require a representative spectroscopic sample for calibration. In the context of the next generation of cosmological surveys, we estimated that the density of the Quasi Stellar Objects in BOSS allows us to reach 0.2 per cent accuracy in the mean redshift. Second, we estimate individual redshifts for galaxies in the densest regions of colour space (~30 per cent of the galaxies) without using the photometric redshift procedure. The advantage of this procedure is threefold, allowing: (i) the use of cluster-zs for any field in astronomy, (ii) the combination of photo-zs and cluster-zs to obtain an improved redshift estimate, (iii) the use of cluster-zs to define tomographic bins for weak lensing. Finally, we explore this last option and build five cluster-z selected tomographic bins from redshift 0.2 to 1. We found a bias on the mean redshift estimate of 0.002 per bin. We conclude that cluster-zs could be used as a primary redshift estimator by the next generation of cosmological surveys.
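
    The core idea admits a compact toy implementation: the unknown sample's redshift distribution is traced by its excess pair counts with reference galaxies binned in spectroscopic redshift. The sketch below (flat-sky coordinates, a single angular scale, no bias correction) is ours and far simpler than the estimators used in the paper:

    ```python
    import numpy as np
    from scipy.spatial import cKDTree

    def cross_pairs(xy_a, xy_b, r_max=0.1):
        """Pairs closer than r_max between two (N, 2) coordinate arrays."""
        return cKDTree(xy_a).count_neighbors(cKDTree(xy_b), r_max)

    def clustering_dndz(xy_unknown, xy_ref, z_ref, xy_random, z_edges):
        """Excess pair counts (data minus randoms) per reference z bin."""
        dndz = []
        for lo, hi in zip(z_edges[:-1], z_edges[1:]):
            sel = (z_ref >= lo) & (z_ref < hi)
            if not sel.any():
                dndz.append(0.0)
                continue
            dd = cross_pairs(xy_unknown, xy_ref[sel]) / (len(xy_unknown) * sel.sum())
            rr = cross_pairs(xy_random, xy_ref[sel]) / (len(xy_random) * sel.sum())
            dndz.append(max(dd - rr, 0.0))
        dndz = np.asarray(dndz)
        return dndz / dndz.sum() if dndz.sum() > 0 else dndz
    ```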

  11. Social Network Clustering and the Spread of HIV/AIDS Among Persons Who Inject Drugs in 2 Cities in the Philippines.

    PubMed

    Verdery, Ashton M; Siripong, Nalyn; Pence, Brian W

    2017-09-01

    The Philippines has seen rapid increases in HIV prevalence among people who inject drugs. We study 2 neighboring cities where a linked HIV epidemic differed in timing of onset and levels of prevalence. In Cebu, prevalence rose rapidly from below 1% to 54% between 2009 and 2011 and remained high through 2013. In nearby Mandaue, HIV remained below 4% through 2011, then rose rapidly to 38% by 2013. We hypothesize that the difference in infection prevalence between these cities may be due to aspects of social network structure, specifically levels of network clustering. Building on previous research, we hypothesize that higher levels of network clustering are associated with greater epidemic potential. Data were collected with respondent-driven sampling among men who inject drugs in Cebu and Mandaue in 2013. We first examine sample composition using estimators for population means. We then apply new estimators of network clustering in respondent-driven sampling data to examine associations with HIV prevalence. Samples in both cities were comparable in composition by age, education, and injection locations. Dyadic needle-sharing levels were also similar between the 2 cities, but clustering in the needle-sharing network differed dramatically. We found higher clustering in Cebu than in Mandaue, consistent with the expectation that higher clustering is associated with faster epidemic spread. This article is the first to apply estimators of network clustering to empirical respondent-driven samples, and it offers suggestive evidence that researchers should pay greater attention to the role of network structure in HIV transmission dynamics.
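
    As a naive illustration of the quantity at stake (the paper's estimators are adapted to respondent-driven sampling; the plain version below ignores RDS weights), network clustering can be compared across cities in a few lines:

    ```python
    import networkx as nx

    def sharing_clustering(edges):
        """Average clustering coefficient of a needle-sharing network."""
        return nx.average_clustering(nx.Graph(edges))

    cebu = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 3)]   # triangle-rich
    mandaue = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]        # chain-like
    print(sharing_clustering(cebu), sharing_clustering(mandaue))  # ~0.87 vs 0.0
    ```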

  12. A clustering algorithm for sample data based on environmental pollution characteristics

    NASA Astrophysics Data System (ADS)

    Chen, Mei; Wang, Pengfei; Chen, Qiang; Wu, Jiadong; Chen, Xiaoyun

    2015-04-01

    Environmental pollution has become an issue of serious international concern in recent years. Among receptor-oriented pollution models, CMB, PMF, UNMIX, and PCA are widely used as source apportionment models. To improve the accuracy of source apportionment and classify the sample data for these models, this study proposes an easy-to-use, high-dimensional EPC algorithm that organizes all of the sample data into groups according to similarities in pollution characteristics, such as pollution sources and concentrations, while simultaneously detecting outliers. The main clustering process consists of selecting the first unlabelled point as the cluster centre, assigning each data point in the sample dataset to its most similar cluster centre according to both a user-defined threshold and the value of the similarity function in each iteration, and finally refining the clusters using a method similar to k-means; a rough sketch of this procedure is given below. The validity and accuracy of the algorithm are tested on both real and synthetic datasets, showing the EPC algorithm to be practical and effective for classifying sample data for source apportionment models and helpful for better understanding and interpreting the sources of pollution.
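
    A rough reconstruction of the described procedure, with every detail the abstract leaves open (similarity measure, threshold, refinement schedule) filled in by assumption:

    ```python
    import numpy as np

    def epc_cluster(X, threshold=1.0, n_iter=5):
        """Seed a centre at each first unlabelled point, absorb points within
        a similarity threshold, refine k-means-style; singletons = outliers."""
        X = np.asarray(X, dtype=float)
        labels = np.full(len(X), -1)
        centres = []
        for i in range(len(X)):                  # seeding pass
            if labels[i] == -1:
                centres.append(X[i])
                d = np.linalg.norm(X - X[i], axis=1)
                labels[(labels == -1) & (d <= threshold)] = len(centres) - 1
        centres = np.array(centres)
        for _ in range(n_iter):                  # k-means-like refinement
            d = np.linalg.norm(X[:, None] - centres[None], axis=2)
            labels = d.argmin(axis=1)
            for c in range(len(centres)):
                if (labels == c).any():
                    centres[c] = X[labels == c].mean(axis=0)
        counts = np.bincount(labels, minlength=len(centres))
        outliers = np.flatnonzero(counts[labels] == 1)
        return labels, centres, outliers
    ```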

  13. Cosmology with XMM galaxy clusters: the X-CLASS/GROND catalogue and photometric redshifts

    NASA Astrophysics Data System (ADS)

    Ridl, J.; Clerc, N.; Sadibekova, T.; Faccioli, L.; Pacaud, F.; Greiner, J.; Krühler, T.; Rau, A.; Salvato, M.; Menzel, M.-L.; Steinle, H.; Wiseman, P.; Nandra, K.; Sanders, J.

    2017-06-01

    The XMM Cluster Archive Super Survey (X-CLASS) is a serendipitously detected, X-ray-selected sample of 845 galaxy clusters based on 2774 XMM archival observations, covering approximately 90 deg^2 spread across the high-Galactic-latitude (|b| > 20°) sky. The primary goal of this survey is to produce a well-selected sample of galaxy clusters on which cosmological analyses can be performed. This paper presents the photometric redshift follow-up of a high signal-to-noise ratio subset of 265 of these clusters with declination δ < +20°, using the Gamma-Ray Burst Optical and Near-Infrared Detector (GROND), a 7-channel (grizJHK) simultaneous imager on the MPG 2.2-m telescope at the ESO La Silla Observatory. We use a newly developed technique based on the red-sequence colour-redshift relation, enhanced with information from the X-ray detection, to provide photometric redshifts for this sample. We determine photometric redshifts for 232 clusters, finding a median redshift of z = 0.39 with an accuracy of Δz = 0.02(1 + z) when compared to a sample of 76 spectroscopically confirmed clusters. We also compute X-ray luminosities for the entire sample and find a median bolometric luminosity of 7.2 × 10^43 erg s^-1 and a median temperature of 2.9 keV. We compare our results to those of the XMM-XCS and XMM-XXL surveys, finding good agreement with both samples. The X-CLASS catalogue is available online at http://xmm-lss.in2p3.fr:8080/l4sdb/.

  14. Changes in cluster magnetism and suppression of local superconductivity in amorphous FeCrB alloy irradiated by Ar+ ions

    NASA Astrophysics Data System (ADS)

    Okunev, V. D.; Samoilenko, Z. A.; Szymczak, H.; Szewczyk, A.; Szymczak, R.; Lewandowski, S. J.; Aleshkevych, P.; Malinowski, A.; Gierłowski, P.; Więckowski, J.; Wolny-Marszałek, M.; Jeżabek, M.; Varyukhin, V. N.; Antoshina, I. A.

    2016-02-01

    We show that cluster magnetism in the ferromagnetic amorphous Fe67Cr18B15 alloy is related to the presence of large (D = 150-250 Å) α-(Fe, Cr) clusters responsible for the basic changes in cluster magnetism, small (D = 30-100 Å) α-(Fe, Cr) and Fe3B clusters, and subcluster atomic α-(Fe, Cr, B) groupings (D = 10-20 Å) in the disordered intercluster medium. For the initial sample and the sample irradiated at Φ = 1.5×10^18 ions/cm^2, superconductivity exists in the cluster shells of the metallic α-(Fe, Cr) phase, where the ferromagnetism of iron is counterbalanced by the antiferromagnetism of chromium. At Φ = 3×10^18 ions/cm^2, the internal stresses intensify, and the process of iron and chromium phase separation, favorable for mesoscopic superconductivity, reverses, promoting a more homogeneous distribution of iron and chromium in the clusters as well as a large (roughly twofold) increase in the density of the samples. As a result, ferromagnetism is restored in the cluster shells, leading to an increase in the magnetization of the sample and the suppression of local superconductivity. For the initial samples, the temperature dependence of resistivity ρ(T) ∝ T^2 is determined by electron scattering on quantum defects. In strongly inhomogeneous samples, after irradiation at a fluence of Φ = 1.5×10^18 ions/cm^2, the transition to a ρ(T) ∝ T^(1/2) dependence is caused by weak-localization effects. In more homogeneous samples, at Φ = 3×10^18 ions/cm^2, a return to the ρ(T) ∝ T^2 dependence is observed.

  15. Mining the modular structure of protein interaction networks.

    PubMed

    Berenstein, Ariel José; Piñero, Janet; Furlong, Laura Inés; Chernomoretz, Ariel

    2015-01-01

    Cluster-based descriptions of biological networks have received much attention in recent years, fostered by accumulated evidence of meaningful correlations between topological network clusters and biological functional modules. Several well-performing clustering algorithms exist to infer topological network partitions. However, owing to their respective technical idiosyncrasies, they may produce dissimilar modular decompositions of a given network. In this contribution, we analyze how alternative modular descriptions can condition the outcome of follow-up network biology analyses. We considered a human protein interaction network and two paradigmatic cluster recognition algorithms, the Clauset-Newman-Moore and infomap procedures, and analyzed to what extent the two methodologies yielded different results in terms of granularity and biological congruency. In addition, taking into account Guimera's cartographic role characterization of network nodes, we explored how the adoption of a given clustering methodology affects the ability to highlight relevant network meso-scale connectivity patterns. As a case study, we considered a set of aging-related proteins and showed that only the high-resolution modular description provided by infomap could unveil statistically significant associations between them and inter/intra-modular cartographic features. Besides reporting novel biological insights gained from the discovered associations, our contribution warns of technical concerns that can affect the tools used to mine for interaction patterns in network biology studies. In particular, our results suggest that partitions that are sub-optimal from the strict point of view of their modularity levels may still be worth analyzing when meso-scale features are to be explored in connection with external sources of biological knowledge.
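
    One of the two partitioning procedures is directly available in networkx (infomap requires the external infomap package), so the granularity comparison the authors describe can be started like this; the karate-club graph is only a stand-in for the protein network:

    ```python
    import networkx as nx
    from networkx.algorithms.community import greedy_modularity_communities

    g = nx.karate_club_graph()              # stand-in interaction network
    cnm = greedy_modularity_communities(g)  # Clauset-Newman-Moore
    print("modules:", len(cnm), "sizes:", sorted(len(c) for c in cnm))
    # Downstream steps (e.g. Guimera-style cartographic roles) inherit this
    # partition, which is precisely why the choice of algorithm matters.
    ```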

  16. Density-based clustering of small peptide conformations sampled from a molecular dynamics simulation.

    PubMed

    Kim, Minkyoung; Choi, Seung-Hoon; Kim, Junhyoung; Choi, Kihang; Shin, Jae-Min; Kang, Sang-Kee; Choi, Yun-Jaie; Jung, Dong Hyun

    2009-11-01

    This study describes the application of a density-based algorithm to clustering small peptide conformations after a molecular dynamics simulation. We propose a clustering method for small peptide conformations that enables adjacent clusters to be separated more clearly on the basis of neighbor density. Neighbor density means the number of neighboring conformations, so if a conformation has too few neighboring conformations, then it is considered as noise or an outlier and is excluded from the list of cluster members. With this approach, we can easily identify clusters in which the members are densely crowded in the conformational space, and we can safely avoid misclustering individual clusters linked by noise or outliers. Consideration of neighbor density significantly improves the efficiency of clustering of small peptide conformations sampled from molecular dynamics simulations and can be used for predicting peptide structures.
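
    The neighbor-density rule is close in spirit to DBSCAN's core-point criterion, so a compact stand-in can be sketched as below (not the authors' code; eps and the RMSD metric are assumptions, and no structural superposition is performed):

    ```python
    import numpy as np
    from sklearn.cluster import DBSCAN

    def cluster_conformations(coords, eps=1.5, min_neighbors=5):
        """coords: (n_frames, n_atoms, 3) snapshots from an MD trajectory;
        returns one label per frame, with -1 marking low-density outliers."""
        coords = np.asarray(coords, dtype=float)
        n = len(coords)
        rmsd = np.zeros((n, n))
        for i in range(n):                 # naive all-pairs RMSD
            diff = coords - coords[i]
            rmsd[i] = np.sqrt((diff ** 2).sum(axis=(1, 2)) / coords.shape[1])
        return DBSCAN(eps=eps, min_samples=min_neighbors,
                      metric="precomputed").fit_predict(rmsd)
    ```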

  17. acdc – Automated Contamination Detection and Confidence estimation for single-cell genome data

    DOE PAGES

    Lux, Markus; Kruger, Jan; Rinke, Christian; ...

    2016-12-20

    A major obstacle in single-cell sequencing is sample contamination with foreign DNA. To guarantee clean genome assemblies and to prevent the introduction of contamination into public databases, considerable quality control efforts are put into post-sequencing analysis. Contamination screening generally relies on reference-based methods such as database alignment or marker gene search, which limits the set of detectable contaminants to organisms with closely related reference species. As genomic coverage in the tree of life is highly fragmented, there is an urgent need for a reference-free methodology for contaminant identification in sequence data. We present acdc, a tool specifically developed to aid the quality control process of genomic sequence data. By combining supervised and unsupervised methods, it reliably detects both known and de novo contaminants. First, 16S rRNA gene prediction and the inclusion of ultrafast exact alignment techniques allow sequence classification using existing knowledge from databases. Second, reference-free inspection is enabled by the use of state-of-the-art machine learning techniques that include fast, non-linear dimensionality reduction of oligonucleotide signatures and subsequent clustering algorithms that automatically estimate the number of clusters. The latter also enables the removal of any contaminant, yielding a clean sample. Furthermore, given the data complexity and the ill-posedness of clustering, acdc employs bootstrapping techniques to provide statistically profound confidence values. Tested on a large number of samples from diverse sequencing projects, our software is able to quickly and accurately identify contamination. Results are displayed in an interactive user interface. Acdc can be run from the web as well as a dedicated command line application, which allows easy integration into large sequencing project analysis workflows. Acdc can reliably detect contamination in single-cell genome data. In addition to database-driven detection, it complements existing tools by its unsupervised techniques, which allow for the detection of de novo contaminants. Our contribution has the potential to drastically reduce the amount of resources put into these processes, particularly in the context of limited availability of reference species. As single-cell genome data continues to grow rapidly, acdc adds to the toolkit of crucial quality assurance tools.

  19. Wide-field lensing mass maps from Dark Energy Survey science verification data: Methodology and detailed analysis

    DOE PAGES

    Vikram, V.

    2015-07-29

    Weak gravitational lensing allows one to reconstruct the spatial distribution of the projected mass density across the sky. These “mass maps” provide a powerful tool for studying cosmology as they probe both luminous and dark matter. In this paper, we present a weak lensing mass map reconstructed from shear measurements in a 139 deg^2 area from the Dark Energy Survey (DES) science verification data. We compare the distribution of mass with that of the foreground distribution of galaxies and clusters. The overdensities in the reconstructed map correlate well with the distribution of optically detected clusters. We demonstrate that candidate superclusters and voids along the line of sight can be identified, exploiting the tight scatter of the cluster photometric redshifts. We cross-correlate the mass map with a foreground magnitude-limited galaxy sample from the same data. Our measurement gives results consistent with mock catalogs from N-body simulations that include the primary sources of statistical uncertainties in the galaxy, lensing, and photo-z catalogs. The statistical significance of the cross-correlation is at the 6.8σ level with 20 arcminute smoothing. We find that the contribution of systematics to the lensing mass maps is generally within measurement uncertainties. In this study, we analyze less than 3% of the final area that will be mapped by the DES; the tools and analysis techniques developed in this paper can be applied to forthcoming larger data sets from the survey.

  20. Joint analysis of galaxy-galaxy lensing and galaxy clustering: Methodology and forecasts for Dark Energy Survey

    DOE PAGES

    Park, Y.; Krause, E.; Dodelson, S.; ...

    2016-09-30

    The joint analysis of galaxy-galaxy lensing and galaxy clustering is a promising method for inferring the growth function of large scale structure. Our analysis will be carried out on data from the Dark Energy Survey (DES), with its measurements of both the distribution of galaxies and the tangential shears of background galaxies induced by these foreground lenses. We develop a practical approach to modeling the assumptions and systematic effects affecting small scale lensing, which provides halo masses, and large scale galaxy clustering. Introducing parameters that characterize the halo occupation distribution (HOD), photometric redshift uncertainties, and shear measurement errors, we study how external priors on different subsets of these parameters affect our growth constraints. Degeneracies within the HOD model, as well as between the HOD and the growth function, are identified as the dominant source of complication, with other systematic effects sub-dominant. The impact of HOD parameters and their degeneracies necessitate the detailed joint modeling of the galaxy sample that we employ. Finally, we conclude that DES data will provide powerful constraints on the evolution of structure growth in the universe, conservatively/optimistically constraining the growth function to 7.9%/4.8% with its first-year data that covered over 1000 square degrees, and to 3.9%/2.3% with its full five-year data that will survey 5000 square degrees, including both statistical and systematic uncertainties.

  2. Two-stage sequential sampling: A neighborhood-free adaptive sampling procedure

    USGS Publications Warehouse

    Salehi, M.; Smith, D.R.

    2005-01-01

    Designing an efficient sampling scheme for a rare and clustered population is a challenging area of research. Adaptive cluster sampling, which has been shown to be viable for such a population, is based on sampling a neighborhood of units around a unit that meets a specified condition. However, the edge units produced by sampling neighborhoods have proven to limit the efficiency and applicability of adaptive cluster sampling. We propose a sampling design that is adaptive in the sense that the final sample depends on observed values, but it avoids the use of neighborhoods and the sampling of edge units. Unbiased estimators of population total and its variance are derived using Murthy's estimator. The modified two-stage sampling design is easy to implement and can be applied to a wider range of populations than adaptive cluster sampling. We evaluate the proposed sampling design by simulating sampling of two real biological populations and an artificial population for which the variable of interest took the value either 0 or 1 (e.g., indicating presence and absence of a rare event). We show that the proposed sampling design is more efficient than conventional sampling in nearly all cases. The approach used to derive estimators (Murthy's estimator) opens the door for unbiased estimators to be found for similar sequential sampling designs. © 2005 American Statistical Association and the International Biometric Society.

  3. Accounting for twin births in sample size calculations for randomised trials.

    PubMed

    Yelland, Lisa N; Sullivan, Thomas R; Collins, Carmel T; Price, David J; McPhee, Andrew J; Lee, Katherine J

    2018-05-04

    Including twins in randomised trials leads to non-independence or clustering in the data. Clustering has important implications for sample size calculations, yet few trials take this into account. Estimates of the intracluster correlation coefficient (ICC), or the correlation between outcomes of twins, are needed to assist with sample size planning. Our aims were to provide ICC estimates for infant outcomes, describe the information that must be specified in order to account for clustering due to twins in sample size calculations, and develop a simple tool for performing sample size calculations for trials including twins. ICCs were estimated for infant outcomes collected in four randomised trials that included twins. The information required to account for clustering due to twins in sample size calculations is described. A tool that calculates the sample size based on this information was developed in Microsoft Excel and in R as a Shiny web app. ICC estimates ranged between -0.12, indicating a weak negative relationship, and 0.98, indicating a strong positive relationship between outcomes of twins. Example calculations illustrate how the ICC estimates and sample size calculator can be used to determine the target sample size for trials including twins. Clustering among outcomes measured on twins should be taken into account in sample size calculations to obtain the desired power. Our ICC estimates and sample size calculator will be useful for designing future trials that include twins. Publication of additional ICCs is needed to further assist with sample size planning for future trials. © 2018 John Wiley & Sons Ltd.
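
    A hedged sketch of the kind of calculation the abstract describes (not the authors' Excel/Shiny tool): with a proportion p_twin of infants coming from twin pairs and within-pair correlation icc, the size-weighted mean cluster size gives a design effect of 1 + p_twin × icc.

    ```python
    import math
    from statistics import NormalDist

    def n_per_group(p_twin, icc, effect_size, power=0.8, alpha=0.05):
        """Infants per group for a two-sample comparison of means
        (effect_size in standard-deviation units)."""
        z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
        n_independent = 2 * (z / effect_size) ** 2
        design_effect = 1 + p_twin * icc   # clusters are singletons or pairs
        return math.ceil(n_independent * design_effect)

    # 30% twins, ICC 0.7, detecting a 0.3 SD difference: ~212 per group
    print(n_per_group(p_twin=0.3, icc=0.7, effect_size=0.3))
    ```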

  4. Population Structure With Localized Haplotype Clusters

    PubMed Central

    Browning, Sharon R.; Weir, Bruce S.

    2010-01-01

    We propose a multilocus version of FST and a measure of haplotype diversity using localized haplotype clusters. Specifically, we use haplotype clusters identified with BEAGLE, which is a program implementing a hidden Markov model for localized haplotype clustering and performing several functions including inference of haplotype phase. We apply this methodology to HapMap phase 3 data. With this haplotype-cluster approach, African populations have highest diversity and lowest divergence from the ancestral population, East Asian populations have lowest diversity and highest divergence, and other populations (European, Indian, and Mexican) have intermediate levels of diversity and divergence. These relationships accord with expectation based on other studies and accepted models of human history. In contrast, the population-specific FST estimates obtained directly from single-nucleotide polymorphisms (SNPs) do not reflect such expected relationships. We show that ascertainment bias of SNPs has less impact on the proposed haplotype-cluster-based FST than on the SNP-based version, which provides a potential explanation for these results. Thus, these new measures of FST and haplotype-cluster diversity provide an important new tool for population genetic analysis of high-density SNP data. PMID:20457877

  5. Impact of non-uniform correlation structure on sample size and power in multiple-period cluster randomised trials.

    PubMed

    Kasza, J; Hemming, K; Hooper, R; Matthews, Jns; Forbes, A B

    2017-01-01

    Stepped wedge and cluster randomised crossover trials are examples of cluster randomised designs conducted over multiple time periods that are being used with increasing frequency in health research. Recent systematic reviews of both of these designs indicate that the within-cluster correlation is typically taken account of in the analysis of data using a random intercept mixed model, implying a constant correlation between any two individuals in the same cluster no matter how far apart in time they are measured: within-period and between-period intra-cluster correlations are assumed to be identical. Recently proposed extensions allow the within- and between-period intra-cluster correlations to differ, although these methods require that all between-period intra-cluster correlations are identical, which may not be appropriate in all situations. Motivated by a proposed intensive care cluster randomised trial, we propose an alternative correlation structure for repeated cross-sectional multiple-period cluster randomised trials in which the between-period intra-cluster correlation is allowed to decay depending on the distance between measurements. We present results for the variance of treatment effect estimators for varying amounts of decay, investigating the consequences of the variation in decay on sample size planning for stepped wedge, cluster crossover and multiple-period parallel-arm cluster randomised trials. We also investigate the impact of assuming constant between-period intra-cluster correlations instead of decaying between-period intra-cluster correlations. Our results indicate that in certain design configurations, including the one corresponding to the proposed trial, a correlation decay can have an important impact on variances of treatment effect estimators, and hence on sample size and power. An R Shiny app allows readers to interactively explore the impact of correlation decay.
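
    A sketch of the variance calculation such planning rests on (our construction, not the authors' code or the R Shiny app): cluster-period means with a between-period intra-cluster correlation that decays as icc × decay^|t-s|.

    ```python
    import numpy as np

    def treatment_variance(design, m=20, icc=0.05, decay=0.8, sigma2=1.0):
        """GLS variance of the treatment effect; design is a (clusters,
        periods) 0/1 matrix, m the number of subjects per cluster-period."""
        C, T = design.shape
        t = np.arange(T)
        V = sigma2 * icc * decay ** np.abs(t[:, None] - t[None, :])
        np.fill_diagonal(V, sigma2 * (icc + (1 - icc) / m))
        Vinv = np.linalg.inv(V)
        # columns: intercept, T-1 period effects, treatment indicator
        X = np.hstack([np.ones((C * T, 1)),
                       np.tile(np.eye(T)[:, 1:], (C, 1)),
                       design.reshape(-1, 1)])
        XtVX = sum(X[c*T:(c+1)*T].T @ Vinv @ X[c*T:(c+1)*T] for c in range(C))
        return np.linalg.inv(XtVX)[-1, -1]

    # Classic 4-cluster, 5-period stepped wedge; decay=1 recovers the
    # constant between-period correlation of the random-intercept model.
    sw = np.array([[0,1,1,1,1], [0,0,1,1,1], [0,0,0,1,1], [0,0,0,0,1]])
    for d in (1.0, 0.8, 0.5):
        print(d, treatment_variance(sw, decay=d))
    ```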

  6. Effect of W self-implantation and He plasma exposure on early-stage defect and bubble formation in tungsten

    NASA Astrophysics Data System (ADS)

    Thompson, M.; Drummond, D.; Sullivan, J.; Elliman, R.; Kluth, P.; Kirby, N.; Riley, D.; Corr, C. S.

    2018-06-01

    To determine the effect of pre-existing defects on helium-vacancy cluster nucleation and growth, tungsten samples were self-implanted with 1 MeV tungsten ions at varying fluences to induce radiation damage, then exposed to helium plasma in the MAGPIE linear plasma device. Positron annihilation lifetime spectroscopy was performed both immediately after self-implantation and again after plasma exposure. After self-implantation, vacancy clusters were not observed near the sample surface (<30 nm). At greater depths (30–150 nm), vacancy clusters formed and were found to increase in size with increasing W-ion fluence. After helium plasma exposure in MAGPIE at ~300 K with a fluence of 10^23 He m^-2, deep (30–150 nm) vacancy clusters showed similar positron lifetimes, while shallow (<30 nm) clusters were not observed. The intensity of positron lifetime signals fell for most samples after plasma exposure, indicating that defects were filling with helium. The absence of shallow clusters indicates that helium requires pre-existing defects to drive vacancy cluster growth at 300 K. Further samples that had not been pre-damaged with W-ions were also exposed to helium plasma in MAGPIE at fluences from 1×10^22 to 1.2×10^24 He m^-2. Samples exposed to fluences up to 1×10^23 He m^-2 showed no signs of damage. Fluences of 5×10^23 He m^-2 and higher produced significant helium-cluster formation within the first 30 nm, with positron lifetimes in the vicinity of 0.5–0.6 ns. The sample temperature was significantly higher for these higher-fluence exposures (~400 K) due to plasma heating. This higher temperature likely enhanced bubble formation by significantly increasing the rate at which interstitial helium clusters generate vacancies, which we suspect is the rate-limiting step for helium-vacancy cluster/bubble nucleation in the absence of pre-existing defects.

  7. Formation of metallic clusters in oxide insulators by means of ion beam mixing

    NASA Astrophysics Data System (ADS)

    Talut, G.; Potzger, K.; Mücklich, A.; Zhou, Shengqiang

    2008-04-01

    The intermixing and near-interface cluster formation of Pt and FePt thin films deposited on different oxide surfaces were investigated by means of Pt+ ion irradiation and subsequent annealing. Irradiated as well as post-annealed samples were examined using high-resolution transmission electron microscopy. In MgO and Y:ZrO2 covered with Pt, crystalline clusters with mean sizes of 2 and 3.5 nm were found after Pt+ irradiation with 8×10^15 and 2×10^16 cm^-2 and subsequent annealing, respectively. In MgO samples covered with FePt, clusters with mean sizes of 1 and 2 nm were found after Pt+ irradiation with 8×10^15 and 2×10^16 cm^-2 and subsequent annealing, respectively. In Y:ZrO2 samples covered with FePt, clusters up to 5 nm in size were found after Pt+ irradiation with 2×10^16 cm^-2 and subsequent annealing. In LaAlO3, the irradiation was accompanied by full amorphization of the host matrix and the appearance of embedded clusters of different sizes. Determination of the lattice constant, and thus of the kind of clusters in samples covered by FePt, was hindered by strong deflection of the electron beam by the ferromagnetic FePt.

  8. the-wizz: clustering redshift estimation for everyone

    NASA Astrophysics Data System (ADS)

    Morrison, C. B.; Hildebrandt, H.; Schmidt, S. J.; Baldry, I. K.; Bilicki, M.; Choi, A.; Erben, T.; Schneider, P.

    2017-05-01

    We present the-wizz, an open-source and user-friendly software package for estimating the redshift distributions of photometric galaxies with unknown redshifts by spatially cross-correlating them against a reference sample with known redshifts. The main benefit of the-wizz is in separating the angular pair finding and correlation estimation from the computation of the output clustering redshifts, allowing anyone to create clustering redshifts for their sample without the intervention of an 'expert'. It allows the end user of a given survey to select any subsample of photometric galaxies with unknown redshifts, match this sample's catalogue indices into a value-added data file, and produce a clustering redshift estimate in a fraction of the time it would take to run all the angular correlations needed to produce it. We show results with this software using photometric data from the Kilo-Degree Survey (KiDS) and spectroscopic redshifts from the Galaxy and Mass Assembly survey and the Sloan Digital Sky Survey. The results we present for KiDS are consistent with the redshift distributions used in a recent cosmic shear analysis from the survey. We also present results using a hybrid machine learning-clustering redshift analysis that enables the estimation of clustering redshifts for individual galaxies. the-wizz can be downloaded at http://github.com/morriscb/The-wiZZ/.

  9. Rationale, design and methodology of a trial evaluating three strategies designed to improve sedation quality in intensive care units (DESIST study).

    PubMed

    Walsh, Timothy S; Kydonaki, Kalliopi; Antonelli, Jean; Stephen, Jacqueline; Lee, Robert J; Everingham, Kirsty; Hanley, Janet; Uutelo, Kimmo; Peltola, Petra; Weir, Christopher J

    2016-03-04

    To describe the rationale, design and methodology for a trial of three novel interventions developed to improve sedation-analgesia quality in adult intensive care units (ICUs). 8 clusters, each a Scottish ICU. All mechanically ventilated sedated patients were potentially eligible for inclusion in data analysis. Cluster randomised design in 8 ICUs, with ICUs randomised after 45 weeks baseline data collection to implement one of four intervention combinations: a web-based educational programme (2 ICUs); education plus regular sedation quality feedback using process control charts (2 ICUs); education plus a novel sedation monitoring technology (2 ICUs); or all three interventions. ICUs measured sedation-analgesia quality, relevant drug use and clinical outcomes, during a 45-week preintervention and 45-week postintervention period separated by an 8-week implementation period. The intended sample size was >100 patients per site per study period. The primary outcome was the proportion of 12 h care periods with optimum sedation-analgesia, defined as the absence of agitation, unnecessary deep sedation, poor relaxation and poor ventilator synchronisation. Secondary outcomes were proportions of care periods with each of these four components of optimum sedation and rates of sedation-related adverse events. Sedative and analgesic drug use, and ICU and hospital outcomes were also measured. Multilevel generalised linear regression mixed models will explore the effects of each intervention taking clustering into account, and adjusting for age, gender and APACHE II score. Sedation-analgesia quality outcomes will be explored at ICU level and individual patient level. A process evaluation using mixed methods including quantitative description of intervention implementation, focus groups and direct observation will provide explanatory information regarding any effects observed. The DESIST study uses a novel design to provide system-level evaluation of three contrasting complex interventions on sedation-analgesia quality. Recruitment is complete and analysis ongoing. NCT01634451. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/

  10. Interactive Inverse Groundwater Modeling - Addressing User Fatigue

    NASA Astrophysics Data System (ADS)

    Singh, A.; Minsker, B. S.

    2006-12-01

    This paper builds on ongoing research into an interactive, multi-objective framework for the groundwater inverse problem. We solve the classic problem of estimating a spatially continuous conductivity field given field measurements of hydraulic heads. The proposed framework is based on an interactive multi-objective genetic algorithm (IMOGA) that considers not only quantitative measures, such as calibration error and degree of regularization, but also expert knowledge about the structure of the underlying conductivity field, expressed as subjective rankings of potential conductivity fields by the expert. The IMOGA converges to the optimal Pareto front representing the best trade-off among the qualitative and quantitative objectives. However, since the IMOGA is a population-based iterative search, it requires the user to evaluate hundreds of solutions, which leads to the problem of 'user fatigue'. We propose a two-step methodology to combat user fatigue in such interactive systems, sketched in code below. The first step is choosing only a few highly representative solutions to be shown to the expert for ranking: spatial clustering is used to group the search space based on the similarity of the conductivity fields, and sampling is then carried out from different clusters to improve the diversity of solutions shown to the user. Once the expert has ranked representative solutions from each cluster, a machine learning model is used to 'learn' user preferences and extrapolate them to the solutions not ranked by the expert. We investigate different machine learning models, such as decision trees, Bayesian learning models, and instance-based weighting, to model user preference. In addition, we investigate ways to improve the performance of these models by providing information about the spatial structure of the conductivity fields (on which the expert bases his or her ranking). Results are shown for each of these machine learning models, and the advantages and disadvantages of each approach are discussed. These results indicate that the proposed two-step methodology leads to a significant reduction in user fatigue without deteriorating the solution quality of the IMOGA.
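
    A condensed sketch of that two-step idea (the cluster count, tree depth, and representative-selection rule are illustrative assumptions, not the paper's exact setup):

    ```python
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.tree import DecisionTreeRegressor

    def rank_population(fields, expert_rank, n_show=8):
        """fields: (n_solutions, n_features) flattened conductivity fields;
        expert_rank: callable scoring one field (the human in the loop)."""
        km = KMeans(n_clusters=n_show, n_init=10, random_state=0).fit(fields)
        # show the expert only the member nearest each cluster centroid
        reps = [int(np.argmin(((fields - c) ** 2).sum(axis=1)))
                for c in km.cluster_centers_]
        ranks = [expert_rank(fields[i]) for i in reps]
        # learn the preference model, then extrapolate to unranked solutions
        model = DecisionTreeRegressor(max_depth=4).fit(fields[reps], ranks)
        return model.predict(fields)
    ```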

  11. Reproducibility of Cognitive Profiles in Psychosis Using Cluster Analysis.

    PubMed

    Lewandowski, Kathryn E; Baker, Justin T; McCarthy, Julie M; Norris, Lesley A; Öngür, Dost

    2018-04-01

    Cognitive dysfunction is a core symptom dimension that cuts across the psychoses. Recent findings support classification of patients along the cognitive dimension using cluster analysis; however, data-derived groupings may be highly determined by sampling characteristics and the measures used to derive the clusters, and so their interpretability must be established. We examined cognitive clusters in a cross-diagnostic sample of patients with psychosis and associations with clinical and functional outcomes. We then compared our findings to a previous report of cognitive clusters in a separate sample using a different cognitive battery. Participants with affective or non-affective psychosis (n=120) and healthy controls (n=31) were administered the MATRICS Consensus Cognitive Battery, and clinical and community functioning assessments. Cluster analyses were performed on cognitive variables, and clusters were compared on demographic, cognitive, and clinical measures. Results were compared to findings from our previous report. A four-cluster solution provided a good fit to the data; profiles included a neuropsychologically normal cluster, a globally impaired cluster, and two clusters of mixed profiles. Cognitive burden was associated with symptom severity and poorer community functioning. The patterns of cognitive performance by cluster were highly consistent with our previous findings. We found evidence of four cognitive subgroups of patients with psychosis, with cognitive profiles that map closely to those produced in our previous work. Clusters were associated with clinical and community variables and a measure of premorbid functioning, suggesting that they reflect meaningful groupings: replicable, and related to clinical presentation and functional outcomes. (JINS, 2018, 24, 382-390).

  12. The properties of the disk system of globular clusters

    NASA Technical Reports Server (NTRS)

    Armandroff, Taft E.

    1989-01-01

    A large refined data sample is used to study the properties and origin of the disk system of globular clusters. A scale height for the disk cluster system of 800-1500 pc is found, which is consistent with scale-height determinations for samples of field stars identified with the Galactic thick disk. A rotational velocity of 193 ± 29 km/s and a line-of-sight velocity dispersion of 59 ± 14 km/s have been found for the metal-rich clusters.

  13. The X-ray luminosity functions of Abell clusters from the Einstein Cluster Survey

    NASA Technical Reports Server (NTRS)

    Burg, R.; Giacconi, R.; Forman, W.; Jones, C.

    1994-01-01

    We have derived the present-epoch X-ray luminosity function of northern Abell clusters using luminosities from the Einstein Cluster Survey. The sample is sufficiently large that we can determine the luminosity function for each richness class separately, with sufficient precision to study and compare the different luminosity functions. We find that, within each richness class, the range of X-ray luminosity is quite large and spans nearly a factor of 25. Characterizing the luminosity function for each richness class with a Schechter function, we find that the characteristic X-ray luminosity L* scales with richness class as L* ∝ N*^γ, where N* is the corrected mean number of galaxies in a richness class, and the best-fitting exponent is γ = 1.3 ± 0.4. Finally, our analysis suggests that there is a lower limit to the X-ray luminosity of clusters, determined by the integrated emission of the cluster member galaxies, which also scales with richness class. The present sample forms a baseline for testing cosmological evolution of Abell-like clusters when an appropriate high-redshift cluster sample becomes available.
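
    For reference, the fitted form and richness scaling in code (the normalisations below are placeholders; gamma is the abstract's best-fit value):

    ```python
    import numpy as np

    def schechter(L, L_star, phi_star=1.0, alpha=-1.0):
        """Differential luminosity function dn/dL."""
        x = L / L_star
        return (phi_star / L_star) * x ** alpha * np.exp(-x)

    def L_star_for_richness(N_star, L0=1.0, gamma=1.3):
        """Characteristic luminosity scaling as L* = L0 * N***gamma."""
        return L0 * N_star ** gamma
    ```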

  14. An Archival Search For Young Globular Clusters in Galaxies

    NASA Astrophysics Data System (ADS)

    Whitmore, Brad

    1995-07-01

    One of the most intriguing results from HST has been the discovery of ultraluminous star clusters in interacting and merging galaxies. These clusters have the luminosities, colors, and sizes that would be expected of young globular clusters produced by the interaction. We propose to use the data in the HST Archive to determine how prevalent this phenomenon is, and to determine whether similar clusters are produced in other environments. Three samples will be extracted and studied in a systematic and consistent manner: 1) interacting and merging galaxies, 2) starburst galaxies, and 3) a control sample of ``normal'' galaxies. A preliminary search of the archives shows that there are at least 20 galaxies in each of these samples, and the number will grow by about 50 as more observations become available. The data will be used to determine the luminosity function, color histogram, spatial distribution, and structural properties of the clusters using the same techniques employed in our study of NGC 7252 (the ``Atoms-for-Peace'' galaxy) and NGC 4038/4039 (``The Antennae''). Our ultimate goals are: 1) to understand how globular clusters form, and 2) to use the clusters as evolutionary tracers to unravel the histories of interacting galaxies.

  15. Herschel And Alma Observations Of The Ism In Massive High-Redshift Galaxy Clusters

    NASA Astrophysics Data System (ADS)

    Wu, John F.; Aguirre, Paula; Baker, Andrew J.; Devlin, Mark J.; Hilton, Matt; Hughes, John P.; Infante, Leopoldo; Lindner, Robert R.; Sifón, Cristóbal

    2017-06-01

    The Sunyaev-Zel'dovich effect (SZE) can be used to select samples of galaxy clusters that are essentially mass-limited out to arbitrarily high redshifts. I will present results from an investigation of the star formation properties of galaxies in four massive clusters, extending to z ~ 1, which were selected on the basis of their SZE decrements in the Atacama Cosmology Telescope (ACT) survey. All four clusters have been imaged with Herschel/PACS (tracing star formation rate) and two with ALMA (tracing dust and cold gas mass); newly discovered ALMA CO(4-3) and [CI] line detections expand an already large sample of spectroscopically confirmed cluster members. Star formation rate appears to anti-correlate with environmental density, but this trend vanishes after controlling for stellar mass. Elevated star formation and higher CO excitation are seen in "El Gordo," a violent cluster merger, relative to a virialized cluster at a similarly high redshift (z ~ 1). Also exploiting ATCA 2.1 GHz observations to identify radio-loud active galactic nuclei (AGN) in our sample, I will use these data to develop a coherent picture of how environment influences galaxies' ISM properties and evolution in the most massive clusters at early cosmic times.

  16. Weak-lensing mass calibration of the Atacama Cosmology Telescope equatorial Sunyaev-Zeldovich cluster sample with the Canada-France-Hawaii telescope stripe 82 survey

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Battaglia, N.; Miyatake, H.; Hasselfield, M.

    Mass calibration uncertainty is the largest systematic effect in using clusters of galaxies to constrain cosmological parameters. We present weak lensing mass measurements from the Canada-France-Hawaii Telescope Stripe 82 Survey for galaxy clusters selected through their high signal-to-noise thermal Sunyaev-Zeldovich (tSZ) signal measured with the Atacama Cosmology Telescope (ACT). For a sample of 9 ACT clusters with a tSZ signal-to-noise greater than five, the average weak lensing mass is (4.8±0.8)×10^14 M_⊙, consistent with the tSZ mass estimate of (4.70±1.0)×10^14 M_⊙, which assumes a universal pressure profile for the cluster gas. Our results are consistent with previous weak-lensing measurements of tSZ-detected clusters from the Planck satellite. When comparing our results, we estimate the Eddington bias correction for the sample intersection of Planck and weak-lensing clusters, which was previously excluded.

  17. The Morphologies and Alignments of Gas, Mass, and the Central Galaxies of CLASH Clusters of Galaxies

    NASA Astrophysics Data System (ADS)

    Donahue, Megan; Ettori, Stefano; Rasia, Elena; Sayers, Jack; Zitrin, Adi; Meneghetti, Massimo; Voit, G. Mark; Golwala, Sunil; Czakon, Nicole; Yepes, Gustavo; Baldi, Alessandro; Koekemoer, Anton; Postman, Marc

    2016-03-01

    Morphology is often used to infer the state of relaxation of galaxy clusters. The regularity, symmetry, and degree to which a cluster is centrally concentrated inform quantitative measures of cluster morphology. The Cluster Lensing and Supernova survey with Hubble Space Telescope (CLASH) used weak and strong lensing to measure the distribution of matter within a sample of 25 clusters, 20 of which were deemed to be “relaxed” based on their X-ray morphology and the alignment of the X-ray emission with the Brightest Cluster Galaxy. Toward a quantitative characterization of this important sample of clusters, we present uniformly estimated X-ray morphological statistics for all 25 CLASH clusters. We compare the X-ray morphologies of CLASH clusters with those identically measured for a large sample of simulated clusters from the MUSIC-2 simulations, selected by mass. We confirm a threshold in X-ray surface brightness concentration of C ≳ 0.4 for cool-core clusters, where C is the ratio of X-ray emission inside 100 h_70^-1 kpc to that inside 500 h_70^-1 kpc. We report and compare morphologies of these clusters inferred from Sunyaev-Zeldovich effect (SZE) maps of the hot gas and from projected mass maps based on strong and weak lensing. We find strong agreement in the alignments of the major axes of the lensing, X-ray, and SZE maps for nearly all of the CLASH clusters at radii of 500 kpc (approximately 1/2 R500 for these clusters). We also find a striking alignment of cluster shapes at the 500 kpc scale, as measured with X-ray, SZE, and lensing, with that of the near-infrared stellar light at 10 kpc scales for the 20 “relaxed” clusters. This strong alignment indicates a powerful coupling between cluster-scale and galaxy-scale formation processes.

  18. Multiwavelength study of X-ray luminous clusters in the Hyper Suprime-Cam Subaru Strategic Program S16A field

    NASA Astrophysics Data System (ADS)

    Miyaoka, Keita; Okabe, Nobuhiro; Kitaguchi, Takao; Oguri, Masamune; Fukazawa, Yasushi; Mandelbaum, Rachel; Medezinski, Elinor; Babazaki, Yasunori; Nishizawa, Atsushi J.; Hamana, Takashi; Lin, Yen-Ting; Akamatsu, Hiroki; Chiu, I.-Non; Fujita, Yutaka; Ichinohe, Yuto; Komiyama, Yutaka; Sasaki, Toru; Takizawa, Motokazu; Ueda, Shutaro; Umetsu, Keiichi; Coupon, Jean; Hikage, Chiaki; Hoshino, Akio; Leauthaud, Alexie; Matsushita, Kyoko; Mitsuishi, Ikuyuki; Miyatake, Hironao; Miyazaki, Satoshi; More, Surhud; Nakazawa, Kazuhiro; Ota, Naomi; Sato, Kousuke; Spergel, David; Tamura, Takayuki; Tanaka, Masayuki; Tanaka, Manobu M.; Utsumi, Yousuke

    2018-01-01

    We present a joint X-ray, optical, and weak-lensing analysis for X-ray luminous galaxy clusters selected from the MCXC (Meta-Catalog of X-Ray Detected Clusters of Galaxies) cluster catalog in the Hyper Suprime-Cam Subaru Strategic Program (HSC-SSP) survey field with S16A data. As a pilot study for a series of papers, we measure hydrostatic equilibrium (HE) masses using XMM-Newton data for four clusters in the current coverage area out of a sample of 22 MCXC clusters. We additionally analyze a non-MCXC cluster associated with one MCXC cluster. We show that HE masses for the MCXC clusters are correlated with cluster richness from the CAMIRA catalog, while that for the non-MCXC cluster deviates from the scaling relation. The mass normalization of the relationship between cluster richness and HE mass is compatible with one inferred by matching CAMIRA cluster abundance with a theoretical halo mass function. The mean gas mass fraction based on HE masses for the MCXC clusters is 0.125 ± 0.012 at spherical overdensity Δ = 500, which is ~80%-90% of the cosmic mean baryon fraction, Ωb/Ωm, measured by cosmic microwave background experiments. We find that the mean baryon fraction estimated from X-ray and HSC-SSP optical data is comparable to Ωb/Ωm. A weak-lensing shear catalog of background galaxies, combined with photometric redshifts, is currently available only for three clusters in our sample. Hydrostatic equilibrium masses roughly agree with weak-lensing masses, albeit with large uncertainty. This study demonstrates that further multiwavelength study for a large sample of clusters using X-ray, HSC-SSP optical, and weak-lensing data will enable us to understand cluster physics and utilize cluster-based cosmology.

  19. Atomically precise (catalytic) particles synthesized by a novel cluster deposition instrument

    DOE PAGES

    Yin, C.; Tyo, E.; Kuchta, K.; ...

    2014-05-06

    Here, we report a new high vacuum instrument which is dedicated to the preparation of well-defined clusters supported on model and technologically relevant supports for catalytic and materials investigations. The instrument is based on deposition of size selected metallic cluster ions that are produced by a high flux magnetron cluster source. Furthermore, we maximize the throughput of the apparatus by collecting and focusing ions utilizing a conical octupole ion guide and a linear ion guide. The size selection is achieved by a quadrupole mass filter. The new design of the sample holder provides for the preparation of multiple samples on supports of various sizes and shapes in one session. After cluster deposition onto the support of interest, samples will be taken out of the chamber for a variety of testing and characterization.

  20. Mutation Clusters from Cancer Exome.

    PubMed

    Kakushadze, Zura; Yu, Willie

    2017-08-15

    We apply our statistically deterministic machine learning/clustering algorithm *K-means (recently developed in https://ssrn.com/abstract=2908286) to 10,656 published exome samples for 32 cancer types. A majority of cancer types exhibit a mutation clustering structure. Our results are in-sample stable. They are also out-of-sample stable when applied to 1389 published genome samples across 14 cancer types. In contrast, we find in- and out-of-sample instabilities in cancer signatures extracted from exome samples via nonnegative matrix factorization (NMF), a computationally-costly and non-deterministic method. Extracting stable mutation structures from exome data could have important implications for speed and cost, which are critical for early-stage cancer diagnostics, such as novel blood-test methods currently in development.

  2. BioCluster: tool for identification and clustering of Enterobacteriaceae based on biochemical data.

    PubMed

    Abdullah, Ahmed; Sabbir Alam, S M; Sultana, Munawar; Hossain, M Anwar

    2015-06-01

    Presumptive identification of different Enterobacteriaceae species is routinely achieved based on biochemical properties. Traditional practice includes manual comparison of each biochemical property of the unknown sample with known reference samples and inference of its identity from the maximum similarity pattern with the known samples. This process is labor-intensive, time-consuming, error-prone, and subjective. Automating the sorting and similarity calculations would therefore be advantageous. Here we present a MATLAB-based graphical user interface (GUI) tool named BioCluster, designed for automated clustering and identification of Enterobacteriaceae based on biochemical test results. The tool implements two algorithms: traditional hierarchical clustering (HC) and Improved Hierarchical Clustering (IHC), a modified algorithm developed specifically for the clustering and identification of Enterobacteriaceae species. IHC takes into account the variability in the results of 1-47 biochemical tests within the family Enterobacteriaceae. The tool also provides different options to optimize the clustering in a user-friendly way. Using computer-generated synthetic data and some real data, we demonstrate that BioCluster has high accuracy in clustering and identifying enterobacterial species from biochemical test data. The tool can be freely downloaded at http://microbialgen.du.ac.bd/biocluster/. Copyright © 2015 The Authors. Production and hosting by Elsevier Ltd. All rights reserved.
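
    The plain-HC half of this pipeline is straightforward to reproduce. A minimal sketch, assuming binary (positive/negative) test profiles and a hypothetical four-isolate panel; the IHC algorithm, which additionally weights test variability, is not reproduced here:

      import numpy as np
      from scipy.cluster.hierarchy import fcluster, linkage
      from scipy.spatial.distance import pdist

      # rows = isolates, columns = biochemical tests (1 = positive, 0 = negative)
      profiles = np.array([
          [1, 0, 1, 1, 0, 1],  # isolate A
          [1, 0, 1, 1, 0, 0],  # isolate B
          [0, 1, 0, 0, 1, 1],  # isolate C
          [0, 1, 0, 1, 1, 1],  # isolate D
      ])
      dist = pdist(profiles, metric="jaccard")          # dissimilarity of test patterns
      tree = linkage(dist, method="average")            # UPGMA hierarchical clustering
      print(fcluster(tree, t=2, criterion="maxclust"))  # two groups: A,B vs C,D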

  3. A Snapshot Survey of X-Ray Selected Central Cluster Galaxies

    NASA Astrophysics Data System (ADS)

    Edge, Alastair

    1999-07-01

    Central cluster galaxies are the most massive stellar systems known and have been used as standard candles for many decades. Only recently have central cluster galaxies been recognised to exhibit a wide variety of small-scale (<100 pc) features that can only be reliably detected with HST resolution. The most intriguing of these are dust lanes, which have been detected in many central cluster galaxies. Dust is not expected to survive long in the hostile cluster environment unless shielded by the ISM of a disk galaxy or very dense clouds of cold gas. WFPC2 snapshot images of a representative subset of the central cluster galaxies from an X-ray selected cluster sample would provide important constraints on the formation and evolution of dust in cluster cores that cannot be obtained from ground-based observations. In addition, these images will allow the AGN component, the frequency of multiple nuclei, and the amount of massive-star formation in central cluster galaxies to be assessed. The proposed HST observations would also provide high-resolution images of previously unresolved gravitational arcs in the most massive clusters in our sample, resulting in constraints on the shape of the gravitational potential of these systems. This project will complement our extensive multi-frequency work on this sample, which includes optical spectroscopy and photometry, VLA and X-ray images for the majority of the 210 targets.

  4. Cosmological Constraints from Galaxy Clustering and the Mass-to-number Ratio of Galaxy Clusters

    NASA Astrophysics Data System (ADS)

    Tinker, Jeremy L.; Sheldon, Erin S.; Wechsler, Risa H.; Becker, Matthew R.; Rozo, Eduardo; Zu, Ying; Weinberg, David H.; Zehavi, Idit; Blanton, Michael R.; Busha, Michael T.; Koester, Benjamin P.

    2012-01-01

    We place constraints on the average density (Ωm) and clustering amplitude (σ8) of matter using a combination of two measurements from the Sloan Digital Sky Survey: the galaxy two-point correlation function, wp(rp), and the mass-to-galaxy-number ratio within galaxy clusters, M/N, analogous to cluster M/L ratios. Our wp(rp) measurements are obtained from DR7 while the sample of clusters is the maxBCG sample, with cluster masses derived from weak gravitational lensing. We construct nonlinear galaxy bias models using the Halo Occupation Distribution (HOD) to fit both wp(rp) and M/N for different cosmological parameters. HOD models that match the same two-point clustering predict different numbers of galaxies in massive halos when Ωm or σ8 is varied, thereby breaking the degeneracy between cosmology and bias. We demonstrate that this technique yields constraints that are consistent and competitive with current results from cluster abundance studies, without the use of abundance information. Using wp(rp) and M/N alone, we find Ωm^0.5 σ8 = 0.465 ± 0.026, with individual constraints of Ωm = 0.29 ± 0.03 and σ8 = 0.85 ± 0.06. Combined with current cosmic microwave background data, these constraints are Ωm = 0.290 ± 0.016 and σ8 = 0.826 ± 0.020. All errors are 1σ. The systematic uncertainties that the M/N technique is most sensitive to are the amplitude of the bias function of dark matter halos and the possibility of redshift evolution between the SDSS Main sample and the maxBCG cluster sample. Our derived constraints are insensitive to the current level of uncertainties in the halo mass function and in the mass-richness relation of clusters and its scatter, making the M/N technique complementary to cluster abundances as a method for constraining cosmology with future galaxy surveys.

  5. Finding gene clusters for a replicated time course study

    PubMed Central

    2014-01-01

    Background Finding genes that share similar expression patterns across samples is an important question that is frequently asked in high-throughput microarray studies. Traditional clustering algorithms such as K-means clustering and hierarchical clustering base gene clustering directly on the observed measurements and do not take into account the specific experimental design under which the microarray data were collected. A new model-based clustering method, the clustering of regression models method, takes into account the specific design of the microarray study and bases the clustering on how genes are related to sample covariates. It can find useful gene clusters for studies with complicated designs such as replicated time course studies. Findings In this paper, we applied the clustering of regression models method to data from a time course study of two yeast genotypes, wild type and YOX1 mutant, each with two technical replicates, and compared the clustering results with K-means clustering. We identified gene clusters that have similar expression patterns in wild type yeast, two of which were missed by K-means clustering. We further identified gene clusters whose expression patterns were changed in YOX1 mutant yeast compared to wild type yeast. Conclusions The clustering of regression models method can be a valuable tool for identifying genes that are coordinately transcribed by a common mechanism. PMID:24460656
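
    The core idea (summarize each gene by its fitted relation to the design covariates, then cluster the fits) can be sketched as follows. The genotype-by-time design matrix below is a simplified stand-in for the study's actual layout, and plain OLS plus KMeans replaces the paper's model-based clustering machinery:

      import numpy as np
      from sklearn.cluster import KMeans

      rng = np.random.default_rng(1)
      n_genes, n_arrays = 200, 16
      time = np.tile(np.repeat([0.0, 1.0, 2.0, 3.0], 2), 2)  # 4 time points x 2 replicates x 2 genotypes
      genotype = np.repeat([0.0, 1.0], 8)                    # 0 = wild type, 1 = YOX1 mutant
      D = np.column_stack([np.ones(n_arrays), time, genotype, time * genotype])

      Y = rng.normal(size=(n_genes, n_arrays))               # synthetic expression matrix
      B, *_ = np.linalg.lstsq(D, Y.T, rcond=None)            # per-gene regression coefficients
      labels = KMeans(n_clusters=5, n_init=20, random_state=0).fit_predict(B.T)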

  6. Cluster randomised crossover trials with binary data and unbalanced cluster sizes: application to studies of near-universal interventions in intensive care.

    PubMed

    Forbes, Andrew B; Akram, Muhammad; Pilcher, David; Cooper, Jamie; Bellomo, Rinaldo

    2015-02-01

    Cluster randomised crossover trials have been utilised in recent years in the health and social sciences. Methods for analysis have been proposed; however, for binary outcomes, these have received little assessment of their appropriateness. In addition, methods for determination of sample size are currently limited to balanced cluster sizes both between clusters and between periods within clusters. This article aims to extend this work to unbalanced situations and to evaluate the properties of a variety of methods for analysis of binary data, with a particular focus on the setting of potential trials of near-universal interventions in intensive care to reduce in-hospital mortality. We derive a formula for sample size estimation for unbalanced cluster sizes and apply it to the intensive care setting to demonstrate the utility of the cluster crossover design. We conduct a numerical simulation of the design in the intensive care setting and for more general configurations, and we assess the performance of three cluster summary estimators and an individual-data estimator based on binomial identity-link regression. For settings similar to the intensive care scenario, involving large cluster sizes and small intra-cluster correlations, the sample size formulae developed and the analysis methods investigated are found to be appropriate, with the unweighted cluster summary method performing well relative to the more optimal but more complex inverse-variance weighted method. More generally, we find that the unweighted and cluster-size-weighted summary methods perform well, with the relative efficiency of each largely determined by the study design parameters. Performance of individual-data regression is adequate with small cluster sizes but becomes inefficient for large, unbalanced cluster sizes. When outcome prevalences are 6% or less and the within-cluster-within-period correlation is 0.05 or larger, all methods display sub-nominal confidence interval coverage: the less prevalent the outcome, the worse the coverage. As with all simulation studies, conclusions are limited to the configurations studied. We confined attention to detecting intervention effects on an absolute risk scale using marginal models and did not explore the properties of binary random effects models. Cluster crossover designs with binary outcomes can be analysed using simple cluster summary methods, and sample size in unbalanced cluster size settings can be determined using relatively straightforward formulae. However, caution is needed in situations with low prevalence outcomes and moderate to high intra-cluster correlations. © The Author(s) 2014.
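
    The unweighted cluster summary method that performs well here is easy to state concretely: form one within-cluster difference in event proportions per cluster and t-test the differences. A minimal sketch with illustrative numbers (the cluster count, sizes, and mortality rates are assumptions, loosely in the spirit of the intensive care setting):

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(2)
      n_clusters, m = 20, 400                                   # ICUs and patients per cluster-period
      p_ctl = np.clip(0.10 + rng.normal(0, 0.01, n_clusters), 0, 1)  # cluster-level control mortality
      deaths_ctl = rng.binomial(m, p_ctl)                            # control-period deaths
      deaths_trt = rng.binomial(m, np.clip(p_ctl - 0.01, 0, 1))      # intervention-period deaths

      diffs = deaths_trt / m - deaths_ctl / m                   # one summary difference per cluster
      t, p = stats.ttest_1samp(diffs, 0.0)
      print(diffs.mean(), p)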

  7. Extra virgin (EV) and ordinary (ON) olive oils: distinction and detection of adulteration (EV with ON) as determined by direct infusion electrospray ionization mass spectrometry and chemometric approaches.

    PubMed

    Alves, Júnia de O; Neto, Waldomiro B; Mitsutake, Hery; Alves, Paulo S P; Augusti, Rodinei

    2010-07-15

    Extra virgin (EV), the finest and most expensive of all the olive oil grades, is often adulterated with the cheapest and lowest-quality grade, ordinary (ON) olive oil. A new methodology is described herein that provides a simple, rapid, and accurate way not only to detect this type of adulteration but also to distinguish between these olive oil grades (EV and ON). The approach is based on direct infusion electrospray ionization mass spectrometry in the positive ion mode, ESI(+)-MS, followed by treatment of the MS data with exploratory statistical approaches: PCA (principal component analysis) and HCA (hierarchical cluster analysis). Ten distinct brands of each of the EV and ON olive oils, acquired at local stores, were analyzed by ESI(+)-MS, and the results from HCA and PCA clearly indicated the formation of two distinct groups corresponding to these two categories. For the adulteration study, one brand of each olive oil grade (EV and ON) was selected. The counterfeit samples (20 in total) were then prepared by adding ON to EV olive oil in proportions from 1 to 20% w/w, in increments of 1% w/w. The PCA and HCA methodologies, applied to the ESI(+)-MS data from the counterfeit (20) and authentic (10) EV samples, readily detected adulteration, even at levels as low as 1% w/w. Copyright 2010 John Wiley & Sons, Ltd.
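
    The chemometric step amounts to unsupervised pattern analysis of an intensity matrix. A minimal sketch, assuming rows are oil samples and columns are m/z channels; the synthetic matrix below merely mimics two grades with shifted profiles:

      import numpy as np
      from scipy.cluster.hierarchy import fcluster, linkage
      from sklearn.decomposition import PCA

      rng = np.random.default_rng(3)
      ev = rng.normal(0.0, 1.0, size=(10, 300))  # 10 "extra virgin" spectra
      on = rng.normal(0.5, 1.0, size=(10, 300))  # 10 "ordinary" spectra, shifted profile
      X = np.vstack([ev, on])

      scores = PCA(n_components=2).fit_transform(X)                  # exploratory projection for plotting
      groups = fcluster(linkage(X, method="ward"), t=2, criterion="maxclust")
      print(groups)                                                  # ideally separates the two grades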

  8. Dependence of the clustering properties of galaxies on stellar velocity dispersion in the Main galaxy sample of SDSS DR10

    NASA Astrophysics Data System (ADS)

    Deng, Xin-Fa; Song, Jun; Chen, Yi-Qing; Jiang, Peng; Ding, Ying-Ping

    2014-08-01

    Using two volume-limited Main galaxy samples of the Sloan Digital Sky Survey Data Release 10 (SDSS DR10), we investigate the dependence of the clustering properties of galaxies on stellar velocity dispersion by cluster analysis. It is found that in the luminous volume-limited Main galaxy sample, except at r=1.2, richer and larger systems form more easily in the large stellar velocity dispersion subsample, while in the faint volume-limited Main galaxy sample the opposite trend is observed at r≥0.9. From statistical analyses of the multiplicity functions, we conclude that in both volume-limited Main galaxy samples small stellar velocity dispersion galaxies preferentially form isolated galaxies, close pairs, and small groups, while large stellar velocity dispersion galaxies preferentially inhabit dense groups and clusters. We note one difference between the two samples, however: in the faint volume-limited Main galaxy sample, at r≥0.9, the small stellar velocity dispersion subsample has a higher proportion of galaxies in superclusters (n≥200) than the large stellar velocity dispersion subsample.

  9. See Change: the Supernova Sample from the Supernova Cosmology Project High Redshift Cluster Supernova Survey

    NASA Astrophysics Data System (ADS)

    Hayden, Brian; Perlmutter, Saul; Boone, Kyle; Nordin, Jakob; Rubin, David; Lidman, Chris; Deustua, Susana E.; Fruchter, Andrew S.; Aldering, Greg Scott; Brodwin, Mark; Cunha, Carlos E.; Eisenhardt, Peter R.; Gonzalez, Anthony H.; Jee, James; Hildebrandt, Hendrik; Hoekstra, Henk; Santos, Joana; Stanford, S. Adam; Stern, Daniel; Fassbender, Rene; Richard, Johan; Rosati, Piero; Wechsler, Risa H.; Muzzin, Adam; Willis, Jon; Boehringer, Hans; Gladders, Michael; Goobar, Ariel; Amanullah, Rahman; Hook, Isobel; Huterer, Dragan; Huang, Xiaosheng; Kim, Alex G.; Kowalski, Marek; Linder, Eric; Pain, Reynald; Saunders, Clare; Suzuki, Nao; Barbary, Kyle H.; Rykoff, Eli S.; Meyers, Joshua; Spadafora, Anthony L.; Sofiatti, Caroline; Wilson, Gillian; Rozo, Eduardo; Hilton, Matt; Ruiz-Lapuente, Pilar; Luther, Kyle; Yen, Mike; Fagrelius, Parker; Dixon, Samantha; Williams, Steven

    2017-01-01

    The Supernova Cosmology Project has finished executing a large (174 orbits, cycles 22-23) Hubble Space Telescope program, which has measured ~30 Type Ia supernovae above z~1 in the highest-redshift, most massive galaxy clusters known to date. Our SN Ia sample closely matches our pre-survey predictions; this sample will improve the constraint on the Dark Energy equation of state above z~1 by a factor of 3, allowing an unprecedented probe of Dark Energy time variation. When combined with the improved cluster mass calibration from gravitational lensing provided by the deep WFC3-IR observations of the clusters, See Change will triple the Dark Energy Task Force Figure of Merit. With the primary observing campaign completed, we present the preliminary supernova sample and our path forward to the supernova cosmology results. We also compare the number of SNe Ia discovered in each cluster with our pre-survey expectations based on cluster mass and SFR estimates. Our extensive HST and ground-based campaign has already produced unique results; we have confirmed several of the highest-redshift cluster members known to date, confirmed the redshift of one of the most massive galaxy clusters expected across the entire sky at z~1.2, and characterized one of the most extreme starburst environments yet known in a z~1.7 cluster. We have also discovered a lensed SN Ia at z=2.22 magnified by a factor of ~2.7, making it the SN Ia with the highest spectroscopic redshift currently known.

  10. MVP-CA Methodology for the Expert System Advocate's Advisor (ESAA)

    DOT National Transportation Integrated Search

    1997-11-01

    The Multi-Viewpoint Clustering Analysis (MVP-CA) tool is a semi-automated tool to provide a valuable aid for comprehension, verification, validation, maintenance, integration, and evolution of complex knowledge-based software systems. In this report,...

  11. Models of Educational Attainment: A Theoretical and Methodological Critique

    ERIC Educational Resources Information Center

    Byrne, D. S.; And Others

    1973-01-01

    Uses cluster analysis techniques to show that egalitarian policies in secondary education, coupled with high financial inputs, have measurable payoffs in higher attainment rates, based on Max Weber's notion of "power" within a community. (Author/JM)

  12. Fuzzy Set Methods for Object Recognition in Space Applications

    NASA Technical Reports Server (NTRS)

    Keller, James M. (Editor)

    1992-01-01

    Progress on the following four tasks is described: (1) fuzzy set based decision methodologies; (2) membership calculation; (3) clustering methods (including derivation of pose estimation parameters), and (4) acquisition of images and testing of algorithms.

  13. OGLE Collection of Star Clusters. New Objects in the Outskirts of the Large Magellanic Cloud

    NASA Astrophysics Data System (ADS)

    Sitek, M.; Szymański, M. K.; Skowron, D. M.; Udalski, A.; Kostrzewa-Rutkowska, Z.; Skowron, J.; Karczmarek, P.; Cieślar, M.; Wyrzykowski, Ł.; Kozłowski, S.; Pietrukowicz, P.; Soszyński, I.; Mróz, P.; Pawlak, M.; Poleski, R.; Ulaczyk, K.

    2016-09-01

    The Magellanic System (MS), consisting of the Large Magellanic Cloud (LMC), the Small Magellanic Cloud (SMC), and the Magellanic Bridge (MBR), contains a diverse sample of star clusters. Their spatial distribution, ages, and chemical abundances may provide important information about the formation history of the whole System. We use deep photometric maps derived from images collected during the fourth phase of the Optical Gravitational Lensing Experiment (OGLE-IV) to construct the most complete catalog of star clusters in the Large Magellanic Cloud from homogeneous photometric data. In this paper we present the collection of star clusters found in an area of about 225 square degrees in the outer regions of the LMC. Our sample contains 679 visually identified star cluster candidates, 226 of which were not listed in any previously published catalog. The new clusters are mainly young, small open clusters or clusters similar to associations.

  14. The Effect of Cluster Sampling Design in Survey Research on the Standard Error Statistic.

    ERIC Educational Resources Information Center

    Wang, Lin; Fan, Xitao

    Standard statistical methods are used to analyze data that is assumed to be collected using a simple random sampling scheme. These methods, however, tend to underestimate variance when the data is collected with a cluster design, which is often found in educational survey research. The purposes of this paper are to demonstrate how a cluster design…
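
    The size of the problem is easy to quantify with the standard design effect for one-stage cluster sampling, DEFF = 1 + (m - 1)ρ, where m is the cluster size and ρ the intraclass correlation; a naive standard error that ignores clustering is too small by a factor of sqrt(DEFF). A worked example (the class size and ICC below are illustrative):

      import math

      m, rho = 25, 0.05                  # 25 students per classroom, ICC = 0.05
      deff = 1 + (m - 1) * rho           # = 2.2
      naive_se = 0.02                    # SE computed as if simple random sampling
      print(naive_se * math.sqrt(deff))  # ~0.0297: nearly 50% larger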

  15. A Comparison of Single Sample and Bootstrap Methods to Assess Mediation in Cluster Randomized Trials

    ERIC Educational Resources Information Center

    Pituch, Keenan A.; Stapleton, Laura M.; Kang, Joo Youn

    2006-01-01

    A Monte Carlo study examined the statistical performance of single sample and bootstrap methods that can be used to test and form confidence interval estimates of indirect effects in two cluster randomized experimental designs. The designs were similar in that they featured random assignment of clusters to one of two treatment conditions and…
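
    A cluster (case-resampling) bootstrap for an indirect effect a*b can be sketched as follows; the variable names and flat OLS path estimates are simplifications of the multilevel models such designs call for, and every number below is synthetic:

      import numpy as np

      rng = np.random.default_rng(4)
      n_clusters, m = 30, 20
      cluster = np.repeat(np.arange(n_clusters), m)
      trt = np.repeat(rng.integers(0, 2, n_clusters), m).astype(float)  # cluster-level treatment
      med = 0.5 * trt + rng.normal(size=trt.size)                       # mediator
      y = 0.4 * med + 0.2 * trt + rng.normal(size=trt.size)             # outcome

      def indirect(tr, me, yy):
          a = np.polyfit(tr, me, 1)[0]                  # path a: treatment -> mediator
          X = np.column_stack([np.ones_like(me), me, tr])
          b = np.linalg.lstsq(X, yy, rcond=None)[0][1]  # path b: mediator -> outcome
          return a * b

      est = []
      for _ in range(1000):
          ids = rng.integers(0, n_clusters, n_clusters)  # resample whole clusters
          idx = np.concatenate([np.where(cluster == c)[0] for c in ids])
          est.append(indirect(trt[idx], med[idx], y[idx]))
      print(np.percentile(est, [2.5, 97.5]))             # percentile bootstrap CI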

  16. Characterization of Oxygen Defect Clusters in UO2+x Using Neutron Scattering and PDF Analysis.

    PubMed

    Ma, Yue; Garcia, Philippe; Lechelle, Jacques; Miard, Audrey; Desgranges, Lionel; Baldinozzi, Gianguido; Simeone, David; Fischer, Henry E

    2018-06-18

    In hyper-stoichiometric uranium oxide, both neutron diffraction work and, more recently, theoretical analyses report the existence of clusters such as the 2:2:2 cluster, comprising two anion vacancies and two types of anion interstitials. However, little is known about whether there exists a region of low deviation from stoichiometry in which defects remain isolated, or whether at high deviation from stoichiometry defect clusters prevail that contain more excess oxygen atoms than the di-interstitial cluster. In this study, we report pair distribution function (PDF) analyses of UO2 and UO2+x (x ≈ 0.007 and x ≈ 0.16) samples obtained from high-temperature in situ neutron scattering experiments. PDF refinement for the lower deviation-from-stoichiometry sample suggests the system is too dilute to differentiate between isolated defects and di-interstitial clusters. For the UO2.16 sample, several defect structures are tested, and the data are best represented assuming the presence of center-occupied cuboctahedra.

  17. A good mass proxy for galaxy clusters with XMM-Newton

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhao, Hai-Hui; Jia, Shu-Mei; Chen, Yong

    2013-12-01

    We use a sample of 39 galaxy clusters at redshift z < 0.1 observed by XMM-Newton to investigate the relations between X-ray observables and total mass. Based on central cooling time and central temperature drop, the clusters in this sample are divided into two groups: 25 cool core clusters and 14 non-cool core clusters. We study the scaling relations L_bol-M_500, M_500-T, M_500-M_g, and M_500-Y_X, and the influence of cool cores on these relations. The results show that the M_500-Y_X relation has a slope close to the standard self-similar value, has the smallest scatter, and does not vary with the cluster sample. Moreover, the M_500-Y_X relation is not affected by the cool core. Thus, the parameter Y_X may be the best mass indicator.
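
    In practice a mass-proxy relation like M_500-Y_X is calibrated as a power law in log space. A minimal sketch with synthetic placeholder values (the slope and scatter below are assumptions, not the paper's results):

      import numpy as np

      rng = np.random.default_rng(5)
      log_yx = rng.uniform(13.0, 15.0, 39)                   # 39 clusters, log10 Y_X
      log_m = 0.57 * log_yx + 6.4 + rng.normal(0, 0.06, 39)  # assumed near-self-similar slope
      slope, intercept = np.polyfit(log_yx, log_m, 1)
      scatter = (log_m - (intercept + slope * log_yx)).std()
      print(slope, scatter)                                  # fitted slope and dex scatter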

  19. Euler-Euler anisotropic Gaussian mesoscale simulation of homogeneous cluster-induced gas-particle turbulence

    DOE PAGES

    Kong, Bo; Fox, Rodney O.; Feng, Heng; ...

    2017-02-16

    An Euler–Euler anisotropic Gaussian approach (EE-AG) for simulating gas–particle flows, in which particle velocities are assumed to follow a multivariate anisotropic Gaussian distribution, is used to perform mesoscale simulations of homogeneous cluster-induced turbulence (CIT). A three-dimensional Gauss–Hermite quadrature formulation is used to calculate the kinetic flux for 10 velocity moments in a finite-volume framework. The particle-phase volume-fraction and momentum equations are coupled with the Eulerian solver for the gas phase. This approach is implemented in an open-source CFD package, OpenFOAM, and detailed simulation results are compared with previous Euler–Lagrange simulations in a domain size study of CIT. These results demonstrate that the proposed EE-AG methodology produces results comparable to the Euler–Lagrange simulations, and that this moment-based methodology can be used to perform accurate mesoscale simulations of dilute gas–particle flows.

  20. Tisettanta case study: the interoperation of furniture production companies

    NASA Astrophysics Data System (ADS)

    Amarilli, Fabrizio; Spreafico, Alberto

    This chapter presents the Tisettanta case study, focusing on the definition of the possible innovations that ICT technologies can bring to the Italian wood-furniture industry. This sector is characterized by industrial clusters composed mainly of a few large companies with international brand reputations and a large base of SMEs that manufacture finished products or specialize in the production of single components/processes (such as the Brianza cluster, where Tisettanta operates). In this particular business ecosystem, ICT technologies can bring substantial support and improvement to the supply chain process, where collaborations between enterprises are put into action through the exchange of business documents such as orders, order confirmations, bills of lading, invoices, etc. The analysis methodology adopted in the Tisettanta case study refers to the TEKNE Methodology of Change (see Chapter 2), which defines a framework for supporting firms in the adoption of the Internetworked Enterprise organizational paradigm.
