Sample records for cluster sampling method

  1. Choosing a Cluster Sampling Design for Lot Quality Assurance Sampling Surveys

    PubMed Central

    Hund, Lauren; Bedrick, Edward J.; Pagano, Marcello

    2015-01-01

    Lot quality assurance sampling (LQAS) surveys are commonly used for monitoring and evaluation in resource-limited settings. Recently several methods have been proposed to combine LQAS with cluster sampling for more timely and cost-effective data collection. For some of these methods, the standard binomial model can be used for constructing decision rules as the clustering can be ignored. For other designs, considered here, clustering is accommodated in the design phase. In this paper, we compare these latter cluster LQAS methodologies and provide recommendations for choosing a cluster LQAS design. We compare technical differences in the three methods and determine situations in which the choice of method results in a substantively different design. We consider two different aspects of the methods: the distributional assumptions and the clustering parameterization. Further, we provide software tools for implementing each method and clarify misconceptions about these designs in the literature. We illustrate the differences in these methods using vaccination and nutrition cluster LQAS surveys as example designs. The cluster methods are not sensitive to the distributional assumptions but can result in substantially different designs (sample sizes) depending on the clustering parameterization. However, none of the clustering parameterizations used in the existing methods appears to be consistent with the observed data, and, consequently, choice between the cluster LQAS methods is not straightforward. Further research should attempt to characterize clustering patterns in specific applications and provide suggestions for best-practice cluster LQAS designs on a setting-specific basis. PMID:26125967
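For the simpler case the abstract mentions, where clustering can be ignored and the standard binomial model applies, an LQAS decision rule can be checked directly from binomial tail probabilities. A minimal stdlib sketch (the n = 19, d = 13 rule in the usage note is a conventional illustration, not a design from this paper):

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

def lqas_risks(n, d, p_low, p_high):
    """Classify a lot as acceptable when the number of successes exceeds
    the decision rule d. Returns (alpha, beta):
      alpha = risk of accepting a lot whose true coverage is p_low,
      beta  = risk of rejecting a lot whose true coverage is p_high."""
    alpha = 1 - binom_cdf(d, n, p_low)
    beta = binom_cdf(d, n, p_high)
    return alpha, beta
```

With the conventional n = 19, d = 13 rule and coverage thresholds of 50% and 80%, both risks come out under 20%.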

  2. Choosing a Cluster Sampling Design for Lot Quality Assurance Sampling Surveys.

    PubMed

    Hund, Lauren; Bedrick, Edward J; Pagano, Marcello

    2015-01-01

    Lot quality assurance sampling (LQAS) surveys are commonly used for monitoring and evaluation in resource-limited settings. Recently several methods have been proposed to combine LQAS with cluster sampling for more timely and cost-effective data collection. For some of these methods, the standard binomial model can be used for constructing decision rules as the clustering can be ignored. For other designs, considered here, clustering is accommodated in the design phase. In this paper, we compare these latter cluster LQAS methodologies and provide recommendations for choosing a cluster LQAS design. We compare technical differences in the three methods and determine situations in which the choice of method results in a substantively different design. We consider two different aspects of the methods: the distributional assumptions and the clustering parameterization. Further, we provide software tools for implementing each method and clarify misconceptions about these designs in the literature. We illustrate the differences in these methods using vaccination and nutrition cluster LQAS surveys as example designs. The cluster methods are not sensitive to the distributional assumptions but can result in substantially different designs (sample sizes) depending on the clustering parameterization. However, none of the clustering parameterizations used in the existing methods appears to be consistent with the observed data, and, consequently, choice between the cluster LQAS methods is not straightforward. Further research should attempt to characterize clustering patterns in specific applications and provide suggestions for best-practice cluster LQAS designs on a setting-specific basis.

  3. A fast learning method for large scale and multi-class samples of SVM

    NASA Astrophysics Data System (ADS)

    Fan, Yu; Guo, Huiming

    2017-06-01

    A fast learning method for multi-class SVM (Support Vector Machine) classification, based on a binary tree, is presented to address the low learning efficiency of SVMs when processing large-scale, multi-class samples. A bottom-up method builds the binary tree hierarchy, and a sub-classifier at each node learns from the samples corresponding to that node. During learning, a first clustering of the training samples generates several class clusters. Central points are first extracted from the clusters containing only one type of sample. For clusters containing two types of samples, the numbers of clusters for the positive and negative samples are set according to their degree of mixture, a secondary clustering is performed, and central points are then extracted from the resulting sub-class clusters. Sub-classifiers are obtained by learning from the reduced sample set formed by combining the extracted central points. Simulation experiments show that this fast learning method, based on multi-level clustering, maintains high classification accuracy while greatly reducing the number of samples and effectively improving learning efficiency.

  4. Clustering Methods with Qualitative Data: A Mixed Methods Approach for Prevention Research with Small Samples

    PubMed Central

    Henry, David; Dymnicki, Allison B.; Mohatt, Nathaniel; Allen, James; Kelly, James G.

    2016-01-01

    Qualitative methods potentially add depth to prevention research, but can produce large amounts of complex data even with small samples. Studies conducted with culturally distinct samples often produce voluminous qualitative data, but may lack sufficient sample sizes for sophisticated quantitative analysis. Currently lacking in mixed methods research are methods allowing for more fully integrating qualitative and quantitative analysis techniques. Cluster analysis can be applied to coded qualitative data to clarify the findings of prevention studies by aiding efforts to reveal such things as the motives of participants for their actions and the reasons behind counterintuitive findings. By clustering groups of participants with similar profiles of codes in a quantitative analysis, cluster analysis can serve as a key component in mixed methods research. This article reports two studies. In the first study, we conduct simulations to test the accuracy of cluster assignment using three different clustering methods with binary data as produced when coding qualitative interviews. Results indicated that hierarchical clustering, K-Means clustering, and latent class analysis produced similar levels of accuracy with binary data, and that the accuracy of these methods did not decrease with samples as small as 50. Whereas the first study explores the feasibility of using common clustering methods with binary data, the second study provides a “real-world” example using data from a qualitative study of community leadership connected with a drug abuse prevention project. We discuss the implications of this approach for conducting prevention research, especially with small samples and culturally distinct communities. PMID:25946969
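The first study's setting, common clustering methods applied to binary coded data, can be illustrated with a bare-bones k-means on 0/1 code profiles. This is a generic sketch, not the authors' implementation; the deterministic initialisation (first k distinct profiles) is an assumption made here for reproducibility:

```python
def kmeans_binary(profiles, k, iters=10):
    """Plain k-means on 0/1 code profiles, the kind produced when coding
    qualitative interviews. Assumes at least k distinct profiles."""
    # deterministic init: the first k distinct profiles become centres
    centers = []
    for p in profiles:
        if list(p) not in centers:
            centers.append(list(p))
        if len(centers) == k:
            break
    assign = [0] * len(profiles)
    for _ in range(iters):
        # assignment step: nearest centre by squared Euclidean distance
        # (equivalent to Hamming distance for binary data)
        for i, p in enumerate(profiles):
            assign[i] = min(range(k),
                            key=lambda c: sum((a - b) ** 2
                                              for a, b in zip(p, centers[c])))
        # update step: coordinate-wise mean of each cluster's members
        for c in range(k):
            members = [profiles[i] for i, a in enumerate(assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign
```

On two clearly separated code profiles this recovers the obvious grouping; with real interview codes the choice among k-means, hierarchical clustering, and latent class analysis is the question the simulations address.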

  5. Clustering Methods with Qualitative Data: a Mixed-Methods Approach for Prevention Research with Small Samples.

    PubMed

    Henry, David; Dymnicki, Allison B; Mohatt, Nathaniel; Allen, James; Kelly, James G

    2015-10-01

    Qualitative methods potentially add depth to prevention research but can produce large amounts of complex data even with small samples. Studies conducted with culturally distinct samples often produce voluminous qualitative data but may lack sufficient sample sizes for sophisticated quantitative analysis. Currently lacking in mixed-methods research are methods allowing for more fully integrating qualitative and quantitative analysis techniques. Cluster analysis can be applied to coded qualitative data to clarify the findings of prevention studies by aiding efforts to reveal such things as the motives of participants for their actions and the reasons behind counterintuitive findings. By clustering groups of participants with similar profiles of codes in a quantitative analysis, cluster analysis can serve as a key component in mixed-methods research. This article reports two studies. In the first study, we conduct simulations to test the accuracy of cluster assignment using three different clustering methods with binary data as produced when coding qualitative interviews. Results indicated that hierarchical clustering, K-means clustering, and latent class analysis produced similar levels of accuracy with binary data and that the accuracy of these methods did not decrease with samples as small as 50. Whereas the first study explores the feasibility of using common clustering methods with binary data, the second study provides a "real-world" example using data from a qualitative study of community leadership connected with a drug abuse prevention project. We discuss the implications of this approach for conducting prevention research, especially with small samples and culturally distinct communities.

  6. Methods for sample size determination in cluster randomized trials

    PubMed Central

    Rutterford, Clare; Copas, Andrew; Eldridge, Sandra

    2015-01-01

    Background: The use of cluster randomized trials (CRTs) is increasing, along with the variety in their design and analysis. The simplest approach for their sample size calculation is to calculate the sample size assuming individual randomization and inflate this by a design effect to account for randomization by cluster. The assumptions of a simple design effect may not always be met; alternative or more complicated approaches are required. Methods: We summarise a wide range of sample size methods available for cluster randomized trials. For those familiar with sample size calculations for individually randomized trials but with less experience in the clustered case, this manuscript provides formulae for a wide range of scenarios with associated explanation and recommendations. For those with more experience, comprehensive summaries are provided that allow quick identification of methods for a given design, outcome and analysis method. Results: We present first those methods applicable to the simplest two-arm, parallel group, completely randomized design followed by methods that incorporate deviations from this design such as: variability in cluster sizes; attrition; non-compliance; or the inclusion of baseline covariates or repeated measures. The paper concludes with methods for alternative designs. Conclusions: There is a large amount of methodology available for sample size calculations in CRTs. This paper gives the most comprehensive description of published methodology for sample size calculation and provides an important resource for those designing these trials. PMID:26174515
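The "simplest approach" described in the Background, inflating an individually randomised sample size by a design effect, can be sketched in a few lines, assuming equal cluster sizes m and intra-cluster correlation icc:

```python
from math import ceil

def inflate_by_design_effect(n_individual, m, icc):
    """Simplest CRT sample-size route: take the total sample size for an
    individually randomised trial and inflate it by the design effect
    DEFF = 1 + (m - 1) * icc, assuming equal cluster sizes of m."""
    deff = 1 + (m - 1) * icc
    return deff, ceil(n_individual * deff)
```

For example, 400 individuals with clusters of 11 and an ICC of 0.1 give a design effect of 2 and an inflated total of 800; the abstract's caveat is that this simple design effect does not hold for all designs.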

  7. Unequal cluster sizes in stepped-wedge cluster randomised trials: a systematic review

    PubMed Central

    Morris, Tom; Gray, Laura

    2017-01-01

    Objectives To investigate the extent to which cluster sizes vary in stepped-wedge cluster randomised trials (SW-CRT) and whether any variability is accounted for during the sample size calculation and analysis of these trials. Setting Any, not limited to healthcare settings. Participants Any taking part in an SW-CRT published up to March 2016. Primary and secondary outcome measures The primary outcome is the variability in cluster sizes, measured by the coefficient of variation (CV) in cluster size. Secondary outcomes include the difference between the cluster sizes assumed during the sample size calculation and those observed during the trial, any reported variability in cluster sizes and whether the methods of sample size calculation and methods of analysis accounted for any variability in cluster sizes. Results Of the 101 included SW-CRTs, 48% mentioned that the included clusters were known to vary in size, yet only 13% of these accounted for this during the calculation of the sample size. However, 69% of the trials did use a method of analysis appropriate for when clusters vary in size. Full trial reports were available for 53 trials. The CV was calculated for 23 of these: the median CV was 0.41 (IQR: 0.22–0.52). Actual cluster sizes could be compared with those assumed during the sample size calculation for 14 (26%) of the trial reports; the cluster sizes were between 29% and 480% of that which had been assumed. Conclusions Cluster sizes often vary in SW-CRTs. Reporting of SW-CRTs also remains suboptimal. The effect of unequal cluster sizes on the statistical power of SW-CRTs needs further exploration and methods appropriate to studies with unequal cluster sizes need to be employed. PMID:29146637
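The review's primary outcome, the coefficient of variation (CV) in cluster size, is simply the standard deviation of the cluster sizes divided by their mean:

```python
from statistics import mean, stdev

def cluster_size_cv(sizes):
    """Coefficient of variation of cluster sizes: sample sd / mean."""
    return stdev(sizes) / mean(sizes)
```

Equal clusters give a CV of 0; for scale, the median CV across the reviewed trials was 0.41.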

  8. Efficient evaluation of sampling quality of molecular dynamics simulations by clustering of dihedral torsion angles and Sammon mapping.

    PubMed

    Frickenhaus, Stephan; Kannan, Srinivasaraghavan; Zacharias, Martin

    2009-02-01

    A direct conformational clustering and mapping approach for peptide conformations based on backbone dihedral angles has been developed and applied to compare conformational sampling of Met-enkephalin using two molecular dynamics (MD) methods. Efficient clustering in dihedrals has been achieved by evaluating all combinations resulting from independent clustering of each dihedral angle distribution, thus resolving all conformational substates. In contrast, Cartesian clustering was unable to accurately distinguish between all substates. Projection of clusters on dihedral principal component (PCA) subspaces did not result in efficient separation of highly populated clusters. However, representation in a nonlinear metric by Sammon mapping was able to separate well the 48 highest populated clusters in just two dimensions. In addition, this approach also allowed us to visualize the transition frequencies between clusters efficiently. Significantly, higher transition frequencies between more distinct conformational substates were found for a recently developed biasing-potential replica exchange MD simulation method allowing faster sampling of possible substates compared to conventional MD simulations. Although the number of theoretically possible clusters grows exponentially with peptide length, in practice, the number of clusters is only limited by the sampling size (typically much smaller), and therefore the method is well suited also for large systems. The approach could be useful to rapidly and accurately evaluate conformational sampling during MD simulations, to compare different sampling strategies and eventually to detect kinetic bottlenecks in folding pathways.

  9. An improved initialization center k-means clustering algorithm based on distance and density

    NASA Astrophysics Data System (ADS)

    Duan, Yanling; Liu, Qun; Xia, Shuyin

    2018-04-01

    Aiming at the problem that the random initial cluster centres of the k-means algorithm leave the clustering results influenced by outlier samples and unstable across repeated runs, an initialization method that selects centres with larger distance and higher density is proposed. The reciprocal of the weighted average distance is used to represent sample density, and samples with both larger distance and higher density are selected as the initial cluster centres to optimize the clustering results. A clustering evaluation method based on distance and density is then designed to verify the feasibility and practicality of the algorithm; experimental results on UCI data sets show that the algorithm has a certain stability and practicality.
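A distance-and-density initialisation in the spirit of the abstract can be sketched as follows. The plain (rather than weighted) average distance and the product scoring criterion are assumptions made here; the paper's exact weighting is not given in the abstract:

```python
def _dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def _density(points, i):
    # density proxy: reciprocal of the average distance from point i
    # to all other points (outliers score low)
    d = [_dist(points[i], points[j]) for j in range(len(points)) if j != i]
    return len(d) / sum(d)

def init_centers(points, k):
    """Pick initial k-means centres that are both high-density and far
    from centres already chosen (product criterion, an assumption)."""
    chosen = [max(range(len(points)), key=lambda i: _density(points, i))]
    while len(chosen) < k:
        def score(i):
            nearest = min(_dist(points[i], points[c]) for c in chosen)
            return _density(points, i) * nearest
        chosen.append(max((i for i in range(len(points)) if i not in chosen),
                          key=score))
    return [points[c] for c in chosen]
```

On two tight blobs plus an outlier, this picks one centre per blob and skips the outlier, which is the instability the paper targets.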

  10. Unequal cluster sizes in stepped-wedge cluster randomised trials: a systematic review.

    PubMed

    Kristunas, Caroline; Morris, Tom; Gray, Laura

    2017-11-15

    To investigate the extent to which cluster sizes vary in stepped-wedge cluster randomised trials (SW-CRT) and whether any variability is accounted for during the sample size calculation and analysis of these trials. Any, not limited to healthcare settings. Any taking part in an SW-CRT published up to March 2016. The primary outcome is the variability in cluster sizes, measured by the coefficient of variation (CV) in cluster size. Secondary outcomes include the difference between the cluster sizes assumed during the sample size calculation and those observed during the trial, any reported variability in cluster sizes and whether the methods of sample size calculation and methods of analysis accounted for any variability in cluster sizes. Of the 101 included SW-CRTs, 48% mentioned that the included clusters were known to vary in size, yet only 13% of these accounted for this during the calculation of the sample size. However, 69% of the trials did use a method of analysis appropriate for when clusters vary in size. Full trial reports were available for 53 trials. The CV was calculated for 23 of these: the median CV was 0.41 (IQR: 0.22-0.52). Actual cluster sizes could be compared with those assumed during the sample size calculation for 14 (26%) of the trial reports; the cluster sizes were between 29% and 480% of that which had been assumed. Cluster sizes often vary in SW-CRTs. Reporting of SW-CRTs also remains suboptimal. The effect of unequal cluster sizes on the statistical power of SW-CRTs needs further exploration and methods appropriate to studies with unequal cluster sizes need to be employed.

  11. An imbalance in cluster sizes does not lead to notable loss of power in cross-sectional, stepped-wedge cluster randomised trials with a continuous outcome.

    PubMed

    Kristunas, Caroline A; Smith, Karen L; Gray, Laura J

    2017-03-07

    The current methodology for sample size calculations for stepped-wedge cluster randomised trials (SW-CRTs) is based on the assumption of equal cluster sizes. However, as is often the case in cluster randomised trials (CRTs), the clusters in SW-CRTs are likely to vary in size, which in other designs of CRT leads to a reduction in power. The effect of an imbalance in cluster size on the power of SW-CRTs has not previously been reported, nor what an appropriate adjustment to the sample size calculation should be to allow for any imbalance. We aimed to assess the impact of an imbalance in cluster size on the power of a cross-sectional SW-CRT and recommend a method for calculating the sample size of a SW-CRT when there is an imbalance in cluster size. The effect of varying degrees of imbalance in cluster size on the power of SW-CRTs was investigated using simulations. The sample size was calculated using both the standard method and two proposed adjusted design effects (DEs), based on those suggested for CRTs with unequal cluster sizes. The data were analysed using generalised estimating equations with an exchangeable correlation matrix and robust standard errors. An imbalance in cluster size was not found to have a notable effect on the power of SW-CRTs. The two proposed adjusted DEs resulted in trials that were generally considerably over-powered. We recommend that the standard method of sample size calculation for SW-CRTs be used, provided that the assumptions of the method hold. However, it would be beneficial to investigate, through simulation, what effect the maximum likely amount of inequality in cluster sizes would be on the power of the trial and whether any inflation of the sample size would be required.
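The abstract does not quote the two adjusted design effects it evaluates, but a common CV-based adjustment from the parallel-CRT literature, plausibly of the same family, looks like this (an illustrative assumption, not the paper's formula):

```python
def de_unequal(m_bar, cv, icc):
    """Design effect allowing for variable cluster sizes via their
    coefficient of variation:
        DE = 1 + ((cv**2 + 1) * m_bar - 1) * icc
    With cv = 0 this reduces to the standard 1 + (m_bar - 1) * icc."""
    return 1 + ((cv ** 2 + 1) * m_bar - 1) * icc
```

The paper's finding is that adjustments of this kind over-power cross-sectional SW-CRTs, so the standard (cv = 0) calculation is recommended when its assumptions hold.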

  12. Cluster randomised crossover trials with binary data and unbalanced cluster sizes: application to studies of near-universal interventions in intensive care.

    PubMed

    Forbes, Andrew B; Akram, Muhammad; Pilcher, David; Cooper, Jamie; Bellomo, Rinaldo

    2015-02-01

    Cluster randomised crossover trials have been utilised in recent years in the health and social sciences. Methods for analysis have been proposed; however, for binary outcomes, these have received little assessment of their appropriateness. In addition, methods for determination of sample size are currently limited to balanced cluster sizes both between clusters and between periods within clusters. This article aims to extend this work to unbalanced situations and to evaluate the properties of a variety of methods for analysis of binary data, with a particular focus on the setting of potential trials of near-universal interventions in intensive care to reduce in-hospital mortality. We derive a formula for sample size estimation for unbalanced cluster sizes, and apply it to the intensive care setting to demonstrate the utility of the cluster crossover design. We conduct a numerical simulation of the design in the intensive care setting and for more general configurations, and we assess the performance of three cluster summary estimators and an individual-data estimator based on binomial-identity-link regression. For settings similar to the intensive care scenario involving large cluster sizes and small intra-cluster correlations, the sample size formulae developed and analysis methods investigated are found to be appropriate, with the unweighted cluster summary method performing well relative to the more optimal but more complex inverse-variance weighted method. More generally, we find that the unweighted and cluster-size-weighted summary methods perform well, with the relative efficiency of each largely determined systematically from the study design parameters. Performance of individual-data regression is adequate with small cluster sizes but becomes inefficient for large, unbalanced cluster sizes. 
When outcome prevalences are 6% or less and the within-cluster-within-period correlation is 0.05 or larger, all methods display sub-nominal confidence interval coverage, with the less prevalent the outcome the worse the coverage. As with all simulation studies, conclusions are limited to the configurations studied. We confined attention to detecting intervention effects on an absolute risk scale using marginal models and did not explore properties of binary random effects models. Cluster crossover designs with binary outcomes can be analysed using simple cluster summary methods, and sample size in unbalanced cluster size settings can be determined using relatively straightforward formulae. However, caution needs to be applied in situations with low prevalence outcomes and moderate to high intra-cluster correlations.

  13. Magnetic signature of overbank sediment in industry impacted floodplains identified by data mining methods

    NASA Astrophysics Data System (ADS)

    Chudaničová, Monika; Hutchinson, Simon M.

    2016-11-01

    Our study attempts to identify a characteristic magnetic signature of overbank sediments exhibiting anthropogenically induced magnetic enhancement and thereby to distinguish them from unenhanced sediments with weak magnetic background values, using a novel approach based on data mining methods, thus providing a means of rapid pollution determination. Data were obtained from 539 bulk samples from vertical profiles through overbank sediment, collected on seven rivers in the eastern Czech Republic and three rivers in northwest England. k-Means clustering and hierarchical clustering methods, paired group (UPGMA) and Ward's method, were used to divide the samples into natural groups according to their attributes. Interparametric ratios: SIRM/χ; SIRM/ARM; and S-0.1T were chosen as attributes for analyses, making the resultant model more widely applicable as magnetic concentration values can differ by two orders of magnitude. Division into three clusters appeared to be optimal and corresponded to inherent clusters in the data scatter. Clustering managed to separate samples with relatively weak anthropogenically induced enhancement, relatively strong anthropogenically induced enhancement and samples lacking enhancement. To describe the clusters explicitly and thus obtain a discrete magnetic signature, classification rules (JRip method) and decision trees (J4.8 and Simple Cart methods) were used. Samples lacking anthropogenic enhancement typically exhibited an S-0.1T < c. 0.5, SIRM/ARM < c. 150 and SIRM/χ < c. 6000 A m-1. Samples with magnetic enhancement all exhibited an S-0.1T > 0.5. Samples with relatively stronger anthropogenic enhancement were unequivocally distinguished from the samples with weaker enhancement by an SIRM/ARM > c. 150. Samples with SIRM/ARM in a range c. 126-150 were classified as relatively strongly enhanced when their SIRM/χ > 18 000 A m-1 and relatively less enhanced when their SIRM/χ < 18 000 A m-1.
An additional rule was arbitrarily added to exclude samples with χfd% > 6 per cent from the anthropogenically enhanced clusters as samples with natural magnetic enhancement. The characteristics of the clusters resulted mainly from the relationship between SIRM/ARM and the S-0.1T, and SIRM/χ and the S-0.1T. Both SIRM/ARM and SIRM/χ increase with increasing S-0.1T values, reflecting a greater level of anthropogenic magnetic particles. Overall, data mining methods demonstrated good potential for utilization in environmental magnetism.
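The classification rules quoted above can be restated as a small decision function. This is a simplified reading of the published thresholds (the "c." values), not the exact JRip/J4.8 output:

```python
def classify_sample(s_ratio, sirm_arm, sirm_chi, chi_fd_pct):
    """Approximate restatement of the abstract's decision rules.
    s_ratio = S-0.1T, sirm_arm = SIRM/ARM, sirm_chi = SIRM/chi (A/m),
    chi_fd_pct = frequency-dependent susceptibility in per cent."""
    if chi_fd_pct > 6:
        return "natural enhancement"          # the arbitrarily added rule
    if s_ratio < 0.5:
        return "no anthropogenic enhancement"
    if sirm_arm > 150:
        return "strong anthropogenic enhancement"
    if sirm_arm >= 126 and sirm_chi > 18000:
        return "strong anthropogenic enhancement"
    return "weak anthropogenic enhancement"
```

The unenhanced class in the paper is additionally characterised by SIRM/ARM < c. 150 and SIRM/χ < c. 6000 A m-1; the sketch keys only on S-0.1T for that branch.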

  14. The effect of clustering on lot quality assurance sampling: a probabilistic model to calculate sample sizes for quality assessments

    PubMed Central

    2013-01-01

    Background Traditional Lot Quality Assurance Sampling (LQAS) designs assume observations are collected using simple random sampling. Alternatively, randomly sampling clusters of observations and then individuals within clusters reduces costs but decreases the precision of the classifications. In this paper, we develop a general framework for designing the cluster(C)-LQAS system and illustrate the method with the design of data quality assessments for the community health worker program in Rwanda. Results To determine sample size and decision rules for C-LQAS, we use the beta-binomial distribution to account for inflated risk of errors introduced by sampling clusters at the first stage. We present general theory and code for sample size calculations. The C-LQAS sample sizes provided in this paper constrain misclassification risks below user-specified limits. Multiple C-LQAS systems meet the specified risk requirements, but numerous considerations, including per-cluster versus per-individual sampling costs, help identify optimal systems for distinct applications. Conclusions We show the utility of C-LQAS for data quality assessments, but the method generalizes to numerous applications. This paper provides the necessary technical detail and supplemental code to support the design of C-LQAS for specific programs. PMID:24160725
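A beta-binomial acceptance probability of the kind described can be computed with the standard library alone. The mean/correlation parameterisation below is a common convention and an assumption here; the paper's exact parameterisation and decision-rule search are in its supplemental code:

```python
from math import comb, lgamma, exp

def _log_beta(x, y):
    return lgamma(x) + lgamma(y) - lgamma(x + y)

def betabinom_pmf(k, n, a, b):
    """Beta-binomial pmf: comb(n, k) * B(k+a, n-k+b) / B(a, b)."""
    return comb(n, k) * exp(_log_beta(k + a, n - k + b) - _log_beta(a, b))

def acceptance_risk(n, d, p, rho):
    """P(successes <= d) when the true proportion is p and rho is the
    intra-cluster correlation, via the mean/correlation mapping
    a = p*(1-rho)/rho, b = (1-p)*(1-rho)/rho (assumed here)."""
    a = p * (1 - rho) / rho
    b = (1 - p) * (1 - rho) / rho
    return sum(betabinom_pmf(k, n, a, b) for k in range(d + 1))
```

The overdispersion is visible directly: for a decision threshold below the mean, the tail probability, and hence the misclassification risk, grows with rho, which is why C-LQAS needs larger samples than binomial LQAS.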

  15. The effect of clustering on lot quality assurance sampling: a probabilistic model to calculate sample sizes for quality assessments.

    PubMed

    Hedt-Gauthier, Bethany L; Mitsunaga, Tisha; Hund, Lauren; Olives, Casey; Pagano, Marcello

    2013-10-26

    Traditional Lot Quality Assurance Sampling (LQAS) designs assume observations are collected using simple random sampling. Alternatively, randomly sampling clusters of observations and then individuals within clusters reduces costs but decreases the precision of the classifications. In this paper, we develop a general framework for designing the cluster(C)-LQAS system and illustrate the method with the design of data quality assessments for the community health worker program in Rwanda. To determine sample size and decision rules for C-LQAS, we use the beta-binomial distribution to account for inflated risk of errors introduced by sampling clusters at the first stage. We present general theory and code for sample size calculations. The C-LQAS sample sizes provided in this paper constrain misclassification risks below user-specified limits. Multiple C-LQAS systems meet the specified risk requirements, but numerous considerations, including per-cluster versus per-individual sampling costs, help identify optimal systems for distinct applications. We show the utility of C-LQAS for data quality assessments, but the method generalizes to numerous applications. This paper provides the necessary technical detail and supplemental code to support the design of C-LQAS for specific programs.

  16. Relative efficiency and sample size for cluster randomized trials with variable cluster sizes.

    PubMed

    You, Zhiying; Williams, O Dale; Aban, Inmaculada; Kabagambe, Edmond Kato; Tiwari, Hemant K; Cutter, Gary

    2011-02-01

    The statistical power of cluster randomized trials depends on two sample size components, the number of clusters per group and the numbers of individuals within clusters (cluster size). Variable cluster sizes are common and this variation alone may have significant impact on study power. Previous approaches have taken this into account by either adjusting total sample size using a designated design effect or adjusting the number of clusters according to an assessment of the relative efficiency of unequal versus equal cluster sizes. This article defines a relative efficiency of unequal versus equal cluster sizes using noncentrality parameters, investigates properties of this measure, and proposes an approach for adjusting the required sample size accordingly. We focus on comparing two groups with normally distributed outcomes using t-test, and use the noncentrality parameter to define the relative efficiency of unequal versus equal cluster sizes and show that statistical power depends only on this parameter for a given number of clusters. We calculate the sample size required for an unequal cluster sizes trial to have the same power as one with equal cluster sizes. Relative efficiency based on the noncentrality parameter is straightforward to calculate and easy to interpret. It connects the required mean cluster size directly to the required sample size with equal cluster sizes. Consequently, our approach first determines the sample size requirements with equal cluster sizes for a pre-specified study power and then calculates the required mean cluster size while keeping the number of clusters unchanged. Our approach allows adjustment in mean cluster size alone or simultaneous adjustment in mean cluster size and number of clusters, and is a flexible alternative to and a useful complement to existing methods. Comparison indicated that we have defined a relative efficiency that is greater than the relative efficiency in the literature under some conditions. 
Our measure of relative efficiency might be less than the measure in the literature under some conditions, underestimating the relative efficiency. The relative efficiency of unequal versus equal cluster sizes defined using the noncentrality parameter suggests a sample size approach that is a flexible alternative and a useful complement to existing methods.
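The variance-based relative efficiency "in the literature" that the article compares against can be sketched via effective sample sizes (this is the conventional measure, not the authors' noncentrality-parameter definition):

```python
def effective_n(sizes, icc):
    # each cluster of size m contributes m / (1 + (m - 1) * icc)
    # effectively independent observations
    return sum(m / (1 + (m - 1) * icc) for m in sizes)

def relative_efficiency(sizes, icc):
    """Efficiency of unequal vs equal cluster sizes holding the number
    of clusters and the mean cluster size fixed; at most 1, with
    equality when all clusters are the same size."""
    k, m_bar = len(sizes), sum(sizes) / len(sizes)
    return effective_n(sizes, icc) / (k * m_bar / (1 + (m_bar - 1) * icc))
```

Because m / (1 + (m - 1) * icc) is concave in m, unequal sizes always lose efficiency under this measure, which is the loss the article's sample-size adjustment compensates for.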

  17. The Effect of Cluster Sampling Design in Survey Research on the Standard Error Statistic.

    ERIC Educational Resources Information Center

    Wang, Lin; Fan, Xitao

    Standard statistical methods are used to analyze data that is assumed to be collected using a simple random sampling scheme. These methods, however, tend to underestimate variance when the data is collected with a cluster design, which is often found in educational survey research. The purposes of this paper are to demonstrate how a cluster design…

  18. A Comparison of Single Sample and Bootstrap Methods to Assess Mediation in Cluster Randomized Trials

    ERIC Educational Resources Information Center

    Pituch, Keenan A.; Stapleton, Laura M.; Kang, Joo Youn

    2006-01-01

    A Monte Carlo study examined the statistical performance of single sample and bootstrap methods that can be used to test and form confidence interval estimates of indirect effects in two cluster randomized experimental designs. The designs were similar in that they featured random assignment of clusters to one of two treatment conditions and…

  19. Finding gene clusters for a replicated time course study

    PubMed Central

    2014-01-01

    Background Finding genes that share similar expression patterns across samples is an important question that is frequently asked in high-throughput microarray studies. Traditional clustering algorithms such as K-means clustering and hierarchical clustering base gene clustering directly on the observed measurements and do not take into account the specific experimental design under which the microarray data were collected. A new model-based clustering method, the clustering of regression models method, takes into account the specific design of the microarray study and bases the clustering on how genes are related to sample covariates. It can find useful gene clusters for studies from complicated study designs such as replicated time course studies. Findings In this paper, we applied the clustering of regression models method to data from a time course study of yeast on two genotypes, wild type and YOX1 mutant, each with two technical replicates, and compared the clustering results with K-means clustering. We identified gene clusters that have similar expression patterns in wild type yeast, two of which were missed by K-means clustering. We further identified gene clusters whose expression patterns were changed in YOX1 mutant yeast compared to wild type yeast. Conclusions The clustering of regression models method can be a valuable tool for identifying genes that are coordinately transcribed by a common mechanism. PMID:24460656
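    The idea of basing clusters on how genes relate to sample covariates, rather than on the raw measurements, can be illustrated in a few lines: fit a regression per gene, then cluster the fitted coefficients. This sketch uses ordinary least squares and a hand-rolled k-means rather than the model-based procedure in the paper; the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy replicated time course: 60 genes, 6 time points x 2 technical replicates
t = np.tile(np.arange(6.0), 2)
slopes = np.concatenate([np.full(30, 1.0), np.full(30, -1.0)])   # two true groups
expr = slopes[:, None] * t + rng.normal(0, 0.3, (60, t.size))

# Step 1: fit a regression per gene on the design covariates; keep coefficients
X = np.column_stack([np.ones_like(t), t])                 # intercept + time
coef = np.linalg.lstsq(X, expr.T, rcond=None)[0].T        # shape (60, 2)

# Step 2: cluster genes on their fitted coefficients (Lloyd's k-means, k = 2,
# seeded deterministically at the two extreme slopes)
centers = coef[[coef[:, 1].argmax(), coef[:, 1].argmin()]]
for _ in range(20):
    d = ((coef[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    labels = d.argmin(1)
    centers = np.array([coef[labels == k].mean(0) for k in range(2)])
```

    Clustering on coefficients rather than raw profiles is what lets the design (replicates, time) enter the analysis: replicates only sharpen the coefficient estimates instead of being treated as extra dimensions.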

  20. Weak lensing magnification of SpARCS galaxy clusters

    NASA Astrophysics Data System (ADS)

    Tudorica, A.; Hildebrandt, H.; Tewes, M.; Hoekstra, H.; Morrison, C. B.; Muzzin, A.; Wilson, G.; Yee, H. K. C.; Lidman, C.; Hicks, A.; Nantais, J.; Erben, T.; van der Burg, R. F. J.; Demarco, R.

    2017-12-01

    Context. Measuring and calibrating relations between cluster observables is critical for resource-limited studies. The mass-richness relation of clusters offers an observationally inexpensive way of estimating masses, and its calibration is essential for cluster and cosmological studies, especially for high-redshift clusters. Weak gravitational lensing magnification is a promising method, complementary to shear studies, that can be applied at higher redshifts. Aims: We aim to employ the weak lensing magnification method to calibrate the mass-richness relation up to a redshift of 1.4. We used the Spitzer Adaptation of the Red-Sequence Cluster Survey (SpARCS) galaxy cluster candidates (0.2 < z < 1.4) and optical data from the Canada-France-Hawaii Telescope (CFHT) to test whether magnification can be effectively used to constrain the mass of high-redshift clusters. Methods: Lyman-break galaxies (LBGs) selected using the u-band dropout technique and their colours were used as a background sample of sources. LBG positions were cross-correlated with the centres of the sample of SpARCS clusters to estimate the magnification signal, which was optimally weighted using an externally calibrated LBG luminosity function. The signal was measured for cluster sub-samples binned in both redshift and richness. Results: We measured the cross-correlation between the positions of galaxy cluster candidates and LBGs and detected a weak lensing magnification signal for all bins at a detection significance of 2.6-5.5σ. In particular, the significance of the measurement for clusters with z > 1.0 is 4.1σ; for the entire cluster sample we obtained an average M200 of 1.28 (+0.23/−0.21) × 10^14 M⊙. Conclusions: Our measurements demonstrate the feasibility of using weak lensing magnification as a viable tool for determining the average halo masses of samples of high-redshift galaxy clusters. The results also establish the success of using galaxy over-densities to select massive clusters at z > 1. Additional studies are necessary for further modelling of the various systematic effects we discussed.

  1. MULTI-K: accurate classification of microarray subtypes using ensemble k-means clustering

    PubMed Central

    Kim, Eun-Youn; Kim, Seon-Young; Ashlock, Daniel; Nam, Dougu

    2009-01-01

    Background Uncovering subtypes of disease from microarray samples has important clinical implications such as survival time and sensitivity of individual patients to specific therapies. Unsupervised clustering methods have been used to classify this type of data. However, most existing methods focus on clusters with compact shapes and do not reflect the geometric complexity of the high dimensional microarray clusters, which limits their performance. Results We present a cluster-number-based ensemble clustering algorithm, called MULTI-K, for microarray sample classification, which demonstrates remarkable accuracy. The method amalgamates multiple k-means runs by varying the number of clusters and identifies clusters that manifest the most robust co-memberships of elements. In addition to the original algorithm, we newly devised the entropy-plot to control the separation of singletons or small clusters. MULTI-K, unlike the simple k-means or other widely used methods, was able to capture clusters with complex and high-dimensional structures accurately. MULTI-K outperformed other methods including a recently developed ensemble clustering algorithm in tests with five simulated and eight real gene-expression data sets. Conclusion The geometric complexity of clusters should be taken into account for accurate classification of microarray data, and ensemble clustering applied to the number of clusters tackles the problem very well. The C++ code and the data sets tested are available from the authors. PMID:19698124
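    The co-membership idea behind this kind of ensemble can be sketched generically: run k-means many times while varying k, count how often each pair of samples lands in the same cluster, and keep pairs that co-cluster consistently. This is a bare-bones ensemble on synthetic data with a hand-rolled Lloyd's k-means, not the published MULTI-K algorithm (which additionally uses an entropy plot to handle singletons).

```python
import numpy as np

rng = np.random.default_rng(3)

def kmeans(X, k, iters=25):
    """Minimal Lloyd's k-means returning hard labels."""
    centers = X[rng.choice(len(X), k, replace=False)]
    labels = np.zeros(len(X), int)
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        centers = np.array([X[labels == j].mean(0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels

# Two well-separated sample groups in 5 dimensions (stand-ins for profiles)
X = np.vstack([rng.normal(0, 1, (20, 5)), rng.normal(6, 1, (20, 5))])

# Ensemble: vary k across runs and accumulate pairwise co-membership rates
n_runs = 30
co = np.zeros((40, 40))
for i in range(n_runs):
    lab = kmeans(X, k=2 + i % 3)          # k cycles through 2, 3, 4
    co += lab[:, None] == lab[None, :]
co /= n_runs

# Pairs that co-cluster in most runs form the robust clusters; the 0.5
# threshold is a tunable choice, not a prescribed value
robust = co > 0.5
```

    Pairs from different true groups essentially never co-cluster, while same-group pairs co-cluster in most runs, which is the signal the ensemble aggregates.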

  2. Optimal design of a plot cluster for monitoring

    Treesearch

    Charles T. Scott

    1993-01-01

    Traveling costs incurred during extensive forest surveys make cluster sampling cost-effective. Clusters are specified by the type of plots, plot size, number of plots, and the distance between plots within the cluster. A method to determine the optimal cluster design when different plot types are used for different forest resource attributes is described. The method...
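    The cost trade-off that drives such designs can be illustrated with the textbook formula for the optimal number of plots per cluster, m* = sqrt((c1/c2) * (1 − rho)/rho), where c1 is the per-cluster travel cost, c2 the per-plot measurement cost, and rho the intracluster correlation. This is the classical single-attribute result, not Scott's multi-attribute method; the costs below are invented.

```python
import math

def optimal_plots_per_cluster(travel_cost, plot_cost, icc):
    """Textbook cost-optimal cluster size at fixed precision:
    m* = sqrt((c1 / c2) * (1 - rho) / rho)."""
    return math.sqrt((travel_cost / plot_cost) * (1 - icc) / icc)

# Expensive travel and weak within-cluster correlation favour larger clusters
m_cheap_travel = optimal_plots_per_cluster(travel_cost=50, plot_cost=10, icc=0.1)
m_dear_travel  = optimal_plots_per_cluster(travel_cost=500, plot_cost=10, icc=0.1)
```

    Since m* grows with the square root of the cost ratio, a tenfold increase in travel cost raises the optimal cluster size by about a factor of sqrt(10), while high within-cluster correlation pushes it back down.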

  3. The clustering-based case-based reasoning for imbalanced business failure prediction: a hybrid approach through integrating unsupervised process with supervised process

    NASA Astrophysics Data System (ADS)

    Li, Hui; Yu, Jun-Ling; Yu, Le-An; Sun, Jie

    2014-05-01

    Case-based reasoning (CBR) is one of the main methods in business forecasting; it performs well in prediction and can give explanations for its results. In business failure prediction (BFP), the number of failed enterprises is relatively small compared with the number of non-failed ones, yet the loss is huge when an enterprise fails. It is therefore necessary to develop methods, trained on imbalanced samples, that forecast well for this small proportion of failed enterprises while also maintaining high total accuracy. Commonly used methods built on the assumption of balanced samples do not predict the minority well on imbalanced samples consisting of the minority (failed) enterprises and the majority (non-failed) ones. This article develops a new method called clustering-based CBR (CBCBR), which integrates clustering analysis, an unsupervised process, with CBR, a supervised process, to enhance the efficiency of retrieving information from both the minority and the majority in CBR. In CBCBR, case classes are first generated through hierarchical clustering of the stored experienced cases, and class centres are calculated by integrating the case information within each clustered class. When predicting the label of a target case, its nearest clustered case class is first retrieved by ranking the similarities between the target case and each clustered case class centre. Then, the nearest neighbours of the target case within the retrieved class are found. Finally, the labels of the nearest experienced cases are used in prediction. In an empirical experiment with two imbalanced samples from China, the performance of CBCBR was compared with classical CBR, a support vector machine, a logistic regression and a multivariate discriminant analysis. The results show that CBCBR performed significantly better than the other four methods in terms of sensitivity for identifying the minority samples while generating high total accuracy. The proposed approach makes CBR useful for imbalanced forecasting.

  4. Spatial cluster analysis of nanoscopically mapped serotonin receptors for classification of fixed brain tissue

    NASA Astrophysics Data System (ADS)

    Sams, Michael; Silye, Rene; Göhring, Janett; Muresan, Leila; Schilcher, Kurt; Jacak, Jaroslaw

    2014-01-01

    We present a spatial cluster analysis method that uses nanoscopic dSTORM images to determine changes in protein cluster distributions within brain tissue. Such methods are suitable for investigating human brain tissue and will help to achieve a deeper understanding of brain disease, along with aiding drug development. Human brain tissue samples are usually treated postmortem via standard fixation protocols established in clinical laboratories. Therefore, our localization microscopy-based method was adapted to characterize protein density and protein cluster localization in samples fixed using different protocols, followed by common fluorescent immunohistochemistry techniques. Localization microscopy allows nanoscopic mapping of serotonin 5-HT1A receptor groups within a two-dimensional image of a brain tissue slice. These nanoscopically mapped proteins can be confined to clusters by applying the proposed statistical spatial analysis. Selected features of such clusters were subsequently used to characterize and classify the tissue. Samples were obtained from different types of patients, fixed with different preparation methods, and finally stored in a human tissue bank. To verify the proposed method, samples of a cryopreserved healthy brain were compared with epitope-retrieved and paraffin-fixed tissues. Furthermore, samples of healthy brain tissue were compared with data obtained from patients suffering from mental illnesses (e.g., major depressive disorder). Our work demonstrates the applicability of localization microscopy and image analysis methods for the comparison and classification of human brain tissue at the nanoscopic level. Furthermore, the presented workflow marks a unique technological advance in the characterization of protein distributions in brain tissue sections.

  5. Sampling in health geography: reconciling geographical objectives and probabilistic methods. An example of a health survey in Vientiane (Lao PDR)

    PubMed Central

    Vallée, Julie; Souris, Marc; Fournet, Florence; Bochaton, Audrey; Mobillion, Virginie; Peyronnie, Karine; Salem, Gérard

    2007-01-01

    Background Geographical objectives and probabilistic methods are difficult to reconcile in a unique health survey. Probabilistic methods focus on individuals to provide estimates of a variable's prevalence with a certain precision, while geographical approaches emphasise the selection of specific areas to study interactions between spatial characteristics and health outcomes. A sample selected from a small number of specific areas creates statistical challenges: the observations are not independent at the local level, and this results in poor statistical validity at the global level. Therefore, it is difficult to construct a sample that is appropriate for both geographical and probability methods. Methods We used a two-stage selection procedure with a first non-random stage of selection of clusters. Instead of randomly selecting clusters, we deliberately chose a group of clusters, which as a whole would contain all the variation in health measures in the population. As there was no health information available before the survey, we selected a priori determinants that can influence the spatial homogeneity of the health characteristics. This method yields a distribution of variables in the sample that closely resembles that in the overall population, something that cannot be guaranteed with randomly-selected clusters, especially if the number of selected clusters is small. In this way, we were able to survey specific areas while minimising design effects and maximising statistical precision. Application We applied this strategy in a health survey carried out in Vientiane, Lao People's Democratic Republic. We selected well-known health determinants with unequal spatial distribution within the city: nationality and literacy. We deliberately selected a combination of clusters whose distribution of nationality and literacy is similar to the distribution in the general population. Conclusion This paper describes the conceptual reasoning behind the construction of the survey sample and shows that it can be advantageous to choose clusters using reasoned hypotheses, based on both probability and geographical approaches, in contrast to a conventional, random cluster selection strategy. PMID:17543100
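    The purposive first-stage selection described here amounts to a small combinatorial search: choose the set of clusters whose aggregate covariate mix best matches the overall population distribution. A toy sketch with invented district-level data and two covariates (literacy and minority share):

```python
from itertools import combinations

# Hypothetical districts: name -> (population, share_literate, share_minority)
districts = {
    "A": (1200, 0.90, 0.05), "B": (800, 0.60, 0.30), "C": (1500, 0.85, 0.10),
    "D": (700, 0.55, 0.40),  "E": (1000, 0.75, 0.20), "F": (900, 0.70, 0.25),
}

def weighted_mix(names):
    """Population-weighted (literacy, minority) mix over a set of districts."""
    pop = sum(districts[n][0] for n in names)
    lit = sum(districts[n][0] * districts[n][1] for n in names) / pop
    mino = sum(districts[n][0] * districts[n][2] for n in names) / pop
    return lit, mino

city_lit, city_min = weighted_mix(districts)      # city-wide benchmark

# Pick the 3-district combination whose covariate mix best matches the city
best = min(
    combinations(districts, 3),
    key=lambda c: abs(weighted_mix(c)[0] - city_lit)
               + abs(weighted_mix(c)[1] - city_min),
)
```

    This is the reasoning step the paper describes informally: the chosen clusters jointly reproduce the population's covariate distribution, something a random draw of so few clusters cannot guarantee.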

  6. Clustering of longitudinal data by using an extended baseline: A new method for treatment efficacy clustering in longitudinal data.

    PubMed

    Schramm, Catherine; Vial, Céline; Bachoud-Lévi, Anne-Catherine; Katsahian, Sandrine

    2018-01-01

    Heterogeneity in treatment efficacy is a major concern in clinical trials. Clustering may help to identify treatment responders and non-responders. In the context of longitudinal cluster analyses, sample size and variability in the times of measurement are the main issues with current methods. Here, we propose a new two-step method for the clustering of longitudinal data by using an extended baseline. The first step relies on a piecewise linear mixed model for repeated measurements with a treatment-time interaction. The second step clusters the random predictions and considers several parametric (model-based) and non-parametric (partitioning, ascendant hierarchical clustering) algorithms. A simulation study compared all options of the extended-baseline clustering method with the latent-class mixed model. The extended-baseline method with the two model-based algorithms was the most robust. With the non-parametric algorithms it failed when the variances of the treatment effect differed between clusters or when the subgroups had unbalanced sample sizes, while the latent-class mixed model failed when between-patient slope variability was high. Two real data sets, on a neurodegenerative disease and on obesity, illustrate the extended-baseline clustering method and show how clustering may help to identify the marker(s) of treatment response. Applying the method in exploratory analysis, as a first stage before setting up stratified designs, can provide a better estimation of the treatment effect in future clinical trials.

  7. Managing Clustered Data Using Hierarchical Linear Modeling

    ERIC Educational Resources Information Center

    Warne, Russell T.; Li, Yan; McKyer, E. Lisako J.; Condie, Rachel; Diep, Cassandra S.; Murano, Peter S.

    2012-01-01

    Researchers in nutrition research often use cluster or multistage sampling to gather participants for their studies. These sampling methods often produce violations of the assumption of data independence that most traditional statistics share. Hierarchical linear modeling is a statistical method that can overcome violations of the independence…

  8. Testing the accuracy of clustering redshifts with simulations

    NASA Astrophysics Data System (ADS)

    Scottez, V.; Benoit-Lévy, A.; Coupon, J.; Ilbert, O.; Mellier, Y.

    2018-03-01

    We explore the accuracy of clustering-based redshift inference within the MICE2 simulation. This method uses the spatial clustering of galaxies between a spectroscopic reference sample and an unknown sample, and this study gives an estimate of its achievable accuracy. First, we discuss the requirements on the number of objects in the two samples, confirming that the method does not require a representative spectroscopic sample for calibration. In the context of the next generation of cosmological surveys, we estimate that the density of the quasi-stellar objects in BOSS allows us to reach 0.2 per cent accuracy in the mean redshift. Secondly, we estimate individual redshifts for galaxies in the densest regions of colour space (~30 per cent of the galaxies) without using the photometric redshift procedure. The advantage of this procedure is threefold. It allows: (i) the use of cluster-zs for any field in astronomy; (ii) the combination of photo-zs and cluster-zs to obtain an improved redshift estimate; (iii) the use of cluster-zs to define tomographic bins for weak lensing. Finally, we explore this last option and build five cluster-z-selected tomographic bins from redshift 0.2 to 1. We find a bias on the mean redshift estimate of 0.002 per bin. We conclude that cluster-zs could be used as a primary redshift estimator by the next generation of cosmological surveys.

  9. Unsupervised Learning —A Novel Clustering Method for Rolling Bearing Faults Identification

    NASA Astrophysics Data System (ADS)

    Kai, Li; Bo, Luo; Tao, Ma; Xuefeng, Yang; Guangming, Wang

    2017-12-01

    To promptly process massive fault data and automatically provide accurate diagnosis results, numerous studies have been conducted on the intelligent fault diagnosis of rolling bearings. Among these studies, supervised learning methods such as artificial neural networks, support vector machines and decision trees are commonly used. These methods can detect the failure of rolling bearings effectively, but to achieve better detection results they often require many training samples. Based on the above, a novel clustering method is proposed in this paper. This method is able to find the correct number of clusters automatically. Its effectiveness is validated using datasets from rolling element bearings. The diagnosis results show that the proposed method can accurately detect the fault types of small samples, while remaining relatively accurate even for massive samples.

  10. Edge Principal Components and Squash Clustering: Using the Special Structure of Phylogenetic Placement Data for Sample Comparison

    PubMed Central

    Matsen IV, Frederick A.; Evans, Steven N.

    2013-01-01

    Principal components analysis (PCA) and hierarchical clustering are two of the most heavily used techniques for analyzing the differences between nucleic acid sequence samples taken from a given environment. They have led to many insights regarding the structure of microbial communities. We have developed two new complementary methods that leverage how this microbial community data sits on a phylogenetic tree. Edge principal components analysis enables the detection of important differences between samples that contain closely related taxa. Each principal component axis is a collection of signed weights on the edges of the phylogenetic tree, and these weights are easily visualized by a suitable thickening and coloring of the edges. Squash clustering outputs a (rooted) clustering tree in which each internal node corresponds to an appropriate “average” of the original samples at the leaves below the node. Moreover, the length of an edge is a suitably defined distance between the averaged samples associated with the two incident nodes, rather than the less interpretable average of distances produced by UPGMA, the most widely used hierarchical clustering method in this context. We present these methods and illustrate their use with data from the human microbiome. PMID:23505415

  11. Mutation Clusters from Cancer Exome.

    PubMed

    Kakushadze, Zura; Yu, Willie

    2017-08-15

    We apply our statistically deterministic machine learning/clustering algorithm *K-means (recently developed in https://ssrn.com/abstract=2908286) to 10,656 published exome samples for 32 cancer types. A majority of cancer types exhibit a mutation clustering structure. Our results are in-sample stable. They are also out-of-sample stable when applied to 1389 published genome samples across 14 cancer types. In contrast, we find in- and out-of-sample instabilities in cancer signatures extracted from exome samples via nonnegative matrix factorization (NMF), a computationally-costly and non-deterministic method. Extracting stable mutation structures from exome data could have important implications for speed and cost, which are critical for early-stage cancer diagnostics, such as novel blood-test methods currently in development.

  12. Mutation Clusters from Cancer Exome

    PubMed Central

    Kakushadze, Zura; Yu, Willie

    2017-01-01

    We apply our statistically deterministic machine learning/clustering algorithm *K-means (recently developed in https://ssrn.com/abstract=2908286) to 10,656 published exome samples for 32 cancer types. A majority of cancer types exhibit a mutation clustering structure. Our results are in-sample stable. They are also out-of-sample stable when applied to 1389 published genome samples across 14 cancer types. In contrast, we find in- and out-of-sample instabilities in cancer signatures extracted from exome samples via nonnegative matrix factorization (NMF), a computationally-costly and non-deterministic method. Extracting stable mutation structures from exome data could have important implications for speed and cost, which are critical for early-stage cancer diagnostics, such as novel blood-test methods currently in development. PMID:28809811

  13. Enhanced conformational sampling to visualize a free-energy landscape of protein complex formation

    PubMed Central

    Iida, Shinji; Nakamura, Haruki; Higo, Junichi

    2016-01-01

    We introduce various, recently developed, generalized ensemble methods, which are useful to sample various molecular configurations emerging in the process of protein–protein or protein–ligand binding. The methods introduced here are those that have been or will be applied to biomolecular binding, where the biomolecules are treated as flexible molecules expressed by an all-atom model in an explicit solvent. Sampling produces an ensemble of conformations (snapshots) that are thermodynamically probable at room temperature. Then, projection of those conformations to an abstract low-dimensional space generates a free-energy landscape. As an example, we show a landscape of homo-dimer formation of an endothelin-1-like molecule computed using a generalized ensemble method. The lowest free-energy cluster at room temperature coincided precisely with the experimentally determined complex structure. Two minor clusters were also found in the landscape, which were largely different from the native complex form. Although those clusters were isolated at room temperature, with rising temperature a pathway emerged linking the lowest and second-lowest free-energy clusters, and a further temperature increment connected all the clusters. This exemplifies that the generalized ensemble method is a powerful tool for computing the free-energy landscape, by which one can discuss the thermodynamic stability of clusters and the temperature dependence of the cluster networks. PMID:27288028

  14. Population clustering based on copy number variations detected from next generation sequencing data.

    PubMed

    Duan, Junbo; Zhang, Ji-Gang; Wan, Mingxi; Deng, Hong-Wen; Wang, Yu-Ping

    2014-08-01

    Copy number variations (CNVs) can be used as significant biomarkers, and next generation sequencing (NGS) provides a high resolution detection of these CNVs. But how to extract features from CNVs and further apply them to genomic studies such as population clustering has become a big challenge. In this paper, we propose a novel method for population clustering based on CNVs from NGS. First, CNVs are extracted from each sample to form a feature matrix. Then, this feature matrix is decomposed into a source matrix and a weight matrix with non-negative matrix factorization (NMF). The source matrix consists of common CNVs that are shared by all the samples from the same group, and the weight matrix indicates the corresponding level of CNVs in each sample. Therefore, using NMF of CNVs one can differentiate samples from different ethnic groups, i.e. perform population clustering. To validate the approach, we applied it to the analysis of both simulated data and two real data sets from the 1000 Genomes Project. The results on simulated data demonstrate that the proposed method can recover the true common CNVs with high quality. The first real data analysis shows that the proposed method can cluster two family trios with different ancestries into two ethnic groups, and the second shows that it can be applied to whole-genome data with a large sample size consisting of multiple groups. Both results demonstrate the potential of the proposed method for population clustering.
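    The NMF decomposition step can be sketched with the classical Lee-Seung multiplicative updates on a toy CNV feature matrix. This is a generic rank-2 factorization on synthetic data, not the authors' pipeline; sample sizes and noise levels are invented.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy CNV feature matrix: 12 samples x 40 genomic bins, two ancestry groups
S_true = np.abs(rng.normal(0, 1, (2, 40)))        # group-specific CNV patterns
W_true = np.zeros((12, 2))
W_true[:6, 0] = rng.uniform(0.5, 1.5, 6)          # samples 0-5:  group 1
W_true[6:, 1] = rng.uniform(0.5, 1.5, 6)          # samples 6-11: group 2
V = W_true @ S_true + 0.01 * np.abs(rng.normal(0, 1, (12, 40)))

# Rank-2 NMF via Lee-Seung multiplicative updates: V ~ W @ H, entries >= 0
W = np.abs(rng.normal(0, 1, (12, 2)))
H = np.abs(rng.normal(0, 1, (2, 40)))
for _ in range(300):
    H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
    W *= (V @ H.T) / (W @ H @ H.T + 1e-9)

# Cluster assignment: each sample goes to its dominant CNV pattern
labels = W.argmax(1)
```

    Each sample's row of W indicates how strongly each common CNV pattern (a row of H) is present; assigning samples to their dominant pattern is what yields the population clustering.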

  15. Testing for X-Ray–SZ Differences and Redshift Evolution in the X-Ray Morphology of Galaxy Clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nurgaliev, D.; McDonald, M.; Benson, B. A.

    We present a quantitative study of the X-ray morphology of galaxy clusters as a function of their detection method and redshift. We analyze two separate samples of galaxy clusters: a sample of 36 clusters at 0.35 < z < 0.9 selected in the X-ray with the ROSAT PSPC 400 deg(2) survey, and a sample of 90 clusters at 0.25 < z < 1.2 selected via the Sunyaev–Zel’dovich (SZ) effect with the South Pole Telescope. Clusters from both samples have similar-quality Chandra observations, which allow us to quantify their X-ray morphologies via two distinct methods: centroid shifts (w) and photon asymmetry (A_phot). The latter technique provides nearly unbiased morphology estimates for clusters spanning a broad range of redshift and data quality. We further compare the X-ray morphologies of X-ray- and SZ-selected clusters with those of simulated clusters. We do not find a statistically significant difference in the measured X-ray morphology of X-ray- and SZ-selected clusters over the redshift range probed by these samples, suggesting that the two are probing similar populations of clusters. We find that the X-ray morphologies of simulated clusters are statistically indistinguishable from those of X-ray- or SZ-selected clusters, implying that the most important physics for dictating the large-scale gas morphology (outside of the core) is well approximated in these simulations. Finally, we find no statistically significant redshift evolution in the X-ray morphology (both for observed and simulated clusters) over the range z ∼ 0.3 to z ∼ 1, seemingly in contradiction with the redshift-dependent halo merger rate predicted by simulations.

  16. Testing for X-Ray–SZ Differences and Redshift Evolution in the X-Ray Morphology of Galaxy Clusters

    DOE PAGES

    Nurgaliev, D.; McDonald, M.; Benson, B. A.; ...

    2017-05-16

    We present a quantitative study of the X-ray morphology of galaxy clusters as a function of their detection method and redshift. We analyze two separate samples of galaxy clusters: a sample of 36 clusters at 0.35 < z < 0.9 selected in the X-ray with the ROSAT PSPC 400 deg(2) survey, and a sample of 90 clusters at 0.25 < z < 1.2 selected via the Sunyaev–Zel’dovich (SZ) effect with the South Pole Telescope. Clusters from both samples have similar-quality Chandra observations, which allow us to quantify their X-ray morphologies via two distinct methods: centroid shifts (w) and photon asymmetry (A_phot). The latter technique provides nearly unbiased morphology estimates for clusters spanning a broad range of redshift and data quality. We further compare the X-ray morphologies of X-ray- and SZ-selected clusters with those of simulated clusters. We do not find a statistically significant difference in the measured X-ray morphology of X-ray- and SZ-selected clusters over the redshift range probed by these samples, suggesting that the two are probing similar populations of clusters. We find that the X-ray morphologies of simulated clusters are statistically indistinguishable from those of X-ray- or SZ-selected clusters, implying that the most important physics for dictating the large-scale gas morphology (outside of the core) is well approximated in these simulations. Finally, we find no statistically significant redshift evolution in the X-ray morphology (both for observed and simulated clusters) over the range z ∼ 0.3 to z ∼ 1, seemingly in contradiction with the redshift-dependent halo merger rate predicted by simulations.

  17. EVIDENCE FOR THE UNIVERSALITY OF PROPERTIES OF RED-SEQUENCE GALAXIES IN X-RAY- AND RED-SEQUENCE-SELECTED CLUSTERS AT z ∼ 1

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Foltz, R.; Wilson, G.; DeGroot, A.

    We study the slope, intercept, and scatter of the color–magnitude and color–mass relations for a sample of 10 infrared red-sequence-selected clusters at z ∼ 1. The quiescent galaxies in these clusters formed the bulk of their stars above z ≳ 3 with an age spread Δt ≳ 1 Gyr. We compare UVJ color–color and spectroscopic-based galaxy selection techniques, and find a 15% difference in the galaxy populations classified as quiescent by these methods. We compare the color–magnitude relations from our red-sequence-selected sample with X-ray- and photometric-redshift-selected cluster samples of similar mass and redshift. Within uncertainties, we are unable to detect any difference in the ages and star formation histories of quiescent cluster members in clusters selected by different methods, suggesting that the dominant quenching mechanism is insensitive to cluster baryon partitioning at z ∼ 1.

  18. Sampling in health geography: reconciling geographical objectives and probabilistic methods. An example of a health survey in Vientiane (Lao PDR).

    PubMed

    Vallée, Julie; Souris, Marc; Fournet, Florence; Bochaton, Audrey; Mobillion, Virginie; Peyronnie, Karine; Salem, Gérard

    2007-06-01

    Geographical objectives and probabilistic methods are difficult to reconcile in a unique health survey. Probabilistic methods focus on individuals to provide estimates of a variable's prevalence with a certain precision, while geographical approaches emphasise the selection of specific areas to study interactions between spatial characteristics and health outcomes. A sample selected from a small number of specific areas creates statistical challenges: the observations are not independent at the local level, and this results in poor statistical validity at the global level. Therefore, it is difficult to construct a sample that is appropriate for both geographical and probability methods. We used a two-stage selection procedure with a first non-random stage of selection of clusters. Instead of randomly selecting clusters, we deliberately chose a group of clusters, which as a whole would contain all the variation in health measures in the population. As there was no health information available before the survey, we selected a priori determinants that can influence the spatial homogeneity of the health characteristics. This method yields a distribution of variables in the sample that closely resembles that in the overall population, something that cannot be guaranteed with randomly-selected clusters, especially if the number of selected clusters is small. In this way, we were able to survey specific areas while minimising design effects and maximising statistical precision. We applied this strategy in a health survey carried out in Vientiane, Lao People's Democratic Republic. We selected well-known health determinants with unequal spatial distribution within the city: nationality and literacy. We deliberately selected a combination of clusters whose distribution of nationality and literacy is similar to the distribution in the general population. 
This paper describes the conceptual reasoning behind the construction of the survey sample and shows that it can be advantageous to choose clusters using reasoned hypotheses, based on both probability and geographical approaches, in contrast to a conventional, random cluster selection strategy.
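    The deliberate first-stage selection described above can be sketched as a small search over candidate cluster combinations. The cluster names, populations, and literacy rates below are hypothetical, and exhaustive search is only one simple way to implement the matching:

```python
from itertools import combinations

# Hypothetical cluster-level data: (cluster_id, population, literacy_rate).
clusters = [
    ("A", 500, 0.92), ("B", 300, 0.55), ("C", 400, 0.75),
    ("D", 250, 0.40), ("E", 350, 0.85), ("F", 200, 0.60),
]

# Population-wide literacy rate: the target the chosen clusters should reproduce.
total_pop = sum(p for _, p, _ in clusters)
target = sum(p * r for _, p, r in clusters) / total_pop

def pooled_rate(subset):
    """Literacy rate of the subset, pooled over its populations."""
    pop = sum(p for _, p, _ in subset)
    return sum(p * r for _, p, r in subset) / pop

# Deliberately choose the 3-cluster combination whose pooled rate best matches
# the overall population, instead of selecting clusters at random.
best = min(combinations(clusters, 3), key=lambda s: abs(pooled_rate(s) - target))
print([c[0] for c in best], round(pooled_rate(best), 3), round(target, 3))
```

    With several determinants (nationality and literacy in the survey above), the objective becomes a distance between the joint sample and population distributions, but the reasoning is the same.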

  19. Cluster designs to assess the prevalence of acute malnutrition by lot quality assurance sampling: a validation study by computer simulation.

    PubMed

    Olives, Casey; Pagano, Marcello; Deitchler, Megan; Hedt, Bethany L; Egge, Kari; Valadez, Joseph J

    2009-04-01

    Traditional lot quality assurance sampling (LQAS) methods require simple random sampling to guarantee valid results. However, cluster sampling has been proposed to reduce the number of random starting points. This study uses simulations to examine the classification error of two such designs, a 67×3 (67 clusters of three observations) and a 33×6 (33 clusters of six observations) sampling scheme to assess the prevalence of global acute malnutrition (GAM). Further, we explore the use of a 67×3 sequential sampling scheme for LQAS classification of GAM prevalence. Results indicate that, for independent clusters with moderate intracluster correlation for the GAM outcome, the three sampling designs maintain approximate validity for LQAS analysis. Sequential sampling can substantially reduce the average sample size that is required for data collection. The presence of intercluster correlation can dramatically impact the classification error associated with LQAS analysis.
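    A validation study of this kind can be sketched with a Monte Carlo simulation in which intracluster correlation is induced by a beta-binomial model. The decision threshold, ICC, and replication count below are illustrative, not the published 67×3 or 33×6 decision rules; the classification errors follow from the acceptance probabilities printed:

```python
import random

random.seed(1)

def prob_accept(p_true, n_clusters=67, m=3, threshold=10, icc=0.1, reps=1000):
    """Monte Carlo probability that a clustered LQAS design classifies the
    lot as low-prevalence (total cases <= threshold).

    Cluster-level prevalences are drawn from a beta distribution with mean
    p_true whose spread is set by the intracluster correlation (a
    beta-binomial model); threshold and parameters are illustrative.
    """
    s = (1 - icc) / icc          # beta 'sample size': a + b = (1 - icc) / icc
    a, b = p_true * s, (1 - p_true) * s
    accept = 0
    for _ in range(reps):
        cases = 0
        for _ in range(n_clusters):
            p_cluster = random.betavariate(a, b)
            cases += sum(random.random() < p_cluster for _ in range(m))
        accept += cases <= threshold
    return accept / reps

# A low-prevalence lot should usually be accepted; a high-prevalence lot rarely.
print(prob_accept(0.02), prob_accept(0.15))
```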

  20. A note on the kappa statistic for clustered dichotomous data.

    PubMed

    Zhou, Ming; Yang, Zhao

    2014-06-30

    The kappa statistic is widely used to assess the agreement between two raters. Motivated by a simulation-based cluster bootstrap method to calculate the variance of the kappa statistic for clustered physician-patients dichotomous data, we investigate its special correlation structure and develop a new simple and efficient data generation algorithm. For the clustered physician-patients dichotomous data, based on the delta method and its special covariance structure, we propose a semi-parametric variance estimator for the kappa statistic. An extensive Monte Carlo simulation study is performed to evaluate the performance of the new proposal and five existing methods with respect to the empirical coverage probability, root-mean-square error, and average width of the 95% confidence interval for the kappa statistic. The variance estimator ignoring the dependence within a cluster is generally inappropriate, and the variance estimators from the new proposal, bootstrap-based methods, and the sampling-based delta method perform reasonably well for at least a moderately large number of clusters (e.g., the number of clusters K ⩾ 50). The new proposal and sampling-based delta method provide convenient tools for efficient computations and non-simulation-based alternatives to the existing bootstrap-based methods. Moreover, the new proposal has acceptable performance even when the number of clusters is as small as K = 25. To illustrate the practical application of all the methods, one psychiatric research dataset and two simulated clustered physician-patients dichotomous datasets are analyzed. Copyright © 2014 John Wiley & Sons, Ltd.
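    The cluster bootstrap that motivates the paper can be sketched as follows: compute Cohen's kappa over all rating pairs, then estimate its variance by resampling whole clusters (physicians) so that within-cluster dependence is preserved. The data, agreement rates, and replication count are hypothetical:

```python
import random

def kappa(pairs):
    """Cohen's kappa for paired dichotomous ratings [(rater1, rater2), ...]."""
    n = len(pairs)
    po = sum(a == b for a, b in pairs) / n        # observed agreement
    p1 = sum(a for a, _ in pairs) / n             # rater 1 'positive' rate
    p2 = sum(b for _, b in pairs) / n             # rater 2 'positive' rate
    pe = p1 * p2 + (1 - p1) * (1 - p2)            # chance agreement
    return (po - pe) / (1 - pe)

def cluster_bootstrap_var(clusters, reps=500, seed=0):
    """Variance of kappa by resampling whole clusters with replacement."""
    rng = random.Random(seed)
    stats = []
    for _ in range(reps):
        sample = [rng.choice(clusters) for _ in range(len(clusters))]
        pooled = [pair for cl in sample for pair in cl]
        stats.append(kappa(pooled))
    m = sum(stats) / reps
    return sum((k - m) ** 2 for k in stats) / (reps - 1)

# Hypothetical data: 6 physicians (clusters), 8 patients each, two raters who
# agree on a patient about 85% of the time.
rng = random.Random(42)
clusters = [[(x, x if rng.random() < 0.85 else 1 - x)
             for x in (rng.random() < 0.4 for _ in range(8))]
            for _ in range(6)]
print(kappa([p for cl in clusters for p in cl]), cluster_bootstrap_var(clusters))
```

    The paper's contribution replaces the resampling with a closed-form, delta-method variance exploiting the special covariance structure; the sketch above shows only the baseline it is compared against.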

  1. Machine learning approaches for estimation of prediction interval for the model output.

    PubMed

    Shrestha, Durga L; Solomatine, Dimitri P

    2006-03-01

    A novel method for estimating prediction uncertainty using machine learning techniques is presented. Uncertainty is expressed in the form of two quantiles (constituting the prediction interval) of the underlying distribution of prediction errors. The idea is to partition the input space into different zones or clusters having similar model errors using fuzzy c-means clustering. The prediction interval is constructed for each cluster on the basis of the empirical distribution of the errors associated with all instances belonging to that cluster, and is propagated from each cluster to individual examples according to their membership grades in each cluster. A regression model is then built for in-sample data using the computed prediction limits as targets, and finally this model is applied to estimate the prediction intervals (limits) for out-of-sample data. The method was tested on artificial and real hydrologic data sets using various machine learning techniques. Preliminary results show that the method is superior to other methods for estimating the prediction interval. A new method for evaluating the performance of prediction interval estimation is proposed as well.
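    A minimal sketch of the clustering-based interval construction, with two simplifications: plain k-means stands in for fuzzy c-means (hard assignments replace membership grades), and the final regression step is omitted. The data are simulated so that model errors grow with the input:

```python
import random

random.seed(0)

# Toy setting: residuals of some model whose errors grow with the input x,
# so different regions of the input space have different error distributions.
xs = [random.uniform(0, 10) for _ in range(400)]
errors = [random.gauss(0, 0.2 + 0.3 * x) for x in xs]

def kmeans_1d(points, k=2, iters=20):
    """Plain k-means on the input space (the paper's fuzzy c-means, with
    membership grades, is swapped for simplicity)."""
    centers = random.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda i: abs(p - centers[i]))].append(p)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers

def interval_for(x, xs, errors, centers, lo=0.05, hi=0.95):
    """90% prediction interval from the empirical error quantiles of x's cluster."""
    nearest = lambda v: min(range(len(centers)), key=lambda i: abs(v - centers[i]))
    c = nearest(x)
    errs = sorted(e for xi, e in zip(xs, errors) if nearest(xi) == c)
    return errs[int(lo * len(errs))], errs[int(hi * len(errs))]

centers = kmeans_1d(xs)
low = interval_for(1.0, xs, errors, centers)    # quiet region of input space
high = interval_for(9.0, xs, errors, centers)   # noisy region: wider interval
print(low, high)
```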

  2. Rigid geometry solves “curse of dimensionality” effects in clustering methods: An application to omics data

    PubMed Central

    2017-01-01

    The quality of samples preserved long term at ultralow temperatures has not been adequately studied. To improve our understanding, we need a strategy to analyze protein degradation and metabolism at subfreezing temperatures. To do this, we obtained liquid chromatography-mass spectrometry (LC/MS) data of calculated protein signal intensities in HEK-293 cells. Our first attempt at directly clustering the values failed, most likely due to the so-called “curse of dimensionality”. The clusters were not reproducible, and the outputs differed with different methods. By utilizing rigid geometry with a prime ideal I-adic (p-adic) metric, however, we rearranged the sample clusters into a meaningful and reproducible order, and the results were the same with each of the different clustering methods tested. Furthermore, we have also succeeded in application of this method to expression array data in similar situations. Thus, we eliminated the “curse of dimensionality” from the data set, at least in clustering methods. It is possible that our approach determines a characteristic value of systems that follow a Boltzmann distribution. PMID:28614363
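    The p-adic metric underlying this approach can be illustrated directly; this sketch shows only the distance itself (for integers), not the clustering application:

```python
def p_adic_distance(x, y, p=2):
    """p-adic distance |x - y|_p = p**(-v), where p**v is the largest power
    of p dividing x - y (and the distance is 0 when x == y)."""
    if x == y:
        return 0.0
    d, v = abs(x - y), 0
    while d % p == 0:
        d //= p
        v += 1
    return p ** -v

# Numbers whose difference is divisible by a high power of p are 'close':
print(p_adic_distance(1, 33, p=2))  # 32 = 2^5 divides the difference -> 2**-5
print(p_adic_distance(1, 2, p=2))   # odd difference -> distance 1
```

    The metric is an ultrametric (every triangle is isosceles), which is what makes cluster assignments stable across clustering methods.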

  3. Hierarchical modeling of cluster size in wildlife surveys

    USGS Publications Warehouse

    Royle, J. Andrew

    2008-01-01

    Clusters or groups of individuals are the fundamental unit of observation in many wildlife sampling problems, including aerial surveys of waterfowl, marine mammals, and ungulates. Explicit accounting of cluster size in models for estimating abundance is necessary because detection of individuals within clusters is not independent and detectability of clusters is likely to increase with cluster size. This induces a cluster size bias in which the average cluster size in the sample is larger than in the population at large. Thus, failure to account for the relationship between detectability and cluster size will tend to yield a positive bias in estimates of abundance or density. I describe a hierarchical modeling framework for accounting for cluster-size bias in animal sampling. The hierarchical model consists of models for the observation process conditional on the cluster size distribution and the cluster size distribution conditional on the total number of clusters. Optionally, a spatial model can be specified that describes variation in the total number of clusters per sample unit. Parameter estimation, model selection, and criticism may be carried out using conventional likelihood-based methods. An extension of the model is described for the situation where measurable covariates at the level of the sample unit are available. Several candidate models within the proposed class are evaluated for aerial survey data on mallard ducks (Anas platyrhynchos).
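    The cluster-size bias described above is easy to reproduce in simulation; the detection-probability form below is illustrative, not the paper's model:

```python
import random

random.seed(7)

# Population of clusters (e.g. waterfowl groups) with sizes 1..10.
sizes = [random.randint(1, 10) for _ in range(10000)]

def detected(s):
    """Detection probability increasing with cluster size (illustrative)."""
    return random.random() < min(1.0, 0.1 * s)

observed = [s for s in sizes if detected(s)]
pop_mean = sum(sizes) / len(sizes)
obs_mean = sum(observed) / len(observed)
# The observed mean exceeds the population mean: size-biased sampling.
print(pop_mean, obs_mean)
```

    With detection probability proportional to size, the expected observed mean is E[S²]/E[S], which always exceeds E[S]; the hierarchical model corrects for exactly this.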

  4. Enhanced conformational sampling to visualize a free-energy landscape of protein complex formation.

    PubMed

    Iida, Shinji; Nakamura, Haruki; Higo, Junichi

    2016-06-15

    We introduce various, recently developed, generalized ensemble methods, which are useful to sample various molecular configurations emerging in the process of protein-protein or protein-ligand binding. The methods introduced here are those that have been or will be applied to biomolecular binding, where the biomolecules are treated as flexible molecules expressed by an all-atom model in an explicit solvent. Sampling produces an ensemble of conformations (snapshots) that are thermodynamically probable at room temperature. Then, projection of those conformations to an abstract low-dimensional space generates a free-energy landscape. As an example, we show a landscape of homo-dimer formation of an endothelin-1-like molecule computed using a generalized ensemble method. The lowest free-energy cluster at room temperature coincided precisely with the experimentally determined complex structure. Two minor clusters were also found in the landscape, which were largely different from the native complex form. Although those clusters were isolated at room temperature, with rising temperature a pathway emerged linking the lowest and second-lowest free-energy clusters, and a further temperature increment connected all the clusters. This exemplifies that the generalized ensemble method is a powerful tool for computing the free-energy landscape, by which one can discuss the thermodynamic stability of clusters and the temperature dependence of the cluster networks. © 2016 The Author(s).

  5. Recommendations for choosing an analysis method that controls Type I error for unbalanced cluster sample designs with Gaussian outcomes.

    PubMed

    Johnson, Jacqueline L; Kreidler, Sarah M; Catellier, Diane J; Murray, David M; Muller, Keith E; Glueck, Deborah H

    2015-11-30

    We used theoretical and simulation-based approaches to study Type I error rates for one-stage and two-stage analytic methods for cluster-randomized designs. The one-stage approach uses the observed data as outcomes and accounts for within-cluster correlation using a general linear mixed model. The two-stage model uses the cluster specific means as the outcomes in a general linear univariate model. We demonstrate analytically that both one-stage and two-stage models achieve exact Type I error rates when cluster sizes are equal. With unbalanced data, an exact size α test does not exist, and Type I error inflation may occur. Via simulation, we compare the Type I error rates for four one-stage and six two-stage hypothesis testing approaches for unbalanced data. With unbalanced data, the two-stage model, weighted by the inverse of the estimated theoretical variance of the cluster means, and with variance constrained to be positive, provided the best Type I error control for studies having at least six clusters per arm. The one-stage model with Kenward-Roger degrees of freedom and unconstrained variance performed well for studies having at least 14 clusters per arm. The popular analytic method of using a one-stage model with denominator degrees of freedom appropriate for balanced data performed poorly for small sample sizes and low intracluster correlation. Because small sample sizes and low intracluster correlation are common features of cluster-randomized trials, the Kenward-Roger method is the preferred one-stage approach. Copyright © 2015 John Wiley & Sons, Ltd.
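    A sketch of the recommended two-stage estimator: cluster means as outcomes, each weighted by the inverse of its estimated theoretical variance, with the between-cluster variance floored at zero (the 'variance constrained to be positive' rule). The data and the simple method-of-moments variance components are illustrative, not the paper's exact estimators:

```python
import statistics

# Hypothetical unbalanced cluster data (one outcome per subject, by cluster).
clusters = [
    [4.1, 3.8, 4.5], [5.0, 4.9, 5.2, 5.1, 4.8], [3.9, 4.2],
    [4.6, 4.4, 4.7, 4.5], [5.3, 5.1], [4.0, 4.3, 4.1, 3.9, 4.2, 4.0],
]

means = [statistics.fmean(c) for c in clusters]
ns = [len(c) for c in clusters]

# Method-of-moments components: pooled within-cluster variance, and a
# between-cluster variance backed out of the spread of cluster means,
# floored at zero.
within = statistics.fmean([statistics.variance(c) for c in clusters])
between = max(0.0, statistics.variance(means)
              - within * statistics.fmean([1 / n for n in ns]))

# Two-stage analysis: weight each cluster mean by the inverse of the
# estimated theoretical variance of that cluster's mean.
weights = [1 / (between + within / n) for n in ns]
grand = sum(w * m for w, m in zip(weights, means)) / sum(weights)
print(round(grand, 4))
```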

  6. Observed intra-cluster correlation coefficients in a cluster survey sample of patient encounters in general practice in Australia

    PubMed Central

    Knox, Stephanie A; Chondros, Patty

    2004-01-01

    Background Cluster sample study designs are cost effective, however cluster samples violate the simple random sample assumption of independence of observations. Failure to account for the intra-cluster correlation of observations when sampling through clusters may lead to an under-powered study. Researchers therefore need estimates of intra-cluster correlation for a range of outcomes to calculate sample size. We report intra-cluster correlation coefficients observed within a large-scale cross-sectional study of general practice in Australia, where the general practitioner (GP) was the primary sampling unit and the patient encounter was the unit of inference. Methods Each year the Bettering the Evaluation and Care of Health (BEACH) study recruits a random sample of approximately 1,000 GPs across Australia. Each GP completes details of 100 consecutive patient encounters. Intra-cluster correlation coefficients were estimated for patient demographics, morbidity managed and treatments received. Intra-cluster correlation coefficients were estimated for descriptive outcomes and for associations between outcomes and predictors and were compared across two independent samples of GPs drawn three years apart. Results Between April 1999 and March 2000, a random sample of 1,047 Australian general practitioners recorded details of 104,700 patient encounters. Intra-cluster correlation coefficients for patient demographics ranged from 0.055 for patient sex to 0.451 for language spoken at home. Intra-cluster correlations for morbidity variables ranged from 0.005 for the management of eye problems to 0.059 for management of psychological problems. Intra-cluster correlation for the association between two variables was smaller than the descriptive intra-cluster correlation of each variable. When compared with the April 2002 to March 2003 sample (1,008 GPs) the estimated intra-cluster correlation coefficients were found to be consistent across samples. 
Conclusions The demonstrated precision and reliability of the estimated intra-cluster correlations indicate that these coefficients will be useful for calculating sample sizes in future general practice surveys that use the GP as the primary sampling unit. PMID:15613248
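    Intra-cluster correlation coefficients like those reported here are commonly estimated with a one-way ANOVA estimator. A minimal sketch for balanced clusters (the BEACH design, with 100 encounters per GP, is balanced in this sense), using made-up data:

```python
import statistics

def icc_anova(clusters):
    """One-way ANOVA estimator of the intra-cluster correlation
    (balanced clusters assumed for simplicity)."""
    k = len(clusters)
    m = len(clusters[0])                 # common cluster size
    grand = statistics.fmean(v for c in clusters for v in c)
    msb = m * sum((statistics.fmean(c) - grand) ** 2 for c in clusters) / (k - 1)
    msw = sum((v - statistics.fmean(c)) ** 2
              for c in clusters for v in c) / (k * (m - 1))
    return (msb - msw) / (msb + (m - 1) * msw)

# Clusters that are internally homogeneous give a high ICC ...
similar = [[1, 1, 1, 1], [5, 5, 5, 4], [9, 9, 8, 9]]
# ... clusters that are internally heterogeneous give a low (even negative) one.
mixed = [[1, 5, 9, 4], [2, 6, 8, 5], [1, 6, 9, 3]]
print(icc_anova(similar), icc_anova(mixed))
```

    The required sample size for a cluster design then inflates by the design effect 1 + (m - 1) × ICC, which is why even the small ICCs reported for morbidity variables matter at 100 encounters per GP.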

  7. Self-similarity Clustering Event Detection Based on Triggers Guidance

    NASA Astrophysics Data System (ADS)

    Zhang, Xianfei; Li, Bicheng; Tian, Yuxuan

    Traditional methods for Event Detection and Characterization (EDC) treat event detection as a classification problem, using words as samples to train the classifier, which can leave the classifier's positive and negative samples imbalanced. This approach also suffers from data sparseness when the corpus is small. Rather than classifying events with words as samples, this paper clusters events when judging event types. Guided by event triggers, it uses self-similarity to converge on the value of K in the K-means algorithm, thereby optimizing the clustering. Combining this with named entities and their relative position information, the method then pinpoints the event type. The new method avoids the dependence on event templates found in traditional methods, and its detection results can be applied to automatic text summarization, text retrieval, and topic detection and tracking.

  8. Cluster designs to assess the prevalence of acute malnutrition by lot quality assurance sampling: a validation study by computer simulation

    PubMed Central

    Olives, Casey; Pagano, Marcello; Deitchler, Megan; Hedt, Bethany L; Egge, Kari; Valadez, Joseph J

    2009-01-01

    Traditional lot quality assurance sampling (LQAS) methods require simple random sampling to guarantee valid results. However, cluster sampling has been proposed to reduce the number of random starting points. This study uses simulations to examine the classification error of two such designs, a 67×3 (67 clusters of three observations) and a 33×6 (33 clusters of six observations) sampling scheme to assess the prevalence of global acute malnutrition (GAM). Further, we explore the use of a 67×3 sequential sampling scheme for LQAS classification of GAM prevalence. Results indicate that, for independent clusters with moderate intracluster correlation for the GAM outcome, the three sampling designs maintain approximate validity for LQAS analysis. Sequential sampling can substantially reduce the average sample size that is required for data collection. The presence of intercluster correlation can dramatically impact the classification error associated with LQAS analysis. PMID:20011037

  9. ALCHEMY: a reliable method for automated SNP genotype calling for small batch sizes and highly homozygous populations

    PubMed Central

    Wright, Mark H.; Tung, Chih-Wei; Zhao, Keyan; Reynolds, Andy; McCouch, Susan R.; Bustamante, Carlos D.

    2010-01-01

    Motivation: The development of new high-throughput genotyping products requires a significant investment in testing and training samples to evaluate and optimize the product before it can be used reliably on new samples. One reason for this is current methods for automated calling of genotypes are based on clustering approaches which require a large number of samples to be analyzed simultaneously, or an extensive training dataset to seed clusters. In systems where inbred samples are of primary interest, current clustering approaches perform poorly due to the inability to clearly identify a heterozygote cluster. Results: As part of the development of two custom single nucleotide polymorphism genotyping products for Oryza sativa (domestic rice), we have developed a new genotype calling algorithm called ‘ALCHEMY’ based on statistical modeling of the raw intensity data rather than modelless clustering. A novel feature of the model is the ability to estimate and incorporate inbreeding information on a per sample basis allowing accurate genotyping of both inbred and heterozygous samples even when analyzed simultaneously. Since clustering is not used explicitly, ALCHEMY performs well on small sample sizes with accuracy exceeding 99% with as few as 18 samples. Availability: ALCHEMY is available for both commercial and academic use free of charge and distributed under the GNU General Public License at http://alchemy.sourceforge.net/ Contact: mhw6@cornell.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20926420

  10. A two-stage cluster sampling method using gridded population data, a GIS, and Google Earth(TM) imagery in a population-based mortality survey in Iraq.

    PubMed

    Galway, LP; Bell, Nathaniel; Sae, Al Shatari; Hagopian, Amy; Burnham, Gilbert; Flaxman, Abraham; Weiss, William M; Rajaratnam, Julie; Takaro, Tim K

    2012-04-27

    Mortality estimates can measure and monitor the impacts of conflict on a population, guide humanitarian efforts, and help to better understand the public health impacts of conflict. Vital statistics registration and surveillance systems are rarely functional in conflict settings, posing a challenge of estimating mortality using retrospective population-based surveys. We present a two-stage cluster sampling method for application in population-based mortality surveys. The sampling method utilizes gridded population data and a geographic information system (GIS) to select clusters in the first sampling stage and Google Earth TM imagery and sampling grids to select households in the second sampling stage. The sampling method is implemented in a household mortality study in Iraq in 2011. Factors affecting feasibility and methodological quality are described. Sampling is a challenge in retrospective population-based mortality studies and alternatives that improve on the conventional approaches are needed. The sampling strategy presented here was designed to generate a representative sample of the Iraqi population while reducing the potential for bias and considering the context specific challenges of the study setting. This sampling strategy, or variations on it, are adaptable and should be considered and tested in other conflict settings.
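    The first sampling stage can be sketched as systematic probability-proportional-to-size (PPS) selection over a population grid; the grid values, seed, and cluster count below are hypothetical, and the paper's GIS and Google Earth specifics are not reproduced:

```python
import random
from itertools import accumulate

random.seed(3)

# Hypothetical population grid (standing in for gridded population data):
# one population count per cell; cells play the role of candidate clusters.
grid = {(r, c): random.randint(50, 2000) for r in range(10) for c in range(10)}

def pps_systematic(grid, n_clusters):
    """Systematic PPS selection of grid cells: walk a random-start,
    fixed-step sequence along the cumulative population totals."""
    cells = list(grid)
    cum = list(accumulate(grid[c] for c in cells))
    step = cum[-1] / n_clusters
    start = random.uniform(0, step)
    picks, j = [], 0
    for i in range(n_clusters):
        target = start + i * step
        while cum[j] < target:
            j += 1
        picks.append(cells[j])
    return picks

selected = pps_systematic(grid, 30)
print(len(selected))
```

    The second stage (overlaying a sampling grid on imagery of each selected cluster and drawing households) is a geometric exercise on top of the same idea.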

  11. A two-stage cluster sampling method using gridded population data, a GIS, and Google EarthTM imagery in a population-based mortality survey in Iraq

    PubMed Central

    2012-01-01

    Background Mortality estimates can measure and monitor the impacts of conflict on a population, guide humanitarian efforts, and help to better understand the public health impacts of conflict. Vital statistics registration and surveillance systems are rarely functional in conflict settings, posing a challenge of estimating mortality using retrospective population-based surveys. Results We present a two-stage cluster sampling method for application in population-based mortality surveys. The sampling method utilizes gridded population data and a geographic information system (GIS) to select clusters in the first sampling stage and Google Earth TM imagery and sampling grids to select households in the second sampling stage. The sampling method is implemented in a household mortality study in Iraq in 2011. Factors affecting feasibility and methodological quality are described. Conclusion Sampling is a challenge in retrospective population-based mortality studies and alternatives that improve on the conventional approaches are needed. The sampling strategy presented here was designed to generate a representative sample of the Iraqi population while reducing the potential for bias and considering the context specific challenges of the study setting. This sampling strategy, or variations on it, are adaptable and should be considered and tested in other conflict settings. PMID:22540266

  12. Performance of small cluster surveys and the clustered LQAS design to estimate local-level vaccination coverage in Mali

    PubMed Central

    2012-01-01

    Background Estimation of vaccination coverage at the local level is essential to identify communities that may require additional support. Cluster surveys can be used in resource-poor settings, when population figures are inaccurate. To be feasible, cluster samples need to be small, without losing robustness of results. The clustered LQAS (CLQAS) approach has been proposed as an alternative, as smaller sample sizes are required. Methods We explored (i) the efficiency of cluster surveys of decreasing sample size through bootstrapping analysis and (ii) the performance of CLQAS under three alternative sampling plans to classify local vaccination coverage (VC), using data from a survey carried out in Mali after mass vaccination against meningococcal meningitis group A. Results VC estimates provided by a 10 × 15 cluster survey design were reasonably robust. We used them to classify health areas in three categories and guide mop-up activities: i) health areas not requiring supplemental activities; ii) health areas requiring additional vaccination; iii) health areas requiring further evaluation. As sample size decreased (from 10 × 15 to 10 × 3), standard errors of VC and ICC estimates became increasingly unstable. Results of CLQAS simulations were not accurate for most health areas, with an overall risk of misclassification greater than 0.25 in one health area out of three. It was greater than 0.50 in one health area out of two under two of the three sampling plans. Conclusions Small sample cluster surveys (10 × 15) are acceptably robust for classification of VC at local level. We do not recommend the CLQAS method as currently formulated for evaluating vaccination programmes. PMID:23057445

  13. Hierarchical Bayesian modelling of gene expression time series across irregularly sampled replicates and clusters.

    PubMed

    Hensman, James; Lawrence, Neil D; Rattray, Magnus

    2013-08-20

    Time course data from microarrays and high-throughput sequencing experiments require simple, computationally efficient and powerful statistical models to extract meaningful biological signal, and for tasks such as data fusion and clustering. Existing methodologies fail to capture either the temporal or replicated nature of the experiments, and often impose constraints on the data collection process, such as regularly spaced samples, or similar sampling schema across replications. We propose hierarchical Gaussian processes as a general model of gene expression time-series, with application to a variety of problems. In particular, we illustrate the method's capacity for missing data imputation, data fusion and clustering. The method can impute data which is missing both systematically and at random: in a hold-out test on real data, performance is significantly better than commonly used imputation methods. The method's ability to model inter- and intra-cluster variance leads to more biologically meaningful clusters. The approach removes the necessity for evenly spaced samples, an advantage illustrated on a developmental Drosophila dataset with irregular replications. The hierarchical Gaussian process model provides an excellent statistical basis for several gene-expression time-series tasks. It has only a few additional parameters over a regular GP, has negligible additional complexity, is easily implemented and can be integrated into several existing algorithms. Our experiments were implemented in Python, and are available from the authors' website: http://staffwww.dcs.shef.ac.uk/people/J.Hensman/.
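    The hierarchical GP structure can be sketched as a sum of kernels: a shared gene-level term that covers every pair of observations, plus a replicate-level term active only within a replicate. Hyperparameter values and time points below are illustrative:

```python
import math

def rbf(x, y, lengthscale, variance):
    """Squared-exponential (RBF) covariance between two time points."""
    return variance * math.exp(-0.5 * ((x - y) / lengthscale) ** 2)

def hierarchical_cov(points, gene_ls=2.0, gene_var=1.0, rep_ls=1.0, rep_var=0.3):
    """Covariance matrix of a hierarchical GP: a shared gene-level RBF term
    for every pair of observations, plus a replicate-level RBF term only
    when two observations come from the same replicate. `points` holds
    (time, replicate) pairs and may be irregularly, unevenly sampled."""
    n = len(points)
    K = [[0.0] * n for _ in range(n)]
    for i, (ti, ri) in enumerate(points):
        for j, (tj, rj) in enumerate(points):
            K[i][j] = rbf(ti, tj, gene_ls, gene_var)
            if ri == rj:
                K[i][j] += rbf(ti, tj, rep_ls, rep_var)
    return K

# Two replicates sampled at different, irregular time points:
pts = [(0.0, "rep1"), (1.3, "rep1"), (4.0, "rep1"), (0.5, "rep2"), (3.1, "rep2")]
K = hierarchical_cov(pts)
# Same-replicate pairs receive the extra replicate-level covariance term.
print(K[0][1] > rbf(0.0, 1.3, 2.0, 1.0))
```

    Imputation and fusion then reduce to standard GP conditioning on this covariance, which is why the model needs only a few parameters beyond a regular GP.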

  14. Clustering of change patterns using Fourier coefficients.

    PubMed

    Kim, Jaehee; Kim, Haseong

    2008-01-15

    To understand the behavior of genes, it is important to explore how the patterns of gene expression change over a time period because biologically related gene groups can share the same change patterns. Many clustering algorithms have been proposed to group observation data. However, because of the complexity of the underlying functions there have not been many studies on grouping data based on change patterns. In this study, the problem of finding similar change patterns is induced to clustering with the derivative Fourier coefficients. The sample Fourier coefficients not only provide information about the underlying functions, but also reduce the dimension. In addition, as their limiting distribution is a multivariate normal, a model-based clustering method incorporating statistical properties would be appropriate. This work is aimed at discovering gene groups with similar change patterns that share similar biological properties. We developed a statistical model using derivative Fourier coefficients to identify similar change patterns of gene expression. We used a model-based method to cluster the Fourier series estimation of derivatives. The model-based method is advantageous over other methods in our proposed model because the sample Fourier coefficients asymptotically follow the multivariate normal distribution. Change patterns are automatically estimated with the Fourier representation in our model. Our model was tested in simulations and on real gene data sets. The simulation results showed that the model-based clustering method with the sample Fourier coefficients has a lower clustering error rate than K-means clustering. Even when the number of repeated time points was small, the same results were obtained. We also applied our model to cluster change patterns of yeast cell cycle microarray expression data with alpha-factor synchronization. 
It showed that, because the method clusters using probability-neighboring data, the model-based clustering with our proposed model yielded biologically interpretable results. We expect that our proposed Fourier analysis with suitably chosen smoothing parameters could serve as a useful tool for classifying genes and interpreting possible biological change patterns. The R program is available upon request.
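    Computing sample Fourier coefficients as a dimension reduction before clustering can be sketched as follows (evenly spaced points for simplicity; the paper's derivative coefficients and model-based clustering are not reproduced). Two series sharing a change pattern land near each other in coefficient space:

```python
import math

def fourier_coeffs(y, n_coeffs=3):
    """Sample Fourier coefficients of an evenly spaced series: the cosine
    and sine coefficients for each of the first n_coeffs frequencies
    (reducing len(y) points to 2 * n_coeffs numbers)."""
    n = len(y)
    out = []
    for k in range(1, n_coeffs + 1):
        a = 2 / n * sum(v * math.cos(2 * math.pi * k * t / n) for t, v in enumerate(y))
        b = 2 / n * sum(v * math.sin(2 * math.pi * k * t / n) for t, v in enumerate(y))
        out += [a, b]
    return out

# Two genes with the same change pattern (one cycle, one with added
# high-frequency noise) and one gene with a different pattern (two cycles):
t = range(12)
g1 = [math.sin(2 * math.pi * i / 12) for i in t]
g2 = [math.sin(2 * math.pi * i / 12) + 0.05 * ((-1) ** i) for i in t]
g3 = [math.sin(4 * math.pi * i / 12) for i in t]

c1, c2, c3 = (fourier_coeffs(g) for g in (g1, g2, g3))
print(math.dist(c1, c2) < math.dist(c1, c3))  # shared pattern -> nearby coefficients
```

    Any clustering algorithm can then run on the coefficient vectors; the paper exploits their asymptotic multivariate normality to justify a model-based method.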

  15. Recognition of genetically modified product based on affinity propagation clustering and terahertz spectroscopy

    NASA Astrophysics Data System (ADS)

    Liu, Jianjun; Kan, Jianquan

    2018-04-01

    In this paper, based on the terahertz spectrum, a new method for identifying genetically modified material with a support vector machine (SVM) built on affinity propagation clustering is proposed. The algorithm uses affinity propagation clustering to cluster and label the unlabeled training samples, and the existing SVM training data are continuously updated during the iterative process. Because the identification model is established without manually labeling the training samples, the error caused by human-labeled samples is reduced and the identification accuracy of the model is greatly improved.

  16. Dimensional assessment of personality pathology in patients with eating disorders.

    PubMed

    Goldner, E M; Srikameswaran, S; Schroeder, M L; Livesley, W J; Birmingham, C L

    1999-02-22

    This study examined patients with eating disorders on personality pathology using a dimensional method. Female subjects who met DSM-IV diagnostic criteria for eating disorder (n = 136) were evaluated and compared to an age-controlled general population sample (n = 68). We assessed 18 features of personality disorder with the Dimensional Assessment of Personality Pathology - Basic Questionnaire (DAPP-BQ). Factor analysis and cluster analysis were used to derive three clusters of patients. A five-factor solution was obtained with limited intercorrelation between factors. Cluster analysis produced three clusters with the following characteristics: Cluster 1 members (constituting 49.3% of the sample and labelled 'rigid') had higher mean scores on factors denoting compulsivity and interpersonal difficulties; Cluster 2 (18.4% of the sample) showed highest scores in factors denoting psychopathy, neuroticism and impulsive features, and appeared to constitute a borderline psychopathology group; Cluster 3 (32.4% of the sample) was characterized by few differences in personality pathology in comparison to the normal population sample. Cluster membership was associated with DSM-IV diagnosis: a large proportion of patients with anorexia nervosa were members of Cluster 1. An empirical classification of eating-disordered patients derived from dimensional assessment of personality pathology identified three groups with clinical relevance.

  17. Spatially explicit population estimates for black bears based on cluster sampling

    USGS Publications Warehouse

    Humm, J.; McCown, J. Walter; Scheick, B.K.; Clark, Joseph D.

    2017-01-01

    We estimated abundance and density of the 5 major black bear (Ursus americanus) subpopulations (i.e., Eglin, Apalachicola, Osceola, Ocala-St. Johns, Big Cypress) in Florida, USA with spatially explicit capture-mark-recapture (SCR) by extracting DNA from hair samples collected at barbed-wire hair sampling sites. We employed a clustered sampling configuration with sampling sites arranged in 3 × 3 clusters spaced 2 km apart within each cluster and cluster centers spaced 16 km apart (center to center). We surveyed all 5 subpopulations encompassing 38,960 km2 during 2014 and 2015. Several landscape variables, most associated with forest cover, helped refine density estimates for the 5 subpopulations we sampled. Detection probabilities were affected by site-specific behavioral responses coupled with individual capture heterogeneity associated with sex. Model-averaged bear population estimates ranged from 120 (95% CI = 59–276) bears or a mean 0.025 bears/km2 (95% CI = 0.011–0.44) for the Eglin subpopulation to 1,198 bears (95% CI = 949–1,537) or 0.127 bears/km2 (95% CI = 0.101–0.163) for the Ocala-St. Johns subpopulation. The total population estimate for our 5 study areas was 3,916 bears (95% CI = 2,914–5,451). The clustered sampling method coupled with information on land cover was efficient and allowed us to estimate abundance across extensive areas that would not have been possible otherwise. Clustered sampling combined with spatially explicit capture-recapture methods has the potential to provide rigorous population estimates for a wide array of species that are extensive and heterogeneous in their distribution.
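    The clustered site layout described above (3 × 3 grids of sites 2 km apart, cluster centers 16 km apart) can be generated with simple grid arithmetic; the flat coordinate system in km is a simplification of real site placement:

```python
def clustered_sites(n_rows, n_cols, center_spacing=16.0, site_spacing=2.0):
    """Hair-sampling sites in clusters: cluster centers on a regular grid,
    each center carrying a 3 x 3 grid of sites (coordinates in km)."""
    sites = []
    for cr in range(n_rows):
        for cc in range(n_cols):
            cx, cy = cr * center_spacing, cc * center_spacing
            # 3 x 3 grid of sites centered on (cx, cy):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    sites.append((cx + dr * site_spacing, cy + dc * site_spacing))
    return sites

sites = clustered_sites(2, 2)
print(len(sites))  # 4 clusters x 9 sites = 36
```

    The point of the design is that tight within-cluster spacing yields spatial recaptures for the SCR detection function, while wide between-cluster spacing extends coverage across the study area.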

  18. Galaxy Cluster Mass Reconstruction Project – III. The impact of dynamical substructure on cluster mass estimates

    DOE PAGES

    Old, L.; Wojtak, R.; Pearce, F. R.; ...

    2017-12-20

    With the advent of wide-field cosmological surveys, we are approaching samples of hundreds of thousands of galaxy clusters. While such large numbers will help reduce statistical uncertainties, the control of systematics in cluster masses is crucial. Here we examine the effects of an important source of systematic uncertainty in galaxy-based cluster mass estimation techniques: the presence of significant dynamical substructure. Dynamical substructure manifests as dynamically distinct subgroups in phase-space, indicating an 'unrelaxed' state. This issue affects around a quarter of clusters in a generally selected sample. We employ a set of mock clusters whose masses have been measured homogeneously with commonly used galaxy-based mass estimation techniques (kinematic, richness, caustic, radial methods). We use these to study how the relation between observationally estimated and true cluster mass depends on the presence of substructure, as identified by various popular diagnostics. We find that the scatter for an ensemble of clusters does not increase dramatically for clusters with dynamical substructure. However, we find a systematic bias for all methods, such that clusters with significant substructure have higher measured masses than their relaxed counterparts. This bias depends on cluster mass: the most massive clusters are largely unaffected by the presence of significant substructure, but masses are significantly overestimated for lower mass clusters, by ~10 percent at 10^14 and ≳20 percent for masses ≲10^13.5. The use of cluster samples with different levels of substructure can therefore bias certain cosmological parameters up to a level comparable to the typical uncertainties in current cosmological studies.

  20. Molecular-based rapid inventories of sympatric diversity: a comparison of DNA barcode clustering methods applied to geography-based vs clade-based sampling of amphibians.

    PubMed

    Paz, Andrea; Crawford, Andrew J

    2012-11-01

    Molecular markers offer a universal source of data for quantifying biodiversity. DNA barcoding uses a standardized genetic marker and a curated reference database to identify known species and to reveal cryptic diversity within well-sampled clades. Rapid biological inventories, e.g. rapid assessment programs (RAPs), unlike most barcoding campaigns, are focused on particular geographic localities rather than on clades. Because of the potentially sparse phylogenetic sampling, the addition of DNA barcoding to RAPs may present a greater challenge for the identification of named species or for revealing cryptic diversity. In this article we evaluate the use of DNA barcoding for quantifying lineage diversity within a single sampling site as compared to clade-based sampling, and present examples from amphibians. We compared algorithms for identifying DNA barcode clusters (e.g. species, cryptic species or Evolutionarily Significant Units) using previously published DNA barcode data obtained from geography-based sampling at a site in Central Panama, and from clade-based sampling in Madagascar. We found that clustering algorithms based on genetic distance performed similarly on sympatric as well as clade-based barcode data, while a promising coalescent-based method performed poorly on sympatric data. The various clustering algorithms were also compared in terms of speed and software implementation. Although each method has its shortcomings in certain contexts, we recommend the use of the ABGD method, which not only performs fairly well under either sampling method, but does so in a few seconds and with a user-friendly Web interface.
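
    A minimal sketch of distance-based barcode clustering in the spirit of these algorithms: single-linkage grouping of sequences at a fixed p-distance cutoff. Note the real ABGD method infers the cutoff from the barcode gap rather than taking it as a parameter; the threshold and toy sequences below are illustrative assumptions.

```python
from itertools import combinations

def p_distance(a, b):
    """Proportion of mismatched sites between two aligned sequences."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b)) / len(a)

def threshold_clusters(seqs, threshold=0.05):
    """Single-linkage grouping: sequences closer than `threshold` join one cluster."""
    parent = list(range(len(seqs)))  # union-find over sequence indices
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i, j in combinations(range(len(seqs)), 2):
        if p_distance(seqs[i], seqs[j]) < threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

    With a 20% cutoff, two near-identical pairs of toy sequences fall into two clusters.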

  1. Properties of star clusters - I. Automatic distance and extinction estimates

    NASA Astrophysics Data System (ADS)

    Buckner, Anne S. M.; Froebrich, Dirk

    2013-12-01

    Determining star cluster distances is essential to analyse their properties and distribution in the Galaxy. In particular, it is desirable to have a reliable, purely photometric distance estimation method for large samples of newly discovered cluster candidates e.g. from the Two Micron All Sky Survey, the UK Infrared Deep Sky Survey Galactic Plane Survey and VVV. Here, we establish an automatic method to estimate distances and reddening from near-infrared photometry alone, without the use of isochrone fitting. We employ a decontamination procedure of JHK photometry to determine the density of stars foreground to clusters and a galactic model to estimate distances. We then calibrate the method using clusters with known properties. This allows us to establish distance estimates with better than 40 per cent accuracy. We apply our method to determine the extinction and distance values to 378 known open clusters and 397 cluster candidates from the list of Froebrich, Scholz & Raftery. We find that the sample is biased towards clusters at a distance of approximately 3 kpc, with typical distances between 2 and 6 kpc. Using the cluster distances and extinction values, we investigate how the average extinction per kiloparsec of distance changes as a function of the Galactic longitude. We find a systematic dependence that can be approximated by A_H(l) [mag kpc^-1] = 0.10 + 0.001 × |l - 180°|/° for regions more than 60° from the Galactic Centre.
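
    The longitude dependence quoted above is simple to wrap as a helper (the function name is ours; units are mag per kpc, and the validity cut at 60° from the Galactic Centre follows the abstract):

```python
def extinction_per_kpc(l_deg):
    """Mean H-band extinction per kpc (mag/kpc) as a function of Galactic
    longitude l in degrees, per the fit quoted in the abstract. The fit is
    stated only for regions more than 60 deg from the Galactic Centre,
    i.e. |l - 180| <= 120."""
    if abs(l_deg - 180.0) > 120.0:
        raise ValueError("fit valid only more than 60 deg from the Galactic Centre")
    return 0.10 + 0.001 * abs(l_deg - 180.0)
```

    For example, toward the anticentre (l = 180°) the fit gives 0.10 mag/kpc, rising to 0.19 mag/kpc at l = 90°.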

  2. Necessary Sequencing Depth and Clustering Method to Obtain Relatively Stable Diversity Patterns in Studying Fish Gut Microbiota.

    PubMed

    Xiao, Fanshu; Yu, Yuhe; Li, Jinjin; Juneau, Philippe; Yan, Qingyun

    2018-05-25

    The 16S rRNA gene has been one of the most commonly used molecular markers for estimating bacterial diversity over the past decades. However, there is no consistency in sequencing depth (from thousands to millions of sequences per sample), and the clustering methods used to generate OTUs also differ among studies. These inconsistent premises make effective comparisons among studies difficult or unreliable. This study aims to determine the sequencing depth and clustering method needed to ensure stable diversity patterns when studying fish gut microbiota. A dataset of 42 samples of Siniperca chuatsi (a carnivorous fish) gut microbiota was used to test how sequencing depth and clustering may affect the alpha and beta diversity patterns of fish intestinal microbiota. Interestingly, we found that the sequencing depth (resampling 1000-11,000 sequences per sample) and the clustering method (UPARSE or UCLUST) did not bias the estimates of the diversity patterns during fish development from larva to adult. Although we acknowledge that a suitable sequencing depth may differ case by case, our finding indicates that shallow sequencing, such as 1000 sequences per sample, may be enough to reflect the general diversity patterns of fish gut microbiota. However, we have shown in the present study that strict pre-processing of the original sequences is required to ensure reliable results. This study provides evidence to help make a sound choice of sequencing depth and clustering method for future studies of fish gut microbiota, while reducing the costs of analysis as much as possible.
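
    The resampling exercise described above can be sketched as rarefaction of a read list followed by a diversity estimate. This is a generic illustration, not the study's pipeline; the OTU labels, depth, and trial count are made-up assumptions.

```python
import math
import random
from collections import Counter

def shannon(counts):
    """Shannon diversity H' from a list of per-OTU counts."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)

def rarefied_shannon(reads, depth, trials=100, seed=0):
    """Mean Shannon diversity over repeated subsamples of `depth` reads.

    `reads` is a list of OTU labels, one per sequence read."""
    rng = random.Random(seed)
    vals = []
    for _ in range(trials):
        sub = rng.sample(reads, depth)  # rarefy without replacement
        vals.append(shannon(list(Counter(sub).values())))
    return sum(vals) / trials
```

    Comparing the rarefied value across depths (e.g. 1000 vs 11,000) is one way to check whether a shallow depth already recovers the pattern.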

  3. Declustering of clustered preferential sampling for histogram and semivariogram inference

    USGS Publications Warehouse

    Olea, R.A.

    2007-01-01

    Measurements of attributes obtained more as a consequence of business ventures than of sampling design frequently result in samplings that are preferential both in location and in value, typically in the form of clusters along the pay. Preferential sampling requires preprocessing in order to properly infer characteristics of the parent population, such as the cumulative distribution and the semivariogram. Consideration of the distance to the nearest neighbor allows preparation of resampled sets that produce results comparable to those from previously proposed methods. A clustered sampling of size 140, taken from an exhaustive sampling, is employed to illustrate this approach. © International Association for Mathematical Geology 2007.
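
    One simple way to act on nearest-neighbor distances is a greedy spatial thinning pass: dense clusters are thinned hard while sparse areas survive, reducing preferential bias. This is a sketch of the general idea only, not Olea's exact resampling procedure; names and the spacing parameter are assumptions.

```python
import math
import random

def decluster(points, min_spacing, seed=0):
    """Greedy thinning: visit points in random order and keep a point only if
    it lies at least `min_spacing` from every point already kept."""
    rng = random.Random(seed)
    order = list(points)
    rng.shuffle(order)
    kept = []
    for p in order:
        if all(math.dist(p, q) >= min_spacing for q in kept):
            kept.append(p)
    return kept
```

    Statistics (histogram, semivariogram) computed on the thinned set are then less dominated by the oversampled clusters.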

  4. A two-stage method for microcalcification cluster segmentation in mammography by deformable models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Arikidis, N.; Kazantzi, A.; Skiadopoulos, S.

    Purpose: Segmentation of microcalcification (MC) clusters in x-ray mammography is a difficult task for radiologists. Accurate segmentation is a prerequisite for quantitative image analysis of MC clusters and subsequent feature extraction and classification in computer-aided diagnosis schemes. Methods: In this study, a two-stage semiautomated segmentation method for MC clusters is investigated. The first stage targets accurate and time-efficient segmentation of the majority of the particles of an MC cluster, by means of a level set method. The second stage targets shape refinement of selected individual MCs, by means of an active contour model. Both methods are applied in the framework of a rich scale-space representation, provided by the wavelet transform at integer scales. Segmentation reliability of the proposed method, in terms of inter- and intraobserver agreement, was evaluated in a case sample of 80 MC clusters originating from the Digital Database for Screening Mammography, corresponding to 4 morphology types (punctate: 22, fine linear branching: 16, pleomorphic: 18, and amorphous: 24) of MC clusters, assessing radiologists’ segmentations quantitatively by two distance metrics (Hausdorff distance (HDIST_cluster) and average of minimum distance (AMINDIST_cluster)) and the area overlap measure (AOM_cluster). The effect of the proposed segmentation method on MC cluster characterization accuracy was evaluated in a case sample of 162 pleomorphic MC clusters (72 malignant and 90 benign). Ten MC cluster features, targeted to capture morphologic properties of individual MCs in a cluster (area, major length, perimeter, compactness, and spread), were extracted, and a correlation-based feature selection method yielded a feature subset to feed into a support vector machine classifier. Classification performance of the MC cluster features was estimated by means of the area under the receiver operating characteristic curve (Az ± standard error), utilizing tenfold cross-validation. A previously developed B-spline active rays segmentation method was also considered for comparison purposes. Results: Interobserver and intraobserver segmentation agreements (median and [25%, 75%] quartile range) were substantial with respect to the distance metrics HDIST_cluster (2.3 [1.8, 2.9] and 2.5 [2.1, 3.2] pixels) and AMINDIST_cluster (0.8 [0.6, 1.0] and 1.0 [0.8, 1.2] pixels), and moderate with respect to AOM_cluster (0.64 [0.55, 0.71] and 0.59 [0.52, 0.66]). The proposed segmentation method (0.80 ± 0.04) statistically significantly outperformed (Mann-Whitney U-test, p < 0.05) the B-spline active rays segmentation method (0.69 ± 0.04), suggesting the value of the proposed semiautomated method. Conclusions: Results indicate a reliable semiautomated segmentation method for MC clusters offered by deformable models, which could be utilized in MC cluster quantitative image analysis.
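
    The two distance metrics and the overlap measure can be sketched for binary masks represented as sets of pixel coordinates. Note the paper's exact AOM definition may differ from the Jaccard-style ratio assumed here, and the cluster-level aggregation is omitted.

```python
import math

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two point sets (e.g. contour pixels)."""
    d = lambda p, pts: min(math.dist(p, q) for q in pts)
    return max(max(d(p, b) for p in a), max(d(q, a) for q in b))

def area_overlap(mask_a, mask_b):
    """Area overlap: |A intersect B| / |A union B| for two sets of pixel coords."""
    a, b = set(mask_a), set(mask_b)
    return len(a & b) / len(a | b)
```

    Two segmentations of the same MC would score an overlap near 1 and a Hausdorff distance near 0 pixels.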

  5. A clustering method of Chinese medicine prescriptions based on modified firefly algorithm.

    PubMed

    Yuan, Feng; Liu, Hong; Chen, Shou-Qiang; Xu, Liang

    2016-12-01

    This paper aims to study a clustering method for Chinese medicine (CM) medical cases. The traditional K-means clustering algorithm has shortcomings, such as dependence of the results on the selection of initial values and trapping in local optima, when processing prescriptions from CM medical cases. Therefore, a new clustering method based on the collaboration of the firefly algorithm and the simulated annealing algorithm was proposed. This algorithm dynamically determines the iterations of the firefly algorithm and the sampling of the simulated annealing algorithm according to fitness changes, and increases the diversity of the swarm by expanding the scope of the sudden jump, thereby effectively avoiding premature convergence. The results of confirmatory experiments on CM medical cases suggest that, compared with the traditional K-means clustering algorithm, this method greatly improves individual diversity and the obtained clustering results; the computed results have reference value for cluster analysis of CM prescriptions.
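
    The simulated-annealing ingredient borrowed here boils down to a Metropolis-style acceptance rule, sketched below. This is a generic illustration; the paper's coupling to the firefly update is more involved, and the function name is ours.

```python
import math
import random

def metropolis_accept(delta, temperature, rng=random):
    """Accept a candidate move: always when it improves the objective
    (delta < 0), otherwise with probability exp(-delta / temperature).
    At high temperature, uphill moves are often accepted, which is what
    lets the search jump out of local optima."""
    return delta < 0 or rng.random() < math.exp(-delta / temperature)
```

    Cooling the temperature over iterations gradually turns the search from exploratory to greedy.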

  6. Uniform deposition of size-selected clusters using Lissajous scanning

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Beniya, Atsushi; Watanabe, Yoshihide, E-mail: e0827@mosk.tytlabs.co.jp; Hirata, Hirohito

    2016-05-15

    Size-selected clusters can be deposited on a surface using size-selected cluster ion beams. However, because of the cross-sectional intensity distribution of the ion beam, it is difficult to define the coverage of the deposited clusters. The aggregation probability of the clusters depends on coverage, so the cluster size on the surface depends on position even though size-selected clusters are deposited. It is therefore crucial to deposit clusters uniformly on the surface. In this study, size-selected clusters were deposited uniformly on surfaces by scanning the cluster ion beam in the form of a Lissajous pattern. Two sets of deflector electrodes set in orthogonal directions were placed in front of the sample surface. Triangular waves with an irrational frequency ratio were applied to the electrodes to ensure that the ion trajectory filled the sample surface. The advantages of this method are the simplicity and low cost of the setup compared with the raster scanning method. The authors further investigated CO adsorption on size-selected Pt_n (n = 7, 15, 20) clusters uniformly deposited on the Al2O3/NiAl(110) surface and demonstrated the importance of uniform deposition.
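
    A sketch of the scanning scheme: two triangular deflection waveforms with an irrational frequency ratio trace a Lissajous pattern that gradually fills the (normalized) sample area. The frequencies, time step, and normalization are illustrative assumptions, not the instrument's parameters.

```python
def triangle(t, freq, amplitude=1.0):
    """Triangular wave in [-amplitude, amplitude] with frequency `freq`."""
    phase = (t * freq) % 1.0
    return amplitude * (4 * phase - 1 if phase < 0.5 else 3 - 4 * phase)

def lissajous_trajectory(n_steps, dt=0.01, fx=1.0, fy=2 ** 0.5):
    """Beam positions for two orthogonal triangular deflections. An irrational
    frequency ratio (here sqrt(2)) keeps the trajectory from closing on
    itself, so it progressively fills the square sample area."""
    return [(triangle(i * dt, fx), triangle(i * dt, fy)) for i in range(n_steps)]
```

    Binning the visited positions into a coarse grid is a quick check that coverage is uniform rather than concentrated at the beam center.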

  7. Methods of developing core collections based on the predicted genotypic value of rice ( Oryza sativa L.).

    PubMed

    Li, C T; Shi, C H; Wu, J G; Xu, H M; Zhang, H Z; Ren, Y L

    2004-04-01

    The selection of an appropriate sampling strategy and a clustering method is important in the construction of core collections based on predicted genotypic values in order to retain the greatest degree of genetic diversity of the initial collection. In this study, methods of developing rice core collections were evaluated based on the predicted genotypic values for 992 rice varieties with 13 quantitative traits. The genotypic values of the traits were predicted by the adjusted unbiased prediction (AUP) method. Based on the predicted genotypic values, Mahalanobis distances were calculated and employed to measure the genetic similarities among the rice varieties. Six hierarchical clustering methods, including the single linkage, median linkage, centroid, unweighted pair-group average, weighted pair-group average and flexible-beta methods, were combined with random, preferred and deviation sampling to develop 18 core collections of rice germplasm. The results show that the deviation sampling strategy in combination with the unweighted pair-group average method of hierarchical clustering retains the greatest degree of genetic diversity of the initial collection. The core collections sampled using predicted genotypic values had more genetic diversity than those based on phenotypic values.

  8. Evaluation of the procedure 1A component of the 1980 US/Canada wheat and barley exploratory experiment

    NASA Technical Reports Server (NTRS)

    Chapman, G. M. (Principal Investigator); Carnes, J. G.

    1981-01-01

    Several techniques that use clusters generated by a new clustering algorithm, CLASSY, are proposed as alternatives to random sampling to obtain greater precision in crop proportion estimation: (1) Proportional Allocation/Relative Count Estimator (PA/RCE), which allocates dots to clusters proportionally on the basis of cluster size and uses a relative-count cluster-level estimate; (2) Proportional Allocation/Bayes Estimator (PA/BE), which allocates dots to clusters proportionally and uses a Bayesian cluster-level estimate; and (3) Bayes Sequential Allocation/Bayes Estimator (BSA/BE), which allocates dots to clusters sequentially and uses a Bayesian cluster-level estimate. Clustering is an effective method for making proportion estimates. It is estimated that, to obtain the same precision with random sampling as obtained by proportional sampling of 50 dots with an unbiased estimator, samples of 85 or 166 would need to be taken if dot sets with AI labels (integrated procedure) or ground truth labels, respectively, were input. Dot reallocation provides dot sets that are unbiased. It is recommended that these proportion estimation techniques be maintained, particularly the PA/BE, because it provides the greatest precision.
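
    The flavor of the proportional-allocation scheme can be sketched as follows. This is a simplified illustration with hypothetical numbers and largest-remainder rounding, not the report's exact PA/RCE estimator.

```python
def allocate_proportional(cluster_sizes, n_dots):
    """Allocate n_dots labelling dots to clusters in proportion to cluster
    size, using largest-remainder rounding so the total is exactly n_dots."""
    total = sum(cluster_sizes)
    quotas = [s * n_dots / total for s in cluster_sizes]
    alloc = [int(q) for q in quotas]
    by_remainder = sorted(range(len(quotas)),
                          key=lambda i: quotas[i] - alloc[i], reverse=True)
    for i in by_remainder[: n_dots - sum(alloc)]:
        alloc[i] += 1
    return alloc

def relative_count_estimate(cluster_sizes, crop_dots, alloc):
    """Weight each cluster's labelled crop fraction by its share of pixels."""
    total = sum(cluster_sizes)
    return sum((s / total) * (c / a)
               for s, c, a in zip(cluster_sizes, crop_dots, alloc) if a)
```

    With three clusters of 600, 300, and 100 pixels and 10 dots, the allocation is 6/3/1; labelling 3 of 6 and 3 of 3 dots as crop gives a 60% crop estimate.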

  9. X-ray morphological study of galaxy cluster catalogues

    NASA Astrophysics Data System (ADS)

    Democles, Jessica; Pierre, Marguerite; Arnaud, Monique

    2016-07-01

    Context: The intra-cluster medium distribution, as probed by X-ray morphology-based analysis, gives a good indication of a system's dynamical state. In the race for precise scaling relations and an understanding of their scatter, the dynamical state offers valuable information. Method: We develop the centroid-shift analysis so that it can be applied to characterize galaxy cluster surveys such as the XXL survey or high-redshift cluster samples. We use it together with the surface brightness concentration parameter and the offset between the X-ray peak and the brightest cluster galaxy in the context of the XXL bright cluster sample (Pacaud et al. 2015) and a set of high-redshift massive clusters detected by Planck and SPT and observed by both the XMM-Newton and Chandra observatories. Results: Using the wide redshift coverage of the XXL sample, we see no trend of the dynamical state of the systems with redshift.

  10. Progeny Clustering: A Method to Identify Biological Phenotypes

    PubMed Central

    Hu, Chenyue W.; Kornblau, Steven M.; Slater, John H.; Qutub, Amina A.

    2015-01-01

    Estimating the optimal number of clusters is a major challenge in applying cluster analysis to any type of dataset, especially to biomedical datasets, which are high-dimensional and complex. Here, we introduce an improved method, Progeny Clustering, which is stability-based and exceptionally efficient in computing, to find the ideal number of clusters. The algorithm employs a novel Progeny Sampling method to reconstruct cluster identity, a co-occurrence probability matrix to assess the clustering stability, and a set of reference datasets to overcome inherent biases in the algorithm and data space. Our method was shown successful and robust when applied to two synthetic datasets (datasets of two-dimensions and ten-dimensions containing eight dimensions of pure noise), two standard biological datasets (the Iris dataset and Rat CNS dataset) and two biological datasets (a cell phenotype dataset and an acute myeloid leukemia (AML) reverse phase protein array (RPPA) dataset). Progeny Clustering outperformed some popular clustering evaluation methods in the ten-dimensional synthetic dataset as well as in the cell phenotype dataset, and it was the only method that successfully discovered clinically meaningful patient groupings in the AML RPPA dataset. PMID:26267476

  11. Application of unsupervised pattern recognition approaches for exploration of rare earth elements in Se-Chahun iron ore, central Iran

    NASA Astrophysics Data System (ADS)

    Sarparandeh, Mohammadali; Hezarkhani, Ardeshir

    2017-12-01

    The use of efficient methods for data processing has always been of interest to researchers in the field of earth sciences. Pattern recognition techniques are appropriate methods for high-dimensional data such as geochemical data. Evaluation of the geochemical distribution of rare earth elements (REEs) requires the use of such methods. In particular, the multivariate nature of REE data makes them a good target for numerical analysis. The main subject of this paper is the application of unsupervised pattern recognition approaches to evaluating the geochemical distribution of REEs in the Kiruna-type magnetite-apatite deposit of Se-Chahun. For this purpose, 42 bulk lithology samples were collected from the Se-Chahun iron ore deposit, and 14 rare earth elements were measured with inductively coupled plasma mass spectrometry (ICP-MS). Pattern recognition makes it possible to evaluate the relations between the samples based on all 14 features simultaneously. In addition to providing easy solutions, these methods have the advantage of discovering hidden information and relations among data samples. Therefore, four clustering methods (unsupervised pattern recognition) - a modified basic sequential algorithmic scheme (MBSAS), hierarchical (agglomerative) clustering, k-means clustering and self-organizing maps (SOM) - were applied, and the results were evaluated using the silhouette criterion. Samples were clustered into four types. Finally, the results of this study were validated against geological facts and analysis results from, for example, scanning electron microscopy (SEM), X-ray diffraction (XRD), ICP-MS and optical mineralogy. The results of the k-means clustering and SOM methods match reality best, as judged against experimental studies of the samples and field surveys. Since only the rare earth elements are used in this partitioning, the good agreement of the results with lithology is notable. It is concluded that combining the proposed methods with geological studies reveals hidden information, and that this combined approach gives better results than either alone.

  12. Restricted random search method based on taboo search in the multiple minima problem

    NASA Astrophysics Data System (ADS)

    Hong, Seung Do; Jhon, Mu Shik

    1997-03-01

    The restricted random search method is proposed as a simple Monte Carlo sampling method for quickly finding minima in the multiple-minima problem. The method is based on taboo search, which has recently been applied to continuous test functions. The concept of a taboo region is used instead of a taboo list, so that sampling of the region near an old configuration is restricted. The method is applied to 2-dimensional test functions and to argon clusters, and is found to be a practical and efficient way to find near-global configurations of both.
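
    A minimal sketch of the idea: plain Monte Carlo sampling, but with a taboo ball around every evaluated configuration so new candidates must land in unexplored territory. The radius, iteration count, and test function below are illustrative assumptions; the paper's scheme for molecular clusters is more elaborate.

```python
import math
import random

def restricted_random_search(f, bounds, n_iter=2000, taboo_radius=0.3, seed=0):
    """Random search that discards candidates falling inside a taboo region
    (a ball of `taboo_radius`) around any previously evaluated point."""
    rng = random.Random(seed)
    visited, best, best_val = [], None, float("inf")
    for _ in range(n_iter):
        x = tuple(rng.uniform(lo, hi) for lo, hi in bounds)
        if any(math.dist(x, v) < taboo_radius for v in visited):
            continue  # inside a taboo region: skip without evaluating
        visited.append(x)
        val = f(x)
        if val < best_val:
            best, best_val = x, val
    return best, best_val
```

    On a smooth bowl-shaped test function the search quickly lands near the global minimum while never re-sampling the same neighbourhood.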

  13. VizieR Online Data Catalog: Star clusters distances and extinctions (Buckner+, 2013)

    NASA Astrophysics Data System (ADS)

    Buckner, A. S. M.; Froebrich, D.

    2014-10-01

    Determining star cluster distances is essential to analyse their properties and distribution in the Galaxy. In particular, it is desirable to have a reliable, purely photometric distance estimation method for large samples of newly discovered cluster candidates e.g. from the Two Micron All Sky Survey, the UK Infrared Deep Sky Survey Galactic Plane Survey and VVV. Here, we establish an automatic method to estimate distances and reddening from near-infrared photometry alone, without the use of isochrone fitting. We employ a decontamination procedure of JHK photometry to determine the density of stars foreground to clusters and a galactic model to estimate distances. We then calibrate the method using clusters with known properties. This allows us to establish distance estimates with better than 40 per cent accuracy. We apply our method to determine the extinction and distance values to 378 known open clusters and 397 cluster candidates from the list of Froebrich, Scholz & Raftery (2007MNRAS.374..399F, Cat. J/MNRAS/374/399). We find that the sample is biased towards clusters at a distance of approximately 3 kpc, with typical distances between 2 and 6 kpc. Using the cluster distances and extinction values, we investigate how the average extinction per kiloparsec of distance changes as a function of the Galactic longitude. We find a systematic dependence that can be approximated by A_H(l) [mag/kpc] = 0.10 + 0.001 × |l - 180°|/° for regions more than 60° from the Galactic Centre. (1 data file).

  14. Cluster analysis of molecular simulation trajectories for systems where both conformation and orientation of the sampled states are important.

    PubMed

    Abramyan, Tigran M; Snyder, James A; Thyparambil, Aby A; Stuart, Steven J; Latour, Robert A

    2016-08-05

    Clustering methods have been widely used to group together similar conformational states from molecular simulations of biomolecules in solution. For applications such as the interaction of a protein with a surface, the orientation of the protein relative to the surface is also an important clustering parameter because of its potential effect on adsorbed-state bioactivity. This study presents cluster analysis methods that are specifically designed for systems where both molecular orientation and conformation are important, and the methods are demonstrated using test cases of adsorbed proteins for validation. Additionally, because cluster analysis can be a very subjective process, an objective procedure for identifying both the optimal number of clusters and the best clustering algorithm to be applied to analyze a given dataset is presented. The method is demonstrated for several agglomerative hierarchical clustering algorithms used in conjunction with three cluster validation techniques. © 2016 Wiley Periodicals, Inc.

  15. Manual hierarchical clustering of regional geochemical data using a Bayesian finite mixture model

    USGS Publications Warehouse

    Ellefsen, Karl J.; Smith, David

    2016-01-01

    Interpretation of regional scale, multivariate geochemical data is aided by a statistical technique called “clustering.” We investigate a particular clustering procedure by applying it to geochemical data collected in the State of Colorado, United States of America. The clustering procedure partitions the field samples for the entire survey area into two clusters. The field samples in each cluster are partitioned again to create two subclusters, and so on. This manual procedure generates a hierarchy of clusters, and the different levels of the hierarchy show geochemical and geological processes occurring at different spatial scales. Although there are many different clustering methods, we use Bayesian finite mixture modeling with two probability distributions, which yields two clusters. The model parameters are estimated with Hamiltonian Monte Carlo sampling of the posterior probability density function, which usually has multiple modes. Each mode has its own set of model parameters; each set is checked to ensure that it is consistent both with the data and with independent geologic knowledge. The set of model parameters that is most consistent with the independent geologic knowledge is selected for detailed interpretation and partitioning of the field samples.

  16. The comparison of automated clustering algorithms for resampling representative conformer ensembles with RMSD matrix.

    PubMed

    Kim, Hyoungrae; Jang, Cheongyun; Yadav, Dharmendra K; Kim, Mi-Hyun

    2017-03-23

    The accuracy of any 3D-QSAR, pharmacophore, or 3D-similarity-based chemometric target-fishing model depends strongly on a reasonable sample of active conformations. A number of diverse conformational sampling algorithms exist that exhaustively generate plenty of conformers; model-building methods, however, rely on an explicit number of common conformers. In this work, we attempted to devise clustering algorithms that automatically find a reasonable number of representative conformer ensembles from an asymmetric dissimilarity matrix generated with the OpenEye toolkit. RMSD was the key descriptor: each column of the N × N matrix was treated as one of N variables describing the relationship (network) between one conformer (in a row) and the other N conformers. This approach was used to evaluate the performance of well-known clustering algorithms by comparing them in terms of generating representative conformer ensembles, and to test them over different matrix transformation functions with respect to stability. In the network, the representative conformer group could be resampled by four kinds of algorithms with implicit parameters. The directed dissimilarity matrix is the only input to the clustering algorithms. The Dunn index, the Davies-Bouldin index, eta-squared values and omega-squared values were used to evaluate the clustering algorithms with respect to compactness and explanatory power. The evaluation also covers the reduction (abstraction) rate of the data, the correlation between the sizes of the population and the samples, the computational complexity, and the memory usage. Every algorithm could find representative conformers automatically without any user intervention, and they reduced the data to 14-19% of the original values within at most 1.13 s per sample. The clustering methods are simple and practical: they are fast and do not require any explicit parameters. RCDTC presented the highest Dunn and omega-squared values of the four algorithms, in addition to a consistent reduction rate between the population size and the sample size. The performance of the clustering algorithms was consistent over different transformation functions. Moreover, the clustering method can also be applied to molecular dynamics sampling simulation results.
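
    The Dunn index used above is compact to state: the smallest between-cluster distance divided by the largest within-cluster diameter, with higher values indicating tighter, better-separated clusters. The sketch below is a generic point-set version (the paper computes it on the conformer dissimilarity matrix).

```python
import math

def dunn_index(clusters):
    """Dunn index for a list of clusters, each a list of points."""
    def diameter(c):
        return max((math.dist(p, q) for p in c for q in c), default=0.0)
    inter = min(
        math.dist(p, q)
        for i, a in enumerate(clusters)
        for b in clusters[i + 1:]
        for p in a for q in b
    )
    intra = max(diameter(c) for c in clusters)
    return inter / intra
```

    Two tight clusters separated by five times their diameter score a Dunn index of 5.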

  17. A method of using cluster analysis to study statistical dependence in multivariate data

    NASA Technical Reports Server (NTRS)

    Borucki, W. J.; Card, D. H.; Lyle, G. C.

    1975-01-01

    A technique is presented that uses both cluster analysis and a Monte Carlo significance test of clusters to discover associations between variables in multidimensional data. The method is applied to an example of a noisy function in three-dimensional space, to a sample from a mixture of three bivariate normal distributions, and to the well-known Fisher's Iris data.
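
    The Monte Carlo significance test can be sketched in one dimension: compare the observed within-cluster sum of squares to that of reference datasets drawn from a null (uniform) distribution. This is a simplified stand-in for the authors' procedure; the k-means scoring, null model, and simulation count are assumptions.

```python
import random

def kmeans_wss(xs, k, rng, iters=25):
    """Within-cluster sum of squares after simple 1-D Lloyd iterations."""
    centers = rng.sample(xs, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            groups[min(range(k), key=lambda i: abs(x - centers[i]))].append(x)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return sum(min((x - c) ** 2 for c in centers) for x in xs)

def cluster_significance(xs, k=2, n_sim=200, seed=0):
    """Monte Carlo p-value: fraction of uniform reference datasets (same size
    and range) whose k-means WSS is as small as the observed one. A small
    p-value suggests the observed clustering is not a sampling artefact."""
    rng = random.Random(seed)
    observed = kmeans_wss(xs, k, rng)
    lo, hi = min(xs), max(xs)
    hits = 0
    for _ in range(n_sim):
        ref = [rng.uniform(lo, hi) for _ in xs]
        if kmeans_wss(ref, k, rng) <= observed:
            hits += 1
    return hits / n_sim
```

    A strongly bimodal sample yields a p-value near zero, while genuinely uniform data would not.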

  18. Tobacco, Marijuana, and Alcohol Use in University Students: A Cluster Analysis

    PubMed Central

    Primack, Brian A.; Kim, Kevin H.; Shensa, Ariel; Sidani, Jaime E.; Barnett, Tracey E.; Switzer, Galen E.

    2012-01-01

    Objective Segmentation of populations may facilitate development of targeted substance abuse prevention programs. We aimed to partition a national sample of university students according to profiles based on substance use. Participants We used 2008–2009 data from the National College Health Assessment from the American College Health Association. Our sample consisted of 111,245 individuals from 158 institutions. Method We partitioned the sample using cluster analysis according to current substance use behaviors. We examined the association of cluster membership with individual and institutional characteristics. Results Cluster analysis yielded six distinct clusters. Three individual factors—gender, year in school, and fraternity/sorority membership—were the most strongly associated with cluster membership. Conclusions In a large sample of university students, we were able to identify six distinct patterns of substance abuse. It may be valuable to target specific populations of college-aged substance users based on individual factors. However, comprehensive intervention will require a multifaceted approach. PMID:22686360

  19. Clustering of samples and variables with mixed-type data

    PubMed Central

    Edelmann, Dominic; Kopp-Schneider, Annette

    2017-01-01

    Analysis of data measured on different scales is a relevant challenge. Biomedical studies often focus on high-throughput datasets of, e.g., quantitative measurements. However, the need for integration of other features possibly measured on different scales, e.g. clinical or cytogenetic factors, becomes increasingly important. The analysis results (e.g. a selection of relevant genes) are then visualized, while adding further information, like clinical factors, on top. However, a more integrative approach is desirable, where all available data are analyzed jointly, and where also in the visualization different data sources are combined in a more natural way. Here we specifically target integrative visualization and present a heatmap-style graphic display. To this end, we develop and explore methods for clustering mixed-type data, with special focus on clustering variables. Clustering of variables does not receive as much attention in the literature as does clustering of samples. We extend the variables clustering methodology by two new approaches, one based on the combination of different association measures and the other on distance correlation. With simulation studies we evaluate and compare different clustering strategies. Applying specific methods for mixed-type data proves to be comparable and in many cases beneficial as compared to standard approaches applied to corresponding quantitative or binarized data. Our two novel approaches for mixed-type variables show similar or better performance than the existing methods ClustOfVar and bias-corrected mutual information. Further, in contrast to ClustOfVar, our methods provide dissimilarity matrices, which is an advantage, especially for the purpose of visualization. Real data examples aim to give an impression of various kinds of potential applications for the integrative heatmap and other graphical displays based on dissimilarity matrices. 
We demonstrate that the presented integrative heatmap provides more information than common data displays about the relationship among variables and samples. The described clustering and visualization methods are implemented in our R package CluMix available from https://cran.r-project.org/web/packages/CluMix. PMID:29182671
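
Clustering mixed-type data of the kind described above hinges on a dissimilarity that treats quantitative and categorical features on a common footing. As a generic illustration only (not the CluMix internals, and with invented toy records), here is a minimal sketch of the classic Gower dissimilarity:

```python
def gower(a, b, numeric_ranges):
    """Gower dissimilarity between two mixed-type records.

    numeric_ranges[i] is (min, max) for a quantitative feature i,
    or None for a categorical feature."""
    total = 0.0
    for x, y, rng in zip(a, b, numeric_ranges):
        if rng is None:                     # categorical: simple mismatch
            total += 0.0 if x == y else 1.0
        else:                               # numeric: range-scaled |difference|
            lo, hi = rng
            total += abs(x - y) / (hi - lo)
    return total / len(a)

# invented clinical-style records: (age, tumour grade, mutation status)
records = [(35, "II", "wt"), (62, "III", "mut"), (40, "II", "wt")]
ages = [r[0] for r in records]
ranges = [(min(ages), max(ages)), None, None]
print(round(gower(records[0], records[2], ranges), 3))  # → 0.062
print(gower(records[0], records[1], ranges))            # → 1.0
```

Because such a measure yields a full dissimilarity matrix, it can feed directly into hierarchical clustering or heatmap ordering, which is the advantage over ClustOfVar that the authors highlight.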

  20. Optimization of fecal cytology in the dog: comparison of three sampling methods.

    PubMed

    Frezoulis, Petros S; Angelidou, Elisavet; Diakou, Anastasia; Rallis, Timoleon S; Mylonakis, Mathios E

    2017-09-01

    Dry-mount fecal cytology (FC) is a component of the diagnostic evaluation of gastrointestinal diseases. There is limited information on the possible effect of the sampling method on the cytologic findings of healthy dogs or dogs admitted with diarrhea. We aimed to: (1) establish sampling method-specific expected values of selected cytologic parameters (isolated or clustered epithelial cells, neutrophils, lymphocytes, macrophages, spore-forming rods) in clinically healthy dogs; (2) investigate if the detection of cytologic abnormalities differs among methods in dogs admitted with diarrhea; and (3) investigate if there is any association between FC abnormalities and the anatomic origin (small- or large-bowel diarrhea) or the chronicity of diarrhea. Sampling with digital examination (DE), rectal scraping (RS), and rectal lavage (RL) was prospectively assessed in 37 healthy and 34 diarrheic dogs. The median numbers of isolated (p = 0.000) or clustered (p = 0.002) epithelial cells, and of lymphocytes (p = 0.000), differed among the 3 methods in healthy dogs. In the diarrheic dogs, the RL method was the least sensitive in detecting neutrophils and isolated or clustered epithelial cells. Cytologic abnormalities were not associated with the origin or the chronicity of diarrhea. Sampling methods differed in their sensitivity to detect abnormalities in FC; DE or RS may be of higher sensitivity compared to RL. Anatomic origin and chronicity of diarrhea did not seem to affect the detection of cytologic abnormalities.

  1. Detecting cancer clusters in a regional population with local cluster tests and Bayesian smoothing methods: a simulation study

    PubMed Central

    2013-01-01

    Background There is a rising public and political demand for prospective cancer cluster monitoring. But there is little empirical evidence on the performance of established cluster detection tests under conditions of small and heterogeneous sample sizes and varying spatial scales, as is the case for most existing population-based cancer registries. Therefore, this simulation study aims to evaluate different cluster detection methods, implemented in the open source environment R, in their ability to identify clusters of lung cancer using real-life data from an epidemiological cancer registry in Germany. Methods Risk surfaces were constructed with two different spatial cluster types, representing a relative risk of RR = 2.0 or of RR = 4.0, in relation to the overall background incidence of lung cancer, separately for men and women. Lung cancer cases were sampled from this risk surface as geocodes using an inhomogeneous Poisson process. The realisations of the cancer cases were analysed within small spatial (census tracts, N = 1983) and within aggregated large spatial scales (communities, N = 78). Subsequently, they were submitted to the cluster detection methods. The test accuracy for cluster location was determined in terms of detection rates (DR), false-positive (FP) rates and positive predictive values. The Bayesian smoothing models were evaluated using ROC curves. Results With moderate risk increase (RR = 2.0), local cluster tests showed better DR (for both spatial aggregation scales > 0.90) and lower FP rates (both < 0.05) than the Bayesian smoothing methods. When the cluster RR was raised four-fold, the local cluster tests showed better DR with lower FPs only for the small spatial scale. At a large spatial scale, the Bayesian smoothing methods, especially those implementing a spatial neighbourhood, showed a substantially lower FP rate than the cluster tests. However, the risk increases at this scale were mostly diluted by data aggregation.
Conclusion High-resolution spatial scales seem more appropriate as a data basis for cancer cluster testing and monitoring than the commonly used aggregated scales. We suggest the development of a two-stage approach that combines methods with high detection rates as a first-line screening with methods of higher predictive ability at the second stage. PMID:24314148

  2. Synthesis and Characterization of Novel Compound Clusters

    DTIC Science & Technology

    1997-08-26

    also be intrinsically stable, they cannot be formed by this plasma chemistry presumably because the metals are less reactive. Plasma chemistry reactions ... samples without the presence of hydrogen. Vaporization of these composite samples produces the metal carbide clusters in many cases where plasma chemistry does ... antimony or bismuth cannot be produced by the hydrocarbon plasma chemistry method, but they are produced readily from composite sample (metal film on

  3. Cluster Masses Derived from X-ray and Sunyaev-Zeldovich Effect Measurements

    NASA Technical Reports Server (NTRS)

    Laroque, S.; Joy, Marshall; Bonamente, M.; Carlstrom, J.; Dawson, K.

    2003-01-01

    We infer the gas mass and total gravitational mass of 11 clusters using two different methods: analysis of X-ray data from the Chandra X-ray Observatory and analysis of centimeter-wave Sunyaev-Zel'dovich Effect (SZE) data from the BIMA and OVRO interferometers. This flux-limited sample of clusters from the BCS cluster catalogue was chosen so as to be well above the surface brightness limit of the ROSAT All Sky Survey; this is therefore an orientation-unbiased sample. The gas mass fraction, f_g, is calculated for each cluster using both X-ray and SZE data, and the results are compared at a fiducial radius of r_500. Comparison of the X-ray and SZE results for this orientation-unbiased sample allows us to constrain cluster systematics, such as clumping of the intracluster medium. We derive an upper limit on Omega_M by assuming that the mass composition of clusters within r_500 reflects the universal mass composition (Omega_M h_100 ≤ Omega_B / f_g). We also demonstrate how the mean f_g derived from the sample can be used to estimate the masses of clusters discovered by upcoming deep SZE surveys.

  4. Comparison of cluster-based and source-attribution methods for estimating transmission risk using large HIV sequence databases.

    PubMed

    Le Vu, Stéphane; Ratmann, Oliver; Delpech, Valerie; Brown, Alison E; Gill, O Noel; Tostevin, Anna; Fraser, Christophe; Volz, Erik M

    2018-06-01

    Phylogenetic clustering of HIV sequences from a random sample of patients can reveal epidemiological transmission patterns, but interpretation is hampered by limited theoretical support, and the statistical properties of clustering analysis remain poorly understood. Alternatively, source attribution methods allow fitting of HIV transmission models and thereby quantify aspects of disease transmission. A simulation study was conducted to assess error rates of clustering methods for detecting transmission risk factors. We modeled HIV epidemics among men having sex with men and generated phylogenies comparable to those that can be obtained from HIV surveillance data in the UK. Clustering and source attribution approaches were applied to evaluate their ability to identify patient attributes as transmission risk factors. We find that commonly used methods show a misleading association between cluster size or odds of clustering and covariates that are correlated with time since infection, regardless of their influence on transmission. Clustering methods usually have higher error rates and lower sensitivity than the source attribution method for identifying transmission risk factors, but neither method provides robust estimates of transmission risk ratios. The source attribution method can alleviate drawbacks of phylogenetic clustering, but formal population genetic modeling may be required to estimate quantitative transmission risk factors. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  5. Common methods for fecal sample storage in field studies yield consistent signatures of individual identity in microbiome sequencing data.

    PubMed

    Blekhman, Ran; Tang, Karen; Archie, Elizabeth A; Barreiro, Luis B; Johnson, Zachary P; Wilson, Mark E; Kohn, Jordan; Yuan, Michael L; Gesquiere, Laurence; Grieneisen, Laura E; Tung, Jenny

    2016-08-16

    Field studies of wild vertebrates are frequently associated with extensive collections of banked fecal samples-unique resources for understanding ecological, behavioral, and phylogenetic effects on the gut microbiome. However, we do not understand whether sample storage methods confound the ability to investigate interindividual variation in gut microbiome profiles. Here, we extend previous work on storage methods for gut microbiome samples by comparing immediate freezing, the gold standard of preservation, to three methods commonly used in vertebrate field studies: lyophilization, storage in ethanol, and storage in RNAlater. We found that the signature of individual identity consistently outweighed storage effects: alpha diversity and beta diversity measures were significantly correlated across methods, and while samples often clustered by donor, they never clustered by storage method. Provided that all analyzed samples are stored the same way, banked fecal samples therefore appear highly suitable for investigating variation in gut microbiota. Our results open the door to a much-expanded perspective on variation in the gut microbiome across species and ecological contexts.

  6. Density-based clustering of small peptide conformations sampled from a molecular dynamics simulation.

    PubMed

    Kim, Minkyoung; Choi, Seung-Hoon; Kim, Junhyoung; Choi, Kihang; Shin, Jae-Min; Kang, Sang-Kee; Choi, Yun-Jaie; Jung, Dong Hyun

    2009-11-01

    This study describes the application of a density-based algorithm to clustering small peptide conformations after a molecular dynamics simulation. We propose a clustering method for small peptide conformations that enables adjacent clusters to be separated more clearly on the basis of neighbor density. Neighbor density means the number of neighboring conformations, so if a conformation has too few neighboring conformations, then it is considered as noise or an outlier and is excluded from the list of cluster members. With this approach, we can easily identify clusters in which the members are densely crowded in the conformational space, and we can safely avoid misclustering individual clusters linked by noise or outliers. Consideration of neighbor density significantly improves the efficiency of clustering of small peptide conformations sampled from molecular dynamics simulations and can be used for predicting peptide structures.
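
The neighbor-density idea described above is close in spirit to DBSCAN. A minimal pure-Python sketch, using invented 2D "conformations" and arbitrary cutoffs rather than the authors' implementation:

```python
# Points with too few neighbours within a distance cutoff are treated as
# noise; dense regions are grown into clusters. Toy data, not real
# peptide conformations from a molecular dynamics trajectory.
import math

def neighbor_density_cluster(points, eps, min_neighbors):
    """Return a label per point: 0..k-1 for clusters, -1 for noise."""
    n = len(points)
    # neighbour lists within eps (excluding the point itself)
    nbrs = [[j for j in range(n)
             if j != i and math.dist(points[i], points[j]) <= eps]
            for i in range(n)]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(nbrs[i]) < min_neighbors:
            labels[i] = -1           # too few neighbours: noise/outlier
            continue
        labels[i] = cluster          # grow a new cluster from this dense point
        stack = list(nbrs[i])
        while stack:
            j = stack.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point adopted by the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(nbrs[j]) >= min_neighbors:
                stack.extend(nbrs[j])
        cluster += 1
    return labels

# two dense knots plus one isolated point
pts = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1), (10, 10)]
print(neighbor_density_cluster(pts, eps=0.5, min_neighbors=2))
# → [0, 0, 0, 1, 1, 1, -1]
```

Because the isolated point is labelled -1 rather than attached to a cluster, sparse bridges of outliers cannot merge two genuinely distinct clusters, which is the misclustering the abstract says the method avoids.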

  7. The contribution of cluster and discriminant analysis to the classification of complex aquifer systems.

    PubMed

    Panagopoulos, G P; Angelopoulou, D; Tzirtzilakis, E E; Giannoulopoulos, P

    2016-10-01

    This paper presents an innovative method for assigning groundwater samples to common groups representing the hydrogeological units from which they were pumped. The method proved very efficient even in areas with complex hydrogeological regimes. It requires chemical analyses of water samples only for major ions, meaning that it is applicable to most cases worldwide. Another benefit of the method is that it gives further insight into the aquifer's hydrogeochemistry, as it identifies the ions that are responsible for the discrimination of each group. The procedure begins with cluster analysis of the dataset in order to classify the samples into the corresponding hydrogeological units. The feasibility of the method is shown by the fact that the samples of volcanic origin were separated into two different clusters, namely the lava units and the pyroclastic-ignimbritic aquifer. The second step is the discriminant analysis of the data, which provides the functions that distinguish the groups from each other and the most significant variables that define the hydrochemical composition of the aquifer. The whole procedure was highly successful, as 94.7% of the samples were classified to the correct aquifer system. Finally, the resulting functions can safely be used to categorize samples of either unknown or doubtful origin, thus improving the quality and the size of existing hydrochemical databases.
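
The two-step workflow can be sketched in simplified form. The ion values below are invented toy data, and step 2 uses a nearest-centroid rule as a stand-in for the paper's full discriminant analysis:

```python
# Step 1: agglomerative clustering of samples on major-ion vectors.
# Step 2: assign a sample of unknown origin to the nearest cluster centroid.
import math

def agglomerate(samples, n_clusters):
    """Centroid-linkage agglomerative clustering; returns lists of indices."""
    clusters = [[i] for i in range(len(samples))]
    def centroid(c):
        return [sum(samples[i][k] for i in c) / len(c)
                for k in range(len(samples[0]))]
    while len(clusters) > n_clusters:
        # merge the pair of clusters with the closest centroids
        a, b = min(((a, b) for a in range(len(clusters))
                    for b in range(a + 1, len(clusters))),
                   key=lambda ab: math.dist(centroid(clusters[ab[0]]),
                                            centroid(clusters[ab[1]])))
        clusters[a] += clusters.pop(b)
    return clusters

def classify(sample, clusters, samples):
    """Nearest-centroid assignment of an unknown sample to a cluster."""
    cents = [[sum(samples[i][k] for i in c) / len(c)
              for k in range(len(sample))] for c in clusters]
    return min(range(len(cents)), key=lambda j: math.dist(sample, cents[j]))

# invented major-ion vectors (e.g. Ca, Mg, Na) for six water samples
ions = [[40, 10, 5], [42, 11, 6], [38, 9, 4],     # carbonate-like unit
        [10, 3, 60], [12, 4, 58], [11, 3, 62]]    # Na-rich volcanic unit
groups = agglomerate(ions, 2)
print(sorted(sorted(g) for g in groups))  # → [[0, 1, 2], [3, 4, 5]]
print(classify([39, 10, 5], groups, ions))  # → 0 (the carbonate-like group)
```

The real procedure replaces step 2 with linear discriminant functions, which additionally report which ions drive the separation.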

  8. Cluster analysis and its application to healthcare claims data: a study of end-stage renal disease patients who initiated hemodialysis.

    PubMed

    Liao, Minlei; Li, Yunfeng; Kianifard, Farid; Obi, Engels; Arcona, Stephen

    2016-03-02

    Cluster analysis (CA) is a frequently used applied statistical technique that helps to reveal hidden structures and "clusters" found in large data sets. However, this method has not been widely used in large healthcare claims databases where the distribution of expenditure data is commonly severely skewed. The purpose of this study was to identify cost change patterns of patients with end-stage renal disease (ESRD) who initiated hemodialysis (HD) by applying different clustering methods. A retrospective, cross-sectional, observational study was conducted using the Truven Health MarketScan® Research Databases. Patients aged ≥18 years with ≥2 ESRD diagnoses who initiated HD between 2008 and 2010 were included. The K-means CA method and hierarchical CA with various linkage methods were applied to all-cause costs within baseline (12-months pre-HD) and follow-up periods (12-months post-HD) to identify clusters. Demographic, clinical, and cost information was extracted from both periods, and then examined by cluster. A total of 18,380 patients were identified. Meaningful all-cause cost clusters were generated using K-means CA and hierarchical CA with either flexible beta or Ward's methods. Based on cluster sample sizes and change of cost patterns, the K-means CA method and 4 clusters were selected: Cluster 1: Average to High (n = 113); Cluster 2: Very High to High (n = 89); Cluster 3: Average to Average (n = 16,624); or Cluster 4: Increasing Costs, High at Both Points (n = 1554). Median cost changes in the 12-month pre-HD and post-HD periods increased from $185,070 to $884,605 for Cluster 1 (Average to High), decreased from $910,930 to $157,997 for Cluster 2 (Very High to High), were relatively stable and remained low from $15,168 to $13,026 for Cluster 3 (Average to Average), and increased from $57,909 to $193,140 for Cluster 4 (Increasing Costs, High at Both Points). 
Relatively stable costs after starting HD were associated with more stable comorbidity index scores across the pre- and post-HD periods, while increasing costs were associated with more sharply increasing comorbidity scores. The K-means CA method appeared to be the most appropriate for healthcare claims data with highly skewed cost information when taking into account both the change of cost patterns and the sample size of the smallest cluster.
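
The clustering step can be sketched with a minimal Lloyd's-algorithm k-means on (baseline, follow-up) cost pairs. The cost figures below are invented toy values, not MarketScan data, and the log-scaling is one simple way to temper the skew noted in the abstract:

```python
import math

def kmeans(points, init_centers, iters=25):
    """Plain Lloyd's algorithm; returns a cluster label per point."""
    centers = [list(c) for c in init_centers]
    k = len(centers)
    labels = [0] * len(points)
    for _ in range(iters):
        # assignment step: each point goes to its nearest center
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        # update step: centers become the cluster means
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# toy (pre-HD, post-HD) annual costs
costs = [(15000, 13000), (16000, 12000),      # stable and low
         (180000, 880000), (190000, 900000)]  # high and increasing
pts = [(math.log10(a), math.log10(b)) for a, b in costs]
# deterministic init from two extreme points (real runs use random restarts)
labels = kmeans(pts, init_centers=[pts[0], pts[-1]])
print(labels)  # → [0, 0, 1, 1]
```

Each label corresponds to a cost-change pattern in the spirit of the study's "Average to Average" and "Increasing Costs" clusters.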

  9. Automated cloud screening of AVHRR imagery using split-and-merge clustering

    NASA Technical Reports Server (NTRS)

    Gallaudet, Timothy C.; Simpson, James J.

    1991-01-01

    Previous methods to segment clouds from ocean in AVHRR imagery have shown varying degrees of success, with nighttime approaches being the most limited. An improved method of automatic image segmentation, the principal component transformation split-and-merge clustering (PCTSMC) algorithm, is presented and applied to cloud screening of both nighttime and daytime AVHRR data. The method combines spectral differencing, the principal component transformation, and split-and-merge clustering to sample objectively the natural classes in the data. This segmentation method is then augmented by supervised classification techniques to screen clouds from the imagery. Comparisons with other nighttime methods demonstrate its improved capability in this application. The sensitivity of the method to clustering parameters is presented; the results show that the method is insensitive to the split-and-merge thresholds.
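
The principal component transformation step of PCTSMC can be illustrated for the two-band case: rotate the band pair so the resulting components are decorrelated before clustering. A pure-Python sketch with invented pixel values (the split-and-merge stage is omitted):

```python
import math

def pct_2band(band_a, band_b):
    """Rotate two spectral bands into decorrelated principal components."""
    n = len(band_a)
    ma, mb = sum(band_a) / n, sum(band_b) / n
    # entries of the 2x2 covariance matrix
    caa = sum((a - ma) ** 2 for a in band_a) / n
    cbb = sum((b - mb) ** 2 for b in band_b) / n
    cab = sum((a - ma) * (b - mb) for a, b in zip(band_a, band_b)) / n
    # rotation angle that diagonalises a symmetric 2x2 matrix
    theta = 0.5 * math.atan2(2 * cab, caa - cbb)
    c, s = math.cos(theta), math.sin(theta)
    pc1 = [ c * (a - ma) + s * (b - mb) for a, b in zip(band_a, band_b)]
    pc2 = [-s * (a - ma) + c * (b - mb) for a, b in zip(band_a, band_b)]
    return pc1, pc2

# invented correlated "channel" values standing in for AVHRR pixels
band_a = [52, 61, 47, 70, 55]
band_b = [30, 38, 27, 45, 33]
pc1, pc2 = pct_2band(band_a, band_b)
print(abs(sum(u * v for u, v in zip(pc1, pc2))) < 1e-9)  # → True (decorrelated)
```

The first component carries most of the variance, which is why clustering in the transformed space separates the natural classes more cleanly than raw channels.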

  10. VizieR Online Data Catalog: Star clusters distances and extinctions. II. (Buckner+, 2014)

    NASA Astrophysics Data System (ADS)

    Buckner, A. S. M.; Froebrich, D.

    2015-04-01

    Until now, it has been impossible to observationally measure how star cluster scaleheight evolves beyond 1Gyr as only small samples have been available. Here, we establish a novel method to determine the scaleheight of a cluster sample using modelled distributions and Kolmogorov-Smirnov tests. This allows us to determine the scaleheight with a 25% accuracy for samples of 38 clusters or more. We apply our method to investigate the temporal evolution of cluster scaleheight, using homogeneously selected sub-samples of Kharchenko et al. (MWSC, 2012, Cat. J/A+A/543/A156, 2013, J/A+A/558/A53 ), Dias et al. (DAML02, 2002A&A...389..871D, Cat. B/ocl), WEBDA, and Froebrich et al. (FSR, 2007MNRAS.374..399F, Cat. J/MNRAS/374/399). We identify a linear relationship between scaleheight and log(age/yr) of clusters, considerably different from field stars. The scaleheight increases from about 40pc at 1Myr to 75pc at 1Gyr, most likely due to internal evolution and external scattering events. After 1Gyr, there is a marked change of the behaviour, with the scaleheight linearly increasing with log(age/yr) to about 550pc at 3.5Gyr. The most likely interpretation is that the surviving clusters are only observable because they have been scattered away from the mid-plane in their past. A detailed understanding of this observational evidence can only be achieved with numerical simulations of the evolution of cluster samples in the Galactic disc. Furthermore, we find a weak trend of an age-independent increase in scaleheight with Galactocentric distance. There are no significant temporal or spatial variations of the cluster distribution zero-point. We determine the Sun's vertical displacement from the Galactic plane as Z⊙=18.5+/-1.2pc. (1 data file).
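
The modelled-distribution-plus-KS-test idea can be sketched as follows, assuming for illustration a simple exponential vertical profile exp(-|z|/h) and simulated cluster distances; the real analysis compares full modelled spatial distributions, not this toy:

```python
import math, random

def ks_stat(zs, h):
    """KS distance between sampled |z| values and an exponential CDF of scale h."""
    zs = sorted(zs)
    n = len(zs)
    d = 0.0
    for i, z in enumerate(zs):
        cdf = 1 - math.exp(-z / h)
        # compare the model CDF with the empirical CDF just before and after z
        d = max(d, abs(cdf - i / n), abs(cdf - (i + 1) / n))
    return d

def fit_scaleheight(zs, h_grid):
    """Pick the scaleheight whose model CDF is closest to the sample."""
    return min(h_grid, key=lambda h: ks_stat(zs, h))

# simulate 38 clusters (the minimum sample size quoted above) at true h = 75 pc
rng = random.Random(1)
zs = [rng.expovariate(1 / 75.0) for _ in range(38)]
best = fit_scaleheight(zs, range(10, 301, 5))
print(best)  # best-fit scaleheight, near the true 75 pc for this draw
```

With only a few dozen clusters the fitted scaleheight scatters around the true value, which is consistent with the quoted ~25% accuracy for samples of 38 clusters.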

  11. Finite-sample corrected generalized estimating equation of population average treatment effects in stepped wedge cluster randomized trials.

    PubMed

    Scott, JoAnna M; deCamp, Allan; Juraska, Michal; Fay, Michael P; Gilbert, Peter B

    2017-04-01

    Stepped wedge designs are increasingly commonplace and advantageous for cluster randomized trials when it is both unethical to assign placebo, and it is logistically difficult to allocate an intervention simultaneously to many clusters. We study marginal mean models fit with generalized estimating equations for assessing treatment effectiveness in stepped wedge cluster randomized trials. This approach has advantages over the more commonly used mixed models that (1) the population-average parameters have an important interpretation for public health applications and (2) they avoid untestable assumptions on latent variable distributions and avoid parametric assumptions about error distributions, therefore, providing more robust evidence on treatment effects. However, cluster randomized trials typically have a small number of clusters, rendering the standard generalized estimating equation sandwich variance estimator biased and highly variable and hence yielding incorrect inferences. We study the usual asymptotic generalized estimating equation inferences (i.e., using sandwich variance estimators and asymptotic normality) and four small-sample corrections to generalized estimating equation for stepped wedge cluster randomized trials and for parallel cluster randomized trials as a comparison. We show by simulation that the small-sample corrections provide improvement, with one correction appearing to provide at least nominal coverage even with only 10 clusters per group. These results demonstrate the viability of the marginal mean approach for both stepped wedge and parallel cluster randomized trials. We also study the comparative performance of the corrected methods for stepped wedge and parallel designs, and describe how the methods can accommodate interval censoring of individual failure times and incorporate semiparametric efficient estimators.

  12. Evaluation of hierarchical agglomerative cluster analysis methods for discrimination of primary biological aerosol

    NASA Astrophysics Data System (ADS)

    Crawford, I.; Ruske, S.; Topping, D. O.; Gallagher, M. W.

    2015-07-01

    In this paper we present improved methods for discriminating and quantifying Primary Biological Aerosol Particles (PBAP) by applying hierarchical agglomerative cluster analysis to multi-parameter ultra violet-light induced fluorescence (UV-LIF) spectrometer data. The methods employed in this study can be applied to data sets in excess of 1×106 points on a desktop computer, allowing for each fluorescent particle in a dataset to be explicitly clustered. This reduces the potential for misattribution found in subsampling and comparative attribution methods used in previous approaches, improving our capacity to discriminate and quantify PBAP meta-classes. We evaluate the performance of several hierarchical agglomerative cluster analysis linkages and data normalisation methods using laboratory samples of known particle types and an ambient dataset. Fluorescent and non-fluorescent polystyrene latex spheres were sampled with a Wideband Integrated Bioaerosol Spectrometer (WIBS-4) where the optical size, asymmetry factor and fluorescent measurements were used as inputs to the analysis package. It was found that the Ward linkage with z-score or range normalisation performed best, correctly attributing 98 and 98.1 % of the data points respectively. The best performing methods were applied to the BEACHON-RoMBAS ambient dataset where it was found that the z-score and range normalisation methods yield similar results with each method producing clusters representative of fungal spores and bacterial aerosol, consistent with previous results. The z-score result was compared to clusters generated with previous approaches (WIBS AnalysiS Program, WASP) where we observe that the subsampling and comparative attribution method employed by WASP results in the overestimation of the fungal spore concentration by a factor of 1.5 and the underestimation of bacterial aerosol concentration by a factor of 5. 
We suggest that this is likely due to errors arising from misattribution caused by poor centroid definition, and from the failure to assign particles to a cluster as a result of the subsampling and comparative attribution method employed by WASP. The methods used here allow the entire fluorescent particle population to be analysed, yielding an explicit cluster attribution for each particle, improving cluster centroid definition and our capacity to discriminate and quantify PBAP meta-classes compared to previous approaches.
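
The best-performing recipe reported above (z-score normalisation followed by Ward-linkage hierarchical agglomerative clustering) can be sketched compactly in pure Python on invented three-channel "particle" data; a production analysis would use an optimised library implementation:

```python
import math

def zscore(columns):
    """Normalise each measurement channel to zero mean, unit variance."""
    out = []
    for col in columns:
        m = sum(col) / len(col)
        s = math.sqrt(sum((x - m) ** 2 for x in col) / len(col))
        out.append([(x - m) / s for x in col])
    return out

def ward_cluster(points, n_clusters):
    """Agglomerative clustering under the Ward criterion."""
    clusters = [[i] for i in range(len(points))]
    def centroid(c):
        return [sum(points[i][k] for i in c) / len(c)
                for k in range(len(points[0]))]
    def ward_cost(a, b):
        # increase in total within-cluster sum of squares if a and b merge
        ca, cb = centroid(a), centroid(b)
        return len(a) * len(b) / (len(a) + len(b)) * math.dist(ca, cb) ** 2
    while len(clusters) > n_clusters:
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] += clusters.pop(j)
    return clusters

# invented (size, asymmetry, fluorescence) triples: two particle types
raw = [[0.6, 10, 5], [0.7, 11, 6], [0.65, 10, 5.5],
       [3.0, 40, 90], [3.2, 42, 95], [3.1, 41, 92]]
cols = zscore(list(zip(*raw)))          # normalise per channel
pts = [list(p) for p in zip(*cols)]     # back to per-particle rows
print(sorted(sorted(c) for c in ward_cluster(pts, 2)))  # → [[0, 1, 2], [3, 4, 5]]
```

Normalising first keeps the high-magnitude fluorescence channel from dominating the Ward merge costs, which is why the z-score and range variants performed comparably in the study.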

  13. A multiple-feature and multiple-kernel scene segmentation algorithm for humanoid robot.

    PubMed

    Liu, Zhi; Xu, Shuqiong; Zhang, Yun; Chen, Chun Lung Philip

    2014-11-01

    This technical correspondence presents a multiple-feature and multiple-kernel support vector machine (MFMK-SVM) methodology to achieve more reliable and robust segmentation performance for a humanoid robot. The pixel-wise intensity, gradient, and C1 SMF features are extracted via the local homogeneity model and Gabor filter and used as inputs to the MFMK-SVM model; providing multiple features per sample makes the MFMK-SVM model easier to implement and more efficient to compute. A new clustering method, called the feature validity-interval type-2 fuzzy C-means (FV-IT2FCM) clustering algorithm, is proposed by integrating a type-2 fuzzy criterion into the iterative clustering optimization process to improve the robustness and reliability of the clustering results. Furthermore, clustering validity is employed to select the training samples for learning the MFMK-SVM model. The MFMK-SVM scene segmentation method is able to take full advantage of the multiple features of the scene image and the ability of multiple kernels. Experiments on the BSDS dataset and real natural scene images demonstrate the superior performance of the proposed method.

  14. Grouping methods for estimating the prevalences of rare traits from complex survey data that preserve confidentiality of respondents.

    PubMed

    Hyun, Noorie; Gastwirth, Joseph L; Graubard, Barry I

    2018-03-26

    Originally, 2-stage group testing was developed for efficiently screening individuals for a disease. In response to the HIV/AIDS epidemic, 1-stage group testing was adopted for estimating prevalences of a single or multiple traits from testing groups of size q, so individuals were not tested. This paper extends the methodology of 1-stage group testing to surveys with sample weighted complex multistage-cluster designs. Sample weighted-generalized estimating equations are used to estimate the prevalences of categorical traits while accounting for the error rates inherent in the tests. Two difficulties arise when using group testing in complex samples: (1) How does one weight the results of the test on each group as the sample weights will differ among observations in the same group. Furthermore, if the sample weights are related to positivity of the diagnostic test, then group-level weighting is needed to reduce bias in the prevalence estimation; (2) How does one form groups that will allow accurate estimation of the standard errors of prevalence estimates under multistage-cluster sampling allowing for intracluster correlation of the test results. We study 5 different grouping methods to address the weighting and cluster sampling aspects of complex designed samples. Finite sample properties of the estimators of prevalences, variances, and confidence interval coverage for these grouping methods are studied using simulations. National Health and Nutrition Examination Survey data are used to illustrate the methods. Copyright © 2018 John Wiley & Sons, Ltd.
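
The core identity behind 1-stage group testing, assuming a perfect test for simplicity, is that a group of size q is negative only if all q members are negative, so P(group negative) = (1 - p)^q. A toy sketch using a simple weighted group-level average (a deliberate simplification of the paper's sample-weighted GEE machinery, with simulated rather than NHANES data):

```python
import random

def group_test_prevalence(group_results, group_weights, q):
    """Estimate trait prevalence p from group-level test results.

    group_results: 1 = positive group, 0 = negative group.
    Inverts P(group positive) = 1 - (1 - p)^q, assuming a perfect test.
    """
    w_total = sum(group_weights)
    g_pos = sum(r * w for r, w in zip(group_results, group_weights)) / w_total
    return 1 - (1 - g_pos) ** (1 / q)

# simulate: true prevalence 5%, groups of q = 10, equal weights
rng = random.Random(42)
p_true, q, n_groups = 0.05, 10, 2000
results = [int(any(rng.random() < p_true for _ in range(q)))
           for _ in range(n_groups)]
est = group_test_prevalence(results, [1.0] * n_groups, q)
print(round(est, 3))  # close to 0.05
```

The real estimators additionally absorb test sensitivity/specificity and unequal sample weights within groups, which is exactly where the grouping strategies studied in the paper matter.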

  15. Investigating Subtypes of Child Development: A Comparison of Cluster Analysis and Latent Class Cluster Analysis in Typology Creation

    ERIC Educational Resources Information Center

    DiStefano, Christine; Kamphaus, R. W.

    2006-01-01

    Two classification methods, latent class cluster analysis and cluster analysis, are used to identify groups of child behavioral adjustment underlying a sample of elementary school children aged 6 to 11 years. Behavioral rating information across 14 subscales was obtained from classroom teachers and used as input for analyses. Both the procedures…

  16. Inherent Structure versus Geometric Metric for State Space Discretization

    PubMed Central

    Liu, Hanzhong; Li, Minghai; Fan, Jue; Huo, Shuanghong

    2016-01-01

    Inherent structure (IS) and geometry-based clustering methods are commonly used for analyzing molecular dynamics trajectories. ISs are obtained by minimizing the sampled conformations into local minima on potential/effective energy surface. The conformations that are minimized into the same energy basin belong to one cluster. We investigate the influence of the applications of these two methods of trajectory decomposition on our understanding of the thermodynamics and kinetics of alanine tetrapeptide. We find that at the micro cluster level, the IS approach and root-mean-square deviation (RMSD) based clustering method give totally different results. Depending on the local features of energy landscape, the conformations with close RMSDs can be minimized into different minima, while the conformations with large RMSDs could be minimized into the same basin. However, the relaxation timescales calculated based on the transition matrices built from the micro clusters are similar. The discrepancy at the micro cluster level leads to different macro clusters. Although the dynamic models established through both clustering methods are validated approximately Markovian, the IS approach seems to give a meaningful state space discretization at the macro cluster level. PMID:26915811

  17. Method of identifying clusters representing statistical dependencies in multivariate data

    NASA Technical Reports Server (NTRS)

    Borucki, W. J.; Card, D. H.; Lyle, G. C.

    1975-01-01

    Approach is first to cluster and then to compute spatial boundaries for resulting clusters. Next step is to compute, from set of Monte Carlo samples obtained from scrambled data, estimates of probabilities of obtaining at least as many points within boundaries as were actually observed in original data.
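
The two steps above can be sketched on toy 2D data: take a spatial boundary for a cluster (here a bounding box), then scramble the data by permuting each variable independently, which destroys any dependence, and count how often at least as many scrambled points fall inside the boundary as were actually observed:

```python
import random

def scramble_p_value(xs, ys, box, n_mc=2000, seed=0):
    """Monte Carlo probability of seeing >= the observed count in the box
    when the two variables are made independent by scrambling."""
    rng = random.Random(seed)
    x0, x1, y0, y1 = box
    inside = lambda pts: sum(x0 <= x <= x1 and y0 <= y <= y1 for x, y in pts)
    observed = inside(zip(xs, ys))
    hits = 0
    for _ in range(n_mc):
        sx, sy = xs[:], ys[:]
        rng.shuffle(sx); rng.shuffle(sy)   # permute each variable separately
        if inside(zip(sx, sy)) >= observed:
            hits += 1
    return hits / n_mc

# a tight cluster near (5, 5) on top of uniform background noise (toy data)
rng = random.Random(7)
xs = [rng.uniform(0, 10) for _ in range(40)] + [5 + rng.gauss(0, 0.2) for _ in range(20)]
ys = [rng.uniform(0, 10) for _ in range(40)] + [5 + rng.gauss(0, 0.2) for _ in range(20)]
pv = scramble_p_value(xs, ys, box=(4.3, 5.7, 4.3, 5.7))
print(pv)  # a small p-value: the dense knot is unlikely under independence
```

A small probability indicates that the cluster reflects genuine statistical dependence between the variables rather than a chance alignment of their marginals.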

  18. Down-Regulation of Olfactory Receptors in Response to Traumatic Brain Injury Promotes Risk for Alzheimers Disease

    DTIC Science & Technology

    2015-12-01

    group assignment of samples in unsupervised hierarchical clustering by the Unweighted Pair-Group Method using Arithmetic averages (UPGMA) based on ... log2-transformed MAS5.0 signal values; probe set clustering was performed by the UPGMA method using Cosine correlation as the similarity metric. For ... differentially-regulated genes identified were subjected to unsupervised hierarchical clustering analysis using the UPGMA algorithm with cosine correlation as

  19. Physical properties of star clusters in the outer LMC as observed by the DES

    DOE PAGES

    Pieres, A.; Santiago, B.; Balbinot, E.; ...

    2016-05-26

    The Large Magellanic Cloud (LMC) harbors a rich and diverse system of star clusters, whose ages, chemical abundances, and positions provide information about the LMC history of star formation. We use Science Verification imaging data from the Dark Energy Survey to increase the census of known star clusters in the outer LMC and to derive physical parameters for a large sample of such objects using a spatially and photometrically homogeneous data set. Our sample contains 255 visually identified cluster candidates, of which 109 were not listed in any previous catalog. We quantify the crowding effect for the stellar sample produced by the DES Data Management pipeline and conclude that the stellar completeness is < 10% inside typical LMC cluster cores. We therefore develop a pipeline to sample and measure stellar magnitudes and positions around the cluster candidates using DAOPHOT. We also implement a maximum-likelihood method to fit individual density profiles and colour-magnitude diagrams. For 117 (from a total of 255) of the cluster candidates (28 uncatalogued clusters), we obtain reliable ages, metallicities, distance moduli and structural parameters, confirming their nature as physical systems. The distribution of cluster metallicities shows a radial dependence, with no clusters more metal-rich than [Fe/H] ~ -0.7 beyond 8 kpc from the LMC center. Furthermore, the age distribution has two peaks at ≃ 1.2 Gyr and ≃ 2.7 Gyr.


  1. Satellite quenching time-scales in clusters from projected phase space measurements matched to simulated orbits

    NASA Astrophysics Data System (ADS)

    Oman, Kyle A.; Hudson, Michael J.

    2016-12-01

    We measure the star formation quenching efficiency and time-scale in cluster environments. Our method uses N-body simulations to estimate the probability distribution of possible orbits for a sample of observed Sloan Digital Sky Survey galaxies in and around clusters based on their position and velocity offsets from their host cluster. We study the relationship between their star formation rates and their likely orbital histories via a simple model in which star formation is quenched once a delay time after infall has elapsed. Our orbit library method is designed to isolate the environmental effect on the star formation rate due to a galaxy's present-day host cluster from `pre-processing' in previous group hosts. We find that quenching of satellite galaxies of all stellar masses in our sample (109-10^{11.5}M_{⊙}) by massive (> 10^{13} M_{⊙}) clusters is essentially 100 per cent efficient. Our fits show that all galaxies quench on their first infall, approximately at or within a Gyr of their first pericentric passage. There is little variation in the onset of quenching from galaxy-to-galaxy: the spread in this time is at most ˜2 Gyr at fixed M*. Higher mass satellites quench earlier, with very little dependence on host cluster mass in the range probed by our sample.

  2. Clustering of self-organizing map identifies five distinct medulloblastoma subgroups.

    PubMed

    Cao, Changjun; Wang, Wei; Jiang, Pucha

    2016-01-01

    Medulloblastoma is one of the most malignant paediatric brain tumours. Molecular subgrouping of these medulloblastomas will not only help identify specific cohorts for particular treatments but also improve confidence in prognostic prediction. Currently, there is a consensus that four distinct subtypes of medulloblastoma exist. We proposed a novel bioinformatics method, clustering of the self-organizing map (SOM), to determine the subgroups and their molecular diversity. Microarray expression profiles of 46 medulloblastoma samples were analysed, and five clusters with distinct demographics, clinical outcomes and transcriptional profiles were identified. The previously reported Wnt subgroup was identified as expected. Three other novel subgroups were proposed for further investigation. Our findings underscore the value of SOM clustering for discovering medulloblastoma subgroups. Once the suggested subdivision has been confirmed in large cohorts, this method should serve as part of the routine classification of clinical samples.

  3. Application of clustering analysis in the prediction of photovoltaic power generation based on neural network

    NASA Astrophysics Data System (ADS)

    Cheng, K.; Guo, L. M.; Wang, Y. K.; Zafar, M. T.

    2017-11-01

    In order to select effective samples from the many years of accumulated PV power generation data and to improve the accuracy of PV power generation forecasting models, this paper studies the application of clustering analysis in this field and establishes a forecasting model based on a neural network. Based on three different types of weather (sunny, cloudy and rainy days), this research screens samples of historical data using the clustering analysis method. After screening, it establishes BP neural network prediction models using the screened data as training data. The six photovoltaic power generation prediction models built before and after data screening are then compared. Results show that a prediction model combining clustering analysis with BP neural networks is an effective way to improve the precision of photovoltaic power generation forecasting.
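
    The screen-then-forecast pipeline described above can be sketched with scikit-learn. This is an illustrative stand-in, not the paper's implementation: the features, targets, cluster count, and network architecture below are all assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Hypothetical historical data: one row per day with
# [irradiance, temperature, cloud cover]; target is daily PV output.
X = rng.random((300, 3))
y = 2.0 * X[:, 0] - 0.5 * X[:, 2] + rng.normal(0, 0.05, 300)

# Step 1: screen samples by clustering days into weather types
# (stand-ins for the paper's sunny / cloudy / rainy groups).
weather = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Step 2: train one BP (back-propagation) network per weather type,
# using only the screened samples from that cluster.
models = {}
for j in range(3):
    mask = weather.labels_ == j
    models[j] = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000,
                             random_state=0).fit(X[mask], y[mask])

# Forecast: route a new day to its weather cluster, then to that model.
x_new = rng.random((1, 3))
k = int(weather.predict(x_new)[0])
print(models[k].predict(x_new))
```

    Routing each forecast day through its own weather-type model is what lets the screening step pay off: each network only has to fit one regime.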

  4. Dark Energy Survey Year 1 results: cross-correlation redshifts - methods and systematics characterization

    NASA Astrophysics Data System (ADS)

    Gatti, M.; Vielzeuf, P.; Davis, C.; Cawthon, R.; Rau, M. M.; DeRose, J.; De Vicente, J.; Alarcon, A.; Rozo, E.; Gaztanaga, E.; Hoyle, B.; Miquel, R.; Bernstein, G. M.; Bonnett, C.; Carnero Rosell, A.; Castander, F. J.; Chang, C.; da Costa, L. N.; Gruen, D.; Gschwend, J.; Hartley, W. G.; Lin, H.; MacCrann, N.; Maia, M. A. G.; Ogando, R. L. C.; Roodman, A.; Sevilla-Noarbe, I.; Troxel, M. A.; Wechsler, R. H.; Asorey, J.; Davis, T. M.; Glazebrook, K.; Hinton, S. R.; Lewis, G.; Lidman, C.; Macaulay, E.; Möller, A.; O'Neill, C. R.; Sommer, N. E.; Uddin, S. A.; Yuan, F.; Zhang, B.; Abbott, T. M. C.; Allam, S.; Annis, J.; Bechtol, K.; Brooks, D.; Burke, D. L.; Carollo, D.; Carrasco Kind, M.; Carretero, J.; Cunha, C. E.; D'Andrea, C. B.; DePoy, D. L.; Desai, S.; Eifler, T. F.; Evrard, A. E.; Flaugher, B.; Fosalba, P.; Frieman, J.; García-Bellido, J.; Gerdes, D. W.; Goldstein, D. A.; Gruendl, R. A.; Gutierrez, G.; Honscheid, K.; Hoormann, J. K.; Jain, B.; James, D. J.; Jarvis, M.; Jeltema, T.; Johnson, M. W. G.; Johnson, M. D.; Krause, E.; Kuehn, K.; Kuhlmann, S.; Kuropatkin, N.; Li, T. S.; Lima, M.; Marshall, J. L.; Melchior, P.; Menanteau, F.; Nichol, R. C.; Nord, B.; Plazas, A. A.; Reil, K.; Rykoff, E. S.; Sako, M.; Sanchez, E.; Scarpine, V.; Schubnell, M.; Sheldon, E.; Smith, M.; Smith, R. C.; Soares-Santos, M.; Sobreira, F.; Suchyta, E.; Swanson, M. E. C.; Tarle, G.; Thomas, D.; Tucker, B. E.; Tucker, D. L.; Vikram, V.; Walker, A. R.; Weller, J.; Wester, W.; Wolf, R. C.

    2018-06-01

    We use numerical simulations to characterize the performance of a clustering-based method to calibrate photometric redshift biases. In particular, we cross-correlate the weak lensing source galaxies from the Dark Energy Survey Year 1 sample with redMaGiC galaxies (luminous red galaxies with secure photometric redshifts) to estimate the redshift distribution of the former sample. The recovered redshift distributions are used to calibrate the photometric redshift bias of standard photo-z methods applied to the same source galaxy sample. We apply the method to two photo-z codes run in our simulated data: Bayesian Photometric Redshift and Directional Neighbourhood Fitting. We characterize the systematic uncertainties of our calibration procedure, and find that these systematic uncertainties dominate our error budget. The dominant systematics are due to our assumption of unevolving bias and clustering across each redshift bin, and to differences between the shapes of the redshift distributions derived by clustering versus photo-zs. The systematic uncertainty in the mean redshift bias of the source galaxy sample is Δz ≲ 0.02, though the precise value depends on the redshift bin under consideration. We discuss possible ways to mitigate the impact of our dominant systematics in future analyses.

  5. Unsupervised learning on scientific ocean drilling datasets from the South China Sea

    NASA Astrophysics Data System (ADS)

    Tse, Kevin C.; Chiu, Hon-Chim; Tsang, Man-Yin; Li, Yiliang; Lam, Edmund Y.

    2018-06-01

    Unsupervised learning methods were applied to explore data patterns in multivariate geophysical datasets collected from ocean floor sediment core samples coming from scientific ocean drilling in the South China Sea. Compared to studies on similar datasets, but using supervised learning methods which are designed to make predictions based on sample training data, unsupervised learning methods require no a priori information and focus only on the input data. In this study, popular unsupervised learning methods including K-means, self-organizing maps, hierarchical clustering and random forest were coupled with different distance metrics to form exploratory data clusters. The resulting data clusters were externally validated with lithologic units and geologic time scales assigned to the datasets by conventional methods. Compact and connected data clusters displayed varying degrees of correspondence with existing classification by lithologic units and geologic time scales. K-means and self-organizing maps were observed to perform better with lithologic units while random forest corresponded best with geologic time scales. This study sets a pioneering example of how unsupervised machine learning methods can be used as an automatic processing tool for the increasingly high volume of scientific ocean drilling data.
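
    The fit-then-validate-externally pattern described above can be sketched with scikit-learn: cluster multivariate measurements without labels, then score agreement with independently assigned classes. The downcore data here are synthetic stand-ins for the study's geophysical logs.

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Hypothetical downcore measurements (e.g. density, susceptibility, NGR)
# for three lithologic units, stacked in depth order.
units = np.repeat([0, 1, 2], 100)            # external labels (not used to fit)
X = rng.normal(units[:, None] * 2.0, 0.7, (300, 3))
X = StandardScaler().fit_transform(X)

# Unsupervised clustering with two of the methods the study compares.
km = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(X)
hc = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X)

# External validation: agreement with lithologic units, computed without
# assuming that cluster ids match unit ids.
print("K-means ARI:     ", adjusted_rand_score(units, km))
print("Hierarchical ARI:", adjusted_rand_score(units, hc))
```

    The adjusted Rand index is label-permutation-invariant, which is what makes it suitable for validating unsupervised clusters against an external classification.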

  6. Generating Random Samples of a Given Size Using Social Security Numbers.

    ERIC Educational Resources Information Center

    Erickson, Richard C.; Brauchle, Paul E.

    1984-01-01

    The purposes of this article are (1) to present a method by which social security numbers may be used to draw cluster samples of a predetermined size and (2) to describe procedures used to validate this method of drawing random samples. (JOW)
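
    The record above gives only the idea, not the procedure. One way terminal digits of social security numbers could define clusters of a predetermined size is sketched below; the roster, the use of two-digit endings, and the stopping rule are all hypothetical reconstructions, not the article's method.

```python
import random

random.seed(42)

# Hypothetical roster: members identified by fictitious 9-digit numbers.
roster = [f"{random.randrange(10**9):09d}" for _ in range(5000)]

def ssn_cluster_sample(roster, target_size):
    """Draw a cluster sample by randomly selecting two-digit endings.

    Each ending defines one 'cluster' (all IDs sharing it); endings are
    drawn without replacement until the accumulated sample reaches the
    target size. Illustrative sketch only.
    """
    endings = random.sample(range(100), 100)   # shuffled 00..99
    sample, used = [], []
    for e in endings:
        if len(sample) >= target_size:
            break
        used.append(e)
        sample.extend(id_ for id_ in roster if int(id_[-2:]) == e)
    return sample, used

sample, used = ssn_cluster_sample(roster, target_size=500)
print(len(sample), "members from", len(used), "endings")
```

    Because terminal digits are assigned essentially at random, each ending yields a roughly equal-probability cluster, which is what makes the resulting sample defensible as random.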

  7. Simultaneous clustering of gene expression data with clinical chemistry and pathological evaluations reveals phenotypic prototypes

    PubMed Central

    Bushel, Pierre R; Wolfinger, Russell D; Gibson, Greg

    2007-01-01

    Background Commonly employed clustering methods for analysis of gene expression data do not directly incorporate phenotypic data about the samples. Furthermore, clustering of samples with known phenotypes is typically performed in an informal fashion. The inability of clustering algorithms to incorporate biological data in the grouping process can limit proper interpretation of the data and its underlying biology. Results We present a more formal approach, the modk-prototypes algorithm, for clustering biological samples based on simultaneously considering microarray gene expression data and classes of known phenotypic variables such as clinical chemistry evaluations and histopathologic observations. The strategy involves constructing an objective function with the sum of the squared Euclidean distances for numeric microarray and clinical chemistry data and simple matching for histopathology categorical values in order to measure dissimilarity of the samples. Separate weighting terms are used for microarray, clinical chemistry and histopathology measurements to control the influence of each data domain on the clustering of the samples. The dynamic validity index for numeric data was modified with a category utility measure for determining the number of clusters in the data sets. A cluster's prototype, formed from the mean of the values for numeric features and the mode of the categorical values of all the samples in the group, is representative of the phenotype of the cluster members. The approach is shown to work well with a simulated mixed data set and two real data examples containing numeric and categorical data types: one from a heart disease study, and another from acetaminophen (an analgesic) exposure in rat liver, which causes centrilobular necrosis.
Conclusion The modk-prototypes algorithm partitioned the simulated data into clusters with samples in their respective class group and the heart disease samples into two groups (sick and buff denoting samples having pain type representative of angina and non-angina respectively) with an accuracy of 79%. This is on par with, or better than, the assignment accuracy of the heart disease samples by several well-known and successful clustering algorithms. Following modk-prototypes clustering of the acetaminophen-exposed samples, informative genes from the cluster prototypes were identified that are descriptive of, and phenotypically anchored to, levels of necrosis of the centrilobular region of the rat liver. The biological processes cell growth and/or maintenance, amine metabolism, and stress response were shown to discern between no and moderate levels of acetaminophen-induced centrilobular necrosis. The use of well-known and traditional measurements directly in the clustering provides some guarantee that the resulting clusters will be meaningfully interpretable. PMID:17408499
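
    The dissimilarity at the heart of the objective function described above combines squared Euclidean distance on the two numeric domains with simple matching on the categorical one, under per-domain weights. A minimal sketch (field names, data, and weights are illustrative):

```python
import numpy as np

def modk_dissimilarity(sample, prototype, w_array, w_chem, w_histo):
    """Weighted dissimilarity in the spirit of the modk-prototypes
    objective: squared Euclidean distance for the microarray and
    clinical-chemistry domains, simple matching for histopathology."""
    d_array = np.sum((sample["expr"] - prototype["expr"]) ** 2)
    d_chem = np.sum((sample["chem"] - prototype["chem"]) ** 2)
    # Simple matching: fraction of categorical values that differ.
    d_histo = np.mean([a != b for a, b in zip(sample["histo"],
                                              prototype["histo"])])
    return w_array * d_array + w_chem * d_chem + w_histo * d_histo

sample = {"expr": np.array([1.0, 0.2]), "chem": np.array([3.1]),
          "histo": ["necrosis", "mild"]}
prototype = {"expr": np.array([0.8, 0.1]), "chem": np.array([3.0]),
             "histo": ["necrosis", "none"]}
print(modk_dissimilarity(sample, prototype, 0.5, 0.3, 0.2))
```

    The prototype mixes a mean (numeric fields) with a mode (categorical fields), so this one function is all the clustering loop needs to assign samples to their nearest prototype.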

  8. SU-G-TeP3-14: Three-Dimensional Cluster Model in Inhomogeneous Dose Distribution

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wei, J; Penagaricano, J; Narayanasamy, G

    2016-06-15

    Purpose: We aim to investigate 3D cluster formation in inhomogeneous dose distribution to search for new models predicting radiation tissue damage and further leading to new optimization paradigm for radiotherapy planning. Methods: The aggregation of higher dose in the organ at risk (OAR) than a preset threshold was chosen as the cluster whose connectivity dictates the cluster structure. Upon the selection of the dose threshold, the fractional density defined as the fraction of voxels in the organ eligible to be part of the cluster was determined according to the dose volume histogram (DVH). A Monte Carlo method was implemented to establish a case pertinent to the corresponding DVH. Ones and zeros were randomly assigned to each OAR voxel with the sampling probability equal to the fractional density. Ten thousand samples were randomly generated to ensure a sufficient number of cluster sets. A recursive cluster searching algorithm was developed to analyze the cluster with various connectivity choices like 1-, 2-, and 3-connectivity. The mean size of the largest cluster (MSLC) from the Monte Carlo samples was taken to be a function of the fractional density. Various OARs from clinical plans were included in the study. Results: Intensive Monte Carlo study demonstrates the inverse relationship between the MSLC and the cluster connectivity as anticipated and the cluster size does not change with fractional density linearly regardless of the connectivity types. An initially-slow-increase to exponential growth transition of the MSLC from low to high density was observed. The cluster sizes were found to vary within a large range and are relatively independent of the OARs. Conclusion: The Monte Carlo study revealed that the cluster size could serve as a suitable index of the tissue damage (percolation cluster) and the clinical outcome of the same DVH might be potentially different.
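
    The sampling loop in the Methods section (occupy each voxel with probability equal to the fractional density, then find connected clusters and average the largest size) can be sketched with SciPy's connected-component labelling. Grid size, density, and sample count are arbitrary here; note that under SciPy's convention a higher connectivity index joins more neighbours, whereas the abstract's 1-, 2-, 3-connectivity indexing may run in the opposite sense.

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(7)

def mean_largest_cluster(shape, density, n_samples=200, connectivity=1):
    """Monte Carlo estimate of the mean size of the largest cluster (MSLC).

    Voxels are occupied independently with probability `density` (the
    fractional density from the DVH); connected components are found with
    the chosen connectivity (1 = shared faces, 3 = faces, edges, corners).
    """
    structure = ndimage.generate_binary_structure(3, connectivity)
    sizes = []
    for _ in range(n_samples):
        grid = rng.random(shape) < density
        labels, n = ndimage.label(grid, structure=structure)
        sizes.append(np.bincount(labels.ravel())[1:].max() if n else 0)
    return float(np.mean(sizes))

# At fixed density, the 26-neighbourhood joins at least as many voxels
# as the 6-neighbourhood, so its largest cluster is at least as large.
lo = mean_largest_cluster((12, 12, 12), 0.2, connectivity=1)
hi = mean_largest_cluster((12, 12, 12), 0.2, connectivity=3)
print(lo, hi)
```

    The sharp growth of the MSLC once density crosses the percolation threshold is what the abstract's "initially slow, then exponential" transition refers to.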

  9. Performance of small cluster surveys and the clustered LQAS design to estimate local-level vaccination coverage in Mali.

    PubMed

    Minetti, Andrea; Riera-Montes, Margarita; Nackers, Fabienne; Roederer, Thomas; Koudika, Marie Hortense; Sekkenes, Johanne; Taconet, Aurore; Fermon, Florence; Touré, Albouhary; Grais, Rebecca F; Checchi, Francesco

    2012-10-12

    Estimation of vaccination coverage at the local level is essential to identify communities that may require additional support. Cluster surveys can be used in resource-poor settings, when population figures are inaccurate. To be feasible, cluster samples need to be small, without losing robustness of results. The clustered LQAS (CLQAS) approach has been proposed as an alternative, as smaller sample sizes are required. We explored (i) the efficiency of cluster surveys of decreasing sample size through bootstrapping analysis and (ii) the performance of CLQAS under three alternative sampling plans to classify local vaccination coverage (VC), using data from a survey carried out in Mali after mass vaccination against meningococcal meningitis group A. VC estimates provided by a 10 × 15 cluster survey design were reasonably robust. We used them to classify health areas in three categories and guide mop-up activities: (i) health areas not requiring supplemental activities; (ii) health areas requiring additional vaccination; (iii) health areas requiring further evaluation. As sample size decreased (from 10 × 15 to 10 × 3), standard error of VC and ICC estimates were increasingly unstable. Results of CLQAS simulations were not accurate for most health areas, with an overall risk of misclassification greater than 0.25 in one health area out of three. It was greater than 0.50 in one health area out of two under two of the three sampling plans. Small sample cluster surveys (10 × 15) are acceptably robust for classification of VC at local level. We do not recommend the CLQAS method as currently formulated for evaluating vaccination programmes.

  10. Precision, time, and cost: a comparison of three sampling designs in an emergency setting.

    PubMed

    Deitchler, Megan; Deconinck, Hedwig; Bergeron, Gilles

    2008-05-02

    The conventional method to collect data on the health, nutrition, and food security status of a population affected by an emergency is a 30 × 30 cluster survey. This sampling method can be time and resource intensive and, accordingly, may not be the most appropriate one when data are needed rapidly for decision making. In this study, we compare the precision, time and cost of the 30 × 30 cluster survey with two alternative sampling designs: a 33 × 6 cluster design (33 clusters, 6 observations per cluster) and a 67 × 3 cluster design (67 clusters, 3 observations per cluster). Data for each sampling design were collected concurrently in West Darfur, Sudan in September-October 2005 in an emergency setting. Results of the study show the 30 × 30 design to provide more precise results (i.e. narrower 95% confidence intervals) than the 33 × 6 and 67 × 3 designs for most child-level indicators. Exceptions are indicators of immunization and vitamin A capsule supplementation coverage, which show a high intra-cluster correlation. Although the 33 × 6 and 67 × 3 designs provide wider confidence intervals than the 30 × 30 design for child anthropometric indicators, the 33 × 6 and 67 × 3 designs provide the opportunity to conduct a LQAS hypothesis test to detect whether or not a critical threshold of global acute malnutrition prevalence has been exceeded, whereas the 30 × 30 design does not. For the household-level indicators tested in this study, the 67 × 3 design provides the most precise results. However, our results show that neither the 33 × 6 nor the 67 × 3 design is appropriate for assessing indicators of mortality. In this field application, data collection for the 33 × 6 and 67 × 3 designs required substantially less time and cost than that required for the 30 × 30 design. The findings of this study suggest the 33 × 6 and 67 × 3 designs can provide useful time- and resource-saving alternatives to the 30 × 30 method of data collection in emergency settings.
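
    The precision differences among these designs follow from the classical design effect for equal-sized clusters, DEFF = 1 + (b − 1)ρ, where b is the cluster take and ρ the intra-cluster correlation. A quick computation under an assumed ρ = 0.1 (the study reports different ICCs per indicator, so this value is purely illustrative):

```python
def design_effect(cluster_size, icc):
    """Classical design effect for equal-sized clusters:
    DEFF = 1 + (b - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc

def effective_sample_size(n_clusters, cluster_size, icc):
    """Total sample size deflated by the design effect."""
    n = n_clusters * cluster_size
    return n / design_effect(cluster_size, icc)

# Compare the three designs from the study under an assumed ICC of 0.1.
for c, b in [(30, 30), (33, 6), (67, 3)]:
    print(f"{c}x{b}: n={c * b}, DEFF={design_effect(b, 0.1):.2f}, "
          f"n_eff={effective_sample_size(c, b, 0.1):.1f}")
```

    With a large cluster take of 30, most of the 900 observations are absorbed by the design effect, which is why the many-small-cluster designs can be competitive for household-level indicators despite far smaller totals.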


  12. Source Identification of PM2.5 in Steubenville, Ohio Using a Hybrid Method for Highly Time-resolved Data

    EPA Science Inventory

    A new source-type identification method, Reduction and Species Clustering Using Episodes (ReSCUE), was developed to exploit the temporal synchronicity between species to form clusters of species that vary together. High time-resolution (30 min) PM2.5 sampling was condu...

  13. Assessment of repeatability of composition of perfumed waters by high-performance liquid chromatography combined with numerical data analysis based on cluster analysis (HPLC UV/VIS - CA).

    PubMed

    Ruzik, L; Obarski, N; Papierz, A; Mojski, M

    2015-06-01

    High-performance liquid chromatography (HPLC) with UV/VIS spectrophotometric detection combined with the chemometric method of cluster analysis (CA) was used for the assessment of repeatability of composition of nine types of perfumed waters. In addition, the chromatographic method of separating components of the perfumed waters under analysis was subjected to an optimization procedure. The chromatograms thus obtained were used as sources of data for the chemometric method of cluster analysis. The result was a classification of a set comprising 39 perfumed water samples with a similar composition at a specified level of probability (level of agglomeration). A comparison of the classification with the manufacturer's declarations reveals a good degree of consistency and demonstrates similarity between samples in different classes. A combination of the chromatographic method with cluster analysis (HPLC UV/VIS - CA) makes it possible to quickly assess the repeatability of composition of perfumed waters at selected levels of probability. © 2014 Society of Cosmetic Scientists and the Société Française de Cosmétologie.
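
    The HPLC UV/VIS - CA step (treating each chromatogram as a feature vector and cutting the dendrogram at a chosen agglomeration level) might look like this with SciPy. The peak-area data below are synthetic, with three fabricated production batches standing in for the perfumed-water types.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(3)

# Synthetic fingerprints: relative areas of 8 peaks for 12 samples,
# four replicates from each of three hypothetical production batches.
base = np.array([[1.0, 0.1, 0.0, 0.9, 0.1, 0.0, 0.8, 0.1],
                 [0.1, 1.0, 0.1, 0.0, 0.9, 0.1, 0.0, 0.8],
                 [0.0, 0.1, 1.0, 0.1, 0.0, 0.9, 0.2, 0.1]])
X = np.vstack([b + rng.normal(0, 0.02, (4, 8)) for b in base])

# Agglomerative clustering on correlation distance between chromatograms.
Z = linkage(pdist(X, metric="correlation"), method="average")

# Cutting the dendrogram at a chosen agglomeration level classifies
# the samples into groups of repeatable composition.
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)
```

    Correlation distance compares peak-area *patterns* rather than absolute areas, so samples differing only by injection volume or dilution still cluster together.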

  14. A clustering algorithm for sample data based on environmental pollution characteristics

    NASA Astrophysics Data System (ADS)

    Chen, Mei; Wang, Pengfei; Chen, Qiang; Wu, Jiadong; Chen, Xiaoyun

    2015-04-01

    Environmental pollution has become an issue of serious international concern in recent years. Among the receptor-oriented pollution models, CMB, PMF, UNMIX, and PCA are widely used as source apportionment models. To improve the accuracy of source apportionment and classify the sample data for these models, this study proposes an easy-to-use, high-dimensional EPC (environmental pollution characteristics) algorithm that not only organizes all of the sample data into different groups according to similarities in pollution characteristics such as pollution sources and concentrations but also simultaneously detects outliers. The main clustering process consists of selecting the first unlabelled point as the cluster centre, then assigning each data point in the sample dataset to its most similar cluster centre according to both the user-defined threshold and the value of the similarity function in each iteration, and finally modifying the clusters using a method similar to k-Means. The validity and accuracy of the algorithm are tested using both real and synthetic datasets; the results show the EPC algorithm to be practical and effective for classifying sample data for source apportionment models and helpful for understanding and interpreting the sources of pollution.
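
    A minimal reading of the clustering loop described above (seed a centre from the first uncovered point, admit points above a user-defined similarity threshold, then refine k-means-style) could look like the sketch below. The Gaussian similarity function and all parameters are assumptions; the paper's actual similarity function is not specified here.

```python
import numpy as np

def epc_cluster(X, threshold, n_iter=10):
    """Sketch of an EPC-style clustering loop: the first point not yet
    covered seeds a new centre; each point joins its most similar centre
    only if the similarity exceeds the user-defined threshold; centres
    are then refined with k-means-style reassignment. Similarity is
    modelled as exp(-||x - c||^2), an assumption."""
    centres, labels = [], np.full(len(X), -1)
    for i, x in enumerate(X):
        if centres:
            sims = [np.exp(-np.linalg.norm(x - c) ** 2) for c in centres]
            j = int(np.argmax(sims))
            if sims[j] >= threshold:
                labels[i] = j
                continue
        centres.append(x.copy())              # point seeds a new cluster
        labels[i] = len(centres) - 1
    for _ in range(n_iter):                   # k-means-style refinement
        centres = [X[labels == j].mean(axis=0) if np.any(labels == j)
                   else centres[j] for j in range(len(centres))]
        for i, x in enumerate(X):
            labels[i] = int(np.argmin([np.linalg.norm(x - c)
                                       for c in centres]))
    return labels, centres

# Two well-separated synthetic 'pollution profiles'; singleton clusters
# could be flagged as the outliers the abstract mentions.
rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
labels, centres = epc_cluster(X, threshold=0.1)
print(len(centres))
```

    Because a new centre is created whenever no existing centre is similar enough, the number of clusters emerges from the threshold rather than being fixed in advance, which is the main difference from plain k-means.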

  15. Fingerprint analysis of Hibiscus mutabilis L. leaves based on ultra performance liquid chromatography with photodiode array detector combined with similarity analysis and hierarchical clustering analysis methods

    PubMed Central

    Liang, Xianrui; Ma, Meiling; Su, Weike

    2013-01-01

    Background: A method for chemical fingerprint analysis of Hibiscus mutabilis L. leaves was developed based on ultra performance liquid chromatography with photodiode array detector (UPLC-PAD) combined with similarity analysis (SA) and hierarchical clustering analysis (HCA). Materials and Methods: 10 batches of Hibiscus mutabilis L. leaves samples were collected from different regions of China. UPLC-PAD was employed to collect chemical fingerprints of Hibiscus mutabilis L. leaves. Results: The relative standard deviations (RSDs) of the relative retention times (RRT) and relative peak areas (RPA) of 10 characteristic peaks (one of them was identified as rutin) in precision, repeatability and stability tests were less than 3%, and the method of fingerprint analysis was validated to be suitable for the Hibiscus mutabilis L. leaves. Conclusions: The chromatographic fingerprints qualitatively showed abundant diversity of chemical constituents among the 10 batches of Hibiscus mutabilis L. leaves samples from different locations by similarity analysis, on the basis of calculating the correlation coefficients between each pair of fingerprints. Moreover, the HCA method clustered the samples into four classes, and the HCA dendrogram showed the close or distant relations among the 10 samples, which was consistent with the SA result to some extent. PMID:23930008
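
    The similarity-analysis (SA) step, computing correlation coefficients between each pair of fingerprints, reduces to a correlation matrix over the peak-area vectors. The ten-peak fingerprints below are fabricated for illustration.

```python
import numpy as np

rng = np.random.default_rng(9)

# Fabricated fingerprints: relative areas of 10 characteristic peaks
# for 5 batches of leaves, all perturbations of one reference profile.
reference = rng.random(10) + 0.2
fingerprints = reference + rng.normal(0, 0.02, (5, 10))

# Similarity analysis: Pearson correlation between each pair of
# fingerprints; rows/columns index batches.
S = np.corrcoef(fingerprints)
print(np.round(S, 3))

# Batches whose pairwise correlations all exceed a chosen cutoff
# (say 0.90) would be judged similar in composition.
print(bool(np.all(S > 0.90)))
```

    The complement of this similarity matrix is exactly the correlation distance a subsequent HCA step would cluster on, which is why the SA and HCA results tend to agree.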

  16. Down-Regulation of Olfactory Receptors in Response to Traumatic Brain Injury Promotes Risk for Alzheimer’s Disease

    DTIC Science & Technology

    2013-10-01

    correct group assignment of samples in unsupervised hierarchical clustering by the Unweighted Pair-Group Method using Arithmetic averages (UPGMA) based on...centering of log2-transformed MAS5.0 signal values; probe set clustering was performed by the UPGMA method using cosine correlation as the similarity met...A) The 108 differentially-regulated genes identified were subjected to unsupervised hierarchical clustering analysis using the UPGMA algorithm with

  17. Empirical entropic contributions in computational docking: evaluation in APS reductase complexes.

    PubMed

    Chang, Max W; Belew, Richard K; Carroll, Kate S; Olson, Arthur J; Goodsell, David S

    2008-08-01

    The results from reiterated docking experiments may be used to evaluate an empirical vibrational entropy of binding in ligand-protein complexes. We have tested several methods for evaluating the vibrational contribution to binding of 22 nucleotide analogues to the enzyme APS reductase. These include two cluster size methods that measure the probability of finding a particular conformation, a method that estimates the extent of the local energetic well by looking at the scatter of conformations within clustered results, and an RMSD-based method that uses the overall scatter and clustering of all conformations. We have also directly characterized the local energy landscape by randomly sampling around docked conformations. The simple cluster size method shows the best performance, improving the identification of correct conformations in multiple docking experiments. © 2008 Wiley Periodicals, Inc.

  18. A comparison of confidence interval methods for the intraclass correlation coefficient in community-based cluster randomization trials with a binary outcome.

    PubMed

    Braschel, Melissa C; Svec, Ivana; Darlington, Gerarda A; Donner, Allan

    2016-04-01

    Many investigators rely on previously published point estimates of the intraclass correlation coefficient rather than on their associated confidence intervals to determine the required size of a newly planned cluster randomized trial. Although confidence interval methods for the intraclass correlation coefficient that can be applied to community-based trials have been developed for a continuous outcome variable, fewer methods exist for a binary outcome variable. The aim of this study is to evaluate confidence interval methods for the intraclass correlation coefficient applied to binary outcomes in community intervention trials enrolling a small number of large clusters. Existing methods for confidence interval construction are examined and compared to a new ad hoc approach based on dividing clusters into a large number of smaller sub-clusters and subsequently applying existing methods to the resulting data. Monte Carlo simulation is used to assess the width and coverage of confidence intervals for the intraclass correlation coefficient based on Smith's large sample approximation of the standard error of the one-way analysis of variance estimator, an inverted modified Wald test for the Fleiss-Cuzick estimator, and intervals constructed using a bootstrap-t applied to a variance-stabilizing transformation of the intraclass correlation coefficient estimate. In addition, a new approach is applied in which clusters are randomly divided into a large number of smaller sub-clusters with the same methods applied to these data (with the exception of the bootstrap-t interval, which assumes large cluster sizes). These methods are also applied to a cluster randomized trial on adolescent tobacco use for illustration. When applied to a binary outcome variable in a small number of large clusters, existing confidence interval methods for the intraclass correlation coefficient provide poor coverage. 
However, confidence intervals constructed using the new approach combined with Smith's method provide nominal or close to nominal coverage when the intraclass correlation coefficient is small (<0.05), as is the case in most community intervention trials. This study concludes that when a binary outcome variable is measured in a small number of large clusters, confidence intervals for the intraclass correlation coefficient may be constructed by dividing existing clusters into sub-clusters (e.g. groups of 5) and using Smith's method. The resulting confidence intervals provide nominal or close to nominal coverage across a wide range of parameters when the intraclass correlation coefficient is small (<0.05). Application of this method should provide investigators with a better understanding of the uncertainty associated with a point estimator of the intraclass correlation coefficient used for determining the sample size needed for a newly designed community-based trial. © The Author(s) 2015.
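
    The recommended procedure — split each large cluster into sub-clusters of about 5 and build a Wald-type interval from Smith's large-sample standard error of the one-way ANOVA estimator — can be sketched as follows. The variance expression is the commonly quoted large-sample form for k equal clusters of size m (an assumption here), and the numbers are illustrative.

```python
import math

def smith_ci(rho, k, m, z=1.959964):
    """Wald-type 95% CI for the ICC based on Smith's large-sample
    variance of the one-way ANOVA estimator (commonly quoted form):
        Var(rho_hat) ~ 2 (1-rho)^2 (1+(m-1) rho)^2 / (m (m-1) (k-1))
    for k equal-sized clusters of size m."""
    var = (2 * (1 - rho) ** 2 * (1 + (m - 1) * rho) ** 2
           / (m * (m - 1) * (k - 1)))
    half = z * math.sqrt(var)
    # The ICC for cluster size m is bounded below by -1/(m - 1).
    return max(rho - half, -1.0 / (m - 1)), min(rho + half, 1.0)

# Work-around from the study: divide, say, 8 clusters of 400 into
# sub-clusters of 5, then apply Smith's method to the 640 sub-clusters.
lo_, hi_ = smith_ci(0.02, k=8 * 80, m=5)
print(lo_, hi_)
```

    Splitting changes k and m but not the total sample size; the study's finding is that the interval's *coverage* (not its width) is what improves when Smith's method is applied to many small sub-clusters and the ICC is small.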

  19. An adaptive two-stage sequential design for sampling rare and clustered populations

    USGS Publications Warehouse

    Brown, J.A.; Salehi, M.M.; Moradi, M.; Bell, G.; Smith, D.R.

    2008-01-01

    How to design an efficient large-area survey continues to be an interesting question for ecologists. In sampling large areas, as is common in environmental studies, adaptive sampling can be efficient because it ensures survey effort is targeted to subareas of high interest. In two-stage sampling, higher density primary sample units are usually of more interest than lower density primary units when populations are rare and clustered. Two-stage sequential sampling has been suggested as a method for allocating second stage sample effort among primary units. Here, we suggest a modification: adaptive two-stage sequential sampling. In this method, the adaptive part of the allocation process means the design is more flexible in how much extra effort can be directed to higher-abundance primary units. We discuss how best to design an adaptive two-stage sequential sample. © 2008 The Society of Population Ecology and Springer.
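    As a loose illustration of the adaptive allocation idea only: the rule below, its parameters, and the unit names are all invented for this sketch and are not taken from the paper. It boosts second-stage effort in primary units whose first-wave counts exceed a threshold.

```python
def allocate_second_stage(first_wave_counts, base_n=5, extra_n=10, threshold=2):
    """Hypothetical adaptive allocation rule: every primary unit receives
    `base_n` second-stage units; units whose first-wave count exceeds
    `threshold` (suggesting a local abundance cluster) receive `extra_n` more.
    """
    return {unit: base_n + (extra_n if count > threshold else 0)
            for unit, count in first_wave_counts.items()}
```

    For example, a primary unit with no detections in the first wave keeps the base allocation, while a high-count unit is sampled more intensively.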

  20. Automated image analysis for quantitative fluorescence in situ hybridization with environmental samples.

    PubMed

    Zhou, Zhi; Pons, Marie Noëlle; Raskin, Lutgarde; Zilles, Julie L

    2007-05-01

    When fluorescence in situ hybridization (FISH) analyses are performed with complex environmental samples, difficulties related to the presence of microbial cell aggregates and nonuniform background fluorescence are often encountered. The objective of this study was to develop a robust and automated quantitative FISH method for complex environmental samples, such as manure and soil. The method and duration of sample dispersion were optimized to reduce the interference of cell aggregates. An automated image analysis program that detects cells from 4',6-diamidino-2-phenylindole (DAPI) micrographs and extracts the maximum and mean fluorescence intensities for each cell from corresponding FISH images was developed with the software Visilog. Intensity thresholds were not consistent even for duplicate analyses, so alternative ways of classifying signals were investigated. In the resulting method, the intensity data were divided into clusters using fuzzy c-means clustering, and the resulting clusters were classified as target (positive) or nontarget (negative). A manual quality control confirmed this classification. With this method, 50.4, 72.1, and 64.9% of the cells in two swine manure samples and one soil sample, respectively, were positive as determined with a 16S rRNA-targeted bacterial probe (S-D-Bact-0338-a-A-18). Manual counting resulted in corresponding values of 52.3, 70.6, and 61.5%, respectively. In the two swine manure samples and the soil sample, 21.6, 12.3, and 2.5% of the cells, respectively, were positive with an archaeal probe (S-D-Arch-0915-a-A-20). Manual counting resulted in corresponding values of 22.4, 14.0, and 2.9%, respectively. This automated method should facilitate quantitative analysis of FISH images for a variety of complex environmental samples.
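    The classification step can be sketched with a plain one-dimensional fuzzy c-means. This is a minimal stand-in, not the Visilog pipeline; the intensity values, the deterministic centre initialisation across the data range, and the fuzziness exponent m = 2 are all assumptions of this sketch.

```python
def fuzzy_cmeans_1d(xs, c=2, m=2.0, iters=50):
    """Plain 1-D fuzzy c-means on a list of intensities `xs`.

    Returns the cluster centres and the membership matrix (one row per point).
    Centres are initialised evenly across the data range, so they stay sorted
    for well-separated data.
    """
    lo, hi = min(xs), max(xs)
    centres = [lo + (hi - lo) * j / (c - 1) for j in range(c)]
    for _ in range(iters):
        u = []
        for x in xs:
            d = [abs(x - v) or 1e-12 for v in centres]  # guard zero distance
            # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))
            u.append([1.0 / sum((d[j] / d[k]) ** (2 / (m - 1)) for k in range(c))
                      for j in range(c)])
        # Centre update: mean of the data weighted by u^m
        centres = [sum(u[i][j] ** m * xs[i] for i in range(len(xs))) /
                   sum(u[i][j] ** m for i in range(len(xs))) for j in range(c)]
    return centres, u
```

    With two clusters, cells whose membership is highest in the brighter-centred cluster would be called positive, mirroring the target/nontarget classification described above.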

  1. The cluster-cluster correlation function [of galaxies]

    NASA Technical Reports Server (NTRS)

    Postman, M.; Geller, M. J.; Huchra, J. P.

    1986-01-01

    The clustering properties of the Abell and Zwicky cluster catalogs are studied using the two-point angular and spatial correlation functions. The catalogs are divided into eight subsamples to determine the dependence of the correlation function on distance, richness, and the method of cluster identification. It is found that the Corona Borealis supercluster contributes significant power to the spatial correlation function of the Abell cluster sample with distance class of four or less. The distance-limited catalog of 152 Abell clusters, which is not greatly affected by a single system, has a spatial correlation function consistent with the power law ξ(r) = 300 r^(-1.8). In both the distance class four or less and distance-limited samples the signal in the spatial correlation function is a power law detectable out to 60 h^(-1) Mpc. The amplitude of ξ(r) for clusters of richness class two is about three times that for richness class one clusters. The two-point spatial correlation function is sensitive to the use of estimated redshifts.
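    The quoted power law, ξ(r) = 300 r^(-1.8) with r in h^(-1) Mpc, is straightforward to evaluate. The helper below also recovers the implied correlation length r0, the scale where ξ(r0) = 1, which follows from A · r0^(-γ) = 1, i.e. r0 = A^(1/γ).

```python
def xi(r, amplitude=300.0, gamma=1.8):
    """Power-law two-point correlation function ξ(r) = A · r^(−γ),
    with r in units of h^(-1) Mpc (defaults match the Abell sample fit)."""
    return amplitude * r ** (-gamma)

def correlation_length(amplitude=300.0, gamma=1.8):
    """Scale r0 where ξ(r0) = 1, i.e. r0 = A^(1/γ)."""
    return amplitude ** (1.0 / gamma)
```

    For the quoted amplitude this gives r0 ≈ 24 h^(-1) Mpc, the familiar cluster correlation length scale.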

  2. Coarse Point Cloud Registration by Egi Matching of Voxel Clusters

    NASA Astrophysics Data System (ADS)

    Wang, Jinhu; Lindenbergh, Roderik; Shen, Yueqian; Menenti, Massimo

    2016-06-01

    Laser scanning samples the surface geometry of objects efficiently and records versatile information as point clouds. However, often more scans are required to fully cover a scene. Therefore, a registration step is required that transforms the different scans into a common coordinate system. The registration of point clouds is usually conducted in two steps, i.e. coarse registration followed by fine registration. In this study an automatic marker-free coarse registration method for pair-wise scans is presented. First the two input point clouds are re-sampled as voxels and dimensionality features of the voxels are determined by principal component analysis (PCA). Then voxel cells with the same dimensionality are clustered. Next, the Extended Gaussian Image (EGI) descriptors of those voxel clusters are constructed using significant eigenvectors of each voxel in the cluster. Correspondences between clusters in source and target data are obtained according to the similarity between their EGI descriptors. The random sample consensus (RANSAC) algorithm is employed to remove outlying correspondences until a coarse alignment is obtained. If necessary, a fine registration is performed in a final step. This new method is illustrated on scan data sampling two indoor scenarios. The results of the tests are evaluated by computing the point to point distance between the two input point clouds. The presented two tests resulted in mean distances of 7.6 mm and 9.5 mm respectively, which are adequate for fine registration.
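    The PCA dimensionality labelling of voxels can be sketched as follows. This is a simplified stand-in for the paper's feature step: the eigenvalue-ratio threshold `tol` and the three labels are assumptions of this sketch, not values from the paper.

```python
import numpy as np

def voxel_dimensionality(points, tol=0.1):
    """Label a voxel's point set as 'linear', 'planar' or 'volumetric'
    from the sorted eigenvalues of its 3x3 covariance matrix (PCA)."""
    pts = np.asarray(points, float)
    evals = np.linalg.eigvalsh(np.cov(pts.T))[::-1]  # descending: λ1 ≥ λ2 ≥ λ3
    l1, l2, l3 = evals / evals.sum()
    if l2 < tol * l1:
        return "linear"       # one dominant direction (e.g. an edge)
    if l3 < tol * l2:
        return "planar"       # two dominant directions (e.g. a wall)
    return "volumetric"       # spread in all three directions
```

    Voxels sharing a label would then be grouped before the EGI descriptors of the resulting clusters are compared.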

  3. A method for determining the radius of an open cluster from stellar proper motions

    NASA Astrophysics Data System (ADS)

    Sánchez, Néstor; Alfaro, Emilio J.; López-Martínez, Fátima

    2018-04-01

    We propose a method for calculating the radius of an open cluster in an objective way from an astrometric catalogue containing, at least, positions and proper motions. It uses the minimum spanning tree in the proper motion space to discriminate cluster stars from field stars and it quantifies the strength of the cluster-field separation by means of a statistical parameter defined for the first time in this paper. This is done for a range of different sampling radii from where the cluster radius is obtained as the size at which the best cluster-field separation is achieved. The novelty of this strategy is that the cluster radius is obtained independently of how its stars are spatially distributed. We test the reliability and robustness of the method with both simulated and real data from a well-studied open cluster (NGC 188), and apply it to UCAC4 data for five other open clusters with different catalogued radius values. NGC 188, NGC 1647, NGC 6603, and Ruprecht 155 yielded unambiguous radius values of 15.2 ± 1.8, 29.4 ± 3.4, 4.2 ± 1.7, and 7.0 ± 0.3 arcmin, respectively. ASCC 19 and Collinder 471 showed more than one possible solution, but it is not possible to know whether this is due to the involved uncertainties or due to the presence of complex patterns in their proper motion distributions, something that could be inherent to the physical object or due to the way in which the catalogue was sampled.
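    The first step, building the minimum spanning tree in proper-motion space, can be sketched with Prim's algorithm. This is a generic MST illustration; the cluster-field separation statistic the paper defines is not reproduced here.

```python
import math

def mst_edge_lengths(points):
    """Prim's algorithm: sorted edge lengths of the minimum spanning tree
    of points in (pmRA, pmDec) space. Cluster members, sharing a common
    proper motion, connect through short edges; field stars through long ones.
    """
    n = len(points)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Shortest edge joining the current tree to a point outside it
        d, j = min((math.dist(points[i], points[k]), k)
                   for i in in_tree for k in range(n) if k not in in_tree)
        in_tree.add(j)
        edges.append(d)
    return sorted(edges)
```

    A gap between the short intra-cluster edges and the long field-star edges is the kind of structure a cluster-field separation statistic can exploit.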

  4. A nonparametric method to generate synthetic populations to adjust for complex sampling design features.

    PubMed

    Dong, Qi; Elliott, Michael R; Raghunathan, Trivellore E

    2014-06-01

    Outside of the survey sampling literature, samples are often assumed to be generated by a simple random sampling process that produces independent and identically distributed (IID) samples. Many statistical methods are developed largely in this IID world. Application of these methods to data from complex sample surveys without making allowance for the survey design features can lead to erroneous inferences. Hence, much time and effort have been devoted to developing statistical methods that analyze complex survey data and account for the sample design. This issue is particularly important when generating synthetic populations using finite population Bayesian inference, as is often done in missing data or disclosure risk settings, or when combining data from multiple surveys. By extending previous work in the finite population Bayesian bootstrap literature, we propose a method to generate synthetic populations from a posterior predictive distribution in a fashion that inverts the complex sampling design features and generates simple random samples from a superpopulation point of view, adjusting the complex data so that they can be analyzed as simple random samples. We consider a simulation study with a stratified, clustered unequal-probability of selection sample design, and use the proposed nonparametric method to generate synthetic populations for the 2006 National Health Interview Survey (NHIS), and the Medical Expenditure Panel Survey (MEPS), which are stratified, clustered unequal-probability of selection sample designs.
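    The Pólya-urn mechanism underlying finite population Bayesian bootstrap methods can be sketched as below. This is a loose, weight-initialised urn illustration only, not the authors' design-adjusted algorithm; the update rule and parameters are assumptions of this sketch.

```python
import random

def synthetic_population(sample, weights, N, seed=0):
    """Pólya-urn sketch: grow a synthetic population of size N by drawing
    sampled units with probability proportional to their current weight,
    then incrementing the drawn unit's weight by one (rich-get-richer)."""
    rng = random.Random(seed)
    w = list(map(float, weights))
    pop = []
    for _ in range(N):
        i = rng.choices(range(len(sample)), weights=w)[0]
        pop.append(sample[i])
        w[i] += 1.0  # urn update: the selected unit becomes more likely
    return pop
```

    Initialising the urn with survey case weights is the intuition for how a weighted bootstrap can "undo" unequal selection probabilities when generating a superpopulation.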

  5. A nonparametric method to generate synthetic populations to adjust for complex sampling design features

    PubMed Central

    Dong, Qi; Elliott, Michael R.; Raghunathan, Trivellore E.

    2017-01-01

    Outside of the survey sampling literature, samples are often assumed to be generated by a simple random sampling process that produces independent and identically distributed (IID) samples. Many statistical methods are developed largely in this IID world. Application of these methods to data from complex sample surveys without making allowance for the survey design features can lead to erroneous inferences. Hence, much time and effort have been devoted to developing statistical methods that analyze complex survey data and account for the sample design. This issue is particularly important when generating synthetic populations using finite population Bayesian inference, as is often done in missing data or disclosure risk settings, or when combining data from multiple surveys. By extending previous work in the finite population Bayesian bootstrap literature, we propose a method to generate synthetic populations from a posterior predictive distribution in a fashion that inverts the complex sampling design features and generates simple random samples from a superpopulation point of view, adjusting the complex data so that they can be analyzed as simple random samples. We consider a simulation study with a stratified, clustered unequal-probability of selection sample design, and use the proposed nonparametric method to generate synthetic populations for the 2006 National Health Interview Survey (NHIS), and the Medical Expenditure Panel Survey (MEPS), which are stratified, clustered unequal-probability of selection sample designs. PMID:29200608

  6. Topology in two dimensions. II - The Abell and ACO cluster catalogues

    NASA Astrophysics Data System (ADS)

    Plionis, Manolis; Valdarnini, Riccardo; Coles, Peter

    1992-09-01

    We apply a method for quantifying the topology of projected galaxy clustering to the Abell and ACO catalogues of rich clusters. We use numerical simulations to quantify the statistical bias involved in using high peaks to define the large-scale structure, and we use the results obtained to correct our observational determinations for this known selection effect and also for possible errors introduced by boundary effects. We find that the Abell cluster sample is consistent with clusters being identified with high peaks of a Gaussian random field, but that the ACO sample shows a slight "meatball" shift away from Gaussian behavior over and above that expected purely from the high-peak selection. The most conservative explanation of this effect is that it is caused by some artefact of the procedure used to select the clusters in the two samples.

  7. Evaluation of hierarchical agglomerative cluster analysis methods for discrimination of primary biological aerosol

    NASA Astrophysics Data System (ADS)

    Crawford, I.; Ruske, S.; Topping, D. O.; Gallagher, M. W.

    2015-11-01

    In this paper we present improved methods for discriminating and quantifying primary biological aerosol particles (PBAPs) by applying hierarchical agglomerative cluster analysis to multi-parameter ultraviolet-light-induced fluorescence (UV-LIF) spectrometer data. The methods employed in this study can be applied to data sets in excess of 1 × 106 points on a desktop computer, allowing for each fluorescent particle in a data set to be explicitly clustered. This reduces the potential for misattribution found in subsampling and comparative attribution methods used in previous approaches, improving our capacity to discriminate and quantify PBAP meta-classes. We evaluate the performance of several hierarchical agglomerative cluster analysis linkages and data normalisation methods using laboratory samples of known particle types and an ambient data set. Fluorescent and non-fluorescent polystyrene latex spheres were sampled with a Wideband Integrated Bioaerosol Spectrometer (WIBS-4) where the optical size, asymmetry factor and fluorescent measurements were used as inputs to the analysis package. It was found that the Ward linkage with z-score or range normalisation performed best, correctly attributing 98 and 98.1 % of the data points respectively. The best-performing methods were applied to the BEACHON-RoMBAS (Bio-hydro-atmosphere interactions of Energy, Aerosols, Carbon, H2O, Organics and Nitrogen-Rocky Mountain Biogenic Aerosol Study) ambient data set, where it was found that the z-score and range normalisation methods yield similar results, with each method producing clusters representative of fungal spores and bacterial aerosol, consistent with previous results. 
The z-score result was compared to clusters generated with previous approaches (WIBS AnalysiS Program, WASP), where we observe that the subsampling and comparative attribution method employed by WASP results in the overestimation of the fungal spore concentration by a factor of 1.5 and the underestimation of bacterial aerosol concentration by a factor of 5. We suggest that this is likely due to errors arising from misattribution caused by poor centroid definition and by failure to assign particles to a cluster, both consequences of the subsampling and comparative attribution method employed by WASP. The methods used here allow for the entire fluorescent population of particles to be analysed, yielding an explicit cluster attribution for each particle and improving cluster centroid definition and our capacity to discriminate and quantify PBAP meta-classes compared to previous approaches.
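    The best-performing combination, Ward linkage with z-score (or range) normalisation, can be sketched with SciPy. This assumes `scipy` and `numpy` are available; unlike the paper's workflow, the number of flat clusters is supplied by the user here.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def ward_cluster(data, n_clusters, norm="zscore"):
    """Normalise each feature column (z-score or range), then cut a
    Ward-linkage dendrogram into `n_clusters` flat clusters."""
    X = np.asarray(data, float)
    if norm == "zscore":
        X = (X - X.mean(axis=0)) / X.std(axis=0)
    elif norm == "range":
        X = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
    # Hierarchical agglomerative clustering with Ward's minimum-variance linkage
    return fcluster(linkage(X, method="ward"), n_clusters, criterion="maxclust")
```

    In the study's setting, each row would be a particle's optical size, asymmetry factor and fluorescence channels, so every fluorescent particle receives an explicit cluster label.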

  8. A ground truth based comparative study on clustering of gene expression data.

    PubMed

    Zhu, Yitan; Wang, Zuyi; Miller, David J; Clarke, Robert; Xuan, Jianhua; Hoffman, Eric P; Wang, Yue

    2008-05-01

    Given the variety of available clustering methods for gene expression data analysis, it is important to develop an appropriate and rigorous validation scheme to assess the performance and limitations of the most widely used clustering algorithms. In this paper, we present a ground truth based comparative study on the functionality, accuracy, and stability of five data clustering methods, namely hierarchical clustering, K-means clustering, self-organizing maps, standard finite normal mixture fitting, and a caBIG toolkit (VIsual Statistical Data Analyzer--VISDA), tested on sample clustering of seven published microarray gene expression datasets and one synthetic dataset. We examined the performance of these algorithms in both data-sufficient and data-insufficient cases using quantitative performance measures, including cluster number detection accuracy and mean and standard deviation of partition accuracy. The experimental results showed that VISDA, an interactive coarse-to-fine maximum likelihood fitting algorithm, is a solid performer on most of the datasets, while K-means clustering and self-organizing maps optimized by the mean squared compactness criterion generally produce more stable solutions than the other methods.
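    Partition accuracy, one of the quantitative measures mentioned, can be computed by maximising agreement over relabellings of the predicted clusters. The brute-force sketch below is a generic illustration (feasible only for small numbers of clusters, and assuming no more predicted clusters than true classes), not the paper's exact evaluation code.

```python
from itertools import permutations

def partition_accuracy(true_labels, pred_labels):
    """Best-match partition accuracy: fraction of samples correctly assigned
    under the most favourable one-to-one relabelling of predicted clusters."""
    classes = sorted(set(true_labels))
    clusters = sorted(set(pred_labels))
    best = 0
    for perm in permutations(classes):
        mapping = dict(zip(clusters, perm))  # assign each cluster a class label
        hits = sum(mapping[p] == t for t, p in zip(true_labels, pred_labels))
        best = max(best, hits)
    return best / len(true_labels)
```

    Running such a measure over repeated subsamples yields the mean and standard deviation of partition accuracy used to compare the five algorithms.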

  9. A multitask clustering approach for single-cell RNA-seq analysis in Recessive Dystrophic Epidermolysis Bullosa

    PubMed Central

    Petegrosso, Raphael; Tolar, Jakub

    2018-01-01

    Single-cell RNA sequencing (scRNA-seq) has been widely applied to discover new cell types by detecting sub-populations in a heterogeneous group of cells. Since scRNA-seq experiments have lower read coverage/tag counts and introduce more technical biases compared to bulk RNA-seq experiments, the limited number of sampled cells combined with the experimental biases and other dataset specific variations presents a challenge to cross-dataset analysis and discovery of relevant biological variations across multiple cell populations. In this paper, we introduce a method of variance-driven multitask clustering of single-cell RNA-seq data (scVDMC) that utilizes multiple single-cell populations from biological replicates or different samples. scVDMC clusters single cells in multiple scRNA-seq experiments of similar cell types and markers but varying expression patterns such that the scRNA-seq data are better integrated than typical pooled analyses which only increase the sample size. By controlling the variance among the cell clusters within each dataset and across all the datasets, scVDMC detects cell sub-populations in each individual experiment with shared cell-type markers but varying cluster centers among all the experiments. Applied to two real scRNA-seq datasets with several replicates and one large-scale droplet-based dataset on three patient samples, scVDMC more accurately detected cell populations and known cell markers than pooled clustering and other recently proposed scRNA-seq clustering methods. In the case study applied to in-house Recessive Dystrophic Epidermolysis Bullosa (RDEB) scRNA-seq data, scVDMC revealed several new cell types and unknown markers validated by flow cytometry. MATLAB/Octave code available at https://github.com/kuanglab/scVDMC. PMID:29630593

  10. Multiple Imputation in Two-Stage Cluster Samples Using The Weighted Finite Population Bayesian Bootstrap.

    PubMed

    Zhou, Hanzhi; Elliott, Michael R; Raghunathan, Trivellore E

    2016-06-01

    Multistage sampling is often employed in survey samples for cost and convenience. However, accounting for clustering features when generating datasets for multiple imputation is a nontrivial task, particularly when, as is often the case, cluster sampling is accompanied by unequal probabilities of selection, necessitating case weights. Thus, multiple imputation often ignores complex sample designs and assumes simple random sampling when generating imputations, even though failing to account for complex sample design features is known to yield biased estimates and confidence intervals that have incorrect nominal coverage. In this article, we extend a recently developed, weighted, finite-population Bayesian bootstrap procedure to generate synthetic populations conditional on complex sample design data that can be treated as simple random samples at the imputation stage, obviating the need to directly model design features for imputation. We develop two forms of this method: one where the probabilities of selection are known at the first and second stages of the design, and the other, more common in public use files, where only the final weight based on the product of the two probabilities is known. We show that this method has advantages in terms of bias, mean square error, and coverage properties over methods where sample designs are ignored, with little loss in efficiency, even when compared with correct fully parametric models. An application is made using the National Automotive Sampling System Crashworthiness Data System, a multistage, unequal probability sample of U.S. passenger vehicle crashes, which suffers from a substantial amount of missing data in "Delta-V," a key crash severity measure.

  11. Multiple Imputation in Two-Stage Cluster Samples Using The Weighted Finite Population Bayesian Bootstrap

    PubMed Central

    Zhou, Hanzhi; Elliott, Michael R.; Raghunathan, Trivellore E.

    2017-01-01

    Multistage sampling is often employed in survey samples for cost and convenience. However, accounting for clustering features when generating datasets for multiple imputation is a nontrivial task, particularly when, as is often the case, cluster sampling is accompanied by unequal probabilities of selection, necessitating case weights. Thus, multiple imputation often ignores complex sample designs and assumes simple random sampling when generating imputations, even though failing to account for complex sample design features is known to yield biased estimates and confidence intervals that have incorrect nominal coverage. In this article, we extend a recently developed, weighted, finite-population Bayesian bootstrap procedure to generate synthetic populations conditional on complex sample design data that can be treated as simple random samples at the imputation stage, obviating the need to directly model design features for imputation. We develop two forms of this method: one where the probabilities of selection are known at the first and second stages of the design, and the other, more common in public use files, where only the final weight based on the product of the two probabilities is known. We show that this method has advantages in terms of bias, mean square error, and coverage properties over methods where sample designs are ignored, with little loss in efficiency, even when compared with correct fully parametric models. An application is made using the National Automotive Sampling System Crashworthiness Data System, a multistage, unequal probability sample of U.S. passenger vehicle crashes, which suffers from a substantial amount of missing data in “Delta-V,” a key crash severity measure. PMID:29226161

  12. Identifying sighting clusters of endangered taxa with historical records.

    PubMed

    Duffy, Karl J

    2011-04-01

    The probability and time of extinction of taxa is often inferred from statistical analyses of historical records. Many of these analyses require the exclusion of multiple records within a unit of time (i.e., a month or a year). Nevertheless, spatially explicit, temporally aggregated data may be useful for identifying clusters of sightings (i.e., sighting clusters) in space and time. Identification of sighting clusters highlights changes in the historical recording of endangered taxa. I used two methods to identify sighting clusters in historical records: the Ederer-Myers-Mantel (EMM) test and the space-time permutation scan (STPS). I applied these methods to the spatially explicit sighting records of three species of orchids that are listed as endangered in the Republic of Ireland under the Wildlife Act (1976): Cephalanthera longifolia, Hammarbya paludosa, and Pseudorchis albida. Results with the EMM test were strongly affected by the choice of the time interval, and thus the number of temporal samples, used to examine the records. For example, sightings of P. albida clustered when the records were partitioned into 20-year temporal samples, but not when they were partitioned into 22-year temporal samples. Because the statistical power of EMM was low, it will not be useful when data are sparse. Nevertheless, the STPS identified regions that contained sighting clusters because it uses a flexible scanning window (defined by cylinders of varying size that move over the study area and evaluate the likelihood of clustering) to detect them, and it identified regions with high and regions with low rates of orchid sightings. The STPS analyses can be used to detect sighting clusters of endangered species that may be related to regions of extirpation and may assist in the categorization of threat status. ©2010 Society for Conservation Biology.

  13. Quality of reporting of pilot and feasibility cluster randomised trials: a systematic review

    PubMed Central

    Chan, Claire L; Leyrat, Clémence; Eldridge, Sandra M

    2017-01-01

    Objectives To systematically review the quality of reporting of pilot and feasibility cluster randomised trials (CRTs). In particular, to assess (1) the number of pilot CRTs conducted between 1 January 2011 and 31 December 2014, (2) whether objectives and methods are appropriate and (3) reporting quality. Methods We searched PubMed (2011–2014) for CRTs with ‘pilot’ or ‘feasibility’ in the title or abstract that were assessing some element of feasibility and showed evidence the study was in preparation for a main effectiveness/efficacy trial. Quality assessment criteria were based on the Consolidated Standards of Reporting Trials (CONSORT) extensions for pilot trials and CRTs. Results Eighteen pilot CRTs were identified. Forty-four per cent did not have feasibility as their primary objective, and many (50%) performed formal hypothesis testing for effectiveness/efficacy despite being underpowered. Most (83%) included ‘pilot’ or ‘feasibility’ in the title, and discussed implications for progression from the pilot to the future definitive trial (89%), but fewer reported reasons for the randomised pilot trial (39%), sample size rationale (44%) or progression criteria (17%). Most defined the cluster (100%), and number of clusters randomised (94%), but few reported how the cluster design affected sample size (17%), whether consent was sought from clusters (11%), or who enrolled clusters (17%). Conclusions That only 18 pilot CRTs were identified necessitates increased awareness of the importance of conducting and publishing pilot CRTs and improved reporting. Pilot CRTs should primarily be assessing feasibility, avoiding formal hypothesis testing for effectiveness/efficacy and reporting reasons for the pilot, sample size rationale and progression criteria, as well as enrolment of clusters, and how the cluster design affects design aspects. We recommend adherence to the CONSORT extensions for pilot trials and CRTs. PMID:29122791

  14. Cluster-randomized Studies in Educational Research: Principles and Methodological Aspects

    PubMed Central

    Dreyhaupt, Jens; Mayer, Benjamin; Keis, Oliver; Öchsner, Wolfgang; Muche, Rainer

    2017-01-01

    An increasing number of studies are being performed in educational research to evaluate new teaching methods and approaches. These studies could be performed more efficiently and deliver more convincing results if they more strictly applied and complied with recognized standards of scientific studies. Such an approach could substantially increase the quality in particular of prospective, two-arm (intervention) studies that aim to compare two different teaching methods. A key standard in such studies is randomization, which can minimize systematic bias in study findings; such bias may result if the two study arms are not structurally equivalent. If possible, educational research studies should also achieve this standard, although this is not yet generally the case. Some difficulties and concerns exist, particularly regarding organizational and methodological aspects. An important point to consider in educational research studies is that usually individuals cannot be randomized, because of the teaching situation, and instead whole groups have to be randomized (so-called “cluster randomization”). Compared with studies with individual randomization, studies with cluster randomization normally require (significantly) larger sample sizes and more complex methods for calculating sample size. Furthermore, cluster-randomized studies require more complex methods for statistical analysis. The consequence of the above is that a competent expert with respective special knowledge needs to be involved in all phases of cluster-randomized studies. Studies to evaluate new teaching methods need to make greater use of randomization in order to achieve scientifically convincing results. Therefore, in this article we describe the general principles of cluster randomization and how to implement these principles, and we also outline practical aspects of using cluster randomization in prospective, two-arm comparative educational research studies. PMID:28584874
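    The sample-size inflation described above is conventionally captured by the design effect, DEFF = 1 + (m − 1) · ICC for clusters of size m. The helper below applies this standard formula; it is a general illustration, not code from the article.

```python
import math

def cluster_trial_sample_size(n_individual, cluster_size, icc):
    """Inflate the sample size required under individual randomization by
    the design effect DEFF = 1 + (m - 1) * ICC for clusters of size m."""
    deff = 1 + (cluster_size - 1) * icc
    return math.ceil(n_individual * deff)
```

    Even a modest intraclass correlation can double the required sample size when clusters (e.g. whole seminar groups) are large, which is why cluster-randomized studies need the more complex sample size methods the authors describe.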

  15. Cluster-randomized Studies in Educational Research: Principles and Methodological Aspects.

    PubMed

    Dreyhaupt, Jens; Mayer, Benjamin; Keis, Oliver; Öchsner, Wolfgang; Muche, Rainer

    2017-01-01

    An increasing number of studies are being performed in educational research to evaluate new teaching methods and approaches. These studies could be performed more efficiently and deliver more convincing results if they more strictly applied and complied with recognized standards of scientific studies. Such an approach could substantially increase the quality in particular of prospective, two-arm (intervention) studies that aim to compare two different teaching methods. A key standard in such studies is randomization, which can minimize systematic bias in study findings; such bias may result if the two study arms are not structurally equivalent. If possible, educational research studies should also achieve this standard, although this is not yet generally the case. Some difficulties and concerns exist, particularly regarding organizational and methodological aspects. An important point to consider in educational research studies is that usually individuals cannot be randomized, because of the teaching situation, and instead whole groups have to be randomized (so-called "cluster randomization"). Compared with studies with individual randomization, studies with cluster randomization normally require (significantly) larger sample sizes and more complex methods for calculating sample size. Furthermore, cluster-randomized studies require more complex methods for statistical analysis. The consequence of the above is that a competent expert with respective special knowledge needs to be involved in all phases of cluster-randomized studies. Studies to evaluate new teaching methods need to make greater use of randomization in order to achieve scientifically convincing results. Therefore, in this article we describe the general principles of cluster randomization and how to implement these principles, and we also outline practical aspects of using cluster randomization in prospective, two-arm comparative educational research studies.

  16. Thin-layer chromatographic identification of Chinese propolis using chemometric fingerprinting.

    PubMed

    Tang, Tie-xin; Guo, Wei-yan; Xu, Ye; Zhang, Si-ming; Xu, Xin-jun; Wang, Dong-mei; Zhao, Zhi-min; Zhu, Long-ping; Yang, De-po

    2014-01-01

    Poplar tree gum has a similar chemical composition and appearance to Chinese propolis (bee glue) and has been widely used as a counterfeit propolis because Chinese propolis is typically the poplar-type propolis, the chemical composition of which is determined mainly by the resin of poplar trees. The discrimination of Chinese propolis from poplar tree gum is a challenging task. To develop a rapid thin-layer chromatographic (TLC) identification method using chemometric fingerprinting to discriminate Chinese propolis from poplar tree gum. A new TLC method using a combination of ammonia and hydrogen peroxide vapours as the visualisation reagent was developed to characterise the chemical profile of Chinese propolis. Three separate people performed TLC on eight Chinese propolis samples and three poplar tree gum samples of varying origins. Five chemometric methods, including similarity analysis, hierarchical clustering, k-means clustering, neural network and support vector machine, were compared for use in classifying the samples based on their densitograms obtained from the TLC chromatograms via image analysis. Hierarchical clustering, neural network and support vector machine analyses achieved a correct classification rate of 100% in classifying the samples. A strategy for TLC identification of Chinese propolis using chemometric fingerprinting was proposed and it provided accurate sample classification. The study has shown that the TLC identification method using chemometric fingerprinting is a rapid, low-cost method for the discrimination of Chinese propolis from poplar tree gum and may be used for the quality control of Chinese propolis. Copyright © 2014 John Wiley & Sons, Ltd.

  17. Cooling rate dependence of structural order in Al90Sm10 metallic glass

    NASA Astrophysics Data System (ADS)

    Sun, Yang; Zhang, Yue; Zhang, Feng; Ye, Zhuo; Ding, Zejun; Wang, Cai-Zhuang; Ho, Kai-Ming

    2016-07-01

    The atomic structure of Al90Sm10 metallic glass is studied using molecular dynamics simulations. By performing a long sub-Tg annealing, we developed a glass model closer to the experiments than the models prepared by continuous cooling. Using the cluster alignment method, we found that "3661" cluster is the dominating short-range order in the glass samples. The connection and arrangement of "3661" clusters, which define the medium-range order in the system, are enhanced significantly in the sub-Tg annealed sample as compared with the fast cooled glass samples. Unlike some strong binary glass formers such as Cu64.5Zr35.5, the clusters representing the short-range order do not form an interconnected interpenetrating network in Al90Sm10, which has only marginal glass formability.

  18. Two-Phase and Graph-Based Clustering Methods for Accurate and Efficient Segmentation of Large Mass Spectrometry Images.

    PubMed

    Dexter, Alex; Race, Alan M; Steven, Rory T; Barnes, Jennifer R; Hulme, Heather; Goodwin, Richard J A; Styles, Iain B; Bunch, Josephine

    2017-11-07

    Clustering is widely used in MSI to segment anatomical features and differentiate tissue types, but existing approaches are both CPU and memory-intensive, limiting their application to small, single data sets. We propose a new approach that uses a graph-based algorithm with a two-phase sampling method that overcomes this limitation. We demonstrate the algorithm on a range of sample types and show that it can segment anatomical features that are not identified using commonly employed algorithms in MSI, and we validate our results on synthetic MSI data. We show that the algorithm is robust to fluctuations in data quality by successfully clustering data with a designed-in variance using data acquired with varying laser fluence. Finally, we show that this method is capable of generating accurate segmentations of large MSI data sets acquired on the newest generation of MSI instruments and evaluate these results by comparison with histopathology.
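
    The two-phase idea (cluster only a small subsample of spectra, then cheaply assign the remaining spectra to the nearest phase-1 centroid) can be sketched as follows. A simple greedy leader clustering stands in for the paper's graph-based algorithm, so only the memory-saving sampling scheme is illustrated; the data are synthetic.

```python
# Sketch of a two-phase strategy for clustering large spectral images:
# phase 1 clusters a random subsample; phase 2 assigns everything else
# to its nearest phase-1 centroid.
import random

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def leader_cluster(points, radius):
    """Greedy phase-1 clustering: each point joins the first centroid
    within `radius`, or founds a new cluster."""
    centroids = []
    for p in points:
        if not any(dist(p, c) <= radius for c in centroids):
            centroids.append(p)
    return centroids

def assign(point, centroids):
    """Phase-2 assignment of an unsampled spectrum."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))

random.seed(0)
# Toy "spectra": two well-separated groups in 2-D.
data = [(random.gauss(0, 0.1), random.gauss(0, 0.1)) for _ in range(500)]
data += [(random.gauss(5, 0.1), random.gauss(5, 0.1)) for _ in range(500)]

subsample = random.sample(data, 50)            # phase 1: cluster 5% of the data
centroids = leader_cluster(subsample, 1.0)
labels = [assign(p, centroids) for p in data]  # phase 2: cheap assignment
print(len(centroids))  # number of groups recovered from the subsample
```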

  19. Segmentation and clustering as complementary sources of information

    NASA Astrophysics Data System (ADS)

    Dale, Michael B.; Allison, Lloyd; Dale, Patricia E. R.

    2007-03-01

    This paper examines the effects of using a segmentation method to identify change-points or edges in vegetation. It enforces coherence (spatial or temporal) in place of unconstrained clustering. The segmentation method involves change-point detection along a sequence of observations so that each cluster formed is composed of adjacent samples; this is a form of constrained clustering. The protocol fits one or more models, one for each section identified, and the quality of each is assessed using a minimum message length criterion, which provides a rational basis for selecting an appropriate model. Although segmentation is less efficient than clustering, it provides additional information because it incorporates textural similarity as well as homogeneity. It can also be useful in determining the various scales of variation present in the data, providing a general method of small-scale pattern analysis.
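
    The constrained-clustering idea can be sketched with a recursive binary segmentation that accepts a split only when it lowers a penalised cost. The fixed per-segment penalty below is a crude stand-in for the paper's minimum message length criterion, and the toy transect is invented.

```python
# Sketch of constrained clustering by change-point detection: each segment
# of adjacent samples is modelled by its mean, and a split is accepted only
# if it lowers a penalised sum-of-squares cost.

def sse(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

def segment(xs, penalty):
    """Recursive binary segmentation; returns change-point indices."""
    best_gain, best_k = 0.0, None
    for k in range(1, len(xs)):
        gain = sse(xs) - sse(xs[:k]) - sse(xs[k:]) - penalty
        if gain > best_gain:
            best_gain, best_k = gain, k
    if best_k is None:
        return []
    left = segment(xs[:best_k], penalty)
    right = [best_k + i for i in segment(xs[best_k:], penalty)]
    return left + [best_k] + right

# Toy transect: vegetation cover jumps at positions 5 and 10.
series = [1.0, 1.1, 0.9, 1.0, 1.1, 5.0, 5.2, 4.9, 5.1, 5.0, 2.0, 2.1, 1.9, 2.0]
print(segment(series, penalty=1.0))  # → [5, 10]
```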

  20. Quantum structural fluctuation in para-hydrogen clusters revealed by the variational path integral method

    NASA Astrophysics Data System (ADS)

    Miura, Shinichi

    2018-03-01

    In this paper, the ground state of para-hydrogen clusters for size regime N ≤ 40 has been studied by our variational path integral molecular dynamics method. Long molecular dynamics calculations have been performed to accurately evaluate ground state properties. The chemical potential of the hydrogen molecule is found to have a zigzag size dependence, indicating the magic number stability for the clusters of the size N = 13, 26, 29, 34, and 39. One-body density of the hydrogen molecule is demonstrated to have a structured profile, not a melted one. The observed magic number stability is examined using the inherent structure analysis. We also have developed a novel method combining our variational path integral hybrid Monte Carlo method with the replica exchange technique. We introduce replicas of the original system bridging from the structured to the melted cluster, which is realized by scaling the potential energy of the system. Using the enhanced sampling method, the clusters are demonstrated to have the structured density profile in the ground state.

  1. Quantum structural fluctuation in para-hydrogen clusters revealed by the variational path integral method.

    PubMed

    Miura, Shinichi

    2018-03-14

    In this paper, the ground state of para-hydrogen clusters for size regime N ≤ 40 has been studied by our variational path integral molecular dynamics method. Long molecular dynamics calculations have been performed to accurately evaluate ground state properties. The chemical potential of the hydrogen molecule is found to have a zigzag size dependence, indicating the magic number stability for the clusters of the size N = 13, 26, 29, 34, and 39. One-body density of the hydrogen molecule is demonstrated to have a structured profile, not a melted one. The observed magic number stability is examined using the inherent structure analysis. We also have developed a novel method combining our variational path integral hybrid Monte Carlo method with the replica exchange technique. We introduce replicas of the original system bridging from the structured to the melted cluster, which is realized by scaling the potential energy of the system. Using the enhanced sampling method, the clusters are demonstrated to have the structured density profile in the ground state.

  2. Substructures in Clusters of Galaxies

    NASA Astrophysics Data System (ADS)

    Lehodey, Brigitte Tome

    2000-01-01

    This dissertation presents two methods for the detection of substructures in clusters of galaxies and the results of their application to a group of four clusters. In chapters 2 and 3, we review the main properties of clusters of galaxies, define substructures, and show why the study of substructures in clusters of galaxies is so important for cosmology. Chapters 4 and 5 describe the two methods: the first, the adaptive kernel, is applied to the study of the spatial and kinematical distribution of the cluster galaxies; the second, the MVM (Multiscale Vision Model), is applied to analyse the cluster's diffuse X-ray emission, i.e., the intracluster gas distribution. At the end of these two chapters, we also present the results of applying these methods to our sample of clusters. In chapter 6, we draw conclusions from the comparison of the results obtained with each method. In the last chapter, we present the main conclusions of this work and point out possible developments. We close with two appendices detailing some questions raised by this work that are not directly linked to the problem of substructure detection.

  3. Cluster lot quality assurance sampling: effect of increasing the number of clusters on classification precision and operational feasibility.

    PubMed

    Okayasu, Hiromasa; Brown, Alexandra E; Nzioki, Michael M; Gasasira, Alex N; Takane, Marina; Mkanda, Pascal; Wassilak, Steven G F; Sutter, Roland W

    2014-11-01

    To assess the quality of supplementary immunization activities (SIAs), the Global Polio Eradication Initiative (GPEI) has used cluster lot quality assurance sampling (C-LQAS) methods since 2009. However, since the inception of C-LQAS, questions have been raised about the optimal balance between operational feasibility and precision of classification of lots to identify areas with low SIA quality that require corrective programmatic action. To determine if an increased precision in classification would result in differential programmatic decision making, we conducted a pilot evaluation in 4 local government areas (LGAs) in Nigeria with an expanded LQAS sample size of 16 clusters (instead of the standard 6 clusters) of 10 subjects each. The results showed greater heterogeneity between clusters than the assumed standard deviation of 10%, ranging from 12% to 23%. Comparing the distribution of 4-outcome classifications obtained from all possible combinations of 6-cluster subsamples to the observed classification of the 16-cluster sample, we obtained an exact match in classification in 56% to 85% of instances. We concluded that the 6-cluster C-LQAS provides acceptable classification precision for programmatic action. Considering the greater resources required to implement an expanded C-LQAS, the improvement in precision was deemed insufficient to warrant the effort. Published by Oxford University Press on behalf of the Infectious Diseases Society of America 2014. This work is written by (a) US Government employee(s) and is in the public domain in the US.
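
    The subsample comparison described above can be sketched as follows: with 16 clusters of 10 subjects each, every possible 6-cluster subsample is classified and compared with the full-sample classification. The per-cluster vaccinated counts and the pass threshold (coverage of at least 80%) are invented for illustration, not the GPEI decision rule.

```python
# Sketch of comparing 6-cluster subsample classifications against the
# observed 16-cluster classification. Counts and threshold are hypothetical.
from itertools import combinations

def classify(vaccinated, total, threshold=0.8):
    return "pass" if vaccinated / total >= threshold else "fail"

# Hypothetical vaccinated counts (out of 10) for 16 clusters.
clusters = [9, 8, 10, 7, 9, 8, 6, 9, 10, 8, 7, 9, 8, 10, 9, 8]

full = classify(sum(clusters), 10 * len(clusters))

subs = list(combinations(clusters, 6))  # all 8008 six-cluster subsamples
matches = sum(1 for sub in subs if classify(sum(sub), 60) == full)
print(f"{matches / len(subs):.0%} of 6-cluster subsamples match the full sample")
```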

  4. Clusternomics: Integrative context-dependent clustering for heterogeneous datasets

    PubMed Central

    Wernisch, Lorenz

    2017-01-01

    Integrative clustering is used to identify groups of samples by jointly analysing multiple datasets describing the same set of biological samples, such as gene expression, copy number, methylation etc. Most existing algorithms for integrative clustering assume that there is a shared consistent set of clusters across all datasets, and most of the data samples follow this structure. However in practice, the structure across heterogeneous datasets can be more varied, with clusters being joined in some datasets and separated in others. In this paper, we present a probabilistic clustering method to identify groups across datasets that do not share the same cluster structure. The proposed algorithm, Clusternomics, identifies groups of samples that share their global behaviour across heterogeneous datasets. The algorithm models clusters on the level of individual datasets, while also extracting global structure that arises from the local cluster assignments. Clusters on both the local and the global level are modelled using a hierarchical Dirichlet mixture model to identify structure on both levels. We evaluated the model both on simulated and on real-world datasets. The simulated data exemplifies datasets with varying degrees of common structure. In such a setting Clusternomics outperforms existing algorithms for integrative and consensus clustering. In a real-world application, we used the algorithm for cancer subtyping, identifying subtypes of cancer from heterogeneous datasets. We applied the algorithm to TCGA breast cancer dataset, integrating gene expression, miRNA expression, DNA methylation and proteomics. The algorithm extracted clinically meaningful clusters with significantly different survival probabilities. We also evaluated the algorithm on lung and kidney cancer TCGA datasets with high dimensionality, again showing clinically significant results and scalability of the algorithm. PMID:29036190

  5. Clusternomics: Integrative context-dependent clustering for heterogeneous datasets.

    PubMed

    Gabasova, Evelina; Reid, John; Wernisch, Lorenz

    2017-10-01

    Integrative clustering is used to identify groups of samples by jointly analysing multiple datasets describing the same set of biological samples, such as gene expression, copy number, methylation etc. Most existing algorithms for integrative clustering assume that there is a shared consistent set of clusters across all datasets, and most of the data samples follow this structure. However in practice, the structure across heterogeneous datasets can be more varied, with clusters being joined in some datasets and separated in others. In this paper, we present a probabilistic clustering method to identify groups across datasets that do not share the same cluster structure. The proposed algorithm, Clusternomics, identifies groups of samples that share their global behaviour across heterogeneous datasets. The algorithm models clusters on the level of individual datasets, while also extracting global structure that arises from the local cluster assignments. Clusters on both the local and the global level are modelled using a hierarchical Dirichlet mixture model to identify structure on both levels. We evaluated the model both on simulated and on real-world datasets. The simulated data exemplifies datasets with varying degrees of common structure. In such a setting Clusternomics outperforms existing algorithms for integrative and consensus clustering. In a real-world application, we used the algorithm for cancer subtyping, identifying subtypes of cancer from heterogeneous datasets. We applied the algorithm to TCGA breast cancer dataset, integrating gene expression, miRNA expression, DNA methylation and proteomics. The algorithm extracted clinically meaningful clusters with significantly different survival probabilities. We also evaluated the algorithm on lung and kidney cancer TCGA datasets with high dimensionality, again showing clinically significant results and scalability of the algorithm.

  6. A multicomponent matched filter cluster confirmation tool for eROSITA: initial application to the RASS and DES-SV data sets

    DOE PAGES

    Klein, M.; Mohr, J. J.; Desai, S.; ...

    2017-11-14

    We describe a multi-component matched filter cluster confirmation tool (MCMF) designed for the study of large X-ray source catalogs produced by the upcoming X-ray all-sky survey mission eROSITA. We apply the method to confirm a sample of 88 clusters with redshifts $0.05

  7. A multicomponent matched filter cluster confirmation tool for eROSITA: initial application to the RASS and DES-SV data sets

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Klein, M.; Mohr, J. J.; Desai, S.

    We describe a multi-component matched filter cluster confirmation tool (MCMF) designed for the study of large X-ray source catalogs produced by the upcoming X-ray all-sky survey mission eROSITA. We apply the method to confirm a sample of 88 clusters with redshifts $0.05

  8. LoCuSS: THE MASS DENSITY PROFILE OF MASSIVE GALAXY CLUSTERS AT z = 0.2

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Okabe, Nobuhiro; Umetsu, Keiichi; Smith, Graham P.

    We present a stacked weak-lensing analysis of an approximately mass-selected sample of 50 galaxy clusters at 0.15 < z < 0.3, based on observations with Suprime-Cam on the Subaru Telescope. We develop a new method for selecting lensed background galaxies from which we estimate that our sample of red background galaxies suffers just 1% contamination. We detect the stacked tangential shear signal from the full sample of 50 clusters, based on this red sample of background galaxies, at a total signal-to-noise ratio of 32.7. The Navarro-Frenk-White model is an excellent fit to the data, yielding sub-10% statistical precision on mass and concentration: M_vir = 7.19^{+0.53}_{-0.50} × 10^14 h^-1 M_sun, c_vir = 5.41^{+0.49}_{-0.45} (c_200 = 4.22^{+0.40}_{-0.36}). Tests of a range of possible systematic errors, including shear calibration and stacking-related issues, indicate that they are subdominant to the statistical errors. The concentration parameter obtained from stacking our approximately mass-selected cluster sample is broadly in line with theoretical predictions. Moreover, the uncertainty on our measurement is comparable with the differences between the different predictions in the literature. Overall, our results highlight the potential for stacked weak-lensing methods to probe the mean mass density profile of cluster-scale dark matter halos with upcoming surveys, including Hyper Suprime-Cam, the Dark Energy Survey, and KiDS.

  9. The MUSE-Wide survey: detection of a clustering signal from Lyman α emitters in the range 3 < z < 6

    NASA Astrophysics Data System (ADS)

    Diener, C.; Wisotzki, L.; Schmidt, K. B.; Herenz, E. C.; Urrutia, T.; Garel, T.; Kerutt, J.; Saust, R. L.; Bacon, R.; Cantalupo, S.; Contini, T.; Guiderdoni, B.; Marino, R. A.; Richard, J.; Schaye, J.; Soucail, G.; Weilbacher, P. M.

    2017-11-01

    We present a clustering analysis of a sample of 238 Ly α emitters at redshift 3 ≲ z ≲ 6 from the MUSE-Wide survey. This survey mosaics extragalactic legacy fields with 1h MUSE pointings to detect statistically relevant samples of emission line galaxies. We analysed the first year observations from MUSE-Wide making use of the clustering signal in the line-of-sight direction. This method relies on comparing pair-counts at close redshifts for a fixed transverse distance and thus exploits the full potential of the redshift range covered by our sample. A clear clustering signal with a correlation length of r0=2.9^{+1.0}_{-1.1} Mpc (comoving) is detected. Whilst this result is based on only about a quarter of the full survey size, it already shows the immense potential of MUSE for efficiently observing and studying the clustering of Ly α emitters.

  10. Deconvoluting simulated metagenomes: the performance of hard- and soft- clustering algorithms applied to metagenomic chromosome conformation capture (3C)

    PubMed Central

    DeMaere, Matthew Z.

    2016-01-01

    Background Chromosome conformation capture, coupled with high throughput DNA sequencing in protocols like Hi-C and 3C-seq, has been proposed as a viable means of generating data to resolve the genomes of microorganisms living in naturally occurring environments. Metagenomic Hi-C and 3C-seq datasets have begun to emerge, but the feasibility of resolving genomes when closely related organisms (strain-level diversity) are present in the sample has not yet been systematically characterised. Methods We developed a computational simulation pipeline for metagenomic 3C and Hi-C sequencing to evaluate the accuracy of genomic reconstructions at, above, and below an operationally defined species boundary. We simulated datasets and measured accuracy over a wide range of parameters. Five clustering algorithms were evaluated (two hard, three soft) using an adaptation of the extended B-cubed validation measure. Results When all genomes in a sample are below 95% sequence identity, all of the tested clustering algorithms performed well. When sequence data contains genomes above 95% identity (our operational definition of strain-level diversity), a naive soft-clustering extension of the Louvain method achieves the highest performance. Discussion Previously, only hard-clustering algorithms have been applied to metagenomic 3C and Hi-C data, yet none of these perform well when strain-level diversity exists in a metagenomic sample. Our simple extension of the Louvain method performed the best in these scenarios, however, accuracy remained well below the levels observed for samples without strain-level diversity. Strain resolution is also highly dependent on the amount of available 3C sequence data, suggesting that depth of sequencing must be carefully considered during experimental design. Finally, there appears to be great scope to improve the accuracy of strain resolution through further algorithm development. PMID:27843713

  11. Nearest clusters based partial least squares discriminant analysis for the classification of spectral data.

    PubMed

    Song, Weiran; Wang, Hui; Maguire, Paul; Nibouche, Omar

    2018-06-07

    Partial Least Squares Discriminant Analysis (PLS-DA) is one of the most effective multivariate analysis methods for spectral data analysis, which extracts latent variables and uses them to predict responses. In particular, it is an effective method for handling high-dimensional and collinear spectral data. However, PLS-DA does not explicitly address data multimodality, i.e., within-class multimodal distribution of data. In this paper, we present a novel method termed nearest clusters based PLS-DA (NCPLS-DA) for addressing the multimodality and nonlinearity issues explicitly and improving the performance of PLS-DA on spectral data classification. The new method applies hierarchical clustering to divide samples into clusters and calculates the corresponding centre of every cluster. For a given query point, only clusters whose centres are nearest to such a query point are used for PLS-DA. Such a method can provide a simple and effective tool for separating multimodal and nonlinear classes into clusters which are locally linear and unimodal. Experimental results on 17 datasets, including 12 UCI and 5 spectral datasets, show that NCPLS-DA can outperform 4 baseline methods, namely, PLS-DA, kernel PLS-DA, local PLS-DA and k-NN, achieving the highest classification accuracy most of the time. Copyright © 2018 Elsevier B.V. All rights reserved.
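
    The nearest-clusters strategy can be sketched as below. For brevity, a nearest-class-mean rule replaces the local PLS-DA model, and the clusters are hand-made rather than produced by hierarchical clustering, so only the "classify with the nearest cluster" idea is faithful; all data are invented.

```python
# Sketch of the nearest-clusters idea: training samples are grouped into
# clusters; a query is classified using only the cluster whose centre is
# nearest, which keeps each local model unimodal and locally linear.

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def mean(points):
    return tuple(sum(c) / len(points) for c in zip(*points))

# Toy spectra: class "A" is bimodal (two distant clusters), class "B" unimodal.
train = {
    "A": [(0.0, 0.1), (0.1, 0.0), (9.9, 10.0), (10.0, 9.9)],
    "B": [(5.0, 5.1), (5.1, 5.0), (4.9, 5.0)],
}

# Hand-made clusters standing in for hierarchical clustering output.
clusters = [
    {"centre": (0.05, 0.05), "members": [("A", p) for p in train["A"][:2]]},
    {"centre": (9.95, 9.95), "members": [("A", p) for p in train["A"][2:]]},
    {"centre": (5.0, 5.03), "members": [("B", p) for p in train["B"]]},
]

def predict(query):
    nearest = min(clusters, key=lambda c: dist(query, c["centre"]))
    # Local nearest-class-mean rule inside the chosen cluster
    # (stand-in for the local PLS-DA model).
    by_class = {}
    for label, p in nearest["members"]:
        by_class.setdefault(label, []).append(p)
    return min(by_class, key=lambda lb: dist(query, mean(by_class[lb])))

print(predict((9.8, 10.1)))  # resolved by the second "A" cluster
```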

  12. Interest of LQAS method in a survey of HTLV-I infection in Benin (West Africa).

    PubMed

    Houinato, Dismand; Preux, Pierre-Marie; Charriere, Bénédicte; Massit, Bruno; Avodé, Gilbert; Denis, François; Dumas, Michel; Boutros-Toni, Fernand; Salamon, Roger

    2002-02-01

    HTLV-I is heterogeneously distributed in Sub-Saharan Africa. Traditional survey methods such as cluster sampling can provide information for a country or region of interest, but they cannot identify small areas with higher prevalences of infection to help in health policy planning. Identification of such areas can be achieved with the Lot Quality Assurance Sampling (LQAS) method, which is used in industry to identify poor performance on assembly lines. The LQAS method was used in Atacora (Northern Benin) between March and May 1998 to identify areas with an HTLV-I seroprevalence higher than 4%. Sixty-five subjects were randomly selected in each of the 36 communes (lots) of this department. Lots were classified as unacceptable when the sample contained at least one positive subject. The LQAS method identified 25 (69.4%) communes with a prevalence higher than 4%. Using stratified sampling theory, the overall HTLV-I seroprevalence was 4.5% (95% CI: 3.6-5.4%). These data show the value of applying the LQAS method under field conditions to detect clusters of infection.
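
    The zero-acceptance decision rule stated above, and its operating characteristic, can be sketched directly: a lot of 65 subjects is flagged as soon as one positive serum is found, and the chance that a lot with true prevalence p escapes flagging is (1 − p)^65.

```python
# Sketch of the zero-acceptance LQAS rule used in the survey.

def classify_lot(results):
    """`results` is a list of 0/1 serology outcomes for one lot of 65."""
    assert len(results) == 65
    return "unacceptable" if sum(results) >= 1 else "acceptable"

clean_lot = [0] * 65
flagged_lot = [0] * 64 + [1]
print(classify_lot(clean_lot), classify_lot(flagged_lot))

# Operating characteristic: chance a lot with true prevalence p is accepted.
def accept_prob(p, n=65):
    return (1 - p) ** n

print(round(accept_prob(0.04), 2))  # only ~7% of 4%-prevalence lots slip through
```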

  13. Adaptive cluster sampling: An efficient method for assessing inconspicuous species

    Treesearch

    Andrea M. Silletti; Joan Walker

    2003-01-01

    Restorationists typically evaluate the success of a project by estimating the population sizes of species that have been planted or seeded. Because a total census is rarely feasible, they must rely on sampling methods for population estimates. However, traditional random sampling designs may be inefficient for species that, for one reason or another, are challenging to...

  14. Cooling rate dependence of structural order in Al90Sm10 metallic glass

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sun, Yang; Zhang, Yue; Zhang, Feng

    2016-07-07

    Here, the atomic structure of Al90Sm10 metallic glass is studied using molecular dynamics simulations. By performing a long sub-Tg annealing, we developed a glass model closer to the experiments than the models prepared by continuous cooling. Using the cluster alignment method, we found that the “3661” cluster is the dominating short-range order in the glass samples. The connection and arrangement of “3661” clusters, which define the medium-range order in the system, are enhanced significantly in the sub-Tg annealed sample as compared with the fast cooled glass samples. Unlike some strong binary glass formers such as Cu64.5Zr35.5, the clusters representing the short-range order do not form an interconnected interpenetrating network in Al90Sm10, which has only marginal glass formability.

  15. Cooling rate dependence of structural order in Al90Sm10 metallic glass

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sun, Yang; Zhang, Yue

    2016-07-07

    The atomic structure of Al90Sm10 metallic glass is studied using molecular dynamics simulations. By performing a long sub-Tg annealing, we developed a glass model closer to the experiments than the models prepared by continuous cooling. Using the cluster alignment method, we found that the “3661” cluster is the dominating short-range order in the glass samples. The connection and arrangement of “3661” clusters, which define the medium-range order in the system, are enhanced significantly in the sub-Tg annealed sample as compared with the fast cooled glass samples. Unlike some strong binary glass formers such as Cu64.5Zr35.5, the clusters representing the short-range order do not form an interconnected interpenetrating network in Al90Sm10, which has only marginal glass formability.

  16. Network visualization of conformational sampling during molecular dynamics simulation.

    PubMed

    Ahlstrom, Logan S; Baker, Joseph Lee; Ehrlich, Kent; Campbell, Zachary T; Patel, Sunita; Vorontsov, Ivan I; Tama, Florence; Miyashita, Osamu

    2013-11-01

    Effective data reduction methods are necessary for uncovering the inherent conformational relationships present in large molecular dynamics (MD) trajectories. Clustering algorithms provide a means to interpret the conformational sampling of molecules during simulation by grouping trajectory snapshots into a few subgroups, or clusters, but the relationships between the individual clusters may not be readily understood. Here we show that network analysis can be used to visualize the dominant conformational states explored during simulation as well as the connectivity between them, providing a more coherent description of conformational space than traditional clustering techniques alone. We compare the results of network visualization against 11 clustering algorithms and principal component conformer plots. Several MD simulations of proteins undergoing different conformational changes demonstrate the effectiveness of networks in reaching functional conclusions. Copyright © 2013 Elsevier Inc. All rights reserved.
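
    Building such a network from an already clustered trajectory can be sketched as follows: nodes are cluster labels, and weighted edges count the observed transitions between successive snapshots. The trajectory here is a toy label sequence, not simulation output.

```python
# Sketch of building a conformational transition network from an MD
# trajectory whose snapshots have already been assigned cluster labels.
from collections import Counter

def transition_network(labels):
    """Count transitions between distinct clusters of successive snapshots."""
    edges = Counter()
    for a, b in zip(labels, labels[1:]):
        if a != b:
            edges[tuple(sorted((a, b)))] += 1
    return edges

# Toy trajectory visiting three conformational states.
traj = list("AAABBBAAACCCBBBAAA")
net = transition_network(traj)
print(dict(net))  # edge weights reveal which states interconvert
```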

  17. Milestoning with coarse memory

    NASA Astrophysics Data System (ADS)

    Hawk, Alexander T.

    2013-04-01

    Milestoning is a method used to calculate the kinetics of molecular processes occurring on timescales inaccessible to traditional molecular dynamics (MD) simulations. In the method, the phase space of the system is partitioned by milestones (hypersurfaces), trajectories are initialized on each milestone, and short MD simulations are performed to calculate transitions between neighboring milestones. Long trajectories of the system are then reconstructed with a semi-Markov process from the observed transition statistics. The procedure is typically justified by the assumption that trajectories lose memory between crossing successive milestones. Here we present Milestoning with Coarse Memory (MCM), a generalization of Milestoning that relaxes the memory loss assumption of conventional Milestoning. In the method, milestones are defined and sample transitions are calculated in the standard Milestoning way. Then, after it is clear where trajectories sample milestones, the milestones are broken up into distinct neighborhoods (clusters), and each sample transition is associated with two clusters: the cluster containing the coordinates the trajectory was initialized in, and the cluster (on the terminal milestone) containing the trajectory's final coordinates. Long trajectories of the system are then reconstructed with a semi-Markov process in an extended state space built from milestone and cluster indices. To test the method, we apply it to a process that is particularly ill suited for Milestoning: the dynamics of a polymer confined to a narrow cylinder. We show that Milestoning calculations of both the mean first passage time and the mean transit time of reversal (which occurs when the end-to-end vector reverses direction) are significantly improved when MCM is applied. Finally, we note the overhead of performing MCM on top of conventional Milestoning is negligible.

  18. Fast dimension reduction and integrative clustering of multi-omics data using low-rank approximation: application to cancer molecular classification.

    PubMed

    Wu, Dingming; Wang, Dongfang; Zhang, Michael Q; Gu, Jin

    2015-12-01

    One major goal of large-scale cancer omics studies is to identify molecular subtypes for more accurate cancer diagnosis and treatment. To deal with high-dimensional cancer multi-omics data, a promising strategy is to find an effective low-dimensional subspace of the original data and then cluster cancer samples in the reduced subspace. However, due to data-type diversity and big data volume, few methods can integratively and efficiently find the principal low-dimensional manifold of high-dimensional cancer multi-omics data. In this study, we proposed a novel integrative probabilistic model based on low-rank approximation to quickly find the shared principal subspace across multiple data types: the convexity of the low-rank regularized likelihood function of the probabilistic model ensures efficient and stable model fitting. Candidate molecular subtypes can be identified by unsupervised clustering of hundreds of cancer samples in the reduced low-dimensional subspace. On testing datasets, our method LRAcluster (low-rank approximation based multi-omics data clustering) runs much faster and with better clustering performance than the existing method. We then applied LRAcluster to large-scale cancer multi-omics data from TCGA. The pan-cancer analysis results show that cancers of different tissue origins are generally grouped as independent clusters, except squamous-like carcinomas, while the single-cancer-type analyses suggest that the omics data have different subtyping abilities for different cancer types. LRAcluster is a very useful method for fast dimension reduction and unsupervised clustering of large-scale multi-omics data. LRAcluster is implemented in R and freely available via http://bioinfo.au.tsinghua.edu.cn/software/lracluster/ .
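
    The overall strategy (stack the omics matrices for the same samples, find a shared low-rank subspace, then cluster samples in it) can be sketched with a plain truncated SVD standing in for LRAcluster's probabilistic low-rank model. The two "omics" blocks below are synthetic.

```python
# Sketch of low-rank integrative clustering: truncated SVD of the stacked
# (samples x features) matrices gives a shared reduced subspace in which
# sample groups separate. Synthetic data, not TCGA.
import numpy as np

rng = np.random.default_rng(0)

# Two "omics" blocks for the same 40 samples, with a shared 2-group structure.
group = np.repeat([0, 1], 20)
expr = rng.normal(group[:, None] * 3.0, 1.0, (40, 100))   # e.g. expression
meth = rng.normal(group[:, None] * -2.0, 1.0, (40, 50))   # e.g. methylation
X = np.hstack([expr, meth])                               # integrated matrix

# Shared principal subspace via rank-2 truncated SVD of the centred data.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = U[:, :2] * s[:2]            # samples embedded in the reduced subspace

# A one-dimensional split on the leading component recovers the groups.
labels = (Z[:, 0] > 0).astype(int)
if labels[0] == 1:              # resolve the arbitrary sign of the SVD
    labels = 1 - labels
print((labels == group).mean())  # fraction of samples correctly grouped
```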

  19. Detecting synchronization clusters in multivariate time series via coarse-graining of Markov chains.

    PubMed

    Allefeld, Carsten; Bialonski, Stephan

    2007-12-01

    Synchronization cluster analysis is an approach to the detection of underlying structures in data sets of multivariate time series, starting from a matrix R of bivariate synchronization indices. A previous method utilized the eigenvectors of R for cluster identification, analogous to several recent attempts at group identification using eigenvectors of the correlation matrix. All of these approaches assumed a one-to-one correspondence of dominant eigenvectors and clusters, which has however been shown to be wrong in important cases. We clarify the usefulness of eigenvalue decomposition for synchronization cluster analysis by translating the problem into the language of stochastic processes, and derive an enhanced clustering method harnessing recent insights from the coarse-graining of finite-state Markov processes. We illustrate the operation of our method using a simulated system of coupled Lorenz oscillators, and we demonstrate its superior performance over the previous approach. Finally we investigate the question of robustness of the algorithm against small sample size, which is important with regard to field applications.

  20. An enhanced cluster analysis program with bootstrap significance testing for ecological community analysis

    USGS Publications Warehouse

    McKenna, J.E.

    2003-01-01

    The biosphere is filled with complex living patterns, and important questions about biodiversity and community and ecosystem ecology concern the structure and function of the multispecies systems responsible for those patterns. Cluster analysis identifies discrete groups within multivariate data and is an effective method of coping with these complexities, but it often suffers from subjective identification of groups. The bootstrap testing method greatly improves objective significance determination for cluster analysis. The BOOTCLUS program makes cluster analysis that reliably identifies real patterns within a data set more accessible and easier to use than previously available programs. A variety of analysis options and rapid re-analysis provide a means to quickly evaluate several aspects of a data set. Interpretation is influenced by sampling design and the a priori designation of samples into replicate groups, and ultimately relies on the researcher's knowledge of the organisms and their environment. However, the BOOTCLUS program provides reliable, objectively determined groupings of multivariate data.
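
    A generic bootstrap significance test for a candidate two-group split (not the exact BOOTCLUS procedure) can be sketched as: compare the observed separation between the groups with separations obtained after pooling and resampling, which approximates the null hypothesis of no real group structure. The abundance values are invented.

```python
# Sketch of bootstrap significance testing for a cluster split.
import random

def mean(xs):
    return sum(xs) / len(xs)

def separation(a, b):
    return abs(mean(a) - mean(b))

random.seed(1)
group1 = [random.gauss(10.0, 1.0) for _ in range(15)]   # e.g. site-A abundances
group2 = [random.gauss(13.0, 1.0) for _ in range(15)]   # e.g. site-B abundances

observed = separation(group1, group2)
pooled = group1 + group2

trials = 2000
exceed = 0
for _ in range(trials):
    # Resample under the null: no group structure in the pooled data.
    resampled = [random.choice(pooled) for _ in range(30)]
    if separation(resampled[:15], resampled[15:]) >= observed:
        exceed += 1
p_value = exceed / trials
print(p_value)  # a small p-value means the split is unlikely under the null
```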

  1. Differences in soil biological activity by terrain types at the sub-field scale in central Iowa US

    DOE PAGES

    Kaleita, Amy L.; Schott, Linda R.; Hargreaves, Sarah K.; ...

    2017-07-07

    Soil microbial communities are structured by biogeochemical processes that occur at many different spatial scales, which makes soil sampling difficult. Because soil microbial communities are important in nutrient cycling and soil fertility, it is important to understand how microbial communities function within the heterogeneous soil landscape. In this study, a self-organizing map was used to determine whether landscape data can be used to characterize the distribution of microbial biomass and activity in order to provide an improved understanding of soil microbial community function. Points within a row crop field in south-central Iowa were clustered via a self-organizing map using six landscape properties into three separate landscape clusters. Twelve sampling locations per cluster were chosen for a total of 36 locations. After collection, the soil samples were analysed for various metabolic indicators, such as nitrogen and carbon mineralization, extractable organic carbon, and microbial biomass. Sampling locations in the potholes and toe slope positions had significantly greater microbial biomass nitrogen and carbon, total carbon, total nitrogen and extractable organic carbon than the other two landscape position clusters, while locations on the upslope did not differ significantly from the other landscape clusters. However, factors such as nitrate, ammonia, and nitrogen and carbon mineralization did not differ significantly across the landscape. Altogether, this research demonstrates the effectiveness of a terrain-based clustering method for guiding soil sampling of microbial communities.
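    The sampling design described above can be sketched with plain k-means standing in for the self-organizing map (all data invented; property names and cluster counts follow the study's description):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical field grid: 300 points x 6 standardized landscape
# properties (elevation, slope, curvature, wetness index, ...), arranged
# so that three loose terrain groups exist.
X = rng.normal(size=(300, 6))
X[:100, 0] -= 6.0
X[100:200, 1] += 6.0

def kmeans(X, k, iters=50):
    """Plain k-means, used here as a simple stand-in for the paper's
    self-organizing map."""
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        centers = np.array([X[labels == j].mean(axis=0)
                            if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return labels

labels = kmeans(X, k=3)

# As in the study: 12 sampling locations per terrain cluster, 36 total.
sites = {j: rng.choice(np.where(labels == j)[0], size=12, replace=False)
         for j in range(3)}
print(sum(len(v) for v in sites.values()))   # 36
```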

  2. Differences in soil biological activity by terrain types at the sub-field scale in central Iowa US

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kaleita, Amy L.; Schott, Linda R.; Hargreaves, Sarah K.

    Soil microbial communities are structured by biogeochemical processes that occur at many different spatial scales, which makes soil sampling difficult. Because soil microbial communities are important in nutrient cycling and soil fertility, it is important to understand how microbial communities function within the heterogeneous soil landscape. In this study, a self-organizing map was used to determine whether landscape data can be used to characterize the distribution of microbial biomass and activity in order to provide an improved understanding of soil microbial community function. Points within a row crop field in south-central Iowa were clustered via a self-organizing map using six landscape properties into three separate landscape clusters. Twelve sampling locations per cluster were chosen for a total of 36 locations. After collection, the soil samples were analysed for various metabolic indicators, such as nitrogen and carbon mineralization, extractable organic carbon, and microbial biomass. Sampling locations in the potholes and toe slope positions had significantly greater microbial biomass nitrogen and carbon, total carbon, total nitrogen and extractable organic carbon than the other two landscape position clusters, while locations on the upslope did not differ significantly from the other landscape clusters. However, factors such as nitrate, ammonia, and nitrogen and carbon mineralization did not differ significantly across the landscape. Altogether, this research demonstrates the effectiveness of a terrain-based clustering method for guiding soil sampling of microbial communities.

  3. Polymorphism in magic-sized Au144(SR)60 clusters

    NASA Astrophysics Data System (ADS)

    Jensen, Kirsten M. Ø.; Juhas, Pavol; Tofanelli, Marcus A.; Heinecke, Christine L.; Vaughan, Gavin; Ackerson, Christopher J.; Billinge, Simon J. L.

    2016-06-01

    Ultra-small, magic-sized metal nanoclusters represent an important new class of materials with properties between molecules and particles. However, their small size challenges the conventional methods for structure characterization. Here we present the structure of ultra-stable Au144(SR)60 magic-sized nanoclusters obtained from atomic pair distribution function analysis of X-ray powder diffraction data. The study reveals structural polymorphism in these archetypal nanoclusters. In addition to confirming the theoretically predicted icosahedral-cored cluster, we also find samples with a truncated decahedral core structure, with some samples exhibiting a coexistence of both cluster structures. Although the clusters are monodisperse in size, structural diversity is apparent. The discovery of polymorphism may open up a new dimension in nanoscale engineering.

  4. A revised moving cluster distance to the Pleiades open cluster

    NASA Astrophysics Data System (ADS)

    Galli, P. A. B.; Moraux, E.; Bouy, H.; Bouvier, J.; Olivares, J.; Teixeira, R.

    2017-02-01

    Context. The distance to the Pleiades open cluster has been extensively debated in the literature over several decades. Although different methods point to a discrepancy in the trigonometric parallaxes produced by the Hipparcos mission, the number of individual stars with known distances is still too small, compared to the number of cluster members, to settle this problem. Aims: We provide a new distance estimate for the Pleiades based on the moving cluster method, which will be useful to further discuss the so-called Pleiades distance controversy and compare it with the very precise parallaxes from the Gaia space mission. Methods: We apply a refurbished implementation of the convergent point search method to an updated census of Pleiades stars to calculate the convergent point position of the cluster from stellar proper motions. Then, we derive individual parallaxes for 64 cluster members using radial velocities compiled from the literature, and approximate parallaxes for another 1146 stars based on the spatial velocity of the cluster. This represents the largest sample of Pleiades stars with individual distances to date. Results: The parallaxes derived in this work are in good agreement with previous results obtained in different studies (excluding Hipparcos) for individual stars in the cluster. We report a mean parallax of 7.44 ± 0.08 mas, corresponding to a distance of about 134.4 pc, which is consistent with the weighted mean of 135.0 ± 0.6 pc obtained from the non-Hipparcos results in the literature. Conclusions: Our result for the distance to the Pleiades open cluster is not consistent with the Hipparcos catalog, but favors the recent and more precise distance determination of 136.2 ± 1.2 pc obtained from Very Long Baseline Interferometry observations. It is also in good agreement with the mean distance of 133 ± 5 pc obtained from the first trigonometric parallaxes delivered by the Gaia satellite for the brightest cluster members in common with our sample. 
Full Table B.2 is only available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (http://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/598/A48
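    Once the convergent point is known, the moving cluster method turns proper motions and radial velocities into parallaxes through one standard formula. A sketch with invented stellar values (mu, v_r, lam are illustrative, not from the paper); the last line only converts the paper's quoted mean parallax into a distance:

```python
import math

# Moving cluster method: for a star with proper motion mu (mas/yr),
# radial velocity v_r (km/s) and angular distance lam to the convergent
# point, the parallax is  pi [mas] = 4.74 * mu / (v_r * tan(lam)).
mu, v_r, lam_deg = 50.0, 5.8, 30.0          # hypothetical values
pi_mas = 4.74 * mu / (v_r * math.tan(math.radians(lam_deg)))
print(round(pi_mas, 2), "mas ->", round(1000.0 / pi_mas, 1), "pc")

# The paper's mean parallax of 7.44 mas corresponds to:
print(round(1000.0 / 7.44, 1), "pc")        # ~134.4 pc
```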

  5. Adding-point strategy for reduced-order hypersonic aerothermodynamics modeling based on fuzzy clustering

    NASA Astrophysics Data System (ADS)

    Chen, Xin; Liu, Li; Zhou, Sida; Yue, Zhenjiang

    2016-09-01

    Reduced-order models (ROMs) based on snapshots from high-fidelity CFD simulations have received great attention recently due to their capability of capturing the features of complex geometries and flow configurations. To improve the efficiency and precision of ROMs, it is indispensable to add extra sampling points to the initial snapshots, since the number of sampling points needed to achieve an adequately accurate ROM is generally unknown a priori, while a large number of initial sampling points reduces the parsimony of the ROMs. A fuzzy-clustering-based adding-point strategy is proposed, in which the fuzzy clustering acts as an indicator of the regions where the precision of the ROM is relatively low. The proposed method is applied to construct ROMs for benchmark mathematical examples and for a numerical example of hypersonic aerothermodynamics prediction for a typical control surface. The proposed method achieves a 34.5% efficiency improvement over the estimated mean squared error prediction algorithm while delivering the same level of prediction accuracy.
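    A minimal sketch of a fuzzy-clustering adding-point indicator, assuming a 1-D design space and standard fuzzy c-means (illustrative only, not the paper's implementation): the next sample is placed where cluster membership is most ambiguous, i.e. where the current snapshot set resolves the design space worst.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 1-D design space with an initial set of snapshot locations
# concentrated in two regions, leaving a gap in the middle.
samples = np.concatenate([rng.uniform(0.0, 0.3, 15),
                          rng.uniform(0.7, 1.0, 15)])[:, None]

def fuzzy_cmeans(X, c, m=2.0, iters=100):
    """Standard fuzzy c-means; returns centers and the membership matrix U."""
    centers = X[rng.choice(len(X), c, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centers[None], axis=2) + 1e-12
        U = 1.0 / d ** (2 / (m - 1))
        U /= U.sum(axis=1, keepdims=True)
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
    return centers, U

centers, U = fuzzy_cmeans(samples, c=2)

# Adding-point indicator: evaluate candidate points and pick the one whose
# fuzzy membership is most ambiguous (highest entropy).
candidates = np.linspace(0.0, 1.0, 101)[:, None]
d = np.linalg.norm(candidates[:, None] - centers[None], axis=2) + 1e-12
Uc = 1.0 / d ** 2            # m = 2  =>  exponent 2/(m-1) = 2
Uc /= Uc.sum(axis=1, keepdims=True)
entropy = -(Uc * np.log(Uc)).sum(axis=1)
new_point = candidates[entropy.argmax(), 0]
print(round(new_point, 2))   # lands in the gap between the two groups
```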

  6. Impact of non-uniform correlation structure on sample size and power in multiple-period cluster randomised trials.

    PubMed

    Kasza, J; Hemming, K; Hooper, R; Matthews, Jns; Forbes, A B

    2017-01-01

    Stepped wedge and cluster randomised crossover trials are examples of cluster randomised designs conducted over multiple time periods that are being used with increasing frequency in health research. Recent systematic reviews of both of these designs indicate that the within-cluster correlation is typically taken account of in the analysis of data using a random intercept mixed model, implying a constant correlation between any two individuals in the same cluster no matter how far apart in time they are measured: within-period and between-period intra-cluster correlations are assumed to be identical. Recently proposed extensions allow the within- and between-period intra-cluster correlations to differ, although these methods require that all between-period intra-cluster correlations are identical, which may not be appropriate in all situations. Motivated by a proposed intensive care cluster randomised trial, we propose an alternative correlation structure for repeated cross-sectional multiple-period cluster randomised trials in which the between-period intra-cluster correlation is allowed to decay depending on the distance between measurements. We present results for the variance of treatment effect estimators for varying amounts of decay, investigating the consequences of the variation in decay on sample size planning for stepped wedge, cluster crossover and multiple-period parallel-arm cluster randomised trials. We also investigate the impact of assuming constant between-period intra-cluster correlations instead of decaying between-period intra-cluster correlations. Our results indicate that in certain design configurations, including the one corresponding to the proposed trial, a correlation decay can have an important impact on variances of treatment effect estimators, and hence on sample size and power. An R Shiny app allows readers to interactively explore the impact of correlation decay.
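    The decaying between-period correlation structure can be written down directly. A sketch assuming unit residual variance, within-period intra-cluster correlation rho, and between-period correlation rho * r^|s-t| for periods s and t (all numbers illustrative); the variance of the cluster mean shrinks as the decay strengthens:

```python
import numpy as np

def cluster_corr(T, m, rho, r):
    """Correlation matrix for one cluster: T periods x m subjects per
    period; within-period ICC rho, between-period ICC rho * r**|s-t|."""
    periods = np.repeat(np.arange(T), m)
    R = rho * r ** np.abs(periods[:, None] - periods[None, :])
    np.fill_diagonal(R, 1.0)
    return R

T, m, rho = 4, 10, 0.05
w = np.full(T * m, 1.0 / (T * m))      # equal weights: cluster mean
for r in (1.0, 0.8, 0.5):
    R = cluster_corr(T, m, rho, r)
    # Variance of the cluster mean (unit residual variance) is w'Rw;
    # decay reduces the between-period terms, so the variance shrinks.
    print(r, float(w @ R @ w))
```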

  7. PhyloChip™ microarray comparison of sampling methods used for coral microbial ecology

    USGS Publications Warehouse

    Kellogg, Christina A.; Piceno, Yvette M.; Tom, Lauren M.; DeSantis, Todd Z.; Zawada, David G.; Andersen, Gary L.

    2012-01-01

    Interest in coral microbial ecology has been increasing steadily over the last decade, yet standardized methods of sample collection still have not been defined. Two methods were compared for their ability to sample coral-associated microbial communities: tissue punches and foam swabs, the latter being less invasive and preferred by reef managers. Four colonies of star coral, Montastraea annularis, were sampled in the Dry Tortugas National Park (two healthy and two with white plague disease). The PhyloChip™ G3 microarray was used to assess microbial community structure of amplified 16S rRNA gene sequences. Samples clustered based on methodology rather than coral colony. Punch samples from healthy and diseased corals were distinct. All swab samples clustered closely together with the seawater control and did not group according to the health state of the corals. Although more microbial taxa were detected by the swab method, there is a much larger overlap between the water control and swab samples than punch samples, suggesting some of the additional diversity is due to contamination from water absorbed by the swab. While swabs are useful for noninvasive studies of the coral surface mucus layer, these results show that they are not optimal for studies of coral disease.

  8. PhyloChip™ microarray comparison of sampling methods used for coral microbial ecology.

    PubMed

    Kellogg, Christina A; Piceno, Yvette M; Tom, Lauren M; DeSantis, Todd Z; Zawada, David G; Andersen, Gary L

    2012-01-01

    Interest in coral microbial ecology has been increasing steadily over the last decade, yet standardized methods of sample collection still have not been defined. Two methods were compared for their ability to sample coral-associated microbial communities: tissue punches and foam swabs, the latter being less invasive and preferred by reef managers. Four colonies of star coral, Montastraea annularis, were sampled in the Dry Tortugas National Park (two healthy and two with white plague disease). The PhyloChip™ G3 microarray was used to assess microbial community structure of amplified 16S rRNA gene sequences. Samples clustered based on methodology rather than coral colony. Punch samples from healthy and diseased corals were distinct. All swab samples clustered closely together with the seawater control and did not group according to the health state of the corals. Although more microbial taxa were detected by the swab method, there is a much larger overlap between the water control and swab samples than punch samples, suggesting some of the additional diversity is due to contamination from water absorbed by the swab. While swabs are useful for noninvasive studies of the coral surface mucus layer, these results show that they are not optimal for studies of coral disease.

  9. Travel Time Estimation Using Freeway Point Detector Data Based on Evolving Fuzzy Neural Inference System.

    PubMed

    Tang, Jinjun; Zou, Yajie; Ash, John; Zhang, Shen; Liu, Fang; Wang, Yinhai

    2016-01-01

    Travel time is an important measurement used to evaluate the extent of congestion within road networks. This paper presents a new method to estimate the travel time based on an evolving fuzzy neural inference system. The input variables in the system are traffic flow data (volume, occupancy, and speed) collected from loop detectors located at points both upstream and downstream of a given link, and the output variable is the link travel time. A first order Takagi-Sugeno fuzzy rule set is used to complete the inference. For training the evolving fuzzy neural network (EFNN), two learning processes are proposed: (1) a K-means method is employed to partition input samples into different clusters, and a Gaussian fuzzy membership function is designed for each cluster to measure the membership degree of samples to the cluster centers. As the number of input samples increases, the cluster centers are modified and membership functions are also updated; (2) a weighted recursive least squares estimator is used to optimize the parameters of the linear functions in the Takagi-Sugeno type fuzzy rules. Testing datasets consisting of actual and simulated data are used to test the proposed method. Three common criteria including mean absolute error (MAE), root mean square error (RMSE), and mean absolute relative error (MARE) are utilized to evaluate the estimation performance. Estimation results demonstrate the accuracy and effectiveness of the EFNN method through comparison with existing methods including: multiple linear regression (MLR), instantaneous model (IM), linear model (LM), neural network (NN), and cumulative plots (CP).
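    Learning stage (1) of the scheme above, K-means partitioning followed by a Gaussian membership function per cluster centre, can be sketched as follows (synthetic detector data; the two traffic regimes and all numbers are invented):

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical standardized loop-detector features (volume, occupancy,
# speed) for two traffic regimes; travel time would be the target.
X = np.vstack([rng.normal([0.5, 0.2, 1.0], 0.1, size=(50, 3)),
               rng.normal([1.5, 0.8, 0.4], 0.1, size=(50, 3))])

# K-means partition of the input samples into two clusters.
centers = X[rng.choice(len(X), 2, replace=False)]
for _ in range(20):
    labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
    centers = np.array([X[labels == j].mean(axis=0)
                        if np.any(labels == j) else centers[j]
                        for j in range(2)])
sigma = np.array([X[labels == j].std() if np.any(labels == j) else 1.0
                  for j in range(2)])

def membership(x):
    """Gaussian degree of membership of sample x to each cluster centre."""
    d2 = ((x - centers) ** 2).sum(axis=1)
    mu = np.exp(-d2 / (2 * sigma ** 2))
    return mu / mu.sum()

mu = membership(np.array([0.5, 0.2, 1.0]))
print(mu.round(3))   # dominated by one regime's cluster
```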

  10. Travel Time Estimation Using Freeway Point Detector Data Based on Evolving Fuzzy Neural Inference System

    PubMed Central

    Tang, Jinjun; Zou, Yajie; Ash, John; Zhang, Shen; Liu, Fang; Wang, Yinhai

    2016-01-01

    Travel time is an important measurement used to evaluate the extent of congestion within road networks. This paper presents a new method to estimate the travel time based on an evolving fuzzy neural inference system. The input variables in the system are traffic flow data (volume, occupancy, and speed) collected from loop detectors located at points both upstream and downstream of a given link, and the output variable is the link travel time. A first order Takagi-Sugeno fuzzy rule set is used to complete the inference. For training the evolving fuzzy neural network (EFNN), two learning processes are proposed: (1) a K-means method is employed to partition input samples into different clusters, and a Gaussian fuzzy membership function is designed for each cluster to measure the membership degree of samples to the cluster centers. As the number of input samples increases, the cluster centers are modified and membership functions are also updated; (2) a weighted recursive least squares estimator is used to optimize the parameters of the linear functions in the Takagi-Sugeno type fuzzy rules. Testing datasets consisting of actual and simulated data are used to test the proposed method. Three common criteria including mean absolute error (MAE), root mean square error (RMSE), and mean absolute relative error (MARE) are utilized to evaluate the estimation performance. Estimation results demonstrate the accuracy and effectiveness of the EFNN method through comparison with existing methods including: multiple linear regression (MLR), instantaneous model (IM), linear model (LM), neural network (NN), and cumulative plots (CP). PMID:26829639

  11. Bacterial community comparisons by taxonomy-supervised analysis independent of sequence alignment and clustering

    PubMed Central

    Sul, Woo Jun; Cole, James R.; Jesus, Ederson da C.; Wang, Qiong; Farris, Ryan J.; Fish, Jordan A.; Tiedje, James M.

    2011-01-01

    High-throughput sequencing of 16S rRNA genes has increased our understanding of microbial community structure, but now even higher-throughput methods at the Illumina scale allow the creation of much larger datasets with more samples and orders-of-magnitude more sequences that swamp current analytic methods. We developed a method capable of handling these larger datasets on the basis of assignment of sequences into an existing taxonomy using a supervised learning approach (taxonomy-supervised analysis). We compared this method with a commonly used clustering approach based on sequence similarity (taxonomy-unsupervised analysis). We sampled 211 different bacterial communities from various habitats and obtained ∼1.3 million 16S rRNA sequences spanning the V4 hypervariable region by pyrosequencing. Both methodologies gave similar ecological conclusions in that β-diversity measures calculated by using these two types of matrices were significantly correlated to each other, as were the ordination configurations and hierarchical clustering dendrograms. In addition, our taxonomy-supervised analyses were also highly correlated with phylogenetic methods, such as UniFrac. The taxonomy-supervised analysis has the advantages that it is not limited by the exhaustive computation required for the alignment and clustering necessary for the taxonomy-unsupervised analysis, is more tolerant of sequencing errors, and allows comparisons when sequences are from different regions of the 16S rRNA gene. With the tremendous expansion in 16S rRNA data acquisition underway, the taxonomy-supervised approach offers the potential to provide more rapid and extensive community comparisons across habitats and samples. PMID:21873204

  12. Rapidly differentiating grape seeds from different sources based on characteristic fingerprints using direct analysis in real time coupled with time-of-flight mass spectrometry combined with chemometrics.

    PubMed

    Song, Yuqiao; Liao, Jie; Dong, Junxing; Chen, Li

    2015-09-01

    The seeds of grapevine (Vitis vinifera) are a byproduct of wine production. To examine the potential value of grape seeds, grape seeds from seven sources were subjected to fingerprinting using direct analysis in real time coupled with time-of-flight mass spectrometry combined with chemometrics. First, we listed all 56 reported components from grape seeds and calculated the precise m/z values of the deprotonated ions [M-H]−. Second, the experimental conditions were systematically optimized based on the peak areas of total ion chromatograms of the samples. Third, the seven grape seed samples were examined using the optimized method. Information about 20 grape seed components was used to represent characteristic fingerprints. Finally, hierarchical clustering analysis and principal component analysis were performed to analyze the data. Grape seeds from the seven sources were classified into two clusters; hierarchical clustering analysis and principal component analysis yielded similar results. The results of this study lay the foundation for appropriate utilization and exploitation of grape seed samples. Because it requires no complicated sample preparation or chromatographic separation, the method developed in this study is one of the simplest and least time-consuming methods for grape seed fingerprinting.
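    The chemometric step can be sketched with PCA via SVD on an invented fingerprint matrix (7 sources x 20 peak areas, all values hypothetical); the first principal component separates two source groups, mirroring the paper's two-cluster result:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical fingerprint matrix: 7 seed sources x 20 component peak
# areas, with sources 0-3 and 4-6 following two different profiles.
profiles = rng.uniform(1, 10, (2, 20))
F = np.vstack([profiles[0] + rng.normal(0, 0.3, 20) for _ in range(4)] +
              [profiles[1] + rng.normal(0, 0.3, 20) for _ in range(3)])

# PCA via SVD on the mean-centered matrix; the sample scores on PC1
# split the sources into the two underlying groups.
Xc = F - F.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * S                       # sample scores on the PCs
group = (scores[:, 0] > 0).astype(int)
print(group)
```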

  13. Dark Energy Survey Year 1 Results: Cross-Correlation Redshifts - Methods and Systematics Characterization

    DOE PAGES

    Gatti, M.

    2018-02-22

    We use numerical simulations to characterize the performance of a clustering-based method to calibrate photometric redshift biases. In particular, we cross-correlate the weak lensing (WL) source galaxies from the Dark Energy Survey Year 1 (DES Y1) sample with redMaGiC galaxies (luminous red galaxies with secure photometric redshifts) to estimate the redshift distribution of the former sample. The recovered redshift distributions are used to calibrate the photometric redshift bias of standard photo-z methods applied to the same source galaxy sample. We also apply the method to three photo-z codes run on our simulated data: Bayesian Photometric Redshift (BPZ), Directional Neighborhood Fitting (DNF), and Random Forest-based photo-z (RF). We characterize the systematic uncertainties of our calibration procedure, and find that these systematic uncertainties dominate our error budget. The dominant systematics are due to our assumption of unevolving bias and clustering across each redshift bin, and to differences between the shapes of the redshift distributions derived by clustering vs photo-z's. The systematic uncertainty in the mean redshift bias of the source galaxy sample is Δz ≲ 0.02, though the precise value depends on the redshift bin under consideration. Here, we discuss possible ways to mitigate the impact of our dominant systematics in future analyses.

  14. Dark Energy Survey Year 1 Results: Cross-Correlation Redshifts - Methods and Systematics Characterization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gatti, M.

    We use numerical simulations to characterize the performance of a clustering-based method to calibrate photometric redshift biases. In particular, we cross-correlate the weak lensing (WL) source galaxies from the Dark Energy Survey Year 1 (DES Y1) sample with redMaGiC galaxies (luminous red galaxies with secure photometric redshifts) to estimate the redshift distribution of the former sample. The recovered redshift distributions are used to calibrate the photometric redshift bias of standard photo-z methods applied to the same source galaxy sample. We also apply the method to three photo-z codes run on our simulated data: Bayesian Photometric Redshift (BPZ), Directional Neighborhood Fitting (DNF), and Random Forest-based photo-z (RF). We characterize the systematic uncertainties of our calibration procedure, and find that these systematic uncertainties dominate our error budget. The dominant systematics are due to our assumption of unevolving bias and clustering across each redshift bin, and to differences between the shapes of the redshift distributions derived by clustering vs photo-z's. The systematic uncertainty in the mean redshift bias of the source galaxy sample is Δz ≲ 0.02, though the precise value depends on the redshift bin under consideration. Here, we discuss possible ways to mitigate the impact of our dominant systematics in future analyses.

  15. The impact of different DNA extraction kits and laboratories upon the assessment of human gut microbiota composition by 16S rRNA gene sequencing.

    PubMed

    Kennedy, Nicholas A; Walker, Alan W; Berry, Susan H; Duncan, Sylvia H; Farquarson, Freda M; Louis, Petra; Thomson, John M; Satsangi, Jack; Flint, Harry J; Parkhill, Julian; Lees, Charlie W; Hold, Georgina L

    2014-01-01

    Determining bacterial community structure in fecal samples through DNA sequencing is an important facet of intestinal health research. The impact of different commercially available DNA extraction kits upon bacterial community structures has received relatively little attention. The aim of this study was to analyze bacterial communities in volunteer and inflammatory bowel disease (IBD) patient fecal samples extracted using widely used DNA extraction kits in established gastrointestinal research laboratories. Fecal samples from two healthy volunteers (H3 and H4) and two relapsing IBD patients (I1 and I2) were investigated. DNA extraction was undertaken using MoBio Powersoil and MP Biomedicals FastDNA SPIN Kit for Soil DNA extraction kits. PCR amplification for pyrosequencing of bacterial 16S rRNA genes was performed in both laboratories on all samples. Hierarchical clustering of sequencing data was done using the Yue and Clayton similarity coefficient. DNA extracted using the FastDNA kit and the MoBio kit gave median DNA concentrations of 475 (interquartile range 228-561) and 22 (IQR 9-36) ng/µL respectively (p<0.0001). Hierarchical clustering of sequence data by Yue and Clayton coefficient revealed four clusters. Samples from individuals H3 and I2 clustered by patient; however, samples from patient I1 extracted with the MoBio kit clustered with samples from patient H4 rather than the other I1 samples. Linear modelling on relative abundance of common bacterial families revealed significant differences between kits; samples extracted with MoBio Powersoil showed significantly increased Bacteroidaceae, Ruminococcaceae and Porphyromonadaceae, and lower Enterobacteriaceae, Lachnospiraceae, Clostridiaceae, and Erysipelotrichaceae (p<0.05). This study demonstrates significant differences in DNA yield and bacterial DNA composition when comparing DNA extracted from the same fecal sample with different extraction kits. 
This highlights the importance of ensuring that samples in a study are prepared with the same method, and the need for caution when cross-comparing studies that use different methods.
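    The Yue and Clayton similarity coefficient used for the hierarchical clustering above is simple to compute. A sketch on invented family-level relative-abundance profiles (the kit labels are illustrative only):

```python
import numpy as np

def yue_clayton(p, q):
    """Yue & Clayton theta similarity between two relative-abundance
    profiles p and q (each summing to 1); 1.0 means identical profiles."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return (p * q).sum() / ((p ** 2).sum() + (q ** 2).sum() - (p * q).sum())

# Hypothetical profiles for one sample extracted with two different kits.
kit_a = np.array([0.40, 0.30, 0.20, 0.10])
kit_b = np.array([0.25, 0.20, 0.15, 0.40])
print(round(yue_clayton(kit_a, kit_b), 3))
print(round(yue_clayton(kit_a, kit_a), 3))   # identical profiles -> 1.0
```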

  16. Determination of Cluster Distances from Chandra Imaging Spectroscopy and Sunyaev-Zeldovich Effect Measurements. I; Analysis Methods and Initial Results

    NASA Technical Reports Server (NTRS)

    Bonamente, Massimiliano; Joy, Marshall K.; Carlstrom, John E.; LaRoque, Samuel J.

    2004-01-01

    X-ray and Sunyaev-Zeldovich Effect data can be combined to determine the distance to galaxy clusters. High-resolution X-ray data are now available from the Chandra Observatory, which provides both spatial and spectral information, and interferometric radio measurements of the Sunyaev-Zeldovich Effect are available from the BIMA and OVRO arrays. We introduce a Markov chain Monte Carlo procedure for the joint analysis of X-ray and Sunyaev-Zeldovich Effect data. The advantages of this method are its high computational efficiency and its ability to measure the full probability distribution of all parameters of interest, such as the spatial and spectral properties of the cluster gas and the cluster distance. We apply this technique to the Chandra X-ray data and the OVRO radio data for the galaxy cluster Abell 611. Comparisons with traditional likelihood-ratio methods reveal the robustness of the method. This method will be used in a follow-up paper to determine the distance of a large sample of galaxy clusters for which high-resolution Chandra X-ray and BIMA/OVRO radio data are available.
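    The core of a Markov chain Monte Carlo analysis is a short accept/reject loop. A toy Metropolis sampler, with a one-parameter Gaussian likelihood standing in for the joint X-ray/SZE model (all numbers invented; the true parameter value is 5.0):

```python
import math
import random

random.seed(0)

# Toy data: 200 observations around a true parameter value of 5.0.
data = [random.gauss(5.0, 1.0) for _ in range(200)]

def log_like(theta):
    """Gaussian log-likelihood (up to a constant), unit noise variance."""
    return -0.5 * sum((x - theta) ** 2 for x in data)

chain, theta = [], 0.0
ll = log_like(theta)
for _ in range(5000):
    prop = theta + random.gauss(0.0, 0.3)          # symmetric proposal
    ll_prop = log_like(prop)
    if math.log(random.random()) < ll_prop - ll:   # Metropolis accept rule
        theta, ll = prop, ll_prop
    chain.append(theta)

burn = chain[1000:]                 # discard burn-in
est = sum(burn) / len(burn)
print(round(est, 2))                # close to 5.0
```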

  17. Three-dimensional Identification and Reconstruction of Galaxy Systems within Flux-limited Redshift Surveys

    NASA Astrophysics Data System (ADS)

    Marinoni, Christian; Davis, Marc; Newman, Jeffrey A.; Coil, Alison L.

    2002-11-01

    We have developed a new geometrical method for identifying and reconstructing a homogeneous and highly complete set of galaxy groups within flux-limited redshift surveys. Our method combines information from the three-dimensional Voronoi diagram and its dual, the Delaunay triangulation, to obtain group and cluster catalogs that are remarkably robust over wide ranges in redshift and degree of density enhancement. As free by-products, this Voronoi-Delaunay method (VDM) provides a nonparametric measurement of the galaxy density around each object observed and a quantitative measure of the distribution of cosmological voids in the survey volume. In this paper, we describe the VDM algorithm in detail and test its effectiveness using a family of mock catalogs that simulate the Deep Extragalactic Evolutionary Probe (DEEP2) Redshift Survey, which should present at least as much challenge to cluster reconstruction methods as any other near-future survey that is capable of resolving their velocity dispersions. Using these mock DEEP2 catalogs, we demonstrate that the VDM algorithm can be used to identify a homogeneous set of groups in a magnitude-limited sample throughout the survey redshift window (z ≳ 0.7), recovering systems with velocity dispersions ≳ 400 km s-1. Finally, we argue that the bivariate distribution of systems as a function of redshift and velocity dispersion reconstructed with these techniques reproduces with high fidelity the underlying real-space distribution and can thus be used robustly to constrain cosmological parameters. We expect that the VDM algorithm, which has performed so well when faced with the challenges posed by the DEEP2 survey, should be even more effective when applied to the better-sampled, larger surveys of the local universe now underway.
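    The density-estimation by-product can be imitated with a nearest-neighbour stand-in for the Voronoi construction (invented 2-D positions; the actual method works in redshift space and uses true Voronoi cell volumes):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical 2-D "sky": a dense group of 30 galaxies embedded in a
# sparse background field of 100 galaxies.
field = rng.uniform(0, 10, (100, 2))
group = rng.normal(5.0, 0.15, (30, 2))
pts = np.vstack([field, group])

# Nearest-neighbour stand-in for the Voronoi-volume density: in 2-D,
# density ~ 1 / (distance to the k-th nearest neighbour)^2.
k = 5
d = np.linalg.norm(pts[:, None] - pts[None], axis=2)
np.fill_diagonal(d, np.inf)
dk = np.sort(d, axis=1)[:, k - 1]
density = 1.0 / dk ** 2

# Group members stand out as high-density points; thresholding on the
# density seeds group identification.
print(density[100:].mean() > density[:100].mean())   # True
```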

  18. Cross-entropy clustering framework for catchment classification

    NASA Astrophysics Data System (ADS)

    Tongal, Hakan; Sivakumar, Bellie

    2017-09-01

    There is an increasing interest in catchment classification and regionalization in hydrology, as they are useful for identification of appropriate model complexity and transfer of information from gauged catchments to ungauged ones, among others. This study introduces a nonlinear cross-entropy clustering (CEC) method for classification of catchments. The method specifically considers embedding dimension (m), sample entropy (SampEn), and coefficient of variation (CV) to represent dimensionality, complexity, and variability of the time series, respectively. The method is applied to daily streamflow time series from 217 gauging stations across Australia. The results suggest that a combination of linear and nonlinear parameters (i.e. m, SampEn, and CV), representing different aspects of the underlying dynamics of streamflows, could be useful for determining distinct patterns of flow generation mechanisms within a nonlinear clustering framework. For the 217 streamflow time series, nine hydrologically homogeneous clusters that have distinct patterns of flow regime characteristics and specific dominant hydrological attributes with different climatic features are obtained. Comparison of the results with those obtained using the widely employed k-means clustering method (which results in five clusters, with the loss of some information about the features of the clusters) suggests the superiority of the cross-entropy clustering method. The outcomes from this study provide a useful guideline for employing the nonlinear dynamic approaches based on hydrologic signatures and for gaining an improved understanding of streamflow variability at a large scale.
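    Sample entropy, one of the three clustering features above, can be computed directly from its definition. A compact sketch (with r = 0.2 x the series' standard deviation, a common choice; the two test signals are invented): a regular signal scores lower than an irregular one.

```python
import numpy as np

def sample_entropy(x, m=2, r_frac=0.2):
    """SampEn(m, r): negative log of the conditional probability that
    subsequences matching for m points also match for m + 1 points."""
    x = np.asarray(x, dtype=float)
    r = r_frac * x.std()
    def count(mm):
        # All length-mm embedding vectors and their pairwise Chebyshev
        # distances; count pairs matching within tolerance r.
        emb = np.array([x[i:i + mm] for i in range(len(x) - mm)])
        d = np.abs(emb[:, None] - emb[None]).max(axis=2)
        return (d[np.triu_indices(len(emb), k=1)] <= r).sum()
    B, A = count(m), count(m + 1)
    return -np.log(A / B)

rng = np.random.default_rng(0)
regular = np.sin(np.linspace(0, 20 * np.pi, 500))    # predictable flow
noisy = rng.normal(size=500)                         # irregular flow
print(sample_entropy(regular) < sample_entropy(noisy))   # True
```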

  19. Querying Co-regulated Genes on Diverse Gene Expression Datasets Via Biclustering.

    PubMed

    Deveci, Mehmet; Küçüktunç, Onur; Eren, Kemal; Bozdağ, Doruk; Kaya, Kamer; Çatalyürek, Ümit V

    2016-01-01

Rapid development and increasing popularity of gene expression microarrays have resulted in a number of studies on the discovery of co-regulated genes. One important way of discovering such co-regulation is query-based search, since gene co-expression may indicate a shared role in a biological process. Although there are promising query-driven search methods that adapt clustering, they fail to capture many genes that function in the same biological pathway, because microarray datasets are fraught with spurious samples or samples of diverse origin, or the pathways might be regulated under only a subset of samples. On the other hand, a class of clustering algorithms known as biclustering algorithms, which simultaneously cluster both the items and their features, is useful when analyzing gene expression data, or any data in which items are related in only a subset of their samples. This means that genes need not be related in all samples to be clustered together. Because many genes only interact under specific circumstances, biclustering may recover relationships that traditional clustering algorithms can easily miss. In this chapter, we briefly summarize the literature on using biclustering for querying co-regulated genes. We then present a novel biclustering approach and evaluate its performance through a thorough experimental analysis.
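scikit-learn ships a spectral co-clustering implementation that illustrates the core idea of grouping genes and samples simultaneously. The matrix below is synthetic with planted biclusters, not a real expression dataset:

```python
import numpy as np
from sklearn.cluster import SpectralCoclustering
from sklearn.datasets import make_biclusters
from sklearn.metrics import consensus_score

# synthetic expression-like matrix with four planted gene/sample biclusters
data, rows, cols = make_biclusters(shape=(120, 80), n_clusters=4,
                                   noise=5, shuffle=True, random_state=0)

model = SpectralCoclustering(n_clusters=4, random_state=0).fit(data)
# agreement between recovered and planted biclusters (1.0 = perfect)
score = consensus_score(model.biclusters_, (rows, cols))
row_labels, col_labels = model.row_labels_, model.column_labels_
```

Each row (gene) and each column (sample) receives a bicluster label, so a gene can be grouped with others even if they co-vary in only a subset of samples.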

  20. Adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique algorithm for tackling binary imbalanced datasets in biomedical data classification.

    PubMed

    Li, Jinyan; Fong, Simon; Sung, Yunsick; Cho, Kyungeun; Wong, Raymond; Wong, Kelvin K L

    2016-01-01

An imbalanced dataset is a training dataset with markedly unequal proportions of data in the interesting and uninteresting classes. Often in biomedical applications, samples from the class of interest are rare in a population, such as medical anomalies, positive clinical tests, and particular diseases. Because the target samples in the primitive dataset are small in number, inducing a classification model over such training data leads to poor prediction performance due to insufficient training on the minority class. In this paper, we use a novel class-balancing method named adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique (ASCB_DmSMOTE) to solve this imbalanced dataset problem, which is common in biomedical applications. The proposed method combines under-sampling and over-sampling within a swarm optimisation algorithm and adaptively selects suitable parameters for the rebalancing algorithm to find the best solution. Compared with the other versions of the SMOTE algorithm, significant improvements, including higher accuracy and credibility, are observed with ASCB_DmSMOTE. Our proposed method tactfully combines the two rebalancing techniques: it re-partitions the majority class and dynamically optimises the two parameters of SMOTE to synthesise a reasonable number of minority-class samples for each clustered sub-imbalanced dataset. The proposed method ultimately outperforms conventional methods, attaining higher credibility and greater accuracy of the classification model.
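The interpolation step at the heart of SMOTE can be sketched in a few lines of NumPy. This shows only the generic SMOTE idea, not the authors' ASCB_DmSMOTE, which additionally clusters the data and tunes the parameters by swarm optimisation:

```python
import numpy as np

def smote_like(X_min, n_new, k=5, seed=None):
    """Generate n_new synthetic minority samples by interpolating each
    seed point with one of its k nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    # pairwise distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                  # exclude self-neighbours
    nn = np.argsort(d, axis=1)[:, :k]            # k nearest neighbours each
    seeds = rng.integers(0, len(X_min), n_new)
    picks = nn[seeds, rng.integers(0, k, n_new)]
    gap = rng.random((n_new, 1))                 # interpolation fraction
    return X_min[seeds] + gap * (X_min[picks] - X_min[seeds])

rng = np.random.default_rng(2)
minority = rng.normal(0.0, 1.0, size=(20, 3))    # rare-class samples
synth = smote_like(minority, n_new=50, k=5, seed=3)
```

Because each synthetic point is a convex combination of two real minority samples, the oversampled class stays inside the original feature envelope.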

  1. On the Analysis of Case-Control Studies in Cluster-correlated Data Settings.

    PubMed

    Haneuse, Sebastien; Rivera-Rodriguez, Claudia

    2018-01-01

    In resource-limited settings, long-term evaluation of national antiretroviral treatment (ART) programs often relies on aggregated data, the analysis of which may be subject to ecological bias. As researchers and policy makers consider evaluating individual-level outcomes such as treatment adherence or mortality, the well-known case-control design is appealing in that it provides efficiency gains over random sampling. In the context that motivates this article, valid estimation and inference requires acknowledging any clustering, although, to our knowledge, no statistical methods have been published for the analysis of case-control data for which the underlying population exhibits clustering. Furthermore, in the specific context of an ongoing collaboration in Malawi, rather than performing case-control sampling across all clinics, case-control sampling within clinics has been suggested as a more practical strategy. To our knowledge, although similar outcome-dependent sampling schemes have been described in the literature, a case-control design specific to correlated data settings is new. In this article, we describe this design, discuss balanced versus unbalanced sampling techniques, and provide a general approach to analyzing case-control studies in cluster-correlated settings based on inverse probability-weighted generalized estimating equations. Inference is based on a robust sandwich estimator with correlation parameters estimated to ensure appropriate accounting of the outcome-dependent sampling scheme. We conduct comprehensive simulations, based in part on real data on a sample of N = 78,155 program registrants in Malawi between 2005 and 2007, to evaluate small-sample operating characteristics and potential trade-offs associated with standard case-control sampling or when case-control sampling is performed within clusters.
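The core of the design, inverse-probability weighting with a cluster-robust variance, can be sketched in NumPy. The clinic data below are simulated, and the estimator is a simplified stand-in for the paper's weighted generalized estimating equations:

```python
import numpy as np

def ipw_cluster_estimate(y, w, cluster):
    """IPW estimate of a prevalence with a cluster-robust (sandwich-style)
    standard error: weighted scores are summed within clusters before
    squaring, so within-clinic correlation inflates the variance."""
    y, w, cluster = map(np.asarray, (y, w, cluster))
    mu = np.sum(w * y) / np.sum(w)
    score = w * (y - mu)
    cluster_sums = np.array([score[cluster == c].sum()
                             for c in np.unique(cluster)])
    se = np.sqrt(np.sum(cluster_sums ** 2)) / np.sum(w)
    return mu, se

# simulated registry: 40 clinics, 50 registrants each, 20% true prevalence
rng = np.random.default_rng(4)
cluster = np.repeat(np.arange(40), 50)
y = (rng.random(2000) < 0.2).astype(int)

# case-control sampling within clinics: keep every case, ~25% of controls
keep = (y == 1) | (rng.random(2000) < 0.25)
w = np.where(y[keep] == 1, 1.0, 1.0 / 0.25)   # inverse sampling probabilities
mu, se = ipw_cluster_estimate(y[keep], w, cluster[keep])
naive = y[keep].mean()                        # biased without weighting
```

The unweighted mean of the sampled data badly overstates the prevalence, while the inverse-probability weights recover it.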

  2. Methodological approaches in analysing observational data: A practical example on how to address clustering and selection bias.

    PubMed

    Trutschel, Diana; Palm, Rebecca; Holle, Bernhard; Simon, Michael

    2017-11-01

Because not every scientific question on effectiveness can be answered with randomised controlled trials, research methods that minimise bias in observational studies are required. Two major concerns influence the internal validity of effect estimates: selection bias and clustering. Hence, to reduce the bias of the effect estimates, more sophisticated statistical methods are needed. We aim to introduce statistical approaches such as propensity score matching and mixed models into a representative real-world analysis, and we present the implementation in the statistical software R so that the results can be reproduced. We perform a two-level analytic strategy to address the problems of bias and clustering: (i) generalised models with different abilities to adjust for dependencies are used to analyse binary data and (ii) the genetic matching and covariate adjustment methods are used to adjust for selection bias. Hence, we analyse the data from two population samples: the sample produced by the matching method and the full sample. The different analysis methods in this article produce different results but still point in the same direction. In our example, the estimated probability of receiving a case conference is higher in the treatment group than in the control group. Both strategies, genetic matching and covariate adjustment, have their limitations but complement each other to provide the whole picture. The statistical approaches were feasible for reducing bias but were nevertheless limited by the sample used. For each study and obtained sample, the pros and cons of the different methods have to be weighed. Copyright © 2017 The Author(s). Published by Elsevier Ltd. All rights reserved.
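A minimal propensity score matching sketch, using greedy nearest-neighbour matching on a logistic score as a simpler stand-in for the genetic matching used in the paper; the covariate data are simulated:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ps_match(X, treated):
    """Greedy 1:1 nearest-neighbour matching (without replacement) on a
    logistic-regression propensity score."""
    ps = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]
    t_idx = np.flatnonzero(treated == 1)
    c_idx = np.flatnonzero(treated == 0)
    used, pairs = set(), []
    for i in t_idx:
        # controls ordered by closeness of propensity score
        for j in c_idx[np.argsort(np.abs(ps[c_idx] - ps[i]))]:
            if j not in used:
                used.add(j)
                pairs.append((i, j))
                break
    return np.array(pairs)

# simulated cohort: treatment more likely at higher covariate values
rng = np.random.default_rng(5)
x = rng.normal(size=500)
treated = (rng.random(500) < 1 / (1 + np.exp(-(1.5 * x - 1)))).astype(int)
pairs = ps_match(x[:, None], treated)

before = x[treated == 1].mean() - x[treated == 0].mean()
after = x[pairs[:, 0]].mean() - x[pairs[:, 1]].mean()  # imbalance after matching
```

Matching shrinks the covariate imbalance between the groups, which is precisely the selection-bias reduction the abstract describes.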

  3. Automated Classification and Analysis of Non-metallic Inclusion Data Sets

    NASA Astrophysics Data System (ADS)

    Abdulsalam, Mohammad; Zhang, Tongsheng; Tan, Jia; Webler, Bryan A.

    2018-05-01

The aim of this study is to utilize principal component analysis (PCA), clustering methods, and correlation analysis to condense and examine large, multivariate data sets produced from automated analysis of non-metallic inclusions. Non-metallic inclusions play a major role in defining the properties of steel, and their examination has been greatly aided by automated analysis in scanning electron microscopes equipped with energy dispersive X-ray spectroscopy. The methods were applied to analyze inclusions on two sets of samples: two laboratory-scale samples and four industrial samples from near-finished 4140 alloy steel components with varying machinability. The laboratory samples had well-defined inclusion chemistries, composed of MgO-Al2O3-CaO, spinel (MgO-Al2O3), and calcium aluminate inclusions. The industrial samples contained MnS inclusions as well as (Ca,Mn)S + calcium aluminate oxide inclusions. PCA could be used to reduce inclusion chemistry variables to a 2D plot, which revealed inclusion chemistry groupings in the samples. Clustering methods were used to automatically classify inclusion chemistry measurements into groups, i.e., no user-defined rules were required.
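The PCA-plus-clustering pipeline can be sketched with scikit-learn. The composition matrix below is a synthetic stand-in with three planted inclusion families (column order and values are assumptions for illustration, not the authors' measurements):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# synthetic composition matrix: rows = inclusions, columns = element
# fractions (assumed order Mg, Al, Ca, Mn, S); three planted families
rng = np.random.default_rng(6)
spinel = rng.normal([0.30, 0.60, 0.05, 0.00, 0.05], 0.02, size=(60, 5))
ca_alum = rng.normal([0.05, 0.55, 0.35, 0.00, 0.05], 0.02, size=(60, 5))
mns = rng.normal([0.00, 0.05, 0.05, 0.45, 0.45], 0.02, size=(60, 5))
comps = np.vstack([spinel, ca_alum, mns])

# condense the chemistry to a 2-D PCA plot, then classify without rules
scores = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(comps))
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scores)
```

The cluster labels recover the planted chemistry families automatically, mirroring the rule-free classification described in the abstract.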

  4. Nonlinear inversion of electrical resistivity imaging using pruning Bayesian neural networks

    NASA Astrophysics Data System (ADS)

    Jiang, Fei-Bo; Dai, Qian-Wei; Dong, Li

    2016-06-01

Conventional artificial neural networks used to solve the electrical resistivity imaging (ERI) inversion problem suffer from overfitting and local minima. To address these problems, we propose a pruning Bayesian neural network (PBNN) nonlinear inversion method together with a sample design method based on the K-medoids clustering algorithm. In the sample design method, the training samples of the neural network are designed according to the prior information provided by the K-medoids clustering results; thus, the training process of the neural network is well guided. The proposed PBNN, based on Bayesian regularization, selects the hidden layer structure by assessing the effect of each hidden neuron on the inversion results. Then, the hyperparameter αk, which is based on the generalized mean, is chosen to guide the pruning process according to the prior distribution of the training samples under the small-sample condition. The proposed algorithm is more efficient than other common adaptive regularization methods in geophysics. The inversion of synthetic data and field data suggests that the proposed method suppresses noise in the neural network training stage and enhances generalization. The inversion results with the proposed method are better than those of the BPNN, RBFNN, and RRBFNN inversion methods as well as the conventional least squares inversion.
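A plain alternating K-medoids routine, of the kind such a sample design step might use, can be sketched in NumPy. This is a generic sketch with farthest-point initialisation, not the authors' implementation:

```python
import numpy as np

def k_medoids(X, k, n_iter=20):
    """Alternating K-medoids: assign each point to its nearest medoid, then
    move each medoid to the cluster member minimising the total
    within-cluster distance."""
    d = np.linalg.norm(X[:, None] - X[None, :], axis=-1)  # pairwise distances
    idx = [0]
    for _ in range(k - 1):                # farthest-point seeding
        idx.append(int(np.argmax(d[:, idx].min(axis=1))))
    medoids = np.array(idx)
    for _ in range(n_iter):
        labels = np.argmin(d[:, medoids], axis=1)
        new = []
        for j in range(k):
            members = np.flatnonzero(labels == j)
            new.append(members[np.argmin(d[np.ix_(members, members)].sum(axis=1))])
        new = np.array(new)
        if np.array_equal(new, medoids):  # converged
            break
        medoids = new
    return medoids, labels

rng = np.random.default_rng(9)
X = np.vstack([rng.normal(c, 0.3, size=(30, 2))
               for c in ((0.0, 0.0), (5.0, 5.0), (0.0, 5.0))])
medoids, labels = k_medoids(X, k=3)
```

Unlike K-means centroids, the medoids are actual data points, which is what makes them usable directly as representative training samples.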

  5. The association between content of the elements S, Cl, K, Fe, Cu, Zn and Br in normal and cirrhotic liver tissue from Danes and Greenlandic Inuit examined by dual hierarchical clustering analysis.

    PubMed

    Laursen, Jens; Milman, Nils; Pind, Niels; Pedersen, Henrik; Mulvad, Gert

    2014-01-01

Meta-analysis of previous studies evaluating associations between the content of the elements sulphur (S), chlorine (Cl), potassium (K), iron (Fe), copper (Cu), zinc (Zn) and bromine (Br) in normal and cirrhotic autopsy liver tissue samples. Normal liver samples were obtained from 45 Greenlandic Inuit (median age 60 years) and from 71 Danes (median age 61 years); cirrhotic liver samples were obtained from 27 Danes (median age 71 years). Element content was measured using X-ray fluorescence spectrometry. Dual hierarchical clustering analysis created two dendrograms: one clustering subjects according to calculated similarities in element content, the other clustering elements according to the correlation coefficients between the element contents, both using Euclidean distance and the Ward procedure. The first dendrogram separated subjects into 7 clusters showing no differences in ethnicity, gender or age, and the analysis discriminated between elements in normal and cirrhotic livers. The second dendrogram grouped the elements into four clusters: sulphur and chlorine; copper and bromine; potassium and zinc; and iron. There were significant correlations between the elements in normal liver samples: S was associated with Cl, K, Br and Zn; Cl with S and Br; K with S, Br and Zn; Cu with Br; Zn with S and K; and Br with S, Cl, K and Cu. Fe showed no significant association with any other element. In contrast to simple statistical methods, which analyse the content of each element separately, dual hierarchical clustering analysis incorporates all elements at the same time and can be used to examine the linkage and interplay between multiple elements in tissue samples. Copyright © 2013 Elsevier GmbH. All rights reserved.
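The dual clustering idea, one linkage over subjects and one over elements, can be sketched with SciPy. The element matrix below is synthetic, and the element dendrogram here uses 1 − correlation with average linkage as a stand-in:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# synthetic stand-in for the liver data: rows = subjects, columns = the
# seven elements S, Cl, K, Fe, Cu, Zn, Br
rng = np.random.default_rng(7)
data = rng.lognormal(mean=0.0, sigma=0.5, size=(40, 7))

# one dendrogram over subjects (Euclidean distance, Ward procedure) ...
subj_link = linkage(data, method="ward", metric="euclidean")
subj_clusters = fcluster(subj_link, t=7, criterion="maxclust")

# ... and one over elements, using 1 - correlation with average linkage
elem_link = linkage(pdist(data.T, metric="correlation"), method="average")
elem_clusters = fcluster(elem_link, t=4, criterion="maxclust")
```

Cutting the subject tree at 7 clusters and the element tree at 4 mirrors the two groupings reported in the abstract.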

  6. Reconstruction of a digital core containing clay minerals based on a clustering algorithm.

    PubMed

    He, Yanlong; Pu, Chunsheng; Jing, Cheng; Gu, Xiaoyu; Chen, Qingdong; Liu, Hongzhi; Khan, Nasir; Dong, Qiaoling

    2017-10-01

It is difficult to obtain core samples and supporting information for digital core reconstruction of mature sandstone reservoirs around the world, especially for unconsolidated sandstone reservoirs. Reconstruction and division of clay minerals play a vital role in building digital cores, and reconstruction methods based on two-dimensional data are particularly suitable for simulating the microstructure of sandstone reservoirs. However, reconstructing the various clay minerals within a digital core remains challenging. In the present work, the clay mineral content was considered on the basis of two-dimensional information about the reservoir. Application of the hybrid method, compared against a model reconstructed by the process-based method, yielded a digital core containing clay clusters without labels for the clusters' number, size, or texture; the statistics and geometry of the reconstructed model were similar to those of the reference model. In addition, the Hoshen-Kopelman algorithm was used to label the connected, unclassified clay clusters in the initial model, recording the number and size of the clay clusters. The K-means clustering algorithm was then applied to divide the labeled large connected clusters into smaller clusters on the basis of differences in the clusters' characteristics. According to the clay minerals' characteristics, such as type, texture, and distribution, the digital core containing clay minerals was reconstructed by means of the clustering algorithm and a judgment of the clay clusters' structure. The distributions and textures of the clay minerals in the digital core were reasonable. The clustering algorithm improved the digital core reconstruction and provides an alternative method for simulating different clay minerals in digital cores.
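Connected-cluster labelling in the Hoshen-Kopelman spirit, followed by a K-means split of the largest cluster, can be sketched on a toy 2-D grid (the paper's digital cores are 3-D; `scipy.ndimage.label` stands in for the labelling step):

```python
import numpy as np
from scipy import ndimage
from sklearn.cluster import KMeans

# toy 2-D occupancy grid (1 = clay voxel); the paper's cores are 3-D
grid = np.zeros((20, 20), dtype=int)
grid[2:6, 2:6] = 1        # a compact clay cluster
grid[10:12, 3:15] = 1     # an elongated clay cluster

# Hoshen-Kopelman-style labelling of connected clay clusters
labels, n_clusters = ndimage.label(grid)
sizes = ndimage.sum(grid, labels, index=range(1, n_clusters + 1))

# split the largest connected cluster into two smaller ones with K-means
big = int(np.argmax(sizes)) + 1
coords = np.argwhere(labels == big)
sub = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(coords)
```

The labelling pass records cluster number and size; the K-means pass then subdivides overly large connected clusters, as in the workflow the abstract describes.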

  7. A multimembership catalogue for 1876 open clusters using UCAC4 data

    NASA Astrophysics Data System (ADS)

    Sampedro, L.; Dias, W. S.; Alfaro, E. J.; Monteiro, H.; Molino, A.

    2017-10-01

    The main objective of this work is to determine the cluster members of 1876 open clusters, using positions and proper motions of the astrometric fourth United States Naval Observatory (USNO) CCD Astrograph Catalog (UCAC4). For this purpose, we apply three different methods, all based on a Bayesian approach, but with different formulations: a purely parametric method, another completely non-parametric algorithm and a third, recently developed by Sampedro & Alfaro, using both formulations at different steps of the whole process. The first and second statistical moments of the members' phase-space subspace, obtained after applying the three methods, are compared for every cluster. Although, on average, the three methods yield similar results, there are also specific differences between them, as well as for some particular clusters. The comparison with other published catalogues shows good agreement. We have also estimated, for the first time, the mean proper motion for a sample of 18 clusters. The results are organized in a single catalogue formed by two main files, one with the most relevant information for each cluster, partially including that in UCAC4, and the other showing the individual membership probabilities for each star in the cluster area. The final catalogue, with an interface design that enables an easy interaction with the user, is available in electronic format at the Stellar Systems Group (SSG-IAA) web site (http://ssg.iaa.es/en/content/sampedro-cluster-catalog).
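Membership probabilities of the parametric flavour can be sketched with a two-component Gaussian mixture in the proper-motion plane. The data below are synthetic, and the model is a generic stand-in for the catalogue's Bayesian membership methods:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# synthetic proper-motion plane: a tight cluster population superposed on
# a broad field population (stand-in for UCAC4 proper motions)
rng = np.random.default_rng(10)
cluster_stars = rng.normal([-4.0, 2.0], 0.3, size=(80, 2))
field_stars = rng.normal([0.0, 0.0], 3.0, size=(220, 2))
pm = np.vstack([cluster_stars, field_stars])

gmm = GaussianMixture(n_components=2, random_state=0).fit(pm)
probs = gmm.predict_proba(pm)
# take the tighter component (smaller covariance trace) as the cluster
cluster_comp = int(np.argmin([np.trace(c) for c in gmm.covariances_]))
p_member = probs[:, cluster_comp]     # per-star membership probability
```

Stars drawn from the tight component receive membership probabilities near 1, while field stars are strongly down-weighted.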

  8. Towards Tunable Consensus Clustering for Studying Functional Brain Connectivity During Affective Processing.

    PubMed

    Liu, Chao; Abu-Jamous, Basel; Brattico, Elvira; Nandi, Asoke K

    2017-03-01

    In the past decades, neuroimaging of humans has gained a position of status within neuroscience, and data-driven approaches and functional connectivity analyses of functional magnetic resonance imaging (fMRI) data are increasingly favored to depict the complex architecture of human brains. However, the reliability of these findings is jeopardized by too many analysis methods and sometimes too few samples used, which leads to discord among researchers. We propose a tunable consensus clustering paradigm that aims at overcoming the clustering methods selection problem as well as reliability issues in neuroimaging by means of first applying several analysis methods (three in this study) on multiple datasets and then integrating the clustering results. To validate the method, we applied it to a complex fMRI experiment involving affective processing of hundreds of music clips. We found that brain structures related to visual, reward, and auditory processing have intrinsic spatial patterns of coherent neuroactivity during affective processing. The comparisons between the results obtained from our method and those from each individual clustering algorithm demonstrate that our paradigm has notable advantages over traditional single clustering algorithms in being able to evidence robust connectivity patterns even with complex neuroimaging data involving a variety of stimuli and affective evaluations of them. The consensus clustering method is implemented in the R package "UNCLES" available on http://cran.r-project.org/web/packages/UNCLES/index.html .

  9. Unsupervised discovery of microbial population structure within metagenomes using nucleotide base composition

    PubMed Central

    Saeed, Isaam; Tang, Sen-Lin; Halgamuge, Saman K.

    2012-01-01

    An approach to infer the unknown microbial population structure within a metagenome is to cluster nucleotide sequences based on common patterns in base composition, otherwise referred to as binning. When functional roles are assigned to the identified populations, a deeper understanding of microbial communities can be attained, more so than gene-centric approaches that explore overall functionality. In this study, we propose an unsupervised, model-based binning method with two clustering tiers, which uses a novel transformation of the oligonucleotide frequency-derived error gradient and GC content to generate coarse groups at the first tier of clustering; and tetranucleotide frequency to refine these groups at the secondary clustering tier. The proposed method has a demonstrated improvement over PhyloPythia, S-GSOM, TACOA and TaxSOM on all three benchmarks that were used for evaluation in this study. The proposed method is then applied to a pyrosequenced metagenomic library of mud volcano sediment sampled in southwestern Taiwan, with the inferred population structure validated against complementary sequencing of 16S ribosomal RNA marker genes. Finally, the proposed method was further validated against four publicly available metagenomes, including a highly complex Antarctic whale-fall bone sample, which was previously assumed to be too complex for binning prior to functional analysis. PMID:22180538
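The composition features underlying such binning, GC content and tetranucleotide frequencies, can be computed with a short routine. This sketches the features only, not the authors' two-tier model-based binner:

```python
import numpy as np
from itertools import product

def composition_features(seq):
    """GC content and the normalised tetranucleotide frequency vector
    (256 dimensions) for one nucleotide sequence."""
    kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    index = {k: i for i, k in enumerate(kmers)}
    counts = np.zeros(len(kmers))
    for i in range(len(seq) - 3):
        k = seq[i:i + 4]
        if k in index:            # skip windows with ambiguous bases
            counts[index[k]] += 1
    total = counts.sum()
    freqs = counts / total if total else counts
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    return gc, freqs

gc, freqs = composition_features("ATGCGC" * 50)
```

Each contig is thus mapped to a point in a composition space, and clustering these points is what groups sequences into putative populations.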

  10. A guide to enterotypes across the human body: meta-analysis of microbial community structures in human microbiome datasets.

    PubMed

    Koren, Omry; Knights, Dan; Gonzalez, Antonio; Waldron, Levi; Segata, Nicola; Knight, Rob; Huttenhower, Curtis; Ley, Ruth E

    2013-01-01

Recent analyses of human-associated bacterial diversity have categorized individuals into 'enterotypes' or clusters based on the abundances of key bacterial genera in the gut microbiota. There is a lack of consensus, however, on the analytical basis for enterotypes and on the interpretation of these results. We tested how the following factors influenced the detection of enterotypes: clustering methodology, distance metrics, OTU-picking approaches, sequencing depth, data type (whole genome shotgun (WGS) vs. 16S rRNA gene sequence data), and 16S rRNA region. We included 16S rRNA gene sequences from the Human Microbiome Project (HMP) and from 16 additional studies and WGS sequences from the HMP and MetaHIT. In most body sites, we observed smooth abundance gradients of key genera without discrete clustering of samples. Some body habitats displayed bimodal (e.g., gut) or multimodal (e.g., vagina) distributions of sample abundances, but not all clustering methods and workflows accurately highlight such clusters. Because identifying enterotypes in datasets depends not only on the structure of the data but is also sensitive to the methods applied to identifying clustering strength, we recommend that multiple approaches be used and compared when testing for enterotypes.
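One way to test for discrete clusters versus smooth gradients is to compare a cluster-strength score across candidate cluster numbers. A silhouette-based sketch on synthetic, deliberately bimodal data:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# synthetic, deliberately bimodal "abundance" data, standing in for a
# body site that separates into two community types
X, _ = make_blobs(n_samples=200, centers=2, cluster_std=0.5, random_state=0)

scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)
best_k = max(scores, key=scores.get)   # cluster number with strongest support
```

On genuinely gradient-like data the silhouette stays low for every k, which is the signature of "no enterotypes" the abstract warns can be masked by a single clustering workflow.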

  11. A Guide to Enterotypes across the Human Body: Meta-Analysis of Microbial Community Structures in Human Microbiome Datasets

    PubMed Central

    Waldron, Levi; Segata, Nicola; Knight, Rob; Huttenhower, Curtis; Ley, Ruth E.

    2013-01-01

Recent analyses of human-associated bacterial diversity have categorized individuals into ‘enterotypes’ or clusters based on the abundances of key bacterial genera in the gut microbiota. There is a lack of consensus, however, on the analytical basis for enterotypes and on the interpretation of these results. We tested how the following factors influenced the detection of enterotypes: clustering methodology, distance metrics, OTU-picking approaches, sequencing depth, data type (whole genome shotgun (WGS) vs. 16S rRNA gene sequence data), and 16S rRNA region. We included 16S rRNA gene sequences from the Human Microbiome Project (HMP) and from 16 additional studies and WGS sequences from the HMP and MetaHIT. In most body sites, we observed smooth abundance gradients of key genera without discrete clustering of samples. Some body habitats displayed bimodal (e.g., gut) or multimodal (e.g., vagina) distributions of sample abundances, but not all clustering methods and workflows accurately highlight such clusters. Because identifying enterotypes in datasets depends not only on the structure of the data but is also sensitive to the methods applied to identifying clustering strength, we recommend that multiple approaches be used and compared when testing for enterotypes. PMID:23326225

  12. Mixed Pattern Matching-Based Traffic Abnormal Behavior Recognition

    PubMed Central

    Cui, Zhiming; Zhao, Pengpeng

    2014-01-01

A motion trajectory is an intuitive time-space representation of the micromotion behavior of a moving target, and trajectory analysis is an important approach to recognizing abnormal behaviors of moving targets. To address the complexity of vehicle trajectories, this paper first proposes a trajectory pattern learning method based on dynamic time warping (DTW) and spectral clustering: the DTW distance measures the distances between vehicle trajectories, and a spectral clustering algorithm applied to the resulting distance matrix determines the number of clusters automatically and groups the sample trajectories into clusters. After spatial patterns and direction patterns are learned from the clusters, a recognition method for detecting abnormal vehicle behaviors based on mixed pattern matching is proposed. The experimental results show that the proposed scheme recognizes the main types of abnormal traffic behaviors effectively and is robust; a real-world application verified its feasibility and validity.
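The DTW distance used to compare trajectories can be written as a small dynamic program. This is the textbook 1-D formulation, without the windowing refinements a production system might add:

```python
import numpy as np

def dtw(a, b):
    """Dynamic time warping distance between two 1-D trajectories
    (absolute-difference local cost, no warping window)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible warping steps
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

t = np.linspace(0, 2 * np.pi, 40)
d_self = dtw(np.sin(t), np.sin(t))
d_cross = dtw(np.sin(t), np.cos(t))
```

Computing this distance for every pair of trajectories yields the distance matrix on which the spectral clustering step operates.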

  13. Water quality analysis of the Rapur area, Andhra Pradesh, South India using multivariate techniques

    NASA Astrophysics Data System (ADS)

    Nagaraju, A.; Sreedhar, Y.; Thejaswi, A.; Sayadi, Mohammad Hossein

    2017-10-01

The groundwater samples from the Rapur area were collected from different sites to evaluate the major ion chemistry. The large number of data can lead to difficulties in the integration, interpretation, and representation of the results. Two multivariate statistical methods, hierarchical cluster analysis (HCA) and factor analysis (FA), were applied to evaluate their usefulness for classifying and identifying the geochemical processes controlling groundwater geochemistry. Four statistically significant clusters were obtained from 30 sampling stations, including two important clusters: cluster 1 (pH, Si, CO3, Mg, SO4, Ca, K, HCO3, alkalinity, Na, Na + K, Cl, and hardness) and cluster 2 (EC and TDS), which are released to the study area from different sources. The application of multivariate statistical techniques, such as principal component analysis (PCA), assists in the interpretation of complex data matrices for a better understanding of the water quality of a study area. From the PCA, the first factor (factor 1), accounting for 36.2% of the total variance, had high positive loadings on EC, Mg, Cl, TDS, and hardness. Based on the PCA scores, four significant cluster groups of sampling locations were detected on the basis of the similarity of their water quality.

  14. Clustering of attitudes towards obesity: a mixed methods study of Australian parents and children

    PubMed Central

    2013-01-01

    Background Current population-based anti-obesity campaigns often target individuals based on either weight or socio-demographic characteristics, and give a ‘mass’ message about personal responsibility. There is a recognition that attempts to influence attitudes and opinions may be more effective if they resonate with the beliefs that different groups have about the causes of, and solutions for, obesity. Limited research has explored how attitudinal factors may inform the development of both upstream and downstream social marketing initiatives. Methods Computer-assisted face-to-face interviews were conducted with 159 parents and 184 of their children (aged 9–18 years old) in two Australian states. A mixed methods approach was used to assess attitudes towards obesity, and elucidate why different groups held various attitudes towards obesity. Participants were quantitatively assessed on eight dimensions relating to the severity and extent, causes and responsibility, possible remedies, and messaging strategies. Cluster analysis was used to determine attitudinal clusters. Participants were also able to qualify each answer. Qualitative responses were analysed both within and across attitudinal clusters using a constant comparative method. Results Three clusters were identified. Concerned Internalisers (27% of the sample) judged that obesity was a serious health problem, that Australia had among the highest levels of obesity in the world and that prevalence was rapidly increasing. They situated the causes and remedies for the obesity crisis in individual choices. Concerned Externalisers (38% of the sample) held similar views about the severity and extent of the obesity crisis. However, they saw responsibility and remedies as a societal rather than an individual issue. 
The final cluster, the Moderates, which contained significantly more children and males, believed that obesity was not such an important public health issue, and judged the extent of obesity to be less extreme than the other clusters. Conclusion Attitudinal clusters provide new information and insights which may be useful in tailoring anti-obesity social marketing initiatives. PMID:24119724

  15. CHEERS: The chemical evolution RGS sample

    NASA Astrophysics Data System (ADS)

    de Plaa, J.; Kaastra, J. S.; Werner, N.; Pinto, C.; Kosec, P.; Zhang, Y.-Y.; Mernier, F.; Lovisari, L.; Akamatsu, H.; Schellenberger, G.; Hofmann, F.; Reiprich, T. H.; Finoguenov, A.; Ahoranta, J.; Sanders, J. S.; Fabian, A. C.; Pols, O.; Simionescu, A.; Vink, J.; Böhringer, H.

    2017-11-01

    Context. The chemical yields of supernovae and the metal enrichment of the intra-cluster medium (ICM) are not well understood. The hot gas in clusters of galaxies has been enriched with metals originating from billions of supernovae and provides a fair sample of large-scale metal enrichment in the Universe. High-resolution X-ray spectra of clusters of galaxies provide a unique way of measuring abundances in the hot intracluster medium (ICM). The abundance measurements can provide constraints on the supernova explosion mechanism and the initial-mass function of the stellar population. This paper introduces the CHEmical Enrichment RGS Sample (CHEERS), which is a sample of 44 bright local giant ellipticals, groups, and clusters of galaxies observed with XMM-Newton. Aims: The CHEERS project aims to provide the most accurate set of cluster abundances measured in X-rays using this sample. This paper focuses specifically on the abundance measurements of O and Fe using the reflection grating spectrometer (RGS) on board XMM-Newton. We aim to thoroughly discuss the cluster to cluster abundance variations and the robustness of the measurements. Methods: We have selected the CHEERS sample such that the oxygen abundance in each cluster is detected at a level of at least 5σ in the RGS. The dispersive nature of the RGS limits the sample to clusters with sharp surface brightness peaks. The deep exposures and the size of the sample allow us to quantify the intrinsic scatter and the systematic uncertainties in the abundances using spectral modeling techniques. Results: We report the oxygen and iron abundances as measured with RGS in the core regions of all 44 clusters in the sample. We do not find a significant trend of O/Fe as a function of cluster temperature, but we do find an intrinsic scatter in the O and Fe abundances from cluster to cluster. 
The level of systematic uncertainties in the O/Fe ratio is estimated to be around 20-30%, while the systematic uncertainties in the absolute O and Fe abundances can be as high as 50% in extreme cases. Thanks to the high statistics of the observations, we were able to identify and correct a systematic bias in the oxygen abundance determination that was due to an inaccuracy in the spectral model. Conclusions: The lack of dependence of O/Fe on temperature suggests that the enrichment of the ICM does not depend on cluster mass and that most of the enrichment likely took place before the ICM was formed. We find that the observed scatter in the O/Fe ratio is due to a combination of intrinsic scatter in the source and systematic uncertainties in the spectral fitting, which we are unable to separate. The astrophysical source of intrinsic scatter could be due to differences in active galactic nucleus activity and ongoing star formation in the brightest cluster galaxy. The systematic scatter is due to uncertainties in the spatial line broadening, absorption column, multi-temperature structure, and the thermal plasma models.

  16. The Atacama Cosmology Telescope: Sunyaev-Zel'dovich-Selected Galaxy Clusters AT 148 GHz in the 2008 Survey

    NASA Technical Reports Server (NTRS)

Marriage, Tobias A.; Acquaviva, Viviana; Ade, Peter A. R.; Aguirre, Paula; Amiri, Mandana; Appel, John William; Barrientos, L. Felipe; Battistelli, Elia S.; Bond, J. Richard; Brown, Ben; et al.

    2011-01-01

We report on 23 clusters detected blindly as Sunyaev-Zel'dovich (SZ) decrements in a 148 GHz, 455 deg² map of the southern sky made with data from the Atacama Cosmology Telescope 2008 observing season. All SZ detections announced in this work have confirmed optical counterparts. Ten of the clusters are new discoveries. One newly discovered cluster, ACT-CL J0102-4915, with a redshift of 0.75 (photometric), has an SZ decrement comparable to the most massive systems at lower redshifts. Simulations of the cluster recovery method reproduce the sample purity measured by optical follow-up. In particular, for clusters detected with a signal-to-noise ratio greater than six, simulations are consistent with optical follow-up that demonstrated this subsample is 100% pure. The simulations further imply that the total sample is 80% complete for clusters with mass in excess of 6 × 10^14 solar masses referenced to the cluster volume characterized by 500 times the critical density. The Compton y-X-ray luminosity mass comparison for the 11 best-detected clusters visually agrees with both self-similar and non-adiabatic, simulation-derived scaling laws.

  17. X-ray morphological study of the ESZ sample

    NASA Astrophysics Data System (ADS)

    Lovisari, L.; Forman, W.; Jones, C.; Andrade-Santos, F.; Democles, J.; Pratt, G.; Ettori, S.; Arnaud, M.; Randall, S.; Kraft, R.

    2017-10-01

    An accurate knowledge of the scaling relations between X-ray observables and cluster mass is a crucial step for studies that aim to constrain cosmological parameters using galaxy clusters. Measuring the dynamical state of the systems provides important information for obtaining precise scaling relations and understanding their scatter. Unfortunately, characterizing the dynamical state of a galaxy cluster requires access to a large set of multi-wavelength information, which is available only for a few individual systems. An alternative is to compute well-defined morphological parameters from the relatively cheap X-ray images and profiles. Because of projection effects, no single method performs well in all cases, and a combination of them is more effective at quantifying the level of substructure. I will present the cluster morphologies that we derived for the ESZ sample. I will show their dependence on different cluster properties, such as total mass, redshift, and luminosity, and how they differ from those obtained for X-ray selected clusters.

  18. Substructures in DAFT/FADA survey clusters based on XMM and optical data

    NASA Astrophysics Data System (ADS)

    Durret, F.; DAFT/FADA Team

    2014-07-01

    The DAFT/FADA survey was initiated to perform weak lensing tomography on a sample of 90 massive clusters in the redshift range [0.4,0.9] with HST imaging available. The complementary deep multiband imaging constitutes a high quality imaging data base for these clusters. In X-rays, we have analysed the XMM-Newton and/or Chandra data available for 32 clusters, and for 23 clusters we fit the X-ray emissivity with a beta-model and subtract it to search for substructures in the X-ray gas. This study was coupled with a dynamical analysis for the 18 clusters with at least 15 spectroscopic galaxy redshifts in the cluster range, based on a Serna & Gerbal (SG) analysis. We detected ten substructures in eight clusters by both methods (X-rays and SG). The percentage of mass included in substructures is found to be roughly constant with redshift, with values of 5-15%. Most of the substructures detected both in X-rays and with the SG method are found to be relatively recent infalls, probably at their first cluster pericenter approach.

  19. Polymorphism in magic-sized Au144(SR)60 clusters

    DOE PAGES

    Jensen, Kirsten M. O.; Juhas, Pavol; Tofanelli, Marcus A.; ...

    2016-06-14

    Ultra-small, magic-sized metal nanoclusters represent an important new class of materials with properties between molecules and particles. However, their small size challenges the conventional methods for structure characterization. We present the structure of ultra-stable Au144(SR)60 magic-sized nanoclusters obtained from atomic pair distribution function analysis of X-ray powder diffraction data. Our study reveals structural polymorphism in these archetypal nanoclusters. In addition to the theoretically predicted icosahedral-cored cluster, we also find samples with a truncated decahedral core structure, with some samples exhibiting a coexistence of both cluster structures. Although the clusters are monodisperse in size, structural diversity is apparent. Finally, the discovery of polymorphism may open up a new dimension in nanoscale engineering.

  20. Cosmological Constraints from Galaxy Clustering and the Mass-to-number Ratio of Galaxy Clusters

    NASA Astrophysics Data System (ADS)

    Tinker, Jeremy L.; Sheldon, Erin S.; Wechsler, Risa H.; Becker, Matthew R.; Rozo, Eduardo; Zu, Ying; Weinberg, David H.; Zehavi, Idit; Blanton, Michael R.; Busha, Michael T.; Koester, Benjamin P.

    2012-01-01

    We place constraints on the average density (Ω_m) and clustering amplitude (σ_8) of matter using a combination of two measurements from the Sloan Digital Sky Survey: the galaxy two-point correlation function, w_p(r_p), and the mass-to-galaxy-number ratio within galaxy clusters, M/N, analogous to cluster M/L ratios. Our w_p(r_p) measurements are obtained from DR7, while the sample of clusters is the maxBCG sample, with cluster masses derived from weak gravitational lensing. We construct nonlinear galaxy bias models using the Halo Occupation Distribution (HOD) to fit both w_p(r_p) and M/N for different cosmological parameters. HOD models that match the same two-point clustering predict different numbers of galaxies in massive halos when Ω_m or σ_8 is varied, thereby breaking the degeneracy between cosmology and bias. We demonstrate that this technique yields constraints that are consistent and competitive with current results from cluster abundance studies, without the use of abundance information. Using w_p(r_p) and M/N alone, we find Ω_m^0.5 σ_8 = 0.465 ± 0.026, with individual constraints of Ω_m = 0.29 ± 0.03 and σ_8 = 0.85 ± 0.06. Combined with current cosmic microwave background data, these constraints are Ω_m = 0.290 ± 0.016 and σ_8 = 0.826 ± 0.020. All errors are 1σ. The systematic uncertainties to which the M/N technique is most sensitive are the amplitude of the bias function of dark matter halos and the possibility of redshift evolution between the SDSS Main sample and the maxBCG cluster sample. Our derived constraints are insensitive to the current level of uncertainties in the halo mass function and in the mass-richness relation of clusters and its scatter, making the M/N technique complementary to cluster abundances as a method for constraining cosmology with future galaxy surveys.

  1. Structural parameters of young star clusters: fractal analysis

    NASA Astrophysics Data System (ADS)

    Hetem, A.

    2017-07-01

    A unified view of star formation in the Universe demands detailed and in-depth studies of young star clusters. This work extends our previous study of fractal statistics estimated for a sample of young stellar clusters (Gregorio-Hetem et al. 2015, MNRAS 448, 2504). The structural properties can lead to significant conclusions about the early stages of cluster formation: 1) virial conditions can be used to distinguish warm collapse; 2) bound or unbound behaviour can lead to conclusions about expansion; and 3) fractal statistics are correlated with dynamical evolution and age. The error-bar estimation technique most used in the literature adopts inferential methods (such as bootstrap) to estimate deviation and variance, which are valid only for an artificially generated cluster. In this paper, we expanded the number of studied clusters in order to enhance the investigation of cluster properties and dynamical evolution. The structural parameters were compared with fractal statistics and reveal that the clusters' radial density profiles show a tendency for the mean separation of the stars to increase with the average surface density. The sample can be divided into two groups showing different dynamic behaviour, but they have the same dynamical evolution, since the entire sample was revealed to consist of expanding objects, for which the substructures do not seem to have been completely erased. These results are in agreement with simulations adopting low surface densities and supervirial conditions.

  2. Age and Mass for 920 Large Magellanic Cloud Clusters Derived from 100 Million Monte Carlo Simulations

    NASA Astrophysics Data System (ADS)

    Popescu, Bogdan; Hanson, M. M.; Elmegreen, Bruce G.

    2012-06-01

    We present new age and mass estimates for 920 stellar clusters in the Large Magellanic Cloud (LMC) based on previously published broadband photometry and the stellar cluster analysis package, MASSCLEANage. Expressed in the generic fitting formula, d^2N/dM dt ∝ M^α t^β, the distribution of observed clusters is described by α = -1.5 to -1.6 and β = -2.1 to -2.2. For 288 of these clusters, ages have recently been determined based on stellar photometric color-magnitude diagrams, allowing us to gauge the confidence of our ages. The results look very promising, opening up the possibility that this sample of 920 clusters, with reliable and consistent age, mass, and photometric measures, might be used to constrain important characteristics of the stellar cluster population in the LMC. We also investigate a traditional age determination method that uses a χ² minimization routine to fit observed cluster colors to standard infinite-mass-limit simple stellar population models. This reveals serious defects in the cluster age distribution derived with this method. The traditional χ² minimization method, due to the variation of U, B, V, R colors, will always produce an overdensity of younger and older clusters, with an underdensity of clusters in the log(age/yr) = [7.0, 7.5] range. Finally, we present a unique simulation aimed at illustrating and constraining the fading limit in observed cluster distributions that includes the complex effects of stochastic variations in the observed properties of stellar clusters.

  3. Clustering redshift distributions for the Dark Energy Survey

    NASA Astrophysics Data System (ADS)

    Helsby, Jennifer

    Accurate determination of photometric redshifts and their errors is critical for large-scale structure and weak-lensing studies that constrain cosmology with deep, wide imaging surveys. Current photometric redshift methods suffer from bias and scatter due to incomplete training sets. Exploiting the clustering between a sample of galaxies for which we have spectroscopic redshifts and a sample of galaxies for which the redshifts are unknown allows us to reconstruct the true redshift distribution of the unknown sample. Here we use this method in both simulations and early data from the Dark Energy Survey (DES) to determine the true redshift distributions of galaxies in photometric redshift bins. We find that cross-correlating with the spectroscopic samples currently used for training provides a useful test of photometric redshifts and reliable estimates of the true redshift distribution in a photometric redshift bin. We discuss the use of the cross-correlation method in validating template- or learning-based approaches to redshift estimation and its future use in Stage IV surveys.

  4. Small Sample Performance of Bias-corrected Sandwich Estimators for Cluster-Randomized Trials with Binary Outcomes

    PubMed Central

    Li, Peng; Redden, David T.

    2014-01-01

    The sandwich estimator in the generalized estimating equations (GEE) approach underestimates the true variance in small samples and consequently results in inflated type I error rates in hypothesis testing. This fact limits the application of the GEE in cluster-randomized trials (CRTs) with few clusters. Under various CRT scenarios with correlated binary outcomes, we evaluate the small-sample properties of GEE Wald tests using bias-corrected sandwich estimators. Our results suggest that the GEE Wald z test should be avoided in analyses of CRTs with few clusters, even when bias-corrected sandwich estimators are used. With a t-distribution approximation, the Kauermann and Carroll (KC) correction can keep the test size at nominal levels even when the number of clusters is as low as 10, and is robust to moderate variation of the cluster sizes. However, in cases with large variations in cluster sizes, the Fay and Graubard (FG) correction should be used instead. Furthermore, we derive a formula to calculate the power and the minimum total number of clusters needed using the t test and KC correction for CRTs with binary outcomes. The power levels predicted by the proposed formula agree well with the empirical powers from the simulations. The proposed methods are illustrated using real CRT data. We conclude that, with appropriate control of type I error rates at small sample sizes, the GEE approach is recommended for CRTs with binary outcomes because of its fewer assumptions and robustness to misspecification of the covariance structure. PMID:25345738
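    The paper's KC-based power formula is not reproduced in the abstract; as a point of reference, the standard design-effect calculation that such formulas build on can be sketched as follows (the function name and the plain z-based approximation are illustrative assumptions, not the authors' derivation):

```python
import math
from statistics import NormalDist

def clusters_per_arm(p1, p2, m, icc, alpha=0.05, power=0.8):
    """Design-effect sample-size sketch for a two-arm CRT with a binary
    outcome: inflate the individually randomized sample size by the
    design effect 1 + (m - 1) * ICC and convert to clusters of size m."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    # subjects per arm under individual randomization
    n_ind = (z_a + z_b) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2
    deff = 1 + (m - 1) * icc            # variance inflation from clustering
    return math.ceil(n_ind * deff / m)  # clusters of size m per arm
```

    The authors' formula additionally replaces the normal quantiles with t quantiles and applies the KC correction, which further raises the required number of clusters when few clusters are available.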

  5. Stability of operational taxonomic units: an important but neglected property for analyzing microbial diversity.

    PubMed

    He, Yan; Caporaso, J Gregory; Jiang, Xiao-Tao; Sheng, Hua-Fang; Huse, Susan M; Rideout, Jai Ram; Edgar, Robert C; Kopylova, Evguenia; Walters, William A; Knight, Rob; Zhou, Hong-Wei

    2015-01-01

    The operational taxonomic unit (OTU) is widely used in microbial ecology. Reproducibility in microbial ecology research depends on the reliability of OTU-based 16S ribosomal subunit RNA (rRNA) analyses. Here, we report that many hierarchical and greedy clustering methods produce unstable OTUs, with membership that depends on the number of sequences clustered. If OTUs are regenerated with additional sequences or samples, sequences originally assigned to a given OTU can be split into different OTUs. Alternatively, sequences assigned to different OTUs can be merged into a single OTU. This OTU instability affects alpha-diversity analyses such as rarefaction curves, beta-diversity analyses such as distance-based ordination (for example, Principal Coordinate Analysis (PCoA)), and the identification of differentially represented OTUs. Our results show that the proportion of unstable OTUs varies for different clustering methods. We found that the closed-reference method is the only one that produces completely stable OTUs, with the caveat that sequences that do not match a pre-existing reference sequence collection are discarded. As a compromise to the factors listed above, we propose using an open-reference method to enhance OTU stability. This type of method clusters sequences against a database and includes unmatched sequences by clustering them via a relatively stable de novo clustering method. OTU stability is an important consideration when analyzing microbial diversity and is a feature that should be taken into account during the development of novel OTU clustering methods.
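    The order dependence that drives this instability is easy to reproduce with a toy greedy clusterer (a deliberately minimal sketch using Hamming distance on short strings, not any specific OTU picker's algorithm):

```python
def hamming(a, b):
    """Number of mismatched positions between equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def greedy_cluster(seqs, max_dist):
    """Greedy de novo clustering: each sequence joins the first seed
    within max_dist, otherwise it founds a new cluster. The resulting
    partition depends on input order, which is the instability the
    paper describes."""
    seeds, labels = [], []
    for s in seqs:
        for i, c in enumerate(seeds):
            if hamming(s, c) <= max_dist:
                labels.append(i)
                break
        else:
            seeds.append(s)
            labels.append(len(seeds) - 1)
    return labels

# Same three sequences, two input orders, two different partitions:
part1 = greedy_cluster(["AAAA", "AATT", "TTTT"], max_dist=2)  # [0, 0, 1]
part2 = greedy_cluster(["AATT", "AAAA", "TTTT"], max_dist=2)  # [0, 0, 0]
```

    Here "AATT" either groups with "AAAA" or absorbs all three sequences depending purely on which sequence seeds the first cluster, mirroring the split/merge behaviour described above.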

  6. Characterization of Glutaredoxin Fe-S Cluster-Binding Interactions Using Circular Dichroism Spectroscopy.

    PubMed

    Albetel, Angela-Nadia; Outten, Caryn E

    2018-01-01

    Monothiol glutaredoxins (Grxs) with a conserved Cys-Gly-Phe-Ser (CGFS) active site are iron-sulfur (Fe-S) cluster-binding proteins that interact with a variety of partner proteins and perform crucial roles in iron metabolism including Fe-S cluster transfer, Fe-S cluster repair, and iron signaling. Various analytical and spectroscopic methods are currently being used to monitor and characterize glutaredoxin Fe-S cluster-dependent interactions at the molecular level. The electronic, magnetic, and vibrational properties of the protein-bound Fe-S cluster provide a convenient handle to probe the structure, function, and coordination chemistry of Grx complexes. However, some limitations arise from sample preparation requirements, complexity of individual techniques, or the necessity for combining multiple methods in order to achieve a complete investigation. In this chapter, we focus on the use of UV-visible circular dichroism spectroscopy as a fast and simple initial approach for investigating glutaredoxin Fe-S cluster-dependent interactions. © 2018 Elsevier Inc. All rights reserved.

  7. Hydration of Atmospheric Molecular Clusters: Systematic Configurational Sampling.

    PubMed

    Kildgaard, Jens; Mikkelsen, Kurt V; Bilde, Merete; Elm, Jonas

    2018-05-09

    We present a new systematic configurational sampling algorithm for investigating the potential energy surface of hydrated atmospheric molecular clusters. The algorithm is based on creating a Fibonacci sphere around each atom in the cluster and adding water molecules at each point in 9 different orientations. To allow the sampling of water molecules into existing hydrogen bonds, the cluster is displaced along the hydrogen bond and a water molecule is placed in between in three different orientations. Generated redundant structures are eliminated by minimizing the root mean square distance (RMSD) between different conformers. Initially, the clusters are sampled using the semiempirical PM6 method and subsequently using density functional theory (M06-2X and ωB97X-D) with the 6-31++G(d,p) basis set. Applying the developed algorithm, we study the hydration of sulfuric acid with up to 15 water molecules. We find that the addition of the first four water molecules "saturates" the sulfuric acid molecule and is more thermodynamically favourable than the addition of water molecules 5-15. Using the large generated set of conformers, we assess the performance of approximate methods (ωB97X-D, M06-2X, PW91 and PW6B95-D3) in calculating the binding energies and assigning the global minimum conformation compared to high-level CCSD(T)-F12a/VDZ-F12 reference calculations. The tested DFT functionals systematically overestimate the binding energies compared to coupled cluster calculations, and we find that this deficiency can be corrected by a simple scaling factor.
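    The Fibonacci-sphere construction itself is compact; a common golden-angle variant is sketched below (the exact offsets and scaling used by the authors may differ):

```python
import math

def fibonacci_sphere(n, radius=1.0):
    """Near-uniform points on a sphere of given radius via the
    golden-angle spiral: latitudes are equally spaced in z and
    longitudes advance by the golden angle."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n        # half-step offset avoids the poles
        r = math.sqrt(1.0 - z * z)           # radius of the latitude circle
        theta = golden_angle * i
        points.append((radius * r * math.cos(theta),
                       radius * r * math.sin(theta),
                       radius * z))
    return points
```

    In the algorithm described above, such a point set around each atom would supply the candidate positions at which trial water molecules are placed in their different orientations.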

  8. Clustering gene expression data based on predicted differential effects of GV interaction.

    PubMed

    Pan, Hai-Yan; Zhu, Jun; Han, Dan-Fu

    2005-02-01

    Microarray has become a popular biotechnology in biological and medical research. However, systematic and stochastic variabilities in microarray data are expected and unavoidable, so the raw measurements carry inherent "noise" within microarray experiments. Currently, logarithmic ratios are usually analyzed directly by various clustering methods, which may introduce biased interpretations when identifying groups of genes or samples. In this paper, a statistical method based on mixed model approaches was proposed for microarray data cluster analysis. The underlying rationale of this method is to partition the observed total gene expression level into variations caused by different factors using an ANOVA model, and to predict the differential effects of GV (gene by variety) interaction using the adjusted unbiased prediction (AUP) method. The predicted GV interaction effects can then be used as the inputs of cluster analysis. We illustrated the application of our method with a gene expression dataset and elucidated the utility of our approach using an external validation.
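    The partition-then-cluster idea can be illustrated with a fixed-effects toy version (the paper uses a mixed model with AUP prediction; the plain two-way mean decomposition below is only a stand-in for that predictor):

```python
import numpy as np

def gv_interaction_effects(y):
    """Estimate gene-by-variety interaction effects from a genes x
    varieties expression matrix by removing the grand mean and the
    gene and variety main effects (two-way ANOVA decomposition)."""
    grand = y.mean()
    gene = y.mean(axis=1, keepdims=True)     # gene main effects + grand
    variety = y.mean(axis=0, keepdims=True)  # variety main effects + grand
    return y - gene - variety + grand
```

    The returned matrix (rows = genes) would then feed a standard clustering routine in place of the raw log-ratios, so that main effects of gene and variety no longer dominate the grouping.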

  9. Galaxy properties in clusters. II. Backsplash galaxies

    NASA Astrophysics Data System (ADS)

    Muriel, H.; Coenda, V.

    2014-04-01

    Aims: We explore the properties of galaxies on the outskirts of clusters and their dependence on recent dynamical history in order to understand the real impact that the cluster core has on the evolution of galaxies. Methods: We analyse the properties of more than 1000 galaxies brighter than M_0.1r = -19.6 on the outskirts of 90 clusters (1 < r/r_vir < 2) in the redshift range 0.05 < z < 0.10. Using the line-of-sight velocity of galaxies relative to the cluster's mean, we selected low- and high-velocity subsamples. Theoretical predictions indicate that a significant fraction of the first subsample should be backsplash galaxies, that is, objects that have already orbited near the cluster centre. A significant proportion of the sample of high relative velocity (HV) galaxies seems to be composed of infalling objects. Results: Our results suggest that, at fixed stellar mass, late-type galaxies in the low-velocity (LV) sample are systematically older, redder, and have formed fewer stars during the last 3 Gyr than galaxies in the HV sample. This result is consistent with models that assume that the central regions of clusters are effective in quenching star formation by means of processes such as ram pressure stripping or strangulation. At fixed stellar mass, LV galaxies show some evidence of having higher surface brightness and smaller size than HV galaxies. These results are consistent with the scenario where galaxies that have orbited the central regions of clusters are more likely to suffer tidal effects, producing loss of mass as well as a redistribution of matter towards more compact configurations. Finally, we found a higher fraction of early-type galaxies in the LV sample, supporting the idea that the central region of clusters of galaxies may contribute to the transformation of morphological types towards earlier types.

  10. Fast clustering algorithm for large ECG data sets based on CS theory in combination with PCA and K-NN methods.

    PubMed

    Balouchestani, Mohammadreza; Krishnan, Sridhar

    2014-01-01

    Long-term recording of electrocardiogram (ECG) signals plays an important role in health care systems for diagnostic and treatment purposes of heart diseases. Clustering and classification of the collected data are essential for detecting concealed information of P-QRS-T waves in long-term ECG recordings. Currently used algorithms have their share of drawbacks: 1) clustering and classification cannot be done in real time; 2) they suffer from huge energy consumption and sampling load. These drawbacks motivated us to develop a novel optimized clustering algorithm that can easily scan large ECG datasets for establishing low-power long-term ECG recording. In this paper, we present an advanced K-means clustering algorithm based on Compressed Sensing (CS) theory as a random sampling procedure. Two dimensionality reduction methods, Principal Component Analysis (PCA) and Linear Correlation Coefficient (LCC), followed by classification with the K-Nearest Neighbours (K-NN) and Probabilistic Neural Network (PNN) classifiers, are then applied to the proposed algorithm. We show that our algorithm based on PCA features in combination with the K-NN classifier performs better than the other methods, outperforming existing algorithms with an 11% increase in classification accuracy. In addition, the proposed algorithm achieves classification accuracies of 99.98% and 99.83% for the K-NN and PNN classifiers, respectively, and a Receiver Operating Characteristic (ROC) area of 99.75%.
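    As a rough illustration of such a pipeline, the sketch below chains a CS-style random projection, PCA, K-means, and a K-NN classifier on synthetic blob data (the dataset, dimensions, and all parameters are placeholders, not the authors' ECG setup):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.random_projection import GaussianRandomProjection
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for windowed ECG beats: 600 samples, 200 features.
X, _ = make_blobs(n_samples=600, n_features=200, centers=3,
                  cluster_std=5.0, random_state=0)

# 1) CS-style random projection compresses the raw signal.
X_cs = GaussianRandomProjection(n_components=50, random_state=0).fit_transform(X)
# 2) PCA reduces dimensionality further.
X_pca = PCA(n_components=10, random_state=0).fit_transform(X_cs)
# 3) K-means groups the beats; its labels then train a K-NN classifier.
km_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_pca)
X_tr, X_te, y_tr, y_te = train_test_split(X_pca, km_labels, random_state=0)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
acc = knn.score(X_te, y_te)
```

    On separable synthetic data the K-NN stage reproduces the K-means partition almost perfectly; the interesting engineering in the paper lies in keeping this accuracy after aggressive compression of real ECG signals.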

  11. Reweighted mass center based object-oriented sparse subspace clustering for hyperspectral images

    NASA Astrophysics Data System (ADS)

    Zhai, Han; Zhang, Hongyan; Zhang, Liangpei; Li, Pingxiang

    2016-10-01

    Considering the inevitable obstacles faced by the pixel-based clustering methods, such as salt-and-pepper noise, high computational complexity, and the lack of spatial information, a reweighted mass center based object-oriented sparse subspace clustering (RMC-OOSSC) algorithm for hyperspectral images (HSIs) is proposed. First, the mean-shift segmentation method is utilized to oversegment the HSI to obtain meaningful objects. Second, a distance reweighted mass center learning model is presented to extract the representative and discriminative features for each object. Third, assuming that all the objects are sampled from a union of subspaces, it is natural to apply the SSC algorithm to the HSI. Faced with the high correlation among the hyperspectral objects, a weighting scheme is adopted to ensure that the highly correlated objects are preferred in the procedure of sparse representation, to reduce the representation errors. Two widely used hyperspectral datasets were utilized to test the performance of the proposed RMC-OOSSC algorithm, obtaining high clustering accuracies (overall accuracy) of 71.98% and 89.57%, respectively. The experimental results show that the proposed method clearly improves the clustering performance with respect to the other state-of-the-art clustering methods, and it significantly reduces the computational time.

  12. Cocoa content influences chocolate molecular profile investigated by MALDI-TOF mass spectrometry.

    PubMed

    Bonatto, Cínthia C; Silva, Luciano P

    2015-06-01

    Chocolate authentication is a key aspect of quality control and safety. Matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF) mass spectrometry (MS) has been demonstrated to be useful for molecular profiling of cells, tissues, and even food. The present study evaluated whether MALDI-TOF MS analysis of the low-molecular-mass profile can classify chocolate samples according to cocoa content. The molecular profiles of seven processed commercial chocolate samples were compared using MALDI-TOF MS. Some ions detected exclusively in chocolate samples corresponded to metabolites of cocoa or other constituents. This method showed the presence of three distinct clusters according to confectionery and sensorial features of the chocolates and was used to establish a mass spectra database. Also, novel chocolate samples were evaluated in order to check the validity of the method and to challenge the database created with the mass spectra of the primary samples. Thus, the method was shown to be reliable for clustering unknown samples into the main chocolate categories. The simple sample preparation of the MALDI-TOF MS approach described will allow the surveillance and monitoring of constituents during the molecular profiling of chocolates. © 2014 Society of Chemical Industry.

  13. Generation of gas-phase ions from charged clusters: an important ionization step causing suppression of matrix and analyte ions in matrix-assisted laser desorption/ionization mass spectrometry.

    PubMed

    Lou, Xianwen; van Dongen, Joost L J; Milroy, Lech-Gustav; Meijer, E W

    2016-12-30

    Ionization in matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS) is a very complicated process. It has been reported that quaternary ammonium salts show extremely strong matrix and analyte suppression effects which cannot satisfactorily be explained by charge transfer reactions. Further investigation of the reasons causing these effects can be useful to improve our understanding of the MALDI process. The dried-droplet and modified thin-layer methods were used as sample preparation methods. In the dried-droplet method, analytes were co-crystallized with matrix, whereas in the modified thin-layer method analytes were deposited on the surface of matrix crystals. Model compounds, tetrabutylammonium iodide ([N(Bu) 4 ]I), cesium iodide (CsI), trihexylamine (THA) and polyethylene glycol 600 (PEG 600), were selected as the test analytes given their ability to generate exclusively pre-formed ions, protonated ions and metal ion adducts respectively in MALDI. The strong matrix suppression effect (MSE) observed using the dried-droplet method might disappear using the modified thin-layer method, which suggests that the incorporation of analytes in matrix crystals contributes to the MSE. By depositing analytes on the matrix surface instead of incorporating in the matrix crystals, the competition for evaporation/ionization from charged matrix/analyte clusters could be weakened resulting in reduced MSE. Further supporting evidence for this inference was found by studying the analyte suppression effect using the same two sample deposition methods. By comparing differences between the mass spectra obtained via the two sample preparation methods, we present evidence suggesting that the generation of gas-phase ions from charged matrix/analyte clusters may induce significant suppression of matrix and analyte ions. The results suggest that the generation of gas-phase ions from charged matrix/analyte clusters is an important ionization step in MALDI-MS. 
Copyright © 2016 John Wiley & Sons, Ltd.

  14. Cluster analysis of European Y-chromosomal STR haplotypes using the discrete Laplace method.

    PubMed

    Andersen, Mikkel Meyer; Eriksen, Poul Svante; Morling, Niels

    2014-07-01

    The European Y-chromosomal short tandem repeat (STR) haplotype distribution has previously been analysed in various ways. Here, we introduce a new way of analysing population substructure using a method based on clustering within the discrete Laplace exponential family that models the probability distribution of the Y-STR haplotypes. Creating a consistent statistical model of the haplotypes enables us to perform a wide range of analyses. Haplotype frequency estimation using the discrete Laplace method has previously been validated. In this paper we investigate how the discrete Laplace method can be used for cluster analysis, further validating the method. An important practical fact is that the calculations can be performed on a normal computer. We identified two sub-clusters of the Eastern and Western European Y-STR haplotypes, similar to the results of previous studies. We also compared pairwise distances between geographically separated samples with those obtained using the AMOVA method and found good agreement. Further analyses that are impossible with AMOVA were made using the discrete Laplace method: analysing homogeneity in two different ways and calculating marginal STR distributions. We found that the Y-STR haplotypes from e.g. Finland were relatively homogeneous, as opposed to the relatively heterogeneous Y-STR haplotypes from e.g. Lublin, Eastern Poland and Berlin, Germany. We demonstrated that the observed distributions of alleles at each locus were similar to the expected ones. We also compared pairwise distances between geographically separated samples from Africa with those obtained using the AMOVA method and found good agreement. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
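    The building block of the model is the discrete Laplace distribution itself; its probability mass function, which in this setting describes per-locus deviations around a central haplotype, is simple to write down:

```python
def discrete_laplace_pmf(k, p):
    """P(X = k) for the discrete Laplace distribution on the integers,
    P(X = k) = (1 - p) / (1 + p) * p**|k|,  with 0 < p < 1.
    Smaller p concentrates the mass tightly around k = 0."""
    return (1.0 - p) / (1.0 + p) * p ** abs(k)
```

    Roughly speaking, the method combines such per-locus terms across STR loci to model a haplotype's probability around each cluster centre; the mixture over centres is what carries the population substructure.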

  15. Dynamic evolution of nearby galaxy clusters

    NASA Astrophysics Data System (ADS)

    Biernacka, M.; Flin, P.

    2011-06-01

    A study of the evolution of 377 rich ACO clusters with redshift z < 0.2 is presented. The data concerning galaxies in the investigated clusters were obtained using FOCAS packages applied to Digitized Sky Survey I. The 377 galaxy clusters constitute a statistically uniform sample to which visual galaxy/star reclassifications were applied. Cluster shape within 2.0 h^-1 Mpc of the adopted cluster centre (the mean and the median of all galaxy coordinates, the position of the brightest and of the third brightest galaxy in the cluster) was determined through its ellipticity, calculated using two methods: the covariance ellipse method (hereafter CEM) and the method based on Minkowski functionals (hereafter MFM). We investigated the dependence of ellipticity on the radius of the circular annuli in which it was calculated, varying the radius from 0.5 to 2 Mpc in steps of 0.25 Mpc. By performing Monte Carlo simulations, we generated clusters to which the two ellipticity methods were applied. We found that the covariance ellipse method works better than the method based on Minkowski functionals. We also found that the ellipticity distributions differ between the two methods. Using the ellipticity-redshift relation, we investigated the possibility of cluster evolution in the low-redshift Universe. The correlation of cluster ellipticities with redshift is undoubtedly an indicator of structural evolution. Using Student's t statistics, we found a statistically significant correlation between ellipticity and redshift at the significance level of α = 0.95. In one of the two shape determination methods we found that ellipticity grew with redshift, while the other method gave opposite results. Monte Carlo simulations showed that only ellipticities calculated at a distance of 1.5 Mpc from the cluster centre with the Minkowski functional method are robust enough to be taken into account, but for that radius we did not find any relation between e and z.
Since CEM pointed towards the existence of the e(z) relation, we conclude that such an effect is real though rather weak. A detailed study of the e(z) relation showed that the observed relation is nonlinear, and the number of elongated structures grows rapidly for z>0.14.
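    The covariance ellipse method reduces, in essence, to the second moments of the galaxy positions; a minimal unweighted sketch (the paper's annulus handling and any weighting are omitted) is:

```python
import numpy as np

def covariance_ellipticity(x, y):
    """Ellipticity e = 1 - b/a from the eigenvalues of the 2x2
    covariance matrix of galaxy positions, where a and b are the
    semi-major and semi-minor axes of the covariance ellipse."""
    cov = np.cov(np.vstack([x, y]))
    evals = np.linalg.eigvalsh(cov)          # ascending order
    a, b = np.sqrt(evals[1]), np.sqrt(evals[0])
    return 1.0 - b / a
```

    A circular galaxy distribution gives e near 0, while stretching one axis by a factor of 3 gives e near 1 - 1/3 ≈ 0.67, which is the sense in which the statistic tracks cluster elongation.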

  16. Markov Chain Monte Carlo Joint Analysis of Chandra X-Ray Imaging Spectroscopy and Sunyaev-Zel'dovich Effect Data

    NASA Technical Reports Server (NTRS)

    Bonamente, Massimiliano; Joy, Marshall K.; Carlstrom, John E.; Reese, Erik D.; LaRoque, Samuel J.

    2004-01-01

    X-ray and Sunyaev-Zel'dovich effect data can be combined to determine the distance to galaxy clusters. High-resolution X-ray data are now available from Chandra, which provides both spatial and spectral information, and Sunyaev-Zel'dovich effect data were obtained from the BIMA and Owens Valley Radio Observatory (OVRO) arrays. We introduce a Markov Chain Monte Carlo procedure for the joint analysis of X-ray and Sunyaev-Zel'dovich effect data. The advantages of this method are its high computational efficiency and its ability to measure simultaneously the probability distribution of all parameters of interest, such as the spatial and spectral properties of the cluster gas, as well as derived quantities such as the distance to the cluster. We demonstrate this technique by applying it to the Chandra X-ray data and the OVRO radio data for the galaxy cluster A611. Comparisons with traditional likelihood ratio methods reveal the robustness of the method. This method will be used in a follow-up paper to determine the distances to a large sample of galaxy clusters.
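The sampling machinery behind such an analysis can be illustrated with a minimal random-walk Metropolis sampler. This is a generic sketch on a one-dimensional toy posterior, not the authors' joint X-ray/SZ likelihood, which is multi-dimensional and far more expensive to evaluate.

```python
import math
import random

def metropolis(logpost, x0, step, n, seed=0):
    # Minimal random-walk Metropolis sampler: propose a Gaussian step,
    # accept with probability min(1, posterior ratio), record the chain.
    rng = random.Random(seed)
    x, lp = x0, logpost(x0)
    chain = []
    for _ in range(n):
        prop = x + rng.gauss(0.0, step)
        lp_prop = logpost(prop)
        if math.log(rng.random()) < lp_prop - lp:  # accept/reject
            x, lp = prop, lp_prop
        chain.append(x)
    return chain

# Sanity check: sample a standard Gaussian "posterior".
chain = metropolis(lambda x: -0.5 * x * x, 0.0, 1.0, 20000)
```

The chain's histogram approximates the posterior, and derived quantities (here, any function of x) inherit their probability distribution directly, which is the advantage the abstract highlights.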

  17. Pearson's chi-square test and rank correlation inferences for clustered data.

    PubMed

    Shih, Joanna H; Fay, Michael P

    2017-09-01

    Pearson's chi-square test has been widely used in testing for association between two categorical responses. Spearman rank correlation and Kendall's tau are often used for measuring and testing association between two continuous or ordered categorical responses. However, the established statistical properties of these tests are valid only when pairs of responses are independent, that is, when each sampling unit has exactly one pair of responses. When each sampling unit consists of a cluster of paired responses, the assumption of independent pairs is violated. In this article, we apply the within-cluster resampling technique to U-statistics to form new tests and rank-based correlation estimators for possibly tied clustered data. We develop large-sample properties of the proposed tests and estimators and evaluate their performance by simulations. The proposed methods are applied to a data set collected from a PET/CT imaging study for illustration. Published 2017. This article is a U.S. Government work and is in the public domain in the USA.
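The within-cluster resampling idea can be sketched as follows: repeatedly draw one pair per cluster, so that the standard independence assumption holds for the resampled data, compute the rank correlation, and average over resamples. This is a hedged illustration of the general technique, not the authors' U-statistic formulation; the function names and toy data are invented.

```python
import numpy as np

def _ranks(v):
    # Simple ranking (no tie handling; adequate for continuous draws).
    r = np.empty(len(v))
    r[np.argsort(v)] = np.arange(len(v))
    return r

def wcr_spearman(clusters, n_resamples=200, seed=0):
    # Within-cluster resampling: one (x, y) pair per cluster per resample,
    # Spearman's rho on each resampled data set, averaged at the end.
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_resamples):
        pairs = [c[rng.integers(len(c))] for c in clusters]
        x, y = map(np.asarray, zip(*pairs))
        estimates.append(np.corrcoef(_ranks(x), _ranks(y))[0, 1])
    return float(np.mean(estimates))

# Toy clustered data with a strong positive association within pairs.
rng = np.random.default_rng(1)
clusters = []
for _ in range(40):
    n = int(rng.integers(2, 6))
    u = rng.normal(size=n)
    clusters.append(list(zip(u, u + rng.normal(scale=0.3, size=n))))
rho = wcr_spearman(clusters)
```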

  18. Object Tracking Using Adaptive Covariance Descriptor and Clustering-Based Model Updating for Visual Surveillance

    PubMed Central

    Qin, Lei; Snoussi, Hichem; Abdallah, Fahed

    2014-01-01

    We propose a novel approach for tracking an arbitrary object in video sequences for visual surveillance. The first contribution of this work is an automatic feature extraction method that is able to extract compact discriminative features from a feature pool before computing the region covariance descriptor. As the feature extraction method is adaptive to a specific object of interest, we refer to the region covariance descriptor computed using the extracted features as the adaptive covariance descriptor. The second contribution is a weakly supervised method for updating the object appearance model during tracking. The method performs a mean-shift clustering procedure among the tracking result samples accumulated over a period of time and selects a group of reliable samples for updating the object appearance model. As such, the object appearance model is kept up-to-date and is prevented from contamination even in the case of tracking mistakes. We conducted comparative experiments on real-world video sequences, which confirmed the effectiveness of the proposed approaches. The tracking system that integrates the adaptive covariance descriptor and the clustering-based model updating method accomplished stable object tracking on challenging video sequences. PMID:24865883

  19. Statistical design and analysis plan for an impact evaluation of an HIV treatment and prevention intervention for female sex workers in Zimbabwe: a study protocol for a cluster randomised controlled trial.

    PubMed

    Hargreaves, James R; Fearon, Elizabeth; Davey, Calum; Phillips, Andrew; Cambiano, Valentina; Cowan, Frances M

    2016-01-05

    Pragmatic cluster-randomised trials should seek to make unbiased estimates of effect and be reported according to CONSORT principles, and the study population should be representative of the target population. This is challenging when conducting trials amongst 'hidden' populations without a sampling frame. We describe a pair-matched cluster-randomised trial of a combination HIV-prevention intervention to reduce the proportion of female sex workers (FSW) with a detectable HIV viral load in Zimbabwe, recruiting via respondent-driven sampling (RDS). We will cross-sectionally survey approximately 200 FSW at baseline and at endline to characterise each of 14 sites. RDS is a variant of chain referral sampling and has been adapted to approximate random sampling. Primary analysis will use the 'RDS-2' method to estimate cluster summaries and will adapt Hayes and Moulton's '2-step' method to adjust effect estimates for individual-level confounders and further adjust for cluster baseline prevalence. We will adapt CONSORT to accommodate RDS. In the absence of observable refusal rates, we will compare the recruitment process between matched pairs. We will need to investigate whether cluster-specific recruitment or the intervention itself affects the accuracy of the RDS estimation process, potentially causing differential biases. To do this, we will calculate RDS-diagnostic statistics for each cluster at each time point and compare these statistics within matched pairs and time points. Sensitivity analyses will assess the impact of potential biases arising from assumptions made by the RDS-2 estimation. We are not aware of any other completed pragmatic cluster RCTs that recruit participants using RDS. Our statistical design and analysis approach seeks to transparently document participant recruitment and allow an assessment of the representativeness of the study to the target population, a key aspect of pragmatic trials.
The challenges we have faced in the design of this trial are likely to be shared in other contexts aiming to serve the needs of legally and/or socially marginalised populations for which no sampling frame exists and especially when the social networks of participants are both the target of intervention and the means of recruitment. The trial was registered at Pan African Clinical Trials Registry (PACTR201312000722390) on 9 December 2013.
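For illustration, the 'RDS-2' estimator referred to above is, in its simplest form, the degree-weighted (Volz-Heckathorn) prevalence estimate, which weights each respondent by the inverse of her reported network size to correct for the over-sampling of well-connected individuals in chain referral. The sketch below assumes that form; the function name and numbers are invented.

```python
def rds2_estimate(outcomes, degrees):
    # RDS-2 (Volz-Heckathorn) estimator: inverse-degree-weighted mean of
    # a binary outcome, e.g. detectable HIV viral load (1) or not (0).
    weights = [1.0 / d for d in degrees]
    return sum(w * y for w, y in zip(weights, outcomes)) / sum(weights)

# Two respondents: a positive case with degree 1, a negative with degree 4.
prev = rds2_estimate([1, 0], [1, 4])  # (1*1 + 0.25*0) / 1.25 = 0.8
```

The naive sample proportion here would be 0.5; down-weighting the high-degree negative respondent raises the estimate, showing why degree reporting matters for RDS inference.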

  20. The observed clustering of damaging extratropical cyclones in Europe

    NASA Astrophysics Data System (ADS)

    Cusack, Stephen

    2016-04-01

    The clustering of severe European windstorms on annual timescales has substantial impacts on the (re-)insurance industry. Our knowledge of the risk is limited by large uncertainties in estimates of clustering from typical historical storm data sets covering the past few decades. Eight storm data sets are gathered for analysis in this study in order to reduce these uncertainties. Six of the data sets contain more than 100 years of severe storm information to reduce sampling errors, and observational errors are reduced by the diversity of information sources and analysis methods between storm data sets. All storm severity measures used in this study reflect damage, to suit (re-)insurance applications. The shortest storm data set of 42 years provides indications of stronger clustering with severity, particularly for regions off the main storm track in central Europe and France. However, clustering estimates have very large sampling and observational errors, exemplified by large changes in estimates in central Europe upon removal of one stormy season, 1989/1990. The extended storm records place 1989/1990 into a much longer historical context to produce more robust estimates of clustering. All the extended storm data sets show increased clustering between more severe storms from return periods (RPs) of 0.5 years to the longest measured RPs of about 20 years. Further, they contain signs of stronger clustering off the main storm track, and weaker clustering for smaller-sized areas, though these signals are more uncertain as they are drawn from smaller data samples. These new ultra-long storm data sets provide new information on clustering to improve our management of this risk.
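The clustering estimates discussed above depend on the variance of storm counts, and a common summary is the variance-to-mean (dispersion) ratio of annual counts. The sketch below is an editorial illustration of that generic statistic, not the study's severity-dependent estimator.

```python
import numpy as np

def dispersion_index(annual_counts):
    # Variance-to-mean ratio of seasonal storm counts: equals 1 for a
    # Poisson process, exceeds 1 when storms cluster between seasons.
    c = np.asarray(annual_counts, dtype=float)
    return c.var(ddof=1) / c.mean()

clustered = dispersion_index([0, 4, 0, 4])  # strongly overdispersed seasons
regular = dispersion_index([2, 2, 2, 2])    # no year-to-year variance
```

Because the statistic is a ratio of second to first moments, a single extreme season (such as 1989/1990 in the abstract) can move it substantially, which is why long records matter.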

  1. The environment of x ray selected BL Lacs: Host galaxies and galaxy clustering

    NASA Technical Reports Server (NTRS)

    Wurtz, Ron; Stocke, John T.; Ellingson, Erica; Yee, Howard K. C.

    1993-01-01

    Using the Canada-France-Hawaii Telescope, we have imaged a complete, flux-limited sample of Einstein Medium Sensitivity Survey BL Lacertae objects in order to study the properties of BL Lac host galaxies and to use quantitative methods to determine the richness of their galaxy cluster environments.

  2. Variable selection based on clustering analysis for improvement of polyphenols prediction in green tea using synchronous fluorescence spectra

    NASA Astrophysics Data System (ADS)

    Shan, Jiajia; Wang, Xue; Zhou, Hao; Han, Shuqing; Riza, Dimas Firmanda Al; Kondo, Naoshi

    2018-04-01

    Synchronous fluorescence spectra, combined with multivariate analysis, were used to predict flavonoid content in green tea rapidly and nondestructively. This paper presented a new and efficient spectral interval selection method called clustering-based partial least squares (CL-PLS), which selects informative wavelengths by combining clustering concepts with partial least squares (PLS) methods to improve model performance on synchronous fluorescence spectra. The fluorescence spectra of tea samples were obtained, and k-means and Kohonen self-organizing map clustering algorithms were applied to group the full spectra into several clusters; a sub-PLS regression model was then developed on each cluster. Finally, CL-PLS models consisting of gradually selected clusters were built. The correlation coefficient (R) was used to evaluate the prediction performance of the PLS models. In addition, variable influence on projection PLS (VIP-PLS), selectivity ratio PLS (SR-PLS), interval PLS (iPLS) and full-spectrum PLS models were investigated and the results compared. The results showed that CL-PLS gave the best flavonoid predictions from synchronous fluorescence spectra.
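The clustering-then-regression idea behind CL-PLS can be sketched as follows. This is an editorial, numpy-only illustration in which ordinary least squares stands in for the sub-PLS models and a tiny k-means replaces the k-means/SOM step; all function names and the toy data are invented.

```python
import numpy as np

def kmeans_channels(features, k, iters=50, seed=0):
    # Tiny k-means grouping wavelength channels by their profile across
    # samples (stand-in for the k-means / SOM clustering step of CL-PLS).
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=k, replace=False)]
    labels = np.zeros(len(features), dtype=int)
    for _ in range(iters):
        labels = np.argmin(((features[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = features[labels == j].mean(axis=0)
    return labels

def cluster_wise_regression(X, y, labels):
    # Fit a least-squares model per wavelength cluster (OLS substitutes
    # for the sub-PLS models) and score each cluster by the correlation R
    # between fitted and measured content.
    scores = {}
    for j in np.unique(labels):
        A = np.column_stack([X[:, labels == j], np.ones(len(y))])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        scores[int(j)] = float(np.corrcoef(A @ coef, y)[0, 1])
    return scores

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 10))   # 30 tea samples x 10 wavelength channels
y = X[:, 0].copy()              # content driven by one informative channel
labels = kmeans_channels(X.T.copy(), k=2)
scores = cluster_wise_regression(X, y, labels)
best = max(scores.values())
```

In the full method, clusters would then be added to the model in order of their score, giving the "gradually selected clusters" the abstract describes.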

  3. Variable selection based on clustering analysis for improvement of polyphenols prediction in green tea using synchronous fluorescence spectra.

    PubMed

    Shan, Jiajia; Wang, Xue; Zhou, Hao; Han, Shuqing; Riza, Dimas Firmanda Al; Kondo, Naoshi

    2018-03-13

    Synchronous fluorescence spectra, combined with multivariate analysis, were used to predict flavonoid content in green tea rapidly and nondestructively. This paper presented a new and efficient spectral interval selection method called clustering-based partial least squares (CL-PLS), which selects informative wavelengths by combining clustering concepts with partial least squares (PLS) methods to improve model performance on synchronous fluorescence spectra. The fluorescence spectra of tea samples were obtained, and k-means and Kohonen self-organizing map clustering algorithms were applied to group the full spectra into several clusters; a sub-PLS regression model was then developed on each cluster. Finally, CL-PLS models consisting of gradually selected clusters were built. The correlation coefficient (R) was used to evaluate the prediction performance of the PLS models. In addition, variable influence on projection PLS (VIP-PLS), selectivity ratio PLS (SR-PLS), interval PLS (iPLS) and full-spectrum PLS models were investigated and the results compared. The results showed that CL-PLS gave the best flavonoid predictions from synchronous fluorescence spectra.

  4. Survey methods for assessing land cover map accuracy

    USGS Publications Warehouse

    Nusser, S.M.; Klaas, E.E.

    2003-01-01

    The increasing availability of digital photographic materials has fueled efforts by agencies and organizations to generate land cover maps for states, regions, and the United States as a whole. Regardless of the information sources and classification methods used, land cover maps are subject to numerous sources of error. In order to understand the quality of the information contained in these maps, it is desirable to generate statistically valid estimates of accuracy rates describing misclassification errors. We explored a full sample survey framework for creating accuracy assessment study designs that balance statistical and operational considerations in relation to study objectives for a regional assessment of GAP land cover maps. We focused not only on appropriate sample designs and estimation approaches, but also on aspects of the data collection process, such as gaining cooperation of land owners and using pixel clusters as an observation unit. The approach was tested in a pilot study to assess the accuracy of Iowa GAP land cover maps. A stratified two-stage cluster sampling design addressed sample size requirements for land covers and the need for geographic spread while minimizing operational effort. Recruitment methods used for private land owners yielded high response rates, minimizing a source of nonresponse error. Collecting data for a 9-pixel cluster centered on the sampled pixel was simple to implement, and provided better information on rarer vegetation classes as well as substantial gains in precision relative to observing data at a single pixel.

  5. The cosmological analysis of X-ray cluster surveys - I. A new method for interpreting number counts

    NASA Astrophysics Data System (ADS)

    Clerc, N.; Pierre, M.; Pacaud, F.; Sadibekova, T.

    2012-07-01

    We present a new method aimed at simplifying the cosmological analysis of X-ray cluster surveys. It is based on purely instrumental observable quantities considered in a two-dimensional X-ray colour-magnitude diagram (hardness ratio versus count rate). The basic principle is that even in rather shallow surveys, substantial information on cluster redshift and temperature is present in the raw X-ray data and can be statistically extracted; in parallel, such diagrams can be readily predicted from an ab initio cosmological modelling. We illustrate the methodology for the case of a 100 deg² XMM survey having a sensitivity of ~10⁻¹⁴ erg s⁻¹ cm⁻² and fit, at the same time, the survey selection function, the cluster evolutionary scaling relations and the cosmology; our sole assumption - driven by the limited size of the sample considered in the case study - is that the local cluster scaling relations are known. We devote special attention to the realistic modelling of the count-rate measurement uncertainties and evaluate the potential of the method via a Fisher analysis. In the absence of individual cluster redshifts, the count rate and hardness ratio (CR-HR) method appears to be much more efficient than the traditional approach based on cluster counts (i.e. dn/dz, requiring redshifts). In the case where redshifts are available, our method performs similarly to the traditional mass function (dn/dM/dz) for the purely cosmological parameters, but better constrains the parameters defining the cluster scaling relations and their evolution. A further practical advantage of the CR-HR method is its simplicity: this fully top-down approach bypasses the tedious steps of deriving cluster masses from X-ray temperature measurements.

  6. A new method to search for high-redshift clusters using photometric redshifts

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Castignani, G.; Celotti, A.; Chiaberge, M.

    2014-09-10

    We describe a new method (Poisson probability method, PPM) to search for high-redshift galaxy clusters and groups by using photometric redshift information and galaxy number counts. The method relies on Poisson statistics and is primarily introduced to search for megaparsec-scale environments around a specific beacon. The PPM is tailored to both the properties of the FR I radio galaxies in the Chiaberge et al. sample, which are selected within the COSMOS survey, and to the specific data set used. We test the efficiency of our method of searching for cluster candidates against simulations. Two different approaches are adopted. (1) We use two z ∼ 1 X-ray detected cluster candidates found in the COSMOS survey and we shift them to higher redshift up to z = 2. We find that the PPM detects the cluster candidates up to z = 1.5, and it correctly estimates both the redshift and size of the two clusters. (2) We simulate spherically symmetric clusters of different size and richness, and we locate them at different redshifts (i.e., z = 1.0, 1.5, and 2.0) in the COSMOS field. We find that the PPM detects the simulated clusters within the considered redshift range with a statistical 1σ redshift accuracy of ∼0.05. The PPM is an efficient alternative method for high-redshift cluster searches that may also be applied to both present and future wide field surveys such as SDSS Stripe 82, LSST, and Euclid. Accurate photometric redshifts and a survey depth similar to or better than that of COSMOS (e.g., I < 25) are required.
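At its core, the Poisson statistics the PPM relies on reduce to a tail probability for the observed galaxy counts given the expected background in an aperture; a small probability flags a candidate overdensity. The stdlib sketch below illustrates only that generic statistic, not the authors' full positional and photometric-redshift machinery.

```python
import math

def poisson_tail(n_obs, mu):
    # P(N >= n_obs) for Poisson-distributed background counts of mean mu;
    # a small tail probability flags a significant galaxy overdensity.
    return 1.0 - sum(math.exp(-mu) * mu ** k / math.factorial(k)
                     for k in range(n_obs))

p_null = poisson_tail(0, 5.0)    # trivially 1: every count is >= 0
p_over = poisson_tail(15, 4.0)   # 15 galaxies where only 4 are expected
```

In a PPM-like search this probability would be evaluated in redshift slices and apertures around each beacon, with detections defined by a threshold on the tail probability.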

  7. Clustered Single Cellulosic Fiber Dissolution Kinetics and Mechanisms through Optical Microscopy under Limited Dissolving Conditions.

    PubMed

    Mäkelä, Valtteri; Wahlström, Ronny; Holopainen-Mantila, Ulla; Kilpeläinen, Ilkka; King, Alistair W T

    2018-05-14

    Herein, we describe a new method for assessing the kinetics of dissolution of single fibers under limited dissolving conditions, with the dissolution followed by optical microscopy. Videos of the dissolution were processed in ImageJ to yield dissolution kinetics, based on the disappearance of pixels associated with intact fibers. Data processing was performed in Python, using the available scientific libraries. The processing steps include clustering of the single-fiber data, identifying clusters associated with different fiber types, producing average dissolution traces, and extracting practical parameters such as the time taken to dissolve 25, 50, 75, 95, and 99.5% of the clustered fibers. In addition to these simple parameters, exponential fitting was performed, yielding rate constants for fiber dissolution. Fits for sample and cluster averages were variable, although demonstrating first-order dissolution kinetics overall. To illustrate the process, two reference pulps (a bleached softwood kraft pulp and a bleached hardwood pre-hydrolysis kraft pulp) and their cellulase-treated versions were analyzed. As expected, differences in the kinetics and dissolution mechanisms between these samples were observed. Our initial interpretations are presented, based on the combined mechanistic observations and single-fiber dissolution kinetics for these samples. While the dissolution mechanisms observed were similar to those published previously, the more direct link between mechanistic information and kinetics improves our understanding of cell wall structure and pre-treatments, toward improved processability.
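The exponential fitting step can be illustrated as a log-linear least-squares fit for a first-order rate constant. This is an editorial sketch with synthetic data, not the authors' Python pipeline; the function name and trace are invented.

```python
import numpy as np

def fit_first_order(t, fraction_remaining):
    # Estimate the first-order rate constant k in f(t) = exp(-k t)
    # via a log-linear least-squares fit: log f = -k t + const.
    slope, _ = np.polyfit(t, np.log(fraction_remaining), 1)
    return -slope

t = np.linspace(0.0, 10.0, 25)
f = np.exp(-0.3 * t)            # synthetic dissolution trace, k = 0.3
k = fit_first_order(t, f)
t50 = np.log(2.0) / k           # time to dissolve 50% of the fibers
```

The practical parameters quoted in the abstract (time to 25, 50, 75% dissolution, etc.) follow directly from k for a first-order process, e.g. t50 = ln 2 / k.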

  8. Rheological Characterization and Cluster Classification of Iranian Commercial Foods, Drinks and Desserts to Recommend for Esophageal Dysphagia Diets

    PubMed Central

    ZARGARAAN, Azizollaah; OMARAEE, Yasaman; RASTMANESH, Reza; TAHERI, Negin; FADAVI, Ghasem; FADAEI, Morteza; MOHAMMADIFAR, Mohammad Amin

    2013-01-01

    Abstract Background In the absence of dysphagia-oriented food products, rheological characterization of available food items is of importance for safe swallowing and adequate nutrient intake by dysphagic patients. In this respect, introducing alternative items (with similar ease of swallowing) is helpful for improving the quality of life and nutritional intake of esophageal cancer dysphagia patients. The present study aimed at rheological characterization and cluster classification of potentially suitable foodstuffs marketed in Iran for their possible use in dysphagia diets. Methods In this descriptive study, rheological data were obtained during January and February 2012 in the Rheology Lab of the National Nutrition and Food Technology Research Institute, Tehran, Iran. Steady state and oscillatory shear parameters of 39 commercial samples were obtained using a Physica MCR 301 rheometer (Anton-Paar, GmbH, Graz, Austria). The Matlab Fuzzy Logic Toolbox (R2012a) was utilized for cluster classification of the samples. Results Using an extended list of rheological parameters and fuzzy logic methods, the 39 commercial samples (drinks, main courses and desserts) were divided into 5 clusters, and the degree of membership in each cluster was stated as a number between 0 and 0.99. Conclusion Considering the apparent viscosity of foodstuffs as the single criterion for classifying dysphagia-oriented food products is a shortcoming of current dysphagia-diet guidelines. The authors propose some revisions to the classification of dysphagia-oriented food products, including more rheological parameters (especially viscoelastic parameters) in the classification. PMID:26060647

  9. HICOSMO - X-ray analysis of a complete sample of galaxy clusters

    NASA Astrophysics Data System (ADS)

    Schellenberger, G.; Reiprich, T.

    2017-10-01

    Galaxy clusters are known to be the largest virialized objects in the Universe. Based on the theory of structure formation one can use them as cosmological probes, since they originate from collapsed overdensities in the early Universe and witness its history. The X-ray regime provides the unique possibility to measure in detail the most massive visible component, the intracluster medium. Using Chandra observations of a local sample of 64 bright clusters (HIFLUGCS) we provide total (hydrostatic) and gas mass estimates for each cluster individually. Making use of the completeness of the sample we quantify two interesting cosmological parameters by a Bayesian cosmological likelihood analysis. We find Ω_{M}=0.3±0.01 and σ_{8}=0.79±0.03 (statistical uncertainties) using our default analysis strategy combining both a mass function analysis and the gas mass fraction results. The main sources of bias that we discuss and correct here are (1) the influence of galaxy groups (higher incompleteness in parent samples and a differing behavior of the L_{x} - M relation), (2) the hydrostatic mass bias (as determined by recent hydrodynamical simulations), (3) the extrapolation of the total mass (comparing various methods), (4) the theoretical halo mass function and (5) other cosmological (non-negligible neutrino mass) and instrumental (calibration) effects.

  10. Developing cluster strategy of apples dodol SMEs by integration K-means clustering and analytical hierarchy process method

    NASA Astrophysics Data System (ADS)

    Mustaniroh, S. A.; Effendi, U.; Silalahi, R. L. R.; Sari, T.; Ala, M.

    2018-03-01

    The purposes of this research were to determine the grouping of apples dodol small and medium enterprises (SMEs) in Batu City and to determine an appropriate development strategy for each cluster. The method used for clustering the SMEs was k-means. The Analytical Hierarchy Process (AHP) approach was then applied to determine the priority development strategy for each cluster. The variables used in grouping included production capacity per month, length of operation, investment value, average sales revenue per month, total SME assets, and the number of workers. The factors considered in the AHP included the industry cluster, government, and related and supporting industries. Data were collected through questionnaires and interviews. Respondents were selected among apples dodol SMEs in Batu City using purposive sampling. The results showed that two clusters were formed from the five apples dodol SMEs. The 1st cluster, classified as small enterprises, included SME A, SME C, and SME D. The 2nd cluster, classified as medium enterprises, consisted of SME B and SME E. The AHP results indicated that the priority development strategy for the 1st cluster was improving quality and product standardisation, while for the 2nd cluster it was increasing marketing access.
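The AHP step can be illustrated with Saaty's eigenvector method for deriving priority weights from a pairwise-comparison matrix. This is a minimal generic sketch; the study's actual comparison matrices and criteria hierarchy are not reproduced.

```python
import numpy as np

def ahp_priorities(pairwise):
    # AHP priority weights: the normalized principal eigenvector of the
    # pairwise-comparison matrix (Saaty's eigenvector method).
    M = np.asarray(pairwise, dtype=float)
    evals, evecs = np.linalg.eig(M)
    v = np.abs(np.real(evecs[:, np.argmax(np.real(evals))]))
    return v / v.sum()

# A perfectly consistent comparison matrix built from known weights
# (0.5, 0.3, 0.2): entry M[i, j] = w[i] / w[j], so the recovered
# priorities should equal the weights.
w = np.array([0.5, 0.3, 0.2])
M = w[:, None] / w[None, :]
p = ahp_priorities(M)
```

For real (inconsistent) judgments, the principal eigenvalue also yields Saaty's consistency index CI = (λ_max - n) / (n - 1), used to check the comparisons.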

  11. Finite temperature properties of clusters by replica exchange metadynamics: the water nonamer.

    PubMed

    Zhai, Yingteng; Laio, Alessandro; Tosatti, Erio; Gong, Xin-Gao

    2011-03-02

    We introduce an approach for the accurate calculation of thermal properties of classical nanoclusters. On the basis of a recently developed enhanced sampling technique, replica exchange metadynamics, the method yields the true free energy of each relevant cluster structure, directly sampling its basin and measuring its occupancy in full equilibrium. All entropy sources, whether vibrational, rotational anharmonic, or especially configurational, the latter often forgotten in many cluster studies, are automatically included. For the present demonstration, we choose the water nonamer (H(2)O)(9), an extremely simple cluster, which nonetheless displays a sufficient complexity and interesting physics in its relevant structure spectrum. Within a standard TIP4P potential description of water, we find that the nonamer second relevant structure possesses a higher configurational entropy than the first, so that the two free energies surprisingly cross for increasing temperature.

  12. Finite Temperature Properties of Clusters by Replica Exchange Metadynamics: The Water Nonamer

    NASA Astrophysics Data System (ADS)

    Zhai, Yingteng; Laio, Alessandro; Tosatti, Erio; Gong, Xingao

    2012-02-01

    We introduce an approach for the accurate calculation of thermal properties of classical nanoclusters. Based on a recently developed enhanced sampling technique, replica exchange metadynamics, the method yields the true free energy of each relevant cluster structure, directly sampling its basin and measuring its occupancy in full equilibrium. All entropy sources, whether vibrational, rotational anharmonic and especially configurational -- the latter often forgotten in many cluster studies -- are automatically included. For the present demonstration we choose the water nonamer (H2O)9, an extremely simple cluster which nonetheless displays a sufficient complexity and interesting physics in its relevant structure spectrum. Within a standard TIP4P potential description of water, we find that the nonamer second relevant structure possesses a higher configurational entropy than the first, so that the two free energies surprisingly cross for increasing temperature.

  13. Evolution of the degree of substructures in simulated galaxy clusters

    NASA Astrophysics Data System (ADS)

    De Boni, Cristiano; Böhringer, Hans; Chon, Gayoung; Dolag, Klaus

    2018-05-01

    We study the evolution of substructure in the mass distribution with mass, redshift and radius in a sample of simulated galaxy clusters. The sample, containing 1226 objects, spans the mass range M200 = 10¹⁴ - 1.74 × 10¹⁵ M⊙ h⁻¹ in six redshift bins from z = 0 to z = 1.179. We consider three different diagnostics: 1) subhalos identified with SUBFIND; 2) overdense regions localized by dividing the cluster into octants; 3) the offset between the potential minimum and the center of mass. The octant analysis is a new method that we introduce in this work. We find that none of the diagnostics indicates a correlation between the mass of the cluster and the fraction of substructures. On the other hand, all the diagnostics suggest an evolution of substructures with redshift. For SUBFIND halos, the mass fraction is constant with redshift at Rvir, but shows a mild evolution at R200 and R500. Also, the fraction of clusters with at least one subhalo more massive than one-thirtieth of the total mass is less than 20%. Our new method based on the octants returns a mass fraction in substructures which evolves strongly with redshift at all radii. The offsets also evolve strongly with redshift. We also find a strong correlation for individual clusters between the offset and the fraction of substructures identified with the octant analysis. Our work puts strong constraints on the amount of substructure we expect to find in galaxy clusters and on its evolution with redshift.
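The octant diagnostic introduced here can be sketched as follows. This editorial illustration only computes per-octant mass fractions around the cluster centre; strong departures from 1/8 would flag overdense substructure, but the authors' actual overdensity criterion is not reproduced.

```python
import numpy as np

def octant_mass_fractions(pos, mass, center):
    # Assign each particle to one of the eight octants around the centre
    # (by the signs of its offset in x, y, z) and return the mass
    # fraction in each octant.
    d = pos - center
    octant = ((d[:, 0] > 0).astype(int) * 4
              + (d[:, 1] > 0).astype(int) * 2
              + (d[:, 2] > 0).astype(int))
    return np.array([mass[octant == i].sum() for i in range(8)]) / mass.sum()

# A perfectly symmetric toy cluster: one equal-mass particle per octant,
# so every octant should hold exactly 1/8 of the mass.
signs = np.array([[sx, sy, sz]
                  for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)],
                 dtype=float)
fractions = octant_mass_fractions(signs, np.ones(8), np.zeros(3))
```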

  14. The observed clustering of damaging extra-tropical cyclones in Europe

    NASA Astrophysics Data System (ADS)

    Cusack, S.

    2015-12-01

    The clustering of severe European windstorms on annual timescales has substantial impacts on the re/insurance industry. Management of the risk is impaired by large uncertainties in estimates of clustering from historical storm datasets typically covering the past few decades. The uncertainties are unusually large because clustering depends on the variance of storm counts. Eight storm datasets are gathered for analysis in this study in order to reduce these uncertainties. Six of the datasets contain more than 100 years of severe storm information to reduce sampling errors, and the diversity of information sources and analysis methods between datasets samples the observational errors. All storm severity measures used in this study reflect damage, to suit re/insurance applications. It is found that the shortest storm dataset, 42 years in length, provides estimates of clustering with very large sampling and observational errors. The dataset does provide some useful information: indications of stronger clustering for more severe storms, particularly for southern countries off the main storm track. However, substantially different results are produced by removal of one stormy season, 1989/1990, which illustrates the large uncertainties from a 42-year dataset. The extended storm records place 1989/1990 into a much longer historical context to produce more robust estimates of clustering. All the extended storm datasets show a greater degree of clustering with increasing storm severity and suggest clustering of severe storms is much more material than for weaker storms. Further, they contain signs of stronger clustering in areas off the main storm track, and weaker clustering for smaller-sized areas, though these signals are smaller than the uncertainties in the actual values. Both the improvement of existing storm records and the development of new historical storm datasets would help to improve management of this risk.

  15. Inferring Viral Dynamics in Chronically HCV Infected Patients from the Spatial Distribution of Infected Hepatocytes

    DOE PAGES

    Graw, Frederik; Balagopal, Ashwin; Kandathil, Abraham J.; ...

    2014-11-13

    Chronic liver infection by hepatitis C virus (HCV) is a major public health concern. Despite partly successful treatment options, several aspects of intrahepatic HCV infection dynamics are still poorly understood, including the preferred mode of viral propagation, as well as the proportion of infected hepatocytes. Answers to these questions have important implications for the development of therapeutic interventions. In this study, we present methods to analyze the spatial distribution of infected hepatocytes obtained by single cell laser capture microdissection from liver biopsy samples of patients chronically infected with HCV. By characterizing the internal structure of clusters of infected cells, we are able to evaluate hypotheses about intrahepatic infection dynamics. We found that individual clusters on biopsy samples range in size from 4-50 infected cells. In addition, the HCV RNA content in a cluster declines from the cell that presumably founded the cluster to cells at the maximal cluster extension. These observations support the idea that HCV infection in the liver is seeded randomly (e.g. from the blood) and then spreads locally. Assuming that the amount of intracellular HCV RNA is a proxy for how long a cell has been infected, we estimate based on models of intracellular HCV RNA replication and accumulation that cells in clusters have been infected on average for less than a week. Further, we do not find a relationship between the cluster size and the estimated cluster expansion time. Lastly, our method represents a novel approach to make inferences about infection dynamics in solid tissues from static spatial data.

  16. Inferring Viral Dynamics in Chronically HCV Infected Patients from the Spatial Distribution of Infected Hepatocytes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Graw, Frederik; Balagopal, Ashwin; Kandathil, Abraham J.

    Chronic liver infection by hepatitis C virus (HCV) is a major public health concern. Despite partly successful treatment options, several aspects of intrahepatic HCV infection dynamics are still poorly understood, including the preferred mode of viral propagation, as well as the proportion of infected hepatocytes. Answers to these questions have important implications for the development of therapeutic interventions. In this study, we present methods to analyze the spatial distribution of infected hepatocytes obtained by single cell laser capture microdissection from liver biopsy samples of patients chronically infected with HCV. By characterizing the internal structure of clusters of infected cells, we are able to evaluate hypotheses about intrahepatic infection dynamics. We found that individual clusters on biopsy samples range in size from 4-50 infected cells. In addition, the HCV RNA content in a cluster declines from the cell that presumably founded the cluster to cells at the maximal cluster extension. These observations support the idea that HCV infection in the liver is seeded randomly (e.g. from the blood) and then spreads locally. Assuming that the amount of intracellular HCV RNA is a proxy for how long a cell has been infected, we estimate based on models of intracellular HCV RNA replication and accumulation that cells in clusters have been infected on average for less than a week. Further, we do not find a relationship between the cluster size and the estimated cluster expansion time. Lastly, our method represents a novel approach to make inferences about infection dynamics in solid tissues from static spatial data.

  17. Preparation and characterization of chemically defined oligomers of rabbit immunoglobulin G molecules for the complement binding studies.

    PubMed Central

    Wright, J K; Tschopp, J; Jaton, J C

    1980-01-01

    Pure dimers, trimers, tetramers and pentamers of rabbit non-immune IgG (immunoglobulin G) or antibody IgG were prepared by polymerization in the presence of the bifunctional cross-linking reagent dithiobis(succinimidylpropionate). Oligomerization was performed either in the presence of polysaccharide antigen and specific monomeric antibody (method A) or by random cross-linking of non-immune rabbit IgG in the absence of antigen (method B). After repeated gel-filtration chromatography, samples prepared by both methods exhibited a single band in analytical sodium dodecyl sulphate/polyacrylamide-gel electrophoresis. The electrophoretic mobilities of samples prepared by method A were slightly greater than those of the corresponding samples prepared by method B. This might suggest a role played by antigen in the orientation of IgG molecules within the clusters, which may be more compact than those formed by random cross-linking. The average number of cross-linker molecules per oligomer varied between 3 and 6 for clusters made by method A and between 1 and 3 for clusters made by method B. Ultracentrifugal analyses of the oligomers yielded sedimentation coefficients (S20,w) of 9.6S for the dimer, 11.2S for the trimer, 13.6S for the tetramer and 16.1S for the pentamer. Comparison of the observed sedimentation coefficients with those predicted by various hydrodynamic models suggested that these oligomers possessed open, linear structures. Reduction of the cross-linking molecules converted the oligomers into monomeric IgG species. C.d. spectra of the oligomers studied in the range 200-250 nm were essentially the same as those of monomeric IgG molecules, strongly suggesting no major conformational changes in IgG molecules within clusters. These oligomers were found to be stable for up to 2 months when stored at -70 degrees C. PMID:7188424

  18. Subtypes of female juvenile offenders: a cluster analysis of the Millon Adolescent Clinical Inventory.

    PubMed

    Stefurak, Tres; Calhoun, Georgia B

    2007-01-01

    The current study sought to explore subtypes of adolescents within a sample of female juvenile offenders. Using the Millon Adolescent Clinical Inventory with 101 female juvenile offenders, a two-step cluster analysis was performed, beginning with a Ward's method hierarchical cluster analysis followed by a K-Means iterative partitioning cluster analysis. The results suggest an optimal three-cluster solution, with cluster profiles leading to the following group labels: Externalizing Problems, Depressed/Interpersonally Ambivalent, and Anxious Prosocial. Analyses along the factors of age, race, offense typology, and offense chronicity were conducted to further understand the nature of the found clusters. Only the effect for race was significant, with the Anxious Prosocial and Depressed/Interpersonally Ambivalent clusters disproportionately composed of African American girls. To establish external validity, clusters were compared across scales of the Behavioral Assessment System for Children - Self Report of Personality, and corroborative distinctions between clusters were found.
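    The two-step procedure described above (a hierarchical pass to choose starting centroids, followed by K-means iterative partitioning) can be sketched in pure Python. This is a generic illustration of the technique, not the authors' analysis code; the Ward-style size-weighted merge cost and the toy 2-D data are assumptions for the example.

```python
def _centroid(cluster):
    # Mean point of a list of equal-length tuples.
    return tuple(sum(c) / len(cluster) for c in zip(*cluster))

def _sqdist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def ward_seeds(points, k):
    """Agglomerate singletons with a Ward-style size-weighted merge cost
    until k clusters remain; return their centroids as K-means seeds."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                ni, nj = len(clusters[i]), len(clusters[j])
                cost = ni * nj / (ni + nj) * _sqdist(_centroid(clusters[i]),
                                                     _centroid(clusters[j]))
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)  # merge the cheapest pair
    return [_centroid(c) for c in clusters]

def kmeans(points, seeds, iters=20):
    """Lloyd's iterative partitioning, started from the hierarchical seeds."""
    centroids = list(seeds)
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: _sqdist(p, centroids[i]))
            groups[nearest].append(p)
        centroids = [_centroid(g) if g else centroids[i]
                     for i, g in enumerate(groups)]
    return centroids, groups
```

    On three well-separated 2-D blobs, `ward_seeds` recovers one seed per blob and the K-means refinement leaves that partition unchanged.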

  19. Early Results from Swift AGN and Cluster Survey

    NASA Astrophysics Data System (ADS)

    Dai, Xinyu; Griffin, Rhiannon; Nugent, Jenna; Kochanek, Christopher S.; Bregman, Joel N.

    2016-04-01

    The Swift AGN and Cluster Survey (SACS) uses 125 deg^2 of Swift X-ray Telescope serendipitous fields with variable depths surrounding gamma-ray bursts to provide a medium-depth (4 × 10^-15 erg cm^-2 s^-1), medium-area survey filling the gap between deep, narrow Chandra/XMM-Newton surveys and wide, shallow ROSAT surveys. Here, we present the first two papers in a series of publications for SACS. In the first paper, we introduce our method and a catalog of 22,563 point sources and 442 extended sources. SACS provides excellent constraints on the AGN and cluster number counts at the bright end with negligible uncertainties due to cosmic variance, and these constraints are consistent with previous measurements. The depth and areal coverage of SACS are well suited for galaxy cluster surveys outside the local universe, reaching z > 1 for massive clusters. In the second paper, we use SDSS DR8 data to study the 203 extended SACS sources that are located within the SDSS footprint. We search for galaxy over-densities in 3-D space using SDSS galaxies and their photometric redshifts near the Swift galaxy cluster candidates. We find 103 Swift clusters with a > 3σ over-density. The remaining targets are potentially located at higher redshifts and require deeper optical follow-up observations for confirmation as galaxy clusters. We present a series of cluster properties including the redshift, BCG magnitude, BCG-to-X-ray center offset, optical richness, X-ray luminosity, and red sequences. We compare the observed redshift distribution of the sample with a theoretical model, and find that our sample is complete for z ≤ 0.3 and 80% complete for z ≤ 0.4, consistent with the survey depth of SDSS. These results suggest that the Swift cluster selection algorithm presented in our first paper has yielded a statistically well-defined cluster sample for further study of cluster evolution and cosmology. Finally, we discuss our ongoing optical identification of the z > 0.5 cluster sample, using MDM, KPNO, CTIO, and Magellan data, and discuss SACS as a pilot for eROSITA deep surveys.

  20. Do X-ray dark or underluminous galaxy clusters exist?

    NASA Astrophysics Data System (ADS)

    Andreon, S.; Moretti, A.

    2011-12-01

    We study the X-ray properties of a color-selected sample of clusters at 0.1 < z < 0.3, to quantify the real abundance of the population of X-ray dark or underluminous clusters and, at the same time, the spurious-detection contamination level of color-selected cluster catalogs. Starting from a local sample of color-selected clusters, we restrict our attention to those with sufficiently deep X-ray observations to probe their X-ray luminosity down to very faint values without introducing any X-ray bias. This allowed us to assemble an X-ray-unbiased sample of 33 clusters to measure the LX-richness relation. Swift 1.4 Ms X-ray observations show that at least 89% of the color-detected clusters are real objects with a potential well deep enough to heat and retain an intracluster medium. The percentage rises to 94% when one includes the single spectroscopically confirmed color-selected cluster whose X-ray emission is not secured. Looking at our results from the opposite perspective, the percentage of X-ray dark clusters among color-selected clusters is very low: at most about 11 per cent (at 90% confidence). Supplementing our data with those from the literature, we conclude that X-ray- and color-selected cluster surveys sample the same population, and consequently that in this regard we can safely use clusters selected with either method for cosmological purposes. This is an essential and promising piece of information for upcoming surveys in both the optical/IR (DES, EUCLID) and X-ray (eROSITA). Richness correlates with X-ray luminosity with a large scatter, 0.51 ± 0.08 (0.44 ± 0.07) dex in log LX at a given richness, when LX is measured in a 500 (1070) kpc aperture. We release data and software to estimate the X-ray flux, or its upper limit, of a source with over-Poisson background fluctuations (found in this work to be ~20% on cluster angular scales) and to fit X-ray luminosity versus richness in the presence of intrinsic scatter. These Bayesian applications rigorously account for boundaries (e.g., the X-ray luminosity and the richness cannot be negative).

  1. Application of adaptive cluster sampling to low-density populations of freshwater mussels

    USGS Publications Warehouse

    Smith, D.R.; Villella, R.F.; Lemarie, D.P.

    2003-01-01

    Freshwater mussels appear to be promising candidates for adaptive cluster sampling because they are benthic macroinvertebrates that cluster spatially and are frequently found at low densities. We applied adaptive cluster sampling to estimate density of freshwater mussels at 24 sites along the Cacapon River, WV, where a preliminary timed search indicated that mussels were present at low density. Adaptive cluster sampling increased yield of individual mussels and detection of uncommon species; however, it did not improve precision of density estimates. Because finding uncommon species, collecting individuals of those species, and estimating their densities are important conservation activities, additional research is warranted on application of adaptive cluster sampling to freshwater mussels. However, at this time we do not recommend routine application of adaptive cluster sampling to freshwater mussel populations. The ultimate, and currently unanswered, question is how to tell when adaptive cluster sampling should be used, i.e., when is a population sufficiently rare and clustered for adaptive cluster sampling to be efficient and practical? A cost-effective procedure needs to be developed to identify biological populations for which adaptive cluster sampling is appropriate.
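    The mechanics of adaptive cluster sampling can be illustrated with a short sketch: whenever a sampled grid unit meets the criterion (here, at least one mussel), its neighbouring units are added to the sample, and the process repeats until no newly added unit qualifies. This is a generic illustration of the design, not the authors' survey code; the grid layout, criterion value, and initial units are assumptions for the example.

```python
def adaptive_cluster_sample(grid, initial_units, criterion=1):
    """grid maps (row, col) -> count of individuals in that unit.
    Starting from the initial units, repeatedly add the 4-neighbours
    of any sampled unit whose count meets the criterion."""
    sampled = set(initial_units)
    frontier = {u for u in sampled if grid[u] >= criterion}
    while frontier:
        new_qualifiers = set()
        for (r, c) in frontier:
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb in grid and nb not in sampled:
                    sampled.add(nb)
                    if grid[nb] >= criterion:
                        new_qualifiers.add(nb)
        frontier = new_qualifiers
    return sampled
```

    Hitting one unit of a patch pulls in the whole patch plus its empty "edge" units, which is why the design raises yield for clustered, low-density populations while the estimator must account for unequal inclusion probabilities.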

  2. Cluster Analysis of Clinical Data Identifies Fibromyalgia Subgroups

    PubMed Central

    Docampo, Elisa; Collado, Antonio; Escaramís, Geòrgia; Carbonell, Jordi; Rivera, Javier; Vidal, Javier; Alegre, José

    2013-01-01

    Introduction Fibromyalgia (FM) is mainly characterized by widespread pain and multiple accompanying symptoms, which hinder FM assessment and management. In order to reduce FM heterogeneity we classified clinical data into simplified dimensions that were used to define FM subgroups. Material and Methods 48 variables were evaluated in 1,446 Spanish FM cases fulfilling 1990 ACR FM criteria. A partitioning analysis was performed to find groups of variables similar to each other. Similarities between variables were identified and the variables were grouped into dimensions. This was performed in a subset of 559 patients, and cross-validated in the remaining 887 patients. For each sample and dimension, a composite index was obtained based on the weights of the variables included in the dimension. Finally, a clustering procedure was applied to the indexes, resulting in FM subgroups. Results Variables clustered into three independent dimensions: “symptomatology”, “comorbidities” and “clinical scales”. Only the two first dimensions were considered for the construction of FM subgroups. Resulting scores classified FM samples into three subgroups: low symptomatology and comorbidities (Cluster 1), high symptomatology and comorbidities (Cluster 2), and high symptomatology but low comorbidities (Cluster 3), showing differences in measures of disease severity. Conclusions We have identified three subgroups of FM samples in a large cohort of FM by clustering clinical data. Our analysis stresses the importance of family and personal history of FM comorbidities. Also, the resulting patient clusters could indicate different forms of the disease, relevant to future research, and might have an impact on clinical assessment. PMID:24098674

  3. Molecular taxonomy of phytopathogenic fungi: a case study in Peronospora.

    PubMed

    Göker, Markus; García-Blázquez, Gema; Voglmayr, Hermann; Tellería, M Teresa; Martín, María P

    2009-07-29

    Inappropriate taxon definitions may have severe consequences in many areas. For instance, biologically sensible species delimitation of plant pathogens is crucial for measures such as plant protection or biological control and for comparative studies involving model organisms. However, delimiting species is challenging in the case of organisms for which often only molecular data are available, such as prokaryotes, fungi, and many unicellular eukaryotes. Even in the case of organisms with well-established morphological characteristics, molecular taxonomy is often necessary to emend current taxonomic concepts and to analyze DNA sequences directly sampled from the environment. Typically, clustering approaches to delineate molecular operational taxonomic units have been applied for this purpose, using arbitrary choices of the distance threshold values and the clustering algorithms. Here, we report on a clustering optimization method to establish a molecular taxonomy of Peronospora based on ITS nrDNA sequences. Peronospora is the largest genus within the downy mildews, which are obligate parasites of higher plants, and includes various economically important pathogens. The method determines the distance function and clustering setting that result in an optimal agreement with selected reference data. Optimization was based on both taxonomy-based and host-based reference information, yielding the same outcome. Resampling and permutation methods indicate that the method is robust regarding taxon sampling and errors in the reference data. Tests with newly obtained ITS sequences demonstrate the use of the re-classified dataset in the molecular identification of downy mildews. A corrected taxonomy is provided for all Peronospora ITS sequences contained in public databases. Clustering optimization appears to be broadly applicable in automated, sequence-based taxonomy. The method connects traditional and modern taxonomic disciplines by specifically addressing the issue of how to optimally account for both traditional species concepts and genetic divergence.
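    The core of such a clustering optimization can be sketched as a scan over candidate distance thresholds, scoring each resulting partition against the reference classification. The sketch below uses single-linkage threshold clustering and the simple pair-counting Rand index as a stand-in for the agreement measure actually used in the study; the items, distances, and threshold grid are invented for the example.

```python
from itertools import combinations

def threshold_clusters(items, dist, t):
    """Single-linkage partition: items at distance <= t are joined (union-find)."""
    parent = {i: i for i in items}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x
    for (a, b), d in dist.items():
        if d <= t:
            parent[find(a)] = find(b)
    return {i: find(i) for i in items}

def rand_index(part1, part2, items):
    """Fraction of item pairs on which two partitions agree."""
    agree = total = 0
    for a, b in combinations(items, 2):
        total += 1
        agree += (part1[a] == part1[b]) == (part2[a] == part2[b])
    return agree / total

def best_threshold(items, dist, reference, candidates):
    """Pick the threshold whose partition best matches the reference data."""
    return max(candidates,
               key=lambda t: rand_index(threshold_clusters(items, dist, t),
                                        reference, items))
```

    With two tight reference species {a, b} and {c, d}, an intermediate threshold beats one that shatters the sample into singletons or merges everything into one cluster.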

  4. Molecular Taxonomy of Phytopathogenic Fungi: A Case Study in Peronospora

    PubMed Central

    Göker, Markus; García-Blázquez, Gema; Voglmayr, Hermann; Tellería, M. Teresa; Martín, María P.

    2009-01-01

    Background Inappropriate taxon definitions may have severe consequences in many areas. For instance, biologically sensible species delimitation of plant pathogens is crucial for measures such as plant protection or biological control and for comparative studies involving model organisms. However, delimiting species is challenging in the case of organisms for which often only molecular data are available, such as prokaryotes, fungi, and many unicellular eukaryotes. Even in the case of organisms with well-established morphological characteristics, molecular taxonomy is often necessary to emend current taxonomic concepts and to analyze DNA sequences directly sampled from the environment. Typically, clustering approaches to delineate molecular operational taxonomic units have been applied for this purpose, using arbitrary choices of the distance threshold values and the clustering algorithms. Methodology Here, we report on a clustering optimization method to establish a molecular taxonomy of Peronospora based on ITS nrDNA sequences. Peronospora is the largest genus within the downy mildews, which are obligate parasites of higher plants, and includes various economically important pathogens. The method determines the distance function and clustering setting that result in an optimal agreement with selected reference data. Optimization was based on both taxonomy-based and host-based reference information, yielding the same outcome. Resampling and permutation methods indicate that the method is robust regarding taxon sampling and errors in the reference data. Tests with newly obtained ITS sequences demonstrate the use of the re-classified dataset in the molecular identification of downy mildews. Conclusions A corrected taxonomy is provided for all Peronospora ITS sequences contained in public databases. Clustering optimization appears to be broadly applicable in automated, sequence-based taxonomy. The method connects traditional and modern taxonomic disciplines by specifically addressing the issue of how to optimally account for both traditional species concepts and genetic divergence. PMID:19641601

  5. Self-similarity of temperature profiles in distant galaxy clusters: the quest for a universal law

    NASA Astrophysics Data System (ADS)

    Baldi, A.; Ettori, S.; Molendi, S.; Gastaldello, F.

    2012-09-01

    Context. We present the XMM-Newton temperature profiles of 12 bright (LX > 4 × 10^44 erg s^-1) clusters of galaxies at 0.4 < z < 0.9, having an average temperature in the range 5 ≲ kT ≲ 11 keV. Aims: The main goal of this paper is to study for the first time the temperature profiles of a sample of high-redshift clusters, to investigate their properties, and to define a universal law describing the temperature radial profiles in galaxy clusters as a function of both cosmic time and their state of relaxation. Methods: We performed a spatially resolved spectral analysis, using Cash statistics, to measure the temperature of the intracluster medium at different radii. Results: We extracted temperature profiles for the clusters in our sample, finding that all profiles decline toward larger radii. The normalized temperature profiles (normalized by the mean temperature T500) are found to be generally self-similar. The sample was subdivided into five cool-core (CC) and seven non-cool-core (NCC) clusters by introducing a pseudo-entropy ratio σ = (T_IN/T_OUT) × (EM_IN/EM_OUT)^(-1/3) and defining the objects with σ < 0.6 as CC clusters and those with σ ≥ 0.6 as NCC clusters. The profiles of CC and NCC clusters differ mainly in the central regions, with the latter exhibiting a slightly flatter central profile. A significant dependence of the temperature profiles on the pseudo-entropy ratio σ is detected by fitting a function of r and σ, with an indication that the outer part of the profiles becomes steeper for higher values of σ (i.e., transitioning toward the NCC clusters). No significant evidence of redshift evolution could be found within the redshift range sampled by our clusters (0.4 < z < 0.9). A comparison of our high-z sample with intermediate-redshift clusters at 0.1 < z < 0.3 showed that the CC and NCC cluster temperature profiles have undergone some evolution. This can happen because higher-z clusters are at a less advanced stage of their formation and have not had enough time to develop the relaxed structure characterized by a central temperature dip in CC clusters and by flatter profiles in NCC clusters. Conclusions: This is the first time that a systematic study of the temperature profiles of galaxy clusters at z > 0.4 has been attempted. We were able to define the closest possible relation to a universal law for the temperature profiles of galaxy clusters at 0.1 < z < 0.9, showing a dependence on both the relaxation state of the clusters and the redshift. Appendix A is only available in electronic form at http://www.aanda.org
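    The cool-core classification above reduces to a one-line computation. The sketch below encodes the pseudo-entropy ratio σ = (T_IN/T_OUT) × (EM_IN/EM_OUT)^(-1/3) and the σ = 0.6 cut from the abstract; the example temperatures and emission measures are invented for illustration.

```python
def pseudo_entropy_ratio(t_in, t_out, em_in, em_out):
    """sigma = (T_IN / T_OUT) * (EM_IN / EM_OUT)**(-1/3), built from core
    ("IN") and outskirt ("OUT") temperature and emission measure."""
    return (t_in / t_out) * (em_in / em_out) ** (-1.0 / 3.0)

def classify(sigma, cut=0.6):
    """Cool-core (CC) below the cut, non-cool-core (NCC) at or above it."""
    return "CC" if sigma < cut else "NCC"
```

    A cluster with a cool, emission-bright core gets a small σ; an isothermal cluster with a flat emission profile sits at σ = 1 and lands in the NCC class.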

  6. A Fast Projection-Based Algorithm for Clustering Big Data.

    PubMed

    Wu, Yun; He, Zhiquan; Lin, Hao; Zheng, Yufei; Zhang, Jingfen; Xu, Dong

    2018-06-07

    With the fast development of various techniques, more and more data have been accumulated with the distinctive properties of large size (tall) and high dimension (wide). The era of big data is coming. How to understand and discover new knowledge from these data has attracted more and more scholars' attention and has become one of the most important tasks in data mining. As one of the most important techniques in data mining, clustering analysis, a kind of unsupervised learning, can group a data set into clusters that are meaningful, useful, or both. The technique has thus played a very important role in knowledge discovery from big data. However, when facing large-sized and high-dimensional data, most current clustering methods exhibit poor computational efficiency and high demands on computational resources, which prevents us from clarifying the intrinsic properties of the data and discovering the new knowledge behind them. Based on this consideration, we developed a powerful clustering method, called MUFOLD-CL. The principle of the method is to project the data points onto the centroid, and then to measure the similarity between any two points by their projections on the centroid. The proposed method achieves linear time complexity with respect to the sample size. Comparison with the K-Means method on very large data showed that our method produces better accuracy and requires less computational time, demonstrating that MUFOLD-CL can serve as a valuable tool, or at least play a complementary role to other existing methods, for big data clustering. Further comparisons with state-of-the-art clustering methods on smaller datasets showed that our method was the fastest and achieved comparable accuracy. For the convenience of the community, a free software package was constructed.
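    The stated principle, projecting every point onto the centroid direction and comparing points by their 1-D projections, can be sketched as follows. This is an illustrative reduction in the spirit of that description, not the published MUFOLD-CL implementation; in particular, the gap-based split of the sorted projections is an assumption for the example.

```python
def centroid_projections(points):
    """Project each point onto the direction of the data centroid,
    reducing every point to a single scalar in O(n * d) time."""
    n, d = len(points), len(points[0])
    centroid = [sum(p[j] for p in points) / n for j in range(d)]
    norm = sum(c * c for c in centroid) ** 0.5
    return [sum(pj * cj for pj, cj in zip(p, centroid)) / norm for p in points]

def split_at_largest_gaps(values, k):
    """Cut the sorted projections at the k-1 largest gaps -> k clusters."""
    order = sorted(range(len(values)), key=values.__getitem__)
    cuts = set(sorted(range(1, len(order)),
                      key=lambda i: values[order[i]] - values[order[i - 1]],
                      reverse=True)[:k - 1])
    labels, label = [0] * len(values), 0
    for rank, idx in enumerate(order):
        if rank in cuts:
            label += 1
        labels[idx] = label
    return labels
```

    The projection step avoids all pairwise distance computation, which is the appeal of projection-based clustering for tall data; the 1-D values can then be grouped cheaply.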

  7. Probability of coincidental similarity among the orbits of small bodies - I. Pairing

    NASA Astrophysics Data System (ADS)

    Jopek, Tadeusz Jan; Bronikowska, Małgorzata

    2017-09-01

    The probability of coincidental clustering among orbits of comets, asteroids and meteoroids depends on many factors, such as the size of the orbital sample searched for clusters or the size of the identified group, and it is different for groups of 2, 3, 4, … members. The probability of coincidental clustering is assessed by numerical simulation; therefore, it also depends on the method used to generate the synthetic orbits. We have tested the impact of some of these factors. For a given size of the orbital sample, we have assessed the probability of random pairing among several orbital populations of different sizes. We have found how these probabilities vary with the size of the orbital samples. Finally, keeping the size of the orbital sample fixed, we have shown that the probability of random pairing can be significantly different for orbital samples obtained by different observation techniques. For the user's convenience, we have also obtained several formulae which, for a given size of the orbital sample, can be used to calculate the similarity threshold corresponding to a small value of the probability of coincidental similarity between two orbits.
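    The simulation logic behind such assessments can be sketched: repeatedly draw a synthetic sample of orbits and record how often at least one pair falls below the similarity threshold purely by chance. The toy "orbit" of three uniform elements and the Euclidean similarity below are stand-ins for real orbital elements and D-criteria (e.g. Southworth-Hawkins); they are assumptions for the example, not the authors' simulation.

```python
import random
from itertools import combinations

def prob_coincidental_pair(n_orbits, threshold, n_trials=2000, seed=1):
    """Fraction of synthetic samples that contain at least one orbit pair
    closer than `threshold` by pure chance (Monte Carlo estimate)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # Toy orbit: three uniform "elements" in [0, 1).
        orbits = [(rng.random(), rng.random(), rng.random())
                  for _ in range(n_orbits)]
        if any(sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5 < threshold
               for a, b in combinations(orbits, 2)):
            hits += 1
    return hits / n_trials
```

    The chance of a spurious pair grows quickly with the size of the searched sample, which is why the similarity threshold must be tightened for larger catalogues.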

  8. ATCA observations of the MACS-Planck Radio Halo Cluster Project. II. Radio observations of an intermediate redshift cluster sample

    NASA Astrophysics Data System (ADS)

    Martinez Aviles, G.; Johnston-Hollitt, M.; Ferrari, C.; Venturi, T.; Democles, J.; Dallacasa, D.; Cassano, R.; Brunetti, G.; Giacintucci, S.; Pratt, G. W.; Arnaud, M.; Aghanim, N.; Brown, S.; Douspis, M.; Hurier, J.; Intema, H. T.; Langer, M.; Macario, G.; Pointecouteau, E.

    2018-04-01

    Aim. A fraction of galaxy clusters host diffuse radio sources whose origins are investigated through multi-wavelength studies of cluster samples. We investigate the presence of diffuse radio emission in a sample of seven galaxy clusters in the largely unexplored intermediate redshift range (0.3 < z < 0.44). Methods: In search of diffuse emission, deep radio imaging of the clusters is presented from wide-band (1.1-3.1 GHz), full-resolution (~5 arcsec) observations with the Australia Telescope Compact Array (ATCA). The visibilities were also imaged at lower resolution after point-source modelling and subtraction, and after a taper was applied, to achieve better sensitivity to low-surface-brightness diffuse radio emission. In cases of non-detection of diffuse sources, we set upper limits on the radio power of injected diffuse radio sources in the field of our observations. Furthermore, we discuss the dynamical state of the observed clusters based on an X-ray morphological analysis with XMM-Newton. Results: We detect a giant radio halo in PSZ2 G284.97-23.69 (z = 0.39) and a possible diffuse source in the nearly relaxed cluster PSZ2 G262.73-40.92 (z = 0.421). Our sample contains three highly disturbed massive clusters without clear traces of diffuse emission at the observed frequencies. We were able to inject modelled radio haloes with low values of total flux density to set upper detection limits; however, with our high-frequency observations we cannot exclude the presence of radio haloes in these systems, given the sensitivity of our observations in combination with the high redshift of the observed clusters. The reduced images are only available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (http://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/611/A94

  9. Sample size calculation for stepped wedge and other longitudinal cluster randomised trials.

    PubMed

    Hooper, Richard; Teerenstra, Steven; de Hoop, Esther; Eldridge, Sandra

    2016-11-20

    The sample size required for a cluster randomised trial is inflated compared with an individually randomised trial because outcomes of participants from the same cluster are correlated. Sample size calculations for longitudinal cluster randomised trials (including stepped wedge trials) need to take account of at least two levels of clustering: the clusters themselves and times within clusters. We derive formulae for sample size for repeated cross-section and closed cohort cluster randomised trials with normally distributed outcome measures, under a multilevel model allowing for variation between clusters and between times within clusters. Our formulae agree with those previously described for special cases such as crossover and analysis of covariance designs, although simulation suggests that the formulae could underestimate required sample size when the number of clusters is small. Whether using a formula or simulation, a sample size calculation requires estimates of nuisance parameters, which in our model include the intracluster correlation, cluster autocorrelation, and individual autocorrelation. A cluster autocorrelation less than 1 reflects a situation where individuals sampled from the same cluster at different times have less correlated outcomes than individuals sampled from the same cluster at the same time. Nuisance parameters could be estimated from time series obtained in similarly clustered settings with the same outcome measure, using analysis of variance to estimate variance components. Copyright © 2016 John Wiley & Sons, Ltd.

  10. X-ray versus infrared selection of distant galaxy clusters: A case study using the XMM-LSS and SpARCS cluster samples

    NASA Astrophysics Data System (ADS)

    Willis, J. P.; Ramos-Ceja, M. E.; Muzzin, A.; Pacaud, F.; Yee, H. K. C.; Wilson, G.

    2018-04-01

    We present a comparison of two samples of z > 0.8 galaxy clusters selected using different wavelength-dependent techniques and examine the physical differences between them. We consider 18 clusters from the X-ray selected XMM-LSS distant cluster survey and 92 clusters from the optical-MIR selected SpARCS cluster survey. Both samples are selected from the same approximately 9 square degree sky area and we examine them using common XMM-Newton, Spitzer-SWIRE and CFHT Legacy Survey data. Clusters from each sample are compared employing aperture measures of X-ray and MIR emission. We divide the SpARCS distant cluster sample into three sub-samples: a) X-ray bright, b) X-ray faint, MIR bright, and c) X-ray faint, MIR faint clusters. We determine that X-ray and MIR selected clusters display very similar surface brightness distributions of galaxy MIR light. In addition, the average location and amplitude of the galaxy red sequence as measured from stacked colour histograms is very similar in the X-ray and MIR-selected samples. The sub-sample of X-ray faint, MIR bright clusters displays a distribution of BCG-barycentre position offsets which extends to higher values than all other samples. This observation indicates that such clusters may exist in a more disturbed state compared to the majority of the distant cluster population sampled by XMM-LSS and SpARCS. This conclusion is supported by stacked X-ray images for the X-ray faint, MIR bright cluster sub-sample that display weak, centrally-concentrated X-ray emission, consistent with a population of growing clusters accreting from an extended envelope of material.

  11. Whole-genome analysis of mycobacteria from birds at the San Diego Zoo

    PubMed Central

    Pfeiffer, Wayne; Braun, Josephine; Burchell, Jennifer; Witte, Carmel L.; Rideout, Bruce A.

    2017-01-01

    Methods Mycobacteria isolated from more than 100 birds diagnosed with avian mycobacteriosis at the San Diego Zoo and its Safari Park were cultured postmortem and had their whole genomes sequenced. Computational workflows were developed and applied to identify the mycobacterial species in each DNA sample, to find single-nucleotide polymorphisms (SNPs) between samples of the same species, to further differentiate SNPs between as many as three different genotypes within a single sample, and to identify which samples are closely clustered genomically. Results Nine species of mycobacteria were found in 123 samples from 105 birds. The most common species were Mycobacterium avium and Mycobacterium genavense, which were in 49 and 48 birds, respectively. Most birds contained only a single mycobacterial species, but two birds contained a mixture of two species. The M. avium samples represent diverse strains of M. avium avium and M. avium hominissuis, with many pairs of samples differing by hundreds or thousands of SNPs across their common genome. By contrast, the M. genavense samples are much closer genomically; samples from 46 of 48 birds differ from each other by less than 110 SNPs. Some birds contained two, three, or even four genotypes of the same bacterial species. Such infections were found in 4 of 49 birds (8%) with M. avium and in 11 of 48 birds (23%) with M. genavense. Most were mixed infections, in which the bird was infected by multiple mycobacterial strains, but three infections with two genotypes differing by ≤ 10 SNPs were likely the result of within-host evolution. The samples from 31 birds with M. avium can be grouped into nine clusters within which any sample is ≤ 12 SNPs from at least one other sample in the cluster. Similarly, the samples from 40 birds with M. genavense can be grouped into ten such clusters. 
Information about these genomic clusters is being used in an ongoing, companion study of mycobacterial transmission to help inform management of bird collections. PMID:28267758

  12. Cherry-picking functionally relevant substates from long MD trajectories using a stratified sampling approach.

    PubMed

    Chandramouli, Balasubramanian; Mancini, Giordano

    2016-01-01

    Classical Molecular Dynamics (MD) simulations can provide insights into protein dynamics at the nanoscopic scale. Currently, simulations of large proteins and complexes can be routinely carried out in the ns-μs time regime. Clustering of MD trajectories is often performed to identify selective conformations and to compare simulation and experimental data coming from different sources on closely related systems. However, clustering techniques are usually applied without careful validation of the results, and benchmark studies comparing different algorithms on MD data often deal with relatively small peptides rather than average-sized or large proteins; finally, clustering is often applied both as a means to analyze refined data and as a way to simplify further analysis of trajectories. Herein, we propose a strategy to classify MD data while carefully benchmarking the performance of clustering algorithms and internal validation criteria for such methods. We demonstrate the method on two showcase systems with different features, and compare the classification of trajectories in real and PCA space. We posit that the prototype procedure adopted here could be highly fruitful in clustering large trajectories of multiple systems, or those resulting from enhanced sampling techniques such as replica exchange simulations. Copyright: © 2016 by Fabrizio Serra editore, Pisa · Roma.
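    A common baseline in such benchmarks is plain k-means on trajectory frames, often after projection onto a few principal components. The sketch below is illustrative only (the function, data shapes, and parameters are assumptions, not the authors' protocol):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Plain k-means on an (n_frames, n_features) array, e.g. frames
    projected onto their leading principal components.  No handling of
    empty clusters; intended only as a minimal benchmark baseline."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign each frame to its nearest center (squared Euclidean)
        labels = np.argmin(((X[:, None, :] - centers) ** 2).sum(-1), axis=1)
        new = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers
```

Internal validation criteria such as silhouette scores would then be computed on the returned labels to compare algorithms.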

  13. Microfluidic cell isolation technology for drug testing of single tumor cells and their clusters.

    PubMed

    Bithi, Swastika S; Vanapalli, Siva A

    2017-02-02

    Drug assays with patient-derived cells such as circulating tumor cells require manipulating small sample volumes without loss of rare disease-causing cells. Here, we report an effective technology for isolating and analyzing individual tumor cells and their clusters from minute sample volumes using an optimized microfluidic device integrated with pipettes. The method involves using hand pipetting to create an array of cell-laden nanoliter-sized droplets immobilized in a microfluidic device without loss of tumor cells during the pipetting process. Using this technology, we demonstrate single-cell analysis of tumor cell response to the chemotherapy drug doxorubicin. We find that even though individual tumor cells display diverse uptake profiles of the drug, the onset of apoptosis is determined by accumulation of a critical intracellular concentration of doxorubicin. Experiments with clusters of tumor cells compartmentalized in microfluidic drops reveal that cells within a cluster have higher viability than their single-cell counterparts when exposed to doxorubicin. This result suggests that circulating tumor cell clusters might be able to better survive chemotherapy drug treatment. Our technology is a promising tool for understanding tumor cell-drug interactions in patient-derived samples including rare cells.

  14. Familial Clustering and DRD4 Effects on Electroencephalogram Measures in Multiplex Families with Attention Deficit/Hyperactivity Disorder

    ERIC Educational Resources Information Center

    Loo, Sandra K.; Hale, T. Sigi; Hanada, Grant; Macion, James; Shrestha, Anshu; McGough, James J.; McCracken, James T.; Nelson, Stanley; Smalley, Susan L.

    2010-01-01

    Objective: The current study tests electroencephalogram (EEG) measures as a potential endophenotype for attention deficit/hyperactivity disorder (ADHD) by examining sibling and parent-offspring similarity, familial clustering with the disorder, and association with the dopamine receptor D4 (DRD4) candidate gene. Method: The sample consists of 531…

  15. HICOSMO: cosmology with a complete sample of galaxy clusters - II. Cosmological results

    NASA Astrophysics Data System (ADS)

    Schellenberger, G.; Reiprich, T. H.

    2017-10-01

    The X-ray bright, hot gas in the potential well of a galaxy cluster enables systematic X-ray studies of samples of galaxy clusters to constrain cosmological parameters. HIFLUGCS consists of the 64 X-ray brightest galaxy clusters in the Universe, building up a local sample. Here, we utilize this sample to determine, for the first time, individual hydrostatic mass estimates for all the clusters of the sample and, by making use of the completeness of the sample, we quantify constraints on the two interesting cosmological parameters, Ωm and σ8. We apply our total hydrostatic and gas mass estimates from the X-ray analysis to a Bayesian cosmological likelihood analysis and leave several parameters free to be constrained. We find Ωm = 0.30 ± 0.01 and σ8 = 0.79 ± 0.03 (statistical uncertainties, 68 per cent credibility level) using our default analysis strategy combining both a mass function analysis and the gas mass fraction results. The main sources of biases that we correct here are (1) the influence of galaxy groups (incompleteness in parent samples and differing behaviour of the Lx-M relation), (2) the hydrostatic mass bias, (3) the extrapolation of the total mass (comparing various methods), (4) the theoretical halo mass function and (5) other physical effects (non-negligible neutrino mass). We find that galaxy groups introduce a strong bias, since their number density seems to be overpredicted by the halo mass function. On the other hand, incorporating baryonic effects does not result in a significant change in the constraints. The total (uncorrected) systematic uncertainties (∼20 per cent) clearly dominate the statistical uncertainties on cosmological parameters for our sample.

  16. Clustering of attitudes towards obesity: a mixed methods study of Australian parents and children.

    PubMed

    Olds, Tim; Thomas, Samantha; Lewis, Sophie; Petkov, John

    2013-10-12

    Current population-based anti-obesity campaigns often target individuals based on either weight or socio-demographic characteristics, and give a 'mass' message about personal responsibility. There is a recognition that attempts to influence attitudes and opinions may be more effective if they resonate with the beliefs that different groups have about the causes of, and solutions for, obesity. Limited research has explored how attitudinal factors may inform the development of both upstream and downstream social marketing initiatives. Computer-assisted face-to-face interviews were conducted with 159 parents and 184 of their children (aged 9-18 years old) in two Australian states. A mixed methods approach was used to assess attitudes towards obesity, and elucidate why different groups held various attitudes towards obesity. Participants were quantitatively assessed on eight dimensions relating to the severity and extent, causes and responsibility, possible remedies, and messaging strategies. Cluster analysis was used to determine attitudinal clusters. Participants were also able to qualify each answer. Qualitative responses were analysed both within and across attitudinal clusters using a constant comparative method. Three clusters were identified. Concerned Internalisers (27% of the sample) judged that obesity was a serious health problem, that Australia had among the highest levels of obesity in the world and that prevalence was rapidly increasing. They situated the causes and remedies for the obesity crisis in individual choices. Concerned Externalisers (38% of the sample) held similar views about the severity and extent of the obesity crisis. However, they saw responsibility and remedies as a societal rather than an individual issue. 
The final cluster, the Moderates, which contained significantly more children and males, believed that obesity was not such an important public health issue, and judged the extent of obesity to be less extreme than the other clusters. Attitudinal clusters provide new information and insights which may be useful in tailoring anti-obesity social marketing initiatives.

  17. X-ray versus infrared selection of distant galaxy clusters: a case study using the XMM-LSS and SpARCS cluster samples

    NASA Astrophysics Data System (ADS)

    Willis, J. P.; Ramos-Ceja, M. E.; Muzzin, A.; Pacaud, F.; Yee, H. K. C.; Wilson, G.

    2018-07-01

    We present a comparison of two samples of z > 0.8 galaxy clusters selected using different wavelength-dependent techniques and examine the physical differences between them. We consider 18 clusters from the X-ray-selected XMM Large Scale Structure (LSS) distant cluster survey and 92 clusters from the optical-mid-infrared (MIR)-selected Spitzer Adaptation of the Red Sequence Cluster Survey (SpARCS). Both samples are selected from the same approximately 9 sq deg sky area and we examine them using common XMM-Newton, Spitzer Wide-Area Infrared Extra-galactic (SWIRE) survey, and Canada-France-Hawaii Telescope Legacy Survey data. Clusters from each sample are compared employing aperture measures of X-ray and MIR emission. We divide the SpARCS distant cluster sample into three sub-samples: (i) X-ray bright, (ii) X-ray faint, MIR bright, and (iii) X-ray faint, MIR faint clusters. We determine that X-ray- and MIR-selected clusters display very similar surface brightness distributions of galaxy MIR light. In addition, the average location and amplitude of the galaxy red sequence as measured from stacked colour histograms is very similar in the X-ray- and MIR-selected samples. The sub-sample of X-ray faint, MIR bright clusters displays a distribution of brightest cluster galaxy-barycentre position offsets which extends to higher values than all other samples. This observation indicates that such clusters may exist in a more disturbed state compared to the majority of the distant cluster population sampled by XMM-LSS and SpARCS. This conclusion is supported by stacked X-ray images for the X-ray faint, MIR bright cluster sub-sample that display weak, centrally concentrated X-ray emission, consistent with a population of growing clusters accreting from an extended envelope of material.

  18. WINGS-SPE Spectroscopy in the WIde-field Nearby Galaxy-cluster Survey

    NASA Astrophysics Data System (ADS)

    Cava, A.; Bettoni, D.; Poggianti, B. M.; Couch, W. J.; Moles, M.; Varela, J.; Biviano, A.; D'Onofrio, M.; Dressler, A.; Fasano, G.; Fritz, J.; Kjærgaard, P.; Ramella, M.; Valentinuzzi, T.

    2009-03-01

    Aims: We present the results from a comprehensive spectroscopic survey of the WINGS (WIde-field Nearby Galaxy-cluster Survey) clusters, a program called WINGS-SPE. The WINGS-SPE sample consists of 48 clusters, 22 of which are in the southern sky and 26 in the north. The main goals of this spectroscopic survey are: (1) to study the dynamics and kinematics of the WINGS clusters and their constituent galaxies, (2) to explore the link between the spectral properties and the morphological evolution in different density environments and across a wide range of cluster X-ray luminosities and optical properties. Methods: Using multi-object fiber-fed spectrographs, we observed our sample of WINGS cluster galaxies at an intermediate resolution of 6-9 Å and, using a cross-correlation technique, we measured redshifts with a mean accuracy of ~45 km s-1. Results: We present redshift measurements for 6137 galaxies and their first analyses. Details of the spectroscopic observations are reported. The WINGS-SPE has ~30% overlap with previously published data sets, allowing us both to perform a complete comparison with the literature and to extend the catalogs. Conclusions: Using our redshifts, we calculate the velocity dispersion for all the clusters in the WINGS-SPE sample. We almost triple the number of member galaxies known in each cluster with respect to previous works. We also investigate the X-ray luminosity vs. velocity dispersion relation for our WINGS-SPE clusters, and find it to be consistent with the form Lx ∝ σ_v^4. Table 4, containing the complete redshift catalog, is only available in electronic form at the CDS via anonymous ftp to cdsarc.u-strasbg.fr (130.79.128.5) or via http://cdsweb.u-strasbg.fr/cgi-bin/qcat?J/A+A/495/707

  19. Sample size calculations for the design of cluster randomized trials: A summary of methodology.

    PubMed

    Gao, Fei; Earnest, Arul; Matchar, David B; Campbell, Michael J; Machin, David

    2015-05-01

    Cluster randomized trial designs are growing in popularity in, for example, cardiovascular medicine research and other clinical areas, stimulating parallel statistical developments concerned with the design and analysis of these trials. Nevertheless, reviews suggest that design issues associated with cluster randomized trials are often poorly appreciated and there remain inadequacies in, for example, describing how the trial size is determined and how the associated results are presented. In this paper, our aim is to provide pragmatic guidance for researchers on the methods of calculating sample sizes. We focus attention on designs with the primary purpose of comparing two interventions with respect to continuous, binary, ordered categorical, incidence rate and time-to-event outcome variables. Issues of aggregate and non-aggregate cluster trials, adjustment for variation in cluster size and the effect size are detailed. The problem of establishing the anticipated magnitude of between- and within-cluster variation to enable planning values of the intra-cluster correlation coefficient and the coefficient of variation is also described. Illustrative examples of calculations of trial sizes for each endpoint type are included. Copyright © 2015 Elsevier Inc. All rights reserved.
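    The central adjustment in such calculations is to inflate the individually randomized sample size by the design effect 1 + (m − 1)ρ, where m is the (equal) cluster size and ρ the intra-cluster correlation coefficient. A minimal sketch for a continuous outcome follows; the formula is the standard textbook one, while the function name and defaults are illustrative rather than taken from the paper:

```python
import math
from statistics import NormalDist

def per_arm_size(delta, sd, m, icc, alpha=0.05, power=0.80):
    """Per-arm clusters and individuals for a two-arm cluster
    randomized trial with a continuous outcome: the individually
    randomized sample size, inflated by the design effect
    1 + (m - 1) * icc (equal cluster sizes assumed)."""
    z = NormalDist().inv_cdf
    n_individual = 2 * (z(1 - alpha / 2) + z(power)) ** 2 * (sd / delta) ** 2
    design_effect = 1 + (m - 1) * icc
    clusters = math.ceil(n_individual * design_effect / m)
    return clusters, clusters * m
```

For example, detecting a half-standard-deviation difference with clusters of 20 and an ICC of 0.05 requires 7 clusters (140 individuals) per arm under these defaults.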

  20. Chemometrics-based Approach in Analysis of Arnicae flos

    PubMed Central

    Zheleva-Dimitrova, Dimitrina Zh.; Balabanova, Vessela; Gevrenova, Reneta; Doichinova, Irini; Vitkova, Antonina

    2015-01-01

    Introduction: Arnica montana flowers have a long history as herbal medicines for external use on injuries and rheumatic complaints. Objective: To investigate Arnicae flos of cultivated accessions from Bulgaria, Poland, Germany, Finland, and Pharmacy store for phenolic derivatives and sesquiterpene lactones (STLs). Materials and Methods: Samples of Arnica from nine origins were prepared by ultrasound-assisted extraction with 80% methanol for phenolic compounds analysis. Subsequent reverse-phase high-performance liquid chromatography (HPLC) separation of the analytes was performed using gradient elution and ultraviolet detection at 280 and 310 nm (phenolic acids), and 360 nm (flavonoids). Total STLs were determined in chloroform extracts by solid-phase extraction-HPLC at 225 nm. The HPLC generated chromatographic data were analyzed using principal component analysis (PCA) and hierarchical clustering (HC). Results: The highest total amount of phenolic acids was found in the sample from Botanical Garden at Joensuu University, Finland (2.36 mg/g dw). Astragalin, isoquercitrin, and isorhamnetin 3-glucoside were the main flavonol glycosides being present up to 3.37 mg/g (astragalin). Three well-defined clusters were distinguished by PCA and HC. Cluster C1 comprised of the German and Finnish accessions characterized by the highest content of flavonols. Cluster C2 included the Bulgarian and Polish samples presenting a low content of flavonoids. Cluster C3 consisted only of one sample from a pharmacy store. Conclusion: A validated HPLC method for simultaneous determination of phenolic acids, flavonoid glycosides, and aglycones in A. montana flowers was developed. The PCA loading plot showed that quercetin, kaempferol, and isorhamnetin can be used to distinguish different Arnica accessions. 
    SUMMARY A principal component analysis (PCA) on 13 phenolic compounds and the total amount of sesquiterpene lactones in an Arnicae flos collection tended to cluster the studied 9 accessions into three main groups. The profiles obtained demonstrated that the samples from Germany and Finland are characterized by greater amounts of phenolic derivatives than the Bulgarian and Polish ones. The PCA loading plot showed that quercetin, kaempferol and isorhamnetin can be used to distinguish different Arnica accessions. PMID:27013791

  1. Validation of spot-testing kits to determine iodine content in salt.

    PubMed Central

    Pandav, C. S.; Arora, N. K.; Krishnan, A.; Sankar, R.; Pandav, S.; Karmarkar, M. G.

    2000-01-01

    Iodine deficiency disorders are a major public health problem, and salt iodization is the most widely practised intervention for their elimination. For the intervention to be successful and sustainable, it is vital to monitor the iodine content of salt regularly. Iodometric titration, the traditional method for measuring iodine content, has problems related to accessibility and cost. The newer spot-testing kits are inexpensive, require minimal training, and provide immediate results. Using data from surveys to assess the availability of iodized salt in two states in India, Madhya Pradesh and the National Capital Territory of Delhi, we tested the suitability of such a kit in field situations. Salt samples from Delhi were collected from 30 schools, chosen using the Expanded Programme on Immunization (EPI) cluster sampling technique. A single observer made the measurement for iodine content using the kit. Salt samples from Madhya Pradesh were from 30 rural and 30 urban clusters, identified by using census data and the EPI cluster sampling technique. In each cluster, salt samples were collected from 10 randomly selected households and all retailers. The 15 investigators performing the survey estimated the iodine content of salt samples in the field using the kit. All the samples were brought to the central laboratory in Delhi, where iodine content was estimated using iodometric titration as a reference method. The agreement between the kit and titration values decreased as the number of observers increased. Although sensitivity was not much affected by the increase in the number of observers (93.3% for a single observer and 93.9% for multiple observers), specificity decreased sharply (90.4% for a single observer and 40.4% for multiple observers). Due to the low specificity and resulting high numbers of false-positives for the kit when used by multiple observers ("real-life situations"), kits were likely to consistently overestimate the availability of iodized salt. 
This overestimation could result in complacency. Therefore, we conclude that until a valid alternative is available, the titration method should be used for monitoring the iodine content of salt at all levels, from producer to consumer, to ensure effectiveness of the programme. PMID:10994281
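    The agreement figures quoted above are the usual 2×2 classification measures computed against the titration reference. A small helper makes the arithmetic explicit; the counts in the test are invented round numbers, not the survey's data:

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN): adequately iodized samples the kit
    also calls iodized.  Specificity = TN / (TN + FP): inadequately
    iodized samples the kit correctly flags as such."""
    return tp / (tp + fn), tn / (tn + fp)
```

A drop in specificity with unchanged sensitivity, as reported for multiple observers, corresponds to a rise in FP with TP roughly constant, which is exactly why the kit overestimates the availability of iodized salt.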

  2. Impact of Sampling Density on the Extent of HIV Clustering

    PubMed Central

    Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor

    2014-01-01

    Identifying and monitoring HIV clusters could be useful in tracking the leading edge of HIV transmission in epidemics. Currently, greater specificity in the definition of HIV clusters is needed to reduce confusion in the interpretation of HIV clustering results. We address sampling density as one of the key aspects of HIV cluster analysis. The proportion of viral sequences in clusters was estimated at sampling densities from 1.0% to 70%. A set of 1,248 HIV-1C env gp120 V1C5 sequences from a single community in Botswana was utilized in simulation studies. Matching numbers of HIV-1C V1C5 sequences from the LANL HIV Database were used as comparators. HIV clusters were identified by phylogenetic inference under bootstrapped maximum likelihood and pairwise distance cut-offs. Sampling density below 10% was associated with stochastic HIV clustering with broad confidence intervals. HIV clustering increased linearly at sampling density >10%, and was accompanied by narrowing confidence intervals. Patterns of HIV clustering were similar at bootstrap thresholds 0.7 to 1.0, but the extent of HIV clustering decreased with higher bootstrap thresholds. The origin of sampling (local concentrated vs. scattered global) had a substantial impact on HIV clustering at sampling densities ≥10%. Pairwise distances at 10% were estimated as a threshold for cluster analysis of HIV-1 V1C5 sequences. The node bootstrap support distribution provided additional evidence for 10% sampling density as the threshold for HIV cluster analysis. The detectability of HIV clusters is substantially affected by sampling density. A minimal genotyping density of 10% and sampling density of 50–70% are suggested for HIV-1 V1C5 cluster analysis. PMID:25275430
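    The quantity being tracked across sampling densities, the proportion of sequences falling into a cluster under a pairwise-distance cut-off, can be sketched with single-linkage grouping via union-find. The distance function and threshold below are placeholders, not the study's genetic-distance definition:

```python
from collections import Counter

def fraction_clustered(seqs, dist, threshold):
    """Link any two sequences whose pairwise distance is <= threshold
    (single linkage, via union-find) and return the fraction of
    sequences that end up in a cluster of size >= 2."""
    parent = {s: s for s in seqs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, a in enumerate(seqs):
        for b in seqs[i + 1:]:
            if dist(a, b) <= threshold:
                parent[find(a)] = find(b)

    sizes = Counter(find(s) for s in seqs)
    return sum(1 for s in seqs if sizes[find(s)] >= 2) / len(seqs)
```

Recomputing this fraction on random subsamples of the full sequence set, at densities from 1% to 70%, reproduces the kind of density sweep the study describes.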

  3. [Principal component analysis and cluster analysis of inorganic elements in sea cucumber Apostichopus japonicus].

    PubMed

    Liu, Xiao-Fang; Xue, Chang-Hu; Wang, Yu-Ming; Li, Zhao-Jie; Xue, Yong; Xu, Jie

    2011-11-01

    The present study investigates the feasibility of multi-element analysis for determining the geographical origin of the sea cucumber Apostichopus japonicus, and selects effective tracers for its geographical origin assessment. The content of elements such as Al, V, Cr, Mn, Fe, Co, Ni, Cu, Zn, As, Se, Mo, Cd, Hg and Pb in sea cucumber Apostichopus japonicus samples from seven places of geographical origin was determined by means of ICP-MS. The results were used for the development of an elements database. Cluster analysis (CA) and principal component analysis (PCA) were applied to differentiate the sea cucumber Apostichopus japonicus geographical origin. Three principal components which accounted for over 89% of the total variance were extracted from the standardized data. The results of Q-type cluster analysis showed that the 26 samples could be clustered reasonably into five groups; the classification results were significantly associated with the marine distribution of the sea cucumber Apostichopus japonicus samples. CA and PCA were effective methods for element analysis of sea cucumber Apostichopus japonicus samples. The content of mineral elements in sea cucumber Apostichopus japonicus samples provided good chemical descriptors for differentiating their geographical origins.
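    The PCA step of such an elemental-profiling workflow, standardizing the concentration matrix and projecting onto the leading eigenvectors of its correlation structure, can be sketched in a few lines. This is a generic illustration, not the study's actual preprocessing:

```python
import numpy as np

def pca_scores(X, k=3):
    """PCA on column-standardized data (rows = samples, columns =
    element concentrations): returns the scores on the top-k
    components and the fraction of total variance they explain."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1]  # sort by descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return Z @ eigvecs[:, :k], eigvals[:k].sum() / eigvals.sum()
```

Hierarchical clustering would then be run on the score matrix (or on the standardized data directly) to obtain groupings like the five reported here.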

  4. Development of a method for the determination of Fusarium fungi on corn using mid-infrared spectroscopy with attenuated total reflection and chemometrics.

    PubMed

    Kos, Gregor; Lohninger, Hans; Krska, Rudolf

    2003-03-01

    A novel method, which enables the determination of fungal infection with Fusarium graminearum on corn within minutes, is presented. The ground sample was sieved and the particle size fraction between >250 and 100 µm was used for mid-infrared/attenuated total reflection (ATR) measurements. The sample was pressed onto the ATR crystal, and reproducible pressure was applied. After the spectra were recorded, they were subjected to principal component analysis (PCA) and classified using cluster analysis. Observed changes in the spectra reflected changes in protein, carbohydrate, and lipid contents. Ergosterol (for the total fungal biomass) and the toxin deoxynivalenol (DON; a secondary metabolite) of Fusarium fungi served as reference parameters, because of their relevance for the examination of corn-based food and feed. The repeatability was greatly improved by sieving prior to recording the spectra, resulting in better clustering in PCA score/score plots. The developed method enabled the separation of samples with a toxin content as low as 310 µg/kg from noncontaminated (blank) samples. Investigated concentration ranges were 880-3600 µg/kg for ergosterol and 310-2596 µg/kg for DON. The percentage of correctly classified samples was up to 100% for individual samples compared with a number of blank samples.

  5. Identifying and Assessing Interesting Subgroups in a Heterogeneous Population.

    PubMed

    Lee, Woojoo; Alexeyenko, Andrey; Pernemalm, Maria; Guegan, Justine; Dessen, Philippe; Lazar, Vladimir; Lehtiö, Janne; Pawitan, Yudi

    2015-01-01

    Biological heterogeneity is common in many diseases and it is often the reason for therapeutic failures. Thus, there is great interest in classifying a disease into subtypes that have clinical significance in terms of prognosis or therapy response. One of the most popular methods to uncover unrecognized subtypes is cluster analysis. However, classical clustering methods such as k-means clustering or hierarchical clustering are not guaranteed to produce clinically interesting subtypes. This could be because the main statistical variability, the basis of cluster generation, is dominated by genes not associated with the clinical phenotype of interest. Furthermore, a strong prognostic factor might be relevant for a certain subgroup but not for the whole population; thus an analysis of the whole sample may not reveal this prognostic factor. To address these problems we investigate methods to identify and assess clinically interesting subgroups in a heterogeneous population. The identification step uses a clustering algorithm and to assess significance we use a false discovery rate- (FDR-) based measure. Under the heterogeneity condition the standard FDR estimate is shown to overestimate the true FDR value, but this is remedied by an improved FDR estimation procedure. As illustrations, two real data examples from gene expression studies of lung cancer are provided.

  6. Comparing cluster-level dynamic treatment regimens using sequential, multiple assignment, randomized trials: Regression estimation and sample size considerations.

    PubMed

    NeCamp, Timothy; Kilbourne, Amy; Almirall, Daniel

    2017-08-01

    Cluster-level dynamic treatment regimens can be used to guide sequential treatment decision-making at the cluster level in order to improve outcomes at the individual or patient-level. In a cluster-level dynamic treatment regimen, the treatment is potentially adapted and re-adapted over time based on changes in the cluster that could be impacted by prior intervention, including aggregate measures of the individuals or patients that compose it. Cluster-randomized sequential multiple assignment randomized trials can be used to answer multiple open questions preventing scientists from developing high-quality cluster-level dynamic treatment regimens. In a cluster-randomized sequential multiple assignment randomized trial, sequential randomizations occur at the cluster level and outcomes are observed at the individual level. This manuscript makes two contributions to the design and analysis of cluster-randomized sequential multiple assignment randomized trials. First, a weighted least squares regression approach is proposed for comparing the mean of a patient-level outcome between the cluster-level dynamic treatment regimens embedded in a sequential multiple assignment randomized trial. The regression approach facilitates the use of baseline covariates which is often critical in the analysis of cluster-level trials. Second, sample size calculators are derived for two common cluster-randomized sequential multiple assignment randomized trial designs for use when the primary aim is a between-dynamic treatment regimen comparison of the mean of a continuous patient-level outcome. The methods are motivated by the Adaptive Implementation of Effective Programs Trial which is, to our knowledge, the first-ever cluster-randomized sequential multiple assignment randomized trial in psychiatry.

  7. Testing the Large-scale Environments of Cool-core and Non-cool-core Clusters with Clustering Bias

    NASA Astrophysics Data System (ADS)

    Medezinski, Elinor; Battaglia, Nicholas; Coupon, Jean; Cen, Renyue; Gaspari, Massimo; Strauss, Michael A.; Spergel, David N.

    2017-02-01

    There are well-observed differences between cool-core (CC) and non-cool-core (NCC) clusters, but the origin of this distinction is still largely unknown. Competing theories can be divided into internal (inside-out), in which internal physical processes transform or maintain the NCC phase, and external (outside-in), in which the cluster type is determined by its initial conditions, which in turn leads to different formation histories (i.e., assembly bias). We propose a new method that uses the relative assembly bias of CC to NCC clusters, as determined via the two-point cluster-galaxy cross-correlation function (CCF), to test whether formation history plays a role in determining their nature. We apply our method to 48 ACCEPT clusters, which have well resolved central entropies, and cross-correlate with the SDSS-III/BOSS LOWZ galaxy catalog. We find that the relative bias of NCC over CC clusters is b = 1.42 ± 0.35 (1.6σ different from unity). Our measurement is limited by the small number of clusters with core entropy information within the BOSS footprint, 14 CC and 34 NCC clusters. Future compilations of X-ray cluster samples, combined with deep all-sky redshift surveys, will be able to better constrain the relative assembly bias of CC and NCC clusters and determine the origin of the bimodality.

  8. Testing the Large-scale Environments of Cool-core and Non-cool-core Clusters with Clustering Bias

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Medezinski, Elinor; Battaglia, Nicholas; Cen, Renyue

    2017-02-10

    There are well-observed differences between cool-core (CC) and non-cool-core (NCC) clusters, but the origin of this distinction is still largely unknown. Competing theories can be divided into internal (inside-out), in which internal physical processes transform or maintain the NCC phase, and external (outside-in), in which the cluster type is determined by its initial conditions, which in turn leads to different formation histories (i.e., assembly bias). We propose a new method that uses the relative assembly bias of CC to NCC clusters, as determined via the two-point cluster-galaxy cross-correlation function (CCF), to test whether formation history plays a role in determining their nature. We apply our method to 48 ACCEPT clusters, which have well resolved central entropies, and cross-correlate with the SDSS-III/BOSS LOWZ galaxy catalog. We find that the relative bias of NCC over CC clusters is b = 1.42 ± 0.35 (1.6 σ different from unity). Our measurement is limited by the small number of clusters with core entropy information within the BOSS footprint, 14 CC and 34 NCC clusters. Future compilations of X-ray cluster samples, combined with deep all-sky redshift surveys, will be able to better constrain the relative assembly bias of CC and NCC clusters and determine the origin of the bimodality.

  9. Estimation after classification using lot quality assurance sampling: corrections for curtailed sampling with application to evaluating polio vaccination campaigns.

    PubMed

    Olives, Casey; Valadez, Joseph J; Pagano, Marcello

    2014-03-01

    To assess the bias incurred when curtailment of Lot Quality Assurance Sampling (LQAS) is ignored, to present unbiased estimators, to consider the impact of cluster sampling by simulation and to apply our method to published polio immunization data from Nigeria. We present estimators of coverage when using two kinds of curtailed LQAS strategies: semicurtailed and curtailed. We study the proposed estimators with independent and clustered data using three field-tested LQAS designs for assessing polio vaccination coverage, with samples of size 60 and decision rules of 9, 21 and 33, and compare them to biased maximum likelihood estimators. Lastly, we present estimates of polio vaccination coverage from previously published data in 20 local government authorities (LGAs) from five Nigerian states. Simulations illustrate substantial bias if one ignores the curtailed sampling design. Proposed estimators show no bias. Clustering does not affect the bias of these estimators. Across simulations, standard errors show signs of inflation as clustering increases. Neither sampling strategy nor LQAS design influences estimates of polio vaccination coverage in 20 Nigerian LGAs. When coverage is low, semicurtailed LQAS strategies considerably reduce the sample size required to make a decision. Curtailed LQAS designs further reduce the sample size when coverage is high. Results presented dispel the misconception that curtailed LQAS data are unsuitable for estimation. These findings augment the utility of LQAS as a tool for monitoring vaccination efforts by demonstrating that unbiased estimation using curtailed designs is not only possible but these designs also reduce the sample size. © 2014 John Wiley & Sons Ltd.
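    The curtailment idea itself, stopping as soon as the accept/reject classification can no longer change, is easy to sketch for a design with sample size n = 60 and decision rule d = 33 (one of the field-tested designs mentioned above). This illustrates only the stopping rule; the paper's bias-corrected estimators are not reproduced here:

```python
def curtailed_lqas(outcomes, n=60, d=33):
    """Semicurtailed LQAS sketch: inspect binary outcomes one at a
    time and stop once the decision is forced.  'accept' when
    successes reach d; 'reject' once failures exceed n - d, since d
    successes are then unreachable.  Returns (decision, number drawn)."""
    successes = failures = 0
    for drawn, x in enumerate(outcomes[:n], start=1):
        successes += x
        failures += 1 - x
        if successes >= d:
            return "accept", drawn
        if failures > n - d:
            return "reject", drawn
    return ("accept" if successes >= d else "reject"), n
```

With very high coverage the decision is reached after only d draws, and with very low coverage after n − d + 1 draws, which is why curtailment saves sample size at both extremes.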

  10. Cancer detection based on Raman spectra super-paramagnetic clustering

    NASA Astrophysics Data System (ADS)

    González-Solís, José Luis; Guizar-Ruiz, Juan Ignacio; Martínez-Espinosa, Juan Carlos; Martínez-Zerega, Brenda Esmeralda; Juárez-López, Héctor Alfonso; Vargas-Rodríguez, Héctor; Gallegos-Infante, Luis Armando; González-Silva, Ricardo Armando; Espinoza-Padilla, Pedro Basilio; Palomares-Anda, Pascual

    2016-08-01

    The clustering of Raman spectra of serum samples is analyzed using the super-paramagnetic clustering technique based on the Potts spin model. We investigated the clustering of biochemical networks by using Raman data that define edge lengths in the network, where the interactions are functions of the individual band intensities of the Raman spectra. For this study, we used two groups of 58 and 102 control Raman spectra and the intensities of 160, 150 and 42 Raman spectra of serum samples from breast cancer, cervical cancer and leukemia patients, respectively. The spectra were collected from patients at different hospitals in Mexico. Using the super-paramagnetic clustering technique, we identified the most natural and compact clusters, allowing us to discriminate between control and cancer patients. Of special interest was the leukemia case, where the nearly hierarchical structure observed allowed the identification of each patient's leukemia type. The goal of this study is to apply a model of statistical physics, such as the super-paramagnetic one, to find natural clusters that allow us to design a cancer detection method. To the best of our knowledge, this is the first report of preliminary results evaluating the usefulness of super-paramagnetic clustering in the discipline of spectroscopy, where it is used for classification of spectra.

  11. Wedge sampling for computing clustering coefficients and triangle counts on large graphs

    DOE PAGES

    Seshadhri, C.; Pinar, Ali; Kolda, Tamara G.

    2014-05-08

    Graphs are used to model interactions in a variety of contexts, and there is a growing need to quickly assess the structure of such graphs. Some of the most useful graph metrics are based on triangles, such as those measuring social cohesion. Despite the importance of these triadic measures, algorithms to compute them can be extremely expensive. We discuss the method of wedge sampling. This versatile technique allows for the fast and accurate approximation of various types of clustering coefficients and triangle counts. Furthermore, these techniques are extensible to counting directed triangles in digraphs. Our methods come with provable and practical time-approximation tradeoffs for all computations. We provide extensive results that show our methods are orders of magnitude faster than the state of the art, while providing nearly the accuracy of full enumeration.
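
    The core of wedge sampling can be sketched in a few lines: pick a wedge center with probability proportional to its wedge count, pick two of its neighbors, and check whether the wedge closes into a triangle. The graph, names, and sample budget below are illustrative, not from the paper:

```python
import random

def wedge_clustering_coeff(adj, k=5000, rng=random):
    """Estimate the global clustering coefficient: sample a wedge
    (path u-v-w centred at v) uniformly at random, check closure."""
    nodes = list(adj)
    # a node with degree deg centres deg*(deg-1)/2 wedges
    weights = [len(adj[v]) * (len(adj[v]) - 1) // 2 for v in nodes]
    closed = 0
    for _ in range(k):
        v = rng.choices(nodes, weights=weights)[0]
        u, w = rng.sample(sorted(adj[v]), 2)
        closed += w in adj[u]
    return closed / k   # fraction of sampled wedges that are closed

# toy graph: a triangle {0, 1, 2} with a pendant vertex 3
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
est = wedge_clustering_coeff(adj, rng=random.Random(0))
print(round(est, 2))   # exact value is 3/5 = 0.6
```

    The estimate converges at the Monte Carlo rate regardless of graph size, which is the source of the speedups the abstract reports.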

  12. Evaluation of different approaches for identifying optimal sites to predict mean hillslope soil moisture content

    NASA Astrophysics Data System (ADS)

    Liao, Kaihua; Zhou, Zhiwen; Lai, Xiaoming; Zhu, Qing; Feng, Huihui

    2017-04-01

    The identification of representative soil moisture sampling sites is important for the validation of remotely sensed mean soil moisture in a certain area and for ground-based soil moisture measurements in catchment or hillslope hydrological studies. Numerous approaches have been developed to identify optimal sites for predicting mean soil moisture. Each method has certain advantages and disadvantages, but they have rarely been evaluated and compared. In our study, surface (0-20 cm) soil moisture data from January 2013 to March 2016 (a total of 43 sampling days) were collected at 77 sampling sites on a mixed land-use (tea and bamboo) hillslope in the hilly area of Taihu Lake Basin, China. A total of 10 methods (temporal stability (TS) analyses based on 2 indices, K-means clustering based on 6 kinds of inputs and 2 random sampling strategies) were evaluated for determining optimal sampling sites for mean soil moisture estimation. They were TS analyses based on the smallest index of temporal stability (ITS, a combination of the mean relative difference and standard deviation of relative difference (SDRD)) and based on the smallest SDRD; K-means clustering based on soil properties and terrain indices (EFs), repeated soil moisture measurements (Theta), EFs plus one-time soil moisture data (EFsTheta), and the principal components derived from EFs (EFs-PCA), Theta (Theta-PCA), and EFsTheta (EFsTheta-PCA); and global and stratified random sampling strategies. Results showed that the TS analysis based on the smallest ITS was better (RMSE = 0.023 m3 m-3) than that based on the smallest SDRD (RMSE = 0.034 m3 m-3). The K-means clustering based on EFsTheta (-PCA) was better (RMSE <0.020 m3 m-3) than those based on EFs (-PCA) and Theta (-PCA). The sampling design stratified by land use was more efficient than the global random method. 
Forty and 60 sampling sites are needed for stratified sampling and global sampling, respectively, to make their performances comparable to the best K-means method (EFsTheta-PCA). Overall, TS required only one site, but its accuracy was limited. The best K-means method required <8 sites and yielded high accuracy, but extra soil and terrain information is necessary when using this method. The stratified sampling strategy can be used when no prior knowledge about soil moisture variation is available. This information will help in selecting the optimal methods for estimating the areal mean soil moisture.
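
    As a rough illustration of the TS approach described above, the sketch below ranks sites by ITS = sqrt(MRD² + SDRD²), where MRD and SDRD are the mean and standard deviation of each site's relative difference from the spatial mean moisture on every date. The hillslope data, offsets, and noise levels are invented for the example:

```python
import math, random

def its_ranking(moisture):
    """Rank sites by the index of temporal stability ITS = sqrt(MRD^2 + SDRD^2).
    moisture[site][t] is the soil moisture at each site on sampling date t."""
    n_sites, n_times = len(moisture), len(moisture[0])
    mean_t = [sum(m[t] for m in moisture) / n_sites for t in range(n_times)]
    ranked = []
    for i, m in enumerate(moisture):
        rd = [(m[t] - mean_t[t]) / mean_t[t] for t in range(n_times)]
        mrd = sum(rd) / n_times
        sdrd = math.sqrt(sum((x - mrd) ** 2 for x in rd) / (n_times - 1))
        ranked.append((math.hypot(mrd, sdrd), i))
    return sorted(ranked)

# synthetic hillslope: site 2 sits close to the field mean with little noise
rng = random.Random(1)
base = [0.20 + 0.05 * math.sin(t / 3) for t in range(40)]
offsets_noise = [(0.05, 0.02), (-0.04, 0.03), (0.0, 0.002)]
moisture = [[b + off + rng.gauss(0, s) for b in base] for off, s in offsets_noise]
best_its, best_site = its_ranking(moisture)[0]
print(best_site)   # the temporally stable site
```

    The site with the smallest ITS is the one whose readings track the field mean most faithfully over time, which is exactly what makes a single-site TS design possible.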

  13. [Seasonality of clustering of fever and diarrhea in Beijing, 2009-2015].

    PubMed

    Li, X T; Chen, Y W; He, Z Y; Li, S; Gao, Z Y; He, X; Wang, Q Y

    2017-01-10

    Objective: To understand the seasonal distribution of the clustering of fever and diarrhea. Methods: Concentration degree and circular distribution methods were used to analyze the seasonal distribution of the clustering of fever and diarrhea in Beijing from 2009 to 2015. The information was collected from the Infectious Disease Surveillance Information System of Beijing. Results: The M values for the clustering of fever and diarrhea were 0.57 and 0.47, respectively. Circular distribution results showed that the angle dispersion index R values for the clustering of fever and diarrhea were 0.57 and 0.46, respectively, and the Rayleigh's test Z values for the sample mean angles were 414.14 and 148.09, respectively (all P<0.01). The clustering of fever and diarrhea both showed seasonality. The incidence peak of fever was on October 13, and the epidemic period was during August 13-December 14. The incidence peak of diarrhea was on July 31, and the epidemic period was during May 20-October 11. Conclusion: The clustering of fever had obvious seasonality in Beijing, mainly occurring in autumn and winter. The clustering of diarrhea had certain seasonality, mainly occurring in summer and autumn.
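
    The circular distribution method used here has a compact form: map each onset day to an angle on the yearly circle; the mean angle then gives the peak day, and Z = n·r² is the Rayleigh statistic testing for seasonality. A sketch with made-up onset dates (the concentration degree M is not computed here):

```python
import math

def circular_seasonality(days, year_len=365):
    """Circular-distribution analysis of onset days: returns the peak day
    (mean angle), the mean resultant length r, and the Rayleigh statistic
    Z = n * r^2 (large Z => significant seasonality)."""
    n = len(days)
    angles = [2 * math.pi * d / year_len for d in days]
    c = sum(map(math.cos, angles)) / n
    s = sum(map(math.sin, angles)) / n
    r = math.hypot(c, s)
    peak_day = (math.atan2(s, c) % (2 * math.pi)) * year_len / (2 * math.pi)
    return peak_day, r, n * r * r

# toy cluster onsets concentrated in early October
days = [270, 275, 280, 282, 285, 290, 295, 300, 260, 305]
peak, r, z = circular_seasonality(days)
print(round(peak), round(z, 1))
```

    Because the dates wrap around the year, the circular mean avoids the artifacts a plain arithmetic mean would produce for a winter peak spanning the December/January boundary.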

  14. Weak Lensing by Galaxy Clusters: from Pixels to Cosmology

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gruen, Daniel

    The story of the origin and evolution of our Universe is told, equivalently, by space-time itself and by the structures that grow inside of it. Clusters of galaxies are the frontier of bottom-up structure formation. They are the most massive objects to have collapsed at the present epoch. By that virtue, their abundance and structural parameters are highly sensitive to the composition and evolution of the Universe. The most common probe of cluster cosmology, abundance, uses samples of clusters selected by some observable. Applying a mass-observable relation (MOR), cosmological parameters can be constrained by comparing the sample to predicted cluster abundances as a function of observable and redshift. Arguably, however, cluster probes have not yet entered the era of per cent level precision cosmology. The primary reason for this is our imperfect understanding of the MORs. The overall normalization, the slope of mass vs. observable, the redshift evolution, and the degree and correlation of intrinsic scatters of observables at fixed mass have to be constrained for interpreting abundances correctly. Mass measurement of clusters by means of the differential deflection of light from background sources in their gravitational field, i.e. weak lensing, is a powerful approach for achieving this. This thesis presents new methods for and scientific results of weak lensing measurements of clusters of galaxies. The former include, on the data reduction side, (i) the correction of CCD images for non-linear effects due to the electric fields of accumulated charges and (ii) a method for masking artifact features in sets of overlapping images of the sky by comparison to the median image. Also, (iii) I develop a method for the selection of background galaxy samples based on their color and apparent magnitude that includes a new correction for contamination with cluster member galaxies. The main scientific results are the following. 
(i) For the Hubble Frontier Field cluster RXC J2248.7--4431 our lensing analysis constrains mass and concentration of the cluster halo and we confirm the large mass predicted by X-ray and Sunyaev-Zel’dovich (SZ) observations. The study of cluster members shows the relation of galaxy morphology to luminosity and environment. (ii) Our lensing mass measurements for 12 clusters are consistent with X-ray masses derived under the assumption of hydrostatic equilibrium of the intra-cluster gas. We confirm the MORs derived by the South Pole Telescope collaboration for the detection significance of the cluster SZ signal in their survey. We find discrepancies, however, with the Planck SZ MOR. We hypothesize that these are related either to a shallower slope of the MOR or a size-, redshift- or noise-dependent bias in SZ signal extraction. (iii) Finally, using a combination of simulations and theoretical models for the variation of cluster profiles at fixed mass, we find that the latter is a significant contribution to the uncertainty of cluster lensing mass measurements. A cosmic variance model, such as the one we develop, is necessary for MOR constraints to be accurate at the level required for future surveys.

  15. The clustering evolution of distant red galaxies in the GOODS-MUSIC sample

    NASA Astrophysics Data System (ADS)

    Grazian, A.; Fontana, A.; Moscardini, L.; Salimbeni, S.; Menci, N.; Giallongo, E.; de Santis, C.; Gallozzi, S.; Nonino, M.; Cristiani, S.; Vanzella, E.

    2006-07-01

    Aims. We study the clustering properties of Distant Red Galaxies (DRGs) to test whether they are the progenitors of local massive galaxies. Methods. We use the GOODS-MUSIC sample, a catalog of ~3000 Ks-selected galaxies based on VLT and HST observations of the GOODS-South field with extended multi-wavelength coverage (from 0.3 to 8 μm) and accurate estimates of the photometric redshifts, to select 179 DRGs with J-Ks ≥ 1.3 in an area of 135 sq. arcmin. Results. We first show that the J-Ks ≥ 1.3 criterion selects a rather heterogeneous sample of galaxies, going from the targeted high-redshift luminous evolved systems, to a significant fraction of lower redshift (1

  16. Sensory characteristics and consumer acceptability of fermented soybean paste (Doenjang).

    PubMed

    Kim, H G; Hong, J H; Song, C K; Shin, H W; Kim, K O

    2010-09-01

    This study was conducted to examine the sensory profiles of fermented soybean paste (Doenjang), to understand consumers' acceptability of different types of Doenjang samples and to identify the sensory characteristics that drive consumer acceptability of Doenjang products. Descriptive analysis and a consumer acceptability test were conducted for 7 different types of Doenjang samples. The samples included 2 types of Doenjang, made by either traditional or commercially modified methods. For the descriptive analysis, 8 trained panelists developed and evaluated 31 descriptors. There were significant differences in all 31 attributes among the samples. Principal component analysis was also performed to summarize the sensory characteristics of the samples. In consumer testing, 200 consumers evaluated the acceptability of the Doenjang samples. Significant differences in consumer acceptability were observed among the samples. The consumers preferred the Doenjang samples manufactured using a commercially modified method. Overall, most consumers liked the Doenjang samples that had strong "sweet" and "MSG (monosodium glutamate)" tastes. It appears that "sweet" and "MSG" tastes are the drivers of liking for Doenjang. "Salty" taste and "meju", "traditional Korean soy sauce", and "fermented fish" odor/flavors seem to be the drivers of disliking for Doenjang. Cluster analysis identified 3 consumer segments, each sharing a common preference pattern for the 7 samples. The results showed that each consumer cluster preferred different Doenjang samples. External preference mapping was performed to establish the relationships between the sensory attributes and consumer acceptability in each cluster. Consumption of fermented soybean products is gradually expanding around the world, due to their various health benefits. Therefore, understanding the sensory characteristics and consumer acceptability of Doenjang is becoming increasingly important. 
The intense and complex flavor characteristics of Doenjang make it difficult to obtain a comprehensive sensory profile and the drivers of liking. The findings of this study can be applied to the development of a new product with better consumer acceptability. This study can also serve as a useful and effective guideline for researchers who intend to examine the sensory characteristics and consumer acceptability of fermented soybean pastes.

  17. Some characteristics of matrix-assisted UV laser desorption/ionization mass spectrometric analysis of large proteins

    NASA Astrophysics Data System (ADS)

    Perera, I. K.; Kantartzoglou, S.; Dyer, P. E.

    1996-12-01

    We have performed experiments to explore the characteristics of the matrix-assisted laser desorption/ionization (MALDI) process and to ascertain optimal operational conditions for observing intact molecular ions of large proteins. In this study, several methods have been adopted for the preparation of analyte samples. Of these, the samples prepared with the simple dried-droplet method were found to be the most suitable for the generation of the large molecular clusters, while the near-uniform spin-coated samples were observed to produce highly reproducible molecular ion signals of relatively high mass resolution. A resulting mass spectrum which illustrates the formation of cluster ions up to the 26-mer [26M+H]+ of bovine insulin, corresponding to a mass of about 150,000 Da, is presented. The effect of fluence on the extent of clustering of protein molecules has been studied, the results revealing the existence of an optimum fluence for detecting the large cluster ions. Investigations have also indicated that the use of polyethylene-coated metallic substrates as sample supports can considerably reduce the fragmentation of the matrix/analyte molecular ions, and the desorption of "neat" MALDI matrices deposited on these polyethylene-coated sample probes enhances their aggregation, forming up to the heptamer [7M+H]+ of the matrix, ferulic acid. The dependence of the mass resolution on the applied acceleration voltage and the desorption fluence has been examined and the results obtained are discussed in terms of a simple analysis of the linear time-of-flight mass spectrometer. A spectrum of chicken egg lysozyme (M~14,306) displaying the high mass resolution (M/ΔM~690) that can be attained when the mass spectrometer is operated in the reflectron mode is also presented.

  18. Intraclass Correlation Coefficients for Obesity Indicators and Energy Balance-Related Behaviors among New York City Public Elementary Schools

    ERIC Educational Resources Information Center

    Gray, Heewon Lee; Burgermaster, Marissa; Tipton, Elizabeth; Contento, Isobel R.; Koch, Pamela A.; Di Noia, Jennifer

    2016-01-01

    Objective: Sample size and statistical power calculation should consider clustering effects when schools are the unit of randomization in intervention studies. The objective of the current study was to investigate how student outcomes are clustered within schools in an obesity prevention trial. Method: Baseline data from the Food, Health &…

  19. Community and Cluster Centre Residential Services for Adults with Intellectual Disability: Long-Term Results from an Australian-Matched Sample

    ERIC Educational Resources Information Center

    Young, L.

    2006-01-01

    Background: Changes in residential accommodation models for adults with intellectual disability (ID) over the last 20 years in Australia, the United Kingdom and the United States have involved relocation from institutions primarily into dispersed homes in the community. But an evolving alternative service style is the cluster centre. Methods: This…

  20. The Effect of Small Sample Size on Two-Level Model Estimates: A Review and Illustration

    ERIC Educational Resources Information Center

    McNeish, Daniel M.; Stapleton, Laura M.

    2016-01-01

    Multilevel models are an increasingly popular method to analyze data that originate from a clustered or hierarchical structure. To effectively utilize multilevel models, one must have an adequately large number of clusters; otherwise, some model parameters will be estimated with bias. The goals for this paper are to (1) raise awareness of the…

  1. Simultaneous alignment and clustering of peptide data using a Gibbs sampling approach.

    PubMed

    Andreatta, Massimo; Lund, Ole; Nielsen, Morten

    2013-01-01

    Proteins recognizing short peptide fragments play a central role in cellular signaling. As a result of high-throughput technologies, peptide-binding protein specificities can be studied using large peptide libraries at dramatically lower cost and time. Interpretation of such large peptide datasets, however, is a complex task, especially when the data contain multiple receptor binding motifs, and/or the motifs are found at different locations within distinct peptides. The algorithm presented in this article, based on Gibbs sampling, identifies multiple specificities in peptide data by performing two essential tasks simultaneously: alignment and clustering of peptide data. We apply the method to de-convolute binding motifs in a panel of peptide datasets with different degrees of complexity spanning from the simplest case of pre-aligned fixed-length peptides to cases of unaligned peptide datasets of variable length. Example applications described in this article include mixtures of binders to different MHC class I and class II alleles, distinct classes of ligands for SH3 domains and sub-specificities of the HLA-A*02:01 molecule. The Gibbs clustering method is available online as a web server at http://www.cbs.dtu.dk/services/GibbsCluster.
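
    A heavily simplified sketch of the clustering half of the algorithm is given below. It handles only fixed-length peptides (so the alignment step of the real method is omitted), assumes a pseudocount residue-frequency model, and uses invented toy sequences; it is not the GibbsCluster implementation.

```python
import math, random

def loo_logp(seq, members, alpha=0.1, alphabet=20):
    """Leave-one-out log-likelihood of seq under a position-specific
    residue-frequency model (with pseudocounts) built from `members`."""
    lp = 0.0
    for pos, aa in enumerate(seq):
        cnt = sum(t[pos] == aa for t in members)
        lp += math.log((cnt + alpha) / (len(members) + alphabet * alpha))
    return lp

def gibbs_cluster(seqs, k=2, sweeps=100, restarts=10):
    """Gibbs clustering of fixed-length peptides; keeps the best-scoring
    restart. Each sweep resamples every sequence's cluster in turn."""
    best, best_score = None, -math.inf
    for r in range(restarts):
        rng = random.Random(r)
        assign = [rng.randrange(k) for _ in seqs]
        for sweep in range(sweeps + 1):
            for i, s in enumerate(seqs):
                assign[i] = -1            # hold sequence i out
                logp = [loo_logp(s, [t for t, a in zip(seqs, assign) if a == c])
                        for c in range(k)]
                if sweep < sweeps:        # stochastic Gibbs update
                    m = max(logp)
                    w = [math.exp(x - m) for x in logp]
                    assign[i] = rng.choices(range(k), weights=w)[0]
                else:                     # final deterministic clean-up pass
                    assign[i] = max(range(k), key=logp.__getitem__)
        score = sum(loo_logp(s, [t for j, t in enumerate(seqs)
                                 if assign[j] == assign[i] and j != i])
                    for i, s in enumerate(seqs))
        if score > best_score:
            best, best_score = list(assign), score
    return best

# two clear motifs among hypothetical 9-mers: A-rich and G-rich
seqs = ["AAAAKAAAA", "AAAARAAAA", "AAAAKAAAG",
        "GGGGKGGGG", "GGGGRGGGG", "AGGGKGGGG"]
labels = gibbs_cluster(seqs)
print(labels)
```

    The full method additionally samples an offset for each peptide so that variable-length ligands align to a common motif window, which is what "simultaneous alignment and clustering" refers to.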

  2. Identifying pathogenic processes by integrating microarray data with prior knowledge

    PubMed Central

    2014-01-01

    Background It is of great importance to identify molecular processes and pathways that are involved in disease etiology. Although there has been an extensive use of various high-throughput methods for this task, pathogenic pathways are still not completely understood. Often the set of genes or proteins identified as altered in genome-wide screens shows a poor overlap with canonical disease pathways. These findings are difficult to interpret, yet crucial in order to improve the understanding of the molecular processes underlying the disease progression. We present a novel method for identifying groups of connected molecules from a set of differentially expressed genes. These groups represent functional modules sharing common cellular function and involve signaling and regulatory events. Specifically, our method makes use of Bayesian statistics to identify groups of co-regulated genes based on the microarray data, where external information about molecular interactions and connections is used as priors in the group assignments. Markov chain Monte Carlo sampling is used to search for the most reliable grouping. Results Simulation results showed that the method improved the ability to identify correct groups compared to traditional clustering, especially for small sample sizes. Applied to a microarray heart failure dataset the method found one large cluster with several genes important for the structure of the extracellular matrix and a smaller group with many genes involved in carbohydrate metabolism. The method was also applied to a microarray dataset on melanoma cancer patients with or without metastasis, where the main cluster was dominated by genes related to keratinocyte differentiation. Conclusion Our method found clusters overlapping with known pathogenic processes, but also pointed to new connections extending beyond the classical pathways. PMID:24758699

  3. Competing risks regression for clustered data

    PubMed Central

    Zhou, Bingqing; Fine, Jason; Latouche, Aurelien; Labopin, Myriam

    2012-01-01

    A population average regression model is proposed to assess the marginal effects of covariates on the cumulative incidence function when there is dependence across individuals within a cluster in the competing risks setting. This method extends the Fine–Gray proportional hazards model for the subdistribution to situations where individuals within a cluster may be correlated due to unobserved shared factors. Estimators of the regression parameters in the marginal model are developed under an independence working assumption, where the correlation across individuals within a cluster is completely unspecified. The estimators are consistent and asymptotically normal, and variance estimation may be achieved without specifying the form of the dependence across individuals. A simulation study shows that the inferential procedures perform well with realistic sample sizes. The practical utility of the methods is illustrated with data from the European Bone Marrow Transplant Registry. PMID:22045910
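
    The quantity being modeled here, the cumulative incidence function, can be estimated nonparametrically as a baseline for comparison. The sketch below implements the standard unclustered Aalen-Johansen estimator on invented data; it is not the clustered Fine–Gray regression proposed in the paper.

```python
def cuminc(times, events):
    """Aalen-Johansen cumulative incidence of cause 1 with a competing
    cause 2; events coded 0 = censored, 1 = cause 1, 2 = cause 2."""
    data = sorted(zip(times, events))
    at_risk, surv, total = len(data), 1.0, 0.0
    curve, i = [], 0
    while i < len(data):
        t = data[i][0]
        d1 = d2 = cens = 0
        while i < len(data) and data[i][0] == t:
            e = data[i][1]
            d1 += e == 1
            d2 += e == 2
            cens += e == 0
            i += 1
        total += surv * d1 / at_risk        # CIF_1 increment at time t
        curve.append((t, total))
        surv *= 1 - (d1 + d2) / at_risk     # overall event-free survival
        at_risk -= d1 + d2 + cens
    return curve

# tiny example: two cause-1 events, one competing event, one censored
curve = cuminc([1, 2, 3, 4], [1, 2, 1, 0])
print(curve)
```

    Unlike 1 minus the Kaplan-Meier estimate for cause 1 alone, this estimator correctly removes competing-event subjects from the incidence of cause 1, which is why competing-risks methods are needed in the first place.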

  4. Biased phylodynamic inferences from analysing clusters of viral sequences

    PubMed Central

    Xiang, Fei; Frost, Simon D. W.

    2017-01-01

    Phylogenetic methods are being increasingly used to help understand the transmission dynamics of measurably evolving viruses, including HIV. Clusters of highly similar sequences are often observed, which appear to follow a ‘power law’ behaviour, with a small number of very large clusters. These clusters may help to identify subpopulations in an epidemic, and inform where intervention strategies should be implemented. However, clustering of samples does not necessarily imply the presence of a subpopulation with high transmission rates, as groups of closely related viruses can also occur due to non-epidemiological effects such as over-sampling. It is important to ensure that observed phylogenetic clustering reflects true heterogeneity in the transmitting population, and is not being driven by non-epidemiological effects. We quantify the effect of using a falsely identified ‘transmission cluster’ of sequences to estimate phylodynamic parameters, including the effective population size and exponential growth rate, under several demographic scenarios. Our simulation studies show that taking the maximum-size cluster to re-estimate parameters from trees simulated under a randomly mixing, constant population size coalescent process systematically underestimates the overall effective population size. In addition, the transmission cluster wrongly resembles an exponential or logistic growth model 99% of the time. We also illustrate the consequences of false clusters in exponentially growing coalescent and birth-death trees, where again, the growth rate is skewed upwards. This has clear implications for identifying clusters in large viral databases, where a false cluster could result in wasted intervention resources. PMID:28852573

  5. Profiling physical activity motivation based on self-determination theory: a cluster analysis approach.

    PubMed

    Friederichs, Stijn Ah; Bolman, Catherine; Oenema, Anke; Lechner, Lilian

    2015-01-01

    In order to promote physical activity uptake and maintenance in individuals who do not comply with physical activity guidelines, it is important to increase our understanding of physical activity motivation among this group. The present study aimed to examine motivational profiles in a large sample of adults who do not comply with physical activity guidelines. The sample for this study consisted of 2473 individuals (31.4% male; age 44.6 ± 12.9). In order to generate motivational profiles based on motivational regulation, a cluster analysis was conducted. One-way analyses of variance were then used to compare the clusters in terms of demographics, physical activity level, motivation to be active and subjective experience while being active. Three motivational clusters were derived based on motivational regulation scores: a low motivation cluster, a controlled motivation cluster and an autonomous motivation cluster. These clusters differed significantly from each other with respect to physical activity behavior, motivation to be active and subjective experience while being active. Overall, the autonomous motivation cluster displayed more favorable characteristics compared to the other two clusters. The results of this study provide additional support for the importance of autonomous motivation in the context of physical activity behavior. The three derived clusters may be relevant in the context of physical activity interventions as individuals within the different clusters might benefit most from different intervention approaches. In addition, this study shows that cluster analysis is a useful method for differentiating between motivational profiles in large groups of individuals who do not comply with physical activity guidelines.
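
    The profiling step can be illustrated with a plain k-means sketch on invented two-dimensional regulation scores; the study used its own regulation subscales and clustering procedure, so everything below (data, axes, number of restarts) is illustrative only.

```python
import math, random

def kmeans(points, k, iters=50, restarts=20):
    """Plain k-means with restarts; returns the labelling from the run
    with the lowest within-cluster sum of squared distances."""
    best_labels, best_inertia = None, math.inf
    for r in range(restarts):
        rng = random.Random(r)
        centroids = rng.sample(points, k)
        for _ in range(iters):
            labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
            for c in range(k):
                grp = [p for p, lab in zip(points, labels) if lab == c]
                if grp:
                    centroids[c] = tuple(sum(x) / len(grp) for x in zip(*grp))
        inertia = sum(math.dist(p, centroids[lab]) ** 2
                      for p, lab in zip(points, labels))
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels

# hypothetical (autonomous, controlled) regulation scores for three profiles:
# low motivation, controlled motivation, autonomous motivation
scores = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (1.0, 4.0), (1.1, 4.2), (0.8, 3.9),
          (4.0, 1.0), (4.2, 1.2), (3.9, 0.8)]
labels = kmeans(scores, k=3)
print(labels)
```

    Once the three profiles are recovered, group differences in behavior and demographics can be tested with one-way ANOVAs, as in the study.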

  6. Stochastic multi-reference perturbation theory with application to the linearized coupled cluster method

    NASA Astrophysics Data System (ADS)

    Jeanmairet, Guillaume; Sharma, Sandeep; Alavi, Ali

    2017-01-01

    In this article we report a stochastic evaluation of the recently proposed multireference linearized coupled cluster theory [S. Sharma and A. Alavi, J. Chem. Phys. 143, 102815 (2015)]. In this method, both the zeroth-order and first-order wavefunctions are sampled stochastically by propagating simultaneously two populations of signed walkers. The sampling of the zeroth-order wavefunction follows a set of stochastic processes identical to the one used in the full configuration interaction quantum Monte Carlo (FCIQMC) method. To sample the first-order wavefunction, the usual FCIQMC algorithm is augmented with a source term that spawns walkers in the sampled first-order wavefunction from the zeroth-order wavefunction. The second-order energy is also computed stochastically but requires no additional overhead outside of the added cost of sampling the first-order wavefunction. This fully stochastic method opens up the possibility of simultaneously treating large active spaces to account for static correlation and recovering the dynamical correlation using perturbation theory. The method is used to study a few benchmark systems including the carbon dimer and aromatic molecules. We have computed the singlet-triplet gaps of benzene and m-xylylene. For m-xylylene, which has proved difficult for standard complete active space self-consistent field theory with perturbative correction, we find the singlet-triplet gap to be in good agreement with the experimental values.

  7. Sleep, Dietary, and Exercise Behavioral Clusters among Truck Drivers with Obesity: Implications for Interventions

    PubMed Central

    Olson, Ryan; Thompson, Sharon V.; Wipfli, Brad; Hanson, Ginger; Elliot, Diane L.; Anger, W. Kent; Bodner, Todd; Hammer, Leslie B.; Hohn, Elliot; Perrin, Nancy A.

    2015-01-01

    Objective Our objectives were to describe a sample of truck drivers, identify clusters of drivers with similar patterns in behaviors affecting energy balance (sleep, diet, and exercise), and test for cluster differences in health and psychosocial factors. Methods Participants’ (n=452, BMI M=37.2, 86.4% male) self-reported behaviors were dichotomized prior to hierarchical cluster analysis, which identified groups with similar behavior co-variation. Cluster differences were tested with generalized estimating equations. Results Five behavioral clusters were identified that differed significantly in age, smoking status, diabetes prevalence, lost work days, stress, and social support, but not in BMI. Cluster 2, characterized by the best sleep quality, had significantly lower lost workdays and stress than other clusters. Conclusions Weight management interventions for drivers should explicitly address sleep, and may be maximally effective after establishing socially supportive work environments that reduce stress exposures. PMID:26949883

  8. Plane-Based Sampling for Ray Casting Algorithm in Sequential Medical Images

    PubMed Central

    Lin, Lili; Chen, Shengyong; Shao, Yan; Gu, Zichun

    2013-01-01

    This paper proposes a plane-based sampling method to improve the traditional Ray Casting Algorithm (RCA) for the fast reconstruction of a three-dimensional biomedical model from sequential images. In the novel method, the optical properties of all sampling points depend on the intersection points as a ray travels through an equidistant parallel plane cluster of the volume dataset. The results show that the method improves the rendering speed by over three times compared with the conventional algorithm, while the image quality is well preserved. PMID:23424608
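
    A minimal sketch of the plane-based idea, under assumed geometry: the sample positions along a ray are the parameters t at which it crosses an equidistant stack of z-planes, rather than fixed-length steps. The function name and the axis-aligned plane stack are illustrative assumptions, not the paper's implementation.

```python
import math

def plane_sample_params(origin, direction, spacing, z_lo, z_hi):
    """Ray parameters t at which origin + t*direction crosses the
    equidistant parallel planes z = k*spacing inside [z_lo, z_hi].
    Plane-based sampling places sample points at these crossings."""
    dz = direction[2]
    if dz == 0:
        return []                      # ray parallel to the plane cluster
    ts = []
    k = math.ceil(z_lo / spacing)
    while k * spacing <= z_hi:
        t = (k * spacing - origin[2]) / dz
        if t >= 0:                     # keep crossings in front of the eye
            ts.append(t)
        k += 1
    return sorted(ts)

# ray entering a 5-plane stack (z = 0..4) from the origin
ts = plane_sample_params((0.0, 0.0, 0.0), (0.0, 0.0, 2.0), 1.0, 0.0, 4.0)
print(ts)
```

    Sampling exactly at the slice planes of sequential images avoids interpolating between slices at arbitrary step positions, which is one plausible source of the reported speedup.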

  9. Learning Bayesian Networks from Correlated Data

    NASA Astrophysics Data System (ADS)

    Bae, Harold; Monti, Stefano; Montano, Monty; Steinberg, Martin H.; Perls, Thomas T.; Sebastiani, Paola

    2016-05-01

    Bayesian networks are probabilistic models that represent complex distributions in a modular way and have become very popular in many fields. There are many methods to build Bayesian networks from a random sample of independent and identically distributed observations. However, many observational studies are designed using some form of clustered sampling that introduces correlations between observations within the same cluster and ignoring this correlation typically inflates the rate of false positive associations. We describe a novel parameterization of Bayesian networks that uses random effects to model the correlation within sample units and can be used for structure and parameter learning from correlated data without inflating the Type I error rate. We compare different learning metrics using simulations and illustrate the method in two real examples: an analysis of genetic and non-genetic factors associated with human longevity from a family-based study, and an example of risk factors for complications of sickle cell anemia from a longitudinal study with repeated measures.

  10. Damage evolution analysis of coal samples under cyclic loading based on single-link cluster method

    NASA Astrophysics Data System (ADS)

    Zhang, Zhibo; Wang, Enyuan; Li, Nan; Li, Xuelong; Wang, Xiaoran; Li, Zhonghui

    2018-05-01

    In this paper, the acoustic emission (AE) response of coal samples under cyclic loading is measured. The results show a good positive correlation between the AE parameters and stress. The AE signal of coal samples under cyclic loading exhibits an obvious Kaiser effect. The single-link cluster (SLC) method is applied to analyze the spatial evolution characteristics of AE events and the damage evolution process of coal samples. It is found that the subset scale of the SLC structure becomes smaller and smaller as the number of loading cycles increases, and there is a negative linear relationship between the subset scale and the degree of damage. The spatial correlation length ξ of the SLC structure is calculated. The results show that ξ fluctuates around a certain value from the second to the fifth loading cycle, but increases clearly in the sixth. Based on the criterion of microcrack density, the coal sample failure process is a transformation from small-scale damage to large-scale damage, which explains the changes in the spatial correlation length. This systematic analysis shows that the SLC method is an effective way to study the damage evolution of coal samples under cyclic loading, and it will provide an important reference for studying coal bursts.
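
    Single-link clustering of AE event locations can be sketched as a connected-components computation: any two events closer than a distance threshold are linked, and clusters are the resulting components. The event coordinates and threshold below are invented for illustration:

```python
import math

def single_link_clusters(points, dmax):
    """Single-link clustering via union-find: link any two events closer
    than dmax; clusters are the connected components."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < dmax:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

# invented AE hypocenters: two tight groups plus one isolated event
events = [(0, 0, 0), (1, 0, 0), (0, 1, 0),
          (10, 10, 0), (11, 10, 0), (30, 0, 0)]
clusters = single_link_clusters(events, dmax=2.0)
print(sorted(len(c) for c in clusters))     # → [1, 2, 3]
```

    Tracking how the sizes of these components (the subset scale) change across loading cycles is the quantity the paper relates to the degree of damage.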

  11. Astrophysical properties of star clusters in the Magellanic Clouds homogeneously estimated by ASteCA

    NASA Astrophysics Data System (ADS)

    Perren, G. I.; Piatti, A. E.; Vázquez, R. A.

    2017-06-01

    Aims: We seek to produce a homogeneous catalog of astrophysical parameters of 239 resolved star clusters, located in the Small and Large Magellanic Clouds, observed in the Washington photometric system. Methods: The cluster sample was processed with the recently introduced Automated Stellar Cluster Analysis (ASteCA) package, which ensures both an automatized and a fully reproducible treatment, together with a statistically based analysis of their fundamental parameters and associated uncertainties. The fundamental parameters determined for each cluster with this tool, via a color-magnitude diagram (CMD) analysis, are metallicity, age, reddening, distance modulus, and total mass. Results: We generated a homogeneous catalog of structural and fundamental parameters for the studied cluster sample and performed a detailed internal error analysis along with a thorough comparison with values taken from 26 published articles. We studied the distribution of cluster fundamental parameters in both Clouds and obtained their age-metallicity relationships. Conclusions: The ASteCA package can be applied to an unsupervised determination of fundamental cluster parameters, which is a task of increasing relevance as more data becomes available through upcoming surveys. A table with the estimated fundamental parameters for the 239 clusters analyzed is only available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (http://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/602/A89

  12. Mass Distribution in Galaxy Cluster Cores

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hogan, M. T.; McNamara, B. R.; Pulido, F.

    Many processes within galaxy clusters, such as those believed to govern the onset of thermally unstable cooling and active galactic nucleus feedback, are dependent upon local dynamical timescales. However, accurate mapping of the mass distribution within individual clusters is challenging, particularly toward cluster centers where the total mass budget has substantial radially dependent contributions from the stellar (M_*), gas (M_gas), and dark matter (M_DM) components. In this paper we use a small sample of galaxy clusters with deep Chandra observations and good ancillary tracers of their gravitating mass at both large and small radii to develop a method for determining mass profiles that span a wide radial range and extend down into the central galaxy. We also consider potential observational pitfalls in understanding cooling in hot cluster atmospheres, and find tentative evidence for a relationship between the radial extent of cooling X-ray gas and nebular Hα emission in cool-core clusters. At large radii the entropy profiles of our clusters agree with the baseline power law of K ∝ r^1.1 expected from gravity alone. At smaller radii our entropy profiles become shallower but continue with a power law of the form K ∝ r^0.67 down to our resolution limit. Among this small sample of cool-core clusters we therefore find no support for the existence of a central flat “entropy floor”.
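    The quoted exponents are slopes of straight-line fits in log-log space. A minimal least-squares version (synthetic, exactly power-law data; the radii are hypothetical) recovers the baseline slope:

```python
import math

def powerlaw_slope(r, K):
    """Least-squares slope of log K versus log r, i.e. the exponent a
    in K ∝ r^a."""
    x = [math.log(v) for v in r]
    y = [math.log(v) for v in K]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
            / sum((xi - mx) ** 2 for xi in x))

# an exact K = 3 * r^1.1 profile recovers the baseline slope:
r = [10, 30, 100, 300, 1000]          # radii in kpc, hypothetical
K = [3 * ri ** 1.1 for ri in r]
print(round(powerlaw_slope(r, K), 3))  # 1.1
```

Real entropy profiles require piecewise (broken power-law) fits to capture the change of slope at small radii.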

  13. The XXL Survey. II. The bright cluster sample: catalogue and luminosity function

    NASA Astrophysics Data System (ADS)

    Pacaud, F.; Clerc, N.; Giles, P. A.; Adami, C.; Sadibekova, T.; Pierre, M.; Maughan, B. J.; Lieu, M.; Le Fèvre, J. P.; Alis, S.; Altieri, B.; Ardila, F.; Baldry, I.; Benoist, C.; Birkinshaw, M.; Chiappetti, L.; Démoclès, J.; Eckert, D.; Evrard, A. E.; Faccioli, L.; Gastaldello, F.; Guennou, L.; Horellou, C.; Iovino, A.; Koulouridis, E.; Le Brun, V.; Lidman, C.; Liske, J.; Maurogordato, S.; Menanteau, F.; Owers, M.; Poggianti, B.; Pomarède, D.; Pompei, E.; Ponman, T. J.; Rapetti, D.; Reiprich, T. H.; Smith, G. P.; Tuffs, R.; Valageas, P.; Valtchanov, I.; Willis, J. P.; Ziparo, F.

    2016-06-01

    Context. The XXL Survey is the largest survey carried out by the XMM-Newton satellite and covers a total area of 50 square degrees distributed over two fields. It primarily aims at investigating the large-scale structures of the Universe using the distribution of galaxy clusters and active galactic nuclei as tracers of the matter distribution. The survey will ultimately uncover several hundred galaxy clusters out to a redshift of ~2 at a sensitivity of ~10^-14 erg s^-1 cm^-2 in the [0.5-2] keV band. Aims: This article presents the XXL bright cluster sample, a subsample of 100 galaxy clusters selected from the full XXL catalogue by setting a lower limit of 3 × 10^-14 erg s^-1 cm^-2 on the source flux within a 1' aperture. Methods: The selection function was estimated using a mixture of Monte Carlo simulations and analytical recipes that closely reproduce the source selection process. An extensive spectroscopic follow-up provided redshifts for 97 of the 100 clusters. We derived accurate X-ray parameters for all the sources. Scaling relations were self-consistently derived from the same sample in other publications of the series. On this basis, we study the number density, luminosity function, and spatial distribution of the sample. Results: The bright cluster sample consists of systems with masses between M500 = 7 × 10^13 and 3 × 10^14 M⊙, mostly located between z = 0.1 and 0.5. The observed sky density of clusters is slightly below the predictions of the WMAP9 model, and significantly below the prediction of the Planck 2015 cosmology. In general, within the current uncertainties of the cluster mass calibration, models with higher values of σ8 and/or ΩM appear more difficult to accommodate. We provide tight constraints on the cluster differential luminosity function and find no hint of evolution out to z ~ 1. We also find strong evidence for the presence of large-scale structures in the XXL bright cluster sample and identify five new superclusters.
Based on observations obtained with XMM-Newton, an ESA science mission with instruments and contributions directly funded by ESA Member States and NASA. Based on observations made with ESO Telescopes at the La Silla and Paranal Observatories under programme ID 089.A-0666 and LP191.A-0268.The Master Catalogue is available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (http://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/592/A2

  14. *K-means and cluster models for cancer signatures.

    PubMed

    Kakushadze, Zura; Yu, Willie

    2017-09-01

    We present the *K-means clustering algorithm and source code, expanding statistical clustering methods previously applied to quantitative finance in https://ssrn.com/abstract=2802753. *K-means is statistically deterministic without the need to specify initial centers. We apply *K-means to extracting cancer signatures from genome data without using nonnegative matrix factorization (NMF); *K-means' computational cost is a fraction of NMF's. Using 1389 published samples for 14 cancer types, we find that 3 cancers (liver cancer, lung cancer, and renal cell carcinoma) stand out and do not have cluster-like structures. Two clusters have especially high within-cluster correlations with 11 other cancers, indicating common underlying structures. Our approach opens a novel avenue for studying such structures. *K-means is universal and can be applied in other fields; we discuss some potential applications in quantitative finance.
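    For reference, plain Lloyd's k-means looks as follows. This sketch uses fixed, hand-picked initial centers, which stands in for (but is not) the statistically deterministic seeding that distinguishes *K-means:

```python
def kmeans(points, centers, iters=50):
    """Lloyd's algorithm: assign each point to its nearest center, then
    move each center to the mean of its assigned points."""
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: sum((a - b) ** 2
                                            for a, b in zip(p, centers[i])))
            groups[nearest].append(p)
        centers = [tuple(sum(coord) / len(g) for coord in zip(*g)) if g
                   else centers[i] for i, g in enumerate(groups)]
    return centers, groups

pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
centers, groups = kmeans(pts, centers=[(0.0, 0.0), (10.0, 10.0)])
print(sorted(len(g) for g in groups))   # [2, 3]
```

With standard k-means the result depends on the initial centers; removing that dependence is precisely the point of the paper's variant.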

  15. Inclusion of trial functions in the Langevin equation path integral ground state method: Application to parahydrogen clusters and their isotopologues

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schmidt, Matthew; Constable, Steve; Ing, Christopher

    2014-06-21

    We developed and studied the implementation of trial wavefunctions in the newly proposed Langevin equation Path Integral Ground State (LePIGS) method [S. Constable, M. Schmidt, C. Ing, T. Zeng, and P.-N. Roy, J. Phys. Chem. A 117, 7461 (2013)]. The LePIGS method is based on the Path Integral Ground State (PIGS) formalism combined with Path Integral Molecular Dynamics sampling, using Langevin-equation-based sampling of the canonical distribution. The LePIGS method originally incorporated a trivial trial wavefunction, ψ_T, equal to unity. The present paper assesses the effectiveness of three different trial wavefunctions on three isotopes of hydrogen for cluster sizes N = 4, 8, and 13. The trial wavefunctions of interest are the unity trial wavefunction used in the original LePIGS work, a Jastrow trial wavefunction that includes correlations due to hard-core repulsions, and a normal mode trial wavefunction that includes information on the equilibrium geometry. Based on this analysis, we opt for the Jastrow wavefunction to calculate energetic and structural properties for parahydrogen, orthodeuterium, and paratritium clusters of size N = 4-19 and 33. Energetic and structural properties are obtained and compared to earlier work based on Monte Carlo PIGS simulations to assess the accuracy of the proposed approach. The new results for paratritium clusters will serve as benchmarks for future studies. This paper provides a detailed, yet general, method for optimizing the parameters required to study the ground state of a large variety of systems.

  16. The Luminosity Function of Star Clusters in 20 Star-Forming Galaxies Based on Hubble Legacy Archive Photometry

    NASA Astrophysics Data System (ADS)

    Bowers, Ariel; Whitmore, B. C.; Chandar, R.; Larsen, S. S.

    2014-01-01

    Luminosity functions have been determined for star cluster populations in 20 nearby (4-30 Mpc), star-forming galaxies based on ACS source lists generated by the Hubble Legacy Archive (http://hla.stsci.edu). These cluster catalogs provide one of the largest sets of uniform, automatically generated cluster candidates available in the literature at present. Comparisons with other recently generated cluster catalogs demonstrate that the HLA-generated catalogs are of similar quality, but in general do not go as deep. A typical cluster luminosity function can be approximated by a power law, dN/dL ∝ L^α, with an average value for α of -2.37 and rms scatter of 0.18. A comparison of fitting results based on methods using binned and unbinned data shows good agreement, although there may be a systematic tendency for the unbinned (maximum-likelihood) method to give slightly more negative values of α for galaxies with steeper luminosity functions. Our uniform database results in a small scatter (0.5 mag) in the correlation between the magnitude of the brightest cluster (M_brightest) and the log of the number of clusters brighter than M_I = -9 (log N). We also examine the magnitude of the brightest cluster versus log SFR for a sample including LIRGs and ULIRGs.
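    The unbinned (maximum-likelihood) fit has a closed form for a pure power law, alpha_hat = -1 - n / Σ ln(L_i / L_min). A sketch that draws a synthetic sample at the mean slope by inverse-transform sampling and recovers it (all numbers here are synthetic, not catalog data):

```python
import math
import random

random.seed(1)

def mle_alpha(L, Lmin):
    """Unbinned MLE of alpha for dN/dL ∝ L^alpha (alpha < -1), L >= Lmin."""
    return -1 - len(L) / sum(math.log(l / Lmin) for l in L)

# inverse-transform draw from p(L) ∝ L^alpha for L >= Lmin:
alpha, Lmin = -2.37, 1.0
L = [Lmin * (1 - random.random()) ** (1 / (alpha + 1)) for _ in range(20000)]
print(round(mle_alpha(L, Lmin), 2))   # recovers roughly -2.37
```

Binned fits approach this estimate as bins shrink, which is consistent with the good agreement reported above.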

  17. Comprehensive analysis of Polygoni Multiflori Radix of different geographical origins using ultra-high-performance liquid chromatography fingerprints and multivariate chemometric methods.

    PubMed

    Sun, Li-Li; Wang, Meng; Zhang, Hui-Jie; Liu, Ya-Nan; Ren, Xiao-Liang; Deng, Yan-Ru; Qi, Ai-Di

    2018-01-01

    Polygoni Multiflori Radix (PMR) is increasingly being used not just as a traditional herbal medicine but also as a popular functional food. In this study, multivariate chemometric methods and mass spectrometry were combined to analyze the ultra-high-performance liquid chromatography (UPLC) fingerprints of PMR from six different geographical origins. A chemometric strategy based on multivariate curve resolution-alternating least squares (MCR-ALS) and three classification methods is proposed to analyze the UPLC fingerprints obtained. Common chromatographic problems, including the background contribution, baseline contribution, and peak overlap, were handled by the established MCR-ALS model. A total of 22 components were resolved. Moreover, relative component concentrations were obtained from the MCR-ALS model, which were then used for multivariate classification analysis. Principal component analysis (PCA) and Ward's method were applied to classify 72 PMR samples from six different geographical regions. The PCA score plot showed that the PMR samples fell into four clusters, which were related to the geographical location and climate of the source areas. The results were then corroborated by Ward's method. In addition, according to the variance-weighted distance between cluster centers obtained from Ward's method, five components were identified as the most significant variables (chemical markers) for cluster discrimination. A counter-propagation artificial neural network was applied to confirm and predict the effects of the chemical markers on different samples. Finally, the five chemical markers were identified by UPLC-quadrupole time-of-flight mass spectrometry. Components 3, 12, 16, 18, and 19 were identified as 2,3,5,4'-tetrahydroxy-stilbene-2-O-β-d-glucoside, emodin-8-O-β-d-glucopyranoside, emodin-8-O-(6'-O-acetyl)-β-d-glucopyranoside, emodin, and physcion, respectively.
In conclusion, the proposed method can be applied for the comprehensive analysis of natural samples. Copyright © 2016. Published by Elsevier B.V.
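    The PCA step used for classification can be sketched with power iteration on the covariance matrix (pure Python, 2-D toy data standing in for the resolved-component concentrations; nothing below comes from the paper's dataset):

```python
def pca_first_component(data, iters=200):
    """First principal component (direction of maximum variance) by
    power iteration on the sample covariance matrix."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    X = [[row[j] - means[j] for j in range(d)] for row in data]
    C = [[sum(X[k][i] * X[k][j] for k in range(n)) / (n - 1)
          for j in range(d)] for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9)]
v = pca_first_component(data)
print([round(x, 2) for x in v])   # roughly equal components: variance lies along the diagonal
```

Projecting each sample onto the leading components gives the score plot in which the origin-related clusters appear; Ward's method then groups the scores hierarchically.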

  18. SparRec: An effective matrix completion framework of missing data imputation for GWAS

    NASA Astrophysics Data System (ADS)

    Jiang, Bo; Ma, Shiqian; Causey, Jason; Qiao, Linbo; Hardin, Matthew Price; Bitts, Ian; Johnson, Daniel; Zhang, Shuzhong; Huang, Xiuzhen

    2016-10-01

    Genome-wide association studies present computational challenges for missing data imputation, as advances in genotyping technologies generate datasets of large sample size with sample sets genotyped on multiple SNP chips. We present a new imputation framework, SparRec (Sparse Recovery), with the following properties: (1) The optimization models of SparRec, based on low rank and a low number of matrix co-clusters, differ from current statistical methods. While our low-rank matrix completion (LRMC) model is similar to Mendel-Impute, our matrix co-clustering factorization (MCCF) model is completely new. (2) SparRec, like other matrix completion methods, can be applied flexibly to missing data imputation for large meta-analyses with different cohorts genotyped on different sets of SNPs, even when there is no reference panel. This kind of meta-analysis is very challenging for current statistics-based methods. (3) SparRec has consistent performance and achieves high recovery accuracy even when the missing-data rate is as high as 90%. Compared with Mendel-Impute, our low-rank-based method achieves similar accuracy and efficiency, while the co-clustering-based method has advantages in running time. The testing results show that SparRec has significant advantages and competitive performance over other state-of-the-art statistical methods, including Beagle and fastPhase.
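    The low-rank idea behind LRMC can be illustrated at rank 1 with alternating least squares (a toy sketch, not SparRec itself, which handles general ranks, co-clustering, and genotype-scale matrices):

```python
def complete_rank1(M, iters=100):
    """Fill missing entries (None) of a matrix assumed to be rank 1,
    M ≈ u v^T, by alternating least squares on the observed entries."""
    rows, cols = len(M), len(M[0])
    u = [1.0] * rows
    v = [1.0] * cols
    for _ in range(iters):
        for i in range(rows):
            obs = [(j, M[i][j]) for j in range(cols) if M[i][j] is not None]
            u[i] = sum(v[j] * m for j, m in obs) / sum(v[j] ** 2 for j, _ in obs)
        for j in range(cols):
            obs = [(i, M[i][j]) for i in range(rows) if M[i][j] is not None]
            v[j] = sum(u[i] * m for i, m in obs) / sum(u[i] ** 2 for i, _ in obs)
    return [[u[i] * v[j] for j in range(cols)] for i in range(rows)]

# a rank-1 matrix (outer product of [1, 2, 3] and [1, 2]) with one entry hidden:
M = [[1, 2], [2, None], [3, 6]]
print(round(complete_rank1(M)[1][1], 2))   # 4.0 (the hidden entry, 2*2)
```

Genotype matrices are approximately low rank because nearby SNPs are correlated, which is what makes recovery possible even at high missingness.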

  19. THE DYNAMICS OF MERGING CLUSTERS: A MONTE CARLO SOLUTION APPLIED TO THE BULLET AND MUSKET BALL CLUSTERS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dawson, William A., E-mail: wadawson@ucdavis.edu

    2013-08-01

    Merging galaxy clusters have become one of the most important probes of dark matter, providing evidence for dark matter over modified gravity and even constraints on the dark matter self-interaction cross-section. To properly constrain the dark matter cross-section it is necessary to understand the dynamics of the merger, as the inferred cross-section is a function of both the velocity of the collision and the observed time since collision. While the best understanding of merging system dynamics comes from N-body simulations, these are computationally intensive and often explore only a limited volume of the merger phase space allowed by observed parameter uncertainty. Simple analytic models exist, but their assumptions invalidate their results near the collision time, and error propagation of the highly correlated merger parameters is infeasible. To address these weaknesses I develop a Monte Carlo method to discern the properties of dissociative mergers and propagate the uncertainty of the measured cluster parameters in an accurate and Bayesian manner. I introduce this method, verify it against an existing hydrodynamic N-body simulation, and apply it to two known dissociative mergers: 1ES 0657-558 (Bullet Cluster) and DLSCL J0916.2+2951 (Musket Ball Cluster). I find that this method surpasses existing analytic models, providing accurate (10% level) dynamic parameter and uncertainty estimates throughout the merger history. This, coupled with minimal required a priori information (subcluster mass, redshift, and projected separation) and relatively fast computation (~6 CPU hours), makes this method ideal for large samples of dissociative merging clusters.
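    The core Monte Carlo idea, drawing the observed inputs from their uncertainty distributions and pushing each draw through the dynamics, can be sketched with a deliberately crude two-body estimate (all input values are hypothetical and the physics is a stand-in; the paper's model is far more complete):

```python
import math
import random

random.seed(3)
G = 4.30e-6   # gravitational constant in kpc (km/s)^2 / Msun

def infall_speed(m1, m2, d):
    """Relative speed at separation d for two point masses falling from
    rest at infinity; a crude stand-in for the merger dynamics."""
    return math.sqrt(2 * G * (m1 + m2) / d)

# propagate Gaussian measurement uncertainties by Monte Carlo:
draws = sorted(infall_speed(random.gauss(1.5e15, 2e14),   # M1 (Msun)
                            random.gauss(1.5e14, 2e13),   # M2 (Msun)
                            random.gauss(1000, 100))      # d (kpc)
               for _ in range(5000))
median = draws[len(draws) // 2]
spread = draws[int(0.84 * len(draws))] - draws[int(0.16 * len(draws))]
print(round(median), round(spread))   # central estimate and ~1-sigma width
```

Because every draw carries consistent inputs through the calculation, the correlations between merger parameters are propagated automatically, which is what defeats naive analytic error propagation.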

  20. Characterizing Suicide in Toronto: An Observational Study and Cluster Analysis

    PubMed Central

    Sinyor, Mark; Schaffer, Ayal; Streiner, David L

    2014-01-01

    Objective: To determine whether people who have died from suicide in a large epidemiologic sample form clusters based on demographic, clinical, and psychosocial factors. Method: We conducted a coroner’s chart review for 2886 people who died in Toronto, Ontario, from 1998 to 2010, and whose death was ruled as suicide by the Office of the Chief Coroner of Ontario. A cluster analysis using known suicide risk factors was performed to determine whether suicide deaths separate into distinct groups. Clusters were compared according to person- and suicide-specific factors. Results: Five clusters emerged. Cluster 1 had the highest proportion of females and nonviolent methods, and all had depression and a past suicide attempt. Cluster 2 had the highest proportion of people with a recent stressor and violent suicide methods, and all were married. Cluster 3 had mostly males between the ages of 20 and 64, and all had either experienced recent stressors, suffered from mental illness, or had a history of substance abuse. Cluster 4 had the youngest people and the highest proportion of deaths by jumping from height; few were married, and nearly one-half had bipolar disorder or schizophrenia. Cluster 5 consisted entirely of unmarried people with no prior suicide attempts, who were the least likely to have an identified mental illness and the most likely to leave a suicide note. Conclusions: People who die from suicide assort into different patterns of demographic, clinical, and death-specific characteristics. Identifying and studying subgroups of suicides may advance our understanding of the heterogeneous nature of suicide and help to inform development of more targeted suicide prevention strategies. PMID:24444321

  1. A Granular Self-Organizing Map for Clustering and Gene Selection in Microarray Data.

    PubMed

    Ray, Shubhra Sankar; Ganivada, Avatharam; Pal, Sankar K

    2016-09-01

    A new granular self-organizing map (GSOM) is developed by integrating the concept of a fuzzy rough set with the SOM. While training the GSOM, the weights of a winning neuron and the neighborhood neurons are updated through a modified learning procedure, with the neighborhood newly defined using fuzzy rough sets. The clusters (granules) evolved by the GSOM are presented to a decision table as its decision classes, and based on the decision table a method of gene selection is developed. The effectiveness of the GSOM is shown both in clustering samples and in developing an unsupervised fuzzy rough feature selection (UFRFS) method for gene selection in microarray data. The GSOM yields better results than related clustering methods in terms of β-index, DB-index, Dunn-index, and fuzzy rough entropy, while the genes selected by the UFRFS are not only better in terms of classification accuracy and a feature evaluation index, but also statistically more significant than those selected by related unsupervised methods. The C-codes of the GSOM and UFRFS are available online at http://avatharamg.webs.com/software-code.

  2. A stellar census in globular clusters with MUSE: The contribution of rotation to cluster dynamics studied with 200 000 stars

    NASA Astrophysics Data System (ADS)

    Kamann, S.; Husser, T.-O.; Dreizler, S.; Emsellem, E.; Weilbacher, P. M.; Martens, S.; Bacon, R.; den Brok, M.; Giesers, B.; Krajnović, D.; Roth, M. M.; Wendt, M.; Wisotzki, L.

    2018-02-01

    This is the first of a series of papers presenting the results from our survey of 25 Galactic globular clusters with the MUSE integral-field spectrograph. In combination with our dedicated algorithm for source deblending, MUSE provides unique multiplex capabilities in crowded stellar fields and allows us to acquire samples of up to 20 000 stars within the half-light radius of each cluster. The present paper focuses on the analysis of the internal dynamics of 22 out of the 25 clusters, using about 500 000 spectra of 200 000 individual stars. Thanks to the large stellar samples per cluster, we are able to perform a detailed analysis of the central rotation and dispersion fields using both radial profiles and two-dimensional maps. The velocity dispersion profiles we derive show a good general agreement with existing radial velocity studies but typically reach closer to the cluster centres. By comparison with proper motion data, we derive or update the dynamical distance estimates to 14 clusters. Compared to previous dynamical distance estimates for 47 Tuc, our value is in much better agreement with other methods. We further find significant (>3σ) rotation in the majority (13/22) of our clusters. Our analysis seems to confirm earlier findings of a link between rotation and the ellipticities of globular clusters. In addition, we find a correlation between the strengths of internal rotation and the relaxation times of the clusters, suggesting that the central rotation fields are relics of the cluster formation that are gradually dissipated via two-body relaxation.

  3. Tailoring magnetic properties of Co nanocluster assembled films using hydrogen

    NASA Astrophysics Data System (ADS)

    Romero, C. P.; Volodin, A.; Paddubrouskaya, H.; Van Bael, M. J.; Van Haesendonck, C.; Lievens, P.

    2018-07-01

    Tailoring magnetic properties in nanocluster assembled cobalt (Co) thin films was achieved by admitting a small percentage of H2 gas (∼2%) into the Co gas phase cluster formation chamber prior to deposition. The oxygen content in the films is considerably reduced by the presence of hydrogen during the cluster formation, leading to enhanced magnetic interactions between clusters. Two sets of Co samples were fabricated, one without hydrogen gas and one with hydrogen gas. Magnetic properties of the non-hydrogenated and the hydrogen-treated Co nanocluster assembled films are comparatively studied using magnetic force microscopy and vibrating sample magnetometry. When comparing the two sets of samples the considerably larger coercive field of the H2-treated Co nanocluster film and the extended micrometer-sized magnetic domain structure confirm the enhancement of magnetic interactions between clusters. The thickness of the antiferromagnetic CoO layer is controlled with this procedure and modifies the exchange bias effect in these films. The exchange bias shift is lower for the H2-treated Co nanocluster film, which indicates that a thinner antiferromagnetic CoO reduces the coupling with the ferromagnetic Co. The hydrogen-treatment method can be used to tailor the oxidation levels thus controlling the magnetic properties of ferromagnetic cluster-assembled films.

  4. Widespread Micropollutant Monitoring in the Hudson River Estuary Reveals Spatiotemporal Micropollutant Clusters and Their Sources.

    PubMed

    Carpenter, Corey M G; Helbling, Damian E

    2018-06-05

    The objective of this study was to identify sources of micropollutants in the Hudson River Estuary (HRE). We collected 127 grab samples at 17 sites along the HRE over 2 years and screened for up to 200 micropollutants. We quantified 168 of the micropollutants in at least one of the samples. Atrazine, gabapentin, metolachlor, and sucralose were measured in every sample. We used data-driven unsupervised methods to cluster the micropollutants on the basis of their spatiotemporal occurrence and normalized-concentration patterns. Three major clusters of micropollutants were identified: ubiquitous and mixed-use (core micropollutants), sourced from sewage treatment plant outfalls (STP micropollutants), and derived from diffuse upstream sources (diffuse micropollutants). Each of these clusters was further refined into subclusters that were linked to specific sources on the basis of relationships identified through geospatial analysis of watershed features. Evaluation of cumulative loadings of each subcluster revealed that the Mohawk River and Rondout Creek are major contributors of most core micropollutants and STP micropollutants and the upper HRE is a major contributor of diffuse micropollutants. These data provide the first comprehensive evaluation of micropollutants in the HRE and define distinct spatiotemporal micropollutant clusters that are linked to sources and conserved across surface water systems around the world.

  5. PRIMUS: Galaxy clustering as a function of luminosity and color at 0.2 < z < 1

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Skibba, Ramin A.; Smith, M. Stephen M.; Coil, Alison L.

    2014-04-01

    We present measurements of the luminosity and color-dependence of galaxy clustering at 0.2 < z < 1.0 in the Prism Multi-object Survey. We quantify the clustering with the redshift-space and projected two-point correlation functions, ξ(r_p, π) and w_p(r_p), using volume-limited samples constructed from a parent sample of over ~130,000 galaxies with robust redshifts in seven independent fields covering 9 deg^2 of sky. We quantify how the scale-dependent clustering amplitude increases with increasing luminosity and redder color, with relatively small errors over large volumes. We find that red galaxies have stronger small-scale (0.1 Mpc h^-1 < r_p < 1 Mpc h^-1) clustering and steeper correlation functions compared to blue galaxies, as well as a strong color dependent clustering within the red sequence alone. We interpret our measured clustering trends in terms of galaxy bias and obtain values of b_gal ≈ 0.9-2.5, quantifying how galaxies are biased tracers of dark matter depending on their luminosity and color. We also interpret the color dependence with mock catalogs, and find that the clustering of blue galaxies is nearly constant with color, while redder galaxies have stronger clustering in the one-halo term due to a higher satellite galaxy fraction. In addition, we measure the evolution of the clustering strength and bias, and we do not detect statistically significant departures from passive evolution. We argue that the luminosity- and color-environment (or halo mass) relations of galaxies have not significantly evolved since z ∼ 1. Finally, using jackknife subsampling methods, we find that sampling fluctuations are important and that the COSMOS field is generally an outlier, due to having more overdense structures than other fields; we find that 'cosmic variance' can be a significant source of uncertainty for high-redshift clustering measurements.

  6. PRIMUS: Galaxy Clustering as a Function of Luminosity and Color at 0.2 < z < 1

    NASA Astrophysics Data System (ADS)

    Skibba, Ramin A.; Smith, M. Stephen M.; Coil, Alison L.; Moustakas, John; Aird, James; Blanton, Michael R.; Bray, Aaron D.; Cool, Richard J.; Eisenstein, Daniel J.; Mendez, Alexander J.; Wong, Kenneth C.; Zhu, Guangtun

    2014-04-01

    We present measurements of the luminosity and color-dependence of galaxy clustering at 0.2 < z < 1.0 in the Prism Multi-object Survey. We quantify the clustering with the redshift-space and projected two-point correlation functions, ξ(r_p, π) and w_p(r_p), using volume-limited samples constructed from a parent sample of over ~130,000 galaxies with robust redshifts in seven independent fields covering 9 deg^2 of sky. We quantify how the scale-dependent clustering amplitude increases with increasing luminosity and redder color, with relatively small errors over large volumes. We find that red galaxies have stronger small-scale (0.1 Mpc h^-1 < r_p < 1 Mpc h^-1) clustering and steeper correlation functions compared to blue galaxies, as well as a strong color dependent clustering within the red sequence alone. We interpret our measured clustering trends in terms of galaxy bias and obtain values of b_gal ≈ 0.9-2.5, quantifying how galaxies are biased tracers of dark matter depending on their luminosity and color. We also interpret the color dependence with mock catalogs, and find that the clustering of blue galaxies is nearly constant with color, while redder galaxies have stronger clustering in the one-halo term due to a higher satellite galaxy fraction. In addition, we measure the evolution of the clustering strength and bias, and we do not detect statistically significant departures from passive evolution. We argue that the luminosity- and color-environment (or halo mass) relations of galaxies have not significantly evolved since z ~ 1. Finally, using jackknife subsampling methods, we find that sampling fluctuations are important and that the COSMOS field is generally an outlier, due to having more overdense structures than other fields; we find that "cosmic variance" can be a significant source of uncertainty for high-redshift clustering measurements.
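    The jackknife subsampling used for the cosmic-variance estimate deletes one field at a time; a minimal sketch with made-up per-field clustering amplitudes (one high outlier playing the role of COSMOS):

```python
def jackknife(values):
    """Delete-one jackknife mean and error for a set of subsample
    estimates: leave each value out in turn and measure the spread."""
    n = len(values)
    full_mean = sum(values) / n
    leave_one = [(sum(values) - v) / (n - 1) for v in values]
    var = (n - 1) / n * sum((m - full_mean) ** 2 for m in leave_one)
    return full_mean, var ** 0.5

fields = [1.9, 2.1, 2.0, 1.8, 2.2, 2.0, 3.1]   # hypothetical per-field values
mean, err = jackknife(fields)
print(round(mean, 2), round(err, 2))   # 2.16 0.16
```

The outlier field dominates the error budget, which is how a single overdense field like COSMOS shows up as "cosmic variance" in the uncertainty.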

  7. Multimorbidity and health-related quality of life (HRQoL) in a nationally representative population sample: implications of count versus cluster method for defining multimorbidity on HRQoL.

    PubMed

    Wang, Lili; Palmer, Andrew J; Cocker, Fiona; Sanderson, Kristy

    2017-01-09

    No universally accepted definition of multimorbidity (MM) exists, and implications of different definitions have not been explored. This study examined the performance of the count and cluster definitions of multimorbidity on the sociodemographic profile and health-related quality of life (HRQoL) in a general population. Data were derived from the nationally representative 2007 Australian National Survey of Mental Health and Wellbeing (n = 8841). The HRQoL scores were measured using the Assessment of Quality of Life (AQoL-4D) instrument. The simple count (2+ & 3+ conditions) and hierarchical cluster methods were used to define/identify clusters of multimorbidity. Linear regression was used to assess the associations between HRQoL and multimorbidity as defined by the different methods. The assessment of multimorbidity, which was defined using the count method, resulting in the prevalence of 26% (MM2+) and 10.1% (MM3+). Statistically significant clusters identified through hierarchical cluster analysis included heart or circulatory conditions (CVD)/arthritis (cluster-1, 9%) and major depressive disorder (MDD)/anxiety (cluster-2, 4%). A sensitivity analysis suggested that the stability of the clusters resulted from hierarchical clustering. The sociodemographic profiles were similar between MM2+, MM3+ and cluster-1, but were different from cluster-2. HRQoL was negatively associated with MM2+ (β: -0.18, SE: -0.01, p < 0.001), MM3+ (β: -0.23, SE: -0.02, p < 0.001), cluster-1 (β: -0.10, SE: 0.01, p < 0.001) and cluster-2 (β: -0.36, SE: 0.01, p < 0.001). Our findings confirm the existence of an inverse relationship between multimorbidity and HRQoL in the Australian population and indicate that the hierarchical clustering approach is validated when the outcome of interest is HRQoL from this head-to-head comparison. Moreover, a simple count fails to identify if there are specific conditions of interest that are driving poorer HRQoL. 
Researchers should exercise caution when selecting a definition of multimorbidity because it may significantly influence the study outcomes.
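
The "simple count" definition compared above is mechanical enough to sketch in a few lines; the condition names and thresholds below are illustrative, not taken from the survey.

```python
def multimorbidity_flags(conditions, thresholds=(2, 3)):
    """Flag multimorbidity under the simple count definition (MM2+, MM3+)."""
    n = len(conditions)
    return {"MM%d+" % t: n >= t for t in thresholds}

# Hypothetical respondent with three chronic conditions.
person = {"arthritis", "heart or circulatory condition", "anxiety"}
print(multimorbidity_flags(person))   # {'MM2+': True, 'MM3+': True}
```

The cluster definitions, by contrast, depend on which condition combinations co-occur in the data, which is why the two approaches can disagree on who counts as multimorbid.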

  8. Changing cluster composition in cluster randomised controlled trials: design and analysis considerations

    PubMed Central

    2014-01-01

    Background There are many methodological challenges in the conduct and analysis of cluster randomised controlled trials, but one that has received little attention is that of post-randomisation changes to cluster composition. To illustrate this, we focus on the issue of cluster merging, considering the impact on the design, analysis and interpretation of trial outcomes. Methods We explored the effects of merging clusters on study power using standard methods of power calculation. We assessed the potential impacts on study findings of both homogeneous cluster merges (involving clusters randomised to the same arm of a trial) and heterogeneous merges (involving clusters randomised to different arms of a trial) by simulation. To determine the impact on bias and precision of treatment effect estimates, we applied standard methods of analysis to different populations under analysis. Results Cluster merging produced a systematic reduction in study power. This effect depended on the number of merges and was most pronounced when variability in cluster size was at its greatest. Simulations demonstrated that the impact on analysis was minimal when cluster merges were homogeneous, with impact on study power being balanced by a change in observed intracluster correlation coefficient (ICC). We found a decrease in study power when cluster merges were heterogeneous, and the estimate of treatment effect was attenuated. Conclusions Examples of cluster merges found in previously published reports of cluster randomised trials were typically homogeneous rather than heterogeneous. Simulations demonstrated that trial findings in such cases would be unbiased. However, simulations also showed that any heterogeneous cluster merges would introduce bias that would be hard to quantify, as well as having negative impacts on the precision of estimates obtained. Further methodological development is warranted to better determine how to analyse such trials appropriately.
Interim recommendations include avoidance of cluster merges where possible, discontinuation of clusters following heterogeneous merges, allowance for potential loss of clusters and additional variability in cluster size in the original sample size calculation, and use of appropriate ICC estimates that reflect cluster size. PMID:24884591
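
The power loss from merging can be illustrated with the standard design-effect correction for unequal cluster sizes, DE = 1 + ((cv^2 + 1) * mean_size - 1) * ICC, where cv is the coefficient of variation of cluster size. This is a generic sketch of the mechanism, not the authors' simulation code; the ICC and cluster sizes are hypothetical.

```python
import statistics

def design_effect(cluster_sizes, icc):
    """DE = 1 + ((cv^2 + 1) * mean_size - 1) * icc: the usual clustering
    correction, extended for variable cluster sizes via the CV of size."""
    m = statistics.mean(cluster_sizes)
    cv = statistics.pstdev(cluster_sizes) / m
    return 1 + ((cv ** 2 + 1) * m - 1) * icc

def effective_n(cluster_sizes, icc):
    """Effective sample size after deflating by the design effect."""
    return sum(cluster_sizes) / design_effect(cluster_sizes, icc)

icc = 0.05                      # hypothetical ICC
planned = [20] * 10             # ten clusters of 20 participants
merged = [20] * 6 + [40, 40]    # same 200 participants after two homogeneous merges
print(effective_n(planned, icc), effective_n(merged, icc))
```

Merging leaves the total sample unchanged but raises both the mean cluster size and its variability, so the effective sample size, and hence power, drops; this is the mechanism behind the recommendation to allow for extra cluster-size variability at the design stage.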

  9. Effective optimization using sample persistence: A case study on quantum annealers and various Monte Carlo optimization methods

    NASA Astrophysics Data System (ADS)

    Karimi, Hamed; Rosenberg, Gili; Katzgraber, Helmut G.

    2017-10-01

    We present and apply a general-purpose, multistart algorithm for improving the performance of low-energy samplers used for solving optimization problems. The algorithm iteratively fixes the value of a large portion of the variables to values that have a high probability of being optimal. The resulting problems are smaller and less connected, and samplers tend to give better low-energy samples for these problems. The algorithm is trivially parallelizable since each start in the multistart algorithm is independent, and could be applied to any heuristic solver that can be run multiple times to give a sample. We present results for several classes of hard problems solved using simulated annealing, path-integral quantum Monte Carlo, parallel tempering with isoenergetic cluster moves, and a quantum annealer, and show that the success metrics and the scaling are improved substantially. When combined with this algorithm, the quantum annealer's scaling was substantially improved for native Chimera graph problems. In addition, with this algorithm the scaling of the time to solution of the quantum annealer is comparable to the Hamze-de Freitas-Selby algorithm on the weak-strong cluster problems introduced by Boixo et al. Parallel tempering with isoenergetic cluster moves was able to consistently solve three-dimensional spin glass problems with 8000 variables when combined with our method, whereas without our method it could not solve any.
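
The fixing step at the heart of the algorithm can be sketched as follows; the "sampler" here is a toy stand-in for a real low-energy solver, and the 90% persistence threshold is an illustrative choice, not the paper's tuned value.

```python
import random
from collections import Counter

def persistent_fixing(sample_fn, n_vars, n_samples=50, threshold=0.9):
    """Fix every variable whose value agrees in >= threshold of the low-energy
    samples; the remaining variables define the smaller subproblem to re-solve."""
    samples = [sample_fn() for _ in range(n_samples)]
    fixed = {}
    for i in range(n_vars):
        value, count = Counter(s[i] for s in samples).most_common(1)[0]
        if count / n_samples >= threshold:
            fixed[i] = value
    return fixed

# Toy "sampler": spins 0 and 1 are persistently +1, spin 2 fluctuates.
random.seed(0)
toy_sampler = lambda: [1, 1, random.choice([-1, 1])]
print(persistent_fixing(toy_sampler, 3))
```

Because each multistart run is independent, the sampling loop parallelizes trivially, which is the property the abstract highlights.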

  10. Could the clinical interpretability of subgroups detected using clustering methods be improved by using a novel two-stage approach?

    PubMed

    Kent, Peter; Stochkendahl, Mette Jensen; Christensen, Henrik Wulff; Kongsted, Alice

    2015-01-01

    Recognition of homogeneous subgroups of patients can usefully improve prediction of their outcomes and the targeting of treatment. There are a number of research approaches that have been used to recognise homogeneity in such subgroups and to test their implications. One approach is to use statistical clustering techniques, such as Cluster Analysis or Latent Class Analysis, to detect latent relationships between patient characteristics. Influential patient characteristics can come from diverse domains of health, such as pain, activity limitation, physical impairment, social role participation, psychological factors, biomarkers and imaging. However, such 'whole person' research may result in data-driven subgroups that are complex, difficult to interpret and challenging to recognise clinically. This paper describes a novel approach to applying statistical clustering techniques that may improve the clinical interpretability of derived subgroups and reduce sample size requirements. This approach involves clustering in two sequential stages. The first stage involves clustering within health domains and therefore requires creating as many clustering models as there are health domains in the available data. This first stage produces scoring patterns within each domain. The second stage involves clustering using the scoring patterns from each health domain (from the first stage) to identify subgroups across all domains. We illustrate this using chest pain data from the baseline presentation of 580 patients. The new two-stage clustering resulted in two subgroups that approximated the classic textbook descriptions of musculoskeletal chest pain and atypical angina chest pain. The traditional single-stage clustering resulted in five clusters that were also clinically recognisable but displayed less distinct differences. In this paper, a new approach to using clustering techniques to identify clinically useful subgroups of patients is suggested. 
Research designs, statistical methods and outcome metrics suitable for testing such subgroups are also described. This approach has potential benefits but requires broad testing, in multiple patient samples, to determine its clinical value. The usefulness of the approach is likely to be context-specific, depending on the characteristics of the available data and the research question being asked of it.
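
A minimal sketch of the two-stage idea, under simplifying assumptions: two hypothetical health domains, a tiny 1-D 2-means standing in for the within-domain clustering models, and a stage 2 that simply groups patients with identical per-domain scoring patterns (a real analysis would cluster the patterns themselves).

```python
from collections import defaultdict

def two_means_1d(xs, iters=20):
    """Tiny 1-D 2-means, a stand-in for the paper's within-domain clustering."""
    c0, c1 = min(xs), max(xs)
    for _ in range(iters):
        labels = [0 if abs(x - c0) <= abs(x - c1) else 1 for x in xs]
        g0 = [x for x, l in zip(xs, labels) if l == 0]
        g1 = [x for x, l in zip(xs, labels) if l == 1]
        c0 = sum(g0) / len(g0) if g0 else c0
        c1 = sum(g1) / len(g1) if g1 else c1
    return labels

# Stage 1: cluster within each health domain (hypothetical scores, 6 patients).
pain  = [1, 2, 1, 9, 8, 9]
psych = [0, 1, 0, 7, 8, 7]
patterns = list(zip(two_means_1d(pain), two_means_1d(psych)))

# Stage 2: group patients by their per-domain scoring patterns.
subgroups = defaultdict(list)
for patient, pattern in enumerate(patterns):
    subgroups[pattern].append(patient)
print(dict(subgroups))   # {(0, 0): [0, 1, 2], (1, 1): [3, 4, 5]}
```

The point of the design is visible even in this toy: stage 2 operates on a handful of domain labels rather than on all raw variables, which is what improves interpretability and reduces sample-size demands.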

  11. Toward cost-efficient sampling methods

    NASA Astrophysics Data System (ADS)

    Luo, Peng; Li, Yongli; Wu, Chong; Zhang, Guijie

    2015-09-01

    The sampling method has received much attention in the field of complex networks in general and statistical physics in particular. This paper proposes two new sampling methods based on the idea that a small fraction of vertices with high node degree can carry most of the structural information of a complex network. The two proposed sampling methods are efficient at sampling high-degree nodes, so they remain useful even when the sampling rate is low; that is, they are cost-efficient. The first new sampling method is developed on the basis of the widely used stratified random sampling (SRS) method, and the second improves the well-known snowball sampling (SBS) method. To demonstrate the validity and accuracy of the two new sampling methods, we compare them with existing sampling methods on three commonly used simulated networks (scale-free, random, and small-world) and on two real networks. The experimental results illustrate that the two proposed sampling methods perform much better than the existing sampling methods in recovering the true network structure characteristics reflected by the clustering coefficient, Bonacich centrality and average path length, especially when the sampling rate is low.
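
The degree-biased principle behind the improved snowball sampler can be sketched as follows. This illustrates the idea (always expand the highest-degree frontier node first) rather than the authors' exact algorithm; the small star-like graph is made up.

```python
import heapq

def degree_biased_snowball(adj, seed, budget):
    """Snowball sampling that always expands the highest-degree frontier node,
    so high-degree (structurally informative) nodes enter the sample early."""
    sampled = []
    frontier = [(-len(adj[seed]), seed)]   # max-heap via negated degree
    seen = {seed}
    while frontier and len(sampled) < budget:
        _, node = heapq.heappop(frontier)
        sampled.append(node)
        for nb in adj[node]:
            if nb not in seen:
                seen.add(nb)
                heapq.heappush(frontier, (-len(adj[nb]), nb))
    return sampled

# Hypothetical graph: node 0 is a hub, node 3 bridges to node 4.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0, 4}, 4: {3}}
print(degree_biased_snowball(star, 1, 3))   # [1, 0, 3]: hub found after one hop
```

Even from a peripheral seed, the hub is sampled immediately after the first expansion, which is why such samplers preserve clustering coefficient and path-length statistics well at low sampling rates.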

  12. Enhancing local health department disaster response capacity with rapid community needs assessments: validation of a computerized program for binary attribute cluster sampling.

    PubMed

    Groenewold, Matthew R

    2006-01-01

    Local health departments are among the first agencies to respond to disasters or other mass emergencies. However, they often lack the ability to handle large-scale events. Plans including locally developed and deployed tools may enhance local response. Simplified cluster sampling methods can be useful in assessing community needs after a sudden-onset, short duration event. Using an adaptation of the methodology used by the World Health Organization Expanded Programme on Immunization (EPI), a Microsoft Access-based application for two-stage cluster sampling of residential addresses in Louisville/Jefferson County Metro, Kentucky was developed. The sampling frame was derived from geographically referenced data on residential addresses and political districts available through the Louisville/Jefferson County Information Consortium (LOJIC). The program randomly selected 30 clusters, defined as election precincts, from within the area of interest, and then, randomly selected 10 residential addresses from each cluster. The program, called the Rapid Assessment Tools Package (RATP), was tested in terms of accuracy and precision using data on a dichotomous characteristic of residential addresses available from the local tax assessor database. A series of 30 samples were produced and analyzed with respect to their precision and accuracy in estimating the prevalence of the study attribute. Point estimates with 95% confidence intervals were calculated by determining the proportion of the study attribute values in each of the samples and compared with the population proportion. To estimate the design effect, corresponding simple random samples of 300 addresses were taken after each of the 30 cluster samples. The sample proportion fell within +/-10 absolute percentage points of the true proportion in 80% of the samples. In 93.3% of the samples, the point estimate fell within +/-12.5%, and 96.7% fell within +/-15%. 
All of the point estimates fell within +/-20% of the true proportion. Estimates of the design effect ranged from 0.926 to 1.436 (mean = 1.157, median = 1.170) for the 30 samples. Although prospective evaluation of its performance in field trials or a real emergency is required to confirm its utility, this study suggests that the RATP, a locally designed and deployed tool, may provide population-based estimates of community needs or the extent of event-related consequences that are precise enough to serve as the basis for the initial post-event decisions regarding relief efforts.
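
The two-stage selection step (30 clusters, then 10 residential addresses per cluster) can be sketched in a few lines. The RATP itself is a Microsoft Access application; this is an illustrative re-implementation with a hypothetical sampling frame, using simple random sampling at both stages.

```python
import random

def two_stage_sample(frame, n_clusters=30, n_addresses=10, rng=None):
    """Stage 1: simple random sample of clusters (precincts);
    stage 2: simple random sample of addresses within each chosen cluster."""
    rng = rng or random.Random()
    chosen = rng.sample(sorted(frame), n_clusters)
    return {p: rng.sample(sorted(frame[p]), n_addresses) for p in chosen}

# Hypothetical frame: 100 precincts with 50 residential addresses each.
frame = {p: ["addr-%d-%d" % (p, i) for i in range(50)] for p in range(100)}
sample = two_stage_sample(frame, rng=random.Random(1))
print(len(sample), sum(len(v) for v in sample.values()))   # 30 clusters, 300 addresses
```

The prevalence estimate is then the proportion of sampled addresses with the attribute, and the design effect reported above is the ratio of its variance to that of a simple random sample of the same size.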

  13. A modified cluster-sampling method for post-disaster rapid assessment of needs.

    PubMed Central

    Malilay, J.; Flanders, W. D.; Brogan, D.

    1996-01-01

    The cluster-sampling method can be used to conduct rapid assessment of health and other needs in communities affected by natural disasters. It is modelled on WHO's Expanded Programme on Immunization method of estimating immunization coverage, but has been modified to provide (1) estimates of the population remaining in an area, and (2) estimates of the number of people in the post-disaster area with specific needs. This approach differs from that used previously in other disasters where rapid needs assessments only estimated the proportion of the population with specific needs. We propose a modified n x k survey design to estimate the remaining population, severity of damage, the proportion and number of people with specific needs, the number of damaged or destroyed and remaining housing units, and the changes in these estimates over a period of time as part of the survey. PMID:8823962

  14. Free energy reconstruction from steered dynamics without post-processing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Athenes, Manuel, E-mail: Manuel.Athenes@cea.f; Condensed Matter and Materials Division, Physics and Life Sciences Directorate, LLNL, Livermore, CA 94551; Marinica, Mihai-Cosmin

    2010-09-20

    Various methods achieving importance sampling in ensembles of nonequilibrium trajectories enable one to estimate free energy differences and, by maximum-likelihood post-processing, to reconstruct free energy landscapes. Here, based on Bayes theorem, we propose a more direct method in which a posterior likelihood function is used both to construct the steered dynamics and to infer the contribution to equilibrium of all the sampled states. The method is implemented with two steering schedules. First, using non-autonomous steering, we calculate the migration barrier of the vacancy in Fe-α. Second, using an autonomous scheduling related to metadynamics and equivalent to temperature-accelerated molecular dynamics, we accurately reconstruct the two-dimensional free energy landscape of the 38-atom Lennard-Jones cluster as a function of an orientational bond-order parameter and energy, down to the solid-solid structural transition temperature of the cluster and without maximum-likelihood post-processing.

  15. Permutation Tests of Hierarchical Cluster Analyses of Carrion Communities and Their Potential Use in Forensic Entomology.

    PubMed

    van der Ham, Joris L

    2016-05-19

    Forensic entomologists can use carrion communities' ecological succession data to estimate the postmortem interval (PMI). Permutation tests of hierarchical cluster analyses of these data provide a conceptual method to estimate part of the PMI, the post-colonization interval (post-CI). This multivariate approach produces a baseline of statistically distinct clusters that reflect changes in the carrion community composition during the decomposition process. Carrion community samples of unknown post-CIs are compared with these baseline clusters to estimate the post-CI. In this short communication, I use data from previously published studies to demonstrate the conceptual feasibility of this multivariate approach. Analyses of these data produce series of significantly distinct clusters, which represent carrion communities during 1- to 20-day periods of the decomposition process. For 33 carrion community samples, collected over an 11-day period, this approach correctly estimated the post-CI within an average range of 3.1 days. © The Authors 2016. Published by Oxford University Press on behalf of Entomological Society of America. All rights reserved.

  16. Estimation of Rank Correlation for Clustered Data

    PubMed Central

    Rosner, Bernard; Glynn, Robert

    2017-01-01

    It is well known that the sample correlation coefficient (Rxy) is the maximum likelihood estimator (MLE) of the Pearson correlation (ρxy) for i.i.d. bivariate normal data. However, this is not true for ophthalmologic data where X (e.g., visual acuity) and Y (e.g., visual field) are available for each eye and there is positive intraclass correlation for both X and Y in fellow eyes. In this paper, we provide a regression-based approach for obtaining the MLE of ρxy for clustered data, which can be implemented using standard mixed effects model software. This method is also extended to allow for estimation of partial correlation by controlling both X and Y for a vector U of other covariates. In addition, these methods can be extended to allow for estimation of rank correlation for clustered data by (a) converting ranks of both X and Y to the probit scale, (b) estimating the Pearson correlation between probit scores for X and Y, and (c) using the relationship between Pearson and rank correlation for bivariate normally distributed data. The validity of the methods in finite-sized samples is supported by simulation studies. Finally, two examples from ophthalmology and analgesic abuse are used to illustrate the methods. PMID:28399615
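
Steps (a) through (c) of the rank-correlation recipe can be sketched directly; this covers the i.i.d. version only (the clustered case additionally requires the mixed-effects estimation described in the paper), assumes no ties, and uses rank/(n+1) as the plotting position.

```python
from math import asin, pi, sqrt
from statistics import NormalDist, mean

def probit_scores(xs):
    """(a) Convert values to ranks, then ranks to probits via the inverse
    normal CDF of rank/(n+1)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0] * len(xs)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    nd = NormalDist()
    return [nd.inv_cdf(r / (len(xs) + 1)) for r in ranks]

def pearson(xs, ys):
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def rank_correlation(xs, ys):
    """(b) Pearson correlation of the probit scores, (c) mapped to rank
    correlation via the bivariate-normal identity rho_s = (6/pi)*asin(rho/2)."""
    rho = pearson(probit_scores(xs), probit_scores(ys))
    return (6 / pi) * asin(rho / 2)

x = [0.1, 0.4, 0.2, 0.9, 0.7]
y = [1.0, 2.1, 1.5, 3.9, 3.0]   # hypothetical data, monotone in x
print(rank_correlation(x, y))   # approximately 1.0: perfectly concordant ranks
```

In the clustered setting, step (b) is replaced by the regression-based MLE of the Pearson correlation between fellow-eye probit scores, but the probit conversion and the arcsine mapping are unchanged.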

  17. Authentication of monofloral Yemeni Sidr honey using ultraviolet spectroscopy and chemometric analysis.

    PubMed

    Roshan, Abdul-Rahman A; Gad, Haidy A; El-Ahmady, Sherweit H; Khanbash, Mohamed S; Abou-Shoer, Mohamed I; Al-Azizi, Mohamed M

    2013-08-14

    This work describes a simple model developed for the authentication of monofloral Yemeni Sidr honey using UV spectroscopy together with chemometric techniques of hierarchical cluster analysis (HCA), principal component analysis (PCA), and soft independent modeling of class analogy (SIMCA). The model was constructed using 13 genuine Sidr honey samples and challenged with 25 honey samples of different botanical origins. HCA and PCA were successfully able to present a preliminary clustering pattern to segregate the genuine Sidr samples from the lower priced local polyfloral and non-Sidr samples. The SIMCA model presented a clear demarcation of the samples and was used to identify genuine Sidr honey samples as well as detect admixture with lower priced polyfloral honey by detection limits >10%. The constructed model presents a simple and efficient method of analysis and may serve as a basis for the authentication of other honey types worldwide.

  18. A field test of three LQAS designs to assess the prevalence of acute malnutrition.

    PubMed

    Deitchler, Megan; Valadez, Joseph J; Egge, Kari; Fernandez, Soledad; Hennigan, Mary

    2007-08-01

    The conventional method for assessing the prevalence of Global Acute Malnutrition (GAM) in emergency settings is the 30 x 30 cluster-survey. This study describes alternative approaches: three Lot Quality Assurance Sampling (LQAS) designs to assess GAM. The LQAS designs were field-tested and their results compared with those from a 30 x 30 cluster-survey. Computer simulations confirmed that small clusters instead of a simple random sample could be used for LQAS assessments of GAM. Three LQAS designs were developed (33 x 6, 67 x 3, Sequential design) to assess GAM thresholds of 10, 15 and 20%. The designs were field-tested simultaneously with a 30 x 30 cluster-survey in Siraro, Ethiopia during June 2003. Using a nested study design, anthropometric, morbidity and vaccination data were collected on all children 6-59 months in sampled households. Hypothesis tests about GAM thresholds were conducted for each LQAS design. Point estimates were obtained for the 30 x 30 cluster-survey and the 33 x 6 and 67 x 3 LQAS designs. Hypothesis tests showed GAM as <10% for the 33 x 6 design and GAM as ≥10% for the 67 x 3 and Sequential designs. Point estimates for the 33 x 6 and 67 x 3 designs were similar to those of the 30 x 30 cluster-survey for GAM (6.7%, CI = 3.2-10.2%; 8.2%, CI = 4.3-12.1%; 7.4%, CI = 4.8-9.9%) and all other indicators. The CIs for the LQAS designs were only slightly wider than the CIs for the 30 x 30 cluster-survey; yet the LQAS designs required substantially less time to administer. The LQAS designs provide statistically appropriate alternatives to the more time-consuming 30 x 30 cluster-survey. However, additional field-testing is needed using independent samples rather than a nested study design.
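
The hypothesis-test logic of an LQAS decision rule can be sketched with a plain binomial model. This ignores the clustering adjustment the field designs incorporate, and the sample size, threshold and error level below are illustrative choices, not the study's exact parameters.

```python
from math import comb

def lqas_decision_rule(n, p_threshold, alpha=0.10):
    """Largest decision rule d such that, if true prevalence >= p_threshold,
    the chance of observing <= d cases (and wrongly classifying prevalence
    as below the threshold) stays <= alpha. Plain binomial, no clustering."""
    def binom_cdf(d, n, p):
        return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(d + 1))
    for d in range(n + 1):
        if binom_cdf(d, n, p_threshold) > alpha:
            return max(d - 1, 0)
    return n

# A 33 x 6 design gives n = 198 children; test against a 10% GAM threshold.
print(lqas_decision_rule(198, 0.10))
```

Classification is then immediate in the field: count cases among the n children and compare with d, which is what makes LQAS so much faster to administer than a full 30 x 30 survey.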

  19. Study on text mining algorithm for ultrasound examination of chronic liver diseases based on spectral clustering

    NASA Astrophysics Data System (ADS)

    Chang, Bingguo; Chen, Xiaofei

    2018-05-01

    Ultrasonography is an important examination for the diagnosis of chronic liver disease. The doctor reports liver indicators and suggests the patient's condition according to the description in the ultrasound report. With the rapid increase in the volume of ultrasound report data, the workload for professional physicians who manually interpret ultrasound results increases significantly. In this paper, we apply the spectral clustering method to the descriptions in ultrasound reports and automatically generate the ultrasound diagnosis by machine learning. 110 ultrasound examination reports of chronic liver disease were selected as test samples in this experiment, and the results of spectral clustering were validated and compared with those of the k-means clustering algorithm. The results show that the accuracy of spectral clustering is 92.73%, higher than that of the k-means algorithm, providing a powerful ultrasound-assisted diagnosis for patients with chronic liver disease.

  20. [Optimization of cluster analysis based on drug resistance profiles of MRSA isolates].

    PubMed

    Tani, Hiroya; Kishi, Takahiko; Gotoh, Minehiro; Yamagishi, Yuka; Mikamo, Hiroshige

    2015-12-01

    We examined 402 methicillin-resistant Staphylococcus aureus (MRSA) strains isolated from clinical specimens in our hospital between November 19, 2010 and December 27, 2011 to evaluate the similarity between cluster analysis of drug susceptibility tests and pulsed-field gel electrophoresis (PFGE). The results showed that the 402 strains tested were classified into 27 PFGE patterns (151 subtypes of patterns). Cluster analyses of drug susceptibility tests, with the cut-off distance chosen to yield a similar classification capability, showed favorable results. When the MIC method was used, in which minimum inhibitory concentration (MIC) values enter the analysis directly, the level of agreement with PFGE was 74.2% when 15 drugs were tested; the Unweighted Pair Group Method with Arithmetic mean (UPGMA) was effective with a cut-off distance of 16. Using the SIR method, in which susceptible (S), intermediate (I), and resistant (R) were coded as 0, 2, and 3, respectively, according to the Clinical and Laboratory Standards Institute (CLSI) criteria, the level of agreement with PFGE was 75.9% when 17 drugs were tested, the clustering method was the UPGMA, and the cut-off distance was 3.6. In addition, to assess the reproducibility of the results, 10 strains were randomly sampled from the overall set and subjected to cluster analysis; this was repeated 100 times under the same conditions. The results indicated good reproducibility, with the level of agreement with PFGE showing a mean of 82.0%, standard deviation of 12.1%, and mode of 90.0% for the MIC method and a mean of 80.0%, standard deviation of 13.4%, and mode of 90.0% for the SIR method. In summary, cluster analysis of drug susceptibility tests is useful for the epidemiological analysis of MRSA.

  1. Statistical Analysis of Large Scale Structure by the Discrete Wavelet Transform

    NASA Astrophysics Data System (ADS)

    Pando, Jesus

    1997-10-01

    The discrete wavelet transform (DWT) is developed as a general statistical tool for the study of large scale structures (LSS) in astrophysics. The DWT is used in all aspects of structure identification including cluster analysis, spectrum and two-point correlation studies, scale-scale correlation analysis and to measure deviations from Gaussian behavior. The techniques developed are demonstrated on 'academic' signals, on simulated models of the Lymanα (Lyα) forests, and on observational data of the Lyα forests. This technique can detect clustering in the Ly-α clouds where traditional techniques such as the two-point correlation function have failed. The position and strength of these clusters in both real and simulated data is determined and it is shown that clusters exist on scales as large as at least 20 h-1 Mpc at significance levels of 2-4 σ. Furthermore, it is found that the strength distribution of the clusters can be used to distinguish between real data and simulated samples even where other traditional methods have failed to detect differences. Second, a method for measuring the power spectrum of a density field using the DWT is developed. All common features determined by the usual Fourier power spectrum can be calculated by the DWT. These features, such as the index of a power law or typical scales, can be detected even when the samples are geometrically complex, the samples are incomplete, or the mean density on larger scales is not known (the infrared uncertainty). Using this method the spectra of Ly-α forests in both simulated and real samples is calculated. Third, a method for measuring hierarchical clustering is introduced. Because hierarchical evolution is characterized by a set of rules of how larger dark matter halos are formed by the merging of smaller halos, scale-scale correlations of the density field should be one of the most sensitive quantities in determining the merging history. 
We show that these correlations can be completely determined by the correlations between discrete wavelet coefficients on adjacent scales and at nearly the same spatial position, C^{2,2}_{j,j+1}. Scale-scale correlations on two samples of the QSO Ly-α forests absorption spectra are computed. Lastly, higher order statistics are developed to detect deviations from Gaussian behavior. These higher order statistics are necessary to fully characterize the Ly-α forests because the usual 2nd order statistics, such as the two-point correlation function or power spectrum, give inconclusive results. It is shown how this technique takes advantage of the locality of the DWT to circumvent the central limit theorem. A non-Gaussian spectrum is defined and this spectrum reveals not only the magnitude, but the scales of non-Gaussianity. When applied to simulated and observational samples of the Ly-α clouds, it is found that different popular models of structure formation have different spectra while two independent observational data sets have the same spectra. Moreover, the non-Gaussian spectra of real data sets are significantly different from the spectra of various possible random samples. (Abstract shortened by UMI.)
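
The DWT machinery this work relies on can be illustrated with the simplest wavelet, the Haar basis; the thesis uses more general wavelets, and the 8-point "density field" below is made up.

```python
from math import sqrt

def haar_dwt(signal):
    """Full Haar decomposition: per-scale detail (wavelet) coefficients,
    finest scale first, plus the final smooth (mean) coefficient.
    Requires a power-of-two signal length."""
    coeffs, approx = [], list(signal)
    while len(approx) > 1:
        detail = [(approx[i] - approx[i + 1]) / sqrt(2) for i in range(0, len(approx), 2)]
        approx = [(approx[i] + approx[i + 1]) / sqrt(2) for i in range(0, len(approx), 2)]
        coeffs.append(detail)
    return coeffs, approx[0]

# Hypothetical 1-D density field sampled at 8 points.
details, smooth = haar_dwt([1.0, 3.0, 2.0, 2.0, 5.0, 1.0, 2.0, 0.0])
print([len(d) for d in details])   # [4, 2, 1]: coefficients at successive scales
```

Each scale's coefficients are localized in position, which is what lets the method handle incomplete or geometrically complex samples; the scale-scale correlations described above are then correlations between squared coefficients at adjacent scales j and j+1 at nearly the same position.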

  2. Identifying and Assessing Interesting Subgroups in a Heterogeneous Population

    PubMed Central

    Lee, Woojoo; Alexeyenko, Andrey; Pernemalm, Maria; Guegan, Justine; Dessen, Philippe; Lazar, Vladimir; Lehtiö, Janne; Pawitan, Yudi

    2015-01-01

    Biological heterogeneity is common in many diseases and it is often the reason for therapeutic failures. Thus, there is great interest in classifying a disease into subtypes that have clinical significance in terms of prognosis or therapy response. One of the most popular methods to uncover unrecognized subtypes is cluster analysis. However, classical clustering methods such as k-means clustering or hierarchical clustering are not guaranteed to produce clinically interesting subtypes. This could be because the main statistical variability—the basis of cluster generation—is dominated by genes not associated with the clinical phenotype of interest. Furthermore, a strong prognostic factor might be relevant for a certain subgroup but not for the whole population; thus an analysis of the whole sample may not reveal this prognostic factor. To address these problems we investigate methods to identify and assess clinically interesting subgroups in a heterogeneous population. The identification step uses a clustering algorithm and to assess significance we use a false discovery rate- (FDR-) based measure. Under the heterogeneity condition the standard FDR estimate is shown to overestimate the true FDR value, but this is remedied by an improved FDR estimation procedure. As illustrations, two real data examples from gene expression studies of lung cancer are provided. PMID:26339613
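
For orientation, the baseline the paper's FDR measure builds on is the standard Benjamini-Hochberg step-up procedure; the heterogeneity-corrected estimator the authors propose is not reproduced here, and the p-values below are made up.

```python
def bh_fdr(pvalues, q=0.05):
    """Benjamini-Hochberg step-up rule: find the largest rank r with
    p_(r) <= q*r/m and declare the r smallest p-values significant."""
    m = len(pvalues)
    indexed = sorted(enumerate(pvalues), key=lambda t: t[1])
    cutoff = 0
    for rank, (_, p) in enumerate(indexed, start=1):
        if p <= q * rank / m:
            cutoff = rank
    return sorted(i for i, _ in indexed[:cutoff])   # indices declared significant

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.22, 0.59, 0.74, 0.91]
print(bh_fdr(pvals))   # [0, 1]
```

Under heterogeneity the standard estimate behind this rule overestimates the true FDR, which is the problem the paper's improved estimation procedure addresses.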

  3. Weighing the Giants - I. Weak-lensing masses for 51 massive galaxy clusters: project overview, data analysis methods and cluster images

    NASA Astrophysics Data System (ADS)

    von der Linden, Anja; Allen, Mark T.; Applegate, Douglas E.; Kelly, Patrick L.; Allen, Steven W.; Ebeling, Harald; Burchat, Patricia R.; Burke, David L.; Donovan, David; Morris, R. Glenn; Blandford, Roger; Erben, Thomas; Mantz, Adam

    2014-03-01

    This is the first in a series of papers in which we measure accurate weak-lensing masses for 51 of the most X-ray luminous galaxy clusters known at redshifts 0.15 ≲ zCl ≲ 0.7, in order to calibrate X-ray and other mass proxies for cosmological cluster experiments. The primary aim is to improve the absolute mass calibration of cluster observables, currently the dominant systematic uncertainty for cluster count experiments. Key elements of this work are the rigorous quantification of systematic uncertainties, high-quality data reduction and photometric calibration, and the `blind' nature of the analysis to avoid confirmation bias. Our target clusters are drawn from X-ray catalogues based on the ROSAT All-Sky Survey, and provide a versatile calibration sample for many aspects of cluster cosmology. We have acquired wide-field, high-quality imaging using the Subaru Telescope and Canada-France-Hawaii Telescope for all 51 clusters, in at least three bands per cluster. For a subset of 27 clusters, we have data in at least five bands, allowing accurate photometric redshift estimates of lensed galaxies. In this paper, we describe the cluster sample and observations, and detail the processing of the SuprimeCam data to yield high-quality images suitable for robust weak-lensing shape measurements and precision photometry. For each cluster, we present wide-field three-colour optical images and maps of the weak-lensing mass distribution, the optical light distribution and the X-ray emission. These provide insights into the large-scale structure in which the clusters are embedded. We measure the offsets between X-ray flux centroids and the brightest cluster galaxies in the clusters, finding these to be small in general, with a median of 20 kpc. 
For offsets ≲100 kpc, weak-lensing mass measurements centred on the brightest cluster galaxies agree well with values determined relative to the X-ray centroids; miscentring is therefore not a significant source of systematic uncertainty for our weak-lensing mass measurements. In accompanying papers, we discuss the key aspects of our photometric calibration and photometric redshift measurements (Kelly et al.), and measure cluster masses using two methods, including a novel Bayesian weak-lensing approach that makes full use of the photometric redshift probability distributions for individual background galaxies (Applegate et al.). In subsequent papers, we will incorporate these weak-lensing mass measurements into a self-consistent framework to simultaneously determine cluster scaling relations and cosmological parameters.

  4. Application of cluster analysis to geochemical compositional data for identifying ore-related geochemical anomalies

    NASA Astrophysics Data System (ADS)

    Zhou, Shuguang; Zhou, Kefa; Wang, Jinlin; Yang, Genfang; Wang, Shanshan

    2017-12-01

    Cluster analysis is a well-known technique that is used to analyze various types of data. In this study, cluster analysis is applied to geochemical data that describe 1444 stream sediment samples collected in northwestern Xinjiang with a sample spacing of approximately 2 km. Three algorithms (the hierarchical, k-means, and fuzzy c-means algorithms) and six data transformation methods (the z-score standardization, ZST; the logarithmic transformation, LT; the additive log-ratio transformation, ALT; the centered log-ratio transformation, CLT; the isometric log-ratio transformation, ILT; and no transformation, NT) are compared in terms of their effects on the cluster analysis of the geochemical compositional data. The study shows that, on the one hand, the ZST does not affect the results of column- or variable-based (R-type) cluster analysis, whereas the other methods, including the LT, the ALT, and the CLT, have substantial effects on the results. On the other hand, the results of the row- or observation-based (Q-type) cluster analysis obtained from the geochemical data after applying NT and the ZST are relatively poor. However, we derive some improved results from the geochemical data after applying the CLT, the ILT, the LT, and the ALT. Moreover, the k-means and fuzzy c-means clustering algorithms are more reliable than the hierarchical algorithm when they are used to cluster the geochemical data. We apply cluster analysis to the geochemical data to explore for Au deposits within the study area, and we obtain a good correlation between the results retrieved by combining the CLT or the ILT with the k-means or fuzzy c-means algorithms and the potential zones of Au mineralization. Therefore, we suggest that the combination of the CLT or the ILT with the k-means or fuzzy c-means algorithms is an effective tool to identify potential zones of mineralization from geochemical data.
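
Of the transformations compared, the centered log-ratio (CLT in the abstract's notation) is simple enough to sketch: each part is divided by the geometric mean of the composition before taking logs. The 3-part composition below is invented for illustration.

```python
from math import log, exp

def clr(composition):
    """Centered log-ratio transform of one composition (all parts > 0):
    clr(x)_i = log(x_i / g(x)), with g(x) the geometric mean of the parts."""
    g = exp(sum(log(x) for x in composition) / len(composition))
    return [log(x / g) for x in composition]

# Hypothetical 3-part geochemical composition summing to 1.
z = clr([0.7, 0.2, 0.1])
print(z)   # CLR scores sum to zero by construction
```

Removing the constant-sum constraint this way is what makes the subsequent Q-type clustering behave sensibly on compositional geochemical data, as the study's comparison of transformations shows.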

  5. Unsupervised classification of multivariate geostatistical data: Two algorithms

    NASA Astrophysics Data System (ADS)

    Romary, Thomas; Ors, Fabien; Rivoirard, Jacques; Deraisme, Jacques

    2015-12-01

With the increasing development of remote sensing platforms and the evolution of sampling facilities in the mining and oil industries, spatial datasets are becoming increasingly large, describe a growing number of variables, and cover wider and wider areas. It is therefore often necessary to split the domain of study to account for radically different behaviors of the natural phenomenon over the domain and to simplify the subsequent modeling step. The definition of these areas can be seen as a problem of unsupervised classification, or clustering, where we try to divide the domain into homogeneous subdomains with respect to the values taken by the variables at hand. The application of classical clustering methods, designed for independent observations, does not ensure the spatial coherence of the resulting classes. Image segmentation methods, based on e.g. Markov random fields, are not adapted to irregularly sampled data. Other existing approaches, based on mixtures of Gaussian random functions estimated via the expectation-maximization algorithm, are limited to moderate sample sizes and a small number of variables. In this work, we propose two algorithms that adapt classical clustering algorithms to multivariate geostatistical data. Both algorithms are model-free and can handle large volumes of multivariate, irregularly spaced data. The first proceeds by agglomerative hierarchical clustering, with spatial coherence ensured by a proximity condition imposed for two clusters to merge. This proximity condition relies on a graph organizing the data in coordinate space. The hierarchical algorithm can then be seen as a graph-partitioning algorithm. Following this interpretation, a spatial version of the spectral clustering algorithm is also proposed. The performance of both algorithms is assessed on toy examples and a mining dataset.
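The proximity condition described above (two clusters may merge only if they are adjacent in a spatial graph) can be sketched as a constrained agglomerative loop. This toy implementation uses a radius-based neighbor graph and merges by closest cluster means; all parameters and data are invented for illustration.

```python
import numpy as np

def constrained_agglomerative(coords, values, n_clusters, radius):
    """Merge closest-mean clusters, but only if they are spatial neighbors."""
    n = len(coords)
    d = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    adj = d <= radius                      # radius-based spatial neighbor graph
    labels = np.arange(n)
    while len(np.unique(labels)) > n_clusters:
        ids = np.unique(labels)
        best_cost, best_pair = np.inf, None
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                ma, mb = labels == a, labels == b
                if not adj[np.ix_(ma, mb)].any():
                    continue               # proximity condition: skip non-neighbors
                cost = np.linalg.norm(values[ma].mean(0) - values[mb].mean(0))
                if cost < best_cost:
                    best_cost, best_pair = cost, (a, b)
        if best_pair is None:
            break                          # no admissible merge left
        labels[labels == best_pair[1]] = best_pair[0]
    return labels

# toy 1-D transect with two regimes in the measured variable
coords = np.linspace(0.0, 1.0, 20)[:, None]
values = np.concatenate([np.zeros(10), np.ones(10)])[:, None]
labels = constrained_agglomerative(coords, values, n_clusters=2, radius=0.06)
```

Because merges are restricted to spatially adjacent clusters, the resulting classes are contiguous along the transect, which is exactly what unconstrained clustering cannot guarantee.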

  6. Characterization of Omega-WINGS galaxy clusters. I. Stellar light and mass profiles

    NASA Astrophysics Data System (ADS)

    Cariddi, S.; D'Onofrio, M.; Fasano, G.; Poggianti, B. M.; Moretti, A.; Gullieuszik, M.; Bettoni, D.; Sciarratta, M.

    2018-02-01

Context. Galaxy clusters are the largest virialized structures in the observable Universe, and knowledge of their properties provides much useful astrophysical and cosmological information. Aims: Our aim is to derive the luminosity and stellar mass profiles of the nearby galaxy clusters of the Omega-WINGS survey and to study the main scaling relations valid for such systems. Methods: We merged data from the WINGS and Omega-WINGS databases, sorted the sources according to the distance from the brightest cluster galaxy (BCG), and calculated the integrated luminosity profiles in the B and V bands, taking into account extinction, photometric and spatial completeness, K correction, and background contribution. Then, by exploiting the spectroscopic sample, we derived the stellar mass profiles of the clusters. Results: We obtained the luminosity profiles of 46 galaxy clusters, reaching r200 in 30 cases, and the stellar mass profiles of 42 of our objects. We successfully fitted all the integrated luminosity growth profiles with one or two embedded Sérsic components, deriving the main cluster parameters. Finally, we checked the main scaling relations among the cluster parameters in comparison with those obtained for a selected sample of early-type galaxies (ETGs) of the same clusters. Conclusions: We found that nearby galaxy clusters, like ETGs, are non-homologous structures and exhibit a color-magnitude (CM) red-sequence relation very similar to that observed for galaxies in clusters. These properties are not expected in current cluster formation scenarios. In particular, the existence of a CM relation for clusters, shown here for the first time, suggests that baryonic structures grow and evolve in a similar way at all scales.
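A growth-curve fit of the kind described in the Results can be sketched with a single Sérsic component, using the common approximation b_n ≈ 2n − 1/3 for the integrated profile; the profile values below are synthetic and the fit is illustrative, not the authors' pipeline (requires SciPy).

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.special import gammainc  # regularized lower incomplete gamma

def sersic_growth(r, L_tot, r_e, n):
    """Integrated (growth-curve) Sérsic profile L(<r)."""
    b = 2.0 * n - 1.0 / 3.0            # approximation, reasonable for n >~ 0.5
    return L_tot * gammainc(2.0 * n, b * (r / r_e) ** (1.0 / n))

# synthetic growth curve with L_tot = 1, r_e = 1, n = 1, plus small noise
r = np.linspace(0.05, 5.0, 40)
rng = np.random.default_rng(0)
L_obs = sersic_growth(r, 1.0, 1.0, 1.0) + rng.normal(0.0, 0.005, r.size)
popt, _ = curve_fit(sersic_growth, r, L_obs, p0=[1.0, 1.0, 1.0])
```

The fitted parameters (total luminosity, effective radius, Sérsic index) should recover the input values to within the noise level.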

  7. The panchromatic Hubble Andromeda Treasury. V. Ages and masses of the year 1 stellar clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fouesneau, Morgan; Johnson, L. Clifton; Weisz, Daniel R.

We present ages and masses for 601 star clusters in M31 from the analysis of six-filter integrated light measurements, from near-ultraviolet to near-infrared wavelengths, made as part of the Panchromatic Hubble Andromeda Treasury (PHAT). We derive the ages and masses using a probabilistic technique that accounts for the effects of stochastic sampling of the stellar initial mass function. Tests on synthetic data show that this method, in conjunction with the exquisite sensitivity of the PHAT observations and their broad wavelength baseline, provides robust age and mass recovery for clusters ranging from ~10^2 to 2 × 10^6 M☉. We find that the cluster age distribution is consistent with being uniform over the past 100 Myr, which suggests a weak effect of cluster disruption within M31. The age distribution of older (>100 Myr) clusters falls toward old ages, consistent with a power-law decline of index −1, likely from a combination of fading and disruption of the clusters. We find that the mass distribution of the whole sample can be well described by a single power law with a spectral index of −1.9 ± 0.1 over the range 10^3 to 3 × 10^5 M☉. However, if we subdivide the sample by galactocentric radius, we find that the age distributions remain unchanged while the mass spectral index varies significantly, showing best-fit values between −2.2 and −1.8, with the shallower slope in the highest star formation intensity regions. We explore the robustness of our study to potential systematics and conclude that the cluster mass function may vary with respect to environment.

  8. HIV Transmission Networks in the San Diego–Tijuana Border Region

    PubMed Central

    Mehta, Sanjay R.; Wertheim, Joel O.; Brouwer, Kimberly C.; Wagner, Karla D.; Chaillon, Antoine; Strathdee, Steffanie; Patterson, Thomas L.; Rangel, Maria G.; Vargas, Mlenka; Murrell, Ben; Garfein, Richard; Little, Susan J.; Smith, Davey M.

    2015-01-01

Background HIV sequence data can be used to reconstruct local transmission networks. Along international borders, like the San Diego–Tijuana region, understanding the dynamics of HIV transmission across reported risks, racial/ethnic groups, and geography can help direct effective prevention efforts on both sides of the border. Methods We gathered sociodemographic, geographic, clinical, and viral sequence data from HIV infected individuals participating in ten studies in the San Diego–Tijuana border region. Phylogenetic and network analyses were performed to infer putative relationships between HIV sequences. Correlates of identified clusters were evaluated and spatiotemporal relationships were explored using Bayesian phylogeographic analysis. Findings After quality filtering, 843 HIV sequences with associated demographic data and 263 background sequences from the region were analyzed, and 138 clusters were inferred (2–23 individuals). Overall, the rate of clustering did not differ by ethnicity, residence, or sex, but bisexuals were less likely to cluster than heterosexuals or men who have sex with men (p = 0.043), and individuals identifying as white (p ≤ 0.01) were more likely to cluster than other races. Clustering individuals were also 3.5 years younger than non-clustering individuals (p < 0.001). Although the sampled San Diego and Tijuana epidemics were phylogenetically compartmentalized, five clusters contained individuals residing on both sides of the border. Interpretation This study sampled ~7% of HIV infected individuals in the border region, and although the sampled networks on each side of the border were largely separate, there was evidence of persistent bidirectional cross-border transmissions that linked risk groups, thus highlighting the importance of the border region as a "melting pot" of risk groups. Funding NIH, VA, and Pendleton Foundation. PMID:26629540

  9. Sub-tesla-field magnetization of vibrated magnetic nanoreagents for screening tumor markers

    NASA Astrophysics Data System (ADS)

    Chieh, Jen-Jie; Huang, Kai-Wen; Shi, Jin-Cheng

    2015-02-01

Magnetic nanoreagents (MNRs), consisting of liquid solutions and magnetic nanoparticles (MNPs) coated with bioprobes, have been widely used in biomedical disciplines. For in vitro tests of serum biomarkers, numerous MNR-based magnetic immunoassay methods or schemes have been developed; however, their applications are limited. In this study, a vibrating sample magnetometer (VSM) was used for screening tumor biomarkers based on the same MNRs as those used in other immunoassay methods. The detection mechanism is that the examined tumor biomarkers conjugate with the bioprobes coated on the MNPs to form magnetic clusters. Consequently, the sub-tesla-field magnetization (Msub-T) of MNRs containing magnetic clusters exceeds that of MNRs containing only separate MNPs. For human serum samples, proteins other than the targeted biomarkers also induce the formation of magnetic clusters with increased Msub-T because of weak nonspecific binding. In this study, this interference was suppressed by the vibration condition in the VSM and by the analysis. Based on a reference Msub-T,0 value, defined as the average Msub-T of serum samples from healthy individuals (containing general proteins and few tumor biomarkers), the difference ΔMsub-T between the measured Msub-T and the reference Msub-T,0 reflects the expression of only the target tumor biomarkers in the tested serum samples. Using common MNRs with an alpha-fetoprotein-antibody coating, this study demonstrated that a current VSM can perform clinical screening of hepatocellular carcinoma.

  10. Method for evaluating wind turbine wake effects on wind farm performance

    NASA Technical Reports Server (NTRS)

    Neustadter, H. E.; Spera, D. A.

    1985-01-01

A method of testing the performance of a cluster of wind turbine units and data analysis equations are presented, which together form a simple and direct procedure for determining the reduction in energy output caused by the wake of an upwind turbine. This method appears to solve the problems presented by data scatter and wind variability. Test data from the three-unit Mod-2 wind turbine cluster at Goldendale, Washington, are analyzed to illustrate the application of the proposed method. In this sample case the reduction in energy was found to be about 10 percent when the Mod-2 units were separated by a distance equal to seven rotor diameters and winds were below rated speed.

  11. The luminosity function of star clusters in 20 star-forming galaxies based on Hubble legacy archive photometry

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Whitmore, Bradley C.; Bowers, Ariel S.; Lindsay, Kevin

    2014-04-01

Luminosity functions (LFs) have been determined for star cluster populations in 20 nearby (4-30 Mpc), star-forming galaxies based on Advanced Camera for Surveys source lists generated by the Hubble Legacy Archive (HLA). These cluster catalogs provide one of the largest sets of uniform, automatically generated cluster candidates available in the literature at present. Comparisons with other recently generated cluster catalogs demonstrate that the HLA-generated catalogs are of similar quality, but in general do not go as deep. A typical cluster LF can be approximated by a power law, dN/dL ∝ L^α, with an average value for α of −2.37 and rms scatter of 0.18 when using the F814W ('I') band. A comparison of fitting results based on methods that use binned and unbinned data shows good agreement, although there may be a systematic tendency for the unbinned (maximum likelihood) method to give slightly more negative values of α for galaxies with steeper LFs. We find that galaxies with high rates of star formation (or equivalently, with the brightest or largest numbers of clusters) have a slight tendency toward shallower values of α. In particular, the Antennae galaxy (NGC 4038/39), a merging system with a relatively high star formation rate (SFR), has the second flattest LF in the sample. A tentative correlation may also be present between Hubble type and values of α, in the sense that later-type galaxies (i.e., Sd and Sm) appear to have flatter LFs. Hence, while there do appear to be some weak correlations, the relative similarity in the values of α for a large number of star-forming galaxies suggests that, to first order, the LFs are fairly universal. We examine the bright end of the LFs and find evidence for a downturn, although it only pertains to about 1% of the clusters. Our uniform database results in a small scatter (≈0.4 to 0.5 mag) in the correlation between the magnitude of the brightest cluster (M_brightest) and the log of the number of clusters brighter than M_I = −9 (log N). We also examine the magnitude of the brightest cluster versus log SFR for a sample including both dwarf galaxies and ULIRGs. This shows that the correlation extends over roughly six orders of magnitude, but with scatter that is larger than for our spiral sample, probably because of the high levels of extinction in many of the LIRGs.
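The unbinned (maximum likelihood) fitting mentioned above can be sketched as follows for a pure power-law LF, dN/dL ∝ L^α, above a completeness limit. This is the standard closed-form MLE for a power-law tail (Clauset et al. style), applied here to synthetic draws rather than the HLA catalogs.

```python
import numpy as np

def fit_powerlaw_slope(L, L_min):
    """Closed-form MLE for p(L) ∝ L^alpha (alpha < -1) on L >= L_min."""
    L = np.asarray(L, dtype=float)
    L = L[L >= L_min]
    return -(1.0 + len(L) / np.log(L / L_min).sum())

rng = np.random.default_rng(0)
a = 2.37                                   # generate from p(L) ∝ L^(-a), L_min = 1
u = 1.0 - rng.random(100_000)              # u in (0, 1] avoids log(0) issues
L = u ** (-1.0 / (a - 1.0))                # inverse-CDF power-law draws
alpha_hat = fit_powerlaw_slope(L, 1.0)     # expect a value near -2.37
```

Unlike binned fits, this estimator uses every luminosity directly, which is why it can respond more sensitively to the steep faint end.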

  12. Variation of heavy metals in recent sediments from Piratininga Lagoon (Brazil): interpretation of geochemical data with the aid of multivariate analysis

    NASA Astrophysics Data System (ADS)

    Huang, W.; Campredon, R.; Abrao, J. J.; Bernat, M.; Latouche, C.

    1994-06-01

    In the last decade, the Atlantic coast of south-eastern Brazil has been affected by increasing deforestation and anthropogenic effluents. Sediments in the coastal lagoons have recorded the process of such environmental change. Thirty-seven sediment samples from three cores in Piratininga Lagoon, Rio de Janeiro, were analyzed for their major components and minor element concentrations in order to examine geochemical characteristics and the depositional environment and to investigate the variation of heavy metals of environmental concern. Two multivariate analysis methods, principal component analysis and cluster analysis, were performed on the analytical data set to help visualize the sample clusters and the element associations. On the whole, the sediment samples from each core are similar and the sample clusters corresponding to the three cores are clearly separated, as a result of the different conditions of sedimentation. Some changes in the depositional environment are recognized using the results of multivariate analysis. The enrichment of Pb, Cu, and Zn in the upper parts of cores is in agreement with increasing anthropogenic influx (pollution).
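One of the two multivariate steps used above, principal component analysis, can be sketched via the SVD on a toy "two cores" matrix; the element names and values are invented, and the clustering step is omitted.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Sample scores from a standardized PCA computed via the SVD."""
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each variable
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T             # project onto leading PCs

rng = np.random.default_rng(0)
# invented "element levels" (e.g. Pb, Cu, Zn) for two sediment cores
core_a = rng.normal([10.0, 2.0, 1.0], 0.3, size=(12, 3))
core_b = rng.normal([2.0, 10.0, 1.0], 0.3, size=(12, 3))
scores = pca_scores(np.vstack([core_a, core_b]))
```

In this toy case the first principal component captures the Pb/Cu contrast, so the samples from the two cores separate cleanly in score space, which is the kind of visual grouping the abstract describes.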

  13. Competitive Deep-Belief Networks for Underwater Acoustic Target Recognition

    PubMed Central

    Shen, Sheng; Yao, Xiaohui; Sheng, Meiping; Wang, Chen

    2018-01-01

Underwater acoustic target recognition based on ship-radiated noise is a small-sample-size recognition problem. A competitive deep-belief network is proposed to learn features with more discriminative information from labeled and unlabeled samples. The proposed model consists of four stages: (1) a standard restricted Boltzmann machine is pretrained using a large number of unlabeled data to initialize its parameters; (2) the hidden units are grouped according to categories, which provides an initial clustering model for competitive learning; (3) competitive training and back-propagation algorithms are used to update the parameters to accomplish the clustering task; (4) by applying layer-wise training and supervised fine-tuning, a deep neural network is built to obtain features. Experimental results show that the proposed method achieves a classification accuracy of 90.89%, which is 8.95 percentage points higher than the accuracy obtained by the compared methods. In addition, the highest accuracy of our method is obtained with fewer features than the other methods. PMID:29570642
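Stage (1) of the pipeline above, pretraining a restricted Boltzmann machine, can be sketched with one step of contrastive divergence (CD-1) on toy binary data. This shows the pretraining stage only, not the competitive grouping or fine-tuning stages, and all hyperparameters are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(V, n_hidden, lr=0.1, epochs=200, seed=0):
    """CD-1 pretraining of a binary RBM on data matrix V (samples x visibles)."""
    rng = np.random.default_rng(seed)
    n_vis = V.shape[1]
    W = 0.01 * rng.standard_normal((n_vis, n_hidden))
    a = np.zeros(n_vis)                               # visible biases
    b = np.zeros(n_hidden)                            # hidden biases
    for _ in range(epochs):
        ph = sigmoid(V @ W + b)                       # positive phase
        h = (rng.random(ph.shape) < ph).astype(float) # sample hidden states
        pv = sigmoid(h @ W.T + a)                     # reconstruction
        ph2 = sigmoid(pv @ W + b)                     # negative phase
        W += lr * (V.T @ ph - pv.T @ ph2) / len(V)
        a += lr * (V - pv).mean(0)
        b += lr * (ph - ph2).mean(0)
    return W, a, b

# two toy "classes" of binary patterns standing in for noise spectra
V = np.array([[1, 1, 1, 0, 0, 0]] * 20 + [[0, 0, 0, 1, 1, 1]] * 20, dtype=float)
W, a, b = train_rbm(V, n_hidden=2)
```

In the paper's scheme the hidden units would next be grouped by category before competitive training; here the sketch stops after the unsupervised initialization.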

  14. Occurrence of Radio Minihalos in a Mass-Limited Sample of Galaxy Clusters

    NASA Technical Reports Server (NTRS)

    Giacintucci, Simona; Markevitch, Maxim; Cassano, Rossella; Venturi, Tiziana; Clarke, Tracy E.; Brunetti, Gianfranco

    2017-01-01

We investigate the occurrence of radio minihalos (diffuse radio sources of unknown origin observed in the cores of some galaxy clusters) in a statistical sample of 58 clusters drawn from the Planck Sunyaev-Zeldovich cluster catalog using a mass cut (M_500 > 6 × 10^14 solar masses). We supplement our statistical sample with a similarly sized nonstatistical sample, mostly consisting of clusters in the ACCEPT X-ray catalog with suitable X-ray and radio data, which includes lower-mass clusters. Where necessary (for nine clusters), we reanalyzed the Very Large Array archival radio data to determine whether a minihalo is present. Our total sample includes all 28 currently known and recently discovered radio minihalos, including six candidates. We classify clusters as cool-core or non-cool-core according to the value of the specific entropy floor in the cluster center, rederived or newly derived from the Chandra X-ray density and temperature profiles where necessary (for 27 clusters). Contrary to the common wisdom that minihalos are rare, we find that almost all cool cores (at least 12 out of 15, or 80%) in our complete sample of massive clusters exhibit minihalos. The supplementary sample shows that the occurrence of minihalos may be lower in lower-mass cool-core clusters. No minihalos are found in non-cool cores or "warm cores." These findings will help test theories of the origin of minihalos and provide information on the physical processes and energetics of cluster cores.

  15. Elemental Mixing State of Aerosol Particles Collected in Central Amazonia during GoAmazon2014/15

    DOE PAGES

    Fraund, Matthew; Pham, Don; Bonanno, Daniel; ...

    2017-09-15

Two complementary techniques, Scanning Transmission X-ray Microscopy/Near-Edge X-ray Absorption Fine Structure spectroscopy (STXM/NEXAFS) and Scanning Electron Microscopy/Energy Dispersive X-ray spectroscopy (SEM/EDX), have been quantitatively combined to characterize individual atmospheric particles. This pair of techniques was applied to particle samples from three sampling sites (ATTO, ZF2, and T3) in the Amazon basin as part of the Observations and Modeling of the Green Ocean Amazon (GoAmazon2014/5) field campaign during the dry season of 2014. The combined data were subjected to k-means clustering using mass fractions of the following elements: C, N, O, Na, Mg, P, S, Cl, K, Ca, Mn, Fe, Ni, and Zn. Cluster analysis identified 12 particle types across the different sampling sites and particle sizes. Samples from the remote Amazon Tall Tower Observatory (ATTO, also T0a) exhibited less cluster variety and fewer anthropogenic clusters than samples collected at the sites nearer to the Manaus metropolitan region, ZF2 (also T0t) and T3. Samples from the ZF2 site contained aged/anthropogenic clusters not readily explained by transport from ATTO or Manaus, possibly suggesting the effects of long-range atmospheric transport or other local aerosol sources present during sampling. In addition, this data set allowed recently established diversity parameters to be calculated. All sample periods had high mixing state indices (χ), >0.8. Two individual-particle diversity (D_i) populations were observed, with particles <0.5 μm having a D_i of ~2.4 and particles >0.5 μm having a D_i of ~3.6, which likely correspond to fresh and aged aerosols, respectively. The diversity parameters determined by the quantitative method presented here will serve to aid the accurate representation of aerosol mixing state, source apportionment, and aging in both less polluted and more industrialized environments in the Amazon Basin.
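The diversity and mixing-state parameters referenced above (D_i and χ) appear to follow the entropy-based framework commonly used for aerosol mixing state (Riemer and West). Assuming that framework, a minimal sketch on toy per-particle species masses looks like this.

```python
import numpy as np

def entropy(p, axis=None):
    """Shannon entropy with the 0*log(0) = 0 convention."""
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(p > 0, p * np.log(p), 0.0)
    return -terms.sum(axis=axis)

def mixing_state(masses):
    """masses: (n_particles, n_species) array of per-particle species masses."""
    mu_i = masses.sum(axis=1)                # per-particle total mass
    p_i = mu_i / masses.sum()                # particle mass fractions
    p_ia = masses / mu_i[:, None]            # species fractions within particles
    p_a = masses.sum(axis=0) / masses.sum()  # bulk species fractions
    H_i = entropy(p_ia, axis=1)
    D_i = np.exp(H_i)                        # per-particle diversity
    D_alpha = np.exp((p_i * H_i).sum())      # average (alpha) diversity
    D_gamma = np.exp(entropy(p_a))           # bulk (gamma) diversity
    chi = (D_alpha - 1.0) / (D_gamma - 1.0)  # mixing-state index
    return D_i, chi

# fully internal mixture: every particle identical -> chi = 1
D_int, chi_int = mixing_state(np.ones((5, 3)))
# fully external mixture: each particle is one pure species -> chi = 0
D_ext, chi_ext = mixing_state(np.eye(3))
```

χ near 1 (as in the abstract's >0.8 values) indicates particles whose compositions resemble the bulk, i.e. a largely internally mixed population.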

  17. Systematic review finds major deficiencies in sample size methodology and reporting for stepped-wedge cluster randomised trials

    PubMed Central

    Martin, James; Taljaard, Monica; Girling, Alan; Hemming, Karla

    2016-01-01

    Background Stepped-wedge cluster randomised trials (SW-CRT) are increasingly being used in health policy and services research, but unless they are conducted and reported to the highest methodological standards, they are unlikely to be useful to decision-makers. Sample size calculations for these designs require allowance for clustering, time effects and repeated measures. Methods We carried out a methodological review of SW-CRTs up to October 2014. We assessed adherence to reporting each of the 9 sample size calculation items recommended in the 2012 extension of the CONSORT statement to cluster trials. Results We identified 32 completed trials and 28 independent protocols published between 1987 and 2014. Of these, 45 (75%) reported a sample size calculation, with a median of 5.0 (IQR 2.5–6.0) of the 9 CONSORT items reported. Of those that reported a sample size calculation, the majority, 33 (73%), allowed for clustering, but just 15 (33%) allowed for time effects. There was a small increase in the proportions reporting a sample size calculation (from 64% before to 84% after publication of the CONSORT extension, p=0.07). The type of design (cohort or cross-sectional) was not reported clearly in the majority of studies, but cohort designs seemed to be most prevalent. Sample size calculations in cohort designs were particularly poor with only 3 out of 24 (13%) of these studies allowing for repeated measures. Discussion The quality of reporting of sample size items in stepped-wedge trials is suboptimal. There is an urgent need for dissemination of the appropriate guidelines for reporting and methodological development to match the proliferation of the use of this design in practice. Time effects and repeated measures should be considered in all SW-CRT power calculations, and there should be clarity in reporting trials as cohort or cross-sectional designs. PMID:26846897
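The most basic of the sample size allowances discussed above, inflation for clustering via the design effect DE = 1 + (m − 1) × ICC, can be sketched as follows. Note this handles clustering only; the review stresses that SW-CRT calculations must also allow for time effects and repeated measures (e.g., via fuller models such as Hussey and Hughes). The numbers are illustrative.

```python
import math

def clusters_needed(n_individual, cluster_size, icc):
    """Clusters per arm after inflating by the design effect 1 + (m-1)*ICC."""
    design_effect = 1.0 + (cluster_size - 1.0) * icc
    n_clustered = n_individual * design_effect
    # round before ceil to guard against floating-point jitter
    return math.ceil(round(n_clustered / cluster_size, 9))

# e.g. 400 participants per arm under individual randomisation,
# clusters of m = 20, ICC = 0.05 -> design effect 1.95 -> 39 clusters per arm
k = clusters_needed(400, 20, 0.05)
```

Reporting this calculation, with the assumed ICC, is one of the CONSORT items the review found most often missing.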

  18. Clustering gene expression regulators: new approach to disease subtyping.

    PubMed

    Pyatnitskiy, Mikhail; Mazo, Ilya; Shkrob, Maria; Schwartz, Elena; Kotelnikova, Ekaterina

    2014-01-01

One of the main challenges in modern medicine is to stratify different patient groups in terms of underlying disease molecular mechanisms so as to develop more personalized approaches to therapy. Here we propose a novel method for disease subtyping based on the analysis of activated expression regulators on a sample-by-sample basis. Our approach relies on the Sub-Network Enrichment Analysis (SNEA) algorithm, which identifies gene subnetworks with significant concordant changes in expression between two conditions. A subnetwork consists of a central regulator and downstream genes connected by relations extracted from a global literature-extracted regulation database. Regulators found in each patient separately are clustered together and assigned activity scores, which are used for the final grouping of patients. We show that our approach performs well compared to other related methods and at the same time provides researchers with a complementary level of understanding of the pathway-level biology behind a disease through the identification of significant expression regulators. We observed a reasonable grouping of neuromuscular disorders (triggered by structural damage vs. triggered by unknown mechanisms) that was not revealed using standard expression profile clustering. In another experiment we were able to suggest the clusters of regulators responsible for discriminating colorectal carcinoma from adenoma and to identify frequently genetically changed regulators that could be of specific importance for the individual characteristics of cancer development. The proposed approach can be regarded as biologically meaningful feature selection, reducing tens of thousands of genes down to dozens of clusters of regulators. The obtained clusters of regulators make it possible to generate valuable biological hypotheses about molecular mechanisms related to a clinical outcome for an individual patient.

  20. Progress toward Synthesis and Characterization of Rare-Earth Nanoparticles

    NASA Astrophysics Data System (ADS)

    Romero, Dulce G.; Ho, Pei-Chun; Attar, Saeed; Margosan, Dennis

    2010-03-01

Magnetic nanoparticles exhibit interesting phenomena, such as enhanced magnetization and reduced magnetic ordering temperature (i.e., superparamagnetism), which have technical applications in industry, including magnetic storage, magnetic imaging, and magnetic refrigeration. We used the inverse micelle technique to synthesize Gd and Nd nanoparticles, given its potential to control cluster size and the amount of aggregation and to prevent oxidation of the rare-earth elements. Gd and Nd were reduced by NaBH4 from their chloride salts. The produced clusters were characterized by X-ray diffraction (XRD), scanning electron microscopy (SEM), and energy-dispersive X-ray spectroscopy (EDX). The XRD results show that the majority of the peaks match those of the surfactant, DDAB; no Gd peaks were observed, owing to excess surfactant or amorphous clusters. However, the SEM and EDX results indicate the presence of Gd and Nd in our clusters microscopically, and the currently synthesized samples contain impurities. We are using a liquid-liquid extraction method to purify the samples, and the results will be discussed.

  1. Sampling designs for HIV molecular epidemiology with application to Honduras.

    PubMed

    Shepherd, Bryan E; Rossini, Anthony J; Soto, Ramon Jeremias; De Rivera, Ivette Lorenzana; Mullins, James I

    2005-11-01

    Proper sampling is essential to characterize the molecular epidemiology of human immunodeficiency virus (HIV). HIV sampling frames are difficult to identify, so most studies use convenience samples. We discuss statistically valid and feasible sampling techniques that overcome some of the potential for bias due to convenience sampling and ensure better representation of the study population. We employ a sampling design called stratified cluster sampling. This first divides the population into geographical and/or social strata. Within each stratum, a population of clusters is chosen from groups, locations, or facilities where HIV-positive individuals might be found. Some clusters are randomly selected within strata and individuals are randomly selected within clusters. Variation and cost help determine the number of clusters and the number of individuals within clusters that are to be sampled. We illustrate the approach through a study designed to survey the heterogeneity of subtype B strains in Honduras.
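The two-stage design described above (random clusters within strata, then random individuals within clusters) can be sketched as follows; the sampling frame, site names, and sizes are invented for illustration.

```python
import random

def stratified_cluster_sample(frame, clusters_per_stratum, per_cluster, seed=0):
    """frame: {stratum: {cluster_id: [individual ids]}} -> list of sampled ids."""
    rng = random.Random(seed)
    sampled = []
    for stratum in sorted(frame):            # first divide by stratum
        clusters = frame[stratum]
        # first stage: random clusters within the stratum
        chosen = rng.sample(sorted(clusters), min(clusters_per_stratum, len(clusters)))
        for cluster_id in chosen:
            # second stage: random individuals within the selected cluster
            people = clusters[cluster_id]
            sampled.extend(rng.sample(people, min(per_cluster, len(people))))
    return sampled

# invented frame: two geographic strata, two candidate sites each
frame = {
    "north": {"clinic_a": list(range(0, 30)), "clinic_b": list(range(30, 60))},
    "south": {"clinic_c": list(range(60, 90)), "clinic_d": list(range(90, 120))},
}
ids = stratified_cluster_sample(frame, clusters_per_stratum=1, per_cluster=5)
```

In practice the numbers of clusters and individuals per cluster would be set from the cost and variance considerations the abstract mentions; here they are fixed for illustration.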

  2. The ROSAT Brightest Cluster Sample - I. The compilation of the sample and the cluster log N-log S distribution

    NASA Astrophysics Data System (ADS)

    Ebeling, H.; Edge, A. C.; Bohringer, H.; Allen, S. W.; Crawford, C. S.; Fabian, A. C.; Voges, W.; Huchra, J. P.

    1998-12-01

    We present a 90 per cent flux-complete sample of the 201 X-ray-brightest clusters of galaxies in the northern hemisphere (delta>=0 deg), at high Galactic latitudes (|b|>=20 deg), with measured redshifts z<=0.3 and fluxes higher than 4.4x10^-12 erg cm^-2 s^-1 in the 0.1-2.4 keV band. The sample, called the ROSAT Brightest Cluster Sample (BCS), is selected from ROSAT All-Sky Survey data and is the largest X-ray-selected cluster sample compiled to date. In addition to Abell clusters, which form the bulk of the sample, the BCS also contains the X-ray-brightest Zwicky clusters and other clusters selected from their X-ray properties alone. Effort has been made to ensure the highest possible completeness of the sample and the smallest possible contamination by non-cluster X-ray sources. X-ray fluxes are computed using an algorithm tailored for the detection and characterization of X-ray emission from galaxy clusters. These fluxes are accurate to better than 15 per cent (mean 1sigma error). We find the cumulative logN-logS distribution of clusters to follow a power law kappa S^alpha with alpha=1.31^+0.06_-0.03 (errors are the 10th and 90th percentiles) down to fluxes of 2x10^-12 erg cm^-2 s^-1, i.e. considerably below the BCS flux limit. Although our best-fitting slope disagrees formally with the canonical value of -1.5 for a Euclidean distribution, the BCS logN-logS distribution is consistent with a non-evolving cluster population if cosmological effects are taken into account. Our sample will allow us to examine large-scale structure in the northern hemisphere, determine the spatial cluster-cluster correlation function, investigate correlations between the X-ray and optical properties of the clusters, establish the X-ray luminosity function for galaxy clusters, and discuss the implications of the results for cluster evolution.

  3. A Radio-Map Automatic Construction Algorithm Based on Crowdsourcing

    PubMed Central

    Yu, Ning; Xiao, Chenxian; Wu, Yinfeng; Feng, Renjian

    2016-01-01

    Traditional radio-map-based localization methods need to sample a large number of location fingerprints offline, which requires a huge amount of human and material resources. To solve the high sampling cost problem, an automatic radio-map construction algorithm based on crowdsourcing is proposed. The algorithm employs the crowdsourced information provided by a large number of users as they walk through the buildings as the source of location fingerprint data. Using the variation characteristics of users’ smartphone sensors, the indoor anchors (doors) are identified and their locations are regarded as reference positions for the whole radio-map. The AP-Cluster method is used to cluster the crowdsourced fingerprints to acquire the representative fingerprints. According to the reference positions and the similarity between fingerprints, the representative fingerprints are linked to their corresponding physical locations and the radio-map is generated. Experimental results demonstrate that the proposed algorithm reduces the cost of fingerprint sampling and radio-map construction and guarantees the localization accuracy. The proposed method does not require users’ explicit participation, which effectively solves the resource-consumption problem when a location fingerprint database is established. PMID:27070623
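
Once representative fingerprints have been anchored to physical positions, the generic final step of any radio-map method is to match an observed fingerprint to the closest stored one. The sketch below is not the paper's AP-Cluster algorithm; it only illustrates that matching step, with an invented radio map of access-point signal strengths (in dBm) per position:

```python
# Illustrative sketch (not the paper's algorithm): localize a device by
# nearest-neighbour matching of its observed RSS fingerprint against
# representative fingerprints already linked to physical positions.
# The radio map below is invented.
def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def localize(radio_map, observed):
    """Return the position whose representative fingerprint is closest
    (in RSS space) to the observed fingerprint."""
    return min(radio_map, key=lambda pos: euclidean(radio_map[pos], observed))

radio_map = {
    (0.0, 0.0): [-40, -70, -80],  # representative fingerprint near door A
    (5.0, 0.0): [-70, -45, -75],
    (5.0, 5.0): [-80, -70, -42],
}
pos = localize(radio_map, [-42, -68, -79])  # close to the first fingerprint
```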

  4. Sampling methods for stellar masses and the mmax-Mecl relation in the starburst dwarf galaxy NGC 4214

    NASA Astrophysics Data System (ADS)

    Weidner, Carsten; Kroupa, Pavel; Pflamm-Altenburg, Jan

    2014-07-01

    It has been claimed in the recent literature that a non-trivial relation between the mass of the most-massive star, mmax, in a star cluster and its embedded star cluster mass (the mmax - Mecl relation) is falsified by observations of the most-massive stars and the Hα luminosity of young star clusters in the starburst dwarf galaxy NGC 4214. Here, it is shown by comparing the NGC 4214 results with observations from the Milky Way that NGC 4214 agrees very well with the predictions of the mmax - Mecl relation and with the integrated galactic stellar initial mass function theory. The difference in conclusions is based on a high degree of degeneracy between expectations from random sampling and those from the mmax - Mecl relation, but is also due to interpreting mmax as a truncation mass in a randomly sampled initial mass function. Additional analysis of galaxies with lower star formation rates (SFRs) than those currently presented in the literature will be required to break this degeneracy.
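
The "random sampling" null hypothesis discussed above can be made concrete with a toy draw from a power-law initial mass function. This is a generic sketch, not the authors' analysis: masses are drawn from a Salpeter-like IMF by inverse-transform sampling, and the most massive star per cluster is recorded. Under pure random sampling, the expected mmax grows with the number of stars drawn (and hence with cluster mass), which is what makes it degenerate with a physical mmax - Mecl relation:

```python
import random

# Hedged sketch of the random-sampling expectation: draw stellar masses
# from a power-law IMF, xi(m) ~ m**-2.35, between assumed mass limits,
# and record the most massive star in clusters of different richness.
random.seed(1)

def sample_imf(n, m_lo=0.1, m_hi=100.0, alpha=2.35):
    """Inverse-transform sampling of a power-law IMF (masses in Msun)."""
    a = 1 - alpha  # exponent of the (unnormalized) cumulative distribution
    lo, hi = m_lo ** a, m_hi ** a
    return [(lo + random.random() * (hi - lo)) ** (1 / a) for _ in range(n)]

small_cluster = sample_imf(50)     # poor cluster: few draws
large_cluster = sample_imf(5000)   # rich cluster: many draws
mmax_small = max(small_cluster)
mmax_large = max(large_cluster)
```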

  5. A comparison of methods for the analysis of binomial clustered outcomes in behavioral research.

    PubMed

    Ferrari, Alberto; Comelli, Mario

    2016-12-01

    In behavioral research, data consisting of a per-subject proportion of "successes" and "failures" over a finite number of trials often arise. These clustered binary data are usually non-normally distributed, which can distort inference if the usual general linear model is applied and sample size is small. A number of more advanced methods are available, but they are often technically challenging, and a comparative assessment of their performances in behavioral setups has not been performed. We studied the performances of some methods applicable to the analysis of proportions; namely linear regression, Poisson regression, beta-binomial regression and Generalized Linear Mixed Models (GLMMs). We report on a simulation study evaluating power and Type I error rate of these models in hypothetical scenarios met by behavioral researchers; in addition, we describe results from the application of these methods to data from real experiments. Our results show that, while GLMMs are powerful instruments for the analysis of clustered binary outcomes, beta-binomial regression can outperform them in a range of scenarios. Linear regression gave results consistent with the nominal level of significance, but was overall less powerful. Poisson regression, instead, mostly led to anticonservative inference. GLMMs and beta-binomial regression are generally more powerful than linear regression; yet linear regression is robust to model misspecification in some conditions, whereas Poisson regression suffers heavily from violations of the assumptions when used to model proportion data. We conclude by providing directions to behavioral scientists dealing with clustered binary data and small sample sizes. Copyright © 2016 Elsevier B.V. All rights reserved.
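
The overdispersion that motivates beta-binomial models for such clustered binary data can be shown numerically. The following is a minimal sketch (the numbers are illustrative, not from the study): for n trials with success probability p and intra-cluster correlation rho, the beta-binomial variance inflates the binomial variance by the factor 1 + (n - 1) * rho:

```python
# Minimal numeric illustration of overdispersion in clustered binary
# data: the beta-binomial variance exceeds the plain binomial variance
# by the factor 1 + (n - 1) * rho (the "design effect" per subject).
def binomial_var(n, p):
    return n * p * (1 - p)

def beta_binomial_var(n, p, rho):
    return n * p * (1 - p) * (1 + (n - 1) * rho)

n, p, rho = 20, 0.3, 0.1          # illustrative values
plain = binomial_var(n, p)        # ~4.2
clustered = beta_binomial_var(n, p, rho)  # ~12.18, nearly 3x larger
```

Ignoring this inflation (as a plain binomial or Poisson fit does) understates standard errors, which is one route to the anticonservative inference reported above.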

  6. Photoinduced nucleation: a novel tool for detecting molecules in air at ultra-low concentrations

    DOEpatents

    Katz, Joseph L.; Lihavainen, Heikki; Rudek, Markus M.; Salter, Brian C.

    2002-01-01

    A method and apparatus for determining the presence of molecules in a gas at concentrations of less than about 100 ppb. Light having wavelengths in the range from about 200 nm to about 350 nm is used to illuminate a flowing sample of the gas, causing the molecules, if present, to form clusters. A mixture of the illuminated gas and a vapor is cooled until the vapor is supersaturated, so that there is a small rate of homogeneous nucleation. The supersaturated vapor condenses on the clusters, causing them to grow to a size sufficient to be counted by light scattering; the clusters are then counted.

  7. Area and Family Effects on the Psychopathology of the Millennium Cohort Study Children and Their Older Siblings

    ERIC Educational Resources Information Center

    Flouri, Eirini; Tzavidis, Nikos; Kallis, Constantinos

    2010-01-01

    Background: To model and compare contextual (area and family) effects on the psychopathology of children nested in families nested in areas. Method: Data from the first two sweeps of the UK's Millennium Cohort Study were used. The final study sample was 9,630 children clustered in 6,052 families clustered in 1,681 Lower-layer Super Output Areas.…

  8. Changes to Serum Sample Tube and Processing Methodology Does Not Cause Inter-Individual Variation in Automated Whole Serum N-Glycan Profiling in Health and Disease

    PubMed Central

    Shubhakar, Archana; Kalla, Rahul; Nimmo, Elaine R.; Fernandes, Daryl L.; Satsangi, Jack; Spencer, Daniel I. R.

    2015-01-01

    Introduction: Serum N-glycans have been identified as putative biomarkers for numerous diseases. The impact of different serum sample tubes and processing methods on N-glycan analysis has received relatively little attention. This study aimed to determine the effect of different sample tubes and processing methods on the whole serum N-glycan profile in both health and disease. A secondary objective was to describe a robot-automated N-glycan release, labeling and cleanup process for use in a biomarker discovery system. Methods: 25 patients with active and quiescent inflammatory bowel disease and controls had three different serum sample tubes taken at the same draw. Two different processing methods were used for three types of tube (with and without gel-separation medium). Samples were randomised and processed in a blinded fashion. Whole serum N-glycan release, 2-aminobenzamide labeling and cleanup were automated using a Hamilton Microlab STARlet Liquid Handling robot. Samples were analysed using a hydrophilic interaction liquid chromatography/ethylene bridged hybrid (BEH) column on an ultra-high performance liquid chromatography instrument. Data were analysed quantitatively by pairwise correlation and hierarchical clustering using the area under each chromatogram peak. Qualitatively, a blinded assessor attempted to match chromatograms to each individual. Results: There was small intra-individual variation in serum N-glycan profiles from samples collected using different sample processing methods. Intra-individual correlation coefficients were between 0.99 and 1. Unsupervised hierarchical clustering and principal coordinate analyses accurately matched samples from the same individual. Qualitative analysis demonstrated good chromatogram overlay and a blinded assessor was able to accurately match individuals based on chromatogram profile, regardless of disease status.
    Conclusions: The three different serum sample tubes processed using the described methods cause minimal intra-individual variation in serum whole N-glycan profile when processed using an automated workstream. This has important implications for N-glycan biomarker discovery studies using different serum processing standard operating procedures. PMID:25831126

  9. Culture-independent discovery of natural products from soil metagenomes.

    PubMed

    Katz, Micah; Hover, Bradley M; Brady, Sean F

    2016-03-01

    Bacterial natural products have proven to be invaluable starting points in the development of many currently used therapeutic agents. Unfortunately, traditional culture-based methods for natural product discovery have been deemphasized by pharmaceutical companies due in large part to high rediscovery rates. Culture-independent, or "metagenomic," methods, which rely on the heterologous expression of DNA extracted directly from environmental samples (eDNA), have the potential to provide access to metabolites encoded by a large fraction of the earth's microbial biosynthetic diversity. As soil is both ubiquitous and rich in bacterial diversity, it is an appealing starting point for culture-independent natural product discovery efforts. This review provides an overview of the history of soil metagenome-driven natural product discovery studies and elaborates on the recent development of new tools for sequence-based, high-throughput profiling of environmental samples used in discovering novel natural product biosynthetic gene clusters. We conclude with several examples of these new tools being employed to facilitate the recovery of novel secondary metabolite encoding gene clusters from soil metagenomes and the subsequent heterologous expression of these clusters to produce bioactive small molecules.

  10. Effect of DNA extraction and sample preservation method on rumen bacterial population.

    PubMed

    Fliegerova, Katerina; Tapio, Ilma; Bonin, Aurelie; Mrazek, Jakub; Callegari, Maria Luisa; Bani, Paolo; Bayat, Alireza; Vilkki, Johanna; Kopečný, Jan; Shingfield, Kevin J; Boyer, Frederic; Coissac, Eric; Taberlet, Pierre; Wallace, R John

    2014-10-01

    The comparison of the bacterial profile of intracellular (iDNA) and extracellular DNA (eDNA) isolated from cow rumen content stored under different conditions was conducted. The influence of rumen fluid treatment (cheesecloth squeezed, centrifuged, filtered), storage temperature (RT, -80 °C) and cryoprotectants (PBS-glycerol, ethanol) on quality and quantity parameters of extracted DNA was evaluated by bacterial DGGE analysis, real-time PCR quantification and a metabarcoding approach using high-throughput sequencing. Samples clustered according to the type of extracted DNA due to considerable differences between iDNA and eDNA bacterial profiles, while storage temperature and cryoprotectant additives had little effect on sample clustering. The numbers of Firmicutes and Bacteroidetes were lower (P < 0.01) in eDNA samples. The qPCR indicated a significantly higher amount of Firmicutes in the iDNA sample frozen with glycerol (P < 0.01). Deep sequencing analysis of iDNA samples revealed the prevalence of Bacteroidetes and similarity of samples frozen with and without cryoprotectants, which differed from the sample stored with ethanol at room temperature. Centrifugation and consequent filtration of rumen fluid subjected to the eDNA isolation procedure considerably changed the ratio of molecular operational taxonomic units (MOTUs) of Bacteroidetes and Firmicutes. Intracellular DNA extraction using the bead-beating method from cheesecloth-sieved rumen content mixed with PBS-glycerol and stored at -80 °C was found to be the optimal method to study the ruminal bacterial profile. Copyright © 2013 Elsevier Ltd. All rights reserved.

  11. The Chandra Strong Lens Sample: Revealing Baryonic Physics In Strong Lensing Selected Clusters

    NASA Astrophysics Data System (ADS)

    Bayliss, Matthew

    2017-08-01

    We propose Chandra imaging of the hot intra-cluster gas in a unique new sample of 29 galaxy clusters selected purely on their strong gravitational lensing signatures. This will be the first program targeting a purely strong lensing selected cluster sample, enabling new comparisons between the ICM properties and scaling relations of strong lensing and mass/ICM selected cluster samples. Chandra imaging, combined with high precision strong lens models, ensures powerful constraints on the distribution and state of matter in the cluster cores. This represents a novel angle from which we can address the role played by baryonic physics (the infamous "gastrophysics") in shaping the cores of massive clusters, and opens up an exciting new galaxy cluster discovery space with Chandra.

  13. Integrated spectral properties of 22 small angular diameter galactic open clusters

    NASA Astrophysics Data System (ADS)

    Ahumada, A. V.; Clariá, J. J.; Bica, E.

    2007-10-01

    Aims: Flux-calibrated integrated spectra of a sample of 22 Galactic open clusters of small angular diameter are presented. With one exception (ESO 429-SC2), all objects have Galactic longitudes in the range 208° < l < 33°. The spectra cover the range ≈3600-6800 Å, with a resolution of ≈14 Å. The properties of the present cluster sample are compared with those of well-studied clusters located in two 90° sectors, centred at l = 257° and l = 347°. The dissolution rate of Galactic open clusters in these two sectors is examined. Methods: Using the equivalent widths of the Balmer lines and comparing line intensities and continuum distribution of the cluster spectra with those of template cluster spectra with known properties, we derive both foreground reddening values and ages. Thus, we provide information independent of that determined through colour-magnitude diagrams. Results: The derived E(B-V) values for the whole sample vary from 0.0 in ESO 445-SC74 to 1.90 in Pismis 24, while the ages range from ~3 Myr (NGC 6604 and BH 151) to ~3.5 Gyr (Ruprecht 2). For six clusters (Dolidze 34, ESO 429-SC2, ESO 445-SC74, Ruprecht 2, BH 151 and Hogg 9) the foreground E(B-V) colour excesses and ages are determined for the first time. The results obtained for the remaining clusters show, in general terms, good agreement with previous photometric results. Conclusions: The age and reddening distributions of the present sample match those of known clusters in the two selected Galactic sectors. The present results would favour a major dissolution rate of star clusters in these two sectors. Two new solar-metallicity templates are defined corresponding to the age groups of (4-5) Myr and 30 Myr among those of Piatti et al. (2002, MNRAS, 335, 233). The Piatti et al. templates of 20 Myr and (3-4) Gyr are here redefined.
    Based on observations made at Complejo Astronómico El Leoncito, which is operated under agreement between the Consejo Nacional de Investigaciones Científicas y Técnicas de la República Argentina (CONICET) and the National Universities of La Plata, Córdoba and San Juan, Argentina. Tables and Appendix are only available in electronic form at http://www.aanda.org

  14. Statistical Significance for Hierarchical Clustering

    PubMed Central

    Kimes, Patrick K.; Liu, Yufeng; Hayes, D. Neil; Marron, J. S.

    2017-01-01

    Summary Cluster analysis has proved to be an invaluable tool for the exploratory and unsupervised analysis of high dimensional datasets. Among methods for clustering, hierarchical approaches have enjoyed substantial popularity in genomics and other fields for their ability to simultaneously uncover multiple layers of clustering structure. A critical and challenging question in cluster analysis is whether the identified clusters represent important underlying structure or are artifacts of natural sampling variation. Few approaches have been proposed for addressing this problem in the context of hierarchical clustering, for which the problem is further complicated by the natural tree structure of the partition, and the multiplicity of tests required to parse the layers of nested clusters. In this paper, we propose a Monte Carlo based approach for testing statistical significance in hierarchical clustering which addresses these issues. The approach is implemented as a sequential testing procedure guaranteeing control of the family-wise error rate. Theoretical justification is provided for our approach, and its power to detect true clustering structure is illustrated through several simulation studies and applications to two cancer gene expression datasets. PMID:28099990
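
The core idea of Monte Carlo significance testing for clustering can be sketched without the paper's machinery. The following toy example (not the authors' procedure) computes a clustering-strength statistic for an observed 1-D sample and compares it to a null distribution generated from a single Gaussian, the usual "no cluster structure" null:

```python
import random

# Toy Monte Carlo significance test for clustering, in the spirit
# described above: is the observed two-group separation stronger than
# expected from a single Gaussian? All data are invented.
random.seed(0)

def two_means_gap(xs):
    """Clustering strength of a 1-D sample: relative reduction in
    within-group sum of squares after the best split of sorted data."""
    xs = sorted(xs)
    mean = sum(xs) / len(xs)
    total = sum((x - mean) ** 2 for x in xs)
    def wss(g):
        m = sum(g) / len(g)
        return sum((x - m) ** 2 for x in g)
    best = min(wss(xs[:i]) + wss(xs[i:]) for i in range(1, len(xs)))
    return 1 - best / total

# Two well-separated groups -> strong observed statistic.
observed = [0.1, 0.2, 0.3, 5.1, 5.2, 5.3]
obs_stat = two_means_gap(observed)

# Null distribution: same-size samples from one Gaussian.
null = [two_means_gap([random.gauss(0, 1) for _ in observed])
        for _ in range(200)]
p_value = sum(s >= obs_stat for s in null) / len(null)
```

The sequential, family-wise-error-controlled procedure of the paper repeats a test of this general kind down the dendrogram; this sketch only shows a single Monte Carlo comparison.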

  15. SomInaClust: detection of cancer genes based on somatic mutation patterns of inactivation and clustering.

    PubMed

    Van den Eynden, Jimmy; Fierro, Ana Carolina; Verbeke, Lieven P C; Marchal, Kathleen

    2015-04-23

    With the advances in high-throughput technologies, increasing amounts of cancer somatic mutation data are being generated and made available. Only a small number of (driver) mutations occur in driver genes and are responsible for carcinogenesis, while the majority of (passenger) mutations do not influence tumour biology. In this study, SomInaClust is introduced, a method that accurately identifies driver genes based on their mutation pattern across tumour samples and then classifies them into oncogenes or tumour suppressor genes respectively. SomInaClust starts from the observation that oncogenes mainly contain mutations that, due to positive selection, cluster at similar positions in a gene across patient samples, whereas tumour suppressor genes contain a high number of protein-truncating mutations throughout the entire gene length. The method was shown to prioritize driver genes in 9 different solid cancers. Furthermore, it was found to be complementary to existing similar-purpose methods, with the additional advantages that it has a higher sensitivity, also for rare mutations (occurring in less than 1% of all samples), and that it accurately classifies candidate driver genes into putative oncogenes and tumour suppressor genes. Pathway enrichment analysis showed that the identified genes belong to known cancer signalling pathways, and that the distinction between oncogenes and tumour suppressor genes is biologically relevant. SomInaClust was shown to detect candidate driver genes based on somatic mutation patterns of inactivation and clustering and to distinguish oncogenes from tumour suppressor genes. The method could be used for the identification of new cancer genes or to filter mutation data for further data-integration purposes.
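
The oncogene versus tumour-suppressor distinction described above can be caricatured in a few lines. This toy classifier is not SomInaClust (which uses statistical background models); the thresholds and mutation data are invented purely to illustrate the two signatures: position clustering versus scattered truncating mutations:

```python
from collections import Counter

# Toy sketch of the driver-gene signatures described above (not the
# SomInaClust method itself): oncogene-like genes show mutations
# clustered at recurrent positions, tumour-suppressor-like genes show
# mostly truncating mutations spread over the gene. Thresholds invented.
def classify_gene(positions, truncating_frac):
    counts = Counter(positions)
    # fraction of mutations falling at recurrently hit positions
    clustered = sum(c for c in counts.values() if c > 1) / len(positions)
    if clustered > 0.5:
        return "oncogene"
    if truncating_frac > 0.5:
        return "tumour suppressor"
    return "passenger-like"

# KRAS-like toy pattern: many samples mutated at the same codons.
og = classify_gene([12, 12, 12, 13, 61, 12], truncating_frac=0.05)
# Scattered, mostly truncating toy pattern.
tsg = classify_gene([33, 140, 250, 301], truncating_frac=0.8)
```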

  16. Epidemiological study of bovine brucellosis in three agro-ecological areas of central Oromiya, Ethiopia.

    PubMed

    Jergefa, T; Kelay, B; Bekana, M; Teshale, S; Gustafson, H; Kindahl, H

    2009-12-01

    A cross-sectional sero-epidemiological study of bovine brucellosis was conducted between September 2005 and March 2006 in three separate agroecological areas of central Oromiya, Ethiopia. In this study, a total of 176 clusters (farms) and 1,238 animals were selected, using the one-stage cluster sampling method. Fifty-nine clusters and 423 animals were selected from the lowland areas; 58 clusters and 385 animals from the midland areas and 59 clusters and 430 animals from the highlands. Serum samples were collected from a total of 1,238 animals older than six months. The rose bengal plate test and complement fixation test were used as screening and confirmatory tests, respectively, to detect Brucella seropositivity. Questionnaires were also administered to 176 households to gather information on the farm and livestock. Results showed that the overall seroprevalence of bovine brucellosis at the individual animal level was 2.9% (low). The seroprevalence was 4.2% in the lowlands, 1.0% in the midlands and 3.4% in the highlands. The overall seroprevalence at the herd level was 13.6% (moderate). At the herd level, seroprevalence was 17% in the lowlands, 5.1% in the midlands and 18.6% in the highlands. Logistic regression analysis revealed that the breed of cattle and the method of disposing of aborted foetuses and foetal membranes had a statistically significant effect on individual animal seroprevalence (p < 0.05). In lowland areas, the breed, animal management system, mating method, herd size and source of replacement stock (all p < 0.05) had significant effects on individual animal seroprevalence.
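
One-stage cluster sampling of the kind used above trades convenience for a larger required sample size, via the design effect. The sketch below is a generic illustration, not the study's actual sample-size calculation; the prevalence, margin, cluster size and intra-cluster correlation are assumed values:

```python
import math

# Hedged sketch of the design-effect adjustment behind cluster sampling:
# with m animals sampled per cluster (farm) and intra-cluster correlation
# rho, the simple-random-sample size is inflated by
# DEFF = 1 + (m - 1) * rho. All parameter values are illustrative.
def srs_size(p, margin, z=1.96):
    """Simple random sample size for estimating a prevalence p
    to within +/- margin at ~95% confidence."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

def cluster_size(p, margin, m, rho, z=1.96):
    deff = 1 + (m - 1) * rho
    return math.ceil(srs_size(p, margin, z) * deff)

n_srs = srs_size(p=0.03, margin=0.01)                   # prevalence ~3%
n_cluster = cluster_size(p=0.03, margin=0.01, m=7, rho=0.1)
```

With 7 animals per farm and rho = 0.1, the design effect is 1.6, so the cluster design needs roughly 60% more animals than a simple random sample for the same precision.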

  17. Insights into quasar UV spectra using unsupervised clustering analysis

    NASA Astrophysics Data System (ADS)

    Tammour, A.; Gallagher, S. C.; Daley, M.; Richards, G. T.

    2016-06-01

    Machine learning techniques can provide powerful tools to detect patterns in multidimensional parameter space. We use K-means - a simple yet powerful unsupervised clustering algorithm which picks out structure in unlabelled data - to study a sample of quasar UV spectra from the Quasar Catalog of the 10th Data Release of the Sloan Digital Sky Survey (SDSS-DR10) of Paris et al. Detecting patterns in large data sets helps us gain insights into the physical conditions and processes giving rise to the observed properties of quasars. We use K-means to find clusters in the parameter space of the equivalent width (EW), the blue- and red-half-width at half-maximum (HWHM) of the Mg II 2800 Å line, the C IV 1549 Å line, and the C III] 1908 Å blend in samples of broad absorption line (BAL) and non-BAL quasars at redshift 1.6-2.1. Using this method, we successfully recover correlations well known in the UV regime, such as the anti-correlation between the EW and blueshift of the C IV emission line, and the shape of the ionizing spectral energy distribution (SED) probed by the strength of He II and the Si III]/C III] ratio. We find this to be particularly evident when the properties of C III] are used to find the clusters, while those of Mg II proved to be less strongly correlated with the properties of the other lines in the spectra, such as the width of C IV or the Si III]/C III] ratio. We conclude that unsupervised clustering methods (such as K-means) are powerful methods for finding `natural' binning boundaries in multidimensional data sets and discuss caveats and future work.
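
K-means itself (Lloyd's algorithm) fits in a few lines. The sketch below uses invented 2-D points rather than real line measurements; in the application above each point would instead be a vector of features such as EW and HWHM values:

```python
# Minimal K-means sketch (Lloyd's algorithm) on invented 2-D points,
# illustrating the unsupervised partitioning applied above to quasar
# emission-line measurements.
def kmeans(points, centers, iters=10):
    groups = [[] for _ in centers]
    for _ in range(iters):
        # assignment step: each point joins its nearest centre
        groups = [[] for _ in centers]
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            groups[d.index(min(d))].append(p)
        # update step: move each centre to the mean of its group
        centers = [tuple(sum(x) / len(g) for x in zip(*g)) if g else c
                   for g, c in zip(groups, centers)]
    return centers, groups

points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9)]
centers, groups = kmeans(points, centers=[(0.0, 0.0), (10.0, 10.0)])
```

Real analyses standardize each feature first (so lines with large EW ranges do not dominate the distance) and choose the number of clusters by inspection or a validity index.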

  18. The Mass Function of Abell Clusters

    NASA Astrophysics Data System (ADS)

    Chen, J.; Huchra, J. P.; McNamara, B. R.; Mader, J.

    1998-12-01

    The velocity dispersion and mass functions for rich clusters of galaxies provide important constraints on models of the formation of Large-Scale Structure (e.g., Frenk et al. 1990). However, prior estimates of the velocity dispersion or mass function for galaxy clusters have been based on either very small samples of clusters (Bahcall and Cen 1993; Zabludoff et al. 1994) or large but incomplete samples (e.g., the Girardi et al. (1998) determination from a sample of clusters with more than 30 measured galaxy redshifts). In contrast, we approach the problem by constructing a volume-limited sample of Abell clusters. We collected individual galaxy redshifts for our sample from two major galaxy velocity databases, the NASA Extragalactic Database, NED, maintained at IPAC, and ZCAT, maintained at SAO. We assembled a database with velocity information for possible cluster members and then selected cluster members based on both spatial and velocity data. Cluster velocity dispersions and masses were calculated following the procedures of Danese, De Zotti, and di Tullio (1980) and Heisler, Tremaine, and Bahcall (1985), respectively. The final velocity dispersion and mass functions were analyzed in order to constrain cosmological parameters by comparison to the results of N-body simulations. Our data for the cluster sample as a whole and for the individual clusters (spatial maps and velocity histograms) are available on-line at http://cfa-www.harvard.edu/~huchra/clusters. This website will be updated as more data become available in the master redshift compilations, and will be expanded to include more clusters and large groups of galaxies.
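
The basic quantities behind such a velocity dispersion and mass function can be sketched simply. This is an illustrative order-of-magnitude calculation, not the Danese et al. or Heisler et al. procedures cited above (which additionally correct for measurement errors and use projected mass estimators); the member velocities are invented:

```python
import math

# Illustrative sketch: line-of-sight velocity dispersion of cluster
# members and a simple virial mass estimate M ~ 5 * sigma^2 * R / G.
# Member velocities (km/s) below are invented.
G = 4.301e-9  # gravitational constant in Mpc (km/s)^2 / Msun

def velocity_dispersion(vels):
    """Unbiased sample standard deviation of line-of-sight velocities."""
    mean = sum(vels) / len(vels)
    return math.sqrt(sum((v - mean) ** 2 for v in vels) / (len(vels) - 1))

def virial_mass(sigma, radius_mpc):
    """Common order-of-magnitude virial estimate, M ~ 5 sigma^2 R / G."""
    return 5 * sigma ** 2 * radius_mpc / G

vels = [11200.0, 12400.0, 11800.0, 12100.0, 11500.0]  # km/s
sigma = velocity_dispersion(vels)          # ~474 km/s
mass = virial_mass(sigma, radius_mpc=1.5)  # ~4e14 Msun
```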

  19. Sparsity-weighted outlier FLOODing (OFLOOD) method: Efficient rare event sampling method using sparsity of distribution.

    PubMed

    Harada, Ryuhei; Nakamura, Tomotake; Shigeta, Yasuteru

    2016-03-30

    As an extension of the Outlier FLOODing (OFLOOD) method [Harada et al., J. Comput. Chem. 2015, 36, 763], the sparsity of the outliers defined by a hierarchical clustering algorithm, FlexDice, was considered to achieve an efficient conformational search as sparsity-weighted "OFLOOD." In OFLOOD, FlexDice detects areas of sparse distribution as outliers. The outliers are regarded as candidates that have high potential to promote conformational transitions and are employed as initial structures for conformational resampling by restarting molecular dynamics simulations. When detecting outliers, FlexDice defines a rank in the hierarchy for each outlier, which relates to sparsity in the distribution. In this study, we define lower-rank (first-ranked), medium-rank (second-ranked), and highest-rank (third-ranked) outliers. For instance, the first-ranked outliers are located in a given conformational space away from the clusters (highly sparse distribution), whereas the third-ranked outliers lie near the clusters (a moderately sparse distribution). To achieve the conformational search efficiently, resampling from the outliers with a given rank is performed. As demonstrations, this method was applied to several model systems: alanine dipeptide, Met-enkephalin, Trp-cage, T4 lysozyme, and glutamine binding protein. In each demonstration, the present method successfully reproduced transitions among metastable states. In particular, the first-ranked OFLOOD highly accelerated the exploration of conformational space by expanding the edges. In contrast, the third-ranked OFLOOD intensively reproduced local transitions among neighboring metastable states. For quantitative evaluation of the sampled snapshots, free energy calculations were performed with a combination of umbrella sampling, providing rigorous landscapes of the biomolecules. © 2015 Wiley Periodicals, Inc.
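
The notion of ranking snapshots by how sparsely populated their region is can be illustrated with a generic stand-in for FlexDice (whose actual hierarchical algorithm is not reproduced here): rank points by distance to their nearest neighbour, so the most isolated points, which sit in the sparsest regions, are flagged first as resampling seeds:

```python
# Illustrative only: OFLOOD detects outliers with the hierarchical
# clustering algorithm FlexDice; as a generic stand-in, this sketch
# ranks 1-D "snapshots" (e.g. values of a reaction coordinate) by
# distance to their nearest neighbour, so the most isolated points
# (sparsest regions) are flagged as candidate resampling seeds.
def rank_by_sparsity(points):
    """Return points sorted from most to least isolated."""
    def nn_dist(p):
        return min(abs(p - q) for q in points if q != p)
    return sorted(points, key=nn_dist, reverse=True)

# Two dense clusters plus one isolated point at 5.0.
snapshots = [0.0, 0.1, 0.2, 5.0, 9.8, 9.9, 10.0]
ranked = rank_by_sparsity(snapshots)
outlier = ranked[0]  # the point in the sparsest region
```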

  20. The Swift AGN and Cluster Survey

    NASA Astrophysics Data System (ADS)

    Danae Griffin, Rhiannon; Dai, Xinyu; Kochanek, Christopher S.; Bregman, Joel N.; Nugent, Jenna

    2016-01-01

    The Swift active galactic nucleus (AGN) and Cluster Survey (SACS) uses 125 deg^2 of Swift X-ray Telescope serendipitous fields with variable depths surrounding X-ray bursts to provide a medium depth (4 × 10^-15 erg cm^-2 s^-1) and area survey filling the gap between deep, narrow Chandra/XMM-Newton surveys and wide, shallow ROSAT surveys. Here, we present the first two papers in a series of publications for SACS. In the first paper, we introduce our method and catalog of 22,563 point sources and 442 extended sources. We examine the number counts of the AGN and galaxy cluster populations. SACS provides excellent constraints on the AGN number counts at the bright end with negligible uncertainties due to cosmic variance, and these constraints are consistent with previous measurements. The depth and areal coverage of SACS are well suited for galaxy cluster surveys outside the local universe, reaching z ~ 1 for massive clusters. In the second paper, we use Sloan Digital Sky Survey (SDSS) DR8 data to study the 203 extended SACS sources that are located within the SDSS footprint. We search for galaxy over-densities in 3-D space using SDSS galaxies and their photometric redshifts near the Swift galaxy cluster candidates. We find 103 Swift clusters with a > 3σ over-density. The remaining targets are potentially located at higher redshifts and require deeper optical follow-up observations for confirmation as galaxy clusters. We present a series of cluster properties including the redshift, BCG magnitude, BCG-to-X-ray center offset, optical richness, X-ray luminosity and red sequences. We compare the observed redshift distribution of the sample with a theoretical model, and find that our sample is complete for z ≤ 0.3 and 80% complete for z ≤ 0.4, consistent with the survey depth of SDSS.
    We also match our SDSS-confirmed Swift clusters to existing cluster catalogs, and find 42, 2 and 1 matches in optical, X-ray and SZ catalogs, respectively, so the majority of these clusters are new detections. These results suggest that the Swift cluster selection algorithm presented in our first paper has yielded a statistically well-defined cluster sample for further studies of cluster evolution and cosmology.

  1. Construction of a catalogue of colliding galaxy clusters

    NASA Astrophysics Data System (ADS)

    de los Ríos, M.; Domínguez, M. J.; Paz, D.

    2015-08-01

    In this work we present the first results of the identification of colliding galaxy clusters in galaxy catalogs with redshift measurements (SDSS, 2DF), and introduce the methodology. We calibrated the method by studying the merger trees of clusters in a mock catalog based on a full-blown semi-analytic model of galaxy formation on top of the Millennium cosmological simulation. We also discuss future steps for studying our sample of colliding galaxy clusters, including X-ray observations and mass reconstruction using weak gravitational lensing.

  2. A Multilevel Testlet Model for Dual Local Dependence

    ERIC Educational Resources Information Center

    Jiao, Hong; Kamata, Akihito; Wang, Shudong; Jin, Ying

    2012-01-01

    The applications of item response theory (IRT) models assume local item independence and that examinees are independent of each other. When a representative sample for psychometric analysis is selected using a cluster sampling method in a testlet-based assessment, both local item dependence and local person dependence are likely to be induced.…

  3. A Systematic Evaluation of ADHD and Comorbid Psychopathology in a Population-Based Twin Sample

    ERIC Educational Resources Information Center

    Volk, Heather E.; Neuman, Rosalind J.; Todd, Richard D.

    2005-01-01

    Objective: Clinical and population samples demonstrate that attention-deficit/hyperactivity disorder (ADHD) occurs with other disorders. Comorbid disorder clustering within ADHD subtypes is not well studied. Method: Latent class analysis (LCA) examined the co-occurrence of DSM-IV ADHD, oppositional defiant disorder (ODD), conduct disorder (CD),…

  4. Scalable multi-sample single-cell data analysis by Partition-Assisted Clustering and Multiple Alignments of Networks

    PubMed Central

    Samusik, Nikolay; Wang, Xiaowei; Guan, Leying; Nolan, Garry P.

    2017-01-01

    Mass cytometry (CyTOF) has greatly expanded the capability of cytometry. It is now easy to generate multiple CyTOF samples in a single study, with each sample containing single-cell measurements of 50 markers for hundreds of thousands of cells or more. Current methods do not adequately address the issues involved in combining multiple samples for subpopulation discovery, and these issues can be quickly and dramatically amplified with an increasing number of samples. To overcome this limitation, we developed Partition-Assisted Clustering and Multiple Alignments of Networks (PAC-MAN) for the fast automatic identification of cell populations in CyTOF data closely matching that of expert manual discovery, and for alignments between subpopulations across samples to define dataset-level cellular states. PAC-MAN is computationally efficient, allowing the management of very large CyTOF datasets, which are increasingly common in clinical studies and cancer studies that monitor various tissue samples for each subject. PMID:29281633

  5. Evaluation of primary immunization coverage of infants under universal immunization programme in an urban area of bangalore city using cluster sampling and lot quality assurance sampling techniques.

    PubMed

    K, Punith; K, Lalitha; G, Suman; Bs, Pradeep; Kumar K, Jayanth

    2008-07-01

    Is the LQAS technique better than the cluster sampling technique, in terms of resources, for evaluating immunization coverage in an urban area? To assess and compare lot quality assurance sampling against cluster sampling in the evaluation of primary immunization coverage. Population-based cross-sectional study. Areas under Mathikere Urban Health Center. Children aged 12 to 23 months. 220 in cluster sampling, 76 in lot quality assurance sampling. Percentages and proportions, chi-square test. (1) Using cluster sampling, the percentages of completely immunized, partially immunized and unimmunized children were 84.09%, 14.09% and 1.82%, respectively; with lot quality assurance sampling, they were 92.11%, 6.58% and 1.31%, respectively. (2) Immunization coverage levels as evaluated by the cluster sampling technique were not statistically different from the coverage values obtained by the lot quality assurance sampling technique. Considering the time and resources required, lot quality assurance sampling was found to be the better technique for evaluating primary immunization coverage in an urban area.
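    The comparison in finding (2) above amounts to a two-sample test of proportions. A minimal sketch in Python, assuming the completely-immunized counts (185/220 and 70/76) are back-calculated from the reported percentages:

```python
from math import erf, sqrt

def two_proportion_z(x1, n1, x2, n2):
    """Two-sample z-test for equality of proportions (pooled variance)."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                       # pooled proportion
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))      # standard error under H0
    z = (p1 - p2) / se
    # two-sided p-value from the standard normal CDF (via the error function)
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Completely immunized: ~84.09% of 220 (cluster) vs ~92.11% of 76 (LQAS)
z, p = two_proportion_z(185, 220, 70, 76)
```

The resulting p-value is above 0.05, consistent with the abstract's conclusion that the two coverage estimates do not differ significantly.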

  6. Spectroscopic Confirmation of Two Massive Red-sequence-selected Galaxy Clusters at Z Approximately Equal to 1.2 in the Sparcs-North Cluster Survey

    NASA Technical Reports Server (NTRS)

    Muzzin, Adam; Wilson, Gillian; Yee, H.K.C.; Hoekstra, Henk; Gilbank, David; Surace, Jason; Lacy, Mark; Blindert, Kris; Majumdar, Subhabrata; Demarco, Ricardo; hide

    2008-01-01

    The Spitzer Adaptation of the Red-sequence Cluster Survey (SpARCS) is a deep z-band imaging survey covering the Spitzer SWIRE Legacy fields designed to create the first large homogeneously selected sample of massive clusters at z > 1 using an infrared adaptation of the cluster red-sequence method. We present an overview of the northern component of the survey, which has been observed with CFHT/MegaCam and covers 28.3 deg^2. The southern component of the survey was observed with CTIO/MOSAICII, covers 13.6 deg^2, and is summarized in a companion paper by Wilson et al. (2008). We also present spectroscopic confirmation of two rich cluster candidates at z approx. 1.2. Based on Nod-and-Shuffle spectroscopy from GMOS-N on Gemini, there are 17 and 28 confirmed cluster members in SpARCS J163435+402151 and SpARCS J163852+403843, which have spectroscopic redshifts of 1.1798 and 1.1963, respectively. The clusters have velocity dispersions of 490 +/- 140 km/s and 650 +/- 160 km/s, respectively, which imply masses (M_200) of (1.0 +/- 0.9) x 10^14 and (2.4 +/- 1.8) x 10^14 solar masses. Confirmation of these candidates as bona fide massive clusters demonstrates that two-filter imaging is an effective, yet observationally efficient, method for selecting clusters at z > 1.

  7. Inherent structure versus geometric metric for state space discretization.

    PubMed

    Liu, Hanzhong; Li, Minghai; Fan, Jue; Huo, Shuanghong

    2016-05-30

    Inherent structure (IS) and geometry-based clustering methods are commonly used for analyzing molecular dynamics trajectories. ISs are obtained by minimizing the sampled conformations into local minima on the potential/effective energy surface. The conformations that are minimized into the same energy basin belong to one cluster. We investigate the influence of the applications of these two methods of trajectory decomposition on our understanding of the thermodynamics and kinetics of alanine tetrapeptide. We find that at the microcluster level, the IS approach and the root-mean-square deviation (RMSD)-based clustering method give totally different results. Depending on the local features of the energy landscape, conformations with close RMSDs can be minimized into different minima, while conformations with large RMSDs can be minimized into the same basin. However, the relaxation timescales calculated from the transition matrices built on the microclusters are similar. The discrepancy at the microcluster level leads to different macroclusters. Although the dynamic models established through both clustering methods are validated as approximately Markovian, the IS approach seems to give a meaningful state-space discretization at the macrocluster level in terms of conformational features and kinetics. © 2016 Wiley Periodicals, Inc.
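    The relaxation-timescale comparison above rests on transition matrices estimated from the discretized trajectory. A minimal sketch of that step, assuming a trajectory given as integer state labels (the clustering itself is not shown):

```python
import numpy as np

def implied_timescales(traj, n_states, lag=1):
    """Estimate a row-stochastic transition matrix from a discrete
    trajectory at a given lag, then convert its non-stationary
    eigenvalues to implied timescales t_i = -lag / ln(lambda_i)."""
    C = np.zeros((n_states, n_states))
    for a, b in zip(traj[:-lag], traj[lag:]):
        C[a, b] += 1                                   # transition counts
    T = C / C.sum(axis=1, keepdims=True)               # row-normalise
    eig = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]  # 1 = stationary mode
    return T, -lag / np.log(eig[1:])
```

For example, a two-state trajectory with 75% self-transition probability yields a single implied timescale of 1/ln 2 ≈ 1.44 lag units.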

  8. Measuring Health Information Dissemination and Identifying Target Interest Communities on Twitter: Methods Development and Case Study of the @SafetyMD Network.

    PubMed

    Kandadai, Venk; Yang, Haodong; Jiang, Ling; Yang, Christopher C; Fleisher, Linda; Winston, Flaura Koplin

    2016-05-05

    Little is known about the ability of individual stakeholder groups to achieve health information dissemination goals through Twitter. This study aimed to develop and apply methods for the systematic evaluation and optimization of health information dissemination by stakeholders through Twitter. Tweet content from 1790 followers of @SafetyMD (July-November 2012) was examined. User emphasis, a new indicator of Twitter information dissemination, was defined and applied to retweets across two levels of retweeters originating from @SafetyMD. User interest clusters were identified based on principal component analysis (PCA) and hierarchical cluster analysis (HCA) of a random sample of 170 followers. User emphasis of keywords remained across levels but decreased by 9.5 percentage points. PCA and HCA identified 12 statistically unique clusters of followers within the @SafetyMD Twitter network. This study is one of the first to develop methods for use by stakeholders to evaluate and optimize their use of Twitter to disseminate health information. Our new methods provide preliminary evidence that individual stakeholders can evaluate the effectiveness of health information dissemination and create content-specific clusters for more specific targeted messaging.

  9. The potential of clustering methods to define intersection test scenarios: Assessing real-life performance of AEB.

    PubMed

    Sander, Ulrich; Lubbe, Nils

    2018-04-01

    Intersection accidents are frequent and harmful. The accident types 'straight crossing path' (SCP), 'left turn across path - oncoming direction' (LTAP/OD), and 'left-turn across path - lateral direction' (LTAP/LD) represent around 95% of all intersection accidents and one-third of all police-reported car-to-car accidents in Germany. The European New Car Assessment Program (Euro NCAP) have announced that intersection scenarios will be included in their rating from 2020; however, how these scenarios are to be tested has not been defined. This study investigates whether clustering methods can be used to identify a small number of test scenarios sufficiently representative of the accident dataset to evaluate Intersection Automated Emergency Braking (AEB). Data from the German In-Depth Accident Study (GIDAS) and the GIDAS-based Pre-Crash Matrix (PCM) from 1999 to 2016, containing 784 SCP and 453 LTAP/OD accidents, were analyzed with principal component methods to identify variables that account for the relevant total variances of the sample. Three different methods for data clustering were applied to each of the accident types, two similarity-based approaches, namely Hierarchical Clustering (HC) and Partitioning Around Medoids (PAM), and the probability-based Latent Class Clustering (LCC). The optimum number of clusters was derived for HC and PAM with the silhouette method. The PAM algorithm was both initiated with random start medoid selection and medoids from HC. For LCC, the Bayesian Information Criterion (BIC) was used to determine the optimal number of clusters. Test scenarios were defined from optimal cluster medoids weighted by their real-life representation in GIDAS. The set of variables for clustering was further varied to investigate the influence of variable type and character. We quantified how accurately each cluster variation represents real-life AEB performance using pre-crash simulations with PCM data and a generic algorithm for AEB intervention. 
The usage of different sets of clustering variables resulted in substantially different numbers of clusters. The stability of the resulting clusters increased with prioritization of categorical over continuous variables. For each different set of cluster variables, a strong in-cluster variance of avoided versus non-avoided accidents for the specified Intersection AEB was present. The medoids did not predict the most common Intersection AEB behavior in each cluster. Despite thorough analysis using various cluster methods and variable sets, it was impossible to reduce the diversity of intersection accidents into a set of test scenarios without compromising the ability to predict real-life performance of Intersection AEB. Although this does not imply that other methods cannot succeed, it was observed that small changes in the definition of a scenario resulted in a different avoidance outcome. Therefore, we suggest using limited physical testing to validate more extensive virtual simulations to evaluate vehicle safety. Copyright © 2018 Elsevier Ltd. All rights reserved.
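    The silhouette method used above to pick the cluster count scores a partition by comparing each point's mean within-cluster distance against its mean distance to the nearest other cluster. A minimal brute-force sketch (hypothetical toy data, not GIDAS variables):

```python
import numpy as np

def mean_silhouette(X, labels):
    """Mean silhouette width s_i = (b_i - a_i) / max(a_i, b_i), where a_i is
    the mean distance from point i to its own cluster and b_i the mean
    distance to the nearest other cluster. Higher is better."""
    labels = np.asarray(labels)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))  # pairwise distances
    scores = []
    for i, li in enumerate(labels):
        own = (labels == li) & (np.arange(len(X)) != i)
        a = D[i, own].mean()
        b = min(D[i, labels == lj].mean()
                for lj in set(labels.tolist()) if lj != li)
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))
```

Running this over partitions produced for a range of k and keeping the k with the highest mean silhouette is the selection rule referred to in the abstract.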

  10. Effects of High School Students' Perceptions of School Life Quality on Their Academic Motivation Levels

    ERIC Educational Resources Information Center

    Akin Kösterelioglu, Meltem; Kösterelioglu, Ilker

    2015-01-01

    This study aims to identify the effects of high school students' perceptions of school life quality on their academic motivation levels. The study was conducted on a sample of high school students (n = 2371) in Amasya Province in the fall semester of 2013-2014 academic year. Study sample was selected with the help of cluster sampling method. Data…

  11. In-Service Turkish Elementary and Science Teachers' Attitudes toward Science and Science Teaching: A Sample from Usak Province

    ERIC Educational Resources Information Center

    Turkmen, Lutfullah

    2013-01-01

    The purpose of this study is to reveal Turkish elementary teachers' and science teachers' attitudes toward science and science teaching. The sample of the study, 138 in-service elementary level science teachers from a province of Turkey, was selected by a clustered sampling method. The Science Teaching Attitude Scale-II was employed to measure the…

  12. DAFi: A directed recursive data filtering and clustering approach for improving and interpreting data clustering identification of cell populations from polychromatic flow cytometry data.

    PubMed

    Lee, Alexandra J; Chang, Ivan; Burel, Julie G; Lindestam Arlehamn, Cecilia S; Mandava, Aishwarya; Weiskopf, Daniela; Peters, Bjoern; Sette, Alessandro; Scheuermann, Richard H; Qian, Yu

    2018-04-17

    Computational methods for identification of cell populations from polychromatic flow cytometry data are changing the paradigm of cytometry bioinformatics. Data clustering is the most common computational approach to unsupervised identification of cell populations from multidimensional cytometry data. However, interpretation of the identified data clusters is labor-intensive. Certain types of user-defined cell populations are also difficult to identify by fully automated data clustering analysis. Both are roadblocks before a cytometry lab can adopt the data clustering approach for cell population identification in routine use. We found that combining recursive data filtering and clustering with constraints converted from the user manual gating strategy can effectively address these two issues. We named this new approach DAFi: Directed Automated Filtering and Identification of cell populations. Design of DAFi preserves the data-driven characteristics of unsupervised clustering for identifying novel cell subsets, but also makes the results interpretable to experimental scientists through mapping and merging the multidimensional data clusters into the user-defined two-dimensional gating hierarchy. The recursive data filtering process in DAFi helped identify small data clusters which are otherwise difficult to resolve by a single run of the data clustering method due to the statistical interference of the irrelevant major clusters. Our experiment results showed that the proportions of the cell populations identified by DAFi, while being consistent with those by expert centralized manual gating, have smaller technical variances across samples than those from individual manual gating analysis and the nonrecursive data clustering analysis. Compared with manual gating segregation, DAFi-identified cell populations avoided the abrupt cut-offs on the boundaries. 
DAFi has been implemented to be used with multiple data clustering methods including K-means, FLOCK, FlowSOM, and the ClusterR package. For cell population identification, DAFi supports multiple options including clustering, bisecting, slope-based gating, and reversed filtering to meet various autogating needs from different scientific use cases. © 2018 International Society for Advancement of Cytometry.

  13. Fast Constrained Spectral Clustering and Cluster Ensemble with Random Projection

    PubMed Central

    Liu, Wenfen

    2017-01-01

    The constrained spectral clustering (CSC) method can greatly improve clustering accuracy by incorporating constraint information into spectral clustering and has therefore received wide academic attention. In this paper, we propose a fast CSC algorithm that encodes landmark-based graph construction into a new CSC model and applies random sampling to decrease the data size after spectral embedding. Compared with the original model, the new algorithm yields asymptotically similar results as its model size increases; compared with the most efficient CSC algorithm known, it runs faster and is suitable for a wider range of data sets. We also propose a scalable semisupervised cluster ensemble algorithm by combining our fast CSC algorithm with random-projection dimensionality reduction in the spectral ensemble clustering process. We demonstrate through theoretical analysis and empirical results that the new cluster ensemble algorithm has advantages in efficiency and effectiveness. Furthermore, the approximate preservation of clustering accuracy under random projection, proved for the consensus clustering stage, also holds for weighted k-means clustering, giving a theoretical guarantee for this special kind of k-means in which each point carries its own weight. PMID:29312447
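    The random-projection step leans on the Johnson-Lindenstrauss property that pairwise geometry approximately survives a scaled Gaussian projection. A minimal sketch of that ingredient only (not the paper's landmark-graph construction):

```python
import numpy as np

def random_project(X, k, seed=0):
    """Project rows of X to k dimensions with a scaled Gaussian matrix.
    Pairwise distances and norms are approximately preserved
    (Johnson-Lindenstrauss), so downstream clustering can run on the
    much smaller representation."""
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((X.shape[1], k)) / np.sqrt(k)  # E[||xR||^2] = ||x||^2
    return X @ R
```

On average the squared norms of the projected rows match the originals, which is the property the consensus-clustering guarantee builds on.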

  14. Somatotyping using 3D anthropometry: a cluster analysis.

    PubMed

    Olds, Tim; Daniell, Nathan; Petkov, John; David Stewart, Arthur

    2013-01-01

    Somatotyping is the quantification of human body shape, independent of body size. Hitherto, somatotyping (including the most popular method, the Heath-Carter system) has been based on subjective visual ratings, sometimes supported by surface anthropometry. This study used data derived from three-dimensional (3D) whole-body scans as inputs for cluster analysis to objectively derive clusters of similar body shapes. Twenty-nine dimensions normalised for body size were measured on a purposive sample of 301 adults aged 17-56 years who had been scanned using a Vitus Smart laser scanner. K-means cluster analysis with v-fold cross-validation was used to determine shape clusters. Three male and three female clusters emerged, and were visualised using those scans closest to the cluster centroid and a caricature defined by doubling the difference between the average scan and the cluster centroid. The male clusters were decidedly endomorphic (high fatness), ectomorphic (high linearity), and endo-mesomorphic (a mixture of fatness and muscularity). The female clusters were clearly endomorphic, ectomorphic, and ecto-mesomorphic (a mixture of linearity and muscularity). An objective shape quantification procedure combining 3D scanning and cluster analysis yielded shape clusters strikingly similar to traditional somatotyping.
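    The K-means step above can be sketched in a few lines of Python (plain Lloyd's algorithm with a deterministic spread initialization; the v-fold cross-validation used in the study is omitted):

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Plain Lloyd's k-means: assign each point to its nearest centroid,
    then recompute centroids as cluster means. Initial centroids are
    points spread evenly through the (ordered) data, for determinism."""
    C = X[np.linspace(0, len(X) - 1, k).astype(int)].astype(float)
    for _ in range(iters):
        labels = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(axis=1)
        C = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, C
```

On well-separated toy data this recovers the two groups and their means; production use would add empty-cluster handling and multiple restarts.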

  15. Model-based Clustering of Categorical Time Series with Multinomial Logit Classification

    NASA Astrophysics Data System (ADS)

    Frühwirth-Schnatter, Sylvia; Pamminger, Christoph; Winter-Ebmer, Rudolf; Weber, Andrea

    2010-09-01

    A common problem in many areas of applied statistics is to identify groups of similar time series in a panel of time series. However, distance-based clustering methods cannot easily be extended to time series data, where an appropriate distance measure is rather difficult to define, particularly for discrete-valued time series. Markov chain clustering, proposed by Pamminger and Frühwirth-Schnatter [6], is an approach for clustering discrete-valued time series obtained by observing a categorical variable with several states. This model-based clustering method is based on finite mixtures of first-order time-homogeneous Markov chain models. In order to further explain group membership, we present an extension to the approach of Pamminger and Frühwirth-Schnatter [6] by formulating a probabilistic model for the latent group indicators within the Bayesian classification rule using a multinomial logit model. The parameters are estimated for a fixed number of clusters within a Bayesian framework using a Markov chain Monte Carlo (MCMC) sampling scheme, a (full) Gibbs-type sampler that involves only draws from standard distributions. Finally, an application to a panel of Austrian wage mobility data is presented, which leads to an interesting segmentation of the Austrian labour market.
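    The multinomial logit prior for the latent group indicators is a softmax over linear scores in the covariates. A minimal sketch (the weight matrix W and offsets b here are hypothetical placeholders, not the paper's estimates):

```python
import numpy as np

def multinomial_logit(x, W, b):
    """Group-membership probabilities P(S_i = k | x_i) via a multinomial
    logit: softmax of linear scores x @ W + b, one column per group."""
    z = x @ W + b
    z -= z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)
```

With zero weights and offsets the prior is uniform over groups, which is the natural baseline before any covariate information enters.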

  16. HST observations of globular clusters in M 31. 1: Surface photometry of 13 objects

    NASA Technical Reports Server (NTRS)

    Pecci, F. Fusi; Battistini, P.; Bendinelli, O.; Bonoli, F.; Cacciari, C.; Djorgovski, S.; Federici, L.; Ferraro, F. R.; Parmeggiani, G.; Weir, N.

    1994-01-01

    We present the initial results of a study of globular clusters in M 31, using the Faint Object Camera (FOC) on the Hubble Space Telescope (HST). The sample of objects consists of 13 clusters spanning a range of properties. Three independent image deconvolution techniques were used in order to compensate for the optical problems of the HST, leading to mutually fully consistent results. We present detailed tests and comparisons to determine the reliability and limits of these deconvolution methods, and conclude that high-quality surface photometry of M 31 globulars is possible with the HST data. Surface brightness profiles have been extracted, and core radii, half-light radii, and central surface brightness values have been measured for all of the clusters in the sample. Their comparison with the values from ground-based observations indicates the latter to be systematically and strongly biased by seeing effects, as may be expected. A comparison of the structural parameters with those of the Galactic globulars shows that the structural properties of the M 31 globulars are very similar to those of their Galactic counterparts. A candidate for a post-core-collapse cluster, Bo 343 = G 105, has already been identified from these data; this is the first such detection in the M 31 globular cluster system.

  17. Fitness and Adiposity Are Independently Associated with Cardiometabolic Risk in Youth

    PubMed Central

    Buchan, Duncan S.; Young, John D.; Boddy, Lynne M.; Malina, Robert M.; Baker, Julien S.

    2013-01-01

    Purpose. The purpose of the study was to examine the independent associations of adiposity and cardiorespiratory fitness with clustered cardiometabolic risk. Methods. A cross-sectional sample of 192 adolescents (118 boys), aged 14–16 years, was recruited from a South Lanarkshire school in the West of Scotland. Anthropometry and blood pressure were measured, and blood samples were taken. The 20 m multistage fitness test was the indicator of cardiorespiratory fitness (CRF). A clustered cardiometabolic risk score was constructed from HDL-C (inverted), LDL-C, HOMA, systolic blood pressure, and triglycerides. Interleukin-6, C-reactive protein, and adiponectin were also measured and examined relative to the clustered cardiometabolic risk score, CRF, and adiposity. Results. Although significant, partial correlations between BMI and waist circumference (WC) and both CRF and adiponectin were negative and weak to moderate, while correlations between the BMI and WC and CRP were positive but weak to moderate. Weak to moderate negative associations were also evident for adiponectin with CRP, IL-6, and clustered cardiometabolic risk. WC was positively associated while CRF was negatively associated with clustered cardiometabolic risk. With the additional adjustment for either WC or CRF, the independent associations with cardiometabolic risk persisted. Conclusion. WC and CRF are independently associated with clustered cardiometabolic risk in Scottish adolescents. PMID:23984329

  18. Pattern analysis of schistosomiasis prevalence by exploring predictive modeling in Jiangling County, Hubei Province, P.R. China.

    PubMed

    Xia, Shang; Xue, Jing-Bo; Zhang, Xia; Hu, He-Hua; Abe, Eniola Michael; Rollinson, David; Bergquist, Robert; Zhou, Yibiao; Li, Shi-Zhu; Zhou, Xiao-Nong

    2017-04-26

    The prevalence of schistosomiasis remains a key public health issue in China. Jiangling County in Hubei Province is a typical lake and marshland endemic area. The pattern analysis of schistosomiasis prevalence in Jiangling County is of significant importance for promoting schistosomiasis surveillance and control in similar endemic areas. The dataset was constructed from annual schistosomiasis surveillance data as well as socio-economic data in Jiangling County covering the years 2009 to 2013. A village clustering method modified from the K-means algorithm was used to identify different types of endemic villages. For these identified village clusters, a matrix-based predictive model was developed by means of a one-step backward temporal correlation inference algorithm aiming to estimate the predictive correlations of schistosomiasis prevalence among different years. Field sampling of faeces from domestic animals, as an indicator of potential schistosomiasis prevalence, was carried out and the results were used to validate the proposed models and methods. The prevalence of schistosomiasis in Jiangling County declined year by year. The 198 endemic villages in Jiangling County can be divided into four clusters with reference to the five years' occurrences of schistosomiasis in human, cattle and snail populations. For each identified village cluster, a predictive matrix was generated to characterize the relationships of schistosomiasis prevalence with the historic infection level as well as their associated impact factors. Furthermore, the results of the faecal field sampling agreed with the identified clusters of endemic villages. The village clusters and the predictive matrix can be regarded as the basis for conducting targeted measures for schistosomiasis surveillance and control.
Furthermore, the proposed models and methods can be modified to investigate the schistosomiasis prevalence in other regions as well as be used for investigating other parasitic diseases.

  19. The Atacama Cosmology Telescope (ACT): Beam Profiles and First SZ Cluster Maps

    NASA Technical Reports Server (NTRS)

    Hincks, A. D.; Acquaviva, V.; Ade, P. A.; Aguirre, P.; Amiri, M.; Appel, J. W.; Barrientos, L. F.; Battistelli, E. S.; Bond, J. R.; Brown, B.; hide

    2010-01-01

    The Atacama Cosmology Telescope (ACT) is currently observing the cosmic microwave background with arcminute resolution at 148 GHz, 218 GHz, and 277 GHz. In this paper, we present ACT's first results. Data have been analyzed using a maximum-likelihood map-making method which uses B-splines to model and remove the atmospheric signal. It has been used to make high-precision beam maps from which we determine the experiment's window functions. This beam information directly impacts all subsequent analyses of the data. We also used the method to map a sample of galaxy clusters via the Sunyaev-Zel'dovich (SZ) effect, and show five clusters previously detected with X-ray or SZ observations. We provide integrated Compton-y measurements for each cluster. Of particular interest is our detection of the z = 0.44 component of A3128 and our current non-detection of the low-redshift part, providing strong evidence that the farther cluster is more massive, as suggested by X-ray measurements. This is a compelling example of the redshift-independent mass selection of the SZ effect.

  20. Stochastic coupled cluster theory: Efficient sampling of the coupled cluster expansion

    NASA Astrophysics Data System (ADS)

    Scott, Charles J. C.; Thom, Alex J. W.

    2017-09-01

    We consider the sampling of the coupled cluster expansion within stochastic coupled cluster theory. Observing the limitations of previous approaches due to the inherently non-linear behavior of a coupled cluster wavefunction representation, we propose new approaches based on an intuitive, well-defined condition for sampling weights and on sampling the expansion in cluster operators of different excitation levels. We term these modifications even and truncated selections, respectively. Utilising both approaches demonstrates dramatically improved calculation stability as well as reduced computational and memory costs. These modifications are particularly effective at higher truncation levels owing to the large number of terms within the cluster expansion that can be neglected, as demonstrated by the reduction of the number of terms to be sampled when truncating at triple excitations by 77% and hextuple excitations by 98%.

  1. Visualizing Time-Varying Distribution Data in EOS Application

    NASA Technical Reports Server (NTRS)

    Shen, Han-Wei

    2004-01-01

    In this research, we have developed several novel visualization methods for spatial probability density function data. Our focus has been on 2D spatial datasets, where each pixel is a random variable with multiple samples resulting from experiments on that variable. We developed novel clustering algorithms to reduce the information contained in these datasets and investigated different ways of interpreting and clustering the data.

  2. Combination of multivariate curve resolution and multivariate classification techniques for comprehensive high-performance liquid chromatography-diode array absorbance detection fingerprints analysis of Salvia reuterana extracts.

    PubMed

    Hakimzadeh, Neda; Parastar, Hadi; Fattahi, Mohammad

    2014-01-24

    In this study, multivariate curve resolution (MCR) and multivariate classification methods are proposed to develop a new chemometric strategy for comprehensive analysis of high-performance liquid chromatography-diode array absorbance detection (HPLC-DAD) fingerprints of sixty Salvia reuterana samples from five different geographical regions. Different chromatographic problems that occur during HPLC-DAD analysis of S. reuterana samples, such as baseline/background contribution and noise, low signal-to-noise ratio (S/N), asymmetric peaks, elution time shifts, and peak overlap, are handled using the proposed strategy. In this way, chromatographic fingerprints of the sixty samples are segmented into ten common chromatographic regions using local rank analysis, and the corresponding segments are then column-wise augmented for subsequent MCR analysis. Extended multivariate curve resolution-alternating least squares (MCR-ALS) is used to obtain pure component profiles in each segment. In general, thirty-one chemical components were resolved using MCR-ALS in the sixty S. reuterana samples, and the lack of fit (LOF) values of the MCR-ALS models were below 10.0% in all cases. Pure spectral profiles are used for identification of chemical components by comparing the resolved spectra with standard ones; twenty-four of the thirty-one components were identified. Additionally, pure elution profiles are used to obtain relative concentrations of chemical components in different samples for multivariate classification analysis by principal component analysis (PCA) and k-nearest neighbors (kNN). Inspection of the PCA score plot (explaining 76.1% of the variance with three PCs) showed that the S. reuterana samples belong to four clusters. The degree of class separation (DCS), which quantifies the distance separating clusters in relation to the scatter within each cluster, was calculated for the four clusters and was in the range of 1.6-5.8.
These results are then confirmed by kNN. In addition, according to the PCA loading plot and kNN dendrogram of thirty-one variables, five chemical constituents of luteolin-7-o-glucoside, salvianolic acid D, rosmarinic acid, lithospermic acid and trijuganone A are identified as the most important variables (i.e., chemical markers) for clusters discrimination. Finally, the effect of different chemical markers on samples differentiation is investigated using counter-propagation artificial neural network (CP-ANN) method. It is concluded that the proposed strategy can be successfully applied for comprehensive analysis of chromatographic fingerprints of complex natural samples. Copyright © 2013 Elsevier B.V. All rights reserved.
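    The PCA scores and explained-variance fractions used in the cluster inspection above can be computed from an SVD of the mean-centred data matrix. A minimal sketch on hypothetical rank-one toy data (not the chromatographic concentrations):

```python
import numpy as np

def pca_scores(X, n_pc):
    """PCA via SVD of the mean-centred data matrix: returns the first
    n_pc score columns and the fraction of variance explained per PC."""
    Xc = X - X.mean(axis=0)                       # centre each column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s ** 2 / (s ** 2).sum()                 # variance fractions
    return U[:, :n_pc] * s[:n_pc], var[:n_pc]
```

Plotting the first two or three score columns against each other reproduces the kind of score plot the abstract inspects for cluster structure.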

  3. Co-occurring substance-related and behavioral addiction problems: A person-centered, lay epidemiology approach.

    PubMed

    Konkolÿ Thege, Barna; Hodgins, David C; Wild, T Cameron

    2016-12-01

    Background and aims The aims of this study were (a) to describe the prevalence of single versus multiple addiction problems in a large representative sample and (b) to identify distinct subgroups of people experiencing substance-related and behavioral addiction problems. Methods A random sample of 6,000 respondents from Alberta, Canada, completed survey items assessing self-attributed problems experienced in the past year with four substances (alcohol, tobacco, marijuana, and cocaine) and six behaviors (gambling, eating, shopping, sex, video gaming, and work). Hierarchical cluster analyses were used to classify patterns of co-occurring addiction problems on an analytic subsample of 2,728 respondents (1,696 women and 1,032 men; Mage = 45.1 years, SDage = 13.5 years) who reported problems with one or more of the addictive behaviors in the previous year. Results In the total sample, 49.2% of the respondents reported zero, 29.8% reported one, 13.1% reported two, and 7.9% reported three or more addiction problems in the previous year. Cluster-analytic results suggested a 7-group solution. Members of most clusters were characterized by multiple addiction problems; the average number of past year addictive behaviors in cluster members ranged between 1 (Cluster II: excessive eating only) and 2.5 (Cluster VII: excessive video game playing with the frequent co-occurrence of smoking, excessive eating and work). Discussion and conclusions Our findings replicate previous results indicating that about half of the adult population struggles with at least one excessive behavior in a given year; however, our analyses revealed a higher number of co-occurring addiction clusters than typically found in previous studies.
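    The headline prevalence figures (zero / one / two / three-plus problems) come from simple row sums over a respondents-by-behaviours indicator matrix. A minimal sketch with hypothetical data:

```python
import numpy as np

def problem_count_shares(flags):
    """Share of respondents reporting 0, 1, 2, and 3+ co-occurring
    problems, given a binary respondents-by-behaviours matrix."""
    counts = np.asarray(flags).sum(axis=1)        # problems per respondent
    shares = {k: float((counts == k).mean()) for k in (0, 1, 2)}
    shares["3+"] = float((counts >= 3).mean())
    return shares
```

The same counts-per-respondent vector is also the natural input for the subsequent cluster analysis of co-occurrence patterns.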

  5. Occurrence of Radio Minihalos in a Mass-limited Sample of Galaxy Clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Giacintucci, Simona; Clarke, Tracy E.; Markevitch, Maxim

    2017-06-01

    We investigate the occurrence of radio minihalos—diffuse radio sources of unknown origin observed in the cores of some galaxy clusters—in a statistical sample of 58 clusters drawn from the Planck Sunyaev–Zel’dovich cluster catalog using a mass cut (M500 > 6 × 10^14 M⊙). We supplement our statistical sample with a similarly sized nonstatistical sample mostly consisting of clusters in the ACCEPT X-ray catalog with suitable X-ray and radio data, which includes lower-mass clusters. Where necessary (for nine clusters), we reanalyzed the Very Large Array archival radio data to determine whether a minihalo is present. Our total sample includes all 28 currently known and recently discovered radio minihalos, including six candidates. We classify clusters as cool-core or non-cool-core according to the value of the specific entropy floor in the cluster center, rederived or newly derived from the Chandra X-ray density and temperature profiles where necessary (for 27 clusters). Contrary to the common wisdom that minihalos are rare, we find that almost all cool cores—at least 12 out of 15 (80%)—in our complete sample of massive clusters exhibit minihalos. The supplementary sample shows that the occurrence of minihalos may be lower in lower-mass cool-core clusters. No minihalos are found in non-cool cores or “warm cores.” These findings will help test theories of the origin of minihalos and provide information on the physical processes and energetics of the cluster cores.

  6. Intracluster medium cooling, AGN feedback, and brightest cluster galaxy properties of galaxy groups. Five properties where groups differ from clusters

    NASA Astrophysics Data System (ADS)

    Bharadwaj, V.; Reiprich, T. H.; Schellenberger, G.; Eckmiller, H. J.; Mittal, R.; Israel, H.

    2014-12-01

    Aims: We aim to investigate cool-core and non-cool-core properties of galaxy groups through X-ray data and compare them to the AGN radio output to understand the network of intracluster medium (ICM) cooling and feedback by supermassive black holes. We also aim to investigate the brightest cluster galaxies (BCGs) to see how they are affected by cooling and heating processes, and compare the properties of groups to those of clusters. Methods: Using Chandra data for a sample of 26 galaxy groups, we constrained the central cooling times (CCTs) of the ICM and classified the groups as strong cool-core (SCC), weak cool-core (WCC), and non-cool-core (NCC) based on their CCTs. The total radio luminosity of the BCG was obtained using radio catalogue data and/or literature, which in turn was compared to the cooling time of the ICM to understand the link between gas cooling and radio output. We determined K-band luminosities of the BCG with 2MASS data, and used a scaling relation to constrain the masses of the supermassive black holes, which were then compared to the radio output. We also tested for correlations between the BCG luminosity and the overall X-ray luminosity and mass of the group. The results obtained for the group sample were also compared to previous results for clusters. Results: The observed cool-core/non-cool-core fractions for groups are comparable to those of clusters. However, notable differences are seen: 1) for clusters, all SCCs have a central temperature drop, but for groups this is not the case as some have centrally rising temperature profiles despite very short cooling times; 2) while for the cluster sample, all SCC clusters have a central radio source as opposed to only 45% of the NCCs, for the group sample, all NCC groups have a central radio source as opposed to 77% of the SCC groups; 3) for clusters, there are indications of an anticorrelation trend between radio luminosity and CCT. 
However, for groups this trend is absent; 4) the indication of a trend of radio luminosity with black hole mass observed in SCC clusters is absent for groups; and 5) similarly, the strong correlation observed between the BCG luminosity and the cluster X-ray luminosity/cluster mass weakens significantly for groups. Conclusions: We conclude that there are important differences between clusters and groups within the ICM cooling/AGN feedback paradigm and speculate that more gas is fueling star formation in groups than in clusters where much of the gas is thought to feed the central AGN. Table 6 and Appendices A-C are available in electronic form at http://www.aanda.org

  7. Effects of sampling strategy, detection probability, and independence of counts on the use of point counts

    USGS Publications Warehouse

    Pendleton, G.W.; Ralph, C. John; Sauer, John R.; Droege, Sam

    1995-01-01

    Many factors affect the use of point counts for monitoring bird populations, including sampling strategies, variation in detection rates, and independence of sample points. The most commonly used sampling plans are stratified sampling, cluster sampling, and systematic sampling. Each of these might be most useful for different objectives or field situations. Variation in detection probabilities and lack of independence among sample points can bias estimates and measures of precision. All of these factors should be considered when using point count methods.

  8. Estimation of rank correlation for clustered data.

    PubMed

    Rosner, Bernard; Glynn, Robert J

    2017-06-30

    It is well known that the sample correlation coefficient (R_xy) is the maximum likelihood estimator of the Pearson correlation (ρ_xy) for independent and identically distributed (i.i.d.) bivariate normal data. However, this is not true for ophthalmologic data where X (e.g., visual acuity) and Y (e.g., visual field) are available for each eye and there is positive intraclass correlation for both X and Y in fellow eyes. In this paper, we provide a regression-based approach for obtaining the maximum likelihood estimator of ρ_xy for clustered data, which can be implemented using standard mixed effects model software. This method is also extended to allow for estimation of partial correlation by controlling both X and Y for a vector U of other covariates. In addition, these methods can be extended to allow for estimation of rank correlation for clustered data by (i) converting ranks of both X and Y to the probit scale, (ii) estimating the Pearson correlation between probit scores for X and Y, and (iii) using the relationship between Pearson and rank correlation for bivariate normally distributed data. The validity of the methods in finite-sized samples is supported by simulation studies. Finally, two examples from ophthalmology and analgesic abuse are used to illustrate the methods. Copyright © 2017 John Wiley & Sons, Ltd.
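
    Steps (i)-(iii) can be sketched for the plain i.i.d. case (the paper's mixed-model handling of clustering is not reproduced here); the Pearson-to-rank conversion uses the bivariate-normal identity ρ_s = (6/π) arcsin(r/2).

```python
import math
from statistics import NormalDist

def probit_scores(x):
    # (i) ranks mapped into (0, 1), then through the inverse normal CDF
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    ranks = [0.0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    nd = NormalDist()
    return [nd.inv_cdf(r / (n + 1)) for r in ranks]

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

def rank_correlation(x, y):
    # (ii) Pearson correlation of the probit scores, then
    # (iii) convert back to rank (Spearman-type) correlation
    r = pearson(probit_scores(x), probit_scores(y))
    return (6 / math.pi) * math.asin(r / 2)
```

The sketch ignores ties; with clustered data the Pearson step would be replaced by the paper's mixed-effects estimation.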

  9. Species collapse via hybridization in Darwin's tree finches.

    PubMed

    Kleindorfer, Sonia; O'Connor, Jody A; Dudaniec, Rachael Y; Myers, Steven A; Robertson, Jeremy; Sulloway, Frank J

    2014-03-01

    Species hybridization can lead to fitness costs, species collapse, and novel evolutionary trajectories in changing environments. Hybridization is predicted to be more common when environmental conditions change rapidly. Here, we test patterns of hybridization in three sympatric tree finch species (small tree finch Camarhynchus parvulus, medium tree finch Camarhynchus pauper, and large tree finch Camarhynchus psittacula) that are currently recognized on Floreana Island, Galápagos Archipelago. Genetic analysis of microsatellite data from contemporary samples showed two genetic populations and one hybrid cluster in both 2005 and 2010; hybrid individuals were derived from genetic population 1 (small morph) and genetic population 2 (large morph). Females of the large and rare species were more likely to pair with males of the small common species. Finch populations differed in morphology in 1852-1906 compared with 2005/2010. An unsupervised clustering method showed (a) support for three morphological clusters in the historical tree finch sample (1852-1906), which is consistent with current species recognition; (b) support for two or three morphological clusters in 2005 with some (19%) hybridization; and (c) support for just two morphological clusters in 2010 with frequent (41%) hybridization. We discuss these findings in relation to species demarcations of Camarhynchus tree finches on Floreana Island.

  10. Spectroscopic determination of fundamental parameters of small angular diameter galactic open clusters

    NASA Astrophysics Data System (ADS)

    Ahumada, A. V.; Claria, J. J.; Bica, E.; Parisi, M. C.; Torres, M. C.; Pavani, D. B.

    We present integrated spectra obtained at CASLEO (Argentina) for 9 galactic open clusters of small angular diameter. Two of them (BH 55 and Rup 159) have not been the target of previous research. The flux-calibrated spectra cover the spectral range approx. 3600-6900 Å. Using the equivalent widths (EWs) of the Balmer lines and comparing the cluster spectra with template spectra, we determined E(B-V) colour excesses and ages for the present cluster sample. The parameters obtained for 6 of the clusters show good agreement with previous determinations based mainly on photometric methods. This is not the case, however, for BH 90, a scarcely reddened cluster, for which Moffat and Vogt (1975, Astron. and Astroph. SS, 20, 125) derived E(B-V) = 0.51. We explain and justify the strong discrepancy found for this object. According to the present analysis, 3 clusters are very young (Bo 14, Tr 15 and Tr 27), 2 are moderately young (NGC 6268 and BH 205), 3 are Hyades-like clusters (Rup 164, BH 90 and BH 55) and only one is an intermediate-age cluster (Rup 159).

  11. Applying the Anderson-Darling test to suicide clusters: evidence of contagion at U. S. universities?

    PubMed

    MacKenzie, Donald W

    2013-01-01

    Suicide clusters at Cornell University and the Massachusetts Institute of Technology (MIT) prompted popular and expert speculation of suicide contagion. However, some clustering is to be expected in any random process. This work tested whether suicide clusters at these two universities differed significantly from those expected under a homogeneous Poisson process, in which suicides occur randomly and independently of one another. Suicide dates were collected for MIT and Cornell for 1990-2012. The Anderson-Darling statistic was used to test the goodness-of-fit of the intervals between suicides to the distribution expected under the Poisson process. Suicides at MIT were consistent with the homogeneous Poisson process, while those at Cornell showed clustering inconsistent with such a process (p = .05). The Anderson-Darling test provides a statistically powerful means to identify suicide clustering in small samples. Practitioners can use this method to test for clustering in relevant communities. The difference in clustering behavior between the two institutions suggests that more institutions should be studied to determine the prevalence of suicide clustering in universities and its causes.
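
    The test logic is straightforward to sketch: under a homogeneous Poisson process the gaps between ordered event dates are i.i.d. exponential, so the Anderson-Darling statistic of the gaps can be compared against exponential critical values. The event times below are simulated for illustration; the study's actual dates are not reproduced here.

```python
import numpy as np
from scipy.stats import anderson

def poisson_clustering_test(event_times):
    """Anderson-Darling test of inter-event gaps against the
    exponential distribution implied by a homogeneous Poisson
    process; a statistic above the critical values suggests
    clustering (non-random timing)."""
    gaps = np.diff(np.sort(np.asarray(event_times, dtype=float)))
    res = anderson(gaps, dist='expon')
    return res.statistic, res.critical_values, res.significance_level

# Simulated event days with exponential gaps (mean 30 days),
# i.e., data consistent with the Poisson null.
rng = np.random.default_rng(0)
times = np.cumsum(rng.exponential(30.0, size=40))
stat, crit, sig = poisson_clustering_test(times)
```

Comparing `stat` against `crit` at the desired entry of `sig` gives the accept/reject decision at that significance level.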

  12. Changes to serum sample tube and processing methodology does not cause Intra-Individual [corrected] variation in automated whole serum N-glycan profiling in health and disease.

    PubMed

    Ventham, Nicholas T; Gardner, Richard A; Kennedy, Nicholas A; Shubhakar, Archana; Kalla, Rahul; Nimmo, Elaine R; Fernandes, Daryl L; Satsangi, Jack; Spencer, Daniel I R

    2015-01-01

    Serum N-glycans have been identified as putative biomarkers for numerous diseases. The impact of different serum sample tubes and processing methods on N-glycan analysis has received relatively little attention. This study aimed to determine the effect of different sample tubes and processing methods on the whole serum N-glycan profile in both health and disease. A secondary objective was to describe a robot automated N-glycan release, labeling and cleanup process for use in a biomarker discovery system. 25 patients with active and quiescent inflammatory bowel disease and controls had three different serum sample tubes taken at the same draw. Two different processing methods were used for three types of tube (with and without gel-separation medium). Samples were randomised and processed in a blinded fashion. Whole serum N-glycan release, 2-aminobenzamide labeling and cleanup was automated using a Hamilton Microlab STARlet Liquid Handling robot. Samples were analysed using a hydrophilic interaction liquid chromatography/ethylene bridged hybrid (BEH) column on an ultra-high performance liquid chromatography instrument. Data were analysed quantitatively by pairwise correlation and hierarchical clustering using the area under each chromatogram peak. Qualitatively, a blinded assessor attempted to match chromatograms to each individual. There was small intra-individual variation in serum N-glycan profiles from samples collected using different sample processing methods. Intra-individual correlation coefficients were between 0.99 and 1. Unsupervised hierarchical clustering and principal coordinate analyses accurately matched samples from the same individual. Qualitative analysis demonstrated good chromatogram overlay and a blinded assessor was able to accurately match individuals based on chromatogram profile, regardless of disease status. 
The three different serum sample tubes processed using the described methods cause minimal intra-individual variation in serum whole N-glycan profile when processed using an automated workstream. This has important implications for N-glycan biomarker discovery studies using different serum processing standard operating procedures.

  13. Patterns of Childhood Abuse and Neglect in a Representative German Population Sample

    PubMed Central

    Schilling, Christoph; Weidner, Kerstin; Brähler, Elmar; Glaesmer, Heide; Häuser, Winfried; Pöhlmann, Karin

    2016-01-01

    Background Different types of childhood maltreatment, like emotional abuse, emotional neglect, physical abuse, physical neglect and sexual abuse are interrelated because of their co-occurrence. Different patterns of childhood abuse and neglect are associated with the degree of severity of mental disorders in adulthood. The purpose of this study was (a) to identify different patterns of childhood maltreatment in a representative German community sample, (b) to replicate the patterns of childhood neglect and abuse recently found in a clinical German sample, (c) to examine whether participants reporting exposure to specific patterns of child maltreatment would report different levels of psychological distress, and (d) to compare the results of the typological approach and the results of a cumulative risk model based on our data set. Methods In a cross-sectional survey conducted in 2010, a representative random sample of 2504 German participants aged between 14 and 92 years completed the Childhood Trauma Questionnaire (CTQ). General anxiety and depression were assessed by standardized questionnaires (GAD-2, PHQ-2). Cluster analysis was conducted with the CTQ-subscales to identify different patterns of childhood maltreatment. Results Three different patterns of childhood abuse and neglect could be identified by cluster analysis. Cluster one showed low values on all CTQ-scales. Cluster two showed high values in emotional and physical neglect. Only cluster three showed high values in physical and sexual abuse. The three patterns of childhood maltreatment showed different degrees of depression (PHQ-2) and anxiety (GAD-2). Cluster one showed lowest levels of psychological distress, cluster three showed highest levels of mental distress. 
Conclusion The results show that different types of childhood maltreatment are interrelated and can be grouped into specific patterns of childhood abuse and neglect, which are associated with differing severity of psychological distress in adulthood. The results correspond to those recently found in a German clinical sample and support a typological approach in the research of maltreatment. While cumulative risk models focus on the number of maltreatment types, the typological approach takes the number as well as the severity of the maltreatment types into account. Thus, specific patterns of maltreatment can be examined with regard to specific long-term psychological consequences. PMID:27442446

  14. Evaluation of Primary Immunization Coverage of Infants Under Universal Immunization Programme in an Urban Area of Bangalore City Using Cluster Sampling and Lot Quality Assurance Sampling Techniques

    PubMed Central

    K, Punith; K, Lalitha; G, Suman; BS, Pradeep; Kumar K, Jayanth

    2008-01-01

    Research Question: Is LQAS technique better than cluster sampling technique in terms of resources to evaluate the immunization coverage in an urban area? Objective: To assess and compare the lot quality assurance sampling against cluster sampling in the evaluation of primary immunization coverage. Study Design: Population-based cross-sectional study. Study Setting: Areas under Mathikere Urban Health Center. Study Subjects: Children aged 12 months to 23 months. Sample Size: 220 in cluster sampling, 76 in lot quality assurance sampling. Statistical Analysis: Percentages and proportions, Chi-square test. Results: (1) Using cluster sampling, the percentage of completely immunized, partially immunized and unimmunized children were 84.09%, 14.09% and 1.82%, respectively. With lot quality assurance sampling, it was 92.11%, 6.58% and 1.31%, respectively. (2) Immunization coverage levels as evaluated by cluster sampling technique were not statistically different from the coverage value as obtained by lot quality assurance sampling techniques. Considering the time and resources required, it was found that lot quality assurance sampling is a better technique in evaluating the primary immunization coverage in an urban area. PMID:19876474
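
    A classical binomial LQAS decision rule (the design that ignores clustering) can be derived by searching for the smallest sample size n and threshold d that bound both misclassification errors. The coverage thresholds 0.8/0.5 below are illustrative, not the targets used in the study.

```python
from math import comb

def binom_cdf(k, n, p):
    # P(X <= k) for X ~ Binomial(n, p)
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def lqas_design(p_high, p_low, alpha=0.05, beta=0.05, n_max=200):
    """Smallest (n, d) such that, classifying coverage as adequate
    when more than d of n sampled children are immunized:
      P(classify low      | true coverage p_high) <= alpha, and
      P(classify adequate | true coverage p_low)  <= beta."""
    for n in range(1, n_max + 1):
        for d in range(n + 1):
            a = binom_cdf(d, n, p_high)        # good lot called low
            b = 1 - binom_cdf(d, n, p_low)     # bad lot called adequate
            if a <= alpha and b <= beta:
                return n, d
    return None
```

The search returns the cheapest design meeting both error bounds; cluster LQAS variants inflate n to account for within-cluster correlation.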

  15. A priori evaluation of two-stage cluster sampling for accuracy assessment of large-area land-cover maps

    USGS Publications Warehouse

    Wickham, J.D.; Stehman, S.V.; Smith, J.H.; Wade, T.G.; Yang, L.

    2004-01-01

    Two-stage cluster sampling reduces the cost of collecting accuracy assessment reference data by constraining sample elements to fall within a limited number of geographic domains (clusters). However, because classification error is typically positively spatially correlated, within-cluster correlation may reduce the precision of the accuracy estimates. The detailed population information to quantify a priori the effect of within-cluster correlation on precision is typically unavailable. Consequently, a convenient, practical approach to evaluate the likely performance of a two-stage cluster sample is needed. We describe such an a priori evaluation protocol focusing on the spatial distribution of the sample by land-cover class across different cluster sizes and costs of different sampling options, including options not imposing clustering. This protocol also assesses the two-stage design's adequacy for estimating the precision of accuracy estimates for rare land-cover classes. We illustrate the approach using two large-area, regional accuracy assessments from the National Land-Cover Data (NLCD), and describe how the a priori evaluation was used as a decision-making tool when implementing the NLCD design.
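
    The precision penalty from within-cluster correlation described above is commonly summarized by the textbook design effect, deff = 1 + (m - 1)ρ, where m is the cluster size and ρ the intraclass correlation. A minimal sketch (this is the standard approximation, not the paper's evaluation protocol):

```python
def design_effect(m, icc):
    """Textbook design effect for cluster samples with m elements
    per cluster and intraclass correlation icc."""
    return 1 + (m - 1) * icc

def effective_sample_size(n_total, m, icc):
    """Equivalent number of simple-random-sample elements giving
    the same precision as the clustered sample."""
    return n_total / design_effect(m, icc)
```

For example, 1,000 pixels in clusters of 10 with ρ = 0.1 carry roughly the information of 526 independent pixels, which is why clustering cheapens data collection at a real cost in precision.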

  16. On the absence of radio haloes in clusters with double relics

    NASA Astrophysics Data System (ADS)

    Bonafede, A.; Cassano, R.; Brüggen, M.; Ogrean, G. A.; Riseley, C. J.; Cuciti, V.; de Gasperin, F.; Golovich, N.; Kale, R.; Venturi, T.; van Weeren, R. J.; Wik, D. R.; Wittman, D.

    2017-09-01

    Pairs of radio relics are believed to form during cluster mergers, and are best observed when the merger occurs in the plane of the sky. Mergers can also produce radio haloes, through complex processes likely linked to turbulent re-acceleration of cosmic ray electrons. However, only some clusters with double relics also show a radio halo. Here, we present a novel method to derive upper limits on the radio halo emission, and analyse archival X-ray Chandra data, as well as galaxy velocity dispersions and lensing data, in order to understand the key parameter that switches on radio halo emission. We place upper limits on the halo power below the P(1.4 GHz)-M500 correlation for some clusters, confirming that clusters with double relics have different radio properties. Computing X-ray morphological indicators, we find that clusters with double relics are associated with the most disturbed clusters. We also investigate the role of different mass-ratios and time-since-merger. Data do not indicate that the merger mass-ratio has an impact on the presence or absence of radio haloes (the null hypothesis that the clusters belong to the same group cannot be rejected). However, the data suggest that the absence of radio haloes could be associated with early and late mergers, but the sample is too small to perform a statistical test. Our study is limited by the small number of clusters with double relics. Future surveys with LOFAR, ASKAP, MeerKAT and SKA will provide larger samples to better address this issue.

  17. Diary Data Subjected to Cluster Analysis of Intake/Output/Void Habits with Resulting Clusters Compared by Continence Status, Age, Race

    PubMed Central

    Miller, Janis M; Guo, Ying; Rodseth, Sarah Becker

    2011-01-01

    Background Data that incorporate the full complexity of healthy beverage intake and voiding frequency do not exist; therefore, clinicians reviewing bladder habits or voiding diaries for continence care must rely on expert opinion recommendations. Objective To use data-driven cluster analyses to reduce complex voiding diary variables into discrete patterns or data cluster profiles, descriptively name the clusters, and perform validity testing. Method Participants were 352 community women who filled out a 3-day voiding diary. Six variables (void frequency during daytime hours, void frequency during nighttime hours, modal output, total output, total intake, and body mass index) were entered into cluster analyses. The clusters were analyzed for differences by continence status, age, race (Black women, n = 196; White women, n = 156), and for those who were incontinent, by leakage episode severity. Results Three clusters emerged, labeled descriptively as Conventional, Benchmark, and Superplus. The Conventional cluster (68% of the sample) demonstrated mean daily intake of 45 ± 13 ounces, mean daily output of 37 ± 15 ounces, mean daily voids 5 ± 2 times, mean modal daytime output 10 ± 0.5 ounces, and mean nighttime voids 1 ± 1 times. The Superplus cluster (7% of the sample) showed double or triple these values across the 5 variables, and the Benchmark cluster (25%) showed values consistent with current popular recommendations on intake and output (e.g., meeting or exceeding the 8 × 8 fluid intake rule of thumb). The clusters differed significantly (p < .05) by age, race, amount of irritating beverages consumed, and incontinence status. Discussion Identification of three discrete clusters provides for a potential parsimonious but data-driven means of classifying individuals for additional epidemiological or clinical study. The clinical utility rests with potential for intervening to move an individual from a high risk to low risk cluster with regard to incontinence. 
PMID:21317828

  18. Finding SDSS Galaxy Clusters in 4-dimensional Color Space Using the False Discovery Rate

    NASA Astrophysics Data System (ADS)

    Nichol, R. C.; Miller, C. J.; Reichart, D.; Wasserman, L.; Genovese, C.; SDSS Collaboration

    2000-12-01

    We describe a recently developed statistical technique that provides a meaningful cut-off in probability-based decision making. We are concerned with multiple testing, where each test produces a well-defined probability (or p-value). By well-defined, we mean that the null hypothesis used to determine the p-value is fully understood and appropriate. The method is called the False Discovery Rate (FDR) and its largest advantage over other measures is that it allows one to specify a maximal amount of acceptable error. As an example of this tool, we apply FDR to a four-dimensional clustering algorithm using SDSS data. For each galaxy (or test galaxy), we count the number of neighbors that fit within one standard deviation of a four dimensional Gaussian centered on that test galaxy. The mean and standard deviation of that Gaussian are determined from the colors and errors of the test galaxy. We then take that same Gaussian and place it on a random selection of n galaxies and make a similar count. In the limit of large n, we expect the median count around these random galaxies to represent a typical field galaxy. For every test galaxy we determine the probability (or p-value) that it is a field galaxy based on these counts. A low p-value implies that the test galaxy is in a cluster environment. Once we have a p-value for every galaxy, we use FDR to determine at what level we should make our probability cut-off. Once this cut-off is made, we have a final sample of galaxies that are cluster-like galaxies. Using FDR, we also know the maximum amount of field contamination in our cluster galaxy sample. We present our preliminary galaxy clustering results using these methods.
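
    The FDR cut-off referred to here is the Benjamini-Hochberg step-up rule: sort the m p-values, find the largest rank k with p_(k) ≤ qk/m, and reject the k smallest. A minimal sketch:

```python
def benjamini_hochberg(p_values, q=0.05):
    """Indices of tests rejected at FDR level q using the
    Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        # compare the rank-th smallest p-value to its BH threshold
        if p_values[i] <= q * rank / m:
            k_max = rank
    # reject the k_max smallest p-values
    return sorted(order[:k_max])
```

In the galaxy application, each index would correspond to a test galaxy, and the rejected set is the sample of cluster-like galaxies with guaranteed bounded field contamination in expectation.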

  19. Kappa statistic for clustered matched-pair data.

    PubMed

    Yang, Zhao; Zhou, Ming

    2014-07-10

    Kappa statistic is widely used to assess the agreement between two procedures in the independent matched-pair data. For matched-pair data collected in clusters, on the basis of the delta method and sampling techniques, we propose a nonparametric variance estimator for the kappa statistic without within-cluster correlation structure or distributional assumptions. The results of an extensive Monte Carlo simulation study demonstrate that the proposed kappa statistic provides consistent estimation and the proposed variance estimator behaves reasonably well for at least a moderately large number of clusters (e.g., K ≥ 50). Compared with the variance estimator ignoring dependence within a cluster, the proposed variance estimator performs better in maintaining the nominal coverage probability when the intra-cluster correlation is fair (ρ ≥ 0.3), with more pronounced improvement when ρ is further increased. To illustrate the practical application of the proposed estimator, we analyze two real data examples of clustered matched-pair data. Copyright © 2014 John Wiley & Sons, Ltd.
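
    For reference, the kappa point estimate itself comes from the usual 2 × 2 matched-pair table; the paper's contribution is the cluster-robust variance estimator, which is not reproduced in this sketch.

```python
def cohens_kappa(table):
    """Cohen's kappa from a 2x2 matched-pair agreement table
    [[n11, n10], [n01, n00]], where n11 = both procedures positive,
    n00 = both negative, and the off-diagonals are disagreements."""
    n = sum(sum(row) for row in table)
    po = (table[0][0] + table[1][1]) / n          # observed agreement
    p1 = (table[0][0] + table[0][1]) / n          # procedure A positive
    p2 = (table[0][0] + table[1][0]) / n          # procedure B positive
    pe = p1 * p2 + (1 - p1) * (1 - p2)            # chance agreement
    return (po - pe) / (1 - pe)
```

With clustered pairs (e.g., two eyes per patient), this point estimate is unchanged, but its variance must account for within-cluster dependence, which is what the proposed estimator supplies.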

  20. The Influence of Social Network Characteristics on Peer Clustering in Smoking: A Two-Wave Panel Study of 19- and 23-Year-Old Swedes

    PubMed Central

    Rostila, Mikael; Edling, Christofer; Rydgren, Jens

    2016-01-01

    Objectives The present study examines how the composition of social networks and perceived relationship content influence peer clustering in smoking, and how the association changes during the transition from late adolescence to early adulthood. Methods The analysis was based on a Swedish two-wave survey sample comprising ego-centric network data. Respondents were 19 years old in the initial wave, and 23 when the follow-up survey was conducted. 17,227 ego-alter dyads were included in the analyses, which corresponds to an average response rate of 48.7 percent. Random effects logistic regression models were performed to calculate gender-specific average marginal effects of social network characteristics on smoking. Results The association of egos’ and alters’ smoking behavior was confirmed and found to be stronger in the female sample. For females, the associations decreased between age 19 and 23. Interactions between network characteristics and peer clustering in smoking showed that intense social interactions with smokers increase egos’ smoking probability. The influence of network structures on peer clustering in smoking decreased during the transition from late adolescence to early adulthood. Conclusions The study confirmed peer clustering in smoking and revealed that females’ smoking behavior in particular is determined by social interactions. Female smokers’ propensity to interact with other smokers was found to be associated with the quality of peer relationships, frequent social interactions, and network density. The influence of social networks on peer clustering in smoking decreased during the transition from late adolescence to early adulthood. PMID:27727314

  1. Dispositional Mindfulness, Spirituality, and Substance Use in Predicting Depressive Symptoms in a Treatment-Seeking Sample

    PubMed Central

    Shorey, Ryan C.; Gawrysiak, Michael J.; Anderson, Scott; Stuart, Gregory L.

    2015-01-01

    Objective It is imperative that research identifies factors related to depression among individuals in substance use treatment, as depression is associated with substance use relapse. Dispositional mindfulness and spirituality may bear an important role in the relationship between depression and substance use. Method Using preexisting patient medical records (N = 105), the current study investigated dispositional mindfulness and spirituality in relation to depressive symptom clusters (affective, cognitive, and physiological) among men in residential substance use treatment. The mean age of the sample was 41.03 (standard deviation = 10.75). Results Findings demonstrated that dispositional mindfulness and spirituality were negatively associated with depressive symptoms. After controlling for age, alcohol use, and drug use, dispositional mindfulness remained negatively associated with all of the depression clusters. Spirituality only remained associated with the cognitive depression cluster. Conclusion Mindfulness-based interventions may hold promise as an effective intervention for reducing substance use and concurrent depressive symptoms. PMID:25522300

  2. Soft-landing ion mobility of silver clusters for small-molecule matrix-assisted laser desorption ionization mass spectrometry and imaging of latent fingerprints.

    PubMed

    Walton, Barbara L; Verbeck, Guido F

    2014-08-19

    Matrix-assisted laser desorption ionization (MALDI) imaging is gaining popularity, but matrix effects such as mass spectral interference and damage to the sample limit its applications. Replacing traditional matrices with silver particles capable of equivalent or increased photon energy absorption from the incoming laser has proven to be beneficial for low mass analysis. Not only can silver clusters be advantageous for low mass compound detection, but they can be used for imaging as well. Conventional matrix application methods can obstruct samples, such as fingerprints, rendering them useless after mass analysis. The ability to image latent fingerprints without causing damage to the ridge pattern is important as it allows for further characterization of the print. The application of silver clusters by soft-landing ion mobility allows for enhanced MALDI and preservation of fingerprint integrity.

  3. A novel tensile test method to assess texture and gaping in salmon fillets.

    PubMed

    Ashton, Thomas J; Michie, Ian; Johnston, Ian A

    2010-05-01

    A new tensile strength method was developed to quantify the force required to tear a standardized block of Atlantic salmon muscle with the aim of identifying those samples more prone to factory downgrading as a result of softness and fillet gaping. The new method effectively overcomes problems of sample attachment encountered with previous tensile strength tests. The repeatability, sensitivity, and predictability of the new technique were evaluated against other common instrumental texture measurement methods. The relationship between sensory assessments of firmness and parameters from the instrumental texture methods was also determined. Data from the new method were shown to have the strongest correlations with gaping severity (r = -0.514, P < 0.001) and the highest level of repeatability of data when analyzing cold-smoked samples. The Warner Bratzler shear method gave the most repeatable data from fresh samples and had the highest correlations between fresh and smoked product from the same fish (r = 0.811, P < 0.001). A hierarchical cluster analysis placed the tensile test in the top cluster, alongside the Warner Bratzler method, demonstrating that it also yields adequate data with respect to these tests. None of the tested sensory analysis attributes showed significant relationships to mechanical tests except fillet firmness, with correlations (r) of 0.42 for cylinder probe maximum force (P = 0.005) and 0.31 for tensile work (P = 0.04). It was concluded that the tensile test method developed provides an important addition to the available tools for mechanical analysis of salmon quality, particularly with respect to the prediction of gaping during factory processing, which is a serious commercial problem. This novel, reliable method of measuring flesh tensile strength in salmon provides data of direct relevance to gaping.

  4. Functional Analyses of NSF1 in Wine Yeast Using Interconnected Correlation Clustering and Molecular Analyses

    PubMed Central

    Bessonov, Kyrylo; Walkey, Christopher J.; Shelp, Barry J.; van Vuuren, Hennie J. J.; Chiu, David; van der Merwe, George

    2013-01-01

    Analyzing time-course expression data captured in microarray datasets is a complex undertaking because a vast and complex data space is represented by relatively few samples compared with the thousands of available genes. Here, we developed the Interdependent Correlation Clustering (ICC) method to analyze relationships that exist among genes conditioned on the expression of a specific target gene in microarray data. Based on Correlation Clustering, the ICC method analyzes a large set of correlation values related to gene expression profiles extracted from given microarray datasets. ICC can be applied to any microarray dataset and any target gene. We applied this method to microarray data generated from wine fermentations and selected NSF1, which encodes a C2H2 zinc finger-type transcription factor, as the target gene. The validity of the method was verified by accurate identifications of the previously known functional roles of NSF1. In addition, we identified and verified potential new functions for this gene; specifically, NSF1 is a negative regulator for the expression of sulfur metabolism genes, the nuclear localization of Nsf1 protein (Nsf1p) is controlled in a sulfur-dependent manner, and the transcription of NSF1 is regulated by Met4p, an important transcriptional activator of sulfur metabolism genes. The inter-disciplinary approach adopted here highlighted the accuracy and relevancy of the ICC method in mining for novel gene functions using complex microarray datasets with a limited number of samples. PMID:24130853

  5. The Far-Field Hubble Constant

    NASA Astrophysics Data System (ADS)

    Lauer, Tod

    1995-07-01

    We request deep, near-IR (F814W) WFPC2 images of five nearby Brightest Cluster Galaxies (BCG) to calibrate the BCG Hubble diagram by the Surface Brightness Fluctuation (SBF) method. Lauer & Postman (1992) show that the BCG Hubble diagram measured out to 15,000 km s^-1 is highly linear. Calibration of the Hubble diagram zeropoint by SBF will thus yield an accurate far-field measure of H_0 based on the entire volume within 15,000 km s^-1, thus circumventing any strong biases caused by local peculiar velocity fields. This method of reaching the far field is contrasted with those using distance ratios between Virgo and Coma, or any other limited sample of clusters. HST is required as the ground-based SBF method is limited to <3,000 km s^-1. The high spatial resolution of HST allows precise measurement of the SBF signal at large distances, and allows easy recognition of globular clusters, background galaxies, and dust clouds in the BCG images that must be removed prior to SBF detection. The proposing team developed the SBF method, the first BCG Hubble diagram based on a full-sky, volume-limited BCG sample, played major roles in the calibration of WFPC and WFPC2, and are conducting observations of local galaxies that will validate the SBF zeropoint (through GTO programs). This work uses the SBF method to tie both the Cepheid and Local Group giant-branch distances generated by HST to the large scale Hubble flow, which is most accurately traced by BCGs.

  6. Crystal genes in a marginal glass-forming system of Ni50Zr50

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wen, T. Q.; Tang, L.; Sun, Y.

    Glass-forming motifs with B2 traits are found. A perfect Ni-centered B33 motif deteriorates the glass-forming ability of Ni50Zr50. The marginal glass-forming ability (GFA) of the binary Ni-Zr system is an issue to be explained considering the numerous bulk metallic glasses (BMGs) found in the Cu-Zr system. Using molecular dynamics, the structures and dynamics of Ni50Zr50 metallic liquid and glass are investigated at the atomistic level. To achieve a well-relaxed glassy sample, a sub-Tg annealing method is applied, and the final sample is closer to the experiments than models prepared by continuous cooling. With state-of-the-art structural analysis tools such as the cluster alignment and pair-wise alignment methods, two glass-forming motifs with some mixed traits of the metastable B2 crystalline phase and the crystalline Ni-centered B33 motif are found to be dominant in the undercooled liquid and glass samples. A new chemical order characterization on each short-range order (SRO) structure is accomplished based on the cluster alignment method. The significant amount of the crystalline motif and the few icosahedra in the glassy sample deteriorate the GFA.

  7. Crystal genes in a marginal glass-forming system of Ni50Zr50

    DOE PAGES

    Wen, T. Q.; Tang, L.; Sun, Y.; ...

    2017-10-17

    Glass-forming motifs with B2 traits are found. A perfect Ni-centered B33 motif deteriorates the glass-forming ability of Ni50Zr50. The marginal glass-forming ability (GFA) of the binary Ni-Zr system is an issue to be explained considering the numerous bulk metallic glasses (BMGs) found in the Cu-Zr system. Using molecular dynamics, the structures and dynamics of Ni50Zr50 metallic liquid and glass are investigated at the atomistic level. To achieve a well-relaxed glassy sample, a sub-Tg annealing method is applied, and the final sample is closer to the experiments than models prepared by continuous cooling. With state-of-the-art structural analysis tools such as the cluster alignment and pair-wise alignment methods, two glass-forming motifs with some mixed traits of the metastable B2 crystalline phase and the crystalline Ni-centered B33 motif are found to be dominant in the undercooled liquid and glass samples. A new chemical order characterization on each short-range order (SRO) structure is accomplished based on the cluster alignment method. The significant amount of the crystalline motif and the few icosahedra in the glassy sample deteriorate the GFA.

  8. Quasi-Likelihood Techniques in a Logistic Regression Equation for Identifying Simulium damnosum s.l. Larval Habitats Intra-cluster Covariates in Togo.

    PubMed

    Jacob, Benjamin G; Novak, Robert J; Toe, Laurent; Sanfo, Moussa S; Afriyie, Abena N; Ibrahim, Mohammed A; Griffith, Daniel A; Unnasch, Thomas R

    2012-01-01

    The standard methods for regression analyses of clustered riverine larval habitat data of Simulium damnosum s.l., a major black-fly vector of onchocerciasis, postulate models relating observational ecological-sampled parameter estimators to prolific habitats without accounting for residual intra-cluster error correlation effects. Generally, this correlation comes from two sources: (1) the design of the random effects and their assumed covariance from the multiple levels within the regression model; and (2) the correlation structure of the residuals. Unfortunately, inconspicuous errors in residual intra-cluster correlation estimates can overstate precision in forecasted S. damnosum s.l. riverine larval habitat explanatory attributes regardless of how they are treated (e.g., independent, autoregressive, Toeplitz, etc.). In this research, the geographical locations of multiple riverine-based S. damnosum s.l. larval ecosystem habitats sampled from 2 pre-established epidemiological sites in Togo were identified and recorded from July 2009 to June 2010. Initially, the data were aggregated in PROC GENMOD. An agglomerative hierarchical residual cluster-based analysis was then performed. The sampled clustered study site data were then analyzed for statistical correlations using Monthly Biting Rates (MBR). Euclidean distance measurements and terrain-related geomorphological statistics were then generated in ArcGIS. A digital overlay was then performed, also in ArcGIS, using the georeferenced ground coordinates of high- and low-density clusters stratified by Annual Biting Rates (ABR). This data was overlain onto multitemporal sub-meter pixel resolution satellite data (i.e., QuickBird 0.61 m wavebands). Orthogonal spatial filter eigenvectors were then generated in SAS/GIS. Univariate and non-linear regression-based models (i.e., Logistic, Poisson and Negative Binomial) were also employed to determine probability distributions and to identify statistically significant parameter estimators from the sampled data. Thereafter, Durbin-Watson test statistics were used to test the null hypothesis that the regression residuals were not autocorrelated against the alternative that the residuals followed an autoregressive process in AUTOREG. Bayesian uncertainty matrices were also constructed employing normal priors for each of the sampled estimators in PROC MCMC. The residuals revealed both spatially structured and unstructured error effects in the high and low ABR-stratified clusters. The analyses also revealed that the estimators levels of turbidity and presence of rocks were statistically significant for the high-ABR-stratified clusters, while the estimators distance between habitats and floating vegetation were important for the low-ABR-stratified cluster. Varying and constant coefficient regression models, ABR-stratified GIS-generated clusters, sub-meter resolution satellite imagery, a robust residual intra-cluster diagnostic test, MBR-based histograms, eigendecomposition spatial filter algorithms and Bayesian matrices can enable accurate autoregressive estimation of latent uncertainty effects and other residual error probabilities (i.e., heteroskedasticity) for testing correlations between georeferenced S. damnosum s.l. riverine larval habitat estimators. The asymptotic distribution of the resulting residual-adjusted intra-cluster predictor error autocovariate coefficients can thereafter be established, while estimates of the asymptotic variance can lead to the construction of approximate confidence intervals for accurately targeting productive S. damnosum s.l. habitats based on spatiotemporal field-sampled count data.
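    The Durbin-Watson statistic used above has a simple closed form, DW = Σt (e_t − e_{t−1})² / Σt e_t², with values near 2 indicating no first-order autocorrelation and values near 0 indicating positive autocorrelation. A minimal numpy sketch on simulated residuals (illustrative only, not the authors' habitat data):

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic: ~2 suggests no first-order autocorrelation;
    values toward 0 suggest positive autocorrelation in the residuals."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(0)
white = rng.normal(size=500)           # independent residuals
ar = np.zeros(500)                     # positively autocorrelated residuals
for t in range(1, 500):
    ar[t] = 0.8 * ar[t - 1] + rng.normal()

print(round(durbin_watson(white), 2))  # near 2
print(round(durbin_watson(ar), 2))     # well below 2
```

    For an AR(1) process with coefficient rho, DW is approximately 2(1 − rho), which is why the autocorrelated series lands far below 2.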

  9. Affective Structures among Students and Its Relationship with Academic Burnout with Emphasis on Gender

    ERIC Educational Resources Information Center

    Bikar, Somaye; Marziyeh, Afsaneh; Pourghaz, Abdolwahab

    2018-01-01

    This study aimed to determine the relationship between affective structures and academic burnout among male and female third-grade high school students in Zahedan in the academic year 2016-2017. The current descriptive-correlational study had a sample including 362 students selected using a multistage cluster sampling method. To collect data,…

  10. Sexual Abuse among Female High School Students in Istanbul, Turkey

    ERIC Educational Resources Information Center

    Alikasifoglu, Mujgan; Erginoz, Ethem; Ercan, Oya; Albayrak-Kaymak, Deniz; Uysal, Omer; Ilter, Ozdemir

    2006-01-01

    Objective: The objective of the study was to determine the prevalence of sexual abuse in female adolescents in Istanbul, Turkey from data collected as part of a school-based population study on health and health behaviors. Method: A stratified cluster sampling procedure was used for this cross-sectional study. The study sample included 1,955…

  11. A Multidimensional Examination of the Acculturation and Psychological Functioning of a Sample of Immigrant Chinese Mothers in the US

    ERIC Educational Resources Information Center

    Tahseen, Madiha; Cheah, Charissa S. L.

    2012-01-01

    The present research used the cluster analysis method to examine the acculturation of immigrant Chinese mothers (ICMs), and the demographic characteristics and psychological functioning associated with each acculturation style. The sample was comprised of 83 first-generation ICMs of preschool children residing in Maryland, United States (US).…

  12. Measurement of surface roughness changes of unpolished and polished enamel following erosion

    PubMed Central

    Austin, Rupert S.; Parkinson, Charles R.; Hasan, Adam; Bartlett, David W.

    2017-01-01

    Objectives To determine whether Sa roughness data from measuring one central location of unpolished and polished enamel were representative of the overall surfaces before and after erosion. Methods Twenty human enamel sections (4 × 4 mm) were embedded in bis-acryl composite and randomised to either a native or a polishing enamel preparation protocol. Enamel samples were subjected to an acid challenge (15 minutes in 100 mL orange juice, pH 3.2, titratable acidity 41.3 mmol OH/L, 62.5 rpm agitation, repeated for three cycles). Median (IQR) surface roughness [Sa] was measured at baseline and after erosion from both a centralised cluster and four peripheral clusters. Within each cluster, five smaller areas (0.04 mm2) provided the Sa roughness data. Results For both unpolished and polished enamel samples there were no significant differences between measuring one central cluster or four peripheral clusters, before and after erosion. For unpolished enamel the single central cluster had a median (IQR) Sa roughness of 1.45 (2.58) μm and the four peripheral clusters had a median (IQR) of 1.32 (4.86) μm before erosion; after erosion there were statistically significant reductions to 0.38 (0.35) μm and 0.34 (0.49) μm respectively (p<0.0001). Polished enamel had a median (IQR) Sa roughness of 0.04 (0.17) μm for the single central cluster and 0.05 (0.15) μm for the four peripheral clusters, which statistically significantly increased after erosion to 0.27 (0.08) μm for both (p<0.0001). Conclusion Measuring one central cluster of unpolished and polished enamel was representative of the overall enamel surface roughness, before and after erosion. PMID:28771562
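    The Sa parameter reported above is the arithmetical mean height: the mean absolute deviation of surface heights from the mean plane. A small numpy sketch on hypothetical height maps (the surfaces below are simulated, not the study's profilometry data):

```python
import numpy as np

def sa_roughness(heights):
    """Arithmetical mean height Sa: mean absolute deviation of surface
    heights from the mean plane (here approximated by the mean height)."""
    z = np.asarray(heights, dtype=float)
    return np.mean(np.abs(z - z.mean()))

# Hypothetical 200x200 height maps (micrometres): rough native surface
# versus a smoother, eroded surface
rng = np.random.default_rng(1)
rough = rng.normal(0.0, 1.8, size=(200, 200))
smooth = rng.normal(0.0, 0.4, size=(200, 200))
print(sa_roughness(rough) > sa_roughness(smooth))  # True
```

    Real instruments first level the measured area against a fitted mean plane; subtracting the mean height plays that role for this flat synthetic example.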

  13. The optimal design of stepped wedge trials with equal allocation to sequences and a comparison to other trial designs.

    PubMed

    Thompson, Jennifer A; Fielding, Katherine; Hargreaves, James; Copas, Andrew

    2017-12-01

    Background/Aims We sought to optimise the design of stepped wedge trials with an equal allocation of clusters to sequences and explored sample size comparisons with alternative trial designs. Methods We developed a new expression for the design effect for a stepped wedge trial, assuming that observations are equally correlated within clusters and an equal number of observations in each period between sequences switching to the intervention. We minimised the design effect with respect to (1) the fraction of observations before the first and after the final sequence switches (the periods with all clusters in the control or intervention condition, respectively) and (2) the number of sequences. We compared the design effect of this optimised stepped wedge trial to the design effects of a parallel cluster-randomised trial, a cluster-randomised trial with baseline observations, and a hybrid trial design (a mixture of cluster-randomised trial and stepped wedge trial) with the same total cluster size for all designs. Results We found that a stepped wedge trial with an equal allocation to sequences is optimised by obtaining all observations after the first sequence switches and before the final sequence switches to the intervention; this means that the first sequence remains in the control condition and the last sequence remains in the intervention condition for the duration of the trial. With this design, the optimal number of sequences is [Formula: see text], where [Formula: see text] is the cluster-mean correlation, [Formula: see text] is the intracluster correlation coefficient, and m is the total cluster size. The optimal number of sequences is small when the intracluster correlation coefficient and cluster size are small and large when the intracluster correlation coefficient or cluster size is large. A cluster-randomised trial remains more efficient than the optimised stepped wedge trial when the intracluster correlation coefficient or cluster size is small. 
A cluster-randomised trial with baseline observations always requires a larger sample size than the optimised stepped wedge trial. The hybrid design can always give an equally or more efficient design, but will be at most 5% more efficient. We provide a strategy for selecting a design if the optimal number of sequences is unfeasible. For a non-optimal number of sequences, the sample size may be reduced by allowing a proportion of observations before the first or after the final sequence has switched. Conclusion The standard stepped wedge trial is inefficient. To reduce sample sizes when a hybrid design is unfeasible, stepped wedge trial designs should have no observations before the first sequence switches or after the final sequence switches.
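    The efficiency comparisons above can be reproduced numerically without recalling a closed-form design effect: assume a standard mixed-model formulation for cluster-period means (fixed period effects plus a random cluster intercept; an assumption for illustration, not necessarily the paper's exact expression), build each cluster's covariance matrix, and compute the GLS variance of the treatment effect. A sketch with illustrative parameter values:

```python
import numpy as np

def var_treatment_effect(switch_times, T, sigma2_m, tau2):
    """GLS variance of the treatment effect for cluster-period means:
    fixed period effects, random cluster intercept (variance tau2), and
    residual variance sigma2_m = sigma^2 / m per cluster-period mean.
    switch_times[i] is the first intervention period of cluster i
    (1-based); 0 means the cluster never receives the intervention."""
    V = sigma2_m * np.eye(T) + tau2 * np.ones((T, T))
    Vinv = np.linalg.inv(V)
    info = np.zeros((T + 1, T + 1))
    for s in switch_times:
        X = np.zeros((T, T + 1))
        X[:, :T] = np.eye(T)          # period fixed effects
        if s:
            X[s - 1:, T] = 1.0        # intervention indicator
        info += X.T @ Vinv @ X
    return np.linalg.inv(info)[T, T]  # variance of the treatment coefficient

T = 5
sw = [2, 3, 4, 5]      # classic stepped wedge: one cluster switches per period
crt = [1, 1, 0, 0]     # parallel cluster trial: two intervention, two control
for tau2 in (0.0, 1.0):
    print(tau2, var_treatment_effect(sw, T, 1.0, tau2),
          var_treatment_effect(crt, T, 1.0, tau2))
```

    With these values the parallel design wins when the cluster-level variance is zero (0.2 vs 0.4) and the stepped wedge wins when it is large (0.6 vs 1.2), matching the qualitative conclusion above that cluster randomised trials remain more efficient when the intracluster correlation is small.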

  14. ASTM clustering for improving coal analysis by near-infrared spectroscopy.

    PubMed

    Andrés, J M; Bona, M T

    2006-11-15

    Multivariate analysis techniques have been applied to near-infrared (NIR) spectra of coals to investigate the relationship between nine coal properties (moisture (%), ash (%), volatile matter (%), fixed carbon (%), heating value (kcal/kg), carbon (%), hydrogen (%), nitrogen (%) and sulphur (%)) and the corresponding predictor variables. In this work, a whole set of coal samples was grouped into six more homogeneous clusters following the ASTM reference method for classification prior to the application of calibration methods to each coal set. The results obtained showed a considerable improvement in the determination error compared with the calibration for the whole sample set. For some groups, the established calibrations approached the quality required by the ASTM/ISO norms for laboratory analysis. To predict property values for a new coal sample, it is necessary to assign that sample to its respective group. Thus, the discrimination and classification ability for coal samples by Diffuse Reflectance Infrared Fourier Transform Spectroscopy (DRIFTS) in the NIR range was also studied by applying Soft Independent Modelling of Class Analogy (SIMCA) and Linear Discriminant Analysis (LDA) techniques. Modelling of the groups by SIMCA led to overlapping models that cannot discriminate for unique classification. On the other hand, the application of Linear Discriminant Analysis improved the classification of the samples, but not enough to be satisfactory for every group considered.
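    Of the two classifiers compared above, LDA has a compact closed form in the two-class case: project onto w = Sw⁻¹(μ₁ − μ₀), where Sw is the pooled within-class scatter, and threshold at the projected midpoint. A hedged numpy sketch on synthetic "spectra" (the data, group means, and six wavelengths are invented, not the coal dataset):

```python
import numpy as np

def fisher_lda(X0, X1):
    """Two-class Fisher discriminant: direction w = Sw^-1 (mu1 - mu0)
    with pooled within-class scatter Sw; classify by projected midpoint."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = (np.cov(X0, rowvar=False) * (len(X0) - 1)
          + np.cov(X1, rowvar=False) * (len(X1) - 1))
    w = np.linalg.solve(Sw, mu1 - mu0)
    threshold = w @ (mu0 + mu1) / 2.0
    return w, threshold

rng = np.random.default_rng(2)
# hypothetical "spectra": 40 samples x 6 wavelengths per coal group,
# with the groups differing in mean absorbance
low = rng.normal(0.0, 1.0, size=(40, 6))
high = rng.normal(1.5, 1.0, size=(40, 6))
w, thr = fisher_lda(low, high)
pred_high = np.concatenate([low, high]) @ w > thr
accuracy = np.mean(pred_high == np.repeat([False, True], 40))
print(accuracy)
```

    With more groups (six ASTM clusters) one discriminant direction per class pair, or a multi-class LDA, would be needed; the two-class form above just shows the mechanics.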

  15. [Study on HIV-1 subtype among elderly male clients and female sex workers of low-cost venues in Guangxi Zhuang Autonomous Region, China].

    PubMed

    Deng, Y Q; Li, J J; Fang, N Y; Wang, B; Wang, J W; Liang, S S; Shen, Z Y; Lan, G H; Zhang, H M; Wu, X H; Lu, H X; Ge, X M

    2017-03-10

    Objective: To understand HIV-1 subtype characteristics and transmission clusters in elderly male clients and female sex workers (FSWs) of low-cost commercial sex venues in Guangxi Zhuang Autonomous Region, China. Methods: A cross-sectional survey was conducted in FSWs and elderly male clients (≥50 years) of low-cost commercial sex venues in 4 cities and 9 counties in Guangxi Zhuang Autonomous Region by convenience sampling in 2012. A blood sample was collected from each case for HIV-1 antibody detection. The pol gene fragments were amplified and sequenced from viral RNA template extracted from plasma samples. The phylogenetic tree was constructed and the subtypes were identified. Results: A total of 4 048 elderly male clients and 784 FSWs were surveyed, and 116 HIV-1 infections were detected; the positive rate was 2.5% (103/4 048) in the clients and 1.7% (13/784) in FSWs. Gene amplification and sequencing of HIV-1 detected in 84 blood samples indicated that 53 pol gene sequences were successfully determined (48 blood samples from elderly male clients and 5 blood samples from FSWs). Among the 53 pol sequences, 48 (90.6%), 4 (7.5%), and 1 (1.9%) were identified as CRF01_AE, CRF08_BC, and CRF07_BC, respectively. Two transmission clusters were identified among CRF01_AE, including 4 sub-clusters. One transmission cluster was identified among CRF08_BC. Each transmission cluster or sub-cluster comprised individuals infected at the same low-cost commercial sex venue, at different low-cost venues in the same town or place, or in adjacent villages and towns. Conclusions: CRF01_AE was the predominant HIV-1 subtype among elderly male clients and FSWs of low-cost commercial sex venues in Guangxi Zhuang Autonomous Region, circulating in the same venue or adjacent villages and towns. The HIV-1 positive male clients and FSWs might play an important role in the spread of the strains.

  16. Magnetic analysis of a melt-spun Fe-dilute Cu60Ag35Fe5 alloy

    NASA Astrophysics Data System (ADS)

    Kondo, Shin-ichiro; Kaneko, Kazuhiro; Morimura, Takao; Nakashima, Hiromichi; Kobayashi, Shin-Taro; Michioka, Chishiro; Yoshimura, Kazuyoshi

    2015-04-01

    The magnetic properties of a melt-spun Fe-dilute Cu60Ag35Fe5 alloy are examined by X-ray diffraction, magnetic measurements, and transmission electron microscopy (TEM). The X-ray diffraction patterns show that the as-spun and annealed (773 K×36 ks) samples contain Cu and Ag phases and no Fe phases; thus, most Fe atoms are dispersed as clusters. Magnetic measurements indicate that the as-spun and annealed samples exhibit superparamagnetic behavior at 300 K, whereas ferromagnetic and superparamagnetic behaviors coexist at 4.2 K. The magnetic moments of small clusters at 300 K are determined by the nonlinear least squares method as 5148 and 4671 μB for as-spun and annealed samples, respectively, whereas those at 300 K are experimentally determined as 3500 and 3200 μB. This decrease in magnetic moments may imply the formation of anti-ferromagnetic coupling by annealing. TEM observation of the melt-spun sample suggests that there are three regions with different compositions: Cu-rich, Ag-rich, and Fe-rich with no precipitation in the matrix. In addition, these regions have obscure interfaces. The magnetic clusters are attributed to the Fe-rich regions.

  17. Revealing hidden species diversity in closely related species using nuclear SNPs, SSRs and DNA sequences - a case study in the tree genus Milicia.

    PubMed

    Daïnou, Kasso; Blanc-Jolivet, Céline; Degen, Bernd; Kimani, Priscilla; Ndiade-Bourobou, Dyana; Donkpegan, Armel S L; Tosso, Félicien; Kaymak, Esra; Bourland, Nils; Doucet, Jean-Louis; Hardy, Olivier J

    2016-12-01

    Species delimitation in closely related plant taxa can be challenging because (i) reproductive barriers are not always congruent with morphological differentiation, (ii) use of plastid sequences might lead to misinterpretation, (iii) rare species might not be sampled. We revisited molecular-based species delimitation in the African genus Milicia, currently divided into M. regia (West Africa) and M. excelsa (from West to East Africa). We used 435 samples collected in West, Central and East Africa. We genotyped SNP and SSR loci to identify genetic clusters, and sequenced two plastid regions (psbA-trnH, trnC-ycf6) and a nuclear gene (At103) to confirm species' divergence and compare species delimitation methods. We also examined whether ecological niche differentiation was congruent with sampled genetic structure. West African M. regia, West African and East African M. excelsa samples constituted three well distinct genetic clusters according to SNPs and SSRs. In Central Africa, two genetic clusters were consistently inferred by both types of markers, while a few scattered samples, sympatric with the preceding clusters but exhibiting leaf traits of M. regia, were grouped with the West African M. regia cluster based on SNPs or formed a distinct cluster based on SSRs. SSR results were confirmed by sequence data from the nuclear region At103 which revealed three distinct 'Fields For Recombination' corresponding to (i) West African M. regia, (ii) Central African samples with leaf traits of M. regia, and (iii) all M. excelsa samples. None of the plastid sequences provided evidence of distinct clades corresponding to the three species-like units. Niche modelling techniques yielded a significant correlation between niche overlap and genetic distance. Our genetic data suggest that three species of Milicia could be recognized. It is surprising that the occurrence of two species in Central Africa was not reported for this well-known timber tree. 
Globally, our work highlights the importance of collecting samples in a systematic way and the need for combining different nuclear markers when dealing with species complexes. Recognizing cryptic species is particularly crucial for economically exploited species because some hidden taxa might actually be endangered as they are merged with more abundant species.

  18. Understanding the cluster randomised crossover design: a graphical illustration of the components of variation and a sample size tutorial.

    PubMed

    Arnup, Sarah J; McKenzie, Joanne E; Hemming, Karla; Pilcher, David; Forbes, Andrew B

    2017-08-15

    In a cluster randomised crossover (CRXO) design, a sequence of interventions is assigned to a group, or 'cluster' of individuals. Each cluster receives each intervention in a separate period of time, forming 'cluster-periods'. Sample size calculations for CRXO trials need to account for both the cluster randomisation and crossover aspects of the design. Formulae are available for the two-period, two-intervention, cross-sectional CRXO design; however, implementation of these formulae is known to be suboptimal. The aims of this tutorial are to illustrate the intuition behind the design; and provide guidance on performing sample size calculations. Graphical illustrations are used to describe the effect of the cluster randomisation and crossover aspects of the design on the correlation between individual responses in a CRXO trial. Sample size calculations for binary and continuous outcomes are illustrated using parameters estimated from the Australia and New Zealand Intensive Care Society - Adult Patient Database (ANZICS-APD) for patient mortality and length(s) of stay (LOS). The similarity between individual responses in a CRXO trial can be understood in terms of three components of variation: variation in cluster mean response; variation in the cluster-period mean response; and variation between individual responses within a cluster-period; or equivalently in terms of the correlation between individual responses in the same cluster-period (within-cluster within-period correlation, WPC), and between individual responses in the same cluster, but in different periods (within-cluster between-period correlation, BPC). The BPC lies between zero and the WPC. When the WPC and BPC are equal, the precision gained by the crossover aspect of the CRXO design equals the precision lost by cluster randomisation. When the BPC is zero there is no advantage in a CRXO over a parallel-group cluster randomised trial. 
Sample size calculations illustrate that small changes in the specification of the WPC or BPC can increase the required number of clusters. By illustrating how the parameters required for sample size calculations arise from the CRXO design and by providing guidance on both how to choose values for the parameters and perform the sample size calculations, the implementation of the sample size formulae for CRXO trials may improve.
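    The three components of variation described above translate directly into a variance for a within-cluster period contrast: a cluster-period mean has variance σ²(1 + (m − 1)·WPC)/m, and two period means in the same cluster covary by σ²·BPC. A minimal sketch (illustrative parameter values, not the ANZICS-APD estimates):

```python
def var_within_cluster_contrast(m, sigma2, wpc, bpc):
    """Variance of the difference between a cluster's two cluster-period
    means when individuals within a cluster-period share correlation WPC
    and individuals in different periods of the same cluster share BPC."""
    var_mean = sigma2 * (1 + (m - 1) * wpc) / m  # one cluster-period mean
    cov_means = sigma2 * bpc                     # between the two period means
    return 2 * (var_mean - cov_means)

m, sigma2, wpc = 50, 1.0, 0.05
for bpc in (0.0, 0.025, 0.05):
    print(bpc, var_within_cluster_contrast(m, sigma2, wpc, bpc))
```

    As BPC rises from zero toward the WPC the contrast variance shrinks to 2σ²(1 − WPC)/m, which is the precision gain the tutorial attributes to the crossover aspect; at BPC = 0 nothing is gained over a parallel cluster design.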

  19. 75 FR 44937 - Submission for OMB Review; Comment Request

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-07-30

    ... is a block cluster, which consists of one or more contiguous census blocks. The P sample is a sample of housing units and persons obtained independently from the census for a sample of block clusters. The E sample is a sample of census housing units and enumerations in the same block clusters as the...

  20. Buccal Swabbing as a Noninvasive Method To Determine Bacterial, Archaeal, and Eukaryotic Microbial Community Structures in the Rumen

    PubMed Central

    Kirk, Michelle R.; Jonker, Arjan; McCulloch, Alan

    2015-01-01

    Analysis of rumen microbial community structure based on small-subunit rRNA marker genes in metagenomic DNA samples provides important insights into the dominant taxa present in the rumen and allows assessment of community differences between individuals or in response to treatments applied to ruminants. However, natural animal-to-animal variation in rumen microbial community composition can limit the power of a study considerably, especially when only subtle differences are expected between treatment groups. Thus, trials with large numbers of animals may be necessary to overcome this variation. Because ruminants pass large amounts of rumen material to their oral cavities when they chew their cud, oral samples may contain good representations of the rumen microbiota and be useful in lieu of rumen samples to study rumen microbial communities. We compared bacterial, archaeal, and eukaryotic community structures in DNAs extracted from buccal swabs to those in DNAs from samples collected directly from the rumen by use of a stomach tube for sheep on four different diets. After bioinformatic depletion of potential oral taxa from libraries of samples collected via buccal swabs, bacterial communities showed significant clustering by diet (R = 0.37; analysis of similarity [ANOSIM]) rather than by sampling method (R = 0.07). Archaeal, ciliate protozoal, and anaerobic fungal communities also showed significant clustering by diet rather than by sampling method, even without adjustment for potentially orally associated microorganisms. These findings indicate that buccal swabs may in future allow quick and noninvasive sampling for analysis of rumen microbial communities in large numbers of ruminants. PMID:26276109
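    The ANOSIM R statistic used above compares the mean rank of between-group dissimilarities with the mean rank of within-group dissimilarities; R near 1 indicates well-separated groups and R near 0 indicates no separation. A hedged numpy sketch on invented community profiles (not the sheep microbiota data; ordinal ranking assumes no tied distances):

```python
import numpy as np

def anosim_r(D, labels):
    """ANOSIM R statistic from a symmetric dissimilarity matrix D and
    group labels: (mean between-group rank - mean within-group rank)
    divided by n(n-1)/4, so R lies in [-1, 1]."""
    n = len(labels)
    iu = np.triu_indices(n, k=1)
    d = D[iu]
    ranks = np.argsort(np.argsort(d)) + 1.0      # ordinal ranks (no ties)
    between = labels[iu[0]] != labels[iu[1]]
    r_b, r_w = ranks[between].mean(), ranks[~between].mean()
    return (r_b - r_w) / (n * (n - 1) / 4.0)

rng = np.random.default_rng(3)
# hypothetical community profiles: two diets, clearly distinct
profiles = np.vstack([rng.normal(0, 1, (10, 8)), rng.normal(4, 1, (10, 8))])
labels = np.repeat([0, 1], 10)
D = np.linalg.norm(profiles[:, None] - profiles[None, :], axis=-1)
print(round(anosim_r(D, labels), 2))
```

    In practice the statistic's significance is assessed by permuting the labels many times and comparing the observed R to the permutation distribution.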

  1. How large are the consequences of covariate imbalance in cluster randomized trials: a simulation study with a continuous outcome and a binary covariate at the cluster level.

    PubMed

    Moerbeek, Mirjam; van Schie, Sander

    2016-07-11

    The number of clusters in a cluster randomized trial is often low. It is therefore likely that random assignment of clusters to treatment conditions will result in covariate imbalance. No studies have quantified the consequences of covariate imbalance in cluster randomized trials on parameter and standard error bias and on power to detect treatment effects. The consequences of covariate imbalance in unadjusted and adjusted linear mixed models are investigated by means of a simulation study. The factors in this study are the degree of imbalance, the covariate effect size, the cluster size and the intraclass correlation coefficient. The covariate is binary and measured at the cluster level; the outcome is continuous and measured at the individual level. The results show that covariate imbalance results in negligible parameter bias and small standard error bias in adjusted linear mixed models. Ignoring the possibility of covariate imbalance while calculating the sample size at the cluster level may result in a loss of power of at most 25% in the adjusted linear mixed model. The results are more severe for the unadjusted linear mixed model: parameter biases up to 100% and standard error biases up to 200% may be observed. Power levels based on the unadjusted linear mixed model are often too low. The consequences are most severe for large clusters and/or small intraclass correlation coefficients, since then the required number of clusters to achieve a desired power level is smallest. The possibility of covariate imbalance should be taken into account while calculating the sample size of a cluster randomized trial. Otherwise, more sophisticated methods to randomize clusters to treatments should be used, such as stratification or balance algorithms. All relevant covariates should be carefully identified, actually measured and included in the statistical model to avoid severe levels of parameter and standard error bias and insufficient power levels.
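
    To make the imbalance mechanism concrete, here is a deliberately simplified toy version of such a simulation: ordinary least squares stands in for the paper's linear mixed models, and the effect sizes, cluster counts and the imbalanced covariate pattern are all invented. The unadjusted treatment estimate absorbs the covariate effect; the adjusted one largely does not.

```python
import numpy as np

rng = np.random.default_rng(5)
n_clusters, cluster_size = 10, 50
treatment = np.repeat([0, 1], n_clusters // 2)
# deliberately imbalanced binary cluster-level covariate:
# present in 1/5 control clusters but 4/5 treatment clusters
covariate = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0], dtype=float)

true_effect, cov_effect = 1.0, 3.0
cluster_effect = rng.normal(0, 0.2, n_clusters)
y = np.concatenate([
    true_effect * treatment[c] + cov_effect * covariate[c]
    + cluster_effect[c] + rng.normal(0, 1.0, cluster_size)
    for c in range(n_clusters)
])
t = np.repeat(treatment, cluster_size).astype(float)
x = np.repeat(covariate, cluster_size)

# unadjusted model: the treatment estimate absorbs the covariate imbalance
b_unadj = np.linalg.lstsq(np.c_[np.ones_like(t), t], y, rcond=None)[0][1]
# adjusted model: including the covariate removes most of the bias
b_adj = np.linalg.lstsq(np.c_[np.ones_like(t), t, x], y, rcond=None)[0][1]
```

    With the true treatment effect set to 1.0, the unadjusted estimate is inflated by roughly the covariate effect times the imbalance, illustrating why the adjusted analysis is recommended.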

  2. Grouped fuzzy SVM with EM-based partition of sample space for clustered microcalcification detection.

    PubMed

    Wang, Huiya; Feng, Jun; Wang, Hongyu

    2017-07-20

    Detection of clustered microcalcifications (MCs) in mammograms plays an essential role in computer-aided diagnosis of early-stage breast cancer. To tackle problems associated with the diversity of data structures of MC lesions and the variability of normal breast tissues, multi-pattern sample space learning is required. In this paper, a novel grouped fuzzy Support Vector Machine (SVM) algorithm with sample space partition based on Expectation-Maximization (EM) (called G-FSVM) is proposed for clustered MC detection. The diversified pattern of training data is partitioned into several groups by the EM algorithm, and a series of fuzzy SVMs is then integrated for classification, each trained on one group of samples from the MC lesions and normal breast tissues. From the DDSM database, a total of 1,064 suspicious regions were selected from 239 mammograms, and the measured Accuracy, True Positive Rate (TPR), False Positive Rate (FPR) and EVL = TPR×(1−FPR) are 0.82, 0.78, 0.14 and 0.72, respectively. The proposed method incorporates the merits of fuzzy SVM and multi-pattern sample space learning, decomposing the MC detection problem into a series of simple two-class classifications. Experimental results on synthetic data and the DDSM database demonstrate that our integrated classification framework reduces the false positive rate significantly while maintaining the true positive rate.
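
    A rough sketch of the grouped scheme, with a plain (non-fuzzy) SVM standing in for the paper's fuzzy SVM and synthetic two-class data in place of mammogram features:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=10, n_informative=6,
                           random_state=0)

# Step 1: EM (Gaussian mixture) partitions the sample space into groups.
gm = GaussianMixture(n_components=3, random_state=0).fit(X)
groups = gm.predict(X)

# Step 2: one SVM per group; fall back to a global SVM if a group
# happens to contain a single class.
global_svm = SVC().fit(X, y)
svms = {}
for g in np.unique(groups):
    Xg, yg = X[groups == g], y[groups == g]
    svms[g] = SVC().fit(Xg, yg) if len(np.unique(yg)) > 1 else global_svm

# route each sample to the SVM of its most likely group
preds = np.array([svms[g].predict(x[None, :])[0]
                  for x, g in zip(X, groups)])
accuracy = (preds == y).mean()
```

    Each group gets a classifier specialized to one region of the sample space, which is the essence of decomposing detection into a series of simple two-class problems.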

  3. The Effect of Computer Games on Students' Critical Thinking Disposition and Educational Achievement

    ERIC Educational Resources Information Center

    Seifi, Mohammad; Derikvandi, Zahra; Moosavipour, Saeed; Khodabandelou, Rouhollah

    2015-01-01

    The main aim of this research was to investigate the effect of computer games on students' critical thinking disposition and educational achievement. The research method was descriptive, of the causal-comparative type. The sample included 270 female high school students in Andimeshk town, selected by the multistage cluster sampling method. Ricketts…

  4. Rapid and sensitive analysis of 27 underivatized free amino acids, dipeptides, and tripeptides in fruits of Siraitia grosvenorii Swingle using HILIC-UHPLC-QTRAP(®)/MS (2) combined with chemometrics methods.

    PubMed

    Zhou, Guisheng; Wang, Mengyue; Li, Yang; Peng, Ying; Li, Xiaobo

    2015-08-01

    In the present study, a new strategy based on chemical analysis and chemometrics methods was proposed for the comprehensive analysis and profiling of underivatized free amino acids (FAAs) and small peptides among various Luo-Han-Guo (LHG) samples. Firstly, the ultrasound-assisted extraction (UAE) parameters were optimized using Plackett-Burman (PB) screening and Box-Behnken designs (BBD), and the following optimal UAE conditions were obtained: ultrasound power of 280 W, extraction time of 43 min, and a solid-liquid ratio of 302 mL/g. Secondly, a rapid and sensitive analytical method was developed for simultaneous quantification of 24 FAAs and 3 active small peptides in LHG at trace levels using hydrophilic interaction ultra-performance liquid chromatography coupled with triple-quadrupole linear ion-trap tandem mass spectrometry (HILIC-UHPLC-QTRAP(®)/MS(2)). The analytical method was validated for matrix effects, linearity, LODs, LOQs, precision, repeatability, stability, and recovery. Thirdly, the proposed optimal UAE conditions and analytical methods were applied to the measurement of LHG samples. It was shown that LHG was rich in essential amino acids, which are beneficial nutrient substances for human health. Finally, based on the contents of the 27 analytes, the chemometrics methods of unsupervised principal component analysis (PCA) and supervised counter propagation artificial neural network (CP-ANN) were applied to differentiate and classify the 40 batches of LHG samples from different cultivated forms, regions, and varieties. As a result, these samples were mainly clustered into three clusters, illustrating the disparity in cultivation among the samples. In summary, the presented strategy has potential for the investigation of edible plants and agricultural products containing FAAs and small peptides.
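
    The unsupervised PCA step can be illustrated on a synthetic stand-in for the 40-batch × 27-analyte concentration matrix (the values below are not LHG data):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(8)
# 40 "batches" x 27 "analytes" with three underlying clusters
centers = 2.0 * rng.normal(size=(3, 27))
batch_cluster = rng.integers(0, 3, size=40)
X = centers[batch_cluster] + rng.normal(scale=0.3, size=(40, 27))

# standardize analytes, then project batches onto the first two PCs
pca = PCA(n_components=2)
scores = pca.fit_transform(StandardScaler().fit_transform(X))
explained = pca.explained_variance_ratio_.sum()
```

    When the batches fall into a few compositional clusters, most of the variance concentrates in the first components and the clusters become visible in the score plot.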

  5. Joint fMRI analysis and subject clustering using sparse dictionary learning

    NASA Astrophysics Data System (ADS)

    Kim, Seung-Jun; Dontaraju, Krishna K.

    2017-08-01

    Multi-subject fMRI data analysis methods based on sparse dictionary learning are proposed. In addition to identifying the component spatial maps by exploiting the sparsity of the maps, clusters of the subjects are learned by postulating that the fMRI volumes admit a subspace clustering structure. Furthermore, in order to tune the associated hyper-parameters systematically, a cross-validation strategy is developed based on entry-wise sampling of the fMRI dataset. Efficient algorithms for solving the proposed constrained dictionary learning formulations are developed. Numerical tests performed on synthetic fMRI data show promising results and provide insights into the proposed technique.
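
    A compact sketch of the sparse dictionary learning step on synthetic "fMRI" volumes; the dimensions, penalties, and the scikit-learn estimator choice are illustrative, not the authors' implementation:

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(6)
n_volumes, n_voxels, n_components = 40, 300, 5

# ground-truth sparse spatial maps mixed by random time courses
true_maps = rng.normal(size=(n_components, n_voxels)) * \
            (rng.random((n_components, n_voxels)) < 0.1)
mixing = rng.normal(size=(n_volumes, n_components))
data = mixing @ true_maps + 0.01 * rng.normal(size=(n_volumes, n_voxels))

# decompose volumes into a small dictionary of spatial maps + sparse codes
dico = DictionaryLearning(n_components=n_components, alpha=0.1,
                          max_iter=100, transform_algorithm="lasso_lars",
                          transform_alpha=0.1, random_state=0)
codes = dico.fit_transform(data)      # per-volume sparse codes
maps = dico.components_               # learned spatial maps

rel_err = np.linalg.norm(data - codes @ maps) / np.linalg.norm(data)
```

    The learned codes (one row per volume) are the kind of low-dimensional representation on which a subspace clustering of subjects could then operate.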

  6. On the Analysis of Clustering in an Irradiated Low Alloy Reactor Pressure Vessel Steel Weld.

    PubMed

    Lindgren, Kristina; Stiller, Krystyna; Efsing, Pål; Thuvander, Mattias

    2017-04-01

    Radiation-induced clustering affects the mechanical properties, that is, the ductile-to-brittle transition temperature (DBTT), of the reactor pressure vessel (RPV) steel of nuclear power plants. The combination of low Cu and high Ni used in some RPV welds is known to further enhance the DBTT shift during long-term operation. In this study, RPV weld samples containing 0.04 at% Cu and 1.6 at% Ni were irradiated to fluences of 2.0×10²³ and 6.4×10²³ n/m² in the Halden test reactor. Atom probe tomography (APT) was applied to study clustering of Ni, Mn, Si, and Cu. As the clusters are in the nanometer range, APT is a very suitable technique for this type of study. APT analyses yield information about the size distribution, number density, and composition of the clusters. However, the quantification of these attributes is not trivial. The maximum separation method (MSM) has been used to characterize the clusters, and a detailed study of the influence of the choice of MSM cluster parameters, primarily on the cluster number density, has been undertaken.
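
    The core of the maximum separation method is simple to sketch: solute atoms closer than d_max are joined into clusters (equivalent to single-linkage clustering with a distance cutoff), and clusters with fewer than N_min atoms are discarded. The point cloud and parameter values below are invented; as the abstract notes, the parameter choice materially affects the reported number density.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(1)
# synthetic "atom probe" point cloud: dilute background solute atoms
# plus two dense solute clusters (coordinates in nm)
background = rng.uniform(0, 50, size=(200, 3))
cluster_a = rng.normal(loc=(10, 10, 10), scale=0.5, size=(40, 3))
cluster_b = rng.normal(loc=(40, 40, 40), scale=0.5, size=(60, 3))
points = np.vstack([background, cluster_a, cluster_b])

d_max, n_min = 1.5, 10   # MSM parameters (illustrative values)
labels = fcluster(linkage(points, method="single"),
                  t=d_max, criterion="distance")
sizes = np.bincount(labels)
clusters = [c for c in np.unique(labels) if sizes[c] >= n_min]
number_density = len(clusters) / 50.0 ** 3   # clusters per nm^3
```

    Varying d_max and n_min changes which groups of atoms count as clusters, which is exactly the sensitivity the study quantifies.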

  7. The Effect of Sampling and Storage on the Fecal Microbiota Composition in Healthy and Diseased Subjects

    PubMed Central

    Tedjo, Danyta I.; Jonkers, Daisy M. A. E.; Savelkoul, Paul H.; Masclee, Ad A.; van Best, Niels; Pierik, Marieke J.; Penders, John

    2015-01-01

    Large-scale cohort studies are currently being designed to investigate the human microbiome in health and disease. Adequate sampling strategies are required to limit bias due to shifts in microbial communities during sampling and storage. Therefore, we examined the impact of different sampling and storage conditions on the stability of fecal microbial communities in healthy and diseased subjects. Fecal samples from 10 healthy controls, 10 irritable bowel syndrome and 8 inflammatory bowel disease patients were collected on site, aliquoted immediately after defecation and stored at -80°C, -20°C for 1 week, at +4°C or room temperature for 24 hours. Fecal transport swabs (FecalSwab, Copan) were collected and stored for 48-72 hours at room temperature. We used pyrosequencing of the 16S rRNA gene to investigate the stability of microbial communities. Alpha diversity did not differ between all storage methods and -80°C, except for the fecal swabs. UPGMA clustering and principal coordinate analysis showed significant clustering by test subject (p<0.001) but not by storage method. Bray-Curtis dissimilarity and (un)weighted UniFrac showed a significantly higher distance between fecal swabs and -80°C versus the other methods and -80°C samples (p<0.009). The relative abundance of Ruminococcus and Enterobacteriaceae did not differ between the storage methods versus -80°C, but was higher in fecal swabs (p<0.05). Storage up to 24 hours (at +4°C or room temperature) or freezing at -20°C did not significantly alter the fecal microbial community structure compared to direct freezing of samples from healthy subjects and patients with gastrointestinal disorders. PMID:26024217

  8. An Observational Study of Blended Young Stellar Clusters in the Galactic Plane - Do Massive Stars form First?

    NASA Astrophysics Data System (ADS)

    Martínez-Galarza, Rafael; Protopapas, Pavlos; Smith, Howard A.; Morales, Esteban

    2018-01-01

    From an observational point of view, the early life of massive stars is difficult to understand partly because star formation occurs in crowded clusters where individual stars often appear blended together in the beams of infrared telescopes. This renders the characterization of the physical properties of young embedded clusters via spectral energy distribution (SED) fitting a challenging task. Of particular relevance for the testing of star formation models is the question of whether the claimed universality of the IMF (references) is reflected in an equally universal integrated galactic initial mass function (IGIMF) of stars. In other words, is the set of all stellar masses in the galaxy sampled from a single universal IMF, or does the distribution of masses depend on the environment, making the IGIMF different from the canonical IMF? If the latter is true, how different are the two? We present an infrared SED analysis of ~70 Spitzer-selected, low-mass ($<100~\\rm{M}_{\\odot}$), galactic blended clusters. For all of the clusters we obtain the most probable individual SED of each member and derive their physical properties, effectively deblending the confused emission from individual YSOs. Our algorithm incorporates a combined probabilistic model of the blended SEDs and the unresolved images in the long-wavelength end. We find that our results are compatible with competitive accretion in the central regions of young clusters, with the most massive stars forming early on in the process and less massive stars forming about 1 Myr later. We also find evidence for a relationship between the total stellar mass of the cluster and the mass of the most massive member that favors optimal sampling in the cluster and disfavors random sampling from the canonical IMF, implying that star formation is self-regulated, and that the mass of the most massive star in a cluster depends on the available resources.
The method presented here is easily adapted to future observations of clustered regions of star formation with JWST and other high resolution facilities.

  9. Implementation of novel statistical procedures and other advanced approaches to improve analysis of CASA data.

    PubMed

    Ramón, M; Martínez-Pastor, F

    2018-04-23

    Computer-aided sperm analysis (CASA) produces a wealth of data that is frequently ignored. The use of multiparametric statistical methods can help explore these datasets, unveiling the subpopulation structure of sperm samples. In this review we analyse the significance and relevance of the internal heterogeneity of sperm samples. We also provide a brief description of the statistical tools used for extracting sperm subpopulations from the datasets, namely unsupervised clustering (with non-hierarchical, hierarchical and two-step methods) and the most advanced supervised methods, based on machine learning. The former approach has allowed exploration of subpopulation patterns in many species, whereas the latter offers further possibilities, especially for functional studies and the practical use of subpopulation analysis. We also consider novel approaches, such as the use of geometric morphometrics or imaging flow cytometry. Finally, although applying clustering analyses to the data provided by CASA systems yields valuable information on sperm samples, there are several caveats. Protocols for capturing and analysing motility or morphometry should be standardised and adapted to each experiment, and the algorithms should be open in order to allow comparison of results between laboratories. Moreover, we must be aware of new technology that could change the paradigm for studying sperm motility and morphology.

  10. X-ray tomography investigation of intensive sheared Al–SiC metal matrix composites

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    De Giovanni, Mario; Warnett, Jason M.; Williams, Mark A.

    2015-12-15

    X-ray computed tomography (XCT) was used to characterise the three-dimensional internal structure of Al–SiC metal matrix composites. The alloy composite was prepared by a casting method with the application of intensive shearing to uniformly disperse SiC particles in the matrix. The distributions of SiC clusters and of porosity were visualised and compared with non-sheared samples. Results showed that the average particle size as well as the agglomerate size is smaller in the sheared sample than in conventionally cast samples. Further, it was observed that the volume fraction of porosity was reduced by 50% compared to conventional casting, confirming that intensive shearing helps in the deagglomeration of particle clusters and decreases the porosity of Al–SiC metal matrix composites. - Highlights: • XCT was used to visualise the 3D internal structure of Al-SiC MMC. • Al-SiC MMC was prepared by casting with the application of intensive shearing. • SiC particle and porosity distributions were evaluated. • Results show shearing deagglomerates particle clusters and reduces porosity in MMC.

  11. Evaluation of immunization coverage by lot quality assurance sampling compared with 30-cluster sampling in a primary health centre in India.

    PubMed

    Singh, J; Jain, D C; Sharma, R S; Verghese, T

    1996-01-01

    The immunization coverage of infants, children and women residing in a primary health centre (PHC) area in Rajasthan was evaluated both by lot quality assurance sampling (LQAS) and by the 30-cluster sampling method recommended by WHO's Expanded Programme on Immunization (EPI). The LQAS survey was used to classify 27 mutually exclusive subunits of the population, defined as residents in health subcentre areas, on the basis of acceptable or unacceptable levels of immunization coverage among infants and their mothers. The LQAS results from the 27 subcentres were also combined to obtain an overall estimate of coverage for the entire population of the primary health centre, and these results were compared with the EPI cluster survey results. The LQAS survey did not identify any subcentre with a level of immunization among infants high enough to be classified as acceptable; only three subcentres were classified as having acceptable levels of tetanus toxoid (TT) coverage among women. The estimated overall coverage in the PHC population from the combined LQAS results showed that a quarter of the infants were immunized appropriately for their ages and that 46% of their mothers had been adequately immunized with TT. Although the age groups and the periods of time during which the children were immunized differed for the LQAS and EPI survey populations, the characteristics of the mothers were largely similar. About 57% (95% CI, 46-67) of them were found to be fully immunized with TT by 30-cluster sampling, compared with 46% (95% CI, 41-51) by stratified random sampling. The difference was not statistically significant. The field work to collect LQAS data took about three times longer and cost 60% more than the EPI survey. The apparently homogeneous and low level of immunization coverage in the 27 subcentres makes this an impractical situation in which to apply LQAS, and the results obtained were therefore not particularly useful. 
However, if LQAS had been applied by local staff in an area with overall high coverage and population subunits with heterogeneous coverage, the method would have been less costly and should have produced useful results.
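
    For reference, the classical (non-clustered) LQAS design reduces to a binomial search: find the smallest sample size n and decision rule d that keep both misclassification risks below a target. The coverage thresholds below are illustrative, not those of the Rajasthan survey.

```python
from scipy.stats import binom

def lqas_design(p_upper, p_lower, max_risk, n_max=200):
    """Smallest (n, d): sample n people, classify coverage as acceptable
    if at least d are immunized, keeping both error risks <= max_risk."""
    for n in range(1, n_max + 1):
        for d in range(n + 1):
            # risk of rejecting a lot whose true coverage is acceptable
            alpha = binom.cdf(d - 1, n, p_upper)
            # risk of accepting a lot whose true coverage is too low
            beta = 1.0 - binom.cdf(d - 1, n, p_lower)
            if alpha <= max_risk and beta <= max_risk:
                return n, d
    return None

# illustrative thresholds: 80% coverage is acceptable, 50% is not
n, d = lqas_design(p_upper=0.80, p_lower=0.50, max_risk=0.10)
```

    The cluster LQAS designs discussed elsewhere in this collection modify this binomial calculation to account for within-cluster correlation.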

  12. Rheological Characterization and Cluster Classification of Iranian Commercial Foods, Drinks and Desserts to Recommend for Esophageal Dysphagia Diets.

    PubMed

    Zargaraan, Azizollaah; Omaraee, Yasaman; Rastmanesh, Reza; Taheri, Negin; Fadavi, Ghasem; Fadaei, Morteza; Mohammadifar, Mohammad Amin

    2013-12-01

    In the absence of dysphagia-oriented food products, rheological characterization of available food items is important for safe swallowing and adequate nutrient intake in dysphagic patients. Introducing alternative items with a similar ease of swallowing can help improve the quality of life and nutritional intake of patients with esophageal cancer dysphagia. The present study aimed at the rheological characterization and cluster classification of potentially suitable foodstuffs marketed in Iran for their possible use in dysphagia diets. In this descriptive study, rheological data were obtained during January and February 2012 in the Rheology Lab of the National Nutrition and Food Technology Research Institute, Tehran, Iran. Steady state and oscillatory shear parameters of 39 commercial samples were obtained using a Physica MCR 301 rheometer (Anton-Paar GmbH, Graz, Austria). The Matlab Fuzzy Logic Toolbox (R2012a) was utilized for cluster classification of the samples. Using an extended list of rheological parameters and fuzzy logic methods, the 39 commercial samples (drinks, main courses and desserts) were divided into 5 clusters, and the degree of membership in each cluster was stated as a number between 0 and 0.99. Considering the apparent viscosity of foodstuffs as the single criterion for classification of dysphagia-oriented food products is a shortcoming of current guidelines on dysphagia diets. The authors propose revisions to the classification of dysphagia-oriented food products, including more rheological parameters (especially viscoelastic parameters) in the classification.

  13. Multivariate Statistical Analysis of MSL APXS Bulk Geochemical Data

    NASA Astrophysics Data System (ADS)

    Hamilton, V. E.; Edwards, C. S.; Thompson, L. M.; Schmidt, M. E.

    2014-12-01

    We apply cluster and factor analyses to bulk chemical data of 130 soil and rock samples measured by the Alpha Particle X-ray Spectrometer (APXS) on the Mars Science Laboratory (MSL) rover Curiosity through sol 650. Multivariate approaches such as principal components analysis (PCA), cluster analysis, and factor analysis complement more traditional approaches (e.g., Harker diagrams), with the advantage of simultaneously examining the relationships between multiple variables for large numbers of samples. Principal components analysis has been applied with success to APXS, Pancam, and Mössbauer data from the Mars Exploration Rovers. Factor analysis and cluster analysis have been applied with success to thermal infrared (TIR) spectral data of Mars. Cluster analyses group the input data by similarity, where there are a number of different methods for defining similarity (hierarchical, density, distribution, etc.). For example, without any assumptions about the chemical contributions of surface dust, preliminary hierarchical and K-means cluster analyses clearly distinguish the physically adjacent rock targets Windjana and Stephen as being distinctly different from lithologies observed prior to Curiosity's arrival at The Kimberley. In addition, they are separated from each other, consistent with chemical trends observed in variation diagrams but without requiring assumptions about chemical relationships. We will discuss the variation in cluster analysis results as a function of clustering method and pre-processing (e.g., log transformation, correction for dust cover) and implications for interpreting chemical data. Factor analysis shares some similarities with PCA, and examines the variability among observed components of a dataset so as to reveal variations attributable to unobserved components. 
Factor analysis has been used to extract the TIR spectra of components that are typically observed in mixtures and only rarely in isolation; there is the potential for similar results with data from APXS. These techniques offer new ways to understand the chemical relationships between the materials interrogated by Curiosity, and potentially their relation to materials observed by APXS instruments on other landed missions.
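
    A minimal sketch of the K-means step described above, run on standardized synthetic stand-ins for bulk-chemistry rows (two synthetic "lithologies" in a 9-oxide composition space; no APXS data are used):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# two synthetic "lithologies" in a 9-oxide composition space
group1 = rng.normal(loc=0.0, scale=1.0, size=(60, 9))
group2 = rng.normal(loc=3.0, scale=1.0, size=(70, 9))
X = np.vstack([group1, group2])

# put oxides on comparable scales before clustering
X_std = StandardScaler().fit_transform(X)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_std)
```

    Pre-processing choices (standardization, log transformation, dust correction) enter before the clustering call and, as the abstract notes, can change which samples group together.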

  14. Large-scale clustering measurements with photometric redshifts: comparing the dark matter haloes of X-ray AGN, star-forming and passive galaxies at z ≈ 1

    NASA Astrophysics Data System (ADS)

    Georgakakis, A.; Mountrichas, G.; Salvato, M.; Rosario, D.; Pérez-González, P. G.; Lutz, D.; Nandra, K.; Coil, A.; Cooper, M. C.; Newman, J. A.; Berta, S.; Magnelli, B.; Popesso, P.; Pozzi, F.

    2014-10-01

    We combine multi-wavelength data in the AEGIS-XD and C-COSMOS surveys to measure the typical dark matter halo mass of X-ray selected active galactic nuclei (AGN) [LX(2-10 keV) > 1042 erg s- 1] in comparison with far-infrared selected star-forming galaxies detected in the Herschel/PEP survey (PACS Evolutionary Probe; LIR > 1011 L⊙) and quiescent systems at z ≈ 1. We develop a novel method to measure the clustering of extragalactic populations that uses photometric redshift probability distribution functions in addition to any spectroscopy. This is advantageous in that all sources in the sample are used in the clustering analysis, not just the subset with secure spectroscopy. The method works best for large samples. The loss of accuracy because of the lack of spectroscopy is balanced by increasing the number of sources used to measure the clustering. We find that X-ray AGN, far-infrared selected star-forming galaxies and passive systems in the redshift interval 0.6 < z < 1.4 are found in haloes of similar mass, log MDMH/(M⊙ h-1) ≈ 13.0. We argue that this is because the galaxies in all three samples (AGN, star-forming, passive) have similar stellar mass distributions, approximated by the J-band luminosity. Therefore, all galaxies that can potentially host X-ray AGN, because they have stellar masses in the appropriate range, live in dark matter haloes of log MDMH/(M⊙ h-1) ≈ 13.0 independent of their star formation rates. This suggests that the stellar mass of X-ray AGN hosts is driving the observed clustering properties of this population. We also speculate that trends between AGN properties (e.g. luminosity, level of obscuration) and large-scale environment may be related to differences in the stellar mass of the host galaxies.

  15. Spectral gene set enrichment (SGSE).

    PubMed

    Frost, H Robert; Li, Zhigang; Moore, Jason H

    2015-03-03

    Gene set testing is typically performed in a supervised context to quantify the association between groups of genes and a clinical phenotype. In many cases, however, a gene set-based interpretation of genomic data is desired in the absence of a phenotype variable. Although methods exist for unsupervised gene set testing, they predominantly compute enrichment relative to clusters of the genomic variables, with performance strongly dependent on the clustering algorithm and number of clusters. We propose a novel method, spectral gene set enrichment (SGSE), for unsupervised competitive testing of the association between gene sets and empirical data sources. SGSE first computes the statistical association between gene sets and principal components (PCs) using our principal component gene set enrichment (PCGSE) method. The overall statistical association between each gene set and the spectral structure of the data is then computed by combining the PC-level p-values using the weighted Z-method, with weights set to the PC variance scaled by Tracy-Widom test p-values. Using simulated data, we show that the SGSE algorithm can accurately recover spectral features from noisy data. To illustrate the utility of our method on real data, we demonstrate the superior performance of the SGSE method relative to standard cluster-based techniques for testing the association between MSigDB gene sets and the variance structure of microarray gene expression data. Unsupervised gene set testing can provide important information about the biological signal held in high-dimensional genomic data sets. Because it uses the association between gene sets and sample PCs to generate a measure of unsupervised enrichment, the SGSE method is independent of cluster or network creation algorithms and, most importantly, is able to utilize the statistical significance of PC eigenvalues to ignore elements of the data most likely to represent noise.
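
    The combination step SGSE relies on is the weighted Z-method (Stouffer's method with weights). The sketch below shows that step with illustrative PC-level p-values and weights; in the paper, the weights are PC variances scaled by Tracy-Widom test p-values.

```python
import numpy as np
from scipy.stats import norm

p_values = np.array([0.01, 0.20, 0.60])    # per-PC enrichment p-values
weights = np.array([0.50, 0.30, 0.20])     # e.g. scaled PC variances

# weighted Z-method: convert one-sided p-values to Z-scores,
# combine with weights, convert back to a single p-value
z_scores = norm.isf(p_values)
combined_z = np.sum(weights * z_scores) / np.sqrt(np.sum(weights ** 2))
combined_p = norm.sf(combined_z)
```

    Down-weighting noise-dominated PCs (small weights) is what lets the method ignore the parts of the spectrum unlikely to carry signal.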

  16. Numerical taxonomy and ecology of petroleum-degrading bacteria.

    PubMed Central

    Austin, B; Calomiris, J J; Walker, J D; Colwell, R R

    1977-01-01

    A total of 99 strains of petroleum-degrading bacteria isolated from Chesapeake Bay water and sediment were identified by using numerical taxonomy procedures. The isolates, together with 33 reference cultures, were examined for 48 biochemical, cultural, morphological, and physiological characters. The data were analyzed by computer, using both the simple matching and the Jaccard coefficients. Clustering was achieved by the unweighted average linkage method. From the sorted similarity matrix and dendrogram, 14 phenetic groups, comprising 85 of the petroleum-degrading bacteria, were defined at the 80 to 85% similarity level. These groups were identified as actinomycetes (mycelial forms, four clusters), coryneforms, Enterobacteriaceae, Klebsiella aerogenes, Micrococcus spp. (two clusters), Nocardia species (two clusters), Pseudomonas spp. (two clusters), and Sphaerotilus natans. It is concluded that the degradation of petroleum is accomplished by a diverse range of bacterial taxa, some of which were isolated only at given sampling stations and, more specifically, from sediment collected at a given station. PMID:889329
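
    The numerical-taxonomy workflow described above (binary characters → Jaccard coefficient → UPGMA dendrogram → phenetic groups cut at a similarity level) can be sketched as follows; the strain-by-character matrix is a synthetic placeholder, not the Chesapeake Bay data:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(2)
# 12 synthetic "strains" x 48 binary characters from two character profiles
profile_a = rng.random(48) < 0.7
profile_b = rng.random(48) < 0.6

def noisy(profile):
    # flip ~5% of characters to mimic strain-to-strain variation
    return np.where(rng.random(48) < 0.95, profile, ~profile)

strains = np.array([noisy(profile_a) for _ in range(6)] +
                   [noisy(profile_b) for _ in range(6)])

dist = pdist(strains, metric="jaccard")   # 1 - Jaccard similarity
tree = linkage(dist, method="average")    # UPGMA (unweighted average linkage)
# cut the dendrogram at the 80% similarity level (Jaccard distance 0.20)
phenons = fcluster(tree, t=0.20, criterion="distance")
```

    Strains sharing a character profile end up in the same phenon, mirroring how the 14 phenetic groups were defined at the 80 to 85% similarity level.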

  17. Galaxy clustering dependence on the [O II] emission line luminosity in the local Universe

    NASA Astrophysics Data System (ADS)

    Favole, Ginevra; Rodríguez-Torres, Sergio A.; Comparat, Johan; Prada, Francisco; Guo, Hong; Klypin, Anatoly; Montero-Dorta, Antonio D.

    2017-11-01

    We study the galaxy clustering dependence on the [O II] emission line luminosity in the SDSS DR7 Main galaxy sample at mean redshift z ∼ 0.1. We select volume-limited samples of galaxies with different [O II] luminosity thresholds and measure their projected, monopole and quadrupole two-point correlation functions. We model these observations using the 1 h-1 Gpc MultiDark-Planck cosmological simulation and generate light cones with the SUrvey GenerAtoR algorithm. To interpret our results, we adopt a modified (Sub)Halo Abundance Matching scheme, accounting for the stellar mass incompleteness of the emission line galaxies. The satellite fraction constitutes an extra parameter in this model and allows the clustering fit to be optimized on both small and intermediate scales (i.e. rp ≲ 30 h-1 Mpc), with no need for any velocity bias correction. We find that, in the local Universe, the [O II] luminosity correlates with all the clustering statistics explored and with the galaxy bias. This latter quantity correlates more strongly with the SDSS r-band magnitude than with [O II] luminosity. In conclusion, we propose a straightforward method to produce reliable clustering models, entirely built on the simulation products, which provides robust predictions of the typical ELG host halo masses and satellite fraction values. The SDSS galaxy data, MultiDark mock catalogues and clustering results are made publicly available.

  18. Visualization and unsupervised predictive clustering of high-dimensional multimodal neuroimaging data.

    PubMed

    Mwangi, Benson; Soares, Jair C; Hasan, Khader M

    2014-10-30

    Neuroimaging machine learning studies have largely utilized supervised algorithms, meaning they require both neuroimaging scan data and corresponding target variables (e.g. healthy vs. diseased) to be successfully 'trained' for a prediction task. Notably, this approach may not be optimal or possible when the global structure of the data is not well known and the researcher does not have an a priori model to fit the data. We set out to investigate the utility of an unsupervised machine learning technique, t-distributed stochastic neighbour embedding (t-SNE), in identifying 'unseen' sample population patterns that may exist in high-dimensional neuroimaging data. Multimodal neuroimaging scans from 92 healthy subjects were pre-processed using atlas-based methods, integrated and input into the t-SNE algorithm. Patterns and clusters discovered by the algorithm were visualized using a 2D scatter plot and further analyzed using the K-means clustering algorithm. t-SNE was evaluated against classical principal component analysis. Remarkably, based on unlabelled multimodal scan data, t-SNE separated study subjects into two very distinct clusters which corresponded to subjects' gender labels (cluster silhouette index value=0.79). The resulting clusters were used to develop an unsupervised minimum distance clustering model which correctly identified the gender of 93.5% of subjects. Notably, from a neuropsychiatric perspective this method may allow discovery of data-driven disease phenotypes or sub-types of treatment responders. Copyright © 2014 Elsevier B.V. All rights reserved.
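
    An illustrative reconstruction of the pipeline (t-SNE embedding followed by K-means on the 2-D coordinates), using simulated features with a planted two-group structure rather than real scan data:

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(3)
# 92 "subjects" x 500 features with a latent two-group structure
group = rng.integers(0, 2, size=92)
X = rng.normal(size=(92, 500)) + 2.0 * group[:, None]

# embed the high-dimensional features into 2-D with t-SNE
embedding = TSNE(n_components=2, perplexity=30,
                 random_state=0).fit_transform(X)
# look for clusters in the embedding and score their separation
labels = KMeans(n_clusters=2, n_init=10,
                random_state=0).fit_predict(embedding)
sil = silhouette_score(embedding, labels)
```

    When a strong latent grouping exists (here planted by construction), the embedding separates it cleanly and the silhouette index is high, analogous to the gender clusters reported above.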

  19. Familial clustering of overweight and obesity among schoolchildren in northern China

    PubMed Central

    Li, Zengning; Luo, Bin; Du, Limei; Hu, Huanyu; Xie, Ying

    2014-01-01

    Background: We aimed to study the prevalence of overweight and obesity and to assess its familial clustering among schoolchildren in northern China. Methods: A cross-sectional study was conducted on 95,292 schoolchildren in northern China to investigate the prevalence of overweight and obesity. A group of overweight and obese children (n = 450) was selected using a cluster sampling method. Answers from a questionnaire on their and their families’ nutrition and behaviors were recorded and analyzed statistically. Results: The prevalence of overweight and obesity in schoolchildren was 27.4% and 13.2%, respectively. The prevalence of both overweight and obesity was significantly higher in boys than in girls. The prevalence of familial clustering of overweight and obesity was 75.3% and 20.3%, respectively. The prevalence of overweight in first-generation (parents) and second-generation (grandparents) relatives was 54.6% and 53.1%, respectively. There was a linear trend in the rates of overweight and obesity with age. The familial clustering of obesity was significantly associated with family income. Conclusion: The prevalence of overweight and obesity was extremely high, especially among boys and their fathers. Evidence of familial clustering of overweight and obesity among schoolchildren and their parental family members in northern China is emerging. PMID:25664106

  20. Confidence intervals for a difference between lognormal means in cluster randomization trials.

    PubMed

    Poirier, Julia; Zou, G Y; Koval, John

    2017-04-01

    Cluster randomization trials, in which intact social units are randomized to different interventions, have become popular in the last 25 years. Outcomes from these trials in many cases are positively skewed, following approximately lognormal distributions. When inference is focused on the difference between treatment arm arithmetic means, existing confidence interval procedures either make restrictive assumptions or are complex to implement. We approach this problem by assuming log-transformed outcomes from each treatment arm follow a one-way random effects model. The treatment arm means are functions of multiple parameters for which separate confidence intervals are readily available, suggesting that the method of variance estimates recovery may be applied to obtain closed-form confidence intervals. A simulation study showed that this simple approach performs well at small sample sizes in terms of empirical coverage, relatively balanced tail errors, and interval widths as compared to existing methods. The methods are illustrated using data arising from a cluster randomization trial investigating a critical pathway for the treatment of community acquired pneumonia.
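    The variance-estimates-recovery (MOVER) idea can be illustrated in the simpler iid-lognormal case (one sample, no clustering or random effects): separate confidence intervals for the log-scale mean and variance are combined into a closed-form interval for the lognormal mean exp(mu + sigma^2/2). This is a hedged sketch of the general technique, not the paper's cluster-randomized procedure; the data and sample size are invented.

```python
# Hedged sketch: MOVER confidence interval for a single lognormal mean
# (the iid case; the paper extends the same recovery idea to treatment
# arms following a one-way random effects model).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
y = np.log(rng.lognormal(mean=1.0, sigma=0.6, size=50))  # log-scale data

n = y.size
ybar, s2 = y.mean(), y.var(ddof=1)
alpha = 0.05

# Separate CIs: t-based for mu, chi-square-based for sigma^2
tcrit = stats.t.ppf(1 - alpha / 2, n - 1)
l1, u1 = ybar - tcrit * np.sqrt(s2 / n), ybar + tcrit * np.sqrt(s2 / n)
l2 = (n - 1) * s2 / stats.chi2.ppf(1 - alpha / 2, n - 1)
u2 = (n - 1) * s2 / stats.chi2.ppf(alpha / 2, n - 1)

# Recover a CI for theta = mu + sigma^2/2, then exponentiate for the mean
theta = ybar + s2 / 2
L = theta - np.sqrt((ybar - l1) ** 2 + (s2 / 2 - l2 / 2) ** 2)
U = theta + np.sqrt((u1 - ybar) ** 2 + (u2 / 2 - s2 / 2) ** 2)
print("95% CI for the lognormal mean:", (np.exp(L), np.exp(U)))
```

    The interval is closed-form, which is the practical appeal: no bootstrap or likelihood iteration is needed.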

  1. Maximizing capture of gene co-expression relationships through pre-clustering of input expression samples: an Arabidopsis case study.

    PubMed

    Feltus, F Alex; Ficklin, Stephen P; Gibson, Scott M; Smith, Melissa C

    2013-06-05

    In genomics, highly relevant gene interaction (co-expression) networks have been constructed by finding significant pair-wise correlations between genes in expression datasets. These networks are then mined to elucidate biological function at the polygenic level. In some cases networks may be constructed from input samples that measure gene expression under a variety of different conditions, such as for different genotypes, environments, disease states and tissues. When large sets of samples are obtained from public repositories it is often unmanageable to associate samples into condition-specific groups, and combining samples from various conditions has a negative effect on network size. A fixed significance threshold is often applied, also limiting the size of the final network. Therefore, we propose pre-clustering of input expression samples to approximate condition-specific grouping of samples and individual network construction of each group as a means for dynamic significance thresholding. The net effect is increased sensitivity, thus maximizing the total co-expression relationships in the final co-expression network compendium. A total of 86 Arabidopsis thaliana co-expression networks were constructed after k-means partitioning of 7,105 publicly available ATH1 Affymetrix microarray samples. We term each pre-sorted network a Gene Interaction Layer (GIL). Random Matrix Theory (RMT), an un-supervised thresholding method, was used to threshold each of the 86 networks independently, effectively providing a dynamic (non-global) threshold for the network. The overall gene count across all GILs reached 19,588 genes (94.7% measured gene coverage) and 558,022 unique co-expression relationships. In comparison, network construction without pre-sorting of input samples yielded only 3,297 genes (15.9%) and 129,134 relationships in the global network. 
Here we show that pre-clustering of microarray samples helps approximate condition-specific networks and allows for dynamic thresholding using un-supervised methods. Because RMT ensures only highly significant interactions are kept, the GIL compendium consists of 558,022 unique high quality A. thaliana co-expression relationships across almost all of the measurable genes on the ATH1 array. For A. thaliana, these networks represent the largest compendium to date of significant gene co-expression relationships, and are a means to explore complex pathway, polygenic, and pleiotropic relationships for this focal model plant. The networks can be explored at sysbio.genome.clemson.edu. Finally, this method is applicable to any large expression profile collection for any organism and is best suited where a knowledge-independent network construction method is desired.
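    The pre-clustering scheme above (partition samples with k-means, build one thresholded co-expression network per partition, then pool edges) can be sketched as follows. This is not the authors' pipeline: the data are synthetic, and a fixed correlation cutoff stands in for the Random Matrix Theory thresholding the paper actually uses.

```python
# Hedged sketch of pre-clustering for co-expression network construction:
# k-means on expression samples, a per-group gene-gene correlation network,
# and a pooled edge set. Threshold is a fixed stand-in for RMT.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
X = rng.normal(size=(120, 40))        # 120 samples x 40 genes (synthetic)

groups = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

edges = set()
for g in range(3):
    sub = X[groups == g]                      # condition-approximating group
    corr = np.corrcoef(sub, rowvar=False)     # gene-gene correlations
    thr = 0.5                                 # per-network cutoff (RMT in the paper)
    i, j = np.where(np.triu(np.abs(corr) >= thr, k=1))
    edges.update(zip(i, j))                   # pool relationships across layers
print("unique co-expression edges:", len(edges))
```

    Because each layer gets its own threshold, correlations that would be diluted in the pooled global network can still pass within their own condition-specific group, which is the mechanism behind the large gain in gene coverage reported above.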

  2. Maximizing capture of gene co-expression relationships through pre-clustering of input expression samples: an Arabidopsis case study

    PubMed Central

    2013-01-01

    Background In genomics, highly relevant gene interaction (co-expression) networks have been constructed by finding significant pair-wise correlations between genes in expression datasets. These networks are then mined to elucidate biological function at the polygenic level. In some cases networks may be constructed from input samples that measure gene expression under a variety of different conditions, such as for different genotypes, environments, disease states and tissues. When large sets of samples are obtained from public repositories it is often unmanageable to associate samples into condition-specific groups, and combining samples from various conditions has a negative effect on network size. A fixed significance threshold is often applied, also limiting the size of the final network. Therefore, we propose pre-clustering of input expression samples to approximate condition-specific grouping of samples and individual network construction of each group as a means for dynamic significance thresholding. The net effect is increased sensitivity, thus maximizing the total co-expression relationships in the final co-expression network compendium. Results A total of 86 Arabidopsis thaliana co-expression networks were constructed after k-means partitioning of 7,105 publicly available ATH1 Affymetrix microarray samples. We term each pre-sorted network a Gene Interaction Layer (GIL). Random Matrix Theory (RMT), an un-supervised thresholding method, was used to threshold each of the 86 networks independently, effectively providing a dynamic (non-global) threshold for the network. The overall gene count across all GILs reached 19,588 genes (94.7% measured gene coverage) and 558,022 unique co-expression relationships. In comparison, network construction without pre-sorting of input samples yielded only 3,297 genes (15.9%) and 129,134 relationships in the global network. 
Conclusions Here we show that pre-clustering of microarray samples helps approximate condition-specific networks and allows for dynamic thresholding using un-supervised methods. Because RMT ensures only highly significant interactions are kept, the GIL compendium consists of 558,022 unique high quality A. thaliana co-expression relationships across almost all of the measurable genes on the ATH1 array. For A. thaliana, these networks represent the largest compendium to date of significant gene co-expression relationships, and are a means to explore complex pathway, polygenic, and pleiotropic relationships for this focal model plant. The networks can be explored at sysbio.genome.clemson.edu. Finally, this method is applicable to any large expression profile collection for any organism and is best suited where a knowledge-independent network construction method is desired. PMID:23738693

  3. A Bayesian hierarchical model for mortality data from cluster-sampling household surveys in humanitarian crises.

    PubMed

    Heudtlass, Peter; Guha-Sapir, Debarati; Speybroeck, Niko

    2018-05-31

    The crude death rate (CDR) is one of the defining indicators of humanitarian emergencies. When data from vital registration systems are not available, it is common practice to estimate the CDR from household surveys with cluster-sampling design. However, sample sizes are often too small to compare mortality estimates to emergency thresholds, at least in a frequentist framework. Several authors have proposed Bayesian methods for health surveys in humanitarian crises. Here, we develop an approach specifically for mortality data and cluster-sampling surveys. We describe a Bayesian hierarchical Poisson-Gamma mixture model with generic (weakly informative) priors that could be used as default in absence of any specific prior knowledge, and compare Bayesian and frequentist CDR estimates using five different mortality datasets. We provide an interpretation of the Bayesian estimates in the context of an emergency threshold and demonstrate how to interpret parameters at the cluster level and ways in which informative priors can be introduced. With the same set of weakly informative priors, Bayesian CDR estimates are equivalent to frequentist estimates, for all practical purposes. The probability that the CDR surpasses the emergency threshold can be derived directly from the posterior of the mean of the mixing distribution. All observations in the datasets contribute to the cluster-level estimates, through the hierarchical structure of the model. In a context of sparse data, Bayesian mortality assessments already have advantages over frequentist ones when using only weakly informative priors. More informative priors offer a formal and transparent way of combining new data with existing data and expert knowledge and can help to improve decision-making in humanitarian crises by complementing frequentist estimates.
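    The core Bayesian move here, reading the threshold-exceedance probability straight off a posterior, can be shown with a non-hierarchical Gamma-Poisson conjugate update. This is a deliberately simplified sketch, not the paper's hierarchical mixture model, and all counts and prior parameters below are invented for illustration.

```python
# Hedged sketch: conjugate Gamma-Poisson update for a crude death rate and
# the posterior probability of exceeding the common emergency threshold of
# 1 death per 10,000 person-days. Counts and priors are illustrative.
import numpy as np
from scipy import stats

deaths = 19                     # total deaths observed in the survey (made up)
exposure = 150_000              # person-days at risk (made up)

a0, b0 = 0.5, 1.0               # weakly informative Gamma(shape, rate) prior
a, b = a0 + deaths, b0 + exposure   # conjugate posterior for the death rate

threshold = 1 / 10_000          # emergency threshold (deaths per person-day)
p_exceed = 1 - stats.gamma.cdf(threshold, a, scale=1 / b)
print("P(CDR > threshold | data) =", round(p_exceed, 3))
```

    In the paper's hierarchical version the same exceedance probability is computed from the posterior of the mean of the mixing distribution rather than from a single pooled rate.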

  4. Super resolution reconstruction of infrared images based on classified dictionary learning

    NASA Astrophysics Data System (ADS)

    Liu, Fei; Han, Pingli; Wang, Yi; Li, Xuan; Bai, Lu; Shao, Xiaopeng

    2018-05-01

    Infrared images always suffer from low-resolution problems resulting from limitations of imaging devices. An economical approach to combat this problem involves reconstructing high-resolution images by reasonable methods without updating devices. Inspired by compressed sensing theory, this study presents and demonstrates a Classified Dictionary Learning method to reconstruct high-resolution infrared images. It classifies features of the samples into several reasonable clusters and trains a dictionary pair for each cluster. The optimal pair of dictionaries is chosen for each image reconstruction, and therefore more satisfactory results are achieved without increasing computational complexity and time cost. Experiments and results demonstrated that it is a viable method for infrared image reconstruction since it improves image resolution and recovers detailed information of targets.
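    The classified-dictionary idea, clustering patch features and learning a separate dictionary per cluster, can be sketched with scikit-learn. This is a hedged simplification: the paper trains paired low/high-resolution dictionaries for super-resolution, whereas a single dictionary per cluster is shown here, and all patches are synthetic.

```python
# Hedged sketch of classified dictionary learning: cluster patches, learn a
# small dictionary per cluster, and sparse-code each patch with the
# dictionary of its own cluster. (The paper uses dictionary *pairs*.)
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(3)
patches = rng.normal(size=(300, 64))          # 300 synthetic 8x8 patches

k = 4
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(patches)

dicts = {}
for c in range(k):                            # one dictionary per feature cluster
    dicts[c] = MiniBatchDictionaryLearning(
        n_components=16, alpha=1.0, random_state=0
    ).fit(patches[labels == c])

# Reconstruct a patch with the dictionary selected by its cluster label
p = patches[:1]
dl = dicts[labels[0]]
recon = dl.transform(p) @ dl.components_
print("reconstruction error:", float(np.linalg.norm(p - recon)))
```

    Selecting the dictionary by cluster membership is what keeps the per-reconstruction cost flat: only one small dictionary is searched per patch, not one large global one.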

  5. Preliminary Cluster Analysis For Several Representatives Of Genus Kerivoula (Chiroptera: Vespertilionidae) in Borneo

    NASA Astrophysics Data System (ADS)

    Hasan, Noor Haliza; Abdullah, M. T.

    2008-01-01

    The aim of the study is to use cluster analysis on morphometric parameters within the genus Kerivoula to produce a dendrogram and to determine the suitability of this method to describe the relationship among species within this genus. A total of 15 adult male individuals from genus Kerivoula taken from sampling trips around Borneo and specimens kept at the zoological museum of Universiti Malaysia Sarawak were examined. A total of 27 characters using dental, skull and external body measurements were recorded. Clustering analysis illustrated the grouping and morphometric relationships between the species of this genus. It clearly separated the species from one another despite overlapping measurements for some species within the genus. Cluster analysis provides an alternative approach for making a preliminary identification of a species.
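    The analysis above, hierarchical clustering of per-individual morphometric vectors into a dendrogram, can be sketched with SciPy. The measurements below are synthetic stand-ins; only the dimensions (15 individuals, 27 characters) follow the record.

```python
# Hedged sketch: agglomerative (UPGMA) clustering of morphometric vectors
# and the linkage matrix that underlies a dendrogram. Data are synthetic.
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

rng = np.random.default_rng(4)
X = rng.normal(size=(15, 27))          # 15 individuals x 27 characters

Z = linkage(X, method="average")       # UPGMA on Euclidean distances
groups = fcluster(Z, t=3, criterion="maxclust")   # cut the tree into 3 groups
print("group sizes:", np.bincount(groups)[1:])
# dendrogram(Z) would draw the tree (requires matplotlib)
```

    Cutting the tree at different heights (or with different `maxclust` values) lets one check whether the recovered groups match putative species labels.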

  6. Clustering consumers based on trust, confidence and giving behaviour: data-driven model building for charitable involvement in the Australian not-for-profit sector.

    PubMed

    de Vries, Natalie Jane; Reis, Rodrigo; Moscato, Pablo

    2015-01-01

    Organisations in the Not-for-Profit and charity sector face increasing competition to win time, money and efforts from a common donor base. Consequently, these organisations need to be more proactive than ever. The increased level of communications between individuals and organisations today heightens the need for investigating the drivers of charitable giving and understanding the various consumer groups, or donor segments, within a population. It is contended that 'trust' is the cornerstone of the not-for-profit sector's survival, making it an inevitable topic for research in this context. It has become imperative for charities and not-for-profit organisations to adopt the research, marketing and targeting strategies of the for-profit sector. This study provides the not-for-profit sector with an easily-interpretable segmentation method based on a novel unsupervised clustering technique (MST-kNN) followed by a feature saliency method (the CM1 score). A sample of 1,562 respondents from a survey conducted by the Australian Charities and Not-for-profits Commission is analysed to reveal donor segments. Each cluster's most salient features are identified using the CM1 score. Furthermore, symbolic regression modelling is employed to find cluster-specific models to predict 'low' or 'high' involvement in clusters. The MST-kNN method found seven clusters. Based on their salient features they were labelled as: the 'non-institutionalist charities supporters', the 'resource allocation critics', the 'information-seeking financial sceptics', the 'non-questioning charity supporters', the 'non-trusting sceptics', the 'charity management believers' and the 'institutionalist charity believers'. Each cluster exhibits its own characteristics as well as different drivers of 'involvement'. The method in this study provides the not-for-profit sector with a guideline for clustering, segmenting, understanding and potentially targeting their donor base better. 
If charities and not-for-profit organisations adopt these strategies, they will be more successful in today's competitive environment.
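    A simplified, hedged sketch of graph-based segmentation in the spirit of the MST-kNN technique named above: build a minimum spanning tree over respondent distances, cut its longest edges, and read segments off the connected components. (The actual MST-kNN method intersects the MST with a k-nearest-neighbour graph; this edge-cutting variant and the synthetic data are simplifications.)

```python
# Hedged, simplified MST-based clustering sketch (not the full MST-kNN
# algorithm): cut the longest minimum-spanning-tree edges and take the
# resulting connected components as segments.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 0.3, (30, 5)),    # synthetic survey respondents
               rng.normal(3, 0.3, (30, 5))])   # two well-separated groups

D = squareform(pdist(X))
mst = minimum_spanning_tree(D).toarray()

cut = np.quantile(mst[mst > 0], 0.98)          # drop the longest MST edges
mst[mst >= cut] = 0
n_clusters, labels = connected_components(mst != 0, directed=False)
print("segments found:", n_clusters)
```

    MST-based methods need no preset cluster count; the number of segments falls out of which edges are cut, which suits exploratory donor segmentation.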

  7. Clustering Consumers Based on Trust, Confidence and Giving Behaviour: Data-Driven Model Building for Charitable Involvement in the Australian Not-For-Profit Sector

    PubMed Central

    de Vries, Natalie Jane; Reis, Rodrigo; Moscato, Pablo

    2015-01-01

    Organisations in the Not-for-Profit and charity sector face increasing competition to win time, money and efforts from a common donor base. Consequently, these organisations need to be more proactive than ever. The increased level of communications between individuals and organisations today heightens the need for investigating the drivers of charitable giving and understanding the various consumer groups, or donor segments, within a population. It is contended that 'trust' is the cornerstone of the not-for-profit sector's survival, making it an inevitable topic for research in this context. It has become imperative for charities and not-for-profit organisations to adopt the research, marketing and targeting strategies of the for-profit sector. This study provides the not-for-profit sector with an easily-interpretable segmentation method based on a novel unsupervised clustering technique (MST-kNN) followed by a feature saliency method (the CM1 score). A sample of 1,562 respondents from a survey conducted by the Australian Charities and Not-for-profits Commission is analysed to reveal donor segments. Each cluster's most salient features are identified using the CM1 score. Furthermore, symbolic regression modelling is employed to find cluster-specific models to predict 'low' or 'high' involvement in clusters. The MST-kNN method found seven clusters. Based on their salient features they were labelled as: the 'non-institutionalist charities supporters', the 'resource allocation critics', the 'information-seeking financial sceptics', the 'non-questioning charity supporters', the 'non-trusting sceptics', the 'charity management believers' and the 'institutionalist charity believers'. Each cluster exhibits its own characteristics as well as different drivers of 'involvement'. The method in this study provides the not-for-profit sector with a guideline for clustering, segmenting, understanding and potentially targeting their donor base better. 
If charities and not-for-profit organisations adopt these strategies, they will be more successful in today's competitive environment. PMID:25849547

  8. Extending the Compositional Range of Nanocasting in the Oxozirconium Cluster-Based Metal–Organic Framework NU-1000—A Comparative Structural Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhao, Wenyang; Wang, Zhao; Malonzo, Camille D.

    The process of nanocasting in metal-organic frameworks (MOFs) is a versatile approach to modify these porous materials by introducing supporting scaffolds. The nanocast scaffolds can stabilize metal-oxo clusters in MOFs at high temperatures and modulate their chemical environments. Here we demonstrate a range of nanocasting approaches in the MOF NU-1000, which contains hexanuclear oxozirconium clusters (denoted as Zr6 clusters) that are suitable for modification with other metals. We developed methods for introducing SiO2, TiO2, polymeric, and carbon scaffolds into the NU-1000 structure. The responses of NU-1000 towards different scaffold precursors were studied, including the effects on morphology, precursor distribution, and porosity after nanocasting. Upon removal of organic linkers in the MOF by calcination/pyrolysis at 500 °C or above, the Zr6 clusters remained accessible and maintained their Lewis acidity in SiO2 nanocast samples, whereas additional treatment was necessary for Zr6 clusters to become accessible in carbon nanocast samples. Aggregation of Zr6 clusters was largely prevented with SiO2 or carbon scaffolds even after thermal treatment at 500 °C or above. In the case of titania nanocasting, NU-1000 crystals underwent a pseudomorphic transformation, in which Zr6 clusters reacted with titania to form small aggregates of a Zr/Ti mixed oxide with a local structure resembling that of ZrTi2O6. The ability to maintain high densities of discrete Lewis acidic Zr6 clusters on SiO2 or carbon supports at high temperatures provides a starting point for designing new thermally stable catalysts.

  9. Reduction of Racial Disparities in Prostate Cancer

    DTIC Science & Technology

    2005-12-01

    erectile dysfunction, and female sexual dysfunction). Wherever possible, the questions and scales employed on BACH were selected from published...Methods. A racially and ethnically diverse community-based survey of adults aged 30-79 years in Boston, Massachusetts. The BACH survey has...recruited adults in three racial/ethnic groups: Latino, African American, and White using a stratified cluster sample. The target sample size is equally

  10. Relationship of Pupils' Spatial Perception and Ability with Their Performance in Geography

    ERIC Educational Resources Information Center

    Likouri, Anna-Aikaterini; Klonari, Aikaterini; Flouris, George

    2017-01-01

    The aim of this study was to investigate the correlation between pupils' spatial perception and abilities and their performance in geography. The sample was 600 6th-grade pupils from various areas of Greece selected by the cluster sampling method. The study results showed that: a) the vast majority of pupils showed low spatial ability; b) there…

  11. The Atacama Cosmology Telescope: Physical Properties and Purity of a Galaxy Cluster Sample Selected Via the Sunyaev-Zel'Dovich Effect

    NASA Technical Reports Server (NTRS)

    Menanteau, Felipe; Gonzalez, Jorge; Juin, Jean-Baptiste; Marriage, Tobias; Reese, Erik D.; Acquaviva, Viviana; Aguirre, Paula; Appel, John Willam; Baker, Andrew J.; Barrientos, L. Felipe; hide

    2010-01-01

    We present optical and X-ray properties for the first confirmed galaxy cluster sample selected by the Sunyaev-Zel'dovich Effect from 148 GHz maps over 455 square degrees of sky made with the Atacama Cosmology Telescope. These maps, coupled with multi-band imaging on 4-meter-class optical telescopes, have yielded a sample of 23 galaxy clusters with redshifts between 0.118 and 1.066. Of these 23 clusters, 10 are newly discovered. The selection of this sample is approximately mass limited and essentially independent of redshift. We provide optical positions, images, redshifts and X-ray fluxes and luminosities for the full sample, and X-ray temperatures of an important subset. The mass limit of the full sample is around 8.0 x 10^14 solar masses, with a number distribution that peaks around a redshift of 0.4. For the 10 highest significance SZE-selected cluster candidates, all of which are optically confirmed, the mass threshold is 1 x 10^15 solar masses and the redshift range is 0.167 to 1.066. Archival observations from Chandra, XMM-Newton, and ROSAT provide X-ray luminosities and temperatures that are broadly consistent with this mass threshold. Our optical follow-up procedure also allowed us to assess the purity of the ACT cluster sample. Eighty (one hundred) percent of the 148 GHz candidates with signal-to-noise ratios greater than 5.1 (5.7) are confirmed as massive clusters. The reported sample represents one of the largest SZE-selected samples of massive clusters over all redshifts within a cosmologically-significant survey volume, which will enable cosmological studies as well as future studies on the evolution, morphology, and stellar populations in the most massive clusters in the Universe.

  12. Heavy Metal Contamination in Groundwater around Industrial Estate vs Residential Areas in Coimbatore, India

    PubMed Central

    Mohankumar, K.; Rao, N. Prasada

    2016-01-01

    Introduction Water is the vital resource, necessary for all aspects of human and ecosystem survival and health. Depending on the quality, bore water may be used for human consumption, irrigation purposes and livestock watering. The quality of bore water can vary widely depending on the quality of ground water that is its source. Pollutants are being added to the ground water system through human and natural processes. Solid waste from industrial units is being dumped near the factories, which reacts with percolating rainwater and reaches the ground water. The percolating water picks up a large number of heavy metals and reaches the aquifer system and contaminates the ground water. The usage of the contaminated bore water causes diseases. Mercury, Arsenic and Cadmium are used or released by many industries. Aim This study was conducted to investigate the pollution of bore water in the industrial region (Kurichi Industrial Cluster) of Coimbatore, in the state of Tamilnadu, India. Materials and Methods Four samples were taken from residential areas around Kurichi Industrial Cluster and analysed to find the concentrations of Mercury, Arsenic and Cadmium. Four more samples were taken from other residential regions far from the industrial estate and served as controls. Samples were analysed using the Atomic absorption spectrophotometry method. Results We found that the ground water of the areas surrounding the industrial cluster does not contain significant amounts of those metals. Instead, heavy metal contamination of ground water was observed in some residential areas of Coimbatore. Conclusion The regulatory measures to contain and prevent ground water contamination by industries undertaken by the Tamilnadu pollution control board may have led to the absence of heavy metal contamination in Kurichi Industrial Cluster, Coimbatore, India. PMID:27190788

  13. Serological Markers of Sand Fly Exposure to Evaluate Insecticidal Nets against Visceral Leishmaniasis in India and Nepal: A Cluster-Randomized Trial

    PubMed Central

    Gidwani, Kamlesh; Picado, Albert; Rijal, Suman; Singh, Shri Prakash; Roy, Lalita; Volfova, Vera; Andersen, Elisabeth Wreford; Uranw, Surendra; Ostyn, Bart; Sudarshan, Medhavi; Chakravarty, Jaya; Volf, Petr; Sundar, Shyam; Boelaert, Marleen; Rogers, Matthew Edward

    2011-01-01

    Background Visceral leishmaniasis is the world's second-largest vector-borne parasitic killer and a neglected tropical disease, prevalent in poor communities. Long-lasting insecticidal nets (LNs) are a low cost proven vector intervention method for malaria control; however, their effectiveness against visceral leishmaniasis (VL) is unknown. This study quantified the effect of LNs on exposure to the sand fly vector of VL in India and Nepal during a two year community intervention trial. Methods As part of a paired-cluster randomized controlled clinical trial in VL-endemic regions of India and Nepal we tested the effect of LNs on sand fly biting by measuring the antibody response of subjects to the saliva of the Leishmania donovani vector Phlebotomus argentipes and the sympatric (non-vector) Phlebotomus papatasi. Fifteen to 20 individuals above 15 years of age from 26 VL endemic clusters were asked to provide a blood sample at baseline, 12 and 24 months post-intervention. Results A total of 305 individuals were included in the study; 68 participants provided two blood samples and 237 gave three samples. A random effect linear regression model showed that cluster-wide distribution of LNs reduced exposure to P. argentipes by 12% at 12 months (effect 0.88; 95% CI 0.83–0.94) and 9% at 24 months (effect 0.91; 95% CI 0.80–1.02) in the intervention group compared to control, adjusting for baseline values and pair. Similar results were obtained for P. papatasi. Conclusions This trial provides evidence that LNs have a limited effect on sand fly exposure in VL endemic communities in India and Nepal and supports the use of sand fly saliva antibodies as a marker to evaluate vector control interventions. PMID:21931871

  14. A clustering-based graph Laplacian framework for value function approximation in reinforcement learning.

    PubMed

    Xu, Xin; Huang, Zhenhua; Graves, Daniel; Pedrycz, Witold

    2014-12-01

    In order to deal with the sequential decision problems with large or continuous state spaces, feature representation and function approximation have been a major research topic in reinforcement learning (RL). In this paper, a clustering-based graph Laplacian framework is presented for feature representation and value function approximation (VFA) in RL. By making use of clustering-based techniques, that is, K-means clustering or fuzzy C-means clustering, a graph Laplacian is constructed by subsampling in Markov decision processes (MDPs) with continuous state spaces. The basis functions for VFA can be automatically generated from spectral analysis of the graph Laplacian. The clustering-based graph Laplacian is integrated with a class of approximation policy iteration algorithms called representation policy iteration (RPI) for RL in MDPs with continuous state spaces. Simulation and experimental results show that, compared with previous RPI methods, the proposed approach needs fewer sample points to compute an efficient set of basis functions and the learning control performance can be improved for a variety of parameter settings.
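    The construction described above, subsampling a continuous state space with k-means and using eigenvectors of a graph Laplacian over the resulting points as basis functions, can be sketched briefly. This is a hedged illustration of the general proto-value-function idea, not the paper's RPI implementation; the state samples, kernel width, and counts are all assumptions.

```python
# Hedged sketch: k-means subsampling of MDP states, a Gaussian affinity
# graph over the centers, and smooth eigenvectors of the normalized graph
# Laplacian as basis functions for value function approximation.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(6)
states = rng.uniform(-1, 1, size=(2000, 2))        # sampled MDP states

centers = KMeans(n_clusters=50, n_init=10,
                 random_state=0).fit(states).cluster_centers_

# Gaussian affinity W, then normalized Laplacian L = I - D^-1/2 W D^-1/2
d2 = ((centers[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2 / 0.1)
np.fill_diagonal(W, 0.0)
Dm = np.diag(W.sum(1) ** -0.5)
Lap = np.eye(50) - Dm @ W @ Dm

# The smoothest eigenvectors serve as basis functions over the centers
evals, evecs = np.linalg.eigh(Lap)
basis = evecs[:, :10]                              # 10 basis functions
print("basis shape:", basis.shape)
```

    Clustering first is what keeps the graph small: the Laplacian is built on 50 centers rather than 2000 raw samples, which is the sample-efficiency gain the abstract reports.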

  15. Sampling Methods in Cardiovascular Nursing Research: An Overview.

    PubMed

    Kandola, Damanpreet; Banner, Davina; O'Keefe-McCarthy, Sheila; Jassal, Debbie

    2014-01-01

    Cardiovascular nursing research covers a wide array of topics from health services to psychosocial patient experiences. The selection of specific participant samples is an important part of the research design and process. The sampling strategy employed is of utmost importance to ensure that a representative sample of participants is chosen. There are two main categories of sampling methods: probability and non-probability. Probability sampling is the random selection of elements from the population, where each element of the population has an equal and independent chance of being included in the sample. There are five main types of probability sampling: simple random sampling, systematic sampling, stratified sampling, cluster sampling, and multi-stage sampling. Non-probability sampling methods are those in which elements are chosen through non-random methods for inclusion into the research study and include convenience sampling, purposive sampling, and snowball sampling. Each approach offers distinct advantages and disadvantages and must be considered critically. In this research column, we provide an introduction to these key sampling techniques and draw on examples from cardiovascular research. Understanding the differences in sampling techniques may aid nurses in effective appraisal of research literature and provide a reference point for nurses who engage in cardiovascular research.
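    The contrast between simple random sampling and one-stage cluster sampling described in this overview can be made concrete in a few lines. The clinic/patient setup below is hypothetical, invented purely for illustration.

```python
# Hedged sketch: simple random sampling vs one-stage cluster sampling.
# In cluster sampling, whole clusters are drawn at random and every
# element of each selected cluster is included.
import random

random.seed(0)
# 20 hypothetical clinics (clusters) of 30 patients each
clusters = {c: [f"clinic{c}-pt{i}" for i in range(30)] for c in range(20)}
population = [p for members in clusters.values() for p in members]

# Simple random sampling: 90 patients drawn individually
srs = random.sample(population, 90)

# One-stage cluster sampling: 3 whole clinics, all of their patients
chosen = random.sample(list(clusters), 3)
cluster_sample = [p for c in chosen for p in clusters[c]]

print(len(srs), len(cluster_sample))   # both yield n = 90
```

    Both designs yield 90 participants, but the cluster sample visits only 3 sites, which is the logistical advantage, at the cost of within-cluster correlation that must be accounted for in the analysis.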

  16. Mapping the hot gas temperature in galaxy clusters using X-ray and Sunyaev-Zel'dovich imaging

    NASA Astrophysics Data System (ADS)

    Adam, R.; Arnaud, M.; Bartalucci, I.; Ade, P.; André, P.; Beelen, A.; Benoît, A.; Bideaud, A.; Billot, N.; Bourdin, H.; Bourrion, O.; Calvo, M.; Catalano, A.; Coiffard, G.; Comis, B.; D'Addabbo, A.; Désert, F.-X.; Doyle, S.; Ferrari, C.; Goupy, J.; Kramer, C.; Lagache, G.; Leclercq, S.; Macías-Pérez, J.-F.; Maurogordato, S.; Mauskopf, P.; Mayet, F.; Monfardini, A.; Pajot, F.; Pascale, E.; Perotto, L.; Pisano, G.; Pointecouteau, E.; Ponthieu, N.; Pratt, G. W.; Revéret, V.; Ritacco, A.; Rodriguez, L.; Romero, C.; Ruppin, F.; Schuster, K.; Sievers, A.; Triqueneaux, S.; Tucker, C.; Zylka, R.

    2017-10-01

    We propose a method to map the temperature distribution of the hot gas in galaxy clusters that uses resolved images of the thermal Sunyaev-Zel'dovich (tSZ) effect in combination with X-ray data. Application to images from the New IRAM KIDs Array (NIKA) and XMM-Newton allows us to measure and determine the spatial distribution of the gas temperature in the merging cluster MACS J0717.5+3745, at z = 0.55. Despite the complexity of the target object, we find a good morphological agreement between the temperature maps derived from X-ray spectroscopy only - using XMM-Newton (TXMM) and Chandra (TCXO) - and the new gas-mass-weighted tSZ+X-ray imaging method (TSZX). We correlate the temperatures from tSZ+X-ray imaging and those from X-ray spectroscopy alone and find that TSZX is higher than TXMM and lower than TCXO by 10% in both cases. Our results are limited by uncertainties in the geometry of the cluster gas, contamination from kinetic SZ (~10%), and the absolute calibration of the tSZ map (7%). Investigation using a larger sample of clusters would help minimise these effects.

  17. Comparative study of local atomic structures in Zr2CuxNi1-x (x = 0, 0.5, 1) metallic glasses

    NASA Astrophysics Data System (ADS)

    Huang, Yuxiang; Huang, Li; Wang, C. Z.; Kramer, M. J.; Ho, K. M.

    2015-11-01

    Extensive analysis has been performed to understand the key structural motifs accounting for the difference in glass forming ability in the Zr-Cu and Zr-Ni binary alloy systems. Here, reliable atomic structure models of Zr2CuxNi1-x (x = 0, 0.5, 1) are constructed using a combination of X-ray diffraction experiments, ab initio molecular dynamics simulations and a constrained reverse Monte Carlo method. We observe a systematic variation of the interatomic distance of different atomic pairs with respect to the alloy composition. The ideal icosahedral content in all samples is limited, despite the high content of five-fold symmetry motifs. We also demonstrate that the population of Z-clusters in Zr2Cu glass is much higher than that in the Zr2Ni and Zr2Cu0.5Ni0.5 samples. Z12 ⟨0, 0, 12, 0⟩ Voronoi polyhedra clusters prefer to form around Cu atoms, while Ni-centered clusters are more likely Z11 ⟨0, 2, 8, 1⟩ clusters, which are less energetically stable than Z12 clusters. These two different structural properties may account for the higher glass forming ability of Zr2Cu alloy compared to Zr2Ni alloy.

  18. Colonoscopy screening for colorectal cancer: the outcomes of two recruitment methods.

    PubMed

    Corbett, Mike; Chambers, Sharon L; Shadbolt, Bruce; Hillman, Lybus C; Taupin, Doug

    2004-10-18

    To determine the response to colorectal cancer (CRC) screening by colonoscopy, through direct invitation or through invitation by general practitioners. Two-way comparison of randomised population sampling versus cluster sampling of a representative general practice population in the Australian Capital Territory, May 2002 to January 2004. Invitation to screen, assessment for eligibility, interview, and colonoscopy. 881 subjects aged 55-74 years were invited to screen: 520 from the electoral roll (ER) sample and 361 from the general practice (GP) cluster sample. Response rate, participation rate, and rate of adenomatous polyps in the screened group. Participation was similar in the ER arm (35.1%; 95% CI, 30.2%-40.3%) and the GP arm (40.1%; 95% CI, 29.2%-51.0%) after correcting for ineligibility, which was higher in the ER arm. Superior eligibility in the GP arm was offset by the labour of manual record review. Response rates after two invitations were similar for the two groups (ER arm: 78.8%; 95% CI, 75.1%-82.1%; GP arm: 81.7%; 95% CI, 73.8%-89.6%). Overall, 53.4% of ineligibility arose from having had a colonoscopy in the past 10 years (ER arm, 98/178; GP arm, 42/84). Of 231 colonoscopies performed, 229 were complete, with 32% of subjects screened having adenomatous polyps. Colonoscopy-based CRC screening yields similar response and participation rates with either random population sampling or general practice cluster sampling, with population sampling through the electoral roll providing greater ease of recruitment.

  19. Distributions of Gas and Galaxies from Galaxy Clusters to Larger Scales

    NASA Astrophysics Data System (ADS)

    Patej, Anna

    2017-01-01

    We address the distributions of gas and galaxies on three scales: the outskirts of galaxy clusters, the clustering of galaxies on large scales, and the extremes of the galaxy distribution. In the outskirts of galaxy clusters, long-standing analytical models of structure formation and recent simulations predict the existence of density jumps in the gas and dark matter profiles. We use these features to derive models for the gas density profile, obtaining a simple fiducial model that is in agreement with both observations of cluster interiors and simulations of the outskirts. We next consider the galaxy density profiles of clusters; under the assumption that the galaxies in cluster outskirts follow similar collisionless dynamics as the dark matter, their distribution should show a steep jump as well. We examine the profiles of a low-redshift sample of clusters and groups, finding evidence for the jump in some of these clusters. Moving to larger scales where massive galaxies of different types are expected to trace the same large-scale structure, we present a test of this prediction by measuring the clustering of red and blue galaxies at z ~ 0.6, finding low stochasticity between the two populations. These results address a key source of systematic uncertainty - understanding how target populations of galaxies trace large-scale structure - in galaxy redshift surveys. Such surveys use baryon acoustic oscillations (BAO) as a cosmological probe, but are limited by the expense of obtaining sufficiently dense spectroscopy. With the intention of leveraging upcoming deep imaging data, we develop a new method of detecting the BAO in sparse spectroscopic samples via cross-correlation with a dense photometric catalog. This method will permit the extension of BAO measurements to higher redshifts than possible with the existing spectroscopy alone. Lastly, we connect galaxies near and far: the Local Group dwarfs and the high redshift galaxies observed by Hubble and Spitzer.
We examine how the local dwarfs may have appeared in the past and compare their properties to the detection limits of the upcoming James Webb Space Telescope (JWST), finding that JWST should be able to detect galaxies similar to the progenitors of a few of the brightest of the local galaxies, revealing a hitherto unobserved population of galaxies at high redshifts.

  20. The Assessment of the Perception of the Academic Self Efficacy of Turkish Education Graduate Students

    ERIC Educational Resources Information Center

    Gocer, Ali

    2013-01-01

    The purpose of this research is to determine the perception of the academic self efficacy of Turkish Education graduate students. This study applied a qualitative research approach and the interview method. Master's students of Erciyes University, Institute of Education Science, were chosen as a sample for this purpose, using the clustering method. In this…

  1. Re-estimating sample size in cluster randomised trials with active recruitment within clusters.

    PubMed

    van Schie, S; Moerbeek, M

    2014-08-30

    Often only a limited number of clusters can be obtained in cluster randomised trials, although many potential participants can be recruited within each cluster. Thus, active recruitment is feasible within the clusters. To obtain an efficient sample size in a cluster randomised trial, the cluster-level and individual-level variances should be known before the study starts, but this is often not the case. We suggest using an internal pilot study design to address this problem of unknown variances. A pilot can be useful to re-estimate the variances and re-calculate the sample size during the trial. Using simulated data, it is shown that an initially low or high power can be adjusted using an internal pilot, with the type I error rate remaining within an acceptable range. The intracluster correlation coefficient can be re-estimated with more precision, which has a positive effect on the sample size. We conclude that an internal pilot study design may be used if active recruitment is feasible within a limited number of clusters. Copyright © 2014 John Wiley & Sons, Ltd.
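
    The internal-pilot idea above amounts to recomputing a standard cluster-trial sample-size formula once the variances, and hence the intracluster correlation coefficient (ICC), have been re-estimated mid-trial. A hedged sketch using the textbook normal-approximation formula inflated by the design effect, not the authors' exact procedure; `required_clusters` and all numeric inputs are illustrative:

    ```python
    import math

    def required_clusters(delta, sd, icc, m):
        """Clusters per arm for a two-arm cluster randomised trial, using the
        textbook normal-approximation sample size inflated by the design effect
        1 + (m - 1) * icc, where m is the cluster size. Fixed at a two-sided
        alpha = 0.05 and power = 0.80 for simplicity."""
        z_alpha, z_beta = 1.96, 0.8416      # N(0,1) quantiles for 0.975 and 0.80
        n_indiv = 2 * (sd * (z_alpha + z_beta) / delta) ** 2  # per arm, no clustering
        deff = 1 + (m - 1) * icc            # design effect for clustering
        return math.ceil(n_indiv * deff / m)

    # Planned with a guessed ICC; revised after the internal pilot re-estimates it.
    k_planned = required_clusters(delta=0.3, sd=1.0, icc=0.02, m=20)  # 13 clusters/arm
    k_revised = required_clusters(delta=0.3, sd=1.0, icc=0.05, m=20)  # 18 clusters/arm
    ```

    The jump from 13 to 18 clusters per arm illustrates why re-estimating the ICC mid-trial can rescue an underpowered design.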

  2. A PRIOR EVALUATION OF TWO-STAGE CLUSTER SAMPLING FOR ACCURACY ASSESSMENT OF LARGE-AREA LAND-COVER MAPS

    EPA Science Inventory

    Two-stage cluster sampling reduces the cost of collecting accuracy assessment reference data by constraining sample elements to fall within a limited number of geographic domains (clusters). However, because classification error is typically positively spatially correlated, withi...

  3. Micro-Raman spectroscopy of natural and synthetic indigo samples.

    PubMed

    Vandenabeele, Peter; Moens, Luc

    2003-02-01

    In this work, indigo samples from three different sources are studied using Raman spectroscopy: the synthetic pigment and pigments from woad (Isatis tinctoria) and the indigo plant (Indigofera tinctoria). 21 samples were obtained from 8 suppliers; for each sample, 5 Raman spectra were recorded and used for further chemometric analysis. Principal components analysis (PCA) was performed as a data-reduction method before applying hierarchical cluster analysis. Linear discriminant analysis (LDA) was implemented as a non-hierarchical supervised pattern recognition method to build a classification model. In order to avoid broad-shaped interferences from the fluorescence background, the influence of 1st and 2nd derivatives on the classification was studied using cross-validation. Although the natural and synthetic pigments are chemically identical, it is shown that Raman spectroscopy in combination with suitable chemometric methods has the potential to discriminate between synthetic and natural indigo samples.

  4. Cluster Stability Estimation Based on a Minimal Spanning Trees Approach

    NASA Astrophysics Data System (ADS)

    Volkovich, Zeev (Vladimir); Barzily, Zeev; Weber, Gerhard-Wilhelm; Toledano-Kitai, Dvora

    2009-08-01

    Among the areas of data and text mining employed today in science, economics and technology, clustering serves as a preprocessing step in data analysis. However, many open questions still await theoretical and practical treatment; e.g., the problem of determining the true number of clusters has not been satisfactorily solved. In the current paper, this problem is addressed by the cluster stability approach. For several possible numbers of clusters, we estimate the stability of partitions obtained from clustering of samples. Partitions are considered consistent if their clusters are stable. Cluster validity is measured as the total number of edges, in the clusters' minimal spanning trees, connecting points from different samples; specifically, we use the Friedman and Rafsky two-sample test statistic. The homogeneity hypothesis, of well-mingled samples within the clusters, leads to an asymptotic normal distribution of this statistic. Based on this fact, the standard score of the mentioned edge count is computed, and the partition quality is represented by the worst cluster, corresponding to the minimal standard score value. It is natural to expect that the true number of clusters can be characterized by the empirical distribution having the shortest left tail. The proposed methodology sequentially creates the described value distribution and estimates its left asymmetry. Numerical experiments presented in the paper demonstrate the ability of the approach to detect the true number of clusters.
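
    The core quantity here, the number of minimal-spanning-tree edges joining points from different samples, is the Friedman-Rafsky two-sample statistic the abstract names. A small sketch of how it can be counted with scipy; the function name and the toy data are ours, assuming two samples with 0/1 labels:

    ```python
    import numpy as np
    from scipy.sparse.csgraph import minimum_spanning_tree
    from scipy.spatial.distance import pdist, squareform

    def cross_sample_edges(points, labels):
        """Friedman-Rafsky statistic: count the edges of the MST built over the
        pooled points that join observations from different samples (labels 0/1).
        Few cross edges relative to the 'well mingled' null suggests the two
        samples occupy distinct regions, i.e. unstable mixing."""
        d = squareform(pdist(points))               # dense pairwise distances
        mst = minimum_spanning_tree(d).tocoo()      # n - 1 undirected edges
        return int(sum(labels[i] != labels[j] for i, j in zip(mst.row, mst.col)))

    # Two well-separated samples: by the MST cut property the tree crosses
    # between them exactly once.
    rng = np.random.default_rng(0)
    a = rng.normal(0.0, 0.1, size=(20, 2))
    b = rng.normal(5.0, 0.1, size=(20, 2))
    pts = np.vstack([a, b])
    lab = np.array([0] * 20 + [1] * 20)
    print(cross_sample_edges(pts, lab))   # -> 1
    ```

    In the paper's setting this count is standardised (z-scored) against its null mean and variance; the raw count above is the ingredient that gets standardised.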

  5. 2-Way k-Means as a Model for Microbiome Samples.

    PubMed

    Jackson, Weston J; Agarwal, Ipsita; Pe'er, Itsik

    2017-01-01

    Motivation. Microbiome sequencing allows defining clusters of samples with shared composition. However, this paradigm poorly accounts for samples whose composition is a mixture of cluster-characterizing ones and which therefore lie in between them in the cluster space. This paper addresses unsupervised learning of 2-way clusters. It defines a mixture model that allows 2-way cluster assignment and describes a variant of generalized k-means for learning such a model. We demonstrate applicability to microbial 16S rDNA sequencing data from the Human Vaginal Microbiome Project.
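
    The 2-way assignment idea can be sketched as a generalized k-means in which each point may attach either to one centroid or to the midpoint of a centroid pair. This is a simplified illustration of the concept described in the abstract, not the authors' algorithm; the function name and toy data are ours:

    ```python
    import numpy as np

    def two_way_kmeans(X, C, iters=20):
        """Generalized k-means allowing 2-way assignment: each point attaches to
        a single centroid or to the midpoint of a centroid pair, so 'in between'
        samples are modelled explicitly. A simplified sketch of the idea, not
        the paper's exact mixture-model procedure."""
        X, C = np.asarray(X, float), np.asarray(C, float)
        k = len(C)
        targets = [(i,) for i in range(k)] + \
                  [(i, j) for i in range(k) for j in range(i + 1, k)]
        for _ in range(iters):
            means = np.array([C[list(t)].mean(axis=0) for t in targets])
            assign = ((X[:, None] - means[None]) ** 2).sum(-1).argmin(1)
            for i in range(k):  # re-estimate each centroid from every point it serves
                mask = np.array([i in targets[a] for a in assign])
                if mask.any():
                    C[i] = X[mask].mean(axis=0)
        return C, [targets[a] for a in assign]

    # Two tight 1-D clusters plus one sample halfway between them.
    C, assign = two_way_kmeans([[0.0], [0.1], [10.0], [10.1], [5.0]],
                               C=[[0.0], [10.0]])
    print(assign[-1])   # the in-between sample gets the 2-way assignment (0, 1)
    ```

    The in-between point would distort either single cluster under plain k-means; here it is absorbed by the pair target instead.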

  6. 2-Way k-Means as a Model for Microbiome Samples

    PubMed Central

    2017-01-01

    Motivation. Microbiome sequencing allows defining clusters of samples with shared composition. However, this paradigm poorly accounts for samples whose composition is a mixture of cluster-characterizing ones and which therefore lie in between them in the cluster space. This paper addresses unsupervised learning of 2-way clusters. It defines a mixture model that allows 2-way cluster assignment and describes a variant of generalized k-means for learning such a model. We demonstrate applicability to microbial 16S rDNA sequencing data from the Human Vaginal Microbiome Project. PMID:29177026

  7. X-Ray Morphological Analysis of the Planck ESZ Clusters

    NASA Astrophysics Data System (ADS)

    Lovisari, Lorenzo; Forman, William R.; Jones, Christine; Ettori, Stefano; Andrade-Santos, Felipe; Arnaud, Monique; Démoclès, Jessica; Pratt, Gabriel W.; Randall, Scott; Kraft, Ralph

    2017-09-01

    X-ray observations show that galaxy clusters have a very large range of morphologies. The most disturbed systems, which are well suited to studying how clusters form and grow and to testing physical models, may complicate cosmological studies because the cluster mass determination becomes more challenging. Thus, we need to understand the cluster properties of our samples to reduce possible biases. This is complicated by the fact that different experiments may detect different cluster populations. For example, Sunyaev-Zeldovich (SZ) selected cluster samples have been found to include a greater fraction of disturbed systems than X-ray selected samples. In this paper we determine eight morphological parameters for the Planck Early Sunyaev-Zeldovich (ESZ) objects observed with XMM-Newton. We find that two parameters, concentration and centroid shift, are the most effective at distinguishing between relaxed and disturbed systems. For each parameter we provide the values that allow selecting the most relaxed or most disturbed objects from a sample. We find no dependence of the cluster dynamical state on mass. By comparing our results with what was obtained with REXCESS clusters, we also confirm that the ESZ clusters indeed tend to be more disturbed, as found by previous studies.

  8. X-Ray Morphological Analysis of the Planck ESZ Clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lovisari, Lorenzo; Forman, William R.; Jones, Christine

    2017-09-01

    X-ray observations show that galaxy clusters have a very large range of morphologies. The most disturbed systems, which are well suited to studying how clusters form and grow and to testing physical models, may complicate cosmological studies because the cluster mass determination becomes more challenging. Thus, we need to understand the cluster properties of our samples to reduce possible biases. This is complicated by the fact that different experiments may detect different cluster populations. For example, Sunyaev–Zeldovich (SZ) selected cluster samples have been found to include a greater fraction of disturbed systems than X-ray selected samples. In this paper we determine eight morphological parameters for the Planck Early Sunyaev–Zeldovich (ESZ) objects observed with XMM-Newton. We find that two parameters, concentration and centroid shift, are the most effective at distinguishing between relaxed and disturbed systems. For each parameter we provide the values that allow selecting the most relaxed or most disturbed objects from a sample. We find no dependence of the cluster dynamical state on mass. By comparing our results with what was obtained with REXCESS clusters, we also confirm that the ESZ clusters indeed tend to be more disturbed, as found by previous studies.

  9. Sampling procedures for inventory of commercial volume tree species in Amazon Forest.

    PubMed

    Netto, Sylvio P; Pelissari, Allan L; Cysneiros, Vinicius C; Bonazza, Marcelo; Sanquetta, Carlos R

    2017-01-01

    The spatial distribution of tropical tree species can affect the consistency of the estimators in commercial forest inventories; therefore, appropriate sampling procedures are required to survey species with different spatial patterns in the Amazon Forest. The present study therefore aims to evaluate conventional sampling procedures and to introduce adaptive cluster sampling for volumetric inventories of Amazonian tree species, considering the hypotheses that density, spatial distribution and zero-plots affect the consistency of the estimators, and that adaptive cluster sampling allows more accurate volumetric estimates to be obtained. We use data from a census carried out in Jamari National Forest, Brazil, where trees with diameters equal to or greater than 40 cm were measured in 1,355 plots. Species with different spatial patterns were selected and sampled with simple random sampling, systematic sampling, linear cluster sampling and adaptive cluster sampling, whereby the accuracy of the volumetric estimation and the presence of zero-plots were evaluated. The sampling procedures applied to these species were affected by the low density of trees and the large number of zero-plots, whereas adaptive cluster sampling concentrated the sampling effort in plots containing trees and thus yielded more representative samples for estimating the commercial volume.
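
    The mechanism that concentrates effort in occupied plots is the defining step of adaptive cluster sampling: whenever a sampled plot meets the trigger condition, its neighbours are added to the sample, and the expansion repeats until the network and its edge cells are exhausted. A toy raster sketch of that expansion rule (function name, grid, and trigger are ours, for illustration only):

    ```python
    import numpy as np

    def adaptive_cluster_sample(grid, initial, condition=lambda v: v > 0):
        """Adaptive cluster sampling on a raster of plots: start from an initial
        sample of cells and, whenever a sampled cell satisfies the condition
        (e.g. contains trees of the target species), add its 4-neighbours,
        repeating until each triggered network plus its edge cells is exhausted."""
        rows, cols = grid.shape
        sampled, frontier = set(), list(initial)
        while frontier:
            r, c = frontier.pop()
            if (r, c) in sampled or not (0 <= r < rows and 0 <= c < cols):
                continue
            sampled.add((r, c))
            if condition(grid[r, c]):   # expand around plots meeting the criterion
                frontier += [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
        return sampled

    # Toy landscape: one clump of trees; one of the two initial plots hits it.
    grid = np.zeros((5, 5), dtype=int)
    grid[1:3, 1:3] = 3   # a clustered species, 3 stems per occupied plot
    plots = adaptive_cluster_sample(grid, initial=[(0, 0), (1, 1)])
    ```

    Note that design-unbiased estimation then requires estimators that account for the adaptive selection (e.g. Hansen-Hurwitz or Horvitz-Thompson forms), not the plain sample mean.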

  10. Adaptive Cluster Sampling for Forest Inventories

    Treesearch

    Francis A. Roesch

    1993-01-01

    Adaptive cluster sampling is shown to be a viable alternative for sampling forests when there are rare characteristics of the forest trees which are of interest and occur on clustered trees. The ideas of recent work in Thompson (1990) have been extended to the case in which the initial sample is selected with unequal probabilities. An example is given in which the...

  11. Just the right age: well-clustered exposure ages from a global glacial 10Be compilation

    NASA Astrophysics Data System (ADS)

    Heyman, Jakob; Margold, Martin

    2017-04-01

    Cosmogenic exposure dating has been used extensively for defining glacial chronologies, both in ice sheet and alpine settings, and the global set of published ages today reaches well beyond 10,000 samples. Over the last few years, a number of important developments have improved the measurements (with well-defined AMS standards) and exposure age calculations (with updated data and methods for calculating production rates), in the best case enabling high precision dating of past glacial events. A remaining problem, however, is the fact that a large portion of all dated samples have been affected by prior and/or incomplete exposure, yielding erroneous exposure ages under the standard assumptions. One way to address this issue is to only use exposure ages that can be confidently considered as unaffected by prior/incomplete exposure, such as groups of samples with statistically identical ages. Here we use objective statistical criteria to identify groups of well-clustered exposure ages from the global glacial "expage" 10Be compilation. Out of ˜1700 groups with at least 3 individual samples ˜30% are well-clustered, increasing to ˜45% if allowing outlier rejection of a maximum of 1/3 of the samples (still requiring a minimum of 3 well-clustered ages). The dataset of well-clustered ages is heavily dominated by ages <30 ka, showing that well-defined cosmogenic chronologies primarily exist for the last glaciation. We observe a large-scale global synchronicity in the timing of the last deglaciation from ˜20 to 10 ka. There is also a general correlation between the timing of deglaciation and latitude (or size of the individual ice mass), with earlier deglaciation in lower latitudes and later deglaciation towards the poles. Grouping the data into regions and comparing with available paleoclimate data we can start to untangle regional differences in the last deglaciation and the climate events controlling the ice mass loss. 
The extensive dataset and the statistical analysis enables an unprecedented global view on the last deglaciation.

  12. Buccal swabbing as a noninvasive method to determine bacterial, archaeal, and eukaryotic microbial community structures in the rumen.

    PubMed

    Kittelmann, Sandra; Kirk, Michelle R; Jonker, Arjan; McCulloch, Alan; Janssen, Peter H

    2015-11-01

    Analysis of rumen microbial community structure based on small-subunit rRNA marker genes in metagenomic DNA samples provides important insights into the dominant taxa present in the rumen and allows assessment of community differences between individuals or in response to treatments applied to ruminants. However, natural animal-to-animal variation in rumen microbial community composition can limit the power of a study considerably, especially when only subtle differences are expected between treatment groups. Thus, trials with large numbers of animals may be necessary to overcome this variation. Because ruminants pass large amounts of rumen material to their oral cavities when they chew their cud, oral samples may contain good representations of the rumen microbiota and be useful in lieu of rumen samples to study rumen microbial communities. We compared bacterial, archaeal, and eukaryotic community structures in DNAs extracted from buccal swabs to those in DNAs from samples collected directly from the rumen by use of a stomach tube for sheep on four different diets. After bioinformatic depletion of potential oral taxa from libraries of samples collected via buccal swabs, bacterial communities showed significant clustering by diet (R = 0.37; analysis of similarity [ANOSIM]) rather than by sampling method (R = 0.07). Archaeal, ciliate protozoal, and anaerobic fungal communities also showed significant clustering by diet rather than by sampling method, even without adjustment for potentially orally associated microorganisms. These findings indicate that buccal swabs may in future allow quick and noninvasive sampling for analysis of rumen microbial communities in large numbers of ruminants. Copyright © 2015, American Society for Microbiology. All Rights Reserved.
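
    The R values quoted above (R = 0.37 for diet versus R = 0.07 for sampling method) are ANOSIM statistics, which compare the mean ranks of between-group and within-group dissimilarities. A compact sketch of Clarke's R on a toy distance matrix; this is our own minimal implementation for illustration, not the software used in the study:

    ```python
    import numpy as np
    from itertools import combinations
    from scipy.stats import rankdata

    def anosim_r(dist, groups):
        """Clarke's ANOSIM statistic R = (rB - rW) / (M / 2), where rB and rW
        are the mean ranks of between- and within-group dissimilarities and
        M = n(n-1)/2 is the number of pairs. R near 1 means strong grouping;
        R near 0 means none (significance needs a permutation test on top)."""
        pairs = list(combinations(range(len(groups)), 2))
        ranks = rankdata([dist[i, j] for i, j in pairs])
        between = np.array([groups[i] != groups[j] for i, j in pairs])
        return (ranks[between].mean() - ranks[~between].mean()) / (len(pairs) / 2)

    # Toy example: two perfectly separated 1-D "communities".
    x = np.array([0.0, 0.1, 10.0, 10.1])
    D = np.abs(x[:, None] - x[None, :])
    print(anosim_r(D, np.array([0, 0, 1, 1])))   # -> 1.0 (complete separation)
    ```

    In the study's terms, R = 0.37 by diet with R = 0.07 by sampling method says the diet signal dominates whatever the buccal-swab versus stomach-tube choice introduces.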

  13. [Study on discrimination of varieties of fire resistive coating for steel structure based on near-infrared spectroscopy].

    PubMed

    Xue, Gang; Song, Wen-qi; Li, Shu-chao

    2015-01-01

    In order to achieve rapid identification of fire resistive coating for steel structure of different brands in circulation, a new method for the fast discrimination of varieties of fire resistive coating for steel structure by means of near infrared spectroscopy was proposed. A raster-scanning near infrared spectroscopy instrument and near infrared diffuse reflectance spectroscopy were applied to collect the spectral curves of different brands of fire resistive coating for steel structure, and the spectral data were preprocessed with standard normal variate (SNV) transformation and the Norris second derivative. Principal component analysis (PCA) was applied to the near infrared spectra for cluster analysis. The analysis showed that the cumulative reliability of PC1 to PC5 was 99.791%. A 3-dimensional plot was drawn with the scores of PC1, PC2 and PC3 × 10, which appeared to provide the best clustering of the varieties of fire resistive coating for steel structure. A total of 150 fire resistive coating samples were divided randomly into a calibration set and a validation set; the calibration set had 125 samples with 25 samples of each variety, and the validation set had 25 samples with 5 samples of each variety. According to the principal component scores of unknown samples, Mahalanobis distance values between each variety and the unknown samples were calculated to realize the discrimination of different varieties. The qualitative analysis model achieved a 10% recognition ratio in external verification of unknown samples. The results demonstrated that this identification method can be used as a rapid, accurate method to identify the classification of fire resistive coating for steel structure and provide a technical reference for market regulation.
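
    The classification step described, assigning an unknown sample to the variety whose PC-score mean is nearest in Mahalanobis distance, is a generic chemometric pattern. A hedged sketch with synthetic two-dimensional "scores"; the function, data, and class layout are hypothetical, not taken from the paper:

    ```python
    import numpy as np

    def mahalanobis_classify(scores, class_means, cov):
        """Assign each principal-component score vector to the class whose mean
        is nearest in Mahalanobis distance d^2 = (s - m)^T C^-1 (s - m).
        A generic sketch of the scheme the abstract describes."""
        inv = np.linalg.inv(cov)
        d2 = [[(s - m) @ inv @ (s - m) for m in class_means] for s in scores]
        return np.argmin(d2, axis=1)

    # Synthetic PC scores for two well-separated "brands" (hypothetical data).
    rng = np.random.default_rng(1)
    A = rng.normal([0.0, 0.0], 0.2, size=(20, 2))   # brand A score cloud
    B = rng.normal([3.0, 3.0], 0.2, size=(20, 2))   # brand B score cloud
    X = np.vstack([A, B])
    pred = mahalanobis_classify(X, [A.mean(0), B.mean(0)], np.cov(X.T))
    ```

    Using the covariance matrix (rather than Euclidean distance on raw scores) makes the rule account for the very different variances the leading PCs typically carry.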

  14. Methamphetamine injecting is associated with phylogenetic clustering of hepatitis C virus infection among street-involved youth in Vancouver, Canada*

    PubMed Central

    Cunningham, Evan; Jacka, Brendan; DeBeck, Kora; Applegate, Tanya A; Harrigan, P. Richard; Krajden, Mel; Marshall, Brandon DL; Montaner, Julio; Lima, Viviane Dias; Olmstead, Andrea; Milloy, M-J; Wood, Evan; Grebely, Jason

    2015-01-01

    Background Among prospective cohorts of people who inject drugs (PWID), phylogenetic clustering of HCV infection has been observed. However, the majority of studies have included older PWID, representing distant transmission events. The aim of this study was to investigate phylogenetic clustering of HCV infection among a cohort of street-involved youth. Methods Data were derived from a prospective cohort of street-involved youth aged 14–26 recruited between 2005 and 2012 in Vancouver, Canada (At Risk Youth Study, ARYS). HCV RNA testing and sequencing (Core-E2) were performed on HCV positive participants. Phylogenetic trees were inferred using maximum likelihood methods and clusters were identified using ClusterPicker (Core-E2 without HVR1, 90% bootstrap threshold, 0.05 genetic distance threshold). Results Among 945 individuals enrolled in ARYS, 16% (n=149, 100% recent injectors) were HCV antibody positive at baseline interview (n=86) or seroconverted during follow-up (n=63). Among HCV antibody positive participants with available samples (n=131), 75% (n=98) had detectable HCV RNA and 66% (n=65, mean age 23, 58% with recent methamphetamine injection, 31% female, 3% HIV+) had available Core-E2 sequences. Of those with Core-E2 sequence, 14% (n=9) were in a cluster (one cluster of three) or pair (two pairs), with all reporting recent methamphetamine injection. Recent methamphetamine injection was associated with membership in a cluster or pair (P=0.009). Conclusion In this study of street-involved youth with HCV infection and recent injecting, 14% demonstrated phylogenetic clustering. Phylogenetic clustering was associated with recent methamphetamine injection, suggesting that methamphetamine drug injection may play an important role in networks of HCV transmission. PMID:25977204

  15. Sample size adjustments for varying cluster sizes in cluster randomized trials with binary outcomes analyzed with second-order PQL mixed logistic regression.

    PubMed

    Candel, Math J J M; Van Breukelen, Gerard J P

    2010-06-30

    Adjustments of sample size formulas are given for varying cluster sizes in cluster randomized trials with a binary outcome when testing the treatment effect with mixed effects logistic regression using second-order penalized quasi-likelihood estimation (PQL). Starting from first-order marginal quasi-likelihood (MQL) estimation of the treatment effect, the asymptotic relative efficiency of unequal versus equal cluster sizes is derived. A Monte Carlo simulation study shows this asymptotic relative efficiency to be rather accurate for realistic sample sizes, when employing second-order PQL. An approximate, simpler formula is presented to estimate the efficiency loss due to varying cluster sizes when planning a trial. In many cases sampling 14 per cent more clusters is sufficient to repair the efficiency loss due to varying cluster sizes. Since current closed-form formulas for sample size calculation are based on first-order MQL, planning a trial also requires a conversion factor to obtain the variance of the second-order PQL estimator. In a second Monte Carlo study, this conversion factor turned out to be 1.25 at most. (c) 2010 John Wiley & Sons, Ltd.
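
    The arithmetic behind "sampling more clusters to repair the efficiency loss" can be illustrated with the widely used design-effect approximation for unequal cluster sizes, DEFF = 1 + ((CV² + 1)·m̄ − 1)·ρ, where CV is the coefficient of variation of cluster size. This is the generic formula, not the paper's second-order PQL derivation; the numbers are illustrative:

    ```python
    def design_effect(mean_size, cv, icc):
        """Approximate design effect with unequal cluster sizes:
        DEFF = 1 + ((cv**2 + 1) * mean_size - 1) * icc,
        where cv is the coefficient of variation of cluster size and icc the
        intracluster correlation. A standard approximation, not the paper's
        mixed-logistic/PQL-specific result."""
        return 1 + ((cv ** 2 + 1) * mean_size - 1) * icc

    equal = design_effect(mean_size=20, cv=0.0, icc=0.05)    # 1.95
    unequal = design_effect(mean_size=20, cv=0.5, icc=0.05)  # 2.20
    extra = unequal / equal - 1   # ~0.13: roughly 13% more clusters, the same
                                  # order as the paper's "14 per cent" figure
    ```

    With moderate size variation (CV around 0.5) the inflation lands near the 14 per cent rule of thumb the abstract reports; larger CVs push it higher.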

  16. Quality Evaluation of Potentilla fruticosa L. by High Performance Liquid Chromatography Fingerprinting Associated with Chemometric Methods.

    PubMed

    Liu, Wei; Wang, Dongmei; Liu, Jianjun; Li, Dengwu; Yin, Dongxue

    2016-01-01

    The present study was performed to assess the quality of Potentilla fruticosa L. sampled from distinct regions of China using high performance liquid chromatography (HPLC) fingerprinting coupled with a suite of chemometric methods. For this quantitative analysis, the main active phytochemical compositions and the antioxidant activity in P. fruticosa were also investigated. Considering the high percentages and antioxidant activities of phytochemicals, P. fruticosa samples from Kangding, Sichuan were selected as the most valuable raw materials. Similarity analysis (SA) of HPLC fingerprints, hierarchical cluster analysis (HCA), principal component analysis (PCA), and discriminant analysis (DA) were further employed to provide accurate classification and quality estimates of P. fruticosa. Two principal components (PCs) were collected by PCA. PC1 separated samples from Kangding, Sichuan, capturing 57.64% of the variance, whereas PC2 contributed to further separation, capturing 18.97% of the variance. Two kinds of discriminant functions with a 100% discrimination ratio were constructed. The results strongly supported the conclusion that the eight samples from different regions were clustered into three major groups, corresponding with their morphological classification, for which HPLC analysis confirmed the considerable variation in phytochemical compositions and that P. fruticosa samples from Kangding, Sichuan were of high quality. The results of SA, HCA, PCA, and DA were in agreement and performed well for the quality assessment of P. fruticosa. Consequently, HPLC fingerprinting coupled with chemometric techniques provides a highly flexible and reliable method for the quality evaluation of traditional Chinese medicines.

  17. Quality Evaluation of Potentilla fruticosa L. by High Performance Liquid Chromatography Fingerprinting Associated with Chemometric Methods

    PubMed Central

    Liu, Wei; Wang, Dongmei; Liu, Jianjun; Li, Dengwu; Yin, Dongxue

    2016-01-01

    The present study was performed to assess the quality of Potentilla fruticosa L. sampled from distinct regions of China using high performance liquid chromatography (HPLC) fingerprinting coupled with a suite of chemometric methods. For this quantitative analysis, the main active phytochemical compositions and the antioxidant activity in P. fruticosa were also investigated. Considering the high percentages and antioxidant activities of phytochemicals, P. fruticosa samples from Kangding, Sichuan were selected as the most valuable raw materials. Similarity analysis (SA) of HPLC fingerprints, hierarchical cluster analysis (HCA), principal component analysis (PCA), and discriminant analysis (DA) were further employed to provide accurate classification and quality estimates of P. fruticosa. Two principal components (PCs) were collected by PCA. PC1 separated samples from Kangding, Sichuan, capturing 57.64% of the variance, whereas PC2 contributed to further separation, capturing 18.97% of the variance. Two kinds of discriminant functions with a 100% discrimination ratio were constructed. The results strongly supported the conclusion that the eight samples from different regions were clustered into three major groups, corresponding with their morphological classification, for which HPLC analysis confirmed the considerable variation in phytochemical compositions and that P. fruticosa samples from Kangding, Sichuan were of high quality. The results of SA, HCA, PCA, and DA were in agreement and performed well for the quality assessment of P. fruticosa. Consequently, HPLC fingerprinting coupled with chemometric techniques provides a highly flexible and reliable method for the quality evaluation of traditional Chinese medicines. PMID:26890416

  18. Searching for the 3.5 keV Line in the Stacked Suzaku Observations of Galaxy Clusters

    NASA Technical Reports Server (NTRS)

    Bulbul, Esra; Markevitch, Maxim; Foster, Adam; Miller, Eric; Bautz, Mark; Lowenstein, Mike; Randall, Scott W.; Smith, Randall K.

    2016-01-01

    We perform a detailed study of the stacked Suzaku observations of 47 galaxy clusters, spanning a redshift range of 0.01-0.45, to search for the unidentified 3.5 keV line. This sample provides an independent test for the previously detected line. We detect a 2σ-significant spectral feature at 3.5 keV in the spectrum of the full sample. When the sample is divided into two subsamples (cool-core and non-cool-core clusters), the cool-core subsample shows no statistically significant positive residuals at the line energy. A very weak (≈2σ confidence) spectral feature at 3.5 keV is permitted by the data from the non-cool-core cluster sample. The upper limit on a neutrino decay mixing angle of sin²(2θ) = 6.1 × 10⁻¹¹ from the full Suzaku sample is consistent with the previous detections in the stacked XMM-Newton sample of galaxy clusters (which had a higher statistical sensitivity to faint lines), M31, and the Galactic center, at a 90% confidence level. However, the constraint from the present sample, which does not include the Perseus cluster, is in tension with the line flux previously reported in the core of the Perseus cluster with XMM-Newton and Suzaku.

  19. Adaptive sampling in research on risk-related behaviors.

    PubMed

    Thompson, Steven K; Collins, Linda M

    2002-11-01

    This article introduces adaptive sampling designs to substance use researchers. Adaptive sampling is particularly useful when the population of interest is rare, unevenly distributed, hidden, or hard to reach. Examples of such populations are injection drug users, individuals at high risk for HIV/AIDS, and young adolescents who are nicotine dependent. In conventional sampling, the sampling design is based entirely on a priori information, and is fixed before the study begins. By contrast, in adaptive sampling, the sampling design adapts based on observations made during the survey; for example, drug users may be asked to refer other drug users to the researcher. In the present article several adaptive sampling designs are discussed. Link-tracing designs such as snowball sampling, random walk methods, and network sampling are described, along with adaptive allocation and adaptive cluster sampling. It is stressed that special estimation procedures taking the sampling design into account are needed when adaptive sampling has been used. These procedures yield estimates that are considerably better than conventional estimates. For rare and clustered populations adaptive designs can give substantial gains in efficiency over conventional designs, and for hidden populations link-tracing and other adaptive procedures may provide the only practical way to obtain a sample large enough for the study objectives.

  20. Application of advanced sampling and analysis methods to predict the structure of adsorbed protein on a material surface

    PubMed Central

    Abramyan, Tigran M.; Hyde-Volpe, David L.; Stuart, Steven J.; Latour, Robert A.

    2017-01-01

    The use of standard molecular dynamics simulation methods to predict the interactions of a protein with a material surface has the inherent limitations of lacking the ability to determine the most likely conformations and orientations of the adsorbed protein on the surface and to determine the level of convergence attained by the simulation. In addition, standard mixing rules are typically applied to combine the nonbonded force field parameters of the solution and solid phases of the system to represent interfacial behavior without validation. As a means to circumvent these problems, the authors demonstrate the application of an efficient advanced sampling method (TIGER2A) for the simulation of the adsorption of hen egg-white lysozyme on a crystalline (110) high-density polyethylene surface plane. Simulations are conducted to generate a Boltzmann-weighted ensemble of sampled states using force field parameters that were validated to represent interfacial behavior for this system. The resulting ensembles of sampled states were then analyzed using an in-house-developed cluster analysis method to predict the most probable orientations and conformations of the protein on the surface based on the amount of sampling performed, from which free energy differences between the adsorbed states could be calculated. In addition, by conducting two independent sets of TIGER2A simulations combined with cluster analyses, the authors demonstrate a method to estimate the degree of convergence achieved for a given amount of sampling. The results from these simulations demonstrate that these methods enable the most probable orientations and conformations of an adsorbed protein to be predicted and that the use of our validated interfacial force field parameter set provides closer agreement to available experimental results compared to using standard CHARMM force field parameterization to represent molecular behavior at the interface. PMID:28514864

  1. CA II TRIPLET SPECTROSCOPY OF SMALL MAGELLANIC CLOUD RED GIANTS. III. ABUNDANCES AND VELOCITIES FOR A SAMPLE OF 14 CLUSTERS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Parisi, M. C.; Clariá, J. J.; Marcionni, N.

    2015-05-15

    We obtained spectra of red giants in 15 Small Magellanic Cloud (SMC) clusters in the region of the Ca ii lines with FORS2 on the Very Large Telescope. We determined the mean metallicity and radial velocity with mean errors of 0.05 dex and 2.6 km s⁻¹, respectively, from a mean of 6.5 members per cluster. One cluster (B113) was too young for a reliable metallicity determination and was excluded from the sample. We combined the sample studied here with 15 clusters previously studied by us using the same technique, and with 7 clusters whose metallicities determined by other authors are on a scale similar to ours. This compilation of 36 clusters is the largest SMC cluster sample currently available with accurate and homogeneously determined metallicities. We found a high probability that the metallicity distribution is bimodal, with potential peaks at −1.1 and −0.8 dex. Our data show no strong evidence of a metallicity gradient in the SMC clusters, somewhat at odds with recent evidence from Ca ii triplet spectra of a large sample of field stars. This may be revealing possible differences in the chemical history of clusters and field stars. Our clusters show a significant dispersion of metallicities, whatever age is considered, which could be reflecting the lack of a unique age–metallicity relation in this galaxy. None of the chemical evolution models currently available in the literature satisfactorily represents the global chemical enrichment processes of SMC clusters.

  2. The Hubble Space Telescope Medium Deep Survey Cluster Sample: Methodology and Data

    NASA Astrophysics Data System (ADS)

    Ostrander, E. J.; Nichol, R. C.; Ratnatunga, K. U.; Griffiths, R. E.

    1998-12-01

    We present a new, objectively selected, sample of galaxy overdensities detected in the Hubble Space Telescope Medium Deep Survey (MDS). These clusters/groups were found using an automated procedure that involved searching for statistically significant galaxy overdensities. The contrast of the clusters against the field galaxy population is increased when morphological data are used to search around bulge-dominated galaxies. In total, we present 92 overdensities above a probability threshold of 99.5%. We show, via extensive Monte Carlo simulations, that at least 60% of these overdensities are likely to be real clusters and groups and not random line-of-sight superpositions of galaxies. For each overdensity in the MDS cluster sample, we provide a richness and the average of the bulge-to-total ratio of galaxies within each system. This MDS cluster sample potentially contains some of the most distant clusters/groups ever detected, with about 25% of the overdensities having estimated redshifts z > ~0.9. We have made this sample publicly available to facilitate spectroscopic confirmation of these clusters and aid more detailed studies of cluster and galaxy evolution. We also report the serendipitous discovery of a new cluster close on the sky to the rich optical cluster Cl 0016+16 at z = 0.546. This new overdensity, HST 001831+16208, may be coincident with both an X-ray source and a radio source. HST 001831+16208 is the third cluster/group discovered near Cl 0016+16 and appears to strengthen the claims of Connolly et al. of superclustering at high redshift.

  3. Composition of microbial communities in aerosol, snow and ice samples from remote glaciated areas (Antarctica, Alps, Andes)

    NASA Astrophysics Data System (ADS)

    Elster, J.; Delmas, R. J.; Petit, J.-R.; Řeháková, K.

    2007-06-01

    Taxonomical and ecological analyses were performed on micro-autotrophs (cyanobacteria and algae, together with remnants of diatom valves), micro-fungi (hyphae and spores), bacteria (rods, cocci and red clusters), yeast, and plant pollen extracted from various samples: Alps snow (Mt. Blanc area), Andean snow (Illimani, Bolivia), Antarctic aerosol filters (Dumont d'Urville, Terre Adélie), and Antarctic inland ice (Terre Adélie). Three methods for ice and snow sample pre-concentration were tested (filtration, centrifugation and lyophilisation). Afterwards, cultivation methods for terrestrial, freshwater and marine microorganisms (micro-autotrophs and micro-fungi) were used in combination with liquid and solid media. The main goal of the study was to find out whether micro-autotrophs are commonly transported by air masses and later stored in snow and icecaps around the world. The most striking result of this study was the absence of culturable micro-autotrophs in all studied samples. However, an unusual culturable pigmented prokaryote was found in both alpine snow and aerosol samples. Analyses of many samples and proper statistical analyses (PCA, RDA with Monte Carlo permutation tests) showed that the studied treatments differed highly significantly in both microbial community and biotic remnant composition (F=9.33, p=0.001). In addition, GLM showed that the studied treatments differed highly significantly in the numbers of categories of microorganisms and remnants of biological material (F=11.45, p=0.00005). The Antarctic aerosol samples were characterised by red clusters of bacteria, the unusual prokaryote and yeasts. The high mountain snow from the Alps and Andes contained many more culturable heterotrophs. The unusual prokaryote was very abundant, as were coccoid bacteria, red clusters of bacteria, as well as yeasts. The Antarctic ice samples were quite different. These samples had higher numbers of rod bacteria and fungal hyphae.
The microbial communities and biological remnants of the analysed samples comprise two communities, without a sharp boundary between them: i) the first community includes ubiquitous organisms, including contaminants; ii) the second community represents individuals frequently occurring in remote terrestrial cold or hot desert/semi-desert and/or marginal soil-snow-ice ecosystems.

  4. Spectroscopic characterization of galaxy clusters in RCS-1: spectroscopic confirmation, redshift accuracy, and dynamical mass-richness relation

    NASA Astrophysics Data System (ADS)

    Gilbank, David G.; Barrientos, L. Felipe; Ellingson, Erica; Blindert, Kris; Yee, H. K. C.; Anguita, T.; Gladders, M. D.; Hall, P. B.; Hertling, G.; Infante, L.; Yan, R.; Carrasco, M.; Garcia-Vergara, Cristina; Dawson, K. S.; Lidman, C.; Morokuma, T.

    2018-05-01

    We present follow-up spectroscopic observations of galaxy clusters from the first Red-sequence Cluster Survey (RCS-1). This work focuses on two samples: a lower redshift sample of ˜30 clusters ranging in redshift from z ˜ 0.2-0.6, observed with multiobject spectroscopy (MOS) on 4-6.5-m class telescopes, and a z ˜ 1 sample of ˜10 clusters observed with 8-m class telescopes. We examine the detection efficiency and redshift accuracy of the now widely used red-sequence technique for selecting clusters via overdensities of red-sequence galaxies. Using both these data and extended samples including previously published RCS-1 spectroscopy and spectroscopic redshifts from SDSS, we find that the red-sequence redshift using simple two-filter cluster photometric redshifts is accurate to σz ≈ 0.035(1 + z) in RCS-1. This accuracy can potentially be improved with better survey photometric calibration. For the lower redshift sample, ˜5 per cent of clusters show some (minor) contamination from secondary systems with the same red sequence intruding into the measurement aperture of the original cluster. At z ˜ 1, the rate rises to ˜20 per cent. Approximately ten per cent of projections are expected to be serious, where the two components contribute significant numbers of their red-sequence galaxies to another cluster. Finally, we present a preliminary study of the mass-richness calibration using velocity dispersions to probe the dynamical masses of the clusters. We find a relation broadly consistent with that seen in the local universe from the WINGS sample at z ˜ 0.05.

  5. Further observations on comparison of immunization coverage by lot quality assurance sampling and 30 cluster sampling.

    PubMed

    Singh, J; Jain, D C; Sharma, R S; Verghese, T

    1996-06-01

    Lot Quality Assurance Sampling (LQAS) and standard EPI methodology (30 cluster sampling) were used to evaluate immunization coverage in a Primary Health Center (PHC) where coverage levels were reported to be more than 85%. Of 27 sub-centers (lots) evaluated by LQAS, only 2 were accepted for child coverage, whereas none was accepted for tetanus toxoid (TT) coverage in mothers. LQAS data were combined to obtain an estimate of coverage in the entire population; 41% (95% CI 36-46) of infants were immunized appropriately for their ages, while 42% (95% CI 37-47) of their mothers had received a second/booster dose of TT. TT coverage in 149 contemporary mothers sampled in the EPI survey was also 42% (95% CI 31-52). Although results from the two sampling methods were consistent with each other, a big gap was evident between reported coverage (in children as well as mothers) and survey results. LQAS was found to be operationally feasible, but it cost 40% more and required 2.5 times more time than the EPI survey. LQAS, therefore, is not a good substitute for current EPI methodology to evaluate immunization coverage in a large administrative area. However, LQAS has potential as a method to monitor health programs on a routine basis in small population sub-units, especially in areas with high and heterogeneously distributed immunization coverage.
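The binomial decision rules underlying classical LQAS can be sketched as follows. This is a minimal illustration, not the survey protocol used in the study: the sample size n = 19, decision value d = 3, and the coverage thresholds in the usage note are hypothetical numbers chosen for the example.

```python
from math import comb

def binom_cdf(k, n, p):
    # P(X <= k) for X ~ Binomial(n, p), computed exactly with math.comb.
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def lqas_risks(n, d, p_high, p_low):
    """Classical binomial LQAS: sample n subjects per lot and accept the
    lot if at most d unvaccinated subjects are found.

    Returns (alpha, beta):
      alpha -- risk of rejecting a lot whose true coverage is p_high
      beta  -- risk of accepting a lot whose true coverage is p_low
    """
    alpha = 1.0 - binom_cdf(d, n, 1.0 - p_high)
    beta = binom_cdf(d, n, 1.0 - p_low)
    return alpha, beta
```

For example, `lqas_risks(19, 3, 0.90, 0.50)` gives the two error risks for a rule that accepts a lot when at most 3 of 19 sampled subjects are unvaccinated, judged against hypothetical 90% and 50% coverage thresholds.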

  6. Quantifying opening-mode fracture spatial organization in horizontal wellbore image logs, core and outcrop: Application to Upper Cretaceous Frontier Formation tight gas sandstones, USA

    NASA Astrophysics Data System (ADS)

    Li, J. Z.; Laubach, S. E.; Gale, J. F. W.; Marrett, R. A.

    2018-03-01

    The Upper Cretaceous Frontier Formation is a naturally fractured gas-producing sandstone in Wyoming. Regionally, random patterns and patterns statistically more clustered than random exist in the same upper to lower shoreface depositional facies. East-west- and north-south-striking regional fractures sampled using image logs and cores from three horizontal wells exhibit clustered patterns, whereas data collected from east-west-striking fractures in outcrop have patterns that are indistinguishable from random. Image log data analyzed with the correlation count method show clusters ∼35 m wide and spaced ∼50 to 90 m apart as well as clusters up to 12 m wide with periodic inter-cluster spacings. A hierarchy of cluster sizes exists; organization within clusters is likely fractal. These rocks have markedly different structural and burial histories, so regional differences in degree of clustering are unsurprising. Clustered patterns correspond to fractures having core quartz deposition contemporaneous with fracture opening, circumstances that some models suggest might affect spacing patterns by interfering with fracture growth. Our results show that quantifying and identifying patterns as statistically more or less clustered than random delineates differences in fracture patterns that are not otherwise apparent but that may influence gas and water production, and therefore may be economically important.
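The correlation count idea, comparing observed pair spacings against the expectation for a spatially random pattern, can be illustrated with a toy one-dimensional version. This sketch is our simplification (Monte Carlo normalization, no edge corrections) and not the authors' exact procedure.

```python
import random

def correlation_count(positions, bin_edges, trials=50, seed=0):
    """Toy normalized correlation count along a 1-D scanline.

    Histograms all pairwise spacings, then divides by the mean histogram
    of uniform-random patterns with the same number of points; values
    well above 1 in a spacing bin indicate clustering at that scale.
    """
    rng = random.Random(seed)
    lo, hi = min(positions), max(positions)

    def pair_hist(pts):
        h = [0] * (len(bin_edges) - 1)
        for i in range(len(pts)):
            for j in range(i + 1, len(pts)):
                s = abs(pts[i] - pts[j])
                for b in range(len(h)):
                    if bin_edges[b] <= s < bin_edges[b + 1]:
                        h[b] += 1
                        break
        return h

    obs = pair_hist(positions)
    expected = [0.0] * (len(bin_edges) - 1)
    for _ in range(trials):
        rand_pts = [rng.uniform(lo, hi) for _ in positions]
        for b, c in enumerate(pair_hist(rand_pts)):
            expected[b] += c / trials
    return [o / e if e else float("nan") for o, e in zip(obs, expected)]
```

Applied to a few tight groups of fracture positions, the small-spacing bin comes out well above 1, which is the signature of a pattern statistically more clustered than random.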

  7. Triosmium Clusters on a Support: Determination of Structure by X-Ray Absorption Spectroscopy and High-Resolution Microscopy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shareghe, Mehraeen; Chi, Miaofang; Browning, Nigel D.

    2011-01-01

    The structures of small, robust metal clusters on a solid support were determined by a combination of spectroscopic and microscopic methods: extended X-ray absorption fine structure (EXAFS) spectroscopy, scanning transmission electron microscopy (STEM), and aberration-corrected STEM. The samples were synthesized from [Os₃(CO)₁₂] on MgO powder to provide supported clusters intended to be triosmium. The results demonstrate that the supported clusters are robust in the absence of oxidants. Conventional high-angle annular dark-field (HAADF) STEM images demonstrate a high degree of uniformity of the clusters, with root-mean-square (rms) radii of 2.03 ± 0.06 Å. The EXAFS Os-Os coordination number of 2.1 ± 0.4 confirms the presence of triosmium clusters on average and correspondingly determines an average rms cluster radius of 2.02 ± 0.04 Å. The high-resolution STEM images show the individual Os atoms in the clusters, confirming the triangular structures of their frames and determining Os-Os distances of 2.80 ± 0.14 Å, matching the EXAFS value of 2.89 ± 0.06 Å. IR and EXAFS spectra demonstrate the presence of CO ligands on the clusters. This set of techniques is recommended as optimal for detailed and reliable structural characterization of supported clusters.

  8. The Minnesota Center for Twin and Family Research Genome-Wide Association Study

    PubMed Central

    Miller, Michael B.; Basu, Saonli; Cunningham, Julie; Eskin, Eleazar; Malone, Steven M.; Oetting, William S.; Schork, Nicholas; Sul, Jae Hoon; Iacono, William G.; Mcgue, Matt

    2012-01-01

    As part of the Genes, Environment and Development Initiative (GEDI), the Minnesota Center for Twin and Family Research (MCTFR) undertook a genome-wide association study (GWAS), which we describe here. A total of 8405 research participants, clustered in 4-member families, have been successfully genotyped on 527,829 single nucleotide polymorphism (SNP) markers using Illumina’s Human660W-Quad array. Quality control screening of samples and markers as well as SNP imputation procedures are described. We also describe methods for ancestry control and how the familial clustering of the MCTFR sample can be accounted for in the analysis using a Rapid Feasible Generalized Least Squares algorithm. The rich longitudinal MCTFR assessments provide numerous opportunities for collaboration. PMID:23363460

  9. Comparative study of feature selection with ensemble learning using SOM variants

    NASA Astrophysics Data System (ADS)

    Filali, Ameni; Jlassi, Chiraz; Arous, Najet

    2017-03-01

    Ensemble learning has improved the stability and accuracy of clustering, but its runtime prohibits scaling up to real-world applications. This study addresses the problem of selecting a subset of the most pertinent features for every cluster in a dataset. The proposed method is an extension of the Random Forests approach that uses self-organizing map (SOM) variants on unlabeled data to estimate out-of-bag feature importance from a set of partitions. Every partition is created using a different bootstrap sample and a random subset of the features. We then show that the internal estimates used to measure variable importance in Random Forests are also applicable to feature selection in unsupervised learning. This approach aims at dimensionality reduction, visualization, and cluster characterization at the same time. We provide empirical results on nineteen benchmark data sets indicating that RFS can lead to significant improvements in clustering accuracy, over several state-of-the-art unsupervised methods, with a very limited subset of features. The approach shows promise for very broad domains.

  10. Automated modal parameter estimation using correlation analysis and bootstrap sampling

    NASA Astrophysics Data System (ADS)

    Yaghoubi, Vahid; Vakilzadeh, Majid K.; Abrahamsson, Thomas J. S.

    2018-02-01

    The estimation of modal parameters from a set of noisy measured data is a highly judgmental task, with user expertise playing a significant role in distinguishing between estimated physical and noise modes of a test-piece. Various methods have been developed to automate this procedure. The common approach is to identify models with different orders and cluster similar modes together. However, most proposed methods based on this approach suffer from high-dimensional optimization problems in either the estimation or clustering step. To overcome this problem, this study presents an algorithm for autonomous modal parameter estimation in which the only required optimization is performed in a three-dimensional space. To this end, a subspace-based identification method is employed for the estimation and a non-iterative correlation-based method is used for the clustering. This clustering is at the heart of the paper. The keys to success are correlation metrics that are able to treat the problems of spatial eigenvector aliasing and nonunique eigenvectors of coalescent modes simultaneously. The algorithm commences with the identification of an excessively high-order model from frequency response function test data. The high number of modes of this model provides bases for two subspaces: one for likely physical modes of the tested system and one for its complement, dubbed the subspace of noise modes. By employing the bootstrap resampling technique, several subsets are generated from the same basic dataset and for each of them a model is identified to form a set of models. Then, by correlation analysis with the two aforementioned subspaces, highly correlated modes of these models which appear repeatedly are clustered together and the noise modes are collected in a so-called Trashbox cluster. Stray noise modes attracted to the mode clusters are trimmed away in a second step by correlation analysis.
The final step of the algorithm is a fuzzy c-means clustering procedure applied to a three-dimensional feature space to assign a degree of physicalness to each cluster. The proposed algorithm is applied to two case studies: one with synthetic data and one with real test data obtained from a hammer impact test. The results indicate that the algorithm successfully clusters similar modes and gives a reasonable quantification of the extent to which each cluster is physical.
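The bootstrap resampling step, generating many resampled datasets and re-estimating on each, can be illustrated generically with a percentile bootstrap. This is a minimal sketch with names of our choosing; the paper applies the idea to frequency response function datasets rather than scalar statistics.

```python
import random

def bootstrap_ci(data, stat, n_boot=1000, alpha=0.05, seed=1):
    """Percentile bootstrap: resample the data with replacement n_boot
    times, recompute the statistic each time, and report the empirical
    alpha/2 and 1 - alpha/2 quantiles as a confidence interval."""
    rng = random.Random(seed)
    reps = sorted(
        stat([rng.choice(data) for _ in data]) for _ in range(n_boot)
    )
    lo = reps[int(alpha / 2 * n_boot)]
    hi = reps[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

The spread of the bootstrap replicates is what allows repeatedly appearing (physical) modes to be distinguished from noise modes that come and go between resamples.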

  11. X-ray and optical substructures of the DAFT/FADA survey clusters

    NASA Astrophysics Data System (ADS)

    Guennou, L.; Durret, F.; Adami, C.; Lima Neto, G. B.

    2013-04-01

    We have undertaken the DAFT/FADA survey with the double aim of setting constraints on dark energy based on weak lensing tomography and of obtaining homogeneous and high quality data for a sample of 91 massive clusters in the redshift range 0.4-0.9 for which there were HST archive data. We have analysed the XMM-Newton data available for 42 of these clusters to derive their X-ray temperatures and luminosities and search for substructures. Out of these, a spatial analysis was possible for 30 clusters, but only 23 had deep enough X-ray data for a really robust analysis. This study was coupled with a dynamical analysis for the 26 clusters having at least 30 spectroscopic galaxy redshifts in the cluster range. Altogether, the X-ray sample of 23 clusters and the optical sample of 26 clusters have 14 clusters in common. We present preliminary results on the coupled X-ray and dynamical analyses of these 14 clusters.

  12. An X-Ray Flux-Limited Sample of Galaxy Clusters: Physical Properties and Cosmological Implications

    NASA Astrophysics Data System (ADS)

    Reiprich, Thomas H.

    2001-07-01

    An X-ray selected and X-ray flux-limited sample comprising the 63 X-ray brightest galaxy clusters in the sky (excluding the galactic band, called HIFLUGCS) has been constructed based on the ROSAT All-Sky Survey. The flux limit has been set at 2x10^-11 erg/s/cm^2 in the energy band 0.1-2.4 keV. It has been shown that a high completeness is indicated by several tests. Due to the high flux limit this sample can be used for a variety of applications requiring a statistical cluster sample without any corrections to the effective survey volume. Mainly high quality pointed observations have been used to determine fluxes and physical cluster parameters. It has been shown that a tight correlation exists between the X-ray luminosity and the gravitational mass using HIFLUGCS and an extended sample of 106 galaxy clusters. The relation and its scatter have been quantified using different fitting methods. A comparison to theoretical and numerical predictions shows an overall agreement. This relation may be directly applied in large X-ray cluster surveys or dark matter simulations for conversions between X-ray luminosity and gravitating mass. Data from the performance verification phase of the recently launched X-ray satellite observatory XMM-Newton on the galaxy cluster Abell 1835 has been analyzed, in order to test the assumption of isothermality of the cluster gas in the outer parts applied throughout the work. It has been found that the measured outer temperature profile is consistent with being isothermal. In the inner regions a clear drop of the temperature by a factor of two has been found. Physical properties of the cluster sample have been studied by analyzing relations between different cluster parameters. The overall properties are well understood but in detail deviations from simple expectations have been found. It has been found that the gas mass fraction (fgas) does not vary as a function of intracluster gas temperature. 
For galaxy groups (kTx < 2 keV), however, a steep drop of fgas has been observed. No clear trend of a variation of the shape of the surface brightness profile, i.e. beta, has been observed as a function of temperature. The Lx-Tx relation has been found to be steeper than expected from simple self similar models, as has been found by previous authors. But no clear deviations from a power law shape down to kTx = 0.7 keV have been found. The Mt-Tx relation found here is steeper than expected from self similar models and its normalization is lower compared to hydrodynamic simulations, in agreement with previous findings. Suggested scenarios to account for these deviations, including heating and cooling processes, and observational difficulties have been described. It appears that a blend of different effects, possibly including a variation of mean formation redshift with system mass, is needed to account for the observations presented here. Using HIFLUGCS the gravitational mass function has been determined for the mass interval 3.5x10^13 < M200 < 5.2x10^15 h50^-1 Msun. Comparison with Press-Schechter mass functions has yielded tight constraints on the mean matter density in the universe and the amplitude of density fluctuations. The large covered mass range has allowed constraints to be placed on the parameters individually. Specifically it has been found that OmegaM = 0.12^{+0.06}_{-0.04} and sigma8 = 0.96^{+0.15}_{-0.12} (90% c.l. statistical uncertainty). This result is consistent with two more estimates of OmegaM obtained in this work using different methods. The mean intracluster gas fraction of the 106 clusters in the extended sample combined with predictions from the theory of nucleosynthesis indicates OmegaM < 0.34. The cluster mass to light ratio multiplied by the mean luminosity density implies OmegaM ≈ 0.15.
Various tests for systematic uncertainties have been performed, including comparison of the Press-Schechter mass function with the most recent results from large N-body simulations, yielding deviations smaller than the statistical uncertainties. For comparison the best fit OmegaM values for fixed sigma8 values have been determined, yielding the relation sigma8 = 0.43 OmegaM^-0.38. The mass function has been integrated to obtain the fraction of the total gravitating mass in the universe contained in galaxy clusters. Normalized to the critical density it has been found that Omega_Cluster = 0.012^{+0.003}_{-0.004} for cluster masses larger than 6.4^{+0.7}_{-0.6}x10^13 h50^-1 Msun. With the value for OmegaM determined here this implies that about 90% of the mass in the universe resides outside virialized cluster regions. Similarly it has been found that the fraction of the total gravitating mass which is contained in the intracluster gas, Omega_b,Cluster = 0.0015^{+0.0002}_{-0.0001} h50^-1.5 for gas masses larger than 6.9^{+1.4}_{-1.5}x10^12 h50^{-5/2}Msun, is very small.

  13. Private Universities in Kenya Seek Alternative Ways to Manage Change in Teacher Education Curriculum in Compliance with the Commission for University Education Reforms

    ERIC Educational Resources Information Center

    Amimo, Catherine Adhiambo

    2016-01-01

    This study investigated management of change in teacher education curriculum in Private universities in Kenya. The study employed a concurrent mixed methods design that is based on the use of both quantitative and qualitative approaches. A multi-stage sampling process which included purposive, convenience, cluster, and snowball sampling methods…

  14. Relationship between the Religious Attitude, Self-Efficacy, and Life Satisfaction in High School Teachers of Mahshahr City

    ERIC Educational Resources Information Center

    Bigdeloo, Masoomeh; Bozorgi, Zahra Dasht

    2016-01-01

    This study aims to investigate the relationship between the religious attitude, self-efficacy, and life satisfaction in high school teachers of Mahshahr City. To this end, 253 people of all high school teachers in Mahshahr City, in Iran were selected as the sample using the multistage cluster sampling method. For data collection, Glock and Stark's…

  15. Development and Validation of PCR Primers To Assess the Diversity of Clostridium spp. in Cheese by Temporal Temperature Gradient Gel Electrophoresis

    PubMed Central

    Le Bourhis, Anne-Gaëlle; Saunier, Katiana; Doré, Joël; Carlier, Jean-Philippe; Chamba, Jean-François; Popoff, Michel-Robert; Tholozan, Jean-Luc

    2005-01-01

    A nested-PCR temporal temperature gradient gel electrophoresis (TTGE) approach was developed for the detection of bacteria belonging to phylogenetic cluster I of the genus Clostridium (the largest clostridial group, which represents 25% of the currently cultured clostridial species) in cheese suspected of late blowing. Primers were designed based on the 16S rRNA gene sequence, and the specificity was confirmed in PCRs performed with DNAs from cluster I and non-cluster I species as the templates. TTGE profiles of the PCR products, comprising the V5-V6 region of the 16S rRNA gene, allowed us to distinguish the majority of cluster I species. PCR-TTGE was applied to analyze commercial cheeses with defects. All cheeses gave a signal after nested PCR, and on the basis of band comigration with TTGE profiles of reference strains, all the bands could be assigned to a clostridial species. The direct identification of Clostridium spp. was confirmed by sequencing of excised bands. C. tyrobutyricum and C. beijerinckii contaminated 15 and 14 of the 20 cheese samples tested, respectively, and C. butyricum and C. sporogenes were detected in one cheese sample. Most-probable-number counts and volatile fatty acid concentrations were determined for comparison purposes. The results obtained were in agreement, but only two species, C. tyrobutyricum and C. sporogenes, could be isolated by the plating method. In all cheeses with a high amount of butyric acid (>100 mg/100 g), the presence of C. tyrobutyricum DNA was confirmed by PCR-TTGE, suggesting the involvement of this species in butyric acid fermentation. These results demonstrated the efficacy of the PCR-TTGE method to identify Clostridium in cheeses. The sensitivity of the method was estimated to be 100 CFU/g. PMID:15640166

  16. Weak-lensing calibration of a stellar mass-based mass proxy for redMaPPer and Voronoi Tessellation clusters in SDSS Stripe 82

    NASA Astrophysics Data System (ADS)

    Pereira, Maria E. S.; Soares-Santos, Marcelle; Makler, Martin; Annis, James; Lin, Huan; Palmese, Antonella; Vitorelli, André Z.; Welch, Brian; Caminha, Gabriel B.; Erben, Thomas; Moraes, Bruno; Shan, Huanyuan

    2018-02-01

    We present the first weak lensing calibration of μ⋆, a new galaxy cluster mass proxy corresponding to the total stellar mass of red and blue members, in two cluster samples selected from the SDSS Stripe 82 data: 230 red-sequence Matched-filter Probabilistic Percolation (redMaPPer) clusters at redshift 0.1 ≤ z < 0.33 and 136 Voronoi Tessellation (VT) clusters at 0.1 ≤ z < 0.6. We use the CS82 shear catalogue and stack the clusters in μ⋆ bins to measure a mass-observable power-law relation. For redMaPPer clusters we obtain M0 = (1.77 ± 0.36) × 1014 h-1M⊙, α = 1.74 ± 0.62. For VT clusters, we find M0 = (4.31 ± 0.89) × 1014 h-1M⊙, α = 0.59 ± 0.54 and M0 = (3.67 ± 0.56) × 1014 h-1M⊙, α = 0.68 ± 0.49 for a low and a high redshift bin, respectively. Our results are consistent, internally and with the literature, indicating that our method can be applied to any cluster-finding algorithm. In particular, we recommend that μ⋆ be used as the mass proxy for VT clusters. Catalogues including μ⋆ measurements will enable its use in studies of galaxy evolution in clusters and cluster cosmology.
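A stacked mass-observable power law of the form M = M0 * x**alpha can be fit by ordinary least squares in log-log space. The sketch below is a hypothetical stand-in for the actual analysis, which fits stacked lensing profiles with measurement errors and a pivot value of μ⋆; function and variable names are ours.

```python
from math import exp, log

def fit_power_law(x, y):
    """Least-squares fit of y = m0 * x**alpha via the linear model
    ln y = ln m0 + alpha * ln x (error weighting and pivot omitted)."""
    lx = [log(v) for v in x]
    ly = [log(v) for v in y]
    n = len(x)
    mx, my = sum(lx) / n, sum(ly) / n
    alpha = sum((a - mx) * (b - my) for a, b in zip(lx, ly)) / sum(
        (a - mx) ** 2 for a in lx
    )
    m0 = exp(my - alpha * mx)
    return m0, alpha
```

On noiseless data the fit recovers the generating parameters exactly, which makes it a convenient self-test before adding error weighting.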

  17. Detecting hybridization between Iranian wild wolf (Canis lupus pallipes) and free-ranging domestic dog (Canis familiaris) by analysis of microsatellite markers.

    PubMed

    Khosravi, Rasoul; Rezaei, Hamid Reza; Kaboli, Mohammad

    2013-01-01

    The genetic threat due to hybridization with free-ranging dogs is one major concern in wolf conservation. The identification of hybrids and the extent of hybridization is important in the conservation and management of wolf populations. Genetic variation was analyzed at 15 unlinked loci in 28 dogs, 28 wolves, four known hybrids, two black wolves, and one dog with abnormal traits in Iran. Pritchard's model, multivariate ordination by principal component analysis, and neighbor-joining clustering were used for population clustering and individual assignment. Analysis of genetic variation showed that genetic variability is high in both wolf and dog populations in Iran. Values of H(E) in dog and wolf samples ranged from 0.75-0.92 and 0.77-0.92, respectively. The results of AMOVA showed that the two groups of dog and wolf were significantly different (F(ST) = 0.05 and R(ST) = 0.36; P < 0.001). In each of the three methods, wolf and dog samples were separated into two distinct clusters. The two dark wolves were assigned to the wolf cluster. These models also flagged D32 (the dog with abnormal traits) and some other samples that were assigned to more than one cluster and could be hybrids. This study is the beginning of genetic research on wolf populations in Iran, and our results reveal that, as in other countries, hybridization between wolves and dogs is sporadic in Iran and can be a threat to wolf populations if human perturbations increase.

  18. Analysis of cytokine release assay data using machine learning approaches.

    PubMed

    Xiong, Feiyu; Janko, Marco; Walker, Mindi; Makropoulos, Dorie; Weinstock, Daniel; Kam, Moshe; Hrebien, Leonid

    2014-10-01

    The possible onset of Cytokine Release Syndrome (CRS) is an important consideration in the development of monoclonal antibody (mAb) therapeutics. In this study, several machine learning approaches are used to analyze CRS data. The analyzed data come from a human blood in vitro assay used to assess the potential of mAb-based therapeutics to produce cytokine release similar to that induced by Anti-CD28 superagonistic (Anti-CD28 SA) mAbs. The data cover seven mAbs and two negative controls, for a total of 423 samples from 44 donors. Three machine learning approaches were applied in combination to observations from the assay: (i) Hierarchical Cluster Analysis (HCA); (ii) Principal Component Analysis (PCA) followed by K-means clustering; and (iii) Decision Tree Classification (DTC). All three approaches identified the treatment that caused the most severe cytokine response. HCA provided information about the expected number of clusters in the data. PCA coupled with K-means clustering allowed sample-by-sample classification of treatments and visualization of treatment clusters. DTC models showed the relative importance to CRS of various cytokines such as IFN-γ, TNF-α, and IL-10. Using these approaches in tandem allows better parameter selection for one method based on outcomes from another, and an overall improved analysis of the data through complementary views. The DTC analysis additionally suggested that IL-17 may be correlated with CRS reactions, although this correlation has not yet been corroborated in the literature. Copyright © 2014 Elsevier B.V. All rights reserved.
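    The PCA-followed-by-K-means step can be sketched on synthetic data. Everything below is illustrative (random stand-in "cytokine" readouts, not the assay data), but it shows how a high-response treatment group separates in the space of the first two principal components:

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    # Synthetic stand-in for the assay: 423 samples x 6 cytokine readouts,
    # with one group showing elevated release (an Anti-CD28 SA-like treatment).
    n_samples, n_cytokines = 423, 6
    X = rng.normal(0.0, 1.0, (n_samples, n_cytokines))
    X[:60] += 4.0  # elevated cytokine release in the first 60 samples

    # PCA via SVD on the centered data
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:2].T  # projection onto the first two principal components

    def kmeans(Z, k=2, iters=50):
        """Minimal K-means with a deterministic spread-out initialization."""
        centers = Z[np.linspace(0, len(Z) - 1, k).astype(int)]
        for _ in range(iters):
            labels = ((Z[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
            for j in range(k):
                if np.any(labels == j):  # guard against empty clusters
                    centers[j] = Z[labels == j].mean(axis=0)
        return labels

    labels = kmeans(scores)  # the high-response group separates cleanly
    ```

    In the study the cluster count for K-means was informed by HCA; here k=2 is simply assumed for the two synthetic groups.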

  19. The K-selected Butcher-Oemler Effect

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Stanford, S A; De Propris, R; Dickinson, M

    2004-03-02

    We investigate the Butcher-Oemler effect using samples of galaxies brighter than observed-frame K* + 1.5 in 33 clusters at 0.1 ≲ z ≲ 0.9. We attempt to duplicate as closely as possible the methodology of Butcher & Oemler. Apart from selecting in the K-band, the most important difference is that we use a brightness limit fixed at 1.5 magnitudes below an observed-frame K* rather than the nominal limit of rest-frame M(V) = -20 used by Butcher & Oemler. For an early-type galaxy at z = 0.1 our sample cutoff is 0.2 magnitudes brighter than rest-frame M(V) = -20, while at z = 0.9 our cutoff is 0.9 magnitudes brighter. If the blue galaxies tend to be faint, then the difference in magnitude limits should result in our measuring lower blue fractions. A more minor difference from the Butcher & Oemler methodology is that the area covered by our galaxy samples has a radius of 0.5 or 0.7 Mpc at all redshifts rather than R_30, the radius containing 30% of the cluster population. In practice our field sizes are generally similar to those used by Butcher & Oemler. We find the fraction of blue galaxies in our K-selected samples to be lower on average than that derived from several optically selected samples, and that it shows little trend with redshift. However, at the redshifts z < 0.6 where our sample overlaps with that of Butcher & Oemler, the difference in f_B as determined from our K-selected samples and those of Butcher & Oemler is much reduced. The large scatter in the measured f_B, even in small redshift ranges, indicates that determining f_B from K-selected galaxy samples for a much larger sample of clusters is important. As a test of our methods, our data allow us to construct optically selected samples down to rest-frame M(V) = -20, as used by Butcher & Oemler, for four clusters common to our sample and that of Butcher & Oemler. For these rest-V-selected samples, we find blue fractions similar to Butcher & Oemler, while the K-selected samples for the same four clusters yield blue fractions that are typically half as large. This comparison indicates that selecting in the K-band is the primary difference between our study and previous optically based studies of the Butcher-Oemler effect. Selecting in the observed K-band is more nearly a process of selecting galaxies by their mass than is the case for optically selected samples. Our results suggest that the Butcher-Oemler effect is at least partly due to low-mass galaxies whose optical luminosities are boosted. These lower-mass galaxies could evolve into the rich dwarf population observed in nearby clusters.
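    The central quantity f_B can be computed directly. A minimal sketch with toy numbers, assuming the classic Butcher & Oemler definition (galaxies bluer than the red sequence by more than 0.2 mag, among those brighter than the magnitude limit):

    ```python
    import numpy as np

    def blue_fraction(colors, mags, red_sequence_color, mag_limit, delta=0.2):
        """Butcher-Oemler-style blue fraction f_B: among cluster galaxies
        brighter than mag_limit, the fraction bluer than the red sequence
        by more than `delta` magnitudes."""
        bright = mags < mag_limit
        blue = colors < red_sequence_color - delta
        return (bright & blue).sum() / bright.sum()

    # Toy catalogue of 10 galaxies (colors in mag, apparent magnitudes)
    colors = np.array([1.0, 1.0, 0.9, 0.6, 0.5, 1.1, 0.7, 1.0, 0.4, 0.95])
    mags   = np.array([17., 18., 18.5, 19., 19.5, 17.5, 18.2, 20.5, 19.8, 18.8])
    fB = blue_fraction(colors, mags, red_sequence_color=1.0, mag_limit=20.0)
    # 9 galaxies pass the magnitude cut; 4 of them are blue, so fB = 4/9
    ```

    The paper's point is that where this cut is placed (K-selected vs. rest-frame V-selected, fixed radius vs. R_30) changes the measured f_B substantially.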

  20. Memory color assisted illuminant estimation through pixel clustering

    NASA Astrophysics Data System (ADS)

    Zhang, Heng; Quan, Shuxue

    2010-01-01

    The underconstrained nature of illuminant estimation means that certain assumptions are needed to resolve the problem, such as the gray-world assumption. Including more constraints in this process may help exploit the useful information in an image and improve the accuracy of the estimated illuminant, provided that the constraints hold. Based on the observation that most personal images contain one or more of the following categories: neutral objects, human beings, sky, and plants, we propose a method for illuminant estimation through the clustering of pixels of gray and three dominant memory colors: skin tone, sky blue, and foliage green. Analysis shows that samples of the above colors cluster within small areas under different illuminants, and their characteristics can be used to effectively detect pixels falling into each category. The algorithm requires knowledge of the spectral sensitivity response of the camera and a spectral database consisting of the CIE standard illuminants and reflectance or radiance data for samples of the above colors.
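    The clustering-and-detection step can be sketched as nearest-center assignment in chromaticity space. The memory-color centers below are hypothetical placeholders; the real ones would come from the camera's spectral sensitivities and the spectral database the abstract describes:

    ```python
    import numpy as np

    # Hypothetical memory-color centers in (R/G, B/G) chromaticity space
    # under a canonical illuminant -- illustrative values only.
    CENTERS = {
        "gray":    (1.00, 1.00),
        "skin":    (1.40, 0.80),
        "sky":     (0.70, 1.60),
        "foliage": (0.80, 0.70),
    }

    def classify_pixels(rgb, max_dist=0.15):
        """Assign each pixel (rows of an (N, 3) float array) to the nearest
        memory-color center in chromaticity space; pixels farther than
        max_dist from every center are left unlabeled (-1)."""
        chroma = np.stack([rgb[:, 0] / rgb[:, 1], rgb[:, 2] / rgb[:, 1]], axis=1)
        centers = np.array(list(CENTERS.values()))
        d = np.linalg.norm(chroma[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        labels[d.min(axis=1) > max_dist] = -1
        return labels

    def estimate_illuminant(rgb, labels, gray_idx=0):
        """Von Kries-style channel gains from the mean of the gray cluster,
        normalized so the green gain is 1."""
        mean = rgb[labels == gray_idx].mean(axis=0)
        return mean[1] / mean
    ```

    A full implementation would combine evidence from all four clusters (each constrains the illuminant differently); using only the gray cluster keeps the sketch short.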
