K, Punith; K, Lalitha; G, Suman; BS, Pradeep; Kumar K, Jayanth
2008-01-01
Research Question: Is the LQAS technique better than the cluster sampling technique, in terms of resources, for evaluating immunization coverage in an urban area? Objective: To assess and compare lot quality assurance sampling against cluster sampling in the evaluation of primary immunization coverage. Study Design: Population-based cross-sectional study. Study Setting: Areas under Mathikere Urban Health Center. Study Subjects: Children aged 12 months to 23 months. Sample Size: 220 in cluster sampling, 76 in lot quality assurance sampling. Statistical Analysis: Percentages and proportions, chi-square test. Results: (1) Using cluster sampling, the percentages of completely immunized, partially immunized and unimmunized children were 84.09%, 14.09% and 1.82%, respectively. With lot quality assurance sampling, they were 92.11%, 6.58% and 1.31%, respectively. (2) Immunization coverage levels as evaluated by the cluster sampling technique were not statistically different from the coverage values obtained by lot quality assurance sampling. Considering the time and resources required, lot quality assurance sampling was found to be the better technique for evaluating primary immunization coverage in an urban area. PMID:19876474
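For readers who want to reproduce the statistical comparison, the sketch below runs a chi-square test on counts back-calculated from the reported percentages and sample sizes; the counts are therefore approximate reconstructions, not the authors' raw data, and scipy is assumed to be available.

```python
# Chi-square comparison of immunization-coverage categories from the two surveys.
# Counts are back-calculated from the reported percentages and sample sizes
# (220 for cluster sampling, 76 for LQAS), so treat them as approximate.
import numpy as np
from scipy.stats import chi2_contingency

#                  complete  partial  unimmunized
cluster_sampling = [185,      31,      4]   # ~84.09%, 14.09%, 1.82% of 220
lqas             = [70,        5,      1]   # ~92.11%,  6.58%, 1.31% of 76

table = np.array([cluster_sampling, lqas])
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.3f}")
# A non-significant p-value is consistent with the paper's conclusion that the
# two techniques yield statistically indistinguishable coverage estimates.
```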
Active learning for semi-supervised clustering based on locally linear propagation reconstruction.
Chang, Chin-Chun; Lin, Po-Yi
2015-03-01
The success of semi-supervised clustering relies on the effectiveness of side information. To get effective side information, a new active learner learning pairwise constraints known as must-link and cannot-link constraints is proposed in this paper. Three novel techniques are developed for learning effective pairwise constraints. The first technique is used to identify samples less important to cluster structures. This technique makes use of a kernel version of locally linear embedding for manifold learning. Samples neither important to locally linear propagation reconstructions of other samples nor on flat patches in the learned manifold are regarded as unimportant samples. The second is a novel criterion for query selection. This criterion considers not only the importance of a sample to expanding the space coverage of the learned samples but also the expected number of queries needed to learn the sample. To facilitate semi-supervised clustering, the third technique yields inferred must-links for passing information about flat patches in the learned manifold to semi-supervised clustering algorithms. Experimental results have shown that the learned pairwise constraints can capture the underlying cluster structures and proven the feasibility of the proposed approach. Copyright © 2014 Elsevier Ltd. All rights reserved.
Technique for fast and efficient hierarchical clustering
Stork, Christopher
2013-10-08
A fast and efficient technique for hierarchical clustering of samples in a dataset includes compressing the dataset to reduce a number of variables within each of the samples of the dataset. A nearest neighbor matrix is generated to identify nearest neighbor pairs between the samples based on differences between the variables of the samples. The samples are arranged into a hierarchy that groups the samples based on the nearest neighbor matrix. The hierarchy is rendered to a display to graphically illustrate similarities or differences between the samples.
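A minimal sketch of the described pipeline follows, with PCA standing in for the compression step and single (nearest-neighbour) linkage for the hierarchy construction; these are illustrative choices, not necessarily the patented implementation.

```python
# Compress variables, compute pairwise distances, build a hierarchy, render it.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, dendrogram

rng = np.random.default_rng(0)
samples = rng.normal(size=(40, 200))                       # 40 samples, 200 variables

compressed = PCA(n_components=10).fit_transform(samples)   # variable reduction
distances = pdist(compressed, metric="euclidean")          # condensed distance matrix
hierarchy = linkage(distances, method="single")            # nearest-neighbour merging

dendrogram(hierarchy)                                      # similarities/differences view
plt.xlabel("sample index")
plt.ylabel("merge distance")
plt.show()
```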
NASA Technical Reports Server (NTRS)
Chapman, G. M. (Principal Investigator); Carnes, J. G.
1981-01-01
Several techniques which use clusters generated by a new clustering algorithm, CLASSY, are proposed as alternatives to random sampling to obtain greater precision in crop proportion estimation: (1) Proportional Allocation/relative count estimator (PA/RCE) uses proportional allocation of dots to clusters on the basis of cluster size and a relative count cluster-level estimate; (2) Proportional Allocation/Bayes Estimator (PA/BE) uses proportional allocation of dots to clusters and a Bayesian cluster-level estimate; and (3) Bayes Sequential Allocation/Bayesian Estimator (BSA/BE) uses sequential allocation of dots to clusters and a Bayesian cluster-level estimate. Clustering is an effective method for making proportion estimates. It is estimated that, to obtain the same precision with random sampling as obtained by the proportional sampling of 50 dots with an unbiased estimator, samples of 85 or 166 would need to be taken if dot sets with AI labels (integrated procedure) or ground truth labels, respectively, were input. Dot reallocation provides dot sets that are unbiased. It is recommended that these proportion estimation techniques be maintained, particularly the PA/BE because it provides the greatest precision.
NASA Astrophysics Data System (ADS)
Abdullah, Mohamed H.; Wilson, Gillian; Klypin, Anatoly
2018-07-01
We introduce GalWeight, a new technique for assigning galaxy cluster membership. This technique is specifically designed to simultaneously maximize the number of bona fide cluster members while minimizing the number of contaminating interlopers. The GalWeight technique can be applied to both massive galaxy clusters and poor galaxy groups. Moreover, it is effective in identifying members in both the virial and infall regions with high efficiency. We apply the GalWeight technique to MDPL2 and Bolshoi N-body simulations, and find that it is >98% accurate in correctly assigning cluster membership. We show that GalWeight compares very favorably against four well-known existing cluster membership techniques (shifting gapper, den Hartog, caustic, SIM). We also apply the GalWeight technique to a sample of 12 Abell clusters (including the Coma cluster) using observations from the Sloan Digital Sky Survey. We conclude by discussing GalWeight’s potential for other astrophysical applications.
Clustering cancer gene expression data by projective clustering ensemble
Yu, Xianxue; Yu, Guoxian
2017-01-01
Gene expression data analysis has paramount implications for gene treatments, cancer diagnosis and other domains. Clustering is an important and promising tool to analyze gene expression data. Gene expression data is often characterized by a large number of genes but a limited number of samples, so various projective clustering techniques and ensemble techniques have been suggested to combat these challenges. However, it is rather challenging to synergize these two kinds of techniques to avoid the curse of dimensionality and to boost the performance of gene expression data clustering. In this paper, we employ a projective clustering ensemble (PCE) to integrate the advantages of projective clustering and ensemble clustering, and to avoid the dilemma of combining multiple projective clusterings. Our experimental results on publicly available cancer gene expression data show PCE can improve the quality of clustering gene expression data by at least 4.5% (on average) over other related techniques, including dimensionality reduction based single clustering and ensemble approaches. The empirical study demonstrates that, to further boost the performance of clustering cancer gene expression data, it is necessary and promising to synergize projective clustering with ensemble clustering. PCE can serve as an effective alternative technique for clustering gene expression data. PMID:28234920
Profiling Local Optima in K-Means Clustering: Developing a Diagnostic Technique
ERIC Educational Resources Information Center
Steinley, Douglas
2006-01-01
Using the cluster generation procedure proposed by D. Steinley and R. Henson (2005), the author investigated the performance of K-means clustering under the following scenarios: (a) different probabilities of cluster overlap; (b) different types of cluster overlap; (c) varying sample sizes, clusters, and dimensions; (d) different multivariate…
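As a rough illustration of how K-means local optima can be profiled, the sketch below runs many single random initializations on synthetic blob data and counts the distinct solutions by their final inertia; the data and settings are placeholders, not those of the cited study.

```python
# Profile K-means local optima: many single random starts, grouped by inertia.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=2.0, random_state=1)

inertias = []
for seed in range(200):
    km = KMeans(n_clusters=4, n_init=1, init="random", random_state=seed).fit(X)
    inertias.append(round(km.inertia_, 1))          # round to merge near-identical optima

unique, counts = np.unique(inertias, return_counts=True)
print("distinct local optima (by inertia):", len(unique))
print("best solution reached in", counts[0], "of 200 runs")   # unique is sorted ascending
```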
Sampling designs for HIV molecular epidemiology with application to Honduras.
Shepherd, Bryan E; Rossini, Anthony J; Soto, Ramon Jeremias; De Rivera, Ivette Lorenzana; Mullins, James I
2005-11-01
Proper sampling is essential to characterize the molecular epidemiology of human immunodeficiency virus (HIV). HIV sampling frames are difficult to identify, so most studies use convenience samples. We discuss statistically valid and feasible sampling techniques that overcome some of the potential for bias due to convenience sampling and ensure better representation of the study population. We employ a sampling design called stratified cluster sampling. This first divides the population into geographical and/or social strata. Within each stratum, a population of clusters is chosen from groups, locations, or facilities where HIV-positive individuals might be found. Some clusters are randomly selected within strata and individuals are randomly selected within clusters. Variation and cost help determine the number of clusters and the number of individuals within clusters that are to be sampled. We illustrate the approach through a study designed to survey the heterogeneity of subtype B strains in Honduras.
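A minimal sketch of the two-stage stratified cluster sampling design described here, using a synthetic sampling frame; the strata, cluster names, and sample sizes are placeholders, and a real design would set them from variance and cost considerations.

```python
# Two-stage stratified cluster sampling: strata -> random clusters within each
# stratum -> random individuals within each selected cluster.
import random

random.seed(42)

# stratum -> {cluster_name: list of individual IDs}  (synthetic frame)
frame = {
    "urban": {f"clinic_{i}": [f"u{i}_{j}" for j in range(50)] for i in range(10)},
    "rural": {f"village_{i}": [f"r{i}_{j}" for j in range(30)] for i in range(20)},
}

clusters_per_stratum = 3        # chosen by cost/variance considerations
individuals_per_cluster = 10

sample = []
for stratum, clusters in frame.items():
    chosen_clusters = random.sample(list(clusters), clusters_per_stratum)
    for name in chosen_clusters:
        sample.extend(random.sample(clusters[name], individuals_per_cluster))

print(len(sample), "individuals sampled, e.g.:", sample[:5])
```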
Cosmological Constraints from Galaxy Clustering and the Mass-to-number Ratio of Galaxy Clusters
NASA Astrophysics Data System (ADS)
Tinker, Jeremy L.; Sheldon, Erin S.; Wechsler, Risa H.; Becker, Matthew R.; Rozo, Eduardo; Zu, Ying; Weinberg, David H.; Zehavi, Idit; Blanton, Michael R.; Busha, Michael T.; Koester, Benjamin P.
2012-01-01
We place constraints on the average density (Ω_m) and clustering amplitude (σ_8) of matter using a combination of two measurements from the Sloan Digital Sky Survey: the galaxy two-point correlation function, w_p(r_p), and the mass-to-galaxy-number ratio within galaxy clusters, M/N, analogous to cluster M/L ratios. Our w_p(r_p) measurements are obtained from DR7, while the sample of clusters is the maxBCG sample, with cluster masses derived from weak gravitational lensing. We construct nonlinear galaxy bias models using the Halo Occupation Distribution (HOD) to fit both w_p(r_p) and M/N for different cosmological parameters. HOD models that match the same two-point clustering predict different numbers of galaxies in massive halos when Ω_m or σ_8 is varied, thereby breaking the degeneracy between cosmology and bias. We demonstrate that this technique yields constraints that are consistent and competitive with current results from cluster abundance studies, without the use of abundance information. Using w_p(r_p) and M/N alone, we find Ω_m^0.5 σ_8 = 0.465 ± 0.026, with individual constraints of Ω_m = 0.29 ± 0.03 and σ_8 = 0.85 ± 0.06. Combined with current cosmic microwave background data, these constraints are Ω_m = 0.290 ± 0.016 and σ_8 = 0.826 ± 0.020. All errors are 1σ. The systematic uncertainties that the M/N technique is most sensitive to are the amplitude of the bias function of dark matter halos and the possibility of redshift evolution between the SDSS Main sample and the maxBCG cluster sample. Our derived constraints are insensitive to the current level of uncertainties in the halo mass function and in the mass-richness relation of clusters and its scatter, making the M/N technique complementary to cluster abundances as a method for constraining cosmology with future galaxy surveys.
Old, L.; Wojtak, R.; Pearce, F. R.; ...
2017-12-20
With the advent of wide-field cosmological surveys, we are approaching samples of hundreds of thousands of galaxy clusters. While such large numbers will help reduce statistical uncertainties, the control of systematics in cluster masses is crucial. Here we examine the effects of an important source of systematic uncertainty in galaxy-based cluster mass estimation techniques: the presence of significant dynamical substructure. Dynamical substructure manifests as dynamically distinct subgroups in phase-space, indicating an 'unrelaxed' state. This issue affects around a quarter of clusters in a generally selected sample. We employ a set of mock clusters whose masses have been measured homogeneously with commonly used galaxy-based mass estimation techniques (kinematic, richness, caustic, radial methods). We use these to study how the relation between observationally estimated and true cluster mass depends on the presence of substructure, as identified by various popular diagnostics. We find that the scatter for an ensemble of clusters does not increase dramatically for clusters with dynamical substructure. However, we find a systematic bias for all methods, such that clusters with significant substructure have higher measured masses than their relaxed counterparts. This bias depends on cluster mass: the most massive clusters are largely unaffected by the presence of significant substructure, but masses are significantly overestimated for lower mass clusters, by ~10 percent at 10^14 and ≳20 percent for ≲10^13.5. Finally, the use of cluster samples with different levels of substructure can therefore bias certain cosmological parameters up to a level comparable to the typical uncertainties in current cosmological studies.
Water quality analysis of the Rapur area, Andhra Pradesh, South India using multivariate techniques
NASA Astrophysics Data System (ADS)
Nagaraju, A.; Sreedhar, Y.; Thejaswi, A.; Sayadi, Mohammad Hossein
2017-10-01
The groundwater samples from the Rapur area were collected from different sites to evaluate the major ion chemistry. The large amount of data can lead to difficulties in the integration, interpretation, and representation of the results. Two multivariate statistical methods, hierarchical cluster analysis (HCA) and factor analysis (FA), were applied to evaluate their usefulness in classifying and identifying the geochemical processes controlling groundwater geochemistry. Four statistically significant clusters were obtained from 30 sampling stations. This resulted in two important clusters, viz., cluster 1 (pH, Si, CO3, Mg, SO4, Ca, K, HCO3, alkalinity, Na, Na + K, Cl, and hardness) and cluster 2 (EC and TDS), which are released to the study area from different sources. The application of different multivariate statistical techniques, such as principal component analysis (PCA), assists in the interpretation of complex data matrices for a better understanding of the water quality of a study area. From PCA, it is clear that the first factor (factor 1), which accounted for 36.2% of the total variance, showed high positive loadings on EC, Mg, Cl, TDS, and hardness. Based on the PCA scores, four significant cluster groups of sampling locations were detected on the basis of the similarity of their water quality.
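An illustrative re-creation of this workflow in Python (standardization, hierarchical clustering of the parameters, then PCA); the data are random placeholders rather than the Rapur measurements, and Ward linkage is an assumed choice.

```python
# Standardize hydrochemical variables, cluster the parameters, then run PCA.
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
params = ["pH", "EC", "TDS", "Ca", "Mg", "Na", "K", "Cl", "SO4", "HCO3"]
data = pd.DataFrame(rng.normal(size=(30, len(params))), columns=params)  # 30 stations

z = StandardScaler().fit_transform(data)

# Cluster the *parameters* (as in the abstract's clusters 1 and 2)
var_linkage = linkage(z.T, method="ward")
var_groups = fcluster(var_linkage, t=2, criterion="maxclust")
print(dict(zip(params, var_groups)))

# PCA on the stations: factor-1 loadings show which parameters dominate
pca = PCA(n_components=3).fit(z)
print("variance explained:", np.round(pca.explained_variance_ratio_, 2))
print("factor 1 loadings:", dict(zip(params, np.round(pca.components_[0], 2))))
```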
NASA Astrophysics Data System (ADS)
Willis, J. P.; Ramos-Ceja, M. E.; Muzzin, A.; Pacaud, F.; Yee, H. K. C.; Wilson, G.
2018-07-01
We present a comparison of two samples of z > 0.8 galaxy clusters selected using different wavelength-dependent techniques and examine the physical differences between them. We consider 18 clusters from the X-ray-selected XMM Large Scale Structure (LSS) distant cluster survey and 92 clusters from the optical-mid-infrared (MIR)-selected Spitzer Adaptation of the Red-Sequence Cluster Survey (SpARCS). Both samples are selected from the same approximately 9 sq deg sky area and we examine them using common XMM-Newton, Spitzer Wide-Area Infrared Extragalactic (SWIRE) survey, and Canada-France-Hawaii Telescope Legacy Survey data. Clusters from each sample are compared employing aperture measures of X-ray and MIR emission. We divide the SpARCS distant cluster sample into three sub-samples: (i) X-ray bright, (ii) X-ray faint, MIR bright, and (iii) X-ray faint, MIR faint clusters. We determine that X-ray- and MIR-selected clusters display very similar surface brightness distributions of galaxy MIR light. In addition, the average location and amplitude of the galaxy red sequence as measured from stacked colour histograms is very similar in the X-ray- and MIR-selected samples. The sub-sample of X-ray faint, MIR bright clusters displays a distribution of brightest cluster galaxy-barycentre position offsets which extends to higher values than all other samples. This observation indicates that such clusters may exist in a more disturbed state compared to the majority of the distant cluster population sampled by XMM-LSS and SpARCS. This conclusion is supported by stacked X-ray images for the X-ray faint, MIR bright cluster sub-sample that display weak, centrally concentrated X-ray emission, consistent with a population of growing clusters accreting from an extended envelope of material.
ERIC Educational Resources Information Center
Wilson, Mark
This study investigates the accuracy of the Woodruff-Causey technique for estimating sampling errors for complex statistics. The technique may be applied when data are collected by using multistage clustered samples. The technique was chosen for study because of its relevance to the correct use of multivariate analyses in educational survey…
Uncertainties in the cluster-cluster correlation function
NASA Astrophysics Data System (ADS)
Ling, E. N.; Frenk, C. S.; Barrow, J. D.
1986-12-01
The bootstrap resampling technique is applied to estimate sampling errors and significance levels of the two-point correlation functions determined for a subset of the CfA redshift survey of galaxies and a redshift sample of 104 Abell clusters. The angular correlation function for a sample of 1664 Abell clusters is also calculated. The standard errors in xi(r) for the Abell data are found to be considerably larger than quoted 'Poisson errors'. The best estimate for the ratio of the correlation length of Abell clusters (richness class R greater than or equal to 1, distance class D less than or equal to 4) to that of CfA galaxies is 4.2 (+1.4, −1.0) (68th-percentile error). The enhancement of cluster clustering over galaxy clustering is statistically significant in the presence of resampling errors. The uncertainties found do not include the effects of possible systematic biases in the galaxy and cluster catalogs and could be regarded as lower bounds on the true uncertainty range.
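The general bootstrap-resampling pattern used here can be sketched as follows; `correlation_length` is a hypothetical stand-in for the real xi(r) estimation and fitting machinery, and the mock catalogue is random.

```python
# Generic bootstrap error estimate: resample the catalogue with replacement and
# recompute the statistic; the spread of the resampled estimates is the error.
import numpy as np

rng = np.random.default_rng(3)

def correlation_length(positions):
    # Placeholder statistic: RMS pair separation of a random subsample.
    idx = rng.choice(len(positions), size=min(200, len(positions)), replace=False)
    sub = positions[idx]
    d = np.linalg.norm(sub[:, None, :] - sub[None, :, :], axis=-1)
    return np.sqrt(np.mean(d[np.triu_indices_from(d, k=1)] ** 2))

catalogue = rng.uniform(0, 100, size=(1664, 3))     # mock cluster positions

estimates = []
for _ in range(500):
    resample = catalogue[rng.integers(0, len(catalogue), size=len(catalogue))]
    estimates.append(correlation_length(resample))

print(f"r0 = {np.mean(estimates):.2f} +/- {np.std(estimates):.2f} (bootstrap)")
```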
Machine learning approaches for estimation of prediction interval for the model output.
Shrestha, Durga L; Solomatine, Dimitri P
2006-03-01
A novel method for estimating prediction uncertainty using machine learning techniques is presented. Uncertainty is expressed in the form of the two quantiles (constituting the prediction interval) of the underlying distribution of prediction errors. The idea is to partition the input space into different zones or clusters having similar model errors using fuzzy c-means clustering. The prediction interval is constructed for each cluster on the basis of empirical distributions of the errors associated with all instances belonging to the cluster under consideration and propagated from each cluster to the examples according to their membership grades in each cluster. Then a regression model is built for in-sample data using computed prediction limits as targets, and finally, this model is applied to estimate the prediction intervals (limits) for out-of-sample data. The method was tested on artificial and real hydrologic data sets using various machine learning techniques. Preliminary results show that the method is superior to other methods estimating the prediction interval. A new method for evaluating performance for estimating prediction interval is proposed as well.
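A simplified sketch of the core idea follows, with a small hand-rolled fuzzy c-means, per-cluster empirical error quantiles, and propagation of the limits through the membership grades; it omits the authors' final regression step and all data are synthetic.

```python
# Fuzzy c-means on the input space, then per-cluster error quantiles propagated
# to each example via its membership grades.
import numpy as np

def fuzzy_cmeans(X, c=3, m=2.0, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=len(X))                 # memberships (n, c)
    for _ in range(iters):
        Um = U ** m
        centers = Um.T @ X / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1) + 1e-12
        U = 1.0 / (d ** (2 / (m - 1)))                         # standard FCM update
        U /= U.sum(axis=1, keepdims=True)
    return centers, U

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(400, 2))                          # model inputs
errors = rng.normal(scale=0.2 + 0.3 * (X[:, 0] > 0), size=400) # heteroscedastic residuals

centers, U = fuzzy_cmeans(X, c=3)
labels = U.argmax(axis=1)

lo = np.array([np.quantile(errors[labels == k], 0.05) for k in range(3)])
hi = np.array([np.quantile(errors[labels == k], 0.95) for k in range(3)])

pi_lower = U @ lo        # prediction limits propagated via membership grades
pi_upper = U @ hi
print("example prediction interval widths:", np.round((pi_upper - pi_lower)[:5], 2))
```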
Analysis of Spectral-type A/B Stars in Five Open Clusters
NASA Astrophysics Data System (ADS)
Wilhelm, Ronald J.; Rafuil Islam, M.
2014-01-01
We have obtained low-resolution (R = 1000) spectroscopy of N = 68 spectral-type A/B stars in five nearby open star clusters using the McDonald Observatory 2.1 m telescope. The sample of blue stars in the various clusters was selected to test our new technique for determining interstellar reddening and distances in areas where interstellar reddening is high. We use a Bayesian approach to find the posterior distribution for Teff, log g and [Fe/H] from a combination of reddened photometric colors and spectroscopic line strengths. We will present calibration results for this technique using open cluster star data with known reddening and distances. Preliminary results suggest our technique can produce both reddening and distance determinations to within 10% of cluster values. Our technique opens the possibility of determining distances for blue stars at low Galactic latitudes where extinction can be large and differential. We will also compare our stellar parameter determinations to previously reported MK spectral classifications and discuss the probability that some of our stars are not members of their reported clusters.
NASA Astrophysics Data System (ADS)
Vazza, F.; Brunetti, G.; Gheller, C.; Brunino, R.
2010-11-01
We present a sample of 20 massive galaxy clusters with total virial masses in the range 6 × 10^14 M_⊙ ≤ M_vir ≤ 2 × 10^15 M_⊙, re-simulated with a customized version of the ENZO 1.5 code employing adaptive mesh refinement. This technique allowed us to obtain unprecedentedly high spatial resolution (≈25 kpc/h) up to a distance of ~3 virial radii from the cluster center, and makes it possible to focus with the same level of detail on the physical properties of the innermost and of the outermost cluster regions, providing new clues on the role of shock waves and turbulent motions in the ICM across a wide range of scales. In this paper, a first exploratory study of this data set is presented. We report on the thermal properties of galaxy clusters at z = 0. Integrated and morphological properties of the gas density, gas temperature, gas entropy and baryon fraction distributions are discussed, and compared with existing outcomes from both the observational and the numerical literature. Our cluster sample shows an overall good consistency with the results obtained adopting other numerical techniques (e.g., Smoothed Particle Hydrodynamics), yet it provides a more accurate representation of the accretion patterns far outside the cluster cores. We also reconstruct the properties of shock waves within the sample by means of a velocity-based approach, and we study Mach numbers and energy distributions for the various dynamical states in clusters, giving estimates for the injection of cosmic-ray particles at shocks. The present sample is rather unique in the panorama of cosmological simulations of massive galaxy clusters, due to its dynamical range, statistics of objects and number of time outputs. For this reason, we deploy a public repository of the available data, accessible via a web portal at http://data.cineca.it.
Manual hierarchical clustering of regional geochemical data using a Bayesian finite mixture model
Ellefsen, Karl J.; Smith, David
2016-01-01
Interpretation of regional scale, multivariate geochemical data is aided by a statistical technique called “clustering.” We investigate a particular clustering procedure by applying it to geochemical data collected in the State of Colorado, United States of America. The clustering procedure partitions the field samples for the entire survey area into two clusters. The field samples in each cluster are partitioned again to create two subclusters, and so on. This manual procedure generates a hierarchy of clusters, and the different levels of the hierarchy show geochemical and geological processes occurring at different spatial scales. Although there are many different clustering methods, we use Bayesian finite mixture modeling with two probability distributions, which yields two clusters. The model parameters are estimated with Hamiltonian Monte Carlo sampling of the posterior probability density function, which usually has multiple modes. Each mode has its own set of model parameters; each set is checked to ensure that it is consistent both with the data and with independent geologic knowledge. The set of model parameters that is most consistent with the independent geologic knowledge is selected for detailed interpretation and partitioning of the field samples.
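A simplified stand-in for the described procedure: recursively splitting the field samples with a two-component mixture model to build the manual hierarchy. The paper estimates the Bayesian mixture with Hamiltonian Monte Carlo and checks each mode against independent geologic knowledge; plain EM via scikit-learn is used here purely for illustration, and the data are placeholders.

```python
# Recursive two-component splits as a stand-in for the manual hierarchical
# clustering described above (the real method uses a Bayesian mixture + HMC).
import numpy as np
from sklearn.mixture import GaussianMixture

def split(samples, depth, max_depth=2, path="root"):
    print(f"{path}: {len(samples)} field samples")
    if depth >= max_depth or len(samples) < 20:
        return
    gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
    labels = gmm.fit_predict(samples)
    for k in (0, 1):
        split(samples[labels == k], depth + 1, max_depth, f"{path}/{k}")

rng = np.random.default_rng(0)
# Placeholder "geochemical" data: log-concentrations of 5 elements at 300 sites
samples = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(150, 5)),
    rng.normal(loc=2.0, scale=0.7, size=(150, 5)),
])
split(samples, depth=0)
```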
Sánchez-Marcos, J; Laguna-Marco, M A; Martínez-Morillas, R; Céspedes, E; Menéndez, N; Jiménez-Villacorta, F; Prieto, C
2012-11-01
Partially oxidized iron nanoclusters have been prepared by the gas-phase aggregation technique, with typical sizes of 2-3 nm. This preparation technique has been reported to yield clusters with interesting magnetic properties, such as very large exchange bias. In this paper, a sample composition study carried out by Mössbauer and X-ray absorption spectroscopies is reported. The information obtained by these techniques, which is based on the iron short-range order, proves to be an ideal way to characterize the whole sample, since the data are an average over a very large number of clusters. In addition, our results indicate the presence of ferrihydrite, a compound typically ignored when studying this type of system.
IDENTIFICATION OF MEMBERS IN THE CENTRAL AND OUTER REGIONS OF GALAXY CLUSTERS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Serra, Ana Laura; Diaferio, Antonaldo, E-mail: serra@ph.unito.it
2013-05-10
The caustic technique measures the mass of galaxy clusters in both their virial and infall regions and, as a byproduct, yields the list of cluster galaxy members. Here we use 100 galaxy clusters with mass M_200 ≥ 10^14 h^-1 M_⊙ extracted from a cosmological N-body simulation of a ΛCDM universe to test the ability of the caustic technique to identify the cluster galaxy members. We identify the true three-dimensional members as the gravitationally bound galaxies. The caustic technique uses the caustic location in the redshift diagram to separate the cluster members from the interlopers. We apply the technique to mock catalogs containing 1000 galaxies in the field of view of 12 h^-1 Mpc on a side at the cluster location. On average, this sample size roughly corresponds to 180 real galaxy members within 3r_200, similar to recent redshift surveys of cluster regions. The caustic technique yields a completeness, the fraction of identified true members, f_c = 0.95 ± 0.03, within 3r_200. The contamination, the fraction of interlopers in the observed catalog of members, increases from f_i = 0.020 (+0.046, −0.015) at r_200 to f_i = 0.08 (+0.11, −0.05) at 3r_200. No other technique for the identification of the members of a galaxy cluster provides such large completeness and small contamination at these large radii. The caustic technique assumes spherical symmetry, and the asphericity of the cluster is responsible for most of the spread of the completeness and the contamination. By applying the technique to an approximately spherical system obtained by stacking the individual clusters, the spreads decrease by at least a factor of two. We finally estimate the cluster mass within 3r_200 after removing the interlopers: for individual clusters, the mass estimated with the virial theorem is unbiased and within 30% of the actual mass; this spread decreases to less than 10% for the spherically symmetric stacked cluster.
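The completeness and contamination statistics quoted above are straightforward to compute once true and identified member lists are in hand; a small sketch with hypothetical galaxy IDs:

```python
# f_c = identified true members / all true members
# f_i = interlopers among identified members / all identified members
def completeness_contamination(true_members, identified):
    true_members, identified = set(true_members), set(identified)
    f_c = len(identified & true_members) / len(true_members)
    f_i = len(identified - true_members) / len(identified)
    return f_c, f_i

true_members = {f"gal{i}" for i in range(180)}                           # ~180 members within 3 r200
identified = {f"gal{i}" for i in range(5, 180)} | {"fg1", "fg2", "fg3"}  # hypothetical caustic output

f_c, f_i = completeness_contamination(true_members, identified)
print(f"completeness = {f_c:.2f}, contamination = {f_i:.2f}")
```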
The Effect of Cluster-Based Instruction on Mathematic Achievement in Inclusive Schools
ERIC Educational Resources Information Center
Gunarhadi, Sunardi; Anwar, Mohammad; Andayani, Tri Rejeki; Shaari, Abdull Sukor
2016-01-01
The research aimed to investigate the effect of Cluster-Based Instruction (CBI) on the academic achievement of Mathematics in inclusive schools. The sample was 68 students in two intact classes, including those with learning disabilities, selected using a cluster random technique among 17 inclusive schools in the regency of Surakarta. The two…
Cancer detection based on Raman spectra super-paramagnetic clustering
NASA Astrophysics Data System (ADS)
González-Solís, José Luis; Guizar-Ruiz, Juan Ignacio; Martínez-Espinosa, Juan Carlos; Martínez-Zerega, Brenda Esmeralda; Juárez-López, Héctor Alfonso; Vargas-Rodríguez, Héctor; Gallegos-Infante, Luis Armando; González-Silva, Ricardo Armando; Espinoza-Padilla, Pedro Basilio; Palomares-Anda, Pascual
2016-08-01
The clustering of Raman spectra of serum samples is analyzed using the super-paramagnetic clustering technique based on the Potts spin model. We investigated the clustering of biochemical networks by using Raman data that define edge lengths in the network, where the interactions are functions of the individual band intensities of the Raman spectra. For this study, we used two groups of 58 and 102 control Raman spectra, and the intensities of 160, 150 and 42 Raman spectra of serum samples from breast cancer, cervical cancer and leukemia patients, respectively. The spectra were collected from patients from different hospitals in Mexico. By using the super-paramagnetic clustering technique, we identified the most natural and compact clusters, allowing us to discriminate the control and cancer patients. Of special interest was the leukemia case, where the nearly hierarchical structure observed allowed identification of each patient's leukemia type. The goal of this study is to apply a model of statistical physics, such as the super-paramagnetic one, to find the natural clusters that allow us to design a cancer detection method. To the best of our knowledge, this is the first report of preliminary results evaluating the usefulness of super-paramagnetic clustering in the discipline of spectroscopy, where it is used for classification of spectra.
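As a sketch of the first ingredient of super-paramagnetic clustering, the code below builds a K-nearest-neighbour graph over placeholder spectra and assigns Gaussian couplings in the spirit of the Blatt-Wiseman-Domany formulation; the Potts-model Monte Carlo stage that actually produces the clusters is omitted, and the scale parameter is taken here as the mean nearest-neighbour distance.

```python
# K-nearest-neighbour graph with Gaussian couplings between spectra (the input
# to the Potts-model simulation, which is not implemented here).
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
spectra = rng.normal(size=(60, 300))        # placeholder Raman band intensities

K = 10
d = cdist(spectra, spectra)
neighbours = np.argsort(d, axis=1)[:, 1:K + 1]                       # K nearest neighbours
a = np.mean([d[i, neighbours[i, 0]] for i in range(len(spectra))])   # mean NN distance (assumed scale)

J = np.zeros_like(d)
for i in range(len(spectra)):
    for j in neighbours[i]:
        J[i, j] = J[j, i] = (1.0 / K) * np.exp(-d[i, j] ** 2 / (2 * a ** 2))

print("non-zero couplings:", int((J > 0).sum() / 2), "mean J:", float(J[J > 0].mean()))
```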
DOE Office of Scientific and Technical Information (OSTI.GOV)
Foltz, R.; Wilson, G.; DeGroot, A.
We study the slope, intercept, and scatter of the color–magnitude and color–mass relations for a sample of 10 infrared red-sequence-selected clusters at z ∼ 1. The quiescent galaxies in these clusters formed the bulk of their stars above z ≳ 3 with an age spread Δt ≳ 1 Gyr. We compare UVJ color–color and spectroscopic-based galaxy selection techniques, and find a 15% difference in the galaxy populations classified as quiescent by these methods. We compare the color–magnitude relations from our red-sequence-selected sample with X-ray- and photometric-redshift-selected cluster samples of similar mass and redshift. Within uncertainties, we are unable to detect any difference in the ages and star formation histories of quiescent cluster members in clusters selected by different methods, suggesting that the dominant quenching mechanism is insensitive to cluster baryon partitioning at z ∼ 1.
Phung, Dung; Huang, Cunrui; Rutherford, Shannon; Dwirahmadi, Febi; Chu, Cordia; Wang, Xiaoming; Nguyen, Minh; Nguyen, Nga Huy; Do, Cuong Manh; Nguyen, Trung Hieu; Dinh, Tuan Anh Diep
2015-05-01
The present study is an evaluation of temporal/spatial variations of surface water quality using multivariate statistical techniques, comprising cluster analysis (CA), principal component analysis (PCA), factor analysis (FA) and discriminant analysis (DA). Eleven water quality parameters were monitored at 38 different sites in Can Tho City, a Mekong Delta area of Vietnam from 2008 to 2012. Hierarchical cluster analysis grouped the 38 sampling sites into three clusters, representing mixed urban-rural areas, agricultural areas and industrial zone. FA/PCA resulted in three latent factors for the entire research location, three for cluster 1, four for cluster 2, and four for cluster 3 explaining 60, 60.2, 80.9, and 70% of the total variance in the respective water quality. The varifactors from FA indicated that the parameters responsible for water quality variations are related to erosion from disturbed land or inflow of effluent from sewage plants and industry, discharges from wastewater treatment plants and domestic wastewater, agricultural activities and industrial effluents, and contamination by sewage waste with faecal coliform bacteria through sewer and septic systems. Discriminant analysis (DA) revealed that nephelometric turbidity units (NTU), chemical oxygen demand (COD) and NH₃ are the discriminating parameters in space, affording 67% correct assignation in spatial analysis; pH and NO₂ are the discriminating parameters according to season, assigning approximately 60% of cases correctly. The findings suggest a possible revised sampling strategy that can reduce the number of sampling sites and the indicator parameters responsible for large variations in water quality. This study demonstrates the usefulness of multivariate statistical techniques for evaluation of temporal/spatial variations in water quality assessment and management.
Pezzoli, L; Tchio, R; Dzossa, A D; Ndjomo, S; Takeu, A; Anya, B; Ticha, J; Ronveaux, O; Lewis, R F
2012-01-01
We used the clustered lot quality assurance sampling (clustered-LQAS) technique to identify districts with low immunization coverage and guide mop-up actions during the last 4 days of a combined oral polio vaccine (OPV) and yellow fever (YF) vaccination campaign conducted in Cameroon in May 2009. We monitored 17 pre-selected districts at risk for low coverage. We designed LQAS plans to reject districts with YF vaccination coverage <90% and with OPV coverage <95%. In each lot the sample size was 50 (five clusters of 10) with decision values of 3 for assessing OPV and 7 for YF coverage. We 'rejected' 10 districts for low YF coverage and 14 for low OPV coverage. Hence we recommended a 2-day extension of the campaign. Clustered-LQAS proved to be useful in guiding the campaign vaccination strategy before the completion of the operations.
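The lot decision rule described above is simple to simulate; the sketch below samples five clusters of ten children per lot and rejects when the unvaccinated count exceeds the decision value (3 for OPV, 7 for YF). It ignores intra-cluster correlation, so it only illustrates the rule itself, not the operating characteristics of the real design.

```python
# Clustered-LQAS lot decision rule: sample n_clusters * per_cluster children,
# count the unvaccinated, reject the lot if the count exceeds the decision value.
import random

random.seed(0)

def assess_lot(true_coverage, n_clusters=5, per_cluster=10, decision_value=3):
    unvaccinated = sum(
        1
        for _ in range(n_clusters)
        for _ in range(per_cluster)
        if random.random() > true_coverage
    )
    return "reject (mop-up needed)" if unvaccinated > decision_value else "accept"

for coverage in (0.99, 0.95, 0.90, 0.80):
    print(f"OPV coverage {coverage:.0%}: {assess_lot(coverage, decision_value=3)}")
    print(f" YF coverage {coverage:.0%}: {assess_lot(coverage, decision_value=7)}")
```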
Photometry Using Kepler "Superstamps" of Open Clusters NGC 6791 & NGC 6819
NASA Astrophysics Data System (ADS)
Kuehn, Charles A.; Drury, Jason A.; Bellamy, Beau R.; Stello, Dennis; Bedding, Timothy R.; Reed, Mike; Quick, Breanna
2015-09-01
The Kepler space telescope has proven to be a gold mine for the study of variable stars. Usually, Kepler only reads out a handful of pixels around each pre-selected target star, omitting a large number of stars in the Kepler field. Fortunately, for the open clusters NGC 6791 and NGC 6819, Kepler also read out larger "superstamps" which contained complete images of the central region of each cluster. These cluster images can be used to study additional stars in the open clusters that were not originally on Kepler's target list. We discuss our work on using two photometric techniques to analyze these superstamps and present sample results from this project to demonstrate the value of this technique for a wide variety of variable stars.
Survey of adaptive image coding techniques
NASA Technical Reports Server (NTRS)
Habibi, A.
1977-01-01
The general problem of image data compression is discussed briefly with attention given to the use of Karhunen-Loeve transforms, suboptimal systems, and block quantization. A survey is then conducted encompassing the four categories of adaptive systems: (1) adaptive transform coding (adaptive sampling, adaptive quantization, etc.), (2) adaptive predictive coding (adaptive delta modulation, adaptive DPCM encoding, etc.), (3) adaptive cluster coding (blob algorithms and the multispectral cluster coding technique), and (4) adaptive entropy coding.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Parisi, M. C.; Clariá, J. J.; Marcionni, N.
2015-05-15
We obtained spectra of red giants in 15 Small Magellanic Cloud (SMC) clusters in the region of the Ca II lines with FORS2 on the Very Large Telescope. We determined the mean metallicity and radial velocity with mean errors of 0.05 dex and 2.6 km s^−1, respectively, from a mean of 6.5 members per cluster. One cluster (B113) was too young for a reliable metallicity determination and was excluded from the sample. We combined the sample studied here with 15 clusters previously studied by us using the same technique, and with 7 clusters whose metallicities determined by other authors are on a scale similar to ours. This compilation of 36 clusters is the largest SMC cluster sample currently available with accurate and homogeneously determined metallicities. We found a high probability that the metallicity distribution is bimodal, with potential peaks at −1.1 and −0.8 dex. Our data show no strong evidence of a metallicity gradient in the SMC clusters, somewhat at odds with recent evidence from Ca II triplet spectra of a large sample of field stars. This may be revealing possible differences in the chemical history of clusters and field stars. Our clusters show a significant dispersion of metallicities, whatever age is considered, which could be reflecting the lack of a unique age–metallicity relation in this galaxy. None of the chemical evolution models currently available in the literature satisfactorily represents the global chemical enrichment processes of SMC clusters.
Min-max hyperellipsoidal clustering for anomaly detection in network security.
Sarasamma, Suseela T; Zhu, Qiuming A
2006-08-01
A novel hyperellipsoidal clustering technique is presented for an intrusion-detection system in network security. Hyperellipsoidal clusters toward maximum intracluster similarity and minimum intercluster similarity are generated from training data sets. The novelty of the technique lies in the fact that the parameters needed to construct higher order data models in general multivariate Gaussian functions are incrementally derived from the data sets using accretive processes. The technique is implemented in a feedforward neural network that uses a Gaussian radial basis function as the model generator. An evaluation based on the inclusiveness and exclusiveness of samples with respect to specific criteria is applied to accretively learn the output clusters of the neural network. One significant advantage of this is its ability to detect individual anomaly types that are hard to detect with other anomaly-detection schemes. Applying this technique, several feature subsets of the tcptrace network-connection records that give above 95% detection at false-positive rates below 5% were identified.
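As a compact illustration of the hyperellipsoidal idea (not the paper's accretive RBF-network learner), the sketch below models normal traffic with full-covariance Gaussian clusters and flags low-likelihood records as anomalies; the data and threshold are placeholders.

```python
# Hyperellipsoidal clusters = Gaussian components with full covariance; records
# far outside all ellipsoids (low likelihood) are flagged as anomalies.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(5)
normal_traffic = rng.multivariate_normal([0, 0], [[2.0, 1.2], [1.2, 1.5]], size=1000)
attacks = rng.uniform(-8, 8, size=(30, 2))                 # scattered anomalous records

model = GaussianMixture(n_components=3, covariance_type="full", random_state=0)
model.fit(normal_traffic)

threshold = np.quantile(model.score_samples(normal_traffic), 0.01)   # ~1% false positives
flagged = model.score_samples(attacks) < threshold
print(f"detected {flagged.sum()} of {len(attacks)} anomalies")
```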
Testing for X-Ray–SZ Differences and Redshift Evolution in the X-Ray Morphology of Galaxy Clusters
Nurgaliev, D.; McDonald, M.; Benson, B. A.; ...
2017-05-16
We present a quantitative study of the X-ray morphology of galaxy clusters, as a function of their detection method and redshift. We analyze two separate samples of galaxy clusters: a sample of 36 clusters at 0.35 < z < 0.9 selected in the X-ray with the ROSAT PSPC 400 deg² survey, and a sample of 90 clusters at 0.25 < z < 1.2 selected via the Sunyaev–Zel'dovich (SZ) effect with the South Pole Telescope. Clusters from both samples have similar-quality Chandra observations, which allow us to quantify their X-ray morphologies via two distinct methods: centroid shifts (w) and photon asymmetry (A_phot). The latter technique provides nearly unbiased morphology estimates for clusters spanning a broad range of redshift and data quality. We further compare the X-ray morphologies of X-ray- and SZ-selected clusters with those of simulated clusters. We do not find a statistically significant difference in the measured X-ray morphology of X-ray- and SZ-selected clusters over the redshift range probed by these samples, suggesting that the two are probing similar populations of clusters. We find that the X-ray morphologies of simulated clusters are statistically indistinguishable from those of X-ray- or SZ-selected clusters, implying that the most important physics for dictating the large-scale gas morphology (outside of the core) is well approximated in these simulations. Finally, we find no statistically significant redshift evolution in the X-ray morphology (both for observed and simulated clusters) over the range of z ~ 0.3 to z ~ 1, seemingly in contradiction with the redshift-dependent halo merger rate predicted by simulations.
A method of using cluster analysis to study statistical dependence in multivariate data
NASA Technical Reports Server (NTRS)
Borucki, W. J.; Card, D. H.; Lyle, G. C.
1975-01-01
A technique is presented that uses both cluster analysis and a Monte Carlo significance test of clusters to discover associations between variables in multidimensional data. The method is applied to an example of a noisy function in three-dimensional space, to a sample from a mixture of three bivariate normal distributions, and to the well-known Fisher's Iris data.
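A minimal sketch of pairing cluster analysis with a Monte Carlo significance test: the within-cluster dispersion of the real data is compared against reference data whose variables have been independently shuffled to destroy any association. The statistic and the shuffling scheme are illustrative choices, not necessarily those of the original method.

```python
# Monte Carlo significance test for clusters: observed within-cluster sum of
# squares versus a null distribution from column-shuffled (association-free) data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

def wss(X, k=3, seed=0):
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X).inertia_

X, _ = make_blobs(n_samples=150, centers=3, n_features=3, random_state=2)
observed = wss(X)

rng = np.random.default_rng(2)
null = []
for _ in range(200):
    shuffled = np.column_stack([rng.permutation(col) for col in X.T])  # breaks associations
    null.append(wss(shuffled))

p_value = np.mean(np.array(null) <= observed)
print(f"observed WSS = {observed:.1f}, Monte Carlo p = {p_value:.3f}")
```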
NASA Astrophysics Data System (ADS)
Gilbank, David G.; Barrientos, L. Felipe; Ellingson, Erica; Blindert, Kris; Yee, H. K. C.; Anguita, T.; Gladders, M. D.; Hall, P. B.; Hertling, G.; Infante, L.; Yan, R.; Carrasco, M.; Garcia-Vergara, Cristina; Dawson, K. S.; Lidman, C.; Morokuma, T.
2018-05-01
We present follow-up spectroscopic observations of galaxy clusters from the first Red-sequence Cluster Survey (RCS-1). This work focuses on two samples: a lower redshift sample of ˜30 clusters ranging in redshift from z ˜ 0.2-0.6, observed with multiobject spectroscopy (MOS) on 4-6.5-m class telescopes, and a z ˜ 1 sample of ˜10 clusters observed with 8-m class telescopes. We examine the detection efficiency and redshift accuracy of the now widely used red-sequence technique for selecting clusters via overdensities of red-sequence galaxies. Using both these data and extended samples including previously published RCS-1 spectroscopy and spectroscopic redshifts from SDSS, we find that the red-sequence redshift using simple two-filter cluster photometric redshifts is accurate to σz ≈ 0.035(1 + z) in RCS-1. This accuracy can potentially be improved with better survey photometric calibration. For the lower redshift sample, ˜5 per cent of clusters show some (minor) contamination from secondary systems with the same red sequence intruding into the measurement aperture of the original cluster. At z ˜ 1, the rate rises to ˜20 per cent. Approximately ten per cent of projections are expected to be serious, where the two components contribute significant numbers of their red-sequence galaxies to another cluster. Finally, we present a preliminary study of the mass-richness calibration using velocity dispersions to probe the dynamical masses of the clusters. We find a relation broadly consistent with that seen in the local universe from the WINGS sample at z ˜ 0.05.
Henry, David; Dymnicki, Allison B.; Mohatt, Nathaniel; Allen, James; Kelly, James G.
2016-01-01
Qualitative methods potentially add depth to prevention research, but can produce large amounts of complex data even with small samples. Studies conducted with culturally distinct samples often produce voluminous qualitative data, but may lack sufficient sample sizes for sophisticated quantitative analysis. Currently lacking in mixed methods research are methods allowing for more fully integrating qualitative and quantitative analysis techniques. Cluster analysis can be applied to coded qualitative data to clarify the findings of prevention studies by aiding efforts to reveal such things as the motives of participants for their actions and the reasons behind counterintuitive findings. By clustering groups of participants with similar profiles of codes in a quantitative analysis, cluster analysis can serve as a key component in mixed methods research. This article reports two studies. In the first study, we conduct simulations to test the accuracy of cluster assignment using three different clustering methods with binary data as produced when coding qualitative interviews. Results indicated that hierarchical clustering, K-Means clustering, and latent class analysis produced similar levels of accuracy with binary data, and that the accuracy of these methods did not decrease with samples as small as 50. Whereas the first study explores the feasibility of using common clustering methods with binary data, the second study provides a “real-world” example using data from a qualitative study of community leadership connected with a drug abuse prevention project. We discuss the implications of this approach for conducting prevention research, especially with small samples and culturally distinct communities. PMID:25946969
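A small sketch of the first study's simulation idea: generate binary code profiles for two known groups, recover the groups with hierarchical clustering (Jaccard distance) and K-means, and score recovery with the adjusted Rand index. Group sizes, code counts, and probabilities are placeholders, and latent class analysis is omitted here.

```python
# Simulate coded-interview (binary) data for two groups and test cluster recovery.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(4)
n_per_group, n_codes = 25, 12                      # total n = 50, the paper's smallest case
p = np.vstack([np.full(n_codes, 0.2), np.full(n_codes, 0.7)])
truth = np.repeat([0, 1], n_per_group)
codes = rng.binomial(1, p[truth])                  # binary presence/absence of codes

hier = fcluster(linkage(pdist(codes, metric="jaccard"), method="average"), 2, "maxclust")
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(codes)

print("hierarchical ARI:", round(adjusted_rand_score(truth, hier), 2))
print("k-means ARI:     ", round(adjusted_rand_score(truth, km), 2))
```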
An Archival Search For Young Globular Clusters in Galaxies
NASA Astrophysics Data System (ADS)
Whitmore, Brad
1995-07-01
One of the most intriguing results from HST has been the discovery of ultraluminous star clusters in interacting and merging galaxies. These clusters have the luminosities, colors, and sizes that would be expected of young globular clusters produced by the interaction. We propose to use the data in the HST Archive to determine how prevalent this phenomenon is, and to determine whether similar clusters are produced in other environments. Three samples will be extracted and studied in a systematic and consistent manner: 1) interacting and merging galaxies, 2) starburst galaxies, 3) a control sample of "normal" galaxies. A preliminary search of the archives shows that there are at least 20 galaxies in each of these samples, and the number will grow by about 50 as new observations become available. The data will be used to determine the luminosity function, color histogram, spatial distribution, and structural properties of the clusters using the same techniques employed in our study of NGC 7252 (the "Atoms-for-Peace" galaxy) and NGC 4038/4039 ("The Antennae"). Our ultimate goals are: 1) to understand how globular clusters form, and 2) to use the clusters as evolutionary tracers to unravel the histories of interacting galaxies.
OPEN CLUSTERS AS PROBES OF THE GALACTIC MAGNETIC FIELD. I. CLUSTER PROPERTIES
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hoq, Sadia; Clemens, D. P., E-mail: shoq@bu.edu, E-mail: clemens@bu.edu
2015-10-15
Stars in open clusters are powerful probes of the intervening Galactic magnetic field via background starlight polarimetry because they provide constraints on the magnetic field distances. We use 2MASS photometric data for a sample of 31 clusters in the outer Galaxy, for which near-IR polarimetric data were obtained, to determine the cluster distances, ages, and reddenings via fitting theoretical isochrones to cluster color–magnitude diagrams. The fitting approach uses an objective χ² minimization technique to derive the cluster properties and their uncertainties. We found the ages, distances, and reddenings for 24 of the clusters, and the distances and reddenings for 6 additional clusters that were either sparse or faint in the near-IR. The derived ranges of log(age), distance, and E(B−V) were 7.25–9.63, ∼670–6160 pc, and 0.02–1.46 mag, respectively. The distance uncertainties ranged from ∼8% to 20%. The derived parameters were compared to previous studies, and most cluster parameters agree within our uncertainties. To test the accuracy of the fitting technique, synthetic clusters with 50, 100, or 200 cluster members and a wide range of ages were fit. These tests recovered the input parameters within their uncertainties for more than 90% of the individual synthetic cluster parameters. These results indicate that the fitting technique likely provides reliable estimates of cluster properties. The distances derived will be used in an upcoming study of the Galactic magnetic field in the outer Galaxy.
Elemental Mixing State of Aerosol Particles Collected in Central Amazonia during GoAmazon2014/15
Fraund, Matthew; Pham, Don; Bonanno, Daniel; ...
2017-09-15
Two complementary techniques, Scanning Transmission X-ray Microscopy/Near Edge Fine Structure spectroscopy (STXM/NEXAFS) and Scanning Electron Microscopy/Energy Dispersive X-ray spectroscopy (SEM/EDX), have been quantitatively combined to characterize individual atmospheric particles. This pair of techniques was applied to particle samples at three sampling sites (ATTO, ZF2, and T3) in the Amazon basin as part of the Observations and Modeling of the Green Ocean Amazon (GoAmazon2014/5) field campaign during the dry season of 2014. The combined data was subjected to k-means clustering using mass fractions of the following elements: C, N, O, Na, Mg, P, S, Cl, K, Ca, Mn, Fe, Ni, and Zn. Cluster analysis identified 12 particle types, across different sampling sites and particle sizes. Samples from the remote Amazon Tall Tower Observatory (ATTO, also T0a) exhibited less cluster variety and fewer anthropogenic clusters than samples collected at the sites nearer to the Manaus metropolitan region, ZF2 (also T0t) or T3. Samples from the ZF2 site contained aged/anthropogenic clusters not readily explained by transport from ATTO or Manaus, possibly suggesting the effects of long range atmospheric transport or other local aerosol sources present during sampling. In addition, this data set allowed for recently established diversity parameters to be calculated. All sample periods had high mixing state indices (χ) that were >0.8. Two individual particle diversity (D_i) populations were observed, with particles <0.5 μm having a D_i of ~2.4 and >0.5 μm particles having a D_i of ~3.6, which likely correspond to fresh and aged aerosols, respectively. The diversity parameters determined by the quantitative method presented here will serve to aid in the accurate representation of aerosol mixing state, source apportionment, and aging in both less polluted and more industrialized environments in the Amazon Basin.
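A sketch combining the two analyses mentioned above: k-means on per-particle elemental mass fractions, plus the entropy-based diversity parameters (D_i, the average and bulk diversities, and the mixing state index χ) from the Riemer-and-West framework these quantities come from. The particle data below are random placeholders.

```python
# K-means on per-particle elemental mass fractions, plus mixing-state metrics.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(6)
elements = ["C", "N", "O", "Na", "S", "K", "Fe"]
mass = rng.gamma(shape=1.0, scale=1.0, size=(500, len(elements)))   # particle-by-element masses
fractions = mass / mass.sum(axis=1, keepdims=True)

labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(fractions)
print("particles per cluster:", np.bincount(labels))

def shannon(p):
    p = p[p > 0]
    return -(p * np.log(p)).sum()

D_i = np.exp([shannon(f) for f in fractions])                # per-particle diversity
p_particle = mass.sum(axis=1) / mass.sum()                   # particle mass fractions
D_alpha = np.exp((p_particle * np.log(D_i)).sum())           # mass-weighted average diversity
D_gamma = np.exp(shannon(mass.sum(axis=0) / mass.sum()))     # bulk-population diversity
chi = (D_alpha - 1) / (D_gamma - 1)
print(f"mean D_i = {D_i.mean():.2f}, mixing state index chi = {chi:.2f}")
```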
Kent, Peter; Stochkendahl, Mette Jensen; Christensen, Henrik Wulff; Kongsted, Alice
2015-01-01
Recognition of homogeneous subgroups of patients can usefully improve prediction of their outcomes and the targeting of treatment. There are a number of research approaches that have been used to recognise homogeneity in such subgroups and to test their implications. One approach is to use statistical clustering techniques, such as Cluster Analysis or Latent Class Analysis, to detect latent relationships between patient characteristics. Influential patient characteristics can come from diverse domains of health, such as pain, activity limitation, physical impairment, social role participation, psychological factors, biomarkers and imaging. However, such 'whole person' research may result in data-driven subgroups that are complex, difficult to interpret and challenging to recognise clinically. This paper describes a novel approach to applying statistical clustering techniques that may improve the clinical interpretability of derived subgroups and reduce sample size requirements. This approach involves clustering in two sequential stages. The first stage involves clustering within health domains and therefore requires creating as many clustering models as there are health domains in the available data. This first stage produces scoring patterns within each domain. The second stage involves clustering using the scoring patterns from each health domain (from the first stage) to identify subgroups across all domains. We illustrate this using chest pain data from the baseline presentation of 580 patients. The new two-stage clustering resulted in two subgroups that approximated the classic textbook descriptions of musculoskeletal chest pain and atypical angina chest pain. The traditional single-stage clustering resulted in five clusters that were also clinically recognisable but displayed less distinct differences. In this paper, a new approach to using clustering techniques to identify clinically useful subgroups of patients is suggested. Research designs, statistical methods and outcome metrics suitable for performing that testing are also described. This approach has potential benefits but requires broad testing, in multiple patient samples, to determine its clinical value. The usefulness of the approach is likely to be context-specific, depending on the characteristics of the available data and the research question being asked of it.
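To make the two-stage idea concrete, here is a minimal Python sketch: one clustering model per health domain in stage one, then a clustering of patients on their domain-level assignments in stage two. The domain names, cluster counts, and random data are illustrative assumptions, not the study's chest-pain variables or its Cluster/Latent Class Analysis models.

```python
# Minimal sketch of two-stage clustering: cluster within each health domain first,
# then cluster patients on their within-domain cluster assignments.
# Domain names, cluster counts and the random data are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_patients = 580
domains = {                      # hypothetical health domains -> feature matrices
    "pain":       rng.normal(size=(n_patients, 6)),
    "activity":   rng.normal(size=(n_patients, 4)),
    "psychology": rng.normal(size=(n_patients, 5)),
}

# Stage 1: one clustering model per domain, yielding a scoring pattern per patient
stage1 = np.column_stack([
    KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
    for X in domains.values()
])

# Stage 2: cluster patients on the matrix of domain-level assignments
# (one-hot encoding keeps the labels from being treated as ordinal)
onehot = np.concatenate([np.eye(3)[stage1[:, j]] for j in range(stage1.shape[1])], axis=1)
subgroups = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(onehot)
print(np.bincount(subgroups))    # sizes of the two derived subgroups
```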
Matsen IV, Frederick A.; Evans, Steven N.
2013-01-01
Principal components analysis (PCA) and hierarchical clustering are two of the most heavily used techniques for analyzing the differences between nucleic acid sequence samples taken from a given environment. They have led to many insights regarding the structure of microbial communities. We have developed two new complementary methods that leverage how this microbial community data sits on a phylogenetic tree. Edge principal components analysis enables the detection of important differences between samples that contain closely related taxa. Each principal component axis is a collection of signed weights on the edges of the phylogenetic tree, and these weights are easily visualized by a suitable thickening and coloring of the edges. Squash clustering outputs a (rooted) clustering tree in which each internal node corresponds to an appropriate “average” of the original samples at the leaves below the node. Moreover, the length of an edge is a suitably defined distance between the averaged samples associated with the two incident nodes, rather than the less interpretable average of distances produced by UPGMA, the most widely used hierarchical clustering method in this context. We present these methods and illustrate their use with data from the human microbiome. PMID:23505415
Network visualization of conformational sampling during molecular dynamics simulation.
Ahlstrom, Logan S; Baker, Joseph Lee; Ehrlich, Kent; Campbell, Zachary T; Patel, Sunita; Vorontsov, Ivan I; Tama, Florence; Miyashita, Osamu
2013-11-01
Effective data reduction methods are necessary for uncovering the inherent conformational relationships present in large molecular dynamics (MD) trajectories. Clustering algorithms provide a means to interpret the conformational sampling of molecules during simulation by grouping trajectory snapshots into a few subgroups, or clusters, but the relationships between the individual clusters may not be readily understood. Here we show that network analysis can be used to visualize the dominant conformational states explored during simulation as well as the connectivity between them, providing a more coherent description of conformational space than traditional clustering techniques alone. We compare the results of network visualization against 11 clustering algorithms and principal component conformer plots. Several MD simulations of proteins undergoing different conformational changes demonstrate the effectiveness of networks in reaching functional conclusions. Copyright © 2013 Elsevier Inc. All rights reserved.
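A hedged sketch of one way such a network can be built: per-frame cluster labels from a trajectory are turned into a directed graph whose nodes are conformational states and whose edge weights count observed transitions. networkx is assumed to be available, and the label sequence is synthetic rather than from an actual MD run.

```python
# Sketch: turn a sequence of per-frame cluster labels from an MD trajectory into a
# transition network whose nodes are conformational states and whose edge weights
# count observed transitions. The label sequence here is synthetic.
import numpy as np
import networkx as nx

rng = np.random.default_rng(1)
labels = rng.integers(0, 5, size=10_000)        # hypothetical cluster label per frame

G = nx.DiGraph()
for a, b in zip(labels[:-1], labels[1:]):
    if a == b:
        continue                                # keep only inter-state transitions
    if G.has_edge(a, b):
        G[a][b]["weight"] += 1
    else:
        G.add_edge(a, b, weight=1)

# Node size could encode state population and edge width the transition count,
# e.g. with nx.draw_networkx(G, ...) or by exporting to Gephi/Cytoscape.
populations = np.bincount(labels)
print(dict(enumerate(populations)))
print(sorted(G.edges(data="weight"))[:5])
```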
Weak lensing magnification of SpARCS galaxy clusters
NASA Astrophysics Data System (ADS)
Tudorica, A.; Hildebrandt, H.; Tewes, M.; Hoekstra, H.; Morrison, C. B.; Muzzin, A.; Wilson, G.; Yee, H. K. C.; Lidman, C.; Hicks, A.; Nantais, J.; Erben, T.; van der Burg, R. F. J.; Demarco, R.
2017-12-01
Context. Measuring and calibrating relations between cluster observables is critical for resource-limited studies. The mass-richness relation of clusters offers an observationally inexpensive way of estimating masses. Its calibration is essential for cluster and cosmological studies, especially for high-redshift clusters. Weak gravitational lensing magnification is a promising and complementary method to shear studies, that can be applied at higher redshifts. Aims: We aim to employ the weak lensing magnification method to calibrate the mass-richness relation up to a redshift of 1.4. We used the Spitzer Adaptation of the Red-Sequence Cluster Survey (SpARCS) galaxy cluster candidates (0.2 < z < 1.4) and optical data from the Canada France Hawaii Telescope (CFHT) to test whether magnification can be effectively used to constrain the mass of high-redshift clusters. Methods: Lyman-break galaxies (LBGs) selected using the u-band dropout technique and their colours were used as a background sample of sources. LBG positions were cross-correlated with the centres of the sample of SpARCS clusters to estimate the magnification signal, which was optimally-weighted using an externally-calibrated LBG luminosity function. The signal was measured for cluster sub-samples, binned in both redshift and richness. Results: We measured the cross-correlation between the positions of galaxy cluster candidates and LBGs and detected a weak lensing magnification signal for all bins at a detection significance of 2.6-5.5σ. In particular, the significance of the measurement for clusters with z > 1.0 is 4.1σ; for the entire cluster sample we obtained an average M_200 of 1.28 +0.23/-0.21 × 10^14 M⊙. Conclusions: Our measurements demonstrated the feasibility of using weak lensing magnification as a viable tool for determining the average halo masses for samples of high redshift galaxy clusters. The results also established the success of using galaxy over-densities to select massive clusters at z > 1. Additional studies are necessary for further modelling of the various systematic effects we discussed.
Structure of clusters and building blocks in amylopectin from African rice accessions.
Gayin, Joseph; Abdel-Aal, El-Sayed M; Marcone, Massimo; Manful, John; Bertoft, Eric
2016-09-05
Enzymatic hydrolysis in combination with gel-permeation and anion-exchange chromatography techniques were employed to characterise the composition of clusters and building blocks of amylopectin from two African rice (Oryza glaberrima) accessions-IRGC 103759 and TOG 12440. The samples were compared with one Asian rice (Oryza sativa) sample (cv WITA 4) and one O. sativa×O. glaberrima cross (NERICA 4). The average DP of clusters from the African rice accessions (ARAs) was marginally larger (DP=83) than in WITA 4 (DP=81). However, regarding average number of chains, clusters from the ARAs represented both the smallest and largest clusters. Overall, the result suggested that the structure of clusters in TOG 12440 was dense with short chains and high degree of branching, whereas the situation was the opposite in NERICA 4. IRGC 103759 and WITA 4 possessed clusters with intermediate characteristics. The commonest type of building blocks in all samples was group 2 (single branched dextrins) representing 40.3-49.4% of the blocks, while groups 3-6 were found in successively lower numbers. The average number of building blocks in the clusters was significantly larger in NERICA 4 (5.8) and WITA 4 (5.7) than in IRGC 103759 and TOG 12440 (5.1 and 5.3, respectively). Copyright © 2016 Elsevier Ltd. All rights reserved.
Hussain, Mahbub; Ahmed, Syed Munaf; Abderrahman, Walid
2008-01-01
A multivariate statistical technique, cluster analysis, was used to assess the logged surface water quality at an irrigation project at Al-Fadhley, Eastern Province, Saudi Arabia. The principal idea behind using the technique was to utilize all available hydrochemical variables in the quality assessment including trace elements and other ions which are not considered in conventional techniques for water quality assessments like Stiff and Piper diagrams. Furthermore, the area belongs to an irrigation project where water contamination associated with the use of fertilizers, insecticides and pesticides is expected. This quality assessment study was carried out on a total of 34 surface/logged water samples. To gain a greater insight in terms of the seasonal variation of water quality, 17 samples were collected from both summer and winter seasons. The collected samples were analyzed for a total of 23 water quality parameters including pH, TDS, conductivity, alkalinity, sulfate, chloride, bicarbonate, nitrate, phosphate, bromide, fluoride, calcium, magnesium, sodium, potassium, arsenic, boron, copper, cobalt, iron, lithium, manganese, molybdenum, nickel, selenium, mercury and zinc. Cluster analysis in both Q and R modes was used. Q-mode analysis resulted in three distinct water types for both the summer and winter seasons. Q-mode analysis also showed the spatial as well as temporal variation in water quality. R-mode cluster analysis led to the conclusion that there are two major sources of contamination for the surface/shallow groundwater in the area: fertilizers, micronutrients, pesticides, and insecticides used in agricultural activities, and non-point natural sources.
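As an illustration of the Q-mode versus R-mode distinction, the sketch below runs hierarchical clustering first on the rows (water samples) and then on the columns (hydrochemical parameters) of a standardized data matrix; the matrix itself is random and only mimics the 34-sample by 23-parameter layout described above.

```python
# Sketch of Q-mode (samples) versus R-mode (variables) hierarchical clustering on a
# standardized hydrochemical data matrix. The matrix here is random; in practice the
# rows would be the 34 water samples and the columns the measured parameters.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.stats import zscore

rng = np.random.default_rng(2)
X = zscore(rng.normal(size=(34, 23)), axis=0)   # samples x parameters, standardized

q_link = linkage(X, method="ward")              # Q-mode: cluster the water samples
r_link = linkage(X.T, method="ward")            # R-mode: cluster the parameters

q_groups = fcluster(q_link, t=3, criterion="maxclust")   # e.g. three water types
r_groups = fcluster(r_link, t=2, criterion="maxclust")   # e.g. two contamination sources
print(np.bincount(q_groups)[1:], np.bincount(r_groups)[1:])
```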
Li, Jinyan; Fong, Simon; Sung, Yunsick; Cho, Kyungeun; Wong, Raymond; Wong, Kelvin K L
2016-01-01
An imbalanced dataset is a training dataset with markedly unequal proportions of data in the interesting and uninteresting classes. In biomedical applications, samples from the class of interest are often rare in a population, for example medical anomalies, positive clinical tests, and particular diseases. Because the target samples in the original dataset are few in number, inducing a classification model over such training data leads to poor prediction performance due to insufficient training on the minority class. In this paper, we use a novel class-balancing method named adaptive swarm cluster-based dynamic multi-objective synthetic minority oversampling technique (ASCB_DmSMOTE) to address this imbalanced dataset problem, which is common in biomedical applications. The proposed method combines under-sampling and over-sampling within a swarm optimisation algorithm and adaptively selects suitable parameters for the rebalancing algorithm to find the best solution. Compared with other versions of the SMOTE algorithm, significant improvements, including higher accuracy and credibility, are observed with ASCB_DmSMOTE. The proposed method combines two rebalancing techniques: it under-samples the majority class judiciously and dynamically optimises the two parameters of SMOTE to synthesise a reasonable number of minority-class samples for each clustered sub-imbalanced dataset. It ultimately outperforms conventional methods and attains higher credibility together with greater accuracy of the classification model.
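The snippet below is a simplified illustration of the general cluster-then-oversample idea, not the paper's ASCB_DmSMOTE: the minority class is partitioned with k-means and SMOTE-style synthetic points are interpolated between within-cluster nearest neighbours. All sizes and parameters are arbitrary assumptions.

```python
# Simplified illustration (not ASCB_DmSMOTE): cluster the minority class, then
# synthesize SMOTE-style points inside each sub-cluster by interpolating between a
# minority sample and one of its within-cluster nearest neighbours.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors

def cluster_smote(X_min, n_clusters=3, n_new=100, k=3, seed=0):
    rng = np.random.default_rng(seed)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X_min)
    valid = [c for c in range(n_clusters) if np.sum(labels == c) >= 2]
    synthetic = []
    for _ in range(n_new):
        members = X_min[labels == rng.choice(valid)]
        nn = NearestNeighbors(n_neighbors=min(k + 1, len(members))).fit(members)
        i = rng.integers(len(members))
        neighbours = nn.kneighbors(members[i:i + 1], return_distance=False)[0][1:]
        j = rng.choice(neighbours)
        synthetic.append(members[i] + rng.random() * (members[j] - members[i]))
    return np.vstack(synthetic)

rng = np.random.default_rng(3)
X_minority = rng.normal(size=(40, 5))        # scarce class-of-interest samples (synthetic)
print(cluster_smote(X_minority).shape)       # (100, 5) synthetic minority samples
```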
NASA Astrophysics Data System (ADS)
Dekkers, M. J.; Heslop, D.; Herrero-Bervera, E.; Acton, G.; Krasa, D.
2014-12-01
Ocean Drilling Program (ODP)/Integrated ODP (IODP) Hole 1256D (6°44.1' N, 91°56.1' W) on the Cocos Plate occurs in 15.2 Ma oceanic crust generated by superfast seafloor spreading. Presently, it is the only drill hole that has sampled all three oceanic crust layers in a tectonically undisturbed setting. Here we interpret down-hole trends in several rock-magnetic parameters with fuzzy c-means cluster analysis, a multivariate statistical technique. The parameters include the magnetization ratio, the coercivity ratio, the coercive force, the low-field susceptibility, and the Curie temperature. By their combined, multivariate, analysis the effects of magmatic and hydrothermal processes can be evaluated. The optimal number of clusters - a key point in the analysis because there is no a priori information on this - was determined through a combination of approaches: by calculation of several cluster validity indices, by testing for coherent cluster distributions on non-linear-map plots, and importantly by testing for stability of the cluster solution from all possible starting points. Here, we consider a solution robust if the cluster allocation is independent of the starting configuration. The five-cluster solution appeared to be robust. Three clusters are distinguished in the extrusive segment of the Hole that express increasing hydrothermal alteration of the lavas. The sheeted dike and gabbro portions are characterized by two clusters, both with higher coercivities than in lava samples. Extensive alteration, however, can obliterate magnetic property differences between lavas, dikes, and gabbros. The imprint of thermochemical alteration on the iron-titanium oxides is only partially related to the porosity of the rocks. All clusters display rock magnetic characteristics in line with a stable NRM. This implies that the entire sampled sequence of ocean crust can contribute to marine magnetic anomalies. Determination of the absolute paleointensity with thermal techniques is not straightforward because of the propensity of oxyexsolution during laboratory heating and/or the presence of intergrowths. The upper part of the extrusive sequence, the granoblastic portion of the dikes, and moderately altered gabbros may contain a comparatively uncontaminated thermoremanent magnetization.
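For readers unfamiliar with the method, a minimal numpy implementation of the fuzzy c-means step is sketched below; each sample receives a membership in every cluster rather than a hard label. The input matrix is random and merely stands in for the standardized down-hole rock-magnetic parameters.

```python
# Minimal numpy fuzzy c-means, as a sketch of the clustering step: every sample gets a
# membership weight in every cluster. Input here is random; in the study it would be
# the standardized rock-magnetic parameters measured down-hole.
import numpy as np

def fuzzy_cmeans(X, c=5, m=2.0, n_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)           # random initial memberships
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]        # weighted cluster centres
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U = 1.0 / (d ** (2 / (m - 1)))
        U /= U.sum(axis=1, keepdims=True)       # standard FCM membership update
    return centers, U

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 5))                   # e.g. five magnetic parameters per sample
centers, U = fuzzy_cmeans(X, c=5)
print(centers.shape, U.argmax(axis=1)[:10])     # hard assignment = largest membership
```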
Chandramouli, Balasubramanian; Mancini, Giordano
2016-01-01
Classical Molecular Dynamics (MD) simulations can provide insights at the nanoscopic scale into protein dynamics. Currently, simulations of large proteins and complexes can be routinely carried out in the ns-μs time regime. Clustering of MD trajectories is often performed to identify selective conformations and to compare simulation and experimental data coming from different sources on closely related systems. However, clustering techniques are usually applied without a careful validation of results, and benchmark studies involving the application of different algorithms to MD data often deal with relatively small peptides instead of average-sized or large proteins; finally, clustering is often applied as a means to analyze refined data and also as a way to simplify further analysis of trajectories. Herein, we propose a strategy to classify MD data while carefully benchmarking the performance of clustering algorithms and internal validation criteria for such methods. We demonstrate the method on two showcase systems with different features, and compare the classification of trajectories in real and PCA space. We posit that the prototype procedure adopted here could be highly fruitful in clustering large trajectories of multiple systems or those resulting from enhanced sampling techniques such as replica exchange simulations. Copyright: © 2016 by Fabrizio Serra editore, Pisa · Roma.
Pennings, Stephanie M; Finn, Joseph; Houtsma, Claire; Green, Bradley A; Anestis, Michael D
2017-10-01
Prior studies examining posttraumatic stress disorder (PTSD) symptom clusters and the components of the interpersonal theory of suicide (ITS) have yielded mixed results, likely stemming in part from the use of divergent samples and measurement techniques. This study aimed to expand on these findings by utilizing a large military sample, gold standard ITS measures, and multiple PTSD factor structures. Utilizing a sample of 935 military personnel, hierarchical multiple regression analyses were used to test the association between PTSD symptom clusters and the ITS variables. Additionally, we tested for indirect effects of PTSD symptom clusters on suicidal ideation through thwarted belongingness, conditional on levels of perceived burdensomeness. Results indicated that numbing symptoms are positively associated with both perceived burdensomeness and thwarted belongingness and hyperarousal symptoms (dysphoric arousal in the 5-factor model) are positively associated with thwarted belongingness. Results also indicated that hyperarousal symptoms (anxious arousal in the 5-factor model) were positively associated with fearlessness about death. The positive association between PTSD symptom clusters and suicidal ideation was inconsistent and modest, with mixed support for the ITS model. Overall, these results provide further clarity regarding the association between specific PTSD symptom clusters and suicide risk factors. © 2016 The American Association of Suicidology.
Cosmology with XMM galaxy clusters: the X-CLASS/GROND catalogue and photometric redshifts
NASA Astrophysics Data System (ADS)
Ridl, J.; Clerc, N.; Sadibekova, T.; Faccioli, L.; Pacaud, F.; Greiner, J.; Krühler, T.; Rau, A.; Salvato, M.; Menzel, M.-L.; Steinle, H.; Wiseman, P.; Nandra, K.; Sanders, J.
2017-06-01
The XMM Cluster Archive Super Survey (X-CLASS) is a serendipitously detected X-ray-selected sample of 845 galaxy clusters based on 2774 XMM archival observations and covering an approximately 90 deg^2 spread across the high-Galactic latitude (|b| > 20°) sky. The primary goal of this survey is to produce a well-selected sample of galaxy clusters on which cosmological analyses can be performed. This paper presents the photometric redshift follow-up of a high signal-to-noise ratio subset of 265 of these clusters with declination δ < +20° with Gamma-Ray Burst Optical and Near-Infrared Detector (GROND), a 7-channel (grizJHK) simultaneous imager on the MPG 2.2-m telescope at the ESO La Silla Observatory. We use a newly developed technique based on the red sequence colour-redshift relation, enhanced with information coming from the X-ray detection to provide photometric redshifts for this sample. We determine photometric redshifts for 232 clusters, finding a median redshift of z = 0.39 with an accuracy of Δz = 0.02(1 + z) when compared to a sample of 76 spectroscopically confirmed clusters. We also compute X-ray luminosities for the entire sample and find a median bolometric luminosity of 7.2 × 10^43 erg s^-1 and a median temperature of 2.9 keV. We compare our results to those of the XMM-XCS and XMM-XXL surveys, finding good agreement in both samples. The X-CLASS catalogue is available online at http://xmm-lss.in2p3.fr:8080/l4sdb/.
NASA Astrophysics Data System (ADS)
Eftekharzadeh, S.; Myers, A. D.; Hennawi, J. F.; Djorgovski, S. G.; Richards, G. T.; Mahabal, A. A.; Graham, M. J.
2017-06-01
We present the most precise estimate to date of the clustering of quasars on very small scales, based on a sample of 47 binary quasars with magnitudes of g < 20.85 and proper transverse separations of ∼25 h^-1 kpc. Our sample of binary quasars, which is about six times larger than any previous spectroscopically confirmed sample on these scales, is targeted using a kernel density estimation (KDE) technique applied to Sloan Digital Sky Survey (SDSS) imaging over most of the SDSS area. Our sample is 'complete' in that all of the KDE target pairs with 17.0 ≲ R ≲ 36.2 h^-1 kpc in our area of interest have been spectroscopically confirmed from a combination of previous surveys and our own long-slit observational campaign. We catalogue 230 candidate quasar pairs with angular separations of <8 arcsec, from which our binary quasars were identified. We determine the projected correlation function of quasars (W̄_p) in four bins of proper transverse scale over the range 17.0 ≲ R ≲ 36.2 h^-1 kpc. The implied small-scale quasar clustering amplitude from the projected correlation function, integrated across our entire redshift range, is A = 24.1 ± 3.6 at ∼26.6 h^-1 kpc. Our sample is the first spectroscopically confirmed sample of quasar pairs that is sufficiently large to study how quasar clustering evolves with redshift at ∼25 h^-1 kpc. We find that empirical descriptions of how quasar clustering evolves with redshift at ∼25 h^-1 Mpc also adequately describe the evolution of quasar clustering at ∼25 h^-1 kpc.
Finite temperature properties of clusters by replica exchange metadynamics: the water nonamer.
Zhai, Yingteng; Laio, Alessandro; Tosatti, Erio; Gong, Xin-Gao
2011-03-02
We introduce an approach for the accurate calculation of thermal properties of classical nanoclusters. On the basis of a recently developed enhanced sampling technique, replica exchange metadynamics, the method yields the true free energy of each relevant cluster structure, directly sampling its basin and measuring its occupancy in full equilibrium. All entropy sources, whether vibrational, rotational anharmonic, or especially configurational, the latter often forgotten in many cluster studies, are automatically included. For the present demonstration, we choose the water nonamer (H2O)9, an extremely simple cluster, which nonetheless displays a sufficient complexity and interesting physics in its relevant structure spectrum. Within a standard TIP4P potential description of water, we find that the nonamer second relevant structure possesses a higher configurational entropy than the first, so that the two free energies surprisingly cross for increasing temperature.
Finite Temperature Properties of Clusters by Replica Exchange Metadynamics: The Water Nonamer
NASA Astrophysics Data System (ADS)
Zhai, Yingteng; Laio, Alessandro; Tosatti, Erio; Gong, Xingao
2012-02-01
We introduce an approach for the accurate calculation of thermal properties of classical nanoclusters. Based on a recently developed enhanced sampling technique, replica exchange metadynamics, the method yields the true free energy of each relevant cluster structure, directly sampling its basin and measuring its occupancy in full equilibrium. All entropy sources, whether vibrational, rotational anharmonic and especially configurational -- the latter often forgotten in many cluster studies -- are automatically included. For the present demonstration we choose the water nonamer (H2O)9, an extremely simple cluster which nonetheless displays a sufficient complexity and interesting physics in its relevant structure spectrum. Within a standard TIP4P potential description of water, we find that the nonamer second relevant structure possesses a higher configurational entropy than the first, so that the two free energies surprisingly cross for increasing temperature.
Pearson's chi-square test and rank correlation inferences for clustered data.
Shih, Joanna H; Fay, Michael P
2017-09-01
Pearson's chi-square test has been widely used in testing for association between two categorical responses. Spearman rank correlation and Kendall's tau are often used for measuring and testing association between two continuous or ordered categorical responses. However, the established statistical properties of these tests are only valid when each pair of responses are independent, where each sampling unit has only one pair of responses. When each sampling unit consists of a cluster of paired responses, the assumption of independent pairs is violated. In this article, we apply the within-cluster resampling technique to U-statistics to form new tests and rank-based correlation estimators for possibly tied clustered data. We develop large sample properties of the new proposed tests and estimators and evaluate their performance by simulations. The proposed methods are applied to a data set collected from a PET/CT imaging study for illustration. Published 2017. This article is a U.S. Government work and is in the public domain in the USA.
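A rough sketch of the within-cluster resampling idea applied to Pearson's chi-square: repeatedly draw one pair of responses per cluster (so the pairs within a resample are independent), compute the statistic on the resampled table, and average over resamples. The simulated data and the omission of the resampling variance are simplifications; the paper's U-statistic machinery is not reproduced here.

```python
# Sketch of within-cluster resampling for a chi-square test of association on
# clustered categorical pairs. Data are simulated; real inference also needs the
# resampling variance, which this sketch omits.
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(5)
n_clusters, max_size = 200, 6
clusters = [rng.integers(0, 2, size=(rng.integers(1, max_size + 1), 2))
            for _ in range(n_clusters)]         # each cluster: several (row, col) pairs

stats = []
for _ in range(500):                            # number of resamples
    sample = np.array([c[rng.integers(len(c))] for c in clusters])   # one pair per cluster
    table = np.zeros((2, 2))
    for r, s in sample:
        table[r, s] += 1
    stats.append(chi2_contingency(table, correction=False)[0])

print("averaged chi-square statistic:", np.mean(stats))
```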
On the Analysis of Case-Control Studies in Cluster-correlated Data Settings.
Haneuse, Sebastien; Rivera-Rodriguez, Claudia
2018-01-01
In resource-limited settings, long-term evaluation of national antiretroviral treatment (ART) programs often relies on aggregated data, the analysis of which may be subject to ecological bias. As researchers and policy makers consider evaluating individual-level outcomes such as treatment adherence or mortality, the well-known case-control design is appealing in that it provides efficiency gains over random sampling. In the context that motivates this article, valid estimation and inference requires acknowledging any clustering, although, to our knowledge, no statistical methods have been published for the analysis of case-control data for which the underlying population exhibits clustering. Furthermore, in the specific context of an ongoing collaboration in Malawi, rather than performing case-control sampling across all clinics, case-control sampling within clinics has been suggested as a more practical strategy. To our knowledge, although similar outcome-dependent sampling schemes have been described in the literature, a case-control design specific to correlated data settings is new. In this article, we describe this design, discuss balanced versus unbalanced sampling techniques, and provide a general approach to analyzing case-control studies in cluster-correlated settings based on inverse probability-weighted generalized estimating equations. Inference is based on a robust sandwich estimator with correlation parameters estimated to ensure appropriate accounting of the outcome-dependent sampling scheme. We conduct comprehensive simulations, based in part on real data on a sample of N = 78,155 program registrants in Malawi between 2005 and 2007, to evaluate small-sample operating characteristics and potential trade-offs associated with standard case-control sampling or when case-control sampling is performed within clusters.
Structural parameters of young star clusters: fractal analysis
NASA Astrophysics Data System (ADS)
Hetem, A.
2017-07-01
A unified view of star formation in the Universe demands detailed and in-depth studies of young star clusters. This work extends our previous study of fractal statistics estimated for a sample of young stellar clusters (Gregorio-Hetem et al. 2015, MNRAS 448, 2504). The structural properties can lead to significant conclusions about the early stages of cluster formation: (1) virial conditions can be used to distinguish warm collapse; (2) bound or unbound behaviour can lead to conclusions about expansion; and (3) fractal statistics are correlated with dynamical evolution and age. The most common approach to error-bar estimation in the literature is to adopt inferential methods (such as the bootstrap) to estimate deviation and variance, which are valid only for an artificially generated cluster. In this paper, we expanded the number of studied clusters in order to improve the investigation of cluster properties and dynamical evolution. The structural parameters were compared with fractal statistics and reveal that the clusters' radial density profiles show a tendency for the mean separation of the stars to increase with the average surface density. The sample can be divided into two groups showing different dynamic behaviour, but they share the same dynamical evolution: the entire sample consists of expanding objects whose substructures do not seem to have been completely erased. These results are in agreement with simulations adopting low surface densities and supervirial conditions.
Testing Gravity and Cosmic Acceleration with Galaxy Clustering
NASA Astrophysics Data System (ADS)
Kazin, Eyal; Tinker, J.; Sanchez, A. G.; Blanton, M.
2012-01-01
The large-scale structure contains vast amounts of cosmological information that can help understand the accelerating nature of the Universe and test gravity on large scales. Ongoing and future sky surveys are designed to test these using various techniques applied on clustering measurements of galaxies. We present redshift distortion measurements of the Sloan Digital Sky Survey II Luminous Red Galaxy sample. We find that when combining the normalized quadrupole Q with the projected correlation function wp(rp) along with cluster counts (Rapetti et al. 2010), results are consistent with General Relativity. The advantage of combining Q and wp is the addition of the bias information, when using the Halo Occupation Distribution framework. We also present improvements to the standard technique of measuring Hubble expansion rates H(z) and angular diameter distances DA(z) when using the baryonic acoustic feature as a standard ruler. We introduce clustering wedges as an alternative basis to the multipole expansion and show that it yields similar constraints. This alternative basis serves as a useful technique to test for systematics, and ultimately improve measurements of the cosmic acceleration.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stello, Dennis; Huber, Daniel; Bedding, Timothy R.
Studying star clusters offers significant advances in stellar astrophysics due to the combined power of having many stars with essentially the same distance, age, and initial composition. This makes clusters excellent test benches for verification of stellar evolution theory. To fully exploit this potential, it is vital that the star sample is uncontaminated by stars that are not members of the cluster. Techniques for determining cluster membership therefore play a key role in the investigation of clusters. We present results on three clusters in the Kepler field of view based on a newly established technique that uses asteroseismology to identify fore- or background stars in the field, which demonstrates advantages over classical methods such as kinematic and photometry measurements. Four previously identified seismic non-members in NGC 6819 are confirmed in this study, and three additional non-members are found: two in NGC 6819 and one in NGC 6791. We further highlight which stars are, or might be, affected by blending, which needs to be taken into account when analyzing these Kepler data.
NASA Astrophysics Data System (ADS)
Brusa, Roberto S.; Karwasz, Grzegorz P.; Tiengo, Nadia; Zecca, Antonio; Corni, Federico; Tonini, Rita; Ottaviani, Gianpiero
2000-04-01
The depth profile of open volume defects has been measured in Si implanted with He at an energy of 20 keV, by means of a slow-positron beam and the Doppler broadening technique. The evolution of defect distributions has been studied as a function of isochronal annealing in two series of samples implanted at the fluence of 5×10^15 and 2×10^16 He cm^-2. A fitting procedure has been applied to the experimental data to extract a positron parameter characterizing each open volume defect. The defects have been identified by comparing this parameter with recent theoretical calculations. In as-implanted samples the major part of vacancies and divacancies produced by implantation is passivated by the presence of He. The mean depth of defects as seen by the positron annihilation technique is about five times less than the helium projected range. During the successive isochronal annealing the number of positron traps decreases, then increases and finally, at the highest annealing temperatures, disappears only in the samples implanted at the lowest fluence. A minimum of open volume defects is reached at the annealing temperature of 250 °C in both series. The increase of open volume defects at temperatures higher than 250 °C is due to the appearance of vacancy clusters of increasing size, with a mean depth distribution that moves towards the He projected range. The appearance of vacancy clusters is strictly related to the out-diffusion of He. In the samples implanted at 5×10^15 cm^-2 the vacancy clusters are mainly four vacancy agglomerates stabilized by He related defects. They disappear starting from an annealing temperature of 700 °C. In the samples implanted at 2×10^16 cm^-2 and annealed at 850-900 °C the vacancy clusters disappear and only a distribution of cavities centered around the He projected range remains. The role of vacancies in the formation of He clusters, which evolve in bubbles and then in cavities, is discussed.
NASA Astrophysics Data System (ADS)
Kobrina, Yevgeniya; Isaksson, Hanna; Sinisaari, Miikka; Rieppo, Lassi; Brama, Pieter A.; van Weeren, René; Helminen, Heikki J.; Jurvelin, Jukka S.; Saarakkala, Simo
2010-11-01
The collagen phase in bone is known to undergo major changes during growth and maturation. The objective of this study is to clarify whether Fourier transform infrared (FTIR) microspectroscopy, coupled with cluster analysis, can detect quantitative and qualitative changes in the collagen matrix of subchondral bone in horses during maturation and growth. Equine subchondral bone samples (n = 29) from the proximal joint surface of the first phalanx are prepared from two sites subjected to different loading conditions. Three age groups are studied: newborn (0 days old), immature (5 to 11 months old), and adult (6 to 10 years old) horses. Spatial collagen content and collagen cross-link ratio are quantified from the spectra. Additionally, normalized second derivative spectra of samples are clustered using the k-means clustering algorithm. In quantitative analysis, collagen content in the subchondral bone increases rapidly between the newborn and immature horses. The collagen cross-link ratio increases significantly with age. In qualitative analysis, clustering is able to separate newborn and adult samples into two different groups. The immature samples display some nonhomogeneity. In conclusion, this is the first study showing that FTIR spectral imaging combined with clustering techniques can detect quantitative and qualitative changes in the collagen matrix of subchondral bone during growth and maturation.
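A brief sketch of the qualitative step, under the assumption of synthetic spectra: second-derivative spectra are computed with a Savitzky-Golay filter, vector-normalized, and partitioned with k-means.

```python
# Sketch of the qualitative step: vector-normalized second-derivative spectra are
# computed with a Savitzky-Golay filter and partitioned with k-means. Spectra are
# synthetic; in the study they are FTIR spectra of subchondral bone from three age groups.
import numpy as np
from scipy.signal import savgol_filter
from sklearn.cluster import KMeans

rng = np.random.default_rng(6)
wn = np.linspace(800, 1800, 500)                        # wavenumber axis, cm^-1
peak = np.where(np.arange(29) < 15, 1655, 1680)         # two synthetic "tissue states"
spectra = np.exp(-((wn[None, :] - peak[:, None]) / 40) ** 2) \
          + 0.01 * rng.normal(size=(29, 500))

d2 = savgol_filter(spectra, window_length=15, polyorder=3, deriv=2, axis=1)
d2 /= np.linalg.norm(d2, axis=1, keepdims=True)         # vector normalization

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(d2)
print(labels)                                           # k-means groups of the 29 spectra
```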
Probability of coincidental similarity among the orbits of small bodies - I. Pairing
NASA Astrophysics Data System (ADS)
Jopek, Tadeusz Jan; Bronikowska, Małgorzata
2017-09-01
The probability of coincidental clustering among the orbits of comets, asteroids and meteoroids depends on many factors, such as the size of the orbital sample searched for clusters and the size of the identified group; it is different for groups of 2, 3, 4, … members. The probability of coincidental clustering is assessed by numerical simulation; therefore, it also depends on the method used to generate the synthetic orbits. We have tested the impact of some of these factors. For a given size of the orbital sample, we have assessed the probability of random pairing among several orbital populations of different sizes and found how these probabilities vary with the size of the orbital samples. Finally, keeping the size of the orbital sample fixed, we have shown that the probability of random pairing can be significantly different for orbital samples obtained by different observation techniques. For the user's convenience, we have also obtained several formulae which, for a given size of the orbital sample, can be used to calculate the similarity threshold corresponding to a small probability of coincidental similarity between two orbits.
Statistical Analysis of Large Scale Structure by the Discrete Wavelet Transform
NASA Astrophysics Data System (ADS)
Pando, Jesus
1997-10-01
The discrete wavelet transform (DWT) is developed as a general statistical tool for the study of large scale structures (LSS) in astrophysics. The DWT is used in all aspects of structure identification including cluster analysis, spectrum and two-point correlation studies, scale-scale correlation analysis and measuring deviations from Gaussian behavior. The techniques developed are demonstrated on 'academic' signals, on simulated models of the Lyman-α (Lyα) forests, and on observational data of the Lyα forests. This technique can detect clustering in the Lyα clouds where traditional techniques such as the two-point correlation function have failed. The position and strength of these clusters in both real and simulated data are determined, and it is shown that clusters exist on scales as large as at least 20 h^-1 Mpc at significance levels of 2-4σ. Furthermore, it is found that the strength distribution of the clusters can be used to distinguish between real data and simulated samples even where other traditional methods have failed to detect differences. Second, a method for measuring the power spectrum of a density field using the DWT is developed. All common features determined by the usual Fourier power spectrum can be calculated by the DWT. These features, such as the index of a power law or typical scales, can be detected even when the samples are geometrically complex, the samples are incomplete, or the mean density on larger scales is not known (the infrared uncertainty). Using this method, the spectra of Lyα forests in both simulated and real samples are calculated. Third, a method for measuring hierarchical clustering is introduced. Because hierarchical evolution is characterized by a set of rules for how larger dark matter halos are formed by the merging of smaller halos, scale-scale correlations of the density field should be one of the most sensitive quantities in determining the merging history. We show that these correlations can be completely determined by the correlations C_{j,j+1} between discrete wavelet coefficients on adjacent scales and at nearly the same spatial position. Scale-scale correlations are computed for two samples of QSO Lyα forest absorption spectra. Lastly, higher order statistics are developed to detect deviations from Gaussian behavior. These higher order statistics are necessary to fully characterize the Lyα forests because the usual 2nd order statistics, such as the two-point correlation function or power spectrum, give inconclusive results. It is shown how this technique takes advantage of the locality of the DWT to circumvent the central limit theorem. A non-Gaussian spectrum is defined, and this spectrum reveals not only the magnitude but also the scales of non-Gaussianity. When applied to simulated and observational samples of the Lyα clouds, it is found that different popular models of structure formation have different spectra while two independent observational data sets have the same spectra. Moreover, the non-Gaussian spectra of real data sets are significantly different from the spectra of various possible random samples. (Abstract shortened by UMI.)
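One ingredient of the above, the scale-scale correlation between wavelet coefficients on adjacent scales at nearly the same position, can be sketched with PyWavelets as follows. The 1-D "density field" is synthetic, and the normalization follows the usual <w_j^2 w_{j+1}^2>/(<w_j^2><w_{j+1}^2>) form, which may differ in detail from the thesis.

```python
# Sketch of a scale-scale correlation from a discrete wavelet transform (PyWavelets
# assumed): correlate squared coefficients on adjacent scales at nearly the same
# position by repeating the coarser-scale coefficients. The field is synthetic.
import numpy as np
import pywt

rng = np.random.default_rng(7)
field = rng.lognormal(sigma=0.5, size=4096)     # toy 1-D density field

coeffs = pywt.wavedec(field, "db2", level=6)    # [cA6, cD6, cD5, ..., cD1]
details = coeffs[1:]                            # detail coefficients, coarse -> fine

for parent, child in zip(details[:-1], details[1:]):
    p2 = np.repeat(parent ** 2, 2)[: len(child)]        # align coarse scale to fine
    c2 = child[: len(p2)] ** 2
    corr = np.mean(p2 * c2) / (np.mean(p2) * np.mean(c2))
    print(f"scale pair ({len(parent)} -> {len(child)} coeffs): C = {corr:.3f}")
```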
Wedge sampling for computing clustering coefficients and triangle counts on large graphs
Seshadhri, C.; Pinar, Ali; Kolda, Tamara G.
2014-05-08
Graphs are used to model interactions in a variety of contexts, and there is a growing need to quickly assess the structure of such graphs. Some of the most useful graph metrics are based on triangles, such as those measuring social cohesion. Despite the importance of these triadic measures, algorithms to compute them can be extremely expensive. We discuss the method of wedge sampling. This versatile technique allows for the fast and accurate approximation of various types of clustering coefficients and triangle counts. Furthermore, these techniques are extensible to counting directed triangles in digraphs. Our methods come with provable and practical time-approximation tradeoffs for all computations. We provide extensive results that show our methods are orders of magnitude faster than the state of the art, while providing nearly the accuracy of full enumeration.
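A small sketch of uniform wedge sampling for the global clustering coefficient (not the authors' code): wedge centres are drawn with probability proportional to their wedge counts, two distinct neighbours are picked at random, and the closed fraction estimates the transitivity. The Erdős-Rényi graph stands in for a large real graph.

```python
# Sketch of wedge sampling: sample wedge centres with probability proportional to their
# wedge counts, pick two distinct neighbours at random, and report the fraction of
# sampled wedges that are closed (form a triangle).
import numpy as np
import networkx as nx

G = nx.erdos_renyi_graph(2000, 0.01, seed=8)    # stand-in for a large graph
nodes = list(G.nodes())
deg = np.array([G.degree(v) for v in nodes])
wedges = deg * (deg - 1) / 2                    # wedges centred at each vertex
p = wedges / wedges.sum()

rng = np.random.default_rng(8)
n_samples = 20_000
closed = 0
for v in rng.choice(nodes, size=n_samples, p=p):
    u, w = rng.choice(list(G.neighbors(v)), size=2, replace=False)
    closed += G.has_edge(u, w)

print("estimated global clustering coefficient:", closed / n_samples)
print("exact value:", nx.transitivity(G))
```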
Effect of denoising on supervised lung parenchymal clusters
NASA Astrophysics Data System (ADS)
Jayamani, Padmapriya; Raghunath, Sushravya; Rajagopalan, Srinivasan; Karwoski, Ronald A.; Bartholmai, Brian J.; Robb, Richard A.
2012-03-01
Denoising is a critical preconditioning step for quantitative analysis of medical images. Despite promises for more consistent diagnosis, denoising techniques are seldom explored in clinical settings. While this may be attributed to the esoteric nature of the parameter-sensitive algorithms, lack of quantitative measures on their efficacy to enhance the clinical decision making is a primary cause of physician apathy. This paper addresses this issue by exploring the effect of denoising on the integrity of supervised lung parenchymal clusters. Multiple Volumes of Interest (VOIs) were selected across multiple high resolution CT scans to represent samples of different patterns (normal, emphysema, ground glass, honeycombing and reticular). The VOIs were labeled through consensus of four radiologists. The original datasets were filtered by multiple denoising techniques (median filtering, anisotropic diffusion, bilateral filtering and non-local means) and the corresponding filtered VOIs were extracted. A plurality of cluster indices based on multiple histogram-based pair-wise similarity measures were used to assess the quality of supervised clusters in the original and filtered space. The resultant rank orders were analyzed using the Borda criteria to find the denoising-similarity measure combination that has the best cluster quality. Our exhaustive analysis reveals (a) for a number of similarity measures, the cluster quality is inferior in the filtered space; and (b) for measures that benefit from denoising, a simple median filtering outperforms non-local means and bilateral filtering. Our study suggests the need to judiciously choose, if required, a denoising technique that does not deteriorate the integrity of supervised clusters.
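As a toy version of the question being asked, the sketch below generates two synthetic texture classes, median-filters them with scipy.ndimage, and compares silhouette scores of histogram features before and after filtering; the patterns, filter size, and feature choice are all illustrative assumptions rather than the paper's protocol.

```python
# Toy experiment: does a median filter help or hurt the separability of labelled
# texture patches? Two synthetic 2-D texture classes are generated, filtered, and
# histogram-feature silhouette scores are compared. Everything here is illustrative.
import numpy as np
from scipy.ndimage import median_filter
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(9)

def make_patch(label):
    base = np.zeros((32, 32))
    if label == 1:
        base[::4, ::4] = 3.0                    # sparse bright blobs for class 1
    return base + rng.normal(scale=1.0, size=base.shape)

patches = [make_patch(i % 2) for i in range(60)]
labels = np.array([i % 2 for i in range(60)])

def hist_feats(imgs):
    return np.array([np.histogram(p, bins=16, range=(-4, 6), density=True)[0] for p in imgs])

orig = hist_feats(patches)
filt = hist_feats([median_filter(p, size=3) for p in patches])
print("silhouette, original :", round(silhouette_score(orig, labels), 3))
print("silhouette, filtered :", round(silhouette_score(filt, labels), 3))
```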
Luo, Kun; Hu, Xuebin; He, Qiang; Wu, Zhengsong; Cheng, Hao; Hu, Zhenlong; Mazumder, Asit
2017-04-01
Rapid urbanization in China has been causing dramatic deterioration in the water quality of rivers and threatening aquatic ecosystem health. In this paper, multivariate techniques, such as factor analysis (FA) and cluster analysis (CA), were applied to analyze the water quality datasets for 19 rivers in Liangjiang New Area (LJNA), China, collected in April (dry season) and September (wet season) of 2014 and 2015. In most of the sampled rivers, total phosphorus, total nitrogen, and fecal coliform exceeded the Class V guideline (GB3838-2002), which could thereby threaten the water quality in the Yangtze and Jialing Rivers. FA identified five groups of water quality variables, which together explain the majority of the variation in the experimental data. Nutrient pollution, seasonal changes, and construction activities were three key factors influencing rivers' water quality in LJNA. CA grouped the 19 sampling sites into two clusters, which are located in sub-catchments with high and low levels of urbanization, respectively. One-way ANOVA showed the nutrients (total phosphorus, soluble reactive phosphorus, total nitrogen, ammonium nitrogen, and nitrite), fecal coliform, and conductivity in cluster 1 were significantly greater than in cluster 2. Thus, catchment urbanization degraded rivers' water quality in Liangjiang New Area. Identifying effective buffer zones at the riparian scale to mitigate the negative impacts of catchment urbanization is recommended.
Wang, Chuji; Pan, Yong-Le; James, Deryck; Wetmore, Alan E; Redding, Brandon
2014-04-11
We report a novel atmospheric aerosol characterization technique, in which dual wavelength UV laser induced fluorescence (LIF) spectrometry marries an eight-stage rotating drum impactor (RDI), namely UV-LIF-RDI, to achieve size- and time-resolved analysis of aerosol particles on-strip. The UV-LIF-RDI technique measured LIF spectra via direct laser beam illumination onto the particles that were impacted on a RDI strip with a spatial resolution of 1.2mm, equivalent to an averaged time resolution in the aerosol sampling of 3.6 h. Excited by a 263 nm or 351 nm laser, more than 2000 LIF spectra within a 3-week aerosol collection time period were obtained from the eight individual RDI strips that collected particles in eight different sizes ranging from 0.09 to 10 μm in Djibouti. Based on the known fluorescence database from atmospheric aerosols in the US, the LIF spectra obtained from the Djibouti aerosol samples were found to be dominated by fluorescence clusters 2, 5, and 8 (peaked at 330, 370, and 475 nm) when excited at 263 nm and by fluorescence clusters 1, 2, 5, and 6 (peaked at 390 and 460 nm) when excited at 351 nm. Size- and time-dependent variations of the fluorescence spectra revealed some size and time evolution behavior of organic and biological aerosols from the atmosphere in Djibouti. Moreover, this analytical technique could locate the possible sources and chemical compositions contributing to these fluorescence clusters. Advantages, limitations, and future developments of this new aerosol analysis technique are also discussed. Published by Elsevier B.V.
Groundwater quality assessment of urban Bengaluru using multivariate statistical techniques
NASA Astrophysics Data System (ADS)
Gulgundi, Mohammad Shahid; Shetty, Amba
2018-03-01
Groundwater quality deterioration due to anthropogenic activities has become a subject of prime concern. The objective of the study was to assess the spatial and temporal variations in groundwater quality and to identify the sources in the western half of Bengaluru city using multivariate statistical techniques. A water quality index rating was calculated for the pre- and post-monsoon seasons to quantify overall water quality for human consumption. The post-monsoon samples show poorer quality for drinking purposes than the pre-monsoon samples. Cluster analysis (CA), principal component analysis (PCA) and discriminant analysis (DA) were applied to the groundwater quality data measured on 14 parameters from 67 sites distributed across the city. Hierarchical cluster analysis (CA) grouped the 67 sampling stations into two groups, cluster 1 having high pollution and cluster 2 having lesser pollution. Discriminant analysis (DA) was applied to delineate the most meaningful parameters accounting for temporal and spatial variations in groundwater quality of the study area. Temporal DA identified pH as the most important parameter, which discriminates between water quality in the pre-monsoon and post-monsoon seasons and accounts for 72% seasonal assignation of cases. Spatial DA identified Mg, Cl and NO3 as the three most important parameters discriminating between the two clusters and accounting for 89% spatial assignation of cases. Principal component analysis was applied to the dataset obtained from the two clusters, which yielded three factors in each cluster, explaining 85.4% and 84% of the total variance, respectively. Varifactors obtained from principal component analysis showed that groundwater quality variation is mainly explained by dissolution of minerals from rock-water interactions in the aquifer, the effect of anthropogenic activities, and ion exchange processes in water.
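A schematic of the CA/PCA/DA workflow on a hypothetical 67-site by 14-parameter table (random data, invented column names): standardize, group sites by Ward clustering, identify the most discriminating parameters with linear discriminant analysis, and summarize within-cluster variance with PCA.

```python
# Sketch of the multivariate workflow on a hypothetical groundwater table. Data are
# random; column names mimic (but are not) the study's 14 measured parameters.
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(10)
params = ["pH", "EC", "TDS", "Ca", "Mg", "Na", "K", "Cl", "SO4", "NO3", "HCO3", "F", "TH", "Turbidity"]
data = pd.DataFrame(rng.normal(size=(67, len(params))), columns=params)

Z = StandardScaler().fit_transform(data)
clusters = fcluster(linkage(Z, method="ward"), t=2, criterion="maxclust")   # CA: two site groups

lda = LinearDiscriminantAnalysis().fit(Z, clusters)                         # DA step
loadings = pd.Series(np.abs(lda.coef_[0]), index=params).sort_values(ascending=False)
print("parameters most discriminating between clusters:\n", loadings.head(3))

pca = PCA(n_components=3).fit(Z[clusters == 1])                             # PCA within cluster 1
print("variance explained in cluster 1:", pca.explained_variance_ratio_.round(2))
```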
NASA Astrophysics Data System (ADS)
Sams, Michael; Silye, Rene; Göhring, Janett; Muresan, Leila; Schilcher, Kurt; Jacak, Jaroslaw
2014-01-01
We present a cluster spatial analysis method using nanoscopic dSTORM images to determine changes in protein cluster distributions within brain tissue. Such methods are suitable to investigate human brain tissue and will help to achieve a deeper understanding of brain disease along with aiding drug development. Human brain tissue samples are usually treated postmortem via standard fixation protocols, which are established in clinical laboratories. Therefore, our localization microscopy-based method was adapted to characterize protein density and protein cluster localization in samples fixed using different protocols followed by common fluorescent immunohistochemistry techniques. The localization microscopy allows nanoscopic mapping of serotonin 5-HT1A receptor groups within a two-dimensional image of a brain tissue slice. These nanoscopically mapped proteins can be confined to clusters by applying the proposed statistical spatial analysis. Selected features of such clusters were subsequently used to characterize and classify the tissue. Samples were obtained from different types of patients, fixed with different preparation methods, and finally stored in a human tissue bank. To verify the proposed method, samples of a cryopreserved healthy brain have been compared with epitope-retrieved and paraffin-fixed tissues. Furthermore, samples of healthy brain tissues were compared with data obtained from patients suffering from mental illnesses (e.g., major depressive disorder). Our work demonstrates the applicability of localization microscopy and image analysis methods for comparison and classification of human brain tissues at a nanoscopic level. Furthermore, the presented workflow marks a unique technological advance in the characterization of protein distributions in brain tissue sections.
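One common way to confine localizations to clusters, shown here only as an assumed stand-in for the paper's spatial statistic, is density-based clustering of the 2-D coordinates with DBSCAN, followed by per-cluster descriptors such as localization count and radius of gyration.

```python
# Sketch: DBSCAN on synthetic 2-D dSTORM coordinates, then per-cluster features.
# Cluster positions, point counts, eps and min_samples are all illustrative choices.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(11)
centers = rng.uniform(0, 5000, size=(30, 2))                    # nm, synthetic receptor clusters
points = np.vstack([c + rng.normal(scale=25, size=(rng.integers(10, 60), 2)) for c in centers])
noise = rng.uniform(0, 5000, size=(500, 2))                     # spurious localizations
xy = np.vstack([points, noise])

labels = DBSCAN(eps=50, min_samples=8).fit_predict(xy)          # eps in nm, illustrative

for lab in np.unique(labels[labels >= 0])[:5]:
    pts = xy[labels == lab]
    rg = np.sqrt(np.mean(np.sum((pts - pts.mean(axis=0)) ** 2, axis=1)))
    print(f"cluster {lab}: {len(pts)} localizations, radius of gyration {rg:.1f} nm")
print("noise points:", np.sum(labels == -1))
```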
CHEERS: The chemical evolution RGS sample
NASA Astrophysics Data System (ADS)
de Plaa, J.; Kaastra, J. S.; Werner, N.; Pinto, C.; Kosec, P.; Zhang, Y.-Y.; Mernier, F.; Lovisari, L.; Akamatsu, H.; Schellenberger, G.; Hofmann, F.; Reiprich, T. H.; Finoguenov, A.; Ahoranta, J.; Sanders, J. S.; Fabian, A. C.; Pols, O.; Simionescu, A.; Vink, J.; Böhringer, H.
2017-11-01
Context. The chemical yields of supernovae and the metal enrichment of the intra-cluster medium (ICM) are not well understood. The hot gas in clusters of galaxies has been enriched with metals originating from billions of supernovae and provides a fair sample of large-scale metal enrichment in the Universe. High-resolution X-ray spectra of clusters of galaxies provide a unique way of measuring abundances in the hot intracluster medium (ICM). The abundance measurements can provide constraints on the supernova explosion mechanism and the initial-mass function of the stellar population. This paper introduces the CHEmical Enrichment RGS Sample (CHEERS), which is a sample of 44 bright local giant ellipticals, groups, and clusters of galaxies observed with XMM-Newton. Aims: The CHEERS project aims to provide the most accurate set of cluster abundances measured in X-rays using this sample. This paper focuses specifically on the abundance measurements of O and Fe using the reflection grating spectrometer (RGS) on board XMM-Newton. We aim to thoroughly discuss the cluster to cluster abundance variations and the robustness of the measurements. Methods: We have selected the CHEERS sample such that the oxygen abundance in each cluster is detected at a level of at least 5σ in the RGS. The dispersive nature of the RGS limits the sample to clusters with sharp surface brightness peaks. The deep exposures and the size of the sample allow us to quantify the intrinsic scatter and the systematic uncertainties in the abundances using spectral modeling techniques. Results: We report the oxygen and iron abundances as measured with RGS in the core regions of all 44 clusters in the sample. We do not find a significant trend of O/Fe as a function of cluster temperature, but we do find an intrinsic scatter in the O and Fe abundances from cluster to cluster. The level of systematic uncertainties in the O/Fe ratio is estimated to be around 20-30%, while the systematic uncertainties in the absolute O and Fe abundances can be as high as 50% in extreme cases. Thanks to the high statistics of the observations, we were able to identify and correct a systematic bias in the oxygen abundance determination that was due to an inaccuracy in the spectral model. Conclusions: The lack of dependence of O/Fe on temperature suggests that the enrichment of the ICM does not depend on cluster mass and that most of the enrichment likely took place before the ICM was formed. We find that the observed scatter in the O/Fe ratio is due to a combination of intrinsic scatter in the source and systematic uncertainties in the spectral fitting, which we are unable to separate. The astrophysical source of intrinsic scatter could be due to differences in active galactic nucleus activity and ongoing star formation in the brightest cluster galaxy. The systematic scatter is due to uncertainties in the spatial line broadening, absorption column, multi-temperature structure, and the thermal plasma models.
Patterns of victimization between and within peer clusters in a high school social network.
Swartz, Kristin; Reyns, Bradford W; Wilcox, Pamela; Dunham, Jessica R
2012-01-01
This study presents a descriptive analysis of patterns of violent victimization between and within the various cohesive clusters of peers comprising a sample of more than 500 9th-12th grade students from one high school. Social network analysis techniques provide a visualization of the overall friendship network structure and allow for the examination of variation in victimization across the various peer clusters within the larger network. Social relationships among clusters with varying levels of victimization are also illustrated so as to provide a sense of possible spatial clustering or diffusion of victimization across proximal peer clusters. Additionally, to provide a sense of the sorts of peer clusters that support (or do not support) victimization, characteristics of clusters at both the high and low ends of the victimization scale are discussed. Finally, several of the peer clusters at both the high and low ends of the victimization continuum are "unpacked", allowing examination of within-network individual-level differences in victimization for these select clusters.
Oxygen diffusion in alpha-Al2O3. Ph.D. Thesis
NASA Technical Reports Server (NTRS)
Cawley, J. D.; Halloran, J. W.; Cooper, A. R.
1984-01-01
Oxygen self diffusion coefficients were determined in single crystal alpha-Al2O3 using the gas exchange technique. The samples were semi-infinite slabs cut from five different boules with varying background impurities. The diffusion direction was parallel to the c-axis. The tracer profiles were determined by two techniques, single spectrum proton activation and secondary ion mass spectrometry. The SIMS proved to be a more useful tool. The determined diffusion coefficients, which were insensitive to impurity levels and oxygen partial pressure, could be described by D = 0.00151 exp(−572 kJ mol^-1/RT) m^2/s. The insensitivities are discussed in terms of point defect clustering. Two independent models are consistent with the findings, the first considers the clusters as immobile point defect traps which buffer changes in the defect chemistry. The second considers clusters to be mobile and oxygen diffusion to be intrinsic behavior, the mechanism for oxygen transport involving neutral clusters of Schottky quintuplets.
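Evaluating the reported Arrhenius fit at a few temperatures shows the scale of oxygen self-diffusion it implies; the temperatures below are illustrative, and the activation energy is taken to be per mole so that the gas constant R applies.

```python
# Evaluating the reported Arrhenius expression D = 0.00151 * exp(-572 kJ mol^-1 / RT)
# at a few illustrative temperatures.
import numpy as np

R = 8.314                      # J mol^-1 K^-1
D0, Q = 1.51e-3, 572e3         # m^2 s^-1 and J mol^-1, from the abstract

for T in (1600.0, 1700.0, 1800.0):             # K, illustrative temperatures
    D = D0 * np.exp(-Q / (R * T))
    print(f"T = {T:.0f} K: D = {D:.2e} m^2/s")
```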
Roshan, Abdul-Rahman A; Gad, Haidy A; El-Ahmady, Sherweit H; Khanbash, Mohamed S; Abou-Shoer, Mohamed I; Al-Azizi, Mohamed M
2013-08-14
This work describes a simple model developed for the authentication of monofloral Yemeni Sidr honey using UV spectroscopy together with the chemometric techniques of hierarchical cluster analysis (HCA), principal component analysis (PCA), and soft independent modeling of class analogy (SIMCA). The model was constructed using 13 genuine Sidr honey samples and challenged with 25 honey samples of different botanical origins. HCA and PCA were successfully able to present a preliminary clustering pattern to segregate the genuine Sidr samples from the lower-priced local polyfloral and non-Sidr samples. The SIMCA model presented a clear demarcation of the samples and was used to identify genuine Sidr honey samples as well as to detect admixture with lower-priced polyfloral honey at levels above 10%. The constructed model presents a simple and efficient method of analysis and may serve as a basis for the authentication of other honey types worldwide.
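The chemometric workflow described here (spectra, then PCA scores, then hierarchical clustering) can be sketched as follows; the spectral matrix, sample counts, and cluster count are placeholders, and the SIMCA class-modelling step is omitted because it has no direct scikit-learn equivalent.

```python
import numpy as np
from sklearn.decomposition import PCA
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical data: rows = honey samples, columns = UV absorbances on a common
# wavelength grid. Shapes and values are illustrative only, not the paper's data.
rng = np.random.default_rng(0)
spectra = rng.normal(size=(38, 201))          # e.g. 13 Sidr + 25 other samples (placeholder)

# Mean-center before PCA, as is usual in spectral chemometrics.
X = spectra - spectra.mean(axis=0)

# PCA: preliminary visualisation of grouping in a low-dimensional score space.
scores = PCA(n_components=3).fit_transform(X)

# HCA on the PCA scores (Ward linkage); two groups standing in for "Sidr" vs "non-Sidr".
groups = fcluster(linkage(scores, method="ward"), t=2, criterion="maxclust")
print(groups)
```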
WINGS-SPE Spectroscopy in the WIde-field Nearby Galaxy-cluster Survey
NASA Astrophysics Data System (ADS)
Cava, A.; Bettoni, D.; Poggianti, B. M.; Couch, W. J.; Moles, M.; Varela, J.; Biviano, A.; D'Onofrio, M.; Dressler, A.; Fasano, G.; Fritz, J.; Kjærgaard, P.; Ramella, M.; Valentinuzzi, T.
2009-03-01
Aims: We present the results from a comprehensive spectroscopic survey of the WINGS (WIde-field Nearby Galaxy-cluster Survey) clusters, a program called WINGS-SPE. The WINGS-SPE sample consists of 48 clusters, 22 of which are in the southern sky and 26 in the north. The main goals of this spectroscopic survey are: (1) to study the dynamics and kinematics of the WINGS clusters and their constituent galaxies, and (2) to explore the link between the spectral properties and the morphological evolution in different density environments and across a wide range of cluster X-ray luminosities and optical properties. Methods: Using multi-object fiber-fed spectrographs, we observed our sample of WINGS cluster galaxies at an intermediate resolution of 6-9 Å and, using a cross-correlation technique, we measured redshifts with a mean accuracy of ~45 km s-1. Results: We present redshift measurements for 6137 galaxies and their first analyses. Details of the spectroscopic observations are reported. The WINGS-SPE sample has ~30% overlap with previously published data sets, allowing us both to perform a complete comparison with the literature and to extend the catalogs. Conclusions: Using our redshifts, we calculate the velocity dispersion for all the clusters in the WINGS-SPE sample. We almost triple the number of member galaxies known in each cluster with respect to previous works. We also investigate the X-ray luminosity vs. velocity dispersion relation for our WINGS-SPE clusters, and find it to be consistent with the form L_X ∝ σ_v^4. Table 4, containing the complete redshift catalog, is only available in electronic form at the CDS via anonymous ftp to cdsarc.u-strasbg.fr (130.79.128.5) or via http://cdsweb.u-strasbg.fr/cgi-bin/qcat?J/A+A/495/707
ERIC Educational Resources Information Center
Zahra, Asma-Tuz; Arif, Manzoor H.; Yousuf, Muhammad Imran
2010-01-01
This study investigated the relationship between self-concept and academic achievement of bachelor degree students. Female students at the bachelor level were considered the target population. A sample of 1500 students was selected by using a two-stage cluster sampling technique. An amended form of the Self-Descriptive Questionnaire developed by Marsh (1985) was used…
Implementation of Structured Inquiry Based Model Learning toward Students' Understanding of Geometry
ERIC Educational Resources Information Center
Salim, Kalbin; Tiawa, Dayang Hjh
2015-01-01
The purpose of this study is the implementation of a structured inquiry learning model in the instruction of geometry. The study used a quasi-experimental design with two sample classes selected from a population of ten classes using a cluster random sampling technique. The data collection tool consists of a test item…
Abramyan, Tigran M; Snyder, James A; Thyparambil, Aby A; Stuart, Steven J; Latour, Robert A
2016-08-05
Clustering methods have been widely used to group together similar conformational states from molecular simulations of biomolecules in solution. For applications such as the interaction of a protein with a surface, the orientation of the protein relative to the surface is also an important clustering parameter because of its potential effect on adsorbed-state bioactivity. This study presents cluster analysis methods that are specifically designed for systems where both molecular orientation and conformation are important, and the methods are demonstrated using test cases of adsorbed proteins for validation. Additionally, because cluster analysis can be a very subjective process, an objective procedure for identifying both the optimal number of clusters and the best clustering algorithm to be applied to analyze a given dataset is presented. The method is demonstrated for several agglomerative hierarchical clustering algorithms used in conjunction with three cluster validation techniques. © 2016 Wiley Periodicals, Inc.
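An objective selection of both the clustering algorithm and the number of clusters can be sketched as a simple grid search scored by a validity index; the feature matrix, candidate linkages, cluster-count range, and the use of the silhouette index alone (the paper uses three validation techniques) are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score

# Hypothetical feature matrix: each row is one simulation frame described by
# orientation + conformation descriptors (random placeholder data).
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 6))

best = None
for linkage in ("ward", "average", "complete"):          # candidate algorithms
    for k in range(2, 11):                               # candidate cluster counts
        labels = AgglomerativeClustering(n_clusters=k, linkage=linkage).fit_predict(X)
        score = silhouette_score(X, labels)              # one possible validity index
        if best is None or score > best[0]:
            best = (score, linkage, k)

print("best (silhouette, linkage, n_clusters):", best)
```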
Gad, Haidy A; El-Ahmady, Sherweit H; Abou-Shoer, Mohamed I; Al-Azizi, Mohamed M
2013-01-01
Recently, the fields of chemometrics and multivariate analysis have been widely implemented in the quality control of herbal drugs to produce precise results, which is crucial in the field of medicine. Thyme represents an essential medicinal herb that is constantly adulterated due to its resemblance to many other plants with similar organoleptic properties. To establish a simple model for the quality assessment of Thymus species using UV spectroscopy together with known chemometric techniques. The success of this model may also serve as a technique for the quality control of other herbal drugs. The model was constructed using 30 samples of authenticated Thymus vulgaris and challenged with 20 samples of different botanical origins. The methanolic extracts of all samples were assessed using UV spectroscopy together with chemometric techniques: principal component analysis (PCA), soft independent modeling of class analogy (SIMCA) and hierarchical cluster analysis (HCA). The model was able to discriminate T. vulgaris from other Thymus, Satureja, Origanum, Plectranthus and Eriocephalus species, all traded in the Egyptian market as different types of thyme. The model was also able to classify closely related species in clusters using PCA and HCA. The model was finally used to classify 12 commercial thyme varieties into clusters of species incorporated in the model as thyme or non-thyme. The model constructed is highly recommended as a simple and efficient method for distinguishing T. vulgaris from other related species as well as the classification of marketed herbs as thyme or non-thyme. Copyright © 2013 John Wiley & Sons, Ltd.
ASTM clustering for improving coal analysis by near-infrared spectroscopy.
Andrés, J M; Bona, M T
2006-11-15
Multivariate analysis techniques have been applied to near-infrared (NIR) spectra of coals to investigate the relationship between nine coal properties (moisture (%), ash (%), volatile matter (%), fixed carbon (%), heating value (kcal/kg), carbon (%), hydrogen (%), nitrogen (%) and sulphur (%)) and the corresponding predictor variables. In this work, a whole set of coal samples was grouped into six more homogeneous clusters following the ASTM reference method for classification prior to the application of calibration methods to each coal set. The results obtained showed a considerable improvement in the determination error compared with the calibration for the whole sample set. For some groups, the established calibrations approached the quality required by the ASTM/ISO norms for laboratory analysis. To predict property values for a new coal sample, it is necessary to assign that sample to its respective group. Thus, the discrimination and classification ability for coal samples by Diffuse Reflectance Infrared Fourier Transform Spectroscopy (DRIFTS) in the NIR range was also studied by applying Soft Independent Modelling of Class Analogy (SIMCA) and Linear Discriminant Analysis (LDA) techniques. Modelling of the groups by SIMCA led to overlapping models that cannot discriminate for unique classification. On the other hand, the application of Linear Discriminant Analysis improved the classification of the samples, but not enough to be satisfactory for every group considered.
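The two-step use of classification plus per-group calibration can be sketched as below: LDA assigns a new spectrum to an ASTM group, and a separate regression (PLS is used here as one common chemometric choice, not necessarily the paper's calibration method) predicts a property from that group's model. All data shapes and values are placeholders.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.cross_decomposition import PLSRegression

# Hypothetical NIR data: rows = coal samples, columns = absorbances; `group`
# holds the ASTM class of each training sample (placeholders for illustration).
rng = np.random.default_rng(2)
X_train = rng.normal(size=(120, 300))
group = rng.integers(0, 6, size=120)             # six ASTM clusters
heating_value = rng.normal(6000, 500, size=120)  # property to calibrate, kcal/kg

# Step 1: discriminant model assigns a new spectrum to its ASTM group.
lda = LinearDiscriminantAnalysis().fit(X_train, group)

# Step 2: one calibration model per group (PLS as an illustrative choice).
models = {g: PLSRegression(n_components=5).fit(X_train[group == g], heating_value[group == g])
          for g in np.unique(group)}

x_new = rng.normal(size=(1, 300))
g_new = lda.predict(x_new)[0]
print("assigned group:", g_new,
      "predicted heating value:", models[g_new].predict(x_new).item())
```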
Diversity of the Gastric Microbiota in Thoroughbred Racehorses Having Gastric Ulcer.
Dong, Hee-Jin; Ho, Hungwui; Hwang, Hyeshin; Kim, Yongbaek; Han, Janet; Lee, Inhyung; Cho, Seongbeom
2016-04-28
Equine gastric ulcer syndrome is one of the most frequently reported diseases in thoroughbred racehorses. Although several risk factors for the development of gastric ulcers have been widely studied, investigation of microbiological factors has been limited. In this study, the presence of Helicobacter spp. and the gastric microbial communities of thoroughbred racehorses having mild to severe gastric ulcers were investigated. Although Helicobacter spp. were not detected using culture and PCR techniques from 52 gastric biopsies and 52 fecal samples, the genomic sequences of H. pylori and H. ganmani were detected using next-generation sequencing techniques from 2 out of 10 representative gastric samples. The gastric microbiota of horses was mainly composed of Firmicutes (50.0%), Proteobacteria (18.7%), Bacteroidetes (14.4%), and Actinobacteria (9.7%), but the proportion of each phylum varied among samples. There was no major difference in microbial composition among samples having mild to severe gastric ulcers. Using phylogenetic analysis, three distinct clusters were observed, and one cluster differed from the other two clusters in the frequency of feeding, amount of water consumption, and type of bedding. To the best of our knowledge, this is the first study to investigate the gastric microbiota of thoroughbred racehorses having gastric ulcers and to evaluate the microbial diversity in relation to the severity of gastric ulcer and management factors. This study is important for further exploration of the gastric microbiota in racehorses and is ultimately applicable to improving animal and human health.
Structure determination in 55-atom Li-Na and Na-K nanoalloys.
Aguado, Andrés; López, José M
2010-09-07
The structure of 55-atom Li-Na and Na-K nanoalloys is determined through combined empirical potential (EP) and density functional theory (DFT) calculations. The potential energy surface generated by the EP model is extensively sampled by using the basin hopping technique, and a wide diversity of structural motifs is reoptimized at the DFT level. A composition comparison technique is applied at the DFT level in order to make a final refinement of the global minimum structures. For dilute concentrations of one of the alkali atoms, the structure of the pure metal cluster, namely, a perfect Mackay icosahedron, remains stable, with the minority component atoms entering the host cluster as substitutional impurities. At intermediate concentrations, the nanoalloys adopt instead a core-shell polyicosahedral (p-Ih) packing, where the element with smaller atomic size and larger cohesive energy segregates to the cluster core. The p-Ih structures show a marked prolate deformation, in agreement with the predictions of jelliumlike models. The electronic preference for a prolate cluster shape, which is frustrated in the 55-atom pure clusters due to the icosahedral geometrical shell closing, is therefore realized only in the 55-atom nanoalloys. An analysis of the electronic densities of states suggests that photoelectron spectroscopy would be a sufficiently sensitive technique to assess the structures of nanoalloys with fixed size and varying compositions.
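The global-optimisation stage can be illustrated with SciPy's generic basin-hopping routine on a small single-component Lennard-Jones cluster; this stands in for, but is not, the paper's empirical-potential sampling of the 55-atom nanoalloys, and the cluster size, iteration count, and local minimiser are arbitrary choices.

```python
import numpy as np
from scipy.optimize import basinhopping

N = 13  # small Lennard-Jones cluster for illustration (not the 55-atom nanoalloy)

def lj_energy(x):
    """Total Lennard-Jones energy of N atoms; x holds the flattened coordinates."""
    pos = x.reshape(N, 3)
    e = 0.0
    for i in range(N - 1):
        d = np.linalg.norm(pos[i + 1:] - pos[i], axis=1)
        e += np.sum(4.0 * (d**-12 - d**-6))
    return e

rng = np.random.default_rng(3)
x0 = rng.uniform(-1.5, 1.5, size=3 * N)          # random starting geometry

# Basin hopping: repeated random perturbation + local minimisation, the same
# generic global-optimisation strategy used to sample a potential energy surface.
result = basinhopping(lj_energy, x0, niter=100, minimizer_kwargs={"method": "L-BFGS-B"})
print("lowest energy found:", result.fun)        # may not reach the LJ13 global minimum (~ -44.33)
```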
Progress toward Synthesis and Characterization of Rare-Earth Nanoparticles
NASA Astrophysics Data System (ADS)
Romero, Dulce G.; Ho, Pei-Chun; Attar, Saeed; Margosan, Dennis
2010-03-01
Magnetic nanoparticles exhibit interesting phenomena, such as enhanced magnetization and reduced magnetic ordering temperature (i.e. superparamagnetism), which have technical applications in industry, including magnetic storage, magnetic imaging, and magnetic refrigeration. We used the inverse micelle technique to synthesize Gd and Nd nanoparticles given its potential to control cluster size and the amount of aggregation, and to prevent oxidation of the rare-earth elements. Gd and Nd were reduced by NaBH4 from the chloride salt. The produced clusters were characterized by X-ray diffraction (XRD), scanning electron microscopy (SEM), and energy dispersive X-ray spectroscopy (EDX). The results from the XRD show that the majority of the peaks match those of the surfactant, DDAB. No peaks of Gd were observed due to excess surfactant or amorphous clusters. However, the results from the SEM and EDX indicate the presence of Gd and Nd in our clusters microscopically, and the currently synthesized samples contain impurities. We are using a liquid-liquid extraction method to purify the sample, and the results will be discussed.
Model for spectral and chromatographic data
Jarman, Kristin [Richland, WA]; Willse, Alan [Richland, WA]; Wahl, Karen [Richland, WA]; Wahl, Jon [Richland, WA]
2002-11-26
A method and apparatus using a spectral analysis technique are disclosed. In one form of the invention, probabilities are selected to characterize the presence (and in another form, also a quantification of a characteristic) of peaks in an indexed data set for samples that match a reference species, and other probabilities are selected for samples that do not match the reference species. An indexed data set is acquired for a sample, and a determination is made according to techniques exemplified herein as to whether the sample matches or does not match the reference species. When quantification of peak characteristics is undertaken, the model is appropriately expanded, and the analysis accounts for the characteristic model and data. Further techniques are provided to apply the methods and apparatuses to process control, cluster analysis, hypothesis testing, analysis of variance, and other procedures involving multiple comparisons of indexed data.
Dipnall, Joanna F; Pasco, Julie A; Berk, Michael; Williams, Lana J; Dodd, Seetal; Jacka, Felice N; Meyer, Denny
2016-01-01
Depression is commonly comorbid with many other somatic diseases and symptoms. Identification of individuals in clusters with comorbid symptoms may reveal new pathophysiological mechanisms and treatment targets. The aim of this research was to combine machine-learning (ML) algorithms with traditional regression techniques, utilising self-reported medical symptoms to identify and describe clusters of individuals with increased rates of depression from a large cross-sectional community-based population epidemiological study. A multi-staged methodology utilising ML and traditional statistical techniques was performed using the community-based population National Health and Nutrition Examination Study (2009-2010) (N = 3,922). A self-organising map (SOM) ML algorithm, combined with hierarchical clustering, was used to create participant clusters based on 68 medical symptoms. Binary logistic regression, controlling for sociodemographic confounders, was then used to identify the key clusters of participants with higher levels of depression (PHQ-9 ≥ 10, n = 377). Finally, a Multiple Additive Regression Tree boosted ML algorithm was run to identify the important medical symptoms for each key cluster within 17 broad categories: heart, liver, thyroid, respiratory, diabetes, arthritis, fractures and osteoporosis, skeletal pain, blood pressure, blood transfusion, cholesterol, vision, hearing, psoriasis, weight, bowels and urinary. Five clusters of participants, based on medical symptoms, were identified as having significantly increased rates of depression compared to the cluster with the lowest rate: odds ratios ranged from 2.24 (95% CI 1.56, 3.24) to 6.33 (95% CI 1.67, 24.02). The ML boosted regression algorithm identified three key medical condition categories as being significantly more common in these clusters: bowel, pain and urinary symptoms. Bowel-related symptoms were found to dominate the relative importance of symptoms within the five key clusters. This methodology shows promise for the identification of conditions in general populations and supports the current focus on the potential importance of bowel symptoms and the gut in mental health research.
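The SOM-plus-hierarchical-clustering stage followed by a regression on cluster membership can be sketched as below. It assumes the third-party minisom package as one possible SOM implementation, uses random placeholder symptom data, omits the confounder adjustment and the boosted-tree step, and all grid sizes and cluster counts are arbitrary.

```python
import numpy as np
from minisom import MiniSom                        # third-party SOM package (assumed available)
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.linear_model import LogisticRegression

# Hypothetical inputs: a binary symptom matrix and a depression indicator.
rng = np.random.default_rng(4)
symptoms = rng.integers(0, 2, size=(3922, 68)).astype(float)
depressed = rng.integers(0, 2, size=3922)

# 1) SOM compresses the 68-dimensional symptom space onto a small grid.
som = MiniSom(6, 6, symptoms.shape[1], sigma=1.0, learning_rate=0.5, random_seed=0)
som.train_random(symptoms, 5000)

# 2) Hierarchical clustering of the SOM codebook vectors yields participant clusters.
codebook = som.get_weights().reshape(-1, symptoms.shape[1])
node_cluster = fcluster(linkage(codebook, method="ward"), t=5, criterion="maxclust")
winners = np.array([som.winner(s) for s in symptoms])
person_cluster = node_cluster[winners[:, 0] * 6 + winners[:, 1]]

# 3) Logistic regression of depression on cluster membership (confounders omitted here).
X = np.eye(5)[person_cluster - 1][:, 1:]           # one-hot indicators, first cluster as reference
print(LogisticRegression(max_iter=1000).fit(X, depressed).coef_)
```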
Cruz, Antonio M; Barr, Cameron; Puñales-Pozo, Elsa
2008-01-01
This research's main goals were to build a predictor for a turnaround time (TAT) indicator, in order to estimate its values, and to use a numerical clustering technique to find possible causes of undesirable TAT values. The following stages were used: domain understanding, data characterisation and sample reduction, and insight characterisation. Multiple linear regression and clustering techniques were used to build the TAT predictor and to improve corrective maintenance task efficiency in a clinical engineering department (CED). Multiple linear regression was used to build a predictive model of the TAT value. The variables contributing to this model were clinical engineering department response time (CE(rt), 0.415 positive coefficient), stock service response time (Stock(rt), 0.734 positive coefficient), priority level (0.21 positive coefficient) and service time (0.06 positive coefficient). The regression process showed heavy reliance on Stock(rt), CE(rt) and priority, in that order. Clustering techniques revealed the main causes of high TAT values. This examination has provided a means for analysing current technical service quality and effectiveness. In doing so, it has demonstrated a process for identifying areas and methods of improvement and a model against which to analyse these methods' effectiveness.
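The reported coefficients can be assembled into the fitted regression equation sketched below; the intercept is not stated in the abstract and is left symbolic, and the variable names simply follow the abstract's notation.

```latex
% Hedged reconstruction of the reported TAT predictor; the intercept \beta_0 is
% not given in the abstract and is left symbolic.
\widehat{\mathrm{TAT}} = \beta_0
  + 0.415\,\mathrm{CE_{rt}}
  + 0.734\,\mathrm{Stock_{rt}}
  + 0.21\,(\text{priority level})
  + 0.06\,(\text{service time})
```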
A formal concept analysis approach to consensus clustering of multi-experiment expression data
2014-01-01
Background Presently, with the increasing number and complexity of available gene expression datasets, the combination of data from multiple microarray studies addressing a similar biological question is gaining importance. The analysis and integration of multiple datasets are expected to yield more reliable and robust results since they are based on a larger number of samples and the effects of the individual study-specific biases are diminished. This is supported by recent studies suggesting that important biological signals are often preserved or enhanced by multiple experiments. An approach to combining data from different experiments is the aggregation of their clusterings into a consensus or representative clustering solution which increases the confidence in the common features of all the datasets and reveals the important differences among them. Results We propose a novel generic consensus clustering technique that applies Formal Concept Analysis (FCA) approach for the consolidation and analysis of clustering solutions derived from several microarray datasets. These datasets are initially divided into groups of related experiments with respect to a predefined criterion. Subsequently, a consensus clustering algorithm is applied to each group resulting in a clustering solution per group. These solutions are pooled together and further analysed by employing FCA which allows extracting valuable insights from the data and generating a gene partition over all the experiments. In order to validate the FCA-enhanced approach two consensus clustering algorithms are adapted to incorporate the FCA analysis. Their performance is evaluated on gene expression data from multi-experiment study examining the global cell-cycle control of fission yeast. The FCA results derived from both methods demonstrate that, although both algorithms optimize different clustering characteristics, FCA is able to overcome and diminish these differences and preserve some relevant biological signals. Conclusions The proposed FCA-enhanced consensus clustering technique is a general approach to the combination of clustering algorithms with FCA for deriving clustering solutions from multiple gene expression matrices. The experimental results presented herein demonstrate that it is a robust data integration technique able to produce good quality clustering solution that is representative for the whole set of expression matrices. PMID:24885407
Irregular Breakfast Eating and Associated Health Behaviors: A Pilot Study among College Students
ERIC Educational Resources Information Center
Thiagarajah, Krisha; Torabi, Mohammad R.
2009-01-01
The purpose of this study was to examine prevalence of eating breakfast and associated health compromising behaviors. This study utilized a cross-sectional survey methodology. A purposive cluster sampling technique was utilized to collect data from a representative sample of college students in a Midwestern university in the U.S. A total of 1,257…
The composite sequential clustering technique for analysis of multispectral scanner data
NASA Technical Reports Server (NTRS)
Su, M. Y.
1972-01-01
The clustering technique consists of two parts: (1) a sequential statistical clustering which is essentially a sequential variance analysis, and (2) a generalized K-means clustering. In this composite clustering technique, the output of (1) is a set of initial clusters which are input to (2) for further improvement by an iterative scheme. This unsupervised composite technique was employed for automatic classification of two sets of remote multispectral earth resource observations. The classification accuracy by the unsupervised technique is found to be comparable to that by traditional supervised maximum likelihood classification techniques. The mathematical algorithms for the composite sequential clustering program and a detailed computer program description with job setup are given.
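As a rough illustration of the two-part scheme described above (a sequential pass that forms initial clusters, followed by a generalized K-means refinement), the sketch below uses a simplified distance-threshold rule for the sequential stage; the paper's sequential variance analysis is more elaborate, and the threshold and pixel data are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(5)
pixels = rng.normal(size=(10000, 4))        # hypothetical multispectral pixels, 4 bands

# Part 1 (simplified stand-in): sequential clustering -- a pixel joins the nearest
# existing cluster unless it is too far away, in which case it seeds a new one.
threshold, centroids, counts = 2.5, [], []
for x in pixels:
    if centroids:
        d = np.linalg.norm(np.asarray(centroids) - x, axis=1)
        j = int(np.argmin(d))
        if d[j] < threshold:
            counts[j] += 1
            centroids[j] += (x - centroids[j]) / counts[j]   # running-mean update
            continue
    centroids.append(x.copy())
    counts.append(1)

# Part 2: generalized K-means refinement seeded with the sequential clusters.
init = np.asarray(centroids)
labels = KMeans(n_clusters=len(init), init=init, n_init=1).fit_predict(pixels)
print(len(init), "initial clusters refined by K-means")
```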
Validation of spot-testing kits to determine iodine content in salt.
Pandav, C. S.; Arora, N. K.; Krishnan, A.; Sankar, R.; Pandav, S.; Karmarkar, M. G.
2000-01-01
Iodine deficiency disorders are a major public health problem, and salt iodization is the most widely practised intervention for their elimination. For the intervention to be successful and sustainable, it is vital to monitor the iodine content of salt regularly. Iodometric titration, the traditional method for measuring iodine content, has problems related to accessibility and cost. The newer spot-testing kits are inexpensive, require minimal training, and provide immediate results. Using data from surveys to assess the availability of iodized salt in two states in India, Madhya Pradesh and the National Capital Territory of Delhi, we tested the suitability of such a kit in field situations. Salt samples from Delhi were collected from 30 schools, chosen using the Expanded Programme on Immunization (EPI) cluster sampling technique. A single observer made the measurement for iodine content using the kit. Salt samples from Madhya Pradesh were from 30 rural and 30 urban clusters, identified by using census data and the EPI cluster sampling technique. In each cluster, salt samples were collected from 10 randomly selected households and all retailers. The 15 investigators performing the survey estimated the iodine content of salt samples in the field using the kit. All the samples were brought to the central laboratory in Delhi, where iodine content was estimated using iodometric titration as a reference method. The agreement between the kit and titration values decreased as the number of observers increased. Although sensitivity was not much affected by the increase in the number of observers (93.3% for a single observer and 93.9% for multiple observers), specificity decreased sharply (90.4% for a single observer and 40.4% for multiple observers). Due to the low specificity and resulting high numbers of false-positives for the kit when used by multiple observers ("real-life situations"), kits were likely to consistently overestimate the availability of iodized salt. This overestimation could result in complacency. Therefore, we conclude that until a valid alternative is available, the titration method should be used for monitoring the iodine content of salt at all levels, from producer to consumer, to ensure effectiveness of the programme. PMID:10994281
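To make the validity measures concrete, the snippet below computes sensitivity and specificity from a hypothetical kit-versus-titration 2x2 table; the counts are invented for illustration and are not the survey data.

```python
# Hedged illustration of the validity measures used to compare the kit with
# titration; the counts below are hypothetical, not the survey results.
tp, fn = 140, 10      # kit positive / negative among titration-confirmed iodized samples
tn, fp = 47, 53       # kit negative / positive among titration-confirmed non-iodized samples

sensitivity = tp / (tp + fn)      # probability the kit flags a truly iodized sample
specificity = tn / (tn + fp)      # probability the kit clears a truly non-iodized sample

print(f"sensitivity = {sensitivity:.1%}, specificity = {specificity:.1%}")
# Low specificity inflates false positives, i.e. the kit overestimates coverage.
```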
On the Connection between Turbulent Motions and Particle Acceleration in Galaxy Clusters
NASA Astrophysics Data System (ADS)
Eckert, D.; Gaspari, M.; Vazza, F.; Gastaldello, F.; Tramacere, A.; Zimmer, S.; Ettori, S.; Paltani, S.
2017-07-01
Giant radio halos are megaparsec-scale diffuse radio sources associated with the central regions of galaxy clusters. The most promising scenario to explain the origin of these sources is that of turbulent re-acceleration, in which MeV electrons injected throughout the formation history of galaxy clusters are accelerated to higher energies by turbulent motions mostly induced by cluster mergers. In this Letter, we use the amplitude of density fluctuations in the intracluster medium as a proxy for the turbulent velocity and apply this technique to a sample of 51 clusters with available radio data. Our results indicate a segregation in the turbulent velocity of radio halo and radio quiet clusters, with the turbulent velocity of the former being on average higher by about a factor of two. The velocity dispersion recovered with this technique correlates with the measured radio power through the relation P_radio ∝ σ_v^(3.3 ± 0.7), which implies that the radio power is nearly proportional to the turbulent energy rate. In case turbulence cascades without being dissipated down to the particle acceleration scales, our results provide an observational confirmation of a key prediction of the turbulent re-acceleration model and possibly shed light on the origin of radio halos.
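A power-law relation of this kind is commonly estimated as a straight-line fit in log-log space; the sketch below does this on synthetic (σ_v, P_radio) pairs with an injected slope of 3.3, purely to illustrate the fitting step. The data, units, and scatter are placeholders, not the paper's measurements.

```python
import numpy as np

# Hypothetical (sigma_v, P_radio) pairs; a log-log linear fit recovers the
# power-law slope, analogous to the reported P_radio ∝ sigma_v^(3.3 +/- 0.7).
rng = np.random.default_rng(6)
sigma_v = rng.uniform(200.0, 800.0, size=25)                  # km/s (placeholder values)
p_radio = 1e24 * (sigma_v / 500.0) ** 3.3 * rng.lognormal(0.0, 0.3, size=25)

slope, intercept = np.polyfit(np.log10(sigma_v), np.log10(p_radio), 1)
print(f"fitted slope ~ {slope:.2f}")                          # close to the injected 3.3
```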
Spectroscopic Confirmation of Five Galaxy Clusters at z > 1.25 in the 2500 deg^2 SPT-SZ Survey
NASA Astrophysics Data System (ADS)
Khullar, Gourav; Bleem, Lindsey; Bayliss, Matthew; Gladders, Michael; South Pole Telescope (SPT) Collaboration
2018-06-01
We present spectroscopic confirmation of 5 galaxy clusters at 1.25 < z < 1.5, discovered in the 2500 deg^2 South Pole Telescope Sunyaev-Zel'dovich (SPT-SZ) survey. These clusters, taken from a nearly redshift-independent mass-limited sample of clusters, have multi-wavelength follow-up imaging data from the X-ray to the near-IR, and currently form the most homogeneous massive high-redshift cluster sample in existence. We briefly describe the analysis pipeline used on the low S/N spectra of these faint galaxies, and describe the multiple techniques used to extract robust redshifts from a combination of absorption-line (Ca II H&K doublet - λλ3934,3968Å) and emission-line ([OII] λλ3727,3729Å) spectral features. We present several ensemble analyses of cluster member galaxies that demonstrate the reliability of the measured redshifts. We also identify modest [OII] emission and pronounced CN and Hδ absorption in a composite stacked spectrum of 28 low S/N passive galaxy spectra with redshifts derived primarily from Ca II H&K features. This work increases the number of spectroscopically confirmed SPT-SZ galaxy clusters at z > 1.25 from 2 to 7, further demonstrating the efficacy of SZ selection for the highest-redshift massive clusters, and enabling further detailed study of these confirmed systems.
Validating clustering of molecular dynamics simulations using polymer models.
Phillips, Joshua L; Colvin, Michael E; Newsam, Shawn
2011-11-14
Molecular dynamics (MD) simulation is a powerful technique for sampling the meta-stable and transitional conformations of proteins and other biomolecules. Computational data clustering has emerged as a useful, automated technique for extracting conformational states from MD simulation data. Despite extensive application, relatively little work has been done to determine if the clustering algorithms are actually extracting useful information. A primary goal of this paper therefore is to provide such an understanding through a detailed analysis of data clustering applied to a series of increasingly complex biopolymer models. We develop a novel series of models using basic polymer theory that have intuitive, clearly-defined dynamics and exhibit the essential properties that we are seeking to identify in MD simulations of real biomolecules. We then apply spectral clustering, an algorithm particularly well-suited for clustering polymer structures, to our models and MD simulations of several intrinsically disordered proteins. Clustering results for the polymer models provide clear evidence that the meta-stable and transitional conformations are detected by the algorithm. The results for the polymer models also help guide the analysis of the disordered protein simulations by comparing and contrasting the statistical properties of the extracted clusters. We have developed a framework for validating the performance and utility of clustering algorithms for studying molecular biopolymer simulations that utilizes several analytic and dynamic polymer models which exhibit well-behaved dynamics including: meta-stable states, transition states, helical structures, and stochastic dynamics. We show that spectral clustering is robust to anomalies introduced by structural alignment and that different structural classes of intrinsically disordered proteins can be reliably discriminated from the clustering results. To our knowledge, our framework is the first to utilize model polymers to rigorously test the utility of clustering algorithms for studying biopolymers.
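For readers unfamiliar with the clustering step, the sketch below applies spectral clustering to a precomputed Gaussian affinity built from pairwise distances between conformation vectors; the random "frames", the affinity bandwidth, and the cluster count are placeholders rather than the paper's polymer models or settings.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# Hypothetical "trajectory": each frame reduced to a flat coordinate vector.
rng = np.random.default_rng(7)
frames = rng.normal(size=(300, 30))

# Pairwise distances converted to a Gaussian affinity matrix (a simple stand-in
# for an RMSD-based similarity between conformations).
d = np.linalg.norm(frames[:, None, :] - frames[None, :, :], axis=-1)
affinity = np.exp(-(d ** 2) / (2.0 * np.median(d) ** 2))

labels = SpectralClustering(n_clusters=4, affinity="precomputed",
                            random_state=0).fit_predict(affinity)
print(np.bincount(labels))                      # population of each conformational cluster
```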
Validating clustering of molecular dynamics simulations using polymer models
2011-01-01
Background Molecular dynamics (MD) simulation is a powerful technique for sampling the meta-stable and transitional conformations of proteins and other biomolecules. Computational data clustering has emerged as a useful, automated technique for extracting conformational states from MD simulation data. Despite extensive application, relatively little work has been done to determine if the clustering algorithms are actually extracting useful information. A primary goal of this paper therefore is to provide such an understanding through a detailed analysis of data clustering applied to a series of increasingly complex biopolymer models. Results We develop a novel series of models using basic polymer theory that have intuitive, clearly-defined dynamics and exhibit the essential properties that we are seeking to identify in MD simulations of real biomolecules. We then apply spectral clustering, an algorithm particularly well-suited for clustering polymer structures, to our models and MD simulations of several intrinsically disordered proteins. Clustering results for the polymer models provide clear evidence that the meta-stable and transitional conformations are detected by the algorithm. The results for the polymer models also help guide the analysis of the disordered protein simulations by comparing and contrasting the statistical properties of the extracted clusters. Conclusions We have developed a framework for validating the performance and utility of clustering algorithms for studying molecular biopolymer simulations that utilizes several analytic and dynamic polymer models which exhibit well-behaved dynamics including: meta-stable states, transition states, helical structures, and stochastic dynamics. We show that spectral clustering is robust to anomalies introduced by structural alignment and that different structural classes of intrinsically disordered proteins can be reliably discriminated from the clustering results. To our knowledge, our framework is the first to utilize model polymers to rigorously test the utility of clustering algorithms for studying biopolymers. PMID:22082218
Geornaras, Ifigenia; Kunene, Nokuthula F.; von Holy, Alexander; Hastings, John W.
1999-01-01
Molecular typing has been used previously to identify and trace dissemination of pathogenic and spoilage bacteria associated with food processing. Amplified fragment length polymorphism (AFLP) is a novel DNA fingerprinting technique which is considered highly reproducible and has high discriminatory power. This technique was used to fingerprint 88 Pseudomonas fluorescens and Pseudomonas putida strains that were previously isolated from plate counts of carcasses at six processing stages and various equipment surfaces and environmental sources of a poultry abattoir. Clustering of the AFLP patterns revealed a high level of diversity among the strains. Six clusters (clusters I through VI) were delineated at an arbitrary Dice coefficient level of 0.65; clusters III (31 strains) and IV (28 strains) were the largest clusters. More than one-half (52.3%) of the strains obtained from carcass samples, which may have represented the resident carcass population, grouped together in cluster III. By contrast, 43.2% of the strains from most of the equipment surfaces and environmental sources grouped together in cluster IV. In most cases, the clusters in which carcass strains from processing stages grouped corresponded to the clusters in which strains from the associated equipment surfaces and/or environmental sources were found. This provided evidence that there was cross-contamination between carcasses and the abattoir environment at the DNA level. The AFLP data also showed that strains were being disseminated from the beginning to the end of the poultry processing operation, since many strains associated with carcasses at the packaging stage were members of the same clusters as strains obtained from carcasses after the defeathering stage. PMID:10473382
Geornaras, I; Kunene, N F; von Holy, A; Hastings, J W
1999-09-01
Molecular typing has been used previously to identify and trace dissemination of pathogenic and spoilage bacteria associated with food processing. Amplified fragment length polymorphism (AFLP) is a novel DNA fingerprinting technique which is considered highly reproducible and has high discriminatory power. This technique was used to fingerprint 88 Pseudomonas fluorescens and Pseudomonas putida strains that were previously isolated from plate counts of carcasses at six processing stages and various equipment surfaces and environmental sources of a poultry abattoir. Clustering of the AFLP patterns revealed a high level of diversity among the strains. Six clusters (clusters I through VI) were delineated at an arbitrary Dice coefficient level of 0.65; clusters III (31 strains) and IV (28 strains) were the largest clusters. More than one-half (52.3%) of the strains obtained from carcass samples, which may have represented the resident carcass population, grouped together in cluster III. By contrast, 43.2% of the strains from most of the equipment surfaces and environmental sources grouped together in cluster IV. In most cases, the clusters in which carcass strains from processing stages grouped corresponded to the clusters in which strains from the associated equipment surfaces and/or environmental sources were found. This provided evidence that there was cross-contamination between carcasses and the abattoir environment at the DNA level. The AFLP data also showed that strains were being disseminated from the beginning to the end of the poultry processing operation, since many strains associated with carcasses at the packaging stage were members of the same clusters as strains obtained from carcasses after the defeathering stage.
Spike sorting based upon machine learning algorithms (SOMA).
Horton, P M; Nicol, A U; Kendrick, K M; Feng, J F
2007-02-15
We have developed a spike sorting method, using a combination of various machine learning algorithms, to analyse electrophysiological data and automatically determine the number of sampled neurons from an individual electrode, and discriminate their activities. We discuss extensions to a standard unsupervised learning algorithm (Kohonen), as a simple application of this technique would only identify a known number of clusters. Our extra techniques automatically identify the number of clusters within the dataset, and their sizes, thereby reducing the chance of misclassification. We also discuss a new pre-processing technique, which transforms the data into a higher-dimensional feature space revealing separable clusters. Using principal component analysis (PCA) alone may not achieve this. Our new approach appends the features acquired using PCA with features describing the geometric shapes that constitute a spike waveform. To validate our new spike sorting approach, we have applied it to multi-electrode array datasets acquired from the rat olfactory bulb and from the sheep infero-temporal cortex, and to simulated data. The SOMA software is available at http://www.sussex.ac.uk/Users/pmh20/spikes.
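The feature-augmentation idea, PCA scores appended with simple waveform-shape descriptors, can be sketched as below; the particular geometric descriptors, the use of a BIC-selected Gaussian mixture in place of the paper's extended Kohonen network, and the random waveforms are all assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(8)
waveforms = rng.normal(size=(2000, 48))          # hypothetical spike waveforms (48 samples each)

# PCA features appended with simple geometric shape descriptors, in the spirit
# of augmenting PCA with waveform-shape information.
pca_feats = PCA(n_components=3).fit_transform(waveforms)
geom_feats = np.column_stack([
    waveforms.max(axis=1),                               # peak amplitude
    waveforms.min(axis=1),                               # trough amplitude
    waveforms.argmax(axis=1) - waveforms.argmin(axis=1)  # peak-to-trough spread
])
features = np.column_stack([pca_feats, geom_feats])

# Stand-in for the automatic cluster-count step: pick the mixture size with the
# lowest BIC (the paper extends a Kohonen map; this is only an analogous rule).
models = [GaussianMixture(n_components=k, random_state=0).fit(features) for k in range(1, 9)]
best = min(models, key=lambda m: m.bic(features))
print("estimated number of units:", best.n_components)
```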
Sethi, Suresh; Linden, Daniel; Wenburg, John; Lewis, Cara; Lemons, Patrick R.; Fuller, Angela K.; Hare, Matthew P.
2016-01-01
Error-tolerant likelihood-based match calling presents a promising technique to accurately identify recapture events in genetic mark–recapture studies by combining probabilities of latent genotypes and probabilities of observed genotypes, which may contain genotyping errors. Combined with clustering algorithms to group samples into sets of recaptures based upon pairwise match calls, these tools can be used to reconstruct accurate capture histories for mark–recapture modelling. Here, we assess the performance of a recently introduced error-tolerant likelihood-based match-calling model and sample clustering algorithm for genetic mark–recapture studies. We assessed both biallelic (i.e. single nucleotide polymorphisms; SNP) and multiallelic (i.e. microsatellite; MSAT) markers using a combination of simulation analyses and case study data on Pacific walrus (Odobenus rosmarus divergens) and fishers (Pekania pennanti). A novel two-stage clustering approach is demonstrated for genetic mark–recapture applications. First, repeat captures within a sampling occasion are identified. Subsequently, recaptures across sampling occasions are identified. The likelihood-based matching protocol performed well in simulation trials, demonstrating utility for use in a wide range of genetic mark–recapture studies. Moderately sized SNP (64+) and MSAT (10–15) panels produced accurate match calls for recaptures and accurate non-match calls for samples from closely related individuals in the face of low to moderate genotyping error. Furthermore, matching performance remained stable or increased as the number of genetic markers increased, genotyping error notwithstanding.
Maljovec, D.; Liu, S.; Wang, B.; ...
2015-07-14
Here, dynamic probabilistic risk assessment (DPRA) methodologies couple system simulator codes (e.g., RELAP and MELCOR) with simulation controller codes (e.g., RAVEN and ADAPT). Whereas system simulator codes model system dynamics deterministically, simulation controller codes introduce both deterministic (e.g., system control logic and operating procedures) and stochastic (e.g., component failures and parameter uncertainties) elements into the simulation. Typically, a DPRA is performed by sampling values of a set of parameters and simulating the system behavior for that specific set of parameter values. For complex systems, a major challenge in using DPRA methodologies is to analyze the large number of scenarios generated, where clustering techniques are typically employed to better organize and interpret the data. In this paper, we focus on the analysis of two nuclear simulation datasets that are part of the risk-informed safety margin characterization (RISMC) boiling water reactor (BWR) station blackout (SBO) case study. We provide the domain experts a software tool that encodes traditional and topological clustering techniques within an interactive analysis and visualization environment, for understanding the structures of such high-dimensional nuclear simulation datasets. We demonstrate through our case study that both types of clustering techniques complement each other for enhanced structural understanding of the data.
A Fast Projection-Based Algorithm for Clustering Big Data.
Wu, Yun; He, Zhiquan; Lin, Hao; Zheng, Yufei; Zhang, Jingfen; Xu, Dong
2018-06-07
With the fast development of various techniques, more and more data have been accumulated with the unique properties of large size (tall) and high dimension (wide). The era of big data is coming. How to understand and discover new knowledge from these data has attracted more and more scholars' attention and has become the most important task in data mining. As one of the most important techniques in data mining, clustering analysis, a kind of unsupervised learning, can group a set of data into objects (clusters) that are meaningful, useful, or both. Thus, the technique has played a very important role in knowledge discovery in big data. However, when facing large-sized and high-dimensional data, most of the current clustering methods exhibit poor computational efficiency and a high requirement for computational resources, which prevents us from clarifying the intrinsic properties and discovering the new knowledge behind the data. Based on this consideration, we developed a powerful clustering method, called MUFOLD-CL. The principle of the method is to project the data points to the centroid, and then to measure the similarity between any two points by calculating their projections on the centroid. The proposed method can achieve linear time complexity with respect to the sample size. Comparison with the K-means method on very large data showed that our method could produce better accuracy and require less computational time, demonstrating that MUFOLD-CL can serve as a valuable tool, or at least play a complementary role to other existing methods, for big data clustering. Further comparisons with state-of-the-art clustering methods on smaller datasets showed that our method was the fastest and achieved comparable accuracy. For the convenience of most scholars, a free software package was constructed.
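A minimal sketch of the stated centroid-projection principle follows: each point is reduced to its scalar projection onto the data centroid and points are grouped by that one-dimensional value, giving linear-time assignment. The binning rule, cluster count, and random data are assumptions; the actual MUFOLD-CL algorithm is more involved than this.

```python
import numpy as np

# Minimal sketch of the stated principle: points are compared through their
# scalar projections onto the data centroid (details here are assumptions).
rng = np.random.default_rng(9)
X = rng.normal(size=(10000, 50))

centroid = X.mean(axis=0)
unit = centroid / np.linalg.norm(centroid)
proj = X @ unit                                   # one scalar per point, O(n) overall

# Group points whose projections fall in the same interval (equal-width bins).
n_clusters = 10
edges = np.linspace(proj.min(), proj.max(), n_clusters + 1)
labels = np.clip(np.digitize(proj, edges) - 1, 0, n_clusters - 1)
print(np.bincount(labels, minlength=n_clusters))
```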
Adaptive Water Sampling based on Unsupervised Clustering
NASA Astrophysics Data System (ADS)
Py, F.; Ryan, J.; Rajan, K.; Sherman, A.; Bird, L.; Fox, M.; Long, D.
2007-12-01
Autonomous Underwater Vehicles (AUVs) are widely used for oceanographic surveys, during which data are collected from a number of on-board sensors. Engineers and scientists at MBARI have extended this approach by developing a water sampler specially for the AUV, which can sample a specific patch of water at a specific time. The sampler, named the Gulper, captures 2 liters of seawater in less than 2 seconds on a 21" MBARI Odyssey AUV. Each sample chamber of the Gulper is filled with seawater through a one-way valve, which protrudes through the fairing of the AUV. This new kind of device raises a new problem: when to trigger the Gulper autonomously? For example, scientists interested in studying the mobilization and transport of shelf sediments would like to detect intermediate nepheloid layers (INLs). To be able to detect this phenomenon we need to extract a model based on AUV sensors that can detect this feature in situ. The formation of such a model is not obvious, as identification of this feature is generally based on data from multiple sensors. We have developed an unsupervised data clustering technique to extract the different features, which are then used for on-board classification and triggering of the Gulper. We use a three-phase approach: 1) use data from past missions to learn the different classes of data from sensor inputs; the clustering algorithm extracts the set of features that can be distinguished within this large data set. 2) Scientists on shore then identify these features and point out which correspond to those of interest (e.g. nepheloid layer, upwelling material, etc.). 3) Embed the corresponding classifier into the AUV control system to indicate the most probable feature of the water depending on sensory input. The triggering algorithm looks at this result and triggers the Gulper if the classifier indicates that we are within the feature of interest with a predetermined threshold of confidence. We have deployed this method of online classification and sampling based on AUV depth and HOBI Labs Hydroscat-2 sensor data. Using approximately 20,000 data samples, the clustering algorithm generated 14 clusters, with one identified as corresponding to a nepheloid layer. We demonstrate that such a technique can be used to reliably and efficiently sample water based on multiple sources of data in real time.
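The three-phase scheme can be caricatured as below: an offline K-means model learned from past mission data, a scientist-designated target cluster, and an on-board trigger that fires only when a new reading falls confidently inside that cluster. The sensor channels, the 14-cluster choice echoing the abstract, the distance-based confidence rule, and the threshold are all assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Phase 1 (offline): learn clusters from past-mission sensor logs (placeholder data).
rng = np.random.default_rng(10)
past = rng.normal(size=(20000, 3))                # e.g. depth + two optical backscatter channels
km = KMeans(n_clusters=14, n_init=10, random_state=0).fit(past)

# Phase 2 (on shore): a scientist labels one cluster as the feature of interest.
TARGET = 3
RADIUS = np.percentile(np.linalg.norm(past - km.cluster_centers_[km.labels_], axis=1), 75)

# Phase 3 (on board, per new sensor reading): trigger only when the reading is
# confidently inside the target cluster. The confidence rule is an assumption.
def should_trigger(reading):
    d = np.linalg.norm(km.cluster_centers_ - reading, axis=1)
    return int(np.argmin(d)) == TARGET and d[TARGET] < RADIUS

print(should_trigger(past[0]))
```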
Joint fMRI analysis and subject clustering using sparse dictionary learning
NASA Astrophysics Data System (ADS)
Kim, Seung-Jun; Dontaraju, Krishna K.
2017-08-01
Multi-subject fMRI data analysis methods based on sparse dictionary learning are proposed. In addition to identifying the component spatial maps by exploiting the sparsity of the maps, clusters of the subjects are learned by postulating that the fMRI volumes admit a subspace clustering structure. Furthermore, in order to tune the associated hyper-parameters systematically, a cross-validation strategy is developed based on entry-wise sampling of the fMRI dataset. Efficient algorithms for solving the proposed constrained dictionary learning formulations are developed. Numerical tests performed on synthetic fMRI data show promising results and provide insights into the proposed technique.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sehgal, Ray M.; Maroudas, Dimitrios, E-mail: maroudas@ecs.umass.edu; Ford, David M., E-mail: ford@ecs.umass.edu
We have developed a coarse-grained description of the phase behavior of the isolated 38-atom Lennard-Jones cluster (LJ38). The model captures both the solid-solid polymorphic transitions at low temperatures and the complex cluster breakup and melting transitions at higher temperatures. For this coarse model development, we employ the manifold learning technique of diffusion mapping. The outcome of the diffusion mapping analysis over a broad temperature range indicates that two order parameters are sufficient to describe the cluster's phase behavior; we have chosen two such appropriate order parameters that are metrics of condensation and overall crystallinity. In this well-justified coarse-variable space, we calculate the cluster's free energy landscape (FEL) as a function of temperature, employing Monte Carlo umbrella sampling. These FELs are used to quantify the phase behavior and onsets of phase transitions of the LJ38 cluster.
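For orientation, a bare-bones diffusion-map construction is sketched below (Gaussian kernel, row-stochastic normalisation, leading non-trivial eigenvectors as coarse coordinates); it is a generic simplification rather than the authors' exact procedure, and the configuration descriptors and bandwidth heuristic are placeholders.

```python
import numpy as np

# Minimal diffusion-map sketch: Gaussian kernel, Markov normalisation, and the
# leading non-trivial eigenvectors as coarse variables. Inputs are random
# placeholder configuration descriptors, not cluster simulation data.
rng = np.random.default_rng(11)
configs = rng.normal(size=(400, 20))

d2 = np.sum((configs[:, None, :] - configs[None, :, :]) ** 2, axis=-1)
eps = np.median(d2)                               # kernel bandwidth (one common heuristic)
K = np.exp(-d2 / eps)
P = K / K.sum(axis=1, keepdims=True)              # row-stochastic Markov matrix

vals, vecs = np.linalg.eig(P)
order = np.argsort(-vals.real)
coords = vecs.real[:, order[1:3]]                 # two leading non-trivial diffusion coordinates
print(coords.shape)
```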
NASA Astrophysics Data System (ADS)
Jusinski, Leonard E.; Bahuguna, Ramen; Das, Amrita; Arya, Karamjeet
2006-02-01
Surface-enhanced Raman spectroscopy (SERS) has become a viable technique for the detection of single molecules. This high sensitivity is due to the very large (up to 14 orders of magnitude) enhancement in the Raman cross section when the molecule is adsorbed on a metal nanoparticle cluster. We report here SERS experiments performed by adsorbing analyte molecules on nanoscale silver particle clusters within the gelatin layer of commercially available holographic plates which have been developed and fixed. The Ag particles range in size between 5 and 30 nanometers (nm). Sample preparation was performed by immersing the prepared holographic plate in an analyte solution for a few minutes. We report here the production of SERS signals from Rhodamine 6G (R6G) molecules at nanomolar concentration. These measurements demonstrate a fast, low-cost, reproducible technique of producing SERS substrates in a matter of minutes, compared to the conventional procedure of preparing Ag clusters from colloidal solutions. SERS-active colloidal solutions require up to a full day to prepare. In addition, the preparations of colloidal aggregates are not consistent in shape, contain additional interfering chemicals, and do not generate consistent SERS enhancement. Colloidal solutions require the addition of KCl or NaCl to increase the ionic strength to allow aggregation and cluster formation. We find no need to add KCl or NaCl to create SERS-active clusters in the holographic gelatin matrix. These holographic plates, prepared using simple, conventional procedures, can be stored in an inert environment and preserve SERS activity for several weeks after preparation.
An unsupervised classification technique for multispectral remote sensing data.
NASA Technical Reports Server (NTRS)
Su, M. Y.; Cummings, R. E.
1973-01-01
Description of a two-part clustering technique consisting of (a) a sequential statistical clustering, which is essentially a sequential variance analysis, and (b) a generalized K-means clustering. In this composite clustering technique, the output of (a) is a set of initial clusters which are input to (b) for further improvement by an iterative scheme. This unsupervised composite technique was employed for automatic classification of two sets of remote multispectral earth resource observations. The classification accuracy by the unsupervised technique is found to be comparable to that by traditional supervised maximum-likelihood classification techniques.
Planck intermediate results. XLIII. Spectral energy distribution of dust in clusters of galaxies
NASA Astrophysics Data System (ADS)
Planck Collaboration; Adam, R.; Ade, P. A. R.; Aghanim, N.; Ashdown, M.; Aumont, J.; Baccigalupi, C.; Banday, A. J.; Barreiro, R. B.; Bartolo, N.; Battaner, E.; Benabed, K.; Benoit-Lévy, A.; Bersanelli, M.; Bielewicz, P.; Bikmaev, I.; Bonaldi, A.; Bond, J. R.; Borrill, J.; Bouchet, F. R.; Burenin, R.; Burigana, C.; Calabrese, E.; Cardoso, J.-F.; Catalano, A.; Chiang, H. C.; Christensen, P. R.; Churazov, E.; Colombo, L. P. L.; Combet, C.; Comis, B.; Couchot, F.; Crill, B. P.; Curto, A.; Cuttaia, F.; Danese, L.; Davis, R. J.; de Bernardis, P.; de Rosa, A.; de Zotti, G.; Delabrouille, J.; Désert, F.-X.; Diego, J. M.; Dole, H.; Doré, O.; Douspis, M.; Ducout, A.; Dupac, X.; Elsner, F.; Enßlin, T. A.; Finelli, F.; Forni, O.; Frailis, M.; Fraisse, A. A.; Franceschi, E.; Galeotta, S.; Ganga, K.; Génova-Santos, R. T.; Giard, M.; Giraud-Héraud, Y.; Gjerløw, E.; González-Nuevo, J.; Górski, K. M.; Gregorio, A.; Gruppuso, A.; Gudmundsson, J. E.; Hansen, F. K.; Harrison, D. L.; Hernández-Monteagudo, C.; Herranz, D.; Hildebrandt, S. R.; Hivon, E.; Hobson, M.; Hornstrup, A.; Hovest, W.; Hurier, G.; Jaffe, A. H.; Jaffe, T. R.; Jones, W. C.; Keihänen, E.; Keskitalo, R.; Khamitov, I.; Kisner, T. S.; Kneissl, R.; Knoche, J.; Kunz, M.; Kurki-Suonio, H.; Lagache, G.; Lähteenmäki, A.; Lamarre, J.-M.; Lasenby, A.; Lattanzi, M.; Lawrence, C. R.; Leonardi, R.; Levrier, F.; Liguori, M.; Lilje, P. B.; Linden-Vørnle, M.; López-Caniego, M.; Macías-Pérez, J. F.; Maffei, B.; Maggio, G.; Mandolesi, N.; Mangilli, A.; Maris, M.; Martin, P. G.; Martínez-González, E.; Masi, S.; Matarrese, S.; Melchiorri, A.; Mennella, A.; Migliaccio, M.; Miville-Deschênes, M.-A.; Moneti, A.; Montier, L.; Morgante, G.; Mortlock, D.; Munshi, D.; Murphy, J. A.; Naselsky, P.; Nati, F.; Natoli, P.; Nørgaard-Nielsen, H. U.; Novikov, D.; Novikov, I.; Oxborrow, C. A.; Pagano, L.; Pajot, F.; Paoletti, D.; Pasian, F.; Perdereau, O.; Perotto, L.; Pettorino, V.; Piacentini, F.; Piat, M.; Plaszczynski, S.; Pointecouteau, E.; Polenta, G.; Ponthieu, N.; Pratt, G. W.; Prunet, S.; Puget, J.-L.; Rachen, J. P.; Rebolo, R.; Reinecke, M.; Remazeilles, M.; Renault, C.; Renzi, A.; Ristorcelli, I.; Rocha, G.; Rosset, C.; Rossetti, M.; Roudier, G.; Rubiño-Martín, J. A.; Rusholme, B.; Santos, D.; Savelainen, M.; Savini, G.; Scott, D.; Stolyarov, V.; Stompor, R.; Sudiwala, R.; Sunyaev, R.; Sutton, D.; Suur-Uski, A.-S.; Sygnet, J.-F.; Tauber, J. A.; Terenzi, L.; Toffolatti, L.; Tomasi, M.; Tristram, M.; Tucci, M.; Valenziano, L.; Valiviita, J.; Van Tent, F.; Vielva, P.; Villa, F.; Wade, L. A.; Wehus, I. K.; Yvon, D.; Zacchei, A.; Zonca, A.
2016-12-01
Although infrared (IR) overall dust emission from clusters of galaxies has been statistically detected using data from the Infrared Astronomical Satellite (IRAS), it has not been possible to sample the spectral energy distribution (SED) of this emission over its peak, and thus to break the degeneracy between dust temperature and mass. By complementing the IRAS spectral coverage with Planck satellite data from 100 to 857 GHz, we provide new constraints on the IR spectrum of thermal dust emission in clusters of galaxies. We achieve this by using a stacking approach for a sample of several hundred objects from the Planck cluster sample. This procedure averages out fluctuations from the IR sky, allowing us to reach a significant detection of the faint cluster contribution. We also use the large frequency range probed by Planck, together with component-separation techniques, to remove the contamination from both cosmic microwave background anisotropies and the thermal Sunyaev-Zeldovich effect (tSZ) signal, which dominate at ν ≤ 353 GHz. By excluding dominant spurious signals or systematic effects, averaged detections are reported at frequencies 353 GHz ≤ ν ≤ 5000 GHz. We confirm the presence of dust in clusters of galaxies at low and intermediate redshifts, yielding an SED with a shape similar to that of the Milky Way. Planck's resolution does not allow us to investigate the detailed spatial distribution of this emission (e.g. whether it comes from intergalactic dust or simply the dust content of the cluster galaxies), but the radial distribution of the emission appears to follow that of the stacked SZ signal, and thus the extent of the clusters. The recovered SED allows us to constrain the dust mass responsible for the signal and its temperature.
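The stacking idea, averaging map cutouts at known cluster positions so that uncorrelated sky fluctuations average down while a common faint signal survives, can be illustrated with a toy map as below; the map, positions, injected signal amplitude, and cutout size are synthetic placeholders, not Planck or IRAS data.

```python
import numpy as np

# Toy stacking sketch: average cutouts centred on "cluster" positions so that
# uncorrelated background fluctuations average down while the common faint
# signal remains. Everything below is synthetic and purely illustrative.
rng = np.random.default_rng(12)
sky = rng.normal(0.0, 1.0, size=(2000, 2000))               # noisy background map
positions = rng.integers(50, 1950, size=(600, 2))           # 600 cluster pixel positions
for y, x in positions:                                      # inject a faint common signal
    sky[y - 2:y + 3, x - 2:x + 3] += 0.2

half = 10
stack = np.mean([sky[y - half:y + half + 1, x - half:x + half + 1] for y, x in positions], axis=0)
print("central stacked value:", stack[half, half], "vs background rms:", sky.std())
```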
The panchromatic Hubble Andromeda Treasury. V. Ages and masses of the year 1 stellar clusters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fouesneau, Morgan; Johnson, L. Clifton; Weisz, Daniel R.
We present ages and masses for 601 star clusters in M31 from the analysis of six-filter integrated light measurements from near-ultraviolet to near-infrared wavelengths, made as part of the Panchromatic Hubble Andromeda Treasury (PHAT). We derive the ages and masses using a probabilistic technique, which accounts for the effects of stochastic sampling of the stellar initial mass function. Tests on synthetic data show that this method, in conjunction with the exquisite sensitivity of the PHAT observations and their broad wavelength baseline, provides robust age and mass recovery for clusters ranging from ∼10^2 to 2 × 10^6 M_⊙. We find that the cluster age distribution is consistent with being uniform over the past 100 Myr, which suggests a weak effect of cluster disruption within M31. The age distribution of older (>100 Myr) clusters falls toward old ages, consistent with a power-law decline of index -1, likely from a combination of fading and disruption of the clusters. We find that the mass distribution of the whole sample can be well described by a single power law with a spectral index of -1.9 ± 0.1 over the range of 10^3 to 3 × 10^5 M_⊙. If we subdivide the sample by galactocentric radius, we find that the age distributions remain unchanged; however, the mass spectral index varies significantly, showing best-fit values between -2.2 and -1.8, with the shallower slope in the highest star formation intensity regions. We explore the robustness of our study to potential systematics and conclude that the cluster mass function may vary with respect to environment.
Planck intermediate results: XLIII. Spectral energy distribution of dust in clusters of galaxies
Adam, R.; Ade, P. A. R.; Aghanim, N.; ...
2016-12-12
Although infrared (IR) overall dust emission from clusters of galaxies has been statistically detected using data from the Infrared Astronomical Satellite (IRAS), it has not been possible to sample the spectral energy distribution (SED) of this emission over its peak, and thus to break the degeneracy between dust temperature and mass. By complementing the IRAS spectral coverage with Planck satellite data from 100 to 857 GHz, we provide in this paper new constraints on the IR spectrum of thermal dust emission in clusters of galaxies. We achieve this by using a stacking approach for a sample of several hundred objects from the Planck cluster sample. This procedure averages out fluctuations from the IR sky, allowing us to reach a significant detection of the faint cluster contribution. We also use the large frequency range probed by Planck, together with component-separation techniques, to remove the contamination from both cosmic microwave background anisotropies and the thermal Sunyaev-Zeldovich effect (tSZ) signal, which dominate at ν ≤ 353 GHz. By excluding dominant spurious signals or systematic effects, averaged detections are reported at frequencies 353 GHz ≤ ν ≤ 5000 GHz. We confirm the presence of dust in clusters of galaxies at low and intermediate redshifts, yielding an SED with a shape similar to that of the Milky Way. Planck's resolution does not allow us to investigate the detailed spatial distribution of this emission (e.g. whether it comes from intergalactic dust or simply the dust content of the cluster galaxies), but the radial distribution of the emission appears to follow that of the stacked SZ signal, and thus the extent of the clusters. Finally, the recovered SED allows us to constrain the dust mass responsible for the signal and its temperature.
Chirped Pulse Rotational Spectroscopy of a Single THUJONE+WATER Sample
NASA Astrophysics Data System (ADS)
Kisiel, Zbigniew; Perez, Cristobal; Schnell, Melanie
2016-06-01
Rotational spectroscopy of natural products dates back over 35 years, to when six different species, including thujone, were investigated. Nevertheless, the technique of low-resolution microwave spectroscopy employed therein allowed determination of only a single conformational parameter. Advances in sensitivity and resolution made possible by supersonic expansion techniques of rotational spectroscopy have enabled much more detailed studies, such that, for example, the structure of camphor and then the structures of multiple clusters of camphor with water were determined. We revisited the rotational spectrum of the well-known thujone molecule by using the chirped pulse spectrometer in Hamburg. The spectrum of a single thujone sample was recorded with an admixture of ¹⁸O-enriched water and was successively analysed using an array of techniques, including the AUTOFIT program, the AABS package and the STRFIT program. We have, so far, been able to assign rotational transitions of α-thujone, β-thujone, another thujone isomer, fenchone, and several thujone-water clusters in the spectrum of this single sample. Natural abundance molecular populations were sufficient to determine precise heavy atom backbones of thujone and fenchone, and H₂¹⁸O enrichment delivered water molecule orientations in the hydrated clusters. An overview of these results will be presented. References: Z. Kisiel, A. C. Legon, JACS 100, 8166 (1978); Z. Kisiel, O. Desyatnyk, E. Białkowska-Jaworska, L. Pszczółkowski, PCCP 5, 820 (2003); C. Pérez, A. Krin, A. L. Steber, J. C. López, Z. Kisiel, M. Schnell, J. Phys. Chem. Lett. 7, 154 (2016); N. A. Seifert, I. A. Finneran, C. Perez, et al., J. Mol. Spectrosc. 312, 12 (2015); Z. Kisiel, L. Pszczółkowski, B. J. Drouin, et al., J. Mol. Spectrosc. 280, 134 (2012); Z. Kisiel, J. Mol. Spectrosc. 218, 58 (2003).
NASA Astrophysics Data System (ADS)
Salimi, F.; Ristovski, Z.; Mazaheri, M.; Laiman, R.; Crilley, L. R.; He, C.; Clifford, S.; Morawska, L.
2014-06-01
Long-term measurements of particle number size distribution (PNSD) produce a very large number of observations and their analysis requires an efficient approach in order to produce results in the least possible time and with maximum accuracy. Clustering techniques are a family of sophisticated methods that have recently been employed to analyse PNSD data; however, very little information is available comparing the performance of different clustering techniques on PNSD data. This study aims to apply several clustering techniques (i.e. K-means, PAM, CLARA and SOM) to PNSD data, in order to identify and apply the optimum technique to PNSD data measured at 25 sites across Brisbane, Australia. A new method, based on the Generalised Additive Model (GAM) with a basis of penalised B-splines, was proposed to parameterise the PNSD data and the temporal weight of each cluster was also estimated using the GAM. In addition, each cluster was associated with its possible source based on the results of this parameterisation, together with the characteristics of each cluster. The performances of four clustering techniques were compared using the Dunn index and silhouette width validation values and the K-means technique was found to have the highest performance, with five clusters being the optimum. Therefore, five clusters were found within the data using the K-means technique. The diurnal occurrence of each cluster was used together with other air quality parameters, temporal trends and the physical properties of each cluster, in order to attribute each cluster to its source and origin. The five clusters were attributed to three major sources and origins, including regional background particles, photochemically induced nucleated particles and vehicle generated particles. Overall, clustering was found to be an effective technique for attributing each particle size spectrum to its source and the GAM was suitable to parameterise the PNSD data. These two techniques can help researchers immensely in analysing PNSD data for characterisation and source apportionment purposes.
NASA Astrophysics Data System (ADS)
Salimi, F.; Ristovski, Z.; Mazaheri, M.; Laiman, R.; Crilley, L. R.; He, C.; Clifford, S.; Morawska, L.
2014-11-01
Long-term measurements of particle number size distribution (PNSD) produce a very large number of observations and their analysis requires an efficient approach in order to produce results in the least possible time and with maximum accuracy. Clustering techniques are a family of sophisticated methods that have been recently employed to analyse PNSD data; however, very little information is available comparing the performance of different clustering techniques on PNSD data. This study aims to apply several clustering techniques (i.e. K means, PAM, CLARA and SOM) to PNSD data, in order to identify and apply the optimum technique to PNSD data measured at 25 sites across Brisbane, Australia. A new method, based on the Generalised Additive Model (GAM) with a basis of penalised B-splines, was proposed to parameterise the PNSD data and the temporal weight of each cluster was also estimated using the GAM. In addition, each cluster was associated with its possible source based on the results of this parameterisation, together with the characteristics of each cluster. The performances of four clustering techniques were compared using the Dunn index and Silhouette width validation values and the K means technique was found to have the highest performance, with five clusters being the optimum. Therefore, five clusters were found within the data using the K means technique. The diurnal occurrence of each cluster was used together with other air quality parameters, temporal trends and the physical properties of each cluster, in order to attribute each cluster to its source and origin. The five clusters were attributed to three major sources and origins, including regional background particles, photochemically induced nucleated particles and vehicle generated particles. Overall, clustering was found to be an effective technique for attributing each particle size spectrum to its source and the GAM was suitable to parameterise the PNSD data. These two techniques can help researchers immensely in analysing PNSD data for characterisation and source apportionment purposes.
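As a rough sketch of how such a clustering comparison can be set up in practice (not the authors' code), the snippet below runs K-means over a range of cluster counts on a placeholder PNSD matrix and scores each solution with the silhouette width and a simple Dunn index; the data layout, the random data and the parameter choices are assumptions for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from scipy.spatial.distance import pdist, squareform

def dunn_index(X, labels):
    """Ratio of the smallest inter-cluster distance to the largest
    intra-cluster diameter (higher is better)."""
    D = squareform(pdist(X))
    clusters = np.unique(labels)
    intra = max(D[np.ix_(labels == c, labels == c)].max() for c in clusters)
    inter = min(D[np.ix_(labels == a, labels == b)].min()
                for i, a in enumerate(clusters) for b in clusters[i + 1:])
    return inter / intra

# X: n_observations x n_size_bins matrix of (normalised) PNSD spectra
rng = np.random.default_rng(0)
X = rng.lognormal(size=(500, 32))           # placeholder for real PNSD data

for k in range(2, 9):                        # scan candidate cluster numbers
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, silhouette_score(X, labels), dunn_index(X, labels))
```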
Kappa statistic for clustered matched-pair data.
Yang, Zhao; Zhou, Ming
2014-07-10
The kappa statistic is widely used to assess the agreement between two procedures for independent matched-pair data. For matched-pair data collected in clusters, we propose, on the basis of the delta method and sampling techniques, a nonparametric variance estimator for the kappa statistic that requires no assumptions about the within-cluster correlation structure or the underlying distribution. The results of an extensive Monte Carlo simulation study demonstrate that the proposed kappa statistic provides consistent estimation and the proposed variance estimator behaves reasonably well for at least a moderately large number of clusters (e.g., K ≥ 50). Compared with the variance estimator ignoring dependence within a cluster, the proposed variance estimator performs better in maintaining the nominal coverage probability when the intra-cluster correlation is fair (ρ ≥ 0.3), with more pronounced improvement when ρ is further increased. To illustrate the practical application of the proposed estimator, we analyze two real data examples of clustered matched-pair data. Copyright © 2014 John Wiley & Sons, Ltd.
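The paper's delta-method, cluster-adjusted variance estimator is not reproduced here. As a minimal reference point only, the sketch below computes the kappa point estimate itself (which takes the same form whether or not the pairs are clustered) on made-up binary ratings; the toy data and function name are assumptions.

```python
import numpy as np

def cohen_kappa(ratings_a, ratings_b):
    """Point estimate of the kappa statistic for matched-pair ratings."""
    a, b = np.asarray(ratings_a), np.asarray(ratings_b)
    categories = np.union1d(a, b)
    po = np.mean(a == b)                                   # observed agreement
    pe = sum(np.mean(a == c) * np.mean(b == c)             # chance agreement
             for c in categories)
    return (po - pe) / (1.0 - pe)

# Toy example: binary ratings from two procedures on the same ten subjects.
proc1 = np.array([1, 1, 0, 1, 0, 1, 1, 0, 0, 1])
proc2 = np.array([1, 0, 0, 1, 0, 1, 1, 1, 0, 1])
print(cohen_kappa(proc1, proc2))
```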
Albetel, Angela-Nadia; Outten, Caryn E
2018-01-01
Monothiol glutaredoxins (Grxs) with a conserved Cys-Gly-Phe-Ser (CGFS) active site are iron-sulfur (Fe-S) cluster-binding proteins that interact with a variety of partner proteins and perform crucial roles in iron metabolism including Fe-S cluster transfer, Fe-S cluster repair, and iron signaling. Various analytical and spectroscopic methods are currently being used to monitor and characterize glutaredoxin Fe-S cluster-dependent interactions at the molecular level. The electronic, magnetic, and vibrational properties of the protein-bound Fe-S cluster provide a convenient handle to probe the structure, function, and coordination chemistry of Grx complexes. However, some limitations arise from sample preparation requirements, complexity of individual techniques, or the necessity for combining multiple methods in order to achieve a complete investigation. In this chapter, we focus on the use of UV-visible circular dichroism spectroscopy as a fast and simple initial approach for investigating glutaredoxin Fe-S cluster-dependent interactions. © 2018 Elsevier Inc. All rights reserved.
Kitsos, Christine M; Bhamidipati, Phani; Melnikova, Irena; Cash, Ethan P; McNulty, Chris; Furman, Julia; Cima, Michael J; Levinson, Douglas
2007-01-01
This study examined whether hierarchical clustering could be used to detect cell states induced by treatment combinations that were generated through automation and high-throughput (HT) technology. Data-mining techniques were used to analyze the large experimental data sets to determine whether nonlinear, non-obvious responses could be extracted from the data. Unary, binary, and ternary combinations of pharmacological factors (examples of stimuli) were used to induce differentiation of HL-60 cells using a HT automated approach. Cell profiles were analyzed by incorporating hierarchical clustering methods on data collected by flow cytometry. Data-mining techniques were used to explore the combinatorial space for nonlinear, unexpected events. Additional small-scale, follow-up experiments were performed on cellular profiles of interest. Multiple, distinct cellular profiles were detected using hierarchical clustering of expressed cell-surface antigens. Data-mining of this large, complex data set retrieved cases of both factor dominance and cooperativity, as well as atypical cellular profiles. Follow-up experiments found that treatment combinations producing "atypical cell types" made those cells more susceptible to apoptosis. Conclusions: Hierarchical clustering and other data-mining techniques were applied to analyze large data sets from HT flow cytometry. From each sample, the data set was filtered and used to define discrete, usable states that were then related back to their original formulations. Analysis of resultant cell populations induced by a multitude of treatments identified unexpected phenotypes and nonlinear response profiles.
HST observations of globular clusters in M 31. 1: Surface photometry of 13 objects
NASA Technical Reports Server (NTRS)
Pecci, F. Fusi; Battistini, P.; Bendinelli, O.; Bonoli, F.; Cacciari, C.; Djorgovski, S.; Federici, L.; Ferraro, F. R.; Parmeggiani, G.; Weir, N.
1994-01-01
We present the initial results of a study of globular clusters in M 31, using the Faint Object Camera (FOC) on the Hubble Space Telescope (HST). The sample of objects consists of 13 clusters spanning a range of properties. Three independent image deconvolution techniques were used in order to compensate for the optical problems of the HST, leading to mutually fully consistent results. We present detailed tests and comparisons to determine the reliability and limits of these deconvolution methods, and conclude that high-quality surface photometry of M 31 globulars is possible with the HST data. Surface brightness profiles have been extracted, and core radii, half-light radii, and central surface brightness values have been measured for all of the clusters in the sample. Their comparison with the values from ground-based observations indicates the latter to be systematically and strongly biased by seeing effects, as may be expected. A comparison of the structural parameters with those of the Galactic globulars shows that the structural properties of the M 31 globulars are very similar to those of their Galactic counterparts. A candidate for a post-core-collapse cluster, Bo 343 = G 105, has already been identified from these data; this is the first such detection in the M 31 globular cluster system.
Cluster Models of Metal-Seeded Energetic Materials
1997-01-31
cannot be formed by this plasma chemistry because the metals are less reactive. Plasma chemistry reactions for these metals lead to addition to... plasma chemistry method, but they are produced readily from composite sample (metal film on carbon rod) vaporization. Another technique we have used with
In vivo testing for gold nanoparticle toxicity.
Simpson, Carrie A; Huffman, Brian J; Cliffel, David E
2013-01-01
A technique for measuring the toxicity of nanomaterials using a murine model is described. Blood samples are collected via submandibular bleeding while urine samples are collected on cellophane sheets. Both biosamples are then analyzed by inductively coupled plasma optical emission spectroscopy (ICP-OES) for nanotoxicity. Blood samples are further tested for immunological response using a standard Coulter counter. The major organs of interest for filtration are also digested and analyzed via ICP-OES, producing useful information regarding target specificity of the nanomaterial of interest. Collection of the biosamples and analysis afterward is detailed, and the operation of the technique is described and illustrated by analysis of the nanotoxicity of an injection of a modified tiopronin monolayer-protected cluster.
NASA Astrophysics Data System (ADS)
Burns, Jack O.; Datta, Abhirup; Hallman, Eric J.
2016-06-01
Galaxy clusters are assembled through large and small mergers which are the most energetic events ("bangs") since the Big Bang. Cluster mergers "stir" the intracluster medium (ICM) creating shocks and turbulence which are illuminated by ~Mpc-sized radio features called relics and halos. These shocks heat the ICM and are detected in x-rays via thermal emission. Disturbed morphologies in x-ray surface brightness and temperatures are direct evidence for cluster mergers. In the radio, relics (in the outskirts of the clusters) and halos (located near the cluster core) are also clear signposts of recent mergers. Our recent ENZO cosmological simulations suggest that around a merger event, radio emission peaks very sharply (and briefly) while the x-ray emission rises and decays slowly. Hence, a sample of galaxy clusters that shows both luminous x-ray emission and radio relics/halos are good candidates for very recent mergers. We are in the early stages of analyzing a unique sample of 48 galaxy clusters with (i) known radio relics and/or halos and (ii) significant archival x-ray observations (>50 ksec) from Chandra and/or XMM. We have developed a new x-ray data analysis pipeline, implemented on parallel processor supercomputers, to create x-ray surface brightness, high fidelity temperature, and pressure maps of these clusters in order to study merging activity. The temperature maps are made using three different map-making techniques: Weighted Voronoi Tessellation, Adaptive Circular Binning, and Contour Binning. In this talk, we will show preliminary results for several clusters, including Abell 2744 and the Bullet cluster. This work is supported by NASA ADAP grant NNX15AE17G.
Nagwani, Naresh Kumar; Deo, Shirish V
2014-01-01
Understanding the compressive strength of concrete is important for activities such as construction arrangement, prestressing operations, proportioning of new mixtures, and quality assurance. Regression techniques are the most widely used for prediction tasks, where the relationship between the independent variables and the dependent (prediction) variable is identified. The accuracy of regression techniques for prediction can be improved if clustering is used along with regression, since clustering along with regression ensures more accurate curve fitting between the dependent and independent variables. In this work a cluster regression technique is applied for estimating the compressive strength of concrete, and a novel state-of-the-art approach is proposed for predicting concrete compressive strength. The objective of this work is to demonstrate that clustering along with regression yields smaller prediction errors when estimating concrete compressive strength. The proposed technique consists of two major stages: in the first stage, clustering is used to group concrete data with similar characteristics, and in the second stage regression techniques are applied over these clusters (groups) to predict the compressive strength from individual clusters. It is found from experiments that clustering along with regression techniques gives the minimum errors for predicting the compressive strength of concrete; the fuzzy C-means clustering algorithm also performs better than the K-means algorithm.
Nagwani, Naresh Kumar; Deo, Shirish V.
2014-01-01
Understanding the compressive strength of concrete is important for activities such as construction arrangement, prestressing operations, proportioning of new mixtures, and quality assurance. Regression techniques are the most widely used for prediction tasks, where the relationship between the independent variables and the dependent (prediction) variable is identified. The accuracy of regression techniques for prediction can be improved if clustering is used along with regression, since clustering along with regression ensures more accurate curve fitting between the dependent and independent variables. In this work a cluster regression technique is applied for estimating the compressive strength of concrete, and a novel state-of-the-art approach is proposed for predicting concrete compressive strength. The objective of this work is to demonstrate that clustering along with regression yields smaller prediction errors when estimating concrete compressive strength. The proposed technique consists of two major stages: in the first stage, clustering is used to group concrete data with similar characteristics, and in the second stage regression techniques are applied over these clusters (groups) to predict the compressive strength from individual clusters. It is found from experiments that clustering along with regression techniques gives the minimum errors for predicting the compressive strength of concrete; the fuzzy C-means clustering algorithm also performs better than the K-means algorithm. PMID:25374939
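As a rough illustration of the two-stage cluster-regression idea described in the abstract above, the sketch below groups toy mix-design data with K-means and then fits one linear regression per cluster; the feature layout, the use of K-means rather than fuzzy C-means, and all numbers are assumptions for illustration, not the authors' implementation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
# X: mix characteristics (e.g. cement, water, aggregate, age); y: compressive strength
X = rng.uniform(size=(300, 4))
y = 30 + 40 * X[:, 0] - 25 * X[:, 1] + rng.normal(scale=2.0, size=300)

# Stage 1: group mixes with similar characteristics.
k = 3
clusterer = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)

# Stage 2: fit one regression model per cluster.
models = {c: LinearRegression().fit(X[clusterer.labels_ == c],
                                    y[clusterer.labels_ == c]) for c in range(k)}

def predict(x_new):
    c = clusterer.predict(x_new.reshape(1, -1))[0]   # route to its cluster's model
    return models[c].predict(x_new.reshape(1, -1))[0]

print(predict(X[0]))
```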
Automated cloud screening of AVHRR imagery using split-and-merge clustering
NASA Technical Reports Server (NTRS)
Gallaudet, Timothy C.; Simpson, James J.
1991-01-01
Previous methods to segment clouds from ocean in AVHRR imagery have shown varying degrees of success, with nighttime approaches being the most limited. An improved method of automatic image segmentation, the principal component transformation split-and-merge clustering (PCTSMC) algorithm, is presented and applied to cloud screening of both nighttime and daytime AVHRR data. The method combines spectral differencing, the principal component transformation, and split-and-merge clustering to sample objectively the natural classes in the data. This segmentation method is then augmented by supervised classification techniques to screen clouds from the imagery. Comparisons with other nighttime methods demonstrate its improved capability in this application. The sensitivity of the method to clustering parameters is presented; the results show that the method is insensitive to the split-and-merge thresholds.
Image processing for x-ray inspection of pistachio nuts
NASA Astrophysics Data System (ADS)
Casasent, David P.
2001-03-01
A review is provided of image processing techniques that have been applied to the inspection of pistachio nuts using X-ray images. X-ray sensors provide non-destructive internal product detail not available from other sensors. The primary concern in these data is detecting the presence of worm infestations in nuts, since they have been linked to the presence of aflatoxin. We describe new techniques for segmentation, feature selection, selection of product categories (clusters), classifier design, etc. Specific novel results include: a new segmentation algorithm to produce images of isolated product items; preferable classifier operation (the classifier with the best probability of correct recognition, Pc, is not necessarily the best); higher-order discrimination information present in standard features (thus, high-order features appear useful); and classifiers that use new cluster categories of samples achieve improved performance. Results are presented for X-ray images of pistachio nuts; however, all techniques have use in other product inspection applications.
Unsupervised classification of earth resources data.
NASA Technical Reports Server (NTRS)
Su, M. Y.; Jayroe, R. R., Jr.; Cummings, R. E.
1972-01-01
A new clustering technique is presented. It consists of two parts: (a) a sequential statistical clustering, which is essentially a sequential variance analysis, and (b) a generalized K-means clustering. In this composite clustering technique, the output of (a) is a set of initial clusters which are input to (b) for further improvement by an iterative scheme. This unsupervised composite technique was employed for the automatic classification of two sets of remote multispectral earth resource observations. The classification accuracy of the unsupervised technique is found to be comparable to that of the existing supervised maximum-likelihood classification technique.
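A minimal sketch of the composite idea, under the assumption that the sequential first stage can be approximated by a one-pass, distance-threshold assignment whose running means seed a standard K-means refinement; the threshold and placeholder data are illustrative, and the paper's exact sequential variance analysis is not reproduced.

```python
import numpy as np
from sklearn.cluster import KMeans

def sequential_seed_centers(X, threshold):
    """Single sequential pass: start a new cluster whenever a sample lies
    farther than `threshold` from every existing centre (a stand-in for the
    paper's sequential variance analysis)."""
    centers, counts = [X[0].copy()], [1]
    for x in X[1:]:
        d = [np.linalg.norm(x - c) for c in centers]
        j = int(np.argmin(d))
        if d[j] > threshold:
            centers.append(x.copy())
            counts.append(1)
        else:                                  # running-mean update of nearest centre
            counts[j] += 1
            centers[j] = centers[j] + (x - centers[j]) / counts[j]
    return np.array(centers)

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 4))                 # placeholder multispectral samples

init = sequential_seed_centers(X, threshold=2.5)   # stage (a): initial clusters
labels = KMeans(n_clusters=len(init), init=init,   # stage (b): iterative refinement
                n_init=1).fit_predict(X)
```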
On the Analysis of Clustering in an Irradiated Low Alloy Reactor Pressure Vessel Steel Weld.
Lindgren, Kristina; Stiller, Krystyna; Efsing, Pål; Thuvander, Mattias
2017-04-01
Radiation-induced clustering affects the mechanical properties, that is, the ductile-to-brittle transition temperature (DBTT), of the reactor pressure vessel (RPV) steel of nuclear power plants. The combination of low Cu and high Ni used in some RPV welds is known to further enhance the DBTT shift during long-time operation. In this study, RPV weld samples containing 0.04 at% Cu and 1.6 at% Ni were irradiated to 2.0 and 6.4 × 10²³ n/m² in the Halden test reactor. Atom probe tomography (APT) was applied to study clustering of Ni, Mn, Si, and Cu. As the clusters are in the nanometer range, APT is a very suitable technique for this type of study. From APT analyses, information about the size distribution, number density, and composition of the clusters can be obtained. However, the quantification of these attributes is not trivial. The maximum separation method (MSM) has been used to characterize the clusters, and a detailed study of the influence of the choice of MSM cluster parameters, primarily on the cluster number density, has been undertaken.
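A compact sketch of the maximum separation method's core step, assuming solute-atom positions extracted from an APT reconstruction: atoms closer than d_max are linked into the same cluster (here via DBSCAN with min_samples=1, which reduces to single-linkage chaining) and clusters with fewer than N_min atoms are discarded. The parameter values and placeholder positions are illustrative only, not the study's settings.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def maximum_separation_clusters(solute_xyz, d_max=0.5, n_min=10):
    """Link solute atoms closer than d_max (single-linkage chains) and keep
    only groups with at least n_min atoms; a minimal stand-in for MSM."""
    labels = DBSCAN(eps=d_max, min_samples=1).fit_predict(solute_xyz)
    keep = [c for c in np.unique(labels) if np.sum(labels == c) >= n_min]
    return [solute_xyz[labels == c] for c in keep]

# solute_xyz: N x 3 positions (nm) of Ni/Mn/Si/Cu atoms from an APT reconstruction
rng = np.random.default_rng(3)
solute_xyz = rng.uniform(0, 50, size=(5000, 3))      # placeholder positions
clusters = maximum_separation_clusters(solute_xyz, d_max=0.5, n_min=10)
print(len(clusters), [len(c) for c in clusters[:5]])  # number and sizes of clusters
```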
Nanoparticle formation of deposited Agn-clusters on free-standing graphene
NASA Astrophysics Data System (ADS)
Al-Hada, M.; Peters, S.; Gregoratti, L.; Amati, M.; Sezen, H.; Parisse, P.; Selve, S.; Niermann, T.; Berger, D.; Neeb, M.; Eberhardt, W.
2017-11-01
Size-selected Agn-clusters on unsupported graphene of a commercial Quantifoil sample have been investigated by surface and element-specific techniques such as transmission electron microscopy (TEM), spatially-resolved inner-shell X-ray photoelectron spectroscopy (XPS) and Auger electron spectroscopy (AES). An agglomeration of the highly mobile clusters into nm-sized Ag-nanodots of 2-3 nm is observed. Moreover, crystalline as well as non-periodic fivefold symmetric structures of the Ag-nanoparticles are evident by high-resolution TEM. Using a lognormal size-distribution as revealed by TEM, the measured positive binding energy shift of the air-exposed Ag-nanodots can be explained by the size-dependent dynamical liquid-drop model.
Large Scale Structures in the GOODS-SOUTH Field up to z~2.5
NASA Astrophysics Data System (ADS)
Trevese, D.; Castellano, M.; Salimbeni, S.; Pentericci, L.; Fiore, F.
2009-05-01
We apply a density evaluation technique based on photometric redshifts, developed by our group, to estimate galaxy space density in the deep (z450 ∼ 26) multi-wavelength GOODS-MUSIC catalogue. We find several groups and clusters in the redshift range 0.4-2.5. We present here an outline of the X-ray properties of our cluster sample as computed from the Chandra 2 Ms data. A group at z = 0.96 could be associated with an extended X-ray source, while two clusters with masses of a few times 10¹⁴ M☉ have upper limits on their X-ray emission significantly lower than expected from their optical properties.
A nonparametric clustering technique which estimates the number of clusters
NASA Technical Reports Server (NTRS)
Ramey, D. B.
1983-01-01
In applications of cluster analysis, one usually needs to determine the number of clusters, K, and the assignment of observations to each cluster. A clustering technique based on recursive application of a multivariate test of bimodality which automatically estimates both K and the cluster assignments is presented.
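The paper's multivariate bimodality test is not reproduced here. As a hedged stand-in for the same idea, the sketch below recursively splits the data along the first principal component whenever Sarle's bimodality coefficient of the projected scores exceeds the usual 5/9 benchmark, so the recursion depth fixes K automatically; the coefficient, threshold and toy data are assumptions, not the paper's test.

```python
import numpy as np
from scipy.stats import skew, kurtosis

def bimodality_coefficient(x):
    """Sarle's bimodality coefficient; values above ~5/9 suggest bimodality."""
    n = len(x)
    g1, g2 = skew(x), kurtosis(x)           # g2 is excess kurtosis
    return (g1**2 + 1) / (g2 + 3 * (n - 1)**2 / ((n - 2) * (n - 3)))

def recursive_split(X, min_size=20):
    """Recursively split along the first principal component while the
    projected data look bimodal; the number of leaves estimates K."""
    if len(X) < 2 * min_size:
        return [X]
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    proj = Xc @ vt[0]                        # first principal component scores
    if bimodality_coefficient(proj) <= 5.0 / 9.0:
        return [X]
    cut = np.median(proj)
    return (recursive_split(X[proj <= cut], min_size) +
            recursive_split(X[proj > cut], min_size))

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 1, (200, 3)), rng.normal(6, 1, (200, 3))])
print(len(recursive_split(X)))               # estimated number of clusters
```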
NASA Technical Reports Server (NTRS)
Smedes, H. W.; Linnerud, H. J.; Woolaver, L. B.; Su, M. Y.; Jayroe, R. R.
1972-01-01
Two clustering techniques were used for terrain mapping by computer of test sites in Yellowstone National Park. One test was made with multispectral scanner data using a composite technique consisting of (1) a strictly sequential statistical clustering, which is a sequential variance analysis, and (2) a generalized K-means clustering. In this composite technique, the output of (1) is a first approximation of the cluster centers. This is the input to (2), which consists of steps to improve the determination of cluster centers by iterative procedures. Another test was made using the three emulsion layers of color-infrared aerial film as a three-band spectrometer. Relative film densities were analyzed using a simple clustering technique in three-color space. Important advantages of the clustering technique over conventional supervised computer programs are that (1) human intervention, preparation time, and manipulation of data are reduced, (2) the computer map gives an unbiased indication of where best to select the reference ground control data, (3) easy-to-obtain, inexpensive film can be used, and (4) the geometric distortions can be easily rectified by simple standard photogrammetric techniques.
Performance analysis of clustering techniques over microarray data: A case study
NASA Astrophysics Data System (ADS)
Dash, Rasmita; Misra, Bijan Bihari
2018-03-01
Handling big data is one of the major issues in the field of statistical data analysis. In such investigations, cluster analysis plays a vital role in dealing with large-scale data. There are many clustering techniques with different cluster analysis approaches, but which approach suits a particular dataset is difficult to predict. To deal with this problem, a grading approach is introduced over many clustering techniques to identify a stable technique. But the grading approach depends on the characteristics of the dataset as well as on the validity indices, so a two-stage grading approach is implemented. In this study, the grading approach is implemented over five clustering techniques: hybrid swarm based clustering (HSC), k-means, partitioning around medoids (PAM), vector quantization (VQ) and agglomerative nesting (AGNES). The experimentation is conducted over five microarray datasets with seven validity indices. The finding of the grading approach that a clustering technique is significant is also confirmed by the Nemenyi post-hoc hypothesis test.
The Influence of Educational Systems on the Academic Performance of JSCE Students in Rivers State
ERIC Educational Resources Information Center
Orluwene, Goodness W.; Igwe, Benjamin N.
2015-01-01
This work is a comparative study of JSCE results between the 6-3-3-4 system (2006 & 2008) and the 9-3-4 (UBE) system (2009 & 2011) in Port Harcourt using a comparative/evaluative survey design. A cluster sampling technique was used to compose a sample of 2,487 drawn from the population of 17,139 candidates in 2006, 2008, 2009 and 2011 in…
NASA Astrophysics Data System (ADS)
He, Cunfu; Yang, Meng; Liu, Xiucheng; Wang, Xueqian; Wu, Bin
2017-11-01
The magnetic hysteresis behaviours of ferromagnetic materials vary with heat-treatment-induced microstructural changes. In this study, the minor hysteresis loop measurement technique was used to quantitatively characterise the case depth in two types of medium carbon steel. Firstly, high-frequency induction quenching was applied to rod samples to increase the volume fraction of hard martensite relative to the soft ferrite/pearlite (or sorbite) in the sample surface. In order to determine the effective and total case depth, a complementary error function was employed to fit the measured hardness-depth profiles of the induction-hardened samples. The cluster of minor hysteresis loops, together with the tangential magnetic field (TMF), was recorded from all the samples, and a comparative study was conducted among three kinds of magnetic parameters that were sensitive to the variation of case depth. Compared to the parameters extracted from an individual minor loop and the distortion factor of the TMF, the magnitude of the third-order harmonic of the TMF was more suitable for indicating the variation in case depth. Two new minor-loop coefficients were introduced by combining two magnetic parameters with cumulative statistics of the cluster of minor loops. The experimental results showed that the two coefficients varied linearly and monotonically with the case depth within the carefully selected magnetisation region.
Towards Effective Clustering Techniques for the Analysis of Electric Power Grids
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hogan, Emilie A.; Cotilla Sanchez, Jose E.; Halappanavar, Mahantesh
2013-11-30
Clustering is an important data analysis technique with numerous applications in the analysis of electric power grids. Standard clustering techniques are oblivious to the rich structural and dynamic information available for power grids. Therefore, by exploiting the inherent topological and electrical structure in the power grid data, we propose new methods for clustering with applications to model reduction, locational marginal pricing, phasor measurement unit (PMU or synchrophasor) placement, and power system protection. We focus our attention on model reduction for analysis based on time-series information from synchrophasor measurement devices, and spectral techniques for clustering. By comparing different clustering techniques on two instances of realistic power grids we show that the solutions are related and therefore one could leverage that relationship for a computational advantage. Thus, by contrasting different clustering techniques we make a case for exploiting structure inherent in the data with implications for several domains including power systems.
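As a generic illustration of the spectral side of such an analysis (not the specific methods proposed in the report), the sketch below clusters the nodes of a weighted graph, which could encode, for example, electrical distances or line admittances between buses, by embedding them with the leading eigenvectors of the normalized graph Laplacian and running K-means on that embedding; the random adjacency matrix is a placeholder.

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clusters(W, k):
    """Cluster graph nodes from a symmetric weighted adjacency matrix W by
    embedding them with the k smallest eigenvectors of the normalized
    Laplacian and running K-means on the embedding."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L)
    U = vecs[:, :k]                              # smallest-eigenvalue eigenvectors
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(U)

rng = np.random.default_rng(5)
A = rng.random((30, 30))
W = (A + A.T) / 2                                # symmetrise the placeholder graph
np.fill_diagonal(W, 0.0)
print(spectral_clusters(W, k=3))
```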
NASA Technical Reports Server (NTRS)
Glick, B. J.
1985-01-01
Techniques for classifying objects into groups or classes go under many different names including, most commonly, cluster analysis. Mathematically, the general problem is to find a best mapping of objects into an index set consisting of class identifiers. When an a priori grouping of objects exists, the process of deriving the classification rules from samples of classified objects is known as discrimination. When such rules are applied to objects of unknown class, the process is denoted classification. The specific problem addressed involves the group classification of a set of objects that are each associated with a series of measurements (ratio, interval, ordinal, or nominal levels of measurement). Each measurement produces one variable in a multidimensional variable space. Cluster analysis techniques are reviewed and methods for including geographic location, distance measures, and spatial pattern (distribution) as parameters in clustering are examined. For the case of patterning, measures of spatial autocorrelation are discussed in terms of the kind of data (nominal, ordinal, or interval scaled) to which they may be applied.
Eye-gaze determination of user intent at the computer interface
DOE Office of Scientific and Technical Information (OSTI.GOV)
Goldberg, J.H.; Schryver, J.C.
1993-12-31
Determination of user intent at the computer interface through eye-gaze monitoring can significantly aid applications for the disabled, as well as telerobotics and process control interfaces. Whereas current eye-gaze control applications are limited to object selection and x/y gazepoint tracking, a methodology was developed here to discriminate a more abstract interface operation: zooming in or out. This methodology first collects samples of eye-gaze location looking at controlled stimuli, at 30 Hz, just prior to a user's decision to zoom. The sample is broken into data frames, or temporal snapshots. Within a data frame, all spatial samples are connected into a minimum spanning tree, then clustered, according to user-defined parameters. Each cluster is mapped to one in the prior data frame, and statistics are computed from each cluster. These characteristics include cluster size, position, and pupil size. A multiple discriminant analysis uses these statistics both within and between data frames to formulate optimal rules for assigning the observations into zoom-in, zoom-out, or no-zoom conditions. The statistical procedure effectively generates heuristics for future assignments, based upon these variables. Future work will enhance the accuracy and precision of the modeling technique, and will empirically test users in controlled experiments.
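A minimal sketch of the minimum-spanning-tree clustering step described above, assuming 2-D gaze samples from one temporal data frame: the MST is built with SciPy, edges longer than a cutoff are removed, and the remaining connected components are taken as clusters. The cutoff value and the toy gaze points are assumptions, and the subsequent discriminant analysis is not shown.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components

def mst_clusters(points, edge_cutoff):
    """Cluster 2-D gaze samples by cutting long edges out of their minimum
    spanning tree; each remaining connected component is one cluster."""
    D = squareform(pdist(points))
    mst = minimum_spanning_tree(D).toarray()
    mst[mst > edge_cutoff] = 0.0               # drop edges longer than the cutoff
    n_components, labels = connected_components(mst, directed=False)
    return labels

# Toy gaze samples (pixels) from one 30 Hz data frame; the cutoff is illustrative.
rng = np.random.default_rng(6)
gaze = np.vstack([rng.normal((200, 200), 10, (20, 2)),
                  rng.normal((600, 400), 10, (20, 2))])
labels = mst_clusters(gaze, edge_cutoff=60.0)
print(np.bincount(labels))                     # cluster sizes
```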
Yin, Zhong; Zhang, Jianhua
2014-07-01
Identifying the abnormal changes of mental workload (MWL) over time is quite crucial for preventing the accidents due to cognitive overload and inattention of human operators in safety-critical human-machine systems. It is known that various neuroimaging technologies can be used to identify the MWL variations. In order to classify MWL into a few discrete levels using representative MWL indicators and small-sized training samples, a novel EEG-based approach by combining locally linear embedding (LLE), support vector clustering (SVC) and support vector data description (SVDD) techniques is proposed and evaluated by using the experimentally measured data. The MWL indicators from different cortical regions are first elicited by using the LLE technique. Then, the SVC approach is used to find the clusters of these MWL indicators and thereby to detect MWL variations. It is shown that the clusters can be interpreted as the binary class MWL. Furthermore, a trained binary SVDD classifier is shown to be capable of detecting slight variations of those indicators. By combining the two schemes, a SVC-SVDD framework is proposed, where the clear-cut (smaller) cluster is detected by SVC first and then a subsequent SVDD model is utilized to divide the overlapped (larger) cluster into two classes. Finally, three-class MWL levels (low, normal and high) can be identified automatically. The experimental data analysis results are compared with those of several existing methods. It has been demonstrated that the proposed framework can lead to acceptable computational accuracy and has the advantages of both unsupervised and supervised training strategies. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Adam, R.; Ade, P. A. R.; Aghanim, N.
Although infrared (IR) overall dust emission from clusters of galaxies has been statistically detected using data from the Infrared Astronomical Satellite (IRAS), it has not been possible to sample the spectral energy distribution (SED) of this emission over its peak, and thus to break the degeneracy between dust temperature and mass. By complementing the IRAS spectral coverage with Planck satellite data from 100 to 857 GHz, we provide in this paper new constraints on the IR spectrum of thermal dust emission in clusters of galaxies. We achieve this by using a stacking approach for a sample of several hundred objects from the Planck cluster sample. This procedure averages out fluctuations from the IR sky, allowing us to reach a significant detection of the faint cluster contribution. We also use the large frequency range probed by Planck, together with component-separation techniques, to remove the contamination from both cosmic microwave background anisotropies and the thermal Sunyaev-Zeldovich effect (tSZ) signal, which dominate at ν ≤ 353 GHz. By excluding dominant spurious signals or systematic effects, averaged detections are reported at frequencies 353 GHz ≤ ν ≤ 5000 GHz. We confirm the presence of dust in clusters of galaxies at low and intermediate redshifts, yielding an SED with a shape similar to that of the Milky Way. Planck’s resolution does not allow us to investigate the detailed spatial distribution of this emission (e.g. whether it comes from intergalactic dust or simply the dust content of the cluster galaxies), but the radial distribution of the emission appears to follow that of the stacked SZ signal, and thus the extent of the clusters. Finally, the recovered SED allows us to constrain the dust mass responsible for the signal and its temperature.
Xu, Xin; Huang, Zhenhua; Graves, Daniel; Pedrycz, Witold
2014-12-01
In order to deal with the sequential decision problems with large or continuous state spaces, feature representation and function approximation have been a major research topic in reinforcement learning (RL). In this paper, a clustering-based graph Laplacian framework is presented for feature representation and value function approximation (VFA) in RL. By making use of clustering-based techniques, that is, K-means clustering or fuzzy C-means clustering, a graph Laplacian is constructed by subsampling in Markov decision processes (MDPs) with continuous state spaces. The basis functions for VFA can be automatically generated from spectral analysis of the graph Laplacian. The clustering-based graph Laplacian is integrated with a class of approximation policy iteration algorithms called representation policy iteration (RPI) for RL in MDPs with continuous state spaces. Simulation and experimental results show that, compared with previous RPI methods, the proposed approach needs fewer sample points to compute an efficient set of basis functions and the learning control performance can be improved for a variety of parameter settings.
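A rough sketch of the clustering-based graph Laplacian construction, under assumed details (K-means centres, a symmetrised k-nearest-neighbour graph, the combinatorial Laplacian): the smallest-eigenvalue eigenvectors evaluated at the centres serve as candidate basis functions for value-function approximation. The neighbour count and other parameters are illustrative, not the paper's settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import kneighbors_graph

def laplacian_basis(states, n_centers=50, n_neighbors=5, n_basis=10):
    """Subsample states with K-means, connect the centres into a kNN graph,
    and return the Laplacian eigenvectors with smallest eigenvalues as basis
    functions evaluated at the centres."""
    centers = KMeans(n_clusters=n_centers, n_init=10,
                     random_state=0).fit(states).cluster_centers_
    W = kneighbors_graph(centers, n_neighbors=n_neighbors,
                         mode='connectivity', include_self=False).toarray()
    W = np.maximum(W, W.T)                     # symmetrise the adjacency matrix
    L = np.diag(W.sum(axis=1)) - W             # combinatorial graph Laplacian
    vals, vecs = np.linalg.eigh(L)
    return centers, vecs[:, :n_basis]          # basis values at the centres

# states: samples from a continuous state space (placeholder data here)
rng = np.random.default_rng(7)
states = rng.uniform(-1, 1, size=(2000, 2))
centers, basis = laplacian_basis(states)
print(basis.shape)                             # (n_centers, n_basis)
```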
S-CNN: Subcategory-aware convolutional networks for object detection.
Chen, Tao; Lu, Shijian; Fan, Jiayuan
2017-09-26
The marriage between the deep convolutional neural network (CNN) and region proposals has made breakthroughs for object detection in recent years. While the discriminative object features are learned via a deep CNN for classification, the large intra-class variation and deformation still limit the performance of the CNN based object detection. We propose a subcategory-aware CNN (S-CNN) to solve the object intra-class variation problem. In the proposed technique, the training samples are first grouped into multiple subcategories automatically through a novel instance sharing maximum margin clustering process. A multi-component Aggregated Channel Feature (ACF) detector is then trained to produce more latent training samples, where each ACF component corresponds to one clustered subcategory. The produced latent samples together with their subcategory labels are further fed into a CNN classifier to filter out false proposals for object detection. An iterative learning algorithm is designed for the joint optimization of image subcategorization, multi-component ACF detector, and subcategory-aware CNN classifier. Experiments on INRIA Person dataset, Pascal VOC 2007 dataset and MS COCO dataset show that the proposed technique clearly outperforms the state-of-the-art methods for generic object detection.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bellini, A.; Anderson, J.; Van der Marel, R. P.
We present the first study of high-precision internal proper motions (PMs) in a large sample of globular clusters, based on Hubble Space Telescope (HST) data obtained over the past decade with the ACS/WFC, ACS/HRC, and WFC3/UVIS instruments. We determine PMs for over 1.3 million stars in the central regions of 22 clusters, with a median number of ∼60,000 stars per cluster. These PMs have the potential to significantly advance our understanding of the internal kinematics of globular clusters by extending past line-of-sight (LOS) velocity measurements to two- or three-dimensional velocities, lower stellar masses, and larger sample sizes. We describe themore » reduction pipeline that we developed to derive homogeneous PMs from the very heterogeneous archival data. We demonstrate the quality of the measurements through extensive Monte Carlo simulations. We also discuss the PM errors introduced by various systematic effects and the techniques that we have developed to correct or remove them to the extent possible. We provide in electronic form the catalog for NGC 7078 (M 15), which consists of 77,837 stars in the central 2.'4. We validate the catalog by comparison with existing PM measurements and LOS velocities and use it to study the dependence of the velocity dispersion on radius, stellar magnitude (or mass) along the main sequence, and direction in the plane of the sky (radial or tangential). Subsequent papers in this series will explore a range of applications in globular-cluster science and will also present the PM catalogs for the other sample clusters.« less
Conductive Atomic Force Microscopy | Materials Science | NREL
[NREL web page on conductive atomic force microscopy (C-AFM). The surviving page text notes that an advantage of C-AFM over other electrical measurement techniques is its high spatial resolution. Figure captions: high-resolution AFM images of a sample semiconductor device showing white puff-like clusters on a dark background.]
The Relationship between Affective and Social Isolation among Undergraduate Students
ERIC Educational Resources Information Center
Alghraibeh, Ahmad M.; Juieed, Noof M. Bni
2018-01-01
We examined the correlation between social isolation and affective isolation among 457 undergraduate students using a stratified cluster sampling technique. Participants comprised 221 men and 236 women, all of whom were either first- or fourth-year students enrolled in various majors at King Saud University. Means, standard deviations, Pearson…
Mathematical Intelligence and Mathematical Creativity: A Causal Relationship
ERIC Educational Resources Information Center
Tyagi, Tarun Kumar
2017-01-01
This study investigated the causal relationship between mathematical creativity and mathematical intelligence. Four hundred thirty-nine 8th-grade students, aged 11 to 14 years, were included in the sample of this study by a random cluster technique, on which mathematical creativity and a Hindi adaptation of a mathematical intelligence test…
A statistical software tool, Stream Fish Community Predictor (SFCP), based on EMAP stream sampling in the mid-Atlantic Highlands, was developed to predict stream fish communities using stream and watershed characteristics. Step one in the tool development was a cluster analysis t...
Data Mining Methods for Recommender Systems
NASA Astrophysics Data System (ADS)
Amatriain, Xavier; Jaimes*, Alejandro; Oliver, Nuria; Pujol, Josep M.
In this chapter, we give an overview of the main Data Mining techniques used in the context of Recommender Systems. We first describe common preprocessing methods such as sampling or dimensionality reduction. Next, we review the most important classification techniques, including Bayesian Networks and Support Vector Machines. We describe the k-means clustering algorithm and discuss several alternatives. We also present association rules and related algorithms for an efficient training process. In addition to introducing these techniques, we survey their uses in Recommender Systems and present cases where they have been successfully applied.
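Since the chapter explicitly walks through the k-means clustering algorithm, a minimal NumPy implementation of the standard Lloyd iteration is sketched below (random initialization for brevity; a production recommender would normally rely on a library implementation with k-means++ seeding). The toy feature matrix is an assumption.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm: assign each point to its nearest centre,
    then move each centre to the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Example: cluster user profile features into three groups (toy data).
rng = np.random.default_rng(8)
X = np.vstack([rng.normal(m, 0.3, (100, 5)) for m in (0.0, 2.0, 4.0)])
labels, centers = kmeans(X, k=3)
print(np.bincount(labels))
```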
Multivariate time series clustering on geophysical data recorded at Mt. Etna from 1996 to 2003
NASA Astrophysics Data System (ADS)
Di Salvo, Roberto; Montalto, Placido; Nunnari, Giuseppe; Neri, Marco; Puglisi, Giuseppe
2013-02-01
Time series clustering is an important task in data analysis, used to extract implicit, previously unknown, and potentially useful information from a large collection of data. Finding useful similar trends in multivariate time series represents a challenge in several areas, including geophysical and environmental research. While traditional time series analysis methods deal only with univariate time series, multivariate time series analysis is a more suitable approach in fields of research where different kinds of data are available. Moreover, conventional time series clustering techniques do not provide the desired results for geophysical datasets due to the huge amount of data, whose sampling rate differs according to the nature of the signal. In this paper, a novel approach to geophysical multivariate time series clustering is proposed using dynamic time series segmentation and Self Organizing Maps techniques. This method allows finding coupling among trends of different geophysical data recorded by monitoring networks at Mt. Etna spanning from 1996 to 2003, when the transition from summit eruptions to flank eruptions occurred. This information can be used to carry out a more careful evaluation of the state of the volcano and to define potential hazard assessment at Mt. Etna.
NASA Astrophysics Data System (ADS)
Somogyi, Andrea; Medjoubi, Kadda; Sancho-Tomas, Maria; Visscher, P. T.; Baranton, Gil; Philippot, Pascal
2017-09-01
The understanding of real complex geological, environmental and geo-biological processes depends increasingly on in-depth, non-invasive study of chemical composition and morphology. In this paper we used scanning hard X-ray nanoprobe techniques to study the elemental composition, morphology and As speciation in complex, highly heterogeneous geological samples. Multivariate statistical analysis techniques, such as principal component analysis and clustering, were used for data interpretation. These measurements revealed the quantitative and valence-state inhomogeneity of As and its relation to the total compositional and morphological variation of the sample at sub-μm scales.
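A generic sketch of the kind of multivariate workflow mentioned above, assuming the nanoprobe map can be arranged as a (ny, nx, channels) data cube: each pixel spectrum is compressed with PCA and the scores are clustered with K-means to produce a label ("phase") map. The shapes and random data are placeholders, not the authors' processing chain.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# cube: (ny, nx, n_channels) fluorescence map; placeholder random counts here
rng = np.random.default_rng(9)
ny, nx, nch = 64, 64, 40
cube = rng.poisson(5.0, size=(ny, nx, nch)).astype(float)

pixels = cube.reshape(-1, nch)                        # one spectrum per pixel
scores = PCA(n_components=5).fit_transform(pixels)    # compress the spectra
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(scores)
phase_map = labels.reshape(ny, nx)                    # cluster-label image
print(np.bincount(labels))                            # pixels per cluster
```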
Huang, Guanxing; Chen, Zongyu; Liu, Fan; Sun, Jichao; Wang, Jincui
2014-11-01
Anthropogenic factors resulting from urbanization may affect groundwater As in urbanized areas. Groundwater samples from Guangzhou city (South China) were collected for As and other parameter analysis, in order to assess the impact of urbanization and natural processes on the As distribution in aquifers. Nearly 25.5% of groundwater samples were above the WHO drinking water standard for As, and the As concentrations in the granular aquifer (GA) were generally far higher than those in the fractured bedrock aquifer (FBA). Samples were classified into four clusters by using hierarchical cluster analysis. Cluster 1 is mainly located in the FBA and controlled by natural processes. Anthropogenic pollution resulting from urbanization is responsible for the high As concentrations identified in cluster 2. Clusters 3 and 4 are mainly located in the GA and controlled by both natural processes and anthropogenic factors. Three main mechanisms control the source and mobilization of groundwater As in the study area. Firstly, the interaction of water and calcareous rocks appears to be responsible for As release in the FBA. Secondly, reduction of Fe/Mn oxyhydroxides and decomposition of organic matter are probably responsible for the high As concentrations in the GA. Thirdly, during the process of urbanization, the infiltration of wastewater/leachate with a high As content is likely to be the main source of groundwater As, while NO3- contamination diminishes groundwater As.
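A minimal sketch of a hierarchical cluster analysis of hydrochemical data of the kind used above, under assumed choices (z-scored variables, Ward linkage, a four-cluster cut); the variable set and toy data are illustrative, not the study's actual inputs.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.stats import zscore

# rows: groundwater samples; columns: e.g. As, Fe, Mn, NO3, DOC, pH (toy data)
rng = np.random.default_rng(10)
chem = rng.lognormal(size=(80, 6))

Z = linkage(zscore(chem, axis=0), method='ward')    # Ward hierarchical clustering
clusters = fcluster(Z, t=4, criterion='maxclust')   # cut the dendrogram at 4 clusters
print(np.bincount(clusters)[1:])                    # samples per cluster (labels 1..4)
```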
NASA Astrophysics Data System (ADS)
Berry, Jamal Ihsan
The desorption of biomolecules from frozen aqueous solutions on metal substrates with femtosecond laser pulses is presented for the first time. Unlike previous studies using nanosecond pulses, this approach produces high quality mass spectra of biomolecules repeatedly and reproducibly. This novel technique allows analysis of biomolecules directly from their native frozen environments. The motivation for this technique stems from molecular dynamics computer simulations comparing nanosecond and picosecond heating of water overlayers frozen on Au substrates, which demonstrate large water cluster formation and ejection upon substrate heating within ultrashort timescales. As the frozen aqueous matrix and analyte molecules are transparent at the wavelengths used, the laser energy is primarily absorbed by the substrate, causing rapid heating and explosive boiling of the ice overlayer, followed by the ejection of ice clusters and the entrained analyte molecule. Spectral characteristics at a relatively high fluence of 10 J/cm2 reveal the presence of large molecular weight metal clusters when a gold substrate is employed, with smaller cluster species observed from frozen aqueous solutions on Ag, Cu, and Pb substrates. The presence of the metal clusters is indicative of an evaporative cooling mechanism which stabilizes cluster ion formation and the ejection of biomolecules from frozen aqueous solutions. Solvation is necessary, as the presence of metal clusters and biomolecular ion signals are not observed from bare metal substrates in the absence of the frozen overlayer. The potential for mass spectrometric imaging with femtosecond LDI of frozen samples is also presented. The initial results for the characterization of peptides and peptoids linked to combinatorial beads frozen in ice and the assay of frozen brain tissue from the serotonin transporter gene knockout mouse via LDI imaging are discussed. Images of very good quality and resolution are obtained with 400 nm, 200 fs pulses at a fluence of 1.25 J/cm2. An attractive feature of this technique is that images are acquired within minutes for large sample areas. Additionally, the images obtained with femtosecond laser desorption are high in lateral resolution, with the laser capable of being focused to a spot size of 30 μm. Femtosecond laser desorption from ice is unique in that, unlike matrix assisted laser desorption ionization mass spectrometry, it does not employ an organic UV-absorbing matrix to desorb molecular ions. Instead, the laser energy is absorbed by the metal substrate, causing explosive boiling and ejection of the frozen overlayer. This approach is significant in that femtosecond laser desorption possesses the potential of analyzing and assaying biomolecules directly from their frozen native environments. This technique was developed to complement existing ToF-SIMS imaging capability for the analysis of tissue and cells, as well as other biological systems of interest.
Value-based customer grouping from large retail data sets
NASA Astrophysics Data System (ADS)
Strehl, Alexander; Ghosh, Joydeep
2000-04-01
In this paper, we propose OPOSSUM, a novel similarity-based clustering algorithm using constrained, weighted graph- partitioning. Instead of binary presence or absence of products in a market-basket, we use an extended 'revenue per product' measure to better account for management objectives. Typically the number of clusters desired in a database marketing application is only in the teens or less. OPOSSUM proceeds top-down, which is more efficient and takes a small number of steps to attain the desired number of clusters as compared to bottom-up agglomerative clustering approaches. OPOSSUM delivers clusters that are balanced in terms of either customers (samples) or revenue (value). To facilitate data exploration and validation of results we introduce CLUSION, a visualization toolkit for high-dimensional clustering problems. To enable closed loop deployment of the algorithm, OPOSSUM has no user-specified parameters. Thresholding heuristics are avoided and the optimal number of clusters is automatically determined by a search for maximum performance. Results are presented on a real retail industry data-set of several thousand customers and products, to demonstrate the power of the proposed technique.
NASA Astrophysics Data System (ADS)
Arif, Shafaq; Rafique, M. Shahid; Saleemi, Farhat; Sagheer, Riffat; Naab, Fabian; Toader, Ovidiu; Mahmood, Arshad; Rashid, Rashad; Mahmood, Mazhar
2015-09-01
Ion implantation is a useful technique to modify the surface properties of polymers without altering their bulk properties. The objective of this work is to explore the effects of 400 keV C+ ion implantation on PMMA at different fluences ranging from 5 × 10¹³ to 5 × 10¹⁵ ions/cm². The surface topographical examination of the irradiated samples has been performed using an Atomic Force Microscope (AFM). The structural and chemical modifications in implanted PMMA are examined by Raman and Fourier transform infrared (FTIR) spectroscopy, respectively. The effects of carbon ion implantation on the optical properties of PMMA are investigated by UV-Visible spectroscopy. The modifications in electrical conductivity have been measured using a four-point probe technique. AFM images reveal a decrease in the surface roughness of PMMA with an increase in ion fluence from 5 × 10¹⁴ to 5 × 10¹⁵ ions/cm². The existence of amorphization and sp²-carbon clusterization has been confirmed by Raman and FTIR spectroscopic analysis. The UV-Visible data show a prominent red shift in the absorption edge as a function of ion fluence. This shift displays a continuous reduction in the optical band gap (from 3.13 to 0.66 eV) due to the formation of carbon clusters. Moreover, the size of the carbon clusters and the photoconductivity are found to increase with increasing ion fluence. The ion-induced carbonaceous clusters are believed to be responsible for an increase in the electrical conductivity of PMMA from (2.14 ± 0.06) × 10⁻¹⁰ (Ω cm)⁻¹ (pristine) to (0.32 ± 0.01) × 10⁻⁵ (Ω cm)⁻¹ (irradiated sample).
DOE Office of Scientific and Technical Information (OSTI.GOV)
Colucci, Janet E.; Bernstein, Rebecca A.; McWilliam, Andrew
2017-01-10
We present abundances of globular clusters (GCs) in the Milky Way and Fornax from integrated-light (IL) spectra. Our goal is to evaluate the consistency of the IL analysis relative to standard abundance analysis for individual stars in those same clusters. This sample includes an updated analysis of seven clusters from our previous publications and results for five new clusters that expand the metallicity range over which our technique has been tested. We find that the [Fe/H] measured from IL spectra agrees to ∼0.1 dex for GCs with metallicities as high as [Fe/H] = −0.3, but the abundances measured for more metal-rich clusters may be underestimated. In addition we systematically evaluate the accuracy of abundance ratios, [X/Fe], for Na i, Mg i, Al i, Si i, Ca i, Ti i, Ti ii, Sc ii, V i, Cr i, Mn i, Co i, Ni i, Cu i, Y ii, Zr i, Ba ii, La ii, Nd ii, and Eu ii. The elements for which the IL analysis gives results that are most similar to analysis of individual stellar spectra are Fe i, Ca i, Si i, Ni i, and Ba ii. The elements that show the greatest differences include Mg i and Zr i. Some elements show good agreement only over a limited range in metallicity. More stellar abundance data in these clusters would enable more complete evaluation of the IL results for other important elements.
Posttraumatic Stress Disorder Following Ethnoreligious Conflict in Jos, Nigeria
ERIC Educational Resources Information Center
Obilom, Rose E.; Thacher, Tom D.
2008-01-01
In September 2001, ethnoreligious rioting occurred in Jos, Nigeria. Using a multistage cluster sampling technique, 290 respondents were recruited in Jos 7 to 9 months after the riots. Data were collected regarding demographics, exposure to traumatic events, and psychological symptoms. Resting pulse and blood pressure were recorded. A total of 145…
NASA Technical Reports Server (NTRS)
Gott, J. Richard, III; Weinberg, David H.; Melott, Adrian L.
1987-01-01
A quantitative measure of the topology of large-scale structure, the genus of density contours in a smoothed density distribution, is described and applied. For random phase (Gaussian) density fields, the mean genus per unit volume exhibits a universal dependence on threshold density, with a normalizing factor that can be calculated from the power spectrum. If large-scale structure formed from the gravitational instability of small-amplitude density fluctuations, the topology observed today on suitable scales should follow the topology in the initial conditions. The technique is illustrated by applying it to simulations of galaxy clustering in a flat universe dominated by cold dark matter. The technique is also applied to a volume-limited sample of the CfA redshift survey and to a model in which galaxies reside on the surfaces of polyhedral 'bubbles'. The topology of the evolved mass distribution and 'biased' galaxy distribution in the cold dark matter models closely matches the topology of the density fluctuations in the initial conditions. The topology of the observational sample is consistent with the random phase, cold dark matter model.
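For reference, the universal threshold dependence mentioned in the abstract is commonly quoted in the following analytic form for a smoothed Gaussian random field; the expression below is the standard textbook result rather than something reproduced from this abstract, with ν the threshold in units of the standard deviation, P(k) the power spectrum, and W(k) the smoothing window.

```latex
g(\nu) = A\,(1-\nu^{2})\,e^{-\nu^{2}/2},
\qquad
A = \frac{1}{(2\pi)^{2}}\left(\frac{\langle k^{2}\rangle}{3}\right)^{3/2},
\qquad
\langle k^{2}\rangle
  = \frac{\int k^{2}\,P(k)\,W^{2}(k)\,d^{3}k}{\int P(k)\,W^{2}(k)\,d^{3}k}
```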
NASA Astrophysics Data System (ADS)
Colucci, Janet E.; Bernstein, Rebecca A.; McWilliam, Andrew
2017-01-01
We present abundances of globular clusters (GCs) in the Milky Way and Fornax from integrated-light (IL) spectra. Our goal is to evaluate the consistency of the IL analysis relative to standard abundance analysis for individual stars in those same clusters. This sample includes an updated analysis of seven clusters from our previous publications and results for five new clusters that expand the metallicity range over which our technique has been tested. We find that the [Fe/H] measured from IL spectra agrees to ˜0.1 dex for GCs with metallicities as high as [Fe/H] = -0.3, but the abundances measured for more metal-rich clusters may be underestimated. In addition we systematically evaluate the accuracy of abundance ratios, [X/Fe], for Na I, Mg I, Al I, Si I, Ca I, Ti I, Ti II, Sc II, V I, Cr I, Mn I, Co I, Ni I, Cu I, Y II, Zr I, Ba II, La II, Nd II, and Eu II. The elements for which the IL analysis gives results that are most similar to analysis of individual stellar spectra are Fe I, Ca I, Si I, Ni I, and Ba II. The elements that show the greatest differences include Mg I and Zr I. Some elements show good agreement only over a limited range in metallicity. More stellar abundance data in these clusters would enable more complete evaluation of the IL results for other important elements. This paper includes data gathered with the 6.5 m Magellan Telescopes located at Las Campanas Observatory, Chile.
Daïnou, Kasso; Blanc-Jolivet, Céline; Degen, Bernd; Kimani, Priscilla; Ndiade-Bourobou, Dyana; Donkpegan, Armel S L; Tosso, Félicien; Kaymak, Esra; Bourland, Nils; Doucet, Jean-Louis; Hardy, Olivier J
2016-12-01
Species delimitation in closely related plant taxa can be challenging because (i) reproductive barriers are not always congruent with morphological differentiation, (ii) use of plastid sequences might lead to misinterpretation, (iii) rare species might not be sampled. We revisited molecular-based species delimitation in the African genus Milicia, currently divided into M. regia (West Africa) and M. excelsa (from West to East Africa). We used 435 samples collected in West, Central and East Africa. We genotyped SNP and SSR loci to identify genetic clusters, and sequenced two plastid regions (psbA-trnH, trnC-ycf6) and a nuclear gene (At103) to confirm species' divergence and compare species delimitation methods. We also examined whether ecological niche differentiation was congruent with sampled genetic structure. West African M. regia, West African and East African M. excelsa samples constituted three clearly distinct genetic clusters according to SNPs and SSRs. In Central Africa, two genetic clusters were consistently inferred by both types of markers, while a few scattered samples, sympatric with the preceding clusters but exhibiting leaf traits of M. regia, were grouped with the West African M. regia cluster based on SNPs or formed a distinct cluster based on SSRs. SSR results were confirmed by sequence data from the nuclear region At103 which revealed three distinct 'Fields For Recombination' corresponding to (i) West African M. regia, (ii) Central African samples with leaf traits of M. regia, and (iii) all M. excelsa samples. None of the plastid sequences provided an indication of distinct clades for the three species-like units. Niche modelling techniques yielded a significant correlation between niche overlap and genetic distance. Our genetic data suggest that three species of Milicia could be recognized. It is surprising that the occurrence of two species in Central Africa was not reported for this well-known timber tree. Overall, our work highlights the importance of collecting samples in a systematic way and the need for combining different nuclear markers when dealing with species complexes. Recognizing cryptic species is particularly crucial for economically exploited species because some hidden taxa might actually be endangered as they are merged with more abundant species.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shareghe, Mehraeen; Chi, Miaofang; Browning, Nigel D.
2011-01-01
The structures of small, robust metal clusters on a solid support were determined by a combination of spectroscopic and microscopic methods: extended X-ray absorption fine structure (EXAFS) spectroscopy, scanning transmission electron microscopy (STEM), and aberration-corrected STEM. The samples were synthesized from [Os₃(CO)₁₂] on MgO powder to provide supported clusters intended to be triosmium. The results demonstrate that the supported clusters are robust in the absence of oxidants. Conventional high-angle annular dark-field (HAADF) STEM images demonstrate a high degree of uniformity of the clusters, with root-mean-square (rms) radii of 2.03 ± 0.06 Å. The EXAFS Os–Os coordination number of 2.1 ± 0.4 confirms the presence of triosmium clusters on average and correspondingly determines an average rms cluster radius of 2.02 ± 0.04 Å. The high-resolution STEM images show the individual Os atoms in the clusters, confirming the triangular structures of their frames and determining Os–Os distances of 2.80 ± 0.14 Å, matching the EXAFS value of 2.89 ± 0.06 Å. IR and EXAFS spectra demonstrate the presence of CO ligands on the clusters. This set of techniques is recommended as optimal for detailed and reliable structural characterization of supported clusters.
Li, Siyue; Zhang, Quanfa
2010-04-15
A data matrix (4032 observations), obtained during a 2-year monitoring period (2005-2006) from 42 sites in the upper Han River, is subjected to various multivariate statistical techniques including cluster analysis, principal component analysis (PCA), factor analysis (FA), correlation analysis and analysis of variance to determine the spatial characterization of dissolved trace elements and heavy metals. Our results indicate that waters in the upper Han River are primarily polluted by Al, As, Cd, Pb, Sb and Se, and the potential pollutants include Ba, Cr, Hg, Mn and Ni. The spatial distribution of trace metals indicates that the polluted sections are mainly concentrated in the Danjiang, the Danjiangkou Reservoir catchment and the Hanzhong Plain, and the most contaminated river reach is in the Hanzhong Plain. Q-mode clustering depends on the geographical location of the sampling sites and groups the 42 sampling sites into four clusters pertaining to water quality, i.e., the Danjiang, the Danjiangkou Reservoir region (lower catchment), the upper catchment and one river in the headwaters. The headwaters, the Danjiang and lower catchment, and the upper catchment correspond to highly polluted, moderately polluted and relatively low-polluted regions, respectively. Additionally, PCA/FA and correlation analysis demonstrate that Al, Cd, Mn, Ni, Fe, Si and Sr are controlled by natural sources, whereas the other metals appear to be primarily controlled by anthropogenic origins, although geogenic sources also contribute to them. Copyright © 2009 Elsevier B.V. All rights reserved.
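The generic workflow described above (standardize a site-by-metal matrix, extract source factors with PCA, then cluster the sampling sites in Q-mode) can be sketched as follows. The metal list and data are placeholders, not the monitored Han River concentrations.

```python
# Hedged sketch of the multivariate workflow: standardize a site-by-metal
# concentration matrix, run PCA for dominant source factors, and hierarchically
# cluster the sites (Q-mode). The metal subset and random data are assumptions.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
metals = ["Al", "As", "Cd", "Pb", "Sb", "Se", "Mn", "Ni"]       # assumed subset
X = rng.lognormal(mean=0.0, sigma=1.0, size=(42, len(metals)))  # 42 sites, fake data

Xz = StandardScaler().fit_transform(np.log10(X))   # log-transform, then z-score
scores = PCA(n_components=3).fit_transform(Xz)     # source-factor scores

Z = linkage(Xz, method="ward")                     # Q-mode clustering of sites
site_groups = fcluster(Z, t=4, criterion="maxclust")  # cut into four groups
print(np.bincount(site_groups)[1:])                # sites per cluster
```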
Synchronization of world economic activity
NASA Astrophysics Data System (ADS)
Groth, Andreas; Ghil, Michael
2017-12-01
Common dynamical properties of business cycle fluctuations are studied in a sample of more than 100 countries that represent economic regions from all around the world. We apply the methodology of multivariate singular spectrum analysis (M-SSA) to identify oscillatory modes and to detect whether these modes are shared by clusters of phase- and frequency-locked oscillators. An extension of the M-SSA approach is introduced to help analyze structural changes in the cluster configuration of synchronization. With this novel technique, we are able to identify a common mode of business cycle activity across our sample, and thus point to the existence of a world business cycle. Superimposed on this mode, we further identify several major events that have markedly influenced the landscape of world economic activity in the postwar era.
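A bare-bones version of the M-SSA machinery referred to above is sketched below: each channel is embedded with a lag window, the lagged copies are stacked into an augmented trajectory matrix, and its covariance is eigen-decomposed; a shared oscillatory mode appears as a pair of nearly equal leading eigenvalues. The synthetic series stand in for detrended country-level indicators; the structural-change extension of the paper is not reproduced.

```python
# Hedged sketch of multichannel singular spectrum analysis (M-SSA) on synthetic
# stand-ins for business-cycle indicators. Window length M and the data are
# illustrative assumptions.
import numpy as np

def mssa_eigen(series, M):
    """series: (N, D) array of D channels; M: embedding window length."""
    N, D = series.shape
    K = N - M + 1
    # Augmented trajectory matrix: K windows by D*M lagged columns
    X = np.hstack([
        np.column_stack([series[j:j + K, d] for j in range(M)])
        for d in range(D)
    ])
    C = X.T @ X / K                        # lag-covariance matrix (D*M x D*M)
    eigvals, eigvecs = np.linalg.eigh(C)   # ascending order
    return eigvals[::-1], eigvecs[:, ::-1]

t = np.arange(400)
rng = np.random.default_rng(1)
shared_cycle = np.sin(2 * np.pi * t / 60.0)          # common ~60-step oscillation
series = np.column_stack([shared_cycle + 0.5 * rng.standard_normal(t.size)
                          for _ in range(5)])
eigvals, _ = mssa_eigen(series, M=80)
print(eigvals[:4] / eigvals.sum())  # a leading pair captures the shared mode
```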
Synchronization of world economic activity.
Groth, Andreas; Ghil, Michael
2017-12-01
Common dynamical properties of business cycle fluctuations are studied in a sample of more than 100 countries that represent economic regions from all around the world. We apply the methodology of multivariate singular spectrum analysis (M-SSA) to identify oscillatory modes and to detect whether these modes are shared by clusters of phase- and frequency-locked oscillators. An extension of the M-SSA approach is introduced to help analyze structural changes in the cluster configuration of synchronization. With this novel technique, we are able to identify a common mode of business cycle activity across our sample, and thus point to the existence of a world business cycle. Superimposed on this mode, we further identify several major events that have markedly influenced the landscape of world economic activity in the postwar era.
Jongenburger, I; Reij, M W; Boer, E P J; Gorris, L G M; Zwietering, M H
2011-11-15
The actual spatial distribution of microorganisms within a batch of food influences the results of sampling for microbiological testing when this distribution is non-homogeneous. In the case of pathogens being non-homogeneously distributed, it markedly influences public health risk. This study investigated the spatial distribution of Cronobacter spp. in powdered infant formula (PIF) on industrial batch-scale for both a recalled batch as well as a reference batch. Additionally, local spatial occurrence of clusters of Cronobacter cells was assessed, as well as the performance of typical sampling strategies to determine the presence of the microorganisms. The concentration of Cronobacter spp. was assessed in the course of the filling time of each batch, by taking samples of 333 g using the most probable number (MPN) enrichment technique. The occurrence of clusters of Cronobacter spp. cells was investigated by plate counting. From the recalled batch, 415 MPN samples were drawn. The expected heterogeneous distribution of Cronobacter spp. could be quantified from these samples, which showed no detectable level (detection limit of -2.52 log CFU/g) in 58% of samples, whilst in the remainder concentrations were found to be between -2.52 and 2.75 log CFU/g. The estimated average concentration in the recalled batch was -2.78 log CFU/g, with a standard deviation of 1.10 log CFU/g. The estimated average concentration in the reference batch was -4.41 log CFU/g, with 99% of the 93 samples being below the detection limit. In the recalled batch, clusters of cells occurred sporadically in 8 out of 2290 samples of 1 g taken. The two largest clusters contained 123 (2.09 log CFU/g) and 560 (2.75 log CFU/g) cells. Various sampling strategies were evaluated for the recalled batch. Taking more and smaller samples and keeping the total sampling weight constant, considerably improved the performance of the sampling plans to detect such a type of contaminated batch. Compared to random sampling, stratified random sampling improved the probability to detect the heterogeneous contamination. Copyright © 2011 Elsevier B.V. All rights reserved.
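The "more but smaller samples at constant total mass" comparison can be illustrated with a small Monte-Carlo sketch. The log-normal parameters loosely follow the batch-level mean and standard deviation reported above, but the sampling plans, the independence of sample locations and the batch count are illustrative assumptions rather than the study's design.

```python
# Hedged Monte-Carlo sketch: probability that at least one enriched sample
# contains a cell, for sampling plans of equal total mass but different sample
# number/size, given a heterogeneous (log-normal) contamination level.
import numpy as np

rng = np.random.default_rng(42)

def detection_probability(n_samples, grams_each, mean_log10, sd_log10, n_batches=20000):
    detected = 0
    for _ in range(n_batches):
        # concentration (CFU/g) drawn independently per sample location
        # (a simplification that ignores spatial clustering of cells)
        conc = 10 ** rng.normal(mean_log10, sd_log10, size=n_samples)
        cells = rng.poisson(conc * grams_each)       # cells captured per sample
        detected += np.any(cells > 0)
    return detected / n_batches

total_mass = 300.0                                   # grams, held constant
for n in (3, 10, 30):
    p = detection_probability(n, total_mass / n, mean_log10=-2.8, sd_log10=1.1)
    print(f"{n:2d} samples of {total_mass/n:5.1f} g -> P(detect) = {p:.2f}")
```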
Automated detection of very low surface brightness galaxies in the Virgo cluster
NASA Astrophysics Data System (ADS)
Prole, D. J.; Davies, J. I.; Keenan, O. C.; Davies, L. J. M.
2018-07-01
We report the automatic detection of a new sample of very low surface brightness (LSB) galaxies, likely members of the Virgo cluster. We introduce our new software, DeepScan, that has been designed specifically to detect extended LSB features automatically using the DBSCAN algorithm. We demonstrate the technique by applying it over a 5 deg² portion of the Next Generation Virgo Survey (NGVS) data to reveal 53 LSB galaxies that are candidate cluster members based on their sizes and colours. 30 of these sources are new detections despite the region having been searched specifically for LSB galaxies previously. Our final sample contains galaxies with 26.0 ≤ ⟨μe⟩ ≤ 28.5 and 19 ≤ mg ≤ 21, making them some of the faintest known in Virgo. The majority of them have colours consistent with the red sequence, and they have a mean stellar mass of 10^(6.3 ± 0.5) M⊙ assuming cluster membership. After using ProFit to fit Sérsic profiles to our detections, we find that none of the new sources has an effective radius larger than 1.5 kpc, so they do not meet the criteria for ultra-diffuse galaxy (UDG) classification and we classify them as ultra-faint dwarfs.
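The core detection idea, grouping spatially adjacent faint pixels with DBSCAN, can be sketched as follows. DeepScan itself does much more (sky modelling, de-blending, photometry); this only clusters the (x, y) coordinates of pixels surviving a faint threshold, on synthetic positions, and the eps/min_samples values are assumptions.

```python
# Hedged sketch: DBSCAN groups adjacent low-surface-brightness pixels into
# candidate sources; everything not in a dense group is labelled noise (-1).
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(3)
# Two faint, extended "sources" plus uniform background noise pixels
src1 = rng.normal(loc=(100, 100), scale=8, size=(300, 2))
src2 = rng.normal(loc=(400, 250), scale=12, size=(400, 2))
noise = rng.uniform(0, 512, size=(500, 2))
pixels = np.vstack([src1, src2, noise])

labels = DBSCAN(eps=6.0, min_samples=20).fit_predict(pixels)
n_sources = len(set(labels)) - (1 if -1 in labels else 0)
print(f"{n_sources} candidate LSB detections; {np.sum(labels == -1)} noise pixels")
```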
Phylogeny of kemenyan (Styrax sp.) from North Sumatra based on morphological characters
NASA Astrophysics Data System (ADS)
Susilowati, A.; Kholibrina, C. R.; Rachmat, H. H.; Munthe, M. A.
2018-02-01
Kemenyan is the most famous local tree species of North Sumatra. It is known as a resin producer that is very valuable for pharmaceutical, cosmetic, food-preservative and varnish applications. Historically, only two species of kemenyan were recognized, kemenyan durame and kemenyan toba, but within its natural distribution other species showing characteristics different from the previously known ones have also been found. The objectives of this research were: (1) to determine the morphological diversity of kemenyan in North Sumatra, and (2) to determine phylogenetic clustering based on morphological characters. Data were collected from direct observation and morphological characterization, based on a purposive sampling technique applied to sample trees at Pakpak Bharat, North Sumatra. Morphological characters were examined using descriptive analysis, phenotypic variability using standard deviation, and cluster analysis. The results showed that there were differences among the four kemenyan species (batak, minyak, durame and toba) according to 75 observed characters covering the flower, fruit, leaf, stem, bark, crown type, wood and resin. Based on both quantitative and qualitative characters, the kemenyan samples clustered into two groups, with kemenyan toba separated from the other cluster.
Automated detection of very Low Surface Brightness galaxies in the Virgo Cluster
NASA Astrophysics Data System (ADS)
Prole, D. J.; Davies, J. I.; Keenan, O. C.; Davies, L. J. M.
2018-04-01
We report the automatic detection of a new sample of very low surface brightness (LSB) galaxies, likely members of the Virgo cluster. We introduce our new software, DeepScan, that has been designed specifically to detect extended LSB features automatically using the DBSCAN algorithm. We demonstrate the technique by applying it over a 5 deg² portion of the Next-Generation Virgo Survey (NGVS) data to reveal 53 low surface brightness galaxies that are candidate cluster members based on their sizes and colours. 30 of these sources are new detections despite the region having been searched specifically for LSB galaxies previously. Our final sample contains galaxies with 26.0 ≤ ⟨μe⟩ ≤ 28.5 and 19 ≤ mg ≤ 21, making them some of the faintest known in Virgo. The majority of them have colours consistent with the red sequence, and they have a mean stellar mass of 10^(6.3 ± 0.5) M⊙ assuming cluster membership. After using ProFit to fit Sérsic profiles to our detections, we find that none of the new sources has an effective radius larger than 1.5 kpc, so they do not meet the criteria for ultra-diffuse galaxy (UDG) classification and we classify them as ultra-faint dwarfs.
NASA Astrophysics Data System (ADS)
Makahinda, T.
2018-02-01
The purpose of this research is to determine the effect of a technology-based learning model and of the assessment technique on thermodynamics achievement, while controlling for students' intelligence. This is an experimental study. The sample was taken through cluster random sampling, with a total of 80 student respondents. The results show that, after controlling for student intelligence, the thermodynamics achievement of students taught with the environment-utilization learning model is higher than that of students taught with animated simulations. There is also an interaction effect between the technology-based learning model and the assessment technique on students' thermodynamics achievement, after controlling for intelligence. Based on these findings, thermodynamics lectures should use the environment-based learning model combined with the project assessment technique.
acdc – Automated Contamination Detection and Confidence estimation for single-cell genome data
Lux, Markus; Kruger, Jan; Rinke, Christian; ...
2016-12-20
A major obstacle in single-cell sequencing is sample contamination with foreign DNA. To guarantee clean genome assemblies and to prevent the introduction of contamination into public databases, considerable quality control efforts are put into post-sequencing analysis. Contamination screening generally relies on reference-based methods such as database alignment or marker gene search, which limits the set of detectable contaminants to organisms with closely related reference species. As genomic coverage in the tree of life is highly fragmented, there is an urgent need for a reference-free methodology for contaminant identification in sequence data. We present acdc, a tool specifically developed to aid the quality control process of genomic sequence data. By combining supervised and unsupervised methods, it reliably detects both known and de novo contaminants. First, 16S rRNA gene prediction and the inclusion of ultrafast exact alignment techniques allow sequence classification using existing knowledge from databases. Second, reference-free inspection is enabled by the use of state-of-the-art machine learning techniques that include fast, non-linear dimensionality reduction of oligonucleotide signatures and subsequent clustering algorithms that automatically estimate the number of clusters. The latter also enables the removal of any contaminant, yielding a clean sample. Furthermore, given the data complexity and the ill-posedness of clustering, acdc employs bootstrapping techniques to provide statistically profound confidence values. Tested on a large number of samples from diverse sequencing projects, our software is able to quickly and accurately identify contamination. Results are displayed in an interactive user interface. Acdc can be run from the web as well as a dedicated command line application, which allows easy integration into large sequencing project analysis workflows. Acdc can reliably detect contamination in single-cell genome data. In addition to database-driven detection, it complements existing tools by its unsupervised techniques, which allow for the detection of de novo contaminants. Our contribution has the potential to drastically reduce the amount of resources put into these processes, particularly in the context of limited availability of reference species. As single-cell genome data continues to grow rapidly, acdc adds to the toolkit of crucial quality assurance tools.
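A skeleton of the reference-free part of such a workflow is sketched below: tetranucleotide (4-mer) signatures per contig, dimensionality reduction, and a silhouette scan to estimate the number of clusters. acdc uses more elaborate non-linear embeddings and bootstrapped confidence values; the sequences, PCA step and k range here are assumptions for illustration only.

```python
# Hedged sketch: oligonucleotide signatures + clustering with an automatically
# estimated cluster number (via silhouette), on made-up sequences.
import numpy as np
from itertools import product
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

KMERS = ["".join(p) for p in product("ACGT", repeat=4)]
INDEX = {k: i for i, k in enumerate(KMERS)}

def tetranucleotide_signature(seq):
    counts = np.zeros(len(KMERS))
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in INDEX:
            counts[INDEX[kmer]] += 1
    return counts / max(counts.sum(), 1)

rng = np.random.default_rng(7)
def random_contig(gc, length=5000):
    p = [(1 - gc) / 2, gc / 2, gc / 2, (1 - gc) / 2]   # A, C, G, T frequencies
    return "".join(rng.choice(list("ACGT"), size=length, p=p))

contigs = [random_contig(0.35) for _ in range(30)] + \
          [random_contig(0.65) for _ in range(10)]     # 10 "contaminant" contigs
X = PCA(n_components=5).fit_transform(
    np.array([tetranucleotide_signature(c) for c in contigs]))

scores = {k: silhouette_score(X, KMeans(n_clusters=k, n_init=10).fit_predict(X))
          for k in range(2, 6)}
print("estimated number of clusters:", max(scores, key=scores.get))
```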
acdc – Automated Contamination Detection and Confidence estimation for single-cell genome data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lux, Markus; Kruger, Jan; Rinke, Christian
A major obstacle in single-cell sequencing is sample contamination with foreign DNA. To guarantee clean genome assemblies and to prevent the introduction of contamination into public databases, considerable quality control efforts are put into post-sequencing analysis. Contamination screening generally relies on reference-based methods such as database alignment or marker gene search, which limits the set of detectable contaminants to organisms with closely related reference species. As genomic coverage in the tree of life is highly fragmented, there is an urgent need for a reference-free methodology for contaminant identification in sequence data. We present acdc, a tool specifically developed to aid the quality control process of genomic sequence data. By combining supervised and unsupervised methods, it reliably detects both known and de novo contaminants. First, 16S rRNA gene prediction and the inclusion of ultrafast exact alignment techniques allow sequence classification using existing knowledge from databases. Second, reference-free inspection is enabled by the use of state-of-the-art machine learning techniques that include fast, non-linear dimensionality reduction of oligonucleotide signatures and subsequent clustering algorithms that automatically estimate the number of clusters. The latter also enables the removal of any contaminant, yielding a clean sample. Furthermore, given the data complexity and the ill-posedness of clustering, acdc employs bootstrapping techniques to provide statistically profound confidence values. Tested on a large number of samples from diverse sequencing projects, our software is able to quickly and accurately identify contamination. Results are displayed in an interactive user interface. Acdc can be run from the web as well as a dedicated command line application, which allows easy integration into large sequencing project analysis workflows. Acdc can reliably detect contamination in single-cell genome data. In addition to database-driven detection, it complements existing tools by its unsupervised techniques, which allow for the detection of de novo contaminants. Our contribution has the potential to drastically reduce the amount of resources put into these processes, particularly in the context of limited availability of reference species. As single-cell genome data continues to grow rapidly, acdc adds to the toolkit of crucial quality assurance tools.
NASA Astrophysics Data System (ADS)
Wilkinson, Aaron; Almaini, Omar; Chen, Chian-Chou; Smail, Ian; Arumugam, Vinodiran; Blain, Andrew; Chapin, Edward L.; Chapman, Scott C.; Conselice, Christopher J.; Cowley, William I.; Dunlop, James S.; Farrah, Duncan; Geach, James; Hartley, William G.; Ivison, Rob J.; Maltby, David T.; Michałowski, Michał J.; Mortlock, Alice; Scott, Douglas; Simpson, Chris; Simpson, James M.; van der Werf, Paul; Wild, Vivienne
2017-01-01
Submillimetre galaxies (SMGs) are among the most luminous dusty galaxies in the Universe, but their true nature remains unclear; are SMGs the progenitors of the massive elliptical galaxies we see in the local Universe, or are they just a short-lived phase among more typical star-forming galaxies? To explore this problem further, we investigate the clustering of SMGs identified in the SCUBA-2 Cosmology Legacy Survey. We use a catalogue of submillimetre (850 μm) source identifications derived using a combination of radio counterparts and colour/infrared selection to analyse a sample of 610 SMG counterparts in the United Kingdom Infrared Telescope (UKIRT) Infrared Deep Survey (UKIDSS) Ultra Deep Survey (UDS), making this the largest high-redshift sample of these galaxies to date. Using angular cross-correlation techniques, we estimate the halo masses for this large sample of SMGs and compare them with passive and star-forming galaxies selected in the same field. We find that SMGs, on average, occupy high-mass dark matter haloes (Mhalo > 1013 M⊙) at redshifts z > 2.5, consistent with being the progenitors of massive quiescent galaxies in present-day galaxy clusters. We also find evidence of downsizing, in which SMG activity shifts to lower mass haloes at lower redshifts. In terms of their clustering and halo masses, SMGs appear to be consistent with other star-forming galaxies at a given redshift.
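The angular cross-correlation machinery used above rests on pair-count estimators; a minimal sketch of the standard Landy-Szalay form, w(θ) = (DD − 2DR + RR)/RR, is given below on synthetic flat-sky positions. The footprint, binning and random-catalogue size are assumptions, and the flat-sky approximation replaces proper spherical separations.

```python
# Hedged sketch of an angular two-point correlation estimate (Landy-Szalay),
# with KD-tree pair counts and a uniform random catalogue on the same footprint.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(11)
data = rng.uniform(0, 1.0, size=(600, 2))          # RA, Dec in degrees (flat sky)
rand = rng.uniform(0, 1.0, size=(6000, 2))

def pair_counts(a, b, bins, same=False):
    ta, tb = cKDTree(a), cKDTree(b)
    cum = ta.count_neighbors(tb, bins)             # cumulative pair counts within r
    counts = np.diff(cum).astype(float)            # counts per separation bin
    return counts / 2.0 if same else counts        # ordered pairs counted twice

theta_bins = np.logspace(-2.3, -0.7, 9)            # degrees
DD = pair_counts(data, data, theta_bins, same=True)
RR = pair_counts(rand, rand, theta_bins, same=True)
DR = pair_counts(data, rand, theta_bins)

nd, nr = len(data), len(rand)
DDn = DD / (nd * (nd - 1) / 2)
RRn = RR / (nr * (nr - 1) / 2)
DRn = DR / (nd * nr)
w = (DDn - 2 * DRn + RRn) / RRn
print(np.round(w, 3))                              # ~0 for an unclustered sample
```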
NASA Astrophysics Data System (ADS)
Miura, Shinichi
2018-03-01
In this paper, the ground state of para-hydrogen clusters for size regime N ≤ 40 has been studied by our variational path integral molecular dynamics method. Long molecular dynamics calculations have been performed to accurately evaluate ground state properties. The chemical potential of the hydrogen molecule is found to have a zigzag size dependence, indicating the magic number stability for the clusters of the size N = 13, 26, 29, 34, and 39. One-body density of the hydrogen molecule is demonstrated to have a structured profile, not a melted one. The observed magic number stability is examined using the inherent structure analysis. We also have developed a novel method combining our variational path integral hybrid Monte Carlo method with the replica exchange technique. We introduce replicas of the original system bridging from the structured to the melted cluster, which is realized by scaling the potential energy of the system. Using the enhanced sampling method, the clusters are demonstrated to have the structured density profile in the ground state.
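The replica-exchange ingredient described above, replicas bridging from the structured to the melted cluster via a scaled potential energy, has a simple exchange rule at its core. The sketch below shows only that classical-level Metropolis swap criterion for potentials U_k(x) = λ_k U(x) at a common temperature; the variational path integral (quantum) machinery of the paper is not reproduced, and the λ ladder, β and energies are placeholders.

```python
# Hedged, classical-level sketch of Hamiltonian replica exchange with scaled
# potentials: swap acceptance P = min(1, exp(-beta*(lam_i - lam_j)*(U_j - U_i))).
import numpy as np

rng = np.random.default_rng(5)
beta = 1.0
lambdas = np.linspace(1.0, 0.2, 6)        # from structured to "melted" replica

def attempt_swap(U_i, U_j, lam_i, lam_j):
    """Return True if the configurations of replicas i and j should be swapped."""
    delta = -beta * (lam_i - lam_j) * (U_j - U_i)
    return rng.random() < np.exp(min(0.0, delta))

# toy bookkeeping: potential energy of the current configuration in each replica
U = rng.normal(loc=-50.0, scale=5.0, size=lambdas.size)
for i in range(lambdas.size - 1):
    if attempt_swap(U[i], U[i + 1], lambdas[i], lambdas[i + 1]):
        U[i], U[i + 1] = U[i + 1], U[i]   # exchange configurations (energies here)
print(U)
```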
Miura, Shinichi
2018-03-14
In this paper, the ground state of para-hydrogen clusters for size regime N ≤ 40 has been studied by our variational path integral molecular dynamics method. Long molecular dynamics calculations have been performed to accurately evaluate ground state properties. The chemical potential of the hydrogen molecule is found to have a zigzag size dependence, indicating the magic number stability for the clusters of the size N = 13, 26, 29, 34, and 39. One-body density of the hydrogen molecule is demonstrated to have a structured profile, not a melted one. The observed magic number stability is examined using the inherent structure analysis. We also have developed a novel method combining our variational path integral hybrid Monte Carlo method with the replica exchange technique. We introduce replicas of the original system bridging from the structured to the melted cluster, which is realized by scaling the potential energy of the system. Using the enhanced sampling method, the clusters are demonstrated to have the structured density profile in the ground state.
Specht, Petra; Kisielowski, Christian
2016-08-30
Ternary InxGa1–xN alloys became technologically attractive when p-doping was achieved to produce blue and green light emitting diodes (LEDs). Starting in the mid-1990s, investigations of their chemical homogeneity were driven by the need to understand carrier recombination mechanisms in optical device structures to optimize their performance. Transmission electron microscopy (TEM) is the technique of choice to complement optical data evaluations, which suggest the coexistence of local carrier recombination mechanisms based on piezoelectric field effects and on indium clustering in the quantum wells of LEDs. We summarize the historic context of homogeneity investigations using electron microscopy techniques that can principally resolve the question of indium segregation and clustering in InxGa1–xN alloys if optimal sample preparation and electron dose-controlled imaging techniques are employed together with advanced data evaluation.
Factors associated with social interaction anxiety among Chinese adolescents.
Peng, Z W; Lam, L T; Jin, J
2011-12-01
To investigate potential risk factors for social anxiety, particularly social interaction anxiety, among Chinese adolescents. A cross-sectional health survey was conducted in Guangzhou city of Guangdong Province, where high school students aged 13 to 18 years were recruited. The sample was selected from all high schools in the city using a 2-stage random cluster sampling technique. Social interaction anxiety was assessed using the Social Interaction Anxiety Scale. Information collected in the survey included: demographics, self-perception of school performance, relationship with teachers and peers, satisfaction with self-image, achievements, and parenting style of the mother. The parent-child relationship, specifically the relationship between respondents and their mothers, was assessed using the mother attachment subscale of the Inventory of Parent and Peer Attachment. Self-esteem was assessed using the Rosenberg Self-Esteem Scale. The multiple linear regression technique was applied to investigate associations between selected potential risk factors and social interaction anxiety, with adjustments for cluster sampling. Lower family income, lower self-esteem, and hostility were significantly associated with social interaction anxiety among adolescents. Variables identified as risk factors of anxiety disorder in the literature, such as gender, were not associated with social interaction anxiety in this sample. These results were consistent with those of other studies conducted mainly in the United States and Europe. The non-significant results for gender need to be viewed in the context of the parenting styles of Chinese mothers.
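Regression with an adjustment for cluster (school-level) sampling, as described above, is commonly implemented with cluster-robust standard errors; the sketch below shows that pattern with statsmodels. The variable names, simulated data and effect sizes are placeholders for the survey measures, not the study's data.

```python
# Hedged sketch: OLS point estimates with cluster-robust standard errors,
# grouping by the sampling cluster (here, school).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2024)
n_schools, per_school = 30, 40
school = np.repeat(np.arange(n_schools), per_school)
school_effect = rng.normal(0, 2, n_schools)[school]          # intra-cluster correlation
df = pd.DataFrame({
    "school": school,
    "self_esteem": rng.normal(30, 5, school.size),
    "family_income": rng.normal(0, 1, school.size),
})
df["sias"] = 40 - 0.6 * df["self_esteem"] - 1.5 * df["family_income"] \
             + school_effect + rng.normal(0, 6, school.size)

model = smf.ols("sias ~ self_esteem + family_income", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["school"]})
print(model.summary().tables[1])
```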
The effect of heavy metal contamination on the bacterial community structure at Jiaozhou Bay, China.
Yao, Xie-Feng; Zhang, Jiu-Ming; Tian, Li; Guo, Jian-Hua
In this study, determination of heavy metal parameters and microbiological characterization of marine sediments obtained from two heavily polluted sites and one low-grade contaminated reference station at Jiaozhou Bay in China were carried out. The microbial communities found in the sampled marine sediments were studied using PCR-DGGE (denaturing gradient gel electrophoresis) fingerprinting profiles in combination with multivariate analysis. Clustering analysis of DGGE and matrix of heavy metals displayed similar occurrence patterns. On this basis, 17 samples were classified into two clusters depending on the presence or absence of the high level contamination. Moreover, the cluster of highly contaminated samples was further classified into two sub-groups based on the stations of their origin. These results showed that the composition of the bacterial community is strongly influenced by heavy metal variables present in the sediments found in the Jiaozhou Bay. This study also suggested that metagenomic techniques such as PCR-DGGE fingerprinting in combination with multivariate analysis is an efficient method to examine the effect of metal contamination on the bacterial community structure. Copyright © 2016 Sociedade Brasileira de Microbiologia. Published by Elsevier Editora Ltda. All rights reserved.
[Typologies of Madrid's citizens (Spain) at the end-of-life: cluster analysis].
Ortiz-Gonçalves, Belén; Perea-Pérez, Bernardo; Labajo González, Elena; Albarrán Juan, Elena; Santiago-Sáez, Andrés
2018-03-06
To establish typologies among Madrid's citizens (Spain) with regard to the end of life by means of cluster analysis. The SPAD 8 programme was applied to a sample from a health care centre in the autonomous region of Madrid (Spain). A multiple correspondence analysis technique was used, followed by a cluster analysis to create a dendrogram. A cross-sectional study of the questionnaire results was conducted beforehand. Five clusters stood out. Cluster 1: a group who preferred not to answer numerous questions (5%). Cluster 2: in favour of receiving palliative care and euthanasia (40%). Cluster 3: would oppose assisted suicide and would not ask for spiritual assistance (15%). Cluster 4: would like to receive palliative care and assisted suicide (16%). Cluster 5: would oppose assisted suicide and would ask for spiritual assistance (24%). Four clusters stood out in particular. Clusters 2 and 4 would like to receive palliative care, euthanasia (2) and assisted suicide (4). Clusters 4 and 5 regularly practised their faith and their family members did not receive palliative care. Clusters 3 and 5 would be opposed to euthanasia and assisted suicide in particular. Clusters 2, 4 and 5 had not completed an advance directive document. Clusters 2 and 3 seldom practised their faith. This study could be taken into consideration to improve the quality of end-of-life care choices. Copyright © 2017 SESPAS. Published by Elsevier España, S.L.U. All rights reserved.
Application of adaptive cluster sampling to low-density populations of freshwater mussels
Smith, D.R.; Villella, R.F.; Lemarie, D.P.
2003-01-01
Freshwater mussels appear to be promising candidates for adaptive cluster sampling because they are benthic macroinvertebrates that cluster spatially and are frequently found at low densities. We applied adaptive cluster sampling to estimate density of freshwater mussels at 24 sites along the Cacapon River, WV, where a preliminary timed search indicated that mussels were present at low density. Adaptive cluster sampling increased yield of individual mussels and detection of uncommon species; however, it did not improve precision of density estimates. Because finding uncommon species, collecting individuals of those species, and estimating their densities are important conservation activities, additional research is warranted on application of adaptive cluster sampling to freshwater mussels. However, at this time we do not recommend routine application of adaptive cluster sampling to freshwater mussel populations. The ultimate, and currently unanswered, question is how to tell when adaptive cluster sampling should be used, i.e., when is a population sufficiently rare and clustered for adaptive cluster sampling to be efficient and practical? A cost-effective procedure needs to be developed to identify biological populations for which adaptive cluster sampling is appropriate.
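Adaptive cluster sampling itself is simple to state: draw an initial random sample of units, add neighbouring units whenever a unit meets the condition (here, any mussels present), and estimate density from the networks so formed. The sketch below implements that logic with a modified Hansen-Hurwitz-type estimator on a simulated count grid; the grid, condition and sample size are assumptions, not the Cacapon River data.

```python
# Hedged sketch of adaptive cluster sampling on a gridded survey with the
# modified Hansen-Hurwitz estimator: each initially selected unit contributes
# the mean count of the network (4-connected region meeting the condition)
# that contains it; units not meeting the condition form networks of size 1.
import numpy as np
from collections import deque

rng = np.random.default_rng(9)
grid = np.zeros((20, 20), dtype=int)
for _ in range(6):                                   # a few sparse aggregations
    r, c = rng.integers(1, 19, 2)
    patch = grid[r-1:r+2, c-1:c+2]
    patch[...] = rng.poisson(4, patch.shape)

def network_mean(start, grid, condition=lambda y: y > 0):
    if not condition(grid[start]):
        return float(grid[start])                    # network of size 1
    seen, queue, values = {start}, deque([start]), []
    while queue:
        r, c = queue.popleft()
        values.append(grid[r, c])
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nb = (r + dr, c + dc)
            if 0 <= nb[0] < grid.shape[0] and 0 <= nb[1] < grid.shape[1] \
                    and nb not in seen and condition(grid[nb]):
                seen.add(nb)
                queue.append(nb)
    return float(np.mean(values))

n_initial = 30                                       # initial simple random sample
flat = rng.choice(grid.size, n_initial, replace=False)
cells = [(int(i), int(j)) for i, j in zip(*np.unravel_index(flat, grid.shape))]
estimate = np.mean([network_mean(cell, grid) for cell in cells])
print(f"adaptive estimate of mean density: {estimate:.3f}")
print(f"true mean density: {grid.mean():.3f}")
```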
The XMM Cluster Outskirts Project (X-COP)
NASA Astrophysics Data System (ADS)
Eckert, D.
2017-10-01
The outskirts of galaxy clusters (typically the regions located beyond R500) are the regions where the transition between the virialized ICM and the infalling material from the large-scale structure takes place. As such, they play a central role in our understanding of the processes leading to the virialization of the accreting gas within the central dark-matter halo. I will give an overview of the XMM Cluster Outskirts Project (X-COP), a very large programme on XMM to study the virial region of galaxy clusters in unprecedented detail. I will show how X-ray observations can be combined with the Sunyaev-Zeldovich signal to recover the thermodynamic properties and hydrostatic mass of the ICM, bypassing the need for expensive X-ray spectroscopic observations. I will discuss the results obtained using this technique on Abell 2142 and Abell 2319 and give prospects for the results expected from the full X-COP sample. I will also present recent results on the search for warm-hot baryons in the filaments connected to clusters, emphasizing the discovery of three filaments of 10-million-degree gas connected to the massive cluster Abell 2744.
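The reason an X-ray density profile plus an SZ pressure profile suffices is the hydrostatic equilibrium relation M(<r) = −r² (dP/dr) / (G ρ_gas), which needs no spectroscopic temperature. The sketch below evaluates it on simple analytic power-law profiles; those profiles, and the value of μ_e, are placeholders rather than X-COP measurements.

```python
# Hedged sketch of a hydrostatic mass profile from assumed density and
# pressure profiles: M(<r) = -r^2 / (G * rho_gas) * dP/dr.
import numpy as np

G = 6.674e-8                                   # cm^3 g^-1 s^-2
m_p = 1.673e-24                                # g
mu_e = 1.15                                    # mean molecular weight per electron (assumed)
kpc = 3.086e21                                 # cm
M_sun = 1.989e33                               # g

r = np.logspace(2, 3.3, 60) * kpc              # 100 kpc - 2 Mpc
n_e = 1e-2 * (r / (100 * kpc)) ** -1.8         # electron density, cm^-3 (placeholder)
P = 1e-10 * (r / (100 * kpc)) ** -2.5          # total gas pressure, erg cm^-3 (placeholder)

rho_gas = mu_e * m_p * n_e                     # gas mass density
dP_dr = np.gradient(P, r)
M_hse = -r**2 * dP_dr / (G * rho_gas)          # hydrostatic mass profile
print(f"M_HSE(<2 Mpc) ~ {M_hse[-1] / M_sun:.2e} M_sun")
```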
Grain Cluster Microstructure and Grain Boundary Character Distribution in Alloy 690
NASA Astrophysics Data System (ADS)
Xia, Shuang; Zhou, Bangxin; Chen, Wenjue
2009-12-01
The effects of thermal-mechanical processing (TMP) on microstructure evolution during recrystallization and grain boundary character distribution (GBCD) in aged Alloy 690 were investigated by the electron backscatter diffraction (EBSD) technique and optical microscopy. The original grain boundaries of the deformed microstructure did not play an important role in the manipulation of the proportion of the Σ3^n (n = 1, 2, 3…) type boundaries. Instead, the grain cluster formed by multiple twinning starting from a single nucleus during recrystallization was the key microstructural feature affecting the GBCD. All of the grains in this kind of cluster had Σ3^n mutual misorientations regardless of whether they were adjacent. A large grain cluster containing 91 grains was found in the sample after a small-strain (5 pct) and a high-temperature (1100 °C) recrystallization anneal, and twin relationships up to the ninth generation (Σ3^9) were found in this cluster. The ratio of cluster size over grain size (including all types of boundaries as defining individual grains) dictated the proportion of Σ3^n boundaries.
Nasr, Michel R; Mukhopadhyay, Sanjay; Zhang, Shengle; Katzenstein, Anna-Luise A
2009-12-01
An association between Hashimoto thyroiditis and papillary thyroid carcinoma has been postulated for decades. We undertook this study to identify potential precursors of papillary thyroid carcinoma in Hashimoto thyroiditis using a combination of morphologic, immunohistochemical, and molecular techniques. For the study, samples from 59 cases of Hashimoto thyroiditis were stained with antibodies to HBME1 and cytokeratin (CK)19. Tiny HBME1+ and CK19+ atypical cell clusters were identified and analyzed for the BRAF mutation by the colorimetric Mutector assay and allele-specific polymerase chain reaction. HBME1+ and CK19+ atypical cell clusters were identified in 12 (20%) of 59 cases. The minute size (<1 mm) of the clusters and the incomplete nuclear changes precluded a diagnosis of papillary microcarcinoma. The atypical cell clusters from all 12 cases were negative for BRAF. The absence of the BRAF mutation in these atypical cell clusters suggests that they may not be preneoplastic. Caution should be exercised in interpreting positive HBME1 or CK19 staining in Hashimoto thyroiditis.
Analyzing coastal environments by means of functional data analysis
NASA Astrophysics Data System (ADS)
Sierra, Carlos; Flor-Blanco, Germán; Ordoñez, Celestino; Flor, Germán; Gallego, José R.
2017-07-01
Here we used Functional Data Analysis (FDA) to examine particle-size distributions (PSDs) in a beach/shallow marine sedimentary environment in Gijón Bay (NW Spain). The work involved both Functional Principal Components Analysis (FPCA) and Functional Cluster Analysis (FCA). The grain size of the sand samples was characterized by means of laser dispersion spectroscopy. Within this framework, FPCA was used as a dimension reduction technique to explore and uncover patterns in grain-size frequency curves. This procedure proved useful to describe variability in the structure of the data set. Moreover, an alternative approach, FCA, was applied to identify clusters and to interpret their spatial distribution. Results obtained with this latter technique were compared with those obtained by means of two vector approaches that combine PCA with CA (Cluster Analysis). The first method, based on the probability density function (PDF), was employed after fitting a log-normal distribution to each PSD and summarizing each of the density functions by its mean, sorting, skewness and kurtosis. The second applied a centered log-ratio (clr) transform to the original data. PCA was then applied to the transformed data, and finally CA to the retained principal component scores. The study revealed functional data analysis, specifically FPCA and FCA, as a suitable alternative with considerable advantages over traditional vector analysis techniques in sedimentary geology studies.
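The second vector approach mentioned above (clr transform, then PCA, then cluster analysis on the retained scores) can be sketched compactly, since the clr is just clr(x)_i = ln(x_i / g(x)) with g(x) the geometric mean. The PSDs below are synthetic compositional data, not the Gijón Bay samples.

```python
# Hedged sketch: centered log-ratio transform of compositional PSDs, PCA on
# the transformed data, and hierarchical clustering of the retained scores.
import numpy as np
from sklearn.decomposition import PCA
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(4)

def clr(compositions, eps=1e-9):
    x = np.clip(compositions, eps, None)
    logx = np.log(x)
    return logx - logx.mean(axis=1, keepdims=True)     # subtract log geometric mean

# 50 samples x 30 grain-size bins, each row normalized to sum to 1
raw = rng.lognormal(0.0, 0.5, size=(50, 30))
psd = raw / raw.sum(axis=1, keepdims=True)

scores = PCA(n_components=3).fit_transform(clr(psd))
groups = fcluster(linkage(scores, method="ward"), t=3, criterion="maxclust")
print(np.bincount(groups)[1:])                          # samples per cluster
```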
Jacob, Benjamin G; Novak, Robert J; Toe, Laurent; Sanfo, Moussa S; Afriyie, Abena N; Ibrahim, Mohammed A; Griffith, Daniel A; Unnasch, Thomas R
2012-01-01
The standard methods for regression analyses of clustered riverine larval habitat data of Simulium damnosum s.l., a major black-fly vector of onchocerciasis, postulate models relating observational ecological-sampled parameter estimators to prolific habitats without accounting for residual intra-cluster error correlation effects. Generally, this correlation comes from two sources: (1) the design of the random effects and their assumed covariance from the multiple levels within the regression model; and (2) the correlation structure of the residuals. Unfortunately, inconspicuous errors in residual intra-cluster correlation estimates can overstate precision in forecasted S. damnosum s.l. riverine larval habitat explanatory attributes regardless of how they are treated (e.g., independent, autoregressive, Toeplitz, etc.). In this research, the geographical locations of multiple riverine-based S. damnosum s.l. larval ecosystem habitats sampled from 2 pre-established epidemiological sites in Togo were identified and recorded from July 2009 to June 2010. Initially the data were aggregated in proc genmod. An agglomerative hierarchical residual cluster-based analysis was then performed. The sampled clustered study site data were then analyzed for statistical correlations using Monthly Biting Rates (MBR). Euclidean distance measurements and terrain-related geomorphological statistics were then generated in ArcGIS. A digital overlay was then performed, also in ArcGIS, using the georeferenced ground coordinates of high and low density clusters stratified by Annual Biting Rates (ABR). These data were overlain onto multitemporal sub-meter pixel resolution satellite data (i.e., QuickBird 0.61 m wavebands). Orthogonal spatial filter eigenvectors were then generated in SAS/GIS. Univariate and non-linear regression-based models (i.e., Logistic, Poisson and Negative Binomial) were also employed to determine probability distributions and to identify statistically significant parameter estimators from the sampled data. Thereafter, Durbin-Watson test statistics were used to test the null hypothesis that the regression residuals were not autocorrelated against the alternative that the residuals followed an autoregressive process in AUTOREG. Bayesian uncertainty matrices were also constructed employing normal priors for each of the sampled estimators in PROC MCMC. The residuals revealed both spatially structured and unstructured error effects in the high and low ABR-stratified clusters. The analyses also revealed that the estimators levels of turbidity and presence of rocks were statistically significant for the high-ABR-stratified clusters, while the estimators distance between habitats and floating vegetation were important for the low-ABR-stratified cluster. Varying and constant coefficient regression models, ABR-stratified GIS-generated clusters, sub-meter resolution satellite imagery, a robust residual intra-cluster diagnostic test, MBR-based histograms, eigendecomposition spatial filter algorithms and Bayesian matrices can enable accurate autoregressive estimation of latent uncertainty effects and other residual error probabilities (i.e., heteroskedasticity) for testing correlations between georeferenced S. damnosum s.l. riverine larval habitat estimators. The asymptotic distribution of the resulting residual-adjusted intra-cluster predictor error autocovariate coefficients can thereafter be established, while estimates of the asymptotic variance can lead to the construction of approximate confidence intervals for accurately targeting productive S. damnosum s.l. habitats based on spatiotemporal field-sampled count data.
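Two of the steps named above, a negative binomial count regression and a Durbin-Watson check on the residuals, are easy to show in isolation. The covariate names and simulated counts below are placeholders for the field-sampled estimators, and the Python/statsmodels calls stand in for the SAS procedures actually used.

```python
# Hedged sketch: negative binomial GLM of habitat counts on sampled covariates,
# followed by a Durbin-Watson statistic on the deviance residuals.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(17)
n = 120
df = pd.DataFrame({
    "turbidity": rng.gamma(2.0, 1.0, n),
    "rocks": rng.integers(0, 2, n),
    "floating_vegetation": rng.integers(0, 2, n),
})
mu = np.exp(0.2 + 0.4 * df["turbidity"] + 0.6 * df["rocks"])
df["larval_count"] = rng.poisson(mu * rng.gamma(2.0, 0.5, n))   # overdispersed counts

X = sm.add_constant(df[["turbidity", "rocks", "floating_vegetation"]])
nb_model = sm.GLM(df["larval_count"], X,
                  family=sm.families.NegativeBinomial(alpha=1.0)).fit()
print(nb_model.params)
print("Durbin-Watson:", durbin_watson(nb_model.resid_deviance))
```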
NASA Astrophysics Data System (ADS)
Hamprecht, Fred A.; Peter, Christine; Daura, Xavier; Thiel, Walter; van Gunsteren, Wilfred F.
2001-02-01
We propose an approach for summarizing the output of long simulations of complex systems, affording a rapid overview and interpretation. First, multidimensional scaling techniques are used in conjunction with dimension reduction methods to obtain a low-dimensional representation of the configuration space explored by the system. A nonparametric estimate of the density of states in this subspace is then obtained using kernel methods. The free energy surface is calculated from that density, and the configurations produced in the simulation are then clustered according to the topography of that surface, such that all configurations belonging to one local free energy minimum form one class. This topographical cluster analysis is performed using basin spanning trees which we introduce as subgraphs of Delaunay triangulations. Free energy surfaces obtained in dimensions lower than four can be visualized directly using iso-contours and -surfaces. Basin spanning trees also afford a glimpse of higher-dimensional topographies. The procedure is illustrated using molecular dynamics simulations on the reversible folding of peptide analogues. Finally, we emphasize the intimate relation of density estimation techniques to modern enhanced sampling algorithms.
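The density-to-free-energy step is the easiest part to illustrate: a kernel density estimate of the sampled configurations in a low-dimensional projection is converted to a free energy via F = −kT ln ρ. The 2-D coordinates below are synthetic stand-ins for the reduced configuration space, and the basin-spanning-tree clustering of the paper is not reproduced.

```python
# Hedged sketch: Gaussian kernel density estimate of sampled configurations in
# a 2-D reduced space, converted to a relative free energy surface.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(8)
kT = 2.49                                            # kJ/mol at ~300 K

# two "conformational basins" in a 2-D projection of configuration space
folded = rng.normal([0.0, 0.0], 0.3, size=(3000, 2))
unfolded = rng.normal([2.0, 1.5], 0.6, size=(1500, 2))
coords = np.vstack([folded, unfolded])

kde = gaussian_kde(coords.T)                         # nonparametric density estimate
xx, yy = np.meshgrid(np.linspace(-1, 4, 80), np.linspace(-1, 3, 80))
rho = kde(np.vstack([xx.ravel(), yy.ravel()])).reshape(xx.shape)
F = -kT * np.log(rho / rho.max())                    # free energy relative to minimum
print(f"free energy range on the grid: 0 to {F.max():.1f} kJ/mol")
```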
Sample size calculation for stepped wedge and other longitudinal cluster randomised trials.
Hooper, Richard; Teerenstra, Steven; de Hoop, Esther; Eldridge, Sandra
2016-11-20
The sample size required for a cluster randomised trial is inflated compared with an individually randomised trial because outcomes of participants from the same cluster are correlated. Sample size calculations for longitudinal cluster randomised trials (including stepped wedge trials) need to take account of at least two levels of clustering: the clusters themselves and times within clusters. We derive formulae for sample size for repeated cross-section and closed cohort cluster randomised trials with normally distributed outcome measures, under a multilevel model allowing for variation between clusters and between times within clusters. Our formulae agree with those previously described for special cases such as crossover and analysis of covariance designs, although simulation suggests that the formulae could underestimate required sample size when the number of clusters is small. Whether using a formula or simulation, a sample size calculation requires estimates of nuisance parameters, which in our model include the intracluster correlation, cluster autocorrelation, and individual autocorrelation. A cluster autocorrelation less than 1 reflects a situation where individuals sampled from the same cluster at different times have less correlated outcomes than individuals sampled from the same cluster at the same time. Nuisance parameters could be estimated from time series obtained in similarly clustered settings with the same outcome measure, using analysis of variance to estimate variance components. Copyright © 2016 John Wiley & Sons, Ltd.
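The basic inflation that all of these designs build on is the familiar design effect 1 + (m − 1)·ICC for a parallel-arm cluster randomised trial. The sketch below computes only that simple case; the longitudinal corrections derived in the paper (cluster and individual autocorrelation across periods, stepped wedge schedules) are not reproduced, and the numbers used are illustrative.

```python
# Hedged sketch: sample size for a mean difference in a parallel cluster
# randomised trial = individually randomised size x design effect 1+(m-1)*ICC.
from scipy.stats import norm

def crt_sample_size(delta, sd, icc, cluster_size, alpha=0.05, power=0.8):
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    n_individual = 2 * (z * sd / delta) ** 2          # per arm, individually randomised
    design_effect = 1 + (cluster_size - 1) * icc
    n_per_arm = n_individual * design_effect
    return n_per_arm, n_per_arm / cluster_size        # participants, clusters per arm

# illustrative numbers only
n_arm, k_arm = crt_sample_size(delta=0.3, sd=1.0, icc=0.05, cluster_size=20)
print(f"participants per arm ~ {n_arm:.0f}, clusters per arm ~ {k_arm:.1f}")
```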
Determining the Optimal Number of Clusters with the Clustergram
NASA Technical Reports Server (NTRS)
Fluegemann, Joseph K.; Davies, Misty D.; Aguirre, Nathan D.
2011-01-01
Cluster analysis aids research in many different fields, from business to biology to aerospace. It consists of using statistical techniques to group objects in large sets of data into meaningful classes. However, this process of ordering data points presents much uncertainty because it involves several steps, many of which are subject to researcher judgment as well as inconsistencies depending on the specific data type and research goals. These steps include the method used to cluster the data, the variables on which the cluster analysis will be operating, the number of resulting clusters, and parts of the interpretation process. In most cases, the number of clusters must be guessed or estimated before employing the clustering method. Many remedies have been proposed, but none is unassailable and certainly not for all data types. Thus, the aim of current research for better techniques of determining the number of clusters is generally confined to demonstrating that the new technique outperforms other methods for several disparate data types. Our research makes use of a new cluster-number-determination technique based on the clustergram: a graph that shows how the number of objects in the cluster and the cluster mean (the ordinate) change with the number of clusters (the abscissa). We use the features of the clustergram to make the best determination of the cluster-number.
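A simplified clustergram of the kind described above can be produced directly from repeated k-means runs: for each candidate k, plot the cluster means (projected onto one ordinate) against k, with marker size tracking cluster population. The full clustergram also joins parent and child clusters between successive k; that refinement is omitted here and the data are synthetic.

```python
# Hedged sketch of a simplified clustergram: cluster means (PC1 projection)
# versus the number of clusters k, marker size proportional to cluster size.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(c, 0.7, size=(150, 4)) for c in (0, 3, 6)])  # 3 true groups
pc1 = PCA(n_components=1).fit(X)

fig, ax = plt.subplots()
for k in range(1, 9):
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(X)
    for j in range(k):
        members = X[labels == j]
        mean_pc1 = pc1.transform(members.mean(axis=0, keepdims=True))[0, 0]
        ax.scatter(k, mean_pc1, s=len(members), color="C0")
ax.set_xlabel("number of clusters k")
ax.set_ylabel("cluster mean (PC1 projection)")
plt.show()   # an elbow/stabilization around k = 3 suggests the cluster number
```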
NASA Astrophysics Data System (ADS)
Sarparandeh, Mohammadali; Hezarkhani, Ardeshir
2017-12-01
The use of efficient methods for data processing has always been of interest to researchers in the field of earth sciences. Pattern recognition techniques are appropriate methods for high-dimensional data such as geochemical data. Evaluation of the geochemical distribution of rare earth elements (REEs) requires the use of such methods. In particular, the multivariate nature of REE data makes them a good target for numerical analysis. The main subject of this paper is the application of unsupervised pattern recognition approaches to evaluating the geochemical distribution of REEs in the Kiruna-type magnetite-apatite deposit of Se-Chahun. For this purpose, 42 bulk lithology samples were collected from the Se-Chahun iron ore deposit. In this study, 14 rare earth elements were measured with inductively coupled plasma mass spectrometry (ICP-MS). Pattern recognition makes it possible to evaluate the relations between the samples based on all 14 of these features simultaneously. In addition to being straightforward to apply, these methods can discover hidden information and relations among the data samples. Therefore, four clustering methods (unsupervised pattern recognition) - including a modified basic sequential algorithmic scheme (MBSAS), hierarchical (agglomerative) clustering, k-means clustering and self-organizing map (SOM) - were applied and the results were evaluated using the silhouette criterion. The samples were clustered into four types. Finally, the results of this study were validated against geological facts and analysis results from, for example, scanning electron microscopy (SEM), X-ray diffraction (XRD), ICP-MS and optical mineralogy. The results of the k-means clustering and SOM methods match reality, the experimental studies of the samples and the field surveys best. Since only the rare earth elements were used in this classification, the good agreement of the results with lithology is notable. It is concluded that combining the proposed methods with geological studies reveals hidden information, and this combined approach gives better results than using either one alone.
Soltani, Shahla; Asghari Moghaddam, Asghar; Barzegar, Rahim; Kazemian, Naeimeh; Tziritis, Evangelos
2017-08-18
Kordkandi-Duzduzan plain is one of the fertile plains of East Azarbaijan Province, NW of Iran. Groundwater is an important resource for drinking and agricultural purposes due to the lack of surface water resources in the region. The main objectives of the present study are to identify the hydrogeochemical processes and the potential sources of major, minor, and trace metals and metalloids such as Cr, Mn, Cd, Fe, Al, and As by using joint hydrogeochemical techniques and multivariate statistical analysis and to evaluate groundwater quality deterioration with the use of PoS environmental index. To achieve these objectives, 23 groundwater samples were collected in September 2015. Piper diagram shows that the mixed Ca-Mg-Cl is the dominant groundwater type, and some of the samples have Ca-HCO₃, Ca-Cl, and Na-Cl types. Multivariate statistical analyses indicate that weathering and dissolution of different rocks and minerals, e.g., silicates, gypsum, and halite, ion exchange, and agricultural activities influence the hydrogeochemistry of the study area. The cluster analysis divides the samples into two distinct clusters which are completely different in EC (and its dependent variables such as Na⁺, K⁺, Ca²⁺, Mg²⁺, SO₄²⁻, and Cl⁻), Cd, and Cr variables according to the ANOVA statistical test. Based on the median values, the concentrations of pH, NO₃⁻, SiO₂, and As in cluster 1 are elevated compared with those of cluster 2, while their maximum values occur in cluster 2. According to the PoS index, the dominant parameter that controls quality deterioration is As, with 60% of contribution. Samples of lowest PoS values are located in the southern and northern parts (recharge area) while samples of the highest values are located in the discharge area and the eastern part.
Sampling Methods in Cardiovascular Nursing Research: An Overview.
Kandola, Damanpreet; Banner, Davina; O'Keefe-McCarthy, Sheila; Jassal, Debbie
2014-01-01
Cardiovascular nursing research covers a wide array of topics from health services to psychosocial patient experiences. The selection of specific participant samples is an important part of the research design and process. The sampling strategy employed is of utmost importance to ensure that a representative sample of participants is chosen. There are two main categories of sampling methods: probability and non-probability. Probability sampling is the random selection of elements from the population, where each element of the population has an equal and independent chance of being included in the sample. There are five main types of probability sampling including simple random sampling, systematic sampling, stratified sampling, cluster sampling, and multi-stage sampling. Non-probability sampling methods are those in which elements are chosen through non-random methods for inclusion into the research study and include convenience sampling, purposive sampling, and snowball sampling. Each approach offers distinct advantages and disadvantages and must be considered critically. In this research column, we provide an introduction to these key sampling techniques and draw on examples from the cardiovascular research. Understanding the differences in sampling techniques may aid nurses in effective appraisal of research literature and provide a reference point for nurses who engage in cardiovascular research.
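Three of the probability designs named above differ only in what is randomized: individual patients, patients within strata, or whole clusters. The toy register below is an assumption used purely to contrast the three mechanics.

```python
# Hedged sketch contrasting simple random sampling, stratified sampling by
# clinic, and one-stage cluster sampling of whole clinics, on a toy register.
import numpy as np
import pandas as pd

rng = np.random.default_rng(10)
register = pd.DataFrame({
    "patient_id": np.arange(2000),
    "clinic": rng.integers(0, 20, 2000),           # 20 clinics act as strata/clusters
})

# 1. Simple random sampling: every patient has the same inclusion probability
srs = register.sample(n=200, random_state=1)

# 2. Stratified sampling: a fixed fraction drawn within each clinic
stratified = register.groupby("clinic", group_keys=False).sample(frac=0.1, random_state=1)

# 3. Cluster sampling: select whole clinics at random, take everyone in them
chosen_clinics = rng.choice(register["clinic"].unique(), size=4, replace=False)
cluster_sample = register[register["clinic"].isin(chosen_clinics)]

print(len(srs), len(stratified), len(cluster_sample))
```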
SPIDERS: the spectroscopic follow-up of X-ray-selected clusters of galaxies in SDSS-IV
Clerc, N.; Merloni, A.; Zhang, Y. -Y.; ...
2016-09-05
SPIDERS (The SPectroscopic IDentification of ERosita Sources) is a programme dedicated to the homogeneous and complete spectroscopic follow-up of X-ray active galactic nuclei and galaxy clusters over a large area (~7500 deg²) of the extragalactic sky. SPIDERS is part of the Sloan Digital Sky Survey (SDSS)-IV project, together with the Extended Baryon Oscillation Spectroscopic Survey and the Time-Domain Spectroscopic Survey. This study describes the largest project within SPIDERS before the launch of eROSITA: an optical spectroscopic survey of X-ray-selected, massive (~10¹⁴–10¹⁵ M⊙) galaxy clusters discovered in ROSAT and XMM–Newton imaging. The immediate aim is to determine precise (Δz ~ 0.001) redshifts for 4000–5000 of these systems out to z ~ 0.6. The scientific goal of the program is precision cosmology, using clusters as probes of large-scale structure in the expanding Universe. We present the cluster samples, target selection algorithms and observation strategies. We demonstrate the efficiency of selecting targets using a combination of SDSS imaging data, a robust red-sequence finder and a dedicated prioritization scheme. We describe a set of algorithms and work-flow developed to collate spectra and assign cluster membership, and to deliver catalogues of spectroscopically confirmed clusters. We discuss the relevance of line-of-sight velocity dispersion estimators for the richer systems. We illustrate our techniques by constructing a catalogue of 230 spectroscopically validated clusters (0.031 < z < 0.658), found in pilot observations. Finally, we discuss two potential science applications of the SPIDERS sample: the study of the X-ray luminosity-velocity dispersion (LX–σ) relation and the building of stacked phase-space diagrams.
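One of the velocity dispersion estimators commonly used for clusters of modest richness is the robust "gapper" estimator, σ_G = (√π / n(n−1)) Σ_i i(n−i)(v_(i+1) − v_(i)) over the sorted rest-frame velocities. The abstract does not specify which estimators SPIDERS adopts, so the sketch below, on simulated member redshifts, is only a generic illustration of the idea.

```python
# Hedged sketch: gapper estimate of a cluster's line-of-sight velocity
# dispersion from member redshifts (simulated here).
import numpy as np

C_KMS = 299792.458

def gapper_dispersion(z_members, z_cluster):
    v = C_KMS * (np.sort(z_members) - z_cluster) / (1.0 + z_cluster)  # rest-frame km/s
    n = v.size
    gaps = np.diff(v)                                                 # v_(i+1) - v_(i)
    weights = np.arange(1, n) * np.arange(n - 1, 0, -1)               # i * (n - i)
    return np.sqrt(np.pi) / (n * (n - 1)) * np.sum(weights * gaps)

rng = np.random.default_rng(12)
z_cl = 0.25
sigma_true = 900.0                                                    # km/s
z_members = z_cl + sigma_true / C_KMS * (1 + z_cl) * rng.standard_normal(40)
print(f"gapper sigma ~ {gapper_dispersion(z_members, z_cl):.0f} km/s")
```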
SPIDERS: the spectroscopic follow-up of X-ray selected clusters of galaxies in SDSS-IV
NASA Astrophysics Data System (ADS)
Clerc, N.; Merloni, A.; Zhang, Y.-Y.; Finoguenov, A.; Dwelly, T.; Nandra, K.; Collins, C.; Dawson, K.; Kneib, J.-P.; Rozo, E.; Rykoff, E.; Sadibekova, T.; Brownstein, J.; Lin, Y.-T.; Ridl, J.; Salvato, M.; Schwope, A.; Steinmetz, M.; Seo, H.-J.; Tinker, J.
2016-12-01
SPIDERS (The SPectroscopic IDentification of eROSITA Sources) is a programme dedicated to the homogeneous and complete spectroscopic follow-up of X-ray active galactic nuclei and galaxy clusters over a large area (˜7500 deg²) of the extragalactic sky. SPIDERS is part of the Sloan Digital Sky Survey (SDSS)-IV project, together with the Extended Baryon Oscillation Spectroscopic Survey and the Time-Domain Spectroscopic Survey. This paper describes the largest project within SPIDERS before the launch of eROSITA: an optical spectroscopic survey of X-ray-selected, massive (˜10¹⁴-10¹⁵ M⊙) galaxy clusters discovered in ROSAT and XMM-Newton imaging. The immediate aim is to determine precise (Δz ˜ 0.001) redshifts for 4000-5000 of these systems out to z ˜ 0.6. The scientific goal of the program is precision cosmology, using clusters as probes of large-scale structure in the expanding Universe. We present the cluster samples, target selection algorithms and observation strategies. We demonstrate the efficiency of selecting targets using a combination of SDSS imaging data, a robust red-sequence finder and a dedicated prioritization scheme. We describe a set of algorithms and work-flow developed to collate spectra and assign cluster membership, and to deliver catalogues of spectroscopically confirmed clusters. We discuss the relevance of line-of-sight velocity dispersion estimators for the richer systems. We illustrate our techniques by constructing a catalogue of 230 spectroscopically validated clusters (0.031 < z < 0.658), found in pilot observations. We discuss two potential science applications of the SPIDERS sample: the study of the X-ray luminosity-velocity dispersion (LX-σ) relation and the building of stacked phase-space diagrams.
Marital and Procreative Projections of Rural Louisiana Youth: A Historical Comparison.
ERIC Educational Resources Information Center
Smith, Kevin B.; Ohlendorf, George W.
Changes in marital and procreative projections among rural Louisiana high school youth between 1968 and 1972 were examined. In 1968 a proportionate, stratified, random cluster sampling technique was employed to secure data on seniors from 13 white and 7 black high schools. In 1972 public school integration and the establishment of private schools…
Kos, Gregor; Krska, Rudolf; Lohninger, Hans; Griffiths, Peter R
2004-01-01
An investigation into the rapid detection of mycotoxin-producing fungi on corn by two mid-infrared spectroscopic techniques was undertaken. Corn samples from a single genotype (RWA2, blanks, and contaminated with Fusarium graminearum) were ground, sieved and, after appropriate sample preparation, subjected to mid-infrared spectroscopy using two different accessories (diffuse reflection and attenuated total reflection). The measured spectra were evaluated with principal component analysis (PCA) and the blank and contaminated samples were classified by cluster analysis. Reference data for fungal metabolites were obtained with conventional methods. After extraction and clean-up, each sample was analyzed for the toxin deoxynivalenol (DON) by gas chromatography with electron capture detection (GC-ECD) and ergosterol (a parameter for the total fungal biomass) by high-performance liquid chromatography with diode array detection (HPLC-DAD). The concentration ranges for contaminated samples were 880-3600 microg/kg for ergosterol and 300-2600 microg/kg for DON. Classification efficiency was 100% for ATR spectra. DR spectra did not show as obvious a clustering of contaminated and blank samples. Results and trends were also observed in single spectra plots. Quantification using a PLS1 regression algorithm showed good correlation with DON reference data, but a rather high standard error of prediction (SEP) with 600 microg/kg (DR) and 490 microg/kg (ATR), respectively, for ergosterol. Comparing measurement procedures and results showed advantages for the ATR technique, mainly owing to its ease of use and the easier interpretation of results that were better with respect to classification and quantification.
Zamani, Abbas Ali; Yaftian, Mohammad Reza; Parizanganeh, Abdolhossein
2012-12-17
The contamination of groundwater by heavy metal ions around a lead and zinc plant has been studied. As a case study, groundwater contamination in the Bonab Industrial Estate (Zanjan, Iran) was investigated for iron, cobalt, nickel, copper, zinc, cadmium and lead content using differential pulse polarography (DPP). Although cobalt, copper and zinc were found in 47.8%, 100.0%, and 100.0% of the samples, respectively, none of the samples contained these metals above their maximum contaminant levels (MCLs). Cadmium was detected in 65.2% of the samples and 17.4% of them were polluted by this metal. All samples contained detectable levels of lead and iron, with 8.7% and 13.0% of the samples, respectively, exceeding their MCLs. Nickel was also found in 78.3% of the samples, out of which 8.7% were polluted. In general, the results revealed the contamination of groundwater sources in the studied zone. The higher health risks are related to lead, nickel, and cadmium ions. Multivariate statistical techniques were applied to interpret the experimental data and to describe the pollution sources. The data analysis showed correlations and similarities between the investigated heavy metals and helped to classify these ion groups. Cluster analysis identified five clusters among the studied heavy metals: cluster 1 consisted of Pb and Cu, cluster 3 included Cd and Fe, and each of the elements Zn, Co and Ni formed a single-member group. The same results were obtained by factor analysis. Statistical investigations revealed that anthropogenic factors, notably the lead and zinc plant, together with pedo-geochemical pollution sources, are influencing water quality in the studied area.
[Applying the clustering technique for characterising maintenance outsourcing].
Cruz, Antonio M; Usaquén-Perilla, Sandra P; Vanegas-Pabón, Nidia N; Lopera, Carolina
2010-06-01
Using clustering techniques for characterising companies providing health institutions with maintenance services. The study analysed seven pilot areas' equipment inventory (264 medical devices). Clustering techniques were applied using 26 variables. Response time (RT), operation duration (OD), availability and turnaround time (TAT) were amongst the most significant ones. Average biomedical equipment obsolescence value was 0.78. Four service provider clusters were identified: clusters 1 and 3 had better performance, lower TAT, RT and DR values (56 % of the providers coded O, L, C, B, I, S, H, F and G, had 1 to 4 day TAT values:
Impact of Sampling Density on the Extent of HIV Clustering
Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor
2014-01-01
Identifying and monitoring HIV clusters could be useful in tracking the leading edge of HIV transmission in epidemics. Currently, greater specificity in the definition of HIV clusters is needed to reduce confusion in the interpretation of HIV clustering results. We address sampling density as one of the key aspects of HIV cluster analysis. The proportion of viral sequences in clusters was estimated at sampling densities from 1.0% to 70%. A set of 1,248 HIV-1C env gp120 V1C5 sequences from a single community in Botswana was utilized in simulation studies. Matching numbers of HIV-1C V1C5 sequences from the LANL HIV Database were used as comparators. HIV clusters were identified by phylogenetic inference under bootstrapped maximum likelihood and pairwise distance cut-offs. Sampling density below 10% was associated with stochastic HIV clustering with broad confidence intervals. HIV clustering increased linearly at sampling density >10%, and was accompanied by narrowing confidence intervals. Patterns of HIV clustering were similar at bootstrap thresholds 0.7 to 1.0, but the extent of HIV clustering decreased with higher bootstrap thresholds. The origin of sampling (local concentrated vs. scattered global) had a substantial impact on HIV clustering at sampling densities ≥10%. Pairwise distances at 10% were estimated as a threshold for cluster analysis of HIV-1 V1C5 sequences. The node bootstrap support distribution provided additional evidence for 10% sampling density as the threshold for HIV cluster analysis. The detectability of HIV clusters is substantially affected by sampling density. A minimal genotyping density of 10% and sampling density of 50–70% are suggested for HIV-1 V1C5 cluster analysis. PMID:25275430
Delpla, Ianis; Florea, Mihai; Pelletier, Geneviève; Rodriguez, Manuel J
2018-06-04
Trihalomethanes (THMs) and Haloacetic Acids (HAAs) are the main groups detected in drinking water and are consequently strictly regulated. However, the increasing quantity of data for disinfection byproducts (DBPs) produced from research projects and regulatory programs remains largely unexploited, despite a great potential for its use in optimizing drinking water quality monitoring to meet specific objectives. In this work, we developed a procedure to optimize locations and periods for DBPs monitoring based on a set of monitoring scenarios using the cluster analysis technique. The optimization procedure used a robust set of spatio-temporal monitoring results on DBPs (THMs and HAAs) generated from intensive sampling campaigns conducted in a residential sector of a water distribution system. Results shows that cluster analysis allows for the classification of water quality in different groups of THMs and HAAs according to their similarities, and the identification of locations presenting water quality concerns. By using cluster analysis with different monitoring objectives, this work provides a set of monitoring solutions and a comparison between various monitoring scenarios for decision-making purposes. Finally, it was demonstrated that the data from intensive monitoring of free chlorine residual and water temperature as DBP proxy parameters, when processed using cluster analysis, could also help identify the optimal sampling points and periods for regulatory THMs and HAAs monitoring. Copyright © 2018 Elsevier Ltd. All rights reserved.
Using Machine Learning Techniques in the Analysis of Oceanographic Data
NASA Astrophysics Data System (ADS)
Falcinelli, K. E.; Abuomar, S.
2017-12-01
Acoustic Doppler Current Profilers (ADCPs) are oceanographic tools capable of collecting large amounts of current profile data. Using unsupervised machine learning techniques such as principal component analysis, fuzzy c-means clustering, and self-organizing maps, patterns and trends in an ADCP dataset are found. Cluster validity algorithms such as visual assessment of cluster tendency and clustering index are used to determine the optimal number of clusters in the ADCP dataset. These techniques prove to be useful in analysis of ADCP data and demonstrate potential for future use in other oceanographic applications.
Na, Na; Shi, Ruixia; Long, Zi; Lu, Xin; Jiang, Fubin; Ouyang, Jin
2014-10-01
In this study, the real-time analysis of self-assembled nucleobases was performed by Venturi easy ambient sonic-spray ionization mass spectrometry (V-EASI-MS). Using three nucleobases, 6-methyluracil (6MU), uracil (U) and thymine (T), as examples, clusters of different orders centered on different metal ions were recorded in both positive and negative modes. Compared with the results obtained by traditional electrospray ionization mass spectrometry (ESI-MS) under the same conditions, more high-order clusters, such as [6MU7+Na](+), [6MU15+2NH4](2+), [6MU10+Na](+), [T7+Na](+), and [T15+2NH4](2+), were detected by V-EASI-MS, which demonstrated the soft ionization ability of V-EASI for studying non-covalent interactions in a self-assembly process. Furthermore, with the injection of K(+) into the system by syringe pumping, real-time monitoring of the formation of nucleobase clusters was achieved by direct extraction of samples from the system under the Venturi effect. The effect of cations on the formation of clusters during the self-assembly of nucleobases was thus demonstrated, in accordance with previous reports. Free of high voltage, heating or radiation during ionization, this technique is very soft and suitable for obtaining real-time information on the self-assembly system, and it also makes the extraction of samples from the reaction system convenient. This "easy and soft" ionization technique provides a potential pathway for monitoring and controlling self-assembly processes. Copyright © 2014 Elsevier B.V. All rights reserved.
Pellegrini, Michael; Zoghi, Maryam; Jaberzadeh, Shapour
2018-01-12
Cluster analysis and other subgrouping techniques have risen in popularity in recent years in non-invasive brain stimulation research in the attempt to investigate the issue of inter-individual variability - the issue of why some individuals respond, as traditionally expected, to non-invasive brain stimulation protocols and others do not. Cluster analysis and subgrouping techniques have been used to categorise individuals, based on their response patterns, as responder or non-responders. There is, however, a lack of consensus and consistency on the most appropriate technique to use. This systematic review aimed to provide a systematic summary of the cluster analysis and subgrouping techniques used to date and suggest recommendations moving forward. Twenty studies were included that utilised subgrouping techniques, while seven of these additionally utilised cluster analysis techniques. The results of this systematic review appear to indicate that statistical cluster analysis techniques are effective in identifying subgroups of individuals based on response patterns to non-invasive brain stimulation. This systematic review also reports a lack of consensus amongst researchers on the most effective subgrouping technique and the criteria used to determine whether an individual is categorised as a responder or a non-responder. This systematic review provides a step-by-step guide to carrying out statistical cluster analyses and subgrouping techniques to provide a framework for analysis when developing further insights into the contributing factors of inter-individual variability in response to non-invasive brain stimulation.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shin, Jaejin; Woo, Jong-Hak; Mulchaey, John S.
We perform a comprehensive study of X-ray cavities using a large sample of X-ray targets selected from the Chandra archive. The sample is selected to cover a large dynamic range including galaxy clusters, groups, and individual galaxies. Using β-modeling and unsharp masking techniques, we investigate the presence of X-ray cavities for 133 targets that have sufficient X-ray photons for analysis. We detect 148 X-ray cavities from 69 targets and measure their properties, including cavity size, angle, and distance from the center of the diffuse X-ray gas. We confirm the strong correlation between cavity size and distance from the X-ray center similar to previous studies. We find that the detection rates of X-ray cavities are similar among galaxy clusters, groups and individual galaxies, suggesting that the formation mechanism of X-ray cavities is independent of environment.
Knox, Stephanie A; Chondros, Patty
2004-01-01
Background Cluster sample study designs are cost effective, however cluster samples violate the simple random sample assumption of independence of observations. Failure to account for the intra-cluster correlation of observations when sampling through clusters may lead to an under-powered study. Researchers therefore need estimates of intra-cluster correlation for a range of outcomes to calculate sample size. We report intra-cluster correlation coefficients observed within a large-scale cross-sectional study of general practice in Australia, where the general practitioner (GP) was the primary sampling unit and the patient encounter was the unit of inference. Methods Each year the Bettering the Evaluation and Care of Health (BEACH) study recruits a random sample of approximately 1,000 GPs across Australia. Each GP completes details of 100 consecutive patient encounters. Intra-cluster correlation coefficients were estimated for patient demographics, morbidity managed and treatments received. Intra-cluster correlation coefficients were estimated for descriptive outcomes and for associations between outcomes and predictors and were compared across two independent samples of GPs drawn three years apart. Results Between April 1999 and March 2000, a random sample of 1,047 Australian general practitioners recorded details of 104,700 patient encounters. Intra-cluster correlation coefficients for patient demographics ranged from 0.055 for patient sex to 0.451 for language spoken at home. Intra-cluster correlations for morbidity variables ranged from 0.005 for the management of eye problems to 0.059 for management of psychological problems. Intra-cluster correlation for the association between two variables was smaller than the descriptive intra-cluster correlation of each variable. When compared with the April 2002 to March 2003 sample (1,008 GPs) the estimated intra-cluster correlation coefficients were found to be consistent across samples. Conclusions The demonstrated precision and reliability of the estimated intra-cluster correlations indicate that these coefficients will be useful for calculating sample sizes in future general practice surveys that use the GP as the primary sampling unit. PMID:15613248
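To make the practical use of such coefficients concrete, the short sketch below turns a reported intra-cluster correlation into a design effect and an inflated sample size for a survey that samples through clusters. It is a minimal illustration, not code from the BEACH study; the ICC of 0.059 and the cluster size of 100 encounters per GP echo figures quoted in the abstract, while the baseline simple-random-sample size of 384 is an assumed precision target.

```python
import math

def design_effect(icc: float, cluster_size: float) -> float:
    """Design effect for equal cluster sizes: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1.0) * icc

def inflate_sample_size(n_srs: int, icc: float, cluster_size: float) -> int:
    """Observations needed under cluster sampling to match an SRS of size n_srs."""
    return math.ceil(n_srs * design_effect(icc, cluster_size))

icc, m, n_srs = 0.059, 100, 384   # illustrative values (ICC and m from the abstract)
print(f"design effect: {design_effect(icc, m):.2f}")
print(f"required encounters: {inflate_sample_size(n_srs, icc, m)}")
```

The same two functions can be reused for any outcome by swapping in the relevant coefficient from the published tables.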
NASA Astrophysics Data System (ADS)
Li, Jin; Zhang, Xian; Gong, Jinzhe; Tang, Jingtian; Ren, Zhengyong; Li, Guang; Deng, Yanli; Cai, Jin
A new technique is proposed for signal-noise identification and targeted de-noising of magnetotelluric (MT) signals. This method is based on fractal entropy and a clustering algorithm, which automatically identifies signal sections corrupted by common interference (square, triangle and pulse waves), enabling targeted de-noising and preventing the loss of useful information during filtering. To implement the technique, four characteristic parameters — fractal box dimension (FBD), Higuchi fractal dimension (HFD), fuzzy entropy (FuEn) and approximate entropy (ApEn) — are extracted from the MT time series. The fuzzy c-means (FCM) clustering technique is used to analyze the characteristic parameters and automatically distinguish signals with strong interference from the rest. The wavelet threshold (WT) de-noising method is used only to suppress the identified strong interference in selected signal sections. The technique is validated with signal samples containing known interference before being applied to a set of field-measured MT/audio-magnetotelluric (AMT) data. Compared with the conventional de-noising strategy that blindly applies the filter to the overall dataset, the proposed method can automatically identify and purposefully suppress the intermittent interference in the MT/AMT signal. The resulting apparent resistivity-phase curve is more continuous and smooth, and the slow-change trend in the low-frequency range is more precisely preserved. Moreover, the characteristics of the target-filtered MT/AMT signal are close to the essential characteristics of the natural field, and the result more accurately reflects the inherent electrical structure information of the measured site.
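As an illustration of the kind of characteristic parameter this method relies on, the sketch below computes the Higuchi fractal dimension over fixed-length windows of a time series. It is a generic implementation under assumed window and k_max settings, not the authors' code; the fuzzy c-means step that would subsequently separate interfered windows from quiet ones is only indicated in a comment.

```python
import numpy as np

def higuchi_fd(x: np.ndarray, k_max: int = 8) -> float:
    """Higuchi fractal dimension of a 1-D series (one of the four
    characteristic parameters named in the abstract)."""
    n = len(x)
    log_lk, log_inv_k = [], []
    for k in range(1, k_max + 1):
        lengths = []
        for m in range(k):
            idx = np.arange(m, n, k)
            if len(idx) < 2:
                continue
            curve = np.abs(np.diff(x[idx])).sum()
            # Higuchi (1988) normalisation of the curve length
            lengths.append(curve * (n - 1) / ((len(idx) - 1) * k) / k)
        log_lk.append(np.log(np.mean(lengths)))
        log_inv_k.append(np.log(1.0 / k))
    slope, _ = np.polyfit(log_inv_k, log_lk, 1)   # slope estimates the dimension
    return float(slope)

def window_features(signal: np.ndarray, win: int = 1024) -> np.ndarray:
    """Fractal-dimension feature for consecutive non-overlapping windows; a
    fuzzy c-means step would then separate strongly interfered windows from
    quiet ones before targeted wavelet-threshold de-noising."""
    n_win = len(signal) // win
    return np.array([higuchi_fd(signal[i * win:(i + 1) * win]) for i in range(n_win)])

# toy usage: white noise has a higher fractal dimension than a smooth sine
rng = np.random.default_rng(0)
sig = np.concatenate([np.sin(np.linspace(0, 60, 4096)), rng.normal(size=4096)])
print(window_features(sig, win=1024).round(2))
```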
Clustering on very small scales from a large, complete sample of confirmed quasar pairs
NASA Astrophysics Data System (ADS)
Eftekharzadeh, Sarah; Myers, Adam D.; Djorgovski, Stanislav G.; Graham, Matthew J.; Hennawi, Joseph F.; Mahabal, Ashish A.; Richards, Gordon T.
2016-06-01
We present by far the largest sample of spectroscopically confirmed binary quasars with proper transverse separations of 17.0 ≤ R_prop ≤ 36.6 h^-1 kpc. Our sample, which is an order of magnitude larger than previous samples, is selected from Sloan Digital Sky Survey (SDSS) imaging over an area corresponding to the SDSS 6th data release (DR6). Our quasars are targeted using a Kernel Density Estimation (KDE) technique, and confirmed using long-slit spectroscopy on a range of facilities. Our most complete sub-sample of 44 binary quasars with g < 20.85 extends across angular scales of 2.9" < Δθ < 6.3", and is targeted from a parent sample that would be equivalent to a full spectroscopic survey of nearly 300,000 quasars. We determine the projected correlation function of quasars (\bar{W}_p) over proper transverse scales of 17.0 ≤ R_prop ≤ 36.6 h^-1 kpc, and also in 4 bins of scale within this complete range. To investigate the redshift evolution of quasar clustering on small scales, we make the first self-consistent measurement of the projected quasar correlation function in 4 bins of redshift over 0.4 ≤ z ≤ 2.3.
Approaches to Recruiting 'Hard-To-Reach' Populations into Research: A Review of the Literature.
Shaghaghi, Abdolreza; Bhopal, Raj S; Sheikh, Aziz
2011-01-01
'Hard-to-reach' is a term used to describe those sub-groups of the population that may be difficult to reach or involve in research or public health programmes. Applying a single term to these sub-sections of populations implies a homogeneity within distinct groups, which does not necessarily exist. Several sampling techniques have been introduced to recruit hard-to-reach populations. In this article, we have reviewed a range of approaches that have been used to widen participation in studies. We performed a Pubmed and Google search for relevant English language articles using the keywords and phrases: (hard-to-reach AND population* OR sampl*), (hidden AND population* OR sample*) and ("hard to reach" AND population* OR sample*) and a consultation of the retrieved articles' bibliographies to extract empirical evidence from publications that discussed or examined the use of sampling techniques to recruit hidden or hard-to-reach populations in health studies. Reviewing the literature identified a range of techniques to recruit hard-to-reach populations, including snowball sampling, respondent-driven sampling (RDS), indigenous field worker sampling (IFWS), facility-based sampling (FBS), targeted sampling (TS), time-location (space) sampling (TLS), conventional cluster sampling (CCS) and capture-recapture sampling (CR). The degree of compliance with a study by a certain 'hard-to-reach' group depends on the characteristics of that group, the recruitment technique used and the subject of interest. Irrespective of the potential advantages or limitations of the recruitment techniques reviewed, their successful use depends mainly upon our knowledge of the specific characteristics of the target populations. Thus, in line with attempts to expand the current boundaries of our knowledge about recruitment techniques in health studies and their applications in varying situations, we should also focus on all contributing factors which may have an impact on participation rates within a defined population group.
NASA Astrophysics Data System (ADS)
Zitrin, Adi; Broadhurst, Tom; Barkana, Rennan; Rephaeli, Yoel; Benítez, Narciso
2011-01-01
We present the results of a strong-lensing analysis of a complete sample of 12 very luminous X-ray clusters at z > 0.5 using HST/ACS images. Our modelling technique has uncovered some of the largest known critical curves outlined by many accurately predicted sets of multiple images. The distribution of Einstein radii has a median value of ≃28 arcsec (for a source redshift of z_s ~ 2), twice as large as other lower-z samples, and extends to 55 arcsec for MACS J0717.5+3745, with an impressive enclosed Einstein mass of 7.4 × 10^14 M⊙. We find that nine clusters cover a very large area (>2.5 arcmin²) of high magnification (μ > 10×) for a source redshift of z_s ~ 8, providing primary targets for accessing the first stars and galaxies. We compare our results with theoretical predictions of the standard Λ cold dark matter (ΛCDM) model which we show systematically fall short of our measured Einstein radii by a factor of ≃1.4, after accounting for the effect of lensing projection. Nevertheless, a revised analysis, once arc redshifts become available, and similar analyses of larger samples, is needed in order to establish more precisely the level of discrepancy with ΛCDM predictions.
Melo, Armindo; Pinto, Edgar; Aguiar, Ana; Mansilha, Catarina; Pinho, Olívia; Ferreira, Isabel M P L V O
2012-07-01
A monitoring program of nitrate, nitrite, potassium, sodium, and pesticides was carried out in water samples from an intensive horticulture area in a vulnerable zone from north of Portugal. Eight collecting points were selected and water-analyzed in five sampling campaigns, during 1 year. Chemometric techniques, such as cluster analysis, principal component analysis (PCA), and discriminant analysis, were used in order to understand the impact of intensive horticulture practices on dug and drilled wells groundwater and to study variations in the hydrochemistry of groundwater. PCA performed on pesticide data matrix yielded seven significant PCs explaining 77.67% of the data variance. Although PCA rendered considerable data reduction, it could not clearly group and distinguish the sample types. However, a visible differentiation between the water samples was obtained. Cluster and discriminant analysis grouped the eight collecting points into three clusters of similar characteristics pertaining to water contamination, indicating that it is necessary to improve the use of water, fertilizers, and pesticides. Inorganic fertilizers such as potassium nitrate were suspected to be the most important factors for nitrate contamination since highly significant Pearson correlation (r = 0.691, P < 0.01) was obtained between groundwater nitrate and potassium contents. Water from dug wells is especially prone to contamination from the grower and their closer neighbor's practices. Water from drilled wells is also contaminated from distant practices.
NASA Technical Reports Server (NTRS)
Smith, A. C.
1982-01-01
Trace gases evolved from a polyimide film during its thermal curing stages have been studied using ion-induced nucleation mass spectrometry. The technique involved exposing the test gas sample to a low energy beta source and recording the masses of the ion-induced molecular clusters formed in the reaction chamber. On the basis of the experimentally observed molecular cluster spectra, it has been concluded that the dominant trace component had a molecular weight of 87 atomic mass units. This component has been identified as a molecule of dimethylacetamide (DMAC) which had been used as a solvent in the preparation of the test polyimide specimen. This identification has been further confirmed by comparing the spectra of the test gas sample and the DMAC calibration sample obtained with a conventional mass spectrometer. The advantages of the ion-induced nucleation mass spectrometer versus the conventional mass spectrometer are discussed.
Feder, Stephan; Sundermann, Benedikt; Wersching, Heike; Teuber, Anja; Kugel, Harald; Teismann, Henning; Heindel, Walter; Berger, Klaus; Pfleiderer, Bettina
2017-11-01
Combinations of resting-state fMRI and machine-learning techniques are increasingly employed to develop diagnostic models for mental disorders. However, little is known about the neurobiological heterogeneity of depression and diagnostic machine learning has mainly been tested in homogeneous samples. Our main objective was to explore the inherent structure of a diverse unipolar depression sample. The secondary objective was to assess, if such information can improve diagnostic classification. We analyzed data from 360 patients with unipolar depression and 360 non-depressed population controls, who were subdivided into two independent subsets. Cluster analyses (unsupervised learning) of functional connectivity were used to generate hypotheses about potential patient subgroups from the first subset. The relationship of clusters with demographical and clinical measures was assessed. Subsequently, diagnostic classifiers (supervised learning), which incorporated information about these putative depression subgroups, were trained. Exploratory cluster analyses revealed two weakly separable subgroups of depressed patients. These subgroups differed in the average duration of depression and in the proportion of patients with concurrently severe depression and anxiety symptoms. The diagnostic classification models performed at chance level. It remains unresolved, if subgroups represent distinct biological subtypes, variability of continuous clinical variables or in part an overfitting of sparsely structured data. Functional connectivity in unipolar depression is associated with general disease effects. Cluster analyses provide hypotheses about potential depression subtypes. Diagnostic models did not benefit from this additional information regarding heterogeneity. Copyright © 2017 Elsevier B.V. All rights reserved.
As-built design specification for proportion estimate software subsystem
NASA Technical Reports Server (NTRS)
Obrien, S. (Principal Investigator)
1980-01-01
The Proportion Estimate Processor evaluates four estimation techniques in order to get an improved estimate of the proportion of a scene that is planted in a selected crop. The four techniques to be evaluated were provided by the techniques development section and are: (1) random sampling; (2) proportional allocation, relative count estimate; (3) proportional allocation, Bayesian estimate; and (4) sequential Bayesian allocation. The user is given two options for computation of the estimated mean square error. These are referred to as the cluster calculation option and the segment calculation option. The software for the Proportion Estimate Processor is operational on the IBM 3031 computer.
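For context, a conjugate Beta-Binomial update is the simplest form a "Bayesian estimate" of a crop proportion can take; the sketch below is an illustrative stand-in for that idea, not the Proportion Estimate Processor itself, and the prior parameters and pixel counts are assumptions.

```python
# Sketch of a conjugate Beta-Binomial update for a crop-proportion estimate,
# in the spirit of the Bayesian options listed above.  Prior and counts are
# illustrative assumptions, not values from the processor.

def beta_binomial_posterior(hits: int, trials: int, a: float = 1.0, b: float = 1.0):
    """Posterior mean and variance of a proportion under a Beta(a, b) prior."""
    a_post, b_post = a + hits, b + trials - hits
    mean = a_post / (a_post + b_post)
    var = a_post * b_post / ((a_post + b_post) ** 2 * (a_post + b_post + 1))
    return mean, var

# e.g. 137 of 400 sampled pixels classified as the selected crop
mean, var = beta_binomial_posterior(137, 400)
print(f"estimated proportion {mean:.3f}, posterior variance {var:.5f}")
```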
ERIC Educational Resources Information Center
Guttmacher, Mary Johnson
A case study was conducted using a sample of 271 women selected from a state college by a stratified random cluster technique that approximates proportional representation of women in all four classes and all college majors. The data source was an extensive questionnaire designed to measure the attitudes and behavior of interest. The major…
The Impact of Tertiary Education on Development of Moderate Society in Pakistan
ERIC Educational Resources Information Center
Atika, Samrana
2010-01-01
The study aimed to find out the impact of tertiary education on development of moderate Islamic society in Pakistan. The population of the study constituted of all the teachers engaged on teaching and all the students studying in the colleges. The study was delimited to the area of public sector college education. Cluster sampling technique was…
Cebi, Nur; Yilmaz, Mustafa Tahsin; Sagdic, Osman
2017-08-15
Sibutramine may be illicitly included in herbal slimming foods and supplements marketed as "100% natural" to enhance weight loss. Considering public health and legal regulations, there is an urgent need for effective, rapid and reliable techniques to detect sibutramine in dietetic herbal foods, teas and dietary supplements. This research comprehensively explored, for the first time, the detection of sibutramine in green tea, green coffee and mixed herbal tea using the ATR-FTIR spectroscopic technique combined with chemometrics. Hierarchical cluster analysis and principal component analysis (PCA) were employed in the spectral range of 2746-2656 cm^-1 for classification and discrimination using Euclidean distance and Ward's algorithm. Unadulterated and adulterated samples were classified and discriminated with respect to their sibutramine contents with perfect accuracy, without any false prediction. The results suggest that the presence of the active substance could be successfully determined at levels in the range of 0.375-12 mg in a total of 1.75 g of green tea, green coffee and mixed herbal tea using the FTIR-ATR technique combined with chemometrics. Copyright © 2017 Elsevier Ltd. All rights reserved.
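A minimal sketch of the chemometric step described above — Ward/Euclidean hierarchical clustering restricted to a diagnostic wavenumber window — is given below. The spectra are random stand-ins and the wavenumber grid is an assumption; only the 2746-2656 cm^-1 window mirrors the abstract.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
wavenumbers = np.linspace(4000, 650, 1751)          # hypothetical ATR-FTIR grid
spectra = rng.normal(size=(20, wavenumbers.size))   # stand-in for measured spectra

# keep only the 2746-2656 cm^-1 region used for discrimination
mask = (wavenumbers <= 2746) & (wavenumbers >= 2656)
X = spectra[:, mask]

Z = linkage(X, method="ward", metric="euclidean")    # Ward's algorithm, Euclidean distance
labels = fcluster(Z, t=2, criterion="maxclust")      # e.g. adulterated vs. unadulterated
print(labels)
```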
RRW: repeated random walks on genome-scale protein networks for local cluster discovery
Macropol, Kathy; Can, Tolga; Singh, Ambuj K
2009-01-01
Background We propose an efficient and biologically sensitive algorithm based on repeated random walks (RRW) for discovering functional modules, e.g., complexes and pathways, within large-scale protein networks. Compared to existing cluster identification techniques, RRW implicitly makes use of network topology, edge weights, and long range interactions between proteins. Results We apply the proposed technique on a functional network of yeast genes and accurately identify statistically significant clusters of proteins. We validate the biological significance of the results using known complexes in the MIPS complex catalogue database and well-characterized biological processes. We find that 90% of the created clusters have the majority of their catalogued proteins belonging to the same MIPS complex, and about 80% have the majority of their proteins involved in the same biological process. We compare our method to various other clustering techniques, such as the Markov Clustering Algorithm (MCL), and find a significant improvement in the RRW clusters' precision and accuracy values. Conclusion RRW, which is a technique that exploits the topology of the network, is more precise and robust in finding local clusters. In addition, it has the added flexibility of being able to find multi-functional proteins by allowing overlapping clusters. PMID:19740439
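The core primitive behind repeated random walks is the random walk with restart; the sketch below implements that primitive on a small toy network. The restart probability and the toy graph are illustrative assumptions, and the code is not the RRW implementation itself, which additionally merges and filters the per-seed clusters.

```python
import numpy as np

def random_walk_with_restart(adj: np.ndarray, seed: int, restart: float = 0.7,
                             tol: float = 1e-8, max_iter: int = 1000) -> np.ndarray:
    """Stationary visiting probabilities of a restart walk started at `seed`.
    Nodes with the highest probabilities form a candidate local cluster."""
    col_sums = adj.sum(axis=0, keepdims=True)        # column-normalise into a
    col_sums[col_sums == 0] = 1.0                    # transition matrix
    W = adj / col_sums
    n = adj.shape[0]
    e = np.zeros(n); e[seed] = 1.0
    p = e.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * W @ p + restart * e
        if np.abs(p_next - p).sum() < tol:
            break
        p = p_next
    return p

# tiny toy network: two triangles joined by one edge
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
probs = random_walk_with_restart(A, seed=0)
print(np.argsort(probs)[::-1][:3])   # the seed's own triangle ranks highest
```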
Planck's view on the spectrum of the Sunyaev-Zeldovich effect
NASA Astrophysics Data System (ADS)
Erler, Jens; Basu, Kaustuv; Chluba, Jens; Bertoldi, Frank
2018-05-01
We present a detailed analysis of the stacked frequency spectrum of a large sample of galaxy clusters using Planck data, together with auxiliary data from the AKARI and IRAS missions. Our primary goal is to search for the imprint of relativistic corrections to the thermal Sunyaev-Zeldovich effect (tSZ) spectrum, which allow us to measure the temperature of the intracluster medium. We remove Galactic and extragalactic foregrounds with a matched filtering technique, which is validated using simulations with realistic mock data sets. The extracted spectra show the tSZ signal at high significance and reveal an additional far-infrared (FIR) excess, which we attribute to thermal emission from the galaxy clusters themselves. This excess FIR emission from clusters is accounted for in our spectral model. We are able to measure the tSZ relativistic corrections at 2.2σ by constraining the mean temperature of our cluster sample to 4.4^{+2.1}_{-2.0} keV. We repeat the same analysis on a subsample containing only the 100 hottest clusters, for which we measure the mean temperature to be 6.0^{+3.8}_{-2.9} keV, corresponding to 2.0σ. The temperature of the emitting dust grains in our FIR model is constrained to ≃20 K, consistent with previous studies. Control for systematic biases is done by fitting mock clusters, from which we also show that using the non-relativistic spectrum for SZ signal extraction will lead to a bias in the integrated Compton parameter Y, which can be up to 14% for the most massive clusters. We conclude by providing an outlook for the upcoming CCAT-prime telescope, which will improve upon Planck with lower noise and better spatial resolution.
Relative efficiency and sample size for cluster randomized trials with variable cluster sizes.
You, Zhiying; Williams, O Dale; Aban, Inmaculada; Kabagambe, Edmond Kato; Tiwari, Hemant K; Cutter, Gary
2011-02-01
The statistical power of cluster randomized trials depends on two sample size components, the number of clusters per group and the numbers of individuals within clusters (cluster size). Variable cluster sizes are common and this variation alone may have significant impact on study power. Previous approaches have taken this into account by either adjusting total sample size using a designated design effect or adjusting the number of clusters according to an assessment of the relative efficiency of unequal versus equal cluster sizes. This article defines a relative efficiency of unequal versus equal cluster sizes using noncentrality parameters, investigates properties of this measure, and proposes an approach for adjusting the required sample size accordingly. We focus on comparing two groups with normally distributed outcomes using t-test, and use the noncentrality parameter to define the relative efficiency of unequal versus equal cluster sizes and show that statistical power depends only on this parameter for a given number of clusters. We calculate the sample size required for an unequal cluster sizes trial to have the same power as one with equal cluster sizes. Relative efficiency based on the noncentrality parameter is straightforward to calculate and easy to interpret. It connects the required mean cluster size directly to the required sample size with equal cluster sizes. Consequently, our approach first determines the sample size requirements with equal cluster sizes for a pre-specified study power and then calculates the required mean cluster size while keeping the number of clusters unchanged. Our approach allows adjustment in mean cluster size alone or simultaneous adjustment in mean cluster size and number of clusters, and is a flexible alternative to and a useful complement to existing methods. Comparison indicated that we have defined a relative efficiency that is greater than the relative efficiency in the literature under some conditions. Our measure of relative efficiency might be less than the measure in the literature under some conditions, underestimating the relative efficiency. The relative efficiency of unequal versus equal cluster sizes defined using the noncentrality parameter suggests a sample size approach that is a flexible alternative and a useful complement to existing methods.
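To illustrate the kind of adjustment described, the sketch below uses the common coefficient-of-variation approximation to the design effect with unequal cluster sizes (not the authors' noncentrality-parameter definition) and solves for the mean cluster size that preserves the effective sample size while the number of clusters stays fixed. All numerical inputs are assumed.

```python
def design_effect_variable(mean_m: float, cv: float, icc: float) -> float:
    """Approximate design effect when cluster sizes vary:
    DEFF ~ 1 + ((CV^2 + 1) * m_bar - 1) * ICC, with CV = sd/mean of sizes."""
    return 1.0 + ((cv ** 2 + 1.0) * mean_m - 1.0) * icc

def required_mean_cluster_size(n_equal: int, k_clusters: int, cv: float, icc: float) -> float:
    """Mean cluster size matching the effective sample size of an
    equal-cluster-size design with n_equal subjects in k_clusters clusters."""
    m0 = n_equal / k_clusters
    target_eff = n_equal / (1.0 + (m0 - 1.0) * icc)   # effective n of the equal design
    m = m0
    for _ in range(200):                              # simple fixed-point iteration
        m = target_eff * design_effect_variable(m, cv, icc) / k_clusters
    return m

# e.g. 30 clusters of 20 subjects planned; sizes expected to vary with CV = 0.6
print(round(required_mean_cluster_size(n_equal=600, k_clusters=30, cv=0.6, icc=0.05), 1))
```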
Ntozini, Robert; Marks, Sara J; Mangwadu, Goldberg; Mbuya, Mduduzi N N; Gerema, Grace; Mutasa, Batsirai; Julian, Timothy R; Schwab, Kellogg J; Humphrey, Jean H; Zungu, Lindiwe I
2015-12-15
Access to water and sanitation are important determinants of behavioral responses to hygiene and sanitation interventions. We estimated cluster-specific water access and sanitation coverage to inform a constrained randomization technique in the SHINE trial. Technicians and engineers inspected all public access water sources to ascertain seasonality, function, and geospatial coordinates. Households and water sources were mapped using open-source geospatial software. The distance from each household to the nearest perennial, functional, protected water source was calculated, and for each cluster, the median distance and the proportion of households within <500 m and >1500 m of such a water source. Cluster-specific sanitation coverage was ascertained using a random sample of 13 households per cluster. These parameters were included as covariates in randomization to optimize balance in water and sanitation access across treatment arms at the start of the trial. The observed high variability between clusters in both parameters suggests that constraining on these factors was needed to reduce risk of bias. © The Author 2015. Published by Oxford University Press for the Infectious Diseases Society of America.
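A minimal sketch of constrained randomization in this spirit: generate candidate allocations of clusters to two arms, keep only those balanced on cluster-level water-access and sanitation covariates, and draw the final allocation from that constrained space. The covariate values, tolerances and number of candidate draws are illustrative assumptions, not SHINE trial parameters.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative cluster-level covariates (assumed values, not trial data):
# median distance to a protected water source (m) and sanitation coverage.
n_clusters = 20
distance = rng.normal(800, 300, n_clusters)
sanitation = rng.uniform(0.2, 0.8, n_clusters)

def is_balanced(assign: np.ndarray, tol_dist: float = 100.0, tol_san: float = 0.05) -> bool:
    """Accept an allocation only if the two arms are close on both covariates."""
    a, b = assign == 0, assign == 1
    return (abs(distance[a].mean() - distance[b].mean()) < tol_dist
            and abs(sanitation[a].mean() - sanitation[b].mean()) < tol_san)

# build the constrained randomisation space from candidate draws, then pick one
candidates = (rng.permutation(np.repeat([0, 1], n_clusters // 2)) for _ in range(20000))
space = [assign for assign in candidates if is_balanced(assign)]
chosen = space[rng.integers(len(space))]
print(f"{len(space)} balanced allocations; chosen arms: {chosen}")
```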
Spectral characteristics and the extent of paleosols of the Palouse formation
NASA Technical Reports Server (NTRS)
Frazier, B. E.; Busacca, A.; Cheng, Y.; Wherry, D.; Hart, J.; Gill, S.
1986-01-01
Spectral relationships were investigated for several bare soil fields which were in summer fallow rotation on the date of the imagery. Printouts of each band were examined and compared to aerial photography. Bands with dissimilar reflectance patterns for known areas were then combined using ratio techniques which were proven useful in other studies (Williams, 1983). Selected ratios were Thematic Mapper (TM) 1/TM4, TM3/TM4, and TM5/TM4. Cluster analyses and Bayesian and Fastclass classifier images were produced using the three ratio images. Plots of cluster analysis outputs revealed distinct groupings of reflectance data representing green crops, ripened crops, soil and green plants, and bare soil. Bare soil was represented by a line of clusters on plots of the ratios TM5/TM4 and TM3/TM4. The soil line was investigated further to determine factors involved in the distribution of clusters along the line. The clusters representing the bare soil line were also studied by plotting the TM5/TM4, TM1/TM4 dimension. A total of 76 soil samples were gathered and analyzed for organic carbon.
NASA Technical Reports Server (NTRS)
Bonamente, Massimillano; Joy, Marshall K.; Carlstrom, John E.; Reese, Erik D.; LaRoque, Samuel J.
2004-01-01
X-ray and Sunyaev-Zel'dovich effect data can be combined to determine the distance to galaxy clusters. High-resolution X-ray data are now available from Chandra, which provides both spatial and spectral information, and Sunyaev-Zel'dovich effect data were obtained from the BIMA and Owens Valley Radio Observatory (OVRO) arrays. We introduce a Markov Chain Monte Carlo procedure for the joint analysis of X-ray and Sunyaev-Zel'dovich effect data. The advantages of this method are the high computational efficiency and the ability to measure simultaneously the probability distribution of all parameters of interest, such as the spatial and spectral properties of the cluster gas, as well as derived quantities such as the distance to the cluster. We demonstrate this technique by applying it to the Chandra X-ray data and the OVRO radio data for the galaxy cluster A611. Comparisons with traditional likelihood ratio methods reveal the robustness of the method. This method will be used in a follow-up paper to determine the distances to a large sample of galaxy clusters.
Bosomprah, Samuel; Dotse-Gborgbortsi, Winfred; Aboagye, Patrick; Matthews, Zoe
2016-11-01
To identify and evaluate clusters of births that occurred outside health facilities in Ghana for targeted intervention. A retrospective study was conducted using a convenience sample of live births registered in Ghanaian health facilities from January 1 to December 31, 2014. Data were extracted from the district health information system. A spatial scan statistic was used to investigate clusters of home births through a discrete Poisson probability model. Scanning with a circular spatial window was conducted only for clusters with high rates of such deliveries. The district was used as the geographic unit of analysis. The likelihood P value was estimated using Monte Carlo simulations. Ten statistically significant clusters with a high rate of home birth were identified. The relative risks ranged from 1.43 ("least likely" cluster; P=0.001) to 1.95 ("most likely" cluster; P=0.001). The relative risks of the top five "most likely" clusters ranged from 1.68 to 1.95; these clusters were located in the Ashanti, Brong Ahafo, Western, Eastern, and Greater Accra regions. Health facility records, geospatial techniques, and geographic information systems provided locally relevant information to assist policy makers in delivering targeted interventions to small geographic areas. Copyright © 2016 International Federation of Gynecology and Obstetrics. Published by Elsevier Ireland Ltd. All rights reserved.
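The sketch below outlines a simplified circular spatial scan under a discrete Poisson model with a Monte Carlo p-value, the approach named in the abstract (in practice this is usually done with dedicated software such as SaTScan). The district coordinates, populations and birth counts in the demo are simulated assumptions.

```python
import numpy as np

def poisson_llr(c_in: float, e_in: float, c_tot: float, e_tot: float) -> float:
    """Kulldorff log-likelihood ratio for a high-rate circular zone."""
    if c_in <= e_in:
        return 0.0
    c_out, e_out = c_tot - c_in, e_tot - e_in
    if c_out <= 0:
        return c_in * np.log(c_in / e_in)
    return c_in * np.log(c_in / e_in) + c_out * np.log(c_out / e_out)

def scan_statistic(xy, cases, pop, max_frac=0.5, n_sim=199, rng=None):
    """Best circular high-rate cluster and its Monte Carlo p-value."""
    rng = rng or np.random.default_rng(0)
    c_tot, p_tot = float(cases.sum()), float(pop.sum())

    def best_llr(case_vec):
        exp_vec = pop * case_vec.sum() / p_tot          # expected counts under the null
        best = 0.0
        for i in range(len(xy)):                        # every unit is a candidate centre
            order = np.argsort(np.linalg.norm(xy - xy[i], axis=1))
            c_in = e_in = p_in = 0.0
            for j in order:                             # grow the circular window outwards
                c_in += case_vec[j]; e_in += exp_vec[j]; p_in += pop[j]
                if p_in > max_frac * p_tot:
                    break
                best = max(best, poisson_llr(c_in, e_in, case_vec.sum(), exp_vec.sum()))
        return best

    observed = best_llr(cases)
    null = [best_llr(rng.multinomial(int(c_tot), pop / p_tot)) for _ in range(n_sim)]
    p_value = (1 + sum(llr >= observed for llr in null)) / (n_sim + 1)
    return observed, p_value

# toy demo with simulated district centroids, populations and birth counts
rng = np.random.default_rng(4)
xy = rng.uniform(0, 100, size=(20, 2))
pop = rng.integers(5000, 20000, size=20).astype(float)
cases = rng.poisson(pop * 0.002).astype(float)
cases[:3] += 40.0                                       # inject an artificial excess
print(scan_statistic(xy, cases, pop, rng=rng))
```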
Baxter, E. J.; Keisler, R.; Dodelson, S.; ...
2015-06-22
Clusters of galaxies are expected to gravitationally lens the cosmic microwave background (CMB) and thereby generate a distinct signal in the CMB on arcminute scales. Measurements of this effect can be used to constrain the masses of galaxy clusters with CMB data alone. Here we present a measurement of lensing of the CMB by galaxy clusters using data from the South Pole Telescope (SPT). We also develop a maximum likelihood approach to extract the CMB cluster lensing signal and validate the method on mock data. We quantify the effects on our analysis of several potential sources of systematic error and find that they generally act to reduce the best-fit cluster mass. It is estimated that this bias to lower cluster mass is roughly 0.85σ in units of the statistical error bar, although this estimate should be viewed as an upper limit. Furthermore, we apply our maximum likelihood technique to 513 clusters selected via their Sunyaev–Zeldovich (SZ) signatures in SPT data, and rule out the null hypothesis of no lensing at 3.1σ. The lensing-derived mass estimate for the full cluster sample is consistent with that inferred from the SZ flux: M_{200,lens} = 0.83^{+0.38}_{-0.37} M_{200,SZ} (68% C.L., statistical error only).
NASA Astrophysics Data System (ADS)
Davis, C.; Rozo, E.; Roodman, A.; Alarcon, A.; Cawthon, R.; Gatti, M.; Lin, H.; Miquel, R.; Rykoff, E. S.; Troxel, M. A.; Vielzeuf, P.; Abbott, T. M. C.; Abdalla, F. B.; Allam, S.; Annis, J.; Bechtol, K.; Benoit-Lévy, A.; Bertin, E.; Brooks, D.; Buckley-Geer, E.; Burke, D. L.; Carnero Rosell, A.; Carrasco Kind, M.; Carretero, J.; Castander, F. J.; Crocce, M.; Cunha, C. E.; D'Andrea, C. B.; da Costa, L. N.; Desai, S.; Diehl, H. T.; Doel, P.; Drlica-Wagner, A.; Fausti Neto, A.; Flaugher, B.; Fosalba, P.; Frieman, J.; García-Bellido, J.; Gaztanaga, E.; Gerdes, D. W.; Giannantonio, T.; Gruen, D.; Gruendl, R. A.; Gutierrez, G.; Honscheid, K.; Jain, B.; James, D. J.; Jeltema, T.; Krause, E.; Kuehn, K.; Kuhlmann, S.; Kuropatkin, N.; Lahav, O.; Li, T. S.; Lima, M.; March, M.; Marshall, J. L.; Martini, P.; Melchior, P.; Ogando, R. L. C.; Plazas, A. A.; Romer, A. K.; Sanchez, E.; Scarpine, V.; Schindler, R.; Schubnell, M.; Sevilla-Noarbe, I.; Smith, M.; Soares-Santos, M.; Sobreira, F.; Suchyta, E.; Swanson, M. E. C.; Tarle, G.; Thomas, D.; Vikram, V.; Walker, A. R.; Wechsler, R. H.
2018-06-01
Galaxy cross-correlations with high-fidelity redshift samples hold the potential to precisely calibrate systematic photometric redshift uncertainties arising from the unavailability of complete and representative training and validation samples of galaxies. However, application of this technique in the Dark Energy Survey (DES) is hampered by the relatively low number density, small area, and modest redshift overlap between photometric and spectroscopic samples. We propose instead using photometric catalogues with reliable photometric redshifts for photo-z calibration via cross-correlations. We verify the viability of our proposal using redMaPPer clusters from the Sloan Digital Sky Survey (SDSS) to successfully recover the redshift distribution of SDSS spectroscopic galaxies. We demonstrate how to combine photo-z with cross-correlation data to calibrate photometric redshift biases while marginalizing over possible clustering bias evolution in either the calibration or unknown photometric samples. We apply our method to DES Science Verification (DES SV) data in order to constrain the photometric redshift distribution of a galaxy sample selected for weak lensing studies, constraining the mean of the tomographic redshift distributions to a statistical uncertainty of Δz ˜ ±0.01. We forecast that our proposal can, in principle, control photometric redshift uncertainties in DES weak lensing experiments at a level near the intrinsic statistical noise of the experiment over the range of redshifts where redMaPPer clusters are available. Our results provide strong motivation to launch a programme to fully characterize the systematic errors from bias evolution and photo-z shapes in our calibration procedure.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Solaimani, Mohiuddin; Iftekhar, Mohammed; Khan, Latifur
Anomaly detection refers to the identification of an irregular or unusual pattern which deviates from what is standard, normal, or expected. Such deviated patterns typically correspond to samples of interest and are assigned different labels in different domains, such as outliers, anomalies, exceptions, or malware. Detecting anomalies in fast, voluminous streams of data is a formidable challenge. This paper presents a novel, generic, real-time distributed anomaly detection framework for heterogeneous streaming data where anomalies appear as a group. We have developed a distributed statistical approach to build a model and later use it to detect anomalies. As a case study, we investigate group anomaly detection for a VMware-based cloud data center, which maintains a large number of virtual machines (VMs). We have built our framework using Apache Spark to get higher throughput and lower data processing time on streaming data. We have developed a window-based statistical anomaly detection technique to detect anomalies that appear sporadically. We then relaxed this constraint with higher accuracy by implementing a cluster-based technique to detect sporadic and continuous anomalies. We conclude that our cluster-based technique outperforms other statistical techniques with higher accuracy and lower processing time.
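As a much-reduced stand-in for the window-based statistical technique mentioned above (the actual framework is distributed and built on Apache Spark), the sketch below flags values that deviate strongly from a sliding-window mean; the window length, threshold and metric stream are assumptions.

```python
from collections import deque
import math

class WindowAnomalyDetector:
    """Minimal sliding-window z-score detector for a univariate metric stream
    (e.g. one VM's CPU utilisation).  Values far from the window mean are
    flagged; a cluster-based variant would group correlated metrics first."""

    def __init__(self, window: int = 100, threshold: float = 3.0):
        self.buffer = deque(maxlen=window)
        self.threshold = threshold

    def update(self, value: float) -> bool:
        """Return True if `value` is anomalous relative to the current window."""
        is_anomaly = False
        if len(self.buffer) >= 10:                      # need a minimal history
            mean = sum(self.buffer) / len(self.buffer)
            var = sum((v - mean) ** 2 for v in self.buffer) / len(self.buffer)
            std = math.sqrt(var) or 1e-9
            is_anomaly = abs(value - mean) / std > self.threshold
        self.buffer.append(value)
        return is_anomaly

detector = WindowAnomalyDetector()
stream = [10 + 0.1 * (i % 7) for i in range(200)] + [25.0]   # a late spike
flags = [detector.update(v) for v in stream]
print(sum(flags), "anomalous samples flagged")
```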
NASA Technical Reports Server (NTRS)
Bonamente, Massimiliano; Joy, Marshall K.; Carlstrom, John E.; LaRoque, Samuel J.
2004-01-01
X-ray and Sunyaev-Zeldovich Effect data can be combined to determine the distance to galaxy clusters. High-resolution X-ray data are now available from the Chandra Observatory, which provides both spatial and spectral information, and interferometric radio measurements of the Sunyaev-Zeldovich Effect are available from the BIMA and OVRO arrays. We introduce a Markov chain Monte Carlo procedure for the joint analysis of X-ray and Sunyaev-Zeldovich Effect data. The advantages of this method are the high computational efficiency and the ability to measure the full probability distribution of all parameters of interest, such as the spatial and spectral properties of the cluster gas and the cluster distance. We apply this technique to the Chandra X-ray data and the OVRO radio data for the galaxy cluster Abell 611. Comparisons with traditional likelihood-ratio methods reveal the robustness of the method. This method will be used in a follow-up paper to determine the distance of a large sample of galaxy clusters for which high-resolution Chandra X-ray and BIMA/OVRO radio data are available.
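The sketch below shows the generic random-walk Metropolis machinery that underlies this kind of Markov chain Monte Carlo analysis; the two-parameter Gaussian toy posterior is an illustrative stand-in for the actual joint X-ray/SZ cluster model, and the step size and chain length are assumptions.

```python
import numpy as np

def metropolis(log_post, x0, n_steps=20000, step=0.1, rng=None):
    """Random-walk Metropolis sampler returning the chain of parameter draws."""
    rng = rng or np.random.default_rng(1)
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    lp = log_post(x)
    chain = np.empty((n_steps, x.size))
    for i in range(n_steps):
        prop = x + step * rng.normal(size=x.size)
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:      # accept/reject
            x, lp = prop, lp_prop
        chain[i] = x
    return chain

# toy stand-in posterior: two correlated "shape" and "distance-like" parameters
def log_post(theta):
    a, b = theta
    return -0.5 * (a ** 2 + (b - 2.0 * a) ** 2)

chain = metropolis(log_post, x0=[0.0, 0.0])
print(chain[5000:].mean(axis=0), chain[5000:].std(axis=0))
```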
The evaluation of alternate methodologies for land cover classification in an urbanizing area
NASA Technical Reports Server (NTRS)
Smekofski, R. M.
1981-01-01
The usefulness of LANDSAT in classifying land cover and in identifying and classifying land use change was investigated using an urbanizing area as the study area. The question of what was the best technique for classification was the primary focus of the study. The many computer-assisted techniques available to analyze LANDSAT data were evaluated. Techniques of statistical training (polygons from CRT, unsupervised clustering, polygons from digitizer and binary masks) were tested with minimum distance to the mean, maximum likelihood and canonical analysis with minimum distance to the mean classifiers. The twelve output images were compared to photointerpreted samples, ground verified samples and a current land use data base. Results indicate that for a reconnaissance inventory, the unsupervised training with canonical analysis-minimum distance classifier is the most efficient. If more detailed ground truth and ground verification is available, the polygons from the digitizer training with the canonical analysis minimum distance is more accurate.
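For reference, the minimum-distance-to-the-mean classifier evaluated above reduces to a few lines: estimate a mean signature per class from training pixels, then assign each pixel to the nearest class mean. The band values below are simulated assumptions, not LANDSAT data.

```python
import numpy as np

def train_class_means(X: np.ndarray, y: np.ndarray) -> dict:
    """Mean spectral signature per land-cover class from training pixels."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def min_distance_classify(X: np.ndarray, means: dict) -> np.ndarray:
    """Assign each pixel to the class whose mean is nearest (Euclidean)."""
    classes = list(means)
    d = np.stack([np.linalg.norm(X - means[c], axis=1) for c in classes], axis=1)
    return np.array(classes)[d.argmin(axis=1)]

# toy 4-band pixels for two classes (illustrative values only)
rng = np.random.default_rng(3)
X_train = np.vstack([rng.normal(50, 5, (30, 4)), rng.normal(120, 5, (30, 4))])
y_train = np.array([0] * 30 + [1] * 30)
means = train_class_means(X_train, y_train)
print(min_distance_classify(rng.normal(120, 5, (5, 4)), means))
```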
NASA Astrophysics Data System (ADS)
Zhou, Shuguang; Zhou, Kefa; Wang, Jinlin; Yang, Genfang; Wang, Shanshan
2017-12-01
Cluster analysis is a well-known technique that is used to analyze various types of data. In this study, cluster analysis is applied to geochemical data that describe 1444 stream sediment samples collected in northwestern Xinjiang with a sample spacing of approximately 2 km. Three algorithms (the hierarchical, k-means, and fuzzy c-means algorithms) and six data transformation methods (the z-score standardization, ZST; the logarithmic transformation, LT; the additive log-ratio transformation, ALT; the centered log-ratio transformation, CLT; the isometric log-ratio transformation, ILT; and no transformation, NT) are compared in terms of their effects on the cluster analysis of the geochemical compositional data. The study shows that, on the one hand, the ZST does not affect the results of column- or variable-based (R-type) cluster analysis, whereas the other methods, including the LT, the ALT, and the CLT, have substantial effects on the results. On the other hand, the results of the row- or observation-based (Q-type) cluster analysis obtained from the geochemical data after applying NT and the ZST are relatively poor. However, we derive some improved results from the geochemical data after applying the CLT, the ILT, the LT, and the ALT. Moreover, the k-means and fuzzy c-means clustering algorithms are more reliable than the hierarchical algorithm when they are used to cluster the geochemical data. We apply cluster analysis to the geochemical data to explore for Au deposits within the study area, and we obtain a good correlation between the results retrieved by combining the CLT or the ILT with the k-means or fuzzy c-means algorithms and the potential zones of Au mineralization. Therefore, we suggest that the combination of the CLT or the ILT with the k-means or fuzzy c-means algorithms is an effective tool to identify potential zones of mineralization from geochemical data.
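A minimal sketch of the pipeline favoured by the comparison above — a centered log-ratio transform followed by k-means — is given below, assuming a synthetic compositional table in place of the stream sediment data.

```python
import numpy as np
from sklearn.cluster import KMeans

def clr(comp: np.ndarray) -> np.ndarray:
    """Centered log-ratio transform of compositional rows (all parts > 0)."""
    logc = np.log(comp)
    return logc - logc.mean(axis=1, keepdims=True)

# toy compositional table: 100 samples x 5 elements, rows closed to 100%
rng = np.random.default_rng(7)
raw = rng.lognormal(mean=0.0, sigma=0.5, size=(100, 5))
comp = raw / raw.sum(axis=1, keepdims=True) * 100.0

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(clr(comp))
print(np.bincount(labels))
```

The isometric log-ratio transform or a fuzzy c-means step could be swapped in at the same two points of this pipeline.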
Multiscale visual quality assessment for cluster analysis with self-organizing maps
NASA Astrophysics Data System (ADS)
Bernard, Jürgen; von Landesberger, Tatiana; Bremm, Sebastian; Schreck, Tobias
2011-01-01
Cluster analysis is an important data mining technique for analyzing large amounts of data, reducing many objects to a limited number of clusters. Cluster visualization techniques aim at supporting the user in better understanding the characteristics and relationships among the found clusters. While promising approaches to visual cluster analysis already exist, these usually fall short of incorporating the quality of the obtained clustering results. However, due to the nature of the clustering process, quality is an important aspect, as for most practical data sets many different clusterings are typically possible. Being aware of clustering quality is important to judge the expressiveness of a given cluster visualization, or to adjust the clustering process with refined parameters, among other tasks. In this work, we present an encompassing suite of visual tools for quality assessment of an important cluster algorithm, namely the Self-Organizing Map (SOM) technique. We define, measure, and visualize the notion of SOM cluster quality along a hierarchy of cluster abstractions. The quality abstractions range from simple scalar-valued quality scores up to the structural comparison of a given SOM clustering with the output of additional supportive clustering methods. The suite of methods allows the user to assess the SOM quality on the appropriate abstraction level and arrive at improved clustering results. We implement our tools in an integrated system, apply them to experimental data sets, and show their applicability.
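As a sketch of the simplest level of the quality hierarchy mentioned above (a scalar-valued score), the snippet below trains a tiny NumPy SOM and reports its mean quantization error; the 5x5 map, toy data, and learning schedule are assumptions, not the authors' system.

```python
# Train a minimal SOM and compute a scalar quality score (quantization error).
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(size=(300, 4))                      # toy data set
grid = rng.normal(size=(5, 5, 4))                     # 5x5 map of 4-d prototypes

def best_matching_unit(w, x):
    d = np.linalg.norm(w - x, axis=2)
    return np.unravel_index(np.argmin(d), d.shape)

# Online SOM training with shrinking neighbourhood and learning rate.
for t in range(2000):
    x = data[rng.integers(len(data))]
    bi, bj = best_matching_unit(grid, x)
    lr, sigma = 0.5 * np.exp(-t / 1000), 2.0 * np.exp(-t / 1000)
    ii, jj = np.meshgrid(np.arange(5), np.arange(5), indexing="ij")
    h = np.exp(-((ii - bi) ** 2 + (jj - bj) ** 2) / (2 * sigma ** 2))
    grid += lr * h[..., None] * (x - grid)

# Quantization error: mean distance of each sample to its best-matching unit.
qe = np.mean([np.linalg.norm(grid[best_matching_unit(grid, x)] - x) for x in data])
print("mean quantization error:", round(float(qe), 3))
```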
Red giants and yellow stragglers in the young open cluster NGC 2447
NASA Astrophysics Data System (ADS)
da Silveira, M. D.; Pereira, C. B.; Drake, N. A.
2018-06-01
In this work we analysed, using high-resolution spectroscopy, a sample of 12 single and 4 spectroscopic binary stars of the open cluster NGC 2447. For the single stars, we obtained atmospheric parameters and chemical abundances of Li, C, N, O, Na, Mg, Al, Ca, Si, Ti, Ni, Cr, Y, Zr, La, Ce, Nd, and Eu. Rotational velocities were obtained for all the stars. The abundances of the light elements and Eu, as well as the rotational velocities, were derived using the spectral synthesis technique. We obtained a mean metallicity of [Fe/H] = -0.17 ± 0.05. We found that the abundances of all elements are similar to those of field giants and/or giants of open clusters, even for the s-process elements, which are enhanced as in other young open clusters. We show that the spectroscopic binaries NGC 2447-26, 38, and 42 are yellow-straggler stars, in which the primary is a giant star and the secondary a main-sequence A-type star.
Preparation of graphene on Cu foils by ion implantation with negative carbon clusters
NASA Astrophysics Data System (ADS)
Li, Hui; Shang, Yan-Xia; Zhang, Zao-Di; Wang, Ze-Song; Zhang, Rui; Fu, De-Jun
2015-01-01
We report on few-layer graphene synthesized on Cu foils by ion implantation using negative carbon cluster ions, followed by annealing at 950 °C in vacuum. Raman spectroscopy reveals IG/I2D values varying from 1.55 to 2.38 depending on the energy and dose of the cluster ions, indicating the formation of multilayer graphene. The measurements show that the samples with more graphene layers have fewer defects. This is attributed to graphene growth seeded by the first layers formed via outward diffusion of C from the Cu foil, though nonlinear damage and smoothing effects also play a role. Cluster ion implantation overcomes the solubility limit of carbon in Cu, providing a technique for multilayer graphene synthesis. Project supported by the National Natural Science Foundation of China (Grant Nos. 11105100, 11205116, and 11375135) and the State Key Laboratory of Advanced Welding and Joining, Harbin Institute of Technology, China (Grant No. AWJ-M13-03).
The electronic structure of Au25 clusters: between discrete and continuous
NASA Astrophysics Data System (ADS)
Katsiev, Khabiboulakh; Lozova, Nataliya; Wang, Lu; Sai Krishna, Katla; Li, Ruipeng; Mei, Wai-Ning; Skrabalak, Sara E.; Kumar, Challa S. S. R.; Losovyj, Yaroslav
2016-08-01
Here, an approach based on synchrotron resonant photoemission is employed to explore the transition between quantization and hybridization of the electronic structure in atomically precise ligand-stabilized nanoparticles. While the presence of ligands maintains quantization in Au25 clusters, their removal renders increased hybridization of the electronic states in the vicinity of the Fermi level. These observations are supported by DFT studies. Electronic supplementary information (ESI) available: experimental details including chemicals, sample preparation, and characterization methods; computation techniques; SV-AUC, GIWAXS, XPS, UPS, MALDI-TOF, and ESI data of Au25 clusters. See DOI: 10.1039/c6nr02374f
Bonizzi, I; Buffoni, J N; Feligini, M; Enne, G
2009-10-01
To assess the bacterial biodiversity level in bovine raw milk used to produce Fontina, a Protected Designation of Origin cheese manufactured at high-altitude pastures and in valleys of the Valle d'Aosta region (north-western Italian Alps) without any starters, and to study the relation between microbial composition and pasture altitude, in order to distinguish high-altitude milk from valley and lowland milk. The microflora from milks sampled at different alpine pasture, valley, and lowland farms was fingerprinted by PCR of the 16S-23S intergenic transcribed spacers (ITS-PCR). The resulting band patterns were analysed by generalized multivariate statistical techniques to handle discrete (band presence-absence) and continuous (altitude) information. The fingerprints featured numerous bands and marked variability, indicating complex, differentiated bacterial communities. Alpine pasture milks were distinguished from lowland ones by cluster analysis, while this technique discriminated alpine pasture and valley samples less clearly. Generalized principal component analysis and clustering-after-ordination enabled a more effective distinction of alpine pasture, valley, and lowland samples. Alpine raw milks for Fontina production contain highly diverse bacterial communities, the composition of which is related to the altitude of the pasture where the milk was produced. This research may provide analytical support to the important issue of authenticating the geographical origin of alpine milk productions.
Unsupervised classification of remote multispectral sensing data
NASA Technical Reports Server (NTRS)
Su, M. Y.
1972-01-01
The new unsupervised classification technique for classifying multispectral remote sensing data, which can come either from a multispectral scanner or from digitized color-separation aerial photographs, consists of two parts: (a) a sequential statistical clustering, which is a one-pass sequential variance analysis, and (b) a generalized K-means clustering. In this composite clustering technique, the output of (a) is a set of initial clusters which are input to (b) for further improvement by an iterative scheme. Applications of the technique using an IBM-7094 computer on multispectral data sets over Purdue's Flight Line C-1 and the Yellowstone National Park test site have been accomplished. Comparisons between the classification maps produced by the unsupervised technique and by the supervised maximum likelihood technique indicate that the classification accuracies are in agreement.
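A compact sketch of the two-part composite idea: a one-pass sequential clustering produces initial clusters whose centres then seed a K-means refinement. The distance threshold and synthetic data are assumptions for illustration, not the original variance-analysis pass.

```python
# One-pass sequential clustering to seed K-means (composite clustering sketch).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(m, 0.5, size=(100, 2)) for m in (0, 3, 6)])

def sequential_clusters(X, threshold=1.5):
    """Assign each point to the nearest existing centre, or open a new
    cluster if every centre is farther than the threshold."""
    centres, counts = [X[0].copy()], [1]
    for x in X[1:]:
        d = [np.linalg.norm(x - c) for c in centres]
        k = int(np.argmin(d))
        if d[k] <= threshold:
            counts[k] += 1
            centres[k] += (x - centres[k]) / counts[k]   # running-mean update
        else:
            centres.append(x.copy()); counts.append(1)
    return np.array(centres)

init = sequential_clusters(X)
km = KMeans(n_clusters=len(init), init=init, n_init=1).fit(X)
print("initial clusters:", len(init), "final inertia:", round(km.inertia_, 2))
```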
Linking Associations of Rare Low-Abundance Species to Their Environments by Association Networks
Karpinets, Tatiana V.; Gopalakrishnan, Vancheswaran; Wargo, Jennifer; ...
2018-03-07
Studies of microbial communities by targeted sequencing of rRNA genes lead to recovering numerous rare low-abundance taxa with unknown biological roles. We propose to study associations of such rare organisms with their environments by a computational framework based on transformation of the data into qualitative variables. Namely, we analyze the sparse table of putative species or OTUs (operational taxonomic units) and samples generated in such studies, also known as an OTU table, by collecting statistics on co-occurrences of the species and on shared species richness across samples. Based on the statistics we built two association networks, of the rare putative species and of the samples respectively, using a known computational technique, Association networks (Anets), developed for analysis of qualitative data. Clusters of samples and clusters of OTUs are then integrated and combined with metadata of the study to produce a map of associated putative species in their environments. We tested and validated the framework on two types of microbiomes, of human body sites and that of the Populus tree root systems. We show that in both studies the associations of OTUs can separate samples according to environmental or physiological characteristics of the studied systems.
The use of cluster sampling to determine aid needs in Grozny, Chechnya in 1995.
Drysdale, S; Howarth, J; Powell, V; Healing, T
2000-09-01
War broke out in Chechnya in November 1994 following a three-year economic blockade. It caused widespread destruction in the capital Grozny. In April 1995 Medical Relief International--or Merlin, a British medical non-governmental organisation (NGO)--began a programme to provide medical supplies, support health centres, control communicable disease and promote preventive health-care in Grozny. In July 1995 the agency undertook a city-wide needs assessment using a modification of the cluster sampling technique developed by the Expanded Programme on Immunisation. This showed that most people had enough drinking-water, food and fuel but that provision of medical care was inadequate. The survey allowed Merlin to redirect resources earmarked for a clean water programme towards health education and improving primary health-care services. It also showed that rapid assessment by a statistically satisfactory method is both possible and useful in such a situation.
Electrical Load Profile Analysis Using Clustering Techniques
NASA Astrophysics Data System (ADS)
Damayanti, R.; Abdullah, A. G.; Purnama, W.; Nandiyanto, A. B. D.
2017-03-01
Data mining is one of the data processing techniques used to extract information from a set of stored data. Every day the consumption of electrical load is recorded by the electrical company, usually at intervals of 15 or 30 minutes. This paper uses clustering, one of the data mining techniques, to analyse the electrical load profiles recorded during 2014. Three clustering methods were compared, namely K-Means (KM), Fuzzy C-Means (FCM), and K-Harmonic Means (KHM). The results show that KHM is the most appropriate method to classify the electrical load profiles. The optimum number of clusters is determined using the Davies-Bouldin Index. By grouping load profiles with similar patterns, demand-variation analysis and estimation of energy losses can be carried out for each group. For each cluster of load profiles, the cluster load factor and the range of the cluster loss factor can be determined, which helps to estimate the coefficients for energy-loss estimation without performing load flow studies.
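A short sketch of choosing the number of clusters for daily load profiles with the Davies-Bouldin index (lower is better); K-means stands in for the KHM variant used in the paper, and the random half-hourly profiles are illustrative.

```python
# Scan cluster counts and pick the one with the lowest Davies-Bouldin index.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score

rng = np.random.default_rng(4)
profiles = rng.random((365, 48))          # one year of half-hourly load profiles

scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(profiles)
    scores[k] = davies_bouldin_score(profiles, labels)

best_k = min(scores, key=scores.get)
print("Davies-Bouldin scores:", scores, "-> optimum k =", best_k)
```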
Lalonde, Michel; Wells, R Glenn; Birnie, David; Ruddy, Terrence D; Wassenaar, Richard
2014-07-01
Phase analysis of single photon emission computed tomography (SPECT) radionuclide angiography (RNA) has been investigated for its potential to predict the outcome of cardiac resynchronization therapy (CRT). However, phase analysis may be limited in its potential at predicting CRT outcome, as valuable information may be lost by assuming that time-activity curves (TAC) follow a simple sinusoidal shape. A new method, cluster analysis, is proposed which directly evaluates the TACs and may lead to a better understanding of dyssynchrony patterns and CRT outcome. Cluster analysis algorithms were developed and optimized to maximize their ability to predict CRT response. Forty-nine patients (27 with ischemic etiology) received a SPECT RNA scan as well as positron emission tomography (PET) perfusion and viability scans prior to undergoing CRT. A semiautomated algorithm sampled the left ventricle wall to produce 568 TACs from SPECT RNA data. The TACs were then subjected to two different cluster analysis techniques, K-means and normal average, where several input metrics were also varied to determine the optimal settings for the prediction of CRT outcome. Each TAC was assigned to a cluster group based on the comparison criteria, and global and segmental cluster sizes and scores were used as measures of dyssynchrony and used to predict response to CRT. A repeated random twofold cross-validation technique was used to train and validate the cluster algorithm. Receiver operating characteristic (ROC) analysis was used to calculate the area under the curve (AUC) and compare results to those obtained for SPECT RNA phase analysis and PET scar size analysis methods. Using the normal average cluster analysis approach, the septal wall produced statistically significant results for predicting CRT response in the ischemic population (ROC AUC = 0.73; p < 0.05 vs. equal chance ROC AUC = 0.50), with an optimal operating point of 71% sensitivity and 60% specificity. Cluster analysis results were similar to SPECT RNA phase analysis (ROC AUC = 0.78, p = 0.73 vs. cluster AUC; sensitivity/specificity = 59%/89%) and PET scar size analysis (ROC AUC = 0.73, p = 1.0 vs. cluster AUC; sensitivity/specificity = 76%/67%). A SPECT RNA cluster analysis algorithm was developed for the prediction of CRT outcome. Cluster analysis produced results equivalent to those obtained from Fourier and scar analysis.
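A toy sketch of the workflow: cluster time-activity curves per patient, derive a simple dyssynchrony score, and evaluate it against response with ROC AUC. The synthetic TACs, the "minority-cluster fraction" score, and the random response labels are illustrative assumptions, not the study's actual pipeline.

```python
# Cluster per-patient TACs with K-means and score a dyssynchrony measure by ROC AUC.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(5)
n_patients, n_tacs, n_frames = 49, 568, 16

scores, response = [], rng.integers(0, 2, n_patients)   # hypothetical CRT outcomes
for _ in range(n_patients):
    phase_shift = rng.normal(0, 0.5, n_tacs)[:, None]
    t = np.linspace(0, 2 * np.pi, n_frames)
    tacs = np.cos(t + phase_shift) + rng.normal(0, 0.2, (n_tacs, n_frames))
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(tacs)
    minority = min(np.bincount(labels)) / n_tacs         # crude segmental cluster size
    scores.append(minority)

print("ROC AUC of toy dyssynchrony score:", round(roc_auc_score(response, scores), 2))
```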
Blooming Trees: Substructures and Surrounding Groups of Galaxy Clusters
NASA Astrophysics Data System (ADS)
Yu, Heng; Diaferio, Antonaldo; Serra, Ana Laura; Baldi, Marco
2018-06-01
We develop the Blooming Tree Algorithm, a new technique that uses spectroscopic redshift data alone to identify the substructures and the surrounding groups of galaxy clusters, along with their member galaxies. Based on the estimated binding energy of galaxy pairs, the algorithm builds a binary tree that hierarchically arranges all of the galaxies in the field of view. The algorithm searches for buds, corresponding to gravitational potential minima on the binary tree branches; for each bud, the algorithm combines the number of galaxies, their velocity dispersion, and their average pairwise distance into a parameter that discriminates between the buds that do not correspond to any substructure or group, and thus eventually die, and the buds that correspond to substructures and groups, and thus bloom into the identified structures. We test our new algorithm with a sample of 300 mock redshift surveys of clusters in different dynamical states; the clusters are extracted from a large cosmological N-body simulation of a ΛCDM model. We limit our analysis to substructures and surrounding groups identified in the simulation with mass larger than 10^13 h^-1 M_⊙. With mock redshift surveys with 200 galaxies within 6 h^-1 Mpc from the cluster center, the technique recovers 80% of the real substructures and 60% of the surrounding groups; in 57% of the identified structures, at least 60% of the member galaxies of the substructures and groups belong to the same real structure. These results improve by roughly a factor of two the performance of the best substructure identification algorithm currently available, the σ plateau algorithm, and suggest that our Blooming Tree Algorithm can be an invaluable tool for detecting substructures of galaxy clusters and investigating their complex dynamics.
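A rough sketch of the first step, building a binary tree that hierarchically arranges galaxies from a pairwise quantity; here a crude sky-plus-velocity distance and SciPy single-linkage stand in for the authors' binding-energy-based tree construction, and the toy catalogue is invented.

```python
# Build a binary tree over a mock galaxy catalogue and cut it into candidate structures.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(6)
n = 120
ra, dec = rng.normal(0, 0.3, n), rng.normal(0, 0.3, n)   # projected positions (deg)
vlos = rng.normal(0, 800, n)                              # line-of-sight velocities (km/s)

# Pairwise "distance": grows with sky separation and velocity difference,
# i.e. inversely related to a binding-energy-like quantity.
dx = ra[:, None] - ra[None, :]
dy = dec[:, None] - dec[None, :]
dv = vlos[:, None] - vlos[None, :]
pairwise = np.sqrt(dx ** 2 + dy ** 2) * (1 + (dv / 1000) ** 2)
np.fill_diagonal(pairwise, 0.0)

tree = linkage(squareform(pairwise, checks=False), method="single")   # binary tree
groups = fcluster(tree, t=5, criterion="maxclust")                    # candidate structures
print("galaxies per candidate structure:", np.bincount(groups)[1:])
```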
Major signal suppression from metal ion clusters in SFC/ESI-MS - Cause and effects.
Haglind, Alfred; Hedeland, Mikael; Arvidsson, Torbjörn; Pettersson, Curt E
2018-05-01
The widening application area of SFC-MS with polar analytes and water-containing samples facilitates the use of quick and simple sample preparation techniques such as "dilute and shoot" and protein precipitation. This has also introduced new polar interfering components such as alkali metal ions, naturally abundant in e.g. blood plasma and urine, which have been shown to be retained under screening conditions in SFC/ESI-TOF-MS and to cause areas of major ion suppression. Analytes co-eluting with these clusters will have a decreased signal intensity, which might have a major effect on both quantification and identification. When investigating the composition of the alkali metal clusters using accurate mass and isotopic pattern, it could be concluded that they had not previously been described in the literature. Using NaCl and KCl standards and different chromatographic conditions, varying e.g. column and modifier, the clusters proved to be formed from the alkali metal ions in combination with the alcohol modifier and make-up solvent. Their compositions were [(XOCH3)n + X]+, [(XOH)n + X]+, [(X2CO3)n + X]+ and [(XOOCOCH3)n + X]+ for X = Na+ or K+ in ESI+. In ESI-, the clusters depended more on the modifier, with [(XCl)n + Cl]- and [(XOCH3)n + OCH3]- mainly formed in pure methanol and [(XOOCH)n + OOCH]- when 20 mM NH4Fa was added. Preventing the formation of the clusters by avoiding methanol as modifier might be difficult, as methanol is a widely used modifier providing good solubility when analyzing polar compounds in SFC. A sample preparation step such as LLE would remove the alkali ions, but would also introduce a time-consuming and discriminating step into the method. Since the alkali metal ions were retained and affected by chromatographic adjustments such as mobile phase modifications, a way to avoid them when analyzing samples containing them could therefore be chromatographic tuning. Copyright © 2018 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Maples, B. L.; Alvarez, L. V.; Moreno, H. A.; Chilson, P. B.; Segales, A.
2017-12-01
Given that classical in-situ direct surveying for geomorphological subsurface information in rivers is time-consuming, labor-intensive, costly, and often involves high-risk activities, non-intrusive technologies such as UAS- and LIDAR-based remote sensing have promising potential and benefits in terms of efficient and accurate measurement of channel topography over large areas within a short time; a tremendous amount of attention has therefore been paid to the development of these techniques. Over the past two decades, efforts have been undertaken to develop specialized techniques that can penetrate the water body and detect the channel bed to derive river and coastal bathymetry. In this research, we develop a low-cost, effective technique for water-body bathymetry. With the use of a sUAS and a light-weight sonar, the bathymetry and volume of a small reservoir have been surveyed. The sUAS survey is conducted at low altitude (2 meters above the water), with the sUAS towing a small boat carrying the sonar. A cluster analysis is conducted to optimize the sUAS data collection and minimize the standard deviation created by under-sampling in areas of highly variable bathymetry, so measurements are densified in regions featuring steep slopes and drastic changes in the reservoir bed. This technique provides flexibility, efficiency, and freedom from risk to humans while obtaining high-quality information. The irregularly spaced bathymetric survey is then interpolated using unstructured Triangular Irregular Network (TIN)-based maps to avoid re-gridding or re-sampling issues.
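A small sketch of the TIN-based interpolation step: SciPy's LinearNDInterpolator triangulates the irregular survey points internally (a Delaunay TIN) and the interpolated depths can be integrated to an approximate reservoir volume. The random survey points and depth surface are illustrative assumptions.

```python
# TIN (Delaunay) interpolation of irregular sonar depths and a volume estimate.
import numpy as np
from scipy.interpolate import LinearNDInterpolator

rng = np.random.default_rng(7)
x, y = rng.uniform(0, 500, 800), rng.uniform(0, 300, 800)              # survey positions (m)
depth = 2 + 0.02 * x + 3 * np.sin(y / 60) + rng.normal(0, 0.1, 800)    # measured depths (m)

tin = LinearNDInterpolator(np.column_stack([x, y]), depth)
gx, gy = np.meshgrid(np.arange(0, 500, 5.0), np.arange(0, 300, 5.0))
grid_depth = tin(gx, gy)                     # NaN outside the convex hull of the survey

cell_area = 5.0 * 5.0
volume = np.nansum(grid_depth) * cell_area
print("approximate reservoir volume (m^3):", round(float(volume)))
```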
Quantum annealing for combinatorial clustering
NASA Astrophysics Data System (ADS)
Kumar, Vaibhaw; Bass, Gideon; Tomlin, Casey; Dulny, Joseph
2018-02-01
Clustering is a powerful machine learning technique that groups "similar" data points based on their characteristics. Many clustering algorithms work by approximating the minimization of an objective function, namely the sum of within-the-cluster distances between points. The straightforward approach involves examining all the possible assignments of points to each of the clusters. This approach guarantees the solution will be a global minimum; however, the number of possible assignments scales quickly with the number of data points and becomes computationally intractable even for very small datasets. In order to circumvent this issue, cost function minima are found using popular local search-based heuristic approaches such as k-means and hierarchical clustering. Due to their greedy nature, such techniques do not guarantee that a global minimum will be found and can lead to sub-optimal clustering assignments. Other classes of global search-based techniques, such as simulated annealing, tabu search, and genetic algorithms, may offer better quality results but can be too time-consuming to implement. In this work, we describe how quantum annealing can be used to carry out clustering. We map the clustering objective to a quadratic binary optimization problem and discuss two clustering algorithms which are then implemented on commercially available quantum annealing hardware, as well as on a purely classical solver "qbsolv." The first algorithm assigns N data points to K clusters, and the second one can be used to perform binary clustering in a hierarchical manner. We present our results in the form of benchmarks against well-known k-means clustering and discuss the advantages and disadvantages of the proposed techniques.
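A toy illustration of mapping cluster assignment to a quadratic binary optimization problem: binary variables x[i, k] indicate membership of point i in cluster k, the objective sums within-cluster distances, and a penalty enforces one cluster per point. A simple classical simulated-annealing loop stands in for quantum annealing hardware or qbsolv; the penalty weight and data are assumptions.

```python
# QUBO-style clustering objective solved with classical simulated annealing.
import numpy as np

rng = np.random.default_rng(8)
N, K = 12, 2
points = np.vstack([rng.normal(0, 0.4, (6, 2)), rng.normal(3, 0.4, (6, 2))])
D = np.linalg.norm(points[:, None] - points[None, :], axis=2)   # pairwise distances
penalty = D.max() * 2                                           # one-cluster-per-point penalty

def energy(x):                      # x has shape (N, K), entries in {0, 1}
    within = sum(0.5 * x[:, k] @ D @ x[:, k] for k in range(K))
    constraint = penalty * np.sum((x.sum(axis=1) - 1) ** 2)
    return within + constraint

x = np.eye(K)[rng.integers(0, K, N)]            # random valid start
e = energy(x)
for t in range(20000):                          # anneal over single-bit flips
    i, k = rng.integers(N), rng.integers(K)
    cand = x.copy(); cand[i, k] = 1 - cand[i, k]
    e_cand = energy(cand)
    temp = 5.0 * (1 - t / 20000) + 1e-3
    if e_cand < e or rng.uniform() < np.exp((e - e_cand) / temp):
        x, e = cand, e_cand

print("assignments:", x.argmax(axis=1), "energy:", round(e, 2))
```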
Key-Node-Separated Graph Clustering and Layouts for Human Relationship Graph Visualization.
Itoh, Takayuki; Klein, Karsten
2015-01-01
Many graph-drawing methods apply node-clustering techniques based on the density of edges to find tightly connected subgraphs and then hierarchically visualize the clustered graphs. However, users may want to focus on important nodes and their connections to groups of other nodes for some applications. For this purpose, it is effective to separately visualize the key nodes detected based on adjacency and attributes of the nodes. This article presents a graph visualization technique for attribute-embedded graphs that applies a graph-clustering algorithm that accounts for the combination of connections and attributes. The graph clustering step divides the nodes according to the commonality of connected nodes and similarity of feature value vectors. It then calculates the distances between arbitrary pairs of clusters according to the number of connecting edges and the similarity of feature value vectors and finally places the clusters based on the distances. Consequently, the technique separates important nodes that have connections to multiple large clusters and improves the visibility of such nodes' connections. To test this technique, this article presents examples with human relationship graph datasets, including a coauthorship and Twitter communication network dataset.
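A sketch of the combined criterion described above, clustering nodes with a distance that mixes connection commonality (shared neighbours) and attribute similarity; the toy graph, the equal 0.5/0.5 weighting, and the use of average-linkage clustering are assumptions rather than the article's exact algorithm.

```python
# Cluster an attributed graph using a blend of neighbourhood and feature distances.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform, pdist

rng = np.random.default_rng(9)
n = 30
A = (rng.random((n, n)) < 0.15).astype(float)     # random adjacency matrix
A = np.triu(A, 1); A = A + A.T                    # undirected, no self-loops
F = rng.random((n, 3))                            # 3-d feature vector per node

# Connection distance: 1 - Jaccard similarity of neighbourhoods.
inter = A @ A.T                                   # shared-neighbour counts
deg = A.sum(axis=1)
union = deg[:, None] + deg[None, :] - inter
conn_dist = 1 - np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)

attr_dist = squareform(pdist(F))                  # attribute distance
combined = 0.5 * conn_dist / conn_dist.max() + 0.5 * attr_dist / attr_dist.max()
np.fill_diagonal(combined, 0.0)

tree = linkage(squareform(combined, checks=False), method="average")
labels = fcluster(tree, t=4, criterion="maxclust")
print("cluster sizes:", np.bincount(labels)[1:])
```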
Occurrence of Radio Minihalos in a Mass-Limited Sample of Galaxy Clusters
NASA Technical Reports Server (NTRS)
Giacintucci, Simona; Markevitch, Maxim; Cassano, Rossella; Venturi, Tiziana; Clarke, Tracy E.; Brunetti, Gianfranco
2017-01-01
We investigate the occurrence of radio minihalos (diffuse radio sources of unknown origin observed in the cores of some galaxy clusters) in a statistical sample of 58 clusters drawn from the Planck Sunyaev-Zeldovich cluster catalog using a mass cut (M_500 > 6 x 10^14 solar masses). We supplement our statistical sample with a similarly sized nonstatistical sample mostly consisting of clusters in the ACCEPT X-ray catalog with suitable X-ray and radio data, which includes lower-mass clusters. Where necessary (for nine clusters), we reanalyzed the Very Large Array archival radio data to determine whether a minihalo is present. Our total sample includes all 28 currently known and recently discovered radio minihalos, including six candidates. We classify clusters as cool-core or non-cool-core according to the value of the specific entropy floor in the cluster center, rederived or newly derived from the Chandra X-ray density and temperature profiles where necessary (for 27 clusters). Contrary to the common wisdom that minihalos are rare, we find that almost all cool cores (at least 12 out of 15, or 80%) in our complete sample of massive clusters exhibit minihalos. The supplementary sample shows that the occurrence of minihalos may be lower in lower-mass cool-core clusters. No minihalos are found in non-cool cores or "warm cores." These findings will help test theories of the origin of minihalos and provide information on the physical processes and energetics of the cluster cores.
NASA Astrophysics Data System (ADS)
Ebeling, H.; Edge, A. C.; Bohringer, H.; Allen, S. W.; Crawford, C. S.; Fabian, A. C.; Voges, W.; Huchra, J. P.
1998-12-01
We present a 90 per cent flux-complete sample of the 201 X-ray-brightest clusters of galaxies in the northern hemisphere (delta>=0 deg), at high Galactic latitudes (|b|>=20 deg), with measured redshifts z<=0.3 and fluxes higher than 4.4x10^-12 erg cm^-2 s^-1 in the 0.1-2.4 keV band. The sample, called the ROSAT Brightest Cluster Sample (BCS), is selected from ROSAT All-Sky Survey data and is the largest X-ray-selected cluster sample compiled to date. In addition to Abell clusters, which form the bulk of the sample, the BCS also contains the X-ray-brightest Zwicky clusters and other clusters selected from their X-ray properties alone. Effort has been made to ensure the highest possible completeness of the sample and the smallest possible contamination by non-cluster X-ray sources. X-ray fluxes are computed using an algorithm tailored for the detection and characterization of X-ray emission from galaxy clusters. These fluxes are accurate to better than 15 per cent (mean 1sigma error). We find the cumulative logN-logS distribution of clusters to follow a power law kappa S^alpha with alpha=1.31^+0.06_-0.03 (errors are the 10th and 90th percentiles) down to fluxes of 2x10^-12 erg cm^-2 s^-1, i.e. considerably below the BCS flux limit. Although our best-fitting slope disagrees formally with the canonical value of -1.5 for a Euclidean distribution, the BCS logN-logS distribution is consistent with a non-evolving cluster population if cosmological effects are taken into account. Our sample will allow us to examine large-scale structure in the northern hemisphere, determine the spatial cluster-cluster correlation function, investigate correlations between the X-ray and optical properties of the clusters, establish the X-ray luminosity function for galaxy clusters, and discuss the implications of the results for cluster evolution.
NASA Astrophysics Data System (ADS)
Poppe, Sam; Barette, Florian; Smets, Benoît; Benbakkar, Mhammed; Kervyn, Matthieu
2016-04-01
The Virunga Volcanic Province (VVP) is situated within the western branch of the East African Rift. The geochemistry and petrology of its volcanic products have been studied extensively, but in a fragmented manner. They represent a unique collection of silica-undersaturated, ultra-alkaline and ultra-potassic compositions, displaying marked geochemical variations over the area occupied by the VVP. We present a novel spatially explicit database of existing whole-rock geochemical analyses of the VVP volcanics, compiled from international publications, (post-)colonial scientific reports, and PhD theses. In the database, a total of 703 geochemical analyses of whole-rock samples collected from the 1950s until recently have been characterised with a geographical location, eruption source location, analytical results, and uncertainty estimates for each of these categories. Comparative box plots and Kruskal-Wallis H tests on subsets of analyses with contrasting ages or analytical methods suggest that the overall database accuracy is consistent. We demonstrate how statistical techniques such as Principal Component Analysis (PCA) and subsequent cluster analysis allow the identification of clusters of samples with similar major-element compositions. The spatial patterns represented by the contrasting clusters show that both historically active volcanoes represent compositional clusters which can be identified based on their contrasting silica and alkali contents. Furthermore, two sample clusters are interpreted to represent the most primitive, deep magma source within the VVP, distinct from the shallow magma reservoirs that feed the eight dominant large volcanoes. The samples from these two clusters systematically originate from locations which (1) are distal compared to the eight large volcanoes and (2) mostly coincide with the surface expressions of rift faults or NE-SW-oriented inherited Precambrian structures which were reactivated during rifting. The lava from the Mugogo eruption of 1957 belongs to these primitive clusters and is the only lava known to have erupted outside the current rift valley in historical times. We thus infer that there is a distributed hazard of vent opening in addition to the susceptibility associated with the main Virunga edifices. This study suggests that the statistical analysis of such a geochemical database may help to understand complex volcanic plumbing systems and the spatial distribution of volcanic hazards in active and poorly known volcanic areas such as the Virunga Volcanic Province.
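A minimal sketch of the PCA-plus-cluster-analysis workflow applied to a major-element table: standardize, project onto the leading principal components, and cluster the scores. The synthetic oxide table with two loose groups is an illustrative assumption, not the compiled VVP database.

```python
# Standardize major-element analyses, reduce with PCA, and cluster the scores.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(10)
# Toy table: 703 samples x 6 major-element oxides (wt%), two loose groups.
group_a = rng.normal([45, 3, 12, 10, 8, 4], 1.5, size=(400, 6))
group_b = rng.normal([50, 5, 15, 7, 5, 6], 1.5, size=(303, 6))
X = np.vstack([group_a, group_b])

scores = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scores)
print("samples per compositional cluster:", np.bincount(labels))
```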
Liao, Minlei; Li, Yunfeng; Kianifard, Farid; Obi, Engels; Arcona, Stephen
2016-03-02
Cluster analysis (CA) is a frequently used applied statistical technique that helps to reveal hidden structures and "clusters" found in large data sets. However, this method has not been widely used in large healthcare claims databases, where the distribution of expenditure data is commonly severely skewed. The purpose of this study was to identify cost change patterns of patients with end-stage renal disease (ESRD) who initiated hemodialysis (HD) by applying different clustering methods. A retrospective, cross-sectional, observational study was conducted using the Truven Health MarketScan® Research Databases. Patients aged ≥18 years with ≥2 ESRD diagnoses who initiated HD between 2008 and 2010 were included. The K-means CA method and hierarchical CA with various linkage methods were applied to all-cause costs within the baseline (12 months pre-HD) and follow-up (12 months post-HD) periods to identify clusters. Demographic, clinical, and cost information was extracted from both periods and then examined by cluster. A total of 18,380 patients were identified. Meaningful all-cause cost clusters were generated using K-means CA and hierarchical CA with either the flexible beta or Ward's method. Based on cluster sample sizes and the change of cost patterns, the K-means CA method and 4 clusters were selected: Cluster 1: Average to High (n = 113); Cluster 2: Very High to High (n = 89); Cluster 3: Average to Average (n = 16,624); or Cluster 4: Increasing Costs, High at Both Points (n = 1554). Median costs from the 12-month pre-HD to the post-HD period increased from $185,070 to $884,605 for Cluster 1 (Average to High), decreased from $910,930 to $157,997 for Cluster 2 (Very High to High), remained relatively stable and low, from $15,168 to $13,026, for Cluster 3 (Average to Average), and increased from $57,909 to $193,140 for Cluster 4 (Increasing Costs, High at Both Points). Relatively stable costs after starting HD were associated with more stable comorbidity index scores from the pre- to the post-HD period, while increasing costs were associated with more sharply increasing comorbidity scores. The K-means CA method appeared to be the most appropriate for healthcare claims data with highly skewed cost information when taking into account both the change of cost patterns and the sample size of the smallest cluster.
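A brief sketch of K-means applied to pre-/post-HD cost pairs, with a log transform to tame the heavy skew typical of claims data; the synthetic lognormal costs and the choice of 4 clusters are illustrative assumptions.

```python
# K-means on log-transformed baseline/follow-up cost pairs.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(15)
pre = rng.lognormal(mean=9.5, sigma=1.2, size=18380)              # baseline-year costs
post = pre * rng.lognormal(mean=0.0, sigma=0.8, size=18380)       # follow-up-year costs

X = np.log1p(np.column_stack([pre, post]))                        # log1p compresses the skew
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
for k in range(4):
    m = labels == k
    print(f"cluster {k}: n={m.sum():6d}, median pre=${np.median(pre[m]):,.0f},"
          f" median post=${np.median(post[m]):,.0f}")
```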
Hierarchical modeling of cluster size in wildlife surveys
Royle, J. Andrew
2008-01-01
Clusters or groups of individuals are the fundamental unit of observation in many wildlife sampling problems, including aerial surveys of waterfowl, marine mammals, and ungulates. Explicit accounting of cluster size in models for estimating abundance is necessary because detection of individuals within clusters is not independent and detectability of clusters is likely to increase with cluster size. This induces a cluster size bias in which the average cluster size in the sample is larger than in the population at large. Thus, failure to account for the relationship between detectability and cluster size will tend to yield a positive bias in estimates of abundance or density. I describe a hierarchical modeling framework for accounting for cluster-size bias in animal sampling. The hierarchical model consists of models for the observation process conditional on the cluster size distribution and the cluster size distribution conditional on the total number of clusters. Optionally, a spatial model can be specified that describes variation in the total number of clusters per sample unit. Parameter estimation, model selection, and criticism may be carried out using conventional likelihood-based methods. An extension of the model is described for the situation where measurable covariates at the level of the sample unit are available. Several candidate models within the proposed class are evaluated for aerial survey data on mallard ducks (Anas platyrhynchos).
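A small simulation of the size-biased detection that the hierarchical model is built to correct: detection probability rises with cluster size, so the sampled mean cluster size exceeds the population mean. The Poisson group sizes and logistic detection curve are illustrative assumptions.

```python
# Simulate cluster-size bias: larger groups are more likely to be detected.
import numpy as np

rng = np.random.default_rng(11)
true_sizes = 1 + rng.poisson(2.0, size=5000)            # true cluster (group) sizes
p_detect = 1 / (1 + np.exp(-(true_sizes - 2)))          # detectability grows with size
detected = rng.uniform(size=true_sizes.size) < p_detect

print("mean cluster size in population:", round(true_sizes.mean(), 2))
print("mean cluster size in the sample:", round(true_sizes[detected].mean(), 2))
print("fraction of clusters detected:", round(detected.mean(), 2))
```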
Moisture structure of tropical cloud systems as inferred from SSM/I
NASA Technical Reports Server (NTRS)
Robertson, Franklin R.
1989-01-01
The structure of tropical cloud systems was examined using Special Sensor Microwave/Imager data on vertically integrated vapor, ice, and liquid water (including precipitable water) in a cloud cluster associated with a Pacific easterly wave. The cloud cluster provided a sample of the varying signatures of bulk microphysical processes in organized tropical convection. Compositing techniques were used to interpret this variability and its significance in terms of the response of convection to its thermodynamic environment. The relative intensities of the ice and liquid-water signatures should provide insight on the relative contribution of stratiform versus convective rain and the characteristics of the water budgets of mesoscale convective systems.
NASA Astrophysics Data System (ADS)
Sri Purnami, Agustina; Adi Widodo, Sri; Charitas Indra Prahmana, Rully
2018-01-01
This study aimed to assess the improvement of achievement and motivation in learning mathematics through the use of Team Accelerated Instruction. The research method was an experiment with a descriptive pre-test/post-test design. The population in this study was all students of class VIII of a junior high school in Jogjakarta. The sample was taken using the cluster random sampling technique. The instruments used in this research were a questionnaire and a test. The data analysis technique used was the Wilcoxon test. It was concluded that there was an increase in motivation and student achievement in the material on systems of linear equations when using the Team Accelerated Instruction learning model. Based on the results, the Team Accelerated Instruction learning model can be used as a variation in models for learning mathematics.
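A minimal sketch of the named analysis technique, a Wilcoxon signed-rank test on paired pre-test and post-test scores; the score arrays are invented for illustration.

```python
# Wilcoxon signed-rank test for paired pre/post scores.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(16)
pre = rng.integers(40, 70, size=32)                 # hypothetical pre-test scores
post = pre + rng.integers(0, 20, size=32)           # hypothetical post-test scores

stat, p = wilcoxon(pre, post)
print(f"Wilcoxon statistic = {stat}, p-value = {p:.4f}")
```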
The Chandra Strong Lens Sample: Revealing Baryonic Physics In Strong Lensing Selected Clusters
NASA Astrophysics Data System (ADS)
Bayliss, Matthew
2017-09-01
We propose for Chandra imaging of the hot intra-cluster gas in a unique new sample of 29 galaxy clusters selected purely on their strong gravitational lensing signatures. This will be the first program targeting a purely strong lensing selected cluster sample, enabling new comparisons between the ICM properties and scaling relations of strong lensing and mass/ICM selected cluster samples. Chandra imaging, combined with high precision strong lens models, ensures powerful constraints on the distribution and state of matter in the cluster cores. This represents a novel angle from which we can address the role played by baryonic physics -- the infamous ``gastrophysics''-- in shaping the cores of massive clusters, and opens up an exciting new galaxy cluster discovery space with Chandra.
A fast learning method for large scale and multi-class samples of SVM
NASA Astrophysics Data System (ADS)
Fan, Yu; Guo, Huiming
2017-06-01
A fast learning method for multi-class SVM (Support Vector Machine) classification based on a binary tree is presented to address the low learning efficiency of SVMs when processing large-scale multi-class samples. The method builds the binary tree hierarchy bottom-up; according to the resulting hierarchy, a sub-classifier is trained on the samples corresponding to each node. During learning, several class clusters are generated after a first clustering of the training samples. For clusters that contain only one type of sample, central points are extracted directly. For clusters that contain two types of samples, the numbers of clusters for their positive and negative samples are set according to their degree of mixture and a secondary clustering is performed, after which central points are extracted from the resulting sub-class clusters. Sub-classifiers are then obtained by learning from the reduced sample set formed by the extracted central points. Simulation experiments show that this fast learning method, which is based on multi-level clustering, can ensure high classification accuracy, greatly reduce the number of training samples, and effectively improve learning efficiency.
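A sketch of the sample-reduction idea only: cluster each class, keep the cluster centres as a reduced training set, and fit an SVM on the centres. The bottom-up class-tree construction is omitted; the blob data, the fixed number of centres per class, and the RBF kernel are illustrative assumptions.

```python
# Reduce each class to K-means centres, then train an SVM on the reduced set.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.svm import SVC

X, y = make_blobs(n_samples=6000, centers=5, cluster_std=1.5, random_state=0)

centres, labels = [], []
for c in np.unique(y):
    km = KMeans(n_clusters=20, n_init=10, random_state=0).fit(X[y == c])
    centres.append(km.cluster_centers_)            # reduced samples for this class
    labels.append(np.full(20, c))

X_red, y_red = np.vstack(centres), np.concatenate(labels)
clf = SVC(kernel="rbf", gamma="scale").fit(X_red, y_red)   # trained on 100 points, not 6000
print("accuracy on the full set:", round(clf.score(X, y), 3))
```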
Nnane, Daniel Ekane
2011-11-15
Contamination of surface waters is a pervasive threat to human health, hence, the need to better understand the sources and spatio-temporal variations of contaminants within river catchments. River catchment managers are required to sustainably monitor and manage the quality of surface waters. Catchment managers therefore need cost-effective low-cost long-term sustainable water quality monitoring and management designs to proactively protect public health and aquatic ecosystems. Multivariate and phage-lysis techniques were used to investigate spatio-temporal variations of water quality, main polluting chemophysical and microbial parameters, faecal micro-organisms sources, and to establish 'sentry' sampling sites in the Ouse River catchment, southeast England, UK. 350 river water samples were analysed for fourteen chemophysical and microbial water quality parameters in conjunction with the novel human-specific phages of Bacteroides GB-124 (Bacteroides GB-124). Annual, autumn, spring, summer, and winter principal components (PCs) explained approximately 54%, 75%, 62%, 48%, and 60%, respectively, of the total variance present in the datasets. Significant loadings of Escherichia coli, intestinal enterococci, turbidity, and human-specific Bacteroides GB-124 were observed in all datasets. Cluster analysis successfully grouped sampling sites into five clusters. Importantly, multivariate and phage-lysis techniques were useful in determining the sources and spatial extent of water contamination in the catchment. Though human faecal contamination was significant during dry periods, the main source of contamination was non-human. Bacteroides GB-124 could potentially be used for catchment routine microbial water quality monitoring. For a cost-effective low-cost long-term sustainable water quality monitoring design, E. coli or intestinal enterococci, turbidity, and Bacteroides GB-124 should be monitored all-year round in this river catchment. Copyright © 2011 Elsevier B.V. All rights reserved.
Smith, D.R.; Rogala, J.T.; Gray, B.R.; Zigler, S.J.; Newton, T.J.
2011-01-01
Reliable estimates of abundance are needed to assess consequences of proposed habitat restoration and enhancement projects on freshwater mussels in the Upper Mississippi River (UMR). Although there is general guidance on sampling techniques for population assessment of freshwater mussels, the actual performance of sampling designs can depend critically on the population density and spatial distribution at the project site. To evaluate various sampling designs, we simulated sampling of populations, which varied in density and degree of spatial clustering. Because of logistics and costs of large river sampling and spatial clustering of freshwater mussels, we focused on adaptive and non-adaptive versions of single and two-stage sampling. The candidate designs performed similarly in terms of precision (CV) and probability of species detection for fixed sample size. Both CV and species detection were determined largely by density, spatial distribution and sample size. However, designs did differ in the rate that occupied quadrats were encountered. Occupied units had a higher probability of selection using adaptive designs than conventional designs. We used two measures of cost: sample size (i.e. number of quadrats) and distance travelled between the quadrats. Adaptive and two-stage designs tended to reduce distance between sampling units, and thus performed better when distance travelled was considered. Based on the comparisons, we provide general recommendations on the sampling designs for the freshwater mussels in the UMR, and presumably other large rivers.
Wolf, Antje; Kirschner, Karl N
2013-02-01
With improvements in computer speed and algorithm efficiency, MD simulations are sampling larger numbers of molecular and biomolecular conformations. Being able to qualitatively and quantitatively sift these conformations into meaningful groups is a difficult and important task, especially when considering the structure-activity paradigm. Here we present a study that combines two popular techniques, principal component (PC) analysis and clustering, for revealing major conformational changes that occur in molecular dynamics (MD) simulations. Specifically, we explored how clustering different PC subspaces affects the resulting clusters compared with clustering the complete trajectory data. As a case example, we used the trajectory data from an explicitly solvated simulation of a bacterial L11·23S ribosomal subdomain, which is a target of thiopeptide antibiotics. Clustering was performed, using K-means and average-linkage algorithms, on data involving the first two to the first five PC subspace dimensions. For the average-linkage algorithm we found that data-point membership, cluster shape, and cluster size depended on the selected PC subspace data. In contrast, K-means provided very consistent results regardless of the selected subspace. Since we present results on a single model system, generalization concerning the clustering of different PC subspaces of other molecular systems is currently premature. However, our hope is that this study illustrates (a) the complexities in selecting the appropriate clustering algorithm, (b) the complexities in interpreting and validating their results, and (c) that by combining PC analysis with subsequent clustering, valuable dynamic and conformational information can be obtained.
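A toy sketch of the comparison described above: cluster the same data in two different PC subspaces (first 2 vs. first 5 components) with K-means and average linkage, and compare the memberships. The random-walk "trajectory" features are an illustrative stand-in for MD coordinates.

```python
# Compare K-means and average-linkage clusterings across PC subspaces.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(12)
frames = np.cumsum(rng.normal(size=(2000, 30)), axis=0)   # toy trajectory features
pcs = PCA(n_components=5).fit_transform(frames)

for d in (2, 5):
    sub = pcs[:, :d]
    km = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(sub)
    al = fcluster(linkage(sub, method="average"), t=4, criterion="maxclust")
    print(f"{d}-D subspace: K-means vs average-linkage ARI =",
          round(adjusted_rand_score(km, al), 2))
```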
The Mass Function of Abell Clusters
NASA Astrophysics Data System (ADS)
Chen, J.; Huchra, J. P.; McNamara, B. R.; Mader, J.
1998-12-01
The velocity dispersion and mass functions for rich clusters of galaxies provide important constraints on models of the formation of Large-Scale Structure (e.g., Frenk et al. 1990). However, prior estimates of the velocity dispersion or mass function for galaxy clusters have been based on either very small samples of clusters (Bahcall and Cen 1993; Zabludoff et al. 1994) or large but incomplete samples (e.g., the Girardi et al. (1998) determination from a sample of clusters with more than 30 measured galaxy redshifts). In contrast, we approach the problem by constructing a volume-limited sample of Abell clusters. We collected individual galaxy redshifts for our sample from two major galaxy velocity databases, the NASA Extragalactic Database, NED, maintained at IPAC, and ZCAT, maintained at SAO. We assembled a database with velocity information for possible cluster members and then selected cluster members based on both spatial and velocity data. Cluster velocity dispersions and masses were calculated following the procedures of Danese, De Zotti, and di Tullio (1980) and Heisler, Tremaine, and Bahcall (1985), respectively. The final velocity dispersion and mass functions were analyzed in order to constrain cosmological parameters by comparison to the results of N-body simulations. Our data for the cluster sample as a whole and for the individual clusters (spatial maps and velocity histograms) in our sample is available on-line at http://cfa-www.harvard.edu/ huchra/clusters. This website will be updated as more data becomes available in the master redshift compilations, and will be expanded to include more clusters and large groups of galaxies.
Bayesian Analysis and Characterization of Multiple Populations in Galactic Globular Clusters
NASA Astrophysics Data System (ADS)
Wagner-Kaiser, Rachel A.; Stenning, David; Sarajedini, Ata; von Hippel, Ted; van Dyk, David A.; Robinson, Elliot; Stein, Nathan; Jefferys, William H.; BASE-9, HST UVIS Globular Cluster Treasury Program
2017-01-01
Globular clusters have long been important tools to unlock the early history of galaxies. Thus, it is crucial we understand the formation and characteristics of the globular clusters (GCs) themselves. Historically, GCs were thought to be simple and largely homogeneous populations, formed via collapse of a single molecular cloud. However, this classical view has been overwhelmingly invalidated by recent work. It is now clear that the vast majority of globular clusters in our Galaxy host two or more chemically distinct populations of stars, with variations in helium and light elements at discrete abundance levels. No coherent story has arisen that is able to fully explain the formation of multiple populations in globular clusters nor the mechanisms that drive stochastic variations from cluster to cluster.We use Cycle 21 Hubble Space Telescope (HST) observations and HST archival ACS Treasury observations of 30 Galactic Globular Clusters to characterize two distinct stellar populations. A sophisticated Bayesian technique is employed to simultaneously sample the joint posterior distribution of age, distance, and extinction for each cluster, as well as unique helium values for two populations within each cluster and the relative proportion of those populations. We find the helium differences among the two populations in the clusters fall in the range of 0.04 to 0.11. Because adequate models varying in CNO are not presently available, we view these spreads as upper limits and present them with statistical rather than observational uncertainties. Evidence supports previous studies suggesting an increase in helium content concurrent with increasing mass of the cluster. We also find that the proportion of the first population of stars increases with mass. Our results are examined in the context of proposed globular cluster formation scenarios.
Design of partially supervised classifiers for multispectral image data
NASA Technical Reports Server (NTRS)
Jeon, Byeungwoo; Landgrebe, David
1993-01-01
A partially supervised classification problem is addressed, especially when the class definition and corresponding training samples are provided a priori only for just one particular class. In practical applications of pattern classification techniques, a frequently observed characteristic is the heavy, often nearly impossible requirements on representative prior statistical class characteristics of all classes in a given data set. Considering the effort in both time and man-power required to have a well-defined, exhaustive list of classes with a corresponding representative set of training samples, this 'partially' supervised capability would be very desirable, assuming adequate classifier performance can be obtained. Two different classification algorithms are developed to achieve simplicity in classifier design by reducing the requirement of prior statistical information without sacrificing significant classifying capability. The first one is based on optimal significance testing, where the optimal acceptance probability is estimated directly from the data set. In the second approach, the partially supervised classification is considered as a problem of unsupervised clustering with initially one known cluster or class. A weighted unsupervised clustering procedure is developed to automatically define other classes and estimate their class statistics. The operational simplicity thus realized should make these partially supervised classification schemes very viable tools in pattern classification.
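A rough sketch of the significance-testing flavour of partially supervised (one-class) classification: fit a Gaussian to the single labelled class and accept test pixels whose squared Mahalanobis distance falls below a chi-square threshold. The synthetic "spectral" data and the 95% acceptance level are illustrative assumptions, not the paper's optimal-threshold estimation.

```python
# One-class classification by a Gaussian significance test.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(17)
train = rng.multivariate_normal([3, 5, 2, 7], np.eye(4) * 0.5, size=300)   # known class
test = np.vstack([rng.multivariate_normal([3, 5, 2, 7], np.eye(4) * 0.5, size=100),
                  rng.multivariate_normal([6, 1, 4, 2], np.eye(4) * 0.5, size=100)])

mu, cov = train.mean(axis=0), np.cov(train, rowvar=False)
inv = np.linalg.inv(cov)
d2 = np.einsum("ij,jk,ik->i", test - mu, inv, test - mu)    # squared Mahalanobis distance
accepted = d2 < chi2.ppf(0.95, df=4)                        # significance test at 95%
print("fraction of test pixels assigned to the known class:", accepted.mean())
```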
Sequential analysis of hydrochemical data for watershed characterization.
Thyne, Geoffrey; Güler, Cüneyt; Poeter, Eileen
2004-01-01
A methodology for characterizing the hydrogeology of watersheds using hydrochemical data, combining statistical, geochemical, and spatial techniques, is presented. Surface water and ground water base flow and spring runoff samples (180 total) from a single watershed are first classified using hierarchical cluster analysis. The statistical clusters are analyzed for spatial coherence, confirming that the clusters have a geological basis corresponding to topographic flowpaths and showing that the fractured rock aquifer behaves as an equivalent porous medium on the watershed scale. Principal component analysis (PCA) is then used to determine the sources of variation between parameters. The PCA shows that the variations within the dataset are related to variations in calcium, magnesium, SO4, and HCO3, which are derived from natural weathering reactions, and in pH, NO3, and chloride, which indicate anthropogenic impact. PHREEQC modeling is used to quantitatively describe the natural hydrochemical evolution of the watershed and to aid in the discrimination of samples that have an anthropogenic component. Finally, the seasonal changes in the water chemistry of individual sites are analyzed to better characterize the spatial variability of vertical hydraulic conductivity. The integrated result provides a method to characterize the hydrogeology of the watershed that fully utilizes traditional data.
Is the cluster environment quenching the Seyfert activity in elliptical and spiral galaxies?
NASA Astrophysics Data System (ADS)
de Souza, R. S.; Dantas, M. L. L.; Krone-Martins, A.; Cameron, E.; Coelho, P.; Hattab, M. W.; de Val-Borro, M.; Hilbe, J. M.; Elliott, J.; Hagen, A.; COIN Collaboration
2016-09-01
We developed a hierarchical Bayesian model (HBM) to investigate how the presence of Seyfert activity in galaxies relates to their environment, herein represented by the galaxy cluster mass, M200, and the normalized cluster-centric distance, r/r200. We achieved this by constructing an unbiased sample of galaxies from the Sloan Digital Sky Survey, with morphological classifications provided by the Galaxy Zoo Project. A propensity score matching approach is introduced to control for the effects of confounding variables: stellar mass, galaxy colour, and star formation rate. The connection between Seyfert activity and environmental properties in the de-biased sample is modelled within an HBM framework using the so-called logistic regression technique, suitable for the analysis of binary data (e.g. whether or not a galaxy hosts an AGN). Unlike standard ordinary least squares fitting methods, our methodology naturally allows modelling the probability of Seyfert-AGN activity in galaxies on their natural scale, i.e. as a binary variable. Furthermore, we demonstrate how an HBM can incorporate information on each particular galaxy morphological type in a unified framework. In elliptical galaxies our analysis indicates a strong correlation of Seyfert-AGN activity with r/r200, and a weaker correlation with the mass of the host cluster. In spiral galaxies these trends do not appear, suggesting that the link between Seyfert activity and the properties of spiral galaxies is independent of the environment.
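A plain (non-hierarchical) sketch of the logistic-regression core of the model: regress a binary "hosts a Seyfert AGN" flag on cluster mass and normalized centric distance. The simulated galaxies, coefficient values, and column names are illustrative assumptions, not the paper's de-biased SDSS sample.

```python
# Logistic regression of a binary AGN flag on environmental covariates.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(13)
n = 5000
log_m200 = rng.normal(14.5, 0.4, n)                 # log10 cluster mass
r_over_r200 = rng.uniform(0, 2, n)                  # normalized cluster-centric distance
logit = -2.0 + 1.2 * r_over_r200 - 0.3 * (log_m200 - 14.5)
is_seyfert = rng.uniform(size=n) < 1 / (1 + np.exp(-logit))

X = np.column_stack([r_over_r200, log_m200])
model = LogisticRegression().fit(X, is_seyfert)
print("fitted coefficients (r/r200, log M200):", np.round(model.coef_[0], 2))
```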
Strengths and limitations of molecular subtyping in a community outbreak of Legionnaires' disease.
Kool, J L; Buchholz, U; Peterson, C; Brown, E W; Benson, R F; Pruckler, J M; Fields, B S; Sturgeon, J; Lehnkering, E; Cordova, R; Mascola, L M; Butler, J C
2000-12-01
An epidemiological and microbiological investigation of a cluster of eight cases of Legionnaires' disease in Los Angeles County in November 1997 yielded conflicting results. The epidemiological part of the investigation implicated one of several mobile cooling towers used by a film studio in the centre of the outbreak area. However, water sampled from these cooling towers contained L. pneumophila serogroup 1 of another subtype than the strain that was recovered from case-patients in the outbreak. Samples from two cooling towers located downwind from all of the case-patients contained a Legionella strain that was indistinguishable from the outbreak strain by four subtyping techniques (AP-PCR, PFGE, MAb, and MLEE). It is unlikely that these cooling towers were the source of infection for all the case-patients, and they were not associated with risk of disease in the case-control study. The outbreak strain also was not distinguishable, by three subtyping techniques (AP-PCR, PFGE, and MAb), from a L. pneumophila strain that had caused an outbreak in Providence, RI, in 1993. Laboratory cross-contamination was unlikely because the initial subtyping was done in different laboratories. In this investigation, microbiology was helpful for distinguishing the outbreak cluster from unrelated cases of Legionnaires' disease occurring elsewhere. However, multiple subtyping techniques failed to distinguish environmental sources that were probably not associated with the outbreak. Persons investigating Legionnaires' disease outbreaks should be aware that microbiological subtyping does not always identify a source with absolute certainty.
Romarís-Hortas, Vanessa; García-Sartal, Cristina; Barciela-Alonso, María Carmen; Moreda-Piñeiro, Antonio; Bermejo-Barrera, Pilar
2010-02-10
Major and trace elements in North Atlantic seaweed originating from Galicia (northwestern Spain) were determined by using inductively coupled plasma-optical emission spectrometry (ICP-OES) (Ba, Ca, Cu, K, Mg, Mn, Na, Sr, and Zn), inductively coupled plasma-mass spectrometry (ICP-MS) (Br and I) and hydride generation-atomic fluorescence spectrometry (HG-AFS) (As). Pattern recognition techniques were then used to classify the edible seaweed according to their type (red, brown, and green seaweed) and also their variety (Wakame, Fucus, Sea Spaghetti, Kombu, Dulse, Nori, and Sea Lettuce). Principal component analysis (PCA) and cluster analysis (CA) were used as exploratory techniques, and linear discriminant analysis (LDA) and soft independent modeling of class analogy (SIMCA) were used as classification procedures. In total, 12 elements were determined in a set of 35 edible seaweed samples (20 brown seaweed, 10 red seaweed, 4 green seaweed, and 1 canned seaweed). Natural groupings of the samples (brown, red, and green types) were observed using PCA and CA (squared Euclidean distance between objects and Ward method as clustering procedure). The application of LDA gave correct assignation percentages of 100% for brown, red, and green types at a significance level of 5%. However, a satisfactory classification (recognition and prediction) using SIMCA was obtained only for red seaweed (100% of cases correctly classified), whereas percentages of 89 and 80% were obtained for brown seaweed for recognition (training set) and prediction (testing set), respectively.
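The exploratory and classification steps can be sketched as follows (synthetic element concentrations stand in for the measured data; SIMCA is omitted because it is not part of scikit-learn):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
n_elements = 12                                      # Ba, Ca, Cu, K, Mg, Mn, Na, Sr, Zn, Br, I, As
X = rng.lognormal(size=(35, n_elements))             # 35 seaweed samples (synthetic)
y = np.array([0] * 20 + [1] * 10 + [2] * 5)          # brown / red / green labels (illustrative)

scores = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))   # exploratory view

lda = make_pipeline(StandardScaler(), LinearDiscriminantAnalysis())
print(cross_val_score(lda, X, y, cv=5).mean())       # cross-validated assignation rate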
Finding SDSS Galaxy Clusters in 4-dimensional Color Space Using the False Discovery Rate
NASA Astrophysics Data System (ADS)
Nichol, R. C.; Miller, C. J.; Reichart, D.; Wasserman, L.; Genovese, C.; SDSS Collaboration
2000-12-01
We describe a recently developed statistical technique that provides a meaningful cut-off in probability-based decision making. We are concerned with multiple testing, where each test produces a well-defined probability (or p-value). By well-defined, we mean that the null hypothesis used to determine the p-value is fully understood and appropriate. The method is entitled False Discovery Rate (FDR) and its largest advantage over other measures is that it allows one to specify a maximal amount of acceptable error. As an example of this tool, we apply FDR to a four-dimensional clustering algorithm using SDSS data. For each galaxy (or test galaxy), we count the number of neighbors that fit within one standard deviation of a four-dimensional Gaussian centered on that test galaxy. The mean and standard deviation of that Gaussian are determined from the colors and errors of the test galaxy. We then take that same Gaussian and place it on a random selection of n galaxies and make a similar count. In the limit of large n, we expect the median count around these random galaxies to represent a typical field galaxy. For every test galaxy we determine the probability (or p-value) that it is a field galaxy based on these counts. A low p-value implies that the test galaxy is in a cluster environment. Once we have a p-value for every galaxy, we use FDR to determine at what level we should make our probability cut-off. Once this cut-off is made, we have a final sample of galaxies that are cluster-like galaxies. Using FDR, we also know the maximum amount of field contamination in our cluster galaxy sample. We present our preliminary galaxy clustering results using these methods.
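In its standard form this cut-off is the Benjamini-Hochberg step-up rule; a sketch under that assumption, applied to a synthetic mixture of field-like and cluster-like p-values:

import numpy as np

def fdr_discoveries(pvals, alpha=0.05):
    # Benjamini-Hochberg: find the largest k with p_(k) <= (k / N) * alpha and
    # flag the k smallest p-values as discoveries (here: cluster-like galaxies).
    p = np.asarray(pvals, dtype=float)
    order = np.argsort(p)
    n = p.size
    below = p[order] <= alpha * np.arange(1, n + 1) / n
    keep = np.zeros(n, dtype=bool)
    if below.any():
        keep[order[: np.max(np.nonzero(below)[0]) + 1]] = True
    return keep

rng = np.random.default_rng(3)
pvals = np.concatenate([rng.uniform(size=900),           # field-like galaxies
                        rng.beta(0.5, 20.0, size=100)])  # cluster-like galaxies (small p-values)
print(fdr_discoveries(pvals, alpha=0.1).sum())           # galaxies kept at a 10% false discovery rate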
Multicolor photometry of the merging galaxy cluster A2319: Dynamics and star formation properties
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yan, Peng-Fei; Yuan, Qi-Rong; Zhang, Li
2014-05-01
Asymmetric X-ray emission and a powerful cluster-scale radio halo indicate that A2319 is a merging cluster of galaxies. This paper presents our multicolor photometry for A2319 with 15 optical intermediate filters in the Beijing-Arizona-Taiwan-Connecticut (BATC) system. There are 142 galaxies with known spectroscopic redshifts within the viewing field of 58' × 58' centered on this rich cluster, including 128 member galaxies (called sample I). A large velocity dispersion in the rest frame, 1622_{-70}^{+91} km s^{-1}, suggests merger dynamics in A2319. The contour map of projected density and localized velocity structure confirm the so-called A2319B substructure, at ∼10' northwest to the main concentration A2319A. The spectral energy distributions (SEDs) of more than 30,000 sources are obtained in our BATC photometry down to V ∼ 20 mag. A u-band (∼3551 Å) image with better seeing and spatial resolution, obtained with the Bok 2.3 m telescope at Kitt Peak, is taken to make star-galaxy separation and distinguish the overlapping contamination in the BATC aperture photometry. With color-color diagrams and photometric redshift technique, 233 galaxies brighter than h_BATC = 19.0 are newly selected as member candidates after an exclusion of false candidates with contaminated BATC SEDs by eyeball-checking the u-band Bok image. The early-type galaxies are found to follow a tight color-magnitude correlation. Based on sample I and the enlarged sample of member galaxies (called sample II), subcluster A2319B is confirmed. The star formation properties of cluster galaxies are derived with the evolutionary synthesis model, PEGASE, assuming a Salpeter initial mass function and an exponentially decreasing star formation rate (SFR). A strong environmental effect on star formation histories is found in the manner that galaxies in the sparse regions have various star formation histories, while galaxies in the dense regions are found to have shorter SFR time scales, older stellar ages, and higher interstellar medium metallicities. For the merging cluster A2319, local surface density is a better environmental indicator rather than the cluster-centric distance. Compared with the well-relaxed cluster A2589, a higher fraction of star-forming galaxies is found in A2319, indicating that the galaxy-scale turbulence stimulated by the subcluster merger might have played a role in triggering the star formation activity.
Locality-Aware CTA Clustering For Modern GPUs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Ang; Song, Shuaiwen; Liu, Weifeng
2017-04-08
In this paper, we proposed a novel clustering technique for tapping into the performance potential of a largely ignored type of locality: inter-CTA locality. We first demonstrated the capability of the existing GPU hardware to exploit such locality, both spatially and temporally, on L1 or L1/Tex unified cache. To verify the potential of this locality, we quantified its existence in a broad spectrum of applications and discussed its sources of origin. Based on these insights, we proposed the concept of CTA-Clustering and its associated software techniques. Finally, we evaluated these techniques on all modern generations of NVIDIA GPU architectures. The experimental results showed that our proposed clustering techniques could significantly improve on-chip cache performance.
Approaches to Recruiting ‘Hard-To-Reach’ Populations into Research: A Review of the Literature
Shaghaghi, Abdolreza; Bhopal, Raj S; Sheikh, Aziz
2011-01-01
Background: ‘Hard-to-reach’ is a term used to describe those sub-groups of the population that may be difficult to reach or involve in research or public health programmes. Applying a single term to these sub-sections of populations implies a homogeneity within distinct groups, which does not necessarily exist. A variety of sampling techniques has been introduced to recruit hard-to-reach populations. In this article, we have reviewed a range of approaches that have been used to widen participation in studies. Methods: We performed a Pubmed and Google search for relevant English language articles using the keywords and phrases: (hard-to-reach AND population* OR sampl*), (hidden AND population* OR sample*) and (“hard to reach” AND population* OR sample*) and consulted the retrieved articles’ bibliographies to extract empirical evidence from publications that discussed or examined the use of sampling techniques to recruit hidden or hard-to-reach populations in health studies. Results: Reviewing the literature identified a range of techniques to recruit hard-to-reach populations, including snowball sampling, respondent-driven sampling (RDS), indigenous field worker sampling (IFWS), facility-based sampling (FBS), targeted sampling (TS), time-location (space) sampling (TLS), conventional cluster sampling (CCS) and capture re-capture sampling (CR). Conclusion: The degree of compliance with a study by a certain ‘hard-to-reach’ group depends on the characteristics of that group, the recruitment technique used and the subject of interest. Irrespective of the potential advantages or limitations of the recruitment techniques reviewed, their successful use depends mainly upon our knowledge about the specific characteristics of the target populations. Thus, in line with attempts to expand the current boundaries of our knowledge about recruitment techniques in health studies and their applications in varying situations, we should also focus on all contributing factors which may have an impact on the participation rate within a defined population group. PMID:24688904
Data Mining Techniques Applied to Hydrogen Lactose Breath Test.
Rubio-Escudero, Cristina; Valverde-Fernández, Justo; Nepomuceno-Chamorro, Isabel; Pontes-Balanza, Beatriz; Hernández-Mendoza, Yoedusvany; Rodríguez-Herrera, Alfonso
2017-01-01
The aims were to analyze a set of hydrogen breath test data using data mining tools and to identify new patterns of H2 production. k-means clustering was applied as the data mining technique to a dataset of hydrogen breath tests from 2571 patients. Six different patterns were extracted upon analysis of the hydrogen breath test data. We have also shown the relevance of each of the samples taken throughout the test. Analysis of the hydrogen breath test data sets using data mining techniques has identified new patterns of hydrogen generation upon lactose absorption. We can see the potential of applying data mining techniques to clinical data sets. These results offer promising data for future research on the relation between gut microbiota-produced hydrogen and clinical symptoms.
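A sketch of the clustering step under the assumption that each patient contributes a fixed-length vector of breath-H2 readings (the number of readings and the cluster count below are illustrative):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
h2_curves = rng.gamma(2.0, 10.0, size=(2571, 6))   # hypothetical: 6 H2 readings (ppm) per patient

X = StandardScaler().fit_transform(h2_curves)
km = KMeans(n_clusters=6, n_init=10, random_state=0).fit(X)

print(np.bincount(km.labels_))                     # patients per pattern
print(km.cluster_centers_)                         # one prototype H2-production pattern per cluster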
Random whole metagenomic sequencing for forensic discrimination of soils.
Khodakova, Anastasia S; Smith, Renee J; Burgoyne, Leigh; Abarno, Damien; Linacre, Adrian
2014-01-01
Here we assess the ability of random whole metagenomic sequencing approaches to discriminate between similar soils from two geographically distinct urban sites for application in forensic science. Repeat samples from two parklands in residential areas separated by approximately 3 km were collected and the DNA was extracted. Shotgun, whole genome amplification (WGA) and single arbitrarily primed DNA amplification (AP-PCR) based sequencing techniques were then used to generate soil metagenomic profiles. Full and subsampled metagenomic datasets were then annotated against M5NR/M5RNA (taxonomic classification) and SEED Subsystems (metabolic classification) databases. Further comparative analyses were performed using a number of statistical tools including: hierarchical agglomerative clustering (CLUSTER); similarity profile analysis (SIMPROF); non-metric multidimensional scaling (NMDS); and canonical analysis of principal coordinates (CAP) at all major levels of taxonomic and metabolic classification. Our data showed that shotgun and WGA-based approaches generated highly similar metagenomic profiles for the soil samples such that the soil samples could not be distinguished accurately. An AP-PCR based approach was shown to be successful at obtaining reproducible site-specific metagenomic DNA profiles, which in turn were employed for successful discrimination of visually similar soil samples collected from two different locations.
Bolin, Jocelyn H.; Edwards, Julianne M.; Finch, W. Holmes; Cassady, Jerrell C.
2014-01-01
Although traditional clustering methods (e.g., K-means) have been shown to be useful in the social sciences it is often difficult for such methods to handle situations where clusters in the population overlap or are ambiguous. Fuzzy clustering, a method already recognized in many disciplines, provides a more flexible alternative to these traditional clustering methods. Fuzzy clustering differs from other traditional clustering methods in that it allows for a case to belong to multiple clusters simultaneously. Unfortunately, fuzzy clustering techniques remain relatively unused in the social and behavioral sciences. The purpose of this paper is to introduce fuzzy clustering to these audiences who are currently relatively unfamiliar with the technique. In order to demonstrate the advantages associated with this method, cluster solutions of a common perfectionism measure were created using both fuzzy clustering and K-means clustering, and the results compared. Results of these analyses reveal that different cluster solutions are found by the two methods, and the similarity between the different clustering solutions depends on the amount of cluster overlap allowed for in fuzzy clustering. PMID:24795683
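For readers unfamiliar with the technique, a minimal fuzzy c-means implementation illustrates the difference from K-means: each case receives a membership weight in every cluster rather than a single hard label. This is a generic sketch on synthetic data, not the analysis code used for the perfectionism measure:

import numpy as np

def fuzzy_cmeans(X, c, m=2.0, n_iter=100, seed=0):
    # Returns cluster centers and an (n_samples, c) membership matrix whose rows sum to 1.
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=X.shape[0])
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return centers, U

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(2.5, 1.0, (100, 2))])  # overlapping groups
centers, U = fuzzy_cmeans(X, c=2)
hard_labels = U.argmax(axis=1)          # comparable to a K-means assignment
print(U[:5].round(2))                   # graded memberships for the first few cases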
Stochastic coupled cluster theory: Efficient sampling of the coupled cluster expansion
NASA Astrophysics Data System (ADS)
Scott, Charles J. C.; Thom, Alex J. W.
2017-09-01
We consider the sampling of the coupled cluster expansion within stochastic coupled cluster theory. Observing the limitations of previous approaches due to the inherently non-linear behavior of a coupled cluster wavefunction representation, we propose new approaches based on an intuitive, well-defined condition for sampling weights and on sampling the expansion in cluster operators of different excitation levels. We term these modifications even and truncated selections, respectively. Utilising both approaches demonstrates dramatically improved calculation stability as well as reduced computational and memory costs. These modifications are particularly effective at higher truncation levels owing to the large number of terms within the cluster expansion that can be neglected, as demonstrated by the reduction of the number of terms to be sampled when truncating at triple excitations by 77% and hextuple excitations by 98%.
Quality Evaluation of Agricultural Distillates Using an Electronic Nose
Dymerski, Tomasz; Gębicki, Jacek; Wardencki, Waldemar; Namieśnik, Jacek
2013-01-01
The paper presents the application of an electronic nose instrument to fast evaluation of agricultural distillates differing in quality. The investigations were carried out using a prototype of electronic nose equipped with a set of six semiconductor sensors by FIGARO Co., an electronic circuit converting signal into digital form and a set of thermostats able to provide gradient temperature characteristics to a gas mixture. A volatile fraction of the agricultural distillate samples differing in quality was obtained by barbotage. Interpretation of the results involved three data analysis techniques: principal component analysis, single-linkage cluster analysis and cluster analysis with spheres method. The investigations prove the usefulness of the presented technique in the quality control of agricultural distillates. Optimum measurements conditions were also defined, including volumetric flow rate of carrier gas (15 L/h), thermostat temperature during the barbotage process (15 °C) and time of sensor signal acquisition from the onset of the barbotage process (60 s). PMID:24287525
A Photometric redshift galaxy catalog from the Red-Sequence Cluster Survey
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hsieh, Bau-Ching; /Taiwan, Natl. Central U. /Taipei, Inst. Astron. Astrophys.; Yee, H.K.C.
2005-02-01
The Red-Sequence Cluster Survey (RCS) provides a large and deep photometric catalog of galaxies in the z' and R_c bands for 90 square degrees of sky, and supplemental V and B data have been obtained for 33.6 deg^2. They compile a photometric redshift catalog from these 4-band data by utilizing the empirical quadratic polynomial photometric redshift fitting technique in combination with CNOC2 and GOODS/HDF-N redshift data. The training set includes 4924 spectral redshifts. The resulting catalog contains more than one million galaxies with photometric redshifts < 1.5 and R_c < 24, giving an rms scatter δ(Δz)
Dating the Tidal Disruption of Globular Clusters with GAIA Data on Their Stellar Streams
NASA Astrophysics Data System (ADS)
Bose, Sownak; Ginsburg, Idan; Loeb, Abraham
2018-05-01
The Gaia mission promises to deliver precision astrometry at an unprecedented level, heralding a new era for discerning the kinematic and spatial coordinates of stars in our Galaxy. Here, we present a new technique for estimating the age of tidally disrupted globular cluster streams using the proper motions and parallaxes of tracer stars. We evolve the collisional dynamics of globular clusters within the evolving potential of a Milky Way-like halo extracted from a cosmological ΛCDM simulation and analyze the resultant streams as they would be observed by Gaia. The simulations sample a variety of globular cluster orbits, and account for stellar evolution and the gravitational influence of the disk of the Milky Way. We show that a characteristic timescale, obtained from the dispersion of the proper motions and parallaxes of stars within the stream, is a good indicator for the time elapsed since the stream has been freely expanding away due to the tidal disruption of the globular cluster. This timescale, in turn, places a lower limit on the age of the cluster. The age can be deduced from astrometry using a modest number of stars, with the error on this estimate depending on the proximity of the stream and the number of tracer stars used.
Deep spectroscopy of nearby galaxy clusters - II. The Hercules cluster
NASA Astrophysics Data System (ADS)
Agulli, I.; Aguerri, J. A. L.; Diaferio, A.; Dominguez Palmero, L.; Sánchez-Janssen, R.
2017-06-01
We carried out the deep spectroscopic observations of the nearby cluster A 2151 with AF2/WYFFOS@WHT. The caustic technique enables us to identify 360 members brighter than Mr = -16 and within 1.3R200. We separated the members into subsamples according to photometrical and dynamical properties such as colour, local environment and infall time. The completeness of the catalogue and our large sample allow us to analyse the velocity dispersion and the luminosity functions (LFs) of the identified populations. We found evidence of a cluster still in its collapsing phase. The LF of the red population of A 2151 shows a deficit of dwarf red galaxies. Moreover, the normalized LFs of the red and blue populations of A 2151 are comparable to the red and blue LFs of the field, even if the blue galaxies start dominating 1 mag fainter and the red LF is well represented by a single Schechter function rather than a double Schechter function. We discuss how the evolution of cluster galaxies depends on their mass: bright and intermediate galaxies are mainly affected by dynamical friction and internal/mass quenching, while the evolution of dwarfs is driven by environmental processes that need time and a hostile cluster environment to remove the gas reservoirs and halt the star formation.
Occurrence of Radio Minihalos in a Mass-limited Sample of Galaxy Clusters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Giacintucci, Simona; Clarke, Tracy E.; Markevitch, Maxim
2017-06-01
We investigate the occurrence of radio minihalos—diffuse radio sources of unknown origin observed in the cores of some galaxy clusters—in a statistical sample of 58 clusters drawn from the Planck Sunyaev–Zel’dovich cluster catalog using a mass cut (M_500 > 6 × 10^14 M_⊙). We supplement our statistical sample with a similarly sized nonstatistical sample mostly consisting of clusters in the ACCEPT X-ray catalog with suitable X-ray and radio data, which includes lower-mass clusters. Where necessary (for nine clusters), we reanalyzed the Very Large Array archival radio data to determine whether a minihalo is present. Our total sample includes all 28 currently known and recently discovered radio minihalos, including six candidates. We classify clusters as cool-core or non-cool-core according to the value of the specific entropy floor in the cluster center, rederived or newly derived from the Chandra X-ray density and temperature profiles where necessary (for 27 clusters). Contrary to the common wisdom that minihalos are rare, we find that almost all cool cores—at least 12 out of 15 (80%)—in our complete sample of massive clusters exhibit minihalos. The supplementary sample shows that the occurrence of minihalos may be lower in lower-mass cool-core clusters. No minihalos are found in non-cool cores or “warm cores.” These findings will help test theories of the origin of minihalos and provide information on the physical processes and energetics of the cluster cores.
NASA Technical Reports Server (NTRS)
Ballew, G.
1977-01-01
The ability of Landsat multispectral digital data to differentiate among 62 combinations of rock and alteration types at the Goldfield mining district of western Nevada was investigated by using statistical techniques of cluster and discriminant analysis. Multivariate discriminant analysis was not effective in classifying each of the 62 groups, with classification results essentially the same whether data of four channels alone or combined with six ratios of channels were used. Bivariate plots of group means revealed a cluster of three groups including mill tailings, basalt and all other rock and alteration types. Automatic hierarchical clustering based on the four-dimensional Mahalanobis distance between group means of the 30 groups having five or more samples was performed. The results of the cluster analysis revealed hierarchies of mill tailings vs. natural materials, basalt vs. non-basalt, highly reflectant rocks vs. other rocks and exclusively unaltered rocks vs. predominantly altered rocks. The hierarchies were used to determine the order in which sets of multiple discriminant analyses were to be performed, and the resulting discriminant functions were used to produce a map of geology and alteration which has an overall accuracy of 70 percent for discriminating exclusively unaltered rocks from predominantly altered rocks.
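The clustering of group means can be sketched as follows (synthetic spectral means and a synthetic pooled covariance replace the Landsat statistics; the Mahalanobis distances feed a standard agglomerative linkage):

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

rng = np.random.default_rng(6)
group_means = rng.normal(size=(30, 4))                      # 30 rock/alteration groups x 4 bands
pooled_cov = np.cov(rng.normal(size=(500, 4)), rowvar=False)
VI = np.linalg.inv(pooled_cov)

n = len(group_means)
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        diff = group_means[i] - group_means[j]
        D[i, j] = D[j, i] = np.sqrt(diff @ VI @ diff)        # Mahalanobis distance between group means

Z = linkage(squareform(D), method="average")                 # hierarchy of spectrally similar groups
print(fcluster(Z, t=4, criterion="maxclust"))                # e.g. cut into 4 super-groups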
A pilot cluster randomized controlled trial of structured goal-setting following stroke.
Taylor, William J; Brown, Melanie; William, Levack; McPherson, Kathryn M; Reed, Kirk; Dean, Sarah G; Weatherall, Mark
2012-04-01
To determine the feasibility, the cluster design effect, and the variance and minimal clinically important difference of the primary outcome in a pilot study of a structured approach to goal-setting. A cluster randomized controlled trial. Inpatient rehabilitation facilities. People who were admitted to inpatient rehabilitation following stroke who had sufficient cognition to engage in structured goal-setting and complete the primary outcome measure. Structured goal elicitation using the Canadian Occupational Performance Measure. Quality of life at 12 weeks using the Schedule for Individualised Quality of Life (SEIQOL-DW), Functional Independence Measure, Short Form 36 and Patient Perception of Rehabilitation (measuring satisfaction with rehabilitation). Assessors were blinded to the intervention. Four rehabilitation services and 41 patients were randomized. We found high values of the intraclass correlation for the outcome measures (ranging from 0.03 to 0.40) and a high variance of the SEIQOL-DW (SD 19.6) relative to the minimal clinically important difference of 2.1, leading to impractically large sample size requirements for a cluster randomized design. A cluster randomized design is not a practical means of avoiding contamination effects in studies of inpatient rehabilitation goal-setting. Other techniques for coping with contamination effects are necessary.
Clustervision: Visual Supervision of Unsupervised Clustering.
Kwon, Bum Chul; Eysenbach, Ben; Verma, Janu; Ng, Kenney; De Filippi, Christopher; Stewart, Walter F; Perer, Adam
2018-01-01
Clustering, the process of grouping together similar items into distinct partitions, is a common type of unsupervised machine learning that can be useful for summarizing and aggregating complex multi-dimensional data. However, data can be clustered in many ways, and there exists a large body of algorithms designed to reveal different patterns. While having access to a wide variety of algorithms is helpful, in practice, it is quite difficult for data scientists to choose and parameterize algorithms to get the clustering results relevant for their dataset and analytical tasks. To alleviate this problem, we built Clustervision, a visual analytics tool that helps ensure data scientists find the right clustering among the large number of techniques and parameters available. Our system clusters data using a variety of clustering techniques and parameters and then ranks clustering results utilizing five quality metrics. In addition, users can guide the system to produce more relevant results by providing task-relevant constraints on the data. Our visual user interface allows users to find high quality clustering results, explore the clusters using several coordinated visualization techniques, and select the cluster result that best suits their task. We demonstrate this novel approach using a case study with a team of researchers in the medical domain and showcase that our system empowers users to choose an effective representation of their complex data.
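The core loop of such a tool can be approximated in a few lines: run several algorithms over several parameter settings and rank the results by internal quality metrics (three metrics are used below instead of the system's five, and the toy data are an assumption):

import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans, SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score, davies_bouldin_score, silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

results = []
for k in range(2, 7):
    for name, algo in [("kmeans", KMeans(n_clusters=k, n_init=10, random_state=0)),
                       ("agglomerative", AgglomerativeClustering(n_clusters=k)),
                       ("spectral", SpectralClustering(n_clusters=k, random_state=0))]:
        labels = algo.fit_predict(X)
        results.append((silhouette_score(X, labels),          # higher is better
                        calinski_harabasz_score(X, labels),   # higher is better
                        -davies_bouldin_score(X, labels),     # negated so higher is better
                        name, k))

for row in sorted(results, reverse=True)[:5]:                 # top-ranked clusterings by silhouette
    print(row)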
Clermont, Gilles; Chen, Lujie; Dubrawski, Artur W.; Ren, Dianxu; Hoffman, Leslie A.; Pinsky, Michael R.; Hravnak, Marilyn
2018-01-01
Cardiorespiratory instability (CRI) in monitored step-down unit (SDU) patients has a variety of etiologies, and likely manifests in patterns of vital signs (VS) changes. We explored the use of clustering techniques to identify patterns in the initial CRI epoch (CRI1; first exceedances of VS beyond stability thresholds after SDU admission) of unstable patients, and inter-cluster differences in admission characteristics and outcomes. Continuous noninvasive measurements of heart rate (HR), respiratory rate (RR), and pulse oximetry (SpO2) were sampled at 1/20 Hz. We identified CRI1 in 165 patients, employed hierarchical and k-means clustering, tested several clustering solutions, used 10-fold cross validation to establish the best solution and assessed inter-cluster differences in admission characteristics and outcomes. Three clusters (C) were derived: C1) normal/high HR and RR, normal SpO2 (n = 30); C2) normal HR and RR, low SpO2 (n = 103); and C3) low/normal HR, low RR and normal SpO2 (n = 32). Clusters were significantly different based on age (p < 0.001; older patients in C2), number of comorbidities (p = 0.008; more C2 patients had ≥ 2) and hospital length of stay (p = 0.006; C1 patients stayed longer). There were no between-cluster differences in SDU length of stay, or mortality. Three different clusters of VS presentations for CRI1 were identified. Clusters varied on age, number of comorbidities and hospital length of stay. Future study is needed to determine if there are common physiologic underpinnings of VS clusters which might inform clinical decision-making when CRI first manifests. PMID:28229353
The Integrated Cluster Finder for the ARCHES project
NASA Astrophysics Data System (ADS)
Mints, Alexey; Schwope, Axel; Rosen, Simon; Pineau, François-Xavier; Carrera, Francisco
2017-01-01
Context. Clusters of galaxies are important for cosmology and astrophysics. They may be discovered through either the summed optical/IR radiation originating from their member galaxies or via X-ray emission originating from the hot intracluster medium. X-ray samples are not affected by projection effects but a redshift determination typically needs optical and infrared follow-up to then infer X-ray temperatures and luminosities. Aims: We want to confirm serendipitously discovered X-ray emitting cluster candidates and measure their cosmological redshift through the analysis and exploration of multi-wavelength photometric catalogues. Methods: We developed a tool, the Integrated Cluster Finder (ICF), to search for clusters by determining overdensities of potential member galaxies in optical and infrared catalogues. Based on a spectroscopic meta-catalogue we calibrated colour-redshift relations that combine optical (SDSS) and IR data (UKIDSS, WISE). The tool is used to quantify the overdensity of galaxies against the background via a modified redMaPPer technique and to quantify the confidence of a cluster detection. Results: Cluster finding results are compared to reference catalogues found in the literature. The results agree to within 95-98%. The tool is used to confirm 488 out of 830 cluster candidates drawn from 3XMMe in the footprint of the SDSS and CFHT catalogues. Conclusions: The ICF is a flexible and highly efficient tool to search for galaxy clusters in multiple catalogues and is freely available to the community. It may be used to identify the cluster content in future X-ray catalogues from XMM-Newton and eventually from eROSITA.
Wickham, J.D.; Stehman, S.V.; Smith, J.H.; Wade, T.G.; Yang, L.
2004-01-01
Two-stage cluster sampling reduces the cost of collecting accuracy assessment reference data by constraining sample elements to fall within a limited number of geographic domains (clusters). However, because classification error is typically positively spatially correlated, within-cluster correlation may reduce the precision of the accuracy estimates. The detailed population information to quantify a priori the effect of within-cluster correlation on precision is typically unavailable. Consequently, a convenient, practical approach to evaluate the likely performance of a two-stage cluster sample is needed. We describe such an a priori evaluation protocol focusing on the spatial distribution of the sample by land-cover class across different cluster sizes and costs of different sampling options, including options not imposing clustering. This protocol also assesses the two-stage design's adequacy for estimating the precision of accuracy estimates for rare land-cover classes. We illustrate the approach using two large-area, regional accuracy assessments from the National Land-Cover Data (NLCD), and describe how the a priori evaluation was used as a decision-making tool when implementing the NLCD design.
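The precision cost of within-cluster correlation that motivates this protocol can be quantified with the usual design-effect approximation (a sketch, not the NLCD evaluation itself; the budget, cluster sizes, and correlation value are illustrative):

def effective_sample_size(n_clusters, per_cluster, icc):
    # Standard approximation: deff = 1 + (m - 1) * icc, where m is the cluster size
    # and icc is the within-cluster correlation of classification error.
    n = n_clusters * per_cluster
    deff = 1 + (per_cluster - 1) * icc
    return n / deff

# Same total budget of 1000 reference pixels, spread over fewer or more clusters.
for m in (5, 10, 25, 50):
    print(m, round(effective_sample_size(1000 // m, m, icc=0.3), 1))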
Mwangi, Benson; Soares, Jair C; Hasan, Khader M
2014-10-30
Neuroimaging machine learning studies have largely utilized supervised algorithms - meaning they require both neuroimaging scan data and corresponding target variables (e.g. healthy vs. diseased) to be successfully 'trained' for a prediction task. Noticeably, this approach may not be optimal or possible when the global structure of the data is not well known and the researcher does not have an a priori model to fit the data. We set out to investigate the utility of an unsupervised machine learning technique; t-distributed stochastic neighbour embedding (t-SNE) in identifying 'unseen' sample population patterns that may exist in high-dimensional neuroimaging data. Multimodal neuroimaging scans from 92 healthy subjects were pre-processed using atlas-based methods, integrated and input into the t-SNE algorithm. Patterns and clusters discovered by the algorithm were visualized using a 2D scatter plot and further analyzed using the K-means clustering algorithm. t-SNE was evaluated against classical principal component analysis. Remarkably, based on unlabelled multimodal scan data, t-SNE separated study subjects into two very distinct clusters which corresponded to subjects' gender labels (cluster silhouette index value=0.79). The resulting clusters were used to develop an unsupervised minimum distance clustering model which identified 93.5% of subjects' gender. Notably, from a neuropsychiatric perspective this method may allow discovery of data-driven disease phenotypes or sub-types of treatment responders. Copyright © 2014 Elsevier B.V. All rights reserved.
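A sketch of the embedding-then-clustering pipeline (random features stand in for the pre-processed multimodal scan data; the two-cluster choice mirrors the gender finding but is otherwise arbitrary):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(7)
X = rng.normal(size=(92, 500))                          # 92 subjects x imaging-derived features

embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embedding)
print(silhouette_score(embedding, labels))              # cluster separation in the 2D map

pcs = PCA(n_components=2).fit_transform(X)              # classical PCA baseline for comparison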
Cognitive Clusters in Specific Learning Disorder.
Poletti, Michele; Carretta, Elisa; Bonvicini, Laura; Giorgi-Rossi, Paolo
The heterogeneity among children with learning disabilities still represents a barrier and a challenge in their conceptualization. Although a dimensional approach has been gaining support, the categorical approach is still the most adopted, as in the recent fifth edition of the Diagnostic and Statistical Manual of Mental Disorders. The introduction of the single overarching diagnostic category of specific learning disorder (SLD) could underemphasize interindividual clinical differences regarding intracategory cognitive functioning and learning proficiency, according to current models of multiple cognitive deficits at the basis of neurodevelopmental disorders. The characterization of specific cognitive profiles associated with an already manifest SLD could help identify possible early cognitive markers of SLD risk and distinct trajectories of atypical cognitive development leading to SLD. In this perspective, we applied a cluster analysis to identify groups of children with a Diagnostic and Statistical Manual-based diagnosis of SLD with similar cognitive profiles and to describe the association between clusters and SLD subtypes. A sample of 205 children with a diagnosis of SLD were enrolled. Cluster analyses (agglomerative hierarchical and nonhierarchical iterative clustering technique) were used successively on 10 core subtests of the Wechsler Intelligence Scale for Children-Fourth Edition. The 4-cluster solution was adopted, and external validation found differences in terms of SLD subtype frequencies and learning proficiency among clusters. Clinical implications of these findings are discussed, tracing directions for further studies.
Frickenhaus, Stephan; Kannan, Srinivasaraghavan; Zacharias, Martin
2009-02-01
A direct conformational clustering and mapping approach for peptide conformations based on backbone dihedral angles has been developed and applied to compare conformational sampling of Met-enkephalin using two molecular dynamics (MD) methods. Efficient clustering in dihedrals has been achieved by evaluating all combinations resulting from independent clustering of each dihedral angle distribution, thus resolving all conformational substates. In contrast, Cartesian clustering was unable to accurately distinguish between all substates. Projection of clusters on dihedral principal component (PCA) subspaces did not result in efficient separation of highly populated clusters. However, representation in a nonlinear metric by Sammon mapping was able to separate well the 48 highest populated clusters in just two dimensions. In addition, this approach also allowed us to visualize the transition frequencies between clusters efficiently. Significantly higher transition frequencies between more distinct conformational substates were found for a recently developed biasing-potential replica exchange MD simulation method, allowing faster sampling of possible substates compared to conventional MD simulations. Although the number of theoretically possible clusters grows exponentially with peptide length, in practice, the number of clusters is only limited by the sampling size (typically much smaller), and therefore the method is well suited also for large systems. The approach could be useful to rapidly and accurately evaluate conformational sampling during MD simulations, to compare different sampling strategies and eventually to detect kinetic bottlenecks in folding pathways.
Hakimzadeh, Neda; Parastar, Hadi; Fattahi, Mohammad
2014-01-24
In this study, multivariate curve resolution (MCR) and multivariate classification methods are proposed to develop a new chemometric strategy for comprehensive analysis of high-performance liquid chromatography-diode array absorbance detection (HPLC-DAD) fingerprints of sixty Salvia reuterana samples from five different geographical regions. Different chromatographic problems occurred during HPLC-DAD analysis of S. reuterana samples, such as baseline/background contribution and noise, low signal-to-noise ratio (S/N), asymmetric peaks, elution time shifts, and peak overlap are handled using the proposed strategy. In this way, chromatographic fingerprints of sixty samples are properly segmented to ten common chromatographic regions using local rank analysis and then, the corresponding segments are column-wise augmented for subsequent MCR analysis. Extended multivariate curve resolution-alternating least squares (MCR-ALS) is used to obtain pure component profiles in each segment. In general, thirty-one chemical components were resolved using MCR-ALS in sixty S. reuterana samples and the lack of fit (LOF) values of MCR-ALS models were below 10.0% in all cases. Pure spectral profiles are considered for identification of chemical components by comparing their resolved spectra with the standard ones and twenty-four components out of thirty-one components were identified. Additionally, pure elution profiles are used to obtain relative concentrations of chemical components in different samples for multivariate classification analysis by principal component analysis (PCA) and k-nearest neighbors (kNN). Inspection of the PCA score plot (explaining 76.1% of variance accounted for three PCs) showed that S. reuterana samples belong to four clusters. The degree of class separation (DCS) which quantifies the distance separating clusters in relation to the scatter within each cluster is calculated for four clusters and it was in the range of 1.6-5.8. These results are then confirmed by kNN. In addition, according to the PCA loading plot and kNN dendrogram of thirty-one variables, five chemical constituents of luteolin-7-o-glucoside, salvianolic acid D, rosmarinic acid, lithospermic acid and trijuganone A are identified as the most important variables (i.e., chemical markers) for clusters discrimination. Finally, the effect of different chemical markers on samples differentiation is investigated using counter-propagation artificial neural network (CP-ANN) method. It is concluded that the proposed strategy can be successfully applied for comprehensive analysis of chromatographic fingerprints of complex natural samples. Copyright © 2013 Elsevier B.V. All rights reserved.
THE S4G PERSPECTIVE ON CIRCUMSTELLAR DUST EXTINCTION OF ASYMPTOTIC GIANT BRANCH STARS IN M100
DOE Office of Scientific and Technical Information (OSTI.GOV)
Meidt, Sharon E.; Schinnerer, Eva; Munoz-Mateos, Juan-Carlos
2012-04-01
We examine the effect of circumstellar dust extinction on the near-IR (NIR) contribution of asymptotic giant branch (AGB) stars in intermediate-age clusters throughout the disk of M100. For our sample of 17 AGB-dominated clusters we extract optical-to-mid-IR spectral energy distributions (SEDs) and find that NIR brightness is coupled to the mid-IR dust emission in such a way that a significant reduction of AGB light, of up to 1 mag in the K band, follows from extinction by the dust shell formed during this stage. Since the dust optical depth varies with AGB chemistry (C-rich or O-rich), our results suggest that the contribution of AGB stars to the flux from their host clusters will be closely linked to the metallicity and the progenitor mass of the AGB star, to which dust chemistry and mass-loss rate are sensitive. Our sample of clusters, each the analogue of a ∼1 Gyr old post-starburst galaxy, has implications within the context of mass and age estimation via SED modeling at high-z: we find that the average ∼0.5 mag extinction estimated here may be sufficient to reduce the AGB contribution in the (rest-frame) K band from ∼70%, as predicted in the latest generation of synthesis models, to ∼35%. Our technique for selecting AGB-dominated clusters in nearby galaxies promises to be effective for discriminating the uncertainties associated with AGB stars in intermediate-age populations that plague age and mass estimation in high-z galaxies.
A Highly Efficient Design Strategy for Regression with Outcome Pooling
Mitchell, Emily M.; Lyles, Robert H.; Manatunga, Amita K.; Perkins, Neil J.; Schisterman, Enrique F.
2014-01-01
The potential for research involving biospecimens can be hindered by the prohibitive cost of performing laboratory assays on individual samples. To mitigate this cost, strategies such as randomly selecting a portion of specimens for analysis or randomly pooling specimens prior to performing laboratory assays may be employed. These techniques, while effective in reducing cost, are often accompanied by a considerable loss of statistical efficiency. We propose a novel pooling strategy based on the k-means clustering algorithm to reduce laboratory costs while maintaining a high level of statistical efficiency when predictor variables are measured on all subjects, but the outcome of interest is assessed in pools. We perform simulations motivated by the BioCycle study to compare this k-means pooling strategy with current pooling and selection techniques under simple and multiple linear regression models. While all of the methods considered produce unbiased estimates and confidence intervals with appropriate coverage, pooling under k-means clustering provides the most precise estimates, closely approximating results from the full data and losing minimal precision as the total number of pools decreases. The benefits of k-means clustering evident in the simulation study are then applied to an analysis of the BioCycle dataset. In conclusion, when the number of lab tests is limited by budget, pooling specimens based on k-means clustering prior to performing lab assays can be an effective way to save money with minimal information loss in a regression setting. PMID:25220822
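The essential idea can be sketched in a few lines: cluster subjects on their predictors, average the outcome within each pool, and regress pool means on pool means (a toy illustration; the paper's estimator handles unequal pool sizes and efficiency comparisons more carefully, and all data below are synthetic):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(8)
n, n_pools = 500, 50
X = rng.normal(size=(n, 3))                                  # predictors measured on every subject
beta = np.array([1.0, -0.5, 2.0])
y = X @ beta + rng.normal(scale=1.0, size=n)                 # outcome, later assayed only in pools

pool_id = KMeans(n_clusters=n_pools, n_init=10, random_state=0).fit_predict(X)

X_pool = np.vstack([X[pool_id == g].mean(axis=0) for g in range(n_pools)])
y_pool = np.array([y[pool_id == g].mean() for g in range(n_pools)])  # the pooled assay result

print(LinearRegression().fit(X_pool, y_pool).coef_)          # close to the full-data coefficients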
Focus-based filtering + clustering technique for power-law networks with small world phenomenon
NASA Astrophysics Data System (ADS)
Boutin, François; Thièvre, Jérôme; Hascoët, Mountaz
2006-01-01
Realistic interaction networks usually present two main properties: a power-law degree distribution and a small world behavior. Few nodes are linked to many nodes and adjacent nodes are likely to share common neighbors. Moreover, the graph structure usually presents a dense core that is difficult to explore with classical filtering and clustering techniques. In this paper, we propose a new filtering technique accounting for a user-focus. This technique extracts a tree-like graph that also has a power-law degree distribution and small world behavior. The resulting structure is easily drawn with classical force-directed drawing algorithms. It is also quickly clustered and displayed as a multi-level silhouette tree (MuSi-Tree) from any user-focus. We built a new graph filtering + clustering + drawing API and report a case study.
Scalable Prediction of Energy Consumption using Incremental Time Series Clustering
DOE Office of Scientific and Technical Information (OSTI.GOV)
Simmhan, Yogesh; Noor, Muhammad Usman
2013-10-09
Time series datasets are a canonical form of high velocity Big Data, and often generated by pervasive sensors, such as found in smart infrastructure. Performing predictive analytics on time series data can be computationally complex, and requires approximation techniques. In this paper, we motivate this problem using a real application from the smart grid domain. We propose an incremental clustering technique, along with a novel affinity score for determining cluster similarity, which help reduce the prediction error for cumulative time series within a cluster. We evaluate this technique, along with optimizations, using real datasets from smart meters, totaling ~700,000 data points, and show the efficacy of our techniques in improving the prediction error of time series data within polynomial time.
Bushel, Pierre R; Wolfinger, Russell D; Gibson, Greg
2007-01-01
Background Commonly employed clustering methods for analysis of gene expression data do not directly incorporate phenotypic data about the samples. Furthermore, clustering of samples with known phenotypes is typically performed in an informal fashion. The inability of clustering algorithms to incorporate biological data in the grouping process can limit proper interpretation of the data and its underlying biology. Results We present a more formal approach, the modk-prototypes algorithm, for clustering biological samples based on simultaneously considering microarray gene expression data and classes of known phenotypic variables such as clinical chemistry evaluations and histopathologic observations. The strategy involves constructing an objective function with the sum of the squared Euclidean distances for numeric microarray and clinical chemistry data and simple matching for histopathology categorical values in order to measure dissimilarity of the samples. Separate weighting terms are used for microarray, clinical chemistry and histopathology measurements to control the influence of each data domain on the clustering of the samples. The dynamic validity index for numeric data was modified with a category utility measure for determining the number of clusters in the data sets. A cluster's prototype, formed from the mean of the values for numeric features and the mode of the categorical values of all the samples in the group, is representative of the phenotype of the cluster members. The approach is shown to work well with a simulated mixed data set and two real data examples containing numeric and categorical data types. One from a heart disease study and another from acetaminophen (an analgesic) exposure in rat liver that causes centrilobular necrosis. Conclusion The modk-prototypes algorithm partitioned the simulated data into clusters with samples in their respective class group and the heart disease samples into two groups (sick and buff denoting samples having pain type representative of angina and non-angina respectively) with an accuracy of 79%. This is on par with, or better than, the assignment accuracy of the heart disease samples by several well-known and successful clustering algorithms. Following modk-prototypes clustering of the acetaminophen-exposed samples, informative genes from the cluster prototypes were identified that are descriptive of, and phenotypically anchored to, levels of necrosis of the centrilobular region of the rat liver. The biological processes cell growth and/or maintenance, amine metabolism, and stress response were shown to discern between no and moderate levels of acetaminophen-induced centrilobular necrosis. The use of well-known and traditional measurements directly in the clustering provides some guarantee that the resulting clusters will be meaningfully interpretable. PMID:17408499
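The objective function described above can be sketched as a weighted dissimilarity plus a prototype update (a simplified stand-in for the modk-prototypes algorithm; the domain names, weights, and toy values are illustrative):

from collections import Counter
import numpy as np

def modk_dissimilarity(sample, prototype, weights=(1.0, 1.0, 1.0)):
    # Squared Euclidean distance for the two numeric domains plus simple matching
    # for the categorical histopathology codes, each scaled by its domain weight.
    w_expr, w_chem, w_histo = weights
    d_expr = np.sum((sample["expr"] - prototype["expr"]) ** 2)
    d_chem = np.sum((sample["chem"] - prototype["chem"]) ** 2)
    d_histo = np.sum(sample["histo"] != prototype["histo"])
    return w_expr * d_expr + w_chem * d_chem + w_histo * d_histo

def make_prototype(members):
    # Prototype = mean of the numeric features, mode of each categorical feature.
    expr = np.mean([m["expr"] for m in members], axis=0)
    chem = np.mean([m["chem"] for m in members], axis=0)
    histo = np.array([Counter(col).most_common(1)[0][0]
                      for col in zip(*[tuple(m["histo"]) for m in members])])
    return {"expr": expr, "chem": chem, "histo": histo}

a = {"expr": np.array([1.0, 2.0]), "chem": np.array([0.5]), "histo": np.array(["necrosis", "mild"])}
b = {"expr": np.array([1.5, 1.0]), "chem": np.array([0.7]), "histo": np.array(["normal", "mild"])}
print(modk_dissimilarity(a, make_prototype([a, b])))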
Choosing a Cluster Sampling Design for Lot Quality Assurance Sampling Surveys
Hund, Lauren; Bedrick, Edward J.; Pagano, Marcello
2015-01-01
Lot quality assurance sampling (LQAS) surveys are commonly used for monitoring and evaluation in resource-limited settings. Recently several methods have been proposed to combine LQAS with cluster sampling for more timely and cost-effective data collection. For some of these methods, the standard binomial model can be used for constructing decision rules as the clustering can be ignored. For other designs, considered here, clustering is accommodated in the design phase. In this paper, we compare these latter cluster LQAS methodologies and provide recommendations for choosing a cluster LQAS design. We compare technical differences in the three methods and determine situations in which the choice of method results in a substantively different design. We consider two different aspects of the methods: the distributional assumptions and the clustering parameterization. Further, we provide software tools for implementing each method and clarify misconceptions about these designs in the literature. We illustrate the differences in these methods using vaccination and nutrition cluster LQAS surveys as example designs. The cluster methods are not sensitive to the distributional assumptions but can result in substantially different designs (sample sizes) depending on the clustering parameterization. However, none of the clustering parameterizations used in the existing methods appears to be consistent with the observed data, and, consequently, choice between the cluster LQAS methods is not straightforward. Further research should attempt to characterize clustering patterns in specific applications and provide suggestions for best-practice cluster LQAS designs on a setting-specific basis. PMID:26125967
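For orientation, the standard simple-random-sampling LQAS decision rule that these cluster designs extend looks like this (the sample size, decision rule, and thresholds are illustrative; the cluster designs discussed in the paper additionally inflate the sample size through a clustering parameterization):

from scipy.stats import binom

def lqas_errors(n, d, p_upper, p_lower):
    # Classify coverage as acceptable when at least d of the n sampled children
    # are vaccinated; report the classification errors at the two thresholds.
    alpha = binom.cdf(d - 1, n, p_upper)       # P(classify low | true coverage = p_upper)
    beta = 1.0 - binom.cdf(d - 1, n, p_lower)  # P(classify high | true coverage = p_lower)
    return alpha, beta

print(lqas_errors(n=19, d=13, p_upper=0.80, p_lower=0.50))

# A crude cluster adjustment: inflate n by a design effect 1 + (m - 1) * icc.
m, icc = 5, 0.1
print(round(19 * (1 + (m - 1) * icc)))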
The relative impact of baryons and cluster shape on weak lensing mass estimates of galaxy clusters
NASA Astrophysics Data System (ADS)
Lee, B. E.; Le Brun, A. M. C.; Haq, M. E.; Deering, N. J.; King, L. J.; Applegate, D.; McCarthy, I. G.
2018-05-01
Weak gravitational lensing depends on the integrated mass along the line of sight. Baryons contribute to the mass distribution of galaxy clusters and the resulting mass estimates from lensing analysis. We use the cosmo-OWLS suite of hydrodynamic simulations to investigate the impact of baryonic processes on the bias and scatter of weak lensing mass estimates of clusters. These estimates are obtained by fitting NFW profiles to mock data using MCMC techniques. In particular, we examine the difference in estimates between dark matter-only runs and those including various prescriptions for baryonic physics. We find no significant difference in the mass bias when baryonic physics is included, though the overall mass estimates are suppressed when feedback from AGN is included. For lowest-mass systems for which a reliable mass can be obtained (M200 ≈ 2 × 1014M⊙), we find a bias of ≈-10 per cent. The magnitude of the bias tends to decrease for higher mass clusters, consistent with no bias for the most massive clusters which have masses comparable to those found in the CLASH and HFF samples. For the lowest mass clusters, the mass bias is particularly sensitive to the fit radii and the limits placed on the concentration prior, rendering reliable mass estimates difficult. The scatter in mass estimates between the dark matter-only and the various baryonic runs is less than between different projections of individual clusters, highlighting the importance of triaxiality.
Olives, Casey; Pagano, Marcello; Deitchler, Megan; Hedt, Bethany L; Egge, Kari; Valadez, Joseph J
2009-04-01
Traditional lot quality assurance sampling (LQAS) methods require simple random sampling to guarantee valid results. However, cluster sampling has been proposed to reduce the number of random starting points. This study uses simulations to examine the classification error of two such designs, a 67x3 (67 clusters of three observations) and a 33x6 (33 clusters of six observations) sampling scheme to assess the prevalence of global acute malnutrition (GAM). Further, we explore the use of a 67x3 sequential sampling scheme for LQAS classification of GAM prevalence. Results indicate that, for independent clusters with moderate intracluster correlation for the GAM outcome, the three sampling designs maintain approximate validity for LQAS analysis. Sequential sampling can substantially reduce the average sample size that is required for data collection. The presence of intercluster correlation can impact dramatically the classification error that is associated with LQAS analysis.
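A quick way to see how intracluster correlation drives these classification errors is to simulate cluster-level prevalences from a beta distribution and count cases. The sketch below is illustrative only; the decision rule, prevalence thresholds and ICC are assumptions, not values from the study.

    # Illustrative simulation of classification error for a 67x3 cluster LQAS design.
    import numpy as np

    rng = np.random.default_rng(1)

    def prob_classified_high(p_true, rho, n_clusters=67, m=3, d=25, reps=10000):
        """P(total cases >= d) when true GAM prevalence is p_true and the
        cluster-level prevalences follow a Beta distribution with ICC rho."""
        a = p_true * (1 - rho) / rho
        b = (1 - p_true) * (1 - rho) / rho
        p_clust = rng.beta(a, b, size=(reps, n_clusters))   # one prevalence per cluster
        cases = rng.binomial(m, p_clust).sum(axis=1)        # m children per cluster
        return np.mean(cases >= d)

    print(prob_classified_high(0.10, rho=0.05))       # error: low-prevalence area called high
    print(1 - prob_classified_high(0.15, rho=0.05))   # error: high-prevalence area called low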
Arnup, Sarah J; McKenzie, Joanne E; Hemming, Karla; Pilcher, David; Forbes, Andrew B
2017-08-15
In a cluster randomised crossover (CRXO) design, a sequence of interventions is assigned to a group, or 'cluster', of individuals. Each cluster receives each intervention in a separate period of time, forming 'cluster-periods'. Sample size calculations for CRXO trials need to account for both the cluster randomisation and crossover aspects of the design. Formulae are available for the two-period, two-intervention, cross-sectional CRXO design; however, implementation of these formulae is known to be suboptimal. The aims of this tutorial are to illustrate the intuition behind the design and to provide guidance on performing sample size calculations. Graphical illustrations are used to describe the effect of the cluster randomisation and crossover aspects of the design on the correlation between individual responses in a CRXO trial. Sample size calculations for binary and continuous outcomes are illustrated using parameters estimated from the Australia and New Zealand Intensive Care Society - Adult Patient Database (ANZICS-APD) for patient mortality and length of stay (LOS). The similarity between individual responses in a CRXO trial can be understood in terms of three components of variation: variation in the cluster mean response; variation in the cluster-period mean response; and variation between individual responses within a cluster-period; or equivalently in terms of the correlation between individual responses in the same cluster-period (within-cluster within-period correlation, WPC), and between individual responses in the same cluster but in different periods (within-cluster between-period correlation, BPC). The BPC lies between zero and the WPC. When the WPC and BPC are equal, the precision gained by the crossover aspect of the CRXO design equals the precision lost by cluster randomisation. When the BPC is zero there is no advantage in a CRXO over a parallel-group cluster randomised trial. Sample size calculations illustrate that small changes in the specification of the WPC or BPC can increase the required number of clusters. By illustrating how the parameters required for sample size calculations arise from the CRXO design, and by providing guidance on both how to choose values for the parameters and how to perform the sample size calculations, the implementation of the sample size formulae for CRXO trials may improve.
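For the two-period, cross-sectional CRXO design, a commonly cited design effect relative to an individually randomised trial with the same number of observations is 1 + (m - 1)·WPC - m·BPC, where m is the number of observations per cluster-period (Giraudeau and colleagues give an equivalent form). The sketch below uses that expression for a continuous outcome; it is an illustration under assumed inputs and should be checked against the tutorial's own formulae before use.

    # Clusters required for a two-period cross-sectional CRXO trial (sketch).
    from scipy.stats import norm

    def crxo_clusters(delta, sd, m, wpc, bpc, alpha=0.05, power=0.8):
        """m = observations per cluster-period; returns total clusters needed."""
        z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
        n_ind_per_arm = 2 * (z * sd / delta) ** 2        # individually randomised trial
        design_effect = 1 + (m - 1) * wpc - m * bpc      # assumed CRXO design effect
        n_per_arm = n_ind_per_arm * design_effect        # observations per treatment arm
        return n_per_arm / m                             # each cluster gives m obs per arm

    print(crxo_clusters(delta=0.25, sd=1.0, m=30, wpc=0.05, bpc=0.02))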
DOE Office of Scientific and Technical Information (OSTI.GOV)
Peppas, N.A.; Hill-Lievense, M.E.; Hooker, D.T. II
1981-01-01
Seven coal samples ranging from a lignite with 69.95% carbon to an anthracite with 94.17% carbon on a dry mineral matter-free (dmmf) basis were extracted with pyridine at its reflux temperature for two weeks. The coal matrices obtained were subjected to two degradation techniques, the Sternberg reductive alkylation technique and the Miyake alkylation technique. Gel permeation chromatographic analysis of pyridine-extracted liquids of the alkylated coal showed average molecular weights smaller than those of the original coal extracts. Electron impact mass spectrometry was used to obtain the mass spectra of these alkylated coal samples. Based on investigation of the recurring pattern of the peaks of the mass spectra of these products, it was concluded that a cluster size of 126 to 130 is characteristic of the crosslinked structure of the coal studied. In addition, several chemical compounds in the range of m/e 78-191 were identified.
75 FR 44937 - Submission for OMB Review; Comment Request
Federal Register 2010, 2011, 2012, 2013, 2014
2010-07-30
... is a block cluster, which consists of one or more contiguous census blocks. The P sample is a sample of housing units and persons obtained independently from the census for a sample of block clusters. The E sample is a sample of census housing units and enumerations in the same block of clusters as the...
Descriptive epidemiology of typhoid fever during an epidemic in Harare, Zimbabwe, 2012.
Polonsky, Jonathan A; Martínez-Pino, Isabel; Nackers, Fabienne; Chonzi, Prosper; Manangazira, Portia; Van Herp, Michel; Maes, Peter; Porten, Klaudia; Luquero, Francisco J
2014-01-01
Typhoid fever remains a significant public health problem in developing countries. In October 2011, a typhoid fever epidemic was declared in Harare, Zimbabwe - the fourth enteric infection epidemic since 2008. To orient control activities, we described the epidemiology and spatiotemporal clustering of the epidemic in Dzivaresekwa and Kuwadzana, the two most affected suburbs of Harare. A typhoid fever case-patient register was analysed to describe the epidemic. To explore clustering, we constructed a dataset comprising GPS coordinates of case-patient residences and randomly sampled residential locations (spatial controls). The scale and significance of clustering was explored with Ripley K functions. Cluster locations were determined by a random labelling technique and confirmed using Kulldorff's spatial scan statistic. We analysed data from 2570 confirmed and suspected case-patients, and found significant spatiotemporal clustering of typhoid fever in two non-overlapping areas, which appeared to be linked to environmental sources. Peak relative risk was more than six times greater than in areas lying outside the cluster ranges. Clusters were identified in similar geographical ranges by both random labelling and Kulldorff's spatial scan statistic. The spatial scale at which typhoid fever clustered was highly localised, with significant clustering at distances up to 4.5 km and peak levels at approximately 3.5 km. The epicentre of infection transmission shifted from one cluster to the other during the course of the epidemic. This study demonstrated highly localised clustering of typhoid fever during an epidemic in an urban African setting, and highlights the importance of spatiotemporal analysis for making timely decisions about targeting prevention and control activities and reinforcing treatment during epidemics. This approach should be integrated into existing surveillance systems to facilitate early detection of epidemics and identify their spatial range.
Descriptive Epidemiology of Typhoid Fever during an Epidemic in Harare, Zimbabwe, 2012
Polonsky, Jonathan A.; Martínez-Pino, Isabel; Nackers, Fabienne; Chonzi, Prosper; Manangazira, Portia; Van Herp, Michel; Maes, Peter; Porten, Klaudia; Luquero, Francisco J.
2014-01-01
Background Typhoid fever remains a significant public health problem in developing countries. In October 2011, a typhoid fever epidemic was declared in Harare, Zimbabwe - the fourth enteric infection epidemic since 2008. To orient control activities, we described the epidemiology and spatiotemporal clustering of the epidemic in Dzivaresekwa and Kuwadzana, the two most affected suburbs of Harare. Methods A typhoid fever case-patient register was analysed to describe the epidemic. To explore clustering, we constructed a dataset comprising GPS coordinates of case-patient residences and randomly sampled residential locations (spatial controls). The scale and significance of clustering was explored with Ripley K functions. Cluster locations were determined by a random labelling technique and confirmed using Kulldorff's spatial scan statistic. Principal Findings We analysed data from 2570 confirmed and suspected case-patients, and found significant spatiotemporal clustering of typhoid fever in two non-overlapping areas, which appeared to be linked to environmental sources. Peak relative risk was more than six times greater than in areas lying outside the cluster ranges. Clusters were identified in similar geographical ranges by both random labelling and Kulldorff's spatial scan statistic. The spatial scale at which typhoid fever clustered was highly localised, with significant clustering at distances up to 4.5 km and peak levels at approximately 3.5 km. The epicentre of infection transmission shifted from one cluster to the other during the course of the epidemic. Conclusions This study demonstrated highly localised clustering of typhoid fever during an epidemic in an urban African setting, and highlights the importance of spatiotemporal analysis for making timely decisions about targeting prevention and control activities and reinforcing treatment during epidemics. This approach should be integrated into existing surveillance systems to facilitate early detection of epidemics and identify their spatial range. PMID:25486292
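The random-labelling idea used in this study can be illustrated with a toy Monte Carlo test: compare Ripley's K for cases against spatial controls and judge the observed difference against permutations of the case/control labels. The sketch below uses synthetic coordinates, a naive O(n^2) K estimator and no edge correction, so it is a schematic of the approach rather than the authors' analysis.

    # Toy random-labelling test of excess case clustering via Ripley's K.
    import numpy as np

    rng = np.random.default_rng(2)

    def ripley_k(points, r, area):
        d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
        n = len(points)
        pair_d = d[np.triu_indices(n, 1)]                       # each pair once
        counts = np.array([(pair_d <= ri).sum() for ri in r])
        return 2 * area * counts / (n * (n - 1))

    area = 1.0
    cases = rng.normal(0.5, 0.05, size=(150, 2))                # artificially clustered "cases"
    controls = rng.uniform(0, 1, size=(150, 2))                 # random "controls"
    r = np.linspace(0.01, 0.2, 10)

    observed = ripley_k(cases, r, area) - ripley_k(controls, r, area)

    pooled = np.vstack([cases, controls])
    null = []
    for _ in range(199):                                        # random labelling permutations
        idx = rng.permutation(len(pooled))
        null.append(ripley_k(pooled[idx[:150]], r, area)
                    - ripley_k(pooled[idx[150:]], r, area))
    upper = np.percentile(null, 97.5, axis=0)
    print(observed > upper)   # True at scales with significant excess case clustering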
Multivariate Statistical Analysis of MSL APXS Bulk Geochemical Data
NASA Astrophysics Data System (ADS)
Hamilton, V. E.; Edwards, C. S.; Thompson, L. M.; Schmidt, M. E.
2014-12-01
We apply cluster and factor analyses to bulk chemical data of 130 soil and rock samples measured by the Alpha Particle X-ray Spectrometer (APXS) on the Mars Science Laboratory (MSL) rover Curiosity through sol 650. Multivariate approaches such as principal components analysis (PCA), cluster analysis, and factor analysis complement more traditional approaches (e.g., Harker diagrams), with the advantage of simultaneously examining the relationships between multiple variables for large numbers of samples. Principal components analysis has been applied with success to APXS, Pancam, and Mössbauer data from the Mars Exploration Rovers. Factor analysis and cluster analysis have been applied with success to thermal infrared (TIR) spectral data of Mars. Cluster analyses group the input data by similarity, where there are a number of different methods for defining similarity (hierarchical, density, distribution, etc.). For example, without any assumptions about the chemical contributions of surface dust, preliminary hierarchical and K-means cluster analyses clearly distinguish the physically adjacent rock targets Windjana and Stephen as being distinctly different from lithologies observed prior to Curiosity's arrival at The Kimberley. In addition, they are separated from each other, consistent with chemical trends observed in variation diagrams but without requiring assumptions about chemical relationships. We will discuss the variation in cluster analysis results as a function of clustering method and pre-processing (e.g., log transformation, correction for dust cover) and implications for interpreting chemical data. Factor analysis shares some similarities with PCA, and examines the variability among observed components of a dataset so as to reveal variations attributable to unobserved components. Factor analysis has been used to extract the TIR spectra of components that are typically observed in mixtures and only rarely in isolation; there is the potential for similar results with data from APXS. These techniques offer new ways to understand the chemical relationships between the materials interrogated by Curiosity, and potentially their relation to materials observed by APXS instruments on other landed missions.
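As a generic illustration of the kind of workflow described (not the authors' actual processing), the sketch below standardises log-transformed compositions and compares hierarchical (Ward) and K-means groupings; the data, number of clusters and pre-processing choices are placeholders.

    # Hierarchical vs K-means clustering of stand-in bulk-chemistry data.
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(3)
    X = rng.lognormal(mean=0.0, sigma=0.3, size=(130, 9))   # 130 samples x 9 "oxides"

    Z = StandardScaler().fit_transform(np.log(X))           # log-transform, then standardise
    tree = linkage(Z, method="ward")                        # hierarchical clustering
    hier_labels = fcluster(tree, t=4, criterion="maxclust") # cut the tree into 4 groups
    km_labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(Z)

    # Cross-tabulate the two groupings to check how consistent they are
    for h in range(1, 5):
        print([int(np.sum((hier_labels == h) & (km_labels == k))) for k in range(4)])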
Thermodynamics and Kinetics of Prenucleation Clusters, Classical and Non-Classical Nucleation
Zahn, Dirk
2015-01-01
Recent observations of prenucleation species and multi-stage crystal nucleation processes challenge the long-established view on the thermodynamics of crystal formation. Here, we review and generalize extensions to classical nucleation theory. Going beyond the conventional implementation that has been used for more than a century, nucleation inhibitors, precursor clusters and non-classical nucleation processes are rationalized as well by analogous concepts based on competing interface and bulk energy terms. This is illustrated by recent examples of species formed prior to/instead of crystal nucleation and of multi-step nucleation processes. Many of the insights discussed were obtained from molecular simulation using advanced sampling techniques, briefly summarized herein for both nucleation-controlled and diffusion-controlled aggregate formation. PMID:25914369
Yücel, Yasin; Sultanoğlu, Pınar
2013-09-01
Chemical characterisation has been carried out on 45 honey samples collected from the Hatay region of Turkey. The concentrations of 17 elements were determined by inductively coupled plasma optical emission spectrometry (ICP-OES). Ca, K, Mg and Na were the most abundant elements, with mean contents of 219.38, 446.93, 49.06 and 95.91 mg kg(-1), respectively. The trace element mean contents ranged between 0.03 and 15.07 mg kg(-1). Chemometric methods such as principal component analysis (PCA) and cluster analysis (CA) techniques were applied to classify honey according to mineral content. The first and most important principal component (PC) was strongly associated with the values of Al, B, Cd and Co. CA showed eight clusters corresponding to the eight botanical origins of the honey. PCA explained 75.69% of the variance with the first six PC variables. Chemometric analysis of the analytical data allowed the accurate classification of the honey samples according to origin. Copyright © 2013 Elsevier Ltd. All rights reserved.
Identification of novel Theileria genotypes from Grant's gazelle
Hooge, Janis; Howe, Laryssa; Ezenwa, Vanessa O.
2015-01-01
Blood samples collected from Grant's gazelles (Nanger granti) in Kenya were screened for hemoparasites using a combination of microscopic and molecular techniques. All 69 blood smears examined by microscopy were positive for hemoparasites. In addition, Theileria/Babesia DNA was detected in all 65 samples screened by PCR for a ~450-base pair fragment of the V4 hypervariable region of the 18S rRNA gene. Sequencing and BLAST analysis of a subset of PCR amplicons revealed widespread co-infection (25/39) and the existence of two distinct Grant's gazelle Theileria subgroups. One group of 11 isolates clustered as a subgroup with previously identified Theileria ovis isolates from small ruminants from Europe, Asia and Africa; another group of 3 isolates clustered with previously identified Theileria spp. isolates from other African antelope. Based on extensive levels of sequence divergence (1.2–2%) from previously reported Theileria species within Kenya and worldwide, the Theileria isolates detected in Grant's gazelles appear to represent at least two novel Theileria genotypes. PMID:25973394
Identification of novel Theileria genotypes from Grant's gazelle.
Hooge, Janis; Howe, Laryssa; Ezenwa, Vanessa O
2015-08-01
Blood samples collected from Grant's gazelles (Nanger granti) in Kenya were screened for hemoparasites using a combination of microscopic and molecular techniques. All 69 blood smears examined by microscopy were positive for hemoparasites. In addition, Theileria/Babesia DNA was detected in all 65 samples screened by PCR for a ~450-base pair fragment of the V4 hypervariable region of the 18S rRNA gene. Sequencing and BLAST analysis of a subset of PCR amplicons revealed widespread co-infection (25/39) and the existence of two distinct Grant's gazelle Theileria subgroups. One group of 11 isolates clustered as a subgroup with previously identified Theileria ovis isolates from small ruminants from Europe, Asia and Africa; another group of 3 isolates clustered with previously identified Theileria spp. isolates from other African antelope. Based on extensive levels of sequence divergence (1.2-2%) from previously reported Theileria species within Kenya and worldwide, the Theileria isolates detected in Grant's gazelles appear to represent at least two novel Theileria genotypes.
A HST/WFC3 Search for Substellar Companions in the Orion Nebula Cluster
NASA Astrophysics Data System (ADS)
Strampelli, Giovanni Maria; Aguilar, Jonathan; Aparicio, Antonio; Piotto, Giampaolo; Pueyo, Laurent; Robberto, Massimo
2018-01-01
We present new results relative to the population of substellar binaries in the Orion Nebula Cluster. We reprocessed HST/WFC3 data using an analysis technique developed to detect close companions in the wings of the stellar PSFs, based on the PyKLIP implementation of the KLIP PSF subtraction algorithm. Starting from a sample of ~1200 stars selected over the range J=11-15 mag, we were able to uncover ~80 candidate companions in the magnitude range J=16-23 mag. We use the presence of the 1.4 micron H2O absorption feature in the companion photosphere to discriminate 32 bona-fide substellar candidates from a population of reddened background objects. We derive an estimate of the companion mass assuming a 2 Myr isochrone and the reddening of their primary. With 8 stellar companions, 19 brown dwarfs and 5 planetary mass objects, our study provides an unbiased sample of companions at the low-mass end of the IMF, probing the transition from binary to planetary systems.
Jeemon, Panniyammakal; Narayanan, Gitanjali; Kondal, Dimple; Kahol, Kashvi; Bharadwaj, Ashok; Purty, Anil; Negi, Prakash; Ladhani, Sulaiman; Sanghvi, Jyoti; Singh, Kuldeep; Kapoor, Deksha; Sobti, Nidhi; Lall, Dorothy; Manimunda, Sathyaprakash; Dwivedi, Supriya; Toteja, Gurudyal; Prabhakaran, Dorairaj
2016-03-15
Effective task-shifting interventions targeted at reducing the global cardiovascular disease (CVD) epidemic in low and middle-income countries (LMICs) are urgently needed. DISHA is a cluster randomised controlled trial conducted across 10 sites (5 in phase 1 and 5 in phase 2) in India, comprising 120 clusters. At each site, 12 clusters were randomly selected from a district. A cluster is defined as a small village with 250-300 households and well-defined geographical boundaries. They were then randomly allocated to intervention and control clusters in a 1:1 allocation sequence. If any of the intervention and control clusters were <10 km apart, one was dropped and replaced with another randomly selected cluster from the same district. The study included a representative baseline cross-sectional survey, development of a structured intervention model, delivery of the intervention for a minimum period of 18 months by trained frontline health workers (mainly Anganwadi workers and ASHA workers) and a post-intervention survey in a representative sample. The study staff had no information on intervention allocation until the completion of the baseline survey. In order to ensure comparability of data across sites, the DISHA study follows a common protocol and manual of operation with standardized measurement techniques. Our study is the largest community-based cluster randomised trial in low and middle-income country settings designed to test the effectiveness of 'task shifting' interventions involving frontline health workers for cardiovascular risk reduction. CTRI/2013/10/004049. Registered 7 October 2013.
DOE Office of Scientific and Technical Information (OSTI.GOV)
The plpdfa software is a product of an LDRD project at LLNL entitled "Adaptive Sampling for Very High Throughput Data Streams" (tracking number 11-ERD-035). This software was developed by a graduate student summer intern, Chris Challis, who worked under project PI Dan Merl during the summer of 2011. The source code implements a statistical analysis technique for clustering and classification of text-valued data. The method had been previously published by the PI in the open literature.
ERIC Educational Resources Information Center
Sawangsamutchai, Yutthasak; Rattanavich, Saowalak
2016-01-01
The objective of this research is to compare the English reading comprehension and motivation to read of seventh grade Thai students taught with applied instruction through the genre-based approach and teachers' manual. A randomized pre-test post-test control group design was used through the cluster random sampling technique. The data were…
Unequal cluster sizes in stepped-wedge cluster randomised trials: a systematic review
Morris, Tom; Gray, Laura
2017-01-01
Objectives To investigate the extent to which cluster sizes vary in stepped-wedge cluster randomised trials (SW-CRT) and whether any variability is accounted for during the sample size calculation and analysis of these trials. Setting Any, not limited to healthcare settings. Participants Any taking part in an SW-CRT published up to March 2016. Primary and secondary outcome measures The primary outcome is the variability in cluster sizes, measured by the coefficient of variation (CV) in cluster size. Secondary outcomes include the difference between the cluster sizes assumed during the sample size calculation and those observed during the trial, any reported variability in cluster sizes and whether the methods of sample size calculation and methods of analysis accounted for any variability in cluster sizes. Results Of the 101 included SW-CRTs, 48% mentioned that the included clusters were known to vary in size, yet only 13% of these accounted for this during the calculation of the sample size. However, 69% of the trials did use a method of analysis appropriate for when clusters vary in size. Full trial reports were available for 53 trials. The CV was calculated for 23 of these: the median CV was 0.41 (IQR: 0.22–0.52). Actual cluster sizes could be compared with those assumed during the sample size calculation for 14 (26%) of the trial reports; the cluster sizes were between 29% and 480% of that which had been assumed. Conclusions Cluster sizes often vary in SW-CRTs. Reporting of SW-CRTs also remains suboptimal. The effect of unequal cluster sizes on the statistical power of SW-CRTs needs further exploration and methods appropriate to studies with unequal cluster sizes need to be employed. PMID:29146637
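To make the role of the CV concrete, the sketch below computes it from hypothetical cluster sizes and plugs it into the widely used unequal-cluster-size design-effect approximation for parallel cluster trials, DEFF ≈ 1 + ((CV² + 1)·m̄ − 1)·ICC (Eldridge and colleagues). Stepped-wedge trials need design-specific methods, as the review argues, so this is only an illustration of how size variability inflates the required sample; all numbers are assumed.

    # CV of cluster sizes and the parallel-trial unequal-size design effect (sketch).
    import numpy as np

    sizes = np.array([12, 30, 45, 18, 60, 25, 90, 20])   # hypothetical cluster sizes
    m_bar = sizes.mean()
    cv = sizes.std(ddof=1) / m_bar                        # coefficient of variation
    icc = 0.05                                            # assumed intracluster correlation

    deff_equal = 1 + (m_bar - 1) * icc                    # equal-size design effect
    deff_unequal = 1 + ((cv**2 + 1) * m_bar - 1) * icc    # CV-adjusted approximation
    print(round(cv, 2), round(deff_equal, 2), round(deff_unequal, 2))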
McDonald, Gene D; Storrie-Lombardi, Michael C
2006-02-01
The relative abundance of the protein amino acids has been previously investigated as a potential marker for biogenicity in meteoritic samples. However, these investigations were executed without a quantitative metric to evaluate distribution variations, and they did not account for the possibility of interdisciplinary systematic error arising from inter-laboratory differences in extraction and detection techniques. Principal component analysis (PCA), hierarchical cluster analysis (HCA), and stochastic probabilistic artificial neural networks (ANNs) were used to compare the distributions for nine protein amino acids previously reported for the Murchison carbonaceous chondrite, Mars meteorites (ALH84001, Nakhla, and EETA79001), prebiotic synthesis experiments, and terrestrial biota and sediments. These techniques allowed us (1) to identify a shift in terrestrial amino acid distributions secondary to diagenesis; (2) to detect differences in terrestrial distributions that may be systematic differences between extraction and analysis techniques in biological and geological laboratories; and (3) to determine that distributions in meteoritic samples appear more similar to prebiotic chemistry samples than they do to the terrestrial unaltered or diagenetic samples. Both diagenesis and putative interdisciplinary differences in analysis complicate interpretation of meteoritic amino acid distributions. We propose that the analysis of future samples from such diverse sources as meteoritic influx, sample return missions, and in situ exploration of Mars would be less ambiguous with adoption of standardized assay techniques, systematic inclusion of assay standards, and the use of a quantitative, probabilistic metric. We present here one such metric determined by sequential feature extraction and normalization (PCA), information-driven automated exploration of classification possibilities (HCA), and prediction of classification accuracy (ANNs).
Bayesian Nonparametric Ordination for the Analysis of Microbial Communities.
Ren, Boyu; Bacallado, Sergio; Favaro, Stefano; Holmes, Susan; Trippa, Lorenzo
2017-01-01
Human microbiome studies use sequencing technologies to measure the abundance of bacterial species or Operational Taxonomic Units (OTUs) in samples of biological material. Typically the data are organized in contingency tables with OTU counts across heterogeneous biological samples. In the microbial ecology community, ordination methods are frequently used to investigate latent factors or clusters that capture and describe variations of OTU counts across biological samples. It remains important to evaluate how uncertainty in estimates of each biological sample's microbial distribution propagates to ordination analyses, including visualization of clusters and projections of biological samples on low dimensional spaces. We propose a Bayesian analysis for dependent distributions to endow frequently used ordinations with estimates of uncertainty. A Bayesian nonparametric prior for dependent normalized random measures is constructed, which is marginally equivalent to the normalized generalized Gamma process, a well-known prior for nonparametric analyses. In our prior, the dependence and similarity between microbial distributions is represented by latent factors that concentrate in a low dimensional space. We use a shrinkage prior to tune the dimensionality of the latent factors. The resulting posterior samples of model parameters can be used to evaluate uncertainty in analyses routinely applied in microbiome studies. Specifically, by combining them with multivariate data analysis techniques we can visualize credible regions in ecological ordination plots. The characteristics of the proposed model are illustrated through a simulation study and applications in two microbiome datasets.
2013-01-01
Background Traditional Lot Quality Assurance Sampling (LQAS) designs assume observations are collected using simple random sampling. Alternatively, randomly sampling clusters of observations and then individuals within clusters reduces costs but decreases the precision of the classifications. In this paper, we develop a general framework for designing the cluster(C)-LQAS system and illustrate the method with the design of data quality assessments for the community health worker program in Rwanda. Results To determine sample size and decision rules for C-LQAS, we use the beta-binomial distribution to account for inflated risk of errors introduced by sampling clusters at the first stage. We present general theory and code for sample size calculations. The C-LQAS sample sizes provided in this paper constrain misclassification risks below user-specified limits. Multiple C-LQAS systems meet the specified risk requirements, but numerous considerations, including per-cluster versus per-individual sampling costs, help identify optimal systems for distinct applications. Conclusions We show the utility of C-LQAS for data quality assessments, but the method generalizes to numerous applications. This paper provides the necessary technical detail and supplemental code to support the design of C-LQAS for specific programs. PMID:24160725
Hedt-Gauthier, Bethany L; Mitsunaga, Tisha; Hund, Lauren; Olives, Casey; Pagano, Marcello
2013-10-26
Traditional Lot Quality Assurance Sampling (LQAS) designs assume observations are collected using simple random sampling. Alternatively, randomly sampling clusters of observations and then individuals within clusters reduces costs but decreases the precision of the classifications. In this paper, we develop a general framework for designing the cluster(C)-LQAS system and illustrate the method with the design of data quality assessments for the community health worker program in Rwanda. To determine sample size and decision rules for C-LQAS, we use the beta-binomial distribution to account for inflated risk of errors introduced by sampling clusters at the first stage. We present general theory and code for sample size calculations.The C-LQAS sample sizes provided in this paper constrain misclassification risks below user-specified limits. Multiple C-LQAS systems meet the specified risk requirements, but numerous considerations, including per-cluster versus per-individual sampling costs, help identify optimal systems for distinct applications. We show the utility of C-LQAS for data quality assessments, but the method generalizes to numerous applications. This paper provides the necessary technical detail and supplemental code to support the design of C-LQAS for specific programs.
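A hedged sketch of the kind of calculation the beta-binomial model supports (not the authors' published code): build the exact distribution of the total count across identical clusters by convolving the per-cluster beta-binomial pmf, then read the two misclassification risks off a candidate decision rule. The system size, thresholds, ICC and rule below are assumptions for the illustration.

    # Misclassification risks of a candidate C-LQAS system under a beta-binomial model.
    import numpy as np
    from scipy.stats import betabinom

    def total_pmf(p, rho, clusters, m):
        a, b = p * (1 - rho) / rho, (1 - p) * (1 - rho) / rho
        cluster_pmf = betabinom.pmf(np.arange(m + 1), m, a, b)
        pmf = np.array([1.0])
        for _ in range(clusters):
            pmf = np.convolve(pmf, cluster_pmf)      # exact pmf of the running total
        return pmf

    clusters, m, d = 33, 6, 150                      # e.g. a 33x6 system with assumed rule d
    pmf_low = total_pmf(0.70, 0.10, clusters, m)     # "unacceptable" level, e.g. 70%
    pmf_high = total_pmf(0.85, 0.10, clusters, m)    # "acceptable" level, e.g. 85%
    alpha = pmf_low[d:].sum()                        # P(classify high | truly low)
    beta = pmf_high[:d].sum()                        # P(classify low  | truly high)
    print(round(alpha, 3), round(beta, 3))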
NASA Astrophysics Data System (ADS)
Kazin, Eyal A.; Sánchez, Ariel G.; Cuesta, Antonio J.; Beutler, Florian; Chuang, Chia-Hsun; Eisenstein, Daniel J.; Manera, Marc; Padmanabhan, Nikhil; Percival, Will J.; Prada, Francisco; Ross, Ashley J.; Seo, Hee-Jong; Tinker, Jeremy; Tojeiro, Rita; Xu, Xiaoying; Brinkmann, J.; Joel, Brownstein; Nichol, Robert C.; Schlegel, David J.; Schneider, Donald P.; Thomas, Daniel
2013-10-01
We analyse the 2D correlation function of the Sloan Digital Sky Survey-III Baryon Oscillation Spectroscopic Survey (BOSS) CMASS sample of massive galaxies of the ninth data release to measure cosmic expansion H and the angular diameter distance DA at a mean redshift of
The applicability and effectiveness of cluster analysis
NASA Technical Reports Server (NTRS)
Ingram, D. S.; Actkinson, A. L.
1973-01-01
An insight into the characteristics which determine the performance of a clustering algorithm is presented. In order for the techniques examined to cluster data accurately, two conditions must be simultaneously satisfied. First, the data must have a particular structure, and second, the parameters chosen for the clustering algorithm must be correct. By examining the structure of the data from the Cl flight line, it is clear that no single set of parameters can be used to accurately cluster all the different crops. The effectiveness of either a noniterative or iterative clustering algorithm to accurately cluster data representative of the Cl flight line is questionable. Thus extensive a priori knowledge is required in order to use cluster analysis in its present form for applications like assisting in the definition of field boundaries and evaluating the homogeneity of a field. New or modified techniques are necessary for clustering to be a reliable tool.
Chemodynamical Clustering Applied to APOGEE Data: Rediscovering Globular Clusters
NASA Astrophysics Data System (ADS)
Chen, Boquan; D’Onghia, Elena; Pardy, Stephen A.; Pasquali, Anna; Bertelli Motta, Clio; Hanlon, Bret; Grebel, Eva K.
2018-06-01
We have developed a novel technique based on a clustering algorithm that searches for kinematically and chemically clustered stars in the APOGEE DR12 Cannon data. As compared to classical chemical tagging, the kinematic information included in our methodology allows us to identify stars that are members of known globular clusters with greater confidence. We apply our algorithm to the entire APOGEE catalog of 150,615 stars whose chemical abundances are derived by the Cannon. Our methodology found anticorrelations between the elements Al and Mg, Na and O, and C and N previously identified in the optical spectra in globular clusters, even though we omit these elements in our algorithm. Our algorithm identifies globular clusters without a priori knowledge of their locations in the sky. Thus, not only does this technique promise to discover new globular clusters, but it also allows us to identify candidate streams of kinematically and chemically clustered stars in the Milky Way.
Dimensional assessment of personality pathology in patients with eating disorders.
Goldner, E M; Srikameswaran, S; Schroeder, M L; Livesley, W J; Birmingham, C L
1999-02-22
This study examined patients with eating disorders on personality pathology using a dimensional method. Female subjects who met DSM-IV diagnostic criteria for eating disorder (n = 136) were evaluated and compared to an age-controlled general population sample (n = 68). We assessed 18 features of personality disorder with the Dimensional Assessment of Personality Pathology - Basic Questionnaire (DAPP-BQ). Factor analysis and cluster analysis were used to derive three clusters of patients. A five-factor solution was obtained with limited intercorrelation between factors. Cluster analysis produced three clusters with the following characteristics: Cluster 1 members (constituting 49.3% of the sample and labelled 'rigid') had higher mean scores on factors denoting compulsivity and interpersonal difficulties; Cluster 2 (18.4% of the sample) showed highest scores in factors denoting psychopathy, neuroticism and impulsive features, and appeared to constitute a borderline psychopathology group; Cluster 3 (32.4% of the sample) was characterized by few differences in personality pathology in comparison to the normal population sample. Cluster membership was associated with DSM-IV diagnosis -- a large proportion of patients with anorexia nervosa were members of Cluster 1. An empirical classification of eating-disordered patients derived from dimensional assessment of personality pathology identified three groups with clinical relevance.
ERIC Educational Resources Information Center
Firdausiah Mansur, Andi Besse; Yusof, Norazah
2013-01-01
Clustering on Social Learning Networks is still not widely explored, especially when the network focuses on an e-learning system. Conventional methods are not really suitable for e-learning data. SNA requires content analysis, which involves human intervention and needs to be carried out manually. Some of the previous clustering techniques need…
Tang, Haijing; Wang, Siye; Zhang, Yanjun
2013-01-01
Clustering has become a common trend in very long instruction word (VLIW) architectures to solve the problems of area, energy consumption, and design complexity. Register-file-connected clustered (RFCC) VLIW architecture uses the mechanism of a global register file to accomplish inter-cluster data communication, thus eliminating the performance and energy consumption penalty caused by explicit inter-cluster data move operations in traditional bus-connected clustered (BCC) VLIW architecture. However, the limited number of access ports to the global register file has become an issue which must be well addressed; otherwise the performance and energy consumption would be harmed. In this paper, we present compiler optimization techniques for an RFCC VLIW architecture called Lily, which is designed for encryption systems. These techniques aim at optimizing performance and energy consumption for the Lily architecture through appropriate manipulation of the code generation process to maintain better management of the accesses to the global register file. All the techniques have been implemented and evaluated. The results show that our techniques can significantly reduce the penalty of performance and energy consumption due to the access port limitation of the global register file. PMID:23970841
NASA Astrophysics Data System (ADS)
Wheeler, K. I.; Levia, D. F., Jr.; Hudson, J. E.
2017-12-01
As trees undergo autumnal processes such as resorption, senescence, and leaf abscission, the dissolved organic matter (DOM) contribution of leaf litter leachate to streams changes. However, little research has investigated how the fluorescent DOM (FDOM) changes throughout the autumn and how this differs inter- and intraspecifically. Two of the major impacts of global climate change on forested ecosystems are altered phenology and the restructuring of forest community species and subspecies composition. We examined changes in FDOM in leachate from American beech (Fagus grandifolia Ehrh.) leaves in Maryland, Rhode Island, Vermont, and North Carolina and yellow poplar (Liriodendron tulipifera L.) leaves from Maryland throughout three different phenophases: green, senescing, and freshly abscised. Beech leaves from Maryland and Rhode Island have previously been identified as belonging to one distinct genetic cluster, and beech trees from Vermont and the North Carolina study site to the other. FDOM in samples was characterized using excitation-emission matrices (EEMs), and a six-component parallel factor analysis (PARAFAC) model was created to identify components. Self-organizing maps (SOMs) were used to visualize variation and patterns in the PARAFAC component proportions of the leachate samples. Phenophase and species had the greatest influence on determining where a sample mapped on the SOM when compared to genetic clusters and geographic origin. Throughout senescence, FDOM from all the trees transitioned from more protein-like components to more humic-like ones. Percent greenness of the sampled leaves and the proportion of the tyrosine-like component 1 were found to differ significantly between the two genetic beech clusters. This suggests possible differences in photosynthesis and resorption between the two genetic clusters of beech. The use of SOMs to visualize differences in patterns of senescence between the different species and genetic populations proved to be useful in ways that other multivariate analysis techniques are not.
The Origin of B-type Runaway Stars: Non-LTE Abundances as a Diagnostic
DOE Office of Scientific and Technical Information (OSTI.GOV)
McEvoy, Catherine M.; Dufton, Philip L.; Smoker, Jonathan V.
There are two accepted mechanisms to explain the origin of runaway OB-type stars: the binary supernova (SN) scenario and the cluster ejection scenario. In the former, an SN explosion within a close binary ejects the secondary star, while in the latter close multibody interactions in a dense cluster cause one or more of the stars to be ejected from the region at high velocity. Both mechanisms have the potential to affect the surface composition of the runaway star. tlusty non-LTE model atmosphere calculations have been used to determine the atmospheric parameters and the C, N, Mg, and Si abundances for a sample of B-type runaways. These same analytical tools were used by Hunter et al. for their analysis of 50 B-type open-cluster Galactic stars (i.e., nonrunaways). Effective temperatures were deduced using the Si-ionization balance technique, surface gravities from Balmer line profiles, and microturbulent velocities derived using the Si spectrum. The runaways show no obvious abundance anomalies when compared with stars in the open clusters. The runaways do show a spread in composition that almost certainly reflects the Galactic abundance gradient and a range in the birthplaces of the runaways in the Galactic disk. Since the observed Galactic abundance gradients of C, N, Mg, and Si are of a similar magnitude, the abundance ratios (e.g., N/Mg) obtained are essentially uniform across the sample.
NASA Astrophysics Data System (ADS)
Fletcher, John S.; Henderson, Alexander; Jarvis, Roger M.; Lockyer, Nicholas P.; Vickerman, John C.; Goodacre, Royston
2006-07-01
Advances in time of flight secondary ion mass spectrometry (ToF-SIMS) have enabled this technique to become a powerful tool for the analysis of biological samples. Such samples are often very complex and as a result full interpretation of the acquired data can be extremely difficult. To simplify the interpretation of these information rich data, the use of chemometric techniques is becoming widespread in the ToF-SIMS community. Here we discuss the application of principal components-discriminant function analysis (PC-DFA) to the separation and classification of a number of bacterial samples that are known to be major causal agents of urinary tract infection. A large data set has been generated using three biological replicates of each isolate and three machine replicates were acquired from each biological replicate. Ordination plots generated using the PC-DFA are presented demonstrating strain level discrimination of the bacteria. The results are discussed in terms of biological differences between certain species and with reference to FT-IR, Raman spectroscopy and pyrolysis mass spectrometric studies of similar samples.
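PC-DFA in this chemometric sense is usually principal components analysis for dimension reduction followed by a discriminant function (canonical variates) analysis on the retained scores. The sketch below shows that two-step structure on synthetic "spectra"; it is a generic illustration, not the authors' processing chain, and all data and settings are invented.

    # Generic PC-DFA sketch: PCA scores fed into a linear discriminant analysis.
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    rng = np.random.default_rng(4)
    classes, n_per_class, n_channels = 4, 9, 500
    means = rng.normal(0, 1, size=(classes, n_channels))     # one mean "spectrum" per class
    X = np.vstack([means[c] + rng.normal(0, 0.5, size=(n_per_class, n_channels))
                   for c in range(classes)])
    y = np.repeat(np.arange(classes), n_per_class)

    scores = PCA(n_components=10).fit_transform(X)           # retain the first 10 PCs
    dfa = LinearDiscriminantAnalysis(n_components=2).fit(scores, y)
    ordination = dfa.transform(scores)                       # coordinates for a DFA plot
    print(dfa.score(scores, y), ordination.shape)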
Hou, Deyi; O'Connor, David; Nathanail, Paul; Tian, Li; Ma, Yan
2017-12-01
Heavy metal soil contamination is associated with potential toxicity to humans or ecotoxicity. Scholars have increasingly used a combination of geographical information science (GIS) with geostatistical and multivariate statistical analysis techniques to examine the spatial distribution of heavy metals in soils at a regional scale. A review of such studies showed that most soil sampling programs were based on grid patterns and composite sampling methodologies. Many programs intended to characterize various soil types and land use types. The most often used sampling depth intervals were 0-0.10 m or 0-0.20 m below the surface, and the sampling densities used ranged from 0.0004 to 6.1 samples per km², with a median of 0.4 samples per km². The most widely used spatial interpolators were inverse distance weighted interpolation and ordinary kriging, and the most often used multivariate statistical analysis techniques were principal component analysis and cluster analysis. The review also identified several determining and correlating factors in heavy metal distribution in soils, including soil type, soil pH, soil organic matter, land use type, Fe, Al, and heavy metal concentrations. The major natural and anthropogenic sources of heavy metals were found to derive from lithogenic origin, roadway and transportation, atmospheric deposition, wastewater and runoff from industrial and mining facilities, fertilizer application, livestock manure, and sewage sludge. This review argues that the full potential of integrated GIS and multivariate statistical analysis for assessing heavy metal distribution in soils on a regional scale has not yet been fully realized. It is proposed that future research be conducted to map multivariate results in GIS to pinpoint specific anthropogenic sources, to analyze temporal trends in addition to spatial patterns, to optimize modeling parameters, and to expand the use of different multivariate analysis tools beyond principal component analysis (PCA) and cluster analysis (CA). Copyright © 2017 Elsevier Ltd. All rights reserved.
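One of the two interpolators the review found most widely used, inverse distance weighting, is simple enough to sketch directly; the sample locations, concentrations and grid below are invented for the illustration.

    # Bare-bones inverse distance weighted (IDW) interpolation onto a grid.
    import numpy as np

    rng = np.random.default_rng(5)
    xy = rng.uniform(0, 10, size=(40, 2))          # sample coordinates (km)
    z = rng.lognormal(3, 0.5, size=40)             # e.g. metal concentration (mg/kg)

    def idw(xy, z, grid_xy, power=2.0, eps=1e-12):
        d = np.linalg.norm(grid_xy[:, None, :] - xy[None, :, :], axis=-1)
        w = 1.0 / (d**power + eps)                 # inverse-distance weights
        return (w * z).sum(axis=1) / w.sum(axis=1)

    gx, gy = np.meshgrid(np.linspace(0, 10, 50), np.linspace(0, 10, 50))
    grid = np.column_stack([gx.ravel(), gy.ravel()])
    surface = idw(xy, z, grid).reshape(gx.shape)
    print(surface.min(), surface.max())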
Hamchevici, Carmen; Udrea, Ion
2013-11-01
The concept of basin-wide Joint Danube Survey (JDS) was launched by the International Commission for the Protection of the Danube River (ICPDR) as a tool for investigative monitoring under the Water Framework Directive (WFD), with a frequency of 6 years. The first JDS was carried out in 2001 and its success in providing key information for characterisation of the Danube River Basin District as required by WFD led to the organisation of the second JDS in 2007, which was the world's biggest river research expedition in that year. The present paper presents an approach for improving the survey strategy for the next planned survey JDS3 (2013) by means of several multivariate statistical techniques. In order to design the optimum structure in terms of parameters and sampling sites, principal component analysis (PCA), factor analysis (FA) and cluster analysis were applied to JDS2 data for 13 selected physico-chemical and one biological element measured at 78 sampling sites located on the main course of the Danube. Results from PCA/FA showed that most of the dataset variance (above 75%) was explained by five varifactors loaded with 8 out of 14 variables: physical (transparency and total suspended solids), relevant nutrients (N-nitrates and P-orthophosphates), feedback effects of primary production (pH, alkalinity and dissolved oxygen) and algal biomass. Taking into account the representation of the factor scores given by FA versus sampling sites and the major groups generated by the clustering procedure, the spatial network of the next survey could be carefully tailored, leading to a decrease in the number of sampling sites of more than 30%. The target-oriented sampling strategy based on the selected multivariate statistics can provide a strong reduction in the dimensionality of the original data, and in the corresponding costs as well, without any loss of information.
NASA Technical Reports Server (NTRS)
Menanteau, Felipe; Gonzalez, Jorge; Juin, Jean-Baptiste; Marriage, Tobias; Reese, Erik D.; Acquaviva, Viviana; Aguirre, Paula; Appel, John Willam; Baker, Andrew J.; Barrientos, L. Felipe;
2010-01-01
We present optical and X-ray properties for the first confirmed galaxy cluster sample selected by the Sunyaev-Zel'dovich Effect from 148 GHz maps over 455 square degrees of sky made with the Atacama Cosmology Telescope. These maps, coupled with multi-band imaging on 4-meter-class optical telescopes, have yielded a sample of 23 galaxy clusters with redshifts between 0.118 and 1.066. Of these 23 clusters, 10 are newly discovered. The selection of this sample is approximately mass limited and essentially independent of redshift. We provide optical positions, images, redshifts and X-ray fluxes and luminosities for the full sample, and X-ray temperatures of an important subset. The mass limit of the full sample is around 8.0 × 10¹⁴ M⊙, with a number distribution that peaks around a redshift of 0.4. For the 10 highest significance SZE-selected cluster candidates, all of which are optically confirmed, the mass threshold is 1 × 10¹⁵ M⊙ and the redshift range is 0.167 to 1.066. Archival observations from Chandra, XMM-Newton, and ROSAT provide X-ray luminosities and temperatures that are broadly consistent with this mass threshold. Our optical follow-up procedure also allowed us to assess the purity of the ACT cluster sample. Eighty (one hundred) percent of the 148 GHz candidates with signal-to-noise ratios greater than 5.1 (5.7) are confirmed as massive clusters. The reported sample represents one of the largest SZE-selected samples of massive clusters over all redshifts within a cosmologically significant survey volume, which will enable cosmological studies as well as future studies on the evolution, morphology, and stellar populations in the most massive clusters in the Universe.
Focusing cosmic telescopes: systematics of strong lens modeling
NASA Astrophysics Data System (ADS)
Johnson, Traci Lin; Sharon, Keren q.
2018-01-01
The use of strong gravitational lensing by galaxy clusters has become a popular method for studying the high redshift universe. While diverse in computational methods, lens modeling techniques have established means for determining statistical errors on cluster masses and magnifications. However, the systematic errors have yet to be quantified, arising from the number of constraints, availability of spectroscopic redshifts, and various types of image configurations. I will be presenting my dissertation work on quantifying systematic errors in parametric strong lensing techniques. I have participated in the Hubble Frontier Fields lens model comparison project, using simulated clusters to compare the accuracy of various modeling techniques. I have extended this project to understanding how changing the quantity of constraints affects the mass and magnification. I will also present my recent work extending these studies to clusters in the Outer Rim Simulation. These clusters are typical of the clusters found in wide-field surveys, in mass and lensing cross-section. These clusters have fewer constraints than the HFF clusters and thus are more susceptible to systematic errors. With the wealth of strong lensing clusters discovered in surveys such as SDSS, SPT, DES, and in the future, LSST, this work will be influential in guiding the lens modeling efforts and follow-up spectroscopic campaigns.
Comparison of Clustering Techniques for Residential Energy Behavior using Smart Meter Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jin, Ling; Lee, Doris; Sim, Alex
Current practice in whole time series clustering of residential meter data focuses on aggregated or subsampled load data at the customer level, which ignores day-to-day differences within customers. This information is critical to determine each customer’s suitability to various demand side management strategies that support intelligent power grids and smart energy management. Clustering daily load shapes provides fine-grained information on customer attributes and sources of variation for subsequent models and customer segmentation. In this paper, we apply 11 clustering methods to daily residential meter data. We evaluate their parameter settings and suitability based on 6 generic performance metrics and post-checking of resulting clusters. Finally, we recommend suitable techniques and parameters based on the goal of discovering diverse daily load patterns among residential customers. To the authors’ knowledge, this paper is the first robust comparative review of clustering techniques applied to daily residential load shape time series in the power systems’ literature.
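A much-reduced sketch of this kind of comparison (two methods and one generic internal metric rather than the paper's 11 methods and 6 metrics) on synthetic normalised daily load shapes:

    # Compare two clustering methods on synthetic 24-hour load shapes.
    import numpy as np
    from sklearn.cluster import KMeans, AgglomerativeClustering
    from sklearn.metrics import silhouette_score

    rng = np.random.default_rng(6)
    hours = np.arange(24)
    morning = np.exp(-0.5 * ((hours - 8) / 2.0) ** 2)       # morning-peaking profile
    evening = np.exp(-0.5 * ((hours - 19) / 2.0) ** 2)      # evening-peaking profile
    days = np.vstack([morning + 0.1 * rng.normal(size=(300, 24)),
                      evening + 0.1 * rng.normal(size=(300, 24))])
    days = days / days.sum(axis=1, keepdims=True)           # normalise each daily shape

    for name, model in [("kmeans", KMeans(n_clusters=2, n_init=10, random_state=0)),
                        ("agglomerative", AgglomerativeClustering(n_clusters=2))]:
        labels = model.fit_predict(days)
        print(name, round(silhouette_score(days, labels), 3))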
Olives, Casey; Pagano, Marcello; Deitchler, Megan; Hedt, Bethany L; Egge, Kari; Valadez, Joseph J
2009-01-01
Traditional lot quality assurance sampling (LQAS) methods require simple random sampling to guarantee valid results. However, cluster sampling has been proposed to reduce the number of random starting points. This study uses simulations to examine the classification error of two such designs, a 67×3 (67 clusters of three observations) and a 33×6 (33 clusters of six observations) sampling scheme to assess the prevalence of global acute malnutrition (GAM). Further, we explore the use of a 67×3 sequential sampling scheme for LQAS classification of GAM prevalence. Results indicate that, for independent clusters with moderate intracluster correlation for the GAM outcome, the three sampling designs maintain approximate validity for LQAS analysis. Sequential sampling can substantially reduce the average sample size that is required for data collection. The presence of intercluster correlation can impact dramatically the classification error that is associated with LQAS analysis. PMID:20011037
NASA Astrophysics Data System (ADS)
Marinoni, Christian; Davis, Marc; Newman, Jeffrey A.; Coil, Alison L.
2002-11-01
We have developed a new geometrical method for identifying and reconstructing a homogeneous and highly complete set of galaxy groups within flux-limited redshift surveys. Our method combines information from the three-dimensional Voronoi diagram and its dual, the Delaunay triangulation, to obtain group and cluster catalogs that are remarkably robust over wide ranges in redshift and degree of density enhancement. As free by-products, this Voronoi-Delaunay method (VDM) provides a nonparametric measurement of the galaxy density around each object observed and a quantitative measure of the distribution of cosmological voids in the survey volume. In this paper, we describe the VDM algorithm in detail and test its effectiveness using a family of mock catalogs that simulate the Deep Extragalactic Evolutionary Probe (DEEP2) Redshift Survey, which should present at least as much challenge to cluster reconstruction methods as any other near-future survey that is capable of resolving their velocity dispersions. Using these mock DEEP2 catalogs, we demonstrate that the VDM algorithm can be used to identify a homogeneous set of groups in a magnitude-limited sample throughout the survey redshift window 0.7
An algol program for dissimilarity analysis: a divisive-omnithetic clustering technique
Tipper, J.C.
1979-01-01
Clustering techniques are properly used to generate hypotheses about patterns in data. Of the hierarchical techniques, those which are divisive and omnithetic possess many theoretically optimal properties. One such method, dissimilarity analysis, is implemented here in ALGOL 60 and is determined to be computationally competitive with most other methods. © 1979.
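The published program is in ALGOL 60; for readers without an ALGOL compiler, the divisive "splinter group" step that dissimilarity analysis builds on can be sketched in a few lines of Python. This is an illustration of the idea (Macnaughton-Smith-style splitting), not a translation of the published code.

    # Divisive splinter-group split of one cluster given a dissimilarity matrix D.
    import numpy as np

    def splinter_split(D):
        n = D.shape[0]
        main = set(range(n))
        seed = int(np.argmax(D.sum(axis=1)))       # most dissimilar object on average
        splinter = {seed}
        main.remove(seed)
        moved = True
        while moved and len(main) > 1:
            moved = False
            for i in sorted(main):
                others = [D[i, j] for j in main if j != i]
                if not others:
                    break
                if np.mean([D[i, j] for j in splinter]) < np.mean(others):
                    main.remove(i)                 # i is closer to the splinter group
                    splinter.add(i)
                    moved = True
        return sorted(main), sorted(splinter)

    rng = np.random.default_rng(7)
    pts = np.vstack([rng.normal(0, 1, (5, 2)), rng.normal(5, 1, (5, 2))])
    D = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
    print(splinter_split(D))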
High-redshift Luminous Red Galaxies clustering analysis in SDSS Stripe82
NASA Astrophysics Data System (ADS)
Nikoloudakis, N.
2012-01-01
We have measured the clustering of Luminous Red Galaxies in Stripe 82 using the angular correlation function. We have selected 130,000 LRGs via colour cuts in R-I:I-K, with the K band data coming from UKIDSS LAS. We have used the cross-correlation technique of Newman (2008) to establish the redshift distribution of the LRGs as a function of colour cut, cross-correlating the LRGs with SDSS QSOs, DEEP2 and VVDS galaxies. We also used the AUS LRG redshift survey to establish the n(z) at z<1. We then compare the w(theta) results to the results of Sawangwit et al. (2010) from 3 samples of SDSS LRGs at lower redshift to measure the dependence of clustering on redshift and LRG luminosity. We have compared the results for luminosity-matched LRG samples with simple evolutionary models, such as those expected from long-lived, passive models for LRGs and with the HOD models of Wake et al. (2009), and find that the long-lived model may be a poorer fit than at lower redshifts. We find some evidence for evolution in the LRG correlation function slope in that the 2-halo term appears to flatten in slope at z>1. We present arguments that this is not caused by systematics.
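The angular correlation function quoted here is conventionally estimated from data and random catalogues with the Landy-Szalay estimator w(theta) = (DD - 2DR + RR) / RR. The sketch below shows that estimator with naive pair counting on a small flat-sky toy patch; a real analysis such as this one would use a tree-based pair counter and proper sky geometry, and all catalogues here are synthetic.

    # Landy-Szalay estimate of w(theta) on a toy flat-sky patch.
    import numpy as np

    rng = np.random.default_rng(8)
    data = rng.uniform(0, 2, size=(500, 2))        # toy RA, Dec in degrees
    rand = rng.uniform(0, 2, size=(2000, 2))       # random catalogue, same footprint
    bins = np.logspace(-2, 0, 8)                   # angular bins in degrees

    def pair_counts(a, b, bins, auto=False):
        d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
        if auto:
            d = d[np.triu_indices(len(a), 1)]      # each pair once, no self-pairs
        return np.histogram(d.ravel(), bins=bins)[0]

    dd = pair_counts(data, data, bins, auto=True) / (len(data) * (len(data) - 1) / 2)
    rr = pair_counts(rand, rand, bins, auto=True) / (len(rand) * (len(rand) - 1) / 2)
    dr = pair_counts(data, rand, bins) / (len(data) * len(rand))
    print(np.round((dd - 2 * dr + rr) / rr, 3))    # w(theta) in each bin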
Image quality guided approach for adaptive modelling of biometric intra-class variations
NASA Astrophysics Data System (ADS)
Abboud, Ali J.; Jassim, Sabah A.
2010-04-01
The high intra-class variability of acquired biometric data can be attributed to several factors such as the quality of the acquisition sensor (e.g. thermal), environmental conditions (e.g. lighting), and behaviour (e.g. changes in face pose). Such large variability of biometric data can cause a big difference between acquired and stored biometric data that will eventually lead to reduced performance. Many systems store multiple templates in order to account for such variations in the biometric data during the enrolment stage. The number and typicality of these templates are the factors that affect system performance more than any others. In this paper, a novel offline approach is proposed for systematic modelling of intra-class variability and typicality in biometric data by regularly selecting new templates from a set of available biometric images. Our proposed technique is a two-stage algorithm whereby, in the first stage, image samples are clustered in terms of their image quality profile vectors rather than their biometric feature vectors, and in the second stage a per-cluster template is selected from a small number of samples in each cluster to create the ultimate template set. Experiments have been conducted on five face image databases and their results demonstrate the effectiveness of the proposed quality-guided approach.
Pollen assemblages as paleoenvironmental proxies in the Florida Everglades
Willard, D.A.; Weimer, L.M.; Riegel, W.L.
2001-01-01
Analysis of 170 pollen assemblages from surface samples in eight vegetation types in the Florida Everglades indicates that these wetland sub-environments are distinguishable from the pollen record and that they are useful proxies for hydrologic and edaphic parameters. Vegetation types sampled include sawgrass marshes, cattail marshes, sloughs with floating aquatics, wet prairies, brackish marshes, tree islands, cypress swamps, and mangrove forests. The distribution of these vegetation types is controlled by specific environmental parameters, such as hydrologic regime, nutrient availability, disturbance level, substrate type, and salinity; ecotones between vegetation types may be sharp. Using R-mode cluster analysis of pollen data, we identified diagnostic species groupings; Q-mode cluster analysis was used to differentiate pollen signatures of each vegetation type. Cluster analysis and the modern analog technique were applied to interpret vegetational and environmental trends over the last two millennia at a site in Water Conservation Area 3A. The results show that close modern analogs exist for assemblages in the core and indicate past hydrologic changes at the site, correlated with both climatic and land-use changes. The ability to differentiate marshes with different hydrologic and edaphic requirements using the pollen record facilitates assessment of relative impacts of climatic and anthropogenic changes on this wetland ecosystem on smaller spatial and temporal scales than previously were possible. © 2001 Elsevier Science B.V.
Evaluating Mixture Modeling for Clustering: Recommendations and Cautions
ERIC Educational Resources Information Center
Steinley, Douglas; Brusco, Michael J.
2011-01-01
This article provides a large-scale investigation into several of the properties of mixture-model clustering techniques (also referred to as latent class cluster analysis, latent profile analysis, model-based clustering, probabilistic clustering, Bayesian classification, unsupervised learning, and finite mixture models; see Vermunt & Magidson,…
Shape analysis of H II regions - I. Statistical clustering
NASA Astrophysics Data System (ADS)
Campbell-White, Justyn; Froebrich, Dirk; Kume, Alfred
2018-07-01
We present here our shape analysis method for a sample of 76 Galactic H II regions from MAGPIS 1.4 GHz data. The main goal is to determine whether physical properties and initial conditions of massive star cluster formation are linked to the shape of the regions. We outline a systematic procedure for extracting region shapes and perform hierarchical clustering on the shape data. We identified six groups that categorize H II regions by common morphologies. We confirmed the validity of these groupings by bootstrap re-sampling and the ordination technique of multidimensional scaling. We then investigated associations between physical parameters and the assigned groups. Location is mostly independent of group, with a small preference for regions of similar longitudes to share common morphologies. The shapes are homogeneously distributed across Galactocentric distance and latitude. One group contains regions that are all younger than 0.5 Myr and ionized by low- to intermediate-mass sources. Those in another group are all driven by intermediate- to high-mass sources. One group was distinctly separated from the other five and contained regions at the surface brightness detection limit for the survey. We find that our hierarchical procedure is most sensitive to the spatial sampling resolution used, which is determined for each region from its distance. We discuss how these errors can be further quantified and reduced in future work by utilizing synthetic observations from numerical simulations of H II regions. We also outline how this shape analysis has further applications to other diffuse astronomical objects.
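For readers unfamiliar with the workflow, this is a hedged sketch of hierarchical clustering of shape descriptors followed by a multidimensional-scaling cross-check, using SciPy and scikit-learn on invented features; the real descriptors come from the MAGPIS maps, and the paper's linkage and validation choices may differ.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from sklearn.manifold import MDS

    # toy shape descriptors for 76 regions (placeholders for the real measurements)
    rng = np.random.default_rng(13)
    shape_features = rng.normal(size=(76, 10))

    Z = linkage(shape_features, method="ward")
    groups = fcluster(Z, t=6, criterion="maxclust")       # cut the tree into six groups

    # multidimensional scaling gives an independent low-dimensional view of the grouping
    embedding = MDS(n_components=2, random_state=0).fit_transform(shape_features)
    for g in np.unique(groups):
        print(f"group {g}: {np.sum(groups == g)} regions, "
              f"MDS centroid = {embedding[groups == g].mean(axis=0).round(2)}")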
Shape Analysis of HII Regions - I. Statistical Clustering
NASA Astrophysics Data System (ADS)
Campbell-White, Justyn; Froebrich, Dirk; Kume, Alfred
2018-04-01
We present here our shape analysis method for a sample of 76 Galactic HII regions from MAGPIS 1.4 GHz data. The main goal is to determine whether physical properties and initial conditions of massive star cluster formation are linked to the shape of the regions. We outline a systematic procedure for extracting region shapes and perform hierarchical clustering on the shape data. We identified six groups that categorise HII regions by common morphologies. We confirmed the validity of these groupings by bootstrap re-sampling and the ordination technique of multidimensional scaling. We then investigated associations between physical parameters and the assigned groups. Location is mostly independent of group, with a small preference for regions of similar longitudes to share common morphologies. The shapes are homogeneously distributed across Galactocentric distance and latitude. One group contains regions that are all younger than 0.5 Myr and ionised by low- to intermediate-mass sources. Those in another group are all driven by intermediate- to high-mass sources. One group was distinctly separated from the other five and contained regions at the surface brightness detection limit for the survey. We find that our hierarchical procedure is most sensitive to the spatial sampling resolution used, which is determined for each region from its distance. We discuss how these errors can be further quantified and reduced in future work by utilising synthetic observations from numerical simulations of HII regions. We also outline how this shape analysis has further applications to other diffuse astronomical objects.
NASA Astrophysics Data System (ADS)
Xia, Shuang; Li, Hui; Liu, Ting Guang; Zhou, Bang Xin
2011-09-01
The feasibility of applying the grain boundary engineering (GBE) processing to Alloy 690 tube manufacturing to improve the intergranular corrosion resistance was studied. Through a small amount of deformation by cold drawing, using a draw-bench on a production line, and subsequent short-time annealing at high temperature, the proportion of low-Σ coincidence site lattice (CSL) grain boundaries of the Alloy 690 tube can be enhanced to about 75%; these were mainly of Σ3^n (n = 1, 2, 3, …) type. In this case, the grain boundary network (GBN) was featured by the formation of highly twinned, large-size grain-clusters produced by multiple twinning during recrystallization. All of the grains inside this kind of cluster had Σ3^n mutual misorientations, and hence all the boundaries inside the cluster were of Σ3^n type and formed many interconnected Σ3^n-type triple junctions. The weight losses due to grain dropping during intergranular corrosion for the samples with the modified GBN were much lower than those for samples with the conventional microstructure. Based on characterization by scanning electron microscopy (SEM) and the electron backscatter diffraction (EBSD) technique, it was shown that the highly twinned, large-size grain-cluster microstructure played a key role in enhancing the intergranular corrosion resistance: (1) the large grain-cluster can arrest the penetration of intergranular corrosion; (2) the large grain-cluster can protect the underlying microstructure.
Basati, Zahra; Jamshidi, Bahareh; Rasekh, Mansour; Abbaspour-Gilandeh, Yousef
2018-05-30
The presence of sunn pest-damaged grains in a wheat mass reduces the quality of flour and bread produced from it. Therefore, it is essential to assess the quality of the samples at wheat collection and storage centers and at flour mills. In this research, the capability of visible/near-infrared (Vis/NIR) spectroscopy combined with pattern recognition methods was investigated for discriminating wheat samples with different percentages of sunn pest-damaged grains. To this end, various samples belonging to five classes (healthy and 5%, 10%, 15% and 20% unhealthy) were analyzed using Vis/NIR spectroscopy (wavelength range of 350-1000 nm) based on both supervised and unsupervised pattern recognition methods. Principal component analysis (PCA) and hierarchical cluster analysis (HCA) were used as the unsupervised techniques, and soft independent modeling of class analogies (SIMCA) and partial least squares-discriminant analysis (PLS-DA) as the supervised methods. The results showed that Vis/NIR spectra of healthy samples were correctly clustered using both PCA and HCA. Due to the high overlap between the four unhealthy classes (5%, 10%, 15% and 20%), it was not possible to discriminate all the unhealthy samples into individual classes. However, when considering only the two main categories of healthy and unhealthy, an acceptable degree of separation between the classes was obtained after classification with the supervised pattern recognition methods SIMCA and PLS-DA. SIMCA based on PCA modeling correctly classified samples into the two classes of healthy and unhealthy with a classification accuracy of 100%. Moreover, the wavelengths of 839 nm, 918 nm and 995 nm had more discriminating power than other wavelengths for separating the healthy and unhealthy classes. It was also concluded that PLS-DA provides excellent classification of healthy and unhealthy samples (R^2 = 0.973 and RMSECV = 0.057). Therefore, Vis/NIR spectroscopy based on pattern recognition techniques can be useful for rapidly distinguishing healthy wheat samples from those damaged by sunn pest in maintenance and processing centers. Copyright © 2018 Elsevier B.V. All rights reserved.
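A hedged sketch of one supervised step, binary PLS-DA for healthy versus unhealthy samples: scikit-learn has no dedicated PLS-DA class, so PLSRegression on a 0/1 response with a 0.5 decision threshold is used as a common stand-in, and the spectra below are synthetic rather than the measured Vis/NIR data.

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import cross_val_predict

    # X: spectra (samples x wavelengths), y: 0 = healthy, 1 = sunn pest-damaged
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 651))            # e.g. 350-1000 nm at 1 nm steps
    y = np.repeat([0.0, 1.0], 30)
    X[y == 1, 450:470] += 0.5                 # fake spectral difference for the demo

    pls = PLSRegression(n_components=5)
    y_hat = cross_val_predict(pls, X, y, cv=10).ravel()
    accuracy = np.mean((y_hat > 0.5).astype(float) == y)
    print(f"cross-validated PLS-DA accuracy: {accuracy:.2f}")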
NASA Astrophysics Data System (ADS)
Sambeka, Yana; Nahadi; Sriyati, Siti
2017-05-01
The study aimed to obtain scientific information about the increase in students' concept mastery in project-based learning that used authentic assessment. The research was conducted in May 2016 at a junior high school in Bandung in the academic year 2015/2016. The research method was a weak experiment with a one-group pretest-posttest design. The sample of 24 students was taken by the random cluster sampling technique. Data were collected through instruments, i.e. a written test, an observation sheet, and a questionnaire sheet. The students' concept mastery test yielded an N-Gain of 0.236, in the low category. The result of the paired-sample t-test showed that implementation of authentic assessment in project-based learning increased students' concept mastery significantly (sig. < 0.05).
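A minimal sketch of the two quantities reported here, the normalized gain (N-Gain) and the paired-sample t-test, computed with NumPy/SciPy on invented pretest and posttest scores.

    import numpy as np
    from scipy.stats import ttest_rel

    # invented pretest/posttest scores (percent correct) for 24 students
    rng = np.random.default_rng(2)
    pre = rng.uniform(30, 60, size=24)
    post = pre + rng.uniform(0, 25, size=24)

    # normalized gain: (post - pre) / (100 - pre), averaged over students
    n_gain = np.mean((post - pre) / (100.0 - pre))
    t_stat, p_value = ttest_rel(post, pre)
    print(f"mean N-Gain = {n_gain:.3f}, paired t = {t_stat:.2f}, p = {p_value:.4f}")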
Navigating complex sample analysis using national survey data.
Saylor, Jennifer; Friedmann, Erika; Lee, Hyeon Joo
2012-01-01
The National Center for Health Statistics conducts the National Health and Nutrition Examination Survey and other national surveys with probability-based complex sample designs. The goal of national surveys is to provide valid data for the population of the United States. Analyses of data from population surveys present unique challenges in the research process but are valuable avenues to study the health of the United States population. The aim of this study was to demonstrate the importance of using complex data analysis techniques for data obtained with a complex multistage sampling design and to provide an example of analysis using the SPSS Complex Samples procedure. Challenges and solutions specific to secondary data analysis of national databases are illustrated using the National Health and Nutrition Examination Survey as the exemplar. Oversampling of small or sensitive groups provides necessary estimates of variability within small groups. Use of weights without complex samples accurately estimates population means and frequencies from the sample after accounting for over- or undersampling of specific groups. Weighting alone, however, leads to inappropriate population estimates of variability, because they are computed as if the measures were from the entire population rather than a sample in the data set. The SPSS Complex Samples procedure allows inclusion of all sampling design elements: stratification, clusters, and weights. Use of national data sets allows use of extensive, expensive, and well-documented survey data for exploratory questions but limits analysis to those variables included in the data set. The large sample permits examination of multiple predictors and interactive relationships. Merging data files, availability of data in several waves of surveys, and complex sampling are techniques used to provide a representative sample but present unique challenges. Use of these data is optimized by sophisticated data analysis techniques.
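To illustrate why weights alone are not enough, the sketch below computes a weighted mean together with a Taylor-linearised, design-based standard error that respects strata and clusters (PSUs); it mimics what a complex-samples procedure does internally, and the column names and toy data are invented rather than actual NHANES variables.

    import numpy as np
    import pandas as pd

    def design_based_mean(df, y, w, stratum, psu):
        """Weighted mean with a linearised variance that honours strata and
        clusters, assuming with-replacement sampling of PSUs within strata."""
        W, Y = df[w].to_numpy(float), df[y].to_numpy(float)
        mean = np.sum(W * Y) / np.sum(W)
        z = W * (Y - mean) / np.sum(W)            # linearised contributions
        totals = df.assign(z=z).groupby([stratum, psu])["z"].sum().reset_index()
        var = 0.0
        for _, g in totals.groupby(stratum):
            n_h = len(g)
            if n_h > 1:
                var += n_h / (n_h - 1) * np.sum((g["z"] - g["z"].mean()) ** 2)
        return mean, np.sqrt(var)

    # toy survey: 4 strata x 2 PSUs x 50 respondents, unequal weights
    rng = np.random.default_rng(3)
    df = pd.DataFrame({
        "stratum": np.repeat(np.arange(4), 100),
        "psu": np.tile(np.repeat([0, 1], 50), 4),
        "weight": rng.uniform(0.5, 3.0, 400),
        "sbp": rng.normal(120, 15, 400),
    })
    print(design_based_mean(df, "sbp", "weight", "stratum", "psu"))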
Cross-correlating the γ-ray Sky with Catalogs of Galaxy Clusters
NASA Astrophysics Data System (ADS)
Branchini, Enzo; Camera, Stefano; Cuoco, Alessandro; Fornengo, Nicolao; Regis, Marco; Viel, Matteo; Xia, Jun-Qing
2017-01-01
We report the detection of a cross-correlation signal between Fermi Large Area Telescope diffuse γ-ray maps and catalogs of clusters. In our analysis, we considered three different catalogs: WHL12, redMaPPer, and PlanckSZ. They all show a positive correlation with different amplitudes, related to the average mass of the objects in each catalog, which also sets the catalog bias. The signal detection is confirmed by the results of a stacking analysis. The cross-correlation signal extends to rather large angular scales, around 1°, that correspond, at the typical redshift of the clusters in these catalogs, to a few to tens of megaparsecs, i.e., the typical scale-length of the large-scale structures in the universe. Most likely this signal is contributed by the cumulative emission from active galactic nuclei (AGNs) associated with the filamentary structures that converge toward the high peaks of the matter density field in which galaxy clusters reside. In addition, our analysis reveals the presence of a second component, more compact in size and compatible with a point-like emission from within individual clusters. At present, we cannot distinguish between the two most likely interpretations for such a signal, i.e., whether it is produced by AGNs inside clusters or if it is a diffuse γ-ray emission from the intracluster medium. We argue that this latter, intriguing, hypothesis might be tested by applying this technique to a low-redshift large-mass cluster sample.
NASA Astrophysics Data System (ADS)
Krumholz, Mark R.; Adamo, Angela; Fumagalli, Michele; Wofford, Aida; Calzetti, Daniela; Lee, Janice C.; Whitmore, Bradley C.; Bright, Stacey N.; Grasha, Kathryn; Gouliermis, Dimitrios A.; Kim, Hwihyun; Nair, Preethi; Ryon, Jenna E.; Smith, Linda J.; Thilker, David; Ubeda, Leonardo; Zackrisson, Erik
2015-10-01
We investigate a novel Bayesian analysis method, based on the Stochastically Lighting Up Galaxies (slug) code, to derive the masses, ages, and extinctions of star clusters from integrated light photometry. Unlike many analysis methods, slug correctly accounts for incomplete initial mass function (IMF) sampling, and returns full posterior probability distributions rather than simply probability maxima. We apply our technique to 621 visually confirmed clusters in two nearby galaxies, NGC 628 and NGC 7793, that are part of the Legacy Extragalactic UV Survey (LEGUS). LEGUS provides Hubble Space Telescope photometry in the NUV, U, B, V, and I bands. We analyze the sensitivity of the derived cluster properties to choices of prior probability distribution, evolutionary tracks, IMF, metallicity, treatment of nebular emission, and extinction curve. We find that slug's results for individual clusters are insensitive to most of these choices, but that the posterior probability distributions we derive are often quite broad, and sometimes multi-peaked and quite sensitive to the choice of priors. In contrast, the properties of the cluster population as a whole are relatively robust against all of these choices. We also compare our results from slug to those derived with a conventional non-stochastic fitting code, Yggdrasil. We show that slug's stochastic models are generally a better fit to the observations than the deterministic ones used by Yggdrasil. However, the overall properties of the cluster populations recovered by both codes are qualitatively similar.
The Observations of Redshift Evolution in Large Scale Environments (ORELSE) Survey
NASA Astrophysics Data System (ADS)
Squires, Gordon K.; Lubin, L. M.; Gal, R. R.
2007-05-01
We present the motivation, design, and latest results from the Observations of Redshift Evolution in Large Scale Environments (ORELSE) Survey, a systematic search for structure on scales greater than 10 Mpc around 20 known galaxy clusters at z > 0.6. When complete, the survey will cover nearly 5 square degrees, all targeted at high-density regions, making it complementary and comparable to field surveys such as DEEP2, GOODS, and COSMOS. For the survey, we are using the Large Format Camera on the Palomar 5-m and SuPRIME-Cam on the Subaru 8-m to obtain optical/near-infrared imaging of an approximately 30 arcmin region around previously studied high-redshift clusters. Colors are used to identify likely member galaxies which are targeted for follow-up spectroscopy with the DEep Imaging Multi-Object Spectrograph on the Keck 10-m. This technique has been used to identify successfully the Cl 1604 supercluster at z = 0.9, a large scale structure containing at least eight clusters (Gal & Lubin 2004; Gal, Lubin & Squires 2005). We present the most recent structures to be photometrically and spectroscopically confirmed through this program, discuss the properties of the member galaxies as a function of environment, and describe our planned multi-wavelength (radio, mid-IR, and X-ray) observations of these systems. The goal of this survey is to identify and examine a statistical sample of large scale structures during an active period in the assembly history of the most massive clusters. With such a sample, we can begin to constrain large scale cluster dynamics and determine the effect of the larger environment on galaxy evolution.
Insights into quasar UV spectra using unsupervised clustering analysis
NASA Astrophysics Data System (ADS)
Tammour, A.; Gallagher, S. C.; Daley, M.; Richards, G. T.
2016-06-01
Machine learning techniques can provide powerful tools to detect patterns in multidimensional parameter space. We use K-means - a simple yet powerful unsupervised clustering algorithm which picks out structure in unlabelled data - to study a sample of quasar UV spectra from the Quasar Catalog of the 10th Data Release of the Sloan Digital Sky Survey (SDSS-DR10) of Paris et al. Detecting patterns in large data sets helps us gain insights into the physical conditions and processes giving rise to the observed properties of quasars. We use K-means to find clusters in the parameter space of the equivalent width (EW), the blue- and red-half-width at half-maximum (HWHM) of the Mg II 2800 Å line, the C IV 1549 Å line, and the C III] 1908 Å blend in samples of broad absorption line (BAL) and non-BAL quasars at redshift 1.6-2.1. Using this method, we successfully recover correlations well-known in the UV regime such as the anti-correlation between the EW and blueshift of the C IV emission line and the shape of the ionizing spectral energy distribution (SED) probed by the strength of He II and the Si III]/C III] ratio. We find this to be particularly evident when the properties of C III] are used to find the clusters, while those of Mg II proved to be less strongly correlated with the properties of the other lines in the spectra such as the width of C IV or the Si III]/C III] ratio. We conclude that unsupervised clustering methods (such as K-means) are powerful methods for finding `natural' binning boundaries in multidimensional data sets and discuss caveats and future work.
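A minimal sketch of the clustering step, assuming the line measurements have already been assembled into a feature matrix; the feature layout, number of clusters and scaling choice are illustrative rather than those used in the paper.

    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans

    # invented feature layout: EW, blue HWHM, red HWHM for one emission line
    rng = np.random.default_rng(4)
    features = rng.normal(size=(5000, 3))

    X = StandardScaler().fit_transform(features)   # put EW and widths on a common scale
    labels = KMeans(n_clusters=4, n_init=20, random_state=0).fit_predict(X)

    for k in range(4):
        print(f"cluster {k}: {np.sum(labels == k)} quasars, "
              f"mean scaled features = {X[labels == k].mean(axis=0).round(2)}")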
Clustering: An Interactive Technique to Enhance Learning in Biology.
ERIC Educational Resources Information Center
Ambron, Joanna
1988-01-01
Explains an interdisciplinary approach to biology and writing which increases students' mastery of vocabulary, scientific concepts, creativity, and expression. Describes modifications of the clustering technique used to summarize lectures, integrate reading and understand textbook material. (RT)
Unsupervised color image segmentation using a lattice algebra clustering technique
NASA Astrophysics Data System (ADS)
Urcid, Gonzalo; Ritter, Gerhard X.
2011-08-01
In this paper we introduce a lattice algebra clustering technique for segmenting digital images in the Red-Green-Blue (RGB) color space. The proposed technique is a two-step procedure. Given an input color image, the first step determines the finite set of its extreme pixel vectors within the color cube by means of the scaled min-W and max-M lattice auto-associative memory matrices, including the minimum and maximum vector bounds. In the second step, maximal rectangular boxes enclosing each extreme color pixel are found using the Chebychev distance between color pixels; afterwards, clustering is performed by assigning each image pixel to its corresponding maximal box. The two steps in our proposed method are completely unsupervised or autonomous. Illustrative examples are provided to demonstrate the color segmentation results including a brief numerical comparison with two other non-maximal variations of the same clustering technique.
Re-estimating sample size in cluster randomised trials with active recruitment within clusters.
van Schie, S; Moerbeek, M
2014-08-30
Often only a limited number of clusters can be obtained in cluster randomised trials, although many potential participants can be recruited within each cluster. Thus, active recruitment is feasible within the clusters. To obtain an efficient sample size in a cluster randomised trial, the cluster level and individual level variance should be known before the study starts, but this is often not the case. We suggest using an internal pilot study design to address this problem of unknown variances. A pilot can be useful to re-estimate the variances and re-calculate the sample size during the trial. Using simulated data, it is shown that an initially low or high power can be adjusted using an internal pilot with the type I error rate remaining within an acceptable range. The intracluster correlation coefficient can be re-estimated with more precision, which has a positive effect on the sample size. We conclude that an internal pilot study design may be used if active recruitment is feasible within a limited number of clusters. Copyright © 2014 John Wiley & Sons, Ltd.
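A hedged sketch of the internal-pilot idea for a parallel cluster randomised trial with a continuous outcome: re-estimate the intracluster correlation from interim data with a one-way ANOVA estimator, then recompute the required sample size through the usual design effect 1 + (m - 1)*ICC. The paper's procedure is more general; this is only the simplest version.

    import numpy as np

    def icc_oneway(y, cluster):
        """ANOVA (method-of-moments) estimate of the intracluster correlation."""
        y, cluster = np.asarray(y, float), np.asarray(cluster)
        groups = [y[cluster == c] for c in np.unique(cluster)]
        k, n_bar, grand = len(groups), np.mean([len(g) for g in groups]), y.mean()
        msb = sum(len(g) * (g.mean() - grand) ** 2 for g in groups) / (k - 1)
        msw = sum(((g - g.mean()) ** 2).sum() for g in groups) / (len(y) - k)
        return max(0.0, (msb - msw) / (msb + (n_bar - 1) * msw))

    def subjects_per_arm(delta, sd, icc, m, z_alpha=1.96, z_beta=0.84):
        """Subjects per arm, inflating the individually randomised sample size
        by the design effect 1 + (m - 1) * icc for clusters of size m."""
        n_indiv = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2
        return int(np.ceil(n_indiv * (1 + (m - 1) * icc)))

    # internal pilot: interim data from 10 clusters of 20, then recompute the target
    rng = np.random.default_rng(5)
    cluster = np.repeat(np.arange(10), 20)
    y = rng.normal(0, 1, 200) + 0.3 * rng.normal(size=10)[cluster]
    icc_hat = icc_oneway(y, cluster)
    print("re-estimated ICC:", round(icc_hat, 3),
          " subjects per arm:", subjects_per_arm(delta=0.3, sd=1.0, icc=icc_hat, m=20))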
NASA Astrophysics Data System (ADS)
Zhu, Te; Jin, Shuoxue; Zhang, Peng; Song, Ligang; Lian, Xiangyu; Fan, Ping; Zhang, Qiaoli; Yuan, Daqing; Wu, Haibiao; Yu, Runsheng; Cao, Xingzhong; Xu, Qiu; Wang, Baoyi
2018-07-01
The formation of helium bubble precursors, i.e., helium-vacancy complexes, was investigated for Fe9Cr alloy, which was uniformly irradiated by using 100 keV helium ions with fluences up to 5 × 10^16 ions/cm^2 at RT, 523, 623, 723, and 873 K. Helium-irradiation-induced microstructures in the alloy were probed by positron annihilation technique. The results show that the ratio of helium atom to vacancy (m/n) in the irradiation induced He_mV_n clusters is affected by the irradiation temperature. Irradiated at room temperature, there is a coexistence of large amounts of He_mV_1 and mono-vacancies in the sample. However, the overpressured He_mV_n (m > n) clusters or helium bubbles are easily formed by the helium-filled vacancy clusters (He_mV_1 and He_mV_n (m ≈ n)) absorbing helium atoms when irradiated at 523 K and 823 K. The results also show that void swelling of the alloy is the largest under 723 K irradiation.
Two-stage cluster sampling reduces the cost of collecting accuracy assessment reference data by constraining sample elements to fall within a limited number of geographic domains (clusters). However, because classification error is typically positively spatially correlated, withi...
Zhang, Junfeng; Chen, Wei; Gao, Mingyi; Shen, Gangxiang
2017-10-30
In this work, we proposed two k-means-clustering-based algorithms to mitigate the fiber nonlinearity for the 64-quadrature amplitude modulation (64-QAM) signal: the training-sequence-assisted k-means algorithm and the blind k-means algorithm. We experimentally demonstrated the proposed k-means-clustering-based fiber nonlinearity mitigation techniques in a 75-Gb/s 64-QAM coherent optical communication system. The proposed algorithms have reduced clustering complexity and low data redundancy, and they are able to quickly find appropriate initial centroids and correctly select the centroids of the clusters to obtain the global optimal solutions for a large k value. We measured the bit-error-ratio (BER) performance of the 64-QAM signal with different launch powers into the 50-km single-mode fiber, and the proposed techniques can greatly mitigate the signal impairments caused by amplified spontaneous emission noise and fiber Kerr nonlinearity and improve the BER performance.
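A hedged sketch of the blind variant: seed 64 centroids at the ideal constellation, let k-means follow the distorted cluster centres, and make symbol decisions by nearest centroid. The "received" symbols below are a crude synthetic distortion, not the 75-Gb/s experimental data, and the algorithmic refinements described in the paper are omitted.

    import numpy as np
    from sklearn.cluster import KMeans

    # ideal square 64-QAM constellation
    levels = np.array([-7, -5, -3, -1, 1, 3, 5, 7], float)
    ideal = np.array([(i, q) for i in levels for q in levels])

    # toy received symbols: power-dependent phase rotation plus Gaussian noise
    rng = np.random.default_rng(6)
    tx = ideal[rng.integers(0, 64, 20000)]
    phase = 0.02 * (tx[:, 0] ** 2 + tx[:, 1] ** 2)
    rx = np.stack([tx[:, 0] * np.cos(phase) - tx[:, 1] * np.sin(phase),
                   tx[:, 0] * np.sin(phase) + tx[:, 1] * np.cos(phase)], axis=1)
    rx += rng.normal(0, 0.3, size=rx.shape)

    # blind k-means: centroids initialised at the ideal points, then updated from data
    km = KMeans(n_clusters=64, init=ideal, n_init=1).fit(rx)
    decisions = km.cluster_centers_[km.predict(rx)]        # nonlinearity-aware decisions

    fixed = np.argmin(((rx[:, None, :] - ideal[None, :, :]) ** 2).sum(axis=-1), axis=1)
    print("fraction of symbols re-decided relative to a fixed ideal grid:",
          round(float(np.mean(fixed != km.predict(rx))), 3))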
NASA Astrophysics Data System (ADS)
Fischer, P.
1997-12-01
Weak distortions of background galaxies are rapidly emerging as a powerful tool for the measurement of galaxy cluster mass distributions. Lensing based studies have the advantage of being direct measurements of mass and are not model-dependent as are other techniques (X-ray, radial velocities). To date studies have been limited by CCD field size meaning that full coverage of the clusters out to the virial radii and beyond has not been possible. Probing this large radius region is essential for testing models of large scale structure formation. New wide field CCD mosaics, for the first time, allow mass measurements out to very large radius. We have obtained images for a sample of clusters with the ``Big Throughput Camera'' (BTC) on the CTIO 4m. This camera comprises four thinned SITE 2048^2 CCDs, each 15 arcmin on a side for a total area of one quarter of a square degree. We have developed an automated reduction pipeline which: 1) corrects for spatial distortions, 2) corrects for PSF anisotropy, 3) determines relative scaling and background levels, and 4) combines multiple exposures. In this poster we will present some preliminary results of our cluster lensing study. This will include radial mass and light profiles and 2-d mass and galaxy density maps.
NASA Technical Reports Server (NTRS)
Ballew, G.
1977-01-01
The ability of Landsat multispectral digital data to differentiate among 62 combinations of rock and alteration types at the Goldfield mining district of Western Nevada was investigated by using statistical techniques of cluster and discriminant analysis. Multivariate discriminant analysis was not effective in classifying each of the 62 groups, with classification results essentially the same whether data of four channels alone or combined with six ratios of channels were used. Bivariate plots of group means revealed a cluster of three groups including mill tailings, basalt and all other rock and alteration types. Automatic hierarchical clustering based on the four-dimensional Mahalanobis distance between group means of 30 groups having five or more samples was performed using Johnson's HICLUS program. The results of the cluster analysis revealed hierarchies of mill tailings vs. natural materials, basalt vs. non-basalt, highly reflectant rocks vs. other rocks and exclusively unaltered rocks vs. predominantly altered rocks. The hierarchies were used to determine the order in which sets of multiple discriminant analyses were to be performed and the resulting discriminant functions were used to produce a map of geology and alteration which has an overall accuracy of 70 percent for discriminating exclusively altered rocks from predominantly altered rocks.
Cluster Stability Estimation Based on a Minimal Spanning Trees Approach
NASA Astrophysics Data System (ADS)
Volkovich, Zeev (Vladimir); Barzily, Zeev; Weber, Gerhard-Wilhelm; Toledano-Kitai, Dvora
2009-08-01
Among the areas of data and text mining which are employed today in science, economy and technology, clustering theory serves as a preprocessing step in data analysis. However, many open questions still await theoretical and practical treatment; e.g., the problem of determining the true number of clusters has not been satisfactorily solved. In the current paper, this problem is addressed by the cluster stability approach. For several possible numbers of clusters we estimate the stability of partitions obtained from clustering of samples. Partitions are considered consistent if their clusters are stable. Cluster validity is measured as the total number of edges, in the clusters' minimal spanning trees, connecting points from different samples; in effect, we use the Friedman and Rafsky two-sample test statistic. The homogeneity hypothesis, of well-mingled samples within the clusters, leads to an asymptotically normal distribution of the considered statistic. Resting upon this fact, the standard score of the mentioned edge count is computed, and the partition quality is represented by the worst cluster, corresponding to the minimal standard score value. It is natural to expect that the true number of clusters can be characterized by the empirical distribution having the shortest left tail. The proposed methodology sequentially creates the described value distribution and estimates its left-asymmetry. Numerical experiments, presented in the paper, demonstrate the ability of the approach to detect the true number of clusters.
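A hedged sketch of the core statistic: within each cluster, build the minimal spanning tree of the pooled two-sample points and count the edges joining points from different samples, in the spirit of the Friedman-Rafsky test. For simplicity the standard score below is computed against a permutation reference rather than the asymptotic normal result used in the paper.

    import numpy as np
    from scipy.spatial.distance import cdist
    from scipy.sparse.csgraph import minimum_spanning_tree

    def cross_sample_edges(points, sample_id):
        """Number of MST edges whose endpoints come from different samples."""
        mst = minimum_spanning_tree(cdist(points, points)).tocoo()
        return int(np.sum(sample_id[mst.row] != sample_id[mst.col]))

    def cluster_stability_scores(points, sample_id, labels, n_perm=100, seed=0):
        """Per-cluster standard score of the cross-sample edge count."""
        rng = np.random.default_rng(seed)
        scores = {}
        for k in np.unique(labels):
            idx = np.where(labels == k)[0]
            obs = cross_sample_edges(points[idx], sample_id[idx])
            perm = [cross_sample_edges(points[idx], rng.permutation(sample_id[idx]))
                    for _ in range(n_perm)]
            scores[int(k)] = (obs - np.mean(perm)) / (np.std(perm) + 1e-12)
        return scores

    # toy usage: two samples drawn from the same two-cluster mixture
    rng = np.random.default_rng(7)
    pts = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])
    sample_id = rng.integers(0, 2, 200)          # which sample each point belongs to
    labels = (pts[:, 0] > 2.5).astype(int)       # stand-in for a clustering result
    print(cluster_stability_scores(pts, sample_id, labels, n_perm=50))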
2-Way k-Means as a Model for Microbiome Samples.
Jackson, Weston J; Agarwal, Ipsita; Pe'er, Itsik
2017-01-01
Motivation. Microbiome sequencing allows defining clusters of samples with shared composition. However, this paradigm poorly accounts for samples whose composition is a mixture of cluster-characterizing ones and which therefore lie in between them in the cluster space. This paper addresses unsupervised learning of 2-way clusters. It defines a mixture model that allows 2-way cluster assignment and describes a variant of generalized k-means for learning such a model. We demonstrate applicability to microbial 16S rDNA sequencing data from the Human Vaginal Microbiome Project.
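One simple reading of "2-way cluster assignment" is a generalized k-means in which a sample may be assigned either to a single centroid or to the midpoint of a pair of centroids; the sketch below implements that reading and is only a rough stand-in for the mixture model defined in the paper.

    import numpy as np

    def two_way_kmeans(X, k=3, n_iter=50, seed=0):
        """k-means variant where a sample may belong to one cluster or to a
        50/50 mixture of two clusters (represented by the centroid midpoint)."""
        rng = np.random.default_rng(seed)
        C = X[rng.choice(len(X), k, replace=False)].astype(float).copy()
        pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
        for _ in range(n_iter):
            protos = np.vstack([C] + [(C[i] + C[j]) / 2 for i, j in pairs])
            assign = np.argmin(((X[:, None, :] - protos) ** 2).sum(axis=-1), axis=1)
            for c in range(k):                 # weighted centroid update
                w = np.zeros(len(X))
                w[assign == c] = 1.0
                for p, (i, j) in enumerate(pairs, start=k):
                    if c in (i, j):
                        w[assign == p] = 0.5
                if w.sum() > 0:
                    C[c] = (w[:, None] * X).sum(axis=0) / w.sum()
        return C, assign

    rng = np.random.default_rng(8)
    X = np.vstack([rng.normal(m, 0.3, (80, 2)) for m in ([0, 0], [3, 0], [1.5, 2.5])])
    centroids, assignment = two_way_kmeans(X, k=3)
    print("samples assigned to 2-way (mixture) prototypes:", int(np.sum(assignment >= 3)))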
2-Way k-Means as a Model for Microbiome Samples
2017-01-01
Motivation. Microbiome sequencing allows defining clusters of samples with shared composition. However, this paradigm poorly accounts for samples whose composition is a mixture of cluster-characterizing ones and which therefore lie in between them in the cluster space. This paper addresses unsupervised learning of 2-way clusters. It defines a mixture model that allows 2-way cluster assignment and describes a variant of generalized k-means for learning such a model. We demonstrate applicability to microbial 16S rDNA sequencing data from the Human Vaginal Microbiome Project. PMID:29177026
Ages of Extragalactic Intermediate-Age Star Clusters
NASA Technical Reports Server (NTRS)
Flower, P. J.
1983-01-01
A dating technique for faint, distant star clusters observable in the local group of galaxies with the space telescope is discussed. Color-magnitude diagrams of Magellanic Cloud clusters are mentioned along with the metallicity of star clusters.
NASA Astrophysics Data System (ADS)
Dias, B.; Barbuy, B.; Saviane, I.; Held, E. V.; Da Costa, G. S.; Ortolani, S.; Gullieuszik, M.; Vásquez, S.
2016-05-01
Context. Globular clusters trace the formation and evolution of the Milky Way and surrounding galaxies, and outline their chemical enrichment history. To accomplish these tasks it is important to have large samples of clusters with homogeneous data and analysis to derive kinematics, chemical abundances, ages and locations. Aims: We obtain homogeneous metallicities and α-element enhancement for 51 Galactic bulge, disc, and halo globular clusters that are among the most distant and/or highly reddened in the Galaxy's globular cluster system. We also provide membership selection based on stellar radial velocities and atmospheric parameters. The implications of our results are discussed. Methods: We observed R ~ 2000 spectra in the wavelength interval 456-586 nm for over 800 red giant stars in 51 Galactic globular clusters. We applied full spectrum fitting with the code ETOILE together with libraries of observed and synthetic spectra. We compared the mean abundances of all clusters with previous work and with field stars. We used the relation between mean metallicity and horizontal branch morphology defined by all clusters to select outliers for discussion. Results: [Fe/H], [Mg/Fe], and [α/Fe] were derived in a consistent way for almost one-third of all Galactic globular clusters. We find our metallicities are comparable to those derived from high-resolution data to within σ = 0.08 dex over the interval -2.5 < [Fe/H] < 0.0. Furthermore, a comparison of previous metallicity scales with our values yields σ < 0.16 dex. We also find that the distribution of [Mg/Fe] and [α/Fe] with [Fe/H] for the 51 clusters follows the general trend exhibited by field stars. It is the first time that the following clusters have been included in a large sample of homogeneous stellar spectroscopic observations and metallicity derivation: BH 176, Djorg 2, Pal 10, NGC 6426, Lynga 7, and Terzan 8. In particular, only photometric metallicities were available previously for the first three clusters, and the available metallicity for NGC 6426 was based on integrated spectroscopy and photometry. Two other clusters, HP 1 and NGC 6558, are confirmed as candidates for the oldest globular clusters in the Milky Way. Conclusions: Stellar spectroscopy in the visible at R ~ 2000 for a large sample of globular clusters is a robust and efficient way to trace the chemical evolution of the host galaxy and to detect interesting objects for follow-up at higher resolution and with forthcoming giant telescopes. The technique used here can also be applied to globular cluster systems in nearby galaxies with current instruments and to distant galaxies with the advent of ELTs. Based on observations collected at the European Southern Observatory/Paranal, Chile, under programmes 68.B-0482(A), 69.D-0455(A), 71.D-0219(A), 077.D-0775(A), and 089.D-0493(B). Full Tables 1 and A.2 with the derived average parameters for the 758 red giant stars are only available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (http://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/590/A9
The structure of deposited metal clusters generated by laser evaporation
NASA Astrophysics Data System (ADS)
Faust, P.; Brandstättner, M.; Ding, A.
1991-09-01
Metal clusters have been produced using a laser evaporation source. A Nd-YAG laser beam focused onto a solid silver rod was used to evaporate the material, which was then cooled to form clusters with the help of a pulsed high-pressure He beam. TOF mass spectra of these clusters reveal a strong occurrence of small and medium-sized clusters (n < 100). Clusters were also deposited onto grid-supported thin layers of carbon film which were investigated by transmission electron microscopy. Very high resolution pictures of these grids were used to analyze the size distribution and the structure of the deposited clusters. The diffraction pattern caused by the crystalline structure of the clusters reveals 3- and 5-fold symmetries as well as fcc bulk structure. This can be explained in terms of icosahedron- and cuboctahedron-type clusters deposited on the surface of the carbon layer. There is strong evidence that part of these cluster geometries had already been formed before the deposition process. The non-linear dependence of the cluster size and the cluster density on the generating conditions is discussed. The samples were observed in HREM in the stable DEEKO 100 microscope of the Fritz-Haber-Institut operating at 100 kV with the spherical aberration c_S = 0.5 mm. The quality of the pictures was improved by using the conditions of minimum phase contrast hollow cone illumination. This procedure led to a minimum of phase contrast artefacts. Among the well-crystallized particles were a large number with five- and three-fold symmetries, icosahedra and cuboctahedra respectively. The largest clusters with five- and three-fold symmetries have been found with diameters of 7 nm; the smallest particles displaying the same undistorted symmetries were of about 2 nm. Even smaller ones with strong distortions could be observed, although their classification is difficult. The quality of the images was improved by applying Fourier filtering techniques.
X-Ray Morphological Analysis of the Planck ESZ Clusters
NASA Astrophysics Data System (ADS)
Lovisari, Lorenzo; Forman, William R.; Jones, Christine; Ettori, Stefano; Andrade-Santos, Felipe; Arnaud, Monique; Démoclès, Jessica; Pratt, Gabriel W.; Randall, Scott; Kraft, Ralph
2017-09-01
X-ray observations show that galaxy clusters have a very large range of morphologies. The most disturbed systems, which are well suited to studying how clusters form and grow and to testing physical models, may potentially complicate cosmological studies because the cluster mass determination becomes more challenging. Thus, we need to understand the cluster properties of our samples to reduce possible biases. This is complicated by the fact that different experiments may detect different cluster populations. For example, Sunyaev-Zeldovich (SZ) selected cluster samples have been found to include a greater fraction of disturbed systems than X-ray selected samples. In this paper we determine eight morphological parameters for the Planck Early Sunyaev-Zeldovich (ESZ) objects observed with XMM-Newton. We found that two parameters, concentration and centroid shift, are the best to distinguish between relaxed and disturbed systems. For each parameter we provide the values that allow selecting the most relaxed or most disturbed objects from a sample. We found that there is no mass dependence on the cluster dynamical state. By comparing our results with what was obtained with REXCESS clusters, we also confirm that the ESZ clusters indeed tend to be more disturbed, as found by previous studies.
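A hedged sketch of the two most discriminating parameters, computed here from a toy background-subtracted surface-brightness image; the aperture conventions (inner and outer radii for the concentration, shrinking apertures for the centroid shift) follow common usage in the literature and may not match the paper's exact definitions.

    import numpy as np

    def concentration(img, cx, cy, r_inner, r_outer):
        """Ratio of the flux inside r_inner to the flux inside r_outer."""
        y, x = np.indices(img.shape)
        r = np.hypot(x - cx, y - cy)
        return img[r < r_inner].sum() / img[r < r_outer].sum()

    def centroid_shift(img, cx, cy, r_ap, n_aper=8):
        """Scatter (in units of r_ap) of the offsets between the reference centre
        and the centroids measured in apertures shrinking from r_ap."""
        y, x = np.indices(img.shape)
        r = np.hypot(x - cx, y - cy)
        offsets = []
        for f in np.linspace(1.0, 1.0 / n_aper, n_aper):
            m = (r < f * r_ap) & (img > 0)
            ccx = np.average(x[m], weights=img[m])
            ccy = np.average(y[m], weights=img[m])
            offsets.append(np.hypot(ccx - cx, ccy - cy))
        return np.std(offsets) / r_ap

    # toy image: a mildly offset double-peaked cluster
    yy, xx = np.indices((200, 200))
    img = np.exp(-((xx - 100) ** 2 + (yy - 100) ** 2) / 200.0)
    img += 0.4 * np.exp(-((xx - 130) ** 2 + (yy - 90) ** 2) / 400.0)
    print("c =", round(concentration(img, 100, 100, 10, 80), 3),
          " w =", round(centroid_shift(img, 100, 100, 80), 4))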
X-Ray Morphological Analysis of the Planck ESZ Clusters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lovisari, Lorenzo; Forman, William R.; Jones, Christine
2017-09-01
X-ray observations show that galaxy clusters have a very large range of morphologies. The most disturbed systems, which are well suited to studying how clusters form and grow and to testing physical models, may potentially complicate cosmological studies because the cluster mass determination becomes more challenging. Thus, we need to understand the cluster properties of our samples to reduce possible biases. This is complicated by the fact that different experiments may detect different cluster populations. For example, Sunyaev–Zeldovich (SZ) selected cluster samples have been found to include a greater fraction of disturbed systems than X-ray selected samples. In this paper we determine eight morphological parameters for the Planck Early Sunyaev–Zeldovich (ESZ) objects observed with XMM-Newton. We found that two parameters, concentration and centroid shift, are the best to distinguish between relaxed and disturbed systems. For each parameter we provide the values that allow selecting the most relaxed or most disturbed objects from a sample. We found that there is no mass dependence on the cluster dynamical state. By comparing our results with what was obtained with REXCESS clusters, we also confirm that the ESZ clusters indeed tend to be more disturbed, as found by previous studies.
Macromolecular structure of coals. 6. Mass spectroscopic analysis of coal-derived liquids
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hooker, D.T.; Lucht, L.M.; Peppas, N.A.
1986-02-01
The macromolecular structure of coal networks was analyzed by depolymerizing coal samples using the Sternberg reductive alkylation and the Miyake alkylation techniques. Electron impact mass spectra showed peaks of greater abundance at m/z ratios of 125-132, 252-260, 383-391, and 511-520. Based on analysis of the patterns of the spectra, the cluster size of the cross-linked structure of bituminous coals was determined as 126-130. Various chemical species were identified.
Characterizing decision-making and reward processing in bipolar disorder: A cluster analysis.
Jiménez, E; Solé, B; Arias, B; Mitjans, M; Varo, C; Reinares, M; Bonnín, C M; Salagre, E; Ruíz, V; Torres, I; Tomioka, Y; Sáiz, P A; García-Portilla, M P; Burón, P; Bobes, J; Martínez-Arán, A; Torrent, C; Vieta, E; Benabarre, A
2018-05-25
The presence of abnormalities in emotional decision-making and reward processing among bipolar patients (BP) has been well rehearsed. These disturbances are not limited to acute phases and are common even during remission. In recent years, the existence of discrete cognitive profiles in this psychiatric population has been replicated. However, the emotional decision-making and reward processing domains have barely been studied. Therefore, our aim was to explore the existence of different profiles on the aforementioned cognitive dimensions in BP. The sample consisted of 126 euthymic BP. The main sociodemographic, clinical, functioning, and neurocognitive variables were gathered. A hierarchical clustering technique was used to identify discrete neurocognitive profiles based on performance in the Iowa Gambling Task. Afterward, the resulting clusters were compared using ANOVA or the Chi-squared test, as appropriate. Evidence for the existence of three different profiles was provided. Cluster 1 was mainly characterized by poor decision ability. Cluster 2 presented the lowest sensitivity to punishment. Finally, cluster 3 presented the best decision-making ability and the highest levels of punishment sensitivity. Comparison between the three clusters indicated that cluster 2 was the most functionally impaired group. The poorest outcomes in attention, executive function domains, and social cognition were also observed within the same group. In conclusion, similarly to what has been observed in "cold" cognitive domains, our results suggest the existence of three discrete cognitive profiles concerning emotional decision making and reward processing. Among all the indexes explored, low punishment sensitivity emerges as a potential correlate of poorer cognitive and functional outcomes in bipolar disorder. Copyright © 2018 Elsevier B.V. and ECNP. All rights reserved.
Yu, Byong Yong; Kwak, Seung-Yeop
2011-10-21
Based on a self-assembly strategy, spherical mesoporous cobalt and nickel ferrite nanocrystal clusters with a large surface area and narrow size distribution were successfully synthesized for the first time via a template-free solvothermal process in ethylene glycol and subsequent heat treatment. In this work, the mesopores in the ferrite clusters were derived mainly from interior voids between aggregated primary nanoparticles (with crystallite size of less than 7 nm) and disordered particle packing domains. The concentration of sodium acetate is shown herein to play a crucial role in the formation of mesoporous ferrite spherical clusters. These ferrite clusters were characterized in detail using wide-angle X-ray diffraction, thermogravimetric-differential thermal analysis, ^57Fe Mössbauer spectroscopy, X-ray photoelectron spectroscopy, field-emission scanning electron microscopy, standard and high-resolution transmission electron microscopy, and other techniques. The results confirmed the formation of both pure-phase ferrite clusters with highly crystalline spinel structure, uniform size (about 160 nm) and spherical morphology, and worm-like mesopore structures. The BET specific surface areas and mean pore sizes of the mesoporous Co and Ni-ferrite clusters were as high as 160 m^2 g^-1 and 182 m^2 g^-1, and 7.91 nm and 6.87 nm, respectively. A model for the formation of the spherical clusters in our system is proposed on the basis of the results. The magnetic properties of both samples were investigated at 300 K, and it was found that these materials are superparamagnetic. This journal is © The Royal Society of Chemistry 2011
Sampling procedures for inventory of commercial volume tree species in Amazon Forest.
Netto, Sylvio P; Pelissari, Allan L; Cysneiros, Vinicius C; Bonazza, Marcelo; Sanquetta, Carlos R
2017-01-01
The spatial distribution of tropical tree species can affect the consistency of the estimators in commercial forest inventories; therefore, appropriate sampling procedures are required to survey species with different spatial patterns in the Amazon Forest. For this, the present study aims to evaluate the conventional sampling procedures and introduce the adaptive cluster sampling for volumetric inventories of Amazonian tree species, considering the hypotheses that the density, the spatial distribution and the zero-plots affect the consistency of the estimators, and that the adaptive cluster sampling allows to obtain more accurate volumetric estimation. We use data from a census carried out in Jamari National Forest, Brazil, where trees with diameters equal to or higher than 40 cm were measured in 1,355 plots. Species with different spatial patterns were selected and sampled with simple random sampling, systematic sampling, linear cluster sampling and adaptive cluster sampling, whereby the accuracy of the volumetric estimation and presence of zero-plots were evaluated. The sampling procedures applied to species were affected by the low density of trees and the large number of zero-plots, wherein the adaptive clusters allowed concentrating the sampling effort in plots with trees and, thus, agglutinating more representative samples to estimate the commercial volume.
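A minimal sketch of the estimation step in adaptive cluster sampling, using the modified Hansen-Hurwitz estimator of Thompson (1990), in which each initially drawn plot contributes the mean of the network it triggers; network construction and the treatment of edge units are simplified away by assuming network membership is already known for every plot.

    import numpy as np

    def hh_adaptive_estimate(y, network_id, initial_sample):
        """Modified Hansen-Hurwitz estimator: average, over the initial plots,
        of the mean volume in each plot's network."""
        y, network_id = np.asarray(y, float), np.asarray(network_id)
        w = [y[network_id == network_id[i]].mean() for i in initial_sample]
        return float(np.mean(w))

    # toy population: 1000 plots; commercial volume concentrated in 3 clustered networks
    rng = np.random.default_rng(9)
    y = np.zeros(1000)
    network_id = np.arange(1000)               # empty plots form singleton networks
    for start in (100, 400, 700):
        idx = np.arange(start, start + 12)
        y[idx] = rng.uniform(5, 20, 12)
        network_id[idx] = start
    initial = rng.choice(1000, size=50, replace=False)
    print("estimated mean volume per plot:",
          round(hh_adaptive_estimate(y, network_id, initial), 3),
          " true mean:", round(float(y.mean()), 3))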
Group investigation with scientific approach in mathematics learning
NASA Astrophysics Data System (ADS)
Indarti, D.; Mardiyana; Pramudya, I.
2018-03-01
The aim of this research is to find out the effect of the learning model on mathematics achievement. This is quasi-experimental research. The population consists of all VII grade students of Karanganyar regency in the academic year 2016/2017. The sample was taken using the stratified cluster random sampling technique. Data collection was based on a mathematics achievement test. The data analysis used one-way ANOVA, following a normality test with the Lilliefors method and a homogeneity test with the Bartlett method. The result of this research is that mathematics learning using the Group Investigation model with a scientific approach produces better mathematics achievement than learning with the conventional model on the quadrilateral material. The Group Investigation learning model with a scientific approach can be used by teachers in mathematics learning, especially for quadrilaterals, and can improve mathematics achievement.
Adaptive Cluster Sampling for Forest Inventories
Francis A. Roesch
1993-01-01
Adaptive cluster sampling is shown to be a viable alternative for sampling forests when the tree characteristics of interest are rare and occur on clustered trees. The ideas of recent work in Thompson (1990) have been extended to the case in which the initial sample is selected with unequal probabilities. An example is given in which the...
Fast, reagentless and reliable screening of "white powders" during the bioterrorism hoaxes.
Włodarski, Maksymilian; Kaliszewski, Miron; Trafny, Elżbieta Anna; Szpakowska, Małgorzata; Lewandowski, Rafał; Bombalska, Aneta; Kwaśny, Mirosław; Kopczyński, Krzysztof; Mularczyk-Oliwa, Monika
2015-03-01
The classification of dry powder samples is an important step in managing the consequences of terrorist incidents. Fluorescence decays of these samples (vegetative bacteria, bacterial endospores, fungi, albumins and several flours) were measured with the stroboscopic technique using an EasyLife LS system (PTI). Three pulsed nanosecond LED sources, generating 280, 340 and 460 nm, were employed for sample excitation. The usefulness of a new 460 nm light source for fluorescence measurements of dry microbial cells has been demonstrated. Principal component analysis (PCA) and hierarchical cluster analysis (HCA) were used for classification of the dry biological samples. It was shown that a single excitation wavelength was not sufficient for differentiation of biological samples of diverse origin. However, merging fluorescence decays from two or three excitation wavelengths allowed classification of these samples. An experimental setup allowing the practical implementation of this method for real-time fluorescence decay measurement was designed. It consisted of an LED emitting nanosecond pulses at 280 nm and two fast photomultiplier tubes (PMTs) for signal detection in two fluorescence bands simultaneously. The positive results of the dry powder sample measurements confirmed that the fluorescence decay-based technique could be a useful tool for fast classification of suspected "white powders" performed by first responders. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Fine-scale phylogenetic architecture of a complex bacterial community.
Acinas, Silvia G; Klepac-Ceraj, Vanja; Hunt, Dana E; Pharino, Chanathip; Ceraj, Ivica; Distel, Daniel L; Polz, Martin F
2004-07-29
Although molecular data have revealed the vast scope of microbial diversity, two fundamental questions remain unanswered even for well-defined natural microbial communities: how many bacterial types co-exist, and are such types naturally organized into phylogenetically discrete units of potential ecological significance? It has been argued that without such information, the environmental function, population biology and biogeography of microorganisms cannot be rigorously explored. Here we address these questions by comprehensive sampling of two large 16S ribosomal RNA clone libraries from a coastal bacterioplankton community. We show that compensation for artefacts generated by common library construction techniques reveals fine-scale patterns of community composition. At least 516 ribotypes (unique rRNA sequences) were detected in the sample and, by statistical extrapolation, at least 1,633 co-existing ribotypes in the sampled population. More than 50% of the ribotypes fall into discrete clusters containing less than 1% sequence divergence. This pattern cannot be accounted for by interoperon variation, indicating a large predominance of closely related taxa in this community. We propose that such microdiverse clusters arise by selective sweeps and persist because competitive mechanisms are too weak to purge diversity from within them.
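The "statistical extrapolation" of total richness from observed ribotype counts is commonly done with a nonparametric estimator such as Chao1; the sketch below shows Chao1 on an invented clone library and is not necessarily the estimator used in the paper.

    import numpy as np
    from collections import Counter

    def chao1(abundances):
        """Chao1 lower-bound estimate of total richness from abundance counts."""
        counts = np.asarray(abundances)
        s_obs = np.sum(counts > 0)
        f1 = np.sum(counts == 1)    # singletons
        f2 = np.sum(counts == 2)    # doubletons
        if f2 == 0:
            return s_obs + f1 * (f1 - 1) / 2.0   # bias-corrected form
        return s_obs + f1 ** 2 / (2.0 * f2)

    # toy clone library: many rare ribotypes, a few abundant ones
    rng = np.random.default_rng(10)
    library = rng.zipf(1.8, size=1000) % 400          # 1000 clones, up to 400 ribotypes
    abund = np.array(list(Counter(library).values()))
    print("observed ribotypes:", int(np.sum(abund > 0)),
          " Chao1 estimate:", round(float(chao1(abund)), 1))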
Candel, Math J J M; Van Breukelen, Gerard J P
2010-06-30
Adjustments of sample size formulas are given for varying cluster sizes in cluster randomized trials with a binary outcome when testing the treatment effect with mixed effects logistic regression using second-order penalized quasi-likelihood estimation (PQL). Starting from first-order marginal quasi-likelihood (MQL) estimation of the treatment effect, the asymptotic relative efficiency of unequal versus equal cluster sizes is derived. A Monte Carlo simulation study shows this asymptotic relative efficiency to be rather accurate for realistic sample sizes, when employing second-order PQL. An approximate, simpler formula is presented to estimate the efficiency loss due to varying cluster sizes when planning a trial. In many cases sampling 14 per cent more clusters is sufficient to repair the efficiency loss due to varying cluster sizes. Since current closed-form formulas for sample size calculation are based on first-order MQL, planning a trial also requires a conversion factor to obtain the variance of the second-order PQL estimator. In a second Monte Carlo study, this conversion factor turned out to be 1.25 at most. (c) 2010 John Wiley & Sons, Ltd.
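The efficiency comparison can be illustrated with the linear-model analogue: under compound symmetry, a cluster of size n_j carries information about a mean proportional to n_j / (1 + (n_j - 1)*ICC), so summing this over unequal versus equal cluster sizes with the same total gives a relative efficiency. This is a simplified stand-in for the paper's mixed logistic regression and PQL setting.

    import numpy as np

    def information(cluster_sizes, icc):
        """Total information about a mean under compound symmetry:
        each cluster of size n contributes n / (1 + (n - 1) * icc)."""
        n = np.asarray(cluster_sizes, float)
        return np.sum(n / (1.0 + (n - 1.0) * icc))

    rng = np.random.default_rng(11)
    icc = 0.05
    unequal = rng.gamma(shape=4, scale=10, size=30).round() + 2   # varying cluster sizes
    equal = np.full(30, unequal.mean())                            # same average size
    re = information(unequal, icc) / information(equal, icc)
    print(f"relative efficiency (unequal vs equal): {re:.3f}")
    print(f"extra clusters needed to compensate: {100 * (1 / re - 1):.1f}%")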
Searching for the 3.5 keV Line in the Stacked Suzaku Observations of Galaxy Clusters
NASA Technical Reports Server (NTRS)
Bulbul, Esra; Markevitch, Maxim; Foster, Adam; Miller, Eric; Bautz, Mark; Lowenstein, Mike; Randall, Scott W.; Smith, Randall K.
2016-01-01
We perform a detailed study of the stacked Suzaku observations of 47 galaxy clusters, spanning a redshift range of 0.01-0.45, to search for the unidentified 3.5 keV line. This sample provides an independent test for the previously detected line. We detect a 2sigma-significant spectral feature at 3.5 keV in the spectrum of the full sample. When the sample is divided into two subsamples (cool-core and non-cool core clusters), the cool-core subsample shows no statistically significant positive residuals at the line energy. A very weak (approx. 2sigma confidence) spectral feature at 3.5 keV is permitted by the data from the non-cool-core clusters sample. The upper limit on a neutrino decay mixing angle of sin^2(2theta) = 6.1 x 10^-11 from the full Suzaku sample is consistent with the previous detections in the stacked XMM-Newton sample of galaxy clusters (which had a higher statistical sensitivity to faint lines), M31, and Galactic center, at a 90% confidence level. However, the constraint from the present sample, which does not include the Perseus cluster, is in tension with previously reported line flux observed in the core of the Perseus cluster with XMM-Newton and Suzaku.
The Hubble Space Telescope Medium Deep Survey Cluster Sample: Methodology and Data
NASA Astrophysics Data System (ADS)
Ostrander, E. J.; Nichol, R. C.; Ratnatunga, K. U.; Griffiths, R. E.
1998-12-01
We present a new, objectively selected, sample of galaxy overdensities detected in the Hubble Space Telescope Medium Deep Survey (MDS). These clusters/groups were found using an automated procedure that involved searching for statistically significant galaxy overdensities. The contrast of the clusters against the field galaxy population is increased when morphological data are used to search around bulge-dominated galaxies. In total, we present 92 overdensities above a probability threshold of 99.5%. We show, via extensive Monte Carlo simulations, that at least 60% of these overdensities are likely to be real clusters and groups and not random line-of-sight superpositions of galaxies. For each overdensity in the MDS cluster sample, we provide a richness and the average of the bulge-to-total ratio of galaxies within each system. This MDS cluster sample potentially contains some of the most distant clusters/groups ever detected, with about 25% of the overdensities having estimated redshifts z > ~0.9. We have made this sample publicly available to facilitate spectroscopic confirmation of these clusters and help more detailed studies of cluster and galaxy evolution. We also report the serendipitous discovery of a new cluster close on the sky to the rich optical cluster Cl 0016+16 at z = 0.546. This new overdensity, HST 001831+16208, may be coincident with both an X-ray source and a radio source. HST 001831+16208 is the third cluster/group discovered near to Cl 0016+16 and appears to strengthen the claims of Connolly et al. of superclustering at high redshift.
Raina, Sunil Kumar; Mengi, Vijay; Singh, Gurdeep
2012-07-01
Breast feeding is universally and traditionally practised in India. Experts advocate breast feeding as the best method of feeding young infants. To assess the role of various factors in determining colostrum feeding in block R. S. Pura of district Jammu. A stratified two-stage design was used, with villages as the primary sampling unit and lactating mothers as the secondary sampling unit. Villages were divided into different clusters on the basis of population, and sampling units were selected by a simple random technique. Breastfeeding is almost universal in R. S. Pura. Differentials in discarding the first milk were not found to be important among the various socioeconomic groups, and the phenomenon appeared more general than specific.
Unequal cluster sizes in stepped-wedge cluster randomised trials: a systematic review.
Kristunas, Caroline; Morris, Tom; Gray, Laura
2017-11-15
To investigate the extent to which cluster sizes vary in stepped-wedge cluster randomised trials (SW-CRT) and whether any variability is accounted for during the sample size calculation and analysis of these trials. Any, not limited to healthcare settings. Any taking part in an SW-CRT published up to March 2016. The primary outcome is the variability in cluster sizes, measured by the coefficient of variation (CV) in cluster size. Secondary outcomes include the difference between the cluster sizes assumed during the sample size calculation and those observed during the trial, any reported variability in cluster sizes and whether the methods of sample size calculation and methods of analysis accounted for any variability in cluster sizes. Of the 101 included SW-CRTs, 48% mentioned that the included clusters were known to vary in size, yet only 13% of these accounted for this during the calculation of the sample size. However, 69% of the trials did use a method of analysis appropriate for when clusters vary in size. Full trial reports were available for 53 trials. The CV was calculated for 23 of these: the median CV was 0.41 (IQR: 0.22-0.52). Actual cluster sizes could be compared with those assumed during the sample size calculation for 14 (26%) of the trial reports; the cluster sizes were between 29% and 480% of that which had been assumed. Cluster sizes often vary in SW-CRTs. Reporting of SW-CRTs also remains suboptimal. The effect of unequal cluster sizes on the statistical power of SW-CRTs needs further exploration and methods appropriate to studies with unequal cluster sizes need to be employed. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Kristunas, Caroline A; Smith, Karen L; Gray, Laura J
2017-03-07
The current methodology for sample size calculations for stepped-wedge cluster randomised trials (SW-CRTs) is based on the assumption of equal cluster sizes. However, as is often the case in cluster randomised trials (CRTs), the clusters in SW-CRTs are likely to vary in size, which in other designs of CRT leads to a reduction in power. The effect of an imbalance in cluster size on the power of SW-CRTs has not previously been reported, nor what an appropriate adjustment to the sample size calculation should be to allow for any imbalance. We aimed to assess the impact of an imbalance in cluster size on the power of a cross-sectional SW-CRT and recommend a method for calculating the sample size of a SW-CRT when there is an imbalance in cluster size. The effect of varying degrees of imbalance in cluster size on the power of SW-CRTs was investigated using simulations. The sample size was calculated using both the standard method and two proposed adjusted design effects (DEs), based on those suggested for CRTs with unequal cluster sizes. The data were analysed using generalised estimating equations with an exchangeable correlation matrix and robust standard errors. An imbalance in cluster size was not found to have a notable effect on the power of SW-CRTs. The two proposed adjusted DEs resulted in trials that were generally considerably over-powered. We recommend that the standard method of sample size calculation for SW-CRTs be used, provided that the assumptions of the method hold. However, it would be beneficial to investigate, through simulation, what effect the maximum likely amount of inequality in cluster sizes would be on the power of the trial and whether any inflation of the sample size would be required.
CORS BAADE-WESSELINK DISTANCE TO THE LMC NGC 1866 BLUE POPULOUS CLUSTER
DOE Office of Scientific and Technical Information (OSTI.GOV)
Molinaro, R.; Ripepi, V.; Marconi, M.
2012-03-20
We used optical, near-infrared photometry, and radial velocity data for a sample of 11 Cepheids belonging to the young LMC blue populous cluster NGC 1866 to estimate their radii and distances on the basis of the CORS Baade-Wesselink method. This technique, based on an accurate calibration of surface brightness as a function of (U - B), (V - K) colors, allows us to estimate, simultaneously, the linear radius and the angular diameter of Cepheid variables, and consequently to derive their distance. A rigorous error estimate on radii and distances was derived by using Monte Carlo simulations. Our analysis gives a distance modulus for NGC 1866 of 18.51 ± 0.03 mag, which is in agreement with several independent results.
The theory of variational hybrid quantum-classical algorithms
NASA Astrophysics Data System (ADS)
McClean, Jarrod R.; Romero, Jonathan; Babbush, Ryan; Aspuru-Guzik, Alán
2016-02-01
Many quantum algorithms have daunting resource requirements when compared to what is available today. To address this discrepancy, a quantum-classical hybrid optimization scheme known as ‘the quantum variational eigensolver’ was developed (Peruzzo et al 2014 Nat. Commun. 5 4213) with the philosophy that even minimal quantum resources could be made useful when used in conjunction with classical routines. In this work we extend the general theory of this algorithm and suggest algorithmic improvements for practical implementations. Specifically, we develop a variational adiabatic ansatz and explore unitary coupled cluster where we establish a connection from second order unitary coupled cluster to universal gate sets through a relaxation of exponential operator splitting. We introduce the concept of quantum variational error suppression that allows some errors to be suppressed naturally in this algorithm on a pre-threshold quantum device. Additionally, we analyze truncation and correlated sampling in Hamiltonian averaging as ways to reduce the cost of this procedure. Finally, we show how the use of modern derivative free optimization techniques can offer dramatic computational savings of up to three orders of magnitude over previously used optimization techniques.
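As a toy illustration of the hybrid quantum-classical loop described above, the sketch below classically simulates a one-parameter ansatz for a single-qubit Hamiltonian and minimises the energy expectation with a derivative-free optimiser; the Hamiltonian, the ansatz and the use of scipy's Nelder-Mead are illustrative assumptions, and shot noise from Hamiltonian averaging is ignored.

```python
import numpy as np
from scipy.optimize import minimize

# Pauli matrices and a toy one-qubit Hamiltonian H = 0.5*Z + 0.3*X (assumed for illustration).
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = 0.5 * Z + 0.3 * X

def ansatz(theta):
    """Hypothetical single-parameter ansatz |psi(theta)> = Ry(theta)|0>."""
    return np.array([np.cos(theta / 2), np.sin(theta / 2)], dtype=complex)

def energy(params):
    """Idealised estimate of <psi|H|psi> (no sampling noise)."""
    psi = ansatz(params[0])
    return float(np.real(np.conj(psi) @ H @ psi))

# Derivative-free classical outer loop, as advocated for noisy pre-threshold hardware.
result = minimize(energy, x0=[0.1], method="Nelder-Mead")
print(result.x, result.fun, np.linalg.eigvalsh(H)[0])  # compare to the exact ground energy
```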
X-ray and optical substructures of the DAFT/FADA survey clusters
NASA Astrophysics Data System (ADS)
Guennou, L.; Durret, F.; Adami, C.; Lima Neto, G. B.
2013-04-01
We have undertaken the DAFT/FADA survey with the double aim of setting constraints on dark energy based on weak lensing tomography and of obtaining homogeneous and high quality data for a sample of 91 massive clusters in the redshift range 0.4-0.9 for which there were HST archive data. We have analysed the XMM-Newton data available for 42 of these clusters to derive their X-ray temperatures and luminosities and search for substructures. Out of these, a spatial analysis was possible for 30 clusters, but only 23 had deep enough X-ray data for a really robust analysis. This study was coupled with a dynamical analysis for the 26 clusters having at least 30 spectroscopic galaxy redshifts in the cluster range. Altogether, the X-ray sample of 23 clusters and the optical sample of 26 clusters have 14 clusters in common. We present preliminary results on the coupled X-ray and dynamical analyses of these 14 clusters.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Colucci, Janet E.; Bernstein, Rebecca A.; Cameron, Scott A.
2011-07-01
In this paper, we refine our method for the abundance analysis of high-resolution spectroscopy of the integrated light of unresolved globular clusters (GCs). This method was previously demonstrated for the analysis of old (>10 Gyr) Milky Way (MW) GCs. Here, we extend the technique to young clusters using a training set of nine GCs in the Large Magellanic Cloud. Depending on the signal-to-noise ratio of the data, we use 20-100 Fe lines per cluster to successfully constrain the ages of old clusters to within a ~5 Gyr range, the ages of ~2 Gyr clusters to a 1-2 Gyr range, and the ages of the youngest clusters (0.05-1 Gyr) to a ~200 Myr range. We also demonstrate that we can measure [Fe/H] in clusters with any age less than 12 Gyr with similar or only slightly larger uncertainties (0.1-0.25 dex) than those obtained for old MW GCs (0.1 dex); the slightly larger uncertainties are due to the rapid evolution in stellar populations at these ages. In this paper, we present only Fe abundances and ages. In the next paper in this series, we present our complete analysis of ~20 elements for which we are able to measure abundances. For several of the clusters in this sample, there are no high-resolution abundances in the literature from individual member stars; our results are the first detailed chemical abundances available. The spectra used in this paper were obtained at Las Campanas with the echelle on the du Pont Telescope and with the MIKE spectrograph on the Magellan Clay Telescope.
Novel density-based and hierarchical density-based clustering algorithms for uncertain data.
Zhang, Xianchao; Liu, Han; Zhang, Xiaotong
2017-09-01
Uncertain data has posed a great challenge to traditional clustering algorithms. Recently, several algorithms have been proposed for clustering uncertain data, and among them density-based techniques seem promising for handling data uncertainty. However, some issues like losing uncertain information, high time complexity and nonadaptive threshold have not been addressed well in the previous density-based algorithm FDBSCAN and hierarchical density-based algorithm FOPTICS. In this paper, we firstly propose a novel density-based algorithm PDBSCAN, which improves the previous FDBSCAN from the following aspects: (1) it employs a more accurate method to compute the probability that the distance between two uncertain objects is less than or equal to a boundary value, instead of the sampling-based method in FDBSCAN; (2) it introduces new definitions of probability neighborhood, support degree, core object probability, direct reachability probability, thus reducing the complexity and solving the issue of nonadaptive threshold (for core object judgement) in FDBSCAN. Then, we modify the algorithm PDBSCAN to an improved version (PDBSCANi), by using a better cluster assignment strategy to ensure that every object will be assigned to the most appropriate cluster, thus solving the issue of nonadaptive threshold (for direct density reachability judgement) in FDBSCAN. Furthermore, as PDBSCAN and PDBSCANi have difficulties for clustering uncertain data with non-uniform cluster density, we propose a novel hierarchical density-based algorithm POPTICS by extending the definitions of PDBSCAN, adding new definitions of fuzzy core distance and fuzzy reachability distance, and employing a new clustering framework. POPTICS can reveal the cluster structures of the datasets with different local densities in different regions better than PDBSCAN and PDBSCANi, and it addresses the issues in FOPTICS. Experimental results demonstrate the superiority of our proposed algorithms over the existing algorithms in accuracy and efficiency. Copyright © 2017 Elsevier Ltd. All rights reserved.
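For orientation, the sketch below illustrates the kind of probabilistic neighbourhood computation that density-based clustering of uncertain data relies on: the probability that two uncertain objects lie within a distance eps is estimated here by Monte Carlo sampling (the baseline that the paper improves upon with a more accurate method), and an object is treated as a probable core object when its expected number of eps-neighbours reaches min_pts. The Gaussian uncertainty model and the function names are assumptions for illustration, not the definitions used in PDBSCAN.

```python
import numpy as np

rng = np.random.default_rng(0)

def prob_within_eps(mu_a, cov_a, mu_b, cov_b, eps, n_draws=5000):
    """Monte Carlo estimate of P(||A - B|| <= eps) for two uncertain objects
    modelled as Gaussians (a baseline sketch only)."""
    a = rng.multivariate_normal(mu_a, cov_a, size=n_draws)
    b = rng.multivariate_normal(mu_b, cov_b, size=n_draws)
    return np.mean(np.linalg.norm(a - b, axis=1) <= eps)

def is_probable_core(idx, mus, covs, eps, min_pts):
    """Probability-weighted neighbourhood count: an object is a probable core
    object if its expected number of eps-neighbours reaches min_pts."""
    support = sum(prob_within_eps(mus[idx], covs[idx], mus[j], covs[j], eps)
                  for j in range(len(mus)) if j != idx)
    return support >= min_pts - 1   # the object itself always counts

mus = [np.array([0.0, 0.0]), np.array([0.2, 0.1]), np.array([3.0, 3.0])]
covs = [0.05 * np.eye(2)] * 3
print(is_probable_core(0, mus, covs, eps=0.5, min_pts=2))
```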
LENR BEC Clusters on and below Wires through Cavitation and Related Techniques
NASA Astrophysics Data System (ADS)
Stringham, Roger; Stringham, Julie
2011-03-01
During the last two years I have been working on BEC cluster densities deposited just under the surface of wires, using cavitation and other techniques. If I can get the concentration high enough before the clusters dissipate, then, in addition to cold-fusion-related excess heat (and other effects, including helium-4 formation), I anticipate that it may be possible to initiate transient forms of superconductivity at room temperature.
Galaxy masses in large surveys: Connecting luminous and dark matter with weak lensing and kinematics
NASA Astrophysics Data System (ADS)
Reyes, Reinabelle
2011-01-01
Galaxy masses are difficult to determine because light traces stars and gas in a non-trivial way, and does not trace dark matter, which extends well beyond the luminous regions of galaxies. In this thesis, I use the most direct probes of dark matter available---weak gravitational lensing and galaxy kinematics---to trace the total mass in galaxies (and galaxy clusters) in large surveys. In particular, I use the large, homogeneous dataset from the Sloan Digital Sky Survey (SDSS), which provides spectroscopic redshifts for a large sample of galaxies at z ≲ 0.2 and imaging data to a depth of r < 22. By combining complementary probes, I am able to obtain robust observational constraints that cannot be obtained from any single technique alone. First, I use weak lensing of galaxy clusters to derive an optimal optical tracer of cluster mass, which was found to be a combination of cluster richness and the luminosity of the brightest cluster galaxy. Next, I combine weak lensing of luminous red galaxies with redshift distortions and clustering measurements to derive a robust probe of gravity on cosmological scales. Finally, I combine weak lensing with the kinematics of disk galaxies to constrain the total mass profile over several orders of magnitude. I derive a minimal-scatter relation between disk velocity and stellar mass (also known as the Tully-Fisher relation) that can be used, by construction, on a similarly-selected lens sample. Then, I combine this relation with halo mass measurements from weak lensing to place constraints on the ratio of the optical to virial velocities, as well as the ratio of halo to stellar masses, both as a function of stellar mass. These results will serve as inputs to and constraints on disk galaxy formation models, which will be explored in future work.
DICON: interactive visual analysis of multidimensional clusters.
Cao, Nan; Gotz, David; Sun, Jimeng; Qu, Huamin
2011-12-01
Clustering as a fundamental data analysis technique has been widely used in many analytic applications. However, it is often difficult for users to understand and evaluate multidimensional clustering results, especially the quality of clusters and their semantics. For large and complex data, high-level statistical information about the clusters is often needed for users to evaluate cluster quality while a detailed display of multidimensional attributes of the data is necessary to understand the meaning of clusters. In this paper, we introduce DICON, an icon-based cluster visualization that embeds statistical information into a multi-attribute display to facilitate cluster interpretation, evaluation, and comparison. We design a treemap-like icon to represent a multidimensional cluster, and the quality of the cluster can be conveniently evaluated with the embedded statistical information. We further develop a novel layout algorithm which can generate similar icons for similar clusters, making comparisons of clusters easier. User interaction and clutter reduction are integrated into the system to help users more effectively analyze and refine clustering results for large datasets. We demonstrate the power of DICON through a user study and a case study in the healthcare domain. Our evaluation shows the benefits of the technique, especially in support of complex multidimensional cluster analysis. © 2011 IEEE
Techniques for spatio-temporal analysis of vegetation fires in the tropical belt of Africa
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brivio, P.A.; Ober, G.; Koffi, B.
1995-12-31
Biomass burning of forests and savannas is a phenomenon of continental or even global proportions, capable of causing large scale environmental changes. Satellite space observations, in particular from NOAA-AVHRR GAC data, are the only source of information allowing one to document burning patterns at regional and continental scale and over long periods of time. This paper presents some techniques, such as clustering and rose-diagrams, useful in the spatio-temporal analysis of satellite-derived fire maps to characterize the evolution of spatial patterns of vegetation fires at regional scale. An automatic clustering approach is presented which enables one to describe and parameterize the spatial distribution of fire patterns at different scales. The problem of the geographical distribution of vegetation fires with respect to some location of interest, point or line, is also considered. In particular, rose-diagrams are used to relate fire patterns to some reference point, such as experimental sites of tropospheric chemistry measurements. Different temporal data sets in the tropical belt of Africa, covering both Northern and Southern Hemisphere dry seasons, were analyzed using these techniques and showed very promising results when compared with data from rain chemistry studies at different sampling sites in the equatorial forest.
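A minimal sketch of the rose-diagram idea is given below: bearings of fire locations around a reference site are binned into angular sectors. The planar coordinates, the sector count and the synthetic fire pixels are assumptions for illustration only.

```python
import numpy as np

def rose_diagram(fire_xy, site_xy, n_sectors=12):
    """Bin the bearings of fire locations around a reference site into angular
    sectors (a simple stand-in for the rose-diagram analysis; coordinates are
    assumed to be in a local planar projection, with 0 degrees = north)."""
    d = np.asarray(fire_xy, dtype=float) - np.asarray(site_xy, dtype=float)
    bearings = (np.degrees(np.arctan2(d[:, 0], d[:, 1])) + 360.0) % 360.0
    edges = np.linspace(0.0, 360.0, n_sectors + 1)
    counts, _ = np.histogram(bearings, bins=edges)
    return edges, counts

fires = np.random.default_rng(1).uniform(-100, 100, size=(500, 2))  # synthetic fire pixels (km)
edges, counts = rose_diagram(fires, site_xy=(0.0, 0.0))
for lo, hi, c in zip(edges[:-1], edges[1:], counts):
    print(f"{lo:5.0f}-{hi:5.0f} deg: {c} fires")
```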
An improved initialization center k-means clustering algorithm based on distance and density
NASA Astrophysics Data System (ADS)
Duan, Yanling; Liu, Qun; Xia, Shuyin
2018-04-01
The k-means algorithm selects its initial cluster centers at random, so the clustering results are influenced by outlying data samples and are unstable across repeated runs. To address this, a center initialization method based on large distance and high density is proposed. The reciprocal of the weighted average distance is used to represent sample density, and data samples with both larger distances and higher densities are selected as the initial cluster centers to improve the clustering results. A clustering evaluation method based on distance and density is then designed to verify the feasibility and practicality of the algorithm; experimental results on UCI data sets show that the algorithm is stable and practical.
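A minimal sketch of such a distance-and-density initialization is given below; the exact weighting and the scoring rule used here (distance to the nearest chosen centre multiplied by density) are assumptions standing in for the authors' formulation.

```python
import numpy as np

def init_centers(X, k):
    """Pick initial k-means centres that combine high local density with large
    separation: density is taken as the reciprocal of the mean distance to all
    other points (a sketch of the idea, not the authors' exact weighting)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    density = 1.0 / (D.sum(axis=1) / (len(X) - 1))
    centers = [int(np.argmax(density))]            # densest point first
    for _ in range(1, k):
        # score remaining points by (distance to nearest chosen centre) * density
        dist_to_centers = D[:, centers].min(axis=1)
        score = dist_to_centers * density
        score[centers] = -np.inf
        centers.append(int(np.argmax(score)))
    return X[centers]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, size=(50, 2)) for c in ((0, 0), (3, 3), (0, 4))])
print(init_centers(X, k=3))   # these centres can then seed a standard k-means run
```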
High Prevalence of Intermediate Leptospira spp. DNA in Febrile Humans from Urban and Rural Ecuador.
Chiriboga, Jorge; Barragan, Verónica; Arroyo, Gabriela; Sosa, Andrea; Birdsell, Dawn N; España, Karool; Mora, Ana; Espín, Emilia; Mejía, María Eugenia; Morales, Melba; Pinargote, Carmina; Gonzalez, Manuel; Hartskeerl, Rudy; Keim, Paul; Bretas, Gustavo; Eisenberg, Joseph N S; Trueba, Gabriel
2015-12-01
Leptospira spp., which comprise 3 clusters (pathogenic, saprophytic, and intermediate) that vary in pathogenicity, infect >1 million persons worldwide each year. The disease burden of the intermediate leptospires is unclear. To increase knowledge of this cluster, we used new molecular approaches to characterize Leptospira spp. in 464 samples from febrile patients in rural, semiurban, and urban communities in Ecuador; in 20 samples from nonfebrile persons in the rural community; and in 206 samples from animals in the semiurban community. We observed a higher percentage of leptospiral DNA-positive samples from febrile persons in rural (64%) versus urban (21%) and semiurban (25%) communities; no leptospires were detected in nonfebrile persons. The percentage of intermediate cluster strains in humans (96%) was higher than that of pathogenic cluster strains (4%); strains in animal samples belonged to intermediate (49%) and pathogenic (51%) clusters. Intermediate cluster strains may be causing a substantial amount of fever in coastal Ecuador.
High Prevalence of Intermediate Leptospira spp. DNA in Febrile Humans from Urban and Rural Ecuador
Chiriboga, Jorge; Barragan, Verónica; Arroyo, Gabriela; Sosa, Andrea; Birdsell, Dawn N.; España, Karool; Mora, Ana; Espín, Emilia; Mejía, María Eugenia; Morales, Melba; Pinargote, Carmina; Gonzalez, Manuel; Hartskeerl, Rudy; Keim, Paul; Bretas, Gustavo; Eisenberg, Joseph N.S.
2015-01-01
Leptospira spp., which comprise 3 clusters (pathogenic, saprophytic, and intermediate) that vary in pathogenicity, infect >1 million persons worldwide each year. The disease burden of the intermediate leptospires is unclear. To increase knowledge of this cluster, we used new molecular approaches to characterize Leptospira spp. in 464 samples from febrile patients in rural, semiurban, and urban communities in Ecuador; in 20 samples from nonfebrile persons in the rural community; and in 206 samples from animals in the semiurban community. We observed a higher percentage of leptospiral DNA–positive samples from febrile persons in rural (64%) versus urban (21%) and semiurban (25%) communities; no leptospires were detected in nonfebrile persons. The percentage of intermediate cluster strains in humans (96%) was higher than that of pathogenic cluster strains (4%); strains in animal samples belonged to intermediate (49%) and pathogenic (51%) clusters. Intermediate cluster strains may be causing a substantial amount of fever in coastal Ecuador. PMID:26583534
Application of Artificial Intelligence For Euler Solutions Clustering
NASA Astrophysics Data System (ADS)
Mikhailov, V.; Galdeano, A.; Diament, M.; Gvishiani, A.; Agayan, S.; Bogoutdinov, Sh.; Graeva, E.; Sailhac, P.
Results of Euler deconvolution strongly depend on the selection of viable solutions. Synthetic calculations using multiple causative sources show that Euler solutions cluster in the vicinity of causative bodies even when they do not group densely about the perimeter of the bodies. We have developed a clustering technique to serve as a tool for selecting appropriate solutions. The method RODIN, employed in this study, is based on artificial intelligence and was originally designed for problems of classification of large data sets. It is based on a geometrical approach to studying object concentration in a finite metric space of any dimension. The method uses a formal definition of cluster and includes free parameters that facilitate the search for clusters of given properties. Tests on synthetic and real data showed that the clustering technique successfully outlines causative bodies more accurately than other methods of discriminating Euler solutions. In complicated field cases, such as the magnetic field in the Gulf of Saint Malo region (Brittany, France), the method provides geologically insightful solutions. Other advantages of applying the clustering method are: - Clusters provide solutions associated with particular bodies or parts of bodies, permitting the analysis of different clusters of Euler solutions separately. This may allow computation of average parameters for individual causative bodies. - Those measurements of the anomalous field that yield clusters also form dense clusters themselves. The application of the clustering technique thus outlines areas where the influence of different causative sources is more prominent. This allows one to focus on areas for reinterpretation, using different window sizes, structural indices and so on.
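Since RODIN itself is not described in detail here, the sketch below uses an off-the-shelf density-based clusterer (DBSCAN) as a stand-in to show how clustering separates Euler solutions that concentrate near causative bodies from scattered, unreliable ones; the synthetic solutions and parameter values are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Synthetic Euler deconvolution solutions: (x, y, depth) estimates, many of which
# scatter widely while viable ones concentrate near the causative bodies.
rng = np.random.default_rng(2)
body_a = rng.normal([10.0, 10.0, 2.0], [0.3, 0.3, 0.2], size=(80, 3))
body_b = rng.normal([25.0, 5.0, 4.0], [0.3, 0.3, 0.2], size=(60, 3))
noise = rng.uniform([0, 0, 0], [40, 20, 8], size=(100, 3))
solutions = np.vstack([body_a, body_b, noise])

# DBSCAN stands in for the RODIN classifier described above: it likewise keeps
# only solutions that concentrate densely and discards isolated ones.
labels = DBSCAN(eps=1.0, min_samples=10).fit_predict(solutions)
for lab in sorted(set(labels)):
    members = solutions[labels == lab]
    tag = "rejected" if lab == -1 else f"cluster {lab}"
    print(tag, len(members), members.mean(axis=0).round(2))
```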
"A Richness Study of 14 Distant X-Ray Clusters from the 160 Square Degree Survey"
NASA Technical Reports Server (NTRS)
Jones, Christine; West, Donald (Technical Monitor)
2001-01-01
We have measured the surface density of galaxies toward 14 X-ray-selected cluster candidates at redshifts z_i ≥ 0.46, and we show that they are associated with rich galaxy concentrations. These clusters, having X-ray luminosities of L_X(0.5-2 keV) ≈ (0.5-2.6) × 10^44 erg/s, are among the most distant and luminous in our 160 deg^2 ROSAT Position Sensitive Proportional Counter cluster survey. We find that the clusters range between Abell richness classes 0 and 2 and have a most probable richness class of 1. We compare the richness distribution of our distant clusters to those for three samples of nearby clusters with similar X-ray luminosities. We find that the nearby and distant samples have similar richness distributions, which shows that clusters have apparently not evolved substantially in richness since redshift z=0.5. There is, however, a marginal tendency for the distant clusters to be slightly poorer than nearby clusters, although deeper multicolor data for a large sample would be required to confirm this trend. We compare the distribution of distant X-ray clusters in the L_X-richness plane to the distribution of optically selected clusters from the Palomar Distant Cluster Survey. The optically selected clusters appear overly rich for their X-ray luminosities when compared to X-ray-selected clusters. Apparently, X-ray and optical surveys do not necessarily sample identical mass concentrations at large redshifts. This may indicate the existence of a population of optically rich clusters with anomalously low X-ray emission. More likely, however, it reflects the tendency for optical surveys to select unvirialized mass concentrations, as might be expected when peering along large-scale filaments.
[A comparative study of maintenance services using the data-mining technique].
Cruz, Antonio M; Aguilera-Huertas, Wilmer A; Días-Mora, Darío A
2009-08-01
The main goal of this research was to compare the maintenance service quality of two hospitals. One of them had a contract service; the other one had an in-house maintenance service. The authors followed these stages when conducting the research: domain understanding, data characterisation and sample reduction, insight characterisation and building the TAT predictor. Multiple linear regression and clustering techniques were used for improving the efficiency of corrective maintenance tasks in a clinical engineering department (CED). The indicator studied was turnaround time (TAT). The institution having an in-house maintenance service had better quality indicators than the one with a contract maintenance service. There was a linear dependence between availability and service productivity.
The XXL survey XV: evidence for dry merger driven BCG growth in XXL-100-GC X-ray clusters
NASA Astrophysics Data System (ADS)
Lavoie, S.; Willis, J. P.; Démoclès, J.; Eckert, D.; Gastaldello, F.; Smith, G. P.; Lidman, C.; Adami, C.; Pacaud, F.; Pierre, M.; Clerc, N.; Giles, P.; Lieu, M.; Chiappetti, L.; Altieri, B.; Ardila, F.; Baldry, I.; Bongiorno, A.; Desai, S.; Elyiv, A.; Faccioli, L.; Gardner, B.; Garilli, B.; Groote, M. W.; Guennou, L.; Guzzo, L.; Hopkins, A. M.; Liske, J.; McGee, S.; Melnyk, O.; Owers, M. S.; Poggianti, B.; Ponman, T. J.; Scodeggio, M.; Spitler, L.; Tuffs, R. J.
2016-11-01
The growth of brightest cluster galaxies (BCGs) is closely related to the properties of their host cluster. We present evidence for dry mergers as the dominant source of BCG mass growth at z ≲ 1 in the XXL 100 brightest cluster sample. We use the global red sequence, Hα emission and mean star formation history to show that BCGs in the sample possess star formation levels comparable to field ellipticals of similar stellar mass and redshift. XXL 100 brightest clusters are less massive on average than those in other X-ray selected samples such as LoCuSS or HIFLUGCS. Few clusters in the sample display high central gas concentration, rendering inefficient the growth of BCGs via star formation resulting from the accretion of cool gas. Using measures of the relaxation state of their host clusters, we show that BCGs grow as relaxation proceeds. We find that the BCG stellar mass corresponds to a relatively constant fraction 1 per cent of the total cluster mass in relaxed systems. We also show that, following a cluster scale merger event, the BCG stellar mass lags behind the expected value from the Mcluster-MBCG relation but subsequently accretes stellar mass via dry mergers as the BCG and cluster evolve towards a relaxed state.
Clustering approaches to identifying gene expression patterns from DNA microarray data.
Do, Jin Hwan; Choi, Dong-Kug
2008-04-30
The analysis of microarray data is essential for handling the large amounts of gene expression data they produce. In this review we focus on clustering techniques. The biological rationale for this approach is the fact that many co-expressed genes are co-regulated, and identifying co-expressed genes could aid in functional annotation of novel genes, de novo identification of transcription factor binding sites and elucidation of complex biological pathways. Co-expressed genes are usually identified in microarray experiments by clustering techniques. There are many such methods, and the results obtained even for the same datasets may vary considerably depending on the algorithms and metrics for dissimilarity measures used, as well as on user-selectable parameters such as desired number of clusters and initial values. Therefore, biologists who want to interpret microarray data should be aware of the weaknesses and strengths of the clustering methods used. In this review, we survey the basic principles of clustering of DNA microarray data from crisp clustering algorithms such as hierarchical clustering, K-means and self-organizing maps, to complex clustering algorithms like fuzzy clustering.
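The sketch below contrasts two of the crisp methods mentioned above, average-linkage hierarchical clustering with a correlation-based dissimilarity and K-means, on a small synthetic expression matrix; the data and parameter choices are assumptions for illustration, and the point is precisely that the two partitions need not coincide.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import KMeans

# Synthetic expression matrix: 60 genes x 8 conditions, three co-expressed groups.
rng = np.random.default_rng(3)
profiles = np.array([[2, 2, 2, 0, 0, 0, 1, 1],
                     [0, 0, 1, 2, 2, 2, 0, 0],
                     [1, 0, 0, 0, 1, 0, 2, 2]], dtype=float)
X = np.vstack([p + rng.normal(0, 0.3, size=(20, 8)) for p in profiles])

# Hierarchical (crisp) clustering with a correlation-based dissimilarity ...
Z = linkage(X, method="average", metric="correlation")
hier_labels = fcluster(Z, t=3, criterion="maxclust")

# ... versus K-means on the same data; the partitions need not agree, which is
# exactly the sensitivity to algorithm choice discussed above.
km_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(np.bincount(hier_labels)[1:], np.bincount(km_labels))
```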
NASA Technical Reports Server (NTRS)
Sehgal, Neelima; Trac, Hy; Acquaviva, Viviana; Ade, Peter A. R.; Aguirre, Paula; Amiri, Mandana; Appel, John W.; Barrientos, L. Felipe; Battistelli, Elia S.; Bond, J. Richard;
2010-01-01
We present constraints on cosmological parameters based on a sample of Sunyaev-Zel'dovich-selected galaxy clusters detected in a millimeter-wave survey by the Atacama Cosmology Telescope. The cluster sample used in this analysis consists of 9 optically-confirmed high-mass clusters comprising the high-significance end of the total cluster sample identified in 455 square degrees of sky surveyed during 2008 at 148 GHz. We focus on the most massive systems to reduce the degeneracy between unknown cluster astrophysics and cosmology derived from SZ surveys. We describe the scaling relation between cluster mass and SZ signal with a 4-parameter fit. Marginalizing over the values of the parameters in this fit with conservative priors gives σ8 = 0.851 ± 0.115 and w = -1.14 ± 0.35 for a spatially-flat wCDM cosmological model with WMAP 7-year priors on cosmological parameters. This gives a modest improvement in statistical uncertainty over WMAP 7-year constraints alone. Fixing the scaling relation between cluster mass and SZ signal to a fiducial relation obtained from numerical simulations and calibrated by X-ray observations, we find σ8 = 0.821 ± 0.044 and w = -1.05 ± 0.20. These results are consistent with constraints from WMAP 7 plus baryon acoustic oscillations plus type Ia supernovae, which give σ8 = 0.802 ± 0.038 and w = -0.98 ± 0.053. A stacking analysis of the clusters in this sample compared to clusters simulated assuming the fiducial model also shows good agreement. These results suggest that, given the sample of clusters used here, both the astrophysics of massive clusters and the cosmological parameters derived from them are broadly consistent with current models.
The quantitative analysis of silicon carbide surface smoothing by Ar and Xe cluster ions
NASA Astrophysics Data System (ADS)
Ieshkin, A. E.; Kireev, D. S.; Ermakov, Yu. A.; Trifonov, A. S.; Presnov, D. E.; Garshev, A. V.; Anufriev, Yu. V.; Prokhorova, I. G.; Krupenin, V. A.; Chernysh, V. S.
2018-04-01
The gas cluster ion beam technique was used for smoothing the surface of a silicon carbide crystal. The effect of processing with two inert cluster ion species, argon and xenon, was quantitatively compared. While argon is a standard element for GCIB, results for xenon clusters had not been reported previously. Scanning probe microscopy and high-resolution transmission electron microscopy techniques were used for the analysis of the surface roughness and the quality of the surface crystal layer. The gas cluster ion beam processing results in smoothing of the surface relief down to an average roughness of about 1 nm for both elements. It was shown that xenon as the working gas is more effective: the sputtering rate for xenon clusters is 2.5 times higher than for argon at the same beam energy. High-resolution transmission electron microscopy analysis of the surface defect layer gives values of 7 ± 2 nm and 8 ± 2 nm for treatment with argon and xenon clusters, respectively.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Morgan, T.L.
1979-11-01
During the summers of 1976 and 1977, 570 water and 1249 sediment samples were collected from 1517 locations within the 18,000-km² area of the Arminto NTMS quadrangle of central Wyoming. Water samples were collected from wells, springs, streams, and artificial ponds; sediment samples were collected from wet and dry streams, springs, and wet and dry ponds. All water samples were analyzed for 13 elements, including uranium, and each sediment sample was analyzed for 43 elements, including uranium and thorium. Uranium concentrations in water samples range from below the detection limit to 84.60 parts per billion (ppb) with a mean of 4.32 ppb. All water sample types except pond water samples were considered as a single population in interpreting the data. Pond water samples were excluded due to possible concentration of uranium by evaporation. Most of the water samples containing greater than 20 ppb uranium grouped into six clusters that indicate possible areas of interest for further investigation. One cluster is associated with the Pumpkin Buttes District, and two others are near the Kaycee and Mayoworth areas of uranium mineralization. The largest cluster is located on the west side of the Powder River Basin. One cluster is located in the central Big Horn Basin and another is in the Wind River Basin; both are in areas underlain by favorable host units. Uranium concentrations in sediment samples range from 0.08 parts per million (ppm) to 115.50 ppm with a mean of 3.50 ppm. Two clusters of sediment samples over 7 ppm were delineated. The first, containing the two highest-concentration samples, corresponds with the Copper Mountain District. Many of the high uranium concentrations in samples in this cluster may be due to contamination from mining or prospecting activity upstream from the sample sites. The second cluster encompasses a wide area in the Wind River Basin along the southern boundary of the quadrangle.
Planck/SDSS Cluster Mass and Gas Scaling Relations for a Volume-Complete redMaPPer Sample
NASA Astrophysics Data System (ADS)
Jimeno, Pablo; Diego, Jose M.; Broadhurst, Tom; De Martino, I.; Lazkoz, Ruth
2018-04-01
Using Planck satellite data, we construct Sunyaev-Zel'dovich (SZ) gas pressure profiles for a large, volume-complete sample of optically selected clusters. We have defined a sample of over 8,000 redMaPPer clusters from the Sloan Digital Sky Survey (SDSS), within the volume-complete redshift region 0.100 < z < 0.325, for which we construct SZ effect maps by stacking Planck data over the full range of richness. Dividing the sample into richness bins we simultaneously solve for the mean cluster mass in each bin together with the corresponding radial pressure profile parameters, employing an MCMC analysis. These profiles are well detected over a much wider range of cluster mass and radius than previous work, showing a clear trend towards larger break radius with increasing cluster mass. Our SZ-based masses fall ˜16% below the mass-richness relations from weak lensing, in a similar fashion as the "hydrostatic bias" related with X-ray derived masses. Finally, we derive a tight Y500-M500 relation over a wide range of cluster mass, with a power law slope equal to 1.70 ± 0.07, that agrees well with the independent slope obtained by the Planck team with an SZ-selected cluster sample, but extends to lower masses with higher precision.
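As a schematic of how such a scaling relation can be extracted, the sketch below fits a power law Y500 ∝ M500^b by linear regression in log-log space to hypothetical stacked measurements; the numbers, the pivot mass and the simple least-squares fit (rather than the full MCMC treatment used in the paper) are assumptions for illustration.

```python
import numpy as np

# Hypothetical stacked SZ measurements: mean mass and integrated Y per richness bin.
M500 = np.array([1.0e14, 1.8e14, 3.0e14, 5.0e14, 8.0e14])       # Msun (illustrative)
Y500 = 1.0e-5 * (M500 / 3.0e14) ** 1.7 * np.exp(np.random.default_rng(4).normal(0, 0.05, 5))

# Fit log10(Y) = a + b * log10(M / Mpivot): b is the power-law slope quoted above.
Mpivot = 3.0e14
b, a = np.polyfit(np.log10(M500 / Mpivot), np.log10(Y500), deg=1)
print(f"slope = {b:.2f}, normalisation = {10**a:.2e}")
```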
Extending cluster Lot Quality Assurance Sampling designs for surveillance programs
Hund, Lauren; Pagano, Marcello
2014-01-01
Lot quality assurance sampling (LQAS) has a long history of applications in industrial quality control. LQAS is frequently used for rapid surveillance in global health settings, with areas classified as poor or acceptable performance based on the binary classification of an indicator. Historically, LQAS surveys have relied on simple random samples from the population; however, implementing two-stage cluster designs for surveillance sampling is often more cost-effective than simple random sampling. By applying survey sampling results to the binary classification procedure, we develop a simple and flexible non-parametric procedure to incorporate clustering effects into the LQAS sample design to appropriately inflate the sample size, accommodating finite numbers of clusters in the population when relevant. We use this framework to then discuss principled selection of survey design parameters in longitudinal surveillance programs. We apply this framework to design surveys to detect rises in malnutrition prevalence in nutrition surveillance programs in Kenya and South Sudan, accounting for clustering within villages. By combining historical information with data from previous surveys, we design surveys to detect spikes in the childhood malnutrition rate. PMID:24633656
Extending cluster lot quality assurance sampling designs for surveillance programs.
Hund, Lauren; Pagano, Marcello
2014-07-20
Lot quality assurance sampling (LQAS) has a long history of applications in industrial quality control. LQAS is frequently used for rapid surveillance in global health settings, with areas classified as poor or acceptable performance on the basis of the binary classification of an indicator. Historically, LQAS surveys have relied on simple random samples from the population; however, implementing two-stage cluster designs for surveillance sampling is often more cost-effective than simple random sampling. By applying survey sampling results to the binary classification procedure, we develop a simple and flexible nonparametric procedure to incorporate clustering effects into the LQAS sample design to appropriately inflate the sample size, accommodating finite numbers of clusters in the population when relevant. We use this framework to then discuss principled selection of survey design parameters in longitudinal surveillance programs. We apply this framework to design surveys to detect rises in malnutrition prevalence in nutrition surveillance programs in Kenya and South Sudan, accounting for clustering within villages. By combining historical information with data from previous surveys, we design surveys to detect spikes in the childhood malnutrition rate. Copyright © 2014 John Wiley & Sons, Ltd.
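A minimal sketch of the two ingredients discussed above is given below: inflating a simple-random-sample LQAS size by the usual design effect 1 + (m − 1)·ICC for a two-stage cluster design, and choosing a binomial decision threshold. Both the inflation rule and the threshold criterion are simplified assumptions, not the nonparametric procedure developed in the paper.

```python
import numpy as np
from scipy.stats import binom

def lqas_decision_threshold(n, p_acceptable, alpha=0.10):
    """Smallest threshold d such that, when the true prevalence is at the
    acceptable level, the chance of observing more than d cases (and hence
    wrongly flagging the area) is at most alpha."""
    d = 0
    while binom.sf(d, n, p_acceptable) > alpha:
        d += 1
    return d

def inflate_for_clustering(n_srs, cluster_size, icc):
    """Inflate a simple-random-sample LQAS size for a two-stage cluster design
    using the design effect 1 + (m - 1) * ICC (a simplified stand-in for the
    paper's procedure)."""
    deff = 1.0 + (cluster_size - 1.0) * icc
    return int(np.ceil(n_srs * deff))

n_srs = 33                       # classic LQAS sample size under simple random sampling
n_clustered = inflate_for_clustering(n_srs, cluster_size=5, icc=0.1)
print(n_clustered, lqas_decision_threshold(n_clustered, p_acceptable=0.15))
```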
OMERACT-based fibromyalgia symptom subgroups: an exploratory cluster analysis.
Vincent, Ann; Hoskin, Tanya L; Whipple, Mary O; Clauw, Daniel J; Barton, Debra L; Benzo, Roberto P; Williams, David A
2014-10-16
The aim of this study was to identify subsets of patients with fibromyalgia with similar symptom profiles using the Outcome Measures in Rheumatology (OMERACT) core symptom domains. Female patients with a diagnosis of fibromyalgia and currently meeting fibromyalgia research survey criteria completed the Brief Pain Inventory, the 30-item Profile of Mood States, the Medical Outcomes Sleep Scale, the Multidimensional Fatigue Inventory, the Multiple Ability Self-Report Questionnaire, the Fibromyalgia Impact Questionnaire-Revised (FIQ-R) and the Short Form-36 between 1 June 2011 and 31 October 2011. Hierarchical agglomerative clustering was used to identify subgroups of patients with similar symptom profiles. To validate the results from this sample, hierarchical agglomerative clustering was repeated in an external sample of female patients with fibromyalgia with similar inclusion criteria. A total of 581 females with a mean age of 55.1 (range, 20.1 to 90.2) years were included. A four-cluster solution best fit the data, and each clustering variable differed significantly (P <0.0001) among the four clusters. The four clusters divided the sample into severity levels: Cluster 1 reflects the lowest average levels across all symptoms, and cluster 4 reflects the highest average levels. Clusters 2 and 3 capture moderate symptoms levels. Clusters 2 and 3 differed mainly in profiles of anxiety and depression, with Cluster 2 having lower levels of depression and anxiety than Cluster 3, despite higher levels of pain. The results of the cluster analysis of the external sample (n = 478) looked very similar to those found in the original cluster analysis, except for a slight difference in sleep problems. This was despite having patients in the validation sample who were significantly younger (P <0.0001) and had more severe symptoms (higher FIQ-R total scores (P = 0.0004)). In our study, we incorporated core OMERACT symptom domains, which allowed for clustering based on a comprehensive symptom profile. Although our exploratory cluster solution needs confirmation in a longitudinal study, this approach could provide a rationale to support the study of individualized clinical evaluation and intervention.
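For readers unfamiliar with the method, the sketch below runs hierarchical agglomerative clustering on standardised multi-domain symptom scores and cuts the tree at four clusters; the synthetic scores and the choice of Ward linkage on Euclidean distances are assumptions for illustration rather than the exact settings of the study.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.stats import zscore

# Hypothetical standardised scores on OMERACT-style domains
# (pain, fatigue, sleep, depression, anxiety, dyscognition) for 200 patients.
rng = np.random.default_rng(5)
severity = rng.choice([0.5, 1.0, 1.5, 2.0], size=200)          # latent overall severity
scores = severity[:, None] * np.ones((200, 6)) + rng.normal(0, 0.4, size=(200, 6))
scores = zscore(scores, axis=0)

# Ward linkage on Euclidean distances, cut at four clusters as in the abstract.
Z = linkage(scores, method="ward")
labels = fcluster(Z, t=4, criterion="maxclust")
for k in range(1, 5):
    print(f"cluster {k}: n = {np.sum(labels == k)}, "
          f"mean profile = {scores[labels == k].mean(axis=0).round(2)}")
```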
Dynamics of cD Clusters of Galaxies. 4; Conclusion of a Survey of 25 Abell Clusters
NASA Technical Reports Server (NTRS)
Oegerle, William R.; Hill, John M.; Fisher, Richard R. (Technical Monitor)
2001-01-01
We present the final results of a spectroscopic study of a sample of cD galaxy clusters. The goal of this program has been to study the dynamics of the clusters, with emphasis on determining the nature and frequency of cD galaxies with peculiar velocities. Redshifts measured with the MX Spectrometer have been combined with those obtained from the literature to obtain typically 50 - 150 observed velocities in each of 25 galaxy clusters containing a central cD galaxy. We present a dynamical analysis of the final 11 clusters to be observed in this sample. All 25 clusters are analyzed in a uniform manner to test for the presence of substructure, and to determine peculiar velocities and their statistical significance for the central cD galaxy. These peculiar velocities were used to determine whether or not the central cD galaxy is at rest in the cluster potential well. We find that 30 - 50% of the clusters in our sample possess significant subclustering (depending on the cluster radius used in the analysis), which is in agreement with other studies of non-cD clusters. Hence, the dynamical state of cD clusters is not different than other present-day clusters. After careful study, four of the clusters appear to have a cD galaxy with a significant peculiar velocity. Dressler-Shectman tests indicate that three of these four clusters have statistically significant substructure within 1.5/h_75 Mpc of the cluster center. The dispersion of the cD peculiar velocities is 164 +41/-34 km/s around the mean cluster velocity. This represents a significant detection of peculiar cD velocities, but at a level which is far below the mean velocity dispersion for this sample of clusters. The picture that emerges is one in which cD galaxies are nearly at rest with respect to the cluster potential well, but have small residual velocities due to subcluster mergers.
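The Dressler-Shectman test mentioned above compares local and global velocity statistics; a compact sketch of a common formulation (local groups of 11 galaxies, significance assessed by shuffling velocities among positions) is given below, with synthetic positions and velocities as illustrative assumptions.

```python
import numpy as np

def dressler_shectman_delta(pos, vel, n_local=11):
    """Per-galaxy Dressler-Shectman deviations: compare the local mean velocity
    and dispersion (galaxy plus its n_local-1 nearest neighbours) with the
    global values.  Delta = sum(delta_i); significance is usually calibrated
    by shuffling velocities among positions."""
    v_mean, sigma = vel.mean(), vel.std(ddof=1)
    deltas = np.empty(len(vel))
    for i, p in enumerate(pos):
        idx = np.argsort(np.linalg.norm(pos - p, axis=1))[:n_local]
        v_loc, s_loc = vel[idx].mean(), vel[idx].std(ddof=1)
        deltas[i] = np.sqrt((n_local / sigma**2) *
                            ((v_loc - v_mean)**2 + (s_loc - sigma)**2))
    return deltas

rng = np.random.default_rng(6)
pos = rng.uniform(-1.5, 1.5, size=(150, 2))                    # projected positions (Mpc)
vel = rng.normal(0, 800, size=150)                             # peculiar velocities (km/s)
vel[np.linalg.norm(pos - [1.0, 1.0], axis=1) < 0.4] += 1500    # an infalling subgroup
deltas = dressler_shectman_delta(pos, vel)
print("Delta statistic:", deltas.sum(), " (compare to shuffled realisations)")
```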
NASA Astrophysics Data System (ADS)
Buonanno, R.; Corsi, C. E.; Pulone, L.; Fusi Pecci, F.; Bellazzini, M.
1998-05-01
A new procedure is described to derive homogeneous relative ages from the Color-Magnitude Diagrams (CMDs) of Galactic globular clusters (GGCs). It is based on the use of a new observable, Delta V(0.05), namely the difference in magnitude between an arbitrary point on the upper main sequence (V_{+0.05}, the V magnitude of the MS ridge 0.05 mag redder than the Main Sequence (MS) Turn-off (TO)) and the horizontal branch (HB). The observational error associated with Delta V(0.05) is substantially smaller than that of previous age indicators, while keeping the property of being strictly independent of distance and reddening and of being based on theoretical luminosities rather than on still uncertain theoretical temperatures. As an additional bonus, the theoretical models show that Delta V(0.05) has a low dependence on metallicity. Moreover, the estimates of the relative age so obtained are also sufficiently invariant (to within ~ +/- 1 Gyr) with varying adopted models and transformations. Since the method based on the color difference Delta (B-V)_{TO,RGB} (VandenBerg, Bolte and Stetson 1990 - VBS; Sarajedini and Demarque 1990 - SD) remains the most reliable technique to estimate relative cluster ages for clusters where the horizontal part of the HB is not adequately populated, we have used the differential ages obtained via the "vertical" Delta V(0.05) parameter for a selected sample of clusters (with high quality CMDs, well populated HBs, trustworthy calibrations) to perform an empirical calibration of the "horizontal" observable in terms of [Fe/H] and age. A direct comparison with the corresponding calibration derived from the theoretical models reveals the existence of clear-cut discrepancies, which call into question the model scaling with metallicity in the observational planes. Starting from the global sample of considered clusters, we have thus evaluated, within a homogeneous procedure, relative ages for 33 GGCs having different metallicities, HB morphologies, and galactocentric distances. These new estimates have also been compared with previous determinations (Chaboyer, Demarque and Sarajedini 1996, and Richer et al. 1996). The distribution of the cluster ages with varying metallicity and galactocentric distance is briefly discussed: (a) there is no direct indication of any evident age-metallicity relationship; (b) there is some spread in age (still partially compatible with the errors), and the largest dispersion is found for intermediate metal-poor clusters; (c) older clusters populate both the inner and the outer regions of the Milky Way, while the younger globulars are present only in the outer regions, but the sample is far too small to yield conclusive evidence.
Brief Communication: Buoyancy-Induced Differences in Soot Morphology
NASA Technical Reports Server (NTRS)
Ku, Jerry C.; Griffin, Devon W.; Greenberg, Paul S.; Roma, John
1995-01-01
Reduction or elimination of buoyancy in flames affects the dominant mechanisms driving heat transfer, burning rates and flame shape. The absence of buoyancy produces longer residence times for soot formation, clustering and oxidation. In addition, soot pathlines are strongly affected in microgravity. We recently conducted the first experiments comparing soot morphology in normal and reduced-gravity laminar gas jet diffusion flames. Thermophoretic sampling is a relatively new but well-established technique for studying the morphology of soot primaries and aggregates. Although there have been some questions about biasing that may be induced due to sampling, recent analysis by Rosner et al. showed that the sample is not biased when the system under study is operating in the continuum limit. Furthermore, even if the sampling is preferentially biased to larger aggregates, the size-invariant premise of fractal analysis should produce a correct fractal dimension.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Deshpande, Amruta J.; Hughes, John P.; Wittman, David, E-mail: amrejd@physics.rutgers.edu, E-mail: jph@physics.rutgers.edu, E-mail: dwittman@physics.ucdavis.edu
We continue the study of the first sample of shear-selected clusters from the initial 8.6 square degrees of the Deep Lens Survey (DLS); a sample with well-defined selection criteria corresponding to the highest ranked shear peaks in the survey area. We aim to characterize the weak lensing selection by examining the sample’s X-ray properties. There are multiple X-ray clusters associated with nearly all the shear peaks: 14 X-ray clusters corresponding to seven DLS shear peaks. An additional three X-ray clusters cannot be definitively associated with shear peaks, mainly due to large positional offsets between the X-ray centroid and the shear peak. Here we report on the XMM-Newton properties of the 17 X-ray clusters. The X-ray clusters display a wide range of luminosities and temperatures; the L_X − T_X relation we determine for the shear-associated X-ray clusters is consistent with X-ray cluster samples selected without regard to dynamical state, while it is inconsistent with self-similarity. For a subset of the sample, we measure X-ray masses using temperature as a proxy, and compare to weak lensing masses determined by the DLS team. The resulting mass comparison is consistent with equality. The X-ray and weak lensing masses show considerable intrinsic scatter (∼48%), which is consistent with X-ray selected samples when their X-ray and weak lensing masses are independently determined.
Federal Register 2010, 2011, 2012, 2013, 2014
2010-04-01
... unit is a block cluster, which consists of one or more geographically contiguous census blocks. As in... a number of distinct processes, ranging from forming block clusters, selecting the block clusters... sample of block clusters, while the E Sample is the census of housing units and enumerations in the same...
Classification of Two Class Motor Imagery Tasks Using Hybrid GA-PSO Based K-Means Clustering.
Suraj; Tiwari, Purnendu; Ghosh, Subhojit; Sinha, Rakesh Kumar
2015-01-01
Transferring the brain-computer interface (BCI) from laboratory conditions to real-world applications requires BCI to be applied asynchronously, without any time constraint. The high level of dynamism in the electroencephalogram (EEG) signal motivates the use of evolutionary algorithms (EAs). Motivated by these two facts, in this work a hybrid GA-PSO based K-means clustering technique has been used to distinguish two-class motor imagery (MI) tasks. The proposed hybrid GA-PSO based K-means clustering is found to outperform genetic algorithm (GA) and particle swarm optimization (PSO) based K-means clustering techniques in terms of both accuracy and execution time. The lower execution time of the hybrid GA-PSO technique makes it suitable for real-time BCI applications. Time-frequency representation (TFR) techniques have been used to extract features of the signal under investigation. TFR-based features are extracted, and the feature vector is formed relying on the concept of event-related synchronization (ERS) and desynchronization (ERD).
Classification of Two Class Motor Imagery Tasks Using Hybrid GA-PSO Based K-Means Clustering
Suraj; Tiwari, Purnendu; Ghosh, Subhojit; Sinha, Rakesh Kumar
2015-01-01
Transferring the brain-computer interface (BCI) from laboratory conditions to real-world applications requires BCI to be applied asynchronously, without any time constraint. The high level of dynamism in the electroencephalogram (EEG) signal motivates the use of evolutionary algorithms (EAs). Motivated by these two facts, in this work a hybrid GA-PSO based K-means clustering technique has been used to distinguish two-class motor imagery (MI) tasks. The proposed hybrid GA-PSO based K-means clustering is found to outperform genetic algorithm (GA) and particle swarm optimization (PSO) based K-means clustering techniques in terms of both accuracy and execution time. The lower execution time of the hybrid GA-PSO technique makes it suitable for real-time BCI applications. Time-frequency representation (TFR) techniques have been used to extract features of the signal under investigation. TFR-based features are extracted, and the feature vector is formed relying on the concept of event-related synchronization (ERS) and desynchronization (ERD). PMID:25972896
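To indicate how an evolutionary search can drive K-means-style clustering, the sketch below optimises cluster centres with a plain particle swarm that minimises the within-cluster sum of squares; the GA crossover/mutation steps of the hybrid GA-PSO scheme are omitted, and all parameter values and the synthetic feature vectors are illustrative assumptions.

```python
import numpy as np

def wcss(centers, X):
    """Within-cluster sum of squares: the fitness the evolutionary search minimises."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
    return float((d.min(axis=1) ** 2).sum())

def pso_kmeans(X, k, n_particles=20, n_iter=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal PSO over flattened centre positions (GA steps of the hybrid
    scheme are omitted for brevity)."""
    rng = np.random.default_rng(seed)
    dim = k * X.shape[1]
    pos = X[rng.integers(0, len(X), size=(n_particles, k))].reshape(n_particles, dim)
    vel = np.zeros_like(pos)
    pbest, pbest_fit = pos.copy(), np.array([wcss(p.reshape(k, -1), X) for p in pos])
    gbest = pbest[np.argmin(pbest_fit)].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, dim))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos += vel
        fit = np.array([wcss(p.reshape(k, -1), X) for p in pos])
        improved = fit < pbest_fit
        pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
        gbest = pbest[np.argmin(pbest_fit)].copy()
    return gbest.reshape(k, -1)

rng_data = np.random.default_rng(7)
X = np.vstack([rng_data.normal(c, 0.5, (60, 4)) for c in (0.0, 3.0)])  # synthetic features
print(pso_kmeans(X, k=2).round(2))
```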
Genome Engineering and Modification Toward Synthetic Biology for the Production of Antibiotics.
Zou, Xuan; Wang, Lianrong; Li, Zhiqiang; Luo, Jie; Wang, Yunfu; Deng, Zixin; Du, Shiming; Chen, Shi
2018-01-01
Antibiotic production is often governed by large gene clusters composed of genes related to antibiotic scaffold synthesis, tailoring, regulation, and resistance. With the expansion of genome sequencing, a considerable number of antibiotic gene clusters have been isolated and characterized. Emerging genome engineering techniques make more efficient engineering of antibiotics possible. In addition to genomic editing, multiple synthetic biology approaches have been developed for the exploration and improvement of antibiotic natural products. Here, we review the progress in the development of genome editing techniques used to engineer new antibiotics, focusing on three aspects of genome engineering: direct cloning of large genomic fragments, genome engineering of gene clusters, and regulation of gene cluster expression. This review will not only summarize the current uses of genome engineering techniques for cloning and assembly of antibiotic gene clusters or for altering antibiotic synthetic pathways but will also provide perspectives on future directions for rebuilding biological systems for the design of novel antibiotics. © 2017 Wiley Periodicals, Inc.
Brighter galaxy bias: underestimating the velocity dispersions of galaxy clusters
NASA Astrophysics Data System (ADS)
Old, L.; Gray, M. E.; Pearce, F. R.
2013-09-01
We study the systematic bias introduced when selecting the spectroscopic redshifts of brighter cluster galaxies to estimate the velocity dispersion of galaxy clusters from both simulated and observational galaxy catalogues. We select clusters with Ngal ≥ 50 at five low-redshift snapshots from the publicly available De Lucia & Blaziot semi-analytic model galaxy catalogue. Clusters are also selected from the Tempel Sloan Digital Sky Survey Data Release 8 groups and clusters catalogue across the redshift range 0.021 ≤ z ≤ 0.098. We employ various selection techniques to explore whether the velocity dispersion bias is simply due to a lack of dynamical information or is the result of an underlying physical process occurring in the cluster, for example, dynamical friction experienced by the brighter cluster members. The velocity dispersions of the parent dark matter (DM) haloes are compared to the galaxy cluster dispersions and the stacked distribution of DM particle velocities is examined alongside the corresponding galaxy velocity distribution. We find a clear bias between the halo and the semi-analytic galaxy cluster velocity dispersion on the order of σgal/σDM ˜ 0.87-0.95 and a distinct difference in the stacked galaxy and DM particle velocities distribution. We identify a systematic underestimation of the velocity dispersions when imposing increasing absolute I-band magnitude limits. This underestimation is enhanced when using only the brighter cluster members for dynamical analysis on the order of 5-35 per cent, indicating that dynamical friction is a serious source of bias when using galaxy velocities as tracers of the underlying gravitational potential. In contrast to the literature we find that the resulting bias is not only halo mass dependent but also that the nature of the dependence changes according to the galaxy selection strategy. We make a recommendation that, in the realistic case of limited availability of spectral observations, a strictly magnitude-limited sample should be avoided to ensure an unbiased estimate of the velocity dispersion.
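The sketch below illustrates the qualitative effect discussed above: a toy cluster in which the brightest members are assigned artificially "colder" velocities (a crude stand-in for dynamical friction, with an exaggerated amplitude) and the velocity dispersion is re-estimated under increasingly bright magnitude cuts using a gapper estimator. The amplitudes, the magnitude thresholds and the choice of estimator are assumptions for illustration only.

```python
import numpy as np

def gapper_dispersion(v):
    """Gapper estimator of velocity dispersion (robust for small samples)."""
    v = np.sort(np.asarray(v, dtype=float))
    n = len(v)
    i = np.arange(1, n)
    return np.sqrt(np.pi) / (n * (n - 1)) * np.sum(i * (n - i) * np.diff(v))

# Toy cluster: bright galaxies get slightly colder velocities to mimic dynamical friction.
rng = np.random.default_rng(8)
n_gal = 200
mag = rng.uniform(-23, -18, size=n_gal)                 # absolute magnitudes
sigma_true = 900.0
cold_factor = np.where(mag < -21.5, 0.7, 1.0)           # brightest members are "colder"
vel = rng.normal(0, sigma_true * cold_factor)

for m_lim in (-18, -20, -21.5):                         # increasingly bright-only samples
    sel = mag <= m_lim
    print(f"M <= {m_lim}: N = {sel.sum():3d}, sigma = {gapper_dispersion(vel[sel]):.0f} km/s")
```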
Minetti, Andrea; Riera-Montes, Margarita; Nackers, Fabienne; Roederer, Thomas; Koudika, Marie Hortense; Sekkenes, Johanne; Taconet, Aurore; Fermon, Florence; Touré, Albouhary; Grais, Rebecca F; Checchi, Francesco
2012-10-12
Estimation of vaccination coverage (VC) at the local level is essential to identify communities that may require additional support. Cluster surveys can be used in resource-poor settings, when population figures are inaccurate. To be feasible, cluster samples need to be small, without losing robustness of results. The clustered LQAS (CLQAS) approach has been proposed as an alternative, as smaller sample sizes are required. We explored (i) the efficiency of cluster surveys of decreasing sample size through bootstrapping analysis and (ii) the performance of CLQAS under three alternative sampling plans to classify local VC, using data from a survey carried out in Mali after mass vaccination against meningococcal meningitis group A. VC estimates provided by a 10 × 15 cluster survey design were reasonably robust. We used them to classify health areas in three categories and guide mop-up activities: (i) health areas not requiring supplemental activities; (ii) health areas requiring additional vaccination; (iii) health areas requiring further evaluation. As the sample size decreased (from 10 × 15 to 10 × 3), the standard errors of the VC and ICC estimates became increasingly unstable. Results of CLQAS simulations were not accurate for most health areas, with an overall risk of misclassification greater than 0.25 in one health area out of three. It was greater than 0.50 in one health area out of two under two of the three sampling plans. Small-sample cluster surveys (10 × 15) are acceptably robust for classification of VC at the local level. We do not recommend the CLQAS method as currently formulated for evaluating vaccination programmes.
Harvey, Scott D; Jarman, Kristin H; Moran, James J; Sorensen, Christina M; Wright, Bob W
2012-09-15
The purpose of this study was to perform a preliminary investigation of compound-specific isotope analysis (CSIA) of diesel fuels to evaluate whether the technique could distinguish diesel samples from different sources/locations. The ability to differentiate or correlate diesel samples could be valuable for discovering fuel tax evasion schemes or for environmental forensic studies. Two urea adduction-based techniques were used to isolate the n-alkanes from the fuel. Both carbon isotope ratio (δ(13)C) and hydrogen isotope ratio (δD) values for the n-alkanes were then determined by CSIA in each sample. The samples investigated had δ(13)C values that ranged from -30.1‰ to -26.8‰, whereas δD values ranged from -83‰ to -156‰. Plots of δD versus δ(13)C with sample n-alkane points connected in order of increasing carbon number gave well-separated clusters with characteristic shapes for each sample. Principal components analysis (PCA) with δ(13)C, δD, or combined δ(13)C and δD data was applied to extract the maximum information content. PCA scores plots could clearly differentiate the samples, thereby demonstrating the potential of this approach for distinguishing (e.g., fingerprinting) fuel samples using δ(13)C and δD values. Copyright © 2012 Elsevier B.V. All rights reserved.
Sampling effort affects multivariate comparisons of stream assemblages
Cao, Y.; Larsen, D.P.; Hughes, R.M.; Angermeier, P.L.; Patton, T.M.
2002-01-01
Multivariate analyses are used widely for determining patterns of assemblage structure, inferring species-environment relationships and assessing human impacts on ecosystems. The estimation of ecological patterns often depends on sampling effort, so the degree to which sampling effort affects the outcome of multivariate analyses is a concern. We examined the effect of sampling effort on site and group separation, which was measured using a mean similarity method. Two similarity measures, the Jaccard Coefficient and Bray-Curtis Index were investigated with 1 benthic macroinvertebrate and 2 fish data sets. Site separation was significantly improved with increased sampling effort because the similarity between replicate samples of a site increased more rapidly than between sites. Similarly, the faster increase in similarity between sites of the same group than between sites of different groups caused clearer separation between groups. The strength of site and group separation completely stabilized only when the mean similarity between replicates reached 1. These results are applicable to commonly used multivariate techniques such as cluster analysis and ordination because these multivariate techniques start with a similarity matrix. Completely stable outcomes of multivariate analyses are not feasible. Instead, we suggest 2 criteria for estimating the stability of multivariate analyses of assemblage data: 1) mean within-site similarity across all sites compared, indicating sample representativeness, and 2) the SD of within-site similarity across sites, measuring sample comparability.
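A minimal sketch of the mean similarity method is given below, computing Jaccard and Bray-Curtis similarities and comparing mean within-site with mean between-site similarity across replicate samples; the synthetic abundance data and function names are assumptions for illustration.

```python
import numpy as np

def jaccard(a, b):
    """Jaccard coefficient on presence/absence of taxa."""
    pa, pb = a > 0, b > 0
    union = np.logical_or(pa, pb).sum()
    return np.logical_and(pa, pb).sum() / union if union else 1.0

def bray_curtis(a, b):
    """Bray-Curtis similarity on abundances."""
    return 1.0 - np.abs(a - b).sum() / (a + b).sum()

def mean_similarity(samples, sites, sim):
    """Mean within-site and between-site similarity across replicate samples."""
    within, between = [], []
    for i in range(len(samples)):
        for j in range(i + 1, len(samples)):
            (within if sites[i] == sites[j] else between).append(sim(samples[i], samples[j]))
    return np.mean(within), np.mean(between)

rng = np.random.default_rng(9)
site_means = rng.gamma(2.0, 2.0, size=(3, 25))                           # 3 sites, 25 taxa
samples = np.vstack([rng.poisson(m, size=(4, 25)) for m in site_means])  # 4 replicates/site
sites = np.repeat([0, 1, 2], 4)
print(mean_similarity(samples, sites, jaccard))
print(mean_similarity(samples.astype(float), sites, bray_curtis))
```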
Local Prediction Models on Mid-Atlantic Ridge MORB by Principal Component Regression
NASA Astrophysics Data System (ADS)
Ling, X.; Snow, J. E.; Chin, W.
2017-12-01
The isotopic compositions of the daughter isotopes of long-lived radioactive systems (Sr, Nd, Hf and Pb) can be used to map the scale and history of mantle heterogeneities beneath mid-ocean ridges. Our goal is to relate the multidimensional structure in the existing isotopic dataset with an underlying physical reality of mantle sources. The numerical technique of Principal Component Analysis is useful to reduce the linear dependence of the data to a minimum set of orthogonal eigenvectors encapsulating the information contained (cf. Agranier et al. 2005). The dataset used for this study covers almost all the MORBs along the Mid-Atlantic Ridge (MAR), from 54°S to 77°N and 8.8°W to 46.7°W, replicating the published dataset of Agranier et al. (2005) plus 53 basalt samples dredged and analyzed since then (data from PetDB). The principal components PC1 and PC2 account for 61.56% and 29.21%, respectively, of the total isotope-ratio variability. Samples with compositions similar to HIMU, EM and DM are identified to better understand the PCs. PC1 and PC2 account for HIMU and EM, whereas PC2 has limited control over the DM source. PC3 is more strongly controlled by the depleted mantle source than PC2. This means that all three principal components are significantly related to the established mantle sources. We also tested the relationship between mantle heterogeneity and sample locality. The k-means clustering algorithm is a type of unsupervised learning that finds groups in the data based on feature similarity. The PC factor scores of each sample are clustered into three groups. Clusters one and three alternate along the northern and southern MAR. Cluster two appears from 45.18°N to 0.79°N and 27.9°W to 30.40°W, alternating with cluster one. The ridge has been preliminarily divided into 16 sections considering both the clusters and the ridge segments. The principal component regression models each section based on the 6 isotope ratios and the PCs. The prediction residual is about 1-2 km. This means that the combined isotope ratios are a strong predictor of geographic location along the ridge, a slightly surprising result. PCR is a robust and powerful method for both visualizing and manipulating the multidimensional representation of isotope data.
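The sketch below strings together the three steps described above, PCA on standardised isotope ratios, k-means on the component scores, and a principal component regression predicting a location variable, on a hypothetical isotope table; the synthetic data, the number of components and the latitude proxy are assumptions for illustration, not the study's dataset.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

# Hypothetical MORB isotope table: rows are samples, columns are isotope ratios
# (87Sr/86Sr, 143Nd/144Nd, 206Pb/204Pb, 207Pb/204Pb, 208Pb/204Pb, 176Hf/177Hf).
rng = np.random.default_rng(10)
n = 120
enriched = rng.random(n)                                   # latent "enrichment" factor
iso = np.column_stack([
    0.7025 + 0.0015 * enriched, 0.5132 - 0.0002 * enriched,
    18.2 + 1.5 * enriched, 15.45 + 0.15 * enriched,
    37.8 + 1.5 * enriched, 0.2832 - 0.0003 * enriched,
]) + rng.normal(0, 1e-4, size=(n, 6)) * [1, 1, 100, 10, 100, 1]

# PCA on standardised ratios, then k-means on the component scores.
scores = PCA(n_components=3).fit_transform(StandardScaler().fit_transform(iso))
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scores)
print(np.bincount(labels))                                 # group sizes along the ridge

# Principal component regression: predict a synthetic "latitude" from the PC scores.
lat = -54 + 131 * enriched + rng.normal(0, 2, n)
pcr = LinearRegression().fit(scores, lat)
print(np.sqrt(np.mean((pcr.predict(scores) - lat) ** 2)))  # in-sample RMSE (degrees)
```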
Improvements in Ionized Cluster-Beam Deposition
NASA Technical Reports Server (NTRS)
Fitzgerald, D. J.; Compton, L. E.; Pawlik, E. V.
1986-01-01
Lower temperatures result in higher purity and fewer equipment problems. In cluster-beam deposition, clusters of atoms formed by adiabatic expansion nozzle and with proper nozzle design, expanding vapor cools sufficiently to become supersaturated and form clusters of material deposited. Clusters are ionized and accelerated in electric field and then impacted on substrate where films form. Improved cluster-beam technique useful for deposition of refractory metals.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cahill, John F.; Kertesz, Vilmos; Ovchinnikova, Olga S.
2015-06-27
Recently a number of techniques have combined laser ablation with liquid capture for mass spectrometry spot sampling and imaging applications. The newly developed non-contact liquid-vortex capture probe has been used to efficiently collect 355 nm UV laser ablated material in a continuous-flow solvent stream, in which the captured material dissolves and then undergoes electrospray ionization. This sampling and ionization approach has produced what appear to be classic electrospray ionization spectra; however, the softness of this sampling/ionization process versus simple electrospray ionization has not been definitively determined. A series of benzylpyridinium salts, known as thermometer ions, were used to compare internal energy distributions between electrospray ionization and the UV laser ablation liquid-vortex capture probe electrospray combination. Measured internal energy distributions were identical between the two techniques, even with differences in laser fluence (0.7-3.1 J cm-2) and when using UV-absorbing or non-UV-absorbing sample substrates. These data indicate that ions formed directly by UV laser ablation, if any, are likely an extremely small constituent of the total ion signal observed. Instead, neutral molecules, clusters or particulates ejected from the surface during laser ablation, subsequently captured and dissolved in the flowing solvent stream and then electrosprayed, are the predominant source of the ion signal observed. The electrospray ionization process used controls the softness of the technique.
mcrA-Targeted Real-Time Quantitative PCR Method To Examine Methanogen Communities▿
Steinberg, Lisa M.; Regan, John M.
2009-01-01
Methanogens are of great importance in carbon cycling and alternative energy production, but quantitation with culture-based methods is time-consuming and biased against methanogen groups that are difficult to cultivate in a laboratory. For these reasons, methanogens are typically studied through culture-independent molecular techniques. We developed a SYBR green I quantitative PCR (qPCR) assay to quantify total numbers of methyl coenzyme M reductase α-subunit (mcrA) genes. TaqMan probes were also designed to target nine different phylogenetic groups of methanogens in qPCR assays. Total mcrA and mcrA levels of different methanogen phylogenetic groups were determined from six samples: four samples from anaerobic digesters used to treat either primarily cow or pig manure and two aliquots from an acidic peat sample stored at 4°C or 20°C. Only members of the Methanosaetaceae, Methanosarcina, Methanobacteriaceae, and Methanocorpusculaceae and Fen cluster were detected in the environmental samples. The three samples obtained from cow manure digesters were dominated by members of the genus Methanosarcina, whereas the sample from the pig manure digester contained detectable levels of only members of the Methanobacteriaceae. The acidic peat samples were dominated by both Methanosarcina spp. and members of the Fen cluster. In two of the manure digester samples only one methanogen group was detected, but in both of the acidic peat samples and two of the manure digester samples, multiple methanogen groups were detected. The TaqMan qPCR assays were successfully able to determine the environmental abundance of different phylogenetic groups of methanogens, including several groups with few or no cultivated members. PMID:19447957
A Multivariate Analysis of Galaxy Cluster Properties
NASA Astrophysics Data System (ADS)
Ogle, P. M.; Djorgovski, S.
1993-05-01
We have assembled from the literature a database on 394 clusters of galaxies, with up to 16 parameters per cluster. They include optical and x-ray luminosities, x-ray temperatures, galaxy velocity dispersions, central galaxy and particle densities, optical and x-ray core radii and ellipticities, etc. In addition, derived quantities, such as the mass-to-light ratios and x-ray gas masses, are included. Doubtful measurements have been identified, and deleted from the database. Our goal is to explore the correlations between these parameters, and interpret them in the framework of our understanding of evolution of clusters and large-scale structure, such as the Gott-Rees scaling hierarchy. Among the simple, monovariate correlations we found, the most significant include those between the optical and x-ray luminosities, x-ray temperatures, cluster velocity dispersions, and central galaxy densities, in various mutual combinations. While some of these correlations have been discussed previously in the literature, generally smaller samples of objects have been used. We will also present the results of a multivariate statistical analysis of the data, including a principal component analysis (PCA). Such an approach has not been used previously for studies of cluster properties, even though it is much more powerful and complete than the simple monovariate techniques which are commonly employed. The observed correlations may lead to powerful constraints for theoretical models of formation and evolution of galaxy clusters. P.M.O. was supported by a Caltech graduate fellowship. S.D. acknowledges a partial support from the NASA contract NAS5-31348 and the NSF PYI award AST-9157412.
CROSS-CORRELATING THE γ-RAY SKY WITH CATALOGS OF GALAXY CLUSTERS
Branchini, Enzo; Camera, Stefano; Cuoco, Alessandro; ...
2017-01-18
In this article, we report the detection of a cross-correlation signal between Fermi Large Area Telescope diffuse γ-ray maps and catalogs of clusters. In our analysis, we considered three different catalogs: WHL12, redMaPPer, and PlanckSZ. They all show a positive correlation with different amplitudes, related to the average mass of the objects in each catalog, which also sets the catalog bias. The signal detection is confirmed by the results of a stacking analysis. The cross-correlation signal extends to rather large angular scales, around 1°, that correspond, at the typical redshift of the clusters in these catalogs, to a few to tens of megaparsecs, i.e., the typical scale-length of the large-scale structures in the universe. Most likely this signal is contributed by the cumulative emission from active galactic nuclei (AGNs) associated with the filamentary structures that converge toward the high peaks of the matter density field in which galaxy clusters reside. In addition, our analysis reveals the presence of a second component, more compact in size and compatible with a point-like emission from within individual clusters. At present, we cannot distinguish between the two most likely interpretations for such a signal, i.e., whether it is produced by AGNs inside clusters or if it is a diffuse γ-ray emission from the intracluster medium. Lastly, we argue that this latter, intriguing, hypothesis might be tested by applying this technique to a low-redshift large-mass cluster sample.
NASA Astrophysics Data System (ADS)
Fassbender, R.; Böhringer, H.; Nastasi, A.; Šuhada, R.; Mühlegger, M.; de Hoon, A.; Kohnert, J.; Lamer, G.; Mohr, J. J.; Pierini, D.; Pratt, G. W.; Quintana, H.; Rosati, P.; Santos, J. S.; Schwope, A. D.
2011-12-01
We present the largest sample to date of spectroscopically confirmed x-ray luminous high-redshift galaxy clusters comprising 22 systems in the range 0.9 as part of the XMM-Newton Distant Cluster Project (XDCP). All systems were initially selected as extended x-ray sources over 76.1 deg2 of non-contiguous deep archival XMM-Newton coverage, of which 49.4 deg2 are part of the core survey with a quantifiable selection function and 17.7 deg2 are classified as ‘gold’ coverage as the starting point for upcoming cosmological applications. Distant cluster candidates were followed up with moderately deep optical and near-infrared imaging in at least two bands to photometrically identify the cluster galaxy populations and obtain redshift estimates based on the colors of simple stellar population models. We test and calibrate the most promising redshift estimation techniques based on the R-z and z-H colors for efficient distant cluster identifications and find a good redshift accuracy performance of the z-H color out to at least z ˜ 1.5, while the redshift evolution of the R-z color leads to increasingly large uncertainties at z ≳ 0.9. Photometrically identified high-z systems are spectroscopically confirmed with VLT/FORS 2 with a minimum of three concordant cluster member redshifts. We present first details of two newly identified clusters, XDCP J0338.5+0029 at z = 0.916 and XDCP J0027.2+1714 at z = 0.959, and investigate the x-ray properties of SpARCS J003550-431224 at z = 1.335, which shows evidence for ongoing major merger activity along the line-of-sight. We provide x-ray properties and luminosity-based total mass estimates for the full sample of 22 high-z clusters, of which 17 are at z ⩾ 1.0 and seven populate the highest redshift bin at z > 1.3. The median system mass of the sample is M200 ≃ 2 × 1014 M⊙, while the probed mass range for the distant clusters spans approximately (0.7-7) × 1014 M⊙. The majority (>70%) of the x-ray selected clusters show rather regular x-ray morphologies, albeit in most cases with a discernible elongation along one axis. In contrast to local clusters, the z > 0.9 systems mostly do not harbor central dominant galaxies coincident with the x-ray centroid position, but rather exhibit significant brightest cluster galaxy (BCG) offsets from the x-ray center with a median value of about 50 kpc in projection and a smaller median luminosity gap to the second-ranked galaxy of Δm12 ≃ 0.3 mag. We estimate a fraction of cluster-associated NVSS 1.4 GHz radio sources of about 30%, preferentially located within 1‧ from the x-ray center. This value suggests an increase of the fraction of very luminous cluster-associated radio sources by about a factor of 2.5-5 relative to low-z systems. The galaxy populations in z ≳ 1.5 cluster environments show first evidence for drastic changes on the high-mass end of galaxies and signs of a gradual disappearance of a well-defined cluster red-sequence as strong star formation activity is observed in an increasing fraction of massive galaxies down to the densest core regions. The presented XDCP high-z sample will allow first detailed studies of the cluster population during the critical cosmic epoch at lookback times of 7.3-9.5 Gyr on the aggregation and evolution of baryons in the cold and hot phases as a function of redshift and system mass. 
Based on observations under program IDs 079.A-0634 and 085.A-0647 collected at the European Organisation for Astronomical Research in the Southern Hemisphere, Chile, and observations collected at the Centro Astronómico Hispano Alemán (CAHA) at Calar Alto, operated jointly by the Max-Planck Institut für Astronomie and the Instituto de Astrofísica de Andalucía (CSIC).
DOE Office of Scientific and Technical Information (OSTI.GOV)
Radisavljević, Ivana, E-mail: iva@vin.bg.ac.rs; Novaković, Nikola; Matović, Branko
2016-02-15
Highlights: • Zn0.95Co0.05O nanopowders are characterized by high structural order. • Co atoms show no tendency for Co–Co clustering and Co–Ov complex formation. • Co–O–Co clustering along the c-axis has not led to ferromagnetic order. • XMCD provides no evidence of magnetic polarization of O 2p and Co 3d states. - Abstract: X-ray absorption (XANES, EXAFS, XMCD) and photoelectron (XPS) spectroscopic techniques were employed to study the local structural, electronic and magnetic properties of Zn0.95Co0.05O nanopowders. The substitutional Co2+ ions are incorporated in the ZnO lattice at regular Zn sites and the sample is characterized by high structural order. There was no sign of ferromagnetic ordering of Co magnetic moments and the sample is in a paramagnetic state at all temperatures down to 5 K. The possible connection of the structural defects with the absence of ferromagnetism is discussed on the basis of theoretical calculations of the O K-edge absorption spectra.
Cao, Zhen; Wang, Zhenjie; Shang, Zhonglin; Zhao, Jiancheng
2017-01-01
Fourier-transform infrared spectroscopy (FTIR) with the attenuated total reflectance technique was used to identify Rhodobryum roseum from its four adulterants. The FTIR spectra of six samples in the range from 4000 cm-1 to 600 cm-1 were obtained. A second-derivative transformation was used to resolve small and closely spaced absorption peaks. A cluster analysis was performed to classify the spectra in a dendrogram based on spectral similarity. Principal component analysis (PCA) was used to classify the species of the six moss samples. Cluster analysis combined with PCA was able to distinguish different genera. However, some species of the same genus exhibited highly similar chemical components and FTIR spectra. Fourier self-deconvolution and the discrete wavelet transform (DWT) were used to enhance the differences among species with similar chemical components and FTIR spectra. Three scales were selected as the feature-extracting space in the DWT domain. The results show that FTIR spectroscopy with chemometrics is suitable for identifying Rhodobryum roseum and its adulterants.
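The preprocessing and chemometric steps mentioned above can be sketched as follows; the spectra are synthetic placeholders and the Savitzky-Golay settings are assumptions, not the authors' parameters.

```python
# Sketch: second-derivative preprocessing, hierarchical clustering, and PCA of
# FTIR spectra. Spectra are synthetic; filter settings are assumptions.
import numpy as np
from scipy.signal import savgol_filter
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, dendrogram
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
wavenumbers = np.linspace(4000, 600, 1700)
spectra = rng.normal(size=(6, wavenumbers.size)).cumsum(axis=1)   # placeholder spectra

# Savitzky-Golay second derivative resolves small, closely spaced absorption bands.
d2 = savgol_filter(spectra, window_length=15, polyorder=3, deriv=2, axis=1)

# Hierarchical cluster analysis on spectral dissimilarity (dendrogram of samples).
tree = linkage(pdist(d2, metric="correlation"), method="average")
dendrogram(tree, no_plot=True)       # set no_plot=False to draw the dendrogram

# PCA score plot for separating genera/species.
scores = PCA(n_components=2).fit_transform(d2)
print(np.round(scores, 2))
```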
Active constrained clustering by examining spectral Eigenvectors
NASA Technical Reports Server (NTRS)
Wagstaff, Kiri L.; desJardins, Marie; Xu, Qianjun
2005-01-01
This work focuses on the active selection of pairwise constraints for spectral clustering. We develop and analyze a technique for Active Constrained Clustering by Examining Spectral eigenvectorS (ACCESS), based on eigenvectors derived from a similarity matrix.
An agglomerative hierarchical clustering approach to visualisation in Bayesian clustering problems
Dawson, Kevin J.; Belkhir, Khalid
2009-01-01
Clustering problems (including the clustering of individuals into outcrossing populations, hybrid generations, full-sib families and selfing lines) have recently received much attention in population genetics. In these clustering problems, the parameter of interest is a partition of the set of sampled individuals: the sample partition. In a fully Bayesian approach to clustering problems of this type, our knowledge about the sample partition is represented by a probability distribution on the space of possible sample partitions. Since the number of possible partitions grows very rapidly with the sample size, we cannot visualise this probability distribution in its entirety, unless the sample is very small. As a solution to this visualisation problem, we recommend using an agglomerative hierarchical clustering algorithm, which we call the exact linkage algorithm. This algorithm is a special case of the maximin clustering algorithm that we introduced previously. The exact linkage algorithm is now implemented in our software package Partition View. The exact linkage algorithm takes the posterior co-assignment probabilities as input, and yields as output a rooted binary tree or, more generally, a forest of such trees. Each node of this forest defines a set of individuals, and the node height is the posterior co-assignment probability of this set. This provides a useful visual representation of the uncertainty associated with the assignment of individuals to categories. It is also a useful starting point for a more detailed exploration of the posterior distribution in terms of the co-assignment probabilities. PMID:19337306
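The exact linkage algorithm itself is implemented in Partition View; as a rough stand-in for visualizing a co-assignment probability matrix, standard complete-linkage clustering on 1 - P gives a tree with comparable structure. This is an approximation under that assumption, not the authors' algorithm.

```python
# Rough stand-in (not the exact linkage algorithm of the paper): build a tree
# from a posterior co-assignment probability matrix P via complete linkage on 1-P.
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import squareform

P = np.array([[1.0, 0.9, 0.2, 0.1],    # hypothetical co-assignment probabilities
              [0.9, 1.0, 0.3, 0.2],
              [0.2, 0.3, 1.0, 0.8],
              [0.1, 0.2, 0.8, 1.0]])

condensed = squareform(1.0 - P, checks=False)     # distance = 1 - probability
tree = linkage(condensed, method="complete")
# Node heights approximate 1 - (co-assignment probability) of the joined set.
dendrogram(tree, labels=["ind1", "ind2", "ind3", "ind4"], no_plot=True)
print(tree)
```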
The JCMT Gould Belt Survey: Dense Core Clusters in Orion B
NASA Astrophysics Data System (ADS)
Kirk, H.; Johnstone, D.; Di Francesco, J.; Lane, J.; Buckle, J.; Berry, D. S.; Broekhoven-Fiene, H.; Currie, M. J.; Fich, M.; Hatchell, J.; Jenness, T.; Mottram, J. C.; Nutter, D.; Pattle, K.; Pineda, J. E.; Quinn, C.; Salji, C.; Tisi, S.; Hogerheijde, M. R.; Ward-Thompson, D.; The JCMT Gould Belt Survey Team
2016-04-01
The James Clerk Maxwell Telescope Gould Belt Legacy Survey obtained SCUBA-2 observations of dense cores within three sub-regions of Orion B: LDN 1622, NGC 2023/2024, and NGC 2068/2071, all of which contain clusters of cores. We present an analysis of the clustering properties of these cores, including the two-point correlation function and Cartwright’s Q parameter. We identify individual clusters of dense cores across all three regions using a minimal spanning tree technique, and find that in each cluster, the most massive cores tend to be centrally located. We also apply the independent M-Σ technique and find a strong correlation between core mass and the local surface density of cores. These two lines of evidence jointly suggest that some amount of mass segregation in clusters has happened already at the dense core stage.
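A minimal sketch of minimal-spanning-tree cluster identification of the kind referred to above; the core positions and the critical branch length are hypothetical, and the survey team's exact implementation may differ.

```python
# Sketch: minimal-spanning-tree identification of core clusters; positions and
# the critical branch length are hypothetical.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components

rng = np.random.default_rng(3)
cores = np.vstack([rng.normal(c, 0.05, size=(20, 2)) for c in ((0.0, 0.0), (1.0, 1.0))])

dist = squareform(pdist(cores))                   # pairwise separations
mst = minimum_spanning_tree(dist).toarray()

L_crit = 0.3                                      # assumed critical branch length
pruned = np.where((mst > 0) & (mst <= L_crit), mst, 0.0)
n_groups, labels = connected_components(pruned, directed=False)
print("clusters of cores:", n_groups, " members per cluster:", np.bincount(labels))
```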
NASA Technical Reports Server (NTRS)
Donahue, Megan; Scharf, Caleb A.; Mack, Jennifer; Lee, Y. Paul; Postman, Marc; Rosati, Piero; Dickinson, Mark; Voit, G. Mark; Stocke, John T.
2002-01-01
We present and analyze the optical and X-ray catalogs of moderate-redshift cluster candidates from the ROSAT Optical X-Ray Survey, or ROXS. The survey covers the sky area contained in the fields of view of 23 deep archival ROSAT PSPC pointings, 4.8 square degrees. The cross-correlated cluster catalogs were constructed by comparing two independent catalogs extracted from the optical and X-ray bandpasses, using a matched-filter technique for the optical data and a wavelet technique for the X-ray data. We cross-identified cluster candidates in each catalog. As reported in Paper I, the matched-filter technique found optical counterparts for at least 60% (26 out of 43) of the X-ray cluster candidates; the estimated redshifts from the matched-filter algorithm agree with at least 7 of 11 spectroscopic confirmations (Δz ≤ 0.10). The matched-filter technique, with an imaging sensitivity of mI ~ 23, identified approximately 3 times the number of candidates (155 candidates, 142 with a detection confidence >3σ) found in the X-ray survey of nearly the same area. There are 57 X-ray candidates, 43 of which are unobscured by scattered light or bright stars in the optical images. Twenty-six of these have fairly secure optical counterparts. We find that the matched-filter algorithm, when applied to images with galaxy flux sensitivities of mI ~ 23, is fairly well matched to discovering z ≲ 1 clusters detected by wavelets in ROSAT PSPC exposures of 8000-60,000 s. The difference in the spurious fractions between the optical and X-ray catalogs (30% and 10%, respectively) cannot account for the difference in source number. In Paper I, we compared the optical and X-ray cluster luminosity functions and we found that the luminosity functions are consistent if the relationship between X-ray and optical luminosities is steep. Here, in Paper II, we present the cluster catalogs and a numerical simulation of the ROXS. We also present color-magnitude plots for several of the cluster candidates, and examine the prominence of the red sequence in each. We find that the X-ray clusters in our survey do not all have a prominent red sequence. We conclude that while the red sequence may be a distinct feature in the color-magnitude plots for virialized massive clusters, it may be less distinct in lower mass clusters of galaxies at even moderate redshifts. Multiple, complementary methods of selecting and defining clusters may be essential, particularly at high redshift, where all methods start to run into completeness limits, incomplete understanding of physical evolution, and projection effects.
Network module detection: Affinity search technique with the multi-node topological overlap measure
Li, Ai; Horvath, Steve
2009-01-01
Background Many clustering procedures only allow the user to input a pairwise dissimilarity or distance measure between objects. We propose a clustering method that can input a multi-point dissimilarity measure d(i1, i2, ..., iP) where the number of points P can be larger than 2. The work is motivated by gene network analysis where clusters correspond to modules of highly interconnected nodes. Here, we define modules as clusters of network nodes with high multi-node topological overlap. The topological overlap measure is a robust measure of interconnectedness which is based on shared network neighbors. In previous work, we have shown that the multi-node topological overlap measure yields biologically meaningful results when used as input of network neighborhood analysis. Findings We adapt network neighborhood analysis for the use of module detection. We propose the Module Affinity Search Technique (MAST), which is a generalized version of the Cluster Affinity Search Technique (CAST). MAST can accommodate a multi-node dissimilarity measure. Clusters grow around user-defined or automatically chosen seeds (e.g. hub nodes). We propose both local and global cluster growth stopping rules. We use several simulations and a gene co-expression network application to argue that the MAST approach leads to biologically meaningful results. We compare MAST with hierarchical clustering and partitioning around medoid clustering. Conclusion Our flexible module detection method is implemented in the MTOM software which can be downloaded from the following webpage: PMID:19619323
Network module detection: Affinity search technique with the multi-node topological overlap measure.
Li, Ai; Horvath, Steve
2009-07-20
Many clustering procedures only allow the user to input a pairwise dissimilarity or distance measure between objects. We propose a clustering method that can input a multi-point dissimilarity measure d(i1, i2, ..., iP) where the number of points P can be larger than 2. The work is motivated by gene network analysis where clusters correspond to modules of highly interconnected nodes. Here, we define modules as clusters of network nodes with high multi-node topological overlap. The topological overlap measure is a robust measure of interconnectedness which is based on shared network neighbors. In previous work, we have shown that the multi-node topological overlap measure yields biologically meaningful results when used as input of network neighborhood analysis. We adapt network neighborhood analysis for the use of module detection. We propose the Module Affinity Search Technique (MAST), which is a generalized version of the Cluster Affinity Search Technique (CAST). MAST can accommodate a multi-node dissimilarity measure. Clusters grow around user-defined or automatically chosen seeds (e.g. hub nodes). We propose both local and global cluster growth stopping rules. We use several simulations and a gene co-expression network application to argue that the MAST approach leads to biologically meaningful results. We compare MAST with hierarchical clustering and partitioning around medoid clustering. Our flexible module detection method is implemented in the MTOM software which can be downloaded from the following webpage: http://www.genetics.ucla.edu/labs/horvath/MTOM/
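For orientation, the standard pairwise topological overlap matrix (which the paper above generalizes to a multi-node measure) can be computed as below; the adjacency matrix is a random placeholder, and this sketch is not the MTOM implementation itself.

```python
# Pairwise topological overlap matrix (the paper's measure is the multi-node
# generalization); A is a random placeholder adjacency matrix.
import numpy as np

rng = np.random.default_rng(4)
A = (rng.random((30, 30)) < 0.15).astype(float)
A = np.triu(A, 1)
A = A + A.T                                        # undirected, no self-loops

k = A.sum(axis=1)                                  # node connectivities
shared = A @ A                                     # shared-neighbour counts l_ij
TOM = (shared + A) / (np.minimum.outer(k, k) + 1.0 - A)
np.fill_diagonal(TOM, 1.0)                         # a node fully overlaps itself
dissim = 1.0 - TOM                                 # dissimilarity input for module search
print("mean pairwise overlap:", round(float(TOM[np.triu_indices_from(TOM, 1)].mean()), 3))
```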
1994-09-30
relational versus object-oriented DBMS, knowledge discovery, data models, metadata, data filtering, clustering techniques, and synthetic data. A secondary...The first was the investigation of AI/ES applications (knowledge discovery, data mining, and clustering). Here CAST collaborated with Dr. Fred Petry...knowledge discovery system based on clustering techniques; implemented an on-line data browser to the DBMS; completed preliminary efforts to apply object
The VLT LBG Redshift Survey - III. The clustering and dynamics of Lyman-break galaxies at z ˜ 3
NASA Astrophysics Data System (ADS)
Bielby, R.; Hill, M. D.; Shanks, T.; Crighton, N. H. M.; Infante, L.; Bornancini, C. G.; Francke, H.; Héraudeau, P.; Lambas, D. G.; Metcalfe, N.; Minniti, D.; Padilla, N.; Theuns, T.; Tummuangpak, P.; Weilbacher, P.
2013-03-01
We present a catalogue of 2135 galaxy redshifts from the VLT LBG Redshift Survey (VLRS), a spectroscopic survey of z ≈ 3 galaxies in wide fields centred on background quasi-stellar objects. We have used deep optical imaging to select galaxies via the Lyman-break technique. Spectroscopy of the Lyman-break galaxies (LBGs) was then made using the Very Large Telescope (VLT) Visible Multi-Object Spectrograph (VIMOS) instrument, giving a mean redshift of z = 2.79. We analyse the clustering properties of the VLRS sample and also of the VLRS sample combined with the smaller area Keck-based survey of Steidel et al. From the semiprojected correlation function, wp(σ), for the VLRS and combined surveys, we find that the results are well fit with a single power-law model, with clustering scale lengths of r0 = 3.46 ± 0.41 and 3.83 ± 0.24 h-1 Mpc, respectively. We note that the corresponding combined ξ(r) slope is flatter than for local galaxies at γ = 1.5-1.6 rather than γ = 1.8. This flat slope is confirmed by the z-space correlation function, ξ(s), and in the range 10 < s < 100 h-1 Mpc the VLRS shows an ≈2.5σ excess over the Λ cold dark matter (ΛCDM) linear prediction. This excess may be consistent with recent evidence for non-Gaussianity in clustering results at z ≈ 1. We then analyse the LBG z-space distortions using the 2D correlation function, ξ(σ, π), finding for the combined sample a large-scale infall parameter of β = 0.38 ± 0.19 and a velocity dispersion of ⟨w_z^2⟩^{1/2} = 420^{+140}_{-160} km s^{-1}. Based on our measured β, we are able to determine the gravitational growth rate, finding a value of f(z = 3) = 0.99 ± 0.50 (or fσ8 = 0.26 ± 0.13), which is the highest redshift measurement of the growth rate via galaxy clustering and is consistent with ΛCDM. Finally, we constrain the mean halo mass for the LBG population, finding that the VLRS and combined sample suggest mean halo masses of log(MDM/M⊙) = 11.57 ± 0.15 and 11.73 ± 0.07, respectively.
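For reference, the quantities quoted above follow standard parameterizations used in such clustering analyses (these are generic relations, not results specific to this survey):

```latex
\xi(r) = \left(\frac{r}{r_0}\right)^{-\gamma}, \qquad
\beta = \frac{f(z)}{b} \simeq \frac{\Omega_m(z)^{0.55}}{b}, \qquad
f\sigma_8 = \beta\, b\, \sigma_8(z).
```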
Tremsin, Anton S.; Makowska, Małgorzata G.; Perrodin, Didier; ...
2016-04-12
Neutrons are known to be unique probes in situations where other types of radiation fail to penetrate samples and their surrounding structures. In this paper it is demonstrated how thermal and cold neutron radiography can provide time-resolved imaging of materials while they are being processed (e.g. while growing single crystals). The processing equipment, in this case furnaces, and the scintillator materials are opaque to conventional X-ray interrogation techniques. The distribution of the europium activator within a BaBrCl:Eu scintillator (0.1 and 0.5% nominal doping concentrations per mole) is studied in situ during the melting and solidification processes with a temporal resolution of 5–7 s. The strong tendency of the Eu dopant to segregate during the solidification process is observed in repeated cycles, with Eu forming clusters on multiple length scales (only for clusters larger than ~50 µm, as limited by the resolution of the present experiments). It is also demonstrated that the dopant concentration can be quantified even for very low concentration levels (~0.1%) in 10 mm thick samples. The interface between the solid and liquid phases can also be imaged, provided there is a sufficient change in concentration of one of the elements with a sufficient neutron attenuation cross section. Tomographic imaging of the BaBrCl:0.1%Eu sample reveals a strong correlation between crystal fractures and Eu-deficient clusters. The results of these experiments demonstrate the unique capabilities of neutron imaging for in situ diagnostics and the optimization of crystal-growth procedures.
Song, Min; Yu, Hwanjo; Han, Wook-Shin
2011-11-24
Protein-protein interaction (PPI) extraction has been a focal point of much biomedical research and many database curation tools. Both active learning (AL) and semi-supervised SVMs have recently been applied to extract PPIs automatically. In this paper, we explore combining AL with semi-supervised learning (SSL) to improve the performance of the PPI task. We propose a novel PPI extraction technique called PPISpotter that combines deterministic annealing-based SSL and an AL technique to extract protein-protein interactions. In addition, we extract a comprehensive set of features from MEDLINE records by natural language processing (NLP) techniques, which further improve the SVM classifiers. In our feature selection technique, syntactic, semantic, and lexical properties of text are incorporated into feature selection, which boosts the system performance significantly. By conducting experiments with three different PPI corpora, we show that PPISpotter is superior to the other techniques incorporated into semi-supervised SVMs, such as random sampling, clustering, and transductive SVMs, by precision, recall, and F-measure. Our system is a novel, state-of-the-art technique for efficiently extracting protein-protein interaction pairs.
Spitzer Imaging of Planck-Herschel Dusty Proto-Clusters at z=2-3
NASA Astrophysics Data System (ADS)
Cooray, Asantha; Ma, Jingzhe; Greenslade, Joshua; Kubo, Mariko; Nayyeri, Hooshang; Clements, David; Cheng, Tai-An
2018-05-01
We have recently introduced a new proto-cluster selection technique combining Herschel/SPIRE imaging data and the Planck/HFI all-sky survey point source catalog. These sources are identified as Planck point sources coinciding with clumps of Herschel source over-densities with far-IR colors comparable to z=0 ULIRGs redshifted to z=2 to 3. The selection is sensitive to dusty starbursts and obscured QSOs, and we have recovered a couple of the known proto-clusters and close to 30 new proto-clusters. The candidate proto-clusters selected with this technique have far-IR flux densities several times higher than those that are optically selected, such as with the LBG selection, implying that the member galaxies are in a special phase of heightened dusty starburst and dusty QSO activity. This far-IR luminous phase may be short but is likely a necessary piece for understanding the whole stellar mass assembly history of clusters. Moreover, our proto-clusters are missed in optical selections, suggesting that optically selected proto-clusters alone do not provide adequate statistics, and a comparison of the far-IR and optically selected clusters may reveal the importance of dusty stellar mass assembly. Here, we propose IRAC observations of six of the highest priority new proto-clusters, to establish the validity of the technique and to determine the total stellar mass through SED models. For a modest observing time the science program will have a substantial impact on an upcoming science topic in cosmology, with implications for observations with JWST and WFIRST to understand the mass assembly in the universe.
Manju, Md Abu; Candel, Math J J M; Berger, Martijn P F
2014-07-10
In this paper, the optimal sample sizes at the cluster and person levels for each of two treatment arms are obtained for cluster randomized trials where the cost-effectiveness of treatments on a continuous scale is studied. The optimal sample sizes maximize the efficiency or power for a given budget or minimize the budget for a given efficiency or power. Optimal sample sizes require information on the intra-cluster correlations (ICCs) for effects and costs, the correlations between costs and effects at individual and cluster levels, the ratio of the variance of effects translated into costs to the variance of the costs (the variance ratio), sampling and measuring costs, and the budget. When planning a study, information on the model parameters is usually not available. To overcome this local optimality problem, the current paper also presents maximin sample sizes. The maximin sample sizes turn out to be rather robust against misspecifying the correlation between costs and effects at the cluster and individual levels but may lose much efficiency when misspecifying the variance ratio. The robustness of the maximin sample sizes against misspecifying the ICCs depends on the variance ratio. The maximin sample sizes are robust under misspecification of the ICC for costs for realistic values of the variance ratio greater than one but not robust under misspecification of the ICC for effects. Finally, we show how to calculate optimal or maximin sample sizes that yield sufficient power for a test on the cost-effectiveness of an intervention.
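For orientation only: under a simple cost model with cluster-level cost c_cluster and person-level cost c_person, the classical cost-optimal number of persons per cluster in a cluster randomized trial is given by the textbook power-based result below; the paper's optimal and maximin designs additionally involve the ICCs of costs and effects, their correlations, and the variance ratio, so this formula is a simplification, not the paper's derivation.

```latex
n_{\mathrm{opt}} \;=\; \sqrt{\frac{c_{\mathrm{cluster}}}{c_{\mathrm{person}}}\;\frac{1-\rho}{\rho}},
```

where ρ is the intra-cluster correlation of the outcome.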
VizieR Online Data Catalog: LAMOST survey of star clusters in M31. II. (Chen+, 2016)
NASA Astrophysics Data System (ADS)
Chen, B.; Liu, X.; Xiang, M.; Yuan, H.; Huang, Y.; Shi, J.; Fan, Z.; Huo, Z.; Wang, C.; Ren, J.; Tian, Z.; Zhang, H.; Liu, G.; Cao, Z.; Zhang, Y.; Hou, Y.; Wang, Y.
2016-09-01
We select a sample of 306 massive star clusters observed with the Large Sky Area Multi-Object Fibre Spectroscopic Telescope (LAMOST) in the vicinity fields of M31 and M33. Massive clusters in our sample are all selected from the catalog presented in Paper I (Chen et al. 2015, Cat. J/other/RAA/15.1392), including five newly discovered clusters selected with the SDSS photometry, three newly confirmed, and 298 previously known clusters from the Revised Bologna Catalogue (RBC; Galleti et al. 2012, Cat. V/143; http://www.bo.astro.it/M31/). Since then another two objects, B341 and B207, have also been observed with LAMOST, and they are included in the current analysis. The current sample does not include those objects listed in Paper I that were selected from Johnson et al. 2012 (Cat. J/ApJ/752/95), since most of them are young but not very massive. All objects were observed with LAMOST between 2011 September and 2014 June. Table1 lists the name, position, and radial velocity of all sample clusters analyzed in the current work. The LAMOST spectra cover the wavelength range 3700-9000Å at a resolving power of R~1800. Details about the observations and data reduction can be found in Paper I. The median signal-to-noise ratios (S/N) per pixel at 4750 and 7450Å of the spectra of all clusters in the current sample are, respectively, 14 and 37. Essentially all spectra have S/N(4750Å)>5 except for the spectra of 18 clusters. The latter have S/N(7450Å)>10. Peacock et al. 2010 (Cat. J/MNRAS/402/803) retrieved images of M31 star clusters and candidates from the SDSS archive and extracted ugriz aperture photometric magnitudes from those objects using SExtractor. They present a catalog containing homogeneous ugriz photometry of 572 star clusters and 373 candidates. Among them, 299 clusters are in our sample. (2 data files).
Hierarchical clustering of EMD based interest points for road sign detection
NASA Astrophysics Data System (ADS)
Khan, Jesmin; Bhuiyan, Sharif; Adhami, Reza
2014-04-01
This paper presents an automatic road traffic sign detection and recognition system based on hierarchical clustering of interest points and joint transform correlation. The proposed algorithm consists of the three following stages: interest point detection, clustering of those points and similarity search. At the first stage, discriminative, rotation- and scale-invariant interest points are selected from the image edges based on the 1-D empirical mode decomposition (EMD). We propose a two-step unsupervised clustering technique, which is adaptive and based on two criteria. In this context, the detected points are initially clustered based on stable local features related to brightness and color, which are extracted using a Gabor filter. Then points belonging to each partition are reclustered depending on the dispersion of the points in the initial cluster, using a position feature. This two-step hierarchical clustering yields the possible candidate road signs or regions of interest (ROIs). Finally, a fringe-adjusted joint transform correlation (JTC) technique is used for matching the unknown signs with the existing known reference road signs stored in the database. The presented framework provides a novel way to detect a road sign from natural scenes, and the results demonstrate the efficacy of the proposed technique, which yields a very low false-hit rate.
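A compact sketch of the two-step grouping idea (appearance-based clustering followed by position-based re-clustering); the feature vectors are random stand-ins for the Gabor responses, and the dispersion heuristic is an assumption, not the paper's rule.

```python
# Two-step grouping sketch: cluster interest points on appearance features first,
# then re-cluster each group by position. Features are random stand-ins for the
# Gabor-based colour/brightness descriptors; the dispersion heuristic is assumed.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(5)
n_pts = 300
appearance = rng.normal(size=(n_pts, 8))          # placeholder appearance features
position = rng.uniform(0, 640, size=(n_pts, 2))   # (x, y) pixel coordinates

step1 = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(appearance)

roi_centres = []
for label in np.unique(step1):
    pts = position[step1 == label]
    # more spatially dispersed groups are split into more candidate regions
    k = int(np.clip(np.ceil(pts.std() / 150.0), 1, len(pts)))
    sub = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(pts)
    roi_centres.extend(pts[sub == s].mean(axis=0) for s in np.unique(sub))
print("candidate regions of interest:", len(roi_centres))
```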
NASA Astrophysics Data System (ADS)
Seetha, D.; Velraj, G.
2015-10-01
The characterization of ancient materials brings back more evidence of the lifestyles of ancient people. In this study, archaeological pottery shards recently excavated from Kodumanal, Erode District in Tamilnadu, South India, were investigated. The experimental results reveal the elemental and mineral composition of the pottery shards. The FT-IR technique indicates the mineralogy and shows that the firing temperature of the samples is less than 800 °C, in an oxidizing/reducing atmosphere, and XRD was used as a complementary technique for the mineralogy. A thorough scientific study of SEM-EDS data with the help of a statistical approach to find the provenance of the selected pot shards had not yet been performed. EDS and XRF results revealed that the investigated samples contain the elements O, Si, Al, Fe, Mn, Mg, Ca, Ti, K and Na in different compositions. To establish the provenance (same or different origin) of the pottery samples, the Al and Si concentration ratio as well as hierarchical cluster analysis (HCA) was used, and the results are correlated.
ToF-SIMS observation for evaluating the interaction between amyloid β and lipid membranes.
Aoyagi, Satoka; Shimanouchi, Toshinori; Kawashima, Tomoko; Iwai, Hideo
2015-04-01
The adsorption behaviour of amyloid beta (Aβ), thought to be a key peptide for understanding Alzheimer's disease, was investigated by means of time-of-flight secondary ion mass spectrometry (ToF-SIMS). Aβ aggregation depends on the lipid membrane condition, although this is not yet fully understood. In this study, Aβ samples on different lipid membranes, 1,2-dipalmitoyl-sn-glycero-3-phosphocholine (DPPC), 1,2-dimyristoyl-sn-glycero-3-phosphocholine (DMPC) and 1,2-dioleoyl-sn-glycero-3-phosphocholine (DOPC), were observed with ToF-SIMS, and the complex ToF-SIMS data of the Aβ samples were interpreted using data analysis techniques such as principal component analysis (PCA), gentle-SIMS (G-SIMS) and g-ogram. DOPC and DMPC are liquid crystals at room temperature, while DPPC is a gel at room temperature. As primary ion beams, Bi3(+) and Ar cluster ion beams were used, and the effectiveness of the Ar cluster ion beam for evaluating biomolecules was also studied. The secondary ion images of the peptide fragment ions indicated by G-SIMS and g-ogram were consistent with the PCA results. It is suggested that Aβ is adsorbed homogeneously on the liquid-crystalline-phase lipid membranes, while it aggregates along the lipid on the gel-phase lipid membrane. Moreover, in the results using the Ar cluster, the influence of contamination was reduced.
El Alfy, Mohamed; Lashin, Aref; Abdalla, Fathy; Al-Bassam, Abdulaziz
2017-10-01
Rapid economic expansion poses serious problems for groundwater resources in arid areas, which typically have high rates of groundwater depletion. In this study, an integration of hydrochemical investigations involving chemical and statistical analyses is conducted to assess the factors controlling hydrochemistry and potential pollution in an arid region. Fifty-four groundwater samples were collected from the Dhurma aquifer in Saudi Arabia, and twenty-one physicochemical variables were examined for each sample. Spatial patterns of salinity and nitrate were mapped using fitted variograms. The nitrate spatial distribution shows that nitrate pollution is a persistent problem affecting a wide area of the aquifer. The hydrochemical investigations and cluster analysis reveal four significant clusters of groundwater zones. Five main factors were extracted, which explain >77% of the total data variance. These factors indicated that the chemical characteristics of the groundwater were influenced by rock-water interactions and anthropogenic factors. The identified clusters and factors were validated with hydrochemical investigations. The geogenic factors include the dissolution of various minerals (calcite, aragonite, gypsum, anhydrite, halite and fluorite) and ion exchange processes. The anthropogenic factors include the impact of irrigation return flows and the application of potassium, nitrate, and phosphate fertilizers. Over time, these anthropogenic factors will most likely contribute to further declines in groundwater quality. Copyright © 2017 Elsevier Ltd. All rights reserved.
Cluster Masses Derived from X-ray and Sunyaev-Zeldovich Effect Measurements
NASA Technical Reports Server (NTRS)
Laroque, S.; Joy, Marshall; Bonamente, M.; Carlstrom, J.; Dawson, K.
2003-01-01
We infer the gas mass and total gravitational mass of 11 clusters using two different methods: analysis of X-ray data from the Chandra X-ray Observatory and analysis of centimeter-wave Sunyaev-Zel'dovich Effect (SZE) data from the BIMA and OVRO interferometers. This flux-limited sample of clusters from the BCS cluster catalogue was chosen so as to be well above the surface brightness limit of the ROSAT All Sky Survey; this is therefore an orientation-unbiased sample. The gas mass fraction, f_g, is calculated for each cluster using both X-ray and SZE data, and the results are compared at a fiducial radius of r_500. Comparison of the X-ray and SZE results for this orientation-unbiased sample allows us to constrain cluster systematics, such as clumping of the intracluster medium. We derive an upper limit on Omega_M, assuming that the mass composition of clusters within r_500 reflects the universal mass composition: Omega_M h_100 ≲ Omega_B / f_g. We also demonstrate how the mean f_g derived from the sample can be used to estimate the masses of clusters discovered by upcoming deep SZE surveys.
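The cosmological argument sketched in this abstract is the usual baryon-fraction bound (written here in generic notation, not necessarily the authors' exact normalization): if the baryon content of clusters within r_500 is representative of the universe, then

```latex
f_g \;\le\; \frac{M_{\mathrm{baryon}}}{M_{\mathrm{total}}} \;\simeq\; \frac{\Omega_B}{\Omega_M}
\quad\Longrightarrow\quad
\Omega_M \;\lesssim\; \frac{\Omega_B}{f_g}.
```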
Physical properties of star clusters in the outer LMC as observed by the DES
Pieres, A.; Santiago, B.; Balbinot, E.; ...
2016-05-26
The Large Magellanic Cloud (LMC) harbors a rich and diverse system of star clusters, whose ages, chemical abundances, and positions provide information about the LMC history of star formation. We use Science Verification imaging data from the Dark Energy Survey to increase the census of known star clusters in the outer LMC and to derive physical parameters for a large sample of such objects using a spatially and photometrically homogeneous data set. Our sample contains 255 visually identified cluster candidates, of which 109 were not listed in any previous catalog. We quantify the crowding effect for the stellar sample produced by the DES Data Management pipeline and conclude that the stellar completeness is < 10% inside typical LMC cluster cores. We therefore develop a pipeline to sample and measure stellar magnitudes and positions around the cluster candidates using DAOPHOT. We also implement a maximum-likelihood method to fit individual density profiles and colour-magnitude diagrams. For 117 (from a total of 255) of the cluster candidates (28 uncatalogued clusters), we obtain reliable ages, metallicities, distance moduli and structural parameters, confirming their nature as physical systems. The distribution of cluster metallicities shows a radial dependence, with no clusters more metal-rich than [Fe/H] ~ -0.7 beyond 8 kpc from the LMC center. Furthermore, the age distribution has two peaks at ≃ 1.2 Gyr and ≃ 2.7 Gyr.
Novikov, Alexey; Caroff, Martine; Della-Negra, Serge; Depauw, Joël; Fallavier, Mireille; Le Beyec, Yvon; Pautrat, Michèle; Schultz, J Albert; Tempez, Agnès; Woods, Amina S
2005-01-01
A Au-Si liquid metal ion source which produces Au(n) clusters over a large range of sizes was used to study the dependence of both the molecular ion desorption yield and the damage cross-section on the size (n = 1 to 400) and on the kinetic energy (E = 10 to 500 keV) of the clusters used to bombard bioorganic surfaces. Three pure peptides with molecular masses between 750 and 1200 Da were used without matrix. [M+H](+) and [M+cation](+) ion emission yields were enhanced by as much as three orders of magnitude when bombarding with Au(400) (4+) instead of monatomic Au(+), yet very little damage was induced in the samples. A 100-fold increase in the molecular ion yield was observed when the incident energy of Au(9) (+) was varied from 10 to 180 keV. Values of emission yields and damage cross-sections are presented as a function of cluster size and energy. The possibility to adjust both cluster size and energy, depending on the application, makes the analysis of biomolecules by secondary ion mass spectrometry an extremely powerful and flexible technique, particularly when combined with orthogonal time-of-flight mass spectrometry that then allows fast measurements using small primary ion beam currents. Copyright (c) 2005 John Wiley & Sons, Ltd.
Multilevel Hierarchical Kernel Spectral Clustering for Real-Life Large Scale Complex Networks
Mall, Raghvendra; Langone, Rocco; Suykens, Johan A. K.
2014-01-01
Kernel spectral clustering corresponds to a weighted kernel principal component analysis problem in a constrained optimization framework. The primal formulation leads to an eigen-decomposition of a centered Laplacian matrix at the dual level. The dual formulation makes it possible to build a model on a representative subgraph of the large-scale network in the training phase, and the model parameters are estimated in the validation stage. The KSC model has a powerful out-of-sample extension property which allows cluster affiliation to be inferred for the unseen nodes of the big data network. In this paper we exploit the structure of the projections in the eigenspace during the validation stage to automatically determine a set of increasing distance thresholds. We use these distance thresholds in the test phase to obtain multiple levels of hierarchy for the large scale network. The hierarchical structure in the network is determined in a bottom-up fashion. We empirically showcase that real-world networks have a multilevel hierarchical organization which cannot be detected efficiently by several state-of-the-art large scale hierarchical community detection techniques such as the Louvain, OSLOM and Infomap methods. We show that a major advantage of our proposed approach is the ability to locate good quality clusters at both the finer and coarser levels of hierarchy using internal cluster quality metrics on 7 real-life networks. PMID:24949877
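A plain spectral-clustering sketch on a small synthetic data set is given below for orientation; it uses an RBF affinity and k-means on the spectral embedding, but it is not the KSC model with its out-of-sample extension or hierarchical threshold selection.

```python
# Plain spectral clustering on synthetic data (not the KSC model of the paper).
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import SpectralClustering

# Three well-separated synthetic groups stand in for a networked data set.
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [4, 4], [0, 5]],
                  cluster_std=0.8, random_state=0)

# RBF affinity + k-means assignment on the spectral embedding.
model = SpectralClustering(n_clusters=3, affinity="rbf", gamma=0.2,
                           assign_labels="kmeans", random_state=0)
labels = model.fit_predict(X)
print("cluster sizes:", np.bincount(labels))
```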
X-ray morphological study of galaxy cluster catalogues
NASA Astrophysics Data System (ADS)
Democles, Jessica; Pierre, Marguerite; Arnaud, Monique
2016-07-01
Context: The intra-cluster medium distribution, as probed by X-ray morphology-based analysis, gives a good indication of the system's dynamical state. In the race for the determination of precise scaling relations and an understanding of their scatter, the dynamical state offers valuable information. Method: We develop the analysis of the centroid shift so that it can be applied to characterize galaxy cluster surveys such as the XXL survey or high-redshift cluster samples. We use it together with the surface brightness concentration parameter and the offset between the X-ray peak and the brightest cluster galaxy in the context of the XXL bright cluster sample (Pacaud et al. 2015) and a set of high-redshift massive clusters detected by Planck and SPT and observed by both the XMM-Newton and Chandra observatories. Results: Using the wide redshift coverage of the XXL sample, we see no trend of the dynamical state of the systems with redshift.
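The centroid shift referred to here is commonly defined in the X-ray morphology literature as the scatter of the separation Δ_i between the X-ray peak and the centroid measured in a series of N apertures, normalized by R_500 (the exact implementation used in the paper may differ):

```latex
\langle w \rangle \;=\; \left[\frac{1}{N-1}\sum_{i=1}^{N}\left(\Delta_i - \bar{\Delta}\right)^2\right]^{1/2} \times \frac{1}{R_{500}}.
```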
Mining the National Career Assessment Examination Result Using Clustering Algorithm
NASA Astrophysics Data System (ADS)
Pagudpud, M. V.; Palaoag, T. T.; Padirayon, L. M.
2018-03-01
Education is an essential process today which prompts authorities to discover and establish innovative strategies for educational improvement. This study applied data mining using a clustering technique for knowledge extraction from the National Career Assessment Examination (NCAE) results in the Division of Quirino. The NCAE is an examination given to all grade 9 students in the Philippines to assess their aptitudes in different domains. Clustering the students is helpful in identifying students' learning considerations. With the use of the RapidMiner tool, clustering algorithms such as Density-Based Spatial Clustering of Applications with Noise (DBSCAN), k-means, k-medoids, expectation maximization clustering, and support vector clustering were analyzed. The silhouette indexes of these clustering algorithms were compared, and the result showed that the k-means algorithm with k = 3 and a silhouette index equal to 0.196 is the most appropriate clustering algorithm to group the students. Three groups were formed, having 477 students in the determined group (cluster 0), 310 proficient students (cluster 1) and 396 developing students (cluster 2). The data mining technique used in this study is essential in extracting useful information from the NCAE results to better understand the abilities of students, which in turn is a good basis for adopting teaching strategies.
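A minimal sketch of the model-selection step described above (k-means solutions compared by silhouette index); the feature matrix is a random placeholder standing in for the NCAE domain scores.

```python
# Sketch: choose k for k-means by the silhouette index (placeholder data).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(6)
X = rng.normal(size=(1183, 9))   # placeholder: 1183 examinees x 9 aptitude domains

for k in (2, 3, 4, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, "silhouette:", round(silhouette_score(X, labels), 3))
```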
Planck/SDSS cluster mass and gas scaling relations for a volume-complete redMaPPer sample
NASA Astrophysics Data System (ADS)
Jimeno, Pablo; Diego, Jose M.; Broadhurst, Tom; De Martino, I.; Lazkoz, Ruth
2018-07-01
Using Planck satellite data, we construct Sunyaev-Zel'dovich (SZ) gas pressure profiles for a large, volume-complete sample of optically selected clusters. We have defined a sample of over 8000 redMaPPer clusters from the Sloan Digital Sky Survey, within the volume-complete redshift region 0.100
Toward An Understanding of Cluster Evolution: A Deep X-Ray Selected Cluster Catalog from ROSAT
NASA Technical Reports Server (NTRS)
Jones, Christine; Oliversen, Ronald (Technical Monitor)
2002-01-01
In the past year, we have focussed on studying individual clusters found in this sample with Chandra, as well as using Chandra to measure the luminosity-temperature relation for a sample of distant clusters identified through the ROSAT study, and finally we are continuing our study of fossil groups. For the luminosity-temperature study, we compared a sample of nearby clusters with a sample of distant clusters and, for the first time, measured a significant change in the relation as a function of redshift (Vikhlinin et al. in final preparation for submission). We also used our ROSAT analysis to select and propose Chandra observations of individual clusters. We are now analyzing the Chandra observations of the distant cluster A520, which appears to have undergone a recent merger. Finally, we have completed the analysis of the fossil groups identified in ROSAT observations. In the past few months, we have derived X-ray fluxes and luminosities as well as X-ray extents for an initial sample of 89 objects. Based on the X-ray extents and the lack of bright galaxies, we have identified 16 fossil groups. We are comparing their X-ray and optical properties with those of optically rich groups. A paper is being readied for submission (Jones, Forman, and Vikhlinin in preparation).
Ecological tolerances of Miocene larger benthic foraminifera from Indonesia
NASA Astrophysics Data System (ADS)
Novak, Vibor; Renema, Willem
2018-01-01
To provide a comprehensive palaeoenvironmental reconstruction based on larger benthic foraminifera (LBF), a quantitative analysis of their assemblage composition is needed. Besides microfacies analysis, which includes the environmental preferences of foraminiferal taxa, statistical analyses should also be employed. Therefore, detrended correspondence analysis and cluster analysis were performed on relative abundance data of identified LBF assemblages deposited in mixed carbonate-siliciclastic (MCS) systems and blue-water (BW) settings. The studied MCS system localities include ten sections from the central part of the Kutai Basin in East Kalimantan, ranging from late Burdigalian to Serravallian age. The BW samples were collected from eleven sections of the Bulu Formation on Central Java, dated as Serravallian. Results from detrended correspondence analysis reveal significant differences between these two environmental settings. Cluster analysis produced five clusters of samples; clusters 1 and 2 comprise dominantly MCS samples, clusters 3 and 4 are dominated by BW samples, and cluster 5 shows a mixed composition with both MCS and BW samples. The results of the cluster analysis were then subjected to indicator species analysis, which yielded three groups among LBF taxa: typical assemblage indicators, regularly occurring taxa and rare taxa. By interpreting the results of the detrended correspondence analysis, cluster analysis and indicator species analysis, along with the environmental preferences of the identified LBF taxa, a palaeoenvironmental model is proposed for the distribution of LBF in Miocene MCS systems and adjacent BW settings of Indonesia.
Rosa, Marta; Micciarelli, Marco; Laio, Alessandro; Baroni, Stefano
2016-09-13
We introduce a method to evaluate the relative populations of different conformers of molecular species in solution, aiming at quantum mechanical accuracy, while keeping the computational cost at a nearly molecular-mechanics level. This goal is achieved by combining long classical molecular-dynamics simulations to sample the free-energy landscape of the system, advanced clustering techniques to identify the most relevant conformers, and thermodynamic perturbation theory to correct the resulting populations, using quantum-mechanical energies from density functional theory. A quantitative criterion for assessing the accuracy thus achieved is proposed. The resulting methodology is demonstrated in the specific case of cyanin (cyanidin-3-glucoside) in water solution.
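A minimal sketch of the final correction step described in this abstract, assuming per-cluster frame counts from the classical trajectory and single-point QM-MM energy differences; the temperature, counts, and energies are illustrative placeholders, not values from the paper.

```python
import numpy as np

# Minimal sketch of the reweighting step: classical cluster populations are
# corrected with quantum-mechanical energy differences via thermodynamic
# perturbation theory. All inputs are illustrative placeholders.
kB = 0.0019872041          # Boltzmann constant, kcal/(mol K)
T = 300.0                  # temperature in K

n_classical = np.array([5200, 3100, 1700])   # MD frames per conformer cluster
dE = np.array([0.0, -0.4, 0.9])              # E_QM - E_MM per cluster (kcal/mol)

weights = n_classical * np.exp(-dE / (kB * T))
populations = weights / weights.sum()
print(populations)
```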
Quantitative analysis of nano-pore geomaterials and representative sampling for digital rock physics
NASA Astrophysics Data System (ADS)
Yoon, H.; Dewers, T. A.
2014-12-01
Geomaterials containing nano-pores (e.g., shales and carbonate rocks) have become increasingly important for emerging problems such as unconventional gas and oil resources, enhanced oil recovery, and geologic storage of CO2. Accurate prediction of coupled geophysical and chemical processes at the pore scale requires realistic representation of pore structure and topology. This is especially true for chalk materials, where pore networks are small and complex, and require characterization at sub-micron scale. In this work, we apply laser scanning confocal microscopy to characterize pore structures and microlithofacies at micron- and greater scales and dual focused ion beam-scanning electron microscopy (FIB-SEM) for 3D imaging of nanometer-to-micron scale microcracks and pore distributions. With imaging techniques advanced for nano-pore characterization, a problem of scale with FIB-SEM images is how to take nanometer scale information and apply it to the thin-section or larger scale. In this work, several texture characterization techniques including graph-based spectral segmentation, support vector machine, and principal component analysis are applied for segmentation clusters represented by 1-2 FIB-SEM samples per each cluster. Geometric and topological properties are analyzed and lattice-Boltzmann method (LBM) is used to obtain permeability at several different scales. Upscaling of permeability to the Darcy scale (e.g., the thin-section scale) with image dataset will be discussed with emphasis on understanding microfracture-matrix interaction, representative volume for FIB-SEM sampling, and multiphase flow and reactive transport. Funding from the DOE Basic Energy Sciences Geosciences Program is gratefully acknowledged. Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.
Within-Cluster and Across-Cluster Matching with Observational Multilevel Data
ERIC Educational Resources Information Center
Kim, Jee-Seon; Steiner, Peter M.; Hall, Courtney; Thoemmes, Felix
2013-01-01
When randomized experiments cannot be conducted in practice, propensity score (PS) techniques for matching treated and control units are frequently used for estimating causal treatment effects from observational data. Despite the popularity of PS techniques, they are not yet well studied for matching multilevel data where selection into treatment…
NASA Astrophysics Data System (ADS)
Burns, Jack
Galaxy clusters are assembled through large and small mergers, which are the most energetic events ("bangs") since the Big Bang. Cluster mergers stir the ICM, creating shocks and turbulence that are illuminated by Mpc-sized radio features called relics and halos. These shocks heat the ICM and are detected in x-rays via thermal emission. Disturbed morphologies in x-ray surface brightness and temperature are direct evidence for cluster mergers. In the radio, relics (in the outskirts of the clusters) and halos (located near the cluster core) are clear signposts of recent mergers. Our recent cosmological simulations suggest that around a merger event, radio emission peaks very sharply (and briefly) while the x-ray emission rises and decays slowly. Hence, galaxy clusters that show both luminous x-ray emission and radio relics/halos are clear candidates for very recent mergers. We propose to analyze a unique sample of 48 galaxy clusters with (i) known radio relics and/or halos and (ii) significant archival x-ray observations (≥50 ksec) from Chandra and/or XMM. We will use a new x-ray data analysis pipeline, implemented on a parallel-processor supercomputer, to create x-ray surface brightness, high fidelity temperature, and pressure maps of these clusters in order to study merging activity. In addition, we will use a control sample of clusters from the HIFLUGCS catalog which do not show radio relics/halos or any significant x-ray surface brightness substructure, and are thus devoid of recent mergers. The temperature maps will be made using 3 different map-making techniques: Weighted Voronoi Tessellation, Adaptive Circular Binning, and Contour Binning. We also plan to use archival Suzaku data for 22 clusters in our sample and study the x-ray temperatures at the outskirts of the clusters. All 48 clusters have archival radio data at ≤1.4 GHz which will be re-analyzed using advanced algorithms in NRAO's CASA software. We also have new radio data on a subset of these clusters and have proposed to observe more of them with the increased sensitivity of the JVLA and GMRT at 0.25-1.4 GHz. Using the systematically analyzed x-ray and radio data, we propose to pursue the detailed link between cluster mergers and the formation of radio relics/halos. (a) How do radio relics form? Radio relics are believed to be created via re-acceleration of cosmic ray electrons through diffusive shock acceleration, a 1st order Fermi mechanism. Hence, there should be a correlation between shocks detected in the x-ray and radio. We plan to use our newly developed 2-D shock-finder based on jumps within x-ray temperature maps, and complement the results with radio Mach numbers derived from radio spectral indices. Shocks detected in our simulations using a 3-D shock-finder will be used to understand the effects of projection in observations. (b) How do radio halos form? It is not clear if the formation of radio halos is due to turbulent acceleration (a 2nd order Fermi process) or to a more efficient 1st order Fermi mechanism via distributed small-scale shocks. Since radio halos reside in merging clusters, the x-ray temperature structure should show the un-relaxed nature of the cluster. We will study this through temperature asymmetry and power ratios (between two multipoles). We also propose to use pressure maps to derive a 2-D power spectrum of pressure fluctuations and deduce the turbulent velocity field. We will then derive the associated radio power and spectral indices to compare with the radio observations.
We will test our results using clusters with and without radio halos. We will make these high fidelity temperature, surface brightness, pressure and entropy maps available to the astronomical community via the National Virtual Observatory. We will also make our x-ray temperature map-making scripts implemented on parallel supercomputers available for community use.
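One of the proposed measurements, radio Mach numbers derived from radio spectral indices, can be illustrated with the standard diffusive shock acceleration relation; the sketch below is a generic illustration with invented indices, not the proposal's actual pipeline.

```python
import math

# Illustrative conversion from a radio injection spectral index to a shock
# Mach number under the standard diffusive shock acceleration relation,
# alpha_inj = (M^2 + 3) / (2 (M^2 - 1)). Example indices are invented.
def mach_from_injection_index(alpha_inj):
    return math.sqrt((2.0 * alpha_inj + 3.0) / (2.0 * alpha_inj - 1.0))

for alpha in (0.7, 0.9, 1.2):
    print(alpha, round(mach_from_injection_index(alpha), 2))
```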
NASA Technical Reports Server (NTRS)
Carvalho, L. M. V.; Rickenbach, T.
1999-01-01
Satellite infrared (IR) and visible (VIS) images from the Tropical Ocean Global Atmosphere - Coupled Ocean Atmosphere Response Experiment (TOGA-COARE) experiment are investigated through the use of Clustering Analysis. The clusters are obtained from the values of IR and VIS counts and the local variance for both channels. The clustering procedure is based on the standardized histogram of each variable obtained from 179 pairs of images. A new approach to classify high clouds using only IR and the clustering technique is proposed. This method allows the separation of the enhanced convection in two main classes: convective tops, more closely related to the most active core of the storm, and convective systems, which produce regions of merged, thick anvil clouds. The resulting classification of different portions of cloudiness is compared to the radar reflectivity field for intensive events. Convective Systems and Convective Tops are followed during their life cycle using the IR clustering method. The areal coverage of precipitation and features related to convective and stratiform rain is obtained from the radar for each stage of the evolving Mesoscale Convective Systems (MCS). In order to compare the IR clustering method with a simple threshold technique, two IR thresholds (Tir) were used to identify different portions of cloudiness, Tir=240K which roughly defines the extent of all cloudiness associated with the MCS, and Tir=220K which indicates the presence of deep convection. It is shown that the IR clustering technique can be used as a simple alternative to identify the actual portion of convective and stratiform rainfall.
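As a rough illustration of the simple threshold comparison mentioned at the end of this abstract, the sketch below applies the 240 K and 220 K brightness-temperature cuts to a synthetic IR image; the array is random and stands in for real satellite data.

```python
import numpy as np

# Illustrative comparison of the two IR brightness-temperature thresholds
# mentioned above: 240 K for the full MCS cloud shield and 220 K for deep
# convection. `tir` stands in for an IR brightness-temperature image (K).
rng = np.random.default_rng(0)
tir = rng.uniform(190.0, 300.0, size=(128, 128))

mcs_shield = tir <= 240.0        # all cloudiness associated with the MCS
deep_convection = tir <= 220.0   # coldest tops, deep convective cores

print("MCS shield fraction:", mcs_shield.mean())
print("Deep convection fraction:", deep_convection.mean())
```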
NASA Astrophysics Data System (ADS)
Deshpande, Amruta J.; Hughes, John P.; Wittman, David
2017-04-01
We continue the study of the first sample of shear-selected clusters from the initial 8.6 square degrees of the Deep Lens Survey (DLS), a sample with well-defined selection criteria corresponding to the highest ranked shear peaks in the survey area. We aim to characterize the weak lensing selection by examining the sample's X-ray properties. There are multiple X-ray clusters associated with nearly all the shear peaks: 14 X-ray clusters corresponding to seven DLS shear peaks. An additional three X-ray clusters cannot be definitively associated with shear peaks, mainly due to large positional offsets between the X-ray centroid and the shear peak. Here we report on the XMM-Newton properties of the 17 X-ray clusters. The X-ray clusters display a wide range of luminosities and temperatures; the L_X-T_X relation we determine for the shear-associated X-ray clusters is consistent with X-ray cluster samples selected without regard to dynamical state, while it is inconsistent with self-similarity. For a subset of the sample, we measure X-ray masses using temperature as a proxy, and compare to weak lensing masses determined by the DLS team. The resulting mass comparison is consistent with equality. The X-ray and weak lensing masses show considerable intrinsic scatter (~48%), which is consistent with X-ray selected samples when their X-ray and weak lensing masses are independently determined. Some of the data presented herein were obtained at the W.M. Keck Observatory, which is operated as a scientific partnership among the California Institute of Technology, the University of California, and the National Aeronautics and Space Administration. The Observatory was made possible by the generous financial support of the W. M. Keck Foundation.
NASA Technical Reports Server (NTRS)
Eigen, D. J.; Fromm, F. R.; Northouse, R. A.
1974-01-01
A new clustering algorithm is presented that is based on dimensional information. The algorithm includes an inherent feature selection criterion, which is discussed. Further, a heuristic method for choosing the proper number of intervals for a frequency distribution histogram, a feature necessary for the algorithm, is presented. The algorithm, although usable as a stand-alone clustering technique, is then utilized as a global approximator. Local clustering techniques and configuration of a global-local scheme are discussed, and finally the complete global-local and feature selector configuration is shown in application to a real-time adaptive classification scheme for the analysis of remote sensed multispectral scanner data.
Infrared Multiple Photon Dissociation Spectroscopy Of Metal Cluster-Adducts
NASA Astrophysics Data System (ADS)
Cox, D. M.; Kaldor, A.; Zakin, M. R.
1987-01-01
Recent development of the laser vaporization technique combined with mass-selective detection has made possible new studies of the fundamental chemical and physical properties of unsupported transition metal clusters as a function of the number of constituent atoms. A variety of experimental techniques have been developed in our laboratory to measure ionization threshold energies, magnetic moments, and gas phase reactivity of clusters. However, studies have so far been unable to determine the cluster structure or the chemical state of chemisorbed species on gas phase clusters. The application of infrared multiple photon dissociation IRMPD to obtain the IR absorption properties of metal cluster-adsorbate species in a molecular beam is described here. Specifically using a high power, pulsed CO2 laser as the infrared source, the IRMPD spectrum for methanol chemisorbed on small iron clusters is measured as a function of the number of both iron atoms and methanols in the complex for different methanol isotopes. Both the feasibility and potential utility of IRMPD for characterizing metal cluster-adsorbate interactions are demonstrated. The method is generally applicable to any cluster or cluster-adsorbate system dependent only upon the availability of appropriate high power infrared sources.
Mao, J.; Fang, X.; Lan, Y.; Schimmelmann, A.; Mastalerz, Maria; Xu, L.; Schmidt-Rohr, K.
2010-01-01
We have used advanced and quantitative solid-state nuclear magnetic resonance (NMR) techniques to investigate structural changes in a series of type II kerogen samples from the New Albany Shale across a range of maturity (vitrinite reflectance R0 from 0.29% to 1.27%). Specific functional groups such as CH3, CH2, alkyl CH, aromatic CH, aromatic C-O, and other nonprotonated aromatics, as well as "oil prone" and "gas prone" carbons, have been quantified by 13C NMR; atomic H/C and O/C ratios calculated from the NMR data agree with elemental analysis. Relationships between NMR structural parameters and vitrinite reflectance, a proxy for thermal maturity, were evaluated. The aromatic cluster size is probed in terms of the fraction of aromatic carbons that are protonated (???30%) and the average distance of aromatic C from the nearest protons in long-range H-C dephasing, both of which do not increase much with maturation, in spite of a great increase in aromaticity. The aromatic clusters in the most mature sample consist of ???30 carbons, and of ???20 carbons in the least mature samples. Proof of many links between alkyl chains and aromatic rings is provided by short-range and long-range 1H-13C correlation NMR. The alkyl segments provide most H in the samples; even at a carbon aromaticity of 83%, the fraction of aromatic H is only 38%. While aromaticity increases with thermal maturity, most other NMR structural parameters, including the aromatic C-O fractions, decrease. Aromaticity is confirmed as an excellent NMR structural parameter for assessing thermal maturity. In this series of samples, thermal maturation mostly increases aromaticity by reducing the length of the alkyl chains attached to the aromatic cores, not by pronounced growth of the size of the fused aromatic ring clusters. ?? 2010 Elsevier Ltd. All rights reserved.
2012-01-01
Background Estimation of vaccination coverage at the local level is essential to identify communities that may require additional support. Cluster surveys can be used in resource-poor settings, when population figures are inaccurate. To be feasible, cluster samples need to be small, without losing robustness of results. The clustered LQAS (CLQAS) approach has been proposed as an alternative, as smaller sample sizes are required. Methods We explored (i) the efficiency of cluster surveys of decreasing sample size through bootstrapping analysis and (ii) the performance of CLQAS under three alternative sampling plans to classify local VC, using data from a survey carried out in Mali after mass vaccination against meningococcal meningitis group A. Results VC estimates provided by a 10 × 15 cluster survey design were reasonably robust. We used them to classify health areas in three categories and guide mop-up activities: i) health areas not requiring supplemental activities; ii) health areas requiring additional vaccination; iii) health areas requiring further evaluation. As sample size decreased (from 10 × 15 to 10 × 3), standard error of VC and ICC estimates were increasingly unstable. Results of CLQAS simulations were not accurate for most health areas, with an overall risk of misclassification greater than 0.25 in one health area out of three. It was greater than 0.50 in one health area out of two under two of the three sampling plans. Conclusions Small sample cluster surveys (10 × 15) are acceptably robust for classification of VC at local level. We do not recommend the CLQAS method as currently formulated for evaluating vaccination programmes. PMID:23057445
Clustering behavior in microbial communities from acute endodontic infections.
Montagner, Francisco; Jacinto, Rogério C; Signoretti, Fernanda G C; Sanches, Paula F; Gomes, Brenda P F A
2012-02-01
Acute endodontic infections harbor heterogeneous microbial communities in both the root canal (RC) system and apical tissues. Data comparing the microbial structure and diversity in endodontic infections in related ecosystems, such as RC with necrotic pulp and acute apical abscess (AAA), are scarce in the literature. The aim of this study was to examine the presence of selected endodontic pathogens in paired samples from necrotic RC and AAA using polymerase chain reaction (PCR) followed by the construction of cluster profiles. Paired samples of RC and AAA exudates were collected from 20 subjects and analyzed by PCR for the presence of selected strict and facultative anaerobic strains. The frequency of species was compared between the RC and the AAA samples. A stringent neighboring clustering algorithm was applied to investigate the existence of similar high-order groups of samples. A dendrogram was constructed to show the arrangement of the sample groups produced by the hierarchical clustering. All samples harbored bacterial DNA. Porphyromonas endodontalis, Prevotella nigrescens, Filifactor alocis, and Tannerela forsythia were frequently detected in both RC and AAA samples. The selected anaerobic species were distributed in diverse small bacteria consortia. The samples of RC and AAA that presented at least one of the targeted microorganisms were grouped in small clusters. Anaerobic species were frequently detected in acute endodontic infections and heterogeneous microbial communities with low clustering behavior were observed in paired samples of RC and AAA. Copyright © 2012. Published by Elsevier Inc.
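A small sketch of the kind of hierarchical clustering of presence/absence PCR detections described above, under the assumption of a Jaccard distance and average linkage; the binary matrix is random and the distance/linkage choices are illustrative, not the authors' exact algorithm.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# Sketch of clustering presence/absence PCR data: samples are binary vectors
# of detected taxa, Jaccard distance is a common choice for such data, and
# average linkage builds the tree behind the dendrogram. Data are random.
rng = np.random.default_rng(1)
detections = rng.integers(0, 2, size=(20, 8))   # 20 samples x 8 target species

dist = pdist(detections, metric="jaccard")
tree = linkage(dist, method="average")
groups = fcluster(tree, t=4, criterion="maxclust")   # cut the tree into 4 clusters
print(groups)
```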
Radcliffe, Jon N; Comfort, Paul; Fawcett, Tom
2015-09-01
This study provides a basis for addressing professional development needs and adds to the applied sport psychology literature from an under-researched sport domain. The study used qualitative methods to explore the specific techniques applied by the strength and conditioning professional. Eighteen participants, drawn from a previously obtained sample through convenience sampling, were recruited for interview. Included in the study were 10 participants working within the United Kingdom, 3 within the United States, and 5 within Australia, offering a cross section of experience from a range of sport disciplines and educational backgrounds. Participants were interviewed using semistructured interviews. Thematic clustering within an interpretative phenomenological analysis was used to identify common themes. The practitioners referred to a wealth of psychological skills and strategies that are used within strength and conditioning. Through thematic clustering, it was evident that a significant emphasis is placed on the development or maintenance of athlete self-confidence, specifically with a large focus on goal setting. Similarly, albeit to a lesser extent, there was notable attention to skill acquisition and arousal management strategies. The strategies used by the practitioners consisted of a combination of cognitive strategies and behavioral strategies. It is important to highlight the main psychological strategies that are suggested by strength and conditioning coaches themselves to guide professional development toward specific areas. Such development should strive to develop coaches' awareness of strategies to develop confidence, regulate arousal, and facilitate skill and technique development.
Ståhlberg, Anders; Elbing, Karin; Andrade-Garda, José Manuel; Sjögreen, Björn; Forootan, Amin; Kubista, Mikael
2008-04-16
The large sensitivity, high reproducibility and essentially unlimited dynamic range of real-time PCR to measure gene expression in complex samples provides the opportunity for powerful multivariate and multiway studies of biological phenomena. In multiway studies samples are characterized by their expression profiles to monitor changes over time, effect of treatment, drug dosage etc. Here we perform a multiway study of the temporal response of four yeast Saccharomyces cerevisiae strains with different glucose uptake rates upon altered metabolic conditions. We measured the expression of 18 genes as function of time after addition of glucose to four strains of yeast grown in ethanol. The data are analyzed by matrix-augmented PCA, which is a generalization of PCA for 3-way data, and the results are confirmed by hierarchical clustering and clustering by Kohonen self-organizing map. Our approach identifies gene groups that respond similarly to the change of nutrient, and genes that behave differently in mutant strains. Of particular interest is our finding that ADH4 and ADH6 show a behavior typical of glucose-induced genes, while ADH3 and ADH5 are repressed after glucose addition. Multiway real-time PCR gene expression profiling is a powerful technique which can be utilized to characterize functions of new genes by, for example, comparing their temporal response after perturbation in different genetic variants of the studied subject. The technique also identifies genes that show perturbed expression in specific strains.
Ståhlberg, Anders; Elbing, Karin; Andrade-Garda, José Manuel; Sjögreen, Björn; Forootan, Amin; Kubista, Mikael
2008-01-01
Background The large sensitivity, high reproducibility and essentially unlimited dynamic range of real-time PCR to measure gene expression in complex samples provides the opportunity for powerful multivariate and multiway studies of biological phenomena. In multiway studies samples are characterized by their expression profiles to monitor changes over time, effect of treatment, drug dosage etc. Here we perform a multiway study of the temporal response of four yeast Saccharomyces cerevisiae strains with different glucose uptake rates upon altered metabolic conditions. Results We measured the expression of 18 genes as function of time after addition of glucose to four strains of yeast grown in ethanol. The data are analyzed by matrix-augmented PCA, which is a generalization of PCA for 3-way data, and the results are confirmed by hierarchical clustering and clustering by Kohonen self-organizing map. Our approach identifies gene groups that respond similarly to the change of nutrient, and genes that behave differently in mutant strains. Of particular interest is our finding that ADH4 and ADH6 show a behavior typical of glucose-induced genes, while ADH3 and ADH5 are repressed after glucose addition. Conclusion Multiway real-time PCR gene expression profiling is a powerful technique which can be utilized to characterize functions of new genes by, for example, comparing their temporal response after perturbation in different genetic variants of the studied subject. The technique also identifies genes that show perturbed expression in specific strains. PMID:18412983
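A minimal sketch of the 3-way analysis idea, under the simplifying assumption that matrix-augmented PCA can be approximated by unfolding the (gene × time × strain) array and applying ordinary PCA; dimensions and values are placeholders, not the study's expression data.

```python
import numpy as np
from sklearn.decomposition import PCA

# Sketch of the 3-way analysis idea: expression data indexed by
# (gene, time point, strain) are unfolded into a 2-D matrix whose rows are
# genes and whose columns are all time-strain combinations, then ordinary
# PCA is applied. Dimensions and values are placeholders.
rng = np.random.default_rng(2)
n_genes, n_times, n_strains = 18, 10, 4
expr = rng.normal(size=(n_genes, n_times, n_strains))   # log expression ratios

unfolded = expr.reshape(n_genes, n_times * n_strains)
scores = PCA(n_components=2).fit_transform(unfolded)
print(scores.shape)   # (18, 2): each gene gets a 2-D profile for grouping
```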
NASA Astrophysics Data System (ADS)
Barbera, Agustin; Zamora, Martin; Domenech, Marisa; Vega-Becerra, Andres; Castro-Franco, Mauricio
2017-04-01
The cultivation of transgenic glyphosate-resistant crops has been the most rapidly adopted crop technology in Argentina since 1997. Thus, more than 180 million liters of the broad-spectrum herbicide glyphosate (N-phosphonomethylglycine) are applied every year. The intensive use of glyphosate combined with the geomorphometrical characteristics of the Pampa region is a matter of environmental concern. An integral component of assessing the risk of soil contamination in farm fields is describing the spatial distribution of contaminant levels. The application of pedometric techniques for this purpose has scarcely been demonstrated. These techniques could provide an estimate of the concentration at a given unsampled location, as well as the probability that the concentration will exceed a critical threshold. In this work, a pedometric technique for assessing the spatial distribution of glyphosate in farm fields was developed. A field located at INTA Barrow, Argentina (Lat: -38.322844, Lon: -60.25572), which has great soil spatial variability, was divided into soil-specific zones using a pedometric technique. This was developed by integrating INTA Soil Survey information and a digital elevation model (DEM) obtained from a DGPS. Firstly, 10 topographic indices derived from the DEM were fed into a Random Forest algorithm to obtain a classification model for soil map units (SMU). Secondly, the classification model was applied to those topographic indices, but at a scale higher than 1:1000. Finally, a spatial principal component analysis and clustering using Fuzzy K-means were applied within each SMU. From this clustering, three soil-specific zones were determined, which were also validated through apparent electrical conductivity (CEa) measurements. Three soil sampling points were located in each zone. At each one, samples from 0-10, 10-20 and 20-40 cm depth were taken. Glyphosate and AMPA content in each soil sample were analyzed using UPLC-MS/MS ESI (+/-). Only AMPA at 10-20 cm depth showed a significant difference among soil-specific zones. However, marked trends for glyphosate and AMPA content were clearly shown among zones. These results suggest that (i) the presence of glyphosate and AMPA shows spatial distribution patterns related to soil properties at the field scale; and (ii) the proposed technique allowed soil-specific zones related to the spatial distribution of glyphosate and AMPA to be determined quickly, cost-effectively and accurately. In further work, we suggest adding new soil information sources to improve soil-specific zone delimitation.
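A rough sketch of the two-step workflow described above (random forest classification of soil map units from topographic indices, followed by clustering within each unit), with simulated data; ordinary k-means stands in for fuzzy k-means, and all variable names are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.cluster import KMeans

# Step 1: classify soil map units from DEM-derived topographic indices.
# Step 2: cluster cells within each predicted unit into management zones.
rng = np.random.default_rng(3)
indices = rng.normal(size=(500, 10))          # 10 topographic indices per cell
smu = rng.integers(0, 3, size=500)            # surveyed soil map unit labels

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(indices, smu)
predicted_smu = rf.predict(indices)

zones = {u: KMeans(n_clusters=3, n_init=10, random_state=0)
            .fit_predict(indices[predicted_smu == u])
         for u in np.unique(predicted_smu)}
print({u: np.bincount(z) for u, z in zones.items()})
```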
Tobacco, Marijuana, and Alcohol Use in University Students: A Cluster Analysis
Primack, Brian A.; Kim, Kevin H.; Shensa, Ariel; Sidani, Jaime E.; Barnett, Tracey E.; Switzer, Galen E.
2012-01-01
Objective Segmentation of populations may facilitate development of targeted substance abuse prevention programs. We aimed to partition a national sample of university students according to profiles based on substance use. Participants We used 2008–2009 data from the National College Health Assessment from the American College Health Association. Our sample consisted of 111,245 individuals from 158 institutions. Method We partitioned the sample using cluster analysis according to current substance use behaviors. We examined the association of cluster membership with individual and institutional characteristics. Results Cluster analysis yielded six distinct clusters. Three individual factors—gender, year in school, and fraternity/sorority membership—were the most strongly associated with cluster membership. Conclusions In a large sample of university students, we were able to identify six distinct patterns of substance abuse. It may be valuable to target specific populations of college-aged substance users based on individual factors. However, comprehensive intervention will require a multifaceted approach. PMID:22686360
The Measurement of Sulfur Oxidation Products and Their Role in Homogeneous Nucleation
NASA Technical Reports Server (NTRS)
Eisele, F. L.
1999-01-01
An improved version of a transverse ion source was developed which uses selected ion chemical ionization mass spectrometry techniques inside a particle nucleation flow tube. These new techniques are unique in that the chemical ionization is done inside the flow tube, rather than requiring removal of the compounds and clusters of interest, which are lost on first contact with any surfaces. The transverse source is also unique because it allows the ion reaction time to be varied over more than an order of magnitude, which in turn makes possible the separation of ion-induced cluster growth from the charging of preexisting molecular clusters. As a result of combining these unique capabilities, the first ever measurements of prenucleation molecular clusters were performed. These clusters are the intermediate stage of growth in the gas-to-particle conversion process. This new technique provides a means of observing clusters containing 2, 3, 4, ... and up to about 8 sulfuric acid molecules, where the critical cluster size under these measurement conditions was about 4 or 5. Thus, the nucleation process can now be directly observed and even growth beyond the critical cluster size can be investigated. The details of this investigation are discussed in a recently submitted paper, which is included as Appendix A. Measurements of the diffusion coefficient of sulfuric acid and of sulfuric acid clustered with a water molecule have also been performed. These measurements are discussed in more detail in another recently submitted paper, which is included as Appendix B. The empirical results discussed in both of these papers provide a critical test of present nucleation theories. They also provide new hope for resolving many of the huge discrepancies between field observations and model predictions of particle nucleation. The second part of the research conducted under this project was directed towards the development of new chemical ionization techniques for measuring sulfur oxidation products.
Liu, Wei; Wang, Dongmei; Liu, Jianjun; Li, Dengwu; Yin, Dongxue
2016-01-01
The present study was performed to assess the quality of Potentilla fruticosa L. sampled from distinct regions of China using high performance liquid chromatography (HPLC) fingerprinting coupled with a suite of chemometric methods. For this quantitative analysis, the main active phytochemical compositions and the antioxidant activity in P. fruticosa were also investigated. Considering the high percentages and antioxidant activities of phytochemicals, P. fruticosa samples from Kangding, Sichuan were selected as the most valuable raw materials. Similarity analysis (SA) of HPLC fingerprints, hierarchical cluster analysis (HCA), principal component analysis (PCA), and discriminant analysis (DA) were further employed to provide accurate classification and quality estimates of P. fruticosa. Two principal components (PCs) were extracted by PCA. PC1 separated samples from Kangding, Sichuan, capturing 57.64% of the variance, whereas PC2 contributed to further separation, capturing 18.97% of the variance. Two kinds of discriminant functions with a 100% discrimination ratio were constructed. The results strongly supported the conclusion that the eight samples from different regions clustered into three major groups, corresponding with their morphological classification; HPLC analysis confirmed the considerable variation in phytochemical compositions and that P. fruticosa samples from Kangding, Sichuan were of high quality. The results of SA, HCA, PCA, and DA were in agreement and performed well for the quality assessment of P. fruticosa. Consequently, HPLC fingerprinting coupled with chemometric techniques provides a highly flexible and reliable method for the quality evaluation of traditional Chinese medicines.
Liu, Wei; Wang, Dongmei; Liu, Jianjun; Li, Dengwu; Yin, Dongxue
2016-01-01
The present study was performed to assess the quality of Potentilla fruticosa L. sampled from distinct regions of China using high performance liquid chromatography (HPLC) fingerprinting coupled with a suite of chemometric methods. For this quantitative analysis, the main active phytochemical compositions and the antioxidant activity in P. fruticosa were also investigated. Considering the high percentages and antioxidant activities of phytochemicals, P. fruticosa samples from Kangding, Sichuan were selected as the most valuable raw materials. Similarity analysis (SA) of HPLC fingerprints, hierarchical cluster analysis (HCA), principal component analysis (PCA), and discriminant analysis (DA) were further employed to provide accurate classification and quality estimates of P. fruticosa. Two principal components (PCs) were extracted by PCA. PC1 separated samples from Kangding, Sichuan, capturing 57.64% of the variance, whereas PC2 contributed to further separation, capturing 18.97% of the variance. Two kinds of discriminant functions with a 100% discrimination ratio were constructed. The results strongly supported the conclusion that the eight samples from different regions clustered into three major groups, corresponding with their morphological classification; HPLC analysis confirmed the considerable variation in phytochemical compositions and that P. fruticosa samples from Kangding, Sichuan were of high quality. The results of SA, HCA, PCA, and DA were in agreement and performed well for the quality assessment of P. fruticosa. Consequently, HPLC fingerprinting coupled with chemometric techniques provides a highly flexible and reliable method for the quality evaluation of traditional Chinese medicines. PMID:26890416
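Two of the chemometric steps above can be sketched with simulated fingerprints: similarity analysis as the correlation of each chromatogram with the mean fingerprint, and PCA of the data matrix. This is an illustration of the general techniques, not the study's exact software workflow.

```python
import numpy as np
from sklearn.decomposition import PCA

# Similarity analysis: correlate each chromatogram with the mean fingerprint.
# PCA: project the fingerprint matrix onto its first two components.
rng = np.random.default_rng(4)
fingerprints = rng.random((8, 200))            # 8 samples x 200 retention points

reference = fingerprints.mean(axis=0)
similarity = [np.corrcoef(f, reference)[0, 1] for f in fingerprints]

scores = PCA(n_components=2).fit_transform(fingerprints)
print(np.round(similarity, 3), scores.shape)
```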
Tempia, S; Salman, M D; Keefe, T; Morley, P; Freier, J E; DeMartini, J C; Wamwayi, H M; Njeumi, F; Soumaré, B; Abdi, A M
2010-12-01
A cross-sectional sero-survey, using a two-stage cluster sampling design, was conducted between 2002 and 2003 in ten administrative regions of central and southern Somalia, to estimate the seroprevalence and geographic distribution of rinderpest (RP) in the study area, as well as to identify potential risk factors for the observed seroprevalence distribution. The study was also used to test the feasibility of the spatially integrated investigation technique in nomadic and semi-nomadic pastoral systems. In the absence of a systematic list of livestock holdings, the primary sampling units were selected by generating random map coordinates. A total of 9,216 serum samples were collected from cattle aged 12 to 36 months at 562 sampling sites. Two apparent clusters of RP seroprevalence were detected. Four potential risk factors associated with the observed seroprevalence were identified: the mobility of cattle herds, the cattle population density, the proximity of cattle herds to cattle trade routes and cattle herd size. Risk maps were then generated to assist in designing more targeted surveillance strategies. The observed seroprevalence in these areas declined over time. In subsequent years, similar seroprevalence studies in neighbouring areas of Kenya and Ethiopia also showed a very low seroprevalence of RP or the absence of antibodies against RP. The progressive decline in RP antibody prevalence is consistent with virus extinction. Verification of freedom from RP infection in the Somali ecosystem is currently in progress.
Rutterford, Clare; Taljaard, Monica; Dixon, Stephanie; Copas, Andrew; Eldridge, Sandra
2015-06-01
To assess the quality of reporting and accuracy of a priori estimates used in sample size calculations for cluster randomized trials (CRTs). We reviewed 300 CRTs published between 2000 and 2008. The prevalence of reporting sample size elements from the 2004 CONSORT recommendations was evaluated and a priori estimates compared with those observed in the trial. Of the 300 trials, 166 (55%) reported a sample size calculation. Only 36 of 166 (22%) reported all recommended descriptive elements. Elements specific to CRTs were the worst reported: a measure of within-cluster correlation was specified in only 58 of 166 (35%). Only 18 of 166 articles (11%) reported both a priori and observed within-cluster correlation values. Except in two cases, observed within-cluster correlation values were either close to or less than a priori values. Even with the CONSORT extension for cluster randomization, the reporting of sample size elements specific to these trials remains below that necessary for transparent reporting. Journal editors and peer reviewers should implement stricter requirements for authors to follow CONSORT recommendations. Authors should report observed and a priori within-cluster correlation values to enable comparisons between these over a wider range of trials. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
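For context, the cluster-trial inflation that these sample size calculations hinge on is the design effect 1 + (m − 1) × ICC; the sketch below uses invented numbers to show how an a priori within-cluster correlation feeds into the required number of clusters.

```python
import math

# Design effect for a cluster randomized trial: the individually randomized
# sample size is multiplied by 1 + (m - 1) * ICC, where m is the average
# cluster size and ICC the within-cluster correlation. Numbers are invented.
n_individual = 300      # sample size per arm ignoring clustering
m = 20                  # average cluster size
icc = 0.05              # a priori within-cluster correlation

design_effect = 1 + (m - 1) * icc
n_clustered = n_individual * design_effect
clusters_per_arm = math.ceil(n_clustered / m)
print(design_effect, n_clustered, clusters_per_arm)   # 1.95 585.0 30
```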
Testing prediction methods: Earthquake clustering versus the Poisson model
Michael, A.J.
1997-01-01
Testing earthquake prediction methods requires statistical techniques that compare observed success to random chance. One technique is to produce simulated earthquake catalogs and measure the relative success of predicting real and simulated earthquakes. The accuracy of these tests depends on the validity of the statistical model used to simulate the earthquakes. This study tests the effect of clustering in the statistical earthquake model on the results. Three simulation models were used to produce significance levels for a VLF earthquake prediction method. As the degree of simulated clustering increases, the statistical significance drops. Hence, the use of a seismicity model with insufficient clustering can lead to overly optimistic results. A successful method must pass the statistical tests with a model that fully replicates the observed clustering. However, a method can be rejected based on tests with a model that contains insufficient clustering. U.S. copyright. Published in 1997 by the American Geophysical Union.
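A toy version of the significance test described here: count prediction "hits" on the real catalog and compare with catalogs simulated under a Poisson (unclustered) model; alarm windows and event times are invented, and a clustered simulator would replace the uniform draw in practice.

```python
import numpy as np

# Compare the number of alarm-window hits on the "observed" catalog with the
# distribution of hits from simulated unclustered catalogs. All inputs are
# invented for illustration.
rng = np.random.default_rng(5)
alarm_windows = [(10.0, 20.0), (55.0, 60.0), (80.0, 95.0)]   # days
observed_quakes = np.array([12.0, 57.0, 83.0, 91.0])

def hits(times):
    return sum(any(a <= t <= b for a, b in alarm_windows) for t in times)

n_obs = hits(observed_quakes)
sims = [hits(rng.uniform(0.0, 100.0, size=len(observed_quakes)))
        for _ in range(5000)]
p_value = np.mean([s >= n_obs for s in sims])
print(n_obs, p_value)
```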
Dynamic multifactor clustering of financial networks
NASA Astrophysics Data System (ADS)
Ross, Gordon J.
2014-02-01
We investigate the tendency for financial instruments to form clusters when there are multiple factors influencing the correlation structure. Specifically, we consider a stock portfolio which contains companies from different industrial sectors, located in several different countries. Both sector membership and geography combine to create a complex clustering structure where companies seem to first be divided based on sector, with geographical subclusters emerging within each industrial sector. We argue that standard techniques for detecting overlapping clusters and communities are not able to capture this type of structure and show how robust regression techniques can instead be used to remove the influence of both sector and geography from the correlation matrix separately. Our analysis reveals that prior to the 2008 financial crisis, companies did not tend to form clusters based on geography. This changed immediately following the crisis, with geography becoming a more important determinant of clustering structure.
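A minimal sketch of the factor-removal idea, assuming robust regression of each stock's returns on sector and country factor returns before examining the residual correlation matrix; the data are simulated and HuberRegressor stands in for whatever robust estimator the paper uses.

```python
import numpy as np
from sklearn.linear_model import HuberRegressor

# Regress each stock's returns on factor returns with a robust regressor,
# then study the correlation matrix of the residuals, where any remaining
# clustering structure can be sought. Data below are simulated.
rng = np.random.default_rng(6)
n_days, n_stocks = 500, 12
factors = rng.normal(size=(n_days, 2))        # sector and country factor returns
returns = factors @ rng.normal(size=(2, n_stocks)) + 0.5 * rng.normal(size=(n_days, n_stocks))

residuals = np.column_stack([
    ret - HuberRegressor().fit(factors, ret).predict(factors)
    for ret in returns.T
])
residual_corr = np.corrcoef(residuals, rowvar=False)
print(residual_corr.shape)
```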
NASA Astrophysics Data System (ADS)
Sellaiyan, S.; Uedono, A.; Sivaji, K.; Janet Priscilla, S.; Sivasankari, J.; Selvalakshmi, T.
2016-10-01
Pure and alkali metal ion (Li, Na, and K)-doped MgO nanocrystallites synthesized by the solution combustion technique have been studied by positron lifetime and Doppler broadening spectroscopy methods. Positron lifetime analysis exhibits four characteristic lifetime components for all the samples. Doping reduces the Mg vacancy concentration after annealing to 800 °C. It was observed that the Li ion migrates to the vacancy site to recover Mg vacancy-type defects, reducing cluster vacancies and micropores. For Na- and K-doped MgO, the aforementioned defects are reduced and immobile at 800 °C. Coincidence Doppler broadening studies show that the positron trapping sites are vacancy clusters. The decrease in the S parameter is due to particle growth and a reduction in the defect concentration at 800 °C. Photoluminescence studies show emission peaks at 445 nm and 498 nm, associated with F2^2+ centers and the recombination of higher-order vacancy complexes. Further, the annealing process is likely to dissociate F2^2+ into F+, and this F+ is converted into F centers at 416 nm.
Galaxy clusters, type Ia supernovae and the fine structure constant
DOE Office of Scientific and Technical Information (OSTI.GOV)
Holanda, R.F.L.; Busti, V.C.; Colaço, L.R.
2016-08-01
As is well known, measurements of the Sunyaev-Zeldovich effect can be combined with observations of the X-ray surface brightness of galaxy clusters to estimate the angular diameter distance to these structures. In this paper, we show that this technique depends on the fine structure constant, α. Therefore, if α is a time-dependent quantity, e.g., α = α_0 φ(z), where φ is a function of redshift, we argue that current data do not provide the real angular diameter distance, D_A(z), to the cluster, but instead D_A^data(z) = φ(z)^2 D_A(z). We use this result to derive constraints on a possible variation of α for a class of dilaton runaway models, considering a sample of 25 measurements of D_A^data(z) in the redshift range 0.023 < z < 0.784 and estimates of D_A(z) from current type Ia supernovae observations. We find no significant indication of a variation of α with the present data.
Clustering Categorical Data Using Community Detection Techniques
2017-01-01
With the advent of the k-modes algorithm, the toolbox for clustering categorical data has an efficient tool that scales linearly in the number of data items. However, random initialization of cluster centers in k-modes makes it hard to reach a good clustering without resorting to many trials. Recently proposed methods for better initialization are deterministic and reduce the clustering cost considerably. A variety of initialization methods differ in how the heuristics chooses the set of initial centers. In this paper, we address the clustering problem for categorical data from the perspective of community detection. Instead of initializing k modes and running several iterations, our scheme, CD-Clustering, builds an unweighted graph and detects highly cohesive groups of nodes using a fast community detection technique. The top-k detected communities by size will define the k modes. Evaluation on ten real categorical datasets shows that our method outperforms the existing initialization methods for k-modes in terms of accuracy, precision, and recall in most of the cases. PMID:29430249
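A simplified stand-in for the community-detection view of categorical clustering (not the exact CD-Clustering recipe): link items that share enough attribute values, run greedy modularity community detection, and keep the k largest communities; the similarity threshold and data are invented.

```python
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Build an unweighted graph over categorical data items and treat the k
# largest detected communities as clusters. Data and threshold are invented.
rng = np.random.default_rng(7)
items = rng.integers(0, 3, size=(60, 5))      # 60 items, 5 categorical attributes

G = nx.Graph()
G.add_nodes_from(range(len(items)))
for i in range(len(items)):
    for j in range(i + 1, len(items)):
        if (items[i] == items[j]).sum() >= 3:  # share at least 3 attribute values
            G.add_edge(i, j)

communities = sorted(greedy_modularity_communities(G), key=len, reverse=True)
k = 4
clusters = communities[:k]
print([len(c) for c in clusters])
```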
Advanced electronics for the CTF MEG system.
McCubbin, J; Vrba, J; Spear, P; McKenzie, D; Willis, R; Loewen, R; Robinson, S E; Fife, A A
2004-11-30
Development of the CTF MEG system has been advanced with the introduction of a computer processing cluster between the data acquisition electronics and the host computer. The advent of fast processors, memory, and network interfaces has made this innovation feasible for large data streams at high sampling rates. We have implemented tasks including anti-alias filter, sample rate decimation, higher gradient balancing, crosstalk correction, and optional filters with a cluster consisting of 4 dual Intel Xeon processors operating on up to 275 channel MEG systems at 12 kHz sample rate. The architecture is expandable with additional processors to implement advanced processing tasks which may include e.g., continuous head localization/motion correction, optional display filters, coherence calculations, or real time synthetic channels (via beamformer). We also describe an electronics configuration upgrade to provide operator console access to the peripheral interface features such as analog signal and trigger I/O. This allows remote location of the acoustically noisy electronics cabinet and fitting of the cabinet with doors for improved EMI shielding. Finally, we present the latest performance results available for the CTF 275 channel MEG system including an unshielded SEF (median nerve electrical stimulation) measurement enhanced by application of an adaptive beamformer technique (SAM) which allows recognition of the nominal 20-ms response in the unaveraged signal.
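The anti-alias filter plus sample-rate decimation step can be sketched with scipy, assuming a single 12 kHz channel and a decimation factor of 10; this is a generic illustration, not the CTF processing cluster's actual filter design.

```python
import numpy as np
from scipy.signal import decimate

# Low-pass (anti-alias) filter and downsample a 12 kHz trace by a factor of
# 10. A sine-plus-noise signal stands in for an MEG channel.
fs = 12000.0
t = np.arange(0, 1.0, 1.0 / fs)
channel = np.sin(2 * np.pi * 10.0 * t) + 0.01 * np.random.randn(t.size)

downsampled = decimate(channel, q=10, ftype="fir", zero_phase=True)  # -> 1.2 kHz
print(channel.size, downsampled.size)
```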
NASA Astrophysics Data System (ADS)
Wallace, A. F.; DeYoreo, J.; Banfield, J. F.
2011-12-01
The carbonate mineral constituents of many biomineralized products, formed both in and ex vivo, grow by a multi-stage crystallization process that involves the nucleation and structural reorganization of transient amorphous phases. The existence of transient phases and cluster species has significant implications for carbonate nucleation and growth in natural and engineered environments, both modern and ancient. The structure of these intermediate phases remains elusive, as does the nature of the disorder to order transition, however, these process details may strongly influence the interpretation of elemental and isotopic climate proxy data obtained from authigenic and biogenic carbonates. While molecular simulations have been applied to certain aspects of crystal growth, studies of metal carbonate nucleation are strongly inhibited by the presence of kinetic traps that prevent adequate sampling of the potential landscape upon which the growing clusters reside within timescales accessible by simulation. This research addresses this challenge by marrying the recent Kawska-Zahn (KZ) approach to simulation of crystal nucleation and growth from solution with replica-exchange molecular dynamics (REMD) techniques. REMD has been used previously to enhance sampling of protein conformations that occupy energy wells that are separated by sizable thermodynamic and kinetic barriers, and is used here to probe the initial formation and onset of order within hydrated calcium and iron carbonate cluster species during nucleation. Results to date suggest that growing clusters initiate as short linear ion chains that evolve into two- and three-dimensional structures with continued growth. The planar structures exhibit an obvious 2d lattice, while establishment of a 3d lattice is hindered by incomplete ion desolvation. The formation of a dehydrated core consisting of a single carbonate ion is observed when the clusters are ~0.75 nm. At the same size a distorted, but discernible calcite-type lattice is also apparent. Continued growth results in expansion of the dehydrated core, however, complete desolvation and incorporation of cations into the growing carbonate phase is not achieved until the cluster grows to ~1.2 nm. Exploration of the system free energy along the crystallization path reveals "special" cluster sizes that correlate with ion desolvation milestones. The formation of these species comprise critical bottlenecks on the energy landscape and for the establishment of order within the growing clusters.
Sensitivity evaluation of dynamic speckle activity measurements using clustering methods.
Etchepareborda, Pablo; Federico, Alejandro; Kaufmann, Guillermo H
2010-07-01
We evaluate and compare the use of competitive neural networks, self-organizing maps, the expectation-maximization algorithm, K-means, and fuzzy C-means techniques as partitional clustering methods, when the sensitivity of the activity measurement of dynamic speckle images needs to be improved. The temporal history of the acquired intensity generated by each pixel is analyzed in a wavelet decomposition framework, and it is shown that the mean energy of its corresponding wavelet coefficients provides a suited feature space for clustering purposes. The sensitivity obtained by using the evaluated clustering techniques is also compared with the well-known methods of Konishi-Fujii, weighted generalized differences, and wavelet entropy. The performance of the partitional clustering approach is evaluated using simulated dynamic speckle patterns and also experimental data.
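A small sketch of the feature construction described above, assuming a discrete wavelet decomposition (db4, four levels) of each pixel's intensity history and mean coefficient energy per level as features, with k-means as one of the compared partitional clusterers; data and wavelet choices are illustrative.

```python
import numpy as np
import pywt
from sklearn.cluster import KMeans

# Each pixel's temporal intensity history is wavelet-decomposed; the mean
# energy of the coefficients at each level forms the feature vector handed
# to a partitional clusterer. Data are simulated.
rng = np.random.default_rng(8)
n_pixels, n_frames = 400, 256
histories = rng.normal(size=(n_pixels, n_frames))   # intensity vs. time per pixel

def wavelet_energy(signal, wavelet="db4", level=4):
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    return [np.mean(c ** 2) for c in coeffs]

features = np.array([wavelet_energy(h) for h in histories])
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(features)
print(np.bincount(labels))
```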
NASA Technical Reports Server (NTRS)
Spruce, Joe
2001-01-01
Yellowstone National Park (YNP) contains a diversity of land cover. YNP managers need site-specific land cover maps, which may be produced more effectively using high-resolution hyperspectral imagery. ISODATA clustering techniques have aided operational multispectral image classification and may benefit certain hyperspectral data applications if optimally applied. In response, a study was performed for an area in northeast YNP using 11 select bands of low-altitude AVIRIS data calibrated to ground reflectance. These data were subjected to ISODATA clustering and Maximum Likelihood Classification techniques to produce a moderately detailed land cover map. The latter has good apparent overall agreement with field surveys and aerial photo interpretation.
X-ray emission from a complete sample of Abell clusters of galaxies
NASA Astrophysics Data System (ADS)
Briel, Ulrich G.; Henry, J. Patrick
1993-11-01
The ROSAT All-Sky Survey (RASS) is used to investigate the X-ray properties of a complete sample of Abell clusters with measured redshifts and accurate positions. The sample comprises the 145 clusters within a 561 square degree region at high galactic latitude. The mean redshift is 0.17. This sample is especially well suited to be studied within the RASS since the mean exposure time is higher than average and the mean galactic column density is very low. These together produce a flux limit of about 4.2 x 10^-13 erg/sq cm/s in the 0.5 to 2.5 keV energy band. Sixty-six (46%) individual clusters are detected at a significance level higher than 99.7%, of which 7 could be chance coincidences of background or foreground sources. At redshifts greater than 0.3, six clusters out of seven (86%) are detected at the same significance level. The detected objects show a clear X-ray luminosity -- galaxy count relation with a dispersion consistent with other external estimates of the error in the counts. By analyzing the excess of positive fluctuations of the X-ray flux at the cluster positions, compared with the fluctuations of randomly drawn background fields, it is possible to extend these results below the nominal flux limit. We find 80% of richness R >= 0 and 86% of R >= 1 clusters are X-ray emitters with fluxes above 1 x 10^-13 erg/sq cm/s. Nearly 90% of the clusters meeting the requirements to be in Abell's statistical sample emit above the same level. We therefore conclude that almost all Abell clusters are real clusters and the Abell catalog is not strongly contaminated by projection effects. We use the Kaplan-Meier product limit estimator to calculate the cumulative X-ray luminosity function. We show that the shapes of the luminosity functions are similar for different richness classes, but the characteristic luminosities of richness 2 clusters are about twice those of richness 1 clusters, which are in turn about twice those of richness 0 clusters. This result is another manifestation of the luminosity -- richness relation for Abell clusters.
Chapter 7. Cloning and analysis of natural product pathways.
Gust, Bertolt
2009-01-01
The identification of gene clusters of natural products has led to an enormous wealth of information about their biosynthesis and its regulation, and about self-resistance mechanisms. Well-established routine techniques are now available for the cloning and sequencing of gene clusters. The subsequent functional analysis of the complex biosynthetic machinery requires efficient genetic tools for manipulation. Until recently, techniques for the introduction of defined changes into Streptomyces chromosomes were very time-consuming. In particular, manipulation of large DNA fragments has been challenging due to the absence of suitable restriction sites for restriction- and ligation-based techniques. The homologous recombination approach called recombineering (referred to as Red/ET-mediated recombination in this chapter) has greatly facilitated targeted genetic modifications of complex biosynthetic pathways from actinomycetes by eliminating many of the time-consuming and labor-intensive steps. This chapter describes techniques for the cloning and identification of biosynthetic gene clusters, for the generation of gene replacements within such clusters, for the construction of integrative library clones and their expression in heterologous hosts, and for the assembly of entire biosynthetic gene clusters from the inserts of individual library clones. A systematic approach toward insertional mutation of a complete Streptomyces genome is shown by the use of an in vitro transposon mutagenesis procedure.
Underdetermined blind separation of three-way fluorescence spectra of PAHs in water
NASA Astrophysics Data System (ADS)
Yang, Ruifang; Zhao, Nanjing; Xiao, Xue; Zhu, Wei; Chen, Yunan; Yin, Gaofang; Liu, Jianguo; Liu, Wenqing
2018-06-01
In this work, an underdetermined blind decomposition method is developed to recognize individual components from the three-way fluorescence spectra of their mixtures using sparse component analysis (SCA). The mixing matrix is estimated from the mixtures using a fuzzy data clustering algorithm together with the scatter points corresponding to local energy maxima in the time-frequency domain, and the spectra of the object components are recovered by a pseudo-inverse technique. As an example, the spectra of three and four pure components can be blindly extracted from two mixture samples using this method, with similarities between the resolved and reference spectra all above 0.80. This work opens a new and effective path toward monitoring PAHs in water by the three-way fluorescence spectroscopy technique.
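The final recovery step can be sketched in a few lines, assuming the mixing matrix A has already been estimated from the clustered time-frequency points; the pseudo-inverse gives the minimum-norm solution shown below, and the paper's method additionally exploits sparsity in the underdetermined case.

```python
import numpy as np

# Once the mixing matrix A is estimated, source spectra follow from the
# pseudo-inverse: S = pinv(A) @ X. In the underdetermined case this is only
# the minimum-norm solution; sparsity provides the extra constraint in SCA.
# A and the sources here are invented just to show the algebra.
rng = np.random.default_rng(9)
A = rng.random((2, 3))                 # 2 mixtures, 3 fluorescent components
S_true = rng.random((3, 120))          # component spectra
X = A @ S_true                         # observed mixture spectra

S_recovered = np.linalg.pinv(A) @ X
print(S_recovered.shape)               # (3, 120)
```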
Oluwadare, Oluwatosin; Cheng, Jianlin
2017-11-14
With the development of chromosomal conformation capturing techniques, particularly, the Hi-C technique, the study of the spatial conformation of a genome is becoming an important topic in bioinformatics and computational biology. The Hi-C technique can generate genome-wide chromosomal interaction (contact) data, which can be used to investigate the higher-level organization of chromosomes, such as Topologically Associated Domains (TAD), i.e., locally packed chromosome regions bounded together by intra chromosomal contacts. The identification of the TADs for a genome is useful for studying gene regulation, genomic interaction, and genome function. Here, we formulate the TAD identification problem as an unsupervised machine learning (clustering) problem, and develop a new TAD identification method called ClusterTAD. We introduce a novel method to represent chromosomal contacts as features to be used by the clustering algorithm. Our results show that ClusterTAD can accurately predict the TADs on a simulated Hi-C data. Our method is also largely complementary and consistent with existing methods on the real Hi-C datasets of two mouse cells. The validation with the chromatin immunoprecipitation (ChIP) sequencing (ChIP-Seq) data shows that the domain boundaries identified by ClusterTAD have a high enrichment of CTCF binding sites, promoter-related marks, and enhancer-related histone modifications. As ClusterTAD is based on a proven clustering approach, it opens a new avenue to apply a large array of clustering methods developed in the machine learning field to the TAD identification problem. The source code, the results, and the TADs generated for the simulated and real Hi-C datasets are available here: https://github.com/BDM-Lab/ClusterTAD .
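A simplified illustration of casting TAD identification as clustering (not ClusterTAD's actual feature construction): each genomic bin is represented by its row of a symmetric contact matrix, bins are grouped with k-means, and label changes along the genome suggest candidate boundaries.

```python
import numpy as np
from sklearn.cluster import KMeans

# Represent each Hi-C bin by its contact-matrix row, cluster the bins, and
# read off positions where the cluster label changes as candidate domain
# boundaries. The contact matrix is random and purely illustrative.
rng = np.random.default_rng(10)
n_bins = 200
contacts = rng.poisson(2.0, size=(n_bins, n_bins)).astype(float)
contacts = (contacts + contacts.T) / 2.0        # enforce symmetry

labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(contacts)
boundaries = np.flatnonzero(np.diff(labels)) + 1
print(len(boundaries), "candidate domain boundaries")
```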
The Use of Cluster Analysis in Typological Research on Community College Students
ERIC Educational Resources Information Center
Bahr, Peter Riley; Bielby, Rob; House, Emily
2011-01-01
One useful and increasingly popular method of classifying students is known commonly as cluster analysis. The variety of techniques that comprise the cluster analytic family are intended to sort observations (for example, students) within a data set into subsets (clusters) that share similar characteristics and differ in meaningful ways from other…
Gholami, Mohammad; Brennan, Robert W
2016-01-06
In this paper, we investigate alternative distributed clustering techniques for wireless sensor node tracking in an industrial environment. The research builds on extant work on wireless sensor node clustering by reporting on: (1) the development of a novel distributed management approach for tracking mobile nodes in an industrial wireless sensor network; and (2) an objective comparison of alternative cluster management approaches for wireless sensor networks. To perform this comparison, we focus on two main clustering approaches proposed in the literature: pre-defined clusters and ad hoc clusters. These approaches are compared in the context of their reconfigurability: more specifically, we investigate the trade-off between the cost and the effectiveness of competing strategies aimed at adapting to changes in the sensing environment. To support this work, we introduce three new metrics: a cost/efficiency measure, a performance measure, and a resource consumption measure. The results of our experiments show that ad hoc clusters adapt more readily to changes in the sensing environment, but this higher level of adaptability is at the cost of overall efficiency.
Gholami, Mohammad; Brennan, Robert W.
2016-01-01
In this paper, we investigate alternative distributed clustering techniques for wireless sensor node tracking in an industrial environment. The research builds on extant work on wireless sensor node clustering by reporting on: (1) the development of a novel distributed management approach for tracking mobile nodes in an industrial wireless sensor network; and (2) an objective comparison of alternative cluster management approaches for wireless sensor networks. To perform this comparison, we focus on two main clustering approaches proposed in the literature: pre-defined clusters and ad hoc clusters. These approaches are compared in the context of their reconfigurability: more specifically, we investigate the trade-off between the cost and the effectiveness of competing strategies aimed at adapting to changes in the sensing environment. To support this work, we introduce three new metrics: a cost/efficiency measure, a performance measure, and a resource consumption measure. The results of our experiments show that ad hoc clusters adapt more readily to changes in the sensing environment, but this higher level of adaptability is at the cost of overall efficiency. PMID:26751447
A Study on Regional Frequency Analysis using Artificial Neural Network - the Sumjin River Basin
NASA Astrophysics Data System (ADS)
Jeong, C.; Ahn, J.; Ahn, H.; Heo, J. H.
2017-12-01
Regional frequency analysis compensates for the main shortcoming of at-site frequency analysis, namely the limited sample size at a single site, by pooling information over a region. Regional rainfall quantiles depend on the identification of hydrologically homogeneous regions, so regional classification under the assumption of hydrological homogeneity is very important. For regional clustering of rainfall, multidimensional variables and factors related to geographical features and meteorological characteristics are considered, such as mean annual precipitation, number of days with precipitation in a year, and average maximum daily precipitation in a month. The Self-Organizing Feature Map (SOM), an unsupervised artificial neural network algorithm, handles N-dimensional, nonlinear problems and presents the results simply as a data visualization. In this study, cluster analysis of the Sumjin river basin in South Korea was performed with the SOM method using high-dimensional geographical features and meteorological factors as input data. To evaluate the homogeneity of the resulting regions, the L-moment based discordancy and heterogeneity measures were used. Rainfall quantiles were then estimated with the index flood method, a standard approach in regional rainfall frequency analysis. The clustering obtained with the SOM method and the consequent variation in rainfall quantiles were analyzed. This research was supported by a grant (2017-MPSS31-001) from the Supporting Technology Development Program for Disaster Management funded by the Ministry of Public Safety and Security (MPSS) of the Korean government.
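A minimal sketch of SOM-based regional clustering of site attributes follows. It assumes the third-party minisom package and an input array whose columns are standardized attributes such as those listed above (mean annual precipitation, wet days, maximum daily precipitation); the grid size and function names are illustrative and not taken from the study.

# Sketch of SOM-based regional clustering on standardized site attributes.
# Assumes the third-party `minisom` package; grid size is illustrative.
import numpy as np
from minisom import MiniSom

def som_regions(X, grid=(3, 3), n_iter=5000, seed=0):
    """X: (n_sites, n_features) array of site attributes."""
    X = (X - X.mean(axis=0)) / X.std(axis=0)          # standardize each feature
    som = MiniSom(grid[0], grid[1], X.shape[1],
                  sigma=1.0, learning_rate=0.5, random_seed=seed)
    som.random_weights_init(X)
    som.train_random(X, n_iter)
    # Each site is assigned to its best-matching unit (a grid cell = region).
    return [som.winner(x) for x in X]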
NASA Astrophysics Data System (ADS)
Banerjee, P.; Szabo, T.; Pierpaoli, E.; Franco, G.; Ortiz, M.; Oramas, A.; Tornello, B.
2018-01-01
We present a new galaxy cluster catalog constructed from the Sloan Digital Sky Survey Data Release 9 (SDSS DR9) using an Adaptive Matched Filter (AMF) technique. Our catalog has 46,479 galaxy clusters with richness Λ_200 > 20 in the redshift range 0.045 ≤ z < 0.641 in ∼11,500 deg² of the sky. Angular position, richness, core and virial radii, and redshift estimates for these clusters, as well as their error analysis, are provided as part of this catalog. In addition to the main version of the catalog, we also provide an extended version with a lower richness cut, containing 79,368 clusters. This version, in addition to the clusters in the main catalog, also contains those clusters (with richness 10 < Λ_200 < 20) which have a one-to-one match in the DR8 catalog developed by Wen et al. (WHL). We obtain probabilities for cluster membership for each galaxy and implement several procedures for the identification and removal of false cluster detections. We cross-correlate the main AMF DR9 catalog with a number of cluster catalogs in different wavebands (optical, X-ray). We compare our catalog with other SDSS-based ones such as redMaPPer (26,350 clusters) and Wen et al. (WHL) (132,684 clusters) in the same area of the sky and in the overlapping redshift range. We match 97% of the richest Abell clusters (richness group 3), the same as WHL, while redMaPPer matches ∼90% of these clusters. Considering AMF DR9 richness bins, redMaPPer does not have one-to-one matches for 70% of our lowest richness clusters (20 < Λ_200 < 40), while WHL matches 54% of these missed clusters (not present in redMaPPer). redMaPPer consistently lacks one-to-one matches for ∼20% of AMF DR9 clusters with Λ_200 > 40, while WHL matches ≥70% of these missed clusters on average. For comparisons with X-ray clusters, we match the AMF catalog with BAX, MCXC, and a combined catalog from NORAS and REFLEX. We consistently obtain a greater number of one-to-one matches for X-ray clusters across higher luminosity bins (L_X > 6 × 10^44 erg/s) than redMaPPer, while WHL matches the most clusters overall. For the most luminous clusters (L_X > 8 × 10^44 erg/s), our catalog performs equivalently to WHL. This new catalog provides a wider sample than redMaPPer while retaining many fewer objects than WHL.
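The catalog comparisons above rest on positional cross-matching. The sketch below shows a generic nearest-neighbour sky match with astropy, not the AMF matching pipeline itself; the separation threshold and function name are assumptions for illustration, and true one-to-one matching would additionally require a uniqueness check.

# Illustrative positional cross-match between two cluster catalogs.
# Generic nearest-neighbour matching, not the AMF pipeline.
import numpy as np
import astropy.units as u
from astropy.coordinates import SkyCoord

def nearest_matches(ra1, dec1, ra2, dec2, max_sep_arcmin=2.0):
    c1 = SkyCoord(ra=ra1 * u.deg, dec=dec1 * u.deg)
    c2 = SkyCoord(ra=ra2 * u.deg, dec=dec2 * u.deg)
    idx, d2d, _ = c1.match_to_catalog_sky(c2)   # nearest neighbour in catalog 2
    good = d2d < max_sep_arcmin * u.arcmin      # keep only close pairs
    return np.flatnonzero(good), idx[good]      # indices in catalogs 1 and 2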
DOE Office of Scientific and Technical Information (OSTI.GOV)
Simunovic, Mirko; Puzia, Thomas H.
2014-02-10
We present the first dynamical study of blue straggler stars (BSSs) in three Galactic globular clusters, NGC 3201, NGC 5139 (ω Cen), and NGC 6218, based on medium-resolution spectroscopy (R ≈ 10,000) obtained with the Inamori-Magellan Areal Camera and Spectrograph mounted at the 6.5 m Baade Magellan telescope. Our BSS candidate selection technique uses HST/ACS and ESO/WFI photometric data out to >4.5 r_c. We use radial velocity measurements to discard non-members and achieve a success rate of ∼93%, which yields a sample of 116 confirmed BSSs. Using the penalized pixel-fitting method (pPXF), we measure the v sin(i) values of the sample BSSs and find their distribution functions peaked at slow velocities with a long tail toward fast velocities in each globular cluster. About 90% of the BSS population in NGC 3201 and NGC 6218 exhibits values in the range 10-50 km s^-1, while about 80% of the BSSs in ω Cen show v sin(i) values between 20 and 70 km s^-1. We find that the BSSs in NGC 3201 and NGC 6218 that show v sin(i) > 50 km s^-1 are all found in the central cluster regions, inside a projected 2 r_c, of their parent clusters. We find a similar result in ω Cen for BSSs with v sin(i) > 70 km s^-1, which are all, except for two, concentrated inside 2 r_c. In all globular clusters, we find rapidly rotating BSSs that have relatively high differential radial velocities that likely put them on hyperbolic orbits, suggestive of strong dynamical interactions in the past. Based on stellar spin-down and dynamical crossing timescales, we estimate that all the observed rapidly rotating BSSs are likely to have formed in their central cluster regions no longer than ∼300 Myr ago and may have been subsequently ejected from their host globular clusters. Using dereddened V - I colors of our photometric selection, we show that blue BSSs in ω Cen with (V - I)_0 ≲ 0.25 mag show a significantly increased v sin(i) dispersion compared with their red counterparts and all other BSSs in our sample, strongly implying that fast-rotating BSSs in ω Cen are preferentially bluer, i.e., more massive. This may indicate that this particular blue BSS population was formed in a unique formation event and/or through a unique mechanism.
Boese, A Daniel; Forbert, Harald; Masia, Marco; Tekin, Adem; Marx, Dominik; Jansen, Georg
2011-08-28
The infrared spectroscopy of molecules, complexes, and molecular aggregates dissolved in superfluid helium clusters, commonly called HElium NanoDroplet Isolation (HENDI) spectroscopy, is an established, powerful experimental technique for extracting high resolution ro-vibrational spectra at ultra-low temperatures. Realistic quantum simulations of such systems, in particular in cases where the solute is undergoing a chemical reaction, require accurate solute-helium potentials which are also simple enough to be efficiently evaluated over the vast number of steps required in typical Monte Carlo or molecular dynamics sampling. This precludes using global potential energy surfaces as often parameterized for small complexes in the realm of high-resolution spectroscopic investigations that, in view of the computational effort imposed, are focused on the intermolecular interaction of rigid molecules with helium. Simple Lennard-Jones-like pair potentials, on the other hand, fall short in providing the required flexibility and accuracy in order to account for chemical reactions of the solute molecule. Here, a general scheme of constructing sufficiently accurate site-site potentials for use in typical quantum simulations is presented. This scheme employs atom-based grids, accounts for local and global minima, and is applied to the special case of a HCl(H₂O)₄ cluster solvated by helium. As a first step, accurate interaction energies of a helium atom with a set of representative configurations sampled from a trajectory following the dissociation of the HCl(H₂O)₄ cluster were computed using an efficient combination of density functional theory and symmetry-adapted perturbation theory, i.e., the DFT-SAPT approach. For each of the sampled cluster configurations, a helium atom was placed at several hundred positions distributed in space, leading to an overall number of about 400,000 such quantum chemical calculations. The resulting total interaction energies, decomposed into several energetic contributions, served to fit a site-site potential, where the sites are located at the atomic positions and, additionally, pseudo-sites are distributed along the lines joining pairs of atom sites within the molecular cluster. This approach ensures that this solute-helium potential is able to describe both undissociated molecular and dissociated (zwitter-)ionic configurations, as well as the interconnecting reaction pathway, without re-adjusting partial charges or other parameters depending on the particular configuration. Test calculations of the larger HCl(H₂O)₅ cluster interacting with helium demonstrate the transferability of the derived site-site potential. This specific potential can be readily used in quantum simulations of such HCl/water clusters in bulk helium or helium nanodroplets, whereas the underlying construction procedure can be generalized to other molecular solutes in other atomic solvents such as those encountered in rare gas matrix isolation spectroscopy.
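As a rough illustration of the fitting step described above, the sketch below performs a least-squares fit of reference helium-solute interaction energies to a simple sum of site terms. The exponential-plus-dispersion functional form, the absence of pseudo-sites, and all names are assumptions made for the example; they are not the functional form or code used in the paper.

# Schematic least-squares fit of a site-site solute-helium potential,
# V = sum_i [ A_i * exp(-b_i * r_i) - c6_i / r_i**6 ], to reference energies.
# The functional form and parameters are illustrative, not the paper's model.
import numpy as np
from scipy.optimize import least_squares

def fit_site_site(dists, energies, n_sites):
    """dists: (n_configs, n_sites) He-site distances; energies: (n_configs,)."""
    def model(params):
        A = params[:n_sites]
        b = params[n_sites:2 * n_sites]
        c6 = params[2 * n_sites:]
        return (A * np.exp(-b * dists) - c6 / dists**6).sum(axis=1)

    x0 = np.concatenate([np.ones(n_sites),        # A_i starting guesses
                         np.full(n_sites, 2.0),   # b_i starting guesses
                         np.ones(n_sites)])       # c6_i starting guesses
    fit = least_squares(lambda p: model(p) - energies, x0)
    return fit.x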
Methods for sample size determination in cluster randomized trials
Rutterford, Clare; Copas, Andrew; Eldridge, Sandra
2015-01-01
Background: The use of cluster randomized trials (CRTs) is increasing, along with the variety in their design and analysis. The simplest approach for their sample size calculation is to calculate the sample size assuming individual randomization and inflate this by a design effect to account for randomization by cluster. The assumptions of a simple design effect may not always be met; alternative or more complicated approaches are required. Methods: We summarise a wide range of sample size methods available for cluster randomized trials. For those familiar with sample size calculations for individually randomized trials but with less experience in the clustered case, this manuscript provides formulae for a wide range of scenarios with associated explanation and recommendations. For those with more experience, comprehensive summaries are provided that allow quick identification of methods for a given design, outcome and analysis method. Results: We present first those methods applicable to the simplest two-arm, parallel group, completely randomized design followed by methods that incorporate deviations from this design such as: variability in cluster sizes; attrition; non-compliance; or the inclusion of baseline covariates or repeated measures. The paper concludes with methods for alternative designs. Conclusions: There is a large amount of methodology available for sample size calculations in CRTs. This paper gives the most comprehensive description of published methodology for sample size calculation and provides an important resource for those designing these trials. PMID:26174515
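The simplest approach mentioned in this abstract is the standard design-effect inflation, which can be written as DEFF = 1 + (m - 1) × ICC for average cluster size m and intracluster correlation ICC. The sketch below is a generic illustration of that textbook formula, not a reproduction of the paper's formulae; the example numbers are invented.

# Simplest cluster-randomized sample size inflation:
# n_cluster = n_individual * DEFF, DEFF = 1 + (m - 1) * ICC.
import math

def clustered_sample_size(n_individual, mean_cluster_size, icc):
    deff = 1 + (mean_cluster_size - 1) * icc        # design effect
    return math.ceil(n_individual * deff)

# Example: 300 participants under individual randomization, clusters of 20,
# ICC = 0.05  ->  design effect 1.95  ->  585 participants needed.
print(clustered_sample_size(300, 20, 0.05))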
Spatially explicit population estimates for black bears based on cluster sampling
Humm, J.; McCown, J. Walter; Scheick, B.K.; Clark, Joseph D.
2017-01-01
We estimated abundance and density of the 5 major black bear (Ursus americanus) subpopulations (i.e., Eglin, Apalachicola, Osceola, Ocala-St. Johns, Big Cypress) in Florida, USA with spatially explicit capture-mark-recapture (SCR) by extracting DNA from hair samples collected at barbed-wire hair sampling sites. We employed a clustered sampling configuration with sampling sites arranged in 3 × 3 grids (clusters), with sites spaced 2 km apart within each cluster and cluster centers spaced 16 km apart (center to center). We surveyed all 5 subpopulations, encompassing 38,960 km², during 2014 and 2015. Several landscape variables, most associated with forest cover, helped refine density estimates for the 5 subpopulations we sampled. Detection probabilities were affected by site-specific behavioral responses coupled with individual capture heterogeneity associated with sex. Model-averaged bear population estimates ranged from 120 (95% CI = 59–276) bears, or a mean 0.025 bears/km² (95% CI = 0.011–0.44), for the Eglin subpopulation to 1,198 bears (95% CI = 949–1,537), or 0.127 bears/km² (95% CI = 0.101–0.163), for the Ocala-St. Johns subpopulation. The total population estimate for our 5 study areas was 3,916 bears (95% CI = 2,914–5,451). The clustered sampling method coupled with information on land cover was efficient and allowed us to estimate abundance across extensive areas that would not have been possible otherwise. Clustered sampling combined with spatially explicit capture-recapture methods has the potential to provide rigorous population estimates for a wide array of species that are extensive and heterogeneous in their distribution.
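A small geometric sketch of the clustered sampling layout described above (3 × 3 grids of sites, 2 km site spacing, 16 km between cluster centers) is given below. It only generates planar site coordinates in kilometres for illustration; it is not part of the study's SCR analysis, and the function name is invented.

# Sketch of the clustered sampling geometry described above: 3 x 3 grids of
# hair-sampling sites, sites 2 km apart, cluster centres 16 km apart.
import numpy as np

def cluster_sites(n_cluster_rows, n_cluster_cols,
                  site_spacing=2.0, center_spacing=16.0):
    # Offsets of the 9 sites in a 3 x 3 grid around each cluster centre (km).
    offsets = np.array([(dx, dy)
                        for dx in (-1, 0, 1)
                        for dy in (-1, 0, 1)]) * site_spacing
    sites = []
    for i in range(n_cluster_rows):
        for j in range(n_cluster_cols):
            center = np.array([i, j]) * center_spacing
            sites.append(center + offsets)
    return np.vstack(sites)        # (9 * number of clusters, 2) coordinates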
Salo, Hanna; Berisha, Anna-Kaisa; Mäkinen, Joni
2016-03-01
This is the first study seasonally applying Sphagnum papillosum moss bags and vertical snow samples for monitoring atmospheric pollution. Moss bags, exposed in January, were collected together with snow samples by early March 2012 near the Harjavalta Industrial Park in southwest Finland. Magnetic, chemical, scanning electron microscopy-energy dispersive X-ray spectroscopy (SEM-EDX), K-means clustering, and Tomlinson pollution load index (PLI) data showed parallel spatial trends of pollution dispersal for both materials. The results strengthen previous findings that concentrate and slag handling activities were important (dust) emission sources, while the impact from the Cu-Ni smelter's pipe remained secondary at closer distances. Statistically significant correlations existed between the variables of snow and moss bags. In summary, both methods work well for sampling and are efficient pollutant accumulators. Moss bags can also be used in winter conditions, and they provide a more homogeneous and better controlled sampling method than snow samples. Copyright © 2015. Published by Elsevier B.V.
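For reference, the Tomlinson pollution load index used above is commonly computed as the n-th root of the product of contamination factors (element concentration divided by a background value). The sketch below illustrates that standard definition only; the background values and function name are placeholders, not values from this study.

# Tomlinson pollution load index (PLI), standard form:
# PLI = (CF_1 * CF_2 * ... * CF_n)^(1/n), CF = C_sample / C_background.
import numpy as np

def pollution_load_index(concentrations, backgrounds):
    cf = np.asarray(concentrations, dtype=float) / np.asarray(backgrounds, dtype=float)
    return cf.prod() ** (1.0 / len(cf))      # geometric mean of the CFs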
Tracing Large Scale Structure with a Redshift Survey of Rich Clusters of Galaxies
NASA Astrophysics Data System (ADS)
Batuski, D.; Slinglend, K.; Haase, S.; Hill, J. M.
1993-12-01
Rich clusters of galaxies from Abell's catalog show evidence of structure on scales of 100 Mpc and hold promise of confirming the existence of structure in the more immediate universe on scales corresponding to COBE results (i.e., on the order of 10% or more of the horizon size of the universe). However, most Abell clusters do not yet have measured redshifts (or, in the case of most low redshift clusters, have only one or two galaxies measured), so present knowledge of their three dimensional distribution has quite large uncertainties. The shortage of measured redshifts for these clusters may also mask a problem of projection effects corrupting the membership counts for the clusters, perhaps even to the point of spurious identifications of some of the clusters themselves. Our approach in this effort has been to use the MX multifiber spectrometer to measure redshifts of at least ten galaxies in each of about 80 Abell cluster fields with richness class R ≥ 1 and mag10 ≤ 16.8. This work will result in a somewhat deeper, much more complete (and reliable) sample of positions of rich clusters. Our primary use for the sample is for two-point correlation and other studies of the large scale structure traced by these clusters. We are also obtaining enough redshifts per cluster so that a much better sample of reliable cluster velocity dispersions will be available for other studies of cluster properties. To date, we have collected such data for 40 clusters, and for most of them we have seven or more cluster members with redshifts, allowing for reliable velocity dispersion calculations. Velocity histograms for several interesting cluster fields are presented, along with summary tables of cluster redshift results. Also, with 10 or more redshifts in most of our cluster fields (30 arcmin square, just about an 'Abell diameter' at z ~ 0.1), we have investigated the extent of projection effects within the Abell catalog in an effort to quantify and understand how this may affect the Abell sample.
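The velocity dispersions mentioned above can be estimated from a handful of member redshifts. The sketch below is a minimal line-of-sight estimate (no membership clipping, relativistic corrections, or bias corrections), included only to make the quantity concrete; it is not the survey's reduction code.

# Minimal line-of-sight velocity dispersion from member redshifts.
import numpy as np

C_KM_S = 299_792.458  # speed of light in km/s

def velocity_dispersion(redshifts):
    z = np.asarray(redshifts, dtype=float)
    z_cl = np.mean(z)                            # crude cluster redshift
    v = C_KM_S * (z - z_cl) / (1.0 + z_cl)       # peculiar velocities, km/s
    return np.std(v, ddof=1)                     # sample standard deviation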
The Mass Function in h+χ Persei
NASA Astrophysics Data System (ADS)
Bragg, Ann; Kenyon, Scott
2000-08-01
Knowledge of the stellar initial mass function (IMF) is critical to understanding star formation and galaxy evolution. Past studies of the IMF in open clusters have primarily used luminosity functions to determine mass functions, frequently in relatively sparse clusters. Our goal with this project is to derive a reliable, well-sampled IMF for a pair of very dense young clusters (h+χ Persei) with ages of 1-2 × 10^7 yr (e.g., Vogt A&A 11:359), where stellar evolution theory is robust. We will construct the HR diagram using both photometry and spectral types to derive more accurate stellar masses and ages than are possible using photometry alone. Results from the two clusters will be compared to examine the universality of the IMF. We currently have a spectroscopic sample covering an area within 9 arcminutes of the center of each cluster taken with the FAST Spectrograph. The sample is complete to V = 15.4 and contains ~1000 stars. We request 2 nights at WIYN/HYDRA to extend this sample to deeper magnitudes, allowing us to determine the IMF of the clusters to a lower limiting mass and to search for a pre-main sequence, theoretically predicted to be present for clusters of this age. Note that both clusters are contained within a single HYDRA field.
NASA Astrophysics Data System (ADS)
Liu, Jianjun; Kan, Jianquan
2018-04-01
In this paper, a new method for identifying genetically modified material from terahertz spectra is proposed, combining a support vector machine (SVM) with affinity propagation clustering. The algorithm uses affinity propagation clustering to analyze and label the unlabeled training samples, and the SVM training data are continuously updated during the iterative process. Because the identification model does not require manually labeled training samples, the error introduced by human labeling is reduced and the identification accuracy of the model is greatly improved.
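A minimal sketch of the cluster-then-classify idea is given below using scikit-learn. It labels unlabeled spectra with affinity propagation and trains an SVM on those pseudo-labels; the paper's iterative update of the SVM training set is not reproduced, and the function and kernel choices are assumptions for illustration.

# Sketch: label unlabeled spectra via affinity propagation, then train an SVM
# on the cluster-derived labels. Not the paper's full iterative scheme.
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.svm import SVC

def cluster_then_classify(X_unlabeled, X_test):
    ap = AffinityPropagation(random_state=0)
    pseudo_labels = ap.fit_predict(X_unlabeled)      # cluster-derived labels
    clf = SVC(kernel="rbf", gamma="scale")
    clf.fit(X_unlabeled, pseudo_labels)              # SVM trained on pseudo-labels
    return clf.predict(X_test)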
Open star clusters and Galactic structure
NASA Astrophysics Data System (ADS)
Joshi, Yogesh C.
2018-04-01
In order to understand the Galactic structure, we perform a statistical analysis of the distribution of various cluster parameters based on an almost complete sample of Galactic open clusters currently available. The geometrical and physical characteristics of a large number of open clusters given in the MWSC catalogue are used to study the spatial distribution of clusters in the Galaxy and to determine the scale height, solar offset, local mass density, and distribution of reddening material in the solar neighbourhood. We also explore the mass-radius and mass-age relations in Galactic open star clusters. We find that the estimated parameters of the Galactic disk are largely influenced by the choice of cluster sample.
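As a rough illustration of one of the quantities above, the sketch below estimates a vertical scale height under the simple assumption of an exponential profile N(z) ∝ exp(-|z|/h), for which the maximum-likelihood estimate of h is just the mean of |z - z_sun|. This is a generic textbook estimator, not the analysis method used in the study, and the inputs are hypothetical.

# Illustrative scale height estimate assuming N(z) ~ exp(-|z|/h).
import numpy as np

def scale_height(z_pc, solar_offset_pc=0.0):
    z = np.abs(np.asarray(z_pc, dtype=float) - solar_offset_pc)
    return z.mean()    # ML estimate of h for an exponential distribution, in pc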
Declustering of clustered preferential sampling for histogram and semivariogram inference
Olea, R.A.
2007-01-01
Measurements of attributes obtained more as a consequence of business ventures than of sampling design frequently result in samplings that are preferential both in location and in value, typically in the form of clusters along the pay. Preferential sampling requires preprocessing in order to properly infer characteristics of the parent population, such as the cumulative distribution and the semivariogram. Consideration of the distance to the nearest neighbor allows preparation of resampled sets that produce results comparable to those from previously proposed methods. A clustered sampling of size 140, taken from an exhaustive sampling, is employed to illustrate this approach. © International Association for Mathematical Geology 2007.
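One simple way to use nearest-neighbor distance for declustering is to thin the sample so that no retained point lies closer than a chosen distance to another retained point. The sketch below illustrates that idea only; it is not Olea's procedure, and the threshold, names, and greedy thinning strategy are assumptions for the example.

# Nearest-neighbour thinning sketch: keep a point only if its nearest
# already-retained neighbour is at least `min_dist` away.
import numpy as np
from scipy.spatial import cKDTree

def decluster(coords, values, min_dist):
    coords = np.asarray(coords, dtype=float)
    keep = []
    for i, p in enumerate(coords):
        # Rebuilding the tree each step is inefficient but keeps the sketch short.
        if not keep or cKDTree(coords[keep]).query(p)[0] >= min_dist:
            keep.append(i)
    return coords[keep], np.asarray(values)[keep]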