Sample records for clusters improved estimates

  1. Improving estimation of kinetic parameters in dynamic force spectroscopy using cluster analysis

    NASA Astrophysics Data System (ADS)

    Yen, Chi-Fu; Sivasankar, Sanjeevi

    2018-03-01

    Dynamic Force Spectroscopy (DFS) is a widely used technique to characterize the dissociation kinetics and interaction energy landscape of receptor-ligand complexes with single-molecule resolution. In an Atomic Force Microscope (AFM)-based DFS experiment, receptor-ligand complexes, sandwiched between an AFM tip and substrate, are ruptured at different stress rates by varying the speed at which the AFM tip and substrate are pulled away from each other. The rupture events are grouped according to their pulling speeds, and the mean force and loading rate of each group are calculated. These data are subsequently fit to established models, and energy landscape parameters such as the intrinsic off-rate (koff) and the width of the potential energy barrier (xβ) are extracted. However, due to large uncertainties in determining the mean forces and loading rates of the groups, errors in the estimated koff and xβ can be substantial. Here, we demonstrate that the accuracy of fitted parameters in a DFS experiment can be dramatically improved by sorting rupture events into groups using cluster analysis instead of sorting them according to their pulling speeds. We test different clustering algorithms, including Gaussian mixture, logistic regression, and K-means clustering, under conditions that closely mimic DFS experiments. Using Monte Carlo simulations, we benchmark the performance of these clustering algorithms over a wide range of koff and xβ, under different levels of thermal noise, and as a function of both the number of unbinding events and the number of pulling speeds. Our results demonstrate that cluster analysis, particularly K-means clustering, is very effective in improving the accuracy of parameter estimation, especially when the number of unbinding events is limited and the events are not well separated into distinct groups. Cluster analysis is easy to implement, and our performance benchmarks serve as a guide in choosing an appropriate method for DFS data analysis.
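
The grouping step described in this abstract can be sketched with a plain k-means pass over synthetic (log loading rate, force) rupture events. The group centres, spreads, and seeding scheme below are invented for illustration and are not values from the paper:

```python
import numpy as np

def kmeans(points, init, iters=100):
    """Plain k-means with given initial centroids; returns hard labels."""
    centroids = init.copy()
    for _ in range(iters):
        d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new = np.array([points[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(len(init))])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels

# Hypothetical rupture events from three pulling speeds: columns are
# (log10 loading rate, rupture force in pN); all values are invented.
rng = np.random.default_rng(1)
events = np.vstack([
    np.column_stack([rng.normal(lr, 0.3, 200), rng.normal(f, 8.0, 200)])
    for lr, f in [(2.0, 40.0), (3.0, 55.0), (4.0, 70.0)]
])

z = (events - events.mean(axis=0)) / events.std(axis=0)  # standardize units
init = z[np.argsort(z[:, 0])[[100, 300, 500]]]           # spread-out seeds
labels = kmeans(z, init)

# Each cluster's mean force and loading rate would then feed a
# Bell-Evans fit, F* = (kB*T/x_beta) * ln(r * x_beta / (koff * kB * T)).
for j in range(3):
    lr, f = events[labels == j].mean(axis=0)
    print(f"cluster {j}: <log10 rate> = {lr:.2f}, <force> = {f:.1f} pN")
```

Standardizing the two features first keeps the force axis (tens of pN) from dominating the Euclidean distances.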

  2. Improving cluster-based missing value estimation of DNA microarray data.

    PubMed

    Brás, Lígia P; Menezes, José C

    2007-06-01

    We present a modification of the weighted K-nearest neighbours imputation method (KNNimpute) for missing value (MV) estimation in microarray data based on the reuse of estimated data. The method is called iterative KNN imputation (IKNNimpute), as the estimation is performed iteratively using the recently estimated values. The estimation efficiency of IKNNimpute was assessed under different conditions (data type, fraction and structure of missing data) by the normalized root mean squared error (NRMSE) and the correlation coefficients between estimated and true values, and compared with that of other cluster-based estimation methods (KNNimpute and sequential KNN). We further investigated the influence of imputation on the detection of differentially expressed genes using SAM by examining the differentially expressed genes that are lost after MV estimation. The performance measures give consistent results, indicating that the iterative procedure of IKNNimpute can enhance the prediction ability of cluster-based methods in the presence of high missing rates, in non-time series experiments and in data sets comprising both time series and non-time series data, because the information of the genes having MVs is used more efficiently and the iterative procedure allows refining of the MV estimates. More importantly, IKNNimpute has a smaller detrimental effect on the detection of differentially expressed genes.
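
The iterative reuse of estimates that distinguishes IKNNimpute from plain KNNimpute can be sketched as follows. The toy matrix, the inverse-distance weighting, and the convergence tolerance are illustrative assumptions, not details from the paper:

```python
import numpy as np

def iknn_impute(X, k=3, iters=10):
    """Iterative KNN imputation sketch: values imputed in earlier passes are
    reused when searching for neighbours in later passes."""
    X = X.astype(float).copy()
    miss = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    X[miss] = np.take(col_means, np.where(miss)[1])   # crude initial guess
    for _ in range(iters):
        X_prev = X.copy()
        for i in np.unique(np.where(miss)[0]):        # rows with missing values
            d = np.linalg.norm(X_prev - X_prev[i], axis=1)
            d[i] = np.inf                             # exclude the row itself
            nbrs = np.argsort(d)[:k]
            w = 1.0 / (d[nbrs] + 1e-12)               # inverse-distance weights
            X[i, miss[i]] = (w @ X_prev[nbrs][:, miss[i]]) / w.sum()
        if np.allclose(X, X_prev, atol=1e-6):
            break
    return X

# Toy expression matrix (rows = genes, columns = samples) with two MVs.
X = np.array([[1.0, 2.0, 3.0],
              [1.1, np.nan, 3.1],
              [0.9, 1.9, np.nan],
              [5.0, 6.0, 7.0],
              [5.1, 6.1, 7.2]])
print(iknn_impute(X, k=2))
```

Because rows with MVs participate in the neighbour search using their current estimates, later passes can refine earlier imputations, which is the core idea of the method.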

  3. Optimizing weak lensing mass estimates for cluster profile uncertainty

    DOE PAGES

    Gruen, D.; Bernstein, G. M.; Lam, T. Y.; ...

    2011-09-11

    Weak lensing measurements of cluster masses are necessary for calibrating mass-observable relations (MORs) to investigate the growth of structure and the properties of dark energy. However, the measured cluster shear signal varies at fixed mass M_200m due to the inherent ellipticity of background galaxies, intervening structures along the line of sight, and variations in cluster structure due to scatter in concentrations, asphericity and substructure. We use N-body simulated halos to derive and evaluate a weak lensing circular aperture mass measurement M_ap that minimizes the mass estimate variance ⟨(M_ap − M_200m)²⟩ in the presence of all of these forms of variability. Depending on halo mass and observational conditions, the resulting mass estimator improves on M_ap filters optimized for circular NFW-profile clusters in the presence of uncorrelated large-scale structure (LSS) about as much as the latter improve on an estimator that only minimizes the influence of shape noise. Optimizing for uncorrelated LSS while ignoring the variation of internal cluster structure puts too much weight on the profile near the cores of halos, and under some circumstances can even be worse than not accounting for LSS at all. Finally, we discuss the impact of variability in cluster structure and correlated structures on the design and performance of weak lensing surveys intended to calibrate cluster MORs.

  4. Estimating the concrete compressive strength using hard clustering and fuzzy clustering based regression techniques.

    PubMed

    Nagwani, Naresh Kumar; Deo, Shirish V

    2014-01-01

    Understanding the compressive strength of concrete is important for activities like construction arrangement, prestressing operations, proportioning new mixtures and quality assurance. Regression techniques are the most widely used for prediction tasks, where the relationship between the independent variables and the dependent (prediction) variable is identified. The accuracy of regression techniques can be improved if clustering is used along with regression, since clustering ensures a more accurate curve fit between the dependent and independent variables. In this work, a cluster-regression technique is applied to estimate the compressive strength of concrete, and a novel technique is proposed for predicting concrete compressive strength. The objective of this work is to demonstrate that clustering combined with regression yields smaller prediction errors when estimating concrete compressive strength. The proposed technique consists of two major stages: in the first stage, clustering is used to group concrete data with similar characteristics, and in the second stage regression techniques are applied over these clusters (groups) to predict the compressive strength from the individual clusters. Experiments show that clustering combined with regression gives the smallest errors for predicting the compressive strength of concrete, and that the fuzzy clustering algorithm C-means performs better than the K-means algorithm.
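
A minimal sketch of the two-stage cluster-then-regress idea, using a 1-D two-means split for the hard-clustering stage and per-cluster least squares. The mix data and the two linear "laws" are invented purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mix data: strength follows a different linear law in two
# regimes of cement content (all values invented for illustration).
x = np.concatenate([rng.uniform(100, 200, 150), rng.uniform(300, 400, 150)])
y = np.where(x < 250, 0.10 * x + 5, 0.05 * x + 30) + rng.normal(0, 1.0, 300)

def fit(xs, ys):
    A = np.column_stack([xs, np.ones_like(xs)])
    coef, *_ = np.linalg.lstsq(A, ys, rcond=None)
    return coef

def rmse_of(xs, ys, coef):
    return np.sqrt(np.mean((coef[0] * xs + coef[1] - ys) ** 2))

def two_means(xs, iters=50):
    """1-D k-means with k=2, standing in for the hard-clustering stage."""
    c = np.array([xs.min(), xs.max()])
    for _ in range(iters):
        lab = (np.abs(xs - c[0]) > np.abs(xs - c[1])).astype(int)
        c = np.array([xs[lab == 0].mean(), xs[lab == 1].mean()])
    return lab

# Stage 1: cluster the mixes; Stage 2: fit one regression per cluster.
labels = two_means(x)
global_rmse = rmse_of(x, y, fit(x, y))
resids = []
for j in (0, 1):
    xs, ys = x[labels == j], y[labels == j]
    c = fit(xs, ys)
    resids.append(c[0] * xs + c[1] - ys)
cluster_rmse = np.sqrt(np.mean(np.concatenate(resids) ** 2))
print(f"global RMSE = {global_rmse:.2f}, cluster-wise RMSE = {cluster_rmse:.2f}")
```

A single global line cannot follow both regimes, so the per-cluster fits reduce the residual error, which is the claim the abstract makes.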

  5. Estimating the Concrete Compressive Strength Using Hard Clustering and Fuzzy Clustering Based Regression Techniques

    PubMed Central

    Nagwani, Naresh Kumar; Deo, Shirish V.

    2014-01-01

    Understanding the compressive strength of concrete is important for activities like construction arrangement, prestressing operations, proportioning new mixtures and quality assurance. Regression techniques are the most widely used for prediction tasks, where the relationship between the independent variables and the dependent (prediction) variable is identified. The accuracy of regression techniques can be improved if clustering is used along with regression, since clustering ensures a more accurate curve fit between the dependent and independent variables. In this work, a cluster-regression technique is applied to estimate the compressive strength of concrete, and a novel technique is proposed for predicting concrete compressive strength. The objective of this work is to demonstrate that clustering combined with regression yields smaller prediction errors when estimating concrete compressive strength. The proposed technique consists of two major stages: in the first stage, clustering is used to group concrete data with similar characteristics, and in the second stage regression techniques are applied over these clusters (groups) to predict the compressive strength from the individual clusters. Experiments show that clustering combined with regression gives the smallest errors for predicting the compressive strength of concrete, and that the fuzzy clustering algorithm C-means performs better than the K-means algorithm. PMID:25374939

  6. A nonparametric clustering technique which estimates the number of clusters

    NASA Technical Reports Server (NTRS)

    Ramey, D. B.

    1983-01-01

    In applications of cluster analysis, one usually needs to determine the number of clusters, K, and the assignment of observations to each cluster. A clustering technique based on recursive application of a multivariate test of bimodality which automatically estimates both K and the cluster assignments is presented.
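
A rough sketch of the recursive-splitting idea, substituting a crude centre-separation criterion for the paper's multivariate bimodality test. The threshold, the 1-D data, and the stopping rule are all illustrative assumptions:

```python
import numpy as np

def split2(x, iters=50):
    """1-D 2-means split; returns a boolean assignment and the two centres."""
    c = np.array([x.min(), x.max()], dtype=float)
    for _ in range(iters):
        lab = np.abs(x - c[0]) > np.abs(x - c[1])
        if not lab.any() or lab.all():
            break
        c = np.array([x[~lab].mean(), x[lab].mean()])
    return lab, c

def recursive_cluster(x, min_size=10, sep=3.5):
    """Recursively split while the sub-cluster centres are more than `sep`
    pooled standard deviations apart (a crude stand-in for the paper's
    multivariate bimodality test)."""
    if len(x) < 2 * min_size:
        return [x]
    lab, c = split2(x)
    a, b = x[~lab], x[lab]
    if len(a) == 0 or len(b) == 0:
        return [x]
    pooled = np.sqrt((a.var() * len(a) + b.var() * len(b)) / len(x))
    if abs(c[1] - c[0]) < sep * pooled:
        return [x]                      # looks unimodal: stop splitting
    return recursive_cluster(a, min_size, sep) + recursive_cluster(b, min_size, sep)

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(mu, 1.0, 100) for mu in (0.0, 10.0, 25.0)])
clusters = recursive_cluster(x)
print("estimated K =", len(clusters))
```

Recursion terminates when every leaf fails the split criterion, so K and the cluster assignments come out together, as in the paper's approach.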

  7. Cluster-based analysis improves predictive validity of spike-triggered receptive field estimates

    PubMed Central

    Malone, Brian J.

    2017-01-01

    Spectrotemporal receptive field (STRF) characterization is a central goal of auditory physiology. STRFs are often approximated by the spike-triggered average (STA), which reflects the average stimulus preceding a spike. In many cases, the raw STA is subjected to a threshold defined by gain values expected by chance. However, such correction methods have not been universally adopted, and the consequences of specific gain-thresholding approaches have not been investigated systematically. Here, we evaluate two classes of statistical correction techniques, using the resulting STRF estimates to predict responses to a novel validation stimulus. The first, more traditional technique eliminated STRF pixels (time-frequency bins) with gain values expected by chance. This correction method yielded significant increases in prediction accuracy, including when the threshold setting was optimized for each unit. The second technique was a two-step thresholding procedure wherein clusters of contiguous pixels surviving an initial gain threshold were then subjected to a cluster mass threshold based on summed pixel values. This approach significantly improved upon even the best gain-thresholding techniques. Additional analyses suggested that allowing threshold settings to vary independently for excitatory and inhibitory subfields of the STRF resulted in only marginal additional gains, at best. In summary, augmenting reverse correlation techniques with principled statistical correction choices increased prediction accuracy by over 80% for multi-unit STRFs and by over 40% for single-unit STRFs, furthering the interpretational relevance of the recovered spectrotemporal filters for auditory systems analysis. PMID:28877194

  8. Fuzzy C-mean clustering on kinetic parameter estimation with generalized linear least square algorithm in SPECT

    NASA Astrophysics Data System (ADS)

    Choi, Hon-Chit; Wen, Lingfeng; Eberl, Stefan; Feng, Dagan

    2006-03-01

    Dynamic Single Photon Emission Computed Tomography (SPECT) has the potential to quantitatively estimate physiological parameters by fitting compartment models to the tracer kinetics. The generalized linear least square method (GLLS) is an efficient method to estimate unbiased kinetic parameters and parametric images. However, due to the low sensitivity of SPECT, noisy data can cause voxel-wise parameter estimation by GLLS to fail. Fuzzy C-Mean (FCM) clustering and modified FCM, which also utilizes information from the immediate neighboring voxels, are proposed to improve the voxel-wise parameter estimation of GLLS. Monte Carlo simulations were performed to generate dynamic SPECT data with different noise levels, which were processed by general and modified FCM clustering. Parametric images were estimated by Logan and Yokoi graphical analysis and by GLLS. The influx rate (K_I) and volume of distribution (V_d) were estimated for the cerebellum, thalamus and frontal cortex. Our results show that (1) FCM reduces the bias and improves the reliability of parameter estimates for noisy data, (2) GLLS provides estimates of micro parameters (K_I-k_4) as well as macro parameters, such as the volume of distribution (V_d) and the binding potentials (BP_I and BP_II), and (3) FCM clustering incorporating neighboring voxel information does not improve the parameter estimates, but reduces noise in the parametric images. These findings indicate that pre-segmentation with traditional FCM clustering is desirable for generating voxel-wise parametric images with GLLS from dynamic SPECT data.
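
The standard fuzzy C-means updates used for the pre-segmentation step can be sketched on hypothetical time-activity curves. The uptake kinetics, noise level, and fuzzifier below are invented, not a fitted SPECT model:

```python
import numpy as np

def fcm(X, c=2, m=2.0, iters=100, seed=0):
    """Fuzzy C-means: returns soft memberships u (n x c) and cluster centres."""
    rng = np.random.default_rng(seed)
    u = rng.random((len(X), c))
    u /= u.sum(axis=1, keepdims=True)
    centers = None
    for _ in range(iters):
        um = u ** m                                    # fuzzified memberships
        centers = (um.T @ X) / um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        u_new = inv / inv.sum(axis=1, keepdims=True)   # standard FCM update
        if np.allclose(u_new, u, atol=1e-8):
            u = u_new
            break
        u = u_new
    return u, centers

# Hypothetical time-activity curves for two tissue classes with slow and
# fast uptake (illustrative kinetics only).
t = np.linspace(1, 30, 8)
rng = np.random.default_rng(1)
slow = 1 - np.exp(-0.05 * t) + rng.normal(0, 0.02, (50, 8))
fast = 1 - np.exp(-0.50 * t) + rng.normal(0, 0.02, (50, 8))
u, centers = fcm(np.vstack([slow, fast]), c=2)
print("cluster sizes:", np.bincount(u.argmax(axis=1), minlength=2))
```

The soft memberships u, rather than hard labels, are what make FCM attractive for noisy voxel data: borderline voxels contribute to both cluster centres.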

  9. Memory color assisted illuminant estimation through pixel clustering

    NASA Astrophysics Data System (ADS)

    Zhang, Heng; Quan, Shuxue

    2010-01-01

    The under-constrained nature of illuminant estimation means that, in order to resolve the problem, certain assumptions are needed, such as the gray world theory. Including more constraints in this process may help exploit the useful information in an image and improve the accuracy of the estimated illuminant, provided that the constraints hold. Based on the observation that most personal images contain one or more of the following categories: neutral objects, human beings, sky, and plants, we propose a method for illuminant estimation through the clustering of pixels of gray and three dominant memory colors: skin tone, sky blue, and foliage green. Analysis shows that samples of the above colors cluster around small areas under different illuminants, and their characteristics can be used to effectively detect pixels falling into each of the categories. The algorithm requires knowledge of the spectral sensitivity response of the camera, and a spectral database consisting of the CIE standard illuminants and reflectance or radiance data for samples of the above colors.

  10. the-wizz: clustering redshift estimation for everyone

    NASA Astrophysics Data System (ADS)

    Morrison, C. B.; Hildebrandt, H.; Schmidt, S. J.; Baldry, I. K.; Bilicki, M.; Choi, A.; Erben, T.; Schneider, P.

    2017-05-01

    We present the-wizz, an open-source and user-friendly software package for estimating the redshift distributions of photometric galaxies with unknown redshifts by spatially cross-correlating them against a reference sample with known redshifts. The main benefit of the-wizz is in separating the angular pair finding and correlation estimation from the computation of the output clustering redshifts, allowing anyone to create a clustering redshift for their sample without the intervention of an 'expert'. It allows the end user of a given survey to select any subsample of photometric galaxies with unknown redshifts, match this sample's catalogue indices into a value-added data file and produce a clustering redshift estimate for this sample in a fraction of the time it would take to run all the angular correlations needed to produce a clustering redshift. We show results with this software using photometric data from the Kilo-Degree Survey (KiDS) and spectroscopic redshifts from the Galaxy and Mass Assembly survey and the Sloan Digital Sky Survey. The results we present for KiDS are consistent with the redshift distributions used in a recent cosmic shear analysis from the survey. We also present results using a hybrid machine learning-clustering redshift analysis that enables the estimation of clustering redshifts for individual galaxies. the-wizz can be downloaded at http://github.com/morriscb/The-wiZZ/.

  11. Attitude Estimation in Fractionated Spacecraft Cluster Systems

    NASA Technical Reports Server (NTRS)

    Hadaegh, Fred Y.; Blackmore, James C.

    2011-01-01

    Attitude estimation was examined for fractionated free-flying spacecraft. Instead of a single, monolithic spacecraft, a fractionated free-flying spacecraft uses multiple spacecraft modules. These modules are connected only through wireless communication links and, potentially, wireless power links. The key advantage of this concept is the ability to respond to uncertainty. For example, if a single spacecraft module in the cluster fails, a new one can be launched at a lower cost and risk than would be incurred with on-orbit servicing or replacement of the monolithic spacecraft. In order to create such a system, however, it is essential to know what the navigation capabilities of the fractionated system are as a function of the capabilities of the individual modules, and to have an algorithm that can estimate the attitudes and relative positions of the modules with fractionated sensing capabilities. Looking specifically at fractionated attitude estimation with star trackers and optical relative attitude sensors, a set of mathematical tools was developed that specifies the set of sensors necessary to ensure that the attitude of the entire cluster (the 'cluster attitude') can be observed. Also developed was a navigation filter that can estimate the cluster attitude if these conditions are satisfied. Each module in the cluster may have either a star tracker, a relative attitude sensor, or both. An extended Kalman filter can be used to estimate the attitude of all modules. A range of estimation performances can be achieved depending on the sensors used and the topology of the sensing network.

  12. State estimation and prediction using clustered particle filters.

    PubMed

    Lee, Yoonsang; Majda, Andrew J

    2016-12-20

    Particle filtering is an essential tool to improve uncertain model predictions by incorporating noisy observational data from complex systems including non-Gaussian features. A class of particle filters, clustered particle filters, is introduced for high-dimensional nonlinear systems; it uses relatively few particles compared with the standard particle filter. The clustered particle filter captures non-Gaussian features of the true signal, which are typical in complex nonlinear dynamical systems such as geophysical systems. The method is also robust in the difficult regime of high-quality sparse and infrequent observations. The key features of the clustered particle filter are coarse-grained localization through the clustering of the state variables and particle adjustment to stabilize the method; each observation affects only neighboring state variables through clustering, and particles are adjusted to prevent particle collapse due to high-quality observations. The clustered particle filter is tested for the 40-dimensional Lorenz 96 model with several dynamical regimes, including strongly non-Gaussian statistics. The clustered particle filter shows robust skill in both achieving accurate filter results and capturing non-Gaussian statistics of the true signal. It is further extended to multiscale data assimilation, which provides large-scale estimation by combining a cheap reduced-order forecast model and mixed observations of the large- and small-scale variables. This approach enables the use of a larger number of particles due to the computational savings in the forecast model. The multiscale clustered particle filter is tested for one-dimensional dispersive wave turbulence using a forecast model with model errors.
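
The coarse-grained localization idea, in which each observation updates only the state variables in its own cluster, can be sketched with a toy blockwise bootstrap particle filter. The dynamics, noise levels, and fixed two-block clustering below are stand-ins, not the Lorenz 96 setup or the particle-adjustment step of the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_part, n_steps = 8, 100, 50
blocks = [np.arange(0, 4), np.arange(4, 8)]   # two clusters of state variables
obs_noise, model_noise = 0.1, 0.3

def step(x):
    return 0.9 * x + 0.1 * np.sin(x)          # toy contracting nonlinear dynamics

truth = np.zeros(dim)
particles = rng.normal(0.0, 1.0, (n_part, dim))

for _ in range(n_steps):
    truth = step(truth) + rng.normal(0, model_noise, dim)
    obs = truth + rng.normal(0, obs_noise, dim)
    particles = step(particles) + rng.normal(0, model_noise, (n_part, dim))
    # Localized update: weight and resample each block independently, so an
    # observation only influences the state variables in its own cluster.
    for b in blocks:
        logw = -0.5 * ((obs[b] - particles[:, b]) ** 2).sum(axis=1) / obs_noise**2
        w = np.exp(logw - logw.max())
        w /= w.sum()
        idx = rng.choice(n_part, size=n_part, p=w)
        particles[:, b] = particles[idx][:, b]

est = particles.mean(axis=0)
rmse_val = np.sqrt(np.mean((est - truth) ** 2))
print("posterior-mean RMSE:", round(float(rmse_val), 3))
```

Resampling blockwise keeps the effective dimension of each weight computation small, which is why localization lets a modest particle count track a higher-dimensional state.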

  13. State estimation and prediction using clustered particle filters

    PubMed Central

    Lee, Yoonsang; Majda, Andrew J.

    2016-01-01

    Particle filtering is an essential tool to improve uncertain model predictions by incorporating noisy observational data from complex systems including non-Gaussian features. A class of particle filters, clustered particle filters, is introduced for high-dimensional nonlinear systems; it uses relatively few particles compared with the standard particle filter. The clustered particle filter captures non-Gaussian features of the true signal, which are typical in complex nonlinear dynamical systems such as geophysical systems. The method is also robust in the difficult regime of high-quality sparse and infrequent observations. The key features of the clustered particle filter are coarse-grained localization through the clustering of the state variables and particle adjustment to stabilize the method; each observation affects only neighboring state variables through clustering, and particles are adjusted to prevent particle collapse due to high-quality observations. The clustered particle filter is tested for the 40-dimensional Lorenz 96 model with several dynamical regimes, including strongly non-Gaussian statistics. The clustered particle filter shows robust skill in both achieving accurate filter results and capturing non-Gaussian statistics of the true signal. It is further extended to multiscale data assimilation, which provides large-scale estimation by combining a cheap reduced-order forecast model and mixed observations of the large- and small-scale variables. This approach enables the use of a larger number of particles due to the computational savings in the forecast model. The multiscale clustered particle filter is tested for one-dimensional dispersive wave turbulence using a forecast model with model errors. PMID:27930332

  14. Re-estimating sample size in cluster randomised trials with active recruitment within clusters.

    PubMed

    van Schie, S; Moerbeek, M

    2014-08-30

    Often only a limited number of clusters can be obtained in cluster randomised trials, although many potential participants can be recruited within each cluster. Thus, active recruitment is feasible within the clusters. To obtain an efficient sample size in a cluster randomised trial, the cluster-level and individual-level variances should be known before the study starts, but this is often not the case. We suggest using an internal pilot study design to address this problem of unknown variances. A pilot can be useful to re-estimate the variances and re-calculate the sample size during the trial. Using simulated data, it is shown that an initially low or high power can be adjusted using an internal pilot, with the type I error rate remaining within an acceptable range. The intracluster correlation coefficient can be re-estimated with more precision, which has a positive effect on the sample size. We conclude that an internal pilot study design may be used if active recruitment is feasible within a limited number of clusters. Copyright © 2014 John Wiley & Sons, Ltd.
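
The effect of re-estimating the intracluster correlation coefficient (ICC) on the required sample size can be sketched with the usual design-effect inflation 1 + (m - 1)ρ for clusters of size m. The effect size, ICC values, and cluster size below are illustrative, not numbers from the paper:

```python
from math import ceil

# Standard normal quantiles for two-sided alpha = 0.05 and power = 0.80,
# hardcoded to keep the sketch dependency-free.
z_alpha, z_beta = 1.96, 0.8416

def subjects_per_arm(delta, sd, icc, cluster_size):
    """Individually randomised sample size inflated by the design effect
    1 + (m - 1) * icc for clusters of size m."""
    n_ind = 2 * ((z_alpha + z_beta) * sd / delta) ** 2
    deff = 1 + (cluster_size - 1) * icc
    return ceil(n_ind * deff)

# Trial planned with a guessed ICC of 0.01; the internal pilot re-estimates
# it as 0.05, and the sample size is re-calculated mid-trial.
planned = subjects_per_arm(delta=0.5, sd=1.0, icc=0.01, cluster_size=20)
revised = subjects_per_arm(delta=0.5, sd=1.0, icc=0.05, cluster_size=20)
print(planned, revised)   # 75 123
```

Even a modest upward revision of the ICC inflates the required sample size substantially, which is why re-estimating it from an internal pilot matters.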

  15. Improved optical mass tracer for galaxy clusters calibrated using weak lensing measurements

    NASA Astrophysics Data System (ADS)

    Reyes, R.; Mandelbaum, R.; Hirata, C.; Bahcall, N.; Seljak, U.

    2008-11-01

    We develop an improved mass tracer for clusters of galaxies from optically observed parameters, and calibrate the mass relation using weak gravitational lensing measurements. We employ a sample of ~13,000 optically selected clusters from the Sloan Digital Sky Survey (SDSS) maxBCG catalogue, with photometric redshifts in the range 0.1-0.3. The optical tracers we consider are cluster richness, cluster luminosity, luminosity of the brightest cluster galaxy (BCG) and combinations of these parameters. We measure the weak lensing signal around stacked clusters as a function of the various tracers, and use it to determine the tracer with the least amount of scatter. We further use the weak lensing data to calibrate the mass normalization. We find that the best mass estimator for massive clusters is a power-law combination of the cluster richness, N200, and the ratio of the BCG luminosity, LBCG, to the observed mean BCG luminosity at that richness. This improved mass tracer will enable the use of galaxy clusters as a more powerful tool for constraining cosmological parameters.

  16. The relative impact of baryons and cluster shape on weak lensing mass estimates of galaxy clusters

    NASA Astrophysics Data System (ADS)

    Lee, B. E.; Le Brun, A. M. C.; Haq, M. E.; Deering, N. J.; King, L. J.; Applegate, D.; McCarthy, I. G.

    2018-05-01

    Weak gravitational lensing depends on the integrated mass along the line of sight. Baryons contribute to the mass distribution of galaxy clusters and hence to the mass estimates obtained from lensing analysis. We use the cosmo-OWLS suite of hydrodynamic simulations to investigate the impact of baryonic processes on the bias and scatter of weak lensing mass estimates of clusters. These estimates are obtained by fitting NFW profiles to mock data using MCMC techniques. In particular, we examine the difference in estimates between dark matter-only runs and those including various prescriptions for baryonic physics. We find no significant difference in the mass bias when baryonic physics is included, though the overall mass estimates are suppressed when feedback from AGN is included. For the lowest-mass systems for which a reliable mass can be obtained (M_200 ≈ 2 × 10^14 M⊙), we find a bias of ≈ -10 per cent. The magnitude of the bias tends to decrease for higher mass clusters, consistent with no bias for the most massive clusters, which have masses comparable to those found in the CLASH and HFF samples. For the lowest mass clusters, the mass bias is particularly sensitive to the fit radii and the limits placed on the concentration prior, rendering reliable mass estimates difficult. The scatter in mass estimates between the dark matter-only and the various baryonic runs is less than that between different projections of individual clusters, highlighting the importance of triaxiality.

  17. Finite-sample corrected generalized estimating equation of population average treatment effects in stepped wedge cluster randomized trials.

    PubMed

    Scott, JoAnna M; deCamp, Allan; Juraska, Michal; Fay, Michael P; Gilbert, Peter B

    2017-04-01

    Stepped wedge designs are increasingly commonplace and advantageous for cluster randomized trials when it is both unethical to assign placebo and logistically difficult to allocate an intervention simultaneously to many clusters. We study marginal mean models fit with generalized estimating equations for assessing treatment effectiveness in stepped wedge cluster randomized trials. This approach has advantages over the more commonly used mixed models: (1) the population-average parameters have an important interpretation for public health applications, and (2) it avoids untestable assumptions on latent variable distributions and parametric assumptions about error distributions, thereby providing more robust evidence on treatment effects. However, cluster randomized trials typically have a small number of clusters, rendering the standard generalized estimating equation sandwich variance estimator biased and highly variable and hence yielding incorrect inferences. We study the usual asymptotic generalized estimating equation inferences (i.e., using sandwich variance estimators and asymptotic normality) and four small-sample corrections to generalized estimating equations for stepped wedge cluster randomized trials, with parallel cluster randomized trials as a comparison. We show by simulation that the small-sample corrections provide improvement, with one correction appearing to provide at least nominal coverage even with only 10 clusters per group. These results demonstrate the viability of the marginal mean approach for both stepped wedge and parallel cluster randomized trials. We also study the comparative performance of the corrected methods for stepped wedge and parallel designs, and describe how the methods can accommodate interval censoring of individual failure times and incorporate semiparametric efficient estimators.
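
The cluster-robust sandwich variance at issue, and the kind of simple finite-sample inflation that such corrections refine, can be sketched for an OLS analogue of the marginal mean model. The simulated trial, the working-independence fit, and the generic G/(G-1) factor are illustrative assumptions, not one of the paper's four corrections:

```python
import numpy as np

rng = np.random.default_rng(0)
G, m = 10, 25                                   # few clusters, many subjects each
cluster_effect = rng.normal(0, 0.5, G)
x = rng.normal(0, 1, (G, m))
y = 1.0 + 0.5 * x + cluster_effect[:, None] + rng.normal(0, 1.0, (G, m))

X = np.column_stack([np.ones(G * m), x.ravel()])
Y = y.ravel()
cl = np.repeat(np.arange(G), m)

beta = np.linalg.solve(X.T @ X, X.T @ Y)        # working-independence estimate
resid = Y - X @ beta
bread = np.linalg.inv(X.T @ X)

# "Meat": sum over clusters of outer products of per-cluster scores.
meat = np.zeros((2, 2))
for g in range(G):
    s = X[cl == g].T @ resid[cl == g]
    meat += np.outer(s, s)

V = bread @ meat @ bread                        # uncorrected sandwich variance
V_corr = V * G / (G - 1)                        # simple finite-sample inflation
print("SE(slope):", np.sqrt(V[1, 1]), "corrected:", np.sqrt(V_corr[1, 1]))
```

With only 10 clusters, the uncorrected sandwich estimator tends to understate the variance; the corrections studied in the paper adjust the per-cluster residuals rather than applying a single global factor.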

  18. Cluster mass estimators from CMB temperature and polarization lensing

    NASA Astrophysics Data System (ADS)

    Hu, Wayne; DeDeo, Simon; Vale, Chris

    2007-12-01

    Upcoming Sunyaev-Zel'dovich surveys are expected to return ~10^4 intermediate-mass clusters at high redshift. Their average masses must be known to the same accuracy as desired for the dark energy properties. Internal to the surveys, the cosmic microwave background (CMB) potentially provides a source for lensing mass measurements whose distance is precisely known and which lies behind all clusters. We develop statistical mass estimators from six quadratic combinations of CMB temperature and polarization fields that can simultaneously recover large-scale structure and cluster mass profiles. The performance of these estimators on idealized Navarro-Frenk-White (NFW) clusters suggests that surveys with a ~1' beam and 10 μK-arcmin noise in uncontaminated temperature maps can make a ~10σ detection, or equivalently a ~10% mass measurement, for each set of ~10^3 clusters. With internal or external acoustic-scale E-polarization measurements, the ET cross-correlation estimator can provide a stringent test for contaminants on a first detection at ~1/3 the significance. For surveys that reach below 3 μK-arcmin, the EB cross-correlation estimator should provide the most precise measurements and potentially the strongest control over contaminants.

  19. Galaxy Cluster Mass Reconstruction Project – III. The impact of dynamical substructure on cluster mass estimates

    DOE PAGES

    Old, L.; Wojtak, R.; Pearce, F. R.; ...

    2017-12-20

    With the advent of wide-field cosmological surveys, we are approaching samples of hundreds of thousands of galaxy clusters. While such large numbers will help reduce statistical uncertainties, the control of systematics in cluster masses is crucial. Here we examine the effects of an important source of systematic uncertainty in galaxy-based cluster mass estimation techniques: the presence of significant dynamical substructure. Dynamical substructure manifests as dynamically distinct subgroups in phase-space, indicating an 'unrelaxed' state. This issue affects around a quarter of clusters in a generally selected sample. We employ a set of mock clusters whose masses have been measured homogeneously with commonly used galaxy-based mass estimation techniques (kinematic, richness, caustic, radial methods). We use these to study how the relation between observationally estimated and true cluster mass depends on the presence of substructure, as identified by various popular diagnostics. We find that the scatter for an ensemble of clusters does not increase dramatically for clusters with dynamical substructure. However, we find a systematic bias for all methods, such that clusters with significant substructure have higher measured masses than their relaxed counterparts. This bias depends on cluster mass: the most massive clusters are largely unaffected by the presence of significant substructure, but masses are significantly overestimated for lower mass clusters, by ~10 per cent at 10^14 M⊙ and by ≳20 per cent for masses ≲10^13.5 M⊙. The use of cluster samples with different levels of substructure can therefore bias certain cosmological parameters up to a level comparable to the typical uncertainties in current cosmological studies.

  20. Galaxy Cluster Mass Reconstruction Project – III. The impact of dynamical substructure on cluster mass estimates

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Old, L.; Wojtak, R.; Pearce, F. R.

    With the advent of wide-field cosmological surveys, we are approaching samples of hundreds of thousands of galaxy clusters. While such large numbers will help reduce statistical uncertainties, the control of systematics in cluster masses is crucial. Here we examine the effects of an important source of systematic uncertainty in galaxy-based cluster mass estimation techniques: the presence of significant dynamical substructure. Dynamical substructure manifests as dynamically distinct subgroups in phase-space, indicating an 'unrelaxed' state. This issue affects around a quarter of clusters in a generally selected sample. We employ a set of mock clusters whose masses have been measured homogeneously with commonly used galaxy-based mass estimation techniques (kinematic, richness, caustic, radial methods). We use these to study how the relation between observationally estimated and true cluster mass depends on the presence of substructure, as identified by various popular diagnostics. We find that the scatter for an ensemble of clusters does not increase dramatically for clusters with dynamical substructure. However, we find a systematic bias for all methods, such that clusters with significant substructure have higher measured masses than their relaxed counterparts. This bias depends on cluster mass: the most massive clusters are largely unaffected by the presence of significant substructure, but masses are significantly overestimated for lower mass clusters, by ~10 per cent at 10^14 M⊙ and by ≳20 per cent for masses ≲10^13.5 M⊙. The use of cluster samples with different levels of substructure can therefore bias certain cosmological parameters up to a level comparable to the typical uncertainties in current cosmological studies.

  1. Properties of star clusters - I. Automatic distance and extinction estimates

    NASA Astrophysics Data System (ADS)

    Buckner, Anne S. M.; Froebrich, Dirk

    2013-12-01

    Determining star cluster distances is essential to analyse their properties and distribution in the Galaxy. In particular, it is desirable to have a reliable, purely photometric distance estimation method for large samples of newly discovered cluster candidates, e.g. from the Two Micron All Sky Survey, the UK Infrared Deep Sky Survey Galactic Plane Survey and VVV. Here, we establish an automatic method to estimate distances and reddening from near-infrared photometry alone, without the use of isochrone fitting. We employ a decontamination procedure of JHK photometry to determine the density of stars foreground to clusters and a Galactic model to estimate distances. We then calibrate the method using clusters with known properties. This allows us to establish distance estimates with better than 40 per cent accuracy. We apply our method to determine the extinction and distance values for 378 known open clusters and 397 cluster candidates from the list of Froebrich, Scholz & Raftery. We find that the sample is biased towards clusters at a distance of approximately 3 kpc, with typical distances between 2 and 6 kpc. Using the cluster distances and extinction values, we investigate how the average extinction per kiloparsec of distance changes as a function of Galactic longitude. We find a systematic dependence that can be approximated by A_H(l) [mag kpc^-1] = 0.10 + 0.001 × |l - 180°|/° for regions more than 60° from the Galactic Centre.

  2. Improvements in Ionized Cluster-Beam Deposition

    NASA Technical Reports Server (NTRS)

    Fitzgerald, D. J.; Compton, L. E.; Pawlik, E. V.

    1986-01-01

    Lower temperatures result in higher purity and fewer equipment problems. In cluster-beam deposition, clusters of atoms are formed by an adiabatic expansion nozzle: with proper nozzle design, the expanding vapor cools sufficiently to become supersaturated and forms clusters of the material to be deposited. The clusters are ionized, accelerated in an electric field, and then impacted on a substrate, where films form. The improved cluster-beam technique is useful for deposition of refractory metals.

  3. Improved infrared precipitation estimation approaches based on k-means clustering: Application to north Algeria using MSG-SEVIRI satellite data

    NASA Astrophysics Data System (ADS)

    Mokdad, Fatiha; Haddad, Boualem

    2017-06-01

    In this paper, two new infrared precipitation estimation approaches based on the concept of k-means clustering are first proposed, named the NAW-Kmeans and the GPI-Kmeans methods. Then, they are adapted to the southern Mediterranean basin, where the subtropical climate prevails. The infrared data (10.8 μm channel) acquired by the MSG-SEVIRI sensor in winter and spring 2012 are used. Tests are carried out in eight areas distributed over northern Algeria: Sebra, El Bordj, Chlef, Blida, Bordj Menael, Sidi Aich, Beni Ourthilane, and Beni Aziz. The validation is performed by comparing the estimated rainfall to rain gauge observations collected by the National Office of Meteorology in Dar El Beida (Algeria). Despite the complexity of the subtropical climate, the obtained results indicate that the NAW-Kmeans and the GPI-Kmeans approaches give satisfactory results for the considered rain rates. Also, the proposed schemes lead to improved precipitation estimation performance when compared to the original algorithms NAW (Negri, Adler, and Wetzel) and GPI (GOES Precipitation Index).
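The clustering step these schemes build on can be sketched with a minimal one-dimensional k-means (Lloyd's algorithm) over infrared brightness temperatures. This is an illustrative sketch under assumed data, not the NAW-Kmeans/GPI-Kmeans implementation; the quantile initialization is a choice made here for determinism:

```python
import numpy as np

def kmeans_1d(x, k, n_iter=50):
    """Minimal Lloyd's k-means for 1-D data (e.g. 10.8 um brightness temperatures)."""
    x = np.asarray(x, dtype=float)
    centers = np.quantile(x, np.linspace(0, 1, k))  # deterministic, spread-out init
    for _ in range(n_iter):
        # assign each pixel to its nearest center, then move centers to cluster means
        assign = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        centers = np.array([x[assign == j].mean() if np.any(assign == j)
                            else centers[j] for j in range(k)])
    return centers, assign
```

Cold cloud-top classes found this way can then be mapped to rain rates, which is where NAW- and GPI-style calibrations would enter.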

  4. The impact of baryons on massive galaxy clusters: halo structure and cluster mass estimates

    NASA Astrophysics Data System (ADS)

    Henson, Monique A.; Barnes, David J.; Kay, Scott T.; McCarthy, Ian G.; Schaye, Joop

    2017-03-01

    We use the BAHAMAS (BAryons and HAloes of MAssive Systems) and MACSIS (MAssive ClusterS and Intercluster Structures) hydrodynamic simulations to quantify the impact of baryons on the mass distribution and dynamics of massive galaxy clusters, as well as the bias in X-ray and weak lensing mass estimates. These simulations use the subgrid physics models calibrated in the BAHAMAS project, which include feedback from both supernovae and active galactic nuclei. They form a cluster population covering almost two orders of magnitude in mass, with more than 3500 clusters with masses greater than 10^14 M⊙ at z = 0. We start by characterizing the clusters in terms of their spin, shape and density profile, before considering the bias in both weak lensing and hydrostatic mass estimates. Whilst including baryonic effects leads to more spherical, centrally concentrated clusters, the median weak lensing mass bias is unaffected by the presence of baryons. In both the dark matter only and hydrodynamic simulations, the weak lensing measurements underestimate cluster masses by ≈10 per cent for clusters with M_200 ≤ 10^15 M⊙ and this bias tends to zero at higher masses. We also consider the hydrostatic bias when using both the true density and temperature profiles, and those derived from X-ray spectroscopy. When using spectroscopic temperatures and densities, the hydrostatic bias decreases as a function of mass, leading to a bias of ≈40 per cent for clusters with M_500 ≥ 10^15 M⊙. This is due to the presence of cooler gas in the cluster outskirts. Using mass-weighted temperatures and the true density profile reduces this bias to 5-15 per cent.

  5. Slope angle estimation method based on sparse subspace clustering for probe safe landing

    NASA Astrophysics Data System (ADS)

    Li, Haibo; Cao, Yunfeng; Ding, Meng; Zhuang, Likui

    2018-06-01

    To avoid planetary probes landing on steep slopes where they may slip or tip over, a new method of slope angle estimation based on sparse subspace clustering is proposed to improve accuracy. First, a coordinate system is defined and established to describe the measured data of light detection and ranging (LIDAR). Second, these data are processed and expressed with a sparse representation. Third, on this basis, the data are clustered to determine which subspace each point belongs to. Fourth, after eliminating outliers in each subspace, the remaining data points are used to fit planes. Finally, the vectors normal to the planes are obtained from the plane model, and the angle between the normal vectors is obtained through calculation. Based on the geometric relationship, this angle is equal in value to the slope angle. The proposed method was tested in a series of experiments. The experimental results show that this method can effectively estimate the slope angle and is robust to noise. Compared with other methods, this method minimizes the measurement errors and further improves the estimation accuracy of the slope angle.
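The final two steps (fit planes, then take the angle between their normals) can be sketched as follows; the SVD plane fit and the synthetic point sets are illustrative assumptions, not the paper's sparse-subspace pipeline:

```python
import numpy as np

def plane_normal(points):
    """Least-squares plane fit: the right singular vector with the smallest
    singular value of the centered points is the plane's unit normal."""
    centered = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centered)
    return vt[-1]

def slope_angle_deg(points_a, points_b):
    """Angle between the fitted planes of two 3-D point sets, in degrees."""
    na, nb = plane_normal(points_a), plane_normal(points_b)
    cos_angle = np.clip(abs(np.dot(na, nb)), 0.0, 1.0)  # a normal's sign is arbitrary
    return float(np.degrees(np.arccos(cos_angle)))
```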

  6. Relative efficiency of unequal versus equal cluster sizes in cluster randomized trials using generalized estimating equation models.

    PubMed

    Liu, Jingxia; Colditz, Graham A

    2018-05-01

    There is growing interest in conducting cluster randomized trials (CRTs). For simplicity in sample size calculation, the cluster sizes are assumed to be identical across all clusters. However, equal cluster sizes are not guaranteed in practice. Therefore, the relative efficiency (RE) of unequal versus equal cluster sizes has been investigated when testing the treatment effect. One of the most important approaches to analyzing a set of correlated data is the generalized estimating equation (GEE) approach proposed by Liang and Zeger, in which a "working correlation structure" is introduced and the association pattern depends on a vector of association parameters denoted by ρ. In this paper, we utilize GEE models to test the treatment effect in a two-group comparison for continuous, binary, or count data in CRTs. The variances of the estimator of the treatment effect are derived for the different types of outcome. RE is defined as the ratio of the variance of the estimator of the treatment effect under equal cluster sizes to that under unequal cluster sizes. We discuss a working correlation structure commonly used in CRTs, the exchangeable structure, and derive simpler formulae for RE with continuous, binary, and count outcomes. Finally, REs are investigated for several scenarios of cluster size distributions through simulation studies. We propose an adjusted sample size to account for the efficiency loss. Additionally, we propose an optimal sample size estimation based on the GEE models under a fixed budget for known and unknown association parameter (ρ) in the working correlation structure within the cluster. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
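For a cluster-level treatment effect under an exchangeable working correlation, a cluster of size m contributes information proportional to m / (1 + (m − 1)ρ), a standard GEE result. The sketch below uses that to compute RE as defined above, holding the number of clusters and total sample size fixed; it is an illustrative simplification, not the paper's outcome-specific derivations:

```python
import numpy as np

def eff_weight(m, rho):
    """Information from a cluster of size m under exchangeable correlation rho."""
    return m / (1.0 + (m - 1.0) * rho)

def relative_efficiency(sizes, rho):
    """RE = Var(equal cluster sizes) / Var(unequal cluster sizes),
    with the same number of clusters and the same total sample size."""
    sizes = np.asarray(sizes, dtype=float)
    var_unequal = 1.0 / eff_weight(sizes, rho).sum()
    var_equal = 1.0 / (len(sizes) * eff_weight(sizes.mean(), rho))
    return var_equal / var_unequal
```

RE equals 1 for equal sizes (or ρ = 0) and drops below 1 as sizes become more variable, which is the efficiency loss the adjusted sample size compensates for.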

  7. Accounting for One-Group Clustering in Effect-Size Estimation

    ERIC Educational Resources Information Center

    Citkowicz, Martyna; Hedges, Larry V.

    2013-01-01

    In some instances, intentionally or not, study designs are such that there is clustering in one group but not in the other. This paper describes methods for computing effect size estimates and their variances when there is clustering in only one group and the analysis has not taken that clustering into account. The authors provide the effect size…

  8. Estimating the intra-cluster correlation coefficient for evaluating an educational intervention program to improve rabies awareness and dog bite prevention among children in Sikkim, India: A pilot study.

    PubMed

    Auplish, Aashima; Clarke, Alison S; Van Zanten, Trent; Abel, Kate; Tham, Charmaine; Bhutia, Thinlay N; Wilks, Colin R; Stevenson, Mark A; Firestone, Simon M

    2017-05-01

    Educational initiatives targeting at-risk populations have long been recognized as a mainstay of ongoing rabies control efforts. Cluster-based studies are often utilized to assess levels of knowledge, attitudes and practices of a population in response to education campaigns. The design of cluster-based studies requires estimates of intra-cluster correlation coefficients obtained from previous studies. This study estimates the school-level intra-cluster correlation coefficient (ICC) for rabies knowledge change following an educational intervention program. A cross-sectional survey was conducted with 226 students from 7 schools in Sikkim, India, using cluster sampling. In order to assess knowledge uptake, rabies education sessions with pre- and post-session questionnaires were administered. Paired differences of proportions were estimated for questions answered correctly. A mixed effects logistic regression model was developed to estimate school-level and student-level ICCs and to test for associations between gender, age, school location and educational level. The school- and student-level ICCs for rabies knowledge and awareness were 0.04 (95% CI: 0.01, 0.19) and 0.05 (95% CI: 0.02, 0.09), respectively. These ICCs suggest design effect multipliers of 5.45 at the school level and 1.05 at the student level will be required when estimating sample sizes and designing future cluster randomized trials. There was a good baseline level of rabies knowledge (mean pre-session score 71%); however, key knowledge gaps were identified in understanding appropriate behavior around scared dogs, potential sources of rabies, and how to correctly order post-rabies-exposure precaution steps. After adjusting for the effect of gender, age, school location and education level, school and individual post-session test scores improved by 19%, with similar performance amongst boys and girls attending schools in urban and rural regions.
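The way an ICC feeds into sample-size planning can be sketched with the standard design-effect formula DE = 1 + (m − 1) × ICC; the helper functions and the numbers below are illustrative, not taken from the study:

```python
import math

def design_effect(mean_cluster_size, icc):
    """Variance inflation of a cluster sample relative to simple random sampling."""
    return 1.0 + (mean_cluster_size - 1.0) * icc

def clusters_needed(n_srs, mean_cluster_size, icc):
    """Clusters required so the DE-inflated sample still meets an SRS target size."""
    n_clustered = n_srs * design_effect(mean_cluster_size, icc)
    return math.ceil(n_clustered / mean_cluster_size)
```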

  9. Improving clustering with metabolic pathway data.

    PubMed

    Milone, Diego H; Stegmayer, Georgina; López, Mariana; Kamenetzky, Laura; Carrari, Fernando

    2014-04-10

    It is a common practice in bioinformatics to validate each group returned by a clustering algorithm through manual analysis, according to a-priori biological knowledge. This procedure helps finding functionally related patterns to propose hypotheses for their behavior and the biological processes involved. Therefore, this knowledge is used only as a second step, after data are just clustered according to their expression patterns. Thus, it could be very useful to be able to improve the clustering of biological data by incorporating prior knowledge into the cluster formation itself, in order to enhance the biological value of the clusters. A novel training algorithm for clustering is presented, which evaluates the biological internal connections of the data points while the clusters are being formed. Within this training algorithm, the calculation of distances among data points and neurons centroids includes a new term based on information from well-known metabolic pathways. The standard self-organizing map (SOM) training versus the biologically-inspired SOM (bSOM) training were tested with two real data sets of transcripts and metabolites from Solanum lycopersicum and Arabidopsis thaliana species. Classical data mining validation measures were used to evaluate the clustering solutions obtained by both algorithms. Moreover, a new measure that takes into account the biological connectivity of the clusters was applied. The results of bSOM show important improvements in the convergence and performance for the proposed clustering method in comparison to standard SOM training, in particular, from the application point of view. Analyses of the clusters obtained with bSOM indicate that including biological information during training can certainly increase the biological value of the clusters found with the proposed method. 
It is worth highlighting that this improvement can also simplify further analysis of the clusters.

  10. Cluster Stability Estimation Based on a Minimal Spanning Trees Approach

    NASA Astrophysics Data System (ADS)

    Volkovich, Zeev (Vladimir); Barzily, Zeev; Weber, Gerhard-Wilhelm; Toledano-Kitai, Dvora

    2009-08-01

    Among the areas of data and text mining which are employed today in science, economy and technology, clustering theory serves as a preprocessing step in data analysis. However, many open questions still await theoretical and practical treatment; e.g., the problem of determining the true number of clusters has not been satisfactorily solved. In the current paper, this problem is addressed by the cluster stability approach. For several possible numbers of clusters we estimate the stability of partitions obtained from clustering of samples. Partitions are considered consistent if their clusters are stable. Cluster validity is measured as the total number of edges, in the clusters' minimal spanning trees, connecting points from different samples. Specifically, we use the Friedman-Rafsky two-sample test statistic. The homogeneity hypothesis, of well-mingled samples within the clusters, leads to an asymptotic normal distribution of the considered statistic. Resting upon this fact, the standard score of the mentioned edge quantity is set, and the partition quality is represented by the worst cluster, corresponding to the minimal standard score value. It is natural to expect that the true number of clusters can be characterized by the empirical distribution having the shortest left tail. The proposed methodology sequentially creates the described value distribution and estimates its left asymmetry. Numerical experiments, presented in the paper, demonstrate the ability of the approach to detect the true number of clusters.
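The core statistic, the number of MST edges connecting points from different samples, can be sketched with SciPy; the two-blob test data and the function name are illustrative assumptions:

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def cross_sample_edges(points, labels):
    """Friedman-Rafsky style count: MST edges whose endpoints come from
    different samples. Well-mingled samples give many cross edges."""
    labels = np.asarray(labels)
    mst = minimum_spanning_tree(squareform(pdist(points))).tocoo()
    return int(np.sum(labels[mst.row] != labels[mst.col]))
```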

  11. Estimation of rank correlation for clustered data.

    PubMed

    Rosner, Bernard; Glynn, Robert J

    2017-06-30

    It is well known that the sample correlation coefficient (R_xy) is the maximum likelihood estimator of the Pearson correlation (ρ_xy) for independent and identically distributed (i.i.d.) bivariate normal data. However, this is not true for ophthalmologic data where X (e.g., visual acuity) and Y (e.g., visual field) are available for each eye and there is positive intraclass correlation for both X and Y in fellow eyes. In this paper, we provide a regression-based approach for obtaining the maximum likelihood estimator of ρ_xy for clustered data, which can be implemented using standard mixed effects model software. This method is also extended to allow for estimation of partial correlation by controlling both X and Y for a vector U of other covariates. In addition, these methods can be extended to allow for estimation of rank correlation for clustered data by (i) converting ranks of both X and Y to the probit scale, (ii) estimating the Pearson correlation between probit scores for X and Y, and (iii) using the relationship between Pearson and rank correlation for bivariate normally distributed data. The validity of the methods in finite-sized samples is supported by simulation studies. Finally, two examples from ophthalmology and analgesic abuse are used to illustrate the methods. Copyright © 2017 John Wiley & Sons, Ltd.
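Steps (i)-(iii) can be sketched for the unclustered case; plain Pearson correlation stands in for the mixed-effects MLE, so the clustering adjustment that is the paper's actual contribution is omitted:

```python
import numpy as np
from scipy.stats import norm, rankdata

def rank_corr_via_probit(x, y):
    """(i) ranks -> probit scale, (ii) Pearson correlation of probit scores,
    (iii) map Pearson to rank correlation via the bivariate-normal relation
    r_s = (6 / pi) * arcsin(rho / 2)."""
    n = len(x)
    zx = norm.ppf(rankdata(x) / (n + 1.0))
    zy = norm.ppf(rankdata(y) / (n + 1.0))
    rho = np.corrcoef(zx, zy)[0, 1]
    return (6.0 / np.pi) * np.arcsin(rho / 2.0)
```

Because only ranks enter, the estimate is invariant to monotone transformations of either variable.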

  12. Spatially explicit population estimates for black bears based on cluster sampling

    USGS Publications Warehouse

    Humm, J.; McCown, J. Walter; Scheick, B.K.; Clark, Joseph D.

    2017-01-01

    We estimated abundance and density of the 5 major black bear (Ursus americanus) subpopulations (i.e., Eglin, Apalachicola, Osceola, Ocala-St. Johns, Big Cypress) in Florida, USA with spatially explicit capture-mark-recapture (SCR) by extracting DNA from hair samples collected at barbed-wire hair sampling sites. We employed a clustered sampling configuration, with sampling sites arranged in 3 × 3 clusters, sites spaced 2 km apart within each cluster, and cluster centers spaced 16 km apart (center to center). We surveyed all 5 subpopulations, encompassing 38,960 km2, during 2014 and 2015. Several landscape variables, most associated with forest cover, helped refine density estimates for the 5 subpopulations we sampled. Detection probabilities were affected by site-specific behavioral responses coupled with individual capture heterogeneity associated with sex. Model-averaged bear population estimates ranged from 120 (95% CI = 59–276) bears or a mean 0.025 bears/km2 (95% CI = 0.011–0.44) for the Eglin subpopulation to 1,198 bears (95% CI = 949–1,537) or 0.127 bears/km2 (95% CI = 0.101–0.163) for the Ocala-St. Johns subpopulation. The total population estimate for our 5 study areas was 3,916 bears (95% CI = 2,914–5,451). The clustered sampling method coupled with information on land cover was efficient and allowed us to estimate abundance across extensive areas that would not have been possible otherwise. Clustered sampling combined with spatially explicit capture-recapture methods has the potential to provide rigorous population estimates for a wide array of species that are extensive and heterogeneous in their distribution.

  13. Evaluation of sliding baseline methods for spatial estimation for cluster detection in the biosurveillance system

    PubMed Central

    Xing, Jian; Burkom, Howard; Moniz, Linda; Edgerton, James; Leuze, Michael; Tokars, Jerome

    2009-01-01

    Background The Centers for Disease Control and Prevention's (CDC's) BioSense system provides near-real time situational awareness for public health monitoring through analysis of electronic health data. Determination of anomalous spatial and temporal disease clusters is a crucial part of the daily disease monitoring task. Our study focused on finding useful anomalies at manageable alert rates according to available BioSense data history. Methods The study dataset included more than 3 years of daily counts of military outpatient clinic visits for respiratory and rash syndrome groupings. We applied four spatial estimation methods in implementations of space-time scan statistics cross-checked in Matlab and C. We compared the utility of these methods according to the resultant background cluster rate (a false alarm surrogate) and sensitivity to injected cluster signals. The comparison runs used a spatial resolution based on the facility zip code in the patient record and a finer resolution based on the residence zip code. Results Simple estimation methods that account for day-of-week (DOW) data patterns yielded a clear advantage both in background cluster rate and in signal sensitivity. A 28-day baseline gave the most robust results for this estimation; the preferred baseline is long enough to remove daily fluctuations but short enough to reflect recent disease trends and data representation. Background cluster rates were lower for the rash syndrome counts than for the respiratory counts, likely because of seasonality and the large scale of the respiratory counts. Conclusion The spatial estimation method should be chosen according to characteristics of the selected data streams. In this dataset with strong day-of-week effects, the overall best detection performance was achieved using subregion averages over a 28-day baseline stratified by weekday or weekend/holiday behavior. 
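The best-performing estimator described above, a 28-day sliding baseline stratified by weekday versus weekend/holiday behavior, can be sketched as follows (array names and the binary stratum flag are illustrative assumptions):

```python
import numpy as np

def sliding_baseline_expected(counts, is_weekend, t, baseline=28):
    """Expected count for day t: the mean over the previous `baseline` days
    that fall in the same stratum (weekday vs weekend/holiday) as day t."""
    window = range(max(0, t - baseline), t)
    same_stratum = [counts[s] for s in window if is_weekend[s] == is_weekend[t]]
    return float(np.mean(same_stratum))
```

The observed count for day t would then be compared against this expectation (e.g. within the spatial scan) to flag anomalous clusters.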

  14. Estimation of Rank Correlation for Clustered Data

    PubMed Central

    Rosner, Bernard; Glynn, Robert

    2017-01-01

    It is well known that the sample correlation coefficient (Rxy) is the maximum likelihood estimator (MLE) of the Pearson correlation (ρxy) for i.i.d. bivariate normal data. However, this is not true for ophthalmologic data where X (e.g., visual acuity) and Y (e.g., visual field) are available for each eye and there is positive intraclass correlation for both X and Y in fellow eyes. In this paper, we provide a regression-based approach for obtaining the MLE of ρxy for clustered data, which can be implemented using standard mixed effects model software. This method is also extended to allow for estimation of partial correlation by controlling both X and Y for a vector U of other covariates. In addition, these methods can be extended to allow for estimation of rank correlation for clustered data by (a) converting ranks of both X and Y to the probit scale, (b) estimating the Pearson correlation between probit scores for X and Y, and (c) using the relationship between Pearson and rank correlation for bivariate normally distributed data. The validity of the methods in finite-sized samples is supported by simulation studies. Finally, two examples from ophthalmology and analgesic abuse are used to illustrate the methods. PMID:28399615

  15. A hierarchical clustering methodology for the estimation of toxicity.

    PubMed

    Martin, Todd M; Harten, Paul; Venkatapathy, Raghuraman; Das, Shashikala; Young, Douglas M

    2008-01-01

    A quantitative structure-activity relationship (QSAR) methodology based on hierarchical clustering was developed to predict toxicological endpoints. This methodology utilizes Ward's method to divide a training set into a series of structurally similar clusters. The structural similarity is defined in terms of 2-D physicochemical descriptors (such as connectivity and E-state indices). A genetic algorithm-based technique is used to generate statistically valid QSAR models for each cluster (using the pool of descriptors described above). The toxicity for a given query compound is estimated using the weighted average of the predictions from the closest cluster from each step in the hierarchical clustering assuming that the compound is within the domain of applicability of the cluster. The hierarchical clustering methodology was tested using a Tetrahymena pyriformis acute toxicity data set containing 644 chemicals in the training set and with two prediction sets containing 339 and 110 chemicals. The results from the hierarchical clustering methodology were compared to the results from several different QSAR methodologies.
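The first stage, splitting the training set into structurally similar clusters with Ward's method, can be sketched with SciPy; random descriptors stand in for the 2-D physicochemical descriptors, and the per-cluster genetic-algorithm QSAR fitting is omitted:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def ward_clusters(descriptors, n_clusters):
    """Agglomerate with Ward's minimum-variance criterion, then cut the
    dendrogram into a fixed number of flat clusters (labels start at 1)."""
    tree = linkage(descriptors, method="ward")
    return fcluster(tree, t=n_clusters, criterion="maxclust")
```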

  16. High-Resolution Spatial Distribution and Estimation of Access to Improved Sanitation in Kenya.

    PubMed

    Jia, Peng; Anderson, John D; Leitner, Michael; Rheingans, Richard

    2016-01-01

    Access to sanitation facilities is imperative in reducing the risk of multiple adverse health outcomes. A distinct disparity in sanitation exists among different wealth levels in many low-income countries, which may hinder the progress across each of the Millennium Development Goals. The surveyed households in 397 clusters from 2008-2009 Kenya Demographic and Health Surveys were divided into five wealth quintiles based on their national asset scores. A series of spatial analysis methods including excess risk, local spatial autocorrelation, and spatial interpolation were applied to observe disparities in coverage of improved sanitation among different wealth categories. The total number of the population with improved sanitation was estimated by interpolating, time-adjusting, and multiplying the surveyed coverage rates by high-resolution population grids. A comparison was then made with the annual estimates from United Nations Population Division and World Health Organization /United Nations Children's Fund Joint Monitoring Program for Water Supply and Sanitation. The Empirical Bayesian Kriging interpolation produced minimal root mean squared error for all clusters and five quintiles while predicting the raw and spatial coverage rates of improved sanitation. The coverage in southern regions was generally higher than in the north and east, and the coverage in the south decreased from Nairobi in all directions, while Nyanza and North Eastern Province had relatively poor coverage. The general clustering trend of high and low sanitation improvement among surveyed clusters was confirmed after spatial smoothing. There exists an apparent disparity in sanitation among different wealth categories across Kenya and spatially smoothed coverage rates resulted in a closer estimation of the available statistics than raw coverage rates. 
Future intervention activities need to be tailored for different wealth categories and targeted to the areas of greatest need.

  17. Bootstrap-based methods for estimating standard errors in Cox's regression analyses of clustered event times.

    PubMed

    Xiao, Yongling; Abrahamowicz, Michal

    2010-03-30

    We propose two bootstrap-based methods to correct the standard errors (SEs) from Cox's model for within-cluster correlation of right-censored event times. The cluster bootstrap method resamples, with replacement, only the clusters, whereas the two-step bootstrap method resamples (i) the clusters and (ii) individuals within each selected cluster, with replacement. In simulations, we evaluate both methods and compare them with the existing robust variance estimator and the shared gamma frailty model, which are available in statistical software packages. We simulate clustered event time data, with latent cluster-level random effects, which are ignored in the conventional Cox model. For cluster-level covariates, both proposed bootstrap methods yield accurate SEs, correct type I error rates, and acceptable coverage rates, regardless of the true random effects distribution, and avoid the serious variance under-estimation of conventional Cox-based standard errors. However, the two-step bootstrap method over-estimates the variance for individual-level covariates. We also apply the proposed bootstrap methods to obtain confidence bands around flexible estimates of time-dependent effects in a real-life analysis of clustered event times.
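The cluster-bootstrap resampling scheme can be sketched generically; a simple mean stands in for the Cox partial-likelihood estimator so the sketch stays self-contained:

```python
import numpy as np

def cluster_bootstrap_se(clusters, estimator, n_boot=500, seed=0):
    """Resample whole clusters with replacement, re-apply the estimator to the
    pooled resample, and report the SD of the replicates as the SE."""
    rng = np.random.default_rng(seed)
    k = len(clusters)
    reps = []
    for _ in range(n_boot):
        idx = rng.integers(0, k, size=k)  # draw k cluster indices, with replacement
        reps.append(estimator(np.concatenate([clusters[i] for i in idx])))
    return float(np.std(reps, ddof=1))
```

With strongly correlated observations inside clusters, this SE is noticeably larger than the naive i.i.d. SE, which is exactly the under-estimation the paper corrects.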

  18. Estimating metallicities with isochrone fits to photometric data of open clusters

    NASA Astrophysics Data System (ADS)

    Monteiro, H.; Oliveira, A. F.; Dias, W. S.; Caetano, T. C.

    2014-10-01

    The metallicity is a critical parameter that affects the correct determination of a stellar cluster's fundamental characteristics and has important implications in Galactic and stellar evolution research. Fewer than 10% of the 2174 currently catalogued open clusters have their metallicity determined in the literature. In this work we present a method for estimating the metallicity of open clusters via non-subjective isochrone fitting using the cross-entropy global optimization algorithm applied to UBV photometric data. The free parameters distance, reddening, age, and metallicity are simultaneously determined by the fitting method. The fitting procedure uses weights for the observational data based on the estimation of membership likelihood for each star, which considers the observational magnitude limit, the density profile of stars as a function of radius from the center of the cluster, and the density of stars in multi-dimensional magnitude space. We present results of [Fe/H] for well-studied open clusters based on distinct UBV data sets. The [Fe/H] values obtained in the ten cases for which spectroscopic determinations were available in the literature agree with those determinations, indicating that our method provides a good alternative for estimating [Fe/H] via objective isochrone fitting. Our results show that the typical precision is about 0.1 dex.

  19. Return period estimates for European windstorm clusters: a multi-model perspective

    NASA Astrophysics Data System (ADS)

    Renggli, Dominik; Zimmerli, Peter

    2017-04-01

    Clusters of storms over Europe can lead to very large aggregated losses. Realistic return period estimates for such clusters are therefore of vital interest to the (re)insurance industry. Such return period estimates are usually derived from historical storm activity statistics of the last 30 to 40 years. However, climate models provide an alternative source, potentially representing thousands of simulated storm seasons. In this study, we made use of decadal hindcast data from eight different climate models in the CMIP5 archive. We used an objective tracking algorithm to identify individual windstorms in the climate model data. The algorithm also computes a (population density weighted) Storm Severity Index (SSI) for each of the identified storms (both on a continental and a more regional basis). We derived return period estimates for the cluster seasons 1990, 1999, 2013/2014 and 1884 in the following way: for each climate model, we extracted two different exceedance frequency curves. The first describes the exceedance frequency (or the return period as its inverse) of a given SSI level due to an individual storm occurrence. The second describes the exceedance frequency of the seasonally aggregated SSI level (i.e. the sum of the SSI values of all storms in a given season). Starting from appropriate return period assumptions for each individual storm of a historical cluster (e.g. Anatol, Lothar and Martin in 1999) and using the first curve, we extracted the SSI levels at the corresponding return periods. Summing these SSI values gives the seasonally aggregated SSI value. Combining this with the second (aggregated) exceedance frequency curve yields a return period estimate for the historical cluster season. Since we do this for each model separately, we obtain eight different return period estimates for each historical cluster. In this way, we obtained return period estimates of 50 to 80 years for the 1990 season and 20 to 45 years for the 1999 season.
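The curve-combination step can be sketched with synthetic exceedance curves (the exponential curve shapes and array names are illustrative assumptions, not the study's fitted curves):

```python
import numpy as np

def cluster_return_period(storm_rps, ssi_grid, per_storm_freq, seasonal_freq):
    """Map each storm's return period to an SSI on the per-storm exceedance
    curve, sum the SSIs, and read the seasonal return period off the
    aggregate curve. Frequencies are exceedances per season (1 / RP)."""
    # the per-storm curve decreases with SSI, so reverse it for np.interp
    ssi_sum = sum(np.interp(1.0 / rp, per_storm_freq[::-1], ssi_grid[::-1])
                  for rp in storm_rps)
    return 1.0 / np.interp(ssi_sum, ssi_grid, seasonal_freq)
```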

  20. Performance of small cluster surveys and the clustered LQAS design to estimate local-level vaccination coverage in Mali.

    PubMed

    Minetti, Andrea; Riera-Montes, Margarita; Nackers, Fabienne; Roederer, Thomas; Koudika, Marie Hortense; Sekkenes, Johanne; Taconet, Aurore; Fermon, Florence; Touré, Albouhary; Grais, Rebecca F; Checchi, Francesco

    2012-10-12

    Estimation of vaccination coverage at the local level is essential to identify communities that may require additional support. Cluster surveys can be used in resource-poor settings, when population figures are inaccurate. To be feasible, cluster samples need to be small, without losing robustness of results. The clustered LQAS (CLQAS) approach has been proposed as an alternative, as smaller sample sizes are required. We explored (i) the efficiency of cluster surveys of decreasing sample size through bootstrapping analysis and (ii) the performance of CLQAS under three alternative sampling plans to classify local vaccination coverage (VC), using data from a survey carried out in Mali after mass vaccination against meningococcal meningitis group A. VC estimates provided by a 10 × 15 cluster survey design were reasonably robust. We used them to classify health areas in three categories and guide mop-up activities: (i) health areas not requiring supplemental activities; (ii) health areas requiring additional vaccination; and (iii) health areas requiring further evaluation. As sample size decreased (from 10 × 15 to 10 × 3), standard errors of VC and ICC estimates became increasingly unstable. Results of CLQAS simulations were not accurate for most health areas, with an overall risk of misclassification greater than 0.25 in one health area out of three. It was greater than 0.50 in one health area out of two under two of the three sampling plans. Small-sample cluster surveys (10 × 15) are acceptably robust for classification of VC at the local level. We do not recommend the CLQAS method as currently formulated for evaluating vaccination programmes.

  1. Measuring galaxy cluster masses with CMB lensing using a Maximum Likelihood estimator: statistical and systematic error budgets for future experiments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Raghunathan, Srinivasan; Patil, Sanjaykumar; Baxter, Eric J.

    We develop a Maximum Likelihood estimator (MLE) to measure the masses of galaxy clusters through the impact of gravitational lensing on the temperature and polarization anisotropies of the cosmic microwave background (CMB). We show that, at low noise levels in temperature, this optimal estimator outperforms the standard quadratic estimator by a factor of two. For polarization, we show that the Stokes Q/U maps can be used instead of the traditional E- and B-mode maps without losing information. We test and quantify the bias in the recovered lensing mass for a comprehensive list of potential systematic errors. Using realistic simulations, we examine the cluster mass uncertainties from CMB-cluster lensing as a function of an experiment’s beam size and noise level. We predict the cluster mass uncertainties will be 3-6% for SPT-3G, AdvACT, and Simons Array experiments with 10,000 clusters and less than 1% for the CMB-S4 experiment with a sample containing 100,000 clusters. The mass constraints from CMB polarization are very sensitive to the experimental beam size and map noise level: for a factor of three reduction in either the beam size or noise level, the lensing signal-to-noise improves by roughly a factor of two.

  2. Measuring galaxy cluster masses with CMB lensing using a Maximum Likelihood estimator: statistical and systematic error budgets for future experiments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Raghunathan, Srinivasan; Patil, Sanjaykumar; Bianchini, Federico

    We develop a Maximum Likelihood estimator (MLE) to measure the masses of galaxy clusters through the impact of gravitational lensing on the temperature and polarization anisotropies of the cosmic microwave background (CMB). We show that, at low noise levels in temperature, this optimal estimator outperforms the standard quadratic estimator by a factor of two. For polarization, we show that the Stokes Q/U maps can be used instead of the traditional E- and B-mode maps without losing information. We test and quantify the bias in the recovered lensing mass for a comprehensive list of potential systematic errors. Using realistic simulations, we examine the cluster mass uncertainties from CMB-cluster lensing as a function of an experiment's beam size and noise level. We predict the cluster mass uncertainties will be 3-6% for SPT-3G, AdvACT, and Simons Array experiments with 10,000 clusters and less than 1% for the CMB-S4 experiment with a sample containing 100,000 clusters. The mass constraints from CMB polarization are very sensitive to the experimental beam size and map noise level: for a factor of three reduction in either the beam size or noise level, the lensing signal-to-noise improves by roughly a factor of two.

  3. Measuring galaxy cluster masses with CMB lensing using a Maximum Likelihood estimator: statistical and systematic error budgets for future experiments

    DOE PAGES

    Raghunathan, Srinivasan; Patil, Sanjaykumar; Baxter, Eric J.; ...

    2017-08-25

    We develop a Maximum Likelihood estimator (MLE) to measure the masses of galaxy clusters through the impact of gravitational lensing on the temperature and polarization anisotropies of the cosmic microwave background (CMB). We show that, at low noise levels in temperature, this optimal estimator outperforms the standard quadratic estimator by a factor of two. For polarization, we show that the Stokes Q/U maps can be used instead of the traditional E- and B-mode maps without losing information. We test and quantify the bias in the recovered lensing mass for a comprehensive list of potential systematic errors. Using realistic simulations, we examine the cluster mass uncertainties from CMB-cluster lensing as a function of an experiment’s beam size and noise level. We predict the cluster mass uncertainties will be 3-6% for SPT-3G, AdvACT, and Simons Array experiments with 10,000 clusters and less than 1% for the CMB-S4 experiment with a sample containing 100,000 clusters. The mass constraints from CMB polarization are very sensitive to the experimental beam size and map noise level: for a factor of three reduction in either the beam size or noise level, the lensing signal-to-noise improves by roughly a factor of two.

  4. Measuring galaxy cluster masses with CMB lensing using a Maximum Likelihood estimator: statistical and systematic error budgets for future experiments

    NASA Astrophysics Data System (ADS)

    Raghunathan, Srinivasan; Patil, Sanjaykumar; Baxter, Eric J.; Bianchini, Federico; Bleem, Lindsey E.; Crawford, Thomas M.; Holder, Gilbert P.; Manzotti, Alessandro; Reichardt, Christian L.

    2017-08-01

    We develop a Maximum Likelihood estimator (MLE) to measure the masses of galaxy clusters through the impact of gravitational lensing on the temperature and polarization anisotropies of the cosmic microwave background (CMB). We show that, at low noise levels in temperature, this optimal estimator outperforms the standard quadratic estimator by a factor of two. For polarization, we show that the Stokes Q/U maps can be used instead of the traditional E- and B-mode maps without losing information. We test and quantify the bias in the recovered lensing mass for a comprehensive list of potential systematic errors. Using realistic simulations, we examine the cluster mass uncertainties from CMB-cluster lensing as a function of an experiment's beam size and noise level. We predict the cluster mass uncertainties will be 3 - 6% for SPT-3G, AdvACT, and Simons Array experiments with 10,000 clusters and less than 1% for the CMB-S4 experiment with a sample containing 100,000 clusters. The mass constraints from CMB polarization are very sensitive to the experimental beam size and map noise level: for a factor of three reduction in either the beam size or noise level, the lensing signal-to-noise improves by roughly a factor of two.
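
    The idea behind a maximum-likelihood amplitude estimator can be sketched in toy form: for Gaussian map noise and a known lensing template, minimising -2 ln L over the amplitude (a stand-in for cluster mass) has a closed-form, matched-filter solution. The template and noise model below are made up for illustration and are not the paper's pipeline.

```python
import random

random.seed(0)
template = [0.9, 0.5, 0.25, 0.1, 0.05]     # known lensing-profile template
true_mass, noise = 2.0, 0.05
data = [true_mass * t + random.gauss(0, noise) for t in template]

# For Gaussian noise, minimising -2 ln L over the amplitude gives the
# closed form M_hat = sum(d*t) / sum(t*t) (a matched filter).
m_hat = (sum(d * t for d, t in zip(data, template))
         / sum(t * t for t in template))
```

    Lower noise narrows the likelihood around the true amplitude, which mirrors why the MLE's advantage over the quadratic estimator appears at low temperature-noise levels.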

  5. Improving local clustering based top-L link prediction methods via asymmetric link clustering information

    NASA Astrophysics Data System (ADS)

    Wu, Zhihao; Lin, Youfang; Zhao, Yiji; Yan, Hongyan

    2018-02-01

    Networks can represent a wide range of complex systems, such as social, biological and technological systems. Link prediction is one of the most important problems in network analysis and has attracted much research interest recently. Many link prediction methods have been proposed to solve this problem with various techniques, and clustering information plays an important role in many of them. In the previous literature, the node clustering coefficient appears frequently in link prediction methods. However, the node clustering coefficient is limited in describing the role of a common neighbor in different local networks, because it cannot distinguish the different clustering abilities of a node with respect to different node pairs. In this paper, we shift our focus from nodes to links and propose the concept of the asymmetric link clustering (ALC) coefficient. We then improve three node-clustering-based link prediction methods using this concept. The experimental results demonstrate that ALC-based methods outperform node-clustering-based methods, achieving especially remarkable improvements on food web, hamster friendship and Internet networks. Moreover, compared with other methods, the performance of ALC-based methods is very stable in both globalized and personalized top-L link prediction tasks.
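
    A node-clustering-based score of the kind the paper improves on (in the spirit of CCLP-style predictors) can be sketched as follows; the toy graph and function names are illustrative, and the paper's ALC coefficient itself is not reproduced here.

```python
from itertools import combinations

adj = {  # small undirected toy graph
    'a': {'b', 'c', 'd'},
    'b': {'a', 'c'},
    'c': {'a', 'b', 'd'},
    'd': {'a', 'c', 'e'},
    'e': {'d'},
}

def clustering_coeff(z):
    """Local clustering coefficient of node z: realised / possible neighbour links."""
    nbrs = adj[z]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))

def cclp_score(x, y):
    # each common neighbour contributes its local clustering coefficient
    return sum(clustering_coeff(z) for z in adj[x] & adj[y])
```

    The limitation the paper targets is visible here: `clustering_coeff(z)` is a single number per node, so z contributes identically to every candidate pair it neighbours.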

  6. An improved clustering algorithm based on reverse learning in intelligent transportation

    NASA Astrophysics Data System (ADS)

    Qiu, Guoqing; Kou, Qianqian; Niu, Ting

    2017-05-01

    With the development of artificial intelligence and data mining technology, big data has gradually entered people's field of vision. Clustering is an important method for processing large data sets. We introduce a reverse-learning method into the clustering process of the PAM clustering algorithm to address the limitations of one-shot clustering in unsupervised clustering learning and to increase the diversity of the resulting clusters, thereby improving clustering quality. Algorithm analysis and experimental results show that the algorithm is feasible.
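
    For concreteness, here is a minimal PAM-style (k-medoids) sketch on 1-D points; the paper's reverse-learning refinement of the search is not reproduced, and all names and the naive initialisation are illustrative.

```python
def pam(points, k, iters=20):
    medoids = sorted(points)[:k]              # naive initialisation
    for _ in range(iters):
        # assignment: each point goes to its nearest medoid
        clusters = {m: [] for m in medoids}
        for p in points:
            nearest = min(medoids, key=lambda m: abs(p - m))
            clusters[nearest].append(p)
        # update: the new medoid minimises total in-cluster distance
        new = [min(members, key=lambda c: sum(abs(c - q) for q in members))
               for members in clusters.values()]
        if sorted(new) == sorted(medoids):
            break
        medoids = new
    return sorted(medoids)

groups = pam([1, 2, 3, 10, 11, 12], 2)
```

    Because a single run can get stuck near its initialisation, diversifying the starting medoids (which is roughly what reverse/opposition-based learning supplies) is a natural lever on clustering quality.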

  7. Calibrating First-Order Strong Lensing Mass Estimates in Clusters of Galaxies

    NASA Astrophysics Data System (ADS)

    Reed, Brendan; Remolian, Juan; Sharon, Keren; Li, Nan; SPT Clusters Cooperation

    2018-01-01

    We investigate methods to reduce the statistical and systematic errors inherent in using the Einstein Radius as a first-order mass estimate in strong lensing galaxy clusters. By finding an empirical universal calibration function, we aim to enable a first-order mass estimate of large cluster data sets in a fraction of the time and effort of full-scale strong lensing mass modeling. We use 74 simulated clusters from Argonne National Laboratory in a lens redshift slice of [0.159, 0.667] with source redshifts in the range [1.23, 2.69]. From the simulated density maps, we calculate the exact mass enclosed within the Einstein Radius. We find that the mass inferred from the Einstein Radius alone produces an error width of ~39% with respect to the true mass. We explore an array of polynomial and exponential correction functions with dependence on cluster redshift and the projected radii of the lensed images, aiming to reduce the statistical and systematic uncertainty. We find that the error on the mass inferred from the Einstein Radius can be reduced significantly by using a universal correction function. Our study has implications for current and future large galaxy cluster surveys aiming to measure cluster mass and the mass-concentration relation.
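
    The first-order estimate being calibrated is, up to convention, M(<theta_E) = pi (D_l theta_E)^2 Sigma_cr with critical surface density Sigma_cr = c^2 D_s / (4 pi G D_l D_ls). A sketch follows; the angular-diameter distances are illustrative inputs, not derived from any cosmology, and this is the textbook formula rather than the authors' corrected estimator.

```python
import math

G = 6.674e-11     # m^3 kg^-1 s^-2
c = 2.998e8       # m s^-1
Mpc = 3.086e22    # m
Msun = 1.989e30   # kg

def einstein_mass(theta_e_arcsec, d_l, d_s, d_ls):
    """Angular-diameter distances in Mpc; returns M(<theta_E) in Msun."""
    theta = theta_e_arcsec * math.pi / (180 * 3600)            # radians
    sigma_cr = (c ** 2 * d_s * Mpc                             # kg m^-2
                / (4 * math.pi * G * d_l * Mpc * d_ls * Mpc))
    return math.pi * (d_l * Mpc * theta) ** 2 * sigma_cr / Msun

mass = einstein_mass(20.0, 1000.0, 1800.0, 1200.0)   # illustrative inputs
```

    A calibration function of the kind the study fits would multiply this raw estimate by a correction depending on lens redshift and image radii.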

  8. Performance of the modified Poisson regression approach for estimating relative risks from clustered prospective data.

    PubMed

    Yelland, Lisa N; Salter, Amy B; Ryan, Philip

    2011-10-15

    Modified Poisson regression, which combines a log Poisson regression model with robust variance estimation, is a useful alternative to log binomial regression for estimating relative risks. Previous studies have shown both analytically and by simulation that modified Poisson regression is appropriate for independent prospective data. This method is often applied to clustered prospective data, despite a lack of evidence to support its use in this setting. The purpose of this article is to evaluate the performance of the modified Poisson regression approach for estimating relative risks from clustered prospective data, by using generalized estimating equations to account for clustering. A simulation study is conducted to compare log binomial regression and modified Poisson regression for analyzing clustered data from intervention and observational studies. Both methods generally perform well in terms of bias, type I error, and coverage. Unlike log binomial regression, modified Poisson regression is not prone to convergence problems. The methods are contrasted by using example data sets from 2 large studies. The results presented in this article support the use of modified Poisson regression as an alternative to log binomial regression for analyzing clustered prospective data when clustering is taken into account by using generalized estimating equations.
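
    As a simple stand-in for the paper's machinery, the sketch below estimates a relative risk from clustered binary data and obtains an uncertainty by resampling whole clusters. The data are fabricated for illustration, and the paper itself uses modified Poisson regression with generalized estimating equations, not this bootstrap.

```python
import random

random.seed(1)
# each cluster: (treatment arm 0/1, list of 0/1 outcomes for its members)
clusters = [(1, [1, 1, 0, 1]), (1, [0, 1, 1, 1]), (1, [1, 0, 1, 0]),
            (0, [0, 1, 0, 0]), (0, [1, 0, 0, 1]), (0, [0, 0, 1, 0])]

def relative_risk(cl):
    r1 = sum(sum(o) for t, o in cl if t == 1) / sum(len(o) for t, o in cl if t == 1)
    r0 = sum(sum(o) for t, o in cl if t == 0) / sum(len(o) for t, o in cl if t == 0)
    return r1 / r0

rr = relative_risk(clusters)          # (8/12) / (4/12) = 2.0

# resample whole clusters, not individuals, to respect the clustering
boots = []
for _ in range(200):
    sample = [random.choice(clusters) for _ in clusters]
    try:
        boots.append(relative_risk(sample))
    except ZeroDivisionError:         # a resample missing one arm
        continue
mean_b = sum(boots) / len(boots)
se_b = (sum((b - mean_b) ** 2 for b in boots) / (len(boots) - 1)) ** 0.5
```

    Resampling at the cluster level is the bootstrap analogue of the robust, cluster-aware variance that GEE supplies in the paper's approach.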

  9. Estimating carbon cluster binding energies from measured Cn distributions, n <= 10

    NASA Astrophysics Data System (ADS)

    Pargellis, A. N.

    1990-08-01

    Experimental data are presented for the cluster distribution of sputtered negative carbon clusters, C-n, with n≤10. Additionally, clusters have been observed with masses indicating they are CsC-2n, with n≤4. The C-n data are compared with the data obtained by other groups, for neutral and charged clusters, using a variety of sources such as evaporation, sputtering, and laser ablation. The data are used to estimate the cluster binding energies En, using the universal relation, En=(n-1)ΔHn+RTe [ln(Jn/J1)+0.5 ln(n)-α-(ΔSn-ΔS1)/R], derived from basic kinetic and thermodynamic relations. The estimated values agree remarkably well with published values, differing by at most a few percent. In this equation, Jn is the observed current of n-atom clusters, ΔHn is the heat of vaporization, ΔH1=7.41 eV, and Te ≊0.25 eV (2900 K) is the effective source temperature. The relative change in cluster entropy during sublimation from the solid to vapor phase is approximated to first order by the relation (ΔSn-ΔS1)/R =3.1+0.9(n-2), and is fit to published data for n between 2 and 5 and temperatures between 2000 and 4000 K. The parameter α is empirical, obtained by fitting the data to known binding energies for Cn≤5 clusters. For evaporation sources, α must be zero, but α ≈ 7 when sputtering with Cs+ ions, indicating that the sputtered clusters appear to be in thermodynamic equilibrium, but not the atoms. Several possible mechanisms for the formation of clusters during sputtering are examined. One plausible mechanism is that atoms diffuse on the graphite surface to form clusters which are then desorbed by energetic, recoil atoms created in subsequent sputtering events.
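
    The record's relation can be transcribed directly; working in eV, R·Te becomes the thermal energy kTe = 0.25 eV. Two assumptions are flagged in the code: the per-atom ΔH1 is used in the (n-1)ΔH term (the record's ΔHn values are not tabulated), and the current ratio Jn/J1 is a made-up illustrative value, not measured data.

```python
import math

dH1 = 7.41      # eV, heat of vaporization (Delta H_1 from the record)
kTe = 0.25      # eV, effective source temperature Te
alpha = 7.0     # empirical, quoted for Cs+ sputtering

def entropy_term(n):
    # (Delta S_n - Delta S_1)/R, first-order fit quoted for 2 <= n <= 5
    return 3.1 + 0.9 * (n - 2)

def binding_energy(n, jn_over_j1):
    # E_n = (n-1)*dH + kTe*[ln(Jn/J1) + 0.5 ln(n) - alpha - (dS_n - dS_1)/R],
    # with dH approximated by dH1 (assumption, see lead-in)
    return (n - 1) * dH1 + kTe * (math.log(jn_over_j1)
                                  + 0.5 * math.log(n)
                                  - alpha - entropy_term(n))

e3 = binding_energy(3, 0.05)   # Jn/J1 = 0.05 is a hypothetical ratio
```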

  10. Performance of small cluster surveys and the clustered LQAS design to estimate local-level vaccination coverage in Mali

    PubMed Central

    2012-01-01

    Background Estimation of vaccination coverage (VC) at the local level is essential to identify communities that may require additional support. Cluster surveys can be used in resource-poor settings, when population figures are inaccurate. To be feasible, cluster samples need to be small, without losing robustness of results. The clustered LQAS (CLQAS) approach has been proposed as an alternative, as smaller sample sizes are required. Methods We explored (i) the efficiency of cluster surveys of decreasing sample size through bootstrapping analysis and (ii) the performance of CLQAS under three alternative sampling plans to classify local VC, using data from a survey carried out in Mali after mass vaccination against meningococcal meningitis group A. Results VC estimates provided by a 10 × 15 cluster survey design were reasonably robust. We used them to classify health areas in three categories and guide mop-up activities: i) health areas not requiring supplemental activities; ii) health areas requiring additional vaccination; iii) health areas requiring further evaluation. As sample size decreased (from 10 × 15 to 10 × 3), standard errors of the VC and ICC estimates became increasingly unstable. Results of CLQAS simulations were not accurate for most health areas, with an overall risk of misclassification greater than 0.25 in one health area out of three. It was greater than 0.50 in one health area out of two under two of the three sampling plans. Conclusions Small sample cluster surveys (10 × 15) are acceptably robust for classification of VC at local level. We do not recommend the CLQAS method as currently formulated for evaluating vaccination programmes. PMID:23057445

  11. SEMIPARAMETRIC EFFICIENT ESTIMATION FOR SHARED-FRAILTY MODELS WITH DOUBLY-CENSORED CLUSTERED DATA

    PubMed Central

    Wang, Jane-Ling

    2018-01-01

    In this paper, we investigate frailty models for clustered survival data that are subject to both left- and right-censoring, termed “doubly-censored data”. This model extends the current survival literature by broadening the application of frailty models from right-censored data to the more complicated situation with additional left censoring. Our approach is motivated by a recent Hepatitis B study where the sample consists of families. We adopt a likelihood approach that targets the nonparametric maximum likelihood estimators (NPMLE). A new algorithm is proposed, which not only works well for clustered data but also improves on the existing algorithm for independent, doubly-censored data, a special case in which the frailty variable is a constant equal to one. This special case is well known to be a computational challenge due to the left-censoring feature of the data. The new algorithm not only resolves this challenge but also accommodates the additional frailty variable effectively. Asymptotic properties of the NPMLE are established, along with the semiparametric efficiency of the NPMLE for the finite-dimensional parameters. The consistency of bootstrap estimators for the standard errors of the NPMLE is also discussed. We conducted simulations to illustrate the numerical performance and robustness of the proposed algorithm, which is also applied to the Hepatitis B data. PMID:29527068

  12. Improving performance through concept formation and conceptual clustering

    NASA Technical Reports Server (NTRS)

    Fisher, Douglas H.

    1992-01-01

    Research from June 1989 through October 1992 focused on concept formation, clustering, and supervised learning for the purpose of improving the efficiency of problem-solving, planning, and diagnosis. These projects resulted in two dissertations on clustering, explanation-based learning, and means-ends planning, as well as publications in conferences, workshops, journals, and several book chapters; a complete bibliography of NASA Ames-supported publications is included. The following topics are studied: clustering of explanations and problem-solving experiences; clustering and means-ends planning; and diagnosis of space shuttle and space station operating modes.

  13. Estimating global distribution of boreal, temperate, and tropical tree plant functional types using clustering techniques

    NASA Astrophysics Data System (ADS)

    Wang, Audrey; Price, David T.

    2007-03-01

    A simple integrated algorithm was developed to relate global climatology to distributions of tree plant functional types (PFT). Multivariate cluster analysis was performed to analyze the statistical homogeneity of the climate space occupied by individual tree PFTs. Forested regions identified from the satellite-based GLC2000 classification were separated into tropical, temperate, and boreal sub-PFTs for use in the Canadian Terrestrial Ecosystem Model (CTEM). Global data sets of monthly minimum temperature, growing degree days, an index of climatic moisture, and estimated PFT cover fractions were then used as variables in the cluster analysis. The statistical results for individual PFT clusters were found to be consistent with other global-scale classifications of dominant vegetation. Improving on previous quantifications of the climatic limitations on PFT distributions, the results also demonstrated overlap of PFT cluster boundaries, reflecting vegetation transitions, for example, between tropical and temperate biomes. The resulting global database should provide a better basis for simulating the interaction of climate change and terrestrial ecosystem dynamics using global vegetation models.
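
    A bare-bones k-means sketch over toy points in a two-variable "climate space" (say, minimum temperature and a moisture index) makes the clustering step concrete; the data and initial centres are invented, and this is not the paper's multivariate analysis.

```python
def kmeans(points, centers, iters=10):
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            # assign each point to its nearest centre (squared Euclidean)
            i = min(range(len(centers)),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))
            groups[i].append(p)
        # move each centre to the mean of its assigned points
        centers = [[sum(d) / len(g) for d in zip(*g)] if g else c
                   for g, c in zip(groups, centers)]
    return centers

pts = [[-30, 0.2], [-28, 0.25], [5, 0.6], [6, 0.65], [24, 0.9], [25, 0.95]]
cs = kmeans(pts, [[-30, 0.2], [5, 0.6], [25, 0.9]])
```

    The three converged centres line up with boreal, temperate, and tropical-like groupings of the toy inputs, which is the intuition behind clustering PFTs in climate space.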

  14. Open-Source Sequence Clustering Methods Improve the State Of the Art.

    PubMed

    Kopylova, Evguenia; Navas-Molina, Jose A; Mercier, Céline; Xu, Zhenjiang Zech; Mahé, Frédéric; He, Yan; Zhou, Hong-Wei; Rognes, Torbjørn; Caporaso, J Gregory; Knight, Rob

    2016-01-01

    Sequence clustering is a common early step in amplicon-based microbial community analysis, when raw sequencing reads are clustered into operational taxonomic units (OTUs) to reduce the run time of subsequent analysis steps. Here, we evaluated the performance of recently released state-of-the-art open-source clustering software products, namely, OTUCLUST, Swarm, SUMACLUST, and SortMeRNA, against current principal options (UCLUST and USEARCH) in QIIME, hierarchical clustering methods in mothur, and USEARCH's most recent clustering algorithm, UPARSE. All the latest open-source tools showed promising results, reporting up to 60% fewer spurious OTUs than UCLUST, indicating that the underlying clustering algorithm can vastly reduce the number of these derived OTUs. Furthermore, we observed that stringent quality filtering, such as is done in UPARSE, can cause a significant underestimation of species abundance and diversity, leading to incorrect biological results. Swarm, SUMACLUST, and SortMeRNA have been included in the QIIME 1.9.0 release. IMPORTANCE Massive collections of next-generation sequencing data call for fast, accurate, and easily accessible bioinformatics algorithms to perform sequence clustering. A comprehensive benchmark is presented, including open-source tools and the popular USEARCH suite. Simulated, mock, and environmental communities were used to analyze sensitivity, selectivity, species diversity (alpha and beta), and taxonomic composition. The results demonstrate that recent clustering algorithms can significantly improve accuracy and preserve estimated diversity without the application of aggressive filtering. Moreover, these tools are all open source, apply multiple levels of multithreading, and scale to the demands of modern next-generation sequencing data, which is essential for the analysis of massive multidisciplinary studies such as the Earth Microbiome Project (EMP) (J. A. Gilbert, J. K. Jansson, and R. Knight, BMC Biol 12:69, 2014, http

  15. Hierarchical clustering method for improved prostate cancer imaging in diffuse optical tomography

    NASA Astrophysics Data System (ADS)

    Kavuri, Venkaiah C.; Liu, Hanli

    2013-03-01

    We investigate the feasibility of trans-rectal near-infrared (NIR) diffuse optical tomography (DOT) for early detection of prostate cancer using a transrectal ultrasound (TRUS)-compatible imaging probe. For this purpose, we designed a TRUS-compatible, NIR-based imaging system (780 nm) in which the photodiodes were placed on the trans-rectal probe. DC signals were recorded and used to estimate the absorption coefficient. We validated the system using laboratory phantoms. For further improvement, we also developed a hierarchical clustering method (HCM) to improve the accuracy of image reconstruction with limited prior information. We demonstrated the method using computer simulations and laboratory phantom experiments.
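
    The generic single-linkage agglomerative step underlying hierarchical clustering can be sketched as follows; the paper's HCM applies such clustering to reconstructed optical properties with prior information, which is not reproduced here, and the 1-D data are illustrative.

```python
def single_linkage(points, n_clusters):
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        # find the pair of clusters with the smallest minimum distance
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)   # merge the closest pair
    return sorted(sorted(c) for c in clusters)

regions = single_linkage([0.1, 0.2, 0.15, 5.0, 5.2, 9.9], 3)
```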

  16. An improved method to detect correct protein folds using partial clustering.

    PubMed

    Zhou, Jianjun; Wishart, David S

    2013-01-16

    Structure-based clustering is commonly used to identify correct protein folds among candidate folds (also called decoys) generated by protein structure prediction programs. However, traditional clustering methods exhibit a poor runtime performance on large decoy sets. We hypothesized that a more efficient "partial" clustering approach in combination with an improved scoring scheme could significantly improve both the speed and performance of existing candidate selection methods. We propose a new scheme that performs rapid but incomplete clustering on protein decoys. Our method detects structurally similar decoys (measured using either C(α) RMSD or GDT-TS score) and extracts representatives from them without assigning every decoy to a cluster. We integrated our new clustering strategy with several different scoring functions to assess both the performance and speed in identifying correct or near-correct folds. Experimental results on 35 Rosetta decoy sets and 40 I-TASSER decoy sets show that our method can improve the correct fold detection rate as assessed by two different quality criteria. This improvement is significantly better than two recently published clustering methods, Durandal and Calibur-lite. Speed and efficiency testing shows that our method can handle much larger decoy sets and is up to 22 times faster than Durandal and Calibur-lite. The new method, named HS-Forest, avoids the computationally expensive task of clustering every decoy, yet still allows superior correct-fold selection. Its improved speed, efficiency and decoy-selection performance should enable structure prediction researchers to work with larger decoy sets and significantly improve their ab initio structure prediction performance.
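
    The "partial clustering" idea, i.e. grouping decoys around greedily chosen seeds and extracting representatives without exhaustively assigning every decoy to a cluster, can be sketched with 1-D stand-ins for C(α) RMSD distances; this is a toy illustration, not HS-Forest itself.

```python
def partial_cluster(decoys, cutoff):
    """Greedy seed-and-sweep grouping; returns (representative, group size) pairs."""
    remaining = list(decoys)
    reps = []
    while remaining:
        seed = remaining[0]
        group = [d for d in remaining if abs(d - seed) <= cutoff]
        reps.append((seed, len(group)))
        # discard the whole neighbourhood instead of clustering it further
        remaining = [d for d in remaining if abs(d - seed) > cutoff]
    return reps

reps = partial_cluster([0.0, 0.3, 0.1, 4.0, 4.2, 8.0], 1.0)
```

    Each sweep is linear in the remaining decoys, which is why avoiding complete all-pairs clustering pays off on large decoy sets.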

  17. An improved method to detect correct protein folds using partial clustering

    PubMed Central

    2013-01-01

    Background Structure-based clustering is commonly used to identify correct protein folds among candidate folds (also called decoys) generated by protein structure prediction programs. However, traditional clustering methods exhibit a poor runtime performance on large decoy sets. We hypothesized that a more efficient “partial“ clustering approach in combination with an improved scoring scheme could significantly improve both the speed and performance of existing candidate selection methods. Results We propose a new scheme that performs rapid but incomplete clustering on protein decoys. Our method detects structurally similar decoys (measured using either Cα RMSD or GDT-TS score) and extracts representatives from them without assigning every decoy to a cluster. We integrated our new clustering strategy with several different scoring functions to assess both the performance and speed in identifying correct or near-correct folds. Experimental results on 35 Rosetta decoy sets and 40 I-TASSER decoy sets show that our method can improve the correct fold detection rate as assessed by two different quality criteria. This improvement is significantly better than two recently published clustering methods, Durandal and Calibur-lite. Speed and efficiency testing shows that our method can handle much larger decoy sets and is up to 22 times faster than Durandal and Calibur-lite. Conclusions The new method, named HS-Forest, avoids the computationally expensive task of clustering every decoy, yet still allows superior correct-fold selection. Its improved speed, efficiency and decoy-selection performance should enable structure prediction researchers to work with larger decoy sets and significantly improve their ab initio structure prediction performance. PMID:23323835

  18. Kappa statistic for clustered matched-pair data.

    PubMed

    Yang, Zhao; Zhou, Ming

    2014-07-10

    Kappa statistic is widely used to assess the agreement between two procedures in the independent matched-pair data. For matched-pair data collected in clusters, on the basis of the delta method and sampling techniques, we propose a nonparametric variance estimator for the kappa statistic without within-cluster correlation structure or distributional assumptions. The results of an extensive Monte Carlo simulation study demonstrate that the proposed kappa statistic provides consistent estimation and the proposed variance estimator behaves reasonably well for at least a moderately large number of clusters (e.g., K ≥50). Compared with the variance estimator ignoring dependence within a cluster, the proposed variance estimator performs better in maintaining the nominal coverage probability when the intra-cluster correlation is fair (ρ ≥0.3), with more pronounced improvement when ρ is further increased. To illustrate the practical application of the proposed estimator, we analyze two real data examples of clustered matched-pair data. Copyright © 2014 John Wiley & Sons, Ltd.
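
    The statistic in question is plain Cohen's kappa for matched pairs; the paper's contribution is its cluster-robust variance estimator, which this minimal sketch does not reproduce. The ratings are invented for illustration.

```python
def cohens_kappa(pairs):
    """pairs: list of (rating1, rating2) tuples with 0/1 ratings."""
    n = len(pairs)
    po = sum(1 for a, b in pairs if a == b) / n      # observed agreement
    p1a = sum(a for a, _ in pairs) / n               # rater-1 marginal
    p1b = sum(b for _, b in pairs) / n               # rater-2 marginal
    pe = p1a * p1b + (1 - p1a) * (1 - p1b)           # chance agreement
    return (po - pe) / (1 - pe)

kappa = cohens_kappa([(1, 1), (1, 1), (0, 0), (0, 1), (1, 1), (0, 0)])
```

    With clustered pairs, the point estimate above is unchanged; what changes is its variance, which the proposed estimator widens to reflect within-cluster correlation.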

  19. Breast Cancer Symptom Clusters Derived from Social Media and Research Study Data Using Improved K-Medoid Clustering.

    PubMed

    Ping, Qing; Yang, Christopher C; Marshall, Sarah A; Avis, Nancy E; Ip, Edward H

    2016-06-01

    Most cancer patients, including patients with breast cancer, experience multiple symptoms simultaneously while receiving active treatment. Some symptoms tend to occur together and may be related, such as hot flashes and night sweats. Co-occurring symptoms may have a multiplicative effect on patients' functioning, mental health, and quality of life. Symptom clusters in the context of oncology were originally described as groups of three or more related symptoms. Some authors have suggested symptom clusters may have practical applications, such as the formulation of more effective therapeutic interventions that address the combined effects of symptoms rather than treating each symptom separately. Most studies that have sought to identify clusters in breast cancer survivors have relied on traditional research studies. Social media, such as online health-related forums, contain a bevy of user-generated content in the form of threads and posts, and could be used as a data source to identify and characterize symptom clusters among cancer patients. The present study seeks to determine patterns of symptom clusters in breast cancer survivors derived from both social media and research study data using improved K-Medoid clustering. A total of 50,426 publicly available messages were collected from Medhelp.com and 653 questionnaires were collected as part of a research study. The network of symptoms built from social media was sparse compared to that of the research study data, making the social media data easier to partition. The proposed revised K-Medoid clustering helps to improve the clustering performance by re-assigning some of the negative-ASW (average silhouette width) symptoms to other clusters after initial K-Medoid clustering. This retains an overall non-decreasing ASW and avoids the problem of trapping in local optima. The overall ASW, individual ASW, and improved interpretation of the final clustering solution suggest improvement. The clustering results suggest
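
    The ASW machinery rests on the per-point silhouette; a minimal 1-D sketch follows (points with negative silhouette would be the re-assignment candidates in the revised K-Medoid step). The clusters and values are illustrative, not symptom data.

```python
def silhouette(point, own, others):
    # a: mean distance to the point's own cluster (excluding itself)
    a = sum(abs(point - p) for p in own if p != point) / max(len(own) - 1, 1)
    # b: mean distance to the nearest other cluster
    b = min(sum(abs(point - q) for q in c) / len(c) for c in others)
    return (b - a) / max(a, b)

s = silhouette(1.0, [1.0, 1.2, 1.4], [[5.0, 5.2], [9.0]])
```

    Re-assigning a negative-silhouette point to its nearest other cluster cannot decrease that point's silhouette, which is the intuition behind the non-decreasing overall ASW claimed for the revised procedure.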

  20. Breast Cancer Symptom Clusters Derived from Social Media and Research Study Data Using Improved K-Medoid Clustering

    PubMed Central

    Ping, Qing; Yang, Christopher C.; Marshall, Sarah A.; Avis, Nancy E.; Ip, Edward H.

    2017-01-01

    Most cancer patients, including patients with breast cancer, experience multiple symptoms simultaneously while receiving active treatment. Some symptoms tend to occur together and may be related, such as hot flashes and night sweats. Co-occurring symptoms may have a multiplicative effect on patients’ functioning, mental health, and quality of life. Symptom clusters in the context of oncology were originally described as groups of three or more related symptoms. Some authors have suggested symptom clusters may have practical applications, such as the formulation of more effective therapeutic interventions that address the combined effects of symptoms rather than treating each symptom separately. Most studies that have sought to identify clusters in breast cancer survivors have relied on traditional research studies. Social media, such as online health-related forums, contain a bevy of user-generated content in the form of threads and posts, and could be used as a data source to identify and characterize symptom clusters among cancer patients. The present study seeks to determine patterns of symptom clusters in breast cancer survivors derived from both social media and research study data using improved K-Medoid clustering. A total of 50,426 publicly available messages were collected from Medhelp.com and 653 questionnaires were collected as part of a research study. The network of symptoms built from social media was sparse compared to that of the research study data, making the social media data easier to partition. The proposed revised K-Medoid clustering helps to improve the clustering performance by re-assigning some of the negative-ASW (average silhouette width) symptoms to other clusters after initial K-Medoid clustering. This retains an overall non-decreasing ASW and avoids the problem of trapping in local optima. The overall ASW, individual ASW, and improved interpretation of the final clustering solution suggest improvement. The clustering results suggest

  1. A Multicriteria Decision Making Approach for Estimating the Number of Clusters in a Data Set

    PubMed Central

    Peng, Yi; Zhang, Yong; Kou, Gang; Shi, Yong

    2012-01-01

    Determining the number of clusters in a data set is an essential yet difficult step in cluster analysis. Since this task involves more than one criterion, it can be modeled as a multiple criteria decision making (MCDM) problem. This paper proposes an MCDM-based approach to estimate the number of clusters for a given data set. In this approach, MCDM methods consider different numbers of clusters as alternatives and the outputs of any clustering algorithm on validity measures as criteria. The proposed method is examined by an experimental study using three MCDM methods, the well-known k-means clustering algorithm, ten relative measures, and fifteen public-domain UCI machine learning data sets. The results show that MCDM methods work fairly well in estimating the number of clusters in the data and outperform the ten relative measures considered in the study. PMID:22870181
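The framing above — candidate numbers of clusters as alternatives, validity measures as criteria — can be sketched as follows. This is an illustrative toy: a tiny 1-D k-means, two criteria (ASW as a benefit, within-cluster SSE as a cost), and a simple min-max-normalized weighted sum standing in for the full MCDM methods used in the paper; all names and weights are assumptions.

```python
import random
from statistics import mean

def kmeans_1d(data, k, iters=25, seed=0):
    """Tiny 1-D Lloyd's k-means; returns centers and point groups."""
    centers = random.Random(seed).sample(data, k)
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in data:
            groups[min(range(k), key=lambda c: abs(x - centers[c]))].append(x)
        centers = [mean(g) if g else centers[i] for i, g in enumerate(groups)]
    return centers, groups

def sse(centers, groups):
    """Within-cluster sum of squared errors (a cost criterion)."""
    return sum((x - c) ** 2 for c, g in zip(centers, groups) for x in g)

def asw(groups):
    """Average silhouette width from grouped 1-D points (a benefit criterion)."""
    scores = []
    for gi, g in enumerate(groups):
        for x in g:
            rest = [y for y in g if y is not x]
            if not rest:
                scores.append(0.0)
                continue
            a = mean(abs(x - y) for y in rest)
            b = min(mean(abs(x - y) for y in h)
                    for hi, h in enumerate(groups) if hi != gi and h)
            scores.append((b - a) / max(a, b))
    return mean(scores)

def pick_k(data, ks=(2, 3, 4, 5), w_asw=0.6, w_sse=0.4):
    """Score each candidate k on the two criteria and take the best
    weighted-sum alternative (a stand-in for fuller MCDM rankings)."""
    rows = []
    for k in ks:
        centers, groups = kmeans_1d(data, k)
        rows.append((k, asw(groups), sse(centers, groups)))
    lo_a, hi_a = min(r[1] for r in rows), max(r[1] for r in rows)
    lo_s, hi_s = min(r[2] for r in rows), max(r[2] for r in rows)
    def score(r):
        na = (r[1] - lo_a) / ((hi_a - lo_a) or 1.0)
        ns = (r[2] - lo_s) / ((hi_s - lo_s) or 1.0)
        return w_asw * na + w_sse * (1.0 - ns)
    return max(rows, key=score)[0]
```

On two well-separated groups of points, the aggregate ranking picks k = 2 even though SSE alone keeps decreasing with k.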

  2. Improved Ant Colony Clustering Algorithm and Its Performance Study

    PubMed Central

    Gao, Wei

    2016-01-01

    Clustering analysis is used in many disciplines and applications; it is an important tool that descriptively identifies homogeneous groups of objects based on attribute values. The ant colony clustering algorithm is a swarm-intelligent method used for clustering problems that is inspired by the behavior of ant colonies that cluster their corpses and sort their larvae. A new abstraction ant colony clustering algorithm using a data combination mechanism is proposed to improve the computational efficiency and accuracy of the ant colony clustering algorithm. The abstraction ant colony clustering algorithm is used to cluster benchmark problems, and its performance is compared with the ant colony clustering algorithm and other methods used in existing literature. For problems of similar computational difficulty and complexity, the results show that the abstraction ant colony clustering algorithm produces results that are not only more accurate but also more efficiently determined than the ant colony clustering algorithm and the other methods. Thus, the abstraction ant colony clustering algorithm can be used for efficient multivariate data clustering. PMID:26839533

  3. Estimating regression coefficients from clustered samples: Sampling errors and optimum sample allocation

    NASA Technical Reports Server (NTRS)

    Kalton, G.

    1983-01-01

    A number of surveys have been conducted to study the relationship between the level of aircraft or traffic noise exposure experienced by people living in a particular area and their annoyance with it. These surveys generally employ a clustered sample design, which affects the precision of the survey estimates. Regression analysis of annoyance on noise measures and other variables is often an important component of the survey analysis. Formulae are presented for estimating the standard errors of regression coefficients and ratios of regression coefficients that are applicable with a two- or three-stage clustered sample design. Using a simple cost function, the optimum allocation of the sample across the stages of the design is also determined for the estimation of a regression coefficient.
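A standard result behind such precision calculations (a textbook approximation, not the paper's exact formulae) is the design effect of a clustered sample: with an average cluster take of b and intraclass correlation rho, variances are inflated by roughly deff = 1 + (b - 1) * rho relative to simple random sampling.

```python
import math

def design_effect(cluster_size, icc):
    """Classic design effect deff = 1 + (b - 1) * rho for a clustered
    sample with average cluster take b and intraclass correlation rho."""
    return 1.0 + (cluster_size - 1) * icc

def clustered_se(srs_se, cluster_size, icc):
    """Approximate standard error after inflating a simple-random-sampling
    standard error for the clustered design."""
    return srs_se * math.sqrt(design_effect(cluster_size, icc))
```

For example, with 20 respondents per area and rho = 0.05, deff = 1.95, so standard errors are about 40% larger than a simple random sample of the same size would give.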

  4. Estimating multilevel logistic regression models when the number of clusters is low: a comparison of different statistical software procedures.

    PubMed

    Austin, Peter C

    2010-04-22

    Multilevel logistic regression models are increasingly being used to analyze clustered data in medical, public health, epidemiological, and educational research. Procedures for estimating the parameters of such models are available in many statistical software packages. There is currently little evidence on the minimum number of clusters necessary to reliably fit multilevel regression models. We conducted a Monte Carlo study to compare the performance of different statistical software procedures for estimating multilevel logistic regression models when the number of clusters was low. We examined procedures available in BUGS, HLM, R, SAS, and Stata. We found that there were qualitative differences in the performance of different software procedures for estimating multilevel logistic models when the number of clusters was low. Among the likelihood-based procedures, estimation methods based on adaptive Gauss-Hermite approximations to the likelihood (glmer in R and xtlogit in Stata) or adaptive Gaussian quadrature (Proc NLMIXED in SAS) tended to have superior performance for estimating variance components when the number of clusters was small, compared to software procedures based on penalized quasi-likelihood. However, only Bayesian estimation with BUGS allowed for accurate estimation of variance components when there were fewer than 10 clusters. For all statistical software procedures, estimation of variance components tended to be poor when there were only five subjects per cluster, regardless of the number of clusters.
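The data-generating process in a Monte Carlo study of this kind — a logistic model with a cluster-level random intercept — can be sketched as follows. All parameter names and values are illustrative, not the paper's design.

```python
import math
import random

def simulate_clustered_binary(n_clusters, cluster_size, beta0, beta1,
                              sigma_u, seed=0):
    """Draw (cluster, x, y) triples from the random-intercept logistic model
    logit P(y=1) = beta0 + beta1 * x + u_j, with u_j ~ N(0, sigma_u^2)."""
    rng = random.Random(seed)
    data = []
    for j in range(n_clusters):
        u = rng.gauss(0.0, sigma_u)  # cluster-level random intercept
        for _ in range(cluster_size):
            x = rng.gauss(0.0, 1.0)  # subject-level covariate
            p = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x + u)))
            y = 1 if rng.random() < p else 0
            data.append((j, x, y))
    return data
```

Varying `n_clusters` (e.g. 5, 10, 25) while holding `cluster_size` fixed reproduces the "few clusters" scenarios that the software comparisons above are concerned with.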

  5. Interrupted time-series analysis yielded an effect estimate concordant with the cluster-randomized controlled trial result.

    PubMed

    Fretheim, Atle; Soumerai, Stephen B; Zhang, Fang; Oxman, Andrew D; Ross-Degnan, Dennis

    2013-08-01

    We reanalyzed the data from a cluster-randomized controlled trial (C-RCT) of a quality improvement intervention for prescribing antihypertensive medication. Our objective was to estimate the effectiveness of the intervention using both interrupted time-series (ITS) and RCT methods, and to compare the findings. We first conducted an ITS analysis using data only from the intervention arm of the trial because our main objective was to compare the findings from an ITS analysis with the findings from the C-RCT. We used segmented regression methods to estimate changes in level or slope coincident with the intervention, controlling for baseline trend. We analyzed the C-RCT data using generalized estimating equations. Last, we estimated the intervention effect by including data from both study groups and by conducting a controlled ITS analysis of the difference between the slope and level changes in the intervention and control groups. The estimates of absolute change resulting from the intervention were ITS analysis, 11.5% (95% confidence interval [CI]: 9.5, 13.5); C-RCT, 9.0% (95% CI: 4.9, 13.1); and the controlled ITS analysis, 14.0% (95% CI: 8.6, 19.4). ITS analysis can provide an effect estimate that is concordant with the results of a cluster-randomized trial. A broader range of comparisons from other RCTs would help to determine whether these are generalizable results. Copyright © 2013 Elsevier Inc. All rights reserved.
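The segmented regression model described above — a change in level and in slope at the intervention, controlling for baseline trend — can be sketched with a design matrix and a plain least-squares fit. This is an illustrative sketch, not the authors' code; the four columns are intercept, time, post-intervention step, and time since intervention.

```python
def segmented_design(times, t0):
    """Design rows [1, t, step(t >= t0), time since t0] for an
    interrupted time-series model with level and slope change at t0."""
    return [[1.0, t, 1.0 if t >= t0 else 0.0,
             (t - t0) if t >= t0 else 0.0] for t in times]

def ols(X, y):
    """Ordinary least squares via the normal equations X'X b = X'y,
    solved by Gaussian elimination with partial pivoting."""
    k = len(X[0])
    A = [[sum(row[p] * row[q] for row in X) for q in range(k)]
         for p in range(k)]
    b = [sum(row[p] * yi for row, yi in zip(X, y)) for p in range(k)]
    for p in range(k):
        piv = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[piv] = A[piv], A[p]
        b[p], b[piv] = b[piv], b[p]
        for r in range(p + 1, k):
            f = A[r][p] / A[p][p]
            for q in range(p, k):
                A[r][q] -= f * A[p][q]
            b[r] -= f * b[p]
    coef = [0.0] * k
    for p in range(k - 1, -1, -1):
        coef[p] = (b[p] - sum(A[p][q] * coef[q]
                              for q in range(p + 1, k))) / A[p][p]
    return coef
```

The fitted third coefficient is the immediate level change at the intervention, the quantity compared against the C-RCT effect estimate above.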

  6. Small Sample Performance of Bias-corrected Sandwich Estimators for Cluster-Randomized Trials with Binary Outcomes

    PubMed Central

    Li, Peng; Redden, David T.

    2014-01-01

    The sandwich estimator in the generalized estimating equations (GEE) approach underestimates the true variance in small samples and consequently results in inflated type I error rates in hypothesis testing. This limits the application of GEE in cluster-randomized trials (CRTs) with few clusters. Under various CRT scenarios with correlated binary outcomes, we evaluate the small-sample properties of GEE Wald tests using bias-corrected sandwich estimators. Our results suggest that the GEE Wald z test should be avoided in the analysis of CRTs with few clusters, even when bias-corrected sandwich estimators are used. With a t-distribution approximation, the Kauermann and Carroll (KC) correction can keep the test size at nominal levels even when the number of clusters is as low as 10, and is robust to moderate variation in cluster sizes. In cases with large variation in cluster sizes, however, the Fay and Graubard (FG) correction should be used instead. Furthermore, we derive a formula to calculate the power and the minimum total number of clusters needed using the t test and KC correction for CRTs with binary outcomes. The power levels predicted by the proposed formula agree well with the empirical powers from the simulations. The proposed methods are illustrated using real CRT data. We conclude that, with appropriate control of type I error rates at small sample sizes, the GEE approach is recommended in CRTs with binary outcomes because it requires fewer assumptions and is robust to misspecification of the covariance structure. PMID:25345738

  7. The improvement and simulation for LEACH clustering routing protocol

    NASA Astrophysics Data System (ADS)

    Ji, Ai-guo; Zhao, Jun-xiang

    2017-01-01

    An energy-balanced unequal multi-hop clustering routing protocol, LEACH-EUMC, is proposed in this paper. Candidate cluster head nodes are elected first; they then compete to become formal cluster head nodes, with energy and distance factors added to the competition; finally, the data are transferred to the sink through multi-hop routing. Simulation results show that the improved algorithm outperforms LEACH in network lifetime, energy consumption, and the amount of data transmitted.
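The election step that LEACH-EUMC builds on is the classic LEACH cluster-head threshold (shown here as a sketch of standard LEACH, not the paper's energy- and distance-weighted extension): in round r, a node that has not served this epoch becomes a cluster head if a uniform draw falls below T(n).

```python
def leach_threshold(p, r, was_head_this_epoch):
    """Classic LEACH election threshold T(n) = p / (1 - p * (r mod 1/p)).
    p is the desired cluster-head fraction per round; nodes that already
    served as head in the current epoch (1/p rounds) are excluded."""
    if was_head_this_epoch:
        return 0.0
    epoch = int(round(1.0 / p))
    return p / (1.0 - p * (r % epoch))
```

The threshold rises over the epoch, reaching 1 in the last round so that every remaining node serves exactly once per epoch.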

  8. Migration in the shearing sheet and estimates for young open cluster migration

    NASA Astrophysics Data System (ADS)

    Quillen, Alice C.; Nolting, Eric; Minchev, Ivan; De Silva, Gayandhi; Chiappini, Cristina

    2018-04-01

    Using tracer particles embedded in self-gravitating shearing sheet N-body simulations, we investigate the distance in guiding centre radius that stars or star clusters can migrate in a few orbital periods. The standard deviations of guiding centre distributions and maximum migration distances depend on the Toomre or critical wavelength and the contrast in mass surface density caused by spiral structure. Comparison between our simulations and estimated guiding radii for a few young supersolar metallicity open clusters, including NGC 6583, suggests that the contrast in mass surface density in the solar neighbourhood has standard deviation (in the surface density distribution) divided by mean of about 1/4 and larger than measured using COBE data by Drimmel and Spergel. Our estimate is consistent with a standard deviation of ˜0.07 dex in the metallicities measured from high-quality spectroscopic data for 38 young open clusters (<1 Gyr) with mean galactocentric radius 7-9 kpc.

  9. Estimated Satellite Cluster Elements in Near Circular Orbit

    DTIC Science & Technology

    1988-12-01

    cluster is investigated. The on-board estimator is the U-D covariance factorization filter with dynamics based on the Clohessy-Wiltshire equations. ... Appropriate values for the velocity vector vi can be found from the Clohessy-Wiltshire equations [9] (these equations are explained in detail in the text). ... The state transition matrix was developed from the Clohessy-Wiltshire equations of motion [9:page 3].
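The record above builds its state transition matrix from the Clohessy-Wiltshire relative-motion equations. Their well-known closed-form transition matrix for near-circular orbits is a standard result (sketched here; not necessarily the report's exact F matrix), for the state [x, y, z, vx, vy, vz] with x radial, y along-track, z cross-track, and n the reference orbit's mean motion:

```python
import math

def cw_transition(t, n):
    """Closed-form Clohessy-Wiltshire state transition matrix Phi(t) so
    that state(t) = Phi(t) @ state(0), for state [x, y, z, vx, vy, vz]."""
    s, c = math.sin(n * t), math.cos(n * t)
    return [
        [4 - 3 * c,       0, 0,      s / n,           2 * (1 - c) / n,         0],
        [6 * (s - n * t), 1, 0,      2 * (c - 1) / n, (4 * s - 3 * n * t) / n, 0],
        [0,               0, c,      0,               0,                       s / n],
        [3 * n * s,       0, 0,      c,               2 * s,                   0],
        [6 * n * (c - 1), 0, 0,      -2 * s,          4 * c - 3,               0],
        [0,               0, -n * s, 0,               0,                       c],
    ]
```

At t = 0 the matrix reduces to the identity; a radial offset x0 produces the familiar secular along-track drift through the 6(sin nt - nt) term.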

  10. On the estimation of intracluster correlation for time-to-event outcomes in cluster randomized trials.

    PubMed

    Kalia, Sumeet; Klar, Neil; Donner, Allan

    2016-12-30

    Cluster randomized trials (CRTs) involve the random assignment of intact social units rather than independent subjects to intervention groups. Time-to-event outcomes often are endpoints in CRTs. Analyses of such data need to account for the correlation among cluster members. The intracluster correlation coefficient (ICC) is used to assess the similarity among binary and continuous outcomes that belong to the same cluster. However, estimating the ICC in CRTs with time-to-event outcomes is a challenge because of the presence of censored observations. The literature suggests that the ICC may be estimated using either censoring indicators or observed event times. A simulation study explores the effect of administrative censoring on estimating the ICC. Results show that ICC estimators derived from censoring indicators or observed event times are negatively biased. Analytic work further supports these results. Observed event times are preferred for estimating the ICC when administrative censoring is minimal. To our knowledge, the existing literature provides no practical guidance on the estimation of the ICC when a substantial amount of administrative censoring is present. The results from this study corroborate the need for further methodological research on estimating the ICC for correlated time-to-event outcomes. Copyright © 2016 John Wiley & Sons, Ltd.
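For uncensored continuous outcomes, the standard one-way ANOVA estimator of the ICC is the building block the censored case complicates. A sketch for equal cluster sizes (illustrative; not the estimators studied in the paper):

```python
from statistics import mean

def anova_icc(clusters):
    """One-way ANOVA ICC for a list of equal-size clusters of outcomes:
    rho = (MSB - MSW) / (MSB + (m - 1) * MSW), m the cluster size."""
    k = len(clusters)
    m = len(clusters[0])
    grand = mean(x for c in clusters for x in c)
    msb = m * sum((mean(c) - grand) ** 2 for c in clusters) / (k - 1)
    msw = sum((x - mean(c)) ** 2 for c in clusters for x in c) / (k * (m - 1))
    return (msb - msw) / (msb + (m - 1) * msw)
```

Perfectly homogeneous clusters give rho = 1; note the ANOVA estimator can be negative when between-cluster variation is smaller than expected under independence.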

  11. Research on the method of information system risk state estimation based on clustering particle filter

    NASA Astrophysics Data System (ADS)

    Cui, Jia; Hong, Bei; Jiang, Xuepeng; Chen, Qinghua

    2017-05-01

    To reinforce the correlation analysis of threat factors in risk assessment, a dynamic safety-risk assessment method based on particle filtering is proposed, with threat analysis at its core. Based on risk assessment standards, the method selects threat indicators, applies a particle filtering algorithm to calculate the influence weights of those indicators, and determines information system risk levels by combining them with state estimation theory. To improve the computational efficiency of the particle filter, the k-means clustering algorithm is introduced: all particles are clustered, and each cluster centroid serves as a representative in subsequent operations, reducing the computational load. Empirical results indicate that the method reasonably captures the mutual dependence and influence among risk elements. Under conditions of limited information, it provides a scientific basis for formulating a risk management and control strategy.

  12. Improved Gravitation Field Algorithm and Its Application in Hierarchical Clustering

    PubMed Central

    Zheng, Ming; Sun, Ying; Liu, Gui-xia; Zhou, You; Zhou, Chun-guang

    2012-01-01

    Background The gravitation field algorithm (GFA) is a new optimization algorithm based on an imitation of natural phenomena. GFA performs well both at finding a global minimum and at locating multiple minima in computational biology, but it needs improvements in efficiency and modifications before it can be applied to discrete data problems in systems biology. Method An improved GFA, called IGFA, is proposed in this paper. Two parts of GFA were improved: the rule of random division, a strategy that shortens running time, and a rotation factor, which improves accuracy. To apply IGFA to hierarchical clustering, the initialization step and the movement operator were also modified. Results Two kinds of experiments were used to test IGFA, and IGFA was applied to hierarchical clustering. The global-minimum experiment compared IGFA, GFA, GA (genetic algorithm), and SA (simulated annealing); the multi-minima experiment compared IGFA and GFA. Together the experiments demonstrate the efficiency of IGFA, which outperforms GFA in both accuracy and running time. For hierarchical clustering, IGFA was used to optimize the smallest distance between gene pairs, and the results were compared with GA, SA, single-linkage clustering, and UPGMA, confirming IGFA's efficiency. PMID:23173043

  13. An improved K-means clustering algorithm in agricultural image segmentation

    NASA Astrophysics Data System (ADS)

    Cheng, Huifeng; Peng, Hui; Liu, Shanmei

    Image segmentation is the first important step in image analysis and image processing. In this paper, based on the characteristics of color crop images, we first transform the image color space from RGB to HSI, and then select proper initial cluster centers and the cluster number by applying a mean-variance approach and rough set theory, followed by the clustering calculation. This automatically and rapidly segments the color components and accurately extracts target objects from the background, providing a reliable basis for the identification, analysis, and subsequent calculation and processing of crop images. Experimental results demonstrate that the improved k-means clustering algorithm reduces the computational load and enhances the precision and accuracy of clustering.
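The RGB-to-HSI transform mentioned above follows a common arccos formulation (conventions vary; this is an illustrative sketch, not the authors' code), with channels normalized to [0, 1] and hue returned in radians:

```python
import math

def rgb_to_hsi(r, g, b):
    """Convert normalized RGB (each in [0, 1]) to (H, S, I).
    H in radians via the common arccos formulation; H is set to 0
    for achromatic pixels where it is undefined."""
    i = (r + g + b) / 3.0
    if i == 0:
        return 0.0, 0.0, 0.0        # black: hue and saturation undefined
    s = 1.0 - min(r, g, b) / i      # equals 1 - 3*min(r,g,b)/(r+g+b)
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0:
        h = 0.0                     # gray: hue undefined
    else:
        h = math.acos(max(-1.0, min(1.0, num / den)))
        if b > g:
            h = 2 * math.pi - h
    return h, s, i
```

Pure red maps to H = 0, S = 1, I = 1/3, and any gray level maps to S = 0, which is why clustering on HSI separates chromatic crop pixels from achromatic background more cleanly than RGB.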

  14. Improved phase arrival estimate and location for local earthquakes in South Korea

    NASA Astrophysics Data System (ADS)

    Morton, E. A.; Rowe, C. A.; Begnaud, M. L.

    2012-12-01

    The Korean Institute of Geoscience and Mineral Resources (KIGAM) and the Korean Meteorological Agency (KMA) regularly report local (distance < ~1200 km) seismicity recorded with their networks; we obtain preliminary event location estimates as well as waveform data, but no phase arrivals are reported, so the data are not immediately useful for earthquake location. Our goal is to identify seismic events that are sufficiently well-located to provide accurate seismic travel-time information for events within the KIGAM and KMA networks, and also recorded by some regional stations. Toward that end, we are using a combination of manual phase identification and arrival-time picking, with waveform cross-correlation, to cluster events that have occurred in close proximity to one another, which allows for improved phase identification by comparing the highly correlating waveforms. We cross-correlate the known events with one another on 5 seismic stations and cluster events that correlate above a correlation coefficient threshold of 0.7, which reveals few clusters containing few events each. The small number of repeating events suggests that the online catalogs have had mining and quarry blasts removed before publication, as these can contribute significantly to repeating seismic sources in relatively aseismic regions such as South Korea. The dispersed source locations in our catalog, however, are ideal for seismic velocity modeling by providing superior sampling through the dense seismic station arrangement, which produces favorable event-to-station ray path coverage. Following careful manual phase picking on 104 events chosen to provide adequate ray coverage, we re-locate the events to obtain improved source coordinates. The re-located events are used with Thurber's Simul2000 pseudo-bending local tomography code to estimate the crustal structure on the Korean Peninsula, which is an important contribution to ongoing calibration for events of interest in the region.
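The correlate-and-cluster step described above — grouping events whose waveforms correlate above 0.7 — can be sketched with a peak normalized cross-correlation over lags and single-linkage grouping via union-find. The toy traces and names are illustrative, not the study's data or code; equal-length, non-constant traces are assumed.

```python
import math

def ncc_max(a, b, max_lag):
    """Maximum normalized cross-correlation of two equal-length traces
    over integer lags in [-max_lag, max_lag]."""
    def demean(x):
        m = sum(x) / len(x)
        d = [v - m for v in x]
        return d, math.sqrt(sum(v * v for v in d))
    da, na = demean(a)
    db, nb = demean(b)
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        num = sum(da[i] * db[i + lag]
                  for i in range(len(a)) if 0 <= i + lag < len(b))
        best = max(best, num / (na * nb))
    return best

def correlation_clusters(traces, threshold=0.7, max_lag=10):
    """Single-linkage grouping: traces joined whenever their peak
    correlation meets the threshold (0.7 in the study above)."""
    parent = list(range(len(traces)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i
    for i in range(len(traces)):
        for j in range(i + 1, len(traces)):
            if ncc_max(traces[i], traces[j], max_lag) >= threshold:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(traces))]
```

Two copies of the same waveform offset by a few samples land in one cluster, while an unrelated trace stays in its own, mirroring how repeating sources are detected.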

  15. A clustering approach to segmenting users of internet-based risk calculators.

    PubMed

    Harle, C A; Downs, J S; Padman, R

    2011-01-01

    Risk calculators are widely available Internet applications that deliver quantitative health risk estimates to consumers. Although these tools are known to have varying effects on risk perceptions, little is known about who will be more likely to accept objective risk estimates. To identify clusters of online health consumers that help explain variation in individual improvement in risk perceptions from web-based quantitative disease risk information. A secondary analysis was performed on data collected in a field experiment that measured people's pre-diabetes risk perceptions before and after visiting a realistic health promotion website that provided quantitative risk information. K-means clustering was performed on numerous candidate variable sets, and the different segmentations were evaluated based on between-cluster variation in risk perception improvement. Variation in responses to risk information was best explained by clustering on pre-intervention absolute pre-diabetes risk perceptions and an objective estimate of personal risk. Members of a high-risk overestimator cluster showed large improvements in their risk perceptions, but clusters of both moderate-risk and high-risk underestimators were much more muted in improving their optimistically biased perceptions. Cluster analysis provided a unique approach for segmenting health consumers and predicting their acceptance of quantitative disease risk information. These clusters suggest that health consumers were very responsive to good news but tended not to incorporate bad news into their self-perceptions. These findings help to quantify variation among online health consumers and may inform the targeted marketing of and improvements to risk communication tools on the Internet.

  16. Estimation of multiple accelerated motions using chirp-Fourier transform and clustering.

    PubMed

    Alexiadis, Dimitrios S; Sergiadis, George D

    2007-01-01

    Motion estimation in the spatiotemporal domain has been extensively studied and many methodologies have been proposed, which, however, cannot handle both time-varying and multiple motions. Extending previously published ideas, we present an efficient method for estimating multiple, linearly time-varying motions. It is shown that the estimation of accelerated motions is equivalent to the parameter estimation of superposed chirp signals. From this viewpoint, one can exploit established signal processing tools such as the chirp-Fourier transform. It is shown that accelerated motion results in energy concentration along planes in the 4-D space: spatial frequencies-temporal frequency-chirp rate. Using fuzzy c-planes clustering, we estimate the plane/motion parameters. The effectiveness of our method is verified on both synthetic as well as real sequences and its advantages are highlighted.

  17. Unsupervised, Robust Estimation-based Clustering for Multispectral Images

    NASA Technical Reports Server (NTRS)

    Netanyahu, Nathan S.

    1997-01-01

    To prepare for the challenge of handling the archiving and querying of terabyte-sized scientific spatial databases, the NASA Goddard Space Flight Center's Applied Information Sciences Branch (AISB, Code 935) developed a number of characterization algorithms that rely on supervised clustering techniques. The research reported here has been aimed at continuing the evolution of some of these supervised techniques, namely the neural network and decision tree-based classifiers, plus extending the approach to incorporate unsupervised clustering algorithms, such as those based on robust estimation (RE) techniques. The algorithms developed under this task should be suited for use by the Intelligent Information Fusion System (IIFS) metadata extraction modules, and as such these algorithms must be fast, robust, and anytime in nature. Finally, so that the planner/scheduler module of the IIFS can oversee the use and execution of these algorithms, all information required by the planner/scheduler must be provided to the IIFS development team to ensure the timely integration of these algorithms into the overall system.

  18. An improved K-means clustering method for cDNA microarray image segmentation.

    PubMed

    Wang, T N; Li, T J; Shao, G F; Wu, S X

    2015-07-14

    Microarray technology is a powerful tool for human genetic research and other biomedical applications. Numerous improvements to the standard K-means algorithm have been carried out to complete the image segmentation step. However, most of the previous studies classify the image into two clusters. In this paper, we propose a novel K-means algorithm which first classifies the image into three clusters; one of the three clusters is then taken as the background region and the other two as the foreground region. The proposed method was evaluated on six different data sets. The analyses of accuracy, efficiency, expression values, special gene spots, and noise images demonstrate the effectiveness of our method in improving the segmentation quality.

  19. Trust estimation of the semantic web using semantic web clustering

    NASA Astrophysics Data System (ADS)

    Shirgahi, Hossein; Mohsenzadeh, Mehran; Haj Seyyed Javadi, Hamid

    2017-05-01

    The growth of the semantic web and of social networks on today's Internet is undeniable. The widespread, distributed nature of the semantic web makes assessing trust in this field very challenging, and in recent years extensive research has been devoted to estimating it. Since semantic web trust is a multidimensional problem, in this paper we use parameters of social network authority, page-link authority, and semantic authority to assess trust. Because of the large space of the semantic network, we restrict the problem scope to clusters of semantic subnetworks, compute the trust of each cluster's elements locally, and calculate the trust of outside resources from those local trusts and from the trust between clusters. Experimental results show that the proposed method achieves an F-score of more than 79%, on average about 11.9% higher than the Eigen, Tidal, and centralised trust methods. The mean error of the proposed method is 12.936, on average 9.75% lower than the Eigen and Tidal trust methods.

  20. An improved approximate-Bayesian model-choice method for estimating shared evolutionary history

    PubMed Central

    2014-01-01

    Background To understand biological diversification, it is important to account for large-scale processes that affect the evolutionary history of groups of co-distributed populations of organisms. Such events predict temporally clustered divergence times, a pattern that can be estimated using genetic data from co-distributed species. I introduce a new approximate-Bayesian method for comparative phylogeographical model-choice that estimates the temporal distribution of divergences across taxa from multi-locus DNA sequence data. The model is an extension of that implemented in msBayes. Results By reparameterizing the model, introducing more flexible priors on demographic and divergence-time parameters, and implementing a non-parametric Dirichlet-process prior over divergence models, I improved the robustness, accuracy, and power of the method for estimating shared evolutionary history across taxa. Conclusions The results demonstrate that the improved performance of the new method is due to (1) more appropriate priors on divergence-time and demographic parameters that avoid prohibitively small marginal likelihoods for models with more divergence events, and (2) the Dirichlet-process providing a flexible prior on divergence histories that does not strongly disfavor models with intermediate numbers of divergence events. The new method yields more robust estimates of posterior uncertainty, and thus greatly reduces the tendency to incorrectly estimate models of shared evolutionary history with strong support. PMID:24992937

  1. Spatio-temporal clustering and density estimation of lightning data for the tracking of convective events

    NASA Astrophysics Data System (ADS)

    Strauss, Cesar; Rosa, Marcelo Barbio; Stephany, Stephan

    2013-12-01

    Convective cells are cloud formations whose growth, maturation and dissipation are of great interest among meteorologists since they are associated with severe storms with large precipitation structures. Some works suggest a strong correlation between lightning occurrence and convective cells. The current work proposes a new approach to analyze the correlation between precipitation and lightning, and to identify electrically active cells. Such cells may be employed for tracking convective events in the absence of weather radar coverage. This approach employs a new spatio-temporal clustering technique based on a temporal sliding-window and a standard kernel density estimation to process lightning data. Clustering allows the identification of the cells from lightning data and density estimation bounds the contours of the cells. The proposed approach was evaluated for two convective events in Southeast Brazil. Image segmentation of radar data was performed to identify convective precipitation structures using the Steiner criteria. These structures were then compared and correlated to the electrically active cells in particular instants of time for both events. It was observed that most precipitation structures have associated cells, by comparing the ground tracks of their centroids. In addition, for one particular cell of each event, its temporal evolution was compared to that of the associated precipitation structure. Results show that the proposed approach may improve the use of lightning data for tracking convective events in countries that lack weather radar coverage.
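The density-estimation step described above — a temporal sliding window over event data followed by a standard kernel density estimate to bound cell contours — can be sketched as follows. This is an illustrative sketch with a Gaussian kernel and hypothetical names, not the authors' implementation.

```python
import math

def window(events, t_start, t_end):
    """Locations of events (t, x, y) inside one temporal sliding window."""
    return [(x, y) for t, x, y in events if t_start <= t < t_end]

def kde_2d(points, x, y, bandwidth):
    """Gaussian kernel density estimate at (x, y) from 2-D event
    locations, e.g. lightning strokes inside one window; contours of
    this surface can then bound the electrically active cell."""
    h2 = bandwidth * bandwidth
    norm = 1.0 / (2 * math.pi * h2 * len(points))
    return norm * sum(
        math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * h2))
        for px, py in points)
```

Sliding the window forward in time and re-estimating the density yields the track of a cell's centroid, which is what gets compared against the radar-derived precipitation structures above.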

  2. Generalized fuzzy C-means clustering algorithm with improved fuzzy partitions.

    PubMed

    Zhu, Lin; Chung, Fu-Lai; Wang, Shitong

    2009-06-01

    The fuzziness index m has an important influence on the clustering results of fuzzy clustering algorithms, and it should not be forced to the usual fixed value m = 2. In view of its distinctive features in applications and its limitation of supporting m = 2 only, a recent advance in fuzzy clustering called fuzzy c-means clustering with improved fuzzy partitions (IFP-FCM) is extended in this paper, and a generalized algorithm called GIFP-FCM is proposed for more effective clustering. By introducing a novel membership constraint function, a new objective function is constructed, and from it the GIFP-FCM clustering updates are derived. Meanwhile, from the viewpoints of the L(p) norm distance measure and competitive learning, the robustness and convergence of the proposed algorithm are analyzed. Furthermore, the classical fuzzy c-means algorithm (FCM) and IFP-FCM can be taken as two special cases of the proposed algorithm. Several experimental results, including an application to noisy image texture segmentation, are presented to demonstrate its average advantage over FCM and IFP-FCM in both clustering and robustness capabilities.
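The classic FCM updates that IFP-FCM and GIFP-FCM generalize make the role of the fuzziness index m explicit. A sketch for 1-D data follows (standard FCM, not the proposed GIFP-FCM; the crisp-assignment tie-breaking for points that coincide with a center is a simplification):

```python
def fcm_memberships(data, centers, m):
    """Classic FCM membership update for 1-D data:
    u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1))."""
    U = []
    for x in data:
        d = [abs(x - c) for c in centers]
        if 0.0 in d:
            # x coincides with a center: crisp assignment (simplification)
            row = [1.0 if dj == 0.0 else 0.0 for dj in d]
            s = sum(row)
            row = [v / s for v in row]
        else:
            row = [1.0 / sum((di / dj) ** (2.0 / (m - 1.0)) for dj in d)
                   for di in d]
        U.append(row)
    return U

def fcm_centers(data, U, m):
    """Center update: weighted mean with fuzzified memberships u^m."""
    k = len(U[0])
    return [sum(u[i] ** m * x for u, x in zip(U, data)) /
            sum(u[i] ** m for u in U) for i in range(k)]
```

As m -> 1 the memberships approach hard (k-means-like) assignments, while larger m spreads membership across clusters, which is why fixing m = 2 is a restriction worth relaxing.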

  3. Dynamical evolution of stellar mass black holes in dense stellar clusters: estimate for merger rate of binary black holes originating from globular clusters

    NASA Astrophysics Data System (ADS)

    Tanikawa, A.

    2013-10-01

    We have performed N-body simulations of globular clusters (GCs) in order to estimate a detection rate of mergers of binary stellar mass black holes (BBHs) by means of gravitational wave (GW) observatories. For our estimate, we have only considered mergers of BBHs which escape from GCs (BBH escapers). BBH escapers merge more quickly than BBHs inside GCs because of their small semimajor axes. N-body simulation cannot deal with a GC with the number of stars N ~ 10^6 due to its high computational cost. We have simulated the dynamical evolution of small-N clusters (10^4 ≲ N ≲ 10^5), and have extrapolated our simulation results to large-N clusters. From our simulation results, we have found the following dependence of BBH properties on N. BBHs escape from a cluster at each two-body relaxation time at a rate proportional to N. Semimajor axes of BBH escapers are inversely proportional to N if the initial mass densities of clusters are fixed. Eccentricities, primary masses and mass ratios of BBH escapers are independent of N. Using this dependence of BBH properties, we have artificially generated a population of BBH escapers from a GC with N ~ 10^6, and have estimated a detection rate of mergers of BBH escapers by next-generation GW observatories. We have assumed that all the GCs are formed 10 or 12 Gyr ago with their initial numbers of stars Ni = 5 × 10^5-2 × 10^6 and their initial stellar mass densities inside their half-mass radii ρh,i = 6 × 10^3-10^6 M⊙ pc^-3. Then, the detection rate of BBH escapers is 0.5-20 yr^-1 for a BH retention fraction RBH = 0.5. A few BBH escapers are components of hierarchical triple systems, although we do not consider secular perturbation on such BBH escapers for our estimate. Our simulations have shown that BHs are still inside some of the GCs at the present day. These BHs may marginally contribute to BBH detection.

  4. Comparison of cluster-based and source-attribution methods for estimating transmission risk using large HIV sequence databases.

    PubMed

    Le Vu, Stéphane; Ratmann, Oliver; Delpech, Valerie; Brown, Alison E; Gill, O Noel; Tostevin, Anna; Fraser, Christophe; Volz, Erik M

    2018-06-01

    Phylogenetic clustering of HIV sequences from a random sample of patients can reveal epidemiological transmission patterns, but interpretation is hampered by limited theoretical support, and the statistical properties of clustering analysis remain poorly understood. Alternatively, source-attribution methods allow fitting of HIV transmission models and can thereby quantify aspects of disease transmission. A simulation study was conducted to assess the error rates of clustering methods for detecting transmission risk factors. We modeled HIV epidemics among men who have sex with men and generated phylogenies comparable to those that can be obtained from HIV surveillance data in the UK. Clustering and source-attribution approaches were applied to evaluate their ability to identify patient attributes as transmission risk factors. We find that commonly used methods show a misleading association between cluster size, or the odds of clustering, and covariates that are correlated with time since infection, regardless of their influence on transmission. Clustering methods usually have higher error rates and lower sensitivity than source attribution for identifying transmission risk factors, but neither method provides robust estimates of transmission risk ratios. Source attribution can alleviate the drawbacks of phylogenetic clustering, but formal population genetic modeling may be required to estimate quantitative transmission risk factors. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  5. Testing the accuracy of clustering redshifts with simulations

    NASA Astrophysics Data System (ADS)

    Scottez, V.; Benoit-Lévy, A.; Coupon, J.; Ilbert, O.; Mellier, Y.

    2018-03-01

    We explore the accuracy of clustering-based redshift inference within the MICE2 simulation. This method uses the spatial clustering of galaxies between a spectroscopic reference sample and an unknown sample, and this study gives an estimate of the accuracy the method can reach. First, we discuss the requirements on the number of objects in the two samples, confirming that this method does not require a representative spectroscopic sample for calibration. In the context of the next generation of cosmological surveys, we estimated that the density of the Quasi Stellar Objects in BOSS allows us to reach 0.2 per cent accuracy in the mean redshift. Secondly, we estimate individual redshifts for galaxies in the densest regions of colour space (∼30 per cent of the galaxies) without using the photometric redshift procedure. The advantage of this procedure is threefold. It allows: (i) the use of cluster-zs for any field in astronomy, (ii) the possibility to combine photo-zs and cluster-zs to get an improved redshift estimation, (iii) the use of cluster-zs to define tomographic bins for weak lensing. Finally, we explore this last option and build five cluster-z selected tomographic bins from redshift 0.2 to 1. We found a bias on the mean redshift estimate of 0.002 per bin. We conclude that cluster-zs could be used as a primary redshift estimator by the next generation of cosmological surveys.

  6. Using SVD on Clusters to Improve Precision of Interdocument Similarity Measure.

    PubMed

    Zhang, Wen; Xiao, Fan; Li, Bin; Zhang, Siguang

    2016-01-01

    LSI (Latent Semantic Indexing) based on SVD (Singular Value Decomposition) has been proposed to overcome the problems of polysemy and homonymy in traditional lexical matching. However, although it has been validated as having good representative quality, it is often criticized for low discriminative power in representing documents. In this paper, SVD on clusters is proposed to improve the discriminative power of LSI. The contribution of this paper is threefold. Firstly, we survey existing linear algebra methods for LSI, including both SVD-based and non-SVD-based methods. Secondly, we propose SVD on clusters for LSI and explain theoretically that dimension expansion of document vectors and dimension projection using SVD are the two manipulations involved in SVD on clusters. Moreover, we develop updating processes to fold new documents and terms into a matrix decomposed by SVD on clusters. Thirdly, two corpora, a Chinese corpus and an English corpus, are used to evaluate the performance of the proposed methods. Experiments demonstrate that, to some extent, SVD on clusters can improve the precision of the interdocument similarity measure in comparison with other SVD-based LSI methods.
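    The baseline LSI step that "SVD on clusters" builds on can be sketched with a truncated SVD of a toy term-document matrix; the counts, vocabulary, and rank k below are illustrative assumptions, not data from the paper.

    ```python
    import numpy as np

    # Toy term-document matrix: rows = terms, columns = documents.
    # The counts are hypothetical, chosen only for illustration.
    A = np.array([
        [2, 0, 1, 0],   # "cluster"
        [1, 1, 0, 0],   # "svd"
        [0, 2, 0, 1],   # "latent"
        [0, 0, 3, 1],   # "semantic"
    ], dtype=float)

    # Truncated SVD: A ≈ U_k Σ_k V_k^T with k latent dimensions.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = 2
    # Each document becomes a k-dimensional vector in the latent space.
    doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T   # shape (n_docs, k)

    def cosine(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    # Interdocument similarity measured in the latent space rather than
    # by raw lexical overlap.
    sim_01 = cosine(doc_vecs[0], doc_vecs[1])
    ```

    The paper's method would additionally partition the documents into clusters and apply SVD per cluster; the sketch above shows only the shared LSI core.
    
    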

  7. Using SVD on Clusters to Improve Precision of Interdocument Similarity Measure

    PubMed Central

    Xiao, Fan; Li, Bin; Zhang, Siguang

    2016-01-01

    LSI (Latent Semantic Indexing) based on SVD (Singular Value Decomposition) has been proposed to overcome the problems of polysemy and homonymy in traditional lexical matching. However, although it has been validated as having good representative quality, it is often criticized for low discriminative power in representing documents. In this paper, SVD on clusters is proposed to improve the discriminative power of LSI. The contribution of this paper is threefold. Firstly, we survey existing linear algebra methods for LSI, including both SVD-based and non-SVD-based methods. Secondly, we propose SVD on clusters for LSI and explain theoretically that dimension expansion of document vectors and dimension projection using SVD are the two manipulations involved in SVD on clusters. Moreover, we develop updating processes to fold new documents and terms into a matrix decomposed by SVD on clusters. Thirdly, two corpora, a Chinese corpus and an English corpus, are used to evaluate the performance of the proposed methods. Experiments demonstrate that, to some extent, SVD on clusters can improve the precision of the interdocument similarity measure in comparison with other SVD-based LSI methods. PMID:27579031

  8. Estimators for Clustered Education RCTs Using the Neyman Model for Causal Inference

    ERIC Educational Resources Information Center

    Schochet, Peter Z.

    2013-01-01

    This article examines the estimation of two-stage clustered designs for education randomized control trials (RCTs) using the nonparametric Neyman causal inference framework that underlies experiments. The key distinction between the considered causal models is whether potential treatment and control group outcomes are considered to be fixed for…

  9. A new approach to hierarchical data analysis: Targeted maximum likelihood estimation for the causal effect of a cluster-level exposure.

    PubMed

    Balzer, Laura B; Zheng, Wenjing; van der Laan, Mark J; Petersen, Maya L

    2018-01-01

    We often seek to estimate the impact of an exposure naturally occurring or randomly assigned at the cluster-level. For example, the literature on neighborhood determinants of health continues to grow. Likewise, community randomized trials are applied to learn about real-world implementation, sustainability, and population effects of interventions with proven individual-level efficacy. In these settings, individual-level outcomes are correlated due to shared cluster-level factors, including the exposure, as well as social or biological interactions between individuals. To flexibly and efficiently estimate the effect of a cluster-level exposure, we present two targeted maximum likelihood estimators (TMLEs). The first TMLE is developed under a non-parametric causal model, which allows for arbitrary interactions between individuals within a cluster. These interactions include direct transmission of the outcome (i.e. contagion) and influence of one individual's covariates on another's outcome (i.e. covariate interference). The second TMLE is developed under a causal sub-model assuming the cluster-level and individual-specific covariates are sufficient to control for confounding. Simulations compare the alternative estimators and illustrate the potential gains from pairing individual-level risk factors and outcomes during estimation, while avoiding unwarranted assumptions. Our results suggest that estimation under the sub-model can result in bias and misleading inference in an observational setting. Incorporating working assumptions during estimation is more robust than assuming they hold in the underlying causal model. We illustrate our approach with an application to HIV prevention and treatment.

  10. Cluster-cluster clustering

    NASA Technical Reports Server (NTRS)

    Barnes, J.; Dekel, A.; Efstathiou, G.; Frenk, C. S.

    1985-01-01

    The cluster correlation function ξc(r) is compared with the particle correlation function ξ(r) in cosmological N-body simulations with a wide range of initial conditions. The experiments include scale-free initial conditions, pancake models with a coherence length in the initial density field, and hybrid models. Three N-body techniques and two cluster-finding algorithms are used. In scale-free models with white noise initial conditions, ξc and ξ are essentially identical. In scale-free models with more power on large scales, it is found that the amplitude of ξc increases with cluster richness; in this case the clusters give a biased estimate of the particle correlations. In the pancake and hybrid models (with n = 0 or 1), ξc is steeper than ξ, but the cluster correlation length exceeds that of the points by less than a factor of 2, independent of cluster richness. Thus the high amplitude of ξc found in studies of rich clusters of galaxies is inconsistent with white noise and pancake models and may indicate a primordial fluctuation spectrum with substantial power on large scales.
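    For readers unfamiliar with how a correlation function is measured from simulation particles, a minimal sketch of the "natural" pair-count estimator, ξ(r) ≈ (DD/RR) · norm − 1, is shown below. The point counts, box, and bin edges are arbitrary assumptions, and a Poisson point set is used so the true ξ is zero everywhere.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def pair_dists(pts):
        # All unique pairwise distances between points.
        d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
        iu = np.triu_indices(len(pts), k=1)
        return d[iu]

    def xi_natural(data, rand, bins):
        # Natural estimator: xi = (DD/RR) * Nr(Nr-1)/(Nd(Nd-1)) - 1,
        # comparing data pair counts to a random (unclustered) catalog.
        dd, _ = np.histogram(pair_dists(data), bins=bins)
        rr, _ = np.histogram(pair_dists(rand), bins=bins)
        nd, nr = len(data), len(rand)
        norm = (nr * (nr - 1)) / (nd * (nd - 1))
        rr = np.where(rr == 0, 1, rr)   # guard against empty bins
        return dd / rr * norm - 1.0

    # Poisson (unclustered) data in a unit box: expect xi ~ 0 in all bins.
    data = rng.uniform(0, 1, size=(400, 3))
    rand = rng.uniform(0, 1, size=(400, 3))
    bins = np.linspace(0.05, 0.5, 10)
    xi = xi_natural(data, rand, bins)
    ```

    Production analyses use edge-corrected estimators (e.g. Landy-Szalay) and tree-based pair counting, but the comparison of ξc and ξ in the abstract rests on this same pair-count idea.
    
    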

  11. Cosmological parameter estimation from CMB and X-ray cluster after Planck

    NASA Astrophysics Data System (ADS)

    Hu, Jian-Wei; Cai, Rong-Gen; Guo, Zong-Kuan; Hu, Bin

    2014-05-01

    We investigate constraints on cosmological parameters in three 8-parameter models with the summed neutrino mass as a free parameter, through a joint analysis of CCCP X-ray cluster data, the newly released Planck CMB data, and some external data sets including baryon acoustic oscillation measurements from the 6dFGS, SDSS DR7 and BOSS DR9 surveys, and the Hubble Space Telescope H0 measurement. We find that the combined data strongly favor a non-zero summed neutrino mass at more than 3σ confidence level in these non-vanilla models. Allowing the CMB lensing amplitude AL to vary, we find AL > 1 at 3σ confidence level. For dark energy with a constant equation of state w, we obtain w < -1 at 3σ confidence level. The estimate of the matter power spectrum amplitude σ8 is discrepant with the Planck value at 2σ confidence level, which reflects some tension between the X-ray cluster data and the Planck data in these non-vanilla models. The tension can be alleviated by adding a 9% systematic shift in the cluster mass function.

  12. Estimating cougar predation rates from GPS location clusters

    USGS Publications Warehouse

    Anderson, C.R.; Lindzey, F.G.

    2003-01-01

    We examined cougar (Puma concolor) predation from Global Positioning System (GPS) location clusters (≥2 locations within 200 m on the same or consecutive nights) of 11 cougars during September-May, 1999-2001. Location success of GPS averaged 2.4-5.0 of 6 location attempts/night/cougar. We surveyed potential predation sites during summer-fall 2000 and summer 2001 to identify prey composition (n = 74; 3-388 days post predation) and record predation-site variables (n = 97; 3-270 days post predation). We developed a model to estimate probability that a cougar killed a large mammal from data collected at GPS location clusters where the probability of predation increased with number of nights (defined as locations at 2200, 0200, or 0500 hr) of cougar presence within a 200-m radius (P < 0.001). Mean estimated cougar predation rates for large mammals were 7.3 days/kill for subadult females (1-2.5 yr; n = 3, 90% CI: 6.3 to 9.9), 7.0 days/kill for adult females (n = 2, 90% CI: 5.8 to 10.8), 5.4 days/kill for family groups (females with young; n = 3, 90% CI: 4.5 to 8.4), 9.5 days/kill for a subadult male (1-2.5 yr; n = 1, 90% CI: 6.9 to 16.4), and 7.8 days/kill for adult males (n = 2, 90% CI: 6.8 to 10.7). We may have slightly overestimated cougar predation rates due to our inability to separate scavenging from predation. We detected 45 deer (Odocoileus spp.), 15 elk (Cervus elaphus), 6 pronghorn (Antilocapra americana), 2 livestock, 1 moose (Alces alces), and 6 small mammals at cougar predation sites. Comparisons between cougar sexes suggested that females selected mule deer and males selected elk (P < 0.001). Cougars averaged 3.0 nights on pronghorn carcasses, 3.4 nights on deer carcasses, and 6.0 nights on elk carcasses. Most cougar predation (81.7%) occurred between 1901-0500 hr and peaked from 2201-0200 hr (31.7%). Applying GPS technology to identify predation rates and prey selection will allow managers to efficiently estimate the ability of an area's prey base to
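    The model described above relates the probability that a cluster marks a kill to the number of nights of cougar presence; a sketch of fitting such a logistic relationship is shown below. The data and coefficients are entirely synthetic stand-ins, not the study's field data.

    ```python
    import numpy as np

    rng = np.random.default_rng(2)

    # Synthetic clusters: kill probability rises with nights of presence.
    # The coefficients below are assumed for illustration only.
    nights = rng.integers(1, 8, size=500).astype(float)
    true_b0, true_b1 = -2.0, 0.8
    p_true = 1.0 / (1.0 + np.exp(-(true_b0 + true_b1 * nights)))
    kill = (rng.random(500) < p_true).astype(float)

    # Fit logistic regression by gradient ascent on the log-likelihood.
    X = np.column_stack([np.ones_like(nights), nights])
    b = np.zeros(2)
    for _ in range(5000):
        p = 1.0 / (1.0 + np.exp(-X @ b))
        grad = X.T @ (kill - p) / len(kill)   # average score
        b += 0.1 * grad

    # b[1] > 0 recovers the positive nights-of-presence effect.
    ```

    In practice one would use a vetted routine (e.g. a GLM fitter) rather than hand-rolled gradient ascent; the loop above only makes the estimation step explicit.
    
    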

  13. An Improved Clustering Algorithm of Tunnel Monitoring Data for Cloud Computing

    PubMed Central

    Zhong, Luo; Tang, KunHao; Li, Lin; Yang, Guang; Ye, JingJing

    2014-01-01

    With the rapid development of urban construction, the number of urban tunnels is increasing and the data they produce are becoming more and more complex. As a result, traditional clustering algorithms cannot handle the mass data produced by tunnel monitoring. To solve this problem, an improved parallel clustering algorithm based on k-means has been proposed. It is a clustering algorithm that uses MapReduce within cloud computing to process the data. It not only has the advantage of handling mass data but is also more efficient. Moreover, it is able to compute the average dissimilarity degree of each cluster in order to clean the abnormal data. PMID:24982971
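    The core of a MapReduce-style k-means pass can be sketched as follows: each map task assigns its chunk of points to the nearest center and emits per-center partial sums, and the reduce step merges the partials into new centers. This is a generic single-machine sketch of the pattern under synthetic data, not the paper's implementation.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    def kmeans_map(chunk, centers):
        """Map step: assign each point in a chunk to its nearest center,
        emitting per-center partial sums and counts."""
        d = np.linalg.norm(chunk[:, None, :] - centers[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        k, dim = centers.shape
        sums = np.zeros((k, dim))
        counts = np.zeros(k, dtype=int)
        for j in range(k):
            mask = labels == j
            sums[j] = chunk[mask].sum(axis=0)
            counts[j] = mask.sum()
        return sums, counts

    def kmeans_reduce(partials, centers):
        """Reduce step: merge partial sums from all chunks into new centers."""
        total_sums = sum(p[0] for p in partials)
        total_counts = sum(p[1] for p in partials)
        new = centers.copy()
        nonempty = total_counts > 0
        new[nonempty] = total_sums[nonempty] / total_counts[nonempty, None]
        return new

    # Two well-separated synthetic blobs, split into "distributed" chunks.
    a = rng.normal(0.0, 0.3, size=(100, 2))
    b = rng.normal(5.0, 0.3, size=(100, 2))
    chunks = np.array_split(np.vstack([a, b]), 4)

    centers = np.array([[1.0, 1.0], [4.0, 4.0]])
    for _ in range(10):
        partials = [kmeans_map(c, centers) for c in chunks]
        centers = kmeans_reduce(partials, centers)
    ```

    Because only (sum, count) pairs cross the map/reduce boundary, the shuffle volume is independent of the number of points per chunk, which is what makes the pattern attractive for mass monitoring data.
    
    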

  14. Motion estimation in the frequency domain using fuzzy c-planes clustering.

    PubMed

    Erdem, C E; Karabulut, G Z; Yanmaz, E; Anarim, E

    2001-01-01

    A recent work explicitly models the discontinuous motion estimation problem in the frequency domain, where the motion parameters are estimated using a harmonic retrieval approach. The vertical and horizontal components of the motion are independently estimated from the locations of the peaks of the respective periodogram analyses and are paired to obtain the motion vectors using a previously proposed procedure. In this paper, we present a more efficient method that replaces the motion-component pairing task and hence eliminates the problems of that pairing method. The method described in this paper uses the fuzzy c-planes (FCP) clustering approach to fit planes to three-dimensional (3-D) frequency-domain data obtained from the peaks of the periodograms. Experimental results are provided to demonstrate the effectiveness of the proposed method.

  15. Improving Spectral Image Classification through Band-Ratio Optimization and Pixel Clustering

    NASA Astrophysics Data System (ADS)

    O'Neill, M.; Burt, C.; McKenna, I.; Kimblin, C.

    2017-12-01

    The Underground Nuclear Explosion Signatures Experiment (UNESE) seeks to characterize non-prompt observables from underground nuclear explosions (UNE). As part of this effort, we evaluated the ability of DigitalGlobe's WorldView-3 (WV3) to detect and map UNE signatures. WV3 is the current state-of-the-art, commercial, multispectral imaging satellite; however, it has relatively limited spectral and spatial resolutions. These limitations impede image classifiers from detecting targets that are spatially small and lack distinct spectral features. In order to improve classification results, we developed custom algorithms to reduce false positive rates while increasing true positive rates via a band-ratio optimization and pixel clustering front-end. The clusters resulting from these algorithms were processed with standard spectral image classifiers such as Mixture-Tuned Matched Filter (MTMF) and Adaptive Coherence Estimator (ACE). WV3 and AVIRIS data of Cuprite, Nevada, were used as a validation data set. These data were processed with a standard classification approach using MTMF and ACE algorithms. They were also processed using the custom front-end prior to the standard approach. A comparison of the results shows that the custom front-end significantly increases the true positive rate and decreases the false positive rate. This work was done by National Security Technologies, LLC, under Contract No. DE-AC52-06NA25946 with the U.S. Department of Energy. DOE/NV/25946-3283.

  16. An Example of an Improvable Rao-Blackwell Improvement, Inefficient Maximum Likelihood Estimator, and Unbiased Generalized Bayes Estimator.

    PubMed

    Galili, Tal; Meilijson, Isaac

    2016-01-02

    The Rao-Blackwell theorem offers a procedure for converting a crude unbiased estimator of a parameter θ into a "better" one, in fact unique and optimal if the improvement is based on a minimal sufficient statistic that is complete. In contrast, behind every minimal sufficient statistic that is not complete, there is an improvable Rao-Blackwell improvement. This is illustrated via a simple example based on the uniform distribution, in which a rather natural Rao-Blackwell improvement is uniformly improvable. Furthermore, in this example the maximum likelihood estimator is inefficient, and an unbiased generalized Bayes estimator performs exceptionally well. Counterexamples of this sort can be useful didactic tools for explaining the true nature of a methodology and possible consequences when some of the assumptions are violated. [Received December 2014. Revised September 2015.].
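    The basic Rao-Blackwell step behind the article's example can be checked numerically for Uniform(0, θ): conditioning the crude unbiased estimator 2·x̄ on the sufficient statistic max(x) gives ((n+1)/n)·max(x), since given the maximum m the remaining n−1 points are iid Uniform(0, m). The sketch below shows only this first improvement step, not the article's further construction for an incomplete minimal sufficient statistic.

    ```python
    import numpy as np

    rng = np.random.default_rng(3)

    theta, n, reps = 1.0, 10, 20000
    x = rng.uniform(0, theta, size=(reps, n))

    # Crude unbiased estimator of theta: twice the sample mean.
    crude = 2 * x.mean(axis=1)

    # Rao-Blackwellized version: E[2*mean | max(x) = m] = ((n+1)/n) * m.
    rb = (n + 1) / n * x.max(axis=1)

    mse_crude = np.mean((crude - theta) ** 2)   # theory: theta^2 / (3n)
    mse_rb = np.mean((rb - theta) ** 2)         # theory: theta^2 / (n(n+2))
    ```

    Both estimators are unbiased, but the conditioned one has roughly a quarter of the crude estimator's mean squared error at n = 10 (1/120 vs 1/30).
    
    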

  17. Astrophysical properties of star clusters in the Magellanic Clouds homogeneously estimated by ASteCA

    NASA Astrophysics Data System (ADS)

    Perren, G. I.; Piatti, A. E.; Vázquez, R. A.

    2017-06-01

    Aims: We seek to produce a homogeneous catalog of astrophysical parameters of 239 resolved star clusters, located in the Small and Large Magellanic Clouds, observed in the Washington photometric system. Methods: The cluster sample was processed with the recently introduced Automated Stellar Cluster Analysis (ASteCA) package, which ensures both an automatized and a fully reproducible treatment, together with a statistically based analysis of their fundamental parameters and associated uncertainties. The fundamental parameters determined for each cluster with this tool, via a color-magnitude diagram (CMD) analysis, are metallicity, age, reddening, distance modulus, and total mass. Results: We generated a homogeneous catalog of structural and fundamental parameters for the studied cluster sample and performed a detailed internal error analysis along with a thorough comparison with values taken from 26 published articles. We studied the distribution of cluster fundamental parameters in both Clouds and obtained their age-metallicity relationships. Conclusions: The ASteCA package can be applied to an unsupervised determination of fundamental cluster parameters, which is a task of increasing relevance as more data becomes available through upcoming surveys. A table with the estimated fundamental parameters for the 239 clusters analyzed is only available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (http://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/602/A89

  18. Estimating Ω from Galaxy Redshifts: Linear Flow Distortions and Nonlinear Clustering

    NASA Astrophysics Data System (ADS)

    Bromley, B. C.; Warren, M. S.; Zurek, W. H.

    1997-02-01

    We propose a method to determine the cosmic mass density Ω from redshift-space distortions induced by large-scale flows in the presence of nonlinear clustering. Nonlinear structures in redshift space, such as fingers of God, can contaminate distortions from linear flows on scales as large as several times the small-scale pairwise velocity dispersion σv. Following Peacock & Dodds, we work in the Fourier domain and propose a model to describe the anisotropy in the redshift-space power spectrum; tests with high-resolution numerical data demonstrate that the model is robust for both mass and biased galaxy halos on translinear scales and above. On the basis of this model, we propose an estimator of the linear growth parameter β = Ω^0.6/b, where b measures bias, derived from sampling functions that are tuned to eliminate distortions from nonlinear clustering. The measure is tested on the numerical data and found to recover the true value of β to within ~10%. An analysis of IRAS 1.2 Jy galaxies yields β = 0.8 (+0.4/−0.3) at a scale of 1000 km s-1, which is close to optimal given the shot noise and finite size of the survey. This measurement is consistent with dynamical estimates of β derived from both real-space and redshift-space information. The importance of the method presented here is that nonlinear clustering effects are removed to enable linear correlation anisotropy measurements on scales approaching the translinear regime. We discuss implications for analyses of forthcoming optical redshift surveys in which the dispersion is more than a factor of 2 greater than in the IRAS data.

  19. Estimating accuracy of land-cover composition from two-stage cluster sampling

    USGS Publications Warehouse

    Stehman, S.V.; Wickham, J.D.; Fattorini, L.; Wade, T.D.; Baffetta, F.; Smith, J.H.

    2009-01-01

    Land-cover maps are often used to compute land-cover composition (i.e., the proportion or percent of area covered by each class), for each unit in a spatial partition of the region mapped. We derive design-based estimators of mean deviation (MD), mean absolute deviation (MAD), root mean square error (RMSE), and correlation (CORR) to quantify accuracy of land-cover composition for a general two-stage cluster sampling design, and for the special case of simple random sampling without replacement (SRSWOR) at each stage. The bias of the estimators for the two-stage SRSWOR design is evaluated via a simulation study. The estimators of RMSE and CORR have small bias except when sample size is small and the land-cover class is rare. The estimator of MAD is biased for both rare and common land-cover classes except when sample size is large. A general recommendation is that rare land-cover classes require large sample sizes to ensure that the accuracy estimators have small bias. © 2009 Elsevier Inc.
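    As a simplified illustration of the design-based idea, the sketch below estimates MD and RMSE from a one-stage SRSWOR of map units with hypothetical per-unit compositions. This is the degenerate case where each sampled unit's deviation is fully observed, not the paper's general two-stage design with subsampling within units.

    ```python
    import numpy as np

    rng = np.random.default_rng(4)

    # Hypothetical population: for each of 1000 spatial units we have the
    # mapped proportion and the reference (true) proportion of one class.
    N = 1000
    ref = rng.beta(2, 5, size=N)
    map_p = np.clip(ref + rng.normal(0, 0.05, size=N), 0, 1)
    dev = map_p - ref                      # per-unit deviation

    # SRSWOR of m units; plug sampled deviations into the MD and RMSE
    # population formulas.
    m = 200
    idx = rng.choice(N, size=m, replace=False)
    d = dev[idx]
    md_hat = d.mean()
    rmse_hat = np.sqrt(np.mean(d ** 2))

    # Population targets, for comparison.
    md_true = dev.mean()
    rmse_true = np.sqrt(np.mean(dev ** 2))
    ```

    In the full two-stage design, each unit's composition is itself estimated from sampled pixels, which is what introduces the bias the paper studies for MAD and for rare classes.
    
    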

  20. Improved Estimates of Thermodynamic Parameters

    NASA Technical Reports Server (NTRS)

    Lawson, D. D.

    1982-01-01

    Techniques refined for estimating heat of vaporization and other parameters from molecular structure. Using parabolic equation with three adjustable parameters, heat of vaporization can be used to estimate boiling point, and vice versa. Boiling points and vapor pressures for some nonpolar liquids were estimated by improved method and compared with previously reported values. Technique for estimating thermodynamic parameters should make it easier for engineers to choose among candidate heat-exchange fluids for thermochemical cycles.

  1. Infant immunization coverage in Italy: estimates by simultaneous EPI cluster surveys of regions. ICONA Study Group.

    PubMed Central

    Salmaso, S.; Rota, M. C.; Ciofi Degli Atti, M. L.; Tozzi, A. E.; Kreidl, P.

    1999-01-01

    In 1998, a series of regional cluster surveys (the ICONA Study) was conducted simultaneously in 19 out of the 20 regions in Italy to estimate the mandatory immunization coverage of children aged 12-24 months with oral poliovirus (OPV), diphtheria-tetanus (DT) and viral hepatitis B (HBV) vaccines, as well as optional immunization coverage with pertussis, measles and Haemophilus influenzae b (Hib) vaccines. The study children were born in 1996 and selected from birth registries using the Expanded Programme of Immunization (EPI) cluster sampling technique. Interviews with parents were conducted to determine each child's immunization status and the reasons for any missed or delayed vaccinations. The study population comprised 4310 children aged 12-24 months. Coverage for both mandatory and optional vaccinations differed by region. The overall coverage for mandatory vaccines (OPV, DT and HBV) exceeded 94%, but only 79% had been vaccinated in accord with the recommended schedule (i.e. during the first year of life). Immunization coverage for pertussis increased from 40% (1993 survey) to 88%, but measles coverage (56%) remained inadequate for controlling the disease; Hib coverage was 20%. These results confirm that in Italy the coverage of only mandatory immunizations is satisfactory. Pertussis immunization coverage has improved dramatically since the introduction of acellular vaccines. A greater effort to educate parents and physicians is still needed to improve the coverage of optional vaccinations in all regions. PMID:10593033

  2. Infant immunization coverage in Italy: estimates by simultaneous EPI cluster surveys of regions. ICONA Study Group.

    PubMed

    Salmaso, S; Rota, M C; Ciofi Degli Atti, M L; Tozzi, A E; Kreidl, P

    1999-01-01

    In 1998, a series of regional cluster surveys (the ICONA Study) was conducted simultaneously in 19 out of the 20 regions in Italy to estimate the mandatory immunization coverage of children aged 12-24 months with oral poliovirus (OPV), diphtheria-tetanus (DT) and viral hepatitis B (HBV) vaccines, as well as optional immunization coverage with pertussis, measles and Haemophilus influenzae b (Hib) vaccines. The study children were born in 1996 and selected from birth registries using the Expanded Programme of Immunization (EPI) cluster sampling technique. Interviews with parents were conducted to determine each child's immunization status and the reasons for any missed or delayed vaccinations. The study population comprised 4310 children aged 12-24 months. Coverage for both mandatory and optional vaccinations differed by region. The overall coverage for mandatory vaccines (OPV, DT and HBV) exceeded 94%, but only 79% had been vaccinated in accord with the recommended schedule (i.e. during the first year of life). Immunization coverage for pertussis increased from 40% (1993 survey) to 88%, but measles coverage (56%) remained inadequate for controlling the disease; Hib coverage was 20%. These results confirm that in Italy the coverage of only mandatory immunizations is satisfactory. Pertussis immunization coverage has improved dramatically since the introduction of acellular vaccines. A greater effort to educate parents and physicians is still needed to improve the coverage of optional vaccinations in all regions.

  3. Application of adaptive cluster sampling to low-density populations of freshwater mussels

    USGS Publications Warehouse

    Smith, D.R.; Villella, R.F.; Lemarie, D.P.

    2003-01-01

    Freshwater mussels appear to be promising candidates for adaptive cluster sampling because they are benthic macroinvertebrates that cluster spatially and are frequently found at low densities. We applied adaptive cluster sampling to estimate density of freshwater mussels at 24 sites along the Cacapon River, WV, where a preliminary timed search indicated that mussels were present at low density. Adaptive cluster sampling increased yield of individual mussels and detection of uncommon species; however, it did not improve precision of density estimates. Because finding uncommon species, collecting individuals of those species, and estimating their densities are important conservation activities, additional research is warranted on application of adaptive cluster sampling to freshwater mussels. However, at this time we do not recommend routine application of adaptive cluster sampling to freshwater mussel populations. The ultimate, and currently unanswered, question is how to tell when adaptive cluster sampling should be used, i.e., when is a population sufficiently rare and clustered for adaptive cluster sampling to be efficient and practical? A cost-effective procedure needs to be developed to identify biological populations for which adaptive cluster sampling is appropriate.

  4. Combining optimization methods with response spectra curve-fitting toward improved damping ratio estimation

    NASA Astrophysics Data System (ADS)

    Brewick, Patrick T.; Smyth, Andrew W.

    2016-12-01

    The authors have previously shown that many traditional approaches to operational modal analysis (OMA) struggle to properly identify the modal damping ratios for bridges under traffic loading due to the interference caused by the driving frequencies of the traffic loads. This paper presents a novel methodology for modal parameter estimation in OMA that overcomes the problems presented by driving frequencies and significantly improves the damping estimates. This methodology is based on finding the power spectral density (PSD) of a given modal coordinate, and then dividing the modal PSD into separate regions, left- and right-side spectra. The modal coordinates were found using a blind source separation (BSS) algorithm and a curve-fitting technique was developed that uses optimization to find the modal parameters that best fit each side spectra of the PSD. Specifically, a pattern-search optimization method was combined with a clustering analysis algorithm and together they were employed in a series of stages in order to improve the estimates of the modal damping ratios. This method was used to estimate the damping ratios from a simulated bridge model subjected to moving traffic loads. The results of this method were compared to other established OMA methods, such as Frequency Domain Decomposition (FDD) and BSS methods, and they were found to be more accurate and more reliable, even for modes that had their PSDs distorted or altered by driving frequencies.
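    For context, the simplest classical alternative to spectral curve-fitting of damping is the logarithmic decrement, sketched here on a noise-free synthetic single-mode free decay. The 1.5 Hz frequency and 2% damping ratio are assumed values, and this textbook baseline is not the authors' PSD-fitting method; it breaks down precisely in the operational (traffic-excited) setting the paper addresses.

    ```python
    import numpy as np

    # Damped free decay of a single mode: x(t) = exp(-z*wn*t) * cos(wd*t).
    z_true, wn = 0.02, 2 * np.pi * 1.5         # assumed damping and frequency
    wd = wn * np.sqrt(1 - z_true ** 2)
    t = np.linspace(0, 20, 20000)
    x = np.exp(-z_true * wn * t) * np.cos(wd * t)

    # Successive positive peaks of the decay.
    peaks = [i for i in range(1, len(x) - 1)
             if x[i] > x[i - 1] and x[i] > x[i + 1] and x[i] > 0]

    # Logarithmic decrement over 10 cycles, then invert
    # delta = 2*pi*z / sqrt(1 - z^2) for the damping ratio.
    p0, pk = peaks[0], peaks[10]
    delta = np.log(x[p0] / x[pk]) / 10
    z_hat = delta / np.sqrt(4 * np.pi ** 2 + delta ** 2)
    ```

    With ambient traffic excitation there is no clean free decay to measure peaks on, which is why the paper instead fits the side spectra of the modal PSD.
    
    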

  5. Cosmological parameter estimation from CMB and X-ray cluster after Planck

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hu, Jian-Wei; Cai, Rong-Gen; Guo, Zong-Kuan

    We investigate constraints on cosmological parameters in three 8-parameter models with the summed neutrino mass as a free parameter, through a joint analysis of CCCP X-ray cluster data, the newly released Planck CMB data, and some external data sets including baryon acoustic oscillation measurements from the 6dFGS, SDSS DR7 and BOSS DR9 surveys, and the Hubble Space Telescope H0 measurement. We find that the combined data strongly favor a non-zero summed neutrino mass at more than 3σ confidence level in these non-vanilla models. Allowing the CMB lensing amplitude AL to vary, we find AL > 1 at 3σ confidence level. For dark energy with a constant equation of state w, we obtain w < −1 at 3σ confidence level. The estimate of the matter power spectrum amplitude σ8 is discrepant with the Planck value at 2σ confidence level, which reflects some tension between the X-ray cluster data and the Planck data in these non-vanilla models. The tension can be alleviated by adding a 9% systematic shift in the cluster mass function.

  6. A smart checkpointing scheme for improving the reliability of clustering routing protocols.

    PubMed

    Min, Hong; Jung, Jinman; Kim, Bongjae; Cho, Yookun; Heo, Junyoung; Yi, Sangho; Hong, Jiman

    2010-01-01

    In wireless sensor networks, system architectures and applications are designed to consider both resource constraints and scalability, because such networks are composed of numerous sensor nodes with various sensors and actuators, small memories, low-power microprocessors, radio modules, and batteries. Clustering routing protocols based on data aggregation schemes aimed at minimizing packet numbers have been proposed to meet these requirements. In clustering routing protocols, the cluster head plays an important role. The cluster head collects data from its member nodes and aggregates the collected data. To improve reliability and reduce recovery latency, we propose a checkpointing scheme for the cluster head. In the proposed scheme, backup nodes monitor and checkpoint the current state of the cluster head periodically. We also derive the checkpointing interval that maximizes reliability while using the same amount of energy consumed by clustering routing protocols that operate without checkpointing. Experimental comparisons with existing non-checkpointing schemes show that our scheme reduces both energy consumption and recovery latency.

  7. A Smart Checkpointing Scheme for Improving the Reliability of Clustering Routing Protocols

    PubMed Central

    Min, Hong; Jung, Jinman; Kim, Bongjae; Cho, Yookun; Heo, Junyoung; Yi, Sangho; Hong, Jiman

    2010-01-01

    In wireless sensor networks, system architectures and applications are designed to consider both resource constraints and scalability, because such networks are composed of numerous sensor nodes with various sensors and actuators, small memories, low-power microprocessors, radio modules, and batteries. Clustering routing protocols based on data aggregation schemes aimed at minimizing packet numbers have been proposed to meet these requirements. In clustering routing protocols, the cluster head plays an important role. The cluster head collects data from its member nodes and aggregates the collected data. To improve reliability and reduce recovery latency, we propose a checkpointing scheme for the cluster head. In the proposed scheme, backup nodes monitor and checkpoint the current state of the cluster head periodically. We also derive the checkpointing interval that maximizes reliability while using the same amount of energy consumed by clustering routing protocols that operate without checkpointing. Experimental comparisons with existing non-checkpointing schemes show that our scheme reduces both energy consumption and recovery latency. PMID:22163389

  8. An effective fuzzy kernel clustering analysis approach for gene expression data.

    PubMed

    Sun, Lin; Xu, Jiucheng; Yin, Jiaojiao

    2015-01-01

Fuzzy clustering is an important tool for analyzing microarray data. A major problem in applying fuzzy clustering methods to microarray gene expression data is the choice of parameters, namely the cluster number and the cluster centers. This paper proposes a new approach to fuzzy kernel clustering analysis (FKCA) that identifies the desired cluster number and obtains more stable results for gene expression data. First, to optimize characteristic differences and estimate the optimal cluster number, a Gaussian kernel function is introduced to improve the spectrum analysis method (SAM). By combining subtractive clustering with the max-min distance mean, a maximum distance method (MDM) is proposed to determine cluster centers. Then, the corresponding steps of the improved SAM (ISAM) and MDM are given, and their superiority and stability are illustrated through experimental comparisons on gene expression data. Finally, by introducing ISAM and MDM into FKCA, an effective improved FKCA algorithm is proposed. Experimental results from public gene expression data and the UCI database show that the proposed algorithms are feasible for cluster analysis, and their clustering accuracy is higher than that of other related clustering algorithms.
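FKCA builds on fuzzy memberships of the fuzzy c-means family; as a minimal sketch of that underlying idea (not the paper's ISAM or MDM procedures), the membership of one point, given its distances to each candidate centre, is:

```python
def fcm_memberships(distances, m=2.0):
    """Fuzzy c-means membership of a single point given its distances
    to each cluster centre: u_i = 1 / sum_k (d_i / d_k)^(2/(m-1)).
    m > 1 is the fuzzifier (m = 2 is the common default)."""
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** p for d_k in distances)
            for d_i in distances]
```

A point at distances 1 and 3 from two centres gets memberships 0.9 and 0.1; memberships always sum to one.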

  9. Distributed Noise Generation for Density Estimation Based Clustering without Trusted Third Party

    NASA Astrophysics Data System (ADS)

    Su, Chunhua; Bao, Feng; Zhou, Jianying; Takagi, Tsuyoshi; Sakurai, Kouichi

The rapid growth of the Internet provides people with tremendous opportunities for data collection, knowledge discovery and cooperative computation. However, it also brings the problem of sensitive information leakage. Both individuals and enterprises may suffer from massive data collection and information retrieval by distrusted parties. In this paper, we propose a privacy-preserving protocol for distributed kernel density estimation-based clustering. Our scheme applies the random data perturbation (RDP) technique and verifiable secret sharing to solve the security problem of the distributed kernel density estimation in [4], which assumed an intermediate party to help in the computation.
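For reference, the quantity the parties jointly compute is an ordinary kernel density estimate; a plain, unperturbed Gaussian KDE with an assumed bandwidth looks like this (the RDP noise and secret sharing of the protocol are omitted):

```python
import math

def gaussian_kde(x, sample, h=1.0):
    """Kernel density estimate at x from a 1-D sample with bandwidth h.
    In the protocol, each party perturbs its contribution before the
    sum is aggregated; this sketch shows only the underlying KDE."""
    norm = len(sample) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in sample) / norm
```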

  10. Clustering ENTLN sferics to improve TGF temporal analysis

    NASA Astrophysics Data System (ADS)

    Pradhan, E.; Briggs, M. S.; Stanbro, M.; Cramer, E.; Heckman, S.; Roberts, O.

    2017-12-01

Using TGFs detected with the Fermi Gamma-ray Burst Monitor (GBM) and simultaneous radio sferics detected by the Earth Networks Total Lightning Network (ENTLN), we establish a temporal correlation between them. The first step is to find ENTLN strokes that are closely associated with GBM TGFs. We then identify all the related strokes in the lightning flash that the TGF-associated stroke belongs to. After trying several algorithms, we found that the DBSCAN clustering algorithm was best for clustering related ENTLN strokes into flashes. The operation of DBSCAN was optimized using a single separation measure that combined time and distance separation. Previous analysis found that these strokes show three timescales with respect to the gamma-ray time. We will use the improved identification of flashes to research this.
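A minimal DBSCAN over a combined time-and-distance separation can be sketched as follows; the scale constants tau (seconds) and rho (kilometres) that merge the two dimensions are illustrative, not the values tuned in the study:

```python
import math

def combined_sep(a, b, tau=0.5, rho=10.0):
    """Dimensionless separation of two strokes a, b = (time_s, dist_km),
    mixing time and distance with assumed scales tau and rho."""
    return math.hypot((a[0] - b[0]) / tau, (a[1] - b[1]) / rho)

def dbscan(points, eps=1.0, min_samples=2, metric=combined_sep):
    """Textbook DBSCAN: density-reachable points share a flash label;
    isolated strokes get the noise label -1."""
    labels = [None] * len(points)
    cluster = -1
    def neighbors(i):
        return [j for j in range(len(points))
                if metric(points[i], points[j]) <= eps]
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_samples:
            labels[i] = -1           # noise, may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        seeds = [j for j in nbrs if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise re-labelled as border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            jn = neighbors(j)
            if len(jn) >= min_samples:
                seeds.extend(jn)     # j is a core point: keep expanding
    return labels
```

Strokes closer than eps in the combined measure chain into one flash; a lone stroke far from any other is flagged as noise.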

  11. Uncertainties in the cluster-cluster correlation function

    NASA Astrophysics Data System (ADS)

    Ling, E. N.; Frenk, C. S.; Barrow, J. D.

    1986-12-01

The bootstrap resampling technique is applied to estimate sampling errors and significance levels of the two-point correlation functions determined for a subset of the CfA redshift survey of galaxies and a redshift sample of 104 Abell clusters. The angular correlation function for a sample of 1664 Abell clusters is also calculated. The standard errors in xi(r) for the Abell data are found to be considerably larger than quoted 'Poisson errors'. The best estimate for the ratio of the correlation length of Abell clusters (richness class R greater than or equal to 1, distance class D less than or equal to 4) to that of CfA galaxies is 4.2 (+1.4/−1.0, 68th-percentile error). The enhancement of cluster clustering over galaxy clustering is statistically significant in the presence of resampling errors. The uncertainties found do not include the effects of possible systematic biases in the galaxy and cluster catalogs and could be regarded as lower bounds on the true uncertainty range.
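The bootstrap procedure itself is generic: resample the catalogue with replacement many times and take the spread of the recomputed statistic as its sampling error. A minimal sketch (the sample and statistic here are placeholders, not the Abell or CfA data):

```python
import random
import statistics

def bootstrap_se(sample, statistic, n_boot=1000, seed=0):
    """Standard error of `statistic` estimated by resampling the
    sample with replacement n_boot times."""
    rng = random.Random(seed)
    n = len(sample)
    reps = [statistic([sample[rng.randrange(n)] for _ in range(n)])
            for _ in range(n_boot)]
    return statistics.stdev(reps)
```

For the mean of 100 evenly spaced values this returns roughly sigma/sqrt(n), as classical theory predicts; for a correlation function the same resampling captures errors that the Poisson formula understates.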

  12. A Novel Tool Improves Existing Estimates of Recent Tuberculosis Transmission in Settings of Sparse Data Collection.

    PubMed

    Kasaie, Parastu; Mathema, Barun; Kelton, W David; Azman, Andrew S; Pennington, Jeff; Dowdy, David W

    2015-01-01

    In any setting, a proportion of incident active tuberculosis (TB) reflects recent transmission ("recent transmission proportion"), whereas the remainder represents reactivation. Appropriately estimating the recent transmission proportion has important implications for local TB control, but existing approaches have known biases, especially where data are incomplete. We constructed a stochastic individual-based model of a TB epidemic and designed a set of simulations (derivation set) to develop two regression-based tools for estimating the recent transmission proportion from five inputs: underlying TB incidence, sampling coverage, study duration, clustered proportion of observed cases, and proportion of observed clusters in the sample. We tested these tools on a set of unrelated simulations (validation set), and compared their performance against that of the traditional 'n-1' approach. In the validation set, the regression tools reduced the absolute estimation bias (difference between estimated and true recent transmission proportion) in the 'n-1' technique by a median [interquartile range] of 60% [9%, 82%] and 69% [30%, 87%]. The bias in the 'n-1' model was highly sensitive to underlying levels of study coverage and duration, and substantially underestimated the recent transmission proportion in settings of incomplete data coverage. By contrast, the regression models' performance was more consistent across different epidemiological settings and study characteristics. We provide one of these regression models as a user-friendly, web-based tool. Novel tools can improve our ability to estimate the recent TB transmission proportion from data that are observable (or estimable) by public health practitioners with limited available molecular data.
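The traditional 'n-1' benchmark referenced above has a direct closed form: every genotype cluster of size n contributes n-1 presumed recent transmissions. A sketch, assuming cluster sizes are available:

```python
def n_minus_1_proportion(cluster_sizes, total_cases):
    """Classic 'n-1' estimate of the recent transmission proportion.
    cluster_sizes lists genotype-cluster sizes (size-1 entries are
    unique, i.e. unclustered, isolates)."""
    recent = sum(n - 1 for n in cluster_sizes if n > 1)
    return recent / total_cases
```

With clusters of sizes 3 and 2 plus three unique isolates (8 cases total), the estimate is 3/8; the study shows this figure is biased low when sampling coverage or study duration is limited, which is what the regression tools correct for.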

  13. A Novel Tool Improves Existing Estimates of Recent Tuberculosis Transmission in Settings of Sparse Data Collection

    PubMed Central

    Kasaie, Parastu; Mathema, Barun; Kelton, W. David; Azman, Andrew S.; Pennington, Jeff; Dowdy, David W.

    2015-01-01

    In any setting, a proportion of incident active tuberculosis (TB) reflects recent transmission (“recent transmission proportion”), whereas the remainder represents reactivation. Appropriately estimating the recent transmission proportion has important implications for local TB control, but existing approaches have known biases, especially where data are incomplete. We constructed a stochastic individual-based model of a TB epidemic and designed a set of simulations (derivation set) to develop two regression-based tools for estimating the recent transmission proportion from five inputs: underlying TB incidence, sampling coverage, study duration, clustered proportion of observed cases, and proportion of observed clusters in the sample. We tested these tools on a set of unrelated simulations (validation set), and compared their performance against that of the traditional ‘n-1’ approach. In the validation set, the regression tools reduced the absolute estimation bias (difference between estimated and true recent transmission proportion) in the ‘n-1’ technique by a median [interquartile range] of 60% [9%, 82%] and 69% [30%, 87%]. The bias in the ‘n-1’ model was highly sensitive to underlying levels of study coverage and duration, and substantially underestimated the recent transmission proportion in settings of incomplete data coverage. By contrast, the regression models’ performance was more consistent across different epidemiological settings and study characteristics. We provide one of these regression models as a user-friendly, web-based tool. Novel tools can improve our ability to estimate the recent TB transmission proportion from data that are observable (or estimable) by public health practitioners with limited available molecular data. PMID:26679499

  14. The observed clustering of damaging extra-tropical cyclones in Europe

    NASA Astrophysics Data System (ADS)

    Cusack, S.

    2015-12-01

The clustering of severe European windstorms on annual timescales has substantial impacts on the re/insurance industry. Management of the risk is impaired by large uncertainties in estimates of clustering from historical storm datasets typically covering the past few decades. The uncertainties are unusually large because clustering depends on the variance of storm counts. Eight storm datasets are gathered for analysis in this study in order to reduce these uncertainties. Six of the datasets contain more than 100 years of severe storm information to reduce sampling errors, and the diversity of information sources and analysis methods between datasets samples observational errors. All storm severity measures used in this study reflect damage, to suit re/insurance applications. It is found that the shortest storm dataset, 42 years in length, provides estimates of clustering with very large sampling and observational errors. The dataset does provide some useful information: indications of stronger clustering for more severe storms, particularly for southern countries off the main storm track. However, substantially different results are produced by the removal of one stormy season, 1989/1990, which illustrates the large uncertainties of a 42-year dataset. The extended storm records place 1989/1990 into a much longer historical context to produce more robust estimates of clustering. All the extended storm datasets show a greater degree of clustering with increasing storm severity and suggest that clustering of severe storms is much more material than that of weaker storms. Further, they contain signs of stronger clustering in areas off the main storm track, and weaker clustering for smaller-sized areas, though these signals are smaller than the uncertainties in the actual values. Both the improvement of existing storm records and the development of new historical storm datasets would help to improve management of this risk.
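A simple diagnostic for this kind of clustering is the dispersion of annual storm counts relative to a Poisson process (an illustrative index, not the paper's exact estimator):

```python
import statistics

def dispersion_index(annual_counts):
    """var/mean - 1 of yearly storm counts: approximately 0 for an
    unclustered (Poisson) process, positive when storm-rich seasons
    cluster together."""
    m = statistics.mean(annual_counts)
    return statistics.pvariance(annual_counts) / m - 1.0
```

Because the index depends on the variance of counts, a single extreme season such as 1989/1990 can move it substantially in a short record, which is exactly the sensitivity the extended datasets address.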

  15. Motion estimation using point cluster method and Kalman filter.

    PubMed

    Senesh, M; Wolf, A

    2009-05-01

The most frequently used method in three-dimensional human gait analysis involves placing markers on the skin of the analyzed segment. This introduces a significant artifact, which strongly influences the bone position and orientation and joint kinematic estimates. In this study, we tested and evaluated the effect of adding a Kalman filter procedure to the previously reported point cluster technique (PCT) in the estimation of rigid body motion. We demonstrated the procedures by motion analysis of a compound planar pendulum from indirect opto-electronic measurements of markers attached to an elastic appendage that is restrained to slide along the rigid body's long axis. The elastic frequency is close to the pendulum frequency, as in the biomechanical problem, where the soft tissue frequency content is similar to the actual movement of the bones. Comparison of the real pendulum angle to that obtained by several estimation procedures (PCT, Kalman filter followed by PCT, and low-pass filter followed by PCT) enables evaluation of the accuracy of the procedures. When comparing the maximal amplitude, no effect was noted from adding the Kalman filter; however, a closer look at the signal revealed that the estimated angle based only on the PCT method was very noisy, with fluctuations, while the estimated angle based on the Kalman filter followed by the PCT was a smooth signal. It was also noted that the instantaneous frequencies obtained from the estimated angle based on the PCT method are more dispersed than those obtained from the estimated angle based on the Kalman filter followed by the PCT method. Addition of a Kalman filter to the PCT method in the estimation procedure of rigid body motion results in a smoother signal that better represents the real motion, with less signal distortion than when using a digital low-pass filter. Furthermore, it can be concluded that adding a Kalman filter to the PCT procedure substantially reduces the dispersion of the maximal and minimal
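A scalar random-walk Kalman filter of the kind that could precede the PCT step fits in a few lines; the process and measurement variances q and r below are assumed values, and the full procedure filters 3-D marker clusters rather than a single angle:

```python
def kalman_1d(measurements, q=1e-4, r=0.0025, x0=0.0, p0=1.0):
    """Scalar random-walk Kalman filter smoothing a noisy angle series.
    q: process variance, r: measurement variance (assumed values)."""
    x, p, out = x0, p0, []
    for z in measurements:
        p += q                  # predict: the state random-walks
        k = p / (p + r)         # Kalman gain
        x += k * (z - x)        # update with measurement z
        p *= 1.0 - k
        out.append(x)
    return out
```

Unlike a fixed low-pass filter, the gain k adapts to the running uncertainty, which is why the filtered angle tracks the motion with less distortion.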

  16. Propensity score matching with clustered data. An application to the estimation of the impact of caesarean section on the Apgar score.

    PubMed

    Arpino, Bruno; Cannas, Massimo

    2016-05-30

This article focuses on the implementation of propensity score matching for clustered data. Different approaches to reduce bias due to cluster-level confounders are considered and compared using Monte Carlo simulations. We investigated methods that exploit the clustered structure of the data in two ways: in the estimation of the propensity score model (through the inclusion of fixed or random effects) or in the implementation of the matching algorithm. In addition to a pure within-cluster matching, we also assessed the performance of a new approach, 'preferential' within-cluster matching. This approach first searches for control units to be matched to treated units within the same cluster. If matching is not possible within-cluster, then the algorithm searches in other clusters. All considered approaches successfully reduced the bias due to the omission of a cluster-level confounder. The preferential within-cluster matching approach, combining the advantages of within-cluster and between-cluster matching, showed a relatively good performance in the presence of both large and small clusters, and it was often the best method. An important advantage of this approach is that it reduces the number of unmatched units as compared with a pure within-cluster matching. We applied these methods to the estimation of the effect of caesarean section on the Apgar score using birth register data. Copyright © 2016 John Wiley & Sons, Ltd.
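The preferential logic can be sketched as a greedy 1:1 matcher that looks inside the treated unit's own cluster first and falls back to the full control pool; the data layout and caliper here are hypothetical, and the authors' implementation may differ in its matching details:

```python
def preferential_match(treated, controls, caliper=0.1):
    """Greedy 1:1 propensity-score matching, preferring a control from
    the treated unit's own cluster.  Units are (id, cluster, score)."""
    available = list(controls)
    matches = {}
    for uid, cl, ps in treated:
        same = [c for c in available if c[1] == cl]
        for pool in (same, available):      # own cluster first, then all
            best = min(pool, key=lambda c: abs(c[2] - ps), default=None)
            if best is not None and abs(best[2] - ps) <= caliper:
                matches[uid] = best[0]
                available.remove(best)
                break
    return matches
```

A treated unit is matched within its cluster even when a slightly closer control exists elsewhere; the cross-cluster fallback only triggers when no within-cluster control passes the caliper, which is what reduces the number of unmatched units relative to pure within-cluster matching.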

  17. A review on cluster estimation methods and their application to neural spike data.

    PubMed

    Zhang, James; Nguyen, Thanh; Cogill, Steven; Bhatti, Asim; Luo, Lingkun; Yang, Samuel; Nahavandi, Saeid

    2018-06-01

    The extracellular action potentials recorded on an electrode result from the collective simultaneous electrophysiological activity of an unknown number of neurons. Identifying and assigning these action potentials to their firing neurons-'spike sorting'-is an indispensable step in studying the function and the response of an individual or ensemble of neurons to certain stimuli. Given the task of neural spike sorting, the determination of the number of clusters (neurons) is arguably the most difficult and challenging issue, due to the existence of background noise and the overlap and interactions among neurons in neighbouring regions. It is not surprising that some researchers still rely on visual inspection by experts to estimate the number of clusters in neural spike sorting. Manual inspection, however, is not suitable to processing the vast, ever-growing amount of neural data. To address this pressing need, in this paper, thirty-three clustering validity indices have been comprehensively reviewed and implemented to determine the number of clusters in neural datasets. To gauge the suitability of the indices to neural spike data, and inform the selection process, we then calculated the indices by applying k-means clustering to twenty widely used synthetic neural datasets and one empirical dataset, and compared the performance of these indices against pre-existing ground truth labels. The results showed that the top five validity indices work consistently well across variations in noise level, both for the synthetic datasets and the real dataset. Using these top performing indices provides strong support for the determination of the number of neural clusters, which is essential in the spike sorting process.
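One classic index in this family, the Calinski-Harabasz ratio of between- to within-cluster variance, is easy to state for 1-D features; scanning it over candidate cluster counts and taking the maximum is the usual selection rule (a sketch, not the review's implementation):

```python
import statistics

def calinski_harabasz(values, labels):
    """CH validity index for 1-D data: larger values indicate
    better-separated, more compact clusters.  Requires 1 < k < n."""
    n, overall = len(values), statistics.mean(values)
    clusters = {}
    for v, l in zip(values, labels):
        clusters.setdefault(l, []).append(v)
    k = len(clusters)
    bss = sum(len(c) * (statistics.mean(c) - overall) ** 2
              for c in clusters.values())
    wss = 0.0
    for c in clusters.values():
        mu = statistics.mean(c)
        wss += sum((v - mu) ** 2 for v in c)
    return (bss / (k - 1)) / (wss / (n - k))
```

A labelling that respects the true grouping scores far higher than an arbitrary one, which is the property such indices exploit when estimating the number of neurons.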

  18. Changing cluster composition in cluster randomised controlled trials: design and analysis considerations

    PubMed Central

    2014-01-01

    Background There are many methodological challenges in the conduct and analysis of cluster randomised controlled trials, but one that has received little attention is that of post-randomisation changes to cluster composition. To illustrate this, we focus on the issue of cluster merging, considering the impact on the design, analysis and interpretation of trial outcomes. Methods We explored the effects of merging clusters on study power using standard methods of power calculation. We assessed the potential impacts on study findings of both homogeneous cluster merges (involving clusters randomised to the same arm of a trial) and heterogeneous merges (involving clusters randomised to different arms of a trial) by simulation. To determine the impact on bias and precision of treatment effect estimates, we applied standard methods of analysis to different populations under analysis. Results Cluster merging produced a systematic reduction in study power. This effect depended on the number of merges and was most pronounced when variability in cluster size was at its greatest. Simulations demonstrate that the impact on analysis was minimal when cluster merges were homogeneous, with impact on study power being balanced by a change in observed intracluster correlation coefficient (ICC). We found a decrease in study power when cluster merges were heterogeneous, and the estimate of treatment effect was attenuated. Conclusions Examples of cluster merges found in previously published reports of cluster randomised trials were typically homogeneous rather than heterogeneous. Simulations demonstrated that trial findings in such cases would be unbiased. However, simulations also showed that any heterogeneous cluster merges would introduce bias that would be hard to quantify, as well as having negative impacts on the precision of estimates obtained. Further methodological development is warranted to better determine how to analyse such trials appropriately. Interim recommendations

  19. The Effect of Mergers on Galaxy Cluster Mass Estimates

    NASA Astrophysics Data System (ADS)

    Johnson, Ryan E.; Zuhone, John A.; Thorsen, Tessa; Hinds, Andre

    2015-08-01

At vertices within the filamentary structure that describes the universal matter distribution, clusters of galaxies grow hierarchically through merging with other clusters. As such, the most massive galaxy clusters should have experienced many such mergers in their histories. Though we cannot see them evolve over time, these mergers leave lasting, measurable effects in the cluster galaxies' phase space. By simulating several different galaxy cluster mergers here, we examine how the cluster galaxies' kinematics are altered as a result of these mergers. Further, we also examine the effect of our line-of-sight viewing angle with respect to the merger axis. In projecting the 6-dimensional galaxy phase space onto a 3-dimensional plane, we are able to simulate how these clusters might actually appear to optical redshift surveys. We find that for those optical cluster statistics which are most often used as a proxy for the cluster mass (variants of σv), the uncertainty due to an imprecise or unknown line of sight may alter the derived cluster masses more so than the kinematic disturbance of the merger itself. Finally, by examining these, and several other clustering statistics, we find that significant events (such as pericentric crossings) are identifiable over a range of merger initial conditions and from many different lines of sight.
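The σv-based proxies in question scale like the textbook virial estimate M ≈ 5 σ² R / G, so an error in the measured dispersion enters the mass quadratically (an order-of-magnitude scaling, not the specific statistics examined in the simulations):

```python
def virial_mass(sigma_v_km_s, radius_mpc):
    """Order-of-magnitude dynamical cluster mass in solar masses from
    the line-of-sight velocity dispersion: M ~ 5 * sigma**2 * R / G."""
    G = 4.301e-9  # gravitational constant in Mpc (km/s)^2 / M_sun
    return 5.0 * sigma_v_km_s ** 2 * radius_mpc / G
```

A 1000 km/s dispersion within 1 Mpc gives about 1e15 solar masses, and a 10% line-of-sight bias in σv shifts the inferred mass by roughly 20%, illustrating why viewing angle can dominate the error budget.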

  20. The observed clustering of damaging extratropical cyclones in Europe

    NASA Astrophysics Data System (ADS)

    Cusack, Stephen

    2016-04-01

    The clustering of severe European windstorms on annual timescales has substantial impacts on the (re-)insurance industry. Our knowledge of the risk is limited by large uncertainties in estimates of clustering from typical historical storm data sets covering the past few decades. Eight storm data sets are gathered for analysis in this study in order to reduce these uncertainties. Six of the data sets contain more than 100 years of severe storm information to reduce sampling errors, and observational errors are reduced by the diversity of information sources and analysis methods between storm data sets. All storm severity measures used in this study reflect damage, to suit (re-)insurance applications. The shortest storm data set of 42 years provides indications of stronger clustering with severity, particularly for regions off the main storm track in central Europe and France. However, clustering estimates have very large sampling and observational errors, exemplified by large changes in estimates in central Europe upon removal of one stormy season, 1989/1990. The extended storm records place 1989/1990 into a much longer historical context to produce more robust estimates of clustering. All the extended storm data sets show increased clustering between more severe storms from return periods (RPs) of 0.5 years to the longest measured RPs of about 20 years. Further, they contain signs of stronger clustering off the main storm track, and weaker clustering for smaller-sized areas, though these signals are more uncertain as they are drawn from smaller data samples. These new ultra-long storm data sets provide new information on clustering to improve our management of this risk.

  1. Analysis of partially observed clustered data using generalized estimating equations and multiple imputation

    PubMed Central

    Aloisio, Kathryn M.; Swanson, Sonja A.; Micali, Nadia; Field, Alison; Horton, Nicholas J.

    2015-01-01

Clustered data arise in many settings, particularly within the social and biomedical sciences. As an example, multiple-source reports are commonly collected in child and adolescent psychiatric epidemiologic studies, where researchers use various informants (e.g. parent and adolescent) to provide a holistic view of a subject's symptomatology. Fitzmaurice et al. (1995) have described estimation of multiple-source models using a standard generalized estimating equation (GEE) framework. However, these studies often have missing data due to the additional stages of consent and assent required. The usual GEE is unbiased when missingness is Missing Completely at Random (MCAR) in the sense of Little and Rubin (2002). This is a strong assumption that may not be tenable. Other options, such as weighted generalized estimating equations (WGEEs), are computationally challenging when missingness is non-monotone. Multiple imputation is an attractive method to fit incomplete data models while only requiring the less restrictive Missing at Random (MAR) assumption. Estimation for partially observed clustered data was previously computationally challenging; however, recent developments in Stata have facilitated its use in practice. We demonstrate how to utilize multiple imputation in conjunction with a GEE to investigate the prevalence of disordered eating symptoms in adolescents as reported by parents and adolescents, as well as factors associated with concordance and prevalence. The methods are motivated by the Avon Longitudinal Study of Parents and their Children (ALSPAC), a cohort study that enrolled more than 14,000 pregnant mothers in 1991-92 and has followed the health and development of their children at regular intervals. While point estimates were fairly similar to the GEE under MCAR, the MAR model had smaller standard errors, while requiring less stringent assumptions regarding missingness. PMID:25642154

  2. A review on cluster estimation methods and their application to neural spike data

    NASA Astrophysics Data System (ADS)

    Zhang, James; Nguyen, Thanh; Cogill, Steven; Bhatti, Asim; Luo, Lingkun; Yang, Samuel; Nahavandi, Saeid

    2018-06-01

    The extracellular action potentials recorded on an electrode result from the collective simultaneous electrophysiological activity of an unknown number of neurons. Identifying and assigning these action potentials to their firing neurons—‘spike sorting’—is an indispensable step in studying the function and the response of an individual or ensemble of neurons to certain stimuli. Given the task of neural spike sorting, the determination of the number of clusters (neurons) is arguably the most difficult and challenging issue, due to the existence of background noise and the overlap and interactions among neurons in neighbouring regions. It is not surprising that some researchers still rely on visual inspection by experts to estimate the number of clusters in neural spike sorting. Manual inspection, however, is not suitable to processing the vast, ever-growing amount of neural data. To address this pressing need, in this paper, thirty-three clustering validity indices have been comprehensively reviewed and implemented to determine the number of clusters in neural datasets. To gauge the suitability of the indices to neural spike data, and inform the selection process, we then calculated the indices by applying k-means clustering to twenty widely used synthetic neural datasets and one empirical dataset, and compared the performance of these indices against pre-existing ground truth labels. The results showed that the top five validity indices work consistently well across variations in noise level, both for the synthetic datasets and the real dataset. Using these top performing indices provides strong support for the determination of the number of neural clusters, which is essential in the spike sorting process.

  3. Large Crater Clustering tool

    NASA Astrophysics Data System (ADS)

    Laura, Jason; Skinner, James A.; Hunter, Marc A.

    2017-08-01

In this paper we present the Large Crater Clustering (LCC) tool set, an ArcGIS plugin that supports the quantitative approximation of a primary impact location from user-identified locations of possible secondary impact craters or the long axes of clustered secondary craters. The identification of primary impact craters directly supports planetary geologic mapping and topical science studies where the chronostratigraphic age of some geologic units may be known, but more distant features have questionable geologic ages. Previous works (e.g., McEwen et al., 2005; Dundas and McEwen, 2007) have shown that the source of secondary impact craters can be estimated from the secondary craters themselves. This work adapts those methods into a statistically robust tool set. We describe the four individual tools within the LCC tool set, which support: (1) processing individually digitized point observations (craters), (2) estimating the directional distribution of a clustered set of craters, (3) back-projecting the potential flight paths (crater clusters or linearly approximated catenae or lineaments), and (4) intersecting the back-projected trajectories to approximate the location of potential source primary craters. We present two case studies using secondary impact features mapped in two regions of Mars. We demonstrate that the tool is able to quantitatively identify primary impacts and supports the improved qualitative interpretation of potential secondary crater flight trajectories.
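Step (4) reduces to intersecting pairs of back-projected rays. For two planar trajectories each given by a point and a direction, the crossing follows from Cramer's rule (a flat-plane sketch; the ArcGIS tool works in a proper map projection):

```python
def intersect_trajectories(p1, d1, p2, d2, eps=1e-12):
    """Solve p1 + t*d1 = p2 + s*d2 for the crossing of two 2-D rays;
    returns None when the trajectories are (nearly) parallel."""
    (x1, y1), (ax, ay) = p1, d1
    (x2, y2), (bx, by) = p2, d2
    det = -ax * by + bx * ay        # determinant of the system [d1 | -d2]
    if abs(det) < eps:
        return None
    t = (-(x2 - x1) * by + bx * (y2 - y1)) / det
    return (x1 + t * ax, y1 + t * ay)
```

With many secondary trajectories, the cloud of pairwise intersections concentrates near the primary crater, which is the quantity the LCC tool estimates statistically.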

  4. Integrated spectral properties of 7 galactic open clusters

    NASA Astrophysics Data System (ADS)

    Ahumada, A. V.; Clariá, J. J.; Bica, E.; Piatti, A. E.

    2000-01-01

This paper presents flux-calibrated integrated spectra in the range 3600-9000 Å for 7 concentrated, relatively populous Galactic open clusters. We perform simultaneous estimates of age and foreground interstellar reddening by comparing the continuum distribution and line strengths of the cluster spectra with those of template cluster spectra with known parameters. For five clusters these two parameters have been determined for the first time (Ruprecht 144, BH 132, Pismis 21, Lyngå 11 and BH 217), while the results here derived for the remaining two clusters (Hogg 15 and Melotte 105) show very good agreement with previous studies based mainly on colour-magnitude diagrams. We also provide metallicity estimates for six clusters from the equivalent widths of the Ca II triplet and TiO features. The present cluster sample improves the age resolution around solar metal content in the cluster spectral library for population synthesis. We compare the properties of the present sample with those of clusters in similar directions. Hogg 15 and Pismis 21 are among the most reddened clusters in sectors centered at l = 270° and l = 0°, respectively. Besides, the present results would favour an important dissolution rate of star clusters in these zones. Based on observations made at Complejo Astronómico El Leoncito, which is operated under agreement between the Consejo Nacional de Investigaciones Científicas y Técnicas de la República Argentina and the National Universities of La Plata, Córdoba and San Juan, Argentina.

  5. Reporting and methodological quality of sample size calculations in cluster randomized trials could be improved: a review.

    PubMed

    Rutterford, Clare; Taljaard, Monica; Dixon, Stephanie; Copas, Andrew; Eldridge, Sandra

    2015-06-01

    To assess the quality of reporting and accuracy of a priori estimates used in sample size calculations for cluster randomized trials (CRTs). We reviewed 300 CRTs published between 2000 and 2008. The prevalence of reporting sample size elements from the 2004 CONSORT recommendations was evaluated and a priori estimates compared with those observed in the trial. Of the 300 trials, 166 (55%) reported a sample size calculation. Only 36 of 166 (22%) reported all recommended descriptive elements. Elements specific to CRTs were the worst reported: a measure of within-cluster correlation was specified in only 58 of 166 (35%). Only 18 of 166 articles (11%) reported both a priori and observed within-cluster correlation values. Except in two cases, observed within-cluster correlation values were either close to or less than a priori values. Even with the CONSORT extension for cluster randomization, the reporting of sample size elements specific to these trials remains below that necessary for transparent reporting. Journal editors and peer reviewers should implement stricter requirements for authors to follow CONSORT recommendations. Authors should report observed and a priori within-cluster correlation values to enable comparisons between these over a wider range of trials. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
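The within-cluster correlation enters these calculations through the standard design effect: an individually randomized sample size is inflated by 1 + (m - 1) * ICC for clusters of size m. A sketch:

```python
def crt_sample_size(n_individual, cluster_size, icc):
    """Inflate an individually randomized total sample size by the
    design effect 1 + (m - 1) * ICC, the usual CRT correction."""
    design_effect = 1.0 + (cluster_size - 1) * icc
    return n_individual * design_effect
```

With 200 participants needed under individual randomization, clusters of 21, and an ICC of 0.05, the design effect is 2.0 and the CRT needs 400 participants; this is why the review's finding that a priori ICC values are rarely reported matters for power.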

  6. Evolving Improvements to TRMM Ground Validation Rainfall Estimates

    NASA Technical Reports Server (NTRS)

    Robinson, M.; Kulie, M. S.; Marks, D. A.; Wolff, D. B.; Ferrier, B. S.; Amitai, E.; Silberstein, D. S.; Fisher, B. L.; Wang, J.; Einaudi, Franco (Technical Monitor)

    2000-01-01

    The primary function of the TRMM Ground Validation (GV) Program is to create GV rainfall products that provide basic validation of satellite-derived precipitation measurements for select primary sites. Since the successful 1997 launch of the TRMM satellite, GV rainfall estimates have demonstrated systematic improvements directly related to improved radar and rain gauge data, modified science techniques, and software revisions. Improved rainfall estimates have resulted in higher quality GV rainfall products and subsequently, much improved evaluation products for the satellite-based precipitation estimates from TRMM. This presentation will demonstrate how TRMM GV rainfall products created in a semi-automated, operational environment have evolved and improved through successive generations. Monthly rainfall maps and rainfall accumulation statistics for each primary site will be presented for each stage of GV product development. Contributions from individual product modifications involving radar reflectivity (Ze)-rain rate (R) relationship refinements, improvements in rain gauge bulk-adjustment and data quality control processes, and improved radar and gauge data will be discussed. Finally, it will be demonstrated that as GV rainfall products have improved, rainfall estimation comparisons between GV and satellite have converged, lending confidence to the satellite-derived precipitation measurements from TRMM.
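    The Ze-R refinements mentioned above rest on power-law relationships of the form Z = aR^b between linear radar reflectivity and rain rate. A generic sketch of the conversion (the coefficients a = 300, b = 1.4 are a commonly used convective default, not the TRMM GV site-specific values; the function name is illustrative):

```python
def rain_rate_from_dbz(dbz, a=300.0, b=1.4):
    """Convert radar reflectivity (dBZ) to rain rate (mm/h)
    via the power law Z = a * R**b."""
    z_linear = 10.0 ** (dbz / 10.0)   # dBZ -> linear Z (mm^6 m^-3)
    return (z_linear / a) ** (1.0 / b)
```

    Refining a and b per site and regime, as described above, changes the rain rate assigned to a given reflectivity and hence the monthly accumulations used to evaluate the satellite estimates.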

  7. Improving the Discipline of Cost Estimation and Analysis

    NASA Technical Reports Server (NTRS)

    Piland, William M.; Pine, David J.; Wilson, Delano M.

    2000-01-01

    The need to improve the quality and accuracy of cost estimates of proposed new aerospace systems has been widely recognized. The industry has done the best job of maintaining related capability with improvements in estimation methods and giving appropriate priority to the hiring and training of qualified analysts. Some parts of Government, and National Aeronautics and Space Administration (NASA) in particular, continue to need major improvements in this area. Recently, NASA recognized that its cost estimation and analysis capabilities had eroded to the point that the ability to provide timely, reliable estimates was impacting the confidence in planning many program activities. As a result, this year the Agency established a lead role for cost estimation and analysis. The Independent Program Assessment Office located at the Langley Research Center was given this responsibility.

  8. Estimates of cloud radiative forcing in contrail clusters using GOES imagery

    NASA Astrophysics Data System (ADS)

    Duda, David P.; Minnis, Patrick; Nguyen, Louis

    2001-03-01

    Using data from the Geostationary Operational Environmental Satellite (GOES), the evolution of solar and longwave radiative forcing in contrail clusters is presented in several case studies. The first study examines contrails developing over the midwestern United States in a region of upper tropospheric moisture enhanced by the remnants of Hurricane Nora on September 26, 1997. Two other cases involve contrail clusters that formed over the Chesapeake Bay and the Atlantic Ocean on February 11 and March 5, 1999, respectively. The last study includes contrails forming over the tropical Pacific near Hawaii. Observations of tropical contrails near Hawaii show that the contrail optical properties are similar to those measured from satellite in the midlatitudes, with visible optical depths between 0.3 and 0.5 and particle sizes between 30 and 60 μm as the contrails mature into diffuse cloudiness. Radiative transfer model simulations of the tropical contrail case suggest that ice crystal shape may have an important effect on radiative forcing in contrails. The magnitudes of the observed solar and longwave radiative forcings were 5.6 and 3.2 W m⁻² less than those from the corresponding model simulations, and these differences are attributed to subpixel-scale low clouds and uncertainties in the anisotropic reflectance and limb-darkening models used to estimate the observed forcing. Since the broadband radiative forcing in contrails often changes rapidly, contrail forcing estimates based only on the polar orbiting advanced very high resolution radiometer (AVHRR) data could be inaccurate due to the lack of sufficient temporal sampling.

  9. The X-ray cluster survey with eRosita: forecasts for cosmology, cluster physics and primordial non-Gaussianity

    NASA Astrophysics Data System (ADS)

    Pillepich, Annalisa; Porciani, Cristiano; Reiprich, Thomas H.

    2012-05-01

    Starting in late 2013, the eRosita telescope will survey the X-ray sky with unprecedented sensitivity. Assuming a detection limit of 50 photons in the (0.5-2.0) keV energy band with a typical exposure time of 1.6 ks, we predict that eRosita will detect ∼9.3 × 10⁴ clusters of galaxies more massive than 5 × 10¹³ h⁻¹ M⊙, with the currently planned all-sky survey. Their median redshift will be z ≃ 0.35. We perform a Fisher-matrix analysis to forecast the constraining power of eRosita on the Λ cold dark matter (ΛCDM) cosmology and, simultaneously, on the X-ray scaling relations for galaxy clusters. Special attention is devoted to the possibility of detecting primordial non-Gaussianity. We consider two experimental probes: the number counts and the angular clustering of a photon-count limited sample of clusters. We discuss how the cluster sample should be split to optimize the analysis and we show that redshift information on the individual clusters is vital to break the strong degeneracies among the model parameters. For example, performing a 'tomographic' analysis based on photometric-redshift estimates and combining one- and two-point statistics will give marginal 1σ errors of Δσ₈ ≃ 0.036 and ΔΩm ≃ 0.012 without priors, and improve the current estimates on the slope of the luminosity-mass relation by a factor of 3. Regarding primordial non-Gaussianity, eRosita clusters alone will give ΔfNL ≃ 9, 36 and 144 for the local, orthogonal and equilateral model, respectively. Measuring redshifts with spectroscopic accuracy would further tighten the constraints by nearly 40 per cent (barring fNL, which displays smaller improvements). Finally, combining eRosita data with the analysis of temperature anisotropies in the cosmic microwave background by the Planck satellite should give sensational constraints on both the cosmology and the properties of the intracluster medium.
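    For number counts in independent Poisson bins, the Fisher matrix used in forecasts of this kind reduces to F_ij = Σ_bins (∂N/∂p_i)(∂N/∂p_j)/N. A minimal numerical sketch (the `model_counts` interface and finite-difference step are hypothetical, not the paper's pipeline):

```python
import numpy as np

def fisher_counts(model_counts, params, eps=1e-4):
    """Poisson Fisher matrix for binned number counts:
    F_ij = sum over bins of (dN/dp_i)(dN/dp_j) / N.
    `model_counts(params)` returns expected counts per bin."""
    p = np.asarray(params, dtype=float)
    N = model_counts(p)
    grads = []
    for i in range(len(p)):
        dp = np.zeros_like(p)
        dp[i] = eps * max(abs(p[i]), 1.0)
        # central finite difference of the binned counts
        grads.append((model_counts(p + dp) - model_counts(p - dp)) / (2 * dp[i]))
    G = np.array(grads)            # shape (n_params, n_bins)
    return G @ (G / N).T

# Marginal 1-sigma errors come from the inverse Fisher matrix:
# errors = np.sqrt(np.diag(np.linalg.inv(F)))
```

    Splitting the sample in redshift, as the 'tomographic' analysis does, simply adds more bins to the sum and thereby breaks parameter degeneracies.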

  10. Estimating overall exposure effects for the clustered and censored outcome using random effect Tobit regression models.

    PubMed

    Wang, Wei; Griswold, Michael E

    2016-11-30

    The random effect Tobit model is a regression model that accommodates both left- and/or right-censoring and within-cluster dependence of the outcome variable. Regression coefficients of random effect Tobit models have conditional interpretations on a constructed latent dependent variable and do not provide inference of overall exposure effects on the original outcome scale. Marginalized random effects model (MREM) permits likelihood-based estimation of marginal mean parameters for the clustered data. For random effect Tobit models, we extend the MREM to marginalize over both the random effects and the normal space and boundary components of the censored response to estimate overall exposure effects at the population level. We also extend the 'Average Predicted Value' method to estimate the model-predicted marginal means for each person under different exposure status in a designated reference group by integrating over the random effects and then use the calculated difference to assess the overall exposure effect. The maximum likelihood estimation is proposed utilizing a quasi-Newton optimization algorithm with Gauss-Hermite quadrature to approximate the integration of the random effects. We use these methods to carefully analyze two real datasets. Copyright © 2016 John Wiley & Sons, Ltd.
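    The Gauss-Hermite step replaces the integral over a normal random effect with a weighted sum over quadrature nodes. A minimal sketch for the mean of a left-censored-at-zero response with a normal random intercept (the censoring example and function name are illustrative, not the authors' estimator):

```python
import numpy as np

def marginal_mean_gh(fixed_part, sigma_b, n_points=20):
    """Approximate E_b[max(fixed_part + b, 0)] for b ~ N(0, sigma_b^2)
    using Gauss-Hermite quadrature, mimicking the marginal mean of a
    left-censored (Tobit-style) outcome with a random intercept."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_points)
    # change of variables: b = sqrt(2) * sigma_b * node
    b = np.sqrt(2.0) * sigma_b * nodes
    vals = np.maximum(fixed_part + b, 0.0)   # censor at zero
    return (weights * vals).sum() / np.sqrt(np.pi)
```

    Increasing `n_points` trades computation for accuracy; the same weighted-sum pattern appears inside the quasi-Newton likelihood evaluation described above.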

  11. Calibrating the Planck cluster mass scale with cluster velocity dispersions

    NASA Astrophysics Data System (ADS)

    Amodeo, S.; Mei, S.; Stanford, S. A.; Bartlett, J. G.; Lawrence, C. L.; Chary, R. R.; Shim, H.; Marleau, F.; Stern, D.

    2017-12-01

    The potential of galaxy clusters as cosmological probes critically depends on the capability to obtain accurate estimates of their mass. This will be a key measurement for the next generation of cosmological surveys, such as Euclid. The discrepancy between the cosmological parameters determined from anisotropies in the cosmic microwave background and those derived from cluster abundance measurements from the Planck satellite calls for careful evaluation of systematic biases in cluster mass estimates. For this purpose, it is crucial to use independent techniques, like analysis of the thermal emission of the intracluster medium (ICM), observed either in the X-rays or through the Sunyaev-Zeldovich (SZ) effect, dynamics of member galaxies or gravitational lensing. We discuss possible bias in the Planck SZ mass proxy, which is based on X-ray observations. Using optical spectroscopy from the Gemini Multi-Object Spectrograph of 17 Planck-selected clusters, we present new estimates of the cluster mass based on the velocity dispersion of the member galaxies and independently of the ICM properties. We show how the difference between the velocity dispersion of galaxy and dark matter particles in simulations is the primary factor limiting interpretation of dynamical cluster mass measurements at this time, and we give the first observational constraints on the velocity bias.

  12. A First Estimate of the X-Ray Binary Frequency as a Function of Star Cluster Mass in a Single Galactic System

    NASA Astrophysics Data System (ADS)

    Clark, D. M.; Eikenberry, S. S.; Brandl, B. R.; Wilson, J. C.; Carson, J. C.; Henderson, C. P.; Hayward, T. L.; Barry, D. J.; Ptak, A. F.; Colbert, E. J. M.

    2008-05-01

    We use the previously identified 15 infrared star cluster counterparts to X-ray point sources in the interacting galaxies NGC 4038/4039 (the Antennae) to study the relationship between total cluster mass and X-ray binary number. This significant population of X-ray/IR associations allows us to perform, for the first time, a statistical study of X-ray point sources and their environments. We define a quantity, η, relating the fraction of X-ray sources per unit mass as a function of cluster mass in the Antennae. We compute cluster mass by fitting spectral evolutionary models to Ks luminosity. Considering that this method depends on cluster age, we use four different age distributions to explore the effects of cluster age on the value of η and find it varies by less than a factor of 4. We find a mean value of η for these different distributions of η = 1.7 × 10⁻⁸ M⊙⁻¹ with ση = 1.2 × 10⁻⁸ M⊙⁻¹. Performing a χ² test, we demonstrate η could exhibit a positive slope, but that it depends on the assumed distribution in cluster ages. While the estimated uncertainties in η are factors of a few, we believe this is the first estimate made of this quantity to "order of magnitude" accuracy. We also compare our findings to theoretical models of open and globular cluster evolution, incorporating the X-ray binary fraction per cluster.

  13. Resemblance profiles as clustering decision criteria: Estimating statistical power, error, and correspondence for a hypothesis test for multivariate structure.

    PubMed

    Kilborn, Joshua P; Jones, David L; Peebles, Ernst B; Naar, David F

    2017-04-01

    Clustering data continues to be a highly active area of data analysis, and resemblance profiles are being incorporated into ecological methodologies as a hypothesis testing-based approach to clustering multivariate data. However, these new clustering techniques have not been rigorously tested to determine the performance variability based on the algorithm's assumptions or any underlying data structures. Here, we use simulation studies to estimate the statistical error rates for the hypothesis test for multivariate structure based on dissimilarity profiles (DISPROF). We concurrently tested a widely used algorithm that employs the unweighted pair group method with arithmetic mean (UPGMA) to estimate the proficiency of clustering with DISPROF as a decision criterion. We simulated unstructured multivariate data from different probability distributions with increasing numbers of objects and descriptors, and grouped data with increasing overlap, overdispersion for ecological data, and correlation among descriptors within groups. Using simulated data, we measured the resolution and correspondence of clustering solutions achieved by DISPROF with UPGMA against the reference grouping partitions used to simulate the structured test datasets. Our results highlight the dynamic interactions between dataset dimensionality, group overlap, and the properties of the descriptors within a group (i.e., overdispersion or correlation structure) that are relevant to resemblance profiles as a clustering criterion for multivariate data. These methods are particularly useful for multivariate ecological datasets that benefit from distance-based statistical analyses. We propose guidelines for using DISPROF as a clustering decision tool that will help future users avoid potential pitfalls during the application of methods and the interpretation of results.
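    UPGMA itself is ordinary average-linkage agglomerative clustering on a dissimilarity matrix; DISPROF supplies the stopping rule by testing each candidate split for significant multivariate structure. A sketch of the UPGMA half using SciPy (the simulated abundance data are illustrative, and the DISPROF permutation test is not implemented here):

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# two hypothetical groups of "sites" with different species abundances
group_a = rng.poisson(5.0, size=(10, 6))
group_b = rng.poisson(20.0, size=(10, 6))
data = np.vstack([group_a, group_b]).astype(float)

# Bray-Curtis dissimilarities, then UPGMA = average linkage
d = pdist(data, metric="braycurtis")
tree = linkage(d, method="average")

# cut the dendrogram into at most two groups; DISPROF would instead
# test each split for significant structure before accepting it
labels = fcluster(tree, t=2, criterion="maxclust")
```

    The simulation study above asks when this pipeline recovers the reference partition as group overlap, overdispersion and descriptor correlation grow.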

  14. Superframe Duration Allocation Schemes to Improve the Throughput of Cluster-Tree Wireless Sensor Networks

    PubMed Central

    Leão, Erico; Montez, Carlos; Moraes, Ricardo; Portugal, Paulo; Vasques, Francisco

    2017-01-01

    The use of Wireless Sensor Network (WSN) technologies is an attractive option to support wide-scale monitoring applications, such as the ones that can be found in precision agriculture, environmental monitoring and industrial automation. The IEEE 802.15.4/ZigBee cluster-tree topology is a suitable topology to build wide-scale WSNs. Despite some of its known advantages, including timing synchronisation and duty-cycle operation, cluster-tree networks may suffer from severe network congestion problems due to the convergecast pattern of its communication traffic. Therefore, the careful adjustment of transmission opportunities (superframe durations) allocated to the cluster-heads is an important research issue. This paper proposes a set of proportional Superframe Duration Allocation (SDA) schemes, based on well-defined protocol and timing models, and on the message load imposed by child nodes (Load-SDA scheme), or by number of descendant nodes (Nodes-SDA scheme) of each cluster-head. The underlying reasoning is to adequately allocate transmission opportunities (superframe durations) and parametrize buffer sizes, in order to improve the network throughput and avoid typical problems, such as: network congestion, high end-to-end communication delays and discarded messages due to buffer overflows. Simulation assessments show how proposed allocation schemes may clearly improve the operation of wide-scale cluster-tree networks. PMID:28134822
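    The core idea of the proportional schemes is simple: split the available beacon interval among cluster-heads in proportion to their demand, whether measured as child-node message load (Load-SDA) or descendant count (Nodes-SDA). A minimal sketch (parameter names are illustrative; the paper's schemes additionally model IEEE 802.15.4 timing constraints and buffer sizes):

```python
def proportional_sda(demands, beacon_interval):
    """Allocate superframe durations to cluster-heads in proportion to
    their demand. With message loads as `demands` this mimics Load-SDA;
    with descendant counts it mimics Nodes-SDA (simplified sketch)."""
    total = sum(demands)
    return [beacon_interval * (demand / total) for demand in demands]

# three cluster-heads sharing a 100 ms beacon interval
sd = proportional_sda([10, 30, 60], 100.0)
```

    Matching each cluster-head's transmission opportunity to its traffic is what reduces the buffer overflows and end-to-end delays described above.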

  15. Improving the distinguishable cluster results: spin-component scaling

    NASA Astrophysics Data System (ADS)

    Kats, Daniel

    2018-06-01

    The spin-component scaling is employed in the energy evaluation to improve the distinguishable cluster approach. SCS-DCSD reaction energies reproduce reference values with a root-mean-squared deviation well below 1 kcal/mol, the interaction energies are three to five times more accurate than DCSD, and molecular systems with a large amount of static electron correlation are still described reasonably well. SCS-DCSD represents a pragmatic approach to achieve chemical accuracy with a simple method without triples, which can also be applied to multi-configurational molecular systems.

  16. A fully nonparametric estimator of the marginal survival function based on case–control clustered age-at-onset data

    PubMed Central

    Gorfine, Malka; Bordo, Nadia; Hsu, Li

    2017-01-01

    Summary Consider a popular case–control family study where individuals with a disease under study (case probands) and individuals who do not have the disease (control probands) are randomly sampled from a well-defined population. Possibly right-censored age at onset and disease status are observed for both probands and their relatives. For example, case probands are men diagnosed with prostate cancer, control probands are men free of prostate cancer, and the prostate cancer history of the fathers of the probands is also collected. Inherited genetic susceptibility, shared environment, and common behavior lead to correlation among the outcomes within a family. In this article, a novel nonparametric estimator of the marginal survival function is provided. The estimator is defined in the presence of intra-cluster dependence, and is based on consistent smoothed kernel estimators of conditional survival functions. By simulation, it is shown that the proposed estimator performs very well in terms of bias. The utility of the estimator is illustrated by the analysis of case–control family data of early onset prostate cancer. To our knowledge, this is the first article that provides a fully nonparametric marginal survival estimator based on case–control clustered age-at-onset data. PMID:27436674

  17. NASA Software Cost Estimation Model: An Analogy Based Estimation Model

    NASA Technical Reports Server (NTRS)

    Hihn, Jairus; Juster, Leora; Menzies, Tim; Mathew, George; Johnson, James

    2015-01-01

    The cost estimation of software development activities is increasingly critical for large-scale integrated projects such as those at DoD and NASA, especially as software systems become larger and more complex. As an example, MSL (Mars Science Laboratory), developed at the Jet Propulsion Laboratory, launched with over 2 million lines of code, making it the largest robotic spacecraft ever flown (based on the size of the software). Software development activities are also notorious for their cost growth, with NASA flight software averaging over 50% cost growth. All across the agency, estimators and analysts are increasingly being tasked to develop reliable cost estimates in support of program planning and execution. While there has been extensive work on improving parametric methods, there is very little focus on the use of models based on analogy and clustering algorithms. In this paper we summarize our findings on effort/cost model estimation and model development based on ten years of software effort estimation research using data mining and machine learning methods to develop estimation models based on analogy and clustering. The NASA Software Cost Model performance is evaluated by comparing it to COCOMO II, linear regression, and K-nearest neighbor prediction model performance on the same data set.

  18. Comparative assessment of bone pose estimation using Point Cluster Technique and OpenSim.

    PubMed

    Lathrop, Rebecca L; Chaudhari, Ajit M W; Siston, Robert A

    2011-11-01

    Estimating the position of the bones from optical motion capture data is a challenge associated with human movement analysis. Bone pose estimation techniques such as the Point Cluster Technique (PCT) and simulations of movement through software packages such as OpenSim are used to minimize soft tissue artifact and estimate skeletal position; however, using different methods for analysis may produce differing kinematic results which could lead to differences in clinical interpretation such as a misclassification of normal or pathological gait. This study evaluated the differences present in knee joint kinematics as a result of calculating joint angles using various techniques. We calculated knee joint kinematics from experimental gait data using the standard PCT, the least squares approach in OpenSim applied to experimental marker data, and the least squares approach in OpenSim applied to the results of the PCT algorithm. Maximum and resultant RMS differences in knee angles were calculated between all techniques. We observed differences in flexion/extension, varus/valgus, and internal/external rotation angles between all approaches. The largest differences were between the PCT results and all results calculated using OpenSim. The RMS differences averaged nearly 5° for flexion/extension angles with maximum differences exceeding 15°. Average RMS differences were relatively small (< 1.08°) between results calculated within OpenSim, suggesting that the choice of marker weighting is not critical to the results of the least squares inverse kinematics calculations. The largest difference between techniques appeared to be a constant offset between the PCT and all OpenSim results, which may be due to differences in the definition of anatomical reference frames, scaling of musculoskeletal models, and/or placement of virtual markers within OpenSim. Different methods for data analysis can produce largely different kinematic results, which could lead to the misclassification of normal or pathological gait.
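    Both PCT and OpenSim's inverse kinematics ultimately solve a least-squares rigid-body fit of a marker cluster to its measured positions. A sketch of that shared core, the SVD (Kabsch) solution for the optimal rotation and translation (a generic formulation, not either package's exact implementation):

```python
import numpy as np

def rigid_fit(markers_ref, markers_cur):
    """Least-squares rotation R and translation t mapping a reference
    marker cluster onto its current positions: cur ~ R @ ref + t
    (Kabsch/SVD solution)."""
    ca, cb = markers_ref.mean(axis=0), markers_cur.mean(axis=0)
    H = (markers_ref - ca).T @ (markers_cur - cb)   # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))          # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cb - R @ ca
    return R, t
```

    Differences between methods then come not from this fit itself but from how each pipeline weights markers, defines anatomical frames, and scales the underlying model, consistent with the constant offset reported above.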

  19. Improved Critical Eigenfunction Restriction Estimates on Riemannian Surfaces with Nonpositive Curvature

    NASA Astrophysics Data System (ADS)

    Xi, Yakun; Zhang, Cheng

    2017-03-01

    We show that one can obtain improved L^4 geodesic restriction estimates for eigenfunctions on compact Riemannian surfaces with nonpositive curvature. We achieve this by adapting Sogge's strategy in (Improved critical eigenfunction estimates on manifolds of nonpositive curvature, Preprint). We first combine the improved L^2 restriction estimate of Blair and Sogge (Concerning Toponogov's Theorem and logarithmic improvement of estimates of eigenfunctions, Preprint) and the classical improved L^∞ estimate of Bérard to obtain an improved weak-type L^4 restriction estimate. We then upgrade this weak estimate to a strong one by using the improved Lorentz space estimate of Bak and Seeger (Math Res Lett 18(4):767-781, 2011). This estimate improves the L^4 restriction estimate of Burq et al. (Duke Math J 138:445-486, 2007) and Hu (Forum Math 6:1021-1052, 2009) by a power of (log log λ)^{-1}. Moreover, in the case of compact hyperbolic surfaces, we obtain further improvements in terms of (log λ)^{-1} by applying the ideas from (Chen and Sogge, Commun Math Phys 329(3):435-459, 2014) and (Blair and Sogge, Concerning Toponogov's Theorem and logarithmic improvement of estimates of eigenfunctions, Preprint). We are able to compute various constants that appeared in (Chen and Sogge, Commun Math Phys 329(3):435-459, 2014) explicitly, by proving detailed oscillatory integral estimates and lifting calculations to the universal cover H².

  20. A strategy for analysis of (molecular) equilibrium simulations: Configuration space density estimation, clustering, and visualization

    NASA Astrophysics Data System (ADS)

    Hamprecht, Fred A.; Peter, Christine; Daura, Xavier; Thiel, Walter; van Gunsteren, Wilfred F.

    2001-02-01

    We propose an approach for summarizing the output of long simulations of complex systems, affording a rapid overview and interpretation. First, multidimensional scaling techniques are used in conjunction with dimension reduction methods to obtain a low-dimensional representation of the configuration space explored by the system. A nonparametric estimate of the density of states in this subspace is then obtained using kernel methods. The free energy surface is calculated from that density, and the configurations produced in the simulation are then clustered according to the topography of that surface, such that all configurations belonging to one local free energy minimum form one class. This topographical cluster analysis is performed using basin spanning trees which we introduce as subgraphs of Delaunay triangulations. Free energy surfaces obtained in dimensions lower than four can be visualized directly using isocontours and isosurfaces. Basin spanning trees also afford a glimpse of higher-dimensional topographies. The procedure is illustrated using molecular dynamics simulations on the reversible folding of peptide analogs. Finally, we emphasize the intimate relation of density estimation techniques to modern enhanced sampling algorithms.

  1. Improved fuzzy clustering algorithms in segmentation of DC-enhanced breast MRI.

    PubMed

    Kannan, S R; Ramathilagam, S; Devi, Pandiyarajan; Sathya, A

    2012-02-01

    Segmentation of medical images is a difficult and challenging problem due to poor image contrast and artifacts that result in missing or diffuse organ/tissue boundaries. Many researchers have applied various techniques; however, fuzzy c-means (FCM)-based algorithms are more effective than other methods. The objective of this work is to develop robust fuzzy clustering segmentation systems for effective segmentation of DCE breast MRI. This paper obtains robust fuzzy clustering algorithms by incorporating kernel methods, penalty terms, tolerance of the neighborhood attraction, an additional entropy term and fuzzy parameters. The initial centers are obtained using an initialization algorithm to reduce the computational complexity and running time of the proposed algorithms. Experimental work on breast images shows that the proposed algorithms are effective in improving the similarity measurement, handling large amounts of noise, and producing better results on data corrupted by noise and other artifacts. The clustering results of the proposed methods are validated using the Silhouette method.
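    For reference, the plain FCM iteration that these variants extend alternates between membership-weighted center updates and inverse-distance membership updates. A minimal sketch (standard FCM only; the paper's kernel, penalty and neighborhood-attraction terms are omitted):

```python
import numpy as np

def fcm(X, c, m=2.0, n_iter=100, seed=0):
    """Minimal fuzzy c-means: returns cluster centers and the fuzzy
    membership matrix U (n_samples x c)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)          # rows sum to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # squared distances from every sample to every center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)             # avoid division by zero
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return centers, U
```

    The paper's kernelized variants replace `d2` with a kernel-induced distance and add penalty and neighborhood terms to the objective; the alternating update structure stays the same.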

  2. IMPROVING BIOGENIC EMISSION ESTIMATES WITH SATELLITE IMAGERY

    EPA Science Inventory

    This presentation will review how existing and future applications of satellite imagery can improve the accuracy of biogenic emission estimates. Existing applications of satellite imagery to biogenic emission estimates have focused on characterizing land cover. Vegetation dat...

  3. Clustering-based urbanisation to improve enterprise information systems agility

    NASA Astrophysics Data System (ADS)

    Imache, Rabah; Izza, Said; Ahmed-Nacer, Mohamed

    2015-11-01

    Enterprises face daily pressure to demonstrate their ability to adapt quickly to unpredictable technological, social, legislative and competitive changes and to globalisation. Thus, to secure its place in this demanding context, an enterprise must remain agile and must ensure its sustainability through continuous improvement of its information system (IS). The agility of enterprise information systems (EISs) can therefore be considered a primary objective of any enterprise. One way of achieving this objective is by urbanisation of the EIS in the context of continuous improvement, to make it a real asset serving enterprise strategy. This paper investigates the benefits of EIS urbanisation based on clustering techniques as a driver for producing and/or improving agility, to help managers and IT departments continuously improve the performance of the enterprise and make appropriate decisions within the scope of the enterprise objectives and strategy. This approach is applied to the urbanisation of a tour operator's EIS.

  4. Improved nonorthogonal tight-binding Hamiltonian for molecular-dynamics simulations of silicon clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ordejon, P.; Lebedenko, D.; Menon, M.

    1994-08-15

    We present an improvement over the nonorthogonal tight-binding molecular-dynamics scheme recently proposed by Menon and Subbaswamy [Phys. Rev. B 47, 12 754 (1993)]. The proper treatment of the nonorthogonality and its effect on the Hamiltonian matrix elements has been found to obviate the need for a bond-counting term, leaving only two adjustable parameters in the formalism. With the improved parametrization we obtain values of the energies and bonding distances which are in better agreement with the available ab initio results for clusters of size up to N = 10. Additionally, we have identified a lowest energy structure for the Si₉ cluster, which to our knowledge has not been considered to date. We show that this structure is a distorted tricapped trigonal prism with C₂v symmetry.

  5. iGLASS: An Improvement to the GLASS Method for Estimating Species Trees from Gene Trees

    PubMed Central

    Rosenberg, Noah A.

    2012-01-01

    Several methods have been designed to infer species trees from gene trees while taking into account gene tree/species tree discordance. Although some of these methods provide consistent species tree topology estimates under a standard model, most either do not estimate branch lengths or are computationally slow. An exception, the GLASS method of Mossel and Roch, is consistent for the species tree topology, estimates branch lengths, and is computationally fast. However, GLASS systematically overestimates divergence times, leading to biased estimates of species tree branch lengths. By assuming a multispecies coalescent model in which multiple lineages are sampled from each of two taxa at L independent loci, we derive the distribution of the waiting time until the first interspecific coalescence occurs between the two taxa, considering all loci and measuring from the divergence time. We then use the mean of this distribution to derive a correction to the GLASS estimator of pairwise divergence times. We show that our improved estimator, which we call iGLASS, consistently estimates the divergence time between a pair of taxa as the number of loci approaches infinity, and that it is an unbiased estimator of divergence times when one lineage is sampled per taxon. We also show that many commonly used clustering methods can be combined with the iGLASS estimator of pairwise divergence times to produce a consistent estimator of the species tree topology. Through simulations, we show that iGLASS can greatly reduce the bias and mean squared error in obtaining estimates of divergence times in a species tree. PMID:22216756

  6. Improving The Discipline of Cost Estimation and Analysis

    NASA Technical Reports Server (NTRS)

    Piland, William M.; Pine, David J.; Wilson, Delano M.

    2000-01-01

    The need to improve the quality and accuracy of cost estimates of proposed new aerospace systems has been widely recognized. The industry has done the best job of maintaining related capability with improvements in estimation methods and giving appropriate priority to the hiring and training of qualified analysts. Some parts of Government, and National Aeronautics and Space Administration (NASA) in particular, continue to need major improvements in this area. Recently, NASA recognized that its cost estimation and analysis capabilities had eroded to the point that the ability to provide timely, reliable estimates was impacting the confidence in planning many program activities. As a result, this year the Agency established a lead role for cost estimation and analysis. The Independent Program Assessment Office located at the Langley Research Center was given this responsibility. This paper presents the plans for the newly established role. Described is how the Independent Program Assessment Office, working with all NASA Centers, NASA Headquarters, other Government agencies, and industry, is focused on creating cost estimation and analysis as a professional discipline that will be recognized equally with the technical disciplines needed to design new space and aeronautics activities. Investments in selected, new analysis tools, creating advanced training opportunities for analysts, and developing career paths for future analysts engaged in the discipline are all elements of the plan. Plans also include increasing the human resources available to conduct independent cost analysis of Agency programs during their formulation, to improve near-term capability to conduct economic cost-benefit assessments, to support NASA management's decision process, and to provide cost analysis results emphasizing "full-cost" and "full-life cycle" considerations. The Agency cost analysis improvement plan has been approved for implementation starting this calendar year. 
Adequate financial

  7. A new estimate of the Hubble constant using the Virgo cluster distance

    NASA Astrophysics Data System (ADS)

    Visvanathan, N.

    The Hubble constant, which defines the size and age of the universe, remains substantially uncertain. Attention is presently given to an improved distance to the Virgo Cluster obtained by means of the 1.05-micron luminosity-H I width relation of spirals. In order to improve the absolute calibration of the relation, accurate distances to the nearby SMC, LMC, N6822, SEX A and N300 galaxies have also been obtained, on the basis of the near-IR P-L relation of the Cepheids. A value for the global Hubble constant of 67 ± 4 km/s per Mpc is obtained.

  8. Multiple Regression Redshift Calibration for Clusters of Galaxies

    NASA Astrophysics Data System (ADS)

    Kalinkov, M.; Kuneva, I.; Valtchanov, I.

    A new procedure for calibrating distances to ACO (Abell et al. 1989) clusters of galaxies has been developed. In the previous version of the Reference Catalog of ACO Clusters of Galaxies (Kalinkov & Kuneva 1992) an attempt was made to compare various calibration schemes. For the Version 93 we have made some refinements. Many improvements have been made since the early days of photometric calibration --- from Rowan-Robinson (1972), Corwin (1974), Kalinkov & Kuneva (1975), and Mills & Hoskins (1977) to the more complicated schemes of Leir & van den Bergh (1977), Postman et al. (1985), Kalinkov & Kuneva (1985, 1986, 1990), Scaramella et al. (1991), and Zucca et al. (1993). It was shown that it is impossible to use the same calibration relation for northern (A) and southern (ACO) clusters of galaxies; the calibration therefore has to be made separately for each catalog. Moreover, it is better if one can find relations for the 274 A-clusters studied by the authors of ACO. We use the luminosity distance for H0 = 100 km/s/Mpc and q0 = 0.5, and we have 1200 clusters with measured redshifts. The first step is to fit log(z) on m10 (the magnitude of the tenth-ranked galaxy) for A-clusters, and on m1, m3, and m10 for ACO clusters. The second step is to take into account the K-correction and the Scott effect (Postman et al. 1985) with an iterative process. To avoid the initial errors of the redshift estimates in the A and ACO catalogs, we adopt Hubble's law for the apparent radial distribution of galaxies in clusters. This enables us to calculate a new cluster richness from a preliminary redshift estimate; this is the third step. The study then continues with the correlation matrix between log(z) and prospective predictors --- new richness groups; BM, RS, and A types; radio and X-ray fluxes; apparent separations between the first three brightest galaxies; and mean population (gal/sq. deg). Multiple linear as well as nonlinear regression estimators are found. Many clusters that deviate by more than 2.5 sigmas are
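
    The first calibration step described above --- fitting log(z) against the magnitude of the tenth-ranked galaxy --- can be sketched as an ordinary least-squares regression. The numbers below are synthetic stand-ins, not ACO data:

    ```python
    import numpy as np

    # Synthetic illustration (not ACO data): simulate a magnitude-redshift
    # relation log10(z) = a + b * m10 with scatter, then recover a and b.
    rng = np.random.default_rng(0)
    a_true, b_true = -4.0, 0.2
    m10 = rng.uniform(14.0, 18.0, 300)     # tenth-ranked galaxy magnitudes
    log_z = a_true + b_true * m10 + rng.normal(0.0, 0.05, m10.size)

    # First calibration step: linear least-squares fit of log z on m10.
    A = np.column_stack([np.ones_like(m10), m10])
    (a_fit, b_fit), *_ = np.linalg.lstsq(A, log_z, rcond=None)

    print(a_fit, b_fit)
    ```

    In the actual procedure this fit would then be iterated with the K-correction and Scott-effect terms folded in.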

  9. A Fast Implementation of the ISODATA Clustering Algorithm

    NASA Technical Reports Server (NTRS)

    Memarsadeghi, Nargess; Mount, David M.; Netanyahu, Nathan S.; LeMoigne, Jacqueline

    2005-01-01

    Clustering is central to many image processing and remote sensing applications. ISODATA is one of the most popular and widely used clustering methods in geoscience applications, but it can run slowly, particularly with large data sets. We present a more efficient approach to ISODATA clustering, which achieves better running times by storing the points in a kd-tree and through a modification of the way in which the algorithm estimates the dispersion of each cluster. We also present an approximate version of the algorithm which allows the user to further improve the running time, at the expense of lower fidelity in computing the nearest cluster center to each point. We provide both theoretical and empirical justification that our modified approach produces clusterings that are very similar to those produced by the standard ISODATA approach. We also provide empirical studies on both synthetic data and remotely sensed Landsat and MODIS images that show that our approach has significantly lower running times.
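
    For orientation, the basic ISODATA loop the paper accelerates alternates k-means-style assignment with cluster splitting and merging. The toy sketch below is brute force and omits the paper's kd-tree storage and modified dispersion estimate; thresholds and structure are illustrative only:

    ```python
    import numpy as np

    def isodata_like(points, k_init=2, max_iter=10,
                     split_std=1.5, merge_dist=0.5, seed=0):
        """Toy ISODATA-style clustering: assign/update like k-means, then
        split clusters with large dispersion and merge centers that are
        close. (Illustrative only; the paper's speedups use a kd-tree and
        a modified dispersion estimate, which this sketch omits.)"""
        rng = np.random.default_rng(seed)
        centers = points[rng.choice(len(points), k_init, replace=False)]
        for _ in range(max_iter):
            # Assignment step: nearest center for every point (brute force).
            d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
            labels = d.argmin(axis=1)
            members = [points[labels == j] for j in range(len(centers))]
            members = [m for m in members if len(m)]
            # Update + split: recompute each center; split high-dispersion clusters.
            centers = []
            for m in members:
                c = m.mean(axis=0)
                if len(m) > 1 and m.std(axis=0).max() > split_std:
                    centers += [c + m.std(axis=0), c - m.std(axis=0)]
                else:
                    centers.append(c)
            # Merge step: average together centers closer than merge_dist.
            keep = []
            for c in centers:
                for i, kc in enumerate(keep):
                    if np.linalg.norm(c - kc) < merge_dist:
                        keep[i] = (kc + c) / 2
                        break
                else:
                    keep.append(c)
            centers = np.array(keep)
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        return centers, d.argmin(axis=1)

    # Two well-separated blobs should come out as separate clusters.
    rng = np.random.default_rng(1)
    pts = np.vstack([rng.normal(0, 0.2, (50, 2)), rng.normal(5, 0.2, (50, 2))])
    centers, labels = isodata_like(pts)
    print(len(centers))
    ```

    The expensive part is the brute-force distance matrix in the assignment step, which is exactly where a kd-tree (or the approximate nearest-center variant the paper describes) cuts the running time.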

  10. A Fast Implementation of the Isodata Clustering Algorithm

    NASA Technical Reports Server (NTRS)

    Memarsadeghi, Nargess; Le Moigne, Jacqueline; Mount, David M.; Netanyahu, Nathan S.

    2007-01-01

    Clustering is central to many image processing and remote sensing applications. ISODATA is one of the most popular and widely used clustering methods in geoscience applications, but it can run slowly, particularly with large data sets. We present a more efficient approach to ISODATA clustering, which achieves better running times by storing the points in a kd-tree and through a modification of the way in which the algorithm estimates the dispersion of each cluster. We also present an approximate version of the algorithm which allows the user to further improve the running time, at the expense of lower fidelity in computing the nearest cluster center to each point. We provide both theoretical and empirical justification that our modified approach produces clusterings that are very similar to those produced by the standard ISODATA approach. We also provide empirical studies on both synthetic data and remotely sensed Landsat and MODIS images that show that our approach has significantly lower running times.

  11. New method for estimating clustering of DNA lesions induced by physical/chemical mutagens using fluorescence anisotropy.

    PubMed

    Akamatsu, Ken; Shikazono, Naoya; Saito, Takeshi

    2017-11-01

    We have developed a new method for estimating the localization of DNA damage such as apurinic/apyrimidinic sites (APs) on DNA using fluorescence anisotropy. This method is aimed at characterizing clustered DNA damage produced by DNA-damaging agents such as ionizing radiation and genotoxic chemicals. A fluorescent probe with an aminooxy group (AlexaFluor488) was used to label APs. We prepared a pUC19 plasmid with APs by heating under acidic conditions as a model for damaged DNA, and subsequently labeled the APs. We found that the observed fluorescence anisotropy (r_obs) decreases as the averaged AP density (λ_AP: number of APs per base pair) increases due to homo-FRET, and that the APs were randomly distributed. We applied this method to three DNA-damaging agents: 60Co γ-rays, methyl methanesulfonate (MMS), and neocarzinostatin (NCS). We found that the r_obs-λ_AP relationships differed significantly between MMS and NCS. At low AP density (λ_AP < 0.001), the APs induced by MMS seemed not to be closely distributed, whereas those induced by NCS were remarkably clustered. In contrast, the AP clustering induced by 60Co γ-rays was similar to, but potentially more likely to occur than, a random distribution. This simple method can be used to estimate the mutagenicity of ionizing radiation and genotoxic chemicals. Copyright © 2017 Elsevier Inc. All rights reserved.

  12. Augmenting Satellite Precipitation Estimation with Lightning Information

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mahrooghy, Majid; Anantharaj, Valentine G; Younan, Nicolas H.

    2013-01-01

    We have used lightning information to augment the Precipitation Estimation from Remotely Sensed Imagery using an Artificial Neural Network - Cloud Classification System (PERSIANN-CCS). Co-located lightning data are used to segregate cloud patches, segmented from GOES-12 infrared data, into either electrified (EL) or non-electrified (NEL) patches. A set of features is extracted separately for the EL and NEL cloud patches. The features for the EL cloud patches include new features based on the lightning information. The cloud patches are classified and clustered using self-organizing maps (SOM). Then brightness temperature and rain rate (T-R) relationships are derived for the different clusters. Rain rates are estimated for the cloud patches based on their representative T-R relationship. The Equitable Threat Score (ETS) for daily precipitation estimates is improved by almost 12% for the winter season. In the summer, no significant improvements in ETS are noted.

  13. An improved global dynamic routing strategy for scale-free network with tunable clustering

    NASA Astrophysics Data System (ADS)

    Sun, Lina; Huang, Ning; Zhang, Yue; Bai, Yannan

    2016-08-01

    An efficient routing strategy can deliver packets quickly and thereby improve network capacity. Node congestion and transmission path length are unavoidable real-time factors for a good routing strategy. Existing dynamic global routing strategies consider only the congestion of neighbor nodes and the shortest path, ignoring the congestion of other key nodes on the path. With the development of detection methods and techniques, global traffic information is readily available and important for the routing choice, and reasonable use of this information can effectively improve network routing. We therefore propose an improved global dynamic routing strategy that considers the congestion of all nodes on the shortest path and incorporates the waiting time of the most congested node into the path cost. We investigate the effectiveness of the proposed routing for scale-free networks with different clustering coefficients, comparing it with the shortest-path routing strategy and with a traffic-awareness routing strategy that considers only the waiting time of neighbor nodes. Simulation results show that network capacity is greatly enhanced compared with shortest-path routing, while congestion grows relatively slowly compared with the traffic-awareness strategy. For scale-free networks with tunable clustering, an increase in the clustering coefficient not only reduces network throughput but also increases the average transmission path length. The proposed routing is thus favorable for easing network congestion and for network routing strategy design.
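
    The idea of folding the most congested node's waiting time into the route choice can be sketched as follows. The cost function below (path length plus the maximum queue length along the path) is a hypothetical simplification of the strategy the abstract describes, on a toy unweighted graph:

    ```python
    import collections

    def all_shortest_paths(adj, s, t):
        """Enumerate all shortest s-t paths in an unweighted graph
        (BFS to record shortest-path parents, then backtrack)."""
        dist = {s: 0}
        parents = collections.defaultdict(list)
        q = collections.deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
                if dist.get(v) == dist[u] + 1:
                    parents[v].append(u)
        def backtrack(v):
            if v == s:
                yield [s]
                return
            for p in parents[v]:
                for path in backtrack(p):
                    yield path + [v]
        return list(backtrack(t)) if t in dist else []

    def pick_route(adj, queue_len, s, t):
        """Hypothetical cost, loosely following the abstract: path length
        plus the waiting time (queue length) of the most congested node."""
        paths = all_shortest_paths(adj, s, t)
        return min(paths, key=lambda p: len(p) - 1 + max(queue_len[v] for v in p))

    # Diamond network: two equal-length routes from 0 to 3; node 1 is
    # congested, so the strategy routes through node 2 instead.
    adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
    queue_len = {0: 0, 1: 10, 2: 1, 3: 0}
    print(pick_route(adj, queue_len, 0, 3))   # → [0, 2, 3]
    ```

    A pure shortest-path strategy would treat both routes as equivalent; the congestion-aware cost breaks the tie away from the bottleneck node.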

  14. Identification of symptom and functional domains that fibromyalgia patients would like to see improved: a cluster analysis.

    PubMed

    Bennett, Robert M; Russell, Jon; Cappelleri, Joseph C; Bushmakin, Andrew G; Zlateva, Gergana; Sadosky, Alesia

    2010-06-28

    The purpose of this study was to determine whether some of the clinical features of fibromyalgia (FM) that patients would like to see improved aggregate into definable clusters. Seven hundred and eighty-eight patients with clinically confirmed FM and baseline pain ≥40 mm on a 100 mm visual analogue scale ranked 5 FM clinical features that the subjects would most like to see improved after treatment (one for each priority quintile) from a list of 20 developed during focus groups. For each subject, clinical features were transformed into vectors with rankings assigned values 1-5 (lowest to highest ranking). Logistic analysis was used to create a distance matrix and hierarchical cluster analysis was applied to identify cluster structure. The frequency of cluster selection was determined, and cluster importance was ranked using cluster scores derived from rankings of the clinical features. Multidimensional scaling was used to visualize and conceptualize cluster relationships. Six clinical features clusters were identified and named based on their key characteristics. In order of selection frequency, the clusters were Pain (90%; 4 clinical features), Fatigue (89%; 4 clinical features), Domestic (42%; 4 clinical features), Impairment (29%; 3 functions), Affective (21%; 3 clinical features), and Social (9%; 2 functions). The "Pain Cluster" was ranked of greatest importance by 54% of subjects, followed by Fatigue, which was given the highest ranking by 28% of subjects. Multidimensional scaling mapped these clusters to two dimensions: Status (bounded by Physical and Emotional domains), and Setting (bounded by Individual and Group interactions). Common clinical features of FM could be grouped into 6 clusters (Pain, Fatigue, Domestic, Impairment, Affective, and Social) based on patient perception of relevance to treatment.
Furthermore, these 6 clusters could be charted in the 2 dimensions of Status and Setting, thus providing a unique perspective for interpretation of

  15. First estimates of the fundamental parameters of the relatively bright Galactic open cluster NGC 5288

    NASA Astrophysics Data System (ADS)

    Piatti, Andrés E.; Clariá, Juan J.; Ahumada, Andrea V.

    2006-04-01

    In this paper we present charge-coupled device (CCD) images in the Johnson B and V and Kron-Cousins I passbands for the previously unstudied open cluster NGC 5288. The sample consists of 15688 stars reaching down to V ~ 20.5. The cluster appears to have a relatively small but conspicuous nucleus and a low-density extended coronal region. Star counts carried out in 25 × 25 pixel² boxes distributed throughout the whole observed field allowed us to estimate the angular core and corona radii as ~1.3 and ~6.3 arcmin, respectively. Our analysis suggests that NGC 5288 is moderately young and probably more metal-rich than the Sun. Adopting the theoretical metal content Z = 0.040, which provides the best global fit, we derive an age of 130 (+40/-30) Myr. Simultaneously, we have obtained colour excesses E(B-V) = 0.75 and E(V-I) = 0.95 and an apparent distance modulus V-MV = 14.00. The law of interstellar extinction in the cluster direction is found to be normal. NGC 5288 is located at 2.1 ± 0.3 kpc from the Sun beyond the Carina spiral feature and ~7.4 kpc from the Galactic Centre. The cluster metallicity seems to be compatible with the cluster position in the Galaxy, given the recognized radial abundance gradient in the disc. For the first time, in this paper we determine the basic parameters for the open cluster NGC 5381, situated in the same direction as NGC 5288. This determination was reached by using CCD VI data published almost a decade ago by Pietrzyński et al. (1997) for NGC 5381. The properties of some open clusters aligned along the line of sight of NGC 5288 are examined. The properties of clusters of similar ages to NGC 5288 are also looked into. Evidence is presented that these did not form mainly along the spiral arms but rather in the thin Galactic disc (Z ~ ±100 pc).
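
    As a quick consistency check on the quoted numbers, the apparent distance modulus and colour excess can be converted to a distance, assuming the standard "normal" extinction law A_V = 3.1 E(B-V) that the abstract's normal-extinction finding implies:

    ```python
    # True distance modulus: (V - MV)_0 = (V - MV) - A_V,
    # and distance d [pc] = 10 ** (((V - MV)_0 + 5) / 5).
    m_M = 14.00          # apparent distance modulus V - MV
    ebv = 0.75           # colour excess E(B-V)
    a_v = 3.1 * ebv      # visual extinction under a normal reddening law
    m_M0 = m_M - a_v
    d_kpc = 10 ** ((m_M0 + 5) / 5) / 1000.0
    print(round(d_kpc, 2))   # → 2.16, close to the quoted 2.1 ± 0.3 kpc
    ```

    The agreement with the quoted 2.1 ± 0.3 kpc confirms the internal consistency of the published parameters.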

  16. Weighing Galaxy Clusters with Gas. II. On the Origin of Hydrostatic Mass Bias in ΛCDM Galaxy Clusters

    NASA Astrophysics Data System (ADS)

    Nelson, Kaylea; Lau, Erwin T.; Nagai, Daisuke; Rudd, Douglas H.; Yu, Liang

    2014-02-01

    The use of galaxy clusters as cosmological probes hinges on our ability to measure their masses accurately and with high precision. Hydrostatic mass estimation is one of the most common methods for measuring the masses of individual galaxy clusters, but it suffers from biases due to departures from hydrostatic equilibrium. Using a large, mass-limited sample of massive galaxy clusters from a high-resolution hydrodynamical cosmological simulation, in this work we show that in addition to turbulent and bulk gas velocities, acceleration of gas introduces biases in the hydrostatic mass estimate of galaxy clusters. In unrelaxed clusters, the acceleration bias is comparable to the bias due to non-thermal pressure associated with merger-induced turbulent and bulk gas motions. In relaxed clusters, the mean mass bias due to acceleration is small (≲3%), but the scatter in the mass bias can be reduced by accounting for gas acceleration. Additionally, this acceleration bias is greater in the outskirts of higher redshift clusters where mergers are more frequent and clusters are accreting more rapidly. Since gas acceleration cannot be observed directly, it introduces an irreducible bias for hydrostatic mass estimates. This acceleration bias places limits on how well we can recover cluster masses from future X-ray and microwave observations. We discuss implications for cluster mass estimates based on X-ray, Sunyaev-Zel'dovich effect, and gravitational lensing observations and their impact on cluster cosmology.
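
    For reference, the hydrostatic estimator the abstract refers to is the standard textbook form (not taken from the paper itself), which infers the enclosed mass from the gas density and temperature profiles and is unbiased only when thermal pressure alone balances gravity:

    ```latex
    % Standard hydrostatic mass estimate: mass enclosed within radius r,
    % inferred from gas density \rho_g(r) and temperature T(r).
    M_{\rm HSE}(<r) = -\frac{k_B\, T(r)\, r}{G\, \mu\, m_p}
      \left( \frac{d\ln \rho_g}{d\ln r} + \frac{d\ln T}{d\ln r} \right)
    ```

    Turbulent and bulk motions, and the gas acceleration studied here, all add terms to the momentum balance that this estimator omits, which is the origin of the biases the paper quantifies.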

  17. Planck 2015 results. XXIV. Cosmology from Sunyaev-Zeldovich cluster counts

    NASA Astrophysics Data System (ADS)

    Planck Collaboration; Ade, P. A. R.; Aghanim, N.; Arnaud, M.; Ashdown, M.; Aumont, J.; Baccigalupi, C.; Banday, A. J.; Barreiro, R. B.; Bartlett, J. G.; Bartolo, N.; Battaner, E.; Battye, R.; Benabed, K.; Benoît, A.; Benoit-Lévy, A.; Bernard, J.-P.; Bersanelli, M.; Bielewicz, P.; Bock, J. J.; Bonaldi, A.; Bonavera, L.; Bond, J. R.; Borrill, J.; Bouchet, F. R.; Bucher, M.; Burigana, C.; Butler, R. C.; Calabrese, E.; Cardoso, J.-F.; Catalano, A.; Challinor, A.; Chamballu, A.; Chary, R.-R.; Chiang, H. C.; Christensen, P. R.; Church, S.; Clements, D. L.; Colombi, S.; Colombo, L. P. L.; Combet, C.; Comis, B.; Couchot, F.; Coulais, A.; Crill, B. P.; Curto, A.; Cuttaia, F.; Danese, L.; Davies, R. D.; Davis, R. J.; de Bernardis, P.; de Rosa, A.; de Zotti, G.; Delabrouille, J.; Désert, F.-X.; Diego, J. M.; Dolag, K.; Dole, H.; Donzelli, S.; Doré, O.; Douspis, M.; Ducout, A.; Dupac, X.; Efstathiou, G.; Elsner, F.; Enßlin, T. A.; Eriksen, H. K.; Falgarone, E.; Fergusson, J.; Finelli, F.; Forni, O.; Frailis, M.; Fraisse, A. A.; Franceschi, E.; Frejsel, A.; Galeotta, S.; Galli, S.; Ganga, K.; Giard, M.; Giraud-Héraud, Y.; Gjerløw, E.; González-Nuevo, J.; Górski, K. M.; Gratton, S.; Gregorio, A.; Gruppuso, A.; Gudmundsson, J. E.; Hansen, F. K.; Hanson, D.; Harrison, D. L.; Henrot-Versillé, S.; Hernández-Monteagudo, C.; Herranz, D.; Hildebrandt, S. R.; Hivon, E.; Hobson, M.; Holmes, W. A.; Hornstrup, A.; Hovest, W.; Huffenberger, K. M.; Hurier, G.; Jaffe, A. H.; Jaffe, T. R.; Jones, W. C.; Juvela, M.; Keihänen, E.; Keskitalo, R.; Kisner, T. S.; Kneissl, R.; Knoche, J.; Kunz, M.; Kurki-Suonio, H.; Lagache, G.; Lähteenmäki, A.; Lamarre, J.-M.; Lasenby, A.; Lattanzi, M.; Lawrence, C. R.; Leonardi, R.; Lesgourgues, J.; Levrier, F.; Liguori, M.; Lilje, P. B.; Linden-Vørnle, M.; López-Caniego, M.; Lubin, P. M.; Macías-Pérez, J. F.; Maggio, G.; Maino, D.; Mandolesi, N.; Mangilli, A.; Maris, M.; Martin, P. G.; Martínez-González, E.; Masi, S.; Matarrese, S.; McGehee, P.; Meinhold, P. 
R.; Melchiorri, A.; Melin, J.-B.; Mendes, L.; Mennella, A.; Migliaccio, M.; Mitra, S.; Miville-Deschênes, M.-A.; Moneti, A.; Montier, L.; Morgante, G.; Mortlock, D.; Moss, A.; Munshi, D.; Murphy, J. A.; Naselsky, P.; Nati, F.; Natoli, P.; Netterfield, C. B.; Nørgaard-Nielsen, H. U.; Noviello, F.; Novikov, D.; Novikov, I.; Oxborrow, C. A.; Paci, F.; Pagano, L.; Pajot, F.; Paoletti, D.; Partridge, B.; Pasian, F.; Patanchon, G.; Pearson, T. J.; Perdereau, O.; Perotto, L.; Perrotta, F.; Pettorino, V.; Piacentini, F.; Piat, M.; Pierpaoli, E.; Pietrobon, D.; Plaszczynski, S.; Pointecouteau, E.; Polenta, G.; Popa, L.; Pratt, G. W.; Prézeau, G.; Prunet, S.; Puget, J.-L.; Rachen, J. P.; Rebolo, R.; Reinecke, M.; Remazeilles, M.; Renault, C.; Renzi, A.; Ristorcelli, I.; Rocha, G.; Roman, M.; Rosset, C.; Rossetti, M.; Roudier, G.; Rubiño-Martín, J. A.; Rusholme, B.; Sandri, M.; Santos, D.; Savelainen, M.; Savini, G.; Scott, D.; Seiffert, M. D.; Shellard, E. P. S.; Spencer, L. D.; Stolyarov, V.; Stompor, R.; Sudiwala, R.; Sunyaev, R.; Sutton, D.; Suur-Uski, A.-S.; Sygnet, J.-F.; Tauber, J. A.; Terenzi, L.; Toffolatti, L.; Tomasi, M.; Tristram, M.; Tucci, M.; Tuovinen, J.; Türler, M.; Umana, G.; Valenziano, L.; Valiviita, J.; Van Tent, B.; Vielva, P.; Villa, F.; Wade, L. A.; Wandelt, B. D.; Wehus, I. K.; Weller, J.; White, S. D. M.; Yvon, D.; Zacchei, A.; Zonca, A.

    2016-09-01

    We present cluster counts and corresponding cosmological constraints from the Planck full mission data set. Our catalogue consists of 439 clusters detected via their Sunyaev-Zeldovich (SZ) signal down to a signal-to-noise ratio of 6, and is more than a factor of 2 larger than the 2013 Planck cluster cosmology sample. The counts are consistent with those from 2013 and yield compatible constraints under the same modelling assumptions. Taking advantage of the larger catalogue, we extend our analysis to the two-dimensional distribution in redshift and signal-to-noise. We use mass estimates from two recent studies of gravitational lensing of background galaxies by Planck clusters to provide priors on the hydrostatic bias parameter, (1-b). In addition, we use lensing of cosmic microwave background (CMB) temperature fluctuations by Planck clusters as an independent constraint on this parameter. These various calibrations imply constraints on the present-day amplitude of matter fluctuations in varying degrees of tension with those from the Planck analysis of primary fluctuations in the CMB; for the lowest estimated values of (1-b) the tension is mild, only a little over one standard deviation, while it remains substantial (3.7σ) for the largest estimated value. We also examine constraints on extensions to the base flat ΛCDM model by combining the cluster and CMB constraints. The combination appears to favour non-minimal neutrino masses, but this possibility does little to relieve the overall tension because it simultaneously lowers the implied value of the Hubble parameter, thereby exacerbating the discrepancy with most current astrophysical estimates. Improving the precision of cluster mass calibrations from the current 10%-level to 1% would significantly strengthen these combined analyses and provide a stringent test of the base ΛCDM model.

  18. Planck 2015 results: XXIV. Cosmology from Sunyaev-Zeldovich cluster counts

    DOE PAGES

    Ade, P. A. R.; Aghanim, N.; Arnaud, M.; ...

    2016-09-20

    In this work, we present cluster counts and corresponding cosmological constraints from the Planck full mission data set. Our catalogue consists of 439 clusters detected via their Sunyaev-Zeldovich (SZ) signal down to a signal-to-noise ratio of 6, and is more than a factor of 2 larger than the 2013 Planck cluster cosmology sample. The counts are consistent with those from 2013 and yield compatible constraints under the same modelling assumptions. Taking advantage of the larger catalogue, we extend our analysis to the two-dimensional distribution in redshift and signal-to-noise. We use mass estimates from two recent studies of gravitational lensing of background galaxies by Planck clusters to provide priors on the hydrostatic bias parameter, (1-b). In addition, we use lensing of cosmic microwave background (CMB) temperature fluctuations by Planck clusters as an independent constraint on this parameter. These various calibrations imply constraints on the present-day amplitude of matter fluctuations in varying degrees of tension with those from the Planck analysis of primary fluctuations in the CMB; for the lowest estimated values of (1-b) the tension is mild, only a little over one standard deviation, while it remains substantial (3.7σ) for the largest estimated value. We also examine constraints on extensions to the base flat ΛCDM model by combining the cluster and CMB constraints. The combination appears to favour non-minimal neutrino masses, but this possibility does little to relieve the overall tension because it simultaneously lowers the implied value of the Hubble parameter, thereby exacerbating the discrepancy with most current astrophysical estimates. In conclusion, improving the precision of cluster mass calibrations from the current 10%-level to 1% would significantly strengthen these combined analyses and provide a stringent test of the base ΛCDM model.

  19. Planck 2015 results: XXIV. Cosmology from Sunyaev-Zeldovich cluster counts

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ade, P. A. R.; Aghanim, N.; Arnaud, M.

    In this work, we present cluster counts and corresponding cosmological constraints from the Planck full mission data set. Our catalogue consists of 439 clusters detected via their Sunyaev-Zeldovich (SZ) signal down to a signal-to-noise ratio of 6, and is more than a factor of 2 larger than the 2013 Planck cluster cosmology sample. The counts are consistent with those from 2013 and yield compatible constraints under the same modelling assumptions. Taking advantage of the larger catalogue, we extend our analysis to the two-dimensional distribution in redshift and signal-to-noise. We use mass estimates from two recent studies of gravitational lensing of background galaxies by Planck clusters to provide priors on the hydrostatic bias parameter, (1-b). In addition, we use lensing of cosmic microwave background (CMB) temperature fluctuations by Planck clusters as an independent constraint on this parameter. These various calibrations imply constraints on the present-day amplitude of matter fluctuations in varying degrees of tension with those from the Planck analysis of primary fluctuations in the CMB; for the lowest estimated values of (1-b) the tension is mild, only a little over one standard deviation, while it remains substantial (3.7σ) for the largest estimated value. We also examine constraints on extensions to the base flat ΛCDM model by combining the cluster and CMB constraints. The combination appears to favour non-minimal neutrino masses, but this possibility does little to relieve the overall tension because it simultaneously lowers the implied value of the Hubble parameter, thereby exacerbating the discrepancy with most current astrophysical estimates. In conclusion, improving the precision of cluster mass calibrations from the current 10%-level to 1% would significantly strengthen these combined analyses and provide a stringent test of the base ΛCDM model.

  20. Regression analysis of clustered failure time data with informative cluster size under the additive transformation models.

    PubMed

    Chen, Ling; Feng, Yanqin; Sun, Jianguo

    2017-10-01

    This paper discusses regression analysis of clustered failure time data, which occur when the failure times of interest are collected from clusters. In particular, we consider the situation where the correlated failure times of interest may be related to cluster sizes. For inference, we present two estimation procedures, the weighted estimating equation-based method and the within-cluster resampling-based method, when the correlated failure times of interest arise from a class of additive transformation models. The former uses the inverse of cluster sizes as weights in the estimating equations, while the latter can be easily implemented with existing software packages for right-censored failure time data. An extensive simulation study is conducted and indicates that the proposed approaches work well in situations both with and without informative cluster size. They are applied to the dental study that motivated this work.
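
    The effect of inverse-cluster-size weighting is easy to see with a toy simulation (my own illustration, not the paper's transformation models): when larger clusters have systematically larger outcomes, a naive pooled mean over-weights them, while weighting each observation by the inverse of its cluster size gives every cluster equal influence:

    ```python
    import numpy as np

    # Informative cluster size: outcome level grows with cluster size.
    rng = np.random.default_rng(0)
    sizes = rng.integers(1, 20, 200)
    clusters = [size * 1.0 + rng.normal(0, 0.1, size) for size in sizes]

    pooled = np.mean(np.concatenate(clusters))        # over-weights big clusters
    # Inverse-size weighting: each observation gets weight 1/n_i, which is
    # equivalent to averaging the per-cluster means.
    weighted = np.mean([c.mean() for c in clusters])

    print(pooled, weighted)
    ```

    Here the pooled mean lands near 13 while the inverse-size-weighted mean recovers the cluster-level average of about 10, mirroring why the weights matter when cluster size is informative.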

  1. Effects of additional data on Bayesian clustering.

    PubMed

    Yamazaki, Keisuke

    2017-10-01

    Hierarchical probabilistic models, such as mixture models, are used for cluster analysis. These models have two types of variables: observable and latent. In cluster analysis, the latent variable is estimated, and it is expected that additional information will improve the accuracy of the estimation of the latent variable. Many proposed learning methods are able to use additional data; these include semi-supervised learning and transfer learning. However, from a statistical point of view, a complex probabilistic model that encompasses both the initial and additional data might be less accurate due to having a higher-dimensional parameter. The present paper presents a theoretical analysis of the accuracy of such a model and clarifies which factor has the greatest effect on its accuracy, the advantages of obtaining additional data, and the disadvantages of increasing the complexity. Copyright © 2017 Elsevier Ltd. All rights reserved.

  2. Improvements of Quantum Private Comparison Protocol Based on Cluster States

    NASA Astrophysics Data System (ADS)

    Zhou, Ming-Kuai

    2018-01-01

    Quantum private comparison aims to determine whether the secrets from two different users are equal or not by utilizing the laws of quantum mechanics. Recently, Sun and Long put forward a quantum private comparison (QPC) protocol by using four-particle cluster states (Int. J. Theor. Phys. 52, 212-218, 2013). In this paper, we investigate this protocol in depth, and suggest the corresponding improvements. Compared with the original protocol, the improved protocol has the following advantages: 1) it removes the requirement for authenticated classical channels and unitary operations; 2) it prevents malicious attacks from the genuine semi-honest TP; 3) it enhances the qubit efficiency.

  3. Hybrid Tracking Algorithm Improvements and Cluster Analysis Methods.

    DTIC Science & Technology

    1982-02-26

    UPGMA), and Ward's method. Ling's papers describe a (k,r) clustering method. Each of these methods has individual characteristics which make them...Reference 7), UPGMA is probably the most frequently used clustering strategy. UPGMA tries to group new points into an existing cluster by using an

  4. Spatial cluster detection using dynamic programming.

    PubMed

    Sverchkov, Yuriy; Jiang, Xia; Cooper, Gregory F

    2012-03-25

    The task of spatial cluster detection involves finding spatial regions where some property deviates from the norm or the expected value. In a probabilistic setting this task can be expressed as finding a region where some event is significantly more likely than usual. Spatial cluster detection is of interest in fields such as biosurveillance, mining of astronomical data, military surveillance, and analysis of fMRI images. In almost all such applications we are interested both in the question of whether a cluster exists in the data, and if it exists, we are interested in finding the most accurate characterization of the cluster. We present a general dynamic programming algorithm for grid-based spatial cluster detection. The algorithm can be used for both Bayesian maximum a-posteriori (MAP) estimation of the most likely spatial distribution of clusters and Bayesian model averaging over a large space of spatial cluster distributions to compute the posterior probability of an unusual spatial clustering. The algorithm is explained and evaluated in the context of a biosurveillance application, specifically the detection and identification of influenza outbreaks based on emergency department visits. A relatively simple underlying model is constructed for the purpose of evaluating the algorithm, and the algorithm is evaluated using the model and semi-synthetic test data. When compared to baseline methods, tests indicate that the new algorithm can improve MAP estimates under certain conditions: the greedy algorithm we compared our method to was found to be more sensitive to smaller outbreaks, while as the size of the outbreaks increases, in terms of area affected and proportion of individuals affected, our method overtakes the greedy algorithm in spatial precision and recall. The new algorithm performs on par with baseline methods in the task of Bayesian model averaging. We conclude that the dynamic programming algorithm performs on par with other available methods for
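
    To make the grid-based detection task concrete, the brute-force baseline that dynamic-programming approaches accelerate can be sketched as scanning every axis-aligned rectangle and scoring it by a Poisson log-likelihood ratio of observed against expected counts. This is my own illustrative baseline, not the paper's algorithm:

    ```python
    import itertools
    import math

    def best_rectangle(counts, baseline):
        """Brute-force spatial scan over all axis-aligned rectangles
        [r0, r1) x [c0, c1) in a grid: score each region by a Poisson
        log-likelihood ratio and return the best-scoring rectangle."""
        rows, cols = len(counts), len(counts[0])

        def region_score(r0, r1, c0, c1):
            obs = sum(counts[r][c] for r in range(r0, r1) for c in range(c0, c1))
            exp = sum(baseline[r][c] for r in range(r0, r1) for c in range(c0, c1))
            if obs <= exp:
                return 0.0          # only elevated regions are of interest
            return obs * math.log(obs / exp) - (obs - exp)

        return max(((r0, r1, c0, c1)
                    for r0, r1 in itertools.combinations(range(rows + 1), 2)
                    for c0, c1 in itertools.combinations(range(cols + 1), 2)),
                   key=lambda box: region_score(*box))

    # A 4x4 grid with uniform expected counts and an elevated 2x2 block.
    baseline = [[1.0] * 4 for _ in range(4)]
    counts = [[1, 1, 1, 1],
              [1, 9, 9, 1],
              [1, 9, 9, 1],
              [1, 1, 1, 1]]
    print(best_rectangle(counts, baseline))   # → (1, 3, 1, 3)
    ```

    The scan is O(n²m²) region evaluations on an n×m grid; the appeal of the dynamic-programming formulation is that it shares work across overlapping regions and extends naturally to model averaging over many cluster configurations.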

  5. Spatial cluster detection using dynamic programming

    PubMed Central

    2012-01-01

    Background The task of spatial cluster detection involves finding spatial regions where some property deviates from the norm or the expected value. In a probabilistic setting this task can be expressed as finding a region where some event is significantly more likely than usual. Spatial cluster detection is of interest in fields such as biosurveillance, mining of astronomical data, military surveillance, and analysis of fMRI images. In almost all such applications we are interested both in the question of whether a cluster exists in the data, and if it exists, we are interested in finding the most accurate characterization of the cluster. Methods We present a general dynamic programming algorithm for grid-based spatial cluster detection. The algorithm can be used for both Bayesian maximum a-posteriori (MAP) estimation of the most likely spatial distribution of clusters and Bayesian model averaging over a large space of spatial cluster distributions to compute the posterior probability of an unusual spatial clustering. The algorithm is explained and evaluated in the context of a biosurveillance application, specifically the detection and identification of influenza outbreaks based on emergency department visits. A relatively simple underlying model is constructed for the purpose of evaluating the algorithm, and the algorithm is evaluated using the model and semi-synthetic test data. Results When compared to baseline methods, tests indicate that the new algorithm can improve MAP estimates under certain conditions: the greedy algorithm we compared our method to was found to be more sensitive to smaller outbreaks, while as the size of the outbreaks increases, in terms of area affected and proportion of individuals affected, our method overtakes the greedy algorithm in spatial precision and recall. The new algorithm performs on par with baseline methods in the task of Bayesian model averaging. Conclusions We conclude that the dynamic programming algorithm performs on

  6. Locally Weighted Ensemble Clustering.

    PubMed

    Huang, Dong; Wang, Chang-Dong; Lai, Jian-Huang

    2018-05-01

Due to its ability to combine multiple base clusterings into a probably better and more robust clustering, the ensemble clustering technique has been attracting increasing attention in recent years. Despite the significant success, one limitation of most existing ensemble clustering methods is that they generally treat all base clusterings equally regardless of their reliability, which makes them vulnerable to low-quality base clusterings. Although some efforts have been made to (globally) evaluate and weight the base clusterings, these methods tend to treat each base clustering as a whole and neglect the local diversity of clusters inside the same base clustering. It remains an open problem how to evaluate the reliability of clusters and exploit the local diversity in the ensemble to enhance the consensus performance, especially when there is no access to data features or specific assumptions on data distribution. To address this, in this paper, we propose a novel ensemble clustering approach based on ensemble-driven cluster uncertainty estimation and a local weighting strategy. In particular, the uncertainty of each cluster is estimated by considering the cluster labels in the entire ensemble via an entropic criterion. A novel ensemble-driven cluster validity measure is introduced, and a locally weighted co-association matrix is presented to serve as a summary for the ensemble of diverse clusters. With the local diversity in ensembles exploited, two novel consensus functions are further proposed. Extensive experiments on a variety of real-world datasets demonstrate the superiority of the proposed approach over the state-of-the-art.
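The two ingredients described above can be sketched under assumed simplifications (the paper's exact validity measure and consensus functions differ): estimate each cluster's uncertainty with an entropic criterion over the ensemble, then build a co-association matrix in which each cluster's vote is down-weighted by that uncertainty.

```python
import numpy as np

def cluster_entropy(members, base_clusterings):
    """Entropy of one cluster w.r.t. the ensemble: how consistently its
    members stay together across the base clusterings. A simplified
    reading of the paper's entropic criterion."""
    h = 0.0
    for labels in base_clusterings:
        _, counts = np.unique(labels[members], return_counts=True)
        p = counts / counts.sum()
        h += -(p * np.log2(p)).sum()
    return h / len(base_clusterings)

def weighted_co_association(base_clusterings, theta=1.0):
    """Locally weighted co-association matrix: each cluster's vote is
    down-weighted by its ensemble-driven uncertainty (entropy).
    theta is an assumed temperature-like parameter."""
    base_clusterings = [np.asarray(l) for l in base_clusterings]
    n = len(base_clusterings[0])
    A = np.zeros((n, n))
    for labels in base_clusterings:
        for c in np.unique(labels):
            members = np.where(labels == c)[0]
            w = np.exp(-cluster_entropy(members, base_clusterings) / theta)
            A[np.ix_(members, members)] += w
    return A / len(base_clusterings)
```

With perfectly agreeing base clusterings every cluster has zero entropy, so the matrix reduces to the classic unweighted co-association matrix.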

  7. A Hierarchical Clustering Methodology for the Estimation of Toxicity

    EPA Science Inventory

    A Quantitative Structure Activity Relationship (QSAR) methodology based on hierarchical clustering was developed to predict toxicological endpoints. This methodology utilizes Ward's method to divide a training set into a series of structurally similar clusters. The structural sim...

  8. Under What Circumstances Does External Knowledge about the Correlation Structure Improve Power in Cluster Randomized Designs?

    ERIC Educational Resources Information Center

    Rhoads, Christopher

    2014-01-01

    Recent publications have drawn attention to the idea of utilizing prior information about the correlation structure to improve statistical power in cluster randomized experiments. Because power in cluster randomized designs is a function of many different parameters, it has been difficult for applied researchers to discern a simple rule explaining…

  9. Information Filtering via Clustering Coefficients of User-Object Bipartite Networks

    NASA Astrophysics Data System (ADS)

    Guo, Qiang; Leng, Rui; Shi, Kerui; Liu, Jian-Guo

The clustering coefficient of user-object bipartite networks is presented to evaluate the overlap percentage of neighbors rating lists, which could be used to measure interest correlations among neighbor sets. The collaborative filtering (CF) information filtering algorithm evaluates a given user's interests in terms of his/her friends' opinions, which has become one of the most successful technologies for recommender systems. In this paper, different from the object clustering coefficient, users' clustering coefficients of user-object bipartite networks are introduced to improve the user similarity measurement. Numerical results for MovieLens and Netflix data sets show that users' clustering effects could enhance the algorithm performance. For the MovieLens data set, the algorithmic accuracy, measured by the average ranking score, can be improved by 12.0% and the diversity could be improved by 18.2% and reach 0.649 when the recommendation list length equals 50. For the Netflix data set, the accuracy could be improved by 14.5% at the optimal case and the popularity could be reduced by 13.4% compared with the standard CF algorithm. Finally, we investigate the sparsity effect on the performance. This work indicates that the user clustering coefficient is an effective factor for measuring user similarity; moreover, statistical properties of user-object bipartite networks should be investigated to estimate users' tastes.
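A user's bipartite clustering coefficient can be sketched as the mean overlap between that user's rated-object set and the sets of users who share at least one object with them. The Jaccard form below is one common choice; the paper's exact formula may differ, and the function name is illustrative.

```python
def user_clustering_coefficient(user_items, u):
    """Bipartite clustering coefficient of user u: mean Jaccard overlap
    between u's rated-object set and those of each user sharing at least
    one object with u. A common definition, assumed here for
    illustration; user_items maps user id -> set of rated object ids."""
    items_u = user_items[u]
    overlaps = []
    for v, items_v in user_items.items():
        if v == u:
            continue
        inter = items_u & items_v
        if inter:  # only users who share at least one object
            overlaps.append(len(inter) / len(items_u | items_v))
    return sum(overlaps) / len(overlaps) if overlaps else 0.0
```

Two users with identical rating lists give a coefficient of 1; users with no overlapping neighbors give 0.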

  10. Cluster management.

    PubMed

    Katz, R

    1992-11-01

    Cluster management is a management model that fosters decentralization of management, develops leadership potential of staff, and creates ownership of unit-based goals. Unlike shared governance models, there is no formal structure created by committees and it is less threatening for managers. There are two parts to the cluster management model. One is the formation of cluster groups, consisting of all staff and facilitated by a cluster leader. The cluster groups function for communication and problem-solving. The second part of the cluster management model is the creation of task forces. These task forces are designed to work on short-term goals, usually in response to solving one of the unit's goals. Sometimes the task forces are used for quality improvement or system problems. Clusters are groups of not more than five or six staff members, facilitated by a cluster leader. A cluster is made up of individuals who work the same shift. For example, people with job titles who work days would be in a cluster. There would be registered nurses, licensed practical nurses, nursing assistants, and unit clerks in the cluster. The cluster leader is chosen by the manager based on certain criteria and is trained for this specialized role. The concept of cluster management, criteria for choosing leaders, training for leaders, using cluster groups to solve quality improvement issues, and the learning process necessary for manager support are described.

  11. Weighing galaxy clusters with gas. II. On the origin of hydrostatic mass bias in ΛCDM galaxy clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nelson, Kaylea; Nagai, Daisuke; Yu, Liang

    2014-02-20

The use of galaxy clusters as cosmological probes hinges on our ability to measure their masses accurately and with high precision. Hydrostatic mass is one of the most common methods for estimating the masses of individual galaxy clusters, but it suffers from biases due to departures from hydrostatic equilibrium. Using a large, mass-limited sample of massive galaxy clusters from a high-resolution hydrodynamical cosmological simulation, in this work we show that, in addition to turbulent and bulk gas velocities, acceleration of gas introduces biases in the hydrostatic mass estimate of galaxy clusters. In unrelaxed clusters, the acceleration bias is comparable to the bias due to non-thermal pressure associated with merger-induced turbulent and bulk gas motions. In relaxed clusters, the mean mass bias due to acceleration is small (≲ 3%), but the scatter in the mass bias can be reduced by accounting for gas acceleration. Additionally, this acceleration bias is greater in the outskirts of higher redshift clusters, where mergers are more frequent and clusters are accreting more rapidly. Since gas acceleration cannot be observed directly, it introduces an irreducible bias for hydrostatic mass estimates. This acceleration bias places limits on how well we can recover cluster masses from future X-ray and microwave observations. We discuss implications for cluster mass estimates based on X-ray, Sunyaev-Zel'dovich effect, and gravitational lensing observations and their impact on cluster cosmology.
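The hydrostatic mass estimator whose biases are quantified here assumes spherical symmetry and purely thermal pressure support: M(<r) = -k_B T(r) r / (G mu m_p) * (dln n_g/dln r + dln T/dln r). A minimal numerical sketch (the profile arrays and mu = 0.59 are assumptions):

```python
import numpy as np

# Physical constants (SI)
G = 6.674e-11    # gravitational constant, m^3 kg^-1 s^-2
K_B = 1.381e-23  # Boltzmann constant, J K^-1
M_P = 1.673e-27  # proton mass, kg
MU = 0.59        # assumed mean molecular weight of the ionized ICM

def hydrostatic_mass(r, n_gas, T):
    """Hydrostatic mass profile M(<r) from gas density and temperature
    profiles (SI units), assuming spherical symmetry and no non-thermal
    pressure: the standard estimator whose acceleration bias the
    paper quantifies."""
    ln_r = np.log(r)
    dln_n = np.gradient(np.log(n_gas), ln_r)  # logarithmic density slope
    dln_T = np.gradient(np.log(T), ln_r)      # logarithmic temperature slope
    return -K_B * T * r / (G * MU * M_P) * (dln_n + dln_T)
```

As a sanity check, an isothermal gas with n_gas proportional to r^-2 gives M(<r) = 2 k_B T r / (G mu m_p) exactly.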

  12. High-frequency cluster radio galaxies: Luminosity functions and implications for SZE-selected cluster samples

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gupta, Nikhel; Saro, A.; Mohr, J. J.

We study the overdensity of point sources in the direction of X-ray-selected galaxy clusters from the meta-catalogue of X-ray-detected clusters of galaxies (MCXC; <z> = 0.14) at South Pole Telescope (SPT) and Sydney University Molonglo Sky Survey (SUMSS) frequencies. Flux densities at 95, 150 and 220 GHz are extracted from the 2500 deg² SPT-SZ survey maps at the locations of SUMSS sources, producing a multifrequency catalogue of radio galaxies. In the direction of massive galaxy clusters, the radio galaxy flux densities at 95 and 150 GHz are biased low by the cluster Sunyaev–Zel'dovich Effect (SZE) signal, which is negative at these frequencies. We employ a cluster SZE model to remove the expected flux bias and then study these corrected source catalogues. We find that the high-frequency radio galaxies are centrally concentrated within the clusters and that their luminosity functions (LFs) exhibit amplitudes that are characteristically an order of magnitude lower than the cluster LF at 843 MHz. We use the 150 GHz LF to estimate the impact of cluster radio galaxies on an SPT-SZ-like survey. The radio galaxy flux typically produces a small bias on the SZE signal and has negligible impact on the observed scatter in the SZE mass–observable relation. If we assume there is no redshift evolution in the radio galaxy LF, then 1.8 ± 0.7 per cent of the clusters with detection significance ξ ≥ 4.5 would be lost from the sample. Allowing for redshift evolution of the form (1 + z)^2.5 increases the incompleteness to 5.6 ± 1.0 per cent. Improved constraints on the evolution of the cluster radio galaxy LF require a larger cluster sample extending to higher redshift.

  13. High-frequency cluster radio galaxies: Luminosity functions and implications for SZE-selected cluster samples

    DOE PAGES

    Gupta, Nikhel; Saro, A.; Mohr, J. J.; ...

    2017-01-15

We study the overdensity of point sources in the direction of X-ray-selected galaxy clusters from the meta-catalogue of X-ray-detected clusters of galaxies (MCXC; <z> = 0.14) at South Pole Telescope (SPT) and Sydney University Molonglo Sky Survey (SUMSS) frequencies. Flux densities at 95, 150 and 220 GHz are extracted from the 2500 deg² SPT-SZ survey maps at the locations of SUMSS sources, producing a multifrequency catalogue of radio galaxies. In the direction of massive galaxy clusters, the radio galaxy flux densities at 95 and 150 GHz are biased low by the cluster Sunyaev–Zel'dovich Effect (SZE) signal, which is negative at these frequencies. We employ a cluster SZE model to remove the expected flux bias and then study these corrected source catalogues. We find that the high-frequency radio galaxies are centrally concentrated within the clusters and that their luminosity functions (LFs) exhibit amplitudes that are characteristically an order of magnitude lower than the cluster LF at 843 MHz. We use the 150 GHz LF to estimate the impact of cluster radio galaxies on an SPT-SZ-like survey. The radio galaxy flux typically produces a small bias on the SZE signal and has negligible impact on the observed scatter in the SZE mass–observable relation. If we assume there is no redshift evolution in the radio galaxy LF, then 1.8 ± 0.7 per cent of the clusters with detection significance ξ ≥ 4.5 would be lost from the sample. Allowing for redshift evolution of the form (1 + z)^2.5 increases the incompleteness to 5.6 ± 1.0 per cent. Improved constraints on the evolution of the cluster radio galaxy LF require a larger cluster sample extending to higher redshift.

  14. The cluster-cluster correlation function [of galaxies]

    NASA Technical Reports Server (NTRS)

    Postman, M.; Geller, M. J.; Huchra, J. P.

    1986-01-01

The clustering properties of the Abell and Zwicky cluster catalogs are studied using the two-point angular and spatial correlation functions. The catalogs are divided into eight subsamples to determine the dependence of the correlation function on distance, richness, and the method of cluster identification. It is found that the Corona Borealis supercluster contributes significant power to the spatial correlation function of the Abell cluster sample with distance class four or less. The distance-limited catalog of 152 Abell clusters, which is not greatly affected by a single system, has a spatial correlation function consistent with the power law ξ(r) = 300 r^(-1.8). In both the distance class four or less and the distance-limited samples, the signal in the spatial correlation function is a power law detectable out to 60 h⁻¹ Mpc. The amplitude of ξ(r) for clusters of richness class two is about three times that for richness class one clusters. The two-point spatial correlation function is sensitive to the use of estimated redshifts.
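The power law quoted above implies a correlation length r0 (the scale where ξ drops to unity) of 300^(1/1.8), roughly 24 h⁻¹ Mpc. A two-line sketch, with r in units of h⁻¹ Mpc:

```python
def xi(r):
    """Cluster-cluster two-point correlation function from the abstract:
    xi(r) = 300 * r**-1.8, with r in units of h^-1 Mpc."""
    return 300.0 * r ** -1.8

# Correlation length r0: the scale where xi(r0) = 1,
# i.e. r0 = 300**(1/1.8), roughly 24 h^-1 Mpc.
r0 = 300.0 ** (1.0 / 1.8)
```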

  15. Progeny Clustering: A Method to Identify Biological Phenotypes

    PubMed Central

    Hu, Chenyue W.; Kornblau, Steven M.; Slater, John H.; Qutub, Amina A.

    2015-01-01

Estimating the optimal number of clusters is a major challenge in applying cluster analysis to any type of dataset, especially to biomedical datasets, which are high-dimensional and complex. Here, we introduce an improved method, Progeny Clustering, which is stability-based and computationally efficient, to find the ideal number of clusters. The algorithm employs a novel Progeny Sampling method to reconstruct cluster identity, a co-occurrence probability matrix to assess clustering stability, and a set of reference datasets to overcome inherent biases in the algorithm and data space. Our method was shown to be successful and robust when applied to two synthetic datasets (a two-dimensional dataset and a ten-dimensional dataset containing eight dimensions of pure noise), two standard biological datasets (the Iris dataset and the Rat CNS dataset) and two biological datasets (a cell phenotype dataset and an acute myeloid leukemia (AML) reverse phase protein array (RPPA) dataset). Progeny Clustering outperformed some popular clustering evaluation methods in the ten-dimensional synthetic dataset as well as in the cell phenotype dataset, and it was the only method that successfully discovered clinically meaningful patient groupings in the AML RPPA dataset. PMID:26267476
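The co-occurrence probability matrix at the heart of stability-based methods like this can be sketched as follows; the stability score below is an illustrative simplification, not the paper's exact criterion.

```python
import numpy as np

def co_occurrence(labelings):
    """Co-occurrence probability matrix: the fraction of replicate
    clusterings in which each pair of items lands in the same cluster.
    labelings: R replicate label vectors over the same n items."""
    labelings = np.asarray(labelings)
    R, n = labelings.shape
    M = np.zeros((n, n))
    for labels in labelings:
        M += labels[:, None] == labels[None, :]
    return M / R

def stability_score(M):
    """Stable clusterings drive co-occurrence entries toward 0 or 1; a
    simple score (not the paper's exact measure) is twice the mean
    distance of off-diagonal entries from 0.5: 1 = perfectly stable."""
    off = M[~np.eye(len(M), dtype=bool)]
    return np.mean(np.abs(off - 0.5)) * 2
```

Scanning candidate cluster numbers k and picking the k with the highest stability score is the generic recipe; Progeny Clustering additionally corrects the score against reference datasets.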

  16. Observed intra-cluster correlation coefficients in a cluster survey sample of patient encounters in general practice in Australia

    PubMed Central

    Knox, Stephanie A; Chondros, Patty

    2004-01-01

    Background Cluster sample study designs are cost effective, however cluster samples violate the simple random sample assumption of independence of observations. Failure to account for the intra-cluster correlation of observations when sampling through clusters may lead to an under-powered study. Researchers therefore need estimates of intra-cluster correlation for a range of outcomes to calculate sample size. We report intra-cluster correlation coefficients observed within a large-scale cross-sectional study of general practice in Australia, where the general practitioner (GP) was the primary sampling unit and the patient encounter was the unit of inference. Methods Each year the Bettering the Evaluation and Care of Health (BEACH) study recruits a random sample of approximately 1,000 GPs across Australia. Each GP completes details of 100 consecutive patient encounters. Intra-cluster correlation coefficients were estimated for patient demographics, morbidity managed and treatments received. Intra-cluster correlation coefficients were estimated for descriptive outcomes and for associations between outcomes and predictors and were compared across two independent samples of GPs drawn three years apart. Results Between April 1999 and March 2000, a random sample of 1,047 Australian general practitioners recorded details of 104,700 patient encounters. Intra-cluster correlation coefficients for patient demographics ranged from 0.055 for patient sex to 0.451 for language spoken at home. Intra-cluster correlations for morbidity variables ranged from 0.005 for the management of eye problems to 0.059 for management of psychological problems. Intra-cluster correlation for the association between two variables was smaller than the descriptive intra-cluster correlation of each variable. When compared with the April 2002 to March 2003 sample (1,008 GPs) the estimated intra-cluster correlation coefficients were found to be consistent across samples. Conclusions The demonstrated
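The intra-cluster correlation coefficients reported here are commonly estimated from a one-way ANOVA decomposition. A minimal sketch for balanced clusters (equal numbers of encounters per GP); the paper's estimator for unbalanced data and for associations differs:

```python
import numpy as np

def icc_oneway(groups):
    """One-way ANOVA estimator of the intra-cluster correlation:
    rho = (MSB - MSW) / (MSB + (m - 1) * MSW),
    where MSB/MSW are the between/within-cluster mean squares and m is
    the cluster size. Assumes equal-sized clusters."""
    k = len(groups)
    m = len(groups[0])  # assumed common cluster size
    grand = np.mean([x for g in groups for x in g])
    msb = m * sum((np.mean(g) - grand) ** 2 for g in groups) / (k - 1)
    msw = sum((x - np.mean(g)) ** 2 for g in groups for x in g) / (k * (m - 1))
    return (msb - msw) / (msb + (m - 1) * msw)
```

When clusters differ strongly in their means relative to within-cluster noise, the estimate approaches 1; near-identical clusters give an estimate near (or slightly below) 0.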

  17. Improved population estimates through the use of auxiliary information

    USGS Publications Warehouse

    Johnson, D.H.; Ralph, C.J.; Scott, J.M.

    1981-01-01

    When estimating the size of a population of birds, the investigator may have, in addition to an estimator based on a statistical sample, information on one of several auxiliary variables, such as: (1) estimates of the population made on previous occasions, (2) measures of habitat variables associated with the size of the population, and (3) estimates of the population sizes of other species that correlate with the species of interest. Although many studies have described the relationships between each of these kinds of data and the population size to be estimated, very little work has been done to improve the estimator by incorporating such auxiliary information. A statistical methodology termed 'empirical Bayes' seems to be appropriate to these situations. The potential that empirical Bayes methodology has for improved estimation of the population size of the Mallard (Anas platyrhynchos) is explored. In the example considered, three empirical Bayes estimators were found to reduce the error by one-fourth to one-half of that of the usual estimator.
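The empirical Bayes idea is to shrink each noisy survey estimate toward a prior mean by a weight that reflects how much of the observed spread is signal versus sampling error. The sketch below is a generic James-Stein-style shrinkage toward the common mean, not the Mallard-specific estimators evaluated in the paper; all names are illustrative.

```python
import numpy as np

def eb_shrink(estimates, se, prior_mean=None):
    """Empirical Bayes shrinkage of noisy unit-level estimates toward a
    common mean. se: per-estimate standard errors. The between-unit
    variance tau^2 is estimated by method of moments."""
    estimates = np.asarray(estimates, float)
    se = np.asarray(se, float)
    if prior_mean is None:
        prior_mean = estimates.mean()
    # Method-of-moments estimate of between-unit (signal) variance
    tau2 = max(estimates.var(ddof=1) - np.mean(se ** 2), 0.0)
    w = tau2 / (tau2 + se ** 2)  # shrinkage weight: 1 = trust the data
    return w * estimates + (1 - w) * prior_mean
```

With estimates [10, 20, 30] and equal standard errors of 5, the sample variance is 100, the estimated signal variance 75, so each estimate keeps 75% of its distance from the mean.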

  18. A cluster-randomised quality improvement study to improve two inpatient stroke quality indicators.

    PubMed

    Williams, Linda; Daggett, Virginia; Slaven, James E; Yu, Zhangsheng; Sager, Danielle; Myers, Jennifer; Plue, Laurie; Woodward-Hagg, Heather; Damush, Teresa M

    2016-04-01

Quality indicator collection and feedback improves stroke care. We sought to determine whether quality improvement training plus indicator feedback was more effective than indicator feedback alone in improving inpatient stroke indicators. We conducted a cluster-randomised quality improvement trial, randomising hospitals to quality improvement training plus indicator feedback versus indicator feedback alone to improve deep vein thrombosis (DVT) prophylaxis and dysphagia screening. Intervention sites received collaborative-based quality improvement training, external facilitation and indicator feedback. Control sites received only indicator feedback. We compared indicators across the pre-implementation (pre-I), active implementation (active-I) and post-implementation (post-I) periods. We constructed mixed-effect logistic models of the two indicators with a random intercept for hospital effect, adjusting for patient, time, intervention and hospital variables. Patients at intervention sites (1147 admissions) had similar race, gender and National Institutes of Health Stroke Scale scores to control sites (1017 admissions). DVT prophylaxis improved more in intervention sites during the active-I period (ratio of ORs 4.90, p<0.001), but did not differ in the post-I period. Dysphagia screening improved similarly in both groups during active-I, but control sites improved more in the post-I period (ratio of ORs 0.67, p=0.04). In logistic models, the intervention was independently positively associated with DVT performance during the active-I period, and negatively associated with dysphagia performance in the post-I period. Quality improvement training was associated with early DVT improvement, but the effect was not sustained over time and was not seen with dysphagia screening. External quality improvement programmes may quickly boost performance, but their effect may vary by indicator and may not be sustained over time.

  19. Behaviour change intervention to improve shared toilet maintenance and cleanliness in urban slums of Dhaka: a cluster-randomised controlled trial.

    PubMed

    Alam, Mahbub-Ul; Winch, Peter J; Saxton, Ronald E; Nizame, Fosiul A; Yeasmin, Farzana; Norman, Guy; Masud, Abdullah-Al; Begum, Farzana; Rahman, Mahbubur; Hossain, Kamal; Layden, Anita; Unicomb, Leanne; Luby, Stephen P

    2017-08-01

Shared toilets in urban slums are often unclean and poorly maintained, discouraging consistent use and thereby limiting impacts on health and quality of life. We developed behaviour change interventions to support shared toilet maintenance and improve user satisfaction. We report the intervention effectiveness on improving shared toilet cleanliness. We conducted a cluster-randomised controlled trial among users of 1226 shared toilets in 23 Dhaka slums. We assessed baseline toilet cleanliness in January 2015. The six-month intervention included provision of hardware (bin for solid waste, 4 l flushing bucket, 70 l water reservoir), and behaviour change communication (compound meetings, interpersonal household sessions, signs depicting rules for toilet use). We estimated the adjusted difference in difference (DID) to assess outcomes and accounted for clustering effects using generalised estimating equations. Compared to controls, intervention toilets were more likely to have water available inside toilet cubicles (DID: +4.7%, 95% CI: 0.2, 9.2), access to brush/broom for cleaning (DID: +8.4%, 95% CI: 2, 15) and waste bins (DID: +63%, 95% CI: 59, 66), while less likely to have visible faeces inside the pan (DID: -13%, 95% CI: -19, -5), the smell of faeces (DID: -7.6%, 95% CI: -14, -1.3) and household waste inside the cubicle (DID: -4%, 95% CI: -7, -1). In one of the few efforts to promote shared toilet cleanliness, intervention compounds were significantly more likely to have cleaner toilets after six months. Future research might explore how residents can self-finance toilet maintenance, or employ mass media to reduce per-capita costs of behaviour change. © 2017 John Wiley & Sons Ltd.
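The unadjusted difference-in-difference behind the estimates above is simply the change in the intervention arm minus the change in the control arm (the trial additionally adjusted for covariates and clustering with generalised estimating equations). The numbers below are hypothetical, chosen only to illustrate the arithmetic:

```python
def diff_in_diff(pre_treat, post_treat, pre_ctrl, post_ctrl):
    """Unadjusted difference-in-difference for a proportion outcome:
    (change under intervention) - (change under control)."""
    return (post_treat - pre_treat) - (post_ctrl - pre_ctrl)

# Hypothetical example: waste-bin availability rises from 10% to 75%
# in intervention compounds and from 12% to 14% in control compounds,
# giving a DID of +63 percentage points.
did = diff_in_diff(0.10, 0.75, 0.12, 0.14)
```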

  20. Deducing the Milky Way's Massive Cluster Population

    NASA Astrophysics Data System (ADS)

    Hanson, M. M.; Popescu, B.; Larsen, S. S.; Ivanov, V. D.

    2010-11-01

Recent near-infrared surveys of the galactic plane have been used to identify new massive cluster candidates. Follow-up study indicates that about half are not true, gravitationally bound clusters. These false positives are created by high-density fields of unassociated stars, often due to a sight-line of reduced extinction. What is not so easy to estimate is the number of false negatives, clusters which exist but are not currently being detected by our surveys. In order to derive critical characteristics of the Milky Way's massive cluster population, such as the cluster mass function and cluster lifetimes, one must be able to estimate the characteristics of these false negatives. Our group has taken on the daunting task of attempting such an estimate by first creating the stellar cluster imaging simulation program, MASSCLEAN. I will present our preliminary models and methods for deriving the biases of current searches.

  1. Improving estimation of flight altitude in wildlife telemetry studies

    USGS Publications Warehouse

    Poessel, Sharon; Duerr, Adam E.; Hall, Jonathan C.; Braham, Melissa A.; Katzner, Todd

    2018-01-01

Altitude measurements from wildlife tracking devices, combined with elevation data, are commonly used to estimate the flight altitude of volant animals. However, these data often include measurement error. Understanding this error may improve estimation of flight altitude and benefit applied ecology. There are a number of different approaches that have been used to address this measurement error. These include filtering based on GPS data, filtering based on behaviour of the study species, and use of state-space models to correct measurement error. The effectiveness of these approaches is highly variable. Recent studies have based inference of flight altitude on misunderstandings about avian natural history and technical or analytical tools. In this Commentary, we discuss these misunderstandings and suggest alternative strategies both to resolve some of these issues and to improve estimation of flight altitude. These strategies also can be applied to other measures derived from telemetry data. Synthesis and applications. Our Commentary is intended to clarify and improve upon some of the assumptions made when estimating flight altitude and, more broadly, when using GPS telemetry data. We also suggest best practices for identifying flight behaviour, addressing GPS error, and using flight altitudes to estimate collision risk with anthropogenic structures. Addressing the issues we describe would help improve estimates of flight altitude and advance understanding of the treatment of error in wildlife telemetry studies.

  2. Model selection for semiparametric marginal mean regression accounting for within-cluster subsampling variability and informative cluster size.

    PubMed

    Shen, Chung-Wei; Chen, Yi-Hau

    2018-03-13

    We propose a model selection criterion for semiparametric marginal mean regression based on generalized estimating equations. The work is motivated by a longitudinal study on the physical frailty outcome in the elderly, where the cluster size, that is, the number of the observed outcomes in each subject, is "informative" in the sense that it is related to the frailty outcome itself. The new proposal, called Resampling Cluster Information Criterion (RCIC), is based on the resampling idea utilized in the within-cluster resampling method (Hoffman, Sen, and Weinberg, 2001, Biometrika 88, 1121-1134) and accommodates informative cluster size. The implementation of RCIC, however, is free of performing actual resampling of the data and hence is computationally convenient. Compared with the existing model selection methods for marginal mean regression, the RCIC method incorporates an additional component accounting for variability of the model over within-cluster subsampling, and leads to remarkable improvements in selecting the correct model, regardless of whether the cluster size is informative or not. Applying the RCIC method to the longitudinal frailty study, we identify being female, old age, low income and life satisfaction, and chronic health conditions as significant risk factors for physical frailty in the elderly. © 2018, The International Biometric Society.
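The within-cluster resampling idea that RCIC builds on (Hoffman, Sen, and Weinberg, 2001) can be sketched for the simplest case of estimating a marginal mean when cluster size is informative: repeatedly draw one observation per cluster, average, and average over replicates, so every cluster gets equal weight regardless of its size. This is the underlying resampling method, not the RCIC criterion itself.

```python
import random

def wcr_mean(clusters, reps=2000, seed=1):
    """Within-cluster resampling estimate of a marginal mean under
    informative cluster size: each replicate samples one observation
    per cluster, so large clusters do not dominate."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        total += sum(rng.choice(c) for c in clusters) / len(clusters)
    return total / reps
```

With one cluster of ten 1s and one singleton cluster of 3, the naive pooled mean is about 1.18, while the within-cluster resampling mean is 2, weighting each cluster equally.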

  3. Comparing cluster-level dynamic treatment regimens using sequential, multiple assignment, randomized trials: Regression estimation and sample size considerations.

    PubMed

    NeCamp, Timothy; Kilbourne, Amy; Almirall, Daniel

    2017-08-01

    Cluster-level dynamic treatment regimens can be used to guide sequential treatment decision-making at the cluster level in order to improve outcomes at the individual or patient-level. In a cluster-level dynamic treatment regimen, the treatment is potentially adapted and re-adapted over time based on changes in the cluster that could be impacted by prior intervention, including aggregate measures of the individuals or patients that compose it. Cluster-randomized sequential multiple assignment randomized trials can be used to answer multiple open questions preventing scientists from developing high-quality cluster-level dynamic treatment regimens. In a cluster-randomized sequential multiple assignment randomized trial, sequential randomizations occur at the cluster level and outcomes are observed at the individual level. This manuscript makes two contributions to the design and analysis of cluster-randomized sequential multiple assignment randomized trials. First, a weighted least squares regression approach is proposed for comparing the mean of a patient-level outcome between the cluster-level dynamic treatment regimens embedded in a sequential multiple assignment randomized trial. The regression approach facilitates the use of baseline covariates which is often critical in the analysis of cluster-level trials. Second, sample size calculators are derived for two common cluster-randomized sequential multiple assignment randomized trial designs for use when the primary aim is a between-dynamic treatment regimen comparison of the mean of a continuous patient-level outcome. The methods are motivated by the Adaptive Implementation of Effective Programs Trial which is, to our knowledge, the first-ever cluster-randomized sequential multiple assignment randomized trial in psychiatry.
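A textbook ingredient of the sample size calculators described above is the cluster-randomized design effect 1 + (m - 1)·ICC, which inflates the standard two-sample formula for a continuous individual-level outcome. The sketch below is that standard building block, not the paper's SMART-specific derivation; all parameter values are illustrative.

```python
from math import ceil
from statistics import NormalDist

def clusters_per_arm(delta, sigma, m, icc, alpha=0.05, power=0.8):
    """Clusters per arm for a two-arm comparison of a continuous
    patient-level outcome: the individual-level two-sample sample size
    inflated by the design effect 1 + (m - 1) * icc, where m is the
    cluster size and icc the intra-cluster correlation."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided alpha
    z_b = NormalDist().inv_cdf(power)
    n_individual = 2 * (z_a + z_b) ** 2 * sigma ** 2 / delta ** 2  # per arm
    deff = 1 + (m - 1) * icc                   # design effect
    return ceil(n_individual * deff / m)
```

For a standardized effect of 0.5 (delta=0.5, sigma=1), clusters of 20 patients, and ICC 0.05, this gives 7 clusters per arm; with m=1 and icc=0 it reduces to the usual 63 individuals per arm.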

  4. The HectoMAP Cluster Survey. II. X-Ray Clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sohn, Jubee; Chon, Gayoung; Bohringer, Hans

    Here, we apply a friends-of-friends algorithm to the HectoMAP redshift survey and cross-identify associated X-ray emission in the ROSAT All-Sky Survey data (RASS). The resulting flux-limited catalog of X-ray cluster surveys is complete to a limiting flux of ~3 × 10 –13 erg s –1 cm –2 and includes 15 clusters (7 newly discovered) with redshifts z ≤ 0.4. HectoMAP is a dense survey (~1200 galaxies deg –2) that provides ~50 members (median) in each X-ray cluster. We provide redshifts for the 1036 cluster members. Subaru/Hyper Suprime-Cam imaging covers three of the X-ray systems and confirms that they are impressivemore » clusters. The HectoMAP X-ray clusters have an L X–σ cl scaling relation similar to that of known massive X-ray clusters. The HectoMAP X-ray cluster sample predicts ~12,000 ± 3000 detectable X-ray clusters in RASS to the limiting flux, comparable with previous estimates.« less

  5. The HectoMAP Cluster Survey. II. X-Ray Clusters

    DOE PAGES

    Sohn, Jubee; Chon, Gayoung; Bohringer, Hans; ...

    2018-03-10

    Here, we apply a friends-of-friends algorithm to the HectoMAP redshift survey and cross-identify associated X-ray emission in the ROSAT All-Sky Survey data (RASS). The resulting flux-limited catalog of X-ray cluster surveys is complete to a limiting flux of ~3 × 10 –13 erg s –1 cm –2 and includes 15 clusters (7 newly discovered) with redshifts z ≤ 0.4. HectoMAP is a dense survey (~1200 galaxies deg –2) that provides ~50 members (median) in each X-ray cluster. We provide redshifts for the 1036 cluster members. Subaru/Hyper Suprime-Cam imaging covers three of the X-ray systems and confirms that they are impressivemore » clusters. The HectoMAP X-ray clusters have an L X–σ cl scaling relation similar to that of known massive X-ray clusters. The HectoMAP X-ray cluster sample predicts ~12,000 ± 3000 detectable X-ray clusters in RASS to the limiting flux, comparable with previous estimates.« less

  6. Precision of systematic and random sampling in clustered populations: habitat patches and aggregating organisms.

    PubMed

    McGarvey, Richard; Burch, Paul; Matthews, Janet M

    2016-01-01

Natural populations of plants and animals spatially cluster because (1) suitable habitat is patchy, and (2) within suitable habitat, individuals aggregate further into clusters of higher density. We compare the precision of random and systematic field sampling survey designs under these two processes of species clustering. Second, we evaluate the performance of 13 estimators for the variance of the sample mean from a systematic survey. Replicated simulated surveys, as counts from 100 transects, allocated either randomly or systematically within the study region, were used to estimate population density in six spatial point populations including habitat patches and Matérn circular clustered aggregations of organisms, together and in combination. The standard one-start aligned systematic survey design, a uniform 10 × 10 grid of transects, was much more precise. Variances of the 10 000 replicated systematic survey mean densities were one-third to one-fifth of those from randomly allocated transects, implying transect sample sizes giving equivalent precision by random survey would need to be three to five times larger. Organisms being restricted to patches of habitat was alone sufficient to yield this precision advantage for the systematic design. But this improved precision for systematic sampling in clustered populations is underestimated by standard variance estimators used to compute confidence intervals. True variance for the survey sample mean was computed from the variance of 10 000 simulated survey mean estimates. Testing 10 published and three newly proposed variance estimators, the two variance estimators (ν₈ and ν_W) that corrected for inter-transect correlation were the most accurate and also the most precise in clustered populations. These greatly outperformed the two "post-stratification" variance estimators (ν₂ and ν₃) that are now more commonly applied in systematic surveys. Similar variance estimator performance rankings were found with
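The headline result (systematic designs are several times more precise than random designs when organisms are confined to habitat patches) is easy to reproduce in a toy one-dimensional simulation. Everything below is a hypothetical miniature of the paper's setup, not its actual populations or transect geometry:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical patchy population: density is nonzero only inside three
# habitat patches along a 1-D study region of 1000 cells.
N = 1000
density = np.zeros(N)
for start in (100, 400, 700):                  # three habitat patches
    density[start:start + 80] = rng.poisson(20, 80)

def random_mean(n):
    """Simple random sample of n cells without replacement."""
    return density[rng.choice(N, n, replace=False)].mean()

def systematic_mean(n):
    """One-start aligned systematic sample: fixed step, random start."""
    step = N // n
    start = rng.integers(step)
    return density[start::step][:n].mean()

reps = 2000
var_rand = np.var([random_mean(100) for _ in range(reps)])
var_sys = np.var([systematic_mean(100) for _ in range(reps)])
# Because every systematic sample covers the patches almost identically,
# its sampling variance is a small fraction of the random design's.
```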

  7. Distance-Learning, ADHD Quality Improvement in Primary Care: A Cluster-Randomized Trial.

    PubMed

    Fiks, Alexander G; Mayne, Stephanie L; Michel, Jeremy J; Miller, Jeffrey; Abraham, Manju; Suh, Andrew; Jawad, Abbas F; Guevara, James P; Grundmeier, Robert W; Blum, Nathan J; Power, Thomas J

    2017-10-01

    To evaluate a distance-learning, quality improvement intervention to improve pediatric primary care provider use of attention-deficit/hyperactivity disorder (ADHD) rating scales. Primary care practices were cluster randomized to a 3-part distance-learning, quality improvement intervention (web-based education, collaborative consultation with ADHD experts, and performance feedback reports/calls), qualifying for Maintenance of Certification (MOC) Part IV credit, or wait-list control. We compared changes relative to a baseline period in rating scale use by study arm using logistic regression clustered by practice (primary analysis) and examined effect modification by level of clinician participation. An electronic health record-linked system for gathering ADHD rating scales from parents and teachers was implemented before the intervention period at all sites. Rating scale use was ascertained by manual chart review. One hundred five clinicians at 19 sites participated. Differences between arms were not significant. From the baseline to intervention period and after implementation of the electronic system, clinicians in both study arms were significantly more likely to administer and receive parent and teacher rating scales. Among intervention clinicians, those who participated in at least 1 feedback call or qualified for MOC credit were more likely to give parents rating scales with differences of 14.2 (95% confidence interval [CI], 0.6-27.7) and 18.8 (95% CI, 1.9-35.7) percentage points, respectively. A 3-part clinician-focused distance-learning, quality improvement intervention did not improve rating scale use. Complementary strategies that support workflows and more fully engage clinicians may be needed to bolster care. Electronic systems that gather rating scales may help achieve this goal. Index terms: ADHD, primary care, quality improvement, clinical decision support.

  8. Estimating carnivoran diets using a combination of carcass observations and scats from GPS clusters

    PubMed Central

    Tambling, C.J.; Laurence, S.D.; Bellan, S.E.; Cameron, E.Z.; du Toit, J.T.; Getz, W.M.

    2011-01-01

Scat analysis is one of the most frequently used methods to assess carnivoran diets, and Global Positioning System (GPS) cluster methods are increasingly being used to locate feeding sites of large carnivorans. However, both methods have inherent biases that limit their use. GPS methods to locate kill sites are biased towards large carcasses, while scat analysis over-estimates the biomass consumed from smaller prey. We combined carcass observations with scats collected along known movement routes, assessed using GPS data from four African lion (Panthera leo) prides in the Kruger National Park, South Africa, to determine how a combination of these two datasets changes diet estimates. As expected, using carcasses alone under-estimated the number of feeding events on small species, primarily impala (Aepyceros melampus) and warthog (Phacochoerus africanus), in our case by more than 50%, so the combined dataset yielded significantly higher estimates of the biomass consumed per pride per day than carcass observations alone. We show that an approach that supplements carcass observations with scats, enabling the identification of potentially missed feeding events, increases the estimates of food intake rates for large carnivorans, with possible ramifications for predator-prey interaction studies dealing with biomass intake rate. PMID:22408290

  9. Improving multisensor estimation of heavy-to-extreme precipitation via conditional bias-penalized optimal estimation

    NASA Astrophysics Data System (ADS)

    Kim, Beomgeun; Seo, Dong-Jun; Noh, Seong Jin; Prat, Olivier P.; Nelson, Brian R.

    2018-01-01

    A new technique for merging radar precipitation estimates and rain gauge data is developed and evaluated to improve multisensor quantitative precipitation estimation (QPE), in particular, of heavy-to-extreme precipitation. Unlike the conventional cokriging methods which are susceptible to conditional bias (CB), the proposed technique, referred to herein as conditional bias-penalized cokriging (CBPCK), explicitly minimizes Type-II CB for improved quantitative estimation of heavy-to-extreme precipitation. CBPCK is a bivariate version of extended conditional bias-penalized kriging (ECBPK) developed for gauge-only analysis. To evaluate CBPCK, cross validation and visual examination are carried out using multi-year hourly radar and gauge data in the North Central Texas region in which CBPCK is compared with the variant of the ordinary cokriging (OCK) algorithm used operationally in the National Weather Service Multisensor Precipitation Estimator. The results show that CBPCK significantly reduces Type-II CB for estimation of heavy-to-extreme precipitation, and that the margin of improvement over OCK is larger in areas of higher fractional coverage (FC) of precipitation. When FC > 0.9 and hourly gauge precipitation is > 60 mm, the reduction in root mean squared error (RMSE) by CBPCK over radar-only (RO) is about 12 mm while the reduction in RMSE by OCK over RO is about 7 mm. CBPCK may be used in real-time analysis or in reanalysis of multisensor precipitation for which accurate estimation of heavy-to-extreme precipitation is of particular importance.

  10. MIXED MODEL AND ESTIMATING EQUATION APPROACHES FOR ZERO INFLATION IN CLUSTERED BINARY RESPONSE DATA WITH APPLICATION TO A DATING VIOLENCE STUDY1

    PubMed Central

    Fulton, Kara A.; Liu, Danping; Haynie, Denise L.; Albert, Paul S.

    2016-01-01

The NEXT Generation Health study investigates the dating violence of adolescents using a survey questionnaire. Each student is asked to affirm or deny multiple instances of violence in his/her dating relationship. There is, however, evidence suggesting that students not in a relationship responded to the survey, resulting in excessive zeros in the responses. This paper proposes likelihood-based and estimating equation approaches to analyze the zero-inflated clustered binary response data. We adopt a mixed model method to account for the cluster effect, and the model parameters are estimated using a maximum-likelihood (ML) approach that requires a Gauss–Hermite quadrature (GHQ) approximation for implementation. Since an incorrect assumption on the random effects distribution may bias the results, we construct generalized estimating equations (GEE) that do not require the correct specification of within-cluster correlation. In a series of simulation studies, we examine the performance of ML and GEE methods in terms of their bias, efficiency and robustness. We illustrate the importance of properly accounting for this zero inflation by reanalyzing the NEXT data where this issue has previously been ignored. PMID:26937263
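The core identifiability idea, that a cluster of all-zero responses can be either structural (student not in a relationship) or sampling zeros, can be sketched with a simple maximum-likelihood fit. This toy version drops the random effects and covariates of the paper's model and uses a hypothetical grid-search MLE:

```python
import math
import random
from collections import Counter

random.seed(0)

# Hypothetical setup: each "cluster" is one student answering n_items binary
# items. With probability pi the student is a structural zero (answers all 0);
# otherwise each item is Bernoulli(p). These parameter values are illustrative.
pi_true, p_true, n_items, n_students = 0.3, 0.4, 10, 2000

counts = []
for _ in range(n_students):
    if random.random() < pi_true:
        counts.append(0)  # structural-zero cluster
    else:
        counts.append(sum(random.random() < p_true for _ in range(n_items)))
freq = Counter(counts)  # frequency of each per-cluster count of "yes" answers

def loglik(pi, p):
    """Zero-inflated binomial log-likelihood aggregated over count frequencies."""
    total = 0.0
    for k, c in freq.items():
        if k == 0:  # an all-zero cluster may be structural or by chance
            term = pi + (1 - pi) * (1 - p) ** n_items
        else:
            term = ((1 - pi) * math.comb(n_items, k)
                    * p ** k * (1 - p) ** (n_items - k))
        total += c * math.log(term)
    return total

grid = [i / 100 for i in range(1, 100)]
pi_hat, p_hat = max(((pi, p) for pi in grid for p in grid),
                    key=lambda t: loglik(*t))
print(pi_hat, p_hat)
```

With repeated responses per cluster, the mixture is identifiable and the MLE recovers both the zero-inflation probability and the event probability.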

  11. The Cluster-EAGLE project: velocity bias and the velocity dispersion-mass relation of cluster galaxies

    NASA Astrophysics Data System (ADS)

    Armitage, Thomas J.; Barnes, David J.; Kay, Scott T.; Bahé, Yannick M.; Dalla Vecchia, Claudio; Crain, Robert A.; Theuns, Tom

    2018-03-01

We use the Cluster-EAGLE simulations to explore the velocity bias introduced when using galaxies, rather than dark matter particles, to estimate the velocity dispersion of a galaxy cluster, a property known to be tightly correlated with cluster mass. The simulations consist of 30 clusters spanning a mass range 14.0 ≤ log10(M200c/M⊙) ≤ 15.4, with their sophisticated subgrid physics modelling and high numerical resolution (sub-kpc gravitational softening), making them ideal for this purpose. We find that selecting galaxies by their total mass results in a velocity dispersion that is 5-10 per cent higher than the dark matter particles. However, selecting galaxies by their stellar mass results in an almost unbiased (<5 per cent) estimator of the velocity dispersion. This result holds out to z = 1.5 and is relatively insensitive to the choice of cluster aperture, varying by less than 5 per cent between r500c and r200m. We show that the velocity bias is a function of the time spent by a galaxy inside the cluster environment. Selecting galaxies by their total mass results in a larger bias because a larger fraction of objects have only recently entered the cluster and these have a velocity bias above unity. Galaxies that entered more than 4 Gyr ago become progressively colder with time, as expected from dynamical friction. We conclude that velocity bias should not be a major issue when estimating cluster masses from kinematic methods.

  12. Breaking the bottleneck: Use of molecular tailoring approach for the estimation of binding energies at MP2/CBS limit for large water clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Singh, Gurmeet; Nandi, Apurba; Gadre, Shridhar R., E-mail: gadre@iitk.ac.in

    2016-03-14

A pragmatic method based on the molecular tailoring approach (MTA) for accurately estimating the complete basis set (CBS) limit at Møller-Plesset second order perturbation (MP2) theory for large molecular clusters with limited computational resources is developed. It is applied to water clusters, (H₂O)ₙ (n = 7, 8, 10, 16, 17, and 25), optimized employing the aug-cc-pVDZ (aVDZ) basis set. Binding energies (BEs) of these clusters are estimated at the MP2/aug-cc-pVNZ (aVNZ) [N = T, Q, and 5 (whenever possible)] levels of theory employing the grafted MTA (GMTA) methodology and are found to lie within 0.2 kcal/mol of the corresponding full-calculation MP2 BE, wherever available. The results are extrapolated to the CBS limit using a three point formula. The GMTA-MP2 calculations are feasible on off-the-shelf hardware and show around 50%-65% saving of computational time. The methodology has a potential for application to molecular clusters containing ∼100 atoms.
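The abstract does not specify which three-point formula is used; a common choice for extrapolating correlation energies assumes geometric convergence, E(X) = E_CBS + A·Bˣ in the cardinal number X, which three points solve in closed (Aitken Δ²) form. A hedged sketch under that assumption:

```python
def cbs_three_point(e3, e4, e5):
    """Three-point extrapolation assuming E(X) = E_CBS + A * B**X,
    where X is the basis-set cardinal number (T=3, Q=4, 5=5).
    This geometric/Aitken form is one common convention, not necessarily
    the exact formula used in the paper."""
    return (e3 * e5 - e4 * e4) / (e3 + e5 - 2 * e4)

# Synthetic check: energies constructed to follow the assumed form exactly
e_cbs, a, b = -76.40, 0.5, 0.45   # illustrative values, not real BEs
energies = [e_cbs + a * b ** x for x in (3, 4, 5)]
print(cbs_three_point(*energies))
```

For data obeying the assumed form, the formula recovers the CBS limit exactly; for real MP2 energies it gives an estimate whose quality depends on how closely convergence is geometric.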

  13. Improvements in estimating proportions of objects from multispectral data

    NASA Technical Reports Server (NTRS)

    Horwitz, H. M.; Hyde, P. D.; Richardson, W.

    1974-01-01

    Methods for estimating proportions of objects and materials imaged within the instantaneous field of view of a multispectral sensor were developed further. Improvements in the basic proportion estimation algorithm were devised as well as improved alien object detection procedures. Also, a simplified signature set analysis scheme was introduced for determining the adequacy of signature set geometry for satisfactory proportion estimation. Averaging procedures used in conjunction with the mixtures algorithm were examined theoretically and applied to artificially generated multispectral data. A computationally simpler estimator was considered and found unsatisfactory. Experiments conducted to find a suitable procedure for setting the alien object threshold yielded little definitive result. Mixtures procedures were used on a limited amount of ERTS data to estimate wheat proportion in selected areas. Results were unsatisfactory, partly because of the ill-conditioned nature of the pure signature set.

  14. The SAMI Galaxy Survey: the cluster redshift survey, target selection and cluster properties

    NASA Astrophysics Data System (ADS)

    Owers, M. S.; Allen, J. T.; Baldry, I.; Bryant, J. J.; Cecil, G. N.; Cortese, L.; Croom, S. M.; Driver, S. P.; Fogarty, L. M. R.; Green, A. W.; Helmich, E.; de Jong, J. T. A.; Kuijken, K.; Mahajan, S.; McFarland, J.; Pracy, M. B.; Robotham, A. G. S.; Sikkema, G.; Sweet, S.; Taylor, E. N.; Verdoes Kleijn, G.; Bauer, A. E.; Bland-Hawthorn, J.; Brough, S.; Colless, M.; Couch, W. J.; Davies, R. L.; Drinkwater, M. J.; Goodwin, M.; Hopkins, A. M.; Konstantopoulos, I. S.; Foster, C.; Lawrence, J. S.; Lorente, N. P. F.; Medling, A. M.; Metcalfe, N.; Richards, S. N.; van de Sande, J.; Scott, N.; Shanks, T.; Sharp, R.; Thomas, A. D.; Tonini, C.

    2017-06-01

    We describe the selection of galaxies targeted in eight low-redshift clusters (APMCC0917, A168, A4038, EDCC442, A3880, A2399, A119 and A85; 0.029 < z < 0.058) as part of the Sydney-AAO Multi-Object Integral field spectrograph Galaxy Survey (SAMI-GS). We have conducted a redshift survey of these clusters using the AAOmega multi-object spectrograph on the 3.9-m Anglo-Australian Telescope. The redshift survey is used to determine cluster membership and to characterize the dynamical properties of the clusters. In combination with existing data, the survey resulted in 21 257 reliable redshift measurements and 2899 confirmed cluster member galaxies. Our redshift catalogue has a high spectroscopic completeness (˜94 per cent) for rpetro ≤ 19.4 and cluster-centric distances R < 2R200. We use the confirmed cluster member positions and redshifts to determine cluster velocity dispersion, R200, virial and caustic masses, as well as cluster structure. The clusters have virial masses 14.25 ≤ log(M200/M⊙) ≤ 15.19. The cluster sample exhibits a range of dynamical states, from relatively relaxed-appearing systems, to clusters with strong indications of merger-related substructure. Aperture- and point spread function matched photometry are derived from Sloan Digital Sky Survey and VLT Survey Telescope/ATLAS imaging and used to estimate stellar masses. These estimates, in combination with the redshifts, are used to define the input target catalogue for the cluster portion of the SAMI-GS. The primary SAMI-GS cluster targets have R cluster regions.

  15. Improving photometric redshift estimation using GPZ: size information, post processing, and improved photometry

    NASA Astrophysics Data System (ADS)

    Gomes, Zahra; Jarvis, Matt J.; Almosallam, Ibrahim A.; Roberts, Stephen J.

    2018-03-01

    The next generation of large-scale imaging surveys (such as those conducted with the Large Synoptic Survey Telescope and Euclid) will require accurate photometric redshifts in order to optimally extract cosmological information. Gaussian Process for photometric redshift estimation (GPZ) is a promising new method that has been proven to provide efficient, accurate photometric redshift estimations with reliable variance predictions. In this paper, we investigate a number of methods for improving the photometric redshift estimations obtained using GPZ (but which are also applicable to others). We use spectroscopy from the Galaxy and Mass Assembly Data Release 2 with a limiting magnitude of r < 19.4 along with corresponding Sloan Digital Sky Survey visible (ugriz) photometry and the UKIRT Infrared Deep Sky Survey Large Area Survey near-IR (YJHK) photometry. We evaluate the effects of adding near-IR magnitudes and angular size as features for the training, validation, and testing of GPZ and find that these improve the accuracy of the results by ˜15-20 per cent. In addition, we explore a post-processing method of shifting the probability distributions of the estimated redshifts based on their Quantile-Quantile plots and find that it improves the bias by ˜40 per cent. Finally, we investigate the effects of using more precise photometry obtained from the Hyper Suprime-Cam Subaru Strategic Program Data Release 1 and find that it produces significant improvements in accuracy, similar to the effect of including additional features.
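The post-processing step of shifting estimated redshift distributions via their Quantile-Quantile plots can be approximated by a simple empirical quantile mapping. This is an illustrative stand-in with synthetic, hypothetical data, not the GPZ pipeline itself:

```python
import random
from bisect import bisect_right

random.seed(1)

# Hypothetical training set: photo-z estimates carry a systematic additive
# bias relative to spec-z; quantile mapping corrects the marginal distribution.
spec_z = [random.uniform(0.0, 0.4) for _ in range(5000)]
photo_z = [z + 0.05 + random.gauss(0, 0.02) for z in spec_z]  # biased estimates

def quantile_map(values, reference):
    """Map each value to the reference quantile at the same empirical rank."""
    sv, sr = sorted(values), sorted(reference)
    n = len(sv)
    out = []
    for v in values:
        q = bisect_right(sv, v) / n          # empirical quantile of v
        idx = min(int(q * n), n - 1)
        out.append(sr[idx])
    return out

corrected = quantile_map(photo_z, spec_z)
bias_before = sum(p - s for p, s in zip(photo_z, spec_z)) / len(spec_z)
bias_after = sum(c - s for c, s in zip(corrected, spec_z)) / len(spec_z)
print(bias_before, bias_after)
```

After mapping, the corrected estimates share the spectroscopic sample's marginal distribution, so the mean bias drops to near zero, the same qualitative effect as the ~40 per cent bias improvement reported above.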

  16. Link prediction with node clustering coefficient

    NASA Astrophysics Data System (ADS)

    Wu, Zhihao; Lin, Youfang; Wang, Jing; Gregory, Steve

    2016-06-01

Predicting missing links in incomplete complex networks efficiently and accurately is still a challenging problem. The recently proposed Cannistraci-Alanis-Ravasi (CAR) index shows the power of local link/triangle information in improving link-prediction accuracy. Inspired by the idea of employing local link/triangle information, we propose a new similarity index that incorporates more local structure information. In our method, local link/triangle structure information is conveyed directly by the clustering coefficient of common neighbors. The clustering coefficient is effective in estimating the contribution of a common neighbor because it counts the links among that neighbor's neighbors, and those links occupy the same structural position, relative to the common neighbor, as the candidate link. In our experiments, three metrics are used to evaluate the accuracy of link prediction algorithms: precision, AUP and AUC. Experimental results on ten tested networks drawn from various fields show that our new index is more effective in predicting missing links than the CAR index, especially for networks with low correlation between the number of common neighbors and the number of links between common neighbors.

  17. Automatic video shot boundary detection using k-means clustering and improved adaptive dual threshold comparison

    NASA Astrophysics Data System (ADS)

    Sa, Qila; Wang, Zhihui

    2018-03-01

At present, content-based video retrieval (CBVR) is the mainstream video retrieval method, using a video's own features to perform automatic identification and retrieval. This method relies on a key technology: shot segmentation. In this paper, a method of automatic video shot boundary detection using K-means clustering and improved adaptive dual-threshold comparison is proposed. First, the visual features of every frame are extracted and divided into two categories using the K-means clustering algorithm: frames with significant change and frames with no significant change. Then, based on the classification results, the improved adaptive dual-threshold comparison method is used to determine both abrupt and gradual shot boundaries. Finally, an automatic video shot boundary detection system is achieved.
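The first step, splitting frames into "significant change" and "no significant change" groups, reduces to k-means with k = 2 on a per-frame change score. A minimal sketch on hypothetical 1-D inter-frame difference values (the paper's actual features and thresholding stage are richer):

```python
import random

random.seed(3)

# Hypothetical inter-frame difference scores: 90 ordinary frames plus
# 10 candidate shot-boundary frames with much larger change.
diffs = ([random.gauss(0.1, 0.03) for _ in range(90)]
         + [random.gauss(0.9, 0.05) for _ in range(10)])

def kmeans_1d(values, iters=50):
    """Plain k=2 k-means on scalars, centers initialized at the extremes."""
    c = [min(values), max(values)]
    for _ in range(iters):
        groups = ([], [])
        for v in values:
            groups[abs(v - c[0]) > abs(v - c[1])].append(v)
        c = [sum(g) / len(g) if g else c[i] for i, g in enumerate(groups)]
    # label 1 = "significant change" cluster
    return c, [int(abs(v - c[0]) > abs(v - c[1])) for v in values]

centers, labels = kmeans_1d(diffs)
print(centers, sum(labels))
```

The frames labeled 1 would then be passed to the adaptive dual-threshold comparison to distinguish abrupt cuts from gradual transitions.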

  18. STELLAR ENCOUNTER RATE IN GALACTIC GLOBULAR CLUSTERS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bahramian, Arash; Heinke, Craig O.; Sivakoff, Gregory R.

    2013-04-01

The high stellar densities in the cores of globular clusters cause significant stellar interactions. These stellar interactions can produce close binary mass-transferring systems involving compact objects and their progeny, such as X-ray binaries and radio millisecond pulsars. Comparing the numbers of these systems and interaction rates in different clusters drives our understanding of how cluster parameters affect the production of close binaries. In this paper we estimate stellar encounter rates (Γ) for 124 Galactic globular clusters based on observational data, as opposed to the methods previously employed, which assumed 'King-model' profiles for all clusters. By deprojecting cluster surface brightness profiles to estimate luminosity density profiles, we treat 'King-model' and 'core-collapsed' clusters in the same way. In addition, we use Monte Carlo simulations to investigate the effects of uncertainties in various observational parameters (distance, reddening, surface brightness) on Γ, producing the first catalog of globular cluster stellar encounter rates with estimated errors. Comparing our results with published observations of likely products of stellar interactions (numbers of X-ray binaries, numbers of radio millisecond pulsars, and γ-ray luminosity) we find both clear correlations and some differences with published results.

  19. Cluster membership probability: polarimetric approach

    NASA Astrophysics Data System (ADS)

    Medhi, Biman J.; Tamura, Motohide

    2013-04-01

Interstellar polarimetric data of the six open clusters Hogg 15, NGC 6611, NGC 5606, NGC 6231, NGC 5749 and NGC 6250 have been used to estimate the membership probability for the stars within them. For proper-motion member stars, the membership probability estimated using the polarimetric data is in good agreement with the proper-motion cluster membership probability. However, for proper-motion non-member stars, the membership probability estimated by the polarimetric method is in total disagreement with the proper-motion cluster membership probability. The inconsistencies in the determined memberships may be because of the fundamental differences between the two methods of determination: one is based on stellar proper motion in space and the other is based on selective extinction of the stellar output by the asymmetric aligned dust grains present in the interstellar medium. The results and analysis suggest that the scatter of the Stokes vectors q (per cent) and u (per cent) for the proper-motion member stars depends on the interstellar and intracluster differential reddening in the open cluster. It is found that this method could be used to estimate the cluster membership probability if we have additional polarimetric and photometric information for a star to identify it as a probable member/non-member of a particular cluster, such as the maximum wavelength value (λmax), the unit weight error of the fit (σ1), the dispersion in the polarimetric position angles (ε̄), reddening (E(B - V)) or the differential intracluster reddening (ΔE(B - V)). This method could also be used to estimate the membership probability of known member stars having no membership probability as well as to resolve disagreements about membership among different proper-motion surveys.

  20. An additional k-means clustering step improves the biological features of WGCNA gene co-expression networks.

    PubMed

    Botía, Juan A; Vandrovcova, Jana; Forabosco, Paola; Guelfi, Sebastian; D'Sa, Karishma; Hardy, John; Lewis, Cathryn M; Ryten, Mina; Weale, Michael E

    2017-04-12

Weighted Gene Co-expression Network Analysis (WGCNA) is a widely used R software package for the generation of gene co-expression networks (GCN). WGCNA generates both a GCN and a derived partitioning of clusters of genes (modules). We propose k-means clustering as an additional processing step to conventional WGCNA, which we have implemented in the R package km2gcn (k-means to gene co-expression network, https://github.com/juanbot/km2gcn ). We assessed our method on networks created from UKBEC data (10 different human brain tissues), on networks created from GTEx data (42 human tissues, including 13 brain tissues), and on simulated networks derived from GTEx data. We observed substantially improved module properties, including: (1) few or zero misplaced genes; (2) increased counts of replicable clusters in alternate tissues (×3.1 on average); (3) improved enrichment of Gene Ontology terms (seen in 48/52 GCNs); (4) improved cell type enrichment signals (seen in 21/23 brain GCNs); and (5) more accurate partitions in simulated data according to a range of similarity indices. The results obtained from our investigations indicate that our k-means method, applied as an adjunct to standard WGCNA, results in better network partitions. These improved partitions enable more fruitful downstream analyses, as gene modules are more biologically meaningful.

  1. Improved estimates of fixed reproducible tangible wealth, 1929-95

    DOT National Transportation Integrated Search

    1997-05-01

    This article presents revised estimates of the value of fixed reproducible tangible wealth in the United States for 1929-95; these estimates incorporate the definitional and statistical improvements introduced in last year's comprehensive revis...

  2. Improving care of patients with diabetes and CKD: a pilot study for a cluster-randomized trial.

    PubMed

    Cortés-Sanabria, Laura; Cabrera-Pivaral, Carlos E; Cueto-Manzano, Alfonso M; Rojas-Campos, Enrique; Barragán, Graciela; Hernández-Anaya, Moisés; Martínez-Ramírez, Héctor R

    2008-05-01

    Family physicians may have the main role in managing patients with type 2 diabetes mellitus with early nephropathy. It is therefore important to determine the clinical competence of family physicians in preserving renal function of patients. The aim of this study is to evaluate the effect of an educational intervention on family physicians' clinical competence and subsequently determine the impact on kidney function of their patients with type 2 diabetes mellitus. Pilot study for a cluster-randomized trial. Primary health care units of the Mexican Institute of Social Security, Guadalajara, Mexico. The study group was composed of 21 family physicians from 1 unit and a control group of 19 family physicians from another unit. 46 patients treated by study physicians and 48 treated by control physicians also were evaluated. An educative strategy based on a participative model used during 6 months in the study group. Allocation of units to receive or not receive the educative intervention was randomly established. Clinical competence of family physicians and kidney function of patients. To evaluate clinical competence, a validated questionnaire measuring family physicians' capability to identify risk factors, integrate diagnosis, and correctly use laboratory tests and therapeutic resources was applied to all physicians at the beginning and end of educative intervention (0 and 6 months). In patients, serum creatinine level, estimated glomerular filtration rate, and albuminuria were evaluated at 0, 6, and 12 months. At the end of the intervention, more family physicians from the study group improved clinical competence (91%) compared with controls (37%; P = 0.001). Family physicians in the study group who increased their competence improved renal function significantly better than physicians in the same group who did not increase competence and physicians in the control group (with or without increase in competence): change in estimated glomerular filtration rate, 0

  3. Formation of Education Clusters as a Way to Improve Education

    ERIC Educational Resources Information Center

    Aitbayeva, Gul'zamira D.; Zhubanova, Mariyash K.; Kulgildinova, Tulebike A.; Tusupbekova, Gulsum M.; Uaisova, Gulnar I.

    2016-01-01

The purpose of this research is to analyze the basic prerequisites for the formation, and the development factors, of educational clusters in the world's leading nations, in order to study the possibility of introducing cluster policy and creating educational clusters in the Republic of Kazakhstan. The authors of this study concluded that an educational cluster could be…

  4. Improved predictive mapping of indoor radon concentrations using ensemble regression trees based on automatic clustering of geological units.

    PubMed

    Kropat, Georg; Bochud, Francois; Jaboyedoff, Michel; Laedermann, Jean-Pascal; Murith, Christophe; Palacios Gruson, Martha; Baechler, Sébastien

    2015-09-01

According to estimates, around 230 people die as a result of radon exposure in Switzerland. This public health concern makes reliable indoor radon prediction and mapping methods necessary in order to improve risk communication to the public. The aim of this study was to develop an automated method to classify lithological units according to their radon characteristics and to develop mapping and predictive tools in order to improve local radon prediction. About 240 000 indoor radon concentration (IRC) measurements in about 150 000 buildings were available for our analysis. The automated classification of lithological units was based on k-medoids clustering via pair-wise Kolmogorov distances between IRC distributions of lithological units. For IRC mapping and prediction we used random forests and Bayesian additive regression trees (BART). The automated classification groups lithological units well in terms of their IRC characteristics. Especially the IRC differences in metamorphic rocks like gneiss are well revealed by this method. The maps produced by random forests soundly represent the regional difference of IRCs in Switzerland and improve the spatial detail compared to existing approaches. We could explain 33% of the variations in IRC data with random forests. Additionally, variable importances evaluated by random forests show that building characteristics are less important predictors for IRCs than spatial/geological influences. BART could explain 29% of IRC variability and produced maps that indicate the prediction uncertainty. Ensemble regression trees are a powerful tool to model and understand the multidimensional influences on IRCs. Automatic clustering of lithological units complements this method by facilitating the interpretation of radon properties of rock types. This study provides an important element for radon risk communication. Future approaches should consider taking into account further variables like soil gas radon measurements as

  5. Clustering by reordering of similarity and Laplacian matrices: Application to galaxy clusters

    NASA Astrophysics Data System (ADS)

    Mahmoud, E.; Shoukry, A.; Takey, A.

    2018-04-01

Similarity metrics, kernels and similarity-based algorithms have gained much attention due to their increasing applications in information retrieval, data mining, pattern recognition and machine learning. Similarity graphs are often adopted as the underlying representation of similarity matrices and are at the origin of known clustering algorithms such as spectral clustering. Similarity matrices offer the advantage of working in object-object (two-dimensional) space, where visualization of cluster similarities is available, instead of object-features (multi-dimensional) space. In this paper, sparse ɛ-similarity graphs are constructed and decomposed into strong components using appropriate methods such as the Dulmage-Mendelsohn permutation (DMperm) and/or Reverse Cuthill-McKee (RCM) algorithms. The obtained strong components correspond to groups (clusters) in the input (feature) space. The parameter ɛi is estimated locally, at each data point i, from a corresponding narrow range of the number of nearest neighbors. Although more advanced clustering techniques are available, our method has the advantages of simplicity, better complexity and direct visualization of the cluster similarities in a two-dimensional space. Also, no prior information about the number of clusters is needed. We conducted our experiments on two- and three-dimensional, small and large synthetic datasets as well as on a real astronomical dataset. The results are verified graphically and analyzed using gap statistics over a range of neighbors to confirm the robustness of the algorithm and the stability of the results. Combining the proposed algorithm with gap statistics provides a promising tool for solving clustering problems. An astronomical application is conducted for confirming the existence of 45 galaxy clusters around the X-ray positions of galaxy clusters in the redshift range [0.1..0.8]. We re-estimate the photometric redshifts of the identified galaxy clusters and obtain acceptable values
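The essential mechanism, building a sparse ɛ-similarity graph and reading clusters off as its components, can be sketched compactly. This simplified version uses plain connected components via BFS and a fixed ɛ rather than the paper's locally estimated ɛi and DMperm/RCM reordering:

```python
from collections import deque
from math import dist

# Hypothetical 2-D points forming two well-separated blobs
points = [(0, 0), (0.3, 0.1), (0.1, 0.4), (5, 5), (5.2, 4.9), (5.1, 5.3)]
eps = 1.0  # fixed similarity radius (the paper estimates this locally per point)

# Sparse eps-graph: connect points within distance eps of each other
adj = {i: [j for j in range(len(points))
           if j != i and dist(points[i], points[j]) <= eps]
       for i in range(len(points))}

def connected_components(adj):
    """BFS over the eps-graph; each component is one cluster."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, queue = [], deque([s])
        seen.add(s)
        while queue:
            u = queue.popleft()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        comps.append(sorted(comp))
    return comps

print(connected_components(adj))  # two well-separated blobs -> two clusters
```

As in the paper, no prior number of clusters is supplied; the graph's component structure determines it.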

  6. Hierarchical modeling of cluster size in wildlife surveys

    USGS Publications Warehouse

    Royle, J. Andrew

    2008-01-01

    Clusters or groups of individuals are the fundamental unit of observation in many wildlife sampling problems, including aerial surveys of waterfowl, marine mammals, and ungulates. Explicit accounting of cluster size in models for estimating abundance is necessary because detection of individuals within clusters is not independent and detectability of clusters is likely to increase with cluster size. This induces a cluster size bias in which the average cluster size in the sample is larger than in the population at large. Thus, failure to account for the relationship between delectability and cluster size will tend to yield a positive bias in estimates of abundance or density. I describe a hierarchical modeling framework for accounting for cluster-size bias in animal sampling. The hierarchical model consists of models for the observation process conditional on the cluster size distribution and the cluster size distribution conditional on the total number of clusters. Optionally, a spatial model can be specified that describes variation in the total number of clusters per sample unit. Parameter estimation, model selection, and criticism may be carried out using conventional likelihood-based methods. An extension of the model is described for the situation where measurable covariates at the level of the sample unit are available. Several candidate models within the proposed class are evaluated for aerial survey data on mallard ducks (Anas platyrhynchos).
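The cluster-size bias described above is easy to demonstrate by simulation: when detection probability grows with cluster size, detected clusters are larger on average than clusters in the population. This is an illustrative toy model with a hypothetical detection function, not the paper's hierarchical model:

```python
import random

random.seed(7)

# Hypothetical population of 10 000 animal clusters with assorted sizes
population_sizes = [random.choice([1, 2, 3, 5, 8, 12]) for _ in range(10000)]

def detected(size, alpha=0.15):
    """Hypothetical detection model: each individual is spotted independently
    with probability alpha, so larger clusters are more likely to be seen."""
    p = 1 - (1 - alpha) ** size
    return random.random() < p

sample_sizes = [s for s in population_sizes if detected(s)]
pop_mean = sum(population_sizes) / len(population_sizes)
smp_mean = sum(sample_sizes) / len(sample_sizes)
print(pop_mean, smp_mean)  # detected-sample mean exceeds the population mean
```

A naive abundance estimate using the sample mean cluster size would therefore be positively biased, which is exactly what the hierarchical model, by modeling the cluster size distribution jointly with detection, corrects for.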

  7. Linear clusters of galaxies - A999 and A1016

    NASA Astrophysics Data System (ADS)

    Chapman, G. N. F.; Geller, M. J.; Huchra, J. P.

    1987-09-01

    The authors have measured 44 new redshifts in A 999 and 40 in A 1016: these clusters are both "linear" according to Rood and Sastry (1971) and Struble and Rood (1982, 1984). With 20 cluster members in A 999 and 22 in A 1016, the authors can estimate the probability that these clusters are actually drawn from spherically symmetric distributions. By comparing the clusters with Monte Carlo King models, they find that A 999 is probably intrinsically spherically symmetric, but A 1016 is probably linear. The authors estimate that ⪆2% of a catalog of spherically symmetric clusters might be erroneously classified as linear. They use the data to estimate the virial masses for these systems. The authors reassess the cluster-galaxy alignment analysis of Adams, Strom, and Strom (1980) and examine the relationship between the luminosity and morphological type of the cluster members and the cluster itself.

  8. Cluster-lensing: A Python Package for Galaxy Clusters and Miscentering

    NASA Astrophysics Data System (ADS)

    Ford, Jes; VanderPlas, Jake

    2016-12-01

    We describe a new open source package for calculating properties of galaxy clusters, including Navarro, Frenk, and White halo profiles with and without the effects of cluster miscentering. This pure-Python package, cluster-lensing, provides well-documented and easy-to-use classes and functions for calculating cluster scaling relations, including mass-richness and mass-concentration relations from the literature, as well as the surface mass density Σ(R) and differential surface mass density ΔΣ(R) profiles, probed by weak lensing magnification and shear. Galaxy cluster miscentering is especially a concern for stacked weak lensing shear studies of galaxy clusters, where offsets between the assumed and the true underlying matter distribution can lead to a significant bias in the mass estimates if not accounted for. This software has been developed and released in a public GitHub repository, and is licensed under the permissive MIT license. The cluster-lensing package is archived on Zenodo. Full documentation, source code, and installation instructions are available at http://jesford.github.io/cluster-lensing/.

  9. MODEL-FREE MULTI-PROBE LENSING RECONSTRUCTION OF CLUSTER MASS PROFILES

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Umetsu, Keiichi

    2013-05-20

    Lens magnification by galaxy clusters induces characteristic spatial variations in the number counts of background sources, amplifying their observed fluxes and expanding the area of sky, the net effect of which, known as magnification bias, depends on the intrinsic faint-end slope of the source luminosity function. The bias is strongly negative for red galaxies, dominated by the geometric area distortion, whereas it is mildly positive for blue galaxies, enhancing the blue counts toward the cluster center. We generalize the Bayesian approach of Umetsu et al. for reconstructing projected cluster mass profiles, by incorporating multiple populations of background sources for magnification-bias measurements and combining them with complementary lens-distortion measurements, effectively breaking the mass-sheet degeneracy and improving the statistical precision of cluster mass measurements. The approach can be further extended to include strong-lensing projected mass estimates, thus allowing for non-parametric absolute mass determinations in both the weak and strong regimes. We apply this method to our recent CLASH lensing measurements of MACS J1206.2-0847, and demonstrate how combining multi-probe lensing constraints can improve the reconstruction of cluster mass profiles. This method will also be useful for a stacked lensing analysis, combining all lensing-related effects in the cluster regime, for a definitive determination of the averaged mass profile.
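
The sign of the magnification bias follows from the standard count-perturbation relation n/n0 = μ^(2.5s − 1), where s is the logarithmic slope of the cumulative source counts. A small sketch with illustrative values:

```python
def count_bias(mu, s):
    """Magnification-bias factor n/n0 = mu**(2.5*s - 1) for magnification mu
    and count slope s = dlog10 N(<m)/dm (standard weak-lensing relation)."""
    return mu ** (2.5 * s - 1.0)

mu = 1.5  # modest magnification near a cluster (assumed value)
print(count_bias(mu, 0.1))  # flat counts (red galaxies): depletion, < 1
print(count_bias(mu, 0.9))  # steep counts (blue galaxies): enhancement, > 1
```

At s = 0.4 the flux amplification and the area dilution cancel exactly, which is why red and blue populations respond with opposite signs around that slope.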

  10. The Development of a Model for Estimating the Costs Associated with the Delivery of a Metals Cluster Program.

    ERIC Educational Resources Information Center

    Hunt, Charles R.

    A study developed a model to assist school administrators to estimate costs associated with the delivery of a metals cluster program at Norfolk State College, Virginia. It sought to construct the model so that costs could be explained as a function of enrollment levels. Data were collected through a literature review, computer searches of the…

  11. Improving the realism of hydrologic model through multivariate parameter estimation

    NASA Astrophysics Data System (ADS)

    Rakovec, Oldrich; Kumar, Rohini; Attinger, Sabine; Samaniego, Luis

    2017-04-01

    Increased availability and quality of near real-time observations should improve understanding of the predictive skill of hydrological models. Recent studies have shown the limited capability of river discharge data alone to adequately constrain different components of distributed model parameterizations. In this study, the GRACE satellite-based total water storage (TWS) anomaly is used to complement the discharge data with the aim of improving the fidelity of the mesoscale hydrologic model (mHM) through multivariate parameter estimation. The study is conducted in 83 European basins covering a wide range of hydro-climatic regimes. The model parameterization complemented with the TWS anomalies leads to statistically significant improvements in (1) discharge simulations during low-flow periods, and (2) evapotranspiration estimates, which are evaluated against independent (FLUXNET) data. Overall, there is no significant deterioration in model performance for the discharge simulations when complemented by information from the TWS anomalies. However, considerable changes in the partitioning of precipitation into runoff components are noticed by in-/exclusion of TWS during the parameter estimation. A cross-validation test carried out to assess the transferability and robustness of the calibrated parameters to other locations further confirms the benefit of complementary TWS data. In particular, the evapotranspiration estimates show more robust performance when TWS data are incorporated during the parameter estimation, in comparison with the benchmark model constrained against discharge only. This study highlights the value of incorporating multiple data sources during parameter estimation to improve the overall realism of hydrologic models and their applications over large domains. Rakovec, O., Kumar, R., Attinger, S. and Samaniego, L. (2016): Improving the realism of hydrologic model functioning through multivariate parameter estimation. Water Resour. Res., 52, http://dx.doi.org/10
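
A multivariate objective of this kind can be sketched as a weighted sum of a discharge metric and a TWS metric. The weights and the choice of Kling-Gupta efficiency below are illustrative assumptions, not the mHM calibration setup:

```python
import numpy as np

def kge(sim, obs):
    """Kling-Gupta efficiency (standard 2009 formulation)."""
    r = np.corrcoef(sim, obs)[0, 1]
    alpha = sim.std() / obs.std()
    beta = sim.mean() / obs.mean()
    return 1.0 - np.sqrt((r - 1.0)**2 + (alpha - 1.0)**2 + (beta - 1.0)**2)

def multi_objective(sim_q, obs_q, sim_tws, obs_tws, w=0.5):
    """Weighted sum of discharge skill (KGE) and TWS anomaly skill
    (correlation); weights and metrics are illustrative assumptions."""
    tws_r = np.corrcoef(sim_tws, obs_tws)[0, 1]
    return w * (1.0 - kge(sim_q, obs_q)) + (1.0 - w) * (1.0 - tws_r)

obs_q = np.array([1.0, 2.0, 3.0, 2.5, 1.5])      # observed discharge
sim_q = np.array([1.1, 1.9, 2.8, 2.6, 1.4])      # simulated discharge
obs_tws = np.array([-2.0, 1.0, 3.0, 0.5, -1.0])  # observed TWS anomaly
sim_tws = np.array([-1.8, 0.8, 2.9, 0.7, -1.2])  # simulated TWS anomaly
cost = multi_objective(sim_q, obs_q, sim_tws, obs_tws)  # lower is better
```

Minimizing such a combined cost during calibration is what lets the TWS anomalies constrain storage-related parameters that discharge alone leaves loose.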

  12. A partial list of southern clusters of galaxies

    NASA Technical Reports Server (NTRS)

    Quintana, H.; White, R. A.

    1990-01-01

    An inspection of 34 SRC/ESO J southern sky fields is the basis of the present list of clusters of galaxies and their approximate classifications in terms of cluster concentration, defined independently of richness and shape-symmetry. Where possible, an estimate of the cluster morphological population is provided. The Bautz-Morgan classification was applied using a strict comparison with clusters on the Palomar Sky Survey. Magnitudes were estimated on the basis of galaxies with photoelectric or photographic magnitudes.

  13. MASSCLEANage—Stellar Cluster Ages from Integrated Colors

    NASA Astrophysics Data System (ADS)

    Popescu, Bogdan; Hanson, M. M.

    2010-11-01

    We present the recently updated and expanded MASSCLEANcolors, a database of 70 million Monte Carlo models selected to match the properties (metallicity, ages, and masses) of stellar clusters found in the Large Magellanic Cloud (LMC). This database shows the rather extreme and non-Gaussian distribution of integrated colors and magnitudes expected with different cluster age and mass and the enormous age degeneracy of integrated colors when mass is unknown. This degeneracy could lead to catastrophic failures in estimating age with standard simple stellar population models, particularly if most of the clusters are of intermediate or low mass, like in the LMC. Utilizing the MASSCLEANcolors database, we have developed MASSCLEANage, a statistical inference package which assigns the most likely age and mass (solved simultaneously) to a cluster based only on its integrated broadband photometric properties. Finally, we use MASSCLEANage to derive the age and mass of LMC clusters based on integrated photometry alone. First, we compare our cluster ages against those obtained for the same seven clusters using more accurate integrated spectroscopy. We find improved agreement with the integrated spectroscopy ages over the original photometric ages. A close examination of our results demonstrates the necessity of solving simultaneously for mass and age to reduce degeneracies in the cluster ages derived via integrated colors. We then selected an additional subset of 30 photometric clusters with previously well-constrained ages and independently derived their ages using MASSCLEANage with the same photometry, with very good agreement. The MASSCLEANage program is freely available under GNU General Public License.
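
The idea of solving simultaneously for age and mass can be illustrated with a toy grid likelihood. The model functions and grids below are invented for illustration and bear no relation to the actual MASSCLEAN models:

```python
import numpy as np

# Toy model grid: integrated color and magnitude as smooth functions of
# log(age) and log(mass); purely illustrative, not the MASSCLEAN database
ages = np.linspace(6.5, 9.5, 31)    # log10(age/yr) grid
masses = np.linspace(2.5, 5.0, 26)  # log10(M/Msun) grid

def model_color(a, m):
    return 0.4 * (a - 6.5) - 0.05 * (m - 2.5)

def model_mag(a, m):
    return -2.5 * (m - 2.5) + 0.3 * (a - 6.5)

# 'Observed' cluster generated from the same toy model
a_true, m_true = 8.0, 4.0
obs_color = model_color(a_true, m_true)
obs_mag = model_mag(a_true, m_true)
sigma_color, sigma_mag = 0.05, 0.1

# Joint chi-square over the (age, mass) grid: fitting both observables at
# once breaks the age-mass degeneracy a color-only fit would suffer
A, M = np.meshgrid(ages, masses, indexing='ij')
chi2 = ((model_color(A, M) - obs_color) / sigma_color) ** 2 \
     + ((model_mag(A, M) - obs_mag) / sigma_mag) ** 2
i, j = np.unravel_index(np.argmin(chi2), chi2.shape)
print(ages[i], masses[j])  # recovers approximately (8.0, 4.0)
```
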

  14. MASSCLEANage-STELLAR CLUSTER AGES FROM INTEGRATED COLORS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Popescu, Bogdan; Hanson, M. M., E-mail: popescb@mail.uc.ed, E-mail: margaret.hanson@uc.ed

    2010-11-20

    We present the recently updated and expanded MASSCLEANcolors, a database of 70 million Monte Carlo models selected to match the properties (metallicity, ages, and masses) of stellar clusters found in the Large Magellanic Cloud (LMC). This database shows the rather extreme and non-Gaussian distribution of integrated colors and magnitudes expected with different cluster age and mass and the enormous age degeneracy of integrated colors when mass is unknown. This degeneracy could lead to catastrophic failures in estimating age with standard simple stellar population models, particularly if most of the clusters are of intermediate or low mass, like in the LMC. Utilizing the MASSCLEANcolors database, we have developed MASSCLEANage, a statistical inference package which assigns the most likely age and mass (solved simultaneously) to a cluster based only on its integrated broadband photometric properties. Finally, we use MASSCLEANage to derive the age and mass of LMC clusters based on integrated photometry alone. First, we compare our cluster ages against those obtained for the same seven clusters using more accurate integrated spectroscopy. We find improved agreement with the integrated spectroscopy ages over the original photometric ages. A close examination of our results demonstrates the necessity of solving simultaneously for mass and age to reduce degeneracies in the cluster ages derived via integrated colors. We then selected an additional subset of 30 photometric clusters with previously well-constrained ages and independently derived their ages using MASSCLEANage with the same photometry, with very good agreement. The MASSCLEANage program is freely available under GNU General Public License.

  15. Participatory women's groups and counselling through home visits to improve child growth in rural eastern India: protocol for a cluster randomised controlled trial.

    PubMed

    Nair, Nirmala; Tripathy, Prasanta; Sachdev, Harshpal S; Bhattacharyya, Sanghita; Gope, Rajkumar; Gagrai, Sumitra; Rath, Shibanand; Rath, Suchitra; Sinha, Rajesh; Roy, Swati Sarbani; Shewale, Suhas; Singh, Vijay; Srivastava, Aradhana; Pradhan, Hemanta; Costello, Anthony; Copas, Andrew; Skordis-Worrall, Jolene; Haghparast-Bidgoli, Hassan; Saville, Naomi; Prost, Audrey

    2015-04-15

    Child stunting (low height-for-age) is a marker of chronic undernutrition and predicts children's subsequent physical and cognitive development. Around one third of the world's stunted children live in India. Our study aims to assess the impact, cost-effectiveness, and scalability of a community intervention with a government-proposed community-based worker to improve growth in children under two in rural India. The study is a cluster randomised controlled trial in two rural districts of Jharkhand and Odisha (eastern India). The intervention tested involves a community-based worker carrying out two activities: (a) one home visit to all pregnant women in the third trimester, followed by subsequent monthly home visits to all infants aged 0-24 months to support appropriate feeding, infection control, and care-giving; (b) a monthly women's group meeting using participatory learning and action to catalyse individual and community action for maternal and child health and nutrition. Both intervention and control clusters also receive an intervention to strengthen Village Health Sanitation and Nutrition Committees. The unit of randomisation is a purposively selected cluster of approximately 1000 population. A total of 120 geographical clusters covering an estimated population of 121,531 were randomised to two trial arms: 60 clusters in the intervention arm receive home visits, group meetings, and support to Village Health Sanitation and Nutrition Committees; 60 clusters in the control arm receive support to Committees only. The study participants are pregnant women identified in the third trimester of pregnancy and their children (n = 2520). Mothers and their children are followed up at seven time points: during pregnancy, within 72 hours of delivery, and at 3, 6, 9, 12 and 18 months after birth. The trial's primary outcome is children's mean length-for-age Z scores at 18 months. Secondary outcomes include wasting and underweight at all time points, birth weight, growth

  16. How social information can improve estimation accuracy in human groups.

    PubMed

    Jayles, Bertrand; Kim, Hye-Rin; Escobedo, Ramón; Cezera, Stéphane; Blanchet, Adrien; Kameda, Tatsuya; Sire, Clément; Theraulaz, Guy

    2017-11-21

    In our digital and connected societies, the development of social networks, online shopping, and reputation systems raises the questions of how individuals use social information and how it affects their decisions. We report experiments performed in France and Japan, in which subjects could update their estimates after having received information from other subjects. We measure and model the impact of this social information at individual and collective scales. We observe and justify that, when individuals have little prior knowledge about a quantity, the distribution of the logarithm of their estimates is close to a Cauchy distribution. We find that social influence helps the group improve its properly defined collective accuracy. We quantify the improvement of the group estimation when additional controlled and reliable information is provided, unbeknownst to the subjects. We show that subjects' sensitivity to social influence permits us to define five robust behavioral traits and increases with the difference between personal and group estimates. We then use our data to build and calibrate a model of collective estimation to analyze the impact on the group performance of the quantity and quality of information received by individuals. The model quantitatively reproduces the distributions of estimates and the improvement of collective performance and accuracy observed in our experiments. Finally, our model predicts that providing a moderate amount of incorrect information to individuals can counterbalance the human cognitive bias to systematically underestimate quantities and thereby improve collective performance. Copyright © 2017 the Author(s). Published by PNAS.
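
The heavy-tailed (Cauchy) distribution of log-estimates has a practical consequence: the arithmetic mean of estimates is unusable, while the median remains a robust collective estimate. A minimal simulation, in which the distribution width and sample size are assumed values:

```python
import numpy as np

rng = np.random.default_rng(2)
true_value = 10_000.0

# Log10-estimates scatter around the truth with heavy Cauchy tails, as
# observed for poorly known quantities (width 0.3 is an assumed value)
log_estimates = np.log10(true_value) + 0.3 * rng.standard_cauchy(1_000)
estimates = 10.0 ** log_estimates

# Under Cauchy noise the mean is dominated by extreme answers; the median
# is a robust collective estimate
median_est = np.median(estimates)
print(median_est)  # close to the true value of 10,000
```

This robustness of the median is one reason a properly defined collective accuracy can improve under social influence even when individual answers are wildly dispersed.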

  17. How social information can improve estimation accuracy in human groups

    PubMed Central

    Jayles, Bertrand; Kim, Hye-rin; Cezera, Stéphane; Blanchet, Adrien; Kameda, Tatsuya; Sire, Clément; Theraulaz, Guy

    2017-01-01

    In our digital and connected societies, the development of social networks, online shopping, and reputation systems raises the questions of how individuals use social information and how it affects their decisions. We report experiments performed in France and Japan, in which subjects could update their estimates after having received information from other subjects. We measure and model the impact of this social information at individual and collective scales. We observe and justify that, when individuals have little prior knowledge about a quantity, the distribution of the logarithm of their estimates is close to a Cauchy distribution. We find that social influence helps the group improve its properly defined collective accuracy. We quantify the improvement of the group estimation when additional controlled and reliable information is provided, unbeknownst to the subjects. We show that subjects’ sensitivity to social influence permits us to define five robust behavioral traits and increases with the difference between personal and group estimates. We then use our data to build and calibrate a model of collective estimation to analyze the impact on the group performance of the quantity and quality of information received by individuals. The model quantitatively reproduces the distributions of estimates and the improvement of collective performance and accuracy observed in our experiments. Finally, our model predicts that providing a moderate amount of incorrect information to individuals can counterbalance the human cognitive bias to systematically underestimate quantities and thereby improve collective performance. PMID:29118142

  18. ASCA Temperature Maps for Merging and Relaxed Clusters and Physics of the Cluster Gas

    NASA Technical Reports Server (NTRS)

    Markevitch, M.; Sarazin, C.; Nevalainen, J.; Vikhlinin, A.; Forman, W.

    1999-01-01

    ASCA temperature maps for several galaxy clusters undergoing strong mergers will be presented. From these maps, it is possible to estimate velocities of the colliding subclusters. I will discuss several interesting implications of these estimates for the physics of the cluster gas and the shape of the gravitational potential. I will also present temperature maps and profiles for several relaxed clusters selected for X-ray mass determination, and present the mass values derived without the assumption of isothermality. The accurate mass-temperature and luminosity-temperature relations will be discussed. This talk will review how AXAF will revolutionize X-ray astronomy through its radically better imaging and spectroscopic resolution. Examples from many fields of astrophysics will be given.

  19. Sample size adjustments for varying cluster sizes in cluster randomized trials with binary outcomes analyzed with second-order PQL mixed logistic regression.

    PubMed

    Candel, Math J J M; Van Breukelen, Gerard J P

    2010-06-30

    Adjustments of sample size formulas are given for varying cluster sizes in cluster randomized trials with a binary outcome when testing the treatment effect with mixed effects logistic regression using second-order penalized quasi-likelihood estimation (PQL). Starting from first-order marginal quasi-likelihood (MQL) estimation of the treatment effect, the asymptotic relative efficiency of unequal versus equal cluster sizes is derived. A Monte Carlo simulation study shows this asymptotic relative efficiency to be rather accurate for realistic sample sizes, when employing second-order PQL. An approximate, simpler formula is presented to estimate the efficiency loss due to varying cluster sizes when planning a trial. In many cases sampling 14 per cent more clusters is sufficient to repair the efficiency loss due to varying cluster sizes. Since current closed-form formulas for sample size calculation are based on first-order MQL, planning a trial also requires a conversion factor to obtain the variance of the second-order PQL estimator. In a second Monte Carlo study, this conversion factor turned out to be 1.25 at most. (c) 2010 John Wiley & Sons, Ltd.
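
The efficiency loss can be approximated with the widely used design-effect formula for variable cluster sizes (a cruder approximation than the paper's second-order PQL results; the numbers below are illustrative):

```python
# Design-effect approximation for cluster randomized trials with varying
# cluster sizes (Eldridge-style approximation; illustrative numbers)
def design_effect(m_bar, icc, cv=0.0):
    """DEFF = 1 + ((cv**2 + 1) * m_bar - 1) * icc, where m_bar is the mean
    cluster size, icc the intraclass correlation, and cv the coefficient
    of variation of cluster sizes (cv=0 reproduces the equal-size case)."""
    return 1.0 + ((cv**2 + 1.0) * m_bar - 1.0) * icc

m_bar, icc = 20, 0.05
deff_equal = design_effect(m_bar, icc)            # equal cluster sizes
deff_varying = design_effect(m_bar, icc, cv=0.5)  # moderately varying sizes
extra_clusters = deff_varying / deff_equal - 1.0  # fractional extra clusters
print(round(100 * extra_clusters, 1))  # ~13% more clusters, near the paper's 14%
```
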

  20. Study of a few cluster candidates in the Magellanic Bridge

    NASA Astrophysics Data System (ADS)

    Choudhury, Samyaday; Subramaniam, Annapurni; Sohn, Young-Jong

    2018-06-01

    The Magellanic Clouds (LMC & SMC) are gas-rich, metal-poor dwarf satellite galaxies of our Milky Way that are interacting with each other. The Magellanic Bridge (MB), joining the larger and smaller Cloud, is considered to be a signature of this interaction process. Studies have revealed that the MB, apart from gas, also hosts stellar populations and star clusters. The census of clusters with well-estimated parameters within the MB is still incomplete. In this work, we study a sample of 9 previously cataloged star clusters in the MB region. We use Washington C, Harris R, and Cousins I band data from the literature, taken with the 4-m Blanco telescope, to estimate the cluster properties (size, age, reddening). We also identify and separate genuine cluster candidates from possible clusters/asterisms. Increasing the number of genuine cluster candidates with well-estimated parameters is important for understanding cluster formation and evolution in such a low-metallicity, tidally disrupted environment. The clusters studied here can also help estimate distances to different parts of the MB, as recent studies indicate that portions of the MB near the SMC are closer to us than the LMC.

  1. [Automatic Sleep Stage Classification Based on an Improved K-means Clustering Algorithm].

    PubMed

    Xiao, Shuyuan; Wang, Bei; Zhang, Jian; Zhang, Qunfeng; Zou, Junzhong

    2016-10-01

    Sleep stage scoring is a hotspot in the field of medicine and neuroscience. Visual inspection of sleep is laborious and the results may be subjective to different clinicians. Automatic sleep stage classification algorithms can be used to reduce the manual workload. However, limitations remain when such algorithms encounter complicated and changeable clinical cases. The purpose of this paper is to develop an automatic sleep staging algorithm based on the characteristics of actual sleep data. In the proposed improved K-means clustering algorithm, points were selected as the initial centers by using a concept of density to avoid the randomness of the original K-means algorithm. Meanwhile, the cluster centers were updated according to the 'Three-Sigma Rule' during the iteration to abate the influence of the outliers. The proposed method was tested and analyzed on the overnight sleep data of healthy persons and patients with sleep disorders after continuous positive airway pressure (CPAP) treatment. The automatic sleep stage classification results were compared with the visual inspection by qualified clinicians and the averaged accuracy reached 76%. With the analysis of morphological diversity of sleep data, it was proved that the proposed improved K-means algorithm was feasible and valid for clinical practice.
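
The two modifications, density-based seeding and a three-sigma trim of outliers before each center update, can be sketched as follows. This is a simplified 2-D reconstruction from the abstract, not the authors' EEG pipeline, and the density radius r is an assumed tuning parameter:

```python
import numpy as np

def density_init(X, k, r):
    """Choose initial centers as the densest points (most neighbors within
    radius r), suppressing the neighborhood of each chosen center."""
    d = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    density = (d < r).sum(axis=1).astype(float)
    centers = []
    for _ in range(k):
        i = int(np.argmax(density))
        centers.append(X[i].copy())
        density[d[i] < 2.0 * r] = -1.0  # do not reuse this neighborhood
    return np.array(centers)

def kmeans_3sigma(X, k, r, iters=20):
    centers = density_init(X, k, r)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centers[None, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            pts = X[labels == j]
            # 'Three-Sigma Rule': drop outliers before updating the center
            mu, sd = pts.mean(axis=0), pts.std(axis=0) + 1e-12
            keep = (np.abs(pts - mu) <= 3.0 * sd).all(axis=1)
            centers[j] = pts[keep].mean(axis=0)
    return labels, centers

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0.0, 0.5, (60, 2)),
               rng.normal(6.0, 0.5, (60, 2))])
labels, centers = kmeans_3sigma(X, k=2, r=1.0)
```

Density-based seeding removes the run-to-run variability of random initialization, and the trim keeps transient artifacts from dragging a center off its stage.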

  2. State estimation improves prospects for ocean research

    NASA Astrophysics Data System (ADS)

    Stammer, Detlef; Wunsch, C.; Fukumori, I.; Marshall, J.

    Rigorous global ocean state estimation methods can now be used to produce dynamically consistent time-varying model/data syntheses, the results of which are being used to study a variety of important scientific problems. Figure 1 shows a schematic of a complete ocean observing and synthesis system that includes global observations and state-of-the-art ocean general circulation models (OGCM) run on modern computer platforms. A global observing system is described in detail in Smith and Koblinsky [2001], and the present status of ocean modeling and anticipated improvements are addressed by Griffies et al. [2001]. Here, the focus is on the third component of state estimation: the synthesis of the observations and a model into a unified, dynamically consistent estimate.

  3. Amplification of the entire kanamycin biosynthetic gene cluster during empirical strain improvement of Streptomyces kanamyceticus.

    PubMed

    Yanai, Koji; Murakami, Takeshi; Bibb, Mervyn

    2006-06-20

    Streptomyces kanamyceticus 12-6 is a derivative of the wild-type strain developed for industrial kanamycin (Km) production. Southern analysis and DNA sequencing revealed amplification of a large genomic segment including the entire Km biosynthetic gene cluster in the chromosome of strain 12-6. At 145 kb, the amplifiable unit of DNA (AUD) is the largest AUD reported in Streptomyces. Striking repetitive DNA sequences belonging to the clustered regularly interspaced short palindromic repeats family were found in the AUD and may play a role in its amplification. Strain 12-6 contains a mixture of different chromosomes with varying numbers of AUDs, sometimes exceeding 36 copies and producing an amplified region >5.7 Mb. The level of Km production depended on the copy number of the Km biosynthetic gene cluster, suggesting that DNA amplification occurred during strain improvement as a consequence of selection for increased Km resistance. Amplification of DNA segments including entire antibiotic biosynthetic gene clusters might be a common mechanism leading to increased antibiotic production in industrial strains.

  4. A Bayesian hierarchical model for mortality data from cluster-sampling household surveys in humanitarian crises.

    PubMed

    Heudtlass, Peter; Guha-Sapir, Debarati; Speybroeck, Niko

    2018-05-31

    The crude death rate (CDR) is one of the defining indicators of humanitarian emergencies. When data from vital registration systems are not available, it is common practice to estimate the CDR from household surveys with cluster-sampling design. However, sample sizes are often too small to compare mortality estimates to emergency thresholds, at least in a frequentist framework. Several authors have proposed Bayesian methods for health surveys in humanitarian crises. Here, we develop an approach specifically for mortality data and cluster-sampling surveys. We describe a Bayesian hierarchical Poisson-Gamma mixture model with generic (weakly informative) priors that could be used as defaults in the absence of any specific prior knowledge, and compare Bayesian and frequentist CDR estimates using five different mortality datasets. We provide an interpretation of the Bayesian estimates in the context of an emergency threshold and demonstrate how to interpret parameters at the cluster level and ways in which informative priors can be introduced. With the same set of weakly informative priors, Bayesian CDR estimates are equivalent to frequentist estimates, for all practical purposes. The probability that the CDR surpasses the emergency threshold can be derived directly from the posterior of the mean of the mixing distribution. All observations in the datasets contribute to the cluster-level estimates through the hierarchical structure of the model. In a context of sparse data, Bayesian mortality assessments have advantages over frequentist ones already when using only weakly informative priors. More informative priors offer a formal and transparent way of combining new data with existing data and expert knowledge and can help to improve decision-making in humanitarian crises by complementing frequentist estimates.
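
A non-hierarchical version of the idea, a conjugate Gamma-Poisson update for the death rate under a weakly informative prior, already shows how the threshold probability falls out of the posterior. The prior and survey numbers below are invented for illustration and are not the paper's:

```python
from scipy import stats

# Weakly informative Gamma prior on the death rate (deaths per person-day);
# the prior and survey numbers are invented for illustration
a0, b0 = 0.5, 1000.0
deaths = [2, 0, 1, 3, 0, 1]                      # deaths per cluster
exposure = [9000, 8500, 9200, 8800, 9100, 9000]  # person-days per cluster

# Conjugate Gamma-Poisson update pools the clusters into one posterior
a_post = a0 + sum(deaths)
b_post = b0 + sum(exposure)
posterior = stats.gamma(a=a_post, scale=1.0 / b_post)

threshold = 1.0 / 10_000.0               # emergency threshold: 1/10,000/day
cdr_mean = posterior.mean() * 10_000.0   # posterior mean CDR per 10,000/day
p_exceed = posterior.sf(threshold)       # P(CDR exceeds the threshold)
print(cdr_mean, p_exceed)
```

The paper's hierarchical mixture additionally lets cluster-level rates vary around the pooled mean; this sketch collapses that layer for brevity.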

  5. Variable selection based on clustering analysis for improvement of polyphenols prediction in green tea using synchronous fluorescence spectra

    NASA Astrophysics Data System (ADS)

    Shan, Jiajia; Wang, Xue; Zhou, Hao; Han, Shuqing; Riza, Dimas Firmanda Al; Kondo, Naoshi

    2018-04-01

    Synchronous fluorescence spectra, combined with multivariate analysis, were used to predict flavonoid content in green tea rapidly and nondestructively. This paper presents a new and efficient spectral interval selection method called clustering-based partial least squares (CL-PLS), which selects informative wavelengths by combining the clustering concept with partial least squares (PLS) methods to improve model performance on synchronous fluorescence spectra. The fluorescence spectra of tea samples were obtained, and K-means and Kohonen self-organizing map clustering algorithms were used to group the full spectra into several clusters; a sub-PLS regression model was then developed on each cluster. Finally, CL-PLS models consisting of gradually selected clusters were built. The correlation coefficient (R) was used to evaluate the prediction performance of the PLS models. In addition, variable influence on projection PLS (VIP-PLS), selectivity ratio PLS (SR-PLS), interval PLS (iPLS), and full-spectrum PLS models were investigated and the results compared. The results showed that CL-PLS gave the best performance for flavonoid prediction using synchronous fluorescence spectra.

  6. Variable selection based on clustering analysis for improvement of polyphenols prediction in green tea using synchronous fluorescence spectra.

    PubMed

    Shan, Jiajia; Wang, Xue; Zhou, Hao; Han, Shuqing; Riza, Dimas Firmanda Al; Kondo, Naoshi

    2018-03-13

    Synchronous fluorescence spectra, combined with multivariate analysis, were used to predict flavonoid content in green tea rapidly and nondestructively. This paper presents a new and efficient spectral interval selection method called clustering-based partial least squares (CL-PLS), which selects informative wavelengths by combining the clustering concept with partial least squares (PLS) methods to improve model performance on synchronous fluorescence spectra. The fluorescence spectra of tea samples were obtained, and K-means and Kohonen self-organizing map clustering algorithms were used to group the full spectra into several clusters; a sub-PLS regression model was then developed on each cluster. Finally, CL-PLS models consisting of gradually selected clusters were built. The correlation coefficient (R) was used to evaluate the prediction performance of the PLS models. In addition, variable influence on projection PLS (VIP-PLS), selectivity ratio PLS (SR-PLS), interval PLS (iPLS), and full-spectrum PLS models were investigated and the results compared. The results showed that CL-PLS gave the best performance for flavonoid prediction using synchronous fluorescence spectra.

  7. Improving Evapotranspiration Estimates Using Multi-Platform Remote Sensing

    NASA Astrophysics Data System (ADS)

    Knipper, Kyle; Hogue, Terri; Franz, Kristie; Scott, Russell

    2016-04-01

    Understanding the linkages between energy and water cycles through evapotranspiration (ET) is uniquely challenging given its dependence on a range of climatological parameters and surface/atmospheric heterogeneity. A number of methods have been developed to estimate ET either from primarily remote-sensing observations, in-situ measurements, or a combination of the two. However, the scale of many of these methods may be too large to provide needed information about the spatial and temporal variability of ET that can occur over regions with acute or chronic land cover change and precipitation driven fluxes. The current study aims to improve the spatial and temporal variability of ET utilizing only satellite-based observations by incorporating a potential evapotranspiration (PET) methodology with satellite-based down-scaled soil moisture estimates in southern Arizona, USA. Initially, soil moisture estimates from AMSR2 and SMOS are downscaled to 1km through a triangular relationship between MODIS land surface temperature (MYD11A1), vegetation indices (MOD13Q1/MYD13Q1), and brightness temperature. Downscaled soil moisture values are then used to scale PET to actual ET (AET) at a daily, 1km resolution. Derived AET estimates are compared to observed flux tower estimates, the North American Land Data Assimilation System (NLDAS) model output (i.e. Variable Infiltration Capacity (VIC) Macroscale Hydrologic Model, Mosaic Model, and Noah Model simulations), the Operational Simplified Surface Energy Balance Model (SSEBop), and a calibrated empirical ET model created specifically for the region. Preliminary results indicate a strong increase in correlation when incorporating the downscaling technique to original AMSR2 and SMOS soil moisture values, with the added benefit of being able to decipher small scale heterogeneity in soil moisture (riparian versus desert grassland). AET results show strong correlations with relatively low error and bias when compared to flux tower
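
Scaling PET to AET with a soil-moisture stress factor can be illustrated with the common linear stress function between wilting point and a critical moisture level (a generic simplification; the study's scaling via the downscaled triangle-method soil moisture differs):

```python
import numpy as np

def aet_from_pet(pet, sm, sm_wilt=0.05, sm_crit=0.30):
    """Scale potential ET to actual ET with a linear soil-moisture stress
    factor between wilting point and a critical level (a common
    simplification; thresholds here are assumed values)."""
    stress = np.clip((sm - sm_wilt) / (sm_crit - sm_wilt), 0.0, 1.0)
    return pet * stress

pet = 6.0                           # mm/day, satellite-based PET
sm = np.array([0.04, 0.15, 0.40])   # downscaled volumetric soil moisture
aet = aet_from_pet(pet, sm)         # dry pixel -> 0, wet pixel -> full PET
```

Because the stress factor follows the 1 km downscaled soil moisture, AET inherits its fine-scale spatial pattern even though PET varies smoothly.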

  8. Graph Based Models for Unsupervised High Dimensional Data Clustering and Network Analysis

    DTIC Science & Technology

    2015-01-01

    The graph-based clustering algorithms we propose improve time efficiency significantly for large-scale datasets. In the last chapter, we also propose an incremental reseeding ... plume detection in hyper-spectral video data.

  9. An adaptive displacement estimation algorithm for improved reconstruction of thermal strain.

    PubMed

    Ding, Xuan; Dutta, Debaditya; Mahmoud, Ahmed M; Tillman, Bryan; Leers, Steven A; Kim, Kang

    2015-01-01

    Thermal strain imaging (TSI) can be used to differentiate between lipid and water-based tissues in atherosclerotic arteries. However, detecting small lipid pools in vivo requires accurate and robust displacement estimation over a wide range of displacement magnitudes. Phase-shift estimators such as Loupas' estimator and time-shift estimators such as normalized cross-correlation (NXcorr) are commonly used to track tissue displacements. However, Loupas' estimator is limited by phase-wrapping and NXcorr performs poorly when the SNR is low. In this paper, we present an adaptive displacement estimation algorithm that combines both Loupas' estimator and NXcorr. We evaluated this algorithm using computer simulations and an ex vivo human tissue sample. Using 1-D simulation studies, we showed that when the displacement magnitude induced by thermal strain was >λ/8 and the electronic system SNR was >25.5 dB, the NXcorr displacement estimate was less biased than the estimate found using Loupas' estimator. On the other hand, when the displacement magnitude was ≤λ/4 and the electronic system SNR was ≤25.5 dB, Loupas' estimator had less variance than NXcorr. We used these findings to design an adaptive displacement estimation algorithm. Computer simulations of TSI showed that the adaptive displacement estimator was less biased than either Loupas' estimator or NXcorr. Strain reconstructed from the adaptive displacement estimates improved the strain SNR by 43.7 to 350% and the spatial accuracy by 1.2 to 23.0% (P < 0.001). An ex vivo human tissue study provided results that were comparable to computer simulations. The results of this study showed that a novel displacement estimation algorithm, which combines two different displacement estimators, yielded improved displacement estimation and resulted in improved strain reconstruction.
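    One plausible reading of the selection rule suggested by these simulation findings can be sketched as follows. The threshold values come from the abstract; the estimator implementations themselves are omitted, and the function name is hypothetical.

```python
# Sketch of the selection rule suggested by the simulation findings:
# prefer NXcorr for large displacements at high SNR (no phase wrapping,
# less bias) and Loupas' estimator otherwise (lower variance at low SNR).
# Thresholds are the values quoted in the abstract; the two estimators
# themselves are beyond this sketch and only named here.

def choose_estimator(displacement_wavelengths, snr_db,
                     disp_threshold=1.0 / 8.0, snr_threshold=25.5):
    """Return the name of the preferred displacement estimator."""
    if displacement_wavelengths > disp_threshold and snr_db > snr_threshold:
        return "nxcorr"   # time-shift estimator: less biased here
    return "loupas"       # phase-shift estimator: lower variance here

print(choose_estimator(0.25, 30.0))  # nxcorr
print(choose_estimator(0.05, 20.0))  # loupas
```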

  10. An Adaptive Displacement Estimation Algorithm for Improved Reconstruction of Thermal Strain

    PubMed Central

    Ding, Xuan; Dutta, Debaditya; Mahmoud, Ahmed M.; Tillman, Bryan; Leers, Steven A.; Kim, Kang

    2014-01-01

    Thermal strain imaging (TSI) can be used to differentiate between lipid and water-based tissues in atherosclerotic arteries. However, detecting small lipid pools in vivo requires accurate and robust displacement estimation over a wide range of displacement magnitudes. Phase-shift estimators such as Loupas’ estimator and time-shift estimators such as normalized cross-correlation (NXcorr) are commonly used to track tissue displacements. However, Loupas’ estimator is limited by phase-wrapping and NXcorr performs poorly when the signal-to-noise ratio (SNR) is low. In this paper, we present an adaptive displacement estimation algorithm that combines both Loupas’ estimator and NXcorr. We evaluated this algorithm using computer simulations and an ex-vivo human tissue sample. Using 1-D simulation studies, we showed that when the displacement magnitude induced by thermal strain was >λ/8 and the electronic system SNR was >25.5 dB, the NXcorr displacement estimate was less biased than the estimate found using Loupas’ estimator. On the other hand, when the displacement magnitude was ≤λ/4 and the electronic system SNR was ≤25.5 dB, Loupas’ estimator had less variance than NXcorr. We used these findings to design an adaptive displacement estimation algorithm. Computer simulations of TSI using Field II showed that the adaptive displacement estimator was less biased than either Loupas’ estimator or NXcorr. Strain reconstructed from the adaptive displacement estimates improved the strain SNR by 43.7–350% and the spatial accuracy by 1.2–23.0% (p < 0.001). An ex-vivo human tissue study provided results that were comparable to computer simulations. The results of this study showed that a novel displacement estimation algorithm, which combines two different displacement estimators, yielded improved displacement estimation and resulted in improved strain reconstruction. PMID:25585398

  11. Adaptive OFDM Radar Waveform Design for Improved Micro-Doppler Estimation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sen, Satyabrata

    Here we analyze the performance of a wideband orthogonal frequency division multiplexing (OFDM) signal in estimating the micro-Doppler frequency of a rotating target having multiple scattering centers. The use of a frequency-diverse OFDM signal enables us to independently analyze the micro-Doppler characteristics with respect to a set of orthogonal subcarrier frequencies. We characterize the accuracy of micro-Doppler frequency estimation by computing the Cramer-Rao bound (CRB) on the angular-velocity estimate of the target. Additionally, to improve the accuracy of the estimation procedure, we formulate and solve an optimization problem by minimizing the CRB on the angular-velocity estimate with respect to the OFDM spectral coefficients. We present several numerical examples to demonstrate the CRB variations with respect to the signal-to-noise ratio, the number of temporal samples, and the number of OFDM subcarriers. We also analyze numerically the improvement in estimation accuracy due to the adaptive waveform design. A grid-based maximum likelihood estimation technique is applied to evaluate the corresponding mean-squared error performance.

  12. Could the clinical interpretability of subgroups detected using clustering methods be improved by using a novel two-stage approach?

    PubMed

    Kent, Peter; Stochkendahl, Mette Jensen; Christensen, Henrik Wulff; Kongsted, Alice

    2015-01-01

    Recognition of homogeneous subgroups of patients can usefully improve prediction of their outcomes and the targeting of treatment. There are a number of research approaches that have been used to recognise homogeneity in such subgroups and to test their implications. One approach is to use statistical clustering techniques, such as Cluster Analysis or Latent Class Analysis, to detect latent relationships between patient characteristics. Influential patient characteristics can come from diverse domains of health, such as pain, activity limitation, physical impairment, social role participation, psychological factors, biomarkers and imaging. However, such 'whole person' research may result in data-driven subgroups that are complex, difficult to interpret and challenging to recognise clinically. This paper describes a novel approach to applying statistical clustering techniques that may improve the clinical interpretability of derived subgroups and reduce sample size requirements. This approach involves clustering in two sequential stages. The first stage involves clustering within health domains and therefore requires creating as many clustering models as there are health domains in the available data. This first stage produces scoring patterns within each domain. The second stage involves clustering using the scoring patterns from each health domain (from the first stage) to identify subgroups across all domains. We illustrate this using chest pain data from the baseline presentation of 580 patients. The new two-stage clustering resulted in two subgroups that approximated the classic textbook descriptions of musculoskeletal chest pain and atypical angina chest pain. The traditional single-stage clustering resulted in five clusters that were also clinically recognisable but displayed less distinct differences. In this paper, a new approach to using clustering techniques to identify clinically useful subgroups of patients is suggested. Research designs, statistical
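    The two-stage idea can be sketched with a minimal k-means: stage 1 clusters each health domain separately, and stage 2 clusters patients on their vector of stage-1 domain labels. The domains, data, and cluster counts below are made up for illustration; the per-domain labels play the role of the "scoring patterns" fed to the second stage.

```python
import random

# Minimal k-means (pure Python) used for both stages of the sketch.
def kmeans(points, k, iters=20, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(sorted(set(points)), k)  # distinct initial centers
    for _ in range(iters):
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2
                                        for a, b in zip(p, centers[c])))
                  for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return labels

# Two synthetic health domains (pain, psychology), 6 patients.
pain  = [(0.1,), (0.2,), (0.1,), (5.0,), (5.2,), (4.9,)]
psych = [(1.0,), (1.1,), (0.9,), (8.0,), (8.1,), (7.9,)]

# Stage 1: within-domain clustering -> per-domain scoring patterns.
pain_labels  = kmeans(pain, 2)
psych_labels = kmeans(psych, 2)

# Stage 2: cluster patients on their stage-1 label vectors.
profiles = [(float(a), float(b)) for a, b in zip(pain_labels, psych_labels)]
subgroups = kmeans(profiles, 2)
print(subgroups)  # two clear subgroups, e.g. [0, 0, 0, 1, 1, 1] (labels may swap)
```

    Because stage 2 sees only one coarse label per domain rather than every raw variable, the resulting subgroups are easier to describe clinically, which is the interpretability argument the paper makes.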

  13. Parameter estimation for chaotic systems using improved bird swarm algorithm

    NASA Astrophysics Data System (ADS)

    Xu, Chuangbiao; Yang, Renhuan

    2017-12-01

    Parameter estimation of chaotic systems is an important problem in nonlinear science and has aroused increasing interest in many research fields; it can essentially be reduced to a multidimensional optimization problem. In this paper, an improved boundary bird swarm algorithm (IBBSA) is used to estimate the parameters of chaotic systems. This algorithm combines the good global convergence and robustness of the bird swarm algorithm with the exploitation capability of an improved boundary learning strategy. Experiments are conducted on the Lorenz system and the coupling motor system. Numerical simulation results reveal the effectiveness and desirable performance of IBBSA for parameter estimation of chaotic systems.
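    The reduction to a multidimensional optimization problem can be illustrated as follows: choose parameters that minimize the discrepancy between observed and simulated trajectories. A coarse grid search over one Lorenz parameter stands in for the bird swarm optimizer, whose update rules the abstract does not give.

```python
# Parameter estimation as trajectory-matching optimization. A grid
# search over sigma stands in for the swarm optimizer; rho and beta
# are held at their true values for brevity.

def lorenz_step(state, sigma, rho, beta, dt):
    """One RK4 step of the Lorenz system."""
    def f(s):
        x, y, z = s
        return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)
    def add(a, b, h):
        return tuple(ai + h * bi for ai, bi in zip(a, b))
    k1 = f(state)
    k2 = f(add(state, k1, dt / 2))
    k3 = f(add(state, k2, dt / 2))
    k4 = f(add(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def simulate(sigma, n=200, dt=0.01, rho=28.0, beta=8.0 / 3.0):
    s, out = (1.0, 1.0, 1.0), []
    for _ in range(n):
        s = lorenz_step(s, sigma, rho, beta, dt)
        out.append(s)
    return out

observed = simulate(10.0)  # "measurements" with true sigma = 10

def objective(sigma):
    """Squared trajectory mismatch; zero only at the true parameter."""
    trial = simulate(sigma)
    return sum((a - b) ** 2 for p, q in zip(observed, trial)
               for a, b in zip(p, q))

candidates = [5.0 + 0.5 * i for i in range(21)]  # 5.0 .. 15.0
best = min(candidates, key=objective)
print(best)  # 10.0
```

    Swarm methods such as IBBSA replace the grid with an adaptive population search over all three parameters at once, which scales far better than exhaustive enumeration.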

  14. The halo boundary of galaxy clusters in the SDSS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Baxter, Eric; Chang, Chihway; Jain, Bhuvnesh

    Analytical models and simulations predict a rapid decline in the halo density profile associated with the transition from the "infalling" regime outside the halo to the "collapsed" regime within the halo. Using data from SDSS, we explore evidence for such a feature in the density profiles of galaxy clusters using several different approaches. We first estimate the steepening of the outer galaxy density profile around clusters, finding evidence for truncation of the halo profile. Next, we measure the galaxy density profile around clusters using two sets of galaxies selected on color. We find evidence of an abrupt change in galaxy colors that coincides with the location of the steepening of the density profile. Since galaxies that have completed orbits within the cluster are more likely to be quenched of star formation and thus appear redder, this abrupt change in galaxy color can be associated with the transition from single-stream to multi-stream regimes. We also use a standard model comparison approach to measure evidence for a "splashback"-like feature, but find that this approach is very sensitive to modeling assumptions. Finally, we perform measurements using an independent cluster catalog to test for potential systematic errors associated with cluster selection. We identify several avenues for future work: improved understanding of the small-scale galaxy profile, lensing measurements, identification of proxies for the halo accretion rate, and other tests. As a result, with upcoming data from the DES, KiDS, and HSC surveys, we can expect significant improvements in the study of halo boundaries.

  15. The Halo Boundary of Galaxy Clusters in the SDSS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Baxter, Eric; Jain, Bhuvnesh; Sheth, Ravi K.

    Analytical models and simulations predict a rapid decline in the halo density profile associated with the transition from the “infalling” regime outside the halo to the “collapsed” regime within the halo. Using data from SDSS, we explore evidence for such a feature in the density profiles of galaxy clusters using several different approaches. We first estimate the steepening of the outer galaxy density profile around clusters, finding evidence for truncation of the halo profile. Next, we measure the galaxy density profile around clusters using two sets of galaxies selected on color. We find evidence of an abrupt change in galaxy colors that coincides with the location of the steepening of the density profile. Since galaxies that have completed orbits within the cluster are more likely to be quenched of star formation and thus appear redder, this abrupt change in galaxy color can be associated with the transition from single-stream to multi-stream regimes. We also use a standard model comparison approach to measure evidence for a “splashback”-like feature, but find that this approach is very sensitive to modeling assumptions. Finally, we perform measurements using an independent cluster catalog to test for potential systematic errors associated with cluster selection. We identify several avenues for future work: improved understanding of the small-scale galaxy profile, lensing measurements, identification of proxies for the halo accretion rate, and other tests. With upcoming data from the DES, KiDS, and HSC surveys, we can expect significant improvements in the study of halo boundaries.

  16. The halo boundary of galaxy clusters in the SDSS

    DOE PAGES

    Baxter, Eric; Chang, Chihway; Jain, Bhuvnesh; ...

    2017-05-18

    Analytical models and simulations predict a rapid decline in the halo density profile associated with the transition from the "infalling" regime outside the halo to the "collapsed" regime within the halo. Using data from SDSS, we explore evidence for such a feature in the density profiles of galaxy clusters using several different approaches. We first estimate the steepening of the outer galaxy density profile around clusters, finding evidence for truncation of the halo profile. Next, we measure the galaxy density profile around clusters using two sets of galaxies selected on color. We find evidence of an abrupt change in galaxy colors that coincides with the location of the steepening of the density profile. Since galaxies that have completed orbits within the cluster are more likely to be quenched of star formation and thus appear redder, this abrupt change in galaxy color can be associated with the transition from single-stream to multi-stream regimes. We also use a standard model comparison approach to measure evidence for a "splashback"-like feature, but find that this approach is very sensitive to modeling assumptions. Finally, we perform measurements using an independent cluster catalog to test for potential systematic errors associated with cluster selection. We identify several avenues for future work: improved understanding of the small-scale galaxy profile, lensing measurements, identification of proxies for the halo accretion rate, and other tests. As a result, with upcoming data from the DES, KiDS, and HSC surveys, we can expect significant improvements in the study of halo boundaries.

  17. The Halo Boundary of Galaxy Clusters in the SDSS

    NASA Astrophysics Data System (ADS)

    Baxter, Eric; Chang, Chihway; Jain, Bhuvnesh; Adhikari, Susmita; Dalal, Neal; Kravtsov, Andrey; More, Surhud; Rozo, Eduardo; Rykoff, Eli; Sheth, Ravi K.

    2017-05-01

    Analytical models and simulations predict a rapid decline in the halo density profile associated with the transition from the “infalling” regime outside the halo to the “collapsed” regime within the halo. Using data from SDSS, we explore evidence for such a feature in the density profiles of galaxy clusters using several different approaches. We first estimate the steepening of the outer galaxy density profile around clusters, finding evidence for truncation of the halo profile. Next, we measure the galaxy density profile around clusters using two sets of galaxies selected on color. We find evidence of an abrupt change in galaxy colors that coincides with the location of the steepening of the density profile. Since galaxies that have completed orbits within the cluster are more likely to be quenched of star formation and thus appear redder, this abrupt change in galaxy color can be associated with the transition from single-stream to multi-stream regimes. We also use a standard model comparison approach to measure evidence for a “splashback”-like feature, but find that this approach is very sensitive to modeling assumptions. Finally, we perform measurements using an independent cluster catalog to test for potential systematic errors associated with cluster selection. We identify several avenues for future work: improved understanding of the small-scale galaxy profile, lensing measurements, identification of proxies for the halo accretion rate, and other tests. With upcoming data from the DES, KiDS, and HSC surveys, we can expect significant improvements in the study of halo boundaries.

  18. Improving Estimated Optical Constants With MSTM and DDSCAT Modeling

    NASA Astrophysics Data System (ADS)

    Pitman, K. M.; Wolff, M. J.

    2015-12-01

    We present numerical experiments to determine quantitatively the effects of mineral particle clustering on Mars spacecraft spectral signatures and to improve upon the values of refractive indices (optical constants n, k) derived from Mars dust laboratory analog spectra such as those from RELAB and MRO CRISM libraries. Whereas spectral properties for Mars analog minerals and actual Mars soil are dominated by aggregates of particles smaller than the size of martian atmospheric dust, the analytic radiative transfer (RT) solutions used to interpret planetary surfaces assume that individual, well-separated particles dominate the spectral signature. Both in RT models and in the refractive index derivation methods that include analytic RT approximations, spheres are also over-used to represent nonspherical particles. Part of the motivation is that the integrated effect over randomly oriented particles on quantities such as single scattering albedo and phase function are relatively less than for single particles. However, we have seen in previous numerical experiments that when varying the shape and size of individual grains within a cluster, the phase function changes in both magnitude and slope, thus the "relatively less" effect is more significant than one might think. Here we examine the wavelength dependence of the forward scattering parameter with multisphere T-matrix (MSTM) and discrete dipole approximation (DDSCAT) codes that compute light scattering by layers of particles on planetary surfaces to see how albedo is affected and integrate our model results into refractive index calculations to remove uncertainties in approximations and parameters that can lower the accuracy of optical constants. By correcting the single scattering albedo and phase function terms in the refractive index determinations, our data will help to improve the understanding of Mars in identifying, mapping the distributions, and quantifying abundances for these minerals and will address long

  19. As-built design specification for proportion estimate software subsystem

    NASA Technical Reports Server (NTRS)

    Obrien, S. (Principal Investigator)

    1980-01-01

    The Proportion Estimate Processor evaluates four estimation techniques in order to get an improved estimate of the proportion of a scene that is planted in a selected crop. The four techniques to be evaluated were provided by the techniques development section and are: (1) random sampling; (2) proportional allocation, relative count estimate; (3) proportional allocation, Bayesian estimate; and (4) sequential Bayesian allocation. The user is given two options for computation of the estimated mean square error. These are referred to as the cluster calculation option and the segment calculation option. The software for the Proportion Estimate Processor is operational on the IBM 3031 computer.
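    The first of the four techniques, random sampling, can be sketched as follows; the synthetic scene, sample size, and mean-square-error formula (the standard binomial form) are illustrative, not drawn from the as-built specification.

```python
import random

# Random-sampling proportion estimation: estimate the planted fraction
# of a scene from the labels of randomly sampled pixels. The scene and
# sample size are made up for illustration.

def estimate_proportion(labels, n_sample, seed=42):
    """Estimate the fraction of 1-labels and its estimated mean square
    error under simple random sampling (p_hat * (1 - p_hat) / n)."""
    rng = random.Random(seed)
    sample = rng.sample(labels, n_sample)
    p_hat = sum(sample) / n_sample
    mse = p_hat * (1.0 - p_hat) / n_sample
    return p_hat, mse

# A synthetic scene: 30% planted in the selected crop.
scene = [1] * 300 + [0] * 700
p_hat, mse = estimate_proportion(scene, 100)
print(p_hat)  # close to the true proportion of 0.3
```

    The other three techniques weight the sample by cluster or segment membership, which corresponds to the processor's cluster and segment calculation options for the mean square error.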

  20. Clustering cancer gene expression data by projective clustering ensemble

    PubMed Central

    Yu, Xianxue; Yu, Guoxian

    2017-01-01

    Gene expression data analysis has paramount implications for gene treatments, cancer diagnosis, and other domains. Clustering is an important and promising tool for analyzing gene expression data. Gene expression data are often characterized by a large number of genes but limited samples, so various projective clustering techniques and ensemble techniques have been suggested to combat these challenges. However, it is rather challenging to combine these two kinds of techniques so as to avoid the curse of dimensionality and to boost the performance of gene expression data clustering. In this paper, we employ a projective clustering ensemble (PCE) to integrate the advantages of projective clustering and ensemble clustering, and to avoid the dilemma of combining multiple projective clusterings. Our experimental results on publicly available cancer gene expression data show that PCE improves the quality of clustering gene expression data by at least 4.5% (on average) over other related techniques, including dimensionality-reduction-based single clustering and ensemble approaches. The empirical study demonstrates that, to further boost the performance of clustering cancer gene expression data, it is necessary and promising to combine projective clustering with ensemble clustering. PCE can serve as an effective alternative technique for clustering gene expression data. PMID:28234920

  1. Digital camera auto white balance based on color temperature estimation clustering

    NASA Astrophysics Data System (ADS)

    Zhang, Lei; Liu, Peng; Liu, Yuling; Yu, Feihong

    2010-11-01

    Auto white balance (AWB) is an important technique for digital cameras. The human visual system can recognize the original color of an object in a scene illuminated by a light source whose color temperature differs from that of D65, the standard daylight. Recorded images or video clips, however, can only record the information incident on the sensor, so they appear different from the real scene observed by a human. Auto white balance is a technique to solve this problem. Traditional methods such as the gray world assumption and white point estimation may fail for scenes with large color patches. In this paper, an AWB method based on color temperature estimation clustering is presented and discussed. First, the method defines a list of several lighting conditions common in daily life, represented by their color temperatures, together with a threshold for each color temperature that determines whether a light source matches that kind of illumination. Second, the image to be white balanced is divided into N blocks (N is determined empirically); for each block, the gray world assumption method is used to calculate the color cast, from which the color temperature of that block is estimated. Third, each calculated color temperature is compared with the color temperatures in the given illumination list; if the color temperature of a block is not within any of the thresholds in the list, that block is discarded. Fourth, the remaining blocks are put to a majority vote, and the color temperature with the most blocks is taken as the color temperature of the light source. Experimental results show that the proposed method works well for most commonly used light sources: color casts are removed and the final images look natural.
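    The block-voting pipeline can be sketched as follows. Estimating a block's color temperature from its gray-world color cast is stubbed out, and the reference illuminants and tolerance windows are illustrative values, not the paper's.

```python
from collections import Counter

# Sketch of block-wise illuminant voting. Per-block color temperature
# estimation (via the gray world color cast) is assumed done upstream;
# the illuminant list and tolerances below are illustrative.

REFERENCE_ILLUMINANTS = {          # color temperature (K): tolerance (K)
    2800: 300,   # incandescent
    4000: 400,   # fluorescent
    6500: 600,   # daylight (D65)
}

def classify_block(cct):
    """Match a block's estimated color temperature to a reference
    illuminant, or None if it falls outside every tolerance window."""
    for ref, tol in REFERENCE_ILLUMINANTS.items():
        if abs(cct - ref) <= tol:
            return ref
    return None

def scene_illuminant(block_ccts):
    """Majority vote over the blocks that matched some illuminant."""
    votes = Counter(c for c in (classify_block(t) for t in block_ccts)
                    if c is not None)
    return votes.most_common(1)[0][0]

# One outlier block (9900 K) is discarded; one block (2750 K) votes
# incandescent; the daylight majority wins.
blocks = [6400, 6700, 6100, 9900, 6550, 2750, 6450]
print(scene_illuminant(blocks))  # 6500
```

    Discarding out-of-window blocks is what protects the method against the large color patches that break plain gray-world balancing.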

  2. Novel angle estimation for bistatic MIMO radar using an improved MUSIC

    NASA Astrophysics Data System (ADS)

    Li, Jianfeng; Zhang, Xiaofei; Chen, Han

    2014-09-01

    In this article, we study the problem of angle estimation for bistatic multiple-input multiple-output (MIMO) radar and propose an improved multiple signal classification (MUSIC) algorithm for joint direction of departure (DOD) and direction of arrival (DOA) estimation. The proposed algorithm obtains initial angle estimates from the signal subspace and uses local one-dimensional peak searches to achieve joint estimation of DOD and DOA. Its angle estimation performance is better than that of the estimation of signal parameters via rotational invariance techniques (ESPRIT) algorithm, and almost the same as that of two-dimensional MUSIC. Furthermore, the proposed algorithm is suitable for irregular array geometries, obtains automatically paired DOD and DOA estimates, and avoids two-dimensional peak searching. The simulation results verify the effectiveness and improvement of the algorithm.

  3. Spatial and temporal estimation of soil loss for the sustainable management of a wet semi-arid watershed cluster.

    PubMed

    Rejani, R; Rao, K V; Osman, M; Srinivasa Rao, Ch; Reddy, K Sammi; Chary, G R; Pushpanjali; Samuel, Josily

    2016-03-01

    The ungauged wet semi-arid watershed cluster, Seethagondi, lies in the Adilabad district of Telangana in India and is prone to severe erosion and water scarcity. Runoff and soil loss data at the watershed, catchment, and field level are necessary for planning soil and water conservation interventions. In this study, an attempt was made to develop a spatial soil loss estimation model for the Seethagondi cluster using RUSLE coupled with ArcGIS, and the model was used to estimate soil loss spatially and temporally. Daily Aphrodite rainfall data for the period 1951 to 2007 were used; the annual rainfall varied from 508 to 1351 mm, with a mean annual rainfall of 950 mm and a mean erosivity of 6789 MJ mm ha(-1) h(-1) year(-1). Considerable variation in land use and land cover, especially in crop land and fallow land, was observed between normal and drought years, with corresponding variation in erosivity, the C factor, and soil loss. The mean C factor derived from NDVI for crop land was 0.42 and 0.22 in normal and drought years, respectively. The topography is undulating, the major portion of the cluster has a slope of less than 10°, and 85.3% of the cluster has soil loss below 20 t ha(-1) year(-1). Soil loss from crop land varied from 2.9-3.6 t ha(-1) year(-1) in low rainfall years to 31.8-34.7 t ha(-1) year(-1) in high rainfall years, with a mean annual soil loss of 12.2 t ha(-1) year(-1); soil loss from crop land was highest in August, with an annual soil loss of 13.1 and 2.9 t ha(-1) year(-1) in normal and drought years, respectively. Based on the soil loss in a normal year, the interventions recommended for 85.3% of the watershed area include agronomic measures such as contour cultivation, graded bunds, strip cropping, mixed cropping, crop rotations, mulching, summer plowing, vegetative bunds, and agri-horticultural systems, and management practices such as broad bed furrows, raised and sunken beds, and harvesting available water
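    The RUSLE core of such a model is a per-cell product of factors. In the sketch below, only the mean erosivity and the crop-land C factors come from the abstract; the K, LS, and P values are illustrative.

```python
# RUSLE computes mean annual soil loss per cell as A = R * K * LS * C * P.
# R (6789 MJ mm ha^-1 h^-1 yr^-1) and the crop-land C factors (0.42
# normal year, 0.22 drought year) are from the abstract; K, LS, and P
# below are illustrative placeholders.

def rusle_soil_loss(R, K, LS, C, P=1.0):
    """Soil loss A (t ha^-1 yr^-1) from rainfall erosivity R, soil
    erodibility K, slope length-steepness LS, cover-management C,
    and support-practice P."""
    return R * K * LS * C * P

# Same illustrative cell in a normal vs. a drought year: only the
# NDVI-derived C factor changes.
normal  = rusle_soil_loss(R=6789, K=0.003, LS=1.2, C=0.42)
drought = rusle_soil_loss(R=6789, K=0.003, LS=1.2, C=0.22)
print(round(normal, 1), round(drought, 1))  # 10.3 5.4
```

    In the GIS implementation each factor is a raster layer, so the product is evaluated cell by cell to give the spatial soil loss maps described above.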

  4. An Information-Theoretic-Cluster Visualization for Self-Organizing Maps.

    PubMed

    Brito da Silva, Leonardo Enzo; Wunsch, Donald C

    2018-06-01

    Improved data visualization will be a significant tool to enhance cluster analysis. In this paper, an information-theoretic-based method for cluster visualization using self-organizing maps (SOMs) is presented. The information-theoretic visualization (IT-vis) has the same structure as the unified distance matrix, but instead of depicting Euclidean distances between adjacent neurons, it displays the similarity between the distributions associated with adjacent neurons. Each SOM neuron has an associated subset of the data set whose cardinality controls the granularity of the IT-vis and with which the first- and second-order statistics are computed and used to estimate their probability density functions. These are used to calculate the similarity measure, based on Renyi's quadratic cross entropy and cross information potential (CIP). The introduced visualizations combine the low computational cost and kernel estimation properties of the representative CIP and the data structure representation of a single-linkage-based grouping algorithm to generate an enhanced SOM-based visualization. The visual quality of the IT-vis is assessed by comparing it with other visualization methods for several real-world and synthetic benchmark data sets. Thus, this paper also contains a significant literature survey. The experiments demonstrate the IT-vis cluster revealing capabilities, in which cluster boundaries are sharply captured. Additionally, the information-theoretic visualizations are used to perform clustering of the SOM. Compared with other methods, IT-vis of large SOMs yielded the best results in this paper, for which the quality of the final partitions was evaluated using external validity indices.
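    The CIP similarity at the heart of IT-vis can be sketched for 1-D samples: a Parzen-window estimate of the integral of the product of two densities, with Renyi's quadratic cross entropy as its negative log. The bandwidth and sample values below are illustrative.

```python
import math

# Cross information potential (CIP) between two sample sets, using
# Gaussian kernels (two kernels of bandwidth sigma convolve to one of
# variance 2*sigma**2). Renyi's quadratic cross entropy is -log(CIP).

def cip(xs, ys, sigma=1.0):
    """Parzen estimate of the integral of p(x)*q(x) for 1-D samples."""
    var = 2.0 * sigma * sigma
    norm = 1.0 / math.sqrt(2.0 * math.pi * var)
    return sum(norm * math.exp(-(x - y) ** 2 / (2.0 * var))
               for x in xs for y in ys) / (len(xs) * len(ys))

# Two neurons with overlapping data subsets are "similar" (high CIP,
# low cross entropy); well-separated subsets are not.
close    = cip([0.0, 0.1, 0.2], [0.1, 0.3])
far_away = cip([0.0, 0.1, 0.2], [5.0, 5.5])
print(close > far_away)  # True
print(-math.log(far_away) > -math.log(close))  # True: larger cross entropy
```

    In IT-vis this similarity, computed between the data subsets of adjacent SOM neurons, replaces the Euclidean inter-neuron distance of the unified distance matrix.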

  5. Improving the S-Shape Solar Radiation Estimation Method for Supporting Crop Models

    PubMed Central

    Fodor, Nándor

    2012-01-01

    In line with the critical comments formulated in relation to the S-shape global solar radiation estimation method, the original formula was improved via a 5-step procedure. The improved method was compared to four reference methods on a large North-American database. According to the investigated error indicators, the final 7-parameter S-shape method has the same or even better estimation efficiency than the original formula. The improved formula is able to provide radiation estimates with a particularly low error pattern index (PIdoy), which is especially important concerning the usability of the estimated radiation values in crop models. Using site-specific calibration, the radiation estimates of the improved S-shape method caused an average relative error of 2.72 ± 1.02% (α = 0.05) in the calculated biomass. Using only readily available site-specific metadata, the radiation estimates caused less than 5% relative error in the crop model calculations when they were used for locations in the middle, plain territories of the USA. PMID:22645451

  6. Energy spectra of X-ray clusters of galaxies

    NASA Technical Reports Server (NTRS)

    Avni, Y.

    1976-01-01

    A procedure for estimating the ranges of parameters that describe the spectra of X-rays from clusters of galaxies is presented. The applicability of the method is proved by statistical simulations of cluster spectra; such a proof is necessary because of the nonlinearity of the spectral functions. Implications for the spectra of the Perseus, Coma, and Virgo clusters are discussed. The procedure can be applied in more general problems of parameter estimation.

  7. Individual participant data meta-analyses should not ignore clustering

    PubMed Central

    Abo-Zaid, Ghada; Guo, Boliang; Deeks, Jonathan J.; Debray, Thomas P.A.; Steyerberg, Ewout W.; Moons, Karel G.M.; Riley, Richard David

    2013-01-01

    Objectives Individual participant data (IPD) meta-analyses often analyze their IPD as if coming from a single study. We compare this approach with analyses that rather account for clustering of patients within studies. Study Design and Setting Comparison of effect estimates from logistic regression models in real and simulated examples. Results The estimated prognostic effect of age in patients with traumatic brain injury is similar, regardless of whether clustering is accounted for. However, a family history of thrombophilia is found to be a diagnostic marker of deep vein thrombosis [odds ratio, 1.30; 95% confidence interval (CI): 1.00, 1.70; P = 0.05] when clustering is accounted for but not when it is ignored (odds ratio, 1.06; 95% CI: 0.83, 1.37; P = 0.64). Similarly, the treatment effect of nicotine gum on smoking cessation is severely attenuated when clustering is ignored (odds ratio, 1.40; 95% CI: 1.02, 1.92) rather than accounted for (odds ratio, 1.80; 95% CI: 1.29, 2.52). Simulations show models accounting for clustering perform consistently well, but downwardly biased effect estimates and low coverage can occur when ignoring clustering. Conclusion Researchers must routinely account for clustering in IPD meta-analyses; otherwise, misleading effect estimates and conclusions may arise. PMID:23651765
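    The contrast between ignoring and accounting for clustering can be illustrated with a collapsed 2x2 table versus a stratified Mantel-Haenszel odds ratio. The paper itself fits logistic models; Mantel-Haenszel is a simpler stand-in for stratification, and the two synthetic studies below are constructed so that the analyses disagree.

```python
# Ignoring clustering amounts to collapsing all studies into one 2x2
# table; a stratified Mantel-Haenszel odds ratio respects study
# membership. Synthetic data, chosen so the two analyses disagree.

# Each study: (treated events, treated total, control events, control total)
studies = [
    (10, 100, 5, 100),   # low-risk study, balanced arms
    (40, 50, 60, 100),   # high-risk study, unbalanced arms
]

def pooled_or(studies):
    """Odds ratio after collapsing studies (clustering ignored)."""
    a = sum(s[0] for s in studies); n1 = sum(s[1] for s in studies)
    c = sum(s[2] for s in studies); n0 = sum(s[3] for s in studies)
    b, d = n1 - a, n0 - c
    return (a * d) / (b * c)

def mantel_haenszel_or(studies):
    """Stratified odds ratio (clustering accounted for)."""
    num = den = 0.0
    for a, n1, c, n0 in studies:
        b, d = n1 - a, n0 - c
        t = n1 + n0
        num += a * d / t
        den += b * c / t
    return num / den

print(round(pooled_or(studies), 2))           # 1.04
print(round(mantel_haenszel_or(studies), 2))  # 2.47
```

    Collapsing the tables mixes between-study baseline risk into the treatment comparison, attenuating the effect toward the null, which is exactly the distortion the nicotine-gum example above exhibits.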

  8. Semisupervised Clustering by Iterative Partition and Regression with Neuroscience Applications

    PubMed Central

    Qian, Guoqi; Wu, Yuehua; Ferrari, Davide; Qiao, Puxue; Hollande, Frédéric

    2016-01-01

    Regression clustering is a mixture of unsupervised and supervised statistical learning and data mining methods, found in a wide range of applications including artificial intelligence and neuroscience. It performs unsupervised learning when it clusters the data according to their respective unobserved regression hyperplanes. The method also performs supervised learning when it fits regression hyperplanes to the corresponding data clusters. Applying regression clustering in practice requires means of determining the underlying number of clusters in the data, finding the cluster label of each data point, and estimating the regression coefficients of the model. In this paper, we review the estimation and selection issues in regression clustering with regard to the least squares and robust statistical methods. We also provide a model selection based technique to determine the number of regression clusters underlying the data. We further develop a computing procedure for regression clustering estimation and selection. Finally, simulation studies are presented to assess the procedure, together with an analysis of a real data set on RGB cell marking in neuroscience to illustrate and interpret the method. PMID:27212939
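The alternating structure described above (cluster by residual to the nearest regression hyperplane, then refit each hyperplane) can be sketched in a few lines. This is a generic toy version with noiseless 1-D lines and made-up seed points, not the authors' robust estimator or their model-selection criterion.

```python
# Toy iterative partition-and-regression ("regression clustering") sketch.
# Data, seeds and the two underlying lines are hypothetical.

def fit_line(pts):
    """Ordinary least squares fit of y = a + b*x; returns (intercept, slope)."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    b = sxy / sxx
    return my - b * mx, b

def regression_cluster(pts, seeds, iters=10):
    # Initialize each cluster's line from a couple of seed points.
    lines = [fit_line([pts[i] for i in s]) for s in seeds]
    for _ in range(iters):
        groups = [[] for _ in lines]
        for x, y in pts:  # assign each point to the line with smallest residual
            k = min(range(len(lines)),
                    key=lambda j: (y - lines[j][0] - lines[j][1] * x) ** 2)
            groups[k].append((x, y))
        lines = [fit_line(g) for g in groups]  # refit each cluster's line
    return lines, groups

# Points drawn exactly from two hypothetical lines: y = 2x and y = -2x + 10.
pts = [(x, 2 * x) for x in range(5)] + [(x, -2 * x + 10) for x in range(5)]
lines, groups = regression_cluster(pts, seeds=[(0, 1), (5, 6)])
print(sorted(round(b, 3) for _, b in lines))  # slopes recovered: [-2.0, 2.0]
```

With noisy data the assignment step would be probabilistic or robustified, which is where the estimation and selection issues reviewed in the paper enter.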

  9. An optimal autonomous microgrid cluster based on distributed generation droop parameter optimization and renewable energy sources using an improved grey wolf optimizer

    NASA Astrophysics Data System (ADS)

    Moazami Goodarzi, Hamed; Kazemi, Mohammad Hosein

    2018-05-01

    Microgrid (MG) clustering is regarded as an important driver in improving the robustness of MGs. However, little research has been conducted on providing appropriate MG clustering. This article addresses this shortfall. It proposes a novel multi-objective optimization approach for finding optimal clustering of autonomous MGs by focusing on variables such as distributed generation (DG) droop parameters, the location and capacity of DG units, renewable energy sources, capacitors and powerline transmission. Power losses are minimized and voltage stability is improved while virtual cut-set lines with minimum power transmission for clustering MGs are obtained. A novel chaotic grey wolf optimizer (CGWO) algorithm is applied to solve the proposed multi-objective problem. The performance of the approach is evaluated by utilizing a 69-bus MG in several scenarios.
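For readers unfamiliar with the optimizer family used here, the following is a minimal sketch of the standard grey wolf optimizer (GWO) on a toy 2-D sphere function. The chaotic variant (CGWO) and the multi-objective MG formulation of the paper are not reproduced; all parameters below are illustrative defaults.

```python
import random

# Minimal grey wolf optimizer (GWO) on f(x) = sum(x_i^2); a stand-in for the
# paper's chaotic variant (CGWO). Bounds, swarm size and seed are assumptions.

def gwo(f, dim=2, wolves=20, iters=200, lo=-10.0, hi=10.0, seed=1):
    rng = random.Random(seed)
    X = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(wolves)]
    for t in range(iters):
        a = 2 * (1 - t / iters)  # exploration parameter decays 2 -> 0
        # The three best wolves (alpha, beta, delta) lead the pack.
        leaders = [w[:] for w in sorted(X, key=f)[:3]]
        for w in X:
            for d in range(dim):
                new = 0.0
                for leader in leaders:  # move toward each leader with jitter
                    A = a * (2 * rng.random() - 1)
                    C = 2 * rng.random()
                    D = abs(C * leader[d] - w[d])
                    new += leader[d] - A * D
                w[d] = new / 3  # average of the three leader-guided moves
    return min(X, key=f)

best = gwo(lambda x: sum(v * v for v in x))
print(best)  # should lie near the optimum at the origin
```

The chaotic variant replaces the uniform random draws with a chaotic map to improve exploration; the update structure stays the same.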

  10. Unbiased clustering estimation in the presence of missing observations

    NASA Astrophysics Data System (ADS)

    Bianchi, Davide; Percival, Will J.

    2017-11-01

    In order to be efficient, spectroscopic galaxy redshift surveys do not obtain redshifts for all galaxies in the population targeted. The missing galaxies are often clustered, commonly leading to a lower proportion of successful observations in dense regions. One example is the close-pair issue for SDSS spectroscopic galaxy surveys, which have a deficit of pairs of observed galaxies with angular separation closer than the hardware limit on placing neighbouring fibres. Spatially clustered missing observations will exist in the next generations of surveys. Various schemes have previously been suggested to mitigate these effects, but none works for all situations. We argue that the solution is to link the missing galaxies to those observed with statistically equivalent clustering properties, and that the best way to do this is to rerun the targeting algorithm, varying the angular position of the observations. Provided that every pair has a non-zero probability of being observed in one realization of the algorithm, a pair-upweighting scheme linking targets to successful observations can correct these issues. We present such a scheme, and demonstrate its validity using realizations of an idealized simple survey strategy.
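The rerun-and-upweight idea can be sketched with a deliberately tiny toy survey. Everything here is hypothetical (1-D positions, the collision limit, and the "second pass" rule that gives every pair a non-zero observation probability); the point is only the mechanics: estimate each pair's observation probability from many realizations of the targeting algorithm, then weight observed pairs by its inverse.

```python
import random
from itertools import combinations

# Toy pair-upweighting sketch: rerun a fibre-assignment algorithm many times,
# estimate each pair's probability of being jointly observed, and weight
# observed pairs by 1/probability. Geometry and rules are hypothetical.

POS = [0.0, 0.5, 3.0, 3.4, 7.0, 9.0]   # galaxy positions (1-D for simplicity)
LIMIT = 1.0                            # fibre-collision separation limit

def one_realization(rng):
    """Greedy targeting in random order; a (random) second pass rescues skips."""
    observed = []
    for i in rng.sample(range(len(POS)), len(POS)):
        if all(abs(POS[i] - POS[j]) >= LIMIT for j in observed):
            observed.append(i)
    if rng.random() < 0.5:             # second pass observes the skipped targets
        observed = list(range(len(POS)))
    return set(observed)

rng = random.Random(0)
runs = [one_realization(rng) for _ in range(2000)]
pairs = list(combinations(range(len(POS)), 2))
prob = {p: sum(p[0] in r and p[1] in r for r in runs) / len(runs) for p in pairs}

# Inverse-probability-weighted pair count per realization; its average
# recovers the true total number of pairs.
est = [sum(1 / prob[p] for p in pairs if p[0] in r and p[1] in r) for r in runs]
print(len(pairs), sum(est) / len(est))  # both ~15
```

Without the second pass, the two close pairs would have zero joint-observation probability and no weighting could fix them, which is exactly the condition the abstract states.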

  11. Multi-Optimisation Consensus Clustering

    NASA Astrophysics Data System (ADS)

    Li, Jian; Swift, Stephen; Liu, Xiaohui

    Ensemble Clustering has been developed to provide an alternative way of obtaining more stable and accurate clustering results. It aims to avoid the biases of individual clustering algorithms. However, it is still a challenge to develop an efficient and robust method for Ensemble Clustering. Based on an existing ensemble clustering method, Consensus Clustering (CC), this paper introduces an advanced Consensus Clustering algorithm called Multi-Optimisation Consensus Clustering (MOCC), which utilises an optimised Agreement Separation criterion and a Multi-Optimisation framework to improve the performance of CC. Fifteen different data sets are used for evaluating the performance of MOCC. The results reveal that MOCC can generate more accurate clustering results than the original CC algorithm.
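A common building block behind consensus methods like CC is the co-association matrix: points that most base clusterings place together are merged. The sketch below is a generic version of that idea with made-up labelings, not the MOCC algorithm or its agreement-separation criterion.

```python
from itertools import combinations

# Minimal co-association consensus sketch: merge any two points that more than
# half of the base clusterings place in the same cluster. Labelings are invented.

def consensus(labelings, threshold=0.5):
    n = len(labelings[0])
    parent = list(range(n))          # union-find over points

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        # fraction of base clusterings that co-cluster points i and j
        agree = sum(lab[i] == lab[j] for lab in labelings) / len(labelings)
        if agree > threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Three hypothetical base clusterings of six points (the third uses swapped
# label names, which co-association is insensitive to).
labelings = [[0, 0, 0, 1, 1, 1],
             [0, 0, 1, 1, 1, 1],
             [1, 1, 1, 0, 0, 0]]
print(consensus(labelings))  # [[0, 1, 2], [3, 4, 5]]
```

MOCC goes further by optimizing the agreement threshold and the ensemble composition rather than fixing them as done here.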

  12. The Next Generation Virgo Cluster Survey. XX. RedGOLD Background Galaxy Cluster Detections

    NASA Astrophysics Data System (ADS)

    Licitra, Rossella; Mei, Simona; Raichoor, Anand; Erben, Thomas; Hildebrandt, Hendrik; Muñoz, Roberto P.; Van Waerbeke, Ludovic; Côté, Patrick; Cuillandre, Jean-Charles; Duc, Pierre-Alain; Ferrarese, Laura; Gwyn, Stephen D. J.; Huertas-Company, Marc; Lançon, Ariane; Parroni, Carolina; Puzia, Thomas H.

    2016-09-01

    We build a background cluster candidate catalog from the Next Generation Virgo Cluster Survey (NGVS) using our detection algorithm RedGOLD. The NGVS covers 104 deg2 of the Virgo cluster in the u*, g, r, i, z bandpasses to a depth of g ˜ 25.7 mag (5σ). Part of the survey was not covered or has shallow observations in the r band. We build two cluster catalogs: one using all bandpasses, for the fields with deep r-band observations (˜20 deg2), and the other using four bandpasses (u*, g, i, z) for the entire NGVS area. Based on our previous Canada-France-Hawaii Telescope Legacy Survey W1 studies, we estimate that both of our catalogs are ˜100% (˜70%) complete and ˜80% pure, at z ≤ 0.6 (z ≲ 1), for galaxy clusters with masses of M ≳ 10^14 M⊙. We show that when using four bandpasses, though the photometric redshift accuracy is lower, RedGOLD detects massive galaxy clusters up to z ˜ 1 with completeness and purity similar to the five-band case. This is achieved when taking into account the bias in the richness estimation, which is ˜40% lower at 0.5 ≤ z < 0.6 and ˜20% higher at 0.6 < z < 0.8, with respect to the five-band case. RedGOLD recovers all the X-ray clusters in the area with mass M500 > 1.4 × 10^14 M⊙ and 0.08 < z < 0.5. Because of our different cluster richness limits and the NGVS depth, our catalogs reach lower masses than the published redMaPPer cluster catalog over the area, and we recover ˜90%-100% of its detections.

  13. Improving estimates of wilderness use from mandatory travel permits.

    Treesearch

    David W. Lime; Grace A. Lorence

    1974-01-01

    Mandatory permits provide recreation managers with better use estimates. Because some visitors do not obtain permits, use estimates based on permit data need to be corrected. In the Boundary Waters Canoe Area, a method was devised for distinguishing noncomplying groups and finding correction factors that reflect the impact of these groups. Suggestions for improving...

  14. Three-dimensional reconstruction of clustered microcalcifications from two digitized mammograms

    NASA Astrophysics Data System (ADS)

    Stotzka, Rainer; Mueller, Tim O.; Epper, Wolfgang; Gemmeke, Hartmut

    1998-06-01

    X-ray mammography is one of the most significant diagnostic methods in early detection of breast cancer. Usually two X-ray images from different angles are taken of each mamma to make even overlapping structures visible. X-ray mammography has a very high spatial resolution and can show microcalcifications of 50 - 200 micron in size. Clusters of microcalcifications are one of the most important and often the only indicator for malignant tumors. These calcifications are in some cases extremely difficult to detect. Computer-assisted diagnosis of digitized mammograms may improve detection and interpretation of microcalcifications and yield more reliable diagnostic findings. We built a low-cost mammography workstation to detect and classify clusters of microcalcifications and tissue densities automatically. New in this approach is the estimation of the 3D formation of segmented microcalcifications and its visualization, which puts additional diagnostic information at the radiologist's disposal. The real problem in using only two or three projections for reconstruction is the large loss of volume information. Therefore the arrangement of a cluster is estimated using only the positions of segmented microcalcifications. The arrangement of microcalcifications is visualized to the physician by rotation.

  15. Improved Estimation and Interpretation of Correlations in Neural Circuits

    PubMed Central

    Yatsenko, Dimitri; Josić, Krešimir; Ecker, Alexander S.; Froudarakis, Emmanouil; Cotton, R. James; Tolias, Andreas S.

    2015-01-01

    Ambitious projects aim to record the activity of ever larger and denser neuronal populations in vivo. Correlations in neural activity measured in such recordings can reveal important aspects of neural circuit organization. However, estimating and interpreting large correlation matrices is statistically challenging. Estimation can be improved by regularization, i.e. by imposing a structure on the estimate. The amount of improvement depends on how closely the assumed structure represents dependencies in the data. Therefore, the selection of the most efficient correlation matrix estimator for a given neural circuit must be determined empirically. Importantly, the identity and structure of the most efficient estimator informs about the types of dominant dependencies governing the system. We sought statistically efficient estimators of neural correlation matrices in recordings from large, dense groups of cortical neurons. Using fast 3D random-access laser scanning microscopy of calcium signals, we recorded the activity of nearly every neuron in volumes 200 μm wide and 100 μm deep (150–350 cells) in mouse visual cortex. We hypothesized that in these densely sampled recordings, the correlation matrix should be best modeled as the combination of a sparse graph of pairwise partial correlations representing local interactions and a low-rank component representing common fluctuations and external inputs. Indeed, in cross-validation tests, the covariance matrix estimator with this structure consistently outperformed other regularized estimators. The sparse component of the estimate defined a graph of interactions. These interactions reflected the physical distances and orientation tuning properties of cells: The density of positive ‘excitatory’ interactions decreased rapidly with geometric distances and with differences in orientation preference whereas negative ‘inhibitory’ interactions were less selective. Because of its superior performance, this
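The abstract's central premise (regularization helps when samples are scarce, and helps most when the imposed structure matches the data) can be demonstrated with a much simpler estimator than the sparse-plus-low-rank model used in the paper. The sketch below shrinks off-diagonal covariance entries toward zero; the true covariance, sample size, and shrinkage strength are all invented for illustration.

```python
import math
import random

# Toy demonstration that regularization (here, shrinking off-diagonal entries
# toward zero) improves covariance estimates from few samples. The paper's
# sparse + low-rank estimator is NOT reproduced; this is the simplest analogue.

TRUE = [[1.0, 0.3, 0.3],
        [0.3, 1.0, 0.3],
        [0.3, 0.3, 1.0]]   # hypothetical true covariance

def cholesky(a):
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / L[j][j]
    return L

def sample_cov(rng, n):
    """Sample covariance of n correlated Gaussian draws from TRUE."""
    L = cholesky(TRUE)
    xs = [[sum(L[i][k] * z[k] for k in range(3)) for i in range(3)]
          for z in ([rng.gauss(0, 1) for _ in range(3)] for _ in range(n))]
    m = [sum(x[i] for x in xs) / n for i in range(3)]
    return [[sum((x[i] - m[i]) * (x[j] - m[j]) for x in xs) / n
             for j in range(3)] for i in range(3)]

def frob_err(S, lam):
    # Shrink off-diagonals by (1 - lam), keep the diagonal; error vs. TRUE.
    return math.sqrt(sum((S[i][j] * (1 - lam * (i != j)) - TRUE[i][j]) ** 2
                         for i in range(3) for j in range(3)))

rng = random.Random(0)
raw, shrunk = 0.0, 0.0
for _ in range(300):                    # 300 experiments with only n=10 samples
    S = sample_cov(rng, 10)
    raw += frob_err(S, 0.0) / 300
    shrunk += frob_err(S, 0.5) / 300
print(round(raw, 2), round(shrunk, 2))  # shrinkage should give the smaller error
```

In the paper's setting the same comparison is made by cross-validation over several candidate structures, and the winning structure is itself informative about the circuit.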

  16. Optimized Clustering Estimators for BAO Measurements Accounting for Significant Redshift Uncertainty

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ross, Ashley J.; Banik, Nilanjan; Avila, Santiago

    2017-05-15

    We determine an optimized clustering statistic to be used for galaxy samples with significant redshift uncertainty, such as those that rely on photometric redshifts. To do so, we study the BAO information content as a function of the orientation of galaxy clustering modes with respect to their angle to the line-of-sight (LOS). The clustering along the LOS, as observed in a redshift-space with significant redshift uncertainty, has contributions from clustering modes with a range of orientations with respect to the true LOS. For redshift uncertainty σz ≥ 0.02(1 + z) we find that while the BAO information is confined to transverse clustering modes in the true space, it is spread nearly evenly in the observed space. Thus, measuring clustering in terms of the projected separation (regardless of the LOS) is an efficient and nearly lossless compression of the signal for σz ≥ 0.02(1 + z). For reduced redshift uncertainty, a more careful consideration is required. We then use more than 1700 realizations of galaxy simulations mimicking the Dark Energy Survey Year 1 sample to validate our analytic results and optimized analysis procedure. We find that using the correlation function binned in projected separation, we can achieve uncertainties that are within 10 per cent of those predicted by Fisher matrix forecasts. We predict that DES Y1 should achieve a 5 per cent distance measurement using our optimized methods. We expect the results presented here to be important for any future BAO measurements made using photometric redshift data.

  17. Competing risks regression for clustered data

    PubMed Central

    Zhou, Bingqing; Fine, Jason; Latouche, Aurelien; Labopin, Myriam

    2012-01-01

    A population average regression model is proposed to assess the marginal effects of covariates on the cumulative incidence function when there is dependence across individuals within a cluster in the competing risks setting. This method extends the Fine–Gray proportional hazards model for the subdistribution to situations, where individuals within a cluster may be correlated due to unobserved shared factors. Estimators of the regression parameters in the marginal model are developed under an independence working assumption where the correlation across individuals within a cluster is completely unspecified. The estimators are consistent and asymptotically normal, and variance estimation may be achieved without specifying the form of the dependence across individuals. A simulation study evidences that the inferential procedures perform well with realistic sample sizes. The practical utility of the methods is illustrated with data from the European Bone Marrow Transplant Registry. PMID:22045910
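The quantity being modeled here, the cumulative incidence function (CIF), has a simple nonparametric estimator in the independent-observations case. The sketch below computes it from hypothetical follow-up data; the clustered-data extension that is the paper's contribution is not reproduced.

```python
# Nonparametric cumulative incidence (Aalen-Johansen form) for competing risks,
# assuming independent observations. Data are hypothetical: cause 0 = censored,
# causes 1 and 2 are two competing event types.

def cumulative_incidence(data):
    """data: list of (time, cause) sorted or not; returns {cause: CIF at last time}."""
    data = sorted(data)
    surv, at_risk = 1.0, len(data)   # overall Kaplan-Meier survival, risk set size
    cif = {}
    for time, cause in data:
        if cause != 0:
            # CIF_k accumulates S(t-) * d_k / n at each event time
            cif[cause] = cif.get(cause, 0.0) + surv * (1 / at_risk)
            surv *= 1 - 1 / at_risk
        at_risk -= 1
    return cif

data = [(1, 1), (2, 2), (3, 1), (4, 0), (5, 2)]   # hypothetical follow-up data
print(cumulative_incidence(data))  # CIF_1 ≈ 0.4, CIF_2 ≈ 0.6
```

The two cumulative incidences plus the residual survival always sum to one, unlike naive one-minus-Kaplan-Meier estimates computed per cause, which is why the subdistribution framework is used at all.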

  18. Ages of intermediate-age Magellanic Cloud star clusters

    NASA Technical Reports Server (NTRS)

    Flower, P. J.

    1984-01-01

    Ages of intermediate-age Large Magellanic Cloud star clusters have been estimated without locating the faint, unevolved portion of cluster main sequences. Six clusters with established color-magnitude diagrams were selected for study: SL 868, NGC 1783, NGC 1868, NGC 2121, NGC 2209, and NGC 2231. Since red giant photometry is more accurate than the necessarily fainter main-sequence photometry, the distributions of red giants on the cluster color-magnitude diagrams were compared to a grid of 33 stellar evolutionary tracks, evolved from the main sequence through core-helium exhaustion, spanning the expected mass and metallicity range for Magellanic Cloud cluster red giants. The time-dependent behavior of the luminosity of the model red giants was used to estimate cluster ages from the observed cluster red giant luminosities. Except for the possibility of SL 868 being an old globular cluster, all clusters studied were found to have ages less than 10^9 yr. It is concluded that there is currently no substantial evidence for a major cluster population of large, populous clusters older than 10^9 yr in the Large Magellanic Cloud.

  19. Parameters of oscillation generation regions in open star cluster models

    NASA Astrophysics Data System (ADS)

    Danilov, V. M.; Putkov, S. I.

    2017-07-01

    We determine the masses and radii of central regions of open star cluster (OCL) models with small or zero entropy production and estimate the masses of oscillation generation regions in cluster models based on the data of the phase-space coordinates of stars. The radii of such regions are close to the core radii of the OCL models. We develop a new method for estimating the total OCL masses based on the cluster core mass, the cluster and cluster core radii, and the radial distribution of stars. This method yields estimates of the dynamical masses of the Pleiades, Praesepe, and M67, which agree well with the estimates of the total masses of the corresponding clusters based on proper motions and spectroscopic data for cluster stars. We construct the spectra and dispersion curves of the oscillations of the field of azimuthal velocities v_φ in OCL models. Weak, low-amplitude unstable oscillations of v_φ develop in cluster models near the cluster core boundary, and weak damped oscillations of v_φ often develop at frequencies close to the frequencies of more powerful oscillations, which may reduce the degree of non-stationarity in OCL models. We determine the number and parameters of such oscillations near the core boundaries of cluster models. Such oscillations point to the possible role that gradient instability near the cores of cluster models plays in the decrease of the mass of the oscillation generation regions and the production of entropy in the cores of OCL models with massive extended cores.

  20. Galaxy Clusters

    NASA Astrophysics Data System (ADS)

    Miller, Christopher J.

    2012-03-01

    There are many examples of clustering in astronomy. Stars in our own galaxy are often seen as being gravitationally bound into tight globular or open clusters. The Solar System's Trojan asteroids cluster at the gravitational Lagrangian point in front of Jupiter's orbit. On the largest of scales, we find gravitationally bound clusters of galaxies, the Virgo cluster (in the constellation of Virgo at a distance of ˜50 million light years) being a prime nearby example. The Virgo cluster subtends an angle of nearly 8° on the sky and is known to contain over a thousand member galaxies. Galaxy clusters play an important role in our understanding of the Universe. Clusters exist at peaks in the three-dimensional large-scale matter density field. Their sky (2D) locations are easy to detect in astronomical imaging data, and their mean galaxy redshifts (redshift is related to the third spatial dimension: distance) are often measured more precisely (spectroscopically) and more cheaply (photometrically) than those of the entire galaxy population in large sky surveys. Photometric redshift (z) [photometric techniques use the broad-band filter magnitudes of a galaxy to estimate the redshift; spectroscopic techniques use the galaxy spectra and emission/absorption line features to measure the redshift] determinations of galaxies within clusters are accurate to better than delta_z = 0.05 [7], and when studied as a cluster population, the central galaxies form a line in color-magnitude space (called the E/S0 ridgeline and visible in Figure 16.3) that contains galaxies with similar stellar populations [15]. The shape of this E/S0 ridgeline enables astronomers to measure the cluster redshift to within delta_z = 0.01 [23]. The most accurate cluster redshift determinations come from spectroscopy of the member galaxies, where only a fraction of the members need to be spectroscopically observed [25,42] to get an accurate redshift for the whole system. If light traces mass in the Universe, then the locations

  1. The ellipticity of galaxy cluster haloes from satellite galaxies and weak lensing

    DOE PAGES

    Shin, Tae-hyeon; Clampitt, Joseph; Jain, Bhuvnesh; ...

    2018-01-04

    Here, we study the ellipticity of galaxy cluster haloes as characterized by the distribution of cluster galaxies and as measured with weak lensing. We use Monte Carlo simulations of elliptical cluster density profiles to estimate and correct for Poisson noise bias, edge bias and projection effects. We apply our methodology to 10 428 Sloan Digital Sky Survey clusters identified by the redMaPPer algorithm with richness above 20. We find a mean ellipticity of 0.271 ± 0.002 (stat) ± 0.031 (sys), corresponding to an axis ratio of 0.573 ± 0.002 (stat) ± 0.039 (sys). We compare this ellipticity of the satellites to the halo shape, through a stacked lensing measurement using optimal estimators of the lensing quadrupole based on Clampitt and Jain (2016). We find a best-fitting axis ratio of 0.56 ± 0.09 (stat) ± 0.03 (sys), consistent with the ellipticity of the satellite distribution. Thus, cluster galaxies trace the shape of the dark matter halo to within our estimated uncertainties. Finally, we restack the satellite and lensing ellipticity measurements along the major axis of the cluster central galaxy's light distribution. From the lensing measurements, we infer a misalignment angle with a root-mean-square of 30° ± 10° when stacking on the central galaxy. We discuss applications of halo shape measurements to test the effects of the baryonic gas and active galactic nucleus feedback, as well as dark matter and gravity. The major improvements in signal-to-noise ratio expected with the ongoing Dark Energy Survey and future surveys from Large Synoptic Survey Telescope, Euclid, and Wide Field Infrared Survey Telescope will make halo shapes a useful probe of these effects.

  2. The ellipticity of galaxy cluster haloes from satellite galaxies and weak lensing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shin, Tae-hyeon; Clampitt, Joseph; Jain, Bhuvnesh

    Here, we study the ellipticity of galaxy cluster haloes as characterized by the distribution of cluster galaxies and as measured with weak lensing. We use Monte Carlo simulations of elliptical cluster density profiles to estimate and correct for Poisson noise bias, edge bias and projection effects. We apply our methodology to 10 428 Sloan Digital Sky Survey clusters identified by the redMaPPer algorithm with richness above 20. We find a mean ellipticity of 0.271 ± 0.002 (stat) ± 0.031 (sys), corresponding to an axis ratio of 0.573 ± 0.002 (stat) ± 0.039 (sys). We compare this ellipticity of the satellites to the halo shape, through a stacked lensing measurement using optimal estimators of the lensing quadrupole based on Clampitt and Jain (2016). We find a best-fitting axis ratio of 0.56 ± 0.09 (stat) ± 0.03 (sys), consistent with the ellipticity of the satellite distribution. Thus, cluster galaxies trace the shape of the dark matter halo to within our estimated uncertainties. Finally, we restack the satellite and lensing ellipticity measurements along the major axis of the cluster central galaxy's light distribution. From the lensing measurements, we infer a misalignment angle with a root-mean-square of 30° ± 10° when stacking on the central galaxy. We discuss applications of halo shape measurements to test the effects of the baryonic gas and active galactic nucleus feedback, as well as dark matter and gravity. The major improvements in signal-to-noise ratio expected with the ongoing Dark Energy Survey and future surveys from Large Synoptic Survey Telescope, Euclid, and Wide Field Infrared Survey Telescope will make halo shapes a useful probe of these effects.

  3. The ellipticity of galaxy cluster haloes from satellite galaxies and weak lensing

    NASA Astrophysics Data System (ADS)

    Shin, Tae-hyeon; Clampitt, Joseph; Jain, Bhuvnesh; Bernstein, Gary; Neil, Andrew; Rozo, Eduardo; Rykoff, Eli

    2018-04-01

    We study the ellipticity of galaxy cluster haloes as characterized by the distribution of cluster galaxies and as measured with weak lensing. We use Monte Carlo simulations of elliptical cluster density profiles to estimate and correct for Poisson noise bias, edge bias and projection effects. We apply our methodology to 10 428 Sloan Digital Sky Survey clusters identified by the redMaPPer algorithm with richness above 20. We find a mean ellipticity of 0.271 ± 0.002 (stat) ± 0.031 (sys), corresponding to an axis ratio of 0.573 ± 0.002 (stat) ± 0.039 (sys). We compare this ellipticity of the satellites to the halo shape, through a stacked lensing measurement using optimal estimators of the lensing quadrupole based on Clampitt and Jain (2016). We find a best-fitting axis ratio of 0.56 ± 0.09 (stat) ± 0.03 (sys), consistent with the ellipticity of the satellite distribution. Thus, cluster galaxies trace the shape of the dark matter halo to within our estimated uncertainties. Finally, we restack the satellite and lensing ellipticity measurements along the major axis of the cluster central galaxy's light distribution. From the lensing measurements, we infer a misalignment angle with a root-mean-square of 30° ± 10° when stacking on the central galaxy. We discuss applications of halo shape measurements to test the effects of the baryonic gas and active galactic nucleus feedback, as well as dark matter and gravity. The major improvements in signal-to-noise ratio expected with the ongoing Dark Energy Survey and future surveys from Large Synoptic Survey Telescope, Euclid, and Wide Field Infrared Survey Telescope will make halo shapes a useful probe of these effects.

  4. Turbulence measurements in clusters of galaxies with XMM-Newton

    NASA Astrophysics Data System (ADS)

    Pinto, C.; Fabian, A.; de Plaa, J.; Sanders, J.

    2014-07-01

    The kinematic structure of the intracluster medium (ICM) in clusters of galaxies is related to their evolution. AGN feedback, sloshing of gas within the potential well, and galaxy mergers are thought to generate ICM velocity widths of several hundred km/s. Appropriate determinations of turbulent broadening are crucial not only to understand the effects of the central engine on the evolution of the clusters, but are also mandatory to obtain realistic (emission) line fits and abundance estimates. We have analyzed the data from the CHEERS catalog, which includes 1.5 Ms of new observations (PI: Jelle de Plaa) and archival data for a total of 29 clusters and groups of galaxies, and elliptical galaxies. This campaign provided us with a unique database that significantly improves the quality of the existing observations and the measurements of chemical abundances and turbulent broadening. We have applied the continuum-subtraction spectral-fitting method of Sanders and Fabian and measured turbulence, temperatures, and abundances for the sources in the catalog. For some sources we obtain tight estimates of velocity broadening, which is related to past AGN activity and mergers. We will show our results at the conference and discuss their relevance in the context of future missions.

  5. Improving RNA-Seq expression estimates by correcting for fragment bias

    PubMed Central

    2011-01-01

    The biochemistry of RNA-Seq library preparation results in cDNA fragments that are not uniformly distributed within the transcripts they represent. This non-uniformity must be accounted for when estimating expression levels, and we show how to perform the needed corrections using a likelihood based approach. We find improvements in expression estimates as measured by correlation with independently performed qRT-PCR and show that correction of bias leads to improved replicability of results across libraries and sequencing technologies. PMID:21410973

  6. Accurate reconstruction of viral quasispecies spectra through improved estimation of strain richness

    PubMed Central

    2015-01-01

    Background Estimating the number of different species (richness) in a mixed microbial population has been a main focus in metagenomic research. Existing methods of species richness estimation ride on the assumption that the reads in each assembled contig correspond to only one of the microbial genomes in the population. This assumption and the underlying probabilistic formulations of existing methods are not useful for quasispecies populations where the strains are highly genetically related. The lack of knowledge on the number of different strains in a quasispecies population is observed to hinder the precision of existing Viral Quasispecies Spectrum Reconstruction (QSR) methods due to the uncontrolled reconstruction of a large number of in silico false positives. In this work, we formulated a novel probabilistic method for strain richness estimation specifically targeting viral quasispecies. By using this approach we improved our recently proposed spectrum reconstruction pipeline ViQuaS to achieve higher levels of precision in reconstructed quasispecies spectra without compromising the recall rates. We also discuss how one other existing popular QSR method named ShoRAH can be improved using this new approach. Results On benchmark data sets, our estimation method provided accurate richness estimates (< 0.2 median estimation error) and improved the precision of ViQuaS by 2%-13% and F-score by 1%-9% without compromising the recall rates. We also demonstrate that our estimation method can be used to improve the precision and F-score of ShoRAH by 0%-7% and 0%-5% respectively. Conclusions The proposed probabilistic estimation method can be used to estimate the richness of viral populations with a quasispecies behavior and to improve the accuracy of the quasispecies spectra reconstructed by the existing methods ViQuaS and ShoRAH in the presence of a moderate level of technical sequencing errors. Availability http://sourceforge.net/projects/viquas/ PMID:26678073

  7. Optimized clustering estimators for BAO measurements accounting for significant redshift uncertainty

    NASA Astrophysics Data System (ADS)

    Ross, Ashley J.; Banik, Nilanjan; Avila, Santiago; Percival, Will J.; Dodelson, Scott; Garcia-Bellido, Juan; Crocce, Martin; Elvin-Poole, Jack; Giannantonio, Tommaso; Manera, Marc; Sevilla-Noarbe, Ignacio

    2017-12-01

    We determine an optimized clustering statistic to be used for galaxy samples with significant redshift uncertainty, such as those that rely on photometric redshifts. To do so, we study the baryon acoustic oscillation (BAO) information content as a function of the orientation of galaxy clustering modes with respect to their angle to the line of sight (LOS). The clustering along the LOS, as observed in a redshift-space with significant redshift uncertainty, has contributions from clustering modes with a range of orientations with respect to the true LOS. For redshift uncertainty σz ≥ 0.02(1 + z), we find that while the BAO information is confined to transverse clustering modes in the true space, it is spread nearly evenly in the observed space. Thus, measuring clustering in terms of the projected separation (regardless of the LOS) is an efficient and nearly lossless compression of the signal for σz ≥ 0.02(1 + z). For reduced redshift uncertainty, a more careful consideration is required. We then use more than 1700 realizations (combining two separate sets) of galaxy simulations mimicking the Dark Energy Survey Year 1 (DES Y1) sample to validate our analytic results and optimized analysis procedure. We find that using the correlation function binned in projected separation, we can achieve uncertainties that are within 10 per cent of those predicted by Fisher matrix forecasts. We predict that DES Y1 should achieve a 5 per cent distance measurement using our optimized methods. We expect the results presented here to be important for any future BAO measurements made using photometric redshift data.

  8. Development of Energy Efficient Clustering Protocol in Wireless Sensor Network Using Neuro-Fuzzy Approach.

    PubMed

    Julie, E Golden; Selvi, S Tamil

    2016-01-01

    Wireless sensor networks (WSNs) consist of sensor nodes with limited processing capability and limited nonrechargeable battery power. Energy consumption in WSNs is a significant issue for improving network lifetime. It is essential to develop an energy-aware clustering protocol in WSNs to reduce energy consumption and increase network lifetime. In this paper, a neuro-fuzzy energy aware clustering scheme (NFEACS) is proposed to form optimum and energy aware clusters. NFEACS consists of two parts: a fuzzy subsystem and a neural network system, which together achieve energy efficiency in forming clusters and cluster heads in WSNs. NFEACS uses a neural network that provides an effective training set, related to the energy and received signal strength of all nodes, to estimate the expected energy for tentative cluster heads. Sensor nodes with higher energy are trained with the center location of the base station to select energy aware cluster heads. Fuzzy rules are used in the fuzzy logic part, whose inputs are used to form clusters. NFEACS is designed to handle node mobility in WSNs. The proposed scheme NFEACS is compared with related clustering schemes, cluster-head election mechanism using fuzzy logic, and energy aware fuzzy unequal clustering. The experimental results show that NFEACS performs better than the other related schemes.

  9. Evaluation of small area crop estimation techniques using LANDSAT- and ground-derived data. [South Dakota

    NASA Technical Reports Server (NTRS)

    Amis, M. L.; Martin, M. V.; Mcguire, W. G.; Shen, S. S. (Principal Investigator)

    1982-01-01

    This report summarizes studies completed in fiscal year 1981 in support of the clustering/classification and preprocessing activities of the Domestic Crops and Land Cover project. The theme throughout was the improvement of subanalysis district (usually county level) crop hectarage estimates, as reflected in the following three objectives: (1) to evaluate the current U.S. Department of Agriculture Statistical Reporting Service regression approach to crop area estimation as applied to the problem of obtaining subanalysis district estimates; (2) to develop and test alternative approaches to subanalysis district estimation; and (3) to develop and test preprocessing techniques for use in improving subanalysis district estimates.
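
    The regression approach referenced in objective (1) is, in outline, the classical survey regression estimator with remotely sensed counts as the auxiliary variable. The sketch below assumes simple random sampling of segments, with ground-observed crop area as the study variable and LANDSAT-classified pixel counts as the auxiliary; this is a textbook form, not the Statistical Reporting Service's production code.

```python
import numpy as np

def regression_estimate(y_sample, x_sample, x_mean_pop):
    """Survey regression estimator of the mean crop area per segment.
    y_sample: ground-observed crop area in the sampled segments
    x_sample: LANDSAT-classified crop pixels in the same segments
    x_mean_pop: mean classified pixels over ALL segments in the district"""
    y_bar, x_bar = np.mean(y_sample), np.mean(x_sample)
    # slope of y on x from the sample
    b = np.cov(x_sample, y_sample, ddof=1)[0, 1] / np.var(x_sample, ddof=1)
    return y_bar + b * (x_mean_pop - x_bar)   # adjust toward the census mean
```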

  10. Mixture modelling for cluster analysis.

    PubMed

    McLachlan, G J; Chang, S U

    2004-10-01

    Cluster analysis via a finite mixture model approach is considered. With this approach to clustering, the data can be partitioned into a specified number of clusters g by first fitting a mixture model with g components. An outright clustering of the data is then obtained by assigning an observation to the component to which it has the highest estimated posterior probability of belonging; that is, the ith cluster consists of those observations assigned to the ith component (i = 1,..., g). The focus is on the use of mixtures of normal components for the cluster analysis of data that can be regarded as being continuous. But attention is also given to the case of mixed data, where the observations consist of both continuous and discrete variables.
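
    The fit-then-assign procedure described above can be sketched for the simplest case, a two-component mixture of univariate normals fitted by EM, with each observation assigned to the component of highest estimated posterior probability. This is a minimal illustration under those assumptions, not the authors' software.

```python
import numpy as np

def em_gmm_1d(x, n_iter=200):
    """Fit a two-component 1-D Gaussian mixture by EM, then return the
    outright clustering: each point goes to its most probable component."""
    x = np.asarray(x, float)
    mu = np.array([x.min(), x.max()])             # crude initialisation
    sigma = np.array([x.std(), x.std()]) + 1e-6
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point
        dens = pi * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / sigma
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: update mixing weights, means and standard deviations
        nk = resp.sum(axis=0)
        pi = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk) + 1e-6
    return resp.argmax(axis=1)                    # cluster labels
```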

  11. Analysis of basic clustering algorithms for numerical estimation of statistical averages in biomolecules.

    PubMed

    Anandakrishnan, Ramu; Onufriev, Alexey

    2008-03-01

    In statistical mechanics, the equilibrium properties of a physical system of particles can be calculated as the statistical average over accessible microstates of the system. In general, these calculations are computationally intractable since they involve summations over an exponentially large number of microstates. Clustering algorithms are one of the methods used to numerically approximate these sums. The most basic clustering algorithms first sub-divide the system into a set of smaller subsets (clusters). Then, interactions between particles within each cluster are treated exactly, while all interactions between different clusters are ignored. These smaller clusters have far fewer microstates, making the summation over these microstates tractable. These algorithms have been previously used for biomolecular computations, but remain relatively unexplored in this context. Presented here is a theoretical analysis of the error and computational complexity for the two most basic clustering algorithms that were previously applied in the context of biomolecular electrostatics. We derive a tight, computationally inexpensive, error bound for the equilibrium state of a particle computed via these clustering algorithms. For some practical applications, it is the root mean square error, which can be significantly lower than the error bound, that may be more important. We show that there is a strong empirical relationship between the error bound and the root mean square error, suggesting that the error bound could be used as a computationally inexpensive metric for predicting the accuracy of clustering algorithms for practical applications. An example of error analysis for one such application (computation of the average charge of ionizable amino acids in proteins) is given, demonstrating that the clustering algorithm can be accurate enough for practical purposes.
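
    The basic clustering approximation can be made concrete on a toy system. The sketch below uses a 1-D Ising chain (an illustrative stand-in, not the biomolecular electrostatics model of the paper): each block of k consecutive spins is enumerated exactly, and the couplings that cross block boundaries are simply dropped.

```python
from itertools import product
from math import exp

def chain_energy(spins, h, J):
    """Energy of a 1-D Ising chain with fields h[i] and uniform coupling J."""
    e = -sum(hi * si for hi, si in zip(h, spins))
    e -= J * sum(spins[i] * spins[i + 1] for i in range(len(spins) - 1))
    return e

def avg_energy_exact(h, J, beta=1.0):
    """Thermal average energy by exact enumeration of all 2**N microstates."""
    states = list(product([-1, 1], repeat=len(h)))
    w = [exp(-beta * chain_energy(s, h, J)) for s in states]  # Boltzmann weights
    Z = sum(w)
    return sum(wi * chain_energy(s, h, J) for wi, s in zip(w, states)) / Z

def avg_energy_clustered(h, J, k, beta=1.0):
    """Basic clustering approximation: each block of k consecutive spins is
    treated exactly; couplings crossing block boundaries are ignored."""
    return sum(avg_energy_exact(h[i:i + k], J, beta)
               for i in range(0, len(h), k))
```

    With a single cluster spanning the whole chain, the approximation reduces to the exact average; smaller clusters trade accuracy for an exponentially smaller state count.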

  12. Implementation of client versus care-provider strategies to improve external cephalic version rates: a cluster randomized controlled trial.

    PubMed

    Vlemmix, Floortje; Rosman, Ageeth N; Rijnders, Marlies E; Beuckens, Antje; Opmeer, Brent C; Mol, Ben W J; Kok, Marjolein; Fleuren, Margot A H

    2015-05-01

    To determine the effectiveness of a client or care-provider strategy to improve the implementation of external cephalic version. Cluster randomized controlled trial. Twenty-five clusters; hospitals and their referring midwifery practices randomly selected in the Netherlands. Singleton breech presentation from 32 weeks of gestation onwards. We randomized clusters to a client strategy (written information leaflets and decision aid), a care-provider strategy (1-day counseling course focused on knowledge and counseling skills), a combined client and care-provider strategy and care-as-usual strategy. We performed an intention-to-treat analysis. Rate of external cephalic version in various strategies. Secondary outcomes were the percentage of women counseled and opting for a version attempt. The overall implementation rate of external cephalic version was 72% (1169 of 1613 eligible clients) with a range between clusters of 8-95%. Neither the client strategy (OR 0.8, 95% CI 0.4-1.5) nor the care-provider strategy (OR 1.2, 95% CI 0.6-2.3) showed significant improvements. Results were comparable when we limited the analysis to those women who were actually offered intervention (OR 0.6, 95% CI 0.3-1.4 and OR 2.0, 95% CI 0.7-4.5). Neither a client nor a care-provider strategy improved the external cephalic version implementation rate for breech presentation, neither with regard to the number of version attempts offered nor the number of women accepting the procedure. © 2015 Nordic Federation of Societies of Obstetrics and Gynecology.

  13. Review of methods for handling confounding by cluster and informative cluster size in clustered data

    PubMed Central

    Seaman, Shaun; Pavlou, Menelaos; Copas, Andrew

    2014-01-01

    Clustered data are common in medical research. Typically, one is interested in a regression model for the association between an outcome and covariates. Two complications that can arise when analysing clustered data are informative cluster size (ICS) and confounding by cluster (CBC). ICS and CBC mean that the outcome of a member given its covariates is associated with, respectively, the number of members in the cluster and the covariate values of other members in the cluster. Standard generalised linear mixed models for cluster-specific inference and standard generalised estimating equations for population-average inference assume, in general, the absence of ICS and CBC. Modifications of these approaches have been proposed to account for CBC or ICS. This article is a review of these methods. We express their assumptions in a common format, thus providing greater clarity about the assumptions that methods proposed for handling CBC make about ICS and vice versa, and about when different methods can be used in practice. We report relative efficiencies of methods where available, describe how methods are related, identify a previously unreported equivalence between two key methods, and propose some simple additional methods. Unnecessarily using a method that allows for ICS/CBC has an efficiency cost when ICS and CBC are absent. We review tools for identifying ICS/CBC. A strategy for analysis when CBC and ICS are suspected is demonstrated by examining the association between socio-economic deprivation and preterm neonatal death in Scotland. PMID:25087978
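
    The distinction between the two inferential targets under ICS can be seen with a toy weighted mean: a population-average (all-members) mean versus a cluster-weighted mean in which each member is weighted by the inverse of its cluster's size. This is a deliberately minimal stand-in for the GEE machinery, not any of the reviewed estimators.

```python
import numpy as np

def population_average(clusters):
    """All-members mean: every observation weighted equally (GEE-style)."""
    return np.mean([y for c in clusters for y in c])

def cluster_weighted_average(clusters):
    """Each cluster contributes equally (weight 1/n_i per member), the
    weighting used to protect against informative cluster size."""
    return np.mean([np.mean(c) for c in clusters])
```

    When cluster size is informative (here, large clusters have systematically low outcomes), the two targets differ.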

  14. Genetic algorithm-based improved DOA estimation using fourth-order cumulants

    NASA Astrophysics Data System (ADS)

    Ahmed, Ammar; Tufail, Muhammad

    2017-05-01

    Genetic algorithm (GA)-based direction of arrival (DOA) estimation is proposed using fourth-order cumulants (FOC) and the ESPRIT principle, resulting in the Multiple Invariance Cumulant ESPRIT algorithm. In existing FOC ESPRIT formulations, only one invariance is utilised to estimate DOAs; the unused multiple invariances (MIs) should be exploited simultaneously in order to improve the estimation accuracy. In this paper, a fitness function based on a carefully designed cumulant matrix is developed which incorporates the MIs present in the sensor array. Better DOA estimation can be achieved by minimising this fitness function. Moreover, the effectiveness of Newton's method as well as the GA for this optimisation problem is illustrated. Simulation results show that the proposed algorithm provides improved estimation accuracy compared to existing algorithms, especially in the case of low SNR, a small number of snapshots, closely spaced sources, and high signal and noise correlation. It is also observed that optimisation using Newton's method is more likely to converge to false local optima, yielding erroneous results, whereas GA-based optimisation is attractive due to its global optimisation capability.
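
    The GA's role here is simply to globally minimise a fitness function over the DOA parameters. A minimal real-coded GA of the kind that could drive such a search is sketched below (tournament-free elitist selection, arithmetic crossover, Gaussian mutation); the operators and hyperparameters are illustrative assumptions, not the paper's configuration, and the test uses a toy quadratic in place of the cumulant-matrix fitness.

```python
import random

def genetic_minimise(fitness, bounds, pop_size=60, n_gen=120, seed=0):
    """Minimal real-coded GA: keep the best fifth, breed children by
    arithmetic crossover and Gaussian mutation; return the best candidate."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(n_gen):
        elite = sorted(pop, key=fitness)[: pop_size // 5]   # elitist selection
        children = list(elite)
        while len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            w = rng.random()
            child = [w * x + (1 - w) * y for x, y in zip(a, b)]  # crossover
            child = [min(max(x + rng.gauss(0, 0.05 * (hi - lo)), lo), hi)
                     for x, (lo, hi) in zip(child, bounds)]      # mutation
            children.append(child)
        pop = children
    return min(pop, key=fitness)
```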

  15. Computer aided detection of clusters of microcalcifications on full field digital mammograms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ge Jun; Sahiner, Berkman; Hadjiiski, Lubomir M.

    2006-08-15

    We are developing a computer-aided detection (CAD) system to identify microcalcification clusters (MCCs) automatically on full field digital mammograms (FFDMs). The CAD system includes six stages: preprocessing; image enhancement; segmentation of microcalcification candidates; false positive (FP) reduction for individual microcalcifications; regional clustering; and FP reduction for clustered microcalcifications. At the stage of FP reduction for individual microcalcifications, a truncated sum-of-squares error function was used to improve the efficiency and robustness of the training of an artificial neural network in our CAD system for FFDMs. At the stage of FP reduction for clustered microcalcifications, morphological features and features derived from the artificial neural network outputs were extracted from each cluster. Stepwise linear discriminant analysis (LDA) was used to select the features. An LDA classifier was then used to differentiate clustered microcalcifications from FPs. A data set of 96 cases with 192 images was collected at the University of Michigan. This data set contained 96 MCCs, of which 28 clusters were proven by biopsy to be malignant and 68 were proven to be benign. The data set was separated into two independent data sets for training and testing of the CAD system in a cross-validation scheme. When one data set was used to train and validate the convolution neural network (CNN) in our CAD system, the other data set was used to evaluate the detection performance. With the use of a truncated error metric, the training of CNN could be accelerated and the classification performance was improved. The CNN in combination with an LDA classifier could substantially reduce FPs with a small tradeoff in sensitivity. By using the free-response receiver operating characteristic methodology, it was found that our CAD system can achieve a cluster-based sensitivity of 70, 80, and 90 % at 0.21, 0.61, and 1.49 FPs/image, respectively. For case
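
    A truncated sum-of-squares error of the kind mentioned above caps each sample's squared error so that gross outliers cannot dominate training. The form below (a per-sample cap at tau squared) is one plausible reading of such a loss, not necessarily the exact function used in the paper.

```python
import numpy as np

def truncated_sse(pred, target, tau):
    """Truncated sum-of-squares error: per-sample squared errors are capped
    at tau**2, limiting the influence of outliers on network training."""
    err2 = (np.asarray(pred, float) - np.asarray(target, float)) ** 2
    return float(np.minimum(err2, tau ** 2).sum())
```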

  16. Correspondence between ion-cluster and bulk thermodynamics: on the validity of the cluster pair approximation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vlcek, Lukas; Chialvo, Ariel; Simonson, J Michael

    2013-01-01

    Molecular models and experimental estimates based on the cluster pair approximation (CPA) provide inconsistent predictions of absolute single-ion hydration properties. To understand the origin of this discrepancy we used molecular simulations to study the transition between hydration of alkali metal and halide ions in small aqueous clusters and bulk water. The results demonstrate that the assumptions underlying the CPA are not generally valid as a result of a significant shift in the ion hydration free energies (~15 kJ/mol) and enthalpies (~47 kJ/mol) in the intermediate range of cluster sizes. When this effect is accounted for, the systematic differences between models and experimental predictions disappear, and the value of the absolute proton hydration enthalpy based on the CPA comes into closer agreement with other estimates.

  17. Centre-excised X-ray luminosity as an efficient mass proxy for future galaxy cluster surveys

    DOE PAGES

    Mantz, Adam B.; Allen, Steven W.; Morris, R. Glenn; ...

    2017-10-02

    The cosmological constraining power of modern galaxy cluster catalogues can be improved by obtaining low-scatter mass proxy measurements for even a small fraction of sources. In the context of large upcoming surveys that will reveal the cluster population down to the group scale and out to high redshifts, efficient strategies for obtaining such mass proxies will be valuable. In this work, we use high-quality weak-lensing and X-ray mass estimates for massive clusters in current X-ray-selected catalogues to revisit the scaling relations of the projected, centre-excised X-ray luminosity (Lce), which previous work suggests correlates tightly with total mass. Our data confirm that this is the case, with Lce having an intrinsic scatter at fixed mass comparable to that of gas mass, temperature or YX. Compared to the other proxies, however, Lce is less susceptible to systematic uncertainties due to background modelling, and can be measured precisely with shorter exposures. This opens up the possibility of using Lce to estimate masses for large numbers of clusters discovered by new X-ray surveys (e.g. eROSITA) directly from the survey data, as well as for clusters discovered at other wavelengths with relatively short follow-up observations. We describe a simple procedure for making such estimates from X-ray surface brightness data, and comment on the spatial resolution required to apply this method as a function of cluster mass and redshift. Lastly, we also explore the potential impact of Chandra and XMM-Newton follow-up observations over the next decade on dark energy constraints from new cluster surveys.
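
    At its core, a centre-excised luminosity measurement is an annular aperture sum on a background-subtracted surface-brightness image: the bright, astrophysically noisy core is excised and only the annulus is summed. The sketch below assumes a pixelised image and a fixed annulus in pixel units; the paper's actual apertures are defined in physical radii (e.g. fractions of r500).

```python
import numpy as np

def centre_excised_flux(image, centre, r_in, r_out, pixel_scale=1.0):
    """Sum surface brightness in the annulus r_in <= r < r_out around
    `centre` (row, col), excising the core r < r_in."""
    yy, xx = np.indices(image.shape)
    r = np.hypot(yy - centre[0], xx - centre[1]) * pixel_scale
    mask = (r >= r_in) & (r < r_out)
    return float(image[mask].sum())
```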

  18. Centre-excised X-ray luminosity as an efficient mass proxy for future galaxy cluster surveys

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mantz, Adam B.; Allen, Steven W.; Morris, R. Glenn

    The cosmological constraining power of modern galaxy cluster catalogues can be improved by obtaining low-scatter mass proxy measurements for even a small fraction of sources. In the context of large upcoming surveys that will reveal the cluster population down to the group scale and out to high redshifts, efficient strategies for obtaining such mass proxies will be valuable. In this work, we use high-quality weak-lensing and X-ray mass estimates for massive clusters in current X-ray-selected catalogues to revisit the scaling relations of the projected, centre-excised X-ray luminosity (Lce), which previous work suggests correlates tightly with total mass. Our data confirm that this is the case, with Lce having an intrinsic scatter at fixed mass comparable to that of gas mass, temperature or YX. Compared to the other proxies, however, Lce is less susceptible to systematic uncertainties due to background modelling, and can be measured precisely with shorter exposures. This opens up the possibility of using Lce to estimate masses for large numbers of clusters discovered by new X-ray surveys (e.g. eROSITA) directly from the survey data, as well as for clusters discovered at other wavelengths with relatively short follow-up observations. We describe a simple procedure for making such estimates from X-ray surface brightness data, and comment on the spatial resolution required to apply this method as a function of cluster mass and redshift. Lastly, we also explore the potential impact of Chandra and XMM-Newton follow-up observations over the next decade on dark energy constraints from new cluster surveys.

  19. Estimating Treatment Effects via Multilevel Matching within Homogenous Groups of Clusters

    ERIC Educational Resources Information Center

    Steiner, Peter M.; Kim, Jee-Seon

    2015-01-01

    Despite the popularity of propensity score (PS) techniques they are not yet well studied for matching multilevel data where selection into treatment takes place among level-one units within clusters. This paper suggests a PS matching strategy that tries to avoid the disadvantages of within- and across-cluster matching. The idea is to first…

  20. An Improved Source-Scanning Algorithm for Locating Earthquake Clusters or Aftershock Sequences

    NASA Astrophysics Data System (ADS)

    Liao, Y.; Kao, H.; Hsu, S.

    2010-12-01

    The Source-Scanning Algorithm (SSA) was originally introduced in 2004 to locate non-volcanic tremors. Its application was later expanded to the identification of earthquake rupture planes and the near-real-time detection and monitoring of landslides and mud/debris flows. In this study, we further improve SSA for the purpose of locating earthquake clusters or aftershock sequences when only a limited number of waveform observations are available. The main improvements include the application of a ground motion analyzer to separate P and S waves, the automatic determination of resolution based on the grid size and time step of the scanning process, and a modified brightness function to utilize constraints from multiple phases. Specifically, the improved SSA (named ISSA) addresses two major issues related to locating earthquake clusters/aftershocks. The first is the massive amount of time and labour required to locate a large number of seismic events manually. The second is to efficiently and correctly identify the same phase across the entire recording array when multiple events occur closely in time and space. To test the robustness of ISSA, we generate synthetic waveforms consisting of three separated events such that individual P and S phases arrive at different stations in different orders, making correct phase picking nearly impossible. Using these very complicated waveforms as the input, ISSA scans the entire model space for possible combinations of time and location for the existence of seismic sources. The scanning results successfully associate the various phases from each event at all stations and correctly recover the input. To further demonstrate the advantage of ISSA, we apply it to the waveform data collected by a temporary OBS array for the aftershock sequence of an offshore earthquake southwest of Taiwan. The overall signal-to-noise ratio is inadequate for locating small events; and the precise arrival times of P and S phases are difficult to
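
    The essence of source scanning is a grid search over candidate origin times and locations, stacking normalised waveform amplitudes at the arrival times each candidate predicts. The 1-D, single-phase, constant-velocity sketch below is a drastically simplified illustration of a brightness function, not ISSA itself (which separates P and S and combines multiple phases).

```python
import numpy as np

def brightness_scan(waveforms, station_pos, traveltime, grid, t0s, dt):
    """For each candidate location and origin time, average the normalised
    amplitude at each station at the predicted arrival time; the maximum
    of this 'brightness' marks the most likely source."""
    best, best_src = -np.inf, None
    norm = [np.abs(w) / np.max(np.abs(w)) for w in waveforms]
    for x in grid:
        for t0 in t0s:
            b = 0.0
            for w, pos in zip(norm, station_pos):
                idx = int(round((t0 + traveltime(x, pos)) / dt))
                if 0 <= idx < len(w):
                    b += w[idx]          # stack amplitude at predicted arrival
            b /= len(waveforms)
            if b > best:
                best, best_src = b, (x, t0)
    return best_src, best
```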

  1. Cluster Adjusted Regression for Displaced Subject Data (CARDS): Marginal Inference under Potentially Informative Temporal Cluster Size Profiles

    PubMed Central

    Bible, Joe; Beck, James D.; Datta, Somnath

    2016-01-01

    Ignorance of the mechanisms responsible for the availability of information presents an unusual problem for analysts, as the availability of information is often dependent on the outcome. In the analysis of clustered data, we say that a condition of informative cluster size (ICS) exists when the inference drawn from analysis of hypothetical balanced data differs from the inference drawn from the observed data. Much work has been done to address the analysis of clustered data with informative cluster size; examples include Inverse Probability Weighting (IPW), Cluster Weighted Generalized Estimating Equations (CWGEE), and Doubly Weighted Generalized Estimating Equations (DWGEE). When cluster size changes with time, i.e., the data set possesses temporally varying cluster sizes (TVCS), these methods may produce biased inference for the underlying marginal distribution of interest. We propose a new marginalization that may be appropriate for addressing clustered longitudinal data with TVCS. The principal motivation for our present work is to analyze the periodontal data collected by Beck et al. (1997, Journal of Periodontal Research 6, 497-505). Longitudinal periodontal data often exhibit both ICS and TVCS, as the number of teeth possessed by participants at the onset of the study is not constant, and teeth as well as individuals may be displaced throughout the study. PMID:26682911

  2. Improving and Evaluating Nested Sampling Algorithm for Marginal Likelihood Estimation

    NASA Astrophysics Data System (ADS)

    Ye, M.; Zeng, X.; Wu, J.; Wang, D.; Liu, J.

    2016-12-01

    With the growing impacts of climate change and human activities on the water cycle, an increasing number of studies focus on quantifying modeling uncertainty. Bayesian model averaging (BMA) provides a popular framework for quantifying conceptual model and parameter uncertainty. The ensemble prediction is generated by combining each plausible model's prediction, with each model assigned a weight determined by its prior weight and marginal likelihood. Thus, the estimation of a model's marginal likelihood is crucial for reliable and accurate BMA prediction. The nested sampling estimator (NSE) is a newly proposed method for marginal likelihood estimation. NSE searches the parameter space gradually from the low-likelihood region to the high-likelihood region, and this evolution is carried out iteratively via a local sampling procedure; the efficiency of NSE is therefore dominated by the strength of that procedure. Currently, the Metropolis-Hastings (M-H) algorithm is often used for local sampling, but M-H is not an efficient sampler for high-dimensional or complicated parameter spaces. To improve the efficiency of NSE, we incorporate the robust and efficient sampling algorithm DREAMzs into the local sampling step of NSE. The comparison results demonstrate that the improved NSE increases the efficiency of marginal likelihood estimation significantly. However, both the improved and original NSEs suffer from considerable instability. In addition, the heavy computational cost of the large number of model executions is reduced by using adaptive sparse grid surrogates.
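
    A minimal nested sampling estimator, with a short Metropolis-style random walk as the local sampling procedure, can be sketched as follows. This is a 1-D toy with a uniform prior on [0, 1], intended only to show the shrinking-prior-volume accumulation of Z; the walk step size, walk length and live-point count are illustrative assumptions, and the estimate is stochastic.

```python
import random
from math import exp

def nested_sampling(like, n_live=200, n_iter=2000, seed=1):
    """Minimal nested sampling estimate of Z = integral of like(theta)
    over a uniform prior on [0, 1]."""
    rng = random.Random(seed)
    live = [rng.random() for _ in range(n_live)]
    Z, width = 0.0, 1.0 - exp(-1.0 / n_live)    # first prior-volume shell
    for _ in range(n_iter):
        worst = min(live, key=like)
        L_star = like(worst)
        Z += L_star * width                      # accumulate sum of L_i * w_i
        width *= exp(-1.0 / n_live)              # shells shrink geometrically
        # local sampling: constrained random walk from a surviving point
        theta = rng.choice(live)
        for _ in range(30):
            prop = theta + rng.gauss(0.0, 0.1)
            if 0.0 <= prop <= 1.0 and like(prop) > L_star:
                theta = prop
        live[live.index(worst)] = theta
    return Z
```

    For like(theta) = 2*theta the true marginal likelihood is 1, which the sketch recovers to within its sampling noise.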

  3. The Richness Dependence of Galaxy Cluster Correlations: Results From A Redshift Survey Of Rich APM Clusters

    NASA Technical Reports Server (NTRS)

    Croft, R. A. C.; Dalton, G. B.; Efstathiou, G.; Sutherland, W. J.; Maddox, S. J.

    1997-01-01

    We analyze the spatial clustering properties of a new catalog of very rich galaxy clusters selected from the APM Galaxy Survey. These clusters are of comparable richness and space density to Abell Richness Class greater than or equal to 1 clusters, but selected using an objective algorithm from a catalog demonstrably free of artificial inhomogeneities. Evaluation of the two-point correlation function xi(sub cc)(r) for the full sample and for richer subsamples reveals that the correlation amplitude is consistent with that measured for lower richness APM clusters and X-ray selected clusters. We apply a maximum likelihood estimator to find the best fitting slope and amplitude of a power law fit to xi(sub cc)(r), and to estimate the correlation length r(sub 0) (the value of r at which xi(sub cc)(r) is equal to unity). For clusters with a mean space density of 1.6 x 10(exp -6) h(exp 3) Mpc(exp -3) (equivalent to the space density of Abell Richness greater than or equal to 2 clusters), we find r(sub 0) = 21.3(+11.1/-9.3) h(exp -1) Mpc (95% confidence limits). This is consistent with the weak richness dependence of xi(sub cc)(r) expected in Gaussian models of structure formation. In particular, the amplitude of xi(sub cc)(r) at all richnesses matches that of xi(sub cc)(r) for clusters selected in N-Body simulations of a low density Cold Dark Matter model.
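
    The power-law parametrisation xi(r) = (r/r0)^(-gamma) can be fitted crudely by least squares in log-log space, with r0 read off as the scale where xi = 1. This simple sketch stands in for the maximum likelihood estimator actually used in the paper.

```python
import numpy as np

def fit_power_law(r, xi):
    """Least-squares fit of xi(r) = (r / r0)**(-gamma) in log space.
    Returns (r0, gamma), where r0 is the scale at which xi = 1."""
    slope, intercept = np.polyfit(np.log(r), np.log(xi), 1)
    gamma = -slope
    r0 = np.exp(intercept / gamma)   # since intercept = gamma * log(r0)
    return r0, gamma
```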

  4. Bone Pose Estimation in the Presence of Soft Tissue Artifact Using Triangular Cosserat Point Elements.

    PubMed

    Solav, Dana; Rubin, M B; Cereatti, Andrea; Camomilla, Valentina; Wolf, Alon

    2016-04-01

    Accurate estimation of the position and orientation (pose) of a bone from a cluster of skin markers is limited mostly by the relative motion between the bone and the markers, which is known as the soft tissue artifact (STA). This work presents a method, based on continuum mechanics, to describe the kinematics of a cluster affected by STA. The cluster is characterized by triangular cosserat point elements (TCPEs) defined by all combinations of three markers. The effects of the STA on the TCPEs are quantified using three parameters describing the strain in each TCPE and the relative rotation and translation between TCPEs. The method was evaluated using previously collected ex vivo kinematic data. Femur pose was estimated from 12 skin markers on the thigh, while its reference pose was measured using bone pins. Analysis revealed that instantaneous subsets of TCPEs exist which estimate bone position and orientation more accurately than the Procrustes Superimposition applied to the cluster of all markers. It has been shown that some of these parameters correlate well with femur pose errors, which suggests that they can be used to select, at each instant, subsets of TCPEs leading to an improved estimation of the underlying bone pose.
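
    The Procrustes Superimposition baseline mentioned above is the least-squares rigid transform between a reference marker configuration and its current positions, commonly computed via SVD (the Kabsch algorithm). A minimal sketch of that baseline (not of the TCPE method itself) is:

```python
import numpy as np

def rigid_pose(ref_markers, cur_markers):
    """Least-squares rigid transform (Procrustes / Kabsch) mapping reference
    marker positions onto current ones: cur ~= ref @ R.T + t.
    Rows are markers, columns are x, y, z."""
    ref_c = ref_markers - ref_markers.mean(axis=0)   # centre both clouds
    cur_c = cur_markers - cur_markers.mean(axis=0)
    U, _, Vt = np.linalg.svd(cur_c.T @ ref_c)
    d = np.sign(np.linalg.det(U @ Vt))               # guard against reflection
    R = U @ np.diag([1.0, 1.0, d]) @ Vt
    t = cur_markers.mean(axis=0) - R @ ref_markers.mean(axis=0)
    return R, t
```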

  5. Cluster Mass Calibration at High Redshift: HST Weak Lensing Analysis of 13 Distant Galaxy Clusters from the South Pole Telescope Sunyaev-Zel’dovich Survey

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schrabback, T.; Applegate, D.; Dietrich, J. P.

    Here we present an HST/Advanced Camera for Surveys (ACS) weak gravitational lensing analysis of 13 massive high-redshift (z_median = 0.88) galaxy clusters discovered in the South Pole Telescope (SPT) Sunyaev–Zel'dovich Survey. This study is part of a larger campaign that aims to robustly calibrate mass–observable scaling relations over a wide range in redshift to enable improved cosmological constraints from the SPT cluster sample. We introduce new strategies to ensure that systematics in the lensing analysis do not degrade constraints on cluster scaling relations significantly. First, we efficiently remove cluster members from the source sample by selecting very blue galaxies in V-I colour. Our estimate of the source redshift distribution is based on Cosmic Assembly Near-infrared Deep Extragalactic Legacy Survey (CANDELS) data, where we carefully mimic the source selection criteria of the cluster fields. We apply a statistical correction for systematic photometric redshift errors as derived from Hubble Ultra Deep Field data and verified through spatial cross-correlations. We account for the impact of lensing magnification on the source redshift distribution, finding that this is particularly relevant for shallower surveys. Finally, we account for biases in the mass modelling caused by miscentring and uncertainties in the concentration–mass relation using simulations. In combination with temperature estimates from Chandra we constrain the normalization of the mass–temperature scaling relation ln(E(z)M_500c/10^14 M_⊙) = A + 1.5 ln(kT/7.2 keV) to A = 1.81^{+0.24}_{-0.14}(stat.) ± 0.09(sys.), consistent with self-similar redshift evolution when compared to lower redshift samples. Additionally, the lensing data constrain the average concentration of the clusters to c_200c = 5.6^{+3.7}_{-1.8}.

  6. Cluster Mass Calibration at High Redshift: HST Weak Lensing Analysis of 13 Distant Galaxy Clusters from the South Pole Telescope Sunyaev-Zel’dovich Survey

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schrabback, T.; Applegate, D.; Dietrich, J. P.

    We present an HST/Advanced Camera for Surveys (ACS) weak gravitational lensing analysis of 13 massive high-redshift (z_median = 0.88) galaxy clusters discovered in the South Pole Telescope (SPT) Sunyaev-Zel'dovich Survey. This study is part of a larger campaign that aims to robustly calibrate mass-observable scaling relations over a wide range in redshift to enable improved cosmological constraints from the SPT cluster sample. We introduce new strategies to ensure that systematics in the lensing analysis do not degrade constraints on cluster scaling relations significantly. First, we efficiently remove cluster members from the source sample by selecting very blue galaxies in V - I colour. Our estimate of the source redshift distribution is based on Cosmic Assembly Near-infrared Deep Extragalactic Legacy Survey (CANDELS) data, where we carefully mimic the source selection criteria of the cluster fields. We apply a statistical correction for systematic photometric redshift errors as derived from Hubble Ultra Deep Field data and verified through spatial cross-correlations. We account for the impact of lensing magnification on the source redshift distribution, finding that this is particularly relevant for shallower surveys. Finally, we account for biases in the mass modelling caused by miscentring and uncertainties in the concentration-mass relation using simulations. In combination with temperature estimates from Chandra we constrain the normalization of the mass-temperature scaling relation ln(E(z)M_500c/10^14 M_⊙) = A + 1.5 ln(kT/7.2 keV) to A = 1.81^{+0.24}_{-0.14}(stat.) ± 0.09(sys.), consistent with self-similar redshift evolution when compared to lower redshift samples. Additionally, the lensing data constrain the average concentration of the clusters to c_200c = 5.6^{+3.7}_{-1.8}.

  7. Cluster Mass Calibration at High Redshift: HST Weak Lensing Analysis of 13 Distant Galaxy Clusters from the South Pole Telescope Sunyaev-Zel’dovich Survey

    DOE PAGES

    Schrabback, T.; Applegate, D.; Dietrich, J. P.; ...

    2017-10-14

    Here we present an HST/Advanced Camera for Surveys (ACS) weak gravitational lensing analysis of 13 massive high-redshift (z_median = 0.88) galaxy clusters discovered in the South Pole Telescope (SPT) Sunyaev–Zel'dovich Survey. This study is part of a larger campaign that aims to robustly calibrate mass–observable scaling relations over a wide range in redshift to enable improved cosmological constraints from the SPT cluster sample. We introduce new strategies to ensure that systematics in the lensing analysis do not degrade constraints on cluster scaling relations significantly. First, we efficiently remove cluster members from the source sample by selecting very blue galaxies in V-I colour. Our estimate of the source redshift distribution is based on Cosmic Assembly Near-infrared Deep Extragalactic Legacy Survey (CANDELS) data, where we carefully mimic the source selection criteria of the cluster fields. We apply a statistical correction for systematic photometric redshift errors as derived from Hubble Ultra Deep Field data and verified through spatial cross-correlations. We account for the impact of lensing magnification on the source redshift distribution, finding that this is particularly relevant for shallower surveys. Finally, we account for biases in the mass modelling caused by miscentring and uncertainties in the concentration–mass relation using simulations. In combination with temperature estimates from Chandra we constrain the normalization of the mass–temperature scaling relation ln(E(z)M_500c/10^14 M_⊙) = A + 1.5 ln(kT/7.2 keV) to A = 1.81^{+0.24}_{-0.14}(stat.) ± 0.09(sys.), consistent with self-similar redshift evolution when compared to lower redshift samples. Additionally, the lensing data constrain the average concentration of the clusters to c_200c = 5.6^{+3.7}_{-1.8}.

  8. Cluster mass calibration at high redshift: HST weak lensing analysis of 13 distant galaxy clusters from the South Pole Telescope Sunyaev-Zel'dovich Survey

    NASA Astrophysics Data System (ADS)

    Schrabback, T.; Applegate, D.; Dietrich, J. P.; Hoekstra, H.; Bocquet, S.; Gonzalez, A. H.; von der Linden, A.; McDonald, M.; Morrison, C. B.; Raihan, S. F.; Allen, S. W.; Bayliss, M.; Benson, B. A.; Bleem, L. E.; Chiu, I.; Desai, S.; Foley, R. J.; de Haan, T.; High, F. W.; Hilbert, S.; Mantz, A. B.; Massey, R.; Mohr, J.; Reichardt, C. L.; Saro, A.; Simon, P.; Stern, C.; Stubbs, C. W.; Zenteno, A.

    2018-02-01

    We present an HST/Advanced Camera for Surveys (ACS) weak gravitational lensing analysis of 13 massive high-redshift (z_median = 0.88) galaxy clusters discovered in the South Pole Telescope (SPT) Sunyaev-Zel'dovich Survey. This study is part of a larger campaign that aims to robustly calibrate mass-observable scaling relations over a wide range in redshift to enable improved cosmological constraints from the SPT cluster sample. We introduce new strategies to ensure that systematics in the lensing analysis do not degrade constraints on cluster scaling relations significantly. First, we efficiently remove cluster members from the source sample by selecting very blue galaxies in V - I colour. Our estimate of the source redshift distribution is based on Cosmic Assembly Near-infrared Deep Extragalactic Legacy Survey (CANDELS) data, where we carefully mimic the source selection criteria of the cluster fields. We apply a statistical correction for systematic photometric redshift errors as derived from Hubble Ultra Deep Field data and verified through spatial cross-correlations. We account for the impact of lensing magnification on the source redshift distribution, finding that this is particularly relevant for shallower surveys. Finally, we account for biases in the mass modelling caused by miscentring and uncertainties in the concentration-mass relation using simulations. In combination with temperature estimates from Chandra we constrain the normalization of the mass-temperature scaling relation ln(E(z)M_500c/10^14 M_⊙) = A + 1.5 ln(kT/7.2 keV) to A=1.81^{+0.24}_{-0.14}(stat.)±0.09(sys.), consistent with self-similar redshift evolution when compared to lower redshift samples. Additionally, the lensing data constrain the average concentration of the clusters to c_200c=5.6^{+3.7}_{-1.8}.
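    The quoted scaling relation can be rearranged to predict a cluster mass from an X-ray temperature. A minimal sketch (not the authors' pipeline), assuming a flat ΛCDM cosmology with Ωm = 0.3, ΩΛ = 0.7 and the best-fit normalization A = 1.81:

```python
import numpy as np

def e_z(z, omega_m=0.3, omega_l=0.7):
    """Dimensionless Hubble parameter E(z) for a flat LCDM cosmology."""
    return np.sqrt(omega_m * (1.0 + z)**3 + omega_l)

def m500c_from_kT(kT_keV, z, A=1.81):
    """Invert ln(E(z) M500c / 1e14 Msun) = A + 1.5 ln(kT / 7.2 keV)
    to predict M500c (in solar masses) from an X-ray temperature."""
    return 1e14 / e_z(z) * np.exp(A) * (kT_keV / 7.2)**1.5

m_pivot = m500c_from_kT(7.2, 0.88)  # mass at the pivot temperature
```

At the pivot temperature the relation reduces to M500c = 10^14 e^A / E(z) solar masses, so the redshift dependence enters only through E(z).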

  9. Cluster Mass Calibration at High Redshift: HST Weak Lensing Analysis of 13 Distant Galaxy Clusters from the South Pole Telescope Sunyaev-Zel'dovich Survey

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schrabback, T.; et al.

    We present an HST/ACS weak gravitational lensing analysis of 13 massive high-redshift (z_median=0.88) galaxy clusters discovered in the South Pole Telescope (SPT) Sunyaev-Zel'dovich Survey. This study is part of a larger campaign that aims to robustly calibrate mass-observable scaling relations over a wide range in redshift to enable improved cosmological constraints from the SPT cluster sample. We introduce new strategies to ensure that systematics in the lensing analysis do not degrade constraints on cluster scaling relations significantly. First, we efficiently remove cluster members from the source sample by selecting very blue galaxies in V-I colour. Our estimate of the source redshift distribution is based on CANDELS data, where we carefully mimic the source selection criteria of the cluster fields. We apply a statistical correction for systematic photometric redshift errors as derived from Hubble Ultra Deep Field data and verified through spatial cross-correlations. We account for the impact of lensing magnification on the source redshift distribution, finding that this is particularly relevant for shallower surveys. Finally, we account for biases in the mass modelling caused by miscentring and uncertainties in the mass-concentration relation using simulations. In combination with temperature estimates from Chandra we constrain the normalisation of the mass-temperature scaling relation ln(E(z) M_500c/10^14 M_sun)=A+1.5 ln(kT/7.2 keV) to A=1.81^{+0.24}_{-0.14}(stat.) +/- 0.09(sys.), consistent with self-similar redshift evolution when compared to lower redshift samples. Additionally, the lensing data constrain the average concentration of the clusters to c_200c=5.6^{+3.7}_{-1.8}.

  10. GibbsCluster: unsupervised clustering and alignment of peptide sequences.

    PubMed

    Andreatta, Massimo; Alvarez, Bruno; Nielsen, Morten

    2017-07-03

    Receptor interactions with short linear peptide fragments (ligands) are at the base of many biological signaling processes. Conserved and information-rich amino acid patterns, commonly called sequence motifs, shape and regulate these interactions. Because of the properties of a receptor-ligand system or of the assay used to interrogate it, experimental data often contain multiple sequence motifs. GibbsCluster is a powerful tool for unsupervised motif discovery because it can simultaneously cluster and align peptide data. GibbsCluster 2.0, presented here, is an improved version that incorporates insertions and deletions to account for variations in motif length in the peptide input. In basic terms, the program takes as input a set of peptide sequences and clusters them into meaningful groups. It returns the optimal number of clusters it identified, together with the sequence alignment and sequence motif characterizing each cluster. Several parameters are available to customize cluster analysis, including adjustable penalties for small clusters and overlapping groups and a trash cluster to remove outliers. As an example application, we used the server to deconvolute multiple specificities in large-scale peptidome data generated by mass spectrometry. The server is available at http://www.cbs.dtu.dk/services/GibbsCluster-2.0. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  11. Scanning linear estimation: improvements over region of interest (ROI) methods

    NASA Astrophysics Data System (ADS)

    Kupinski, Meredith K.; Clarkson, Eric W.; Barrett, Harrison H.

    2013-03-01

    In tomographic medical imaging, signal activity is typically estimated by summing voxels from a reconstructed image. We introduce an alternative estimation scheme that operates on the raw projection data and offers a substantial improvement, as measured by the ensemble mean-square error (EMSE), when compared to using voxel values from a maximum-likelihood expectation-maximization (MLEM) reconstruction. The scanning-linear (SL) estimator operates on the raw projection data and is derived as a special case of maximum-likelihood estimation with a series of approximations to make the calculation tractable. The approximated likelihood accounts for background randomness, measurement noise and variability in the parameters to be estimated. When signal size and location are known, the SL estimate of signal activity is unbiased, i.e. the average estimate equals the true value. By contrast, unpredictable bias arising from the null functions of the imaging system affects standard algorithms that operate on reconstructed data. The SL method is demonstrated for two different tasks: (1) simultaneously estimating a signal’s size, location and activity; (2) for a fixed signal size and location, estimating activity. Noisy projection data are realistically simulated using measured calibration data from the multi-module multi-resolution small-animal SPECT imaging system. For both tasks, the same set of images is reconstructed using the MLEM algorithm (80 iterations), and the average and maximum values within the region of interest (ROI) are calculated for comparison. This comparison shows dramatic improvements in EMSE for the SL estimates. To show that the bias in ROI estimates affects not only absolute values but also relative differences, such as those used to monitor the response to therapy, the activity estimation task is repeated for three different signal sizes.

  12. Image Location Estimation by Salient Region Matching.

    PubMed

    Qian, Xueming; Zhao, Yisi; Han, Junwei

    2015-11-01

    Nowadays, the locations of images are widely used in many application scenarios over large geo-tagged image corpora. For images that are not geographically tagged, we estimate their locations with the help of a large geo-tagged image set via content-based image retrieval. In this paper, we exploit the spatial information of useful visual words to improve image location estimation (i.e., content-based image retrieval performance). We generate visual word groups by mean-shift clustering. To improve retrieval performance, a spatial constraint is utilized to code the relative positions of visual words. We generate a position descriptor for each visual word and build a fast indexing structure for the visual word groups. Experiments show the effectiveness of the proposed approach.
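    Mean-shift clustering, as used above to form visual word groups, repeatedly moves each point to the kernel-weighted mean of its neighbourhood until it settles on a density mode. A generic numpy-only sketch (the bandwidth and the toy 2-D data are illustrative, not from the paper):

```python
import numpy as np

def mean_shift(points, bandwidth=1.0, n_iter=50):
    """Move every point to the Gaussian-kernel-weighted mean of all
    points until it converges on a local density mode."""
    modes = points.copy()
    for _ in range(n_iter):
        for i in range(len(modes)):
            w = np.exp(-np.sum((points - modes[i])**2, axis=1)
                       / (2.0 * bandwidth**2))
            modes[i] = (w[:, None] * points).sum(axis=0) / w.sum()
    return modes

rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0.0, 0.2, (30, 2)),   # group near (0, 0)
                 rng.normal(5.0, 0.2, (30, 2))])  # group near (5, 5)
modes = mean_shift(pts, bandwidth=0.8)
```

Points that converge to (numerically) the same mode belong to the same cluster; unlike k-means, the number of groups is not specified in advance.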

  13. Quantifying the impact of fixed effects modeling of clusters in multiple imputation for cluster randomized trials

    PubMed Central

    Andridge, Rebecca R.

    2011-01-01

    In cluster randomized trials (CRTs), identifiable clusters rather than individuals are randomized to study groups. Resulting data often consist of a small number of clusters with correlated observations within a treatment group. Missing data often present a problem in the analysis of such trials, and multiple imputation (MI) has been used to create complete data sets, enabling subsequent analysis with well-established analysis methods for CRTs. We discuss strategies for accounting for clustering when multiply imputing a missing continuous outcome, focusing on estimation of the variance of group means as used in an adjusted t-test or ANOVA. These analysis procedures are congenial to (can be derived from) a mixed effects imputation model; however, this imputation procedure is not yet available in commercial statistical software. An alternative approach that is readily available and has been used in recent studies is to include fixed effects for cluster, but the impact of using this convenient method has not been studied. We show that under this imputation model the MI variance estimator is positively biased and that smaller intraclass correlation coefficients (ICCs) lead to larger overestimation of the MI variance. Analytical expressions for the bias of the variance estimator are derived in the case of data missing completely at random (MCAR), and cases in which data are missing at random (MAR) are illustrated through simulation. Finally, various imputation methods are applied to data from the Detroit Middle School Asthma Project, a recent school-based CRT, and differences in inference are compared. PMID:21259309
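    For context, the MI variance estimator whose bias is analysed above is the one given by Rubin's combining rules; a generic sketch (the five imputed estimates and their variances are toy numbers, not the trial data):

```python
import numpy as np

def rubin_combine(estimates, within_variances):
    """Rubin's rules: the point estimate is the mean across imputations;
    the total variance is T = W + (1 + 1/m) * B, with W the mean
    within-imputation variance and B the between-imputation variance."""
    q = np.asarray(estimates, dtype=float)
    w = np.asarray(within_variances, dtype=float)
    m = len(q)
    qbar = q.mean()
    T = w.mean() + (1.0 + 1.0 / m) * q.var(ddof=1)
    return qbar, T

# Five hypothetical imputed group-mean estimates and their variances.
qbar, T = rubin_combine([1.2, 1.5, 1.1, 1.4, 1.3], [0.04] * 5)
```

The bias discussed in the abstract enters through B: a misspecified imputation model (e.g. fixed cluster effects) inflates the between-imputation component.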

  14. ASCA Temperature Maps for Several Interesting Clusters and Their Interpretations

    NASA Technical Reports Server (NTRS)

    Markevitch, M.; Sarazin, C.; Forman, W.; Vikhlinin, A.

    1998-01-01

    We present ASCA temperature maps for several galaxy clusters with strong mergers, as well as for several relaxed clusters selected for X-ray mass determination. From the merger temperature maps, we estimate velocities of the colliding subunits and discuss several implications of these estimates. For the relaxed clusters, we derive unprecedentedly accurate mass and gas fraction profiles out to radii of overdensity approximately 500.

  15. Missing continuous outcomes under covariate dependent missingness in cluster randomised trials

    PubMed Central

    Diaz-Ordaz, Karla; Bartlett, Jonathan W

    2016-01-01

    Attrition is a common occurrence in cluster randomised trials which leads to missing outcome data. Two approaches for analysing such trials are cluster-level analysis and individual-level analysis. This paper compares the performance of unadjusted cluster-level analysis, baseline covariate adjusted cluster-level analysis and linear mixed model analysis, under baseline covariate dependent missingness in continuous outcomes, in terms of bias, average estimated standard error and coverage probability. The methods of complete records analysis and multiple imputation are used to handle the missing outcome data. We considered four scenarios, with the missingness mechanism and baseline covariate effect on outcome either the same or different between intervention groups. We show that both unadjusted cluster-level analysis and baseline covariate adjusted cluster-level analysis give unbiased estimates of the intervention effect only if both intervention groups have the same missingness mechanisms and there is no interaction between baseline covariate and intervention group. Linear mixed model and multiple imputation give unbiased estimates under all four considered scenarios, provided that an interaction of intervention and baseline covariate is included in the model when appropriate. Cluster mean imputation has been proposed as a valid approach for handling missing outcomes in cluster randomised trials. We show that cluster mean imputation only gives unbiased estimates when the missingness mechanism is the same between the intervention groups and there is no interaction between baseline covariate and intervention group. Multiple imputation shows overcoverage for a small number of clusters in each intervention group. PMID:27177885

  16. Missing continuous outcomes under covariate dependent missingness in cluster randomised trials.

    PubMed

    Hossain, Anower; Diaz-Ordaz, Karla; Bartlett, Jonathan W

    2017-06-01

    Attrition is a common occurrence in cluster randomised trials which leads to missing outcome data. Two approaches for analysing such trials are cluster-level analysis and individual-level analysis. This paper compares the performance of unadjusted cluster-level analysis, baseline covariate adjusted cluster-level analysis and linear mixed model analysis, under baseline covariate dependent missingness in continuous outcomes, in terms of bias, average estimated standard error and coverage probability. The methods of complete records analysis and multiple imputation are used to handle the missing outcome data. We considered four scenarios, with the missingness mechanism and baseline covariate effect on outcome either the same or different between intervention groups. We show that both unadjusted cluster-level analysis and baseline covariate adjusted cluster-level analysis give unbiased estimates of the intervention effect only if both intervention groups have the same missingness mechanisms and there is no interaction between baseline covariate and intervention group. Linear mixed model and multiple imputation give unbiased estimates under all four considered scenarios, provided that an interaction of intervention and baseline covariate is included in the model when appropriate. Cluster mean imputation has been proposed as a valid approach for handling missing outcomes in cluster randomised trials. We show that cluster mean imputation only gives unbiased estimates when the missingness mechanism is the same between the intervention groups and there is no interaction between baseline covariate and intervention group. Multiple imputation shows overcoverage for a small number of clusters in each intervention group.
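    An unadjusted cluster-level analysis as described above reduces each cluster to its mean and then compares arms with a two-sample t statistic; a minimal illustrative sketch (the clusters and outcomes are invented toy data, not the trial's):

```python
import numpy as np

def cluster_level_t(arm_a, arm_b):
    """Unadjusted cluster-level analysis: collapse each cluster to its
    mean, then compare the two arms with a two-sample t statistic."""
    ma = np.array([np.mean(c) for c in arm_a])
    mb = np.array([np.mean(c) for c in arm_b])
    diff = ma.mean() - mb.mean()
    se = np.sqrt(ma.var(ddof=1) / len(ma) + mb.var(ddof=1) / len(mb))
    return diff, diff / se

# Three hypothetical clusters per arm (lists of individual outcomes).
arm_a = [[5.1, 4.9, 5.3], [5.8, 6.0], [5.4, 5.2, 5.6]]
arm_b = [[4.1, 4.3], [4.6, 4.4, 4.2], [3.9, 4.1]]
diff, t = cluster_level_t(arm_a, arm_b)
```

Because each cluster contributes one observation, the degrees of freedom depend on the number of clusters, not the number of individuals, which is why a small number of clusters limits power.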

  17. Open star clusters and Galactic structure

    NASA Astrophysics Data System (ADS)

    Joshi, Yogesh C.

    2018-04-01

    In order to understand the Galactic structure, we perform a statistical analysis of the distribution of various cluster parameters based on the most complete sample of Galactic open clusters available to date. The geometrical and physical characteristics of a large number of open clusters given in the MWSC catalogue are used to study the spatial distribution of clusters in the Galaxy and determine the scale height, solar offset, local mass density and distribution of reddening material in the solar neighbourhood. We also explore the mass-radius and mass-age relations in Galactic open star clusters. We find that the estimated parameters of the Galactic disk are largely influenced by the choice of cluster sample.

  18. Unsupervised feature relevance analysis applied to improve ECG heartbeat clustering.

    PubMed

    Rodríguez-Sotelo, J L; Peluffo-Ordoñez, D; Cuesta-Frau, D; Castellanos-Domínguez, G

    2012-10-01

    The computer-assisted analysis of biomedical records has become an essential tool in clinical settings. However, current devices provide a growing amount of data that often exceeds the processing capacity of normal computers. As this amount of information rises, new demands for more efficient data extracting methods appear. This paper addresses the task of data mining in physiological records using a feature selection scheme. An unsupervised method based on relevance analysis is described. This scheme uses a least-squares optimization of the input feature matrix in a single iteration. The output of the algorithm is a feature weighting vector. The performance of the method was assessed using a heartbeat clustering test on real ECG records. The quantitative cluster validity measures yielded a correctly classified heartbeat rate of 98.69% (specificity), 85.88% (sensitivity) and 95.04% (general clustering performance), which is even higher than the performance achieved by other similar ECG clustering studies. The number of features was reduced on average from 100 to 18, and the computation time was 43% lower than in previous ECG clustering schemes. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.

  19. An astrophysics data program investigation of cluster evolution

    NASA Technical Reports Server (NTRS)

    Kellogg, Edwin M.

    1990-01-01

    A preliminary status report is given on studies using the Einstein x ray observations of distant clusters of galaxies that are also candidates for gravitational lenses. The studies will determine the location and surface brightness distribution of the x ray emission from clusters associated with selected gravitational lenses. The x ray emission comes from hot gas that traces out the total gravitational potential in the cluster, so its distribution is approximately the same as the mass distribution causing gravitational lensing. Core radii and x ray virial masses can be computed for several of the brighter Einstein sources, and preliminary results are presented on A2218. Preliminary status is also reported on a study of the optical data from 0024+16. A provisional value of 1800 to 2200 km/s for the equivalent velocity dispersion is obtained. The ultimate objective is to extract the mass of the gravitational lens, and perhaps more detailed information on the distribution of matter as warranted. A survey of the Einstein archive shows that the clusters A520, A1704, 3C295, A2397, A1722, SC5029-247, A3186 and A370 have enough x ray counts observed to warrant more detailed optical observations of arcs for comparison. Mass estimates for these clusters can therefore be obtained from three independent sources: the length scale (core radius) that characterizes the density dropoff of the x ray emitting hot gas away from its center, the velocity dispersion of the galaxies moving in the cluster potential, and gravitational bending of light by the total cluster mass. This study will allow the comparison of these three techniques and ultimately improve the knowledge of cluster masses.

  20. Fast Constrained Spectral Clustering and Cluster Ensemble with Random Projection

    PubMed Central

    Liu, Wenfen

    2017-01-01

    Constrained spectral clustering (CSC) methods can greatly improve clustering accuracy by incorporating constraint information into spectral clustering, and have therefore received wide academic attention. In this paper, we propose a fast CSC algorithm via encoding landmark-based graph construction into a new CSC model and applying random sampling to decrease the data size after spectral embedding. Compared with the original model, the new algorithm yields similar results as its model size grows asymptotically; compared with the most efficient CSC algorithm known, the new algorithm runs faster and suits a wider range of data sets. Meanwhile, a scalable semisupervised cluster ensemble algorithm is also proposed via the combination of our fast CSC algorithm and dimensionality reduction with random projection in the process of spectral ensemble clustering. We demonstrate through theoretical analysis and empirical results that the new cluster ensemble algorithm has advantages in terms of efficiency and effectiveness. Furthermore, the approximate preservation of clustering accuracy under random projection, proved in the consensus clustering stage, also holds for weighted k-means clustering, giving a theoretical guarantee for this special kind of k-means clustering in which each point has its own weight. PMID:29312447
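    The random projection step relies on the Johnson-Lindenstrauss property: a Gaussian projection approximately preserves pairwise distances, and hence clustering structure. A small numpy sketch (the dimensions are chosen arbitrarily for illustration, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 1000, 100   # n points in d dimensions, projected to k
X = rng.normal(size=(n, d))

# Gaussian random projection with i.i.d. N(0, 1/k) entries: projected
# squared distances are unbiased estimates of the original ones.
R = rng.normal(scale=1.0 / np.sqrt(k), size=(d, k))
Y = X @ R

# Distance between the first two points before and after projection.
ratio = np.linalg.norm(Y[0] - Y[1]) / np.linalg.norm(X[0] - X[1])
```

Since distances are preserved up to a small multiplicative distortion, running a clustering algorithm on the k-dimensional data gives nearly the same partition at a fraction of the cost.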

  1. Fast clustering using adaptive density peak detection.

    PubMed

    Wang, Xiao-Feng; Xu, Yifan

    2017-12-01

    Common limitations of clustering methods include slow algorithm convergence, instability with respect to a number of pre-specified intrinsic parameters, and lack of robustness to outliers. A recent clustering approach proposed a fast search algorithm for cluster centers based on their local densities. However, the selection of the key intrinsic parameters in the algorithm was not systematically investigated. It is relatively difficult to estimate the "optimal" parameters since the original definition of the local density in the algorithm is based on a truncated counting measure. In this paper, we propose a clustering procedure with adaptive density peak detection, where the local density is estimated through nonparametric multivariate kernel estimation. The model parameter can then be calculated from equations with statistical theoretical justification. We also develop an automatic cluster centroid selection method through maximizing an average silhouette index. The advantage and flexibility of the proposed method are demonstrated through simulation studies and the analysis of a few benchmark gene expression data sets. The method runs in a single step without iteration and is therefore fast, with great potential for application to big data analysis. A user-friendly R package ADPclust is developed for public use.
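    The core idea, replacing the truncated counting density of the original density peak algorithm with a smooth kernel estimate, can be sketched as follows (the bandwidth and toy data are illustrative; this is not the ADPclust implementation):

```python
import numpy as np

def density_peaks(X, bandwidth=1.0):
    """For each point, rho is a smooth Gaussian-kernel local density
    (replacing the truncated count of the original algorithm) and
    delta is the distance to the nearest point of higher density;
    cluster centres stand out by having both values large."""
    d2 = np.sum((X[:, None, :] - X[None, :, :])**2, axis=-1)
    rho = np.exp(-d2 / (2.0 * bandwidth**2)).sum(axis=1)
    dist = np.sqrt(d2)
    delta = np.empty(len(X))
    for i in range(len(X)):
        higher = rho > rho[i]
        delta[i] = dist[i, higher].min() if higher.any() else dist[i].max()
    return rho, delta

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (40, 2)),   # cluster near (0, 0)
               rng.normal(4.0, 0.3, (40, 2))])  # cluster near (4, 4)
rho, delta = density_peaks(X, bandwidth=0.5)
centers = np.argsort(rho * delta)[-2:]  # two candidate cluster centres
```

Once the centres are picked, every remaining point is assigned to the same cluster as its nearest higher-density neighbour, so no iteration is needed.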

  2. Improved blood glucose estimation through multi-sensor fusion.

    PubMed

    Xiong, Feiyu; Hipszer, Brian R; Joseph, Jeffrey; Kam, Moshe

    2011-01-01

    Continuous glucose monitoring systems are an integral component of diabetes management. Efforts to improve the accuracy and robustness of these systems are at the forefront of diabetes research. Towards this goal, a multi-sensor approach was evaluated in hospitalized patients. In this paper, we report on a multi-sensor fusion algorithm to combine glucose sensor measurements in a retrospective fashion. The results demonstrate the algorithm's ability to improve the accuracy and robustness of the blood glucose estimation with current glucose sensor technology.
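    Although the fusion algorithm itself is not specified in the abstract, the classical building block for combining independent sensor readings is inverse-variance weighting, sketched here with hypothetical numbers:

```python
import numpy as np

def fuse(readings, variances):
    """Minimum-variance linear fusion of independent sensor readings:
    weight each sensor by the inverse of its error variance."""
    readings = np.asarray(readings, dtype=float)
    w = 1.0 / np.asarray(variances, dtype=float)
    estimate = (w * readings).sum() / w.sum()
    return estimate, 1.0 / w.sum()  # fused value and its variance

# Two hypothetical glucose sensors (mg/dL) with equal error variance.
est, var = fuse([102.0, 98.0], [4.0, 4.0])
```

With equal variances the fusion reduces to a plain average, and the fused variance (here 2.0) is always smaller than that of the best single sensor.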

  3. Improved Modeling of Three-Point Estimates for Decision Making: Going Beyond the Triangle

    DTIC Science & Technology

    2016-03-01

    Improved Modeling of Three-Point Estimates for Decision Making: Going Beyond the Triangle. Master's thesis by Daniel W. Mulligan, March 2016; Thesis Advisor: Mark Rhoades. Distribution unlimited.

  4. Improving the precision of dynamic forest parameter estimates using Landsat

    Treesearch

    Evan B. Brooks; John W. Coulston; Randolph H. Wynne; Valerie A. Thomas

    2016-01-01

    The use of satellite-derived classification maps to improve post-stratified forest parameter estimates is well established. When reducing the variance of post-stratification estimates for forest change parameters such as forest growth, it is logical to use a change-related strata map. At the stand level, a time series of Landsat images is

  5. Development of Energy Efficient Clustering Protocol in Wireless Sensor Network Using Neuro-Fuzzy Approach

    PubMed Central

    Julie, E. Golden; Selvi, S. Tamil

    2016-01-01

    Wireless sensor networks (WSNs) consist of sensor nodes with limited processing capability and limited nonrechargeable battery power. Energy consumption in WSNs is a significant issue for improving network lifetime. It is essential to develop an energy aware clustering protocol in WSN to reduce energy consumption and increase network lifetime. In this paper, a neuro-fuzzy energy aware clustering scheme (NFEACS) is proposed to form optimum and energy aware clusters. NFEACS consists of two parts: a fuzzy subsystem and a neural network system, which together achieve energy efficiency in forming clusters and cluster heads in WSN. NFEACS uses a neural network that provides an effective training set, related to the energy and received signal strength of all nodes, to estimate the expected energy of tentative cluster heads. Sensor nodes with higher energy are trained with the center location of the base station to select energy aware cluster heads. Fuzzy rules are used in the fuzzy logic part to form clusters. NFEACS is designed for WSNs with node mobility. The proposed scheme is compared with related clustering schemes: the cluster-head election mechanism using fuzzy logic, and energy aware fuzzy unequal clustering. The experiment results show that NFEACS performs better than the other related schemes. PMID:26881269

  6. A 1400-MHz survey of 1478 Abell clusters of galaxies

    NASA Technical Reports Server (NTRS)

    Owen, F. N.; White, R. A.; Hilldrup, K. C.; Hanisch, R. J.

    1982-01-01

    Observations of 1478 Abell clusters of galaxies with the NRAO 91-m telescope at 1400 MHz are reported. The measured beam shape was deconvolved from the measured source Gaussian fits in order to estimate the source size and position angle. All detected sources within 0.5 corrected Abell cluster radii are listed, including the cluster number, richness class, distance class, magnitude of the tenth brightest galaxy, redshift estimate, corrected cluster radius in arcmin, right ascension and error, declination and error, total flux density and error, and angular structure for each source.
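    The beam deconvolution mentioned above exploits the fact that Gaussian widths add in quadrature under convolution; a minimal sketch (the FWHM values in arcmin are illustrative, not from the survey):

```python
import math

def deconvolved_fwhm(measured_fwhm, beam_fwhm):
    """Gaussian convolution adds widths in quadrature, so the intrinsic
    source size follows from measured^2 = source^2 + beam^2."""
    diff = measured_fwhm**2 - beam_fwhm**2
    return math.sqrt(diff) if diff > 0.0 else 0.0  # unresolved -> 0

# Hypothetical fit: 5.0 arcmin measured with a 3.0 arcmin beam.
size = deconvolved_fwhm(5.0, 3.0)
```

When the measured size does not exceed the beam, the source is unresolved and only an upper limit on its angular extent can be quoted.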

  7. Improvements in prevalence trend fitting and incidence estimation in EPP 2013

    PubMed Central

    Brown, Tim; Bao, Le; Eaton, Jeffrey W.; Hogan, Daniel R.; Mahy, Mary; Marsh, Kimberly; Mathers, Bradley M.; Puckett, Robert

    2014-01-01

    Objective: Describe modifications to the latest version of the Joint United Nations Programme on AIDS (UNAIDS) Estimation and Projection Package component of Spectrum (EPP 2013) to improve prevalence fitting and incidence trend estimation in national epidemics and global estimates of HIV burden. Methods: Key changes made under the guidance of the UNAIDS Reference Group on Estimates, Modelling and Projections include: availability of a range of incidence calculation models and guidance for selecting a model; a shift to reporting the Bayesian median instead of the maximum likelihood estimate; procedures for comparison and validation against reported HIV and AIDS data; incorporation of national surveys as an integral part of the fitting and calibration procedure, allowing survey trends to inform the fit; improved antenatal clinic calibration procedures in countries without surveys; adjustment of national antiretroviral therapy reports used in the fitting to include only those aged 15–49 years; better estimates of mortality among people who inject drugs; and enhancements to speed fitting. Results: The revised models in EPP 2013 allow closer fits to observed prevalence trend data and reflect improving understanding of HIV epidemics and associated data. Conclusion: Spectrum and EPP continue to adapt to make better use of the existing data sources, incorporate new sources of information in their fitting and validation procedures, and correct for quantifiable biases in inputs as they are identified and understood. These adaptations provide countries with better calibrated estimates of incidence and prevalence, which increase epidemic understanding and provide a solid base for program and policy planning. PMID:25406747

  8. OPEN CLUSTERS AS PROBES OF THE GALACTIC MAGNETIC FIELD. I. CLUSTER PROPERTIES

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hoq, Sadia; Clemens, D. P., E-mail: shoq@bu.edu, E-mail: clemens@bu.edu

    2015-10-15

    Stars in open clusters are powerful probes of the intervening Galactic magnetic field via background starlight polarimetry because they provide constraints on the magnetic field distances. We use 2MASS photometric data for a sample of 31 clusters in the outer Galaxy for which near-IR polarimetric data were obtained to determine the cluster distances, ages, and reddenings via fitting theoretical isochrones to cluster color–magnitude diagrams. The fitting approach uses an objective χ² minimization technique to derive the cluster properties and their uncertainties. We found the ages, distances, and reddenings for 24 of the clusters, and the distances and reddenings for 6 additional clusters that were either sparse or faint in the near-IR. The derived ranges of log(age), distance, and E(B−V) were 7.25–9.63, ∼670–6160 pc, and 0.02–1.46 mag, respectively. The distance uncertainties ranged from ∼8% to 20%. The derived parameters were compared to previous studies, and most cluster parameters agree within our uncertainties. To test the accuracy of the fitting technique, synthetic clusters with 50, 100, or 200 cluster members and a wide range of ages were fit. These tests recovered the input parameters within their uncertainties for more than 90% of the individual synthetic cluster parameters. These results indicate that the fitting technique likely provides reliable estimates of cluster properties. The distances derived will be used in an upcoming study of the Galactic magnetic field in the outer Galaxy.
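    The χ² minimization used for isochrone fitting can be illustrated with a toy one-parameter version that fits only a distance modulus (the isochrone shape, photometric errors, and grid are invented for illustration, not the paper's models):

```python
import numpy as np

# Toy isochrone: absolute magnitude as a linear function of colour.
model_color = np.linspace(0.0, 1.0, 50)
model_absmag = 5.0 - 4.0 * model_color

# Synthetic "observed" cluster: the same sequence shifted by a true
# distance modulus of 10 mag (1 kpc) plus photometric noise.
true_dm = 10.0
rng = np.random.default_rng(2)
obs_color = rng.uniform(0.0, 1.0, 30)
obs_absmag = np.interp(obs_color, model_color, model_absmag)
obs_mag = obs_absmag + true_dm + rng.normal(0.0, 0.05, 30)

# Grid search: chi^2 between observed magnitudes and shifted isochrone.
dms = np.linspace(8.0, 12.0, 401)
chi2 = np.array([np.sum((obs_mag - (obs_absmag + dm))**2) / 0.05**2
                 for dm in dms])
best_dm = dms[int(np.argmin(chi2))]
dist_pc = 10.0 ** ((best_dm + 5.0) / 5.0)  # distance modulus -> parsecs
```

The real fit searches age and reddening as well, but the principle is the same: evaluate χ² on a grid of model parameters and take the minimum, with uncertainties from the shape of the χ² surface.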

  9. A random cluster survey and a convenience sample give comparable estimates of immunity to vaccine preventable diseases in children of school age in Victoria, Australia.

    PubMed

    Kelly, Heath; Riddell, Michaela A; Gidding, Heather F; Nolan, Terry; Gilbert, Gwendolyn L

    2002-08-19

    We compared estimates of the age-specific population immunity to measles, mumps, rubella, hepatitis B and varicella zoster viruses in Victorian school children obtained by a national sero-survey, using a convenience sample of residual sera from diagnostic laboratories throughout Australia, with those from a three-stage random cluster survey. When grouped according to school age (primary or secondary school) there was no significant difference in the estimates of immunity to measles, mumps, hepatitis B or varicella. Compared with the convenience sample, the random cluster survey estimated higher immunity to rubella in samples from both primary (98.7% versus 93.6%, P = 0.002) and secondary school students (98.4% versus 93.2%, P = 0.03). Despite some limitations, this study suggests that the collection of a convenience sample of sera from diagnostic laboratories is an appropriate sampling strategy to provide population immunity data that will inform Australia's current and future immunisation policies. Copyright 2002 Elsevier Science Ltd.
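    The rubella comparison above is a test of two proportions; a minimal sketch of the pooled z statistic (the per-arm sample size of 300 is hypothetical, not the survey's):

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """Pooled two-sample z statistic for a difference in proportions,
    under the null hypothesis that the two immunity levels are equal."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    return (p1 - p2) / se

# Rubella immunity in primary school students: 98.7% (cluster survey)
# vs 93.6% (convenience sample); n = 300 per arm is an assumption.
z = two_proportion_z(0.987, 300, 0.936, 300)
```

A z value around 3 corresponds to a two-sided p-value of roughly 0.002, in line with the significance reported for the primary school comparison.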

  10. Improving stochastic estimates with inference methods: calculating matrix diagonals.

    PubMed

    Selig, Marco; Oppermann, Niels; Ensslin, Torsten A

    2012-02-01

    Estimating the diagonal entries of a matrix that is not directly accessible, but only available as a linear operator in the form of a computer routine, is a common necessity in many computational applications, especially in image reconstruction and statistical inference. Here, methods of statistical inference are used to improve the accuracy or reduce the computational costs of matrix probing methods that estimate matrix diagonals. In particular, the generalized Wiener filter methodology, as developed within information field theory, is shown to significantly improve estimates based on only a few sampling probes, in cases in which some form of continuity of the solution can be assumed. The strength, length scale, and precise functional form of the exploited autocorrelation function of the matrix diagonal is determined from the probes themselves. The developed algorithm is successfully applied to mock and real world problems. These performance tests show that, in situations where a matrix diagonal has to be calculated from only a small number of computationally expensive probes, a speedup by a factor of 2 to 10 is possible with the proposed method. © 2012 American Physical Society
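    The baseline matrix probing estimate that the paper improves on can be sketched with Hutchinson-style ±1 probes, for which E[z ∘ (Az)] equals diag(A); the test matrix below is an arbitrary diagonally dominant example, not from the paper:

```python
import numpy as np

def probe_diagonal(apply_A, n, n_probes=500, rng=None):
    """Estimate diag(A) using only matrix-vector products: for i.i.d.
    +/-1 probe vectors z, the expectation of z * (A z) is diag(A)."""
    if rng is None:
        rng = np.random.default_rng(0)
    acc = np.zeros(n)
    for _ in range(n_probes):
        z = rng.choice([-1.0, 1.0], size=n)
        acc += z * apply_A(z)
    return acc / n_probes

rng = np.random.default_rng(3)
A = np.diag(np.linspace(1.0, 5.0, 50)) + 0.1 * rng.normal(size=(50, 50))
est = probe_diagonal(lambda v: A @ v, 50)
max_err = np.abs(est - np.diag(A)).max()
```

The per-entry error variance is the sum of squared off-diagonal entries in that row divided by the number of probes, which is exactly the slow convergence that the Wiener filter smoothing in the paper is designed to accelerate.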

  11. Estimating Function Approaches for Spatial Point Processes

    NASA Astrophysics Data System (ADS)

    Deng, Chong

    Spatial point pattern data consist of locations of events that are often of interest in biological and ecological studies. Such data are commonly viewed as a realization of a stochastic process called a spatial point process. To fit a parametric spatial point process model to such data, likelihood-based methods have been widely studied. However, while maximum likelihood estimation is often too computationally intensive for Cox and cluster processes, pairwise likelihood methods such as composite likelihood and Palm likelihood usually suffer from a loss of information because they ignore the correlation among pairs. For many types of correlated data other than spatial point processes, estimating functions have been widely used for model fitting when likelihood-based approaches are not desirable. In this dissertation, we explore estimating function approaches for fitting spatial point process models. These approaches, which are based on asymptotically optimal estimating function theory, can incorporate the correlation among data and yield more efficient estimators. We conducted a series of studies to demonstrate that these estimating function approaches are good alternatives for balancing the trade-off between computational complexity and estimation efficiency. First, we propose a new estimating procedure that improves the efficiency of the pairwise composite likelihood method in estimating clustering parameters. Our approach combines estimating functions derived from pairwise composite likelihood estimation with estimating functions that account for correlations among the pairwise contributions. Our method can be used to fit a variety of parametric spatial point process models and can yield more efficient estimators of the clustering parameters than pairwise composite likelihood estimation. We demonstrate its efficacy through a simulation study and an application to the longleaf pine data. Second, we further explore the quasi-likelihood approach on fitting

  12. Sample size calculation for stepped wedge and other longitudinal cluster randomised trials.

    PubMed

    Hooper, Richard; Teerenstra, Steven; de Hoop, Esther; Eldridge, Sandra

    2016-11-20

    The sample size required for a cluster randomised trial is inflated compared with an individually randomised trial because outcomes of participants from the same cluster are correlated. Sample size calculations for longitudinal cluster randomised trials (including stepped wedge trials) need to take account of at least two levels of clustering: the clusters themselves and times within clusters. We derive formulae for sample size for repeated cross-section and closed cohort cluster randomised trials with normally distributed outcome measures, under a multilevel model allowing for variation between clusters and between times within clusters. Our formulae agree with those previously described for special cases such as crossover and analysis of covariance designs, although simulation suggests that the formulae could underestimate required sample size when the number of clusters is small. Whether using a formula or simulation, a sample size calculation requires estimates of nuisance parameters, which in our model include the intracluster correlation, cluster autocorrelation, and individual autocorrelation. A cluster autocorrelation less than 1 reflects a situation where individuals sampled from the same cluster at different times have less correlated outcomes than individuals sampled from the same cluster at the same time. Nuisance parameters could be estimated from time series obtained in similarly clustered settings with the same outcome measure, using analysis of variance to estimate variance components. Copyright © 2016 John Wiley & Sons, Ltd.
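
    The inflation described above is often summarised, for a single level of clustering, by the textbook design effect DE = 1 + (m − 1)ρ, where m is the cluster size and ρ the intracluster correlation; the paper's multilevel formulae additionally involve the cluster and individual autocorrelations. A minimal sketch of the single-level case only (illustrative, not the authors' formula):

```python
import math

def inflated_sample_size(n_individual, cluster_size, icc):
    """Inflate an individually randomised sample size for cluster
    randomisation using the standard design effect DE = 1 + (m - 1) * rho."""
    design_effect = 1.0 + (cluster_size - 1) * icc
    # round before ceiling to avoid spurious float carry (e.g. 780.0000001)
    return math.ceil(round(n_individual * design_effect, 6))
```

For example, 400 participants per arm with clusters of 20 and an ICC of 0.05 gives DE = 1.95, i.e. 780 participants per arm.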

  13. Structural parameters of young star clusters: fractal analysis

    NASA Astrophysics Data System (ADS)

    Hetem, A.

    2017-07-01

    A unified view of star formation in the Universe demands detailed and in-depth studies of young star clusters. This work extends our previous study of fractal statistics estimated for a sample of young stellar clusters (Gregorio-Hetem et al. 2015, MNRAS 448, 2504). The structural properties can lead to significant conclusions about the early stages of cluster formation: 1) virial conditions can be used to distinguish warm collapse; 2) bound or unbound behaviour can lead to conclusions about expansion; and 3) fractal statistics are correlated with dynamical evolution and age. The error-bar estimation technique most used in the literature is to adopt inferential methods (like bootstrap) to estimate deviation and variance, which are valid only for an artificially generated cluster. In this paper, we expanded the number of studied clusters in order to deepen the investigation of cluster properties and dynamical evolution. The structural parameters were compared with fractal statistics, revealing that the clusters' radial density profiles show a tendency for the mean separation of the stars to increase with the average surface density. The sample can be divided into two groups showing different dynamic behaviour, but they share the same dynamical evolution, since the entire sample was revealed to consist of expanding objects whose substructures do not seem to have been completely erased. These results are in agreement with simulations adopting low surface densities and supervirial conditions.

  14. Factors influencing the quality of life of haemodialysis patients according to symptom cluster.

    PubMed

    Shim, Hye Yeung; Cho, Mi-Kyoung

    2018-05-01

    To identify the characteristics of each symptom cluster and the factors influencing the quality of life (QOL) of haemodialysis patients in Korea according to cluster. Despite developments in renal replacement therapy, haemodialysis still restricts the activities of daily living owing to pain and impaired physical functioning induced by the disease and its complications. Descriptive survey. Two hundred and thirty dialysis patients aged >18 years completed self-administered questionnaires comprising the Dialysis Symptom Index and the Kidney Disease Quality of Life instrument-Short Form 1.3. To determine the optimal number of clusters, the collected data were analysed using polytomous variable latent class analysis in R (poLCA) to estimate latent class models and latent class regression models for polytomous outcome variables. Differences in characteristics, symptoms and QOL according to symptom cluster were analysed using the independent t test and chi-square test. The factors influencing QOL in each symptom cluster were identified using hierarchical multiple regression analysis. Physical and emotional symptoms were significantly more severe, and QOL significantly worse, in Cluster 1 than in Cluster 2. The factors influencing QOL were spouse, job, insurance type and physical and emotional symptoms in Cluster 1, with these variables having an explanatory power of 60.9%. Physical and emotional symptoms were the only influencing factors in Cluster 2, with an explanatory power of 37.4%. Mitigating the symptoms experienced by haemodialysis patients and improving their QOL require educational and therapeutic symptom management interventions tailored to the characteristics and symptoms of each cluster. The findings of this study are expected to lead to practical guidelines for addressing the symptoms experienced by haemodialysis patients, and they provide basic information for developing nursing

  15. Morphological estimators on Sunyaev-Zel'dovich maps of MUSIC clusters of galaxies

    NASA Astrophysics Data System (ADS)

    Cialone, Giammarco; De Petris, Marco; Sembolini, Federico; Yepes, Gustavo; Baldi, Anna Silvia; Rasia, Elena

    2018-06-01

    The determination of the morphology of galaxy clusters has important repercussions for cosmological and astrophysical studies of them. In this paper, we address the morphological characterization of synthetic maps of the Sunyaev-Zel'dovich (SZ) effect for a sample of 258 massive clusters (Mvir > 5 × 10¹⁴ h⁻¹ M⊙ at z = 0), extracted from the MUSIC hydrodynamical simulations. Specifically, we use five known morphological parameters (already used in X-ray studies) and two newly introduced ones, and we combine them into a single parameter. We analyse two sets of simulations obtained with different prescriptions for the gas physics (non-radiative, and with cooling, star formation and stellar feedback) at four redshifts between 0.43 and 0.82. For each parameter, we test its stability and efficiency in discriminating the true cluster dynamical state, as measured by theoretical indicators. The combined parameter is more efficient at discriminating between relaxed and disturbed clusters. This parameter shows a mild correlation (˜0.3) with the hydrostatic mass and a strong correlation (˜0.8) with the offset between the SZ centroid and the cluster centre of mass. The latter quantity is thus the most accessible and efficient indicator of the dynamical state for SZ studies.

  16. A bias correction for covariance estimators to improve inference with generalized estimating equations that use an unstructured correlation matrix.

    PubMed

    Westgate, Philip M

    2013-07-20

    Generalized estimating equations (GEEs) are routinely used for the marginal analysis of correlated data. The efficiency of GEE depends on how closely the working covariance structure resembles the true structure, and therefore accurate modeling of the working correlation of the data is important. A popular approach is the use of an unstructured working correlation matrix, as it is not as restrictive as simpler structures such as exchangeable and AR-1 and thus can theoretically improve efficiency. However, because of the potential for having to estimate a large number of correlation parameters, variances of regression parameter estimates can be larger than theoretically expected when utilizing the unstructured working correlation matrix. Therefore, standard error estimates can be negatively biased. To account for this additional finite-sample variability, we derive a bias correction that can be applied to typical estimators of the covariance matrix of parameter estimates. Via simulation and in application to a longitudinal study, we show that our proposed correction improves standard error estimation and statistical inference. Copyright © 2012 John Wiley & Sons, Ltd.

  17. Improved Event Location Uncertainty Estimates

    DTIC Science & Technology

    2006-09-21

    For validation purposes, we use GT0-2 event clusters. These include the Nevada, Lop Nor, Semipalatinsk, and Novaya Zemlya test sites, as well as the Azgir...uncertainties. Furthermore, the tails of real seismic data distributions are heavier than Gaussian. The main objectives of this project are to develop, test

  18. Applications of cluster analysis to satellite soundings

    NASA Technical Reports Server (NTRS)

    Munteanu, M. J.; Jakubowicz, O.; Kalnay, E.; Piraino, P.

    1984-01-01

    The advantages of using cluster analysis to improve satellite temperature retrievals were evaluated. Natural clusters, which are associated with atmospheric temperature soundings characteristic of different types of air masses, have the potential to improve stratified regression schemes compared with currently used methods, which stratify soundings by latitude, season, and land/ocean. The method of discriminant analysis was used. The correct cluster of temperature profiles was located from satellite measurements in 85% of the cases. Considerable improvement was observed at all mandatory levels using regression retrievals derived in the clusters of temperature (weighted and nonweighted), in comparison with the control experiment and with regression retrievals derived in the clusters of brightness temperatures of 3 MSU and 5 IR channels.

  19. IMPROVING EMISSIONS ESTIMATES WITH COMPUTATIONAL INTELLIGENCE, DATABASE EXPANSION, AND COMPREHENSIVE VALIDATION

    EPA Science Inventory

    The report discusses an EPA investigation of techniques to improve methods for estimating volatile organic compound (VOC) emissions from area sources. Using the automobile refinishing industry for a detailed area source case study, an emission estimation method is being developed...

  20. Cluster randomised crossover trials with binary data and unbalanced cluster sizes: application to studies of near-universal interventions in intensive care.

    PubMed

    Forbes, Andrew B; Akram, Muhammad; Pilcher, David; Cooper, Jamie; Bellomo, Rinaldo

    2015-02-01

    Cluster randomised crossover trials have been utilised in recent years in the health and social sciences. Methods for analysis have been proposed; however, for binary outcomes, these have received little assessment of their appropriateness. In addition, methods for determination of sample size are currently limited to balanced cluster sizes both between clusters and between periods within clusters. This article aims to extend this work to unbalanced situations and to evaluate the properties of a variety of methods for analysis of binary data, with a particular focus on the setting of potential trials of near-universal interventions in intensive care to reduce in-hospital mortality. We derive a formula for sample size estimation for unbalanced cluster sizes, and apply it to the intensive care setting to demonstrate the utility of the cluster crossover design. We conduct a numerical simulation of the design in the intensive care setting and for more general configurations, and we assess the performance of three cluster summary estimators and an individual-data estimator based on binomial-identity-link regression. For settings similar to the intensive care scenario involving large cluster sizes and small intra-cluster correlations, the sample size formulae developed and analysis methods investigated are found to be appropriate, with the unweighted cluster summary method performing well relative to the more optimal but more complex inverse-variance weighted method. More generally, we find that the unweighted and cluster-size-weighted summary methods perform well, with the relative efficiency of each largely determined systematically from the study design parameters. Performance of individual-data regression is adequate with small cluster sizes but becomes inefficient for large, unbalanced cluster sizes. When outcome prevalences are 6% or less and the within-cluster-within-period correlation is 0.05 or larger, all methods display sub-nominal confidence interval coverage.

  1. Fuzzy Subspace Clustering

    NASA Astrophysics Data System (ADS)

    Borgelt, Christian

    In clustering we often face the situation that only a subset of the available attributes is relevant for forming clusters, even though this may not be known beforehand. In such cases it is desirable to have a clustering algorithm that automatically weights attributes or even selects a proper subset. In this paper I study such an approach for fuzzy clustering, which is based on the idea of transferring an alternative to the fuzzifier (Klawonn and Höppner, What is fuzzy about fuzzy clustering? Understanding and improving the concept of the fuzzifier, In: Proc. 5th Int. Symp. on Intelligent Data Analysis, 254-264, Springer, Berlin, 2003) to attribute-weighting fuzzy clustering (Keller and Klawonn, Int J Uncertain Fuzziness Knowl Based Syst 8:735-746, 2000). In addition, by reformulating Gustafson-Kessel fuzzy clustering, a scheme for weighting and selecting principal axes can be obtained. While in Borgelt (Feature weighting and feature selection in fuzzy clustering, In: Proc. 17th IEEE Int. Conf. on Fuzzy Systems, IEEE Press, Piscataway, NJ, 2008) I already presented such an approach for a global selection of attributes and principal axes, this paper extends it to a cluster-specific selection, thus arriving at a fuzzy subspace clustering algorithm (Parsons, Haque, and Liu, 2004).

  2. Utilizing Hierarchical Clustering to improve Efficiency of Self-Organizing Feature Map to Identify Hydrological Homogeneous Regions

    NASA Astrophysics Data System (ADS)

    Farsadnia, Farhad; Ghahreman, Bijan

    2016-04-01

    Hydrologic homogeneous group identification is considered both fundamental and applied research in hydrology. Clustering methods are among the conventional methods for assessing hydrologically homogeneous regions. Recently, the Self-Organizing feature Map (SOM) method has been applied in some studies. However, the main problem with this method is interpreting its output map. Therefore, SOM is used as input to other clustering algorithms. The aim of this study is to apply a two-level Self-Organizing feature Map and Ward hierarchical clustering method to determine the hydrologically homogeneous regions in the North and Razavi Khorasan provinces. First, we reduced the dimension of the SOM input matrix by principal component analysis; the SOM was then used to form a two-dimensional feature map. To determine homogeneous regions for flood frequency analysis, the SOM output nodes were used as input to the Ward method. Generally, the regions identified by clustering algorithms are not statistically homogeneous; consequently, they have to be adjusted to improve their homogeneity. After adjusting the regions using L-moment tests, five hydrologically homogeneous regions were identified. Finally, adjusted regions were created by a two-level SOM, and the best regional distribution function and associated parameters were selected by the L-moment approach. The results showed that the combination of self-organizing maps and Ward hierarchical clustering, with principal components as input, is more effective than the hierarchical method with principal components or standardized inputs for identifying hydrologically homogeneous regions.
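
    The two-level idea above — train a map down to a small set of prototype nodes, then group the nodes hierarchically — can be sketched as follows, with plain k-means standing in for SOM training (a real SOM additionally preserves topological neighbourhoods); the function name and parameters are illustrative:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def two_level_cluster(X, n_prototypes=8, n_clusters=2, n_iter=25):
    """Two-level clustering: quantise the data into a few prototypes
    (a stand-in for trained SOM nodes), then merge the prototypes with
    Ward hierarchical clustering and cut the tree into n_clusters."""
    # Level 1: simple k-means quantisation with deterministic strided init.
    stride = max(1, len(X) // n_prototypes)
    protos = X[::stride][:n_prototypes].astype(float).copy()
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - protos[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)          # nearest prototype per point
        for k in range(len(protos)):
            if np.any(labels == k):
                protos[k] = X[labels == k].mean(axis=0)
    # Level 2: Ward linkage on the prototypes, cut into n_clusters groups.
    groups = fcluster(linkage(protos, method="ward"), n_clusters,
                      criterion="maxclust")
    return groups[labels]  # each point inherits its prototype's group
```

Working on a handful of prototypes rather than the raw data is what makes the second-level Ward step cheap, mirroring why SOM output nodes are fed to the hierarchical method in the study.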

  3. An Improved BeiDou-2 Satellite-Induced Code Bias Estimation Method.

    PubMed

    Fu, Jingyang; Li, Guangyun; Wang, Li

    2018-04-27

    Unlike GPS, GLONASS, GALILEO and BeiDou-3, it is confirmed that code multipath biases (CMBs), which originate at the satellite end and can exceed 1 m, are commonly found in the code observations of BeiDou-2 (BDS) IGSO and MEO satellites. In order to mitigate their adverse effects on precise applications that use code measurements, we propose an improved correction model to estimate the CMB. Unlike the traditional model, which considers the correction values to be orbit-type dependent (estimating two sets of values for IGSO and MEO, respectively) and models the CMB as a piecewise linear function with an elevation node separation of 10°, we estimate corrections for each BDS IGSO + MEO satellite individually and use a denser elevation node separation of 5° to model the CMB variations. Currently, institutions such as IGS-MGEX operate over 120 stations providing daily BDS observations. These large amounts of data provide adequate support to refine the CMB estimation satellite by satellite in our improved model. One month of BDS observations from MGEX is used to assess the performance of the improved CMB model by means of precise point positioning (PPP). Experimental results show that, for satellites on the same orbit type, obvious differences can be found in the CMB at the same node and frequency. Results show that the new correction model can improve the wide-lane (WL) ambiguity usage rate for WL fractional cycle bias estimation, shorten the WL and narrow-lane (NL) time to first fix (TTFF) in PPP ambiguity resolution (AR), and improve the PPP positioning accuracy. With our improved correction model, the usage of WL ambiguities is increased from 94.1% to 96.0%, and the WL and NL TTFF of PPP AR are shortened from 10.6 to 9.3 min and from 67.9 to 63.3 min, respectively, compared with the traditional correction model. In addition, both the traditional and improved CMB model have a
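
    Applying a piecewise-linear correction model over elevation nodes amounts to a table lookup with linear interpolation between the 5° nodes. A sketch with hypothetical node values (the cosine curve is a placeholder; real corrections are estimated per satellite from MGEX observations):

```python
import numpy as np

# Hypothetical per-satellite correction values (metres) at 5-degree
# elevation nodes; in the refined model these are estimated
# satellite-by-satellite from one month of MGEX code observations.
nodes_deg = np.arange(0, 95, 5)                      # 0, 5, ..., 90 degrees
corrections_m = 0.4 * np.cos(np.radians(nodes_deg))  # placeholder curve

def cmb_correction(elevation_deg):
    """Piecewise-linear interpolation of the code multipath bias
    correction at an arbitrary elevation angle."""
    return np.interp(elevation_deg, nodes_deg, corrections_m)
```

The traditional model differs only in using 10° node spacing and one shared table per orbit type (IGSO or MEO) instead of one per satellite.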

  5. Spatial scan statistics for detection of multiple clusters with arbitrary shapes.

    PubMed

    Lin, Pei-Sheng; Kung, Yi-Hung; Clayton, Murray

    2016-12-01

    In applying scan statistics for public health research, it would be valuable to develop a detection method for multiple clusters that accommodates spatial correlation and covariate effects in an integrated model. In this article, we connect the concepts of the likelihood ratio (LR) scan statistic and the quasi-likelihood (QL) scan statistic to provide a series of detection procedures sufficiently flexible to apply to clusters of arbitrary shape. First, we use an independent scan model for detection of clusters and then a variogram tool to examine the existence of spatial correlation and regional variation based on residuals of the independent scan model. When the estimate of regional variation is significantly different from zero, a mixed QL estimating equation is developed to estimate coefficients of geographic clusters and covariates. We use the Benjamini-Hochberg procedure (1995) to find a threshold for p-values to address the multiple testing problem. A quasi-deviance criterion is used to regroup the estimated clusters to find geographic clusters with arbitrary shapes. We conduct simulations to compare the performance of the proposed method with other scan statistics. For illustration, the method is applied to enterovirus data from Taiwan. © 2016, The International Biometric Society.
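
    The LR scan statistic underlying such methods can be illustrated for Poisson counts with Kulldorff's log-likelihood ratio evaluated over candidate windows; this sketch uses fixed index-set windows rather than the arbitrary-shape regrouping and quasi-likelihood machinery proposed in the paper:

```python
import numpy as np

def poisson_scan_llr(cases, expected, windows):
    """Likelihood-ratio scan statistic for Poisson counts.

    `expected` must be scaled so it sums to the total case count C.
    For each candidate window (an index set), compute Kulldorff's LLR
    comparing the inside rate with the outside rate, and return the
    highest-scoring window together with its LLR.
    """
    C = cases.sum()
    best_window, best_llr = None, 0.0
    for w in windows:
        c = cases[w].sum()      # observed cases inside the window
        e = expected[w].sum()   # expected cases inside the window
        if e < c < C:           # only elevated-rate, proper-subset windows
            llr = (c * np.log(c / e)
                   + (C - c) * np.log((C - c) / (C - e)))
            if llr > best_llr:
                best_window, best_llr = w, llr
        # c <= e (no excess) or c == C contributes no evidence here
    return best_window, best_llr
```

In practice the significance of the maximum LLR is assessed by Monte Carlo replication under the null, since the maximum over windows has no simple closed-form distribution.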

  6. Effectiveness of a community-based nutrition programme to improve child growth in rural Ethiopia: a cluster randomized trial.

    PubMed

    Kang, Yunhee; Kim, Sungtae; Sinamo, Sisay; Christian, Parul

    2017-01-01

    Few trials have shown that promoting complementary feeding among young children is effective in improving child linear growth in resource-challenged settings. We designed a community-based participatory nutrition promotion (CPNP) programme adapting a Positive Deviance/Hearth approach that engaged mothers in 2-week nutrition sessions using the principles of 'learning by doing' around child feeding. We aimed to test the effectiveness of the CPNP for improving child growth in rural Ethiopia. A cluster randomized trial was implemented by adding the CPNP to the existing government nutrition programmes (six clusters) vs. government programmes only (six clusters). A total of 1790 children aged 6 to 12 months (876 in the intervention and 914 in the control areas) were enrolled and assessed on anthropometry every 3 months for a year. Multi-level mixed-effect regression analysis of longitudinal outcome data (n = 1475) examined the programme impact on growth, adjusting for clustering and enrollment characteristics. Compared with children 6 to 24 months of age in the control area, those in the intervention area had a greater increase in z scores for length-for-age [difference (diff): 0.021 z score/month, 95% CI: 0.008, 0.034] and weight-for-length (diff: 0.042 z score/month, 95% CI: 0.024, 0.059). At the end of the 12-month follow-up, children in the intervention area showed an 8.1% (P = 0.02) and 6.3% (P = 0.046) lower prevalence of stunting and underweight, respectively, after controlling for differences in the prevalence at enrollment, compared with the control group. A novel CPNP programme was effective in improving child growth and reducing undernutrition in this setting. © 2016 John Wiley & Sons Ltd.

  7. Measuring the scatter in the cluster optical richness-mass relation with machine learning

    NASA Astrophysics Data System (ADS)

    Boada, Steven Alvaro

    The distribution of massive clusters of galaxies depends strongly on the total cosmic mass density, the mass variance, and the dark energy equation of state. As such, measures of galaxy clusters can provide constraints on these parameters and even test models of gravity, but only if observations of clusters can lead to accurate estimates of their total masses. Here, we carry out a study to investigate the ability of a blind spectroscopic survey to recover accurate galaxy cluster masses through their line-of-sight velocity dispersions (LOSVDs) using probability-based and machine learning methods. We focus on the Hobby Eberly Telescope Dark Energy Experiment (HETDEX), which will employ new Visible Integral-Field Replicable Unit Spectrographs (VIRUS) over 420 deg² on the sky with a 1/4.5 fill factor. VIRUS covers the blue/optical portion of the spectrum (3500-5500 Å), allowing surveys to measure redshifts for a large sample of galaxies out to z < 0.5 based on their absorption or emission (e.g., [O II], Mg II, Ne V) features. We use a detailed mock galaxy catalog from a semi-analytic model to simulate surveys observed with VIRUS, including: (1) Survey, a blind, HETDEX-like survey with an incomplete but uniform spectroscopic selection function; and (2) Targeted, a survey which targets clusters directly, obtaining spectra of all galaxies in a VIRUS-sized field. For both surveys, we include realistic uncertainties from galaxy magnitude and line-flux limits. We benchmark both surveys against spectroscopic observations with "perfect" knowledge of galaxy line-of-sight velocities. With Survey observations, we can recover cluster masses to ˜0.1 dex, which can be further improved to <0.1 dex with Targeted observations. This level of cluster mass recovery provides important measurements of the intrinsic scatter in the optical richness-cluster mass relation, and enables constraints on the key cosmological parameter, σ8, to <20%. As a demonstration of the methods
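
    The link from an LOSVD to a cluster mass rests on virial scaling, M ≈ A σ² R / G; the dimensionless coefficient A depends on the assumed mass profile and orbit anisotropy, and the value used below is purely illustrative (the record's study instead calibrates the mapping with probability-based and machine learning methods):

```python
G = 6.674e-11     # gravitational constant, m^3 kg^-1 s^-2
MSUN = 1.989e30   # solar mass, kg
MPC = 3.086e22    # megaparsec, m

def dynamical_mass(sigma_los_kms, radius_mpc, coeff=5.0):
    """Order-of-magnitude cluster mass from a line-of-sight velocity
    dispersion via M ~ coeff * sigma^2 * R / G (virial scaling); the
    coefficient 5.0 is an illustrative choice, not a calibrated value.
    Returns the mass in solar masses."""
    sigma = sigma_los_kms * 1e3          # km/s -> m/s
    return coeff * sigma**2 * radius_mpc * MPC / G / MSUN
```

A dispersion of ~1000 km/s within ~1 Mpc lands around 10¹⁵ M⊙, the regime of the massive clusters targeted here; the ˜0.1 dex recovery quoted above refers to the scatter around such estimates.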

  8. Weighing the Giants - I. Weak-lensing masses for 51 massive galaxy clusters: project overview, data analysis methods and cluster images

    NASA Astrophysics Data System (ADS)

    von der Linden, Anja; Allen, Mark T.; Applegate, Douglas E.; Kelly, Patrick L.; Allen, Steven W.; Ebeling, Harald; Burchat, Patricia R.; Burke, David L.; Donovan, David; Morris, R. Glenn; Blandford, Roger; Erben, Thomas; Mantz, Adam

    2014-03-01

    This is the first in a series of papers in which we measure accurate weak-lensing masses for 51 of the most X-ray luminous galaxy clusters known at redshifts 0.15 ≲ zCl ≲ 0.7, in order to calibrate X-ray and other mass proxies for cosmological cluster experiments. The primary aim is to improve the absolute mass calibration of cluster observables, currently the dominant systematic uncertainty for cluster count experiments. Key elements of this work are the rigorous quantification of systematic uncertainties, high-quality data reduction and photometric calibration, and the `blind' nature of the analysis to avoid confirmation bias. Our target clusters are drawn from X-ray catalogues based on the ROSAT All-Sky Survey, and provide a versatile calibration sample for many aspects of cluster cosmology. We have acquired wide-field, high-quality imaging using the Subaru Telescope and Canada-France-Hawaii Telescope for all 51 clusters, in at least three bands per cluster. For a subset of 27 clusters, we have data in at least five bands, allowing accurate photometric redshift estimates of lensed galaxies. In this paper, we describe the cluster sample and observations, and detail the processing of the SuprimeCam data to yield high-quality images suitable for robust weak-lensing shape measurements and precision photometry. For each cluster, we present wide-field three-colour optical images and maps of the weak-lensing mass distribution, the optical light distribution and the X-ray emission. These provide insights into the large-scale structure in which the clusters are embedded. We measure the offsets between X-ray flux centroids and the brightest cluster galaxies in the clusters, finding these to be small in general, with a median of 20 kpc. For offsets ≲100 kpc, weak-lensing mass measurements centred on the brightest cluster galaxies agree well with values determined relative to the X-ray centroids; miscentring is therefore not a significant source of systematic

  9. Color-magnitude diagrams for six metal-rich, low-latitude globular clusters

    NASA Technical Reports Server (NTRS)

    Armandroff, Taft E.

    1988-01-01

    Colors and magnitudes have been derived for stars on CCD frames of six metal-rich, low-latitude, previously unstudied globular clusters and one well-studied, metal-rich cluster (47 Tuc), and color-magnitude diagrams have been constructed. The photometry for stars in 47 Tuc is in good agreement with previous studies, while the V magnitudes of the horizontal-branch stars in the six program clusters do not agree with estimates based on secondary methods. The distances to these clusters are different from prior estimates. Reddening values are derived for each program cluster. The horizontal branches of the program clusters all appear to lie entirely redwards of the red edge of the instability strip, as is normal for their metallicities.

  10. Improving Empirical Approaches to Estimating Local Greenhouse Gas Emissions

    NASA Astrophysics Data System (ADS)

    Blackhurst, M.; Azevedo, I. L.; Lattanzi, A.

    2016-12-01

    Evidence increasingly indicates our changing climate will have significant global impacts on public health, economies, and ecosystems. As a result, local governments have become increasingly interested in climate change mitigation. In the U.S., cities and counties representing nearly 15% of the domestic population plan to reduce 300 million metric tons of greenhouse gases over the next 40 years (or approximately 1 ton per capita). Local governments estimate greenhouse gas emissions to establish mitigation goals and select supporting mitigation measures. However, current practices produce greenhouse gas estimates - also known as a "greenhouse gas inventory" - of empirical quality often insufficient for robust mitigation decision making. Namely, current mitigation planning uses sporadic, annual, and deterministic estimates disaggregated by broad end-use sector, obscuring sources of emissions uncertainty, variability, and exogeneity that influence mitigation opportunities. As part of AGU's Thriving Earth Exchange, Ari Lattanzi of the City of Pittsburgh, PA, recently partnered with Dr. Inez Lima Azevedo (Carnegie Mellon University) and Dr. Michael Blackhurst (University of Pittsburgh) to improve the empirical approach to characterizing Pittsburgh's greenhouse gas emissions. The project will produce first-order estimates of the underlying sources of uncertainty, variability, and exogeneity influencing Pittsburgh's greenhouse gases and discuss implications for mitigation decision making. The results of the project will enable local governments to collect more robust greenhouse gas inventories to better support their mitigation goals and improve measurement and verification efforts.

  11. Stochastic coupled cluster theory: Efficient sampling of the coupled cluster expansion

    NASA Astrophysics Data System (ADS)

    Scott, Charles J. C.; Thom, Alex J. W.

    2017-09-01

    We consider the sampling of the coupled cluster expansion within stochastic coupled cluster theory. Observing the limitations of previous approaches due to the inherently non-linear behavior of a coupled cluster wavefunction representation, we propose new approaches based on an intuitive, well-defined condition for sampling weights and on sampling the expansion in cluster operators of different excitation levels. We term these modifications even and truncated selections, respectively. Utilising both approaches demonstrates dramatically improved calculation stability as well as reduced computational and memory costs. These modifications are particularly effective at higher truncation levels owing to the large number of terms within the cluster expansion that can be neglected, as demonstrated by the reduction of the number of terms to be sampled when truncating at triple excitations by 77% and hextuple excitations by 98%.

  12. Using Appendicitis to Improve Estimates of Childhood Medicaid Participation Rates.

    PubMed

    Silber, Jeffrey H; Zeigler, Ashley E; Reiter, Joseph G; Hochman, Lauren L; Ludwig, Justin M; Wang, Wei; Calhoun, Shawna R; Pati, Susmita

    2018-03-23

    Administrative data are often used to estimate state Medicaid/Children's Health Insurance Program duration of enrollment and insurance continuity, but they are generally not used to estimate participation (the fraction of eligible children enrolled) because administrative data do not include reasons for disenrollment and cannot observe eligible never-enrolled children, causing estimates of the eligible unenrolled to be inaccurate. Analysts are therefore forced either to utilize survey information that is not generally linkable to administrative claims or to rely on duration and continuity measures derived from administrative data and forgo estimating claims-based participation. We introduce appendectomy-based participation (ABP), which estimates statewide participation rates from claims by taking advantage of a natural experiment around statewide appendicitis admissions. We used the Medicaid Analytic eXtract (MAX) for 2008-2010 and the American Community Survey (ACS) for 2008-2010 from 43 states to calculate ABP, the continuity ratio, duration, and ACS-based participation. In the validation study, the median participation rate using ABP was 86%, versus 87% for ACS-based participation estimates using logical edits and 84% without logical edits. The correlation between ABP and ACS with or without logical edits was 0.86 (P < .0001). Using regression analysis, ABP alone was a significant predictor of ACS (P < .0001) with or without logical edits, and adding duration and/or the continuity ratio did not significantly improve the model. Using the ABP rate derived from administrative claims (MAX) is a valid method to estimate statewide public insurance participation rates in children. Copyright © 2018 Academic Pediatric Association. Published by Elsevier Inc. All rights reserved.

  13. Probing the dynamical and X-ray mass proxies of the cluster of galaxies Abell S1101

    NASA Astrophysics Data System (ADS)

    Rabitz, Andreas; Zhang, Yu-Ying; Schwope, Axel; Verdugo, Miguel; Reiprich, Thomas H.; Klein, Matthias

    2017-01-01

    Context. The galaxy cluster Abell S1101 (S1101 hereafter) deviates significantly from the X-ray luminosity versus velocity dispersion relation (L-σ) of galaxy clusters in our previous study. Given the reliable X-ray luminosity measurement, which combines XMM-Newton and ROSAT data, this deviation is most likely caused by a bias in the velocity dispersion due to interlopers and poor member statistics in the previous sample of member galaxies, which was based solely on 20 galaxy redshifts drawn from the literature. Aims: We intend to increase the member galaxy statistics to perform precision measurements of the velocity dispersion and dynamical mass of S1101. We aim for a detailed substructure and dynamical state characterization of this cluster, and a comparison of mass estimates derived from (I) the velocity dispersion (Mvir), (II) the caustic mass computation (Mcaustic), and (III) mass proxies from X-ray observations and the Sunyaev-Zel'dovich (SZ) effect. Methods: We carried out new optical spectroscopic observations of the galaxies in this cluster field with VIMOS, obtaining a sample of 60 member galaxies for S1101. We revised the cluster redshift and velocity dispersion measurements based on this sample and also applied the Dressler-Shectman substructure test. Results: The completeness of cluster members within r200 was significantly improved for this cluster. Tests for dynamical substructure do not show evidence of major disturbances or merging activities in S1101. We find good agreement between the dynamical cluster mass measurements and X-ray mass estimates, which confirms the relaxed state of the cluster displayed in the 2D substructure test. The SZ mass proxy is slightly higher than the other estimates. The updated measurement of σ removes the deviation of S1101 from the L-σ relation. We also note a background structure in the cluster field of S1101: a galaxy group that is very close to S1101 in projection but at almost twice its redshift.

  14. A Self-Adaptive Fuzzy c-Means Algorithm for Determining the Optimal Number of Clusters

    PubMed Central

    Wang, Zhihao; Yi, Jing

    2016-01-01

    To address the shortcoming of the fuzzy c-means (FCM) algorithm, which requires the number of clusters to be known in advance, this paper proposed a new self-adaptive method to determine the optimal number of clusters. Firstly, a density-based algorithm was put forward. According to the characteristics of the dataset, the algorithm automatically determined the possible maximum number of clusters, instead of relying on the empirical rule √n, and obtained optimal initial cluster centroids, mitigating the limitation of FCM that randomly selected centroids can lead convergence to a local minimum. Secondly, by introducing a penalty function, this paper proposed a new fuzzy clustering validity index based on fuzzy compactness and separation, which ensured that, as the number of clusters approached the number of objects in the dataset, the index value did not decrease monotonically toward zero, so that the optimal number of clusters retained robustness and the index remained usable as a decision function. Then, based on these studies, a self-adaptive FCM algorithm was put forward to estimate the optimal number of clusters by an iterative trial-and-error process. At last, experiments were done on the UCI, KDD Cup 1999, and synthetic datasets, which showed that the method not only effectively determined the optimal number of clusters, but also reduced the number of FCM iterations while producing a stable clustering result. PMID:28042291
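    The self-adaptive parts of the paper (the density-based initialization and the penalty-based validity index) are not fully specified in the abstract, but the core FCM loop they build on is standard. A minimal NumPy sketch of that core, using a simple farthest-first seeding in place of the paper's density-based initialization:

```python
import numpy as np

def farthest_first(X, c):
    """Deterministic seeding (stand-in for the paper's density-based init):
    start at X[0], repeatedly add the point farthest from the current centres."""
    centers = [X[0]]
    for _ in range(c - 1):
        d2 = np.min([((X - cc) ** 2).sum(axis=1) for cc in centers], axis=0)
        centers.append(X[d2.argmax()])
    return np.array(centers, dtype=float)

def fuzzy_c_means(X, c, m=2.0, n_iter=100):
    """Core FCM loop: alternate membership and centroid updates."""
    centers = farthest_first(X, c)
    for _ in range(n_iter):
        # squared distances from every point to every centre, shape (n, c)
        d2 = ((X[:, None, :] - centers[None]) ** 2).sum(axis=-1)
        d2 = np.maximum(d2, 1e-12)                # avoid division by zero
        inv = d2 ** (-1.0 / (m - 1.0))            # u_ik ∝ d_ik^(-1/(m-1))
        U = inv / inv.sum(axis=1, keepdims=True)  # memberships; rows sum to 1
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
    return centers, U

# two well-separated 1-D groups; hard labels come from the max membership
X = np.array([[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]])
centers, U = fuzzy_c_means(X, c=2)
labels = U.argmax(axis=1)
```

The validity-index machinery the paper adds would then be evaluated on `U` for each candidate c in the trial-and-error loop.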

  15. X-Ray Binaries and Star Clusters in the Antennae: Optical Cluster Counterparts

    NASA Astrophysics Data System (ADS)

    Rangelov, Blagoy; Chandar, Rupali; Prestwich, Andrea; Whitmore, Bradley C.

    2012-10-01

    We compare the locations of 82 X-ray binaries (XRBs) detected in the merging Antennae galaxies by Zezas et al., based on observations taken with the Chandra X-ray Observatory, with a catalog of optically selected star clusters presented by Whitmore et al., based on observations taken with the Hubble Space Telescope. Within the 2σ positional uncertainty of ≈0.8″, we find 22 XRBs are coincident with star clusters, where only two to three chance coincidences are expected. The ages of the clusters were estimated by comparing their UBVI, Hα colors with predictions from stellar evolutionary models. We find that 14 of the 22 coincident XRBs (64%) are hosted by star clusters with ages of ≈6 Myr or less. All of the very young host clusters are fairly massive, with M ≳ 3 × 10⁴ M⊙, and many have masses M ≈ 10⁵ M⊙. Five of the XRBs are hosted by young clusters with ages τ ≈ 10-100 Myr, while three are hosted by intermediate-age clusters with τ ≈ 100-300 Myr. Based on the results from recent N-body simulations, which suggest that black holes are far more likely to be retained within their parent clusters than neutron stars, we suggest that our sample consists primarily of black hole binaries with different ages.

  16. Electrical Load Profile Analysis Using Clustering Techniques

    NASA Astrophysics Data System (ADS)

    Damayanti, R.; Abdullah, A. G.; Purnama, W.; Nandiyanto, A. B. D.

    2017-03-01

    Data mining is one of the data processing techniques used to extract information from a set of stored data. Every day the consumption of electrical load is recorded by the electrical company, usually at intervals of 15 or 30 minutes. This paper uses clustering, one of the data mining techniques, to analyse the electrical load profiles during 2014. Three clustering methods were compared, namely K-Means (KM), Fuzzy C-Means (FCM), and K-Harmonic Means (KHM). The results show that KHM is the most appropriate method to classify the electrical load profiles. The optimum number of clusters is determined using the Davies-Bouldin Index. By grouping the load profiles, demand-variation analysis and estimation of energy losses for groups of load profiles with similar patterns can be carried out. From the groups of electrical load profiles, the cluster load factor and a range of cluster loss factors can be obtained, which helps to find the range of coefficient values for estimating energy losses without performing load flow studies.
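    The Davies-Bouldin Index used here to choose the number of clusters is easy to sketch: for each cluster, measure its internal scatter, and penalize pairs of clusters that are scattered relative to the distance between their centroids; lower is better. A NumPy-only sketch that scores a labeling from any clusterer (KM, FCM, or KHM):

```python
import numpy as np

def davies_bouldin(X, labels):
    """Davies-Bouldin index: mean over clusters of the worst ratio
    (S_i + S_j) / M_ij, where S is within-cluster scatter and M is
    the distance between centroids. Lower values mean better separation."""
    ks = np.unique(labels)
    cents = np.array([X[labels == k].mean(axis=0) for k in ks])
    # S_i: mean distance of cluster members to their own centroid
    S = np.array([np.linalg.norm(X[labels == k] - cents[i], axis=1).mean()
                  for i, k in enumerate(ks)])
    db = 0.0
    for i in range(len(ks)):
        R = [(S[i] + S[j]) / np.linalg.norm(cents[i] - cents[j])
             for j in range(len(ks)) if j != i]
        db += max(R)
    return db / len(ks)

# two tight, well-separated pairs: the correct labeling scores far lower
X = np.array([[0, 0], [0.1, 0], [10, 0], [10.1, 0]], dtype=float)
db_good = davies_bouldin(X, np.array([0, 0, 1, 1]))
db_bad = davies_bouldin(X, np.array([0, 1, 0, 1]))
```

In practice one would run the clusterer for a range of candidate K values and keep the K with the smallest index.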

  17. Improving Performance for Gifted Students in a Cluster Grouping Model

    ERIC Educational Resources Information Center

    Brulles, Dina; Saunders, Rachel; Cohn, Sanford J.

    2010-01-01

    Although experts in gifted education widely promote cluster grouping for gifted students, little empirical evidence is available to attest to its effectiveness. This study is an example of comparative action research in the form of a quantitative case study that focused on the mandated cluster grouping practices for gifted students in an urban…

  18. Clustering algorithm evaluation and the development of a replacement for procedure 1. [for crop inventories

    NASA Technical Reports Server (NTRS)

    Lennington, R. K.; Johnson, J. K.

    1979-01-01

    An efficient procedure which clusters data using a completely unsupervised clustering algorithm and then uses labeled pixels to label the resulting clusters or perform a stratified estimate using the clusters as strata is developed. Three clustering algorithms, CLASSY, AMOEBA, and ISOCLS, are compared for efficiency. Three stratified estimation schemes and three labeling schemes are also considered and compared.
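    The stratified estimation step described above is generic enough to sketch: each cluster produced by the unsupervised algorithm serves as a stratum, and the stratum means of the labeled pixels are weighted by stratum size. The function and variable names below are hypothetical; the abstract does not specify the exact estimator used in the replacement for Procedure 1.

```python
def stratified_estimate(strata_sizes, labeled_values, labeled_strata):
    """Stratified mean: weight each stratum's labeled-pixel mean by the
    stratum's share of all pixels. strata_sizes maps stratum id -> pixel
    count; labeled_values/labeled_strata give the labeled sample."""
    N = sum(strata_sizes.values())
    est = 0.0
    for h, N_h in strata_sizes.items():
        vals = [v for v, s in zip(labeled_values, labeled_strata) if s == h]
        est += (N_h / N) * (sum(vals) / len(vals))
    return est

# toy crop inventory: stratum 0 holds 800 pixels (3 of 4 labeled pixels are
# wheat), stratum 1 holds 200 pixels (no labeled wheat)
strata_sizes = {0: 800, 1: 200}
est = stratified_estimate(strata_sizes,
                          [1, 1, 0, 1, 0, 0],
                          [0, 0, 0, 0, 1, 1])
```

Here the estimate is 0.8 × 0.75 + 0.2 × 0 = 0.6, i.e. 60% of the scene is estimated to be wheat.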

  19. Open source clustering software.

    PubMed

    de Hoon, M J L; Imoto, S; Nolan, J; Miyano, S

    2004-06-12

    We have implemented k-means clustering, hierarchical clustering and self-organizing maps in a single multipurpose open-source library of C routines, callable from other C and C++ programs. Using this library, we have created an improved version of Michael Eisen's well-known Cluster program for Windows, Mac OS X and Linux/Unix. In addition, we generated a Python and a Perl interface to the C Clustering Library, thereby combining the flexibility of a scripting language with the speed of C. The C Clustering Library and the corresponding Python C extension module Pycluster were released under the Python License, while the Perl module Algorithm::Cluster was released under the Artistic License. The GUI code Cluster 3.0 for Windows, Macintosh and Linux/Unix, as well as the corresponding command-line program, were released under the same license as the original Cluster code. The complete source code is available at http://bonsai.ims.u-tokyo.ac.jp/mdehoon/software/cluster. Alternatively, Algorithm::Cluster can be downloaded from CPAN, while Pycluster is also available as part of the Biopython distribution.

  20. Snowpack Estimates Improve Water Resources Climate-Change Adaptation Strategies

    NASA Astrophysics Data System (ADS)

    Lestak, L.; Molotch, N. P.; Guan, B.; Granger, S. L.; Nemeth, S.; Rizzardo, D.; Gehrke, F.; Franz, K. J.; Karsten, L. R.; Margulis, S. A.; Case, K.; Anderson, M.; Painter, T. H.; Dozier, J.

    2010-12-01

    Observed climate trends over the past 50 years indicate a reduction in snowpack water storage across the Western U.S. As the primary water source for the region, the loss in snowpack water storage presents significant challenges for managing water deliveries to meet agricultural, municipal, and hydropower demands. Improved snowpack information via remote sensing shows promise for improving seasonal water supply forecasts and for informing decadal scale infrastructure planning. An ongoing project in the California Sierra Nevada and examples from the Rocky Mountains indicate the tractability of estimating snowpack water storage on daily time steps using a distributed snowpack reconstruction model. Fractional snow covered area (FSCA) derived from Moderate Resolution Imaging Spectroradiometer (MODIS) satellite data were used with modeled snowmelt from the snowpack model to estimate snow water equivalent (SWE) in the Sierra Nevada (64,515 km2). Spatially distributed daily SWE estimates were calculated for 10 years, 2000-2009, with detailed analysis for two anomalous years: 2006, a wet year, and 2009, an over-forecasted year. Sierra-wide mean SWE was 0.8 cm for 01 April 2006 versus 0.4 cm for 01 April 2009, comparing favorably with known outflow. Modeled SWE was compared to in-situ (observed) SWE for 01 April 2006 for the Feather (northern Sierra, lower-elevation) and Merced (central Sierra, higher-elevation) basins, with mean modeled SWE 80% of observed SWE. Integration of spatial SWE estimates into forecasting operations will allow for better visualization and analysis of high-altitude late-season snow missed by in-situ snow sensors and inter-annual anomalies associated with extreme precipitation events/atmospheric rivers. Collaborations with state and local entities establish protocols on how to meet current and future information needs and improve climate-change adaptation strategies.
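    The core identity behind snowpack reconstruction, that the SWE present on a given day must equal all subsequent melt weighted by the fraction of the pixel still snow-covered, reduces to a reversed cumulative sum. This is a simplified reading of the abstract (the operational model is more involved); units and values below are illustrative:

```python
import numpy as np

def reconstruct_swe(fsca, melt):
    """Retrospective SWE reconstruction for one pixel: SWE on day t is the
    sum over all later days of modeled melt times fractional snow cover."""
    contrib = np.asarray(fsca) * np.asarray(melt)
    # reverse, accumulate, reverse back: each day sees all future melt
    return contrib[::-1].cumsum()[::-1]

# toy season: full cover, then melt-out (fsca from MODIS, melt from model, cm)
fsca = [1.0, 1.0, 0.5, 0.0]
melt = [0.0, 2.0, 2.0, 0.0]
swe = reconstruct_swe(fsca, melt)   # SWE on day 0 is 3.0 cm
```

This retrospective character is why reconstruction works well for hindcasting peak SWE but must be combined with forecasting machinery for real-time use.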

  1. Improved estimation of Mars ionosphere total electron content

    NASA Astrophysics Data System (ADS)

    Cartacci, M.; Sánchez-Cano, B.; Orosei, R.; Noschese, R.; Cicchetti, A.; Witasse, O.; Cantini, F.; Rossi, A. P.

    2018-01-01

    We describe an improved method to estimate the Total Electron Content (TEC) of the Mars ionosphere from the echoes recorded by the Mars Advanced Radar for Subsurface and Ionosphere Sounding (MARSIS) (Picardi et al., 2005; Orosei et al., 2015) onboard Mars Express in its subsurface sounding mode. In particular, we demonstrate that this method solves the issue of the former algorithm described in Cartacci et al. (2013), which overestimated TEC on the day side. The MARSIS signal is affected by a phase distortion introduced by the Mars ionosphere that produces a variation of the signal shape and a delay in its travel time. The new TEC estimation is achieved by correlating the parameters obtained through the correction of the aforementioned effects. In detail, knowledge of the quadratic term of the phase distortion estimated by the Contrast Method (Cartacci et al., 2013), together with the linear term (i.e. the extra time delay) estimated through a radar signal simulator, allows us to develop a new algorithm particularly well suited to estimating the TEC for solar zenith angles (SZA) lower than 95°. The new dayside algorithm has been validated with independent data from MARSIS in its Active Ionospheric Sounding (AIS) operational mode, with comparisons against previous algorithms based on MARSIS subsurface data, and with model-based ionospheric TEC reconstructions.
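    The "extra time delay" mentioned above relates to TEC through the standard first-order ionospheric dispersion relation, delay = 40.3 · TEC / (c · f²). A sketch of that relation and its inversion; the Contrast Method's quadratic-term correction is not reproduced here, and the 4 MHz carrier frequency is only illustrative of the MARSIS band:

```python
C = 2.998e8   # speed of light [m/s]
K = 40.3      # first-order ionospheric dispersion constant [m^3 / s^2]

def group_delay(tec, f_hz):
    """Extra group delay [s] of a radar pulse crossing an ionosphere with
    total electron content `tec` [electrons/m^2] at carrier frequency f_hz."""
    return K * tec / (C * f_hz ** 2)

def tec_from_delay(dt, f_hz):
    """Invert the first-order relation: recover TEC from a measured delay."""
    return dt * C * f_hz ** 2 / K

# 1 TECU = 1e16 electrons/m^2; delay grows rapidly as frequency drops,
# which is why MHz-band sounders like MARSIS are so sensitive to TEC
dt = group_delay(1e16, 4.0e6)
```

The strong 1/f² dependence also means multi-frequency measurements can separate the ionospheric delay from the geometric travel time.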

  2. Lagrangian analysis by clustering. An example in the Nordic Seas.

    NASA Astrophysics Data System (ADS)

    Koszalka, Inga; Lacasce, Joseph H.

    2010-05-01

    We propose a new method for obtaining average velocities and eddy diffusivities from Lagrangian data. Rather than grouping the drifter-derived velocities in uniform geographical bins, as is commonly done, we group a specified number of nearest-neighbor velocities. This is done via a clustering algorithm operating on the instantaneous positions of the drifters. Thus it is the data distribution itself which determines the positions of the averages and the areal extent of the clusters. A major advantage is that because the number of members is essentially the same for all clusters, the statistical accuracy is more uniform than with geographical bins. We illustrate the technique using synthetic data from a stochastic model, employing a realistic mean flow. The latter is an accurate representation of the surface currents in the Nordic Seas and is strongly inhomogeneous in space. We use the clustering algorithm to extract the mean velocities and diffusivities (both of which are known from the stochastic model). We also compare the results to those obtained with fixed geographical bins. Clustering is more successful at capturing spatial variability of the mean flow and also improves convergence in the eddy diffusivity estimates. We discuss both the future prospects and shortcomings of the new method.
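    The key idea, grouping a fixed number of nearest-neighbor samples rather than binning by geography, can be illustrated with a simple greedy sketch (the paper's actual clustering algorithm is not specified in the abstract; this is one plausible minimal version):

```python
import numpy as np

def cluster_by_neighbors(pos, k):
    """Greedily group positions into clusters of ~k nearest neighbours, so
    each cluster's average is computed over the same number of samples."""
    unassigned = list(range(len(pos)))
    clusters = []
    while unassigned:
        i = unassigned[0]
        # distances from the seed point to all still-unassigned points
        d = np.linalg.norm(pos[unassigned] - pos[i], axis=1)
        take = [unassigned[j] for j in np.argsort(d)[:k]]
        clusters.append(take)
        unassigned = [u for u in unassigned if u not in take]
    return clusters

# two tight groups of drifter fixes; with k=3 each becomes one cluster
pos = np.array([[0.0, 0.0], [0.0, 0.1], [0.1, 0.0],
                [5.0, 5.0], [5.0, 5.1], [5.1, 5.0]])
clusters = cluster_by_neighbors(pos, k=3)
```

Because every cluster holds roughly k samples, the mean velocity and diffusivity computed per cluster have comparable statistical uncertainty, which is the advantage over uniform geographic bins.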

  3. Low-end mass function of the Quintuplet cluster

    NASA Astrophysics Data System (ADS)

    Shin, Jihye; Kim, Sungsoo S.

    2016-08-01

    The Quintuplet and Arches clusters, which were formed in the harsh environment of the Galactic Centre (GC) a few million years ago, have been excellent targets for studying the effects of a star-forming environment on the initial mass function (IMF). In order to estimate the shape of the low-end IMF of the Arches cluster, Shin & Kim devised a novel photometric method that utilizes pixel intensity histograms (PIHs) of the observed images. Here, we apply the PIH method to the Quintuplet cluster and estimate the shape of its low-end IMF below the completeness limit of conventional photometry. We found that the low-end IMF of the Quintuplet is consistent with that found for the Arches cluster: a Kroupa MF, with a significant number of low-mass stars below 1 M⊙. We conclude that the most likely IMFs of the Quintuplet and Arches clusters are not too different from the IMFs found in the Galactic disc. We also find that the observed PIHs and stellar number density profiles of both clusters are best reproduced when the clusters are assumed to be at three-dimensional distances of approximately 100 pc from the GC.

  4. Effect of study design and setting on tuberculosis clustering estimates using Mycobacterial Interspersed Repetitive Units-Variable Number Tandem Repeats (MIRU-VNTR): a systematic review.

    PubMed

    Mears, Jessica; Abubakar, Ibrahim; Cohen, Theodore; McHugh, Timothy D; Sonnenberg, Pam

    2015-01-21

    To systematically review the evidence for the impact of study design and setting on the interpretation of tuberculosis (TB) transmission using clustering derived from Mycobacterial Interspersed Repetitive Units-Variable Number Tandem Repeats (MIRU-VNTR) strain typing. MEDLINE, EMBASE, CINAHL, Web of Science and Scopus were searched for articles published before 21st October 2014. Studies in humans that reported the proportion of clustering of TB isolates by MIRU-VNTR were included in the analysis. Univariable meta-regression analyses were conducted to assess the influence of study design and setting on the proportion of clustering. The search identified 27 eligible articles reporting clustering between 0% and 63%. The number of MIRU-VNTR loci typed, requiring consent to type patient isolates (as a proxy for sampling fraction), the TB incidence and the maximum cluster size explained 14%, 14%, 27% and 48% of between-study variation, respectively, and had a significant association with the proportion of clustering. Although MIRU-VNTR typing is being adopted worldwide, there is a paucity of data on how study design and setting may influence estimates of clustering. We have highlighted study design variables for consideration in the design and interpretation of future studies. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.

  5. Does Ocean Color Data Assimilation Improve Estimates of Global Ocean Inorganic Carbon?

    NASA Technical Reports Server (NTRS)

    Gregg, Watson

    2012-01-01

    Ocean color data assimilation has been shown to dramatically improve chlorophyll abundances and distributions globally and regionally in the oceans. Chlorophyll is a proxy for phytoplankton biomass (which is explicitly defined in a model), and is related to the inorganic carbon cycle through the interactions of organic carbon (particulate and dissolved) and through primary production, where inorganic carbon is directly taken out of the system. Does ocean color data assimilation, whose effects on estimates of chlorophyll are demonstrable, trickle through the simulated ocean carbon system to produce improved estimates of inorganic carbon? Our emphasis here is dissolved inorganic carbon, pCO2, and the air-sea flux. We use a sequential data assimilation method that assimilates chlorophyll directly and indirectly changes nutrient concentrations in a multi-variate approach. The results are decidedly mixed. Dissolved inorganic carbon estimates from the assimilation model are not meaningfully different from free-run, or unassimilated, results, and comparisons with in situ data are similar. pCO2 estimates are generally worse after data assimilation, with global estimates diverging 6.4% from in situ data, while free-run estimates are only 4.7% higher. Basin correlations are, however, slightly improved: r increases from 0.78 to 0.79, and the slope is closer to unity at 0.94 compared to 0.86. In contrast, the air-sea flux of CO2 is noticeably improved after data assimilation. Global differences decline from -0.635 mol/m2/y (stronger model sink from the atmosphere) to -0.202 mol/m2/y. Basin correlations are slightly improved from r=0.77 to r=0.78, with slope closer to unity (from 0.93 to 0.99). The Equatorial Atlantic appears as a slight sink in the free-run, but is correctly represented as a moderate source in the assimilation model. However, the assimilation model shows the Antarctic to be a source rather than a modest sink, and the North Indian basin is represented incorrectly as a sink.

  6. Cosmology from galaxy clusters as observed by Planck

    NASA Astrophysics Data System (ADS)

    Pierpaoli, Elena

    We propose to use current all-sky data on galaxy clusters in the radio/infrared bands in order to constrain cosmology. This will be achieved by performing parameter estimation with number counts and power spectra for galaxy clusters detected by Planck through their Sunyaev-Zel'dovich signature. The ultimate goal of this proposal is to use clusters as tracers of matter density in order to provide information about fundamental properties of our Universe, such as the law of gravity on large scales, early-Universe phenomena, structure formation, and the nature of dark matter and dark energy. We will leverage the availability of a larger and deeper cluster catalog from the latest Planck data release in order to include, for the first time, the cluster power spectrum in the cosmological parameter determination analysis. Furthermore, we will extend the cluster analysis to cosmological models not yet investigated by the Planck collaboration. These aims require a diverse set of activities: characterizing the clusters' selection function, choosing the cosmological cluster sample to be used for parameter estimation, constructing mock samples in the various cosmological models with correct correlation properties in order to produce reliable selection functions and noise covariance matrices, and finally constructing the appropriate likelihood for number counts and power spectra. We plan to make the final code available to the community and compatible with the most widely used cosmological parameter estimation code. This research makes use of data from the NASA satellites Planck and, less directly, Chandra, in order to constrain cosmology, and therefore perfectly fits the NASA objectives and the specifications of this solicitation.

  7. Cool Core Bias in Sunyaev-Zel’dovich Galaxy Cluster Surveys

    DOE PAGES

    Lin, Henry W.; McDonald, Michael; Benson, Bradford; ...

    2015-03-18

    Sunyaev-Zel'dovich (SZ) surveys find massive clusters of galaxies by measuring the inverse Compton scattering of the cosmic microwave background off of intra-cluster gas. The cluster selection function from such surveys is expected to be nearly independent of redshift and cluster astrophysics. In this work, we estimate the effect on the observed SZ signal of centrally peaked gas density profiles (cool cores) and radio emission from the brightest cluster galaxy (BCG) by creating mock observations of a sample of clusters that span the observed range of classical cooling rates and radio luminosities. For each cluster, we make simulated SZ observations by the South Pole Telescope and characterize the cluster selection function, but note that our results are broadly applicable to other SZ surveys. We find that the inclusion of a cool core can cause a change in the measured SPT significance of a cluster between 0.01%-10% at z > 0.3, increasing with cuspiness of the cool core and angular size on the sky of the cluster (i.e., decreasing redshift, increasing mass). We provide quantitative estimates of the bias in the SZ signal as a function of a gas density cuspiness parameter, redshift, mass, and the 1.4 GHz radio luminosity of the central AGN. Based on this work, we estimate that, for the Phoenix cluster (one of the strongest cool cores known), the presence of a cool core is biasing the SZ significance high by ~6%. The ubiquity of radio galaxies at the centers of cool core clusters will offset the cool core bias to varying degrees.

  8. Assessment of cluster yield components by image analysis.

    PubMed

    Diago, Maria P; Tardaguila, Javier; Aleixos, Nuria; Millan, Borja; Prats-Montalban, Jose M; Cubero, Sergio; Blasco, Jose

    2015-04-01

    Berry weight, berry number and cluster weight are key parameters for yield estimation in the wine and table grape industries. Current yield prediction methods are destructive, labour-demanding and time-consuming. In this work, a new methodology based on image analysis was developed to determine cluster yield components in a fast and inexpensive way. Clusters of seven different red varieties of grapevine (Vitis vinifera L.) were photographed under laboratory conditions and their cluster yield components manually determined after image acquisition. Two algorithms, based on the Canny and the logarithmic image processing approaches, were tested to find the contours of the berries in the images prior to berry detection performed by means of the Hough Transform. Results were obtained in two ways: by analysing either a single image of the cluster or four images per cluster from different orientations. The best results (R² between 69% and 95% in berry detection and between 65% and 97% in cluster weight estimation) were achieved using four images and the Canny algorithm. The capability of the image-analysis model to predict berry weight was 84%. The new and low-cost methodology presented here enabled the assessment of cluster yield components, saving time and providing inexpensive information in comparison with current manual methods. © 2014 Society of Chemical Industry.
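    The berry-detection step, a circular Hough transform applied to edge contours, can be sketched without an image library: every edge pixel votes for all candidate centres one radius away, and accumulator peaks mark berries. This toy version uses a fixed radius and synthetic edge points (a real pipeline would extract edges with a Canny detector, e.g. OpenCV's `cv2.Canny`, and scan a range of radii):

```python
import numpy as np

def hough_circle_centers(edge_pts, radius, shape, n_angles=360):
    """Fixed-radius circular Hough transform: each edge pixel votes for all
    centres a distance `radius` away; accumulator peaks are circle centres."""
    acc = np.zeros(shape, dtype=int)
    theta = np.linspace(0, 2 * np.pi, n_angles, endpoint=False)
    for (y, x) in edge_pts:
        cy = np.rint(y - radius * np.sin(theta)).astype(int)
        cx = np.rint(x - radius * np.cos(theta)).astype(int)
        ok = (cy >= 0) & (cy < shape[0]) & (cx >= 0) & (cx < shape[1])
        np.add.at(acc, (cy[ok], cx[ok]), 1)   # accumulate votes
    return acc

# synthetic "berry": edge points on a circle of radius 10 centred at (30, 40)
t = np.linspace(0, 2 * np.pi, 120, endpoint=False)
edges = np.column_stack([np.rint(30 + 10 * np.sin(t)),
                         np.rint(40 + 10 * np.cos(t))]).astype(int)
acc = hough_circle_centers(edges, radius=10, shape=(64, 64))
peak = np.unravel_index(acc.argmax(), acc.shape)   # ≈ (30, 40)
```

Counting accumulator peaks above a vote threshold then gives the berry number per cluster image.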

  9. RSQRT: AN HEURISTIC FOR ESTIMATING THE NUMBER OF CLUSTERS TO REPORT.

    PubMed

    Carlis, John; Bruso, Kelsey

    2012-03-01

    Clustering can be a valuable tool for analyzing large datasets, such as in e-commerce applications. Anyone who clusters must choose how many item clusters, K, to report. Unfortunately, one must guess at K or some related parameter. Elsewhere we introduced a strongly-supported heuristic, RSQRT, which predicts K as a function of the attribute or item count, depending on attribute scales. We conducted a second analysis where we sought confirmation of the heuristic, analyzing data sets from the UCI machine learning benchmark repository. For the 25 studies where sufficient detail was available, we again found strong support. Also, in a side-by-side comparison of 28 studies, the RSQRT-predicted K and the Bayesian information criterion (BIC)-predicted K are the same. RSQRT has a lower cost of O(log log n), versus O(n²) for BIC, and is more widely applicable. Using RSQRT prospectively could be much better than merely guessing.

  10. RSQRT: AN HEURISTIC FOR ESTIMATING THE NUMBER OF CLUSTERS TO REPORT

    PubMed Central

    Bruso, Kelsey

    2012-01-01

    Clustering can be a valuable tool for analyzing large datasets, such as in e-commerce applications. Anyone who clusters must choose how many item clusters, K, to report. Unfortunately, one must guess at K or some related parameter. Elsewhere we introduced a strongly-supported heuristic, RSQRT, which predicts K as a function of the attribute or item count, depending on attribute scales. We conducted a second analysis where we sought confirmation of the heuristic, analyzing data sets from the UCI machine learning benchmark repository. For the 25 studies where sufficient detail was available, we again found strong support. Also, in a side-by-side comparison of 28 studies, the RSQRT-predicted K and the Bayesian information criterion (BIC)-predicted K are the same. RSQRT has a lower cost of O(log log n), versus O(n²) for BIC, and is more widely applicable. Using RSQRT prospectively could be much better than merely guessing. PMID:22773923

  11. Data-Driven Packet Loss Estimation for Node Healthy Sensing in Decentralized Cluster.

    PubMed

    Fan, Hangyu; Wang, Huandong; Li, Yong

    2018-01-23

    Decentralized clustering in modern information technology is widely adopted in various fields these days. One of the main reasons is high availability and failure tolerance, which can prevent the entire system from breaking down due to the failure of a single point. Recently, toolkits such as Akka have been commonly used to easily build such clusters. However, clusters of this kind, which use Gossip as their membership-managing protocol and rely on a link-failure detecting mechanism, cannot deal with the scenario in which a node stochastically drops packets and corrupts the member status of the cluster. In this paper, we formulate the problem as evaluating the link quality and finding a maximum clique (an NP-complete problem) in the connectivity graph. We then propose an algorithm consisting of two models, driven by data from the application layer, to solve these two problems respectively. Through simulations with statistical data and a real-world product, we demonstrate that our algorithm has good performance.
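    Once link quality has been evaluated, the "healthy core" of the cluster is the largest set of nodes that are all mutually well-connected, i.e. a maximum clique in the connectivity graph. A generic Bron-Kerbosch sketch of that second step (the paper's data-driven link-quality model is separate and not reproduced here); exponential in the worst case, but fine for cluster-sized graphs:

```python
def max_clique(adj):
    """Bron-Kerbosch with pivoting; adj maps node -> set of neighbours.
    Returns one largest clique (maximum clique is NP-complete, so this
    is exponential in the worst case but fast on small, dense graphs)."""
    best = set()

    def bk(R, P, X):
        nonlocal best
        if not P and not X:
            if len(R) > len(best):
                best = set(R)       # R is a maximal clique; keep the largest
            return
        # pivot on the vertex with most neighbours in P to prune branches
        pivot = max(P | X, key=lambda u: len(adj[u] & P))
        for v in list(P - adj[pivot]):
            bk(R | {v}, P & adj[v], X & adj[v])
            P.remove(v)
            X.add(v)

    bk(set(), set(adj), set())
    return best

# nodes 0-2 have mutually healthy links (a triangle); node 3 only reaches 0,
# so the healthy core excludes it
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
clique = max_clique(adj)
```

Nodes outside the returned clique are the ones a membership manager would flag as unhealthy.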

  12. Data-Driven Packet Loss Estimation for Node Healthy Sensing in Decentralized Cluster

    PubMed Central

    Fan, Hangyu; Wang, Huandong; Li, Yong

    2018-01-01

    Decentralized clustering in modern information technology is widely adopted in various fields these days. One of the main reasons is high availability and failure tolerance, which can prevent the entire system from breaking down due to the failure of a single point. Recently, toolkits such as Akka have been commonly used to easily build such clusters. However, clusters of this kind, which use Gossip as their membership-managing protocol and rely on a link-failure detecting mechanism, cannot deal with the scenario in which a node stochastically drops packets and corrupts the member status of the cluster. In this paper, we formulate the problem as evaluating the link quality and finding a maximum clique (an NP-complete problem) in the connectivity graph. We then propose an algorithm consisting of two models, driven by data from the application layer, to solve these two problems respectively. Through simulations with statistical data and a real-world product, we demonstrate that our algorithm has good performance. PMID:29360792

  13. Structural study of gold clusters.

    PubMed

    Xiao, Li; Tollberg, Bethany; Hu, Xiankui; Wang, Lichang

    2006-03-21

    Density functional theory (DFT) calculations were carried out to study gold clusters of up to 55 atoms. Between the linear and zigzag monoatomic Au nanowires, the zigzag nanowires were found to be more stable. Furthermore, linear Au nanowires of up to 2 nm are formed by slightly stretched Au dimers. These results suggest that a substantial Peierls distortion exists in those structures. Planar geometries of Au clusters were found to be the global minima up to a cluster size of 13. A quantitative correlation is provided between various properties of Au clusters and their structure and size. The relative stability of selected clusters was also estimated with the Sutton-Chen potential, and the result disagrees with that obtained from the DFT calculations. This suggests that the Sutton-Chen potential must be modified, for example by obtaining new parameters, before it can be used to search for the global minima of larger Au clusters.

  14. The Role of Satellite Imagery to Improve Pastureland Estimates in South America

    NASA Astrophysics Data System (ADS)

    Graesser, J.

    2015-12-01

    Agriculture has changed substantially across the globe over the past half century. While much work has been done to improve spatial-temporal estimates of agricultural changes, we still know more about the extent of row-crop agriculture than livestock-grazed land. The gap between cropland and pastureland estimates exists largely because it is challenging to characterize natural versus grazed grasslands from a remote sensing perspective. However, the impasse of pastureland estimates is set to break, with an increasing number of spaceborne sensors and freely available satellite data. The Landsat satellite archive in particular provides researchers with immense amounts of data to improve pastureland information. Here we focus on South America, where pastureland expansion has been scrutinized for the past few decades. We explore the challenges of estimating pastureland using temporal Landsat imagery and focus on key agricultural countries, regions, and ecosystems. We focus on the suggested shift of pastureland from the Argentine Pampas to northern Argentina, and the mixing of small-scale and large-scale ranching in eastern Paraguay and how it could impact the Chaco forest to the west. Further, the Beni Savannahs of northern Bolivia and the Colombian Llanos—both grassland and savannah regions historically used for livestock grazing—have been hinted at as future areas for cropland expansion. There are certainly environmental concerns with pastureland expansion into forests; but what are the environmental implications when well-managed pasture systems are converted to intensive soybean or palm oil plantations? Tropical, grazed grasslands are important habitats for biodiversity, and pasturelands can mitigate soil erosion when well managed. Thus, we must improve estimates of grazed land before we can make informed policy and conservation decisions. This talk presents insights into pastureland estimates in South America and discusses the feasibility to improve current

  15. Toward improving the Laplacian estimation with novel multipolar concentric ring electrodes.

    PubMed

    Makeyev, Oleksandr; Ding, Quan; Kay, Steven M; Besio, Walter G

    2013-01-01

    Conventional electroencephalography with disc electrodes has major drawbacks, including poor spatial resolution, poor selectivity, and low signal-to-noise ratio, which critically limit its use. Concentric ring electrodes are a promising alternative with the potential to improve all of the aforementioned aspects significantly. In our previous work, the tripolar concentric ring electrode was successfully used in a wide range of applications, demonstrating its superiority to the conventional disc electrode, in particular in accuracy of Laplacian estimation. This paper takes the first fundamental step toward further improving the Laplacian estimation of the novel multipolar concentric ring electrodes by proposing a general approach to estimation of the Laplacian for an (n + 1)-polar electrode with n rings using the (4n + 1)-point method for n ≥ 2 that allows cancellation of all the truncation terms up to the order of 2n. Examples of using the proposed approach to estimate the Laplacian for the cases of tripolar and, for the first time, quadripolar concentric ring electrode are presented.
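
The truncation-term cancellation can be illustrated for the tripolar (n = 2) case, where the disc and two ring potentials are combined with weights 16 and −1 so that the fourth-order Taylor term drops out, leaving ΔV ≈ [16(v_m − v_0) − (v_o − v_0)]/(3r²). This is a numerical sketch only: ring potentials are approximated by averaging an analytic field, not taken from electrode data.

```python
import math

def ring_average(v, radius, center=(0.0, 0.0), samples=720):
    """Average a 2-D field v(x, y) over a circle -- stand-in for a ring potential."""
    cx, cy = center
    return sum(v(cx + radius * math.cos(2 * math.pi * i / samples),
                 cy + radius * math.sin(2 * math.pi * i / samples))
               for i in range(samples)) / samples

def tripolar_laplacian(v, r, center=(0.0, 0.0)):
    """Tripolar estimate of the Laplacian at the disc: rings at r and 2r.

    The 16/-1 weighting cancels the fourth-order truncation term, since
    (2r)^4 = 16 r^4; the leading term gives the 3r^2 normalization.
    """
    v0 = v(*center)
    vm = ring_average(v, r, center)       # middle ring
    vo = ring_average(v, 2 * r, center)   # outer ring
    return (16 * (vm - v0) - (vo - v0)) / (3 * r * r)
```

For v = x² + y² (Laplacian exactly 4) the estimate is exact, and for v = x⁴ (whose fourth-order term would bias a two-point estimate) the 16/−1 weighting cancels the error at the origin.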

  16. Identifying novel phenotypes of acute heart failure using cluster analysis of clinical variables.

    PubMed

    Horiuchi, Yu; Tanimoto, Shuzou; Latif, A H M Mahbub; Urayama, Kevin Y; Aoki, Jiro; Yahagi, Kazuyuki; Okuno, Taishi; Sato, Yu; Tanaka, Tetsu; Koseki, Keita; Komiyama, Kota; Nakajima, Hiroyoshi; Hara, Kazuhiro; Tanabe, Kengo

    2018-07-01

    Acute heart failure (AHF) is a heterogeneous disease caused by various cardiovascular (CV) pathophysiologies and multiple non-CV comorbidities. We aimed to identify clinically important subgroups to improve our understanding of the pathophysiology of AHF and inform clinical decision-making. We evaluated detailed clinical data of 345 consecutive AHF patients using non-hierarchical cluster analysis of 77 variables, including age, sex, HF etiology, comorbidities, physical findings, laboratory data, electrocardiogram, echocardiogram and treatment during hospitalization. Cox proportional hazards regression analysis was performed to estimate the association between the clusters and clinical outcomes. Three clusters were identified. Cluster 1 (n=108) represented "vascular failure"; this cluster had the highest average systolic blood pressure at admission and lung congestion with type 2 respiratory failure. Cluster 2 (n=89) represented "cardiac and renal failure", with the lowest ejection fraction (EF) and worst renal function. Cluster 3 (n=148) comprised mostly older patients and had the highest prevalence of atrial fibrillation and preserved EF. Death or HF hospitalization within 12 months occurred in 23% of Cluster 1, 36% of Cluster 2 and 36% of Cluster 3 (p=0.034). Compared with Cluster 1, the hazard ratio for death or HF hospitalization was 1.74 (95% CI, 1.03-2.95, p=0.037) for Cluster 2 and 1.82 (95% CI, 1.13-2.93, p=0.014) for Cluster 3. Cluster analysis may be effective in producing clinically relevant categories of AHF, may suggest underlying pathophysiology, and has potential utility in predicting clinical outcomes. Copyright © 2018 Elsevier B.V. All rights reserved.
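
The non-hierarchical clustering step can be sketched as a plain k-means on standardized variables. Everything below is illustrative: two toy variables and six synthetic patients stand in for the study's 77 variables and 345 patients.

```python
import random

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def standardize(rows):
    """Scale each variable to zero mean, unit variance (so units don't dominate)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [max((sum((v - m) ** 2 for v in c) / len(c)) ** 0.5, 1e-12)
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def kmeans(points, k, iters=20, seed=0):
    """Plain (non-hierarchical) k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        for i, p in enumerate(points):
            labels[i] = min(range(k), key=lambda c: dist2(p, centroids[c]))
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

# Toy patients described by (systolic BP, EF): two clearly separated groups
patients = [[180, 55], [175, 60], [185, 58], [100, 25], [95, 20], [105, 28]]
_, labels = kmeans(standardize(patients), k=2, seed=1)
```

The clusters would then feed a survival model (as in the paper's Cox regression) to check whether they separate outcomes.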

  17. The Amish furniture cluster in Ohio: competitive factors and wood use estimates

    Treesearch

    Matthew Bumgardner; Robert Romig; William Luppold

    2008-01-01

    This paper is an assessment of wood use by the Amish furniture cluster located in northeastern Ohio. The paper also highlights the competitive and demographic factors that have enabled cluster growth and new business formation in a time of declining market share for the overall U.S. furniture industry. Several secondary information sources and discussions with local...

  18. Improved Estimation of Orbits and Physical Properties of Objects in GEO

    NASA Astrophysics Data System (ADS)

    Bradley, B.; Axelrad, P.

    2013-09-01

    Orbital debris is a major concern for satellite operators, both commercial and military. Debris in the geosynchronous (GEO) belt is of particular concern because this unique region is such a valuable, limited resource, and, from the ground, we cannot reliably track and characterize GEO objects smaller than 1 meter in diameter. Space-based space surveillance (SBSS) is required to observe GEO objects without weather restriction and with improved viewing geometry. SBSS satellites have thus far been placed in Sun-synchronous orbits. This paper investigates the benefits to GEO orbit determination (including the estimation of mass, area, and shape) that arise from placing observing satellites in geosynchronous transfer orbit (GTO) and a sub-GEO orbit. Recently, several papers have reported on simulation studies to estimate orbits and physical properties; however, these studies use simulated objects and ground-based measurements, often with dense and long data arcs. While this type of simulation provides valuable insight into what is possible, as far as state estimation goes, it is not a very realistic observing scenario and thus may not yield meaningful accuracies. Our research improves upon simulations published to date by utilizing publicly available ephemerides for the WAAS satellites (Anik F1R and Galaxy 15), accurate at the meter level. By simulating and deliberately degrading right ascension and declination observations, consistent with these ephemerides, a realistic assessment of the achievable orbit determination accuracy using GTO and sub-GEO SBSS platforms is performed. Our results show that orbit accuracy is significantly improved as compared to a Sun-synchronous platform. Physical property estimation is also performed using simulated astrometric and photometric data taken from GTO and sub-GEO sensors. Simulations of SBSS-only as well as combined SBSS and ground-based observation tracks are used to study the improvement in area, mass, and shape estimation

  19. Improving hot region prediction by parameter optimization of density clustering in PPI.

    PubMed

    Hu, Jing; Zhang, Xiaolong

    2016-11-01

    This paper proposes an optimized algorithm that combines density clustering with parameter selection and feature-based classification for hot region prediction. First, all residues are classified by SVM to remove non-hot-spot residues; then density clustering with parameter selection is used to find hot regions. For the density clustering, the paper studies how to select the input parameters: density-based incremental clustering has two parameters, radius and density. We first fix the density and enumerate the radius to find a pair of parameters that leads to the maximum number of clusters, and then fix the radius and enumerate the density to find another such pair. Experimental results show that the proposed method, using both pairs of parameters, provides better prediction performance than the alternative; comparing the two, fixing the radius and enumerating the density yields slightly higher prediction accuracy than fixing the density and enumerating the radius. Copyright © 2016. Published by Elsevier Inc.
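
The density-clustering step and the parameter enumeration can be sketched as follows. This is a DBSCAN-like toy, not the paper's exact algorithm, and the points are synthetic stand-ins for residue coordinates:

```python
def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def density_cluster(points, radius, density):
    """Minimal density-based clustering sketch: points with at least
    `density` neighbours within `radius` are core points; clusters are
    connected components of core points plus their neighbours.
    Returns (labels, n_clusters); label -1 marks noise."""
    n, r2 = len(points), radius * radius
    neigh = [[j for j in range(n) if j != i and dist2(points[i], points[j]) <= r2]
             for i in range(n)]
    core = {i for i in range(n) if len(neigh[i]) >= density}
    labels, cid = [-1] * n, 0
    for start in core:
        if labels[start] != -1:
            continue
        stack = [start]
        labels[start] = cid
        while stack:
            i = stack.pop()
            for j in neigh[i]:
                if labels[j] == -1:
                    labels[j] = cid
                    if j in core:          # only core points propagate
                        stack.append(j)
        cid += 1
    return labels, cid

def best_radius(points, density, radii):
    """Enumerate radii with density fixed; keep the one giving most clusters."""
    return max(radii, key=lambda r: density_cluster(points, r, density)[1])

# Two dense blobs plus one noise point
pts = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1),
       (5, 5), (5.1, 5), (5, 5.1), (5.1, 5.1), (10, 10)]
labels, n_clusters = density_cluster(pts, radius=0.3, density=2)
```

Enumerating the second parameter with the first held fixed mirrors the paper's selection criterion of maximizing the number of clusters.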

  20. Bone orientation and position estimation errors using Cosserat point elements and least squares methods: Application to gait.

    PubMed

    Solav, Dana; Camomilla, Valentina; Cereatti, Andrea; Barré, Arnaud; Aminian, Kamiar; Wolf, Alon

    2017-09-06

    The aim of this study was to analyze the accuracy of bone pose estimation based on sub-clusters of three skin-markers characterized by triangular Cosserat point elements (TCPEs) and to evaluate the capability of four instantaneous physical parameters, which can be measured non-invasively in vivo, to identify the most accurate TCPEs. Moreover, TCPE pose estimations were compared with the estimations of two least squares minimization methods applied to the cluster of all markers, using rigid body (RBLS) and homogeneous deformation (HDLS) assumptions. Analysis was performed on previously collected in vivo treadmill gait data composed of simultaneous measurements of the gold-standard bone pose by bi-plane fluoroscopy tracking the subjects' knee prosthesis and a stereophotogrammetric system tracking skin-markers affected by soft tissue artifact. Femur orientation and position errors estimated from skin-marker clusters were computed for 18 subjects using clusters of up to 35 markers. Results based on gold-standard data revealed that instantaneous subsets of TCPEs exist which estimate the femur pose with reasonable accuracy (median root mean square error during stance/swing: 1.4/2.8 deg for orientation, 1.5/4.2 mm for position). A non-invasive and instantaneous criterion to select accurate TCPEs for pose estimation (4.8/7.3 deg, 5.8/12.3 mm) was compared with RBLS (4.3/6.6 deg, 6.9/16.6 mm) and HDLS (4.6/7.6 deg, 6.7/12.5 mm). Accounting for homogeneous deformation, using HDLS or selected TCPEs, yielded more accurate position estimations than the RBLS method, which, conversely, yielded more accurate orientation estimations. Further investigation is required to devise effective criteria for cluster selection that could represent a significant improvement in bone pose estimation accuracy. Copyright © 2017 Elsevier Ltd. All rights reserved.

  1. Improving the accuracy of livestock distribution estimates through spatial interpolation.

    PubMed

    Bryssinckx, Ward; Ducheyne, Els; Muhwezi, Bernard; Godfrey, Sunday; Mintiens, Koen; Leirs, Herwig; Hendrickx, Guy

    2012-11-01

    Animal distribution maps serve many purposes, such as estimating transmission risk of zoonotic pathogens to both animals and humans. The reliability and usability of such maps is highly dependent on the quality of the input data. However, decisions on how to perform livestock surveys are often based on previous work without considering possible consequences. A better understanding of the impact of using different sample designs and processing steps on the accuracy of livestock distribution estimates was acquired through iterative experiments using detailed survey data. The importance of sample size, sample design and aggregation is demonstrated, and spatial interpolation is presented as a potential way to improve cattle number estimates. As expected, results show that an increasing sample size increased the precision of cattle number estimates, but these improvements were mainly seen when the initial sample size was relatively low (e.g. a median relative error decrease of 0.04% per sampled parish for sample sizes below 500 parishes). For higher sample sizes, the added value of further increasing the number of samples declined rapidly (e.g. a median relative error decrease of 0.01% per sampled parish for sample sizes above 500 parishes). When a two-stage stratified sample design was applied to yield more evenly distributed samples, accuracy levels were higher for low sample densities and stabilised at lower sample sizes compared to one-stage stratified sampling. Aggregating the resulting cattle number estimates yielded significantly more accurate results because of averaging under- and over-estimates (e.g. when aggregating cattle number estimates from subcounty to district level, P <0.009 based on a sample of 2,077 parishes using one-stage stratified samples). During aggregation, area-weighted mean values were assigned to higher administrative unit levels. However, when this step is preceded by a spatial interpolation to fill in missing values in non-sampled areas, accuracy
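
The diminishing return from larger sample sizes can be illustrated with a small simulation. The parish cattle counts below are synthetic and purely illustrative, not the survey's data:

```python
import random
import statistics

def median_relative_error(counts, sample_size, trials=200, seed=42):
    """Estimate the total cattle count from a simple random sample of
    parishes, repeated over many trials; report the median relative error."""
    rng = random.Random(seed)
    true_total = sum(counts)
    errors = []
    for _ in range(trials):
        sample = rng.sample(counts, sample_size)
        estimate = statistics.mean(sample) * len(counts)  # scale sample mean up
        errors.append(abs(estimate - true_total) / true_total)
    return statistics.median(errors)

rng = random.Random(0)
parishes = [rng.randint(100, 5000) for _ in range(2000)]  # synthetic parish counts

err_small_sample = median_relative_error(parishes, 100)
err_large_sample = median_relative_error(parishes, 800)   # larger sample, lower error
```

The error shrinks roughly with the square root of the sample size (helped further by the finite-population correction), which is why the marginal gain per extra sampled parish drops off as the abstract reports.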

  2. Improving PAGER's real-time earthquake casualty and loss estimation toolkit: a challenge

    USGS Publications Warehouse

    Jaiswal, K.S.; Wald, D.J.

    2012-01-01

    We describe the on-going developments of PAGER’s loss estimation models, and discuss value-added web content that can be generated related to exposure, damage and loss outputs for a variety of PAGER users. These developments include identifying vulnerable building types in any given area, estimating earthquake-induced damage and loss statistics by building type, and developing visualization aids that help locate areas of concern for improving post-earthquake response efforts. While detailed exposure and damage information is highly useful and desirable, significant improvements are still necessary in order to improve underlying building stock and vulnerability data at a global scale. Existing efforts with the GEM’s GED4GEM and GVC consortia will help achieve some of these objectives. This will benefit PAGER especially in regions where PAGER’s empirical model is less-well constrained; there, the semi-empirical and analytical models will provide robust estimates of damage and losses. Finally, we outline some of the challenges associated with rapid casualty and loss estimation that we experienced while responding to recent large earthquakes worldwide.

  3. Ion-Stockmayer clusters: Minima, classical thermodynamics, and variational ground state estimates of Li+(CH3NO2)n (n = 1-20)

    NASA Astrophysics Data System (ADS)

    Curotto, E.

    2015-12-01

    Structural optimizations, classical NVT ensemble, and variational Monte Carlo simulations of ion Stockmayer clusters parameterized to approximate the Li+(CH3NO2)n (n = 1-20) systems are performed. The Metropolis algorithm enhanced by the parallel tempering strategy is used to measure internal energies and heat capacities, and a parallel version of the genetic algorithm is employed to obtain the most important minima. The first solvation sheath is octahedral and this feature remains the dominant theme in the structure of clusters with n ≥ 6. The first "magic number" is identified using the adiabatic solvent dissociation energy, and it marks the completion of the second solvation layer for the lithium ion-nitromethane clusters. It corresponds to the n = 18 system, a solvated ion with the first sheath having octahedral symmetry, weakly bound to an eight-membered and a four-membered ring crowning a vertex of the octahedron. Variational Monte Carlo estimates of the adiabatic solvent dissociation energy reveal that quantum effects further enhance the stability of the n = 18 system relative to its neighbors.

  4. Stacked Weak Lensing Mass Calibration: Estimators, Systematics, and Impact on Cosmological Parameter Constraints

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rozo, Eduardo; /U. Chicago /Chicago U., KICP; Wu, Hao-Yi

    2011-11-04

    When extracting the weak lensing shear signal, one may employ either locally normalized or globally normalized shear estimators. The former is the standard approach when estimating cluster masses, while the latter is the more common method among peak finding efforts. While both approaches have identical signal-to-noise in the weak lensing limit, it is possible that higher order corrections or systematic considerations make one estimator preferable over the other. In this paper, we consider the efficacy of both estimators within the context of stacked weak lensing mass estimation in the Dark Energy Survey (DES). We find that the two estimators have nearly identical statistical precision, even after including higher order corrections, but that these corrections must be incorporated into the analysis to avoid observationally relevant biases in the recovered masses. We also demonstrate that finite bin-width effects may be significant if not properly accounted for, and that the two estimators exhibit different systematics, particularly with respect to contamination of the source catalog by foreground galaxies. Thus, the two estimators may be employed as a systematic cross-check of each other. Stacked weak lensing in the DES should allow for the mean mass of galaxy clusters to be calibrated to ~2% precision (statistical only), which can improve the figure of merit of the DES cluster abundance experiment by a factor of ~3 relative to the self-calibration expectation. A companion paper investigates how the two types of estimators considered here impact weak lensing peak finding efforts.

  5. See Change: the Supernova Sample from the Supernova Cosmology Project High Redshift Cluster Supernova Survey

    NASA Astrophysics Data System (ADS)

    Hayden, Brian; Perlmutter, Saul; Boone, Kyle; Nordin, Jakob; Rubin, David; Lidman, Chris; Deustua, Susana E.; Fruchter, Andrew S.; Aldering, Greg Scott; Brodwin, Mark; Cunha, Carlos E.; Eisenhardt, Peter R.; Gonzalez, Anthony H.; Jee, James; Hildebrandt, Hendrik; Hoekstra, Henk; Santos, Joana; Stanford, S. Adam; Stern, Daniel; Fassbender, Rene; Richard, Johan; Rosati, Piero; Wechsler, Risa H.; Muzzin, Adam; Willis, Jon; Boehringer, Hans; Gladders, Michael; Goobar, Ariel; Amanullah, Rahman; Hook, Isobel; Huterer, Dragan; Huang, Xiaosheng; Kim, Alex G.; Kowalski, Marek; Linder, Eric; Pain, Reynald; Saunders, Clare; Suzuki, Nao; Barbary, Kyle H.; Rykoff, Eli S.; Meyers, Joshua; Spadafora, Anthony L.; Sofiatti, Caroline; Wilson, Gillian; Rozo, Eduardo; Hilton, Matt; Ruiz-Lapuente, Pilar; Luther, Kyle; Yen, Mike; Fagrelius, Parker; Dixon, Samantha; Williams, Steven

    2017-01-01

    The Supernova Cosmology Project has finished executing a large (174 orbits, cycles 22-23) Hubble Space Telescope program, which has measured ~30 type Ia Supernovae above z~1 in the highest-redshift, most massive galaxy clusters known to date. Our SN Ia sample closely matches our pre-survey predictions; this sample will improve the constraint by a factor of 3 on the Dark Energy equation of state above z~1, allowing an unprecedented probe of Dark Energy time variation. When combined with the improved cluster mass calibration from gravitational lensing provided by the deep WFC3-IR observations of the clusters, See Change will triple the Dark Energy Task Force Figure of Merit. With the primary observing campaign completed, we present the preliminary supernova sample and our path forward to the supernova cosmology results. We also compare the number of SNe Ia discovered in each cluster with our pre-survey expectations based on cluster mass and SFR estimates. Our extensive HST and ground-based campaign has already produced unique results; we have confirmed several of the highest redshift cluster members known to date, confirmed the redshift of one of the most massive galaxy clusters at z~1.2 expected across the entire sky, and characterized one of the most extreme starburst environments yet known in a z~1.7 cluster. We have also discovered a lensed SN Ia at z=2.22 magnified by a factor of ~2.7, which is the highest spectroscopic redshift SN Ia currently known.

  6. A stellar census in globular clusters with MUSE: The contribution of rotation to cluster dynamics studied with 200 000 stars

    NASA Astrophysics Data System (ADS)

    Kamann, S.; Husser, T.-O.; Dreizler, S.; Emsellem, E.; Weilbacher, P. M.; Martens, S.; Bacon, R.; den Brok, M.; Giesers, B.; Krajnović, D.; Roth, M. M.; Wendt, M.; Wisotzki, L.

    2018-02-01

    This is the first of a series of papers presenting the results from our survey of 25 Galactic globular clusters with the MUSE integral-field spectrograph. In combination with our dedicated algorithm for source deblending, MUSE provides unique multiplex capabilities in crowded stellar fields and allows us to acquire samples of up to 20 000 stars within the half-light radius of each cluster. The present paper focuses on the analysis of the internal dynamics of 22 out of the 25 clusters, using about 500 000 spectra of 200 000 individual stars. Thanks to the large stellar samples per cluster, we are able to perform a detailed analysis of the central rotation and dispersion fields using both radial profiles and two-dimensional maps. The velocity dispersion profiles we derive show a good general agreement with existing radial velocity studies but typically reach closer to the cluster centres. By comparison with proper motion data, we derive or update the dynamical distance estimates to 14 clusters. Compared to previous dynamical distance estimates for 47 Tuc, our value is in much better agreement with other methods. We further find significant (>3σ) rotation in the majority (13/22) of our clusters. Our analysis seems to confirm earlier findings of a link between rotation and the ellipticities of globular clusters. In addition, we find a correlation between the strengths of internal rotation and the relaxation times of the clusters, suggesting that the central rotation fields are relics of the cluster formation that are gradually dissipated via two-body relaxation.

  7. The young SMC cluster NGC 330

    NASA Technical Reports Server (NTRS)

    Carney, B. W.; Janes, K. A.; Flower, P. J.

    1985-01-01

    A color-magnitude diagram has been obtained for the young SMC cluster NGC 330. The diagram shows a well-defined main sequence, a group of blue supergiants, a group of red supergiants between B-V = 1.2 and 1.6 about one magnitude fainter, and an empty Hertzsprung gap. The surrounding field is a composite of a very old population resembling galactic globular clusters and a very young population. DDO and infrared photometry strongly suggest that the cluster is metal-poor, but a definitive measure could not be made because of calibration difficulties. The cluster's age is estimated at 12 million years, with the surrounding field about 50 percent older. The cluster will prove very useful in testing stellar evolution models for young, metal-poor stars if the cluster's metallicity can be established via high-resolution spectroscopy.

  8. Clustering approaches to improve the performance of low cost air pollution sensors.

    PubMed

    Smith, Katie R; Edwards, Peter M; Evans, Mathew J; Lee, James D; Shaw, Marvin D; Squires, Freya; Wilde, Shona; Lewis, Alastair C

    2017-08-24

    frequent calibration. The use of a cluster median value eliminates unpredictable medium term response changes, and other longer term outlier behaviours, extending the likely period needed between calibration and making a linear interpolation between calibrations more appropriate. Through the use of sensor clusters rather than individual sensors, existing low cost technologies could deliver significantly improved quality of observations.
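
The cluster-median idea can be sketched as follows: several co-located low-cost sensors each drift at a different rate, and the median across the cluster tracks the true signal better than the worst individual sensor. All signals and drift rates below are synthetic:

```python
import random
import statistics

random.seed(7)
T, N_SENSORS = 240, 8
true_signal = [50 + 10 * (t % 24) / 24 for t in range(T)]  # synthetic diurnal cycle

# Each sensor has its own slow drift plus random noise
drifts = [random.uniform(-0.05, 0.05) for _ in range(N_SENSORS)]
readings = [[true_signal[t] + drifts[s] * t + random.gauss(0, 1)
             for s in range(N_SENSORS)] for t in range(T)]

# Cluster output: the median reading at each time step
cluster_median = [statistics.median(row) for row in readings]

median_err = statistics.mean(abs(m - x) for m, x in zip(cluster_median, true_signal))
worst_err = max(statistics.mean(abs(readings[t][s] - true_signal[t]) for t in range(T))
                for s in range(N_SENSORS))
```

The median discards the sensors whose response has wandered furthest, which is why a cluster can extend the interval between calibrations relative to relying on any single device.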

  9. X-RAY BINARIES AND STAR CLUSTERS IN THE ANTENNAE: OPTICAL CLUSTER COUNTERPARTS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rangelov, Blagoy; Chandar, Rupali; Prestwich, Andrea

    2012-10-20

    We compare the locations of 82 X-ray binaries (XRBs) detected in the merging Antennae galaxies by Zezas et al., based on observations taken with the Chandra X-Ray Observatory, with a catalog of optically selected star clusters presented by Whitmore et al., based on observations taken with the Hubble Space Telescope. Within the 2σ positional uncertainty of ≈0.8″, we find 22 XRBs coincident with star clusters, where only two to three chance coincidences are expected. The ages of the clusters were estimated by comparing their UBVI, Hα colors with predictions from stellar evolutionary models. We find that 14 of the 22 coincident XRBs (64%) are hosted by star clusters with ages of ≈6 Myr or less. All of the very young host clusters are fairly massive, with M ≳ 3 × 10⁴ M☉, and many have masses M ≈ 10⁵ M☉. Five of the XRBs are hosted by young clusters with ages τ ≈ 10-100 Myr, while three are hosted by intermediate-age clusters with τ ≈ 100-300 Myr. Based on the results from recent N-body simulations, which suggest that black holes are far more likely than neutron stars to be retained within their parent clusters, we suggest that our sample consists primarily of black hole binaries with different ages.

  10. Improved battery parameter estimation method considering operating scenarios for HEV/EV applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yang, Jufeng; Xia, Bing; Shang, Yunlong

    This study presents an improved battery parameter estimation method based on typical operating scenarios in hybrid electric vehicles and pure electric vehicles. Compared with conventional estimation methods, the proposed method takes both the constant-current charging and the dynamic driving scenarios into account, and two separate sets of model parameters are estimated through different parts of the pulse-rest test. The model parameters for the constant-charging scenario are estimated from data in the pulse-charging periods, while the model parameters for the dynamic driving scenario are estimated from data in the rest periods, and the length of the fitted dataset is determined by spectrum analysis of the load current. In addition, the unsaturated phenomenon caused by the long-term resistor-capacitor (RC) network is analyzed, and the initial voltage expressions of the RC networks in the fitting functions are improved to ensure higher model fidelity. Simulation and experiment results validated the feasibility of the developed estimation method.
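
The rest-period identification can be sketched as fitting the relaxation voltage of a single RC branch, v(t) = v₀·exp(−t/τ), by log-linear least squares. The data below are synthetic and noise-free; a real pack would need the multi-RC model and the initial-voltage correction the abstract discusses:

```python
import math

def fit_rc(times, voltages):
    """Fit v(t) = v0 * exp(-t / tau) by least squares on ln(v).

    Returns (v0, tau). Valid for a single RC branch with v > 0.
    """
    ys = [math.log(v) for v in voltages]
    n = len(times)
    mx, my = sum(times) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(times, ys)) /
             sum((x - mx) ** 2 for x in times))
    return math.exp(my - slope * mx), -1.0 / slope

# Synthetic rest-period relaxation: 0.12 V across the RC pair, tau = 30 s
times = list(range(0, 120, 5))
volts = [0.12 * math.exp(-t / 30.0) for t in times]
v0, tau = fit_rc(times, volts)   # recovers 0.12 V and 30 s on clean data
```

With noisy data, the same idea is usually applied via nonlinear least squares, and the fitted window length would follow the load-current spectrum analysis described above.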

  11. Improved battery parameter estimation method considering operating scenarios for HEV/EV applications

    DOE PAGES

    Yang, Jufeng; Xia, Bing; Shang, Yunlong; ...

    2016-12-22

    This study presents an improved battery parameter estimation method based on typical operating scenarios in hybrid electric vehicles and pure electric vehicles. Compared with conventional estimation methods, the proposed method takes both the constant-current charging and the dynamic driving scenarios into account, and two separate sets of model parameters are estimated through different parts of the pulse-rest test. The model parameters for the constant-charging scenario are estimated from data in the pulse-charging periods, while the model parameters for the dynamic driving scenario are estimated from data in the rest periods, and the length of the fitted dataset is determined by spectrum analysis of the load current. In addition, the unsaturated phenomenon caused by the long-term resistor-capacitor (RC) network is analyzed, and the initial voltage expressions of the RC networks in the fitting functions are improved to ensure higher model fidelity. Simulation and experiment results validated the feasibility of the developed estimation method.

  12. ASTM clustering for improving coal analysis by near-infrared spectroscopy.

    PubMed

    Andrés, J M; Bona, M T

    2006-11-15

    Multivariate analysis techniques have been applied to near-infrared (NIR) spectra of coals to investigate the relationship between nine coal properties (moisture (%), ash (%), volatile matter (%), fixed carbon (%), heating value (kcal/kg), carbon (%), hydrogen (%), nitrogen (%) and sulphur (%)) and the corresponding predictor variables. In this work, the whole set of coal samples was grouped into six more homogeneous clusters following the ASTM reference method for classification, prior to the application of calibration methods to each coal set. The results showed a considerable improvement in determination error compared with the calibration for the whole sample set. For some groups, the established calibrations approached the quality required by the ASTM/ISO norms for laboratory analysis. To predict property values for a new coal sample, the sample must first be assigned to its respective group. Thus, the discrimination and classification ability of coal samples by Diffuse Reflectance Infrared Fourier Transform Spectroscopy (DRIFTS) in the NIR range was also studied by applying Soft Independent Modelling of Class Analogy (SIMCA) and Linear Discriminant Analysis (LDA) techniques. Modelling of the groups by SIMCA led to overlapping models that cannot discriminate for unique classification. On the other hand, the application of Linear Discriminant Analysis improved the classification of the samples, but not enough to be satisfactory for every group considered.
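
The benefit of calibrating within homogeneous groups can be sketched with a toy one-dimensional calibration: two synthetic coal groups share a slope but differ by an offset, so one global line fits both poorly while per-group lines fit well. All numbers are illustrative:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys)) /
         sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

def rmse(xs, ys, a, b):
    return (sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5

# Two groups with the same slope but different offsets (e.g. ash behaviour)
xs1, ys1 = [1, 2, 3, 4], [2.1, 3.0, 4.2, 4.9]   # group A: y ~ 1 + x
xs2, ys2 = [1, 2, 3, 4], [5.0, 6.1, 6.9, 8.1]   # group B: y ~ 4 + x

a, b = fit_line(xs1 + xs2, ys1 + ys2)            # one global calibration
global_err = rmse(xs1 + xs2, ys1 + ys2, a, b)
group_err = 0.5 * (rmse(xs1, ys1, *fit_line(xs1, ys1)) +
                   rmse(xs2, ys2, *fit_line(xs2, ys2)))
```

This is why the workflow needs a classifier (SIMCA or LDA in the paper) in front of the calibration: the per-group models only help if a new sample can be routed to the right group.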

  13. Improved regional-scale Brazilian cropping systems' mapping based on a semi-automatic object-based clustering approach

    NASA Astrophysics Data System (ADS)

    Bellón, Beatriz; Bégué, Agnès; Lo Seen, Danny; Lebourgeois, Valentine; Evangelista, Balbino Antônio; Simões, Margareth; Demonte Ferraz, Rodrigo Peçanha

    2018-06-01

    Cropping systems' maps at fine scale over large areas provide key information for further agricultural production and environmental impact assessments, and thus represent a valuable tool for effective land-use planning. There is, therefore, a growing interest in mapping cropping systems in an operational manner over large areas, and remote sensing approaches based on vegetation index time series analysis have proven to be an efficient tool. However, supervised pixel-based approaches are commonly adopted, requiring resource-consuming field campaigns to gather training data. In this paper, we present a new object-based unsupervised classification approach tested on an annual MODIS 16-day composite Normalized Difference Vegetation Index time series and a Landsat 8 mosaic of the State of Tocantins, Brazil, for the 2014-2015 growing season. Two variants of the approach are compared: a hyperclustering approach, and a landscape-clustering approach in which the study area is first stratified into landscape units on which the clustering is then performed. The main cropping systems of Tocantins, characterized by the crop types and cropping patterns, were efficiently mapped with the landscape-clustering approach. Results show that stratification prior to clustering significantly improves the classification accuracies for underrepresented and sparsely distributed cropping systems. This study illustrates the potential of unsupervised classification for large area cropping systems' mapping and contributes to the development of generic tools for supporting large-scale agricultural monitoring across regions.

  14. Towards a comprehensive knowledge of the open cluster Haffner 9

    NASA Astrophysics Data System (ADS)

    Piatti, Andrés E.

    2017-03-01

    We turn our attention to Haffner 9, a Milky Way open cluster whose previous fundamental parameter estimates are far from being in agreement. In order to provide accurate estimates, we present high-quality Washington CT1 and Johnson BVI photometry of the cluster field. We took particular care in statistically cleaning the colour-magnitude diagrams (CMDs) of field star contamination, which was found to be a common source of the discordant fundamental parameter estimates in previous works. The resulting cluster CMD fiducial features were confirmed by a proper motion membership analysis. Haffner 9 is a moderately young object (age ∼350 Myr) placed in the Perseus arm, at a heliocentric distance of ∼3.2 kpc, with a lower limit for its present mass of ∼160 M⊙ and a nearly solar metal content. The combination of the cluster's structural and fundamental parameters suggests that it is in an advanced stage of internal dynamical evolution, possibly in the phase typical of clusters with mass segregation in their core regions. However, the cluster still keeps its mass function close to Salpeter's law.

  15. Three estimates of the association between linear growth failure and cognitive ability.

    PubMed

    Cheung, Y B; Lam, K F

    2009-09-01

    To compare three estimators of the association between growth stunting, as measured by height-for-age Z-score, and cognitive ability in children, and to examine the extent to which statistical adjustment for covariates is useful for removing confounding due to socio-economic status. Three estimators for panel data, namely the random-effects, within-cluster and between-cluster estimators, were used to estimate the association in a survey of 1105 pairs of siblings who were assessed for anthropometry and cognition. Furthermore, a 'combined' model was formulated to simultaneously provide the within- and between-cluster estimates. The random-effects and between-cluster estimators showed a strong association between linear growth and cognitive ability, even after adjustment for a range of socio-economic variables. In contrast, the within-cluster estimator showed a much more modest association: for every increase of one Z-score in linear growth, cognitive ability increased by about 0.08 standard deviation (P < 0.001). The combined model verified that the between-cluster estimate was significantly larger than the within-cluster estimate (P = 0.004). Residual confounding by socio-economic circumstances may explain a substantial proportion of the observed association between linear growth and cognition in studies that attempt to control the confounding by means of multivariable regression analysis. The within-cluster estimator provides more convincing and modest results about the strength of association.
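    To make the contrast concrete, here is a toy sketch (not the study's data or code) of within- versus between-cluster estimators for sibling pairs: a shared family-level confounder inflates the between-cluster slope, while differencing siblings removes it. The true within-family effect is set to 0.08 to echo the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)
n_fam = 1105
# Hypothetical data: a family-level (SES-like) confounder affects both
# height-for-age Z-score (haz) and cognition (cog) for both siblings.
fam = rng.normal(0, 1.0, n_fam)
haz = fam[:, None] + rng.normal(0, 1, (n_fam, 2))
cog = 0.08 * haz + 2.0 * fam[:, None] + rng.normal(0, 1, (n_fam, 2))

# Within-cluster estimator: regress sibling differences (confounder cancels).
d_h = haz[:, 0] - haz[:, 1]
d_c = cog[:, 0] - cog[:, 1]
beta_within = (d_h @ d_c) / (d_h @ d_h)

# Between-cluster estimator: regress family means (confounding remains).
m_h = haz.mean(axis=1) - haz.mean()
m_c = cog.mean(axis=1) - cog.mean()
beta_between = (m_h @ m_c) / (m_h @ m_h)
print(beta_within, beta_between)   # within ≈ 0.08; between is inflated
```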

  16. An improved method for nonlinear parameter estimation: a case study of the Rössler model

    NASA Astrophysics Data System (ADS)

    He, Wen-Ping; Wang, Liu; Jiang, Yun-Di; Wan, Shi-Quan

    2016-08-01

    Parameter estimation is an important research topic in nonlinear dynamics. Based on the evolutionary algorithm (EA), Wang et al. (2014) presented a new scheme for nonlinear parameter estimation, and numerical tests indicate that the estimation precision is satisfactory. However, the convergence rate of the EA is relatively slow when multiple unknown parameters in a multidimensional dynamical system are estimated simultaneously. To solve this problem, an improved method for parameter estimation of nonlinear dynamical equations is provided in the present paper. The main idea of the improved scheme is to use the known time series of all of the components of the dynamical equations to estimate the parameters of a single component one by one, instead of estimating all of the parameters in all of the components simultaneously. Thus, we can estimate all of the parameters stage by stage. The performance of the improved method was tested using a classic chaotic system, the Rössler model. The numerical tests show that the amended parameter estimation scheme can greatly improve the searching efficiency and that there is a significant increase in the convergence rate of the EA, particularly for multiparameter estimation in multidimensional dynamical equations. Moreover, the results indicate that the accuracy of parameter estimation and the CPU time consumed by the presented method have no obvious dependence on the sample size.
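    The component-by-component idea can be illustrated on the Rössler system: given time series for all three components, the y-equation alone determines the parameter a. The sketch below stands in plain least squares for the paper's EA, and the step size, trajectory length and initial condition are assumptions.

```python
import numpy as np

# Rössler system: dx=-y-z, dy=x+a*y, dz=b+z*(x-c); "true" parameters below
a_true, b_true, c_true = 0.2, 0.2, 5.7

def roessler(s):
    x, y, z = s
    return np.array([-y - z, x + a_true * y, b_true + z * (x - c_true)])

# Generate a reference trajectory with RK4 integration
dt, n = 0.01, 20000
traj = np.empty((n, 3)); traj[0] = (1.0, 1.0, 1.0)
for i in range(n - 1):
    s = traj[i]
    k1 = roessler(s); k2 = roessler(s + dt / 2 * k1)
    k3 = roessler(s + dt / 2 * k2); k4 = roessler(s + dt * k3)
    traj[i + 1] = s + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

# Stage-by-stage estimation: with all component series known, the y-equation
# dy/dt = x + a*y yields a by least squares (central differences for dy/dt).
x, y, z = traj.T
dy = (y[2:] - y[:-2]) / (2 * dt)
a_hat = np.sum((dy - x[1:-1]) * y[1:-1]) / np.sum(y[1:-1] ** 2)
print(a_hat)   # close to the true value 0.2
```

The same pattern would then be repeated for b and c using the z-equation, which is what makes the search one-dimensional at each stage.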

  17. MODIS Data Assimilation in the CROPGRO model for improving soybean yield estimations

    NASA Astrophysics Data System (ADS)

    Richetti, J.; Monsivais-Huertero, A.; Ahmad, I.; Judge, J.

    2017-12-01

    Soybean is one of the main agricultural commodities in the world, so better estimates of its agricultural production are important. Improving soybean crop models in Brazil is crucial for better understanding of the soybean market and for enhancing decision making: Brazil is the second-largest soybean producer in the world, and the state of Paraná is responsible for almost 20% of that production and would by itself be the fourth-largest soybean producer in the world. Data assimilation techniques provide a method to improve the spatio-temporal continuity of crop estimates through the integration of remotely sensed observations and crop growth models. This study aims to use MODIS EVI to improve DSSAT-CROPGRO soybean yield estimations in Paraná state, southern Brazil. The method uses the Ensemble Kalman filter, which assimilates the combined MODIS Terra and Aqua products (MOD13Q1 and MYD13Q1) into the CROPGRO model to improve the agricultural production estimates through updates of light interception data over time. Expected results will be validated against monitored commercial farms during the period 2013-2014.
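    A single Ensemble Kalman filter update step can be sketched as below. The state variable (an LAI-like quantity), the linear EVI observation operator, and all numbers are hypothetical stand-ins; the actual CROPGRO state and observation model are far richer.

```python
import numpy as np

rng = np.random.default_rng(1)
n_ens = 50
# Hypothetical ensemble of a model state (e.g. a leaf-area-index-like value)
lai = rng.normal(3.0, 0.5, n_ens)

# Assumed linear observation operator mapping the state to an EVI-like index
def h(x):
    return 0.12 * x + 0.1

evi_obs, obs_var = 0.55, 0.02 ** 2            # one MODIS EVI observation
perturbed_obs = evi_obs + rng.normal(0, np.sqrt(obs_var), n_ens)

# EnKF analysis step: K = cov(x, Hx) / (var(Hx) + R), applied member-wise
hx = h(lai)
k_gain = np.cov(lai, hx)[0, 1] / (np.var(hx, ddof=1) + obs_var)
lai_updated = lai + k_gain * (perturbed_obs - hx)
print(lai_updated.mean())   # pulled toward the state implied by the EVI obs
```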

  18. Local-world and cluster-growing weighted networks with controllable clustering

    NASA Astrophysics Data System (ADS)

    Yang, Chun-Xia; Tang, Min-Xuan; Tang, Hai-Qiang; Deng, Qiang-Qiang

    2014-12-01

    We constructed an improved weighted network model by introducing a local-world selection mechanism and a triangle coupling mechanism into the traditional BBV model. The model gives power-law distributions of degree, strength and edge weight, and presents a linear relationship both between degree and strength and between degree and the clustering coefficient. In particular, the model can make the strength grow faster than the degree. Besides, the model is sounder and more efficient in tuning the clustering coefficient than the original BBV model. Finally, based on our improved model, we analyze the virus spreading process and find that reducing the size of the local-world has a strong inhibitory effect on virus spread.

  19. Discrete Wavelet Transform-Based Whole-Spectral and Subspectral Analysis for Improved Brain Tumor Clustering Using Single Voxel MR Spectroscopy.

    PubMed

    Yang, Guang; Nawaz, Tahir; Barrick, Thomas R; Howe, Franklyn A; Slabaugh, Greg

    2015-12-01

    Many approaches have been considered for automatic grading of brain tumors by means of pattern recognition with magnetic resonance spectroscopy (MRS). Providing an improved technique which can assist clinicians in accurately identifying brain tumor grades is our main objective. The proposed technique, which is based on the discrete wavelet transform (DWT) of whole-spectral or subspectral information of key metabolites, combined with unsupervised learning, inspects the separability of the wavelet features extracted from the MRS signal to aid the clustering. In total, we included 134 short echo time single voxel MRS spectra (SV MRS) in our study, covering normal controls, low grade and high grade tumors. The combination of DWT-based whole-spectral or subspectral analysis and unsupervised clustering achieved an overall clustering accuracy of 94.8% and a balanced error rate of 7.8%. To the best of our knowledge, this is the first study using DWT combined with unsupervised learning to cluster brain SV MRS. Instead of dimensionality reduction on SV MRS or feature selection using model fitting, our study provides an alternative method of extracting features to obtain promising clustering results.
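    The pipeline (DWT feature extraction followed by unsupervised clustering) can be sketched as below. A hand-rolled Haar DWT and k-means stand in for the paper's wavelet family and clustering choices, and the synthetic "spectra" with a peak at two different positions are invented for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
t = np.linspace(0, 1, 256)

# Hypothetical stand-in for SV MRS spectra: two groups whose "metabolite
# peak" sits at different positions, plus noise.
def spectrum(center):
    return np.exp(-((t - center) ** 2) / 0.002) + 0.05 * rng.normal(size=t.size)

spectra = [spectrum(0.3) for _ in range(30)] + [spectrum(0.6) for _ in range(30)]

def haar_dwt_features(x, levels=4):
    """Whole-spectral DWT features: concatenated details + final approximation."""
    feats, a = [], np.asarray(x)
    for _ in range(levels):
        pairs = a.reshape(-1, 2)
        feats.append((pairs[:, 0] - pairs[:, 1]) / np.sqrt(2))  # detail coeffs
        a = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)            # approximation
    feats.append(a)
    return np.concatenate(feats)

X = np.array([haar_dwt_features(s) for s in spectra])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)   # the two simulated groups fall into different clusters
```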

  20. Estimation of Comfort/Discomfort Based on EEG in Massage by Use of Clustering according to Correlation and Incremental Learning Type NN

    NASA Astrophysics Data System (ADS)

    Teramae, Tatsuya; Kushida, Daisuke; Takemori, Fumiaki; Kitamura, Akira

    The authors previously proposed an estimation method combining the k-means algorithm and a neural network (NN) for evaluating massage. However, this estimation method has the problem that the discrimination ratio decreases for a new user. There are two causes of this problem. One is the poor generalization of the NN. The other is that the clustering result produced by the k-means algorithm does not have a high correlation coefficient within each class. This research therefore proposes a k-means algorithm based on the correlation coefficient, together with incremental learning for the NN. The proposed k-means algorithm includes an evaluation function based on the correlation coefficient. In incremental learning, the NN is trained on new data with weights initialized from the existing data. The effectiveness of the proposed methods is verified by estimation results using EEG data recorded while a subject is given a massage.

  1. Nanospectroscopy of thiacyanine dye molecules adsorbed on silver nanoparticle clusters

    NASA Astrophysics Data System (ADS)

    Ralević, Uroš; Isić, Goran; Anicijević, Dragana Vasić; Laban, Bojana; Bogdanović, Una; Lazović, Vladimir M.; Vodnik, Vesna; Gajić, Radoš

    2018-03-01

    The adsorption of thiacyanine dye molecules on citrate-stabilized silver nanoparticle clusters drop-cast onto freshly cleaved mica or highly oriented pyrolytic graphite surfaces is examined using colocalized surface-enhanced Raman spectroscopy and atomic force microscopy. The incidence of dye Raman signatures in photoluminescence hotspots identified around nanoparticle clusters is considered for both citrate- and borate-capped silver nanoparticles and found to be substantially lower in the former case, suggesting that the citrate anions impede the efficient dye adsorption. Rigorous numerical simulations of light scattering on random nanoparticle clusters are used for estimating the electromagnetic enhancement and elucidating the hotspot formation mechanism. The majority of the enhanced Raman signal, estimated to be more than 90%, is found to originate from the nanogaps between adjacent nanoparticles in the cluster, regardless of the cluster size and geometry.

  2. Competitive repetition suppression (CoRe) clustering: a biologically inspired learning model with application to robust clustering.

    PubMed

    Bacciu, Davide; Starita, Antonina

    2008-11-01

    Determining a compact neural coding for a set of input stimuli is an issue that encompasses several biological memory mechanisms as well as various artificial neural network models. In particular, establishing the optimal network structure is still an open problem when dealing with unsupervised learning models. In this paper, we introduce a novel learning algorithm, named competitive repetition-suppression (CoRe) learning, inspired by a cortical memory mechanism called repetition suppression (RS). We show how such a mechanism is used, at various levels of the cerebral cortex, to generate compact neural representations of the visual stimuli. From the general CoRe learning model, we derive a clustering algorithm, named CoRe clustering, that can automatically estimate the unknown cluster number from the data without using a priori information concerning the input distribution. We illustrate how CoRe clustering, besides its biological plausibility, possesses strong theoretical properties in terms of robustness to noise and outliers, and we provide an error function describing CoRe learning dynamics. Such a description is used to analyze CoRe's relationships with state-of-the-art clustering models and to highlight CoRe's similarity to rival penalized competitive learning (RPCL), showing how CoRe extends such a model by strengthening the rival penalization estimation by means of loss functions from robust statistics.

  3. Dynamical mass estimates in M13

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Leonard, P.J.T.; Richer, H.B.; Fahlman, G.G.

    We have used the proper motion data of Cudworth & Monet to make mass estimates in the globular cluster M13 by solving the spherical Jeans equation. We find a mass inside a spherical shell centered on the cluster with a radius corresponding to 390 arcsec on the sky of 5.5 or 7.6 × 10^5 M⊙, depending on the adopted cluster distance. This large dynamical mass estimate, together with the observed fact that the mass function of M13 is rising steeply at the low-mass end, suggests that much of the cluster mass may be in the form of low-mass stars and brown dwarfs.
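    For a sense of scale, a crude virial-type estimator M ~ 5 σ² R / G reproduces the quoted order of magnitude. The velocity dispersion and distance below are illustrative assumptions, not values from the record, and this back-of-the-envelope formula is a stand-in for the full Jeans analysis.

```python
# Rough dynamical mass scale for a globular cluster (illustrative values)
G = 4.301e-3                      # gravitational constant, pc (km/s)^2 / Msun
sigma = 7.0                       # assumed velocity dispersion, km/s
d_pc = 7200.0                     # assumed cluster distance, pc
r_pc = 390.0 / 206265.0 * d_pc    # 390 arcsec converted to pc at that distance

m_dyn = 5.0 * sigma ** 2 * r_pc / G   # virial-type estimator M ~ 5 sigma^2 R / G
print(f"{m_dyn:.2e}")                 # a few x 10^5 Msun, the same scale as above
```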

  5. Using Hierarchical Cluster Models to Systematically Identify Groups of Jobs With Similar Occupational Questionnaire Response Patterns to Assist Rule-Based Expert Exposure Assessment in Population-Based Studies

    PubMed Central

    Friesen, Melissa C.; Shortreed, Susan M.; Wheeler, David C.; Burstyn, Igor; Vermeulen, Roel; Pronk, Anjoeka; Colt, Joanne S.; Baris, Dalsu; Karagas, Margaret R.; Schwenn, Molly; Johnson, Alison; Armenti, Karla R.; Silverman, Debra T.; Yu, Kai

    2015-01-01

    Objectives: Rule-based expert exposure assessment based on questionnaire response patterns in population-based studies improves the transparency of the decisions. The number of unique response patterns, however, can be nearly equal to the number of jobs. An expert may reduce the number of patterns that need assessment using expert opinion, but each expert may identify different patterns of responses that identify an exposure scenario. Here, hierarchical clustering methods are proposed as a systematic data reduction step to reproducibly identify similar questionnaire response patterns prior to obtaining expert estimates. As a proof-of-concept, we used hierarchical clustering methods to identify groups of jobs (clusters) with similar responses to diesel exhaust-related questions and then evaluated whether the jobs within a cluster had similar (previously assessed) estimates of occupational diesel exhaust exposure. Methods: Using the New England Bladder Cancer Study as a case study, we applied hierarchical cluster models to the diesel-related variables extracted from the occupational history and job- and industry-specific questionnaires (modules). Cluster models were separately developed for two subsets: (i) 5395 jobs with ≥1 variable extracted from the occupational history indicating a potential diesel exposure scenario, but without a module with diesel-related questions; and (ii) 5929 jobs with both occupational history and module responses to diesel-relevant questions. For each subset, we varied the numbers of clusters extracted from the cluster tree developed for each model from 100 to 1000 groups of jobs. Using previously made estimates of the probability (ordinal), intensity (µg m−3 respirable elemental carbon), and frequency (hours per week) of occupational exposure to diesel exhaust, we examined the similarity of the exposure estimates for jobs within the same cluster in two ways. First, the clusters’ homogeneity (defined as >75% with the same estimate
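    The data-reduction step above (hierarchical clustering of questionnaire response patterns, then cutting the tree into a chosen number of groups) can be sketched with SciPy. The binary responses, pattern count and noise rate below are invented for illustration, not taken from the study.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(7)
# Hypothetical binary responses of 200 jobs to 6 diesel-related questions,
# generated from three underlying response patterns plus response noise.
patterns = np.array([[1, 1, 0, 0, 0, 0],
                     [0, 0, 1, 1, 0, 0],
                     [0, 0, 0, 0, 1, 1]], dtype=float)
which = rng.integers(0, 3, 200)
jobs = np.clip(patterns[which] + (rng.random((200, 6)) < 0.05), 0, 1)

# Build the cluster tree, then extract a chosen number of clusters from it
tree = linkage(jobs, method="ward")
labels = fcluster(tree, t=3, criterion="maxclust")
print(np.bincount(labels))   # three groups of similar response patterns
```

Varying `t` mirrors the study's extraction of 100 to 1000 groups from the same tree.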

  6. Strategies for Improving Vaccine Delivery: A Cluster-Randomized Trial.

    PubMed

    Fu, Linda Y; Zook, Kathleen; Gingold, Janet A; Gillespie, Catherine W; Briccetti, Christine; Cora-Bramble, Denice; Joseph, Jill G; Haimowitz, Rachel; Moon, Rachel Y

    2016-06-01

    New emphasis on and requirements for demonstrating health care quality have increased the need for evidence-based methods to disseminate practice guidelines. With regard to impact on pediatric immunization coverage, we aimed to compare a financial incentive program (pay-for-performance [P4P]) and a virtual quality improvement technical support (QITS) learning collaborative. This single-blinded (to outcomes assessor), cluster-randomized trial was conducted among unaffiliated pediatric practices across the United States from June 2013 to June 2014. Practices received either the P4P or QITS intervention. All practices received a Vaccinator Toolkit. P4P practices participated in a tiered financial incentives program for immunization coverage improvement. QITS practices participated in a virtual learning collaborative. Primary outcome was percentage of all needed vaccines received (PANVR). We also assessed immunization up-to-date (UTD) status. Data were analyzed from 3,147 patient records from 32 practices. Practices in the study arms reported similar QI activities (∼6 to 7 activities). We found no difference in PANVR between P4P and QITS (mean ± SE, 90.7% ± 1.1% vs 86.1% ± 1.3%, P = 0.46). Likewise, there was no difference in odds of being UTD between study arms (adjusted odds ratio 1.02, 95% confidence interval 0.68 to 1.52, P = .93). In within-group analysis, patients in both arms experienced nonsignificant increases in PANVR. Similarly, the change in adjusted odds of UTD over time was modest and nonsignificant for P4P but reached significance in the QITS arm (adjusted odds ratio 1.28, 95% confidence interval 1.02 to 1.60, P = .03). Participation in either a financial incentives program or a virtual learning collaborative led to self-reported improvements in immunization practices but minimal change in objectively measured immunization coverage. Copyright © 2016 by the American Academy of Pediatrics.

  7. Rates of collapse and evaporation of globular clusters

    NASA Technical Reports Server (NTRS)

    Hut, Piet; Djorgovski, S.

    1992-01-01

    Observational estimates of the dynamical relaxation times of Galactic globular clusters are used here to estimate the present rate at which core collapse and evaporation are occurring in them. A core collapse rate of 2 +/- 1 per Gyr is found, which for a Galactic age of about 12 Gyr agrees well with the fact that 27 clusters have surface brightness profiles with the morphology expected for the postcollapse phase. A destruction and evaporation rate of 5 +/- 3 per Gyr is found, suggesting that a significant fraction of the Galaxy's original complement of globular clusters have perished through the combined effects of mechanisms such as relaxation-driven evaporation and shocking due to interaction with the Galactic disk and bulge.

  8. A note on the kappa statistic for clustered dichotomous data.

    PubMed

    Zhou, Ming; Yang, Zhao

    2014-06-30

    The kappa statistic is widely used to assess the agreement between two raters. Motivated by a simulation-based cluster bootstrap method to calculate the variance of the kappa statistic for clustered physician-patient dichotomous data, we investigate its special correlation structure and develop a new simple and efficient data generation algorithm. For clustered physician-patient dichotomous data, based on the delta method and its special covariance structure, we propose a semi-parametric variance estimator for the kappa statistic. An extensive Monte Carlo simulation study is performed to evaluate the performance of the new proposal and five existing methods with respect to the empirical coverage probability, root-mean-square error, and average width of the 95% confidence interval for the kappa statistic. The variance estimator ignoring the dependence within a cluster is generally inappropriate, and the variance estimators from the new proposal, the bootstrap-based methods, and the sampling-based delta method perform reasonably well for at least a moderately large number of clusters (e.g., the number of clusters K ⩾ 50). The new proposal and the sampling-based delta method provide convenient tools for efficient computation and non-simulation-based alternatives to the existing bootstrap-based methods. Moreover, the new proposal has acceptable performance even when the number of clusters is as small as K = 25. To illustrate the practical application of all the methods, one psychiatric research dataset and two simulated clustered physician-patient dichotomous datasets are analyzed. Copyright © 2014 John Wiley & Sons, Ltd.
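    The cluster bootstrap the abstract takes as its starting point resamples whole clusters (physicians) rather than individual patients, so the within-cluster dependence is preserved. A minimal sketch, with invented clustered ratings:

```python
import numpy as np

def kappa(r1, r2):
    """Cohen's kappa for two dichotomous raters."""
    r1, r2 = np.asarray(r1), np.asarray(r2)
    po = (r1 == r2).mean()                      # observed agreement
    p1, p2 = r1.mean(), r2.mean()
    pe = p1 * p2 + (1 - p1) * (1 - p2)          # chance agreement
    return (po - pe) / (1 - pe)

rng = np.random.default_rng(3)
# Hypothetical clustered data: 50 physicians (clusters), 5-11 patients each,
# two raters who each match the true status 85% of the time.
clusters = []
for _ in range(50):
    n = rng.integers(5, 12)
    truth = rng.random(n) < 0.4
    r1 = np.where(rng.random(n) < 0.85, truth, ~truth)
    r2 = np.where(rng.random(n) < 0.85, truth, ~truth)
    clusters.append((r1, r2))

# Cluster bootstrap: resample whole clusters, recompute kappa each time
boots = []
for _ in range(500):
    idx = rng.integers(0, len(clusters), len(clusters))
    a = np.concatenate([clusters[i][0] for i in idx])
    b = np.concatenate([clusters[i][1] for i in idx])
    boots.append(kappa(a, b))
print(np.std(boots))   # bootstrap standard error of kappa
```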

  9. Selection of the Maximum Spatial Cluster Size of the Spatial Scan Statistic by Using the Maximum Clustering Set-Proportion Statistic.

    PubMed

    Ma, Yue; Yin, Fei; Zhang, Tao; Zhou, Xiaohua Andrew; Li, Xiaosong

    2016-01-01

    Spatial scan statistics are widely used in various fields. The performance of these statistics is influenced by parameters, such as maximum spatial cluster size, and can be improved by parameter selection using performance measures. Current performance measures are based on the presence of clusters and are thus inapplicable to data sets without known clusters. In this work, we propose a novel overall performance measure called maximum clustering set-proportion (MCS-P), which is based on the likelihood of the union of detected clusters and the applied dataset. MCS-P was compared with existing performance measures in a simulation study to select the maximum spatial cluster size. Results of other performance measures, such as sensitivity and misclassification, suggest that the spatial scan statistic achieves accurate results in most scenarios with the maximum spatial cluster sizes selected using MCS-P. Given that previously known clusters are not required in the proposed strategy, selection of the optimal maximum cluster size with MCS-P can improve the performance of the scan statistic in applications without identified clusters.

  10. Selection of the Maximum Spatial Cluster Size of the Spatial Scan Statistic by Using the Maximum Clustering Set-Proportion Statistic

    PubMed Central

    Ma, Yue; Yin, Fei; Zhang, Tao; Zhou, Xiaohua Andrew; Li, Xiaosong

    2016-01-01

    Spatial scan statistics are widely used in various fields. The performance of these statistics is influenced by parameters, such as maximum spatial cluster size, and can be improved by parameter selection using performance measures. Current performance measures are based on the presence of clusters and are thus inapplicable to data sets without known clusters. In this work, we propose a novel overall performance measure called maximum clustering set–proportion (MCS-P), which is based on the likelihood of the union of detected clusters and the applied dataset. MCS-P was compared with existing performance measures in a simulation study to select the maximum spatial cluster size. Results of other performance measures, such as sensitivity and misclassification, suggest that the spatial scan statistic achieves accurate results in most scenarios with the maximum spatial cluster sizes selected using MCS-P. Given that previously known clusters are not required in the proposed strategy, selection of the optimal maximum cluster size with MCS-P can improve the performance of the scan statistic in applications without identified clusters. PMID:26820646

  11. Information Theory and Voting Based Consensus Clustering for Combining Multiple Clusterings of Chemical Structures.

    PubMed

    Saeed, Faisal; Salim, Naomie; Abdo, Ammar

    2013-07-01

    Many consensus clustering methods have been applied in different areas such as pattern recognition, machine learning, information theory and bioinformatics. However, few methods have been used for clustering chemical compounds. In this paper, an information theory and voting based algorithm (the Adaptive Cumulative Voting-based Aggregation Algorithm, A-CVAA) was examined for combining multiple clusterings of chemical structures. The effectiveness of the clusterings was evaluated based on the ability of the clustering method to separate active from inactive molecules in each cluster, and the results were compared with Ward's method. The MDL Drug Data Report (MDDR) chemical dataset and the Maximum Unbiased Validation (MUV) dataset were used. Experiments suggest that the adaptive cumulative voting-based consensus method can improve the effectiveness of combining multiple clusterings of chemical structures. Copyright © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
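    A-CVAA's details are not given in the abstract, so the sketch below substitutes a generic co-association (evidence accumulation) consensus over several k-means base clusterings; the 2-D "compound descriptors" and all parameters are invented for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(4)
# Hypothetical 2-D stand-in for compound descriptors: two separated blobs
X = np.vstack([rng.normal(0, 0.3, (40, 2)), rng.normal(2, 0.3, (40, 2))])

# An ensemble of base clusterings with varying k ("multiple clusterings")
partitions = [KMeans(n_clusters=k, n_init=5, random_state=k).fit_predict(X)
              for k in (2, 3, 4, 5)]

# Co-association consensus: how often does each pair co-cluster?
n = len(X)
co = np.zeros((n, n))
for p in partitions:
    co += (p[:, None] == p[None, :])
co /= len(partitions)

# Cut a hierarchy on the consensus distances to get the final clustering
dist = squareform(1.0 - co, checks=False)
consensus = fcluster(linkage(dist, method="average"), t=2, criterion="maxclust")
print(consensus[:40], consensus[40:])   # the two blobs end up in two clusters
```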

  12. The Effect of Clustering on Estimations of the UV Ionizing Background from the Proximity Effect

    NASA Astrophysics Data System (ADS)

    Pascarelle, S. M.; Lanzetta, K. M.; Chen, H. W.

    1999-09-01

    There have been several determinations of the ionizing background using the proximity effect observed in the distribution of Lyman-alpha absorption lines in the spectra of QSOs at high redshift. It is usually assumed that the distribution of lines should be the same at very small impact parameters to the QSO as it is at large impact parameters, and that any decrease in line density at small impact parameters is due to ionizing radiation from the QSO. However, if these Lyman-alpha absorption lines arise in galaxies (Lanzetta et al. 1995; Chen et al. 1998), then the strength of the proximity effect may have been underestimated in previous work, since galaxies are known to cluster around QSOs. Therefore, the UV background estimates have likely been overestimated by the same factor.

  13. Improvement of Accuracy for Background Noise Estimation Method Based on TPE-AE

    NASA Astrophysics Data System (ADS)

    Itai, Akitoshi; Yasukawa, Hiroshi

    This paper proposes a method of background noise estimation based on the tensor product expansion with a median and a Monte Carlo simulation. We have previously shown that a tensor product expansion with the absolute error method is effective for estimating background noise; however, the background noise might not be estimated properly by the conventional method. In this paper, it is shown that the estimation accuracy can be improved by using the proposed methods.

  14. Prediction of line failure fault based on weighted fuzzy dynamic clustering and improved relational analysis

    NASA Astrophysics Data System (ADS)

    Meng, Xiaocheng; Che, Renfei; Gao, Shi; He, Juntao

    2018-04-01

    With the advent of the big data age, power system research has entered a new stage. At present, the main application of big data in power systems is early-warning analysis for power equipment: by collecting relevant historical fault data, system security is improved by predicting the early warning and failure rate of different kinds of equipment under certain relational factors. In this paper, a method for line failure rate warning is proposed. Firstly, fuzzy dynamic clustering is carried out based on the collected historical information. Considering the imbalance between the attributes, the coefficient of variation is used to assign the corresponding weights, and weighted fuzzy clustering is then used to handle the data more effectively. Then, by analyzing the basic idea and basic properties of relational analysis model theory, the gray relational model is improved by combining the slope model and the Deng model, and the incremental components of the two sequences are also incorporated into the gray relational model to obtain the gray relational degree between the various samples. The failure rate is predicted according to the weighting principle. Finally, the concrete process is illustrated by an example, and the validity and superiority of the proposed method are verified.
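    For reference, the classic Deng gray relational degree, one ingredient the abstract builds on, can be sketched as below; the slope term and the exact weighting of the improved model are not specified in the abstract, and the indicator series here are invented.

```python
import numpy as np

def deng_gray_degrees(ref, seqs, rho=0.5):
    """Deng's gray relational degree of each comparison series vs a reference.
    rho is the conventional distinguishing coefficient (0.5 by default)."""
    diffs = [np.abs(np.asarray(ref) - np.asarray(s)) for s in seqs]
    dmin = min(d.min() for d in diffs)   # global min/max over all series
    dmax = max(d.max() for d in diffs)
    return [float(((dmin + rho * dmax) / (d + rho * dmax)).mean())
            for d in diffs]

# Hypothetical normalized indicator series: a reference and two candidates
ref = [0.9, 0.8, 0.85, 0.95, 0.9]
a   = [0.88, 0.79, 0.86, 0.93, 0.9]    # close to the reference
b   = [0.4, 0.9, 0.3, 0.8, 0.2]        # far from the reference
ga, gb = deng_gray_degrees(ref, [a, b])
print(ga, gb)   # the closer series gets the higher relational degree
```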

  15. Proportion estimation using prior cluster purities

    NASA Technical Reports Server (NTRS)

    Terrell, G. R. (Principal Investigator)

    1980-01-01

    The prior distribution of CLASSY component purities is studied, and this information incorporated into maximum likelihood crop proportion estimators. The method is tested on Transition Year spring small grain segments.

  16. Dictionary-based fiber orientation estimation with improved spatial consistency.

    PubMed

    Ye, Chuyang; Prince, Jerry L

    2018-02-01

    Diffusion magnetic resonance imaging (dMRI) has enabled in vivo investigation of white matter tracts. Fiber orientation (FO) estimation is a key step in tract reconstruction and has been a popular research topic in dMRI analysis. In particular, the sparsity assumption has been used in conjunction with a dictionary-based framework to achieve reliable FO estimation with a reduced number of gradient directions. Because image noise can have a deleterious effect on the accuracy of FO estimation, previous works have incorporated spatial consistency of FOs in the dictionary-based framework to improve the estimation. However, because FOs are only indirectly determined from the mixture fractions of dictionary atoms and not modeled as variables in the objective function, these methods do not incorporate FO smoothness directly, and their ability to produce smooth FOs could be limited. In this work, we propose an improvement to Fiber Orientation Reconstruction using Neighborhood Information (FORNI), which we call FORNI+; this method estimates FOs in a dictionary-based framework where FO smoothness is better enforced than in FORNI alone. We describe an objective function that explicitly models the actual FOs and the mixture fractions of dictionary atoms. Specifically, it consists of data fidelity between the observed signals and the signals represented by the dictionary, pairwise FO dissimilarity that encourages FO smoothness, and weighted ℓ1-norm terms that ensure the consistency between the actual FOs and the FO configuration suggested by the dictionary representation. The FOs and mixture fractions are then jointly estimated by minimizing the objective function using an iterative alternating optimization strategy. FORNI+ was evaluated on a simulation phantom, a physical phantom, and real brain dMRI data. In particular, in the real brain dMRI experiment, we have qualitatively and quantitatively evaluated the reproducibility of the proposed method. Results demonstrate that

  17. VizieR Online Data Catalog: Star clusters distances and extinctions (Buckner+, 2013)

    NASA Astrophysics Data System (ADS)

    Buckner, A. S. M.; Froebrich, D.

    2014-10-01

    Determining star cluster distances is essential to analyse their properties and distribution in the Galaxy. In particular, it is desirable to have a reliable, purely photometric distance estimation method for large samples of newly discovered cluster candidates e.g. from the Two Micron All Sky Survey, the UK Infrared Deep Sky Survey Galactic Plane Survey and VVV. Here, we establish an automatic method to estimate distances and reddening from near-infrared photometry alone, without the use of isochrone fitting. We employ a decontamination procedure of JHK photometry to determine the density of stars foreground to clusters and a galactic model to estimate distances. We then calibrate the method using clusters with known properties. This allows us to establish distance estimates with better than 40 percent accuracy. We apply our method to determine the extinction and distance values to 378 known open clusters and 397 cluster candidates from the list of Froebrich, Scholz & Raftery (2007MNRAS.374..399F, Cat. J/MNRAS/374/399). We find that the sample is biased towards clusters of a distance of approximately 3kpc, with typical distances between 2 and 6kpc. Using the cluster distances and extinction values, we investigate how the average extinction per kiloparsec distance changes as a function of the Galactic longitude. We find a systematic dependence that can be approximated by AH(l) [mag/kpc] = 0.10 + 0.001 × |l − 180°|/° for regions more than 60° from the Galactic Centre. (1 data file).

  18. An improved initialization center k-means clustering algorithm based on distance and density

    NASA Astrophysics Data System (ADS)

    Duan, Yanling; Liu, Qun; Xia, Shuyin

    2018-04-01

    The k-means algorithm selects its initial cluster centers at random, so its results are sensitive to outlier samples and unstable across repeated runs. To address this, a center initialization method based on large distance and high density is proposed. The reciprocal of the weighted average distance is used to represent sample density, and samples with both larger distance and higher density are selected as the initial cluster centers to optimize the clustering results. A clustering evaluation method based on distance and density is then designed to verify the feasibility and practicality of the algorithm. Experimental results on UCI data sets show that the algorithm achieves a degree of stability and practicality.
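The initialization idea can be sketched as follows. This is a minimal illustration of picking centers that are both dense and far from already-chosen centers; the exact weighting used by Duan et al. is an assumption here, not taken from the paper:

```python
import numpy as np

def density_distance_init(X, k):
    """Pick k initial k-means centers favoring high-density points that
    are far from already-chosen centers. Illustrative sketch only."""
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # density ~ reciprocal of a point's average distance to all others
    density = 1.0 / (dist.sum(axis=1) / (n - 1))
    centers = [int(np.argmax(density))]          # densest point first
    for _ in range(k - 1):
        d_to_centers = dist[:, centers].min(axis=1)
        # favor points that are both far from current centers and dense
        score = d_to_centers * density
        score[centers] = -np.inf                 # never re-pick a center
        centers.append(int(np.argmax(score)))
    return X[centers]
```

The returned centers would then seed an ordinary k-means loop, replacing the random initialization that the abstract identifies as the source of instability.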

  19. Improved Goldstein Interferogram Filter Based on Local Fringe Frequency Estimation.

    PubMed

    Feng, Qingqing; Xu, Huaping; Wu, Zhefeng; You, Yanan; Liu, Wei; Ge, Shiqi

    2016-11-23

    The quality of an interferogram, which is degraded by various sources of phase noise, strongly affects subsequent InSAR processing steps such as phase unwrapping. For Interferometric SAR (InSAR) geophysical measurements, such as height or displacement, phase filtering is therefore an essential step. In this work, an improved Goldstein interferogram filter is proposed to suppress the phase noise while preserving the fringe edges. First, an adaptive filtering step, performed before frequency estimation, is employed to improve the estimation accuracy. Subsequently, to preserve the fringe characteristics, the estimated fringe frequency in each fixed filtering patch is removed from the original noisy phase. Then, the residual phase is smoothed by the modified Goldstein filter, with its parameter alpha dependent on both the coherence map and the residual phase frequency. Finally, the filtered residual phase and the removed fringe frequency are combined to generate the filtered interferogram, minimizing the loss of signal while reducing the noise level. The effectiveness of the proposed method is verified by experimental results based on both simulated and real data.
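For context, the classic Goldstein filter that this work modifies weights each patch's spectrum by a power of its smoothed magnitude. The sketch below shows only that baseline core (the paper's improvements, fringe-frequency removal and per-patch adaptive alpha, are not reproduced); the box smoother is an assumed choice:

```python
import numpy as np

def goldstein_filter_patch(phase, alpha=0.5, r=1):
    """Baseline Goldstein filtering of one wrapped-phase patch.
    Weights the 2-D spectrum by its smoothed magnitude raised to alpha."""
    z = np.exp(1j * phase)                      # wrapped phase -> complex
    Z = np.fft.fft2(z)
    S = np.abs(Z)
    # smooth the spectrum magnitude with a (2r+1)x(2r+1) box filter
    S = sum(np.roll(np.roll(S, i, 0), j, 1)
            for i in range(-r, r + 1)
            for j in range(-r, r + 1)) / (2 * r + 1) ** 2
    W = (S / S.max()) ** alpha                  # Goldstein weighting
    return np.angle(np.fft.ifft2(W * Z))        # filtered wrapped phase
```

Larger alpha filters more aggressively; the improved method makes alpha depend on coherence and residual fringe frequency instead of fixing it.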

  20. Improved Goldstein Interferogram Filter Based on Local Fringe Frequency Estimation

    PubMed Central

    Feng, Qingqing; Xu, Huaping; Wu, Zhefeng; You, Yanan; Liu, Wei; Ge, Shiqi

    2016-01-01

    The quality of an interferogram, which is degraded by various sources of phase noise, strongly affects subsequent InSAR processing steps such as phase unwrapping. For Interferometric SAR (InSAR) geophysical measurements, such as height or displacement, phase filtering is therefore an essential step. In this work, an improved Goldstein interferogram filter is proposed to suppress the phase noise while preserving the fringe edges. First, an adaptive filtering step, performed before frequency estimation, is employed to improve the estimation accuracy. Subsequently, to preserve the fringe characteristics, the estimated fringe frequency in each fixed filtering patch is removed from the original noisy phase. Then, the residual phase is smoothed by the modified Goldstein filter, with its parameter alpha dependent on both the coherence map and the residual phase frequency. Finally, the filtered residual phase and the removed fringe frequency are combined to generate the filtered interferogram, minimizing the loss of signal while reducing the noise level. The effectiveness of the proposed method is verified by experimental results based on both simulated and real data. PMID:27886081

  1. Star clusters in the Magellanic Clouds - I. Parametrization and classification of 1072 clusters in the LMC

    NASA Astrophysics Data System (ADS)

    Nayak, P. K.; Subramaniam, A.; Choudhury, S.; Indu, G.; Sagar, Ram

    2016-12-01

    We have introduced a semi-automated quantitative method to estimate the age and reddening of 1072 star clusters in the Large Magellanic Cloud (LMC) using the Optical Gravitational Lensing Experiment III survey data. This study brings out 308 newly parametrized clusters. In a first-of-its-kind analysis, the LMC clusters are classified into groups based on richness/mass as very poor, poor, moderate and rich clusters, similar to the classification scheme of open clusters in the Galaxy. A major cluster formation episode is found to have occurred at 125 ± 25 Myr in the inner LMC. The bar region of the LMC appears prominently in the age range 60-250 Myr and is found to have a relatively higher concentration of poor and moderate clusters. The eastern and western ends of the bar are found to form clusters first, with cluster formation later propagating to the central part. We demonstrate that there is a significant difference in the distribution of clusters as a function of mass, using a movie based on the propagation (in space and time) of cluster formation in the various groups. The importance of including low-mass clusters in the cluster formation history is demonstrated. The catalogue with parameters, classification, and cleaned and isochrone-fitted colour-magnitude diagrams of 1072 clusters, which are available as online material, can be further used to understand the hierarchical formation of clusters in selected regions of the LMC.

  2. A systematic review of cluster randomised trials in residential facilities for older people suggests how to improve quality.

    PubMed

    Diaz-Ordaz, Karla; Froud, Robert; Sheehan, Bart; Eldridge, Sandra

    2013-10-22

    Previous reviews of cluster randomised trials have been critical of the quality of the trials reviewed, but none has explored determinants of the quality of these trials in a specific field over an extended period of time. Recent work suggests that correct conduct and reporting of these trials may require more than published guidelines. In this review, our aim was to assess the quality of cluster randomised trials conducted in residential facilities for older people, and to determine whether (1) statistician involvement in the trial and (2) strength of journal endorsement of the Consolidated Standards of Reporting Trials (CONSORT) statement influence quality. We systematically identified trials randomising residential facilities for older people, or parts thereof, without language restrictions, up to the end of 2010, using National Library of Medicine (Medline) via PubMed and hand-searching. We based quality assessment criteria largely on the extended CONSORT statement for cluster randomised trials. We assessed statistician involvement based on statistician co-authorship, and strength of journal endorsement of the CONSORT statement from journal websites. 73 trials met our inclusion criteria. Of these, 20 (27%) reported accounting for clustering in sample size calculations and 54 (74%) in the analyses. In 29 trials (40%), the methods used to identify/recruit participants were judged by us to have potentially caused bias, or the reporting was too unclear for us to reach a conclusion. Some elements of quality improved over time but this appeared not to be related to the publication of the extended CONSORT statement for these trials. Trials with statistician/epidemiologist co-authors were more likely to account for clustering in sample size calculations (unadjusted odds ratio 5.4, 95% confidence interval 1.1 to 26.0) and analyses (unadjusted OR 3.2, 1.2 to 8.5). Journal endorsement of the CONSORT statement was not associated with trial quality. Despite international attempts to improve

  3. Potential Improvements to Remote Primary Productivity Estimation in the Southern California Current System

    NASA Astrophysics Data System (ADS)

    Jacox, M.; Edwards, C. A.; Kahru, M.; Rudnick, D. L.; Kudela, R. M.

    2012-12-01

    A 26-year record of depth integrated primary productivity (PP) in the Southern California Current System (SCCS) is analyzed with the goal of improving satellite net primary productivity (PP) estimates. The ratio of integrated primary productivity to surface chlorophyll correlates strongly to surface chlorophyll concentration (chl0). However, chl0 does not correlate to chlorophyll-specific productivity, and appears to be a proxy for vertical phytoplankton distribution rather than phytoplankton physiology. Modest improvements in PP model performance are achieved by tuning existing algorithms for the SCCS, particularly by empirical parameterization of photosynthetic efficiency in the Vertically Generalized Production Model. Much larger improvements are enabled by improving accuracy of subsurface chlorophyll and light profiles. In a simple vertically resolved production model, substitution of in situ surface data for remote sensing estimates offers only marginal improvements in model r2 and total log10 root mean squared difference, while inclusion of in situ chlorophyll and light profiles improves these metrics significantly. Autonomous underwater gliders, capable of measuring subsurface fluorescence on long-term, long-range deployments, significantly improve PP model fidelity in the SCCS. We suggest their use (and that of other autonomous profilers such as Argo floats) in conjunction with satellites as a way forward for improved PP estimation in coastal upwelling systems.
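For reference, the Vertically Generalized Production Model mentioned above is commonly quoted in the following form (the constants are from the standard published formulation, which the tuned SCCS version re-parameterizes; quoted here from memory and best checked against the original):

```latex
\mathrm{PP}_{\mathrm{eu}} \;=\; 0.66125 \cdot P^{b}_{\mathrm{opt}} \cdot
\frac{E_0}{E_0 + 4.1} \cdot \mathrm{Chl}_0 \cdot Z_{\mathrm{eu}} \cdot D_{\mathrm{irr}}
```

where P^b_opt is the maximum chlorophyll-specific carbon fixation rate, E_0 the daily surface PAR, Chl_0 the surface chlorophyll concentration, Z_eu the euphotic depth, and D_irr the daylength; the tuning described in the abstract replaces the default parameterization of P^b_opt with an empirical one for the SCCS.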

  4. Exploring cosmic origins with CORE: Cluster science

    NASA Astrophysics Data System (ADS)

    Melin, J.-B.; Bonaldi, A.; Remazeilles, M.; Hagstotz, S.; Diego, J. M.; Hernández-Monteagudo, C.; Génova-Santos, R. T.; Luzzi, G.; Martins, C. J. A. P.; Grandis, S.; Mohr, J. J.; Bartlett, J. G.; Delabrouille, J.; Ferraro, S.; Tramonte, D.; Rubiño-Martín, J. A.; Macìas-Pérez, J. F.; Achúcarro, A.; Ade, P.; Allison, R.; Ashdown, M.; Ballardini, M.; Banday, A. J.; Banerji, R.; Bartolo, N.; Basak, S.; Basu, K.; Battye, R. A.; Baumann, D.; Bersanelli, M.; Bonato, M.; Borrill, J.; Bouchet, F.; Boulanger, F.; Brinckmann, T.; Bucher, M.; Burigana, C.; Buzzelli, A.; Cai, Z.-Y.; Calvo, M.; Carvalho, C. S.; Castellano, M. G.; Challinor, A.; Chluba, J.; Clesse, S.; Colafrancesco, S.; Colantoni, I.; Coppolecchia, A.; Crook, M.; D'Alessandro, G.; de Bernardis, P.; de Gasperis, G.; De Petris, M.; De Zotti, G.; Di Valentino, E.; Errard, J.; Feeney, S. M.; Fernández-Cobos, R.; Finelli, F.; Forastieri, F.; Galli, S.; Gerbino, M.; González-Nuevo, J.; Greenslade, J.; Hanany, S.; Handley, W.; Hervias-Caimapo, C.; Hills, M.; Hivon, E.; Kiiveri, K.; Kisner, T.; Kitching, T.; Kunz, M.; Kurki-Suonio, H.; Lamagna, L.; Lasenby, A.; Lattanzi, M.; Le Brun, A. M. C.; Lesgourgues, J.; Lewis, A.; Liguori, M.; Lindholm, V.; Lopez-Caniego, M.; Maffei, B.; Martinez-Gonzalez, E.; Masi, S.; Mazzotta, P.; McCarthy, D.; Melchiorri, A.; Molinari, D.; Monfardini, A.; Natoli, P.; Negrello, M.; Notari, A.; Paiella, A.; Paoletti, D.; Patanchon, G.; Piat, M.; Pisano, G.; Polastri, L.; Polenta, G.; Pollo, A.; Poulin, V.; Quartin, M.; Roman, M.; Salvati, L.; Tartari, A.; Tomasi, M.; Trappe, N.; Triqueneaux, S.; Trombetti, T.; Tucker, C.; Väliviita, J.; van de Weygaert, R.; Van Tent, B.; Vennin, V.; Vielva, P.; Vittorio, N.; Weller, J.; Young, K.; Zannoni, M.

    2018-04-01

    We examine the cosmological constraints that can be achieved with a galaxy cluster survey with the future CORE space mission. Using realistic simulations of the millimeter sky, produced with the latest version of the Planck Sky Model, we characterize the CORE cluster catalogues as a function of the main mission performance parameters. We pay particular attention to telescope size, key to improved angular resolution, and discuss the comparison and the complementarity of CORE with ambitious future ground-based CMB experiments that could be deployed in the next decade. A possible CORE mission concept with a 150 cm diameter primary mirror can detect of the order of 50,000 clusters through the thermal Sunyaev-Zeldovich effect (SZE). The total yield increases (decreases) by 25% when increasing (decreasing) the mirror diameter by 30 cm. The 150 cm telescope configuration will detect the most massive clusters (>10^14 Msolar) at redshift z>1.5 over the whole sky, although the exact number above this redshift is tied to the uncertain evolution of the cluster SZE flux-mass relation; assuming self-similar evolution, CORE will detect ~50 clusters at redshift z>1.5. This changes to 800 (200) when increasing (decreasing) the mirror size by 30 cm. CORE will be able to measure individual cluster halo masses through lensing of the cosmic microwave background anisotropies with a 1-σ sensitivity of 4×10^14 Msolar for a 120 cm aperture telescope, and 10^14 Msolar for a 180 cm one. From the ground, we estimate that, for example, a survey with about 150,000 detectors at the focus of 350 cm telescopes observing 65% of the sky would be shallower than CORE and detect about 11,000 clusters, while a survey with the same number of detectors observing 25% of sky with a 10 m telescope is expected to be deeper and to detect about 70,000 clusters. When combined with the latter, CORE would reach a limiting mass of M500 ~ 2-3 × 10^13 Msolar and detect 220,000 clusters (5 sigma detection limit

  5. Evidence for Cluster Evolution from an Improved Measurement of the Velocity Dispersion and Morphological Fraction of Cluster 1324+3011 at z=0.76

    NASA Astrophysics Data System (ADS)

    Lubin, Lori M.; Oke, J. B.; Postman, Marc

    2002-10-01

    We have carried out additional spectroscopic observations in the field of cluster Cl 1324+3011 at z=0.76. Combined with the spectroscopy recently presented by Postman, Lubin, & Oke, we have now spectroscopically confirmed 47 cluster members. With this significant number of redshifts, we accurately measure the cluster velocity dispersion to be 1016 +126/-93 km s^-1. The distribution of velocity offsets is consistent with a Gaussian, indicating no substantial velocity substructure. As previously noted for other optically selected clusters at redshifts of z >~ 0.5, a comparison between the X-ray luminosity (LX) and the velocity dispersion (σ) of Cl 1324+3011 implies that this cluster is underluminous in X-rays by a factor of ~3-40 when compared with the LX-σ relation for local and moderate-redshift clusters. We also examine the morphologies of those cluster members that have available high angular resolution imaging with the Hubble Space Telescope (HST). There are 22 spectroscopically confirmed cluster members within the HST field of view. Twelve of these are visually classified as early-type (elliptical or S0) galaxies, implying an early-type fraction of 0.55 +0.17/-0.14 in this cluster. This fraction is a factor of ~1.5 lower than that observed in nearby rich clusters. Confirming previous cluster studies, the results for cluster Cl 1324+3011, combined with morphological studies of other massive clusters at redshifts of 0 <= z <~ 1, suggest that the galaxy population in massive clusters is strongly evolving with redshift. This evolution implies that early-type galaxies are forming out of the excess of late-type (spiral, irregular, and peculiar) galaxies over the ~7 Gyr timescale.

  6. Improved biovolume estimation of Microcystis aeruginosa colonies: A statistical approach.

    PubMed

    Alcántara, I; Piccini, C; Segura, A M; Deus, S; González, C; Martínez de la Escalera, G; Kruk, C

    2018-05-27

    The Microcystis aeruginosa complex (MAC) clusters many of the most common freshwater and brackish bloom-forming cyanobacteria. In monitoring protocols, biovolume estimation is a common approach to determining the biomass of MAC colonies and is useful for prediction purposes. Biovolume (μm^3 mL^-1) is calculated by multiplying organism abundance (org L^-1) by colonial volume (μm^3 org^-1). Colonial volume is estimated based on geometric shapes and requires accurate measurements of dimensions using optical microscopy. This poses a trade-off between easy-to-measure but low-accuracy simple shapes (e.g. sphere) and time-costly but high-accuracy complex shapes (e.g. ellipsoid). Because of the large sizes of MAC colonies, the effects of overestimation on ecological studies and on management decisions associated with harmful blooms are significant. In this work, we aimed to increase the precision of MAC biovolume estimations by developing a statistical model based on two easy-to-measure dimensions. We analyzed field data from a wide environmental gradient (800 km) spanning freshwater to estuarine and seawater. We measured length, width and depth of ca. 5700 colonies under an inverted microscope and estimated colonial volume using three different recommended geometrical shapes (sphere, prolate spheroid and ellipsoid). Because of the non-spherical shape of MAC, the ellipsoid gave the most accurate approximation, whereas the sphere overestimated colonial volume (3-80), especially for large colonies (MLD higher than 300 μm). The ellipsoid, however, requires measuring three dimensions and is time-consuming. We therefore constructed different statistical models to predict organism depth from length and width. Splitting the data into training (2/3) and test (1/3) sets, all models resulted in low average training (1.41-1.44%) and testing (1.3-2.0%) error. The models were also evaluated using three other independent datasets. The multiple linear model was finally selected to calculate MAC

  7. Star clusters: age, metallicity and extinction from integrated spectra

    NASA Astrophysics Data System (ADS)

    González Delgado, Rosa M.; Cid Fernandes, Roberto

    2010-01-01

    Integrated optical spectra of star clusters in the Magellanic Clouds and a few Galactic globular clusters are fitted using high-resolution spectral models for single stellar populations. The goal is to estimate the age, metallicity and extinction of the clusters, and evaluate the degeneracies among these parameters. Several sets of evolutionary models that were computed with recent high-spectral-resolution stellar libraries (MILES, GRANADA, STELIB), are used as inputs to the starlight code to perform the fits. The comparison of the results derived from this method and previous estimates available in the literature allow us to evaluate the pros and cons of each set of models to determine star cluster properties. In addition, we quantify the uncertainties associated with the age, metallicity and extinction determinations resulting from variance in the ingredients for the analysis.

  8. Joint estimation over multiple individuals improves behavioural state inference from animal movement data.

    PubMed

    Jonsen, Ian

    2016-02-08

    State-space models provide a powerful way to scale up inference of movement behaviours from individuals to populations when the inference is made across multiple individuals. Here, I show how a joint estimation approach that assumes individuals share identical movement parameters can lead to improved inference of behavioural states associated with different movement processes. I use simulated movement paths with known behavioural states to compare estimation error between nonhierarchical and joint estimation formulations of an otherwise identical state-space model. Behavioural state estimation error was strongly affected by the degree of similarity between movement patterns characterising the behavioural states, with less error when movements were strongly dissimilar between states. The joint estimation model improved behavioural state estimation relative to the nonhierarchical model for simulated data with heavy-tailed Argos location errors. When applied to Argos telemetry datasets from 10 Weddell seals, the nonhierarchical model estimated highly uncertain behavioural state switching probabilities for most individuals whereas the joint estimation model yielded substantially less uncertainty. The joint estimation model better resolved the behavioural state sequences across all seals. Hierarchical or joint estimation models should be the preferred choice for estimating behavioural states from animal movement data, especially when location data are error-prone.
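The benefit of joint estimation can be shown with a toy numeric example (this is not the paper's state-space model; the shared parameter, data, and sample sizes below are invented for illustration):

```python
import numpy as np

def estimate_param(step_lengths_by_animal, joint=True):
    """Toy illustration of why joint estimation helps: if all animals
    share one movement parameter (here, mean step length), pooling
    their tracks gives a lower-variance estimate than any single
    per-animal fit is guaranteed to achieve."""
    if joint:
        # joint fit: one parameter estimated from all tracks at once
        pooled = np.concatenate(step_lengths_by_animal)
        return float(pooled.mean())
    # nonhierarchical fit: one independent estimate per animal
    return [float(np.asarray(s).mean()) for s in step_lengths_by_animal]
```

With equal track lengths, the joint estimate is the average of the individual estimates, so its error never exceeds the worst individual error; the same intuition carries over to the behavioural-state probabilities in the abstract.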

  9. Data Clustering

    NASA Astrophysics Data System (ADS)

    Wagstaff, Kiri L.

    2012-03-01

    On obtaining a new data set, the researcher is immediately faced with the challenge of obtaining a high-level understanding from the observations. What does a typical item look like? What are the dominant trends? How many distinct groups are included in the data set, and how is each one characterized? Which observable values are common, and which rarely occur? Which items stand out as anomalies or outliers from the rest of the data? This challenge is exacerbated by the steady growth in data set size [11] as new instruments push into new frontiers of parameter space, via improvements in temporal, spatial, and spectral resolution, or by the desire to "fuse" observations from different modalities and instruments into a larger-picture understanding of the same underlying phenomenon. Data clustering algorithms provide a variety of solutions for this task. They can generate summaries, locate outliers, compress data, identify dense or sparse regions of feature space, and build data models. It is useful to note up front that "clusters" in this context refer to groups of items within some descriptive feature space, not (necessarily) to "galaxy clusters" which are dense regions in physical space. The goal of this chapter is to survey a variety of data clustering methods, with an eye toward their applicability to astronomical data analysis. In addition to improving the individual researcher’s understanding of a given data set, clustering has led directly to scientific advances, such as the discovery of new subclasses of stars [14] and gamma-ray bursts (GRBs) [38]. All clustering algorithms seek to identify groups within a data set that reflect some observed, quantifiable structure. Clustering is traditionally an unsupervised approach to data analysis, in the sense that it operates without any direct guidance about which items should be assigned to which clusters. There has been a recent trend in the clustering literature toward supporting semisupervised or constrained

  10. Collisions in Compact Star Clusters.

    NASA Astrophysics Data System (ADS)

    Portegies Zwart, S. F.

    The high stellar densities in young compact star clusters, such as the star cluster R136 in the 30 Doradus region, may lead to a large number of stellar collisions. Such collisions were recently found to be much more frequent than previously estimated. The number of collisions scales with the number of stars for clusters with the same initial relaxation time. These collisions take place within a few million years. The collision products may finally collapse into massive black holes. The fraction of the total mass in the star cluster which ends up in a single massive object scales with the total mass of the cluster and its relaxation time. This mass fraction is rather constant, within a factor of two or so. Wild extrapolation from the relatively small masses of the studied systems to the cores of galactic nuclei may indicate that the massive black holes in these systems have formed in a similar way.

  11. A 10-Week Multimodal Nutrition Education Intervention Improves Dietary Intake among University Students: Cluster Randomised Controlled Trial

    PubMed Central

    Wan Dali, Wan Putri Elena; Lua, Pei Lin

    2013-01-01

    The aim of the study was to evaluate the effectiveness of implementing a multimodal nutrition education intervention (NEI) to improve dietary intake among university students. The study used a cluster randomised controlled design at four public universities on the East Coast of Malaysia. A total of 417 university students participated in the study. They were randomly selected and assigned into two arms, that is, an intervention group (IG) or a control group (CG), according to their cluster. The IG received a 10-week multimodal intervention using three modes (conventional lectures, brochures, and text messages) while the CG did not receive any intervention. Dietary intake was assessed before and after intervention and outcomes were reported as nutrient intakes as well as average daily servings of food intake. Analysis of covariance (ANCOVA) and adjusted effect sizes were used to determine differences in dietary changes between groups and over time. Results showed that, compared to the CG, participants in the IG significantly improved their dietary intake by increasing their intake of energy, carbohydrate, calcium, vitamin C and thiamine, fruits and 100% fruit juice, fish, egg, milk, and dairy products, while at the same time significantly decreasing their processed food intake. In conclusion, a multimodal NEI focusing on healthy eating promotion is an effective approach to improve dietary intake among university students. PMID:24069535

  12. Does Integrating Family Planning into HIV Services Improve Gender Equitable Attitudes? Results from a Cluster Randomized Trial in Nyanza, Kenya.

    PubMed

    Newmann, Sara J; Rocca, Corinne H; Zakaras, Jennifer M; Onono, Maricianah; Bukusi, Elizabeth A; Grossman, Daniel; Cohen, Craig R

    2016-09-01

    This study investigated whether integrating family planning (FP) services into HIV care was associated with gender equitable attitudes among HIV-positive adults in western Kenya. Surveys were conducted with 480 women and 480 men obtaining HIV services from 18 clinics 1 year after the sites were randomized to integrated FP/HIV services (N = 12) or standard referral for FP (N = 6). We used multivariable regression, with generalized estimating equations to account for clustering, to assess whether gender attitudes (range 0-12) were associated with integrated care and with contraceptive use. Men at intervention sites had stronger gender equitable attitudes than those at control sites (adjusted mean difference in scores = 0.89, 95 % CI 0.03-1.74). Among women, attitudes did not differ by study arm. Gender equitable attitudes were not associated with contraceptive use among men (AOR = 1.06, 95 % CI 0.93-1.21) or women (AOR = 1.03, 95 % CI 0.94-1.13). Further work is needed to understand how integrating FP into HIV care affects gender relations, and how improved gender equity among men might be leveraged to improve contraceptive use and other reproductive health outcomes.

  13. Counteracting estimation bias and social influence to improve the wisdom of crowds.

    PubMed

    Kao, Albert B; Berdahl, Andrew M; Hartnett, Andrew T; Lutz, Matthew J; Bak-Coleman, Joseph B; Ioannou, Christos C; Giam, Xingli; Couzin, Iain D

    2018-04-01

    Aggregating multiple non-expert opinions into a collective estimate can improve accuracy across many contexts. However, two sources of error can diminish collective wisdom: individual estimation biases and information sharing between individuals. Here, we measure individual biases and social influence rules in multiple experiments involving hundreds of individuals performing a classic numerosity estimation task. We first investigate how existing aggregation methods, such as calculating the arithmetic mean or the median, are influenced by these sources of error. We show that the mean tends to overestimate, and the median underestimate, the true value for a wide range of numerosities. Quantifying estimation bias, and mapping individual bias to collective bias, allows us to develop and validate three new aggregation measures that effectively counter sources of collective estimation error. In addition, we present results from a further experiment that quantifies the social influence rules that individuals employ when incorporating personal estimates with social information. We show that the corrected mean is remarkably robust to social influence, retaining high accuracy in the presence or absence of social influence, across numerosities and across different methods for averaging social information. Using knowledge of estimation biases and social influence rules may therefore be an inexpensive and general strategy to improve the wisdom of crowds. © 2018 The Author(s).
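The mean-versus-median behaviour described above is easy to reproduce with simulated guesses. The sketch below assumes roughly log-normal individual estimates (typical for numerosity tasks) and compares three standard aggregators; it does not reproduce the corrected measures developed in the paper:

```python
import numpy as np

def aggregate(estimates):
    """Compare simple crowd aggregators. With right-skewed estimates,
    the arithmetic mean is pulled up by the long tail, the median
    tracks the typical (often low-biased) guess, and the geometric
    mean is one standard compromise between the two."""
    est = np.asarray(estimates, dtype=float)
    return {
        "mean": float(est.mean()),
        "median": float(np.median(est)),
        "geometric_mean": float(np.exp(np.log(est).mean())),
    }
```

Running this on log-normal guesses centred on a true value of 100 shows the arithmetic mean landing well above the geometric mean and median, mirroring the overestimation the authors report for the mean.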

  14. A Cluster-Based Dual-Adaptive Topology Control Approach in Wireless Sensor Networks.

    PubMed

    Gui, Jinsong; Zhou, Kai; Xiong, Naixue

    2016-09-25

    Multi-Input Multi-Output (MIMO) techniques can improve wireless network performance. Sensors are usually single-antenna devices due to the high hardware complexity and cost, so several sensors are used to form a virtual MIMO array, which is a desirable approach to efficiently exploit MIMO gains. Also, in large Wireless Sensor Networks (WSNs), clustering can improve network scalability, making it an effective topology control approach. Existing virtual MIMO-based clustering schemes either do not fully explore the benefits of MIMO or do not adaptively determine the clustering ranges. The clustering mechanism also needs further improvement to extend the lifetime of the cluster structure. In this paper, we propose an improved clustering scheme for virtual MIMO-based topology construction (ICV-MIMO), which adaptively determines not only the inter-cluster transmission modes but also the clustering ranges. Through the rational division of the cluster head function and the optimization of the cluster head selection criteria and information exchange process, the ICV-MIMO scheme effectively reduces network energy consumption and improves the lifetime of the cluster structure when compared with the existing typical virtual MIMO-based scheme. Moreover, the message overhead and time complexity remain in the same order of magnitude.

  15. Speed Profiles for Improvement of Maritime Emission Estimation

    PubMed Central

    Yau, Pui Shan; Lee, Shun-Cheng; Ho, Kin Fai

    2012-01-01

    Maritime emissions play an important role in anthropogenic emissions, particularly for cities with busy ports such as Hong Kong. Ship emissions are strongly dependent on vessel speed, and thus accurate vessel speed is essential for maritime emission studies. In this study, we determined minute-by-minute high-resolution speed profiles of container ships on four major routes in Hong Kong waters using Automatic Identification System (AIS). The activity-based ship emissions of NOx, CO, HC, CO2, SO2, and PM10 were estimated using derived vessel speed profiles, and results were compared with those using the speed limits of control zones. Estimation using speed limits resulted in up to twofold overestimation of ship emissions. Compared with emissions estimated using the speed limits of control zones, emissions estimated using vessel speed profiles could provide results with up to 88% higher accuracy. Uncertainty analysis and sensitivity analysis of the model demonstrated the significance of improvement of vessel speed resolution. From spatial analysis, it is revealed that SO2 and PM10 emissions during maneuvering within 1 nautical mile from port were the highest. They contributed 7%–22% of SO2 emissions and 8%–17% of PM10 emissions of the entire voyage in Hong Kong. PMID:23236250

  16. Speed Profiles for Improvement of Maritime Emission Estimation.

    PubMed

    Yau, Pui Shan; Lee, Shun-Cheng; Ho, Kin Fai

    2012-12-01

    Maritime emissions play an important role in anthropogenic emissions, particularly for cities with busy ports such as Hong Kong. Ship emissions are strongly dependent on vessel speed, and thus accurate vessel speed is essential for maritime emission studies. In this study, we determined minute-by-minute high-resolution speed profiles of container ships on four major routes in Hong Kong waters using Automatic Identification System (AIS). The activity-based ship emissions of NO(x), CO, HC, CO(2), SO(2), and PM(10) were estimated using derived vessel speed profiles, and results were compared with those using the speed limits of control zones. Estimation using speed limits resulted in up to twofold overestimation of ship emissions. Compared with emissions estimated using the speed limits of control zones, emissions estimated using vessel speed profiles could provide results with up to 88% higher accuracy. Uncertainty analysis and sensitivity analysis of the model demonstrated the significance of improvement of vessel speed resolution. From spatial analysis, it is revealed that SO(2) and PM(10) emissions during maneuvering within 1 nautical mile from port were the highest. They contributed 7%-22% of SO(2) emissions and 8%-17% of PM(10) emissions of the entire voyage in Hong Kong.
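The activity-based approach in these two records can be sketched as follows. Main-engine load is approximated by the propeller law (load proportional to the cube of speed over design speed); the engine size, emission factor, and design speed below are placeholder values, not those used in the study:

```python
def ship_emissions_g(speeds_knots, design_speed=25.0, mcr_kw=30000.0,
                     ef_g_per_kwh=10.0, step_min=1.0):
    """Activity-based emission estimate (grams) from a per-minute
    vessel speed profile: emissions = engine power x emission factor
    x time, summed over the profile."""
    total_g = 0.0
    for v in speeds_knots:
        load = min((v / design_speed) ** 3, 1.0)   # fraction of MCR
        power_kw = load * mcr_kw
        total_g += power_kw * ef_g_per_kwh * (step_min / 60.0)
    return total_g
```

Because emissions scale with the cube of speed, assuming a vessel sails at the zone speed limit when it actually moves more slowly inflates the estimate sharply, reproducing the up-to-twofold overestimation described in the abstract.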

  17. Improving PERSIANN-CCS rain estimation using probabilistic approach and multi-sensors information

    NASA Astrophysics Data System (ADS)

    Karbalaee, N.; Hsu, K. L.; Sorooshian, S.; Kirstetter, P.; Hong, Y.

    2016-12-01

    This presentation discusses recently implemented approaches to improve rainfall estimation from Precipitation Estimation from Remotely Sensed Information using Artificial Neural Network-Cloud Classification System (PERSIANN-CCS). PERSIANN-CCS is an infrared (IR)-based algorithm being integrated into IMERG (Integrated Multi-satellitE Retrievals for GPM, the Global Precipitation Measurement mission) to create a precipitation product at 0.1° x 0.1° resolution over the domain 50°N to 50°S every 30 minutes. Although PERSIANN-CCS has high spatial and temporal resolution, it over- or underestimates rainfall due to some limitations. PERSIANN-CCS estimates rainfall from information extracted from IR channels at three temperature threshold levels (220, 235, and 253 K). Because the algorithm relies only on infrared data, it misses rainfall from warm clouds and produces false estimates for non-precipitating cold clouds. In this research, the effectiveness of using other channels of the GOES satellites, such as visible and water vapor, has been investigated. With multiple sensors, precipitation can be estimated from information extracted from multiple channels. In addition, instead of an exponential function relating cloud-top temperature to rain rate, a probabilistic method has been used: probability distributions of precipitation rates, rather than deterministic values, have improved rainfall estimation for different types of clouds.

  18. Integrated-light spectroscopy of globular clusters at the infrared Ca II lines

    NASA Technical Reports Server (NTRS)

    Armandroff, Taft E.; Zinn, Robert

    1988-01-01

    Integrated-light spectroscopy has been obtained for 27 globular clusters at the Ca II IR triplet. Line strengths and radial velocities have been measured from the spectra. For the well-studied clusters in the sample, the strength of the Ca II lines is very well correlated with previous metallicity estimates. Thus, the triplet is useful as a metallicity indicator in globular cluster integrated-light spectra. The greatly reduced effect of interstellar extinction at these wavelengths (compared to the blue region of the spectrum) has permitted observations of some of the most heavily reddened clusters in the Galaxy. For several such clusters, the Ca II triplet metallicities are in poor agreement with metallicity estimates from IR photometry by Malkan (1981). The strength of an interstellar band at 8621 Å has been used to estimate the amount of extinction towards these clusters. Using the new metallicity and radial-velocity data, the metallicity distribution, kinematics, and spatial distribution of the disk globular cluster system have been analyzed. Results very similar to those of Zinn (1985) have been found. The relation of the disk globulars to the stellar thick disk is discussed.

  19. Cluster formation in Hessdalen lights

    NASA Astrophysics Data System (ADS)

    Paiva, G. S.; Taft, C. A.

    2012-05-01

    In this paper, we show a mechanism of light-ball cluster formation in Hessdalen lights (HL) by the nonlinear interaction of ion-acoustic and dust-acoustic waves with low-frequency geoelectromagnetic waves in dusty plasmas. Our theoretical model gives a velocity of light balls ejected by the HL cluster of about 10^4 m s^-1, in good agreement with the observed velocity of some ejected light balls, estimated as 2×10^4 m s^-1.

  20. Estimation of Carcinogenicity using Hierarchical Clustering and Nearest Neighbor Methodologies

    EPA Science Inventory

    Previously a hierarchical clustering (HC) approach and a nearest neighbor (NN) approach were developed to model acute aquatic toxicity end points. These approaches were developed to correlate the toxicity for large, noncongeneric data sets. In this study these approaches applie...

  1. Using hierarchical cluster models to systematically identify groups of jobs with similar occupational questionnaire response patterns to assist rule-based expert exposure assessment in population-based studies.

    PubMed

    Friesen, Melissa C; Shortreed, Susan M; Wheeler, David C; Burstyn, Igor; Vermeulen, Roel; Pronk, Anjoeka; Colt, Joanne S; Baris, Dalsu; Karagas, Margaret R; Schwenn, Molly; Johnson, Alison; Armenti, Karla R; Silverman, Debra T; Yu, Kai

    2015-05-01

    Rule-based expert exposure assessment based on questionnaire response patterns in population-based studies improves the transparency of the decisions. The number of unique response patterns, however, can be nearly equal to the number of jobs. An expert may reduce the number of patterns that need assessment using expert opinion, but each expert may identify different patterns of responses that identify an exposure scenario. Here, hierarchical clustering methods are proposed as a systematic data reduction step to reproducibly identify similar questionnaire response patterns prior to obtaining expert estimates. As a proof-of-concept, we used hierarchical clustering methods to identify groups of jobs (clusters) with similar responses to diesel exhaust-related questions and then evaluated whether the jobs within a cluster had similar (previously assessed) estimates of occupational diesel exhaust exposure. Using the New England Bladder Cancer Study as a case study, we applied hierarchical cluster models to the diesel-related variables extracted from the occupational history and job- and industry-specific questionnaires (modules). Cluster models were separately developed for two subsets: (i) 5395 jobs with ≥1 variable extracted from the occupational history indicating a potential diesel exposure scenario, but without a module with diesel-related questions; and (ii) 5929 jobs with both occupational history and module responses to diesel-relevant questions. For each subset, we varied the numbers of clusters extracted from the cluster tree developed for each model from 100 to 1000 groups of jobs. Using previously made estimates of the probability (ordinal), intensity (µg m(-3) respirable elemental carbon), and frequency (hours per week) of occupational exposure to diesel exhaust, we examined the similarity of the exposure estimates for jobs within the same cluster in two ways. First, the clusters' homogeneity (defined as >75% with the same estimate) was examined compared
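The clustering step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the synthetic binary response matrix, the Jaccard distance, and average linkage are assumptions made for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
# Hypothetical data: 500 jobs x 12 binary diesel-related questionnaire responses
jobs = rng.integers(0, 2, size=(500, 12))

# Pairwise Jaccard distance between response patterns, then average linkage
dist = pdist(jobs, metric="jaccard")
tree = linkage(dist, method="average")

# Cut the tree into at most 100 clusters of similar response patterns,
# mirroring the 100-1000 cluster range examined in the study
labels = fcluster(tree, t=100, criterion="maxclust")
print(labels.shape, labels.min(), labels.max())
```

Expert exposure estimates would then be reviewed per cluster rather than per unique response pattern, which is the data-reduction idea of the paper.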

  2. Radial velocities of stars in the globular cluster M4 and the cluster distance

    NASA Technical Reports Server (NTRS)

    Peterson, R. C.; Rees, Richard F.; Cudworth, Kyle M.

    1995-01-01

    The internal stellar velocity distribution of the globular cluster M4 is evaluated from nearly 200 new radial velocity measurements good to 1 km/s and a rederivation of existing proper motions. The mean radial velocity of the cluster is 70.9 +/- 0.6 km/s. The velocity dispersion is 3.5 +/- 0.3 km/s at the core, dropping marginally towards the outskirts. Such a low internal dispersion is somewhat at odds with the cluster's orbit, for which the perigalacticon is sufficiently close to the galactic center that the probability of cluster disruption is high; a tidal radius two-thirds the currently accepted value would eliminate the discrepancy. The cluster mass-to-light ratio is also small, M/L_V = 1.0 +/- 0.4 in solar units. M4 thus joins M22 as a cluster of moderate concentration with a mass-to-light ratio among the lowest known. The astrometric distance to the cluster is also smaller than expected, 1.72 +/- 0.14 kpc. This is only consistent with conventional estimates of the luminosity of horizontal branch stars provided an extinction law R = A_V/E(B-V) of approximately 4 is adopted, as has been suggested recently by several authors.

  3. Heterogeneous Clustering: Operational and User Impacts

    NASA Technical Reports Server (NTRS)

    Salm, Saita Wood

    1999-01-01

    Heterogeneous clustering can improve overall utilization of multiple hosts and can provide better turnaround to users by balancing workloads across hosts. Building a cluster requires both operational changes and revisions in user scripts.

  4. Advanced analysis of forest fire clustering

    NASA Astrophysics Data System (ADS)

    Kanevski, Mikhail; Pereira, Mario; Golay, Jean

    2017-04-01

    Analysis of point pattern clustering is an important topic in spatial statistics and for many applications: biodiversity, epidemiology, natural hazards, geomarketing, etc. There are several fundamental approaches used to quantify spatial data clustering using topological, statistical and fractal measures. In the present research, the recently introduced multi-point Morisita index (mMI) is applied to study the spatial clustering of forest fires in Portugal. The data set consists of more than 30000 fire events covering the time period from 1975 to 2013. The distribution of forest fires is very complex and highly variable in space. mMI is a multi-point extension of the classical two-point Morisita index. In essence, mMI is estimated by covering the region under study by a grid and by computing how many times more likely it is that m points selected at random will be from the same grid cell than it would be in the case of a complete random Poisson process. By changing the number of grid cells (size of the grid cells), mMI characterizes the scaling properties of spatial clustering. From mMI, the data intrinsic dimension (fractal dimension) of the point distribution can be estimated as well. In this study, the mMI of forest fires is compared with the mMI of random patterns (RPs) generated within the validity domain defined as the forest area of Portugal. It turns out that the forest fires are highly clustered inside the validity domain in comparison with the RPs. Moreover, they demonstrate different scaling properties at different spatial scales. The results obtained from the mMI analysis are also compared with those of fractal measures of clustering - box counting and sand box counting approaches. REFERENCES Golay J., Kanevski M., Vega Orozco C., Leuenberger M., 2014: The multipoint Morisita index for the analysis of spatial patterns. Physica A, 406, 191-202. Golay J., Kanevski M. 2015: A new estimator of intrinsic dimension based on the multipoint Morisita index
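A sketch of the index just described (not the authors' implementation; the 2-D setting and grid construction are illustrative assumptions). From grid-cell counts n_i over Q cells, the m-point Morisita index is I_(m,delta) = Q^(m-1) * sum_i n_i (n_i - 1) ... (n_i - m + 1) / [N (N - 1) ... (N - m + 1)], which equals 1 for complete spatial randomness and exceeds 1 for clustered patterns:

```python
import numpy as np

def multipoint_morisita(points, cells_per_axis, m=2):
    """m-point Morisita index of a 2-D point set: how many times more
    likely it is that m randomly chosen points share a grid cell than
    under a homogeneous Poisson process (I = 1 for complete randomness)."""
    Q = cells_per_axis ** 2
    mins = points.min(axis=0)
    spans = np.ptp(points, axis=0)
    ij = ((points - mins) / spans * cells_per_axis).astype(int)
    ij = np.clip(ij, 0, cells_per_axis - 1)
    counts = np.bincount(ij[:, 0] * cells_per_axis + ij[:, 1],
                         minlength=Q).astype(float)
    N = len(points)
    num = np.prod([counts - j for j in range(m)], axis=0).sum()
    den = np.prod([float(N - j) for j in range(m)])
    return Q ** (m - 1) * num / den

rng = np.random.default_rng(0)
uniform = rng.uniform(size=(5000, 2))
clustered = np.vstack([uniform[:2500],
                       rng.normal(0.5, 0.02, size=(2500, 2))])
print(multipoint_morisita(uniform, 5))    # close to 1
print(multipoint_morisita(clustered, 5))  # well above 1
```

Repeating the calculation while varying `cells_per_axis` traces the scaling behaviour that the paper uses to characterize clustering across spatial scales.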

  5. Cosmology with the largest galaxy cluster surveys: going beyond Fisher matrix forecasts

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Khedekar, Satej; Majumdar, Subhabrata, E-mail: satej@mpa-garching.mpg.de, E-mail: subha@tifr.res.in

    2013-02-01

    We make the first detailed MCMC likelihood study of cosmological constraints that are expected from some of the largest, ongoing and proposed, cluster surveys in different wave-bands and compare the estimates to the prevalent Fisher matrix forecasts. Mock catalogs of cluster counts expected from the surveys (eROSITA, WFXT, RCS2, DES and Planck), along with a mock dataset of follow-up mass calibrations, are analyzed for this purpose. A fair agreement between MCMC and Fisher results is found only in the case of minimal models. However, for many cases, the marginalized constraints obtained from Fisher and MCMC methods can differ by factors of 30-100%. The discrepancy can be alarmingly large for a time-dependent dark energy equation of state, w(a); the Fisher methods are seen to under-estimate the constraints by as much as a factor of 4-5. Typically, Fisher estimates become more and more inappropriate as we move away from ΛCDM, to a constant-w dark energy to varying-w dark energy cosmologies. Fisher analysis also predicts incorrect parameter degeneracies. There are noticeable offsets in the likelihood contours obtained from Fisher methods, caused by an asymmetry in the posterior likelihood distribution as seen through an MCMC analysis. From the point of view of mass-calibration uncertainties, a high value of unknown scatter about the mean mass-observable relation, and its redshift dependence, is seen to have large degeneracies with the cosmological parameters σ_8 and w(a) and can degrade the cosmological constraints considerably. We find that the addition of mass-calibrated cluster datasets can improve dark energy and σ_8 constraints by factors of 2-3 from what can be obtained from CMB+SNe+BAO only. Finally, we show that a joint analysis of the datasets of two (or more) different cluster surveys would significantly tighten cosmological constraints from using clusters only. Since details of future cluster surveys are still being planned, we

  6. Lensing convergence in galaxy clustering in ΛCDM and beyond

    NASA Astrophysics Data System (ADS)

    Villa, Eleonora; Di Dio, Enea; Lepori, Francesca

    2018-04-01

    We study the impact of neglecting lensing magnification in galaxy clustering analyses for future galaxy surveys, considering the ΛCDM model and two extensions: massive neutrinos and modifications of General Relativity. Our study focuses on the biases on the constraints and on the estimation of the cosmological parameters. We perform a comprehensive investigation of these two effects for the upcoming photometric and spectroscopic galaxy surveys Euclid and SKA for different redshift binning configurations. We also provide a fitting formula for the magnification bias of SKA. Our results show that the information present in the lensing contribution does improve the constraints on the modified gravity parameters whereas the lensing constraining power is negligible for the ΛCDM parameters. For photometric surveys the estimation is biased for all the parameters if lensing is not taken into account. This effect is particularly significant for the modified gravity parameters. Conversely for spectroscopic surveys the bias is below one sigma for all the parameters. Our findings show the importance of including lensing in galaxy clustering analyses for testing General Relativity and to constrain the parameters which describe its modifications.

  7. Galaxy cluster lensing masses in modified lensing potentials

    DOE PAGES

    Barreira, Alexandre; Li, Baojiu; Jennings, Elise; ...

    2015-10-28

    In this study, we determine the concentration-mass relation of 19 X-ray selected galaxy clusters from the Cluster Lensing and Supernova Survey with Hubble in theories of gravity that directly modify the lensing potential. We model the clusters as Navarro-Frenk-White haloes and fit their lensing signal, in the Cubic Galileon and Nonlocal gravity models, to the lensing convergence profiles of the clusters. We discuss a number of important issues that need to be taken into account, associated with the use of non-parametric and parametric lensing methods, as well as assumptions about the background cosmology. Our results show that the concentration and mass estimates in the modified gravity models are, within the error bars, the same as in Λ cold dark matter. This result demonstrates that, for the Nonlocal model, the modifications to gravity are too weak at the cluster redshifts, and for the Galileon model, the screening mechanism is very efficient inside the cluster radius. However, at distances of ~2-20 Mpc/h from the cluster centre, we find that the surrounding force profiles are enhanced by ~20-40% in the Cubic Galileon model. This has an impact on dynamical mass estimates, which means that tests of gravity based on comparisons between lensing and dynamical masses can also be applied to the Cubic Galileon model.

  8. Cluster redshifts in five suspected superclusters

    NASA Technical Reports Server (NTRS)

    Ciardullo, R.; Ford, H.; Harms, R.

    1985-01-01

    Redshift surveys for rich superclusters were carried out in five regions of the sky containing surface-density enhancements of Abell clusters. While several superclusters are identified, projection effects dominate each field, and no system contains more than five rich clusters. Two systems are found to be especially interesting. The first, field 0136 10, is shown to contain a superposition of at least four distinct superclusters, with the richest system possessing a small velocity dispersion. The second system, 2206 - 22, though a region of exceedingly high Abell cluster surface density, appears to be a remarkable superposition of 23 rich clusters almost uniformly distributed in redshift space between 0.08 and 0.24. The new redshifts significantly increase the three-dimensional information available for the distance class 5 and 6 Abell clusters and allow the spatial correlation function around rich superclusters to be estimated.

  9. FAR-FLUNG GALAXY CLUSTERS MAY REVEAL FATE OF UNIVERSE

    NASA Technical Reports Server (NTRS)

    2002-01-01

    A selection of NASA Hubble Space Telescope snapshots of huge galaxy clusters that lie far away and far back in time. These are selected from a catalog of 92 new clusters uncovered during a six-year Hubble observing program known as the Medium Deep Survey. If the distances and masses of the clusters are confirmed by ground based telescopes, the survey may hold clues to how galaxies quickly formed into massive large-scale structures after the big bang, and what that may mean for the eventual fate of the expanding universe. The images are each a combination of two exposures in yellow and deep red taken with Hubble's Wide Field and Planetary Camera 2. Each cluster's distance is inferred from the reddening of the starlight, which is due to the expansion of space. Astronomers assume these clusters all formed early in the history of the universe. HST133617-00529 (left) This collection of spiral and elliptical galaxies lies an estimated 4 to 6 billion light-years away. It is in the constellation of Virgo not far from the 3rd magnitude star Zeta Virginis. The brighter galaxies in this cluster have red magnitudes between 20 and 22 near the limit of the Palomar Sky Survey. The bright blue galaxy (upper left) is probably a foreground galaxy, and not a cluster member. The larger of the galaxies in the cluster are probably about the size of our Milky Way Galaxy. The diagonal line at lower right is an artificial satellite trail. HST002013+28366 (upper right) This cluster of galaxies lies in the constellation of Andromeda a few degrees from the star Alpheratz in the northeast corner of the constellation Pegasus. It is at an estimated distance of 4 billion light-years, which means the light we are seeing from the cluster is as it appeared when the universe was roughly 2/3 of its present age. HST035528+09435 (lower right) At an estimated distance of about 7 to 10 billion light-years (z=1), this is one of the farthest clusters in the Hubble sample. The cluster lies in the

  10. Using Targeted Active-Learning Exercises and Diagnostic Question Clusters to Improve Students' Understanding of Carbon Cycling in Ecosystems

    ERIC Educational Resources Information Center

    Maskiewicz, April Cordero; Griscom, Heather Peckham; Welch, Nicole Turrill

    2012-01-01

    In this study, we used targeted active-learning activities to help students improve their ways of reasoning about carbon flow in ecosystems. The results of a validated ecology conceptual inventory (diagnostic question clusters [DQCs]) provided us with information about students' understanding of and reasoning about transformation of inorganic and…

  11. Combining cluster number counts and galaxy clustering

    NASA Astrophysics Data System (ADS)

    Lacasa, Fabien; Rosenfeld, Rogerio

    2016-08-01

    The abundance of clusters and the clustering of galaxies are two of the important cosmological probes for current and future large scale surveys of galaxies, such as the Dark Energy Survey. In order to combine them one has to account for the fact that they are not independent quantities, since they probe the same density field. It is important to develop a good understanding of their correlation in order to extract parameter constraints. We present a detailed modelling of the joint covariance matrix between cluster number counts and the galaxy angular power spectrum. We employ the framework of the halo model complemented by a Halo Occupation Distribution model (HOD). We demonstrate the importance of accounting for non-Gaussianity to produce accurate covariance predictions. Indeed, we show that the non-Gaussian covariance becomes dominant at small scales, low redshifts or high cluster masses. We discuss in particular the case of the super-sample covariance (SSC), including the effects of galaxy shot-noise, halo second order bias and non-local bias. We demonstrate that the SSC obeys mathematical inequalities and positivity. Using the joint covariance matrix and a Fisher matrix methodology, we examine the prospects of combining these two probes to constrain cosmological and HOD parameters. We find that the combination indeed results in noticeably better constraints, with improvements of order 20% on cosmological parameters compared to the best single probe, and even greater improvement on HOD parameters, with reduction of error bars by a factor 1.4-4.8. This happens in particular because the cross-covariance introduces a synergy between the probes on small scales. We conclude that accounting for non-Gaussian effects is required for the joint analysis of these observables in galaxy surveys.
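A toy Fisher-matrix illustration of why the cross-covariance between two probes matters (the numbers are invented; the paper's actual covariance is the halo-model construction described above). For a single parameter θ with probe responses d = ∂(observable)/∂θ and data covariance C, the Fisher information is F = dᵀ C⁻¹ d and the forecast error is 1/√F:

```python
import numpy as np

# Derivatives of two observables with respect to one parameter (made up)
d = np.array([1.0, 0.8])
# Covariances: diagonal (probes assumed independent) vs joint (cross term)
C_indep = np.diag([0.2, 0.3])
C_joint = np.array([[0.2, 0.1],
                    [0.1, 0.3]])

F_indep = d @ np.linalg.solve(C_indep, d)   # Fisher info ignoring cross-covariance
F_joint = d @ np.linalg.solve(C_joint, d)   # Fisher info with cross-covariance
print(F_indep ** -0.5, F_joint ** -0.5)     # forecast 1-sigma errors on theta
```

Here the positive cross-covariance reduces the joint information, so treating the probes as independent would overstate the combined constraint, which is the qualitative point the abstract makes about modelling the joint covariance.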

  12. Gaussian mixture clustering and imputation of microarray data.

    PubMed

    Ouyang, Ming; Welsh, William J; Georgopoulos, Panos

    2004-04-12

    In microarray experiments, missing entries arise from blemishes on the chips. In large-scale studies, virtually every chip contains some missing entries and more than 90% of the genes are affected. Many analysis methods require a full set of data. Either those genes with missing entries are excluded, or the missing entries are filled with estimates prior to the analyses. This study compares methods of missing value estimation. Two evaluation metrics of imputation accuracy are employed. First, the root mean squared error measures the difference between the true values and the imputed values. Second, the number of mis-clustered genes measures the difference between clustering with true values and that with imputed values; it examines the bias introduced by imputation to clustering. The Gaussian mixture clustering with model averaging imputation is superior to all other imputation methods, according to both evaluation metrics, on both time-series (correlated) and non-time series (uncorrelated) data sets.
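A minimal sketch of mixture-based imputation in the spirit of the abstract (not the authors' algorithm: it fits a single Gaussian mixture to complete rows and fills each missing entry with its responsibility-weighted conditional mean, without the model-averaging step; the synthetic two-cluster data are an assumption):

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic "expression matrix": 300 genes x 6 arrays from two clusters
truth = np.vstack([rng.normal(0.0, 1.0, size=(150, 6)),
                   rng.normal(3.0, 1.0, size=(150, 6))])
X = truth.copy()
X[rng.random(X.shape) < 0.05] = np.nan   # ~5% missing entries

complete = ~np.isnan(X).any(axis=1)
gmm = GaussianMixture(n_components=2, covariance_type="full",
                      random_state=0).fit(X[complete])

def impute_row(x):
    miss = np.isnan(x)
    if not miss.any():
        return x
    obs = ~miss
    # Component responsibilities from the marginal on observed coordinates
    logp = np.log(gmm.weights_)
    for k in range(gmm.n_components):
        mu, S = gmm.means_[k], gmm.covariances_[k]
        logp[k] += multivariate_normal.logpdf(x[obs], mu[obs],
                                              S[np.ix_(obs, obs)])
    r = np.exp(logp - logp.max())
    r /= r.sum()
    out = x.copy()
    out[miss] = 0.0
    for k in range(gmm.n_components):
        mu, S = gmm.means_[k], gmm.covariances_[k]
        cond = mu[miss] + S[np.ix_(miss, obs)] @ np.linalg.solve(
            S[np.ix_(obs, obs)], x[obs] - mu[obs])   # conditional mean
        out[miss] += r[k] * cond
    return out

X_imp = np.array([impute_row(row) for row in X])
mask = np.isnan(X)
rmse = np.sqrt(np.mean((X_imp[mask] - truth[mask]) ** 2))
print(rmse)
```

The root mean squared error computed at the end is the first of the two evaluation metrics used in the study; the second (mis-clustered genes) would compare cluster assignments obtained from `truth` and `X_imp`.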

  13. Weak lensing magnification of SpARCS galaxy clusters

    NASA Astrophysics Data System (ADS)

    Tudorica, A.; Hildebrandt, H.; Tewes, M.; Hoekstra, H.; Morrison, C. B.; Muzzin, A.; Wilson, G.; Yee, H. K. C.; Lidman, C.; Hicks, A.; Nantais, J.; Erben, T.; van der Burg, R. F. J.; Demarco, R.

    2017-12-01

    Context. Measuring and calibrating relations between cluster observables is critical for resource-limited studies. The mass-richness relation of clusters offers an observationally inexpensive way of estimating masses. Its calibration is essential for cluster and cosmological studies, especially for high-redshift clusters. Weak gravitational lensing magnification is a promising and complementary method to shear studies, that can be applied at higher redshifts. Aims: We aim to employ the weak lensing magnification method to calibrate the mass-richness relation up to a redshift of 1.4. We used the Spitzer Adaptation of the Red-Sequence Cluster Survey (SpARCS) galaxy cluster candidates (0.2 < z < 1.4) and optical data from the Canada France Hawaii Telescope (CFHT) to test whether magnification can be effectively used to constrain the mass of high-redshift clusters. Methods: Lyman-break galaxies (LBGs) selected using the u-band dropout technique and their colours were used as a background sample of sources. LBG positions were cross-correlated with the centres of the sample of SpARCS clusters to estimate the magnification signal, which was optimally-weighted using an externally-calibrated LBG luminosity function. The signal was measured for cluster sub-samples, binned in both redshift and richness. Results: We measured the cross-correlation between the positions of galaxy cluster candidates and LBGs and detected a weak lensing magnification signal for all bins at a detection significance of 2.6-5.5σ. In particular, the significance of the measurement for clusters with z> 1.0 is 4.1σ; for the entire cluster sample we obtained an average M200 of 1.28 -0.21+0.23 × 1014 M⊙. Conclusions: Our measurements demonstrated the feasibility of using weak lensing magnification as a viable tool for determining the average halo masses for samples of high redshift galaxy clusters. The results also established the success of using galaxy over-densities to select massive clusters at z

  14. Refining historical limits method to improve disease cluster detection, New York City, New York, USA.

    PubMed

    Levin-Rector, Alison; Wilson, Elisha L; Fine, Annie D; Greene, Sharon K

    2015-02-01

    Since the early 2000s, the Bureau of Communicable Disease of the New York City Department of Health and Mental Hygiene has analyzed reportable infectious disease data weekly by using the historical limits method to detect unusual clusters that could represent outbreaks. This method typically produced too many signals for each to be investigated with available resources while possibly failing to signal during true disease outbreaks. We made method refinements that improved the consistency of case inclusion criteria and accounted for data lags and trends and aberrations in historical data. During a 12-week period in 2013, we prospectively assessed these refinements using actual surveillance data. The refined method yielded 74 signals, a 45% decrease from what the original method would have produced. Fewer and less biased signals included a true citywide increase in legionellosis and a localized campylobacteriosis cluster subsequently linked to live-poultry markets. Future evaluations using simulated data could complement this descriptive assessment.
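For context, the classical historical limits method flags a current count that exceeds the mean of comparable historical counts by more than two standard deviations. The sketch below is the textbook form with invented counts, not the Bureau's refined implementation described in the abstract:

```python
import statistics

def historical_limits_signal(current, historical, z=2.0):
    """Signal if the current 4-week case count exceeds mean + z*SD of the
    15 comparable historical counts (the same 4-week period and the two
    adjacent ones, over the 5 prior years)."""
    mu = statistics.mean(historical)
    sd = statistics.stdev(historical)
    return current > mu + z * sd

# 15 baseline 4-week counts from 5 prior years (hypothetical numbers)
baseline = [12, 9, 14, 11, 10, 13, 8, 12, 15, 10, 9, 11, 13, 12, 10]
print(historical_limits_signal(30, baseline))  # True: unusual cluster
print(historical_limits_signal(13, baseline))  # False: within expectation
```

The refinements in the paper address what this simple form ignores: reporting lags, secular trends, and aberrations already present in the baseline counts.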

  15. Estimating Origin-Destination Matrices Using AN Efficient Moth Flame-Based Spatial Clustering Approach

    NASA Astrophysics Data System (ADS)

    Heidari, A. A.; Moayedi, A.; Abbaspour, R. Ali

    2017-09-01

    Automated fare collection (AFC) systems are regarded as valuable resources for public transport planners. In this paper, AFC data are utilized to analyze and extract mobility patterns in a public transportation system. For this purpose, the smart card data are inserted into a proposed metaheuristic-based aggregation model and then converted to an O-D matrix between stops, since the size of O-D matrices makes it difficult to reproduce the measured passenger flows precisely. The proposed strategy is applied to a case study from Haaglanden, the Netherlands. In this research, the moth-flame optimizer (MFO) is utilized and evaluated for the first time as a new metaheuristic algorithm (MA) for estimating transit origin-destination matrices. The MFO is a novel, efficient swarm-based MA inspired by the celestial navigation of moths in nature. To investigate the capabilities of the proposed MFO-based approach, it is compared to methods that utilize the K-means algorithm, the gray wolf optimization algorithm (GWO), and a genetic algorithm (GA). The sum of intra-cluster distances and the computational time of operations are considered as the evaluation criteria to assess the efficacy of the optimizers. The optimality of the solutions of the different algorithms is measured in detail. Travelers' behavior is analyzed to achieve a smooth and optimized transport system. The results reveal that the proposed MFO-based aggregation strategy outperforms the other evaluated approaches in terms of convergence tendency and optimality of the results, and that it can be utilized as an efficient approach to estimating transit O-D matrices.
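The K-means baseline and the evaluation criterion (sum of intra-cluster distances) can be sketched as follows; the stop coordinates and the number of clusters are made up for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
stops = rng.uniform(0.0, 10.0, size=(200, 2))  # hypothetical stop coordinates (km)

km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(stops)
# km.inertia_ is the sum of squared intra-cluster distances, i.e. the
# criterion used to compare MFO against K-means, GWO, and GA
print(km.labels_.shape, km.inertia_)
```

A metaheuristic such as MFO would search over candidate centroid placements to minimize the same objective, trading convergence behaviour against computational time.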

  16. Improving chemical species tomography of turbulent flows using covariance estimation.

    PubMed

    Grauer, Samuel J; Hadwin, Paul J; Daun, Kyle J

    2017-05-01

    Chemical species tomography (CST) experiments can be divided into limited-data and full-rank cases. Both require solving ill-posed inverse problems, and thus the measurement data must be supplemented with prior information to carry out reconstructions. The Bayesian framework formalizes the role of additive information, expressed as the mean and covariance of a joint-normal prior probability density function. We present techniques for estimating the spatial covariance of a flow under limited-data and full-rank conditions. Our results show that incorporating a covariance estimate into CST reconstruction via a Bayesian prior increases the accuracy of instantaneous estimates. Improvements are especially dramatic in real-time limited-data CST, which is directly applicable to many industrially relevant experiments.
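Under a joint-normal prior, the Bayesian reconstruction described above reduces to a regularized linear solve for the MAP estimate. The geometry, prior length scale, and noise level below are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)
n_px, n_rays = 64, 20                 # 8x8 pixel grid, 20 beam paths (limited-data)
A = rng.random((n_rays, n_px))        # hypothetical path-length (ray) matrix

# Squared-exponential spatial prior covariance over the 8x8 grid
xy = np.stack(np.meshgrid(np.arange(8), np.arange(8),
                          indexing="ij"), axis=-1).reshape(-1, 2)
d2 = ((xy[:, None, :] - xy[None, :, :]) ** 2).sum(-1)
G_pr = np.exp(-d2 / (2 * 2.0 ** 2)) + 1e-6 * np.eye(n_px)  # length scale 2 px
mu_pr = np.ones(n_px)

x_true = rng.multivariate_normal(mu_pr, G_pr)     # a smooth "species field"
sigma = 0.05
b = A @ x_true + rng.normal(0.0, sigma, n_rays)   # noisy path-integrated data
G_e_inv = np.eye(n_rays) / sigma ** 2

# MAP estimate: minimize the data misfit plus the prior penalty
G_pr_inv = np.linalg.inv(G_pr)
x_map = np.linalg.solve(A.T @ G_e_inv @ A + G_pr_inv,
                        A.T @ G_e_inv @ b + G_pr_inv @ mu_pr)
```

The estimated spatial covariance plays the role of `G_pr` here: the better it reflects the flow's true correlation structure, the more the prior compensates for the rank deficiency of `A` in the limited-data case.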

  17. Improving the accuracy in detection of clustered microcalcifications with a context-sensitive classification model.

    PubMed

    Wang, Juan; Nishikawa, Robert M; Yang, Yongyi

    2016-01-01

    cases) and a set of 188 full-field digital mammogram (FFDM) images (95 cases). The FROC analysis results show that the proposed unified classification approach can significantly improve the detection accuracy of two MC detectors on both SFM and FFDM images. Despite the difference in performance between the two detectors, the unified classifiers can reduce their FP rate to a similar level in the output of the two detectors. In particular, with true-positive rate at 85%, the FP rate on SFM images for the DoG detector was reduced from 1.16 to 0.33 clusters/image (unified SVM) and 0.36 clusters/image (unified Adaboost), respectively; similarly, for the SVM detector, the FP rate was reduced from 0.45 clusters/image to 0.30 clusters/image (unified SVM) and 0.25 clusters/image (unified Adaboost), respectively. Similar FP reduction results were also achieved on FFDM images for the two MC detectors. The proposed unified classification approach can be effective for discriminating MCs from FPs caused by different factors (such as MC-like noise patterns and linear structures) in MC detection. The framework is general and can be applicable for further improving the detection accuracy of existing MC detectors.

  18. A Cluster-Based Dual-Adaptive Topology Control Approach in Wireless Sensor Networks

    PubMed Central

    Gui, Jinsong; Zhou, Kai; Xiong, Naixue

    2016-01-01

    Multi-Input Multi-Output (MIMO) techniques can improve wireless network performance. Sensors are usually single-antenna devices due to hardware complexity and cost, so several sensors can be combined to form a virtual MIMO array, a desirable approach to exploiting MIMO gains efficiently. Also, in large Wireless Sensor Networks (WSNs), clustering can improve network scalability and is an effective topology control approach. Existing virtual MIMO-based clustering schemes either do not fully explore the benefits of MIMO or do not adaptively determine the clustering ranges. The clustering mechanism also needs further improvement to extend the life of the cluster structure. In this paper, we propose an improved clustering scheme for virtual MIMO-based topology construction (ICV-MIMO), which adaptively determines not only the inter-cluster transmission modes but also the clustering ranges. Through a rational division of the cluster head function and optimization of the cluster head selection criteria and information exchange process, the ICV-MIMO scheme effectively reduces network energy consumption and improves the lifetime of the cluster structure compared with an existing typical virtual MIMO-based scheme. Moreover, the message overhead and time complexity remain in the same order of magnitude. PMID:27681731

  19. Mesenchymal stem cells and their conditioned medium improve integration of purified induced pluripotent stem cell-derived cardiomyocyte clusters into myocardial tissue.

    PubMed

    Rubach, Martin; Adelmann, Roland; Haustein, Moritz; Drey, Florian; Pfannkuche, Kurt; Xiao, Bing; Koester, Annette; Udink ten Cate, Floris E A; Choi, Yeong-Hoon; Neef, Klaus; Fatima, Azra; Hannes, Tobias; Pillekamp, Frank; Hescheler, Juergen; Šarić, Tomo; Brockmeier, Konrad; Khalil, Markus

    2014-03-15

    Induced pluripotent stem cell-derived cardiomyocytes (iPS-CMs) might become therapeutically relevant to regenerate myocardial damage. Purified iPS-CMs exhibit poor functional integration into myocardial tissue. The aim of this study was to investigate whether murine mesenchymal stem cells (MSCs) or their conditioned medium (MScond) improves the integration of murine iPS-CMs into myocardial tissue. Vital or nonvital embryonic murine ventricular tissue slices were cocultured with purified clusters of iPS-CMs in combination with murine embryonic fibroblasts (MEFs), MSCs, or MScond. Morphological integration was assessed by visual scoring and functional integration by isometric force and field potential measurements. We observed a moderate morphological integration of iPS-CM clusters into vital, but a poor integration into nonvital, slices. MEFs and MSCs but not MScond improved morphological integration of CMs into nonvital slices and enabled purified iPS-CMs to confer force. Coculture of vital slices with iPS-CMs and MEFs or MSCs resulted in an improved electrical integration. A comparable improvement of electrical coupling was achieved with the cell-free MScond, indicating that soluble factors secreted by MSCs were involved in electrical coupling. We conclude that cells such as MSCs support the engraftment and adhesion of CMs, and confer force to noncontractile tissue. Furthermore, soluble factors secreted by MSCs mediate electrical coupling of purified iPS-CM clusters to myocardial tissue. These data suggest that MSCs may increase the functional engraftment and therapeutic efficacy of transplanted iPS-CMs into infarcted myocardium.

  20. Long-Period Planets in Open Clusters and the Evolution of Planetary Systems

    NASA Astrophysics Data System (ADS)

    Quinn, Samuel N.; White, Russel; Latham, David W.; Stefanik, Robert

    2018-01-01

    Recent discoveries of giant planets in open clusters confirm that they do form and migrate in relatively dense stellar groups, though overall occurrence rates are not yet well constrained because the small sample of giant planets discovered thus far predominantly have short periods. Moreover, planet formation rates and the architectures of planetary systems in clusters may vary significantly -- e.g., due to intercluster differences in the chemical properties that regulate the growth of planetary embryos or in the stellar space density and binary populations, which can influence the dynamical evolution of planetary systems. Constraints on the population of long-period Jovian planets -- those representing the reservoir from which many hot Jupiters likely form, and which are most vulnerable to intracluster dynamical interactions -- can help quantify how the birth environment affects formation and evolution, particularly through comparison of populations possessing a range of ages and chemical and dynamical properties. From our ongoing RV survey of open clusters, we present the discovery of several long-period planets and candidate substellar companions in the Praesepe, Coma Berenices, and Hyades open clusters. From these discoveries, we improve estimates of giant planet occurrence rates in clusters, and we note that high eccentricities in several of these systems support the prediction that the birth environment helps shape planetary system architectures.

  1. Improving Estimates Of Phase Parameters When Amplitude Fluctuates

    NASA Technical Reports Server (NTRS)

    Vilnrotter, V. A.; Brown, D. H.; Hurd, W. J.

    1989-01-01

    Adaptive inverse filter applied to incoming signal and noise. Time-varying inverse-filtering technique developed to improve digital estimate of phase of received carrier signal. Intended for use where received signal fluctuates in amplitude as well as in phase and signal tracked by digital phase-locked loop that keeps its phase error much smaller than 1 radian. Useful in navigation systems, reception of time- and frequency-standard signals, and possibly spread-spectrum communication systems.

  2. Correlation Functions Quantify Super-Resolution Images and Estimate Apparent Clustering Due to Over-Counting

    PubMed Central

    Veatch, Sarah L.; Machta, Benjamin B.; Shelby, Sarah A.; Chiang, Ethan N.; Holowka, David A.; Baird, Barbara A.

    2012-01-01

    We present an analytical method using correlation functions to quantify clustering in super-resolution fluorescence localization images and electron microscopy images of static surfaces in two dimensions. We use this method to quantify how over-counting of labeled molecules contributes to apparent self-clustering and to calculate the effective lateral resolution of an image. This treatment applies to distributions of proteins and lipids in cell membranes, where there is significant interest in using electron microscopy and super-resolution fluorescence localization techniques to probe membrane heterogeneity. When images are quantified using pair auto-correlation functions, the magnitude of apparent clustering arising from over-counting varies inversely with the surface density of labeled molecules and does not depend on the number of times an average molecule is counted. In contrast, we demonstrate that over-counting does not give rise to apparent co-clustering in double label experiments when pair cross-correlation functions are measured. We apply our analytical method to quantify the distribution of the IgE receptor (FcεRI) on the plasma membranes of chemically fixed RBL-2H3 mast cells from images acquired using stochastic optical reconstruction microscopy (STORM/dSTORM) and scanning electron microscopy (SEM). We find that apparent clustering of FcεRI-bound IgE is dominated by over-counting labels on individual complexes when IgE is directly conjugated to organic fluorophores. We verify this observation by measuring pair cross-correlation functions between two distinguishably labeled pools of IgE-FcεRI on the cell surface using both imaging methods. After correcting for over-counting, we observe weak but significant self-clustering of IgE-FcεRI in fluorescence localization measurements, and no residual self-clustering as detected with SEM. We also apply this method to quantify IgE-FcεRI redistribution after deliberate clustering by crosslinking with two
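    The pair auto-correlation analysis described above can be sketched numerically. Below is a minimal estimator of the radial pair auto-correlation g(r) for a 2-D point pattern, using minimum-image (toroidal) distances so no edge correction is needed; for a spatially random pattern g(r) ≈ 1 at all radii, while over-counted localizations would add a short-range peak whose magnitude, as the abstract notes, scales inversely with surface density. All names and parameters here are illustrative, not taken from the paper.

```python
import numpy as np

def pair_autocorrelation(points, box, r_edges):
    """Radial pair auto-correlation g(r) for 2-D points in a periodic box.

    Minimum-image (toroidal) distances avoid edge corrections;
    g(r) = 1 for a completely random (Poisson) pattern.
    """
    n = len(points)
    density = n / box**2
    diff = points[:, None, :] - points[None, :, :]
    diff -= box * np.round(diff / box)              # minimum-image convention
    r = np.sqrt((diff**2).sum(-1))[np.triu_indices(n, k=1)]
    counts, _ = np.histogram(r, bins=r_edges)
    shell_area = np.pi * (r_edges[1:]**2 - r_edges[:-1]**2)
    expected = 0.5 * n * density * shell_area       # Poisson expectation per shell
    return counts / expected

rng = np.random.default_rng(0)
pts = rng.uniform(0, 10.0, size=(1000, 2))   # uniform -> no true clustering
r_edges = np.linspace(0.1, 2.0, 20)
g = pair_autocorrelation(pts, 10.0, r_edges)
print(g.round(2))                            # all bins scatter around 1
```

    With over-counting, each true molecule would contribute several localizations spread by the localization precision, producing g(r) > 1 only at radii comparable to that precision.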

  3. Improving Lidar Turbulence Estimates for Wind Energy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Newman, Jennifer F.; Clifton, Andrew; Churchfield, Matthew J.

    2016-10-06

    Remote sensing devices (e.g., lidars) are quickly becoming a cost-effective and reliable alternative to meteorological towers for wind energy applications. Although lidars can measure mean wind speeds accurately, these devices measure different values of turbulence intensity (TI) than an instrument on a tower. In response to these issues, a lidar TI error reduction model was recently developed for commercially available lidars. The TI error model first applies physics-based corrections to the lidar measurements, then uses machine-learning techniques to further reduce errors in lidar TI estimates. The model was tested at two sites in the Southern Plains where vertically profiling lidars were collocated with meteorological towers. This presentation primarily focuses on the physics-based corrections, which include corrections for instrument noise, volume averaging, and variance contamination. As different factors affect TI under different stability conditions, the combination of physical corrections applied in L-TERRA changes depending on the atmospheric stability during each 10-minute time period. This stability-dependent version of L-TERRA performed well at both sites, reducing TI error and bringing lidar TI estimates closer to estimates from instruments on towers. However, there is still scatter evident in the lidar TI estimates, indicating that there are physics that are not being captured in the current version of L-TERRA. Two options are discussed for modeling the remainder of the TI error physics in L-TERRA: machine learning and lidar simulations. Lidar simulations appear to be a better approach, as they can help improve understanding of atmospheric effects on TI error and do not require a large training data set.
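    One of the physics-based corrections mentioned above, removal of instrument-noise variance from the measured turbulence intensity, can be sketched as follows. The wind series and noise level are hypothetical, and L-TERRA's actual corrections (volume averaging, variance contamination) are more involved; this only illustrates the noise step:

```python
import numpy as np

def corrected_ti(u, sigma_noise):
    """Turbulence intensity (sigma_u / mean u) after subtracting the
    variance contributed by uncorrelated instrument noise."""
    var_corr = max(np.var(u, ddof=1) - sigma_noise**2, 0.0)  # guard against < 0
    return np.sqrt(var_corr) / np.mean(u)

# Synthetic 10-minute wind series: mean 8 m/s, true TI = 0.8/8 = 10%
rng = np.random.default_rng(1)
true_u = 8.0 + 0.8 * rng.standard_normal(6000)
noisy_u = true_u + 0.4 * rng.standard_normal(6000)   # assumed lidar noise level
print(round(corrected_ti(noisy_u, 0.4), 3))          # recovers ~0.1
```

    Without the correction, the added noise variance would inflate the measured TI, which is one reason lidar and tower TI estimates disagree.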

  4. Evaporation rate of nucleating clusters.

    PubMed

    Zapadinsky, Evgeni

    2011-11-21

    The Becker-Döring kinetic scheme is the most frequently used approach to vapor-liquid nucleation. In the present study it has been extended so that master equations for all cluster configurations are included into consideration. In the Becker-Döring kinetic scheme the nucleation rate is calculated through comparison of the balanced steady state and unbalanced steady state solutions of the set of kinetic equations. It is usually assumed that the balanced steady state produces the equilibrium cluster distribution, and that the evaporation rates are identical in the balanced and unbalanced steady state cases. In the present study we have shown that the evaporation rates are not identical in the equilibrium and unbalanced steady state cases. The evaporation rate depends on the number of clusters at the limit of the cluster definition. We have shown that the ratio of the number of n-clusters at the limit of the cluster definition to the total number of n-clusters is different in the equilibrium and unbalanced steady state cases. This causes a difference in evaporation rates between these cases and results in a correction factor to the nucleation rate. A rough estimate puts the correction factor at the order of 10⁻¹; it can be lower if the carrier gas effectively equilibrates the clusters. The developed approach allows one to refine the correction factor with Monte Carlo and molecular dynamics simulations.
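    For reference, the standard steady-state Becker-Döring nucleation rate, which the paper's correction factor would multiply, can be computed from attachment rates and the equilibrium cluster distribution. The sketch below uses a toy classical-nucleation-theory free-energy barrier with purely illustrative coefficients (kT = 1 units):

```python
import numpy as np

def bd_steady_state_rate(beta, delta_g):
    """Steady-state Becker-Doering nucleation rate per monomer:
    J = [ sum_n 1 / (beta_n * N_eq_n) ]^(-1),
    with equilibrium cluster distribution N_eq_n = exp(-dG_n), kT = 1.
    """
    n_eq = np.exp(-delta_g)
    return 1.0 / np.sum(1.0 / (beta * n_eq))

# Toy CNT-style free energy dG_n = -a n + b n^(2/3): the volume term drives
# growth, the surface term creates a barrier (coefficients illustrative only).
n = np.arange(1, 200)
delta_g = -0.5 * n + 3.0 * n ** (2.0 / 3.0)
delta_g -= delta_g[0]                       # reference so that dG_1 = 0
beta = np.ones_like(n, dtype=float)         # unit attachment rate for all sizes
J = bd_steady_state_rate(beta, delta_g)     # small but nonzero flux over barrier
```

    The sum is dominated by sizes near the top of the barrier, which is why the evaporation-rate correction discussed in the abstract propagates almost directly into the nucleation rate.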

  5. Gas stripping and mixing in galaxy clusters: a numerical comparison study

    NASA Astrophysics Data System (ADS)

    Heß, Steffen; Springel, Volker

    2012-11-01

    The ambient hot intrahalo gas in clusters of galaxies is constantly fed and stirred by infalling galaxies, a process that can be studied in detail with cosmological hydrodynamical simulations. However, different numerical methods yield discrepant predictions for crucial hydrodynamical processes, leading for example to different entropy profiles in clusters of galaxies. In particular, the widely used Lagrangian smoothed particle hydrodynamics (SPH) scheme is suspected to strongly damp fluid instabilities and turbulence, which are both crucial to establish the thermodynamic structure of clusters. In this study, we test to what extent our recently developed Voronoi particle hydrodynamics (VPH) scheme yields different results for the stripping of gas out of infalling galaxies and for the bulk gas properties of clusters. We consider both the evolution of isolated galaxy models that are exposed to a stream of intracluster medium or are dropped into cluster models, as well as non-radiative cosmological simulations of cluster formation. We also compare our particle-based method with results obtained with a fundamentally different discretization approach as implemented in the moving-mesh code AREPO. We find that VPH leads to noticeably faster stripping of gas out of galaxies than SPH, in better agreement with the mesh code than with SPH. We show that despite the fact that VPH in its present form is not as accurate as the moving-mesh code in our investigated cases, its improved accuracy of gradient estimates makes VPH an attractive alternative to SPH.

  6. Improving Distribution Resiliency with Microgrids and State and Parameter Estimation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tuffner, Francis K.; Williams, Tess L.; Schneider, Kevin P.

    Modern society relies on low-cost, reliable electrical power, both to maintain industry and to provide basic social services to the populace. When major disturbances occur, such as Hurricane Katrina or Hurricane Sandy, the nation's electrical infrastructure can experience significant outages. To help prevent the spread of these outages, as well as to facilitate faster restoration afterward, various aspects of power system resiliency must be improved. Two such approaches are breaking the system into smaller microgrid sections, and improving insight into operations to detect failures or mis-operations before they become critical. By breaking the system into smaller microgrid islands, power can be maintained in areas where distributed generation and energy storage resources are still available but bulk power generation is no longer connected. Additionally, microgrid systems can maintain service to local pockets of customers when there has been extensive damage to the local distribution system. However, microgrids are grid-connected a majority of the time, and implementing and operating a microgrid is much different when islanded. This report discusses work conducted by the Pacific Northwest National Laboratory that developed improvements for simulation tools to capture the characteristics of microgrids and how they can be used to develop new operational strategies. These operational strategies reduce the cost of microgrid operation and increase the reliability and resilience of the nation's electricity infrastructure. In addition to the ability to break the system into microgrids, improved observability into the state of the distribution grid can make the power system more resilient. State estimation on the transmission system already provides great insight into grid operations and detecting abnormal conditions by leveraging existing measurements. These transmission-level approaches are expanded to

  7. An off-axis galaxy cluster merger: Abell 0141

    NASA Astrophysics Data System (ADS)

    Caglar, Turgay

    2018-04-01

    We present structural analysis results of Abell 0141 (z = 0.23) based on X-ray data. The X-ray luminosity map demonstrates that Abell 0141 (A0141) is a bimodal galaxy cluster, which is separated on the sky by ˜0.65 Mpc with an elongation along the north-south direction. The optical galaxy density map also demonstrates this bimodality. We estimate sub-cluster ICM temperatures of 5.17^{+0.20}_{-0.19} keV for A0141N and 5.23^{+0.24}_{-0.23} keV for A0141S. We obtain X-ray morphological parameters w = 0.034 ± 0.004, c = 0.113 ± 0.004, and w = 0.039 ± 0.004, c = 0.104 ± 0.005 for A0141N and A0141S, respectively. The resulting X-ray morphological parameters indicate that both sub-clusters are moderately disturbed non-cool core structures. We find a slight brightness jump in the bridge region, and yet there is still an absence of strong X-ray emitting gas between the sub-clusters. We discover a significant hotspot (˜10 keV) between the sub-clusters, and a Mach number M = 1.69^{+0.40}_{-0.37} is obtained by using the temperature jump condition. However, we find no direct evidence of shock heating between the sub-clusters. We estimate the sub-clusters' central entropies as K0 > 100 keV cm², which indicates that the sub-clusters are not cool cores. We find some evidence that the system is undergoing an off-axis collision; however, the core of each sub-cluster has not yet been destroyed. Based on the orientation of the sub-clusters' X-ray tails, we suggest that the northern sub-cluster is moving toward the south-west and the southern sub-cluster toward the north-east. In conclusion, we are witnessing an early phase of close core passage between the sub-clusters.
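    A Mach number of the kind quoted above is commonly obtained by inverting the Rankine-Hugoniot temperature-jump condition for a monatomic gas (γ = 5/3). A minimal sketch, with illustrative temperatures rather than the paper's exact pre-/post-shock values:

```python
def temperature_jump(mach, gamma=5.0/3.0):
    """Rankine-Hugoniot post-/pre-shock temperature ratio T2/T1."""
    g = gamma
    return ((2*g*mach**2 - (g - 1)) * ((g - 1)*mach**2 + 2)) / ((g + 1)**2 * mach**2)

def mach_from_jump(t_ratio, gamma=5.0/3.0, lo=1.0, hi=50.0):
    """Invert the jump condition by bisection (T2/T1 increases with M for M > 1)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if temperature_jump(mid, gamma) < t_ratio:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Illustrative only: a ~10 keV hotspot against ~5.2 keV sub-cluster gas
mach = mach_from_jump(10.0 / 5.2)
print(round(mach, 2))
```

    At M = 1 the ratio is exactly 1, so any measured temperature jump above unity maps to a supersonic Mach number.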

  8. Improving Multidimensional Wireless Sensor Network Lifetime Using Pearson Correlation and Fractal Clustering.

    PubMed

    Almeida, Fernando R; Brayner, Angelo; Rodrigues, Joel J P C; Maia, Jose E Bessa

    2017-06-07

    An efficient strategy for reducing message transmission in a wireless sensor network (WSN) is to group sensors by means of an abstraction called a cluster. The key idea behind the cluster-formation process is to identify a set of sensors whose sensed values exhibit some correlation. Nowadays, sensors are able to sense multiple physical phenomena simultaneously, thereby yielding multidimensional data. This paper presents three methods for clustering sensors in WSNs whose sensors collect multidimensional data. The proposed approaches implement the concept of multidimensional behavioral clustering. To show the benefits introduced by the proposed methods, a prototype has been implemented and experiments have been carried out on real data. The results show that the proposed methods decrease the amount of data flowing in the network while maintaining a low root-mean-square error (RMSE).
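    As a much-reduced illustration of correlation-based grouping (not the paper's multidimensional behavioral clustering or its fractal methods), sensors can be grouped greedily by pairwise Pearson correlation; the threshold and signals below are made up:

```python
import numpy as np

def correlation_groups(readings, threshold=0.9):
    """Greedy grouping of sensors whose pairwise Pearson correlation with a
    seed sensor exceeds `threshold`. readings: array (n_sensors, n_samples)."""
    corr = np.corrcoef(readings)
    unassigned, groups = set(range(len(readings))), []
    while unassigned:
        seed = min(unassigned)                       # seed always joins its own group
        group = sorted(s for s in unassigned if corr[seed, s] >= threshold)
        groups.append(group)
        unassigned -= set(group)
    return groups

rng = np.random.default_rng(2)
t = np.linspace(0, 10, 500)
base_a, base_b = np.sin(t), np.cos(3 * t)            # two underlying phenomena
readings = np.stack([
    base_a + 0.05 * rng.standard_normal(500),        # sensors 0-1 track trend A
    base_a + 0.05 * rng.standard_normal(500),
    base_b + 0.05 * rng.standard_normal(500),        # sensors 2-3 track trend B
    base_b + 0.05 * rng.standard_normal(500),
])
print(correlation_groups(readings))  # -> [[0, 1], [2, 3]]
```

    Sensors in the same group could then report a single representative stream, which is the mechanism by which clustering reduces network traffic.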

  9. Searching for galaxy clusters in the Kilo-Degree Survey

    NASA Astrophysics Data System (ADS)

    Radovich, M.; Puddu, E.; Bellagamba, F.; Roncarelli, M.; Moscardini, L.; Bardelli, S.; Grado, A.; Getman, F.; Maturi, M.; Huang, Z.; Napolitano, N.; McFarland, J.; Valentijn, E.; Bilicki, M.

    2017-02-01

    Aims: In this paper, we present the tools used to search for galaxy clusters in the Kilo-Degree Survey (KiDS), and our first results. Methods: The cluster detection is based on an implementation of the optimal filtering technique that enables us to identify clusters as over-densities in the distribution of galaxies, using their positions on the sky, magnitudes, and photometric redshifts. The contamination and completeness of the cluster catalog are derived using mock catalogs based on the data themselves. The optimal signal-to-noise threshold for cluster detection is obtained by randomizing the galaxy positions and selecting the value that produces a contamination of less than 20%. Starting from a subset of clusters detected with high significance at low redshifts, we shift them to higher redshifts to estimate the completeness as a function of redshift: the average completeness is 85%. An estimate of the mass of the clusters is derived using the richness as a proxy. Results: We obtained 1858 candidate clusters; a comparison with other publicly available cluster catalogs shows that we match more than 50% of the clusters (77% in the case of the redMaPPer catalog). We also cross-matched our cluster catalog with the Abell clusters, and with clusters found by XMM and in the Planck-SZ survey; however, only a small number of them lie inside the currently available KiDS area. The catalog is available at http://kids.strw.leidenuniv.nl/DR2 and at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (http://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/598/A107
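    The threshold-selection step described in the Methods can be sketched as follows: given detection significances from the real catalog and from catalogs with randomized galaxy positions, pick the lowest signal-to-noise cut that keeps the contamination below 20%. The significance distributions here are synthetic stand-ins, not KiDS values:

```python
import numpy as np

def optimal_threshold(sn_data, sn_random, max_contamination=0.2):
    """Lowest S/N cut whose contamination, estimated as the ratio of
    detections in a position-randomized catalog to detections in the
    data, stays below `max_contamination`."""
    for thr in np.sort(sn_data):
        n_data = np.sum(sn_data >= thr)
        n_rand = np.sum(sn_random >= thr)
        if n_data > 0 and n_rand / n_data <= max_contamination:
            return float(thr)
    return None

# Synthetic stand-ins: pure-noise detections vs. noise plus real clusters
rng = np.random.default_rng(3)
sn_random = rng.normal(2.0, 0.8, 5000)
sn_data = np.concatenate([rng.normal(2.0, 0.8, 5000),
                          rng.normal(6.0, 1.5, 800)])
thr = optimal_threshold(sn_data, sn_random)
```

    Raising the cut trades completeness for purity; the 20% target fixes where on that trade-off the catalog sits.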

  10. The First Photometric Analysis of the Open Clusters Dolidze 32 and 36

    NASA Astrophysics Data System (ADS)

    Amin, M. Y.; Elsanhory, W. H.; Haroon, A. A.

    2018-06-01

    We present a first study of the two open clusters Dolidze 32 and Dolidze 36 in the near-infrared JHKs region with the aid of the PPMXL catalog. In our study, we used a method able to separate open-cluster stars from those belonging to the stellar background. Our calculations indicate that the numbers of probable members are 286 for Dolidze 32 and 780 for Dolidze 36. We estimate the cluster centers to be α = 18h41m4s.188, δ = -04°04'57''.144 for Dolidze 32 and α = 20h02m29s.95, δ = 42°05'49''.2 for Dolidze 36. The limiting radii of Dolidze 32 and Dolidze 36 are about 0.94 ± 0.03 pc and 0.81 ± 0.03 pc, respectively. The color-magnitude diagram allows us to estimate the reddening E(B - V) = 1.41 ± 0.03 mag for Dolidze 32 and E(B - V) = 0.19 ± 0.04 mag for Dolidze 36, with distance moduli (m - M) of 11.36 ± 0.02 and 10.10 ± 0.03, respectively. The luminosity and mass functions of the two clusters have also been estimated: the resulting masses are 437 ± 21 M⊙ and 678 ± 26 M⊙, and the mass-function slopes are -2.56 ± 0.62 and -2.01 ± 0.70 for Dolidze 32 and Dolidze 36, respectively. Finally, the dynamical state of the two clusters shows that only Dolidze 36 can be considered dynamically relaxed.
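    As a quick worked check, the quoted distance moduli translate into distances via d = 10^((m−M+5)/5) pc. This assumes the moduli are extinction-corrected true moduli, which the abstract does not state explicitly:

```python
def distance_pc(distance_modulus):
    """Distance in parsecs from a distance modulus: d = 10^((m-M+5)/5)."""
    return 10 ** ((distance_modulus + 5.0) / 5.0)

# Moduli quoted above for the two clusters
for name, mu in [("Dolidze 32", 11.36), ("Dolidze 36", 10.10)]:
    print(f"{name}: {distance_pc(mu):.0f} pc")
# -> Dolidze 32: 1871 pc
# -> Dolidze 36: 1047 pc
```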

  11. Biased phylodynamic inferences from analysing clusters of viral sequences

    PubMed Central

    Xiang, Fei; Frost, Simon D. W.

    2017-01-01

    Phylogenetic methods are being increasingly used to help understand the transmission dynamics of measurably evolving viruses, including HIV. Clusters of highly similar sequences are often observed, which appear to follow a 'power law' behaviour, with a small number of very large clusters. These clusters may help to identify subpopulations in an epidemic, and inform where intervention strategies should be implemented. However, clustering of samples does not necessarily imply the presence of a subpopulation with high transmission rates, as groups of closely related viruses can also occur due to non-epidemiological effects such as over-sampling. It is important to ensure that observed phylogenetic clustering reflects true heterogeneity in the transmitting population, and is not being driven by non-epidemiological effects. We quantify the effect of using a falsely identified 'transmission cluster' of sequences to estimate phylodynamic parameters, including the effective population size and exponential growth rate, under several demographic scenarios. Our simulation studies show that taking the maximum-size cluster to re-estimate parameters from trees simulated under a randomly mixing, constant-population-size coalescent process systematically underestimates the overall effective population size. In addition, the transmission cluster wrongly resembles an exponential or logistic growth model 99% of the time. We also illustrate the consequences of false clusters in exponentially growing coalescent and birth-death trees, where again the growth rate is skewed upwards. This has clear implications for identifying clusters in large viral databases, where a false cluster could result in wasted intervention resources. PMID:28852573

  12. Multiple populations within globular clusters in Early-type galaxies Exploring their effect on stellar initial mass function estimates

    NASA Astrophysics Data System (ADS)

    Chantereau, W.; Usher, C.; Bastian, N.

    2018-05-01

    It is now well-established that most (if not all) ancient globular clusters host multiple populations, characterised by distinct chemical features such as helium abundance variations along with N-C and Na-O anti-correlations at fixed [Fe/H]. These distinct chemical features are similar to what is found in the centres of massive early-type galaxies and may influence measurements of the global properties of those galaxies. Additionally, recent results have suggested that M/L variations found in the centres of massive early-type galaxies might be due to a bottom-heavy stellar initial mass function. We present an analysis of the effects of globular cluster-like multiple populations on the integrated properties of early-type galaxies. In particular, we focus on spectral features in the integrated optical spectrum and on the global mass-to-light ratio, both of which have been used to infer variations in the stellar initial mass function. To achieve this we develop appropriate stellar population synthesis models and take into account, for the first time, an initial-final mass relation that allows for a varying He abundance. We conclude that while multiple populations may be present in massive early-type galaxies, they are likely not responsible for the observed variations in the mass-to-light ratio and IMF-sensitive line strengths. Finally, we estimate the fraction of stars with multiple-population chemistry that come from disrupted globular clusters within massive ellipticals and find that they may explain some of the observed chemical patterns in the centres of these galaxies.

  13. Estimation of breast percent density in raw and processed full field digital mammography images via adaptive fuzzy c-means clustering and support vector machine segmentation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Keller, Brad M.; Nathan, Diane L.; Wang Yan

    Purpose: The amount of fibroglandular tissue content in the breast as estimated mammographically, commonly referred to as breast percent density (PD%), is one of the most significant risk factors for developing breast cancer. Approaches to quantify breast density commonly focus on either semiautomated methods or visual assessment, both of which are highly subjective. Furthermore, most studies published to date investigating computer-aided assessment of breast PD% have been performed using digitized screen-film mammograms, while digital mammography is increasingly replacing screen-film mammography in breast cancer screening protocols. Digital mammography imaging generates two types of images for analysis, raw (i.e., 'FOR PROCESSING') and vendor postprocessed (i.e., 'FOR PRESENTATION'), of which postprocessed images are commonly used in clinical practice. Development of an algorithm which effectively estimates breast PD% in both raw and postprocessed digital mammography images would be beneficial in terms of direct clinical application and retrospective analysis. Methods: This work proposes a new algorithm for fully automated quantification of breast PD% based on adaptive multiclass fuzzy c-means (FCM) clustering and support vector machine (SVM) classification, optimized for the imaging characteristics of both raw and processed digital mammography images as well as for individual patient and image characteristics. Our algorithm first delineates the breast region within the mammogram via an automated thresholding scheme to identify background air followed by a straight line Hough transform to extract the pectoral muscle region. The algorithm then applies adaptive FCM clustering based on an optimal number of clusters derived from image properties of the specific mammogram to subdivide the breast into regions of similar gray-level intensity. Finally, a SVM classifier is trained to identify which clusters within the breast tissue are likely fibroglandular, which

  14. Estimation of breast percent density in raw and processed full field digital mammography images via adaptive fuzzy c-means clustering and support vector machine segmentation

    PubMed Central

    Keller, Brad M.; Nathan, Diane L.; Wang, Yan; Zheng, Yuanjie; Gee, James C.; Conant, Emily F.; Kontos, Despina

    2012-01-01

    Purpose: The amount of fibroglandular tissue content in the breast as estimated mammographically, commonly referred to as breast percent density (PD%), is one of the most significant risk factors for developing breast cancer. Approaches to quantify breast density commonly focus on either semiautomated methods or visual assessment, both of which are highly subjective. Furthermore, most studies published to date investigating computer-aided assessment of breast PD% have been performed using digitized screen-film mammograms, while digital mammography is increasingly replacing screen-film mammography in breast cancer screening protocols. Digital mammography imaging generates two types of images for analysis, raw (i.e., “FOR PROCESSING”) and vendor postprocessed (i.e., “FOR PRESENTATION”), of which postprocessed images are commonly used in clinical practice. Development of an algorithm which effectively estimates breast PD% in both raw and postprocessed digital mammography images would be beneficial in terms of direct clinical application and retrospective analysis. Methods: This work proposes a new algorithm for fully automated quantification of breast PD% based on adaptive multiclass fuzzy c-means (FCM) clustering and support vector machine (SVM) classification, optimized for the imaging characteristics of both raw and processed digital mammography images as well as for individual patient and image characteristics. Our algorithm first delineates the breast region within the mammogram via an automated thresholding scheme to identify background air followed by a straight line Hough transform to extract the pectoral muscle region. The algorithm then applies adaptive FCM clustering based on an optimal number of clusters derived from image properties of the specific mammogram to subdivide the breast into regions of similar gray-level intensity. Finally, a SVM classifier is trained to identify which clusters within the breast tissue are likely fibroglandular, which

  15. Estimation of breast percent density in raw and processed full field digital mammography images via adaptive fuzzy c-means clustering and support vector machine segmentation.

    PubMed

    Keller, Brad M; Nathan, Diane L; Wang, Yan; Zheng, Yuanjie; Gee, James C; Conant, Emily F; Kontos, Despina

    2012-08-01

    The amount of fibroglandular tissue content in the breast as estimated mammographically, commonly referred to as breast percent density (PD%), is one of the most significant risk factors for developing breast cancer. Approaches to quantify breast density commonly focus on either semiautomated methods or visual assessment, both of which are highly subjective. Furthermore, most studies published to date investigating computer-aided assessment of breast PD% have been performed using digitized screen-film mammograms, while digital mammography is increasingly replacing screen-film mammography in breast cancer screening protocols. Digital mammography imaging generates two types of images for analysis, raw (i.e., "FOR PROCESSING") and vendor postprocessed (i.e., "FOR PRESENTATION"), of which postprocessed images are commonly used in clinical practice. Development of an algorithm which effectively estimates breast PD% in both raw and postprocessed digital mammography images would be beneficial in terms of direct clinical application and retrospective analysis. This work proposes a new algorithm for fully automated quantification of breast PD% based on adaptive multiclass fuzzy c-means (FCM) clustering and support vector machine (SVM) classification, optimized for the imaging characteristics of both raw and processed digital mammography images as well as for individual patient and image characteristics. Our algorithm first delineates the breast region within the mammogram via an automated thresholding scheme to identify background air followed by a straight line Hough transform to extract the pectoral muscle region. The algorithm then applies adaptive FCM clustering based on an optimal number of clusters derived from image properties of the specific mammogram to subdivide the breast into regions of similar gray-level intensity. 
Finally, an SVM classifier is trained to identify which clusters within the breast tissue are likely fibroglandular, which are then aggregated into a
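    The clustering stage described in this abstract can be illustrated with a bare-bones fuzzy c-means on 1-D gray levels. This is a generic textbook FCM with fuzzifier m = 2 and synthetic intensities, not the paper's adaptive multiclass variant with an image-derived cluster number:

```python
import numpy as np

def fuzzy_c_means(x, c, m=2.0, n_iter=100):
    """Bare-bones fuzzy c-means on a 1-D intensity array x.

    Alternates the classic updates:
      u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1))   (memberships)
      v_i  = sum_k u_ik^m x_k / sum_k u_ik^m     (centroids)
    """
    v = np.quantile(x, (np.arange(c) + 0.5) / c)        # spread initial centroids
    for _ in range(n_iter):
        d = np.abs(x[None, :] - v[:, None]) + 1e-12     # (c, n) distances
        u = 1.0 / np.sum((d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0)),
                         axis=1)
        w = u ** m
        v = (w @ x) / w.sum(axis=1)
    return np.sort(v), u

# Two synthetic gray-level populations (e.g., fatty vs. dense tissue)
rng = np.random.default_rng(4)
x = np.concatenate([rng.normal(50, 5, 400), rng.normal(150, 10, 200)])
centers, u = fuzzy_c_means(x, c=2)
print(centers.round(1))   # close to the true means of 50 and 150
```

    In the paper's pipeline the memberships would feed the SVM step, which decides which intensity clusters are fibroglandular before computing PD%.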

  16. Target Information Processing: A Joint Decision and Estimation Approach

    DTIC Science & Technology

    2012-03-29

    ground targets (track-before-detect) using a computer cluster and graphics processing unit. Estimation and filtering theory is one of the most important...

  17. Two serendipitous low-mass LMC clusters discovered with HST

    NASA Astrophysics Data System (ADS)

    Santiago, Basilio X.; Elson, Rebecca A. W.; Sigurdsson, Steinn; Gilmore, Gerard F.

    1998-04-01

    We present V and I photometry of two open clusters in the LMC down to V~26. The clusters were imaged with the Wide Field and Planetary Camera 2 (WFPC2) on board the Hubble Space Telescope (HST), as part of the Medium Deep Survey Key Project. Both are low-luminosity (M_V~-3.5), low-mass (M~10^3 Msolar) systems. The chance discovery of these two clusters in two parallel WFPC2 fields suggests a significant incompleteness in the LMC cluster census near the bar. One of the clusters is roughly elliptical and compact, with a steep light profile, a central surface brightness mu_V(0)~20.2 mag arcsec^-2, a half-light radius r_hl~0.9 pc (total visual major diameter D~3 pc) and an estimated mass M~1500 Msolar. From the colour-magnitude diagram and isochrone fits we estimate its age as tau~(2-5)x10^8 yr. Its mass function has a fitted slope of Gamma = Delta log phi(M) / Delta log M = -1.8 +/- 0.7 in the range probed (0.9 <~ M/Msolar <~ 4.5). The other cluster is more irregular and sparse, having shallower density and surface brightness profiles. We obtain Gamma = -1.2 +/- 0.4, and estimate its mass as M~400 Msolar. A derived upper limit for its age is tau <~ 5x10^8 yr. Both clusters have mass functions with slopes similar to that of R136, a massive LMC cluster, for which HST results indicate Gamma~-1.2. They also seem to be relaxed in their cores and well contained within their tidal radii.

  18. Stabilizing ultrasmall Au clusters for enhanced photoredox catalysis.

    PubMed

    Weng, Bo; Lu, Kang-Qiang; Tang, Zichao; Chen, Hao Ming; Xu, Yi-Jun

    2018-04-18

    Recently, loading ligand-protected gold (Au) clusters as visible-light photosensitizers onto various supports for photoredox catalysis has attracted considerable attention. However, efficient control of the long-term photostability of Au clusters at the metal-support interface remains challenging. Herein, we report a simple and efficient method for enhancing the photostability of glutathione-protected Au clusters (Au GSH clusters) loaded on the surface of SiO2 spheres by utilizing multifunctional branched polyethylenimine (BPEI) as a surface-charge-modifying, reducing, and stabilizing agent. The sequential coating of thickness-controlled TiO2 shells further significantly improves the photocatalytic efficiency, while the structurally designed core-shell SiO2-Au GSH clusters-BPEI@TiO2 composites maintain high photostability under prolonged light illumination. This joint strategy of interfacial modification and composition engineering provides a facile guideline for stabilizing ultrasmall Au clusters and for the rational design of Au cluster-based composites with improved activity toward target applications in photoredox catalysis.

  19. Distant Massive Clusters and Cosmology

    NASA Technical Reports Server (NTRS)

    Donahue, Megan

    1999-01-01

    We present a status report of our X-ray study and analysis of a complete sample of distant (z=0.5-0.8), X-ray luminous clusters of galaxies. We have obtained ASCA and ROSAT observations of the five brightest Extended Medium Sensitivity Survey (EMSS) clusters with z > 0.5. We have constructed an observed temperature function for these clusters, and measured iron abundances for all of these clusters. We have developed an analytic expression for the behavior of the mass-temperature relation in a low-density universe. We use this mass-temperature relation together with a Press-Schechter-based model to derive the expected temperature function for different values of Omega-M. We combine this analysis with the observed temperature functions at redshifts from 0 to 0.8 to derive maximum likelihood estimates for the value of Omega-M. We report preliminary results of this analysis.

  20. Improving estimates of genetic maps: a meta-analysis-based approach.

    PubMed

    Stewart, William C L

    2007-07-01

    Inaccurate genetic (or linkage) maps can reduce the power to detect linkage, increase type I error, and distort haplotype and relationship inference. To improve the accuracy of existing maps, I propose a meta-analysis-based method that combines independent map estimates into a single estimate of the linkage map. The method uses the variance of each independent map estimate to combine them efficiently, whether the map estimates use the same set of markers or not. As compared with a joint analysis of the pooled genotype data, the proposed method is attractive for three reasons: (1) it has comparable efficiency to the maximum likelihood map estimate when the pooled data are homogeneous; (2) relative to existing map estimation methods, it can have increased efficiency when the pooled data are heterogeneous; and (3) it avoids the practical difficulties of pooling human subjects data. On the basis of simulated data modeled after two real data sets, the proposed method can reduce the sampling variation of linkage maps commonly used in whole-genome linkage scans. Furthermore, when the independent map estimates are also maximum likelihood estimates, the proposed method performs as well as or better than when they are estimated by the program CRIMAP. Since variance estimates of maps may not always be available, I demonstrate the feasibility of three different variance estimators. Overall, the method should prove useful to investigators who need map positions for markers not contained in publicly available maps, and to those who wish to minimize the negative effects of inaccurate maps. Copyright 2007 Wiley-Liss, Inc.
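    The core of the combination step is a fixed-effect, inverse-variance weighted mean; a minimal sketch follows (the paper's method additionally handles map estimates built from different marker sets, which is not shown here, and the numbers are hypothetical):

```python
import numpy as np

def combine_map_estimates(estimates, variances):
    """Fixed-effect meta-analysis: inverse-variance weighted combination."""
    w = 1.0 / np.asarray(variances, dtype=float)   # precision weights
    est = np.asarray(estimates, dtype=float)
    combined = float((w * est).sum() / w.sum())
    combined_var = float(1.0 / w.sum())            # variance of the combined estimate
    return combined, combined_var

# two independent estimates of one inter-marker distance in cM (hypothetical)
theta, var = combine_map_estimates([10.2, 9.6], [0.25, 0.50])
# theta = 10.0, var = 1/6 -- never larger than the smallest input variance
```

    The more precise estimate (variance 0.25) pulls the combined value toward itself, which is exactly the efficiency property the abstract appeals to.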

  1. Clustering of change patterns using Fourier coefficients.

    PubMed

    Kim, Jaehee; Kim, Haseong

    2008-01-15

    To understand the behavior of genes, it is important to explore how the patterns of gene expression change over a time period because biologically related gene groups can share the same change patterns. Many clustering algorithms have been proposed to group observation data. However, because of the complexity of the underlying functions, there have not been many studies on grouping data based on change patterns. In this study, the problem of finding similar change patterns reduces to clustering with the derivative Fourier coefficients. The sample Fourier coefficients not only provide information about the underlying functions, but also reduce the dimension. In addition, as their limiting distribution is a multivariate normal, a model-based clustering method incorporating statistical properties would be appropriate. This work is aimed at discovering gene groups with similar change patterns that share similar biological properties. We developed a statistical model using derivative Fourier coefficients to identify similar change patterns of gene expression. We used a model-based method to cluster the Fourier series estimation of derivatives. The model-based method is advantageous over other methods in our proposed model because the sample Fourier coefficients asymptotically follow the multivariate normal distribution. Change patterns are automatically estimated with the Fourier representation in our model. Our model was tested in simulations and on real gene data sets. The simulation results showed that the model-based clustering method with the sample Fourier coefficients has a lower clustering error rate than K-means clustering. Even when the number of repeated time points was small, the same results were obtained. We also applied our model to cluster change patterns of yeast cell cycle microarray expression data with alpha-factor synchronization. It showed that, as the method clusters with the probability-neighboring data, the model-based clustering with our
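    The dimension-reduction idea can be sketched as follows: each expression time series is summarized by its first few sample Fourier coefficients, in which opposite change patterns become cleanly separable. This is an illustration only; the paper clusters coefficients of the derivative with a Gaussian model-based method, not shown here, and the two sinusoidal "patterns" are synthetic.

```python
import numpy as np

def fourier_features(series, k=3):
    """First k Fourier coefficients (DC excluded) as 2k real features per series."""
    coefs = np.fft.rfft(series, axis=1)[:, 1:k + 1]
    return np.hstack([coefs.real, coefs.imag])

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 24, endpoint=False)
up = np.sin(2 * np.pi * t) + rng.normal(0, 0.1, (30, t.size))     # one change pattern
down = -np.sin(2 * np.pi * t) + rng.normal(0, 0.1, (30, t.size))  # the opposite pattern
F = fourier_features(np.vstack([up, down]))   # 60 series -> 6 features each
```

    Column 3 of F (the imaginary part of the first harmonic) already separates the two pattern groups by sign, so any reasonable clustering of the coefficient vectors recovers them.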

  2. Improving Factor Score Estimation Through the Use of Observed Background Characteristics

    PubMed Central

    Curran, Patrick J.; Cole, Veronica; Bauer, Daniel J.; Hussong, Andrea M.; Gottfredson, Nisha

    2016-01-01

    A challenge facing nearly all studies in the psychological sciences is how to best combine multiple items into a valid and reliable score to be used in subsequent modelling. The most ubiquitous method is to compute a mean of items, but more contemporary approaches use various forms of latent score estimation. Regardless of approach, outside of large-scale testing applications, scoring models rarely include background characteristics to improve score quality. The current paper used a Monte Carlo simulation design to study score quality for different psychometric models that did and did not include covariates across levels of sample size, number of items, and degree of measurement invariance. The inclusion of covariates improved score quality for nearly all design factors, and in no case did the covariates degrade score quality relative to not considering the influences at all. Results suggest that the inclusion of observed covariates can improve factor score estimation. PMID:28757790

  3. IPEG- IMPROVED PRICE ESTIMATION GUIDELINES (IBM PC VERSION)

    NASA Technical Reports Server (NTRS)

    Aster, R. W.

    1994-01-01

    The Improved Price Estimation Guidelines, IPEG, program provides a simple yet accurate estimate of the price of a manufactured product. IPEG facilitates sensitivity studies of price estimates at considerably less expense than would be incurred by using the Standard Assembly-line Manufacturing Industry Simulation, SAMIS, program (COSMIC program NPO-16032). A difference of less than one percent between the IPEG and SAMIS price estimates has been observed with realistic test cases. However, the IPEG simplification of SAMIS allows the analyst with limited time and computing resources to perform a greater number of sensitivity studies than with SAMIS. Although IPEG was developed for the photovoltaics industry, it is readily adaptable to any standard assembly line type of manufacturing industry. IPEG estimates the annual production price per unit. The input data includes cost of equipment, space, labor, materials, supplies, and utilities. Production on an industry wide basis or a process wide basis can be simulated. Once the IPEG input file is prepared, the original price is estimated and sensitivity studies may be performed. The IPEG user selects a sensitivity variable and a set of values. IPEG will compute a price estimate and a variety of other cost parameters for every specified value of the sensitivity variable. IPEG is designed as an interactive system and prompts the user for all required information and offers a variety of output options. The IPEG/PC program is written in TURBO PASCAL for interactive execution on an IBM PC computer under DOS 2.0 or above with at least 64K of memory. The IBM PC color display and color graphics adapter are needed to use the plotting capabilities in IPEG/PC. IPEG/PC was developed in 1984. The original IPEG program is written in SIMSCRIPT II.5 for interactive execution and has been implemented on an IBM 370 series computer with a central memory requirement of approximately 300K of 8 bit bytes. 
The original IPEG was developed in 1980.

  4. IPEG- IMPROVED PRICE ESTIMATION GUIDELINES (IBM 370 VERSION)

    NASA Technical Reports Server (NTRS)

    Chamberlain, R. G.

    1994-01-01

    The Improved Price Estimation Guidelines, IPEG, program provides a simple yet accurate estimate of the price of a manufactured product. IPEG facilitates sensitivity studies of price estimates at considerably less expense than would be incurred by using the Standard Assembly-line Manufacturing Industry Simulation, SAMIS, program (COSMIC program NPO-16032). A difference of less than one percent between the IPEG and SAMIS price estimates has been observed with realistic test cases. However, the IPEG simplification of SAMIS allows the analyst with limited time and computing resources to perform a greater number of sensitivity studies than with SAMIS. Although IPEG was developed for the photovoltaics industry, it is readily adaptable to any standard assembly line type of manufacturing industry. IPEG estimates the annual production price per unit. The input data includes cost of equipment, space, labor, materials, supplies, and utilities. Production on an industry wide basis or a process wide basis can be simulated. Once the IPEG input file is prepared, the original price is estimated and sensitivity studies may be performed. The IPEG user selects a sensitivity variable and a set of values. IPEG will compute a price estimate and a variety of other cost parameters for every specified value of the sensitivity variable. IPEG is designed as an interactive system and prompts the user for all required information and offers a variety of output options. The IPEG/PC program is written in TURBO PASCAL for interactive execution on an IBM PC computer under DOS 2.0 or above with at least 64K of memory. The IBM PC color display and color graphics adapter are needed to use the plotting capabilities in IPEG/PC. IPEG/PC was developed in 1984. The original IPEG program is written in SIMSCRIPT II.5 for interactive execution and has been implemented on an IBM 370 series computer with a central memory requirement of approximately 300K of 8 bit bytes. 
The original IPEG was developed in 1980.

  5. Globular Cluster Variable Stars—Atlas and Coordinate Improvement using AAVSOnet Telescopes (Abstract)

    NASA Astrophysics Data System (ADS)

    Welch, D.; Henden, A.; Bell, T.; Suen, C.; Fare, I.; Sills, A.

    2015-12-01

    (Abstract only) The variable stars of globular clusters have played and continue to play a significant role in our understanding of certain classes of variable stars. Since all stars associated with a cluster have the same age, metallicity, and distance, and usually very similar (if not identical) reddenings, such variables can produce uniquely powerful constraints on where certain types of pulsation behaviors are excited. Advanced amateur astronomers are increasingly well-positioned to provide long-term CCD monitoring of globular cluster variable stars but are hampered by a long history of poor or inaccessible finder charts and coordinates. Many of the variable-rich clusters have published photographic finder charts taken in relatively poor seeing with blue-sensitive photographic plates. While useful signal-to-noise ratios are relatively straightforward to achieve for RR Lyrae, Type 2 Cepheids, and red giant variables, correct identification remains a difficult issue, particularly when images are taken at V or longer wavelengths. We describe the project and report its progress using the OC61, TMO61, and SRO telescopes of AAVSOnet after the first year of image acquisition, and demonstrate several of the data products being developed for globular cluster variables.

  6. Intra-class correlation estimates for assessment of vitamin A intake in children.

    PubMed

    Agarwal, Girdhar G; Awasthi, Shally; Walter, Stephen D

    2005-03-01

    In many community-based surveys, multi-level sampling is inherent in the design. In the design of these studies, especially to calculate the appropriate sample size, investigators need good estimates of the intra-class correlation coefficient (ICC), along with the cluster size, to adjust for variance inflation due to clustering at each level. The present study used data on the assessment of clinical vitamin A deficiency and intake of vitamin A-rich food in children in a district in India. For the survey, 16 households were sampled from 200 villages nested within eight randomly-selected blocks of the district. ICCs and components of variance were estimated from a three-level hierarchical random effects analysis of variance model. Estimates of ICCs and variance components were obtained at village and block levels. Between-cluster variation was evident at each level of clustering. In these estimates, ICCs were inversely related to cluster size, but the design effect could be substantial for large clusters. At the block level, most ICC estimates were below 0.07. At the village level, many ICC estimates ranged from 0.014 to 0.45. These estimates may provide useful information for the design of epidemiological studies in which the sampled (or allocated) units range in size from households to large administrative zones.
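    The variance inflation the authors adjust for follows the usual design-effect formula, DEFF = 1 + (m - 1) * ICC, where m is the cluster size. A small worked example with a hypothetical ICC value in the range reported above:

```python
def design_effect(icc, cluster_size):
    """Variance inflation factor for cluster sampling: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1) * icc

# 16 households per village (as in the survey) and a village-level ICC of 0.05
deff = design_effect(0.05, 16)   # 1.75
effective_n = 320 / deff         # 320 sampled children act like ~183 independent ones
```

    Even a modest ICC of 0.05 inflates the required sample size by 75% at this cluster size, which is why good ICC estimates matter at the design stage.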

  7. Protein family clustering for structural genomics.

    PubMed

    Yan, Yongpan; Moult, John

    2005-10-28

    A major goal of structural genomics is the provision of a structural template for a large fraction of protein domains. The magnitude of this task depends on the number and nature of protein sequence families. With a large number of bacterial genomes now fully sequenced, it is possible to obtain improved estimates of the number and diversity of families in that kingdom. We have used an automated clustering procedure to group all sequences in a set of genomes into protein families. Benchmarking shows the clustering method is sensitive at detecting remote family members, and has a low level of false positives. This comprehensive protein family set has been used to address the following questions. (1) What is the structure coverage for currently known families? (2) How will the number of known apparent families grow as more genomes are sequenced? (3) What is a practical strategy for maximizing structure coverage in the future? Our study indicates that approximately 20% of known families with three or more members currently have a representative structure. The study indicates also that the number of apparent protein families will be considerably larger than previously thought: We estimate that, by the criteria of this work, there will be about 250,000 protein families when 1000 microbial genomes have been sequenced. However, the vast majority of these families will be small, and it will be possible to obtain structural templates for 70-80% of protein domains with an achievable number of representative structures, by systematically sampling the larger families.

  8. Improving Multidimensional Wireless Sensor Network Lifetime Using Pearson Correlation and Fractal Clustering

    PubMed Central

    Almeida, Fernando R.; Brayner, Angelo; Rodrigues, Joel J. P. C.; Maia, Jose E. Bessa

    2017-01-01

    An efficient strategy for reducing message transmission in a wireless sensor network (WSN) is to group sensors by means of an abstraction denoted cluster. The key idea behind the cluster formation process is to identify a set of sensors whose sensed values present some data correlation. Nowadays, sensors are able to simultaneously sense multiple different physical phenomena, yielding in this way multidimensional data. This paper presents three methods for clustering sensors in WSNs whose sensors collect multidimensional data. The proposed approaches implement the concept of multidimensional behavioral clustering. To show the benefits introduced by the proposed methods, a prototype has been implemented and experiments have been carried out on real data. The results prove that the proposed methods decrease the amount of data flowing in the network and present low root-mean-square error (RMSE). PMID:28590450
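    One way to realize the correlation-based grouping idea is a greedy pass over the Pearson correlation matrix of the sensors' reading histories. This is an illustrative sketch only, not one of the paper's three methods (and not its fractal-clustering variant); the threshold and the synthetic readings are assumptions.

```python
import numpy as np

def correlated_groups(readings, rho_min=0.9):
    """Greedily group sensors whose pairwise Pearson correlation exceeds rho_min."""
    corr = np.corrcoef(readings)          # rows = sensors, columns = time steps
    unassigned, groups = set(range(corr.shape[0])), []
    while unassigned:
        seed = unassigned.pop()
        group = [seed] + [j for j in list(unassigned) if corr[seed, j] >= rho_min]
        unassigned -= set(group)
        groups.append(sorted(group))
    return groups

rng = np.random.default_rng(0)
base = rng.normal(size=100)               # shared physical phenomenon
readings = np.vstack([base + rng.normal(0, 0.05, 100) for _ in range(3)]
                     + [rng.normal(size=100) for _ in range(2)])
groups = correlated_groups(readings)      # sensors 0-2 group together; 3 and 4 stay apart
```

    Once such a group exists, only one representative (or an aggregate) per group needs to transmit, which is the message-reduction mechanism the abstract describes.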

  9. The potential for improving remote primary productivity estimates through subsurface chlorophyll and irradiance measurement

    NASA Astrophysics Data System (ADS)

    Jacox, Michael G.; Edwards, Christopher A.; Kahru, Mati; Rudnick, Daniel L.; Kudela, Raphael M.

    2015-02-01

    A 26-year record of depth integrated primary productivity (PP) in the Southern California Current System (SCCS) is analyzed with the goal of improving satellite net primary productivity (PP) estimates. Modest improvements in PP model performance are achieved by tuning existing algorithms for the SCCS, particularly by parameterizing carbon fixation rate in the vertically generalized production model as a function of surface chlorophyll concentration and distance from shore. Much larger improvements are enabled by improving the accuracy of subsurface chlorophyll and light profiles. In a simple vertically resolved production model for the SCCS (VRPM-SC), substitution of in situ surface data for remote sensing estimates offers only marginal improvements in model r2 (from 0.54 to 0.56) and total log10 root mean squared difference (from 0.22 to 0.21), while inclusion of in situ chlorophyll and light profiles improves these metrics to 0.77 and 0.15, respectively. Autonomous underwater gliders, capable of measuring subsurface properties on long-term, long-range deployments, significantly improve PP model fidelity in the SCCS. We suggest their use (and that of other autonomous profilers such as Argo floats) in conjunction with satellites as a way forward for large-scale improvements in PP estimation.

  10. Hubble Frontier Fields: systematic errors in strong lensing models of galaxy clusters - implications for cosmography

    NASA Astrophysics Data System (ADS)

    Acebron, Ana; Jullo, Eric; Limousin, Marceau; Tilquin, André; Giocoli, Carlo; Jauzac, Mathilde; Mahler, Guillaume; Richard, Johan

    2017-09-01

    Strong gravitational lensing by galaxy clusters is a fundamental tool to study dark matter and constrain the geometry of the Universe. Recently, the Hubble Space Telescope Frontier Fields programme has allowed a significant improvement of mass and magnification measurements, but lensing models still have a residual root mean square between 0.2 arcsec and a few arcseconds that is not yet completely understood. Systematic errors have to be better understood and treated in order to use strong lensing clusters as reliable cosmological probes. We have analysed two simulated Hubble-Frontier-Fields-like clusters from the Hubble Frontier Fields Comparison Challenge, Ares and Hera. We use several estimators (relative bias on magnification, density profiles, ellipticity and orientation) to quantify the goodness of our reconstructions by comparing our multiple models, optimized with the parametric software lenstool, with the input models. We have quantified the impact of systematic errors arising, first, from the choice of different density profiles and configurations and, secondly, from the availability of constraints (spectroscopic or photometric redshifts, redshift ranges of the background sources) in the parametric modelling of strong lensing galaxy clusters and therefore on the retrieval of cosmological parameters. We find that substructures in the outskirts have a significant impact on the position of the multiple images, yielding tighter cosmological contours. The need for wide-field imaging around massive clusters is thus reinforced. We show that competitive cosmological constraints can be obtained also with complex multimodal clusters and that photometric redshifts improve the constraints on cosmological parameters when considering a narrow range of (spectroscopic) redshifts for the sources.

  11. Clustering and Filtering Tandem Mass Spectra Acquired in Data-Independent Mode

    NASA Astrophysics Data System (ADS)

    Pak, Huisong; Nikitin, Frederic; Gluck, Florent; Lisacek, Frederique; Scherl, Alexander; Muller, Markus

    2013-12-01

    Data-independent mass spectrometry activates all ion species isolated within a given mass-to-charge window (m/z) regardless of their abundance. This acquisition strategy overcomes the traditional data-dependent ion selection, boosting data reproducibility and sensitivity. However, several tandem mass (MS/MS) spectra of the same precursor ion are acquired during chromatographic elution, resulting in large data redundancy. Also, the significant number of chimeric spectra and the absence of accurate precursor ion masses hamper peptide identification. Here, we describe an algorithm to preprocess data-independent MS/MS spectra by filtering out noise peaks and clustering the spectra according to both the chromatographic elution profiles and the spectral similarity. In addition, we developed an approach to estimate the m/z value of precursor ions from clustered MS/MS spectra in order to improve database search performance. Data acquired using a small precursor mass window of 3 m/z units and multiple injections to cover an m/z range of 400-1400 were processed with our algorithm. It showed an improvement in the number of both peptide and protein identifications by 8% while reducing the number of submitted spectra by 18% and the number of peaks by 55%. We conclude that our clustering method is a valid approach for data analysis of these data-independent fragmentation spectra. The software including the source code is available for the scientific community.

  12. MASSCLEANCOLORS-MASS-DEPENDENT INTEGRATED COLORS FOR STELLAR CLUSTERS DERIVED FROM 30 MILLION MONTE CARLO SIMULATIONS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Popescu, Bogdan; Hanson, M. M.

    2010-04-10

    We present Monte Carlo models of open stellar clusters with the purpose of mapping out the behavior of integrated colors with mass and age. Our cluster simulation package allows for stochastic variations in the stellar mass function to evaluate variations in integrated cluster properties. We find that UBVK colors from our simulations are consistent with simple stellar population (SSP) models, provided the cluster mass is large, M_cluster >= 10^6 M_sun. Below this mass, our simulations show two significant effects. First, the mean value of the distribution of integrated colors moves away from the SSP predictions and is less red, in the first 10^7 to 10^8 years in UBV colors, and for all ages in (V - K). Second, the 1-sigma dispersion of observed colors increases significantly with lower cluster mass. We attribute the former to the reduced number of red luminous stars in most of the lower mass clusters and the latter to the increased stochastic effect of a few of these stars on lower mass clusters. This latter point was always assumed to occur, but we now provide the first public code able to quantify this effect. We are completing a more extensive database of magnitudes and colors as a function of stellar cluster age and mass that will allow the determination of the correlation coefficients among different bands, and improve estimates of cluster age and mass from integrated photometry.

  13. Intra-cluster Globular Clusters in a Simulated Galaxy Cluster

    NASA Astrophysics Data System (ADS)

    Ramos-Almendares, Felipe; Abadi, Mario; Muriel, Hernán; Coenda, Valeria

    2018-01-01

    Using a cosmological dark matter simulation of a galaxy-cluster halo, we follow the temporal evolution of its globular cluster population. To mimic the red and blue globular cluster populations, we select at high redshift (z∼ 1) two sets of particles from individual galactic halos constrained by the fact that, at redshift z = 0, they have density profiles similar to observed ones. At redshift z = 0, approximately 60% of our selected globular clusters were removed from their original halos building up the intra-cluster globular cluster population, while the remaining 40% are still gravitationally bound to their original galactic halos. As the blue population is more extended than the red one, the intra-cluster globular cluster population is dominated by blue globular clusters, with a relative fraction that grows from 60% at redshift z = 0 up to 83% for redshift z∼ 2. In agreement with observational results for the Virgo galaxy cluster, the blue intra-cluster globular cluster population is more spatially extended than the red one, pointing to a tidally disrupted origin.

  14. Diffuse radio emission in the complex merging galaxy cluster Abell 2069

    NASA Astrophysics Data System (ADS)

    Drabent, A.; Hoeft, M.; Pizzo, R. F.; Bonafede, A.; van Weeren, R. J.; Klein, U.

    2015-03-01

    Context. Galaxy clusters with signs of a recent merger in many cases show extended diffuse radio features. This emission originates from relativistic electrons that suffer synchrotron losses due to the intracluster magnetic field. The mechanisms of particle acceleration and the properties of the magnetic field are still poorly understood. Aims: We search for diffuse radio emission in galaxy clusters. Here, we study the complex galaxy cluster Abell 2069, for which X-ray observations indicate a recent merger. Methods: We investigate the cluster's radio continuum emission by deep Westerbork Synthesis Radio Telescope (WSRT) observations at 346 MHz and Giant Metrewave Radio Telescope (GMRT) observations at 322 MHz. Results: We find an extended diffuse radio feature roughly coinciding with the main component of the cluster. We classify this emission as a radio halo and estimate its lower limit flux density at 25 ± 9 mJy. Moreover, we find a second extended diffuse source located at the cluster's companion and estimate its flux density at 15 ± 2 mJy. We speculate that this is a small halo or a mini-halo. If true, this cluster is the first example of a double-halo in a single galaxy cluster.

  15. High Performance Computing Based Parallel Hierarchical Modal Association Clustering (HPAR HMAC)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Patlolla, Dilip R; Surendran Nair, Sujithkumar; Graves, Daniel A.

    For many applications, clustering is a crucial step in order to gain insight into the makeup of a dataset. The best approach to a given problem often depends on a variety of factors, such as the size of the dataset, time restrictions, and soft clustering requirements. The HMAC algorithm seeks to combine the strengths of two particular clustering approaches: model-based and linkage-based clustering. One particular weakness of HMAC is its computational complexity: HMAC is not practical for mega-scale data clustering. For high-definition imagery, a user would have to wait months or years for a result; for a 16-megapixel image, the estimated runtime skyrockets to over a decade! To improve the execution time of HMAC, it is reasonable to consider a multi-core implementation that utilizes available system resources. An existing implementation (Ray and Cheng 2014) divides the dataset into N partitions, one for each thread, prior to executing the HMAC algorithm. This implementation benefits from two types of optimization: parallelization and divide-and-conquer. By running each partition in parallel, the program is able to accelerate computation by utilizing more system resources. Although the parallel implementation provides considerable improvement over the serial HMAC, it still suffers from poor computational complexity, O(N^2). Once the maximum number of cores on a system is exhausted, the program exhibits slower behavior. We now consider a modification to HMAC that involves a recursive partitioning scheme. Our modification aims to exploit the divide-and-conquer benefits seen in the parallel HMAC implementation. At each level in the recursion tree, partitions are divided into two sub-partitions until a threshold size is reached. When a partition can no longer be divided without falling below the threshold size, the base HMAC algorithm is applied. This results in a significant speedup over the parallel HMAC.
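    The recursive partitioning scheme can be sketched schematically as follows. This toy version splits a plain list by index order; the real modification would partition the data in feature space and apply the base HMAC routine to each leaf.

```python
def recursive_partition(data, threshold, base_cluster):
    """Halve the data recursively until each partition fits the threshold,
    then run the base clustering routine on every leaf partition."""
    if len(data) <= threshold:
        return [base_cluster(data)]
    mid = len(data) // 2
    return (recursive_partition(data[:mid], threshold, base_cluster)
            + recursive_partition(data[mid:], threshold, base_cluster))

# identity "clusterer" just to show the resulting partition structure
leaves = recursive_partition(list(range(10)), 3, lambda part: part)
# -> [[0, 1], [2, 3, 4], [5, 6], [7, 8, 9]]
```

    Because the quadratic base algorithm only ever sees partitions of bounded size, total work grows roughly linearly in the number of leaves rather than quadratically in N.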

  16. A revised moving cluster distance to the Pleiades open cluster

    NASA Astrophysics Data System (ADS)

    Galli, P. A. B.; Moraux, E.; Bouy, H.; Bouvier, J.; Olivares, J.; Teixeira, R.

    2017-02-01

    Context. The distance to the Pleiades open cluster has been extensively debated in the literature over several decades. Although different methods point to a discrepancy in the trigonometric parallaxes produced by the Hipparcos mission, the number of individual stars with known distances is still small compared to the number of cluster members to help solve this problem. Aims: We provide a new distance estimate for the Pleiades based on the moving cluster method, which will be useful to further discuss the so-called Pleiades distance controversy and compare it with the very precise parallaxes from the Gaia space mission. Methods: We apply a refurbished implementation of the convergent point search method to an updated census of Pleiades stars to calculate the convergent point position of the cluster from stellar proper motions. Then, we derive individual parallaxes for 64 cluster members using radial velocities compiled from the literature, and approximate parallaxes for another 1146 stars based on the spatial velocity of the cluster. This represents the largest sample of Pleiades stars with individual distances to date. Results: The parallaxes derived in this work are in good agreement with previous results obtained in different studies (excluding Hipparcos) for individual stars in the cluster. We report a mean parallax of 7.44 ± 0.08 mas and distance of pc that is consistent with the weighted mean of 135.0 ± 0.6 pc obtained from the non-Hipparcos results in the literature. Conclusions: Our result for the distance to the Pleiades open cluster is not consistent with the Hipparcos catalog, but favors the recent and more precise distance determination of 136.2 ± 1.2 pc obtained from Very Long Baseline Interferometry observations. It is also in good agreement with the mean distance of 133 ± 5 pc obtained from the first trigonometric parallaxes delivered by the Gaia satellite for the brightest cluster members in common with our sample. Full Table B.2 is only
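    For reference, the parallax-to-distance conversion behind these numbers is the simple inversion d [pc] = 1000 / p [mas]. That inversion is adequate at this precision, though it ignores the bias corrections needed for low signal-to-noise parallaxes.

```python
def parallax_to_distance_pc(parallax_mas):
    """Distance in parsecs from a parallax in milliarcseconds."""
    return 1000.0 / parallax_mas

d = parallax_to_distance_pc(7.44)   # ~134.4 pc for the reported mean parallax
```

    Inverting the quoted mean parallax of 7.44 mas lands between the non-Hipparcos literature value (135.0 pc) and the VLBI determination (136.2 pc), consistent with the authors' conclusion.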

  17. Improved Range Estimation Model for Three-Dimensional (3D) Range Gated Reconstruction

    PubMed Central

    Chua, Sing Yee; Guo, Ningqun; Tan, Ching Seong; Wang, Xin

    2017-01-01

Accuracy is an important measure of system performance and remains a challenge in 3D range gated reconstruction despite advances in laser and sensor technology. The weighted average model that is commonly used for range estimation is heavily influenced by intensity variation due to various factors. Accuracy improvement in terms of range estimation is therefore important to fully optimise the system performance. In this paper, a 3D range gated reconstruction model is derived based on the operating principles of range gated imaging and time slicing reconstruction, the fundamentals of radiant energy, Laser Detection And Ranging (LADAR), and the Bidirectional Reflection Distribution Function (BRDF). Accordingly, a new range estimation model is proposed to alleviate the effects induced by distance, target reflection, and range distortion. The experimental results show that the proposed model outperforms the conventional weighted average model, improving range estimation for better 3D reconstruction. The outcome is of interest to various laser ranging applications and can serve as a reference for future work. PMID:28872589
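    The conventional weighted-average model that the paper sets out to improve is simple to state: each range gate's delay is weighted by the intensity recorded in that time slice, and the weighted mean round-trip delay is converted to distance. A minimal sketch (the gate delays and intensities are hypothetical):

```python
import numpy as np

C_LIGHT = 3.0e8  # speed of light, m/s

def weighted_average_range(gate_delays_s, intensities):
    """Conventional weighted-average range estimate: each gate delay
    is weighted by the intensity recorded in that time slice, and the
    mean round-trip delay is converted to distance (factor c/2)."""
    t = np.asarray(gate_delays_s, dtype=float)
    w = np.asarray(intensities, dtype=float)
    return C_LIGHT * np.sum(w * t) / np.sum(w) / 2.0

# Hypothetical pixel: three gates straddling a target ~150 m away
print(weighted_average_range([0.9e-6, 1.0e-6, 1.1e-6], [0.2, 1.0, 0.2]))
```

    Because the estimate is a pure intensity-weighted mean, any intensity variation not caused by target position (reflectance, range falloff, distortion) biases it, which is the effect the proposed model corrects for.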

  18. Impact of Sampling Density on the Extent of HIV Clustering

    PubMed Central

    Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor

    2014-01-01

    Abstract Identifying and monitoring HIV clusters could be useful in tracking the leading edge of HIV transmission in epidemics. Currently, greater specificity in the definition of HIV clusters is needed to reduce confusion in the interpretation of HIV clustering results. We address sampling density as one of the key aspects of HIV cluster analysis. The proportion of viral sequences in clusters was estimated at sampling densities from 1.0% to 70%. A set of 1,248 HIV-1C env gp120 V1C5 sequences from a single community in Botswana was utilized in simulation studies. Matching numbers of HIV-1C V1C5 sequences from the LANL HIV Database were used as comparators. HIV clusters were identified by phylogenetic inference under bootstrapped maximum likelihood and pairwise distance cut-offs. Sampling density below 10% was associated with stochastic HIV clustering with broad confidence intervals. HIV clustering increased linearly at sampling density >10%, and was accompanied by narrowing confidence intervals. Patterns of HIV clustering were similar at bootstrap thresholds 0.7 to 1.0, but the extent of HIV clustering decreased with higher bootstrap thresholds. The origin of sampling (local concentrated vs. scattered global) had a substantial impact on HIV clustering at sampling densities ≥10%. Pairwise distances at 10% were estimated as a threshold for cluster analysis of HIV-1 V1C5 sequences. The node bootstrap support distribution provided additional evidence for 10% sampling density as the threshold for HIV cluster analysis. The detectability of HIV clusters is substantially affected by sampling density. A minimal genotyping density of 10% and sampling density of 50–70% are suggested for HIV-1 V1C5 cluster analysis. PMID:25275430

  19. Robust nonparametric quantification of clustering density of molecules in single-molecule localization microscopy

    PubMed Central

    Jiang, Shenghang; Park, Seongjin; Challapalli, Sai Divya; Fei, Jingyi; Wang, Yong

    2017-01-01

    We report a robust nonparametric descriptor, J′(r), for quantifying the density of clustering molecules in single-molecule localization microscopy. J′(r), based on nearest neighbor distribution functions, does not require any parameter as an input for analyzing point patterns. We show that J′(r) displays a valley shape in the presence of clusters of molecules, and the characteristics of the valley reliably report the clustering features in the data. Most importantly, the position of the J′(r) valley (rJm′) depends exclusively on the density of clustering molecules (ρc). Therefore, it is ideal for direct estimation of the clustering density of molecules in single-molecule localization microscopy. As an example, this descriptor was applied to estimate the clustering density of ptsG mRNA in E. coli bacteria. PMID:28636661
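    J′(r) is built on nearest-neighbour distribution functions; the classical J function of spatial statistics, J(r) = (1 − G(r)) / (1 − F(r)), is the natural reference point. A sketch of that classical estimator (the published J′(r) modifies it, so this is background rather than the paper's exact descriptor):

```python
import numpy as np

def nn_distances(points, queries):
    """Distance from each query location to the nearest point,
    ignoring exact zero (self) distances."""
    d = np.linalg.norm(queries[:, None, :] - points[None, :, :], axis=-1)
    d[d == 0] = np.inf
    return d.min(axis=1)

def J_function(points, r_grid, n_ref=2000, rng=None):
    """Classical J(r) = (1 - G(r)) / (1 - F(r)), where G is the CDF of
    nearest-neighbour distances between the events and F is the
    empty-space function (distance from uniform reference locations to
    the nearest event). J(r) ~ 1 indicates complete spatial
    randomness; a valley below 1 indicates clustering."""
    rng = np.random.default_rng(rng)
    g = nn_distances(points, points)
    ref = rng.uniform(points.min(0), points.max(0), size=(n_ref, 2))
    f = nn_distances(points, ref)
    G = np.array([(g <= r).mean() for r in r_grid])
    F = np.array([(f <= r).mean() for r in r_grid])
    return (1.0 - G) / np.clip(1.0 - F, 1e-12, None)
```

    For clustered localizations, nearest-neighbour distances are short (G rises quickly) while empty space is abundant (F rises slowly), so J(r) dips below 1, which is the valley shape the abstract describes.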

  20. Sample size determination for GEE analyses of stepped wedge cluster randomized trials.

    PubMed

    Li, Fan; Turner, Elizabeth L; Preisser, John S

    2018-06-19

    In stepped wedge cluster randomized trials, intact clusters of individuals switch from control to intervention from a randomly assigned period onwards. Such trials are becoming increasingly popular in health services research. When a closed cohort is recruited from each cluster for longitudinal follow-up, proper sample size calculation should account for three distinct types of intraclass correlations: the within-period, the inter-period, and the within-individual correlations. Setting the latter two correlation parameters to be equal accommodates cross-sectional designs. We propose sample size procedures for continuous and binary responses within the framework of generalized estimating equations that employ a block exchangeable within-cluster correlation structure defined from the distinct correlation types. For continuous responses, we show that the intraclass correlations affect power only through two eigenvalues of the correlation matrix. We demonstrate that analytical power agrees well with simulated power for as few as eight clusters, when data are analyzed using bias-corrected estimating equations for the correlation parameters concurrently with a bias-corrected sandwich variance estimator. © 2018, The International Biometric Society.
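    The block-exchangeable within-cluster correlation structure described above can be written compactly with Kronecker products. A sketch with hypothetical correlation values (the paper's power formulas are not reproduced here, only the small set of distinct eigenvalues they depend on):

```python
import numpy as np

def block_exchangeable(n_ind, n_per, a0, a1, a2):
    """Correlation matrix for a closed cohort in one cluster, indexed
    by (individual, period):
      a0 within-period     (different individuals, same period)
      a1 inter-period      (different individuals, different periods)
      a2 within-individual (same individual, different periods).
    Setting a2 == a1 recovers the cross-sectional design."""
    I_T, J_T = np.eye(n_per), np.ones((n_per, n_per))
    I_N, J_N = np.eye(n_ind), np.ones((n_ind, n_ind))
    A = (1 - a2) * I_T + a2 * J_T   # same-individual block
    B = (a0 - a1) * I_T + a1 * J_T  # cross-individual block
    return np.kron(I_N, A) + np.kron(J_N - I_N, B)

# Hypothetical values: the spectrum collapses to a handful of
# distinct eigenvalues, which is what the power computation exploits.
R = block_exchangeable(n_ind=5, n_per=4, a0=0.05, a1=0.02, a2=0.4)
print(sorted(set(np.round(np.linalg.eigvalsh(R), 6))))
```

    The exchangeable blocks make the matrix highly structured, so its spectrum has at most four distinct values regardless of cohort size; the paper shows that, for continuous responses, power depends on only two of them.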

  1. DAFi: A directed recursive data filtering and clustering approach for improving and interpreting data clustering identification of cell populations from polychromatic flow cytometry data.

    PubMed

    Lee, Alexandra J; Chang, Ivan; Burel, Julie G; Lindestam Arlehamn, Cecilia S; Mandava, Aishwarya; Weiskopf, Daniela; Peters, Bjoern; Sette, Alessandro; Scheuermann, Richard H; Qian, Yu

    2018-04-17

Computational methods for identification of cell populations from polychromatic flow cytometry data are changing the paradigm of cytometry bioinformatics. Data clustering is the most common computational approach to unsupervised identification of cell populations from multidimensional cytometry data. However, interpretation of the identified data clusters is labor-intensive. Certain types of user-defined cell populations are also difficult to identify by fully automated data clustering analysis. Both are roadblocks to a cytometry lab adopting the data clustering approach for routine cell population identification. We found that combining recursive data filtering and clustering with constraints converted from the user's manual gating strategy can effectively address these two issues. We named this new approach DAFi: Directed Automated Filtering and Identification of cell populations. The design of DAFi preserves the data-driven characteristics of unsupervised clustering for identifying novel cell subsets, but also makes the results interpretable to experimental scientists by mapping and merging the multidimensional data clusters into the user-defined two-dimensional gating hierarchy. The recursive data filtering process in DAFi helps identify small data clusters that are otherwise difficult to resolve by a single run of the data clustering method due to the statistical interference of the irrelevant major clusters. Our experimental results showed that the proportions of the cell populations identified by DAFi, while being consistent with those from expert centralized manual gating, have smaller technical variances across samples than those from individual manual gating analysis and nonrecursive data clustering analysis. Compared with manual gating segregation, DAFi-identified cell populations avoid abrupt cut-offs at the boundaries. DAFi has been implemented to be used with multiple data clustering methods including K-means, FLOCK, FlowSOM, and

  2. Mesenchymal Stem Cells and Their Conditioned Medium Improve Integration of Purified Induced Pluripotent Stem Cell–Derived Cardiomyocyte Clusters into Myocardial Tissue

    PubMed Central

    Rubach, Martin; Adelmann, Roland; Haustein, Moritz; Drey, Florian; Pfannkuche, Kurt; Xiao, Bing; Koester, Annette; Udink ten Cate, Floris E.A.; Choi, Yeong-Hoon; Neef, Klaus; Fatima, Azra; Hannes, Tobias; Pillekamp, Frank; Hescheler, Juergen; Šarić, Tomo; Brockmeier, Konrad

    2014-01-01

    Induced pluripotent stem cell–derived cardiomyocytes (iPS-CMs) might become therapeutically relevant to regenerate myocardial damage. Purified iPS-CMs exhibit poor functional integration into myocardial tissue. The aim of this study was to investigate whether murine mesenchymal stem cells (MSCs) or their conditioned medium (MScond) improves the integration of murine iPS-CMs into myocardial tissue. Vital or nonvital embryonic murine ventricular tissue slices were cocultured with purified clusters of iPS-CMs in combination with murine embryonic fibroblasts (MEFs), MSCs, or MScond. Morphological integration was assessed by visual scoring and functional integration by isometric force and field potential measurements. We observed a moderate morphological integration of iPS-CM clusters into vital, but a poor integration into nonvital, slices. MEFs and MSCs but not MScond improved morphological integration of CMs into nonvital slices and enabled purified iPS-CMs to confer force. Coculture of vital slices with iPS-CMs and MEFs or MSCs resulted in an improved electrical integration. A comparable improvement of electrical coupling was achieved with the cell-free MScond, indicating that soluble factors secreted by MSCs were involved in electrical coupling. We conclude that cells such as MSCs support the engraftment and adhesion of CMs, and confer force to noncontractile tissue. Furthermore, soluble factors secreted by MSCs mediate electrical coupling of purified iPS-CM clusters to myocardial tissue. These data suggest that MSCs may increase the functional engraftment and therapeutic efficacy of transplanted iPS-CMs into infarcted myocardium. PMID:24219308

  3. Seizure clusters: characteristics and treatment.

    PubMed

    Haut, Sheryl R

    2015-04-01

Many patients with epilepsy experience 'clusters' or flurries of seizures, also termed acute repetitive seizures (ARS). Seizure clustering has a significant impact on health and quality of life. This review summarizes recent advances in the definition and neurophysiologic understanding of clustering, the epidemiology and risk factors for clustering, and both inpatient and outpatient clinical implications. New treatments for seizure clustering/ARS are perhaps the area of greatest recent progress. Efforts have focused on creating a uniform definition of a seizure cluster. In neurophysiologic studies of refractory epilepsy, seizures within a cluster appear to be self-triggering. Clinical progress has been achieved towards a more precise prevalence of clustering, and consensus guidelines for epilepsy monitoring unit safety. The greatest recent advances are in the study of nonintravenous routes of benzodiazepines as rescue medications for seizure clusters/ARS. Rectal benzodiazepines have been very effective, but barriers to use exist. New data on buccal, intramuscular and intranasal preparations are anticipated to lead to a greater number of approved treatments. Progesterone may be effective for women who experience catamenial clusters. Seizure clustering is common, particularly in the setting of medically refractory epilepsy. Clustering worsens health and quality of life, and the field requires greater focus on clarifying the definition and clinical implications of clustering. Progress towards the development of nonintravenous routes of benzodiazepines has the potential to improve care in this area.

  4. Integrating K-means Clustering with Kernel Density Estimation for the Development of a Conditional Weather Generation Downscaling Model

    NASA Astrophysics Data System (ADS)

    Chen, Y.; Ho, C.; Chang, L.

    2011-12-01

In recent decades, climate change driven by global warming has increased the frequency of extreme hydrological events. Water supply shortages caused by extreme events create great challenges for water resource management. To evaluate future climate variations, general circulation models (GCMs) are the most widely used tools, showing possible weather conditions under the pre-defined CO2 emission scenarios announced by the IPCC. Because the study area of GCMs is the entire earth, the grid sizes of GCMs are much larger than the basin scale. To overcome this gap, a statistical downscaling technique can transform the regional scale weather factors into basin scale precipitations. Statistical downscaling techniques can be divided into three categories: transfer functions, weather generators and weather typing. The first two categories describe the relationships between the weather factors and precipitations based on, respectively, deterministic algorithms, such as linear or nonlinear regression and ANNs, and stochastic approaches, such as Markov chain theory and statistical distributions. The weather typing approach clusters the weather factors, which are high-dimensional and continuous variables, into weather types, which are a limited number of discrete states. In this study, the proposed downscaling model integrates weather typing, using the K-means clustering algorithm, and a weather generator, using kernel density estimation. The study area is the Shihmen basin in northern Taiwan. The research process contains two steps, a calibration step and a synthesis step. Three sub-steps were used in the calibration step. First, weather factors, such as pressures, humidities and wind speeds, obtained from NCEP, and the precipitations observed at rainfall stations were collected for downscaling. Second, K-means clustering grouped the weather factors into four weather types. Third, the Markov chain transition matrices and the
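    The two components being integrated can be sketched end to end: k-means to assign a day's weather-factor vector to a weather type, then precipitation sampled from a Gaussian kernel density estimate of that type's historical record. All data, the bandwidth, and the helper names below are illustrative, not the paper's calibration:

```python
import numpy as np

def kmeans(X, k, iters=50, rng=None):
    """Plain Lloyd's algorithm: returns (centroids, labels)."""
    rng = np.random.default_rng(rng)
    C = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
        C = np.array([X[labels == j].mean(0) if np.any(labels == j) else C[j]
                      for j in range(k)])
    return C, labels

def sample_precip(weather_vec, centroids, precip_by_type, h=0.5, rng=None):
    """Conditional generation: classify the day's weather-factor vector
    into its nearest weather type, then draw a precipitation value from
    a Gaussian kernel density estimate fitted to that type's record.
    Sampling from a Gaussian KDE = resample one historical value and
    add N(0, h^2) kernel noise; negative draws are clipped to zero."""
    rng = np.random.default_rng(rng)
    t = int(np.argmin(((centroids - weather_vec) ** 2).sum(-1)))
    return max(0.0, rng.choice(precip_by_type[t]) + rng.normal(0.0, h))
```

    Conditioning the generator on the discrete weather type is what makes the scheme a "conditional weather generation" model: the KDE captures each type's precipitation distribution nonparametrically.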

  5. Iterative Track Fitting Using Cluster Classification in Multi Wire Proportional Chamber

    NASA Astrophysics Data System (ADS)

    Primor, David; Mikenberg, Giora; Etzion, Erez; Messer, Hagit

    2007-10-01

This paper addresses the problem of track fitting of a charged particle in a multi wire proportional chamber (MWPC) using cathode readout strips. When a charged particle crosses a MWPC, a positive charge is induced on a cluster of adjacent strips. In the presence of high radiation background, the cluster charge measurements may be contaminated due to background particles, leading to less accurate hit position estimation. The least squares method for track fitting assumes the same position error distribution for all hits and thus loses its optimal properties on contaminated data. For this reason, a new robust algorithm is proposed. The algorithm first uses the known spatial charge distribution caused by a single charged particle over the strips, and classifies the clusters into "clean" and "dirty" clusters. Then, using the classification results, it performs an iterative weighted least squares fitting procedure, updating its optimal weights each iteration. The performance of the suggested algorithm is compared to other track fitting techniques using a simulation of tracks with radiation background. It is shown that the algorithm improves the track fitting performance significantly. A practical implementation of the algorithm is presented for muon track fitting in the cathode strip chamber (CSC) of the ATLAS experiment.
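    The iterative weighted least-squares step can be sketched generically for a straight track. The residual-based weights below stand in for the paper's classification-derived weights, which come from the cathode-strip charge shape rather than from residuals:

```python
import numpy as np

def irls_track_fit(z, y, iters=10, eps=1e-6):
    """Fit a straight track y = a*z + b by iterative weighted least
    squares. Starting from an ordinary least-squares fit, hits with
    large residuals are down-weighted (w = 1 / (r^2 + eps)) and the
    fit is repeated, so contaminated hits lose influence."""
    A = np.column_stack([z, np.ones_like(z)])
    w = np.ones_like(y, dtype=float)
    coef = np.zeros(2)
    for _ in range(iters):
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        r = y - A @ coef
        w = 1.0 / (r ** 2 + eps)
    return coef  # (slope, intercept)
```

    With one background-contaminated hit, the reweighting quickly drives the fit back to the line through the clean hits, which plain least squares cannot do.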

  6. Improving the quality of parameter estimates obtained from slug tests

    USGS Publications Warehouse

    Butler, J.J.; McElwee, C.D.; Liu, W.

    1996-01-01

The slug test is one of the most commonly used field methods for obtaining in situ estimates of hydraulic conductivity. Despite its prevalence, this method has received criticism from many quarters in the ground-water community. This criticism emphasizes the poor quality of the estimated parameters, a condition that is primarily a product of the somewhat casual approach that is often employed in slug tests. Recently, the Kansas Geological Survey (KGS) has pursued research directed at improving methods for the performance and analysis of slug tests. Based on extensive theoretical and field research, a series of guidelines have been proposed that should enable the quality of parameter estimates to be improved. The most significant of these guidelines are: (1) three or more slug tests should be performed at each well during a given test period; (2) two or more different initial displacements (Ho) should be used at each well during a test period; (3) the method used to initiate a test should enable the slug to be introduced in a near-instantaneous manner and should allow a good estimate of Ho to be obtained; (4) data-acquisition equipment that enables a large quantity of high quality data to be collected should be employed; (5) if an estimate of the storage parameter is needed, an observation well other than the test well should be employed; (6) the method chosen for analysis of the slug-test data should be appropriate for site conditions; (7) use of pre- and post-analysis plots should be an integral component of the analysis procedure; and (8) appropriate well construction parameters should be employed. Data from slug tests performed at a number of KGS field sites demonstrate the importance of these guidelines.

  7. Intrinsic alignment of redMaPPer clusters: cluster shape-matter density correlation

    NASA Astrophysics Data System (ADS)

    van Uitert, Edo; Joachimi, Benjamin

    2017-07-01

We measure the alignment of the shapes of galaxy clusters, as traced by their satellite distributions, with the matter density field using the public redMaPPer catalogue based on Sloan Digital Sky Survey-Data Release 8 (SDSS-DR8), which contains 26 111 clusters up to z ~ 0.6. The clusters are split into nine redshift and richness samples; in each of them, we detect a positive alignment, showing that clusters point towards density peaks. We interpret the measurements within the tidal alignment paradigm, allowing for a richness and redshift dependence. The intrinsic alignment (IA) amplitude at the pivot redshift z = 0.3 and pivot richness λ = 30 is A_IA^gen = 12.6 (+1.5/−1.2). We obtain tentative evidence that the signal increases towards higher richness and lower redshift. Our measurements agree well with results of maxBCG clusters and with dark-matter-only simulations. Comparing our results to the IA measurements of luminous red galaxies, we find that the IA amplitude of galaxy clusters forms a smooth extension towards higher mass. This suggests that these systems share a common alignment mechanism, which can be exploited to improve our physical understanding of IA.

  8. When is Constrained Clustering Beneficial, and Why?

    NASA Technical Reports Server (NTRS)

    Wagstaff, Kiri L.; Basu, Sugato; Davidson, Ian

    2006-01-01

    Several researchers have shown that constraints can improve the results of a variety of clustering algorithms. However, there can be a large variation in this improvement, even for a fixed number of constraints for a given data set. We present the first attempt to provide insight into this phenomenon by characterizing two constraint set properties: informativeness and coherence. We show that these measures can help explain why some constraint sets are more beneficial to clustering algorithms than others. Since they can be computed prior to clustering, these measures can aid in deciding which constraints to use in practice.
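    Of the two properties, informativeness is the easier to sketch: it can be measured as the fraction of constraints that the algorithm's unconstrained output fails to satisfy (a paraphrase of the idea; the paper's exact formulation may differ):

```python
def informativeness(constraints, labels):
    """Fraction of (i, j, must_link) constraints violated by an
    unconstrained clustering's labels. A violated constraint carries
    information the algorithm could not infer on its own; satisfied
    constraints add little beyond what the algorithm already found."""
    violated = sum((labels[i] == labels[j]) != must_link
                   for i, j, must_link in constraints)
    return violated / len(constraints)

# Unconstrained output grouped {0, 1} and {2, 3}
labels = [0, 0, 1, 1]
constraints = [(0, 1, True), (0, 2, True), (1, 3, False), (0, 3, False)]
print(informativeness(constraints, labels))  # 0.25
```

    Because the measure needs only the constraint set and a baseline clustering, it can be computed before any constrained clustering run, which is what makes it useful for deciding which constraints to use.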

  9. Interacting star clusters in the Large Magellanic Cloud. Overmerging problem solved by cluster group formation

    NASA Astrophysics Data System (ADS)

    Leon, Stéphane; Bergond, Gilles; Vallenari, Antonella

    1999-04-01

We present the tidal tail distributions of a sample of candidate binary clusters located in the bar of the Large Magellanic Cloud (LMC). One isolated cluster, SL 268, is presented in order to study the effect of the LMC tidal field. All the candidate binary clusters show tidal tails, confirming that the pairs are formed by physically linked objects. The stellar mass in the tails covers a large range, from 1.8×10³ to 3×10⁴ M⊙. We derive a total mass estimate for SL 268 and SL 356. At large radii, the projected density profiles of SL 268 and SL 356 fall off as r^(−γ), with γ = 2.27 and γ = 3.44, respectively. Out of 4 pairs or multiple systems, 2 are older than the theoretical survival time of binary clusters (from a few 10⁶ years to 10⁸ years). One pair shows too large an age difference between the components to be consistent with classical theoretical models of binary cluster formation (Fujimoto & Kumai 1997). We refer to this as the "overmerging" problem. A different scenario is proposed: the formation proceeds in large molecular complexes giving birth to groups of clusters over a few 10⁷ years. In these groups the expected cluster encounter rate is larger, and tidal capture has a higher probability. Cluster pairs are not born together through the splitting of the parent cloud, but are formed later by tidal capture. For 3 pairs, we tentatively identify the star cluster group (SCG) memberships. SCG formation, through the recent cluster starburst triggered by the LMC-SMC encounter, in contrast with the quiescent open cluster formation in the Milky Way, may explain the paucity of binary clusters observed in our Galaxy. Based on observations collected at the European Southern Observatory, La Silla, Chile.

  10. An Accurate Link Correlation Estimator for Improving Wireless Protocol Performance

    PubMed Central

    Zhao, Zhiwei; Xu, Xianghua; Dong, Wei; Bu, Jiajun

    2015-01-01

    Wireless link correlation has shown significant impact on the performance of various sensor network protocols. Many works have been devoted to exploiting link correlation for protocol improvements. However, the effectiveness of these designs heavily relies on the accuracy of link correlation measurement. In this paper, we investigate state-of-the-art link correlation measurement and analyze the limitations of existing works. We then propose a novel lightweight and accurate link correlation estimation (LACE) approach based on the reasoning of link correlation formation. LACE combines both long-term and short-term link behaviors for link correlation estimation. We implement LACE as a stand-alone interface in TinyOS and incorporate it into both routing and flooding protocols. Simulation and testbed results show that LACE: (1) achieves more accurate and lightweight link correlation measurements than the state-of-the-art work; and (2) greatly improves the performance of protocols exploiting link correlation. PMID:25686314

  11. The Mass Function of Abell Clusters

    NASA Astrophysics Data System (ADS)

    Chen, J.; Huchra, J. P.; McNamara, B. R.; Mader, J.

    1998-12-01

The velocity dispersion and mass functions for rich clusters of galaxies provide important constraints on models of the formation of Large-Scale Structure (e.g., Frenk et al. 1990). However, prior estimates of the velocity dispersion or mass function for galaxy clusters have been based on either very small samples of clusters (Bahcall and Cen 1993; Zabludoff et al. 1994) or large but incomplete samples (e.g., the Girardi et al. (1998) determination from a sample of clusters with more than 30 measured galaxy redshifts). In contrast, we approach the problem by constructing a volume-limited sample of Abell clusters. We collected individual galaxy redshifts for our sample from two major galaxy velocity databases, the NASA Extragalactic Database, NED, maintained at IPAC, and ZCAT, maintained at SAO. We assembled a database with velocity information for possible cluster members and then selected cluster members based on both spatial and velocity data. Cluster velocity dispersions and masses were calculated following the procedures of Danese, De Zotti, and di Tullio (1980) and Heisler, Tremaine, and Bahcall (1985), respectively. The final velocity dispersion and mass functions were analyzed in order to constrain cosmological parameters by comparison to the results of N-body simulations. Our data for the cluster sample as a whole and for the individual clusters in our sample (spatial maps and velocity histograms) are available on-line at http://cfa-www.harvard.edu/ huchra/clusters. This website will be updated as more data become available in the master redshift compilations, and will be expanded to include more clusters and large groups of galaxies.

  12. Clustering of financial time series

    NASA Astrophysics Data System (ADS)

    D'Urso, Pierpaolo; Cappelli, Carmela; Di Lallo, Dario; Massari, Riccardo

    2013-05-01

This paper addresses the topic of classifying financial time series in a fuzzy framework, proposing two fuzzy clustering models both based on GARCH models. In general, clustering of financial time series, due to their peculiar features, requires the definition of suitable distance measures. To this end, the first fuzzy clustering model exploits the autoregressive representation of GARCH models and employs, in the framework of a partitioning around medoids algorithm, the classical autoregressive metric. The second fuzzy clustering model, also based on the partitioning around medoids algorithm, uses the Caiado distance, a Mahalanobis-like distance based on estimated GARCH parameters and covariances that takes into account information about the volatility structure of the time series. In order to illustrate the merits of the proposed fuzzy approaches, an application to the problem of classifying 29 time series of Euro exchange rates against international currencies is presented and discussed, also comparing the fuzzy models with their crisp versions.
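    Both proposed models run a partitioning-around-medoids algorithm over a time-series distance (the autoregressive metric or the Caiado distance). A crisp k-medoids sketch on a precomputed distance matrix shows the shared machinery (the paper's models are fuzzy, which this sketch omits):

```python
import numpy as np

def k_medoids(D, k, iters=20, rng=None):
    """Partitioning around medoids on a precomputed distance matrix D
    (e.g. an autoregressive or GARCH-parameter-based distance between
    time series). Alternates assigning each series to its nearest
    medoid and moving each medoid to the member that minimises the
    total within-cluster distance."""
    rng = np.random.default_rng(rng)
    medoids = rng.choice(len(D), size=k, replace=False)
    for _ in range(iters):
        labels = np.argmin(D[:, medoids], axis=1)
        medoids = np.array([
            np.flatnonzero(labels == j)[
                np.argmin(D[np.ix_(labels == j, labels == j)].sum(axis=0))]
            for j in range(k)])
    return medoids, labels
```

    Working from a distance matrix is the key design choice: any metric that respects the series' volatility structure can be plugged in without changing the partitioning code.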

  13. Uncertainty in Population Estimates for Endangered Animals and Improving the Recovery Process.

    PubMed

    Haines, Aaron M; Zak, Matthew; Hammond, Katie; Scott, J Michael; Goble, Dale D; Rachlow, Janet L

    2013-08-13

United States recovery plans contain biological information for a species listed under the Endangered Species Act and specify recovery criteria to provide a basis for species recovery. The objective of our study was to evaluate whether recovery plans provide uncertainty (e.g., variance) with estimates of population size. We reviewed all finalized recovery plans for listed terrestrial vertebrate species to record the following data: (1) whether a current population size was given, (2) whether a measure of uncertainty or variance was associated with current estimates of population size and (3) whether population size was stipulated for recovery. We found that 59% of completed recovery plans specified a current population size, 14.5% specified a variance for the current population size estimate and 43% specified population size as a recovery criterion. More recent recovery plans reported more estimates of current population size, uncertainty and population size as a recovery criterion. Also, bird and mammal recovery plans reported more estimates of population size and uncertainty compared to those for reptiles and amphibians. We suggest calculating minimum detectable differences to improve confidence when delisting endangered animals, and we identify incentives for individuals to get involved in recovery planning to improve access to quantitative data.
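    The "minimum detectable differences" suggestion can be made concrete with the usual normal-approximation formula for comparing two estimates, MDD = (z_{1−α/2} + z_{power}) · SE_diff; the survey standard errors below are hypothetical:

```python
from math import sqrt
from statistics import NormalDist

def minimum_detectable_difference(se1, se2, alpha=0.05, power=0.8):
    """Smallest true change in population size detectable between two
    surveys with standard errors se1 and se2, at a two-sided
    significance level alpha and the given power (normal
    approximation): MDD = (z_{1-alpha/2} + z_{power}) * SE_diff."""
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) * sqrt(
        se1 ** 2 + se2 ** 2)

# Hypothetical surveys with SEs of 50 and 60 animals:
print(round(minimum_detectable_difference(50, 60)))  # ≈ 219 animals
```

    The point of the study follows directly: without a reported variance, SE_diff is unknown and the minimum change a delisting decision could reliably detect cannot be computed.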

  14. Size-guided multi-seed heuristic method for geometry optimization of clusters: Application to benzene clusters.

    PubMed

    Takeuchi, Hiroshi

    2018-05-08

Since searching for the global minimum on the potential energy surface of a cluster is very difficult, many geometry optimization methods have been proposed, in which initial geometries are randomly generated and subsequently improved with different algorithms. In this study, a size-guided multi-seed heuristic method is developed and applied to benzene clusters. It produces initial configurations of the cluster with n molecules from the lowest-energy configurations of the cluster with n - 1 molecules (seeds). The initial geometries are further optimized with the geometrical perturbations previously used for molecular clusters. These steps are repeated until n reaches a predefined size. The method locates putative global minima of benzene clusters with up to 65 molecules. The performance of the method is discussed in terms of computational cost, rates of locating the global minima, and energies of initial geometries. © 2018 Wiley Periodicals, Inc.

  15. RELICS: Strong Lens Models for Five Galaxy Clusters from the Reionization Lensing Cluster Survey

    NASA Astrophysics Data System (ADS)

    Cerny, Catherine; Sharon, Keren; Andrade-Santos, Felipe; Avila, Roberto J.; Bradač, Maruša; Bradley, Larry D.; Carrasco, Daniela; Coe, Dan; Czakon, Nicole G.; Dawson, William A.; Frye, Brenda L.; Hoag, Austin; Huang, Kuang-Han; Johnson, Traci L.; Jones, Christine; Lam, Daniel; Lovisari, Lorenzo; Mainali, Ramesh; Oesch, Pascal A.; Ogaz, Sara; Past, Matthew; Paterno-Mahler, Rachel; Peterson, Avery; Riess, Adam G.; Rodney, Steven A.; Ryan, Russell E.; Salmon, Brett; Sendra-Server, Irene; Stark, Daniel P.; Strolger, Louis-Gregory; Trenti, Michele; Umetsu, Keiichi; Vulcani, Benedetta; Zitrin, Adi

    2018-06-01

Strong gravitational lensing by galaxy clusters magnifies background galaxies, enhancing our ability to discover statistically significant samples of galaxies at z > 6, in order to constrain the high-redshift galaxy luminosity functions. Here, we present the first five lens models out of the Reionization Lensing Cluster Survey (RELICS) Hubble Treasury Program, based on new HST WFC3/IR and ACS imaging of the clusters RXC J0142.9+4438, Abell 2537, Abell 2163, RXC J2211.7–0349, and ACT-CLJ0102–49151. The derived lensing magnification is essential for estimating the intrinsic properties of high-redshift galaxy candidates, and properly accounting for the survey volume. We report on new spectroscopic redshifts of multiply imaged lensed galaxies behind these clusters, which are used as constraints, and detail our strategy to reduce systematic uncertainties due to lack of spectroscopic information. In addition, we quantify the uncertainty on the lensing magnification due to statistical and systematic errors related to the lens modeling process, and find that in all but one cluster, the magnification is constrained to better than 20% in at least 80% of the field of view, including statistical and systematic uncertainties. The five clusters presented in this paper span the range of masses and redshifts of the clusters in the RELICS program. We find that they exhibit similar strong lensing efficiencies to the clusters targeted by the Hubble Frontier Fields within the WFC3/IR field of view. Outputs of the lens models are made available to the community through the Mikulski Archive for Space Telescopes.

  16. Managing distance and covariate information with point-based clustering.

    PubMed

    Whigham, Peter A; de Graaf, Brandon; Srivastava, Rashmi; Glue, Paul

    2016-09-01

    Geographic perspectives of disease and the human condition often involve point-based observations and questions of clustering or dispersion within a spatial context. These problems involve a finite set of point observations and are constrained by a larger, but finite, set of locations where the observations could occur. Developing a rigorous method for pattern analysis in this context requires handling spatial covariates, a method for constrained finite spatial clustering, and addressing bias in geographic distance measures. An approach, based on Ripley's K and applied to the problem of clustering with deliberate self-harm (DSH), is presented. Point-based Monte-Carlo simulation of Ripley's K, accounting for socio-economic deprivation and sources of distance measurement bias, was developed to estimate clustering of DSH at a range of spatial scales. A rotated Minkowski L1 distance metric allowed variation in physical distance and clustering to be assessed. Self-harm data was derived from an audit of 2 years' emergency hospital presentations (n = 136) in a New Zealand town (population ~50,000). Study area was defined by residential (housing) land parcels representing a finite set of possible point addresses. Area-based deprivation was spatially correlated. Accounting for deprivation and distance bias showed evidence for clustering of DSH for spatial scales up to 500 m with a one-sided 95 % CI, suggesting that social contagion may be present for this urban cohort. Many problems involve finite locations in geographic space that require estimates of distance-based clustering at many scales. A Monte-Carlo approach to Ripley's K, incorporating covariates and models for distance bias, are crucial when assessing health-related clustering. The case study showed that social network structure defined at the neighbourhood level may account for aspects of neighbourhood clustering of DSH. Accounting for covariate measures that exhibit spatial clustering, such as deprivation

  17. Electron attenuation in free, neutral ethane clusters.

    PubMed

    Winkler, M; Myrseth, V; Harnes, J; Børve, K J

    2014-10-28

    The electron effective attenuation length (EAL) in free, neutral ethane clusters has been determined at 40 eV kinetic energy by combining carbon 1s X-ray photoelectron spectroscopy and theoretical lineshape modeling. More specifically, theory is employed to form model spectra on a grid in cluster size (N) and EAL (λ), allowing N and λ to be determined by optimizing the goodness-of-fit χ²(N, λ) between model and observed spectra. Experimentally, the clusters were produced in an adiabatic-expansion setup using helium as the driving gas, spanning a range of 100-600 molecules in mean cluster size. The effective attenuation length was determined to be 8.4 ± 1.9 Å, in good agreement with an independent estimate of 10 Å formed on the basis of molecular electron-scattering data and Monte Carlo simulations. The aggregation state of the clusters as well as the cluster temperature and its importance to the derived EAL value are discussed in some depth.
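
    The grid-based χ²(N, λ) optimization described above can be sketched generically. The lineshape model here is a toy stand-in (a single Gaussian whose parameters depend on N and λ), not the paper's theoretical model, and the grids are illustrative:

```python
import numpy as np

def toy_model(energies, n, eal):
    """Stand-in for a C 1s lineshape model: centre shifts with EAL, width
    narrows with cluster size. Purely illustrative, not the paper's model."""
    center = 290.0 + 0.02 * eal
    width = 0.2 + 50.0 / n
    return np.exp(-((energies - center) / width) ** 2)

def fit_on_grid(observed, energies, model_fn, n_grid, eal_grid):
    """chi^2 grid search over cluster size N and attenuation length lambda."""
    best = (None, None, np.inf)
    for n in n_grid:
        for eal in eal_grid:
            chi2 = ((model_fn(energies, n, eal) - observed) ** 2).sum()
            if chi2 < best[2]:
                best = (n, eal, chi2)
    return best

# Synthetic "observed" spectrum generated at N = 300, lambda = 8 A
energies = np.linspace(288.0, 292.0, 400)
observed = toy_model(energies, 300, 8.0)
n_grid = [100, 200, 300, 400, 500, 600]
eal_grid = [4.0, 6.0, 8.0, 10.0, 12.0]
best_n, best_eal, best_chi2 = fit_on_grid(observed, energies, toy_model,
                                          n_grid, eal_grid)
```

    With noise-free synthetic data the search recovers the generating (N, λ) exactly; with real spectra one would weight the residuals by measurement variance.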

  18. Electron and nuclear dynamics of molecular clusters in ultraintense laser fields. III. Coulomb explosion of deuterium clusters.

    PubMed

    Last, Isidore; Jortner, Joshua

    2004-08-15

    boundary radius (R0)I and the corresponding ion average energy (Eav)I were inferred from simulations and described in terms of an electrostatic model. Two independent estimates of (R0)I, which involve the cluster size where the CVI relation breaks down and the cluster size for the attainment of complete outer ionization, are in good agreement with each other, as well as with the electrostatic model for cluster barrier suppression. The relation (Eav)I ∝ (R0)I² provides the validity range of the pseudo-CVI domain for the cluster sizes and laser intensities where the energetics of D+ ions produced by Coulomb explosion of (D)n clusters is optimized. The currently available experimental data [Madison et al., Phys. Plasmas 11, 1 (2004)] for the energetics of Coulomb explosion of (D)n clusters (Eav = 5–7 keV at I = 2 × 10¹⁸ W cm⁻²), together with our simulation data, lead to estimates of R0 = 51–60 Å, which exceed the experimental estimate of R0 = 45 Å. The predicted anisotropy of the D+ ion energies in the Coulomb explosion at I = 10¹⁸ W cm⁻² is in accord with experiment. We also explored the laser frequency dependence of the energetics of Coulomb explosion in the range ν = 0.1–2.1 fs⁻¹ (λ = 3000–140 nm), which can be rationalized in terms of the electrostatic model. © 2004 American Institute of Physics.
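
    The Eav ∝ R0² scaling quoted above follows from the standard uniformly charged sphere result for cluster vertical ionization, E_max = (4π/3) k ρ (qe)² R0² with Eav = (3/5) E_max. A back-of-envelope check, using an illustrative deuterium atomic density of 5 × 10²² cm⁻³ (not a value from the paper), lands in the same few-keV range as the quoted experiments:

```python
import math

E_CHARGE = 1.602176634e-19        # elementary charge, C
K_COULOMB = 8.9875517923e9        # Coulomb constant, N m^2 / C^2

def coulomb_explosion_eav(r0_angstrom, rho_per_cm3=5.0e22, q=1):
    """Average ion energy (eV) from Coulomb explosion of a uniformly charged
    sphere: E_max = (4*pi/3) * k * rho * (q e)^2 * R0^2, E_av = (3/5) E_max.
    The deuterium atomic density default is illustrative."""
    rho = rho_per_cm3 * 1e6                 # cm^-3 -> m^-3
    r0 = r0_angstrom * 1e-10                # Angstrom -> m
    e_max = (4 * math.pi / 3) * K_COULOMB * rho * (q * E_CHARGE) ** 2 * r0 ** 2
    return 0.6 * e_max / E_CHARGE           # joules -> eV

for r0 in (45, 51, 60):
    print(f"R0 = {r0} A  ->  E_av ~ {coulomb_explosion_eav(r0) / 1e3:.1f} keV")
```

    The quadratic dependence on R0 is why a modest spread in inferred cluster radius (45 versus 51-60 Å) translates into a noticeable spread in predicted ion energies.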

  19. Uncertainty in Population Estimates for Endangered Animals and Improving the Recovery Process

    PubMed Central

    Haines, Aaron M.; Zak, Matthew; Hammond, Katie; Scott, J. Michael; Goble, Dale D.; Rachlow, Janet L.

    2013-01-01

    Simple Summary The objective of our study was to evaluate the mention of uncertainty (i.e., variance) associated with population size estimates within U.S. recovery plans for endangered animals. To do this we reviewed all finalized recovery plans for listed terrestrial vertebrate species. We found that more recent recovery plans reported more estimates of population size and uncertainty. Bird and mammal recovery plans also reported more estimates of population size and uncertainty. We recommend that updated recovery plans combine uncertainty of population size estimates with a minimum detectable difference to aid in successful recovery. Abstract United States recovery plans contain biological information for a species listed under the Endangered Species Act and specify recovery criteria to provide a basis for species recovery. The objective of our study was to evaluate whether recovery plans provide uncertainty (e.g., variance) with estimates of population size. We reviewed all finalized recovery plans for listed terrestrial vertebrate species to record the following data: (1) whether a current population size was given, (2) whether a measure of uncertainty or variance was associated with current estimates of population size, and (3) whether a population size was stipulated for recovery. We found that 59% of completed recovery plans specified a current population size, 14.5% specified a variance for the current population size estimate, and 43% specified population size as a recovery criterion. More recent recovery plans more often reported current population size, its uncertainty, and a population size recovery criterion. Also, bird and mammal recovery plans reported more estimates of population size and uncertainty than those for reptiles and amphibians. We suggest calculating minimum detectable differences to improve confidence when delisting endangered animals, and we identified incentives for individuals to get involved in recovery planning to improve access to
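
    The minimum-detectable-difference recommendation can be illustrated with a standard z-test power identity. This is a generic power-analysis sketch, not the authors' exact procedure, and the standard errors in the example are hypothetical:

```python
from math import sqrt
from statistics import NormalDist

def minimum_detectable_difference(se1, se2, alpha=0.05, power=0.8,
                                  two_sided=True):
    """Smallest change between two population-size estimates that a z-test
    would detect at the given alpha and power, from the standard errors of
    the two estimates."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / (2 if two_sided else 1))
    z_beta = nd.inv_cdf(power)
    return (z_alpha + z_beta) * sqrt(se1 ** 2 + se2 ** 2)

# e.g. two surveys, each estimating abundance with SE = 50 animals:
mdd = minimum_detectable_difference(50, 50)
```

    The point of the recommendation is visible here: without a reported variance (and hence a standard error) for each population estimate, this calculation cannot be done, and a delisting decision cannot say whether an apparent increase is statistically detectable.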

  20. A Note on Cluster Effects in Latent Class Analysis

    ERIC Educational Resources Information Center

    Kaplan, David; Keller, Bryan

    2011-01-01

    This article examines the effects of clustering in latent class analysis. A comprehensive simulation study is conducted, which begins by specifying a true multilevel latent class model with varying within- and between-cluster sample sizes, varying latent class proportions, and varying intraclass correlations. These models are then estimated under…