NASA Astrophysics Data System (ADS)
Choi, Hon-Chit; Wen, Lingfeng; Eberl, Stefan; Feng, Dagan
2006-03-01
Dynamic Single Photon Emission Computed Tomography (SPECT) has the potential to quantitatively estimate physiological parameters by fitting compartment models to the tracer kinetics. The generalized linear least squares method (GLLS) is an efficient method to estimate unbiased kinetic parameters and parametric images. However, due to the low sensitivity of SPECT, noisy data can cause voxel-wise parameter estimation by GLLS to fail. Fuzzy C-means (FCM) clustering and a modified FCM, which also utilizes information from the immediately neighboring voxels, are proposed to improve the voxel-wise parameter estimation of GLLS. Monte Carlo simulations were performed to generate dynamic SPECT data with different noise levels, which were processed by standard and modified FCM clustering. Parametric images were estimated by Logan and Yokoi graphical analysis and by GLLS. The influx rate (KI) and volume of distribution (Vd) were estimated for the cerebellum, thalamus and frontal cortex. Our results show that (1) FCM reduces the bias and improves the reliability of parameter estimates for noisy data, (2) GLLS provides estimates of micro parameters (KI-k4) as well as macro parameters, such as the volume of distribution (Vd) and binding potentials (BPI and BPII), and (3) FCM clustering incorporating neighboring voxel information does not improve the parameter estimates, but reduces noise in the parametric images. These findings indicate that pre-segmentation with traditional FCM clustering is desirable when generating voxel-wise parametric images with GLLS from dynamic SPECT data.
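The fuzzy C-means step described above can be sketched generically. The function below is a toy 1-D FCM (random membership initialization, standard center and membership updates), not the authors' SPECT pipeline; the data and parameter values are illustrative assumptions.

```python
import random

def fuzzy_c_means(points, c=2, m=2.0, iters=50, seed=0):
    """Toy fuzzy C-means on 1-D data (e.g., a single voxel kinetic feature).

    Returns the cluster centers and the membership matrix u[i][j]: the
    degree to which point i belongs to cluster j. Generic sketch only.
    """
    rng = random.Random(seed)
    # Random initial memberships, normalized so each row sums to 1.
    u = [[rng.random() for _ in range(c)] for _ in points]
    u = [[v / sum(row) for v in row] for row in u]
    centers = [0.0] * c
    for _ in range(iters):
        # Centers are membership-weighted means (weights u^m).
        for j in range(c):
            w = [row[j] ** m for row in u]
            centers[j] = sum(wi * x for wi, x in zip(w, points)) / sum(w)
        # Memberships follow from relative inverse distances to centers.
        for i, x in enumerate(points):
            d = [max(abs(x - cj), 1e-12) for cj in centers]
            for j in range(c):
                u[i][j] = 1.0 / sum((d[j] / d[k]) ** (2.0 / (m - 1.0))
                                    for k in range(c))
    return centers, u
```

Pre-segmenting voxels with such soft memberships is what allows noisy voxels to borrow strength from similar kinetics before GLLS fitting.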
A clustering approach to segmenting users of internet-based risk calculators.
Harle, C A; Downs, J S; Padman, R
2011-01-01
Risk calculators are widely available Internet applications that deliver quantitative health risk estimates to consumers. Although these tools are known to have varying effects on risk perceptions, little is known about who will be more likely to accept objective risk estimates. To identify clusters of online health consumers that help explain variation in individual improvement in risk perceptions from web-based quantitative disease risk information, a secondary analysis was performed on data collected in a field experiment that measured people's pre-diabetes risk perceptions before and after visiting a realistic health promotion website that provided quantitative risk information. K-means clustering was performed on numerous candidate variable sets, and the different segmentations were evaluated based on between-cluster variation in risk perception improvement. Variation in responses to risk information was best explained by clustering on pre-intervention absolute pre-diabetes risk perceptions and an objective estimate of personal risk. Members of a high-risk overestimator cluster showed large improvements in their risk perceptions, but clusters of both moderate-risk and high-risk underestimators were much more muted in improving their optimistically biased perceptions. Cluster analysis provided a unique approach for segmenting health consumers and predicting their acceptance of quantitative disease risk information. These clusters suggest that health consumers were very responsive to good news but tended not to incorporate bad news into their self-perceptions. These findings help to quantify variation among online health consumers and may inform the targeted marketing of, and improvements to, risk communication tools on the Internet.
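A minimal version of the k-means segmentation used above can be written directly. Here each point is a (perceived risk, objective risk) pair, which is an assumed encoding of the study's variables, and the first k points serve as deterministic initial centers.

```python
def kmeans(points, k, iters=100):
    """Lloyd's algorithm on tuples. Initial centers are the first k points
    (assumed to be spread out); the study's exact variable coding is not
    reproduced here."""
    centers = list(points[:k])
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its squared-Euclidean-nearest center.
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centers[c])))
            groups[j].append(p)
        # Recompute each center as the mean of its group.
        new = [tuple(sum(dim) / len(g) for dim in zip(*g)) if g else centers[j]
               for j, g in enumerate(groups)]
        if new == centers:  # converged
            break
        centers = new
    return centers
```

Clusters such as "high perceived, low objective risk" (overestimators) then fall out as regions of this two-variable space.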
Kappa statistic for clustered matched-pair data.
Yang, Zhao; Zhou, Ming
2014-07-10
The kappa statistic is widely used to assess the agreement between two procedures in independent matched-pair data. For matched-pair data collected in clusters, we propose, on the basis of the delta method and sampling techniques, a nonparametric variance estimator for the kappa statistic that requires neither a within-cluster correlation structure nor distributional assumptions. The results of an extensive Monte Carlo simulation study demonstrate that the proposed kappa statistic provides consistent estimation, and the proposed variance estimator behaves reasonably well for at least a moderately large number of clusters (e.g., K ≥ 50). Compared with the variance estimator that ignores dependence within a cluster, the proposed variance estimator performs better in maintaining the nominal coverage probability when the intra-cluster correlation is fair (ρ ≥ 0.3), with more pronounced improvement as ρ increases further. To illustrate the practical application of the proposed estimator, we analyze two real data examples of clustered matched-pair data. Copyright © 2014 John Wiley & Sons, Ltd.
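As a concrete illustration, Cohen's kappa for binary matched pairs can be computed directly, and a cluster bootstrap (resampling whole clusters with replacement) gives a variance estimate that respects within-cluster dependence. This is a generic alternative sketch, not the paper's delta-method estimator.

```python
import random

def cohens_kappa(pairs):
    """Cohen's kappa for binary matched-pair ratings (generic formula)."""
    n = len(pairs)
    po = sum(1 for a, b in pairs if a == b) / n   # observed agreement
    p1a = sum(a for a, _ in pairs) / n            # marginal of procedure A
    p1b = sum(b for _, b in pairs) / n            # marginal of procedure B
    pe = p1a * p1b + (1 - p1a) * (1 - p1b)        # chance agreement
    return (po - pe) / (1 - pe)

def cluster_bootstrap_se(clusters, stat=cohens_kappa, reps=500, seed=0):
    """Standard error of `stat` by resampling whole clusters with
    replacement, so within-cluster correlation is preserved."""
    rng = random.Random(seed)
    vals = []
    for _ in range(reps):
        pairs = [p for cl in (rng.choice(clusters) for _ in clusters)
                 for p in cl]
        vals.append(stat(pairs))
    mean = sum(vals) / len(vals)
    return (sum((v - mean) ** 2 for v in vals) / (len(vals) - 1)) ** 0.5
```

Ignoring the cluster structure would instead treat all pairs as independent, which is exactly the naive variance estimator the paper improves upon.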
Optimizing weak lensing mass estimates for cluster profile uncertainty
Gruen, D.; Bernstein, G. M.; Lam, T. Y.; ...
2011-09-11
Weak lensing measurements of cluster masses are necessary for calibrating mass-observable relations (MORs) to investigate the growth of structure and the properties of dark energy. However, the measured cluster shear signal varies at fixed mass M200m due to the inherent ellipticity of background galaxies, intervening structures along the line of sight, and variations in cluster structure due to scatter in concentrations, asphericity and substructure. We use N-body simulated halos to derive and evaluate a weak lensing circular aperture mass measurement Map that minimizes the mass estimate variance <(Map - M200m)^2> in the presence of all these forms of variability. Depending on halo mass and observational conditions, the resulting mass estimator improves on Map filters optimized for circular NFW-profile clusters in the presence of uncorrelated large-scale structure (LSS) about as much as the latter improve on an estimator that only minimizes the influence of shape noise. Optimizing for uncorrelated LSS while ignoring the variation of internal cluster structure puts too much weight on the profile near the cores of halos, and under some circumstances can even be worse than not accounting for LSS at all. Finally, we discuss the impact of variability in cluster structure and correlated structures on the design and performance of weak lensing surveys intended to calibrate cluster MORs.
Scott, JoAnna M; deCamp, Allan; Juraska, Michal; Fay, Michael P; Gilbert, Peter B
2017-04-01
Stepped wedge designs are increasingly commonplace and advantageous for cluster randomized trials when it is both unethical to assign placebo and logistically difficult to allocate an intervention simultaneously to many clusters. We study marginal mean models fit with generalized estimating equations for assessing treatment effectiveness in stepped wedge cluster randomized trials. This approach has advantages over the more commonly used mixed models in that (1) the population-average parameters have an important interpretation for public health applications and (2) it avoids untestable assumptions on latent variable distributions and parametric assumptions about error distributions, therefore providing more robust evidence on treatment effects. However, cluster randomized trials typically have a small number of clusters, rendering the standard generalized estimating equation sandwich variance estimator biased and highly variable, and hence yielding incorrect inferences. We study the usual asymptotic generalized estimating equation inferences (i.e., using sandwich variance estimators and asymptotic normality) and four small-sample corrections to generalized estimating equations for stepped wedge cluster randomized trials, with parallel cluster randomized trials as a comparison. We show by simulation that the small-sample corrections provide improvement, with one correction appearing to provide at least nominal coverage even with only 10 clusters per group. These results demonstrate the viability of the marginal mean approach for both stepped wedge and parallel cluster randomized trials. We also study the comparative performance of the corrected methods for stepped wedge and parallel designs, and describe how the methods can accommodate interval censoring of individual failure times and incorporate semiparametric efficient estimators.
Application of adaptive cluster sampling to low-density populations of freshwater mussels
Smith, D.R.; Villella, R.F.; Lemarie, D.P.
2003-01-01
Freshwater mussels appear to be promising candidates for adaptive cluster sampling because they are benthic macroinvertebrates that cluster spatially and are frequently found at low densities. We applied adaptive cluster sampling to estimate density of freshwater mussels at 24 sites along the Cacapon River, WV, where a preliminary timed search indicated that mussels were present at low density. Adaptive cluster sampling increased yield of individual mussels and detection of uncommon species; however, it did not improve precision of density estimates. Because finding uncommon species, collecting individuals of those species, and estimating their densities are important conservation activities, additional research is warranted on application of adaptive cluster sampling to freshwater mussels. However, at this time we do not recommend routine application of adaptive cluster sampling to freshwater mussel populations. The ultimate, and currently unanswered, question is how to tell when adaptive cluster sampling should be used, i.e., when is a population sufficiently rare and clustered for adaptive cluster sampling to be efficient and practical? A cost-effective procedure needs to be developed to identify biological populations for which adaptive cluster sampling is appropriate.
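The adaptive design can be sketched as follows: plots are added around any sampled plot whose count meets a threshold, which is what raises the yield of individuals and uncommon species. This is a design sketch only; the grid, threshold, and initial sample are illustrative, and the unbiased (Hansen-Hurwitz or Horvitz-Thompson) density estimation step is omitted.

```python
def adaptive_cluster_sample(grid, initial, threshold=1):
    """Expand an initial sample of plots: whenever a sampled plot's count
    meets the threshold, its 4-neighbours are added, repeatedly.
    Returns the final set of sampled (row, col) plots."""
    rows, cols = len(grid), len(grid[0])
    sampled = set(initial)
    frontier = [p for p in initial if grid[p[0]][p[1]] >= threshold]
    while frontier:
        r, c = frontier.pop()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in sampled:
                sampled.add((nr, nc))
                if grid[nr][nc] >= threshold:
                    frontier.append((nr, nc))
    return sampled
```

The expansion cost is exactly the trade-off the abstract describes: more individuals are found, but the extra sampled plots do not automatically tighten the density estimate.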
Testing the accuracy of clustering redshifts with simulations
NASA Astrophysics Data System (ADS)
Scottez, V.; Benoit-Lévy, A.; Coupon, J.; Ilbert, O.; Mellier, Y.
2018-03-01
We explore the accuracy of clustering-based redshift inference within the MICE2 simulation. This method uses the spatial clustering of galaxies between a spectroscopic reference sample and an unknown sample, and this study gives an estimate of the accuracy the method can reach. First, we discuss the requirements on the number of objects in the two samples, confirming that this method does not require a representative spectroscopic sample for calibration. In the context of the next generation of cosmological surveys, we estimate that the density of the Quasi Stellar Objects in BOSS allows us to reach 0.2 per cent accuracy in the mean redshift. Secondly, we estimate individual redshifts for galaxies in the densest regions of colour space (~30 per cent of the galaxies) without using a photometric redshift procedure. The advantage of this procedure is threefold. It allows: (i) the use of cluster-zs for any field in astronomy, (ii) the possibility to combine photo-zs and cluster-zs to obtain an improved redshift estimate, and (iii) the use of cluster-zs to define tomographic bins for weak lensing. Finally, we explore this last option and build five cluster-z-selected tomographic bins from redshift 0.2 to 1. We find a bias on the mean redshift estimate of 0.002 per bin. We conclude that cluster-zs could be used as a primary redshift estimator by the next generation of cosmological surveys.
The genetic structure of a relict population of wood frogs
Scherer, Rick; Muths, Erin; Noon, Barry; Oyler-McCance, Sara
2012-01-01
Habitat fragmentation and the associated reduction in connectivity between habitat patches are commonly cited causes of genetic differentiation and reduced genetic variation in animal populations. We used eight microsatellite markers to investigate genetic structure and levels of genetic diversity in a relict population of wood frogs (Lithobates sylvatica) in Rocky Mountain National Park, Colorado, where recent disturbances have altered hydrologic processes and fragmented amphibian habitat. We also estimated migration rates among subpopulations, tested for a pattern of isolation-by-distance, and looked for evidence of a recent population bottleneck. The results from the clustering algorithm in Program STRUCTURE indicated the population is partitioned into two genetic clusters (subpopulations), and this result was further supported by factorial component analysis. In addition, an estimate of FST (FST = 0.0675, P value < 0.0001) supported the genetic differentiation of the two clusters. Estimates of migration rates between the two subpopulations were low, as were estimates of genetic variability. Conservation of the population of wood frogs may be improved by increasing the spatial distribution of the population and improving gene flow between the subpopulations. Construction or restoration of wetlands in the landscape between the clusters has the potential to address each of these objectives.
An effective fuzzy kernel clustering analysis approach for gene expression data.
Sun, Lin; Xu, Jiucheng; Yin, Jiaojiao
2015-01-01
Fuzzy clustering is an important tool for analyzing microarray data. A major problem in applying fuzzy clustering methods to microarray gene expression data is the choice of the cluster number and cluster centers. This paper proposes a new approach to fuzzy kernel clustering analysis (FKCA) that identifies the desired cluster number and obtains steadier results for gene expression data. First, to optimize characteristic differences and estimate the optimal cluster number, a Gaussian kernel function is introduced to improve the spectrum analysis method (SAM). By combining subtractive clustering with the max-min distance mean, a maximum distance method (MDM) is proposed to determine cluster centers. The corresponding steps of the improved SAM (ISAM) and MDM are then given, and their superiority and stability are illustrated through experimental comparisons on gene expression data. Finally, by introducing ISAM and MDM into FKCA, an effective improved FKCA algorithm is proposed. Experimental results from public gene expression data and the UCI database show that the proposed algorithms are feasible for cluster analysis, and their clustering accuracy is higher than that of other related clustering algorithms.
The observed clustering of damaging extra-tropical cyclones in Europe
NASA Astrophysics Data System (ADS)
Cusack, S.
2015-12-01
The clustering of severe European windstorms on annual timescales has substantial impacts on the re/insurance industry. Management of the risk is impaired by large uncertainties in estimates of clustering from historical storm datasets typically covering the past few decades. The uncertainties are unusually large because clustering depends on the variance of storm counts. Eight storm datasets are gathered for analysis in this study in order to reduce these uncertainties. Six of the datasets contain more than 100 years of severe storm information to reduce sampling errors, and the diversity of information sources and analysis methods between datasets samples observational errors. All storm severity measures used in this study reflect damage, to suit re/insurance applications. It is found that the shortest storm dataset, 42 years in length, provides estimates of clustering with very large sampling and observational errors. The dataset does provide some useful information: indications of stronger clustering for more severe storms, particularly for southern countries off the main storm track. However, substantially different results are produced by the removal of one stormy season, 1989/1990, which illustrates the large uncertainties of a 42-year dataset. The extended storm records place 1989/1990 into a much longer historical context and produce more robust estimates of clustering. All the extended storm datasets show a greater degree of clustering with increasing storm severity and suggest that clustering of severe storms is much more material than that of weaker storms. Further, they contain signs of stronger clustering in areas off the main storm track, and weaker clustering for smaller-sized areas, though these signals are smaller than the uncertainties in the actual values. Both the improvement of existing storm records and the development of new historical storm datasets would help to improve management of this risk.
High-Resolution Spatial Distribution and Estimation of Access to Improved Sanitation in Kenya.
Jia, Peng; Anderson, John D; Leitner, Michael; Rheingans, Richard
2016-01-01
Access to sanitation facilities is imperative in reducing the risk of multiple adverse health outcomes. A distinct disparity in sanitation exists among different wealth levels in many low-income countries, which may hinder the progress across each of the Millennium Development Goals. The surveyed households in 397 clusters from 2008-2009 Kenya Demographic and Health Surveys were divided into five wealth quintiles based on their national asset scores. A series of spatial analysis methods including excess risk, local spatial autocorrelation, and spatial interpolation were applied to observe disparities in coverage of improved sanitation among different wealth categories. The total number of the population with improved sanitation was estimated by interpolating, time-adjusting, and multiplying the surveyed coverage rates by high-resolution population grids. A comparison was then made with the annual estimates from United Nations Population Division and World Health Organization /United Nations Children's Fund Joint Monitoring Program for Water Supply and Sanitation. The Empirical Bayesian Kriging interpolation produced minimal root mean squared error for all clusters and five quintiles while predicting the raw and spatial coverage rates of improved sanitation. The coverage in southern regions was generally higher than in the north and east, and the coverage in the south decreased from Nairobi in all directions, while Nyanza and North Eastern Province had relatively poor coverage. The general clustering trend of high and low sanitation improvement among surveyed clusters was confirmed after spatial smoothing. There exists an apparent disparity in sanitation among different wealth categories across Kenya and spatially smoothed coverage rates resulted in a closer estimation of the available statistics than raw coverage rates. 
Future intervention activities need to be tailored both to different wealth categories and, when resources are limited, to the areas of greatest need nationally.
Nagwani, Naresh Kumar; Deo, Shirish V
2014-01-01
Understanding the compressive strength of concrete is important for activities like construction planning, prestressing operations, proportioning new mixtures, and quality assurance. Regression techniques are most widely used for prediction tasks, where the relationship between the independent variables and the dependent (prediction) variable is identified. The accuracy of regression techniques for prediction can be improved if clustering is used along with regression, since clustering ensures a more accurate curve fit between the dependent and independent variables. In this work, a cluster regression technique is applied to estimating the compressive strength of concrete, and a novel approach is proposed for predicting concrete compressive strength. The objective of this work is to demonstrate that clustering along with regression yields smaller prediction errors when estimating concrete compressive strength. The proposed technique consists of two major stages: in the first stage, clustering is used to group concrete data with similar characteristics, and in the second stage regression techniques are applied over these clusters (groups) to predict the compressive strength within each cluster. Experiments show that clustering combined with regression gives the smallest errors for predicting the compressive strength of concrete, and that the fuzzy C-means clustering algorithm performs better than the K-means algorithm.
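The two-stage idea can be sketched in a few lines: partition the samples, then fit a separate least-squares line in each partition. The threshold split below stands in for the k-means/fuzzy C-means stage, and the variables are illustrative, not the paper's concrete-mix features.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

def cluster_regression(xs, ys, split):
    """Two-stage sketch: a threshold on x stands in for the clustering stage,
    then each cluster gets its own regression line."""
    lo = [(x, y) for x, y in zip(xs, ys) if x < split]
    hi = [(x, y) for x, y in zip(xs, ys) if x >= split]
    return {name: fit_line([p[0] for p in grp], [p[1] for p in grp])
            for name, grp in (("lo", lo), ("hi", hi))}
```

A single global line would average over both regimes; per-cluster fits capture each regime's own slope, which is the source of the reduced prediction error.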
Improved optical mass tracer for galaxy clusters calibrated using weak lensing measurements
NASA Astrophysics Data System (ADS)
Reyes, R.; Mandelbaum, R.; Hirata, C.; Bahcall, N.; Seljak, U.
2008-11-01
We develop an improved mass tracer for clusters of galaxies from optically observed parameters, and calibrate the mass relation using weak gravitational lensing measurements. We employ a sample of ~13000 optically selected clusters from the Sloan Digital Sky Survey (SDSS) maxBCG catalogue, with photometric redshifts in the range 0.1-0.3. The optical tracers we consider are cluster richness, cluster luminosity, luminosity of the brightest cluster galaxy (BCG) and combinations of these parameters. We measure the weak lensing signal around stacked clusters as a function of the various tracers, and use it to determine the tracer with the least amount of scatter. We further use the weak lensing data to calibrate the mass normalization. We find that the best mass estimator for massive clusters is a combination of the cluster richness, N200, and the luminosity of the BCG, LBCG, scaled by the observed mean BCG luminosity at the given richness. This improved mass tracer will enable the use of galaxy clusters as a more powerful tool for constraining cosmological parameters.
NASA Astrophysics Data System (ADS)
Raghunathan, Srinivasan; Patil, Sanjaykumar; Baxter, Eric J.; Bianchini, Federico; Bleem, Lindsey E.; Crawford, Thomas M.; Holder, Gilbert P.; Manzotti, Alessandro; Reichardt, Christian L.
2017-08-01
We develop a maximum likelihood estimator (MLE) to measure the masses of galaxy clusters through the impact of gravitational lensing on the temperature and polarization anisotropies of the cosmic microwave background (CMB). We show that, at low noise levels in temperature, this optimal estimator outperforms the standard quadratic estimator by a factor of two. For polarization, we show that the Stokes Q/U maps can be used instead of the traditional E- and B-mode maps without losing information. We test and quantify the bias in the recovered lensing mass for a comprehensive list of potential systematic errors. Using realistic simulations, we examine the cluster mass uncertainties from CMB-cluster lensing as a function of an experiment's beam size and noise level. We predict the cluster mass uncertainties will be 3-6% for the SPT-3G, AdvACT, and Simons Array experiments with 10,000 clusters, and less than 1% for the CMB-S4 experiment with a sample containing 100,000 clusters. The mass constraints from CMB polarization are very sensitive to the experimental beam size and map noise level: for a factor of three reduction in either the beam size or noise level, the lensing signal-to-noise improves by roughly a factor of two.
Improving estimation of kinetic parameters in dynamic force spectroscopy using cluster analysis
NASA Astrophysics Data System (ADS)
Yen, Chi-Fu; Sivasankar, Sanjeevi
2018-03-01
Dynamic Force Spectroscopy (DFS) is a widely used technique to characterize the dissociation kinetics and interaction energy landscape of receptor-ligand complexes with single-molecule resolution. In an Atomic Force Microscope (AFM)-based DFS experiment, receptor-ligand complexes, sandwiched between an AFM tip and substrate, are ruptured at different stress rates by varying the speed at which the AFM tip and substrate are pulled away from each other. The rupture events are grouped according to their pulling speeds, and the mean force and loading rate of each group are calculated. These data are subsequently fit to established models, and energy landscape parameters such as the intrinsic off-rate (koff) and the width of the potential energy barrier (xβ) are extracted. However, due to large uncertainties in determining the mean forces and loading rates of the groups, errors in the estimated koff and xβ can be substantial. Here, we demonstrate that the accuracy of the fitted parameters in a DFS experiment can be dramatically improved by sorting rupture events into groups using cluster analysis instead of sorting them according to their pulling speeds. We test different clustering algorithms, including Gaussian mixture, logistic regression, and K-means clustering, under conditions that closely mimic DFS experiments. Using Monte Carlo simulations, we benchmark the performance of these clustering algorithms over a wide range of koff and xβ, under different levels of thermal noise, and as a function of both the number of unbinding events and the number of pulling speeds. Our results demonstrate that cluster analysis, particularly K-means clustering, is very effective in improving the accuracy of parameter estimation, especially when the number of unbinding events is limited and the events are not well separated into distinct groups. Cluster analysis is easy to implement, and our performance benchmarks serve as a guide in choosing an appropriate method for DFS data analysis.
Auplish, Aashima; Clarke, Alison S; Van Zanten, Trent; Abel, Kate; Tham, Charmaine; Bhutia, Thinlay N; Wilks, Colin R; Stevenson, Mark A; Firestone, Simon M
2017-05-01
Educational initiatives targeting at-risk populations have long been recognized as a mainstay of ongoing rabies control efforts. Cluster-based studies are often utilized to assess the levels of knowledge, attitudes and practices of a population in response to education campaigns. The design of cluster-based studies requires estimates of intra-cluster correlation coefficients obtained from previous studies. This study estimates the school-level intra-cluster correlation coefficient (ICC) for rabies knowledge change following an educational intervention program. A cross-sectional survey was conducted with 226 students from 7 schools in Sikkim, India, using cluster sampling. In order to assess knowledge uptake, rabies education sessions with pre- and post-session questionnaires were administered. Paired differences of proportions were estimated for questions answered correctly. A mixed effects logistic regression model was developed to estimate school-level and student-level ICCs and to test for associations between gender, age, school location and educational level. The school- and student-level ICCs for rabies knowledge and awareness were 0.04 (95% CI: 0.01, 0.19) and 0.05 (95% CI: 0.02, 0.09), respectively. These ICCs suggest design effect multipliers of 5.45 at the school level and 1.05 at the student level will be required when estimating sample sizes and designing future cluster randomized trials. There was a good baseline level of rabies knowledge (mean pre-session score 71%); however, key knowledge gaps were identified in understanding appropriate behavior around scared dogs, potential sources of rabies, and how to correctly order post-rabies-exposure precaution steps. After adjusting for the effects of gender, age, school location and education level, school and individual post-session test scores improved by 19%, with similar performance amongst boys and girls attending schools in urban and rural regions.
The proportion of participants that were able to correctly order post-exposure precautionary steps following educational intervention increased by 87%. The ICC estimates presented in this study will aid in designing cluster-based studies evaluating educational interventions as part of disease control programs. This study demonstrates the likely benefits of educational intervention incorporating bite prevention and rabies education. Copyright © 2017 Elsevier B.V. All rights reserved.
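The role of the ICC in planning such trials runs through the standard design effect, DEFF = 1 + (m - 1) x ICC, which inflates the sample size that simple random sampling would need when subjects come in clusters of size m. The numbers below are illustrative, not the study's own multipliers.

```python
import math

def design_effect(icc, cluster_size):
    """DEFF = 1 + (m - 1) * ICC: how much a cluster sample's variance is
    inflated relative to a simple random sample of the same size."""
    return 1 + (cluster_size - 1) * icc

def clustered_sample_size(n_srs, icc, cluster_size):
    """Sample size needed under cluster sampling, given the size n_srs
    that simple random sampling would need (rounded up)."""
    return math.ceil(n_srs * design_effect(icc, cluster_size))
```

Even a small ICC matters when clusters are large: with ICC = 0.04 and 30 students per school, the required sample roughly doubles.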
Data processing 1: Advancements in machine analysis of multispectral data
NASA Technical Reports Server (NTRS)
Swain, P. H.
1972-01-01
Multispectral data processing procedures are outlined beginning with the data display process used to accomplish data editing and proceeding through clustering, feature selection criterion for error probability estimation, and sample clustering and sample classification. The effective utilization of large quantities of remote sensing data by formulating a three stage sampling model for evaluation of crop acreage estimates represents an improvement in determining the cost benefit relationship associated with remote sensing technology.
The observed clustering of damaging extratropical cyclones in Europe
NASA Astrophysics Data System (ADS)
Cusack, Stephen
2016-04-01
The clustering of severe European windstorms on annual timescales has substantial impacts on the (re-)insurance industry. Our knowledge of the risk is limited by large uncertainties in estimates of clustering from typical historical storm data sets covering the past few decades. Eight storm data sets are gathered for analysis in this study in order to reduce these uncertainties. Six of the data sets contain more than 100 years of severe storm information to reduce sampling errors, and observational errors are reduced by the diversity of information sources and analysis methods between storm data sets. All storm severity measures used in this study reflect damage, to suit (re-)insurance applications. The shortest storm data set of 42 years provides indications of stronger clustering with severity, particularly for regions off the main storm track in central Europe and France. However, clustering estimates have very large sampling and observational errors, exemplified by large changes in estimates in central Europe upon removal of one stormy season, 1989/1990. The extended storm records place 1989/1990 into a much longer historical context to produce more robust estimates of clustering. All the extended storm data sets show increased clustering between more severe storms from return periods (RPs) of 0.5 years to the longest measured RPs of about 20 years. Further, they contain signs of stronger clustering off the main storm track, and weaker clustering for smaller-sized areas, though these signals are more uncertain as they are drawn from smaller data samples. These new ultra-long storm data sets provide new information on clustering to improve our management of this risk.
Friesen, Melissa C; Shortreed, Susan M; Wheeler, David C; Burstyn, Igor; Vermeulen, Roel; Pronk, Anjoeka; Colt, Joanne S; Baris, Dalsu; Karagas, Margaret R; Schwenn, Molly; Johnson, Alison; Armenti, Karla R; Silverman, Debra T; Yu, Kai
2015-05-01
Rule-based expert exposure assessment based on questionnaire response patterns in population-based studies improves the transparency of the decisions. The number of unique response patterns, however, can be nearly equal to the number of jobs. An expert may reduce the number of patterns that need assessment using expert opinion, but each expert may identify different patterns of responses that identify an exposure scenario. Here, hierarchical clustering methods are proposed as a systematic data reduction step to reproducibly identify similar questionnaire response patterns prior to obtaining expert estimates. As a proof-of-concept, we used hierarchical clustering methods to identify groups of jobs (clusters) with similar responses to diesel exhaust-related questions and then evaluated whether the jobs within a cluster had similar (previously assessed) estimates of occupational diesel exhaust exposure. Using the New England Bladder Cancer Study as a case study, we applied hierarchical cluster models to the diesel-related variables extracted from the occupational history and job- and industry-specific questionnaires (modules). Cluster models were separately developed for two subsets: (i) 5395 jobs with ≥1 variable extracted from the occupational history indicating a potential diesel exposure scenario, but without a module with diesel-related questions; and (ii) 5929 jobs with both occupational history and module responses to diesel-relevant questions. For each subset, we varied the number of clusters extracted from the cluster tree developed for each model from 100 to 1000 groups of jobs. Using previously made estimates of the probability (ordinal), intensity (µg m-3 respirable elemental carbon), and frequency (hours per week) of occupational exposure to diesel exhaust, we examined the similarity of the exposure estimates for jobs within the same cluster in two ways.
First, the clusters' homogeneity (defined as >75% with the same estimate) was examined for a dichotomized probability estimate (<5 versus ≥5%; <50 versus ≥50%). Second, for the ordinal probability metric and continuous intensity and frequency metrics, we calculated the intraclass correlation coefficients (ICCs) between each job's estimate and the mean estimate for all jobs within the cluster. Within-cluster homogeneity increased when more clusters were used. For example, ≥80% of the clusters were homogeneous when 500 clusters were used. Similarly, ICCs were generally above 0.7 when ≥200 clusters were used, indicating minimal within-cluster variability. The most within-cluster variability was observed for the frequency metric (ICCs from 0.4 to 0.8). We estimated that using an expert to assign exposure at the cluster level and then to review each job in non-homogeneous clusters would require ~2000 decisions per expert, in contrast to evaluating 4255 unique questionnaire patterns or 14 983 individual jobs. This proof-of-concept shows that using cluster models as a data reduction step to identify jobs with similar response patterns prior to obtaining expert ratings has the potential to aid rule-based assessment by systematically reducing the number of exposure decisions needed. While promising, additional research is needed to quantify the actual reduction in exposure decisions and the resulting homogeneity of exposure estimates within clusters for an exposure assessment effort that obtains cluster-level expert assessments as part of the assessment process. Published by Oxford University Press on behalf of the British Occupational Hygiene Society 2014.
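As a rough illustration of this data-reduction step, the sketch below clusters binary response patterns hierarchically and checks within-cluster homogeneity against a dichotomized exposure estimate. All data are synthetic, and scipy's Ward linkage is only a stand-in for whatever linkage the study used:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_response_patterns(responses, n_clusters):
    """Group jobs with similar questionnaire response patterns.
    responses: (n_jobs, n_vars) binary matrix of diesel-related answers."""
    Z = linkage(responses, method="ward")        # hierarchical cluster tree
    return fcluster(Z, t=n_clusters, criterion="maxclust")

def fraction_homogeneous(labels, estimates, threshold=0.75):
    """Share of clusters in which more than `threshold` of the jobs carry
    the same dichotomized exposure estimate."""
    cluster_ids = np.unique(labels)
    homogeneous = 0
    for c in cluster_ids:
        vals = estimates[labels == c]
        homogeneous += np.bincount(vals).max() / len(vals) > threshold
    return homogeneous / len(cluster_ids)

# Toy data: two blocks of similar response patterns with matching
# dichotomized exposure calls.
rng = np.random.default_rng(0)
block_a = (rng.random((20, 6)) < 0.9).astype(int)   # mostly "yes" answers
block_b = (rng.random((20, 6)) < 0.1).astype(int)   # mostly "no" answers
responses = np.vstack([block_a, block_b])
exposed = np.array([1] * 20 + [0] * 20)

labels = cluster_response_patterns(responses, n_clusters=2)
print(fraction_homogeneous(labels, exposed))
```

Cutting the same tree at more clusters (the study used 100 to 1000) trades fewer expert decisions for higher within-cluster homogeneity.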
Augmenting Satellite Precipitation Estimation with Lightning Information
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mahrooghy, Majid; Anantharaj, Valentine G; Younan, Nicolas H.
2013-01-01
We have used lightning information to augment the Precipitation Estimation from Remotely Sensed Imagery using an Artificial Neural Network - Cloud Classification System (PERSIANN-CCS). Co-located lightning data are used to segregate cloud patches, segmented from GOES-12 infrared data, into either electrified (EL) or non-electrified (NEL) patches. A set of features is extracted separately for the EL and NEL cloud patches. The features for the EL cloud patches include new features based on the lightning information. The cloud patches are classified and clustered using self-organizing maps (SOM). Then brightness temperature and rain rate (T-R) relationships are derived for the different clusters. Rain rates are estimated for the cloud patches based on their representative T-R relationship. The Equitable Threat Score (ETS) for daily precipitation estimates is improved by almost 12% for the winter season. In the summer, no significant improvements in ETS are noted.
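The Equitable Threat Score used in this evaluation has a standard contingency-table definition (hits, misses, and false alarms, corrected for hits expected by chance). A minimal sketch with a hypothetical rain/no-rain threshold:

```python
def equitable_threat_score(obs, fcst, threshold=1.0):
    """ETS for exceeding a precipitation threshold (rain / no rain)."""
    o = [x >= threshold for x in obs]
    f = [x >= threshold for x in fcst]
    n = len(o)
    hits = sum(a and b for a, b in zip(o, f))
    misses = sum(a and not b for a, b in zip(o, f))
    false_alarms = sum(b and not a for a, b in zip(o, f))
    hits_random = (hits + misses) * (hits + false_alarms) / n  # chance hits
    denom = hits + misses + false_alarms - hits_random
    return (hits - hits_random) / denom if denom else 0.0

# Perfect estimates give ETS = 1; a constant "no rain" estimate gives 0.
obs = [0.0, 2.0, 5.0, 0.0, 1.5, 0.0]
print(equitable_threat_score(obs, obs))  # → 1.0
```

A 12% relative improvement in this score, as reported for winter, means the lightning-aware clusters produce noticeably fewer misses and false alarms at the daily scale.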
Raghunathan, Srinivasan; Patil, Sanjaykumar; Baxter, Eric J.; ...
2017-08-25
We develop a Maximum Likelihood estimator (MLE) to measure the masses of galaxy clusters through the impact of gravitational lensing on the temperature and polarization anisotropies of the cosmic microwave background (CMB). We show that, at low noise levels in temperature, this optimal estimator outperforms the standard quadratic estimator by a factor of two. For polarization, we show that the Stokes Q/U maps can be used instead of the traditional E- and B-mode maps without losing information. We test and quantify the bias in the recovered lensing mass for a comprehensive list of potential systematic errors. Using realistic simulations, we examine the cluster mass uncertainties from CMB-cluster lensing as a function of an experiment’s beam size and noise level. We predict the cluster mass uncertainties will be 3 - 6% for SPT-3G, AdvACT, and Simons Array experiments with 10,000 clusters and less than 1% for the CMB-S4 experiment with a sample containing 100,000 clusters. The mass constraints from CMB polarization are very sensitive to the experimental beam size and map noise level: for a factor of three reduction in either the beam size or noise level, the lensing signal-to-noise improves by roughly a factor of two.
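A much-simplified, one-dimensional sketch of the idea behind such a mass estimator: model the observed map as the cluster mass times a fixed lensing template plus Gaussian map noise, and maximize the likelihood over the mass. The Gaussian profile and all numbers here are hypothetical, not the paper's pipeline:

```python
import numpy as np

# Toy model: data = M * template + noise, Gaussian likelihood in M.
rng = np.random.default_rng(1)
x = np.linspace(-5.0, 5.0, 200)
template = np.exp(-0.5 * x**2)           # hypothetical lensing profile
true_mass, noise_sigma = 2.0, 0.3
data = true_mass * template + rng.normal(0.0, noise_sigma, x.size)

masses = np.linspace(0.0, 4.0, 401)
# Log-likelihood up to a constant is -0.5 * chi^2(M).
loglike = [-0.5 * np.sum((data - m * template) ** 2) / noise_sigma**2
           for m in masses]
mass_mle = masses[int(np.argmax(loglike))]
print(mass_mle)
```

In this linear toy the MLE has a closed form, but the grid search mirrors how a likelihood can be scanned when the lensing response is not linear in mass.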
Slope angle estimation method based on sparse subspace clustering for probe safe landing
NASA Astrophysics Data System (ADS)
Li, Haibo; Cao, Yunfeng; Ding, Meng; Zhuang, Likui
2018-06-01
To avoid planetary probes landing on steep slopes where they may slip or tip over, a new slope angle estimation method based on sparse subspace clustering is proposed to improve accuracy. First, a coordinate system is defined and established to describe the measured light detection and ranging (LIDAR) data. Second, these data are processed and expressed with a sparse representation. Third, on this basis, the data are clustered to determine the subspace to which each point belongs. Fourth, after outliers in each subspace are eliminated, the remaining data points are used to fit planes. Finally, the normal vectors of the fitted planes are obtained from the plane models, and the angle between them is computed; by the underlying geometry, this angle is equal to the slope angle. The proposed method was tested in a series of experiments. The experimental results show that the method estimates the slope angle effectively and is robust to noise. Compared with other methods, it yields smaller measurement errors and higher slope angle estimation accuracy.
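The final plane-fitting and angle steps can be sketched with a generic least-squares (SVD) plane fit; this synthetic example is a stand-in for the paper's LIDAR pipeline, not a reproduction of it:

```python
import numpy as np

def fit_plane_normal(points):
    """Least-squares plane through 3-D points; returns the unit normal
    (the singular vector with the smallest singular value)."""
    centered = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centered)
    return vt[-1]

def slope_angle_deg(ground_pts, slope_pts):
    """Angle between the two fitted plane normals, in degrees."""
    n1, n2 = fit_plane_normal(ground_pts), fit_plane_normal(slope_pts)
    cos_a = abs(np.dot(n1, n2)) / (np.linalg.norm(n1) * np.linalg.norm(n2))
    return np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0)))

# Toy check: a horizontal patch versus a 30-degree ramp.
rng = np.random.default_rng(0)
xy = rng.uniform(-1, 1, size=(50, 2))
ground = np.column_stack([xy, np.zeros(50)])
ramp = np.column_stack([xy, xy[:, 0] * np.tan(np.radians(30.0))])
print(round(slope_angle_deg(ground, ramp), 2))  # → 30.0
```

In the paper, the clustering step first separates points into subspaces (e.g. ground versus slope) before each group is fitted this way.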
Graph Based Models for Unsupervised High Dimensional Data Clustering and Network Analysis
2015-01-01
The graph-based clustering algorithms we propose improve time efficiency significantly for large-scale datasets, with applications including plume detection in hyper-spectral video data. In the last chapter, we also propose an incremental reseeding algorithm.
Solav, Dana; Rubin, M B; Cereatti, Andrea; Camomilla, Valentina; Wolf, Alon
2016-04-01
Accurate estimation of the position and orientation (pose) of a bone from a cluster of skin markers is limited mostly by the relative motion between the bone and the markers, which is known as the soft tissue artifact (STA). This work presents a method, based on continuum mechanics, to describe the kinematics of a cluster affected by STA. The cluster is characterized by triangular cosserat point elements (TCPEs) defined by all combinations of three markers. The effects of the STA on the TCPEs are quantified using three parameters describing the strain in each TCPE and the relative rotation and translation between TCPEs. The method was evaluated using previously collected ex vivo kinematic data. Femur pose was estimated from 12 skin markers on the thigh, while its reference pose was measured using bone pins. Analysis revealed that instantaneous subsets of TCPEs exist which estimate bone position and orientation more accurately than the Procrustes Superimposition applied to the cluster of all markers. It has been shown that some of these parameters correlate well with femur pose errors, which suggests that they can be used to select, at each instant, subsets of TCPEs leading to an improved estimation of the underlying bone pose.
Solav, Dana; Camomilla, Valentina; Cereatti, Andrea; Barré, Arnaud; Aminian, Kamiar; Wolf, Alon
2017-09-06
The aim of this study was to analyze the accuracy of bone pose estimation based on sub-clusters of three skin-markers characterized by triangular Cosserat point elements (TCPEs) and to evaluate the capability of four instantaneous physical parameters, which can be measured non-invasively in vivo, to identify the most accurate TCPEs. Moreover, TCPE pose estimations were compared with the estimations of two least squares minimization methods applied to the cluster of all markers, using rigid body (RBLS) and homogeneous deformation (HDLS) assumptions. Analysis was performed on previously collected in vivo treadmill gait data composed of simultaneous measurements of the gold-standard bone pose by bi-plane fluoroscopy tracking the subjects' knee prosthesis and a stereophotogrammetric system tracking skin-markers affected by soft tissue artifact. Femur orientation and position errors estimated from skin-marker clusters were computed for 18 subjects using clusters of up to 35 markers. Results based on gold-standard data revealed that instantaneous subsets of TCPEs exist which estimate the femur pose with reasonable accuracy (median root mean square error during stance/swing: 1.4/2.8deg for orientation, 1.5/4.2mm for position). A non-invasive and instantaneous criterion for selecting accurate TCPEs for pose estimation (4.8/7.3deg, 5.8/12.3mm) was compared with RBLS (4.3/6.6deg, 6.9/16.6mm) and HDLS (4.6/7.6deg, 6.7/12.5mm). Accounting for homogeneous deformation, using HDLS or selected TCPEs, yielded more accurate position estimations than the RBLS method, which, conversely, yielded more accurate orientation estimations. Further investigation is required to devise effective criteria for cluster selection that could represent a significant improvement in bone pose estimation accuracy. Copyright © 2017 Elsevier Ltd. All rights reserved.
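The rigid-body least-squares (RBLS) baseline used above has a standard closed-form SVD solution (the Kabsch algorithm). A sketch on synthetic markers, recovering a known rotation and translation; the marker data are hypothetical:

```python
import numpy as np

def rigid_pose_ls(markers_ref, markers_cur):
    """Least-squares rigid-body pose (rotation R, translation t) mapping
    reference marker positions to current ones, via the SVD (Kabsch)
    solution under the rigid-body assumption."""
    ref_c = markers_ref - markers_ref.mean(axis=0)
    cur_c = markers_cur - markers_cur.mean(axis=0)
    u, _, vt = np.linalg.svd(ref_c.T @ cur_c)
    d = np.sign(np.linalg.det(u @ vt))          # guard against reflections
    R = (u @ np.diag([1.0, 1.0, d]) @ vt).T
    t = markers_cur.mean(axis=0) - R @ markers_ref.mean(axis=0)
    return R, t

# Toy cluster: rotate 12 markers by a known rotation and recover the pose.
rng = np.random.default_rng(0)
ref = rng.normal(size=(12, 3))
theta = np.radians(20.0)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
cur = ref @ R_true.T + np.array([0.1, -0.2, 0.3])
R_est, t_est = rigid_pose_ls(ref, cur)
print(np.allclose(R_est, R_true))  # → True
```

Soft tissue artifact violates the rigid-body assumption, which is why sub-cluster (TCPE) and homogeneous-deformation variants can outperform this baseline for position.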
NASA Astrophysics Data System (ADS)
Tadross, A. L.
2005-12-01
The main physical parameters of the open cluster NGC 7086 (cluster center, distance, radius, age, reddening, and visual absorption) have been re-estimated and improved. The metal abundance, galactic distances, membership richness, luminosity function, mass function, and total mass of NGC 7086 have been examined here for the first time, using the Monet et al. (2003) catalog.
McGarvey, Richard; Burch, Paul; Matthews, Janet M
2016-01-01
Natural populations of plants and animals spatially cluster because (1) suitable habitat is patchy, and (2) within suitable habitat, individuals aggregate further into clusters of higher density. We compare the precision of random and systematic field sampling survey designs under these two processes of species clustering. Second, we evaluate the performance of 13 estimators for the variance of the sample mean from a systematic survey. Replicated simulated surveys, as counts from 100 transects, allocated either randomly or systematically within the study region, were used to estimate population density in six spatial point populations including habitat patches and Matérn circular clustered aggregations of organisms, together and in combination. The standard one-start aligned systematic survey design, a uniform 10 × 10 grid of transects, was much more precise. Variances of the 10 000 replicated systematic survey mean densities were one-third to one-fifth of those from randomly allocated transects, implying transect sample sizes giving equivalent precision by random survey would need to be three to five times larger. Organisms being restricted to patches of habitat was alone sufficient to yield this precision advantage for the systematic design. But this improved precision for systematic sampling in clustered populations is underestimated by standard variance estimators used to compute confidence intervals. True variance for the survey sample mean was computed from the variance of 10 000 simulated survey mean estimates. Testing 10 published and three newly proposed variance estimators, the two variance estimators (ν) that corrected for inter-transect correlation (ν₈ and ν(W)) were the most accurate and also the most precise in clustered populations. These greatly outperformed the two "post-stratification" variance estimators (ν₂ and ν₃) that are now more commonly applied in systematic surveys.
Similar variance estimator performance rankings were found with a second differently generated set of spatial point populations, ν₈ and ν(W) again being the best performers in the longer-range autocorrelated populations. However, no systematic variance estimators tested were free from bias. On balance, systematic designs bring narrower confidence intervals in clustered populations, while random designs permit unbiased estimates of (often wider) confidence intervals. The search continues for better estimators of sampling variance for the systematic survey mean.
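The underestimation problem can be illustrated with a generic successive-difference variance estimator, which down-weights smooth spatial trends that inflate the naive simple-random-sampling formula. This is a textbook estimator used for illustration, not one of the 13 estimators the study tested (its ν₈ and ν(W) are more elaborate):

```python
import numpy as np

def var_srs(y):
    """Naive simple-random-sampling estimator of Var(mean): s^2 / n."""
    y = np.asarray(y, dtype=float)
    return y.var(ddof=1) / y.size

def var_successive_difference(y):
    """Successive-difference estimator of Var(mean) for an ordered
    (e.g. systematic) sample; differencing removes smooth trends."""
    y = np.asarray(y, dtype=float)
    n = y.size
    return np.sum(np.diff(y) ** 2) / (2.0 * (n - 1) * n)

# On a strong spatial trend, the SRS formula badly overstates the true
# variance of a systematic sample mean.
y = np.linspace(0.0, 10.0, 100) + np.random.default_rng(0).normal(0, 0.1, 100)
print(var_srs(y), var_successive_difference(y))
```

The mismatch between the two numbers mirrors the paper's finding: treating a systematic sample as if it were random makes confidence intervals too wide in spatially structured populations.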
Heudtlass, Peter; Guha-Sapir, Debarati; Speybroeck, Niko
2018-05-31
The crude death rate (CDR) is one of the defining indicators of humanitarian emergencies. When data from vital registration systems are not available, it is common practice to estimate the CDR from household surveys with cluster-sampling design. However, sample sizes are often too small to compare mortality estimates to emergency thresholds, at least in a frequentist framework. Several authors have proposed Bayesian methods for health surveys in humanitarian crises. Here, we develop an approach specifically for mortality data and cluster-sampling surveys. We describe a Bayesian hierarchical Poisson-Gamma mixture model with generic (weakly informative) priors that could be used as default in the absence of any specific prior knowledge, and compare Bayesian and frequentist CDR estimates using five different mortality datasets. We provide an interpretation of the Bayesian estimates in the context of an emergency threshold and demonstrate how to interpret parameters at the cluster level and ways in which informative priors can be introduced. With the same set of weakly informative priors, Bayesian CDR estimates are equivalent to frequentist estimates, for all practical purposes. The probability that the CDR surpasses the emergency threshold can be derived directly from the posterior of the mean of the mixing distribution. All observations in the datasets contribute to the cluster-level estimates, through the hierarchical structure of the model. In a context of sparse data, Bayesian mortality assessments have advantages over frequentist ones even when using only weakly informative priors. More informative priors offer a formal and transparent way of combining new data with existing data and expert knowledge, and can help to improve decision-making in humanitarian crises by complementing frequentist estimates.
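A single-level conjugate version of the Poisson-Gamma idea shows how a threshold-exceedance probability falls out of the posterior. The paper's model is hierarchical across clusters; the death count, person-time, and prior below are hypothetical:

```python
from scipy.stats import gamma

# Deaths ~ Poisson(rate * person_days), rate ~ Gamma(a0, b0) (shape, rate).
deaths = 45                   # hypothetical survey totals
person_days = 300_000
a0, b0 = 0.5, 0.0001          # weakly informative prior

# Conjugacy: posterior rate ~ Gamma(a0 + deaths, b0 + person_days).
a_post = a0 + deaths
b_post = b0 + person_days

# Posterior mean CDR in deaths per 10 000 person-days.
cdr_mean = 10_000 * a_post / b_post

# Probability that the CDR exceeds the common emergency threshold of
# 1 death / 10 000 persons / day.
threshold = 1.0 / 10_000
p_emergency = gamma.sf(threshold, a_post, scale=1.0 / b_post)
print(cdr_mean, p_emergency)
```

Unlike a frequentist point estimate with a confidence interval, this posterior probability directly answers "how likely is it that we are above the emergency threshold?", which is the decision-relevant question in a crisis.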
Improving cluster-based missing value estimation of DNA microarray data.
Brás, Lígia P; Menezes, José C
2007-06-01
We present a modification of the weighted K-nearest neighbours imputation method (KNNimpute) for missing values (MVs) estimation in microarray data based on the reuse of estimated data. The method was called iterative KNN imputation (IKNNimpute) as the estimation is performed iteratively using the recently estimated values. The estimation efficiency of IKNNimpute was assessed under different conditions (data type, fraction and structure of missing data) by the normalized root mean squared error (NRMSE) and the correlation coefficients between estimated and true values, and compared with that of other cluster-based estimation methods (KNNimpute and sequential KNN). We further investigated the influence of imputation on the detection of differentially expressed genes using SAM by examining the differentially expressed genes that are lost after MV estimation. The performance measures give consistent results, indicating that the iterative procedure of IKNNimpute can enhance the prediction ability of cluster-based methods in the presence of high missing rates, in non-time series experiments, and in data sets comprising both time series and non-time series data: the information in the genes having MVs is used more efficiently, and the iterative procedure allows the MV estimates to be refined. More importantly, IKNNimpute has a smaller detrimental effect on the detection of differentially expressed genes.
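The iterative reuse of estimates can be sketched as a simplified weighted-KNN loop: fill missing entries with row means, then repeatedly re-estimate each one from the k nearest rows, distances being computed on the already-filled matrix. This is an illustrative simplification, not the authors' exact IKNNimpute implementation:

```python
import numpy as np

def iknn_impute(data, k=3, n_iter=5):
    """Iterative KNN imputation sketch: re-estimate each missing value
    from the k nearest rows (Euclidean distance on the filled matrix),
    reusing previously estimated entries at every pass."""
    x = np.array(data, dtype=float)
    missing = np.isnan(x)
    row_means = np.nanmean(x, axis=1)
    filled = np.where(missing, row_means[:, None], x)   # initial guess
    for _ in range(n_iter):
        for i, j in zip(*np.nonzero(missing)):
            d = np.linalg.norm(filled - filled[i], axis=1)
            d[i] = np.inf                               # exclude self
            neighbours = np.argsort(d)[:k]
            w = 1.0 / (d[neighbours] + 1e-12)           # inverse-distance weights
            filled[i, j] = np.sum(w * filled[neighbours, j]) / np.sum(w)
    return filled

# Toy expression matrix with correlated rows; the missing entry in row 1
# should land near 3.0.
x = np.array([[1.0, 2.0, 3.0],
              [1.1, 2.1, np.nan],
              [0.9, 1.9, 2.9],
              [5.0, 6.0, 7.0]])
print(round(iknn_impute(x, k=2)[1, 2], 1))  # → 3.0
```

The first pass already lands close to the true value; later passes refine the distances (and hence neighbours and weights) using the imputed entries, which is where the method's gain at high missing rates comes from.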
NASA Technical Reports Server (NTRS)
Amis, M. L.; Martin, M. V.; Mcguire, W. G.; Shen, S. S. (Principal Investigator)
1982-01-01
This report covers studies completed in fiscal year 1981 in support of the clustering/classification and preprocessing activities of the Domestic Crops and Land Cover project. The theme throughout the study was the improvement of subanalysis district (usually county level) crop hectarage estimates, as reflected in the following three objectives: (1) to evaluate the current U.S. Department of Agriculture Statistical Reporting Service regression approach to crop area estimation as applied to the problem of obtaining subanalysis district estimates; (2) to develop and test alternative approaches to subanalysis district estimation; and (3) to develop and test preprocessing techniques for use in improving subanalysis district estimates.
Memory color assisted illuminant estimation through pixel clustering
NASA Astrophysics Data System (ADS)
Zhang, Heng; Quan, Shuxue
2010-01-01
The under-constrained nature of illuminant estimation means that certain assumptions, such as the gray world theory, are needed to resolve the problem. Including more constraints in this process may help explore the useful information in an image and improve the accuracy of the estimated illuminant, provided that the constraints hold. Based on the observation that most personal images contain one or more of the following categories: neutral objects, human beings, sky, and plants, we propose a method for illuminant estimation through the clustering of pixels of gray and three dominant memory colors: skin tone, sky blue, and foliage green. Analysis shows that samples of the above colors cluster around small areas under different illuminants, and their characteristics can be used to effectively detect pixels falling into each of the categories. The algorithm requires knowledge of the spectral sensitivity response of the camera and a spectral database consisting of the CIE standard illuminants and reflectance or radiance data for samples of the above colors.
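A stripped-down sketch of the neutral-pixel part of the idea: select pixels whose chromaticity is close to gray and average them to estimate the illuminant. The full method also clusters skin, sky, and foliage pixels and needs the camera's spectral sensitivities; all data and the tolerance below are synthetic assumptions:

```python
import numpy as np

def illuminant_from_gray_pixels(rgb, tol=0.08):
    """Estimate the scene illuminant from near-neutral pixels: keep
    pixels whose r+g+b-normalised chromaticity is close to (1/3, 1/3, 1/3),
    then average their RGB values."""
    pixels = rgb.reshape(-1, 3).astype(float)
    s = pixels.sum(axis=1, keepdims=True)
    chroma = pixels / np.where(s == 0, 1, s)
    near_gray = np.all(np.abs(chroma - 1 / 3) < tol, axis=1)
    if not near_gray.any():            # fall back to plain gray world
        return pixels.mean(axis=0)
    return pixels[near_gray].mean(axis=0)

# Toy scene: gray patches under a reddish illuminant plus a green object
# that a plain gray-world average would be biased by.
illum = np.array([1.2, 1.0, 0.8])
grays = np.outer(np.linspace(0.2, 0.9, 50), illum)
foliage = np.tile([0.1, 0.6, 0.1], (50, 1))
img = np.vstack([grays, foliage])
est = illuminant_from_gray_pixels(img)
print(est / est[1])   # RGB ratios relative to green
```

Because the foliage pixels are rejected by the chromaticity test, the recovered ratios match the reddish illuminant; extending the same clustering test to memory-color regions is what the paper adds.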
Planck 2015 results. XXIV. Cosmology from Sunyaev-Zeldovich cluster counts
NASA Astrophysics Data System (ADS)
Planck Collaboration; Ade, P. A. R.; Aghanim, N.; Arnaud, M.; Ashdown, M.; Aumont, J.; Baccigalupi, C.; Banday, A. J.; Barreiro, R. B.; Bartlett, J. G.; Bartolo, N.; Battaner, E.; Battye, R.; Benabed, K.; Benoît, A.; Benoit-Lévy, A.; Bernard, J.-P.; Bersanelli, M.; Bielewicz, P.; Bock, J. J.; Bonaldi, A.; Bonavera, L.; Bond, J. R.; Borrill, J.; Bouchet, F. R.; Bucher, M.; Burigana, C.; Butler, R. C.; Calabrese, E.; Cardoso, J.-F.; Catalano, A.; Challinor, A.; Chamballu, A.; Chary, R.-R.; Chiang, H. C.; Christensen, P. R.; Church, S.; Clements, D. L.; Colombi, S.; Colombo, L. P. L.; Combet, C.; Comis, B.; Couchot, F.; Coulais, A.; Crill, B. P.; Curto, A.; Cuttaia, F.; Danese, L.; Davies, R. D.; Davis, R. J.; de Bernardis, P.; de Rosa, A.; de Zotti, G.; Delabrouille, J.; Désert, F.-X.; Diego, J. M.; Dolag, K.; Dole, H.; Donzelli, S.; Doré, O.; Douspis, M.; Ducout, A.; Dupac, X.; Efstathiou, G.; Elsner, F.; Enßlin, T. A.; Eriksen, H. K.; Falgarone, E.; Fergusson, J.; Finelli, F.; Forni, O.; Frailis, M.; Fraisse, A. A.; Franceschi, E.; Frejsel, A.; Galeotta, S.; Galli, S.; Ganga, K.; Giard, M.; Giraud-Héraud, Y.; Gjerløw, E.; González-Nuevo, J.; Górski, K. M.; Gratton, S.; Gregorio, A.; Gruppuso, A.; Gudmundsson, J. E.; Hansen, F. K.; Hanson, D.; Harrison, D. L.; Henrot-Versillé, S.; Hernández-Monteagudo, C.; Herranz, D.; Hildebrandt, S. R.; Hivon, E.; Hobson, M.; Holmes, W. A.; Hornstrup, A.; Hovest, W.; Huffenberger, K. M.; Hurier, G.; Jaffe, A. H.; Jaffe, T. R.; Jones, W. C.; Juvela, M.; Keihänen, E.; Keskitalo, R.; Kisner, T. S.; Kneissl, R.; Knoche, J.; Kunz, M.; Kurki-Suonio, H.; Lagache, G.; Lähteenmäki, A.; Lamarre, J.-M.; Lasenby, A.; Lattanzi, M.; Lawrence, C. R.; Leonardi, R.; Lesgourgues, J.; Levrier, F.; Liguori, M.; Lilje, P. B.; Linden-Vørnle, M.; López-Caniego, M.; Lubin, P. M.; Macías-Pérez, J. F.; Maggio, G.; Maino, D.; Mandolesi, N.; Mangilli, A.; Maris, M.; Martin, P. G.; Martínez-González, E.; Masi, S.; Matarrese, S.; McGehee, P.; Meinhold, P. 
R.; Melchiorri, A.; Melin, J.-B.; Mendes, L.; Mennella, A.; Migliaccio, M.; Mitra, S.; Miville-Deschênes, M.-A.; Moneti, A.; Montier, L.; Morgante, G.; Mortlock, D.; Moss, A.; Munshi, D.; Murphy, J. A.; Naselsky, P.; Nati, F.; Natoli, P.; Netterfield, C. B.; Nørgaard-Nielsen, H. U.; Noviello, F.; Novikov, D.; Novikov, I.; Oxborrow, C. A.; Paci, F.; Pagano, L.; Pajot, F.; Paoletti, D.; Partridge, B.; Pasian, F.; Patanchon, G.; Pearson, T. J.; Perdereau, O.; Perotto, L.; Perrotta, F.; Pettorino, V.; Piacentini, F.; Piat, M.; Pierpaoli, E.; Pietrobon, D.; Plaszczynski, S.; Pointecouteau, E.; Polenta, G.; Popa, L.; Pratt, G. W.; Prézeau, G.; Prunet, S.; Puget, J.-L.; Rachen, J. P.; Rebolo, R.; Reinecke, M.; Remazeilles, M.; Renault, C.; Renzi, A.; Ristorcelli, I.; Rocha, G.; Roman, M.; Rosset, C.; Rossetti, M.; Roudier, G.; Rubiño-Martín, J. A.; Rusholme, B.; Sandri, M.; Santos, D.; Savelainen, M.; Savini, G.; Scott, D.; Seiffert, M. D.; Shellard, E. P. S.; Spencer, L. D.; Stolyarov, V.; Stompor, R.; Sudiwala, R.; Sunyaev, R.; Sutton, D.; Suur-Uski, A.-S.; Sygnet, J.-F.; Tauber, J. A.; Terenzi, L.; Toffolatti, L.; Tomasi, M.; Tristram, M.; Tucci, M.; Tuovinen, J.; Türler, M.; Umana, G.; Valenziano, L.; Valiviita, J.; Van Tent, B.; Vielva, P.; Villa, F.; Wade, L. A.; Wandelt, B. D.; Wehus, I. K.; Weller, J.; White, S. D. M.; Yvon, D.; Zacchei, A.; Zonca, A.
2016-09-01
We present cluster counts and corresponding cosmological constraints from the Planck full mission data set. Our catalogue consists of 439 clusters detected via their Sunyaev-Zeldovich (SZ) signal down to a signal-to-noise ratio of 6, and is more than a factor of 2 larger than the 2013 Planck cluster cosmology sample. The counts are consistent with those from 2013 and yield compatible constraints under the same modelling assumptions. Taking advantage of the larger catalogue, we extend our analysis to the two-dimensional distribution in redshift and signal-to-noise. We use mass estimates from two recent studies of gravitational lensing of background galaxies by Planck clusters to provide priors on the hydrostatic bias parameter, (1-b). In addition, we use lensing of cosmic microwave background (CMB) temperature fluctuations by Planck clusters as an independent constraint on this parameter. These various calibrations imply constraints on the present-day amplitude of matter fluctuations in varying degrees of tension with those from the Planck analysis of primary fluctuations in the CMB; for the lowest estimated values of (1-b) the tension is mild, only a little over one standard deviation, while it remains substantial (3.7σ) for the largest estimated value. We also examine constraints on extensions to the base flat ΛCDM model by combining the cluster and CMB constraints. The combination appears to favour non-minimal neutrino masses, but this possibility does little to relieve the overall tension because it simultaneously lowers the implied value of the Hubble parameter, thereby exacerbating the discrepancy with most current astrophysical estimates. Improving the precision of cluster mass calibrations from the current 10%-level to 1% would significantly strengthen these combined analyses and provide a stringent test of the base ΛCDM model.
Planck 2015 results: XXIV. Cosmology from Sunyaev-Zeldovich cluster counts
Ade, P. A. R.; Aghanim, N.; Arnaud, M.; ...
2016-09-20
In this work, we present cluster counts and corresponding cosmological constraints from the Planck full mission data set. Our catalogue consists of 439 clusters detected via their Sunyaev-Zeldovich (SZ) signal down to a signal-to-noise ratio of 6, and is more than a factor of 2 larger than the 2013 Planck cluster cosmology sample. The counts are consistent with those from 2013 and yield compatible constraints under the same modelling assumptions. Taking advantage of the larger catalogue, we extend our analysis to the two-dimensional distribution in redshift and signal-to-noise. We use mass estimates from two recent studies of gravitational lensing ofmore » background galaxies by Planck clusters to provide priors on the hydrostatic bias parameter, (1-b). In addition, we use lensing of cosmic microwave background (CMB) temperature fluctuations by Planck clusters as an independent constraint on this parameter. These various calibrations imply constraints on the present-day amplitude of matter fluctuations in varying degrees of tension with those from the Planck analysis of primary fluctuations in the CMB; for the lowest estimated values of (1-b) the tension is mild, only a little over one standard deviation, while it remains substantial (3.7σ) for the largest estimated value. We also examine constraints on extensions to the base flat ΛCDM model by combining the cluster and CMB constraints. The combination appears to favour non-minimal neutrino masses, but this possibility does little to relieve the overall tension because it simultaneously lowers the implied value of the Hubble parameter, thereby exacerbating the discrepancy with most current astrophysical estimates. In conclusion, improving the precision of cluster mass calibrations from the current 10%-level to 1% would significantly strengthen these combined analyses and provide a stringent test of the base ΛCDM model.« less
Planck 2015 results: XXIV. Cosmology from Sunyaev-Zeldovich cluster counts
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ade, P. A. R.; Aghanim, N.; Arnaud, M.
In this work, we present cluster counts and corresponding cosmological constraints from the Planck full mission data set. Our catalogue consists of 439 clusters detected via their Sunyaev-Zeldovich (SZ) signal down to a signal-to-noise ratio of 6, and is more than a factor of 2 larger than the 2013 Planck cluster cosmology sample. The counts are consistent with those from 2013 and yield compatible constraints under the same modelling assumptions. Taking advantage of the larger catalogue, we extend our analysis to the two-dimensional distribution in redshift and signal-to-noise. We use mass estimates from two recent studies of gravitational lensing of background galaxies by Planck clusters to provide priors on the hydrostatic bias parameter, (1-b). In addition, we use lensing of cosmic microwave background (CMB) temperature fluctuations by Planck clusters as an independent constraint on this parameter. These various calibrations imply constraints on the present-day amplitude of matter fluctuations in varying degrees of tension with those from the Planck analysis of primary fluctuations in the CMB; for the lowest estimated values of (1-b) the tension is mild, only a little over one standard deviation, while it remains substantial (3.7σ) for the largest estimated value. We also examine constraints on extensions to the base flat ΛCDM model by combining the cluster and CMB constraints. The combination appears to favour non-minimal neutrino masses, but this possibility does little to relieve the overall tension because it simultaneously lowers the implied value of the Hubble parameter, thereby exacerbating the discrepancy with most current astrophysical estimates. In conclusion, improving the precision of cluster mass calibrations from the current 10%-level to 1% would significantly strengthen these combined analyses and provide a stringent test of the base ΛCDM model.
Information Filtering via Clustering Coefficients of User-Object Bipartite Networks
NASA Astrophysics Data System (ADS)
Guo, Qiang; Leng, Rui; Shi, Kerui; Liu, Jian-Guo
The clustering coefficient of user-object bipartite networks is presented to evaluate the overlap percentage of neighbors' rating lists, which can be used to measure interest correlations among neighbor sets. The collaborative filtering (CF) information filtering algorithm evaluates a given user's interests in terms of his/her friends' opinions, and has become one of the most successful technologies for recommender systems. In this paper, distinct from the object clustering coefficient, users' clustering coefficients of user-object bipartite networks are introduced to improve the user similarity measurement. Numerical results for the MovieLens and Netflix data sets show that users' clustering effects can enhance the algorithm performance. For the MovieLens data set, the algorithmic accuracy, measured by the average ranking score, can be improved by 12.0% and the diversity can be improved by 18.2%, reaching 0.649 when the recommendation list has length 50. For the Netflix data set, the accuracy can be improved by 14.5% in the optimal case and the popularity can be reduced by 13.4% compared with the standard CF algorithm. Finally, we investigate the effect of sparsity on the performance. This work indicates that user clustering coefficients are an effective factor for measuring user similarity; meanwhile, statistical properties of user-object bipartite networks should be investigated to estimate users' tastes.
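The user clustering coefficient can be illustrated with a toy computation. The paper's exact overlap measure is not reproduced here, so the Jaccard form below, averaged over users who share at least one object, is an assumption:

```python
def user_clustering_coefficient(ratings, user):
    """Mean Jaccard overlap between the object set of `user` and the
    object sets of users sharing at least one object (the neighbor set)."""
    items_u = ratings[user]
    neighbors = [v for v, items in ratings.items()
                 if v != user and items & items_u]
    if not neighbors:
        return 0.0
    overlaps = [len(items_u & ratings[v]) / len(items_u | ratings[v])
                for v in neighbors]
    return sum(overlaps) / len(overlaps)

# Toy user-object bipartite network: three users rating four objects.
ratings = {"A": {1, 2, 3}, "B": {2, 3}, "C": {3, 4}}
cc_a = user_clustering_coefficient(ratings, "A")  # (2/3 + 1/4) / 2
```

A user whose neighbors all rate nearly the same objects gets a coefficient near 1, which is the sense in which the coefficient measures interest correlation among neighbor sets.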
Intrinsic scatter of caustic masses and hydrostatic bias: An observational study
NASA Astrophysics Data System (ADS)
Andreon, S.; Trinchieri, G.; Moretti, A.; Wang, J.
2017-10-01
All estimates of cluster mass have some intrinsic scatter and perhaps some bias relative to the true mass, even in the absence of measurement errors, caused for example by cluster triaxiality and large-scale structure. Knowledge of the bias and scatter values is fundamental for both cluster cosmology and astrophysics. In this paper we show that the intrinsic scatter of a mass proxy can be constrained by measurements of the gas fraction, because masses with higher values of intrinsic scatter with true mass produce more scattered gas fractions. Moreover, the relative bias of two mass estimates can be constrained by comparing the mean gas fraction at the same (nominal) cluster mass. Our observational study addresses the scatter between caustic (i.e., dynamically estimated) and true masses, and the relative bias of caustic and hydrostatic masses. For these purposes, we used the X-ray Unbiased Cluster Sample, a cluster sample selected independently of the intracluster medium content with reliable masses: 34 galaxy clusters in the nearby (0.050 < z < 0.135) Universe, mostly with 14 < log M500/M⊙ ≲ 14.5, and with caustic masses. We found a 35% scatter between caustic and true masses. Furthermore, we found that the relative bias between caustic and hydrostatic masses is small, 0.06 ± 0.05 dex, improving upon past measurements. The small scatter found confirms our previous measurements of a highly variable amount of feedback from cluster to cluster, which is the cause of the observed large variety of core-excised X-ray luminosities and gas masses.
Hierarchical clustering method for improved prostate cancer imaging in diffuse optical tomography
NASA Astrophysics Data System (ADS)
Kavuri, Venkaiah C.; Liu, Hanli
2013-03-01
We investigate the feasibility of trans-rectal near infrared (NIR) based diffuse optical tomography (DOT) for early detection of prostate cancer using a transrectal ultrasound (TRUS) compatible imaging probe. For this purpose, we designed a TRUS-compatible, NIR-based imaging system (780 nm), in which the photodiodes were placed on the trans-rectal probe. DC signals were recorded and used to estimate the absorption coefficient. We validated the system using laboratory phantoms. For further improvement, we also developed a hierarchical clustering method (HCM) to improve the accuracy of image reconstruction with limited prior information. We demonstrated the method using computer simulations and laboratory phantom experiments.
Integrated spectral properties of 7 galactic open clusters
NASA Astrophysics Data System (ADS)
Ahumada, A. V.; Clariá, J. J.; Bica, E.; Piatti, A. E.
2000-01-01
This paper presents flux-calibrated integrated spectra in the range 3600-9000 Å for 7 concentrated, relatively populous Galactic open clusters. We perform simultaneous estimates of age and foreground interstellar reddening by comparing the continuum distribution and line strengths of the cluster spectra with those of template cluster spectra with known parameters. For five clusters these two parameters have been determined for the first time (Ruprecht 144, BH 132, Pismis 21, Lyngå 11 and BH 217), while the results here derived for the remaining two clusters (Hogg 15 and Melotte 105) show very good agreement with previous studies based mainly on colour-magnitude diagrams. We also provide metallicity estimates for six clusters from the equivalent widths of CaII triplet and TiO features. The present cluster sample improves the age resolution around solar metal content in the cluster spectral library for population synthesis. We compare the properties of the present sample with those of clusters in similar directions. Hogg 15 and Pismis 21 are among the most reddened clusters in sectors centered at l = 270° and l = 0°, respectively. Moreover, the present results would favour an important dissolution rate of star clusters in these zones. Based on observations made at Complejo Astronómico El Leoncito, which is operated under agreement between the Consejo Nacional de Investigaciones Científicas y Técnicas de la República Argentina and the National Universities of La Plata, Córdoba and San Juan, Argentina.
NASA Technical Reports Server (NTRS)
Chapman, G. M. (Principal Investigator); Carnes, J. G.
1981-01-01
Several techniques which use clusters generated by a new clustering algorithm, CLASSY, are proposed as alternatives to random sampling to obtain greater precision in crop proportion estimation: (1) the Proportional Allocation/Relative Count Estimator (PA/RCE), which uses proportional allocation of dots to clusters on the basis of cluster size and a relative count cluster-level estimate; (2) the Proportional Allocation/Bayes Estimator (PA/BE), which uses proportional allocation of dots to clusters and a Bayesian cluster-level estimate; and (3) the Bayes Sequential Allocation/Bayesian Estimator (BSA/BE), which uses sequential allocation of dots to clusters and a Bayesian cluster-level estimate. Clustering is an effective method for making proportion estimates. It is estimated that, to obtain the same precision with random sampling as obtained by the proportional sampling of 50 dots with an unbiased estimator, samples of 85 or 166 would need to be taken if dot sets with AI labels (integrated procedure) or ground truth labels, respectively, were input. Dot reallocation provides dot sets that are unbiased. It is recommended that these proportion estimation techniques be maintained, particularly the PA/BE, because it provides the greatest precision.
NASA Astrophysics Data System (ADS)
Mokdad, Fatiha; Haddad, Boualem
2017-06-01
In this paper, two new infrared precipitation estimation approaches based on the concept of k-means clustering are first proposed, named the NAW-Kmeans and the GPI-Kmeans methods. They are then adapted to the southern Mediterranean basin, where the subtropical climate prevails. The infrared data (10.8 μm channel) acquired by the MSG-SEVIRI sensor in winter and spring 2012 are used. Tests are carried out in eight areas distributed over northern Algeria: Sebra, El Bordj, Chlef, Blida, Bordj Menael, Sidi Aich, Beni Ourthilane, and Beni Aziz. The validation is performed by comparing the estimated rainfall to rain gauge observations collected by the National Office of Meteorology in Dar El Beida (Algeria). Despite the complexity of the subtropical climate, the results indicate that the NAW-Kmeans and GPI-Kmeans approaches give satisfactory results for the considered rain rates. The proposed schemes also improve precipitation estimation performance compared with the original algorithms NAW (Negri, Adler, and Wetzel) and GPI (GOES Precipitation Index).
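K-means clustering is the shared core of both proposed schemes. A minimal one-dimensional sketch of Lloyd's algorithm on synthetic brightness temperatures follows; the deterministic initialization and the toy temperature values are illustrative assumptions, and the actual NAW-Kmeans/GPI-Kmeans rain-rate assignment rules are not reproduced:

```python
def kmeans_1d(values, k, iters=50):
    """Plain Lloyd's algorithm on scalar data, e.g. 10.8 um brightness
    temperatures; returns cluster centres and per-value labels."""
    centres = sorted(values[:k])  # simple deterministic initialization
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            j = min(range(k), key=lambda c: abs(v - centres[c]))
            groups[j].append(v)
        centres = [sum(g) / len(g) if g else centres[j]
                   for j, g in enumerate(groups)]
    labels = [min(range(k), key=lambda c: abs(v - centres[c]))
              for v in values]
    return centres, labels

# Synthetic scene: cold (likely raining) and warm (likely dry) pixels.
temps = [210.0, 212.0, 211.0, 290.0, 288.0, 292.0]
centres, labels = kmeans_1d(temps, k=2)  # centres converge to 211 and 290
```

In an IR precipitation scheme, the cold-centre cluster would then be mapped to a rain rate while the warm cluster is flagged as non-raining.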
Xing, Jian; Burkom, Howard; Moniz, Linda; Edgerton, James; Leuze, Michael; Tokars, Jerome
2009-01-01
Background The Centers for Disease Control and Prevention's (CDC's) BioSense system provides near-real time situational awareness for public health monitoring through analysis of electronic health data. Determination of anomalous spatial and temporal disease clusters is a crucial part of the daily disease monitoring task. Our study focused on finding useful anomalies at manageable alert rates according to available BioSense data history. Methods The study dataset included more than 3 years of daily counts of military outpatient clinic visits for respiratory and rash syndrome groupings. We applied four spatial estimation methods in implementations of space-time scan statistics cross-checked in Matlab and C. We compared the utility of these methods according to the resultant background cluster rate (a false alarm surrogate) and sensitivity to injected cluster signals. The comparison runs used a spatial resolution based on the facility zip code in the patient record and a finer resolution based on the residence zip code. Results Simple estimation methods that account for day-of-week (DOW) data patterns yielded a clear advantage both in background cluster rate and in signal sensitivity. A 28-day baseline gave the most robust results for this estimation; the preferred baseline is long enough to remove daily fluctuations but short enough to reflect recent disease trends and data representation. Background cluster rates were lower for the rash syndrome counts than for the respiratory counts, likely because of seasonality and the large scale of the respiratory counts. Conclusion The spatial estimation method should be chosen according to characteristics of the selected data streams. In this dataset with strong day-of-week effects, the overall best detection performance was achieved using subregion averages over a 28-day baseline stratified by weekday or weekend/holiday behavior. 
Changing the estimation method for particular scenarios involving different spatial resolution or other syndromes can yield further improvement.
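The preferred estimator, a subregion average over a 28-day baseline stratified by weekday versus weekend/holiday behavior, can be sketched as a scalar computation. The exact BioSense implementation is not public in this abstract, so the function below is a simplified assumption of that scheme:

```python
def expected_count(history, day, same_stratum, baseline=28):
    """Expected count for `day`: the mean of counts on days in the
    preceding `baseline` window that fall in the same weekday/weekend
    stratum as `day`."""
    window = range(max(0, day - baseline), day)
    peers = [history[i] for i in window if same_stratum(i, day)]
    return sum(peers) / len(peers) if peers else 0.0

def same_stratum(i, j):
    # Day 0 is taken to be a Monday; days 5 and 6 of each week are weekend.
    return (i % 7 >= 5) == (j % 7 >= 5)

# Four weeks of history: 10 visits on weekdays, 5 on weekend days.
history = [5 if i % 7 >= 5 else 10 for i in range(28)]
e = expected_count(history, 28, same_stratum)  # day 28 is a Monday -> 10.0
```

Stratifying this way prevents the routinely lower weekend counts from dragging down weekday expectations, which is the day-of-week effect the study found so important.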
NASA Astrophysics Data System (ADS)
Wang, Audrey; Price, David T.
2007-03-01
A simple integrated algorithm was developed to relate global climatology to distributions of tree plant functional types (PFT). Multivariate cluster analysis was performed to analyze the statistical homogeneity of the climate space occupied by individual tree PFTs. Forested regions identified from the satellite-based GLC2000 classification were separated into tropical, temperate, and boreal sub-PFTs for use in the Canadian Terrestrial Ecosystem Model (CTEM). Global data sets of monthly minimum temperature, growing degree days, an index of climatic moisture, and estimated PFT cover fractions were then used as variables in the cluster analysis. The statistical results for individual PFT clusters were found to be consistent with other global-scale classifications of dominant vegetation. Improving on earlier quantifications of the climatic limitations on PFT distributions, the results also demonstrated overlap of PFT cluster boundaries reflecting vegetation transitions, for example between tropical and temperate biomes. The resulting global database should provide a better basis for simulating the interaction of climate change and terrestrial ecosystem dynamics using global vegetation models.
Identifying and Assessing Interesting Subgroups in a Heterogeneous Population.
Lee, Woojoo; Alexeyenko, Andrey; Pernemalm, Maria; Guegan, Justine; Dessen, Philippe; Lazar, Vladimir; Lehtiö, Janne; Pawitan, Yudi
2015-01-01
Biological heterogeneity is common in many diseases and it is often the reason for therapeutic failures. Thus, there is great interest in classifying a disease into subtypes that have clinical significance in terms of prognosis or therapy response. One of the most popular methods to uncover unrecognized subtypes is cluster analysis. However, classical clustering methods such as k-means clustering or hierarchical clustering are not guaranteed to produce clinically interesting subtypes. This could be because the main statistical variability--the basis of cluster generation--is dominated by genes not associated with the clinical phenotype of interest. Furthermore, a strong prognostic factor might be relevant for a certain subgroup but not for the whole population; thus an analysis of the whole sample may not reveal this prognostic factor. To address these problems we investigate methods to identify and assess clinically interesting subgroups in a heterogeneous population. The identification step uses a clustering algorithm and to assess significance we use a false discovery rate- (FDR-) based measure. Under the heterogeneity condition the standard FDR estimate is shown to overestimate the true FDR value, but this is remedied by an improved FDR estimation procedure. As illustrations, two real data examples from gene expression studies of lung cancer are provided.
Rutterford, Clare; Taljaard, Monica; Dixon, Stephanie; Copas, Andrew; Eldridge, Sandra
2015-06-01
To assess the quality of reporting and accuracy of a priori estimates used in sample size calculations for cluster randomized trials (CRTs). We reviewed 300 CRTs published between 2000 and 2008. The prevalence of reporting sample size elements from the 2004 CONSORT recommendations was evaluated and a priori estimates compared with those observed in the trial. Of the 300 trials, 166 (55%) reported a sample size calculation. Only 36 of 166 (22%) reported all recommended descriptive elements. Elements specific to CRTs were the worst reported: a measure of within-cluster correlation was specified in only 58 of 166 (35%). Only 18 of 166 articles (11%) reported both a priori and observed within-cluster correlation values. Except in two cases, observed within-cluster correlation values were either close to or less than a priori values. Even with the CONSORT extension for cluster randomization, the reporting of sample size elements specific to these trials remains below that necessary for transparent reporting. Journal editors and peer reviewers should implement stricter requirements for authors to follow CONSORT recommendations. Authors should report observed and a priori within-cluster correlation values to enable comparisons between these over a wider range of trials.
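The within-cluster correlation enters a CRT sample size calculation through the standard design effect. A sketch of that textbook calculation (the formula is not taken from the review itself, and the example numbers are hypothetical):

```python
import math

def design_effect(cluster_size, icc):
    """Variance inflation factor for cluster randomization:
    DE = 1 + (m - 1) * ICC, with m the average cluster size."""
    return 1.0 + (cluster_size - 1) * icc

def clusters_per_arm(n_individual, cluster_size, icc):
    """Clusters needed per arm after inflating an individually
    randomized sample size by the design effect."""
    n_total = n_individual * design_effect(cluster_size, icc)
    return math.ceil(n_total / cluster_size)

# 200 subjects per arm under individual randomization, clusters of 20,
# and an a priori ICC of 0.05:
de = design_effect(20, 0.05)          # design effect of 1.95
k = clusters_per_arm(200, 20, 0.05)   # 20 clusters per arm
```

Because the design effect is linear in the ICC, an optimistic a priori ICC directly translates into an underpowered trial, which is why the review urges reporting both a priori and observed values.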
Effects of additional data on Bayesian clustering.
Yamazaki, Keisuke
2017-10-01
Hierarchical probabilistic models, such as mixture models, are used for cluster analysis. These models have two types of variables: observable and latent. In cluster analysis, the latent variable is estimated, and it is expected that additional information will improve the accuracy of the estimation of the latent variable. Many proposed learning methods are able to use additional data; these include semi-supervised learning and transfer learning. However, from a statistical point of view, a complex probabilistic model that encompasses both the initial and additional data might be less accurate due to having a higher-dimensional parameter. The present paper presents a theoretical analysis of the accuracy of such a model and clarifies which factor has the greatest effect on its accuracy, the advantages of obtaining additional data, and the disadvantages of increasing the complexity.
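In a mixture model, the latent cluster assignment is estimated from posterior responsibilities. A minimal E-step for a univariate Gaussian mixture illustrates the estimation of the latent variable (a generic sketch, not the paper's analytical framework):

```python
import math

def responsibilities(x, weights, means, sds):
    """E-step of a univariate Gaussian mixture: the posterior probability
    that observation x was generated by each component, i.e. the
    estimated distribution of the latent cluster variable."""
    dens = [w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
            for w, m, s in zip(weights, means, sds)]
    total = sum(dens)
    return [d / total for d in dens]

# Two equal-weight components centred at 0 and 5; an observation at 0
# is assigned to the first component with near certainty.
r = responsibilities(0.0, [0.5, 0.5], [0.0, 5.0], [1.0, 1.0])
```

Additional data would sharpen these posteriors, but only if the joint model's extra parameters are estimated well enough, which is the trade-off the paper analyzes.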
NASA Astrophysics Data System (ADS)
Pillepich, Annalisa; Porciani, Cristiano; Reiprich, Thomas H.
2012-05-01
Starting in late 2013, the eRosita telescope will survey the X-ray sky with unprecedented sensitivity. Assuming a detection limit of 50 photons in the (0.5-2.0) keV energy band with a typical exposure time of 1.6 ks, we predict that eRosita will detect ~9.3 × 10⁴ clusters of galaxies more massive than 5 × 10¹³ h⁻¹ M⊙, with the currently planned all-sky survey. Their median redshift will be z ≃ 0.35. We perform a Fisher-matrix analysis to forecast the constraining power of eRosita on the Λ cold dark matter (ΛCDM) cosmology and, simultaneously, on the X-ray scaling relations for galaxy clusters. Special attention is devoted to the possibility of detecting primordial non-Gaussianity. We consider two experimental probes: the number counts and the angular clustering of a photon-count limited sample of clusters. We discuss how the cluster sample should be split to optimize the analysis and we show that redshift information on the individual clusters is vital to break the strong degeneracies among the model parameters. For example, performing a 'tomographic' analysis based on photometric-redshift estimates and combining one- and two-point statistics will give marginal 1σ errors of Δσ8 ≃ 0.036 and ΔΩm ≃ 0.012 without priors, and improve the current estimates on the slope of the luminosity-mass relation by a factor of 3. Regarding primordial non-Gaussianity, eRosita clusters alone will give ΔfNL ≃ 9, 36 and 144 for the local, orthogonal and equilateral model, respectively. Measuring redshifts with spectroscopic accuracy would further tighten the constraints by nearly 40 per cent (barring fNL, which displays smaller improvements). Finally, combining eRosita data with the analysis of temperature anisotropies in the cosmic microwave background by the Planck satellite should give sensational constraints on both the cosmology and the properties of the intracluster medium.
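The quoted marginal 1σ errors come from inverting a Fisher matrix. For two correlated parameters the mechanics reduce to a closed-form inverse; the matrix below is a toy illustration, not the paper's actual forecast matrix:

```python
def marginal_errors_2x2(fisher):
    """Marginalized 1-sigma errors: square roots of the diagonal of the
    inverse Fisher matrix (2x2 case inverted in closed form)."""
    (a, b), (c, d) = fisher
    det = a * d - b * c
    return ((d / det) ** 0.5, (a / det) ** 0.5)

# Toy Fisher matrix with a parameter degeneracy (off-diagonal terms):
errors = marginal_errors_2x2([[100.0, 40.0], [40.0, 25.0]])
```

Without the off-diagonal terms the errors would be 1/sqrt(100) = 0.1 and 1/sqrt(25) = 0.2; the degeneracy inflates them to 1/6 and 1/3, which is why information that breaks degeneracies (here, redshifts) tightens the final constraints so effectively.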
Centre-excised X-ray luminosity as an efficient mass proxy for future galaxy cluster surveys
Mantz, Adam B.; Allen, Steven W.; Morris, R. Glenn; ...
2017-10-02
The cosmological constraining power of modern galaxy cluster catalogues can be improved by obtaining low-scatter mass proxy measurements for even a small fraction of sources. In the context of large upcoming surveys that will reveal the cluster population down to the group scale and out to high redshifts, efficient strategies for obtaining such mass proxies will be valuable. Here we use high-quality weak-lensing and X-ray mass estimates for massive clusters in current X-ray-selected catalogues to revisit the scaling relations of the projected, centre-excised X-ray luminosity (Lce), which previous work suggests correlates tightly with total mass. Our data confirm that this is the case, with Lce having an intrinsic scatter at fixed mass comparable to that of gas mass, temperature or YX. Compared to the other proxies, however, Lce is less susceptible to systematic uncertainties due to background modelling, and can be measured precisely with shorter exposures. This opens up the possibility of using Lce to estimate masses for large numbers of clusters discovered by new X-ray surveys (e.g. eROSITA) directly from the survey data, as well as for clusters discovered at other wavelengths with relatively short follow-up observations. We describe a simple procedure for making such estimates from X-ray surface brightness data, and comment on the spatial resolution required to apply this method as a function of cluster mass and redshift. Lastly, we also explore the potential impact of Chandra and XMM-Newton follow-up observations over the next decade on dark energy constraints from new cluster surveys.
Multiple Regression Redshift Calibration for Clusters of Galaxies
NASA Astrophysics Data System (ADS)
Kalinkov, M.; Kuneva, I.; Valtchanov, I.
A new procedure for calibration of distances to ACO (Abell et al. 1989) clusters of galaxies has been developed. In the previous version of the Reference Catalog of ACO Clusters of Galaxies (Kalinkov & Kuneva 1992), an attempt was made to compare various calibration schemes. For Version 93 we have made some refinements. Many improvements have been made since the early days of photometric calibration --- from Rowan-Robinson (1972), Corwin (1974), Kalinkov & Kuneva (1975) and Mills & Hoskins (1977) to more complicated schemes --- Leir & van den Bergh (1977), Postman et al. (1985), Kalinkov & Kuneva (1985, 1986, 1990), Scaramella et al. (1991), Zucca et al. (1993). It was shown that it is impossible to use the same calibration relation for northern (A) and southern (ACO) clusters of galaxies; therefore the calibration has to be made separately for each catalog. Moreover, it is better if one can find relations for the 274 A-clusters studied by the authors of ACO. We use the luminosity distance for H0 = 100 km/s/Mpc and q0 = 0.5, and we have 1200 clusters with measured redshifts. The first step is to fit log(z) on m10 (the magnitude of the tenth-ranked galaxy) for A-clusters and on m1, m3 and m10 for ACO clusters. The second step is to take into account the K-correction and the Scott effect (Postman et al. 1985) with an iterative process. To avoid the initial errors of the redshift estimates in the A and ACO catalogs, we adopt Hubble's law for the apparent radial distribution of galaxies in clusters. This enables us to calculate a new cluster richness from a preliminary redshift estimate. This is the third step. The study then continues with the correlation matrix between log(z) and prospective predictors --- new richness groups, BM, RS and A types, radio and X-ray fluxes, apparent separations between the first three brightest galaxies, and mean population (gal/sq.deg). Multiple linear as well as nonlinear regression estimators are found.
Many clusters that deviate by more than 2.5 sigmas are rejected. Each case is examined for observational errors, substructuring, foreground and background. Some of the clusters are doubtful --- most probably they have to be excluded from the catalogs. The multiple regressions allow us to estimate redshift in the range 0.02 to 0.2 with an error of 7 percent.
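The first calibration step, regressing log(z) on the magnitude of the tenth-ranked galaxy, is ordinary least squares. A sketch with made-up magnitudes and redshifts (the actual catalog data and coefficients are not reproduced here):

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

# Hypothetical calibration points: m10 magnitudes against log10(z).
m10 = [14.0, 15.0, 16.0, 17.0]
logz = [-1.60, -1.45, -1.30, -1.15]
a, b = fit_line(m10, logz)  # slope 0.15, intercept -3.7
```

The iterative K-correction and Scott-effect steps would then adjust the fitted magnitudes and refit, and the 2.5σ outlier rejection described above operates on the residuals of fits like this one.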
Using scan statistics for congenital anomalies surveillance: the EUROCAT methodology.
Teljeur, Conor; Kelly, Alan; Loane, Maria; Densem, James; Dolk, Helen
2015-11-01
Scan statistics have been used extensively to identify temporal clusters of health events. We describe the temporal cluster detection methodology adopted by the EUROCAT (European Surveillance of Congenital Anomalies) monitoring system. Since 2001, EUROCAT has implemented a variable window width scan statistic for detecting unusual temporal aggregations of congenital anomaly cases. The scan windows are based on numbers of cases rather than being defined by time. The methodology is embedded in the EUROCAT Central Database for annual application to centrally held registry data. The methodology was incrementally adapted to improve its utility and to address statistical issues. Simulation exercises were used to determine the power of the methodology to identify periods of raised risk (of 1-18 months). In order to operationalize the scan methodology, a number of adaptations were needed, including: estimating date of conception as the unit of time; deciding the maximum length (in time) and recency of clusters of interest; reporting of multiple and overlapping significant clusters; replacing the Monte Carlo simulation with a lookup table to reduce computation time; and placing a threshold on underlying population change and estimating the false positive rate by simulation. Exploration of power found that raised-risk periods lasting 1 month are unlikely to be detected except when the relative risk and case counts are high. The variable window width scan statistic is a useful tool for the surveillance of congenital anomalies. Numerous adaptations have improved the utility of the original methodology in the context of temporal cluster detection in congenital anomalies.
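Defining scan windows by numbers of cases rather than by time amounts to sliding a fixed case count along the sorted dates and flagging windows whose time span is unusually short. A stripped-down sketch of that windowing step (the EUROCAT significance testing against a null rate is not reproduced, and the dates are hypothetical):

```python
from datetime import date

def shortest_window(case_dates, window_cases):
    """Slide a window of `window_cases` consecutive cases along the
    sorted dates; return (span_in_days, start, end) for the tightest
    window, i.e. the strongest candidate temporal cluster."""
    ds = sorted(case_dates)
    best = None
    for i in range(len(ds) - window_cases + 1):
        span = (ds[i + window_cases - 1] - ds[i]).days
        if best is None or span < best[0]:
            best = (span, ds[i], ds[i + window_cases - 1])
    return best

# Five cases; three fall within the first three days of January.
cases = [date(2014, 1, 1), date(2014, 1, 2), date(2014, 1, 3),
         date(2014, 3, 1), date(2014, 6, 1)]
span, start, end = shortest_window(cases, window_cases=3)  # span of 2 days
```

In the full method this span would be compared against its null distribution (via a lookup table rather than Monte Carlo, as described above) to decide significance.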
Solution of the sign problem in the Potts model at fixed fermion number
NASA Astrophysics Data System (ADS)
Alexandru, Andrei; Bergner, Georg; Schaich, David; Wenger, Urs
2018-06-01
We consider the heavy-dense limit of QCD at finite fermion density in the canonical formulation and approximate it by a three-state Potts model. In the strong-coupling limit, the model is free of the sign problem. Away from strong coupling, the sign problem is solved by employing a cluster algorithm which allows each cluster to be averaged over the Z(3) sectors. Improved estimators for physical quantities can be constructed by taking into account the triality of the clusters, that is, their transformation properties with respect to Z(3) transformations.
State estimation and prediction using clustered particle filters.
Lee, Yoonsang; Majda, Andrew J
2016-12-20
Particle filtering is an essential tool to improve uncertain model predictions by incorporating noisy observational data from complex systems including non-Gaussian features. A class of particle filters, clustered particle filters, is introduced for high-dimensional nonlinear systems, which uses relatively few particles compared with the standard particle filter. The clustered particle filter captures non-Gaussian features of the true signal, which are typical in complex nonlinear dynamical systems such as geophysical systems. The method is also robust in the difficult regime of high-quality sparse and infrequent observations. The key features of the clustered particle filtering are coarse-grained localization through the clustering of the state variables and particle adjustment to stabilize the method; each observation affects only neighbor state variables through clustering and particles are adjusted to prevent particle collapse due to high-quality observations. The clustered particle filter is tested for the 40-dimensional Lorenz 96 model with several dynamical regimes including strongly non-Gaussian statistics. The clustered particle filter shows robust skill in both achieving accurate filter results and capturing non-Gaussian statistics of the true signal. It is further extended to multiscale data assimilation, which provides the large-scale estimation by combining a cheap reduced-order forecast model and mixed observations of the large- and small-scale variables. This approach enables the use of a larger number of particles due to the computational savings in the forecast model. The multiscale clustered particle filter is tested for one-dimensional dispersive wave turbulence using a forecast model with model errors.
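The particle adjustment guarding against collapse under high-quality observations can be illustrated with a plain bootstrap update step. This is a generic one-dimensional sketch, not the clustered localization scheme of the paper:

```python
import math
import random

def assimilate(particles, weights, obs, obs_sd, resample_frac=0.5):
    """One bootstrap-filter step: reweight particles by a Gaussian
    observation likelihood, then resample if the effective sample size
    falls below a fraction of the ensemble (a standard guard against
    particle collapse)."""
    w = [wi * math.exp(-0.5 * ((obs - p) / obs_sd) ** 2)
         for p, wi in zip(particles, weights)]
    total = sum(w)
    w = [wi / total for wi in w]
    ess = 1.0 / sum(wi * wi for wi in w)  # effective sample size
    if ess < resample_frac * len(particles):
        particles = random.choices(particles, weights=w, k=len(particles))
        w = [1.0 / len(particles)] * len(particles)
    return particles, w

# Two particles, observation near the second: its weight grows.
# resample_frac=0.0 disables resampling so the step is deterministic.
_, w = assimilate([0.0, 1.0], [0.5, 0.5], obs=1.0, obs_sd=1.0,
                  resample_frac=0.0)
```

The clustered filter of the paper goes further by localizing this update, so that each observation reweights only the particles' neighboring state variables, but the collapse-avoidance logic it stabilizes is of this kind.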
NASA Astrophysics Data System (ADS)
Cui, Jia; Hong, Bei; Jiang, Xuepeng; Chen, Qinghua
2017-05-01
To reinforce the correlation analysis of threat factors in risk assessment, a dynamic safety-risk assessment method based on particle filtering is proposed, with threat analysis at its core. Based on risk assessment standards, the method selects threat indicators, applies a particle filtering algorithm to calculate the influence weight of each indicator, and determines information system risk levels by combining these weights with state estimation theory. To improve the computational efficiency of the particle filtering algorithm, the k-means clustering algorithm is introduced: all particles are clustered, and each cluster's centroid serves as its representative in subsequent operations, reducing the computational load. Empirical results indicate that the method reasonably captures the mutual dependence and influence among risk elements. Under circumstances of limited information, it provides a scientific basis for formulating a risk management control strategy.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rozo, Eduardo; /U. Chicago /Chicago U., KICP; Wu, Hao-Yi
2011-11-04
When extracting the weak lensing shear signal, one may employ either locally normalized or globally normalized shear estimators. The former is the standard approach when estimating cluster masses, while the latter is the more common method among peak finding efforts. While both approaches have identical signal-to-noise in the weak lensing limit, it is possible that higher order corrections or systematic considerations make one estimator preferable over the other. In this paper, we consider the efficacy of both estimators within the context of stacked weak lensing mass estimation in the Dark Energy Survey (DES). We find that the two estimators have nearly identical statistical precision, even after including higher order corrections, but that these corrections must be incorporated into the analysis to avoid observationally relevant biases in the recovered masses. We also demonstrate that finite bin-width effects may be significant if not properly accounted for, and that the two estimators exhibit different systematics, particularly with respect to contamination of the source catalog by foreground galaxies. Thus, the two estimators may be employed as a systematic cross-check of each other. Stacked weak lensing in the DES should allow for the mean mass of galaxy clusters to be calibrated to ≈2% precision (statistical only), which can improve the figure of merit of the DES cluster abundance experiment by a factor of ≈3 relative to the self-calibration expectation. A companion paper investigates how the two types of estimators considered here impact weak lensing peak finding efforts.
Identifying and Assessing Interesting Subgroups in a Heterogeneous Population
Lee, Woojoo; Alexeyenko, Andrey; Pernemalm, Maria; Guegan, Justine; Dessen, Philippe; Lazar, Vladimir; Lehtiö, Janne; Pawitan, Yudi
2015-01-01
Biological heterogeneity is common in many diseases and it is often the reason for therapeutic failures. Thus, there is great interest in classifying a disease into subtypes that have clinical significance in terms of prognosis or therapy response. One of the most popular methods to uncover unrecognized subtypes is cluster analysis. However, classical clustering methods such as k-means clustering or hierarchical clustering are not guaranteed to produce clinically interesting subtypes. This could be because the main statistical variability—the basis of cluster generation—is dominated by genes not associated with the clinical phenotype of interest. Furthermore, a strong prognostic factor might be relevant for a certain subgroup but not for the whole population; thus an analysis of the whole sample may not reveal this prognostic factor. To address these problems we investigate methods to identify and assess clinically interesting subgroups in a heterogeneous population. The identification step uses a clustering algorithm and to assess significance we use a false discovery rate- (FDR-) based measure. Under the heterogeneity condition the standard FDR estimate is shown to overestimate the true FDR value, but this is remedied by an improved FDR estimation procedure. As illustrations, two real data examples from gene expression studies of lung cancer are provided. PMID:26339613
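The standard FDR estimate that this abstract shows to be biased under heterogeneity is, in its simplest form, the Benjamini-Hochberg step-up procedure. A minimal sketch of that baseline (not the paper's improved estimator) is:

```python
def benjamini_hochberg(pvals, q=0.05):
    """Standard Benjamini-Hochberg step-up procedure: returns the indices
    of hypotheses rejected at FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    # find the largest rank k with p_(k) <= q * k / m
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= q * rank / m:
            k_max = rank
    # reject the k_max smallest p-values
    return sorted(order[:k_max])
```

For example, with fifteen p-values of which the four smallest fall below their step-up thresholds at q = 0.05, exactly those four hypotheses are rejected.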
Blier, Pierre; Gommoll, Carl; Chen, Changzheng; Kramer, Kenneth
2017-03-01
To evaluate the effects of levomilnacipran extended-release (LVM-ER; 40-120 mg/day) on noradrenergic (NA) and anxiety-related symptoms in adults with major depressive disorder (MDD) and explore the relationship between these symptoms and functional impairment. Data were pooled from 5 randomized, double-blind, placebo-controlled trials (N=2598). Anxiety and NA Cluster scores were developed by adding selected item scores from the Montgomery-Åsberg Depression Rating Scale (MADRS) and 17-item Hamilton Depression Rating Scale (HAMD-17). A path analysis was conducted to estimate the direct effects of LVM-ER on functional impairment (Sheehan Disability Scale [SDS] total score) and the indirect effects through changes in NA and Anxiety Cluster scores. Mean improvements from baseline in NA and Anxiety Cluster scores were significantly greater with LVM-ER versus placebo (both P<0.001), as were the response rates (≥50% score improvement): NA Cluster (44% vs 34%; odds ratio=1.56; P<0.0001); Anxiety Cluster (39% vs 36%; odds ratio=1.19; P=0.041). Mean improvement in SDS total score was also significantly greater with LVM-ER versus placebo (-7.3 vs -5.6; P<0.0001). LVM-ER had an indirect effect on change in SDS total score that was mediated more strongly through NA Cluster score change (86%) than Anxiety Cluster score change (18%); the direct effect was negligible. NA and Anxiety Cluster scores, developed based on the face validity of individual MADRS and HAMD-17 items, were not predefined as efficacy outcomes in any of the studies. In adults with MDD, LVM-ER indirectly improved functional impairment mainly through improvements in NA symptoms and less so via anxiety symptoms. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
A new estimate of the Hubble constant using the Virgo cluster distance
NASA Astrophysics Data System (ADS)
Visvanathan, N.
The Hubble constant, which defines the size and age of the universe, remains substantially uncertain. Attention is presently given to an improved distance to the Virgo Cluster obtained by means of the 1.05-micron luminosity-H I width relation of spirals. In order to improve the absolute calibration of the relation, accurate distances to the nearby SMC, LMC, N6822, SEX A and N300 galaxies have also been obtained, on the basis of the near-IR P-L relation of the Cepheids. A value for the global Hubble constant of 67 ± 4 km/s per Mpc is obtained.
As-built design specification for proportion estimate software subsystem
NASA Technical Reports Server (NTRS)
Obrien, S. (Principal Investigator)
1980-01-01
The Proportion Estimate Processor evaluates four estimation techniques in order to get an improved estimate of the proportion of a scene that is planted in a selected crop. The four techniques to be evaluated were provided by the techniques development section and are: (1) random sampling; (2) proportional allocation, relative count estimate; (3) proportional allocation, Bayesian estimate; and (4) sequential Bayesian allocation. The user is given two options for computation of the estimated mean square error. These are referred to as the cluster calculation option and the segment calculation option. The software for the Proportion Estimate Processor is operational on the IBM 3031 computer.
The ellipticity of galaxy cluster haloes from satellite galaxies and weak lensing
Shin, Tae-hyeon; Clampitt, Joseph; Jain, Bhuvnesh; ...
2018-01-04
Here, we study the ellipticity of galaxy cluster haloes as characterized by the distribution of cluster galaxies and as measured with weak lensing. We use Monte Carlo simulations of elliptical cluster density profiles to estimate and correct for Poisson noise bias, edge bias and projection effects. We apply our methodology to 10 428 Sloan Digital Sky Survey clusters identified by the redMaPPer algorithm with richness above 20. We find a mean ellipticity = 0.271 ± 0.002 (stat) ± 0.031 (sys), corresponding to an axis ratio = 0.573 ± 0.002 (stat) ± 0.039 (sys). We compare this ellipticity of the satellites to the halo shape, through a stacked lensing measurement using optimal estimators of the lensing quadrupole based on Clampitt and Jain (2016). We find a best-fitting axis ratio of 0.56 ± 0.09 (stat) ± 0.03 (sys), consistent with the ellipticity of the satellite distribution. Thus, cluster galaxies trace the shape of the dark matter halo to within our estimated uncertainties. Finally, we restack the satellite and lensing ellipticity measurements along the major axis of the cluster central galaxy's light distribution. From the lensing measurements, we infer a misalignment angle with a root-mean-square of 30° ± 10° when stacking on the central galaxy. We discuss applications of halo shape measurements to test the effects of the baryonic gas and active galactic nucleus feedback, as well as dark matter and gravity. The major improvements in signal-to-noise ratio expected with the ongoing Dark Energy Survey and future surveys from Large Synoptic Survey Telescope, Euclid, and Wide Field Infrared Survey Telescope will make halo shapes a useful probe of these effects.
The ellipticity of galaxy cluster haloes from satellite galaxies and weak lensing
NASA Astrophysics Data System (ADS)
Shin, Tae-hyeon; Clampitt, Joseph; Jain, Bhuvnesh; Bernstein, Gary; Neil, Andrew; Rozo, Eduardo; Rykoff, Eli
2018-04-01
We study the ellipticity of galaxy cluster haloes as characterized by the distribution of cluster galaxies and as measured with weak lensing. We use Monte Carlo simulations of elliptical cluster density profiles to estimate and correct for Poisson noise bias, edge bias and projection effects. We apply our methodology to 10 428 Sloan Digital Sky Survey clusters identified by the redMaPPer algorithm with richness above 20. We find a mean ellipticity = 0.271 ± 0.002 (stat) ± 0.031 (sys), corresponding to an axis ratio = 0.573 ± 0.002 (stat) ± 0.039 (sys). We compare this ellipticity of the satellites to the halo shape, through a stacked lensing measurement using optimal estimators of the lensing quadrupole based on Clampitt and Jain (2016). We find a best-fitting axis ratio of 0.56 ± 0.09 (stat) ± 0.03 (sys), consistent with the ellipticity of the satellite distribution. Thus, cluster galaxies trace the shape of the dark matter halo to within our estimated uncertainties. Finally, we restack the satellite and lensing ellipticity measurements along the major axis of the cluster central galaxy's light distribution. From the lensing measurements, we infer a misalignment angle with a root-mean-square of 30° ± 10° when stacking on the central galaxy. We discuss applications of halo shape measurements to test the effects of the baryonic gas and active galactic nucleus feedback, as well as dark matter and gravity. The major improvements in signal-to-noise ratio expected with the ongoing Dark Energy Survey and future surveys from Large Synoptic Survey Telescope, Euclid, and Wide Field Infrared Survey Telescope will make halo shapes a useful probe of these effects.
MODEL-FREE MULTI-PROBE LENSING RECONSTRUCTION OF CLUSTER MASS PROFILES
DOE Office of Scientific and Technical Information (OSTI.GOV)
Umetsu, Keiichi
2013-05-20
Lens magnification by galaxy clusters induces characteristic spatial variations in the number counts of background sources, amplifying their observed fluxes and expanding the area of sky, the net effect of which, known as magnification bias, depends on the intrinsic faint-end slope of the source luminosity function. The bias is strongly negative for red galaxies, dominated by the geometric area distortion, whereas it is mildly positive for blue galaxies, enhancing the blue counts toward the cluster center. We generalize the Bayesian approach of Umetsu et al. for reconstructing projected cluster mass profiles, by incorporating multiple populations of background sources for magnification-bias measurements and combining them with complementary lens-distortion measurements, effectively breaking the mass-sheet degeneracy and improving the statistical precision of cluster mass measurements. The approach can be further extended to include strong-lensing projected mass estimates, thus allowing for non-parametric absolute mass determinations in both the weak and strong regimes. We apply this method to our recent CLASH lensing measurements of MACS J1206.2-0847, and demonstrate how combining multi-probe lensing constraints can improve the reconstruction of cluster mass profiles. This method will also be useful for a stacked lensing analysis, combining all lensing-related effects in the cluster regime, for a definitive determination of the averaged mass profile.
A nonparametric clustering technique which estimates the number of clusters
NASA Technical Reports Server (NTRS)
Ramey, D. B.
1983-01-01
In applications of cluster analysis, one usually needs to determine the number of clusters, K, and the assignment of observations to each cluster. A clustering technique based on recursive application of a multivariate test of bimodality which automatically estimates both K and the cluster assignments is presented.
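The recursive idea in this abstract can be sketched with a much simpler stand-in: a 1-D 2-means split that is accepted only when a crude separation criterion holds, recursing on each half. The criterion below (centroid gap versus pooled within-cluster spread) is an assumption for illustration; the paper uses a proper multivariate test of bimodality.

```python
def split_2means(points, iters=25):
    """One-dimensional 2-means split; returns (left, right, c0, c1)."""
    c0, c1 = min(points), max(points)
    for _ in range(iters):
        left = [p for p in points if abs(p - c0) <= abs(p - c1)]
        right = [p for p in points if abs(p - c0) > abs(p - c1)]
        if left:
            c0 = sum(left) / len(left)
        if right:
            c1 = sum(right) / len(right)
    return left, right, c0, c1

def recursive_cluster(points, min_size=5):
    """Recursively split while a crude bimodality criterion holds:
    between-centroid gap > 2 * pooled within-cluster spread.
    Returns a list of clusters, so K is estimated automatically."""
    left, right, c0, c1 = split_2means(points)
    if len(left) < min_size or len(right) < min_size:
        return [points]
    def spread(xs, c):
        return (sum((x - c) ** 2 for x in xs) / len(xs)) ** 0.5
    pooled = (spread(left, c0) + spread(right, c1)) / 2
    if abs(c1 - c0) <= 2 * max(pooled, 1e-12):
        return [points]          # looks unimodal: stop splitting
    return recursive_cluster(left, min_size) + recursive_cluster(right, min_size)
```

Both K and the assignments fall out of the recursion, which is the structural point of the technique.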
Austin, Peter C
2010-04-22
Multilevel logistic regression models are increasingly being used to analyze clustered data in medical, public health, epidemiological, and educational research. Procedures for estimating the parameters of such models are available in many statistical software packages. There is currently little evidence on the minimum number of clusters necessary to reliably fit multilevel regression models. We conducted a Monte Carlo study to compare the performance of different statistical software procedures for estimating multilevel logistic regression models when the number of clusters was low. We examined procedures available in BUGS, HLM, R, SAS, and Stata. We found that there were qualitative differences in the performance of different software procedures for estimating multilevel logistic models when the number of clusters was low. Among the likelihood-based procedures, estimation methods based on adaptive Gauss-Hermite approximations to the likelihood (glmer in R and xtlogit in Stata) or adaptive Gaussian quadrature (Proc NLMIXED in SAS) tended to have superior performance for estimating variance components when the number of clusters was small, compared to software procedures based on penalized quasi-likelihood. However, only Bayesian estimation with BUGS allowed for accurate estimation of variance components when there were fewer than 10 clusters. For all statistical software procedures, estimation of variance components tended to be poor when there were only five subjects per cluster, regardless of the number of clusters.
Fretheim, Atle; Soumerai, Stephen B; Zhang, Fang; Oxman, Andrew D; Ross-Degnan, Dennis
2013-08-01
We reanalyzed the data from a cluster-randomized controlled trial (C-RCT) of a quality improvement intervention for prescribing antihypertensive medication. Our objective was to estimate the effectiveness of the intervention using both interrupted time-series (ITS) and RCT methods, and to compare the findings. We first conducted an ITS analysis using data only from the intervention arm of the trial because our main objective was to compare the findings from an ITS analysis with the findings from the C-RCT. We used segmented regression methods to estimate changes in level or slope coincident with the intervention, controlling for baseline trend. We analyzed the C-RCT data using generalized estimating equations. Last, we estimated the intervention effect by including data from both study groups and by conducting a controlled ITS analysis of the difference between the slope and level changes in the intervention and control groups. The estimates of absolute change resulting from the intervention were ITS analysis, 11.5% (95% confidence interval [CI]: 9.5, 13.5); C-RCT, 9.0% (95% CI: 4.9, 13.1); and the controlled ITS analysis, 14.0% (95% CI: 8.6, 19.4). ITS analysis can provide an effect estimate that is concordant with the results of a cluster-randomized trial. A broader range of comparisons from other RCTs would help to determine whether these are generalizable results. Copyright © 2013 Elsevier Inc. All rights reserved.
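The segmented-regression half of this comparison can be sketched as an OLS fit with level-change and slope-change terms at the intervention time. This is a minimal sketch on a plain time index, not the authors' analysis (which also handled baseline trend and correlated errors).

```python
def solve(A, rhs):
    """Gaussian elimination with partial pivoting for a small linear system."""
    n = len(A)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def segmented_regression(y, t0):
    """OLS fit of y_t = b0 + b1*t + b2*post_t + b3*(t - t0)*post_t,
    where post_t = 1 for t >= t0. b2 is the change in level and b3 the
    change in slope coincident with the intervention at time t0."""
    n = len(y)
    X = [[1.0, float(t), float(t >= t0), float((t - t0) * (t >= t0))]
         for t in range(n)]
    XtX = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(4)]
           for r in range(4)]
    Xty = [sum(X[i][r] * y[i] for i in range(n)) for r in range(4)]
    return solve(XtX, Xty)
```

On noise-free synthetic data with a known jump and slope break, the fit recovers the level change (b2) and slope change (b3) exactly.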
Comparative study of feature selection with ensemble learning using SOM variants
NASA Astrophysics Data System (ADS)
Filali, Ameni; Jlassi, Chiraz; Arous, Najet
2017-03-01
Ensemble learning has succeeded in improving stability and clustering accuracy, but its runtime prohibits scaling up to real-world applications. This study addresses the problem of selecting a subset of the most pertinent features for every cluster from a dataset. The proposed method is an extension of the Random Forests approach, using self-organizing map (SOM) variants on unlabeled data, that estimates out-of-bag feature importance from a set of partitions. Every partition is created using a different bootstrap sample and a random subset of the features. We then show that the internal estimates used to measure variable importance in Random Forests are also applicable to feature selection in unsupervised learning. The approach aims at dimensionality reduction, visualization, and cluster characterization at the same time. We provide empirical results on nineteen benchmark data sets indicating that RFS can lead to significant improvement in terms of clustering accuracy over several state-of-the-art unsupervised methods, with a very limited subset of features. The approach shows promise for application to very broad domains.
Comparison of four statistical and machine learning methods for crash severity prediction.
Iranitalab, Amirfarrokh; Khattak, Aemal
2017-11-01
Crash severity prediction models enable different agencies to predict the severity of a reported crash with unknown severity or the severity of crashes that may be expected to occur sometime in the future. This paper had three main objectives: comparison of the performance of four statistical and machine learning methods including Multinomial Logit (MNL), Nearest Neighbor Classification (NNC), Support Vector Machines (SVM) and Random Forests (RF), in predicting traffic crash severity; developing a crash costs-based approach for comparison of crash severity prediction methods; and investigating the effects of data clustering methods comprising K-means Clustering (KC) and Latent Class Clustering (LCC), on the performance of crash severity prediction models. The 2012-2015 reported crash data from Nebraska, United States was obtained and two-vehicle crashes were extracted as the analysis data. The dataset was split into training/estimation (2012-2014) and validation (2015) subsets. The four prediction methods were trained/estimated using the training/estimation dataset and the correct prediction rates for each crash severity level, overall correct prediction rate and a proposed crash costs-based accuracy measure were obtained for the validation dataset. The correct prediction rates and the proposed approach showed NNC had the best prediction performance overall and in more severe crashes. RF and SVM had the next best performance, and MNL was the weakest method. Data clustering did not affect the prediction results of SVM, but KC improved the prediction performance of MNL, NNC and RF, while LCC caused improvement in MNL and RF but weakened the performance of NNC. The overall correct prediction rate gave almost the exact opposite results compared with the proposed approach, showing that neglecting crash costs can lead to misjudgment in choosing the right prediction method. Copyright © 2017 Elsevier Ltd. All rights reserved.
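The contrast between plain accuracy and a cost-based measure can be sketched as follows. The severity costs here are hypothetical placeholders, not the paper's figures: the point is only that weighting correct predictions by crash cost can reverse a ranking based on raw accuracy.

```python
def plain_accuracy(actual, predicted):
    """Fraction of crashes whose severity level was predicted correctly."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

def crash_cost_accuracy(actual, predicted, costs):
    """Cost-based accuracy: fraction of total crash cost attributable to
    crashes whose severity was correctly predicted. `costs` maps each
    severity level to a (hypothetical) monetary cost."""
    total = sum(costs[a] for a in actual)
    correct = sum(costs[a] for a, p in zip(actual, predicted) if a == p)
    return correct / total
```

A model that always predicts the lowest severity can look good on plain accuracy while scoring poorly on cost-based accuracy, because the rare severe crashes it misses carry most of the cost.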
NASA Astrophysics Data System (ADS)
Vavagiakis, Eve Marie; De Bernardis, Francesco; Aiola, Simone; Battaglia, Nicholas; Niemack, Michael D.; ACTPol Collaboration
2017-06-01
We have made improved measurements of the kinematic Sunyaev-Zel’dovich (kSZ) effect using data from the Atacama Cosmology Telescope (ACT) and the Baryon Oscillation Spectroscopic Survey (BOSS). We used a map of the Cosmic Microwave Background (CMB) from two seasons of observations each by ACT and the Atacama Cosmology Telescope Polarimeter (ACTPol) receiver. We evaluated the mean pairwise baryon momentum associated with the positions of 50,000 bright galaxies in the BOSS DR11 Large Scale Structure catalog over 600 square degrees of overlapping sky area. The measurement of the kSZ signal arising from the large-scale motions of clusters was made by fitting data to an analytical model. The free parameter of the fit determined the optical depth to microwave photon scattering for the cluster sample. We estimated the covariance matrix of the mean pairwise momentum as a function of galaxy separation using CMB simulations, jackknife evaluation, and bootstrap estimates. The most conservative simulation-based uncertainties gave signal-to-noise estimates between 3.6 and 4.1 for various luminosity cuts. Additionally, we explored a novel approach to estimating cluster optical depths from the average thermal Sunyaev-Zel’dovich (tSZ) signal at the BOSS DR11 catalog positions. Our results were broadly consistent with those obtained from the kSZ signal. In the future, the tSZ signal may provide a valuable probe of cluster optical depths, enabling the extraction of velocities from the kSZ-sourced mean pairwise momenta. New CMB maps from three seasons of ACTPol observations with multi-frequency coverage overlap with nearly four times as many DR11 sources and promise to improve statistics and systematics for SZ measurements. With these and other upcoming data, the pairwise kSZ signal is poised to become a powerful new cosmological tool, able to probe large physical scales to inform neutrino physics and test models of modified gravity and dark energy.
Three estimates of the association between linear growth failure and cognitive ability.
Cheung, Y B; Lam, K F
2009-09-01
To compare three estimators of association between growth stunting as measured by height-for-age Z-score and cognitive ability in children, and to examine the extent to which statistical adjustment for covariates is useful for removing confounding due to socio-economic status. Three estimators, namely random-effects, within- and between-cluster estimators, for panel data were used to estimate the association in a survey of 1105 pairs of siblings who were assessed for anthropometry and cognition. Furthermore, a 'combined' model was formulated to simultaneously provide the within- and between-cluster estimates. Random-effects and between-cluster estimators showed strong association between linear growth and cognitive ability, even after adjustment for a range of socio-economic variables. In contrast, the within-cluster estimator showed a much more modest association: For every increase of one Z-score in linear growth, cognitive ability increased by about 0.08 standard deviation (P < 0.001). The combined model verified that the between-cluster estimate was significantly larger than the within-cluster estimate (P = 0.004). Residual confounding by socio-economic situations may explain a substantial proportion of the observed association between linear growth and cognition in studies that attempt to control the confounding by means of multivariable regression analysis. The within-cluster estimator provides more convincing and modest results about the strength of association.
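For sibling pairs, the within-cluster estimator can be computed by regressing sibling differences (which cancel any shared family effect), and the between-cluster estimator by regressing family means. A minimal sketch with synthetic data (the data-generating choices below are assumptions for illustration):

```python
def slope(x, y):
    """OLS slope of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def within_between(pairs):
    """pairs: list of ((x1, y1), (x2, y2)) sibling measurements per family.
    The within-cluster slope regresses sibling differences (family-level
    confounders cancel); the between-cluster slope regresses family means
    (family-level confounders remain)."""
    dx = [p[0][0] - p[1][0] for p in pairs]
    dy = [p[0][1] - p[1][1] for p in pairs]
    mx = [(p[0][0] + p[1][0]) / 2 for p in pairs]
    my = [(p[0][1] + p[1][1]) / 2 for p in pairs]
    return slope(dx, dy), slope(mx, my)
```

When a family-level confounder drives both variables, the between-cluster slope is inflated while the within-cluster slope recovers the modest causal coefficient, mirroring the contrast reported in the abstract.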
Julie, E. Golden; Selvi, S. Tamil
2016-01-01
Wireless sensor networks (WSNs) consist of sensor nodes with limited processing capability and limited nonrechargeable battery power. Energy consumption in WSNs is a significant issue for network lifetime, so it is essential to develop an energy-aware clustering protocol that reduces energy consumption and thereby increases network lifetime. In this paper, a neuro-fuzzy energy-aware clustering scheme (NFEACS) is proposed to form optimal, energy-aware clusters. NFEACS consists of two parts, a fuzzy subsystem and a neural network system, which together achieve energy efficiency in forming clusters and cluster heads in the WSN. NFEACS uses a neural network that provides an effective training set, based on the energy and received signal strength of all nodes, to estimate the expected energy of tentative cluster heads. Sensor nodes with higher energy are trained with the center location of the base station to select energy-aware cluster heads. Fuzzy rules are used in the fuzzy logic part, whose inputs form the clusters. NFEACS is designed for WSNs and handles node mobility. The proposed scheme is compared with related clustering schemes: the cluster-head election mechanism using fuzzy logic, and energy-aware fuzzy unequal clustering. The experimental results show that NFEACS performs better than the other related schemes. PMID:26881269
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wu, Qishi; Berry, M. L..; Grieme, M.
We propose a localization-based radiation source detection (RSD) algorithm using the Ratio of Squared Distance (ROSD) method. Compared with the triangulation-based method, the advantages of the ROSD method are threefold: i) source location estimates based on four detectors have improved accuracy; ii) ROSD provides closed-form source location estimates and thus eliminates the imaginary-roots issue; and iii) ROSD produces a unique source location estimate, as opposed to two real roots (if any) in triangulation, and obviates the need to identify phantom roots during clustering.
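A closed-form four-detector estimate of the flavor this abstract describes can be sketched in 2-D under an idealized inverse-square count model with no attenuation or background (assumptions made for the example; this is not the authors' exact formulation). Differencing the squared-distance equations |x|² - 2pᵢ·x + |pᵢ|² = A/cᵢ pairwise yields a linear system in (x, y, A):

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def locate_source(detectors, counts):
    """Closed-form 2-D source location from four detector count rates,
    assuming counts fall off as strength A / distance**2. Differencing the
    squared-distance equations against detector 0 gives three linear
    equations in (x, y, A), solved here by Cramer's rule. The detectors
    must not form a parallelogram, or the system is degenerate."""
    (px0, py0), c0 = detectors[0], counts[0]
    M, b = [], []
    for (px, py), c in zip(detectors[1:], counts[1:]):
        M.append([2 * (px - px0), 2 * (py - py0), 1.0 / c - 1.0 / c0])
        b.append(px ** 2 + py ** 2 - px0 ** 2 - py0 ** 2)
    D = det3(M)
    def rep(col):                      # replace column `col` with b
        N = [row[:] for row in M]
        for r in range(3):
            N[r][col] = b[r]
        return det3(N)
    return rep(0) / D, rep(1) / D      # (x, y); rep(2)/D would give A
```

Because the system is linear, the estimate is unique and closed-form, which is the property the abstract contrasts with the two-root ambiguity of triangulation.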
Probing the dynamical and X-ray mass proxies of the cluster of galaxies Abell S1101
NASA Astrophysics Data System (ADS)
Rabitz, Andreas; Zhang, Yu-Ying; Schwope, Axel; Verdugo, Miguel; Reiprich, Thomas H.; Klein, Matthias
2017-01-01
Context. The galaxy cluster Abell S1101 (S1101 hereafter) deviates significantly from the X-ray luminosity versus velocity dispersion (L-σ) relation of galaxy clusters in our previous study. Given the reliable X-ray luminosity measurement combining XMM-Newton and ROSAT data, this is most likely caused by a bias in the velocity dispersion due to interlopers and low member statistics in the previous sample of member galaxies, which was based solely on 20 galaxy redshifts drawn from the literature. Aims: We intend to increase the member-galaxy statistics to perform precision measurements of the velocity dispersion and dynamical mass of S1101. We aim for a detailed substructure and dynamical state characterization of this cluster, and a comparison of mass estimates derived from (I) the velocity dispersion (Mvir), (II) the caustic mass computation (Mcaustic), and (III) mass proxies from X-ray observations and the Sunyaev-Zel'dovich (SZ) effect. Methods: We carried out new optical spectroscopic observations of the galaxies in this cluster field with VIMOS, obtaining a sample of 60 member galaxies for S1101. We revised the cluster redshift and velocity dispersion measurements based on this sample and also applied the Dressler-Shectman substructure test. Results: The completeness of cluster members within r200 was significantly improved for this cluster. Tests for dynamical substructure do not show evidence of major disturbances or merging activity in S1101. We find good agreement between the dynamical cluster mass measurements and the X-ray mass estimates, which confirms the relaxed state of the cluster displayed in the 2D substructure test. The SZ mass proxy is slightly higher than the other estimates. The updated measurement of σ erased the deviation of S1101 from the L-σ relation. We also noticed a background structure in the cluster field of S1101: a galaxy group that is very close to the cluster in projection but at almost twice its redshift. However, the mass of this structure is too low to significantly bias the observed bolometric X-ray luminosity of S1101. Hence, we conclude that the deviation of S1101 in the L-σ relation in our previous study can be explained by low member statistics and galaxy interlopers, which are known to introduce biases in the estimated velocity dispersion. We have made use of VLT/VIMOS observations taken with the ESO Telescope at the Paranal Observatory under programme 087.A-0096.
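The dependence of a dynamical mass estimate on the velocity dispersion can be illustrated with the crude scaling M ~ 5σ²R/G. This is an order-of-magnitude sketch only, not the Mvir estimator used in the paper, and the prefactor 5 is a conventional assumption:

```python
# Gravitational constant in convenient astrophysical units:
# G = 4.301e-9 Mpc * (km/s)^2 / Msun
G = 4.301e-9

def crude_dynamical_mass(sigma_kms, radius_mpc):
    """Order-of-magnitude dynamical (virial-type) mass in solar masses,
    M ~ 5 * sigma^2 * R / G, for line-of-sight dispersion sigma [km/s]
    and characteristic radius R [Mpc]."""
    return 5.0 * sigma_kms ** 2 * radius_mpc / G
```

Because M scales as σ², interlopers that inflate the measured dispersion by even tens of percent bias the dynamical mass upward substantially, which is why the enlarged member sample matters.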
NASA Software Cost Estimation Model: An Analogy Based Estimation Model
NASA Technical Reports Server (NTRS)
Hihn, Jairus; Juster, Leora; Menzies, Tim; Mathew, George; Johnson, James
2015-01-01
The cost estimation of software development activities is increasingly critical for large-scale integrated projects such as those at DOD and NASA, especially as software systems become larger and more complex. For example, the Mars Science Laboratory (MSL), developed at the Jet Propulsion Laboratory, launched with over 2 million lines of code, making it the largest robotic spacecraft ever flown (based on the size of its software). Software development activities are also notorious for cost growth, with NASA flight software averaging over 50% cost growth. All across the agency, estimators and analysts are increasingly tasked to develop reliable cost estimates in support of program planning and execution. While there has been extensive work on improving parametric methods, there has been very little focus on models based on analogy and clustering algorithms. In this paper we summarize our findings on effort/cost model estimation and model development, based on ten years of software effort estimation research using data mining and machine learning methods to develop estimation models based on analogy and clustering. The performance of the NASA Software Cost Model is evaluated by comparing it to COCOMO II, linear regression, and K-nearest neighbor prediction models on the same data set.
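The NASA model itself is not reproduced in the abstract; as a rough illustration of analogy-based (k-nearest-neighbor) effort estimation, the sketch below predicts effort for a new project as the mean effort of its k most similar historical projects. All feature names and numbers are invented for illustration, not drawn from the paper's data set.

```python
import numpy as np

def knn_effort_estimate(features, efforts, query, k=3):
    """Predict effort for a new project as the mean effort of the
    k most similar historical projects (Euclidean distance on
    min-max normalized features)."""
    feats = np.asarray(features, dtype=float)
    eff = np.asarray(efforts, dtype=float)
    # Normalize each feature to [0, 1] so no single attribute dominates.
    lo, hi = feats.min(axis=0), feats.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    fn = (feats - lo) / span
    qn = (np.asarray(query, dtype=float) - lo) / span
    dist = np.linalg.norm(fn - qn, axis=1)
    nearest = np.argsort(dist)[:k]
    return float(eff[nearest].mean())

# Hypothetical historical projects: [KSLOC, team size, heritage score]
past = [[10, 4, 1], [12, 5, 1], [50, 20, 3], [55, 22, 3]]
effort_pm = [30, 36, 300, 320]  # person-months (illustrative only)
est = knn_effort_estimate(past, effort_pm, query=[11, 4, 1], k=2)
# -> 33.0 person-months (mean of the two most similar small projects)
```

The clustering variants in the paper refine the analogy pool; this sketch shows only the plain nearest-neighbor baseline against which such models are compared.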
Principles of proportional recovery after stroke generalize to neglect and aphasia.
Marchi, N A; Ptak, R; Di Pietro, M; Schnider, A; Guggisberg, A G
2017-08-01
Motor recovery after stroke follows one of two different patterns. A majority of patients recover about 70% of their initial impairment, whereas some patients with severe initial deficits show little or no improvement. Here, we investigated whether recovery from visuospatial neglect and aphasia also separates into two different groups and whether similar proportions of recovery can be expected for the two cognitive functions. We assessed 35 patients with neglect and 14 patients with aphasia at 3 weeks and 3 months after stroke using standardized tests. Recovery patterns were classified with hierarchical clustering, and the proportion of recovery was estimated from initial impairment using linear regression analysis. Patients were reliably clustered into two different groups. For patients in the first cluster (n = 40), recovery followed a linear model in which improvement was proportional to initial impairment and reached 71% of the maximal possible recovery for both cognitive deficits. Patients in the second cluster (n = 9) exhibited poor recovery (<25% of initial impairment). Our findings indicate that improvement from neglect or aphasia after stroke shows the same dichotomy and proportionality as observed in motor recovery. This suggests common underlying principles of plasticity that apply to motor and cognitive functions. © 2017 EAN.
A Fast Implementation of the ISODATA Clustering Algorithm
NASA Technical Reports Server (NTRS)
Memarsadeghi, Nargess; Mount, David M.; Netanyahu, Nathan S.; LeMoigne, Jacqueline
2005-01-01
Clustering is central to many image processing and remote sensing applications. ISODATA is one of the most popular and widely used clustering methods in geoscience applications, but it can run slowly, particularly with large data sets. We present a more efficient approach to ISODATA clustering, which achieves better running times by storing the points in a kd-tree and through a modification of the way in which the algorithm estimates the dispersion of each cluster. We also present an approximate version of the algorithm which allows the user to further improve the running time, at the expense of lower fidelity in computing the nearest cluster center to each point. We provide both theoretical and empirical justification that our modified approach produces clusterings that are very similar to those produced by the standard ISODATA approach. We also provide empirical studies on both synthetic data and remotely sensed Landsat and MODIS images that show that our approach has significantly lower running times.
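The paper's kd-tree "filtering" implementation is considerably more sophisticated than can be shown here; as a much-simplified sketch under that caveat, the block below illustrates one ISODATA-style assignment step, using a kd-tree to answer nearest-center queries and then computing each cluster's dispersion (the quantity the paper's modification targets).

```python
import numpy as np
from scipy.spatial import cKDTree

def isodata_assign_step(points, centers):
    """One assignment/update step of an ISODATA-style iteration.
    A kd-tree over the candidate centers answers each point's
    nearest-center query, replacing a brute-force distance scan."""
    tree = cKDTree(centers)
    dist, label = tree.query(points)          # nearest center per point
    new_centers = np.array([points[label == j].mean(axis=0)
                            for j in range(len(centers))])
    # Per-cluster dispersion: mean distance to the assigned center.
    dispersion = np.array([dist[label == j].mean()
                           for j in range(len(centers))])
    return label, new_centers, dispersion

# Two well-separated synthetic blobs and two starting centers.
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0, 0.1, (50, 2)),
                 rng.normal(5, 0.1, (50, 2))])
labels, centers, disp = isodata_assign_step(pts, np.array([[0., 0.], [5., 5.]]))
```

Full ISODATA additionally splits clusters with large dispersion and merges nearby centers between such steps; those rules are omitted here for brevity.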
A Fast Implementation of the Isodata Clustering Algorithm
NASA Technical Reports Server (NTRS)
Memarsadeghi, Nargess; Le Moigne, Jacqueline; Mount, David M.; Netanyahu, Nathan S.
2007-01-01
Clustering is central to many image processing and remote sensing applications. ISODATA is one of the most popular and widely used clustering methods in geoscience applications, but it can run slowly, particularly with large data sets. We present a more efficient approach to ISODATA clustering, which achieves better running times by storing the points in a kd-tree and through a modification of the way in which the algorithm estimates the dispersion of each cluster. We also present an approximate version of the algorithm which allows the user to further improve the running time, at the expense of lower fidelity in computing the nearest cluster center to each point. We provide both theoretical and empirical justification that our modified approach produces clusterings that are very similar to those produced by the standard ISODATA approach. We also provide empirical studies on both synthetic data and remotely sensed Landsat and MODIS images that show that our approach has significantly lower running times.
Progeny Clustering: A Method to Identify Biological Phenotypes
Hu, Chenyue W.; Kornblau, Steven M.; Slater, John H.; Qutub, Amina A.
2015-01-01
Estimating the optimal number of clusters is a major challenge in applying cluster analysis to any type of dataset, especially to biomedical datasets, which are high-dimensional and complex. Here, we introduce an improved method, Progeny Clustering, which is stability-based and exceptionally efficient to compute, for finding the ideal number of clusters. The algorithm employs a novel Progeny Sampling method to reconstruct cluster identity, a co-occurrence probability matrix to assess clustering stability, and a set of reference datasets to overcome inherent biases in the algorithm and data space. Our method proved successful and robust when applied to two synthetic datasets (a two-dimensional dataset and a ten-dimensional dataset containing eight dimensions of pure noise), two standard biological datasets (the Iris dataset and the Rat CNS dataset), and two biological datasets (a cell phenotype dataset and an acute myeloid leukemia (AML) reverse phase protein array (RPPA) dataset). Progeny Clustering outperformed some popular clustering evaluation methods on the ten-dimensional synthetic dataset as well as on the cell phenotype dataset, and it was the only method that successfully discovered clinically meaningful patient groupings in the AML RPPA dataset. PMID:26267476
Xiao, Yongling; Abrahamowicz, Michal
2010-03-30
We propose two bootstrap-based methods to correct the standard errors (SEs) from Cox's model for within-cluster correlation of right-censored event times. The cluster-bootstrap method resamples, with replacement, only the clusters, whereas the two-step bootstrap method resamples (i) the clusters, and (ii) individuals within each selected cluster, with replacement. In simulations, we evaluate both methods and compare them with the existing robust variance estimator and the shared gamma frailty model, which are available in statistical software packages. We simulate clustered event time data, with latent cluster-level random effects that are ignored in the conventional Cox's model. For cluster-level covariates, both proposed bootstrap methods yield accurate SEs, correct type I error rates, and acceptable coverage rates, regardless of the true random effects distribution, and avoid the serious variance under-estimation of conventional Cox-based standard errors. However, the two-step bootstrap method over-estimates the variance for individual-level covariates. We also apply the proposed bootstrap methods to obtain confidence bands around flexible estimates of time-dependent effects in a real-life analysis of clustered event times.
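Fitting Cox's model is beyond an abstract-length example, but the cluster-bootstrap resampling scheme itself can be sketched with a simple mean statistic on toy clustered data (all data invented), showing why a naive i.i.d. SE is too small when observations are correlated within clusters.

```python
import numpy as np

def cluster_bootstrap_se(clusters, statistic, n_boot=500, seed=1):
    """Cluster-level bootstrap: resample whole clusters with
    replacement, recompute the statistic on the pooled resample,
    and report the SD of the bootstrap replicates as the SE."""
    rng = np.random.default_rng(seed)
    m = len(clusters)
    stats = []
    for _ in range(n_boot):
        pick = rng.integers(0, m, size=m)   # resample clusters, not individuals
        sample = np.concatenate([clusters[i] for i in pick])
        stats.append(statistic(sample))
    return float(np.std(stats, ddof=1))

# Toy clustered data: 30 clusters of 10, strong within-cluster correlation.
rng = np.random.default_rng(0)
clusters = [rng.normal(mu, 0.2, size=10) for mu in rng.normal(0, 1, size=30)]
se_cluster = cluster_bootstrap_se(clusters, np.mean)
# Naive i.i.d. SE ignores the clustering and is far too small here.
allx = np.concatenate(clusters)
se_naive = float(allx.std(ddof=1) / np.sqrt(len(allx)))
```

In the paper the resampled clusters would be refit with Cox's model and the SE taken over the coefficient replicates; the mean statistic here only stands in for that step.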
Improved phase arrival estimate and location for local earthquakes in South Korea
NASA Astrophysics Data System (ADS)
Morton, E. A.; Rowe, C. A.; Begnaud, M. L.
2012-12-01
The Korea Institute of Geoscience and Mineral Resources (KIGAM) and the Korea Meteorological Administration (KMA) regularly report local (distance < ~1200 km) seismicity recorded with their networks; we obtain preliminary event location estimates as well as waveform data, but no phase arrivals are reported, so the data are not immediately useful for earthquake location. Our goal is to identify seismic events that are sufficiently well-located to provide accurate seismic travel-time information for events within the KIGAM and KMA networks that are also recorded by some regional stations. Toward that end, we are using a combination of manual phase identification and arrival-time picking, with waveform cross-correlation, to cluster events that have occurred in close proximity to one another, which allows for improved phase identification by comparing the highly correlated waveforms. We cross-correlate the known events with one another on 5 seismic stations and cluster events that correlate above a correlation coefficient threshold of 0.7, which reveals only a few clusters, each containing a few events. The small number of repeating events suggests that the online catalogs have had mining and quarry blasts removed before publication, as these can contribute significantly to repeating seismic sources in relatively aseismic regions such as South Korea. The dispersed source locations in our catalog, however, are ideal for seismic velocity modeling, because the dense seismic station arrangement provides superior sampling and favorable event-to-station ray path coverage. Following careful manual phase picking on 104 events chosen to provide adequate ray coverage, we re-locate the events to obtain improved source coordinates. The re-located events are used with Thurber's Simul2000 pseudo-bending local tomography code to estimate the crustal structure of the Korean Peninsula, an important contribution to ongoing calibration for events of interest in the region.
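The correlation-threshold clustering step described above can be sketched as follows (not the authors' code; single-station, single-component waveforms are invented, and a simple greedy single-linkage grouping stands in for whatever clustering scheme the study used).

```python
import numpy as np

def max_norm_xcorr(a, b):
    """Maximum normalized cross-correlation over all lags."""
    a = (a - a.mean()) / (a.std() * len(a))
    b = (b - b.mean()) / b.std()
    return float(np.correlate(a, b, mode='full').max())

def correlation_clusters(waveforms, threshold=0.7):
    """Greedy single-linkage grouping: events whose waveforms
    cross-correlate above the threshold share a cluster label."""
    n = len(waveforms)
    labels = list(range(n))
    for i in range(n):
        for j in range(i + 1, n):
            if max_norm_xcorr(waveforms[i], waveforms[j]) >= threshold:
                old, new = labels[j], labels[i]
                labels = [new if lab == old else lab for lab in labels]
    return labels

# Two time-shifted copies of one pulse (a "repeating" source) plus noise.
t = np.linspace(0, 1, 200)
pulse = np.exp(-((t - 0.5) / 0.05) ** 2)
w1 = pulse
w2 = np.roll(pulse, 10)            # same source, small time shift
rng = np.random.default_rng(0)
w3 = rng.normal(size=200)          # unrelated event
labels = correlation_clusters([w1, w2, w3])
```

Searching over all lags makes the grouping insensitive to small origin-time misalignments, which is why cross-correlation clustering tolerates rough preliminary picks.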
Computer aided detection of clusters of microcalcifications on full field digital mammograms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ge Jun; Sahiner, Berkman; Hadjiiski, Lubomir M.
2006-08-15
We are developing a computer-aided detection (CAD) system to identify microcalcification clusters (MCCs) automatically on full field digital mammograms (FFDMs). The CAD system includes six stages: preprocessing; image enhancement; segmentation of microcalcification candidates; false positive (FP) reduction for individual microcalcifications; regional clustering; and FP reduction for clustered microcalcifications. At the stage of FP reduction for individual microcalcifications, a truncated sum-of-squares error function was used to improve the efficiency and robustness of the training of an artificial neural network in our CAD system for FFDMs. At the stage of FP reduction for clustered microcalcifications, morphological features and features derived from the artificial neural network outputs were extracted from each cluster. Stepwise linear discriminant analysis (LDA) was used to select the features. An LDA classifier was then used to differentiate clustered microcalcifications from FPs. A data set of 96 cases with 192 images was collected at the University of Michigan. This data set contained 96 MCCs, of which 28 clusters were proven by biopsy to be malignant and 68 were proven to be benign. The data set was separated into two independent data sets for training and testing of the CAD system in a cross-validation scheme. When one data set was used to train and validate the convolutional neural network (CNN) in our CAD system, the other data set was used to evaluate the detection performance. With the use of the truncated error metric, the training of the CNN could be accelerated and the classification performance was improved. The CNN in combination with an LDA classifier could substantially reduce FPs with a small tradeoff in sensitivity. Using free-response receiver operating characteristic methodology, we found that our CAD system achieves a cluster-based sensitivity of 70%, 80%, and 90% at 0.21, 0.61, and 1.49 FPs/image, respectively.
For case-based performance evaluation, a sensitivity of 70%, 80%, and 90% can be achieved at 0.07, 0.17, and 0.65 FPs/image, respectively. We also used a data set of 216 mammograms negative for clustered microcalcifications to further estimate the FP rate of our CAD system. The corresponding FP rates were 0.15, 0.31, and 0.86 FPs/image for cluster-based detection when negative mammograms were used for estimation of FP rates.
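The exact truncation used in the paper's error function is not spelled out in the abstract; under that caveat, a generic truncated sum-of-squares error can be sketched as follows, showing how capping each per-sample term keeps outlier or mislabeled samples from dominating training.

```python
import numpy as np

def truncated_sse(pred, target, t=0.9):
    """Sum-of-squares error with each per-sample term capped at t**2,
    so gross outliers cannot dominate the gradient during training."""
    err = (np.asarray(pred) - np.asarray(target)) ** 2
    return float(np.minimum(err, t ** 2).sum())

pred = np.array([0.1, 0.9, 0.5, 3.0])   # last prediction is a gross outlier
target = np.array([0.0, 1.0, 0.5, 0.0])
plain = float(((pred - target) ** 2).sum())   # 9.02, dominated by the outlier
capped = truncated_sse(pred, target, t=0.9)   # 0.83, outlier term capped at 0.81
```

The threshold t = 0.9 here is arbitrary; in practice it would be tuned to the expected label noise.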
Empirical entropic contributions in computational docking: evaluation in APS reductase complexes.
Chang, Max W; Belew, Richard K; Carroll, Kate S; Olson, Arthur J; Goodsell, David S
2008-08-01
The results from reiterated docking experiments may be used to evaluate an empirical vibrational entropy of binding in ligand-protein complexes. We have tested several methods for evaluating the vibrational contribution to binding of 22 nucleotide analogues to the enzyme APS reductase. These include two cluster size methods that measure the probability of finding a particular conformation, a method that estimates the extent of the local energetic well by looking at the scatter of conformations within clustered results, and an RMSD-based method that uses the overall scatter and clustering of all conformations. We have also directly characterized the local energy landscape by randomly sampling around docked conformations. The simple cluster size method shows the best performance, improving the identification of correct conformations in multiple docking experiments. 2008 Wiley Periodicals, Inc.
Wu, Zhichao; Medeiros, Felipe A
2018-03-20
Visual field testing is an important endpoint in glaucoma clinical trials, and the testing paradigm used can have a significant impact on sample size requirements. To investigate this, we included 353 eyes of 247 glaucoma patients seen over a 3-year period to extract real-world visual field rates of change and variability estimates, from which sample size estimates were obtained through computer simulations. The clinical trial scenario assumed that a new treatment was added to one of two groups that were both under routine clinical care, with various treatment effects examined. Three different visual field testing paradigms were evaluated: a) evenly spaced testing; b) the United Kingdom Glaucoma Treatment Study (UKGTS) follow-up scheme, which adds clustered tests at the beginning and end of follow-up in addition to evenly spaced testing; and c) a clustered testing paradigm, with clusters of tests at the beginning and end of the trial period and two intermediary visits. The sample size requirements were reduced by 17-19% and 39-40% using the UKGTS and clustered testing paradigms, respectively, when compared with the evenly spaced approach. These findings highlight how a clustered testing paradigm can substantially reduce sample size requirements and improve the feasibility of future glaucoma clinical trials.
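The study's simulations model real-world variability and treatment effects, but the core statistical reason clustered designs help can be sketched with the OLS slope-variance formula, Var(slope) = sigma^2 / sum((t - tbar)^2): placing visits at the extremes of follow-up enlarges the denominator. The visit schedules below are invented for illustration.

```python
import numpy as np

def slope_var_factor(times):
    """Return sum((t - mean(t))^2). The OLS slope variance is sigma^2
    divided by this, so a larger value means a more precisely
    estimated rate of change (and a smaller required sample size)."""
    t = np.asarray(times, dtype=float)
    return float(((t - t.mean()) ** 2).sum())

# Ten tests over 3 years (times in years); schedules are hypothetical.
even = np.linspace(0.0, 3.0, 10)
clustered = np.array([0, 0, 0, 0, 1.0, 2.0, 3, 3, 3, 3])

# > 1 means the clustered design estimates the slope more precisely.
precision_gain = slope_var_factor(clustered) / slope_var_factor(even)
```

This back-of-envelope factor ignores test-retest variability structure and dropout, which the paper's simulations account for; it only shows the direction of the effect.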
NASA Astrophysics Data System (ADS)
Laura, Jason; Skinner, James A.; Hunter, Marc A.
2017-08-01
In this paper we present the Large Crater Clustering (LCC) tool set, an ArcGIS plugin that supports the quantitative approximation of a primary impact location from user-identified locations of possible secondary impact craters or the long axes of clustered secondary craters. The identification of primary impact craters directly supports planetary geologic mapping and topical science studies where the chronostratigraphic age of some geologic units may be known, but more distant features have questionable geologic ages. Previous works (e.g., McEwen et al., 2005; Dundas and McEwen, 2007) have shown that the location of a source primary crater can be estimated from its secondary impact craters. This work adapts those methods into a statistically robust tool set. We describe the four individual tools within the LCC tool set, which support: (1) processing individually digitized point observations (craters); (2) estimating the directional distribution of a clustered set of craters and back-projecting the potential flight paths (crater clusters or linearly approximated catenae or lineaments); (3) intersecting projected paths; and (4) intersecting back-projected trajectories to approximate the location of potential source primary craters. We present two case studies using secondary impact features mapped in two regions of Mars. We demonstrate that the tool is able to quantitatively identify primary impacts and supports the improved qualitative interpretation of potential secondary crater flight trajectories.
Detection and Characterization of Galaxy Systems at Intermediate Redshift.
NASA Astrophysics Data System (ADS)
Barrena, Rafael
2004-11-01
This thesis is divided into two closely related parts. In the first part we implement and apply a galaxy cluster detection method based on multiband observations in the visible. For this purpose, we use a new algorithm, the Voronoi Galaxy Cluster Finder, which identifies overdensities over a Poissonian field of objects. By applying this algorithm over four photometric bands (B, V, R, and I) we reduce the possibility of detecting galaxy projection effects and spurious detections instead of real galaxy clusters. The B, V, R, and I photometry allows a good characterization of galaxy systems. Therefore, we analyze the colour and early-type sequences in the colour-magnitude diagrams of the detected clusters. This analysis helps us to confirm the selected candidates as actual galaxy systems. In addition, by comparing observational early-type sequences with a semiempirical model we can estimate a photometric redshift for the detected clusters. We apply this detection method to four 0.5 x 0.5 square-degree areas that partially overlap the Postman Distant Cluster Survey (PDCS). The observations were performed as part of the International Time Programme 1999-B using the Wide Field Camera mounted at the Isaac Newton Telescope (Roque de los Muchachos Observatory, La Palma island, Spain). The B and R data obtained were completed with V and I photometry performed by Marc Postman. The comparison of our cluster catalogue with that of the PDCS reveals that our work is a clear improvement in cluster detection techniques. Our method efficiently selects galaxy clusters, in particular low-mass galaxy systems, even at relatively high redshift, and estimates a precise photometric redshift. The method is validated by spectroscopic observations of several selected candidates.
By comparing photometric and spectroscopic redshifts we conclude: 1) our photometric estimation method achieves a precision better than 0.1; 2) our detection technique is able to detect galaxy systems even at z~0.7 using visible photometric bands. In the second part of this thesis we analyze in detail the dynamical state of 1E0657-56 (z=0.296), a hot galaxy cluster with strong X-ray and radio emission. Using spectroscopic and photometric observations in the visible (obtained with the New Technology Telescope at La Silla Observatory and the Very Large Telescope at Paranal Observatory, Chile) we analyze the velocity field, morphology, colour, and star formation in the galaxy population of this cluster. 1E0657-56 is involved in a collision event. We identify the substructure involved in this collision and propose a dynamical model that allows us to investigate the origins of the X-ray and radio emission and the relation between them. The analysis of 1E0657-56 presented in this thesis constitutes a good example of the kind of properties that could be studied in some of the clusters catalogued in the first part of this thesis. In addition, the detailed analysis of this cluster represents an improvement in the study of the origin of X-ray and radio emission and merging processes in galaxy clusters.
Caso, Giuseppe; de Nardis, Luca; di Benedetto, Maria-Gabriella
2015-10-30
The weighted k-nearest neighbors (WkNN) algorithm is by far the most popular choice in the design of fingerprinting indoor positioning systems based on WiFi received signal strength (RSS). WkNN estimates the position of a target device by selecting k reference points (RPs) based on the similarity of their fingerprints with the measured RSS values. The position of the target device is then obtained as a weighted sum of the positions of the k RPs. Two-step WkNN positioning algorithms were recently proposed, in which RPs are divided into clusters using the affinity propagation clustering algorithm, and one representative for each cluster is selected. Only cluster representatives are then considered during the position estimation, leading to a significant computational complexity reduction compared to traditional, flat WkNN. Flat and two-step WkNN share the issue of properly selecting the similarity metric so as to guarantee good positioning accuracy: in two-step WkNN, in particular, the metric impacts three different steps in the position estimation, namely cluster formation, cluster selection, and RP selection and weighting. So far, however, the only similarity metric considered in the literature was the one proposed in the original formulation of the affinity propagation algorithm. This paper fills this gap by comparing different metrics and, based on this comparison, proposes a novel mixed approach in which different metrics are adopted in the different steps of the position estimation procedure. The analysis is supported by an extensive experimental campaign carried out in a multi-floor 3D indoor positioning testbed. The impact of similarity metrics and their combinations on the structure and size of the resulting clusters, 3D positioning accuracy and computational complexity are investigated. 
Results show that the adoption of metrics different from the one proposed in the original affinity propagation algorithm and, in particular, the combination of different metrics can significantly improve the positioning accuracy while preserving the efficiency in computational complexity typical of two-step algorithms.
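The flat WkNN position fix described above can be sketched as follows (all fingerprints, reference-point positions, and RSS readings are invented; inverse-distance weights are one common choice among the metrics the paper compares):

```python
import numpy as np

def wknn_position(fingerprints, positions, rss, k=3, eps=1e-6):
    """Weighted k-nearest-neighbors position fix: find the k reference
    points (RPs) whose stored RSS fingerprints are closest (Euclidean)
    to the measured RSS vector, then average their known positions
    with weights proportional to inverse fingerprint distance."""
    fp = np.asarray(fingerprints, dtype=float)
    pos = np.asarray(positions, dtype=float)
    d = np.linalg.norm(fp - np.asarray(rss, dtype=float), axis=1)
    nearest = np.argsort(d)[:k]
    w = 1.0 / (d[nearest] + eps)              # eps guards exact matches
    return (pos[nearest] * w[:, None]).sum(axis=0) / w.sum()

# Four hypothetical RPs, three access points (RSS in dBm), 2D positions.
fps = [[-40, -70, -80], [-45, -65, -82], [-75, -45, -60], [-80, -40, -55]]
xy = [[0, 0], [0, 2], [5, 5], [5, 7]]
est = wknn_position(fps, xy, rss=[-42, -68, -81], k=2)
```

In the two-step variant, the distance computation would first run against cluster representatives only, and then against the RPs of the selected cluster, cutting the number of comparisons.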
Caso, Giuseppe; de Nardis, Luca; di Benedetto, Maria-Gabriella
2015-01-01
The weighted k-nearest neighbors (WkNN) algorithm is by far the most popular choice in the design of fingerprinting indoor positioning systems based on WiFi received signal strength (RSS). WkNN estimates the position of a target device by selecting k reference points (RPs) based on the similarity of their fingerprints with the measured RSS values. The position of the target device is then obtained as a weighted sum of the positions of the k RPs. Two-step WkNN positioning algorithms were recently proposed, in which RPs are divided into clusters using the affinity propagation clustering algorithm, and one representative for each cluster is selected. Only cluster representatives are then considered during the position estimation, leading to a significant computational complexity reduction compared to traditional, flat WkNN. Flat and two-step WkNN share the issue of properly selecting the similarity metric so as to guarantee good positioning accuracy: in two-step WkNN, in particular, the metric impacts three different steps in the position estimation, namely cluster formation, cluster selection, and RP selection and weighting. So far, however, the only similarity metric considered in the literature was the one proposed in the original formulation of the affinity propagation algorithm. This paper fills this gap by comparing different metrics and, based on this comparison, proposes a novel mixed approach in which different metrics are adopted in the different steps of the position estimation procedure. The analysis is supported by an extensive experimental campaign carried out in a multi-floor 3D indoor positioning testbed. The impact of similarity metrics and their combinations on the structure and size of the resulting clusters, 3D positioning accuracy and computational complexity are investigated. 
Results show that the adoption of metrics different from the one proposed in the original affinity propagation algorithm and, in particular, the combination of different metrics can significantly improve the positioning accuracy while preserving the efficiency in computational complexity typical of two-step algorithms. PMID:26528984
Improved Event Location Uncertainty Estimates
2006-09-21
validation purposes, we use GT0-2 event clusters. These include the Nevada, Lop Nor, Semipalatinsk, and Novaya Zemlya test sites, as well as the Azgir...uncertainties. Furthermore, the tails of real seismic data distributions are heavier than Gaussian. The main objectives of this project are to develop, test
Kasaie, Parastu; Mathema, Barun; Kelton, W David; Azman, Andrew S; Pennington, Jeff; Dowdy, David W
2015-01-01
In any setting, a proportion of incident active tuberculosis (TB) reflects recent transmission ("recent transmission proportion"), whereas the remainder represents reactivation. Appropriately estimating the recent transmission proportion has important implications for local TB control, but existing approaches have known biases, especially where data are incomplete. We constructed a stochastic individual-based model of a TB epidemic and designed a set of simulations (derivation set) to develop two regression-based tools for estimating the recent transmission proportion from five inputs: underlying TB incidence, sampling coverage, study duration, clustered proportion of observed cases, and proportion of observed clusters in the sample. We tested these tools on a set of unrelated simulations (validation set), and compared their performance against that of the traditional 'n-1' approach. In the validation set, the regression tools reduced the absolute estimation bias (difference between estimated and true recent transmission proportion) in the 'n-1' technique by a median [interquartile range] of 60% [9%, 82%] and 69% [30%, 87%]. The bias in the 'n-1' model was highly sensitive to underlying levels of study coverage and duration, and substantially underestimated the recent transmission proportion in settings of incomplete data coverage. By contrast, the regression models' performance was more consistent across different epidemiological settings and study characteristics. We provide one of these regression models as a user-friendly, web-based tool. Novel tools can improve our ability to estimate the recent TB transmission proportion from data that are observable (or estimable) by public health practitioners with limited available molecular data.
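The regression tools themselves are not reproduced here, but the baseline 'n-1' estimator they correct can be sketched directly: in each genotype cluster of size n, n-1 cases are attributed to recent transmission (one case is taken as the source). The cluster sizes below are invented.

```python
def recent_transmission_n1(cluster_sizes):
    """Traditional 'n-1' estimate of the recent-transmission
    proportion from molecular genotyping data: each cluster of n
    epidemiologically linked cases contributes n-1 recently
    transmitted cases; unclustered (unique) isolates contribute 0."""
    total = sum(cluster_sizes)
    recently_transmitted = sum(n - 1 for n in cluster_sizes if n > 1)
    return recently_transmitted / total

# Hypothetical genotyping study: 3 clusters plus 4 unique isolates.
sizes = [5, 3, 2] + [1, 1, 1, 1]
estimate = recent_transmission_n1(sizes)   # (4 + 2 + 1) / 14 = 0.5
```

As the abstract notes, this estimator is biased downward when sampling coverage or study duration is limited, because cluster members outside the observation window are missed; that is the bias the proposed regression tools adjust for.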
Kasaie, Parastu; Mathema, Barun; Kelton, W. David; Azman, Andrew S.; Pennington, Jeff; Dowdy, David W.
2015-01-01
In any setting, a proportion of incident active tuberculosis (TB) reflects recent transmission (“recent transmission proportion”), whereas the remainder represents reactivation. Appropriately estimating the recent transmission proportion has important implications for local TB control, but existing approaches have known biases, especially where data are incomplete. We constructed a stochastic individual-based model of a TB epidemic and designed a set of simulations (derivation set) to develop two regression-based tools for estimating the recent transmission proportion from five inputs: underlying TB incidence, sampling coverage, study duration, clustered proportion of observed cases, and proportion of observed clusters in the sample. We tested these tools on a set of unrelated simulations (validation set), and compared their performance against that of the traditional ‘n-1’ approach. In the validation set, the regression tools reduced the absolute estimation bias (difference between estimated and true recent transmission proportion) in the ‘n-1’ technique by a median [interquartile range] of 60% [9%, 82%] and 69% [30%, 87%]. The bias in the ‘n-1’ model was highly sensitive to underlying levels of study coverage and duration, and substantially underestimated the recent transmission proportion in settings of incomplete data coverage. By contrast, the regression models’ performance was more consistent across different epidemiological settings and study characteristics. We provide one of these regression models as a user-friendly, web-based tool. Novel tools can improve our ability to estimate the recent TB transmission proportion from data that are observable (or estimable) by public health practitioners with limited available molecular data. PMID:26679499
NASA Astrophysics Data System (ADS)
Hayden, Brian; Perlmutter, Saul; Boone, Kyle; Nordin, Jakob; Rubin, David; Lidman, Chris; Deustua, Susana E.; Fruchter, Andrew S.; Aldering, Greg Scott; Brodwin, Mark; Cunha, Carlos E.; Eisenhardt, Peter R.; Gonzalez, Anthony H.; Jee, James; Hildebrandt, Hendrik; Hoekstra, Henk; Santos, Joana; Stanford, S. Adam; Stern, Daniel; Fassbender, Rene; Richard, Johan; Rosati, Piero; Wechsler, Risa H.; Muzzin, Adam; Willis, Jon; Boehringer, Hans; Gladders, Michael; Goobar, Ariel; Amanullah, Rahman; Hook, Isobel; Huterer, Dragan; Huang, Xiaosheng; Kim, Alex G.; Kowalski, Marek; Linder, Eric; Pain, Reynald; Saunders, Clare; Suzuki, Nao; Barbary, Kyle H.; Rykoff, Eli S.; Meyers, Joshua; Spadafora, Anthony L.; Sofiatti, Caroline; Wilson, Gillian; Rozo, Eduardo; Hilton, Matt; Ruiz-Lapuente, Pilar; Luther, Kyle; Yen, Mike; Fagrelius, Parker; Dixon, Samantha; Williams, Steven
2017-01-01
The Supernova Cosmology Project has finished executing a large (174 orbits, cycles 22-23) Hubble Space Telescope program, which has measured ~30 type Ia supernovae above z~1 in the highest-redshift, most massive galaxy clusters known to date. Our SN Ia sample closely matches our pre-survey predictions; this sample will improve the constraint on the Dark Energy equation of state above z~1 by a factor of 3, allowing an unprecedented probe of Dark Energy time variation. When combined with the improved cluster mass calibration from gravitational lensing provided by the deep WFC3-IR observations of the clusters, See Change will triple the Dark Energy Task Force Figure of Merit. With the primary observing campaign completed, we present the preliminary supernova sample and our path forward to the supernova cosmology results. We also compare the number of SNe Ia discovered in each cluster with our pre-survey expectations based on cluster mass and SFR estimates. Our extensive HST and ground-based campaign has already produced unique results; we have confirmed several of the highest-redshift cluster members known to date, confirmed the redshift of one of the most massive galaxy clusters at z~1.2 expected across the entire sky, and characterized one of the most extreme starburst environments yet known in a z~1.7 cluster. We have also discovered a lensed SN Ia at z=2.22 magnified by a factor of ~2.7, which is the highest spectroscopic redshift SN Ia currently known.
Turbulence measurements in clusters of galaxies with XMM-Newton
NASA Astrophysics Data System (ADS)
Pinto, C.; Fabian, A.; de Plaa, J.; Sanders, J.
2014-07-01
The kinematic structure of the intracluster medium (ICM) in clusters of galaxies is related to their evolution. AGN feedback, sloshing of gas within the potential well, and galaxy mergers are thought to generate ICM velocity widths of several hundred km/s. Accurate determinations of turbulent broadening are crucial not only for understanding the effects of the central engine on the evolution of clusters, but also for obtaining realistic (emission) line fits and abundance estimates. We have analyzed the data from the CHEERS catalog, which includes 1.5 Ms of new observations (PI: Jelle de Plaa) and archival data for a total of 29 clusters, groups of galaxies, and elliptical galaxies. This campaign provides us with a unique database that significantly improves the quality of the existing observations and the measurements of chemical abundances and turbulent broadening. We have applied the continuum-subtraction spectral-fitting method of Sanders and Fabian and measured turbulence, temperatures, and abundances for the sources in the catalog. For some sources we obtain tight estimates of the velocity broadening, which is related to past AGN activity and mergers. We will show our results at the conference and discuss their relevance in the context of future missions.
Three-dimensional reconstruction of clustered microcalcifications from two digitized mammograms
NASA Astrophysics Data System (ADS)
Stotzka, Rainer; Mueller, Tim O.; Epper, Wolfgang; Gemmeke, Hartmut
1998-06-01
X-ray mammography is one of the most significant diagnostic methods for early detection of breast cancer. Usually two X-ray images from different angles are taken of each breast to make even overlapping structures visible. X-ray mammography has a very high spatial resolution and can show microcalcifications of 50-200 micron in size. Clusters of microcalcifications are one of the most important, and often the only, indicators of malignant tumors. These calcifications are in some cases extremely difficult to detect. Computer-assisted diagnosis of digitized mammograms may improve the detection and interpretation of microcalcifications and yield more reliable diagnostic findings. We built a low-cost mammography workstation to detect and classify clusters of microcalcifications and tissue densities automatically. New in this approach is the estimation of the 3D formation of segmented microcalcifications and its visualization, which puts additional diagnostic information at the radiologist's disposal. The main problem in using only two or three projections for reconstruction is the large loss of volume information. Therefore the arrangement of a cluster is estimated using only the positions of the segmented microcalcifications, and the arrangement of microcalcifications is visualized for the physician by rotation.
Image Location Estimation by Salient Region Matching.
Qian, Xueming; Zhao, Yisi; Han, Junwei
2015-11-01
Nowadays, image locations are widely used in many application scenarios built on large geo-tagged image corpora. For images that are not geographically tagged, we estimate their locations with the help of a large geo-tagged image set via content-based image retrieval. In this paper, we exploit the spatial information of useful visual words to improve image location estimation (i.e., content-based image retrieval performance). We generate visual word groups by mean-shift clustering. To improve the retrieval performance, a spatial constraint is utilized to code the relative positions of visual words. We generate a position descriptor for each visual word and build a fast indexing structure for the visual word groups. Experiments show the effectiveness of the proposed approach.
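The mean-shift grouping step can be sketched with a minimal, generic NumPy implementation (a sketch only: the point coordinates, bandwidth, and the `mean_shift` helper are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def mean_shift(points, bandwidth, n_iter=30):
    """Shift each point toward the mean of its neighbours until it settles on a
    density mode, then merge points whose modes coincide into one group."""
    modes = points.astype(float).copy()
    for _ in range(n_iter):
        for i in range(len(modes)):
            near = points[np.linalg.norm(points - modes[i], axis=1) < bandwidth]
            modes[i] = near.mean(axis=0)
    centers, labels = [], np.empty(len(points), dtype=int)
    for i, m in enumerate(modes):
        for j, c in enumerate(centers):
            if np.linalg.norm(m - c) < bandwidth / 2:
                labels[i] = j
                break
        else:
            centers.append(m)
            labels[i] = len(centers) - 1
    return labels, np.array(centers)

# Two well-separated spatial groups of (hypothetical) visual-word positions.
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal((50, 50), 3, (20, 2)),
                 rng.normal((200, 120), 3, (20, 2))])
labels, centers = mean_shift(pts, bandwidth=20)
print(len(centers))  # 2
```

Unlike k-means, mean-shift needs no preset number of groups; the bandwidth alone controls how visual words are merged into spatial groups.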
Campos, G S; Reimann, F A; Cardoso, L L; Ferreira, C E R; Junqueira, V S; Schmidt, P I; Braccini Neto, J; Yokoo, M J I; Sollero, B P; Boligon, A A; Cardoso, F F
2018-05-07
The objective of the present study was to evaluate the accuracy and bias of direct and blended genomic predictions using different methods and cross-validation techniques for growth traits (weight and weight gains) and visual scores (conformation, precocity, muscling and size) obtained at weaning and at yearling in Hereford and Braford breeds. Phenotypic data comprised 126,290 animals belonging to the Delta G Connection genetic improvement program, and a set of 3,545 animals genotyped with the 50K chip and 131 sires genotyped with the 777K chip. After quality control, 41,045 markers remained for all animals. An animal model was used to estimate (co)variance components and to predict breeding values, which were later used to calculate the deregressed estimated breeding values (DEBV). Animals with both genotype and phenotype for the traits studied were divided into four or five groups by random and k-means clustering cross-validation strategies. The accuracies of the direct genomic values (DGV) were of moderate to high magnitude for traits measured at weaning and at yearling, ranging from 0.19 to 0.45 for k-means and from 0.23 to 0.78 for random clustering among all traits. The greatest gain relative to pedigree BLUP (PBLUP) was 9.5%, obtained with the BayesB method under both k-means and random clustering. Blended genomic value accuracies ranged from 0.19 to 0.56 for k-means and from 0.21 to 0.82 for random clustering. Analyses using the historical pedigree and phenotypes contributed additional information to calculate the GEBV, and in general the largest gains were for the single-step (ssGBLUP) method in bivariate analyses, with a mean increase of 43.00% among all traits measured at weaning and of 46.27% for those evaluated at yearling. The accuracy values for the marker effect estimation methods were lower for k-means clustering, indicating that the relationship of the training set to the selection candidates is a major factor affecting the accuracy of genomic predictions. 
The gains in accuracy obtained with genomic blending methods, mainly ssGBLUP in bivariate analyses, indicate that genomic predictions should be used as a tool to improve genetic gains relative to traditional PBLUP selection.
The relative impact of baryons and cluster shape on weak lensing mass estimates of galaxy clusters
NASA Astrophysics Data System (ADS)
Lee, B. E.; Le Brun, A. M. C.; Haq, M. E.; Deering, N. J.; King, L. J.; Applegate, D.; McCarthy, I. G.
2018-05-01
Weak gravitational lensing depends on the integrated mass along the line of sight. Baryons contribute to the mass distribution of galaxy clusters and hence to the mass estimates obtained from lensing analyses. We use the cosmo-OWLS suite of hydrodynamic simulations to investigate the impact of baryonic processes on the bias and scatter of weak lensing mass estimates of clusters. These estimates are obtained by fitting NFW profiles to mock data using MCMC techniques. In particular, we examine the difference in estimates between dark matter-only runs and those including various prescriptions for baryonic physics. We find no significant difference in the mass bias when baryonic physics is included, though the overall mass estimates are suppressed when feedback from AGN is included. For the lowest-mass systems for which a reliable mass can be obtained (M200 ≈ 2 × 10^14 M⊙), we find a bias of ≈ -10 per cent. The magnitude of the bias tends to decrease for higher-mass clusters, consistent with no bias for the most massive clusters, which have masses comparable to those found in the CLASH and HFF samples. For the lowest-mass clusters, the mass bias is particularly sensitive to the fit radii and the limits placed on the concentration prior, rendering reliable mass estimates difficult. The scatter in mass estimates between the dark matter-only and the various baryonic runs is smaller than that between different projections of individual clusters, highlighting the importance of triaxiality.
Measuring the scatter in the cluster optical richness-mass relation with machine learning
NASA Astrophysics Data System (ADS)
Boada, Steven Alvaro
The distribution of massive clusters of galaxies depends strongly on the total cosmic mass density, the mass variance, and the dark energy equation of state. As such, measures of galaxy clusters can provide constraints on these parameters and even test models of gravity, but only if observations of clusters can lead to accurate estimates of their total masses. Here, we carry out a study to investigate the ability of a blind spectroscopic survey to recover accurate galaxy cluster masses through their line-of-sight velocity dispersions (LOSVDs) using probability-based and machine learning methods. We focus on the Hobby-Eberly Telescope Dark Energy Experiment (HETDEX), which will employ the new Visible Integral-Field Replicable Unit Spectrographs (VIRUS) over 420 deg^2 on the sky with a 1/4.5 fill factor. VIRUS covers the blue/optical portion of the spectrum (3500-5500 Å), allowing surveys to measure redshifts for a large sample of galaxies out to z < 0.5 based on their absorption or emission (e.g., [O II], Mg II, Ne V) features. We use a detailed mock galaxy catalog from a semi-analytic model to simulate surveys observed with VIRUS, including: (1) Survey, a blind, HETDEX-like survey with an incomplete but uniform spectroscopic selection function; and (2) Targeted, a survey which targets clusters directly, obtaining spectra of all galaxies in a VIRUS-sized field. For both surveys, we include realistic uncertainties from galaxy magnitude and line-flux limits. We benchmark both surveys against spectroscopic observations with "perfect" knowledge of galaxy line-of-sight velocities. With Survey observations, we can recover cluster masses to ~0.1 dex, which can be further improved to < 0.1 dex with Targeted observations. This level of cluster mass recovery provides important measurements of the intrinsic scatter in the optical richness-cluster mass relation, and enables constraints on the key cosmological parameter sigma_8 to < 20%. 
As a demonstration of the methods developed previously, we present a pilot survey with integral field spectroscopy of ten galaxy clusters optically selected from the Sloan Digital Sky Survey's DR8 at z = 0.2-0.3. Eight of the clusters are rich (lambda > 60) systems with total inferred masses of (1.58-17.37) × 10^14 M⊙ (M200c), and two are poor (lambda < 15) systems with inferred total masses of ~0.5 × 10^14 M⊙ (M200c). We use the Mitchell Spectrograph (formerly the VIRUS-P spectrograph, a prototype of the HETDEX VIRUS instrument), located on the McDonald Observatory 2.7 m telescope, to measure spectroscopic redshifts and line-of-sight velocities of the galaxies in and around each cluster, determine cluster membership and derive LOSVDs. We test both a LOSVD-cluster mass scaling relation and a machine learning based approach to infer total cluster mass. After comparing the cluster mass estimates to the literature, we use these independent cluster mass measurements to estimate the absolute cluster mass scale and the intrinsic scatter in the optical richness-mass relationship. We measure the intrinsic scatter in richness at fixed cluster mass to be sigma_M|lambda = 0.27 +/- 0.07 dex, in excellent agreement with previous estimates of sigma_M|lambda ~ 0.2-0.3 dex. We discuss the importance of the data used to train the machine learning methods and suggest various strategies to improve the accuracy of the bias (offset) and scatter in the optical richness-cluster mass relation. This demonstrates the power of blind spectroscopic surveys such as HETDEX to provide robust cluster mass estimates, which can aid in the determination of cosmological parameters and help calibrate the observable-mass relation for future photometric large sky-area surveys.
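The LOSVD-to-mass step can be illustrated with a toy scaling-relation estimate (assumptions: a plain sample standard deviation stands in for the LOSVD, and the normalization and slope are generic values of the Evrard et al. 2008 sigma-M200 form, not the thesis's calibration; `losvd_mass` is a hypothetical helper, and real analyses add membership cuts, robust dispersion estimators, and h(z) factors):

```python
import numpy as np

def losvd_mass(velocities_kms, sigma15=1082.9, alpha=0.3361):
    """Toy dynamical mass from line-of-sight velocities, inverting a relation
    of the form sigma = sigma15 * (M200 / 1e15 Msun)**alpha."""
    sigma = np.std(velocities_kms, ddof=1)            # km/s, simple LOSVD
    return 1e15 * (sigma / sigma15) ** (1.0 / alpha)  # M_sun

rng = np.random.default_rng(1)
members = rng.normal(0.0, 1000.0, size=500)  # km/s offsets from the cluster mean
print(f"{losvd_mass(members):.2e}")          # of order 10^15 M_sun
```

The steep slope (1/alpha ≈ 3) is why small errors in the measured dispersion translate into large mass errors, motivating the machine-learning alternatives explored in the thesis.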
NASA Technical Reports Server (NTRS)
Lennington, R. K.; Johnson, J. K.
1979-01-01
An efficient procedure is developed that clusters data using a completely unsupervised clustering algorithm and then uses labeled pixels either to label the resulting clusters or to perform a stratified estimate using the clusters as strata. Three clustering algorithms, CLASSY, AMOEBA, and ISOCLS, are compared for efficiency. Three stratified estimation schemes and three labeling schemes are also considered and compared.
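The clusters-as-strata idea can be sketched as follows (a minimal sketch with made-up data; `stratified_estimate` and the crop-indicator example are illustrative assumptions, not the procedure's actual code):

```python
import numpy as np

def stratified_estimate(strata, labeled_idx, labeled_y):
    """Stratified estimate of a population mean: unsupervised clusters act as
    strata, and only a small labeled subset of pixels carries the quantity of
    interest. Each stratum mean is weighted by the stratum's share of pixels."""
    estimate = 0.0
    for s in np.unique(strata):
        weight = np.mean(strata == s)          # stratum share of all pixels
        in_s = strata[labeled_idx] == s        # labeled pixels in stratum s
        if in_s.any():
            estimate += weight * labeled_y[in_s].mean()
    return float(estimate)

# 100 pixels fall into two clusters; the labeled subset has 2 pixels per stratum.
strata = np.array([0] * 80 + [1] * 20)
labeled_idx = np.array([0, 1, 80, 81])
labeled_y = np.array([1.0, 1.0, 0.0, 0.0])     # e.g. a crop / non-crop indicator
print(stratified_estimate(strata, labeled_idx, labeled_y))  # 0.8
```

Because the strata come from the full unlabeled image, the handful of labeled pixels is leveraged far more efficiently than by a simple mean over the labeled subset (which here would give 0.5 instead of 0.8).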
Shah, Prakesh S.; McDonald, Sarah D.; Barrett, Jon; Synnes, Anne; Robson, Kate; Foster, Jonathan; Pasquier, Jean-Charles; Joseph, K.S.; Piedboeuf, Bruno; Lacaze-Masmonteil, Thierry; O'Brien, Karel; Shivananda, Sandesh; Chaillet, Nils; Pechlivanoglou, Petros
2018-01-01
Background: Preterm birth (birth before 37 wk of gestation) occurs in about 8% of pregnancies in Canada and is associated with high mortality and morbidity rates that substantially affect infants, their families and the health care system. Our overall goal is to create a transdisciplinary platform, the Canadian Preterm Birth Network (CPTBN), where investigators, stakeholders and families will work together to improve childhood outcomes of preterm neonates. Methods: Our national cohort will include 24 maternal-fetal/obstetrical units, 31 neonatal intensive care units and 26 neonatal follow-up programs across Canada with planned linkages to provincial health information systems. Three broad clusters of projects will be undertaken. Cluster 1 will focus on quality-improvement efforts that use the Evidence-based Practice for Improving Quality method to evaluate information from the CPTBN database and review the current literature, then identify potentially better health care practices and implement identified strategies. Cluster 2 will assess the impact of current practices and practice changes in maternal, perinatal and neonatal care on maternal, neonatal and neurodevelopmental outcomes. Cluster 3 will evaluate the effect of preterm birth on babies, their families and the health care system by integrating CPTBN data, parent feedback, and national and provincial database information in order to identify areas where more parental support is needed, and also generate robust estimates of resource use, cost and cost-effectiveness around preterm neonatal care. Interpretation: These collaborative efforts will create a flexible, transdisciplinary, evaluable and informative research and quality-improvement platform that supports programs, projects and partnerships focused on improving outcomes of preterm neonates. PMID:29348260
Relative risk estimates from spatial and space-time scan statistics: Are they biased?
Prates, Marcos O.; Kulldorff, Martin; Assunção, Renato M.
2014-01-01
The purely spatial and space-time scan statistics have been used successfully by many scientists to detect and evaluate geographical disease clusters. Although the scan statistic has high power in correctly identifying a cluster, no study has considered the estimates of the cluster relative risk in the detected cluster. In this paper we evaluate whether there is any bias in these estimated relative risks. Intuitively, one may expect the estimated relative risk to have an upward bias, since the scan statistic cherry-picks high-rate areas to include in the cluster. We show that this intuition is correct for clusters with low statistical power, but with medium to high power the bias becomes negligible. The same behaviour is not observed for the prospective space-time scan statistic, where there is an increasingly conservative downward bias of the relative risk as the power to detect the cluster increases. PMID:24639031
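The cherry-picking intuition for the low-power case can be reproduced with a toy Monte Carlo (this is not the scan statistic itself, which maximizes a likelihood ratio over spatial windows; here the hottest single region stands in for the detected cluster, and the populations and rates are made up):

```python
import numpy as np

rng = np.random.default_rng(42)
n_regions, pop, base_rate, true_rr = 20, 1000, 0.01, 1.5
rates = np.full(n_regions, base_rate)
rates[0] *= true_rr                      # a weak (low-power) cluster in region 0

est = []
for _ in range(2000):
    cases = rng.poisson(pop * rates)
    hot = np.argmax(cases)               # the "scan" keeps the hottest region
    rate_in = cases[hot] / pop
    rate_out = (cases.sum() - cases[hot]) / (pop * (n_regions - 1))
    est.append(rate_in / rate_out)

print(np.mean(est))   # well above the true 1.5: upward selection bias
```

Because the detected region is selected for having an extreme count, its post-selection rate ratio systematically overshoots the true relative risk; with a much stronger cluster the true cluster dominates the selection and the bias shrinks, matching the paper's finding.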
An Information-Theoretic-Cluster Visualization for Self-Organizing Maps.
Brito da Silva, Leonardo Enzo; Wunsch, Donald C
2018-06-01
Improved data visualization is a significant tool for enhancing cluster analysis. In this paper, an information-theoretic method for cluster visualization using self-organizing maps (SOMs) is presented. The information-theoretic visualization (IT-vis) has the same structure as the unified distance matrix, but instead of depicting Euclidean distances between adjacent neurons, it displays the similarity between the distributions associated with adjacent neurons. Each SOM neuron has an associated subset of the data set whose cardinality controls the granularity of the IT-vis and from which the first- and second-order statistics are computed and used to estimate the corresponding probability density functions. These are used to calculate the similarity measure, based on Rényi's quadratic cross entropy and cross information potential (CIP). The introduced visualizations combine the low computational cost and kernel estimation properties of the representative CIP and the data structure representation of a single-linkage-based grouping algorithm to generate an enhanced SOM-based visualization. The visual quality of the IT-vis is assessed by comparing it with other visualization methods for several real-world and synthetic benchmark data sets; thus, this paper also contains a significant literature survey. The experiments demonstrate the cluster-revealing capabilities of the IT-vis, in which cluster boundaries are sharply captured. Additionally, the information-theoretic visualizations are used to perform clustering of the SOM. Compared with other methods, the IT-vis of large SOMs yielded the best results in this paper, for which the quality of the final partitions was evaluated using external validity indices.
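The CIP similarity between the samples of two adjacent neurons can be sketched with Gaussian kernels (a generic sample-based estimator with made-up 1-D data, not the paper's implementation, which works from the neurons' fitted statistics):

```python
import numpy as np

def cross_information_potential(x, y, sigma=1.0):
    """Kernel estimate of the cross information potential between two 1-D
    samples; Renyi's quadratic cross entropy is -log of this quantity.
    Convolving two width-sigma Gaussian kernels yields one of width sqrt(2)*sigma."""
    d = x[:, None] - y[None, :]
    s2 = 2.0 * sigma ** 2
    return float(np.mean(np.exp(-d ** 2 / (2.0 * s2)) / np.sqrt(2.0 * np.pi * s2)))

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, 200)    # data mapped to one SOM neuron
b = rng.normal(0.2, 1.0, 200)    # a similar neighbouring neuron
c = rng.normal(5.0, 1.0, 200)    # a dissimilar neuron, across a cluster boundary
print(cross_information_potential(a, b) > cross_information_potential(a, c))  # True
```

High CIP between adjacent neurons means overlapping distributions (same cluster); low CIP marks a boundary, which is what the IT-vis renders in place of plain Euclidean distance.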
NASA Astrophysics Data System (ADS)
Schrabback, T.; Applegate, D.; Dietrich, J. P.; Hoekstra, H.; Bocquet, S.; Gonzalez, A. H.; von der Linden, A.; McDonald, M.; Morrison, C. B.; Raihan, S. F.; Allen, S. W.; Bayliss, M.; Benson, B. A.; Bleem, L. E.; Chiu, I.; Desai, S.; Foley, R. J.; de Haan, T.; High, F. W.; Hilbert, S.; Mantz, A. B.; Massey, R.; Mohr, J.; Reichardt, C. L.; Saro, A.; Simon, P.; Stern, C.; Stubbs, C. W.; Zenteno, A.
2018-02-01
We present an HST/Advanced Camera for Surveys (ACS) weak gravitational lensing analysis of 13 massive high-redshift (zmedian = 0.88) galaxy clusters discovered in the South Pole Telescope (SPT) Sunyaev-Zel'dovich Survey. This study is part of a larger campaign that aims to robustly calibrate mass-observable scaling relations over a wide range in redshift to enable improved cosmological constraints from the SPT cluster sample. We introduce new strategies to ensure that systematics in the lensing analysis do not degrade constraints on cluster scaling relations significantly. First, we efficiently remove cluster members from the source sample by selecting very blue galaxies in V - I colour. Our estimate of the source redshift distribution is based on Cosmic Assembly Near-infrared Deep Extragalactic Legacy Survey (CANDELS) data, where we carefully mimic the source selection criteria of the cluster fields. We apply a statistical correction for systematic photometric redshift errors as derived from Hubble Ultra Deep Field data and verified through spatial cross-correlations. We account for the impact of lensing magnification on the source redshift distribution, finding that this is particularly relevant for shallower surveys. Finally, we account for biases in the mass modelling caused by miscentring and uncertainties in the concentration-mass relation using simulations. In combination with temperature estimates from Chandra we constrain the normalization of the mass-temperature scaling relation ln (E(z)M500c/1014 M⊙) = A + 1.5ln (kT/7.2 keV) to A=1.81^{+0.24}_{-0.14}(stat.) {± } 0.09(sys.), consistent with self-similar redshift evolution when compared to lower redshift samples. Additionally, the lensing data constrain the average concentration of the clusters to c_200c=5.6^{+3.7}_{-1.8}.
Yelland, Lisa N; Salter, Amy B; Ryan, Philip
2011-10-15
Modified Poisson regression, which combines a log Poisson regression model with robust variance estimation, is a useful alternative to log binomial regression for estimating relative risks. Previous studies have shown, both analytically and by simulation, that modified Poisson regression is appropriate for independent prospective data. This method is often applied to clustered prospective data, despite a lack of evidence to support its use in this setting. The purpose of this article is to evaluate the performance of the modified Poisson regression approach for estimating relative risks from clustered prospective data, using generalized estimating equations to account for clustering. A simulation study is conducted to compare log binomial regression and modified Poisson regression for analyzing clustered data from intervention and observational studies. Both methods generally perform well in terms of bias, type I error, and coverage. Unlike log binomial regression, modified Poisson regression is not prone to convergence problems. The methods are contrasted using example data sets from two large studies. The results presented in this article support the use of modified Poisson regression as an alternative to log binomial regression for analyzing clustered prospective data when clustering is taken into account by using generalized estimating equations.
The Halo Boundary of Galaxy Clusters in the SDSS
NASA Astrophysics Data System (ADS)
Baxter, Eric; Chang, Chihway; Jain, Bhuvnesh; Adhikari, Susmita; Dalal, Neal; Kravtsov, Andrey; More, Surhud; Rozo, Eduardo; Rykoff, Eli; Sheth, Ravi K.
2017-05-01
Analytical models and simulations predict a rapid decline in the halo density profile associated with the transition from the “infalling” regime outside the halo to the “collapsed” regime within the halo. Using data from SDSS, we explore evidence for such a feature in the density profiles of galaxy clusters using several different approaches. We first estimate the steepening of the outer galaxy density profile around clusters, finding evidence for truncation of the halo profile. Next, we measure the galaxy density profile around clusters using two sets of galaxies selected on color. We find evidence of an abrupt change in galaxy colors that coincides with the location of the steepening of the density profile. Since galaxies that have completed orbits within the cluster are more likely to be quenched of star formation and thus appear redder, this abrupt change in galaxy color can be associated with the transition from single-stream to multi-stream regimes. We also use a standard model comparison approach to measure evidence for a “splashback”-like feature, but find that this approach is very sensitive to modeling assumptions. Finally, we perform measurements using an independent cluster catalog to test for potential systematic errors associated with cluster selection. We identify several avenues for future work: improved understanding of the small-scale galaxy profile, lensing measurements, identification of proxies for the halo accretion rate, and other tests. With upcoming data from the DES, KiDS, and HSC surveys, we can expect significant improvements in the study of halo boundaries.
A Self-Adaptive Fuzzy c-Means Algorithm for Determining the Optimal Number of Clusters
Wang, Zhihao; Yi, Jing
2016-01-01
To address the shortcoming of the fuzzy c-means (FCM) algorithm of needing to know the number of clusters in advance, this paper proposed a new self-adaptive method to determine the optimal number of clusters. Firstly, a density-based algorithm was put forward. The algorithm, according to the characteristics of the dataset, automatically determined the possible maximum number of clusters instead of using the empirical rule √n, and obtained the optimal initial cluster centroids, improving on the limitation of FCM that randomly selected cluster centroids lead the convergence result to a local minimum. Secondly, by introducing a penalty function, this paper proposed a new fuzzy clustering validity index based on fuzzy compactness and separation, which ensured that when the number of clusters approaches the number of objects in the dataset, the value of the clustering validity index does not monotonically decrease toward zero, so that the optimal number of clusters does not lose robustness and decisiveness. Then, based on these studies, a self-adaptive FCM algorithm was put forward to estimate the optimal number of clusters by an iterative trial-and-error process. At last, experiments were done on the UCI, KDD Cup 1999, and synthetic datasets, which showed that the method not only effectively determined the optimal number of clusters, but also reduced the number of FCM iterations while giving a stable clustering result. PMID:28042291
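The trial-and-error loop over candidate cluster numbers can be sketched as follows (a simplified sketch: plain FCM with random initialization plus a generic Xie-Beni-style compactness/separation index, not the paper's density-based initialization or penalty-function index; the two-blob data is made up):

```python
import numpy as np

def fcm(X, c, m=2.0, n_iter=100, seed=0):
    """Plain fuzzy c-means: returns cluster centres V and memberships U."""
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=len(X))        # random fuzzy partition
    for _ in range(n_iter):
        W = U ** m
        V = (W.T @ X) / W.sum(axis=0)[:, None]        # membership-weighted centroids
        D = np.linalg.norm(X[:, None] - V[None], axis=2) + 1e-12
        # u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))
        U = 1.0 / ((D[:, :, None] / D[:, None, :]) ** (2.0 / (m - 1.0))).sum(axis=2)
    return V, U

def xie_beni(X, V, U, m=2.0):
    """Compactness / separation validity index (smaller is better)."""
    D2 = ((X[:, None] - V[None]) ** 2).sum(axis=2)
    compactness = ((U ** m) * D2).sum()
    sep = ((V[:, None] - V[None]) ** 2).sum(axis=2)
    np.fill_diagonal(sep, np.inf)
    return compactness / (len(X) * sep.min())

rng = np.random.default_rng(1)
X = np.vstack([rng.normal((0, 0), 0.5, (30, 2)),
               rng.normal((10, 10), 0.5, (30, 2))])
kmax = int(np.sqrt(len(X)))                           # the sqrt(n) empirical rule
scores = {c: xie_beni(X, *fcm(X, c)) for c in range(2, kmax + 1)}
best = min(scores, key=scores.get)
print(best)  # 2
```

Running FCM for every c from 2 to √n and keeping the index minimum is exactly the brute-force baseline the paper improves upon: its density-based initialization shrinks the candidate range and its penalty term keeps the index well-behaved as c grows.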
Study of a few cluster candidates in the Magellanic Bridge
NASA Astrophysics Data System (ADS)
Choudhury, Samyaday; Subramaniam, Annapurni; Sohn, Young-Jong
2018-06-01
The Magellanic Clouds (LMC & SMC) are gas-rich, metal-poor dwarf satellite galaxies of our Milky Way that are interacting with each other. The Magellanic Bridge (MB), joining the larger and smaller Clouds, is considered a signature of this interaction process. Studies have revealed that the MB, apart from gas, also hosts stellar populations and star clusters, but the census of clusters with well-estimated parameters within the MB is still incomplete. In this work, we study a sample of 9 previously cataloged star clusters in the MB region. We use Washington C, Harris R and Cousins I band data from the literature, taken with the 4-m Blanco telescope, to estimate the cluster properties (size, age, reddening). We also identify and separate genuine cluster candidates from possible clusters/asterisms. Increasing the number of genuine cluster candidates with well-estimated parameters is important in the context of understanding cluster formation and evolution in such a low-metallicity, tidally disrupted environment. The clusters studied here can also help estimate distances to different parts of the MB, as recent studies indicate that the portions of the MB near the SMC are closer to us than the LMC.
Grady, Sue C
2010-12-01
Research on local racial residential segregation and health generally utilizes census tract boundaries as a proxy for the neighborhoods within which to estimate individual exposures. Census tracts, however, may not accurately reflect the neighborhood environments in which people live and interact. Census tract geography may also capture non-exchangeable populations in socially stratified cities, impacting statistical assumptions of independence. To address these concerns, this study assessed the impact of racial residential segregation (i.e., racial isolation and racial clusters) on low birth weight (i.e., intrauterine growth retardation (IUGR) and preterm birth) in the Detroit Metropolitan Area using optimized neighborhood boundary definitions. Automated zone-matching (AZM) methodology was applied to redefine neighborhoods (zones). Maternal and infant health information was obtained from Michigan's vital statistics birth registry (n=137,965) for the years 2004-2006. Multilevel models were estimated to assess the effect of high racial isolation and high racial clusters on IUGR and preterm birth, controlling for maternal race, single marital status, smoking and area-level poverty. The results from this study showed that high racial isolation had a significant effect on IUGR, while the odds of preterm birth were higher in racially clustered zones. African American mothers were at increased odds of having IUGR or preterm infants compared with other mothers; however, these disparities were reduced in highly segregated zones. The predicted incidence of IUGR across racially isolated zones and census tracts differed, indicating a modifiable areal unit problem (MAUP). MAUP effects were not observed in models predicting preterm incidence in high racial clusters, or IUGR or preterm incidence in high poverty areas, demonstrating the stability and reliability of these estimates. 
Future research should continue to optimize neighborhood boundary definitions, while assessing the sensitivity of segregation measures to changes in scale, to improve our understanding of segregation impacts on racial disparities in low birth weight.
NASA Astrophysics Data System (ADS)
Chen, Xin; Liu, Li; Zhou, Sida; Yue, Zhenjiang
2016-09-01
Reduced-order models (ROMs) based on snapshots from high-fidelity CFD simulations have received great attention recently due to their capability of capturing the features of complex geometries and flow configurations. To improve the efficiency and precision of ROMs, it is indispensable to add extra sampling points to the initial snapshots, since the number of sampling points needed to achieve an adequately accurate ROM is generally unknown a priori, while a large number of initial sampling points reduces the parsimony of the ROMs. A fuzzy-clustering-based adding-point strategy is proposed, in which the fuzzy clustering acts as an indicator of the regions where the precision of the ROM is relatively low. The proposed method is applied to construct ROMs for benchmark mathematical examples and a numerical example of hypersonic aerothermodynamics prediction for a typical control surface. The proposed method achieves a 34.5% improvement in efficiency over the estimated-mean-squared-error prediction algorithm while showing the same level of prediction accuracy.
Population-based epidemiology of non-fatal injuries in Tehran, Iran.
Hashemi, Esmatolsadat; Zangi, Mahdi; Sadeghi-Bazargani, Homayoun; Soares, Joaquim; Viitasara, Eija; Mohammadi, Reza
2018-01-01
Background: Our aim in this survey was to explore the descriptive epidemiology of injuries in Tehran in 2012 and to report recall-based estimates of injury incidence rates. Methods: A population survey was conducted in Tehran during 2012, in which a total of 8626 participants were enrolled. Cluster sampling was used to draw samples in 100 clusters with a pre-specified cluster size of 25 households per cluster. Data were collected on demographic features and accident and injury characteristics based on the International Classification of Diseases (ICD-10). Results: A total of 618 injuries within a 3-month recall period were reported, of which 597 cases (96.6%) were unintentional injuries. More than 82% of all injuries were caused by exposure to inanimate mechanical forces, traffic accidents, falls and burns. Over 80% of the traffic injuries happened among men (P<0.001). About 43% of the unintentional injuries were mild injuries. After the age of 40, women, unlike men, had a higher risk of being injured. The estimated annual incidence rate for all types of injuries was 284.8 per 1000 (95% CI: 275.4-294.4), and for unintentional injuries it was 275.2 per 1000. Conclusion: Injuries are a major health problem in Tehran, with a high reported incidence. The situation has not substantially improved over recent years and urgently needs to be adequately addressed. As the incidence rate was estimated from participant recall, the real incidence rate may be even higher than that reported in the current study.
Linear clusters of galaxies - A999 and A1016
NASA Astrophysics Data System (ADS)
Chapman, G. N. F.; Geller, M. J.; Huchra, J. P.
1987-09-01
The authors have measured 44 new redshifts in A999 and 40 in A1016; these clusters are both "linear" according to Rood and Sastry (1971) and Struble and Rood (1982, 1984). With 20 cluster members in A999 and 22 in A1016, the authors can estimate the probability that these clusters are actually drawn from spherically symmetric distributions. By comparing the clusters with Monte Carlo King models, they find that A999 is probably intrinsically spherically symmetric, but A1016 is probably linear. The authors estimate that ≳2% of a catalog of spherically symmetric clusters might be erroneously classified as linear. They use the data to estimate the virial masses of these systems. The authors reassess the cluster-galaxy alignment analysis of Adams, Strom, and Strom (1980) and examine the relationship between the luminosity and morphological type of the cluster members and the cluster itself.
Spatial cluster detection using dynamic programming.
Sverchkov, Yuriy; Jiang, Xia; Cooper, Gregory F
2012-03-25
The task of spatial cluster detection involves finding spatial regions where some property deviates from the norm or the expected value. In a probabilistic setting this task can be expressed as finding a region where some event is significantly more likely than usual. Spatial cluster detection is of interest in fields such as biosurveillance, mining of astronomical data, military surveillance, and analysis of fMRI images. In almost all such applications we are interested both in whether a cluster exists in the data and, if it exists, in finding the most accurate characterization of the cluster. We present a general dynamic programming algorithm for grid-based spatial cluster detection. The algorithm can be used both for Bayesian maximum a posteriori (MAP) estimation of the most likely spatial distribution of clusters and for Bayesian model averaging over a large space of spatial cluster distributions to compute the posterior probability of an unusual spatial clustering. The algorithm is explained and evaluated in the context of a biosurveillance application, specifically the detection and identification of influenza outbreaks based on emergency department visits. A relatively simple underlying model is constructed for the purpose of evaluating the algorithm, and the algorithm is evaluated using the model and semi-synthetic test data. When compared to baseline methods, tests indicate that the new algorithm can improve MAP estimates under certain conditions: the greedy algorithm we compared our method to was found to be more sensitive to smaller outbreaks, but as outbreak size increases, in terms of both area affected and proportion of individuals affected, our method overtakes the greedy algorithm in spatial precision and recall. The new algorithm performs on par with baseline methods in the task of Bayesian model averaging.
We conclude that the dynamic programming algorithm performs on par with other available methods for spatial cluster detection, and we point to its low computational cost and extensibility as advantages favoring further research on and use of the algorithm.
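For intuition about what grid-based spatial cluster detection computes, a common brute-force baseline is a Kulldorff-style scan over all axis-aligned rectangles for the maximum Poisson likelihood ratio. The sketch below is that baseline only; it is not the paper's dynamic programming or Bayesian algorithm, and the toy grid and counts are invented for illustration.

```python
import math

def max_llr_rectangle(counts, baselines):
    """Exhaustively scan all axis-aligned rectangles of a grid and return the
    one maximizing the Poisson likelihood ratio (Kulldorff-style statistic)."""
    R, Cn = len(counts), len(counts[0])
    Ctot = sum(map(sum, counts))
    Btot = sum(map(sum, baselines))
    best, best_rect = 0.0, None
    for r1 in range(R):
        for r2 in range(r1, R):
            for c1 in range(Cn):
                for c2 in range(c1, Cn):
                    C = sum(counts[r][c] for r in range(r1, r2 + 1)
                            for c in range(c1, c2 + 1))
                    B = sum(baselines[r][c] for r in range(r1, r2 + 1)
                            for c in range(c1, c2 + 1))
                    if B == 0 or B == Btot or C == 0 or C == Ctot:
                        continue
                    if C / B <= (Ctot - C) / (Btot - B):
                        continue   # only elevated-rate regions count as clusters
                    llr = (C * math.log(C / B)
                           + (Ctot - C) * math.log((Ctot - C) / (Btot - B))
                           - Ctot * math.log(Ctot / Btot))
                    if llr > best:
                        best, best_rect = llr, (r1, c1, r2, c2)
    return best_rect, best

# 4x4 grid with uniform baseline and an injected outbreak in two adjacent cells
base = [[10] * 4 for _ in range(4)]
obs = [row[:] for row in base]
obs[1][1] += 20
obs[1][2] += 20
rect, llr = max_llr_rectangle(obs, base)
print(rect, round(llr, 2))
```

The scan recovers exactly the injected rectangle; the dynamic programming approach of the paper targets the same kind of grid search but with far lower cost and a Bayesian posterior rather than a frequentist statistic.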
Open-Source Sequence Clustering Methods Improve the State Of the Art.
Kopylova, Evguenia; Navas-Molina, Jose A; Mercier, Céline; Xu, Zhenjiang Zech; Mahé, Frédéric; He, Yan; Zhou, Hong-Wei; Rognes, Torbjørn; Caporaso, J Gregory; Knight, Rob
2016-01-01
Sequence clustering is a common early step in amplicon-based microbial community analysis, when raw sequencing reads are clustered into operational taxonomic units (OTUs) to reduce the run time of subsequent analysis steps. Here, we evaluated the performance of recently released state-of-the-art open-source clustering software products, namely, OTUCLUST, Swarm, SUMACLUST, and SortMeRNA, against current principal options (UCLUST and USEARCH) in QIIME, hierarchical clustering methods in mothur, and USEARCH's most recent clustering algorithm, UPARSE. All the latest open-source tools showed promising results, reporting up to 60% fewer spurious OTUs than UCLUST, indicating that the underlying clustering algorithm can vastly reduce the number of spuriously derived OTUs. Furthermore, we observed that stringent quality filtering, such as is done in UPARSE, can cause a significant underestimation of species abundance and diversity, leading to incorrect biological results. Swarm, SUMACLUST, and SortMeRNA have been included in the QIIME 1.9.0 release. IMPORTANCE Massive collections of next-generation sequencing data call for fast, accurate, and easily accessible bioinformatics algorithms to perform sequence clustering. A comprehensive benchmark is presented, including open-source tools and the popular USEARCH suite. Simulated, mock, and environmental communities were used to analyze sensitivity, selectivity, species diversity (alpha and beta), and taxonomic composition. The results demonstrate that recent clustering algorithms can significantly improve accuracy and preserve estimated diversity without the application of aggressive filtering. Moreover, these tools are all open source, apply multiple levels of multithreading, and scale to the demands of modern next-generation sequencing data, which is essential for the analysis of massive multidisciplinary studies such as the Earth Microbiome Project (EMP) (J. A. Gilbert, J. K. Jansson, and R. Knight, BMC Biol 12:69, 2014, http://dx.doi.org/10.1186/s12915-014-0069-1).
Intra-class correlation estimates for assessment of vitamin A intake in children.
Agarwal, Girdhar G; Awasthi, Shally; Walter, Stephen D
2005-03-01
In many community-based surveys, multi-level sampling is inherent in the design. In the design of these studies, especially to calculate the appropriate sample size, investigators need good estimates of the intra-class correlation coefficient (ICC), along with the cluster size, to adjust for variance inflation due to clustering at each level. The present study used data on the assessment of clinical vitamin A deficiency and intake of vitamin A-rich food in children in a district in India. For the survey, 16 households were sampled from 200 villages nested within eight randomly-selected blocks of the district. ICCs and components of variance were estimated from a three-level hierarchical random effects analysis of variance model. Estimates of ICCs and variance components were obtained at the village and block levels. Between-cluster variation was evident at each level of clustering. ICCs were inversely related to cluster size, but the design effect could be substantial for large clusters. At the block level, most ICC estimates were below 0.07. At the village level, ICC estimates ranged from 0.014 to 0.45. These estimates may provide useful information for the design of epidemiological studies in which the sampled (or allocated) units range in size from households to large administrative zones.
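The variance inflation the authors adjust for is the standard design effect, DEFF = 1 + (m − 1) · ICC for m units per cluster. The sketch below is a transcription of that formula, not the authors' code; the ICC value used in the example is illustrative (chosen within the ranges reported in the abstract) and the total sample size is hypothetical.

```python
def design_effect(cluster_size, icc):
    """DEFF = 1 + (m - 1) * ICC: variance inflation from sampling m units per cluster."""
    return 1 + (cluster_size - 1) * icc

def effective_sample_size(n, cluster_size, icc):
    """Sample size after discounting for within-cluster correlation."""
    return n / design_effect(cluster_size, icc)

# e.g. 25 households per cluster and a village-level ICC of 0.05 (illustrative):
# a nominal n of 3200 buys the precision of roughly 1455 independent observations
deff = design_effect(25, 0.05)
print(round(deff, 2), round(effective_sample_size(3200, 25, 0.05)))
```

This is why even a small ICC matters at large cluster sizes, as the abstract notes.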
Salmaso, S.; Rota, M. C.; Ciofi Degli Atti, M. L.; Tozzi, A. E.; Kreidl, P.
1999-01-01
In 1998, a series of regional cluster surveys (the ICONA Study) was conducted simultaneously in 19 out of the 20 regions in Italy to estimate the mandatory immunization coverage of children aged 12-24 months with oral poliovirus (OPV), diphtheria-tetanus (DT) and viral hepatitis B (HBV) vaccines, as well as optional immunization coverage with pertussis, measles and Haemophilus influenzae b (Hib) vaccines. The study children were born in 1996 and selected from birth registries using the Expanded Programme of Immunization (EPI) cluster sampling technique. Interviews with parents were conducted to determine each child's immunization status and the reasons for any missed or delayed vaccinations. The study population comprised 4310 children aged 12-24 months. Coverage for both mandatory and optional vaccinations differed by region. The overall coverage for mandatory vaccines (OPV, DT and HBV) exceeded 94%, but only 79% had been vaccinated in accord with the recommended schedule (i.e. during the first year of life). Immunization coverage for pertussis increased from 40% (1993 survey) to 88%, but measles coverage (56%) remained inadequate for controlling the disease; Hib coverage was 20%. These results confirm that in Italy the coverage of only mandatory immunizations is satisfactory. Pertussis immunization coverage has improved dramatically since the introduction of acellular vaccines. A greater effort to educate parents and physicians is still needed to improve the coverage of optional vaccinations in all regions. PMID:10593033
SEMIPARAMETRIC EFFICIENT ESTIMATION FOR SHARED-FRAILTY MODELS WITH DOUBLY-CENSORED CLUSTERED DATA
Wang, Jane-Ling
2018-01-01
In this paper, we investigate frailty models for clustered survival data that are subject to both left- and right-censoring, termed “doubly-censored data”. This model extends the current survival literature by broadening the application of frailty models from right-censored data to the more complicated situation with additional left censoring. Our approach is motivated by a recent Hepatitis B study in which the sample consists of families. We adopt a likelihood approach that targets the nonparametric maximum likelihood estimator (NPMLE). A new algorithm is proposed, which not only works well for clustered data but also improves on existing algorithms for independent, doubly-censored data, the special case in which the frailty variable is a constant equal to one. This special case is well known to be a computational challenge due to the left-censoring feature of the data. The new algorithm not only resolves this challenge but also accommodates the additional frailty variable effectively. Asymptotic properties of the NPMLE are established, along with semiparametric efficiency of the NPMLE for the finite-dimensional parameters. The consistency of bootstrap estimators for the standard errors of the NPMLE is also discussed. We conduct simulations to illustrate the numerical performance and robustness of the proposed algorithm, which is also applied to the Hepatitis B data. PMID:29527068
Link prediction with node clustering coefficient
NASA Astrophysics Data System (ADS)
Wu, Zhihao; Lin, Youfang; Wang, Jing; Gregory, Steve
2016-06-01
Predicting missing links in incomplete complex networks efficiently and accurately is still a challenging problem. The recently proposed Cannistraci-Alanis-Ravasi (CAR) index shows the power of local link/triangle information in improving link-prediction accuracy. Inspired by the idea of employing local link/triangle information, we propose a new similarity index that incorporates more local structure information. In our method, local link/triangle structure information is conveyed directly by the clustering coefficient of common neighbours. The reason the clustering coefficient is effective in estimating the contribution of a common neighbour is that it employs the links existing between the neighbours of that common neighbour, and these links have the same structural position, relative to the common neighbour, as the candidate link. In our experiments, three estimators are used to evaluate the accuracy of link prediction algorithms: precision, AUP and AUC. Experimental results on ten test networks drawn from various fields show that our new index is more effective in predicting missing links than the CAR index, especially for networks with a low correlation between the number of common neighbours and the number of links between common neighbours.
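A clustering-coefficient-weighted common-neighbour score of the kind described can be sketched as follows. The exact functional form of the paper's index is an assumption here: this sketch simply sums the local clustering coefficients of the common neighbours of a candidate pair, and the toy graph is invented.

```python
from itertools import combinations

def clustering_coefficient(adj, z):
    """Fraction of pairs of z's neighbours that are themselves linked."""
    nbrs = adj[z]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))

def cc_score(adj, x, y):
    """Score candidate link (x, y) by the clustering coefficients of common neighbours."""
    return sum(clustering_coefficient(adj, z) for z in adj[x] & adj[y])

# toy graph: two triangles hanging off the candidate link's endpoints 0 and 3
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
adj = {n: set() for n in range(5)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
print(round(cc_score(adj, 0, 3), 4))   # common neighbours are 1 and 2
```

Each common neighbour with a tightly interlinked neighbourhood contributes close to 1, so well-embedded common neighbours dominate the score, which is the intuition the abstract describes.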
Lensing convergence in galaxy clustering in ΛCDM and beyond
NASA Astrophysics Data System (ADS)
Villa, Eleonora; Di Dio, Enea; Lepori, Francesca
2018-04-01
We study the impact of neglecting lensing magnification in galaxy clustering analyses for future galaxy surveys, considering the ΛCDM model and two extensions: massive neutrinos and modifications of General Relativity. Our study focuses on the biases in the constraints and in the estimation of the cosmological parameters. We perform a comprehensive investigation of these two effects for the upcoming photometric and spectroscopic galaxy surveys Euclid and SKA for different redshift binning configurations. We also provide a fitting formula for the magnification bias of SKA. Our results show that the information present in the lensing contribution does improve the constraints on the modified gravity parameters, whereas the lensing constraining power is negligible for the ΛCDM parameters. For photometric surveys the estimation of all parameters is biased if lensing is not taken into account; this effect is particularly significant for the modified gravity parameters. Conversely, for spectroscopic surveys the bias is below one sigma for all parameters. Our findings show the importance of including lensing in galaxy clustering analyses for testing General Relativity and for constraining the parameters that describe its modifications.
NASA Technical Reports Server (NTRS)
Reese, Erik D.; Mroczkowski, Tony; Menanteau, Felipe; Hilton, Matt; Sievers, Jonathan; Aguirre, Paula; Appel, John William; Baker, Andrew J.; Bond, J. Richard; Das, Sudeep;
2011-01-01
We present follow-up observations with the Sunyaev-Zel'dovich Array (SZA) of optically-confirmed galaxy clusters found in the equatorial survey region of the Atacama Cosmology Telescope (ACT): ACT-CL J0022-0036, ACT-CL J2051+0057, and ACT-CL J2337+0016. ACT-CL J0022-0036 is a newly-discovered, massive (10^15 M_sun), high-redshift (z = 0.81) cluster revealed by ACT through the Sunyaev-Zel'dovich effect (SZE). Deep, targeted observations with the SZA allow us to probe a broader range of cluster spatial scales, better disentangle cluster decrements from radio point source emission, and derive more robust integrated SZE flux and mass estimates than we can with ACT data alone. For the two clusters we detect with the SZA we compute the integrated SZE signal and derive masses from the SZA data only. ACT-CL J2337+0016, also known as Abell 2631, has archival Chandra data that allow an additional X-ray-based mass estimate. Optical richness is also used to estimate cluster masses and shows good agreement with the SZE and X-ray-based estimates. Based on the point sources detected by the SZA in these three cluster fields and an extrapolation to ACT's frequency, we estimate that point sources could be contaminating the SZE decrement at the ≲ 20% level for some fraction of clusters.
Properties of star clusters - I. Automatic distance and extinction estimates
NASA Astrophysics Data System (ADS)
Buckner, Anne S. M.; Froebrich, Dirk
2013-12-01
Determining star cluster distances is essential to analyse their properties and distribution in the Galaxy. In particular, it is desirable to have a reliable, purely photometric distance estimation method for large samples of newly discovered cluster candidates, e.g. from the Two Micron All Sky Survey, the UK Infrared Deep Sky Survey Galactic Plane Survey and VVV. Here, we establish an automatic method to estimate distances and reddening from near-infrared photometry alone, without the use of isochrone fitting. We employ a decontamination procedure of JHK photometry to determine the density of stars foreground to clusters and a Galactic model to estimate distances. We then calibrate the method using clusters with known properties. This allows us to establish distance estimates with better than 40 per cent accuracy. We apply our method to determine the extinction and distance values of 378 known open clusters and 397 cluster candidates from the list of Froebrich, Scholz & Raftery. We find that the sample is biased towards clusters at a distance of approximately 3 kpc, with typical distances between 2 and 6 kpc. Using the cluster distances and extinction values, we investigate how the average extinction per kiloparsec of distance changes as a function of Galactic longitude. We find a systematic dependence that can be approximated by A_H(l) [mag kpc^-1] = 0.10 + 0.001 × |l - 180°|/° for regions more than 60° from the Galactic Centre.
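The longitude dependence quoted at the end of the abstract is straightforward to evaluate directly; the sketch below is a one-line transcription of the fitted relation, not the authors' pipeline, and the sample longitudes are arbitrary.

```python
def extinction_per_kpc(l_deg):
    """Mean H-band extinction per kpc, A_H(l) = 0.10 + 0.001 * |l - 180| mag/kpc,
    quoted for sight-lines more than 60 degrees from the Galactic Centre."""
    return 0.10 + 0.001 * abs(l_deg - 180.0)

print(extinction_per_kpc(180.0))  # Galactic anticentre: minimum of the relation
print(extinction_per_kpc(90.0))   # 90 degrees from the anticentre
```

The relation is symmetric about the anticentre (l = 180°) and rises towards the Galactic Centre, consistent with the abstract's description.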
NASA Astrophysics Data System (ADS)
Monna, A.; Seitz, S.; Zitrin, A.; Geller, M. J.; Grillo, C.; Mercurio, A.; Greisel, N.; Halkola, A.; Suyu, S. H.; Postman, M.; Rosati, P.; Balestra, I.; Biviano, A.; Coe, D.; Fabricant, D. G.; Hwang, H. S.; Koekemoer, A.
2015-02-01
We use velocity dispersion measurements of 21 individual cluster members in the core of Abell 383, obtained with the Multiple Mirror Telescope Hectospec, to separate the galaxy and the smooth dark halo (DH) lensing contributions. While lensing usually constrains the overall, projected mass density, the innovative use of velocity dispersion measurements as a proxy for the masses of individual cluster members breaks inherent degeneracies and allows us to (a) refine the constraints on single galaxy masses and on the galaxy mass-to-light scaling relation and, as a result, (b) refine the constraints on the DM-only map, a high-end goal of lens modelling. The knowledge of cluster member velocity dispersions improves the fit by 17 per cent in terms of the image reproduction χ2, or 20 per cent in terms of the rms. The constraints on the mass parameters improve by ~10 per cent for the DH, while for the galaxy component they are refined correspondingly by ~50 per cent, including the galaxy halo truncation radius. For an L* galaxy with M^*_B = -20.96, for example, we obtain a best-fitting truncation radius r_tr^* = 20.5^{+9.6}_{-6.7} kpc and velocity dispersion σ* = 324 ± 17 km s^-1. Moreover, by performing the surface brightness reconstruction of the southern giant arc, we improve the constraints on r_tr of two nearby cluster members, which have measured velocity dispersions, by more than ~30 per cent. We estimate the stripped mass for these two galaxies, obtaining results that are consistent with numerical simulations. In the future, we plan to apply this analysis to other galaxy clusters for which velocity dispersions of member galaxies are available.
Old, L.; Wojtak, R.; Pearce, F. R.; ...
2017-12-20
With the advent of wide-field cosmological surveys, we are approaching samples of hundreds of thousands of galaxy clusters. While such large numbers will help reduce statistical uncertainties, the control of systematics in cluster masses is crucial. Here we examine the effects of an important source of systematic uncertainty in galaxy-based cluster mass estimation techniques: the presence of significant dynamical substructure. Dynamical substructure manifests as dynamically distinct subgroups in phase-space, indicating an ‘unrelaxed’ state. This issue affects around a quarter of clusters in a generally selected sample. We employ a set of mock clusters whose masses have been measured homogeneously with commonly used galaxy-based mass estimation techniques (kinematic, richness, caustic, radial methods). We use these to study how the relation between observationally estimated and true cluster mass depends on the presence of substructure, as identified by various popular diagnostics. We find that the scatter for an ensemble of clusters does not increase dramatically for clusters with dynamical substructure. However, we find a systematic bias for all methods, such that clusters with significant substructure have higher measured masses than their relaxed counterparts. This bias depends on cluster mass: the most massive clusters are largely unaffected by the presence of significant substructure, but masses are significantly overestimated for lower-mass clusters, by ~10 per cent at 10^14 M_sun and ≳20 per cent for ≲10^13.5 M_sun. Finally, the use of cluster samples with different levels of substructure can bias certain cosmological parameters by up to a level comparable to the typical uncertainties in current cosmological studies.
NASA Astrophysics Data System (ADS)
Nelson, Kaylea; Lau, Erwin T.; Nagai, Daisuke; Rudd, Douglas H.; Yu, Liang
2014-02-01
The use of galaxy clusters as cosmological probes hinges on our ability to measure their masses accurately and with high precision. Hydrostatic mass is one of the most common methods for estimating the masses of individual galaxy clusters, which suffer from biases due to departures from hydrostatic equilibrium. Using a large, mass-limited sample of massive galaxy clusters from a high-resolution hydrodynamical cosmological simulation, in this work we show that in addition to turbulent and bulk gas velocities, acceleration of gas introduces biases in the hydrostatic mass estimate of galaxy clusters. In unrelaxed clusters, the acceleration bias is comparable to the bias due to non-thermal pressure associated with merger-induced turbulent and bulk gas motions. In relaxed clusters, the mean mass bias due to acceleration is small (≲ 3%), but the scatter in the mass bias can be reduced by accounting for gas acceleration. Additionally, this acceleration bias is greater in the outskirts of higher redshift clusters where mergers are more frequent and clusters are accreting more rapidly. Since gas acceleration cannot be observed directly, it introduces an irreducible bias for hydrostatic mass estimates. This acceleration bias places limits on how well we can recover cluster masses from future X-ray and microwave observations. We discuss implications for cluster mass estimates based on X-ray, Sunyaev-Zel'dovich effect, and gravitational lensing observations and their impact on cluster cosmology.
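For context, the hydrostatic mass estimate whose biases the authors quantify is conventionally derived from the equation of hydrostatic equilibrium for an ideal intracluster gas (standard textbook form, not reproduced from the paper):

```latex
% Hydrostatic equilibrium, dP/dr = -\rho G M(<r)/r^2, combined with the
% ideal-gas pressure P = \rho k_B T / (\mu m_p), gives the enclosed mass
M_{\mathrm{HSE}}(<r) = -\frac{k_B T(r)\, r}{G\, \mu m_p}
  \left( \frac{\mathrm{d}\ln\rho}{\mathrm{d}\ln r}
       + \frac{\mathrm{d}\ln T}{\mathrm{d}\ln r} \right)
```

Turbulent motions, bulk flows and gas acceleration add non-thermal terms to the momentum equation that this expression omits, which is the origin of the biases discussed in the abstract.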
Punzo, Antonio; Ingrassia, Salvatore; Maruotti, Antonello
2018-04-22
A time-varying latent variable model is proposed to jointly analyze multivariate mixed-support longitudinal data. The proposal can be viewed as an extension of hidden Markov regression models with fixed covariates (HMRMFCs), which are the state of the art for modelling longitudinal data, with a special focus on the underlying clustering structure. HMRMFCs are inadequate for applications in which a clustering structure can be identified in the distribution of the covariates, as the clustering is independent of the covariates distribution. Here, hidden Markov regression models with random covariates are introduced by explicitly specifying state-specific distributions for the covariates, with the aim of improving the recovery of the clusters in the data with respect to a fixed-covariates paradigm. The class of hidden Markov regression models with random covariates is defined with a focus on the exponential family, in a generalized linear model framework. Model identifiability conditions are sketched, an expectation-maximization algorithm is outlined for parameter estimation, and various implementation and operational issues are discussed. Properties of the estimators of the regression coefficients, as well as of the hidden path parameters, are evaluated through simulation experiments and compared with those of HMRMFCs. The method is applied to physical activity data. Copyright © 2018 John Wiley & Sons, Ltd.
Chen, Ling; Feng, Yanqin; Sun, Jianguo
2017-10-01
This paper discusses regression analysis of clustered failure time data, which occur when the failure times of interest are collected from clusters. In particular, we consider the situation where the correlated failure times of interest may be related to cluster sizes. For inference, we present two estimation procedures, the weighted estimating equation-based method and the within-cluster resampling-based method, when the correlated failure times of interest arise from a class of additive transformation models. The former makes use of the inverses of cluster sizes as weights in the estimating equations, while the latter can be easily implemented using existing software packages for right-censored failure time data. An extensive simulation study is conducted and indicates that the proposed approaches work well in situations both with and without informative cluster size. They are applied to the dental study that motivated this research.
A Multicriteria Decision Making Approach for Estimating the Number of Clusters in a Data Set
Peng, Yi; Zhang, Yong; Kou, Gang; Shi, Yong
2012-01-01
Determining the number of clusters in a data set is an essential yet difficult step in cluster analysis. Since this task involves more than one criterion, it can be modeled as a multiple criteria decision making (MCDM) problem. This paper proposes an MCDM-based approach to estimate the number of clusters for a given data set. In this approach, MCDM methods consider different numbers of clusters as alternatives and the outputs of any clustering algorithm on validity measures as criteria. The proposed method is examined in an experimental study using three MCDM methods, the well-known k-means clustering algorithm, ten relative measures, and fifteen public-domain UCI machine learning data sets. The results show that MCDM methods work fairly well in estimating the number of clusters in the data and outperform the ten relative measures considered in the study. PMID:22870181
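The idea of scoring candidate cluster numbers on several validity criteria can be sketched as follows; a simple rank-sum aggregation stands in for the paper's full MCDM methods, and the data set is synthetic:

```python
# Sketch: treat each candidate k as an "alternative" and three relative
# validity measures as "criteria", then aggregate ranks (a Borda-style
# stand-in for a full MCDM method; an assumption, not the paper's procedure).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (silhouette_score,
                             calinski_harabasz_score,
                             davies_bouldin_score)

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

candidates = range(2, 8)
scores = []
for k in candidates:
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores.append((silhouette_score(X, labels),
                   calinski_harabasz_score(X, labels),
                   -davies_bouldin_score(X, labels)))  # negated: lower is better

scores = np.array(scores)
ranks = scores.argsort(axis=0).argsort(axis=0)   # per-criterion rank, 0 = worst
best_k = list(candidates)[int(ranks.sum(axis=1).argmax())]
```

The rank sum rewards a k that does well across all criteria, rather than trusting any single validity index.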
Deducing the Milky Way's Massive Cluster Population
NASA Astrophysics Data System (ADS)
Hanson, M. M.; Popescu, B.; Larsen, S. S.; Ivanov, V. D.
2010-11-01
Recent near-infrared surveys of the galactic plane have been used to identify new massive cluster candidates. Follow-up studies indicate that about half are not true, gravitationally bound clusters. These false positives are created by high-density fields of unassociated stars, often due to a sight-line of reduced extinction. What is not so easy to estimate is the number of false negatives: clusters which exist but are not currently being detected by our surveys. In order to derive critical characteristics of the Milky Way's massive cluster population, such as the cluster mass function and cluster lifetimes, one must be able to estimate the characteristics of these false negatives. Our group has taken on the daunting task of attempting such an estimate by first creating the stellar cluster imaging simulation program, MASSCLEAN. I will present our preliminary models and methods for deriving the biases of current searches.
R package to estimate intracluster correlation coefficient with confidence interval for binary data.
Chakraborty, Hrishikesh; Hossain, Akhtar
2018-03-01
The Intracluster Correlation Coefficient (ICC) is a major parameter of interest in cluster randomized trials that measures the degree to which responses within the same cluster are correlated. Several types of ICC estimators and their confidence intervals (CIs) have been suggested in the literature for binary data. Studies have compared the relative weaknesses and advantages of ICC estimators and their CIs for binary data and identified situations where one is advantageous in practical research. The commonly used statistical computing systems currently facilitate estimation of only a very few variants of the ICC and its CI. To address the limitations of current statistical packages, we developed an R package, ICCbin, to facilitate estimating the ICC and its CI for binary responses using different methods. The ICCbin package is designed to provide estimates of the ICC in 16 different ways, including analysis of variance methods, moments-based estimation, direct probabilistic methods, correlation-based estimation, and resampling methods. The CI of the ICC is estimated using 5 different methods. The package also generates clustered binary data with an exchangeable correlation structure. ICCbin provides two functions for users: rcbin() generates clustered binary data, and iccbin() estimates the ICC and its CI. Users can choose the appropriate ICC and CI estimates from the wide selection in the outputs. The R package ICCbin presents very flexible and easy-to-use ways to generate clustered binary data and to estimate the ICC and its CI for binary responses using different methods. The package is freely available for use with R from the CRAN repository (https://cran.r-project.org/package=ICCbin). We believe that this package can be a very useful tool for researchers designing cluster randomized trials with binary outcomes. Copyright © 2017 Elsevier B.V. All rights reserved.
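As one example of the estimator families such a package covers, the classic one-way ANOVA estimator of the ICC can be sketched in Python (ICCbin itself is an R package; the beta-binomial simulation and the function below are illustrative, not taken from it):

```python
# Sketch of the one-way ANOVA estimator of the ICC for binary responses:
# ICC = (MSB - MSW) / (MSB + (n0 - 1) * MSW), with n0 the "average"
# cluster size. Data are simulated, not drawn from the ICCbin package.
import numpy as np

rng = np.random.default_rng(1)

def icc_anova(clusters):
    """clusters: list of 1-D arrays of 0/1 responses, one array per cluster."""
    k = len(clusters)
    n = np.array([len(c) for c in clusters])
    N = n.sum()
    grand = np.concatenate(clusters).mean()
    means = np.array([c.mean() for c in clusters])
    msb = np.sum(n * (means - grand) ** 2) / (k - 1)          # between-cluster MS
    msw = sum(((c - m) ** 2).sum()
              for c, m in zip(clusters, means)) / (N - k)     # within-cluster MS
    n0 = (N - (n ** 2).sum() / N) / (k - 1)
    return (msb - msw) / (msb + (n0 - 1) * msw)

# Correlated binary data via a beta-binomial model with true ICC = 0.2:
# per-cluster success probability ~ Beta(2, 2), so ICC = 1/(2 + 2 + 1).
data = [rng.binomial(1, rng.beta(2.0, 2.0), size=20) for _ in range(50)]
est = icc_anova(data)
```

With 50 clusters of size 20 the estimate should land close to the true value of 0.2.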
Factors influencing the quality of life of haemodialysis patients according to symptom cluster.
Shim, Hye Yeung; Cho, Mi-Kyoung
2018-05-01
To identify the characteristics in each symptom cluster and factors influencing the quality of life of haemodialysis patients in Korea according to cluster. Despite developments in renal replacement therapy, haemodialysis still restricts the activities of daily living due to pain and impairs physical functioning induced by the disease and its complications. Descriptive survey. Two hundred and thirty dialysis patients aged >18 years. They completed self-administered questionnaires of Dialysis Symptom Index and Kidney Disease Quality of Life instrument-Short Form 1.3. To determine the optimal number of clusters, the collected data were analysed using polytomous variable latent class analysis in R software (poLCA) to estimate the latent class models and the latent class regression models for polytomous outcome variables. Differences in characteristics, symptoms and QOL according to the symptom cluster of haemodialysis patients were analysed using the independent t test and chi-square test. The factors influencing the QOL according to symptom cluster were identified using hierarchical multiple regression analysis. Physical and emotional symptoms were significantly more severe, and the QOL was significantly worse in Cluster 1 than in Cluster 2. The factors influencing the QOL were spouse, job, insurance type and physical and emotional symptoms in Cluster 1, with these variables having an explanatory power of 60.9%. Physical and emotional symptoms were the only influencing factors in Cluster 2, and they had an explanatory power of 37.4%. Mitigating the symptoms experienced by haemodialysis patients and improving their QOL require educational and therapeutic symptom management interventions that are tailored according to the characteristics and symptoms in each cluster. 
The findings of this study are expected to lead to practical guidelines for addressing the symptoms experienced by haemodialysis patients, and they provide basic information for developing nursing interventions to manage these symptoms and improve the QOL of these patients. © 2017 John Wiley & Sons Ltd.
Accounting for One-Group Clustering in Effect-Size Estimation
ERIC Educational Resources Information Center
Citkowicz, Martyna; Hedges, Larry V.
2013-01-01
In some instances, intentionally or not, study designs are such that there is clustering in one group but not in the other. This paper describes methods for computing effect size estimates and their variances when there is clustering in only one group and the analysis has not taken that clustering into account. The authors provide the effect size…
The halo boundary of galaxy clusters in the SDSS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Baxter, Eric; Chang, Chihway; Jain, Bhuvnesh
2017-05-18
Analytical models and simulations predict a rapid decline in the halo density profile associated with the transition from the "infalling" regime outside the halo to the "collapsed" regime within the halo. Using data from SDSS, we explore evidence for such a feature in the density profiles of galaxy clusters using several different approaches. We first estimate the steepening of the outer galaxy density profile around clusters, finding evidence for truncation of the halo profile. Next, we measure the galaxy density profile around clusters using two sets of galaxies selected on color. We find evidence of an abrupt change in galaxy colors that coincides with the location of the steepening of the density profile. Since galaxies that have completed orbits within the cluster are more likely to be quenched of star formation and thus appear redder, this abrupt change in galaxy color can be associated with the transition from single-stream to multi-stream regimes. We also use a standard model comparison approach to measure evidence for a "splashback"-like feature, but find that this approach is very sensitive to modeling assumptions. Finally, we perform measurements using an independent cluster catalog to test for potential systematic errors associated with cluster selection. We identify several avenues for future work: improved understanding of the small-scale galaxy profile, lensing measurements, identification of proxies for the halo accretion rate, and other tests. With upcoming data from the DES, KiDS, and HSC surveys, we can expect significant improvements in the study of halo boundaries.
NASA Astrophysics Data System (ADS)
Acebron, Ana; Jullo, Eric; Limousin, Marceau; Tilquin, André; Giocoli, Carlo; Jauzac, Mathilde; Mahler, Guillaume; Richard, Johan
2017-09-01
Strong gravitational lensing by galaxy clusters is a fundamental tool to study dark matter and constrain the geometry of the Universe. Recently, the Hubble Space Telescope Frontier Fields programme has allowed a significant improvement of mass and magnification measurements, but lensing models still have a residual root mean square between 0.2 arcsec and a few arcseconds that is not yet completely understood. Systematic errors have to be better understood and treated in order to use strong lensing clusters as reliable cosmological probes. We have analysed two simulated Hubble-Frontier-Fields-like clusters from the Hubble Frontier Fields Comparison Challenge, Ares and Hera. We use several estimators (relative bias on magnification, density profiles, ellipticity and orientation) to quantify the goodness of our reconstructions by comparing our multiple models, optimized with the parametric software lenstool, with the input models. We have quantified the impact of systematic errors arising, first, from the choice of different density profiles and configurations and, secondly, from the availability of constraints (spectroscopic or photometric redshifts, redshift ranges of the background sources) in the parametric modelling of strong lensing galaxy clusters and therefore on the retrieval of cosmological parameters. We find that substructures in the outskirts have a significant impact on the position of the multiple images, yielding tighter cosmological contours. The need for wide-field imaging around massive clusters is thus reinforced. We show that competitive cosmological constraints can also be obtained with complex multimodal clusters, and that photometric redshifts improve the constraints on cosmological parameters when considering a narrow range of (spectroscopic) redshifts for the sources.
Machine learning approaches for estimation of prediction interval for the model output.
Shrestha, Durga L; Solomatine, Dimitri P
2006-03-01
A novel method for estimating prediction uncertainty using machine learning techniques is presented. Uncertainty is expressed in the form of the two quantiles (constituting the prediction interval) of the underlying distribution of prediction errors. The idea is to partition the input space into different zones or clusters having similar model errors using fuzzy c-means clustering. The prediction interval is constructed for each cluster on the basis of the empirical distribution of the errors associated with all instances belonging to the cluster under consideration, and is propagated from each cluster to the examples according to their membership grades in each cluster. Then a regression model is built for in-sample data using the computed prediction limits as targets, and finally, this model is applied to estimate the prediction intervals (limits) for out-of-sample data. The method was tested on artificial and real hydrologic data sets using various machine learning techniques. Preliminary results show that the method is superior to other methods of estimating prediction intervals. A new method for evaluating the performance of prediction interval estimation is proposed as well.
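The clustering step can be sketched as follows; plain k-means and hard cluster memberships stand in for the paper's fuzzy c-means and membership-grade propagation, and the final regression step for out-of-sample data is omitted:

```python
# Sketch: cluster the input space, then form a prediction interval per
# cluster from the empirical quantiles of that cluster's model errors.
# Hard k-means replaces fuzzy c-means (a simplification for illustration).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 1))
noise_sd = 0.2 + 0.3 * X[:, 0]              # heteroscedastic noise
y_true = np.sin(X[:, 0])
y_obs = y_true + rng.normal(0, noise_sd)
errors = y_obs - y_true                     # in practice: residuals of a fitted model

km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
labels = km.labels_

# 90% interval from the 5th/95th empirical error percentiles in each cluster.
intervals = {c: np.percentile(errors[labels == c], [5, 95]) for c in range(4)}

def predict_interval(x):
    """Return the (lo, hi) error interval of the cluster containing x."""
    c = km.predict(np.atleast_2d(x))[0]
    lo, hi = intervals[c]
    return lo, hi
```

Because the noise grows with x, clusters covering larger inputs receive wider intervals, which is exactly the behavior the input-space partitioning is meant to capture.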
Cluster mass estimators from CMB temperature and polarization lensing
NASA Astrophysics Data System (ADS)
Hu, Wayne; DeDeo, Simon; Vale, Chris
2007-12-01
Upcoming Sunyaev-Zel'dovich surveys are expected to return ~10^4 intermediate-mass clusters at high redshift. Their average masses must be known to the same accuracy as desired for the dark energy properties. Internal to the surveys, the cosmic microwave background (CMB) potentially provides a source for lensing mass measurements whose distance is precisely known and behind all clusters. We develop statistical mass estimators from six quadratic combinations of CMB temperature and polarization fields that can simultaneously recover large-scale structure and cluster mass profiles. The performance of these estimators on idealized Navarro-Frenk-White (NFW) clusters suggests that surveys with a ~1' beam and 10 μK-arcmin noise in uncontaminated temperature maps can make a ~10σ detection, or equivalently a ~10% mass measurement, for each set of 10^3 clusters. With internal or external acoustic-scale E-polarization measurements, the ET cross-correlation estimator can provide a stringent test for contaminants on a first detection at ~1/3 the significance. For surveys that reach below 3 μK-arcmin, the EB cross-correlation estimator should provide the most precise measurements and potentially the strongest control over contaminants.
ARK: Aggregation of Reads by K-Means for Estimation of Bacterial Community Composition.
Koslicki, David; Chatterjee, Saikat; Shahrivar, Damon; Walker, Alan W; Francis, Suzanna C; Fraser, Louise J; Vehkaperä, Mikko; Lan, Yueheng; Corander, Jukka
2015-01-01
Estimation of bacterial community composition from high-throughput sequenced 16S rRNA gene amplicons is a key task in microbial ecology. Since the sequence data from each sample typically consist of a large number of reads and are adversely impacted by different levels of biological and technical noise, accurate analysis of such large datasets is challenging. There has been a recent surge of interest in using compressed sensing inspired and convex-optimization based methods to solve the estimation problem for bacterial community composition. These methods typically rely on summarizing the sequence data by frequencies of low-order k-mers and matching this information statistically with a taxonomically structured database. Here we show that the accuracy of the resulting community composition estimates can be substantially improved by aggregating the reads from a sample with an unsupervised machine learning approach prior to the estimation phase. The aggregation of reads is a pre-processing approach where we use a standard K-means clustering algorithm that partitions a large set of reads into subsets with reasonable computational cost to provide several vectors of first order statistics instead of only single statistical summarization in terms of k-mer frequencies. The output of the clustering is then processed further to obtain the final estimate for each sample. The resulting method is called Aggregation of Reads by K-means (ARK), and it is based on a statistical argument via mixture density formulation. ARK is found to improve the fidelity and robustness of several recently introduced methods, with only a modest increase in computational complexity. An open source, platform-independent implementation of the method in the Julia programming language is freely available at https://github.com/dkoslicki/ARK. A Matlab implementation is available at http://www.ee.kth.se/ctsoftware.
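The aggregation step can be sketched as follows, assuming toy reads of differing GC content rather than real 16S amplicons:

```python
# Sketch of ARK's pre-processing: map reads to k-mer frequency vectors and
# partition them with ordinary k-means, so each sample yields several
# first-order summaries instead of one. Toy reads, not real sequence data.
import numpy as np
from itertools import product
from sklearn.cluster import KMeans

K = 3
KMERS = {''.join(p): i for i, p in enumerate(product('ACGT', repeat=K))}

def kmer_freqs(read):
    """Normalized K-mer frequency vector of a read."""
    v = np.zeros(len(KMERS))
    for i in range(len(read) - K + 1):
        v[KMERS[read[i:i + K]]] += 1
    return v / v.sum()

rng = np.random.default_rng(0)

def sim_read(gc, length=100):
    """Simulate a read with the given GC content (illustrative model)."""
    p = [(1 - gc) / 2, gc / 2, gc / 2, (1 - gc) / 2]   # A, C, G, T
    return ''.join(rng.choice(list('ACGT'), size=length, p=p))

reads = [sim_read(0.3) for _ in range(100)] + [sim_read(0.7) for _ in range(100)]
F = np.array([kmer_freqs(r) for r in reads])

# Partition the reads; downstream composition estimation would then run
# once per cluster centroid rather than once per pooled sample.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(F)
```

Even this crude feature set separates the two simulated populations, which is the property ARK exploits: clusters of reads are more homogeneous summaries than a single pooled k-mer vector.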
Local pulmonary structure classification for computer-aided nodule detection
NASA Astrophysics Data System (ADS)
Bahlmann, Claus; Li, Xianlin; Okada, Kazunori
2006-03-01
We propose a new method of classifying the local structure types, such as nodules, vessels, and junctions, in thoracic CT scans. This classification is important in the context of computer-aided detection (CAD) of lung nodules. The proposed method can be used as a post-processing component of any lung CAD system. In such a scenario, the classification results provide an effective means of removing false positives caused by vessels and junctions, thus improving overall performance. As its main advantage, the proposed solution transforms the complex problem of classifying various 3D topological structures into a much simpler 2D data clustering problem, to which more generic and flexible solutions are available in the literature, and which is better suited for visualization. Given a nodule candidate, our solution first robustly fits an anisotropic Gaussian to the data. The resulting Gaussian center and spread parameters are used to affine-normalize the data domain so as to warp the fitted anisotropic ellipsoid into a fixed-size isotropic sphere. We propose an automatic method to extract a 3D spherical manifold containing the appropriate bounding surface of the target structure. Scale selection is performed by a data-driven entropy minimization approach. The manifold is analyzed for high-intensity clusters, corresponding to protruding structures. Techniques involve EM clustering with automatic mode number estimation, directional statistics, and hierarchical clustering with a modified Bhattacharyya distance. The estimated number of high-intensity clusters explicitly determines the type of pulmonary structure: nodule (0), attached nodule (1), vessel (2), or junction (≥3). We show accurate classification results for selected examples in thoracic CT scans. This local procedure is more flexible and efficient than the current state of the art and will help to improve the accuracy of general lung CAD systems.
Estimating Function Approaches for Spatial Point Processes
NASA Astrophysics Data System (ADS)
Deng, Chong
Spatial point pattern data consist of locations of events that are often of interest in biological and ecological studies. Such data are commonly viewed as a realization from a stochastic process called a spatial point process. To fit a parametric spatial point process model to such data, likelihood-based methods have been widely studied. However, while maximum likelihood estimation is often too computationally intensive for Cox and cluster processes, pairwise likelihood methods such as composite likelihood and Palm likelihood usually suffer from a loss of information because they ignore the correlation among pairs. For many types of correlated data other than spatial point processes, when likelihood-based approaches are not desirable, estimating functions have been widely used for model fitting. In this dissertation, we explore estimating function approaches for fitting spatial point process models. These approaches, which are based on asymptotically optimal estimating function theories, can be used to incorporate the correlation among data and yield more efficient estimators. We conducted a series of studies to demonstrate that these estimating function approaches are good alternatives for balancing the trade-off between computational complexity and estimation efficiency. First, we propose a new estimating procedure that improves the efficiency of the pairwise composite likelihood method in estimating clustering parameters. Our approach combines estimating functions derived from pairwise composite likelihood estimation with estimating functions that account for correlations among the pairwise contributions. Our method can be used to fit a variety of parametric spatial point process models and can yield more efficient estimators for the clustering parameters than pairwise composite likelihood estimation. We demonstrate its efficacy through a simulation study and an application to the longleaf pine data.
Second, we further explore the quasi-likelihood approach for fitting the second-order intensity function of spatial point processes. The original second-order quasi-likelihood is barely feasible due to the intense computation and high memory requirement needed to solve a large linear system. Motivated by the existence of geometrically regular patterns in stationary point processes, we find a lower-dimensional representation of the optimal weight function and propose a reduced second-order quasi-likelihood approach. Through a simulation study, we show that the proposed method not only demonstrates superior performance in fitting the clustering parameter but also relaxes the constraint on the tuning parameter, H. Third, we study the quasi-likelihood-type estimating function that is optimal in a certain class of first-order estimating functions for estimating the regression parameter in spatial point process models. Then, by using a novel spectral representation, we construct an implementation that is computationally much more efficient and can be applied to a more general setup than the original quasi-likelihood method.
Hierarchical modeling of cluster size in wildlife surveys
Royle, J. Andrew
2008-01-01
Clusters or groups of individuals are the fundamental unit of observation in many wildlife sampling problems, including aerial surveys of waterfowl, marine mammals, and ungulates. Explicit accounting of cluster size in models for estimating abundance is necessary because detection of individuals within clusters is not independent and detectability of clusters is likely to increase with cluster size. This induces a cluster size bias in which the average cluster size in the sample is larger than in the population at large. Thus, failure to account for the relationship between detectability and cluster size will tend to yield a positive bias in estimates of abundance or density. I describe a hierarchical modeling framework for accounting for cluster-size bias in animal sampling. The hierarchical model consists of models for the observation process conditional on the cluster size distribution and the cluster size distribution conditional on the total number of clusters. Optionally, a spatial model can be specified that describes variation in the total number of clusters per sample unit. Parameter estimation, model selection, and criticism may be carried out using conventional likelihood-based methods. An extension of the model is described for the situation where measurable covariates at the level of the sample unit are available. Several candidate models within the proposed class are evaluated for aerial survey data on mallard ducks (Anas platyrhynchos).
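The cluster-size bias itself, and a simple inverse-probability correction, can be illustrated with a short simulation (the detection model below is an assumption for illustration, not the paper's hierarchical model):

```python
# Sketch of cluster-size bias: if detection probability grows with cluster
# size, the mean size among *detected* clusters exceeds the population mean.
# Detection model: each member seen independently with probability 0.3, so a
# cluster of size n is detected with probability 1 - 0.7**n (illustrative).
import numpy as np

rng = np.random.default_rng(0)
sizes = rng.poisson(3, size=10000) + 1            # population of cluster sizes
p_detect = 1 - (1 - 0.3) ** sizes
detected = rng.random(10000) < p_detect

pop = sizes.mean()                                # true population mean size
naive = sizes[detected].mean()                    # biased upward

# Horvitz-Thompson style correction: weight detected clusters by 1/p.
w = 1 / p_detect[detected]
corrected = np.sum(w * sizes[detected]) / np.sum(w)
```

The weighted mean undoes the size-biased sampling because each detected cluster stands in for 1/p clusters of its size in the population.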
Liu, Jingxia; Colditz, Graham A
2018-05-01
There is growing interest in conducting cluster randomized trials (CRTs). For simplicity in sample size calculation, the cluster sizes are assumed to be identical across all clusters. However, equal cluster sizes are not guaranteed in practice. Therefore, the relative efficiency (RE) of unequal versus equal cluster sizes has been investigated when testing the treatment effect. One of the most important approaches to analyzing a set of correlated data is the generalized estimating equation (GEE) approach proposed by Liang and Zeger, in which the "working correlation structure" is introduced and the association pattern depends on a vector of association parameters denoted by ρ. In this paper, we utilize GEE models to test the treatment effect in a two-group comparison for continuous, binary, or count data in CRTs. The variances of the estimator of the treatment effect are derived for the different types of outcome. RE is defined as the ratio of the variance of the treatment effect estimator under equal cluster sizes to that under unequal cluster sizes. We discuss a commonly used structure in CRTs, the exchangeable structure, and derive simpler formulas for the RE with continuous, binary, and count outcomes. Finally, REs are investigated for several scenarios of cluster size distributions through simulation studies. We propose an adjusted sample size due to efficiency loss. Additionally, we also propose an optimal sample size estimation based on the GEE models under a fixed budget for known and unknown association parameter (ρ) in the working correlation structure within the cluster. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Cerón-Muñoz, M F; Tonhati, H; Costa, C N; Rojas-Sarmiento, D; Echeverri Echeverri, D M
2004-08-01
Descriptive herd variables (DVHE) were used to explain genotype by environment interactions (G x E) for milk yield (MY) in Brazilian and Colombian production environments and to develop a herd-cluster model to estimate covariance components and genetic parameters for each herd environment group. Data consisted of 180,522 lactation records of 94,558 Holstein cows from 937 Brazilian and 400 Colombian herds. Herds in both countries were jointly grouped in thirds according to 8 DVHE: production level, phenotypic variability, age at first calving, calving interval, percentage of imported semen, lactation length, and herd size. For each DVHE, REML bivariate animal model analyses were used to estimate genetic correlations for MY between upper and lower thirds of the data. Based on estimates of genetic correlations, weights were assigned to each DVHE to group herds in a cluster analysis using the FASTCLUS procedure in SAS. Three clusters were defined, and genetic and residual variance components were heterogeneous among herd clusters. Estimates of heritability in clusters 1 and 3 were 0.28 and 0.29, respectively, but the estimate was larger (0.39) in Cluster 2. The genetic correlations of MY from different clusters ranged from 0.89 to 0.97. The herd-cluster model based on DVHE properly takes into account G x E by grouping similar environments accordingly and seems to be an alternative to simply considering country borders to distinguish between environments.
Identifying novel phenotypes of acute heart failure using cluster analysis of clinical variables.
Horiuchi, Yu; Tanimoto, Shuzou; Latif, A H M Mahbub; Urayama, Kevin Y; Aoki, Jiro; Yahagi, Kazuyuki; Okuno, Taishi; Sato, Yu; Tanaka, Tetsu; Koseki, Keita; Komiyama, Kota; Nakajima, Hiroyoshi; Hara, Kazuhiro; Tanabe, Kengo
2018-07-01
Acute heart failure (AHF) is a heterogeneous disease caused by various cardiovascular (CV) pathophysiology and multiple non-CV comorbidities. We aimed to identify clinically important subgroups to improve our understanding of the pathophysiology of AHF and inform clinical decision-making. We evaluated detailed clinical data of 345 consecutive AHF patients using non-hierarchical cluster analysis of 77 variables, including age, sex, HF etiology, comorbidities, physical findings, laboratory data, electrocardiogram, echocardiogram and treatment during hospitalization. Cox proportional hazards regression analysis was performed to estimate the association between the clusters and clinical outcomes. Three clusters were identified. Cluster 1 (n=108) represented "vascular failure". This cluster had the highest average systolic blood pressure at admission and lung congestion with type 2 respiratory failure. Cluster 2 (n=89) represented "cardiac and renal failure". They had the lowest ejection fraction (EF) and worst renal function. Cluster 3 (n=148) comprised mostly older patients and had the highest prevalence of atrial fibrillation and preserved EF. Death or HF hospitalization within 12-month occurred in 23% of Cluster 1, 36% of Cluster 2 and 36% of Cluster 3 (p=0.034). Compared with Cluster 1, risk of death or HF hospitalization was 1.74 (95% CI, 1.03-2.95, p=0.037) for Cluster 2 and 1.82 (95% CI, 1.13-2.93, p=0.014) for Cluster 3. Cluster analysis may be effective in producing clinically relevant categories of AHF, and may suggest underlying pathophysiology and potential utility in predicting clinical outcomes. Copyright © 2018 Elsevier B.V. All rights reserved.
Semisupervised Clustering by Iterative Partition and Regression with Neuroscience Applications
Qian, Guoqi; Wu, Yuehua; Ferrari, Davide; Qiao, Puxue; Hollande, Frédéric
2016-01-01
Regression clustering is a statistical learning and data mining method, combining unsupervised and supervised learning, that is found in a wide range of applications including artificial intelligence and neuroscience. It performs unsupervised learning when it clusters the data according to their respective unobserved regression hyperplanes. The method also performs supervised learning when it fits regression hyperplanes to the corresponding data clusters. Applying regression clustering in practice requires means of determining the underlying number of clusters in the data, finding the cluster label of each data point, and estimating the regression coefficients of the model. In this paper, we review the estimation and selection issues in regression clustering with regard to least squares and robust statistical methods. We also provide a model selection based technique to determine the number of regression clusters underlying the data. We further develop a computing procedure for regression clustering estimation and selection. Finally, simulation studies are presented for assessing the procedure, together with an analysis of a real data set on RGB cell marking in neuroscience to illustrate and interpret the method. PMID:27212939
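The iterate-between-partition-and-regression loop at the heart of regression clustering can be sketched for least squares with two clusters and one covariate (the robust variants and the model-selection step for the number of clusters are omitted):

```python
# Sketch of regression clustering by iterative partition and regression:
# alternate between fitting a line per cluster and reassigning each point
# to its best-fitting line. Two-cluster least squares toy example.
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-1, 1, n)
truth = rng.integers(0, 2, n)
y = np.where(truth == 0, 2 * x + 1, -2 * x - 1) + rng.normal(0, 0.1, n)
X = np.column_stack([np.ones(n), x])            # intercept + slope design

def fit_two_lines(X, y, iters=30):
    labels = rng.integers(0, 2, len(y))         # random initial partition
    for _ in range(iters):
        # Regression step: least-squares line per cluster.
        betas = [np.linalg.lstsq(X[labels == c], y[labels == c], rcond=None)[0]
                 for c in (0, 1)]
        # Partition step: reassign each point to its best-fitting line.
        resid = np.column_stack([np.abs(y - X @ b) for b in betas])
        new = resid.argmin(axis=1)
        if (np.array_equal(new, labels)
                or min((new == 0).sum(), (new == 1).sum()) < 2):
            break
        labels = new
    return betas, labels, resid[np.arange(len(y)), labels].mean()

# A few random restarts guard against poor local optima.
betas, labels, fit_err = min((fit_two_lines(X, y) for _ in range(5)),
                             key=lambda t: t[2])
```

On this toy data the loop recovers the two underlying lines with slopes near +2 and -2, mirroring the unsupervised-then-supervised alternation the abstract describes.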
Candel, Math J J M; Van Breukelen, Gerard J P
2010-06-30
Adjustments of sample size formulas are given for varying cluster sizes in cluster randomized trials with a binary outcome when testing the treatment effect with mixed effects logistic regression using second-order penalized quasi-likelihood estimation (PQL). Starting from first-order marginal quasi-likelihood (MQL) estimation of the treatment effect, the asymptotic relative efficiency of unequal versus equal cluster sizes is derived. A Monte Carlo simulation study shows this asymptotic relative efficiency to be rather accurate for realistic sample sizes, when employing second-order PQL. An approximate, simpler formula is presented to estimate the efficiency loss due to varying cluster sizes when planning a trial. In many cases sampling 14 per cent more clusters is sufficient to repair the efficiency loss due to varying cluster sizes. Since current closed-form formulas for sample size calculation are based on first-order MQL, planning a trial also requires a conversion factor to obtain the variance of the second-order PQL estimator. In a second Monte Carlo study, this conversion factor turned out to be 1.25 at most. (c) 2010 John Wiley & Sons, Ltd.
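The "14 per cent more clusters" rule of thumb can be reproduced from a widely used Taylor-series approximation to the relative efficiency of unequal versus equal cluster sizes (van Breukelen, Candel and Berger, 2007). The sketch below uses that approximation with the linear-model shrinkage factor; it is an illustration, not the second-order PQL machinery of the paper:

```python
def varying_size_efficiency(nbar, cv, icc):
    """Approximate relative efficiency (RE) of unequal vs equal cluster
    sizes, RE ~= 1 - CV^2 * lam * (1 - lam), and the fraction of extra
    clusters (1/RE - 1) needed to repair the efficiency loss.
    nbar: mean cluster size; cv: coefficient of variation of cluster
    sizes; icc: intra-cluster correlation.  A hedged sketch."""
    lam = nbar * icc / (nbar * icc + 1 - icc)   # cluster-mean shrinkage factor
    re = 1.0 - cv ** 2 * lam * (1.0 - lam)      # relative efficiency <= 1
    return re, 1.0 / re - 1.0
```

For example, with a mean cluster size of 20, a coefficient of variation of 0.7 and an ICC of 0.05, the efficiency loss is about 12%, repaired by sampling roughly 14% more clusters, consistent with the figure quoted above.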
Cosmology with the largest galaxy cluster surveys: going beyond Fisher matrix forecasts
DOE Office of Scientific and Technical Information (OSTI.GOV)
Khedekar, Satej; Majumdar, Subhabrata, E-mail: satej@mpa-garching.mpg.de, E-mail: subha@tifr.res.in
2013-02-01
We make the first detailed MCMC likelihood study of cosmological constraints that are expected from some of the largest, ongoing and proposed, cluster surveys in different wave-bands and compare the estimates to the prevalent Fisher matrix forecasts. Mock catalogs of cluster counts expected from the surveys — eROSITA, WFXT, RCS2, DES and Planck, along with a mock dataset of follow-up mass calibrations are analyzed for this purpose. A fair agreement between MCMC and Fisher results is found only in the case of minimal models. However, for many cases, the marginalized constraints obtained from Fisher and MCMC methods can differ by factors of 30-100%. The discrepancy can be alarmingly large for a time dependent dark energy equation of state, w(a); the Fisher methods are seen to under-estimate the constraints by as much as a factor of 4-5. Typically, Fisher estimates become more and more inappropriate as we move away from ΛCDM, to a constant-w dark energy to varying-w dark energy cosmologies. Fisher analysis also predicts incorrect parameter degeneracies. There are noticeable offsets in the likelihood contours obtained from Fisher methods, caused by an asymmetry in the posterior likelihood distribution as seen through an MCMC analysis. From the standpoint of mass-calibration uncertainties, a high value of unknown scatter about the mean mass-observable relation, and its redshift dependence, is seen to have large degeneracies with the cosmological parameters σ8 and w(a) and can degrade the cosmological constraints considerably. We find that the addition of mass-calibrated cluster datasets can improve dark energy and σ8 constraints by factors of 2-3 from what can be obtained from CMB+SNe+BAO alone. Finally, we show that a joint analysis of datasets of two (or more) different cluster surveys would significantly tighten cosmological constraints from using clusters only. Since details of future cluster surveys are still being planned, we emphasize that optimal survey design must be done using MCMC analysis rather than Fisher forecasting.
Shen, Chung-Wei; Chen, Yi-Hau
2018-03-13
We propose a model selection criterion for semiparametric marginal mean regression based on generalized estimating equations. The work is motivated by a longitudinal study on the physical frailty outcome in the elderly, where the cluster size, that is, the number of the observed outcomes in each subject, is "informative" in the sense that it is related to the frailty outcome itself. The new proposal, called Resampling Cluster Information Criterion (RCIC), is based on the resampling idea utilized in the within-cluster resampling method (Hoffman, Sen, and Weinberg, 2001, Biometrika 88, 1121-1134) and accommodates informative cluster size. The implementation of RCIC, however, is free of performing actual resampling of the data and hence is computationally convenient. Compared with the existing model selection methods for marginal mean regression, the RCIC method incorporates an additional component accounting for variability of the model over within-cluster subsampling, and leads to remarkable improvements in selecting the correct model, regardless of whether the cluster size is informative or not. Applying the RCIC method to the longitudinal frailty study, we identify being female, old age, low income and life satisfaction, and chronic health conditions as significant risk factors for physical frailty in the elderly. © 2018, The International Biometric Society.
Berwanger, Otávio; Guimarães, Hélio P; Laranjeira, Ligia N; Cavalcanti, Alexandre B; Kodama, Alessandra; Zazula, Ana Denise; Santucci, Eliana; Victor, Elivane; Flato, Uri A; Tenuta, Marcos; Carvalho, Vitor; Mira, Vera Lucia; Pieper, Karen S; Mota, Luiz Henrique; Peterson, Eric D; Lopes, Renato D
2012-03-01
Translating evidence into clinical practice in the management of acute coronary syndromes (ACS) is challenging. Few ACS quality improvement interventions have been rigorously evaluated to determine their impact on patient care and clinical outcomes. We designed a pragmatic, 2-arm, cluster-randomized trial involving 34 clusters (Brazilian public hospitals). Clusters were randomized to receive a multifaceted quality improvement intervention (experimental group) or routine practice (control group). The 6-month educational intervention included reminders, care algorithms, a case manager, and distribution of educational materials to health care providers. The primary end point was a composite of evidence-based post-ACS therapies within 24 hours of admission, with the secondary measure of major cardiovascular clinical events (death, nonfatal myocardial infarction, nonfatal cardiac arrest, and nonfatal stroke). Prescription of evidence-based therapies at hospital discharge was also evaluated as part of the secondary outcomes. All analyses were performed by the intention-to-treat principle and took the cluster design into account using individual-level regression modeling (generalized estimating equations). If proven effective, this multifaceted intervention would have wide use as a means of promoting optimal use of evidence-based interventions for the management of ACS. Copyright © 2012 Mosby, Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Behera, Rakesh K.; Watanabe, Taku; Andersson, David A.; Uberuaga, Blas P.; Deo, Chaitanya S.
2016-04-01
Oxygen interstitials in UO2+x significantly affect the thermophysical properties and microstructural evolution of the oxide nuclear fuel. In hyperstoichiometric urania (UO2+x), these oxygen interstitials form different types of defect clusters, which have different migration behavior. In this study we used kinetic Monte Carlo (kMC) to evaluate diffusivities of oxygen interstitials, accounting for mono- and di-interstitial clusters. Our results indicate that the predicted diffusivities increase significantly at higher non-stoichiometry (x > 0.01) for di-interstitial clusters compared to a mono-interstitial-only model. The diffusivities calculated at higher temperatures compare better with experimental values than those at lower temperatures (< 973 K). We discuss the resulting activation energies for diffusion obtained with the mono- and di-interstitial models. We also performed a sensitivity analysis to estimate the effect of the input di-interstitial binding energies on the predicted diffusivities and activation energies. While this article only discusses mono- and di-interstitials in evaluating the oxygen diffusion response in UO2+x, future improvements to the model will primarily focus on including energetic definitions of larger stable interstitial clusters reported in the literature. The addition of larger clusters to the kMC model is expected to improve the comparison of oxygen transport in UO2+x with experiment.
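The kMC approach referenced above can be illustrated with a toy residence-time (BKL-style) algorithm for a single particle hopping on a 1D lattice; the rates, lattice, and estimator below are pedagogical assumptions, not the UO2+x defect-cluster model of the study:

```python
import math
import random

def kmc_diffusivity(rate, n_traj=200, n_steps=400, a=1.0, seed=1):
    """Estimate the diffusivity D of a random walker whose left and right
    hop rates both equal `rate`, drawing exponentially distributed waiting
    times (total escape rate 2*rate) and using <x^2> = 2*D*t in 1D.
    For this toy model the exact answer is D = rate * a**2."""
    random.seed(seed)
    total_rate = 2.0 * rate
    est = 0.0
    for _ in range(n_traj):
        x, t = 0.0, 0.0
        for _ in range(n_steps):
            x += a if random.random() < 0.5 else -a          # pick a hop
            t += -math.log(random.random()) / total_rate     # advance the clock
        est += x * x / (2.0 * t)
    return est / n_traj          # average the per-trajectory estimates
```

Averaging over trajectories matters here: a single trajectory's x²/(2t) is roughly chi-squared distributed and far too noisy on its own.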
Knox, Stephanie A; Chondros, Patty
2004-01-01
Background Cluster sample study designs are cost effective, however cluster samples violate the simple random sample assumption of independence of observations. Failure to account for the intra-cluster correlation of observations when sampling through clusters may lead to an under-powered study. Researchers therefore need estimates of intra-cluster correlation for a range of outcomes to calculate sample size. We report intra-cluster correlation coefficients observed within a large-scale cross-sectional study of general practice in Australia, where the general practitioner (GP) was the primary sampling unit and the patient encounter was the unit of inference. Methods Each year the Bettering the Evaluation and Care of Health (BEACH) study recruits a random sample of approximately 1,000 GPs across Australia. Each GP completes details of 100 consecutive patient encounters. Intra-cluster correlation coefficients were estimated for patient demographics, morbidity managed and treatments received. Intra-cluster correlation coefficients were estimated for descriptive outcomes and for associations between outcomes and predictors and were compared across two independent samples of GPs drawn three years apart. Results Between April 1999 and March 2000, a random sample of 1,047 Australian general practitioners recorded details of 104,700 patient encounters. Intra-cluster correlation coefficients for patient demographics ranged from 0.055 for patient sex to 0.451 for language spoken at home. Intra-cluster correlations for morbidity variables ranged from 0.005 for the management of eye problems to 0.059 for management of psychological problems. Intra-cluster correlation for the association between two variables was smaller than the descriptive intra-cluster correlation of each variable. When compared with the April 2002 to March 2003 sample (1,008 GPs) the estimated intra-cluster correlation coefficients were found to be consistent across samples. 
Conclusions The demonstrated precision and reliability of the estimated intra-cluster correlations indicate that these coefficients will be useful for calculating sample sizes in future general practice surveys that use the GP as the primary sampling unit. PMID:15613248
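Descriptive intra-cluster correlations like those reported above are commonly estimated from a one-way ANOVA decomposition. A minimal moment estimator is sketched below; this is a standard textbook formula assumed here for illustration, and the BEACH analyses may have used different software or estimators:

```python
import numpy as np

def anova_icc(groups):
    """One-way ANOVA (moment) estimator of the intra-cluster correlation.
    `groups` is a list of per-cluster observation lists.  A sketch for
    balanced or mildly unbalanced designs."""
    k = len(groups)
    sizes = np.array([len(g) for g in groups])
    n = sizes.sum()
    grand = np.concatenate(groups).mean()
    # between- and within-cluster sums of squares
    ssb = sum(m * (np.mean(g) - grand) ** 2 for m, g in zip(sizes, groups))
    ssw = sum(((np.asarray(g) - np.mean(g)) ** 2).sum() for g in groups)
    msb = ssb / (k - 1)
    msw = ssw / (n - k)
    n0 = (n - (sizes ** 2).sum() / n) / (k - 1)   # effective cluster size
    return (msb - msw) / (msb + (n0 - 1) * msw)
```

The ANOVA estimator can be slightly negative when clusters are no more alike than chance, which is why reported ICCs near zero should not be over-interpreted.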
Sample size calculation for stepped wedge and other longitudinal cluster randomised trials.
Hooper, Richard; Teerenstra, Steven; de Hoop, Esther; Eldridge, Sandra
2016-11-20
The sample size required for a cluster randomised trial is inflated compared with an individually randomised trial because outcomes of participants from the same cluster are correlated. Sample size calculations for longitudinal cluster randomised trials (including stepped wedge trials) need to take account of at least two levels of clustering: the clusters themselves and times within clusters. We derive formulae for sample size for repeated cross-section and closed cohort cluster randomised trials with normally distributed outcome measures, under a multilevel model allowing for variation between clusters and between times within clusters. Our formulae agree with those previously described for special cases such as crossover and analysis of covariance designs, although simulation suggests that the formulae could underestimate required sample size when the number of clusters is small. Whether using a formula or simulation, a sample size calculation requires estimates of nuisance parameters, which in our model include the intracluster correlation, cluster autocorrelation, and individual autocorrelation. A cluster autocorrelation less than 1 reflects a situation where individuals sampled from the same cluster at different times have less correlated outcomes than individuals sampled from the same cluster at the same time. Nuisance parameters could be estimated from time series obtained in similarly clustered settings with the same outcome measure, using analysis of variance to estimate variance components. Copyright © 2016 John Wiley & Sons, Ltd.
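For intuition, the single-level ancestor of these calculations inflates an individually randomised sample size by the design effect 1 + (m - 1)ρ. The sketch below shows only that parallel-trial step; the stepped wedge and closed cohort formulae derived in the paper additionally require the cluster autocorrelation and individual autocorrelation discussed above:

```python
import math

def clusters_needed(n_indiv, m, icc):
    """Number of clusters of size m needed so that a parallel cluster
    randomised trial matches the power of an individually randomised
    trial requiring n_indiv participants, via the design effect
    1 + (m - 1)*icc.  A single-level sketch only."""
    deff = 1 + (m - 1) * icc
    return math.ceil(n_indiv * deff / m)
```

For example, an ICC of 0.05 with clusters of 20 turns a 300-participant individually randomised design into 30 clusters (600 participants), roughly doubling the required number of clusters relative to the unadjusted 15.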
A partial list of southern clusters of galaxies
NASA Technical Reports Server (NTRS)
Quintana, H.; White, R. A.
1990-01-01
An inspection of 34 SRC/ESO J southern sky fields is the basis of the present list of clusters of galaxies and their approximate classifications in terms of cluster concentration, defined independently of richness and shape-symmetry. Where possible, an estimate of the cluster morphological population is provided. The Bautz-Morgan classification was applied using a strict comparison with clusters on the Palomar Sky Survey. Magnitudes were estimated on the basis of galaxies with photoelectric or photographic magnitudes.
the-wizz: clustering redshift estimation for everyone
NASA Astrophysics Data System (ADS)
Morrison, C. B.; Hildebrandt, H.; Schmidt, S. J.; Baldry, I. K.; Bilicki, M.; Choi, A.; Erben, T.; Schneider, P.
2017-05-01
We present the-wizz, an open source and user-friendly software for estimating the redshift distributions of photometric galaxies with unknown redshifts by spatially cross-correlating them against a reference sample with known redshifts. The main benefit of the-wizz is in separating the angular pair finding and correlation estimation from the computation of the output clustering redshifts allowing anyone to create a clustering redshift for their sample without the intervention of an 'expert'. It allows the end user of a given survey to select any subsample of photometric galaxies with unknown redshifts, match this sample's catalogue indices into a value-added data file and produce a clustering redshift estimation for this sample in a fraction of the time it would take to run all the angular correlations needed to produce a clustering redshift. We show results with this software using photometric data from the Kilo-Degree Survey (KiDS) and spectroscopic redshifts from the Galaxy and Mass Assembly survey and the Sloan Digital Sky Survey. The results we present for KiDS are consistent with the redshift distributions used in a recent cosmic shear analysis from the survey. We also present results using a hybrid machine learning-clustering redshift analysis that enables the estimation of clustering redshifts for individual galaxies. the-wizz can be downloaded at http://github.com/morriscb/The-wiZZ/.
Parameters of oscillation generation regions in open star cluster models
NASA Astrophysics Data System (ADS)
Danilov, V. M.; Putkov, S. I.
2017-07-01
We determine the masses and radii of central regions of open star cluster (OCL) models with small or zero entropy production and estimate the masses of oscillation generation regions in cluster models based on the phase-space coordinates of stars. The radii of such regions are close to the core radii of the OCL models. We develop a new method for estimating the total OCL masses based on the cluster core mass, the cluster and cluster core radii, and the radial distribution of stars. This method yields estimates of the dynamical masses of the Pleiades, Praesepe, and M67, which agree well with the estimates of the total masses of the corresponding clusters based on proper motions and spectroscopic data for cluster stars. We construct the spectra and dispersion curves of the oscillations of the field of azimuthal velocities v_φ in OCL models. Weak, low-amplitude unstable oscillations of v_φ develop in cluster models near the cluster core boundary, and weak damped oscillations of v_φ often develop at frequencies close to the frequencies of more powerful oscillations, which may reduce the degree of non-stationarity in OCL models. We determine the number and parameters of such oscillations near the core boundaries of the cluster models. These oscillations point to the possible role that gradient instability near the core plays in the decrease of the mass of the oscillation generation regions and the production of entropy in the cores of OCL models with massive extended cores.
Missing continuous outcomes under covariate dependent missingness in cluster randomised trials
Diaz-Ordaz, Karla; Bartlett, Jonathan W
2016-01-01
Attrition is a common occurrence in cluster randomised trials which leads to missing outcome data. Two approaches for analysing such trials are cluster-level analysis and individual-level analysis. This paper compares the performance of unadjusted cluster-level analysis, baseline covariate adjusted cluster-level analysis and linear mixed model analysis, under baseline covariate dependent missingness in continuous outcomes, in terms of bias, average estimated standard error and coverage probability. The methods of complete records analysis and multiple imputation are used to handle the missing outcome data. We considered four scenarios, with the missingness mechanism and baseline covariate effect on outcome either the same or different between intervention groups. We show that both unadjusted cluster-level analysis and baseline covariate adjusted cluster-level analysis give unbiased estimates of the intervention effect only if both intervention groups have the same missingness mechanisms and there is no interaction between baseline covariate and intervention group. Linear mixed model and multiple imputation give unbiased estimates under all four considered scenarios, provided that an interaction of intervention and baseline covariate is included in the model when appropriate. Cluster mean imputation has been proposed as a valid approach for handling missing outcomes in cluster randomised trials. We show that cluster mean imputation only gives unbiased estimates when missingness mechanism is the same between the intervention groups and there is no interaction between baseline covariate and intervention group. Multiple imputation shows overcoverage for small number of clusters in each intervention group. PMID:27177885
NeCamp, Timothy; Kilbourne, Amy; Almirall, Daniel
2017-08-01
Cluster-level dynamic treatment regimens can be used to guide sequential treatment decision-making at the cluster level in order to improve outcomes at the individual or patient-level. In a cluster-level dynamic treatment regimen, the treatment is potentially adapted and re-adapted over time based on changes in the cluster that could be impacted by prior intervention, including aggregate measures of the individuals or patients that compose it. Cluster-randomized sequential multiple assignment randomized trials can be used to answer multiple open questions preventing scientists from developing high-quality cluster-level dynamic treatment regimens. In a cluster-randomized sequential multiple assignment randomized trial, sequential randomizations occur at the cluster level and outcomes are observed at the individual level. This manuscript makes two contributions to the design and analysis of cluster-randomized sequential multiple assignment randomized trials. First, a weighted least squares regression approach is proposed for comparing the mean of a patient-level outcome between the cluster-level dynamic treatment regimens embedded in a sequential multiple assignment randomized trial. The regression approach facilitates the use of baseline covariates which is often critical in the analysis of cluster-level trials. Second, sample size calculators are derived for two common cluster-randomized sequential multiple assignment randomized trial designs for use when the primary aim is a between-dynamic treatment regimen comparison of the mean of a continuous patient-level outcome. The methods are motivated by the Adaptive Implementation of Effective Programs Trial which is, to our knowledge, the first-ever cluster-randomized sequential multiple assignment randomized trial in psychiatry.
Balzer, Laura B; Zheng, Wenjing; van der Laan, Mark J; Petersen, Maya L
2018-01-01
We often seek to estimate the impact of an exposure naturally occurring or randomly assigned at the cluster-level. For example, the literature on neighborhood determinants of health continues to grow. Likewise, community randomized trials are applied to learn about real-world implementation, sustainability, and population effects of interventions with proven individual-level efficacy. In these settings, individual-level outcomes are correlated due to shared cluster-level factors, including the exposure, as well as social or biological interactions between individuals. To flexibly and efficiently estimate the effect of a cluster-level exposure, we present two targeted maximum likelihood estimators (TMLEs). The first TMLE is developed under a non-parametric causal model, which allows for arbitrary interactions between individuals within a cluster. These interactions include direct transmission of the outcome (i.e. contagion) and influence of one individual's covariates on another's outcome (i.e. covariate interference). The second TMLE is developed under a causal sub-model assuming the cluster-level and individual-specific covariates are sufficient to control for confounding. Simulations compare the alternative estimators and illustrate the potential gains from pairing individual-level risk factors and outcomes during estimation, while avoiding unwarranted assumptions. Our results suggest that estimation under the sub-model can result in bias and misleading inference in an observational setting. Incorporating working assumptions during estimation is more robust than assuming they hold in the underlying causal model. We illustrate our approach with an application to HIV prevention and treatment.
Helleringer, Stephane; Arhinful, Daniel; Abuaku, Benjamin; Humes, Michael; Wilson, Emily; Marsh, Andrew; Clermont, Adrienne; Black, Robert E; Bryce, Jennifer; Amouzou, Agbessi
2018-01-01
Reducing neonatal and child mortality is a key component of the health-related sustainable development goal (SDG), but most low and middle income countries lack data to monitor child mortality on an annual basis. We tested a mortality monitoring system based on the continuous recording of pregnancies, births and deaths by trained community-based volunteers (CBV). This project was implemented in 96 clusters located in three districts of the Northern Region of Ghana. Community-based volunteers (CBVs) were selected from these clusters and were trained in recording all pregnancies, births, and deaths among children under 5 in their catchment areas. Data collection lasted from January 2012 through September 2013. All CBVs transmitted tallies of recorded births and deaths to the Ghana Births and Deaths Registry each month, except in one of the study districts (approximately 80% reporting). Some events were reported only several months after they had occurred. We assessed the completeness and accuracy of CBV data by comparing them to retrospective full pregnancy histories (FPH) collected during a census of the same clusters conducted in October-December 2013. We conducted all analyses separately by district, as well as for the combined sample of all districts. During the 21-month implementation period, the CBVs reported a total of 2,819 births and 137 under-five deaths. Among the latter, there were 84 infant deaths (55 neonatal deaths and 29 post-neonatal deaths). Comparison of the CBV data with FPH data suggested that CBVs significantly under-estimated child mortality: the estimated under-5 mortality rate according to CBV data was only 2/3 of the rate estimated from FPH data (95% Confidence Interval for the ratio of the two rates = 51.7 to 81.4). The discrepancies between the CBV and FPH estimates of infant and neonatal mortality were more limited, but varied significantly across districts.
In northern Ghana, a community-based data collection system relying on volunteers did not yield accurate estimates of child mortality rates. Additional implementation research is needed to improve the timeliness, completeness and accuracy of such systems. Enhancing pregnancy monitoring, in particular, may be an essential step to improve the measurement of neonatal mortality.
ASCA Temperature Maps for Several Interesting Clusters and Their Interpretations
NASA Technical Reports Server (NTRS)
Markevitch, M.; Sarazin, C.; Forman, W.; Vikhlinin, A.
1998-01-01
We present ASCA temperature maps for several galaxy clusters with strong mergers, as well as for several relaxed clusters selected for X-ray mass determination. From the merger temperature maps, we estimate velocities of the colliding subunits and discuss several implications of these estimates. For the relaxed clusters, we derive unprecedentedly accurate mass and gas fraction profiles out to radii of overdensity approximately 500.
Alam, Mahbub-Ul; Winch, Peter J; Saxton, Ronald E; Nizame, Fosiul A; Yeasmin, Farzana; Norman, Guy; Masud, Abdullah-Al; Begum, Farzana; Rahman, Mahbubur; Hossain, Kamal; Layden, Anita; Unicomb, Leanne; Luby, Stephen P
2017-08-01
Shared toilets in urban slums are often unclean and poorly maintained, discouraging consistent use and thereby limiting impacts on health and quality of life. We developed behaviour change interventions to support shared toilet maintenance and improve user satisfaction. We report the intervention effectiveness on improving shared toilet cleanliness. We conducted a cluster-randomised controlled trial among users of 1226 shared toilets in 23 Dhaka slums. We assessed baseline toilet cleanliness in January 2015. The six-month intervention included provision of hardware (bin for solid waste, 4 l flushing bucket, 70 l water reservoir), and behaviour change communication (compound meetings, interpersonal household sessions, signs depicting rules for toilet use). We estimated the adjusted difference in difference (DID) to assess outcomes and accounted for clustering effects using generalised estimating equations. Compared to controls, intervention toilets were more likely to have water available inside toilet cubicles (DID: +4.7%, 95% CI: 0.2, 9.2), access to brush/broom for cleaning (DID: +8.4%, 95% CI: 2, 15) and waste bins (DID: +63%, 95% CI: 59, 66), while less likely to have visible faeces inside the pan (DID: -13%, 95% CI: -19, -5), the smell of faeces (DID: -7.6%, 95% CI: -14, -1.3) and household waste inside the cubicle (DID: -4%, 95% CI: -7, -1). In one of few efforts to promote shared toilet cleanliness, intervention compounds were significantly more likely to have cleaner toilets after six months. Future research might explore how residents can self-finance toilet maintenance, or employ mass media to reduce per-capita costs of behaviour change. © 2017 John Wiley & Sons Ltd.
Cluster-based analysis improves predictive validity of spike-triggered receptive field estimates
Malone, Brian J.
2017-01-01
Spectrotemporal receptive field (STRF) characterization is a central goal of auditory physiology. STRFs are often approximated by the spike-triggered average (STA), which reflects the average stimulus preceding a spike. In many cases, the raw STA is subjected to a threshold defined by gain values expected by chance. However, such correction methods have not been universally adopted, and the consequences of specific gain-thresholding approaches have not been investigated systematically. Here, we evaluate two classes of statistical correction techniques, using the resulting STRF estimates to predict responses to a novel validation stimulus. The first, more traditional technique eliminated STRF pixels (time-frequency bins) with gain values expected by chance. This correction method yielded significant increases in prediction accuracy, including when the threshold setting was optimized for each unit. The second technique was a two-step thresholding procedure wherein clusters of contiguous pixels surviving an initial gain threshold were then subjected to a cluster mass threshold based on summed pixel values. This approach significantly improved upon even the best gain-thresholding techniques. Additional analyses suggested that allowing threshold settings to vary independently for excitatory and inhibitory subfields of the STRF resulted in only marginal additional gains, at best. In summary, augmenting reverse correlation techniques with principled statistical correction choices increased prediction accuracy by over 80% for multi-unit STRFs and by over 40% for single-unit STRFs, furthering the interpretational relevance of the recovered spectrotemporal filters for auditory systems analysis. PMID:28877194
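The two-step procedure described above (a per-pixel gain threshold followed by a cluster mass threshold on contiguous surviving pixels) can be sketched as follows. The flood fill assumes 4-connectivity and both thresholds are supplied by the caller; the paper instead derives them from chance-level statistics of the STA, so treat this as a structural illustration only:

```python
import numpy as np

def cluster_mass_threshold(sta, gain_thresh, mass_thresh):
    """Keep STA pixels above `gain_thresh`, group survivors into
    4-connected clusters, and zero out any cluster whose summed
    absolute gain falls below `mass_thresh`."""
    mask = np.abs(sta) >= gain_thresh
    out = np.zeros_like(sta)
    seen = np.zeros_like(mask)
    rows, cols = sta.shape
    for i in range(rows):
        for j in range(cols):
            if mask[i, j] and not seen[i, j]:
                stack, comp = [(i, j)], []
                seen[i, j] = True
                while stack:                       # flood-fill one cluster
                    r, c = stack.pop()
                    comp.append((r, c))
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        rr, cc = r + dr, c + dc
                        if 0 <= rr < rows and 0 <= cc < cols \
                           and mask[rr, cc] and not seen[rr, cc]:
                            seen[rr, cc] = True
                            stack.append((rr, cc))
                # apply the cluster mass threshold to the whole component
                if sum(abs(sta[p]) for p in comp) >= mass_thresh:
                    for p in comp:
                        out[p] = sta[p]
    return out
```

Isolated supra-threshold pixels are discarded unless their cluster's summed magnitude clears the mass threshold, which suppresses speckle noise while preserving contiguous excitatory or inhibitory subfields.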
Coburn, T.C.; Freeman, P.A.; Attanasi, E.D.
2012-01-01
The primary objectives of this research were to (1) investigate empirical methods for establishing regional trends in unconventional gas resources as exhibited by historical production data and (2) determine whether or not incorporating additional knowledge of a regional trend in a suite of previously established local nonparametric resource prediction algorithms influences assessment results. Three different trend detection methods were applied to publicly available production data (well EUR aggregated to 80-acre cells) from the Devonian Antrim Shale gas play in the Michigan Basin. This effort led to the identification of a southeast-northwest trend in cell EUR values across the play that, in a very general sense, conforms to the primary fracture and structural orientations of the province. However, including this trend in the resource prediction algorithms did not lead to improved results. Further analysis indicated the existence of clustering among cell EUR values that likely dampens the contribution of the regional trend. The reason for the clustering, a somewhat unexpected result, is not completely understood, although the geological literature provides some possible explanations. With appropriate data, a better understanding of this clustering phenomenon may lead to important information about the factors and their interactions that control Antrim Shale gas production, which may, in turn, help establish a more general protocol for better estimating resources in this and other shale gas plays. © 2011 International Association for Mathematical Geology (outside the USA).
DOE Office of Scientific and Technical Information (OSTI.GOV)
Popescu, Bogdan; Hanson, M. M.
2010-04-10
We present Monte Carlo models of open stellar clusters with the purpose of mapping out the behavior of integrated colors with mass and age. Our cluster simulation package allows for stochastic variations in the stellar mass function to evaluate variations in integrated cluster properties. We find that UBVK colors from our simulations are consistent with simple stellar population (SSP) models, provided the cluster mass is large, M_cluster ≥ 10^6 M_⊙. Below this mass, our simulations show two significant effects. First, the mean value of the distribution of integrated colors moves away from the SSP predictions and is less red, in the first 10^7 to 10^8 years in UBV colors, and for all ages in (V - K). Second, the 1σ dispersion of observed colors increases significantly with lower cluster mass. We attribute the former to the reduced number of red luminous stars in most of the lower mass clusters and the latter to the increased stochastic effect of a few of these stars on lower mass clusters. This latter point was always assumed to occur, but we now provide the first public code able to quantify this effect. We are completing a more extensive database of magnitudes and colors as a function of stellar cluster age and mass that will allow the determination of the correlation coefficients among different bands, and improve estimates of cluster age and mass from integrated photometry.
2014-01-01
Background There are many methodological challenges in the conduct and analysis of cluster randomised controlled trials, but one that has received little attention is that of post-randomisation changes to cluster composition. To illustrate this, we focus on the issue of cluster merging, considering the impact on the design, analysis and interpretation of trial outcomes. Methods We explored the effects of merging clusters on study power using standard methods of power calculation. We assessed the potential impacts on study findings of both homogeneous cluster merges (involving clusters randomised to the same arm of a trial) and heterogeneous merges (involving clusters randomised to different arms of a trial) by simulation. To determine the impact on bias and precision of treatment effect estimates, we applied standard methods of analysis to different populations under analysis. Results Cluster merging produced a systematic reduction in study power. This effect depended on the number of merges and was most pronounced when variability in cluster size was at its greatest. Simulations demonstrate that the impact on analysis was minimal when cluster merges were homogeneous, with impact on study power being balanced by a change in observed intracluster correlation coefficient (ICC). We found a decrease in study power when cluster merges were heterogeneous, and the estimate of treatment effect was attenuated. Conclusions Examples of cluster merges found in previously published reports of cluster randomised trials were typically homogeneous rather than heterogeneous. Simulations demonstrated that trial findings in such cases would be unbiased. However, simulations also showed that any heterogeneous cluster merges would introduce bias that would be hard to quantify, as well as having negative impacts on the precision of estimates obtained. Further methodological development is warranted to better determine how to analyse such trials appropriately. 
Interim recommendations include avoidance of cluster merges where possible, discontinuation of clusters following heterogeneous merges, allowance for potential loss of clusters and additional variability in cluster size in the original sample size calculation, and use of appropriate ICC estimates that reflect cluster size. PMID:24884591
Re-estimating sample size in cluster randomised trials with active recruitment within clusters.
van Schie, S; Moerbeek, M
2014-08-30
Often only a limited number of clusters can be obtained in cluster randomised trials, although many potential participants can be recruited within each cluster. Thus, active recruitment is feasible within the clusters. To obtain an efficient sample size in a cluster randomised trial, the cluster level and individual level variance should be known before the study starts, but this is often not the case. We suggest using an internal pilot study design to address this problem of unknown variances. A pilot can be useful to re-estimate the variances and re-calculate the sample size during the trial. Using simulated data, it is shown that an initially low or high power can be adjusted using an internal pilot with the type I error rate remaining within an acceptable range. The intracluster correlation coefficient can be re-estimated with more precision, which has a positive effect on the sample size. We conclude that an internal pilot study design may be used if active recruitment is feasible within a limited number of clusters. Copyright © 2014 John Wiley & Sons, Ltd.
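The sample-size logic behind the re-estimation can be sketched with the usual design effect 1 + (m − 1)ρ, where m is the cluster size and ρ the intracluster correlation coefficient: with a fixed number of clusters k and active recruitment, one solves k·m = N_ind·(1 + (m − 1)ρ) for m. This is a minimal sketch under an equal-cluster-size assumption, with an invented function name; the paper's internal pilot design and type I error considerations are more elaborate.

```python
import math

def cluster_size_needed(n_individual, n_clusters, icc):
    """Subjects to recruit per cluster so a cluster randomised trial matches
    the power of an individually randomised trial needing n_individual
    subjects, given n_clusters clusters and intracluster correlation icc.
    Solves k*m = n_individual * (1 + (m-1)*icc) for m (equal cluster sizes).
    """
    denom = n_clusters - n_individual * icc
    if denom <= 0:
        raise ValueError("Too few clusters for this ICC: no finite "
                         "cluster size achieves the target power.")
    return math.ceil(n_individual * (1 - icc) / denom)

# Planning with an assumed ICC of 0.05, then re-estimating it as 0.02
# from an internal pilot: the required cluster size drops.
m_planned = cluster_size_needed(100, 10, 0.05)
m_revised = cluster_size_needed(100, 10, 0.02)
```

The `ValueError` branch reflects a real design constraint: when k ≤ N_ind·ρ, no amount of within-cluster recruitment can reach the target power, which is why a more precise ICC estimate from the pilot matters.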
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schrabback, T.; Applegate, D.; Dietrich, J. P.
2017-10-14
Here we present an HST/Advanced Camera for Surveys (ACS) weak gravitational lensing analysis of 13 massive high-redshift (z_median = 0.88) galaxy clusters discovered in the South Pole Telescope (SPT) Sunyaev–Zel'dovich Survey. This study is part of a larger campaign that aims to robustly calibrate mass–observable scaling relations over a wide range in redshift to enable improved cosmological constraints from the SPT cluster sample. We introduce new strategies to ensure that systematics in the lensing analysis do not degrade constraints on cluster scaling relations significantly. First, we efficiently remove cluster members from the source sample by selecting very blue galaxies in V-I colour. Our estimate of the source redshift distribution is based on Cosmic Assembly Near-infrared Deep Extragalactic Legacy Survey (CANDELS) data, where we carefully mimic the source selection criteria of the cluster fields. We apply a statistical correction for systematic photometric redshift errors as derived from Hubble Ultra Deep Field data and verified through spatial cross-correlations. We account for the impact of lensing magnification on the source redshift distribution, finding that this is particularly relevant for shallower surveys. Finally, we account for biases in the mass modelling caused by miscentring and uncertainties in the concentration–mass relation using simulations. In combination with temperature estimates from Chandra we constrain the normalization of the mass–temperature scaling relation ln(E(z) M_500c / 10^14 M_⊙) = A + 1.5 ln(kT / 7.2 keV) to A = 1.81 +0.24/−0.14 (stat.) ± 0.09 (sys.), consistent with self-similar redshift evolution when compared to lower redshift samples. Additionally, the lensing data constrain the average concentration of the clusters to c_200c = 5.6 +3.7/−1.8.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Melchior, P.; Gruen, D.; McClintock, T.
We use weak-lensing shear measurements to determine the mean mass of optically selected galaxy clusters in Dark Energy Survey Science Verification data. In a blinded analysis, we split the sample of more than 8,000 redMaPPer clusters into 15 subsets, spanning ranges in the richness parameter $5 \leq \lambda \leq 180$ and redshift $0.2 \leq z \leq 0.8$, and fit the averaged mass density contrast profiles with a model that accounts for seven distinct sources of systematic uncertainty: shear measurement and photometric redshift errors; cluster-member contamination; miscentering; deviations from the NFW halo profile; halo triaxiality; and line-of-sight projections. We combine the inferred cluster masses to estimate the joint scaling relation between mass, richness and redshift, $\mathcal{M}(\lambda,z) \propto M_0 \lambda^{F} (1+z)^{G}$. We find $M_0 \equiv \langle M_{200\mathrm{m}}\,|\,\lambda=30,z=0.5\rangle = \left[2.35 \pm 0.22\ \mathrm{(stat)} \pm 0.12\ \mathrm{(sys)}\right] \cdot 10^{14}\ M_\odot$, with $F = 1.12 \pm 0.20\ \mathrm{(stat)} \pm 0.06\ \mathrm{(sys)}$ and $G = 0.18 \pm 0.75\ \mathrm{(stat)} \pm 0.24\ \mathrm{(sys)}$. The amplitude of the mass-richness relation is in excellent agreement with the weak-lensing calibration of redMaPPer clusters in SDSS by Simet et al. (2016) and with the Saro et al. (2015) calibration based on abundance matching of SPT-detected clusters. Our results extend the redshift range over which the mass-richness relation of redMaPPer clusters has been calibrated with weak lensing from $z \leq 0.3$ to $z \leq 0.8$. Calibration uncertainties of shear measurements and photometric redshift estimates dominate our systematic error budget and require substantial improvements for forthcoming studies.
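The quoted scaling-relation form can be illustrated with a simple log-linear fit: ln M = ln M₀ + F·ln(λ/30) + G·ln((1+z)/1.5). This is a sketch on synthetic, noiseless point masses; the actual analysis fits stacked shear profiles with the full systematics model, not a regression on individual masses.

```python
import numpy as np

def fit_mass_richness(masses, richness, redshift, lam0=30.0, z0=0.5):
    """Least-squares fit of ln M = ln M0 + F*ln(lam/lam0) + G*ln((1+z)/(1+z0))
    (illustrative sketch of the scaling-relation form, not the paper's
    lensing-profile likelihood)."""
    X = np.column_stack([
        np.ones_like(masses),
        np.log(richness / lam0),
        np.log((1 + redshift) / (1 + z0)),
    ])
    coef, *_ = np.linalg.lstsq(X, np.log(masses), rcond=None)
    lnM0, F, G = coef
    return np.exp(lnM0), F, G

# Synthetic, noiseless check: recover the parameters quoted in the abstract.
rng = np.random.default_rng(0)
lam = rng.uniform(5, 180, 200)
z = rng.uniform(0.2, 0.8, 200)
M = 2.35e14 * (lam / 30.0) ** 1.12 * ((1 + z) / 1.5) ** 0.18
M0, F, G = fit_mass_richness(M, lam, z)
```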
Kong, Jianlei; Ding, Xiaokang; Liu, Jinhao; Yan, Lei; Wang, Jianli
2015-01-01
In this paper, a new algorithm to improve the accuracy of estimating diameter at breast height (DBH) for tree trunks in forest areas is proposed. First, the information is collected by a two-dimensional terrestrial laser scanner (2DTLS), which emits laser pulses to generate a point cloud. After extraction and filtration, the laser point clusters of the trunks are obtained, which are optimized by an arithmetic means method. Then, an algebraic circle fitting algorithm in polar form is non-linearly optimized by the Levenberg-Marquardt method to form a new hybrid algorithm, which is used to acquire the diameters and positions of the trees. Compared with previous works, this proposed method improves the accuracy of diameter estimation of trees significantly and effectively reduces the calculation time. Moreover, the experimental results indicate that this method is stable and suitable for the most challenging conditions, which has practical significance in improving the operating efficiency of forest harvesters and reducing the risk of accidents. PMID:26147726
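The hybrid idea of an algebraic circle fit refined by Levenberg-Marquardt can be sketched as follows. This sketch works in Cartesian rather than the paper's polar form, and uses the common Kåsa algebraic fit as the linear stage; both choices are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import least_squares

def fit_circle(x, y):
    """Algebraic (Kåsa) circle fit followed by Levenberg-Marquardt
    refinement of the geometric error (sketch of the hybrid approach)."""
    # Linear (algebraic) stage: solve x^2 + y^2 = a*x + b*y + c.
    A = np.column_stack([x, y, np.ones_like(x)])
    a, b, c = np.linalg.lstsq(A, x**2 + y**2, rcond=None)[0]
    cx, cy = a / 2, b / 2
    r = np.sqrt(c + cx**2 + cy**2)

    # Non-linear stage: LM refinement of point-to-circle distances.
    def residuals(p):
        px, py, pr = p
        return np.hypot(x - px, y - py) - pr

    sol = least_squares(residuals, x0=[cx, cy, r], method="lm")
    return sol.x  # (center_x, center_y, radius)

# Noiseless points on a trunk cross-section of radius 0.3 m at (1.5, -0.4).
t = np.linspace(0, 2 * np.pi, 50, endpoint=False)
cx, cy, r = fit_circle(1.5 + 0.3 * np.cos(t), -0.4 + 0.3 * np.sin(t))
```

The algebraic stage gives a cheap, closed-form starting point, so the LM stage typically converges in a few iterations, which is the source of the reduced calculation time claimed for hybrid schemes.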
NASA Astrophysics Data System (ADS)
Strauss, Cesar; Rosa, Marcelo Barbio; Stephany, Stephan
2013-12-01
Convective cells are cloud formations whose growth, maturation and dissipation are of great interest among meteorologists since they are associated with severe storms with large precipitation structures. Some works suggest a strong correlation between lightning occurrence and convective cells. The current work proposes a new approach to analyze the correlation between precipitation and lightning, and to identify electrically active cells. Such cells may be employed for tracking convective events in the absence of weather radar coverage. This approach employs a new spatio-temporal clustering technique based on a temporal sliding-window and a standard kernel density estimation to process lightning data. Clustering allows the identification of the cells from lightning data and density estimation bounds the contours of the cells. The proposed approach was evaluated for two convective events in Southeast Brazil. Image segmentation of radar data was performed to identify convective precipitation structures using the Steiner criteria. These structures were then compared and correlated to the electrically active cells in particular instants of time for both events. It was observed that most precipitation structures have associated cells, by comparing the ground tracks of their centroids. In addition, for one particular cell of each event, its temporal evolution was compared to that of the associated precipitation structure. Results show that the proposed approach may improve the use of lightning data for tracking convective events in countries that lack weather radar coverage.
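The density-estimation step can be sketched with a standard Gaussian kernel density estimate over the lightning strikes that fall inside one temporal sliding window; thresholding the resulting surface bounds the contour of an electrically active cell. This is a sketch of the general idea only, with an invented function name, not the paper's exact estimator or bandwidth choice.

```python
import numpy as np
from scipy.stats import gaussian_kde

def cell_density(strikes_xy, grid_x, grid_y):
    """Kernel density estimate of lightning strike positions for one
    sliding-window snapshot; strikes_xy has shape (2, n_strikes).
    Returns the density evaluated on the (grid_y, grid_x) grid."""
    kde = gaussian_kde(strikes_xy)
    gx, gy = np.meshgrid(grid_x, grid_y)
    return kde(np.vstack([gx.ravel(), gy.ravel()])).reshape(gx.shape)

# A tight cluster of strikes near the origin, as in an active cell.
rng = np.random.default_rng(1)
strikes = rng.normal(0.0, 0.5, size=(2, 200))
density = cell_density(strikes, np.linspace(-5, 5, 21), np.linspace(-5, 5, 21))
```

A contour at some fraction of the peak density (or a chance-level threshold) would then delineate the cell, and tracking the contour's centroid over successive windows follows the convective event.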
NASA Astrophysics Data System (ADS)
Koitz, Ralph; Soini, Thomas M.; Genest, Alexander; Trickey, S. B.; Rösch, Notker
2012-07-01
The performance of eight generalized gradient approximation exchange-correlation (xc) functionals is assessed by a series of scalar relativistic all-electron calculations on octahedral palladium model clusters Pdn with n = 13, 19, 38, 55, 79, 147 and the analogous clusters Aun (for n up through 79). For these model systems, we determined the cohesive energies and average bond lengths of the optimized octahedral structures. We extrapolate these values to the bulk limits and compare with the corresponding experimental values. While the well-established functionals BP, PBE, and PW91 are the most accurate at predicting energies, the more recent forms PBEsol, VMTsol, and VT{84}sol significantly improve the accuracy of geometries. The observed trends are largely similar for both Pd and Au. In the same spirit, we also studied the scalability of the ionization potentials and electron affinities of the Pd clusters, and extrapolated those quantities to estimates of the work function. Overall, the xc functionals can be classified into four distinct groups according to the accuracy of the computed parameters. These results allow a judicious selection of xc approximations for treating transition metal clusters.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Thanjavur, Karun; Willis, Jon; Crampton, David, E-mail: karun@uvic.c
2009-11-20
We have developed a new method, K2, optimized for the detection of galaxy clusters in multicolor images. Based on the Red Sequence approach, K2 detects clusters using simultaneous enhancements in both colors and position. The detection significance is robustly determined through extensive Monte Carlo simulations and through comparison with available cluster catalogs based on two different optical methods, and also on X-ray data. K2 also provides quantitative estimates of the candidate clusters' richness and photometric redshifts. Initially, K2 was applied to the two color (gri) 161 deg² images of the Canada-France-Hawaii Telescope Legacy Survey Wide (CFHTLS-W) data. Our simulations show that the false detection rate for these data, at our selected threshold, is only ≈1%, and that the cluster catalogs are ≈80% complete up to a redshift of z = 0.6 for Fornax-like and richer clusters and to z ≈ 0.3 for poorer clusters. Based on the g-, r-, and i-band photometric catalogs of the Terapix T05 release, 35 clusters/deg² are detected, with 1-2 Fornax-like or richer clusters every 2 deg². Catalogs containing data for 6144 galaxy clusters have been prepared, of which 239 are rich clusters. These clusters, especially the latter, are being searched for gravitational lenses, one of our chief motivations for cluster detection in CFHTLS. The K2 method can be easily extended to use additional color information and thus improve overall cluster detection to higher redshifts. The complete set of K2 cluster catalogs, along with the supplementary catalogs for the member galaxies, are available on request from the authors.
Lagrangian analysis by clustering. An example in the Nordic Seas.
NASA Astrophysics Data System (ADS)
Koszalka, Inga; Lacasce, Joseph H.
2010-05-01
We propose a new method for obtaining average velocities and eddy diffusivities from Lagrangian data. Rather than grouping the drifter-derived velocities in uniform geographical bins, as is commonly done, we group a specified number of nearest-neighbor velocities. This is done via a clustering algorithm operating on the instantaneous positions of the drifters. Thus it is the data distribution itself which determines the positions of the averages and the areal extent of the clusters. A major advantage is that because the number of members is essentially the same for all clusters, the statistical accuracy is more uniform than with geographical bins. We illustrate the technique using synthetic data from a stochastic model, employing a realistic mean flow. The latter is an accurate representation of the surface currents in the Nordic Seas and is strongly inhomogeneous in space. We use the clustering algorithm to extract the mean velocities and diffusivities (both of which are known from the stochastic model). We also compare the results to those obtained with fixed geographical bins. Clustering is more successful at capturing spatial variability of the mean flow and also improves convergence in the eddy diffusivity estimates. We discuss both the future prospects and shortcomings of the new method.
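The grouping step can be sketched by clustering drifter positions and averaging velocities within each cluster. Note one deliberate simplification: the sketch fixes the number of clusters (plain k-means), whereas the paper's algorithm fixes the number of nearest-neighbor members per cluster so that statistical accuracy is uniform. Function and variable names are invented for the example.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def cluster_averages(positions, velocities, n_clusters, seed=0):
    """Group drifter observations by clustering their instantaneous
    positions, then average the velocities within each cluster
    (sketch of the grouping idea; not the paper's fixed-membership scheme)."""
    _, labels = kmeans2(positions, n_clusters, minit="++", seed=seed)
    means = np.array([velocities[labels == k].mean(axis=0)
                      for k in range(n_clusters)])
    return labels, means

# Two well-separated patches of drifters with opposite zonal flow.
rng = np.random.default_rng(2)
pos = np.vstack([rng.normal(0, 0.1, (50, 2)),
                 rng.normal(5, 0.1, (50, 2))])
vel = np.vstack([np.tile([1.0, 0.0], (50, 1)),
                 np.tile([-1.0, 0.0], (50, 1))])
labels, means = cluster_averages(pos, vel, 2)
```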
Zulu, Leo C; Kalipeni, Ezekiel; Johannes, Eliza
2014-05-23
Although local spatiotemporal analysis can improve understanding of geographic variation of the HIV epidemic, its drivers, and the search for targeted interventions, it is limited in sub-Saharan Africa. Despite recent declines, Malawi's estimated 10.0% HIV prevalence (2011) remained among the highest globally. Using data on pregnant women in Malawi, this study 1) examines spatiotemporal trends in HIV prevalence 1994-2010, and 2) for 2010, identifies and maps the spatial variation/clustering of factors associated with HIV prevalence at district level. Inverse distance weighting was used within ArcGIS Geographic Information Systems (GIS) software to generate continuous surfaces of HIV prevalence from point data (1994, 1996, 1999, 2001, 2003, 2005, 2007, and 2010) obtained from surveillance antenatal clinics. From the surfaces prevalence estimates were extracted at district level and the results mapped nationally. Spatial dependency (autocorrelation) and clustering of HIV prevalence were also analyzed. Correlation and multiple regression analyses were used to identify factors associated with HIV prevalence for 2010 and their spatial variation/clustering mapped and compared to HIV clustering. Analysis revealed wide spatial variation in HIV prevalence at regional, urban/rural, district and sub-district levels. However, prevalence was spatially leveling out within and across 'sub-epidemics' while declining significantly after 1999. Prevalence exhibited statistically significant spatial dependence nationally following initial (1995-1999) localized, patchy low/high patterns as the epidemic spread rapidly. Locally, HIV "hotspots" clustered among eleven southern districts/cities while a "coldspot" captured configurations of six central region districts. 
Preliminary multiple regression of 2010 HIV prevalence produced a model with four significant explanatory factors (adjusted R2 = 0.688): mean distance to main roads, mean travel time to nearest transport, percentage that had taken an HIV test ever, and percentage attaining a senior primary education. Spatial clustering linked some factors to particular subsets of high HIV-prevalence districts. Spatial analysis enhanced understanding of local spatiotemporal variation in HIV prevalence, possible underlying factors, and potential for differentiated spatial targeting of interventions. Findings suggest that intervention strategies should also emphasize improved access to health/HIV services, basic education, and syphilis management, particularly in rural hotspot districts, as further research is done on drivers at finer scale. PMID:24886573
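The surface-generation step, done in ArcGIS in the study, is inverse distance weighting: each query location gets a weighted mean of the known clinic prevalences with weights 1/d^p. A minimal numpy sketch (clinic coordinates and values are invented for the example):

```python
import numpy as np

def idw(known_xy, known_values, query_xy, power=2.0, eps=1e-12):
    """Inverse distance weighting: weighted mean of known values at each
    query point, with weights 1/(d + eps)**power. eps keeps the weight
    finite when a query coincides with a known point."""
    d = np.hypot(query_xy[:, None, 0] - known_xy[None, :, 0],
                 query_xy[:, None, 1] - known_xy[None, :, 1])
    w = 1.0 / (d + eps) ** power
    return (w * known_values).sum(axis=1) / w.sum(axis=1)

# Three clinics with known prevalence (%); estimate at two other locations.
clinics = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
prev = np.array([10.0, 20.0, 14.0])
estimates = idw(clinics, prev, np.array([[0.0, 0.0], [0.5, 0.5]]))
```

Because the weights are positive and sum-normalized, IDW estimates always lie within the range of the observed values, which is one reason it is popular for mapping surveillance data between sparse sentinel sites.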
Ellipsoidal fuzzy learning for smart car platoons
NASA Astrophysics Data System (ADS)
Dickerson, Julie A.; Kosko, Bart
1993-12-01
A neural-fuzzy system combined supervised and unsupervised learning to find and tune the fuzzy rules. An additive fuzzy system approximates a function by covering its graph with fuzzy rules. A fuzzy rule patch can take the form of an ellipsoid in the input-output space. Unsupervised competitive learning found the statistics of data clusters. The covariance matrix of each synaptic quantization vector defined an ellipsoid centered at the centroid of the data cluster. Tightly clustered data gave smaller ellipsoids, or more certain rules; sparse data gave larger ellipsoids, or less certain rules. Supervised learning tuned the ellipsoids to improve the approximation. The supervised neural system used gradient descent to find the ellipsoidal fuzzy patches. It locally minimized the mean-squared error of the fuzzy approximation. Hybrid ellipsoidal learning estimated the control surface for a smart car controller.
Individual participant data meta-analyses should not ignore clustering
Abo-Zaid, Ghada; Guo, Boliang; Deeks, Jonathan J.; Debray, Thomas P.A.; Steyerberg, Ewout W.; Moons, Karel G.M.; Riley, Richard David
2013-01-01
Objectives Individual participant data (IPD) meta-analyses often analyze their IPD as if coming from a single study. We compare this approach with analyses that rather account for clustering of patients within studies. Study Design and Setting Comparison of effect estimates from logistic regression models in real and simulated examples. Results The estimated prognostic effect of age in patients with traumatic brain injury is similar, regardless of whether clustering is accounted for. However, a family history of thrombophilia is found to be a diagnostic marker of deep vein thrombosis [odds ratio, 1.30; 95% confidence interval (CI): 1.00, 1.70; P = 0.05] when clustering is accounted for but not when it is ignored (odds ratio, 1.06; 95% CI: 0.83, 1.37; P = 0.64). Similarly, the treatment effect of nicotine gum on smoking cessation is severely attenuated when clustering is ignored (odds ratio, 1.40; 95% CI: 1.02, 1.92) rather than accounted for (odds ratio, 1.80; 95% CI: 1.29, 2.52). Simulations show models accounting for clustering perform consistently well, but downwardly biased effect estimates and low coverage can occur when ignoring clustering. Conclusion Researchers must routinely account for clustering in IPD meta-analyses; otherwise, misleading effect estimates and conclusions may arise. PMID:23651765
Spatial scan statistics for detection of multiple clusters with arbitrary shapes.
Lin, Pei-Sheng; Kung, Yi-Hung; Clayton, Murray
2016-12-01
In applying scan statistics for public health research, it would be valuable to develop a detection method for multiple clusters that accommodates spatial correlation and covariate effects in an integrated model. In this article, we connect the concepts of the likelihood ratio (LR) scan statistic and the quasi-likelihood (QL) scan statistic to provide a series of detection procedures sufficiently flexible to apply to clusters of arbitrary shape. First, we use an independent scan model for detection of clusters and then a variogram tool to examine the existence of spatial correlation and regional variation based on residuals of the independent scan model. When the estimate of regional variation is significantly different from zero, a mixed QL estimating equation is developed to estimate coefficients of geographic clusters and covariates. We use the Benjamini-Hochberg procedure (1995) to find a threshold for p-values to address the multiple testing problem. A quasi-deviance criterion is used to regroup the estimated clusters to find geographic clusters with arbitrary shapes. We conduct simulations to compare the performance of the proposed method with other scan statistics. For illustration, the method is applied to enterovirus data from Taiwan. © 2016, The International Biometric Society.
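The Benjamini-Hochberg step used to set the p-value threshold is standard and easy to state: sort the m p-values, find the largest i with p(i) ≤ q·i/m, and reject everything at or below p(i). A minimal sketch with toy p-values for ten candidate clusters:

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg (1995) step-up procedure: returns the p-value
    cutoff below which candidate clusters are declared significant while
    controlling the false discovery rate at level q."""
    m = len(p_values)
    ranked = sorted(p_values)
    thresh = 0.0
    for i, p in enumerate(ranked, start=1):
        if p <= q * i / m:
            thresh = p
    return thresh

# Ten candidate clusters: three small p-values, seven unremarkable ones.
ps = [0.001, 0.008, 0.012, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9]
cutoff = benjamini_hochberg(ps, q=0.05)
```

Here the third-smallest p-value (0.012) still satisfies 0.012 ≤ 0.05·3/10, so three clusters are retained; a naive per-test cutoff of 0.005 would have kept only one.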
Regional L-Moment-Based Flood Frequency Analysis in the Upper Vistula River Basin, Poland
NASA Astrophysics Data System (ADS)
Rutkowska, A.; Żelazny, M.; Kohnová, S.; Łyp, M.; Banasik, K.
2017-02-01
The Upper Vistula River basin was divided into pooling groups with similar dimensionless frequency distributions of annual maximum river discharge. The cluster analysis and the Hosking and Wallis (HW) L-moment-based method were used to divide the set of 52 mid-sized catchments into disjoint clusters with similar morphometric, land use, and rainfall variables, and to test the homogeneity within clusters. Finally, three and four pooling groups were obtained alternatively. Two methods for identification of the regional distribution function were used, the HW method and the method of Kjeldsen and Prosdocimi based on a bivariate extension of the HW measure. Subsequently, the flood quantile estimates were calculated using the index flood method. The ordinary least squares (OLS) and the generalised least squares (GLS) regression techniques were used to relate the index flood to catchment characteristics. Predictive performance of the regression scheme for the southern part of the Upper Vistula River basin was improved by using GLS instead of OLS. The results of the study can be recommended for the estimation of flood quantiles at ungauged sites, in flood risk mapping applications, and in engineering hydrology to help design flood protection structures.
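The OLS-versus-GLS comparison comes down to the estimator β = (XᵀΩ⁻¹X)⁻¹XᵀΩ⁻¹y, which reduces to OLS when Ω = I; in regional flood frequency work Ω encodes sampling and model-error covariances between catchments. A minimal sketch with invented catchment descriptors (noiseless, so the true coefficients are recovered exactly):

```python
import numpy as np

def gls(X, y, omega):
    """Generalised least squares: beta = (X' inv(Omega) X)^-1 X' inv(Omega) y.
    With omega = I this is ordinary least squares."""
    oi = np.linalg.inv(omega)
    return np.linalg.solve(X.T @ oi @ X, X.T @ oi @ y)

# Toy index-flood regression: intercept plus one catchment descriptor
# (e.g. log catchment area); values are invented for illustration.
rng = np.random.default_rng(3)
X = np.column_stack([np.ones(20), rng.uniform(1, 10, 20)])
beta_true = np.array([0.5, 1.2])
y = X @ beta_true
beta_ols = gls(X, y, np.eye(20))
```

The practical gain reported in the study comes from choosing Ω well: when errors are heteroscedastic or cross-correlated across sites, weighting by Ω⁻¹ downweights the noisy sites and yields a better predictive regression than OLS.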
Cognitive profiles in euthymic patients with bipolar disorders: results from the FACE-BD cohort.
Roux, Paul; Raust, Aurélie; Cannavo, Anne Sophie; Aubin, Valérie; Aouizerate, Bruno; Azorin, Jean-Michel; Bellivier, Frank; Belzeaux, Raoul; Bougerol, Thierry; Cussac, Iréna; Courtet, Philippe; Etain, Bruno; Gard, Sébastien; Job, Sophie; Kahn, Jean-Pierre; Leboyer, Marion; Olié, Emilie; Henry, Chantal; Passerieux, Christine
2017-03-01
Although cognitive deficits are a well-established feature of bipolar disorders (BD), even during periods of euthymia, little is known about cognitive phenotype heterogeneity among patients with BD. We investigated neuropsychological performance in 258 euthymic patients with BD recruited via the French network of expert centers for BD. We used a test battery assessing six domains of cognition. Hierarchical cluster analysis of the cross-sectional data was used to determine the optimal number of subgroups and to assign each patient to a specific cognitive cluster. Subsequently, subjects from each cluster were compared on demographic, clinical functioning, and pharmacological variables. A four-cluster solution was identified. The global cognitive performance was above normal in one cluster and below normal in another. The other two clusters had a near-normal cognitive performance, with above and below average verbal memory, respectively. Among the four clusters, significant differences were observed in estimated intelligence quotient and social functioning, which were lower for the low cognitive performers compared to the high cognitive performers. These results confirm the existence of several distinct cognitive profiles in BD. Identification of these profiles may help to develop profile-specific cognitive remediation programs, which might improve functioning in BD. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
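The analysis style, hierarchical clustering of per-patient cognitive domain scores followed by cutting the tree into subgroups, can be sketched as follows. Ward linkage and the synthetic two-profile data are assumptions for the example; the study also had to select the optimal number of clusters, which this sketch takes as given.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cognitive_clusters(scores, n_clusters):
    """Ward hierarchical clustering of per-patient domain scores, cutting
    the dendrogram at a fixed number of subgroups."""
    tree = linkage(scores, method="ward")
    return fcluster(tree, t=n_clusters, criterion="maxclust")

# Two synthetic performance profiles across six cognitive domains:
# above-normal (+1 SD) and below-normal (-1 SD) performers.
rng = np.random.default_rng(4)
scores = np.vstack([rng.normal(1.0, 0.1, (20, 6)),
                    rng.normal(-1.0, 0.1, (20, 6))])
labels = cognitive_clusters(scores, 2)
```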
Cluster Stability Estimation Based on a Minimal Spanning Trees Approach
NASA Astrophysics Data System (ADS)
Volkovich, Zeev (Vladimir); Barzily, Zeev; Weber, Gerhard-Wilhelm; Toledano-Kitai, Dvora
2009-08-01
Among the areas of data and text mining employed today in science, economy, and technology, clustering serves as a preprocessing step in data analysis. However, many open questions still await theoretical and practical treatment; e.g., the problem of determining the true number of clusters has not been satisfactorily solved. In the current paper, this problem is addressed by the cluster stability approach. For several possible numbers of clusters, we estimate the stability of partitions obtained from clustering of samples. Partitions are considered consistent if their clusters are stable. Cluster validity is measured as the total number of edges in the clusters' minimal spanning trees that connect points from different samples; in effect, we use the Friedman and Rafsky two-sample test statistic. The homogeneity hypothesis of well-mingled samples within the clusters leads to an asymptotic normal distribution of this statistic. Resting upon this fact, the standard score of the between-sample edge count is computed, and the partition quality is represented by the worst cluster, corresponding to the minimal standard score value. It is natural to expect that the true number of clusters can be characterized by the empirical distribution having the shortest left tail. The proposed methodology sequentially creates the described value distribution and estimates its left-asymmetry. Numerical experiments presented in the paper demonstrate the ability of the approach to detect the true number of clusters.
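The edge-counting statistic described here can be sketched in a few lines. This is an illustrative reimplementation, not the authors' code: it pools two samples, builds the Euclidean minimal spanning tree, and counts edges joining points from different samples (the Friedman-Rafsky count). Well-mingled samples yield many cross edges; well-separated samples yield few.

```python
import numpy as np
from scipy.spatial import distance_matrix
from scipy.sparse.csgraph import minimum_spanning_tree

def between_sample_edges(a, b):
    """Friedman-Rafsky count: MST edges joining points from two samples."""
    pooled = np.vstack([a, b])
    labels = np.r_[np.zeros(len(a)), np.ones(len(b))]
    d = distance_matrix(pooled, pooled)          # pairwise Euclidean distances
    mst = minimum_spanning_tree(d)               # sparse MST adjacency
    i, j = mst.nonzero()
    return int(np.sum(labels[i] != labels[j]))   # edges crossing the two samples
```

A standard score for this count would then be formed from its null mean and variance, as the abstract describes, and the worst (smallest) score over clusters summarizes the partition.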
Arpino, Bruno; Cannas, Massimo
2016-05-30
This article focuses on the implementation of propensity score matching for clustered data. Different approaches to reduce bias due to cluster-level confounders are considered and compared using Monte Carlo simulations. We investigated methods that exploit the clustered structure of the data in two ways: in the estimation of the propensity score model (through the inclusion of fixed or random effects) or in the implementation of the matching algorithm. In addition to pure within-cluster matching, we also assessed the performance of a new approach, 'preferential' within-cluster matching. This approach first searches for control units to be matched to treated units within the same cluster. If matching is not possible within the cluster, the algorithm then searches in other clusters. All considered approaches successfully reduced the bias due to the omission of a cluster-level confounder. The preferential within-cluster matching approach, combining the advantages of within-cluster and between-cluster matching, performed relatively well with both large and small clusters, and it was often the best method. An important advantage of this approach is that it reduces the number of unmatched units as compared with pure within-cluster matching. We applied these methods to the estimation of the effect of caesarean section on the Apgar score using birth register data. Copyright © 2016 John Wiley & Sons, Ltd.
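A minimal sketch of the 'preferential' idea, under assumed inputs (a fitted propensity score `ps`, a boolean `treated` indicator, and integer cluster labels). The greedy 1:1 scheme and the caliper are illustrative simplifications, not the authors' exact algorithm:

```python
import numpy as np

def preferential_within_cluster_match(ps, treated, cluster, caliper=0.1):
    """Greedy 1:1 nearest-neighbour matching on the propensity score:
    try the treated unit's own cluster first, then fall back to any cluster."""
    controls_used = set()
    pairs = []
    for t in np.flatnonzero(treated):
        own = [c for c in np.flatnonzero(~treated)
               if cluster[c] == cluster[t] and c not in controls_used]
        anywhere = [c for c in np.flatnonzero(~treated) if c not in controls_used]
        for pool in (own, anywhere):          # preferential order
            if not pool:
                continue
            best = min(pool, key=lambda c: abs(ps[c] - ps[t]))
            if abs(ps[best] - ps[t]) <= caliper:
                pairs.append((t, best))
                controls_used.add(best)
                break                          # matched; stop at this pool
    return pairs
```

The fallback step is what reduces the number of unmatched treated units relative to pure within-cluster matching.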
An improved approximate-Bayesian model-choice method for estimating shared evolutionary history
2014-01-01
Background To understand biological diversification, it is important to account for large-scale processes that affect the evolutionary history of groups of co-distributed populations of organisms. Such events predict temporally clustered divergence times, a pattern that can be estimated using genetic data from co-distributed species. I introduce a new approximate-Bayesian method for comparative phylogeographical model-choice that estimates the temporal distribution of divergences across taxa from multi-locus DNA sequence data. The model is an extension of that implemented in msBayes. Results By reparameterizing the model, introducing more flexible priors on demographic and divergence-time parameters, and implementing a non-parametric Dirichlet-process prior over divergence models, I improved the robustness, accuracy, and power of the method for estimating shared evolutionary history across taxa. Conclusions The results demonstrate that the improved performance of the new method is due to (1) more appropriate priors on divergence-time and demographic parameters that avoid prohibitively small marginal likelihoods for models with more divergence events, and (2) the Dirichlet process providing a flexible prior on divergence histories that does not strongly disfavor models with intermediate numbers of divergence events. The new method yields more robust estimates of posterior uncertainty, and thus greatly reduces the tendency to incorrectly estimate models of shared evolutionary history with strong support. PMID:24992937
Attitude Estimation in Fractionated Spacecraft Cluster Systems
NASA Technical Reports Server (NTRS)
Hadaegh, Fred Y.; Blackmore, James C.
2011-01-01
Attitude estimation was examined for fractionated free-flying spacecraft. Instead of a single, monolithic spacecraft, a fractionated free-flying spacecraft uses multiple spacecraft modules. These modules are connected only through wireless communication links and, potentially, wireless power links. The key advantage of this concept is the ability to respond to uncertainty. For example, if a single spacecraft module in the cluster fails, a new one can be launched at a lower cost and risk than would be incurred with on-orbit servicing or replacement of the monolithic spacecraft. In order to create such a system, however, it is essential to know what the navigation capabilities of the fractionated system are as a function of the capabilities of the individual modules, and to have an algorithm that can estimate the attitudes and relative positions of the modules with fractionated sensing capabilities. Looking specifically at fractionated attitude estimation with star trackers and optical relative attitude sensors, a set of mathematical tools has been developed that specifies the set of sensors necessary to ensure that the attitude of the entire cluster (the "cluster attitude") can be observed. Also developed was a navigation filter that can estimate the cluster attitude if these conditions are satisfied. Each module in the cluster may have a star tracker, a relative attitude sensor, or both. An extended Kalman filter can be used to estimate the attitude of all modules. A range of estimation performances can be achieved depending on the sensors used and the topology of the sensing network.
MASSCLEANage—Stellar Cluster Ages from Integrated Colors
NASA Astrophysics Data System (ADS)
Popescu, Bogdan; Hanson, M. M.
2010-11-01
We present the recently updated and expanded MASSCLEANcolors, a database of 70 million Monte Carlo models selected to match the properties (metallicity, ages, and masses) of stellar clusters found in the Large Magellanic Cloud (LMC). This database shows the rather extreme and non-Gaussian distribution of integrated colors and magnitudes expected with different cluster age and mass and the enormous age degeneracy of integrated colors when mass is unknown. This degeneracy could lead to catastrophic failures in estimating age with standard simple stellar population models, particularly if most of the clusters are of intermediate or low mass, as in the LMC. Utilizing the MASSCLEANcolors database, we have developed MASSCLEANage, a statistical inference package which assigns the most likely age and mass (solved simultaneously) to a cluster based only on its integrated broadband photometric properties. Finally, we use MASSCLEANage to derive the age and mass of LMC clusters based on integrated photometry alone. First, we compare our cluster ages against those obtained for the same seven clusters using more accurate integrated spectroscopy. We find improved agreement with the integrated spectroscopy ages over the original photometric ages. A close examination of our results demonstrates the necessity of solving simultaneously for mass and age to reduce degeneracies in the cluster ages derived via integrated colors. We then selected an additional subset of 30 photometric clusters with previously well-constrained ages and independently derived their ages using MASSCLEANage with the same photometry, with very good agreement. The MASSCLEANage program is freely available under the GNU General Public License.
Spatially explicit population estimates for black bears based on cluster sampling
Humm, J.; McCown, J. Walter; Scheick, B.K.; Clark, Joseph D.
2017-01-01
We estimated abundance and density of the 5 major black bear (Ursus americanus) subpopulations (i.e., Eglin, Apalachicola, Osceola, Ocala-St. Johns, Big Cypress) in Florida, USA with spatially explicit capture-mark-recapture (SCR) by extracting DNA from hair samples collected at barbed-wire hair sampling sites. We employed a clustered sampling configuration, with sampling sites arranged in 3 × 3 clusters, sites spaced 2 km apart within each cluster, and cluster centers spaced 16 km apart (center to center). We surveyed all 5 subpopulations, encompassing 38,960 km2, during 2014 and 2015. Several landscape variables, most associated with forest cover, helped refine density estimates for the 5 subpopulations we sampled. Detection probabilities were affected by site-specific behavioral responses coupled with individual capture heterogeneity associated with sex. Model-averaged bear population estimates ranged from 120 bears (95% CI = 59–276) or a mean 0.025 bears/km2 (95% CI = 0.011–0.44) for the Eglin subpopulation to 1,198 bears (95% CI = 949–1,537) or 0.127 bears/km2 (95% CI = 0.101–0.163) for the Ocala-St. Johns subpopulation. The total population estimate for our 5 study areas was 3,916 bears (95% CI = 2,914–5,451). The clustered sampling method, coupled with information on land cover, was efficient and allowed us to estimate abundance across extensive areas that would not have been possible otherwise. Clustered sampling combined with spatially explicit capture-recapture methods has the potential to provide rigorous population estimates for a wide array of species that are extensive and heterogeneous in their distribution.
A MAGNIFIED GLANCE INTO THE DARK SECTOR: PROBING COSMOLOGICAL MODELS WITH STRONG LENSING IN A1689
DOE Office of Scientific and Technical Information (OSTI.GOV)
Magaña, Juan; Motta, V.; Cárdenas, Victor H.
2015-11-01
In this paper we constrain four alternative models of the late cosmic acceleration of the universe: Chevallier–Polarski–Linder (CPL), interacting dark energy (IDE), Ricci holographic dark energy (HDE), and modified polytropic Cardassian (MPC). Strong lensing (SL) images of background galaxies produced by the galaxy cluster Abell 1689 are used to test these models. To perform this analysis we modify the LENSTOOL lens modeling code. The value added by this probe is compared with other complementary probes: Type Ia supernovae (SN Ia), baryon acoustic oscillations (BAO), and the cosmic microwave background (CMB). We found that the CPL constraints obtained from the SL data are consistent with those estimated using the other probes. The IDE constraints are consistent with the complementary bounds only if large errors in the SL measurements are considered. The Ricci HDE and MPC constraints are weak, but they are similar to the BAO, SN Ia, and CMB estimations. We also compute the figure of merit as a tool to quantify the goodness of fit of the data. Our results suggest that the SL method provides statistically significant constraints on the CPL parameters but is weak for those of the other models. Finally, we show that the use of SL measurements in galaxy clusters is a promising and powerful technique to constrain cosmological models. The advantage of this method is that cosmological parameters are estimated by modeling the SL features for each underlying cosmology. These estimations could be further improved by SL constraints coming from other galaxy clusters.
Energy spectra of X-ray clusters of galaxies
NASA Technical Reports Server (NTRS)
Avni, Y.
1976-01-01
A procedure for estimating the ranges of parameters that describe the spectra of X-rays from clusters of galaxies is presented. The applicability of the method is proved by statistical simulations of cluster spectra; such a proof is necessary because of the nonlinearity of the spectral functions. Implications for the spectra of the Perseus, Coma, and Virgo clusters are discussed. The procedure can be applied in more general problems of parameter estimation.
A 1400-MHz survey of 1478 Abell clusters of galaxies
NASA Technical Reports Server (NTRS)
Owen, F. N.; White, R. A.; Hilldrup, K. C.; Hanisch, R. J.
1982-01-01
Observations of 1478 Abell clusters of galaxies with the NRAO 91-m telescope at 1400 MHz are reported. The measured beam shape was deconvolved from the measured source Gaussian fits in order to estimate the source size and position angle. All detected sources within 0.5 corrected Abell cluster radii are listed, including the cluster number, richness class, distance class, magnitude of the tenth brightest galaxy, redshift estimate, corrected cluster radius in arcmin, right ascension and error, declination and error, total flux density and error, and angular structure for each source.
MIXOR: a computer program for mixed-effects ordinal regression analysis.
Hedeker, D; Gibbons, R D
1996-03-01
MIXOR provides maximum marginal likelihood estimates for mixed-effects ordinal probit, logistic, and complementary log-log regression models. These models can be used for analysis of dichotomous and ordinal outcomes from either a clustered or longitudinal design. For clustered data, the mixed-effects model assumes that data within clusters are dependent. The degree of dependency is jointly estimated with the usual model parameters, thus adjusting for dependence resulting from clustering of the data. Similarly, for longitudinal data, the mixed-effects approach can allow for individual-varying intercepts and slopes across time, and can estimate the degree to which these time-related effects vary in the population of individuals. MIXOR uses marginal maximum likelihood estimation, utilizing a Fisher-scoring solution. For the scoring solution, the Cholesky factor of the random-effects variance-covariance matrix is estimated, along with the effects of model covariates. Examples illustrating usage and features of MIXOR are provided.
Wickham, J.D.; Stehman, S.V.; Smith, J.H.; Wade, T.G.; Yang, L.
2004-01-01
Two-stage cluster sampling reduces the cost of collecting accuracy assessment reference data by constraining sample elements to fall within a limited number of geographic domains (clusters). However, because classification error is typically positively spatially correlated, within-cluster correlation may reduce the precision of the accuracy estimates. The detailed population information needed to quantify a priori the effect of within-cluster correlation on precision is typically unavailable. Consequently, a convenient, practical approach to evaluate the likely performance of a two-stage cluster sample is needed. We describe such an a priori evaluation protocol, focusing on the spatial distribution of the sample by land-cover class across different cluster sizes and the costs of different sampling options, including options not imposing clustering. This protocol also assesses the two-stage design's adequacy for estimating the precision of accuracy estimates for rare land-cover classes. We illustrate the approach using two large-area, regional accuracy assessments from the National Land-Cover Data (NLCD), and describe how the a priori evaluation was used as a decision-making tool when implementing the NLCD design.
Target Information Processing: A Joint Decision and Estimation Approach
2012-03-29
ground targets (track-before-detect) using computer clusters and graphics processing units. Estimation and filtering theory is one of the most important...
Birnbaum, Julia; Geyer, Christine; Kirchberg, Franca; Manios, Yannis; Koletzko, Berthold
2017-02-01
This study aimed to examine the effect of the ToyBox-intervention, a kindergarten-based, family-involved intervention aiming to improve preschoolers' energy-related behaviours (e.g., physical activity), on motor performance ability. Physical activity sessions, classroom activities, environmental changes and tools for parents were the components of the 1-year intervention. Intervention and control groups were cluster-randomised, and children's anthropometry and two motor test items (jumping from side to side, JSS, and standing long jump, SLJ) were assessed. A total of 1293 children (4.6 ± 0.69 years; 52% boys) from 45 kindergartens in Germany were included (intervention, n = 863; control, n = 430). The effect was assessed using generalised estimating equations. The intervention group showed a better improvement in JSS (estimate 2.19 jumps, P = 0.01) and tended to improve more in SLJ (estimate 2.73 cm, P = 0.08). The intervention was more effective in boys with respect to SLJ (P of interaction effect = 0.01). Children aged <4.5 years did not show a significant benefit, while older children improved (JSS, estimate 3.38 jumps, P = 0.004; SLJ, estimate 4.18 cm, P = 0.04). Children with low socio-economic status improved in JSS (estimate 5.98 jumps, P = 0.0001). The ToyBox-intervention offers an effective strategy to improve specific components of motor performance ability in early childhood. Future programmes should consider additional strategies specifically targeting girls and younger children. BMI: body mass index; SES: socio-economic status; JSS: jumping from side to side; SLJ: standing long jump; SD: standard deviation; GEE: generalised estimating equation.
NASA Astrophysics Data System (ADS)
Ebrahimi, A.; Pahlavani, P.; Masoumi, Z.
2017-09-01
Traffic monitoring and management in urban intelligent transportation systems (ITS) can be carried out based on vehicular sensor networks. In a vehicular sensor network, vehicles equipped with sensors such as GPS can act as mobile sensors for sensing urban traffic and sending reports to a traffic monitoring center (TMC) for traffic estimation. Energy consumption by the sensor nodes is a main problem in wireless sensor networks (WSNs); moreover, it is the most important consideration in designing these networks. Clustering the sensor nodes is considered an effective solution to reduce the energy consumption of WSNs. Each cluster has a cluster head (CH) and a number of nodes located within its supervision area. The cluster heads are responsible for gathering and aggregating the information of their clusters and transmitting it to the data collection center. Hence, clustering decreases the volume of transmitted information and, consequently, reduces the energy consumption of the network. In this paper, Fuzzy C-Means (FCM) and Fuzzy Subtractive algorithms are employed to cluster sensors and to investigate their effect on the energy consumption of the sensors. The FCM and Fuzzy Subtractive algorithms reduced the energy consumption of the vehicle sensors by up to 90.68% and 92.18%, respectively, a difference of about 1.5 percentage points in favour of the Fuzzy Subtractive algorithm.
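For reference, a generic textbook Fuzzy C-Means routine, alternating membership and centroid updates. This is not the authors' implementation; the fuzzifier m = 2 and the fixed iteration count are illustrative choices:

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, seed=0):
    """Plain Fuzzy C-Means: returns cluster centers and fuzzy memberships U,
    where U[i, k] is the degree to which point i belongs to cluster k."""
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=len(X))      # random memberships, rows sum to 1
    for _ in range(n_iter):
        # Weighted centroids using memberships raised to the fuzzifier m.
        centers = (U.T ** m @ X) / (U.T ** m).sum(axis=1, keepdims=True)
        # Distances of every point to every center (small guard avoids divide-by-zero).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        # Standard membership update: u_ik proportional to d_ik^(-2/(m-1)).
        p = 2.0 / (m - 1.0)
        U = (d ** -p) / np.sum(d ** -p, axis=1, keepdims=True)
    return centers, U
```

In the clustering-for-energy setting, nodes with the highest membership near a centroid would be natural cluster-head candidates.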
NASA Instrument Cost/Schedule Model
NASA Technical Reports Server (NTRS)
Habib-Agahi, Hamid; Mrozinski, Joe; Fox, George
2011-01-01
NASA's Office of Independent Program and Cost Evaluation (IPCE) has established a number of initiatives to improve its cost and schedule estimating capabilities. One of these initiatives has resulted in the JPL-developed NASA Instrument Cost Model (NICM). NICM is a cost and schedule estimator that contains: a system-level cost estimation tool; a subsystem-level cost estimation tool; a database of cost and technical parameters of over 140 previously flown remote sensing and in-situ instruments; a schedule estimator; a set of rules to estimate cost and schedule by life cycle phases (B/C/D); and a novel tool for developing joint probability distributions for cost and schedule risk (Joint Confidence Level (JCL)). This paper describes the development and use of NICM, including the data normalization processes, data mining methods (cluster analysis, principal components analysis, regression analysis and bootstrap cross validation), the estimating equations themselves, and a demonstration of the NICM tool suite.
STELLAR ENCOUNTER RATE IN GALACTIC GLOBULAR CLUSTERS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bahramian, Arash; Heinke, Craig O.; Sivakoff, Gregory R.
2013-04-01
The high stellar densities in the cores of globular clusters cause significant stellar interactions. These stellar interactions can produce close binary mass-transferring systems involving compact objects and their progeny, such as X-ray binaries and radio millisecond pulsars. Comparing the numbers of these systems and the interaction rates in different clusters drives our understanding of how cluster parameters affect the production of close binaries. In this paper we estimate stellar encounter rates (Γ) for 124 Galactic globular clusters based on observational data, as opposed to previously employed methods that assumed 'King-model' profiles for all clusters. By deprojecting cluster surface brightness profiles to estimate luminosity density profiles, we treat 'King-model' and 'core-collapsed' clusters in the same way. In addition, we use Monte Carlo simulations to investigate the effects of uncertainties in various observational parameters (distance, reddening, surface brightness) on Γ, producing the first catalog of globular cluster stellar encounter rates with estimated errors. Comparing our results with published observations of likely products of stellar interactions (numbers of X-ray binaries, numbers of radio millisecond pulsars, and γ-ray luminosity), we find both clear correlations and some differences with published results.
Li, Peng; Redden, David T.
2014-01-01
The sandwich estimator in the generalized estimating equations (GEE) approach underestimates the true variance in small samples and consequently results in inflated type I error rates in hypothesis testing. This fact limits the application of GEE in cluster-randomized trials (CRTs) with few clusters. Under various CRT scenarios with correlated binary outcomes, we evaluate the small-sample properties of the GEE Wald tests using bias-corrected sandwich estimators. Our results suggest that the GEE Wald z-test should be avoided in the analysis of CRTs with few clusters, even when bias-corrected sandwich estimators are used. With a t-distribution approximation, the Kauermann and Carroll (KC) correction can keep the test size at nominal levels even when the number of clusters is as low as 10, and is robust to moderate variation of the cluster sizes. However, in cases with large variations in cluster sizes, the Fay and Graubard (FG) correction should be used instead. Furthermore, we derive a formula to calculate the power and minimum total number of clusters needed using the t-test and KC correction for CRTs with binary outcomes. The power levels predicted by the proposed formula agree well with the empirical powers from the simulations. The proposed methods are illustrated using real CRT data. We conclude that, with appropriate control of type I error rates under small sample sizes, we recommend the use of the GEE approach in CRTs with binary outcomes due to fewer assumptions and robustness to misspecification of the covariance structure. PMID:25345738
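For intuition, the uncorrected sandwich variance the abstract warns about can be sketched in the simplest case: a Gaussian GEE with an independence working correlation, which reduces to OLS with a cluster-robust ("bread-meat-bread") variance. The KC and FG corrections rescale the per-cluster residuals and are omitted here; all data in the test are simulated for illustration:

```python
import numpy as np

def cluster_robust_var(X, y, cluster):
    """OLS coefficients with the classic (uncorrected) cluster-robust sandwich
    variance. With few clusters the 'meat' term is biased downward, which is
    the source of the inflated type I error rates discussed above."""
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta
    bread = np.linalg.inv(X.T @ X)
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(cluster):
        sg = X[cluster == g].T @ resid[cluster == g]   # per-cluster score
        meat += np.outer(sg, sg)
    return beta, bread @ meat @ bread
```

The KC correction replaces each cluster's residuals r_g with (I - H_g)^(-1/2) r_g, where H_g is that cluster's hat-matrix block; FG uses a related but more aggressive rescaling.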
VizieR Online Data Catalog: Star clusters distances and extinctions (Buckner+, 2013)
NASA Astrophysics Data System (ADS)
Buckner, A. S. M.; Froebrich, D.
2014-10-01
Determining star cluster distances is essential to analyse their properties and distribution in the Galaxy. In particular, it is desirable to have a reliable, purely photometric distance estimation method for large samples of newly discovered cluster candidates, e.g. from the Two Micron All Sky Survey, the UK Infrared Deep Sky Survey Galactic Plane Survey and VVV. Here, we establish an automatic method to estimate distances and reddening from near-infrared photometry alone, without the use of isochrone fitting. We employ a decontamination procedure of JHK photometry to determine the density of stars foreground to clusters and a galactic model to estimate distances. We then calibrate the method using clusters with known properties. This allows us to establish distance estimates with better than 40 percent accuracy. We apply our method to determine the extinction and distance values to 378 known open clusters and 397 cluster candidates from the list of Froebrich, Scholz & Raftery (2007MNRAS.374..399F, Cat. J/MNRAS/374/399). We find that the sample is biased towards clusters of a distance of approximately 3kpc, with typical distances between 2 and 6kpc. Using the cluster distances and extinction values, we investigate how the average extinction per kiloparsec distance changes as a function of the Galactic longitude. We find a systematic dependence that can be approximated by A_H(l) [mag/kpc] = 0.10 + 0.001 × |l - 180°|/° for regions more than 60° from the Galactic Centre. (1 data file).
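The quoted fit can be wrapped in a one-line helper; per the abstract it applies only to regions more than 60° from the Galactic Centre, i.e. |l − 180°| < 120°:

```python
def extinction_per_kpc(l_deg):
    """Mean H-band extinction per kpc, A_H(l) = 0.10 + 0.001 * |l - 180|,
    from the fitted relation above (l in degrees, result in mag/kpc).
    Raises for sightlines within 60 deg of the Galactic Centre, where the
    relation was not fitted."""
    if abs(l_deg - 180.0) >= 120.0:
        raise ValueError("relation fitted only for |l - 180 deg| < 120 deg")
    return 0.10 + 0.001 * abs(l_deg - 180.0)
```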
DOE Office of Scientific and Technical Information (OSTI.GOV)
Reichardt, C. L.; Stalder, B.; Ashby, M. L. N.
2013-02-15
We present a catalog of galaxy cluster candidates, selected through their Sunyaev-Zel'dovich (SZ) effect signature in the first 720 deg² of the South Pole Telescope (SPT) survey. This area was mapped with the SPT in the 2008 and 2009 austral winters to a depth of ~18 µK_CMB-arcmin at 150 GHz; 550 deg² of it was also mapped to ~44 µK_CMB-arcmin at 95 GHz. Based on optical imaging of all 224 candidates and near-infrared imaging of the majority of candidates, we have found optical and/or infrared counterparts for 158, which we then classify as confirmed galaxy clusters. Of these 158 clusters, 135 were first identified as clusters in SPT data, including 117 new discoveries reported in this work. This catalog triples the number of confirmed galaxy clusters discovered through the SZ effect. We report photometrically derived (and in some cases spectroscopic) redshifts for confirmed clusters and redshift lower limits for the remaining candidates. The catalog extends to high redshift with a median redshift of z = 0.55 and maximum confirmed redshift of z = 1.37. Forty-five of the clusters have counterparts in the ROSAT bright or faint source catalogs, from which we estimate X-ray fluxes. Based on simulations, we expect the catalog to be nearly 100% complete above M_500 ≈ 5 × 10^14 M_Sun h_70^-1 at z ≳ 0.6. There are 121 candidates detected at signal-to-noise ratio greater than five, at which the catalog purity is measured to be 95%. From this high-purity subsample, we exclude the z < 0.3 clusters and use the remaining 100 candidates to improve cosmological constraints following the method presented by Benson et al. Adding the cluster data to CMB + BAO + H_0 data leads to a preference for non-zero neutrino masses while only slightly reducing the upper limit on the sum of neutrino masses to Σm_ν < 0.38 eV (95% CL). For a spatially flat wCDM cosmological model, the addition of this catalog to the CMB + BAO + H_0 + SNe results yields σ_8 = 0.807 ± 0.027 and w = -1.010 ± 0.058, improving the constraints on these parameters by factors of 1.4 and 1.3, respectively. The larger cluster catalog presented in this work leads to slight improvements in cosmological constraints over those presented by Benson et al. These cosmological constraints are currently limited by uncertainty in the cluster mass calibration, not the size or quality of the cluster catalog. A multi-wavelength observation program to improve the cluster mass calibration will make it possible to realize the full potential of the final 2500 deg² SPT cluster catalog to constrain cosmology.
Buregyeya, Esther; Rutebemberwa, Elizeus; LaRussa, Philip; Mbonye, Anthony
2016-11-11
Uganda's under-five mortality is high, currently estimated at 66/1000 live births. Poor referral of sick children that seek care from the private sector is one of the contributory factors. The proposed intervention aims to improve referral and uptake of referral advice for children that seek care from private facilities (registered drug shops/private clinics). A cluster randomized design will be applied to test the intervention in Mukono District, central Uganda. A sample of study clusters will implement the intervention. The intervention will consist of three components: i) raising awareness in the community: village health teams will discuss the importance of referral and encourage households to save money, ii) training and supervision of providers in the private sector to diagnose, treat and refer sick children, iii) regular meetings between the public and private providers (convened by the district health team) to discuss the referral system. Twenty clusters will be included in the study, randomized in the ratio of 1:1. A minimum of 319 sick children will be recruited per cluster, giving a total of 8910 sick children across all clusters after adjusting for a 10% loss to follow-up and possible withdrawal of private outlets. The immediate sustainable impact will be appropriate treatment of sick children. The intervention is likely to impact on private sector practices since the scope of the services they provide will have expanded. The proposed study is also likely to have an impact on families as: i) they may appreciate the importance of timely referral in child illness management, ii) the cost savings related to reduced morbidity will be used by households to access other social services. The linkage between the private and public sectors will create a potential avenue for delivery of other public health interventions and improved working relations in the two sectors.
Further, improved quality of services in the private sector will improve provider confidence and hopefully attract more clientele to the private practices. Trial registration: NCT02450630, registered May 9, 2015.
Carpenter, Joanne S; Robillard, Rébecca; Lee, Rico S C; Hermens, Daniel F; Naismith, Sharon L; White, Django; Whitwell, Bradley; Scott, Elizabeth M; Hickie, Ian B
2015-01-01
Although early-stage affective disorders are associated with both cognitive dysfunction and sleep-wake disruptions, relationships between these factors have not been specifically examined in young adults. Sleep and circadian rhythm disturbances in those with affective disorders are considerably heterogeneous, and may not relate to cognitive dysfunction in a simple linear fashion. This study aimed to characterise profiles of sleep and circadian disturbance in young people with affective disorders and examine associations between these profiles and cognitive performance. Actigraphy monitoring was completed in 152 young people (16-30 years; 66% female) with primary diagnoses of affective disorders, and 69 healthy controls (18-30 years; 57% female). Patients also underwent detailed neuropsychological assessment. Actigraphy data were processed to estimate both sleep and circadian parameters. Overall neuropsychological performance in patients was poor on tasks relating to mental flexibility and visual memory. Two hierarchical cluster analyses identified three distinct patient groups based on sleep variables and three based on circadian variables. Sleep clusters included a 'long sleep' cluster, a 'disrupted sleep' cluster, and a 'delayed and disrupted sleep' cluster. Circadian clusters included a 'strong circadian' cluster, a 'weak circadian' cluster, and a 'delayed circadian' cluster. Medication use differed between clusters. The 'long sleep' cluster displayed significantly worse visual memory performance compared to the 'disrupted sleep' cluster. No other cognitive functions differed between clusters. These results highlight the heterogeneity of sleep and circadian profiles in young people with affective disorders, and provide preliminary evidence in support of a relationship between sleep and visual memory, which may be mediated by use of antipsychotic medication. 
These findings have implications for the personalisation of treatments and improvement of functioning in young adults early in the course of affective illness.
Jacob, Benjamin G; Novak, Robert J; Toe, Laurent; Sanfo, Moussa S; Afriyie, Abena N; Ibrahim, Mohammed A; Griffith, Daniel A; Unnasch, Thomas R
2012-01-01
The standard methods for regression analyses of clustered riverine larval habitat data of Simulium damnosum s.l., a major black-fly vector of onchocerciasis, postulate models relating observational ecological-sampled parameter estimators to prolific habitats without accounting for residual intra-cluster error correlation effects. Generally, this correlation comes from two sources: (1) the design of the random effects and their assumed covariance from the multiple levels within the regression model; and (2) the correlation structure of the residuals. Unfortunately, inconspicuous errors in residual intra-cluster correlation estimates can overstate precision in forecasted S. damnosum s.l. riverine larval habitat explanatory attributes regardless of how they are treated (e.g., independent, autoregressive, Toeplitz, etc.). In this research, the geographical locations for multiple riverine-based S. damnosum s.l. larval ecosystem habitats sampled from 2 pre-established epidemiological sites in Togo were identified and recorded from July 2009 to June 2010. Initially the data were aggregated in PROC GENMOD. An agglomerative hierarchical residual cluster-based analysis was then performed. The sampled clustered study site data were then analyzed for statistical correlations using Monthly Biting Rates (MBR). Euclidean distance measurements and terrain-related geomorphological statistics were then generated in ArcGIS. A digital overlay was then performed, also in ArcGIS, using the georeferenced ground coordinates of high and low density clusters stratified by Annual Biting Rates (ABR). This data was overlain onto multitemporal sub-meter pixel resolution satellite data (i.e., QuickBird 0.61 m wavebands). Orthogonal spatial filter eigenvectors were then generated in SAS/GIS.
Univariate and non-linear regression-based models (i.e., Logistic, Poisson and Negative Binomial) were also employed to determine probability distributions and to identify statistically significant parameter estimators from the sampled data. Thereafter, Durbin-Watson test statistics were used in PROC AUTOREG to test the null hypothesis that the regression residuals were not autocorrelated against the alternative that the residuals followed an autoregressive process. Bayesian uncertainty matrices were also constructed employing normal priors for each of the sampled estimators in PROC MCMC. The residuals revealed both spatially structured and unstructured error effects in the high and low ABR-stratified clusters. The analyses also revealed that the estimators, levels of turbidity and presence of rocks, were statistically significant for the high-ABR-stratified clusters, while the estimators distance between habitats and floating vegetation were important for the low-ABR-stratified cluster. Varying and constant coefficient regression models, ABR-stratified GIS-generated clusters, sub-meter resolution satellite imagery, a robust residual intra-cluster diagnostic test, MBR-based histograms, eigendecomposition spatial filter algorithms and Bayesian matrices can enable accurate autoregressive estimation of latent uncertainty effects and other residual error probabilities (i.e., heteroskedasticity) for testing correlations between georeferenced S. damnosum s.l. riverine larval habitat estimators. The asymptotic distribution of the resulting residual-adjusted intra-cluster predictor error autocovariate coefficients can thereafter be established, while estimates of the asymptotic variance can lead to the construction of approximate confidence intervals for accurately targeting productive S. damnosum s.l. habitats based on spatiotemporal field-sampled count data.
Color-magnitude diagrams for six metal-rich, low-latitude globular clusters
NASA Technical Reports Server (NTRS)
Armandroff, Taft E.
1988-01-01
Colors and magnitudes for stars on CCD frames for six metal-rich, low-latitude, previously unstudied globular clusters and one well-studied, metal-rich cluster (47 Tuc) have been derived and color-magnitude diagrams have been constructed. The photometry for stars in 47 Tuc is in good agreement with previous studies, while the V magnitudes of the horizontal-branch stars in the six program clusters do not agree with estimates based on secondary methods. The distances to these clusters are different from prior estimates. Reddening values are derived for each program cluster. The horizontal branches of the program clusters all appear to lie entirely redwards of the red edge of the instability strip, as is normal for their metallicities.
Prüss-Ustün, Annette; Bartram, Jamie; Clasen, Thomas; Colford, John M; Cumming, Oliver; Curtis, Valerie; Bonjour, Sophie; Dangour, Alan D; De France, Jennifer; Fewtrell, Lorna; Freeman, Matthew C; Gordon, Bruce; Hunter, Paul R; Johnston, Richard B; Mathers, Colin; Mäusezahl, Daniel; Medlicott, Kate; Neira, Maria; Stocks, Meredith; Wolf, Jennyfer; Cairncross, Sandy
2014-01-01
Objective: To estimate the burden of diarrhoeal diseases from exposure to inadequate water, sanitation and hand hygiene in low- and middle-income settings and provide an overview of the impact on other diseases. Methods: For estimating the impact of water, sanitation and hygiene on diarrhoea, we selected exposure levels with both sufficient global exposure data and a matching exposure-risk relationship. Global exposure data were estimated for the year 2012, and risk estimates were taken from the most recent systematic analyses. We estimated attributable deaths and disability-adjusted life years (DALYs) by country, age and sex for inadequate water, sanitation and hand hygiene separately, and as a cluster of risk factors. Uncertainty estimates were computed on the basis of uncertainty surrounding exposure estimates and relative risks. Results: In 2012, 502 000 diarrhoea deaths were estimated to be caused by inadequate drinking water and 280 000 deaths by inadequate sanitation. The most likely estimate of disease burden from inadequate hand hygiene amounts to 297 000 deaths. In total, 842 000 diarrhoea deaths are estimated to be caused by this cluster of risk factors, which amounts to 1.5% of the total disease burden and 58% of diarrhoeal diseases. In children under 5 years old, 361 000 deaths could be prevented, representing 5.5% of deaths in that age group. Conclusions: This estimate confirms the importance of improving water and sanitation in low- and middle-income settings for the prevention of diarrhoeal disease burden. It also underscores the need for better data on exposure and risk reductions that can be achieved with provision of reliable piped water, community sewage with treatment and hand hygiene. PMID:24779548
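Burden attribution of this kind typically runs through a population attributable fraction (PAF), which converts exposure prevalences and relative risks into the share of deaths attributable to the risk-factor cluster. A minimal sketch of that arithmetic, with made-up prevalences, relative risks and death totals that are NOT the paper's inputs:

```python
# Illustrative population attributable fraction (PAF) calculation. All
# numbers below are hypothetical placeholders, not the study's estimates.

def attributable_fraction(prevalence, relative_risk):
    """PAF for exposure categories with prevalence p_i and risk RR_i,
    relative to an unexposed baseline (RR = 1)."""
    excess = sum(p * (rr - 1.0) for p, rr in zip(prevalence, relative_risk))
    return excess / (1.0 + excess)

# Hypothetical exposure mix: 30% with unimproved water (RR 1.9),
# 20% with basic-but-unsafe water (RR 1.3).
paf = attributable_fraction([0.30, 0.20], [1.9, 1.3])

total_diarrhoea_deaths = 1_450_000            # placeholder total
attributable_deaths = paf * total_diarrhoea_deaths
```

Multiplying the PAF by total cause-specific deaths, by country, age and sex, yields the attributable-death counts of the kind reported above.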
An improved clustering algorithm based on reverse learning in intelligent transportation
NASA Astrophysics Data System (ADS)
Qiu, Guoqing; Kou, Qianqian; Niu, Ting
2017-05-01
With the development of artificial intelligence and data mining technology, big data has gradually entered people's field of vision. Clustering is an important method for processing large data sets. By introducing a reverse learning method into the clustering process of the PAM clustering algorithm, the limitations of one-time clustering in unsupervised clustering learning are reduced and the diversity of the clusters is increased, thereby improving the quality of the clustering. Algorithm analysis and experimental results show that the algorithm is feasible.
Dynamical mass estimates in M13
DOE Office of Scientific and Technical Information (OSTI.GOV)
Leonard, P.J.T.; Richer, H.B.; Fahlman, G.G.
We have used the proper motion data of Cudworth and Monet to make mass estimates in the globular cluster M13 by solving the spherical Jeans equation. We find a mass inside a spherical shell centered on the cluster, with a radius corresponding to 390 arcsec on the sky, of 5.5 or 7.6 × 10^5 M⊙, depending on the adopted cluster distance. This large dynamical mass estimate, together with the observed fact that the mass function of M13 is rising steeply at the low-mass end, suggests that much of the cluster mass may be in the form of low-mass stars and brown dwarfs.
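The order of magnitude of such a Jeans-equation mass can be sketched with a simple scaling estimator M ≈ η σ² r / G. The velocity dispersion, adopted distance and structural constant η below are illustrative assumptions for the sketch, not the values used in the paper:

```python
import math

# Order-of-magnitude dynamical mass inside projected radius r, of the kind
# obtained from the Jeans equation: M ~ eta * sigma^2 * r / G.
# sigma, the distance and eta are ASSUMED values for illustration only;
# eta depends on the density profile and velocity anisotropy.

G = 4.301e-3  # gravitational constant in pc (km/s)^2 / M_sun

def dynamical_mass(sigma_kms, r_pc, eta=2.5):
    return eta * sigma_kms**2 * r_pc / G

# 390 arcsec at an assumed distance of 7.2 kpc:
r_pc = 7200.0 * math.tan(math.radians(390.0 / 3600.0))   # ~13.6 pc
mass = dynamical_mass(sigma_kms=7.0, r_pc=r_pc)          # order 10^5 M_sun
```

With these placeholder inputs the estimator lands in the few × 10^5 M⊙ regime, consistent in order of magnitude with the values quoted above.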
Cross-correlation of weak lensing and gamma rays: implications for the nature of dark matter
NASA Astrophysics Data System (ADS)
Tröster, Tilman; Camera, Stefano; Fornasa, Mattia; Regis, Marco; van Waerbeke, Ludovic; Harnois-Déraps, Joachim; Ando, Shin'ichiro; Bilicki, Maciej; Erben, Thomas; Fornengo, Nicolao; Heymans, Catherine; Hildebrandt, Hendrik; Hoekstra, Henk; Kuijken, Konrad; Viola, Massimo
2017-05-01
We measure the cross-correlation between Fermi gamma-ray photons and over 1000 deg2 of weak lensing data from the Canada-France-Hawaii Telescope Lensing Survey (CFHTLenS), the Red Cluster Sequence Lensing Survey (RCSLenS), and the Kilo Degree Survey (KiDS). We present the first measurement of tomographic weak lensing cross-correlations and the first application of spectral binning to cross-correlations between gamma rays and weak lensing. The measurements are performed using an angular power spectrum estimator while the covariance is estimated using an analytical prescription. We verify the accuracy of our covariance estimate by comparing it to two internal covariance estimators. Based on the non-detection of a cross-correlation signal, we derive constraints on weakly interacting massive particle (WIMP) dark matter. We compute exclusion limits on the dark matter annihilation cross-section <σannv>, decay rate Γdec and particle mass mDM. We find that in the absence of a cross-correlation signal, tomography does not significantly improve the constraining power of the analysis. Assuming a strong contribution to the gamma-ray flux due to small-scale clustering of dark matter and accounting for known astrophysical sources of gamma rays, we exclude the thermal relic cross-section for particle masses of mDM ≲ 20 GeV.
Anchoring the Population II Distance Scale: Accurate Ages for Globular Clusters
NASA Technical Reports Server (NTRS)
Chaboyer, Brian C.; Carney, Bruce W.; Latham, David W.; Dunca, Douglas; Grand, Terry; Layden, Andy; Sarajedini, Ataollah; McWilliam, Andrew; Shao, Michael
2004-01-01
The metal-poor stars in the halo of the Milky Way galaxy were among the first objects formed in our Galaxy. These Population II stars are the oldest objects in the universe whose ages can be accurately determined. Age determinations for these stars allow us to set a firm lower limit to the age of the universe and to probe the early formation history of the Milky Way. The age of the universe determined from studies of Population II stars may be compared to the expansion age of the universe and used to constrain cosmological models. The largest uncertainty in estimates for the ages of stars in our halo is due to the uncertainty in the distance scale to Population II objects. We propose to obtain accurate parallaxes to a number of Population II objects (globular clusters and field stars in the halo), resulting in a significant improvement in the Population II distance scale and greatly reducing the uncertainty in the estimated ages of the oldest stars in our galaxy. At the present time, the oldest stars are estimated to be 12.8 Gyr old, with an uncertainty of approx. 15%. The SIM observations obtained by this key project, combined with the supporting theoretical research and ground based observations outlined in this proposal, will reduce the estimated uncertainty in the age estimates to 5%.
Automated modal parameter estimation using correlation analysis and bootstrap sampling
NASA Astrophysics Data System (ADS)
Yaghoubi, Vahid; Vakilzadeh, Majid K.; Abrahamsson, Thomas J. S.
2018-02-01
The estimation of modal parameters from a set of noisy measured data is a highly judgmental task, with user expertise playing a significant role in distinguishing between estimated physical and noise modes of a test-piece. Various methods have been developed to automate this procedure. The common approach is to identify models with different orders and cluster similar modes together. However, most proposed methods based on this approach suffer from high-dimensional optimization problems in either the estimation or clustering step. To overcome this problem, this study presents an algorithm for autonomous modal parameter estimation in which the only required optimization is performed in a three-dimensional space. To this end, a subspace-based identification method is employed for the estimation and a non-iterative correlation-based method is used for the clustering. This clustering is at the heart of the paper. The keys to success are correlation metrics that are able to treat the problems of spatial eigenvector aliasing and nonunique eigenvectors of coalescent modes simultaneously. The algorithm commences with the identification of an excessively high-order model from frequency response function test data. The high number of modes of this model provides bases for two subspaces: one for likely physical modes of the tested system and one for its complement, dubbed the subspace of noise modes. By employing the bootstrap resampling technique, several subsets are generated from the same basic dataset and for each of them a model is identified to form a set of models. Then, by correlation analysis with the two aforementioned subspaces, highly correlated modes of these models, which appear repeatedly, are clustered together and the noise modes are collected in a so-called Trashbox cluster. Stray noise modes attracted to the mode clusters are trimmed away in a second step by correlation analysis.
The final step of the algorithm is a fuzzy c-means clustering procedure applied to a three-dimensional feature space to assign a degree of physicalness to each cluster. The proposed algorithm is applied to two case studies: one with synthetic data and one with real test data obtained from a hammer impact test. The results indicate that the algorithm successfully clusters similar modes and gives a reasonable quantification of the extent to which each cluster is physical.
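The final fuzzy c-means step can be sketched with the standard Bezdek membership and center updates. The implementation below is a generic illustration of the technique, not the authors' code, and the 3-D feature points are synthetic:

```python
import numpy as np

# Minimal fuzzy c-means with the standard alternating updates:
# centers from membership-weighted means, memberships from inverse
# squared distances. A generic sketch, not the paper's implementation.

def fuzzy_c_means(X, c=2, m=2.0, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)              # memberships sum to 1
    for _ in range(n_iter):
        W = U ** m                                 # fuzzified memberships
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)                   # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return centers, U

# Two well-separated synthetic blobs in a 3-D feature space:
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 3)), rng.normal(1.0, 0.1, (20, 3))])
centers, U = fuzzy_c_means(X, c=2)
```

Each row of `U` gives the degree of membership of a point in every cluster, which is what allows assigning a graded "degree of physicalness" rather than a hard label.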
Integrated-light spectroscopy of globular clusters at the infrared Ca II lines
NASA Technical Reports Server (NTRS)
Armandroff, Taft E.; Zinn, Robert
1988-01-01
Integrated-light spectroscopy has been obtained for 27 globular clusters at the Ca II IR triplet. Line strengths and radial velocities have been measured from the spectra. For the well-studied clusters in the sample, the strength of the Ca II lines is very well correlated with previous metallicity estimates. Thus, the triplet is useful as a metallicity indicator in globular cluster integrated-light spectra. The greatly reduced effect of interstellar extinction at these wavelengths (compared to the blue region of the spectrum) has permitted observations of some of the most heavily reddened clusters in the Galaxy. For several such clusters, the Ca II triplet metallicities are in poor agreement with metallicity estimates from IR photometry by Malkan (1981). The strength of an interstellar band at 8621 Å has been used to estimate the amount of extinction towards these clusters. Using the new metallicity and radial-velocity data, the metallicity distribution, kinematics, and spatial distribution of the disk globular cluster system have been analyzed. Results very similar to those of Zinn (1985) have been found. The relation of the disk globulars to the stellar thick disk is discussed.
Sampling procedures for inventory of commercial volume tree species in Amazon Forest.
Netto, Sylvio P; Pelissari, Allan L; Cysneiros, Vinicius C; Bonazza, Marcelo; Sanquetta, Carlos R
2017-01-01
The spatial distribution of tropical tree species can affect the consistency of the estimators in commercial forest inventories; therefore, appropriate sampling procedures are required to survey species with different spatial patterns in the Amazon Forest. The present study aims to evaluate the conventional sampling procedures and to introduce adaptive cluster sampling for volumetric inventories of Amazonian tree species, considering the hypotheses that the density, the spatial distribution and the zero-plots affect the consistency of the estimators, and that adaptive cluster sampling allows one to obtain more accurate volumetric estimates. We use data from a census carried out in Jamari National Forest, Brazil, where trees with diameters equal to or greater than 40 cm were measured in 1,355 plots. Species with different spatial patterns were selected and sampled with simple random sampling, systematic sampling, linear cluster sampling and adaptive cluster sampling, whereby the accuracy of the volumetric estimation and the presence of zero-plots were evaluated. The sampling procedures applied to the species were affected by the low density of trees and the large number of zero-plots; the adaptive clusters allowed the sampling effort to be concentrated in plots with trees and thus aggregated more representative samples for estimating the commercial volume.
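The mechanics of adaptive cluster sampling can be sketched as follows: an initial random sample of plots is drawn, and whenever a plot satisfies the condition of interest (here: contains at least one target tree), its neighbours are added and checked recursively. The grid contents and condition below are simulated placeholders, not the Jamari inventory data:

```python
import random
from collections import deque

# Sketch of adaptive cluster sampling on a square grid of plots. Edge
# plots (neighbours that fail the condition) are measured but do not
# trigger further expansion, as in the standard design.

def adaptive_cluster_sample(counts, n_initial, seed=0):
    n_rows, n_cols = len(counts), len(counts[0])
    rng = random.Random(seed)
    all_plots = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    sampled = set(rng.sample(all_plots, n_initial))     # initial SRS
    queue = deque(p for p in sampled if counts[p[0]][p[1]] > 0)
    while queue:
        i, j = queue.popleft()
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < n_rows and 0 <= nj < n_cols and (ni, nj) not in sampled:
                sampled.add((ni, nj))
                if counts[ni][nj] > 0:          # condition met: keep expanding
                    queue.append((ni, nj))
    return sampled

# Simulated clustered species: one patch of trees in an otherwise empty grid.
grid = [[0] * 10 for _ in range(10)]
for i, j in [(4, 4), (4, 5), (5, 4), (5, 5)]:
    grid[i][j] = 3
plots = adaptive_cluster_sample(grid, n_initial=8)
```

When the initial sample hits the patch, the whole network of occupied plots plus its edge units ends up in the sample, which is what concentrates effort where the trees actually are.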
Long-Period Planets in Open Clusters and the Evolution of Planetary Systems
NASA Astrophysics Data System (ADS)
Quinn, Samuel N.; White, Russel; Latham, David W.; Stefanik, Robert
2018-01-01
Recent discoveries of giant planets in open clusters confirm that they do form and migrate in relatively dense stellar groups, though overall occurrence rates are not yet well constrained because the small sample of giant planets discovered thus far predominantly have short periods. Moreover, planet formation rates and the architectures of planetary systems in clusters may vary significantly -- e.g., due to intercluster differences in the chemical properties that regulate the growth of planetary embryos or in the stellar space density and binary populations, which can influence the dynamical evolution of planetary systems. Constraints on the population of long-period Jovian planets -- those representing the reservoir from which many hot Jupiters likely form, and which are most vulnerable to intracluster dynamical interactions -- can help quantify how the birth environment affects formation and evolution, particularly through comparison of populations possessing a range of ages and chemical and dynamical properties. From our ongoing RV survey of open clusters, we present the discovery of several long-period planets and candidate substellar companions in the Praesepe, Coma Berenices, and Hyades open clusters. From these discoveries, we improve estimates of giant planet occurrence rates in clusters, and we note that high eccentricities in several of these systems support the prediction that the birth environment helps shape planetary system architectures.
MASSCLEANage-STELLAR CLUSTER AGES FROM INTEGRATED COLORS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Popescu, Bogdan; Hanson, M. M., E-mail: popescb@mail.uc.ed, E-mail: margaret.hanson@uc.ed
2010-11-20
We present the recently updated and expanded MASSCLEANcolors, a database of 70 million Monte Carlo models selected to match the properties (metallicity, ages, and masses) of stellar clusters found in the Large Magellanic Cloud (LMC). This database shows the rather extreme and non-Gaussian distribution of integrated colors and magnitudes expected with different cluster age and mass and the enormous age degeneracy of integrated colors when mass is unknown. This degeneracy could lead to catastrophic failures in estimating age with standard simple stellar population models, particularly if most of the clusters are of intermediate or low mass, like in the LMC. Utilizing the MASSCLEANcolors database, we have developed MASSCLEANage, a statistical inference package which assigns the most likely age and mass (solved simultaneously) to a cluster based only on its integrated broadband photometric properties. Finally, we use MASSCLEANage to derive the age and mass of LMC clusters based on integrated photometry alone. First, we compare our cluster ages against those obtained for the same seven clusters using more accurate integrated spectroscopy. We find improved agreement with the integrated spectroscopy ages over the original photometric ages. A close examination of our results demonstrates the necessity of solving simultaneously for mass and age to reduce degeneracies in the cluster ages derived via integrated colors. We then selected an additional subset of 30 photometric clusters with previously well-constrained ages and independently derived their ages using MASSCLEANage with the same photometry, with very good agreement. The MASSCLEANage program is freely available under the GNU General Public License.
The First Photometric Analysis of the Open Clusters Dolidze 32 and 36
NASA Astrophysics Data System (ADS)
Amin, M. Y.; Elsanhory, W. H.; Haroon, A. A.
2018-06-01
We present a first study of two open clusters, Dolidze 32 and Dolidze 36, in the near-infrared JHKs region with the aid of the PPMXL catalog. In our study, we used a method able to separate open cluster stars from those that belong to the stellar background. Our calculations indicate that the numbers of probable members of Dolidze 32 and Dolidze 36 are 286 and 780, respectively. We estimate the cluster centers of Dolidze 32 and Dolidze 36 to be α = 18h41m4s.188, δ = -04°04'57''.144 and α = 20h02m29s.95, δ = 42°05'49''.2, respectively. The limiting radii of Dolidze 32 and Dolidze 36 are about 0.94 ± 0.03 pc and 0.81 ± 0.03 pc, respectively. The color-magnitude diagram allows us to estimate the reddening E(B - V) = 1.41 ± 0.03 mag for Dolidze 32 and E(B - V) = 0.19 ± 0.04 mag for Dolidze 36, such that the distance modulus (m - M) is 11.36 ± 0.02 and 10.10 ± 0.03 for the two clusters, respectively. On the other hand, the luminosity and mass functions of these two open clusters have been estimated, showing that the estimated masses are 437 ± 21 M⊙ and 678 ± 26 M⊙, respectively, while the mass function slopes are -2.56 ± 0.62 and -2.01 ± 0.70 for Dolidze 32 and Dolidze 36, respectively. Finally, the dynamical state of these two clusters shows that only Dolidze 36 can be considered a dynamically relaxed cluster.
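The quoted distance moduli translate directly into distances via d = 10^((m - M + 5)/5) pc. A quick sketch of that conversion (ignoring extinction corrections, which the reddening values above would modify):

```python
# Converting a distance modulus (m - M) to a distance in parsecs via
# d = 10^((m - M + 5) / 5). Extinction is ignored in this sketch.

def modulus_to_distance_pc(mu):
    return 10.0 ** ((mu + 5.0) / 5.0)

d32 = modulus_to_distance_pc(11.36)   # Dolidze 32: ~1.9 kpc
d36 = modulus_to_distance_pc(10.10)   # Dolidze 36: ~1.0 kpc
```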
Iterative Track Fitting Using Cluster Classification in Multi Wire Proportional Chamber
NASA Astrophysics Data System (ADS)
Primor, David; Mikenberg, Giora; Etzion, Erez; Messer, Hagit
2007-10-01
This paper addresses the problem of track fitting of a charged particle in a multi wire proportional chamber (MWPC) using cathode readout strips. When a charged particle crosses a MWPC, a positive charge is induced on a cluster of adjacent strips. In the presence of high radiation background, the cluster charge measurements may be contaminated by background particles, leading to less accurate hit position estimation. The least squares method for track fitting assumes the same position error distribution for all hits and thus loses its optimal properties on contaminated data. For this reason, a new robust algorithm is proposed. The algorithm first uses the known spatial charge distribution caused by a single charged particle over the strips, and classifies the clusters into "clean" and "dirty" clusters. Then, using the classification results, it performs an iterative weighted least squares fitting procedure, updating its optimal weights each iteration. The performance of the suggested algorithm is compared to other track fitting techniques using a simulation of tracks with radiation background. It is shown that the algorithm improves the track fitting performance significantly. A practical implementation of the algorithm is presented for muon track fitting in the cathode strip chamber (CSC) of the ATLAS experiment.
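The iterative weighted least squares idea can be sketched as a straight-line track fit in which hits with large residuals are progressively down-weighted between iterations. The Cauchy-style weight update below is an illustrative choice, not the weighting used in the paper:

```python
import numpy as np

# Iteratively reweighted least-squares track fit: each iteration solves a
# weighted straight-line fit, then recomputes the weights from the
# residuals so contaminated ("dirty") hits lose influence.

def irls_line_fit(z, y, n_iter=10, scale=1.0):
    w = np.ones_like(y)
    a = b = 0.0
    for _ in range(n_iter):
        A = np.vstack([z, np.ones_like(z)]).T * w[:, None]   # weighted design
        a, b = np.linalg.lstsq(A, y * w, rcond=None)[0]
        r = y - (a * z + b)
        w = 1.0 / (1.0 + (r / scale) ** 2)    # down-weight large residuals
    return a, b

# Straight track y = 2z + 1 with one background-contaminated hit:
z = np.arange(6, dtype=float)
y = 2.0 * z + 1.0
y[3] += 15.0
slope, intercept = irls_line_fit(z, y)
```

After a few iterations the contaminated hit's weight collapses toward zero and the fit recovers the clean track parameters, which an unweighted least squares fit would not.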
Le Vu, Stéphane; Ratmann, Oliver; Delpech, Valerie; Brown, Alison E; Gill, O Noel; Tostevin, Anna; Fraser, Christophe; Volz, Erik M
2018-06-01
Phylogenetic clustering of HIV sequences from a random sample of patients can reveal epidemiological transmission patterns, but interpretation is hampered by limited theoretical support, and the statistical properties of clustering analysis remain poorly understood. Alternatively, source attribution methods allow fitting of HIV transmission models and thereby quantify aspects of disease transmission. A simulation study was conducted to assess the error rates of clustering methods for detecting transmission risk factors. We modeled HIV epidemics among men having sex with men and generated phylogenies comparable to those that can be obtained from HIV surveillance data in the UK. Clustering and source attribution approaches were applied to evaluate their ability to identify patient attributes as transmission risk factors. We find that commonly used methods show a misleading association between cluster size or odds of clustering and covariates that are correlated with time since infection, regardless of their influence on transmission. Clustering methods usually have higher error rates and lower sensitivity than the source attribution method for identifying transmission risk factors, but neither method provides robust estimates of transmission risk ratios. The source attribution method can alleviate the drawbacks of phylogenetic clustering, but formal population genetic modeling may be required to estimate quantitative transmission risk factors. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
Clustering and Filtering Tandem Mass Spectra Acquired in Data-Independent Mode
NASA Astrophysics Data System (ADS)
Pak, Huisong; Nikitin, Frederic; Gluck, Florent; Lisacek, Frederique; Scherl, Alexander; Muller, Markus
2013-12-01
Data-independent mass spectrometry activates all ion species isolated within a given mass-to-charge (m/z) window regardless of their abundance. This acquisition strategy overcomes the traditional data-dependent ion selection, boosting data reproducibility and sensitivity. However, several tandem mass (MS/MS) spectra of the same precursor ion are acquired during chromatographic elution, resulting in large data redundancy. Also, the significant number of chimeric spectra and the absence of accurate precursor ion masses hamper peptide identification. Here, we describe an algorithm to preprocess data-independent MS/MS spectra by filtering out noise peaks and clustering the spectra according to both the chromatographic elution profiles and the spectral similarity. In addition, we developed an approach to estimate the m/z value of precursor ions from clustered MS/MS spectra in order to improve database search performance. Data acquired using a small precursor mass window of 3 m/z units and multiple injections to cover an m/z range of 400-1400 were processed with our algorithm. It showed an improvement in the number of both peptide and protein identifications by 8% while reducing the number of submitted spectra by 18% and the number of peaks by 55%. We conclude that our clustering method is a valid approach for the analysis of these data-independent fragmentation spectra. The software, including the source code, is available for the scientific community.
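Grouping redundant spectra by spectral similarity commonly relies on a normalized dot product (cosine) between binned spectra, combined with a retention-time constraint. The greedy scheme, bin width, similarity threshold and RT tolerance below are illustrative assumptions, not the published parameters:

```python
import numpy as np

# Sketch: merge MS/MS spectra that are similar (cosine of binned intensity
# vectors) and close in retention time. Thresholds are hypothetical.

def cosine(a, b):
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    return float(a @ b / (na * nb)) if na and nb else 0.0

def cluster_spectra(spectra, rts, sim_thresh=0.8, rt_tol=30.0):
    clusters = []                       # each: (representative, rt, members)
    for idx, (s, rt) in enumerate(zip(spectra, rts)):
        for rep, rep_rt, members in clusters:
            if abs(rt - rep_rt) <= rt_tol and cosine(s, rep) >= sim_thresh:
                members.append(idx)     # join an existing cluster
                break
        else:
            clusters.append((s, rt, [idx]))
    return [members for _, _, members in clusters]

# Three binned spectra: two redundant acquisitions of one precursor eluting
# together, and one spectrum with a different fragment pattern.
s1 = np.array([0.0, 10.0, 0.0, 5.0, 1.0])
s2 = np.array([0.0, 9.0, 0.5, 5.5, 1.0])
s3 = np.array([8.0, 0.0, 7.0, 0.0, 0.0])
groups = cluster_spectra([s1, s2, s3], rts=[100.0, 102.0, 300.0])
```

A consensus spectrum built from each group can then be submitted to the database search, which is how the redundancy reduction reported above is obtained.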
Cruz, Antonio M; Barr, Cameron; Puñales-Pozo, Elsa
2008-01-01
This research's main goals were to build a predictor for a turnaround time (TAT) indicator, to estimate its values and to use a numerical clustering technique to find possible causes of undesirable TAT values. The following stages were used: domain understanding, data characterisation and sample reduction, and insight characterisation. Multiple linear regression and clustering techniques were used to build the TAT indicator predictor and to improve corrective maintenance task efficiency in a clinical engineering department (CED). Multiple linear regression was used to build a predictive TAT value model. The variables contributing to the model were clinical engineering department response time (CE(rt), 0.415 positive coefficient), stock service response time (Stock(rt), 0.734 positive coefficient), priority level (0.21 positive coefficient) and service time (0.06 positive coefficient). The regression process showed heavy reliance on Stock(rt), CE(rt) and priority, in that order. Clustering techniques revealed the main causes of high TAT values. This examination has provided a means for analysing current technical service quality and effectiveness. In doing so, it has demonstrated a process for identifying areas and methods of improvement and a model against which to analyse these methods' effectiveness.
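A multiple linear regression predictor of this form can be fit by ordinary least squares. The synthetic data below is generated from coefficients echoing the reported signs and ordering purely for illustration; it is not the study's maintenance data:

```python
import numpy as np

# Fitting TAT ~ CE_rt + Stock_rt + priority + service_time by ordinary
# least squares on synthetic data. Coefficients and data are illustrative.

rng = np.random.default_rng(0)
n = 200
ce_rt = rng.uniform(1, 10, n)               # CED response time
stock_rt = rng.uniform(1, 20, n)            # stock service response time
priority = rng.integers(1, 4, n).astype(float)
service = rng.uniform(0.5, 5, n)            # service time

# Synthetic TAT using coefficients loosely echoing the reported ones:
tat = (0.415 * ce_rt + 0.734 * stock_rt + 0.21 * priority
       + 0.06 * service + rng.normal(0, 0.1, n))

X = np.column_stack([ce_rt, stock_rt, priority, service, np.ones(n)])
coef, *_ = np.linalg.lstsq(X, tat, rcond=None)   # recovers the coefficients
```

The recovered coefficients reproduce the dominance ordering noted in the abstract (Stock(rt), then CE(rt), then priority) when combined with each variable's range.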
Muscle and eye movement artifact removal prior to EEG source localization.
Hallez, Hans; Vergult, Anneleen; Phlypo, Ronald; Van Hese, Peter; De Clercq, Wim; D'Asseler, Yves; Van de Walle, Rik; Vanrumste, Bart; Van Paesschen, Wim; Van Huffel, Sabine; Lemahieu, Ignace
2006-01-01
Muscle and eye movement artifacts are very prominent in the ictal EEG of patients suffering from epilepsy, making dipole localization of ictal activity very unreliable. Recently, two techniques (BSS-CCA and pSVD) were developed to remove those artifacts. The purpose of this study is to assess whether the removal of muscle and eye movement artifacts improves EEG dipole source localization. We used a total of 8 EEG fragments, each from a different patient, first unfiltered, then filtered by BSS-CCA and pSVD. In both the filtered and unfiltered EEG fragments we estimated multiple dipoles using RAP-MUSIC. The resulting dipoles were subjected to a K-means clustering algorithm to extract the most prominent cluster. We found that the removal of muscle and eye artifacts results in tighter and clearer dipole clusters. Furthermore, we found that the localization of the filtered EEG corresponded with the localization derived from the ictal SPECT in 7 of the 8 patients. We therefore conclude that BSS-CCA and pSVD improve localization of ictal activity, making the localization more reliable for the presurgical evaluation of the patient.
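The dipole-clustering step can be sketched with a tiny K-means (Lloyd's algorithm) on synthetic 3-D dipole positions; the data, the farthest-point initialization, and k = 2 are illustrative assumptions of this sketch, not details from the study:

```python
import numpy as np

def kmeans(points, k=2, iters=20):
    # Farthest-point initialization keeps this small demo deterministic.
    centers = [points[0]]
    while len(centers) < k:
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    centers = np.array(centers)
    for _ in range(iters):  # Lloyd iterations: assign, then re-center
        labels = np.linalg.norm(points[:, None] - centers[None], axis=2).argmin(axis=1)
        centers = np.array([points[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers

# Hypothetical dipole fits: a tight cluster near (10, 20, 30) mm plus outliers.
rng = np.random.default_rng(1)
tight = rng.normal([10.0, 20.0, 30.0], 1.0, (20, 3))
outliers = rng.uniform(-50, -30, (5, 3))
labels, centers = kmeans(np.vstack([tight, outliers]))

prominent = np.bincount(labels).argmax()   # the most populated cluster
print(np.round(centers[prominent]))        # near [10. 20. 30.]
```

Extracting the most populated cluster, as above, is how a prominent dipole location can be read off a noisy set of per-epoch dipole fits.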
NASA Astrophysics Data System (ADS)
Li, Jingying; Bai, Lu; Wu, Zhensen; Guo, Lixin; Gong, Yanjun
2017-11-01
In this paper, the diffusion limited aggregation (DLA) algorithm is improved to generate alumina particle clusters with different monomer radii in the plume. Scattering properties of these alumina clusters are solved by the multiple sphere T-matrix method (MSTM). The effect of the number and radius of monomers on the scattering properties of alumina particle clusters is discussed. The scattering properties of two types of alumina particle clusters are compared: one has monomer radii that follow a lognormal probability distribution, the other has monomers of equal radius set to the mean of that lognormal distribution. The results show that the scattering phase functions and linear polarization degrees of these two types of alumina particle clusters differ greatly. For the alumina clusters with different monomer radii, the forward scattering is stronger and the linear polarization degree has multiple peaks. Moreover, the variation of their scattering properties does not correlate strongly with the number of monomers. For larger booster motors, 25-38% of the plume is condensed alumina. The alumina can scatter radiation from other sources present in the plume and affect the radiation transfer characteristics of the plume. In addition, the shape, size distribution and refractive index of the particles in the plume are estimated from the linear polarization degree. Therefore, accurate calculation of scattering properties is very important to reduce the deviation in related research.
Methods in Computational Cosmology
NASA Astrophysics Data System (ADS)
Vakili, Mohammadjavad
The state of the inhomogeneous universe and its geometry throughout cosmic history can be studied by measuring the clustering of galaxies and the gravitational lensing of distant faint galaxies. Lensing and clustering measurements from large datasets provided by modern galaxy surveys will forever shape our understanding of how the universe expands and how structures grow. Interpretation of these rich datasets requires careful characterization of uncertainties at different stages of data analysis: estimation of the signal, estimation of the signal uncertainties, model predictions, and connecting the model to the signal through probabilistic means. In this thesis, we attempt to address some aspects of these challenges. The first step in cosmological weak lensing analyses is accurate estimation of the distortion of the light profiles of galaxies by large scale structure. These small distortions, known as the cosmic shear signal, are dominated by extra distortions due to telescope optics and atmosphere (in the case of ground-based imaging). This effect is captured by a kernel known as the Point Spread Function (PSF) that needs to be fully estimated and corrected for. We address two challenges ahead of accurate PSF modeling for weak lensing studies. The first challenge is finding the centers of point sources that are used for empirical estimation of the PSF. We show that the approximate methods for centroiding stars in wide surveys are able to optimally saturate the information content that is retrievable from astronomical images in the presence of noise. The next step in weak lensing studies is estimating the shear signal by accurately measuring the shapes of galaxies. Galaxy shape measurement involves modeling the light profile of galaxies convolved with the light profile of the PSF. Detectors of many space-based telescopes such as the Hubble Space Telescope (HST) sample the PSF with low resolution.
Reliable weak lensing analysis of galaxies observed by the HST camera requires knowledge of the PSF at a resolution higher than the pixel resolution of HST. This PSF is called the super-resolution PSF. In particular, we present a forward model of the point sources imaged through filters of the HST WFC3 IR channel. We show that this forward model can accurately estimate the super-resolution PSF. We also introduce a noise model that permits us to robustly analyze the HST WFC3 IR observations of the crowded fields. Then we try to address one of the theoretical uncertainties in modeling of galaxy clustering on small scales. Study of small scale clustering requires assuming a halo model. Clustering of halos has been shown to depend on halo properties beyond mass such as halo concentration, a phenomenon referred to as assembly bias. Standard large-scale structure studies with halo occupation distribution (HOD) assume that halo mass alone is sufficient to characterize the connection between galaxies and halos. However, assembly bias could cause the modeling of galaxy clustering to face systematic effects if the expected number of galaxies in halos is correlated with other halo properties. Using high resolution N-body simulations and the clustering measurements of Sloan Digital Sky Survey (SDSS) DR7 main galaxy sample, we show that modeling of galaxy clustering can slightly improve if we allow the HOD model to depend on halo properties beyond mass. One of the key ingredients in precise parameter inference using galaxy clustering is accurate estimation of the error covariance matrix of clustering measurements. This requires generation of many independent galaxy mock catalogs that accurately describe the statistical distribution of galaxies in a wide range of physical scales. We present a fast and accurate method based on low-resolution N-body simulations and an empirical bias model for generating mock catalogs. 
We use fast particle mesh gravity solvers for generation of the dark matter density field, and we use Markov Chain Monte Carlo (MCMC) to estimate the bias model that connects dark matter to galaxies. We show that this approach enables the fast generation of mock catalogs that recover clustering at percent-level accuracy down to quasi-nonlinear scales. Cosmological datasets are interpreted by specifying likelihood functions that are often assumed to be multivariate Gaussian. Likelihood-free approaches such as Approximate Bayesian Computation (ABC) can bypass this assumption by introducing a generative forward model of the data and a distance metric for quantifying the closeness of the data and the model. We present the first application of ABC in large scale structure for constraining the connections between galaxies and dark matter halos. We present an implementation of ABC equipped with Population Monte Carlo and a generative forward model of the data that incorporates sample variance and systematic uncertainties. (Abstract shortened by ProQuest.)
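The ABC idea can be illustrated with a minimal rejection sampler on a toy problem (inferring a Gaussian mean); the prior, tolerance, and summary statistic below are arbitrary choices for the sketch, far simpler than the Population Monte Carlo scheme used in the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)
observed = rng.normal(2.0, 1.0, 500)    # "data" whose mean we pretend not to know

def simulate(theta):
    # Generative forward model: same noise model, candidate mean theta.
    return rng.normal(theta, 1.0, 500)

def distance(a, b):
    # Summary-statistic distance: difference of sample means.
    return abs(a.mean() - b.mean())

# ABC rejection: keep prior draws whose simulated data lie close to the data.
accepted = []
while len(accepted) < 200:
    theta = rng.uniform(-5, 5)          # broad uniform prior
    if distance(simulate(theta), observed) < 0.1:
        accepted.append(theta)

print(round(float(np.mean(accepted)), 1))  # posterior mean, close to 2.0
```

Population Monte Carlo improves on this by shrinking the tolerance over a sequence of weighted particle populations instead of using one fixed cutoff.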
Cool Core Bias in Sunyaev-Zel’dovich Galaxy Cluster Surveys
Lin, Henry W.; McDonald, Michael; Benson, Bradford; ...
2015-03-18
Sunyaev-Zel'dovich (SZ) surveys find massive clusters of galaxies by measuring the inverse Compton scattering of the cosmic microwave background off of intra-cluster gas. The cluster selection function from such surveys is expected to be nearly independent of redshift and cluster astrophysics. In this work, we estimate the effect on the observed SZ signal of centrally-peaked gas density profiles (cool cores) and radio emission from the brightest cluster galaxy (BCG) by creating mock observations of a sample of clusters that span the observed range of classical cooling rates and radio luminosities. For each cluster, we make simulated SZ observations by the South Pole Telescope and characterize the cluster selection function, but note that our results are broadly applicable to other SZ surveys. We find that the inclusion of a cool core can cause a change in the measured SPT significance of a cluster between 0.01%-10% at z > 0.3, increasing with cuspiness of the cool core and angular size on the sky of the cluster (i.e., decreasing redshift, increasing mass). We provide quantitative estimates of the bias in the SZ signal as a function of a gas density cuspiness parameter, redshift, mass, and the 1.4 GHz radio luminosity of the central AGN. Based on this work, we estimate that, for the Phoenix cluster (one of the strongest cool cores known), the presence of a cool core is biasing the SZ significance high by ~6%. The ubiquity of radio galaxies at the centers of cool core clusters will offset the cool core bias to varying degrees.
Ages of intermediate-age Magellanic Cloud star clusters
NASA Technical Reports Server (NTRS)
Flower, P. J.
1984-01-01
Ages of intermediate-age Large Magellanic Cloud star clusters have been estimated without locating the faint, unevolved portion of cluster main sequences. Six clusters with established color-magnitude diagrams were selected for study: SL 868, NGC 1783, NGC 1868, NGC 2121, NGC 2209, and NGC 2231. Since red giant photometry is more accurate than the necessarily fainter main-sequence photometry, the distributions of red giants on the cluster color-magnitude diagrams were compared to a grid of 33 stellar evolutionary tracks, evolved from the main sequence through core-helium exhaustion, spanning the expected mass and metallicity range for Magellanic Cloud cluster red giants. The time-dependent behavior of the luminosity of the model red giants was used to estimate cluster ages from the observed cluster red giant luminosities. Except for the possibility of SL 868 being an old globular cluster, all clusters studied were found to have ages less than 10^9 yr. It is concluded that there is currently no substantial evidence for a major population of large, populous clusters older than 10^9 yr in the Large Magellanic Cloud.
D-DSC: Decoding Delay-based Distributed Source Coding for Internet of Sensing Things
Aktas, Metin; Kuscu, Murat; Dinc, Ergin; Akan, Ozgur B.
2018-01-01
Spatial correlation between densely deployed sensor nodes in a wireless sensor network (WSN) can be exploited to reduce power consumption through a proper source coding mechanism such as distributed source coding (DSC). In this paper, we propose Decoding Delay-based Distributed Source Coding (D-DSC) to improve the energy efficiency of classical DSC by employing the decoding delay concept, which enables the use of the maximum correlated portion of sensor samples during event estimation. In D-DSC, the network is partitioned into clusters, where the clusterheads communicate their uncompressed samples carrying the side information, and the cluster members send their compressed samples. The sink performs joint decoding of the compressed and uncompressed samples and then reconstructs the event signal using the decoded sensor readings. Based on the observed degree of correlation among sensor samples, the sink dynamically updates and broadcasts the varying compression rates back to the sensor nodes. Simulation results for the performance evaluation reveal that D-DSC can achieve reliable and energy-efficient event communication and estimation for practical signal detection/estimation applications with a massive number of sensors, towards the realization of the Internet of Sensing Things (IoST). PMID:29538405
ERIC Educational Resources Information Center
Starkey, Leighann; Aber, J. Lawrence; Johnston, Brian M.
2014-01-01
Mastering basic numeracy and literacy skills is one of the most fundamental goals of education. However, it is estimated that 250 million primary-school-age children lack basic reading, writing and math skills (UN, 2013). Children living in war and poverty stricken countries are among the least likely to attain those basic goals. The United States…
Pezzoli, Lorenzo; Pineda, Silvia; Halkyer, Percy; Crespo, Gladys; Andrews, Nick; Ronveaux, Olivier
2009-03-01
To estimate the yellow fever (YF) vaccine coverage for the endemic and non-endemic areas of Bolivia and to determine whether selected districts had acceptable levels of coverage (>70%), we conducted two surveys of 600 individuals (25 x 12 clusters) to estimate coverage in the endemic and non-endemic areas. We assessed 11 districts using lot quality assurance sampling (LQAS). The lot (district) sample was 35 individuals, with six as the decision value (alpha error 6% if true coverage is 70%; beta error 6% if true coverage is 90%). To increase feasibility, we divided the lots into five clusters of seven individuals; to investigate the effect of clustering, we calculated alpha and beta by conducting simulations where each cluster's true coverage was sampled from a normal distribution with a mean of 70% or 90% and standard deviations of 5% or 10%. Estimated coverage was 84.3% (95% CI: 78.9-89.7) in endemic areas, 86.8% (82.5-91.0) in non-endemic areas and 86.0% (82.8-89.1) nationally. LQAS showed that four lots had unacceptable coverage levels. In six lots, results were inconsistent with the estimated administrative coverage. The simulations suggested that the effect of clustering the lots is unlikely to have significantly increased the risk of making incorrect accept/reject decisions. Estimated YF coverage was high. Discrepancies between administrative coverage and LQAS results may be due to incorrect population data. Even allowing for clustering in LQAS, the statistical errors would remain low. Catch-up campaigns are recommended in districts with unacceptable coverage.
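The clustered LQAS decision rule (35 individuals per lot sampled as five clusters of seven, decision value six) can be simulated directly. The 10% within-lot coverage spread mirrors one of the scenarios described above; the rest of this sketch (clamping, trial count) is an assumption:

```python
import random

def lot_accepted(mean_coverage, rng, n_clusters=5, per_cluster=7, d=6):
    # Sample five clusters of seven; accept the lot if at most d individuals
    # are found unvaccinated.
    unvaccinated = 0
    for _ in range(n_clusters):
        # Cluster-level coverage drawn around the lot mean (sd 10%), clamped.
        p = min(1.0, max(0.0, rng.gauss(mean_coverage, 0.10)))
        unvaccinated += sum(rng.random() > p for _ in range(per_cluster))
    return unvaccinated <= d

rng = random.Random(0)
trials = 20000
# alpha: chance of accepting a lot whose true coverage is only 70%
alpha = sum(lot_accepted(0.70, rng) for _ in range(trials)) / trials
# beta: chance of rejecting a lot whose true coverage is 90%
beta = 1 - sum(lot_accepted(0.90, rng) for _ in range(trials)) / trials
print(round(alpha, 2), round(beta, 2))
```

Setting the cluster-level standard deviation to zero recovers the plain binomial LQAS design, so the same script can be used to quantify how much clustering inflates the error rates.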
Galway, Lp; Bell, Nathaniel; Sae, Al Shatari; Hagopian, Amy; Burnham, Gilbert; Flaxman, Abraham; Weiss, Wiliam M; Rajaratnam, Julie; Takaro, Tim K
2012-04-27
Mortality estimates can measure and monitor the impacts of conflict on a population, guide humanitarian efforts, and help to better understand the public health impacts of conflict. Vital statistics registration and surveillance systems are rarely functional in conflict settings, posing the challenge of estimating mortality using retrospective population-based surveys. We present a two-stage cluster sampling method for application in population-based mortality surveys. The sampling method utilizes gridded population data and a geographic information system (GIS) to select clusters in the first sampling stage, and Google Earth™ imagery and sampling grids to select households in the second sampling stage. The sampling method was implemented in a household mortality study in Iraq in 2011. Factors affecting feasibility and methodological quality are described. Sampling is a challenge in retrospective population-based mortality studies, and alternatives that improve on the conventional approaches are needed. The sampling strategy presented here was designed to generate a representative sample of the Iraqi population while reducing the potential for bias and considering the context-specific challenges of the study setting. This sampling strategy, or variations on it, is adaptable and should be considered and tested in other conflict settings.
van der Ham, Joris L
2016-05-19
Forensic entomologists can use carrion communities' ecological succession data to estimate the postmortem interval (PMI). Permutation tests of hierarchical cluster analyses of these data provide a conceptual method to estimate part of the PMI, the post-colonization interval (post-CI). This multivariate approach produces a baseline of statistically distinct clusters that reflect changes in the carrion community composition during the decomposition process. Carrion community samples of unknown post-CIs are compared with these baseline clusters to estimate the post-CI. In this short communication, I use data from previously published studies to demonstrate the conceptual feasibility of this multivariate approach. Analyses of these data produce series of significantly distinct clusters, which represent carrion communities during 1- to 20-day periods of the decomposition process. For 33 carrion community samples, collected over an 11-day period, this approach correctly estimated the post-CI within an average range of 3.1 days. © The Authors 2016. Published by Oxford University Press on behalf of Entomological Society of America. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Kalia, Sumeet; Klar, Neil; Donner, Allan
2016-12-30
Cluster randomized trials (CRTs) involve the random assignment of intact social units rather than independent subjects to intervention groups. Time-to-event outcomes are often endpoints in CRTs. Analyses of such data need to account for the correlation among cluster members. The intracluster correlation coefficient (ICC) is used to assess the similarity among binary and continuous outcomes that belong to the same cluster. However, estimating the ICC in CRTs with time-to-event outcomes is a challenge because of the presence of censored observations. The literature suggests that the ICC may be estimated using either censoring indicators or observed event times. A simulation study explores the effect of administrative censoring on estimating the ICC. Results show that ICC estimators derived from censoring indicators or observed event times are negatively biased. Analytic work further supports these results. Observed event times are preferred for estimating the ICC when administrative censoring is infrequent. To our knowledge, the existing literature provides no practical guidance on the estimation of the ICC when a substantial amount of administrative censoring is present. The results from this study corroborate the need for further methodological research on estimating the ICC for correlated time-to-event outcomes. Copyright © 2016 John Wiley & Sons, Ltd.
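A small simulation illustrates the one-way ANOVA ICC estimator and the attenuation that motivates this line of work. The lognormal event-time model, the shared-cluster-effect construction, and the 70% censoring fraction are assumptions of this sketch, not the authors' exact design:

```python
import numpy as np

def anova_icc(x):
    # One-way ANOVA estimator of the intracluster correlation coefficient.
    # x: array of shape (clusters, members per cluster).
    k, m = x.shape
    msb = m * ((x.mean(axis=1) - x.mean()) ** 2).sum() / (k - 1)
    msw = ((x - x.mean(axis=1, keepdims=True)) ** 2).sum() / (k * (m - 1))
    return (msb - msw) / (msb + (m - 1) * msw)

rng = np.random.default_rng(0)
k, m, rho = 500, 10, 0.3
# A shared cluster effect gives the latent log-times an ICC of rho.
latent = rng.normal(0, np.sqrt(rho), (k, 1)) + rng.normal(0, np.sqrt(1 - rho), (k, m))
times = np.exp(latent)                      # correlated event times
cap = np.quantile(times, 0.7)               # administrative censoring time
indicators = (times <= cap).astype(float)   # 1 = event observed, 0 = censored

print(round(anova_icc(latent), 2))      # close to 0.30, the ICC built in
print(round(anova_icc(indicators), 2))  # attenuated, echoing the reported bias
```

Dichotomizing the outcome into censoring indicators discards within-cluster variation in the actual times, which is one intuition for why indicator-based ICC estimates fall short of the underlying correlation.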
Asquith, William H.; Slade, R.M.
1999-01-01
The U.S. Geological Survey, in cooperation with the Texas Department of Transportation, has developed a computer program to estimate peak-streamflow frequency for ungaged sites in natural basins in Texas. Peak-streamflow frequency refers to the peak streamflows for recurrence intervals of 2, 5, 10, 25, 50, and 100 years. Peak-streamflow frequency estimates are needed by planners, managers, and design engineers for flood-plain management; for objective assessment of flood risk; for cost-effective design of roads and bridges; and for the design of culverts, dams, levees, and other flood-control structures. The program estimates peak-streamflow frequency using a site-specific approach and multivariate generalized least-squares linear regression. A site-specific approach differs from a traditional regional regression approach by developing unique equations to estimate peak-streamflow frequency specifically for the ungaged site. The stations included in the regression are selected using an informal cluster analysis that compares the basin characteristics of the ungaged site to the basin characteristics of all the stations in the database. The program provides several choices for selecting the stations. Selecting the stations using cluster analysis ensures that the stations included in the regression have the most pertinent information about the flooding characteristics of the ungaged site and therefore provide the basis for potentially improved peak-streamflow frequency estimation. An evaluation of the site-specific approach in estimating peak-streamflow frequency for gaged sites indicates that it is at least as accurate as a traditional regional regression approach.
A Cluster-Based Dual-Adaptive Topology Control Approach in Wireless Sensor Networks.
Gui, Jinsong; Zhou, Kai; Xiong, Naixue
2016-09-25
Multi-Input Multi-Output (MIMO) can improve wireless network performance. Sensors are usually single-antenna devices due to high hardware complexity and cost, so several sensors are used to form a virtual MIMO array, which is a desirable approach to efficiently take advantage of MIMO gains. Also, in large Wireless Sensor Networks (WSNs), clustering can improve network scalability, making it an effective topology control approach. The existing virtual MIMO-based clustering schemes either do not fully explore the benefits of MIMO or do not adaptively determine the clustering ranges. The clustering mechanism also needs to be further improved to extend the lifetime of the cluster structure. In this paper, we propose an improved clustering scheme for virtual MIMO-based topology construction (ICV-MIMO), which can adaptively determine not only the inter-cluster transmission modes but also the clustering ranges. Through the rational division of cluster head functions and the optimization of the cluster head selection criteria and information exchange process, the ICV-MIMO scheme effectively reduces network energy consumption and improves the lifetime of the cluster structure when compared with the existing typical virtual MIMO-based scheme. Moreover, the message overhead and time complexity remain in the same order of magnitude.
Prediction of Solvent Physical Properties using the Hierarchical Clustering Method
Recently a QSAR (Quantitative Structure Activity Relationship) method, the hierarchical clustering method, was developed to estimate acute toxicity values for large, diverse datasets. This methodology has now been applied to estimate solvent physical properties, including sur...
Hripcsak, George; Knirsch, Charles; Zhou, Li; Wilcox, Adam; Melton, Genevieve B
2007-03-01
Data mining in electronic medical records may facilitate clinical research, but much of the structured data may be miscoded, incomplete, or non-specific. The exploitation of narrative data using natural language processing may help, although nesting, varying granularity, and repetition remain challenges. In a study of community-acquired pneumonia using electronic records, these issues led to poor classification. Limiting queries to accurate, complete records led to vastly reduced, possibly biased samples. We exploited knowledge latent in the electronic records to improve classification. A similarity metric was used to cluster cases. We defined discordance as the degree to which cases within a cluster give different answers for some query that addresses a classification task of interest. Cases with higher discordance are more likely to be incorrectly classified, and can be reviewed manually to adjust the classification, improve the query, or estimate the likely accuracy of the query. In a study of pneumonia, in which the ICD9-CM coding was found to be very poor, the discordance measure was statistically significantly correlated with classification correctness (0.45; 95% CI 0.15-0.62).
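The discordance idea lends itself to a tiny sketch. Here discordance is taken as the fraction of cases in a cluster that disagree with the cluster's majority query answer; the cluster assignments and yes/no answers are hypothetical, and the paper's similarity-based clustering is replaced by fixed groups:

```python
from collections import Counter

def discordance(cluster_answers):
    # Fraction of cases in a cluster that disagree with the cluster's
    # majority answer to the classification query.
    counts = Counter(cluster_answers)
    majority = counts.most_common(1)[0][1]
    return 1 - majority / len(cluster_answers)

# Hypothetical clusters of similar cases with their query answers
# ("pneumonia" yes/no as returned by a structured-data query):
clusters = {
    "cluster_a": ["yes", "yes", "yes", "yes"],      # concordant
    "cluster_b": ["yes", "no", "yes", "no", "no"],  # discordant: review first
}
for name, answers in clusters.items():
    print(name, discordance(answers))  # cluster_a 0.0, cluster_b 0.4
```

Ranking clusters by this score gives a natural triage order for manual review: high-discordance clusters are the ones most likely to harbor misclassified cases.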
Jiang, Shenghang; Park, Seongjin; Challapalli, Sai Divya; Fei, Jingyi; Wang, Yong
2017-01-01
We report a robust nonparametric descriptor, J′(r), for quantifying the density of clustering molecules in single-molecule localization microscopy. J′(r), based on nearest neighbor distribution functions, does not require any parameter as an input for analyzing point patterns. We show that J′(r) displays a valley shape in the presence of clusters of molecules, and the characteristics of the valley reliably report the clustering features in the data. Most importantly, the position of the J′(r) valley (r_J′m) depends exclusively on the density of clustering molecules (ρ_c). Therefore, it is ideal for direct estimation of the clustering density of molecules in single-molecule localization microscopy. As an example, this descriptor was applied to estimate the clustering density of ptsG mRNA in E. coli bacteria. PMID:28636661
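The flavor of such nearest-neighbor descriptors can be sketched with the classic J function of spatial statistics, J(r) = (1 − G(r)) / (1 − F(r)), where G is the nearest-neighbor distance CDF and F the empty-space function. Note this is the standard descriptor, not the authors' modified J′(r), and the pattern sizes and radii below are illustrative:

```python
import numpy as np

def J_function(points, r_values, n_empty=2000, seed=0):
    # J(r) = (1 - G(r)) / (1 - F(r)); values below 1 indicate clustering.
    rng = np.random.default_rng(seed)
    pts = np.asarray(points)
    lo, hi = pts.min(axis=0), pts.max(axis=0)
    # G: nearest-neighbour distances between data points
    d = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)
    np.fill_diagonal(d, np.inf)
    nn = d.min(axis=1)
    # F: distances from random test locations to the nearest data point
    test = rng.uniform(lo, hi, (n_empty, pts.shape[1]))
    es = np.linalg.norm(test[:, None] - pts[None, :], axis=2).min(axis=1)
    G = np.array([(nn <= r).mean() for r in r_values])
    F = np.array([(es <= r).mean() for r in r_values])
    return (1 - G) / np.where(F < 1, 1 - F, np.nan)

# Clustered pattern: tight blobs pull J(r) below 1 at short range.
rng = np.random.default_rng(1)
centers = rng.uniform(0, 10, (5, 2))
clustered = (centers[:, None] + rng.normal(0, 0.1, (5, 40, 2))).reshape(-1, 2)
J = J_function(clustered, [0.05, 0.1, 0.2])
print(np.all(J < 1))  # True for this clustered pattern
```

For a completely random (Poisson) pattern, G and F coincide and J(r) stays near 1, which is what makes the depth and position of the valley informative about clustering.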
Estimating metallicities with isochrone fits to photometric data of open clusters
NASA Astrophysics Data System (ADS)
Monteiro, H.; Oliveira, A. F.; Dias, W. S.; Caetano, T. C.
2014-10-01
The metallicity is a critical parameter that affects the correct determination of a stellar cluster's fundamental characteristics and has important implications for Galactic and stellar evolution research. Fewer than 10% of the 2174 currently catalogued open clusters have their metallicity determined in the literature. In this work we present a method for estimating the metallicity of open clusters via non-subjective isochrone fitting, using the cross-entropy global optimization algorithm applied to UBV photometric data. The free parameters distance, reddening, age, and metallicity are simultaneously determined by the fitting method. The fitting procedure uses weights for the observational data based on the estimation of membership likelihood for each star, which considers the observational magnitude limit, the density profile of stars as a function of radius from the center of the cluster, and the density of stars in multi-dimensional magnitude space. We present results of [Fe/H] for well-studied open clusters based on distinct UBV data sets. The [Fe/H] values obtained in the ten cases for which spectroscopic determinations were available in the literature agree with those determinations, indicating that our method provides a good alternative for estimating [Fe/H] by objective isochrone fitting. Our results show that the typical precision is about 0.1 dex.
Photometric study of open star clusters in II quadrant: Teutsch 1 and Riddle 4
NASA Astrophysics Data System (ADS)
Bisht, D.; Yadav, R. K. S.; Durgapal, A. K.
2016-01-01
We present broad-band UBVI CCD photometry in the region of two open star clusters, Teutsch 1 and Riddle 4, located in the second Galactic quadrant. The optical CCD data for these clusters are obtained for the first time. Radii of the clusters are estimated as 3′.5 for both clusters. Using the (U - B) versus (B - V) two-color diagram we determined the reddening as E(B - V) = 0.40 ± 0.05 mag for Teutsch 1 and 1.10 ± 0.05 mag for Riddle 4. Using 2MASS JHK and optical data, we estimated E(J - K) = 0.24 ± 0.05 mag and E(V - K) = 1.40 ± 0.05 mag for Teutsch 1, and E(J - K) = 0.47 ± 0.06 mag and E(V - K) = 2.80 ± 0.06 mag for Riddle 4. The color-excess ratios indicate a normal interstellar extinction law in the direction of both clusters. We estimated distances of 4.3 ± 0.5 kpc for Teutsch 1 and 2.8 ± 0.2 kpc for Riddle 4 by comparing the color-magnitude diagrams of the clusters with theoretical isochrones. The ages of the clusters have been estimated as 200 ± 20 Myr for Teutsch 1 and 40 ± 10 Myr for Riddle 4 using stellar isochrones of metallicity Z = 0.02. The mass function slopes have been derived as 1.89 ± 0.43 and 1.41 ± 0.70 for Teutsch 1 and Riddle 4, respectively. Our analysis indicates that both clusters are dynamically relaxed. A slight bend of the Galactic disc towards southern latitudes is found in the longitude range l = 130-180°.
Uncertainties in the cluster-cluster correlation function
NASA Astrophysics Data System (ADS)
Ling, E. N.; Frenk, C. S.; Barrow, J. D.
1986-12-01
The bootstrap resampling technique is applied to estimate sampling errors and significance levels of the two-point correlation functions determined for a subset of the CfA redshift survey of galaxies and a redshift sample of 104 Abell clusters. The angular correlation function for a sample of 1664 Abell clusters is also calculated. The standard errors in xi(r) for the Abell data are found to be considerably larger than the quoted 'Poisson errors'. The best estimate for the ratio of the correlation length of Abell clusters (richness class R ≥ 1, distance class D ≤ 4) to that of CfA galaxies is 4.2 (+1.4, -1.0) (68th-percentile error). The enhancement of cluster clustering over galaxy clustering is statistically significant in the presence of resampling errors. The uncertainties found do not include the effects of possible systematic biases in the galaxy and cluster catalogs and could be regarded as lower bounds on the true uncertainty range.
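The bootstrap idea the abstract applies, resampling the data with replacement and reading the sampling error off the spread of the recomputed statistic, can be sketched generically. This is a minimal illustration on invented mock numbers, not the paper's correlation-function pipeline; the data and names are hypothetical.

```python
import random
import statistics

def bootstrap_se(data, statistic, n_resamples=1000, seed=42):
    """Estimate the standard error of `statistic` by bootstrap resampling.

    Each resample draws len(data) points with replacement; the spread of
    the statistic across resamples approximates its sampling error.
    """
    rng = random.Random(seed)
    n = len(data)
    values = []
    for _ in range(n_resamples):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        values.append(statistic(resample))
    return statistics.stdev(values)

# Mock sample of measurements; the bootstrap error of their mean
mock_values = [4.1, 3.8, 5.0, 4.6, 4.9, 3.5, 4.2, 5.3, 4.0, 4.4]
se = bootstrap_se(mock_values, statistics.mean)
```

In the paper's setting, `statistic` would be the correlation-length estimate computed from a resampled cluster catalog, which is how resampling errors on xi(r) exceed naive Poisson errors.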
A hierarchical clustering methodology for the estimation of toxicity.
Martin, Todd M; Harten, Paul; Venkatapathy, Raghuraman; Das, Shashikala; Young, Douglas M
2008-01-01
A quantitative structure-activity relationship (QSAR) methodology based on hierarchical clustering was developed to predict toxicological endpoints. This methodology utilizes Ward's method to divide a training set into a series of structurally similar clusters. The structural similarity is defined in terms of 2-D physicochemical descriptors (such as connectivity and E-state indices). A genetic algorithm-based technique is used to generate statistically valid QSAR models for each cluster (using the pool of descriptors described above). The toxicity for a given query compound is estimated using the weighted average of the predictions from the closest cluster from each step in the hierarchical clustering, assuming that the compound is within the domain of applicability of the cluster. The hierarchical clustering methodology was tested using a Tetrahymena pyriformis acute toxicity data set containing 644 chemicals in the training set and with two prediction sets containing 339 and 110 chemicals. The results from the hierarchical clustering methodology were compared to the results from several different QSAR methodologies.
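Ward's method, the agglomerative step the abstract relies on, merges at each stage the pair of clusters whose union least increases the within-cluster sum of squares. A minimal, self-contained sketch under invented 2-D descriptor points (not the paper's descriptors or code) is:

```python
def ward_cluster(points, n_clusters):
    """Agglomerative clustering with Ward's minimum-variance criterion.

    Starts with each point as its own cluster and repeatedly merges the
    pair whose merge least increases the within-cluster sum of squares:
        delta(A, B) = |A||B| / (|A| + |B|) * ||mean(A) - mean(B)||^2
    """
    clusters = [[p] for p in points]

    def centroid(c):
        dim = len(c[0])
        return [sum(p[d] for p in c) / len(c) for d in range(dim)]

    def merge_cost(a, b):
        ca, cb = centroid(a), centroid(b)
        dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * dist2

    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = merge_cost(clusters[i], clusters[j])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Two well-separated groups of hypothetical 2-D descriptor vectors
pts = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
groups = ward_cluster(pts, 2)
```

In practice one would retain the full merge hierarchy rather than a fixed cut, since the methodology averages predictions over the closest cluster at each level of the hierarchy.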
Calibrating the Planck cluster mass scale with cluster velocity dispersions
NASA Astrophysics Data System (ADS)
Amodeo, S.; Mei, S.; Stanford, S. A.; Bartlett, J. G.; Lawrence, C. L.; Chary, R. R.; Shim, H.; Marleau, F.; Stern, D.
2017-12-01
The potential of galaxy clusters as cosmological probes critically depends on the capability to obtain accurate estimates of their mass. This will be a key measurement for the next generation of cosmological surveys, such as Euclid. The discrepancy between the cosmological parameters determined from anisotropies in the cosmic microwave background and those derived from cluster abundance measurements from the Planck satellite calls for careful evaluation of systematic biases in cluster mass estimates. For this purpose, it is crucial to use independent techniques, like analysis of the thermal emission of the intracluster medium (ICM), observed either in the X-rays or through the Sunyaev-Zeldovich (SZ) effect, dynamics of member galaxies or gravitational lensing. We discuss possible bias in the Planck SZ mass proxy, which is based on X-ray observations. Using optical spectroscopy from the Gemini Multi-Object Spectrograph of 17 Planck-selected clusters, we present new estimates of the cluster mass based on the velocity dispersion of the member galaxies and independently of the ICM properties. We show how the difference between the velocity dispersion of galaxy and dark matter particles in simulations is the primary factor limiting interpretation of dynamical cluster mass measurements at this time, and we give the first observational constraints on the velocity bias.
The impact of baryons on massive galaxy clusters: halo structure and cluster mass estimates
NASA Astrophysics Data System (ADS)
Henson, Monique A.; Barnes, David J.; Kay, Scott T.; McCarthy, Ian G.; Schaye, Joop
2017-03-01
We use the BAHAMAS (BAryons and HAloes of MAssive Systems) and MACSIS (MAssive ClusterS and Intercluster Structures) hydrodynamic simulations to quantify the impact of baryons on the mass distribution and dynamics of massive galaxy clusters, as well as the bias in X-ray and weak lensing mass estimates. These simulations use the subgrid physics models calibrated in the BAHAMAS project, which include feedback from both supernovae and active galactic nuclei. They form a cluster population covering almost two orders of magnitude in mass, with more than 3500 clusters with masses greater than 1014 M⊙ at z = 0. We start by characterizing the clusters in terms of their spin, shape and density profile, before considering the bias in both weak lensing and hydrostatic mass estimates. Whilst including baryonic effects leads to more spherical, centrally concentrated clusters, the median weak lensing mass bias is unaffected by the presence of baryons. In both the dark matter only and hydrodynamic simulations, the weak lensing measurements underestimate cluster masses by ≈10 per cent for clusters with M200 ≤ 1015 M⊙ and this bias tends to zero at higher masses. We also consider the hydrostatic bias when using both the true density and temperature profiles, and those derived from X-ray spectroscopy. When using spectroscopic temperatures and densities, the hydrostatic bias decreases as a function of mass, leading to a bias of ≈40 per cent for clusters with M500 ≥ 1015 M⊙. This is due to the presence of cooler gas in the cluster outskirts. Using mass weighted temperatures and the true density profile reduces this bias to 5-15 per cent.
Grieve, Richard; Nixon, Richard; Thompson, Simon G
2010-01-01
Cost-effectiveness analyses (CEA) may be undertaken alongside cluster randomized trials (CRTs) where randomization is at the level of the cluster (for example, the hospital or primary care provider) rather than the individual. Costs (and outcomes) within clusters may be correlated so that the assumption made by standard bivariate regression models, that observations are independent, is incorrect. This study develops a flexible modeling framework to acknowledge the clustering in CEA that use CRTs. The authors extend previous Bayesian bivariate models for CEA of multicenter trials to recognize the specific form of clustering in CRTs. They develop new Bayesian hierarchical models (BHMs) that allow mean costs and outcomes, and also variances, to differ across clusters. They illustrate how each model can be applied using data from a large (1732 cases, 70 primary care providers) CRT evaluating alternative interventions for reducing postnatal depression. The analyses compare cost-effectiveness estimates from BHMs with standard bivariate regression models that ignore the data hierarchy. The BHMs show high levels of cost heterogeneity across clusters (intracluster correlation coefficient, 0.17). Compared with standard regression models, the BHMs yield substantially increased uncertainty surrounding the cost-effectiveness estimates, and altered point estimates. The authors conclude that ignoring clustering can lead to incorrect inferences. The BHMs that they present offer a flexible modeling framework that can be applied more generally to CEA that use CRTs.
Baxter, E. J.; Keisler, R.; Dodelson, S.; ...
2015-06-22
Clusters of galaxies are expected to gravitationally lens the cosmic microwave background (CMB) and thereby generate a distinct signal in the CMB on arcminute scales. Measurements of this effect can be used to constrain the masses of galaxy clusters with CMB data alone. Here we present a measurement of lensing of the CMB by galaxy clusters using data from the South Pole Telescope (SPT). We also develop a maximum likelihood approach to extract the CMB cluster lensing signal and validate the method on mock data. We quantify the effects on our analysis of several potential sources of systematic error and find that they generally act to reduce the best-fit cluster mass. It is estimated that this bias to lower cluster mass is roughly 0.85σ in units of the statistical error bar, although this estimate should be viewed as an upper limit. Furthermore, we apply our maximum likelihood technique to 513 clusters selected via their Sunyaev-Zel'dovich (SZ) signatures in SPT data, and rule out the null hypothesis of no lensing at 3.1σ. The lensing-derived mass estimate for the full cluster sample is consistent with that inferred from the SZ flux: M200,lens = 0.83 +0.38 -0.37 M200,SZ (68% C.L., statistical error only).
Constraints from thermal Sunyaev-Zel'dovich cluster counts and power spectrum combined with CMB
NASA Astrophysics Data System (ADS)
Salvati, Laura; Douspis, Marian; Aghanim, Nabila
2018-06-01
The thermal Sunyaev-Zel'dovich (tSZ) effect is one of the recent probes of cosmology and large-scale structures. We update constraints on cosmological parameters from galaxy clusters observed by the Planck satellite, in a first attempt to combine cluster number counts and the power spectrum of hot gas while using a new value of the optical depth and simultaneously sampling on cosmological and scaling-relation parameters. We find that in the ΛCDM model, the addition of the tSZ power spectrum provides small improvements with respect to number counts alone, leading to the 68% c.l. constraints Ωm = 0.32 ± 0.02, σ8 = 0.76 ± 0.03, and σ8(Ωm/0.3)1/3 = 0.78 ± 0.03, and lowering the discrepancy with results for cosmic microwave background (CMB) primary anisotropies (updated with the new value of τ) to ≃1.8σ on σ8. We analysed extensions to the standard model, considering the effect of massive neutrinos and varying the equation of state parameter for dark energy. In the first case, we find that the addition of the tSZ power spectrum helps in improving cosmological constraints with respect to number counts alone, leading to the 95% upper limit ∑ mν < 1.88 eV. For the varying dark energy equation of state scenario, we find no important improvement when adding the tSZ power spectrum, but the combination of tSZ probes is still able to provide constraints, producing w = -1.0 ± 0.2. In all cosmological scenarios, the mass bias needed to reconcile CMB and tSZ probes remains low at (1 - b) ≲ 0.67 as compared to estimates from weak lensing and X-ray mass comparisons or numerical simulations.
Minetti, Andrea; Riera-Montes, Margarita; Nackers, Fabienne; Roederer, Thomas; Koudika, Marie Hortense; Sekkenes, Johanne; Taconet, Aurore; Fermon, Florence; Touré, Albouhary; Grais, Rebecca F; Checchi, Francesco
2012-10-12
Estimation of vaccination coverage (VC) at the local level is essential to identify communities that may require additional support. Cluster surveys can be used in resource-poor settings when population figures are inaccurate. To be feasible, cluster samples need to be small without losing robustness of results. The clustered LQAS (CLQAS) approach has been proposed as an alternative, as smaller sample sizes are required. We explored (i) the efficiency of cluster surveys of decreasing sample size through bootstrapping analysis and (ii) the performance of CLQAS under three alternative sampling plans to classify local VC, using data from a survey carried out in Mali after mass vaccination against meningococcal meningitis group A. VC estimates provided by a 10 × 15 cluster survey design were reasonably robust. We used them to classify health areas in three categories and guide mop-up activities: (i) health areas not requiring supplemental activities; (ii) health areas requiring additional vaccination; (iii) health areas requiring further evaluation. As sample size decreased (from 10 × 15 to 10 × 3), standard errors of the VC and ICC estimates became increasingly unstable. Results of CLQAS simulations were not accurate for most health areas, with an overall risk of misclassification greater than 0.25 in one health area out of three, and greater than 0.50 in one health area out of two under two of the three sampling plans. Small-sample cluster surveys (10 × 15) are acceptably robust for classification of VC at the local level. We do not recommend the CLQAS method as currently formulated for evaluating vaccination programmes.
NASA Astrophysics Data System (ADS)
Armitage, Thomas J.; Barnes, David J.; Kay, Scott T.; Bahé, Yannick M.; Dalla Vecchia, Claudio; Crain, Robert A.; Theuns, Tom
2018-03-01
We use the Cluster-EAGLE simulations to explore the velocity bias introduced when using galaxies, rather than dark matter particles, to estimate the velocity dispersion of a galaxy cluster, a property known to be tightly correlated with cluster mass. The simulations consist of 30 clusters spanning a mass range 14.0 ≤ log10(M200 c/M⊙) ≤ 15.4, with their sophisticated subgrid physics modelling and high numerical resolution (subkpc gravitational softening), making them ideal for this purpose. We find that selecting galaxies by their total mass results in a velocity dispersion that is 5-10 per cent higher than the dark matter particles. However, selecting galaxies by their stellar mass results in an almost unbiased (<5 per cent) estimator of the velocity dispersion. This result holds out to z = 1.5 and is relatively insensitive to the choice of cluster aperture, varying by less than 5 per cent between r500 c and r200 m. We show that the velocity bias is a function of the time spent by a galaxy inside the cluster environment. Selecting galaxies by their total mass results in a larger bias because a larger fraction of objects have only recently entered the cluster and these have a velocity bias above unity. Galaxies that entered more than 4 Gyr ago become progressively colder with time, as expected from dynamical friction. We conclude that velocity bias should not be a major issue when estimating cluster masses from kinematic methods.
NASA Astrophysics Data System (ADS)
Clark, D. M.; Eikenberry, S. S.; Brandl, B. R.; Wilson, J. C.; Carson, J. C.; Henderson, C. P.; Hayward, T. L.; Barry, D. J.; Ptak, A. F.; Colbert, E. J. M.
2008-05-01
We use the previously identified 15 infrared star cluster counterparts to X-ray point sources in the interacting galaxies NGC 4038/4039 (the Antennae) to study the relationship between total cluster mass and X-ray binary number. This significant population of X-ray/IR associations allows us to perform, for the first time, a statistical study of X-ray point sources and their environments. We define a quantity, η, relating the fraction of X-ray sources per unit mass as a function of cluster mass in the Antennae. We compute cluster mass by fitting spectral evolutionary models to Ks luminosity. Considering that this method depends on cluster age, we use four different age distributions to explore the effects of cluster age on the value of η and find it varies by less than a factor of 4. We find a mean value of η for these different distributions of η = 1.7 × 10^-8 M⊙^-1 with ση = 1.2 × 10^-8 M⊙^-1. Performing a χ² test, we demonstrate that η could exhibit a positive slope, but that this depends on the assumed distribution of cluster ages. While the estimated uncertainties in η are factors of a few, we believe this is the first estimate made of this quantity to "order of magnitude" accuracy. We also compare our findings to theoretical models of open and globular cluster evolution, incorporating the X-ray binary fraction per cluster.
Towards a comprehensive knowledge of the open cluster Haffner 9
NASA Astrophysics Data System (ADS)
Piatti, Andrés E.
2017-03-01
We turn our attention to Haffner 9, a Milky Way open cluster whose previous fundamental parameter estimates are far from being in agreement. In order to provide accurate estimates, we present high-quality Washington CT1 and Johnson BVI photometry of the cluster field. We took particular care in statistically cleaning the colour-magnitude diagrams (CMDs) of field star contamination, which was found to be a common source of the discordant fundamental parameter estimates in previous works. The resulting cluster CMD fiducial features were confirmed by a proper motion membership analysis. Haffner 9 is a moderately young object (age ∼350 Myr), placed in the Perseus arm at a heliocentric distance of ∼3.2 kpc, with a lower limit for its present mass of ∼160 M⊙ and a nearly solar metal content. The combination of the cluster's structural and fundamental parameters suggests that it is in an advanced stage of internal dynamical evolution, possibly in the phase typical of clusters with mass segregation in their core regions. However, the cluster still keeps its mass function close to that of Salpeter's law.
Anders, Katherine L; Cutcher, Zoe; Kleinschmidt, Immo; Donnelly, Christl A; Ferguson, Neil M; Indriani, Citra; O'Neill, Scott L; Jewell, Nicholas P; Simmons, Cameron P
2018-05-07
Cluster randomized trials are the gold standard for assessing efficacy of community-level interventions, such as vector control strategies against dengue. We describe a novel cluster randomized trial methodology with a test-negative design, which offers advantages over traditional approaches. It utilizes outcome-based sampling of patients presenting with a syndrome consistent with the disease of interest, who are subsequently classified as test-positive cases or test-negative controls on the basis of diagnostic testing. We use simulations of a cluster trial to demonstrate validity of efficacy estimates under the test-negative approach. This demonstrates that, provided study arms are balanced for both test-negative and test-positive illness at baseline and that other test-negative design assumptions are met, the efficacy estimates closely match true efficacy. We also briefly discuss analytical considerations for an odds ratio-based effect estimate arising from clustered data, and outline potential approaches to analysis. We conclude that application of the test-negative design to certain cluster randomized trials could increase their efficiency and ease of implementation.
Sample size determination for GEE analyses of stepped wedge cluster randomized trials.
Li, Fan; Turner, Elizabeth L; Preisser, John S
2018-06-19
In stepped wedge cluster randomized trials, intact clusters of individuals switch from control to intervention from a randomly assigned period onwards. Such trials are becoming increasingly popular in health services research. When a closed cohort is recruited from each cluster for longitudinal follow-up, proper sample size calculation should account for three distinct types of intraclass correlations: the within-period, the inter-period, and the within-individual correlations. Setting the latter two correlation parameters to be equal accommodates cross-sectional designs. We propose sample size procedures for continuous and binary responses within the framework of generalized estimating equations that employ a block exchangeable within-cluster correlation structure defined from the distinct correlation types. For continuous responses, we show that the intraclass correlations affect power only through two eigenvalues of the correlation matrix. We demonstrate that analytical power agrees well with simulated power for as few as eight clusters, when data are analyzed using bias-corrected estimating equations for the correlation parameters concurrently with a bias-corrected sandwich variance estimator. © 2018, The International Biometric Society.
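The block-exchangeable within-cluster correlation structure the abstract describes can be written out explicitly. The sketch below builds that matrix from the three intraclass correlations and checks one eigenvalue in closed form: because every row has the same sum, the all-ones vector is an eigenvector with eigenvalue 1 + (T-1)α2 + (m-1)α0 + (m-1)(T-1)α1. This is an illustrative construction under assumed parameter values, not the authors' sample-size software.

```python
def block_exchangeable(m, T, a0, a1, a2):
    """Correlation matrix for m individuals observed over T periods.

    Entry conventions (one row/column per individual-period pair):
      1  on the diagonal,
      a2 same individual, different periods (within-individual),
      a0 different individuals, same period (within-period),
      a1 different individuals, different periods (inter-period).
    """
    n = m * T
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            ind_i, per_i = divmod(i, T)
            ind_j, per_j = divmod(j, T)
            if i == j:
                R[i][j] = 1.0
            elif ind_i == ind_j:
                R[i][j] = a2
            elif per_i == per_j:
                R[i][j] = a0
            else:
                R[i][j] = a1
    return R

# Assumed example values: 5 cohort members, 4 periods
m, T, a0, a1, a2 = 5, 4, 0.05, 0.02, 0.40
R = block_exchangeable(m, T, a0, a1, a2)
lam = sum(R[0])  # eigenvalue for the all-ones eigenvector (constant row sums)
```

Setting a1 equal to a2 recovers the cross-sectional special case mentioned in the abstract.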
Cluster membership probability: polarimetric approach
NASA Astrophysics Data System (ADS)
Medhi, Biman J.; Tamura, Motohide
2013-04-01
Interstellar polarimetric data of the six open clusters Hogg 15, NGC 6611, NGC 5606, NGC 6231, NGC 5749 and NGC 6250 have been used to estimate the membership probability for the stars within them. For proper-motion member stars, the membership probability estimated using the polarimetric data is in good agreement with the proper-motion cluster membership probability. However, for proper-motion non-member stars, the membership probability estimated by the polarimetric method is in total disagreement with the proper-motion cluster membership probability. The inconsistencies in the determined memberships may be because of the fundamental differences between the two methods of determination: one is based on stellar proper motion in space and the other is based on selective extinction of the stellar output by the asymmetric aligned dust grains present in the interstellar medium. The results and analysis suggest that the scatter of the Stokes vectors q (per cent) and u (per cent) for the proper-motion member stars depends on the interstellar and intracluster differential reddening in the open cluster. It is found that this method could be used to estimate the cluster membership probability if we have additional polarimetric and photometric information for a star to identify it as a probable member/non-member of a particular cluster, such as the maximum wavelength value (λmax), the unit weight error of the fit (σ1), the dispersion in the polarimetric position angles (ε̄), reddening (E(B - V)) or the differential intracluster reddening (ΔE(B - V)). This method could also be used to estimate the membership probability of known member stars having no membership probability as well as to resolve disagreements about membership among different proper-motion surveys.
NASA Astrophysics Data System (ADS)
Molnar, S. M.; Broadhurst, T.
2017-05-01
The colliding cluster CIZA J2242.8+5301 displays a spectacular, almost 2 Mpc long shock front with a radio-based Mach number M ≃ 5 that is puzzlingly large compared to the X-ray estimate of M ≃ 2.5. The extent to which the X-ray temperature jump is diluted by cooler unshocked gas projected through the cluster currently lacks quantification. Here we apply our self-consistent N-body/hydrodynamical code (based on FLASH) to model this binary cluster encounter. We can account for the location of the shock front and also the elongated X-ray emission by tidal stretching of the gas and dark matter between the two cluster centers. The required total mass is 8.9 × 10^14 M⊙ with a 1.3:1 mass ratio favoring the southern cluster component. The relative velocity we derive is initially ≃2500 km s^-1 between the two main cluster components, with an impact parameter of 120 kpc. This solution implies that the shock temperature jump derived from the low angular resolution X-ray satellite Suzaku is underestimated by a factor of two, due to cool gas in projection, bringing the observed X-ray and radio estimates into agreement. Finally, we use our model to generate Compton-y maps to estimate the thermal Sunyaev-Zel'dovich (SZ) effect. At 30 GHz, this amounts to ΔS_n = -0.072 mJy/arcmin² and ΔS_s = -0.075 mJy/arcmin² at the locations of the northern and southern shock fronts, respectively. Our model estimate agrees with previous empirical estimates that the measured radio spectra of the radio relics can be significantly affected by the SZ effect, with implications for charged-particle acceleration models.
Object tracking algorithm based on the color histogram probability distribution
NASA Astrophysics Data System (ADS)
Li, Ning; Lu, Tongwei; Zhang, Yanduo
2018-04-01
To address tracking failures caused by target occlusion, interference from background objects similar to the target, and changes in light intensity, this paper uses the HSV and YCbCr color channels to correct the updated center of the target and continuously adapts the image threshold for self-adaptive target detection. Clustering the initial obstacles into a rough range shortens the threshold range and maximizes target detection. To improve the accuracy of the detector, a Kalman filter is added to estimate the target's state area. A direction predictor based on a Markov model is also added to realize target state estimation under background color interference and to enhance the detector's ability to distinguish similar objects. The experimental results show that the improved algorithm is more accurate and processes frames faster.
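The Kalman filtering step the abstract adds can be sketched in one dimension. The following is a minimal constant-velocity predict/update loop over a single position coordinate, with assumed noise parameters; it is an illustration of the technique, not the paper's tracker, and all values are hypothetical.

```python
def kalman_track(measurements, dt=1.0, q=0.01, r=0.25):
    """Track 1-D position with a constant-velocity Kalman filter.

    State is (position, velocity); q is the process noise variance added to
    the covariance diagonal and r is the measurement noise variance.
    Returns the filtered position estimates.
    """
    x = [measurements[0], 0.0]                 # initial state
    P = [[1.0, 0.0], [0.0, 1.0]]               # initial covariance
    estimates = []
    for z in measurements:
        # Predict: x <- F x, P <- F P F^T + Q, with F = [[1, dt], [0, 1]]
        x = [x[0] + dt * x[1], x[1]]
        P = [[P[0][0] + dt * (P[1][0] + P[0][1]) + dt * dt * P[1][1] + q,
              P[0][1] + dt * P[1][1]],
             [P[1][0] + dt * P[1][1],
              P[1][1] + q]]
        # Update with measurement z of position (H = [1, 0])
        S = P[0][0] + r
        K = [P[0][0] / S, P[1][0] / S]
        y = z - x[0]
        x = [x[0] + K[0] * y, x[1] + K[1] * y]
        P = [[(1 - K[0]) * P[0][0], (1 - K[0]) * P[0][1]],
             [P[1][0] - K[1] * P[0][0], P[1][1] - K[1] * P[0][1]]]
        estimates.append(x[0])
    return estimates

# Noisy measurements of a target moving roughly one unit per frame
zs = [0.1, 0.9, 2.2, 2.8, 4.1, 5.0, 5.9, 7.1]
est = kalman_track(zs)
```

In the 2-D tracking setting, the same predict/update structure would run over both image coordinates, with the color-histogram detector supplying the measurements.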
Line tension of a two dimensional gas-liquid interface.
Santra, Mantu; Bagchi, Biman
2009-08-28
In two-dimensional (2D) gas-liquid systems, the reported simulation values of line tension are known to disagree with the existing theoretical estimates. We find that while the simulations erred in truncating the range of the interaction potential, and as a result grossly underestimated the actual value, the earlier theoretical calculation was also limited by several approximations. When both the simulation and the theory are improved, the two estimates of line tension are in better agreement with each other. The small value of the line tension suggests an increased influence of noncircular clusters in 2D gas-liquid nucleation, as indeed observed in a recent simulation.
Wright, John; Bibby, John; Eastham, Joe; Harrison, Stephen; McGeorge, Maureen; Patterson, Chris; Price, Nick; Russell, Daphne; Russell, Ian; Small, Neil; Walsh, Matt; Young, John
2007-02-01
To evaluate the clinical and cost effectiveness of implementing evidence-based guidelines for the prevention of stroke, we conducted a cluster-randomised trial in three primary care organisations in the North of England covering a population of 400,000, comprising seventy-six primary care teams in four clusters: North, South & West, City I and City II. Guidelines for the management of patients with atrial fibrillation and transient ischaemic attack (TIA) were developed and implemented using a multifaceted approach including evidence-based recommendations, audit and feedback, interactive educational sessions, patient prompts and outreach visits. The main outcome measures were identification and appropriate treatment of patients with atrial fibrillation or TIA, and cost effectiveness. Implementation led to a 36% increase (95% CI 4% to 78%) in diagnosis of atrial fibrillation, and improved treatment of TIA (odds ratio of complying with guidelines 1.8; 95% CI 1.1 to 2.8). Combined analysis of atrial fibrillation and TIA estimated that compliance was significantly greater (OR 1.46; 95% CI 1.10 to 1.94) in the condition for which practices had received the implementation programme. The development and implementation of guidelines cost less than 1500 pounds per practice. The estimated costs per quality-adjusted life year gained by patients with atrial fibrillation or TIA were both less than 2000 pounds, very much less than the usual criterion for cost effectiveness. Implementation of evidence-based guidelines improved the quality of primary care for atrial fibrillation and TIA. The intervention was feasible and very cost effective. Key components of the model include contextual analysis, strong professional support, clear recommendations based on robust evidence, simplicity of adoption, good communication and use of established networks and opinion leaders.
NASA Technical Reports Server (NTRS)
Kalton, G.
1983-01-01
A number of surveys were conducted to study the relationship between the level of aircraft or traffic noise exposure experienced by people living in a particular area and their annoyance with it. These surveys generally employ a clustered sample design, which affects the precision of the survey estimates. Regression analysis of annoyance on noise measures and other variables is often an important component of the survey analysis. Formulae are presented for estimating the standard errors of regression coefficients and ratios of regression coefficients that are applicable with a two- or three-stage clustered sample design. Using a simple cost function, the formulae also determine the optimum allocation of the sample across the stages of the sample design for the estimation of a regression coefficient.
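The precision loss from clustering that motivates these formulae is commonly summarized by the design effect, DEFF = 1 + (b - 1)ρ, where b is the cluster size and ρ the intraclass correlation of the response. The sketch below is a generic illustration of that variance inflation under assumed values, not a reproduction of the report's regression-coefficient formulae.

```python
def design_effect(cluster_size, icc):
    """Design effect for a cluster sample with equal-sized clusters.

    DEFF = 1 + (b - 1) * rho inflates the variance of an estimate relative
    to simple random sampling; b is the cluster size and rho the intraclass
    correlation of the response (e.g., annoyance scores within an area).
    """
    return 1.0 + (cluster_size - 1) * icc

def effective_sample_size(n_total, cluster_size, icc):
    """Sample size a simple random sample would need for equal precision."""
    return n_total / design_effect(cluster_size, icc)

# Assumed example: 50 areas of 20 respondents, modest within-area correlation
deff = design_effect(20, 0.05)
n_eff = effective_sample_size(1000, 20, 0.05)
```

Even a small intraclass correlation inflates variance substantially at realistic cluster sizes, which is why naive (unclustered) standard errors for regression coefficients are too small in such surveys.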
Large and Small Magellanic Clouds age-metallicity relationships
NASA Astrophysics Data System (ADS)
Perren, G. I.; Piatti, A. E.; Vázquez, R. A.
2017-10-01
We present a new determination of the age-metallicity relation for both Magellanic Clouds, estimated through the homogeneous analysis of 239 observed star clusters. All clusters in our set were observed with the filters of the Washington photometric system. The Automated Stellar cluster Analysis package (ASteCA) was employed to derive each cluster's fundamental parameters, in particular its age and metallicity, through an unassisted process. We find that our age-metallicity relations (AMRs) cannot be fully matched to any of the estimations found in twelve previous works, and are better explained by a combination of several of them in different age intervals.
Consensus-Based Sorting of Neuronal Spike Waveforms
Fournier, Julien; Mueller, Christian M.; Shein-Idelson, Mark; Hemberger, Mike; Laurent, Gilles
2016-01-01
Optimizing spike-sorting algorithms is difficult because sorted clusters can rarely be checked against independently obtained “ground truth” data. In most spike-sorting algorithms in use today, the optimality of a clustering solution is assessed relative to some assumption on the distribution of the spike shapes associated with a particular single unit (e.g., Gaussianity) and by visual inspection of the clustering solution followed by manual validation. When the spatiotemporal waveforms of spikes from different cells overlap, the decision as to whether two spikes should be assigned to the same source can be quite subjective, if it is not based on reliable quantitative measures. We propose a new approach, whereby spike clusters are identified from the most consensual partition across an ensemble of clustering solutions. Using the variability of the clustering solutions across successive iterations of the same clustering algorithm (template matching based on K-means clusters), we estimate the probability of spikes being clustered together and identify groups of spikes that are not statistically distinguishable from one another. Thus, we identify spikes that are most likely to be clustered together and therefore correspond to consistent spike clusters. This method has the potential advantage that it does not rely on any model of the spike shapes. It also provides estimates of the proportion of misclassified spikes for each of the identified clusters. We tested our algorithm on several datasets for which there exists a ground truth (simultaneous intracellular data), and show that it performs close to the optimum reached by a support vector machine trained on the ground truth. We also show that the estimated rate of misclassification matches the proportion of misclassified spikes measured from the ground truth data. PMID:27536990
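The consensus step described here, estimating for each pair of spikes the probability of being clustered together across repeated runs and keeping the stable groups, can be sketched with a co-association matrix. The run labels below are invented toy data (not the paper's recordings), and the threshold value is an assumption.

```python
def coassociation(labelings):
    """Fraction of runs in which each pair of items shares a cluster.

    `labelings` is a list of runs; each run assigns a cluster label to
    every item. Returns a symmetric matrix of co-assignment frequencies.
    """
    n = len(labelings[0])
    C = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    C[i][j] += 1.0
    runs = len(labelings)
    return [[c / runs for c in row] for row in C]

def consensus_groups(C, threshold=0.9):
    """Consensus clusters: connected components of the thresholded
    co-association matrix (items co-assigned more often than `threshold`)."""
    n = len(C)
    group = [-1] * n
    g = 0
    for i in range(n):
        if group[i] >= 0:
            continue
        stack, group[i] = [i], g
        while stack:
            u = stack.pop()
            for v in range(n):
                if group[v] < 0 and C[u][v] > threshold:
                    group[v] = g
                    stack.append(v)
        g += 1
    return group

# Three clustering runs over five spikes: spikes 0-2 always co-cluster,
# spikes 3-4 always co-cluster; label values are arbitrary per run.
runs = [[0, 0, 0, 1, 1], [1, 1, 1, 0, 0], [2, 2, 2, 7, 7]]
groups = consensus_groups(coassociation(runs), threshold=0.9)
```

Off-threshold co-association frequencies (pairs grouped in only some runs) are also what permits estimating a per-cluster misclassification proportion, as the abstract describes.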
NASA Astrophysics Data System (ADS)
Richardson, Jacob; Connor, Charles; Malservisi, Rocco; Bleacher, Jacob; Connor, Laura
2014-05-01
Clusters of tens to thousands of small volcanoes (diameters generally <30 km) are common features on the surface of Mars, Venus, and the Earth. These clusters may be described as distributed-style volcanism. Better characterizing the magmatic plumbing system of these clusters can constrain magma ascent processes as well as the regional magma production budget and heat flux beneath each cluster. Unfortunately, directly observing the plumbing systems of volcano clusters on Mars and Venus eludes our current geologic abilities. Because erosion exposes such systems at the Earth's surface, a better understanding of magmatic processes and migration can be achieved via field analysis. The terrestrial plumbing system of an eroded volcanic field may be a valuable planetary analog for Venus and Mars clusters. The magmatic plumbing system of a Pliocene-aged monogenetic volcanic field, emplaced at 0.8 km depth, is currently exposed as a sill and dike swarm in the San Rafael Desert of Central Utah, USA. The mafic bodies in this region intruded into Mesozoic sedimentary units and now make up the most erosion-resistant units as sills, dikes, and plug-like conduits. Light Detection and Ranging (LiDAR) can identify volcanic units (sills, dikes, and conduits) at high resolution, both geomorphologically and with near-infrared return intensity values. Two Terrestrial LiDAR Surveys and an Airborne LiDAR Survey have been carried out over the San Rafael volcanic swarm, producing a three-dimensional point cloud over approximately 36 sq. km. From the point clouds of these surveys, 1-meter DEMs are produced and volcanic intrusions have been mapped. Here we present reconstructions of the volcanic intrusions of the San Rafael Swarm. We create this reconstruction by extrapolating mapped intrusions from the LiDAR surveys into a 3D space around the current surface.
We compare the estimated intrusive volume to the estimated conduit density and estimates of extrusive volume at volcano clusters of similar density. The extrapolated reconstruction and conduit mapping provide a first-order estimate of the final intrusive/extrusive volume ratio for the now eroded volcanic field. Earth, Venus and Mars clusters are compared using Kernel Density Estimation (KDE), which objectively compares cluster area, complexity, and vent density per sq. km. We show that Martian clusters are less dense than Venus clusters, which in turn are less dense than those on Earth. KDE and previous models of intrusive morphology for Mars and Venus are here used to calibrate the San Rafael plumbing system model to clusters on the two planets. The results from the calibrated Mars and Venus plumbing system models can be compared to previous estimates of magma budget and intrusive/extrusive ratios on Venus and Mars.
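The KDE comparison of vent densities can be sketched with a minimal Gaussian kernel density estimator. The coordinates and bandwidth below are invented for illustration; the study's actual KDE operates on mapped vent catalogues for each planet:

```python
import math

def gaussian_kde_2d(vents, x, y, bandwidth=1.0):
    """Gaussian kernel density estimate (vents per unit area) at (x, y)."""
    norm = 1.0 / (2 * math.pi * bandwidth ** 2 * len(vents))
    return norm * sum(
        math.exp(-((x - vx) ** 2 + (y - vy) ** 2) / (2 * bandwidth ** 2))
        for vx, vy in vents
    )

# Hypothetical vent coordinates (km): a tight Earth-style cluster vs a sparse
# Mars-style cluster with the same vent count spread over a wider area.
earth_like = [(0, 0), (1, 0), (0, 1), (1, 1)]
mars_like = [(0, 0), (10, 0), (0, 10), (10, 10)]
d_earth = gaussian_kde_2d(earth_like, 0.5, 0.5)
d_mars = gaussian_kde_2d(mars_like, 5.0, 5.0)
```

Evaluating the estimator at each cluster's centre quantifies the ordering described in the abstract: the denser terrestrial-style cluster yields the higher peak vent density.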
A population of gamma-ray emitting globular clusters seen with the Fermi Large Area Telescope
Abdo, A. A.
2010-11-24
Context. Globular clusters with their large populations of millisecond pulsars (MSPs) are believed to be potential emitters of high-energy gamma-ray emission. The observation of this emission provides a powerful tool to assess the millisecond pulsar population of a cluster, is essential for understanding the importance of binary systems for the evolution of globular clusters, and provides complementary insights into magnetospheric emission processes. Aims. Our goal is to constrain the millisecond pulsar populations in globular clusters from analysis of gamma-ray observations. Methods. We use 546 days of continuous sky-survey observations obtained with the Large Area Telescope aboard the Fermi Gamma-ray Space Telescope to study the gamma-ray emission towards 13 globular clusters. Results. Steady point-like high-energy gamma-ray emission has been significantly detected towards 8 globular clusters. Five of them (47 Tucanae, Omega Cen, NGC 6388, Terzan 5, and M 28) show hard spectral power indices (0.7 < Γ < 1.4) and clear evidence for an exponential cut-off in the range 1.0 - 2.6 GeV, which is the characteristic signature of magnetospheric emission from MSPs. Three of them (M 62, NGC 6440 and NGC 6652) also show hard spectral indices (1.0 < Γ < 1.7); however, the presence of an exponential cut-off cannot be unambiguously established. Three of them (Omega Cen, NGC 6388, NGC 6652) have no known radio or X-ray MSPs yet still exhibit MSP spectral properties. From the observed gamma-ray luminosities, we estimate the total number of MSPs expected to be present in these globular clusters. We show that our estimates of the MSP population correlate with the stellar encounter rate and we estimate 2600 - 4700 MSPs in Galactic globular clusters, commensurate with previous estimates. Conclusions. The observation of high-energy gamma-ray emission from globular clusters thus provides a reliable independent method to assess their millisecond pulsar populations.
NASA Astrophysics Data System (ADS)
Brewick, Patrick T.; Smyth, Andrew W.
2016-12-01
The authors have previously shown that many traditional approaches to operational modal analysis (OMA) struggle to properly identify the modal damping ratios for bridges under traffic loading due to the interference caused by the driving frequencies of the traffic loads. This paper presents a novel methodology for modal parameter estimation in OMA that overcomes the problems presented by driving frequencies and significantly improves the damping estimates. This methodology is based on finding the power spectral density (PSD) of a given modal coordinate, and then dividing the modal PSD into separate regions, the left- and right-side spectra. The modal coordinates were found using a blind source separation (BSS) algorithm, and a curve-fitting technique was developed that uses optimization to find the modal parameters that best fit each side spectrum of the PSD. Specifically, a pattern-search optimization method was combined with a clustering analysis algorithm and together they were employed in a series of stages in order to improve the estimates of the modal damping ratios. This method was used to estimate the damping ratios from a simulated bridge model subjected to moving traffic loads. The results of this method were compared to those of other established OMA methods, such as Frequency Domain Decomposition (FDD) and BSS methods, and they were found to be more accurate and more reliable, even for modes that had their PSDs distorted or altered by driving frequencies.
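To make concrete what "extracting a damping ratio from a modal PSD" means, here is a deliberately simple sketch using the classical half-power bandwidth rule on an analytic single-degree-of-freedom PSD. This is not the paper's optimization-based curve fit of side spectra; it is only the textbook baseline that such fits refine:

```python
def sdof_psd(f, fn=2.0, zeta=0.05):
    """Magnitude-squared FRF of a single-DOF oscillator (unit static gain)."""
    r = f / fn
    return 1.0 / ((1 - r * r) ** 2 + (2 * zeta * r) ** 2)

def half_power_damping(freqs, psd):
    """Half-power estimate: zeta ~ (f2 - f1) / (2 * f_peak), where f1 and f2
    bracket the resonance peak at half its PSD height."""
    peak = max(range(len(psd)), key=psd.__getitem__)
    half = psd[peak] / 2.0
    lo = max(i for i in range(peak) if psd[i] <= half)          # below peak
    hi = min(i for i in range(peak, len(psd)) if psd[i] <= half)  # above peak
    return (freqs[hi] - freqs[lo]) / (2 * freqs[peak])

freqs = [i * 0.001 for i in range(1, 4001)]  # 0.001 .. 4.0 Hz grid
damping = half_power_damping(freqs, [sdof_psd(f) for f in freqs])
```

On this clean analytic PSD the rule recovers the true 5% damping closely; the paper's point is precisely that real traffic-excited PSDs are distorted near the peak, which is where fitting only the uncontaminated side spectra pays off.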
Gas stripping and mixing in galaxy clusters: a numerical comparison study
NASA Astrophysics Data System (ADS)
Heß, Steffen; Springel, Volker
2012-11-01
The ambient hot intrahalo gas in clusters of galaxies is constantly fed and stirred by infalling galaxies, a process that can be studied in detail with cosmological hydrodynamical simulations. However, different numerical methods yield discrepant predictions for crucial hydrodynamical processes, leading for example to different entropy profiles in clusters of galaxies. In particular, the widely used Lagrangian smoothed particle hydrodynamics (SPH) scheme is suspected to strongly damp fluid instabilities and turbulence, which are both crucial to establish the thermodynamic structure of clusters. In this study, we test to what extent our recently developed Voronoi particle hydrodynamics (VPH) scheme yields different results for the stripping of gas out of infalling galaxies and for the bulk gas properties of clusters. We consider both the evolution of isolated galaxy models that are exposed to a stream of intracluster medium or are dropped into cluster models, as well as non-radiative cosmological simulations of cluster formation. We also compare our particle-based method with results obtained with a fundamentally different discretization approach as implemented in the moving-mesh code AREPO. We find that VPH leads to noticeably faster stripping of gas out of galaxies than SPH, in better agreement with the mesh code than with SPH. We show that although VPH in its present form is not as accurate as the moving-mesh code in our investigated cases, its improved accuracy of gradient estimates makes VPH an attractive alternative to SPH.
Influence of exposure differences on city-to-city heterogeneity ...
Multi-city population-based epidemiological studies have observed heterogeneity between city-specific fine particulate matter (PM2.5)-mortality effect estimates. These studies typically use ambient monitoring data as a surrogate for exposure, leading to potential exposure misclassification. The level of exposure misclassification can differ by city, affecting the observed health effect estimate. The objective of this analysis is to evaluate whether previously developed residential infiltration-based city clusters can explain city-to-city heterogeneity in PM2.5-mortality risk estimates. In a prior paper, 94 cities were clustered based on residential infiltration factors (e.g. home age/size, prevalence of air conditioning (AC)), resulting in 5 clusters. For this analysis, the association between PM2.5 and all-cause mortality was first determined in 77 cities across the United States for 2001–2005. Next, a second-stage analysis was conducted evaluating the influence of cluster assignment on heterogeneity in the risk estimates. Associations between a 2-day (lag 0–1 days) moving average of PM2.5 concentrations and non-accidental mortality were determined for each city. Estimated effects ranged from −3.2 to 5.1% with a pooled estimate of 0.33% (95% CI: 0.13, 0.53) increase in mortality per 10 μg/m3 increase in PM2.5. The second-stage analysis determined that cluster assignment was marginally significant in explaining the city-to-city heterogeneity. The health effe
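A pooled estimate of the kind reported here (0.33% per 10 μg/m3) is typically an inverse-variance weighted average of the city-specific estimates. A minimal fixed-effect sketch with invented city values (the study's actual second stage is a multi-city hierarchical model, which this simplifies):

```python
import math

def pool_city_estimates(betas, ses):
    """Fixed-effect inverse-variance pooling of city-specific estimates;
    returns the pooled estimate and its 95% confidence interval."""
    weights = [1.0 / se ** 2 for se in ses]  # precision weights
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return beta, (beta - 1.96 * se, beta + 1.96 * se)

# Hypothetical % mortality increases per 10 ug/m3 PM2.5 for three cities,
# with their standard errors.
pooled, ci = pool_city_estimates([0.1, 0.5, 0.4], [0.2, 0.3, 0.25])
```

Cities measured more precisely pull the pooled value harder, which is exactly why differential exposure misclassification across cities matters for the combined estimate.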
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gupta, Nikhel; Saro, A.; Mohr, J. J.
We study the overdensity of point sources in the direction of X-ray-selected galaxy clusters from the meta-catalogue of X-ray-detected clusters of galaxies (MCXC; < z > = 0.14) at South Pole Telescope (SPT) and Sydney University Molonglo Sky Survey (SUMSS) frequencies. Flux densities at 95, 150 and 220 GHz are extracted from the 2500 deg² SPT-SZ survey maps at the locations of SUMSS sources, producing a multifrequency catalogue of radio galaxies. In the direction of massive galaxy clusters, the radio galaxy flux densities at 95 and 150 GHz are biased low by the cluster Sunyaev–Zel'dovich Effect (SZE) signal, which is negative at these frequencies. We employ a cluster SZE model to remove the expected flux bias and then study these corrected source catalogues. We find that the high-frequency radio galaxies are centrally concentrated within the clusters and that their luminosity functions (LFs) exhibit amplitudes that are characteristically an order of magnitude lower than the cluster LF at 843 MHz. We use the 150 GHz LF to estimate the impact of cluster radio galaxies on an SPT-SZ like survey. The radio galaxy flux typically produces a small bias on the SZE signal and has negligible impact on the observed scatter in the SZE mass–observable relation. If we assume there is no redshift evolution in the radio galaxy LF, then 1.8 ± 0.7 per cent of the clusters with detection significance ξ ≥ 4.5 would be lost from the sample. In contrast, allowing for redshift evolution of the form (1 + z)^2.5 increases the incompleteness to 5.6 ± 1.0 per cent. Improved constraints on the evolution of the cluster radio galaxy LF require a larger cluster sample extending to higher redshift.
NASA Astrophysics Data System (ADS)
Sharon, Keren; Gladders, Michael D.; Rigby, Jane R.; Bayliss, Matthew B.; Wuyts, Eva; Dahle, Håkon; Johnson, Traci L.; Florian, Michael K.; Dunham, Samuel; Murray, Katherine; Whitaker, Kate; Li, Nan
Driven by the unprecedented wealth of high-quality data accumulating for the Frontier Fields, they are becoming some of the best-studied strong lensing clusters to date, and will likely remain so for the next few years. As will be discussed intensively in this focus meeting, the FF are proving transformative for many fields: from studies of the high-redshift Universe, to the assembly and structure of the clusters themselves. The FF data and the extensive collaborative effort around this program will also allow us to examine and improve upon current lens modeling techniques. Strong lensing is a powerful tool for mass reconstruction of the cores of galaxy clusters of all scales, providing an estimate of the total (dark and luminous) projected mass density distribution out to 0.5 Mpc. Though SL mass may be biased by contributions from structures along the line of sight, its strength is that it is relatively insensitive to assumptions on cluster baryon astrophysics and dynamical state. Like the Frontier Fields clusters, the most "famous" strong lensing clusters are at the high-mass end; they lens dozens of background sources into multiple images, providing ample lensing constraints. In this talk, I will focus on how we can leverage what we learn from modeling the FF clusters in strong lensing studies of the hundreds of clusters that will be discovered in upcoming surveys. In typical clusters, unlike the Frontier Fields, the Bullet Cluster and A1689, we observe only one to a handful of background sources, and have limited lensing constraints. I will describe the limitations that such a configuration imposes on strong lens modeling, highlight measurements that are robust to the richness of lensing evidence, and address the sources of uncertainty and what sort of information can help reduce those uncertainties. This category of lensing clusters is most relevant to the wide cluster surveys of the future.
Lu, Tsui-Shan; Longnecker, Matthew P.; Zhou, Haibo
2016-01-01
An outcome-dependent sampling (ODS) scheme is a cost-effective sampling scheme in which one observes the exposure with a probability that depends on the outcome. Well-known examples are the case-control design for a binary response, the case-cohort design for failure time data, and the general ODS design for a continuous response. While substantial work has been done for the univariate response case, statistical inference and design for ODS with multivariate outcomes remain under-developed. Motivated by the need in biological studies to take advantage of the available responses for subjects in a cluster, we propose a multivariate outcome-dependent sampling (Multivariate-ODS) design that is based on a general selection of the continuous responses within a cluster. The proposed inference procedure for the Multivariate-ODS design is semiparametric: all the underlying distributions of covariates are modeled nonparametrically using empirical likelihood methods. We show that the proposed estimator is consistent and establish its asymptotic normality. Simulation studies show that the proposed estimator is more efficient than the estimator obtained using only the simple-random-sample portion of the Multivariate-ODS or the estimator from a simple random sample of the same size. The Multivariate-ODS design, together with the proposed estimator, provides an approach to further improve study efficiency for a fixed study budget. We illustrate the proposed design and estimator with an analysis of the association of PCB exposure with hearing loss in children in the Collaborative Perinatal Study. PMID:27966260
Olives, Casey; Valadez, Joseph J; Pagano, Marcello
2014-03-01
To assess the bias incurred when curtailment of Lot Quality Assurance Sampling (LQAS) is ignored, to present unbiased estimators, to consider the impact of cluster sampling by simulation, and to apply our method to published polio immunization data from Nigeria. We present estimators of coverage for two kinds of curtailed LQAS strategies: semicurtailed and curtailed. We study the proposed estimators with independent and clustered data using three field-tested LQAS designs for assessing polio vaccination coverage, with samples of size 60 and decision rules of 9, 21 and 33, and compare them to biased maximum likelihood estimators. Lastly, we present estimates of polio vaccination coverage from previously published data in 20 local government authorities (LGAs) from five Nigerian states. Simulations illustrate substantial bias if one ignores the curtailed sampling design, whereas the proposed estimators show no bias. Clustering does not affect the bias of these estimators, although standard errors show signs of inflation as clustering increases. Neither sampling strategy nor LQAS design influences estimates of polio vaccination coverage in the 20 Nigerian LGAs. When coverage is low, semicurtailed LQAS strategies considerably reduce the sample size required to make a decision, and curtailed LQAS designs further reduce the sample size when coverage is high. The results presented dispel the misconception that curtailed LQAS data are unsuitable for estimation. These findings augment the utility of LQAS as a tool for monitoring vaccination efforts by demonstrating that unbiased estimation under curtailed designs is not only possible but also comes with smaller sample sizes. © 2014 John Wiley & Sons Ltd.
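The bias the authors describe is easy to reproduce by simulation: under a curtailed design, the naive proportion successes/observed-n is not unbiased for coverage. A sketch using one of the field-tested designs (n = 60, decision rule 33); the stopping rule shown is our reading of curtailment, not the paper's code:

```python
import random

def curtailed_sample(p, n=60, d=33, rng=random):
    """Curtailed LQAS: stop as soon as the decision is determined, i.e. when
    successes reach d (accept) or failures exceed n - d (d is unreachable)."""
    succ = fail = 0
    while succ < d and fail <= n - d:
        if rng.random() < p:
            succ += 1
        else:
            fail += 1
    return succ, succ + fail  # (successes, observed sample size)

def naive_mean_estimate(p, sims=20000, seed=1):
    """Monte Carlo mean of successes/observed-n over curtailed runs, i.e. the
    expectation of the naive (biased) maximum likelihood estimator."""
    rng = random.Random(seed)
    vals = [s / m for s, m in (curtailed_sample(p, rng=rng) for _ in range(sims))]
    return sum(vals) / len(vals)
```

With true coverage 0.8 the naive mean comes out systematically above 0.8 (runs stopping on the accept boundary make the ratio d/m convex in m), which is the bias that the paper's corrected estimators remove.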
Depth of interaction decoding of a continuous crystal detector module.
Ling, T; Lewellen, T K; Miyaoka, R S
2007-04-21
We present a clustering method to extract the depth of interaction (DOI) information from an 8 mm thick crystal version of our continuous miniature crystal element (cMiCE) small animal PET detector. This clustering method, based on the maximum-likelihood (ML) method, can effectively build look-up tables (LUT) for different DOI regions. Combined with our statistics-based positioning (SBP) method, which uses a LUT searching algorithm based on the ML method and two-dimensional mean-variance LUTs of light responses from each photomultiplier channel with respect to different gamma ray interaction positions, the position of interaction and DOI can be estimated simultaneously. Data simulated using DETECT2000 were used to help validate our approach. An experiment using our cMiCE detector was designed to evaluate the performance. Two and four DOI region clustering were applied to the simulated data. Two DOI regions were used for the experimental data. The misclassification rate for simulated data is about 3.5% for two DOI regions and 10.2% for four DOI regions. For the experimental data, the rate is estimated to be approximately 25%. By using multi-DOI LUTs, we also observed improvement of the detector spatial resolution, especially for the corner region of the crystal. These results show that our ML clustering method is a consistent and reliable way to characterize DOI in a continuous crystal detector without requiring any modifications to the crystal or detector front end electronics. The ability to characterize the depth-dependent light response function from measured data is a major step forward in developing practical detectors with DOI positioning capability.
Optical polarimetric and near-infrared photometric study of the RCW95 Galactic H II region
NASA Astrophysics Data System (ADS)
Vargas-González, J.; Roman-Lopes, A.; Santos, F. P.; Franco, G. A. P.; Santos, J. F. C.; Maia, F. F. S.; Sanmartim, D.
2018-02-01
We carried out an optical polarimetric study in the direction of the RCW 95 star-forming region in order to probe the sky-projected magnetic field structure by using the distribution of linear polarization segments, which seem to be well aligned with the more extended cloud component. A mean polarization angle of θ = 49.8° ± 7.7° was derived. Through the spectral dependence analysis of polarization it was possible to obtain the total-to-selective extinction ratio (RV) by fitting the Serkowski function, resulting in a mean value of RV = 2.93 ± 0.47. The foreground polarization component was estimated and is in agreement with previous studies in this direction of the Galaxy. Further, near-infrared (NIR) images from the Vista Variables in the Via Láctea (VVV) survey were collected to improve the study of the stellar population associated with the H II region. The Automated Stellar Cluster Analysis algorithm was employed to derive structural parameters for two clusters in the region, and a set of PAdova and TRieste Stellar Evolution Code (PARSEC) isochrones was superimposed on the decontaminated colour-magnitude diagrams to estimate an age of about 3 Myr for both clusters. Finally, from the NIR photometry study combined with spectra obtained with the Ohio State Infrared Imager and Spectrometer mounted at the Southern Astrophysics Research Telescope, we derived the spectral classification of the main ionizing sources in the clusters associated with IRAS 15408-5356 and IRAS 15412-5359, both classified as O4V stars.
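Fitting the Serkowski function p(λ) = p_max · exp(−K · ln²(λ_max/λ)) and converting λ_max to R_V via the commonly used relation R_V ≈ 5.5 · λ_max (with K from the Whittet et al. form K = 1.66 · λ_max + 0.01) can be sketched with a brute-force least-squares grid. The wavelengths and polarizations below are synthetic, not the paper's measurements:

```python
import math

def serkowski(lam, p_max, lam_max):
    """Serkowski law p(lam) = p_max * exp(-K * ln^2(lam_max / lam)),
    with K taken from the Whittet et al. relation K = 1.66*lam_max + 0.01."""
    K = 1.66 * lam_max + 0.01
    return p_max * math.exp(-K * math.log(lam_max / lam) ** 2)

def fit_serkowski(lams, ps):
    """Brute-force least squares over a (p_max, lam_max) grid; returns the
    best-fit parameters and the implied R_V ~ 5.5 * lam_max."""
    best = None
    for pm in (x * 0.01 for x in range(50, 500)):      # p_max: 0.5 .. 5.0 %
        for lm in (x * 0.01 for x in range(30, 100)):  # lam_max: 0.30 .. 0.99 um
            sse = sum((serkowski(l, pm, lm) - p) ** 2 for l, p in zip(lams, ps))
            if best is None or sse < best[0]:
                best = (sse, pm, lm)
    _, p_max, lam_max = best
    return p_max, lam_max, 5.5 * lam_max
```

A λ_max near 0.53 μm maps to R_V ≈ 2.9, consistent with the near-Galactic-average value the study derives; a proper analysis would of course use a continuous optimizer and per-star uncertainties.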
Last, Anna; Burr, Sarah; Alexander, Neal; Harding-Esch, Emma; Roberts, Chrissy H; Nabicassa, Meno; Cassama, Eunice Teixeira da Silva; Mabey, David; Holland, Martin; Bailey, Robin
2017-07-31
Chlamydia trachomatis (Ct) is the most common cause of bacterial sexually transmitted infection and infectious cause of blindness (trachoma) worldwide. Understanding the spatial distribution of Ct infection may enable us to identify populations at risk and improve our understanding of Ct transmission. In this study, we sought to investigate the spatial distribution of Ct infection and the clinical features associated with high Ct load in trachoma-endemic communities on the Bijagós Archipelago (Guinea Bissau). We collected 1507 conjunctival samples and corresponding detailed clinical data during a cross-sectional population-based geospatially representative trachoma survey. We used droplet digital PCR to estimate Ct load on conjunctival swabs. Geostatistical tools were used to investigate clustering of ocular Ct infections. Spatial clusters (independent of age and gender) of individuals with high Ct loads were identified using local indicators of spatial association. We did not detect clustering of individuals with low load infections. These data suggest that infections with high bacterial load may be important in Ct transmission. These geospatial tools may be useful in the study of ocular Ct transmission dynamics and as part of trachoma surveillance post-treatment, to identify clusters of infection and thresholds of Ct load that may be important foci of re-emergent infection in communities. © FEMS 2017.
Cluster Detection Tests in Spatial Epidemiology: A Global Indicator for Performance Assessment
Guttmann, Aline; Li, Xinran; Feschet, Fabien; Gaudart, Jean; Demongeot, Jacques; Boire, Jean-Yves; Ouchchane, Lemlih
2015-01-01
In cluster detection of disease, the use of local cluster detection tests (CDTs) is common. These methods aim both at locating likely clusters and at testing their statistical significance. New or improved CDTs are regularly proposed to epidemiologists and must be subjected to performance assessment. Because location accuracy has to be considered, performance assessment goes beyond the raw estimation of type I or II errors. As no consensus exists for performance evaluations, heterogeneous methods are used, and studies are therefore rarely comparable. A global indicator of performance, assessing both spatial accuracy and usual power, would facilitate the exploration of CDT behaviour and help between-study comparisons. The Tanimoto coefficient (TC) is a well-known measure of similarity that can assess location accuracy, but only for one detected cluster. In a simulation study, performance is measured across many tests. From the TC, we here propose two statistics, the averaged TC and the cumulated TC, as indicators able to provide a global overview of CDT performance in terms of both usual power and location accuracy. We demonstrate the properties of these two indicators and the superiority of the cumulated TC for assessing performance. We tested these indicators by conducting a systematic spatial assessment displayed through performance maps. PMID:26086911
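The Tanimoto coefficient underlying both proposed indicators is simply set overlap between the detected and true cluster zones; averaging it over simulated datasets then folds detection power and location accuracy into one number. A minimal sketch (the zone identifiers are invented):

```python
def tanimoto(detected, true_cluster):
    """TC = |A intersect B| / |A union B| between detected and true zones."""
    a, b = set(detected), set(true_cluster)
    return len(a & b) / len(a | b)

def averaged_tc(detections, true_cluster):
    """Averaged TC over simulated datasets: rewards both detecting a cluster
    (power) and locating it accurately; a missed detection contributes 0."""
    return sum(tanimoto(d, true_cluster) if d else 0.0
               for d in detections) / len(detections)
```

A CDT that always fires but in the wrong place, and one that fires rarely but precisely, both score below a test that is powerful and accurate, which is the point of a single global indicator.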
DOE Office of Scientific and Technical Information (OSTI.GOV)
Su, Kuan-Hao; Hu, Lingzhi; Traughber, Melanie
Purpose: MR-based pseudo-CT has an important role in MR-based radiation therapy planning and PET attenuation correction. The purpose of this study is to establish a clinically feasible approach, including image acquisition, correction, and CT formation, for pseudo-CT generation of the brain using a single-acquisition, undersampled ultrashort echo time (UTE)-mDixon pulse sequence. Methods: Nine patients were recruited for this study. For each patient, a 190-s, undersampled, single acquisition UTE-mDixon sequence of the brain was acquired (TE = 0.1, 1.5, and 2.8 ms). A novel method of retrospective trajectory correction of the free induction decay (FID) signal was performed based on point-spread functions of three external MR markers. Two-point Dixon images were reconstructed using the first and second echo data (TE = 1.5 and 2.8 ms). R2* images (1/T2*) were then estimated and were used to provide bone information. Three image features, i.e., Dixon-fat, Dixon-water, and R2*, were used for unsupervised clustering. Five tissue clusters, i.e., air, brain, fat, fluid, and bone, were estimated using the fuzzy c-means (FCM) algorithm. A two-step, automatic tissue-assignment approach was proposed and designed according to the prior information of the given feature space. Pseudo-CTs were generated by a voxelwise linear combination of the membership functions of the FCM. A low-dose CT was acquired for each patient and was used as the gold standard for comparison. Results: The contrast and sharpness of the FID images were improved after trajectory correction was applied. The mean of the estimated trajectory delay was 0.774 μs (max: 1.350 μs; min: 0.180 μs). The FCM-estimated centroids of different tissue types showed a distinguishable pattern for different tissues, and significant differences were found between the centroid locations of different tissue types.
Pseudo-CT can provide additional skull detail and has low bias and absolute error of estimated CT numbers (−22 ± 29 HU and 130 ± 16 HU, respectively) when compared to low-dose CT. Conclusions: The MR features generated by the proposed acquisition, correction, and processing methods may provide representative clustering information and could thus be used for clinical pseudo-CT generation.
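The fuzzy c-means step can be sketched for scalar features, as a 1-D toy stand-in for the Dixon-fat/Dixon-water/R2* feature space. The initialisation and data are simplified assumptions for illustration, not the study's implementation:

```python
def fcm(data, c=2, m=2.0, iters=50):
    """Fuzzy c-means on scalar features; returns (centroids, memberships),
    where u[j][i] is the degree to which point j belongs to cluster i."""
    centroids = [min(data), max(data)]  # simple deterministic init for c = 2
    u = [[0.0] * c for _ in data]
    for _ in range(iters):
        # Membership update: u_ij = 1 / sum_k (d_i / d_k)^(2/(m-1)).
        for j, x in enumerate(data):
            d = [abs(x - v) or 1e-12 for v in centroids]  # guard zero distance
            for i in range(c):
                u[j][i] = 1.0 / sum((d[i] / d[k]) ** (2.0 / (m - 1.0))
                                    for k in range(c))
        # Centroid update: membership-weighted mean of the data.
        for i in range(c):
            w = [u[j][i] ** m for j in range(len(data))]
            centroids[i] = sum(wj * x for wj, x in zip(w, data)) / sum(w)
    return centroids, u

# Toy 1-D "feature" values forming two tissue-like groups.
centroids, u = fcm([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
```

The soft memberships u are what make the paper's final step possible: a pseudo-CT formed as a voxelwise linear combination of membership functions rather than a hard tissue label per voxel.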
Competing risks regression for clustered data
Zhou, Bingqing; Fine, Jason; Latouche, Aurelien; Labopin, Myriam
2012-01-01
A population-average regression model is proposed to assess the marginal effects of covariates on the cumulative incidence function when there is dependence across individuals within a cluster in the competing risks setting. This method extends the Fine–Gray proportional hazards model for the subdistribution to situations where individuals within a cluster may be correlated due to unobserved shared factors. Estimators of the regression parameters in the marginal model are developed under an independence working assumption, where the correlation across individuals within a cluster is completely unspecified. The estimators are consistent and asymptotically normal, and variance estimation may be achieved without specifying the form of the dependence across individuals. A simulation study shows that the inferential procedures perform well with realistic sample sizes. The practical utility of the methods is illustrated with data from the European Bone Marrow Transplant Registry. PMID:22045910
Peiris, David; Usherwood, Tim; Panaretto, Kathryn; Harris, Mark; Hunt, Jennifer; Redfern, Julie; Zwar, Nicholas; Colagiuri, Stephen; Hayman, Noel; Lo, Serigne; Patel, Bindu; Lyford, Marilyn; MacMahon, Stephen; Neal, Bruce; Sullivan, David; Cass, Alan; Jackson, Rod; Patel, Anushka
2015-01-01
Despite effective treatments to reduce cardiovascular disease risk, their translation into practice is limited. Using a parallel-arm cluster-randomized controlled trial in 60 Australian primary healthcare centers, we tested whether a multifaceted quality improvement intervention comprising computerized decision support, audit/feedback tools, and staff training improved (1) guideline-indicated risk factor measurements and (2) guideline-indicated medications for those at high cardiovascular disease risk. Centers had to use a compatible software system, and eligible patients were regular attendees (Aboriginal and Torres Strait Islander people aged ≥ 35 years and others aged ≥ 45 years). Patient-level analyses were conducted using generalized estimating equations to account for clustering. Median follow-up for 38,725 patients (mean age, 61.0 years; 42% men) was 17.5 months. Mean monthly staff support was <1 hour/site. For the coprimary outcomes, the intervention was associated with improved overall risk factor measurements (62.8% versus 53.4%; risk ratio, 1.25; 95% confidence interval, 1.04-1.50; P=0.02), but there were no significant differences in recommended prescriptions for the high-risk cohort (n=10,308; 56.8% versus 51.2%; P=0.12). There were significant treatment escalations (new prescriptions or increased numbers of medicines) for antiplatelet (17.9% versus 2.7%; P<0.001), lipid-lowering (19.2% versus 4.8%; P<0.001), and blood pressure-lowering medications (23.3% versus 12.1%; P=0.02). In Australian primary healthcare settings, a computer-guided quality improvement intervention, requiring minimal support, improved cardiovascular disease risk measurement but did not increase prescription rates in the high-risk group. Computerized quality improvement tools offer an important, albeit partial, solution to improving primary healthcare system capacity for cardiovascular disease risk management. https://www.anzctr.org.au/Trial/Registration/TrialReview.aspx?id=336630.
Australian New Zealand Clinical Trials Registry No. 12611000478910. © 2015 American Heart Association, Inc.
Checking the possibility of controlling fuel element by X-ray computerized tomography
NASA Astrophysics Data System (ADS)
Trinh, V. B.; Zhong, Y.; Osipov, S. P.; Batranin, A. V.
2017-08-01
The article considers the possibility of checking fuel elements by X-ray computerized tomography. The checking tasks are based on the detection of particles of active material, evaluation of the heterogeneity of the distribution of uranium salts, and the detection of clusters of uranium particles. First, a scanning scheme that improves the performance and the quality of the resulting three-dimensional images of the internal structure is determined. Next, the possibility of detecting clusters of uranium particles with a size of 1 mm³, and of measuring the coordinates of such clusters in the middle layer to within a voxel (about 80 μm for the experiments considered), is demonstrated experimentally. Finally, the problem of estimating the heterogeneity of the distribution of the active material in the middle layer and of detecting particles of active material with a nominal diameter of 0.1 mm in the “blank” is solved.
A Comparative Evaluation of Anomaly Detection Algorithms for Maritime Video Surveillance
2011-01-01
of k-means clustering and the k-NN Localized p-value Estimator (KNN-LPE). K-means is a popular distance-based clustering algorithm while KNN-LPE...implemented the sparse cluster identification rule we described in Section 3.1. 2. k-NN Localized p-value Estimator (KNN-LPE): We implemented this using...Average Density (KNN-NAD): This was implemented as described in Section 3.4. Algorithm Parameter Settings The global and local density-based anomaly
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kwon, Deukwoo; Little, Mark P.; Miller, Donald L.
Purpose: To determine more accurate regression formulas for estimating peak skin dose (PSD) from reference air kerma (RAK) or kerma-area product (KAP). Methods: After grouping of the data from 21 procedures into 13 clinically similar groups, assessments were made of optimal clustering using the Bayesian information criterion to obtain the optimal linear regressions of (log-transformed) PSD vs RAK, PSD vs KAP, and PSD vs RAK and KAP. Results: Three clusters of clinical groups were optimal in regression of PSD vs RAK, seven clusters of clinical groups were optimal in regression of PSD vs KAP, and six clusters of clinical groups were optimal in regression of PSD vs RAK and KAP. Prediction of PSD using both RAK and KAP is significantly better than prediction of PSD with either RAK or KAP alone. The regression of PSD vs RAK provided better predictions of PSD than the regression of PSD vs KAP. The partial-pooling (clustered) method yields smaller mean squared errors compared with the complete-pooling method. Conclusion: PSD distributions for interventional radiology procedures are log-normal. Estimates of PSD derived from RAK and KAP jointly are most accurate, followed closely by estimates derived from RAK alone. Estimates of PSD derived from KAP alone are the least accurate. Using a stochastic search approach, it is possible to cluster together certain dissimilar types of procedures to minimize the total error sum of squares.
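The log-transformed regression described above can be illustrated with a small sketch. All data and coefficients below are synthetic inventions for illustration; the paper's actual fitted formulas and clinical groupings are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, hypothetical data: reference air kerma (RAK, Gy) and
# kerma-area product (KAP, Gy*cm^2) for one clinical group, with a
# log-normal peak skin dose (PSD) as the abstract describes.
rak = rng.lognormal(mean=0.0, sigma=0.5, size=200)
kap = rng.lognormal(mean=3.0, sigma=0.6, size=200)
log_psd = 0.2 + 0.8 * np.log(rak) + 0.15 * np.log(kap) \
          + rng.normal(scale=0.1, size=200)

# Linear regression of log(PSD) on log(RAK) and log(KAP) jointly.
X = np.column_stack([np.ones_like(rak), np.log(rak), np.log(kap)])
beta, *_ = np.linalg.lstsq(X, log_psd, rcond=None)

predicted_psd = np.exp(X @ beta)  # back-transform to dose units
```

In the partial-pooling scheme of the paper, a separate regression of this form would be fitted within each cluster of clinical groups found by the BIC search.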
Nanospectroscopy of thiacyanine dye molecules adsorbed on silver nanoparticle clusters
NASA Astrophysics Data System (ADS)
Ralević, Uroš; Isić, Goran; Anicijević, Dragana Vasić; Laban, Bojana; Bogdanović, Una; Lazović, Vladimir M.; Vodnik, Vesna; Gajić, Radoš
2018-03-01
The adsorption of thiacyanine dye molecules on citrate-stabilized silver nanoparticle clusters drop-cast onto freshly cleaved mica or highly oriented pyrolytic graphite surfaces is examined using colocalized surface-enhanced Raman spectroscopy and atomic force microscopy. The incidence of dye Raman signatures in photoluminescence hotspots identified around nanoparticle clusters is considered for both citrate- and borate-capped silver nanoparticles and found to be substantially lower in the former case, suggesting that the citrate anions impede the efficient dye adsorption. Rigorous numerical simulations of light scattering on random nanoparticle clusters are used for estimating the electromagnetic enhancement and elucidating the hotspot formation mechanism. The majority of the enhanced Raman signal, estimated to be more than 90%, is found to originate from the nanogaps between adjacent nanoparticles in the cluster, regardless of the cluster size and geometry.
A note on the kappa statistic for clustered dichotomous data.
Zhou, Ming; Yang, Zhao
2014-06-30
The kappa statistic is widely used to assess the agreement between two raters. Motivated by a simulation-based cluster bootstrap method to calculate the variance of the kappa statistic for clustered physician-patients dichotomous data, we investigate its special correlation structure and develop a new simple and efficient data generation algorithm. For the clustered physician-patients dichotomous data, based on the delta method and its special covariance structure, we propose a semi-parametric variance estimator for the kappa statistic. An extensive Monte Carlo simulation study is performed to evaluate the performance of the new proposal and five existing methods with respect to the empirical coverage probability, root-mean-square error, and average width of the 95% confidence interval for the kappa statistic. The variance estimator ignoring the dependence within a cluster is generally inappropriate, and the variance estimators from the new proposal, bootstrap-based methods, and the sampling-based delta method perform reasonably well for at least a moderately large number of clusters (e.g., the number of clusters K ⩾ 50). The new proposal and sampling-based delta method provide convenient tools for efficient computations and non-simulation-based alternatives to the existing bootstrap-based methods. Moreover, the new proposal has acceptable performance even when the number of clusters is as small as K = 25. To illustrate the practical application of all the methods, one psychiatric research dataset and two simulated clustered physician-patients datasets are analyzed. Copyright © 2014 John Wiley & Sons, Ltd.
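The cluster bootstrap idea is simple to sketch. The following is a minimal illustration, not the authors' semi-parametric estimator: Cohen's kappa is computed on pooled patient-level ratings, and whole physicians (clusters) are resampled with replacement so that the intracluster dependence is respected. The example data are hypothetical.

```python
import random

def kappa(pairs):
    """Cohen's kappa for a list of (rater1, rater2) binary ratings."""
    n = len(pairs)
    po = sum(a == b for a, b in pairs) / n   # observed agreement
    p1 = sum(a for a, _ in pairs) / n        # rater 1 marginal
    p2 = sum(b for _, b in pairs) / n        # rater 2 marginal
    pe = p1 * p2 + (1 - p1) * (1 - p2)       # chance agreement
    return (po - pe) / (1 - pe)

def cluster_bootstrap_var(clusters, n_boot=2000, seed=1):
    """Resample whole clusters (physicians) with replacement and
    recompute kappa on the pooled patient-level ratings."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        sample = [rng.choice(clusters) for _ in clusters]
        pooled = [pair for cl in sample for pair in cl]
        stats.append(kappa(pooled))
    m = sum(stats) / n_boot
    return sum((s - m) ** 2 for s in stats) / (n_boot - 1)

# Hypothetical physician clusters of (rater1, rater2) binary ratings.
clusters = [[(1, 1), (0, 0), (1, 0)], [(1, 1), (1, 1), (0, 0)],
            [(0, 0), (0, 1), (1, 1)], [(1, 0), (0, 0), (1, 1)]]
k_hat = kappa([p for c in clusters for p in c])
k_var = cluster_bootstrap_var(clusters)
```

A naive variance estimator would bootstrap individual patients instead of clusters, which is exactly the within-cluster-independence assumption the abstract warns against.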
Structural parameters of young star clusters: fractal analysis
NASA Astrophysics Data System (ADS)
Hetem, A.
2017-07-01
A unified view of star formation in the Universe demands detailed and in-depth studies of young star clusters. This work is related to our previous study of fractal statistics estimated for a sample of young stellar clusters (Gregorio-Hetem et al. 2015, MNRAS 448, 2504). The structural properties can lead to significant conclusions about the early stages of cluster formation: 1) virial conditions can be used to distinguish warm collapse; 2) bound or unbound behaviour can lead to conclusions about expansion; and 3) fractal statistics are correlated with the dynamical evolution and age. The error-bar estimation technique most used in the literature is to adopt inferential methods (such as bootstrap) to estimate deviation and variance, which are valid only for an artificially generated cluster. In this paper, we expanded the number of studied clusters in order to enhance the investigation of the cluster properties and dynamical evolution. The structural parameters were compared with fractal statistics and reveal that the clusters' radial density profiles show a tendency for the mean separation of the stars to increase with the average surface density. The sample can be divided into two groups showing different dynamic behaviour, but they have the same dynamic evolution, since the entire sample was revealed to consist of expanding objects, for which the substructures do not seem to have been completely erased. These results are in agreement with simulations adopting low surface densities and supervirial conditions.
Locally Weighted Ensemble Clustering.
Huang, Dong; Wang, Chang-Dong; Lai, Jian-Huang
2018-05-01
Due to its ability to combine multiple base clusterings into a probably better and more robust clustering, the ensemble clustering technique has been attracting increasing attention in recent years. Despite this significant success, one limitation of most existing ensemble clustering methods is that they generally treat all base clusterings equally regardless of their reliability, which makes them vulnerable to low-quality base clusterings. Although some efforts have been made to (globally) evaluate and weight the base clusterings, these methods tend to view each base clustering as an individual and neglect the local diversity of clusters inside the same base clustering. It remains an open problem how to evaluate the reliability of clusters and exploit the local diversity in the ensemble to enhance the consensus performance, especially in the case where there is no access to data features or specific assumptions on data distribution. To address this, in this paper we propose a novel ensemble clustering approach based on ensemble-driven cluster uncertainty estimation and a local weighting strategy. In particular, the uncertainty of each cluster is estimated by considering the cluster labels in the entire ensemble via an entropic criterion. A novel ensemble-driven cluster validity measure is introduced, and a locally weighted co-association matrix is presented to serve as a summary for the ensemble of diverse clusters. With the local diversity in ensembles exploited, two novel consensus functions are further proposed. Extensive experiments on a variety of real-world datasets demonstrate the superiority of the proposed approach over the state-of-the-art.
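As a rough sketch of the entropic idea (the paper's exact formulation differs in its validity measure and normalization, and the function names here are ours), one can score each cluster by how consistently the rest of the ensemble keeps its members together, and weight that cluster's votes in the co-association matrix accordingly:

```python
import math
from collections import Counter

def cluster_entropy(members, other_labelings):
    """Mean entropy (bits) of how one cluster's members are split by
    each of the other base clusterings; 0 = fully consistent."""
    ents = []
    for labels in other_labelings:
        counts = Counter(labels[i] for i in members)
        n = len(members)
        ents.append(-sum(c / n * math.log2(c / n) for c in counts.values()))
    return sum(ents) / len(ents)

def weighted_coassociation(base_clusterings, theta=1.0):
    """Co-association matrix where each cluster's vote is weighted by
    its reliability exp(-entropy / theta)."""
    n = len(base_clusterings[0])
    M = [[0.0] * n for _ in range(n)]
    for k, labels in enumerate(base_clusterings):
        others = base_clusterings[:k] + base_clusterings[k + 1:]
        clusters = {}
        for i, lab in enumerate(labels):
            clusters.setdefault(lab, []).append(i)
        for members in clusters.values():
            w = math.exp(-cluster_entropy(members, others) / theta)
            for i in members:
                for j in members:
                    M[i][j] += w / len(base_clusterings)
    return M

# Toy ensemble: point 3's cluster memberships are more consistent
# across the three base clusterings than point 0's.
base = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]
M = weighted_coassociation(base)
```

A consensus clustering would then be obtained by running, e.g., hierarchical clustering on 1 - M, which is where the paper's two consensus functions come in.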
2012-01-01
Background Estimation of vaccination coverage at the local level is essential to identify communities that may require additional support. Cluster surveys can be used in resource-poor settings, when population figures are inaccurate. To be feasible, cluster samples need to be small, without losing robustness of results. The clustered LQAS (CLQAS) approach has been proposed as an alternative, as smaller sample sizes are required. Methods We explored (i) the efficiency of cluster surveys of decreasing sample size through bootstrapping analysis and (ii) the performance of CLQAS under three alternative sampling plans to classify local vaccination coverage (VC), using data from a survey carried out in Mali after mass vaccination against meningococcal meningitis group A. Results VC estimates provided by a 10 × 15 cluster survey design were reasonably robust. We used them to classify health areas in three categories and guide mop-up activities: i) health areas not requiring supplemental activities; ii) health areas requiring additional vaccination; iii) health areas requiring further evaluation. As sample size decreased (from 10 × 15 to 10 × 3), standard errors of the VC and ICC estimates were increasingly unstable. Results of CLQAS simulations were not accurate for most health areas, with an overall risk of misclassification greater than 0.25 in one health area out of three. It was greater than 0.50 in one health area out of two under two of the three sampling plans. Conclusions Small sample cluster surveys (10 × 15) are acceptably robust for classification of VC at local level. We do not recommend the CLQAS method as currently formulated for evaluating vaccination programmes. PMID:23057445
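The bootstrapping analysis of survey precision can be sketched as follows, using hypothetical data (the actual study resampled its Mali survey, and also examined ICC and CLQAS misclassification, which this sketch omits):

```python
import random

def vc_estimate(clusters):
    """Vaccination coverage pooled over clusters (1 = vaccinated)."""
    total = sum(len(c) for c in clusters)
    return sum(sum(c) for c in clusters) / total

def bootstrap_se(clusters, n_boot=1000, seed=7):
    """Standard error of VC from resampling whole clusters with
    replacement, mirroring the survey's cluster design."""
    rng = random.Random(seed)
    est = []
    for _ in range(n_boot):
        est.append(vc_estimate([rng.choice(clusters) for _ in clusters]))
    m = sum(est) / n_boot
    return (sum((e - m) ** 2 for e in est) / (n_boot - 1)) ** 0.5

# Hypothetical 10 x 15 design: 10 clusters of 15 children,
# true coverage around 80%.
rng = random.Random(0)
survey = [[1 if rng.random() < 0.8 else 0 for _ in range(15)]
          for _ in range(10)]
vc = vc_estimate(survey)
se = bootstrap_se(survey)
```

Repeating the same computation with clusters truncated to 3 children each would show the growing instability of the standard error that the abstract reports for the 10 × 3 design.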
NASA Astrophysics Data System (ADS)
Zhang, Yu-Ying; Reiprich, Thomas H.; Schneider, Peter; Clerc, Nicolas; Merloni, Andrea; Schwope, Axel; Borm, Katharina; Andernach, Heinz; Caretta, César A.; Wu, Xiang-Ping
2017-03-01
We present the relation of X-ray luminosity versus dynamical mass for 63 nearby clusters of galaxies in a flux-limited sample, the HIghest X-ray FLUx Galaxy Cluster Sample (HIFLUGCS, consisting of 64 clusters). The luminosity measurements are obtained based on 1.3 Ms of clean XMM-Newton data and ROSAT pointed observations. The masses are estimated using optical spectroscopic redshifts of 13647 cluster galaxies in total. We classify clusters into disturbed and undisturbed based on a combination of the X-ray luminosity concentration and the offset between the brightest cluster galaxy and the X-ray flux-weighted center. Given sufficient numbers (i.e., ≥45) of member galaxies when the dynamical masses are computed, the luminosity versus mass relations agree between the disturbed and undisturbed clusters. The cool-core clusters still dominate the scatter in the luminosity versus mass relation even when a core-corrected X-ray luminosity is used, which indicates that the scatter of this scaling relation mainly reflects the structure formation history of the clusters. As shown by the clusters with only few spectroscopically confirmed members, the dynamical masses can be underestimated and thus lead to a biased scaling relation. To investigate the potential of spectroscopic surveys to follow up high-redshift galaxy clusters or groups observed in X-ray surveys for identification and mass calibration, we carried out Monte Carlo resampling of the cluster galaxy redshifts and calibrated the uncertainties of the redshift and dynamical mass estimates when only reduced numbers of galaxy redshifts per cluster are available. The resampling considers the SPIDERS and 4MOST configurations, designed for the follow-up of the eROSITA clusters, and was carried out for each cluster in the sample at the actual cluster redshift as well as at the assigned input cluster redshifts of 0.2, 0.4, 0.6, and 0.8.
To follow up very distant clusters or groups, we also carried out the mass calibration based on the resampling with only ten redshifts per cluster, and redshift calibration based on the resampling with only five and ten redshifts per cluster, respectively. Our results demonstrate the power of combining upcoming X-ray and optical spectroscopic surveys for mass calibration of clusters. The scatter in the dynamical mass estimates for the clusters with at least ten members is within 50%.
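The effect of reduced member counts on dynamical mass estimates can be sketched with a toy virial estimator. The scaling used here, M ≈ 5σ²R/G, and all numbers are illustrative assumptions, not the paper's mass calibration:

```python
import numpy as np

G = 4.301e-9  # gravitational constant in Mpc * (km/s)^2 / M_sun

def dynamical_mass(velocities, radius_mpc=1.0):
    """Toy virial estimate M ~ 5 * sigma^2 * R / G from the
    line-of-sight velocity dispersion (km/s)."""
    sigma = np.std(velocities, ddof=1)
    return 5.0 * sigma**2 * radius_mpc / G

rng = np.random.default_rng(42)
# Hypothetical cluster with 200 members and sigma = 900 km/s.
members = rng.normal(0.0, 900.0, size=200)
full_mass = dynamical_mass(members)

# Resample only ten member redshifts per realization, in the spirit of
# the SPIDERS/4MOST-style experiment, to gauge the scatter of the
# estimate when few redshifts per cluster are available.
sub = np.array([dynamical_mass(rng.choice(members, 10, replace=False))
                for _ in range(500)])
scatter = np.std(sub) / full_mass
```

With only ten velocities the dispersion, and hence the mass, is estimated with a relative scatter of several tens of percent, consistent in spirit with the ~50% figure quoted above.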
ASCA Temperature Maps for Merging and Relaxed Clusters and Physics of the Cluster Gas
NASA Technical Reports Server (NTRS)
Markevitch, M.; Sarazin, C.; Nevalainen, J.; Vikhlinin, A.; Forman, W.
1999-01-01
ASCA temperature maps for several galaxy clusters undergoing strong mergers will be presented. From these maps, it is possible to estimate velocities of the colliding subclusters. I will discuss several interesting implications of these estimates for the physics of the cluster gas and the shape of the gravitational potential. I will also present temperature maps and profiles for several relaxed clusters selected for X-ray mass determination, and present the mass values derived without the assumption of isothermality. The accurate mass-temperature and luminosity-temperature relations will be discussed. This talk will review how AXAF will revolutionize X-ray astronomy through its radically better imaging and spectroscopic resolution. Examples from many fields of astrophysics will be given.
Calibrating First-Order Strong Lensing Mass Estimates in Clusters of Galaxies
NASA Astrophysics Data System (ADS)
Reed, Brendan; Remolian, Juan; Sharon, Keren; Li, Nan; SPT Clusters Collaboration
2018-01-01
We investigate methods to reduce the statistical and systematic errors inherent in using the Einstein Radius as a first-order mass estimate in strong lensing galaxy clusters. By finding an empirical universal calibration function, we aim to enable a first-order mass estimate of large cluster data sets in a fraction of the time and effort of full-scale strong lensing mass modeling. We use 74 simulated clusters from the Argonne National Laboratory in a lens redshift slice of [0.159, 0.667] with various source redshifts in the range of [1.23, 2.69]. From the simulated density maps, we calculate the exact mass enclosed within the Einstein Radius. We find that the mass inferred from the Einstein Radius alone produces an error width of ~39% with respect to the true mass. We explore an array of polynomial and exponential correction functions with dependence on cluster redshift and projected radii of the lensed images, aiming to reduce the statistical and systematic uncertainty. We find that the error on the mass inferred from the Einstein Radius can be reduced significantly by using a universal correction function. Our study has implications for current and future large galaxy cluster surveys aiming to measure cluster mass, and the mass-concentration relation.
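A minimal version of such a correction-function fit, on invented mock numbers rather than the Argonne simulations, might look like this: fit a polynomial in lens redshift to the ratio of true mass to Einstein-radius mass, then apply it as a multiplicative correction.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical mock sample: 74 lens redshifts in [0.159, 0.667] and the
# ratio of true mass to the mass inferred from the Einstein radius
# alone; the linear trend and noise level are assumptions.
z_lens = rng.uniform(0.159, 0.667, size=74)
ratio = 1.0 + 0.6 * (z_lens - 0.4) + rng.normal(scale=0.05, size=74)

# Quadratic correction function f(z): corrected mass = f(z) * M_Einstein.
coeffs = np.polyfit(z_lens, ratio, deg=2)
f = np.poly1d(coeffs)

residual_before = np.std(ratio - 1.0)       # no correction applied
residual_after = np.std(ratio - f(z_lens))  # after the correction
```

The paper's search additionally considers exponential forms and a dependence on the projected radii of the lensed images; this sketch keeps only the redshift dependence.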
Revisiting Abell 2744: a powerful synergy of GLASS spectroscopy and HFF photometry
NASA Astrophysics Data System (ADS)
Wang, Xin; Wang
We present new emission line identifications and improve the lensing reconstruction of the mass distribution of galaxy cluster Abell 2744 using the Grism Lens-Amplified Survey from Space (GLASS) spectroscopy and the Hubble Frontier Fields (HFF) imaging. We performed blind and targeted searches for faint line emitters on all objects, including the arc sample, within the field of view (FoV) of GLASS prime pointings. We report 55 high quality spectroscopic redshifts, 5 of which are for arc images. We also present an extensive analysis based on the HFF photometry, measuring the colors and photometric redshifts of all objects within the FoV, and comparing the spectroscopic and photometric redshift estimates. In order to improve the lens model of Abell 2744, we develop a rigorous algorithm to screen arc images, based on their colors and morphology, and selecting the most reliable ones to use. As a result, 25 systems (corresponding to 72 images) pass the screening process and are used to reconstruct the gravitational potential of the cluster pixellated on an adaptive mesh. The resulting total mass distribution is compared with a stellar mass map obtained from the Spitzer Frontier Fields data in order to study the relative distribution of stars and dark matter in the cluster.
Soft context clustering for F0 modeling in HMM-based speech synthesis
NASA Astrophysics Data System (ADS)
Khorram, Soheil; Sameti, Hossein; King, Simon
2015-12-01
This paper proposes the use of a new binary decision tree, which we call a soft decision tree, to improve generalization performance compared to the conventional `hard' decision tree method that is used to cluster context-dependent model parameters in statistical parametric speech synthesis. We apply the method to improve the modeling of fundamental frequency, which is an important factor in synthesizing natural-sounding high-quality speech. Conventionally, hard decision tree-clustered hidden Markov models (HMMs) are used, in which each model parameter is assigned to a single leaf node. However, this `divide-and-conquer' approach leads to data sparsity, with the consequence that it suffers from poor generalization, meaning that it is unable to accurately predict parameters for models of unseen contexts: the hard decision tree is a weak function approximator. To alleviate this, we propose the soft decision tree, which is a binary decision tree with soft decisions at the internal nodes. In this soft clustering method, internal nodes select both their children with certain membership degrees; therefore, each node can be viewed as a fuzzy set with a context-dependent membership function. The soft decision tree improves model generalization and provides a superior function approximator because it is able to assign each context to several overlapped leaves. In order to use such a soft decision tree to predict the parameters of the HMM output probability distribution, we derive the smoothest (maximum entropy) distribution which captures all partial first-order moments and a global second-order moment of the training samples. Employing such a soft decision tree architecture with maximum entropy distributions, a novel speech synthesis system is trained using maximum likelihood (ML) parameter re-estimation and synthesis is achieved via maximum output probability parameter generation. 
In addition, a soft decision tree construction algorithm optimizing a log-likelihood measure is developed. Both subjective and objective evaluations were conducted and indicate a considerable improvement over the conventional method.
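A single soft internal node can be sketched as follows. This is a toy illustration of the gating idea only: the parameters and leaf values are invented, and a real system learns them by maximum likelihood and composes many such nodes over rich context features.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class SoftNode:
    """One internal node of a soft decision tree: instead of a hard
    yes/no split on a context feature, both children receive the
    sample, with membership degrees g and 1 - g."""
    def __init__(self, weight, bias, left_value, right_value):
        self.weight, self.bias = weight, bias
        self.left_value, self.right_value = left_value, right_value

    def membership(self, x):
        return sigmoid(self.weight * x + self.bias)

    def predict(self, x):
        g = self.membership(x)  # soft gate in (0, 1)
        return g * self.left_value + (1.0 - g) * self.right_value

# Hypothetical leaf means for log-F0 in two context classes.
node = SoftNode(weight=4.0, bias=-2.0, left_value=5.3, right_value=4.9)
```

A hard split would jump between 5.3 and 4.9 at the decision boundary; the soft node interpolates smoothly, which is the mechanism behind the improved generalization for unseen contexts described above.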
An astrophysics data program investigation of cluster evolution
NASA Technical Reports Server (NTRS)
Kellogg, Edwin M.
1990-01-01
A preliminary status report is given on studies using the Einstein x ray observations of distant clusters of galaxies that are also candidates for gravitational lenses. The studies will determine the location and surface brightness distribution of the x ray emission from clusters associated with selected gravitational lenses. The x ray emission comes from hot gas that traces out the total gravitational potential in the cluster, so its distribution is approximately the same as the mass distribution causing gravitational lensing. Core radii and x ray virial masses can be computed for several of the brighter Einstein sources, and preliminary results are presented on A2218. Preliminary status is also reported on a study of the optical data from 0024+16. A provisional value of 1800 to 2200 km/s for the equivalent velocity dispersion is obtained. The ultimate objective is to extract the mass of the gravitational lens, and perhaps more detailed information on the distribution of matter as warranted. A survey of the Einstein archive shows that the clusters A520, A1704, 3C295, A2397, A1722, SC5029-247, A3186 and A370 have enough x ray counts observed to warrant more detailed optical observations of arcs for comparison. Mass estimates for these clusters can therefore be obtained from three independent sources: the length scale (core radius) that characterizes the density dropoff of the x ray emitting hot gas away from its center, the velocity dispersion of the galaxies moving in the cluster potential, and gravitational bending of light by the total cluster mass. This study will allow the comparison of these three techniques and ultimately improve the knowledge of cluster masses.
Nair, Nirmala; Tripathy, Prasanta; Sachdev, Harshpal S; Bhattacharyya, Sanghita; Gope, Rajkumar; Gagrai, Sumitra; Rath, Shibanand; Rath, Suchitra; Sinha, Rajesh; Roy, Swati Sarbani; Shewale, Suhas; Singh, Vijay; Srivastava, Aradhana; Pradhan, Hemanta; Costello, Anthony; Copas, Andrew; Skordis-Worrall, Jolene; Haghparast-Bidgoli, Hassan; Saville, Naomi; Prost, Audrey
2015-04-15
Child stunting (low height-for-age) is a marker of chronic undernutrition and predicts children's subsequent physical and cognitive development. Around one third of the world's stunted children live in India. Our study aims to assess the impact, cost-effectiveness, and scalability of a community intervention with a government-proposed community-based worker to improve growth in children under two in rural India. The study is a cluster randomised controlled trial in two rural districts of Jharkhand and Odisha (eastern India). The intervention tested involves a community-based worker carrying out two activities: (a) one home visit to all pregnant women in the third trimester, followed by subsequent monthly home visits to all infants aged 0-24 months to support appropriate feeding, infection control, and care-giving; (b) a monthly women's group meeting using participatory learning and action to catalyse individual and community action for maternal and child health and nutrition. Both intervention and control clusters also receive an intervention to strengthen Village Health Sanitation and Nutrition Committees. The unit of randomisation is a purposively selected cluster of approximately 1000 population. A total of 120 geographical clusters covering an estimated population of 121,531 were randomised to two trial arms: 60 clusters in the intervention arm receive home visits, group meetings, and support to Village Health Sanitation and Nutrition Committees; 60 clusters in the control arm receive support to Committees only. The study participants are pregnant women identified in the third trimester of pregnancy and their children (n = 2520). Mothers and their children are followed up at seven time points: during pregnancy, within 72 hours of delivery, and at 3, 6, 9, 12 and 18 months after birth. The trial's primary outcome is children's mean length-for-age Z scores at 18 months. 
Secondary outcomes include wasting and underweight at all time points, birth weight, growth velocity, feeding, infection control, and care-giving practices. Additional qualitative and quantitative data are collected for process and economic evaluations. This trial will contribute to evidence on effective strategies to improve children's growth in India. ISRCTN registry number 51505201; Clinical Trials Registry of India number 2014/06/004664.
An improved level set method for brain MR images segmentation and bias correction.
Chen, Yunjie; Zhang, Jianwei; Macione, Jim
2009-10-01
Intensity inhomogeneities cause considerable difficulty in the quantitative analysis of magnetic resonance (MR) images. Thus, bias field estimation is a necessary step before quantitative analysis of MR data can be undertaken. This paper presents a variational level set approach to bias correction and segmentation for images with intensity inhomogeneities. Our method is based on the observation that intensities in a relatively small local region are separable, despite the inseparability of the intensities in the whole image caused by the overall intensity inhomogeneity. We first define a localized K-means-type clustering objective function for image intensities in a neighborhood around each point. The cluster centers in this objective function have a multiplicative factor that estimates the bias within the neighborhood. The objective function is then integrated over the entire domain to define the data term in the level set framework. Our method is able to capture bias fields of quite general profiles. Moreover, it is robust to initialization, and thereby allows fully automated applications. The proposed method has been used for images of various modalities with promising results.
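The alternating structure of such a localized clustering-with-bias scheme can be sketched in one dimension. This is a simplified illustration of the idea, not the authors' variational level set formulation: the bias here is piecewise-constant over windows rather than a smooth field, and the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D "image": two tissue classes (true centers 1.0 and 2.0)
# corrupted by a smooth multiplicative bias field and noise.
n, n_win = 400, 20
x = np.linspace(0.0, 1.0, n)
true_bias = 1.0 + 0.3 * x
labels_true = rng.integers(0, 2, size=n)
I = true_bias * np.where(labels_true == 0, 1.0, 2.0) \
    + rng.normal(scale=0.03, size=n)

win = np.arange(n) * n_win // n            # window index of each point
c = np.array([I.min(), I.max()])           # initial cluster centers
b = np.ones(n_win)                         # initial per-window bias

for _ in range(20):
    bi = b[win]
    # 1. Assign each point to the nearest bias-scaled center.
    lab = np.argmin((I[:, None] - bi[:, None] * c[None, :]) ** 2, axis=1)
    # 2. Least-squares bias in each local window, given assignments.
    for w in range(n_win):
        m = win == w
        cw = c[lab[m]]
        b[w] = (I[m] * cw).sum() / (cw ** 2).sum()
    # 3. Update the shared centers, given the bias field.
    bi = b[win]
    for k in range(2):
        m = lab == k
        c[k] = (I[m] * bi[m]).sum() / (bi[m] ** 2).sum()
```

Because only the product b·c is observable, the recovered bias and centers share an arbitrary common scale; the center ratio and the bias trend are the identifiable quantities, which is also why such methods report the corrected image rather than the raw bias values.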
Structure of clusters with bimodal distribution of galaxy line-of-sight velocities III: A1831
NASA Astrophysics Data System (ADS)
Kopylov, A. I.; Kopylova, F. G.
2010-07-01
We study the A1831 cluster within the framework of our program of investigation of galaxy clusters with bimodal velocity distributions (i.e., clusters where the velocities of subsystems differ by more than Δcz ~ 3000 km/s). We identify two subsystems in this cluster: A1831A (cz = 18970 km/s) and A1831B (cz = 22629 km/s), and directly estimate the distances to these subsystems using three methods applied to early-type galaxies: the Kormendy relation, the photometric plane, and the fundamental plane. To this end, we use the results of our observations made with the 1-m telescope of the Special Astrophysical Observatory of the Russian Academy of Sciences and the data adopted from the SDSS DR6 catalog. We confirmed at a 99% confidence level that (1) the two subsystems are located at different distances, which are close to their Hubble distances, and (2) the two subsystems are located behind one another along the line of sight and are not gravitationally bound to each other. Both clusters have a complex internal structure, which makes it difficult to determine their dynamical parameters. Our estimates of the velocity dispersions and masses of the two clusters, 480 km/s and 1.9 × 10^14 M⊙ for A1831A, and 952 km/s and 1.4 × 10^15 M⊙ for A1831B, should be viewed as upper limits. At least three spatially and kinematically distinct groups of galaxies can be identified in the foreground cluster A1831A, and this fact is indicative of its incomplete dynamical relaxation. Neither can we rule out the possibility of a random projection. The estimate of the mass of the main cluster A1831B based on the dispersion of the line-of-sight velocities of galaxies is two to three times greater than the independent mass estimates based on the total K-band luminosity, temperature, and luminosity of the X-ray gas of the cluster.
This fact, combined with the peculiarities of its kinematical structure, leads us to conclude that the cluster is in a dynamically active state: galaxies and groups of galaxies with large line-of-sight velocities relative to the center of the cluster accrete onto the virialized nucleus of the cluster (possibly, along the filament directed close to the line of sight).
Return period estimates for European windstorm clusters: a multi-model perspective
NASA Astrophysics Data System (ADS)
Renggli, Dominik; Zimmerli, Peter
2017-04-01
Clusters of storms over Europe can lead to very large aggregated losses. Realistic return period estimates for such clusters are therefore of vital interest to the (re)insurance industry. Such return period estimates are usually derived from historical storm activity statistics of the last 30 to 40 years. However, climate models provide an alternative source, potentially representing thousands of simulated storm seasons. In this study, we made use of decadal hindcast data from eight different climate models in the CMIP5 archive. We used an objective tracking algorithm to identify individual windstorms in the climate model data. The algorithm also computes a (population density weighted) Storm Severity Index (SSI) for each of the identified storms (both on a continental and more regional basis). We derived return period estimates for the cluster seasons 1990, 1999, 2013/2014 and 1884 in the following way: For each climate model, we extracted two different exceedance frequency curves. The first describes the exceedance frequency (or the return period as the inverse of it) of a given SSI level due to an individual storm occurrence. The second describes the exceedance frequency of the seasonally aggregated SSI level (i.e. the sum of the SSI values of all storms in a given season). Starting from appropriate return period assumptions for each individual storm of a historical cluster (e.g. Anatol, Lothar and Martin in 1999) and using the first curve, we extracted the SSI levels at the corresponding return periods. Summing these SSI values results in the seasonally aggregated SSI value. Combining this with the second (aggregated) exceedance frequency curve results in a return period estimate of the historical cluster season. Since we do this for each model separately, we obtain eight different return period estimates for each historical cluster.
In this way, we obtained the following return period estimates: 50 to 80 years for the 1990 season, 20 to 45 years for the 1999 season, 3 to 4 years for the 2013/2014 season, and 14 to 16 years for the 1884 season. More detailed results show substantial variation between five different regions (UK, France, Germany, Benelux and Scandinavia), as expected from the path and footprints of the different events. For example, the 1990 season is estimated to be well beyond a 100-year season for Germany and Benelux. 1999 clearly was an extreme season for France, whereas the 1884 season was very disruptive for the UK. Such return period estimates can be used as an independent benchmark for other approaches quantifying clustering of European windstorms. The study might also serve as an example for deriving similar risk measures for other climate-related perils from a robust, publicly available data source.
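The core of the method, reading a return period off an empirical exceedance curve of seasonally aggregated SSI, can be sketched as follows. The storm seasons here are synthetic stand-ins; the study used storms tracked in CMIP5 decadal hindcasts.

```python
import random

def return_period(seasonal_ssi, level):
    """Empirical return period (in seasons) of an aggregated SSI at or
    above `level`, from a record of simulated storm seasons."""
    n_exceed = sum(s >= level for s in seasonal_ssi)
    if n_exceed == 0:
        raise ValueError("level beyond the simulated record")
    return len(seasonal_ssi) / n_exceed

# Hypothetical record: per-storm SSI values for 1000 simulated seasons,
# each season containing 0 to 6 storms.
rng = random.Random(5)
seasons = [[rng.expovariate(1.0) for _ in range(rng.randint(0, 6))]
           for _ in range(1000)]
aggregated = [sum(s) for s in seasons]  # seasonally aggregated SSI

rp_cluster = return_period(aggregated, level=8.0)
```

In the study, the level fed into the aggregated curve is itself obtained by summing the SSI values read off the per-storm exceedance curve at the assumed return periods of the historical storms; this sketch shows only the final lookup.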
Combining Image Processing with Signal Processing to Improve Transmitter Geolocation Estimation
2014-03-27
transmitter by searching a grid of possible transmitter locations within the image region. At each evaluated grid point, theoretical TDOA values are computed...requires converting the image to a grayscale intensity image. This allows efficient manipulation of data and ease of comparison among pixel values. The...cluster of redundant y values along the top edge of an ideal rectangle. The same is true for the bottom edge, as well as for the x values along the
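The grid-search step described in the fragment above can be sketched as follows. The sensor layout, grid spacing, and noise-free measurement are invented for illustration only.

```python
import numpy as np

C = 299792458.0  # speed of light, m/s

# Three hypothetical sensors and a transmitter somewhere in the region.
sensors = np.array([[0.0, 0.0], [10000.0, 0.0], [0.0, 10000.0]])
truth = np.array([6200.0, 3100.0])

def tdoas(p):
    """Theoretical TDOAs of each sensor relative to sensor 0 for a
    source at position p."""
    d = np.linalg.norm(sensors - p, axis=1)
    return (d[1:] - d[0]) / C

measured = tdoas(truth)  # noise-free "measurement" for the sketch

# Evaluate every grid point and keep the one whose theoretical TDOAs
# best match the measurement (least-squares residual).
xs = np.arange(0.0, 10001.0, 100.0)
ys = np.arange(0.0, 10001.0, 100.0)
best, best_err = None, np.inf
for gx in xs:
    for gy in ys:
        err = np.sum((tdoas(np.array([gx, gy])) - measured) ** 2)
        if err < best_err:
            best, best_err = np.array([gx, gy]), err
```

With noisy measurements the residual surface flattens along the TDOA hyperbolas, which is where fusing in image-derived location constraints, as the document proposes, can help.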
Improvements in Ionized Cluster-Beam Deposition
NASA Technical Reports Server (NTRS)
Fitzgerald, D. J.; Compton, L. E.; Pawlik, E. V.
1986-01-01
Lower temperatures result in higher purity and fewer equipment problems. In cluster-beam deposition, clusters of atoms are formed by an adiabatic-expansion nozzle; with proper nozzle design, the expanding vapor cools sufficiently to become supersaturated and form clusters of the material to be deposited. Clusters are ionized, accelerated in an electric field, and then impacted on a substrate where films form. The improved cluster-beam technique is useful for deposition of refractory metals.
Range of plasma ions in cold cluster gases near the critical point
NASA Astrophysics Data System (ADS)
Zhang, G.; Quevedo, H. J.; Bonasera, A.; Donovan, M.; Dyer, G.; Gaul, E.; Guardo, G. L.; Gulino, M.; La Cognata, M.; Lattuada, D.; Palmerini, S.; Pizzone, R. G.; Romano, S.; Smith, H.; Trippella, O.; Anzalone, A.; Spitaleri, C.; Ditmire, T.
2017-05-01
We measure the range of plasma ions in cold cluster gases using the Petawatt laser at the University of Texas at Austin. The produced plasma propagated in all directions, some of it hitting the cold cluster gas not illuminated by the laser. From the ratio of the measured ion distributions at different angles, we can estimate the range of the ions in the cold cluster gas. It is much smaller than estimated using popular models, which take into account only the slowing down of charged particles in uniform matter. We discuss the ion range in systems prepared near a liquid-gas phase transition.
Deletion Diagnostics for Alternating Logistic Regressions
Preisser, John S.; By, Kunthel; Perin, Jamie; Qaqish, Bahjat F.
2013-01-01
Deletion diagnostics are introduced for the regression analysis of clustered binary outcomes estimated with alternating logistic regressions, an implementation of generalized estimating equations (GEE) that estimates regression coefficients in a marginal mean model and in a model for the intracluster association given by the log odds ratio. The diagnostics are developed within an estimating equations framework that recasts the estimating functions for association parameters based upon conditional residuals into equivalent functions based upon marginal residuals. Extensions of earlier work on GEE diagnostics follow directly, including computational formulae for one-step deletion diagnostics that measure the influence of a cluster of observations on the estimated regression parameters and on the overall marginal mean or association model fit. The diagnostic formulae are evaluated with simulation studies and with an application concerning an assessment of factors associated with health maintenance visits in primary care medical practices. The application and the simulations demonstrate that the proposed cluster-deletion diagnostics for alternating logistic regressions are good approximations of their exact fully iterated counterparts. PMID:22777960
Fast clustering using adaptive density peak detection.
Wang, Xiao-Feng; Xu, Yifan
2017-12-01
Common limitations of clustering methods include slow algorithm convergence, instability caused by pre-specifying a number of intrinsic parameters, and lack of robustness to outliers. A recent clustering approach proposed a fast search algorithm for cluster centers based on their local densities. However, the selection of the key intrinsic parameters in the algorithm was not systematically investigated. It is relatively difficult to estimate the "optimal" parameters since the original definition of the local density in the algorithm is based on a truncated counting measure. In this paper, we propose a clustering procedure with adaptive density peak detection, where the local density is estimated through nonparametric multivariate kernel estimation. The model parameters can then be calculated from equations with statistical-theoretical justification. We also develop an automatic cluster centroid selection method that maximizes an average silhouette index. The advantage and flexibility of the proposed method are demonstrated through simulation studies and the analysis of a few benchmark gene expression data sets. The method runs in a single step without iteration, so it is fast and has great potential for big data analysis. A user-friendly R package, ADPclust, has been developed for public use.
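The density-peak scheme underlying this work (a local density for each point, the distance to the nearest higher-density point, and centers chosen to maximize both) can be sketched as below. This is a generic Python sketch with a Gaussian-kernel density, not the ADPclust package itself, and the bandwidth here is supplied by hand rather than estimated as in the paper.

```python
import numpy as np

def density_peak_cluster(X, bandwidth, n_clusters):
    """Cluster by density peaks: kernel density rho, distance delta to the
    nearest higher-density point, centers maximizing rho * delta."""
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    rho = np.exp(-(D / bandwidth) ** 2).sum(axis=1)   # smooth stand-in for truncated counts
    delta = np.empty(n)
    nearest_higher = np.full(n, -1)
    order = np.argsort(-rho)                          # indices by descending density
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = D[i].max()                     # densest point: conventional choice
        else:
            higher = order[:rank]
            j = higher[np.argmin(D[i, higher])]
            delta[i], nearest_higher[i] = D[i, j], j
    centers = np.argsort(-(rho * delta))[:n_clusters]
    labels = np.full(n, -1)
    for k, c in enumerate(centers):
        labels[c] = k
    for i in order:                                   # parents are labeled before children
        if labels[i] < 0:
            labels[i] = labels[nearest_higher[i]]
    return labels, centers
```

The single pass over `order` is what makes the approach non-iterative, which is the speed advantage the abstract emphasizes.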
Dyons near the transition temperature in SU(3) lattice gluodynamics
NASA Astrophysics Data System (ADS)
Bornyakov, V. G.; Ilgenfritz, E.-M.; Martemyanov, B. V.
2018-05-01
We study the topological structure of SU(3) lattice gluodynamics by cluster analysis. This methodological study is meant as preparation for full QCD. The topological charge density is becoming visible in the process of over-improved gradient flow, which is monitored by means of the inverse participation ratio. The flow is stopped at the moment when calorons dissociate into dyons due to the over-improved character of the underlying action. This gives the possibility to simultaneously detect all three dyonic constituents of KvBLL calorons in the gluonic field. The behavior of the average Polyakov loop (PL) under (over-improved) gradient flow could also serve as a diagnostics for the actual phase the configuration is belonging to. Time-like Abelian monopole currents and specific patterns of the local PL are correlated with the topological clusters. The spectrum of reconstructed cluster charges Q_cl corresponds to the phases. It is scattered around Q_cl ≈ ±1/3 in the confined phase, whereas it is Q_cl ≈ ±(0.5-0.7) for heavy dyons and |Q_cl| < 0.3 for light dyons in the deconfined phase. We estimate the density of heavy and light dyons at three values of temperature. We find that heavy dyons are increasingly suppressed with increasing temperature. The paper is dedicated to the memory of Michael Müller-Preussker who was a member of our research group for more than twenty years.
Hoddinott, John; Ahmed, Akhter; Karachiwalla, Naureen I; Roy, Shalini
2018-01-01
Behaviour change communication (BCC) can improve infant and young child nutrition (IYCN) knowledge, practices, and health outcomes. However, few studies have examined whether the improved knowledge persists after BCC activities end. This paper assesses the effect of nutrition-sensitive social protection interventions on IYCN knowledge in rural Bangladesh, both during and after intervention activities. We use data from two 2-year cluster randomised controlled trials that included nutrition BCC in some treatment arms. These data were collected at intervention baseline, midline, and endline, and 6-10 months after the intervention ended. We analyse data on IYCN knowledge from the same 2,341 women over these 4 survey rounds. We construct a number-correct score on 18 IYCN knowledge questions and assess whether the impact of the BCC changes over time for the different treatment groups. Effects are estimated using ordinary least squares accounting for the clustered design of the study. There are 3 main findings: First, the BCC improves IYCN knowledge substantially in the 1st year of the intervention; participants correctly answer 3.0-3.2 more questions (36% more) compared to the non-BCC groups. Second, the increase in knowledge between the 1st and 2nd year was smaller, an additional 0.7-0.9 correct answers. Third, knowledge persists; there are no significant decreases in IYCN knowledge 6-10 months after nutrition BCC activities ended. © 2017 The Authors. Maternal and Child Nutrition Published by John Wiley & Sons, Ltd.
AMMI adjustment for statistical analysis of an international wheat yield trial.
Crossa, J; Fox, P N; Pfeiffer, W H; Rajaram, S; Gauch, H G
1991-01-01
Multilocation trials are important for the CIMMYT Bread Wheat Program in producing high-yielding, adapted lines for a wide range of environments. This study investigated procedures for improving predictive success of a yield trial, grouping environments and genotypes into homogeneous subsets, and determining the yield stability of 18 CIMMYT bread wheats evaluated at 25 locations. Additive Main effects and Multiplicative Interaction (AMMI) analysis gave more precise estimates of genotypic yields within locations than means across replicates. This precision facilitated formation by cluster analysis of more cohesive groups of genotypes and locations for biological interpretation of interactions than occurred with unadjusted means. Locations were clustered into two subsets for which genotypes with positive interactions manifested in high, stable yields were identified. The analyses highlighted superior selections with both broad and specific adaptation.
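The AMMI adjustment referred to above (additive genotype and environment main effects plus a low-rank multiplicative interaction from an SVD of the residual matrix) can be sketched in a few lines. This is a generic illustration of the decomposition, not CIMMYT's analysis code.

```python
import numpy as np

def ammi_adjusted(Y, k=1):
    """AMMI estimate of genotype-by-environment cell means.

    Y: genotypes x environments matrix of means across replicates.
    Keeps the first k multiplicative interaction terms from the SVD.
    """
    grand = Y.mean()
    g_eff = Y.mean(axis=1, keepdims=True) - grand   # genotype main effects
    e_eff = Y.mean(axis=0, keepdims=True) - grand   # environment main effects
    additive = grand + g_eff + e_eff
    U, s, Vt = np.linalg.svd(Y - additive, full_matrices=False)
    interaction = (U[:, :k] * s[:k]) @ Vt[:k]       # retained GxE signal
    return additive + interaction
```

Discarding the trailing SVD components is what denoises the cell means relative to raw replicate averages, which is the source of the improved predictive success described above.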
Automatic Regionalization Algorithm for Distributed State Estimation in Power Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Dexin; Yang, Liuqing; Florita, Anthony
The deregulation of the power system and the incorporation of generation from renewable energy sources necessitates faster state estimation in the smart grid. Distributed state estimation (DSE) has become a promising and scalable solution to this urgent demand. In this paper, we investigate regionalization algorithms for the power system, a necessary step before distributed state estimation can be performed. To the best of the authors' knowledge, this is the first investigation of automatic regionalization (AR). We propose three spectral-clustering-based AR algorithms. Simulations show that our proposed algorithms outperform the two investigated manual regionalization cases. With the help of AR algorithms, we also show how the number of regions impacts the accuracy and convergence speed of the DSE and conclude that the number of regions needs to be chosen carefully to improve the convergence speed of DSEs.
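A minimal form of spectral-clustering regionalization is spectral bisection via the Fiedler vector of the network's graph Laplacian; the paper's algorithms generalize this to k regions with power-grid-specific edge weights, so treat the sketch below as illustrative only.

```python
import numpy as np

def spectral_bisect(A):
    """Split a network into 2 regions via the Fiedler vector.

    A: symmetric weighted adjacency matrix of the bus network.
    Returns a 0/1 region label per bus.
    """
    L = np.diag(A.sum(axis=1)) - A          # graph Laplacian
    vals, vecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    fiedler = vecs[:, 1]                    # eigenvector of 2nd-smallest eigenvalue
    return (fiedler > 0).astype(int)        # sign pattern defines the two regions
```

Applied recursively, or replaced by k-means on several leading eigenvectors, this yields the multi-region partitions over which a DSE can then run.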
NASA Astrophysics Data System (ADS)
Mahanta, Upakul; Goswami, Aruna; Duorah, Hiralal; Duorah, Kalpana
2017-08-01
Elemental abundance patterns of globular cluster stars can provide important clues for understanding cluster formation and early chemical evolution. The origin of the abundance patterns, however, still remains poorly understood. We have studied the impact of p-capture reaction cycles on the abundances of oxygen, sodium and aluminium, considering the carbon-nitrogen-oxygen-fluorine, neon-sodium and magnesium-aluminium nuclear reaction cycles in massive stars at temperatures in the range 2×10^7 to 10×10^7 K and a typical density of 10^2 g cm^-3. We have estimated abundances of oxygen, sodium and aluminium with respect to Fe, which are assumed to be ejected from those stars as rotation reaches a critical limit. These ejected elemental abundances are then compared with their counterparts observed in metal-poor evolved stars, mainly giants and red giants, of the globular clusters M3, M4, M13 and NGC 6752. We find excellent agreement in [O/Fe] between the estimated and observed abundance values for globular clusters M3 and M4, with a correlation coefficient above 0.9, and a strong linear correlation for the remaining two clusters, with a correlation coefficient above 0.7. The estimated [Na/Fe] has a correlation coefficient above 0.7, implying a strong correlation for all four globular clusters. [Al/Fe] also shows a strong correlation between the estimated and observed abundances for globular clusters M13 and NGC 6752, with a correlation coefficient above 0.7, whereas for globular cluster M4 the correlation is moderate, with a coefficient above 0.6. Possible sources of these discrepancies are discussed.
Remote sensing of a NTC radio source from a Cluster tilted spacecraft pair
NASA Astrophysics Data System (ADS)
Décréau, Pierrette; Kougblénou, Séna; Lointier, Guillaume; Rauch, Jean Louis; Trotignon, Jean Gabriel; Vallières, Xavier; Canu, Patrick; Rochel Grimald, Sandrine; El-Lemdani Mazouz, Farida; Darrouzet, Fabien
2014-05-01
The non-thermal continuum (NTC) radiation is a radio wave produced within the magnetosphere of a planet. It has been observed in space around Earth since the '70s, and within the magnetospheres of other planets since the late '80s. A new study using ESA's Cluster mission has shown improved precision in determining the source of various radio emissions produced by the Earth. The experiment involved tilting one of the four identical Cluster spacecraft to measure the electric field of this emission in three dimensions for the first time. Our analysis of an NTC case event pinpointed a small deviation from the generally assumed (circular) polarization of this emission. We show that classical triangulation, in this case using three of the spacecraft located thousands of kilometres apart, can lead to an erroneous source location. A second method, using the new 3D electric field measurements, indicated a source located along the plasmapause at medium geomagnetic latitude, far away from the source location estimated by triangulation. Cluster observations reveal that this NTC source emits from the flank of the plasmapause towards the polar cap. Understanding the source of NTC waves will help with the broader understanding of their generation, amplification, and propagation.
Penalized unsupervised learning with outliers
Witten, Daniela M.
2013-01-01
We consider the problem of performing unsupervised learning in the presence of outliers – that is, observations that do not come from the same distribution as the rest of the data. It is known that in this setting, standard approaches for unsupervised learning can yield unsatisfactory results. For instance, in the presence of severe outliers, K-means clustering will often assign each outlier to its own cluster, or alternatively may yield distorted clusters in order to accommodate the outliers. In this paper, we take a new approach to extending existing unsupervised learning techniques to accommodate outliers. Our approach is an extension of a recent proposal for outlier detection in the regression setting. We allow each observation to take on an “error” term, and we penalize the errors using a group lasso penalty in order to encourage most of the observations’ errors to exactly equal zero. We show that this approach can be used in order to develop extensions of K-means clustering and principal components analysis that result in accurate outlier detection, as well as improved performance in the presence of outliers. These methods are illustrated in a simulation study and on two gene expression data sets, and connections with M-estimation are explored. PMID:23875057
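A toy version of this penalized approach for K-means can be sketched as follows: alternate ordinary K-means steps on the error-adjusted data with a group-lasso (group soft-thresholding) update of the per-observation error terms, so that most errors are shrunk exactly to zero and the survivors flag outliers. The deterministic initialization and tuning constant are simplifications of the paper's proposal, not its actual algorithm.

```python
import numpy as np

def outlier_kmeans(X, k, lam, n_iter=30):
    """K-means with per-observation error terms under a group lasso penalty.

    Approximately minimizes sum_i ||x_i - mu_{c(i)} - e_i||^2 + lam * sum_i ||e_i||_2,
    in the spirit of Witten (2013). Observations with nonzero e_i are outliers.
    """
    n, p = X.shape
    E = np.zeros_like(X)
    centers = X[:k].copy()                       # naive deterministic init
    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        Z = X - E                                # error-adjusted data
        d = np.linalg.norm(Z[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
        R = X - centers[labels]                  # residuals from assigned centers
        norms = np.linalg.norm(R, axis=1, keepdims=True)
        shrink = np.maximum(0.0, 1.0 - lam / (2.0 * np.maximum(norms, 1e-12)))
        E = R * shrink                           # group soft-thresholding
    outliers = np.linalg.norm(E, axis=1) > 0
    return labels, centers, outliers
```

Because well-fitting observations get e_i = 0 exactly, the centroids end up essentially uncontaminated by the flagged points, which is the claimed robustness benefit.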
Braschel, Melissa C; Svec, Ivana; Darlington, Gerarda A; Donner, Allan
2016-04-01
Many investigators rely on previously published point estimates of the intraclass correlation coefficient rather than on their associated confidence intervals to determine the required size of a newly planned cluster randomized trial. Although confidence interval methods for the intraclass correlation coefficient that can be applied to community-based trials have been developed for a continuous outcome variable, fewer methods exist for a binary outcome variable. The aim of this study is to evaluate confidence interval methods for the intraclass correlation coefficient applied to binary outcomes in community intervention trials enrolling a small number of large clusters. Existing methods for confidence interval construction are examined and compared to a new ad hoc approach based on dividing clusters into a large number of smaller sub-clusters and subsequently applying existing methods to the resulting data. Monte Carlo simulation is used to assess the width and coverage of confidence intervals for the intraclass correlation coefficient based on Smith's large sample approximation of the standard error of the one-way analysis of variance estimator, an inverted modified Wald test for the Fleiss-Cuzick estimator, and intervals constructed using a bootstrap-t applied to a variance-stabilizing transformation of the intraclass correlation coefficient estimate. In addition, a new approach is applied in which clusters are randomly divided into a large number of smaller sub-clusters with the same methods applied to these data (with the exception of the bootstrap-t interval, which assumes large cluster sizes). These methods are also applied to a cluster randomized trial on adolescent tobacco use for illustration. When applied to a binary outcome variable in a small number of large clusters, existing confidence interval methods for the intraclass correlation coefficient provide poor coverage. 
However, confidence intervals constructed using the new approach combined with Smith's method provide nominal or close to nominal coverage when the intraclass correlation coefficient is small (<0.05), as is the case in most community intervention trials. This study concludes that when a binary outcome variable is measured in a small number of large clusters, confidence intervals for the intraclass correlation coefficient may be constructed by dividing existing clusters into sub-clusters (e.g. groups of 5) and using Smith's method. The resulting confidence intervals provide nominal or close to nominal coverage across a wide range of parameters when the intraclass correlation coefficient is small (<0.05). Application of this method should provide investigators with a better understanding of the uncertainty associated with a point estimator of the intraclass correlation coefficient used for determining the sample size needed for a newly designed community-based trial. © The Author(s) 2015.
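The sub-clustering device described above is straightforward to sketch: split each large cluster into sub-clusters of, say, 5 members, compute the one-way ANOVA ICC on the sub-clustered data, and form a Wald interval from Smith's large-sample variance approximation. The variance constant below is one commonly quoted form of Smith's (1956) result, and the splitting helper is our own illustration, not the authors' code.

```python
import numpy as np

def icc_anova(data):
    """One-way ANOVA ICC estimate; data is a (clusters x members) array."""
    k, m = data.shape
    grand = data.mean()
    msb = m * ((data.mean(axis=1) - grand) ** 2).sum() / (k - 1)
    msw = ((data - data.mean(axis=1, keepdims=True)) ** 2).sum() / (k * (m - 1))
    return (msb - msw) / (msb + (m - 1) * msw)

def smith_ci(data, z=1.96):
    """Wald CI from Smith's large-sample variance approximation (one common form)."""
    k, m = data.shape
    r = icc_anova(data)
    var = 2 * (1 - r) ** 2 * (1 + (m - 1) * r) ** 2 / (m * (m - 1) * (k - 1))
    half = z * np.sqrt(var)
    return max(r - half, -1 / (m - 1)), min(r + half, 1.0), r

def split_into_subclusters(cluster_data, sub_size):
    """Chop each large cluster into sub-clusters of size sub_size (the paper's
    ad hoc fix); any remainder smaller than sub_size is dropped here for brevity."""
    rows = []
    for c in cluster_data:
        c = np.asarray(c, dtype=float)
        for start in range(0, len(c) - sub_size + 1, sub_size):
            rows.append(c[start:start + sub_size])
    return np.vstack(rows)
```

Splitting inflates k and deflates m, which is what brings the Wald interval's coverage close to nominal when the true ICC is small.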
Migration in the shearing sheet and estimates for young open cluster migration
NASA Astrophysics Data System (ADS)
Quillen, Alice C.; Nolting, Eric; Minchev, Ivan; De Silva, Gayandhi; Chiappini, Cristina
2018-04-01
Using tracer particles embedded in self-gravitating shearing sheet N-body simulations, we investigate the distance in guiding centre radius that stars or star clusters can migrate in a few orbital periods. The standard deviations of guiding centre distributions and maximum migration distances depend on the Toomre or critical wavelength and the contrast in mass surface density caused by spiral structure. Comparison between our simulations and estimated guiding radii for a few young supersolar metallicity open clusters, including NGC 6583, suggests that the contrast in mass surface density in the solar neighbourhood has a standard deviation (of the surface density distribution) divided by the mean of about 1/4, larger than that measured using COBE data by Drimmel and Spergel. Our estimate is consistent with a standard deviation of ˜0.07 dex in the metallicities measured from high-quality spectroscopic data for 38 young open clusters (<1 Gyr) with mean galactocentric radius 7-9 kpc.
Accounting for twin births in sample size calculations for randomised trials.
Yelland, Lisa N; Sullivan, Thomas R; Collins, Carmel T; Price, David J; McPhee, Andrew J; Lee, Katherine J
2018-05-04
Including twins in randomised trials leads to non-independence or clustering in the data. Clustering has important implications for sample size calculations, yet few trials take this into account. Estimates of the intracluster correlation coefficient (ICC), or the correlation between outcomes of twins, are needed to assist with sample size planning. Our aims were to provide ICC estimates for infant outcomes, describe the information that must be specified in order to account for clustering due to twins in sample size calculations, and develop a simple tool for performing sample size calculations for trials including twins. ICCs were estimated for infant outcomes collected in four randomised trials that included twins. The information required to account for clustering due to twins in sample size calculations is described. A tool that calculates the sample size based on this information was developed in Microsoft Excel and in R as a Shiny web app. ICC estimates ranged between -0.12, indicating a weak negative relationship, and 0.98, indicating a strong positive relationship between outcomes of twins. Example calculations illustrate how the ICC estimates and sample size calculator can be used to determine the target sample size for trials including twins. Clustering among outcomes measured on twins should be taken into account in sample size calculations to obtain the desired power. Our ICC estimates and sample size calculator will be useful for designing future trials that include twins. Publication of additional ICCs is needed to further assist with sample size planning for future trials. © 2018 John Wiley & Sons Ltd.
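The kind of calculation the Excel/Shiny tool performs can be approximated by inflating the usual two-sample formula with a design effect. The formula below, with design effect 1 + p_twin x ICC (assuming both members of a twin pair are randomised to the same arm), is a simple common approximation, not necessarily the exact method implemented in the authors' calculator.

```python
import math
from statistics import NormalDist

def sample_size_with_twins(delta, sd, icc, p_twin, alpha=0.05, power=0.8):
    """Per-arm sample size (in infants) for comparing two means.

    delta: difference to detect; sd: outcome SD; icc: correlation between
    outcomes of twins; p_twin: proportion of infants belonging to a twin
    pair (both twins assumed randomised to the same arm).
    """
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    n_indep = 2 * (sd / delta) ** 2 * (z_a + z_b) ** 2
    deff = 1 + p_twin * icc       # each twin pair adds one rho*sigma^2 covariance term
    return math.ceil(n_indep * deff)
```

Note that a negative ICC, as observed for some outcomes above, gives a design effect below 1 and so a smaller required sample size.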
Decentralized cooperative TOA/AOA target tracking for hierarchical wireless sensor networks.
Chen, Ying-Chih; Wen, Chih-Yu
2012-11-08
This paper proposes a distributed method for cooperative target tracking in hierarchical wireless sensor networks. Leader-based information processing is employed to achieve object positioning, considering a cluster-based network topology. Random timers and local information are applied to adaptively select a sub-cluster for the localization task. The proposed energy-efficient tracking algorithm allows each sub-cluster member to locally estimate the target position with a Bayesian filtering framework and a neural network model, and then performs estimation fusion at the leader node with the covariance intersection algorithm. This paper evaluates the merits and trade-offs of the protocol design towards developing more efficient and practical algorithms for object position estimation.
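The covariance intersection step used at the leader node can be sketched as follows: the fused information matrix is a convex combination of the members' information matrices, with the weight chosen (here by grid search) to minimize the trace of the fused covariance. This is a generic sketch of covariance intersection for two estimates, not the authors' implementation.

```python
import numpy as np

def covariance_intersection(x1, P1, x2, P2, n_grid=101):
    """Fuse two estimates with unknown cross-correlation.

    Fused inverse covariance: P^-1 = w*P1^-1 + (1-w)*P2^-1, w in [0, 1].
    """
    I1, I2 = np.linalg.inv(P1), np.linalg.inv(P2)
    best = None
    for w in np.linspace(0.0, 1.0, n_grid):
        info = w * I1 + (1 - w) * I2
        if np.linalg.det(info) <= 0:
            continue                          # skip non-invertible combinations
        P = np.linalg.inv(info)
        if best is None or np.trace(P) < best[0]:
            x = P @ (w * I1 @ x1 + (1 - w) * I2 @ x2)
            best = (np.trace(P), x, P)
    return best[1], best[2]
```

Unlike a Kalman-style fusion, the result stays consistent even when the sub-cluster members' errors are correlated through shared measurements, which is why it suits this decentralized setting.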
Star clusters: age, metallicity and extinction from integrated spectra
NASA Astrophysics Data System (ADS)
González Delgado, Rosa M.; Cid Fernandes, Roberto
2010-01-01
Integrated optical spectra of star clusters in the Magellanic Clouds and a few Galactic globular clusters are fitted using high-resolution spectral models for single stellar populations. The goal is to estimate the age, metallicity and extinction of the clusters, and evaluate the degeneracies among these parameters. Several sets of evolutionary models, computed with recent high-spectral-resolution stellar libraries (MILES, GRANADA, STELIB), are used as inputs to the starlight code to perform the fits. The comparison of the results derived from this method with previous estimates available in the literature allows us to evaluate the pros and cons of each set of models in determining star cluster properties. In addition, we quantify the uncertainties associated with the age, metallicity and extinction determinations resulting from variance in the ingredients for the analysis.
High Performance Computing Based Parallel Hierarchical Modal Association Clustering (HPAR HMAC)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Patlolla, Dilip R; Surendran Nair, Sujithkumar; Graves, Daniel A.
For many applications, clustering is a crucial step in order to gain insight into the makeup of a dataset. The best approach to a given problem often depends on a variety of factors, such as the size of the dataset, time restrictions, and soft clustering requirements. The HMAC algorithm seeks to combine the strengths of 2 particular clustering approaches: model-based and linkage-based clustering. One particular weakness of HMAC is its computational complexity. HMAC is not practical for mega-scale data clustering. For high-definition imagery, a user would have to wait months or years for a result; for a 16-megapixel image, the estimated runtime skyrockets to over a decade! To improve the execution time of HMAC, it is reasonable to consider a multi-core implementation that utilizes available system resources. An existing implementation (Ray and Cheng 2014) divides the dataset into N partitions - one for each thread - prior to executing the HMAC algorithm. This implementation benefits from 2 types of optimization: parallelization and divide-and-conquer. By running each partition in parallel, the program is able to accelerate computation by utilizing more system resources. Although the parallel implementation provides considerable improvement over the serial HMAC, it still suffers from poor computational complexity, O(N^2). Once the maximum number of cores on a system is exhausted, the program exhibits slower behavior. We now consider a modification to HMAC that involves a recursive partitioning scheme. Our modification aims to exploit the divide-and-conquer benefits seen in the parallel HMAC implementation. At each level in the recursion tree, partitions are divided into 2 sub-partitions until a threshold size is reached. When a partition can no longer be divided without falling below the threshold size, the base HMAC algorithm is applied. This results in a significant speedup over the parallel HMAC.
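The recursive partitioning scheme described above can be sketched in a few lines; here the base HMAC step is stood in for by simply returning the leaf partitions, since the point is the divide-and-conquer control flow rather than the clustering itself.

```python
def recursive_partition(data, threshold):
    """Recursively halve a partition until it is at most `threshold` items.

    In the real scheme, each leaf would be handed to the base HMAC
    algorithm (possibly on its own core); here leaves are returned as-is.
    """
    if len(data) <= threshold:
        return [data]
    mid = len(data) // 2
    return (recursive_partition(data[:mid], threshold) +
            recursive_partition(data[mid:], threshold))
```

Because each leaf is bounded by the threshold, the quadratic base step runs on small inputs only, which is where the speedup over the flat N-partition scheme comes from.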
Impact of a star formation efficiency profile on the evolution of open clusters
NASA Astrophysics Data System (ADS)
Shukirgaliyev, B.; Parmentier, G.; Berczik, P.; Just, A.
2017-09-01
Aims: We study the effect of the instantaneous expulsion of residual star-forming gas on star clusters in which the residual gas has a density profile that is shallower than that of the embedded cluster. This configuration is expected if star formation proceeds with a given star-formation efficiency per free-fall time in a centrally concentrated molecular gas clump. Methods: We performed direct N-body simulations whose initial conditions were generated by the program "mkhalo" from the package "falcON", adapted for our models. Our model clusters initially have a Plummer profile and are in virial equilibrium with the gravitational potential of the cluster-forming clump. The residual gas contribution was computed based on a local-density-driven clustered star formation model. Our simulations included mass loss by stellar evolution and the tidal field of a host galaxy. Results: We find that a star cluster with a minimum global star formation efficiency (SFE) of 15 percent is able to survive instantaneous gas expulsion and to produce a bound cluster. Its violent relaxation lasts no longer than 20 Myr, independently of its global SFE and initial stellar mass. At the end of violent relaxation, the bound fractions of the surviving clusters with the same global SFEs are similar, regardless of their initial stellar mass. Their subsequent lifetime in the gravitational field of the Galaxy depends on their bound stellar masses. Conclusions: We therefore conclude that the critical SFE needed to produce a bound cluster is 15 percent, which is roughly half the earlier estimate of 33 percent. Thus we have improved the survival likelihood of young clusters after instantaneous gas expulsion. Young clusters can now survive instantaneous gas expulsion with global SFEs as low as those observed for embedded clusters in the solar neighborhood (15-30 percent). The reason is that the star cluster density profile is steeper than that of the residual gas.
However, in terms of the effective SFE, measured by the virial ratio of the cluster at gas expulsion, our results are in agreement with previous studies.
iGLASS: An Improvement to the GLASS Method for Estimating Species Trees from Gene Trees
Rosenberg, Noah A.
2012-01-01
Several methods have been designed to infer species trees from gene trees while taking into account gene tree/species tree discordance. Although some of these methods provide consistent species tree topology estimates under a standard model, most either do not estimate branch lengths or are computationally slow. An exception, the GLASS method of Mossel and Roch, is consistent for the species tree topology, estimates branch lengths, and is computationally fast. However, GLASS systematically overestimates divergence times, leading to biased estimates of species tree branch lengths. By assuming a multispecies coalescent model in which multiple lineages are sampled from each of two taxa at L independent loci, we derive the distribution of the waiting time until the first interspecific coalescence occurs between the two taxa, considering all loci and measuring from the divergence time. We then use the mean of this distribution to derive a correction to the GLASS estimator of pairwise divergence times. We show that our improved estimator, which we call iGLASS, consistently estimates the divergence time between a pair of taxa as the number of loci approaches infinity, and that it is an unbiased estimator of divergence times when one lineage is sampled per taxon. We also show that many commonly used clustering methods can be combined with the iGLASS estimator of pairwise divergence times to produce a consistent estimator of the species tree topology. Through simulations, we show that iGLASS can greatly reduce the bias and mean squared error in obtaining estimates of divergence times in a species tree. PMID:22216756
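For the simplest case of one lineage sampled per taxon, the bias correction has a clean closed form: the excess of the GLASS minimum over the true divergence time is the minimum of L independent Exp(1) coalescence waits (in coalescent units of 2N generations), whose mean is 1/L. The sketch below illustrates only this special case; the paper's general multi-lineage correction is more involved.

```python
def glass_estimate(pairwise_times):
    """GLASS divergence estimate: minimum interspecific coalescence time
    across loci (one lineage per taxon per locus assumed; coalescent units)."""
    return min(pairwise_times)

def iglass_estimate(pairwise_times):
    """Bias-corrected estimate for the one-lineage-per-taxon case.

    The minimum of L iid Exp(1) excess waits has mean 1/L, so subtracting
    1/L removes the systematic overestimate (clamped at zero).
    """
    L = len(pairwise_times)
    return max(glass_estimate(pairwise_times) - 1.0 / L, 0.0)
```

With many loci the correction shrinks as 1/L, consistent with both estimators converging to the true divergence time.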
Cosmology from galaxy clusters as observed by Planck
NASA Astrophysics Data System (ADS)
Pierpaoli, Elena
We propose to use current all-sky data on galaxy clusters in the radio/infrared bands in order to constrain cosmology. This will be achieved by performing parameter estimation with number counts and power spectra for galaxy clusters detected by Planck through their Sunyaev-Zel'dovich signature. The ultimate goal of this proposal is to use clusters as tracers of matter density in order to provide information about fundamental properties of our Universe, such as the law of gravity on large scales, early Universe phenomena, structure formation and the nature of dark matter and dark energy. We will leverage the availability of a larger and deeper cluster catalog from the latest Planck data release in order to include, for the first time, the cluster power spectrum in the cosmological parameter determination analysis. Furthermore, we will extend the cluster analysis to cosmological models not yet investigated by the Planck collaboration. These aims require a diverse set of activities: characterizing the clusters' selection function, choosing the cosmological cluster sample to be used for parameter estimation, constructing mock samples in the various cosmological models with correct correlation properties in order to produce reliable selection functions and noise covariance matrices, and finally constructing the appropriate likelihood for number counts and power spectra. We plan to make the final code available to the community and compatible with the most widely used cosmological parameter estimation code. This research makes use of data from the NASA satellites Planck and, less directly, Chandra, in order to constrain cosmology, and therefore fits the NASA objectives and the specifications of this solicitation.
Two serendipitous low-mass LMC clusters discovered with HST
NASA Astrophysics Data System (ADS)
Santiago, Basilio X.; Elson, Rebecca A. W.; Sigurdsson, Steinn; Gilmore, Gerard F.
1998-04-01
We present V and I photometry of two open clusters in the LMC down to V~26. The clusters were imaged with the Wide Field and Planetary Camera 2 (WFPC2) on board the Hubble Space Telescope (HST), as part of the Medium Deep Survey Key Project. Both are low-luminosity (M_V~-3.5), low-mass (M~10^3 Msolar) systems. The chance discovery of these two clusters in two parallel WFPC2 fields suggests a significant incompleteness in the LMC cluster census near the bar. One of the clusters is roughly elliptical and compact, with a steep light profile, a central surface brightness mu_V(0)~20.2 mag arcsec^-2, a half-light radius r_hl~0.9 pc (total visual major diameter D~3 pc) and an estimated mass M~1500 Msolar. From the colour-magnitude diagram and isochrone fits we estimate its age as tau~(2-5)x10^8 yr. Its mass function has a fitted slope of Gamma = Delta log phi(M) / Delta log M = -1.8 +/- 0.7 in the range probed (0.9 <~ M/Msolar <~ 4.5). The other cluster is more irregular and sparse, having shallower density and surface brightness profiles. We obtain Gamma = -1.2 +/- 0.4, and estimate its mass as M~400 Msolar. A derived upper limit for its age is tau <~ 5x10^8 yr. Both clusters have mass functions with slopes similar to that of R136, a massive LMC cluster, for which HST results indicate Gamma ~ -1.2. They also seem to be relaxed in their cores and well contained within their tidal radii.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhou, Yuyu; Smith, Steven J.; Elvidge, Christopher
Accurate information on urban areas at regional and global scales is important for both the science and policy-making communities. The Defense Meteorological Satellite Program/Operational Linescan System (DMSP/OLS) nighttime stable light data (NTL) provide a potential way to map urban areas and their dynamics economically and in a timely manner. In this study, we developed a cluster-based method to estimate the optimal thresholds and map urban extents from the DMSP/OLS NTL data in five major steps: data preprocessing, urban cluster segmentation, logistic model development, threshold estimation, and urban extent delineation. Unlike previous fixed-threshold methods, which suffer from over- and under-estimation issues, in our method the optimal thresholds are estimated based on cluster size and overall nightlight magnitude in the cluster, and they vary between clusters. Two large countries with different urbanization patterns, the United States and China, were selected for mapping urban extents using the proposed method. The result indicates that the urbanized area occupies about 2% of total land area in the US, ranging from less than 0.5% to more than 10% at the state level, and less than 1% in China, ranging from less than 0.1% to about 5% at the province level, with some municipalities as high as 10%. The derived thresholds and urban extents were evaluated using high-resolution land cover data at the cluster and regional levels. It was found that our method can map urban areas in both countries efficiently and accurately. Compared to previous threshold techniques, our method reduces the over- and under-estimation issues when mapping urban extent over a large area. More importantly, our method shows its potential to map global urban extents and temporal dynamics using the DMSP/OLS NTL data in a timely, cost-effective way.
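A hypothetical sketch of the per-cluster thresholding idea: rather than one fixed DN threshold, each nightlight cluster receives its own threshold predicted from its size and brightness. The logistic form and coefficients below are illustrative assumptions, not the paper's fitted model:

```python
import math

# Per-cluster threshold rule in the spirit of the study: the DN cut-off
# for labelling pixels urban is a function of cluster area and mean
# nightlight magnitude, so large bright metros get higher thresholds than
# small dim towns. Coefficients a, b, c are made up for illustration;
# dn_max = 63 is the saturation value of DMSP/OLS stable lights.

def cluster_threshold(area_km2, mean_dn, a=-1.0, b=0.4, c=0.02, dn_max=63):
    """Map cluster covariates to a DN threshold in (0, dn_max)."""
    z = a + b * math.log(area_km2) + c * mean_dn
    return dn_max / (1.0 + math.exp(-z))

def urban_mask(pixels, area_km2, mean_dn):
    """Label pixels in one cluster as urban if DN meets its threshold."""
    t = cluster_threshold(area_km2, mean_dn)
    return [dn >= t for dn in pixels]
```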
Estimators for Clustered Education RCTs Using the Neyman Model for Causal Inference
ERIC Educational Resources Information Center
Schochet, Peter Z.
2013-01-01
This article examines the estimation of two-stage clustered designs for education randomized control trials (RCTs) using the nonparametric Neyman causal inference framework that underlies experiments. The key distinction between the considered causal models is whether potential treatment and control group outcomes are considered to be fixed for…
Estimates of the Internal Consistency of a Factorially Complex Composite.
ERIC Educational Resources Information Center
Benito, Juana Gomez
1989-01-01
This study of 852 subjects in Barcelona (Spain) between 4 and 9 years old estimated the degree of consistency among elements of the Borelli-Oleron Performance Scale by taking into account item clusters and subtest clusters. The internal consistency of the subtests rose when all ages were analyzed jointly. (SLD)
Ward, M.P.; Ramsay, B.H.; Gallo, K.
2005-01-01
Data from an outbreak (August to October, 2002) of West Nile virus (WNV) encephalomyelitis in a population of horses located in northern Indiana were scanned for clusters in time and space. One significant (p = 0.04) cluster of case premises was detected, occurring between September 4 and 10 in the south-west part of the study area (85.70°N, 45.50°W). It included 10 case premises (3.67 case premises expected) within a radius of 2264 m. Image data were acquired by the Advanced Very High Resolution Radiometer (AVHRR) sensor onboard a National Oceanic and Atmospheric Administration polar-orbiting satellite. The Normalized Difference Vegetation Index (NDVI) was calculated from visible and near-infrared data of daily observations, which were composited to produce a weekly 1-km2 resolution raster image product. During the epidemic, a significant (p<0.01) decrease (0.025 per week) in estimated NDVI was observed at all case and control premise sites. The median estimated NDVI (0.659) for case premises within the cluster identified was significantly (p<0.01) greater than the median estimated NDVI for other case (0.571) and control (0.596) premises during the same period. The difference in median estimated NDVI for case premises within this cluster, compared to cases not included in this cluster, was greatest (5.3% and 5.1%, respectively) at 1 and 5 weeks preceding occurrence of the cluster. The NDVI may be useful for identifying foci of WNV transmission. © Mary Ann Liebert, Inc.
A Hierarchical Clustering Methodology for the Estimation of Toxicity
A Quantitative Structure Activity Relationship (QSAR) methodology based on hierarchical clustering was developed to predict toxicological endpoints. This methodology utilizes Ward's method to divide a training set into a series of structurally similar clusters. The structural sim...
Viscous self interacting dark matter and cosmic acceleration
NASA Astrophysics Data System (ADS)
Atreya, Abhishek; Bhatt, Jitesh R.; Mishra, Arvind
2018-02-01
Self-interacting dark matter (SIDM) provides a consistent solution to certain astrophysical observations in conflict with the collisionless cold DM paradigm. In this work we estimate the shear viscosity (η) and bulk viscosity (ζ) of SIDM, within the kinetic theory formalism, for galactic and cluster size SIDM halos. To that end we make use of the recent constraints on the SIDM cross-section for dwarf galaxies, LSB galaxies and clusters. We also estimate the change in the solution of Einstein's equations due to these viscous effects and find that the σ/m constraints on SIDM from astrophysical data provide sufficient viscosity to account for the observed cosmic acceleration at the present epoch, without the need for any additional dark energy component. Using the estimates of dark matter density for galactic and cluster size halos, we find that the mean free path of dark matter is ~ a few Mpc. Thus the smallest scale at which the viscous effects start playing a role is the cluster scale. Astrophysical data for dwarf and LSB galaxies and clusters also seem to suggest the same. The entire analysis is independent of any specific particle-physics-motivated model for SIDM.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Singh, Gurmeet; Nandi, Apurba; Gadre, Shridhar R., E-mail: gadre@iitk.ac.in
2016-03-14
A pragmatic method based on the molecular tailoring approach (MTA) for accurately estimating the complete basis set (CBS) limit at second-order Møller-Plesset perturbation (MP2) theory for large molecular clusters with limited computational resources is developed. It is applied to water clusters, (H2O)n (n = 7, 8, 10, 16, 17, and 25), optimized employing the aug-cc-pVDZ (aVDZ) basis set. Binding energies (BEs) of these clusters are estimated at the MP2/aug-cc-pVNZ (aVNZ) [N = T, Q, and 5 (whenever possible)] levels of theory employing the grafted MTA (GMTA) methodology and are found to lie within 0.2 kcal/mol of the corresponding full-calculation MP2 BE, wherever available. The results are extrapolated to the CBS limit using a three-point formula. The GMTA-MP2 calculations are feasible on off-the-shelf hardware and show around 50%-65% savings of computational time. The methodology has the potential for application to molecular clusters containing ~100 atoms.
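A three-point CBS extrapolation can be sketched as follows, assuming the common mixed exponential/Gaussian form E(X) = E_CBS + A·exp(-(X-1)) + B·exp(-(X-1)²) (the abstract does not specify which three-point formula was used). With three energies the three unknowns follow from an exact linear solve, since the exponential factors are fixed numbers once the cardinal numbers X are chosen:

```python
import math

# Three-point CBS extrapolation sketch: given energies at three cardinal
# numbers X (T, Q, 5 -> X = 3, 4, 5), solve the 3x3 linear system for
# E_CBS, A, B in  E(X) = E_CBS + A*exp(-(X-1)) + B*exp(-(X-1)^2).

def cbs_three_point(energies, cardinals=(3, 4, 5)):
    """energies: total or binding energies at the three cardinals."""
    # Row [1, exp(-(X-1)), exp(-(X-1)^2)] multiplies (E_CBS, A, B).
    m = [[1.0, math.exp(-(x - 1)), math.exp(-((x - 1) ** 2)), e]
         for x, e in zip(cardinals, energies)]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(m[r][i]))
        m[i], m[p] = m[p], m[i]
        for r in range(i + 1, 3):
            f = m[r][i] / m[i][i]
            for c in range(i, 4):
                m[r][c] -= f * m[i][c]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        x[i] = (m[i][3] - sum(m[i][c] * x[c] for c in range(i + 1, 3))) / m[i][i]
    return x[0]  # E_CBS
```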
Estimating carbon cluster binding energies from measured Cn distributions, n <= 10
NASA Astrophysics Data System (ADS)
Pargellis, A. N.
1990-08-01
Experimental data are presented for the cluster distribution of sputtered negative carbon clusters, C-n, with n≤10. Additionally, clusters have been observed with masses indicating they are CsC-2n, with n≤4. The C-n data are compared with the data obtained by other groups, for neutral and charged clusters, using a variety of sources such as evaporation, sputtering, and laser ablation. The data are used to estimate the cluster binding energies En, using the universal relation En = (n-1)ΔHn + RTe[ln(Jn/J1) + 0.5 ln(n) - α - (ΔSn-ΔS1)/R], derived from basic kinetic and thermodynamic relations. The estimated values agree remarkably well with the literature, differing from published values by at most a few percent. In this equation, Jn is the observed current of n-atom clusters, ΔHn is the heat of vaporization, ΔH1 = 7.41 eV, and Te ≈ 0.25 eV (2900 K) is the effective source temperature. The relative change in cluster entropy during sublimation from the solid to vapor phase is approximated to first order by the relation (ΔSn-ΔS1)/R = 3.1 + 0.9(n-2), fit to published data for n between 2 and 5 and temperatures between 2000 and 4000 K. The parameter α is empirical, obtained by fitting the data to known binding energies for Cn≤5 clusters. For evaporation sources, α must be zero, but α ~ 7 when sputtering with Cs+ ions, indicating that the sputtered clusters appear to be in thermodynamic equilibrium, but not the atoms. Several possible mechanisms for the formation of clusters during sputtering are examined. One plausible mechanism is that atoms diffuse on the graphite surface to form clusters which are then desorbed by energetic recoil atoms created in subsequent sputtering events.
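The quoted relation can be transcribed directly, assuming R·Te is the per-particle k_B·Te = 0.25 eV (a units assumption, since Te is stated in eV) and taking ΔHn ≈ ΔH1 = 7.41 eV:

```python
import math

# Transcription of the abstract's estimator
#   E_n = (n-1)*dH + R*Te*[ln(J_n/J_1) + 0.5*ln(n) - alpha - (dS_n - dS_1)/R]
# with R*Te read as k_B*Te = 0.25 eV per particle, dH taken as
# dH_1 = 7.41 eV, the fitted entropy term (dS_n - dS_1)/R = 3.1 + 0.9*(n-2),
# and alpha defaulting to ~7 (the sputter-source value quoted above).

def binding_energy(n, J_n, J_1, alpha=7.0, Te_eV=0.25, dH_eV=7.41):
    entropy_term = 3.1 + 0.9 * (n - 2)
    return ((n - 1) * dH_eV
            + Te_eV * (math.log(J_n / J_1) + 0.5 * math.log(n)
                       - alpha - entropy_term))

# e.g. a dimer current one tenth of the monomer current:
# binding_energy(2, J_n=1.0, J_1=10.0) ≈ 4.40 eV
```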
Jongenburger, I; Reij, M W; Boer, E P J; Gorris, L G M; Zwietering, M H
2011-11-15
The actual spatial distribution of microorganisms within a batch of food influences the results of sampling for microbiological testing when this distribution is non-homogeneous. In the case of pathogens being non-homogeneously distributed, it markedly influences public health risk. This study investigated the spatial distribution of Cronobacter spp. in powdered infant formula (PIF) at industrial batch-scale for both a recalled batch as well as a reference batch. Additionally, the local spatial occurrence of clusters of Cronobacter cells was assessed, as well as the performance of typical sampling strategies to determine the presence of the microorganisms. The concentration of Cronobacter spp. was assessed over the course of the filling time of each batch, by taking samples of 333 g using the most probable number (MPN) enrichment technique. The occurrence of clusters of Cronobacter spp. cells was investigated by plate counting. From the recalled batch, 415 MPN samples were drawn. The expected heterogeneous distribution of Cronobacter spp. could be quantified from these samples, which showed no detectable level (detection limit of -2.52 log CFU/g) in 58% of samples, whilst in the remainder concentrations were found to be between -2.52 and 2.75 log CFU/g. The estimated average concentration in the recalled batch was -2.78 log CFU/g, with a standard deviation of 1.10 log CFU/g. The estimated average concentration in the reference batch was -4.41 log CFU/g, with 99% of the 93 samples being below the detection limit. In the recalled batch, clusters of cells occurred sporadically in 8 out of 2290 samples of 1 g taken. The two largest clusters contained 123 (2.09 log CFU/g) and 560 (2.75 log CFU/g) cells. Various sampling strategies were evaluated for the recalled batch. Taking more and smaller samples while keeping the total sampling weight constant considerably improved the performance of the sampling plans in detecting such a contaminated batch.
Compared to random sampling, stratified random sampling improved the probability to detect the heterogeneous contamination. Copyright © 2011 Elsevier B.V. All rights reserved.
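The effect of taking more, smaller samples at constant total weight can be illustrated with a seeded Monte Carlo sketch; the batch layout and plan sizes below are hypothetical, not the study's:

```python
import random

# Monte Carlo sketch: a batch is modelled as 10,000 one-gram increments
# with the contamination confined to one small contiguous cluster placed
# at random. Both plans draw the same total mass (300 g); detection means
# at least one sample interval overlaps the contaminated cluster.

def detection_probability(n_samples, sample_g, n_reps=2000,
                          batch_g=10_000, cluster_g=20, seed=1):
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_reps):
        start = rng.randrange(batch_g - cluster_g)   # cluster position
        for _ in range(n_samples):
            s = rng.randrange(batch_g - sample_g)    # sample start
            # Sample [s, s+sample_g) overlaps cluster [start, start+cluster_g)?
            if s < start + cluster_g and start < s + sample_g:
                hits += 1
                break
    return hits / n_reps

few_large = detection_probability(n_samples=3, sample_g=100)
many_small = detection_probability(n_samples=30, sample_g=10)
# Many small samples detect the clustered contamination more often.
```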
Explore the Impacts of River Flow and Water Quality on Fish Communities
NASA Astrophysics Data System (ADS)
Tsai, W. P.; Chang, F. J.; Lin, C. Y.; Hu, J. H.; Yu, C. J.; Chu, T. J.
2015-12-01
Owing to the limitations of the geographical environment in Taiwan, the uneven temporal and spatial distribution of rainfall causes significant impacts on river ecosystems. To pursue sustainable water resources development, integrity and rationality are important to water management planning. The water quality and the flow regimes of rivers are closely related to each other and affect river ecosystems simultaneously. Therefore, this study collects long-term heterogeneous observational data, including water quality parameters, stream flow and fish species, in the Danshui River of northern Taiwan, and aims to explore the complex impacts of water quality and flow regime on fish communities in order to comprehend the situation of the eco-hydrological system in this river basin. First, this study improves the understanding of the relationship between water quality parameters, flow regime and fish species by using artificial neural networks (ANNs). The self-organizing feature map (SOM) is an unsupervised learning method used to cluster, analyze and visualize large amounts of data. The results of the SOM show that nine clusters (3x3) form the optimum map size based on the local minimum values of both the quantization error (QE) and topographic error (TE). Second, the fish diversity indexes are estimated by using the adaptive network-based fuzzy inference system (ANFIS) based on key input factors determined by the Gamma Test (GT), which is a useful tool for reducing model dimension and the structural complexity of ANNs. The result reveals that the constructed models can effectively estimate fish diversity indexes and produce good estimation performance based on the 9 clusters identified by the SOM, in which RMSE is 0.18 and CE is 0.84 for the training data set, while RMSE is 0.20 and CE is 0.80 for the testing data set.
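The RMSE and CE fit statistics quoted above, under their usual definitions (CE taken as the Nash-Sutcliffe coefficient of efficiency, an assumption since the abstract does not define it):

```python
import math

# RMSE: root mean squared error between observed and predicted values.
# CE (Nash-Sutcliffe): 1 for a perfect model, 0 for a model no better
# than predicting the observed mean, negative if worse than the mean.

def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def ce(obs, pred):
    mean_o = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_o) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot
```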
Mortality following the Haitian earthquake of 2010: a stratified cluster survey
2013-01-01
Introduction Research that seeks to better understand vulnerability to earthquakes and risk factors associated with mortality in low resource settings is critical to earthquake preparedness and response efforts. This study aims to characterize mortality and associated risk factors in the 2010 Haitian earthquake. Methods In January 2011, a survey of the earthquake-affected Haitian population was conducted in metropolitan Port-au-Prince. A stratified 60x20 cluster design (n = 1200 households) was used with 30 clusters sampled in both camp and neighborhood locations. Households were surveyed regarding earthquake impact, current living conditions, and unmet needs. Results Mortality was estimated at 24 deaths (confidence interval [CI]: 20–28) per 1,000 in the sample population. Using two approaches, extrapolation of the survey mortality rate to the exposed population yielded mortality estimates ranging from a low of 49,033 to a high of 86,555. No significant difference in mortality was observed by sex (p = .786); however, age was significant, with adults aged 50+ years facing increased mortality risk. Odds of death were not significantly higher in camps, with 27 deaths per 1,000 (CI: 22–34), compared to neighborhoods, where the death rate was 19 per 1,000 (CI: 15–25; p = 0.080). Crowding and residence in a multistory building were also associated with increased risk of death. Conclusions Haiti earthquake mortality estimates are widely varied, though epidemiologic surveys conducted to date suggest lower levels of mortality than officially reported figures. Strategies to mitigate mortality burden in future earthquakes should consider improvements to the built environment that are feasible in urban resource-poor settings. PMID:23618373
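The headline rate and interval can be reproduced in form (not in the study's numbers) with a standard design-based ratio estimator over cluster totals; the per-cluster counts below are hypothetical:

```python
import math

# Design-based mortality estimate from a cluster survey: the rate is the
# ratio of total deaths to total persons, and its standard error comes
# from the between-cluster variation of the linearized residuals of the
# ratio estimator (a standard Taylor-linearization sketch).

def cluster_rate_ci(deaths, persons, z=1.96):
    """deaths, persons: parallel lists of per-cluster totals."""
    k = len(deaths)
    n = sum(persons)
    r = sum(deaths) / n                       # overall death rate
    mean_p = n / k
    e = [d - r * p for d, p in zip(deaths, persons)]  # residuals
    s2 = sum(x * x for x in e) / (k - 1)
    se = math.sqrt(s2 / k) / mean_p
    return r, (r - z * se, r + z * se)

# Usage: rate, (lo, hi) = cluster_rate_ci(deaths_per_cluster,
#                                          persons_per_cluster)
# Multiply by 1,000 to report deaths per 1,000, as in the abstract.
```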
Kaliappan, Saravanakumar Puthupalayam; George, Santosh; Francis, Mark Rohit; Kattula, Deepthi; Sarkar, Rajiv; Minz, Shantidani; Mohan, Venkata Raghava; George, Kuryan; Roy, Sheela; Ajjampur, Sitara Swarna Rao; Muliyil, Jayaprakash; Kang, Gagandeep
2013-12-01
To estimate the prevalence, spatial patterns and clustering in the distribution of soil-transmitted helminth (STH) infections, and factors associated with hookworm infections in a tribal population in Tamil Nadu, India. Cross-sectional study with one-stage cluster sampling of 22 clusters. Demographic and risk factor data and stool samples for microscopic ova/cysts examination were collected from 1237 participants. Geographical information systems mapping assessed spatial patterns of infection. The overall prevalence of STH was 39% (95% CI 36%–42%), with hookworm 38% (95% CI 35–41%) and Ascaris lumbricoides 1.5% (95% CI 0.8–2.2%). No Trichuris trichiura infection was detected. People involved in farming had higher odds of hookworm infection (1.68, 95% CI 1.31–2.17, P < 0.001). In the multiple logistic regression, adults (2.31, 95% CI 1.80–2.96, P < 0.001), people with pet cats (1.55, 95% CI 1.10–2.18, P = 0.011) and people who did not wash their hands with soap after defecation (1.84, 95% CI 1.27–2.67, P = 0.001) had higher odds of hookworm infection, but gender and poor usage of footwear did not significantly increase risk. Cluster analysis, based on the design effect calculation, did not show any clustering of cases among the study population; however, the spatial scan statistic detected a significant cluster of hookworm infections in one village. Multiple approaches, including health education, improving existing sanitary practices and regular preventive chemotherapy, are needed to control the burden of STH in similar endemic areas.
Sauzet, Odile; Peacock, Janet L
2017-07-20
The analysis of perinatal outcomes often involves datasets with some multiple births. These are datasets mostly formed of independent observations and a limited number of clusters of size two (twins) and maybe of size three or more. This non-independence needs to be accounted for in the statistical analysis. Using simulated data based on a dataset of preterm infants, we have previously investigated the performance of several approaches to the analysis of continuous outcomes in the presence of some clusters of size two. Mixed models have been developed for binomial outcomes, but very little is known about their reliability when only a limited number of small clusters are present. Using the same simulation framework, we investigated the performance of several approaches to the analysis of binomial outcomes in the presence of some clusters of size two. Logistic models, several methods of estimation for logistic random intercept models, and generalised estimating equations were compared. The presence of even a small percentage of twins means that a logistic regression model will underestimate all parameters, but a logistic random intercept model fails to estimate the correlation between siblings if the percentage of twins is too small and will provide estimates similar to those of logistic regression. The method which seems to provide the best balance between estimation of the standard error and the parameters for any percentage of twins is generalised estimating equations. This study has shown that the number of covariates or the level-two variance do not necessarily affect the performance of the various methods used to analyse datasets containing twins, but when the percentage of small clusters is too small, mixed models cannot capture the dependence between siblings.
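The variance-underestimation point can be seen in a seeded simulation: with perfectly correlated outcomes within twin pairs, the naive binomial standard error of a prevalence estimate is too small. Proportions stand in for the logistic-regression case here, and all numbers are illustrative:

```python
import random

# 500 infants: 400 singletons plus 50 twin pairs whose binary outcome is
# shared within the pair (an extreme intracluster correlation of 1).
# The empirical spread of the prevalence estimate across replications
# exceeds the naive sqrt(p*(1-p)/N) standard error that ignores twinning.

def simulate_sd(p=0.3, singletons=400, pairs=50, reps=4000, seed=7):
    rng = random.Random(seed)
    n = singletons + 2 * pairs
    estimates = []
    for _ in range(reps):
        s = sum(rng.random() < p for _ in range(singletons))
        s += 2 * sum(rng.random() < p for _ in range(pairs))  # shared outcome
        estimates.append(s / n)
    m = sum(estimates) / reps
    return (sum((x - m) ** 2 for x in estimates) / (reps - 1)) ** 0.5

naive_se = (0.3 * 0.7 / 500) ** 0.5      # ignores the twins
empirical_sd = simulate_sd()             # reflects the clustering
```

Methods that model the cluster structure, such as generalised estimating equations, aim to recover the larger, honest standard error.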
Diffuse radio emission in the complex merging galaxy cluster Abell 2069
NASA Astrophysics Data System (ADS)
Drabent, A.; Hoeft, M.; Pizzo, R. F.; Bonafede, A.; van Weeren, R. J.; Klein, U.
2015-03-01
Context. Galaxy clusters with signs of a recent merger in many cases show extended diffuse radio features. This emission originates from relativistic electrons that suffer synchrotron losses due to the intracluster magnetic field. The mechanisms of particle acceleration and the properties of the magnetic field are still poorly understood. Aims: We search for diffuse radio emission in galaxy clusters. Here, we study the complex galaxy cluster Abell 2069, for which X-ray observations indicate a recent merger. Methods: We investigate the cluster's radio continuum emission by deep Westerbork Synthesis Radio Telescope (WSRT) observations at 346 MHz and Giant Metrewave Radio Telescope (GMRT) observations at 322 MHz. Results: We find an extended diffuse radio feature roughly coinciding with the main component of the cluster. We classify this emission as a radio halo and estimate its lower limit flux density at 25 ± 9 mJy. Moreover, we find a second extended diffuse source located at the cluster's companion and estimate its flux density at 15 ± 2 mJy. We speculate that this is a small halo or a mini-halo. If true, this cluster is the first example of a double-halo in a single galaxy cluster.
Rates of collapse and evaporation of globular clusters
NASA Technical Reports Server (NTRS)
Hut, Piet; Djorgovski, S.
1992-01-01
Observational estimates of the dynamical relaxation times of Galactic globular clusters are used here to estimate the present rate at which core collapse and evaporation are occurring in them. A core collapse rate of 2 +/- 1 per Gyr is found, which for a Galactic age of about 12 Gyr agrees well with the fact that 27 clusters have surface brightness profiles with the morphology expected for the postcollapse phase. A destruction and evaporation rate of 5 +/- 3 per Gyr is found, suggesting that a significant fraction of the Galaxy's original complement of globular clusters have perished through the combined effects of mechanisms such as relaxation-driven evaporation and shocking due to interaction with the Galactic disk and bulge.
Borchers, D L; Langrock, R
2015-12-01
We develop maximum likelihood methods for line transect surveys in which animals go undetected at distance zero, either because they are stochastically unavailable while within view or because they are missed when they are available. These incorporate a Markov-modulated Poisson process model for animal availability, allowing more clustered availability events than is possible with Poisson availability models. They include a mark-recapture component arising from the independent-observer survey, leading to more accurate estimation of detection probability given availability. We develop models for situations in which (a) multiple detections of the same individual are possible and (b) some or all of the availability process parameters are estimated from the line transect survey itself, rather than from independent data. We investigate estimator performance by simulation, and compare the multiple-detection estimators with estimators that use only initial detections of individuals, and with a single-observer estimator. Simultaneous estimation of detection function parameters and availability model parameters is shown to be feasible from the line transect survey alone with multiple detections and double-observer data but not with single-observer data. Recording multiple detections of individuals improves estimator precision substantially when estimating the availability model parameters from survey data, and we recommend that these data be gathered. We apply the methods to estimate detection probability from a double-observer survey of North Atlantic minke whales, and find that double-observer data greatly improve estimator precision here too. © 2015 The Authors Biometrics published by Wiley Periodicals, Inc. on behalf of International Biometric Society.
New Cepheid variables in the young open clusters Berkeley 51 and Berkeley 55
NASA Astrophysics Data System (ADS)
Lohr, M. E.; Negueruela, I.; Tabernero, H. M.; Clark, J. S.; Lewis, F.; Roche, P.
2018-05-01
As part of a wider investigation of evolved massive stars in Galactic open clusters, we have spectroscopically identified three candidate classical Cepheids in the little-studied clusters Berkeley 51, Berkeley 55 and NGC 6603. Using new multi-epoch photometry, we confirm that Be 51 #162 and Be 55 #107 are bona fide Cepheids, with pulsation periods of 9.83±0.01 d and 5.850±0.005 d respectively, while NGC 6603 star W2249 does not show significant photometric variability. Using the period-luminosity relationship for Cepheid variables, we determine a distance to Be 51 of 5.3^{+1.0}_{-0.8} kpc and an age of 44^{+9}_{-8} Myr, placing it in a sparsely-attested region of the Perseus arm. For Be 55, we find a distance of 2.2±0.3 kpc and age of 63^{+12}_{-11} Myr, locating the cluster in the Local arm. Taken together with our recent discovery of a long-period Cepheid in the starburst cluster VdBH222, these represent an important increase in the number of young, massive Cepheids known in Galactic open clusters. We also consider new Gaia (data release 2) parallaxes and proper motions for members of Be 51 and Be 55; the uncertainties on the parallaxes do not allow us to refine our distance estimates to these clusters, but the well-constrained proper motion measurements furnish further confirmation of cluster membership. However, future final Gaia parallaxes for such objects should provide valuable independent distance measurements, improving the calibration of the period-luminosity relationship, with implications for the distance ladder out to cosmological scales.
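The period-luminosity distance step can be sketched as follows; the Galactic P-L calibration M_V = -2.43(log10 P - 1) - 4.05 is assumed purely for illustration, and the apparent magnitude and extinction in the example are hypothetical, not the photometry of Be 51 #162 or Be 55 #107:

```python
import math

# Cepheid distance sketch: period -> absolute magnitude via an assumed
# P-L relation, then distance from the extinction-corrected distance
# modulus  mu = m - M - A_V,  d = 10^((mu + 5)/5) pc.

def pl_absolute_mag(period_days):
    """Assumed illustrative P-L calibration (V band)."""
    return -2.43 * (math.log10(period_days) - 1.0) - 4.05

def distance_pc(m_apparent, period_days, a_v=0.0):
    mu = m_apparent - pl_absolute_mag(period_days) - a_v
    return 10 ** ((mu + 5.0) / 5.0)

# Hypothetical inputs for a ~5.85 d Cepheid:
# distance_pc(12.0, 5.85, a_v=2.5)  -> roughly 4 kpc
```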
Lu, Tsui-Shan; Longnecker, Matthew P; Zhou, Haibo
2017-03-15
Outcome-dependent sampling (ODS) is a cost-effective sampling scheme in which one observes the exposure with a probability that depends on the outcome. Well-known examples include the case-control design for a binary response, the case-cohort design for failure time data, and the general ODS design for a continuous response. While substantial work has been carried out for the univariate response case, statistical inference and design for ODS with multivariate cases remain under-developed. Motivated by the need in biological studies to take advantage of the available responses for subjects in a cluster, we propose a multivariate outcome-dependent sampling (multivariate-ODS) design that is based on a general selection of the continuous responses within a cluster. The proposed inference procedure for the multivariate-ODS design is semiparametric, with all the underlying distributions of covariates modeled nonparametrically using empirical likelihood methods. We show that the proposed estimator is consistent and establish its asymptotic normality. Simulation studies show that the proposed estimator is more efficient than the estimator obtained using only the simple-random-sample portion of the multivariate-ODS or the estimator from a simple random sample of the same size. The multivariate-ODS design, together with the proposed estimator, provides an approach to further improve study efficiency for a given fixed study budget. We illustrate the proposed design and estimator with an analysis of the association of polychlorinated biphenyl exposure with hearing loss in children from the Collaborative Perinatal Study. Copyright © 2016 John Wiley & Sons, Ltd.
Phiri, Sam; Tweya, Hannock; van Lettow, Monique; Rosenberg, Nora E; Trapence, Clement; Kapito-Tembo, Atupele; Kaunda-Khangamwa, Blessings; Kasende, Florence; Kayoyo, Virginia; Cataldo, Fabian; Stanley, Christopher; Gugsa, Salem; Sampathkumar, Veena; Schouten, Erik; Chiwaula, Levison; Eliya, Michael; Chimbwandira, Frank; Hosseinipour, Mina C
2017-06-01
Many sub-Saharan African countries have adopted Option B+, a prevention of mother-to-child transmission approach providing HIV-infected pregnant and lactating women with immediate lifelong antiretroviral therapy. High maternal attrition has been observed in Option B+. Peer-based support may improve retention. A 3-arm stratified cluster randomized controlled trial was conducted in Malawi to assess whether facility- and community-based peer support would improve Option B+ uptake and retention compared with standard of care (SOC). In SOC, no enhancements were made (control). In facility-based and community-based models, peers provided patient education, support groups, and patient tracing. Uptake was defined as attending a second scheduled follow-up visit. Retention was defined as being alive and in-care at 2 years without defaulting. Attrition was defined as death, default, or stopping antiretroviral therapy. Generalized estimating equations were used to estimate risk differences (RDs) in uptake. Cox proportional hazards regression with shared frailties was used to estimate hazard of attrition. Twenty-one facilities were randomized and enrolled 1269 women: 447, 428, and 394 in facilities that implemented SOC, facility-based, and community-based peer support models, respectively. Mean age was 27 years. Uptake was higher in facility-based (86%; RD: 6%, confidence interval [CI]: -3% to 15%) and community-based (90%; RD: 9%, CI: 1% to 18%) models compared with SOC (81%). At 24 months, retention was higher in facility-based (80%; RD: 13%, CI: 1% to 26%) and community-based (83%; RD: 16%, CI: 3% to 30%) models compared with SOC (66%). Facility- and community-based peer support interventions can benefit maternal uptake and retention in Option B+.
Estimation of the prevalence of adverse drug reactions from social media.
Nguyen, Thin; Larsen, Mark E; O'Dea, Bridianne; Phung, Dinh; Venkatesh, Svetha; Christensen, Helen
2017-06-01
This work aims to estimate the degree of adverse drug reactions (ADR) for psychiatric medications from social media, including Twitter, Reddit, and LiveJournal. Advances in lightning-fast cluster computing were employed to process large-scale data, consisting of 6.4 terabytes of data containing 3.8 billion records from all the media. Rates of ADR were quantified using the SIDER database of drugs and side-effects, and an estimated ADR rate was based on the prevalence of discussion in the social media corpora. Agreement between these measures for a sample of ten popular psychiatric drugs was evaluated using the Pearson correlation coefficient, r, with values between 0.08 and 0.50. Word2vec, a neural learning framework, was utilized to improve the coverage of variants of ADR terms in the unstructured text by identifying syntactically or semantically similar terms. Improved correlation coefficients, between 0.29 and 0.59, demonstrate the capability of advanced machine learning techniques to aid in the discovery of meaningful patterns from medical data, and social media data, at scale. Copyright © 2017 Elsevier B.V. All rights reserved.
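The agreement measure used above is Pearson's r; a self-contained version for per-drug (SIDER rate, social-media rate) pairs:

```python
import math

# Pearson correlation coefficient between two parallel sequences, e.g.
# one SIDER-derived ADR rate and one social-media-derived rate per drug.

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```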
Typology of patients with fibromyalgia: cluster analysis of duloxetine study patients.
Lipkovich, Ilya A; Choy, Ernest H; Van Wambeke, Peter; Deberdt, Walter; Sagman, Doron
2014-12-23
To identify distinct groups of patients with fibromyalgia (FM) with respect to multiple outcome measures. Data from 631 duloxetine-treated women in 4 randomized, placebo-controlled trials were included in a cluster analysis based on outcomes after up to 12 weeks of treatment. Corresponding classification rules were constructed using a classification tree method. Probabilities for transitioning from baseline to Week 12 category were estimated for placebo and duloxetine patients (Ntotal = 1188) using logistic regression. Five clusters were identified, from "worst" (high pain levels and severe mental/physical impairment) to "best" (low pain levels and nearly normal mental/physical function). For patients with moderate overall severity, mental and physical symptoms were less correlated, resulting in 2 distinct clusters based on these 2 symptom domains. Three key variables with threshold values were identified for classification of patients: Brief Pain Inventory (BPI) pain interference overall scores of <3.29 and <7.14, respectively, a Fibromyalgia Impact Questionnaire (FIQ) interference with work score of <2, and an FIQ depression score of ≥5. Patient characteristics and frequencies per baseline category were similar between treatments; >80% of patients were in the 3 worst categories. Duloxetine patients were significantly more likely to improve after 12 weeks than placebo patients. A sustained effect was seen with continued duloxetine treatment. FM patients are heterogeneous and can be classified into distinct subgroups by simple descriptive rules derived from only 3 variables, which may guide individual patient management. Duloxetine showed higher improvement rates than placebo and had a sustained effect beyond 12 weeks.
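The three reported variables and cut-offs can be turned into a rule sketch; the exact tree combining them is not given in the abstract, so the branch order below is an assumption that simply illustrates how three thresholds place a patient into coarse severity groups:

```python
# Rule sketch from the three classification variables quoted above:
#   BPI pain interference cut-offs 3.29 and 7.14,
#   FIQ interference-with-work cut-off 2,
#   FIQ depression cut-off 5.
# The moderate branch separates mental from physical symptom domains,
# mirroring the two distinct moderate-severity clusters described.

def classify(bpi_interference, fiq_work, fiq_depression):
    if bpi_interference >= 7.14:
        return "worst: severe pain interference"
    if bpi_interference < 3.29:
        return "best: low pain interference"
    if fiq_depression >= 5:
        return "moderate, mental-symptom dominated"
    if fiq_work >= 2:
        return "moderate, physical-symptom dominated"
    return "moderate, mild functional impact"
```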
DOE Office of Scientific and Technical Information (OSTI.GOV)
Applegate, D. E; Mantz, A.; Allen, S. W.; ...
2016-02-04
This is the fourth in a series of papers studying the astrophysics and cosmology of massive, dynamically relaxed galaxy clusters. Here, we use measurements of weak gravitational lensing from the Weighing the Giants project to calibrate Chandra X-ray measurements of total mass that rely on the assumption of hydrostatic equilibrium. This comparison of X-ray and lensing masses measures the combined bias of X-ray hydrostatic masses from both astrophysical and instrumental sources. While we cannot disentangle the two sources of bias, only the combined bias is relevant for calibrating cosmological measurements using relaxed clusters. Assuming a fixed cosmology, and within a characteristic radius (r_2500) determined from the X-ray data, we measure a lensing to X-ray mass ratio of 0.96 ± 9% (stat) ± 9% (sys). We find no significant trends of this ratio with mass, redshift or the morphological indicators used to select the sample. Our results imply that any departures from hydrostatic equilibrium at these radii are offset by calibration errors of comparable magnitude, with large departures of tens-of-percent unlikely. In addition, we find a mean concentration of the sample measured from lensing data of c_200 = 3.0^{+4.4}_{-1.8}. In conclusion, anticipated short-term improvements in lensing systematics, and a modest expansion of the relaxed lensing sample, can easily increase the measurement precision by 30–50%, leading to similar improvements in cosmological constraints that employ X-ray hydrostatic mass estimates, such as on Ω_m from the cluster gas mass fraction.
NASA Astrophysics Data System (ADS)
Applegate, D. E.; Mantz, A.; Allen, S. W.; von der Linden, A.; Morris, R. Glenn; Hilbert, S.; Kelly, Patrick L.; Burke, D. L.; Ebeling, H.; Rapetti, D. A.; Schmidt, R. W.
2016-04-01
This is the fourth in a series of papers studying the astrophysics and cosmology of massive, dynamically relaxed galaxy clusters. Here, we use measurements of weak gravitational lensing from the Weighing the Giants project to calibrate Chandra X-ray measurements of total mass that rely on the assumption of hydrostatic equilibrium. This comparison of X-ray and lensing masses measures the combined bias of X-ray hydrostatic masses from both astrophysical and instrumental sources. While we cannot disentangle the two sources of bias, only the combined bias is relevant for calibrating cosmological measurements using relaxed clusters. Assuming a fixed cosmology, and within a characteristic radius (r2500) determined from the X-ray data, we measure a lensing to X-ray mass ratio of 0.96 ± 9 per cent (stat) ± 9 per cent (sys). We find no significant trends of this ratio with mass, redshift or the morphological indicators used to select the sample. Our results imply that any departures from hydrostatic equilibrium at these radii are offset by calibration errors of comparable magnitude, with large departures of tens-of-percent unlikely. In addition, we find a mean concentration of the sample measured from lensing data of c_{200} = 3.0_{-1.8}^{+4.4}. Anticipated short-term improvements in lensing systematics, and a modest expansion of the relaxed lensing sample, can easily increase the measurement precision by 30-50 per cent, leading to similar improvements in cosmological constraints that employ X-ray hydrostatic mass estimates, such as on Ωm from the cluster gas mass fraction.
An improved method to detect correct protein folds using partial clustering.
Zhou, Jianjun; Wishart, David S
2013-01-16
Structure-based clustering is commonly used to identify correct protein folds among candidate folds (also called decoys) generated by protein structure prediction programs. However, traditional clustering methods exhibit a poor runtime performance on large decoy sets. We hypothesized that a more efficient "partial" clustering approach in combination with an improved scoring scheme could significantly improve both the speed and performance of existing candidate selection methods. We propose a new scheme that performs rapid but incomplete clustering on protein decoys. Our method detects structurally similar decoys (measured using either C(α) RMSD or GDT-TS score) and extracts representatives from them without assigning every decoy to a cluster. We integrated our new clustering strategy with several different scoring functions to assess both the performance and speed in identifying correct or near-correct folds. Experimental results on 35 Rosetta decoy sets and 40 I-TASSER decoy sets show that our method can improve the correct fold detection rate as assessed by two different quality criteria. This improvement is significantly better than two recently published clustering methods, Durandal and Calibur-lite. Speed and efficiency testing shows that our method can handle much larger decoy sets and is up to 22 times faster than Durandal and Calibur-lite. The new method, named HS-Forest, avoids the computationally expensive task of clustering every decoy, yet still allows superior correct-fold selection. Its improved speed, efficiency and decoy-selection performance should enable structure prediction researchers to work with larger decoy sets and significantly improve their ab initio structure prediction performance.
An improved method to detect correct protein folds using partial clustering
2013-01-01
Background Structure-based clustering is commonly used to identify correct protein folds among candidate folds (also called decoys) generated by protein structure prediction programs. However, traditional clustering methods exhibit a poor runtime performance on large decoy sets. We hypothesized that a more efficient “partial” clustering approach in combination with an improved scoring scheme could significantly improve both the speed and performance of existing candidate selection methods. Results We propose a new scheme that performs rapid but incomplete clustering on protein decoys. Our method detects structurally similar decoys (measured using either Cα RMSD or GDT-TS score) and extracts representatives from them without assigning every decoy to a cluster. We integrated our new clustering strategy with several different scoring functions to assess both the performance and speed in identifying correct or near-correct folds. Experimental results on 35 Rosetta decoy sets and 40 I-TASSER decoy sets show that our method can improve the correct fold detection rate as assessed by two different quality criteria. This improvement is significantly better than two recently published clustering methods, Durandal and Calibur-lite. Speed and efficiency testing shows that our method can handle much larger decoy sets and is up to 22 times faster than Durandal and Calibur-lite. Conclusions The new method, named HS-Forest, avoids the computationally expensive task of clustering every decoy, yet still allows superior correct-fold selection. Its improved speed, efficiency and decoy-selection performance should enable structure prediction researchers to work with larger decoy sets and significantly improve their ab initio structure prediction performance. PMID:23323835
Accurate estimations of electromagnetic transitions of Sn IV for stellar and interstellar media
NASA Astrophysics Data System (ADS)
Biswas, Swapan; Das, Arghya; Bhowmik, Anal; Majumder, Sonjoy
2018-04-01
Here we report on accurate ab initio calculations to study astrophysically important electromagnetic transition parameters among different low-lying states of Sn IV. Our ab initio calculations are based on the sophisticated relativistic coupled-cluster theory, which almost exhausts many important electron correlations. To establish the accuracy of the calculations, we compare our results with the available experiments and estimate the transition amplitudes in length and velocity gauge forms. Most of these allowed and forbidden transition wavelengths lie in the infrared region, and they can be observed in different cool stellar and interstellar media. To reduce the uncertainty, we use experimental energies in the estimation of the above transition parameters. The presented data will be helpful for determining the abundances of the ion in different astrophysical and laboratory plasmas.
Accurate estimations of electromagnetic transitions of Sn IV for stellar and interstellar media
NASA Astrophysics Data System (ADS)
Biswas, Swapan; Das, Arghya; Bhowmik, Anal; Majumder, Sonjoy
2018-07-01
Here, we report on accurate ab initio calculations to study astrophysically important electromagnetic transition parameters among different low-lying states of Sn IV. Our ab initio calculations are based on the sophisticated relativistic coupled cluster theory, which almost exhausts many important electron correlations. To establish the accuracy of the calculations, we compare our results with the available experiments and estimate the transition amplitudes in length and velocity gauge forms. Most of these allowed and forbidden transition wavelengths lie in the infrared region, and they can be observed in different cool stellar and interstellar media. To reduce the uncertainty, we use experimental energies in the estimation of the above transition parameters. The presented data will be helpful for determining the abundances of the ion in different astrophysical and laboratory plasmas.
Rain volume estimation over areas using satellite and radar data
NASA Technical Reports Server (NTRS)
Doneaud, A. A.; Vonderhaar, T. H.
1985-01-01
The feasibility of rain volume estimation over fixed and floating areas was investigated using rapid scan satellite data following a technique recently developed with radar data, called the Area Time Integral (ATI) technique. The radar and rapid scan GOES satellite data were collected during the Cooperative Convective Precipitation Experiment (CCOPE) and North Dakota Cloud Modification Project (NDCMP). Six multicell clusters and cells have been analyzed to date. A two-cycle oscillation emphasizing the multicell character of the clusters is demonstrated. Three clusters were selected on each day, 12 June and 2 July. The 12 June clusters occurred during the daytime, while the 2 July clusters occurred during the nighttime. A total of 86 time steps of radar and 79 time steps of satellite images were analyzed. There were approximately 12-min intervals between radar scans on average.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vlcek, Lukas; Chialvo, Ariel; Simonson, J Michael
2013-01-01
Molecular models and experimental estimates based on the cluster pair approximation (CPA) provide inconsistent predictions of absolute single-ion hydration properties. To understand the origin of this discrepancy we used molecular simulations to study the transition between hydration of alkali metal and halide ions in small aqueous clusters and bulk water. The results demonstrate that the assumptions underlying the CPA are not generally valid as a result of a significant shift in the ion hydration free energies (~15 kJ/mol) and enthalpies (~47 kJ/mol) in the intermediate range of cluster sizes. When this effect is accounted for, the systematic differences between models and experimental predictions disappear, and the value of absolute proton hydration enthalpy based on the CPA comes into closer agreement with other estimates.
Kappa statistic for the clustered dichotomous responses from physicians and patients
Kang, Chaeryon; Qaqish, Bahjat; Monaco, Jane; Sheridan, Stacey L.; Cai, Jianwen
2013-01-01
The bootstrap method for estimating the standard error of the kappa statistic in the presence of clustered data is evaluated. Such data arise, for example, in assessing agreement between physicians and their patients regarding their understanding of the physician-patient interaction and discussions. We propose a computationally efficient procedure for generating correlated dichotomous responses for physicians and assigned patients for simulation studies. The simulation results demonstrate that the proposed bootstrap method produces a better estimate of the standard error and better coverage performance compared to the asymptotic standard error estimate that ignores dependence among patients within physicians, provided there is at least a moderately large number of clusters. An example of an application to a coronary heart disease prevention study is presented. PMID:23533082
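A minimal sketch of the cluster bootstrap described above: whole clusters (physicians together with their patients' rating pairs) are resampled with replacement, kappa is recomputed on the pooled pairs, and the standard error is the spread of the replicates. The toy data and helper names are illustrative assumptions, not the paper's simulation design:

```python
import random
import statistics

def kappa(pairs):
    """Cohen's kappa for paired dichotomous (0/1) ratings."""
    n = len(pairs)
    po = sum(a == b for a, b in pairs) / n          # observed agreement
    pa1 = sum(a for a, _ in pairs) / n              # rater A's rate of 1s
    pb1 = sum(b for _, b in pairs) / n              # rater B's rate of 1s
    pe = pa1 * pb1 + (1 - pa1) * (1 - pb1)          # chance agreement
    return (po - pe) / (1 - pe)

def cluster_bootstrap_se(clusters, reps=500, seed=0):
    """Resample whole clusters with replacement; recompute kappa on the
    pooled pairs each time; return the std. dev. of the replicates."""
    rng = random.Random(seed)
    replicates = []
    for _ in range(reps):
        sample = [rng.choice(clusters) for _ in clusters]
        pooled = [pair for cl in sample for pair in cl]
        replicates.append(kappa(pooled))
    return statistics.stdev(replicates)

# Hypothetical data: each inner list is one physician's patients, each
# tuple a (physician, patient) 0/1 response pair.
physicians = [
    [(1, 1), (0, 0), (1, 0)],
    [(1, 1), (0, 1), (0, 0)],
    [(0, 0), (1, 1), (1, 1)],
]
print(round(cluster_bootstrap_se(physicians), 3))
```

Resampling at the cluster level (rather than at the patient level) is what preserves the within-physician dependence that the asymptotic estimate ignores.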
Nitrogen nucleation in a cryogenic supersonic nozzle
NASA Astrophysics Data System (ADS)
Bhabhe, Ashutosh; Wyslouzil, Barbara
2011-12-01
We follow the vapor-liquid phase transition of N2 in a cryogenic supersonic nozzle apparatus using static pressure measurements. Under our operating conditions, condensation always occurs well below the triple point. Mean field kinetic nucleation theory (MKNT) does a better job of predicting the conditions corresponding to the estimated maximum nucleation rates, Jmax = 10^{17±1} cm^{-3} s^{-1}, than two variants of classical nucleation theory. Combining the current results with the nucleation pulse chamber measurements of Iland et al. [J. Chem. Phys. 130, 114508-1 (2009)], we use nucleation theorems to estimate the critical cluster properties. Both theories overestimate the size of the critical cluster, but MKNT does a good job of estimating the excess internal energy of the clusters.
Impact of missing data imputation methods on gene expression clustering and classification.
de Souto, Marcilio C P; Jaskowiak, Pablo A; Costa, Ivan G
2015-02-26
Several missing value imputation methods for gene expression data have been proposed in the literature. In the past few years, researchers have been putting a great deal of effort into presenting systematic evaluations of the different imputation algorithms. Initially, most algorithms were assessed with an emphasis on the accuracy of the imputation, using metrics such as the root mean squared error. However, it has become clear that the success of the estimation of the expression value should be evaluated in more practical terms as well. One can consider, for example, the ability of the method to preserve the significant genes in the dataset, or its discriminative/predictive power for classification/clustering purposes. We performed a broad analysis of the impact of five well-known missing value imputation methods on three clustering and four classification methods, in the context of 12 cancer gene expression datasets. We employed a statistical framework, for the first time in this field, to assess whether different imputation methods improve the performance of the clustering/classification methods. Our results suggest that the imputation methods evaluated have a minor impact on the classification and downstream clustering analyses. Simple methods such as replacing the missing values by the mean or median values performed as well as more complex strategies. The datasets analyzed in this study are available at http://costalab.org/Imputation/.
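Mean imputation, the simple baseline the study found competitive with more complex strategies, fits in a few lines. The toy expression matrix and the use of None for missing entries are illustrative assumptions:

```python
def mean_impute(matrix):
    """Replace missing entries (None) with their column (gene) mean."""
    ncol = len(matrix[0])
    col_means = []
    for j in range(ncol):
        observed = [row[j] for row in matrix if row[j] is not None]
        col_means.append(sum(observed) / len(observed))
    return [[col_means[j] if row[j] is None else row[j]
             for j in range(ncol)] for row in matrix]

# Toy gene-expression matrix: rows = samples, columns = genes.
expr = [[1.0, None, 3.0],
        [2.0, 4.0, None],
        [3.0, 6.0, 9.0]]
print(mean_impute(expr))
```

Median imputation differs only in replacing the column mean with the column median.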
Xue, Zhong; Shen, Dinggang; Li, Hai; Wong, Stephen
2010-01-01
The traditional fuzzy clustering algorithm and its extensions have been successfully applied in medical image segmentation. However, because of the variability of tissues and anatomical structures, the clustering results might be biased by the tissue population and intensity differences. For example, clustering-based algorithms tend to over-segment white matter tissues of MR brain images. To solve this problem, we introduce a tissue probability map constrained clustering algorithm and apply it to serial MR brain image segmentation, i.e., a series of 3-D MR brain images of the same subject at different time points. Using the new serial image segmentation algorithm within the CLASSIC framework, which iteratively segments the images and estimates the longitudinal deformations, we improved both accuracy and robustness of serial image computing, and at the same time produced longitudinally consistent segmentation and stable measures. In the algorithm, the tissue probability maps consist of both population-based and subject-specific segmentation priors. Experimental study using both simulated longitudinal MR brain data and the Alzheimer’s Disease Neuroimaging Initiative (ADNI) data confirmed that using both priors yields more accurate and robust segmentation results. The proposed algorithm can be applied in longitudinal follow-up studies of MR brain imaging with subtle morphological changes for neurological disorders. PMID:26566399
Gullo, Sara; Galavotti, Christine; Sebert Kuhlmann, Anne; Msiska, Thumbiko; Hastings, Phil; Marti, C Nathan
2017-01-01
Social accountability approaches, which emphasize mutual responsibility and accountability by community members, health care workers, and local health officials for improving health outcomes in the community, are increasingly being employed in low-resource settings. We evaluated the effects of a social accountability approach, CARE's Community Score Card (CSC), on reproductive health outcomes in Ntcheu district, Malawi using a cluster-randomized control design. We matched 10 pairs of communities, randomly assigning one from each pair to intervention and control arms. We conducted two independent cross-sectional surveys of women who had given birth in the last 12 months, at baseline and at two years post-baseline. Using difference-in-difference (DiD) and local average treatment effect (LATE) estimates, we evaluated the effects on outcomes including modern contraceptive use, antenatal and postnatal care service utilization, and service satisfaction. We also evaluated changes in indicators developed by community members and service providers in the intervention areas. DiD analyses showed significantly greater improvements in the proportion of women receiving a home visit during pregnancy (B = 0.20, P < .01), receiving a postnatal visit (B = 0.06, P = .01), and overall service satisfaction (B = 0.16, P < .001) in intervention compared to control areas. LATE analyses estimated significant effects of the CSC intervention on home visits by health workers (114% higher in intervention compared to control) (B = 1.14, P < .001) and current use of modern contraceptives (57% higher) (B = 0.57, P < .01). All 13 community- and provider-developed indicators improved, with 6 of them showing significant improvements. By facilitating the relationship between community members, health service providers, and local government officials, the CSC contributed to important improvements in reproductive health-related outcomes. 
Further, the CSC builds mutual accountability, and ensures that solutions to problems are locally-relevant, locally-supported and feasible to implement.
Jeon, Jihyoun; Hsu, Li; Gorfine, Malka
2012-07-01
Frailty models are useful for measuring unobserved heterogeneity in risk of failures across clusters, providing cluster-specific risk prediction. In a frailty model, the latent frailties shared by members within a cluster are assumed to act multiplicatively on the hazard function. In order to obtain parameter and frailty variate estimates, we consider the hierarchical likelihood (H-likelihood) approach (Ha, Lee and Song, 2001. Hierarchical-likelihood approach for frailty models. Biometrika 88, 233-243) in which the latent frailties are treated as "parameters" and estimated jointly with other parameters of interest. We find that the H-likelihood estimators perform well when the censoring rate is low; however, they are substantially biased when the censoring rate is moderate to high. In this paper, we propose a simple and easy-to-implement bias correction method for the H-likelihood estimators under a shared frailty model. We also extend the method to a multivariate frailty model, which incorporates complex dependence structure within clusters. We conduct an extensive simulation study and show that the proposed approach performs very well for censoring rates as high as 80%. We also illustrate the method with a breast cancer data set. Since the H-likelihood is the same as the penalized likelihood function, the proposed bias correction method is also applicable to the penalized likelihood estimators.
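For reference, the shared frailty model discussed above places a multiplicative latent frailty on the hazard. The gamma distribution shown is one common choice, included here only for concreteness; the H-likelihood approach in the abstract treats the frailties as parameters rather than integrating them out:

```latex
% Shared frailty model: subjects j in cluster i share a latent frailty u_i
% acting multiplicatively on the baseline hazard \lambda_0(t):
\lambda_{ij}(t \mid u_i) = u_i \, \lambda_0(t) \, \exp\!\left(x_{ij}^{\top}\beta\right)
% One common distributional assumption (not required by H-likelihood):
% u_i \sim \mathrm{Gamma}(1/\theta, 1/\theta), so E[u_i] = 1, Var(u_i) = \theta.
```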
NASA Astrophysics Data System (ADS)
Galleti, S.; Bellazzini, M.; Buzzoni, A.; Federici, L.; Fusi Pecci, F.
2009-12-01
Aims. We present a new homogeneous set of metallicity estimates based on Lick indices for the old globular clusters of the M 31 galaxy. The final aim is to add homogeneous spectroscopic metallicities to as many entries as possible of the Revised Bologna Catalog of M 31 clusters, by reporting Lick index measurements from any source (literature, new observations, etc.) on the same scale. Methods: New empirical relations of [Fe/H] as a function of [MgFe] and Mg2 indices are based on the well-studied galactic globular clusters, complemented with theoretical model predictions for -0.2 ≤ [Fe/H] ≤ +0.5. Lick indices for M 31 clusters from various literature sources (225 clusters) and from new observations by our team (71 clusters) have been transformed into the Trager et al. system, yielding new metallicity estimates for 245 globular clusters of M 31. Results: Our values are in good agreement with recent estimates based on detailed spectral fitting and with those obtained from color-magnitude diagrams of clusters imaged with the Hubble Space Telescope. The typical uncertainty on individual estimates is ≃±0.25 dex, as derived from the comparison with metallicities based on color-magnitude diagrams of individual clusters. Conclusions: The metallicity distribution of M 31 globular clusters is briefly discussed and compared with that of the Milky Way. Simple parametric statistical tests suggest that the distribution is probably not unimodal. The strong correlation between metallicity and kinematics found in previous studies is confirmed. The most metal-rich GCs tend to be packed into the center of the system and to cluster tightly around the galactic rotation curve defined by the HI disk, while the velocity dispersion about the curve increases with decreasing metallicity. However, the clusters with [Fe/H] < -1.0 also display a clear rotation pattern, at odds with their Milky Way counterparts.
Based on observations made at La Palma, at the Spanish Observatorio del Roque de los Muchachos of the IAC, with the William Herschel Telescope of the Isaac Newton Group and with the Italian Telescopio Nazionale Galileo (TNG) operated by the Fundación Galileo Galilei of INAF. Also based on observations made with the G.B. Cassini Telescope at Loiano (Italy), operated by the Osservatorio Astronomico di Bologna (INAF). Appendices are only available in electronic form at http://www.aanda.org
LOW-METALLICITY YOUNG CLUSTERS IN THE OUTER GALAXY. II. SH 2-208
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yasui, Chikako; Kobayashi, Naoto; Izumi, Natsuko
We obtained deep near-infrared images of Sh 2-208, one of the lowest-metallicity H ii regions in the Galaxy, [O/H] = −0.8 dex. We detected a young cluster in the center of the H ii region with a limiting magnitude of K = 18.0 mag (10σ), which corresponds to a mass detection limit of ∼0.2 M⊙. This enables the comparison of star-forming properties under low metallicity with those of the solar neighborhood. We identified 89 cluster members. From the fitting of the K-band luminosity function (KLF), the age and distance of the cluster are estimated to be ∼0.5 Myr and ∼4 kpc, respectively. The estimated young age is consistent with the detection of strong CO emission in the cluster region and the estimated large extinction of cluster members (A_V ∼ 4–25 mag). The observed KLF suggests that the underlying initial mass function (IMF) of the low-metallicity cluster is not significantly different from canonical IMFs in the solar neighborhood in terms of both high-mass slope and IMF peak (characteristic mass). Despite the very young age, the disk fraction of the cluster is estimated at only 27% ± 6%, which is significantly lower than those at solar metallicity. Those results are similar to Sh 2-207, another star-forming region close to Sh 2-208 with a separation of 12 pc, suggesting that star-forming activities in these low-metallicity environments are essentially identical to those in the solar neighborhood, except for the disk dispersal timescale. From large-scale mid-infrared images, we suggest that sequential star formation is taking place in Sh 2-207, Sh 2-208, and the surrounding region, triggered by an expanding bubble with a ∼30 pc radius.
Forbes, Andrew B; Akram, Muhammad; Pilcher, David; Cooper, Jamie; Bellomo, Rinaldo
2015-02-01
Cluster randomised crossover trials have been utilised in recent years in the health and social sciences. Methods for analysis have been proposed; however, for binary outcomes, these have received little assessment of their appropriateness. In addition, methods for determination of sample size are currently limited to balanced cluster sizes both between clusters and between periods within clusters. This article aims to extend this work to unbalanced situations and to evaluate the properties of a variety of methods for analysis of binary data, with a particular focus on the setting of potential trials of near-universal interventions in intensive care to reduce in-hospital mortality. We derive a formula for sample size estimation for unbalanced cluster sizes, and apply it to the intensive care setting to demonstrate the utility of the cluster crossover design. We conduct a numerical simulation of the design in the intensive care setting and for more general configurations, and we assess the performance of three cluster summary estimators and an individual-data estimator based on binomial-identity-link regression. For settings similar to the intensive care scenario involving large cluster sizes and small intra-cluster correlations, the sample size formulae developed and analysis methods investigated are found to be appropriate, with the unweighted cluster summary method performing well relative to the more optimal but more complex inverse-variance weighted method. More generally, we find that the unweighted and cluster-size-weighted summary methods perform well, with the relative efficiency of each largely determined systematically from the study design parameters. Performance of individual-data regression is adequate with small cluster sizes but becomes inefficient for large, unbalanced cluster sizes. 
When outcome prevalences are 6% or less and the within-cluster-within-period correlation is 0.05 or larger, all methods display sub-nominal confidence interval coverage, with the less prevalent the outcome the worse the coverage. As with all simulation studies, conclusions are limited to the configurations studied. We confined attention to detecting intervention effects on an absolute risk scale using marginal models and did not explore properties of binary random effects models. Cluster crossover designs with binary outcomes can be analysed using simple cluster summary methods, and sample size in unbalanced cluster size settings can be determined using relatively straightforward formulae. However, caution needs to be applied in situations with low prevalence outcomes and moderate to high intra-cluster correlations. © The Author(s) 2014.
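The unweighted cluster summary method that performed well in the study above can be sketched simply: each cluster contributes the difference in event proportions between its intervention and control periods, and the treatment effect estimate is the unweighted mean of those differences. The ICU numbers below are hypothetical, and this sketch assumes one period per arm per cluster:

```python
def unweighted_cluster_summary(clusters):
    """Unweighted cluster summary estimate of an absolute risk difference:
    average each cluster's within-cluster difference in event proportions
    between intervention and control periods."""
    diffs = []
    for c in clusters:
        p_treat = c["events_treat"] / c["n_treat"]
        p_ctrl = c["events_ctrl"] / c["n_ctrl"]
        diffs.append(p_treat - p_ctrl)
    return sum(diffs) / len(diffs)

# Hypothetical ICUs: in-hospital deaths / admissions in each period.
icus = [
    {"events_treat": 90, "n_treat": 1000, "events_ctrl": 100, "n_ctrl": 1000},
    {"events_treat": 80, "n_treat": 900,  "events_ctrl": 95,  "n_ctrl": 950},
    {"events_treat": 70, "n_treat": 800,  "events_ctrl": 85,  "n_ctrl": 820},
]
print(round(unweighted_cluster_summary(icus), 4))
```

The inverse-variance weighted variant mentioned in the abstract instead weights each cluster's difference by the reciprocal of its estimated variance, gaining efficiency at the cost of complexity.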
The most metal-poor Galactic globular cluster: the first spectroscopic observations of ESO280-SC06
NASA Astrophysics Data System (ADS)
Simpson, Jeffrey D.
2018-07-01
We present the first spectroscopic observations of the very metal-poor Milky Way globular cluster ESO280-SC06. Using spectra acquired with the 2dF/AAOmega spectrograph on the Anglo-Australian Telescope, we have identified 13 members of the cluster, and estimate from their infrared calcium triplet lines that the cluster has a metallicity of [Fe/H] = -2.48^{+0.06}_{-0.11}. This would make it the most metal-poor globular cluster known in the Milky Way. This result was verified with comparisons to three other metal-poor globular clusters that had been observed and analysed in the same manner. We also present new photometry of the cluster from EFOSC2 and SkyMapper and confirm that the cluster is located 22.9 ± 2.1 kpc from the Sun and 15.2 ± 2.1 kpc from the Galactic Centre, and has a radial velocity of 92.5^{+2.4}_{-1.6} km s^{-1}. These new data find the cluster to have a radius about half that previously estimated, and we find that the cluster has a dynamical mass of (12 ± 2) × 10^3 M⊙. Unfortunately, we lack reliable proper motions to fully characterize its orbit about the Galaxy. Intriguingly, the photometry suggests that the cluster lacks a well-populated horizontal branch, something that has not been observed in a cluster so ancient or metal poor.
The most metal-poor Galactic globular cluster: the first spectroscopic observations of ESO280-SC06
NASA Astrophysics Data System (ADS)
Simpson, Jeffrey D.
2018-04-01
We present the first spectroscopic observations of the very metal-poor Milky Way globular cluster ESO280-SC06. Using spectra acquired with the 2dF/AAOmega spectrograph on the Anglo-Australian Telescope, we have identified 13 members of the cluster, and estimate from their infrared calcium triplet lines that the cluster has a metallicity of [Fe/H] = {-2.48}^{+0.06}_{-0.11}. This would make it the most metal-poor globular cluster known in the Milky Way. This result was verified with comparisons to three other metal-poor globular clusters that had been observed and analyzed in the same manner. We also present new photometry of the cluster from EFOSC2 and SkyMapper and confirm that the cluster is located 22.9 ± 2.1 kpc from the Sun and 15.2 ± 2.1 kpc from the Galactic centre, and has a radial velocity of 92.5^{+2.4}_{-1.6} km s^{-1}. These new data find the cluster to have a radius about half that previously estimated, and we find that the cluster has a dynamical mass of (12 ± 2) × 10^3 M⊙. Unfortunately, we lack reliable proper motions to fully characterize its orbit about the Galaxy. Intriguingly, the photometry suggests that the cluster lacks a well-populated horizontal branch, something that has not been observed in a cluster so ancient or metal-poor.
Cosmological study with galaxy clusters detected by the Sunyaev-Zel'dovich effect
NASA Astrophysics Data System (ADS)
Mak, Suet-Ying
In this work, we present various studies to forecast the power of galaxy clusters detected by the Sunyaev-Zel'dovich (SZ) effect in constraining cosmological models. The SZ effect is regarded as one of the new and promising techniques for identifying and studying cluster physics. With the latest data released in recent years from SZ telescopes, it is essential to explore their potential for providing cosmological information and to investigate their relative strengths with respect to galaxy cluster data from X-ray and optical surveys, as well as other cosmological probes such as the Cosmic Microwave Background (CMB). One of the topics regards resolving the debate on the existence of an anomalous large-scale bulk flow as measured from the kinetic SZ signal of galaxy clusters in the WMAP CMB data. We predict that if such a measurement is made with the latest CMB data from the Planck satellite, the sensitivity will be improved by a factor of >5, providing an independent view of its existence. As it turns out, the Planck data, when analysed with the technique developed in this work, show that the observed bulk flow amplitude is consistent with that expected from LambdaCDM, in clear contradiction to the previous claim of a significant bulk flow detection in the WMAP data. We also forecast the capability of ongoing and future cluster surveys identified through the thermal SZ (tSZ) effect to constrain three extensions to the LambdaCDM model: the modified gravity f(R) model, primordial non-Gaussianity of density perturbations, and the presence of massive neutrinos. We do so by employing their effects on the cluster number count and power spectrum and using Fisher matrix analysis to estimate the errors on the model parameters. We find that SZ cluster surveys can provide vital complementary information to that expected from non-cluster probes. Our results therefore give confidence for pursuing these extended cosmological models with SZ clusters.
Feng, Sujuan; Qian, Xiaosong; Li, Han; Zhang, Xiaodong
2017-12-01
The aim of the present study was to investigate the effectiveness of the miR-17-92 cluster as a disease progression marker in prostate cancer (PCa). Reverse transcription-quantitative polymerase chain reaction analysis was used to detect the microRNA (miR)-17-92 cluster expression levels in tissues from patients with PCa or benign prostatic hyperplasia (BPH), as well as in PCa and BPH cell lines. Spearman correlation was used for comparison and estimation of correlations between miRNA expression levels and clinicopathological characteristics such as the Gleason score and prostate-specific antigen (PSA). Receiver operating characteristic (ROC) curve analysis was performed to evaluate the specificity and sensitivity of miR-17-92 cluster expression levels for discriminating patients with PCa from patients with BPH. Kaplan-Meier curves were plotted to investigate the predictive potential of the miR-17-92 cluster for PCa biochemical recurrence. Expression of the majority of miRNAs in the miR-17-92 cluster was identified to be significantly increased in PCa tissues and cell lines. Bivariate correlation analysis indicated that high expression of the upregulated miRNAs was positively correlated with Gleason grade, but had no significant association with PSA. ROC curves demonstrated that high expression of the miR-17-92 cluster predicted a higher diagnostic accuracy compared with PSA. Improved discriminating quotients were observed when combinations of the upregulated miRNAs with PSA were used. Survival analysis confirmed that a high combined miRNA score of the miR-17-92 cluster was associated with a shorter biochemical recurrence interval. The miR-17-92 cluster could be a potential diagnostic and prognostic biomarker for PCa, and the combination of the miR-17-92 cluster and serum PSA may enhance the accuracy of PCa diagnosis.
The SAMI Galaxy Survey: the cluster redshift survey, target selection and cluster properties
NASA Astrophysics Data System (ADS)
Owers, M. S.; Allen, J. T.; Baldry, I.; Bryant, J. J.; Cecil, G. N.; Cortese, L.; Croom, S. M.; Driver, S. P.; Fogarty, L. M. R.; Green, A. W.; Helmich, E.; de Jong, J. T. A.; Kuijken, K.; Mahajan, S.; McFarland, J.; Pracy, M. B.; Robotham, A. G. S.; Sikkema, G.; Sweet, S.; Taylor, E. N.; Verdoes Kleijn, G.; Bauer, A. E.; Bland-Hawthorn, J.; Brough, S.; Colless, M.; Couch, W. J.; Davies, R. L.; Drinkwater, M. J.; Goodwin, M.; Hopkins, A. M.; Konstantopoulos, I. S.; Foster, C.; Lawrence, J. S.; Lorente, N. P. F.; Medling, A. M.; Metcalfe, N.; Richards, S. N.; van de Sande, J.; Scott, N.; Shanks, T.; Sharp, R.; Thomas, A. D.; Tonini, C.
2017-06-01
We describe the selection of galaxies targeted in eight low-redshift clusters (APMCC0917, A168, A4038, EDCC442, A3880, A2399, A119 and A85; 0.029 < z < 0.058) as part of the Sydney-AAO Multi-Object Integral field spectrograph Galaxy Survey (SAMI-GS). We have conducted a redshift survey of these clusters using the AAOmega multi-object spectrograph on the 3.9-m Anglo-Australian Telescope. The redshift survey is used to determine cluster membership and to characterize the dynamical properties of the clusters. In combination with existing data, the survey resulted in 21 257 reliable redshift measurements and 2899 confirmed cluster member galaxies. Our redshift catalogue has a high spectroscopic completeness (˜94 per cent) for rpetro ≤ 19.4 and cluster-centric distances R < 2R200. We use the confirmed cluster member positions and redshifts to determine cluster velocity dispersion, R200, virial and caustic masses, as well as cluster structure. The clusters have virial masses 14.25 ≤ log(M200/M⊙) ≤ 15.19. The cluster sample exhibits a range of dynamical states, from relatively relaxed-appearing systems to clusters with strong indications of merger-related substructure. Aperture- and point-spread-function-matched photometry is derived from Sloan Digital Sky Survey and VLT Survey Telescope/ATLAS imaging and used to estimate stellar masses. These estimates, in combination with the redshifts, are used to define the input target catalogue for the cluster portion of the SAMI-GS. The primary SAMI-GS cluster targets have R
Baseline adjustments for binary data in repeated cross-sectional cluster randomized trials.
Nixon, R M; Thompson, S G
2003-09-15
Analysis of covariance models, which adjust for a baseline covariate, are often used to compare treatment groups in a controlled trial in which individuals are randomized. Such analysis adjusts for any baseline imbalance and usually increases the precision of the treatment effect estimate. We assess the value of such adjustments in the context of a cluster randomized trial with repeated cross-sectional design and a binary outcome. In such a design, a new sample of individuals is taken from the clusters at each measurement occasion, so that baseline adjustment has to be at the cluster level. Logistic regression models are used to analyse the data, with cluster level random effects to allow for different outcome probabilities in each cluster. We compare the estimated treatment effect and its precision in models that incorporate a covariate measuring the cluster level probabilities at baseline and those that do not. In two data sets, taken from a cluster randomized trial in the treatment of menorrhagia, the value of baseline adjustment is only evident when the number of subjects per cluster is large. We assess the generalizability of these findings by undertaking a simulation study, and find that increased precision of the treatment effect requires both large cluster sizes and substantial heterogeneity between clusters at baseline, but baseline imbalance arising by chance in a randomized study can always be effectively adjusted for. Copyright 2003 John Wiley & Sons, Ltd.
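The cluster-level baseline adjustment described above can be illustrated with a much-simplified two-stage sketch (not the authors' actual random-effects logistic model): compute each cluster's empirical baseline and follow-up log-odds, then regress the follow-up log-odds on treatment arm and the baseline log-odds. All cluster counts, sizes, and parameter values below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

n_clusters, m = 60, 200                 # clusters; subjects sampled per occasion
treat = np.repeat([0, 1], n_clusters // 2)
u = rng.normal(0.0, 0.5, n_clusters)    # between-cluster heterogeneity

def inv_logit(x):
    return 1.0 / (1.0 + np.exp(-x))

# Repeated cross-sectional design: a new sample of m subjects per occasion
true_effect = 0.8
y_base = rng.binomial(m, inv_logit(-1.0 + u))
y_post = rng.binomial(m, inv_logit(-1.0 + u + true_effect * treat))

def log_odds(y):
    # empirical cluster-level log-odds with a 0.5 continuity correction
    return np.log((y + 0.5) / (m - y + 0.5))

# ANCOVA-style model: follow-up log-odds ~ treatment + baseline log-odds
X = np.column_stack([np.ones(n_clusters), treat, log_odds(y_base)])
beta, *_ = np.linalg.lstsq(X, log_odds(y_post), rcond=None)
print(f"estimated treatment effect on the log-odds scale: {beta[1]:.2f}")
```

Because treatment is randomized, the treatment coefficient remains approximately unbiased even though the baseline covariate is measured with sampling error; what the covariate buys, as the abstract notes, is precision, and only when clusters are large enough for the empirical baseline log-odds to be reliable.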
Yoo, Illhoi; Hu, Xiaohua; Song, Il-Yeol
2007-11-27
A huge amount of biomedical textual information has been produced and collected in MEDLINE for decades. To make the biomedical information in this free text easier to use, document clustering and text summarization are used together as a solution to the text information overload problem. In this paper, we introduce a coherent graph-based semantic clustering and summarization approach for biomedical literature. Our extensive experimental results show that the approach achieves a 45% improvement in cluster quality and a 72% improvement in clustering reliability, in terms of the misclassification index, over Bisecting K-means, a leading document clustering approach. In addition, our approach provides a concise but rich text summary of key concepts and sentences. Our coherent biomedical literature clustering and summarization approach, which takes advantage of ontology-enriched graphical representations, significantly improves the quality of document clusters and the understandability of documents through summaries.
Yoo, Illhoi; Hu, Xiaohua; Song, Il-Yeol
2007-01-01
Background A huge amount of biomedical textual information has been produced and collected in MEDLINE for decades. To make the biomedical information in this free text easier to use, document clustering and text summarization are used together as a solution to the text information overload problem. In this paper, we introduce a coherent graph-based semantic clustering and summarization approach for biomedical literature. Results Our extensive experimental results show that the approach achieves a 45% improvement in cluster quality and a 72% improvement in clustering reliability, in terms of the misclassification index, over Bisecting K-means, a leading document clustering approach. In addition, our approach provides a concise but rich text summary of key concepts and sentences. Conclusion Our coherent biomedical literature clustering and summarization approach, which takes advantage of ontology-enriched graphical representations, significantly improves the quality of document clusters and the understandability of documents through summaries. PMID:18047705
Arnup, Sarah J; McKenzie, Joanne E; Hemming, Karla; Pilcher, David; Forbes, Andrew B
2017-08-15
In a cluster randomised crossover (CRXO) design, a sequence of interventions is assigned to a group, or 'cluster', of individuals. Each cluster receives each intervention in a separate period of time, forming 'cluster-periods'. Sample size calculations for CRXO trials need to account for both the cluster randomisation and crossover aspects of the design. Formulae are available for the two-period, two-intervention, cross-sectional CRXO design; however, implementation of these formulae is known to be suboptimal. The aims of this tutorial are to illustrate the intuition behind the design and to provide guidance on performing sample size calculations. Graphical illustrations are used to describe the effect of the cluster randomisation and crossover aspects of the design on the correlation between individual responses in a CRXO trial. Sample size calculations for binary and continuous outcomes are illustrated using parameters estimated from the Australia and New Zealand Intensive Care Society - Adult Patient Database (ANZICS-APD) for patient mortality and length of stay (LOS). The similarity between individual responses in a CRXO trial can be understood in terms of three components of variation: variation in the cluster mean response; variation in the cluster-period mean response; and variation between individual responses within a cluster-period; or equivalently in terms of the correlation between individual responses in the same cluster-period (within-cluster within-period correlation, WPC) and between individual responses in the same cluster but in different periods (within-cluster between-period correlation, BPC). The BPC lies between zero and the WPC. When the WPC and BPC are equal, the precision gained by the crossover aspect of the CRXO design equals the precision lost by cluster randomisation. When the BPC is zero, there is no advantage in a CRXO over a parallel-group cluster randomised trial.
Sample size calculations illustrate that small changes in the specification of the WPC or BPC can increase the required number of clusters. By illustrating how the parameters required for sample size calculations arise from the CRXO design and by providing guidance on both how to choose values for the parameters and perform the sample size calculations, the implementation of the sample size formulae for CRXO trials may improve.
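The role of the WPC and BPC can be made concrete with the design effect often quoted for the two-period, cross-sectional CRXO design, DE = 1 + (m − 1)·WPC − m·BPC. That form is taken here as an assumption based on the published tutorial literature; consult the tutorial's own formulae for the exact expressions. A minimal sketch:

```python
def design_effect_crxo(m, wpc, bpc):
    """Variance inflation for a two-period, cross-sectional CRXO design
    relative to individual randomisation (assumed form)."""
    return 1 + (m - 1) * wpc - m * bpc

def design_effect_parallel(m, icc):
    """Standard design effect for a parallel-group cluster randomised trial."""
    return 1 + (m - 1) * icc

m = 50  # subjects per cluster-period
# BPC = 0: the crossover buys nothing over a parallel-group cluster trial
print(design_effect_crxo(m, wpc=0.05, bpc=0.00))   # ~3.45, same as parallel
print(design_effect_parallel(m, icc=0.05))         # ~3.45
# BPC close to the WPC: the clustering penalty is essentially cancelled
print(design_effect_crxo(m, wpc=0.05, bpc=0.05))   # ~0.95
```

Multiplying an individually randomised sample size by this design effect shows directly why small changes in the assumed WPC or BPC can noticeably change the required number of clusters.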
Kilborn, Joshua P; Jones, David L; Peebles, Ernst B; Naar, David F
2017-04-01
Clustering data continues to be a highly active area of data analysis, and resemblance profiles are being incorporated into ecological methodologies as a hypothesis testing-based approach to clustering multivariate data. However, these new clustering techniques have not been rigorously tested to determine the performance variability based on the algorithm's assumptions or any underlying data structures. Here, we use simulation studies to estimate the statistical error rates for the hypothesis test for multivariate structure based on dissimilarity profiles (DISPROF). We concurrently tested a widely used algorithm that employs the unweighted pair group method with arithmetic mean (UPGMA) to estimate the proficiency of clustering with DISPROF as a decision criterion. We simulated unstructured multivariate data from different probability distributions with increasing numbers of objects and descriptors, and grouped data with increasing overlap, overdispersion for ecological data, and correlation among descriptors within groups. Using simulated data, we measured the resolution and correspondence of clustering solutions achieved by DISPROF with UPGMA against the reference grouping partitions used to simulate the structured test datasets. Our results highlight the dynamic interactions between dataset dimensionality, group overlap, and the properties of the descriptors within a group (i.e., overdispersion or correlation structure) that are relevant to resemblance profiles as a clustering criterion for multivariate data. These methods are particularly useful for multivariate ecological datasets that benefit from distance-based statistical analyses. We propose guidelines for using DISPROF as a clustering decision tool that will help future users avoid potential pitfalls during the application of methods and the interpretation of results.
Verdery, Ashton M; Siripong, Nalyn; Pence, Brian W
2017-09-01
The Philippines has seen rapid increases in HIV prevalence among people who inject drugs. We study 2 neighboring cities where a linked HIV epidemic differed in timing of onset and levels of prevalence. In Cebu, prevalence rose rapidly from below 1% to 54% between 2009 and 2011 and remained high through 2013. In nearby Mandaue, HIV remained below 4% through 2011, then rose rapidly to 38% by 2013. We hypothesize that the difference in infection prevalence between these cities may be due to aspects of social network structure, specifically levels of network clustering. Building on previous research, we hypothesize that higher levels of network clustering are associated with greater epidemic potential. Data were collected with respondent-driven sampling among men who inject drugs in Cebu and Mandaue in 2013. We first examine sample composition using estimators for population means. We then apply new estimators of network clustering in respondent-driven sampling data to examine associations with HIV prevalence. Samples in both cities were comparable in composition by age, education, and injection locations. Dyadic needle-sharing levels were also similar between the 2 cities, but clustering in the needle-sharing network differed dramatically. We found higher clustering in Cebu than in Mandaue, consistent with expectations that higher clustering is associated with faster epidemic spread. This article is the first to apply estimators of network clustering to empirical respondent-driven samples, and it offers suggestive evidence that researchers should pay greater attention to the role of network structure in HIV transmission dynamics.
McGovern, Mark E.; Canning, David
2015-01-01
Based on models with calibrated parameters for infection, case fatality rates, and vaccine efficacy, basic childhood vaccinations have been estimated to be highly cost effective. We estimated the association of vaccination with mortality directly from survey data. Using 149 cross-sectional Demographic and Health Surveys, we determined the relationship between vaccination coverage and the probability of dying between birth and 5 years of age at the survey cluster level. Our data included approximately 1 million children in 68,490 clusters from 62 countries. We considered the childhood measles, bacillus Calmette-Guérin, diphtheria-pertussis-tetanus, polio, and maternal tetanus vaccinations. Using modified Poisson regression to estimate the relative risk of child mortality in each cluster, we also adjusted for selection bias that resulted from the vaccination status of dead children not being reported. Childhood vaccination, and in particular measles and tetanus vaccination, is associated with substantial reductions in childhood mortality. We estimated that children in clusters with complete vaccination coverage have a relative risk of mortality that is 0.73 (95% confidence interval: 0.68, 0.77) times that of children in a cluster with no vaccinations. Although widely used, basic vaccines still have coverage rates well below 100% in many countries, and our results emphasize the effectiveness of increasing coverage rates in order to reduce child mortality. PMID:26453618
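As an illustration of the kind of cluster-level relative-risk comparison reported above, the sketch below computes a risk ratio with a normal-approximation confidence interval on the log scale. It is a generic two-group calculation, not the study's modified Poisson regression, and the counts are hypothetical (chosen only so the point estimate lands near the reported 0.73); it also treats children as independent, ignoring clustering and the selection-bias adjustment.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def relative_risk(events_a, n_a, events_b, n_b, alpha=0.05):
    """Risk ratio of group A vs group B with a normal-approximation CI
    on the log scale (illustrative; assumes independent outcomes)."""
    p_a, p_b = events_a / n_a, events_b / n_b
    rr = p_a / p_b
    se = sqrt((1 - p_a) / events_a + (1 - p_b) / events_b)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return rr, exp(log(rr) - z * se), exp(log(rr) + z * se)

# Hypothetical under-5 deaths in fully vaccinated vs unvaccinated clusters
rr, lo, hi = relative_risk(365, 10_000, 500, 10_000)
print(f"RR = {rr:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```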
Luo, Junhai; Fu, Liang
2017-06-09
With the development of communication technology, the demand for location-based services is growing rapidly. This paper presents an algorithm for indoor localization based on Received Signal Strength (RSS) collected from Access Points (APs). The proposed localization algorithm comprises an offline information acquisition phase and an online positioning phase. Firstly, the AP selection algorithm is reviewed and improved based on the stability of signals to remove useless APs; secondly, Kernel Principal Component Analysis (KPCA) is analyzed and used to remove data redundancy while retaining useful characteristics through nonlinear feature extraction; thirdly, the Affinity Propagation Clustering (APC) algorithm utilizes RSS values to classify data samples and narrow the positioning range. In the online positioning phase, the classified data are matched with the testing data to determine the position area, and Maximum Likelihood (ML) estimation is employed for precise positioning. Finally, the proposed algorithm is implemented in a real-world environment for performance evaluation. Experimental results demonstrate that the proposed algorithm improves localization accuracy and reduces computational complexity.
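A much-simplified sketch of the two-phase pipeline: cluster the offline RSS fingerprints, pick the nearest cluster for a new reading, then choose the best-matching fingerprint within that cluster by a Gaussian maximum-likelihood score. The KPCA step and AP-selection heuristics are omitted, plain k-means stands in for Affinity Propagation, and all data are synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)

# Offline phase: RSS fingerprints (dBm, one column per AP) at known positions
positions = np.array([[0, 0], [0, 5], [5, 0], [5, 5]], dtype=float)
fingerprints = np.array([
    [-40, -70, -75, -90],
    [-45, -50, -88, -80],
    [-72, -85, -42, -74],
    [-80, -78, -55, -48],
], dtype=float)

# Crude two-cluster k-means partition of the fingerprints (stand-in for APC)
k = 2
centroids = fingerprints[:k].copy()
for _ in range(10):
    labels = ((fingerprints[:, None] - centroids) ** 2).sum(-1).argmin(1)
    centroids = np.array([fingerprints[labels == j].mean(0) for j in range(k)])

def locate(rss, sigma=4.0):
    """Online phase: pick the nearest cluster, then the maximum-likelihood
    fingerprint within it under an isotropic Gaussian RSS-noise model."""
    c = ((rss - centroids) ** 2).sum(-1).argmin()
    idx = np.flatnonzero(labels == c)
    loglik = -((rss - fingerprints[idx]) ** 2).sum(-1) / (2 * sigma**2)
    return positions[idx[loglik.argmax()]]

# A noisy reading taken near the reference point at (0, 5)
reading = fingerprints[1] + rng.normal(0.0, 3.0, 4)
print(locate(reading))
```

Restricting the likelihood search to one cluster is what narrows the positioning range and cuts the online cost, the same design motivation as in the paper.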
DOE Office of Scientific and Technical Information (OSTI.GOV)
Battaglia, N.; Miyatake, H.; Hasselfield, M.
Mass calibration uncertainty is the largest systematic effect for using clusters of galaxies to constrain cosmological parameters. We present weak lensing mass measurements from the Canada-France-Hawaii Telescope Stripe 82 Survey for galaxy clusters selected through their high signal-to-noise thermal Sunyaev-Zeldovich (tSZ) signal measured with the Atacama Cosmology Telescope (ACT). For a sample of 9 ACT clusters with a tSZ signal-to-noise greater than five, the average weak lensing mass is (4.8 ± 0.8) × 10^14 M⊙, consistent with the tSZ mass estimate of (4.70 ± 1.0) × 10^14 M⊙, which assumes a universal pressure profile for the cluster gas. Our results are consistent with previous weak-lensing measurements of tSZ-detected clusters from the Planck satellite. When comparing our results, we estimate the Eddington bias correction for the sample intersection of Planck and weak-lensing clusters, which was previously excluded.
The peculiar cluster MACS J0417.5-1154 in the C and X-bands
NASA Astrophysics Data System (ADS)
Sandhu, Pritpal; Malu, Siddharth; Raja, Ramij; Datta, Abhirup
2018-06-01
We present 5.5 and 9.0 GHz Australia Telescope Compact Array (ATCA) observations of the cluster MACSJ0417.5-1154, one of the most massive and most X-ray-luminous galaxy clusters in the Massive Cluster Survey (MACS). We estimate diffuse emission at 5.5 and 9.0 GHz from our ATCA observations, and compare the results with the 235 MHz and 610 MHz GMRT observations and 1575 MHz VLA observations. We also estimate the diffuse emission at low frequencies from existing GLEAM survey data (taken with the MWA telescope (http://www.mwatelescope.org)), and find that the steepening reported in earlier studies may have been an artefact of underestimates of diffuse emission at low frequencies. High-frequency radio observations of galaxy cluster mergers therefore provide an important complement to low-frequency observations, not only for probing the `on' and `off' states of radio halos in these mergers, but also for constraining the energetics of cluster mergers. We comment on future directions that further studies of this cluster can take.
A roadmap of clustering algorithms: finding a match for a biomedical application.
Andreopoulos, Bill; An, Aijun; Wang, Xiaogang; Schroeder, Michael
2009-05-01
Clustering is ubiquitously applied in bioinformatics with hierarchical clustering and k-means partitioning being the most popular methods. Numerous improvements of these two clustering methods have been introduced, as well as completely different approaches such as grid-based, density-based and model-based clustering. For improved bioinformatics analysis of data, it is important to match clusterings to the requirements of a biomedical application. In this article, we present a set of desirable clustering features that are used as evaluation criteria for clustering algorithms. We review 40 different clustering algorithms of all approaches and datatypes. We compare algorithms on the basis of desirable clustering features, and outline algorithms' benefits and drawbacks as a basis for matching them to biomedical applications.
Evaluation of large area crop estimation techniques using LANDSAT and ground-derived data. [Missouri
NASA Technical Reports Server (NTRS)
Amis, M. L.; Lennington, R. K.; Martin, M. V.; Mcguire, W. G.; Shen, S. S. (Principal Investigator)
1981-01-01
The results of the Domestic Crops and Land Cover Classification and Clustering study on large area crop estimation using LANDSAT and ground truth data are reported. The current crop area estimation approach of the Economics and Statistics Service of the U.S. Department of Agriculture was evaluated in terms of the factors that are likely to influence the bias and variance of the estimator. Also, alternative procedures involving replacements for the clustering algorithm, the classifier, or the regression model used in the original U.S. Department of Agriculture procedures were investigated.
FAR-FLUNG GALAXY CLUSTERS MAY REVEAL FATE OF UNIVERSE
NASA Technical Reports Server (NTRS)
2002-01-01
A selection of NASA Hubble Space Telescope snapshots of huge galaxy clusters that lie far away and far back in time. These are selected from a catalog of 92 new clusters uncovered during a six-year Hubble observing program known as the Medium Deep Survey. If the distances and masses of the clusters are confirmed by ground-based telescopes, the survey may hold clues to how galaxies quickly formed into massive large-scale structures after the big bang, and what that may mean for the eventual fate of the expanding universe. The images are each a combination of two exposures in yellow and deep red taken with Hubble's Wide Field and Planetary Camera 2. Each cluster's distance is inferred from the reddening of the starlight, which is due to the expansion of space. Astronomers assume these clusters all formed early in the history of the universe. HST133617-00529 (left) This collection of spiral and elliptical galaxies lies an estimated 4 to 6 billion light-years away. It is in the constellation of Virgo not far from the 3rd magnitude star Zeta Virginis. The brighter galaxies in this cluster have red magnitudes between 20 and 22, near the limit of the Palomar Sky Survey. The bright blue galaxy (upper left) is probably a foreground galaxy, and not a cluster member. The larger of the galaxies in the cluster are probably about the size of our Milky Way Galaxy. The diagonal line at lower right is an artificial satellite trail. HST002013+28366 (upper right) This cluster of galaxies lies in the constellation of Andromeda, a few degrees from the star Alpheratz in the northeast corner of the constellation Pegasus. It is at an estimated distance of 4 billion light-years, which means the light we are seeing from the cluster is as it appeared when the universe was roughly 2/3 of its present age. HST035528+09435 (lower right) At an estimated distance of about 7 to 10 billion light-years (z=1), this is one of the farthest clusters in the Hubble sample.
The cluster lies in the constellation of Taurus. Credit: K. Ratnatunga, R. Griffiths (Carnegie Mellon University); and NASA
Zhou, Huan; Sun, Shuai; Sylvia, Sean; Yue, Ai; Shi, Yaojiang; Zhang, Linxiu; Medina, Alexis; Rozelle, Scott
2016-01-01
Objectives. To test whether text message reminders sent to caregivers improve the effectiveness of a home micronutrient fortification program in western China. Methods. We carried out a cluster-randomized controlled trial in 351 villages (clusters) in Shaanxi Province in 2013 and 2014, enrolling children aged 6 to 12 months. We randomly assigned each village to 1 of 3 groups: free delivery group, text messaging group, or control group. We collected information on compliance with treatments and hemoglobin concentrations from all children at baseline and 6-month follow-up. We estimated the intent-to-treat effects on compliance and child anemia using a logistic regression model. Results. There were 1393 eligible children. We found that assignment to the text messaging group led to an increase in full compliance (marginal effect = 0.10; 95% confidence interval [CI] = 0.03, 0.16) compared with the free delivery group and decrease in the rate of anemia at end line relative to the control group (marginal effect = −0.07; 95% CI = −0.12, −0.01), but not relative to the free delivery group (marginal effect = −0.03; 95% CI = −0.09, 0.03). Conclusions. Text messages improved compliance of caregivers to a home fortification program and children’s nutrition. PMID:27077354
Stiell, Ian G.; Callaway, Clif; Davis, Dan; Terndrup, Tom; Powell, Judy; Cook, Andrea; Kudenchuk, Peter J.; Daya, Mohamud; Kerber, Richard; Idris, Ahamed; Morrison, Laurie J.; Aufderheide, Tom
2008-01-01
Objective The primary objective of the trial is to compare survival to hospital discharge with Modified Rankin Score (MRS) ≤3 between a strategy that prioritizes a specified period of CPR before rhythm analysis (Analyze Later) versus a strategy of minimal CPR followed by early rhythm analysis (Analyze Early) in patients with out-of-hospital cardiac arrest. Methods Design Cluster randomized trial with cluster units defined by geographic region, or monitor/defibrillator machine. Population Adults treated by Emergency Medical Service (EMS) providers for non-traumatic out-of-hospital cardiac arrest not witnessed by EMS. Setting EMS systems participating in the Resuscitation Outcomes Consortium and agreeing to cluster randomization to the Analyze Later versus Analyze Early intervention in a crossover fashion. Sample Size Based on a two-sided significance level of 0.05, a maximum of 13,239 evaluable patients will allow statistical power of 0.996 to detect a hypothesized improvement in the probability of survival to discharge with MRS ≤ 3 rate from 5.41% after Analyze Early to 7.45% after Analyze Later (2.04% absolute increase in primary outcome). Conclusion If this trial demonstrates a significant improvement in survival with a strategy of Analyze Later, it is estimated that 4,000 premature deaths from cardiac arrest would be averted annually in North America alone. PMID:18487004
Improving Spectral Image Classification through Band-Ratio Optimization and Pixel Clustering
NASA Astrophysics Data System (ADS)
O'Neill, M.; Burt, C.; McKenna, I.; Kimblin, C.
2017-12-01
The Underground Nuclear Explosion Signatures Experiment (UNESE) seeks to characterize non-prompt observables from underground nuclear explosions (UNE). As part of this effort, we evaluated the ability of DigitalGlobe's WorldView-3 (WV3) to detect and map UNE signatures. WV3 is the current state-of-the-art, commercial, multispectral imaging satellite; however, it has relatively limited spectral and spatial resolutions. These limitations impede image classifiers from detecting targets that are spatially small and lack distinct spectral features. In order to improve classification results, we developed custom algorithms to reduce false positive rates while increasing true positive rates via a band-ratio optimization and pixel clustering front-end. The clusters resulting from these algorithms were processed with standard spectral image classifiers such as Mixture-Tuned Matched Filter (MTMF) and Adaptive Coherence Estimator (ACE). WV3 and AVIRIS data of Cuprite, Nevada, were used as a validation data set. These data were processed with a standard classification approach using MTMF and ACE algorithms. They were also processed using the custom front-end prior to the standard approach. A comparison of the results shows that the custom front-end significantly increases the true positive rate and decreases the false positive rate. This work was done by National Security Technologies, LLC, under Contract No. DE-AC52-06NA25946 with the U.S. Department of Energy. DOE/NV/25946-3283.
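The front-end idea — derive a band ratio that amplifies a target's spectral contrast, then cluster pixels on that ratio before handing the flagged pixels to a classifier — can be sketched generically. The actual UNESE algorithms, band choices, and thresholds are not given in the abstract, so everything below (bands, scene, cluster count) is an illustrative assumption on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic 3-band image: flat background plus a small "target" patch whose
# band-2 / band-1 ratio is elevated
img = rng.normal(100.0, 5.0, size=(32, 32, 3))
img[10:14, 20:24, 1] *= 1.5

# Band-ratio feature image (stand-in for an optimized band ratio)
ratio = img[:, :, 1] / img[:, :, 0]

# Two-class 1-D k-means on the ratio values (the pixel-clustering front-end)
x = ratio.ravel()
centers = np.array([x.min(), x.max()])
for _ in range(20):
    assign = np.abs(x[:, None] - centers).argmin(1)
    centers = np.array([x[assign == 0].mean(), x[assign == 1].mean()])

# Only pixels in the high-ratio cluster go on to the spectral classifier,
# shrinking the candidate set and thus the false positive rate
mask = (assign == centers.argmax()).reshape(ratio.shape)
print("pixels flagged by the front-end:", int(mask.sum()))
```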
Stiell, Ian G; Callaway, Clif; Davis, Dan; Terndrup, Tom; Powell, Judy; Cook, Andrea; Kudenchuk, Peter J; Daya, Mohamud; Kerber, Richard; Idris, Ahamed; Morrison, Laurie J; Aufderheide, Tom
2008-08-01
The primary objective of the trial is to compare survival to hospital discharge with modified Rankin score (MRS) ≤3 between a strategy that prioritizes a specified period of CPR before rhythm analysis (Analyze Later) versus a strategy of minimal CPR followed by early rhythm analysis (Analyze Early) in patients with out-of-hospital cardiac arrest. Design: Cluster randomized trial with cluster units defined by geographic region, or monitor/defibrillator machine. Population: Adults treated by emergency medical service (EMS) providers for non-traumatic out-of-hospital cardiac arrest not witnessed by EMS. Setting: EMS systems participating in the Resuscitation Outcomes Consortium and agreeing to cluster randomization to the Analyze Later versus Analyze Early intervention in a crossover fashion. Sample size: Based on a two-sided significance level of 0.05, a maximum of 13,239 evaluable patients will allow statistical power of 0.996 to detect a hypothesized improvement in the probability of survival to discharge with MRS ≤3 from 5.41% after Analyze Early to 7.45% after Analyze Later (2.04% absolute increase in primary outcome). If this trial demonstrates a significant improvement in survival with a strategy of Analyze Later, it is estimated that 4000 premature deaths from cardiac arrest would be averted annually in North America alone.
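The quoted power can be reproduced approximately with a standard two-proportion calculation. This sketch ignores the variance inflation from cluster randomization and crossover, which the trial's full calculation accounts for, so the result here is only in the right neighbourhood of the stated 0.996:

```python
from math import sqrt
from statistics import NormalDist

def power_two_proportions(p1, p2, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions,
    assuming independently randomized individuals (no design effect)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    se = sqrt(p1 * (1 - p1) / n_per_arm + p2 * (1 - p2) / n_per_arm)
    return NormalDist().cdf(abs(p2 - p1) / se - z_alpha)

# 13,239 evaluable patients split evenly between Analyze Early and Analyze Later
power = power_two_proportions(0.0541, 0.0745, 13_239 // 2)
print(f"approximate power: {power:.3f}")
```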
NASA Astrophysics Data System (ADS)
Higaki, Tatsuya; Kitazawa, Hirokazu; Yamazoe, Seiji; Tsukuda, Tatsuya
2016-06-01
Iridium clusters nominally composed of 15, 30 or 60 atoms were size-selectively synthesized within OH-terminated poly(amidoamine) dendrimers of generation 6. Spectroscopic characterization revealed that the Ir clusters were partially oxidized. All the Ir clusters efficiently converted 2-nitrobenzaldehyde to anthranil and 2-aminobenzaldehyde under atmospheric hydrogen at room temperature in toluene via selective hydrogenation of the NO2 group. The selectivity toward 2-aminobenzaldehyde over anthranil was improved with the reduction of the cluster size. The improved selectivity is ascribed to more efficient reduction than intramolecular heterocyclization of a hydroxylamine intermediate on smaller clusters that have a higher Ir(0)-phase population on the surface. Electronic supplementary information (ESI) available. See DOI: 10.1039/c6nr01460g
Baryon Distribution in Galaxy Clusters as a Result of Sedimentation of Helium Nuclei.
Qin; Wu
2000-01-20
Heavy particles in galaxy clusters tend to be more centrally concentrated than light ones according to the Boltzmann distribution. An estimate of the drift velocity suggests that it is possible that the helium nuclei may have entirely or partially sedimented into the cluster core within the Hubble time. We demonstrate this scenario using the Navarro-Frenk-White profile as the dark matter distribution of clusters and assuming that the intracluster gas is isothermal and in hydrostatic equilibrium. We find that a greater fraction of baryonic matter is distributed at small radii than at large radii, which challenges the prevailing claim that the baryon fraction increases monotonically with cluster radius. It shows that the conventional mass estimate using X-ray measurements of intracluster gas along with a constant mean molecular weight may have underestimated the total cluster mass by approximately 20%, which in turn leads to an overestimate of the total baryon fraction by the same percentage. Additionally, it is pointed out that the sedimentation of helium nuclei toward cluster cores may at least partially account for the sharp peaks in the central X-ray emissions observed in some clusters.
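The roughly 20% mass bias can be illustrated with the standard mean-molecular-weight formula for a fully ionized gas, since the isothermal hydrostatic mass scales as M ∝ 1/μ. The mass fractions below (X = 0.75, Y = 0.25 for primordial gas, versus pure hydrogen after complete helium sedimentation out of the region) are illustrative assumptions, not values from the paper:

```python
def mu_ionized(X, Y, Z=0.0):
    """Mean molecular weight of a fully ionized gas with hydrogen,
    helium and metal mass fractions X, Y, Z."""
    return 1.0 / (2.0 * X + 0.75 * Y + 0.5 * Z)

mu_primordial = mu_ionized(0.75, 0.25)   # ~0.59, the conventional assumption
mu_hydrogen = mu_ionized(1.0, 0.0)       # 0.5, helium fully sedimented away

# M ∝ 1/μ in isothermal hydrostatic equilibrium, so assuming the
# primordial μ where the gas is actually He-depleted underestimates M by:
bias = 1.0 - mu_hydrogen / mu_primordial   # ~0.16, same order as the ~20% quoted
```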
Low-end mass function of the Quintuplet cluster
NASA Astrophysics Data System (ADS)
Shin, Jihye; Kim, Sungsoo S.
2016-08-01
The Quintuplet and Arches clusters, which were formed in the harsh environment of the Galactic Centre (GC) a few million years ago, have been excellent targets for studying the effects of a star-forming environment on the initial mass function (IMF). In order to estimate the shape of the low-end IMF of the Arches cluster, Shin & Kim devised a novel photometric method that utilizes pixel intensity histograms (PIHs) of the observed images. Here, we apply the PIH method to the Quintuplet cluster and estimate the shape of its low-end IMF below the completeness limit of conventional photometry. We found that the low-end IMF of the Quintuplet is consistent with that found for the Arches cluster, a Kroupa mass function with a significant number of low-mass stars below 1 M⊙. We conclude that the most likely IMFs of the Quintuplet and the Arches clusters are not too different from the IMFs found in the Galactic disc. We also find that the observed PIHs and stellar number density profiles of both clusters are best reproduced when the clusters are assumed to be at three-dimensional distances of approximately 100 pc from the GC.
NASA Astrophysics Data System (ADS)
Czarski, T.; Chernyshova, M.; Malinowski, K.; Pozniak, K. T.; Kasprowicz, G.; Kolasinski, P.; Krawczyk, R.; Wojenski, A.; Zabolotny, W.
2016-11-01
The measurement system, based on a gas electron multiplier detector, is developed for soft X-ray diagnostics of tokamak plasmas. The multi-channel setup is designed to estimate the energy and position distribution of an X-ray source. The central measurement task is identifying each charge cluster and estimating its value and position. A fast and accurate serial data acquisition mode is applied for dynamic plasma diagnostics. The charge clusters are counted in a space determined by 2D position, charge value, and time interval. Radiation source characteristics are presented as histograms over selected ranges of position, time interval, and cluster charge value corresponding to the energy spectra.
Andridge, Rebecca R.
2011-01-01
In cluster randomized trials (CRTs), identifiable clusters rather than individuals are randomized to study groups. Resulting data often consist of a small number of clusters with correlated observations within a treatment group. Missing data often present a problem in the analysis of such trials, and multiple imputation (MI) has been used to create complete data sets, enabling subsequent analysis with well-established analysis methods for CRTs. We discuss strategies for accounting for clustering when multiply imputing a missing continuous outcome, focusing on estimation of the variance of group means as used in an adjusted t-test or ANOVA. These analysis procedures are congenial to (can be derived from) a mixed effects imputation model; however, this imputation procedure is not yet available in commercial statistical software. An alternative approach that is readily available and has been used in recent studies is to include fixed effects for cluster, but the impact of using this convenient method has not been studied. We show that under this imputation model the MI variance estimator is positively biased and that smaller ICCs lead to larger overestimation of the MI variance. Analytical expressions for the bias of the variance estimator are derived in the case of data missing completely at random (MCAR), and cases in which data are missing at random (MAR) are illustrated through simulation. Finally, various imputation methods are applied to data from the Detroit Middle School Asthma Project, a recent school-based CRT, and differences in inference are compared. PMID:21259309
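For context, the variance of a group mean that these MI variance estimators target is inflated by within-cluster correlation according to the familiar design-effect formula. A minimal sketch of that standard textbook expression (not the paper's derivation of the MI bias):

```python
def var_group_mean(sigma2, icc, n_clusters, cluster_size):
    """Variance of a treatment-group mean when observations within a
    cluster share an exchangeable correlation `icc`: the independent-data
    variance sigma2/N is inflated by the design effect 1 + (m - 1)*icc."""
    deff = 1.0 + (cluster_size - 1.0) * icc
    return sigma2 * deff / (n_clusters * cluster_size)

v_indep = var_group_mean(1.0, 0.00, 10, 20)   # no clustering: sigma2 / 200
v_clust = var_group_mean(1.0, 0.05, 10, 20)   # modest ICC of 0.05, deff = 1.95
```

Even a small ICC of 0.05 nearly doubles the variance here, which is why an imputation model that distorts the implied ICC can noticeably bias downstream inference.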
Impact of Sampling Density on the Extent of HIV Clustering
Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor
2014-01-01
Identifying and monitoring HIV clusters could be useful in tracking the leading edge of HIV transmission in epidemics. Currently, greater specificity in the definition of HIV clusters is needed to reduce confusion in the interpretation of HIV clustering results. We address sampling density as one of the key aspects of HIV cluster analysis. The proportion of viral sequences in clusters was estimated at sampling densities from 1.0% to 70%. A set of 1,248 HIV-1C env gp120 V1C5 sequences from a single community in Botswana was utilized in simulation studies. Matching numbers of HIV-1C V1C5 sequences from the LANL HIV Database were used as comparators. HIV clusters were identified by phylogenetic inference under bootstrapped maximum likelihood and pairwise distance cut-offs. Sampling density below 10% was associated with stochastic HIV clustering with broad confidence intervals. HIV clustering increased linearly at sampling density >10%, and was accompanied by narrowing confidence intervals. Patterns of HIV clustering were similar at bootstrap thresholds 0.7 to 1.0, but the extent of HIV clustering decreased with higher bootstrap thresholds. The origin of sampling (local concentrated vs. scattered global) had a substantial impact on HIV clustering at sampling densities ≥10%. Pairwise distances at 10% were estimated as a threshold for cluster analysis of HIV-1 V1C5 sequences. The node bootstrap support distribution provided additional evidence for 10% sampling density as the threshold for HIV cluster analysis. The detectability of HIV clusters is substantially affected by sampling density. A minimal genotyping density of 10% and sampling density of 50–70% are suggested for HIV-1 V1C5 cluster analysis. PMID:25275430
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Wei-Chen; Maitra, Ranjan
2011-01-01
We propose a model-based approach for clustering time series regression data in an unsupervised machine learning framework to identify groups under the assumption that each mixture component follows a Gaussian autoregressive regression model of order p. Given the number of groups, the traditional maximum likelihood approach of estimating the parameters using the expectation-maximization (EM) algorithm can be employed, although it is computationally demanding. The somewhat fast tune to the EM folk song provided by the Alternating Expectation Conditional Maximization (AECM) algorithm can alleviate the problem to some extent. In this article, we develop an alternative partial expectation conditional maximization algorithm (APECM) that uses an additional data augmentation storage step to efficiently implement AECM for finite mixture models. Results on our simulation experiments show improved performance in both fewer numbers of iterations and computation time. The methodology is applied to the problem of clustering mutual funds data on the basis of their average annual per cent returns and in the presence of economic indicators.
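As a minimal stand-in for the mixture fitting described above, the plain EM updates for a two-component one-dimensional Gaussian mixture: this is neither the authors' AECM/APECM nor their autoregressive regression structure, just the basic E-step/M-step cycle both methods accelerate:

```python
import math, random

def em_gmm_1d(x, iters=100):
    """Vanilla EM for a two-component 1D Gaussian mixture: the E-step
    computes responsibilities, the M-step re-estimates weights, means
    and variances from weighted sample moments."""
    mu = [min(x), max(x)]          # crude but effective initialisation
    var = [1.0, 1.0]
    w = [0.5, 0.5]
    for _ in range(iters):
        # E-step: posterior probability of each component for each point
        resp = []
        for xi in x:
            dens = [w[k] * math.exp(-(xi - mu[k]) ** 2 / (2 * var[k]))
                    / math.sqrt(2 * math.pi * var[k]) for k in range(2)]
            s = dens[0] + dens[1]
            resp.append([d / s for d in dens])
        # M-step: weighted moments, with a floor on the variances
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / len(x)
            mu[k] = sum(r[k] * xi for r, xi in zip(resp, x)) / nk
            var[k] = max(sum(r[k] * (xi - mu[k]) ** 2
                             for r, xi in zip(resp, x)) / nk, 1e-9)
    return w, mu, var

# synthetic data: two well-separated components
random.seed(2)
x = [random.gauss(0.0, 1.0) for _ in range(200)] + \
    [random.gauss(8.0, 1.0) for _ in range(200)]
w, mu, var = em_gmm_1d(x)
```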
Brown, Andrew W; Li, Peng; Bohan Brown, Michelle M; Kaiser, Kathryn A; Keith, Scott W; Oakes, J Michael; Allison, David B
2015-08-01
Cluster randomized controlled trials (cRCTs; also known as group randomized trials and community-randomized trials) are multilevel experiments in which units that are randomly assigned to experimental conditions are sets of grouped individuals, whereas outcomes are recorded at the individual level. In human cRCTs, clusters that are randomly assigned are typically families, classrooms, schools, worksites, or counties. With growing interest in community-based, public health, and policy interventions to reduce obesity or improve nutrition, the use of cRCTs has increased. Errors in the design, analysis, and interpretation of cRCTs are unfortunately all too common. This situation seems to stem in part from investigator confusion about how the unit of randomization affects causal inferences and the statistical procedures required for the valid estimation and testing of effects. In this article, we provide a brief introduction and overview of the importance of cRCTs and highlight and explain important considerations for the design, analysis, and reporting of cRCTs by using published examples. © 2015 American Society for Nutrition.
Protein family clustering for structural genomics.
Yan, Yongpan; Moult, John
2005-10-28
A major goal of structural genomics is the provision of a structural template for a large fraction of protein domains. The magnitude of this task depends on the number and nature of protein sequence families. With a large number of bacterial genomes now fully sequenced, it is possible to obtain improved estimates of the number and diversity of families in that kingdom. We have used an automated clustering procedure to group all sequences in a set of genomes into protein families. Benchmarking shows the clustering method is sensitive at detecting remote family members, and has a low level of false positives. This comprehensive protein family set has been used to address the following questions. (1) What is the structure coverage for currently known families? (2) How will the number of known apparent families grow as more genomes are sequenced? (3) What is a practical strategy for maximizing structure coverage in the future? Our study indicates that approximately 20% of known families with three or more members currently have a representative structure. The study indicates also that the number of apparent protein families will be considerably larger than previously thought: we estimate that, by the criteria of this work, there will be about 250,000 protein families when 1000 microbial genomes have been sequenced. However, the vast majority of these families will be small, and it will be possible to obtain structural templates for 70-80% of protein domains with an achievable number of representative structures, by systematically sampling the larger families.
Bordner, Andrew J; Gorin, Andrey A
2008-05-12
Protein-protein interactions are ubiquitous and essential for all cellular processes. High-resolution X-ray crystallographic structures of protein complexes can reveal the details of their function and provide a basis for many computational and experimental approaches. Differentiation between biological and non-biological contacts and reconstruction of the intact complex is a challenging computational problem. A successful solution can provide additional insights into the fundamental principles of biological recognition and reduce errors in many algorithms and databases utilizing interaction information extracted from the Protein Data Bank (PDB). We have developed a method for identifying protein complexes in the PDB X-ray structures by a four step procedure: (1) comprehensively collecting all protein-protein interfaces; (2) clustering similar protein-protein interfaces together; (3) estimating the probability that each cluster is relevant based on a diverse set of properties; and (4) combining these scores for each PDB entry in order to predict the complex structure. The resulting clusters of biologically relevant interfaces provide a reliable catalog of evolutionary conserved protein-protein interactions. These interfaces, as well as the predicted protein complexes, are available from the Protein Interface Server (PInS) website (see Availability and requirements section). Our method demonstrates an almost two-fold reduction of the annotation error rate as evaluated on a large benchmark set of complexes validated from the literature. We also estimate relative contributions of each interface property to the accurate discrimination of biologically relevant interfaces and discuss possible directions for further improving the prediction method.
The Effect of Small Sample Size on Two-Level Model Estimates: A Review and Illustration
ERIC Educational Resources Information Center
McNeish, Daniel M.; Stapleton, Laura M.
2016-01-01
Multilevel models are an increasingly popular method to analyze data that originate from a clustered or hierarchical structure. To effectively utilize multilevel models, one must have an adequately large number of clusters; otherwise, some model parameters will be estimated with bias. The goals for this paper are to (1) raise awareness of the…
Marginal regression approach for additive hazards models with clustered current status data.
Su, Pei-Fang; Chi, Yunchan
2014-01-15
Current status data arise naturally from tumorigenicity experiments, epidemiology studies, biomedicine, econometrics and demographic and sociology studies. Moreover, clustered current status data may occur with animals from the same litter in tumorigenicity experiments or with subjects from the same family in epidemiology studies. Because the only information extracted from current status data is whether the survival times are before or after the monitoring or censoring times, the nonparametric maximum likelihood estimator of survival function converges at a rate of n^(1/3) to a complicated limiting distribution. Hence, semiparametric regression models such as the additive hazards model have been extended for independent current status data to derive the test statistics, whose distributions converge at a rate of n^(1/2), for testing the regression parameters. However, a straightforward application of these statistical methods to clustered current status data is not appropriate because intracluster correlation needs to be taken into account. Therefore, this paper proposes two estimating functions for estimating the parameters in the additive hazards model for clustered current status data. The comparative results from simulation studies are presented, and the application of the proposed estimating functions to one real data set is illustrated. Copyright © 2013 John Wiley & Sons, Ltd.
Fulton, Kara A.; Liu, Danping; Haynie, Denise L.; Albert, Paul S.
2016-01-01
The NEXT Generation Health study investigates the dating violence of adolescents using a survey questionnaire. Each student is asked to affirm or deny multiple instances of violence in his/her dating relationship. There is, however, evidence suggesting that students not in a relationship responded to the survey, resulting in excessive zeros in the responses. This paper proposes likelihood-based and estimating equation approaches to analyze the zero-inflated clustered binary response data. We adopt a mixed model method to account for the cluster effect, and the model parameters are estimated using a maximum-likelihood (ML) approach that requires a Gaussian–Hermite quadrature (GHQ) approximation for implementation. Since an incorrect assumption on the random effects distribution may bias the results, we construct generalized estimating equations (GEE) that do not require the correct specification of within-cluster correlation. In a series of simulation studies, we examine the performance of ML and GEE methods in terms of their bias, efficiency and robustness. We illustrate the importance of properly accounting for this zero inflation by reanalyzing the NEXT data where this issue has previously been ignored. PMID:26937263
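The Gauss-Hermite quadrature step mentioned above can be sketched for the simplest case: the marginal success probability of a random-intercept logistic model. The function name and parameters are illustrative assumptions, not the paper's notation; the nodes and weights are the standard 5-point Gauss-Hermite values (their weights sum to sqrt(pi), a built-in sanity check):

```python
import math

# 5-point Gauss-Hermite nodes and weights (physicists' convention)
GH_NODES = [-2.0201828704560856, -0.9585724646138185, 0.0,
            0.9585724646138185, 2.0201828704560856]
GH_WEIGHTS = [0.019953242059045913, 0.3936193231522412, 0.9453087204829419,
              0.3936193231522412, 0.019953242059045913]

def marginal_prob(beta0, sigma):
    """Marginal success probability of a random-intercept logistic model,
    E_u[expit(beta0 + sigma*u)] with u ~ N(0, 1), approximated by
    Gauss-Hermite quadrature after the change of variables u = sqrt(2)*x."""
    total = 0.0
    for x, w in zip(GH_NODES, GH_WEIGHTS):
        u = math.sqrt(2.0) * x
        total += w / (1.0 + math.exp(-(beta0 + sigma * u)))
    return total / math.sqrt(math.pi)
```

With sigma = 0 the quadrature reproduces the plain logistic probability exactly; a positive sigma attenuates the marginal probability toward 0.5, the usual random-intercept attenuation effect.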
Gaussian mixture clustering and imputation of microarray data.
Ouyang, Ming; Welsh, William J; Georgopoulos, Panos
2004-04-12
In microarray experiments, missing entries arise from blemishes on the chips. In large-scale studies, virtually every chip contains some missing entries and more than 90% of the genes are affected. Many analysis methods require a full set of data. Either those genes with missing entries are excluded, or the missing entries are filled with estimates prior to the analyses. This study compares methods of missing value estimation. Two evaluation metrics of imputation accuracy are employed. First, the root mean squared error measures the difference between the true values and the imputed values. Second, the number of mis-clustered genes measures the difference between clustering with true values and that with imputed values; it examines the bias introduced by imputation to clustering. The Gaussian mixture clustering with model averaging imputation is superior to all other imputation methods, according to both evaluation metrics, on both time-series (correlated) and non-time series (uncorrelated) data sets.
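A toy version of the first evaluation metric above: hide entries, impute them, and compare imputation schemes by RMSE against the hidden truth. Grand-mean versus row-mean (per-gene) imputation stand in for the methods compared in the abstract; the data-generating numbers are arbitrary assumptions:

```python
import math, random

def rmse(truth, imputed):
    """Root mean squared error between true and imputed entries."""
    return math.sqrt(sum((t - e) ** 2 for t, e in zip(truth, imputed)) / len(truth))

random.seed(0)
n_genes, n_chips = 200, 10
# toy expression matrix: each gene has its own level plus chip-to-chip noise
data = [[g + random.gauss(0, 0.3) for _ in range(n_chips)]
        for g in [random.gauss(0, 2) for _ in range(n_genes)]]

# hide ~5% of entries, then impute them two ways
hidden = [(i, j) for i in range(n_genes) for j in range(n_chips)
          if random.random() < 0.05]
truth = [data[i][j] for i, j in hidden]
grand_mean = sum(v for row in data for v in row) / (n_genes * n_chips)
grand_imp = [grand_mean] * len(hidden)
row_imp = [(sum(data[i]) - data[i][j]) / (n_chips - 1) for i, j in hidden]
```

Row means track each gene's own level, so their RMSE is far lower; model-based imputation such as the Gaussian mixture approach in the abstract refines this idea by borrowing strength across similar genes.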
Determination of the masses of globular clusters using proper motions
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ninkovich, S.
1984-09-01
Published proper motions of stars in the fields of the globular clusters M 15, M 92, and M 13 (Cudworth, 1976; Cudworth and Monet, 1979) are compiled in tables and used to estimate the masses of the clusters by the method of Naumova and Ogorodnikov (1973). Masses of the order of 10^8 solar masses are calculated, as compared to an M 13 mass of about 10^6 solar masses determined by the virial theorem. The higher masses are considered indicative of the actual cluster masses despite the distortion introduced by the presence in the field of stars not belonging to the clusters. It is suggested that the difference between these estimates and the smaller masses proposed by previous authors may represent unobservable peripheral dwarf stars or some invisible mass (like the so-called missing mass of the Galaxy).
Assessment of cluster yield components by image analysis.
Diago, Maria P; Tardaguila, Javier; Aleixos, Nuria; Millan, Borja; Prats-Montalban, Jose M; Cubero, Sergio; Blasco, Jose
2015-04-01
Berry weight, berry number and cluster weight are key parameters of yield estimation for the wine and table grape industries. Current yield prediction methods are destructive, labour-demanding and time-consuming. In this work, a new methodology based on image analysis was developed to determine cluster yield components in a fast and inexpensive way. Clusters of seven different red varieties of grapevine (Vitis vinifera L.) were photographed under laboratory conditions and their cluster yield components manually determined after image acquisition. Two algorithms, based on the Canny and the logarithmic image processing approaches, were tested to find the contours of the berries in the images prior to berry detection performed by means of the Hough Transform. Results were obtained in two ways: by analysing either a single image of the cluster or four images per cluster from different orientations. The best results (R^2 between 69% and 95% in berry detection and between 65% and 97% in cluster weight estimation) were achieved using four images and the Canny algorithm. The model's capability to predict berry weight from image analysis was 84%. The new and low-cost methodology presented here enabled the assessment of cluster yield components, saving time and providing inexpensive information in comparison with current manual methods. © 2014 Society of Chemical Industry.
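The berry-detection step rests on the circular Hough transform. A minimal fixed-radius sketch on a synthetic edge map follows; the real pipeline runs Canny edge detection on photographs first and searches over a range of radii, both omitted here, and the grid sizes and coordinates are arbitrary assumptions:

```python
import math

def hough_circle_centre(edges, shape, radius):
    """Fixed-radius circular Hough transform: each edge pixel votes for
    all candidate centres lying `radius` away from it; the accumulator
    maximum is taken as the detected centre."""
    h, w = shape
    acc = [[0] * w for _ in range(h)]
    for y, x in edges:
        for deg in range(0, 360, 2):
            t = math.radians(deg)
            cy = int(round(y - radius * math.sin(t)))
            cx = int(round(x - radius * math.cos(t)))
            if 0 <= cy < h and 0 <= cx < w:
                acc[cy][cx] += 1
    return max(((y, x) for y in range(h) for x in range(w)),
               key=lambda p: acc[p[0]][p[1]])

# synthetic edge map: one "berry" outline of radius 6 centred at (12, 15)
edges = [(round(12 + 6 * math.sin(2 * math.pi * t / 60)),
          round(15 + 6 * math.cos(2 * math.pi * t / 60)))
         for t in range(60)]
cy, cx = hough_circle_centre(edges, (30, 30), 6)
```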
The Flow-field From Galaxy Groups In 2MASS
NASA Astrophysics Data System (ADS)
Crook, Aidan; Huchra, J.; Macri, L.; Masters, K.; Jarrett, T.
2011-01-01
We present the first model of a flow-field in the nearby Universe (cz < 12,000 km/s) constructed from groups of galaxies identified in an all-sky flux-limited survey. The Two Micron All-Sky Redshift Survey (2MRS), upon which the model is based, represents the most complete survey of its class and, with near-IR fluxes, provides the optimal method for tracing baryonic matter in the nearby Universe. Peculiar velocities are reconstructed self-consistently with a density-field based upon groups identified in the 2MRS Ks<11.75 catalog. The model predicts infall toward Virgo, Perseus-Pisces, Hydra-Centaurus, Norma, Coma, Shapley and Hercules, and most notably predicts backside-infall into the Norma Cluster. We discuss the application of the model as a predictor of galaxy distances using only angular position and redshift measurements. By calibrating the model using measured distances to galaxies inside 3000 km/s, we show that, for a randomly-sampled 2MRS galaxy, improvement in the estimated distance over the application of Hubble's law is expected to be 30%, and considerably better in the proximity of clusters. We test the model using distance estimates from the SFI++ sample, and find evidence for improvement over the application of Hubble's law to galaxies inside 4000 km/s, although the performance varies depending on the location of the target. This work has been supported by NSF grant AST 0406906 and the Massachusetts Institute of Technology Bruno Rossi and Whiteman Fellowships.
Jaffe, Klaus
2014-01-01
Do different fields of knowledge require different research strategies? A numerical model exploring different virtual knowledge landscapes revealed two diverging optimal search strategies. Trend following is maximized when the popularity of new discoveries determines the number of individuals researching them. This strategy works best when many researchers explore few large areas of knowledge. In contrast, individuals or small groups of researchers are better at discovering small bits of information in dispersed knowledge landscapes. Bibliometric data on scientific publications showed a continuous bipolar distribution of these strategies, ranging from the natural sciences, with highly cited publications in journals containing a large number of articles, to the social sciences, with rarely cited publications in many journals containing a small number of articles. The natural sciences seem to adapt their research strategies to landscapes with large concentrated knowledge clusters, whereas the social sciences seem to have adapted to search in landscapes with many small isolated knowledge clusters. Similar bipolar distributions were obtained when comparing levels of insularity estimated by indicators of international collaboration and levels of country self-citation: researchers in academic areas with many journals, such as the social sciences, arts and humanities, were the most isolated, and this held in different regions of the world. The work shows that quantitative measures estimating differences between academic disciplines improve our understanding of different research strategies, eventually helping interdisciplinary research, and may also help improve science policies worldwide.
Kappa statistic for clustered dichotomous responses from physicians and patients.
Kang, Chaeryon; Qaqish, Bahjat; Monaco, Jane; Sheridan, Stacey L; Cai, Jianwen
2013-09-20
The bootstrap method for estimating the standard error of the kappa statistic in the presence of clustered data is evaluated. Such data arise, for example, in assessing agreement between physicians and their patients regarding their understanding of the physician-patient interaction and discussions. We propose a computationally efficient procedure for generating correlated dichotomous responses for physicians and assigned patients for simulation studies. The simulation results demonstrate that the proposed bootstrap method produces a better estimate of the standard error and better coverage performance than the asymptotic standard error estimate that ignores dependence among patients within physicians, given at least a moderately large number of clusters. We present an example of an application to a coronary heart disease prevention study. Copyright © 2013 John Wiley & Sons, Ltd.
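The cluster bootstrap idea can be sketched as follows: resample physicians (clusters) with replacement, keep each physician's patients together, and take the standard deviation of the resampled kappa statistics. Function names and the toy data below are assumptions for illustration, not the authors' procedure:

```python
import random

def kappa(pairs):
    """Cohen's kappa for paired dichotomous (0/1) ratings."""
    n = len(pairs)
    po = sum(1 for a, b in pairs if a == b) / n      # observed agreement
    pa = sum(a for a, _ in pairs) / n                # rater A marginal
    pb = sum(b for _, b in pairs) / n                # rater B marginal
    pe = pa * pb + (1 - pa) * (1 - pb)               # chance agreement
    return 1.0 if pe == 1.0 else (po - pe) / (1 - pe)

def cluster_bootstrap_se(clusters, n_boot=500, seed=0):
    """Bootstrap SE of kappa that respects clustering: resample whole
    clusters with replacement and pool their patients each time."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        resampled = [rng.choice(clusters) for _ in clusters]
        pooled = [p for cl in resampled for p in cl]
        stats.append(kappa(pooled))
    m = sum(stats) / n_boot
    return (sum((s - m) ** 2 for s in stats) / (n_boot - 1)) ** 0.5

# toy data: 20 physicians with 6 patients each, ratings agreeing ~70% of the time
gen = random.Random(1)
clusters = []
for _ in range(20):
    cl = []
    for _ in range(6):
        a = gen.randint(0, 1)
        b = a if gen.random() < 0.7 else 1 - a
        cl.append((a, b))
    clusters.append(cl)
se = cluster_bootstrap_se(clusters)
```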
Estimation of homogeneous nucleation flux via a kinetic model
NASA Technical Reports Server (NTRS)
Wilcox, C. F.; Bauer, S. H.
1991-01-01
The proposed kinetic model for condensation under homogeneous conditions, and for the onset of unidirectional cluster growth in supersaturated gases, does not suffer from the conceptual flaws that characterize classical nucleation theory. When a full set of simultaneous rate equations is solved, a characteristic time emerges for each cluster size at which the production rate of that size and its rate of conversion to the next size (n + 1) are equal. Procedures for estimating the essential parameters are proposed; steady-state condensation fluxes J^(kin)_ss are evaluated. Since there are practical limits to the cluster size that can be incorporated in the set of simultaneous first-order differential equations, a code was developed for computing an approximate J^(th)_ss based on estimates of a 'constrained equilibrium' distribution and identification of its minimum.
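A drastically simplified sketch of integrating such coupled cluster-growth rate equations: a linear chain with a single first-order rate constant, stepped with forward Euler. The full model couples many sizes with size-dependent attachment rates, so this is a toy stand-in only, with arbitrary parameter values:

```python
def integrate_chain(k, n_sizes, dt, steps):
    """Forward-Euler integration of a toy condensation chain
    c1 -> c2 -> ... -> cN with one first-order rate constant k
    between successive sizes. Mass is conserved exactly because
    every outflow from size i is the inflow to size i + 1."""
    c = [1.0] + [0.0] * (n_sizes - 1)
    for _ in range(steps):
        flux = [k * ci * dt for ci in c[:-1]]   # flow from size i to i+1
        for i, f in enumerate(flux):
            c[i] -= f
            c[i + 1] += f
    return c

# after t = 20 / k, nearly all mass has reached the largest size
c = integrate_chain(k=1.0, n_sizes=5, dt=0.01, steps=2000)
```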
NASA Astrophysics Data System (ADS)
Riley, Steven; Fraser, Christophe; Donnelly, Christl A.; Ghani, Azra C.; Abu-Raddad, Laith J.; Hedley, Anthony J.; Leung, Gabriel M.; Ho, Lai-Ming; Lam, Tai-Hing; Thach, Thuan Q.; Chau, Patsy; Chan, King-Pan; Lo, Su-Vui; Leung, Pak-Yin; Tsang, Thomas; Ho, William; Lee, Koon-Hung; Lau, Edith M. C.; Ferguson, Neil M.; Anderson, Roy M.
2003-06-01
We present an analysis of the first 10 weeks of the severe acute respiratory syndrome (SARS) epidemic in Hong Kong. The epidemic to date has been characterized by two large clusters, initiated by two separate "super-spread" events (SSEs), and by ongoing community transmission. By fitting a stochastic model to data on 1512 cases, including these clusters, we show that the etiological agent of SARS is moderately transmissible. Excluding SSEs, we estimate that 2.7 secondary infections were generated per case on average at the start of the epidemic, with a substantial contribution from hospital transmission. Transmission rates fell during the epidemic, primarily as a result of reductions in population contact rates and improved hospital infection control, but also because of more rapid hospital attendance by symptomatic individuals. As a result, the epidemic is now in decline, although continued vigilance is necessary for this to be maintained. Restrictions on longer range population movement are shown to be a potentially useful additional control measure in some contexts. We estimate that most currently infected persons are now hospitalized, which highlights the importance of control of nosocomial transmission.
Bible, Joe; Beck, James D.; Datta, Somnath
2016-01-01
Ignorance of the mechanisms responsible for the availability of information presents an unusual problem for analysts: it is often the case that the availability of information depends on the outcome. In the analysis of cluster data, a condition of informative cluster size (ICS) exists when inference drawn from the analysis of hypothetical balanced data differs from inference drawn on the observed data. Much work has been done to address the analysis of clustered data with informative cluster size; examples include Inverse Probability Weighting (IPW), Cluster Weighted Generalized Estimating Equations (CWGEE), and Doubly Weighted Generalized Estimating Equations (DWGEE). When cluster size changes with time, i.e., the data set possesses temporally varying cluster sizes (TVCS), these methods may produce biased inference for the underlying marginal distribution of interest. We propose a new marginalization that may be appropriate for addressing clustered longitudinal data with TVCS. The principal motivation for our present work is to analyze the periodontal data collected by Beck et al. (1997, Journal of Periodontal Research 6, 497–505). Longitudinal periodontal data often exhibit both ICS and TVCS, as the number of teeth possessed by participants at the onset of the study is not constant, and teeth as well as individuals may be displaced throughout the study. PMID:26682911
Kasza, J; Hemming, K; Hooper, R; Matthews, Jns; Forbes, A B
2017-01-01
Stepped wedge and cluster randomised crossover trials are examples of cluster randomised designs conducted over multiple time periods that are being used with increasing frequency in health research. Recent systematic reviews of both of these designs indicate that the within-cluster correlation is typically taken account of in the analysis of data using a random intercept mixed model, implying a constant correlation between any two individuals in the same cluster no matter how far apart in time they are measured: within-period and between-period intra-cluster correlations are assumed to be identical. Recently proposed extensions allow the within- and between-period intra-cluster correlations to differ, although these methods require that all between-period intra-cluster correlations are identical, which may not be appropriate in all situations. Motivated by a proposed intensive care cluster randomised trial, we propose an alternative correlation structure for repeated cross-sectional multiple-period cluster randomised trials in which the between-period intra-cluster correlation is allowed to decay depending on the distance between measurements. We present results for the variance of treatment effect estimators for varying amounts of decay, investigating the consequences of the variation in decay on sample size planning for stepped wedge, cluster crossover and multiple-period parallel-arm cluster randomised trials. We also investigate the impact of assuming constant between-period intra-cluster correlations instead of decaying between-period intra-cluster correlations. Our results indicate that in certain design configurations, including the one corresponding to the proposed trial, a correlation decay can have an important impact on variances of treatment effect estimators, and hence on sample size and power. An R Shiny app allows readers to interactively explore the impact of correlation decay.
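The proposed correlation structure can be sketched as a within-cluster correlation matrix in which two subjects measured in periods s and t share correlation icc * decay^|s - t|, so that decay = 1 recovers the usual constant intra-cluster correlation. The function name and block layout are illustrative assumptions, not the paper's notation:

```python
def cluster_corr_matrix(n_periods, m_per_period, icc, decay):
    """Within-cluster correlation matrix for a repeated cross-sectional
    multiple-period design, subjects ordered period by period: distinct
    subjects in periods s and t have correlation icc * decay**|s - t|."""
    n = n_periods * m_per_period
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                R[i][j] = 1.0
            else:
                s, t = i // m_per_period, j // m_per_period
                R[i][j] = icc * decay ** abs(s - t)
    return R

# 3 periods, 2 subjects per period, within-period ICC 0.05, decay 0.8
R = cluster_corr_matrix(3, 2, 0.05, 0.8)
```

Plugging such a matrix into the generalized least squares variance of the treatment effect is what drives the sample-size consequences the abstract describes.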
2013-01-01
Background: The antifungal therapy caspofungin is a semi-synthetic derivative of pneumocandin B0, a lipohexapeptide produced by the fungus Glarea lozoyensis, and was the first member of the echinocandin class approved for human therapy. The nonribosomal peptide synthetase (NRPS)-polyketide synthases (PKS) gene cluster responsible for pneumocandin biosynthesis from G. lozoyensis has not been elucidated to date. In this study, we report the elucidation of the pneumocandin biosynthetic gene cluster by whole genome sequencing of the G. lozoyensis wild-type strain ATCC 20868. Results: The pneumocandin biosynthetic gene cluster contains a NRPS (GLNRPS4) and a PKS (GLPKS4) arranged in tandem, two cytochrome P450 monooxygenases, seven other modifying enzymes, and genes for L-homotyrosine biosynthesis, a component of the peptide core. Thus, the pneumocandin biosynthetic gene cluster is significantly more autonomous and organized than that of the recently characterized echinocandin B gene cluster. Disruption mutants of GLNRPS4 and GLPKS4 no longer produced the pneumocandins (A0 and B0), and the Δglnrps4 and Δglpks4 mutants lost antifungal activity against the human pathogenic fungus Candida albicans. In addition to pneumocandins, the G. lozoyensis genome encodes a rich repertoire of natural product-encoding genes including 24 PKSs, six NRPSs, five PKS-NRPS hybrids, two dimethylallyl tryptophan synthases, and 14 terpene synthases. Conclusions: Characterization of the gene cluster provides a blueprint for engineering new pneumocandin derivatives with improved pharmacological properties. Whole genome estimation of the secondary metabolite-encoding genes from G. lozoyensis provides yet another example of the huge potential for drug discovery from natural products from the fungal kingdom. PMID:23688303
Hahn, Noel G.
2017-01-01
Geospatial analyses were used to investigate the spatial distribution of populations of Halyomorpha halys, an important invasive agricultural pest in mid-Atlantic peach orchards. This spatial analysis will improve efficiency by allowing growers and farm managers to predict insect arrangement and target management strategies. Data on the presence of H. halys were collected from 2012–2014 in five peach orchards at four farms in New Jersey located in different land-use contexts. A point pattern analysis, using Ripley’s K function, was used to describe clustering of H. halys. In addition, the clustering of damage indicative of H. halys feeding was described. With low populations early in the growing season, H. halys did not exhibit signs of clustering in the orchards at most distances. At sites with low populations throughout the season, clustering was not apparent. However, later in the season, high infestation levels led to more evident clustering of H. halys. Damage, although present throughout the entire orchard, was found at low levels. When looking at trees with greater than 10% fruit damage, damage was shown to cluster in orchards. The Moran’s I statistic showed that spatial autocorrelation of H. halys was present within the orchards on the August sample dates, in relation to both population density and levels of damage. Kriging the abundance of H. halys and the severity of damage to peaches revealed that high estimated abundance and high estimated damage are generally found in the same regions of the orchards. This information on the clustering of H. halys populations will be useful to help predict the presence of insects for use in management or scouting programs. PMID:28362797
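The global Moran's I statistic used above to test for spatial autocorrelation is compact enough to sketch directly. The following is a minimal illustration (variable names and the grid-based weights are ours, not from the study); I > 0 indicates that similar values cluster in space, I < 0 that neighbours tend to differ.

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's I. `values` is a 1-D array of observations (e.g.
    H. halys counts per tree); `weights` is a symmetric spatial weight
    matrix with zero diagonal."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()
    num = (w * np.outer(z, z)).sum()      # weighted cross-products of deviations
    return len(x) / w.sum() * num / (z @ z)

def rook_weights(nrow, ncol):
    """0/1 adjacency for a regular grid: neighbours share an edge."""
    n = nrow * ncol
    w = np.zeros((n, n))
    for r in range(nrow):
        for c in range(ncol):
            i = r * ncol + c
            if c + 1 < ncol:
                w[i, i + 1] = w[i + 1, i] = 1
            if r + 1 < nrow:
                w[i, i + ncol] = w[i + ncol, i] = 1
    return w
```

On a 4x4 grid, a smooth gradient of counts gives a positive I (clustering of similar values) while a checkerboard pattern gives a negative I, which is a quick sanity check of the implementation.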
NASA Astrophysics Data System (ADS)
Tanikawa, A.
2013-10-01
We have performed N-body simulations of globular clusters (GCs) in order to estimate a detection rate of mergers of binary stellar mass black holes (BBHs) by means of gravitational wave (GW) observatories. For our estimate, we have only considered mergers of BBHs which escape from GCs (BBH escapers). BBH escapers merge more quickly than BBHs inside GCs because of their small semimajor axes. N-body simulation cannot deal with a GC with the number of stars N ~ 10^6 due to its high calculation cost. We have simulated dynamical evolution of small N clusters (10^4 ≲ N ≲ 10^5), and have extrapolated our simulation results to large N clusters. From our simulation results, we have found the following dependence of BBH properties on N. BBHs escape from a cluster at each two-body relaxation time at a rate proportional to N. Semimajor axes of BBH escapers are inversely proportional to N, if initial mass densities of clusters are fixed. Eccentricities, primary masses and mass ratios of BBH escapers are independent of N. Using this dependence of BBH properties, we have artificially generated a population of BBH escapers from a GC with N ~ 10^6, and have estimated a detection rate of mergers of BBH escapers by next-generation GW observatories. We have assumed that all the GCs are formed 10 or 12 Gyr ago with their initial numbers of stars Ni = 5 × 10^5–2 × 10^6 and their initial stellar mass densities inside their half-mass radii ρh,i = 6 × 10^3–10^6 M⊙ pc^-3. Then, the detection rate of BBH escapers is 0.5–20 yr^-1 for a BH retention fraction RBH = 0.5. A few BBH escapers are components of hierarchical triple systems, although we do not consider secular perturbation on such BBH escapers for our estimate. Our simulations have shown that BHs are still inside some of GCs at the present day. These BHs may marginally contribute to BBH detection.
Mass profile and dynamical status of the z ~ 0.8 galaxy cluster LCDCS 0504
NASA Astrophysics Data System (ADS)
Guennou, L.; Biviano, A.; Adami, C.; Limousin, M.; Lima Neto, G. B.; Mamon, G. A.; Ulmer, M. P.; Gavazzi, R.; Cypriano, E. S.; Durret, F.; Clowe, D.; LeBrun, V.; Allam, S.; Basa, S.; Benoist, C.; Cappi, A.; Halliday, C.; Ilbert, O.; Johnston, D.; Jullo, E.; Just, D.; Kubo, J. M.; Márquez, I.; Marshall, P.; Martinet, N.; Maurogordato, S.; Mazure, A.; Murphy, K. J.; Plana, H.; Rostagni, F.; Russeil, D.; Schirmer, M.; Schrabback, T.; Slezak, E.; Tucker, D.; Zaritsky, D.; Ziegler, B.
2014-06-01
Context. Constraints on the mass distribution in high-redshift clusters of galaxies are currently not very strong. Aims: We aim to constrain the mass profile, M(r), and dynamical status of the z ~ 0.8 LCDCS 0504 cluster of galaxies that is characterized by prominent giant gravitational arcs near its center. Methods: Our analysis is based on deep X-ray, optical, and infrared imaging as well as optical spectroscopy, collected with various instruments, which we complemented with archival data. We modeled the mass distribution of the cluster with three different mass density profiles, whose parameters were constrained by the strong lensing features of the inner cluster region, by the X-ray emission from the intracluster medium, and by the kinematics of 71 cluster members. Results: We obtain consistent M(r) determinations from three methods based on kinematics (dispersion-kurtosis, caustics, and MAMPOSSt), out to the cluster virial radius, ≃1.3 Mpc and beyond. The mass profile inferred by the strong lensing analysis in the central cluster region is slightly higher than, but still consistent with, the kinematics estimate. On the other hand, the X-ray based M(r) is significantly lower than the kinematics and strong lensing estimates. Theoretical predictions from ΛCDM cosmology for the concentration-mass relation agree with our observational results, when taking into account the uncertainties in the observational and theoretical estimates. There appears to be a central deficit in the intracluster gas mass fraction compared with nearby clusters. Conclusions: Despite the relaxed appearance of this cluster, the determinations of its mass profile by different probes show substantial discrepancies, the origin of which remains to be determined. 
The extension of a dynamical analysis similar to that of other clusters of the DAFT/FADA survey with multiwavelength data of sufficient quality will allow shedding light on the possible systematics that affect the determination of mass profiles of high-z clusters, which is possibly related to our incomplete understanding of intracluster baryon physics. Table 2 is available in electronic form at http://www.aanda.org
Method of identifying clusters representing statistical dependencies in multivariate data
NASA Technical Reports Server (NTRS)
Borucki, W. J.; Card, D. H.; Lyle, G. C.
1975-01-01
The approach is first to cluster the data and then to compute spatial boundaries for the resulting clusters. The next step is to compute, from a set of Monte Carlo samples obtained from scrambled data, estimates of the probabilities of obtaining at least as many points within the boundaries as were actually observed in the original data.
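The scramble-and-count step can be sketched as follows. This is a minimal illustration with our own simplifying assumptions (a rectangular cluster boundary, and scrambling implemented as an independent permutation of each coordinate axis); the original method computes general boundaries around clusters found by a clustering algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def cluster_pvalue(points, box, n_mc=2000, rng=rng):
    """Monte Carlo estimate of the probability of observing at least as many
    points inside `box` = (xmin, xmax, ymin, ymax) as in the original data,
    when the dependence between coordinates is destroyed by permuting each
    axis independently ("scrambled data")."""
    xmin, xmax, ymin, ymax = box
    inside = lambda p: int(np.sum((p[:, 0] >= xmin) & (p[:, 0] <= xmax)
                                  & (p[:, 1] >= ymin) & (p[:, 1] <= ymax)))
    observed = inside(points)
    hits = 0
    for _ in range(n_mc):
        scrambled = np.column_stack([rng.permutation(points[:, 0]),
                                     rng.permutation(points[:, 1])])
        hits += inside(scrambled) >= observed
    return observed, hits / n_mc
```

A small Monte Carlo p-value indicates that the excess of points inside the boundary reflects genuine statistical dependence between the variables rather than the marginal distributions alone.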
A Note on Cluster Effects in Latent Class Analysis
ERIC Educational Resources Information Center
Kaplan, David; Keller, Bryan
2011-01-01
This article examines the effects of clustering in latent class analysis. A comprehensive simulation study is conducted, which begins by specifying a true multilevel latent class model with varying within- and between-cluster sample sizes, varying latent class proportions, and varying intraclass correlations. These models are then estimated under…
Comparing population structure as inferred from genealogical versus genetic information.
Colonna, Vincenza; Nutile, Teresa; Ferrucci, Ronald R; Fardella, Giulio; Aversano, Mario; Barbujani, Guido; Ciullo, Marina
2009-12-01
Algorithms for inferring population structure from genetic data (i.e., population assignment methods) have been shown to effectively recognize genetic clusters in human populations. However, their performance in identifying groups of genealogically related individuals, especially in scantily differentiated populations, has not been tested empirically thus far. For this study, we had access to both genealogical and genetic data from two closely related, isolated villages in southern Italy. We found that nearly all living individuals were included in a single pedigree, with multiple inbreeding loops. Despite F(st) between villages being a low 0.008, genetic clustering analysis identified two clusters roughly corresponding to the two villages. Average kinship between individuals (estimated from genealogies) increased at increasing values of group membership (estimated from the genetic data), showing that the observed genetic clusters represent individuals who are more closely related to each other than to random members of the population. Further, average kinship within clusters and F(st) between clusters increase with increasingly stringent membership threshold requirements. We conclude that a limited number of genetic markers is sufficient to detect structuring, and that the results of genetic analyses faithfully mirror the structuring inferred from detailed analyses of population genealogies, even when F(st) values are low, as in the case of the two villages. We then estimate the impact of observed levels of population structure on association studies using simulated data.
NASA Astrophysics Data System (ADS)
Benson, Charles; Watson, Philip; Taylor, Garth; Cook, Philip; Hollenhorst, Steve
2013-10-01
Yellowstone National Park visitor data were obtained from a survey collected for the National Park Service by the Park Studies Unit at the University of Idaho. Travel cost models have been conducted for national parks in the United States; however, this study builds on these studies and investigates how benefits vary by types of visitors who participate in different activities while at the park. Visitor clusters were developed based on activities in which a visitor participated while at the park. The clusters were analyzed and then incorporated into a travel cost model to determine the economic value (consumer surplus) that the different visitor groups received from visiting the park. The model was estimated using a zero-truncated negative binomial regression corrected for endogenous stratification. The travel cost price variable was estimated using both 1/3 and 1/4 the wage rate to test for sensitivity to opportunity cost specification. The average benefit across all visitor cluster groups was estimated at between $235 and $276 per person per trip. However, per trip benefits varied substantially across clusters; from $90 to $103 for the "value picnickers," to $185-$263 for the "backcountry enthusiasts," $189-$278 for the "do it all adventurists," $204-$303 for the "windshield tourists," and $323-$714 for the "creature comfort" cluster group.
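The per-trip benefit figures follow from a standard property of count-data travel cost models: per-trip consumer surplus is the negative reciprocal of the coefficient on the travel cost price. A minimal sketch of the two ingredients described above, the price construction with an opportunity cost of time and the surplus formula (function names and the numbers in the usage note are illustrative, not the paper's estimates):

```python
def travel_cost_price(round_trip_cost, travel_hours, hourly_wage, wage_frac=1/3):
    """Travel cost 'price' per trip: out-of-pocket cost plus the opportunity
    cost of travel time, valued at a fraction of the wage rate (the study
    tries both 1/3 and 1/4 to test sensitivity)."""
    return round_trip_cost + wage_frac * hourly_wage * travel_hours

def consumer_surplus_per_trip(beta_tc):
    """In count-data (Poisson / negative binomial) travel cost models, the
    expected per-trip consumer surplus is -1 / beta_tc, where beta_tc (< 0)
    is the estimated coefficient on the travel cost price."""
    if beta_tc >= 0:
        raise ValueError("demand should slope downward: beta_tc < 0")
    return -1.0 / beta_tc
```

For example, a hypothetical travel cost coefficient of -0.004 implies a per-trip surplus of $250, in the range reported for the full sample above.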
Puchalski Ritchie, Lisa M; van Lettow, Monique; Makwakwa, Austine; Chan, Adrienne K; Hamid, Jemila S; Kawonga, Harry; Martiniuk, Alexandra L C; Schull, Michael J; van Schoor, Vanessa; Zwarenstein, Merrick; Barnsley, Jan; Straus, Sharon E
2016-09-07
Despite availability of effective treatment, tuberculosis (TB) remains an important cause of morbidity and mortality globally, with low- and middle-income countries most affected. In many such settings, including Malawi, the high burden of disease and severe shortage of skilled healthcare workers has led to task-shifting of outpatient TB care to lay health workers (LHWs). LHWs improve access to healthcare and some outcomes, including TB completion rates, but lack of training and supervision limit their impact. The goals of this study are to improve TB care provided by LHWs in Malawi by refining, implementing, and evaluating a knowledge translation strategy designed to address a recognized gap in LHWs' TB and job-specific knowledge and, through this, to improve patient outcomes. We are employing a mixed-methods design that includes a pragmatic cluster randomized controlled trial and a process evaluation using qualitative methods. Trial participants will include all health centers providing TB care in four districts in the South East Zone of Malawi. The intervention employs educational outreach, a point-of-care reminder tool, and a peer support network. The primary outcome is proportion of treatment successes, defined as the total of TB patients cured or completing treatment, with outcomes taken from Ministry of Health treatment records. With an alpha of 0.05, power of 0.80, a baseline treatment success of 0.80, intraclass correlation coefficient of 0.1 based on our pilot study, and an estimated 100 clusters (health centers providing TB care), a minimum of 6 patients per cluster is required to detect a clinically significant 0.10 increase in the proportion of treatment successes. Our process evaluation will include interviews with LHWs and patients, and a document analysis of LHW training logs, quarterly peer trainer meetings, and mentorship meeting notes. 
An estimated 10-15 LHWs and 10-15 patients will be required to reach saturation in each of 2 planned interview periods, for a total of 40-60 interview participants. This study will directly inform the efforts of knowledge users within TB care and, through extension of the approach, other areas of care provided by LHWs in Malawi and other low- and middle-income countries. ClinicalTrials.gov NCT02533089 . Registered 20 August 2015. Protocol Date/Version 29 May 2016/Version 2.
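The sample size calculation reported in the protocol can be reproduced with the standard design-effect formula for cluster randomised trials. A sketch using only the standard library (the function name is ours; the approach assumes a two-proportion z-test inflated by the design effect 1 + (m - 1) x ICC, then solves for the cluster size m given the fixed number of clusters):

```python
from math import ceil
from statistics import NormalDist

def patients_per_cluster(p0, p1, k_clusters, icc, alpha=0.05, power=0.80):
    """Minimum patients per cluster to detect a change from proportion p0 to
    p1 with k_clusters clusters randomised 1:1 across two arms."""
    z = NormalDist()
    z_a, z_b = z.inv_cdf(1 - alpha / 2), z.inv_cdf(power)
    # individually randomised sample size per arm (two-proportion z-test)
    n_ind = (z_a + z_b) ** 2 * (p0 * (1 - p0) + p1 * (1 - p1)) / (p1 - p0) ** 2
    k_arm = k_clusters / 2
    # solve k_arm * m = n_ind * (1 + (m - 1) * icc) for the cluster size m
    denom = k_arm - n_ind * icc
    if denom <= 0:
        raise ValueError("too few clusters to reach the target power")
    return ceil(n_ind * (1 - icc) / denom)
```

With the inputs stated above (baseline 0.80, target 0.90, 100 clusters, ICC 0.1, alpha 0.05, power 0.80) this gives 6 patients per cluster, matching the protocol's figure.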
Hargreaves, James R; Fearon, Elizabeth; Davey, Calum; Phillips, Andrew; Cambiano, Valentina; Cowan, Frances M
2016-01-05
Pragmatic cluster-randomised trials should seek to make unbiased estimates of effect and be reported according to CONSORT principles, and the study population should be representative of the target population. This is challenging when conducting trials amongst 'hidden' populations without a sample frame. We describe a pair-matched cluster-randomised trial of a combination HIV-prevention intervention to reduce the proportion of female sex workers (FSW) with a detectable HIV viral load in Zimbabwe, recruiting via respondent driven sampling (RDS). We will cross-sectionally survey approximately 200 FSW at baseline and at endline to characterise each of 14 sites. RDS is a variant of chain referral sampling and has been adapted to approximate random sampling. Primary analysis will use the 'RDS-2' method to estimate cluster summaries and will adapt Hayes and Moulton's '2-step' method to adjust effect estimates for individual-level confounders and further adjust for cluster baseline prevalence. We will adapt CONSORT to accommodate RDS. In the absence of observable refusal rates, we will compare the recruitment process between matched pairs. We will need to investigate whether cluster-specific recruitment or the intervention itself affects the accuracy of the RDS estimation process, potentially causing differential biases. To do this, we will calculate RDS-diagnostic statistics for each cluster at each time point and compare these statistics within matched pairs and time points. Sensitivity analyses will assess the impact of potential biases arising from assumptions made by the RDS-2 estimation. We are not aware of any other completed pragmatic cluster RCTs that are recruiting participants using RDS. Our statistical design and analysis approach seeks to transparently document participant recruitment and allow an assessment of the representativeness of the study to the target population, a key aspect of pragmatic trials. 
The challenges we have faced in the design of this trial are likely to be shared in other contexts aiming to serve the needs of legally and/or socially marginalised populations for which no sampling frame exists and especially when the social networks of participants are both the target of intervention and the means of recruitment. The trial was registered at Pan African Clinical Trials Registry (PACTR201312000722390) on 9 December 2013.
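The 'RDS-2' estimator used for the cluster summaries above is the Volz-Heckathorn estimator, a degree-weighted mean that corrects for the higher inclusion probability of well-connected respondents in chain-referral sampling. A minimal sketch (the data in the usage note are illustrative):

```python
def rds_ii_estimate(values, degrees):
    """RDS-II (Volz-Heckathorn) estimate of a population proportion.
    `values` are 0/1 indicators (e.g. detectable HIV viral load);
    `degrees` are the respondents' reported network sizes. Each respondent
    is weighted by the inverse of her degree."""
    inv = [1.0 / d for d in degrees]
    return sum(v * w for v, w in zip(values, inv)) / sum(inv)
```

For instance, three respondents with indicators [1, 0, 1] and degrees [2, 4, 8] give an estimate of 5/7, higher than the unweighted mean of 2/3 because the positive respondents have smaller networks and hence larger weights.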
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pereira, Sebastián; Campusano, Luis E.; Hitschfeld-Kahler, Nancy
This paper is the first in a series, presenting a new galaxy cluster finder based on a three-dimensional Voronoi Tessellation plus a maximum likelihood estimator, followed by gapping-filtering in radial velocity (VoML+G). The scientific aim of the series is a reassessment of the diversity of optical clusters in the local universe. A mock galaxy database mimicking the southern strip of the magnitude(blue)-limited 2dF Galaxy Redshift Survey (2dFGRS), for the redshift range 0.009 < z < 0.22, is built on the basis of the Millennium Simulation of the LCDM cosmology, and a reference catalog of “Millennium clusters,” spanning the 1.0 × 10^12–1.0 × 10^15 M⊙ h^-1 dark matter (DM) halo mass range, is recorded. The validation of VoML+G is performed through its application to the mock data and the ensuing determination of the completeness and purity of the cluster detections by comparison with the reference catalog. The execution of VoML+G over the 2dFGRS mock data identified 1614 clusters, 22% with N_g ≥ 10, 64% with 10 > N_g ≥ 5, and 14% with N_g < 5. The ensemble of VoML+G clusters has a ~59% completeness and a ~66% purity, whereas the subsample with N_g ≥ 10, to z ~ 0.14, has greatly improved mean rates of ~75% and ~90%, respectively. The VoML+G cluster velocity dispersions are found to be compatible with those corresponding to “Millennium clusters” over the 300–1000 km s^-1 interval, i.e., for cluster halo masses in excess of ~3.0 × 10^13 M⊙ h^-1.
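Completeness and purity as used above can be computed by cross-matching the detected clusters against the reference catalog. A minimal sketch, assuming both catalogs are represented as sets of member-galaxy IDs and using a fractional-overlap matching rule (the 50% threshold is our illustrative choice, not necessarily the paper's criterion):

```python
def completeness_purity(detected, reference, min_frac=0.5):
    """`detected` and `reference` are lists of sets of galaxy IDs.
    A detection matches a reference cluster when it recovers more than
    min_frac of that cluster's members. Completeness = fraction of
    reference clusters matched; purity = fraction of detections matched."""
    def matches(d, r):
        return len(d & r) > min_frac * len(r)
    matched_ref = sum(any(matches(d, r) for d in detected) for r in reference)
    matched_det = sum(any(matches(d, r) for r in reference) for d in detected)
    return matched_ref / len(reference), matched_det / len(detected)
```

On a toy example with two reference clusters and two detections, one of which recovers most of a reference cluster while the other is spurious, both rates come out at 0.5.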
Detection of the YORP Effect for Small Asteroids in the Karin Cluster
NASA Astrophysics Data System (ADS)
Carruba, V.; Nesvorný, D.; Vokrouhlický, D.
2016-06-01
The Karin cluster is a young asteroid family thought to have formed only ≃5.75 Myr ago. The young age can be demonstrated by numerically integrating the orbits of Karin cluster members backward in time and showing the convergence of the perihelion and nodal longitudes (as well as other orbital elements). Previous work has pointed out that the convergence is not ideal if the backward integration only accounts for the gravitational perturbations from the solar system planets. It improves when the thermal radiation force known as the Yarkovsky effect is accounted for. This argument can be used to estimate the spin obliquities of the Karin cluster members. Here we take advantage of the fast growing membership of the Karin cluster and show that the obliquity distribution of diameter D ≃ 1–2 km Karin asteroids is bimodal, as expected if the YORP effect acted to move obliquities toward extreme values (0° or 180°). The measured magnitude of the effect is consistent with the standard YORP model. The surface thermal conductivity is inferred to be 0.07–0.2 W m^-1 K^-1 (thermal inertia ≃ 300–500 J m^-2 K^-1 s^-1/2). We find that the strength of the YORP effect is roughly ≃0.7 of the nominal strength obtained for a collection of random Gaussian spheroids. These results are consistent with a surface composed of rough, rocky regolith. The obliquity values predicted here for 480 members of the Karin cluster can be validated by the light-curve inversion method.
NASA Astrophysics Data System (ADS)
Bayliss, Matthew. B.; Zengo, Kyle; Ruel, Jonathan; Benson, Bradford A.; Bleem, Lindsey E.; Bocquet, Sebastian; Bulbul, Esra; Brodwin, Mark; Capasso, Raffaella; Chiu, I.-non; McDonald, Michael; Rapetti, David; Saro, Alex; Stalder, Brian; Stark, Antony A.; Strazzullo, Veronica; Stubbs, Christopher W.; Zenteno, Alfredo
2017-03-01
The velocity distribution of galaxies in clusters is not universal; rather, galaxies are segregated according to their spectral type and relative luminosity. We examine the velocity distributions of different populations of galaxies within 89 Sunyaev Zel’dovich (SZ) selected galaxy clusters spanning 0.28 < z < 1.08. Our sample is primarily drawn from the SPT-GMOS spectroscopic survey, supplemented by additional published spectroscopy, resulting in a final spectroscopic sample of 4148 galaxy spectra, 2868 of them cluster members. The velocity dispersion of star-forming cluster galaxies is 17 ± 4% greater than that of passive cluster galaxies, and the velocity dispersion of bright (m < m* − 0.5) cluster galaxies is 11 ± 4% lower than the velocity dispersion of our total member population. We find good agreement with simulations regarding the shape of the relationship between the measured velocity dispersion and the fraction of passive versus star-forming galaxies used to measure it, but we find a small offset between this relationship as measured in data and simulations, which suggests that our dispersions are systematically low by as much as 3% relative to simulations. We argue that this offset could be interpreted as a measurement of the effective velocity bias that describes the ratio of our observed velocity dispersions and the intrinsic velocity dispersion of dark matter particles in a published simulation result. Measuring velocity bias in this way suggests that large spectroscopic surveys can improve dispersion-based mass-observable scaling relations for cosmology even in the face of velocity biases, by quantifying and ultimately calibrating them out.
Two-stage sequential sampling: A neighborhood-free adaptive sampling procedure
Salehi, M.; Smith, D.R.
2005-01-01
Designing an efficient sampling scheme for a rare and clustered population is a challenging area of research. Adaptive cluster sampling, which has been shown to be viable for such a population, is based on sampling a neighborhood of units around a unit that meets a specified condition. However, the edge units produced by sampling neighborhoods have proven to limit the efficiency and applicability of adaptive cluster sampling. We propose a sampling design that is adaptive in the sense that the final sample depends on observed values, but it avoids the use of neighborhoods and the sampling of edge units. Unbiased estimators of population total and its variance are derived using Murthy's estimator. The modified two-stage sampling design is easy to implement and can be applied to a wider range of populations than adaptive cluster sampling. We evaluate the proposed sampling design by simulating sampling of two real biological populations and an artificial population for which the variable of interest took the value either 0 or 1 (e.g., indicating presence and absence of a rare event). We show that the proposed sampling design is more efficient than conventional sampling in nearly all cases. The approach used to derive estimators (Murthy's estimator) opens the door for unbiased estimators to be found for similar sequential sampling designs. © 2005 American Statistical Association and the International Biometric Society.
Improved Ant Colony Clustering Algorithm and Its Performance Study
Gao, Wei
2016-01-01
Clustering analysis is used in many disciplines and applications; it is an important tool that descriptively identifies homogeneous groups of objects based on attribute values. The ant colony clustering algorithm is a swarm-intelligent method used for clustering problems that is inspired by the behavior of ant colonies that cluster their corpses and sort their larvae. A new abstraction ant colony clustering algorithm using a data combination mechanism is proposed to improve the computational efficiency and accuracy of the ant colony clustering algorithm. The abstraction ant colony clustering algorithm is used to cluster benchmark problems, and its performance is compared with the ant colony clustering algorithm and other methods used in existing literature. Based on similar computational difficulties and complexities, the results show that the abstraction ant colony clustering algorithm produces results that are not only more accurate but also more efficiently determined than the ant colony clustering algorithm and the other methods. Thus, the abstraction ant colony clustering algorithm can be used for efficient multivariate data clustering. PMID:26839533
Biased phylodynamic inferences from analysing clusters of viral sequences
Xiang, Fei; Frost, Simon D. W.
2017-01-01
Phylogenetic methods are being increasingly used to help understand the transmission dynamics of measurably evolving viruses, including HIV. Clusters of highly similar sequences are often observed, which appear to follow a ‘power law’ behaviour, with a small number of very large clusters. These clusters may help to identify subpopulations in an epidemic, and inform where intervention strategies should be implemented. However, clustering of samples does not necessarily imply the presence of a subpopulation with high transmission rates, as groups of closely related viruses can also occur due to non-epidemiological effects such as over-sampling. It is important to ensure that observed phylogenetic clustering reflects true heterogeneity in the transmitting population, and is not being driven by non-epidemiological effects. We quantify the effect of using a falsely identified ‘transmission cluster’ of sequences to estimate phylodynamic parameters including the effective population size and exponential growth rate under several demographic scenarios. Our simulation studies show that taking the maximum size cluster to re-estimate parameters from trees simulated under a randomly mixing, constant population size coalescent process systematically underestimates the overall effective population size. In addition, the transmission cluster wrongly resembles an exponential or logistic growth model 99% of the time. We also illustrate the consequences of false clusters in exponentially growing coalescent and birth-death trees, where again, the growth rate is skewed upwards. This has clear implications for identifying clusters in large viral databases, where a false cluster could result in wasted intervention resources. PMID:28852573
Bayesian Ensemble Trees (BET) for Clustering and Prediction in Heterogeneous Data
Duan, Leo L.; Clancy, John P.; Szczesniak, Rhonda D.
2016-01-01
We propose a novel “tree-averaging” model that utilizes the ensemble of classification and regression trees (CART). Each constituent tree is estimated with a subset of similar data. We treat this grouping of subsets as Bayesian Ensemble Trees (BET) and model them as a Dirichlet process. We show that BET determines the optimal number of trees by adapting to the data heterogeneity. Compared with the other ensemble methods, BET requires far fewer trees and shows equivalent prediction accuracy using weighted averaging. Moreover, each tree in BET provides a variable selection criterion and interpretation for each subset. We developed an efficient estimating procedure with improved estimation strategies in both CART and mixture models. We demonstrate these advantages of BET with simulations and illustrate the approach with a real-world data example involving regression of lung function measurements obtained from patients with cystic fibrosis. Supplemental materials are available online. PMID:27524872
ERIC Educational Resources Information Center
Hunt, Charles R.
A study developed a model to assist school administrators to estimate costs associated with the delivery of a metals cluster program at Norfolk State College, Virginia. It sought to construct the model so that costs could be explained as a function of enrollment levels. Data were collected through a literature review, computer searches of the…
Optical Substructure and BCG Offsets of Sunyaev-Zel'dovich and X-ray Selected Galaxy Clusters
NASA Astrophysics Data System (ADS)
Lopes, Paulo A. A.; Trevisan, M.; Laganá, T. F.; Durret, F.; Ribeiro, A. L. B.; Rembold, S. B.
2018-05-01
We used optical imaging and spectroscopic data to derive substructure estimates for local Universe (z < 0.11) galaxy clusters from two different samples. The first was selected through the Sunyaev-Zel'dovich (SZ) effect by the Planck satellite and the second is an X-ray selected sample. In agreement with X-ray substructure estimates, we found that the SZ systems have a larger fraction of substructure than the X-ray clusters. We also found evidence that the higher mass regime of the SZ clusters, compared to the X-ray sample, explains the larger fraction of disturbed objects in the Planck data. Although we detect a redshift evolution in the substructure fraction, it is not sufficient to explain the different results between the higher-z SZ sample and the X-ray one. We also verified good agreement (˜60%) between the optical and X-ray substructure estimates. However, the best agreement is achieved by classifications based on the brightest cluster galaxy (BCG): either the BCG-X-ray centroid offset, or the magnitude gap between the first and second BCGs. We advocate the use of these two parameters as the most reliable and inexpensive way to assess cluster dynamical state. We recommend an offset cut of ˜0.01 × R500 to separate relaxed and disturbed clusters. For the magnitude gap, the separation can be made at Δm12 = 1.0. The central galaxy paradigm (CGP) may not be valid for ˜20% of relaxed massive clusters; this fraction increases to ˜60% for disturbed systems.
Detection of a pair of prominent X-ray cavities in Abell 3847
NASA Astrophysics Data System (ADS)
Vagshette, Nilkanth D.; Naik, Sachindra; Patil, Madhav K.; Sonkamble, Satish S.
2017-04-01
We present the results obtained from a detailed analysis of a deep Chandra observation of the bright FRII radio galaxy 3C 444 in the Abell 3847 cluster. A pair of huge X-ray cavities is detected along the north and south directions from the centre of 3C 444. X-ray and radio images of the cluster reveal peculiar positioning of the cavities and radio bubbles. The radio lobes and X-ray cavities are apparently not spatially coincident, exhibiting offsets of ˜61 and 77 kpc from each other along the north and south directions, respectively. Radial temperature and density profiles reveal the presence of a cool core in the cluster. Imaging and spectral studies showed the removal of a substantial amount of matter from the core of the cluster by the radio jets. A detailed analysis of the temperature and density profiles showed the presence of a rarely detected elliptical shock in the cluster. Detection of inflating cavities at an average distance of ˜55 kpc from the centre implies that the central engine feeds a remarkable amount of radio power (˜6.3 × 10^44 erg s^-1) into the intra-cluster medium over ˜10^8 yr, the estimated age of the cavities. The cooling luminosity of the cluster was estimated to be ˜8.30 × 10^43 erg s^-1, which confirms that the AGN power is sufficient to quench the cooling. Ratios of the mass accretion rate to the Eddington and Bondi rates were estimated to be ˜0.08 and 3.5 × 10^4, respectively. This indicates that the black hole in the core of the cluster accretes matter through chaotic cold accretion.
Creel survey sampling designs for estimating effort in short-duration Chinook salmon fisheries
McCormick, Joshua L.; Quist, Michael C.; Schill, Daniel J.
2013-01-01
Chinook Salmon Oncorhynchus tshawytscha sport fisheries in the Columbia River basin are commonly monitored using roving creel survey designs and require precise, unbiased catch estimates. The objective of this study was to examine the relative bias and precision of total catch estimates using various sampling designs to estimate angling effort under the assumption that mean catch rate was known. We obtained information on angling populations based on direct visual observations of portions of Chinook Salmon fisheries in three Idaho river systems over a 23-d period. Based on the angling population, Monte Carlo simulations were used to evaluate the properties of effort and catch estimates for each sampling design. All sampling designs evaluated were relatively unbiased. Systematic random sampling (SYS) resulted in the most precise estimates. The SYS and simple random sampling designs had mean square error (MSE) estimates that were generally half of those observed with cluster sampling designs. The SYS design was more efficient (i.e., higher accuracy per unit cost) than a two-cluster design. Increasing the number of clusters available for sampling within a day decreased the MSE of estimates of daily angling effort, but the MSE of total catch estimates was variable depending on the fishery. The results of our simulations provide guidelines on the relative influence of sample sizes and sampling designs on parameters of interest in short-duration Chinook Salmon fisheries.
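The design comparison above can be sketched with a small Monte Carlo experiment; the population values, interval counts, and replication numbers below are invented for illustration and are not the Idaho data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical angling population: true effort (angler-hours) in 20
# within-day intervals; values are illustrative, not from the study.
population = rng.gamma(shape=2.0, scale=5.0, size=20)
n = 5  # intervals sampled per day

def srs_estimate(pop, n, rng):
    # Simple random sampling of intervals, expanded to a daily total.
    idx = rng.choice(len(pop), size=n, replace=False)
    return pop[idx].mean() * len(pop)

def sys_estimate(pop, n, rng):
    # Systematic sampling: random start, then every k-th interval.
    k = len(pop) // n
    start = int(rng.integers(k))
    return pop[start::k][:n].mean() * len(pop)

def mse(estimator, pop, n, rng, reps=2000):
    # Monte Carlo mean square error of the daily effort estimate.
    est = np.array([estimator(pop, n, rng) for _ in range(reps)])
    return float(np.mean((est - pop.sum()) ** 2))

mse_srs = mse(srs_estimate, population, n, rng)
mse_sys = mse(sys_estimate, population, n, rng)
```

With an autocorrelated effort pattern (e.g. an evening peak), the systematic design typically shows the lower MSE, which is the pattern the study reports.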
Applications of cluster analysis to satellite soundings
NASA Technical Reports Server (NTRS)
Munteanu, M. J.; Jakubowicz, O.; Kalnay, E.; Piraino, P.
1984-01-01
Cluster analysis was evaluated as a way to improve satellite temperature retrievals: natural clusters, associated with atmospheric temperature soundings characteristic of different types of air masses, have the potential to improve stratified regression schemes compared with currently used methods, which stratify soundings by latitude, season, and land/ocean. The method of discriminant analysis was used. The correct cluster of temperature profiles was located from satellite measurements in 85% of cases. Considerable improvement was observed at all mandatory levels using regression retrievals derived in the temperature clusters (weighted and nonweighted), in comparison with the control experiment and with regression retrievals derived in clusters of brightness temperatures from 3 MSU and 5 IR channels.
NASA Astrophysics Data System (ADS)
von der Linden, Anja; Allen, Mark T.; Applegate, Douglas E.; Kelly, Patrick L.; Allen, Steven W.; Ebeling, Harald; Burchat, Patricia R.; Burke, David L.; Donovan, David; Morris, R. Glenn; Blandford, Roger; Erben, Thomas; Mantz, Adam
2014-03-01
This is the first in a series of papers in which we measure accurate weak-lensing masses for 51 of the most X-ray luminous galaxy clusters known at redshifts 0.15 ≲ zCl ≲ 0.7, in order to calibrate X-ray and other mass proxies for cosmological cluster experiments. The primary aim is to improve the absolute mass calibration of cluster observables, currently the dominant systematic uncertainty for cluster count experiments. Key elements of this work are the rigorous quantification of systematic uncertainties, high-quality data reduction and photometric calibration, and the `blind' nature of the analysis to avoid confirmation bias. Our target clusters are drawn from X-ray catalogues based on the ROSAT All-Sky Survey, and provide a versatile calibration sample for many aspects of cluster cosmology. We have acquired wide-field, high-quality imaging using the Subaru Telescope and Canada-France-Hawaii Telescope for all 51 clusters, in at least three bands per cluster. For a subset of 27 clusters, we have data in at least five bands, allowing accurate photometric redshift estimates of lensed galaxies. In this paper, we describe the cluster sample and observations, and detail the processing of the SuprimeCam data to yield high-quality images suitable for robust weak-lensing shape measurements and precision photometry. For each cluster, we present wide-field three-colour optical images and maps of the weak-lensing mass distribution, the optical light distribution and the X-ray emission. These provide insights into the large-scale structure in which the clusters are embedded. We measure the offsets between X-ray flux centroids and the brightest cluster galaxies in the clusters, finding these to be small in general, with a median of 20 kpc. 
For offsets ≲100 kpc, weak-lensing mass measurements centred on the brightest cluster galaxies agree well with values determined relative to the X-ray centroids; miscentring is therefore not a significant source of systematic uncertainty for our weak-lensing mass measurements. In accompanying papers, we discuss the key aspects of our photometric calibration and photometric redshift measurements (Kelly et al.), and measure cluster masses using two methods, including a novel Bayesian weak-lensing approach that makes full use of the photometric redshift probability distributions for individual background galaxies (Applegate et al.). In subsequent papers, we will incorporate these weak-lensing mass measurements into a self-consistent framework to simultaneously determine cluster scaling relations and cosmological parameters.
NASA Astrophysics Data System (ADS)
Gerlich, Nikolas; Rostek, Stefan
2015-09-01
We derive a heuristic method to estimate the degree of self-similarity and serial correlation in financial time series. In particular, we advocate the use of a tailor-made selection of estimation techniques that are established in various fields of time series analysis but have so far found little use in the finance literature. Following the idea of portfolio diversification, we show that considerable improvements in robustness and unbiasedness can be achieved by using a basket of estimation methods. With this methodological toolbox at hand, we investigate real market data and show that noticeable deviations from the assumptions of constant self-similarity and absence of serial correlation occur during certain periods. On the one hand, this may shed new light on seemingly ambiguous scientific findings concerning serial correlation of financial time series. On the other hand, a demonstrably time-varying degree of self-similarity may help to explain high-volatility clusters of stock price indices.
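The "basket of estimators" idea can be sketched with two classical self-similarity (Hurst exponent) estimators; the simulated series, block scales, and equal weighting below are illustrative assumptions, not the authors' tailor-made selection:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(4096)  # i.i.d. "returns": true H ≈ 0.5

def hurst_aggvar(x, scales=(4, 8, 16, 32, 64)):
    # Aggregated-variance estimator: Var of block means scales as m^(2H-2).
    logs, logv = [], []
    for m in scales:
        blocks = x[: len(x) // m * m].reshape(-1, m).mean(axis=1)
        logs.append(np.log(m))
        logv.append(np.log(blocks.var()))
    slope = np.polyfit(logs, logv, 1)[0]
    return 1.0 + slope / 2.0

def hurst_rs(x, scales=(16, 32, 64, 128, 256)):
    # Rescaled-range (R/S) estimator: E[R/S] scales as n^H.
    logs, logrs = [], []
    for m in scales:
        blocks = x[: len(x) // m * m].reshape(-1, m)
        dev = np.cumsum(blocks - blocks.mean(axis=1, keepdims=True), axis=1)
        rs = (dev.max(axis=1) - dev.min(axis=1)) / blocks.std(axis=1)
        logs.append(np.log(m))
        logrs.append(np.log(rs.mean()))
    return np.polyfit(logs, logrs, 1)[0]

# "Portfolio" of estimators: averaging reduces the sensitivity of the
# final estimate to the bias of any single method.
h_basket = 0.5 * (hurst_aggvar(x) + hurst_rs(x))
```

For white noise both estimators should land near H = 0.5 (R/S is known to be somewhat biased upward at these sample sizes), which is exactly the kind of estimator-specific bias the averaging is meant to dilute.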
MIXREG: a computer program for mixed-effects regression analysis with autocorrelated errors.
Hedeker, D; Gibbons, R D
1996-05-01
MIXREG is a program that provides estimates for a mixed-effects regression model (MRM) for normally-distributed response data including autocorrelated errors. This model can be used for analysis of unbalanced longitudinal data, where individuals may be measured at a different number of timepoints, or even at different timepoints. Autocorrelated errors of a general form or following an AR(1), MA(1), or ARMA(1,1) form are allowable. This model can also be used for analysis of clustered data, where the mixed-effects model assumes data within clusters are dependent. The degree of dependency is estimated jointly with estimates of the usual model parameters, thus adjusting for clustering. MIXREG uses maximum marginal likelihood estimation, utilizing both the EM algorithm and a Fisher-scoring solution. For the scoring solution, the covariance matrix of the random effects is expressed in its Gaussian decomposition, and the diagonal matrix reparameterized using the exponential transformation. Estimation of the individual random effects is accomplished using an empirical Bayes approach. Examples illustrating usage and features of MIXREG are provided.
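MIXREG itself fits the full mixed-effects model by maximum marginal likelihood; as a minimal sketch of just the AR(1)-error ingredient, a single-subject feasible GLS (Cochrane-Orcutt) step looks like this, with all data simulated and no random effects included:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate one subject's longitudinal series: y = b0 + b1*t + AR(1) errors.
T, rho = 200, 0.6
t = np.arange(T, dtype=float)
e = np.zeros(T)
for i in range(1, T):
    e[i] = rho * e[i - 1] + rng.standard_normal()
y = 1.0 + 0.5 * t + e
X = np.column_stack([np.ones(T), t])

# Step 1: OLS fit, then estimate rho from the lag-1 residual autocorrelation.
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
r = y - X @ beta_ols
rho_hat = (r[1:] @ r[:-1]) / (r[:-1] @ r[:-1])

# Step 2: GLS via the AR(1) whitening transform (Cochrane-Orcutt):
# regress y_t - rho*y_{t-1} on X_t - rho*X_{t-1}.
ys = y[1:] - rho_hat * y[:-1]
Xs = X[1:] - rho_hat * X[:-1]
beta_gls = np.linalg.lstsq(Xs, ys, rcond=None)[0]
```

This two-step scheme recovers the fixed effects while accounting for the AR(1) dependence; MIXREG instead estimates the autocorrelation jointly with the variance components via EM and Fisher scoring.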
DOE Office of Scientific and Technical Information (OSTI.GOV)
Czarski, T., E-mail: tomasz.czarski@ifpilm.pl; Chernyshova, M.; Malinowski, K.
2016-11-15
The measurement system based on a gas electron multiplier detector is developed for soft X-ray diagnostics of tokamak plasmas. The multi-channel setup is designed to estimate the energy and position distribution of an X-ray source. The central measurement task is the identification of each charge cluster and the estimation of its value and position. A fast and accurate mode of serial data acquisition is applied for dynamic plasma diagnostics. The charge clusters are counted in the space determined by 2D position, charge value, and time intervals. Radiation source characteristics are presented by histograms for a selected range of positions, time intervals, and cluster charge values corresponding to the energy spectra.
Longo, Dario Livio; Dastrù, Walter; Consolino, Lorena; Espak, Miklos; Arigoni, Maddalena; Cavallo, Federica; Aime, Silvio
2015-07-01
The objective of this study was to compare a clustering approach to conventional analysis methods for assessing changes in pharmacokinetic parameters obtained from dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) during antiangiogenic treatment in a breast cancer model. BALB/c mice bearing established transplantable her2+ tumors were treated with a DNA-based antiangiogenic vaccine or with an empty plasmid (untreated group). DCE-MRI was carried out by administering a dose of 0.05 mmol/kg of Gadocoletic acid trisodium salt, a Gd-based blood pool contrast agent (CA), at 1T. Changes in pharmacokinetic estimates (K(trans) and vp) over a nine-day interval were compared between treated and untreated groups in a voxel-by-voxel analysis. The tumor response to therapy was assessed by a clustering approach and compared with conventional summary statistics, with sub-region analysis and with histogram analysis. Both the K(trans) and vp estimates, following blood-pool CA injection, showed marked and spatially heterogeneous changes with antiangiogenic treatment. Averaged values for the whole tumor region, as well as from the rim/core sub-region analysis, were unable to assess the antiangiogenic response. Histogram analysis resulted in significant changes only in the vp estimates (p<0.05). The proposed clustering approach depicted marked changes in both the K(trans) and vp estimates, with significant spatial heterogeneity in vp maps in response to treatment (p<0.05), provided that the DCE-MRI data are properly clustered into three or four sub-regions. This study demonstrated the value of cluster analysis applied to pharmacokinetic DCE-MRI parametric maps for assessing tumor response to antiangiogenic therapy. Copyright © 2015 Elsevier Inc. All rights reserved.
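The sub-region clustering step can be reproduced in spirit with a plain k-means over the voxel-wise (K(trans), vp) pairs; the synthetic voxel values, three-cluster choice, and Lloyd's-algorithm implementation below are assumptions for illustration, not the paper's exact pipeline:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy voxel-wise pharmacokinetic estimates (Ktrans, vp): three synthetic
# tumour sub-regions with different perfusion; values are illustrative.
centers = np.array([[0.02, 0.01], [0.10, 0.03], [0.25, 0.08]])
voxels = np.vstack([c + 0.01 * rng.standard_normal((100, 2)) for c in centers])

def kmeans(data, k, rng, iters=50):
    # Plain Lloyd's algorithm: assign each voxel to the nearest centroid,
    # then move each centroid to the mean of its members.
    cent = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(data[:, None, :] - cent[None, :, :], axis=2)
        lab = d.argmin(axis=1)
        new = []
        for j in range(k):
            members = data[lab == j]
            # Re-seed a centroid if its cluster happens to empty out.
            new.append(members.mean(axis=0) if len(members)
                       else data[rng.integers(len(data))])
        cent = np.array(new)
    return lab, cent

labels, cents = kmeans(voxels, 3, rng)
```

Treatment response would then be assessed per cluster (e.g. comparing the mean K(trans) change within each sub-region) rather than from a single whole-tumor average.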
A mesh partitioning algorithm for preserving spatial locality in arbitrary geometries
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nivarti, Girish V., E-mail: g.nivarti@alumni.ubc.ca; Salehi, M. Mahdi; Bushe, W. Kendal
2015-01-15
Highlights: •An algorithm for partitioning computational meshes is proposed. •The Morton order space-filling curve is modified to achieve improved locality. •A spatial locality metric is defined to compare results with existing approaches. •Results indicate improved performance of the algorithm in complex geometries. -- Abstract: A space-filling curve (SFC) is a proximity preserving linear mapping of any multi-dimensional space and is widely used as a clustering tool. Equi-sized partitioning of an SFC ignores the loss in clustering quality that occurs due to inaccuracies in the mapping. Often, this results in poor locality within partitions, especially for the conceptually simple, Morton order curves. We present a heuristic that improves partition locality in arbitrary geometries by slicing a Morton order curve at points where spatial locality is sacrificed. In addition, we develop algorithms that evenly distribute points to the extent possible while maintaining spatial locality. A metric is defined to estimate relative inter-partition contact as an indicator of communication in parallel computing architectures. Domain partitioning tests have been conducted on geometries relevant to turbulent reactive flow simulations. The results obtained highlight the performance of our method as an unsupervised and computationally inexpensive domain partitioning tool.
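As a baseline for comparison, the naive equi-sized Morton-order partitioning that the proposed heuristic improves upon can be sketched as follows (the 8×8 grid and four-way split are illustrative choices):

```python
def interleave2(x, y, bits=16):
    # Morton (Z-order) key: interleave the bits of the integer coordinates,
    # so nearby keys tend to correspond to nearby points in 2-D.
    key = 0
    for b in range(bits):
        key |= ((x >> b) & 1) << (2 * b) | ((y >> b) & 1) << (2 * b + 1)
    return key

def partition(points, nparts):
    # Equi-sized SFC partitioning: sort by Morton key, slice into
    # equal-length contiguous chunks of the curve.
    order = sorted(points, key=lambda p: interleave2(*p))
    size = -(-len(order) // nparts)  # ceiling division
    return [order[i:i + size] for i in range(0, len(order), size)]

grid = [(x, y) for x in range(8) for y in range(8)]
parts = partition(grid, 4)
```

The paper's heuristic differs precisely here: instead of cutting the sorted curve at fixed equal-size positions, it slices where the Morton ordering makes a long spatial jump, trading perfectly equal sizes for better within-partition locality.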
Newmann, Sara J; Rocca, Corinne H; Zakaras, Jennifer M; Onono, Maricianah; Bukusi, Elizabeth A; Grossman, Daniel; Cohen, Craig R
2016-09-01
This study investigated whether integrating family planning (FP) services into HIV care was associated with gender equitable attitudes among HIV-positive adults in western Kenya. Surveys were conducted with 480 women and 480 men obtaining HIV services from 18 clinics 1 year after the sites were randomized to integrated FP/HIV services (N = 12) or standard referral for FP (N = 6). We used multivariable regression, with generalized estimating equations to account for clustering, to assess whether gender attitudes (range 0-12) were associated with integrated care and with contraceptive use. Men at intervention sites had stronger gender equitable attitudes than those at control sites (adjusted mean difference in scores = 0.89, 95 % CI 0.03-1.74). Among women, attitudes did not differ by study arm. Gender equitable attitudes were not associated with contraceptive use among men (AOR = 1.06, 95 % CI 0.93-1.21) or women (AOR = 1.03, 95 % CI 0.94-1.13). Further work is needed to understand how integrating FP into HIV care affects gender relations, and how improved gender equity among men might be leveraged to improve contraceptive use and other reproductive health outcomes.
Ding, Jiarui; Shah, Sohrab; Condon, Anne
2016-01-01
Motivation: Many biological data processing problems can be formalized as clustering problems to partition data points into sensible and biologically interpretable groups. Results: This article introduces densityCut, a novel density-based clustering algorithm, which is both time- and space-efficient and proceeds as follows: densityCut first roughly estimates the densities of data points from a K-nearest neighbour graph and then refines the densities via a random walk. A cluster consists of points falling into the basin of attraction of an estimated mode of the underlying density function. A post-processing step merges clusters and generates a hierarchical cluster tree. The number of clusters is selected from the most stable clustering in the hierarchical cluster tree. Experimental results on ten synthetic benchmark datasets and two microarray gene expression datasets demonstrate that densityCut performs better than state-of-the-art algorithms for clustering biological datasets. For applications, we focus on recent cancer mutation clustering and single cell data analyses, namely clustering variant allele frequencies of somatic mutations to reveal clonal architectures of individual tumours, clustering single-cell gene expression data to uncover cell population compositions, and clustering single-cell mass cytometry data to detect communities of cells of the same functional states or types. densityCut performs better than competing algorithms and is scalable to large datasets. Availability and Implementation: Data and the densityCut R package are available from https://bitbucket.org/jerry00/densitycut_dev. Contact: condon@cs.ubc.ca or sshah@bccrc.ca or jiaruid@cs.ubc.ca Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153661
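The first two densityCut stages (a rough kNN density estimate, then a random-walk refinement over the kNN graph) can be sketched as follows; the dense distance matrices, restart parameter, and toy data are simplifying assumptions, not the published algorithm's exact choices:

```python
import numpy as np

rng = np.random.default_rng(4)
# Two well-separated Gaussian blobs; density should peak near their centres.
data = np.vstack([rng.normal(0, 0.3, (150, 2)), rng.normal(3, 0.3, (150, 2))])

def knn_density(data, k=10):
    # Rough kNN density estimate: rho_i ∝ k / (n * r_k(i)^d), where
    # r_k(i) is the distance from point i to its k-th nearest neighbour.
    n, d = data.shape
    dist = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=2)
    rk = np.sort(dist, axis=1)[:, k]  # column 0 is the point itself
    return k / (n * rk ** d)

def random_walk_refine(data, rho, k=10, alpha=0.9, iters=50):
    # Refine densities by diffusing them over the kNN graph with a
    # random walk with restart (densityCut's refinement, in spirit).
    dist = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=2)
    nbrs = np.argsort(dist, axis=1)[:, 1 : k + 1]
    W = np.zeros_like(dist)
    rows = np.repeat(np.arange(len(data)), k)
    W[rows, nbrs.ravel()] = 1.0
    P = W / W.sum(axis=1, keepdims=True)  # row-stochastic transitions
    r = rho / rho.sum()
    for _ in range(iters):
        r = alpha * (P.T @ r) + (1 - alpha) * rho / rho.sum()
    return r

rho = knn_density(data)
rho_refined = random_walk_refine(data, rho)
```

The remaining stages (mode finding via basins of attraction, cluster merging into a hierarchy) would then operate on `rho_refined` rather than the raw estimate.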
Spatial ecology of refuge selection by an herbivore under risk of predation
Wilson, Tammy L.; Rayburn, Andrew P.; Edwards, Thomas C.
2012-01-01
Prey species use structures such as burrows to minimize predation risk. The spatial arrangement of these resources can have important implications for individual and population fitness. For example, there is evidence that clustered resources can benefit individuals by reducing predation risk and increasing foraging opportunity concurrently, which leads to higher population density. However, the scale of clustering that is important in these processes has been ignored during theoretical and empirical development of resource models. Ecological understanding of refuge exploitation by prey can be improved by spatial analysis of refuge use and availability that incorporates the effect of scale. We measured the spatial distribution of pygmy rabbit (Brachylagus idahoensis) refugia (burrows) through censuses in four 6-ha sites. Point pattern analyses were used to evaluate burrow selection by comparing the spatial distribution of used and available burrows. The presence of food resources and additional overstory cover resources was further examined using logistic regression. Burrows were spatially clustered at scales up to approximately 25 m, and then regularly spaced at distances beyond ~40 m. Pygmy rabbit exploitation of burrows did not match availability. Burrows used by pygmy rabbits were likely to be located in areas with high overall burrow density (resource clusters) and high overstory cover, which together minimized predation risk. However, in some cases we observed an interaction between either overstory cover (safety) or understory cover (forage) and burrow density. The interactions show that pygmy rabbits will use burrows in areas with low relative burrow density (high relative predation risk) if understory food resources are high. This points to a potential trade-off whereby rabbits must sacrifice some safety afforded by additional nearby burrows to obtain ample forage resources. 
Observed patterns of clustered burrows and non-random burrow use improve understanding of the importance of spatial distribution of refugia for burrowing herbivores. The analyses used allowed for the estimation of the spatial scale where subtle trade-offs between predation avoidance and foraging opportunity are likely to occur in a natural system.
Improving the Statistical Modeling of the TRMM Extreme Precipitation Monitoring System
NASA Astrophysics Data System (ADS)
Demirdjian, L.; Zhou, Y.; Huffman, G. J.
2016-12-01
This project improves upon an existing extreme precipitation monitoring system based on the Tropical Rainfall Measuring Mission (TRMM) daily product (3B42) using new statistical models. The proposed system utilizes a regional modeling approach, where data from similar grid locations are pooled to increase the quality and stability of the resulting model parameter estimates to compensate for the short data record. The regional frequency analysis is divided into two stages. In the first stage, the region defined by the TRMM measurements is partitioned into approximately 27,000 non-overlapping clusters using a recursive k-means clustering scheme. In the second stage, a statistical model is used to characterize the extreme precipitation events occurring in each cluster. Instead of utilizing the block-maxima approach used in the existing system, where annual maxima are fit to the Generalized Extreme Value (GEV) probability distribution at each cluster separately, the present work adopts the peak-over-threshold (POT) method of classifying points as extreme if they exceed a pre-specified threshold. Theoretical considerations motivate the use of the Generalized-Pareto (GP) distribution for fitting threshold exceedances. The fitted parameters can be used to construct simple and intuitive average recurrence interval (ARI) maps which reveal how rare a particular precipitation event is given its spatial location. The new methodology eliminates much of the random noise that was produced by the existing models due to a short data record, producing more reasonable ARI maps when compared with NOAA's long-term Climate Prediction Center (CPC) ground based observations. The resulting ARI maps can be useful for disaster preparation, warning, and management, as well as increased public awareness of the severity of precipitation events. Furthermore, the proposed methodology can be applied to various other extreme climate records.
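The POT step described above can be sketched with SciPy's generalized Pareto fit; the synthetic "rainfall" series, the 95th-percentile threshold, and the standard POT return-level formula are illustrative assumptions, not the project's actual configuration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# Synthetic daily precipitation (mm): exponential tail, so exceedances
# over a high threshold are GP-distributed with shape ≈ 0 by construction.
rain = rng.exponential(scale=8.0, size=5000)

threshold = np.quantile(rain, 0.95)        # pre-specified high threshold
excess = rain[rain > threshold] - threshold

# Fit the Generalized Pareto distribution to the threshold exceedances,
# holding the location at 0 as the POT method prescribes.
shape, loc, scale = stats.genpareto.fit(excess, floc=0)

def return_level(N, threshold, shape, scale, rate):
    # Value exceeded on average once every N observations, from the
    # standard POT return-level formula (rate = exceedance probability).
    if abs(shape) < 1e-6:
        return threshold + scale * np.log(N * rate)
    return threshold + scale / shape * ((N * rate) ** shape - 1.0)

rate = excess.size / rain.size
rl_10yr = return_level(10 * 365, threshold, shape, scale, rate)
```

An average recurrence interval (ARI) map is then just this computation repeated per cluster, with the fitted (shape, scale) pair varying across space.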
NASA Astrophysics Data System (ADS)
Kamann, S.; Husser, T.-O.; Dreizler, S.; Emsellem, E.; Weilbacher, P. M.; Martens, S.; Bacon, R.; den Brok, M.; Giesers, B.; Krajnović, D.; Roth, M. M.; Wendt, M.; Wisotzki, L.
2018-02-01
This is the first of a series of papers presenting the results from our survey of 25 Galactic globular clusters with the MUSE integral-field spectrograph. In combination with our dedicated algorithm for source deblending, MUSE provides unique multiplex capabilities in crowded stellar fields and allows us to acquire samples of up to 20 000 stars within the half-light radius of each cluster. The present paper focuses on the analysis of the internal dynamics of 22 out of the 25 clusters, using about 500 000 spectra of 200 000 individual stars. Thanks to the large stellar samples per cluster, we are able to perform a detailed analysis of the central rotation and dispersion fields using both radial profiles and two-dimensional maps. The velocity dispersion profiles we derive show a good general agreement with existing radial velocity studies but typically reach closer to the cluster centres. By comparison with proper motion data, we derive or update the dynamical distance estimates to 14 clusters. Compared to previous dynamical distance estimates for 47 Tuc, our value is in much better agreement with other methods. We further find significant (>3σ) rotation in the majority (13/22) of our clusters. Our analysis seems to confirm earlier findings of a link between rotation and the ellipticities of globular clusters. In addition, we find a correlation between the strengths of internal rotation and the relaxation times of the clusters, suggesting that the central rotation fields are relics of the cluster formation that are gradually dissipated via two-body relaxation.
Core-halo age gradients and star formation in the Orion Nebula and NGC 2024 young stellar clusters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Getman, Konstantin V.; Feigelson, Eric D.; Kuhn, Michael A.
2014-06-01
We analyze age distributions of two nearby rich stellar clusters, the NGC 2024 (Flame Nebula) and Orion Nebula cluster (ONC) in the Orion molecular cloud complex. Our analysis is based on samples from the MYStIX survey and a new estimator of pre-main sequence (PMS) stellar ages, Age_JX, derived from X-ray and near-infrared photometric data. To overcome the problem of uncertain individual ages and large spreads of age distributions for entire clusters, we compute median ages and their confidence intervals of stellar samples within annular subregions of the clusters. We find core-halo age gradients in both the NGC 2024 cluster and ONC: PMS stars in cluster cores appear younger and thus were formed later than PMS stars in cluster peripheries. These findings are further supported by the spatial gradients in the disk fraction and K-band excess frequency. Our age analysis is based on Age_JX estimates for PMS stars and is independent of any consideration of OB stars. The result has important implications for the formation of young stellar clusters. One basic implication is that clusters form slowly and the apparent age spreads in young stellar clusters, which are often controversial, are (at least in part) real. The result further implies that simple models where clusters form inside-out are incorrect and more complex models are needed. We provide several star formation scenarios that alone or in combination may lead to the observed core-halo age gradients.
Estimation of multiple accelerated motions using chirp-Fourier transform and clustering.
Alexiadis, Dimitrios S; Sergiadis, George D
2007-01-01
Motion estimation in the spatiotemporal domain has been extensively studied and many methodologies have been proposed, which, however, cannot handle both time-varying and multiple motions. Extending previously published ideas, we present an efficient method for estimating multiple, linearly time-varying motions. It is shown that the estimation of accelerated motions is equivalent to the parameter estimation of superimposed chirp signals. From this viewpoint, one can exploit established signal processing tools such as the chirp-Fourier transform. It is shown that accelerated motion results in energy concentration along planes in the 4-D space: spatial frequencies-temporal frequency-chirp rate. Using fuzzy c-planes clustering, we estimate the plane/motion parameters. The effectiveness of our method is verified on both synthetic and real sequences, and its advantages are highlighted.
Groenewold, Matthew R
2006-01-01
Local health departments are among the first agencies to respond to disasters or other mass emergencies. However, they often lack the ability to handle large-scale events. Plans including locally developed and deployed tools may enhance local response. Simplified cluster sampling methods can be useful in assessing community needs after a sudden-onset, short duration event. Using an adaptation of the methodology used by the World Health Organization Expanded Programme on Immunization (EPI), a Microsoft Access-based application for two-stage cluster sampling of residential addresses in Louisville/Jefferson County Metro, Kentucky was developed. The sampling frame was derived from geographically referenced data on residential addresses and political districts available through the Louisville/Jefferson County Information Consortium (LOJIC). The program randomly selected 30 clusters, defined as election precincts, from within the area of interest, and then, randomly selected 10 residential addresses from each cluster. The program, called the Rapid Assessment Tools Package (RATP), was tested in terms of accuracy and precision using data on a dichotomous characteristic of residential addresses available from the local tax assessor database. A series of 30 samples were produced and analyzed with respect to their precision and accuracy in estimating the prevalence of the study attribute. Point estimates with 95% confidence intervals were calculated by determining the proportion of the study attribute values in each of the samples and compared with the population proportion. To estimate the design effect, corresponding simple random samples of 300 addresses were taken after each of the 30 cluster samples. The sample proportion fell within +/-10 absolute percentage points of the true proportion in 80% of the samples. In 93.3% of the samples, the point estimate fell within +/-12.5%, and 96.7% fell within +/-15%. 
All of the point estimates fell within +/-20% of the true proportion. Estimates of the design effect ranged from 0.926 to 1.436 (mean = 1.157, median = 1.170) for the 30 samples. Although prospective evaluation of its performance in field trials or a real emergency is required to confirm its utility, this study suggests that the RATP, a locally designed and deployed tool, may provide population-based estimates of community needs or the extent of event-related consequences that are precise enough to serve as the basis for the initial post-event decisions regarding relief efforts.
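A minimal sketch of the two-stage design follows. Note two simplifications: the EPI method proper selects clusters with probability proportional to size, whereas this sketch uses simple random selection of clusters, and the precinct sizes and attribute prevalence are invented:

```python
import random

random.seed(6)

# Hypothetical population: 120 precincts (clusters), each a list of
# residential addresses carrying a dichotomous attribute (True/False).
population = {
    c: [random.random() < 0.30 for _ in range(random.randint(50, 200))]
    for c in range(120)
}

def two_stage_sample(pop, n_clusters=30, n_per_cluster=10):
    # Stage 1: random sample of clusters (precincts).
    chosen = random.sample(sorted(pop), n_clusters)
    # Stage 2: random sample of addresses within each chosen cluster.
    sample = []
    for c in chosen:
        sample.extend(random.sample(pop[c], n_per_cluster))
    return sample

sample = two_stage_sample(population)          # 30 x 10 = 300 addresses
p_hat = sum(sample) / len(sample)              # sample proportion
p_true = sum(sum(v) for v in population.values()) / sum(
    len(v) for v in population.values()
)
```

Comparing `p_hat` against `p_true` over repeated draws is exactly the accuracy/precision evaluation the study performed, and the design effect is estimated by pairing each cluster sample with a simple random sample of the same total size.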
Construction and application of Red5 cluster based on OpenStack
NASA Astrophysics Data System (ADS)
Wang, Jiaqing; Song, Jianxin
2017-08-01
With the application and development of cloud computing technology in various fields, the resource utilization rate of data centers has improved markedly, and systems built on cloud computing platforms have gained in scalability and stability. Traditional Red5 deployments, by contrast, suffer from low resource utilization and poor system stability. This paper leverages cloud computing's efficient resource allocation to build a Red5 server cluster based on OpenStack, to which multimedia applications can be published. The system enables flexible provisioning of computing resources while greatly improving cluster stability and service efficiency.
[Molecular characterization of osteosarcomas].
Baumhoer, D
2013-11-01
Osteosarcomas are rare, with an estimated incidence of 5-6 cases per one million inhabitants per year. As the prognosis has not improved significantly over the last 30 years and more than 30% of patients still die of the disease, a better understanding of the molecular tumorigenesis is urgently needed to identify prognostic and predictive biomarkers as well as potential therapeutic targets. Using genome-wide SNP chip analyses, we were able to detect a genetic signature enabling a prognostic prediction for patients already at the time of initial diagnosis. Furthermore, we found the microRNA cluster 17-92 to be constitutively overexpressed in osteosarcomas. The microRNAs included here are intermingled in a complex network of several oncogenes and tumor suppressors that have been described as deregulated in osteosarcomas. Therefore, the microRNA cluster 17-92 could represent a central regulator in the development of osteosarcomas.
Mena, Carlos; Fuentes, Eduardo; Ormazábal, Yony; Palomo, Iván
2017-05-11
The global percentage of people over 60 is increasing strongly and estimated to exceed 20% by 2050, which means that there will be an increase in many pathological conditions related to aging. Mapping the location of aging people and identifying their needs can be extremely valuable from a socio-economic point of view. Participants in this study were 148 randomly selected adults from Talca City, Chile, aged 60-74 at baseline. Geographic information system (GIS) analyses were performed using ArcGIS software through its Spatial Autocorrelation module. In this study, we demonstrated that elderly people show geographic clustering according to above-norm results of anthropometric measurements and blood chemistry. The spatial identifications found would facilitate exploring the impact of treatment programmes in communities where many aging people live, thereby improving their quality of life as well as reducing overall costs.
Constraints on a possible variation of the fine structure constant from galaxy cluster data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Holanda, R.F.L.; Landau, S.J.; Sánchez G, I.E.
2016-05-01
We propose a new method to probe a possible time evolution of the fine structure constant α from X-ray and Sunyaev-Zel'dovich measurements of the gas mass fraction (f_gas) in galaxy clusters. Taking into account a direct relation between variations of α and violations of the distance-duality relation, we discuss constraints on α for a class of dilaton runaway models. Although not yet competitive with bounds from high-z quasar absorption systems, our constraints, considering a sample of 29 measurements of f_gas in the redshift interval 0.14 < z < 0.89, provide an independent estimate of α variation at low and intermediate redshifts. Furthermore, current and planned surveys will provide a larger amount of data and thus allow improving the limits on α variation obtained in the present analysis.
Kent, Clement; Azanchi, Reza; Smith, Ben; Chu, Adrienne; Levine, Joel
2007-01-01
Drosophila Cuticular Hydrocarbons (CH) influence courtship behaviour, mating, aggregation, oviposition, and resistance to desiccation. We measured levels of 24 different CH compounds of individual male D. melanogaster hourly under a variety of environmental (LD/DD) conditions. Using a model-based analysis of CH variation, we developed an improved normalization method for CH data, and show that CH compounds have reproducible cyclic within-day temporal patterns of expression which differ between LD and DD conditions. Multivariate clustering of expression patterns identified 5 clusters of co-expressed compounds with common chemical characteristics. Turnover rate estimates suggest CH production may be a significant metabolic cost. Male cuticular hydrocarbon expression is a dynamic trait influenced by light and time of day; since abundant hydrocarbons affect male sexual behavior, males may present different pheromonal profiles at different times and under different conditions. PMID:17896002
Gorfine, Malka; Bordo, Nadia; Hsu, Li
2017-01-01
Consider a popular case–control family study where individuals with a disease under study (case probands) and individuals who do not have the disease (control probands) are randomly sampled from a well-defined population. Possibly right-censored age at onset and disease status are observed for both probands and their relatives. For example, case probands are men diagnosed with prostate cancer, control probands are men free of prostate cancer, and the prostate cancer history of the fathers of the probands is also collected. Inherited genetic susceptibility, shared environment, and common behavior lead to correlation among the outcomes within a family. In this article, a novel nonparametric estimator of the marginal survival function is provided. The estimator is defined in the presence of intra-cluster dependence, and is based on consistent smoothed kernel estimators of conditional survival functions. By simulation, it is shown that the proposed estimator performs very well in terms of bias. The utility of the estimator is illustrated by the analysis of case–control family data of early onset prostate cancer. To our knowledge, this is the first article that provides a fully nonparametric marginal survival estimator based on case–control clustered age-at-onset data. PMID:27436674
Kelly, Heath; Riddell, Michaela A; Gidding, Heather F; Nolan, Terry; Gilbert, Gwendolyn L
2002-08-19
We compared estimates of the age-specific population immunity to measles, mumps, rubella, hepatitis B and varicella zoster viruses in Victorian school children obtained by a national sero-survey, using a convenience sample of residual sera from diagnostic laboratories throughout Australia, with those from a three-stage random cluster survey. When grouped according to school age (primary or secondary school) there was no significant difference in the estimates of immunity to measles, mumps, hepatitis B or varicella. Compared with the convenience sample, the random cluster survey estimated higher immunity to rubella in samples from both primary (98.7% versus 93.6%, P = 0.002) and secondary school students (98.4% versus 93.2%, P = 0.03). Despite some limitations, this study suggests that the collection of a convenience sample of sera from diagnostic laboratories is an appropriate sampling strategy to provide population immunity data that will inform Australia's current and future immunisation policies. Copyright 2002 Elsevier Science Ltd.
To center or not to center? Investigating inertia with a multilevel autoregressive model.
Hamaker, Ellen L; Grasman, Raoul P P P
2014-01-01
Whether level 1 predictors should be centered per cluster has received considerable attention in the multilevel literature. While most agree that there is no one preferred approach, it has also been argued that cluster mean centering is desirable when the within-cluster slope and the between-cluster slope are expected to deviate, and the main interest is in the within-cluster slope. However, we show in a series of simulations that if one has a multilevel autoregressive model in which the level 1 predictor is the lagged outcome variable (i.e., the outcome variable at the previous occasion), cluster mean centering will in general lead to a downward bias in the parameter estimate of the within-cluster slope (i.e., the autoregressive relationship). This is particularly relevant if the main question is whether there is on average an autoregressive effect. Nonetheless, we show that if the main interest is in estimating the effect of a level 2 predictor on the autoregressive parameter (i.e., a cross-level interaction), cluster mean centering should be preferred over other forms of centering. Hence, researchers should be clear on what is considered the main goal of their study, and base their choice of centering method on this when using a multilevel autoregressive model.
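The downward bias the authors describe can be reproduced in a small simulation (an illustrative sketch with made-up parameters, not the authors' code): cluster mean centering the lagged outcome removes the stable between-person differences and, with short series, drags the pooled autoregressive slope below its true value (the classic Nickell bias), while the raw pooled slope is inflated by those same between-person differences.

```python
import random
from collections import defaultdict

def simulate_panel(n_people=200, n_time=10, phi=0.4, seed=7):
    """AR(1) series per cluster (person) around a person-specific mean.
    Returns (cluster id, lagged outcome, outcome) triples."""
    rng = random.Random(seed)
    rows = []
    for i in range(n_people):
        mu = rng.gauss(0, 1)                  # stable person mean
        y = mu + rng.gauss(0, 1)
        for _ in range(n_time):
            y_next = mu + phi * (y - mu) + rng.gauss(0, 1)
            rows.append((i, y, y_next))
            y = y_next
    return rows

def pooled_slope(pairs):
    """OLS slope of y on x, pooled over all observations."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    return sxy / sxx

rows = simulate_panel()
raw = [(x, y) for _, x, y in rows]

# Cluster mean centering: subtract each person's observed mean of the lag.
sums, counts = defaultdict(float), defaultdict(int)
for i, x, _ in rows:
    sums[i] += x
    counts[i] += 1
centered = [(x - sums[i] / counts[i], y) for i, x, y in rows]

slope_raw = pooled_slope(raw)            # inflated by between-person variance
slope_centered = pooled_slope(centered)  # biased below the true phi = 0.4
```

With 10 occasions per person, the centered estimate typically falls well below the true autoregressive parameter of 0.4, while the raw pooled estimate overshoots it.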
Large Scale Structure Studies: Final Results from a Rich Cluster Redshift Survey
NASA Astrophysics Data System (ADS)
Slinglend, K.; Batuski, D.; Haase, S.; Hill, J.
1995-12-01
The results from the COBE satellite show the existence of structure on scales on the order of 10% or more of the horizon scale of the universe. Rich clusters of galaxies from the Abell-ACO catalogs show evidence of structure on scales of 100 Mpc and hold the promise of confirming structure on the scale of the COBE result. Unfortunately, until now, redshift information has been unavailable for a large percentage of these clusters, so present knowledge of their three-dimensional distribution has quite large uncertainties. Our approach in this effort has been to use the MX multifiber spectrometer on the Steward 2.3m to measure redshifts of at least ten galaxies in each of 88 Abell cluster fields with richness class R >= 1 and mag10 <= 16.8 (estimated z <= 0.12) and zero or one measured redshifts. This work has resulted in a deeper, 95% complete, and more reliable sample of 3-D positions of rich clusters. The primary intent of this survey has been to constrain theoretical models for the formation of the structure we see in the universe today through 2-pt. spatial correlation function and other analyses of the large scale structures traced by these clusters. In addition, we have obtained enough redshifts per cluster to greatly improve the quality and size of the sample of reliable cluster velocity dispersions available for use in other studies of cluster properties. These new data have also allowed the construction of an updated and more reliable supercluster candidate catalog. Our efforts have resulted in effectively doubling the volume traced by these clusters. Presented here is the resulting 2-pt. spatial correlation function, as well as density plots and several other figures quantifying the large scale structure from this much deeper and more complete sample. Also, with 10 or more redshifts in most of our cluster fields, we have investigated the extent of projection effects within the Abell catalog in an effort to quantify and understand how this may affect the Abell sample.
Time fluctuation analysis of forest fire sequences
NASA Astrophysics Data System (ADS)
Vega Orozco, Carmen D.; Kanevski, Mikhaïl; Tonini, Marj; Golay, Jean; Pereira, Mário J. G.
2013-04-01
Forest fires are complex events involving both space and time fluctuations. Understanding their dynamics and pattern distribution is of great importance in order to improve resource allocation and support fire management actions at local and global levels. This study aims at characterizing the temporal fluctuations of forest fire sequences observed in Portugal, the country that holds the largest wildfire land dataset in Europe. This research applies several exploratory data analysis measures to 302,000 forest fires that occurred from 1980 to 2007. The applied clustering measures are: Morisita clustering index, fractal and multifractal dimensions (box-counting), Ripley's K-function, Allan Factor, and variography. These algorithms enable a global time structural analysis describing the degree of clustering of a point pattern and defining whether the observed events occur randomly, in clusters, or in a regular pattern. The considered methods are of general importance and can be used for other spatio-temporal events (e.g. crime, epidemiology, biodiversity, geomarketing). An important contribution of this research deals with the analysis and estimation of local measures of clustering that help in understanding their temporal structure. Each measure is described and executed for the raw data (forest fires geo-database), and results are compared to reference patterns generated under the null hypothesis of randomness (Poisson processes) embedded in the same time period as the raw data. This comparison enables estimating the degree of deviation of the real data from a Poisson process. Generalizations to functional measures of these clustering methods, taking the phenomena into account, were also applied and adapted to detect time dependences in a measured variable (e.g. burned area). The time clustering of the raw data is compared several times with the Poisson processes at different thresholds of the measured function.
The clustering measure value then depends on the threshold, which helps in understanding the time pattern of the studied events. Our findings detected the presence of an overdensity of events in particular time periods and showed that the forest fire sequences in Portugal can be considered a multifractal process with a degree of time-clustering of the events. Key words: time sequences, Morisita index, fractals, multifractals, box-counting, Ripley's K-function, Allan Factor, variography, forest fires, point process. Acknowledgements: This work was partly supported by SNSF Project No. 200021-140658, "Analysis and Modelling of Space-Time Patterns in Complex Regions".
NASA Technical Reports Server (NTRS)
De Martino, I.; Atrio-Barandela, F.; Da Silva, A.; Ebling, H.; Kashlinsky, A.; Kocevski, D.; Martins, C. J. A. P.
2012-01-01
We study the capability of Planck data to constrain deviations of the cosmic microwave background (CMB) blackbody temperature from adiabatic evolution using the thermal Sunyaev-Zeldovich anisotropy induced by clusters of galaxies. We consider two types of data sets depending on how the cosmological signal is removed: using a CMB template or using the 217 GHz map. We apply two different statistical estimators, based on the ratio of temperature anisotropies at two different frequencies and on a fit to the spectral variation of the cluster signal with frequency. The ratio method is biased if CMB residuals with amplitude approximately 1 microK or larger are present in the data, while residuals are not so critical for the fit method. To test for systematics, we construct a template from clusters drawn from a hydro-simulation included in the pre-launch Planck Sky Model. We demonstrate that, using a proprietary catalog of X-ray-selected clusters with measured redshifts, electron densities, and X-ray temperatures, we can constrain deviations from adiabatic evolution, measured by the parameter alpha in the redshift scaling T(z) = T_0(1 + z)^(1-alpha), with an accuracy of sigma_alpha = 0.011 in the most optimal case and sigma_alpha = 0.018 for a less optimal case. These results represent a factor of 2-3 improvement over similar measurements carried out using quasar spectral lines and a factor of 6-20 with respect to earlier results using smaller cluster samples.
Alternatives to Multilevel Modeling for the Analysis of Clustered Data
ERIC Educational Resources Information Center
Huang, Francis L.
2016-01-01
Multilevel modeling has grown in use over the years as a way to deal with the nonindependent nature of observations found in clustered data. However, other alternatives to multilevel modeling are available that can account for observations nested within clusters, including the use of Taylor series linearization for variance estimation, the design…
ERIC Educational Resources Information Center
Huang, Francis L.; Cornell, Dewey G.
2016-01-01
Advances in multilevel modeling techniques now make it possible to investigate the psychometric properties of instruments using clustered data. Factor models that overlook the clustering effect can lead to underestimated standard errors, incorrect parameter estimates, and model fit indices. In addition, factor structures may differ depending on…
NASA Technical Reports Server (NTRS)
Battaglia, N.; Leauthaud, A.; Miyatake, H.; Hasseleld, M.; Gralla, M. B.; Allison, R.; Bond, J. R.; Calabrese, E.; Crichton, D.; Devlin, M. J.;
2016-01-01
Mass calibration uncertainty is the largest systematic effect for using clusters of galaxies to constrain cosmological parameters. We present weak lensing mass measurements from the Canada-France-Hawaii Telescope Stripe 82 Survey for galaxy clusters selected through their high signal-to-noise thermal Sunyaev-Zeldovich (tSZ) signal measured with the Atacama Cosmology Telescope (ACT). For a sample of 9 ACT clusters with a tSZ signal-to-noise greater than five, the average weak lensing mass is (4.8 +/- 0.8) x 10^14 solar masses, consistent with the tSZ mass estimate of (4.7 +/- 1.0) x 10^14 solar masses, which assumes a universal pressure profile for the cluster gas. Our results are consistent with previous weak-lensing measurements of tSZ-detected clusters from the Planck satellite. When comparing our results, we estimate the Eddington bias correction for the sample intersection of Planck and weak-lensing clusters, which was previously excluded.
NASA Astrophysics Data System (ADS)
Yin, Gang; Zhang, Yingtang; Fan, Hongbo; Ren, Guoquan; Li, Zhining
2017-12-01
We have developed a method for automatically detecting UXO-like targets based on magnetic anomaly inversion and self-adaptive fuzzy c-means clustering. Magnetic anomaly inversion methods are used to estimate the initial locations of multiple UXO-like sources. Although these initial locations have some errors with respect to the real positions, they form dense clouds around the actual positions of the magnetic sources. We then use the self-adaptive fuzzy c-means clustering algorithm to cluster these initial locations. The estimated number of cluster centroids represents the number of targets, and the cluster centroids are regarded as the locations of magnetic targets. The effectiveness of the method has been demonstrated using synthetic datasets. Computational results show that the proposed method can be applied to the case of several UXO-like targets randomly scattered within a confined, shallow subsurface volume. A field test was carried out to check the validity of the proposed method, and the experimental results show that the prearranged magnets can be detected unambiguously and located precisely.
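A minimal sketch of standard fuzzy c-means applied to noisy initial location estimates follows. The synthetic "clouds" stand in for the inversion results around two buried sources and are an illustrative assumption, not the paper's data; the self-adaptive variant used by the authors additionally estimates the number of clusters c, which this sketch takes as given.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, seed=0):
    """Standard fuzzy c-means. Returns cluster centers and the membership
    matrix U (rows sum to 1); the centers play the role of the estimated
    target locations."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Weighted centroids, then fuzzy membership update from distances.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)                 # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return centers, U

# Two synthetic clouds of inversion results around two hypothetical sources
# at (x, y, depth) = (0, 0, 1) and (8, 5, 2).
rng = np.random.default_rng(1)
cloud_a = rng.normal([0.0, 0.0, 1.0], 0.3, size=(30, 3))
cloud_b = rng.normal([8.0, 5.0, 2.0], 0.3, size=(30, 3))
centers, U = fuzzy_c_means(np.vstack([cloud_a, cloud_b]), c=2)
```

On well-separated clouds like these, the two recovered centers land close to the true source positions regardless of the random initialization.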
Evidence for an extensive intracluster medium from radio observations of distant Abell clusters
NASA Technical Reports Server (NTRS)
Hanisch, R. J.; Ulmer, M. P.
1985-01-01
Observations have been made of 18 distance class 5 and 6 Abell clusters of galaxies using the VLA in its 'C' configuration at a frequency of 1460 MHz. Half of the clusters in the sample are confirmed or probable sources of X-ray emission. All the detected radio sources with flux densities above 10 mJy are reported, and information is provided concerning the angular extent of the sources, as well as the most likely optical identification. The existence of an extensive intracluster medium is inferred by identifying extended/distorted radio sources with galaxies whose apparent magnitudes are consistent with their being cluster members and that are at projected distances of 3-4 Abell radii (6-8 Mpc) from the nearest cluster center. By requiring that the radio sources are confined by the ambient medium, the ambient density is calculated and the total cluster mass is estimated. As a sample calculation, a wide-angle-tail radio source some 5 Mpc from the center of Abell 348 is used to estimate these quantities.
Estimation of satellite position, clock and phase bias corrections
NASA Astrophysics Data System (ADS)
Henkel, Patrick; Psychas, Dimitrios; Günther, Christoph; Hugentobler, Urs
2018-05-01
Precise point positioning with integer ambiguity resolution requires precise knowledge of satellite position, clock and phase bias corrections. In this paper, a method for the estimation of these parameters with a global network of reference stations is presented. The method processes uncombined and undifferenced measurements of an arbitrary number of frequencies such that the obtained satellite position, clock and bias corrections can be used for any type of differenced and/or combined measurements. We perform a clustering of reference stations. The clustering ensures common satellite visibility within each cluster and enables efficient fixing of the double-difference ambiguities within each cluster. Additionally, the double-difference ambiguities between the reference stations of different clusters are fixed. We use an integer decorrelation for ambiguity fixing in dense global networks. The performance of the proposed method is analysed with both simulated Galileo measurements on E1 and E5a and real GPS measurements of the IGS network. We defined 16 clusters and obtained satellite position, clock and phase bias corrections with a precision of better than 2 cm.
Electrical Load Profile Analysis Using Clustering Techniques
NASA Astrophysics Data System (ADS)
Damayanti, R.; Abdullah, A. G.; Purnama, W.; Nandiyanto, A. B. D.
2017-03-01
Data mining is one of the data processing techniques used to extract information from a set of stored data. Electricity consumption is recorded by the electrical company every day, usually at intervals of 15 or 30 minutes. This paper applies clustering, one of the data mining techniques, to analyse the electrical load profiles recorded during 2014. Three clustering methods were compared, namely K-Means (KM), Fuzzy C-Means (FCM), and K-Harmonic Means (KHM). The results show that KHM is the most appropriate method for classifying the electrical load profiles. The optimum number of clusters is determined using the Davies-Bouldin index. By grouping the load profiles, analysis of demand variation and estimation of energy losses can be performed for groups of profiles with a similar pattern. From each group of load profiles, the cluster load factor and a range for the cluster loss factor can be derived, which helps to find the range of coefficient values for estimating energy losses without performing load flow studies.
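The Davies-Bouldin criterion mentioned above is simple to compute directly. The sketch below (illustrative synthetic data, not the 2014 load profiles) scores two labelings of the same 24-hour profiles and prefers the one with compact, well-separated clusters, i.e. the lower index:

```python
import numpy as np

def davies_bouldin(X, labels):
    """Davies-Bouldin index: for each cluster, take the worst-case ratio
    (S_i + S_j) / d(c_i, c_j) over the other clusters, then average.
    S_i is the mean distance of cluster i's members to its centroid.
    Lower values indicate a better partition."""
    ks = np.unique(labels)
    cents = np.array([X[labels == k].mean(axis=0) for k in ks])
    S = np.array([np.linalg.norm(X[labels == k] - c, axis=1).mean()
                  for k, c in zip(ks, cents)])
    ratios = []
    for i in range(len(ks)):
        worst = max((S[i] + S[j]) / np.linalg.norm(cents[i] - cents[j])
                    for j in range(len(ks)) if j != i)
        ratios.append(worst)
    return float(np.mean(ratios))

# Two well-separated families of hypothetical 24-point daily profiles:
# the true grouping scores far better (lower) than an alternating split.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (25, 24)),    # e.g. weekday-like curves
               rng.normal(3.0, 0.5, (25, 24))])   # e.g. weekend-like curves
good = np.repeat([0, 1], 25)
bad = np.arange(50) % 2
```

In practice, the index is evaluated for each candidate number of clusters and the k with the lowest value is kept.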
Astrophysical properties of star clusters in the Magellanic Clouds homogeneously estimated by ASteCA
NASA Astrophysics Data System (ADS)
Perren, G. I.; Piatti, A. E.; Vázquez, R. A.
2017-06-01
Aims: We seek to produce a homogeneous catalog of astrophysical parameters of 239 resolved star clusters, located in the Small and Large Magellanic Clouds, observed in the Washington photometric system. Methods: The cluster sample was processed with the recently introduced Automated Stellar Cluster Analysis (ASteCA) package, which ensures both an automatized and a fully reproducible treatment, together with a statistically based analysis of their fundamental parameters and associated uncertainties. The fundamental parameters determined for each cluster with this tool, via a color-magnitude diagram (CMD) analysis, are metallicity, age, reddening, distance modulus, and total mass. Results: We generated a homogeneous catalog of structural and fundamental parameters for the studied cluster sample and performed a detailed internal error analysis along with a thorough comparison with values taken from 26 published articles. We studied the distribution of cluster fundamental parameters in both Clouds and obtained their age-metallicity relationships. Conclusions: The ASteCA package can be applied to an unsupervised determination of fundamental cluster parameters, which is a task of increasing relevance as more data becomes available through upcoming surveys. A table with the estimated fundamental parameters for the 239 clusters analyzed is only available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (http://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/602/A89
Francoeur, Richard B
2015-01-01
Background: The majority of patients with advanced cancer experience symptom pairs or clusters among pain, fatigue, and insomnia. Improved methods are needed to detect and interpret interactions among symptoms or disease markers to reveal influential pairs or clusters. In prior work, I developed and validated sequential residual centering (SRC), a method that improves the sensitivity of multiple regression to detect interactions among predictors by conditioning for multicollinearity (shared variation) among interactions and component predictors. Materials and methods: Using a hypothetical three-way interaction among pain, fatigue, and sleep to predict depressive affect, I derive and explain SRC multiple regression. Subsequently, I estimate raw and SRC multiple regressions using real data for these symptoms from 268 palliative radiation outpatients. Results: Unlike raw regression, SRC reveals that the three-way interaction (pain × fatigue/weakness × sleep problems) is statistically significant. In follow-up analyses, the relationship between pain and depressive affect is aggravated (magnified) within two partial ranges: 1) complete-to-some control over fatigue/weakness when there is complete control over sleep problems (i.e., a subset of the pain–fatigue/weakness symptom pair), and 2) no control over fatigue/weakness when there is some-to-no control over sleep problems (i.e., a subset of the pain–fatigue/weakness–sleep problems symptom cluster). Otherwise, the relationship weakens (buffering) as control over fatigue/weakness or sleep problems diminishes. Conclusion: By reducing the standard error, SRC unmasks a three-way interaction comprising a symptom pair and cluster. Low-to-moderate levels of the moderator variable for fatigue/weakness magnify the relationship between pain and depressive affect. However, when the comoderator variable for sleep problems accompanies fatigue/weakness, only frequent or unrelenting levels of both symptoms magnify the relationship.
These findings suggest that a countervailing mechanism involving depressive affect could account for the effectiveness of a cognitive behavioral intervention to reduce the severity of a pain, fatigue, and sleep disturbance cluster in a previous randomized trial. PMID:25565865
A systematic approach to the Kansei factors of tactile sense regarding the surface roughness.
Choi, Kyungmee; Jun, Changrim
2007-01-01
Designing products that satisfy customers' emotions requires information gathered through the human senses: visual, auditory, olfactory, gustatory, and tactile. By controlling certain design factors, customers' emotional responses can be evaluated, designed for, and satisfied. In this study, a systematic approach is proposed to study the tactile sense with regard to surface roughness. Numerous pairs of antonymous tactile adjectives are collected and clustered. The optimal number of adjective clusters is estimated based on several criterion functions. The representative average preferences of the final clusters are obtained as estimates of the engineering parameters that control the surface roughness of commercial polymer-based products.
Developing appropriate methods for cost-effectiveness analysis of cluster randomized trials.
Gomes, Manuel; Ng, Edmond S-W; Grieve, Richard; Nixon, Richard; Carpenter, James; Thompson, Simon G
2012-01-01
Cost-effectiveness analyses (CEAs) may use data from cluster randomized trials (CRTs), where the unit of randomization is the cluster, not the individual. However, most studies use analytical methods that ignore clustering. This article compares alternative statistical methods for accommodating clustering in CEAs of CRTs. Our simulation study compared the performance of statistical methods for CEAs of CRTs with 2 treatment arms. The study considered a method that ignored clustering--seemingly unrelated regression (SUR) without a robust standard error (SE)--and 4 methods that recognized clustering--SUR and generalized estimating equations (GEEs), both with robust SE, a "2-stage" nonparametric bootstrap (TSB) with shrinkage correction, and a multilevel model (MLM). The base case assumed CRTs with moderate numbers of balanced clusters (20 per arm) and normally distributed costs. Other scenarios included CRTs with few clusters, imbalanced cluster sizes, and skewed costs. Performance was reported as bias, root mean squared error (rMSE), and confidence interval (CI) coverage for estimating incremental net benefits (INBs). We also compared the methods in a case study. Each method reported low levels of bias. Without the robust SE, SUR gave poor CI coverage (base case: 0.89 v. nominal level: 0.95). The MLM and TSB performed well in each scenario (CI coverage, 0.92-0.95). With few clusters, the GEE and SUR (with robust SE) had coverage below 0.90. In the case study, the mean INBs were similar across all methods, but ignoring clustering underestimated statistical uncertainty and the value of further research. MLMs and the TSB are appropriate analytical methods for CEAs of CRTs with the characteristics described. SUR and GEE are not recommended for studies with few clusters.
A Hidden Markov Model for Urban-Scale Traffic Estimation Using Floating Car Data.
Wang, Xiaomeng; Peng, Ling; Chi, Tianhe; Li, Mengzhu; Yao, Xiaojing; Shao, Jing
2015-01-01
Urban-scale traffic monitoring plays a vital role in reducing traffic congestion. Owing to its low cost and wide coverage, floating car data (FCD) has emerged as a novel approach to collecting traffic data. However, on arterial roads in most urban environments, the available probe data are predominantly sparse. To overcome this data sparseness, this paper proposes a hidden Markov model (HMM)-based traffic estimation model, in which the traffic condition on a road segment is treated as a hidden state that can be estimated from the conditions of road segments with similar traffic characteristics. An algorithm based on clustering and pattern mining, rather than on adjacency relationships, is proposed to find clusters of road segments with similar traffic characteristics. A multi-clustering strategy is adopted to achieve a trade-off between clustering accuracy and coverage. Finally, the proposed model is designed and implemented on the basis of a real-time algorithm. Results of experiments based on real FCD confirm the applicability, accuracy, and efficiency of the model. In addition, the results indicate that the model is practicable for traffic estimation on urban arterials and works well even when more than 70% of the probe data are missing.
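The hidden-state estimation described above can be sketched with a textbook HMM forward filter. The transition and emission matrices below are made-up illustrative numbers, not the paper's fitted model, and the binary state/observation alphabets are a simplification.

```python
import numpy as np

# Hidden states: 0 = free flow, 1 = congested (hypothetical probabilities).
A = np.array([[0.9, 0.1],    # state transition matrix
              [0.3, 0.7]])
B = np.array([[0.8, 0.2],    # emission: P(observed speed class | state)
              [0.2, 0.8]])   # observed classes: 0 = fast, 1 = slow
pi = np.array([0.5, 0.5])    # initial state distribution

def forward_filter(obs):
    """Return P(state_t | obs_1..t) for each t (normalized forward pass)."""
    alpha = pi * B[:, obs[0]]
    alpha /= alpha.sum()
    out = [alpha]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        alpha /= alpha.sum()
        out.append(alpha)
    return np.array(out)

post = forward_filter([1, 1, 1, 0])  # three slow readings, then one fast
print(post[2])  # belief after three consecutive slow observations
```

Repeated "slow" observations push the posterior towards the congested state, and a single "fast" reading pulls it back, which is the mechanism the estimation model exploits when probe data are sparse.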
Ping, Qing; Yang, Christopher C.; Marshall, Sarah A.; Avis, Nancy E.; Ip, Edward H.
2016-06-01
Most cancer patients, including patients with breast cancer, experience multiple symptoms simultaneously while receiving active treatment. Some symptoms tend to occur together and may be related, such as hot flashes and night sweats. Co-occurring symptoms may have a multiplicative effect on patients' functioning, mental health, and quality of life. Symptom clusters in the context of oncology were originally described as groups of three or more related symptoms. Some authors have suggested that symptom clusters may have practical applications, such as the formulation of more effective therapeutic interventions that address the combined effects of symptoms rather than treating each symptom separately. Most studies that have sought to identify symptom clusters in breast cancer survivors have relied on traditional research studies. Social media, such as online health-related forums, contain a wealth of user-generated content in the form of threads and posts, and could serve as a data source for identifying and characterizing symptom clusters among cancer patients. The present study seeks to determine patterns of symptom clusters in breast cancer survivors derived from both social media and research study data using improved K-Medoid clustering. A total of 50,426 publicly available messages were collected from Medhelp.com and 653 questionnaires were collected as part of a research study. The network of symptoms built from social media was sparse compared with that of the research study data, making the social media data easier to partition. The proposed revised K-Medoid clustering improves clustering performance by re-assigning some of the symptoms with negative average silhouette width (ASW) to other clusters after the initial K-Medoid clustering. This keeps the overall ASW non-decreasing and avoids trapping in local optima. The overall ASW, individual ASWs, and the interpretability of the final clustering solution all indicate improvement.
The clustering results suggest that some symptom clusters are consistent across social media data and clinical data, such as gastrointestinal (GI) symptoms, menopausal symptoms, mood-change symptoms, cognitive impairment, and pain-related symptoms. We recommend an integrative approach that takes advantage of both data sources. Social media data could provide context for interpreting clustering results derived from research study data, while research study data could compensate for the lower precision and recall of social media data. PMID:29152536
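The silhouette-based reassignment can be sketched as a post-processing pass (our own toy version, not the authors' exact algorithm): points with a negative silhouette width are moved to their nearest other cluster when that does not lower the overall average silhouette width (ASW).

```python
import numpy as np

def silhouette(D, labels):
    """Per-point silhouette widths from a precomputed distance matrix D."""
    n = len(labels)
    s = np.zeros(n)
    for i in range(n):
        same = (labels == labels[i]); same[i] = False
        a = D[i, same].mean() if same.any() else 0.0
        b = min(D[i, labels == c].mean() for c in set(labels) if c != labels[i])
        s[i] = (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return s

def reassign_negatives(D, labels):
    """Move negative-silhouette points to their nearest other cluster,
    keeping the move only if the overall ASW does not decrease."""
    labels = labels.copy()
    for i in np.where(silhouette(D, labels) < 0)[0]:
        old, asw_old = labels[i], silhouette(D, labels).mean()
        others = [c for c in set(labels) if c != old]
        labels[i] = min(others, key=lambda c: D[i, labels == c].mean())
        if silhouette(D, labels).mean() < asw_old:
            labels[i] = old
    return labels

# Toy data: two well-separated 1-D groups, with one point mislabelled.
x = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
D = np.abs(x[:, None] - x[None, :])
labels = np.array([0, 0, 1, 1, 1, 1])   # point 2 is in the wrong cluster
fixed = reassign_negatives(D, labels)
print(fixed)
```

The mislabelled point has a negative silhouette, gets reassigned, and the overall ASW rises, which is the non-decreasing-ASW property the abstract describes.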
NASA Technical Reports Server (NTRS)
Sifon, Cristobal; Battaglia, Nick; Hasselfield, Matthew; Menanteau, Felipe; Barrientos, L. Felipe; Bond, J. Richard; Crichton, Devin; Devlin, Mark J.; Dunner, Rolando; Hilton, Matt;
2016-01-01
We present galaxy velocity dispersions and dynamical mass estimates for 44 galaxy clusters selected via the Sunyaev-Zeldovich (SZ) effect by the Atacama Cosmology Telescope. Dynamical masses for 18 clusters are reported here for the first time. Using N-body simulations, we model the different observing strategies used to measure the velocity dispersions and account for systematic effects resulting from these strategies. We find that the galaxy velocity distributions may be treated as isotropic, and that an aperture correction of up to 7 per cent in the velocity dispersion is required if the spectroscopic galaxy sample is sufficiently concentrated towards the cluster centre. Accounting for the radial profile of the velocity dispersion in simulations enables consistent dynamical mass estimates regardless of the observing strategy. Cluster masses M200 are in the range (1-15) × 10^14 solar masses. Comparing with masses estimated from the SZ distortion assuming a gas pressure profile derived from X-ray observations gives a mean SZ-to-dynamical mass ratio of 1.10 ± 0.13, but there is an additional 0.14 systematic uncertainty due to the unknown velocity bias; the statistical uncertainty is dominated by the scatter in the mass-velocity dispersion scaling relation. This ratio is consistent with previous determinations at these mass scales.
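The dynamical masses rest on a mass-velocity dispersion scaling relation. A sketch that inverts an Evrard et al. (2008)-style dark-matter calibration (sigma_15 ≈ 1083 km/s, alpha ≈ 0.336; treat the constants and the h(z) handling as illustrative, not this paper's exact calibration):

```python
# Invert sigma_DM = sigma_15 * [h(z) * M200 / 1e15 Msun]^alpha for M200.
SIGMA_15 = 1082.9   # km/s, dispersion of a 1e15 Msun / h(z) halo
ALPHA = 0.3361

def m200_from_sigma(sigma_kms, hz=1.0):
    """M200 in units of 1e15 solar masses, given a velocity dispersion."""
    return (sigma_kms / SIGMA_15) ** (1.0 / ALPHA) / hz

m = m200_from_sigma(1000.0)
print(f"M200 ~ {m:.2f} x 1e15 Msun")
```

Because M scales roughly as sigma^3, the scatter in measured dispersions maps into a large mass scatter, which is why the abstract's statistical uncertainty is dominated by the scaling-relation scatter.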
Quantitative evidence of an intrinsic luminosity spread in the Orion nebula cluster
NASA Astrophysics Data System (ADS)
Reggiani, M.; Robberto, M.; Da Rio, N.; Meyer, M. R.; Soderblom, D. R.; Ricci, L.
2011-10-01
Aims: We study the distribution of stellar ages in the Orion nebula cluster (ONC) using accurate HST photometry taken from HST Treasury Program observations of the ONC, adopting the cluster distance estimated by Menten and collaborators. We investigate whether there is an intrinsic age spread in the region and whether the age depends on the spatial distribution. Methods: We estimate the extinction and accretion luminosity towards each source by performing synthetic photometry on an empirical calibration of atmospheric models using the package Chorizos of Maiz-Apellaniz. The position of the sources in the HR diagram is compared with different theoretical isochrones to estimate the mean cluster age and age dispersion. On the basis of Monte Carlo simulations, we quantify the amount of intrinsic age spread in the region, taking into account uncertainties in the distance, spectral type, extinction, unresolved binaries, accretion, and photometric variability. Results: According to the evolutionary models of Siess and collaborators, the mean age of the cluster is 2.2 Myr with a scatter of a few Myr. With Monte Carlo simulations, we find that the observed age spread is inconsistent with that of a coeval stellar population, but in agreement with star formation activity between 1.5 and 3.5 Myr. We also find some evidence that age depends on the spatial distribution.
Revealing nonergodic dynamics in living cells from a single particle trajectory
NASA Astrophysics Data System (ADS)
Lanoiselée, Yann; Grebenkov, Denis S.
2016-05-01
We propose the improved ergodicity and mixing estimators to identify nonergodic dynamics from a single particle trajectory. The estimators are based on the time-averaged characteristic function of the increments and can thus capture additional information on the process as compared to the conventional time-averaged mean-square displacement. The estimators are first investigated and validated for several models of anomalous diffusion, such as ergodic fractional Brownian motion and diffusion on percolating clusters, and nonergodic continuous-time random walks and scaled Brownian motion. The estimators are then applied to two sets of earlier published trajectories of mRNA molecules inside live Escherichia coli cells and of Kv2.1 potassium channels in the plasma membrane. These statistical tests did not reveal nonergodic features in the former set, while some trajectories of the latter set could be classified as nonergodic. Time averages along such trajectories are thus not representative and may be strongly misleading. Since the estimators do not rely on ensemble averages, the nonergodic features can be revealed separately for each trajectory, providing a more flexible and reliable analysis of single-particle tracking experiments in microbiology.
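The estimators are built on the time-averaged characteristic function of increments. A minimal sketch of that building block for a discrete Brownian trajectory (our simplification, not the authors' full ergodicity/mixing statistics):

```python
import numpy as np

def ta_cf(x, lag, k):
    """Time average of exp(i*k*(x[t+lag]-x[t])) along one trajectory."""
    inc = x[lag:] - x[:-lag]
    return np.exp(1j * k * inc).mean()

rng = np.random.default_rng(0)
sigma = 1.0
x = np.cumsum(rng.normal(0.0, sigma, 100_000))  # discrete Brownian motion
k, lag = 0.5, 10
est = ta_cf(x, lag, k)
theory = np.exp(-0.5 * k**2 * sigma**2 * lag)   # Gaussian increments
print(est.real, theory)
```

For an ergodic Gaussian process the time average converges to the ensemble characteristic function; systematic deviations of such time averages from their ensemble counterparts are what flag nonergodic dynamics.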
Tian, Ting; McLachlan, Geoffrey J.; Dieters, Mark J.; Basford, Kaye E.
2015-01-01
It is a common occurrence in plant breeding programs to observe missing values in three-way three-mode multi-environment trial (MET) data. We proposed modifications of models for estimating missing observations in these data arrays and developed a novel approach based on hierarchical clustering. Multiple imputation (MI) was used in four ways: multiple agglomerative hierarchical clustering, a normal distribution model, a normal regression model, and predictive mean matching. The latter three models were used with both Bayesian and non-Bayesian analysis, while the first approach used a clustering procedure with randomly selected attributes and assigned real values from the nearest neighbour to the record with missing observations. Different proportions of data entries in six complete datasets were randomly set to missing, and the MI methods were compared on the efficiency and accuracy of estimating those values. The results indicated that the models using Bayesian analysis had slightly higher estimation accuracy than those using non-Bayesian analysis, but were more time-consuming. However, the novel approach of multiple agglomerative hierarchical clustering demonstrated the best overall performance. PMID:26689369
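The nearest-neighbour idea behind the clustering-based imputation can be sketched as follows (a toy version with random attribute subsets; the paper builds full agglomerative hierarchies, and the subset-size rule here is our own choice):

```python
import numpy as np

def impute_nn(X, row, col, m=10, rng=None):
    """Pool m imputations: each draw clusters on a random attribute subset
    (here reduced to a nearest-neighbour search) and copies the missing
    entry from the closest complete row."""
    rng = rng or np.random.default_rng(0)
    complete = [i for i in range(len(X)) if not np.isnan(X[i]).any()]
    obs_cols = [j for j in range(X.shape[1]) if j != col]
    draws = []
    for _ in range(m):
        subset = rng.choice(obs_cols, size=max(1, len(obs_cols) // 2),
                            replace=False)
        nn = min(complete,
                 key=lambda i: np.sum((X[i, subset] - X[row, subset]) ** 2))
        draws.append(X[nn, col])
    return float(np.mean(draws))

X = np.array([[1.0, 2.0, 3.0],
              [1.1, 2.1, 3.1],
              [9.0, 9.5, 10.0],
              [1.05, 2.05, np.nan]])   # last row missing its third value
print(impute_nn(X, row=3, col=2))     # should land near 3.0, not 10.0
```

The incomplete row resembles the first two rows on its observed attributes, so the pooled imputation lands near their values rather than near the outlying third row.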
Prediction models for clustered data: comparison of a random intercept and standard regression model
Bouwmeester, Walter; Twisk, Jos W R; Kappen, Teus H; van Klei, Wilton A; Moons, Karel G M; Vergouwe, Yvonne
2013-02-15
When study data are clustered, standard regression analysis is considered inappropriate and analytical techniques for clustered data need to be used. For prediction research in which interest centres on predictor effects at the patient level, random effect regression models are probably preferred over standard regression analysis. It is well known that the random effect parameter estimates and the standard logistic regression parameter estimates are different. Here, we compared random effect and standard logistic regression models for their ability to provide accurate predictions. Using an empirical study on 1642 surgical patients at risk of postoperative nausea and vomiting, who were treated by one of 19 anesthesiologists (clusters), we developed prognostic models with either standard or random intercept logistic regression. External validity of these models was assessed in new patients from other anesthesiologists. We supported our results with simulation studies using intra-class correlation coefficients (ICC) of 5%, 15%, or 30%. Standard performance measures and measures adapted for the clustered data structure were estimated. The model developed with random effect analysis showed better discrimination than the standard approach, if the cluster effects were used for risk prediction (standard c-index of 0.69 versus 0.66). In the external validation set, both models showed similar discrimination (standard c-index 0.68 versus 0.67). The simulation study confirmed these results. For datasets with a high ICC (≥15%), model calibration was only adequate in external subjects if the performance measure used assumed the same data structure as the model development method: standard calibration measures showed good calibration for the standard developed model, while calibration measures adapted to the clustered data structure showed good calibration for the prediction model with random intercept.
The models with random intercept discriminate better than the standard model only if the cluster effect is used for predictions. The prediction model with random intercept had good calibration within clusters. PMID:23414436
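Why using the cluster effect changes predictions can be seen from the model form: a random-intercept logistic model adds a cluster-specific offset u_j to the linear predictor. A toy sketch with invented coefficients (not the paper's fitted model):

```python
import math

def predict(x, beta0, beta1, u_j=0.0):
    """P(outcome) under logit(p) = beta0 + u_j + beta1 * x."""
    z = beta0 + u_j + beta1 * x
    return 1.0 / (1.0 + math.exp(-z))

beta0, beta1 = -1.0, 0.8                                # hypothetical effects
marginal = predict(1.0, beta0, beta1)                   # cluster effect ignored
high_risk_doc = predict(1.0, beta0, beta1, u_j=0.7)     # "risky" cluster
low_risk_doc = predict(1.0, beta0, beta1, u_j=-0.7)     # "safe" cluster
print(marginal, high_risk_doc, low_risk_doc)
```

With u_j set to zero the same patient gets the same risk in every cluster; including u_j spreads the predictions out across clusters, which is what improves discrimination when cluster effects are used.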
The structure of the merging RCS 231953+00 supercluster at z ≈ 0.9
DOE Office of Scientific and Technical Information (OSTI.GOV)
Faloon, A. J.; Webb, T. M. A.; Geach, J. E.
2013-05-10
The RCS 2319+00 supercluster is a massive supercluster at z = 0.9 comprising three optically selected, spectroscopically confirmed clusters separated by <3 Mpc on the plane of the sky. This supercluster is one of a few known examples of the progenitors of present-day massive clusters (10^15 M_Sun by z ≈ 0.5). We present an extensive spectroscopic campaign carried out on the supercluster field resulting, in conjunction with previously published data, in 1961 high-confidence galaxy redshifts. We find 302 structure members spanning three distinct redshift walls separated from one another by ≈65 Mpc (Δz = 0.03). The component clusters have spectroscopic redshifts of 0.901, 0.905, and 0.905. The velocity dispersions are consistent with those predicted from X-ray data, giving estimated cluster masses of ≈10^14.5-10^14.9 M_Sun. The Dressler-Shectman test finds evidence of substructure in the supercluster field, and a friends-of-friends analysis identified five groups in the supercluster, including a filamentary structure stretching between two cluster cores previously identified in the infrared by Coppin et al. The galaxy colors further show this filamentary structure to be a unique region of activity within the supercluster, comprised mainly of blue galaxies compared to the ≈43%-77% red-sequence galaxies present in the other groups and cluster cores. Richness estimates from stacked luminosity function fits result in average group mass estimates consistent with ≈10^13 M_Sun halos. Currently, 22% of our confirmed members reside in ≳10^13 M_Sun groups/clusters destined to merge onto the most massive cluster, in agreement with the massive-halo galaxy fractions important in cluster galaxy pre-processing in N-body simulation merger tree studies.
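The friends-of-friends analysis mentioned above links galaxies closer than a chosen linking length into groups. A minimal 2-D sketch (toy positions and linking length, not the paper's redshift-space implementation):

```python
import numpy as np

def friends_of_friends(pos, b):
    """Union-find FoF: any two points closer than b share a group label."""
    n = len(pos)
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(pos[i] - pos[j]) < b:
                parent[find(i)] = find(j)
    return [find(i) for i in range(n)]

pos = np.array([[0.0, 0.0], [0.4, 0.0], [0.8, 0.0],   # a chain -> one group
                [5.0, 5.0], [5.3, 5.0],                # a pair
                [9.0, 0.0]])                           # an isolated point
groups = friends_of_friends(pos, b=0.5)
print(groups)
```

Note the chain: the first and third points are farther apart than the linking length, but are joined through their mutual friend, which is exactly how FoF recovers elongated, filamentary structures.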
NASA Technical Reports Server (NTRS)
Croft, R. A. C.; Dalton, G. B.; Efstathiou, G.; Sutherland, W. J.; Maddox, S. J.
1997-01-01
We analyze the spatial clustering properties of a new catalog of very rich galaxy clusters selected from the APM Galaxy Survey. These clusters are of comparable richness and space density to Abell Richness Class ≥ 1 clusters, but selected using an objective algorithm from a catalog demonstrably free of artificial inhomogeneities. Evaluation of the two-point correlation function ξ_cc(r) for the full sample and for richer subsamples reveals that the correlation amplitude is consistent with that measured for lower richness APM clusters and X-ray selected clusters. We apply a maximum likelihood estimator to find the best fitting slope and amplitude of a power law fit to ξ_cc(r), and to estimate the correlation length r_0 (the value of r at which ξ_cc(r) is equal to unity). For clusters with a mean space density of 1.6 × 10^-6 h^3 Mpc^-3 (equivalent to the space density of Abell Richness ≥ 2 clusters), we find r_0 = 21.3 (+11.1/-9.3) h^-1 Mpc (95% confidence limits). This is consistent with the weak richness dependence of ξ_cc(r) expected in Gaussian models of structure formation. In particular, the amplitude of ξ_cc(r) at all richnesses matches that of ξ_cc(r) for clusters selected in N-body simulations of a low density Cold Dark Matter model.
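A simple pair-count sketch of the two-point correlation function, using the natural estimator DD/RR - 1 on toy 3-D points (the paper itself fits a power law by maximum likelihood; the box size, bin, and clustering strength below are invented):

```python
import numpy as np

def pair_counts(pts, edges):
    """Histogram of pairwise separations in the given radial bins."""
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    iu = np.triu_indices(len(pts), k=1)   # each pair counted once
    return np.histogram(d[iu], bins=edges)[0]

rng = np.random.default_rng(1)
box, n = 100.0, 400
randoms = rng.uniform(0, box, (n, 3))
# "Clustered" toy sample: points scattered tightly around 40 cluster centres.
centres = rng.uniform(0, box, (40, 3))
data = centres[rng.integers(0, 40, n)] + rng.normal(0, 1.0, (n, 3))
edges = np.array([0.5, 5.0])            # one separation bin, 0.5 < r < 5
dd = pair_counts(data, edges)[0]
rr = pair_counts(randoms, edges)[0]
xi = dd / rr - 1.0
print(xi)
```

An excess of data pairs over random pairs at small separations gives ξ(r) > 0; for the strongly clustered toy sample the estimate comes out far above zero, while an unclustered sample would scatter around zero.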
Multiple imputation methods for bivariate outcomes in cluster randomised trials.
DiazOrdaz, K; Kenward, M G; Gomes, M; Grieve, R
2016-09-10
Missing observations are common in cluster randomised trials. The problem is exacerbated when modelling bivariate outcomes jointly, as the proportion of complete cases is often considerably smaller than the proportion having either of the outcomes fully observed. Approaches taken to handling such missing data include the following: complete case analysis, single-level multiple imputation that ignores the clustering, multiple imputation with a fixed effect for each cluster and multilevel multiple imputation. We contrasted the alternative approaches to handling missing data in a cost-effectiveness analysis that uses data from a cluster randomised trial to evaluate an exercise intervention for care home residents. We then conducted a simulation study to assess the performance of these approaches on bivariate continuous outcomes, in terms of confidence interval coverage and empirical bias in the estimated treatment effects. Missing-at-random clustered data scenarios were simulated following a full-factorial design. Across all the missing data mechanisms considered, the multiple imputation methods provided estimators with negligible bias, while complete case analysis resulted in biased treatment effect estimates in scenarios where the randomised treatment arm was associated with missingness. Confidence interval coverage was generally in excess of nominal levels (up to 99.8%) following fixed-effects multiple imputation and too low following single-level multiple imputation. Multilevel multiple imputation led to coverage levels of approximately 95% throughout. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
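All the multiple imputation variants above ultimately pool results across imputed datasets with Rubin's rules. A minimal sketch with made-up estimates and variances:

```python
import numpy as np

def pool(estimates, variances):
    """Rubin's rules: pooled point estimate and total variance across
    M imputed datasets (within-imputation plus inflated between-imputation)."""
    estimates, variances = np.asarray(estimates), np.asarray(variances)
    m = len(estimates)
    qbar = estimates.mean()               # pooled point estimate
    within = variances.mean()             # average within-imputation variance
    between = estimates.var(ddof=1)       # between-imputation variance
    total = within + (1 + 1 / m) * between
    return qbar, total

# Hypothetical treatment-effect estimates from M = 5 imputed datasets.
qbar, total = pool([1.9, 2.1, 2.0, 2.2, 1.8], [0.04, 0.05, 0.04, 0.05, 0.04])
print(qbar, total)
```

The (1 + 1/m) inflation of the between-imputation variance is what lets the pooled uncertainty reflect the missing data; imputation schemes that ignore the clustering tend to understate the between component, producing the coverage problems the abstract reports.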
González-González, Ana Isabel; Orrego, Carola; Perestelo-Perez, Lilisbeth; Bermejo-Caja, Carlos Jesús; Mora, Nuria; Koatz, Débora; Ballester, Marta; Del Pino, Tasmania; Pérez-Ramos, Jeannet; Toledo-Chavarri, Ana; Robles, Noemí; Pérez-Rivas, Francisco Javier; Ramírez-Puerta, Ana Belén; Canellas-Criado, Yolanda; Del Rey-Granado, Yolanda; Muñoz-Balsa, Marcos José; Becerril-Rojas, Beatriz; Rodríguez-Morales, David; Sánchez-Perruca, Luis; Vázquez, José Ramón; Aguirre, Armando
2017-10-30
Communities of practice are based on the idea that learning involves a group of people exchanging experiences and knowledge. The e-MPODERA project aims to assess the effectiveness of a virtual community of practice aimed at improving primary healthcare professional attitudes to the empowerment of patients with chronic diseases. This paper describes the protocol for a cluster randomized controlled trial. We will randomly assign 18 primary-care practices per participating region of Spain (Catalonia, Madrid and Canary Islands) to a virtual community of practice or to usual training. The primary-care practice will be the randomization unit and the primary healthcare professional will be the unit of analysis. We will need a sample of 270 primary healthcare professionals (general practitioners and nurses) and 1382 patients. We will perform randomization after professionals and patients are selected. We will ask the intervention group to participate for 12 months in a virtual community of practice based on a Web 2.0 platform. We will measure the primary outcome using the Patient-Provider Orientation Scale questionnaire administered at baseline and after 12 months. Secondary outcomes will be the sociodemographic characteristics of health professionals, sociodemographic and clinical characteristics of patients, the Patient Activation Measure questionnaire for patient activation, and outcomes regarding use of the virtual community of practice. We will fit a linear mixed-effects regression to estimate the effect of participating in the virtual community of practice. This cluster randomized controlled trial will show whether a virtual intervention for primary healthcare professionals improves attitudes to the empowerment of patients with chronic diseases. ClinicalTrials.gov, NCT02757781. Registered on 25 April 2016. Protocol version: PI15.01, 22 January 2016.
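The required sample size reflects the usual cluster-randomization penalty. A back-of-the-envelope design-effect sketch, assuming (purely for illustration) an ICC of 0.05 and the 270 professionals spread evenly over the 54 practices (18 per region, 3 regions):

```python
def design_effect(cluster_size, icc):
    """Variance inflation from randomizing clusters instead of individuals."""
    return 1 + (cluster_size - 1) * icc

n_professionals = 270
per_practice = n_professionals / 54      # 18 practices x 3 regions
deff = design_effect(per_practice, icc=0.05)
effective_n = n_professionals / deff
print(deff, round(effective_n))
```

Even a modest ICC shrinks the effective sample size noticeably, which is why the protocol's mixed-effects analysis and its sample-size target must account for practice-level clustering.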
Model-Free Reconstruction of Excitatory Neuronal Connectivity from Calcium Imaging Signals
Stetter, Olav; Battaglia, Demian; Soriano, Jordi; Geisel, Theo
2012-01-01
A systematic assessment of global neural network connectivity through direct electrophysiological assays has remained technically infeasible, even in simpler systems like dissociated neuronal cultures. We introduce an improved algorithmic approach based on Transfer Entropy to reconstruct structural connectivity from network activity monitored through calcium imaging. In this study, we focus on the inference of excitatory synaptic links. Based on information theory, our method requires no prior assumptions on the statistics of neuronal firing and neuronal connections. The performance of our algorithm is benchmarked on surrogate time series of calcium fluorescence generated by the simulated dynamics of a network with known ground-truth topology. We find that the functional network topology revealed by Transfer Entropy depends qualitatively on the time-dependent dynamic state of the network (bursting or non-bursting). Thus, by conditioning on the global mean activity, we improve the performance of our method. This allows us to focus the analysis on specific dynamical regimes of the network in which the inferred functional connectivity is shaped by monosynaptic excitatory connections, rather than by collective synchrony. Our method can discriminate between actual causal influences between neurons and spurious non-causal correlations due to light scattering artifacts, which inherently affect the quality of fluorescence imaging. Compared to other reconstruction strategies such as cross-correlation or Granger Causality methods, our method based on improved Transfer Entropy is remarkably more accurate. In particular, it provides a good estimation of the excitatory network clustering coefficient, allowing for discrimination between weakly and strongly clustered topologies.
Finally, we demonstrate the applicability of our method to analyses of real recordings of in vitro disinhibited cortical cultures where we suggest that excitatory connections are characterized by an elevated level of clustering compared to a random graph (although not extreme) and can be markedly non-local. PMID:22927808
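The core quantity is transfer entropy. A plug-in sketch for binary time series with history length 1 (a deliberate simplification: the paper works with calcium fluorescence signals, longer histories, and conditioning on the network state):

```python
import numpy as np
from collections import Counter

def transfer_entropy(x, y):
    """Plug-in TE(X -> Y) in bits: sum p(y1,y0,x0) log2 p(y1|y0,x0)/p(y1|y0)."""
    n = len(y) - 1
    p3 = Counter(zip(y[1:], y[:-1], x[:-1]))   # (y_{t+1}, y_t, x_t)
    p2_yx = Counter(zip(y[:-1], x[:-1]))       # (y_t, x_t)
    p2_yy = Counter(zip(y[1:], y[:-1]))        # (y_{t+1}, y_t)
    p1_y = Counter(y[:-1])                     # y_t
    te = 0.0
    for (y1, y0, x0), c in p3.items():
        te += (c / n) * np.log2((c / p2_yx[(y0, x0)]) /
                                (p2_yy[(y1, y0)] / p1_y[y0]))
    return te

rng = np.random.default_rng(0)
x = rng.integers(0, 2, 5000)
y = np.empty_like(x)
y[0] = 0
y[1:] = x[:-1]                 # y copies x with a one-step delay
te_xy = transfer_entropy(x.tolist(), y.tolist())
te_yx = transfer_entropy(y.tolist(), x.tolist())
print(te_xy, te_yx)            # ~1 bit for x -> y, ~0 bits for y -> x
```

The asymmetry (large TE from driver to target, near zero in reverse) is what lets the method separate causal influences from symmetric correlations such as those induced by light scattering.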
An improved K-means clustering method for cDNA microarray image segmentation.
Wang, T N; Li, T J; Shao, G F; Wu, S X
2015-07-14
Microarray technology is a powerful tool for human genetic research and other biomedical applications. Numerous improvements to the standard K-means algorithm have been proposed for the image segmentation step. However, most previous studies classify the image into only two clusters. In this paper, we propose a novel K-means algorithm that first classifies the image into three clusters, and then designates one of the three clusters as the background region and the other two clusters as the foreground region. The proposed method was evaluated on six different data sets. Analyses of accuracy, efficiency, expression values, special gene spots, and noisy images demonstrate the effectiveness of our method in improving segmentation quality.
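The three-cluster idea can be sketched with a 1-D intensity K-means (the quantile initialization and the "darkest cluster is background" rule are our own illustrative choices, not necessarily the paper's):

```python
import numpy as np

def kmeans_1d(values, k=3, iters=50):
    """Plain Lloyd iterations on scalar intensities, quantile-initialized."""
    centers = np.quantile(values, np.linspace(0.1, 0.9, k))
    for _ in range(iters):
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        centers = np.array([values[labels == j].mean() if (labels == j).any()
                            else centers[j] for j in range(k)])
    return labels, centers

# Synthetic "microarray" pixels: dark background, dim spots, bright spots.
rng = np.random.default_rng(1)
pixels = np.concatenate([rng.normal(10, 2, 300),     # background
                         rng.normal(80, 5, 100),     # dim foreground
                         rng.normal(200, 10, 100)])  # bright foreground
labels, centers = kmeans_1d(pixels, k=3)
background = np.argmin(centers)
foreground = labels != background   # the two brighter clusters merged
print(foreground.sum())             # ~200 foreground pixels
```

With only two clusters, the dim spots would be at risk of merging with the background; the third cluster gives them their own centre before the two foreground clusters are merged.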
Improving performance through concept formation and conceptual clustering
NASA Technical Reports Server (NTRS)
Fisher, Douglas H.
1992-01-01
Research from June 1989 through October 1992 focused on concept formation, clustering, and supervised learning for the purpose of improving the efficiency of problem-solving, planning, and diagnosis. These projects resulted in two dissertations on clustering, explanation-based learning, and means-ends planning, and in conference and workshop publications, several book chapters, and journal articles; a complete bibliography of NASA Ames supported publications is included. The following topics are studied: clustering of explanations and problem-solving experiences; clustering and means-ends planning; and diagnosis of space shuttle and space station operating modes.
Li, Xiaofang; Xu, Lizhong; Wang, Huibin; Song, Jie; Yang, Simon X.
2010-01-01
The traditional Low Energy Adaptive Clustering Hierarchy (LEACH) routing protocol is a clustering-based protocol. Uneven selection of cluster heads results in the premature death of cluster heads and the premature appearance of blind nodes inside the clusters, thus reducing the overall lifetime of the network. Taking full account of information on the energy and distance distribution of neighboring nodes inside the clusters, this paper proposes a new routing algorithm based on differential evolution (DE) to improve the LEACH routing protocol. To meet the requirements of monitoring applications in outdoor environments such as meteorological, hydrological and wetland ecological environments, the proposed algorithm uses the simple and fast search features of DE to optimize the multi-objective selection of cluster heads and prevent blind nodes, for improved energy efficiency and system stability. Simulation results show that the proposed new LEACH routing algorithm has better performance, effectively extends the working lifetime of the system, and improves the quality of the wireless sensor networks. PMID:22219670
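The optimizer at the heart of the proposal is differential evolution. A bare-bones DE/rand/1/bin sketch minimizing a toy cost function (a stand-in for the paper's multi-objective cluster-head fitness; population size, F, and CR are conventional defaults, not the paper's settings):

```python
import numpy as np

def differential_evolution(cost, bounds, pop=20, gens=100, F=0.8, CR=0.9,
                           seed=0):
    """DE/rand/1/bin: mutate with a scaled difference of two random members,
    binomially cross with the current member, keep the trial if it is better."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds).T
    x = rng.uniform(lo, hi, (pop, len(bounds)))
    fit = np.array([cost(v) for v in x])
    for _ in range(gens):
        for i in range(pop):
            a, b, c = x[rng.choice([j for j in range(pop) if j != i], 3,
                                   replace=False)]
            mutant = np.clip(a + F * (b - c), lo, hi)
            cross = rng.random(len(bounds)) < CR
            trial = np.where(cross, mutant, x[i])
            f = cost(trial)
            if f < fit[i]:
                x[i], fit[i] = trial, f
    return x[fit.argmin()], fit.min()

# Toy cost: squared distance to a hypothetical "best" head position (3, -2).
best, val = differential_evolution(lambda v: (v[0] - 3) ** 2 + (v[1] + 2) ** 2,
                                   bounds=[(-10, 10), (-10, 10)])
print(best, val)
```

The greedy selection step makes the population cost non-increasing, and the difference-vector mutation needs no gradient, which is the "simple and fast search" property the abstract exploits for cluster-head selection.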